Paulo Silva
Verified Expert in Engineering
Statistics Developer
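Paulo is a data scientist with four years of experience across multiple business domains. With Python as his main stack, he has worked with numerous machine learning algorithms, data analysis, visualization, hypothesis testing (such as A/B testing), statistical analysis, and even data engineering. Paulo has an engineering background, and problem-solving comes naturally to him.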
Preferred Environment
Python, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jupyter Notebook, PyCharm, Visual Studio Code (VS Code)
The most amazing thing I've done is use data science to reduce the number of students who drop out of college.
Work Experience
Data Scientist
Oko Exchange Inc.
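- Used OpenAI's large language model (LLM) APIs (GPT-3.5 Turbo and GPT-4) to parse data from unstructured text into a structured format.
- Leveraged Azure Document Intelligence (formerly Azure Form Recognizer) to extract text from files, and used LangChain and vector stores to process large text files with LLMs.
- Served models with AWS Lambda and stored files in Amazon S3 (AWS S3).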
Data Scientist
Quero Quitar
- Created predictive models to guide the debt collection agency on which debtors to contact.
- Built models to estimate the likelihood of reaching each debtor, making the company's outreach more effective.
- Migrated from Pandas to Databricks to process large volumes of data.
Data Engineer/Analyst for Qlik Sense
CBD Industries, LLC
- Developed an ETL (extract, transform, load) architecture within Qlik Sense.
- Integrated multiple third-party APIs into Qlik Sense.
- Utilized AWS services to scale the solution for a big data context.
Data Scientist
Limehome
- Developed a dynamic pricing algorithm for a hotel chain.
- Performed ad-hoc data analysis to help drive the business forward.
- Helped data analysts with their research by finding inconsistencies, giving feedback, and providing overall technical support.
Data Scientist
Chama
- Developed a dynamic pricing algorithm for a business that connects buyers and sellers of bottled cooking gas.
- Helped with experiments to roll out new features in a data-driven way.
- Worked with the company's analytics department to spread a data-driven culture.
Data Scientist
Zup
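- Helped the company identify employees who misused meal expenses while traveling or meeting clients.
- Helped the company find leaders who were billing clients incorrectly, causing financial losses.
- Created a model to help the operations department determine whether it had enough computers for new employees, based on past hiring behavior.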
Data Scientist
CRM Educacional
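- Developed a lead scoring model to help private universities attract more students.
- Created a model to identify students at risk of dropping out of college and recommended the steps needed to prevent it.
- Improved the company's data pipeline, which had been built for small data and had become unfeasible as data volumes grew.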
Data Scientist
Maxtrack
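- Developed a model to predict whether a car had been stolen, based on tracker data and previously known user behavior.
- Improved the company's data pipeline using Spark, since the previous pipeline was no longer feasible for the volume of data being processed.
- Analyzed data to determine whether previously developed models were working as expected.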
Data Scientist
4hoofs
- Created a model to predict a cow's daily milk yield.
- Helped the company find new markets based on public data from milk producers.
- Developed an IoT device to monitor the milk quality in a tank.
Experience
College Dropout Prediction
Students drop out mainly because they face financial hardships, live too far from the campus, can't manage to work and study simultaneously, or even struggle academically and think it's not worth the effort.
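Dropouts are a big problem for colleges because they lose the years of revenue those students would have brought in. Therefore, in the long run, it pays for colleges to offer short-term incentives to retain students.
With that in mind, I developed a machine learning model to identify both the risk of dropping out and its causes. Finally, I provided insights on which incentives the college could offer to retain students.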
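As an illustration only, a risk model of this kind could score each student with a logistic function over the dropout factors listed above and report the largest contributor as the likely cause, which is what an incentive recommendation would key off. The feature names, weights, and bias below are hypothetical placeholders, not the model described here; in practice they would be learned from historical enrollment data.

```python
import math

# Hypothetical weights for the dropout factors named above (features scaled to 0..1).
# A real model would learn these from historical data rather than hard-code them.
WEIGHTS = {
    "financial_hardship": 1.8,
    "commute_distance": 0.9,
    "work_study_conflict": 1.2,
    "academic_struggle": 1.5,
}
BIAS = -3.0  # hypothetical intercept


def dropout_risk(features: dict[str, float]) -> tuple[float, str]:
    """Return (probability of dropping out, factor contributing most to the score)."""
    # Per-factor contribution to the logit; missing features count as 0.
    contributions = {name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic (sigmoid) link
    main_cause = max(contributions, key=contributions.get)
    return probability, main_cause


student = {"financial_hardship": 1.0, "commute_distance": 0.2,
           "work_study_conflict": 0.5, "academic_struggle": 0.1}
risk, cause = dropout_risk(student)
print(f"risk={risk:.2f}, main cause={cause}")
```

Reporting the dominant factor, not just the probability, is what makes the output actionable: a student flagged mainly for financial hardship points toward a different incentive than one flagged for academic struggle.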
Car-theft Prediction Using Tracking Data
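The project revolved around tracking users' data, using one machine learning model to establish each user's typical behavior and then another machine learning model to predict whether the car had been stolen. The goal was to predict these events before users reported them, speeding up the process of recovering the car.
For this project, I used Python as the programming language. For data processing, we used Apache Spark on the Databricks platform because there was a lot of data, and processing it on a single machine was too slow for the (time-sensitive) requirements. Historical data was stored in a MongoDB database, and we served the model through a Flask API.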
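The two-stage idea (learn a user's normal behavior, then flag deviations) can be sketched with plain statistics standing in for the two machine learning models. This is a minimal, hypothetical illustration: the `Ping` schema, the hour-of-day and distance features, and the unweighted additive score are all assumptions, and distances are computed naively in degrees rather than kilometers.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ping:
    """A single tracker reading (hypothetical schema)."""
    hour: int   # hour of day, 0-23
    lat: float
    lon: float


@dataclass
class Baseline:
    """Stage 1: a user's typical behavior, summarized from historical pings."""
    hour_freq: dict[int, float]  # share of historical pings per hour of day
    center: tuple[float, float]  # mean position
    radius: float                # mean distance from the center


def build_baseline(history: list[Ping]) -> Baseline:
    total = len(history)
    counts = Counter(p.hour for p in history)
    hour_freq = {h: c / total for h, c in counts.items()}
    center = (sum(p.lat for p in history) / total, sum(p.lon for p in history) / total)
    # Naive Euclidean distance in degrees; fine for a sketch, not for production.
    radius = sum(math.dist((p.lat, p.lon), center) for p in history) / total
    return Baseline(hour_freq, center, radius)


def theft_score(baseline: Baseline, ping: Ping) -> float:
    """Stage 2: higher when the ping is at an unusual hour and far from the usual area."""
    hour_rarity = 1.0 - baseline.hour_freq.get(ping.hour, 0.0)
    distance_ratio = math.dist((ping.lat, ping.lon), baseline.center) / max(baseline.radius, 1e-9)
    # Unweighted sum: an arbitrary choice; a trained classifier would replace this.
    return hour_rarity + distance_ratio


history = [Ping(8, 0.0, 0.0), Ping(8, 0.01, 0.0), Ping(18, 0.0, 0.01), Ping(18, -0.01, 0.0)]
baseline = build_baseline(history)
print(theft_score(baseline, Ping(8, 0.0, 0.0)))   # commute-like ping: low score
print(theft_score(baseline, Ping(3, 0.5, 0.5)))   # 3 a.m., far away: high score
```

In a real deployment the alert threshold on this score would be tuned against reported thefts, trading early detection against false alarms.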
Dynamic Pricing to Sell Cooking Gas Bottles
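However, once the gas runs out while someone is cooking, they want a new bottle delivered to their home as soon as possible, because being without one can ruin their meal.
With that in mind, the company's business connected suppliers and customers through a mobile app. The problem was that these suppliers were not used to fierce competition and were very unhappy with us.
To calm the situation, we developed a dynamic pricing algorithm that used machine learning to keep prices at a level sustainable for suppliers while remaining favorable for customers.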
For this project, I used Python for the programming part, Flask to serve my model, and Docker to containerize the model with the API.
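One way to express "sustainable for suppliers, favorable for customers" is a guardrail step applied to whatever price the model suggests: a floor derived from supplier cost plus a minimum margin, and a ceiling derived from a reference market price plus a maximum markup. The sketch below is hypothetical; `PricingBounds`, its fields, and the tie-breaking rule are assumptions, and the machine learning model that produces `predicted_price` is out of scope.

```python
from dataclasses import dataclass


@dataclass
class PricingBounds:
    """Hypothetical guardrails around a model-suggested price."""
    supplier_cost: float    # supplier's cost per bottle
    min_margin: float       # e.g. 0.10 = at least 10% over cost
    reference_price: float  # e.g. a typical market price
    max_markup: float       # e.g. 0.15 = at most 15% above reference


def dynamic_price(predicted_price: float, bounds: PricingBounds) -> float:
    """Clamp the model's suggested price into the [floor, ceiling] band."""
    floor = bounds.supplier_cost * (1 + bounds.min_margin)
    ceiling = bounds.reference_price * (1 + bounds.max_markup)
    if floor > ceiling:
        # Conflicting bounds: favor supplier sustainability (an assumed design choice).
        return round(floor, 2)
    return round(min(max(predicted_price, floor), ceiling), 2)


bounds = PricingBounds(supplier_cost=80.0, min_margin=0.10, reference_price=100.0, max_markup=0.15)
for suggestion in (70.0, 100.0, 130.0):
    print(suggestion, "->", dynamic_price(suggestion, bounds))
```

Keeping the clamp separate from the model means the guardrails stay explainable to suppliers even when the model's suggestions change.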
Skills
Languages
Python, SQL, Python 3, C, R, JavaScript, C#
Libraries/APIs
Pandas, REST APIs, XGBoost, NumPy, TensorFlow
Tools
BigQuery, Tableau, GitHub, PyCharm, Git, Postman, Amazon SageMaker, Microsoft Power BI, Pytest, Qlik Sense, Azure ML Studio, Apache Airflow
Paradigms
Data Science, ETL, Database Design, Azure DevOps, Business Intelligence (BI)
Platforms
Jupyter Notebook, Google Cloud Platform (GCP), Visual Studio Code (VS Code), Amazon Web Services (AWS), Docker, Azure, Android, Kubernetes, AWS Lambda, Databricks
Storage
Data Pipelines, Databases, SQL Server 2016, MySQL, Redis, Relational Databases, Data Integration, MongoDB, PostgreSQL, Amazon S3 (AWS S3), Data Lakes
Other
Machine Learning, Data Analysis, Data Visualization, Software Development, Statistics, Algorithms, API Integration, Analytics, Data, ETL Tools, Data Reporting, Data Analytics, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Statistical Analysis, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Artificial Intelligence (AI), OpenAI, ChatGPT, Back-end, APIs, Data Engineering, Data Mining, Signal Processing, Hospitality, Google BigQuery, Data Warehousing, Cloud, Web Development, Large Language Models (LLMs), Industrial IT, Google Data Studio, Natural Language Processing (NLP), Web Scraping, Azure Data Factory, Dremio, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, OpenAI GPT-3 API
Frameworks
Flask, Apache Spark, Streamlit, Spark, React Native, Swagger
Education
Bachelor's Degree in Control and Automation Engineering
Federal University of Minas Gerais - Belo Horizonte, Minas Gerais, Brazil
Master's Degree in Control Engineering
Lund University - Lund, Skane, Sweden
Certifications
Natural Language Processing Nanodegree
Udacity