- Designing a fast, clean, and responsive platform that users can work on easily and enjoyably
- Adding a RESTful API for integration with the Zymewire platform
- Scaling the app to allow for multiple users to simultaneously be working on the same task
- Auto-accepting submitted tasks when the content was similar enough to what we needed.
Ange Melhuish
Senior Data Engineer
Studying cybersecurity part-time and building data pipelines at Prodigy full-time.
Prodigy Education - Senior Data Engineer
Improving the reliability of Prodigy’s data pipelines and data lakehouse.
• Currently embedded in a product team working to expose modelled data to users.
• Worked across teams to build and scale an event pipeline that ingests thousands of messages per second.
• Developed processes for deploying ML pipelines using MLflow, implemented PII redaction, and added automation tools.
• Mentoring, training, and onboarding new teammates.
• Expanding my understanding of data systems, reliability, and data warehousing/data lakes.
Technologies: dbt, SQL, Python, Spark, AWS, Databricks, Docker, Airflow, Terraform, Datadog, Kubernetes
Skills Developed:
Data pipeline scheduling and automation
dbt model development and optimization
Automating backfill processes
Updating and testing packages
Developing production ML pipeline processes
Shopify Data Platform Engineer
Member of the Starscream Runtime - Batch Transformations team, maintaining, scaling, and upgrading Shopify’s data ETL platform.
- Currently adding tools to assist in the migration of 3000+ batch transformation flows from Apache Oozie to Airflow.
- Led the migration of 3000+ batch transformation flows from unmanaged Hadoop YARN to Google's Dataproc, coordinating with data science teams and updating coordinator tools to ensure a smooth, seamless transition for our data scientists.
- Helped convert the core of the platform to be compatible with both Python 2 and Python 3.
- Mentoring, training, and onboarding new teammates.
- Expanding my understanding of data systems, reliability, and data warehousing/data lakes.
- Technologies: Python, Spark, Hadoop, GCP, Dataproc, Docker, Oozie, Airflow, Terraform, Datadog, SQL, Kubernetes
- Skills Developed:
- Data pipeline scheduling and automation
- Continuous deployment pipeline maintenance
- Troubleshooting and mitigating incidents involving data corruption
- Refactoring and updating legacy code
- Test-driven software development
Lead Engineer
Data Team Lead, Engineer, Data Developer
- Major challenges:
- seeking out new sources of data for finding and categorizing companies
- managing how we tracked 4000+ companies in their daily activities: scraping the latest news and seeking out their plans to start clinical trials, attend conferences, hire new talent, release results, apply for funding, etc.
- designing data models and pipeline processes for acquiring all of the data collected above
- automating the processes to reduce the need for human intervention and review
- creating machine learning models to assist in the categorization process
- Technologies: Ruby, Ruby on Rails, React.js, JavaScript, MongoDB
- Skills Developed:
- test-driven software development, with weekly code reviews
- extracting, transforming and loading (ETL) large datasets
- designing and automating the loading of new datasets
- defining metrics to help the team stay focused on projects aligned with the business
- experimenting with processes to help improve the team’s workflow
- growing a team
- managing team members' personal growth