Ange Melhuish

Senior Data Engineer

Studying cyber security part-time and building data pipelines at Prodigy full time.

Prodigy Education - Senior Data Engineer

Senior Data Engineer · Prodigy Education 

Improving the reliability of Prodigy’s data pipelines and data lakehouse.

• Currently embedded in a product team working to expose modelled data to users.
• Worked across teams to build and scale an event pipeline that ingests thousands of messages per second.
• Developed processes for deploying ML pipelines using MLflow, implemented PII redaction, and added automation tools (a minimal MLflow sketch follows this section).
• Mentoring, training and onboarding new teammates.
• Expanding my understanding of data systems, reliability, and data warehousing/data lakes.

Technologies: dbt, SQL, Python, Spark, AWS, Databricks, Docker, Airflow, Terraform, Datadog, Kubernetes

Skills Developed:
Data pipeline scheduling and automation
dbt model development and optimization
Automating backfill processes
Updating and testing packages
Developing production ML pipeline processes
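
A minimal sketch of the MLflow deployment step referenced above. The dataset, model, parameters, and model name are illustrative placeholders, not Prodigy's actual pipeline.

    # Illustrative MLflow training run: log parameters and metrics, then register
    # the model so a downstream deployment job can load it by name and version.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Placeholder data and model standing in for a real training job.
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    with mlflow.start_run(run_name="example-training-run"):
        model = LogisticRegression(max_iter=1000).fit(X, y)
        mlflow.log_param("max_iter", 1000)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        # Registering the model gives deployment jobs a stable name and version.
        mlflow.sklearn.log_model(model, "model", registered_model_name="example-classifier")

    # A downstream job would then load a pinned version, e.g.:
    # loaded = mlflow.pyfunc.load_model("models:/example-classifier/1")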

Shopify Data Platform Engineer

 

Data Developer · Shopify

Member of the Starscream Runtime - Batch Transformations team, maintaining, scaling and upgrading Shopify’s data ETL platform.

  • Currently adding tools to assist in the migration of 3000+ batch transformation flows from Apache Oozie to Airflow (a minimal DAG sketch follows this section).
  • Led the migration of 3000+ batch transformation flows from unmanaged Hadoop YARN to Google's Dataproc, coordinating with data science teams and updating coordinator tools to ensure a smooth transition for our data scientists.
  • Helped convert the core of the platform to be compatible with both Python 2 and Python 3.
  • Mentoring, training and onboarding new teammates.
  • Expanding my understanding of data systems, reliability, and data warehousing/data lakes.
  • Technologies: Python, Spark, Hadoop, GCP, Dataproc, Docker, Oozie, Airflow, Terraform, Datadog, SQL, Kubernetes
  • Skills Developed:
    • Data pipeline scheduling and automation
    • Continuous deployment pipeline maintenance
    • Troubleshooting and mitigating incidents involving data corruption
    • Refactoring and updating legacy code
    • Test-driven software development
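
For the Oozie-to-Airflow migration above, a minimal sketch of the DAG shape a migrated batch transformation flow takes. The DAG id, schedule, and transformation logic are illustrative placeholders, not Shopify's Starscream code.

    # Illustrative Airflow DAG standing in for one migrated batch transformation
    # flow; an Oozie coordinator/workflow pair previously expressed the same thing.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_batch_transformation(ds, **_):
        # Placeholder transformation, e.g. submitting a Spark job on Dataproc
        # for the execution date `ds` that Airflow passes in.
        print(f"Running batch transformation for partition {ds}")

    with DAG(
        dag_id="example_batch_transformation",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",  # replaces the Oozie coordinator frequency
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="transform",
            python_callable=run_batch_transformation,
        )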

Lead Engineer

Developer · 
  
Technologies: Node.js, React.js, MongoDB, RESTful API
Major Challenges:
  • Designing a fast, clean, and responsive platform that is easy and enjoyable for users to work in
  • Adding a RESTful API for integration with the Zymewire platform
  • Scaling the app so that multiple users can work on the same task simultaneously
  • Auto-accepting submitted tasks when their content was similar enough to what we needed

Data Team Lead, Engineer, Developer

Over the four years, I wore many hats, but I'm most proud of establishing, growing and managing the data team responsible for building, maintaining and scaling the company’s data pipelines.
  • Major Challenges:
    • Seeking out new sources of data for finding and categorizing companies
    • Tracking 4000+ companies’ daily activity: scraping the latest news and seeking out their plans to start clinical trials, attend conferences, hire new talent, release results, apply for funding, etc.
    • Designing data models and pipeline processes for acquiring all of the data collected above
    • Automating those processes to reduce the need for human intervention and review
    • Creating machine learning models to assist in the categorization process (a sketch follows this section)
  • Technologies: Ruby, Ruby on Rails, React.js, JavaScript, MongoDB
  • Skills Developed:
    • Test-driven software development, with weekly code reviews
    • Extracting, transforming and loading (ETL) large datasets
    • Designing and automating the loading of new datasets
    • Defining metrics to help the team stay focused on projects aligned with the business
    • Experimenting with processes to help improve the team’s workflow
    • Growing a team
    • Managing team members’ personal growth
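
The categorization models above were built in our Ruby stack; purely as an illustration of the general idea (not the original code), a simple text-categorization model might look like this in Python:

    # Hypothetical text-categorization sketch: labels, documents, and features
    # are made up for illustration; predictions feed a human review queue.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data: snippets of scraped company news with category labels.
    documents = [
        "Phase II clinical trial initiated for lead oncology candidate",
        "Company to present preclinical results at an upcoming industry conference",
        "Series B financing round closed to fund pipeline expansion",
        "Hiring senior scientists for a new research site",
    ]
    labels = ["clinical_trial", "conference", "funding", "hiring"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(documents, labels)

    print(model.predict(["New funding announced to support upcoming clinical programs"]))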