
SAMPLE RESUME 5

I’m a data engineer with 8+ years of software development experience and strong fundamentals in clean code and software design principles. I design and architect data pipelines using state-of-the-art data engineering techniques and technologies such as Spark, Kafka, Apache Airflow, Databricks, and cloud platforms (AWS, Azure, Snowflake, GCP) to enable seamless data flow for teams. Along with this, I also have experience in data cleansing, building ML models, and realizing AI strategy from design to code using Pandas, NumPy, Flask, Scikit-learn, and NLP techniques. For visualization, I use Tableau, Power BI, Metabase, and any of the Python visualization libraries.
Apart from this, I was a Professor at Centennial College & the Montreal College of Information Technology, teaching Advanced Databases, Python, Tableau, and Software Development to students.

Work Experience

Senior Data Engineer
ZayZoon
August 2022 to Present
Tech Stack: Python, Spark, Redshift, Flask, Pandas, NumPy
Responsibilities:
- Setting up the infrastructure for the new workflow orchestrator – Prefect 2.0
- Maintaining the data warehouse to handle the loads and segregate data
- Creating data models with the right dimensions and facts by following sound design principles
- Creating batch ETL load pipelines to sync data from multiple vendors via APIs into AWS Redshift
- Improving the performance of queries that were heavy on the infrastructure
- Led the team and taught best practices for understanding DAGs in the Spark architecture, along with making changes to existing ETL batch pipelines
- Designed and built the MLOps architecture, after validating the idea with the VPs, to use existing infrastructure for deploying the models
- Improved the Risk performance model and fine-tuned it to predict with more than 75% accuracy
- Writing unit tests to ensure the security of the ETL pipelines and the APIs
- Mentoring junior developers on the team in best practices for writing code using Design Patterns and TDD (Test-Driven Development)
- Built dashboards in Metabase that serve insights to the RevOps and Sales teams about their current and future states of work
- Set up Confluence docs of Data Engineering best practices that act as foundational building blocks as I build out the team

Professor
Centennial College
January 2022 to April 2022
- Advanced Databases
- Software Development Project

Professor
Montreal College of Information Technology
November 2019 to December 2021
- Data Analysis with Tableau (Tableau Desktop)
- Python (Django & Flask)

Consultant Data Engineer
Next Pathway Inc
July 2020 to June 2021
Project: Data Integration and Reporting
Tech Stack: Python, Spark, Snowflake, Azure, GCP, Flask, Pandas
Projects Delivered, Clientele:
- Big Lots
- Premera (Snowflake client)
- Walgreens (Accenture client)
- Scotiabank
- Rogers (Slalom client)
Responsibilities:
- Develop ETL pipelines for the client to automatically migrate from the source data warehouse to Snowflake
- Writing Python scripts to verify the data models in the source & target code
- Assessing client needs to include Spark & Databricks pipelines in the data workflows
- Presenting the design and final code functionalities to the client after finishing the project
- Performing load tests & unit tests to identify bottlenecks and chart a plan to resolve them
- Built Data Pipelines in Databricks with Azure Container, ADF architecture
- Mentor junior developers on the team in best practices for writing code using Design Patterns and TDD (Test-Driven Development)
- Design and develop POCs for new client projects based on the requirements and present them prior to implementation
- Implement the approved ETL pipeline design using the tech stack within the given timelines
- Managed & tuned existing Airflow workflows in client environments
- Migrated the existing on-prem data pipeline to GCP using Cloud Dataflow to enable ETL for
streaming & batch processing

Data Developer
Montreal College of Information Technology
July 2020 to June 2021
Project: Data Integration and Reporting
Description: Data from different departments – Academics, Finance, and Marketing – needs to flow seamlessly between them, with respective dashboards, so that decisions can be taken effectively.
Tech Stack: Python (Flask, Scikit Learn, PySpark) | Jupyter | Hadoop, HDFS
Responsibilities:
- Created ETL pipeline to synchronize data flow across academic, marketing, finance departments
and stored it in a destination database
- Created RESTful service endpoints using Flask to make data accessible across departments
- Developed Tableau dashboards for the respective departments to learn more about their KPIs
- Built a Django site with access to internal course reviews and modelled the review data
- Set up Hadoop, HDFS, and Hive in-house for the institution to use for academic & administrative purposes
- Created workflows to synchronize jobs using Airflow to migrate data from various systems

Data Scientist
Bombardier Transport
January 2020 to March 2020
Project: Applying Predictive Analytics towards the maintenance of train fleets (Aventra | NAT)
Description: A huge fleet of trains generates sensor data across many cities. Predictive analytics solutions are delivered to the relevant teams to help them understand and gain insights from that valuable data, which is later used by engineering teams for maintenance.
Tech Stack: Python (Scikit Learn, Pyplot, PySpark) | Jupyter Notebook | Azure
Responsibilities:
- Automated pre-processing of data using pandas in Python to calculate thresholds for signals that are highly correlated
- Wrote scripts to extract data from large CSV files using PySpark
- Built Airflow jobs from scratch to migrate on-prem data to cloud
- Implementing best software engineering practices to clean the code to prep it for analysis
- Using the best graphical methods to analyze time-series data and identify patterns & correlations with Plotly & forecasting models
- Design an architecture for a pipeline to consume data from engineering teams and process it for
analysis

Academic Assistant – MCIT
March 2019 to January 2020
Project: Student CRM for the college including different data input points
Tech Stack: Python (Scikit Learn, Seaborn) | Jupyter | SQL | Apache Airflow
Responsibilities:
- Designed the flow of student data across departments – (Academics, Finance, IT)
- Wrote backend scripts in SQL to generate custom reports in the CRM
- Analyzed course reviews collected from students using Python stack
- Created Apache Airflow jobs

Sr. Software Engineer
CDK Global
October 2014 to September 2018
Projects: ‘New Lot Intelligence’ and ‘Used Lot Intelligence’
Description: Innovative solutions, in the form of both websites and apps, that help dealers price new and used cars separately. This enables easy inventory management.
Tech Stack: C#, SQL, AngularJS, Unit tests, HTML, Bootstrap, Python Scripting, Bamboo, Ansible
Responsibilities:
- Was instrumental in redesigning the web pages to improve code readability, reusability, and maintainability of the application by writing clean code following Test-Driven Development
- Created APIs using C# to provide endpoints for consumption in other projects while integrating with a third-party application in the company, with unit tests
- Optimized dashboard performance to below 4 seconds by tuning the API & SQL queries
- Reviewed huge sets of data in SQL to give insights about frequently used reports by writing efficient and optimized SQL queries

Scrum Master
CDK Global
November 2015 to December 2017
Responsibilities:
- Facilitated Scrum Collaborations and coached Teams in Project Management Best Practices
- Created Kanban boards for releases and bug fixes to ensure smooth releases after sprints

Education
Montreal College of Information Technology - Montréal, QC
March 2019 to March 2020

Graduate Certificate in Data Science
INSAID – International School of AI & Data Science
November 2018

B.Tech in Computer Science and Engineering
Jawaharlal University - Hyderabad, Telangana
June 2014

Skills
• Languages: Python, JavaScript
• Front End: Vue.js
• Databases: SQL, Hive, MongoDB, DynamoDB, CosmosDB, PostgreSQL
• Packages: PySpark, Kafka, Plotly, Flask, Scikit-learn, Pandas
• Streaming: Kafka, Spark
• Data Warehouse: Redshift, Snowflake
• Data Migration: AWS Data Pipeline, Google Cloud Dataflow
• Workflow Orchestration: Prefect, Apache Airflow
• Cloud: AWS, Azure, Snowflake, Databricks, GCP
• Machine Learning: Supervised, Unsupervised, NLP
