
Nagasri Harsh M
Sr. Data Engineer
http://linkedin.com/in/nagasrihk
nagasri619@gmail.com | +1 720-806-9236

Professional Summary:

I am an accomplished Data Engineer with over 10 years of IT experience, specializing in the design,
development, and implementation of large-scale data engineering projects. My professional expertise
encompasses:

Cloud Technologies & Big Data:


 Proficient in leveraging cloud platforms like AWS, Azure, and GCP for data processing, storage, and
analytics.
 Hands-on experience with Big Data analytics using Hadoop Ecosystem tools, Spark, Kafka, Flume, Avro,
and integration with databases like Cassandra and Solr.
Database Management:
 Expertise in SQL databases (e.g., MySQL, Oracle, MS SQL Server) and NoSQL databases (e.g., MongoDB,
Cassandra), and in multiple file formats including JSON, Avro, and Parquet.
 Experience in container orchestration using Docker and ECS, as well as database management in Azure
and AWS.
Data Architecture & Modeling:
 Skilled in creating Snowflake Schemas and managing Data Lakes.
 Knowledgeable in OLAP and Dimensional Data Modeling with the Ralph Kimball Methodology, including Star
Schema and Snowflake Modeling.
Data Pipeline & ETL Development:
 Extensive experience in developing and maintaining data pipelines using AWS services, Azure Data
Platform services, Informatica, and various ETL solutions.
 Expertise in migrating data between platforms, including on-premises ETLs to GCP and Azure
to GCP using native tools.
Software Development Life Cycle (SDLC):
 Comprehensive understanding of SDLC, including Database Architecture, Logical and Physical
modeling, test-driven development (TDD), and acceptance test-driven development (ATDD).
Analytics & Visualization Tools:
 Extensive experience in Text Analytics, data visualizations using R and Python, and dashboards using Power
BI, SSRS, and Tableau.
Deployment & Automation:
 Experience utilizing Kubernetes and Docker for CI/CD processes.
 Proficient in code management and facilitating code reviews using Git and Bitbucket.
Collaboration & Communication:
 Ability to collaborate with cross-functional teams, working with RESTful APIs, and leveraging web
services to integrate data from external sources.

I bring a comprehensive understanding of modern data engineering practices and technologies, combined with
a strong ability to translate complex data into valuable business insights. My hands-on approach ensures
alignment with organizational goals, enhances efficiency, and delivers high-impact solutions.

Technical Skills:

Cloud Computing Platforms AWS (Redshift, RDS, S3, EC2, Glue, Lambda, Step Functions, Cloud
Watch, SNS, DynamoDB, SQS, EMR), Azure (Data Lake, Data Factory,
Stream Analytics, SQL DW, HDInsight/Databricks), Google Cloud Platform
(Big Query, Cloud DataProc, Google Cloud Storage, Composer).

Big Data Ecosystem HDFS, MapReduce, YARN/MRv2, Pig, Hive, HBase, Sqoop, Kafka, Flume,
Oozie, Avro, Spark (Spark Core, Spark SQL, Spark MLlib, Spark GraphX,
Spark Streaming), Cassandra, Zookeeper.

Database Systems MongoDB, Cassandra, MySQL, Oracle, MS SQL Server, Azure SQL,
NoSQL databases.

ETL Tools Informatica, AWS Glue, Azure Data Factory.

Containerization & Orchestration Docker, Kubernetes, AWS ECS, AWS Lambda.

Programming Languages Python, R, Scala.

Database Query Languages SQL (MySQL, PostgreSQL, Redshift, SQL Server and Oracle dialects).

Data Warehousing Solutions Snowflake Schemas, Data Marts, OLAP, Dimensional Data Modelling
with Ralph Kimball Methodology (Star Schema Modelling, Snowflake
Modelling for Fact and Dimension Tables), Azure Analysis Services.

File Formats Delimited files, Avro, JSON, and Parquet.

Data Visualization Tools Microsoft Power BI, Tableau.

Data Analytics Text Analytics, R, Python, SPSS, Rattle.

Version Control Systems Git, Bitbucket.

Software Development Methodologies Test Driven Development (TDD), Behaviour Driven Development (BDD),
Acceptance Test Driven Development (ATDD).

Operating Systems Unix/Linux.

APIs RESTful APIs.

Continuous Integration/Continuous Deployment (CI/CD) Docker, Kubernetes.

Software Development Life Cycle (SDLC) Database Architecture, Logical and Physical modelling, Data
Warehouse/ETL development using MS SQL Server and Oracle, ETL
Solutions/Analytics Applications development.

Business Intelligence Solutions MS SQL Server Data tools, SQL Server Integration Services (SSIS),
Reporting Services (SSRS).

Data Migration Tools Sqoop, Azure Data Factory.

Other Tools/Technologies Spring Boot, Solr, AWS ALB, ECS, Informatica, MapR.

Education: Bachelor of Technology in Computer Science, Osmania University, 2013, India

Certifications:
 Microsoft Certified Azure Data Engineer
 AWS Certified Solutions Architect

Professional Experience:

Nova Signal, Los Angeles, CA Mar 2022 - Present


Sr. Data Engineer

Responsibilities:

Data Processing & Analytics:


 Utilized Python for various data manipulation and analysis tasks, employing libraries like Pandas,
NumPy, and SciPy to clean, transform, and visualize data.
 Developed and optimized machine learning models using Python's scikit-learn and TensorFlow to
provide insights and predictive analytics for business decision-making.
 Utilized Spark for algorithm optimization and performed advanced text analytics using Spark's in-
memory computing capabilities.
 Used Sqoop for data import/export between relational databases and Cassandra, and Golang for
application development.
Data Warehousing & ETL:
 Developed stored procedures in Snowflake; extracted and loaded data between AWS S3 and
Snowflake (an illustrative loading sketch follows this list).

 Developed and maintained complex SQL stored procedures to optimize data retrieval, reducing query
execution time by 30%.
 Designed and implemented ETL workflows using SSIS, facilitating the integration of data from multiple
sources into a centralized data warehouse.

 Utilized Talend for robust ETL pipeline design in complex data-intensive environments, and maintained
financial data ETL pipelines for budgeting and forecasting.
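
A minimal sketch of the kind of S3-to-Snowflake load described in the first bullet above, using the snowflake-connector-python package; the account, stage, and table names are hypothetical placeholders rather than details from this role.

```python
# Hypothetical sketch: run a COPY INTO from an external S3 stage into a Snowflake table.
# All connection parameters and object names are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder account identifier
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # COPY INTO pulls Parquet files from the external stage @campaign_stage into the target table.
    cur.execute("""
        COPY INTO STAGING.CAMPAIGN_EVENTS
        FROM @campaign_stage/events/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    cur.close()
    conn.close()
```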
Automation & Scripting:
 Created and maintained Python scripts for automation of routine tasks, data extraction, and
integration with various APIs and web services.
 Implemented Python-based solutions for real-time monitoring, alerting, and logging, enhancing system
performance and reliability.
AWS Cloud & Serverless Architecture:
 Designed CloudFormation templates for deploying web applications and databases, and optimized
AWS service performance.
 Implemented serverless architecture using API Gateway, Lambda, and DynamoDB; deployed AWS Lambda
code from Amazon S3 buckets (illustrative sketch below).
 Developed ETL process in AWS Glue to migrate campaign data into Redshift, automated dashboards
with Terraform.
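
A minimal sketch of the API Gateway -> Lambda -> DynamoDB pattern referenced above; the table name and payload fields are hypothetical.

```python
# Hypothetical Lambda handler behind an API Gateway proxy integration that persists a record
# to DynamoDB. The table and attribute names are illustrative placeholders.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("campaign_events")  # placeholder table name


def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string.
    body = json.loads(event.get("body") or "{}")
    item = {
        "event_id": body["event_id"],
        "payload": body.get("payload", {}),
    }
    table.put_item(Item=item)
    return {"statusCode": 200, "body": json.dumps({"stored": item["event_id"]})}
```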
Business Intelligence & Reporting:
 Utilized tools like Power BI for data visualization and reporting.
 Conducted data analysis and compliance reviews to secure customer sensitive data for risk, AML, and
marketing teams.
Big Data & Hadoop Ecosystem:
 Worked with Hortonworks Apache Falcon for data management and utilized AWS EMR for map-reduce
jobs.
 Developed Hive queries for structured and semi-structured data transformation and used the ELK stack
(Elasticsearch, Logstash, Kibana) for log management.
Other Responsibilities:
 Implemented data interfaces using REST API and processed data using MapReduce 2.0, stored in HDFS
(Hortonworks).
 Performed data extraction and aggregation within AWS Glue using PySpark and tested jobs locally
using Jenkins (illustrative sketch below).
 Played a key role in implementing and maintaining data pipelines using Actimize on Cloud, ensuring
efficient data ingestion, transformation, and loading (ETL) processes.
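
A minimal sketch of a Glue PySpark extraction-and-aggregation job of the kind referenced above; the catalog database, table, and S3 output path are hypothetical.

```python
# Hypothetical AWS Glue (PySpark) job: read a Glue Data Catalog table, aggregate it,
# and write the result to S3 as Parquet. Database, table, and path names are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Source table registered in the Glue Data Catalog (placeholder names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="marketing", table_name="campaign_events"
)
df = dyf.toDF()

# Simple daily aggregation per campaign.
daily = df.groupBy("campaign_id", "event_date").count()

daily.write.mode("overwrite").parquet("s3://example-bucket/curated/campaign_daily/")
```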
Environment: Python, PySpark, AWS CloudFormation, AWS Lambda, AWS Glue, AWS Redshift, Datadog,
Terraform, AWS API Gateway, DynamoDB, AWS S3, Tableau, Spark, Scala, Spark-SQL, Kafka, Snowflake,
Golang, MapReduce 2.0, HDFS (Hortonworks), ELK Stack (Elasticsearch, Logstash, Kibana), AWS EMR, Amazon
EC2, Hive, Talend, Linux, Jenkins, Git.

T-Mobile, Bellevue - WA Nov 2020 - Feb 2022


Sr. Data Engineer

Responsibilities:

Data Processing & Machine Learning:


 Implemented Spark Scripts using Scala and tested Apache Tez for building high-performance data
processing applications.
 Utilized Python in conjunction with libraries such as pandas and scikit-learn for advanced ETL
capabilities, data analytics, and machine learning techniques.
 Migrated an entire Oracle database to BigQuery and used Power BI for reporting.

Development & Programming:
 Utilized Golang to build RESTful APIs and developed MapReduce programs in Java for raw data parsing.
 Employed Python to automate routine tasks, extract data, and interact with various APIs and web
services.
 Utilized Git in conjunction with Docker and Kubernetes for version control, testing, and deployment of
the CI/CD pipeline.
Data Pipeline & ETL Management:
 Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines and designed
DAGs for ETL pipelines (an illustrative sketch follows this list).
 Utilized Azure Data Factory, T-SQL, Spark SQL, and U-SQL for data extraction, transformation, loading,
and integration across various Azure services.
 Used Python for designing and implementing configurable data delivery pipelines for scheduled
updates to customer-facing data stores.
 Led ETL processes using SSIS, extracting data from varied sources, transforming as per business logic,
and loading into data warehouses.
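
A minimal sketch of an Airflow DAG like those referenced above; the task names and extract/transform/load callables are hypothetical placeholders.

```python
# Hypothetical Airflow DAG sketching a daily extract -> transform -> load pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**_):
    ...  # pull data from the source system (placeholder)


def transform(**_):
    ...  # apply business rules (placeholder)


def load(**_):
    ...  # write to the warehouse (placeholder)


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```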
Cloud & Infrastructure Management:
 Designed and built Azure Cloud environment infrastructure, integrating Azure SQL Database, Azure
Analysis Services, and Azure Data Factory.
 Implemented a Continuous Delivery pipeline with Docker, GitHub and managed Azure Cloud relational
servers and databases.
Business Intelligence & Visualization:
 Managed the development of Power BI reports and dashboards, and developed Tableau reports
integrated with Hive for data-driven decision-making.
 Created Databricks job workflows to extract data using PySpark and Python, and worked with BigQuery
and Spark DataFrames.
Other Responsibilities:
 Worked on Confluence and Jira for project management, and used Jenkins for continuous integration.
 Utilized Linux for system administration and performed troubleshooting in a Linux environment.
 Designed and implemented Salesforce data models tailored to specific business requirements,
ensuring data accuracy and consistency.

Environment: Python, PySpark, Azure Data Factory, T-SQL, Spark SQL, Azure Data Lake, Azure Storage, Azure
SQL, Azure Databricks, Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, Trillium
Quality, Azure DevOps, PowerShell, MongoDB, MS SQL Server, Golang, RESTful APIs, Power BI, Confluence,
Jira, Flume, HBase, Pig Latin, HiveQL, Jenkins, Tableau, MapReduce, Apache Tez, Docker, GitHub, Databricks,
BigQuery, Git, Kubernetes.

Cygnus, Marshall – MN Mar 2018 - Sep 2020


Data Engineer

Responsibilities:

Data Transformation & Analysis:


 Utilized Python and SAS for ETL processes, generating reports, insights, and key conclusions.
Automation & Scripting:

 Developed automated scripts in Python for data cleaning, filtering, and analysis with tools such as SQL,
Hive, and Pig.
Hadoop Cluster Management:
 Managed Hadoop clusters, ranging from 4-8 nodes during pre-production to 24 nodes during
production, and transitioned Hadoop jobs to HBase.
API Development:
 Built APIs to allow customer service representatives access to data, and developed RESTful APIs using
Golang for data processing functionalities.
Data Warehousing:
 Improved Business Data Warehouse (BDW) performance, established self-service reporting in Cognos,
and developed database management systems for data access.

 Utilized views to simplify data access for reporting purposes, reducing the need for redundant query
creation.

Image Processing:
 Processed image data through Hadoop using MapReduce and stored the results in HDFS.
Data Visualization:
 Designed and documented dashboards with Tableau, including charts, summaries, graphs, and
geographical maps. Utilized Tableau's Show Me functionality for various visualizations.
Golang-based Pipelines:
 Developed and maintained data processing pipelines for handling large volumes of data, including data
ingestion, transformation, and loading.
Statistical Analysis & Data Processing:
 Performed analysis using Python, R, and Excel, including extensive work with Excel VBA Macros and
Microsoft Access Forms.
Azure Databricks & ETL Workflows:
 Designed data pipelines using Azure Databricks for real-time processing and developed ETL workflows
to transform and load data into target systems (illustrative sketch below).
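
A minimal sketch of the kind of real-time Databricks pipeline referenced above, using PySpark Structured Streaming with a Delta sink; the paths and schema are hypothetical.

```python
# Hypothetical Structured Streaming job on Databricks: ingest JSON files as they land
# and continuously append cleaned records to a Delta location. Paths and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("realtime_ingest").getOrCreate()

raw = (
    spark.readStream
    .format("json")
    .schema("device_id STRING, reading DOUBLE, event_time TIMESTAMP")
    .load("/mnt/landing/telemetry/")          # hypothetical landing path
)

clean = raw.filter(F.col("reading").isNotNull())

(
    clean.writeStream
    .format("delta")                          # Delta Lake sink, standard on Databricks
    .option("checkpointLocation", "/mnt/checkpoints/telemetry/")
    .outputMode("append")
    .start("/mnt/curated/telemetry/")
)
```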

Environment: Python, Hadoop, API Development, HBase, Cassandra, ORACLE, JSON, Azure SQL DW,
HDInsight/Databricks, Data Lakes, Stackdriver Monitoring, Jenkins, Hive, Java, MapReduce, HDFS, Talend,
Tableau, Waterfall Methodology, Git, Golang, RESTful API, R Programming, SQL, SAS, Azure Databricks, ETL
workflows.

AbbVie Inc, Hyderabad - India Sept 2015 - Jan 2018


Data Engineer

Responsibilities:

Data Modeling & Transformation:


 Developed logical data models from conceptual designs using Erwin and transformed legacy tables
into HDFS and HBase using Sqoop. Managed data mapping, transformation, cleansing, and
performance enhancement tasks such as defragmentation, partitioning, and indexing.

Python Development & Testing:
 Built data validation programs using Python and Apache Beam, executed in cloud Dataflow, and
integrated BigQuery tables. Utilized PyTest for unit and integration testing to ensure the proper
functioning of data pipelines and applications.
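
A minimal sketch of a Beam data-validation pipeline of the kind described above; the BigQuery table names, project, and validation rule are hypothetical.

```python
# Hypothetical Apache Beam pipeline: read rows from BigQuery, keep only valid records,
# and write them to a validated table. Project and table names are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def is_valid(row):
    # Example rule: require a non-empty record_id and a non-negative amount.
    return bool(row.get("record_id")) and row.get("amount", 0) >= 0


options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="example-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromBigQuery(table="example-project:staging.raw_records")
        | "Validate" >> beam.Filter(is_valid)
        | "Write" >> beam.io.WriteToBigQuery(
            "example-project:staging.validated_records",
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```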
SQL Operations & Optimization:
 Utilized SQL across various dialects (PostgreSQL, Redshift, SQL Server, and Oracle) for advanced data
manipulation, reporting, and performance optimization. Successfully migrated data between RDBMS,
NoSQL databases, and HDFS using Sqoop.
Big Data Analytics & Data Science:
 Applied Big Data analytics using Azure Databricks, Hive, Hadoop, Python, PySpark, Spark SQL, and
MapReduce on petabytes of data. Implemented advanced data analysis techniques, including
regressions and data cleaning, and used tools such as Excel VLOOKUP, histograms, and the TOAD client
to provide insights for investors.
Hadoop Ecosystem Design & Development:
 Leveraged the Hadoop ecosystem, utilizing technologies such as MapReduce, Spark, Hive, Pig, Sqoop,
HBase, Oozie, and Impala. Designed and implemented Oozie pipelines to perform tasks like data
extraction from Teradata and SQL, loading into Hive, and executing business-required aggregations.
Optimization & Parallel Processing:
 Developed on Apache Hadoop, CDH, and MapR distributions, optimizing data latency by leveraging parallel
processing wherever possible.
Data Visualization & Machine Learning:
 Integrated Power BI with Python to enhance visualization and employed Python for implementing
machine learning algorithms on different data formats like JSON and XML.
ETL & Data Processing Automation:
 Created automated ETL processes, including Spark jars for business analytics, and developed JSON
scripts for SQL Activity-based data processing. Converted Pig scripts into JAR files and parameterized
them within Oozie for HDFS data handling.
Version Control & Development Workflow:
 Utilized Git for version control, including pulling, adding, committing, and pushing code, paired with
screwdriver.yaml for build and release management. Employed Git tagging for efficient and traceable
release management.
Cloud & Distributed Computing:
 Worked on cloud-based data processing and deployed outcomes using Spark and Scala code in the
Hadoop cluster.

Unify Technologies, Hyderabad, INDIA June 2013 - Aug 2015


Python Developer

Responsibilities:

● Built and maintained server-side logic for web applications, often using frameworks like Django and Flask.

● Worked with a team of developers to build data-driven applications to provide analytical insights and decision
support tools for executives. Used Python libraries like pandas, NumPy and SciPy.

● Developed advanced data access routines using Python and libraries such as SQLAlchemy to extract data from
source systems, replacing tasks previously done using VBA, SQL Server SSIS, SAS, and SQL (illustrative sketch after this list).
● Utilized Python libraries like Dash and Plotly, in conjunction with Tableau and R, to develop data visualizations
and dashboards for large datasets.
● Identified and implemented process improvements using Python to automate repetitive tasks and improve
workflow efficiencies.
● Developed and executed sophisticated data integration strategies using Python scripts, harnessing the power
of industry-leading libraries such as Apache Beam and pandas.
● Wrote and executed tests for the code developed, ensuring that it functions as expected and is robust against
possible edge cases. This involved unit and integration tests using Python's testing libraries such as unittest and
pytest.
● Used version control systems such as Git to manage code, track changes and collaborate with other
developers.
● Wrote clean, maintainable, and efficient Python code for applications, and debugged code to identify
and fix issues as they arose.
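
A minimal sketch of the kind of SQLAlchemy-based data access routine mentioned above (the one replacing VBA/SSIS/SAS pulls); the connection URL, table, and query are hypothetical.

```python
# Hypothetical data-access routine: pull a result set from SQL Server via SQLAlchemy
# into a pandas DataFrame. The connection URL and query are placeholders.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine(
    "mssql+pyodbc://etl_user:***@reporting-db/sales?driver=ODBC+Driver+17+for+SQL+Server"
)


def fetch_monthly_sales(month: str) -> pd.DataFrame:
    # Parameterized query keeps the routine safe against injection and easy to reuse.
    query = text(
        "SELECT region, SUM(amount) AS total FROM sales WHERE month = :m GROUP BY region"
    )
    with engine.connect() as conn:
        return pd.read_sql(query, conn, params={"m": month})


df = fetch_monthly_sales("2015-03")
```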

Environment: Python, Django, Flask, pandas, NumPy, SciPy, Tableau, pytest, Git.
