
Anusha M.

Irving, Texas, USA - 75038 | Mobile: +1 (312) 248-1255 | Email: anushamungi027@gmail.com | LinkedIn

SUMMARY
• 3+ years of experience as a Data Engineer with a background in data engineering, data analysis, ETL, data warehousing, and business intelligence, spanning the design, development, analysis, implementation, and post-implementation support of DW/BI applications, with a proven track record of delivering robust and scalable data solutions.
• Expertise in ETL processes, data modeling, and database management, ensuring optimal data flow, organization, and retrieval.
• Proficient in utilizing cloud platforms, including AWS and Azure, to architect and implement end-to-end data pipelines for diverse business needs.
• Skilled in leveraging big data technologies such as Apache Spark and Hadoop to process and analyze large datasets efficiently.
• Experienced in implementing data quality and governance measures, ensuring data accuracy, compliance, and security.
• Demonstrated ability to collaborate with cross-functional teams to understand business requirements and translate them into effective data solutions.
• Strong background in optimizing database performance, conducting query tuning, and implementing best practices for efficient data storage and
retrieval.
• Engineered data pipelines in Azure Data Factory (ADF), orchestrating ETL processes from diverse sources to Azure SQL, Blob storage, and Azure
SQL Data Warehouse.
• Hands-on experience with SQL, NoSQL databases, and data warehouse solutions, contributing to seamless integration and accessibility of data.
• Proven expertise in version control systems (e.g., Git) and agile methodologies, ensuring a collaborative and efficient development process.
• Passionate about staying current with emerging data technologies and trends, continuously seeking opportunities to enhance data engineering
capabilities.
• Worked on architecting, designing, and implementing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
• Efficient in data preprocessing, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (Scikit-Learn, Pandas, NumPy).
• Well-versed in analyzing data using Python, R, and SQL.
• Experience in building reliable ETL processes and data pipelines for batch and real-time streaming using SQL, Python, Databricks, Spark, Spark Streaming, Sqoop, Hive, AWS, Azure, NiFi, Oozie, and Kafka.
• Responsible for designing and building new data models and schemas using Python and SQL.
• Involved in developing Python scripts and using SSIS, Informatica, and other ETL tools for the extraction, transformation, and loading of data into the data warehouse.
• Built Tableau dashboards to measure the effectiveness of weekly campaigns and customer acquisition.
• Utilized AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as S3 and DynamoDB. Built data warehousing systems utilizing Amazon S3, DynamoDB, EC2, and Snowflake to support business intelligence objectives.
• Implemented best practices to create cloud functions, applications, and databases.
• Responsible for loading tables from Azure Data Lake to Azure Blob Storage and pushing them to Snowflake.
• Worked in all stages of Software Development Life Cycle (SDLC).

EXPERIENCE
PNC Financial – Data Engineer | USA | Jul 2023 - Present
• Worked on data integration projects using ETL tools like SSIS, Informatica, and Talend Studio to extract data from various sources such as Oracle, MySQL, and SQL Server, and load it into the Snowflake cloud data warehouse.
• Assemble large, complex data sets that meet functional / non-functional business requirements.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing
infrastructure for greater scalability, etc.
• Worked on the development and maintenance of data lakes, utilizing the Hadoop Distributed File System (HDFS) for scalable storage and retrieval of structured and unstructured data.
• Collaborated with data engineering teams to implement and enhance Hadoop-based data solutions, leveraging technologies such as
MapReduce and Apache Spark.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL.
• Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key
business performance metrics.
• Collaborated with vendor partners to stay informed about the latest advancements in Azure and cloud technologies.
• Collaborated with data engineering teams to implement Azure-based data solutions, leveraging services such as Azure Data Factory and Azure
Databricks.
• Implemented data lakes on Azure Storage, enabling efficient storage and retrieval of structured and unstructured data.
• Conducted performance tuning for Azure SQL Data Warehouse, optimizing query performance for analytical workloads.
• Implemented Azure Active Directory (AAD) for identity and access management, strengthening security measures and achieving compliance
with industry standards.
• Orchestrated the adoption of Azure DevOps for continuous integration and continuous deployment (CI/CD) pipelines, resulting in a 30%
reduction in deployment time.
• Spearheaded the migration of on-premises applications to Microsoft Azure, achieving a 20% reduction in operational costs and improving
overall scalability.
• Set up a continuous integration (CI) process: created a Git repository, configured a Jenkins server with the Git plugin, created a new Jenkins job and configured it to build the project from the Git repository, configured build settings, ran a test build, and published successful builds to a remote Git repository or deployed them to the production environment.
• Implemented a new ETL solution based on business requirements.
• Designed and implemented strategies to build new data quality frameworks to replace legacy systems.
• Worked in an Agile environment, using the Rally tool to maintain user stories and tasks. Participated in a collaborative team designing and developing a Snowflake data warehouse.
• Created data visualization dashboards in Tableau to communicate findings to stakeholders.
• Implemented security best practices and compliance measures in accordance with AWS Well-Architected Framework.
• Implemented Infrastructure as Code using AWS CloudFormation, automating the provisioning and management of AWS resources.
• Designed and implemented serverless applications, leveraging AWS Lambda, API Gateway, and DynamoDB.
• Spearheaded the adoption of serverless architecture using AWS Lambda, reducing development time and infrastructure costs by 25%.
• Conducted training sessions for development teams to promote serverless best practices and coding standards.

CVS Health – Data Engineer | USA | Aug 2022 – Jun 2023


• Led the end-to-end migration of on-premises infrastructure to AWS, resulting in a 40% reduction in operational costs and improved scalability.
• Implemented AWS services such as EC2, S3, and RDS to replicate and enhance on-premises workloads in the cloud.
• Utilized AWS Migration Hub for tracking and monitoring the progress of the migration. Analyzed the financial statements of companies to predict future sales, profit/loss, and share values.
• Updated key financial highlights in customized financial decks for each sector, including trading and transaction comps and relative valuation.
• Implemented CI/CD pipelines using AWS CodePipeline for automated deployment of serverless applications.
• Performed data cleaning, de-duplication, validation, and consolidation by reviewing data reports and indicators, adding/editing data per process guidelines.
• Conducted research to gather critical business information, creatively applying sourcing techniques to internal and external databases such as contributor reports, annual reports, and SEC/SEDAR filings, and conducting qualitative analysis.
• Designed Power BI dashboard reports aligned with business requirements, enhancing data visualization. Optimized data collection, analysis, and
visualization methods for improved efficiency.
• Developed Python scripts to extract, load, and transform data. Used the Waterfall methodology to structure the phases of the Software Development Life Cycle. Involved in requirements gathering, analysis, design, development, testing, and production of an application using the Agile model.
• Designed and deployed dashboards with Tableau and provided complex reports, including charts, summaries, and graphs, to interpret findings for the team and stakeholders.
• Led initiatives for reporting, web review, benchmarking, and content development using Google Analytics.
• Extracted data from flat files, MS Excel, and MS Access, transformed it according to user requirements, and loaded it into the target by scheduling sessions.
• Using Python, implemented multiple machine learning techniques such as Decision Tree, Naive Bayes, Logistic Regression, and Linear Regression to evaluate the accuracy of each model.
• Stayed up to date with the latest cloud technologies, data analytics tools, and industry trends, and recommended innovative solutions to enhance
data analytics capabilities.
• Designed, developed, and maintained scalable and efficient data pipelines, reducing data processing time by 30%.

Fusion Software – Data Engineer | India | Jan 2020 – Jul 2021


• Implemented ETL processes using tools such as Apache NiFi, Talend, and custom Python scripts to ingest, transform, and load data from
various sources.
• Orchestrated and managed data workflows using Apache Airflow, ensuring timely and reliable execution of data processing tasks.
• Built and maintained data warehouse solutions using technologies like Snowflake, optimizing data storage and retrieval for analytical purposes.
• Collaborated with data scientists and analysts to understand data requirements and provide optimal data solutions that meet business objectives.
• Conducted data modeling and schema design to ensure effective organization and storage of data in databases.
• Implemented data quality checks and validation processes to ensure the accuracy and consistency of data across multiple systems.
• Worked closely with stakeholders to define data governance policies and ensure compliance with privacy regulations.
• Conducted performance tuning and optimization of database queries, enhancing overall system performance.
• Collaborated with cross-functional teams to integrate new data sources into existing data pipelines, ensuring seamless data flow.
• Maintained documentation for data pipelines, data models, and data governance processes, facilitating knowledge transfer and onboarding.
• Evaluated and implemented new technologies and tools to enhance data engineering capabilities, staying abreast of industry best practices.
• Performed data wrangling through data gathering and querying over SQL, and implemented data cleaning processes with R and Python.
• Worked with a diverse set of clients worldwide, identifying and analyzing Dynamic Values, and managing development with Agile methodology. Worked in a Unix environment with Oracle SQL Developer, Microsoft SQL Developer, Oracle Warehouse, and PyCharm.
• Presented dashboards to management for deeper insights using Microsoft Power BI. Queried data from SQL Server, imported data in other formats, and performed data checking, cleansing, manipulation, and reporting using SQL.
• Migrated data from on-premises databases (SQL Server) to cloud databases/Snowflake.
• Developed and maintained data models to support business intelligence and machine learning projects, and optimized Spark tasks to run on the
Kubernetes Cluster for quicker data processing.
• Designed workflows, including setting up DEV, QA, and PROD environments, creating users, and managing their permissions.
• Created and maintained data workflows using Apache Airflow to schedule and monitor ETL jobs, ensuring data quality and accuracy.
• Worked on architecting, designing, and implementing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
• Collaborated with Data Science to gather requirements, analyze data, and design, implement, and test applications utilizing the Agile paradigm.
• Designed and deployed scalable data pipelines with complex data processing using Apache Airflow to handle massive datasets from diverse sources such as SQL databases, NoSQL databases, OneLake, and APIs.
• Built real-time streaming applications with Apache Kafka and Apache Spark to enable real-time data analytics.

EDUCATION
Master's in Computer Information Systems | University of Central Missouri, Lee’s Summit, MO, USA
(Aug 2021 – Dec 2022)
Bachelor's in Computer Science | Usha Rama College of Engineering and Technology, Andhra Pradesh, India
(Jul 2016 – Sep 2020)

SKILLS
Methodologies: SDLC, Agile, Scrum
Programming Languages and Platforms: Python, Java, C, Scala, SQL, Unix Shell Script
Big Data: Hadoop, HDFS, Sqoop, Hive, HBase, Spark, Kafka, Impala, NiFi, Cassandra, Apache Airflow, Databricks
ETL/ELT Tools: SSRS, SSIS, SSAS, Informatica, Azure Data Factory, DBT
Databases: MySQL, SQL Server, DB2, PostgreSQL, Oracle, MongoDB, Snowflake (cloud), NoSQL
Tools/IDE/Build Tools: Power BI, Tableau, Talend Studio, Git, Git Bash, Eclipse, IntelliJ, Maven, Jenkins, GitHub, Jira, Snowflake, Bitbucket, Data Pipelines, QlikView
Cloud Computing: AWS (S3, EC2, EMR, Lambda, DynamoDB, Redshift, Athena, CloudWatch, IAM, Secrets Manager, SNS & SQS), Azure (Data Factory, Blob Storage, Databricks)
Data Analytics Skills: Data Cleaning, Data Masking, Data Manipulation, Data Visualization
BI & CRM Tools: Tableau, Microsoft Power BI, Sigma Computing
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow
File Formats: Parquet, Avro, ORC, JSON
Operating Systems: Windows, Linux, Unix, macOS
