
ARAVIND REDDY PATHOORI

aravindreddy0703@gmail.com
913-735-0692
PROFESSIONAL SUMMARY
 Senior data engineer with 9+ years of diversified IT experience solving business use cases for several clients. Expert in database development, ETL development, data modeling, big data technologies, and software development with Agile methodology, including business analysis and modeling, user interaction, planning and testing, migration, and documentation.
 Working experience with the Hadoop framework and its ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Zookeeper, GitHub, Pig, Impala, Hive, HBase, Sqoop, and Oozie.
 Experience with configuration and development on multiple Hadoop distribution platforms like Cloudera and Hortonworks
(on-premise).
 Good understanding of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node,
Data Node, and MapReduce concepts
 Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS. Good knowledge of Hadoop cluster architecture and cluster monitoring.
 Experience using partitioning and bucketing optimization techniques in Hive; designed both managed and external tables in Hive to improve performance.
 Procedural knowledge of cleansing and analyzing data using Hive and the Hadoop platform, as well as relational databases such as Oracle, SQL Server, and Teradata.
 Experience designing and building data lake solutions based on organizational needs and capabilities.
 Good understanding of installing and maintaining Linux servers.
 Worked with HL7 and FHIR standards to facilitate interoperability between various health information systems, covering data management, data masking, data governance, and data quality.
 Experience in developing data warehouse applications using Hadoop, Oracle, Teradata, and MS SQL Server on UNIX and Windows platforms; experienced in creating complex mappings using various transformations and developing ETL strategies using SSIS.
 Experience with cloud-based services such as AWS EMR, EC2, S3, Athena, EKS, RDS, SNS, SQS, CloudWatch, AWS Lambda, DynamoDB, and Redshift to work in distributed data models that provide fast and efficient processing of big data.
 Experience in configuring and maintaining long-running Amazon clusters manually as well as through CloudFormation scripts on AWS.
 Worked with various transformations such as Normalizer, Expression, Rank, Filter, Group, Aggregator, Lookup, Joiner, Sequence Generator, Sorter, SQLT, Stored Procedure, Update Strategy, Source Qualifier, Transaction Control, Union, and CDC.
 Proficient in Python programming language, with a strong focus on data manipulation, analysis, and machine learning
modeling using libraries such as NumPy, Pandas, and Scikit-learn.
 Extensive experience in PySpark, a powerful open-source framework for big data processing and analytics. Developed and
optimized PySpark applications to process large-scale datasets efficiently, leveraging distributed computing capabilities .
 Proficient in Hive Query Language and experienced in Hive performance optimization using static and dynamic partitioning, bucketing, and parallel execution (a sketch follows this summary).
 Developed simple, generic custom Hive UDFs and built multiple Hive views for accessing underlying table data.
 Proficient in importing and exporting data using SQOOP from HDFS to RDBMS and vice versa.
 Integrated Kafka with Spark streaming for real-time data processing.
 Knowledge of big data workflow orchestration tools like Apache Oozie and Airflow
 Experience in extracting, transforming, and loading (ETL) data from various sources into data warehouses using AWS Glue, as well as data processing such as collecting and moving data from various sources using Apache Kafka.
 Using SSIS and SSRS for data extraction, transformation, loading, and reporting in the MS SQL Server
 Experience in dimensional data modeling techniques like Star Schema, Snowflake Schema, Fact Tables, Dimension Tables, Transactional Modeling, and Slowly Changing Dimensions (SCD).
 Hands-on experience with the Snowflake cloud data warehouse for integrating data from multiple source systems, including
loading nested JSON-formatted data into the Snowflake table.
 Handling NoSQL databases, including MongoDB, Cassandra, and HBase.
 Experience in using different file formats like RCFile, Avro, ORC, and Parquet.
 Working experience with programming languages such as SQL, R, Scala, and Python.
 Skilled data engineer with experience in designing, developing, and managing data solutions in Azure.
 Worked on Microsoft Azure services like HDInsight clusters, Blob Storage, ADLS, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
 Expertise in Azure services such as Azure Data Factory, Azure Databricks, Azure Data Lakes, Azure Blob Storage, Azure
Synapse Analytics, Azure Cosmos DB, Azure HD Insight, Azure Event Hubs, Azure Virtual Machines, Azure Resource
Manager, Azure Functions, Azure SQL Database, Azure Queue Storage, Azure Stream Analytics, and Azure Analysis
Services
 Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow
 Experience writing Spark streaming and Spark batch jobs, using Spark MLlib for analytics.
 Experienced in Normalization (1NF, 2NF,3NF and BCNF) and De-normalization techniques for effective and optimum
performance in OLTP and OLAP environments.
 Capable of processing large sets of structured, semi-structured, and unstructured data and supporting system application
architecture.
 Worked with different databases (Oracle, SQL Server, Teradata, Cassandra) and SQL programming.
 Proficient in UNIX/Linux commands, shell scripting, and application deployment.
 Strong background in mathematics with excellent analytical skills.
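
A minimal PySpark sketch of the Hive partitioning and bucketing optimization pattern mentioned above; the database, table, column, and path names are hypothetical, not taken from any specific engagement:

from pyspark.sql import SparkSession

# Hive-enabled session; warehouse location comes from the cluster configuration.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic partitioning so partition values are taken from the data itself.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

orders = spark.read.parquet("/data/raw/orders")  # hypothetical source path

# Managed table partitioned by load_date and bucketed by customer_id, so
# downstream queries benefit from partition pruning and bucketed joins.
(orders.write
    .partitionBy("load_date")
    .bucketBy(16, "customer_id")
    .sortBy("customer_id")
    .mode("overwrite")
    .saveAsTable("analytics.orders_partitioned"))

# A filter on the partition column scans only the matching partition.
spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM analytics.orders_partitioned
    WHERE load_date = '2021-01-01'
    GROUP BY customer_id
""").show()

Bucketing by the join key keeps rows with the same customer_id in the same bucket files, which lets the engine avoid a full shuffle on bucketed joins.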

TECHNICAL SKILLS
Hadoop/Big Data Tools: Hadoop, Hive, HDFS, MapReduce, YARN, Pig, Flume, Spark, Kafka, Sqoop, Oozie, Zookeeper, NiFi, Airflow
Programming Languages: Python, SQL, Scala, PySpark, Shell Scripting
DWH Schema: Dimensional Modeling, Star Schema, Snowflake Schema
Databases: Oracle, MySQL, SQL Server 2005, HBase, Cassandra, Netezza
Cloud Environments: AWS (EC2, EMR, S3, Kinesis, DynamoDB, Redshift, AWS Lambda, RDS, SQS, QuickSight, SNS, EKS); Azure (Azure Data Factory, Databricks, Data Lakes, Azure Synapse, Azure Blob Storage, Azure Functions, Azure Queue Storage, Azure Event Hubs, Azure HDInsight, Azure Virtual Machines, Azure Resource Manager, Azure SQL DB)
Data Files: JSON, CSV, Parquet, Avro, TextFile, ORC
Methodologies: Agile/Scrum
Operating System: macOS, Windows, Unix

PROFESSIONAL EXPERIENCE

Client: WellCare, Tampa, FL. January 2022- Present


Role: Senior AWS Data Engineer
Responsibilities:
As a Senior AWS Data Engineer at WellCare, designed, built, and improved secure data solutions within the organization, enabling data-driven healthcare while maintaining patient confidentiality and data integrity. Advanced patient care and data-driven decision-making at WellCare through secure and efficient data management within the AWS ecosystem.

 Constructed robust data warehouse solutions to store and organize extensive healthcare data for analysis and reporting.
 Created intricate ETL jobs with AWS Glue, converting data from diverse sources into a unified data model using Python and
SQL.
 Formulated ETL processes to intake, transform, and load substantial data volumes from various sources into Delta Lake.
 Designed and executed data solutions on AWS, leveraging services like AWS Lambda and Amazon Redshift.
 Transitioned an on-premises application to AWS, employing services like EC2 and S3 for data processing and storage, and
maintained a Hadoop cluster on AWS EMR.
 Proficient in data wrangling techniques, encompassing data cleansing, transformation, and aggregation, using PySpark’s Data
Frame API for efficient data processing and analysis.
 Devised Spark streaming job to consume data from Kafka topic of different source systems and push the data into HDFS.
 Expertise in creating DataStage parallel and sequence jobs, following established standards.
 Ran MapReduce jobs to enhance data quality and accuracy.
 Formulated Kafka producer clients using Confluent and generated events into Kafka topics.
 Extracted data from Cassandra using Sqoop, placed it in HDFS, and processed it with Hive.
 Composed SQL scripts for data migration and handled data discrepancies, including migration from Teradata SQL to
Snowflake.
 Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark
 Utilized Spark SQL for loading Parquet data, creating Schema RDD, and handling structured data.
 Worked within the Hadoop ecosystem, implemented Spark with Scala, and utilized DataFrames and Spark SQL for faster data processing.
 Devised and maintained data orchestration workflows with AWS Step Functions for ETL tasks, and monitored data pipelines'
performance using AWS CloudWatch.
 Subscribed to Kafka topics with Kafka consumer clients, processing real-time events using Spark.
 Formulated Scala scripts and Hive UDFs; implemented RDDs in Spark for data aggregation and storage in S3.
 Demonstrated expertise in creating, debugging, scheduling, and monitoring jobs using Airflow.
 Proficiency in working with distributed computing frameworks like Hadoop and Apache Spark, enabling scalable and
parallelized data processing for large-scale machine learning projects.
 Implemented CI/CD pipelines using Jenkins and Kubernetes for automated deployment and updates of data pipelines.
 Implemented security measures to protect sensitive clinical data sets in accordance with HIPAA regulations.
 Orchestrated data ingestion processes from various sources into Kubernetes-managed containers, ensuring seamless data flow
and maintaining high data integrity.
 Migrated data from an AWS S3 bucket to Snowflake by writing a custom read/write Snowflake utility function (a sketch follows this section).
 Ensured compliance with healthcare data standards, including FHIR and HL7, in data processing and storage.
 Utilized Apache Spark with Python for big data analytics and ML applications, including Spark ML and MLlib.
 Expertise in writing complex SQL queries and performing data transformations using SQL functions, joins, subqueries, and
aggregations within Snowflake.
 Implemented security measures and ensured compliance with data protection standards in both SNS and EKS configurations.
 Strong understanding of data modeling concepts and database design principles, including star schema and snowflake
schema, to optimize data structures for efficient data loading and querying in Snowflake.
 Proficient in performance tuning and optimization techniques in Snowflake, including query optimization, partitioning,
clustering, and using Snowflake's query and resource optimization features to improve data processing efficiency.
 Accomplished data warehouse and integration solutions with a strong focus on healthcare data compliance, including HIPAA
regulations.
 Built data warehouse structures and created facts, dimensions, and aggregated tables using dimensional modeling with star and snowflake schemas.
 Implemented performance optimization techniques in PySpark jobs to enhance data processing speed and reduce resource
consumption, resulting in improved pipeline efficiency.
 Ensured adherence to FHIR/HL7 standards in data integration processes, facilitating seamless interoperability across healthcare systems.
 Created data models and schema designs to support the storing and retrieving of clinical information.
 Provided technical guidance and training to junior data engineers on Snowflake best practices, SQL optimization techniques,
and data modeling principles.
 Managed schema evolution in PySpark applications to accommodate changes in data structure and ensure seamless
integration with evolving business needs.
 Successfully handled large-scale data ingestion into Snowflake, optimizing file sizes and formats for efficient loading.
 Implemented ELT processes in Snowflake, including loading raw data and transforming it using Snowflake's SQL capabilities and virtual warehouses.
 Adhered to healthcare regulations such as HIPAA by configuring EKS with appropriate security controls and auditing
capabilities and setting up SNS notifications for monitoring and alerting on data quality issues.
 Conducted data analysis and provided insights to support business decision-making, using tools like AWS QuickSight, Tableau, or custom data visualization solutions.
Environment: Hadoop, Hive, MapReduce, Sqoop, Apache Kafka, Spark ML and MLlib, FHIR/HL7 standards, AWS Glue, AWS Step Functions, AWS CloudWatch, AWS Lambda, Amazon Redshift, EMR, EC2, S3, SNS, EKS, AWS QuickSight, DataStage, Jenkins, Kubernetes, Snowflake, PySpark, Spark RDDs, Spark SQL, Scala 2.1.1, Python, SQL, Delta Lake, Airflow, Tableau.
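
A minimal PySpark sketch of the S3-to-Snowflake utility pattern referenced above, assuming the Spark Snowflake connector (net.snowflake.spark.snowflake) is available on the EMR cluster; the connection options, bucket, and table names are illustrative placeholders rather than actual project values:

from pyspark.sql import SparkSession

SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

# Illustrative connection options; real values would come from a secrets manager.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

spark = SparkSession.builder.appName("s3-to-snowflake").getOrCreate()

def s3_to_snowflake(s3_path: str, target_table: str) -> None:
    """Read Parquet files from S3 and append them to a Snowflake table."""
    df = spark.read.parquet(s3_path)
    (df.write
       .format(SNOWFLAKE_SOURCE)
       .options(**sf_options)
       .option("dbtable", target_table)
       .mode("append")
       .save())

def snowflake_to_df(source_table: str):
    """Read a Snowflake table back into a Spark DataFrame."""
    return (spark.read
                 .format(SNOWFLAKE_SOURCE)
                 .options(**sf_options)
                 .option("dbtable", source_table)
                 .load())

# Example usage with a hypothetical bucket and staging table.
s3_to_snowflake("s3://example-claims-bucket/curated/claims/", "CLAIMS_STAGE")

In practice the credentials would be pulled from AWS Secrets Manager or an IAM-integrated mechanism rather than being hard-coded.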

Client: Ford Motor Company, Michigan December 2019 to December 2021


Role: Azure Data Engineer
Responsibilities:
 Designed and implemented ETL processes using Azure Data Factory to load data from various sources, ensuring data quality
and reliability.
 Developed Python scripts for data extraction, transformation, and loading into Azure SQL Database, automating data
pipelines and improving efficiency.
 Utilized Azure Blob Storage for storing large datasets, implementing data partitioning and clustering for optimal
performance.
 Designed and implemented data models in Azure SQL Database, ensuring efficient querying and high performance for
complex analytic workloads.
 Collaborated with cross-functional teams to integrate Apache NiFi with existing data infrastructure, including Hadoop/HDFS
on Azure HDInsight.
 Implemented best practices for Data Warehousing, including OLTP, OLAP, Dimensions, Facts, and Data modeling
techniques.
 Conducted performance tuning and optimization of SQL queries against Azure SQL Database, improving query efficiency
and reducing processing times.
 Designed and implemented end-to-end data solutions on Azure, leveraging services such as Azure Functions and Azure SQL
Database.
 Migrated an existing on-premises application to Azure and used its services like Azure Virtual Machines and Azure Blob
Storage for data processing and storage.
 Implemented Azure Event Hubs for real-time data streaming and processing, enabling timely analysis of streaming data for
critical business decisions.
 Created real-time data streaming solutions using Apache Kafka and Azure Stream Analytics, storing the data in Azure Blob
Storage and Azure SQL Database.
 Implemented partitioning and clustering techniques in Azure SQL Database to improve performance and involved in
choosing different file formats like Parquet and ORC.
 Developed complex SQL queries, stored procedures, and data transformation scripts, enhancing data quality and enabling
accurate business reporting.
 Identified and validated data between source and target applications, ensuring data consistency and accuracy.
 Implemented data cataloging, classification, and lineage using Azure Purview, developing data governance policies, and
ensuring compliance.
 Implemented monitoring and logging strategies in CI/CD pipelines to quickly identify and resolve issues in the deployment
process.
 Designed data lake architecture on Azure Blob Storage, utilizing Azure Resource Manager (ARM) templates to define storage
containers, lifecycle policies, and data partitioning for efficient data organization and retrieval.
 Developed Spark SQL scripts using PySpark to perform transformations and actions on DataFrames and Datasets for faster data processing on Azure HDInsight (a sketch follows this section).
 Implemented scalable and fault-tolerant data processing solutions leveraging Apache NiFi's clustering capabilities, enabling
seamless handling of high-volume data streams while ensuring high availability and data integrity using Azure Stream
Analytics.
 Designed and optimized database schemas and data models to support efficient data retrieval and analysis, leveraging SQL
optimization techniques and indexing strategies on Azure SQL Database and Azure SQL Data Warehouse.
 Collaborated with stakeholders to define data requirements and develop data models for efficient storage and retrieval.
 Monitored and optimized the performance of data infrastructure, troubleshot issues, and proposed enhancements for maximum efficiency and reliability.
 Built a recommendation engine for personalized product recommendations based on purchase history on Azure.
 Collaborated with cross-functional and DevOps teams to ensure continuous integration and continuous deployment of data pipelines on Azure.
 Created data visualizations using Power BI to support data analysis and decision-making on Azure.
Environment: Hadoop, Hive, HDFS, HBase, MapReduce, Spark, Kafka, PySpark, T-SQL, Apache Airflow, Scala, Azure Data Factory, Azure Data Lakes, Azure Databricks, Azure Synapse, Azure SQL Database, Azure Blob Storage, Azure Functions, Azure Virtual Machines, Azure Event Hubs, Azure Stream Analytics, Azure Purview, Azure Cosmos DB, Power BI.
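
A minimal PySpark sketch of the HDInsight-style transformation flow referenced above: read Parquet from Azure Blob Storage, aggregate with the DataFrame API, and write the result to Azure SQL Database over JDBC. The storage account, container, table, and credentials are assumptions for illustration only:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdinsight-transform-sketch").getOrCreate()

# Hypothetical WASB path; the storage key is assumed to be configured on the cluster.
source_path = "wasbs://telemetry@examplestorage.blob.core.windows.net/raw/vehicle_events/"

events = spark.read.parquet(source_path)

# Example transformation: daily event counts per vehicle model.
daily_counts = (events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("model", "event_date")
    .agg(F.count("*").alias("event_count")))

# JDBC write to Azure SQL Database; connection details are placeholders.
jdbc_url = ("jdbc:sqlserver://example-server.database.windows.net:1433;"
            "database=analytics;encrypt=true;")

(daily_counts.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.daily_vehicle_events")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .mode("append")
    .save())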

Client: Global Atlantic Financial Group, IN June 2016 to November 2019


Role: Azure Data Engineer
Responsibilities:
 Designed and implemented the ETL process using Azure Data Factory to load data from different sources, perform data mining, and analyze users' transactional data using Power BI for visualization and reporting.
 Designed and implemented large-scale data and analytics solutions on Azure Cloud Data Warehouse, ensuring scalability and
performance.
 Developed Python scripts for Extract, Load, and Transform (ELT) processes, integrating with Azure Data Factory to
automate data pipelines.
 Utilized Azure Data Factory utilities for data integration, data movement, and data orchestration to enhance data operations.
 Applied Data Warehouse/ODS concepts and principles to design and optimize data models for efficient data storage and
retrieval.
 Implemented ETL pipelines using a combination of Python and Azure SQL Database, ensuring data integrity and efficiency.
 Analyzed system requirements and gathered data transformation and translation requirements, selecting appropriate tools for
the job.
 Implemented best practices for Data Warehousing, including OLTP, OLAP, Dimensions, Facts, and Data modeling
techniques.
 Collaborated with stakeholders to understand and translate business requirements into technical solutions, ensuring alignment
with project goals.
 Conducted performance tuning and optimization of SQL queries against Azure SQL Database, improving query efficiency
and reducing processing times.
 Documented ETL processes, data models, and system configurations, ensuring clear and comprehensive documentation for
future reference.
 Designed and implemented end-to-end data solutions on Azure, leveraging services such as Azure Functions, and Azure SQL
Data Warehouse.
 Migrated an existing on-premises application to Azure and used services like Azure Virtual Machines and Azure Blob Storage for small-dataset processing and storage; maintained the Hadoop cluster on Azure HDInsight.
 Collaborated with cross-functional teams to integrate Apache Nifi with existing data infrastructure and ecosystem, including
Hadoop/HDFS.
 Designed 3NF data models for OLTP systems and dimensional data models using the Star and Snowflake Schema.
 Configured Spark Streaming to consume ongoing data from Kafka and store it in Azure Data Lake Storage (a sketch follows this section).
 Utilized Azure Event Hubs for real-time data streaming and processing, enabling timely analysis of streaming data for critical
business decisions.
 Created real-time data streaming solutions using Apache Spark and Kafka and storing the data in Azure Data Lake Storage
and Azure SQL Data Warehouse.
 Implemented partitioning and bucketing techniques in Azure SQL Data Warehouse to improve performance and involved in
choosing different file formats like Parquet and ORC.
 Built data warehouse structures and created facts, dimensions, and aggregated tables using dimensional modeling with star and snowflake schemas.
 Implemented data modeling best practices in Azure SQL Data Warehouse, including schema design, table creation, and
defining relationships, to ensure efficient query performance and data integrity.
 Developed complex SQL queries, stored procedures, and data transformation scripts, enhancing data quality and enabling
accurate business reporting.
 Experience in handling complex data transformation scripts in Azure Databricks.
 Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, and OLAP reporting.
 Conducted performance tuning and optimization of Scala-based data processing workflows, improving overall system
efficiency.
 Extracted files from Cassandra through Sqoop, placed them in Azure Data Lake Storage, and processed them using Azure Databricks.
 Created tables in Azure SQL Database as per requirements, internal, or external tables defined with appropriate static and
dynamic partitions. Worked extensively with Azure SQL Database DDLs and SQL.
 Integrated Azure Functions with other Azure services (Azure Queue Storage, Azure Blob Storage) for event-driven architectures and real-time data.
 Worked extensively with flat files, loading them into on-premises applications and retrieving data from applications back to files.
 Designed data lake architecture on Azure Data Lake Storage, utilizing Azure Resource Manager templates to define storage
accounts, lifecycle policies, and data partitioning for efficient data organization and retrieval.
 Designed and optimized data models in Azure Synapse Analytics, ensuring efficient querying and high performance for
complex analytic workloads.
 Implemented Azure Functions for automating data movement and triggering ETL processes, increasing operational
efficiency.
 Optimized data loading processes by leveraging Azure Synapse Analytics' parallel loading capabilities and Azure Synapse
Analytics Connector for Python, resulting in a 30% reduction in data processing time.
 Implemented data modeling best practices in Azure Synapse Analytics, including schema design, table creation, and defining
relationships, to ensure efficient query performance and data integrity.
 Identified and validated data between source and target applications.
 Created real-time dashboards and reports for monitoring fraud detection alerts and system performance using Power BI.

Environment: Hadoop, Hive, HDFS, Hive DDLs, MapReduce, Sqoop, Kafka, Apache NiFi, Terraform, Scala, Azure Data Factory, Azure Databricks, Azure Data Lake Storage, Azure Blob Storage, Azure Functions, Azure SQL Database, Azure Cloud Data Warehouse, Azure Queue Storage, Azure HDInsight, Azure Event Hubs, Azure Synapse Analytics, Azure Virtual Machines, Azure Resource Manager, Power BI.
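
A minimal PySpark Structured Streaming sketch of the Kafka-to-Data Lake flow referenced above; the broker addresses, topic name, event schema, and ADLS Gen2 paths are hypothetical, and storage credentials are assumed to be configured on the cluster:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-adls-sketch").getOrCreate()

# Hypothetical schema for policy transaction events arriving as JSON.
event_schema = StructType([
    StructField("policy_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw Kafka stream (placeholder brokers and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "policy-transactions")
       .option("startingOffsets", "latest")
       .load())

# Kafka values arrive as bytes; parse the JSON payload into typed columns.
parsed = (raw
          .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
          .withColumn("event_date", F.to_date("event_time")))

# Continuously append Parquet files to ADLS Gen2, partitioned by event date.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "abfss://curated@examplelake.dfs.core.windows.net/policy_events/")
         .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_checkpoints/policy_events/")
         .partitionBy("event_date")
         .outputMode("append")
         .start())

query.awaitTermination()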

Client: Myntra, Bangalore, India October 2014 to March 2016


Role: Big Data Engineer
Responsibilities:
 Experience in designing, developing, and implementing SSIS packages to extract, transform, and load data.
 Wrote Hive queries to transform the data into a tabular format and processed the results using HiveQL.
 Experience with partitioning and bucketing concepts in Hive and designing both managed and external tables (a sketch follows this section).
 Optimized Hive queries using various file formats like Parquet, JSON, and Avro.
 Experience writing Sqoop jobs to move data from various RDBMS into HDFS and vice versa.
 Responsible for managing data coming from different RDBMS source systems such as Oracle, SQL Server, and Teradata, and for maintaining structured data within HDFS in various file formats such as Parquet and Avro for optimized storage.
 Improved data processing and storage throughput by using the Cloudera Hadoop framework for distributed computing across a cluster of up to 17 nodes.
 Loaded real-time unstructured data such as XML data and log files into HDFS using Apache Flume.
 Handled large amounts of both structured and unstructured data using the MapReduce framework.
 Performed performance tuning and troubleshooting of MapReduce jobs by analyzing Hadoop log files.
 Worked with the Oozie workflow engine to schedule time-based jobs to perform multiple actions.
 Experience in developing SSRS reports, including designing report layouts and data visualization.
 Proficiently manage version control using Git and GitHub.
 Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
 Involved in Agile methodologies, daily scrum meetings, and sprint planning
Environment: Hive, HBase, HDFS, MapReduce, YARN, Spark, Sqoop, Oozie, PL/SQL, Cloudera, Oracle, Apache Flume, Pig, SSIS, and SSRS.
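
A minimal PySpark sketch of the external Hive table and partitioning pattern referenced above; the database, table, columns, and HDFS location are illustrative assumptions:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS retail")

# External table over Parquet files already laid out by order_date in HDFS
# (hypothetical path and columns); dropping the table leaves the data intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS retail.orders_ext (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/curated/orders'
""")

# Register partitions that already exist on disk so queries can prune by date.
spark.sql("MSCK REPAIR TABLE retail.orders_ext")

# A query that touches only one partition instead of the full dataset.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM retail.orders_ext
    WHERE order_date = '2015-11-01'
    GROUP BY customer_id
""").show()

Because the table is external, dropping it removes only the metadata while the Parquet files under the HDFS location remain.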

Education:
 Bachelor's in Computer Science from JNTU Hyderabad, 2014.
