
ADITHYA JATANGI

Sr. Data Engineer / Big Data Developer


Phone: (704) 266-6716
Email: adithyaadit247@gmail.com

Professional Summary
 Around 7 years of technical IT experience in all phases of Software Development Life Cycle (SDLC)
with skills in data analysis, design, development, testing, and deployment of software systems.
 More than 5 years of industry experience in Big Data analytics and data manipulation using
Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume,
Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, and Zookeeper.
 Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL
Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access;
and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Experience developing Spark applications using Spark SQL in Databricks for data extraction,
transformation, and aggregation from multiple file formats, analysing and transforming the data
to uncover insights into customer usage patterns (see the Spark SQL sketch at the end of this summary).
 Hands-on experience with Hadoop ecosystem components such as Spark, SQL, Hive, Pig, Sqoop, Flume,
Zookeeper/Kafka, HBase, and MapReduce.
 Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, and
performing map-side joins on RDDs (illustrated in the sketch at the end of this summary).
 Experience in creating configuration files to deploy the SSIS packages across all environments.
 Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems
(RDBMS)/ Non-Relational Database Systems and vice-versa.
 Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue,
Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, SQS.
 Experience in the development, support, and maintenance of ETL (Extract, Transform and Load)
processes using Talend Integration Suite.
 Installed and configured Kafka brokers and Zookeeper (4.x and 5.x), Kafka Connect, Schema Registry,
KSQL, REST Proxy, and Kafka Control Center.
 Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake
schema and Extended Star.
 Worked in Dimension Data modeling concepts like Star Join Schema Modeling, Snowflake Modeling,
FACT and Dimensions Tables, Physical and Logical Data Modeling.
 Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
 Excellent Programming skills at a higher level of abstraction using Scala, Java, and Python.
 Experience with Hive partitioning and bucketing, performing joins on Hive tables, and implementing
Hive SerDes.
 Experience developing Kafka producers and consumers for streaming millions of events per
second (a producer/consumer sketch follows this summary).
 Experience in dimensional modeling using Star and Snowflake schema methodologies for data
warehouse and integration projects.

 Hands-on experience designing and developing applications in Spark using Scala and PySpark to
compare the performance of Spark with Hive and SQL/Oracle.
 Experience in writing complex SQL queries and creating reports and dashboards.
 Working experience building RESTful web services and RESTful APIs.
 Experience designing intuitive dashboards and reports, and performing analysis and visualizations
using Tableau, Power BI, Arcadia, and Matplotlib.
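
The Spark SQL and map-side join bullets above are illustrated by the following minimal PySpark sketch; the file paths, column names, and datasets (usage events joined to a small customer dimension) are hypothetical placeholders, not project code.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative only: paths, table names, and columns are placeholders.
spark = SparkSession.builder.appName("usage-etl-sketch").enableHiveSupport().getOrCreate()

# Extract from multiple file formats (hypothetical landing locations).
events = spark.read.json("/landing/usage_events/")
customers = spark.read.parquet("/landing/customers/")

# Map-side join: broadcasting the small dimension avoids shuffling the large fact side.
joined = events.join(F.broadcast(customers), on="customer_id", how="left")

# Aggregate usage per customer segment, the Spark SQL equivalent of a GROUP BY query.
usage_by_segment = (joined.groupBy("segment")
                          .agg(F.count("*").alias("events"),
                               F.countDistinct("customer_id").alias("distinct_customers")))

usage_by_segment.write.mode("overwrite").parquet("/curated/usage_by_segment/")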
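
Similarly, a small producer/consumer sketch for the Kafka bullet above, assuming the kafka-python client and a hypothetical local broker and usage-events topic.

import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python client (assumed available)

BROKERS = "localhost:9092"   # placeholder broker address
TOPIC = "usage-events"       # hypothetical topic name

# Producer: serialize dicts as JSON and publish them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"customer_id": 42, "event": "login"})
producer.flush()

# Consumer: read the topic from the beginning and deserialize each record.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for record in consumer:
    print(record.value)  # each value is a dict parsed from the JSON payload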
TECHNICAL SKILL-SET

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra,
Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, Apache, AWS
Machine Learning / Classification Algorithms: Logistic Regression, Decision Tree, Random Forest,
K-Nearest Neighbour (KNN), Principal Component Analysis
Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Pig, HiveQL, Scala, Regular Expressions
Web Technologies: HTML, JavaScript, RESTful, SOAP
Version Control: Git, GitHub
IDEs & Design Tools: Eclipse, Visual Studio, NetBeans, JUnit, CI/CD, SQL Developer, MySQL Workbench,
Tableau
Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake,
NoSQL (HBase, MongoDB)
Operating Systems: Windows (98/2000/XP/7/8/10), Mac OS, UNIX, Linux, Ubuntu, CentOS
Cloud Technologies: MS Azure, Amazon Web Services (AWS)
Data Engineering / Big Data / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed
File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera,
MLlib, Oozie, Zookeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce,
Google Shell, Linux, Bash Shell, Unix, Tableau, Power BI, SAS, Crystal Reports, Dashboard Design

EDUCATION

 Bachelor of Technology in Electrical and Electronics Engineering, Jawaharlal Nehru Technological
University, Hyderabad, India – 2015
 Master's in Information Technology, Pittsburg State University, USA – 2019
Professional Experience

Client: Ally Financial, Troy, Michigan
Mar 2020 – Present
Designation: Sr. Data Engineer
Roles & Responsibilities

 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and
load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse,
and to write data back to the sources.
 Used Azure Data Factory, the SQL API, and the MongoDB API, and integrated data from MongoDB,
MS SQL, and the cloud (Blob Storage, Azure SQL DB, Cosmos DB).
 Strong experience leading multiple Azure Big Data and data transformation implementations in the
Banking and Financial Services, High Tech, and Utilities industries.
 Created Unix shell scripts to automate the data load processes to the target data warehouse.
 Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data
Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI.
 Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data
using the SQL activity.
 Worked on migrating MapReduce programs into Spark transformations using Spark and Scala,
initially done using Python (PySpark).
 Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and
aggregation from multiple file formats for analysing & transforming the data to uncover insights into
the customer usage patterns.
 Widely used Teradata features such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML
commands, with a very good understanding of Teradata UPI and NUPI, secondary indexes, and
join indexes.
 Designed, developed, tested, and maintained Tableau functional reports based on user
requirements.
 Experienced in performance tuning of Spark applications: setting the right batch interval time, the
correct level of parallelism, and memory tuning.
 Created Build and Release for multiple projects (modules) in production environment using Visual
Studio Team Services (VSTS).
 Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive
to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
 Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and
loaded data into the Parquet Hive tables from Avro Hive tables.
 Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark
SQL.
 Designed strategies for optimizing all aspects of the continuous integration, release, and deployment
processes using container and virtualization techniques such as Docker and Kubernetes; built Docker
containers for microservices projects and deployed them to Dev.
 Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and
updates in Hive tables.
 Developed PL/SQL triggers and master tables for automatic creation of primary keys.
 Extensively worked on data extraction from SAP ECC and legacy systems to a target SAP BW system
using SAP BODS.
 Responsible for resolving the issues and troubleshooting related to performance of Hadoop cluster.
 Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
 Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing
POCs using Scala, Spark SQL, and the MLlib libraries.
 Performed all necessary day-to-day Git support for different projects; responsible for maintenance
of the Git repositories and the access control strategies.
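
A hedged sketch of the streaming ETL pattern referenced above (Kafka source, Spark, partitioned Snappy Parquet that a Hive table can be defined over); the broker, topic, schema, and paths are illustrative assumptions, and the Kafka source requires the spark-sql-kafka connector package.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("stream-etl-sketch").getOrCreate()

# Hypothetical event schema for the JSON payload on the topic.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the Kafka topic as a streaming DataFrame (placeholder broker/topic).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "account-events")
       .load())

# Parse the JSON value and derive a partition column.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("e"))
          .select("e.*")
          .withColumn("event_date", F.to_date("event_ts")))

# Write partitioned Snappy Parquet that a Hive external table can sit on top of.
query = (events.writeStream
         .format("parquet")
         .option("path", "/curated/account_events/")
         .option("checkpointLocation", "/checkpoints/account_events/")
         .option("compression", "snappy")
         .partitionBy("event_date")
         .outputMode("append")
         .start())

query.awaitTermination()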

Environment: Hadoop 2.x, Hive v2.3.1, Spark v2.1.3, Databricks, Lambda, Glue, Azure, ADF, Blob,
Cosmos DB, Python, PySpark, Java, Scala, SQL, Sqoop v1.4.6, Kafka, Airflow v1.9.0, Oozie, HBase,
Oracle, Teradata, Cassandra, MLlib, Tableau, Maven, Git, Jira.

Client: UPS, Louisville, KY
Dec 2018 – Mar 2020
Designation: Data Engineer
Roles & Responsibilities

 Worked as a Data Engineer to review business requirements and compose source-to-target data
mapping documents.
 Advised the business on best practices in Spark SQL while making sure the solution meets the
business needs.
 Used Jira for bug tracking and Bitbucket to check in and check out code changes.
 Involved in the preparation, distribution, and collaboration of client-specific quality documentation
on developments for Big Data and Spark, along with regular monitoring to reflect the modifications
or enhancements made in Confidential schedulers.
 Designed dimensional data models using Star and Snowflake Schemas.
 Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI
updates.
 Experience with Vagrant, AWS, and Kubernetes-based container deployments to create self-service
environments for dev teams and containerization.
 Implemented a production ready, load balanced, highly available, fault tolerant Kubernetes
infrastructure.
 Used Azure Kubernetes Service (AKS) to deploy a managed Kubernetes cluster in Azure; built an
AKS cluster with the Azure portal and the Azure CLI, and used template-driven deployment options
such as Resource Manager templates and Terraform.
 Used Kubernetes to deploy, scale, load balance, and manage Docker containers with multiple
namespaced versions.
 Worked on Snowflake environment to remove redundancy and load real time data from various
data sources into HDFS using Kafka.
 Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDD's.
 Built and designed ETL pipelines using Python to fetch data from the Redshift data warehouse and
applications.
 Accessed Hive tables using the Spark Hive context (Spark SQL) and used Scala for interactive
operations.
 Created S3 buckets in the AWS environment to store marketing data and copied data to the Redshift
cluster (see the S3-to-Redshift sketch after this list).
 Used EMR (Elastic MapReduce) to perform big data operations in AWS.
 Developed data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda,
Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
 Implemented a fully operational production grade large scale data solution on Snowflake Data
Warehouse.
 Worked with structured and semi-structured data ingestion and processing on AWS using S3 and
Python, and migrated on-premises big data workloads to AWS.
 Wrote Python scripts to parse XML documents and load the data into a database (see the XML
parsing sketch after this list).
 Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.
 Involved in SQL development, unit testing, and performance tuning, ensuring testing issues are
resolved on the basis of defect reports.
 Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
 Involved in preparing SQL and PL/SQL coding convention and standards.
 Involved in Data mapping specifications to create and execute detailed system test plans.
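
A sketch of the S3-to-Redshift load referenced above, assuming boto3 and psycopg2 are available; the bucket, cluster endpoint, table, and IAM role are placeholders.

import boto3
import psycopg2

BUCKET = "marketing-data-landing"  # placeholder bucket name
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"  # placeholder role ARN

# Stage a file in S3 (the bucket is assumed to exist already).
s3 = boto3.client("s3")
s3.upload_file("daily_extract.csv", BUCKET, "marketing/daily_extract.csv")

# COPY the staged file into Redshift; connection settings are illustrative.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="***",
)
with conn, conn.cursor() as cur:
    cur.execute(f"""
        COPY marketing.daily_extract
        FROM 's3://{BUCKET}/marketing/daily_extract.csv'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """)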
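
And a small sketch of parsing XML and loading it into a database, as in the bullet above; the XML layout and the SQLite target are assumptions chosen only to keep the example self-contained.

import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout: <orders><order id="1"><amount>10.5</amount></order>...</orders>
tree = ET.parse("orders.xml")
rows = [(o.get("id"), float(o.findtext("amount", default="0")))
        for o in tree.getroot().findall("order")]

# Load into a database table (SQLite here purely for a runnable example).
conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (order_id, amount) VALUES (?, ?)", rows)
conn.commit()
conn.close()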

Environment: Agile, ODS, OLTP, ETL, HDFS, Kafka, AWS, S3, Python, K-means, XML, SQL, Talend,
Redshift, Glue, Lambda, MS SQL, MongoDB, Ambari, Power BI, Azure DevOps, Ranger, Git, Spark, Hive,
Scala.

Client: ICICI Prudential, India
Jun 2017 – Jun 2018
Designation: Big Data Developer
Roles & Responsibilities:

 Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming,
Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
 Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
 Worked on different file formats such as Parquet, ORC, JSON, and text files.
 Used Spark SQL to load data and created schema RDDs on top of it, which were loaded into Hive
tables; handled structured data using Spark SQL (see the sketch after this list).
 Worked on analysing Hadoop cluster using different big data analytic tools including Flume, Pig,
Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.
 As a Big Data Developer implemented solutions for ingesting data from various sources and
processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce
Frameworks, MongoDB, Hive, Oozie, Flume, Sqoop and Talend etc.
 Explored Spark for improving the performance and optimization of existing algorithms in Hadoop
using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
 Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types
of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
 Involved in the data support team, in a role covering bug fixes, schedule changes, memory tuning,
schema changes, and loading historic data.
 Worked on implementing checkpoints such as Hive count checks, Sqoop record checks, done-file
creation checks, done-file checks, and touch-file lookups.
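
A minimal sketch of the Spark SQL to Hive pattern referenced above (load data, expose it to Spark SQL, write a partitioned and bucketed Hive table); the database, table, and column names are hypothetical.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-load-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Load structured data (placeholder path) and expose it to Spark SQL.
policies = spark.read.option("header", True).csv("/landing/policies/")
policies.createOrReplaceTempView("policies_stg")

# Handle structured data with SQL, then write a partitioned, bucketed Hive table.
curated = spark.sql("""
    SELECT policy_id, product_code, premium, issue_year
    FROM policies_stg
    WHERE premium IS NOT NULL
""")

(curated.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("issue_year")
 .bucketBy(16, "policy_id")
 .sortBy("policy_id")
 .saveAsTable("curated_db.policies"))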

Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java,
GitHub, Talend Big Data Integration, Impala.

Client: GGK Technologies, Hyderabad, India
Jan 2015 – May 2017
Designation: Big Data Developer
Roles & Responsibilities
 Involved in the Complete Software development life cycle (SDLC) to develop the application.
 Worked with different source data file formats like JSON, CSV, and ORC etc. 
 Worked on Spark for improving performance and optimization of existing algorithms in Hadoop
using Spark-SQL and Scala.
 Worked on creating Hive managed and external tables based on the requirement.
 Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion
into Hive tables for analysis (see the sketch after this list).
 Experience importing data from various data sources such as MySQL and Netezza using Sqoop and
SFTP, performing transformations using Hive and Pig, and loading data back into HDFS.
 Imported and exported data between environments such as MySQL and HDFS and deployed into
production.
 Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve
performance.
 Involved in developing Impala scripts for ad hoc queries.
 Experience with Oozie workflow scheduler templates to manage various jobs such as Sqoop, MR, Pig,
Hive, and shell scripts.
 Involved in importing and exporting data from HBase using Spark. 
 Involved in a POC for migrating ETLs from Hive to Spark in a Spark on YARN environment.
 Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in
Hive like Structs and Maps.
 Created Cassandra tables to load large sets of structured, semi-structured and unstructured data
coming from UNIX, NoSQL and a variety of portfolios.
 Collaborated with the infrastructure, network, database, application, and BI teams to ensure data
quality and availability.
 Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the
target Oracle Data Warehouse database.
 Worked on Hortonworks distribution and responsible for Data Ingestion, Data Cleansing, Data
Standardization and Data Transformation.
 Worked on Oozie to develop workflows to automate ETL data pipeline.
 Imported data from various sources into Spark RDD for analysis.
 Configured Oozie workflow to run multiple Hive jobs which run independently with time and data
availability. 
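
A hedged sketch of the kind of Spark cleaning job referenced above; the feed location, column names, and target Hive table are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("feed-cleaning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read a raw feed (placeholder path and format).
raw = spark.read.option("header", True).csv("/feeds/customer_feed/")

# Basic cleaning: trim strings, standardize a date column, drop duplicates and bad rows.
clean = (raw
         .withColumn("customer_name", F.trim(F.col("customer_name")))
         .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
         .dropDuplicates(["customer_id"])
         .na.drop(subset=["customer_id"]))

# Land the cleaned data in a Hive table for downstream analysis.
clean.write.mode("append").format("parquet").saveAsTable("staging.customer_feed_clean")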
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hortonworks, MapReduce,
Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Zookeeper, Oracle, MySQL, Netezza, Kubernetes,
Docker, CI/CD, and UNIX Shell Scripting.
