
HEMANTH PEDDAREDDYGARI

Phone: +91 9866673012 | Email: hemanthth412@yahoo.com

Professional Summary

• Databricks Certified Data Engineer Associate and Fivetran Delivery Foundations certified; Databricks certification valid until January 2025.
• 10 years of IT experience across technology stacks including Databricks, Snowflake, Big Data/Hadoop development, Spark (PySpark), AWS, GCP, Ab Initio and Unix.
• Experience in data ingestion from RDBMS to HDFS and Hive using Apache Sqoop.
• Good SQL experience and exposure to MS SQL Server, MySQL, PostgreSQL, Greenplum, Snowflake, BigQuery and Oracle databases.
• Basic knowledge of business intelligence and data visualization products such as SAP BO, Microsoft Power BI, Qlik Sense and Denodo.
• Hands-on experience with Hadoop and its ecosystem components such as Hive, Pig, Sqoop, Impala, MapReduce and Oozie.
• Experience with AWS and GCP services such as S3, EMR, Athena, BigQuery, GCS and Airflow.
• Experience with Atlassian products such as Confluence, JIRA and Bitbucket.
• Good knowledge of Spark (PySpark), Databricks, Git and GitLab.
• Experience in setup, installation and configuration of multi-node Hadoop clusters using the Hortonworks and Cloudera distributions.
• Experience in methodologies such as Agile and Scrum.
• Good problem-solving and communication skills, and willingness to learn.

Academic Details

MS in Software Engineering (Part time)              2015   Birla Institute of Technology & Science (BITS), Pilani    6.7 CGPA
B.Tech (Electronics & Communication Engineering)    2011   Jawaharlal Nehru Technological University, Anantapur, Andhra Pradesh    72.04%
Intermediate                                        2007   Sri Chaitanya Jr. College, Chittoor, Andhra Pradesh    89.1%
Class X                                             2005   The Indian School, Chittoor, Andhra Pradesh    84.33%

Technical Skills

Primary skill category : Databricks, Hadoop
Sub skills             : MapReduce, Sqoop, HiveQL, Impala, Unix shell scripting, Pig scripting, Oozie scheduling, Spark (PySpark), AWS (S3, EMR, Athena, RDS, MWAA), GCP (BigQuery, GCS, Databricks)
Secondary skills       : Snowflake cloud data warehouse, Ab Initio, Python, Unix
Sub skills             : Git, GitLab, JIRA, Confluence, Bitbucket

Work Profile

1. Software Engineer, Tech Mahindra: Aug 2013 – Aug 2015
2. Assistant Consultant, TCS: Dec 2015 – Mar 2022
3. Data Engineering Associate Manager, Accenture: Mar 2022 – Present

Projects Handled

1. EDW Remediation – HIGHMARK

Organization Accenture
Duration May 2022 – Present
Client HIGHMARK, USA
Role Databricks Data Engineer
Environment (with skill versions)
    Language/DB : Databricks, PySpark, Denodo VQL, GCP, GCS, BigQuery, SQL
    O/S : Windows 11
    Tools : PuTTY, PyCharm, GitLab, Databricks UI, gsutil, Denodo
Project Description

As part of the Enterprise Data Warehouse (EDW) migration to the cloud, data loading is handled by the Common Ingestion Framework (CIF) team. EDW Remediation is the next major work stream, in which we gather business logic from legacy mainframe code (COBOL, JCL) and convert it to Python (PySpark). The work also involves validating and executing BladeBridge-converted Informatica workflows as PySpark scripts and Teradata SQL as BigQuery SQL, manually converting Teradata cursors to BigQuery procedures, and developing a tool to convert users' Teradata SQL to equivalent Denodo VQL.

Contribution

• Managed a team of 8 members in the conversion of COBOL and JCL to Python (PySpark) in Databricks and guided them in completing deliverables on time.
• Assisted in validating Teradata SQL converted to BigQuery SQL and converted Teradata cursors to their BigQuery equivalents.
• Core member in developing an automation tool to convert Teradata SQL to Denodo VQL (a simplified sketch of this pattern-rewriting approach follows this list).
• Converted around 100 Teradata patterns to their equivalent Denodo patterns and played a key role in validating them.
• Active participation in code integration, stitching and presenting to the client.
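
The conversion tool itself is not documented here beyond the description above, but a minimal Python sketch of a regex-driven pattern rewriter of this kind is shown below; the two rewrite rules and the sample query are hypothetical and only illustrate the approach.

    import re

    # Hypothetical rewrite rules illustrating the approach; the real tool's rule set is not shown here.
    TERADATA_TO_DENODO_RULES = [
        (re.compile(r"\bSEL\b", re.IGNORECASE), "SELECT"),                 # expand Teradata's SEL shorthand
        (re.compile(r"\bSAMPLE\s+(\d+)\b", re.IGNORECASE), r"LIMIT \1"),   # approximate SAMPLE n with a row limit
    ]

    def convert_teradata_to_vql(sql: str) -> str:
        """Apply simple regex rewrites to move a Teradata query toward Denodo VQL syntax."""
        for pattern, replacement in TERADATA_TO_DENODO_RULES:
            sql = pattern.sub(replacement, sql)
        return sql

    print(convert_teradata_to_vql("SEL member_id FROM claims SAMPLE 10"))  # SELECT member_id FROM claims LIMIT 10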

2. The Business Analytics – STELLANTIS

Organization TCS
Duration August 2018 – March 2022
Client STELLANTIS, USA
Role Cloud Data Engineer
Environment (with skill versions)
    Language/DB : Snowflake, Sqoop, Pig, Hive, Greenplum, PySpark, Oozie, Spark, SQL, Unix, AWS (EMR, Redshift, Athena, S3)
    O/S : Windows, Red Hat Enterprise Linux
    Tools : PuTTY, WinSCP, SQuirreL SQL, Ambari, VS, Git, Snowflake UI, EMR

Project Description

As part of the Business Analytics engagement with STELLANTIS, we built analytical solutions and maintained the existing applications. The engagement also covered defining and implementing the optimal BA infrastructure and software configurations, and monitoring and supporting that infrastructure and software. Within this project we supported an on-premise Hadoop cluster and Greenplum data warehouse, which we later migrated to AWS and the Snowflake cloud data warehouse.

Contribution
• Automated loading of data from database sources into HDFS using Sqoop jobs scheduled through Oozie.
• Created Greenplum external tables on top of HDFS data and loaded them into internal tables after applying any required transformations in Pig and PySpark.
• Migrated all transformations from Pig to PySpark to improve performance (a simplified example of such a transformation follows this list).
• Participated in migrating the on-premise data lake environment to AWS and the Greenplum data warehouse to the Snowflake cloud data warehouse.
• Provided data to the data science team, to a Qlik Sense app for visualization, and to self-service business users.
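
As an illustration of the Pig-to-PySpark migration mentioned above, a minimal PySpark sketch is shown below; the HDFS paths, column names and aggregation are assumed for the example and are not taken from the project.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("pig_to_pyspark_example").getOrCreate()

    # Read delimited data that Sqoop landed in HDFS (hypothetical path and layout).
    sales = (spark.read
             .option("delimiter", "|")
             .csv("hdfs:///data/landing/sales/")
             .toDF("dealer_id", "sale_date", "amount"))

    # Equivalent of a Pig GROUP BY ... FOREACH ... GENERATE SUM(amount).
    daily_totals = (sales
                    .withColumn("amount", F.col("amount").cast("double"))
                    .groupBy("dealer_id", "sale_date")
                    .agg(F.sum("amount").alias("total_amount")))

    # Write results back to HDFS for downstream loading (e.g. into external tables).
    daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_sales/")
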
3. ABB Data Lake Implementation

Organization TCS
Duration January 2016 – June 2018
Client ABB, Switzerland
Role Hadoop developer
Environment (with skill versions)
    Language : Sqoop, HiveQL, Impala, Oozie, Shell scripting
    O/S : CentOS and SUSE 11
    Tools : PuTTY, WinSCP, Hue, Cloudera Manager, Confluence, JIRA, Bitbucket

Project Description

This project proved the technical capabilities of Hadoop as an archival solution for the client. As part of the project, we loaded data from SQL Server, Oracle and DB2 databases into a Hadoop (Cloudera) cluster and made it available for querying and visualization through the existing SAP BO layer and Power BI. We also started analytics initiatives such as vulnerability analytics and HR analytics in multiple phases and provided insights to improve the business.
Contribution
a. Participated in client calls to gather and analyze the requirements.
b. Wrote Sqoop scripts to establish connections between SQL Server, Oracle, DB2 and Hadoop for loading data into Hive.
c. Wrote validation scripts to reconcile the data loaded into Hive against the source data (a simplified sketch of such a check follows this list).
d. Established connections from SAP BO (Universe) and Power BI to Impala and Hive through ODBC.
e. Wrote several transformation scripts, such as updating Hive tables with flat-file data arriving weekly, loading DB2 data using hex functions, and loading CSV file data after cleansing.
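
A minimal PySpark sketch of the kind of load-validation check described in point (c) is shown below; the original scripts were shell/HiveQL based, and the JDBC URL, credentials and table names here are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive_load_validation")
             .enableHiveSupport().getOrCreate())

    # Row count in the source RDBMS (hypothetical connection details).
    source_count = (spark.read.format("jdbc")
                    .option("url", "jdbc:sqlserver://source-host:1433;databaseName=erp")
                    .option("dbtable", "dbo.orders")
                    .option("user", "svc_user")
                    .option("password", "***")
                    .load()
                    .count())

    # Row count in the Hive table the load produced (hypothetical table name).
    hive_count = spark.table("archive.orders").count()

    if source_count != hive_count:
        raise ValueError(f"Row count mismatch: source={source_count}, hive={hive_count}")
    print(f"Validation passed: {source_count} rows in both source and Hive.")
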

4. Data Integration Hub Automation Project

Organization Tech Mahindra
Duration February 2015 – August 2015
Client General Electric, USA
Role Hadoop developer
Environment (with skill versions)
    Language/DB : MS SQL, MapReduce, Hive and Netezza
    O/S : RHEL 6.6
    Distribution : HDP 2.3 with Ambari 2.1

Project Description

The project automated the data ingestion process that incrementally loads DuckCreek XML data into Netezza through Hive. The process includes:
• Extracting DuckCreek XML from a SQL Server database (Sqoop)
• Converting the XML data into JSON (MapReduce; a simplified Python sketch of this step follows this list)
• Inferring schema changes from the JSON (MapReduce)
• Creating Hive external tables on top of the JSON files
• Loading the JSON data into Hive ORC tables (Hive)
• Incrementally loading the tables into the Netezza data warehouse
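
A simplified Python sketch of the XML-to-JSON conversion step is shown below; the project implemented this step as a MapReduce job, and the sample policy XML and field names here are illustrative only.

    import json
    import xml.etree.ElementTree as ET

    def element_to_dict(elem):
        """Recursively convert an XML element into a nested dict (attributes, children, text)."""
        node = dict(elem.attrib)
        children = list(elem)
        if children:
            for child in children:
                node.setdefault(child.tag, []).append(element_to_dict(child))
        elif elem.text and elem.text.strip():
            node["value"] = elem.text.strip()
        return node

    sample_xml = "<policy id='P100'><holder><name>Jane Doe</name></holder></policy>"
    root = ET.fromstring(sample_xml)
    print(json.dumps({root.tag: element_to_dict(root)}, indent=2))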

Contribution
a. Installed HDP and Ambari to set up a 5-node cluster.
b. Loaded data from MS SQL to HDFS using Sqoop.
c. Created Hive external tables on top of the converted JSON files.
d. Loaded the Hive ORC tables' data into Netezza using Fluid Query.

5. Leading Telecom Operator

Organization Tech Mahindra
Duration November 2013 – October 2014
Client Leading Telecom Operator, Indonesia
Role ETL Developer
Tools Ab Initio, Oracle, Unix

Project Description

Our client is the second largest telecom operator in Indonesia. This project revamped the existing data mart, whose D-3 latency was a major drawback. The main aim of the revamp was to bring the latency down to D-1, and to near real time for some of the important feeds.

Contribution
• Participated in team meetings to analyze the assigned requirements.
• Developed graphs for data extraction and scrubbing for various transformation processes.
• Worked on different change requests using the Ab Initio graphical development environment according to the specifications.
• Built graphs to incorporate the transformation logic as per the business requirements.

(HEMANTH P)
