
Mohit Shivramwar

Email-ID: mshivramwar90@gmail.com Mobile Number: +1 732 440 8107

Professional Summary:

• Big Data Engineer with 11+ years of IT experience spanning Big Data analytics and Java web application development.
• In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, NameNode, DataNode and MapReduce.
• Experienced with Spark architecture and skilled in designing optimized Spark ETL pipelines.
• Hands-on experience with Spark Core, Spark SQL, DataFrames and RDDs.
• Extensive knowledge of performance tuning for Spark applications and of converting Hive/SQL queries into Spark transformations (see the sketch after this list).
• Experience importing and exporting data between HDFS/Hive and RDBMS servers such as MySQL, Oracle and Teradata using Sqoop.
• Developed MapReduce programs in Java for data cleansing, filtering and aggregation.
• Proficient in designing table partitioning and bucketing and in optimizing Hive scripts using various performance utilities and techniques.
• Proficient in working with different file formats such as ORC, Parquet and text.
• Experience developing Hive UDFs and running Hive scripts.
• Experience working on cloud platforms such as AWS and GCP.
• Experience migrating data from on-premises Hadoop clusters to the cloud (AWS and GCP).
• Experience with job orchestration tools such as Airflow, Universal Controller, Azkaban and Autosys.
• Experience building RESTful web services with Spring Boot; used REST/SOAP extensively and tested with Postman.
• Experienced in working with Confluent Kafka and KSQL.
• Experience setting up and working with Presto against data sources such as Hive, MySQL, PostgreSQL, S3 and HBase with Phoenix.
• Extensive programming experience with core Java concepts such as OOP, Collections and I/O.
• Well versed in Agile/Scrum and Waterfall methodologies using Jira as the software development tool.
• Experience using Jira for issue tracking and Jenkins and Git for continuous integration.
• Extensive experience with UNIX commands, shell scripting and setting up cron jobs.
• Hands-on experience using Bitbucket, Subversion and Git for source code version control.
• Good experience with relational databases such as Oracle, MySQL, SQL Server and PostgreSQL.
• Strong team player with good communication, analytical, presentation and interpersonal skills.
• Experienced in analyzing business requirements and translating them into functional and technical design specifications.
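
Illustrative example (hypothetical table and column names): a minimal PySpark sketch of converting a Hive/SQL aggregation query into equivalent DataFrame transformations, as mentioned above.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical Hive query:
#   SELECT region, SUM(txn_amount) AS total_amount
#   FROM customer_txn WHERE txn_date >= '2020-01-01' GROUP BY region
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

region_totals = (
    spark.table("customer_txn")                      # Hive table
    .filter(F.col("txn_date") >= "2020-01-01")       # WHERE clause
    .groupBy("region")                               # GROUP BY
    .agg(F.sum("txn_amount").alias("total_amount"))  # SUM aggregation
)
region_totals.show()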

Education: Bachelor of Engineering (B.E.) in Information Technology, 2011.

Technical Skills
• Big Data Technologies: MapReduce, Hive, Pig, Impala, Hue, Sqoop, Kafka, Presto.
• Spark Components: RDD, Spark SQL (DataFrames and Datasets).
• Cloud Infrastructure: AWS (S3, EC2, EMR, Lambda, CloudFront); GCP (Dataproc, GCS buckets, BigQuery).
• Programming Languages: SQL, Core Java, basics of Scala, Python.
• Databases: Oracle, MySQL, SQL Server.
• Scripting and Query Languages: Shell scripting, SQL and PL/SQL.
• Web Technologies: Spring Boot, SOAP, REST, JavaScript, CSS, HTML and JSP.
• Operating Systems: Windows, UNIX/Linux and macOS.
• Build Management Tools: Maven, Ant, Jenkins.
• IDEs & Command-Line Tools: Eclipse, IntelliJ and WinSCP.

Work Experience:

Early Warning Services Jun 2022 – Present


Lead Data Engineer Location: Chandler, AZ

Early Warning Services, LLC (EWS), the network owner of Zelle®, is a fintech company owned by seven of the country's largest banks. It handles a tremendous amount of data generated by various banks through Zelle transfers and other financial activities. Worked on different projects to ingest data from various enterprise sources into Kafka topics, processing and transforming the data according to business transformation logic on the way in. The data is then loaded from Kafka into various target sinks such as EDW tables, Aerospike and Elasticsearch with the help of Spark loaders, where it is used by data scientists to train their AI models and generate financial advice for customers.

Responsibilities:

• Created data pipelines for ingestion, transformation and aggregation of various financial data using PySpark, including ingestion of file types such as CSV, fixed-length and IBM code page 1047 (EBCDIC) files.
• Worked on a framework to parse files in Python/PySpark and publish the data to Kafka topics (see the sketch after this list).
• Developed Spark loaders in PySpark to load data from Kafka topics into EDW, Aerospike and Elasticsearch.
• Created Universal Controller jobs to automate processing of a wide range of data sets.
• Monitored and troubleshot Spark jobs using the YARN Resource Manager and optimized the data for use by data scientists.
• Built a PySpark utility to merge the data of two Elasticsearch indexes.
• Worked on an Agile Scrum team using Jira for issue tracking.
• Worked on the CI/CD pipeline using Maven, Jenkins and Git for continuous integration.
• Gathered and understood client requirements.
• Prepared the Technical Design Document based on the Business Requirement Document.
• Drafted architecture documents and reviewed them with the client.
• Responsible for deploying the code to all environments: Dev, UAT and Production.
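
Illustrative example (hypothetical file path, schema, broker and topic names; the real framework handled additional formats and transformation logic): a minimal PySpark sketch of parsing a file and publishing its records to a Kafka topic, as referenced above. It assumes the Spark Kafka integration package is on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parse an incoming CSV feed (hypothetical path and header layout).
parsed_df = (
    spark.read
    .option("header", "true")
    .csv("/data/incoming/payments/*.csv")
)

# Serialize each record as JSON and publish it to a Kafka topic.
(
    parsed_df
    .selectExpr("CAST(payment_id AS STRING) AS key",
                "to_json(struct(*)) AS value")
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("topic", "payments-ingest")
    .save()
)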

Environment: Hadoop, Kafka, Spark, Python, Hive, Sqoop, Cloudera, Autosys, UNIX Shell Scripting, Maven.

Wells Fargo Jan 2020 – Jun 2022
Sr. Big Data Engineer Location: Chandler, AZ

Wells Fargo is a provider of banking, mortgage, investing, credit card, and personal, small business, and commercial financial services. It generates a tremendous amount of data from user transactions, customers, sales, mortgages, mutual funds and other financial activities. Worked on different projects to ingest data from various enterprise sources into the Hadoop data lake and process it. The processed data is published to data scientists to train their AI models and generate financial advice for customers; around 500 different models consume this data.

Responsibilities:

• Created a data pipeline to extract data from RDBMSs such as Oracle, Teradata and SQL Server into Hadoop Hive tables in Parquet format using Sqoop.
• Created data pipelines for ingestion, transformation and aggregation of various financial data using PySpark and Hive, including file ingestion of bill payments and credit card transactions at a volume of 129 million records (about 30 GB) per day.
• Created Autosys scheduling scripts in JIL to automate processing of a wide range of data sets.
• Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and optimized the data for use by data scientists.
• Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
• Worked on CSV file ingestion pipelines into Hadoop using PySpark (see the sketch after this list).
• Worked on an Agile Scrum team using Jira for issue tracking.
• Worked on the CI/CD pipeline using Maven, Jenkins and Git for continuous integration.
• Gathered and understood client requirements.
• Prepared the Technical Design Document based on the Business Requirement Document.
• Drafted architecture documents and reviewed them with the client.
• Responsible for deploying the code to all environments: Dev, UAT and Production.
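
Illustrative example (hypothetical paths, database and table names): a minimal PySpark sketch of the CSV-to-Hive ingestion pattern referenced above, writing Parquet data into a partitioned Hive table.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("billpay_ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Read one day's CSV feed; the real pipeline also applied business
# transformations and data-quality checks before writing.
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/landing/billpay/2022-01-15/*.csv")
)

(
    raw_df
    .withColumn("load_date", F.lit("2022-01-15"))
    .write
    .mode("append")
    .format("parquet")
    .partitionBy("load_date")          # one partition per load date
    .saveAsTable("edw.bill_payments")  # Hive table stored as Parquet
)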

Environment: Hadoop, Spark, Python, Hive, Sqoop, MapR, Autosys, UNIX Shell Scripting, Maven.

Seagate Technology Oct 2018 – Dec 2019


Sr. Big Data Cloud Engineer Location: Pune, India
Seagate holds all the data related to the manufacturing of hard disks and their components, factory sensor data from various devices, image data of various components, and more. Worked on two different projects to ingest data from various enterprise sources into AWS S3 and process it. The data was then published to the business through Presto into a custom dashboard built on the client side, and is also used in ML models to automatically detect defects in hard disks.

Responsibilities:

• Technical lead of a team of developers in Agile Scrum.
• Created data pipelines for ingestion, transformation and aggregation of sensor data in AWS S3.
• Created a data pipeline for ingestion of image data into AWS CloudFront backed by an S3 bucket.
• Created an ingestion pipeline using Sqoop to ingest data from Oracle into Hive tables stored in an AWS S3 bucket.
• Built a data pipeline to ingest data from Oracle into HBase using Spark and Phoenix.
• Created Airflow scheduling scripts (DAGs) in Python to automate processing of a wide range of data sets (see the sketch after this list).
• Monitored and troubleshot Hadoop jobs using the YARN Resource Manager.
• Set up a three-node Airflow cluster on AWS EC2 instances.
• Optimized existing jobs and brought down the running time of critical jobs by 65%.
• Optimized data retrieval through Presto and brought down Presto query execution time by 80%.
• Drafted architecture documents and reviewed them with the client.
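
Illustrative example (hypothetical DAG id, schedule and spark-submit commands, written against the Airflow 1.x API in use at the time): a minimal Airflow DAG of the kind referenced above, chaining an ingestion step and an aggregation step.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="sensor_data_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 2 * * *",   # run daily at 02:00
    catchup=False,
) as dag:

    ingest = BashOperator(
        task_id="ingest_sensor_files",
        bash_command="spark-submit /opt/jobs/ingest_sensor_data.py",
    )

    aggregate = BashOperator(
        task_id="aggregate_sensor_data",
        bash_command="spark-submit /opt/jobs/aggregate_sensor_data.py",
    )

    ingest >> aggregate  # aggregation runs only after ingestion succeeds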

Environment: AWS, Hadoop, Spark, Python, EMR, HDFS, Hive, Sqoop, CloudFront, Presto, Airflow, UNIX
Shell Scripting, Maven.

KOHL'S Retailer Oct 2015 – Sept 2018


Sr. Big Data Cloud Engineer Location: Pune, India
Kohl's is an American department store retail chain operated by Kohl's Corporation. With 1,158 locations, it is the largest department store chain in the United States. Worked on three Kohl's projects whose purpose was to bring sales data from different sources into the Hadoop cluster, perform the required transformations, store the results in the Hive database on the cluster, and use the data for real-time visualization with Hive, HBase and Tableau to analyze which types of products are popular in a store in a particular region and thereby improve the business. The projects also built a single, multi-dimensional view of the customer that leverages third-party data to improve customer matching and enhance customer data.

Responsibilities:

• Technical lead of a team of developers in Agile Scrum.
• Prepared the Technical Design Document based on the Business Requirement Document.
• Responsible for loading sales data and event logs from the Oracle database and Teradata into HDFS.
• Responsible for loading customer data arriving in XML format into HDFS and Hive tables.
• Responsible for loading digital marketing data into HDFS and Hive tables.
• Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
• Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
• Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
• Loaded and transformed large data sets of structured, semi-structured and unstructured data.
• Created/modified shell scripts for scheduling various data cleansing scripts and ETL load processes.
• Created data validation scripts to ensure data stays in sync between source and target (see the sketch after this list).
• Responsible for deploying the code to all environments: Dev, UAT and Production.
• Involved in the migration of the on-premises Hadoop system to GCP, including migrating data from Hadoop to BigQuery.
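
Illustrative example (hypothetical connection details and table names; the actual validation scripts may have been shell/SQL based, so this is expressed in PySpark purely for illustration): a minimal source-versus-target row-count reconciliation of the kind referenced above. It assumes an Oracle JDBC driver is available to Spark.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Count rows in the source table (Oracle, read over JDBC).
source_count = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/SALES")
    .option("dbtable", "SALES.DAILY_TXN")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
    .count()
)

# Count rows in the target Hive table.
target_count = spark.table("sales.daily_txn").count()

# Flag the load for investigation if the counts drift apart.
if source_count != target_count:
    print(f"Row count mismatch: source={source_count}, target={target_count}")
else:
    print(f"Counts match: {source_count} rows")
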
Environment: Hadoop, Map Reduce, HDFS, Pig, Hive, Sqoop, Zookeeper, Azkaban, Oracle 11g/10g, UNIX
Shell Scripting, Maven, GIT, GCP, Big Query.

MEDIAMORPH May 2011 – Sept 2015


Software Developer Location: Pune, India
Mediamorph lies at the center of the media and entertainment ecosystem. All major Hollywood studios,
leading television networks and the largest video service operators use Mediamorph to better measure,
understand and future-proof their businesses in a rapidly changing environment. Mediamorph does this through its industry-leading cloud-based platform, which collects rights, performance and social data on an industry-wide scale. Mediamorph then overlays that data with tools to automate operations, perform data analysis and optimize business results. Mediamorph's core services include cross-platform audience measurement, contracts and rights management, accounting and royalty automation, and social data dashboards.

Responsibilities:
• Gathered and understood client requirements.
• Prepared the Technical Design Document based on the Functional Design Specification.
• Responsible as a full-stack developer for designing and delivering full-stack web applications.
• Responsible for building RESTful web services using Spring Boot.
• Used REST/SOAP extensively and tested with Postman.
• Responsible for making changes to the web application per client requirements, involving JSP changes, Java code changes and database changes.
• Wrote UNIX scripts for migrating data to the data warehouse.
• Developed Hibernate configuration files for PostgreSQL and Java integration.
• Wrote PL/SQL procedures and functions in PostgreSQL.
• Implemented various pieces of business logic in Java.

Environment: SQL, PL/SQL, PostgreSQL, Struts2, Spring, REST, SOAP, Postman, Core Java, UNIX Shell Script.
