(980)-705-0090
jagrut9202@gmail.com
https://www.linkedin.com/in/jagrut9202/
PROFESSIONAL SUMMARY
Over 7+ years of professional IT experience, including the big data ecosystem and Java/J2EE-related
technologies.
Specializing in ML/Big Data and web architecture solutions using Scala 2.11, Python, Hive, Spark, Kafka, and
Storm.
Expertise in the JVM (Java Virtual Machine) and Java-based middleware.
Experienced with the Cloudera, Hortonworks, and MapR distributions.
In-depth understanding of Data Structure and Algorithms.
Extensive experience working with MS Excel, SQL Server, and other relational databases.
Experience in developing deliverable documentation, including data flows, use cases, and business
rules.
Responsible for developing efficient MapReduce programs on the AWS cloud to detect and separate fraudulent
claims across more than 20 years' worth of claim data.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into
HDFS (on AWS) using Sqoop and Flume.
Hands-on experience in developing, installing, configuring, and using Hadoop ecosystem components such as
MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Flume, Kafka, and Spark.
Involved in setting up standards and processes for Hadoop-based application design and implementation.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Involved in creating Hive tables, loading, and analyzing data using Hive scripts.
Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL, as sketched below.
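A minimal PySpark sketch of the partitioning and bucketing pattern, assuming a Hive-enabled SparkSession; the claims table, columns, and paths are illustrative, not from an actual engagement:

    from pyspark.sql import SparkSession

    # Hive-enabled session; all names below are hypothetical
    spark = (SparkSession.builder
             .appName("hive-tables-sketch")
             .enableHiveSupport()
             .getOrCreate())

    claims = spark.read.parquet("/data/claims_staging")  # illustrative source path

    # Partitioned, bucketed Hive table; bucketing enables efficient sampling
    (claims.write
           .partitionBy("claim_year")
           .bucketBy(16, "claim_id")
           .sortBy("claim_id")
           .saveAsTable("claims"))

    # HiveQL query sampling a single bucket out of sixteen
    sample = spark.sql(
        "SELECT * FROM claims TABLESAMPLE (BUCKET 1 OUT OF 16) WHERE claim_year = 2019")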
Involved in building applications using SBT, integrated with continuous-integration servers such as Jenkins to
build jobs.
Experience in IT data analytics projects; hands-on experience in migrating on-premises ETLs to Google Cloud
Platform (GCP) using cloud-native tools such as BigQuery, Dataproc, Google Cloud Storage, and Cloud
Composer.
Experience in managing and reviewing Hadoop log files.
Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
Set up high availability for Hadoop cluster components and edge nodes.
Experience in developing shell and Python scripts for system management.
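As an illustration of this kind of system-management script, a minimal standard-library sketch of a disk-usage check; the threshold and mount point are hypothetical:

    import shutil
    import sys

    THRESHOLD = 0.90  # hypothetical alert threshold
    MOUNT = "/"       # hypothetical mount point to watch

    total, used, free = shutil.disk_usage(MOUNT)
    ratio = used / total
    print(f"{MOUNT}: {ratio:.0%} used ({free // 2**30} GiB free)")

    # Non-zero exit lets a cron or monitoring wrapper trigger an alert
    sys.exit(1 if ratio > THRESHOLD else 0)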
Keen interest in the newer technology stack that Google Cloud Platform (GCP) continues to add.
Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for
storage.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-
versa.
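Sqoop itself is driven from the command line; as a sketch of the same import/export round trip, here is a PySpark JDBC equivalent (a swapped-in technique, not Sqoop, with hypothetical connection details):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-roundtrip-sketch").getOrCreate()

    # Hypothetical connection details; real credentials are not shown
    url = "jdbc:mysql://dbhost:3306/claims"
    props = {"user": "etl", "password": "***", "driver": "com.mysql.jdbc.Driver"}

    # Import: RDBMS table -> HDFS, stored as Parquet
    spark.read.jdbc(url, "claims_raw", properties=props) \
         .write.mode("overwrite").parquet("hdfs:///data/claims_raw")

    # Export: HDFS -> RDBMS table
    spark.read.parquet("hdfs:///data/claims_agg") \
         .write.jdbc(url, "claims_agg", mode="append", properties=props)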
Extensive experience in developing applications using JSP, Servlets, Spring, Hibernate, JavaScript, Angular,
AJAX, CSS, jQuery, HTML, JDBC, JNDI, JMS, XML, and SQL across platforms such as Windows, Linux,
and UNIX.
Proven ability to investigate and customize large-scale software such as the JVM, WebKit, and open-source projects.
Experienced in a variety of scripting languages, including UNIX shell scripts and JavaScript.
Installing, configuring, and managing Hadoop clusters and data science tools.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Kafka, Sqoop, Oozie, Flume, Spark
Programming Languages: Java, VB, Python
Scripting Languages: JSP & Servlets, PHP, JavaScript, XML, HTML, Python
Databases: Oracle, MySQL, MS SQL
Tools: Eclipse, CVS, Ant, MS Visual Studio, NetBeans
Platforms: Windows, Linux/UNIX
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Methodologies: Agile, UML, Design Patterns
EDUCATION
Master's in Computer and Information Sciences
Southern Arkansas University, AR - 2016
Bachelor's in Computer Science
Jawaharlal Nehru Technological University, Hyderabad, India - 2012
PROFESSIONAL EXPERIENCE
Client: DXC Technology, Tysons, VA Aug 2020 – Present
Big Data Engineer/PySpark Developer
Roles and Responsibilities:
Designed and implemented a Big Data analytics architecture, transferring data from Oracle.
Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
Understood the current production state of the application and determined the impact of new implementations on
existing business processes.
Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark, as in the sketch below.
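A minimal sketch of the decode/encode round trip with from_json and to_json; the schema and column names are illustrative:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, to_json, col, struct
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("json-sketch").getOrCreate()

    # Hypothetical input: one raw JSON string per row
    raw = spark.createDataFrame([('{"id": "a1", "amount": 12.5}',)], ["payload"])

    schema = StructType([
        StructField("id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Decode: JSON string -> typed columns
    decoded = raw.select(from_json(col("payload"), schema).alias("rec")).select("rec.*")

    # Encode: typed columns -> JSON string, e.g. before publishing downstream
    encoded = decoded.select(to_json(struct("id", "amount")).alias("payload"))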
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of
Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more
Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure
Databricks.
Created reusable SSIS packages to extract data from multi-format flat files and Excel files into a SQL database.
Documented all extract, transform, and load processes; designed, developed, validated, and deployed the
Talend ETL processes for the data warehouse team using Pig and Hive.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between
sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
Designed, developed, and deployed business intelligence solutions using SSIS, SSRS, and SSAS.
Experienced with APIs and REST services for collecting data and publishing it to downstream applications.
Designed data pipelines to migrate data from on-premises/traditional sources to the cloud platform.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation
from multiple file formats, analyzing and transforming the data to uncover insights into customer usage
patterns; a sketch follows.
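A sketch of the shape of such an application, showing both the DataFrame API and the equivalent Spark SQL; sources, paths, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-agg-sketch").getOrCreate()

    # Hypothetical sources in two different file formats
    events = spark.read.parquet("s3://bucket/events/")
    users = spark.read.option("header", True).csv("s3://bucket/users/")

    # Extract, join, and aggregate to summarize usage per customer segment
    usage = (events.join(users, "user_id")
                   .groupBy("segment")
                   .agg(F.count("*").alias("events"),
                        F.countDistinct("user_id").alias("active_users")))

    # The same aggregation expressed in Spark SQL
    events.createOrReplaceTempView("events")
    users.createOrReplaceTempView("users")
    usage_sql = spark.sql("""
        SELECT u.segment, COUNT(*) AS events, COUNT(DISTINCT e.user_id) AS active_users
        FROM events e JOIN users u ON e.user_id = u.user_id
        GROUP BY u.segment
    """)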
Developed PySpark and Spark-SQL code to process data in Apache Spark on Amazon EMR, performing the
necessary transformations based on the source-to-target mappings (STMs) developed.
Worked on a POC to evaluate various cloud offerings, including Google Cloud Platform (GCP).
Developed business intelligence solutions using SQL Server Data Tools (2015 and 2017 versions) and loaded data into
SQL Server and Azure cloud databases.
Involved in creating fact and dimension tables in the OLAP database and created cubes using MS SQL Server
Analysis Services (SSAS).
Exposure to Lambda functions and Lambda Architecture.
Created DDLs for tables and executed them to create tables in the warehouse for ETL data loads.
Implemented logical and physical relational database and maintained Database Objects in the data model using
Erwin.
Exported the analyzed and processed data to the RDBMS using Sqoop for visualization and report
generation for the BI team.
Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python
and PySpark; a sketch of the Glue registration follows.
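A minimal sketch of the Glue half of this, registering an external table over Parquet files on S3 from a Lambda handler via boto3; the database, table, columns, and location are hypothetical:

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Register an external table over existing S3 data (names are hypothetical)
        glue.create_table(
            DatabaseName="analytics",
            TableInput={
                "Name": "claims_ondemand",
                "TableType": "EXTERNAL_TABLE",
                "StorageDescriptor": {
                    "Columns": [
                        {"Name": "claim_id", "Type": "string"},
                        {"Name": "amount", "Type": "double"},
                    ],
                    "Location": "s3://bucket/claims/",
                    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
                    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
                    "SerdeInfo": {
                        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
                    },
                },
            },
        )
        return {"status": "created"}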
Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use
cases and performance evaluation.
Created SSIS packages to perform filtering operations and to import data daily from the OLTP
system into SQL Server.
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
Built MDX queries for Analysis Services (SSAS) & Reporting Services (SSRS).
Experienced in querying data using Spark SQL on top of the Spark engine and implementing Spark RDDs in Scala.
Worked on designing, building, deploying, and maintaining MongoDB.
Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to transform
raw data into useful datasets.
Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from
enterprise data, automated using Oozie.
Optimized existing Hadoop algorithms using SparkContext, Spark-SQL, DataFrames, and pair RDDs; one representative pattern is sketched below.
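A sketch with hypothetical data of one such optimization: replacing a groupByKey-style shuffle with reduceByKey (map-side combine), and the equivalent DataFrame form that lets the Catalyst optimizer plan the aggregation:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("optimize-sketch").getOrCreate()
    sc = spark.sparkContext

    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])  # illustrative pair RDD

    # reduceByKey combines values on each partition before the shuffle,
    # unlike groupByKey, which moves every raw value across the network
    totals = pairs.reduceByKey(lambda x, y: x + y)

    # Equivalent DataFrame form: Catalyst plans the aggregation
    totals_df = (pairs.toDF(["key", "value"])
                      .groupBy("key")
                      .agg(F.sum("value").alias("total")))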
Developed a POC pipeline to compare performance and efficiency when running the same pipeline on an AWS EMR
Spark cluster versus Cloud Dataflow on GCP.
Designed and developed data models for the OLTP database, the operational data store (ODS), the
data warehouse (OLAP), and federated databases to support the client's enterprise information management strategy;
excellent knowledge of the Ralph Kimball and Bill Inmon approaches to data warehousing.
Responsible for maintaining and tuning existing cubes using SSAS and Power BI.
Worked on cloud deployments using Maven, Docker, and Jenkins.
Environment: Hadoop, Python, HDFS, Spark, AWS Redshift, AWS Glue, MapReduce, Pig, Hive, Sqoop, Kafka,
HBase, Oozie, Flume, Scala, Java, SQL scripting, Talend, PySpark, Linux shell scripting, Kinesis, Cassandra,
ZooKeeper, MongoDB, Cloudera, Cloudera Manager, EC2, EMR, S3, Oracle, MySQL.