(980)-705-0090
jagrut9202@gmail.com
https://www.linkedin.com/in/jagrut9202/
PROFESSIONAL SUMMARY
Over 7+ years of professional IT experience, including the big data ecosystem and Java/J2EE-related
technologies.
Specializing in ML/Big Data and web architecture solutions using Scala 2.11, Python, Hive, Spark, Kafka, and
Storm.
Expertise in the JVM (Java Virtual Machine) and Java-based middleware.
Experienced with the Cloudera, Hortonworks, and MapR distributions.
In-depth understanding of Data Structure and Algorithms.
Extensive experience working with MS Excel, SQL Server, and other relational databases.
Experience in developing deliverable documentation, including data flows, use cases, and business
rules.
Responsible for developing efficient MapReduce programs on the AWS cloud to detect and separate fraudulent
claims across more than 20 years' worth of claim data.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into
HDFS (on AWS) using Sqoop and Flume.
Hands-on experience in developing, installing, configuring, and using Hadoop ecosystem components such as
MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Flume, Kafka, and Spark.
Involved in setting up standards and processes for Hadoop-based application design and implementation.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Involved in creating Hive tables, loading, and analyzing data using Hive scripts.
Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL, as sketched below.
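A minimal PySpark sketch of the partitioning and bucketing pattern, assuming a Hive-enabled SparkSession; the claims table, columns, and paths are illustrative, not from an actual engagement:

    from pyspark.sql import SparkSession

    # Hive-enabled session; all names below are hypothetical
    spark = (SparkSession.builder
             .appName("hive-tables-sketch")
             .enableHiveSupport()
             .getOrCreate())

    claims = spark.read.parquet("/data/claims_staging")  # illustrative source path

    # Partitioned, bucketed Hive table; bucketing enables efficient sampling
    (claims.write
           .partitionBy("claim_year")
           .bucketBy(16, "claim_id")
           .sortBy("claim_id")
           .saveAsTable("claims"))

    # HiveQL query sampling a single bucket out of sixteen
    sample = spark.sql(
        "SELECT * FROM claims TABLESAMPLE (BUCKET 1 OUT OF 16) WHERE claim_year = 2019")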
Involved in building applications using SBT, integrated with continuous-integration servers such as Jenkins to
build jobs.
Experience in IT data analytics projects; hands-on experience in migrating on-premises ETLs to Google Cloud
Platform (GCP) using cloud-native tools such as BigQuery, Dataproc, Google Cloud Storage, and Cloud
Composer.
Experience in managing and reviewing Hadoop log files.
Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
Set up high availability for Hadoop cluster components and edge nodes.
Experience in developing shell and Python scripts for system management.
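As an illustration of this kind of system-management script, a minimal standard-library sketch of a disk-usage check; the threshold and mount point are hypothetical:

    import shutil
    import sys

    THRESHOLD = 0.90  # hypothetical alert threshold
    MOUNT = "/"       # hypothetical mount point to watch

    total, used, free = shutil.disk_usage(MOUNT)
    ratio = used / total
    print(f"{MOUNT}: {ratio:.0%} used ({free // 2**30} GiB free)")

    # Non-zero exit lets a cron or monitoring wrapper trigger an alert
    sys.exit(1 if ratio > THRESHOLD else 0)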
Keen interest in the newer technology stack that Google Cloud Platform (GCP) continues to add.
Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for
storage.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-
versa.
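Sqoop itself is driven from the command line; as a sketch of the same import/export round trip, here is a PySpark JDBC equivalent (a swapped-in technique, not Sqoop, with hypothetical connection details):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-roundtrip-sketch").getOrCreate()

    # Hypothetical connection details; real credentials are not shown
    url = "jdbc:mysql://dbhost:3306/claims"
    props = {"user": "etl", "password": "***", "driver": "com.mysql.jdbc.Driver"}

    # Import: RDBMS table -> HDFS, stored as Parquet
    spark.read.jdbc(url, "claims_raw", properties=props) \
         .write.mode("overwrite").parquet("hdfs:///data/claims_raw")

    # Export: HDFS -> RDBMS table
    spark.read.parquet("hdfs:///data/claims_agg") \
         .write.jdbc(url, "claims_agg", mode="append", properties=props)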
Extensive experience in developing applications using JSP, Servlets, Spring, Hibernate, JavaScript, Angular,
AJAX, CSS, jQuery, HTML, JDBC, JNDI, JMS, XML, and SQL across platforms such as Windows, Linux,
and UNIX.
Proven ability to investigate and customize large-scale software such as the JVM, WebKit, and open-source projects.
Experienced in a variety of scripting languages, including UNIX shell scripts and JavaScript.
Installing, configuring, and managing Hadoop clusters and data science tools.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Kafka, Sqoop, Oozie, Flume, Spark
Programming Languages: Java, VB, Python
Scripting Languages: JSP & Servlets, PHP, JavaScript, XML, HTML, Python
Databases: Oracle, MySQL, MS SQL
Tools: Eclipse, CVS, Ant, MS Visual Studio, NetBeans
Platforms: Windows, Linux/UNIX
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Methodologies: Agile, UML, Design Patterns
EDUCATION
Master's in Computer and Information Sciences
Southern Arkansas University, AR - 2016
Bachelor's in Computer Science
Jawaharlal Nehru Technological University, Hyderabad, India - 2012
PROFESSIONAL EXPERIENCE
Client: DXC Technology, Tysons, VA Aug 2020 – Present
Big Data Engineer/PySpark Developer
Roles and Responsibilities:
Designed and implemented a Big Data analytics architecture, transferring data from Oracle.
Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
Understood the current production state of the application and determined the impact of new implementations on
existing business processes.
Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark, as in the sketch below.
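A minimal sketch of the decode/encode round trip with from_json and to_json; the schema and column names are illustrative:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, to_json, col, struct
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("json-sketch").getOrCreate()

    # Hypothetical input: one raw JSON string per row
    raw = spark.createDataFrame([('{"id": "a1", "amount": 12.5}',)], ["payload"])

    schema = StructType([
        StructField("id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Decode: JSON string -> typed columns
    decoded = raw.select(from_json(col("payload"), schema).alias("rec")).select("rec.*")

    # Encode: typed columns -> JSON string, e.g. before publishing downstream
    encoded = decoded.select(to_json(struct("id", "amount")).alias("payload"))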
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of
Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more
Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure
Databricks.
Created reusable SSIS packages to extract data from multi-format flat files and Excel files into a SQL database.
Documented all extract, transform, and load processes; designed, developed, validated, and deployed the
Talend ETL processes for the data warehouse team using Pig and Hive.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between
sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
Designed, developed, and deployed business intelligence solutions using SSIS, SSRS, and SSAS.
Experienced with APIs and REST services for collecting data and publishing it to downstream applications.
Designed data pipelines to migrate data from on-premises/traditional sources to the cloud platform.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation
from multiple file formats, analyzing and transforming the data to uncover insights into customer usage
patterns; a sketch follows.
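A sketch of the shape of such an application, showing both the DataFrame API and the equivalent Spark SQL; sources, paths, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-agg-sketch").getOrCreate()

    # Hypothetical sources in two different file formats
    events = spark.read.parquet("s3://bucket/events/")
    users = spark.read.option("header", True).csv("s3://bucket/users/")

    # Extract, join, and aggregate to summarize usage per customer segment
    usage = (events.join(users, "user_id")
                   .groupBy("segment")
                   .agg(F.count("*").alias("events"),
                        F.countDistinct("user_id").alias("active_users")))

    # The same aggregation expressed in Spark SQL
    events.createOrReplaceTempView("events")
    users.createOrReplaceTempView("users")
    usage_sql = spark.sql("""
        SELECT u.segment, COUNT(*) AS events, COUNT(DISTINCT e.user_id) AS active_users
        FROM events e JOIN users u ON e.user_id = u.user_id
        GROUP BY u.segment
    """)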
Developed PySpark and Spark-SQL code to process data in Apache Spark on Amazon EMR, performing the
necessary transformations based on the source-to-target mappings (STMs) developed.
Worked on a POC to evaluate various cloud offerings, including Google Cloud Platform (GCP).
Developed business intelligence solutions using SQL Server Data Tools (2015 and 2017 versions) and loaded data into
SQL Server and Azure cloud databases.
Involved in creating fact and dimension tables in the OLAP database and created cubes using MS SQL Server
Analysis Services (SSAS).
Exposure to Lambda functions and Lambda Architecture.
Created DDLs for tables and executed them to create tables in the warehouse for ETL data loads.
Implemented logical and physical relational database and maintained Database Objects in the data model using
Erwin.
Exported the analyzed and processed data to the RDBMS using Sqoop for visualization and report
generation for the BI team.
Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python
and PySpark; a sketch of the Glue registration follows.
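A minimal sketch of the Glue half of this, registering an external table over Parquet files on S3 from a Lambda handler via boto3; the database, table, columns, and location are hypothetical:

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Register an external table over existing S3 data (names are hypothetical)
        glue.create_table(
            DatabaseName="analytics",
            TableInput={
                "Name": "claims_ondemand",
                "TableType": "EXTERNAL_TABLE",
                "StorageDescriptor": {
                    "Columns": [
                        {"Name": "claim_id", "Type": "string"},
                        {"Name": "amount", "Type": "double"},
                    ],
                    "Location": "s3://bucket/claims/",
                    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
                    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
                    "SerdeInfo": {
                        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
                    },
                },
            },
        )
        return {"status": "created"}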
Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use
cases and performance evaluation.
Created SSIS packages to perform filtering operations and to import data daily from the OLTP
system into SQL Server.
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
Built MDX queries for Analysis Services (SSAS) & Reporting Services (SSRS).
Experienced in querying data using Spark SQL on top of the Spark engine and implementing Spark RDDs in Scala.
Worked on designing, building, deploying, and maintaining MongoDB.
Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to transform
raw data into useful datasets.
Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from
enterprise data, automated using Oozie.
Optimized existing Hadoop algorithms using SparkContext, Spark-SQL, DataFrames, and pair RDDs; one representative pattern is sketched below.
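A sketch with hypothetical data of one such optimization: replacing a groupByKey-style shuffle with reduceByKey (map-side combine), and the equivalent DataFrame form that lets the Catalyst optimizer plan the aggregation:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("optimize-sketch").getOrCreate()
    sc = spark.sparkContext

    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])  # illustrative pair RDD

    # reduceByKey combines values on each partition before the shuffle,
    # unlike groupByKey, which moves every raw value across the network
    totals = pairs.reduceByKey(lambda x, y: x + y)

    # Equivalent DataFrame form: Catalyst plans the aggregation
    totals_df = (pairs.toDF(["key", "value"])
                      .groupBy("key")
                      .agg(F.sum("value").alias("total")))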
Developed a POC pipeline to compare performance and efficiency when running the same pipeline on an AWS EMR
Spark cluster versus Cloud Dataflow on GCP.
Designed and developed data models for the OLTP database, the operational data store (ODS), the
data warehouse (OLAP), and federated databases to support the client's enterprise information management strategy;
excellent knowledge of the Ralph Kimball and Bill Inmon approaches to data warehousing.
Responsible for maintaining and tuning existing cubes using SSAS and Power BI.
Worked on cloud deployments using Maven, Docker, and Jenkins.
Environment: Hadoop, Python, HDFS, Spark, AWS Redshift, AWS Glue, MapReduce, Pig, Hive, Sqoop, Kafka,
HBase, Oozie, Flume, Scala, Java, SQL scripting, Talend, PySpark, Linux shell scripting, Kinesis, Cassandra,
ZooKeeper, MongoDB, Cloudera, Cloudera Manager, EC2, EMR, S3, Oracle, MySQL.