
PRANJAL SONI

Big Data Engineer

E-mail: spranjal94@gmail.com | Mobile: +7748877073

LinkedIn: https://www.linkedin.com/in/pranjal-soni-367844106

Professional Summary:
• 3.4 years of IT experience as a Big Data Developer, covering technical requirements, design, and development of projects on Hadoop and Spark.
• Experienced in building data-ingestion pipelines and enterprise data warehouses (EDW) on Hadoop and Spark.
• Google Certified Professional Data Engineer.
• 1.5 years of working experience on Google Cloud Platform.
• Hands-on experience with major Hadoop ecosystem components: HDFS, Hive, Pig, HBase, Sqoop, MapReduce, YARN, and Spark with Scala.
• Worked on real-time messaging with Kafka and Spark Structured Streaming.
• Experience in end-to-end data-pipeline implementations: data ingestion, cleansing, processing, and loading in Hadoop and Spark.
• Experience with data analytics on Google Cloud Platform: Dataproc, Google Cloud Storage, BigQuery, Bigtable, Dataflow, Apache Airflow, and Cloud Composer.
• Experience with data analytics on Microsoft Azure: Azure Databricks (Spark clusters and jobs) and Azure Data Lake Storage (ADLS).
• Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
• Experience in analyzing data using HiveQL, Spark SQL, Pig Latin, and custom MapReduce programs in Java.
• Experienced in Core Java, Scala, shell scripting, and Python.
• Worked with storage file formats such as ORC, Parquet, and Avro.
• Experience in processing CSV, JSON, and fixed-length files.
• Implemented SCD Type 2 and CDC (change data capture) data pipelines.
• Implemented joins, SerDes, and user-defined functions (UDFs) in Hive.
• Worked on optimization and tuning of HiveQL and Spark SQL.
• Knowledge of job workflow scheduling and monitoring tools such as Azkaban, AutoSys, and Oozie.
• Experience with continuous integration and continuous deployment (CI/CD) tools such as Jenkins.
• Experience in code management using version control with Git.

Cloud Certification:
• Google Certified Professional Data Engineer
Technical Skills:
Hadoop Distributions: Cloudera Distribution of Hadoop (CDH4, CDH5) and Hortonworks Data Platform (HDP)
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Spark, Kafka
NoSQL Databases: HBase, Bigtable
Programming: Scala, Core Java, Shell Scripting
Google Cloud Platform: Dataproc, BigQuery, Google Cloud Storage, Bigtable, Dataflow, Cloud Composer
Real-Time Messaging: Kafka with Spark Structured Streaming
Microsoft Azure: Azure Databricks and Azure Data Lake Storage
RDBMS: Oracle, MySQL, Netezza, Teradata
Version Control System: Git

Professional Experience:
Period: July 2016 – present
Employer: Datametica Solutions Pvt Ltd
Location: Pune, India
Designation: Big Data Engineer

Projects:

Project: Datalake Setup On Azure Cloud Platform


Client: Catalina Marketing (US)
Role: Data Engineer / Hadoop Developer
Environment: Hadoop, Hive, Sqoop, Spark, HDP 2.7, Azure VM, Azure Data Lake Storage, MySQL, Python
Project Description: The project aimed to set up a data lake on Azure Data Lake Storage for consumption by data scientists building retail-analytics ML models, for example: 1) Reach Expansion, 2) Shopper Personalities, 3) Shopper Insights, 4) MFD.
Responsibilities:
• Designed and developed a data-ingestion pipeline to move data from the Netezza EDW to Azure Data Lake Storage using shell scripting, Sqoop jobs, Hive, and MySQL Server.
• Designed and developed a data-ingestion pipeline to land files from an SFTP server onto Azure Data Lake Storage; file types included CSV, fixed-length, and complex JSON files.
• Developed data pipelines in Spark SQL, Hive, and Azure Data Lake Storage to build the raw, stage, and gold data layers of the data lake (see the sketch after this list).
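
A minimal sketch of one such raw-to-stage hop, assuming the cluster is configured to reach ADLS; the paths, the order_id key, and the column names are hypothetical placeholders, not the project's actual layout:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal raw-to-stage sketch: read CSV landed in the raw zone on ADLS,
// standardize, and persist as Parquet in the stage zone.
// All paths and column names below are hypothetical.
object RawToStageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("raw-to-stage").getOrCreate()

    val raw = spark.read
      .option("header", "true")
      .csv("adl://mydatalake.azuredatalakestore.net/raw/orders/")

    raw.withColumn("load_date", current_date())  // audit column for the stage layer
      .dropDuplicates("order_id")                // basic cleansing step
      .write.mode("overwrite")
      .parquet("adl://mydatalake.azuredatalakestore.net/stage/orders/")

    spark.stop()
  }
}
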
Project: Retail Hub 360
Client: Catalina Marketing (US)
Role: Data Engineer / Hadoop Developer / BA
Environment: Hadoop, Hive, Sqoop, Spark, HDP 2.7, Azure VM, Azure Data Lake Storage, MySQL, Azure Databricks
Project Description: The project aimed to build a retail-analytics dashboard showcasing retailer performance across product, sales, promotion, campaign, audience, and other areas.
Responsibilities:
• Understood the data and contributed to building the data model for retailer dashboard reports.
• Developed complex aggregation logic in Spark SQL using Scala (illustrated in the sketch after this list).
• Optimized Spark SQL aggregation code to run efficiently over large data volumes.
• Designed and developed data pipelines to ingest, process, and load data into Hive tables, which were then consumed by Solr indexes and microservices feeding the live dashboard.
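
For illustration, a small Spark SQL aggregation in Scala of the kind used for the dashboard; the table and column names (retail.sales, store_id, revenue, etc.) are hypothetical, not the project's actual data model:

import org.apache.spark.sql.SparkSession

// Illustrative dashboard rollup: 90-day revenue and transaction counts
// per store and category. All table/column names are hypothetical.
object DashboardAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("dashboard-agg")
      .enableHiveSupport()  // aggregates land in Hive tables for the dashboard
      .getOrCreate()

    spark.sql("""
      SELECT store_id,
             category,
             SUM(revenue)           AS total_revenue,
             COUNT(DISTINCT txn_id) AS txn_count
      FROM retail.sales
      WHERE sale_date >= date_sub(current_date(), 90)
      GROUP BY store_id, category
    """).write.mode("overwrite").saveAsTable("gold.store_category_90d")

    spark.stop()
  }
}
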

Project: Kohls EDW Migration from On-Premise to GCP


Client: Kohls Corporation (US)
Role: Data Engineer / Hadoop Developer
Environment: Hadoop, Hive, Sqoop, Spark, Google Cloud Dataproc, Google Cloud Storage, BigQuery, Bigtable
Project Description: The project aimed to migrate Hadoop application workloads (Sqoop jobs, Hive jobs, Spark jobs, and MapReduce applications) from the on-premise environment to GCP, along with migrating data from on-premise HDFS to Google Cloud Storage.
Responsibilities:
• Migrated Hadoop application workloads from the on-premise Hadoop cluster to the Google Dataproc cluster, which involved application code changes, configuration changes, thorough testing, and deployment.
• Developed jobs for automated, scheduled migration of data from on-premise Hadoop to Google Cloud Storage buckets using DistCp, shell scripting, and a scheduler.
• Developed jobs for automated, scheduled migration of Hive tables from on-premise to Hive on the Google Dataproc cluster using a custom MapReduce job.
• Developed jobs for automated, scheduled report generation for the client, covering 1) status of migrated applications and 2) UAT data-testing reports.
• Developed BigQuery jobs for data loading and analysis (a sketch follows this list).
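
A sketch of one such load from Spark, assuming the open-source spark-bigquery connector is on the cluster classpath (an assumption; the project may equally have used native BigQuery load jobs). The bucket, dataset, and table names are placeholders:

import org.apache.spark.sql.SparkSession

// Sketch: load a Hive table into BigQuery via the spark-bigquery connector.
// Bucket/dataset/table names are placeholders, not the client's actual ones.
object HiveToBigQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("hive-to-bq")
      .enableHiveSupport()
      .getOrCreate()

    spark.table("warehouse.sales_daily")
      .write.format("bigquery")
      .option("temporaryGcsBucket", "tmp-staging-bucket") // GCS staging bucket for the indirect load
      .mode("overwrite")
      .save("analytics.sales_daily")  // target dataset.table in BigQuery

    spark.stop()
  }
}
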
Project: Yes Bank – EDW Setup on On-Premise Hadoop
Client: Yes Bank
Role: Data Engineer / Hadoop Developer
Environment: Hadoop, Hive, Sqoop, Spark, HBase
Project Description: The project aimed to build an enterprise data warehouse (EDW) for YES BANK on Hadoop. The EDW is divided into data marts that serve data for further analysis and insights.
Responsibilities:
• Developed a generic SCD Type 2 and CDC framework in Spark/Scala to build the gold layer of the data mart (sketched below).
• Developed data-processing scripts in Spark SQL.
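
A minimal sketch of the SCD Type 2 pattern such a framework generalizes. The table names, the cust_id key, and the tracking columns are hypothetical, and every incoming delta row is assumed to represent a genuine change:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal SCD Type 2 sketch. Assumes the dimension carries eff_date,
// end_date, and is_current columns, and that each delta row is a real change.
object Scd2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("scd2-sketch")
      .enableHiveSupport().getOrCreate()
    import spark.implicits._

    val dim   = spark.table("gold.customer_dim")     // existing dimension
    val delta = spark.table("stage.customer_delta")  // incoming changed rows
    val keys  = delta.select("cust_id").distinct()

    val history = dim.where($"is_current" === false)  // closed versions, kept as-is
    val current = dim.where($"is_current" === true)

    // Current rows with an incoming change are closed out...
    val expired = current.join(keys, Seq("cust_id"))
      .withColumn("end_date", current_date())
      .withColumn("is_current", lit(false))

    // ...current rows without a change pass through unchanged...
    val carried = current.join(keys, Seq("cust_id"), "left_anti")

    // ...and each delta row becomes the new current version.
    val fresh = delta
      .withColumn("eff_date", current_date())
      .withColumn("end_date", lit(null).cast("date"))
      .withColumn("is_current", lit(true))

    history.unionByName(carried).unionByName(expired).unionByName(fresh)
      .write.mode("overwrite").saveAsTable("gold.customer_dim_v2")  // rebuilt dimension

    spark.stop()
  }
}
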

Education:
• Bachelor of Technology (B.Tech) in CSE from the Institute of Technology, Central University of Bilaspur (Chhattisgarh), with 85% aggregate, 2016.

Personal Details:
• Address: A2-201, Ganga Orchad Society, Mundhawa, Pune - 411036
• Date of Birth: 1st May, 1994
• Marital Status: Unmarried
• Languages Known: English, Hindi

Declaration:
I hereby declare that the above information is true to the best of my knowledge.

Place: Pune                                                                                                Pranjal Soni
