Professional Summary
Around 7 years of technical IT experience in all phases of the Software Development Life Cycle (SDLC),
with skills in data analysis, design, development, testing, and deployment of software systems.
More than 5 years of industry experience in Big Data analytics and data manipulation using
Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume,
Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, and ZooKeeper.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL
Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access;
and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Experience developing Spark applications using Spark SQL in Databricks for data extraction,
transformation, and aggregation from multiple file formats, analyzing and transforming the data
to uncover insights into customer usage patterns.
Hands-on experience with Hadoop ecosystem components such as Spark, Spark SQL, Hive, Pig, Sqoop,
Flume, ZooKeeper, Kafka, HBase, and MapReduce.
Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, and
performing map-side joins on RDDs.
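The map-side join mentioned above can be sketched in plain Python: the small table is broadcast to every mapper, so each record of the large dataset is joined locally without a shuffle. This is a minimal stdlib illustration of the idea, not actual Spark code; the table names and fields are hypothetical.

```python
# Sketch of a map-side (broadcast) join: the small lookup table is
# available to every mapper, so each record joins locally, no shuffle.
small_table = {101: "Electronics", 102: "Books"}  # broadcast side: category_id -> name

orders = [  # large side: (order_id, category_id, amount)
    (1, 101, 25.0),
    (2, 102, 10.0),
    (3, 101, 40.0),
]

def map_side_join(records, lookup):
    """Join each record against the broadcast lookup in the map phase."""
    return [
        (order_id, lookup.get(category_id, "unknown"), amount)
        for order_id, category_id, amount in records
    ]

joined = map_side_join(orders, small_table)
```

In Spark the equivalent pattern broadcasts the small table (e.g. with `sc.broadcast`) and maps over the large RDD, avoiding a shuffle-based join.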
Experience in creating configuration files to deploy the SSIS packages across all environments.
Experience importing and exporting data with Sqoop between HDFS and relational (RDBMS) /
non-relational database systems.
Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue,
Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, SQS.
Experience developing, supporting, and maintaining ETL (Extract, Transform, Load) processes
using Talend Integration Suite.
Installed and configured Kafka brokers, ZooKeeper (4.x and 5.x), Kafka Connect, Schema Registry,
KSQL, REST Proxy, and Kafka Control Center.
Experienced in data modeling techniques employing data warehousing concepts such as star,
snowflake, and extended star schemas, including dimensional modeling concepts such as star join
schema modeling, snowflake modeling, fact and dimension tables, and physical and logical data
modeling.
Proficiency in multiple databases, including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
Excellent programming skills at a high level of abstraction using Scala, Java, and Python.
Experience with Hive partitioning and bucketing, performing joins on Hive tables, and
implementing Hive SerDes.
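The partitioning and bucketing mentioned above can be illustrated with a small stdlib sketch of how Hive lays out such a table: rows land in a partition directory derived from the partition column, and within it in a bucket file chosen by a hash of the bucketing column. The table and column names are hypothetical, and the hash is a simplified stand-in for Hive's real bucketing hash.

```python
# Sketch of Hive-style partitioning and bucketing (illustrative only):
# partition column -> directory, bucket column -> hash % num_buckets.
NUM_BUCKETS = 4

def partition_path(table: str, country: str) -> str:
    # Hive-style partition directory, e.g. sales/country=US
    return f"{table}/country={country}"

def bucket_id(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Simplified stand-in for Hive's hash-based bucket assignment
    return hash(user_id) % num_buckets
```

Partition pruning then lets a query touch only the matching directories, and bucketing enables efficient bucketed joins and sampling.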
Experience developing Kafka producers and consumers for streaming millions of events per second.
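One detail behind the producer work above is how keyed events are routed to partitions so that per-key ordering is preserved. The sketch below shows that routing idea in plain Python; Kafka's actual default partitioner uses murmur2, so the md5 hash here is only an illustrative stand-in, and the key and partition count are hypothetical.

```python
import hashlib

# Sketch of key-based partition routing in a Kafka producer:
# records with the same key always land on the same partition,
# preserving per-key ordering. (Kafka really uses murmur2; md5 is
# just a deterministic stand-in for illustration.)
NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, so one consumer sees that key's events in order.
p1 = partition_for("user-42")
p2 = partition_for("user-42")
```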
Experience in dimensional modeling using star and snowflake schema methodologies for data
warehouse and integration projects.
Hands-on experience designing and developing Spark applications in Scala and PySpark, and
comparing the performance of Spark with Hive and SQL/Oracle.
Experience in writing complex SQL queries, creating reports and dashboards.
Working experience building RESTful web services and RESTful APIs.
Experience designing intuitive dashboards and reports, and performing analysis and visualizations
using Tableau, Power BI, Arcadia, and Matplotlib.
TECHNICAL SKILL-SET
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka,
Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, Apache, AWS
Machine Learning / Classification Algorithms: Logistic Regression, Decision Tree, Random Forest,
K-Nearest Neighbour (KNN), Principal Component Analysis
Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Pig Latin, HiveQL, Scala,
Regular Expressions
Web Technologies: HTML, JavaScript, RESTful, SOAP
Operating Systems: Windows (98/2000/XP/7/8/10), Mac OS, UNIX, Linux, Ubuntu, CentOS
Version Control: Git, GitHub
IDEs & Tools: Eclipse, Visual Studio, NetBeans, JUnit, CI/CD, SQL Developer, MySQL Workbench,
Tableau
Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake,
NoSQL databases (HBase, MongoDB)
Data Engineering / Big Data / Cloud / Visualization Tools: Databricks, HDFS, Hive, Pig, Sqoop,
MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, MLlib, Oozie, ZooKeeper, AWS,
Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, Google Shell, Linux,
Bash Shell, UNIX, Tableau, Power BI, SAS, Crystal Reports, Dashboard Design
EDUCATION
Environment: Hadoop 2.x, Hive v2.3.1, Spark v2.1.3, Databricks, Lambda, Glue, Azure, ADF, Blob,
Cosmos DB, Python, PySpark, Java, Scala, SQL, Sqoop v1.4.6, Kafka, Airflow v1.9.0, Oozie, HBase,
Oracle, Teradata, Cassandra, MLlib, Tableau, Maven, Git, Jira.
Worked as a Data Engineer to review business requirements and compose source-to-target data
mapping documents.
Advised the business on best practices in Spark SQL while making sure the solution met the
business needs.
Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Involved in the preparation, distribution, and collaborative upkeep of client-specific quality
documentation for Big Data and Spark developments, with regular monitoring to reflect
modifications or enhancements made in Confidential schedulers.
Designed dimensional data models using Star and Snowflake Schemas.
Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture
UI updates.
Experience with Vagrant, AWS, and Kubernetes-based container deployments to create self-service
environments for dev teams and containerization.
Implemented a production-ready, load-balanced, highly available, fault-tolerant Kubernetes
infrastructure.
Used Azure Kubernetes Service (AKS) to deploy a managed Kubernetes cluster in Azure, built an
AKS cluster via the Azure portal and the Azure CLI, and used template-driven deployment options
such as Azure Resource Manager templates and Terraform.
Used Kubernetes to deploy, scale, load-balance, and manage Docker containers with multiple
namespaced versions.
Worked in a Snowflake environment to remove redundancy and loaded real-time data from various
data sources into HDFS using Kafka.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Designed and built an ETL pipeline in Python to fetch data from the Redshift data warehouse and
applications.
Accessed Hive tables using the Spark HiveContext (Spark SQL) and used Scala for interactive
operations.
Created S3 buckets in the AWS environment to store marketing data and copied data to the
Redshift cluster.
Used EMR (Elastic MapReduce) to perform big data operations in AWS.
Implemented a fully operational, production-grade, large-scale data solution on Snowflake Data
Warehouse.
Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python,
and migrated on-premises big data workloads to AWS.
Wrote Python scripts to parse XML documents and load the data into a database.
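The XML-to-database scripts above can be sketched with only the standard library: parse the document with `xml.etree.ElementTree` and insert the rows with `sqlite3`. The element names, columns, and in-memory database are illustrative, since the original scripts' schema is not specified.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Minimal sketch: parse an XML document and load its records into a
# database table (hypothetical schema, in-memory SQLite for brevity).
xml_doc = """
<customers>
  <customer id="1"><name>Ada</name></customer>
  <customer id="2"><name>Grace</name></customer>
</customers>
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

root = ET.fromstring(xml_doc)
rows = [(int(c.get("id")), c.findtext("name")) for c in root.iter("customer")]
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
conn.commit()

loaded = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
# loaded -> [(1, 'Ada'), (2, 'Grace')]
```

For large documents, `ET.iterparse` would stream elements instead of loading the whole tree into memory.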
Performed QA on the data and added data sources, snapshots, and caching to the reports.
Involved in SQL development, unit testing, and performance tuning, ensuring testing issues were
resolved on the basis of defect reports.
Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Involved in preparing SQL and PL/SQL coding convention and standards.
Involved in Data mapping specifications to create and execute detailed system test plans.
Environment: Agile, ODS, OLTP, ETL, HDFS, Kafka, AWS, S3, Python, K-means, XML, SQL, Talend,
Redshift, Glue, Lambda, MS SQL, MongoDB, Ambari, Power BI, Azure DevOps, Ranger, Git, Spark,
Hive, Scala.
Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming,
Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
Worked with different file formats such as Parquet, ORC, JSON, and text files.
Used Spark SQL to load data, created schema RDDs on top of it that load into Hive tables, and
handled structured data using Spark SQL.
Analyzed the Hadoop cluster using various big data analytics tools, including Flume, Pig, Hive,
HBase, Oozie, ZooKeeper, Sqoop, Spark, and Kafka.
As a Big Data Developer, implemented solutions for ingesting data from various sources and
processing data-at-rest utilizing big data technologies such as Hadoop, MapReduce frameworks,
MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several
types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
Worked on the data support team handling bug fixes, schedule changes, memory tuning, schema
changes, and loading of historic data.
Implemented checkpoints such as Hive count checks, Sqoop record checks, done-file creation
checks, done-file checks, and touch-file lookups.
Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala,
Java, GitHub, Talend Big Data Integration, Impala.