
 Working as a Software Engineer to design, implement, and support big data applications using Apache Spark, Hadoop, and AWS.


 Having 10+ years of experience in IT, including 3 years of experience using Apache Spark to implement big data applications.
 Involved in developing big data applications using frameworks such as Hadoop, Hive, Sqoop, Spark, Flink, Phoenix, Cassandra, Kafka, and Redshift.
 Extensive experience with Spark SQL and Spark Streaming, including performance tuning of Spark applications.
 Experience in all stages of a project, including requirements gathering, architecture design and documentation, development, performance optimization, data cleaning, and reporting.
 Strong experience with AWS EMR, Spark installation, and HDFS & MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Hortonworks, Apache Hadoop, Cloudera, Azure, Talend, and Qubole.
 Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, shell scripting, Sqoop, NiFi, Kafka, Oozie, Cassandra, Spark SQL, Spark Streaming, and Flink.
 Also working on big data administration activities such as Cloudera, Hortonworks, and Apache Spark installation.
 Foundational knowledge of, and eagerness to implement, Artificial Intelligence and IoT applications; proof-of-concept experience with deep learning, machine learning, and AI.

 Having 4 years of experience across big data ecosystems, including the Spark and Hadoop ecosystems.
 Extensive experience with Spark SQL, Spark Streaming, and SparkR, including performance tuning of Spark applications (see the configuration sketch after this list).
 Strong experience with AWS EMR and Azure cluster setup, Spark installation, and HDFS & MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Hortonworks, Apache Hadoop, Cloudera, and Azure.
 Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, SparkR, and Flink.
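As an illustration of the Spark tuning referenced above, here is a minimal sketch of common tuning knobs set on a SparkSession; the application name and all values are illustrative assumptions, not settings from any specific project:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative tuning knobs only; real values depend on cluster size and workload.
val spark = SparkSession.builder()
  .appName("TunedBatchJob") // hypothetical job name
  .config("spark.sql.shuffle.partitions", "200")   // match shuffle parallelism to data volume
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // faster serialization
  .config("spark.executor.memory", "4g")           // per-executor heap
  .config("spark.dynamicAllocation.enabled", "true") // scale executors with load
  .enableHiveSupport()
  .getOrCreate()
```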

* Hadoop/Spark Developer with 8+ years of overall IT experience across a variety of industries, including hands-on experience in big data analytics and development.
* Good experience with big data technologies such as the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
* Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
* Experienced in writing complex MapReduce programs that work with file formats such as Text, SequenceFile, XML, JSON, and Avro.
* Working experience with the Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment; strong experience with Hadoop distributions such as Cloudera and Hortonworks.
* Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
* Expertise in database design, schema creation and management, and writing stored procedures, functions, and DDL/DML SQL queries.
* Worked on HBase to load and retrieve data for real-time processing using its REST API.
* Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
* Good working experience using Sqoop to import data from an RDBMS into HDFS or Hive, and to export data from HDFS or Hive back to an RDBMS.
* Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs).
* Worked with big data distributions such as Cloudera (CDH 3 and 4) with Cloudera Manager.
* Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end; also familiar with Pentaho and Informatica as ETL tools for big data.
* Worked with BI tools like Tableau for report creation and further analysis from
the front end.
* Extensive knowledge in using SQL queries for backend database analysis.
* Involved in unit testing of MapReduce programs using Apache MRUnit.
* Worked on Amazon Web Services, including EC2.
* Excellent Java development skills using J2EE, J2SE, Servlets, JSP, Spring, Hibernate, and JDBC.
* Experience in creating reusable transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Sequence Generator, Normalizer, and Rank) and mappings using Informatica Designer, and in processing tasks with Workflow Manager to move data from multiple sources into targets.
* Implemented SOAP-based web services.
* Used curl scripts to test RESTful web services.
* Experience in database design, using PL/SQL to write stored procedures, functions, and triggers; strong experience writing complex queries for Oracle.
* Experience working with build tools like Maven and Ant.
* Experienced in both Waterfall and Agile (Scrum) development methodologies.
* Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
* Experience in developing service components using JDBC.
* Authorized to work in the US for any employer
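As a sketch of the managed/external Hive table design with partitioning and bucketing described above, the following Scala snippet issues the DDL through Spark SQL (assuming Hive support) so the example stays self-contained; the table names, columns, bucket count, and HDFS path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveTableDesign") // hypothetical job name
  .enableHiveSupport()
  .getOrCreate()

// Managed table: Hive owns both metadata and files. Partitioned by load date,
// bucketed by user id to speed up joins and sampled scans.
spark.sql("""
  CREATE TABLE IF NOT EXISTS sales_managed (
    order_id BIGINT, user_id BIGINT, amount DOUBLE)
  PARTITIONED BY (load_date STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS
  STORED AS ORC
""")

// External table: Hive tracks only metadata; dropping the table
// leaves the files at the HDFS location untouched.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
    order_id BIGINT, user_id BIGINT, amount DOUBLE, load_date STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION 'hdfs:///data/raw/sales/'
""")
```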
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Overall 8 years of experience in the IT industry, including 5+ years as a Hadoop/Spark developer using big data technologies from the Hadoop and Spark ecosystems, and 2+ years of Java/J2EE technologies and SQL.
*Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka, and Spark.
*In-depth understanding of the Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
*Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
*Experienced in writing MapReduce programs in Java to process large data sets using Map and Reduce tasks.
*Experience using accumulator variables, broadcast variables, and RDD caching in Spark Streaming (see the sketch after this list).
*Worked on HBase to perform real-time analytics; experienced with CQL for extracting data from Cassandra tables.
*Hands-on experience in big data application phases such as data ingestion, data analytics, and data visualization.
*Expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
*Experience with Hadoop distributions such as Cloudera 5.3 (CDH5, CDH3), the Hortonworks distribution, and Amazon AWS.
*Experience transferring data from an RDBMS to HDFS and Hive tables using Sqoop.
*Experience creating tables, partitioning, bucketing, loading, and aggregating data using Hive.
*Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
*Experience working with Flume to load log data from multiple sources directly into HDFS.
*Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
*Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
*Experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters; experience manipulating and analyzing large datasets to find patterns and insights within structured and unstructured data.

*Uploaded and processed terabytes of data from various structured and unstructured
sources into HDFS (AWS cloud) using Sqoop and Flume.
*Involved in cluster coordination services through ZooKeeper.
*Good level of experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.
*Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, and serialization/deserialization.
*Experience designing user interfaces using HTML, CSS, JavaScript, and JSP.
*Developed web applications in the open-source Java framework Spring, utilizing the Spring MVC framework.
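A minimal sketch of the broadcast-variable and accumulator usage mentioned above, in Scala; the lookup map and input values are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("BroadcastAccumulatorDemo").getOrCreate()
val sc = spark.sparkContext

// Broadcast a small lookup table once to every executor instead of
// shipping it with each task (hypothetical country-code map).
val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

// Accumulator counting records whose code is missing from the lookup.
val unknownCodes = sc.longAccumulator("unknownCodes")

val codes = sc.parallelize(Seq("US", "IN", "BR", "US"))
val resolved = codes.map { code =>
  // getOrElse's default is evaluated only on a miss, so the
  // accumulator increments only for unknown codes.
  countryNames.value.getOrElse(code, { unknownCodes.add(1); "Unknown" })
}

resolved.collect().foreach(println)
println(s"Unresolved codes: ${unknownCodes.value}")
```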

Skills
---------
Hadoop Technologies
Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
Hadoop Ecosystem Technologies
HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, and Oozie
Java/J2EE Technologies
Core Java, Servlets, Hibernate, Spring, Struts.
NOSQL Databases
HBase, Cassandra
Programming Languages
Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Unix Shell Scripting
Web Technologies
HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
Application Servers
WebLogic, WebSphere, JBoss, Tomcat
Cloud Computing tools
Amazon AWS
Build Tools
Jenkins, Maven, ANT, SBT
Databases
MySQL, Oracle, DB2
Business Intelligence Tools
Tableau, Splunk
Development Methodologies
Agile/Scrum, Waterfall
Development Tools
Microsoft SQL Studio, Toad, Eclipse, NetBeans
Operating Systems
Windows 95/98/2000/XP, macOS, UNIX, Linux

-----------------
Work History
-----------------
Responsible for the design and development of Spark SQL scripts based on functional specifications.
Responsible for Spark Streaming configuration based on the type of input source.
Wrote MapReduce jobs to parse web logs stored in HDFS.
Developed services to run MapReduce jobs on an as-needed basis.
Imported and exported data into HDFS and Hive using Sqoop, and processed it with Pig.
Responsible for managing data coming from different sources.
Monitored running MapReduce programs on the cluster.
Responsible for loading data from UNIX file systems into HDFS.
Installed and configured Hive, and wrote Pig and Hive UDFs.
Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
Wrote MapReduce (Hadoop) programs to convert text files into Avro format and load them into Hive tables.
Implemented workflows using the Apache Oozie framework to automate tasks.
Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
Developed design documents considering all possible approaches and identifying the best of them.
Loaded data into HBase using both bulk and non-bulk loads.
Developed scripts to automate data management end to end and keep all clusters in sync.
Explored Spark to improve the performance and optimization of existing Hadoop algorithms.
Imported data from sources such as HDFS and HBase into Spark RDDs.
Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this job entry).
Involved in gathering requirements, design, development, and testing.
Followed Agile methodology for the entire project.
Prepared technical and detailed design documents.
Environment: Hive, HBase, Flume, Java, Maven, Impala, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala, Python.
Hadoop Developer.
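A minimal sketch of converting a Hive query into equivalent Spark transformations, as referenced above; the web_logs table and its columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder()
  .appName("HiveToSpark") // hypothetical job name
  .enableHiveSupport()
  .getOrCreate()

// Original HiveQL, runnable as-is through Spark SQL:
val viaSql = spark.sql("SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status")

// The same logic expressed as DataFrame transformations:
val viaDf = spark.table("web_logs")
  .groupBy("status")
  .agg(count("*").as("hits"))

viaDf.show()
```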
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consumed data from a Kafka queue using Spark (see the sketch after this job entry).
Configured different topologies for the Spark cluster and deployed them on a regular basis.
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Involved in loading data from the Linux file system into HDFS.
Imported and exported data into HDFS and Hive using Sqoop.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
Involved in performing linear regression using the Scala API and Spark.
Installed and configured MapReduce, Hive, and HDFS; implemented a CDH5 Hadoop cluster on CentOS.
Assisted with performance tuning, monitoring and troubleshooting.
Created MapReduce programs for refined queries on big data.
Involved in developing Pig UDFs to pre-process data for analysis.
Involved in setting up HBase on HDFS.
Used Hive partitioning and bucketing to optimize Hive table performance, creating around 20,000 partitions.
Created RDDs in Spark and extracted data from the data warehouse into Spark RDDs.
Used Spark with Scala.
Environment:
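A minimal sketch of consuming a Kafka topic with Spark Streaming in Scala, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id, and batch interval are illustrative:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaConsumerDemo") // hypothetical app name
val ssc = new StreamingContext(conf, Seconds(10))          // illustrative batch interval

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092", // illustrative broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest"
)

// Direct stream: each Spark partition maps to a Kafka partition.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.map(record => record.value).print()

ssc.start()
ssc.awaitTermination()
```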
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyzed the Hadoop cluster using big data analytics tools including Pig, Hive, and MapReduce.
Involved in exploring Hadoop, MapReduce programming, and the surrounding ecosystem.
Implemented MapReduce programs and algorithms for organizing, aggregating, joining, filtering, classifying, and partitioning data.
Imported and exported data into HDFS and Hive using Sqoop.
Wrote User-Defined Functions (UDFs) in Pig and Hive as needed.
Developed Pig scripts for processing data.
Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
Involved in creating Hive tables, loading data, and writing Hive queries.
Wrote Hive queries for data analysis to meet business requirements.
Created HBase tables to store various formats of incoming data from different portfolios.
Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
Automated the history and purge processes.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Validated data using MD5 checksums (see the sketch after this job entry).
Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
Used Avro and Parquet file formats for data serialization.
Environment:
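A minimal sketch of MD5-based data validation in Scala; the file path and expected checksum are hypothetical:

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Compute the MD5 checksum of a file and compare it against an
// expected value (e.g. one published by the upstream producer).
def md5Of(path: String): String = {
  val bytes = Files.readAllBytes(Paths.get(path))
  MessageDigest.getInstance("MD5")
    .digest(bytes)
    .map("%02x".format(_)) // hex-encode each byte
    .mkString
}

val expected = "9e107d9d372bb6826bd81d3542a419d6" // illustrative checksum
val actual = md5Of("/data/incoming/batch_001.csv") // hypothetical path
if (actual != expected) {
  println(s"Checksum mismatch: expected $expected, got $actual")
}
```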
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developed the web module using Spring MVC and JSP.
Developed model logic using the Hibernate ORM framework.
Handled server-side validations.
Involved in bug fixing.
Involved in unit testing using JUnit.
Wrote the technical design document.
Gathered specifications from the requirements.
Developed the application using the Spring MVC architecture.
Developed JSP custom tags to support custom user interfaces.
Developed front-end pages using JSP, HTML, and CSS.
Developed core Java classes for utilities, business logic, and test cases.
Developed SQL queries using MySQL and established connectivity.
Used stored procedures for performing different database operations.
Used Hibernate for interacting with the database.
Developed controller classes for processing requests.
Implemented exception handling throughout the application.
Designed sequence diagrams and use case diagrams for proper implementation.
Used Rational Rose for design and implementation.
Environment:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
Used Impala for querying HDFS data to achieve better performance.
Implemented Apache Pig scripts to load data from, and store data into, Hive.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after this job entry).
Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
Worked with, and learned a great deal from, Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
Worked with HDFS file formats such as Avro and SequenceFile, and compression codecs such as Snappy.
Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
Developed Spark/MapReduce jobs to parse JSON and XML data.
Involved in HBase setup and in storing data into HBase for later analysis.
Used Scala libraries to process XML data stored in HDFS, writing the processed output back to HDFS.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for high data volumes.
Wrote Pig scripts to clean up ingested data and created partitions for daily data.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
Analyzed SQL scripts and designed the solution to implement them using PySpark.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Developed Spark scripts using Scala shell commands as required.
Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
Used Avro, Parquet, and ORC data formats to store data in HDFS.
Used Oozie workflows to coordinate Pig and Hive scripts.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, AWS, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.
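A minimal sketch of importing JSON data from S3 into Spark and persisting it to a partitioned Hive table, combining several of the steps above; the bucket, schema, and table name are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("S3JsonToHive") // hypothetical job name
  .enableHiveSupport()
  .getOrCreate()

// Read JSON from S3; bucket, path, and columns are illustrative.
val events = spark.read.json("s3a://example-bucket/incoming/events/")

// Typical cleanup transformations before persisting.
val cleaned = events
  .filter("eventType IS NOT NULL")
  .select("eventType", "userId", "ts")

// Write into a Hive table, partitioned by event type.
cleaned.write
  .mode("overwrite")
  .partitionBy("eventType")
  .saveAsTable("analytics.events") // hypothetical database.table
```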
