Points To Write Spark Dev Resume
4 years of experience across the Big Data ecosystem, including Spark and Hadoop.
Extensive experience with Spark SQL, Spark Streaming and SparkR, including tuning
of Spark applications.
Strong experience with AWS EMR and Azure cluster setup, Spark installation, and HDFS
and MapReduce architecture, along with good knowledge of Spark, Scala and Hadoop
distributions such as Hortonworks, Apache Hadoop, Cloudera and Azure.
Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop,
Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, SparkR and Flink.
*Uploaded and processed terabytes of data from various structured and unstructured
sources into HDFS (AWS cloud) using Sqoop and Flume.
*Involved in cluster coordination services through ZooKeeper.
*Good experience in Core Java and J2EE technologies such as JDBC, Servlets and JSP.
*Hands-on knowledge of core Java concepts such as exceptions, collections, data
structures, multithreading, serialization and deserialization.
*Experience in designing user interfaces using HTML, CSS, JavaScript and JSP.
*Developed web applications with the open-source Java framework Spring, using
Spring MVC.
Skills
---------
Hadoop Technologies
Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper and Oozie
Java/J2EE Technologies
Core Java, Servlets, Hibernate, Spring, Struts.
NoSQL Databases
HBase, Cassandra
Programming Languages
Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
Web Technologies
HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
Application Servers
WebLogic, WebSphere, JBoss, Tomcat
Cloud Computing tools
Amazon AWS
Build Tools
Jenkins, Maven, ANT, SBT
Databases
MySQL, Oracle, DB2
Business Intelligence Tools
Tableau, Splunk
Development Methodologies
Agile/Scrum, Waterfall
Development Tools
Microsoft SQL Studio, Toad, Eclipse, NetBeans
Operating Systems
Windows 95/98/2000/XP, MAC OS, UNIX, LINUX
-----------------
Work History
-----------------
Responsible for the design and development of Spark SQL scripts based on functional
specifications.
Responsible for Spark Streaming configuration based on the type of input source.
Wrote MapReduce jobs to parse web logs stored in HDFS.
Developed services to run MapReduce jobs as required.
Imported and exported data into HDFS, Hive and Pig using Sqoop.
Responsible for managing data coming from different sources.
Monitored running MapReduce programs on the cluster.
Responsible for loading data from UNIX file systems to HDFS.
Installed and configured Hive and wrote Pig/Hive UDFs.
Involved in creating Hive tables, loading them with data and writing Hive queries
that run as MapReduce jobs in the backend.
Wrote MapReduce programs to convert text files into Avro and load them into Hive
tables.
Implemented workflows using the Apache Oozie framework to automate tasks.
Worked with NoSQL databases such as HBase, creating HBase tables to load large sets
of semi-structured data from various sources.
Developed design documents considering all possible approaches and identifying the
best of them.
Loaded data into HBase using bulk and non-bulk loads.
Developed scripts that automated end-to-end data management and synchronization
between all the clusters.
Explored Spark to improve the performance and optimization of existing algorithms
in Hadoop.
Imported data from different sources such as HDFS and HBase into Spark RDDs.
Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDD,
Scala and Python.
Involved in gathering the requirements, designing, development and testing.
Followed agile methodology for the entire project.
Prepared technical and detailed design documents.
Environment: Hive, HBase, Flume, Java, Maven, Impala, Splunk, Pig, Spark, Oozie,
Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala,
Python.
Hadoop developer.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consumed data from Kafka queues using Spark.
Configured different topologies for the Spark cluster and deployed them on a regular
basis.
Loaded and transformed large sets of structured, semi-structured and unstructured
data.
Involved in loading data from the Linux file system to HDFS.
Imported and exported data into HDFS and Hive using Sqoop.
Implemented partitioning, dynamic partitions and buckets in Hive.
Configured property files such as core-site.xml, hdfs-site.xml and mapred-site.xml
based on job requirements.
Performed linear regression using the Spark Scala API.
Installed and configured MapReduce, Hive and HDFS; implemented a CDH5 Hadoop cluster
on CentOS.
Assisted with performance tuning, monitoring and troubleshooting.
Created MapReduce programs for some refined queries on big data.
Involved in the development of Pig UDFs to pre-process data for analysis.
Involved in setting up HBase to use HDFS.
Used Hive partitioning and bucketing to optimize the performance of Hive tables,
creating around 20,000 partitions.
Created RDDs in Spark and extracted data from the data warehouse into Spark RDDs.
Used Spark with Scala.
Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyzed the Hadoop cluster using big data analytic tools including Pig, Hive and
MapReduce.
Involved in exploring Hadoop, MapReduce programming and its ecosystem.
Implemented MapReduce programs and algorithms for organizing data, performing
aggregations, joining data sets, filtering, classification and partitioning.
Imported and exported data into HDFS and Hive using Sqoop.
Wrote UDFs (user-defined functions) in Pig and Hive when needed.
Developed Pig scripts for processing data.
Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
Involved in creating Hive tables, loading data and writing Hive queries.
Wrote Hive queries for data analysis to meet business requirements.
Created HBase tables to store various data formats of incoming data from different
portfolios.
Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
Automated the History and Purge Process.
Implemented partitioning, dynamic partitions and buckets in Hive.
Loaded and transformed large sets of structured, semi-structured and unstructured
data.
Validated data using the MD5 algorithm.
Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster
environment.
Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux
scripts.
Used Avro and Parquet file formats for data serialization.
Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developed Web module using Spring MVC, JSP.
Developed model logic using the Hibernate ORM framework.
Handled server-side validations.
Involved in bug fixing.
Involved in unit testing using JUnit.
Wrote the technical design document.
Gathered specifications from the requirements.
Developed the application using Spring MVC architecture.
Developed JSP custom tags to support custom user interfaces.
Developed front-end pages using JSP, HTML and CSS.
Developed core Java classes for utility classes, business logic, and test cases.
Developed SQL queries using MySQL and established connectivity.
Used Stored Procedures for performing different database operations.
Used Hibernate for interacting with Database.
Developed control classes for processing the request.
Implemented exception handling.
Designed sequence diagrams and use case diagrams for proper implementation.
Used Rational Rose for design and implementation.
Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developed Spark programs using Scala APIs to compare the performance of Spark with
Hive and SQL.
Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
Implemented Spark using Scala and SparkSQL for faster testing and processing of
data.
Designed and created Hive external tables using a shared metastore instead of Derby,
with partitioning, dynamic partitioning and buckets.
Used Impala for querying HDFS data to achieve better performance.
Implemented Apache Pig scripts to load data from and store data into Hive.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions
on the RDDs.
Used JSON and XML SerDes for serialization and deserialization to load JSON and XML
data into Hive tables.
Worked with, and learned a great deal from, Amazon Web Services (AWS) cloud services
such as EC2, S3, EBS, RDS and VPC.
Imported data from different sources such as AWS S3 and LFS into Spark RDDs.
Worked with various HDFS file formats such as Avro and SequenceFile and compression
formats such as Snappy.
Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery and
backup.
Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables,
and handled structured data using Spark SQL.
Developed Spark/MapReduce jobs to parse JSON and XML data.
Involved in HBase setup and storing data in HBase for later analysis.
Used Scala libraries to process XML data stored in HDFS and wrote the processed data
back to HDFS.
Loaded data into Spark RDDs and performed in-memory computation to generate the
output response.
Used Spark for interactive queries, processing of streaming data and integration
with popular NoSQL databases for huge volumes of data.
Wrote Pig scripts to clean up the ingested data and created partitions for the daily
data.
Involved in converting Hive/SQL queries into Spark transformations using Spark
RDDs, Scala and Python.
Analyzed the SQL scripts and designed the solution to implement using PySpark.
Involved in converting MapReduce programs into Spark transformations using Spark
RDD in Scala.
Developed Spark scripts using Scala shell commands as required.
Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster
processing of data.
Used Avro, Parquet and ORC data formats to store data in HDFS.
Used Oozie workflows to coordinate Pig and Hive scripts.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie,
Flume, Scala, AWS, Python, Java, JSON, SQL Scripting and Linux Shell Scripting,
Avro, Parquet, Hortonworks.