
 Hands-on experience with the Hadoop stack (e.g. MapReduce, HDFS, Spark, Spark Streaming, Storm, Sqoop, Pig, Hive, HBase, Flume, Zookeeper, Avro, Mahout, Oozie, etc.)
 Involved in design, architecture and implementation of large scale streaming applications using Hadoop
Lambda architecture.
 Worked on multiple NoSQL platforms like HBase, Cassandra, etc. (e.g. key-value stores, graph databases, document-oriented DBs)
 Hands-on experience with analytics algorithms such as Bayesian methods, decision trees, regression, random forests, neural networks, genetic algorithms, etc.
 Hands-on experience with "productionalizing" Hadoop applications (e.g. administration, configuration

 management, monitoring, debugging, and performance tuning, continuous builds and continuous

deployments)

 Excellent analytical, problem-solving, communication and interpersonal skills, with the ability to interact with individuals at all levels and to work as part of a team as well as independently.

 Around 2 years of banking industry experience with the Big Data Hadoop framework and related Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Zookeeper, Oozie, Scala and Spark.
 Implemented Hadoop-based enterprise data warehouses and integrated Hadoop with Enterprise Data Warehouse systems.
 Involved in writing Pig scripts to reduce job execution time.
 Experienced with Sqoop to export/import data from an RDBMS to Hadoop and vice versa.
 Involved in creating HBase tables and storing and retrieving data from them.
 Experienced in writing Hive UDFs and Pig UDFs based on requirements.
 Developed complex Pig scripts and Hive queries.
 Experience in installing, configuring and administering Hadoop clusters on major Hadoop distributions like Apache Hadoop and Cloudera.
 Good knowledge of Hadoop Streaming with Python.
 Good knowledge of Big Data concepts.
 Troubleshot MapReduce jobs, Pig Latin scripts and Hive queries.
 Experience in configuring, installing, supporting and monitoring Hadoop clusters using the Apache Cloudera distribution.
 Around 8 years of professional experience in IT, including 3 years of hands-on experience in Big Data and Hadoop ecosystem components.
 In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
 Experience in writing MapReduce programs using Apache Hadoop for analyzing Big Data (a minimal Java sketch follows this list).
 Hands-on experience in writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
 Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS.
 Experience in writing Hadoop Jobs for analyzing data using Pig Latin Commands.
 Good Knowledge of analyzing data in HBase using Hive and Pig.
 Working Knowledge in NoSQL Databases like HBase and Cassandra.
 Good knowledge of Amazon AWS concepts like the EMR and EC2 web services, which provide fast and efficient processing of Big Data.
 Experience in using the Hadoop packages for R known as RHadoop.
 Experience in launching EC2 instances for Amazon EMR using the Console.
 Extended Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs.
 Experience in administrative tasks such as installing Hadoop and its ecosystem components such as Hive
and Pig in Distributed Mode.
 Knowledge of the security requirements for Hadoop and of integrating with Kerberos authentication and authorization infrastructure.
 Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from
application servers.
 Passionate towards working in Big Data and Analytics environment.
 Experience in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling
workflows.
 Working Knowledge in configuring and monitoring tools like Ganglia and Nagios.
 Knowledge of reporting tools like Tableau, which is used for analytics on data in the cloud.
 Extensive experience with SQL, PL/SQL, Shell Scripting and database concepts.
 Experience in developing applications using Java & J2EE technologies.
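
Illustrative example (a minimal sketch, not code from any engagement above): a Java MapReduce program of the kind referenced in the bullets, counting requests per client IP in raw web-server logs. The class names and the assumption that the IP is the first token of each line are hypothetical.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class IpHitCount {

    // Emits (clientIp, 1) for every log line; assumes the IP is the first whitespace-delimited token.
    public static class IpMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text ip = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                ip.set(fields[0]);
                context.write(ip, ONE);
            }
        }
    }

    // Sums the counts per IP; also usable as a combiner because addition is associative.
    public static class IpReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}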

Roles & Responsibilities:

 Involved in implementation of the project using components like HDFS, Sqoop, Hive, MapReduce, MongoDB and Oozie.

 Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like
Hive, Pig, HBase, Zookeeper and Sqoop.
 Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
 Managing and scheduling Jobs on a Hadoop cluster.
 Deployed Hadoop clusters in the following modes: standalone, pseudo-distributed and fully distributed.
 Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
 Developed Pig UDFs to pre-process the data for analysis (a minimal UDF sketch in Java follows this list).
 Developed Hive queries for the analysts.
 Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with
Pig.
 Cluster co-ordination services through Zookeeper.
 Collected log data from web servers and integrated it into HDFS using Flume.
 Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
 Managed and reviewed Hadoop log files
 Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployments, configuration management, backup, and disaster recovery systems and procedures.
 Involved in analyzing system failures, identifying root causes, and recommending courses of action.
 Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing
Hadoop clusters.
 Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job
performance and capacity planning using Cloudera Manager.
 Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
 Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile
and network devices and pushed to HDFS.
 Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
 Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports
for the BI team.
 Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (like Java MapReduce, Pig, Hive, Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
 Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
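
Illustrative example (a minimal sketch; the class name and the clean-up rule are placeholders): a Pig UDF in Java of the kind referenced above, written against the EvalFunc API. In a Pig Latin script it would be registered with REGISTER myudfs.jar; and called as TrimAndLower(field).

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Normalizes a string field before analysis so that downstream GROUP BY keys are consistent.
public class TrimAndLower extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}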

Responsibilities
 Capable of processing large sets of structured, semi-structured and unstructured data and supporting
systems application architecture.
 Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data
in partitioned tables in the EDW.
 Implemented requirements using the Big Data ecosystem Cloudera CDH 5 (Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie and Flume).
 Created Hive queries that helped business users spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (a minimal JDBC sketch follows this list).
 Used Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
 Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping,
design and review.
 Managed and reviewed Hadoop log files.
 Provide assistance for troubleshooting and resolution of problems relating to Hadoop jobs.
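
Illustrative example (a minimal sketch; the host, table and column names are placeholders): running a HiveQL comparison query of the kind described above from Java over JDBC against HiveServer2.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTrendQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/default", "etl_user", "");
             Statement stmt = con.createStatement();
             // Flag products whose fresh daily count is well above the historical average.
             ResultSet rs = stmt.executeQuery(
                 "SELECT f.product_id, f.daily_count, r.avg_count "
                 + "FROM fresh_metrics f JOIN reference_metrics r ON f.product_id = r.product_id "
                 + "WHERE f.daily_count > 2 * r.avg_count")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2) + "\t" + rs.getLong(3));
            }
        }
    }
}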

Responsibilities
 Worked on a live 30-node Hadoop cluster running the Hortonworks Data Platform.
 Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
 Extracted data from various logs using Flume.
 Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
 Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
 Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
 Developed UDFs in Java as and when necessary to use in Pig and Hive queries (a minimal Hive UDF sketch follows this list).
 Developed Oozie workflows for scheduling and orchestrating the ETL process, and assisted users and the application support team whenever required.
 Provide assistance for troubleshooting and resolution of problems relating to Hadoop jobs.
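
Illustrative example (a minimal sketch; the class name, the masking rule and the registration statements are placeholders): a Hive UDF in Java of the kind referenced above. In Hive it would be registered with ADD JAR myudfs.jar; CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Masks the local part of an e-mail address, keeping the first character and the domain.
public final class MaskEmail extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String s = input.toString();
        int at = s.indexOf('@');
        if (at <= 1) {
            return input;
        }
        return new Text(s.charAt(0) + "***" + s.substring(at));
    }
}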

 Worked on a live 67-node Hadoop cluster running CDH 5.3.2.
 Worked with highly unstructured and semi-structured data of 90 TB in size.
 Interacted with users and Business Analysts to collect and understand the business requirements.
 Exported data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
 Developed simple to complex MapReduce jobs using Pig, Hive and Sqoop.
 Organized data into tables, performing transformations, and simplifying complex queries with Hive.
 Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
 Created HBase tables to load large sets of structured, semi-structured and unstructured data (a minimal client-API sketch follows this list).
 Developed Oozie workflow and maintained several batch jobs to run automatically depending on business
requirements.
 Involved in managing and reviewing Hadoop log files.
 Worked in an Agile environment, maintaining story points in the Scrum model.
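
Illustrative example (a minimal sketch in the HBase 1.x client-API style; the table, column family and row-key layout are placeholders): writing and reading back a row of the kind described above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("web_events"))) {
            // Write one event, keyed by "userId_timestamp".
            Put put = new Put(Bytes.toBytes("user42_20150601T1200"));
            put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("page"), Bytes.toBytes("/checkout"));
            table.put(put);
            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("user42_20150601T1200")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("e"), Bytes.toBytes("page"))));
        }
    }
}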
(Admin & support)
 Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
Worked on designing and upgrading CDH 4 to CDH 5.
Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes,
cluster capacity planning, cluster Monitoring and Troubleshooting.
o Managed and reviewed Hadoop log files and logged cases with Cloudera Manager.
o Monitored and reported data usage across the clusters.
o Performed Cloudera Manager admin activities and provided access to users based on their levels/requirements.
 Implemented best practices regarding system monitoring, change control and service level agreements.
o Installed and managed multiple Hadoop clusters - Production, stage, development.
 Administered Pig, Hive and HBase, installing updates, patches and upgrades.
o Collected and aggregated large amounts of log data using Apache Flume.

 Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
o Wrote MapReduce jobs using the Java API (a minimal driver sketch follows this list).
o Developed scripts and batch jobs to schedule various Hadoop programs.
o Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
o Installed and configured Pig and wrote Pig Latin scripts.
o Developed Pig UDFs to pre-process the data for analysis.
o Developed Hive queries for the analysts.
o Wrote Hive queries for data analysis to meet the business requirements.
o Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with
Pig.
 Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
 Took part in monitoring, troubleshooting and managing Hadoop log files.
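
Illustrative example (a minimal sketch; the input/output paths are placeholders and the mapper/reducer are the IpHitCount classes sketched earlier in this document): a MapReduce driver written against the Java Job API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IpHitCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "ip-hit-count");
        job.setJarByClass(IpHitCountDriver.class);
        job.setMapperClass(IpHitCount.IpMapper.class);
        job.setCombinerClass(IpHitCount.IpReducer.class); // sum counts map-side as well
        job.setReducerClass(IpHitCount.IpReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/weblogs/raw"));
        FileOutputFormat.setOutputPath(job, new Path("/data/weblogs/hits_by_ip"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}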

 Responsibilities:
 Worked with business teams and created Hive queries for ad hoc access.
 Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
 Involved in review of functional and non-functional requirements
 Responsible for managing data coming from different sources.
 Installed and configured Hadoop ecosystem components like HBase, Flume, Pig and Sqoop.
 Loaded daily data from websites to Hadoop cluster by using Flume.
 Involved in loading data from UNIX file system to HDFS.
 Creating Hive tables and working on them using Hive QL.
 Created complex Hive tables and executed complex Hive queries on Hive warehouse.
 Wrote MapReduce code to convert unstructured data to semi-structured data.
 Developed programs in Spark, based on the application, for faster data processing than standard MapReduce programs (a minimal Java sketch follows this list).
o Used Pig for extraction, transformation and loading of semi-structured data.
o Installed and configured Hive and wrote Hive UDFs.
o Developed Hive queries for the analysts.
o Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
o Cluster co-ordination services through ZooKeeper.
o Collected log data from web servers and integrated it into HDFS using Flume.
o Creating Hive tables and working on them using Hive QL.

 Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats to text files.
o Design and implement Map Reduce jobs to support distributed data processing.
o Supported MapReduce programs running on the cluster.
o Involved in HDFS maintenance and loading of structured and unstructured data.
o Wrote MapReduce jobs using the Java API.
o Designed NoSQL schemas in HBase.
o Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
o Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
o Developed Pig UDFs to pre-process the data for analysis.
o Experienced in Agile Methodologies and SCRUM Process.
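
Illustrative example (a minimal sketch using the Spark 1.x Java API and Java 8 lambdas; the paths are placeholders): the same hit-count logic expressed as in-memory Spark transformations instead of a MapReduce job.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkIpHitCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark-ip-hit-count");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///data/weblogs/raw");
            JavaPairRDD<String, Integer> counts = lines
                .map(line -> line.split("\\s+")[0])   // client IP assumed to be the first token
                .mapToPair(ip -> new Tuple2<>(ip, 1))
                .reduceByKey(Integer::sum);           // aggregation happens in memory
            counts.saveAsTextFile("hdfs:///data/weblogs/hits_by_ip_spark");
        }
    }
}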

 Responsibilities:
o Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in
Java for data cleansing and preprocessing.
o Evaluated business requirements and prepared detailed specifications that follow project guidelines
required to develop written programs.
o Responsible for building scalable distributed data solutions using Hadoop.
o Analyzed large data sets to determine the optimal way to aggregate and report on them.
o Handled importing of data from various data sources, performed transformations using Hive,
MapReduce, and loaded data into HDFS.
o Importing and exporting data into HDFS using Sqoop.
o Wrote MapReduce code to convert unstructured data into semi-structured data and loaded it into Hive tables.
o Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and
troubleshooting.
o Worked extensively in creating MapReduce jobs to power data for search and aggregation
o Developed programs in Spark based on the application for faster data processing than standard
MapReduce programs.
o Worked extensively with Sqoop for importing metadata from Oracle.
o Extensively used Pig for data cleansing.
o Created partitioned tables in Hive.
o Managed and reviewed Hadoop log files.
o Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
o Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
o Installed and configured Pig and wrote Pig Latin scripts.
o Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
o Created HBase tables to store data in various formats coming from different portfolios.
o Developed MapReduce jobs to automate transfer of data from HBase.
o Used SVN and TortoiseSVN version control tools for code management (check-ins, check-outs and synchronizing).
 BIG DATA EXPERIENCE
o Coding Experience in Spark using Scala.
o Experience in Hadoop architecture and Map Reduce programming.
o Experience in Data Analysis using Hive and Impala.
o Experience in ETL using SQOOP, FLUME, Hive and HDFS
o Experience in Data Retrieval optimization, Pre-processing, Joins and Filtering Patterns.
o Production ETL experience with Oracle, MySQL, DB2 and legacy systems.
o Experience in HiveQL and Pig Latin for creating reports and writing scripts for business use cases.
o Experience in data modeling in NoSQL databases (Cassandra, HBase).
o Experience in preprocessing of weblogs.
o Experience in setting up Cloudera CDH3 and CDH4 Hadoop clusters.
o Proficient in installation and configuration of the Cloudera components HDFS, Sqoop, Flume, Oozie, Hive and Spark.

 Responsibilities:
o Configured a Spark Streaming application to stream syslogs and various application logs from 100+ nodes for monitoring and alerting, as well as to feed the data to dynamic dashboards (a minimal sketch follows this list).
o Migrated traditional MapReduce jobs to Spark jobs.
o Worked on Spark SQL and Spark Streaming.
o Imported and exported files to and from HDFS, Hive and Impala.
o The processed results were consumed by Hive, scheduling applications and various other BI reports through data-warehousing multi-dimensional models.
o Ran ad-hoc queries through Pig Latin, Hive or Java MapReduce.
o Wrote Pig scripts and executed them using the Grunt shell.
o Performed big data analysis using Pig, Hive and user-defined functions (UDFs).
o Performed joins, group by and other operations in MapReduce using Java or Pig Latin.
o Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
o Collected log data from the web servers and integrated it into HDFS using Flume.
o Used Java setter and getter methods in the reducer to set and get values to and from the Java JAR.
o Processed the output from Pig and Hive and formatted it before writing to the Hadoop output file.
o Directed massive cloud migration to Amazon EC2/AWS.
o Design and coding of efficient, reliable and scalable AWS infrastructure.
o Used Hive table definitions to map the output files to tables.
o Involved in managing and reviewing Hadoop log files
o Developed Scripts and Batch Jobs to schedule various Hadoop Program
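
Illustrative example (a minimal sketch; the socket source, host and port stand in for the Flume/Kafka feed a real deployment would use): a Spark Streaming job in Java that watches incoming syslog lines and counts ERROR messages per 30-second batch, of the kind referenced above.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SyslogErrorMonitor {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("syslog-error-monitor");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));
        JavaReceiverInputDStream<String> logs = ssc.socketTextStream("syslog-relay-host", 5140);
        JavaDStream<Long> errorCounts = logs
            .filter(line -> line.contains(" ERROR "))
            .count();
        errorCounts.print(); // in a real job this would feed a dashboard or alerting sink
        ssc.start();
        ssc.awaitTermination();
    }
}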

 Responsibilities:
o Worked on multiple projects spanning architecting, installation, configuration and management of Hadoop clusters.
o Installed and managed multiple Hadoop clusters - Production, stage, development.
o Installed and managed a production cluster of 150 nodes with 4+ PB of storage.
o Managed multiple Hadoop clusters spanning up to 250 nodes.
o Involved in analyzing system failures, identifying root causes, and recommending courses of action on production and lab clusters.
o Designed the Cluster tests before and after upgrades to validate the cluster status.
o Regularly commissioned/decommissioned nodes as disk failures occurred, using Cloudera Manager.
o Documented and prepared run books of systems processes and procedures for future reference.
o Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
o Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of
the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
o Developed entire data transfer model using Sqoop framework.
o Performed Benchmarking and performance tuning on the Hadoop infrastructure.
o Automated data loading between production and disaster recovery cluster.
o Migrated Hive schemas from the production cluster to the DR cluster.
o Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
o Exported data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
o Worked on migrating applications from relational database systems by doing PoCs.

 Responsibilities:
o Installed Hadoop on clustered environments for Dev/UAT/Prod.
o Strong knowledge of administration and development of HDFS, Hive and Pig, with HiveQL and Pig Latin scripts respectively.
o Installed, upgraded and managed Datameer, onboarding users and maintaining data links.
o Installed and tested Impala from beta versions in lab environments and implemented the GA release in production.
o Configured cluster properties to achieve high cluster performance, taking the cluster hardware configuration as the key criterion.
o Designed the rack topology for the production Hadoop cluster using Cloudera Manager.
o Managed the day-to-day operations of the cluster for backup and support.
o Created internal and external Hive tables and defined static and dynamic partitions as per
requirement for optimized performance
o Conducted root cause analysis and worked with Big Data Analysts, Designers and Scientists in
Troubleshooting map reduce job failures and issues with Hive and MapReduce.
o Migrated the Oozie workflows to the new version during the upgrade of the Hadoop cluster from CDH3u2 to CDH4u6.
o Developed Sqoop jobs for loading data from Oracle and DB2 to Hadoop for history and delta loads.
o Developed shell scripts to report disk usage by users on Hadoop clusters and automated alerting when a user reaches their quota.
o Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
o Resolved user issues and incidents related to onboarding, job failures and any technical support questions.
Map Reduce
o Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
o Expertise in optimizing traffic across the network using Combiners, joining multiple-schema datasets using Joins and organizing data using Partitioners (a minimal custom-Partitioner sketch follows this section).
o Experience in writing Custom Counters for analyzing the data and testing using MRUnit framework
o Experienced in writing complex Map Reduce programs that work with different file formats like Text,
Sequence, Xml and Avro
o Expertise in composing MapReduce pipelines with many user-defined functions to implement complex algorithms.
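
Illustrative example (a minimal sketch; the "yyyyMMdd_userId" key layout is a placeholder): a custom Partitioner of the kind referenced above, routing records by the date prefix of the key so that all records for one day land on the same reducer. It would be wired into a job with job.setPartitionerClass(DayPartitioner.class).

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class DayPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Hash only the date prefix of a "yyyyMMdd_userId" key.
        String day = key.toString().split("_", 2)[0];
        return (day.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}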
 PIG
o Expertise in writing ad-hoc MapReduce programs using Pig scripts.
o Used Pig as an ETL tool to do transformations, event joins, filtering and some pre-aggregations.
o Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
 HIVE
o Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues
o Responsible for performing extensive data validation using HIVE Dynamic Partitioning and
Bucketing
o Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality
of Java into Pig Latin and HQL (Hive QL)
o Worked on different set of tables like External Tables and Managed Tables
o Experience working with different Hive SerDes that handle file formats like Avro and XML.
o Analyzed the data by performing Hive queries and used Hive UDFs for complex querying.
Kafka and Storm
o Expert in implementing a unified data platform to gather data from different sources using Kafka Java producers and consumers (a minimal producer sketch follows this section).
o Experienced in designing Kafka brokers, creating custom partitions and integrating with Apache Storm for transformations.
o Experienced in implementing Kafka SimpleConsumers to get data from specific partitions.
o Experienced in implementing Storm topologies to do pre-processing before moving data to target consumers.
o Experienced in designing/developing Storm spouts and bolts to get data from Kafka sources.
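
Illustrative example (a minimal sketch; the broker address, topic name and payload are placeholders): a Kafka Java producer feeding a unified ingest topic, as described above. Keying by source system keeps records from one source on one partition.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("unified-events", "web-tier", "{\"event\":\"login\"}"));
        }
    }
}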
Spark
o Experienced in migrating MapReduce jobs to Spark transformations using in-memory processing.
o Designed and developed Spark transformations and applied Spark actions to implement algorithms.
o Experienced in working with Spark SQL to analyze structured data.
o Knowledge of implementing predictive algorithms using the Spark MLlib library.
 NoSQL & Others
o Expert database engineer; NoSQL and relational data modeling
o Responsible for building scalable distributed data solutions using Datastax Cassandra.
o Experienced in designing/developing data models for the Cassandra file system and implementing CRUD operations using the Thrift and REST APIs.
o Experienced in using Cassandra Query Language (CQL) to work with the file system (a minimal Java-driver sketch follows this section).
o Expertise in HBase Cluster Setup, Configurations, HBase Implementation and HBase Client API
o Worked on importing data into HBase using HBase Shell and HBase Client API.
o Extensive experience working with MongoDB: creating collections, inserting and finding data.
o Experienced in implementing data ingestion using Apache NiFi data flows.
o Experienced in implementing full-text search analysis using Apache Solr.
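
Illustrative example (a minimal sketch in the DataStax Java driver 3.x style; the contact point, keyspace and table are placeholders): basic CRUD against Cassandra with CQL, of the kind referenced above.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraCrudExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("portfolio")) {
            // Insert one row, then read it back.
            session.execute(
                "INSERT INTO positions (account_id, symbol, quantity) VALUES (?, ?, ?)",
                "acct-1", "HDP", 100);
            ResultSet rs = session.execute(
                "SELECT symbol, quantity FROM positions WHERE account_id = ?", "acct-1");
            for (Row row : rs) {
                System.out.println(row.getString("symbol") + " -> " + row.getInt("quantity"));
            }
        }
    }
}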

 Responsibilities
o Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
o Developed MapReduce programs to parse the raw data, populate staging tables and store the
refined data in partitioned tables in the EDW.
o Implemented requirements using the Big Data ecosystem Cloudera CDH 5 (Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie and Flume).
o Created Hive queries that helped business users spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
o Used Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
o Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
o Managed and reviewed Hadoop log files.
o Provide assistance for troubleshooting and resolution of problems relating to Hadoop jobs.

 Responsibilities
o Worked on a live 30-node Hadoop cluster running the Hortonworks Data Platform.
o Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
o Extracted data from various logs using Flume.
o Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
o Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
o Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
o Developed UDFs in Java as and when necessary to use in PIG and HIVE queries.
o Developed Oozie workflows for scheduling and orchestrating the ETL process, and assisted users and the application support team whenever required.
o Provide assistance for troubleshooting and resolution of problems relating to Hadoop jobs.
Environment: Hadoop, YARN, MapReduce, Spark, Hive, HBase, HDFS, Java (JDK 1.6), Linux, Cloudera, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, UNIX Shell Scripting, Eclipse, Scala.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Spark, HBase, Java 6, Cloudera, Linux, XML, MySQL, MySQL Workbench, Eclipse, Cassandra.
