
Rajasekhar

Hadoop/Big Data Developer


Email: sriram.scalaspark@gmail.com
Mobile: 2196272116

SUMMARY:

 Over 11 years of professional IT experience in Java/J2EE technologies, DevOps build/release management,
change/incident management, and cloud management. 7+ years of Big Data Hadoop ecosystem experience
in ingestion, storage, querying, processing, and analysis of big data.
 Hands-on experience architecting and implementing Hadoop clusters on Amazon Web Services (AWS), using
EMR, EC2, S3, Redshift, Cassandra, MongoDB, Cosmos DB, SimpleDB, Amazon RDS, DynamoDB, PostgreSQL,
SQL, and MS SQL.
 Experience in Hadoop administration activities such as installation, configuration, and management of clusters in
Cloudera (CDH4, CDH5) and Hortonworks (HDP) distributions using Cloudera Manager and Ambari.
 Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS,
MapReduce, Hive, Impala, Sqoop, Pig, Oozie, ZooKeeper, Spark, Solr, Hue, Flume, Storm, Kafka, and YARN.
 Very good knowledge of and experience with AWS services such as EMR and EC2, which provide fast and
efficient processing of big data.
 Experienced in performance tuning of YARN, Spark, and Hive, and in developing MapReduce programs using
Apache Hadoop to analyze big data per requirements.
 Expert in the big data ecosystem using Hadoop, Spark, and Kafka with column-oriented big data systems on
cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
 Exposure to data lake implementation using Apache Spark; developed data pipelines, applied business logic
using Spark, and used Scala and Python to convert Hive/SQL queries into RDD and DataFrame transformations
in Apache Spark (see the sketch after this list).
 Utilized Flume, Kafka, and NiFi to ingest real-time and near-real-time streaming data into HDFS from different
data sources.
 Experience in release management and CI/CD processes using Jenkins, and configuration management using
Visual Studio Online.
 Good understanding of Hadoop security components such as Apache Ranger and Knox.
 Good understanding of and experience with NameNode HA architecture, and experience monitoring cluster
health using Ambari, Nagios, Ganglia, and cron jobs.
 Experienced in cluster maintenance and commissioning/decommissioning of DataNodes, with a good
understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker,
NameNode, DataNode, and MapReduce concepts.
 Experienced in implementing security controls using Kerberos principals, ACLs, and data encryption with
dm-crypt to protect entire Hadoop clusters.
 Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
 Expertise in installation, administration, patching, upgrades, configuration, performance tuning, and
troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris.
 Experienced in scheduling recurring Hadoop jobs with Apache Oozie, and in JumpStart, Kickstart, infrastructure
setup, and installation methods for Linux.
 Experience in administration of RDBMS databases such as MS SQL Server.
 Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
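
The Hive/SQL-to-Spark conversion noted above, shown as a minimal Scala sketch (table and column names such as sales.transactions, customer_id, and amount are illustrative placeholders): the same aggregation is expressed once through Spark SQL and once through DataFrame transformations.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()              // requires a Hive-enabled Spark build
          .getOrCreate()

        // Original HiveQL, run as-is through Spark SQL
        val viaSql = spark.sql(
          """SELECT customer_id, SUM(amount) AS total_amount
            |FROM sales.transactions
            |GROUP BY customer_id""".stripMargin)

        // The equivalent DataFrame transformation chain
        val viaDf = spark.table("sales.transactions")
          .groupBy("customer_id")
          .agg(sum("amount").alias("total_amount"))

        viaDf.show(10)
        spark.stop()
      }
    }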

Technical Competencies:
Hadoop Ecosystem Tools: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, Storm, Kafka, Spark,
Flume, NiFi
Languages: Java (Core Java), Python, C, C++, HTML
Databases: MySQL, Oracle, SQL Server, MongoDB
Data Modeling: Dimensional Data Modeling, Star Join Schema Modeling, Snowflake Modeling
Data Visualization: Tableau, QlikView
Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX
Scripting Languages: Shell Scripting, Python, Puppet
Web Servers: Apache Tomcat, JBoss, Windows Server 2003/2008/2012
Cluster Management Tools: HDP Ambari, Cloudera Manager, Hue, SolrCloud
Build Tools: Jenkins, Maven, Gradle

WORK EXPERIENCE:
VMware Inc., California Apr 2021 to Present
Sr. Big Data Developer
Roles & Responsibilities:

 Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive,
MapReduce, Spark, and shell scripts.
 Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Pig, Hive, and
HBase.
 Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames,
pair RDDs, and Spark on YARN.
 Developed SQL scripts using Spark to handle different data sets and verified their performance against
MapReduce jobs.
 Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop.
 Effectively composed technical and non-technical documents such as design specifications, operation guides,
process flows, and other technical schematics.
 Wrote and updated task/project status and reports.
 Developed queries in SQL, Python, and Looker to amalgamate app navigation, behavior tracking, and profile data
across member types and tenures. Modeled data for exploration and display, building dashboards for
executives, project leads, and marketing professionals across global markets.
 Implemented Spring Boot microservices to process messages through the Kafka cluster.
 Used Spark SQL DataFrames to load MySQL tables into Spark for faster processing of data (see the sketch
after this list).
 Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
 Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases
for large volumes of data.
 Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Workflow
Monitor.
 Worked on different tasks in workflows such as sessions, event raise, event wait, decision, email, command,
worklets, assignment, and timer, as well as workflow scheduling.
 Created sessions and configured workflows to extract data from various sources, transform it, and load it into
the data warehouse.
 Involved in monitoring the workflows and in optimizing the load times.
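
A minimal Scala sketch of the Spark SQL/MySQL access pattern referenced above; the JDBC URL, table name, and credentials are hypothetical placeholders, and the MySQL JDBC driver is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object MySqlToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("mysql-jdbc-sketch").getOrCreate()

        // Read a MySQL table into a DataFrame over JDBC (connection details are placeholders)
        val ordersDf = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Register the DataFrame and query it with Spark SQL
        ordersDf.createOrReplaceTempView("orders")
        spark.sql("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY status").show()

        spark.stop()
      }
    }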

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, Snowflake, AWS
Redshift, Scala, Looker, PySpark, MapR, Informatica PowerCenter 8.6/8.1

Capital One, McLean, VA Oct 2019 to Mar 2021
Sr. Big Data Developer
Roles & Responsibilities:

 Worked directly with the Big Data Architecture Team which created the foundation of this Enterprise Analytics
initiative in a Hadoop-based Data Lake.
 Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in
HDFS on AWS.
 Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like
Hive, Pig, HBase, Zookeeper and Sqoop.
 Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer
Group in Kafka.
 Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as
DataFrames, and saved it in Parquet format in HDFS (see the sketch after this list).
 Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and
financial histories into HDFS for analysis.
 Extracted files from Cassandra and MongoDB through Sqoop, placed them in HDFS, and processed them.
 Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in
HDFS for further analysis.
 Extensively worked on SQL, PL/SQL, UNIX, Talend Open Studio, Snowflake, and Informatica PowerCenter 9.x.
 Excellent track record implementing business applications using RDBMS OLAP and OLTP, with effective use of
Snowflake, Oracle Database, SQL, PL/SQL, Talend Open Studio, and Informatica PowerCenter ETL.
 Upgraded the Hadoop cluster from CDH 4.7 to CDH 5.2 and worked on cluster installation, commissioning and
decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
 Developed Spark scripts to import large files from Amazon S3 buckets and imported data from different
sources such as HDFS and HBase into Spark RDDs.
 Developed queries in SQL, Python & Looker to amalgamate app navigation, behavior tracking, and profile data
across member types and tenures. Modeled data for exploration and display, building dashboards for executives,
project leads, and marketing professionals across global markets.
 Created visualizations and dashboards using Looker and Google Data Studio.
 Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation, and worked on importing
and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
 Worked on installing Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for
the Cloudera Manager Server, and enabled Kerberos using the wizard.
 Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis.
 Monitored the cluster for performance, networking, and data integrity issues, and was responsible for
troubleshooting MapReduce job execution issues by inspecting and reviewing log files.
 Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
 Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including
configuration management, monitoring, debugging, and performance tuning.
 Supported MapReduce programs and distributed applications running on the Hadoop cluster, and scripted
Hadoop package installation and configuration to support fully automated deployments.
 Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for processing
and storage of large data sets, worked with Elastic MapReduce (EMR), and set up a Hadoop environment on
AWS EC2 instances.
 Experienced with Talend Data Fabric ETL components and used context variables and the MySQL, Oracle, and
Hive database components.
 Hands-on involvement with many components in the Talend palette to design jobs, and used context variables
to parameterize Talend jobs.
 Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all Hadoop
clusters, and worked on Hive for further analysis and for transforming files from different analytical formats to
text files. Loaded files into Hive and HDFS from MongoDB and Solr.
 Created Hive external tables, loaded data into them, and queried the data using HQL; worked with application
teams to install operating system and Hadoop updates, patches, and version upgrades as required.
 Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
 Monitored the Hadoop cluster using tools such as Nagios, Ganglia, and Cloudera Manager, and maintained the
cluster by adding and removing nodes using the same tools.
 Worked on Hive to expose data for further analysis and to transform files from different analytical formats to
text files.
 Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done
using Python (PySpark).
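
A minimal sketch of the Kafka-to-Parquet flow referenced above, written here with Spark Structured Streaming rather than the DStream API for brevity; the broker address, topic name, and HDFS paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be available.

    import org.apache.spark.sql.SparkSession

    object KafkaToParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-parquet-sketch").getOrCreate()

        // Subscribe to a Kafka topic (broker and topic are placeholders)
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "customer-events")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

        // Continuously land the stream as Parquet files in HDFS
        val query = stream.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/customer_events")
          .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
          .start()

        query.awaitTermination()
      }
    }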

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, MongoDB,
Snowflake, AWS EMR, AWS S3, AWS Redshift, Scala, Looker, PySpark, MapR, Java, Oozie, Flume, HBase,
Nagios, Ganglia, Hue, Cloudera Manager, ZooKeeper, Cloudera, Oracle, Kerberos, and RedHat 6.5

United Airlines – Chicago, IL Nov 2018 – Sept 2019


Sr. Big Data Developer
Responsibilities:

 Involved in the full project life cycle, from analysis to production implementation, with emphasis on identifying
sources and validating source data, developing logic and transformations per requirements, creating mappings,
and loading data into different targets.
 Loaded periodic incremental imports of structured batch data from various RDBMS to HDFS using Sqoop
 Implemented Kafka consumers for HDFS and Spark Streaming
 Used Spark Streaming to preprocess the data for real-time data analysis
 Worked on basic shell scripting to move data from sources into HDFS and S3; scheduled the scripts using crontab.
 Involved in writing queries using Impala for better and faster processing of data; implemented partitioning in
Impala for faster and more efficient data access.
 Worked on reading multiple data formats such as Avro, Parquet, ORC, JSON, and text (see the sketch after
this list).
 Developed Spark transformation scripts using APIs such as Spark Core and Spark SQL in Scala.
 Worked on writing custom Spark Streaming jobs to ingest data into Elasticsearch after data enrichment in
Spark.
 Provided Elasticsearch tuning and optimization based on specific client data structures, and sized and configured
Elasticsearch clusters and shards.
 Developed various templates to manage multiple Elasticsearch indexes, index patterns, users, and roles.
 Explored new Elastic components that could improve the existing system, such as Elastic Cloud Enterprise to
centralize all clusters, Ansible playbooks, and machine learning features.
 Worked on Apache NiFi to implement basic workflows using prebuilt processors.
 Worked with the team to visualize data using Tableau.
 Experienced as a Senior ETL Developer (Hadoop ETL/Teradata/Vertica/Informatica/DataStage/Mainframe),
Subject Matter Expert (SME), Production Support Analyst, and QA Tester.
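
A minimal Scala sketch of reading the multiple formats mentioned above into Spark DataFrames; all paths are hypothetical placeholders, and on older Spark versions the Avro reader requires the external spark-avro package.

    import org.apache.spark.sql.SparkSession

    object MultiFormatReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("multi-format-sketch").getOrCreate()

        // Each source lands in its own directory; paths are placeholders
        val parquetDf = spark.read.parquet("hdfs:///landing/bookings_parquet")
        val orcDf     = spark.read.orc("hdfs:///landing/bookings_orc")
        val jsonDf    = spark.read.json("hdfs:///landing/bookings_json")
        val avroDf    = spark.read.format("avro").load("hdfs:///landing/bookings_avro")
        val textDf    = spark.read.text("hdfs:///landing/bookings_text")

        // A simple Spark SQL transformation on one of the sources
        parquetDf.createOrReplaceTempView("bookings")
        spark.sql("SELECT origin, COUNT(*) AS flights FROM bookings GROUP BY origin").show()

        spark.stop()
      }
    }
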
Environment: Spark Streaming, Spark SQL, Spark Core, HDFS, S3, EMR, Impala, Kafka, Sqoop, Oozie, Cloudera
Manager, Apache NiFi, ZooKeeper, Elasticsearch.

ADP, NJ May 2017 – Oct 2018
Hadoop Spark Developer
Responsibilities:

 Examined data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
 Developed data pipelines using Sqoop, Spark, and Hive to ingest, transform, and analyze operational data.
 Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
 Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP
servers and data warehouses.
 Streamed data in real time using Spark and Kafka.
 Worked on troubleshooting Spark applications to make them more fault tolerant.
 Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
 Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-
Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python.
 Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the sketch after this list).
 Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to
HBase.
 Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and
efficient joins, transformations, and other features.
 Experience with Kafka in sustaining thousands of megabytes of reads and writes per second on streaming data.
 Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
 Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
 Worked extensively with Sqoop for importing data from Oracle.
 Experience working with EMR clusters in the AWS cloud and working with S3.
 Involved in creating Hive tables and loading and analyzing data using Hive scripts.
 Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
 Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
 Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
 Performed tuning and increased operational efficiency on a continuous basis.
 Worked on Spark SQL, reading and writing data from JSON files, text files, Parquet files, and schema RDDs.
 Worked on POCs with Apache Spark using Scala to implement Spark in the project.
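
A minimal Scala sketch of the Kafka producer pattern noted above; the broker, topic, and payload are hypothetical placeholders, and in the real pipeline the payload would come from an external REST API response.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

    object RestToKafkaSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092") // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Stand-in for a record fetched from an external REST API
          val payload = """{"employee_id": "123", "event": "clock_in"}"""
          producer.send(new ProducerRecord[String, String]("payroll-events", "123", payload))
        } finally {
          producer.close()
        }
      }
    }
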
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon
AWS, HBase, Teradata, PowerCenter, Tableau, Oozie, Oracle, Linux

Evicore – Nashville, TN June 2016 to May 2017


Hadoop Developer
Roles & Responsibilities:
 Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various
proof-of-concept (POC) applications to eventually adopt them as part of the Big Data Hadoop initiative.
 Architected a Hadoop system that regularly pulled data from Linux systems and RDBMS databases, ingesting
the data using Sqoop.
 Installed, configured, supported, and managed Hadoop clusters using Apache and Cloudera (CDH4)
distributions and on Amazon Web Services (AWS).
 Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in
partitioned tables in the EDW.
 Worked on Cloudera Hadoop upgrades and patches and on installation of ecosystem products through Cloudera
Manager, along with Cloudera Manager upgrades.
 Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW
reference tables and historical metrics.
 Set up an Amazon Web Services (AWS) EC2 instance for the Cloudera Manager server.
 Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop
Distributed File System and Pig to pre-process the data.
 Provided design recommendations and thought leadership to sponsors/stakeholders that improved review
processes and resolved technical problems.
 Identified query duplication, complexity, and dependencies to minimize migration effort. Technology stack: Oracle,
Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud, and DynamoDB.
 Shared responsibility for administering Hadoop, Hive, and Pig; managed and reviewed Hadoop log files and
updated the configuration on each host.
 Worked with the Spark ecosystem using Scala, Python, and Hive queries on different data formats such as text
files and Parquet.
 Tested raw data, executed performance scripts, and configured Cloudera Manager Agent heartbeat intervals
and timeouts.
 Worked with teams to set up AWS EC2 instances using different AWS services such as S3, EBS, Elastic Load
Balancing, Auto Scaling groups, VPC subnets, and CloudWatch.
 Implemented CDH3 Hadoop cluster on RedHat Enterprise Linux 6.4, assisted with performance tuning and
monitoring.
 Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
 Used the Spark Streaming API with Kafka to build live dashboards; worked on transformations and actions on
RDDs, Spark Streaming, pair RDD operations, checkpointing, and SBT.
 Provided reports to management on cluster usage metrics and used HBase tables to load large sets of
structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
 Worked with different Hadoop distributions such as Cloudera (CDH3 and CDH4) and Hortonworks (HDP), as
well as MapReduce.
 Performed installation, upgrade, and configuration tasks for Impala on all machines in the cluster, and supported
code/design analysis, strategy development, and project planning.
 Created reports for the BI team using Sqoop to export data into HDFS and Hive and assisted with data capacity
planning and node forecasting.
 Managed Amazon Web Services (AWS) infrastructure with automation and configuration management.
 Served as administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades; performed both
major and minor upgrades to the existing CDH cluster and upgraded the Hadoop cluster from CDH3 to CDH4.
 Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and generated
views on the data sources using shell scripting and Python (see the sketch after this list).
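
A minimal Spark/Scala sketch of the batch CSV ingestion pattern referenced above (the original process used shell scripting and Python; this shows the same idea in Spark). Paths, the staging table name, and column names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CsvBatchIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-batch-ingest-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read the incoming CSV drop (path and schema handling are placeholders)
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/incoming/claims/*.csv")

        // Basic validation plus a partition column for the staging table
        val refined = raw
          .filter(col("claim_id").isNotNull)
          .withColumn("load_date", current_date())

        refined.write
          .mode("append")
          .partitionBy("load_date")
          .saveAsTable("edw_staging.claims")

        spark.stop()
      }
    }
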
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Zookeeper, Impala, Java (JDK 1.6), Cloudera, Oracle, SQL
Server, UNIX Shell Scripting, Flume, Oozie, Scala, Spark, ETL, Sqoop, Python, Kafka, PySpark, AWS, S3, SQL,
Hortonworks, XML, RedHat Linux 6.4

Sabre, Dallas, TX Sept 2015 – May 2016
Big Data and Analytics Developer

Responsibilities:
 Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
 Developed Impala queries to pre-process the data required for running the business process.
 Actively involved in design analysis, coding and strategy development.
 Developed Hive scripts for implementing dynamic partitions and buckets for history data.
 Developed Spark scripts by using Scala per the requirement to read/write JSON files.
 Involved in converting SQL queries into Spark transformations using Spark RDDs and Scala.
 Analyzed the SQL scripts and designed the solution to implement using Scala.
 Worked on creating data ingestion pipelines to ingest large amounts of stream and customer application data
into Hadoop in various file formats such as raw text, CSV, and ORC.
 Worked extensively on integrating Kafka (data ingestion) with Spark Streaming to achieve a high-performance
real-time processing system.
 Applied various machine learning algorithms such as decision trees, regression models, neural networks, SVMs,
and clustering to identify fraudulent profiles using the scikit-learn package in Python. Used the K-Means
clustering technique to identify outliers and classify unlabeled data.
 Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
 Used the Spark-MongoDB connector to load data into MongoDB and analyzed data from MongoDB collections
for quick searching, sorting, and grouping.
 Used the Spark API with Scala over Cloudera Hadoop YARN to perform analytics on data in Hive.
 Created a Hadoop design that replicates the current system design.
 Developed Scala scripts using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and
queries (see the sketch after this list).
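
A minimal Scala sketch of the DataFrame/SQL versus RDD aggregation styles mentioned above; the in-memory sample data and column names are illustrative placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object AggregationBothWaysSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("agg-both-ways-sketch").getOrCreate()
        import spark.implicits._

        // Small in-memory stand-in for the ingested customer data
        val events = Seq(("cust1", 10.0), ("cust1", 5.5), ("cust2", 7.25))
          .toDF("customer_id", "amount")

        // DataFrame/SQL-style aggregation
        val dfTotals = events.groupBy("customer_id").agg(sum("amount").alias("total"))

        // Equivalent RDD (MapReduce-style) aggregation
        val rddTotals = events.rdd
          .map(row => (row.getString(0), row.getDouble(1)))
          .reduceByKey(_ + _)

        dfTotals.show()
        rddTotals.collect().foreach(println)
        spark.stop()
      }
    }
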
Environment: R 3.0, Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Flume, Spark, Spark Streaming, Kafka, RStudio,
AWS, Tableau 8, MS Excel, Apache Spark MLlib, TensorFlow, Amazon Machine Learning (AML), Python, MongoDB.

PB Systems, India
Java Developer Jan 2011 to Dec 2014
Responsibilities:
 Responsible for designing rich user interface applications using JavaScript, CSS, HTML, and AJAX.
 Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC.
 Developed EJB components to implement business logic using session and message-driven beans.
 Excellent working experience with Oracle 10g, including storing and retrieving data using Hibernate.
 Developed the application using the Struts framework, which leverages the Model View Controller (MVC)
architecture.
 Implemented design patterns such as Singleton, Factory, and MVC.
 Deployed the applications on IBM WebSphere Application Server.
 Worked on JavaScript, CSS style sheets, Bootstrap, and jQuery.
 Worked one-on-one with the client to develop the layout and color scheme for their website and implemented the
final interface design with HTML5/CSS3 and JavaScript using Dreamweaver.
 Used advanced HTML5, JavaScript, jQuery, CSS3, and pure CSS layouts (table-less layouts).
 Wrote SQL queries to extract data from the Oracle & MySQL databases.
 Monitored the error logs using Log4j; Maven was used as the build tool and continuous integration was done
using Jenkins.
 Developed Collections in MongoDB and performed aggregations on the collections.
 Used Hadoop's Pig, Hive, and MapReduce to analyze the data and help extract data sets for meaningful
information.
 Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
 Hands-on experience with MapReduce jobs alongside the data science team to analyze this data.
 Converted output to structured data and imported it into Tableau with the analytics team.

Environment: Java, JSF MVC, Spring IOC, APEX, Spring JDBC, Hibernate, ActiveMQ, Log4j, Ant, MySQL, JDK 1.6,
J2EE, JSP, Servlets, HTML, JDBC, MongoDB, DAO, EJB 3.0, PL/SQL, WebSphere, Eclipse, AngularJS, and CVS.

Educational Details:
Bachelor's in Information Technology, JNTUH College, 2012
Master's in Computer Science, Rivier University, Nashua, New Hampshire, 2016

Professional References:

Reference 1:
Name: Sandeep V.
Company: Capital One, McLean, VA
Email: sandeepkumar.vidiyala@capitalone.com

Reference 2:
Name: Karunakar Reddy
Company: United Airlines, Chicago, IL
Email: venkata.revunuru@united.com
