
Karthik Potharaju

Sr. Hadoop/Big data Developer


Email: vsai03595@gmail.com
Ph: 601-691-1228
LinkedIn:
Professional Summary

 Over 8 years of overall IT experience across a variety of industries, including around 5 years of hands-on experience in Big Data technologies (1.0 and 2.0) and in designing and implementing MapReduce (MR1 and MR2) architectures. 
 Well versed in installing, configuring, supporting and managing Hadoop clusters and their underlying Big Data infrastructure. 
 Good knowledge of Hadoop development and its components, including HDFS, JobTracker, TaskTracker, DataNode, NameNode and MapReduce concepts. 
 Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro. 
 Experience in installing, configuring, managing, supporting and monitoring Hadoop clusters using various distributions such as Apache and Cloudera. 
 Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in
Java. 
 Involved in project planning, setting up standards for implementation and design of Hadoop based
applications. 
 Wrote MapReduce programs with custom logic and developed custom UDFs in Pig and Hive based on user requirements. 
 Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS).
 Implemented NoSQL databases such as HBase, Cassandra and MongoDB for storing and processing data in different formats. 
 Implemented Oozie for writing workflows and scheduling jobs. Wrote Hive queries for data analysis and to prepare data for visualization. 
 Installed Spark and analyzed HDFS data, caching datasets in memory to perform a wide variety of complex computations interactively. 
 Experience in importing and exporting data in different formats between HDFS/HBase and various RDBMS databases. 
 Developed applications using Spark for data processing. 
 Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster analysis of the data (a brief sketch of this conversion follows this summary). 
 Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 as a storage mechanism. 
 Experience with Java, Python and other languages, and in installing and setting up Hadoop environments in the cloud through Amazon Web Services (AWS) offerings such as EMR and EC2, which provide efficient processing of data. 
 Very good experience in complete project life cycle (design, development, testing and
implementation) of Client Server and Web applications. 
 Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources. 
 Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN. 
 Experienced in using Apache Spark's in-memory computing capabilities, with code written in Scala, to implement advanced procedures such as text analytics and processing. 
 Experience with middleware architectures built on Sun Java technologies such as J2EE and Servlets, and application servers such as WebSphere and WebLogic. 
 Used different Spark modules such as Spark Core, Spark RDDs, Spark DataFrames and Spark SQL. 
 Converted various Hive queries into the required Spark transformations and actions. 
 Experience working on the open-source Apache Hadoop distribution with technologies such as HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB and Mesos. 
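
A minimal, illustrative sketch of the Hive-to-Spark conversion mentioned above, replacing a HiveQL aggregation with equivalent DataFrame transformations and actions; the table and column names are hypothetical placeholders, assuming Spark 2.x with Hive support:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  object HiveToSparkSketch {
    def main(args: Array[String]): Unit = {
      // Hive support lets Spark read the same metastore tables the old HiveQL scripts used
      val spark = SparkSession.builder()
        .appName("hive-to-spark-sketch")
        .enableHiveSupport()
        .getOrCreate()

      // Equivalent of: SELECT category, COUNT(*) AS events
      //                FROM user_events WHERE event_date >= '2017-01-01' GROUP BY category
      val byCategory = spark.table("user_events")
        .filter(col("event_date") >= lit("2017-01-01"))   // transformation
        .groupBy(col("category"))
        .agg(count(lit(1)).as("events"))                   // transformation

      // Action: materialize the result back into the warehouse for downstream reporting
      byCategory.write.mode("overwrite").saveAsTable("user_event_counts")

      spark.stop()
    }
  }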

Technical Skills

Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark YARN, Pair RDDs
Web Technologies / Other Components: J2EE, XML, Log4j, HTML, CSS, JavaScript
Server-Side Scripting: UNIX Shell Scripting
Databases: Oracle 10g, Microsoft SQL Server, MySQL, DB2, Teradata
Programming Languages: Java, C, C++, Scala, Impala, Python
Web Servers: Apache Tomcat, BEA WebLogic
IDE: Eclipse, Dreamweaver
OS/Platforms: Windows 2005/2008, Linux (all major distributions), UNIX
NoSQL Databases: HBase, MongoDB
Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC

Currently Exploring Apache Flink, Drill, Tachyon.

Conduent, Madison, MS May 2017 – Present


Sr. Hadoop / Spark Developer

Responsibilities:
 Developed simple to complex MapReduce streaming jobs using Java language for processing and
validating the data.
 Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral
data into HDFS for analysis.
 Developed MapReduce and Spark jobs to discover trends in data usage by users.
 Implemented Spark using Python and Spark SQL for faster processing of data.
 Developed functional programs in Scala to connect the streaming data application, gather web data in JSON and XML formats and pass it to Flume.
 Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
 Used the Spark-Cassandra Connector to load data to and from Cassandra.
 Performed real-time streaming of the data using Spark with Kafka (a brief sketch of this pipeline follows this role's environment list).
 Experienced in working with Amazon Web Services (AWS) EC2 and S3 with Spark RDDs.
 Handled importing of data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the transformed data back into HDFS.
 Exported the analyzed data to the relational databases using Sqoop, to further visualize and
generate reports for the BI team.
 Configured other ecosystems like Hive, Sqoop, Flume, Pig and Oozie.
 Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
 Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study
customer behavior.
 Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
 Developed Pig Latin scripts to perform Map Reduce jobs.
 Developed product profiles using Pig and commodity UDFs.
 Worked on scalable distributed data systems using the Hadoop ecosystem.
 Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
 Created HBase tables and column families to store the user event data.
 Wrote automated HBase test cases for data quality checks using HBase command-line tools.
 Created UDFs to store specialized data structures in HBase and Cassandra.
 Created and configured AWS RDS/Redshift to work with the Hadoop ecosystem on AWS infrastructure.
 Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
 Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
 Used Tez framework for building high performance jobs in Pig and Hive.
 Configured Kafka to read and write messages from external programs.
 Configured Kafka to handle real time data.
 Developed end-to-end data processing pipelines, from receiving data through the distributed messaging system Kafka to persisting the data into HBase.
 Uploaded and processed terabytes of data from various structured and unstructured sources into
HDFS (AWS cloud) using Sqoop and Flume.
 Wrote a Storm topology to emit data into Cassandra.
 Wrote a Storm topology to accept data from a Kafka producer and process the data.
 Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
 Worked extensively with importing metadata into Hive and migrated existing tables and applications
to work on Hive and AWS cloud.
 Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
 Performed data validation on the data ingested using MapReduce by building a custom model to
filter all the invalid data and cleanse the data.
 Experience with data wrangling and creating workable datasets.
 Developed schemas to handle reporting requirements using Jaspersoft.
Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Zookeeper, Kafka, Flume,
Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera manager, MySQL, Jaspersoft, Multi-node cluster
with Linux-Ubuntu, Windows, Unix.
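
A minimal sketch of the Kafka-to-Spark-to-Cassandra streaming flow described in this role; the broker, topic, keyspace, table and column names are hypothetical placeholders, and it assumes a recent Spark (2.4+) with the DataStax spark-cassandra-connector on the classpath:

  import org.apache.spark.sql.{DataFrame, SparkSession}
  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.types._

  object KafkaToCassandraSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("kafka-to-cassandra-sketch").getOrCreate()

      // Hypothetical JSON payload shape produced by the upstream application
      val schema = new StructType()
        .add("user_id", StringType)
        .add("event_type", StringType)
        .add("event_ts", TimestampType)

      // Read the raw stream from Kafka (placeholder broker list and topic)
      val raw = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "user-events")
        .load()

      // Kafka values arrive as bytes; parse them into typed columns
      val events = raw
        .select(from_json(col("value").cast("string"), schema).as("e"))
        .select("e.*")

      // Persist each micro-batch to Cassandra through the spark-cassandra-connector
      val query = events.writeStream
        .foreachBatch { (batch: DataFrame, _: Long) =>
          batch.write
            .format("org.apache.spark.sql.cassandra")
            .options(Map("keyspace" -> "analytics", "table" -> "user_events"))
            .mode("append")
            .save()
        }
        .start()

      query.awaitTermination()
    }
  }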

ADT, Boca Raton, FL Aug 2016 – May 2017

Sr. Hadoop Developer/Big data developer


Responsibilities:
 Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral
data and financial histories into HDFS for analysis 
 Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase
database and Sqoop. 
 Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk and Greenplum. 
 Designed and developed Informatica BDE applications and Hive queries to ingest data into the landing/raw zone, transform it with business logic into the refined zone, and load it into Greenplum data marts for the reporting layer, consumed through Tableau. 
 Installed, configured, and maintained big data technologies and systems. Maintained
documentation and troubleshooting playbooks. 
 Automated the installation and maintenance of Kafka, Storm, ZooKeeper and Elasticsearch using SaltStack. 
 Developed connectors for Elasticsearch and Greenplum to transfer data from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStreams) applications in Java for real-time data processing. 
 Responded to and resolved access and performance issues. Used the Spark API over Hadoop to perform analytics on data in Hive (a brief sketch follows this role's environment list).
 Explored Spark for improving the performance and optimization of the existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames and Spark on YARN.
 Imported and exported data into HDFS and Hive using Sqoop, and developed a POC on Apache Spark and Kafka. Proactively monitored performance and assisted in capacity planning. 
 Worked on the Oozie workflow engine for job scheduling. Imported and exported data into MapReduce and Hive using Sqoop. 
 Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS. Good understanding of performance tuning with NoSQL and SQL technologies.
 Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs. 
 Worked on data transformation pipelines such as Storm. Worked with operational analytics and log management using ELK and Splunk. Assisted teams with SQL and MPP databases such as Greenplum. 
 Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS). 
Environment: Java, Confluent Kafka, HDFS, Storm, Elasticsearch, Salt scripting, Greenplum, KStreams, KTables, Splunk, Hadoop.
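
A minimal sketch of the Spark-over-Hive analytics and Greenplum hand-off described above; the Hive table, business-logic query, JDBC URL and credentials are hypothetical placeholders, and Greenplum is reached through its standard PostgreSQL JDBC driver:

  import org.apache.spark.sql.SparkSession

  object HiveToGreenplumSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("hive-to-greenplum-sketch")
        .enableHiveSupport()          // read tables registered in the Hive metastore
        .getOrCreate()

      // Placeholder business logic over a hypothetical raw-zone Hive table
      val refined = spark.sql(
        """SELECT account_id, txn_date, SUM(amount) AS daily_amount
          |FROM raw_zone.transactions
          |WHERE txn_status = 'POSTED'
          |GROUP BY account_id, txn_date""".stripMargin)

      // Hand the refined data to a Greenplum data mart over JDBC for Tableau reporting
      refined.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://gp-master:5432/reporting")
        .option("dbtable", "refined.daily_txn_summary")
        .option("user", "etl_user")
        .option("password", sys.env.getOrElse("GP_PASSWORD", ""))
        .option("driver", "org.postgresql.Driver")
        .mode("overwrite")
        .save()

      spark.stop()
    }
  }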

Continental North American, Chicago, IL Jan 2015 – Aug 2016


Hadoop Developer
Responsibilities:
 Involved in loading data from UNIX file system to HDFS. Imported and exported data into HDFS and
Hive using Sqoop. 
 Evaluated business requirements and prepared detailed specifications that follow project guidelines
required to develop written programs. 
 Devised procedures that solve complex business problems with due considerations for
hardware/software capacity and limitations, operating times and desired results. 
 Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Provided quick response to ad hoc internal and external client requests for data and experienced in
creating ad hoc reports. 
 Responsible for building scalable distributed data solutions using Hadoop. Worked hands on with
ETL process. 
 Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and
troubleshooting, manage and review data backups, manage and review Hadoop log files. 
 Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS. 
 Extracted the data from Teradata into HDFS using Sqoop. Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior, such as shopping enthusiasts, travelers and music lovers. 
 Exported the patterns analyzed back into Teradata using Sqoop. Continuous monitoring and
managing the Hadoop cluster through Cloudera Manager. 
 Installed the Oozie workflow engine to run multiple Hive jobs. Developed Hive queries to process the data and generate data cubes for visualization. 
Environment: Hive, Pig, Apache Hadoop, Cassandra, Sqoop, Big Data, HBase, ZooKeeper, Cloudera, CentOS, NoSQL, Sencha Ext JS, JavaScript, Ajax, Hibernate, JMS, WebLogic application server, Eclipse, web services, Azure, Project Server, Unix, Windows.

Intense Technologies, India June 2012 – Dec 2014


Java Developer
Responsibilities:
 Individually worked on all the stages of a Software Development Life Cycle (SDLC).
 Used JavaScript code, HTML and CSS style declarations to enrich websites.
 Implemented the application using Spring MVC Framework which is based on MVC design pattern.
 Developed application service components and configured beans using Spring IoC (applicationContext.xml).
 Designed User Interface and the business logic for customer registration and maintenance.
 Integrated web services and worked with data across different servers.
 Involved in designing and development of SOA services using Web Services.
 Understood the requirements from business users and end users.
 Worked with XML/XSLT files.
 Created UML class and sequence diagrams.
 Created tables, views, triggers, indexes, constraints and functions in SQL Server 2005.
 Worked in content management for versioning and notifications.
Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.
