Suraj Ramesh Chamoli - Bigdata
Professional Summary
Skills
Hadoop/Big Data: MapReduce, HDFS, Hive 2.3, Pig 0.17, HBase 1.2, Zookeeper 3.4, Sqoop 1.4, Oozie, Flume 1.8, Scala 2.12, Kafka 1.0, Storm, MongoDB 3.6, Hadoop 3.0, Spark, Cassandra 3.11, Impala 2.1, Google Cloud (BigQuery), Control-M
Database: Snowflake, Oracle 12c, DB2, MySQL, MS SQL Server, Teradata 15
Web Tools: HTML 5.1, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, and Avro
Cloud Technology: Amazon Web Services (AWS), EMR, ECS, ECR, EC2, Elasticsearch, Microsoft Azure
Languages: Java/J2EE, SQL, Shell Scripting, C/C++, Python
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: Git, SVN, CVS
Operating System: Windows, Unix, Linux.
Tools: Eclipse, Maven, Ant, JUnit, Jenkins, SoapUI, Log4j
Scripting Languages: JavaScript, jQuery, AJAX, CSS, XML, DOM, SOAP, REST
Work Experience
Implemented a data pipeline using Spark and Hive to ingest customer behavioral data into the Hadoop platform for user behavioral analytics.
Developed Spark applications using Scala for ingestion of data from one environment to another, along with test cases.
Created Hive tables to load large sets of data after transformation of raw data.
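A minimal PySpark sketch of this kind of Spark-to-Hive ingest step (illustrative only; the input path, database, and table names are hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("behavioral-ingest")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw behavioral events landed on HDFS (hypothetical path).
    raw = spark.read.json("hdfs:///landing/behavioral/2020-01-01/")

    # Light transformation of the raw data before loading into Hive.
    events = raw.select("user_id", "event_type", "event_ts").dropDuplicates()

    # Load the transformed data into a Hive table for downstream analytics.
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    events.write.mode("append").format("parquet").saveAsTable("analytics.user_events")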
Enabled and automated data pipelines for moving over 25 GB of data from Oracle to Hadoop and Google BigQuery, using GitHub for source control and Jenkins for automation.
Created a BigQuery table by writing a Python program that checks a Linux directory for incoming XML files and uploads all new files to a Google Cloud Storage location before the data is parsed and loaded.
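A hedged sketch of such a directory-watching upload script using the google-cloud-storage client; the directory, bucket, and state-file names are hypothetical:

    import os
    from google.cloud import storage  # pip install google-cloud-storage

    INCOMING_DIR = "/data/incoming/xml"          # hypothetical Linux directory
    BUCKET_NAME = "example-landing-bucket"       # hypothetical GCS bucket
    UPLOADED_LOG = "/data/incoming/.uploaded"    # tracks files already sent

    def already_uploaded():
        if not os.path.exists(UPLOADED_LOG):
            return set()
        with open(UPLOADED_LOG) as f:
            return set(line.strip() for line in f)

    def upload_new_files():
        client = storage.Client()
        bucket = client.bucket(BUCKET_NAME)
        done = already_uploaded()
        for name in os.listdir(INCOMING_DIR):
            if name.endswith(".xml") and name not in done:
                bucket.blob("xml/" + name).upload_from_filename(os.path.join(INCOMING_DIR, name))
                with open(UPLOADED_LOG, "a") as f:
                    f.write(name + "\n")

    if __name__ == "__main__":
        upload_new_files()   # downstream jobs then parse and load the data into BigQuery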
Utilized Google BigQuery SQL and Amazon Athena to build and drive reporting.
Created Google Dataflow pipelines for uploading large public datasets into Google BigQuery.
Implemented end-to-end tests between Dataflow and BigQuery.
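An illustrative Apache Beam (Python SDK) sketch of a Dataflow pipeline loading a public dataset into BigQuery; the input path, table, and schema are hypothetical:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_line(line):
        # Hypothetical CSV layout: name,count
        name, count = line.split(",")
        return {"name": name, "count": int(count)}

    # Project, region, and runner flags would be supplied for an actual Dataflow run.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://example-public-data/input.csv")
         | "Parse" >> beam.Map(parse_line)
         | "Write" >> beam.io.WriteToBigQuery(
               "example-project:analytics.public_dataset",
               schema="name:STRING,count:INTEGER",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))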
Optimized Hive tables using techniques such as partitioning and bucketing to improve the execution of HQL queries.
Created Hive external tables for semantic data, loaded the data into the tables, and queried it using HQL.
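A hedged HQL sketch of such an external, partitioned table, issued here through PySpark's Hive support; the database, columns, and HDFS location are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("CREATE DATABASE IF NOT EXISTS semantic")

    # External table over semantic data already landed on HDFS (hypothetical layout).
    # Bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared when running
    # the DDL directly in Hive.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS semantic.user_events (
            user_id    STRING,
            event_type STRING,
            event_ts   TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        STORED AS ORC
        LOCATION 'hdfs:///warehouse/semantic/user_events'
    """)

    # Partition pruning limits the query to a single day instead of a full table scan.
    daily = spark.sql(
        "SELECT event_type, COUNT(*) AS events FROM semantic.user_events "
        "WHERE event_date = '2020-01-01' GROUP BY event_type"
    )
    daily.show()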
Identified data sources, created source-to-target mappings, estimated storage, and provided support for cluster setup and data partitioning.
Developed workflows in Atomic to cleanse and transform raw data into useful information and load it into HDFS.
NBC Universal/Cognizant - New York City, NY, October 19 – April 31
Big Data Developer
Responsibilities:
Involved in story-driven agile development methodology and actively participated in daily
scrum meetings.
Involved in a data migration project for multiple applications from on-prem to AWS.
Implemented application-specific Docker images, ECS clusters, and ECR repositories, and orchestrated the task definitions to process data on a daily basis in AWS.
Ingested terabytes of clickstream data from external systems such as FTP servers and S3 buckets into HDFS using custom input adaptors.
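A minimal sketch of one possible S3-to-HDFS adaptor step using boto3 and the hdfs CLI; the bucket, prefix, and target paths are hypothetical:

    import os
    import subprocess
    import boto3  # pip install boto3

    BUCKET = "example-clickstream-bucket"        # hypothetical S3 bucket
    PREFIX = "clickstream/2020/01/01/"           # hypothetical key prefix
    LOCAL_DIR = "/tmp/clickstream/"
    HDFS_DIR = "/landing/clickstream/2020-01-01/"

    os.makedirs(LOCAL_DIR, exist_ok=True)
    s3 = boto3.client("s3")

    # Pull each object from S3 and land it in HDFS via the hdfs CLI.
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):        # skip directory markers
            continue
        local_path = os.path.join(LOCAL_DIR, os.path.basename(key))
        s3.download_file(BUCKET, key, local_path)
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, HDFS_DIR], check=True)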
Implemented installation and configuration of a multi-node cluster on the cloud using Amazon Web Services (AWS) EC2.
Responsible for building and configuring a distributed data solution using the MapR distribution of Hadoop.
Involved in the complete big data flow of the application: ingesting data from upstream into HDFS, processing it in HDFS, and analyzing it.
Involved in importing data in various formats, such as MapR-DB JSON and XML, into the HDFS environment.
Used Sqoop to import data from MySQL into HDFS and vice versa, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
Involved in transfer of data from post log tables into HDFS and Hive using Sqoop.
Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Identified data sources, created source-to-target mappings, estimated storage, and provided support for Hadoop cluster setup and data partitioning.
Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
Implemented Cassandra and managed other processing tools, monitoring jobs running on YARN.
Implemented Storm builder topologies to perform cleansing operations before moving data into
Cassandra.
Created a POC demonstrating retrieval of JSON data by calling a REST service, converting it into CSV with a data flow, and loading it into HDFS.
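A hedged Python sketch of that POC flow (the NiFi flow itself is not shown); the endpoint URL, field names, and paths are hypothetical:

    import csv
    import subprocess
    import requests  # pip install requests

    # Call the (hypothetical) REST service and fetch JSON records.
    records = requests.get("https://api.example.com/v1/events", timeout=30).json()

    # Convert the JSON records to CSV.
    csv_path = "/tmp/events.csv"
    fields = ["id", "type", "timestamp"]          # hypothetical field names
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for rec in records:
            writer.writerow({k: rec.get(k) for k in fields})

    # Load the CSV into HDFS.
    subprocess.run(["hdfs", "dfs", "-put", "-f", csv_path, "/landing/events/"], check=True)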
Worked with Apache NiFi flows to convert raw data into ORC format.
Involved in loading and transforming large sets of structured data from router locations to the EDW using a NiFi data pipeline flow.
Created Hive tables to load large sets of data after transformation of raw data.
Optimized Hive tables using techniques such as partitioning and bucketing to improve the execution of HiveQL queries.
Worked extensively on creating end-to-end data pipeline orchestration using NiFi.
Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
Developed custom NiFi processors in Java to add functionality beyond NiFi's out-of-the-box capabilities.
Implemented a data pipeline using Spark, Hive, Sqoop, and Kafka to ingest customer behavioral data into the Hadoop platform for user behavioral analytics.
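An illustrative PySpark Structured Streaming sketch of the Kafka leg of such a pipeline (requires the spark-sql-kafka connector); broker addresses, topic, and output paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("behavioral-kafka-ingest").getOrCreate()

    # Subscribe to the (hypothetical) behavioral-events topic.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "behavioral-events")
              .load()
              .select(col("key").cast("string"), col("value").cast("string")))

    # Land the raw events on HDFS, where Hive tables are built for behavioral analytics.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///landing/behavioral-events/")
             .option("checkpointLocation", "hdfs:///checkpoints/behavioral-events/")
             .start())
    query.awaitTermination()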
Worked with the cloud provisioning team on capacity planning and sizing of the nodes (master and slave) for an AWS EMR cluster.
Involved in building applications using Maven and integrating with CI servers such as Jenkins to build jobs.
Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Implemented end-to-end pipelines for user behavioral analytics to identify user browsing patterns and provide a rich, personalized experience to visitors.
Environment: HDFS, NiFi 1.5, Hive 2.3, Pig 0.17, Sqoop 1.4, Oozie 4.3, Hadoop 3.0, MySQL 5.7, Metadata, Kafka 3.0, HBase 1.2, Spark 2.3, Scala, Python 3.6, Jenkins, Maven, Cassandra 3.11
Environment: Hadoop, Amazon AWS, AWS S3, Oozie 4.0, EC2, HDFS, Spark 2.0, Sqoop 1.4, MySQL 5.6, Hive 2.3, Cloudera, HBase 1.2, MapReduce, NoSQL, MongoDB, Cassandra 2.1, Kafka 2.2, JSON, Jenkins, Maven
Environment: HTML5, CSS3, JavaScript, Spring, Cassandra, Hadoop, Hive 1.1, Sqoop 1.4, Flume, Oozie 3.3, Kafka 2.0, Spark 1.1, HDFS, NoSQL, MapReduce, HBase, PL/SQL