Suraj Ramesh Chamoli - Bigdata
Professional Summary
Skills
Hadoop/Big Data: MapReduce, HDFS, Hive 2.3, Pig 0.17, HBase 1.2, Zookeeper 3.4, Sqoop 1.4, Oozie, Flume 1.8, Scala 2.12, Kafka 1.0, Storm, MongoDB 3.6, Hadoop 3.0, Spark, Cassandra 3.11, Impala 2.1, Google Cloud (BigQuery), Control-M
Database: Snowflake, Oracle 12c, DB2, MySQL, MS SQL Server, Teradata 15
Web Tools: HTML 5.1, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, and Avro
Cloud Technology: Amazon Web Services (AWS), EMR, ECS, ECR, EC2, Elasticsearch, Microsoft Azure
Languages: Java/J2EE, SQL, Shell Scripting, C/C++, Python
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: Git, SVN, CVS
Operating System: Windows, Unix, Linux.
Tools: Eclipse, Maven, Ant, JUnit, Jenkins, SoapUI, Log4j
Scripting Languages: JavaScript, jQuery, AJAX, CSS, XML, DOM, SOAP, REST
Work Experience
Implemented a data pipeline using Spark and Hive to ingest customer behavioral data into the Hadoop platform for user behavioral analytics.
Developed Spark applications using Scala for ingestion of data from one environment to another, along with test cases.
Created Hive tables to load large sets of data after transformation of raw data.
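A minimal PySpark sketch of this kind of Spark-to-Hive ingest step (illustrative only; the input path, database, and table names are hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("behavioral-ingest")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw behavioral events landed on HDFS (hypothetical path).
    raw = spark.read.json("hdfs:///landing/behavioral/2020-01-01/")

    # Light transformation of the raw data before loading into Hive.
    events = raw.select("user_id", "event_type", "event_ts").dropDuplicates()

    # Load the transformed data into a Hive table for downstream analytics.
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    events.write.mode("append").format("parquet").saveAsTable("analytics.user_events")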
Enabled and automated data pipelines for moving over 25 GB of data from Oracle to Hadoop and Google BigQuery, using GitHub for source control and Jenkins for automation.
Created a BigQuery table by writing a Python program that checks a Linux directory for incoming XML files and uploads all new files to a Google Cloud Storage location before the data is parsed and loaded.
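A hedged sketch of such a directory-watching upload script using the google-cloud-storage client; the directory, bucket, and state-file names are hypothetical:

    import os
    from google.cloud import storage  # pip install google-cloud-storage

    INCOMING_DIR = "/data/incoming/xml"          # hypothetical Linux directory
    BUCKET_NAME = "example-landing-bucket"       # hypothetical GCS bucket
    UPLOADED_LOG = "/data/incoming/.uploaded"    # tracks files already sent

    def already_uploaded():
        if not os.path.exists(UPLOADED_LOG):
            return set()
        with open(UPLOADED_LOG) as f:
            return set(line.strip() for line in f)

    def upload_new_files():
        client = storage.Client()
        bucket = client.bucket(BUCKET_NAME)
        done = already_uploaded()
        for name in os.listdir(INCOMING_DIR):
            if name.endswith(".xml") and name not in done:
                bucket.blob("xml/" + name).upload_from_filename(os.path.join(INCOMING_DIR, name))
                with open(UPLOADED_LOG, "a") as f:
                    f.write(name + "\n")

    if __name__ == "__main__":
        upload_new_files()   # downstream jobs then parse and load the data into BigQuery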
Utilized Google BigQuery SQL and Amazon Athena to build and drive reporting.
Created Google Dataflow pipelines for uploading large public datasets into Google BigQuery.
Implemented end-to-end tests between Dataflow and BigQuery.
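An illustrative Apache Beam (Python SDK) sketch of a Dataflow pipeline loading a public dataset into BigQuery; the input path, table, and schema are hypothetical:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_line(line):
        # Hypothetical CSV layout: name,count
        name, count = line.split(",")
        return {"name": name, "count": int(count)}

    # Project, region, and runner flags would be supplied for an actual Dataflow run.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://example-public-data/input.csv")
         | "Parse" >> beam.Map(parse_line)
         | "Write" >> beam.io.WriteToBigQuery(
               "example-project:analytics.public_dataset",
               schema="name:STRING,count:INTEGER",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))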
Optimized Hive tables using techniques such as partitioning and bucketing to improve the execution of HQL queries.
Created Hive external tables for semantic data, loaded the data into the tables, and queried it using HQL.
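A hedged HQL sketch of such an external, partitioned table, issued here through PySpark's Hive support; the database, columns, and HDFS location are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("CREATE DATABASE IF NOT EXISTS semantic")

    # External table over semantic data already landed on HDFS (hypothetical layout).
    # Bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared when running
    # the DDL directly in Hive.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS semantic.user_events (
            user_id    STRING,
            event_type STRING,
            event_ts   TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        STORED AS ORC
        LOCATION 'hdfs:///warehouse/semantic/user_events'
    """)

    # Partition pruning limits the query to a single day instead of a full table scan.
    daily = spark.sql(
        "SELECT event_type, COUNT(*) AS events FROM semantic.user_events "
        "WHERE event_date = '2020-01-01' GROUP BY event_type"
    )
    daily.show()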
Identified data sources, created source-to-target mappings, estimated storage, and provided support for cluster setup and data partitioning.
Developed workflows in Atomic to cleanse and transform raw data into useful information and load it into HDFS.
NBC Universal/Cognizant - New York City, NY, October 19 – April 31
Big Data Developer
Responsibilities:
Involved in story-driven agile development methodology and actively participated in daily
scrum meetings.
Involved in a data migration project for multiple applications from on-prem to AWS.
Implemented application-specific Docker images, ECS clusters, and ECR repositories, and orchestrated the task definitions to process data on a daily basis in AWS.
Ingested terabytes of clickstream data from external systems such as FTP servers and S3 buckets into HDFS using custom input adaptors.
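A minimal sketch of one possible S3-to-HDFS adaptor step using boto3 and the hdfs CLI; the bucket, prefix, and target paths are hypothetical:

    import os
    import subprocess
    import boto3  # pip install boto3

    BUCKET = "example-clickstream-bucket"        # hypothetical S3 bucket
    PREFIX = "clickstream/2020/01/01/"           # hypothetical key prefix
    LOCAL_DIR = "/tmp/clickstream/"
    HDFS_DIR = "/landing/clickstream/2020-01-01/"

    os.makedirs(LOCAL_DIR, exist_ok=True)
    s3 = boto3.client("s3")

    # Pull each object from S3 and land it in HDFS via the hdfs CLI.
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):        # skip directory markers
            continue
        local_path = os.path.join(LOCAL_DIR, os.path.basename(key))
        s3.download_file(BUCKET, key, local_path)
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, HDFS_DIR], check=True)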
Implemented installation and configuration of a multi-node cluster on the cloud using Amazon Web Services (AWS) EC2.
Responsible for building and configuring a distributed data solution using the MapR distribution of Hadoop.
Involved in the complete big data flow of the application: ingesting data from upstream into HDFS, processing it in HDFS, and analyzing it.
Involved in importing data in various formats, such as MapR-DB JSON and XML, into the HDFS environment.
Used Sqoop to import data from MySQL into HDFS and vice versa, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
Involved in transfer of data from post log tables into HDFS and Hive using Sqoop.
Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Identified data sources, created source-to-target mappings, estimated storage, and provided support for Hadoop cluster setup and data partitioning.
Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
Implemented Cassandra and managed other processing tools, monitoring jobs running on YARN.
Implemented Storm builder topologies to perform cleansing operations before moving data into
Cassandra.
Created a POC demonstrating retrieval of JSON data by calling a REST service, converting it into CSV with a data flow, and loading it into HDFS.
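A hedged Python sketch of that POC flow (the NiFi flow itself is not shown); the endpoint URL, field names, and paths are hypothetical:

    import csv
    import subprocess
    import requests  # pip install requests

    # Call the (hypothetical) REST service and fetch JSON records.
    records = requests.get("https://api.example.com/v1/events", timeout=30).json()

    # Convert the JSON records to CSV.
    csv_path = "/tmp/events.csv"
    fields = ["id", "type", "timestamp"]          # hypothetical field names
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for rec in records:
            writer.writerow({k: rec.get(k) for k in fields})

    # Load the CSV into HDFS.
    subprocess.run(["hdfs", "dfs", "-put", "-f", csv_path, "/landing/events/"], check=True)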
Worked with Apache NiFi flows to convert raw data into ORC format.
Involved in loading and transforming large sets of structured data from router locations to the EDW using a NiFi data pipeline flow.
Created Hive tables to load large sets of data after transformation of raw data.
Optimized Hive tables using techniques such as partitioning and bucketing to improve the execution of HiveQL queries.
Worked extensively on creating end-to-end data pipeline orchestration using NiFi.
Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
Developed custom NiFi processors in Java to add functionality beyond NiFi's out-of-the-box capabilities.
Implemented a data pipeline using Spark, Hive, Sqoop, and Kafka to ingest customer behavioral data into the Hadoop platform for user behavioral analytics.
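An illustrative PySpark Structured Streaming sketch of the Kafka leg of such a pipeline (requires the spark-sql-kafka connector); broker addresses, topic, and output paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("behavioral-kafka-ingest").getOrCreate()

    # Subscribe to the (hypothetical) behavioral-events topic.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "behavioral-events")
              .load()
              .select(col("key").cast("string"), col("value").cast("string")))

    # Land the raw events on HDFS, where Hive tables are built for behavioral analytics.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///landing/behavioral-events/")
             .option("checkpointLocation", "hdfs:///checkpoints/behavioral-events/")
             .start())
    query.awaitTermination()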
Worked with the cloud provisioning team on capacity planning and sizing of the nodes (master and slave) for an AWS EMR cluster.
Involved in building applications using Maven and integrating with CI servers such as Jenkins to build jobs.
Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Implemented end-to-end pipelines for user behavioral analytics to identify user browsing patterns and provide a rich, personalized experience to visitors.
Environment: HDFS, NiFi 1.5, Hive 2.3, Pig 0.17, Sqoop 1.4, Oozie 4.3, Hadoop 3.0, MySQL 5.7, Metadata, Kafka 3.0, HBase 1.2, Spark 2.3, Scala, Python 3.6, Jenkins, Maven, Cassandra 3.11
Environment: Hadoop, Amazon AWS, AWS S3, Oozie 4.0, EC2, HDFS, Spark 2.0, Sqoop 1.4, MySQL 5.6, Hive 2.3, Cloudera, HBase 1.2, MapReduce, NoSQL, MongoDB, Cassandra 2.1, Kafka 2.2, JSON, Jenkins, Maven
Environment: HTML5, CSS3, JavaScript, Spring, Cassandra, Hadoop, Hive 1.1, Sqoop 1.4, Flume, Oozie 3.3, Kafka 2.0, Spark 1.1, HDFS, NoSQL, MapReduce, HBase, PL/SQL