Profile
8 years of software industry experience with a focus on solution architecture, cloud migrations, product
development, requirement analysis, design, development, performance tuning, coding, quality reviews and
sustenance of Big Data, cloud solutions and ETL projects. Developed frameworks for ETL migration, data
ingestion into data lakes, Lambda Architecture implementation and streaming.
Currently working as a Lead Technology Consultant (L5) at Amazon Web Services. Main areas of expertise: AWS,
Microsoft Azure, Databricks, Hadoop, Hive, Spark 2.x/3.x, Spark Streaming, Kafka, Cosmos DB, DynamoDB, ETL
migrations, EMR migrations, serverless streaming migration, Redshift, Azure Synapse, Presto, Athena, Jenkins,
CodeCommit/CodeBuild and basic UNIX scripting. Strong knowledge of data analytics and of data lake
preparation and security. AWS 2x certified, with Tier 2 speaker certification.
Educational Qualification
B.Tech in Electronics and Communication Engineering
Institute: RCC Institute of Information Technology
University: West Bengal University of Technology, Kolkata, West Bengal, India
Year of Passing: 2013 | Percentage: 84.50

Higher Secondary School Leaving Certificate (+2)
Institute: Jodhpur Park Boys School
Board: WBCHSE
Year of Passing: 2009 | Percentage: 88.00

Secondary School Leaving Certificate (SSLC)
Institute: WBBSE
Board: WBBSE
Year of Passing: 2007 | Percentage: 90.00
Professional Experience
Tenure | Company | Role
Aug 2020 – Present | Amazon Web Services | Lead Consultant
Jul 2019 – May 2020 | MobileWalla Inc. | Data Engineer
Apr 2018 – Jul 2019 | PricewaterhouseCoopers India | Senior Consultant
Nov 2016 – Apr 2018 | Deloitte US India Pvt. Limited | Consultant
Mar 2016 – Sept 2016 | Teradata Corporation, Think Big | Big Data Consultant
May 2014 – Mar 2016 | Cognizant Technology Solutions Pvt. Limited | Big Data and ETL Developer
IT Experience
Database and Big Data Components: Hive, DynamoDB, Cosmos DB, Amazon Redshift, RDS
Platforms/Distributions: EMR, Hadoop distributions, Azure and AWS Databricks
ETL: Informatica Big Data Edition, PowerCenter, SSIS, Talend
Cluster Computing Frameworks: Apache Spark (Glue, EMR), SageMaker, Databricks (UDP)
Languages/Technologies: Python, SQL, Core Java and Scala
Data Ingestion Tools: MWAA, Kinesis, AWS DataSync, Azure Data Factory, Kafka
Build & Versioning Tools: Maven, SBT, Git, GitLab, Bitbucket
Cloud Platforms: AWS (Amazon Web Services), Microsoft Azure, AWS IoT Analytics, Azure IoT
Technical Experience
Developed migration accelerators and data ingestion accelerators; contributed to SDLF, the ETL Migration
Service and the EMR Migration Service
Azure Databricks SPOC; developed and deployed multiple projects on Azure and AWS Databricks
Worked on Big Data projects using AWS, Spark, Spark Streaming, Kafka, Event Hub, Azure ML, HDFS, Hive, Pig,
Sqoop, ADF, AWS Managed Airflow, NiFi and DynamoDB. Worked in cloud environments such as AWS and Azure
Developed, from scratch, a product handling exabyte-scale data volumes based on the Play framework, Spark and AWS
Ideated iGeneration and InvSolv for the Azure Marketplace as IIoT and inventory management solutions
Exposure to basic Spark, Hive, Informatica, Teradata and Redshift ETL performance tuning
Significant experience as a Technical Specialist performing requirement elicitation, analysis and design,
development and coding, and leading a team
AWS GCCI APAC speaker on the ETL Migration Service, the EMR Migration Service and IoT Analytics
Project Experience
Project experience as a Cloud Migration Consultant at Amazon Web Services is not
disclosed for confidentiality reasons.
Role: Solution Architect
Responsibility:
Involved in designing the end-to-end data flow on AWS.
Involved in developing the microservice layer using the Play framework on Scala.
Developed Spark jobs triggered by the API layer after the request is successfully validated.
Designed a scalable data extraction tool from S3 JSON to CSV, Parquet and JSON formats.
Prepared dynamic Athena tables via the API layer and generated the required metrics.
Big Data Components: Spark 2.4, Scala 2.12, Play Framework, AWS EMR, S3
Database and Other Components: SBT, GitLab, Athena, Redshift
Role: Solution Architect
Responsibility:
Involved in designing the end-to-end data flow on Microsoft Azure.
Involved in developing the Universal Data Lake, Business Data Lake and PDS layer in SQL Data Warehouse,
and analytic cubes in Azure Analysis Services.
Developed notebooks on Azure Databricks and deployed them to DEV, QA and production.
Involved in designing pipelines using Azure Data Factory 2.0.
Prepared data models in SQL Data Warehouse and the aggregation strategy for Azure Analysis Services.
Big Data Components: Spark 2.4, Azure Databricks, ADLS, Azure SQL DW, MCS Framework, Azure Data Factory 2.0,
Azure SQL Server, Azure DevOps, Power BI, Azure Analysis Services, Azure Key Vault
Database and Other Components: Azure VSTS, Azure DevOps
HPE Next Generation IT
03 Project Name: Real-Time & Batch Data Integration for Finance and Sales Comp | Aug 2017 – Apr 2018
Description of the Client
Hewlett Packard Enterprise Company (commonly referred to as Hewlett Packard Enterprise or HPE) is an American
multinational enterprise information technology company based in Palo Alto, California. HPE is a business-focused
organization with two divisions: Enterprise Group, which works in servers, storage, networking, consulting and
support, and Financial Services.
Role: Tech Lead
Responsibility:
Involved in designing the framework for batch and real-time data ingestion, searching and processing.
Involved in developing the Data Lake Accelerator tool from scratch and running it in production for HPE as
well as other clients.
Handled/managed the entire ELT process from planning to deployment. Implemented real-time integration with
Kafka 0.10.2.1 and Spark, and developed a platform that handles complex JSON, CSV and TXT files at an
expected ingestion rate of 120 million records per day.
Involved in Hive, HBase and Spark implementation on the Hadoop cluster.
Involved extensively in data flow design and data modelling in Hive.
Big Data Components: Spark 1.6.3, Spark 2.0 Structured Streaming, Hive, Kafka, HBase, Kerberos, REST, Unix,
Spark in Data Quality, Extended Data Lake Preparation
Database and Other Components: Git, Tortoise, Maven, MySQL
Caterpillar Inc.
04 Project Name: BCP Data Integration | Mar 2017 – Aug 2017
Description of the Client
Caterpillar Inc. is an American corporation which designs, develops, engineers, manufactures, markets and sells
machinery, engines, financial products and insurance to customers via a worldwide dealer network.
Role: Consultant
Responsibility:
Involved in designing the data lake in AWS Redshift and Hadoop.
Involved in data integration using AWS Data Pipeline and EC2 instances.
Handled/managed the entire ETL process from planning to deployment.
Involved in Hive and Spark implementation on the Hadoop cluster.
Involved extensively in data flow design.
Big Data Components: Spark 1.6, Hive, Impala, Redshift, Unix, Spark in Data Quality, Extended Data Lake
Preparation, SSIS, Python
Database and Other Components: Git, Tortoise, Maven, Redshift, AWS Data Pipeline, Lambda, EC2 instances
Diageo Inc.
05 Project Name: NAM Data Analytics | Nov 2016 – Mar 2017
Description of the Client
Diageo is a British multinational alcoholic beverages company, with its headquarters in London, England. It was
the world's largest distiller and also a major producer of beer.
Role: Developer
Responsibility:
Involved in exhaustive use of Azure Data Factory, HDInsight, data lake design, data modelling in Hive, Spark
with Scala, and Spark Streaming from Event Hub, ADLS, Blob Storage, Azure ML, Stream Analytics and IoT Hub.
Diageo was looking for a forecasting solution for its Depletion, Shipment and Nielsen sales data.
Developed a single job to load data into the staging layer with data validation and data quality checks.
Handled/managed the entire ETL process from planning to deployment.
Involved in minor data modelling engagements and real-time data warehousing.
Involved extensively in data flow design.
Big Data Components: Spark 1.6, Hive, Unix, Oracle NA Data Warehouse, Spark SQL, Spark Streaming, Event Hub,
Azure Data Factory, Stream Analytics, Cosmos DB, TFS, SQL DW, ADLS, HDInsight
Database and Other Components: Oracle, Anaplan, Nielsen, Eclipse, PuTTY, TFS, Maven

Liberty Mutual
06 Project Name: Midtierdata Analysis | Mar 2016 – Sept 2016
Description of the Project
Liberty Mutual is building a recommendation engine for providing car, health and property insurance. It tracks
each driver's driving details, accident occurrences, speed control, RFID data, medical history, family medical
histories, property status, legal obligations and owner details. The recommendation engine would ease the
allocation of insurance amounts and also suggest potential customers across the US.
Role: Developer
Responsibility:
Involved in exhaustive use of Spark with Scala and Spark Streaming from web sockets and spool directories;
storing denormalized data in HBase; joining different dimensional data from Hive; RDD and DataFrame
operations; use of decision trees, regression and k-means clustering.
Involved in functional and non-functional performance tuning of Spark jobs.
Handled/managed the entire ETL process from planning to deployment.
Big Data Components: Spark 1.4, Hive, HBase (storage), Unix, Teradata, Spark SQL, Spark Streaming, Kafka,
Flume, Sqoop, Teradata QueryGrid, Scala, Core Java, SQL, ORC, Spark tuning
Database and Other Components: Teradata 14.10, Eclipse, PuTTY, Git, Maven
Personal Summary
Father’s Name: Debasish Guha
Date of Birth: Oct 10, 1992
Languages: English, Bengali, Hindi
Marital Status: Married
Residential Address: 24/4 Selimpur Lane, Dhakuria, Kolkata - 700031