
Subhasish Guha

Phone: +91 9163392353 (M) | Total Exp: 8 years


Email: subhasish.iot@gmail.com

Profile
 8 years of software industry experience with a focus on solution architecture, cloud migrations, product development, requirement analysis, design, development, performance tuning, coding, quality reviews and sustenance of Big Data, cloud and ETL projects. Experience developing frameworks for ETL migration, data ingestion into data lakes, Lambda architecture implementation and streaming.

 Currently working as Lead Technology Consultant (L5) at Amazon Web Services. Main areas of expertise: AWS, Microsoft Azure, Databricks, Hadoop, Hive, Spark 2.x/3.x, Spark Streaming, Kafka, Cosmos DB, DynamoDB, ETL migrations, EMR migrations, serverless streaming migration, Redshift, Azure Synapse, Presto, Athena, Jenkins, CodeCommit/CodeBuild and basic UNIX scripting. Strong knowledge of data analytics and of data lake preparation and security. AWS 2x certified, with Tier 2 Speaker certification.

Educational Qualification
 B.Tech in Electronics and Communication Engineering | Institute: RCC Institute of Information Technology | University: West Bengal University of Technology, Kolkata, West Bengal, India | Year of Passing: 2013 | Percentage: 84.50
 Higher Secondary School Leaving Certificate (+2) | Institute: Jodhpur Park Boys School | Board: WBCHSE | Year of Passing: 2009 | Percentage: 88.00
 Secondary School Leaving Certificate (SSLC) | Institute: WBBSE | Board: WBBSE | Year of Passing: 2007 | Percentage: 90.00

Professional Experience
Tenure | Company Name | Role
Aug 2020 – Present | Amazon Web Services | Lead Consultant
Jul 2019 – May 2020 | MobileWalla Inc. | Data Engineer
Apr 2018 – Jul 2019 | PricewaterhouseCoopers India | Senior Consultant
Nov 2016 – Apr 2018 | Deloitte US India Pvt. Limited | Consultant
Mar 2016 – Sep 2016 | Teradata Corporation, Think Big | Big Data Consultant
May 2014 – Mar 2016 | Cognizant Technology Solutions Pvt. Limited | Big Data and ETL Developer

IT Experience
 Database and Big Data Components : Hive, DynamoDB, Cosmos DB, Amazon Redshift, RDS
 Platforms and Distributions : EMR, Hadoop distributions, Azure and AWS Databricks
 ETL : Informatica Big Data Edition, PowerCenter, SSIS, Talend
 Cluster Computing Frameworks : Apache Spark (Glue, EMR), SageMaker, Databricks (UDP)
 Languages/Technologies : Python, SQL, Core Java and Scala
 Data Ingestion Tools : MWAA, Kinesis, AWS DataSync, Azure Data Factory, Kafka
 Build & Versioning Tools : Maven, SBT, Git, GitLab, Bitbucket
 Cloud Platforms : AWS (Amazon Web Services), Microsoft Azure, AWS IoT Analytics, Azure IoT

Technical Experience
 Developed migration accelerators and data ingestion accelerators; contributed to SDLF, ETL Migration Service and EMR Migration Service
 Azure Databricks SPOC; developed and deployed multiple projects on Azure and AWS Databricks
 Worked on Big Data projects using AWS, Spark, Spark Streaming, Kafka, Event Hub, Azure ML, HDFS, Hive, Pig, Sqoop, ADF, AWS Managed Airflow (MWAA), NiFi and DynamoDB; worked in cloud environments such as AWS and Azure
 Developed a product on exabyte-scale data volumes, based on the Play framework, Spark and AWS, from scratch
 Ideated iGeneration and InvSolv for the Azure Marketplace as IIoT and inventory management solutions
 Exposure to basic Spark, Hive, Informatica, Teradata and Redshift ETL performance tuning
 Significant experience as a Technical Specialist in requirement elicitation, analysis and design, development, coding and leading a team
 AWS GCCI APAC speaker on ETL Migration Service, EMR Migration Service and IoT Analytics

Project Experience
Project experience as a Cloud Migration Consultant at Amazon Web Services cannot be disclosed for confidentiality reasons.

Johann Island Product Development (Data as a Service)


01 Project Name: Automated Data Delivery using AWS Cloud (Jul 2019 – Apr 2020)
Description of the Company
Mobilewalla is the only mobile consumer intelligence platform that observes, captures and analyzes the behavior of consumers. Mobilewalla provides demographic analyses and different segments on mobile data, scaling at the petabyte level.
Role: Solution Architect
Responsibility
 Involved in designing the end-to-end data flow on AWS.
 Involved in developing the microservice layer using the Play framework in Scala.
 Developed the Spark jobs triggered by the API layer after successful request validation.
 Designed a scalable data extraction tool from S3 JSON to CSV, Parquet and JSON formats.
 Prepared dynamic Athena tables via the API layer and generated the required metrics.
Big Data Components: Spark 2.4, Scala 2.12, Play Framework, AWS EMR, S3
Database and Other Components: SBT, GitLab, Athena, Redshift
Other Components

Unilever DMPT Solutions


02 Project Name: Data Lake Solutions and Dashboard Preparation on Microsoft Azure (Apr 2018 – Jul 2019)
Description of the Client
Unilever is a British-Dutch transnational consumer goods company co-headquartered in London, United Kingdom and Rotterdam, Netherlands. Its products include food and beverages, cleaning agents, beauty products and personal care products. It is Europe's seventh most valuable company.
Role: Solution Architect
Responsibility
 Involved in designing the end-to-end data flow on Microsoft Azure.
 Involved in developing the Universal Data Lake, Business Data Lake and PDS layer in SQL Data Warehouse, and analytic cubes in Azure Analysis Services.
 Developed notebooks on Azure Databricks and deployed them to DEV, QA and Production.
 Involved in designing the pipelines using Azure Data Factory 2.0.
 Prepared the data models in SQL Data Warehouse and the aggregation strategy for Azure Analysis Services.
Big Data Components: Spark 2.4, Azure Databricks, ADLS, Azure SQL DW, MCS Framework, Azure Data Factory 2.0, Azure SQL Server, Azure DevOps, Power BI, Azure Analysis Services, Azure Key Vault
Database and Other Components: Azure VSTS, Azure DevOps
HPE Next Generation IT
03 Project Name: Real Time & Batch Data Integration for Finance and Sales comp (Aug 2017 – Apr 2018)
Description of the Client
Hewlett Packard Enterprise Company (commonly referred to as Hewlett Packard Enterprise or HPE) is an American multinational enterprise information technology company based in Palo Alto, California. HPE is a business-focused organization with two divisions: Enterprise Group, which works in servers, storage, networking, consulting and support, and Financial Services.
Role: Tech Lead
Responsibility
 Involved in designing the framework for batch and real-time data ingestion, searching and processing.
 Involved in developing the Data Lake Accelerator tool from scratch and running it in production for HPE as well as other clients.
 Handled/managed the entire ELT process from planning to deployment; implemented real-time integration with Kafka 0.10.2.1 and Spark, and developed a platform that handles complex JSON, CSV and TXT files at an expected ingestion rate of 120 million records per day.
 Involved in Hive, HBase and Spark implementation in the Hadoop cluster.
 Involved extensively in data flow design and data modelling in Hive.
Big Data Components: Spark 1.6.3, Spark 2.0 Structured Streaming, Hive, Kafka, HBase, Kerberos, REST, Unix, Spark in Data Quality, Extended Data Lake Preparation
Database and Other Components: Git, Tortoise, Maven, MySQL

Caterpillar INC
04 Project Name: BCP Data Integration (Mar 2017 – Aug 2017)
Description of the Client
Caterpillar Inc. is an American corporation which designs, develops, engineers, manufactures, markets and sells machinery, engines, financial products and insurance to customers via a worldwide dealer network.
Role: Consultant
Responsibility
 Involved in designing the data lake in AWS Redshift and Hadoop.
 Involved in data integration using AWS Data Pipeline and EC2 instances.
 Handled/managed the entire ETL process from planning to deployment.
 Involved in Hive and Spark implementation in the Hadoop cluster.
 Involved extensively in data flow design.
Big Data Components: Spark 1.6, Hive, Impala, Redshift, Unix, Spark in Data Quality, Extended Data Lake Preparation, SSIS, Python
Database and Other Components: Git, Tortoise, Maven, Redshift, AWS Data Pipeline, Lambda, EC2 instances

Diageo INC
05 Project Name: NAM Data Analytics (Nov 2016 – Mar 2017)
Description of the Client
Diageo is a British multinational alcoholic beverages company with its headquarters in London, England. It was the world's largest distiller and also a major producer of beer.
Role: Developer
Responsibility
 Extensive use of Azure Data Factory, HDInsight, data lake design, data modelling in Hive, Spark with Scala, and Spark Streaming from Event Hub, ADLS, Blob Storage, Azure ML, Stream Analytics and IoT Hub. Diageo was looking for a forecasting solution for its Depletion, Shipment and Nielsen sales data.
 Developed a single job to load data into the staging layer, with data validation and data quality checks.
 Handled/managed the entire ETL process from planning to deployment.
 Involved in minor data modelling engagements and real-time data warehousing.
 Involved extensively in data flow design.
Big Data Components: Spark 1.6, Hive, Unix, Oracle NA Data Warehouse, Spark-SQL, Spark-Streaming, Event Hub, Azure Data Factory, Stream Analytics, Cosmos DB, TFS, SQL DW, ADLS, HDInsight
Database and Other Components: Oracle, Anaplan, Nielsen, Eclipse, Putty, TFS, Maven

Liberty Mutual
06 Project Name: Midtierdata Analysis (Mar 2016 – Sep 2016)
Description of the Project
Liberty Mutual is building a recommendation engine for providing car, health and property insurance. It tracks each car's driving details, accident occurrences, speed control, RFID data, medical history, family medical histories, property status, legal obligations and owner details. The recommendation engine would ease insurance amount allocation and also suggest potential customers across the US.
Role: Developer
Responsibility
 Extensive use of Spark with Scala and Spark Streaming from web sockets and spool sources; stored denormalized data in HBase; joined different dimensional data from Hive; RDD and DataFrame operations; used decision trees, regression and k-means clustering.
 Involved in functional and non-functional performance tuning of Spark jobs.
 Handled/managed the entire ETL process from planning to deployment.
Big Data Components: Spark 1.4, Hive, HBase (storage), Unix, Teradata, Spark-SQL, Spark-Streaming, Kafka, Flume, Sqoop, Teradata Query Grid, Scala, Core Java, SQL, ORC, Spark tuning
Database and Other Components: Teradata (14.10), Eclipse, Putty, Git, Maven

Family Dollar-Dollar Tree Data Migration and SMBONUS calculation


07 Project Name: Family Dollar (May 2015 – Mar 2016)
Description of the Project
Family Dollar merged with Dollar Tree in 2015, and the combined company wanted consolidated customer and transactional data as well as some real-time data, in order to analyze the data and award bonuses and credits to each customer based on their shopping behavior.
Role: Developer/Team Member
Responsibility
 Understood the data warehouses of both organizations and the feasibility of analyzing their history data.
 Migrated data from Teradata, Netezza, different KORNOS resources and Teradata archive files to HDFS.
 Extensive use of Informatica BDE for data migration, pre-stage layer population and maintaining CDC in dimension tables.
 Developed real-time warehouse planning in Hive: handled Hive table properties, partitioning and bucketing; performed cleaning, filtering and data profiling; handled pivoted data in Pig for mainframe and SAS data.
Big Data Tools: Informatica PowerCenter, Hive, Pig, HBase, MapReduce 2.x, Core Java, Sqoop, Query Grid, Unix
Database and Other Tools: Teradata (14.x), Netezza, Putty, HDP 2.2, Eclipse, Maven, JIRA
Achievements
1. Merit scholarship for scoring above 90% in all science subjects, from the West Bengal Council of Higher Secondary Education.
2. Received the Rising Star award from All India Retail and CG for developing a Big Data testing tool and delivering hard-dollar savings.
3. Received Applause and Spot awards for exceptional performance within a year at Deloitte USI.
4. Highest number of Pulse feedbacks within the Analytics team at AWS.

Personal Summary
 Father’s Name : Debasish Guha
 Date of Birth : Oct 10, 1992
 Languages : English, Bengali, Hindi
 Marital Status : Married
 Residential Address : 24/4 Selimpur Lane, Dhakuria, Kolkata - 700031

Date: 24.02.2022 Subhasish Guha
