
Saurabh Jethwa

Data Scientist
Data Scientist with more than 2.5 years of experience and expertise in detecting trends and patterns, extracting useful insights, and building
machine learning models to solve business problems, along with expertise in Big Data tools.

saurabhjethwa@gmail.com | 9811239556 | New Delhi, India | linkedin.com/in/saurabh-jethwa-ba5784136

WORK EXPERIENCE

Data Scientist
Polestar Solutions and Services
09/2018 - Present, Noida

Project Title - "Store Sales Prediction"
Role - Data Scientist
Description - Built a model that forecasts the daily sales of 890+ stores in 7 European countries for up to six weeks in advance.
Tasks
- Worked with 2.5 years of daily store sales data with features such as store type, competition, demographics and promotions.
- Created an ensemble model using linear regression and an XGBoost regressor.

Project Title - "Data Pipeline for SharePoint"
Role - Big Data Engineer
Description - Developed a data pipeline from the Data Lake to SharePoint on the Hadoop platform.
Tasks
- Extracted data from SQL Server to the Data Lake using Sqoop.
- Transformed the data using Hive.
- Created a Python REST API to connect Hive to SharePoint.

Project Title - "Sales Based Chatbot"
Role - Data Scientist
Description - Developed a chatbot based on sales data that gives any non-technical user information about sales in a simple manner.
Tasks
- Created the training data according to the business needs, considering all the possible user scenarios.
- Used RASA NLU (open source conversational AI framework) for intent classification and named-entity recognition for the best accuracy in all the scenarios.
Chatbot YouTube link - https://www.youtube.com/watch?v=b58K553Sxzw

Project Title - "Real Time Processing on Spark Server"
Role - Data Scientist / Big Data Engineer
Description - Developed a Spark API that uses PySpark for real-time data processing on a Spark server, able to handle 5000-10000 concurrent requests.
Tasks
- Created a highly optimised PySpark script for parallel processing of the data on the server.
- Used basic NLP techniques such as tokenization, lemmatization and n-grams for data preparation.
- Created a Flask API and deployed it on the EMR server.

EDUCATION

M.Tech in Data Science and Engineering
Birla Institute of Technology and Science, Pilani
2020 - Present

PG-Diploma in Big Data Analytics
Centre for Development of Advanced Computing (CDAC) - Noida
2017 - 2018

B.Tech Computer Science
Deen Dayal Upadhyaya College, University of Delhi
2013 - 2017

SKILLS AND TECHNOLOGIES

Python, Machine Learning, Statistics, R, SQL, Hadoop, Hive, Scikit-learn, SpaCy, Spark, NLP

PERSONAL PROJECTS

CORONA VIRUS CASES CHATBOT
Developed a chatbot that reports real-time coronavirus case counts for all countries, Indian states and cities using the RASA NLU framework.

Early Warning System (EWS)
Developed a model to predict the future churn of employees based on workforce data.

House Pricing Prediction (Regression)
Developed a model using regression techniques like Ridge, Lasso, ElasticNet and GBM. Used appropriate pre-processing measures like one-hot encoding, scaling, outlier treatment etc.

Loan Defaulter Prediction (Classification)
Developed a model using Random Forest. Used appropriate pre-processing measures like one-hot encoding, scaling, stratified sampling etc.

Air Quality Prediction (Time Series)
Forecasted the trend of air quality in New York for the next 5-10 years using an ARIMA model on time series data.

Sentiment Analysis On Hadoop Platform
Analyzed the sentiments of Twitter users. Used Apache Flume for data ingestion to HDFS from the Twitter API. Used Apache Hive to model the data and analyze the sentiments using the AFINN dictionary. Used Excel for data visualization.

SKILL SET

Technical Tools and Languages
- Python, PySpark, R, SQL, Tableau.

Analytical and Statistical Techniques
- Hypothesis testing, ANOVA, Dimension reduction (PCA, Factor Analysis), Exploratory Data Analysis, Manipulation and Visualization.

Machine Learning, NLP and Predictive Modeling
- Linear Regression, Logistic Regression, Regularization (L1 & L2), Multiclass Classification, Decision Trees, Random Forest, Clustering (k-means), Support Vector Machine, FP-growth, Gradient Boosting, XGBoost, Word2vec, Language modelling, Named entity recognition, Topic Modeling etc.

Big Data Tools
- Hadoop, Hive, Pig, Sqoop, Impala, Flume, Spark.

INTERESTS

Badminton, Gym, Football, Photography, Travelling
