
Shubham Oza

Mobile: +91 9550590023 | Email: shubhamzerocool2@gmail.com

CAREER OBJECTIVE

• To pursue a position as a Data Scientist / Big Data Analyst in a challenging environment, leveraging my technical skill set and creativity to bring value to the organization.
• To explore and carry out assigned responsibilities with the utmost commitment and dedication, culminating in outcomes that satisfy the organization.
• Data-driven analyst with experience in data analysis and statistics.

EDUCATION

Degree                        Institution                            Year                  Percentage
PGP (Big Data Optimization)   INSOFE, Hyderabad                      Jan 2018 - Aug 2018   70%
B. Tech (CSE)                 JNTU (affiliated to JNTU Hyderabad)    2011-2015             73%
XII                           Narayana Junior College, Hyderabad     2009-2011             86.4%
X                             St. Ann's High School, Hyderabad       2008-2009             81.4%

EXPERIENCE

DATA SCIENCE PROJECTS

1. Predict fare amount for Uber cab services

• Aim: To accurately predict the fare amount for Uber cab rides taken by various riders.

• Methodology:
  • After fetching the data, the next step was to carry out various pre-processing steps, analyze the data and bring it to a cleaned state.
  • Feature selection was performed using stepAIC, VIF and regularization techniques; PCA was used for dimensionality reduction.
  • Starting with linear regression, further algorithms such as SVM, random forest, neural networks and ensemble models were explored.
  • Model performance was evaluated using error metrics (MSE, RMSE); an illustrative sketch follows after this project.

• Platforms used:
  • Jupyter Notebook: Extraction, pre-processing and model building were performed in Python.
  • Tableau: Used to gain insights from the data and visualize them on dashboards.
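A minimal sketch of this regression workflow in Python with scikit-learn is given below; the file name and column names (e.g. "uber_fares.csv", "fare_amount") are placeholders for illustration, not taken from the original project.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Hypothetical cleaned dataset; file and column names are placeholders.
    df = pd.read_csv("uber_fares.csv")
    X = df.drop(columns=["fare_amount"])       # predictors after pre-processing
    y = df["fare_amount"]                      # target: fare amount

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Baseline linear regression; SVM, random forest, etc. would follow the same pattern.
    model = LinearRegression().fit(X_train, y_train)
    preds = model.predict(X_test)

    mse = mean_squared_error(y_test, preds)    # error metrics used for evaluation
    rmse = mse ** 0.5
    print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}")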

2. Identifying loan defaulters from historical bank data on the Hadoop ecosystem

• Aim: To predict loan defaulters. Loan defaults cause heavy losses for banks, so banks pay close attention to this issue and apply various methods to detect and predict default behavior among their customers.

• Methodology:
  • Fetched the data from an RDBMS (SQL) using Apache Sqoop and loaded it into HDFS; created schemas and mapped the data using Apache Spark.
  • Conducted various pre-processing steps, analyzed the data, joined several data frames and derived insights from the data.
  • Performed feature selection, scaled the data and brought it to a cleaned format.
  • Modeled the data with Apache Spark ML using various algorithms, including linear classification, SVM, random forest, neural networks and ensemble models (see the sketch after this project).
  • Model performance was evaluated using accuracy and recall (confusion matrix).

• Platforms used:
  • Jupyter Notebook: Pre-processing and model building were performed in Python.
  • Hadoop ecosystem: Apache Sqoop, HDFS, Spark, Hive, SparkSQL, Spark ML, SQL.
  • Tableau: Used to gain insights from the data and visualize them on dashboards.
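A minimal sketch of the Spark ML modeling step is shown below; the table name and columns ("loans", "income", "loan_amount", "credit_score", "default") are assumptions for illustration, not the actual bank schema.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    spark = SparkSession.builder.appName("loan-defaults").getOrCreate()
    df = spark.read.table("loans")    # data landed in Hive/HDFS via Apache Sqoop

    # Assemble assumed numeric columns into a single feature vector.
    assembler = VectorAssembler(
        inputCols=["income", "loan_amount", "credit_score"], outputCol="features")
    data = assembler.transform(df).select("features", "default")   # "default" assumed 0/1 label

    train, test = data.randomSplit([0.8, 0.2], seed=42)
    model = RandomForestClassifier(labelCol="default", featuresCol="features").fit(train)
    preds = model.transform(test)

    # Accuracy and recall, matching the confusion-matrix based evaluation above.
    for metric in ("accuracy", "weightedRecall"):
        evaluator = MulticlassClassificationEvaluator(
            labelCol="default", predictionCol="prediction", metricName=metric)
        print(metric, evaluator.evaluate(preds))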

3. Automated detection and classification of human diseases from medical images using a convolutional neural network

• Aim: A computer vision project whose aim was to analyze chest X-ray images and classify patients suffering from pneumonia.

• Methodology:
  • Fetched the data and performed various pre-processing steps such as resizing the images, arranging them in a suitable folder structure and labeling them with their categories.
  • Built a basic convolutional neural network, trained and tested the model, and obtained the best model through hyperparameter tuning.
  • Applied transfer learning with a VGG network to achieve the best accuracy and predict on the validation data (see the sketch after this project).
  • Model performance was evaluated using accuracy and recall (confusion matrix).

• Platforms used:
  • Jupyter Notebook: Pre-processing and model building were performed in Python.
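A minimal transfer-learning sketch in Python with Keras follows; the directory layout ("chest_xray/train", "chest_xray/val"), image size and epoch count are illustrative assumptions rather than the project's actual settings.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Pretrained VGG16 as a frozen feature extractor (transfer learning).
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # pneumonia vs. normal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Assumed folder structure: one sub-folder per class label.
    gen = ImageDataGenerator(rescale=1.0 / 255)
    train = gen.flow_from_directory("chest_xray/train", target_size=(224, 224),
                                    batch_size=32, class_mode="binary")
    val = gen.flow_from_directory("chest_xray/val", target_size=(224, 224),
                                  batch_size=32, class_mode="binary")
    model.fit(train, validation_data=val, epochs=5)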
4. Text mining on data obtained from Twitter and sentiment analysis

• Aim: To extract data from Twitter using APIs and perform sentiment analysis to obtain insights from the data.

• Methodology:
  • Fetched the data from Twitter using APIs and pre-processed it into a pandas DataFrame (see the sketch after this project).
  • Analyzed the data using the TextBlob library in Python and computed a sentiment score for each individual tweet.

• Platforms used:
  • Jupyter Notebook: Pre-processing and model building were performed in Python.
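The sentiment-scoring step can be sketched as below; the use of the tweepy client, the credentials and the search query are assumptions for illustration, since the original only mentions fetching tweets through Twitter APIs.

    import pandas as pd
    import tweepy
    from textblob import TextBlob

    # Placeholder credentials; tweepy is one common client for the Twitter API.
    auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET",
                                    "ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    tweets = api.search_tweets(q="data science", count=100, lang="en")
    df = pd.DataFrame({"text": [t.text for t in tweets]})

    # TextBlob polarity in [-1, 1]: negative, neutral or positive per tweet.
    df["polarity"] = df["text"].apply(lambda t: TextBlob(t).sentiment.polarity)
    print(df.head())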

CAREER HISTORY

Cyient Pvt Ltd (Software Engineer | Tableau Developer)
Oct 2015 - present

• Working on ASP.NET, customizing the powerOn Tcom web application as per the requirements of the client (CLP, China).
• Was part of the organizing team of Hackadrone 2018 (India's first UAV hackathon), responsible for developing the Hackadrone website and designing the complete registration workflow of the application module.
• Since October 2016, worked as a Tableau developer for the Cyient innovation team, responsible for obtaining raw data in structured and unstructured formats, pre-processing it, building attractive dashboards and presenting insights and forecasts to leadership.

SKILLS

• Programming Languages : R, Python, C#, JavaScript, jQuery
• Machine Learning      : Linear regression, classification, cluster analysis, SVM, KNN, random forest, decision tree, artificial neural networks, CNN, RNN and ensemble models
• Hadoop                : Linux, HDFS, Sqoop, Flume, Pig, Hive, MapReduce, Spark ML, RDD, SparkSQL, SQL
• Databases             : SQL, PL/SQL
• Visualization         : Tableau, built-in R and Python visualization libraries

PERSONAL INFORMATION

• Name            : Shubham Oza
• Father's name   : Subhash Oza
• DOB             : 3rd May 1994
• Languages known : Hindi, English and Telugu
• Address         : H.No 2-31/7/a/3, Taranagar, Lingampally, Hyderabad - 500019
