INDUSTRIAL INTERNSHIP

WEEKLY PERFORMANCE REPORT (WPR)

Student ID: IC /8W-PYTHON /02(N)/2021


Student Name: Vikas Gupta
Supervisor Name: B. P. Mishra/ Shivani Mishra
Coordinator/Team Leader Name: Namira Rangrej
Mentor Name: Pranshu Sharma
Organization: CureYa
Hours Worked: Monday-2 hrs, Tuesday-2 hrs, Wednesday-3 hrs, Thursday-3 hrs, Friday- 2 hrs

Week 6 (31 May, 2021 -- 4 June, 2021)


Monday:
Tableau Tutorials:
To better understand data, BI professionals and data analysts use visualization tools such as Tableau,
Microsoft Power BI, and QlikView to analyze and forecast it. Tableau is a data visualization tool
widely used by companies to analyze their data and gain insights that help the business grow.
The best thing is that you can customize the dashboard according to your requirements.
There are 5 main products in Tableau product family:
1) Tableau Desktop
2) Tableau Server
3) Tableau Online
4) Tableau Reader
5) Tableau Public
Tableau is used in various industries such as:
• Aerospace & Defense
• Associations and Non-profits
• Higher Education
• K-12 Education
• Banking
• Media & Entertainment
• Pharmaceuticals and much more

Tableau Intro:
What is Tableau?
-Tableau is a Business Intelligence (BI) tool. BI is the process of converting raw data into
meaningful information. It is a set of processes, architectures, and technologies that drives
profitable business actions.

-Business Intelligence (BI): It is a method of collecting, storing, and analyzing data from business
operations or activities to optimize performance.
BI has a direct impact on an organization’s strategic and operational business decisions. It affects
the revenue and financial model of the business.

Why is Tableau a market leader?
- Because it’s
o Easy
o Powerful
o Fast
Tableau also doesn’t require any technical skills or coding to operate.

Tableau Installation:
Go to Tableau.com → Click on the “Products” menu tab → Click on “Desktop” to install it.
After the installation, you’ll see three areas to work with in Tableau, namely Connect, Open, and Discover.
Tableau comes with a 14-day free trial period.

Tableau Architecture:
[Figure: Tableau divides into Create (Tableau Desktop) and Share (Tableau Server, Tableau Online, Tableau Reader)]

Tableau has two components:
1) Create: It is used to create the visualization and uses Tableau Desktop for this purpose.
2) Share: It is used to share what is created and it uses Tableau Server, Tableau Online, and
Tableau Reader for this purpose.

Tableau Pricing:
• For Individuals: $70/user/month, billed annually
• For Teams & Organizations:
o Reports deployed with Tableau Server: $70, $35, $12 (there’s an infrastructure cost associated with it)
o Reports deployed with Tableau Online: $70, $42, $15 (it’s deployed on a cloud server, so there’s no
infrastructure cost associated with it)

Process Flow of BI Project:
[Figure: cycle of Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment]
• Business Understanding: It refers to understanding the domain of the business.
• Data Understanding: It refers to understanding the data, which may be structured or
unstructured, short or long, ambiguous or null.
• Data Preparation: It refers to preparing the tables, dimensions, and relationships.
• Modeling: It refers to the creation of formulas, columns, functions, and advanced functions.
• Evaluation: It refers to the creation and evaluation of reports and dashboards. If the evaluation
fails, the process is repeated.
• Deployment: It refers to the deployment of the project.

To connect to an Excel data source: Go to Menu → Click on MS-Excel → Select the Excel file → the
“Processing Request” box appears → the data source interface appears

Tuesday:
Tableau has 7 data types:
1) Text (String) Values
2) Date Values
3) Date & Time Values
4) Numerical Values
5) Boolean Values (relational only)
6) Geographic values (used with maps)
7) Cluster Group (Used with “find clusters” in Data)

There are 2 types of connections in Tableau:
1) Live: It is used when the data is too large and is stored remotely. It connects directly to the
dataset and works on the dynamic data.
2) Extract: It copies the data to the local system. It is used for function modelling.
Both connection types can be refreshed manually.

There are four types of joins to join the tables in Tableau:


1) Inner Join
2) Left Join
3) Right Join
4) Full Outer Join
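
The four join types can be illustrated outside Tableau with a small sketch in plain Python. This is not how Tableau implements joins; the tables `orders` and `customers`, the key `customer_id`, and the helper `join` are hypothetical names made up for illustration.

```python
# Two small tables as lists of dicts (hypothetical sample data).
orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 100},
    {"customer_id": 4, "amount": 75},
]
customers = [
    {"customer_id": 1, "name": "A"},
    {"customer_id": 2, "name": "B"},
    {"customer_id": 3, "name": "C"},
]


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is inner, left, right, or full."""
    rows = []
    matched_right = set()
    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in matches:
            rows.append({**l_row, **right[i]})
            matched_right.add(i)
        # Left and full outer joins keep left rows that found no partner.
        if not matches and how in ("left", "full"):
            rows.append(dict(l_row))
    # Right and full outer joins keep right rows that found no partner.
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                rows.append(dict(r_row))
    return rows


def joined_ids(how):
    """Sorted customer ids present in the joined result."""
    return sorted(row["customer_id"] for row in join(orders, customers, "customer_id", how))


for how in ("inner", "left", "right", "full"):
    print(how, joined_ids(how))
```

In this sketch, unmatched rows simply lack the other table's columns, which plays the role of the null values a join would show for them.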

-Visual Operations:
Changing the colour of the bars in visuals
Increasing/decreasing the size of the bars
Fit Width
The aim is to make the visuals attractive so that they are easy to understand
-Aggregation: It combines a collection of numbers into a single value, e.g., median, average, count,
variance
-Creation of a new workbook:
• Local aggregation: Aggregations that are implemented directly at the visual level.
• Global aggregation: This aggregation is set under the Measures section. Go to
“Measures” → “Sales” → “Default Properties” → “Aggregation” → select it.
-Clear Screen Operation
-Change Currency
-For grand total operation→ Go to “Analysis”→ Go to “Totals”
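
The aggregations named above (median, average, count, variance) and a grand total each reduce a column of numbers to a single value. A minimal sketch using Python's standard `statistics` module follows; the `sales` values are hypothetical, and the variance shown is the sample variance, which may or may not match the default chosen in a given tool.

```python
import statistics

# Hypothetical values of a "Sales" measure.
sales = [120.0, 80.0, 200.0, 80.0, 150.0]

aggregates = {
    "grand_total": sum(sales),                # sum of all values
    "count": len(sales),                      # number of values
    "average": statistics.mean(sales),        # arithmetic mean
    "median": statistics.median(sales),       # middle value after sorting
    "variance": statistics.variance(sales),   # sample variance (n - 1 denominator)
}

for name, value in aggregates.items():
    print(f"{name}: {value}")
```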
There are two formats to save a Tableau file:
-Saving a Tableau file: Go to “File” → “Save As” → save in “.twb” format or “.twbx” format
“.twb” is a Tableau workbook: an XML file that stores the worksheets and dashboards but not the data,
so it needs the dataset on the system to view the file.
“.twbx” is a Tableau packaged workbook: it bundles the “.twb” file together with the local data, so it
doesn’t need the dataset on the system to view the file.
Charts:
Scatter Plot: It is used to show the relationship between 2 variables (which can be dependent or
independent)
Sunburst Pie Chart: It consists of 2 layers
i. Product Category Chart
ii. Product Sub-Category Chart
Sunburst pie charts are not available by default.
Go to “Worksheet”→ Actions→ Name it→ Hover→ Click OK
-Dual Axis
-Synchronize axis
Adding borders and colours to the border

Market Basket Analysis:


Custom Visuals (by joins, functions, or dual axis)
KPIs (Key Performance Indicators)
Bar Graph with KPIs

Wednesday:
Time-Series Analysis on JetRail
Problem Statement
This time you are helping out Unicorn Investors with your data hacking skills. They are considering
making an investment in a new form of transportation - JetRail. JetRail uses jet propulsion
technology to run rails and move people at a high speed! While JetRail has mastered the technology
and holds the patent for its product, the investment would only make sense if they can get
more than 1 million monthly users within the next 18 months.
You need to help Unicorn Ventures with the decision. They usually invest in B2C start-ups less than 4
years old looking for pre-series A funding. In order to help Unicorn Ventures in their decision, you
need to forecast the traffic on JetRail for the next 7 months. You are provided with traffic data of
JetRail since inception in the test file.
The solution consists of the following steps:
Understanding Data
1. Hypothesis Generation
2. Getting the system ready and loading the data
3. Dataset Structure and Content
4. Feature Extraction
5. Exploratory Analysis
Forecasting using Multiple Modeling Techniques
1. Splitting the data into training and validation part
2. Modeling techniques
3. Holt’s Linear Trend Model on daily time series
4. Holt-Winters Model on daily time series
5. Introduction to ARIMA model
6. Parameter tuning for ARIMA model
7. SARIMAX model on daily time series
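
As a rough illustration of the first and third modeling steps above, the following sketch splits a series into training and validation parts and applies Holt's linear trend method (double exponential smoothing) written in plain Python. It is not the library routine a full solution would normally use; the `daily_traffic` values, the smoothing parameters `alpha` and `beta`, and all function names are assumptions made for illustration.

```python
import math


def train_validation_split(series, validation_size):
    """Keep the last `validation_size` points for validation (time order preserved)."""
    return series[:-validation_size], series[-validation_size:]


def holt_linear_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast is the last level plus h times the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical daily passenger counts, not the JetRail data.
daily_traffic = [10, 12, 15, 16, 19, 21, 24, 25, 28, 30]
train, valid = train_validation_split(daily_traffic, 3)
forecast = holt_linear_forecast(train, alpha=0.5, beta=0.3, horizon=len(valid))
validation_rmse = rmse(valid, forecast)
print(forecast, validation_rmse)
```

The validation RMSE gives one number for comparing this model against the other techniques in the list on the same held-out period.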
Thursday:
Data Science Mind Map:

Data Science:
i) Linear Algebra
(a) Vector Matrix Operations
1. Matrix/Matrix addition
2. Matrix/Matrix multiplication
3. Matrix/Vector multiplication
(b) Matrix Properties
1. Matrix multiplication is not commutative in general (A*B != B*A)
2. Matrix multiplication is associative: (A*B)*C = A*(B*C)
3. Multiplication by the identity matrix is commutative (A*I = I*A = A)
4. SHAPE (M) = ALWAYS Rows, Columns (R, C) (e.g. 2, 3)
(c) Inverse and Transposed Matrices
1. Inverse: A*A^-1 = A^-1*A = I
i. (A^-1 is the inverse of the matrix A, though not all matrices have
an inverse)
2. Transpose: A → A^T (where A is an m*n matrix and A^T is an n*m matrix, with A_ij
= A^T_ji). The first column becomes the first row, and so on.
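
The matrix properties listed under Linear Algebra can be checked numerically. The sketch below uses plain Python lists rather than a numerical library; the matrices `A`, `B`, `C`, `M` and the helper functions are hypothetical examples, and `inverse_2x2` only handles the 2x2 case.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Swap rows and columns, so that result[j][i] == a[i][j]."""
    return [list(column) for column in zip(*a)]


def shape(a):
    """Shape is always (rows, columns)."""
    return (len(a), len(a[0]))


def inverse_2x2(a):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[2, 1], [1, 1]]
B = [[0, 1], [1, 0]]
C = [[1, 2], [3, 4]]
I = [[1, 0], [0, 1]]
M = [[1, 2, 3], [4, 5, 6]]  # a 2 x 3 matrix

print(matmul(A, B) != matmul(B, A))                          # not commutative
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))    # associative
print(matmul(A, I) == matmul(I, A) == A)                     # identity commutes
print(matmul(A, inverse_2x2(A)) == I)                        # A * A^-1 = I
print(shape(M), shape(transpose(M)))                         # (2, 3) becomes (3, 2)
```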
ii) Front-End (Web Development Tools)
(a) Flask (python)
(b) Shiny (R)
(c) Dash (Python)
(d) Tableau
(e) Carto
(f) Angular/React (JS)
(g) Django (Python)
iii) Network Analysis
(a) Metrics
1. Centrality
2. Betweenness
(b) Data Format
[Node_df = NAME, NODE_ATTRIBUTE_1, NODE_ATTRIBUTE_N
Relation_df = FROM, TO, EDGE_ATTRIBUTE_1, EDGE_ATTRIBUTE_N]


iv) Time-Series Analysis
v) Recommendation Engines
(1) Content Based
(2) Collaborative Filter
vi) Labeling Data
(a) Manual Labeling
1. Calculate the approximate time it would take (e.g. 10s to label one, ergo…)
(b) Crowdsourcing
1. E.g. Amazon Mechanical Turk/ Chiron
(c) Synthetic Labeling
1. Introducing distortions to a smaller training set to amplify it (but only if
the distortions are what we would expect to find in the real training set, not just
random noise)
vii) Big Data
(a) Big Data Technologies
1. Hadoop
2. Spark
(b) ML on large datasets
1. Gradient Descent
i. Stochastic Gradient Descent
viii) Optimization algorithms
ix) Data Visualization Libraries
(a) ggplot2 (R)
(b) Matplotlib (Python)
(c) Seaborn (Python)
(d) Plotly (Python)
x) Machine Learning
(a) Supervised (predictive models)
1. Classification Models
2. Regression Models
i. Performance Metrics/ Cost function
ii. Linear Regression
Single-feature linear regression

iii. Multivariate linear regression
Hypothesis formula
Assumptions
Data Preparation
Learning rate selection
iv. Polynomial regression
Uses the mechanisms of linear regression to solve non-linear
problems
Hypothesis
Feature scaling/normalization is extremely important here because
the polynomial terms grow exponentially in size
However, if the polynomial degree is taken too far there is a high chance
of OVERFITTING, as the line will follow the data points exactly but will not
generalize well
v. Decision Trees for Regression
vi. Random Forest for Regression
3. Reinforcement Models
i. Performance Metrics
ii. Neural Networks
Architectures
Feed Forward Networks (FFNs)
Gradient Descent
Minimizing the cost function
Backpropagation
Recurrent Neural Networks (RNNs)
Convolutional Neural Networks (CNNs)
Long Short Term Memory networks (LSTMs)
4. Ensemble Modeling
Ensembling is a technique of combining two or more algorithms of
similar or dissimilar types, called base learners
• Types
1. Averaging
2. Majority Vote
3. Weighted Average
• Methods
1. Bagging
2. Boosting
3. Stacking
5. Unsupervised (descriptive models)
i. Clustering
ii. K-Means
• Decide how many centroids (clusters) you want
• Randomly position these ‘centroids’ in the data
space
• Go through each data point and ‘assign’ that point
to the closest centroid
• Then move the centroids to the center (mean) of all
the data points that fall under that centroid
• Repeat until the centroids stop moving
iii. DBscan
iv. Auto-encoders (Neural Nets)
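
The centroid steps listed above describe k-means clustering, and they can be sketched in plain Python as follows. Several choices here are assumptions rather than part of the notes: the initial centroids are sampled from the data points with a fixed seed, the loop stops when the centroids no longer move, and the `points` data and function name `k_means` are hypothetical.

```python
import random


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster `points` (tuples of numbers) into `k` groups with k-means."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assign every point to its closest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [
                sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```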

There is much more to the Data Science mind map, but I could not cover the whole of it.
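
Returning to the Ensemble Modeling branch of the mind map, the three combination types (averaging, majority vote, weighted average) can be sketched in a few lines of Python. The base-learner outputs and weights below are hypothetical numbers, and the function names are made up for illustration.

```python
from collections import Counter


def averaging(predictions):
    """Combine numeric predictions from base learners by their simple mean."""
    return sum(predictions) / len(predictions)


def majority_vote(labels):
    """Combine class labels from base learners by picking the most common one."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(predictions, weights):
    """Combine numeric predictions, giving more trusted learners larger weights."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


# Hypothetical outputs of three base learners for one sample.
probabilities = [0.2, 0.4, 0.9]
labels = ["spam", "ham", "spam"]

print(averaging(probabilities))
print(majority_vote(labels))
print(weighted_average(probabilities, weights=[1, 1, 2]))
```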
Friday:
a) I worked on wandb.ai and Google Colab. It’s awesome to use Colab. We can save files to
GitHub, so that changes made in a Google Colab file are reflected in the GitHub file.
b) Apart from this, I also downloaded and installed Power BI and made a few dashboards on it.
Revision:
• How to select the best ML algorithm
• How to perform feature selection
• How to perform cross validation
• Difference between Linear and Logistic Regression
• Sigmoid Function (Sigmoid Curve (S-Curve))
• P-Value
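
For the revision points on linear versus logistic regression and the sigmoid function, a small sketch may help: linear regression outputs the weighted sum directly, while logistic regression passes the same sum through the sigmoid to get a value between 0 and 1. The `weight` and `bias` values are hypothetical, not fitted parameters.

```python
import math


def sigmoid(z):
    """S-shaped curve mapping any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Rearranged form avoids overflow in math.exp for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def linear_prediction(x, weight, bias):
    """Linear regression: the output is an unbounded number."""
    return weight * x + bias


def logistic_prediction(x, weight, bias):
    """Logistic regression: the same linear score squashed into a probability."""
    return sigmoid(weight * x + bias)


for x in (-4, 0, 4):
    print(x, linear_prediction(x, weight=1.5, bias=0.0), logistic_prediction(x, weight=1.5, bias=0.0))
```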
Student Signature: Vikas Gupta Date: 14/5/2021

Head Co-ordinator Signature: Date:

Instructions: After the completed report has been signed by both the student and Head-
coordinator, the head-coordinator shall scan the form to a pdf format and email it to the Director-
1 (bpmishra435@gmail.com) of the company. Specific problems, concerns or suggestions from
either the student/head-coordinator should be emailed separately to the C. E. O. (info@cureya.in)
of the company.
