Tableau Intro:
What is Tableau?
-Tableau is a Business Intelligence (BI) tool. BI is the process of converting raw data into
meaningful information. It is a set of processes, architectures, and technologies that drive
profitable business actions.
-Business Intelligence (BI): It is a method of collecting, storing, and analyzing data from business
operations or activities to optimize performance.
BI has a direct impact on an organization's strategic and operational business decisions. It also
affects the revenue and financial model of the business.
Tableau Installation:
Go to Tableau.com→ Click on the “Products” menu tab→ Click on “Desktop” to download and install it
After the installation, you’ll see three areas to work in: Connect, Open, and Discover.
Tableau comes with a 14-day free trial period.
Tableau Architecture:
[Figure: Tableau architecture diagram; surviving labels: Tableau, Create, Share]
Tableau Pricing:
For Individuals: - $70/user/month, billed annually
For Teams & Organizations: -
Reports deployed with Tableau Server: $70, $35, $12 (there is an infrastructure cost
associated with it).
Reports deployed with Tableau Online: $70, $42, $15 (it is deployed on a cloud server,
so there is no infrastructure cost associated with it).
Process Flow of BI Project:
Business Understanding→ Data Understanding→ Data Preparation→ Modeling→ Evaluation→ Deployment
Business Understanding: - It refers to understanding the domain of the business.
Data Understanding: - It refers to understanding the data, which may be structured or
unstructured, short or long, ambiguous or null.
Data Preparation: - It refers to preparing the tables, dimensions, and relationships.
Modeling: - It refers to the creation of formulas, columns, functions, and advanced functions.
Evaluation: - It refers to the creation and evaluation of reports and dashboards. If the evaluation
fails, the process is repeated.
Deployment: - It refers to the deployment of the project.
Connecting to an Excel file: Go to Menu→ Click on MS-Excel→ Select the Excel file→ the
“Processing Request” box appears→ the data source interface appears
Tuesday:
Tableau has 7 data types:
1) Text (String) Values
2) Date Values
3) Date & Time Values
4) Numerical Values
5) Boolean Values (relational only)
6) Geographic values (used with maps)
7) Cluster Group (Used with “find clusters” in Data)
-Visuals Operations:
Changing colour of the bars in visuals
Increasing/Decreasing the size of the bar
Fit Width
The aim is to make visuals attractive so it’s easy to understand them
-Aggregation: It combines a collection of numbers into a single value, e.g., Median, Average,
Count, Variance.
-Creation of a new workbook:
Local aggregation: The aggregations that are directly implemented at the visual level.
Global aggregation: This aggregation is performed under the measure section. Go to
“Measures”→ Go to “Sales”→ Go to “Default properties”→ Go to “Aggregation”→ Select it.
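As a rough illustration of what an aggregation does outside Tableau, the sketch below reduces a list of numbers to single values in Python. The sales figures are made up for the example and are not taken from any dataset used in the training.

```python
import statistics

# Hypothetical sales figures, invented purely for illustration.
sales = [120.0, 80.0, 100.0, 140.0, 60.0]

# Each aggregation collapses the whole list into one value.
aggregates = {
    "count": len(sales),
    "sum": sum(sales),
    "average": statistics.mean(sales),
    "median": statistics.median(sales),
    "variance": statistics.variance(sales),  # sample variance (divides by n - 1)
}

for name, value in aggregates.items():
    print(f"{name}: {value}")
```

Choosing a different default aggregation in Tableau corresponds to picking a different entry from a dictionary like this one.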
-Clear Screen Operation
-Change Currency
-For grand total operation→ Go to “Analysis”→ Go to “Totals”
There are two formats to save a Tableau file:
-Saving a Tableau file: Go to “File”→ Save As→ Save in “.twb” format or “.twbx” format
“.twb” is the plain workbook format. It is an XML file that stores the workbook definition but not
the data, so it needs the dataset on the system to view the file.
“.twbx” is the packaged workbook format. It bundles the workbook together with its local data files,
so it doesn’t need the dataset on the system to view the file.
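A packaged workbook (“.twbx”) is an ordinary zip archive that holds the “.twb” workbook plus local data files, so its contents can be listed with any zip tool. The sketch below is only an illustration: it builds a toy archive in memory instead of opening a real Tableau file, and the member names “report.twb” and “Data/sales.xlsx” are hypothetical.

```python
import io
import zipfile

# Build a toy archive in memory that mimics a packaged workbook.
# The member names are hypothetical, chosen only for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.twb", "<workbook></workbook>")
    archive.writestr("Data/sales.xlsx", b"placeholder bytes")

# Read it back and list what the package bundles.
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()

has_workbook = any(name.endswith(".twb") for name in members)
print(members, has_workbook)
```

For a real file, the same `zipfile.ZipFile(path).namelist()` call would be pointed at the saved “.twbx” path.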
Charts:
Scatter Plot: It is used for establishing relation between 2 variables (which can be dependent or
independent)
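One way to put a number on the relation a scatter plot shows is the Pearson correlation coefficient. The sketch below computes it in plain Python; the two columns (`ad_spend`, `sales`) are invented for the example and the function name `pearson` is my own.

```python
import math

# Hypothetical paired observations, invented for illustration.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

r = pearson(ad_spend, sales)
print(f"r = {r:.3f}")
```

A value near +1 or -1 matches points that fall close to a straight line on the scatter plot; a value near 0 matches a shapeless cloud.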
Sunburst Pie Charts: It consists of 2 layers
i. Product Category Chart
ii. Product Sub-Category Chart
Sunburst pie charts are not available by default.
Go to “Worksheet”→ Actions→ Name it→ Hover→ Click OK
-Dual Axis
-Synchronize axis
Adding borders and colours to the border
Wednesday:
Time-Series Analysis on JetRail
Problem Statement
This time you are helping out Unicorn Investors with your data hacking skills. They are considering
making an investment in a new form of transportation - JetRail. JetRail uses jet propulsion
technology to run rails and move people at high speed! While JetRail has mastered the technology
and holds the patent for its product, the investment would only make sense if it can get
more than 1 million monthly users within the next 18 months.
You need to help Unicorn Ventures with the decision. They usually invest in B2C start-ups less than 4
years old that are looking for pre-series A funding. In order to help Unicorn Ventures in their decision, you
need to forecast the traffic on JetRail for the next 7 months. You are provided with traffic data of
JetRail since inception in the test file.
The solution consists of the following steps:
Understanding Data
1. Hypothesis Generation
2. Getting the system ready and loading the data
3. Dataset Structure and Content
4. Feature Extraction
5. Exploratory Analysis
Forecasting using Multiple Modeling Techniques
1. Splitting the data into training and validation part
2. Modeling techniques
3. Holt’s Linear Trend Model on daily time series
4. Holt-Winters model on daily time series
5. Introduction to ARIMA model
6. Parameter tuning for ARIMA model
7. SARIMAX model on daily time series
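To make the split-and-forecast steps above concrete, here is a minimal sketch of a time-ordered training/validation split followed by Holt's linear trend method, written in plain Python. It is not the course solution: the traffic series is synthetic, and the smoothing parameters (`alpha`, `beta`) and the 7-day validation window are assumptions picked for the example.

```python
def holt_linear(series, alpha, beta, horizon):
    """Holt's linear trend (double exponential smoothing) forecast."""
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Update the smoothed level, then the smoothed trend.
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast by extending the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]

# Synthetic daily passenger counts, invented for illustration only.
traffic = [100 + 5 * day for day in range(30)]

# Time-ordered split: the last 7 days are held out for validation.
train, valid = traffic[:-7], traffic[-7:]

forecast = holt_linear(train, alpha=0.5, beta=0.5, horizon=len(valid))
rmse = (sum((f - a) ** 2 for f, a in zip(forecast, valid)) / len(valid)) ** 0.5
print(forecast, rmse)
```

The split keeps the time order intact (no shuffling), because the validation period has to come after the training period for the error to mean anything.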
Thursday:
Data Science Mind Map:
Data Science:
i) Linear Algebra
(a) Vector Matrix Operations
1. Matrix/Matrix addition
2. Matrix/Matrix multiplication
3. Matrix/Vector multiplication
(b) Matrix Properties
1. Matrix multiplication is not commutative in general (A*B != B*A)
2. Matrix multiplication is associative: (A*B)*C = A*(B*C)
3. Multiplication with the identity matrix is commutative (A*I = I*A = A)
4. SHAPE (M) = ALWAYS Rows, Columns (R, C) (e.g. 2, 3)
(c) Inverse and Transposed Matrices
1. Inverse: A*A^-1 = A^-1*A = I (the identity matrix)
i. (A^-1 is the inverse of the matrix A, though not all matrices have
an inverse)
2. Transpose: A → A^T (where A is an m*n matrix and A^T is an n*m matrix, with A_ij
= (A^T)_ji). Basically, the first column becomes the first row.
i. X
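The matrix properties listed in the Linear Algebra branch can be checked on small examples. The sketch below uses plain Python lists of rows; the helper names (`matmul`, `transpose`, `shape`) and the sample matrices are my own choices, and the inverse of `A` was worked out by hand for this 2x2 case.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Swap rows and columns: result[j][i] == a[i][j]."""
    return [list(column) for column in zip(*a)]

def shape(a):
    """Return (rows, columns)."""
    return (len(a), len(a[0]))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]
identity = [[1, 0], [0, 1]]
A_inv = [[-2.0, 1.0], [1.5, -0.5]]  # inverse of A, worked out by hand

not_commutative = matmul(A, B) != matmul(B, A)
associative = matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
identity_commutes = matmul(A, identity) == matmul(identity, A) == A
inverse_holds = matmul(A, A_inv) == identity and matmul(A_inv, A) == identity
transposed_shape = shape(transpose([[1, 2, 3], [4, 5, 6]]))

print(not_commutative, associative, identity_commutes, inverse_holds, transposed_shape)
```

The exact `==` comparison with the inverse works here only because these particular values are exactly representable as floats; with general matrices a tolerance check would be needed.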
ii) Front-End (Web Development Tools)
(a) Flask (python)
(b) Shiny (R)
(c) Dash (Python)
(d) Tableau
(e) Carto
(f) Angular/React (JS)
(g) Django (Python)
iii) Network Analysis
(a) Metrics
1. Centrality
2. Betweenness
(b) Data Format
[Node_df = NAME, NODE_ATTRIBUTE_1, NODE_ATTRIBUTE_N]
Hypothesis
Feature scaling/normalization is extremely important here due
to exponential sizes
However, if we take this too far, there is a high chance of
OVERFITTING, as the line will follow the data points exactly but will not
generalize well
v. Decision Trees for Regression
vi. Random Forest for Regression
3. Reinforcement Models
i. Performance Metrics
ii. Neural Networks
Architectures
Feed Forward Networks (FFNs)
Gradient Descent
Minimizing the cost function
Backpropagation
Recurrent Neural Networks (RNNs)
Convolutional Neural Networks (CNNs)
Long Short-Term Memory networks (LSTMs)
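The "Gradient Descent" and "Minimizing the cost function" entries above can be illustrated on the smallest possible model, a straight line, instead of a full network. The sketch below minimizes mean squared error for y = w*x + b; the data, the learning rate, and the step count are assumptions chosen so the example converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Illustrative data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

In a neural network the same update rule is applied to every weight, with backpropagation supplying the gradients.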
4. Ensemble Modeling
Ensembling is a technique of combining two or more algorithms of
similar or dissimilar types called base learners
Types
1. Averaging
2. Majority Vote
3. Weighted Average
Methods
1. Bagging
2. Boosting
3. Stacking
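The three combination types listed under Ensemble Modeling (averaging, majority vote, weighted average) are simple to express in code. In the sketch below the base-learner outputs and the weights are hypothetical values, used only to show how the combinations differ.

```python
from collections import Counter

def average(predictions):
    """Plain averaging of numeric predictions."""
    return sum(predictions) / len(predictions)

def weighted_average(predictions, weights):
    """Averaging where more trusted learners get larger weights."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

def majority_vote(labels):
    """Return the class label predicted by the most learners."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs from three base learners.
regression_outputs = [10.0, 12.0, 17.0]
class_outputs = ["spam", "ham", "spam"]

plain = average(regression_outputs)
weighted = weighted_average(regression_outputs, [1, 1, 2])
voted = majority_vote(class_outputs)
print(plain, weighted, voted)
```

Averaging and weighted averaging suit numeric predictions, while majority vote suits class labels.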
5. Unsupervised (descriptive models)
i. Clustering
ii. K-Means
Decide how many centroids (clusters) you want
Randomly position these ‘centroids’ in the data
space
Go through each data point and ‘assign’ that point
to the closest centroid
Then move the centroids to the center (mean) of all
the data points that fall under that centroid
Repeat until the centroids stop moving
iii. DBSCAN
iv. Auto-encoders (Neural Nets)
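The centroid steps listed under Clustering describe the k-means procedure, and they can be sketched in plain Python as follows. Two assumptions are mine: the initial centroids are picked at random from the data points themselves, and the loop stops when the centroids no longer move (the stopping condition is cut off in the notes). The sample points are invented.

```python
import random

def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting positions
    for _ in range(max_iterations):
        # Assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                         for centroid in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups, invented for illustration.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

Note that k-means is different from KNN (k-nearest neighbours), which is a supervised method that classifies a point by the labels of its nearest neighbours.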
There’s much more to the Data Science Mind Map, but I couldn’t cover the whole of it.
Friday:
a) I worked on wandb.ai and Google Colab. It’s awesome to use Colab. We can save files in
GitHub and push changes made in a Google Colab file to the GitHub file.
b) Apart from this, I also downloaded and installed Power BI and made a few dashboards in it.
Revision:
How to select the best ML algorithm
How to perform feature selection
How to perform cross validation
Difference between Linear and Logistic Regression
Sigmoid Function (Sigmoid Curve(S-Curve))
P-Value
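For the sigmoid and linear-versus-logistic items in the revision list, a small sketch may help: logistic regression passes the same linear combination used by linear regression through the sigmoid, which squeezes it into the range (0, 1). The `weight` and `bias` values below are hypothetical and not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model parameters, chosen only for illustration.
weight, bias = 1.5, -3.0

for x in [0.0, 2.0, 4.0]:
    linear_output = weight * x + bias      # linear regression: unbounded value
    probability = sigmoid(linear_output)   # logistic regression: value in (0, 1)
    print(x, linear_output, round(probability, 3))
```

Plotting `sigmoid` over a range of inputs gives the S-curve mentioned in the list, centred on 0.5 at an input of zero.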
Student Signature: Vikas Gupta Date: 14/5/2021
Instructions: After the completed report has been signed by both the student and
Head-coordinator, the head-coordinator shall scan the form to a pdf format and email it to the
Director-1 (bpmishra435@gmail.com) of the company. Specific problems, concerns or suggestions from
either the student/head-coordinator should be emailed separately to the C. E. O. (info@cureya.in)
of the company.