INTRODUCTION
The What and Why of Analytics
• Data Preparation
• Reporting, Dashboards & Visualization
• Segmentation
• Forecasting
• Descriptive Modeling
• Predictive Modeling
• Optimization
Why is predictive analytics important?
COMMON APPLICATIONS
• The term "data dictionary" can have one of several closely related meanings pertaining to databases and
database management systems (DBMS):
• A piece of middleware that extends or supplants the native data dictionary of a DBMS
• Data is of two types: numeric and character. Numeric data can be further divided into two subgroups:
discrete and continuous.
• Based on usage, data is also divided into two categories: quantitative and qualitative.
• Manufacturing data falls into the same groups. For example, production quantity is discrete, while
production rate is continuous. Similarly, a quality parameter can be given ratings, which are ordinal
data.
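The data types above can be sketched in R. This is a minimal illustration, not taken from the original slides: the data frame `production`, its column names, and all values are made up for the example.

```r
# Hypothetical manufacturing records; all values are made up for illustration.
production <- data.frame(
  line       = c("A", "B", "C"),            # character (qualitative, nominal)
  units_made = c(120L, 95L, 143L),          # discrete numeric (counts)
  rate_per_h = c(15.2, 11.9, 17.8),         # continuous numeric
  quality    = factor(c("good", "poor", "fair"),
                      levels = c("poor", "fair", "good"),
                      ordered = TRUE),      # ordinal (ordered categories)
  stringsAsFactors = FALSE
)

# Show the storage type of every column.
str(production)

# Ordinal data supports order comparisons but not arithmetic.
at_least_fair <- production$quality >= "fair"
at_least_fair
```

An ordered factor is one reasonable way to hold ratings, because it keeps the ranking of the levels without pretending the gaps between them are equal.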
DATA MODELING
• A data model is a conceptual representation of
the data structures that are required by a
database.
Disadvantages
• Implementation complexity
• Database management problems: difficult to maintain
• Lack of structural independence
• Programming complexity
Network Model
• Graph structure
• Allows more connections between nodes
• Example: an employee working for two departments is not
possible in the hierarchical model, but it is possible
here
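The employee example above can be sketched as a small edge list in R. This is only an illustration of the graph idea, not a network DBMS; the table `works_for` and the names in it are hypothetical.

```r
# Hypothetical "works_for" links stored as an edge list (names are made up).
works_for <- data.frame(
  employee   = c("Asha", "Asha", "Ben"),
  department = c("Sales", "Support", "Sales"),
  stringsAsFactors = FALSE
)

# Count departments per employee; a strict tree would allow at most one parent.
dept_count <- table(works_for$employee)
multi_dept <- names(dept_count[dept_count > 1])
multi_dept
```

Here one employee appears in two links, which a graph allows and a strict parent-child hierarchy does not.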
Advantages
• Conceptual simplicity
• Handles more relationships
• Ease of data access
• Data integrity
• Data independence
• Database standards
Disadvantages
• System Complexity
• Absence of structural independence
Relational Model
• Data is organized in the form of tables
• Each table represents an application entity
• Each row represents an instance of that entity
• SQL serves as a uniform interface for users,
providing a collection of standard
expressions for storing and retrieving data
• Most popular database model
Advantages
• Structural independence
• Conceptual simplicity
• Ease of design, implementation, maintenance, and
usage
• Query capability
• Very powerful
• Flexible
• Easy-to-use query capability
The main highlights of the relational model
• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, the values saved are
atomic values.
• Each row in a relation contains a unique
value.
• Each column in a relation contains
values from the same domain.
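The points above can be illustrated with R data frames standing in for relations. This is a hedged sketch rather than a real DBMS: the tables `department` and `employee` and their contents are hypothetical, and base R's merge() plays the role that an SQL join would play in a relational database.

```r
# Two hypothetical relations (tables); each row is one instance of the entity.
department <- data.frame(dept_id   = c(10L, 20L),
                         dept_name = c("Sales", "Support"),
                         stringsAsFactors = FALSE)
employee <- data.frame(emp_id   = c(1L, 2L, 3L),
                       emp_name = c("Asha", "Ben", "Chen"),
                       dept_id  = c(10L, 10L, 20L),
                       stringsAsFactors = FALSE)

# Every column holds values from a single domain (one type per column).
sapply(employee, class)

# Each row is identified by a unique key value (0 means no duplicates).
anyDuplicated(employee$emp_id) == 0

# Relate the tables through the shared dept_id column (an equi-join).
staff <- merge(employee, department, by = "dept_id")
staff
```

Storing the department name once, in its own table, and linking to it by key is the idea behind the normalization mentioned above.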
Data Modeling Techniques
Overview:
• Regression analysis mainly focuses on finding a relationship between a
dependent variable and one or more independent variables.
• It predicts the value of a dependent variable based on the value of at least one
independent variable.
• It explains the impact of changes in an independent variable on the dependent
variable.
Y = f(X, β)
where Y is the dependent variable
X is the independent variable
β is the unknown coefficient
The types of regression models are as follows:
Linear Regression
• It's a common technique to determine how
one variable of interest is affected by another.
• It's used for three main purposes:
• Describing the linear dependence of one
variable on the other.
• Predicting values of one variable from
the other, for which more data are available.
• Correcting for the linear dependence of one
variable on the other.
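The description and prediction uses listed above can be sketched with base R's lm() and predict(). The vectors x and y hold made-up values chosen only for illustration, and the fitted form Y = b0 + b1 * X is one simple instance of Y = f(X, β).

```r
# Made-up data: y depends roughly linearly on x (illustrative values only).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0)

# Fit Y = b0 + b1 * X by ordinary least squares.
fit <- lm(y ~ x)
coef(fit)            # estimated intercept (b0) and slope (b1)

# Predict Y for new values of the independent variable.
new_x <- data.frame(x = c(7, 8))
pred <- predict(fit, newdata = new_x)
pred
```

The slope estimate describes how much Y changes per unit change in X, and predict() applies the fitted line to values of X that were not in the data.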
Cluster Analysis:
For example,
we define “y” and then check whether it has any missing
values with is.na(). T (TRUE) means that the value is missing.
y <- c(1,2,3,NA)
is.na(y)
# returns a vector (F F F T)
Arithmetic functions on missing values yield missing values.
For example,
x <- c(1,2,NA,3)
mean(x)
# returns NA
To remove missing values from our dataset, we use the na.omit() function.
For example,
we can create a new dataset without missing data as follows:
newdata <- na.omit(mydata)
Alternatively, we can pass “na.rm=TRUE” as an argument to the function. Applying
na.rm to the example above gives the desired result.
x <- c(1,2,NA,3)
mean(x, na.rm=TRUE)
# returns 2
MICE Package -> Multiple Imputation by Chained
Equations
By default, MICE uses PMM to impute missing numeric
values in a dataset.
PMM -> Predictive Mean Matching is a semi-parametric
imputation approach. It is similar to the
regression method, except that for each missing
value it fills in a value drawn at random from among the
observed donor values, taken from observations whose
regression-predicted values are closest to the
regression-predicted value for the missing value
from the simulated regression model.
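The donor-matching idea described above can be sketched in base R. This is a simplified, single-pass illustration and not the mice package's implementation: it fits one ordinary regression instead of a simulated one, and the function name `pmm_impute`, its argument `k` (the number of donors), and the data are all hypothetical.

```r
# Simplified single-pass PMM sketch in base R (not the mice implementation).
# `pmm_impute` and its argument `k` (number of donors) are hypothetical names.
pmm_impute <- function(y, x, k = 3) {
  dat  <- data.frame(x = x, y = y)
  miss <- is.na(y)
  fit  <- lm(y ~ x, data = dat[!miss, ])     # regression on observed cases
  pred <- predict(fit, newdata = dat)        # predictions for every case
  obs_idx <- which(!miss)
  for (i in which(miss)) {
    # Donors: observed cases whose predicted values are closest to case i's.
    dist   <- abs(pred[obs_idx] - pred[i])
    donors <- obs_idx[order(dist)][seq_len(min(k, length(obs_idx)))]
    # Fill in the observed value of one randomly chosen donor.
    y[i] <- dat$y[donors[sample.int(length(donors), 1)]]
  }
  y
}

set.seed(1)                                   # make the random draw repeatable
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.0, 4.1, NA, 8.2, 9.9, NA, 14.1, 16.0)
imputed <- pmm_impute(y, x)
imputed
```

Because every imputed value is copied from a real observation, the filled-in values always stay within the range of the observed data, which is one reason PMM is often preferred over plugging in the regression prediction directly.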