You are on page 1of 3

Substr requires start and end position

Substring function does not require end position

The default format for date is always YYYY-MM-DD

Lubridate is the name of the package for comparing dates easily

Outlier are those observations which are extremely away from the average.

These are the observations whose values are either greater than upper whisker or lower than lower
whisker.

Q1 – 1st quartile is that no. below which there are 25 observations

Q2- 2nd quartile and median

Q3- 3rd quartile and 75% values are less than q3

What is machine learning?

Machine learning is a subfield of computer

Where is machine learning used?

 Intelligence agencies might use it to determine which of a huge quantity of intercepted


communications are of interest.
 Medical researchers might use it to predict the likelihood of a cancer relapse.

Business usage of data mining/machine learning

 From a large list of prospective customers, which are most likely to respond? We can use
classification techniques like logistic regression, classification trees, etc.
 To find which customers are most likely to commit, for example, fraud (or might already
have committed it) ?

Types of Machine learning techniques

 The machine learning algorithms which we will be covering are


- Supervised learning algorithms
- Unsupervised learning algorithms
Supervised Learning

 Supervised learning algorithms are those used in classification and prediction


 We must have data available in which the value of the outcome of interest (e.g, purchase or
no purchase) is known.
 The objective is to predict the values of the outcome of interest

Example
 Sales are influenced by variables like advertisement expenses, manpower deployed for sales,
etc

Modes for supervised learning


 When the response variable is numerical, predictive modelling is called regression.
 When the response variable is nominal/ categorical, predictive modelling is called
classification. The values of the response variable can be considered as “class labels” in this
case.

Unsupervised learning
 Unsupervised learning algorithms are those used where there is no outcome variable to
predict or classify.
 Association rules, data reduction methods, and clustering techniques are all unsupervised
learning methods.

Market basket method is also known as association rule

IQR – Inter Quartile Range: It is the difference between Q3 and Q1

IQR = Q3 – Q1

Outlier

Jan 300

410

325

360

Dec 420
440

80000

Pch – point character

Lwd – line width

Lty – line type

Box & Whisker

You might also like