
By Cristóbal Veas

LinkedIn: https://www.linkedin.com/in/cristobal-veas

MACHINE LEARNING
“A subfield of computer science that gives computers the ability to learn without being explicitly
programmed”

Major machine learning techniques


- Regression/Estimation: predicting continuous values.
- Classification: predicting the class/category of a case.
- Clustering: finding the structure of data, summarizing.
- Associations: Associating frequent co-occurring items/events.
- Anomaly detection: discovering abnormal and unusual cases.
- Sequence mining: predicting next events, click-stream (Markov model, HMM).
- Dimension reduction: reducing the size of data (PCA).
- Recommendation systems: recommending items.

Other important concepts


Artificial Intelligence: A wide field that tries to make computers intelligent so that they can mimic the
cognitive functions of humans (computer vision, language processing, creativity, summarizing, etc.).
Machine Learning: A branch of AI that covers the statistical part of computer intelligence (classification,
clustering, neural networks, etc.).
Deep Learning: A branch of machine learning based on multi-layer neural networks, with a deeper level of
automation than most machine learning algorithms.

SUPERVISED LEARNING
Training a model on labelled examples so that it can predict the labels of future instances. For Supervised
Learning the data must be labelled.

Supervised Learning Features:


- Useful for Classification (the process of predicting categories) and Regression (the process of
predicting continuous values).
- Has more evaluation methods than Unsupervised Learning.
- Has a more controlled environment than Unsupervised Learning.

Types of Supervised Techniques:


REGRESSION
The process of predicting continuous values. It is divided into two types: Simple Regression (one
independent variable X1 and a target Y1) and Multiple Regression (several independent variables
X1, X2, …, Xn and a target Y1).

Advantages of Regression:
- Very fast to fit and analyze.
- Does not require parameter tuning.
- Easy to understand.
- Highly interpretable.

Train and Test Data:


It is useful to separate the data into train and test sets so that the accuracy of the prediction model
can be evaluated on data it has not seen. There are three ways to do this:

- Test on a portion of train set: The Test-Set is a portion of the Train-Set. The result is high
training accuracy but low out-of-sample accuracy.
- Train/Test Split: The Train-Set and Test-Set are mutually exclusive, which gives a more accurate
estimate of out-of-sample accuracy, but the result depends heavily on which data ends up in each set.
- K-Fold Cross Validation: Uses multiple train/test splits and averages the results to produce a more
consistent accuracy estimate.
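
A minimal sketch of the train/test split and K-fold ideas above, in plain Python. The function names (`train_test_split`, `k_fold_indices`, `cross_validate`) and the `fit`/`score` callbacks are illustrative assumptions, not part of any specific library.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and return mutually exclusive (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs so that every row is tested exactly once."""
    indices = list(range(n_rows))
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_rows // k + (1 if fold < n_rows % k else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(rows, k, fit, score):
    """Fit on each training fold, score on the held-out fold, return the average."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Here `fit` takes the training rows and returns a model, and `score` takes the model and the held-out rows and returns a number; both are supplied by the caller.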

Types of Algorithms for Regression


- Simple Regression: A model that uses one independent variable to predict a dependent variable.

- Multiple Linear Regression: A model that uses many independent variables to predict a dependent
variable.

- Non-Linear Regression: Models to use when the relationship between the variables is not linear.
Examples of Non-Linear Regression are polynomial, logarithmic, logistic, cubic and quadratic regressions.

Methods to minimize the MSE (Mean Squared Error)


Minimizing the MSE gives the most accurate predictive model. There are several methods to find the
parameters θ that minimize the MSE:

- Ordinary Least Squares: Solves for θ with linear algebra operations; suited to datasets with fewer than 10k values.
- Optimization Algorithms: Use Gradient Descent to approach θ iteratively; suited to datasets of 10k values or more.
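
A small sketch comparing both methods on a simple regression y = θ0 + θ1·x. The function names, the learning rate and the number of steps are assumptions chosen for this toy example; both methods should reach nearly the same θ.

```python
def mse(theta0, theta1, xs, ys):
    """Mean squared error of the line theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_ols(xs, ys):
    """Closed-form ordinary least squares for one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Move theta against the gradient of the MSE, one small step at a time."""
    theta0 = theta1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(errors) / n
        grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x
print(fit_ols(xs, ys))
print(fit_gradient_descent(xs, ys))
```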

Evaluation metrics in Regression:



CLASSIFICATION
The process of categorizing some unknown items into a discrete set of categories or “classes”. It
corresponds to a supervised learning approach. The target attribute is a categorical variable.

Types of Algorithms for Classification


- K-Nearest Neighbors: An algorithm that assumes that similar things exist in close proximity. The
steps to use this algorithm are:
1. Pick a value for K.
2. Calculate the distance from the unknown case to all cases.
3. Select the K observations in the training data that are nearest to the unknown
data point.
4. Predict the response of the unknown data point using the most popular
response value among the K nearest neighbors.
It is important to determine the best value of K (the number of nearest neighbors of a
specific point). To do that, it is useful to plot the accuracy obtained for different values of K.
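
The steps above can be sketched in plain Python as follows. Euclidean distance and the helper names `knn_predict` and `accuracy` are assumptions; other distance measures work the same way.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Step 2: distance from the unknown case to every training case.
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 3: keep the k nearest observations.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 4: most popular response value among the neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(train_points, train_labels, test_points, test_labels, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_points, train_labels, p, k) == y
               for p, y in zip(test_points, test_labels))
    return hits / len(test_points)
```

Calling `accuracy` for several values of K on a held-out test set gives the numbers needed for the accuracy-versus-K plot.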

- Decision Trees: Used to go from observations about an item (represented in the branches)
to conclusions about the item's target value (represented in the leaves). At each node the model
looks for the split with the highest information gain, that is, the lowest weighted entropy of the
resulting child nodes.
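
A short sketch of the two quantities a decision tree compares when choosing a split. The function names are assumptions; entropy is measured in bits (log base 2).

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels (0 for a pure node)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent_labels)
    weighted = sum(len(group) / total * entropy(group)
                   for group in child_label_groups)
    return entropy(parent_labels) - weighted
```

A split that separates the classes perfectly has weighted entropy 0 and therefore the highest possible gain; a split that leaves each child as mixed as the parent has gain 0.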

- Logistic Regression: A classification algorithm for categorical variables. It is analogous to linear
regression but predicts a categorical variable. This model is suitable when the target is binary, when
probabilistic results are required and when it is important to understand the impact of a feature. Logistic
Regression uses the sigmoid function to turn a linear score into a probability. The training process is:
1. Initialize θ.
2. Calculate the predicted value for a customer.
3. Compare the predicted value with the real value and record the difference as the error.
4. Calculate the error for all customers.
5. Change θ to reduce the cost.
6. Go back to step 2 and repeat until the cost is low enough or a maximum number of iterations is reached.
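
The training loop above can be sketched like this, assuming batch gradient descent on the log-loss cost. The function names, learning rate and number of epochs are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Map any real score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, features):
    """theta[0] is the intercept; theta[1:] are the feature weights."""
    z = theta[0] + sum(w * x for w, x in zip(theta[1:], features))
    return sigmoid(z)


def train_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    theta = [0.0] * (len(rows[0]) + 1)           # step 1: initialize theta
    for _ in range(epochs):                      # step 6: repeat
        gradient = [0.0] * len(theta)
        for features, y in zip(rows, labels):    # step 4: loop over all customers
            error = predict_proba(theta, features) - y   # steps 2 and 3
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        # Step 5: move theta against the average gradient to reduce the cost.
        theta = [t - learning_rate * g / len(rows)
                 for t, g in zip(theta, gradient)]
    return theta


rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]
theta = train_logistic(rows, labels)
print(predict_proba(theta, [0.5]), predict_proba(theta, [4.0]))
```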

- Support Vector Machine (SVM): A supervised algorithm that classifies data by finding a
separator. It maps the data to a high-dimensional feature space using kernel functions (kernelling),
so that classes which cannot be separated in the original space can be split by a hyperplane. It is used
for image recognition, text category assignment, spam detection, sentiment analysis and gene
expression classification.
Advantages (A) and disadvantages (D) of using this algorithm:
1. A. Accurate in high-dimensional spaces.
2. A. Memory efficient.
3. D. Prone to over-fitting.
4. D. No direct probability estimation.
5. D. Best suited to small datasets.

Evaluation Metrics in Classification


- Jaccard Index: The size of the intersection of the predicted and true labels divided by the size of
their union. A value closer to 1 means higher accuracy.

- Confusion Matrix: Each cell of the matrix counts the correct and wrong predictions for a class. It is
used to calculate precision, recall and the F-score. An F-score closer to 1 means higher
accuracy.

- Log Loss: Used when the classifier outputs a probability between 0 and 1 for a class label instead of
the label itself. A value closer to 0 means better accuracy.
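
A sketch of the three metrics for a binary problem, assuming labels 0/1 with 1 as the positive class. The function names are illustrative, and the code assumes at least one positive prediction and one positive label so that no division by zero occurs.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), the four cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def jaccard_index(y_true, y_pred, positive=1):
    """Intersection over union of the true and predicted positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp + fn)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, probabilities, eps=1e-15):
    """Average negative log-likelihood of the true 0/1 labels."""
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```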

CLUSTERING
Dividing the population or data points into a number of groups such that data points in the same group
are more similar to each other and dissimilar to the data points in other groups.
Clustering is an unsupervised learning technique and is used for exploratory data analysis, summary
generation, outlier detection, finding duplicates, as a pre-processing step, etc.

Types of Algorithms for Clustering.

- K-means Algorithm: Used for partitioning clustering, dividing the data into non-overlapping
subsets without any cluster-internal structure. The examples within a cluster are very similar to each
other and very different from the examples in other clusters. K-means is used for medium and large
datasets, produces sphere-like clusters and needs the number of clusters to be specified. The features
of this algorithm are:
Intra-Cluster: Distances between examples inside the same cluster (minimized).
Inter-Cluster: Distances between examples in different clusters (maximized).

Steps to K-means Algorithm:


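1. Choose K and initialize the K centroids randomly.
2. Calculate the distance from each point to each centroid.
3. Assign each point to the closest centroid.
4. Calculate the SSE (sum of squared distances from each point to its centroid); step 5 reduces it.
5. Compute the new centroid of each cluster as the mean of its points.
6. Repeat from step 2 until there are no more changes.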
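
The steps above can be sketched in plain Python as follows. The signature (including the optional `initial_centroids` argument) is an assumption made for this example; points are tuples of numbers.

```python
import math
import random


def k_means(points, k, initial_centroids=None, seed=0, max_iterations=100):
    """Return (centroids, assignments, sse) for a list of point tuples."""
    # Step 1: pick K starting centroids (random points unless given).
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        # Steps 2 and 3: assign each point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 5: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    # Step 4: SSE of the final clustering.
    sse = sum(math.dist(p, centroids[a]) ** 2
              for p, a in zip(points, assignments))
    return centroids, assignments, sse
```

Because the result depends on the starting centroids, it is common to run the algorithm several times with different seeds and keep the run with the lowest SSE.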

- Hierarchical Clustering: Builds a hierarchy of clusters in which each node is a cluster consisting of
the clusters of its daughter nodes. The agglomerative approach works bottom-up, merging clusters
towards the top; the divisive approach works top-down, splitting clusters towards the bottom. The
hierarchy is drawn as a dendrogram. The steps are:

Distance between clusters:

Advantages and disadvantages of Hierarchical Clustering:


1. Advantage: Does not require the number of clusters to be specified.
2. Advantage: Easy to implement.
3. Advantage: Produces a dendrogram, which helps with understanding the data.
4. Disadvantage: Can never undo any previous step of the algorithm.
5. Disadvantage: Generally has long runtimes.
6. Disadvantage: Sometimes it is difficult to identify the number of clusters from the dendrogram.

- Density-Based Clustering: Algorithms that locate regions of high density and separate out the outliers.
One of the most important is DBSCAN (Density-Based Spatial Clustering of Applications with
Noise), which works on the density of objects. Each point is classified as a core, border or outlier
point. It is based on 2 parameters:
1. Radius of neighborhood: If the radius includes enough points within it,
we call it a dense area.
2. Min number of neighbors: The minimum number of data points we want in a
neighborhood to define a cluster.

Advantages of DBSCAN:

1. Finds arbitrarily shaped clusters.
2. Robust to outliers.
3. Does not require specification of the number of clusters.
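
A compact sketch of DBSCAN using the two parameters described above. The function name, the brute-force neighbor search and the convention of counting a point as its own neighbor are assumptions of this example.

```python
import math


def dbscan(points, radius, min_neighbors):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for outliers."""
    OUTLIER = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors_of(i):
        # Brute force: every point within `radius`, including the point itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= radius]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = neighbors_of(i)
        if len(neighbors) < min_neighbors:
            labels[i] = OUTLIER  # may become a border point later
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == OUTLIER:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors_of(j)
            if len(j_neighbors) >= min_neighbors:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels
```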

RECOMMENDER SYSTEMS
A process that captures the patterns of people's behavior and uses them to predict what else they might
want or like. The applications include what to buy, where to eat, which job to apply for, who you should
be friends with and personalizing your experience on the web. The advantages are broader exposure, the
possibility of continual usage or purchase of products and a better experience.

Implementing Recommendation Systems:

- Memory-Based: Uses the entire user-item dataset to generate a recommendation. Uses statistical
techniques to approximate users or items (Pearson correlation, cosine similarity, Euclidean
distance, etc.).
- Model-Based: Develops a model of users to learn their preferences. Models can be created
using machine learning techniques like regression, clustering and classification.
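
A sketch of the three similarity measures named for the memory-based approach, applied to two rating vectors of equal length. The function names are assumptions, and the code assumes neither vector is all zeros (or constant, for Pearson).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def euclidean_distance(a, b):
    """Straight-line distance (0 means identical vectors)."""
    return math.dist(a, b)
```

Note that cosine similarity and Pearson correlation grow with similarity, while Euclidean distance shrinks.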

Types of recommendation systems:

- Content-Based: Tries to recommend items to a user based on their profile.

Steps for content-based recommender system:



- Collaborative Filtering: Based on the fact that relationships exist between products and people's
interests. These algorithms have 2 different approaches:
1. User-Based collaborative filtering: Based on the user's neighbors (users with similar tastes).
The steps are:

2. Item-Based collaborative filtering: Based on the similarities between items.
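
One possible sketch of user-based collaborative filtering: predict a missing rating as the similarity-weighted average of the ratings given by other users. The data layout (a dict of user to item ratings), the function names and the choice of cosine similarity on shared items are assumptions, not necessarily the procedure the original steps describe.

```python
import math


def cosine_on_shared_items(ratings_a, ratings_b):
    """Cosine similarity computed only on the items both users rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Weighted average of neighbors' ratings for `item`; None if nobody rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_on_shared_items(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        weight_total += similarity
    return weighted_sum / weight_total if weight_total else None


ratings = {
    "ana": {"m1": 5, "m2": 1},
    "ben": {"m1": 5, "m2": 1, "m3": 4},
    "cat": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict_rating(ratings, "ana", "m3"))  # closer to ben's 4 than to cat's 2
```

The item-based approach is the same idea with the roles swapped: similarities are computed between item columns instead of user rows.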

Challenges of collaborative filtering:

1. Data Sparsity: Users in general rate only a limited number of items.
2. Cold start: Difficulty in making recommendations to new users or for new items.
3. Scalability: Performance drops as the number of users or items increases.

Difference Among fields of Data Science



REINFORCEMENT LEARNING
“A subfield of computer science that studies how, given a set of rewards or punishments, an agent
learns what actions to take in the future”

Agent: Entity that perceives its environment and acts upon that environment (Ex: a person trying to solve a
puzzle).
State: A configuration of the agent and its environment (Ex: a configuration of the puzzle's pieces to solve).
Action: Choices that can be made in a state (Ex: the moves the agent can make to solve the puzzle in a
specific state).
Environment or Transition Model: The setting in which the agent takes its actions; the transition model
describes which state results from taking an action in a given state.
Reward: Numerical value that represents whether the action taken was positive or negative.

Difference of Exploration and Exploitation in Reinforcement Learning

Exploitation: Using the knowledge that the AI already has about which actions work.
Exploration: Trying other actions that the AI may not have explored before, in order to gain new knowledge.

Algorithms for Reinforcement Learning

Markov Decision Process: Model for decision-making, representing states, actions, and their rewards.
- Set of states S
- Set of actions ACTIONS(s) available in each state s
- Transition Model P(s'|s,a)
- Reward function R(s, a, s')

Q-Learning: Method for learning a function Q(s, a), an estimate of the value of performing action a in state s.

Pseudocode of Q-Learning:
Start with Q(s, a) = 0 for all s, a.
Each time an action a is taken in state s and a reward r is received:
Estimate the value of Q(s, a) based on the current reward and expected future rewards.
Update Q(s, a) to take into account the old estimate as well as the new estimate.

Q(s, a) ← Q(s, a) + α(new value estimate − old value estimate)

Q(s, a) ← Q(s, a) + α((r + max_a' Q(s', a')) − Q(s, a))

r: reward
s: state
a: action
s': state reached after taking action a in state s
a': actions available in state s'
α: learning rate -> how much new information is valued compared with old information
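
The update rule above can be sketched as a single Python function. Storing Q in a `defaultdict` keyed by `(state, action)` and the function name `q_update` are assumptions; the formula follows the one given here, without a discount factor.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha=0.5):
    """Apply Q(s,a) <- Q(s,a) + alpha * ((r + max_a' Q(s',a')) - Q(s,a))."""
    old_estimate = q[(state, action)]
    # Best value reachable from the next state (0 if it has no actions).
    best_future = max((q[(next_state, a)] for a in next_actions), default=0.0)
    new_estimate = reward + best_future
    q[(state, action)] = old_estimate + alpha * (new_estimate - old_estimate)
    return q[(state, action)]


q = defaultdict(float)  # Q(s, a) = 0 for all s, a
q_update(q, "s0", "right", reward=1.0, next_state="s1",
         next_actions=["left", "right"])
print(q[("s0", "right")])
```

With α = 0.5 each update moves the stored value halfway from the old estimate towards the new one.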

Greedy Decision-Making Policy: Used with the Q-learning formula; when in state s, choose the action a with
the highest Q(s, a).

ε Greedy:
Pseudocode:
Set ε equal to how often we want to move randomly.
With probability 1 − ε, choose the estimated best move.
With probability ε, choose a random move.
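
The ε-greedy pseudocode can be sketched in Python as follows. The function name, the `(state, action)` keys of `q` and the optional `rng` argument are assumptions made for this example.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    # Exploit: action with the highest Q(s, a); unseen pairs count as 0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


q = {("s0", "left"): 0.2, ("s0", "right"): 0.9}
print(epsilon_greedy(q, "s0", ["left", "right"], epsilon=0.1))
```

Setting ε = 0 gives the purely greedy policy, and ε = 1 gives purely random exploration.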

Function Approximation: Approximating Q(s, a), often by a function combining various features, rather than
storing one value for every state-action pair.
