Explore data pre-processing packages and AIML algorithms

Theory:

Data Analytics: Data analytics is the process of examining datasets to draw conclusions about the
information they contain. Data analytics techniques let you take raw data and uncover patterns that
yield valuable insights. Many of these techniques use specialized systems and software that integrate
machine learning algorithms, automation, and other capabilities.

Data Science: Data science combines the scientific method, math and statistics, specialized
programming, advanced analytics, AI, and even storytelling to uncover and explain the business insights
buried in data. Data science encompasses preparing data for analysis and processing, performing
advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw
informed conclusions.

Dataset: A dataset (or data set) is a collection of data. In the most common tabular case, a dataset
corresponds to the contents of a single database table or a single statistical data matrix, where every
column represents a particular variable and each row corresponds to a given member of the dataset.

Datasets are not limited to just numbers and text and may include collections of images or videos.
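
The row and column view of a tabular dataset can be sketched in plain Python. The measurements and column names below are made up for illustration only.

```python
# Hypothetical dataset: each dict is one row, that is, one member of the dataset.
rows = [
    {"petal_length_cm": 1.4, "petal_width_cm": 0.2, "species": "setosa"},
    {"petal_length_cm": 4.7, "petal_width_cm": 1.4, "species": "versicolor"},
    {"petal_length_cm": 6.0, "petal_width_cm": 2.5, "species": "virginica"},
]

# Each key is one column, that is, one variable.
column_names = list(rows[0])

# Reading a single column across all rows.
species_column = [row["species"] for row in rows]

print(len(rows), "rows x", len(column_names), "columns")
```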

Fig1

Google Colab: Colab is a free notebook environment that runs entirely in the cloud. Notebooks can be
edited and shared in much the same way as documents in Google Docs. Colab supports many popular
high-level machine learning libraries, which can be loaded easily in a notebook. With Colab, you can:

Write and execute code in Python

Document code with text cells that support mathematical equations

Create new notebooks

Upload the existing notebooks

Share notebooks through a Google link

Import data from Google Drive

Save notebooks from/to Google Drive

Import/Publish notebooks from GitHub

Import external datasets e.g. from Kaggle

Integrate PyTorch, TensorFlow, Keras, OpenCV

Use a free cloud service with free access to GPUs and TPUs, subject to availability

Data Preprocessing in Machine Learning

Data preprocessing is the process of preparing raw data and making it suitable for a machine learning
model. It is the first, and a crucial, step in creating a machine learning model.

Why do we need Data Preprocessing?

Real-world data generally contains noise and missing values, and may be in a format that cannot be used
directly by machine learning models. Data preprocessing cleans the data and makes it suitable for a
machine learning model, which can also improve the model's accuracy and efficiency.

It involves the following steps:

Getting the dataset

Importing libraries

Importing datasets

Finding Missing Data

Encoding Categorical Data

Splitting dataset into training and test set

Feature scaling
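
The steps above can be sketched end to end on a tiny in-memory table. This is a minimal illustration, not a prescribed workflow: it uses NumPy and pandas (both described later in this section), the dataset and its column names are hypothetical, and mean imputation, one-hot encoding, and standardization are only one reasonable choice for each step.

```python
import numpy as np
import pandas as pd

# Getting/importing the dataset: a hypothetical toy table with made-up values.
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "Spain", "Germany", "France"],
    "age": [44.0, 27.0, 30.0, 38.0, np.nan, 35.0],
    "salary": [72000.0, 48000.0, 54000.0, np.nan, 61000.0, 58000.0],
    "purchased": ["no", "yes", "no", "no", "yes", "yes"],
})

# Finding missing data: count missing values per column, then fill with the column mean.
missing_counts = df.isna().sum()
df["age"] = df["age"].fillna(df["age"].mean())
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Encoding categorical data: one-hot encode the feature, map the label to 0/1.
features = pd.get_dummies(df[["country", "age", "salary"]], columns=["country"], dtype=float)
labels = df["purchased"].map({"no": 0, "yes": 1})

# Splitting into training and test sets: shuffle row positions, hold out one third.
rng = np.random.default_rng(0)
order = rng.permutation(len(df))
n_test = len(df) // 3
test_idx, train_idx = order[:n_test], order[n_test:]
X_train, X_test = features.iloc[train_idx].copy(), features.iloc[test_idx].copy()
y_train, y_test = labels.iloc[train_idx], labels.iloc[test_idx]

# Feature scaling: standardize numeric columns using training-set statistics only.
numeric_cols = ["age", "salary"]
mean = X_train[numeric_cols].mean()
std = X_train[numeric_cols].std(ddof=0)
X_train[numeric_cols] = (X_train[numeric_cols] - mean) / std
X_test[numeric_cols] = (X_test[numeric_cols] - mean) / std

print(missing_counts)
print(X_train)
```

The scaling statistics are computed from the training rows and then reused for the test rows, so that no information from the test set leaks into preprocessing.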

To perform data preprocessing in Python, we need to import some predefined Python libraries.

Numpy: The NumPy library is used to perform mathematical operations in code. It is the fundamental
package for scientific computing in Python. It also supports large, multidimensional arrays and
matrices.

https://numpy.org/
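
A short sketch of what working with a NumPy array looks like. The array values are arbitrary and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

column_means = a.mean(axis=0)   # mean of each column
scaled = a * 2                  # element-wise arithmetic, no explicit loop
product = a @ a.T               # matrix product, shape (2, 2)

print(a.shape)
print(column_means)
print(product)
```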

Matplotlib: The second library is Matplotlib, a Python 2D plotting library. From it we import the
pyplot sub-library, which is used to plot charts of many kinds in Python.

https://matplotlib.org/
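
A minimal pyplot sketch that draws one line chart and saves it as an image. The data and the output file name line_chart.png are illustrative assumptions. The "Agg" backend is selected so the script also runs without a display; in a notebook such as Colab that line is usually unnecessary and the figure appears inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line chart")
ax.legend()

# Write the chart to a PNG file in the current directory.
fig.savefig("line_chart.png")
```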

Pandas: The last library is pandas, one of the most widely used Python libraries. It is used for
importing and managing datasets.

https://pandas.pydata.org/
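
A small sketch of importing and managing a dataset with pandas. To keep it self-contained, the CSV content is held in a string rather than a file on disk; the names and values are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file; with a real file you would pass its path to read_csv.
csv_text = """name,age,city
Asha,29,Pune
Ravi,,Mumbai
Meera,41,Pune
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows of the table
print(df.isna().sum())  # number of missing values per column

older = df[df["age"] > 30]                   # filter rows by a condition
by_city = df.groupby("city")["age"].mean()   # aggregate a column per group
print(by_city)
```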

Machine Learning Algorithms

Machine learning algorithms are programs that can learn hidden patterns from data, predict outputs,
and improve their performance with experience. Different algorithms suit different tasks: for example,
simple linear regression can be used for prediction problems such as stock market prediction, and the
KNN algorithm can be used for classification problems.
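
As an illustration of a prediction algorithm, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The function name and the data points are hypothetical; the points lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 6.0 + intercept  # predict y for an unseen x = 6
print(slope, intercept, prediction)
```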

Types of Machine Learning Algorithms

Machine learning algorithms can be broadly classified into three types:

Supervised Learning Algorithms

Unsupervised Learning Algorithms

Reinforcement Learning Algorithms

1) Supervised Learning Algorithm


Supervised learning is a type of machine learning in which the machine needs external supervision to
learn. Supervised learning models are trained on a labeled dataset. Once training is done, the model is
tested on sample test data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on
supervision, much as a student learns under a teacher's supervision. Spam filtering is one example
of supervised learning.

Supervised learning can be divided further into two categories of problem:

Classification

Regression

Examples of some popular supervised learning algorithms are Simple Linear regression, Decision Tree,
Logistic Regression, KNN algorithm, etc.
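
A classification problem can be sketched with a minimal k-nearest-neighbours (KNN) classifier in plain Python. The spam-filtering framing follows the example above, but the two features (number of links and number of exclamation marks per message), the data points, and the function name are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Labeled training data: (links, exclamation marks) -> label.
train_points = [(0, 1), (1, 0), (1, 1), (7, 8), (8, 6), (9, 9)]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (8, 7)))
print(knn_predict(train_points, train_labels, (0, 0)))
```

The labeled examples play the role of the teacher: the classifier never learns an explicit rule, it only compares a new message with the labeled ones it has already seen.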

2) Unsupervised Learning Algorithm

Unsupervised learning is a type of machine learning in which the machine does not need any external
supervision to learn from the data. Unsupervised models are trained on an unlabelled dataset that is
neither classified nor categorized, and the algorithm must act on that data without any supervision.
The model has no predefined output; instead, it tries to find useful insights in large amounts of
data. Unsupervised learning is used to solve clustering and association problems, so it can be further
classified into two types:

Clustering

Association

Examples of some Unsupervised learning algorithms are K-means Clustering, Apriori Algorithm, Eclat,
etc.
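
Clustering can be illustrated with a minimal k-means sketch in plain Python. The points, the starting centroids, and the fixed iteration count are assumptions made for illustration; practical implementations usually pick starting centroids more carefully and stop once assignments no longer change.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Unlabelled points that visibly form two groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])

print(centroids)
print(clusters)
```

No labels are supplied anywhere: the grouping comes only from the distances between the points.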

3) Reinforcement Learning

In reinforcement learning, an agent interacts with its environment by taking actions and learns from
feedback. The feedback is given to the agent in the form of rewards: for each good action it gets a
positive reward, and for each bad action it gets a negative reward. No supervision is provided to the
agent. Q-learning is one example of a reinforcement learning algorithm.
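
The reward-driven loop can be sketched with tabular Q-learning on a toy environment. Everything about the environment is a hypothetical assumption: five states on a line, the goal at the right end, a reward of +1 for reaching the goal and -0.1 for every other move. The learning rate, discount factor, and exploration rate are illustrative values, not recommendations.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, action_index):
    """Apply an action and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.1
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on steps per episode
            # Epsilon-greedy: explore sometimes, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It discovers this only because moves toward the goal end up with higher estimated reward than moves away from it.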

ML Algorithms

List of Popular Machine Learning Algorithms

Linear Regression Algorithm

Logistic Regression Algorithm

Decision Tree

SVM

Naïve Bayes

KNN

K-Means Clustering

Random Forest

PCA
