Theory:
Data Analytics: The process of examining datasets to draw conclusions about the information they
contain. Data analytics techniques take raw data and uncover patterns in it to extract valuable
insights. Many of these techniques rely on specialized systems and software that integrate
machine learning algorithms, automation and other capabilities.
Data Science: Data science combines the scientific method, math and statistics, specialized
programming, advanced analytics, AI, and even storytelling to uncover and explain the business insights
buried in data. Data science encompasses preparing data for analysis and processing, performing
advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw
informed conclusions.
Dataset: A dataset (or data set) is a collection of data. A dataset corresponds to the contents of a single
database table, or a single statistical data matrix, where every column of the table represents a
particular variable, and each row corresponds to a given member of the data set in question.
Datasets are not limited to just numbers and text and may include collections of images or videos.
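The table view of a dataset described above can be illustrated with a small sketch. It assumes the Pandas library; the column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy dataset: each column is a variable, each row is one member.
dataset = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [23, 35, 41],
        "salary": [42000.0, 58000.0, 61000.0],
    }
)

n_rows, n_columns = dataset.shape   # (number of members, number of variables)
variables = list(dataset.columns)   # the column names are the variables
first_member = dataset.iloc[0]      # one row is one member of the dataset
```

Here `dataset.shape` gives 3 rows and 3 columns, matching three members described by three variables.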
Fig1
Google Colab: A free notebook environment that runs entirely in the cloud. Notebooks can be edited
and shared in much the same way as documents in Google Docs. Colab supports many popular
machine learning libraries, which can be easily loaded in your notebook.
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning
model. It is the first and a crucial step in creating a machine learning model.
Real-world data generally contains noise and missing values, and may be in an unusable format that
cannot be used directly by machine learning models. Data preprocessing cleans
the data and makes it suitable for a model, which also increases the model's accuracy and
efficiency. Typical preprocessing steps include:
Importing libraries
Importing datasets
Feature scaling
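As a rough illustration of cleaning and feature scaling, the sketch below fills a missing value with the column mean and then standardizes each column. The data is made up, and mean imputation and standardization are only one possible choice among several, not necessarily the method intended in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing age value.
raw = pd.DataFrame(
    {"age": [20.0, np.nan, 40.0], "salary": [30000.0, 50000.0, 70000.0]}
)

# Cleaning: replace each missing value with the mean of its column.
clean = raw.fillna(raw.mean())

# Feature scaling (standardization): subtract the column mean and divide by
# the column standard deviation, so every feature has mean 0 and std 1.
scaled = (clean - clean.mean()) / clean.std(ddof=0)
```

After scaling, age and salary are on the same scale, so a distance-based algorithm such as KNN is not dominated by the larger salary values.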
In order to perform data preprocessing using Python, we need to import some predefined Python
libraries.
NumPy: The NumPy library is used for mathematical operations in the code. It is
the fundamental package for scientific computing in Python. It also supports large,
multidimensional arrays and matrices.
https://numpy.org/
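A minimal sketch of what NumPy arrays offer, using a small made-up matrix:

```python
import numpy as np

# A 2 x 3 matrix stored as a multidimensional array.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

column_means = matrix.mean(axis=0)   # mean of each column
doubled = matrix * 2                 # element-wise arithmetic, no explicit loop
product = matrix @ matrix.T          # matrix multiplication, shape (2, 2)
```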
Matplotlib: The second library is Matplotlib, a Python 2D plotting library. From this library
we import the sub-library pyplot, which is used to plot charts in Python.
https://matplotlib.org/
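A short sketch of plotting with pyplot follows. The data points and labels are made up, and the "Agg" backend is an assumption so that the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="o")   # draw a line chart with point markers
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line chart")
```

In a notebook such as Google Colab the backend line can be dropped, and the chart appears below the cell after calling `plt.show()`.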
Pandas: The last library is Pandas, one of the most widely used Python libraries. It is used
for importing and managing datasets.
https://pandas.pydata.org/
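Importing a dataset with Pandas might look like the sketch below. The CSV content and column names are hypothetical; in practice `read_csv` would usually be given a file path or URL instead of an in-memory string.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """country,age,purchased
India,34,yes
France,27,no
Germany,45,yes
"""

dataset = pd.read_csv(io.StringIO(csv_text))

features = dataset[["country", "age"]]   # independent variables
target = dataset["purchased"]            # dependent variable
summary = dataset["age"].describe()      # count, mean, std, min, quartiles, max
```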
Machine learning algorithms are programs that can learn hidden patterns from data, predict
outputs, and improve their performance with experience. Different algorithms suit
different tasks: simple linear regression can be used for
prediction problems such as stock market prediction, while the KNN algorithm can be used for
classification problems.
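Simple linear regression can be sketched in a few lines. This is a minimal least-squares fit on made-up data that follows y = 2x + 1; the variable names and the `predict` helper are hypothetical.

```python
import numpy as np

# Hypothetical training data that follows y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares estimates for the line y = slope * x + intercept.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

def predict(new_x):
    """Predict the output for an unseen input using the fitted line."""
    return slope * new_x + intercept
```

Calling `predict(6.0)` extends the learned line to an input that was not in the training data.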
1) Supervised Learning
The goal of supervised learning is to map input data to output data using labelled examples.
Supervised learning is based on supervision, much as a student learns under a teacher's
supervision. An example of supervised learning is spam filtering. Supervised learning problems
can be classified into two types:
Classification
Regression
Examples of some popular supervised learning algorithms are Simple Linear regression, Decision Tree,
Logistic Regression, KNN algorithm, etc.
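As an illustration of classification, here is a minimal KNN sketch written directly with NumPy. The training points, labels, and the `knn_predict` helper are hypothetical, and Euclidean distance with majority voting is assumed.

```python
import numpy as np
from collections import Counter

# Hypothetical labelled training data: two features per point, two classes.
train_points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                         [6.0, 6.0], [6.0, 7.0], [7.0, 6.0]])
train_labels = ["A", "A", "A", "B", "B", "B"]

def knn_predict(point, k=3):
    """Return the majority label among the k nearest training points."""
    distances = np.linalg.norm(train_points - np.asarray(point), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

A new point is assigned the label that most of its `k` closest labelled neighbours carry, which is why labelled data is required.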
2) Unsupervised Learning
Unsupervised learning is a type of machine learning in which the machine does not need any
external supervision to learn from the data. Unsupervised models are trained on
unlabelled data that is neither classified nor categorized, and the algorithm must act on that
data without any supervision. In unsupervised learning, the model has no predefined output; it
tries to find useful insights in large amounts of data. It is used to solve association and
clustering problems, and can be further classified into two types:
Clustering
Association
Examples of some Unsupervised learning algorithms are K-means Clustering, Apriori Algorithm, Eclat,
etc.
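K-means clustering can be sketched as follows. The data, the `k_means` helper, and the choice of starting centroids are assumptions made for illustration; real implementations usually pick starting centroids at random and check for convergence.

```python
import numpy as np

# Hypothetical unlabelled data: two visible groups, no labels given.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

def k_means(data, initial_centroids, n_iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = initial_centroids.copy()
    for _ in range(n_iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        for j in range(len(centroids)):
            members = data[assignments == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, assignments

# Assumption: the first and last points are used as starting centroids.
centroids, assignments = k_means(points, points[[0, -1]])
```

No labels are supplied; the grouping comes only from the distances between the points.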
3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by taking actions and learns
with the help of feedback. The feedback is given to the agent in the form of rewards: for each
good action it gets a positive reward, and for each bad action it gets a negative reward. No
supervision is provided to the agent. The Q-Learning algorithm is used in reinforcement learning.
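The reward-driven loop can be sketched with tabular Q-learning on a tiny made-up environment: five states in a line, with the goal at the right end. The environment, the reward values, and the parameters `alpha`, `gamma`, and `n_episodes` are all assumptions chosen for illustration.

```python
import random
import numpy as np

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = [-1, +1]    # action 0 moves left, action 1 moves right

def step(state, action):
    """Hypothetical environment: +1 reward at the goal, -0.1 for any other move."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, -0.1, False

def train(n_episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q_table = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(n_episodes):
        state = 0
        for _ in range(100):                      # cap on episode length
            action = rng.randrange(len(ACTIONS))  # explore with random actions
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward the reward plus the
            # discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * q_table[next_state].max())
            q_table[state, action] += alpha * (target - q_table[state, action])
            state = next_state
            if done:
                break
    return q_table

q_table = train()
greedy_policy = q_table.argmax(axis=1)[:-1]  # best action in each non-goal state
```

The agent is never told which action is correct. It only sees rewards, yet the learned table ends up preferring the move toward the goal in every state.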
ML Algorithms
Decision Tree
SVM
Naïve Bayes
KNN
K-Means Clustering
Random Forest
PCA