
UNIT 1

CAPSTONE PROJECT

Learning Outcomes
Understanding the Capstone Project
Understanding/Defining the Problem
Problem Decomposing using Design Thinking Framework
Using an Analytical Approach
Model Validation
Metrics of Model Quality - Loss Function

Artificial Intelligence (AI) is all around us. Everyone is leveraging AI applications to make their life better. Even CBSE has started using AI algorithms to detect suspicious data patterns to identify examination centres that indulge in malpractices during examinations conducted by CBSE. Organisations are using AI in customer service, human resources, IT automation, security, and more. In fact, India's AI market is expected to reach $7.8 billion by 2025.

In this unit, you will learn about the meaning of the capstone project and apply concepts learned in Class XI to solve real-world problems. Next, you will understand which algorithm to apply for your AI model and also learn about different types of testing methodologies. At the end of the chapter, you will learn about the concept of loss functions.

Understanding the Capstone Project


Having studied the various facets of AI, students can also create their own small version of an AI model that can help to solve a real-life problem. This type of AI model is called a capstone project. A capstone project is a comprehensive, independent, and final project undertaken as part of the curriculum, designed to assess the skills, knowledge, and expertise a student has acquired. Such a project often involves researching a topic, evaluating a new technique or method, developing a health plan, researching a character or event in history, or even the composition of a sketch or a play.

No matter what type of project you choose to undertake, the result is the same. You can demonstrate your understanding of the course material learned and demonstrate your readiness to enter the professional world to kickstart your career. It's a rewarding experience if it's done well.

So, a capstone project provides an opportunity for students to integrate all the knowledge learned during the course and demonstrate it. Ideally, a great capstone project is one that:



you have great interest and passion for
has practical usage/value
is doable in a particular timeframe
is contemporary
helps in your career advancement

Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: What is a Capstone Project
https://www.youtube.com/watch?v=yBs2Vb5H54
After watching the video, answer the following question:
What is a Capstone Project? What is its purpose?

Some examples of capstone projects in AI, from which you can pick one to develop, are as follows:
Studying images to diagnose disease
Forecasting student results
Creating a chatbot for the school admin department or counsellor to handle parents'/students' queries using IBM Watson or Google Dialogflow
Image classifier
Analysing social media to assess emotions
Using regression to predict a trend

Understanding/ Defining the Problem


Nowadays, Artificial Intelligence is one of the most transformative technologies. Every AI project goes through these six steps:

Defining the problem → Data Collection → Defining the features → AI modelling → Evaluation → Deployment

Understanding the nature of a problem provides insight into the components and attributes of the yet-to-be-implemented solutions. A good understanding of a problem guides future decisions to be made at the project stages, especially decisions such as determining if an AI solution is even feasible. At the core of every AI model is 'finding patterns in data'. If the data shows no patterns, then most probably the problem cannot be solved using AI.



A successful problem-defining process requires a basic analysis and evaluation of the project-related problems, their reasons, and methods. Finding the right problem definition is usually an iterative process. It can reveal more questions and standpoints to consider that would have been ignored without the problem definition process. Questions are just a reference point for a thorough analysis of the problem and the problems that surround it.

Spending time answering the questions below can save you weeks and months of working on problems that prove unsolvable for previously unknown reasons:
What is the problem that needs to be resolved?
Why do you need a solution to your problem?
Why should we work on the solution to the problem?
Which aspects of the problem does the AI model solve?
How do I need to interact with the solution to the problem?
Which category of data will be involved? (Classification)
How much or how many? (Regression)
Can the data be grouped? (Clustering)
Is there any unusual pattern in the data? (Anomaly Detection)
Which option should be given to the customer? (Recommendation)

It is essential to determine which of these questions you're asking, and in what way answering it helps to solve your problem.

Brainy Fact
There are patterns in nature even! The 'Golden Ratio' or 'Divine Proportion', i.e., 1.618, is found in sunflowers, daisies, chrysanthemums, etc. Leonardo Da Vinci used this ratio in his paintings like The Vitruvian Man and the Mona Lisa. Even the Taj Mahal has been built using the Golden Ratio. The ratio of any two consecutive terms of the Fibonacci series, like 233/144, is approximately 1.618. Fascinating, isn't it!

Problem Decomposing using Design Thinking Framework

Recall that the Design Thinking methodology provides a solution-based approach to solving problems. It is very useful in tackling complex problems by understanding the needs of the people involved, reformulating the problem in a human-centered way, brainstorming many ideas, and taking a practical approach to prototyping and testing. By understanding the five phases of design thinking, anyone can solve complex problems that arise around us. Let's recall the 5 stages as follows:

Design Thinking: A 5 Stage Process
Empathise → Define → Ideate → Prototype → Test


Empathise: Conduct research (interviews, polls, etc.) to better understand your users.
Define: Define the challenges; use your research to observe users' current problems. Create a point of view.
Ideate: Brainstorm to arrive at various creative solutions.
Prototype: Build representations (charts, models) of one or more ideas.
Test: Test your model(s) and gain user feedback.

Solving a real-life problem is complicated. During coding, we follow a problem decomposition methodology which can be applied to real-life problems as well. The problem decomposition steps are as follows:
1. Understand the problem and express the problem in your own words:
   Understand the required inputs and outputs.
   Ask questions for clarity (in class, these questions may be directed to the teacher; however, you can also ask yourself or your colleagues).
2. Break down the problem into several big parts. Write them down on paper.
3. Divide any larger complicated part into smaller pieces. Continue this until all parts are small.
4. Code the smaller parts one by one. Use the following methodology:
   Analyse how to implement the code.
   Write the code/query.
   Test each piece of code individually.
   Fix the problem(s), if any.
Imagine you want to create your first website. How would you decompose this task? Think about the following while decomposing:
What colour combination can be used?
How many web pages are to be included on the website?
Who will be the target audience?
What kind of images are to be posted on the website?
What video/audio would be included?
Which software to use for website development?
This work can also be divided among the various team members involved in this task.

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Introduction to Decomposition
https://www.youtube.com/watch?v=rsYpP2-omg
After watching the video, answer the following question:
What is decomposition?



Using an Analytical Approach

In data science, it is common to solve problems and answer questions using data analysis. Typically, data scientists build models to predict outcomes or explore underlying patterns for information gathering purposes. Organizations can then use this information to take actions to ideally improve future outcomes. There are many rapidly evolving technologies for data analysis and model building. In a remarkably short time, there have been significant improvements in the quality and accuracy of the models too.

As data analytics becomes more accessible and prevalent, data scientists need a core methodology that can form a guiding strategy, regardless of the technology, the volume of relevant data, or the approach.

This methodology emphasizes many of the new approaches in data science. It consists of 10 steps that form an iterative process of using data to discover information. Each step plays an important role in the context of the overall methodology.

The 10 stages of the methodology are: Business understanding → Analytic approach → Data requirements → Data collection → Data understanding → Data preparation → Modelling → Evaluation → Deployment → Feedback, with feedback flowing back into business understanding.

Stage 1: Business Understanding

Every project starts with a business understanding. Business sponsors, who need analytical solutions, play the most important role at this stage in defining the problem, project objectives, and solution requirements from a business perspective. This first step lays the foundation for successful business problem solving and is perhaps the hardest. To help ensure project success, sponsors should be involved throughout the project to provide expert knowledge, review interim conclusions, and ensure that work remains on track to produce the intended solution.

Stage 2: Analytic Approach

Once the business problem is clearly stated, the data scientist can define an analytical approach to solving the problem. This step is to represent the problem in the context of statistical techniques and machine learning so that the organization can determine the most appropriate type of technique for the desired outcome.
For example,
If the goal is to predict an answer such as 'yes' or 'no', then the analytical method can be defined as the building, testing, and execution of a classification model.
If the goal is to determine the probability of an action, then predictive modelling can be used.
If the goal is to show relationships, a descriptive approach may be necessary.
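To make the choice concrete, the small sketch below (not part of the original text; the variables X, y_labels and y_prices are hypothetical placeholders) shows how the chosen analytic approach maps to the family of model you would create in a library such as scikit-learn.

# Illustrative sketch only: mapping the analytic approach to a model family.
# X, y_labels and y_prices are hypothetical placeholders for prepared data.
from sklearn.linear_model import LogisticRegression, LinearRegression

# Goal: predict a yes/no answer        -> classification model
churn_model = LogisticRegression()      # e.g. will a customer leave? (yes/no)

# Goal: predict a continuous quantity  -> regression model
price_model = LinearRegression()        # e.g. what will the house sell for?

# churn_model.fit(X, y_labels)          # y_labels holds categories such as 'yes'/'no'
# price_model.fit(X, y_prices)          # y_prices holds continuous values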
Stage 3: Data Requirements

The analytical approach chosen characterizes the requirements for the data. In particular, the analytical methods to be used require certain data content, formats, and initial data collection.

Stage 4: Data Collection

During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area. If there is a gap in data collection, the data scientist may need to modify the data requirements accordingly and collect new and/or more data. Today's high-performance databases and analytics enable data scientists to utilize large datasets that contain much or even all of the available data. Due to this, predictive models are able to better predict rare events such as disease or system failure.

Stage 5: Data Understanding

After the initial data collection, techniques such as descriptive statistics and visualizations can be applied to the datasets to evaluate the content, quality, and initial insights of the data. Additional data collection may be required to fill the gaps.

Stage 6: Data Preparation

This stage contains all the activities to build the dataset used in the subsequent modelling stage. Activities to prepare data include:
data cleansing (handling missing or invalid values, removing duplicates, applying correct formats),
joining data from multiple sources (files, tables, platforms), and
the conversion of data to more useful variables.

Data preparation is usually the most time-consuming procedure in a data science project. In many cases, some data preparation procedures are common for a variety of problems. Automating certain data preparation steps in advance can speed up the process by minimizing ad hoc preparation time. Today's high-performance parallel systems and analytics capabilities, located where the data is stored, allow data scientists to prepare data more easily and quickly using very large datasets.
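The preparation activities listed above can be sketched in a few lines of pandas. This is only an illustration; the file names (sales.csv, stores.csv) and column names are assumptions, not part of the original text.

# Data-preparation sketch using pandas (file and column names are assumed).
import pandas as pd

# Join data from multiple sources on a common key
sales = pd.read_csv("sales.csv")            # assumed source file
stores = pd.read_csv("stores.csv")          # assumed source file
data = sales.merge(stores, on="store_id")   # assumed common key column

# Data cleansing: remove duplicates and handle missing values
data = data.drop_duplicates()
data["units_sold"] = data["units_sold"].fillna(0)

# Apply correct formats and derive a more useful variable
data["date"] = pd.to_datetime(data["date"])
data["revenue"] = data["units_sold"] * data["unit_price"]
print(data.head())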
Stage 7: Modelling

The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach. To develop a prediction model, data scientists employ a training set (historical data in which the desired outcome is already known). As businesses receive intermediate insights, the modelling process is often very iterative, leading to refinements in data preparation and model formulation. Data scientists may attempt numerous algorithms with their respective parameters for a specific technique to get the best model for the available variables.
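As a minimal sketch of this stage (assuming a small synthetic dataset in place of real historical data, and a decision tree as just one of many possible algorithms):

# Modelling sketch: fit a model on historical (labelled) training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200)                 # stand-in for historical data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

model = DecisionTreeClassifier(max_depth=3)                # max_depth is one tunable parameter
model.fit(X_train, y_train)                                # learn from known outcomes
print("Score on held-out data:", model.score(X_test, y_test))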

Stage 8: Evaluation

The data scientist reviews the model during development and before deployment to determine its quality and ensure that it correctly and completely answers the business problem. The data scientist can interpret the model's quality and efficacy in solving the problem by producing numerous diagnostic metrics and other outputs such as tables and graphs. The data scientist can utilize a testing set for predictive models (which is separate from the training set but follows the same probability distribution and has a known outcome). The testing set is used to assess the model and adjust it as necessary.

For a final assessment, the final model is sometimes applied to a validation set as well. In addition, data scientists can use statistical significance tests to verify the model's accuracy. This additional evidence could help justify model deployment or taking action when the stakes are high, such as with an expensive medical procedure or a key aviation flight system.
Stage 9: Deployment

The model is deployed into the production environment or an equivalent test environment once it has been built and authorized by the business sponsors. It is usually used in a restricted capacity until its effectiveness has been thoroughly assessed. The model might be embedded in a complicated workflow and scoring process run by a customized application, or it could be as simple as providing a report with recommendations. Deploying a model into a live business process frequently necessitates the involvement of additional internal teams, skills, and technology.

Stage 10: Feedback

The organisation receives feedback on the model's effectiveness and impact on the environment in which it is deployed by collecting findings from the implemented model. For instance, feedback could come in the form of response rates to a promotional campaign. Data scientists can utilize this feedback to improve the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model refresh process and improve results.

The iterative nature of the problem-solving process is shown by this methodology's flow. As data scientists gain a better understanding of the data and models, they typically return to a prior stage to make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model and the labour that goes into it can continue to add value to the business for as long as the solution is required.

AI Reboot

Fill in the blanks.
1. The problem ______ methodology can be applied to real-life problems as well.
2. Historical data in which the desired outcome is already known is called ______.
3. In the modelling stage, the data scientist will construct ______ models.
4. The methodology for model building and deployment is an ______ process.

Brainy Fact
Envision, the award-winning iOS and Android smartphone app that allows blind and visually impaired people to independently access visual information around them, announced plans to integrate its AI-powered software technology into Google Glass in March 2020. The combination of Envision's software and Google Glass gives blind and visually impaired users a substantially less invasive and hands-free manner of accessing the world around them, giving them greater freedom and independence to access and 'see' the world around them.



Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: How Envision Works
https://www.youtube.com/watch?v=9ehENnq2EFo&t=164s
After watching the video, answer the following question:
How do Envision glasses help visually impaired people?

Model Validation

There are mainly two types of validation methods: Train Test Split Evaluation and Cross-Validation. Let us learn about them in detail.
Train Test Split Evaluation

The train test procedure measures the performance of machine learning algorithms when they make predictions on data that was not used to train the model. While it is quick and easy to use, it is only suitable for huge data sets. The train test split cannot be utilised when the data sets are small or when additional configurations are required, such as when the data set is not balanced.

The train test split technique can be used to test machine learning algorithms for classification and regression problems. The technique divides the provided dataset into two subsets:
The training dataset is used to fine-tune the machine learning model and train the algorithm.
The test dataset is used to evaluate the fitted model; the model makes predictions using the input elements of the test data, which are then compared with the known outcomes.
Reasons for Choosing Train Test Split Evaluation

The goal is to estimate the machine learning model's performance on new data that was not used to train the model. This is how we want to use the model in the real world. To put it another way, we want to fit it to existing data with known inputs and outputs, then generate predictions for fresh cases in the future where we don't know the expected outcome or goal values.

Another reason to employ the train-test split assessment process, other than dataset size, is computational efficiency. Some models are extremely expensive to train, making the repeated evaluation employed in other techniques impossible. Deep neural network models are one example. The train-test approach is widely employed in this situation.

Sometimes, a project may already have a model working efficiently and a large dataset, but may still require a quick overview of model performance. Again, the train-test split procedure is selected in this situation.

Random selection is used to divide samples from the original dataset into the two subsets. This ensures that the train and test datasets are reflective of the original dataset. When the dataset available is small, the train-test procedure is not appropriate. The reason for this is that there will not be enough data in the training dataset for the model to learn an appropriate mapping of inputs to outputs. There will also be insufficient data in the test set to evaluate the model's performance appropriately.
Configuring the Train Test Split

The size of the train and test sets is the procedure's key configuration parameter. For either the train or test dataset, this is usually given as a percentage between 0 and 1. For example, a training set with a size of 0.67 (67%) means that the test set will get the remaining percentage of 0.33 (33%).

There is no such thing as an ideal split percentage. A data scientist determines a split percentage that suits the project goals, taking into account the following factors:
The computational cost of training the model
The computational cost of assessing the model
Representativeness of the training set
Representativeness of the test set
Total number of examples

Commonly used split percentages are:
Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%

Experiential Learning

Video Session

Scan the QR code or visit the following link to watch the video: Training and Testing
https://www.youtube.com/watch?v=P2Nqrfp8usY
After watching the video, answer the following question:
What is the role of data in training a model?

Train-Test Split Procedure in Python

The train_test_split() function in the scikit-learn Python machine learning package implements the train-test split evaluation procedure. The function accepts a dataset as input and returns the dataset split into two subsets.

You can use any of the following statements:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
OR
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
Example:
# split a dataset into train and test sets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
# create dataset
X, y = make_blobs(n_samples=1000)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Output:
(500, 2) (500, 2) (500,) (500,)

Out of 1000 samples, 50% (500) are for the training set and 50% (500) are for the test set.

Cross-Validation Procedure

Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, which specifies the number of groups into which a given data sample should be divided. As a result, the process is frequently referred to as k-fold cross-validation; for example, k=10 for 10-fold cross-validation. It's a popular strategy since it's straightforward to grasp and produces a less biased or optimistic estimate of model competence than other approaches, such as a simple train/test split.



The following is the general procedure:
1. Randomly shuffle the dataset.
2. Organize the data into k groups.
3. For each distinct group:
   As a holdout or test data set, use the group.
   As a training data set, use the remaining groups.
   Fit a model to the training set and test it against the test set.
   Keep the evaluation score but discard the model.
4. Using the sample of model evaluation scores, summarise the model's ability.

[Figure: In each iteration of k-fold cross-validation, a different fold is held out as test data while the remaining folds form the training data, until all the data has been used for testing.]
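The whole loop above can be run with scikit-learn's KFold and cross_val_score helpers. The sketch below uses a synthetic dataset purely for illustration.

# k-fold cross-validation sketch using scikit-learn (k = 5 here).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_blobs(n_samples=100, centers=2)     # synthetic stand-in dataset
kfold = KFold(n_splits=5, shuffle=True)         # shuffle, then organize into k groups
scores = cross_val_score(LogisticRegression(), X, y, cv=kfold)

print("Score for each fold:", scores)
print("Mean score:", scores.mean())             # summary of the model's ability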

Experiential Learning
Video Session


Scan the QR code or visit the following link to watch the video: K-Fold Cross Validation - Intro to Machine Learning
https://www.youtube.com/watch?v=Tlgfimp-4BA
After watching the video, answer the following question:
What is the role of cross validation in training and testing a model?

Train Test Split Vs Cross-Validation

The increased computational load of doing cross-validation isn't a major concern on small datasets. With a train-test split, these are also the problems where model quality scores would be the least trustworthy. Cross-validation should therefore be used if your dataset is small.

For the same reasons, for larger datasets, a simple train-test split is sufficient. It will run faster, and you may have enough data that re-using a portion of it for multiple splits is unnecessary.

Cross-validation is mostly the method of choice since it allows your model to be trained and evaluated on multiple train-test splits. This gives a good idea of how well your model will perform on data not seen before. On the other hand, the Train Test Split procedure relies on only one train-test split.
Metrics of Model Quality - Loss Function

A loss function is used by machines to learn. It's a way of determining how well a certain algorithm models the data. If the forecasts are too far off from the actual findings, the loss function will return a very large number. With the help of some optimization/objective function, the loss function learns to lower the prediction error over time.

In machine learning, there is no such thing as a one-size-fits-all loss function. The type of machine learning method used, the ease of calculating derivatives, and, to some extent, the nature of the dataset all play a role in selecting a loss function for a certain task.

Depending on the type of learning job we're dealing with, loss functions can be divided into two categories: regression losses and classification losses. In classification, the output is predicted from a set of finite categorical values; for example, given a large data set of photographs of handwritten numbers, categorising them into one of the 0-9 digits. Regression, on the other hand, is concerned with predicting a continuous value, such as, given the floor area, number of rooms, and room size, predicting the price of the house.
Classification                      Regression
Log Loss                            Mean Square Error/Quadratic Loss
Focal Loss                          Mean Absolute Error
KL Divergence/Relative Entropy      Huber Loss/Smooth Mean Absolute Error
Exponential Loss                    Log Cosh Loss
Hinge Loss                          Quantile Loss

MSE (Mean Squared Error)

The Mean Squared Error (MSE) is the most basic and widely used loss function, and it is frequently taught in Machine Learning courses. Calculate the difference between the model's predictions and the actual values, square it, and average it across the entire dataset to get the value of MSE. MSE is given by the equation:

MSE = Σ (Predicted_i - Actual_i)² / N

MSE will never be negative because the errors are always squared.

Advantage
MSE is useful for ensuring that our trained model does not have any outlier predictions with significant errors, because MSE places a higher weight on these errors due to the squaring element of the function.

Disadvantage
If our model makes a single particularly incorrect forecast, the squaring part of the function magnifies the error. However, in many real-life cases, we don't worry about these outliers and instead seek a more fully-rounded model that performs well enough on the majority of cases.
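To see this sensitivity to outliers in numbers, the short sketch below (values invented purely for illustration) compares MSE with the Mean Absolute Error (MAE) when one forecast is badly wrong.

# Comparing MSE and MAE on predictions that contain one large outlier error.
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [10, 12, 11, 13]
y_pred = [10, 12, 11, 33]      # the last forecast is off by 20 (an outlier)

print("MSE:", mean_squared_error(y_true, y_pred))    # 100.0 - dominated by the outlier
print("MAE:", mean_absolute_error(y_true, y_pred))   # 5.0  - grows only linearly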



Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Mean Squared Error MSE
https://www.youtube.com/watch?v=Mhw-xHVmat
After watching the video, answer the following question:
What do you mean by MSE?

Calculating MSE in Python: the mean_squared_error function gives the mean squared error regression loss.

from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7.2]    # list of actual values
y_pred = [2.5, 0.0, 2.3, 8]   # list of predicted values
print("MSE value=", mean_squared_error(y_true, y_pred))   # returns MSE value

# Using nested lists
y_true = [[0.5, 1], [-1, 0], [7, -5]]
y_pred = [[0, 2], [-1, 1.5], [8, -5.5]]
print("MSE value=", mean_squared_error(y_true, y_pred))

Output:
MSE value= 0.3074999999999999
MSE value= 0.7916666666666667

RMSE (Root Mean Square Error)

The Root Mean Square Error (RMSE) is a metric for determining how well a regression line fits the data points. RMSE can also be understood as the standard deviation of the residuals. Recall that the residual is the difference between the predicted value and the observed value on the regression line.

[Figure: scatter plot of observed values against No. of Hours Studied (X), showing the regression line and the residual error of one data point.]



So RMSE is calculated as:

RMSE = √( Σ (Predicted_i - Actual_i)² / N )

The errors are squared before being averaged in RMSE. This basically means that RMSE gives larger mistakes a higher weight. This suggests that RMSE is far more beneficial when substantial errors exist and have a significant impact on the model's performance. This characteristic is important in many mathematical calculations since it avoids taking the absolute value of the error. The RMSE of a good model should be less than 180. The lower the RMSE value, the higher the model's performance.

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Calculating RMSE in Excel
https://www.youtube.com/watch?v=G8j8KAJUlw
After watching the video, answer the following question:
What do you mean by RMSE?

Calculating RMSE in Python

import numpy as np

y_pred = np.array([0.000, 0.166, 0.333])   # predicted values
y_true = np.array([0.000, 0.254, 0.998])   # actual values

def rmse(predictions, targets):
    diff = predictions - targets
    diff_sq = diff ** 2
    mean_diff_sq = diff_sq.mean()
    rmse_val = np.sqrt(mean_diff_sq)
    return rmse_val

print("predicted values are: " + str(["%.4f" % i for i in y_pred]))
print("actual values are: " + str(["%.4f" % i for i in y_true]))
rmse_val = rmse(y_pred, y_true)
print("RMS Error is: " + str(rmse_val))



Output:
predicted values are: ['0.0000', '0.1660', '0.3330']
actual values are: ['0.0000', '0.2540', '0.9980']
RMS Error is: 0.3872849941150143

If you have a larger RMSE value, you will most likely need to alter your features or tweak your hyperparameters.

Hyperparameters are parameters whose values govern the learning process. They also determine the values of the model parameters that a learning algorithm ends up learning. They are top-level parameters that regulate the learning process and the model parameters that come from it, as the prefix 'hyper' suggests. Since the model cannot modify their values during learning/training, hyperparameters are said to be external to the model. Some examples of hyperparameters are:
The ratio of the train-test split
Optimization algorithms' learning rate (e.g., gradient descent)
The activation function selected in a neural network (e.g., Sigmoid, ReLU, Tanh)
The loss function that the model will employ
A neural network's number of hidden layers
The number of iterations (epochs) required to train a neural network
A clustering task's number of clusters
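Several of the hyperparameters listed above appear directly as arguments when a model or split is created, as the small sketch below shows (the chosen values are arbitrary and only illustrative).

# Hyperparameters are fixed by the practitioner before training begins.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300)

# The ratio of the train-test split is a hyperparameter
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# The number of clusters is a hyperparameter of a clustering task
kmeans = KMeans(n_clusters=3, n_init=10)    # values chosen before, not learned during, training
kmeans.fit(X_train)
print("Learned cluster centres:\n", kmeans.cluster_centers_)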

At a Glance

A capstone project is a comprehensive, independent, and final project undertaken as part of a curriculum designed to assess the skills, knowledge, and expertise a student has acquired.
A successful problem-defining process requires a basic analysis and evaluation of the project-related problems, their reasons, and methods.
Design Thinking methodology provides a solution-based approach to solving problems.
During coding, we follow a problem decomposition methodology which can be applied to real-life problems as well.
Once the business problem is clearly stated, the data scientist can define an analytical approach to solving the problem.
The analytical approach chosen characterizes the requirements for the data.
During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area.
The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach.
The data scientist reviews the model during development and before deployment to determine its quality and ensure that it correctly and completely answers the business problem.
The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model.
The training dataset is used to fine-tune the machine learning model and train the algorithm; the test dataset is used to make predictions and evaluate the trained model.
Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data.

26. Techniques such as descriptive statistics and ______ can be applied to the dataset to assess the content, quality, and initial insights about the data. [CBSE, 2022]
a. Visualization   b. Evaluation   c. Modelling   d. Analysis

27. Train-Test Split is a technique for evaluating the performance of a ______ algorithm. [CBSE, 2021]
a. Rule-Based   b. Data Science   c. Machine Learning   d. Prediction

28. In ______ we run our modelling process on different subsets of the data to get multiple measures of model quality. [CBSE, 2022]
a. Train-Test Split   b. Cross-Validation   c. Machine Learning   d. Validation

29. The data scientist will use a ______ for predictive modelling. [CBSE, 2022]
a. Algorithm   b. Training set   c. Data compilation   d. Data preparation

30. Assertion (A): The Train-Test procedure is appropriate when there is a sufficiently large dataset available.
Reason (R): The larger the dataset, the more accurate is the prediction. [CBSE, 2022]
a. Both Assertion (A) and Reason (R) are true and Reason (R) is the correct explanation of Assertion (A)
b. Both Assertion (A) and Reason (R) are true, but Reason (R) is not the correct explanation of Assertion (A)
c. Assertion (A) is true, but Reason (R) is false
d. Assertion (A) is false, but Reason (R) is true

31. Regression functions predict a ______ and classification predicts a label. [CBSE, 2022]
a. output   b. quantity   c. loss   d. logic
32. Which of the following is a correct formula for calculating RMSE? [CBSE, 2022]
a. RMSE = √( Σ (Predicted_i - Actual_i)² / N )   b. RMSE = Σ (Predicted_i - Actual_i) / N
c. RMSE = √( Σ (Actual_i - Predicted_i)² )       d. RMSE = Σ (Actual_i - Predicted_i) / N

33. Consider the following data. Identify which of the following commands are correct to use split(): [CBSE, 2022]

      X   Y   month   day   FFMC   DMC     DC      ISI    RH   wind   rain   area
178       2   sep     wed   90.1   82.9    735.7   6.2    45   2.2    0.0    4.88
35    6   3   sep     tue   90.3   80.7    730.2   6.3    62   4.5    0.0    0.00
75    9   9   feb     thu   84.2   6.8     26.6    7.7    79   3.1    0.0    0.00
491   4   4   aug     thu   95.8   152.0   624.1   13.8   21   4.5    0.0    0.00
464   6   4   feb     tue   75.1   4.4     16.2    1.9    77   5.4    0.0    2.14



10. Assertion (A): A split technique divides the provided dataset into two subsets: training and test subsets.
Reason (R): As a result, the process is frequently referred to as k-fold cross-validation.
Select the appropriate option for the statements given above:
a. Both A and R are true and R is the correct explanation of A
b. Both A and R are true and R is not the correct explanation of A
c. A is true but R is false
d. A is false but R is true

ii. Which of these is NOT a type of question based on analytics? [CBSE Sample Paper, 2022]
a. Descriptive   b. Statistical Analysis   c. Forecasting   d. Data evaluation
iii. Which of the following statements is/are INCORRECT? [CBSE Sample Paper, 2022]
(i) Different transforms of the data can be used to train the same machine learning model.
(ii) Different machine learning models cannot be trained on the same data.
(iii) Different configurations for a machine learning model can be trained on the same data.
a. (i)   b. (ii)   c. Both (i) & (ii)   d. Both (i) & (iii)
iv. If the problem is based on probabilities of an action, then which analytic approach can be used? [CBSE Sample Paper, 2022]
a. Predictive Model   b. Prescriptive   c. Diagnostic   d. Descriptive
is in terms of being able to predict
V.A loss function is a measure of how good a prediction model
the expected outcome. CBSE, 2022)
lower number, if the predictions are good.
(0 The loss function will output a

if the predictions are incorrect.


() The loss function will output a greater number,
Choose the correct option
b. Only (i) is correct
a. Only (0 is correct

C.Either () or (i) is correct O d. Both () and () are correct

vi. Which of these are common split percentages between Train and Test Data? [CBSE, 2022]
(i) Train: 5%, Test: 95%
(ii) Train: 50%, Test: 50%
(iii) Train: 80%, Test: 20%
(iv) Train: 67%, Test: 33%
Choose the correct option:
a. (i) and (ii)   b. (ii), (iii) and (iv)   c. (ii) and (iv)   d. (i), (ii) and (iii)


vii. Which of the following is a key success factor to be considered in designing an AI model? [CBSE, 2022]
a. Initiative   b. Visual modelling   c. Effective evaluation   d. Model validation



Exercise
Solved Questions

A. Fill in the blanks.
1. The ______ technique is used for evaluating an AI model and splits the dataset into two sets.
2. A ______ AI model is used to forecast trends for a product.
3. A ______ is a set of historical data in which the outcomes are known beforehand.
4. ______ is the sum of squared distances between our actual values and predicted values.
5. A ______ determines how well a certain algorithm models the data.
B. State whether the following statements are true or false.
1. Hyperparameters are internal to an AI model.
2. There is no such thing as an ideal split percentage.
3. Cross-validation is used for evaluating machine learning models on a large sample of data.
4. Every project starts with a business understanding.
5. The data collection stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models.

C. Match the Following.

Column A                               Column B
1. MSE                                 a. Problem Decomposition
2. Test dataset                        b. sensitive to outliers
3. Real life problems                  c. brainstorming
4. Ideate in DT                        d. quantity
5. Regression functions predict        e. Evaluation stage
D. Short answer type questions.

1. What is a loss function? Name any two Regression Loss functions.
Ans. A loss function is used by machines to learn. It's a way of determining how well a certain algorithm models the data. If the forecasts are too far off from the actual findings, the loss function will return a very large number. The loss function learns to lower the prediction error over time with the help of some optimization/objective function. Two Regression Loss functions are RMSE and MSE.
2. Can MSE be a negative value? Why/Why not? Give the equation to calculate MSE.
Ans. MSE cannot be a negative value. The difference between the predicted and actual values can be negative; however, these differences are squared, so all results are either positive or zero. The equation is MSE = Σ (Predicted_i - Actual_i)² / N.
3. What is meant by the iterative nature of the problem-solving methodology?
Ans. As data scientists gain a better understanding of the data and models, they typically return to a prior stage to make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model and the labour that goes into it can continue to add value to the business for as long as the solution is required. Hence, the problem-solving methodology is iterative in nature.



4. "The lower the value of MSE, the better is the model." Do you agree? Why/Why not?
Ans. MSE is calculated for a regression line as the average of the sum of squares of the errors for all data points. MSE is used to see how close an estimate or forecast is to an actual value. The lower the MSE, the closer the forecast is to the actual value. So, smaller values indicate a better model.

5. What is meant by feedback on model effectiveness?
Ans. The organisation receives feedback on the model's effectiveness and impact on the environment in which it is deployed by collecting findings from the implemented model. Data scientists utilize this feedback to improve the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model refresh process and improve results.
E. Long answer type questions.

1. Explain the cross-validation procedure.
Ans. Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, which specifies the number of groups into which a given data sample should be divided. It's a popular strategy since it's straightforward to grasp and produces a less biased or optimistic estimate of model competence than other approaches, such as a simple train/test split.
The following is the general procedure:
i. Randomly shuffle the dataset.
ii. Organize the data into k groups.
iii. For each distinct group:
    As a test data set, take the group.
    For the training data set, use the remaining groups.
    Fit the model to the training set and evaluate it against the test set.
    Keep the evaluation score but discard the model.
iv. Using the sample of model evaluation scores, summarise the model's ability.
2. What is Train Test Split Evaluation? State the reasons for choosing this technique.
Ans. The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model. The technique divides the provided dataset into two subsets: the training dataset and the test dataset. The reasons for choosing this technique are:
A large dataset is available.
To estimate the machine learning model's performance on new data that was not used to train the model.
Better computational efficiency.
A quick overview of model performance is needed.
3. Explain the 3 stages of data preparation.
Ans. Stage 4: Data Collection: During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area. If there is a gap in data collection, the data scientist may need to modify data requirements accordingly and collect new and/or more data.
Stage 5: Data Understanding: After the initial data collection, techniques such as descriptive statistics and visualizations can be applied to the datasets to evaluate the content, quality, and initial insights of the data. Additional data collection may be required to fill the gap.
Stage 6: Data Preparation: This stage contains all the activities to build the dataset used in the subsequent modelling stage. Activities to prepare data include:
data cleansing (handling missing or invalid values, removing duplicates, applying correct formats),
joining data from multiple sources (files, tables, platforms), and
the conversion of data to more useful variables.
Data preparation is usually the most time-consuming procedure in a data science project. Automating certain data preparation steps in advance can speed up the process by minimizing ad hoc preparation time.

4. What are hyperparameters? What is their purpose? Give examples of a few hyperparameters.
Ans. Hyperparameters are parameters whose values govern the learning process. They are top-level parameters that regulate the learning process and the model parameters that come from it, as the prefix 'hyper' suggests. Since the model cannot modify their values during learning/training, hyperparameters are said to be external to the model. Some examples of hyperparameters are:
The ratio of the train-test split
Optimization algorithms' learning rate (e.g., gradient descent)
The loss function that the model will employ
A neural network's number of hidden layers
A clustering task's number of clusters

5. Explain the purpose of the evaluation and deployment stages.
Ans. Evaluation Stage
The data scientist utilizes a testing set for predictive models (which is separate from the training set but follows the same probability distribution and has a known outcome). The testing set is used to assess the model and adjust it as necessary.
For a final assessment, the final model is sometimes applied to a validation set as well.
In addition, data scientists can use statistical significance tests to verify the model's accuracy.
Deployment Stage
The model is deployed into the production environment or an equivalent test environment once it has been built and authorized by the business sponsors. It is usually used in a restricted capacity until its effectiveness has been thoroughly assessed. Deploying a model into a live business process frequently necessitates the involvement of additional internal teams, skills, and technology.

Unsolved Questions
A. Fill in the blanks.
1. The ______ dataset is used to evaluate the model and adjust it as necessary.
2. Since the model cannot modify its values during learning, ______ are said to be external to the model.
3. ______ cannot be a negative value.
4. The model is ______ into the production environment or an equivalent test environment once it has been built.
5. ______ means handling missing or invalid values, removing duplicates, and applying correct formats after the data has been collected.
B. State whether the following statements are true or false.
1. Modelling is usually the most time-consuming procedure in a data science project



Inter-Disciplinary AI Lab
Consider the following values of x and y:
x: 40  42  44  46  48  50  52  54  58  60
y: 42  45  47  44  50  48  49  50  55  58
Regression Line equation: y = 15.142 + 0.681x
Calculate MSE from the above information.

Answers
AI Quiz
A. 1.b 2. d 3. a 4. b 5.C 6. d 7.b 8.c 9.d 10. a 11. b 12. b
13. c 14. b 15. b 16. a 17. a 18. d 19. a 20. b 21. b 22. c 23. d 24. b
25. b 26. a 27.c 28. b 29. b 30. a 31. b 32. a 33. a 34. a 35. b
B. i. 1. b 2. c 3. d 4. a 5. b 6. a 7. d 8. b 9. c 10. c
ii. d iii. b iv. a v. d vi. b vii. d

Exercise
A. 1. Train-Test-Split 2. predictive 3. training set 4. Mean Squared Error (MSE) 5. loss function
B. 1. False 2. True 3. False 4. True 5. False
C. 1.b 2.e 3. a 4.C 5. d

