CAPSTONE PROJECT
Learning Outcomes
Understanding the Capstone Project
Understanding/Defining the Problem
Problem Decomposing using Design Thinking Framework
Using an Analytical Approach
Model Validation
Metrics of Model Quality-Loss Function
Artificial Intelligence (AI) is all around us. Everyone is leveraging AI applications to make their life better. Even CBSE has started using AI algorithms to detect suspicious data patterns to identify examination centres that indulge in malpractices during examinations conducted by CBSE. Organisations are using AI in customer service, human resource, IT automation, security, and more. In fact, India's AI market is expected to reach $7.8 billion by 2025.
In this unit, you will learn about the meaning of the capstone project and apply concepts learned in class XI to solve real-world problems. Next, you will understand which algorithm to apply for your AI model and also learn about different types of testing methodologies. At the end of the chapter you will learn about the concept of loss functions.
method, developing a health plan, researching a character or event in history, or even the composition of a sketch
or a play.
No matter what type of project you choose to undertake, the result is the same. You can demonstrate your
understanding of the course material learned and demonstrate your readiness to enter the professional world to
kickstart your career. It's a rewarding experience if it's done well.
So, a capstone project provides an opportunity for students to integrate all the knowledge learned during the
course and demonstrate it. Ideally, a great capstone project is one that
contemporary
and helps in your career advancement
Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: What is a Capstone Project
https://www.youtube.com/watch?v=yBs2Vb5H54
After watching the video, answer the following question:
What is a Capstone Project? What is its purpose?
Some of the examples of capstone projects in AI, from which you can pick one to develop, are as follows:
six steps:
Al modelling Evaluation
H Deployment
Understanding the nature of a problem provides insight into the components and attributes of the yet-to-be-implemented solutions. A good understanding of a problem guides future decisions to make at the project stages, especially decisions such as determining if an AI solution is even feasible. At the core of every AI model is 'finding patterns in data'. If the data shows no patterns, then most probably the problem cannot be solved using AI.
Just
a
reference point forra thorough analysis of the problem and the probler
that surround it.
as
following questions can save you weeks and months working on oblems that proved
below o ws e r v e
s p e n d i n gtinme a n s w e r i n g t
previously
unknown reasons
unknown
for
cible roblem that needs to be resolved?
Whatisthe
need
a solution to your problem?
a
d oy o u
Why
work on the solution to the
problem?
we
should
Which category of data will be involved? (Classification)
How much or how many? (Regression)
Can the data be grouped? (Clustering)
Is there any unusual pattern in the data? (Anomaly Detection)
Which option should be given to the customer? (Recommendation)
It is essential to determine which of these questions you're asking, and in what way answering it helps to solve your problem.
Problem Decomposing using Design Thinking Framework
Recall that Design Thinking methodology provides a solution-based approach to solving problems. It is very useful in tackling complex problems by understanding the needs of the people involved, reformulating the problem in a human-centered way, brainstorming many ideas, and taking a practical approach to prototyping and testing. By understanding the five phases of design thinking, anyone can solve complex problems that arise around us. Let's recall the 5 stages as follows:
Design Thinking: A 5 Stage Process
ask yourself
however,y can also
or your colleagues)
2. Break down the problem into several big parts. Write them down on paper.
3. Divide any larger complicated part into smaller pieces. Continue this until all parts are small.
4. Code the smaller parts one by one. Use the following methodology:
decomposing:
following while
What colour combination can be used?
How many web pages are to be included on the website?
Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: Introduction to Decomposition
https://www.youtube.com/watch?v=rsYpP2-omg
After watching the video, answer the following question:
What is decomposition?
data to build a model to predict outcomes or explore underlying patterns, for information gathering purposes. Organizations can then use this information to take actions to ideally improve future outcomes. There are many rapidly evolving technologies for data analysis and model building. In a remarkably short time, there have been significant improvements in the quality and accuracy of the models too.
As data analytics becomes more accessible and prevalent, data scientists need a core methodology that can serve as a guiding strategy, regardless of the technology, the volume of relevant data, or the approach. This methodology emphasizes many of the new approaches in data science. It consists of 10 steps that form an iterative process using data to discover information. Each step plays an important role in the context of the overall methodology.
[Figure: the 10-stage methodology shown as a cycle: Business understanding, Analytic approach, Data requirements, Data collection, Data understanding, Data preparation, Modelling, Evaluation, Deployment, Feedback]
Stage 1: Business Understanding
Every project starts with a business understanding. Business sponsors, who need analytical solutions, play the most important role at this time in defining the problem, project objectives, and solution requirements from a business perspective. This first step lays the foundation for successful business problem solving and is perhaps the hardest. To help ensure project success, sponsors should be involved throughout the project to provide expert knowledge, review interim conclusions and ensure that work remains on track to produce the intended solution.
Data preparation is usually the most time-consuming procedure in a data science project. In many areas, some data preparation procedures are common for a variety of problems. Automating certain data preparation steps in advance can speed up the process by minimizing ad hoc preparation time. Today's high-performance parallel systems and analytics capabilities where data is stored allow data scientists to prepare data more easily and quickly using very large datasets.
Stage 7: Modelling
The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach. To develop a prediction model, data scientists employ a training set (historical data in which the desired outcome is already known). As businesses receive intermediate insights, the modelling process is often very iterative, leading to refinements in data preparation and model formulation. Data scientists may attempt numerous algorithms with their respective parameters for a specific technique to get the best model for the available variables.
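The idea of attempting several algorithms, each with its own parameters, can be sketched as follows. This is only a minimal illustration: the synthetic dataset, the three scikit-learn classifiers, and the parameter values are assumptions chosen for the example, not part of the methodology itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a prepared data set (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1
)

# Candidate algorithms, each with an illustrative parameter setting
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Train every candidate on the training set and score it on unseen data
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the test set

best_name = max(scores, key=scores.get)
print(scores)
print("Best on this split:", best_name)
```

Which candidate scores best depends on the data and the split, so the result of one run should be read as an intermediate insight rather than a final choice.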
Stage 8: Evaluation
The data scientist
116
Touchpad Artificial Intelligence-XIl
For a final assessment, the final model is sometimes applied to a validation set as well. In addition, data scientists can use statistical significance tests to verify the model's accuracy. This additional evidence could help justify model deployment or take action when the stakes are high, such as with an expensive additional medical procedure or a key aviation flight system.
Stage 9: Deployment
The model is deployed into the production environment or an equivalent test environment once it has been built and authorized by the business sponsors. It is usually used in a restricted capacity until its effectiveness has been thoroughly assessed. The model might be embedded in a complicated workflow and scoring process run by a customized application, or it could be as simple as providing a report with recommendations. Deploying a model into a live business process frequently necessitates the involvement of additional internal teams, skills, and technology.
Stage 10: Feedback
The organisation receives feedback on the model's effectiveness and impact on the environment in which it is deployed by collecting findings from the implemented model. For instance, feedback could come in the form of response rates to a promotional campaign. Data scientists can utilize this feedback to improve the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model refresh process and improve results.
The iterative nature of the problem-solving process is shown by this methodology's flow. As data scientists have a better understanding of the data and models, they typically return to a prior stage to make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model and the labour that goes into it can continue to add value to the business for as long as the solution is required.
AI Reboot
Brainy Fact
Envision, the award-winning iOS and Android smartphone app that allows blind and visually impaired people to independently access visual information around them, announced plans to integrate its AI-powered software technology into Google Glass in March 2020. The combination of Envision's software and Google Glass gives blind and visually impaired users a substantially less invasive and hands-free manner of accessing the world around them, giving them greater freedom and independence to access and 'see' the world around them.
https://www.youtube.com/watch?v=9ehENnq2EFo&t=164s
Model Validation
There are mainly two types of validation methods, which are Train Test Split Evaluation and Cross Validation. Let us learn about them in detail.
Train Test Split Evaluation
The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model. While it is quick and easy to use, it is only suitable for huge data sets. The train test split cannot be utilised when the data sets are small or when additional configurations are required, such as when the data set is not balanced.
The train test split technique can be used to test machine learning algorithms for classification and regression problems. The technique divides the provided dataset into two subsets:
Training dataset: used to fine-tune the machine learning model and train the algorithm.
Test dataset: the trained algorithm makes predictions using the input elements of this held-out data, and the predictions are compared with the known outputs.
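The two roles described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the synthetic two-class dataset from `make_blobs`, the choice of `LogisticRegression`, and the `random_state` values are all picked for the example only.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset (assumption for illustration)
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)

# Hold out a third of the rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1
)

# The training dataset is used to fit the model
model = LogisticRegression()
model.fit(X_train, y_train)

# The test dataset is used only to make predictions and score them
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on unseen data:", test_accuracy)
```

The accuracy reported here is an estimate of how the model would behave on new data, because none of the test rows took part in fitting.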
Reasons for Choosing Train Test Split Evaluation
The goal is to estimate the machine learning model's performance on new data that was not used to train the model. This is how we want to use the model in the real world. To put it another way, we want to fit it to existing data with known inputs and outputs, then generate predictions for fresh cases in the future where we don't know the expected outcome or goal values.
Another reason to employ the train-test split assessment process, other than dataset size, is computational efficiency. Some models are extremely expensive to train, making a repeated evaluation, as employed in other techniques, impossible. Deep neural network models are one example. The train-test approach is widely employed in this situation.
Sometimes, a project may already have a model working efficiently and a large dataset, but still may require an overview of model performance quickly. Again, the train-test split procedure is selected in this situation.
Random selection is also used to divide samples from the original dataset into two subsets. This ensures that the train and test datasets are reflective of the original dataset. When the dataset available is small, the train-test procedure is not appropriate. The reason for this is that there will not be enough data in the training dataset for the model to learn an appropriate mapping of inputs to outputs. There will also be insufficient data in the test set to evaluate the model's performance appropriately.
Configuring the Train Test Split
The size of the train and test sets is the procedure's key configuration parameter. For either the train or test datasets, this is usually given as a fraction between 0 and 1. For example, a training set with a size of 0.67 (67%) means that the test set will get the remaining percentage of 0.33 (33%).
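To see how the size parameter translates into row counts, the sketch below splits a 1000-row synthetic dataset with three test fractions. The dataset and the `random_state` value are assumptions for illustration; 0.33 follows the example above, while 0.2 and 0.5 are other commonly used values.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# 1000 synthetic rows (assumption for illustration)
X, y = make_blobs(n_samples=1000, random_state=1)

sizes = {}
for test_size in (0.2, 0.33, 0.5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=1
    )
    # Record how many rows ended up in each subset
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: train={len(X_train)}, test={len(X_test)}")
```

Only one of `test_size` or `train_size` needs to be given; scikit-learn assigns the remaining rows to the other subset.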
There is no such thing as an ideal split percentage. A data scientist determines a split % that suits the project goals, taking the following factors into account:
Cost
Representativeness of the training set
Representativeness of the test set
Commonly used split percentages are:
Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%
[Figure: the total number of examples divided into a Training Set and a Test Set]
Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: Training and Testing
https://www.youtube.com/watch?v=P2Nqrfp8usY
Train-Test Split Procedure in Python
The train_test_split() function in the scikit-learn Python machine learning package implements the train-test split evaluation procedure. The function accepts a dataset as input and returns the dataset split into two subsets.
You can use any of the following statements:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
OR
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
Example:
# split a dataset into train and test sets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
# create dataset
X, y = make_blobs(n_samples=1000)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# report the size of each subset
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
Cross-Validation Procedure
Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, which specifies the number of groups into which a given data sample should be divided. As a result, the process is frequently referred to as k-fold cross-validation. For example, k=10 for 10-fold cross-validation. It's a popular strategy since it's straightforward to grasp and produces a less biased or optimistic estimate of model competence than other approaches, such as a simple train/test split.
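A 10-fold run can be sketched with scikit-learn's `KFold` and `cross_val_score`. This is a minimal illustration: the small synthetic dataset, the `LogisticRegression` model, and the `random_state` values are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# A small synthetic sample of data (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# k = 10: the sample is divided into 10 groups (folds)
cv = KFold(n_splits=10, shuffle=True, random_state=1)
model = LogisticRegression(max_iter=1000)

# Each fold is held out once for testing while the other nine train the model
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

print("Scores per fold:", scores)
print("Mean accuracy:", scores.mean(), "Std:", scores.std())
```

Reporting the mean together with the spread of the fold scores gives a steadier estimate than a single train/test split on the same small sample.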
[Figure: k-fold cross-validation, showing for each iteration which part of all the data is held out for testing]
https://www.youtube.com/watch?v=Tlgfimp-4BA
After watching the video, answer the following question:
What is the role of cross validation in training and testing a model?
Metrics of Model Quality-Loss Function
A loss function is used by machines to learn. It's a way of determining how well a certain algorithm models the data. If the forecasts are too far off from the actual findings, the loss function will return a very large number. Loss function learns to lower the prediction error over time with the help of some 'optimization/objective function'.
There is no such thing as a one-size-fits-all loss function in machine learning. The type of machine learning algorithm used, the ease of calculating derivatives, and, to some extent, the number of outliers in the dataset all play a role in selecting a loss function for a certain task.
Depending on the type of learning job we're dealing with, loss functions can be divided into two categories: regression losses and classification losses. In classification, the output is predicted from a set of finite categorical values. For example, given a large data set of photographs of handwritten numbers, categorising them into one of the 0-9 digits. Regression, on the other hand, is concerned with predicting a continuous value, such as, given floor area, number of rooms, and room size, predicting the price of the house.
Classification        Regression
Log Loss              Mean Square Error/Quadratic Loss
Focal Loss            Mean Absolute Error
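One loss from each column of the table can be computed by hand with NumPy. This is a rough sketch: the labels, predicted probabilities, and house prices below are made-up values used only to show the arithmetic.

```python
import numpy as np

# Classification: log loss for a two-class problem
# (labels and predicted probabilities are made-up illustration values)
y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.1, 0.8, 0.6])  # predicted probability of class 1
log_loss = -np.mean(
    y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred)
)

# Regression: mean absolute error on house prices
# (prices are made-up illustration values)
actual_price = np.array([200.0, 250.0, 300.0])
predicted_price = np.array([210.0, 240.0, 330.0])
mae = np.mean(np.abs(predicted_price - actual_price))

print("Log loss:", log_loss)
print("Mean absolute error:", mae)
```

Log loss works on predicted probabilities for categories, while mean absolute error works on the distance between continuous values, which is why the two belong to different columns.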
MSE (Mean Squared Error)
The Mean Squared Error (MSE) is the most basic and widely used loss function, and it is frequently taught in Machine Learning courses. Calculate the difference between the model's predictions and the actual values, square it, and average it across the entire dataset to get the value of MSE. MSE is given by the equation:
\[ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{Predicted}_i - \mathrm{Actual}_i\right)^2 \]
MSE will never be negative because the errors are always squared.
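The three steps in the description (difference, square, average) can be followed directly in NumPy. The actual and predicted values below are made-up numbers for illustration only.

```python
import numpy as np

# Made-up actual and predicted values (assumption for illustration)
actual = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

errors = predicted - actual      # step 1: difference
squared_errors = errors ** 2     # step 2: square (never negative)
mse = squared_errors.mean()      # step 3: average over the dataset

print("MSE value =", mse)
```

Because every squared error is zero or positive, the average cannot fall below zero, which matches the statement above.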
Advantage
MSE is useful for ensuring that our trained model does not have any outlier predictions with significant errors,
because MSE places a higher weight on these errors due to the squaring element of the function.
Disadvantage
If our model makes a single particularly incorrect forecast, the squaring part of the function magnifies the error.
However, in many real-life cases, we don't worry about these outliers and instead seek a more well-rounded model
that performs well enough on the majority of cases.
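The effect of squaring can be seen by comparing two sets of predictions whose total absolute error is the same. The numbers below are made up for illustration; Mean Absolute Error (MAE) is used here only as a point of comparison.

```python
import numpy as np

# Made-up actual values (assumption for illustration)
actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Model A: every prediction is off by 1
pred_even = actual + 1.0
# Model B: four perfect predictions and one prediction off by 5
pred_outlier = np.array([10.0, 12.0, 14.0, 16.0, 23.0])

def mse(a, p):
    return np.mean((p - a) ** 2)

def mae(a, p):
    return np.mean(np.abs(p - a))

mse_even, mae_even = mse(actual, pred_even), mae(actual, pred_even)
mse_outlier, mae_outlier = mse(actual, pred_outlier), mae(actual, pred_outlier)

print("Model A: MSE =", mse_even, "MAE =", mae_even)
print("Model B: MSE =", mse_outlier, "MAE =", mae_outlier)
```

Both models have the same MAE, but the single large miss makes the MSE of Model B five times larger. Whether that extra penalty is wanted depends on how much a rare large error matters for the task.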
Scan the QR code or visit the following link to watch the video:
https://www.youtube.com/watch?v=Mhw-xHVmat
After watching the video, answer the following question:
Output:
MSE value= 0.3074999999999999
[Figure: Residual Error plotted against No. of Hours Studied (X)]
The errors are squared before being averaged in RMSE. This basically means that RMSE gives larger mistakes a higher weight. This suggests that RMSE is far more beneficial when substantial errors exist and have a significant impact on the model's performance. This characteristic is important in many mathematical calculations since it avoids taking the absolute value of the error. The RMSE of a good model should be less than 180, although RMSE is expressed in the units of the target variable, so any such threshold depends on the scale of the data. The lower the RMSE value, the higher the model's performance.
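Written out, the description above (square the errors, average them, then take the square root) corresponds to the formula below. The symbols follow the Predicted, Actual and N notation used elsewhere in this chapter; treat it as a sketch of the standard definition rather than a quotation from the original page.

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
             = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{Predicted}_i - \mathrm{Actual}_i\right)^2}
```

Taking the square root brings the value back to the same units as the quantity being predicted, which makes RMSE easier to interpret than MSE.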
Experiential Learning
Video Session
Scan the QR code or visit the following link to watch the video: Calculating RMSE in Excel
https://www.youtube.com/watch?v=G8j8KAJUlw
After watching the video, answer the following question:
What do you mean by RMSE?
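import numpy as np
def rmse(actual, predicted):
    # squared differences between the predictions and the actual values
    diff_sq = (np.asarray(predicted) - np.asarray(actual)) ** 2
    mean_diff_sq = diff_sq.mean()  # this is the MSE
    rmse_val = np.sqrt(mean_diff_sq)  # RMSE is the square root of the MSE
    return rmse_val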
At a Glance
A capstone project is a comprehensive, independent, and final project undertaken as part of a curriculum designed to assess the skills, knowledge, and expertise a student has acquired.
A successful problem-defining process requires a basic analysis and evaluation of the project-related problems, their reasons, and methods.
Design Thinking methodology provides a solution-based approach to solving problems.
During coding, we follow problem decomposition methodology which can be applied to real-life problems as well.
Once the business problem is clearly stated, the data scientist can define an analytical approach to solving the problem.
The chosen analytical approach characterizes the requirements for the data.
During the initial data collection phase, data scientists identify the available data sources (structured, unstructured, semi-structured) relevant to the problem area.
The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach.
The data scientist reviews the model during development and before deployment to determine its quality and ensure that it correctly and completely answers the business problem.
The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model.
The training dataset is used to fine-tune the machine learning model and train the algorithm.
The test dataset is used to make predictions with the trained model and compare them with the expected values.
Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data.
C.Modelling
for evaluating
the
pertormance
of a
algorithm. (CBSE, 207
n e
Train-Test Split
is a technique b.
d.
Data Science
Prediction
2021
a. Rule-Based
O
C.Machine Learning diferent
subsets of the data to get
on
modelling process
28. In werun our
dataset available.
accurate is the prediction.
CBSE, 2022
the more
N
b. RMSE=2(Predicted i -Actual iN
a. RMSE 2(Predicted i-Actual i) O
N
2(Actual i- Predicted i)? d. RMSE= 2(Actual i-Predicted iN
C RMSE O
N N
33 Consider the following data. Identify which of the following commands are correct to use split0: [CBSE, 2022
technique
divides the provided
dataset
k-foldIcross-valie
cross-validation.
training and test subsey
10. Assertion (A): A split referred to as
is frequently
Reason (R): As a result, the process above:
statements given
the
the appropriate option for
elect
explanation of A
correct
and Ris the
5oth A and R are true of A
correct explanation
not the
R are true and R
is
both A and
Forecasting
Which of the following statements is/are
INCORRECT: CBSE Sample Paper,
.
train the same machine learning model. 2022
)Different transforms of the data used to
on the same data.
cannot be trained
1) Different machine learning models the same data
on
model trained
machine learning
D i f f e r e n t configurationsfor a
O b.i
a.
d. Both i) & i)
C. Both i) & i) O
IV. If the problem is based on probabilities of an action, then which analytic approach can be used? [CBSE Sample Paper, 2022]
a. Predictive Model b. Prescriptive
c. Diagnostic d. Descriptive
V. A loss function is a measure of how good a prediction model is in terms of being able to predict the expected outcome. [CBSE, 2022]
(i) The loss function will output a lower number, if the predictions are good.
(0 The loss function will output a
VI. Which of these are common split percentages between Train and Test Data? CBSE, 2022
() Train: 5%, Test: 95%
A. Fill in the blanks.
1. The ______ technique is used for evaluating an AI model and splits the dataset into two sets.
2. A ______ AI model is used to forecast trends for a product.
3. A ______ is a set of historical data in which the outcomes are known beforehand.
4. ______ is the sum of squared distances between our actual values and predicted values.
5. A ______ determines how well a certain algorithm models the data.
B. State whether the following statements are true or false.
1. Hyperparameters are internal to an AI model.
2. There is no such thing as an ideal split percentage.
3. Cross-validation is used for evaluating machine learning models on a large sample of data.
4. Every project starts with a business understanding.
5. The data collection stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models.
Ans. A loss function is used by machines to learn. It's a way of determining how well a certain algorithm models
the data. If the forecasts are too far off from the actual findings, the loss function will return a very large
number. Loss function learns to lower prediction error over time with the help of some 'optimization/objective
function'. Regression loss functions are RMSE and MSE.
2. Can MSE be a negative value? Why/Why not? Give the equation to calculate MSE.
Ans. MSE cannot be a negative value. The difference between the predicted and actual values can be negative.
However, these differences are squared. Hence, all results are either positive or zero.
3. What is meant by the iterative nature of the problem-solving methodology?
Ans. As data scientists have a better understanding of the data and models, they typically return to a prior stage to
make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and
adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model
and the labour that goes into it can continue to add value to the business for as long as the solution is required.
Hence the problem-solving methodology is iterative in nature.
MSE is used to see how close an estimate or forecast is to an actual value. The lower the MSE, the closer the forecast is to the actual. So, smaller values indicate a better model.
of model effectiveness
5. What is meant by feedback?
Ans. The organisation receives feedback on the model's effectiveness and impact on the environment in which it is deployed by collecting findings from the implemented model. Data scientists utilize this feedback to improve the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model refresh process and improve results.
E. Long answer type questions.
1. Explain the cross-validation procedure.
Ans. Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, which specifies the number of groups into which a given data sample should be divided. It's a popular strategy since it's straightforward to grasp and produces a less biased or optimistic estimate of model competence than other approaches.
IV. Using the sample of model evaluation scores, summarise the model's ability.
2. What is Train Test Split Evaluation? State the reasons for choosing this technique.
Ans. The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model. The technique divides the provided dataset into two subsets: the training dataset and test dataset. The reasons for choosing this technique are:
Large dataset.
To estimate the machine learning model's performance on new data that was not used to train the model.
Data preparation is usually the most time-consuming procedure in a data science project. Automating certain data preparation steps in advance can speed up the process by minimizing ad hoc preparation time.
4. What are hyperparameters? What is their purpose? Give examples of a few hyperparameters.
Ans. Hyperparameters are parameters whose values govern the learning process. They are top-level parameters that regulate the learning process and the model parameters that come from it, as the prefix 'hyper' suggests.
utilizes a testing set for predictive models (which is separate from the training set but follows the same
probability distribution and has a known outcome). The testing set is used to assess the model and adjust
it as necessary.
For a final assessment, the final model is sometimes applied to a validation set as well.
In addition, data scientists can use statistical significance tests to verify the model's accuracy.
Deployment Stage
The model is deployed into the production environment or an equivalent test environment once it has been
built and authorized by the business sponsors. It is usually used in a restricted capacity until its effectiveness
has been thoroughly assessed. Deploying a model into a live business process frequently necessitates the
involvement of additional internal teams, skill, and technology
Unsolved Questions
A. Fill in the blanks.
2 Since the model cannot modify its values during learning, are said to be external to the model
4. The model is into the production environment or an equivalent test environment once it has
been built
means handling missing or invalid values, removing duplicates, applying correct formats after
the data has been collected.
B. State whether the following statements are true or false.
1. Modelling is usually the most time-consuming procedure in a data science project
AI Lab
Consider the following values of x and y:
x: 40 42 44 46 48 50 52 54 58 60
y: 42 45 47 44 50 48 49 50 55 58
Regression line equation: y = 15.142 +
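The regression line for these values can be checked with a short NumPy sketch. It assumes an ordinary least-squares straight-line fit, which is an assumption because the equation printed above is cut off; the slope it prints is computed from the data, not taken from the text. The RMSE of the fitted line is included as an optional extra.

```python
import numpy as np

# The x and y values given in the lab
x = np.array([40, 42, 44, 46, 48, 50, 52, 54, 58, 60], dtype=float)
y = np.array([42, 45, 47, 44, 50, 48, 49, 50, 55, 58], dtype=float)

# Least-squares straight line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x, y, 1)

# Predictions of the fitted line and their RMSE
y_fit = intercept + slope * x
rmse = np.sqrt(np.mean((y_fit - y) ** 2))

print("Intercept:", round(intercept, 3))
print("Slope:", round(slope, 3))
print("RMSE of the fitted line:", round(rmse, 3))
```

The intercept from this fit comes out close to the 15.142 shown in the lab, which suggests the printed equation was obtained in the same way.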
Answers
AI Quiz
A. 1.b 2. d 3. a 4. b 5.C 6. d 7.b 8.c 9.d 10. a 11. b 12. b
13. c 14. b 15. b 16. a 17. a 18. d 19. a 20. b 21. b 22. c 23. d 24. b
25. b 26. a 27.c 28. b 29. b 30. a 31. b 32. a 33. a 34. a 35. b
B. i.1. b 2.c 3. d 4.a 5. b 6. a 7.d 8.b 9.C 10. c
i. d ii. b IV. a V.d Vi. b VIl d
Exercise
A. 1. Train-Test-Split 2. predictive 3. training set 4. Mean Squared Error (MSE) 5. loss function
B. 1. False 2. True 3. False 4. True 5. False
C. 1.b 2.e 3. a 4.C 5. d