
UNIT 1

CAPSTONE PROJECT

Learning Outcomes

• Understanding the Capstone Project
• Understanding/Defining the Problem
• Problem Decomposing using Design Thinking Framework
• Using an Analytical Approach
• Model Validation
• Metrics of Model Quality-Loss Function

Artificial Intelligence (AI) is all around us. Everyone is leveraging AI applications to make their life better. Even CBSE has started using AI algorithms to detect suspicious data patterns and identify candidates that indulge in malpractices during examinations conducted by CBSE. Organisations are using AI in customer service, human resource, IT automation, security, and more. In fact, India's AI market is expected to reach $7.8 billion by 2025.
In this unit, you will learn about the meaning of the capstone project and apply concepts learned in class XI to solve real-world problems. Next, you will understand which algorithm to apply for your AI model and also learn about different types of testing methodologies. At the end of the chapter, you will learn about the concept of loss functions.

Understanding the Capstone Project

Having studied the various facets of AI, students can also create their own small version of an AI model that can help to solve a real-life problem. This type of AI model is called a capstone project. A capstone project is a comprehensive, independent, and final project undertaken as part of the curriculum, designed to assess the skills, knowledge, and expertise a student has acquired. Such a project often involves researching a topic, evaluating a new technique or method, developing a health plan, researching a character or event in history, or even the composition of a sketch or a play.

No matter what type of project you choose to undertake, the result is the same. You can demonstrate your understanding of the course material learned and demonstrate your readiness to enter the professional world to kickstart your career. It's a rewarding experience if it's done well.

So, a capstone project provides an opportunity for students to integrate all the knowledge learned during the course and demonstrate it. Ideally, a great capstone project is one that:



• you have great interest and passion for
• has practical usage/value
• is doable in a particular timeframe
• is contemporary
• helps in your career advancement
Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: What is a Capstone Project
https://www.youtube.com/watch?v=yBs2Vb5HIS4
After watching the video, answer the following questions:
What is a Capstone Project? What is its purpose?

Some of the examples of capstone projects in AI, from which you can pick up to develop, are as follows:
• Studying images to diagnose disease
• Forecasting student results
• Creating a chatbot for the school admin department or counsellor to handle parents'/students' queries using IBM Watson, Google Dialogflow
• Image Classifier
• Analysing social media to assess emotions
• Using regression to predict a trend

Understanding/Defining the Problem

Nowadays, Artificial Intelligence is one of the most transformative technologies. Every AI project goes through these six steps:

Defining the problem → Data Collection → Defining the features → AI modelling → Evaluation → Deployment

Understanding the nature of a problem provides insight into the components and attributes of the yet-to-be-implemented solutions. A good understanding of a problem guides future decisions to make at the project stages, especially decisions such as determining if an AI solution is even feasible. At the core of every AI model is finding patterns in data. If the data shows no patterns, then most probably the problem cannot be solved using AI.
A successful problem-defining process requires a basic analysis and evaluation of the project-related problems, their reasons and methods. Finding the right problem definition is usually an iterative process. It can reveal more questions and points to consider that would have been ignored without the problem definition process. The questions below serve as a reference point for a thorough analysis of the problem and the problems that surround it. Just spending time answering the following questions can save you weeks or months of working on problems that prove impossible for previously unknown reasons:
• What is the problem that needs to be resolved?
• Why do you need a solution to your problem?
• How should we work on the solution to the problem?
• Which aspects of the problem does the AI model solve?
• How do I need to interact with the solution to the problem?
• Which category of data will be involved? (Classification)
• How much or how many? (Regression)
• Can the data be grouped? (Clustering)
• Is there any unusual pattern in the data? (Anomaly Detection)
• Which option should be given to the customer? (Recommendation)

It is essential to determine which of these questions you're asking, and in what way answering it helps solve your problem.

Brainy Fact
There are patterns even in nature! The 'Golden Ratio' or 'Divine Proportion', i.e., 1.618, is found in sunflowers, daisies, chrysanthemums, etc. Leonardo Da Vinci used this ratio in his paintings like The Vitruvian Man and Mona Lisa. Even the Taj Mahal has been built using the Golden Ratio. The ratio of any two consecutive terms of the Fibonacci series, like 233/144, is approximately 1.618. Fascinating, isn't it?

Problem Decomposing using Design Thinking Framework


Recall that Design Thinking methodology provides a solution-based approach to solving problems. It is very useful
in tackling complex problems by understanding the needs of the people involved, reformulating the problem in
a human-centered way, brainstorming many ideas, and taking a practical approach to prototyping and testing. By
understanding the five phases of design thinking, anyone can solve complex problems that arise around us. Let's

recall the 5 stages as follows:

Design Thinking: A 5 Stage Process

Empathise → Define → Ideate → Prototype → Test



Empathize: Conduct research (interviews, polls, etc.) to better understand your users.
Define: Define the challenges; use your research to observe users' current problems. Create a point of view.
Ideate: Brainstorm to arrive at various creative solutions.
Prototype: Build representations (charts, models) of one or more ideas.
Test: Test your model(s) and gain user feedback.
Solving a real-life problem is complicated. During coding, we follow the problem decomposition methodology, which can be applied to real-life problems as well. Problem decomposition steps are as follows (a small code sketch after this list illustrates step 4):
1. Understand the problem and express the problem in your own words:
   • Understand the required inputs and outputs.
   • Ask questions for clarity (in class, these questions may be directed to the teacher; however, you can also ask yourself or your colleagues).
2. Break down the problem into several big parts. Write them down on paper.
3. Divide any larger complicated part into smaller pieces. Continue this until all parts are small.
4. Code the smaller parts one by one. Use the following methodology:
   • Analyse how to implement the code.
   • Write the code/query.
   • Test each code individually.
   • Fix the problem(s), if any.
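As a minimal illustration of step 4, the sketch below decomposes a tiny task (averaging some raw text values) into small functions that are written and tested one by one. All function names and values are hypothetical, chosen only for this example.

# A minimal sketch of problem decomposition in code (step 4).
# All function names and values here are hypothetical, for illustration only.

def clean_value(raw):
    """Small part 1: convert one raw input string into a number."""
    return float(raw.strip())

def average(values):
    """Small part 2: summarise a list of cleaned numbers."""
    return sum(values) / len(values)

def report(raw_values):
    """Combine the small, already-tested parts into the full solution."""
    cleaned = [clean_value(v) for v in raw_values]
    return "average = {:.2f}".format(average(cleaned))

# Test each small part individually, then fix the problem(s), if any.
assert clean_value(" 4.0 ") == 4.0
assert average([2.0, 4.0]) == 3.0

print(report(["1.0", " 2.5", "3.5 "]))  # average = 2.33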
Imagine you want to create your first website. How would you decompose this task? Think about the following while

decomposing:
What colour combination can be used?
How many web pages are to be included on the website?

Who will be the target audience?


What kind of images are to be posted on the website?
What video/audio would be included?
Which software to use for website development?
This work can also be divided among various team members involved in this task.

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Introduction to Decomposition
https://www.youtube.com/watch?v=rxsYpP2-omg
After watching the video, answer the following question:
What is decomposition?

Using an Analytical Approach
In data science, it is common to solve problems and answer questions using data analysis. Typically, data scientists build a model to predict outcomes or explore underlying patterns, for information gathering purposes. Organizations can then use this information to take actions to ideally improve future outcomes. There are many rapidly evolving technologies for data analysis and model building. In a remarkably short time, there have been significant improvements in the quality and accuracy of the models too.

As data analytics becomes more accessible and prevalent, data scientists need a core methodology that can
provide a guiding strategy, regardless of the technology, the volume of relevant data, or the approach. This
methodology emphasizes many of the new approaches in data science. It consists of 10 steps that form an
iterative process using data to discover information. Each step plays an important role in the context of the
overall methodology.

[Figure: the 10-stage iterative methodology — Business understanding → Analytic approach → Data requirements → Data collection → Data understanding → Data preparation → Modelling → Evaluation → Deployment → Feedback, with feedback looping back into business understanding.]

Stage 1: Business Understanding


Every project starts with a business understanding. Business sponsors, who need analytical solutions, play the most important role at this time in defining the problem, project objectives, and solution requirements from a business perspective. This first step lays the foundation for successful business problem solving and is perhaps the hardest. To help ensure project success, sponsors should be involved throughout the project to provide expert knowledge, review interim conclusions, and ensure that work remains on track to produce the intended solution.

Stage 2: Analytic Approach


Once the business problem is clearly stated, the data scientist can define an analytical approach to solving the problem. This step is to represent the problem in the context of statistical techniques and machine learning so that the organization can determine the most appropriate technique for the desired outcome.
For example,
• If the goal is to predict an answer such as "yes" or "no", then the analytical method can be defined as the building, testing, and execution of a classification model.
• If the goal is to determine the probability of an action, then predictive modelling can be used.
• If the goal is to show relationships, a descriptive approach may be necessary.



Stage 3: Data Requirements
The analytical approach chosen characterizes the requirements for the data. In particular, the analytical methods to be used require certain data content and formats, which guide the initial data collection.

Stage 4: Data Collection


During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area. If there is a gap in data collection, the data scientist may need to modify the data requirements accordingly and collect new and/or more data. Today's high-performance database analytics enable data scientists to utilize large datasets that contain much or even all of the available data. Due to this, predictive models are able to better predict rare events such as disease or system failure.

Stage 5: Data Understanding


After the initial data collection, techniques such as descriptive statistics and visualizations can be applied to the dataset to assess its content, quality, and initial insights. Additional data collection may be required to fill the gaps.

Stage 6: Data Preparation


This stage contains all the activities to build the dataset used in the subsequent modelling stage. Activities to prepare data include:
• data cleansing (handling missing or invalid values, removing duplicates, applying correct formats),
• joining data from multiple sources (files, tables, platforms), and
• the conversion of data to more useful variables.
Data preparation is usually the most time-consuming procedure in a data science project. In many domains, some data preparation procedures are common for a variety of problems. Automating certain data preparation steps in advance can speed up the process by minimizing ad hoc preparation time. Today's high-performance, massively parallel systems and analytics capabilities where data is stored allow data scientists to prepare data more easily and quickly using very large datasets.

Stage 7: Modelling
The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach. To develop a prediction model, data scientists employ a training set (historical data in which the desired outcome is already known).
As businesses receive intermediate insights, the modelling process is often very iterative, leading to refinements in data preparation and model formulation. Data scientists may attempt numerous algorithms with their respective parameters for a specific technique to get the best model for the available variables.

Stage 8: Evaluation
The data scientist reviews the model during development and before deployment to determine its quality and ensure that it correctly and completely answers the business problem. The data scientist:
• can interpret the model's quality and efficacy in solving the problem by producing numerous diagnostic metrics and other outputs such as tables and graphs.
• can utilize a testing set for predictive models (which is separate from the training set but follows the same probability distribution and has a known outcome). The testing set is used to assess the model and adjust it as necessary.



For a final assessment, the final model is sometimes applied to a validation set as well.
In addition, data scientists can use statistical significance tests to verify the model's accuracy. This additional evidence could help justify model deployment or take action when the stakes are high, such as with an expensive additional medical procedure or a key aviation flight system.
Stage 9: Deployment
The model is deployed into the production environment or an equivalent test environment once it has been built and authorized by the business sponsors. It is usually used in a restricted capacity until its effectiveness has been thoroughly assessed. The model might be embedded in a complicated workflow and scoring process run by a customized application, or it could be as simple as providing a report with recommendations. Deploying a model into a live business process frequently necessitates the involvement of additional internal teams, skills, and technology.
Stage 10: Feedback
The organisation receives feedback on the model's effectiveness and impact on the environment in which it was deployed by collecting findings from the implemented model. For instance, feedback could come in the form of response rates to a promotional campaign. Data scientists can utilize this feedback to improve the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model refresh process and improve results.
The iterative nature of the problem-solving process is shown by this methodology's flow. As data scientists have a better understanding of the data and models, they typically return to a prior stage to make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model and the labour that goes into it can continue to add value to the business for as long as the solution is required.

AI Reboot

Fill in the blanks:
1. The problem __________ methodology can be applied to real-life problems as well.
2. Historical data in which the desired outcome is already known is called __________.
3. In the modelling stage, the data scientist will construct __________ or __________ models.
4. The methodology for model building and deployment is an __________ process.

Brainy Fact
Envision, the award-winning iOS and Android smartphone app that allows blind and visually impaired people to independently access visual information around them, announced plans to integrate its AI-powered software technology into Google Glass in March 2020. The combination of Envision's software and Google Glass provides blind and visually impaired users with a substantially less invasive and hands-free manner of accessing the world around them, giving them greater freedom and independence to access and 'see' the world around them.



Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: How Envision Works
https://www.youtube.com/watch?v=QehtNng2EFo&t=164s
After watching the video, answer the following question:
How do Envision Glasses help visually impaired people?

Model Validation
There are mainly two types of validation methods: Train Test Split Evaluation and Cross-Validation. Let us learn about them in detail.

Train Test Split Evaluation


The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model. While it is quick and easy to use, it is only suitable for huge data sets. The train test split cannot be utilised when the data sets are small or when additional configuration is required, such as when the data set is not balanced.
The train test split technique can be used to test machine learning algorithms for classification and regression problems. The technique divides the provided dataset into two subsets:
• The training dataset is used to fine-tune the machine learning model and train the algorithm.
• The test dataset is used to evaluate the trained model, which makes predictions using the input elements of the test data.

Reasons for Choosing Train Test Split Evaluation


• The goal is to estimate the machine learning model's performance on new data that was not used to train the model. This is how we want to use the model in the real world. To put it another way, we want to fit it to existing data with known inputs and outputs, then generate predictions for fresh cases in the future where we don't know the expected outcome or goal values.
• Another reason to employ the train-test split assessment process, other than dataset size, is computational efficiency. Some models are extremely expensive to train, making the repeated evaluation employed in other techniques impossible. Deep neural network models are one example. The train-test approach is widely employed in this situation.
• Sometimes, a project may already have an efficiently working model and a large dataset, but may still require a quick overview of model performance. Again, the train-test split procedure is selected in this situation.
Random selection is used to divide samples from the original dataset into the two subsets. This ensures that the train and test datasets are reflective of the original dataset. When the dataset available is small, the train-test procedure is not appropriate. The reason for this is that there will not be enough data in the training dataset for the model to learn an appropriate mapping of inputs to outputs. There will also be insufficient data in the test set to evaluate the model's performance appropriately.
Configuring the Train Test Split
The size of the train and test sets is the procedure's key configuration parameter. For either the train or test dataset, this is usually given as a percentage between 0 and 1. For example, a training set with a size of 0.67 (67%) means that the test set will get the remaining percentage of 0.33 (33%).
There is no such thing as an ideal split percentage. A data scientist determines a split percentage that suits the project's goals, taking into account the following factors:
• Cost of training the model
• Computational cost of assessing the model
• Representativeness of the training set
• Representativeness of the test set
Commonly used split percentages are:
• Train: 80%, Test: 20%
• Train: 67%, Test: 33%
• Train: 50%, Test: 50%
[Figure: the total number of examples divided into a training set and a test set.]

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Training and Testing
https://www.youtube.com/watch?v=P2NqrFp8usY
After watching the video, answer the following question:
What is the role of data in training a model?

Train-Test Split Procedure in Python


The train_test_split() function in the scikit-learn Python machine learning package implements the train-test split evaluation procedure. The function accepts a dataset as input and returns the dataset split into two subsets.
You can use any of the following statements:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
OR
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)

Example:
# split a dataset into train and test sets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
# create dataset
X, y = make_blobs(n_samples=1000)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Output:
(500, 2) (500, 2) (500,) (500,)
Out of 1000 samples, 50% (500) go to the training set and 50% (500) to the test set.
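One practical note, not covered in the text above: make_blobs generates random data and train_test_split shuffles the rows randomly, so repeated runs produce different splits. scikit-learn's random_state parameter fixes the shuffle so that the split is reproducible. A minimal sketch:

# Fixing random_state makes the split reproducible across runs.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42  # same rows land in each subset every run
)
print(X_train.shape, X_test.shape)  # (670, 2) (330, 2)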

Cross-Validation Procedure
Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, which specifies the number of groups into which a given data sample should be divided. As a result, the process is frequently referred to as k-fold cross-validation; for example, k=10 for 10-fold cross-validation. It's a popular strategy since it's straightforward to grasp and produces a less biased or optimistic estimate of model competence than other approaches, such as a simple train/test split.



The following is the general procedure:
1. Randomly shuffle the dataset.
2. Organize the data into k groups.
3. For each distinct group:
   • Use the group as a holdout or test data set.
   • Use the remaining groups as a training data set.
   • Fit a model to the training set and test it against the test set.
   • Keep the evaluation score but toss out the model.
4. Using the sample of model evaluation scores, summarise the model's ability.
[Figure: k-fold cross-validation — across iterations 1 to k, a different fold of the data serves as the test data while the remaining folds form the training data, until all data has been used.]
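The procedure above maps directly onto scikit-learn's KFold class. The sketch below is a minimal illustration, assuming a toy dataset from make_blobs and a logistic regression model (both chosen only for this example):

# A minimal k-fold cross-validation sketch following the four steps above.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_blobs(n_samples=100, random_state=1)   # toy dataset (illustrative)
kfold = KFold(n_splits=10, shuffle=True)           # steps 1-2: shuffle, form k=10 groups

scores = []
for train_idx, test_idx in kfold.split(X):         # step 3: each group takes a turn as holdout
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])          # fit on the remaining k-1 groups
    y_hat = model.predict(X[test_idx])             # test against the holdout group
    scores.append(accuracy_score(y[test_idx], y_hat))  # keep the score, toss out the model

print("mean accuracy:", sum(scores) / len(scores))  # step 4: summarise the scores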

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: K-Fold Cross Validation - Intro to Machine Learning
https://www.youtube.com/watch?v=TIgfjmp-4BA
After watching the video, answer the following question:
What is the role of cross validation in training and testing a model?

Train Test Split Vs Cross-Validation


The increased computational load of doing cross-validation isn't a major concern on small datasets. With a train-test split, these are also the problems where model quality scores would be the least trustworthy. Cross-validation should be used if your dataset is small.

For the same reasons, for larger datasets, a simple train-test split is sufficient. It will run faster, and you may have enough data that reusing a portion of it for repeated evaluation is unnecessary.

Cross-validation is mostly the method of choice since it allows your model to be trained on multiple train-test splits. This gives a good idea of how well your model will perform on data not seen before. On the other hand, the Train Test Split procedure relies on only one train-test split.
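As a brief illustration of the difference, the sketch below compares the single score from one train-test split with the spread of scores from 5-fold cross-validation. The dataset and model are arbitrary choices for the example:

# Comparing one train-test split score with cross-validated scores.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_blobs(n_samples=100, random_state=1)

# a single train-test split yields one score, which depends on the rows drawn
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
split_score = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# 5-fold cross-validation yields five scores from five different partitions
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5)

print("single split score:", split_score)
print("cross-validation scores:", cv_scores, "mean:", cv_scores.mean())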



Metrics of Model Quality-Loss Function
A loss function is used by machines to learn. It's a way of determining how well a certain algorithm models the data. If the forecasts are too far off from the actual findings, the loss function will return a very large number. A loss function learns to lower prediction error over time with the help of some "optimization/objective function".

In machine learning, there is no such thing as a one-size-fits-all loss function. The type of machine learning method used, the ease of calculating derivatives, and, to some extent, the number of outliers in the dataset all play a role in selecting a loss function for a certain task.

Depending on the type of learning job we're dealing with, loss functions can be divided into two categories: regression losses and classification losses. In classification, the output is predicted from a set of finite categorical values; for example, given a large data set of photographs of handwritten numbers, categorising them into one of the 0-9 digits. Regression, on the other hand, is concerned with predicting a continuous value, such as predicting the price of a house given its floor area, number of rooms, and room size.

Classification losses:
• Log Loss
• Focal Loss
• KL Divergence/Relative Entropy
• Exponential Loss
• Hinge Loss

Regression losses:
• Mean Square Error/Quadratic Loss
• Mean Absolute Error
• Huber Loss/Smooth Mean Absolute Error
• Log Cosh Loss
• Quantile Loss
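As a brief illustration, scikit-learn exposes several of these losses as metric functions. The sketch below computes one loss from each category on small made-up values (the values themselves are arbitrary, chosen only for the example):

# Computing one classification loss and one regression loss with scikit-learn.
from sklearn.metrics import log_loss, mean_absolute_error

# classification: true class labels and predicted class probabilities (made-up values)
y_true_cls = [0, 1, 1]
y_prob = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
print("log loss:", log_loss(y_true_cls, y_prob))

# regression: true and predicted continuous values (made-up values)
y_true_reg = [3.0, -0.5, 2.0]
y_pred_reg = [2.5, 0.0, 2.0]
print("mean absolute error:", mean_absolute_error(y_true_reg, y_pred_reg))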

MSE (Mean Squared Error)

The Mean Squared Error (MSE) is the most basic and widely used loss function, and it is frequently taught in Machine Learning courses. Calculate the difference between the model's predictions and the actual values, square it, and average it across the entire dataset to get the value of MSE. MSE is given by the equation:

MSE = (1/N) × Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)²,  where yᵢ is the actual value and ŷᵢ is the predicted value.

MSE will never be negative because the errors are always squared.
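For instance, with actual values [3, −0.5, 2, 7.2] and predictions [2.5, 0.0, 2.3, 8] (the same values used in the Python example later in this section), the squared errors are 0.25, 0.25, 0.09, and 0.64, so MSE = (0.25 + 0.25 + 0.09 + 0.64) / 4 = 0.3075.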

Advantage

MSE is useful for ensuring that our trained model does not have any outlier predictions with significant errors, because MSE places a higher weight on these errors due to the squaring element of the function.

Disadvantage

If our model makes a single particularly incorrect forecast, the squaring part of the function magnifies the error. However, in many real-life cases, we don't worry about these outliers and instead seek a more fully-rounded model that performs well enough on the majority of cases.



Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: Mean Squared Error MSE
https://www.youtube.com/watch?v=Mhw_-xHVmaE
After watching the video, answer the following question:
What do you mean by MSE?

Calculating MSE in Python: the mean_squared_error function gives the mean squared error regression loss.
from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7.2]    # list of actual values
y_pred = [2.5, 0.0, 2.3, 8]   # list of predicted values
print("MSE value=", mean_squared_error(y_true, y_pred))  # returns MSE value

# Using nested lists
y_true = [[0.5, 1], [-1, 0], [7, -5]]
y_pred = [[0, 2], [-1, 1.5], [8, -5.5]]
print("MSE value=", mean_squared_error(y_true, y_pred))

Output:
MSE value= 0.3074999999999999
MSE value= 0.7916666666666667

RMSE (Root Mean Square Error)


The Root Mean Square Error (RMSE) is a metric for determining how well a regression line fits the data points. RMSE can also be understood as the standard deviation of the residuals. Recall that the residual is the difference between the predicted value and the observed value in the Regression Line.
[Figure: a scatter plot with a fitted regression line; the residual error is the vertical distance between an observed point and the line. X-axis: No. of Hours Studied.]

So, RMSE is calculated as:

RMSE = √( (1/N) × Σᵢ₌₁ᴺ (Predictedᵢ − Actualᵢ)² )

The errors are squared before being averaged in RMSE. This basically means that RMSE gives larger mistakes a higher weight. This suggests that RMSE is far more beneficial when substantial errors exist and have a significant impact on the model's performance. This characteristic is important in many mathematical calculations since it avoids taking the absolute value of the error. The RMSE of a good model should be less than 180. The lower the RMSE value, the higher the model's performance.
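For instance, with predicted values [0.000, 0.166, 0.333] and actual values [0.000, 0.254, 0.998] (the same values used in the Python example that follows), the squared differences are 0, 0.0077, and 0.4422; their mean is 0.1500, so RMSE = √0.1500 ≈ 0.387.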

Experiential Learning

Video Session
Scan the QR code or visit the following link to watch the video: U01V05 Calculating RMSE in Excel

https://www.youtube.com/watch?v=G8j8KAJJlw
After watching the video, answer the following question:
What do you mean by RMSE?

Calculating RMSE in Python

import numpy as np

y_pred = np.array([0.000, 0.166, 0.333])  # predicted values
y_true = np.array([0.000, 0.254, 0.998])  # actual values

def rmse(predictions, targets):
    diff = predictions - targets
    diff_sq = diff ** 2
    mean_diff_sq = diff_sq.mean()
    rmse_val = np.sqrt(mean_diff_sq)
    return rmse_val

print("predicted values are: " + str(["%.4f" % i for i in y_pred]))
print("actual values are: " + str(["%.4f" % i for i in y_true]))
rmse_val = rmse(y_pred, y_true)
print("RMS Error is: " + str(rmse_val))



Output:
predicted values are: ['0.0000', '0.1660', '0.3330']
actual values are: ['0.0000', '0.2540', '0.9980']
RMS Error is: 0.3872849941150143

If you have a larger RMSE value, you will most likely need to alter your features or tweak your hyperparameters.

Hyperparameters

Hyperparameters are parameters whose values govern the learning process. They also determine the values of model parameters learned by a learning algorithm. They are 'top-level' parameters that regulate the learning process and the model parameters that come from it, as the prefix 'hyper' suggests. Since the model cannot modify its values during learning/training, hyperparameters are said to be external to the model. Some examples of hyperparameters, several of which appear in the sketch after this list, are:
• The ratio of the train-test split
• Optimization algorithms' learning rate (e.g., gradient descent)
• In a neural network, the activation function selected (e.g., Sigmoid, ReLU, Tanh)
• The loss function that the model will employ
• A neural network's number of hidden layers
• The number of iterations (epochs) required to train a neural network
• A clustering task's number of clusters
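As a minimal sketch (the dataset, model choice, and values below are illustrative assumptions, not tuned recommendations), several of these hyperparameters are set before training a small scikit-learn neural network:

# Hyperparameters are chosen before training; the model cannot change them itself.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=200, random_state=1)

# the train-test split ratio is itself a hyperparameter
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

model = MLPClassifier(
    hidden_layer_sizes=(10, 10),  # number (and size) of hidden layers
    activation="relu",            # activation function
    learning_rate_init=0.001,     # learning rate of the optimizer
    max_iter=300,                 # number of training iterations (epochs)
)
model.fit(X_train, y_train)      # the model parameters are learned here
print("test accuracy:", model.score(X_test, y_test))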

At a Glance
• A capstone project is a comprehensive, independent, and final project undertaken as part of a curriculum designed to assess the skills, knowledge, and expertise a student has acquired.
• A successful problem-defining process requires a basic analysis and evaluation of the project-related problems, their reasons, and methods.
• Design Thinking methodology provides a solution-based approach to solving problems.
• During coding, we follow the problem decomposition methodology, which can be applied to real-life problems as well.
• Once the business problem is clearly stated, the data scientist can define an analytical approach to solving the problem.
• The analytical approach chosen characterizes the requirements for the data.
• During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area.
• The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach.
• The data scientist reviews the model during development and before deployment to determine its quality and ensure that it correctly and completely answers the business problem.
• The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that was not used to train the model.
• The training dataset is used to fine-tune the machine learning model and train the algorithm.
• The test dataset is used to evaluate the trained model, which makes predictions using the input elements of the test data.
• Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data.



• A loss function determines how well a certain algorithm models the data.
• A loss function learns to lower prediction error over time with the help of some "optimization/objective function".
• Loss functions can be divided into two categories: regression losses and classification losses.
• MSE is sensitive to outliers.
• The Root Mean Square Error (RMSE) is a metric for determining how well a regression line fits the data points.
• Hyperparameters are parameters whose values govern the learning process.

AI Quiz
A. Tick (✓) the correct option.
1. Which of the following is not part of Design Thinking, a 5-Stage Process?
   a. Empathize    b. Sympathize
   c. Prototype    d. Define
2. A __________ is a project where students must research a topic independently to get a deep understanding of the subject matter.
   a. AI model    b. culminating report
   c. senior report    d. capstone
3. Mean Square Error (MSE) is the most commonly used regression loss function. MSE is the sum of squared distances between our target variable and predicted values. Identify one feature of MSE:
   a. It is sensitive to outliers.
   b. It is used on data, conditioned on the output variables.
   c. It is good to use if the target data is normally distributed around a median value.
   d. It should be compared with Mean Absolute Error, where the optimal prediction is the mean.
4. In problem decomposition:
   i. Understand the problem and then restate the problem in your own words
   ii. Gather all simple facts to create a complicated piece
   iii. Break larger units into simpler ones
   iv. Code one small unit at a time
   Which of the following is true?
   a. (i) and (ii)    b. (i), (iii) and (iv)
   c. (i) and (iv)    d. (i), (ii), (iii) and (iv)
5. For AI techniques to be applied to a dataset, the data must have a __________.
   a. association    b. relationship
   c. pattern    d. Either a or b
6. An optimum AI model should have a __________ value less than 180.
   a. Mean Absolute Error    b. Mean Square Error
   c. Quantile Loss    d. Root Mean Square Error

