CLASS X UNIT-II AI PROJECT CYCLE
1.4 PROBLEM SCOPING
Problem Scoping is the initial stage of the AI project cycle. It refers to the process of
identifying or framing a problem in such a way that it can be visualized in order for it to be
solved.
1.4.1 Define Problem Statement And Set Actions
Setting goals for a project is not enough. In order to solve a problem, the problem statement
must be clearly defined. It is very important to know whom this problem is affecting and how,
what its cause is, and why it should be solved. For this purpose, a useful tool called the
4Ws Canvas is used.
1.4.2 4Ws Canvas
The 4Ws Canvas tool is a problem framing method that helps in describing and interpreting
a problem to arrive at a problem statement. Framing a problem is all about identifying the
right problem to solve and to understand it perfectly. The 4Ws Canvas is a useful tool that
helps us critically explore a problem from various angles and bring clarity about the problem
to be solved.
The 4Ws Canvas tool (also known as the problem statement template) explores four W-
based questions (WHO, WHAT, WHERE and WHY) in the context of the problem to be solved,
to clearly understand its various aspects.
a) Who Block
The “WHO” block helps in finding the people who are getting affected directly or
indirectly due to the problem under analysis. Under this, we need to find out who are the
‘Stakeholders’ to this problem. We also need to find what we know about these
stakeholders. Stakeholders are the people who are currently facing the problem and
who would later benefit from the solution as well.
b) What Block
The “WHAT” block is used for finding the kind of problem we are facing. At this stage we
need to find the nature of the problem. Under this block, we also gather various pieces of
evidence to show that the problem we have found does actually exist. Newspaper
articles, media announcements, etc., are some examples through which we can get
information regarding the problem.
c) Where Block
This block will help us to look into the situation in which the problem arises or the
context of the problem and the area where the problem is prominent.
d) Why Block
In the “WHY” block, we think about the benefits which the stakeholders would get from
the solution and how it would benefit them as well as society.
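The four blocks can be collected into a simple reusable template. Here is a minimal sketch in Python; the block names come from the canvas, while the dictionary values and the sentence format are illustrative placeholders, not a prescribed form:

```python
# A minimal 4Ws Canvas template: one entry per block of the canvas.
canvas_4ws = {
    "Who":   "Stakeholders affected by the problem",
    "What":  "Nature of the problem, with supporting evidence",
    "Where": "Context or situation in which the problem arises",
    "Why":   "Benefits of solving it for stakeholders and society",
}

def problem_statement(canvas):
    """Combine the four blocks into a one-line problem statement."""
    return (f"Our {canvas['Who']} face the problem of {canvas['What']} "
            f"in/at {canvas['Where']}. A solution matters because "
            f"{canvas['Why']}.")

print(problem_statement(canvas_4ws))
```

Filling the four values with answers from a real project (like the face mask example later in this unit) yields a draft problem statement automatically.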
1.4.4 Choosing a Theme and Topic of AI Project
As your aim is to develop an AI based solution for a problem, look for problems around you
that require a novel solution. For this, first choose a theme and then choose a problem
from the domain of the chosen theme.
You may choose a theme from one of the following:
a) Environment
b) Healthcare
c) Education & E-Education
d) Transport
e) Cyber Security
f) Agriculture
g) Infrastructure
h) Women Safety
i) Wellness
You can even choose a theme from the 17 SDGs (Sustainable Development Goals).
We are choosing an AI project based on Sustainable Development Goal 3: Good Health
and Well-Being, which emphasizes the goal: Ensure healthy lives and promote
well-being for all at all ages.
1.4.5 Identify The Problems Around The Selected Topic
After choosing the theme (SDG 3: Good Health and Well-Being) and its subtopic (Face
Mask Detection System), let us first identify the issues and problems around it so that AI
can help with them.
1.4.6 Writing Problem Statement
Let us now try to write the problem statement for our AI project of Face Mask Detection.
Problem Statement of AI project of Face Mask Detection
Government and non-government agencies responsible for public health and wellbeing
(WHO) ensure that a safe and healthy environment is made available to people. For this,
certain rules and laws are in place; one such rule by law is to wear face masks in public
places, especially during pandemic times.
But, some people violate this law and do not wear face masks (WHAT) at public
places (WHERE) or wear them wrongly (WHAT).
It is important to ensure that face masks are worn properly to ensure a safe, infection-
free public environment for people (WHY).
1.5 DATA ACQUISITION
Data Acquisition is a process of collecting data from various sources for the purpose of
analytical operations like training and predictions.
1.5.1 Significance of Data
Data plays a crucial role for an AI project to behave intelligently as the AI project is trained
using data to behave in a specific way. Data is raw facts and figures. It is the statistics
collected together for reference or analysis.
To build an AI system, you would need to source large amounts of data and create data sets
for training, testing and evaluation, and then deployment of the AI project. This process is
repeated through several rounds of training, testing and evaluation until the desired outcome
is achieved and data plays an important role at each step.
❖ For Training, previously existing data with specific outcomes is fed into an AI system and
the system is trained using the data.
❖ Then Evaluation takes place, where validation data, which is new data, is fed in to evaluate
the working system. Validation data provides the first test against unseen data, to evaluate
how well the AI model makes predictions/decisions based on the new data.
❖ Then Testing happens, when the AI system is fed with some data whose outcomes are
known beforehand and the outcomes produced by the AI system are compared with the expected
outcomes to test if the system is working efficiently or not. Testing data once again
validates that the developed AI model can make accurate predictions/decisions.
For example, if we want to make an artificially intelligent system which can predict the
weather of any geographical area based on previous weather analysis, then we would
feed the weather data of previous years into the machine. This is the data with which the
machine can be trained. Once it is ready, it will make the weather forecast. The previous
weather data is known as the Training Data, while the predicted weather data set is known
as the Testing Data.
The training data should be relevant and reliable. This is because AI systems work on a
principle called GIGO (Garbage In, Garbage Out). The concept of GIGO implies that the
quality of the output depends upon the quality of the input.
“For any AI project to be efficient, the training data should be authentic, accurate and
relevant to the problem statement scoped.”
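The division of acquired data into training and testing sets can be sketched as follows; the 80/20 split ratio and the weather records are assumptions for illustration:

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into training and testing sets."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# e.g. ten days of (temperature, rained?) weather records
weather = [(31, False), (29, True), (27, True), (33, False), (30, False),
           (26, True), (28, True), (32, False), (25, True), (34, False)]
train, test = train_test_split(weather)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting helps both sets represent the data fairly; the model is trained on `train` and evaluated on the held-back `test` records.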
1.5.2 Type of Data used in AI Projects
Artificial Intelligence (AI) projects are required to process the vast amounts of data produced
as a result of the growth of Internet-based technologies in areas such as stock exchanges
and financial services, industry and manufacturing, telecommunications and transport,
healthcare, academia and so forth.
Data of AI systems broadly belongs to one of the following two categories:
a) Structured Data:
Structured data is data that has a purposely designed, pre-defined structure as per
some existing data model, such as simple 2D spreadsheet arrays, complex relational
databases, or knowledge graphs. Structured data has well-defined relationships
among its elements.
b) Unstructured Data:
Unstructured data is data that is not organized according to any pre-existing data model.
Unstructured data is unprocessed and is often generated by machine-led systems, for
example social media posts, surveillance camera footage, or satellite imagery.
Unstructured data can have its own internal structure, which may not fit any
well-defined format.
1.5.3 Data Features
Both structured and unstructured data have certain data features. Data features refer to the
type of data you want to collect. For example, for an AI system analyzing social media posts,
the data features required would be the social media post, the platform, the time posted, etc.
1.5.4 Finding Reliable Data Sources
Data can be acquired from various sources but these sources should be reliable, correct and
authentic. There are various sources from which the data can be acquired.
Some of the data acquisition sources are:
a) Surveys: A survey is one way in which we can collect information directly by asking
the customers. A survey can collect either quantitative or qualitative data, or both. A
survey consists of a list of queries which respondents can answer in just one or two
words. We can conduct surveys online, over email, over the phone or in person.
b) Web Scraping: Web data extraction (also known as web scraping, web harvesting,
screen scraping, etc.) is a technique for extracting huge amounts of data from websites
on the internet. The data collected from websites is arranged in an organized format
such as a table, CSV file or spreadsheet.
c) Sensors: Sensors, often called transducers, convert real-world phenomena like
temperature, force and movement into voltage or current signals that can be used as
inputs.
d) Cameras: Cameras are an important means of data collection in the form of images. Live
data can be acquired using a web camera, CCTV, a chatbot interface, etc.
e) Observation: Observation means the careful and systematic viewing of facts as they
occur. It requires watching carefully and listening attentively. Observation serves the
purpose of: (i) studying collective behaviour and complex social situations; (ii) following
up individual elements of the situations; (iii) understanding the situations in their
interrelation; (iv) getting the details of the situation.
f) API (Application Program Interface): Application programming interfaces are pieces
of code which help one application connect to another. APIs are used to collect data
from other applications.
We should keep in mind that the data we collect should be open-source and should not
belong to anyone else. On the internet, the most reliable and authentic sources of
information are the open-source websites hosted by the government. Some of the
open-source government portals are: data.gov.in, india.gov.in
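As a sketch of collecting data through an API: most web APIs return JSON text, which Python's standard library can parse directly. The payload below is a made-up example of what a government data portal might return, not a real response:

```python
import json

# A hypothetical JSON payload, as an API on a government data portal might return.
payload = '''{
  "records": [
    {"state": "Kerala",  "literacy_rate": 94.0},
    {"state": "Mizoram", "literacy_rate": 91.3}
  ]
}'''

data = json.loads(payload)   # parse the JSON text into Python objects
rates = {r["state"]: r["literacy_rate"] for r in data["records"]}
print(rates)  # {'Kerala': 94.0, 'Mizoram': 91.3}
```

In a real project, the payload would come from an HTTP request to the portal's API endpoint; the parsing step stays the same.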
1.5.5 Primary Data & Secondary Data
Two terms are used, based on who collects the data:
a) Primary Data is the type that you gather by yourself. It means you are actively involved
in the sourcing of information.
b) Secondary Data is all around us. It is easily accessible on the internet and requires
fewer resources to gather, unlike Primary Data. In this case, the collection of primary
data has been done by someone else before getting uploaded to the internet.
Secondary Data comes in the form of search results.
1.5.6 System Maps
System Maps help us to find relationships between different elements of the problem which
we have scoped. It helps us in strategizing the solution for achieving the goal of our project.
A system map shows the components and boundaries of a system at a specific point in time.
With the help of system maps, one can easily define relationships among the different
elements of a system. The '+' and '-' signs on the arrows indicate the nature of the
relationship between elements: the arrowhead depicts the direction of the effect, and the
sign shows the relationship. If an arrow goes from X to Y with a + sign, the two are directly
related: if X increases, Y also increases, and vice versa. On the other hand, if an arrow
goes from X to Y with a - sign, the two elements are inversely related: if X increases,
Y decreases, and vice versa.
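The + and - relationships of a system map can be represented directly in code. A minimal sketch, with illustrative elements taken from a health scenario:

```python
# Arrows of a small system map: (source, target, sign).
# '+' means directly related, '-' means inversely related.
arrows = [
    ("mask wearing", "infection spread", "-"),
    ("infection spread", "hospital load", "+"),
]

def effect_of_increase(arrows, element):
    """If `element` increases, report the direction of change of each target."""
    changes = {}
    for src, dst, sign in arrows:
        if src == element:
            changes[dst] = "increases" if sign == "+" else "decreases"
    return changes

print(effect_of_increase(arrows, "mask wearing"))      # {'infection spread': 'decreases'}
print(effect_of_increase(arrows, "infection spread"))  # {'hospital load': 'increases'}
```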
1.6 DATA EXPLORATION
Data exploration is the phase after data acquisition wherein the collected data is cleaned by
removing redundant data and handling missing values, and is then analysed using data
visualization and statistical techniques to understand the nature of the data before it is
used to build AI models.
1.6.1 Data Visualization
Data visualization refers to the process of representing data visually or graphically, using
visual elements like charts, graphs, diagrams and maps.
The importance of data visualization is summarized as follows:
❖ Data visualization is a powerful way to represent a bulk of data in a collective visual form.
❖ It is a way to explore data with presentable results.
❖ It becomes easier to see trends, patterns and relationships in data through data
visualization.
1.6.2 Visualisation Tools
Some of the Data Visualisation Tools are:
a) Scatter Chart (Used with numeric type of data):
An XY (scatter) chart either shows the relationships among the numeric values in several
data series or plots two groups of numbers as one series of XY coordinates.
How to draw?
The scatter chart is drawn by plotting the independent variable on the horizontal axis X,
the dependent variable on the vertical axis Y and then by marking data points as per their
XY values.
b) Bubble Chart (Used with numeric type of data):
A bubble chart is primarily used to depict and show relationships between numeric
variables, with marker size as an additional dimension: a bigger marker means a bigger
value.
How to draw?
The bubble chart is drawn by plotting the independent variable on the horizontal axis (X),
the dependent variable on the vertical axis (Y) and then by marking bubbles at their XY
values. A third data series determines the bubble sizes.
c) Line Graph (Used with numeric type of data):
A line chart shows trends in data at equal intervals. Line charts are useful for depicting
the change in a value over a period of time.
How to draw?
The line chart is drawn by plotting the independent variable on the horizontal axis (X), the
dependent variable on the vertical axis(Y) and then by marking data points as per their
XY values. Then a line is drawn by joining the marked data points.
d) Pie Graph (Used with numeric type of data):
A pie chart shows the proportional size of items that make up a single data series to the
sum of the items.
How to draw?
The pie chart represents a single data series, the whole of which represents the full circle
(360°). Each data value is calculated as a percentage of the whole and drawn as a slice of
the circle.
e) Bar Graph (Used with numeric type of data):
A bar chart illustrates comparisons among individual items, mainly of number types.
How to draw?
The bar chart is drawn by plotting the independent variable on the horizontal axis(X), the
dependent variable(s) on the vertical axis (Y) and then by marking bars for their Y values.
f) Histogram (Used with numeric type of data):
A histogram is used to summarize discrete or continuous data by showing the number of
data points that fall within a specified range of values (called “bins”). Unlike a bar chart,
a histogram has no gaps between its bars.
How to draw?
Like a bar chart, rectangles of varying height are used to represent the frequency of
different values of the continuous variable (Y values). There are no spaces between the
rectangles.
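To make the idea of histogram bins concrete, the counting step can be done by hand in a few lines of Python; the marks data and the bin width of 10 are assumptions for illustration:

```python
def histogram_bins(values, bin_width, start):
    """Count how many values fall into each bin of width `bin_width`,
    starting at `start`. Returns {bin_start: count}."""
    counts = {}
    for v in values:
        bin_start = start + ((v - start) // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts

# Marks of 10 students, binned into ranges 30-39, 40-49, and so on.
marks = [35, 42, 47, 51, 55, 58, 63, 67, 72, 88]
print(histogram_bins(marks, bin_width=10, start=0))
# {30: 1, 40: 2, 50: 3, 60: 2, 70: 1, 80: 1}
```

A charting library would then draw one gap-free rectangle per bin, with the count as its height.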
1.7 MODELLING
Modelling is the process in which AI-enabled algorithms are designed as per the
requirements of the system, and later the model is implemented.
Modelling is the stage where we select the technique required for building the model using
the prepared data. The model built can be trained using various learning algorithms. To build an
AI-based project, we need to work with artificially intelligent models or algorithms. Training a
model is required so that it can understand various patterns, rules and features.
1.7.1 Modelling Approaches
AI modelling refers to developing algorithms, also called models, which can be trained to
give intelligent outputs; that is, writing code to make a machine artificially intelligent. In
modelling, there are two approaches taken by researchers while building AI models. Let
us now understand these modelling approaches.
1.7.2 Categories of AI Models
AI models can be either data driven or model driven. Model-driven AI models are
mainly rule based, while data-driven AI models are mainly learning based.
a) Rule-based Approach:
A rule-based approach is generally based on the data and rules fed into the machine,
where the machine reacts accordingly to deliver the desired output. The rule-based
approach refers to AI modelling where the relationships or patterns in data are defined by
the programmer or the model developer. The machine is trained using the rules laid down
by the developer; it has to follow the rules or instructions mentioned and then perform
its task accordingly.
Decision Tree is an example of a Rule based approach.
● It is a tree-structured classifier, where internal nodes represent the features of a
dataset, branches represent the decision rules and each leaf node represents the
outcome.
● In a decision tree, there are two kinds of nodes: decision nodes and leaf nodes.
Decision nodes are used to make a decision and have multiple branches, whereas
leaf nodes are the outputs of those decisions and do not contain any further
branches.
● The decisions or tests are performed on the basis of features of the given dataset.
● It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
● It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
● A decision tree simply asks a question and, based on the answer (Yes/No), further
splits the tree into subtrees.
We can understand decision tree with the help of the example given below:
We can draw a decision tree for people who are interested in buying a new house based
on income. Depending on the income, the decision of house type (1BHK, 2BHK, 3BHK or
4BHK) is taken.
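This decision tree can be written as nested rules, which is exactly the rule-based approach described above. The income thresholds below are invented for illustration:

```python
def house_type(monthly_income):
    """Rule-based decision: pick a house type from income.
    Thresholds are illustrative, not from any real dataset."""
    if monthly_income < 30000:
        return "1BHK"
    elif monthly_income < 60000:
        return "2BHK"
    elif monthly_income < 100000:
        return "3BHK"
    else:
        return "4BHK"

print(house_type(25000))   # 1BHK
print(house_type(75000))   # 3BHK
```

Every branch of the tree is a rule the programmer wrote by hand; nothing is learned from data.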
Drawbacks of Rule Based AI Models
Although the rule based AI models are comparatively easier to maintain and implement, they
also suffer from the following drawbacks:
(i) A lot of manual work. A rule-based system requires a lot of manual work, as all the
rules governing the decisions must be pre-coded and made available to the system.
(ii) Consumes a lot of time. Creating all possible rules for a system requires a lot of time.
(iii) Suitable only for less complex domains. Complex systems would require a large
number of rules.
b) Learning-based Approach:
In a learning-based approach, the machine is fed with data, and the desired output is
achieved when the machine designs its own algorithm (or set of rules) to match the data to
the desired output. The learning-based approach refers to AI modelling where the
relationships or patterns in data are not defined by the programmer or model developer.
In this approach, random data is fed to the machine and it is left to the machine to figure
out patterns and trends in it.
Generally, this approach is followed when the data is unlabelled and too random for a
human to make sense of. The machine looks at the data, tries to extract similar features
from it, and clusters similar data together. In the end, as output, the machine tells us
about the trends it observed in the training data.
Why does Machine Learning (ML) fall under the category of learning-based AI?
Machine Learning (ML) is a branch of AI that enables machines to automatically learn and
improve at tasks with experience and by the use of data. ML-based machines undergo many
repetitions of taking in data and testing it; they keep track of when things went wrong
or right, and keep improving their results.
ML systems can automatically learn and improve without being explicitly programmed.
The recommendation systems on music and video streaming services are examples of
ML. Machine learning finds patterns in data and uses them to make predictions.
1.7.3 Unlabelled and Labelled Data
Before we proceed to the discussion of different learning based approaches, it is important
to talk about labelled and unlabelled data first.
a) Unlabelled Data: Unlabelled data refers to pieces of data that have not been
tagged with labels identifying the characteristics, properties or classifications of the
data. Some examples of unlabelled data might include photos, audio recordings, videos,
news articles, tweets, X-rays (in the case of some medical application), etc. There is no
“explanation” attached to each piece of unlabelled data; it just contains the data, and
nothing else.
b) Labelled Data: Labelled data is a group of samples that have been marked with one or
more labels. Labelling puts meaningful tags on data so that it gives some information or
explanation about the data. For example, if some X-ray images are labelled as “tumour”,
then those X-ray images are no longer unlabelled; rather, they now belong to a category
of images that show tumours of some type.
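In code, the difference is simply whether each sample carries a tag. A sketch with made-up X-ray file names:

```python
# Unlabelled data: just the samples, with no explanation attached.
unlabelled = ["xray_001.png", "xray_002.png", "xray_003.png"]

# Labelled data: each sample is paired with a meaningful tag.
labelled = [
    ("xray_001.png", "tumour"),
    ("xray_002.png", "normal"),
    ("xray_003.png", "tumour"),
]

# The labels let us ask questions the unlabelled data cannot answer:
tumour_count = sum(1 for _, label in labelled if label == "tumour")
print(tumour_count)  # 2
```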
1.7.4 Supervised Learning
Supervised learning is a machine learning approach in which a machine, with the help of an
algorithm (called the model), learns from a labelled dataset and desired outputs. Using this
dataset, it learns to identify the type or class of the data given to it. Later, some data is
shown to it to test if it can correctly identify the data or not. It applies the same concept
as a student learning under the supervision of a teacher.
For example, a labelled dataset of shapes would contain photos of triangles tagged as
triangle, photos of squares tagged as square, and so on for other shapes. When shown a
new image, the model compares it to the training examples to predict the correct label.
The model gets feedback about its result as per the desired outputs and this way, it learns
to classify correctly; that is why it is called supervised learning.
1.7.4.1 Types of Supervised Learning
a) Classification: This is a type of supervised learning. In general, classification refers to
the process of classifying something according to its features. Similarly, in AI models,
classification algorithms are used to classify the given dataset on the basis of rules
assigned to it.
Let us understand with the help of a simple example. Suppose you have a dataset
consisting of 100 images of pears and pomegranates. Now, you want to train a model
to recognize whether an image is of a pear or a pomegranate. To do this, you must train
the model with discrete datasets along with their labels. After training with a particular
dataset, your model will be ready to classify the images on the basis of the labels and
predict the correct label for test data.
A classification problem is when the output variable is a category, such as “red”
or “blue”, “disease” or “no disease”, or spam or not spam in email.
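A classifier of this kind can be sketched with a simple nearest-neighbour rule: predict the label of the closest training example. The (weight, redness) feature values below are invented for illustration:

```python
import math

# Tiny labelled training set: (weight in grams, redness 0-1) -> fruit.
# The feature values are invented for illustration.
training = [
    ((160, 0.3), "pear"),
    ((150, 0.2), "pear"),
    ((280, 0.9), "pomegranate"),
    ((300, 0.8), "pomegranate"),
]

def classify(sample):
    """1-nearest-neighbour: predict the label of the closest training example."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = min(training, key=lambda pair: dist(pair[0], sample))
    return nearest[1]

print(classify((155, 0.25)))  # pear
print(classify((290, 0.85)))  # pomegranate
```

The labels in the training set supervise the prediction, just as the text describes.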
❖ Data visualization applications
❖ Video & satellite observation compression
1.7.9 Association
Association is another unsupervised learning
technique that finds important relations between
variables or features in a data set.
For example, if you pick some home decor items
such as lamps or shelves in an online shopping cart,
it will start suggesting the related items such as
furniture, rugs and even interior designing firms.
This is an example of association, where certain features of a data sample correlate with
other features. By looking at a couple of key attributes of a data point, an unsupervised
learning model can predict the other attributes with which they are commonly associated.
Some examples of association problems/applications are:
❖ Recommendation systems, based on people’s own personality/habits.
❖ People who buy a new home are most likely to buy new furniture, and thus furniture
items and stores can be suggested.
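The idea of association can be sketched by counting how often items appear together in past shopping carts and recommending the most frequent partner. The carts below are invented data:

```python
from itertools import combinations
from collections import Counter

# Past shopping carts (invented data for illustration).
carts = [
    {"lamp", "rug"},
    {"lamp", "rug", "shelf"},
    {"shelf", "rug"},
    {"lamp", "rug"},
]

# Count how often each pair of items appears together in a cart.
pair_counts = Counter()
for cart in carts:
    for pair in combinations(sorted(cart), 2):
        pair_counts[pair] += 1

def recommend(item):
    """Suggest the item most often bought together with `item`."""
    related = {p: c for p, c in pair_counts.items() if item in p}
    best = max(related, key=related.get)
    return best[0] if best[1] == item else best[1]

print(recommend("lamp"))  # rug
```

No labels are used anywhere: the co-occurrence pattern is found in the data itself, which is why association is an unsupervised technique.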
1.7.10 Reinforcement Learning
Reinforcement Learning is a feedback-based machine learning technique in which an agent
learns to behave in an environment by performing
actions and seeing the results of those actions. For
each good action, the agent gets positive
feedback, and for each bad action, the agent gets
negative feedback or penalty.
In reinforcement learning, the agent learns
automatically using feedback, without any labelled
data. Since there is no labelled data, the agent is
bound to learn from its experience only.
Example:
The problem is as follows: We have an agent and a reward, with many hurdles in
between. The agent is supposed to find the best possible path to reach the reward.
The image below explains the problem more easily.
The image shows a robot, a diamond, and fire. The goal of the robot is to get the
reward, that is, the diamond, while avoiding the hurdles, which are the fire. The robot
learns by trying all the possible paths and then choosing the path which gives it the
reward with the least hurdles. Each right step gives the robot a reward and each wrong
step subtracts from the robot’s reward. The total reward is calculated when it reaches
the final reward, that is, the diamond.
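The reward bookkeeping described above can be sketched as follows. This is not full reinforcement learning (a real agent would discover the paths and rewards through repeated trials); the grid layout and reward values are assumptions:

```python
# A tiny grid world: the robot starts at (0, 0), the diamond is at (0, 2),
# and there is fire at (0, 1). Reward values are assumptions:
# each safe step earns +1 and stepping into fire costs -10.
grid = {
    (0, 0): "start", (0, 1): "fire", (0, 2): "diamond",
    (1, 0): "safe",  (1, 1): "safe", (1, 2): "safe",
}

def total_reward(path):
    """Sum the reward collected along a path of grid cells."""
    return sum(-10 if grid[cell] == "fire" else 1 for cell in path)

# Two candidate paths (a learning agent would discover these by trial and error).
paths = {
    "through the fire": [(0, 1), (0, 2)],
    "around the fire":  [(1, 0), (1, 1), (1, 2), (0, 2)],
}

best = max(paths, key=lambda name: total_reward(paths[name]))
print(best)  # around the fire
```

The longer but safer path wins because avoiding the -10 penalty outweighs the extra steps, which is exactly the trade-off the agent learns from feedback.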
Neural Networks
⮚ Biological Neural Network
✔ In living organisms, the brain is the control unit of the neural network, and it has different
subunits that take care of vision, senses, movement, and hearing.
✔ The brain is connected with a dense network of nerves to the rest of the body’s sensors and
actors.
✔ There are approximately 10¹¹ neurons in the
brain, and these are the building blocks of the
complete central nervous system of the living
body.
⮚ Working of Biological Neural Network
✔ The neuron is the fundamental building block of
neural networks.
✔ A neuron comprises three major parts: the
synapse, the dendrites, and the axon.
✔ Dendrites: These receive signals from the surrounding neurons.
✔ Axon: It transmits the signal as electric impulses along its length to the other neurons. Each
neuron has one axon.
✔ Synapse: At the ending terminal of the axon, the contact with the dendrite is made through
a synapse.
⮚ Artificial Neural Network or Neural Network
✔ The term "Artificial Neural Network" is derived from biological neural networks,
which form the structure of the human brain.
✔ Similar to the human brain that has neurons
interconnected to one another, artificial neural
networks also have neurons that are
interconnected to one another in various layers
of the networks.
✔ These neurons are known as nodes.
✔ A neural network is essentially a system of organizing machine learning algorithms
to perform certain tasks.
✔ The key advantage of neural networks is that they are able to extract data features
automatically, without needing input from the programmer.
✔ It is a fast and efficient way to solve problems for which the dataset is very large, such as in
images.
⮚ Working of Artificial Neural Network (ANN)
✔ A Neural Network is divided into multiple layers and each layer is further divided into
several blocks called nodes.
✔ Each node has its own task to accomplish which is then passed to the next layer.
✔ The first layer of a Neural Network is known as the input layer. The job of an input layer is to
acquire data and feed it to the Neural Network. No processing occurs at the input layer.
✔ Next to it, are the hidden layers. Hidden layers are the layers in which the whole processing
occurs. Their name essentially means that these layers are hidden and are not visible to the
user. Each node of these hidden layers has its own machine learning algorithm which it
executes on the data received from the input layer.
✔ The processed output is then fed to the subsequent hidden layer of the network. There can
be multiple hidden layers in a neural network system and their number depends upon the
complexity of the function for which the network has been configured.
✔ Also, the number of nodes in each layer can vary accordingly. The last hidden layer passes
the final processed data to the output layer which then gives it to the user as the final
output.
✔ Similar to the input layer, the output layer does not process the data which it acquires;
it is meant for the user interface.
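The layered processing described above can be sketched as a tiny forward pass in plain Python. The weights below are hand-picked, hypothetical values; a trained network would learn them from data:

```python
import math

def sigmoid(x):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all inputs, adds its
    bias, and applies the activation function."""
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical, hand-picked weights for a 2-input, 2-hidden-node,
# 1-output network (training would normally learn these values).
hidden_w = [[0.5, -0.6], [0.3, 0.8]]   # one weight list per hidden node
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.05]

x = [0.9, 0.4]                          # the input layer just passes data in
hidden = layer(x, hidden_w, hidden_b)   # hidden layer: all processing here
output = layer(hidden, output_w, output_b)
print(round(output[0], 3))
```

Note how the input list is passed through untouched and the output layer only reports the final value, matching the description of the input and output layers above.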