The AI Project Cycle consists of five stages:
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Data Modelling
5. Evaluation
Problem Scoping
Problem scoping refers to the identification of a problem and the vision to solve it. Four questions help in scoping the problem:
1. Who? – Identifies who is facing the problem and who the stakeholders of the problem are
2. What? – Identifies what the problem is and how you know about it
3. Where? – Relates to the context, situation, or location of the problem
4. Why? – Identifies why the problem needs to be solved and what the benefits to the
stakeholders will be after solving it
When you start an AI project or model, you need to do problem scoping first.
It is the process of figuring out the problem and its possible solutions.
The AI project must have a problem statement with the required clarity.
when/while ___________ (Where)
[context, situation] – Describe the context, location, or situation.
Data Acquisition
1. Data: Data refers to the raw facts, figures, or statistics collected for reference or analysis.
2. Acquisition: Acquisition refers to acquiring data for the project.
The stage of acquiring data from relevant sources is known as data acquisition.
Data features refer to the type of data that we want to collect. Two terms are associated with
this:
1. Training Data: The portion of the acquired data that is fed into the system so the model
can learn from it is known as training data.
2. Testing Data: The portion of the data set aside to check the trained model's output is
known as testing data.
Some sources for acquiring datasets include:
● Lionbridge AI
● Amazon Mechanical Turk
● LabelBox
● Figure Eight
● Kaggle
● http://mospi.nic.in/data
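Once a dataset has been acquired, it is usually split into training and testing portions, as described above. A minimal sketch in Python (the records here are invented for illustration):

```python
import random

# A toy dataset of (feature, label) records; the values are made up.
records = [(i, i % 2) for i in range(100)]

# Shuffle with a fixed seed so the split is reproducible.
random.seed(42)
random.shuffle(records)

# Hold back 20% of the data for testing; train on the remaining 80%.
split = int(len(records) * 0.8)
training_data = records[:split]
testing_data = records[split:]

print(len(training_data), len(testing_data))  # 80 20
```

The testing portion is kept aside and only used at the Evaluation stage, so the model is never graded on data it has already seen.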
Data Exploration
Data Exploration refers to the techniques and tools used to visualize data and uncover patterns
through statistical methods.
So far you have learned about problem scoping and data acquisition: you have set a goal for
your AI project and found ways to acquire data. The main problem with freshly acquired data is
that it is complex, because it is mostly raw numbers. To make use of these numbers, you need to
find patterns that make the data understandable.
For example, suppose you are going to read a book. You go to the library, pick up a book, and the
first thing you do is flip through the pages to get an overview before deciding whether it is the
book of your choice. Similarly, when you are working with data or going to analyze it, you need
data visualization to get that first overview.
Here is a list of 20 data visualization tools. Many more tools are available, and their number is
increasing day by day.
1. Microsoft Excel
2. Tableau
3. Qlikview
4. FusionCharts
5. DataWrapper
6. MS Power BI
7. Google Data Studio
8. Sisense
9. Highcharts
10. Xplenty
11. HubSpot
12. Whatagraph
13. Adaptive Discovery
14. Teammate Analytics
15. Jupyter
16. Dundas BI
17. Infogram
18. Google Charts
19. Visme
20. Domo
Do some research and learn how to visualize your data with the above tools.
You can get access to multidimensional data by following this link: Visualize 200-dimensional data
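Before reaching for a full visualization tool, a quick statistical summary often reveals the shape of the numbers. A minimal sketch of data exploration using only Python's standard library (the readings are invented for illustration):

```python
import statistics

# A made-up sample of daily temperature readings.
readings = [21.5, 22.0, 23.1, 22.8, 24.6, 25.0, 23.9, 22.4, 21.8, 24.1]

# Basic summary statistics: a first look at the data's shape.
print("min:", min(readings))
print("max:", max(readings))
print("mean:", round(statistics.mean(readings), 2))
print("stdev:", round(statistics.stdev(readings), 2))

# A crude text histogram: each row is a one-degree bucket.
for lo in range(21, 26):
    count = sum(1 for r in readings if lo <= r < lo + 1)
    print(f"{lo}-{lo + 1}: " + "#" * count)
```

The same summary done graphically, with any of the tools listed above, makes such patterns visible at a glance.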
Modelling
Now you are entering the modelling stage. So let’s explore the terms for it:
Artificial Intelligence, or AI, refers to any technique that enables computers to mimic
human intelligence.
Machine Learning, or ML, enables machines to improve at tasks with experience. The
machine learns from its mistakes and takes them into consideration in the next execution.
Deep Learning, or DL, enables software to train itself to perform tasks using vast amounts
of data. In deep learning, the machine is trained with huge amounts of data, which helps it
learn the patterns in that data.
AI Modelling refers to developing algorithms, also called models, which can be trained to
produce intelligent output; that is, writing code to make a machine artificially intelligent.
Types of AI models
A Rule-Based model refers to setting up rules and training the model accordingly. It follows a
fixed, hand-written algorithm or code to train, test, and validate data.
A Learning-Based model refers to identifying data by its attributes and behaviour and training
the model accordingly. There is no fixed, hand-written rule for the data; the model learns from
the behaviour and attributes found in past data.
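The rule-based approach can be illustrated with a toy spam filter: the programmer writes the rule by hand, and the model never changes on its own. This is a sketch with an invented keyword list, not a real filter:

```python
# Rule-based model: the programmer writes the rule explicitly.
SPAM_WORDS = {"win", "free", "prize"}  # hand-picked keywords (made up)

def is_spam_rule_based(message: str) -> bool:
    # The rule is fixed: flag the message if any spam keyword appears.
    return any(word in message.lower().split() for word in SPAM_WORDS)

print(is_spam_rule_based("Win a free prize now"))    # True
print(is_spam_rule_based("Meeting at 10 tomorrow"))  # False
```

A learning-based version would instead count which words appear in past spam and non-spam messages and derive the rule from that data, rather than having it written in by hand.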
Decision Tree
A decision tree builds classification or regression models in the form of a tree structure.
It breaks a dataset down into smaller and smaller subsets while, at the same time, an
associated decision tree is incrementally developed.
The final result is a tree with decision nodes and leaf nodes.
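The structure described above can be mimicked with nested conditions: each `if` acts as a decision node and each returned value is a leaf node. A hand-written sketch (the weather features and thresholds are invented for illustration; a real decision tree would be learned from data):

```python
def play_outside(weather: str, temperature: int) -> str:
    # Decision node 1: split on the weather attribute.
    if weather == "rainy":
        return "no"      # leaf node
    # Decision node 2: split on temperature for non-rainy days.
    if temperature < 10:
        return "no"      # leaf node
    return "yes"         # leaf node

print(play_outside("sunny", 25))  # yes
print(play_outside("rainy", 25))  # no
```

Each path from the root to a leaf corresponds to one of the ever-smaller subsets the tree carves out of the dataset.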
Types of learning
There are three types of learning:
1. Supervised
2. Unsupervised
3. Reinforcement
Supervised Learning
In supervised learning, the model is trained on labelled data: each input comes with its correct
output, and the model learns to map inputs to outputs.
Unsupervised Learning
In unsupervised learning, the model is given unlabelled data and must find patterns or groupings
in it on its own.
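As a tiny illustration of supervised learning: the model is given labelled points and predicts the label of a new point from its nearest neighbour. This is a sketch with invented data, not a production classifier:

```python
# Supervised learning sketch: labelled training points (made-up data).
# Each point is (height_cm, label).
training = [(150, "child"), (155, "child"), (170, "adult"), (180, "adult")]

def predict(height: int) -> str:
    # 1-nearest-neighbour: copy the label of the closest training point.
    nearest = min(training, key=lambda point: abs(point[0] - height))
    return nearest[1]

print(predict(152))  # child
print(predict(175))  # adult
```

An unsupervised method, by contrast, would receive only the heights without labels and would have to discover the two groups itself.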
Evaluation
Once a model has been made and trained, it needs to go through proper testing so that one can calculate the
efficiency and performance of the model. Hence, the model is tested with the help of Testing Data (which was
separated out of the acquired dataset at the Data Acquisition stage), and its efficiency is calculated
on the basis of the following parameters:
1. Accuracy
2. Precision
3. Recall
4. F1 score
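Given a model's predictions against the known labels of the testing data, the four parameters above can be computed from the counts of true/false positives and negatives. A minimal sketch in Python (the label lists are invented for illustration):

```python
# Made-up labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives

accuracy  = (tp + tn) / len(actual)            # fraction of all predictions that were right
precision = tp / (tp + fp)                     # of the predicted positives, how many were real
recall    = tp / (tp + fn)                     # of the real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, round(f1, 2))  # 0.75 0.75 0.75 0.75
```

Precision and recall pull in opposite directions, which is why the F1 score, their harmonic mean, is often reported as a single summary number.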