Artificial Intelligence and Data Analytics (AIDA) Guidebook

5 Machine Learning Methodology


This section provides an overview of how developers effectively build, evaluate, and manage
analytic and learning systems through a machine-learning pipeline. A machine-learning pipeline
is a way to codify and automate the workflow necessary to produce a machine-learning model.
Machine-learning pipelines consist of multiple sequential steps that cover everything from data
extraction and preprocessing to model training and deployment. Figure 6 provides a high-level
example of a machine-learning pipeline; each step is described in more detail below.

Figure 6: Machine Learning Pipeline

Pipeline steps:

1. Establish AI system goal – traditional goals of AI research include reasoning, knowledge
representation, planning, learning, natural language processing, perception, and the
ability to move and manipulate objects.
2. Establish requirements – consider desired performance, usability, integration, and
statistical behavior.
3. Identify AI solution design – identify the algorithms and programming language to be
used.
4. Identify use constraints – constraints enumerate the possible values a set of variables
may take in a given world.
5. Identify required data sets – the more complex your model becomes, the more data you
will need to determine its parameters.

6. Instantiate AI solution design in code – there are many programming languages to
choose from such as C++, Java, Python, or R.
7. Prepare training data – includes cleaning the data of missing values, formatting data for
consistency, making the units consistent, decomposing complex values, and aggregating
simple values in the data.
8. Perform iterative training, testing, and evaluation cycle – input the data into the model
in order to train it and improve model accuracy, setting a minimum acceptable accuracy
threshold. If the testing and evaluation reveal that the model is not ready for
deployment, return to step 3 and continue to refine.
9. Integrate – determine how machine learning will work within existing business
processes.
10. Deploy – turn the model on in a real-world environment.
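
As an illustration of step 7 (prepare training data), the following is a minimal sketch using only the Python standard library. The records, field names, and the choice to fill missing values with the mean are assumptions made for this example and are not prescribed by the guidebook. It shows cleaning a missing value, making units consistent, decomposing a complex value (a timestamp), and aggregating simple values.

```python
from statistics import mean

# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"timestamp": "2021-03-01 08:15", "distance": "12 km", "speed_kmh": 60.0},
    {"timestamp": "2021-03-01 17:40", "distance": "5 mi", "speed_kmh": None},
    {"timestamp": "2021-03-02 09:05", "distance": "20 km", "speed_kmh": 80.0},
]

KM_PER_MILE = 1.609344


def to_km(text):
    """Make units consistent: convert '<number> km' or '<number> mi' to kilometers."""
    value, unit = text.split()
    return float(value) * (KM_PER_MILE if unit == "mi" else 1.0)


def prepare(records):
    # Clean missing values: fill gaps in speed with the mean of the known speeds
    # (one possible strategy among many, assumed here for simplicity).
    known = [r["speed_kmh"] for r in records if r["speed_kmh"] is not None]
    fill = mean(known)
    prepared = []
    for r in records:
        date, time = r["timestamp"].split()  # decompose a complex value
        prepared.append({
            "date": date,
            "hour": int(time.split(":")[0]),
            "distance_km": round(to_km(r["distance"]), 2),
            "speed_kmh": r["speed_kmh"] if r["speed_kmh"] is not None else fill,
        })
    return prepared


def total_distance_by_date(rows):
    """Aggregate simple values: sum the distance recorded on each day."""
    totals = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0.0) + row["distance_km"]
    return totals


rows = prepare(raw_records)
daily = total_distance_by_date(rows)
```

In practice a data-frame library would usually replace these hand-written loops; the plain-Python form is used here only to keep each preparation task visible.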

While the steps are listed sequentially, the machine-learning pipeline is often an iterative
process, especially between the “identify AI solution design” and “perform iterative training,
testing, and evaluation cycle” steps. Developers often need to revise the training data, the
data preparation/augmentation, or the machine-learning model structure as a result of the
training, testing, and evaluation step. The developer may also uncover invalid assumptions that
require revisiting the initial design setup and iterating until the overall AI system
requirements are met.
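
The training, testing, and evaluation cycle with a minimum acceptable accuracy threshold can be sketched as a loop. The example below is a hedged illustration only: the synthetic data, the perceptron model, the 0.95 threshold, and the cycle budget are all assumptions chosen for brevity, not values from the guidebook.

```python
import random

MIN_ACCURACY = 0.95   # assumed minimum acceptable accuracy threshold
MAX_CYCLES = 50       # assumed budget before returning to the design step


def make_data(rng, n):
    """Synthetic two-class data: class 0 near (0, 0), class 1 near (3, 3)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        point = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
        data.append((point, label))
    return data


def predict(weights, point):
    w0, w1, bias = weights
    return 1 if w0 * point[0] + w1 * point[1] + bias > 0 else 0


def train_one_epoch(weights, data, lr=0.1):
    """One perceptron pass: adjust the weights after each misclassified example."""
    w0, w1, bias = weights
    for point, label in data:
        error = label - predict((w0, w1, bias), point)
        w0 += lr * error * point[0]
        w1 += lr * error * point[1]
        bias += lr * error
    return (w0, w1, bias)


def accuracy(weights, data):
    return sum(predict(weights, p) == y for p, y in data) / len(data)


rng = random.Random(0)
train_set, test_set = make_data(rng, 80), make_data(rng, 40)

weights, cycles, ready = (0.0, 0.0, 0.0), 0, False
while cycles < MAX_CYCLES and not ready:
    weights = train_one_epoch(weights, train_set)   # train
    cycles += 1
    test_accuracy = accuracy(weights, test_set)     # test on held-out data
    ready = test_accuracy >= MIN_ACCURACY           # evaluate against threshold

if ready:
    print(f"Ready after {cycles} cycle(s): test accuracy {test_accuracy:.2f}")
else:
    print("Threshold not met; revisit the solution design (step 3).")
```

Evaluating on data held out from training is what lets the threshold say something about readiness for deployment; when the budget runs out without meeting it, the loop exits and the developer returns to the design step rather than training indefinitely.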

A machine-learning pipeline integrates both statistical behavior (i.e., statistical analysis and
response requirements for the system) and use constraints (i.e., constraints on how the
system is to be deployed to support decision making: how autonomous it is, what timing
requirements apply, and what level of potential harm to users and/or subjects is acceptable) into the AI system
development process. Although powerful when implemented correctly, the machine-learning
pipeline does offer challenges to a developer. One such challenge is imposing DevOps practices
on a machine-learning pipeline. As previously defined, DevOps is the combination of software
and hardware systems aimed at continuously developing and deploying software to increase
operational efficiencies. In this process, DevOps could also be considered the code used to
create the model. The machine-learning model is the combination of the data and the code,
which is refined through continuous integration, continuous deployment, and continuous
training of the model.

AI Example: Language Models

A language model is an AI model that has been trained to predict the next word or words in a
text based on the preceding words. It is part of the technology that predicts the next word you
want to type on your mobile phone, allowing you to complete the message faster.

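
To make the language-model example concrete, here is a minimal sketch of next-word prediction using bigram counts. It is a deliberately simple stand-in and not how production keyboards or modern language models are built; the tiny corpus and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text standing in for a user's message history.
corpus = "see you soon . see you later . see you soon . talk to you soon"
model = train_bigram_model(corpus)

print(predict_next(model, "you"))   # most frequent word seen after "you"
```

Here the data (the corpus) and the code (the counting logic) together determine the model's behavior, so retraining on new text changes the predictions without any change to the code.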
While DevOps code may be relatively stable once developed, the challenge in machine learning is
keeping the code up to date with the data as both change in parallel. Model accuracy and
