
Table of Contents

I. Definition
1. ML Life cycle and resources in Azure ML
2. ML assets in Azure ML
II. Machine Learning Studio
III. Machine Learning Service
1. Prepare Data
2. Experiments
3. Deployment
IV. Data Workflow using Azure services (Data Pipeline)
1. Connectivity
2. Data Ingestion and processing
A. Batch mode (or static dataset or cold path)
B. Streaming Mode
3. Data store
4. Post processing (Analytics) & Exposition using ADX service
5. Machine Learning model training
6. Evaluation
7. Model Deployment in container instance (ACI)
8. Production deployment using AKS
9. Exposure of the model via API as endpoint to third apps
10. Machine learning model Inferencing
V. MLops (for Python)
1. Definition
2. Architecture
VI. Data quality

I. Definition:

1. ML Life cycle and resources in Azure ML

Azure Machine Learning includes several resources and assets to enable you to perform your
machine learning tasks. These resources and assets are needed to run any job.

 Resources: setup or infrastructure resources needed to run a machine learning workflow. Resources include:
1) Workspace: it is equivalent to a container: it provides a centralized place to work with all the artifacts you create when you use Azure ML. The workspace stores references to resources like datastores and compute. It also holds all assets like models, environments, components and data assets.
2) Compute: A compute is a designated compute resource where you run your job
or host your endpoint => it’s a runtime environment.
Azure Machine Learning supports the following types of compute:
 Compute cluster: a set of scalable compute resources (a cluster of VMs) that lets you easily create CPU or GPU compute nodes in the cloud => ability to perform parallel computing and experiments across multiple nodes.
 Compute instance: it’s a fully configured and managed development
environment in the cloud. You can use the instance as a training or inference
compute for development and testing. It's similar to a virtual machine on the
cloud
 Inference cluster: used to deploy trained machine learning models as a web service, called an Azure ML endpoint:
o Azure Container Instance ACI: for testing and development (staging
& QA)
o Azure Kubernetes Service AKS: for production level
 Attached compute - You can attach your own compute resources to your workspace and use them for training and inference. This lets you bring your own compute, such as an HDInsight cluster, a virtual machine or a Databricks cluster, to use as a compute target with Azure ML. (A minimal SDK sketch for the workspace and compute resources follows the datastore list below.)

3) Datastore
 Azure Blob Storage
 Azure Data Lake
 Azure SQL Database
 Databricks File System
 Azure Blob Container
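The sketch below is an illustrative example, not taken from this document, of how these resources could be handled with the Azure ML Python SDK v2: it gets a handle to an existing workspace and provisions a small compute cluster. The subscription, resource group, workspace and cluster names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute

# Handle to an existing workspace (the names below are placeholders).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Provision a small CPU compute cluster that scales down to zero nodes when idle.
cpu_cluster = AmlCompute(
    name="cpu-cluster",
    size="STANDARD_DS3_V2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,
)
ml_client.begin_create_or_update(cpu_cluster).result()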

2. ML assets in Azure ML
 Assets: created using Azure Machine Learning commands or as part of a training/scoring
run. Assets are versioned and can be registered in the Azure ML workspace. They include:
1) Model: binary file(s) that represent a machine learning model and any corresponding metadata. Models can be created from a local or remote file or directory.
2) Environment
3) Data
4) Component:
A component is analogous to a function - it has a name, parameters, expects
input, and returns output. Components can do tasks such as data processing,
model training, model scoring, and so on.

Azure offers two Machine Learning solutions with different capabilities and advantages:

 Machine learning Studio


 Machine learning Service

II. Machine Learning Studio:


Machine Learning Studio is for building ML solutions using a collaborative, drag-and-drop interface and pre-built models: ideal for those who are new to ML.
Experiments:
An Azure ML pipeline contains the jobs that build, train and test a model. Once this pipeline is run, it becomes an experiment.
ML Studio allows you to build experiments that include these basic tasks:
1. Create a model:
 Get the data
 Prepare the data
 Define features
2. Train the model:
 Choose and apply an algorithm
 Optimize the model parameters
3. Evaluate and test the model:
 Predict output values for a testing dataset
 Calculate Performance metrics
 Evaluate Model


III. Machine Learning Service:


A more open platform for creating ML solutions using Python and other open-source tooling, ideal for those who are experienced in building ML solutions and want to take advantage of public-cloud scalability.

Azure machine learning service workflow is a three-step process that includes:

1. Prepare Data
 Datastore: They are used to store connection information to Azure storage services
which can be referred to by name and are attached to the workspace. Some
examples of supported Azure storage services that can be registered as datastores
are:
o Azure Blob Storage
o Azure Data Lake
o Azure SQL Database
o Databricks File System
o Azure Blob Container

 Datasets: A dataset is a reference to data in the datastore or behind public web URLs, and it also keeps a copy of the data's metadata. Azure ML supports two types of datasets: the File dataset and the Tabular dataset. (A minimal registration sketch follows.)
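As an illustration (not from the original document), a file behind a storage URL could be registered as a versioned data asset with the SDK v2; the path, name and version are assumptions, and ml_client is the workspace handle from the earlier sketch.

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a CSV file as a versioned data asset (URI and names are placeholders).
sensor_data = Data(
    name="sensor-data",
    version="1",
    type=AssetTypes.URI_FILE,
    path="https://<account>.blob.core.windows.net/raw/sensors.csv",
    description="Raw sensor readings used for training",
)
ml_client.data.create_or_update(sensor_data)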

2. Experiments
Build, Train & Evaluate the model
 Model: a model is a piece of code that takes input and produces output for the given inputs. Developing a machine learning model requires selecting an algorithm, providing data and tuning hyperparameters. Once the algorithm has been run on a dataset, it becomes a model.
 Training is an iterative process that produces a trained model reflecting what it learned from the training data.
 Evaluation
Model evaluation is the process of using metrics to analyze the performance of the model. There are many metrics and related concepts (a small computation sketch follows this list), such as:
 Accuracy: a metric that measures the percentage of correct predictions
made by the model
 Precision: a metric that measures the percentage of true positives out of all
predicted positives
 Recall (or sensitivity): a metric that measures the percentage of true
positives out of all actual positives
 F1 score: a metric that combines precision and recall into a single score
 Area Under the Curve (AUC): a metric that measures the performance of a
binary classification model by calculating the area under the Receiver
Operating Characteristic (ROC) curve
 Overfitting: a common problem in model evaluation where the model is too
complex and performs well on the training data but poorly on new, unseen
data
 Underfitting: a common problem in model evaluation where the model is too
simple and performs poorly on both the training data and new, unseen data
 Cross-validation: a technique used to assess the performance of a model by
splitting the data into multiple subsets and training and evaluating the model
on each subset
 Regularization: a technique used to prevent overfitting by adding a penalty
term to the loss function that discourages the model from being too
complex.
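For illustration only, these metrics can be computed with scikit-learn once a model has produced predictions on held-out data; y_true, y_pred and y_score below are dummy values standing in for the outputs of such a model.

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# y_true: ground-truth labels, y_pred: predicted labels, y_score: predicted probabilities.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.9, 0.4, 0.2, 0.8, 0.3, 0.7, 0.6]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))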


Evaluation is based on the validation and testing process of the models. Hence, evaluation uses a small part of the training dataset, which is divided into two types (a split sketch follows):
 Validation dataset: used to select a model. This dataset is used to test multiple models and find the optimal parameter values for the model that is then selected as the best one.
 Testing dataset: used for the final evaluation of the selected model. The testing dataset provides an unbiased evaluation of the performance of this model and ensures that it can generalize well to new, unseen data.
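A minimal sketch of such a split (illustrative, using a sample scikit-learn dataset): 70% of the data is kept for training and the remaining 30% is divided equally into validation and test sets.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 70% training, 15% validation (model selection), 15% final testing.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)
# X_val/y_val: model selection and hyperparameter tuning.
# X_test/y_test: final, unbiased evaluation of the selected model.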

3. Deployment
Once the model is trained and tested, it is stored in the container image registry and then deployed as a web service or to IoT modules.
 Image: An image contains a model, application, training script and the dependencies required by the model or script. The images are stored in the image registry. There are two types of images:
o Docker image: used to deploy to compute targets such as Azure Kubernetes Service (AKS) or Azure Container Instances (ACI).
o FPGA image: used when deploying to a field-programmable gate array in Azure ML.
 Deployment:
There are several ways to deploy your trained model, for example:
 as a Web service endpoint: the registered model is deployed as a service endpoint.
o Deploy as a real-time endpoint
o Deploy as a batch endpoint

An endpoint can run on an Azure Kubernetes Service cluster or an Azure Container Instance. (A minimal real-time deployment sketch is shown below.)
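As a hedged illustration of this step (not prescribed by the document), a model can be exposed as a real-time endpoint with the Azure ML Python SDK v2; the endpoint name, model path, curated environment name and scoring script are assumptions, and ml_client is the workspace handle from the earlier sketch.

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint, ManagedOnlineDeployment, Model, CodeConfiguration
)

# Create the endpoint that will receive scoring requests.
endpoint = ManagedOnlineEndpoint(name="quality-model-ep", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a model version behind the endpoint.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="quality-model-ep",
    model=Model(path="./model"),  # local model folder (placeholder)
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # assumed curated env
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()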
IV. Data Workflow using Azure services (Data Pipeline):

1. Connectivity:
i. IoT Edge:
OPC Publisher is a module that runs on Azure IoT Edge. It connects to OPC UA server
systems and publishes telemetry data to Azure IoT Hub in various formats.

2. Data Ingestion and processing:

A. Batch mode (or static dataset or cold path):

Batch processing refers to the processing of blocks of data that have already been
stored over a period of time. This process involves using specific connectors for each
data source and target destination.

i. Data ingestion using Azure Data Factory:

Azure Data Factory is an integration service that provides connectors used to extract data from various static data sources, including databases, file systems, and cloud services. These connectors are created by Microsoft or third-party vendors and are designed to work effectively with multiple data sources. For example, you can use SAP connectors for various SAP data ingestion scenarios;

NB: usually, the platform receiving (ingesting) the data is responsible for providing the connector used to extract it: for example, the Foundry platform provides a connector to extract data from an Azure cloud source.

ii. Data preparation using Azure Data Factory:

ADF is an ETL service that provides capabilities to prepare and transform data (a minimal pipeline sketch follows this list):

 Application of pre-processing methods to data: making the data usable by the system by removing errors, duplicates, etc.
 Staging area: The staging area serves as a temporary storage location
between the source and the destination. The main purpose of this staging
area is to maintain data in a uniform and structured format while it
undergoes transformations or quality checks before it's loaded into its
destination.
 Presenting the prepared data to a learning algorithm
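As a rough, hedged sketch (the exact resources depend on the factory setup), a copy activity moving data from a source dataset to a staging dataset could be created programmatically with the azure-mgmt-datafactory SDK; the factory, dataset and pipeline names are placeholders, and both datasets are assumed to already exist in the factory.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Copy raw blobs from a source dataset into a staging dataset.
copy_step = CopyActivity(
    name="CopyRawToStaging",
    inputs=[DatasetReference(reference_name="ds_raw")],
    outputs=[DatasetReference(reference_name="ds_staging")],
    source=BlobSource(),
    sink=BlobSink(),
)
pipeline = PipelineResource(activities=[copy_step])
adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory-name>", "IngestRawData", pipeline
)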

B. Streaming Mode
Data streaming is a mechanism that was designed to allow you to transfer, process
and analyze continuous streams of data in real time. It differs from traditional
databases where data is stored before being processed.
i. Data ingestion process:

 Step 1: Data Capture using IoT Hub


Data is generated in real time from different sources, such as IoT sensors,
online applications, social networks, servers, etc.
IoT Hub can have the same functionality as Event Hubs but also can handle a
bi-directional communication. IoT Hub allows for sending data back to the IoT

devices, which is not possible with Event Hubs, e.g. sending software updates to the sensors.
IoT Hub supports the MQTT protocol and can act as an MQTT subscriber that relays the topics published from the edge device to cloud services.

 Step 2: Data Ingestion


Raw data is collected using ingestion tools like Event Hub, Apache Kafka,
RabbitMQ, or APIs. These tools ensure that data is reliably routed to the
streaming platform.
Event Hubs is a highly scalable telemetry service offering one-way
communication with the HTTP/AMQP protocol. You can send events from
anywhere: a website, an app, an IoT device, a software, etc. Azure Event
Hubs is distinct from Azure IoT Hub as communication is one way and not
two ways.
Event Hubs is based on a producer/consumer architecture (a minimal producer sketch follows the note below):
 Event producers: Anything that sends data to an event hub can count
as an event producer. The events can be published using the HTTPS
or AMQP 1.0 or Apache Kafka (1.0 and above) protocols. HTTP is
usually used in scenarios with low volume of published events. On
the other hand, AMQP can deal with higher volumes of events
providing better performance and throughput.
 Consumer: a consumer is any application that reads event data from an Event Hub. Only the AMQP protocol is used for reading event data, because events are pushed to the consumer from the Event Hub over the AMQP channel; the client does not need to poll for data availability.

NB: Event Grid supports MQTT protocol and acts as a broker
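A minimal sketch (illustrative only; the connection string and hub name are placeholders) of publishing telemetry to an event hub with the azure-eventhub Python SDK:

from azure.eventhub import EventHubProducerClient, EventData

# Connection string and hub name are placeholders supplied out of band.
producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-connection-string>",
    eventhub_name="telemetry",
)

batch = producer.create_batch()
batch.add(EventData('{"sensor_id": "press-01", "temperature": 71.3}'))
producer.send_batch(batch)  # events are published over AMQP
producer.close()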

ii. Data preparation: Real-time processing:


Once ingested, the data is immediately available for processing. Streaming
engines, such as Azure Stream Analytics (ASA), Apache Flink, Apache Spark
Streaming, or Kafka Streams, can be used to process this data in real time. The
data can then be filtered, transformed, aggregated, or enriched during transit.

3. Data store:
 ADLS Gen 2: The data storage proposed for all types of raw, processed, and transformed data is Azure Data Lake Storage Gen2. (A small upload sketch is shown at the end of this section.)
 Storing in Batch mode
Azure Synapse Analytics is an analytics service that combines enterprise data warehousing and Big Data analytics. Synapse is also a processing engine for massive data with query capability that allows batch-processed data to be stored. Once processed data is available and stored in Azure Synapse, various analytics clients can consume it for business applications.
 Storing in stream mode

Most often, data is stored temporarily to be accessible for a short period of time. This allows re-examination or additional analysis to be performed on the data if necessary. We may perform further actions such as:
 Step-1: Broadcast or real-time action
Processing results can be streamed in real-time for downstream applications,
such as real-time dashboards, alerts, automated actions, etc.
 Step-2: Archiving or long-term storage
After real-time exploitation, the data can be archived in long-term storage
systems, such as databases or data warehouses, for later analysis.
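For illustration, raw or processed files could be written to the ADLS Gen2 store with the azure-storage-file-datalake SDK; the account URL, file system and paths below are assumptions.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Write a processed batch file into the "processed" file system (container).
file_system = service.get_file_system_client("processed")
file_client = file_system.get_file_client("telemetry/2024/01/batch-001.parquet")
with open("batch-001.parquet", "rb") as data:
    file_client.upload_data(data, overwrite=True)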

4. Post processing (Analytics) & Exposition using ADX service

 Azure Data Explorer (ADX) is a fast and highly scalable data exploration and
analytics service that is designed to ingest, analyze, and visualize large volumes
of structured and unstructured data in real-time.
 It is offered as a Platform as a Service (PaaS) as part of the Microsoft Azure platform
 ADX validates the initial data and converts its formats if necessary.
 Data manipulation includes schema matching, organization, indexing, coding,
and data compression
 Azure Data Explorer offers queuing (batching) and streaming ingestion
 Ingestion properties: properties that affect how data is inserted, such as tagging, mapping, and creation time.
 Data Ingestion methods are pipelines and connectors to common services like:
o Azure Event Grid
o Azure Event Hub
o programmatic ingestion using SDKs (software development kits)
 Data visualization can be achieved using the native dashboard offering, or with tools like Power BI or Grafana. (A minimal query sketch follows.)
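As a hedged illustration, data already ingested into ADX can be queried from Python with the azure-kusto-data client; the cluster URI, database and table names are placeholders.

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<cluster>.<region>.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

# Count telemetry rows received during the last hour (KQL query as a string).
query = "Telemetry | where ingestion_time() > ago(1h) | count"
response = client.execute("<database>", query)
for row in response.primary_results[0]:
    print(row)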

5. Machine Learning model training:


It is an iteration to discover the best model (i.e., the model that produces the most accurate results and the fewest false positives). The most widely used techniques are:

o Holdout Validation:
The holdout validation approach refers to creating a training set and the holdout set
(also referred to as the test or validation set). The training data is used to train the
model, while the unseen test data is used to validate the model performance. The
common split ratio is 70:30, which means 70% of the data is used for building the
model, while the remaining 30% is used for testing the model performance.

o K-fold Cross-Validation:
The data is divided into k folds. The model is trained on k-1 folds with one fold held back for testing. For example, if k is set to ten, then the data will be divided into ten equal parts. After that, the model will be built on the first nine parts, while the evaluation will be done on the tenth part or fold.

This process gets repeated to ensure each fold of the dataset gets the chance to be
the held-back set. Once the process is complete, we summarize the evaluation
metrics such as:

 Mean
 standard deviation
 mean absolute error
 root mean square error
 coefficient of determination: often referred to as R², represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. (A minimal cross-validation sketch follows.)
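A minimal, illustrative k-fold run with scikit-learn (the dataset and model are chosen only for the example) that summarizes the per-fold R² scores:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 10-fold cross-validation: train on 9 folds, evaluate on the held-back fold.
scores = cross_val_score(Ridge(), X, y, cv=10, scoring="r2")
print("R² per fold:", np.round(scores, 3))
print("mean:", scores.mean(), "std:", scores.std())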

6. Evaluation
1) Calculate metrics (or performance data) using test & validation methods like holdout validation, cross-validation, or any other (see the evaluation techniques above)
2) Evaluate the models based on these metrics
3) Select a model having the optimal parameters
4) Test the selected model with other testing dataset to provide an unbiased
evaluation of the performance of the model selected.

7. Model Deployment in container instance (ACI)

a) Deploy in preproduction environment - Workflow


The Docker client will be the local development environment where we code the ML model, and the Docker host will be located on the shopfloor server.

b) Testing & staging: it includes operations such as:


 Retraining
 testing of the model candidate on pre-production data
 test deployments for endpoint performance
 data quality checks
 unit testing
 responsible AI checks for model and data bias

8. Production deployment using AKS:

Model deployment can be promoted to production by using a human-in-the-loop gated


approval.


Workflow:
1) The kubectl client will first translate your CLI command into one or more REST API call(s) and send them to the kube-apiserver.
2) After validating these REST API calls, the kube-apiserver understands the task and calls the kube-scheduler process to select one node from the available ones to execute the job. This is the scheduling procedure.
3) Once the kube-scheduler returns the target node, the kube-apiserver dispatches the task with all of the details describing it.
4) The kubelet process on the target node receives the task and talks to the container engine (e.g. the Docker engine) to spawn a container with all the provided parameters.
5) The job and its specification are recorded in the centralized database etcd, whose job is to preserve and provide access to all data in the cluster.
(A minimal sketch of the same request made through the Kubernetes Python client follows.)
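The same REST calls that kubectl issues can be made from code; the sketch below is only an illustration using the official kubernetes Python client, with a placeholder container image and deployment name.

from kubernetes import client, config

config.load_kube_config()  # reuse the same credentials kubectl uses
apps = client.AppsV1Api()

# Describe a Deployment: the kube-apiserver validates it, the scheduler picks nodes,
# and the kubelet on each target node starts the containers.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="scoring-api"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "scoring-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "scoring-api"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="scoring-api",
                                   image="<registry>.azurecr.io/scoring-api:1.0")
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)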

9. Exposure of the model via API as endpoint to third apps.


10. Machine learning model Inferencing:
is the process of running live data points into a machine learning algorithm (or “ML model”) to calculate an output such as a single numerical score. This process is also referred to as “operationalizing an ML model” or “putting an ML model into production”.

When an ML model is running in production, it is often described as artificial intelligence (AI) since it performs functions similar to human thinking and analysis.

a) Scoring:
Scoring is also called prediction and is the process of generating values based on a
trained machine learning model, given some new input data. The values, or scores,
that are created can represent predictions of future values, but they might also
represent a likely category or outcome. The scoring process can generate many different types of values (a minimal entry-script sketch follows this list):
 A list of recommended items and a similarity score
 Numeric values, for time series models and regression models
 A probability value, indicating the likelihood that a new input belongs to
some existing category.
 The name of a category or cluster to which a new item is most similar.
 A predicted class or outcome, for classification models
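Azure ML real-time endpoints typically call a user-provided entry script exposing an init() and a run() function; the sketch below is a generic example of such a script (the model file name and input format are assumptions), matching the score.py referenced in the earlier deployment sketch.

import json
import os

import joblib
import numpy as np


def init():
    # Called once when the container starts: load the registered model from disk.
    global model
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = joblib.load(os.path.join(model_dir, "model.pkl"))


def run(raw_data):
    # Called for every scoring request: parse the payload and return predictions.
    data = np.array(json.loads(raw_data)["data"])
    predictions = model.predict(data)
    return predictions.tolist()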

b) Analytical workload:

The output of the model is stored in analytics systems like Azure Blob storage, Azure
Synapse Analytics, Azure Data Lake, or Azure SQL Database, where the input data is
also collected and stored. This stage facilitates the availability of the prediction
results for customer consumption, model monitoring, and retraining of models with
new data to improve their accuracy.

c) Monitoring:
Monitoring in staging & test and in production aims to collect metrics and then act on them to improve the performance of the model, the data, and the infrastructure.

V. MLops (for Python):


1. Definition
MLOps = ML + DEV + OPS
MLOps: it applies DevOps principles and practices like continuous integration, delivery, and deployment to the machine learning process, with the aim of faster experimentation, development, and deployment of Azure machine learning models into production, together with quality assurance.


Dev: Plan (requirements backlog) -> Create code for the ML model (input parameters, targets, features, etc.) -> ML (Experiments): Prepare data -> Build -> Train -> Evaluate -> Dev: Test the selected model -> Create a package -> Ops: Deploy -> Release -> Configure -> Monitor -> back to Dev: Plan
2. Architecture:
This architecture uses the Azure Machine Learning SDK for Python to create a workspace (a space for an experiment), compute resources, and datastore resources. (A minimal training-job sketch submitted through this SDK is shown below.)
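As an illustrative sketch of the kind of step such an MLOps pipeline automates (the source folder, data asset, environment and compute names are assumptions), a training script can be submitted as a command job through the SDK:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command, Input

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Submit train.py (assumed to live in ./src) as a job on the shared compute cluster.
train_job = command(
    code="./src",
    command="python train.py --data ${{inputs.training_data}}",
    inputs={"training_data": Input(type="uri_file", path="azureml:sensor-data:1")},
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
    experiment_name="mlops-training",
)
returned_job = ml_client.jobs.create_or_update(train_job)
print(returned_job.studio_url)  # link to monitor the run in the studio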


VI. Data quality:


 Metadata:
 Data structure:
 Traceability: data should be traceable: source, transformation, logging
 Validity across its relationships: how data are connected to each other; this is achieved through an enterprise-level ontology: data is cleaned, stored and made ready for use by end users.

 Data quality dimensions (metrics), illustrated with a small check sketch after this list:


1- Completeness: Expected comprehensiveness:
Ensure nothing is missing from the data and all the required information is available
2- Conformity: data follows the standard data definitions, such as data type, size, range and format

3- Consistency:
Ensure that information stored in one place matches relevant data stored in another.
Ensure that received data values match the expected values.
4- Accuracy:
Data should correctly reflect the real world.
Ensure that data are correct and identify any mistakes due to misspelling or data sourcing.
5- Timeliness:
Ensure that up to date information is available
Ensure that the refresh rate is consistent
6- Uniqueness: only one instance:
Ensure only one instance and no data duplication
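A small, illustrative pandas sketch of how some of these dimensions could be measured on a dataset (the file name and the temperature and timestamp columns are assumptions):

import pandas as pd

df = pd.read_csv("measurements.csv", parse_dates=["timestamp"])

report = {
    # Completeness: share of non-missing values per column.
    "completeness": (1 - df.isna().mean()).round(3).to_dict(),
    # Uniqueness: share of rows that are not exact duplicates.
    "uniqueness": 1 - df.duplicated().mean(),
    # Conformity: share of readings inside the expected physical range.
    "temperature_in_range": df["temperature"].between(-40, 150).mean(),
    # Timeliness: age in days of the most recent record.
    "days_since_last_record": (pd.Timestamp.now() - df["timestamp"].max()).days,
}
print(report)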

