Handover

I. Image Search Service.
1. Description.
The image search service finds products with similar images. Each shop has its
own database. When a user sends an image, the system searches the shop’s
database for products with similar images and returns the results. The request
can also include a text description of the target object, which helps when the
image contains several objects. The data is stored in two places: MongoDB and
the Qdrant search engine. It includes product information and attributes, and
the feature vector of each image.
2. API documentation.
2.1. API search.
Function: Similar product retrieval
Endpoint: /process
Method: POST
Body:
{
"index": "string",
"data": {
"input_type": "image_url",
"image_url": "link to image",
"storage_id": "shop’s id saved in database",
"user_id": "facebook user id",
"text": "user message. Ex: áo này đẹp quá (this shirt is so pretty)"
}
}
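A minimal sketch of building this request in Python with the standard library. The base URL is a placeholder (this document does not give the host or port), the JSON content type is an assumption, and `build_search_request` is a hypothetical helper. Only the endpoint and the body fields come from the description above.

```python
import json
import urllib.request

# Assumption: replace with the real address of the image search service.
BASE_URL = "http://localhost:8000"


def build_search_request(index, image_url, storage_id, user_id, text=""):
    """Build (but do not send) a POST request for the /process endpoint."""
    payload = {
        "index": index,
        "data": {
            "input_type": "image_url",
            "image_url": image_url,
            "storage_id": storage_id,
            "user_id": user_id,
            "text": text,  # optional description of the object to search for
        },
    }
    return urllib.request.Request(
        BASE_URL + "/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


req = build_search_request(
    "string", "https://example.com/shirt.jpg", "shop_1", "user_1"
)
# To actually send it: urllib.request.urlopen(req)
```

The request is only constructed here, so the sketch can be run without access to the service.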
2.2. API refresh index.
Function: pull all data from MongoDB, insert it into Qdrant, and then delete
the older data from Qdrant.
HOÀNG NGỌC TIẾN 1

Endpoint: /refresh_search_index
Method: POST
Body:
{
"index": "string",
"data": ["list of storage_id"]
}
To refresh all shops:
{
"index": "string",
"data": ["all"]
}
2.3. API update data.
Function: update or insert products in the database.
Endpoint: /update_shop_data_image
Method: POST
Body:
{
"index": "string",
"data": [
{
"storage_id": "shop’s id",
"id": "id of the product variant",
"parent_class": "product category received from the chatbot. Ex: áo (shirt), quần (pants)",
"exception_data": false, # true: update icon/sticker data; false: update normal data
"image_urls": [
"list of image URLs"
],
"class": "VariantProduct"
}
]
}
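A short sketch of assembling this body in Python. The helper names `build_update_item` and `build_update_payload` are hypothetical, and the check for a non-empty URL list is an assumption rather than a documented rule of the service. Serializing with `json.dumps` produces the lowercase `false`/`true` that JSON requires.

```python
import json


def build_update_item(storage_id, variant_id, parent_class, image_urls,
                      exception_data=False):
    """Return one entry of the "data" list for /update_shop_data_image."""
    if not image_urls:
        # Assumption: an update without any image URL is not meaningful.
        raise ValueError("image_urls must contain at least one URL")
    return {
        "storage_id": storage_id,
        "id": variant_id,
        "parent_class": parent_class,
        "exception_data": bool(exception_data),  # True for icon/sticker data
        "image_urls": list(image_urls),
        "class": "VariantProduct",
    }


def build_update_payload(index, items):
    """Wrap the items in the top-level body expected by the endpoint."""
    return {"index": index, "data": list(items)}


item = build_update_item("shop_1", "variant_42", "áo",
                         ["https://example.com/a.jpg"])
payload = build_update_payload("string", [item])
body = json.dumps(payload, ensure_ascii=False)
```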
2.4. API delete data.
Function: delete one or more products from the database.
Endpoint: /delete_shop_data_image
Method: POST
Body:
{
"index": "string",
"data": [
{
"id": "id of the product variant",
"image_urls": ["list of URLs you want to delete"]
}
]
}
2.5. API delete shop.
Function: delete all of a shop’s data.
Endpoint: /delete/{storage_id}
Method: DELETE
2.6. API return object detection result.
Function: get the results of the object detection service and draw them on the
image.
Endpoint: /get_image_detail
Method: POST
Body:
{
"index": "string",
"data": {
"input_type": "image_url",
"show_img": true, # if true, return the image with the detections drawn on it
"image_urls": [
"link to image"
]
}
}

2.7. API count products.
Function: count the number of product variants of a shop.
Endpoint: /get_database_info/{storage_id}
Method: GET
Call /get_database_info/all to return information for all shops.
2.8. API synchronize product.
Function: synchronize products with the Shop Manager. This involves two
actions: sending the IDs of all products currently in the database to the Shop
Manager, and deleting products that do not exist in the Shop Manager’s database.
Endpoint: /sync_product
Method: GET
3. Models.
3.1. Embedding model.
Idea:
This is a self-supervised method for training the image embedding model, meaning
the data needs little or no manual labeling.
Each sample consists of three images: two original images from the dataset and
a third mixed from the two originals. The mix ratio is used as the label, that
is, the similarity between the mixed image and each of the two originals. For
example, if you mix 30% of image 1 with 70% of image 2, the labels are
Sim(mix, image_1) = 0.3 and Sim(mix, image_2) = 0.7.
Objective function: push the similarity of each pair of images toward its label.
The label here is a soft label (a real number).
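The objective can be illustrated with a small sketch. It assumes cosine similarity as Sim and a squared error between each similarity and its soft label. Both choices are assumptions for illustration, and the actual loss is defined in the training code referenced below. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_label_loss(e_mix, e1, e2, ratio):
    """Squared error between pair similarities and their soft labels.

    ratio is the share of image 1 in the mixed image, so the labels are
    Sim(mix, image_1) = ratio and Sim(mix, image_2) = 1 - ratio.
    """
    err_1 = cosine(e_mix, e1) - ratio
    err_2 = cosine(e_mix, e2) - (1.0 - ratio)
    return err_1 ** 2 + err_2 ** 2


# Toy embeddings: image 1 and image 2 are orthogonal unit vectors.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
e_mix = [0.6, 0.8]
loss = soft_label_loss(e_mix, e1, e2, ratio=0.3)
```

The loss is zero only when both similarities match their labels exactly, and it grows as the embedding of the mixed image drifts away from the mix ratio.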
Paper: https://arxiv.org/abs/2207.08409

Training code and dataset:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\server23\fashion-dataset\TokenMix
3.2. Object detection.
Idea:
We use the YOLOv8 model. However, YOLOv8 pre-trained on the COCO dataset has
only 80 classes and little coverage of fashion classes (dresses, watches…), so
we had to retrain it.
For labeling we use another model from Microsoft called Scene-graph. It can
detect more than 200 labels with high accuracy, but its inference is slow. We
run Scene-graph over the COCO dataset to label the data, keep only the labels
that have enough data, and use the result to train the YOLOv8 model.
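The label-filtering step can be sketched as follows. The annotation layout (image id, label, bounding box) and the threshold value are assumptions for illustration. The real format and cut-off are in the training code and datasets listed below.

```python
from collections import Counter


def filter_rare_labels(annotations, min_count):
    """Keep only annotations whose label occurs at least min_count times.

    annotations: list of (image_id, label, bbox) tuples, an assumed layout.
    Returns the filtered annotations and the sorted list of kept labels.
    """
    counts = Counter(label for _, label, _ in annotations)
    kept = {label for label, n in counts.items() if n >= min_count}
    filtered = [a for a in annotations if a[1] in kept]
    return filtered, sorted(kept)


# Toy data: "dress" appears three times, "scarf" only once.
annotations = [
    ("img_1", "dress", (10, 10, 50, 90)),
    ("img_2", "dress", (5, 20, 40, 80)),
    ("img_2", "scarf", (12, 5, 30, 25)),
    ("img_3", "dress", (0, 0, 60, 100)),
]
filtered, classes = filter_rare_labels(annotations, min_count=2)
```

The sorted class list gives a stable class order, which is convenient when writing the class names into a detector’s dataset configuration.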
Dataset:
- Original COCO dataset labeled by model Scene-graph:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets_origin
- Processed dataset:
Detection\datasets
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v3
Training code:
Detection
3.3. Classification.
Idea:
We want a classification model that is accurate enough and, above all,
lightweight, so that inference time keeps up with the required RPS (requests
per second). The classification model takes the output of the embedding model
as its input. It is a simple network of linear layers, dropout, LayerNorm, and
ReLU (Rectified Linear Unit).
Training code:
The file is in the image search service in the ChatBot repository:
/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py
Data example:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\local\shirt
4. Resources.
4.1. Azure repository.
Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot
Branch: dev
Folder: AlgorithmServices/Image/ProductImageSearch
4.2. Triton server.
Model repository:
\\smb-ai.tmt.local\Public-AI\Public\Model\ProductImageSearch\Version_4
To test with Triton server, you can run the Docker image built on server 24:
Image name: my_triton_server
Image ID: 56a5c17e4805

Command:
sudo docker run --rm --gpus=all --shm-size=512m \
-v /TMTAI/KBQA/nguyenpq/Deployment/Image/ProductImageSearch/model/checkpoints/model_repository:/models \
-p 25002:8001 --name triton_tienhn my_triton_server
4.3. ONNX.
Code to convert the embedding and classification models to ONNX:
/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py
5. Future work:
We want to implement image retrieval based on a combined image-text query, for
example:
Paper: https://arxiv.org/pdf/2006.11149.pdf
Github: https://github.com/ecom-research/ComposeAE
Keywords: image-text query for image retrieval, Linguistic-Visual for image
retrieval
II. Object Detection Service.
1. Description.
This service detects all objects in an image, or a single object when a text
description is passed as a parameter. Currently two services call object
detection: the image search service, and the Shop Manager, which uses it to
crop images.
2. Models.
2.1. YOLOv8.
Idea:
We use the YOLOv8 model. However, YOLOv8 pre-trained on the COCO dataset has
only 80 classes and little coverage of fashion classes (dresses, watches…), so
we had to retrain it.
For labeling we use another model from Microsoft called Scene-graph. It can
detect more than 200 labels with high accuracy, but its inference is slow. We
run Scene-graph over the COCO dataset to label the data, keep only the labels
that have enough data, and use the result to train the YOLOv8 model.
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4
Training code:
Detection
2.2. Sentence transformer
Idea:
When a text describing the object of interest is passed, it is fed into the
sentence transformer model to produce an embedding vector. That vector is
compared with the class names of all objects returned by object detection, and
the object whose class name is most similar to the description is selected.
Note: We also experimented with visual grounding models, but inference was too
slow (OFA, CLIP-large) or the accuracy was not reliable (OFA-base).
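The class-name matching described in the Idea paragraph can be sketched as below. The `toy_embed` function is a hypothetical letter-count stand-in so that the example runs without a model. In the service the embedding would come from the sentence transformer checkpoint, and the detection layout (a dict with a `class_name` key) is also an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Hypothetical stand-in for the sentence transformer: letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def pick_object(text, detections, embed):
    """Return the detection whose class name is most similar to the text."""
    if not detections:
        return None
    query = embed(text)
    return max(detections,
               key=lambda d: cosine(query, embed(d["class_name"])))


detections = [
    {"class_name": "watch", "bbox": (5, 5, 20, 20)},
    {"class_name": "shirt", "bbox": (30, 10, 90, 120)},
]
best = pick_object("red shirt", detections, toy_embed)
```

Passing the embedding function as a parameter keeps the selection logic independent of the model, so the same code works once `toy_embed` is replaced by a real encoder.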
Model:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4\checkpoint\sbert

3. Resources.
3.1. Data.
- Original COCO dataset labeled by model Scene-graph:
Detection\datasets_origin
- Processed dataset:
Detection\datasets
ChatBot
Branch: dev
Folder: AlgorithmServices/Image/ObjectDetectionGeneral
III. Visual Grounding.
1. Description.
Visual grounding is a task in computer vision and NLP that aims to locate the
most relevant object or region in an image based on a natural language query.
The query can be a phrase, a sentence, or even a multi-round dialogue.
We experimented with the OFA model.
2. Resources.
Code:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\data_sv24\API\VisualGroundingApi
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\data_sv24\API\VisualGroundingApi\model\checkpoints\visual_grounding\refcoco_base_best.pt
Document:
https://docs.google.com/document/d/1yv2SlMTHEEht-zc-
U1tPuOEnW2HRPDKO-zA Pq2yc3xw/edit

IV. Bill Generation Service.
1. Description.
This service generates size-table images or order images in a predefined
format.
2. Resources.
ChatBot
Branch: dev
Folder: AlgorithmServices/Image/BillImageGeneration

V. Bill Classification.
1. Description.
Whenever the chatbot receives an image, it calls the image search service API,
so a model is needed to distinguish product images from bank transaction
images.
The model must be lightweight, so I chose an SVM with HoG (Histogram of
Oriented Gradients) features extracted from the image as input.
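To make the feature concrete, here is a simplified sketch of the core of HoG: a gradient-orientation histogram over a single cell, without the cell grid and block normalization of a full HoG descriptor. It is an illustration only. The function name, the 9-bin setting, and the input layout are assumptions, and the actual feature extraction and SVM training are in the notebook listed below.

```python
import math


def orientation_histogram(gray, bins=9):
    """Unsigned gradient-orientation histogram for one cell.

    gray: 2D list of pixel intensities (rows of equal length).
    Each interior pixel votes its gradient magnitude into the bin of its
    gradient angle, taken in [0, 180). The histogram is L2-normalized.
    """
    height, width = len(gray), len(gray[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal central difference
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# A vertical edge: intensity changes only along the horizontal axis.
cell = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
features = orientation_histogram(cell)
```

A full descriptor would concatenate such histograms over a grid of cells, and the resulting vector would be the input to the classifier.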
Note: It has not been integrated into the service yet.
2. Resources.
Training code:
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\main.ipynb
Weights:
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\model.pkl
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\pca.pkl
