
I. Image Search Service.

1. Describe.
The image search service finds similar products by image. Each shop has its own
database. When a user sends an image, the system searches the shop’s database for
products with similar images and returns the results. The request can also carry
a text description of the object you are looking for, for cases where the same
image contains multiple objects. The data is stored in two places: MongoDB and
the Qdrant search engine. It includes product information and attributes, and
the feature vector of each image.

2. API document.
2.1. API search.

Function: Similar product retrieval

Endpoint: /process

Method: POST

Body:
{
  "index": "string",
  "data": {
    "input_type": "image_url",
    "image_url": "link to image",
    "storage_id": "shop’s id saved in database",
    "user_id": "Facebook user id",
    "text": "user message. Ex: áo này đẹp quá (this shirt is so pretty)"
  }
}
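
As a minimal sketch, a client might assemble the /process body as follows. The field names come from the body above; the helper name build_search_payload and all example values are hypothetical, treating "text" as optional is an assumption, and sending the request is left to the caller.

```python
import json


def build_search_payload(index, image_url, storage_id, user_id, text=""):
    """Build the JSON body for POST /process (field names from the body above)."""
    return {
        "index": index,
        "data": {
            "input_type": "image_url",
            "image_url": image_url,
            "storage_id": storage_id,
            "user_id": user_id,
            # Description of the target object; assumed to be empty if unused.
            "text": text,
        },
    }


payload = build_search_payload(
    index="string",
    image_url="https://example.com/shirt.jpg",  # hypothetical image link
    storage_id="shop-123",                      # hypothetical shop id
    user_id="fb-user-456",                      # hypothetical user id
    text="áo này đẹp quá",
)
# ensure_ascii=False keeps the Vietnamese text readable in the serialized body.
body = json.dumps(payload, ensure_ascii=False)
print(body)
```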

2.2. API refresh index.

Function: pull all data from MongoDB, insert it into Qdrant, and then delete the
older data from Qdrant.

HOÀNG NGỌC TIẾN 1


Endpoint: /refresh_search_index

Method: POST

Body:
{
  "index": "string",
  "data": ["list of storage_id"]
}
To refresh all shops:
{
  "index": "string",
  "data": ["all"]
}
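
The two body variants above can be produced by one small helper. This is only a sketch: the helper name and the refresh_all flag are hypothetical, and requiring an explicit flag for the "all" case is a design choice of the sketch (so that an empty list never refreshes every shop by accident), not documented behavior of the service.

```python
def build_refresh_payload(index, storage_ids=(), refresh_all=False):
    """Build the JSON body for POST /refresh_search_index."""
    if refresh_all:
        # The special value "all" refreshes every shop.
        return {"index": index, "data": ["all"]}
    if not storage_ids:
        raise ValueError("pass at least one storage_id or set refresh_all=True")
    return {"index": index, "data": list(storage_ids)}


print(build_refresh_payload("string", ["shop-1", "shop-2"]))
print(build_refresh_payload("string", refresh_all=True))
```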

2.3. API update data.

Function: update or insert products in the database.

Endpoint: /update_shop_data_image

Method: POST

Body:
{
  "index": "string",
  "data": [
    {
      "storage_id": "shop’s id",
      "id": "id of variant product",
      "parent_class": "category of the product, received from the chatbot. Ex: áo (shirt), quần (pants)",
      "exception_data": false,  # true: update icon/sticker data; false: update normal data
      "image_urls": [
        "list of image URLs"
      ],
      "class": "VariantProduct"
    }
  ]
}

2.4. API delete data.

Function: delete one or more products from the database.

Endpoint: /delete_shop_data_image

Method: POST

Body:
{
  "index": "string",
  "data": [
    {
      "storage_id": "shop’s id",
      "id": "id of variant product",
      "image_urls": ["list of image URLs you want to delete"]
    }
  ]
}
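
A delete request can cover several products at once, each with its own list of image URLs. The sketch below shows one way to build such a body; the helper name, the urls_by_product argument, and the example ids and URLs are hypothetical.

```python
def build_delete_payload(index, storage_id, urls_by_product):
    """Build the JSON body for POST /delete_shop_data_image.

    urls_by_product maps a variant product id to the image URLs to delete.
    """
    return {
        "index": index,
        "data": [
            {"storage_id": storage_id, "id": product_id, "image_urls": list(urls)}
            for product_id, urls in urls_by_product.items()
        ],
    }


delete_payload = build_delete_payload(
    "string",
    "shop-123",  # hypothetical shop id
    {
        "variant-1": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
        "variant-2": ["https://example.com/c.jpg"],
    },
)
print(delete_payload)
```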

2.5. API delete shop.

Function: delete all of a shop’s data.

Endpoint: /delete/{storage_id}

Method: DELETE

2.6. API return object detection result.

Function: get the result of the object detection service and draw it on the image.

Endpoint: /get_image_detail

Method: POST

Body:
{
  "index": "string",
  "data": {
    "input_type": "image_url",
    "storage_id": "shop’s id",
    "show_img": true,  # if true, return the image with the detections drawn on it
    "image_urls": [
      "link of image"
    ]
  }
}



2.7. API count No. product.

Function: count the number of variant products of a shop.

Endpoint: /get_database_info/{storage_id}

Method: GET

Call /get_database_info/all to return information for all shops.

2.8. API synchronize product.

Function: synchronize products with the Shop Manager. This involves two actions:
sending the ids of all products currently in the database to the Shop Manager, and
deleting products that do not exist in the Shop Manager’s database.

Endpoint: /sync_product

Method: GET

3. Models.
3.1. Embedding model.

Idea:

The embedding model for images is trained with a kind of self-supervised
learning, meaning we need little or no effort to label the data. Each sample
consists of three images: two are original images from the dataset, and the third
is a mix of those two. We use the mix ratio as the label, that is, the similarity
between the mixed image and each of the two originals. For example, if you mix
30% of image 1 with 70% of image 2, the labels are Sim(mix, image_1) = 0.3 and
Sim(mix, image_2) = 0.7.

The objective function pushes the similarity of each pair of images toward its
label. The label here is a soft label (a real number).
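
The labeling scheme can be illustrated with a toy sketch. Everything here is an assumption made for illustration: images are flat pixel lists, mixing is a pixel-wise blend, similarity is cosine similarity, and the loss is a mean squared error. The actual training code may mix images and measure similarity differently.

```python
import math


def mix_images(image_1, image_2, ratio_1):
    """Blend two equally sized pixel lists: ratio_1 of image_1, the rest of image_2."""
    return [ratio_1 * a + (1.0 - ratio_1) * b for a, b in zip(image_1, image_2)]


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def soft_label_loss(embed, image_1, image_2, ratio_1):
    """Mean squared gap between predicted similarities and the mix-ratio labels."""
    mixed = mix_images(image_1, image_2, ratio_1)
    e_mix, e_1, e_2 = embed(mixed), embed(image_1), embed(image_2)
    labels = (ratio_1, 1.0 - ratio_1)  # Sim(mix, image_1), Sim(mix, image_2)
    preds = (cosine_similarity(e_mix, e_1), cosine_similarity(e_mix, e_2))
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


# Toy usage: the identity function stands in for the embedding model.
image_1 = [1.0, 0.0, 0.0, 0.0]
image_2 = [0.0, 1.0, 0.0, 0.0]
loss = soft_label_loss(lambda x: x, image_1, image_2, ratio_1=0.3)
print(loss)
```

Training would minimize this loss over many mixed samples, so that the embedding similarity of each pair approaches its mix ratio.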

Paper: https://arxiv.org/abs/2207.08409



Training code and dataset:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\server23\fashion-dataset\TokenMix

3.2. Object detection.

Idea:

We use the YOLOv8 model. However, YOLOv8 pre-trained on the COCO dataset has
only 80 classes and covers few fashion classes (dresses, watches…), so we have
to retrain it.

For labeling we use another excellent model from Microsoft, called Scene-graph.
It can detect more than 200 labels with high accuracy, but its inference is slow.
We run Scene-graph inference over the COCO dataset to label the data, keep only
the labels that have enough samples, and use that data to train the YOLOv8 model.
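
The label-filtering step might look like the sketch below. The (image_id, label) pair format, the function name, and the min_count threshold are assumptions for illustration; the source does not state the actual threshold or annotation format.

```python
from collections import Counter


def filter_rare_labels(annotations, min_count):
    """Keep only annotations whose label occurs at least min_count times."""
    counts = Counter(label for _, label in annotations)
    kept = {label for label, n in counts.items() if n >= min_count}
    return [(image_id, label) for image_id, label in annotations if label in kept]


# Toy (image_id, label) pairs standing in for Scene-graph output.
annotations = [
    ("img1", "dress"), ("img2", "dress"), ("img3", "dress"),
    ("img4", "watch"), ("img5", "watch"),
    ("img6", "scarf"),
]
filtered = filter_rare_labels(annotations, min_count=2)
print(sorted({label for _, label in filtered}))  # ['dress', 'watch']
```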

Dataset:

- Original COCO dataset labeled by the Scene-graph model:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets_origin

- Processed dataset:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets

Checkpoint path:

\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v3

Training code:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection

3.3. Classification.

Idea:

We want a classification model that is accurate enough and, above all,
lightweight, so that inference time can keep up with the required RPS (requests
per second). We use the output of the embedding model as the input to the
classification model. The network is simple: linear layers, dropout, LayerNorm,
and ReLU (Rectified Linear Unit).

Training code:

This file is located in the image search service in the ChatBot repository.

/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py

Data example:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\local\shirt

4. Resources.
4.1. Azure repository.

Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot

Branch: dev

Folder: AlgorithmServices/Image/ProductImageSearch

4.2. Triton server.

Model repository:

\\smb-ai.tmt.local\Public-AI\Public\Model\ProductImageSearch\Version_4

To test with the Triton server, you can run the Docker image built on server 24:

Image name: my_triton_server

Image id: 56a5c17e4805


Command:

sudo docker run --rm --gpus=all --shm-size=512m \
  -v /TMTAI/KBQA/nguyenpq/Deployment/Image/ProductImageSearch/model/checkpoints/model_repository:/models \
  -p 25002:8001 --name triton_tienhn my_triton_server

4.3. Onnx.

Code to convert the embedding and classification models to ONNX:

/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py

5. Future work:
We want to implement image retrieval based on a combined image-text query, for example:



Paper: https://arxiv.org/pdf/2006.11149.pdf

Github: https://github.com/ecom-research/ComposeAE

Keywords: image-text query for image retrieval, Linguistic-Visual for image retrieval

II. Object Detection Service.

1. Describe.
This service detects all objects in an image, or a single object if a text
description is passed as a parameter. Currently, two services call object
detection: the image search service and the Shop Manager, which uses it for
cropping images.

2. Models.
2.1. Yolov8.

Idea:



We use the YOLOv8 model. However, YOLOv8 pre-trained on the COCO dataset has
only 80 classes and covers few fashion classes (dresses, watches…), so we have
to retrain it.

For labeling we use another excellent model from Microsoft, called Scene-graph.
It can detect more than 200 labels with high accuracy, but its inference is slow.
We run Scene-graph inference over the COCO dataset to label the data, keep only
the labels that have enough samples, and use that data to train the YOLOv8 model.

Checkpoint path:

\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4

Training code:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection

2.2. Sentence transformer

Idea:

When a text describing the object of interest is passed, it is fed into the
sentence transformer model to produce an embedding vector. This vector is
compared with the class names of all objects returned by object detection, and
the object whose class name is most similar to the description is selected.
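
The selection step can be sketched as follows. The detection format ("class_name", "box") and the function names are hypothetical, and toy_embed is only a letter-frequency stand-in for the sentence transformer so that the example runs without any model; cosine similarity is assumed as the comparison.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_object(text, detections, embed):
    """Return the detection whose class name is most similar to the text."""
    text_vec = embed(text)
    return max(
        detections,
        key=lambda det: cosine_similarity(text_vec, embed(det["class_name"])),
    )


def toy_embed(sentence):
    """Stand-in for the sentence transformer: a letter-frequency vector."""
    lowered = sentence.lower()
    return [lowered.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]


detections = [
    {"class_name": "shirt", "box": (10, 20, 200, 300)},
    {"class_name": "watch", "box": (220, 40, 260, 90)},
]
best = select_object("a blue shirt", detections, toy_embed)
print(best["class_name"])  # shirt
```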

Note: We also experimented with other visual grounding models, but either the
inference time was too slow (OFA, CLIP-large) or the accuracy was not reliable
(OFA-base).

Model:

\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4\checkpoint\sbert



3. Resources.
3.1. Data.

- Original COCO dataset labeled by the Scene-graph model:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets_origin

- Processed dataset:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets

3.2. Azure repository.

Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot

Branch: dev

Folder: AlgorithmServices/Image/ObjectDetectionGeneral

III. Visual Grounding.

1. Describe.
Visual Grounding is a task in computer vision and NLP that aims to locate the
most relevant object or region in an image, based on a natural language
query. The query can be a phrase, a sentence, or even a multi-round dialogue.

We experimented with the OFA model.

2. Resources.
Code:



\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\data_sv24\API\VisualGroundingApi

Checkpoint path:

\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\data_sv24\API\VisualGroundingApi\model\checkpoints\visual_grounding\refcoco_base_best.pt

Document:

https://docs.google.com/document/d/1yv2SlMTHEEht-zc-U1tPuOEnW2HRPDKO-zAPq2yc3xw/edit



IV. Bill Generation Service.

1. Describe.
This service generates size-table images or order images following a predefined
format.

2. Resources.
2.1. Azure repository.

Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot

Branch: dev

Folder: AlgorithmServices/Image/BillImageGeneration



V. Bill Classification.

1. Describe.
Whenever the chatbot receives an image, it calls the image search service API, so
a model is needed to distinguish product images from images of bank
transactions.

The model must be lightweight, so I chose an SVM that takes HOG (Histogram of
Oriented Gradients) features extracted from the image as input.

Note: It has not been integrated into the service yet.

2. Resources.
Training code:

\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\main.ipynb

Weight:

\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\model.pkl

\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\pca.pkl
