1. Description.
Image search is used to find similar products by image. Each shop has its own
database. When a user sends an image, the system automatically searches the
shop’s database for products with similar images and returns the results. The
user can also send a text description of the object they are looking for, in case
the image contains multiple objects. The data is stored in two places: MongoDB
and the Qdrant search engine. It includes product information and attributes,
and the feature vectors of the product images.
2. API documentation.
2.1. API search.
Endpoint: /process
Method: POST
Body:
{
"index": "string",
"data": {
"input_type": "image_url",
"image_url": "link to image",
"storage_id": "shop’s id saved in database",
"user_id": "facebook user id",
"text": "user message. Ex: áo này đẹp quá"
}
}
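As a hedged illustration of the request above, the sketch below builds the search body and wraps it in a POST request using only Python's standard library. The base URL `http://localhost:8000`, the helper name `build_search_request`, and the placeholder ids are assumptions for illustration; this document gives only the `/process` path and the body fields. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the real host is not given in this document


def build_search_request(index, image_url, storage_id, user_id, text=""):
    """Build (but do not send) a POST request for the /process endpoint."""
    body = {
        "index": index,
        "data": {
            "input_type": "image_url",
            "image_url": image_url,
            "storage_id": storage_id,
            "user_id": user_id,
            "text": text,
        },
    }
    return urllib.request.Request(
        BASE_URL + "/process",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    index="string",
    image_url="https://example.com/shirt.jpg",  # placeholder image link
    storage_id="shop-1",                        # placeholder shop id
    user_id="user-1",                           # placeholder Facebook user id
    text="áo này đẹp quá",
)
# To actually call the service: urllib.request.urlopen(request)
```

Keeping the body construction in one function makes it easy to check the payload shape before any network call is made.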
Function: pull all data from MongoDB, insert it into Qdrant, and then delete the
older data from Qdrant.
Method: POST
Body:
{
"index": "string",
"data": ["list of storage_id"]
}
To refresh all shops:
{
"index": "string",
"data": ["all"]
}
Endpoint: /update_shop_data_image
Method: POST
Body:
{
"index": "string",
"data": [
{
"storage_id": "shop’s id",
"id": "id of variant product",
"parent_class": "category of the product received from the chatbot. Ex: áo, quần (shirt, pants)",
"exception_data": false, # update icon/sticker data if true, update normal data if false
"image_urls": [
"list of image’s url"
],
"class": "VariantProduct"
}
]
}
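The body above carries `#` annotations that are not valid JSON, so a client has to send the flag as a plain boolean. The following is a minimal sketch of how such a body could be assembled in Python; the helper names `build_update_item` and `build_update_body` and the sample values are hypothetical, and only the field names come from this document.

```python
import json


def build_update_item(storage_id, variant_id, parent_class, image_urls, exception_data=False):
    """Return one entry of the "data" list for /update_shop_data_image."""
    return {
        "storage_id": storage_id,
        "id": variant_id,
        "parent_class": parent_class,
        "exception_data": exception_data,  # True: icon/sticker data, False: normal data
        "image_urls": list(image_urls),
        "class": "VariantProduct",
    }


def build_update_body(index, items):
    """Wrap the entries in the top-level body shown above."""
    return {"index": index, "data": list(items)}


body = build_update_body(
    "string",
    [build_update_item("shop-1", "variant-42", "áo", ["https://example.com/a.jpg"])],
)
# ensure_ascii=False keeps Vietnamese category names readable in the output.
print(json.dumps(body, ensure_ascii=False, indent=2))
```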
Endpoint: /delete_shop_data_image
HOÀNG NGỌC TIẾN 2
Method: POST
Body:
{
"index": "string",
"data": [
{
"storage_id": "shop’s id",
"id": "id of variant product",
"image_urls": ["list of URLs you want to delete"]
}
]
}
Endpoint: /delete/{storage_id}
Method: DELETE
Endpoint: /get_image_detail
Method: POST
Body:
{
"index": "string",
"data": {
"input_type": "image_url",
"storage_id": "shop’s id",
"show_img": true, # return the drawn image if true
"image_urls": [
"link of image"
]
}
}
Endpoint: /get_database_info/{storage_id}
Method: GET
Function: synchronize products with the Shop Manager. It consists of two
actions: sending the IDs of all products currently in the database to the Shop
Manager, and deleting products that do not exist in the Shop Manager’s database.
Endpoint: /sync_product
Method: GET
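The two synchronization actions can be sketched as a set computation. This is only an illustration: the function name `plan_sync` is hypothetical, and it assumes the set of ids known to the Shop Manager is available to compare against, which this document does not describe in detail.

```python
def plan_sync(local_ids, shop_manager_ids):
    """Return (ids_to_send, ids_to_delete) for one synchronization pass."""
    local = set(local_ids)
    remote = set(shop_manager_ids)
    ids_to_send = sorted(local)             # action 1: report every id currently stored
    ids_to_delete = sorted(local - remote)  # action 2: stored here but unknown to the Shop Manager
    return ids_to_send, ids_to_delete


to_send, to_delete = plan_sync(["p1", "p2", "p3"], ["p2", "p3", "p4"])
print(to_send)    # ['p1', 'p2', 'p3']
print(to_delete)  # ['p1']
```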
3. Models.
3.1. Embedding model.
Idea:
Objective function: push the similarity of each pair of images toward the pair’s
label, where the label is a soft label (a real number).
Paper: https://arxiv.org/abs/2207.08409
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\server23\fashion-dataset\TokenMix
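One possible reading of the embedding objective in 3.1 is a regression of pair similarity onto the soft label. The sketch below assumes cosine similarity and a mean squared error; neither choice is stated in this document, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_label_loss(pairs):
    """Mean squared gap between each pair's similarity and its soft label.

    pairs: iterable of (embedding_a, embedding_b, soft_label).
    """
    pairs = list(pairs)
    total = sum((cosine_similarity(a, b) - label) ** 2 for a, b, label in pairs)
    return total / len(pairs)


batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical embeddings, label 1.0: zero error
    ([1.0, 0.0], [0.0, 1.0], 0.5),  # orthogonal embeddings, label 0.5: squared error 0.25
]
print(soft_label_loss(batch))  # 0.125
```

Minimizing this quantity moves each pair's similarity toward its real-valued label instead of toward a hard 0 or 1 target.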
Idea:
We use another strong model from Microsoft, called Scene-graph, to generate
labels. It can detect more than 200 labels with high accuracy, but its inference
is slow. We run Scene-graph inference over the COCO dataset to label the data,
keep only the labels that have enough data, and use the result to train the
YOLOv8 model.
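The label-filtering step can be sketched as a frequency count over the pseudo-labels. The annotation tuple layout, the function name `keep_frequent_labels`, and the `min_count` threshold are assumptions for illustration; the actual threshold is not given in this document.

```python
from collections import Counter


def keep_frequent_labels(annotations, min_count):
    """Drop pseudo-labelled boxes whose class has fewer than min_count boxes overall.

    annotations: list of (image_id, class_name, bbox) tuples.
    """
    counts = Counter(class_name for _, class_name, _ in annotations)
    kept_classes = {name for name, n in counts.items() if n >= min_count}
    filtered = [ann for ann in annotations if ann[1] in kept_classes]
    return filtered, sorted(kept_classes)


pseudo_labels = [
    ("img1", "shirt", (10, 10, 50, 80)),
    ("img2", "shirt", (5, 20, 40, 90)),
    ("img2", "umbrella", (60, 0, 90, 30)),
]
filtered, classes = keep_frequent_labels(pseudo_labels, min_count=2)
print(classes)        # ['shirt']
print(len(filtered))  # 2
```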
Dataset:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets_origin
- Processed dataset:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v3
Training code:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection
3.3. Classification.
Idea:
Training code:
/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py
Data example:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\tienhn\local\shirt
4. Resources.
4.1. Azure repository.
Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot
Branch: dev
Folder: AlgorithmServices/Image/ProductImageSearch
Model repository:
\\smb-ai.tmt.local\Public-AI\Public\Model\ProductImageSearch\Version_4
If you want to test with Triton server, you can run the Docker image built on server 24:
/TMTAI/KBQA/nguyenpq/Deployment/Image/ProductImageSearch/model/checkpoints/model_repository:/models \
4.3. ONNX.
/ChatBot/AlgorithmServices/Image/ProductImageSearch/model/utils/train_classification.py
5. Future work:
We want to implement image retrieval based on a combined image-text query, for example:
Github: https://github.com/ecom-research/ComposeAE
1. Description.
It is used to detect all objects in an image, or a single object if a text
description is passed as a parameter. Currently, two services call object
detection: the image search service and the Shop Manager, which uses it for
cropping images.
2. Models.
2.1. YOLOv8.
Idea:
We use another strong model from Microsoft, called Scene-graph, to generate
labels. It can detect more than 200 labels with high accuracy, but its inference
is slow. We run Scene-graph inference over the COCO dataset to label the data,
keep only the labels that have enough data, and use the result to train the
YOLOv8 model.
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4
Training code:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection
Idea:
If a text describing the object of interest is passed, it is fed into the
sentence transformer model to produce an embedding vector. That vector is then
compared with the class names of all objects returned by object detection, and
the object most similar to the description is selected.
Note: We have experimented with other visual grounding models, but either the
inference time was too slow (OFA, CLIP-large) or the accuracy was not
guaranteed (OFA-base).
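The text-to-object matching described above can be sketched as a nearest-neighbour choice over class-name embeddings. The sketch assumes cosine similarity as the comparison, uses hand-written toy vectors in place of real sentence-transformer outputs, and the detection dictionary shape and the name `select_object` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def select_object(text_embedding, detections, class_embeddings):
    """Return the detection whose class-name embedding is closest to the text embedding.

    detections: list of dicts with "class_name" and "bbox" keys (hypothetical shape).
    class_embeddings: dict mapping class name -> embedding vector.
    """
    return max(
        detections,
        key=lambda det: cosine_similarity(text_embedding, class_embeddings[det["class_name"]]),
    )


# Toy 3-dimensional vectors standing in for sentence-transformer outputs.
class_embeddings = {"shirt": [0.9, 0.1, 0.0], "pants": [0.1, 0.9, 0.0]}
detections = [
    {"class_name": "shirt", "bbox": (10, 10, 120, 200)},
    {"class_name": "pants", "bbox": (15, 210, 110, 400)},
]
text_embedding = [0.8, 0.2, 0.1]  # pretend embedding of the user's message

best = select_object(text_embedding, detections, class_embeddings)
print(best["class_name"])  # shirt
```

Because only the text and a short list of class names are embedded, this step adds little latency on top of detection, which fits the note about heavier visual grounding models being too slow.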
Model:
\\smb-ai.tmt.local\Public-AI\Public\Model\ObjectDetectionGeneral\v4\checkpoint\sbert
Dataset:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets_origin
- Processed dataset:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\YOLO-v8-Object-Detection\datasets
Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot
Branch: dev
Folder: AlgorithmServices/Image/ObjectDetectionGeneral
1. Description.
Visual Grounding is a task in computer vision and NLP that aims to locate the
most relevant object or region in an image, based on a natural language
query. The query can be a phrase, a sentence, or even a multi-round dialogue.
2. Resources.
Code:
Checkpoint path:
\\smb-ai.tmt.local\Public-AI\Public\AI_Member\hungtd\data_sv24\API\VisualGroundingApi\model\checkpoints\visual_grounding\refcoco_base_best.pt
Document:
https://docs.google.com/document/d/1yv2SlMTHEEht-zc-
U1tPuOEnW2HRPDKO-zA Pq2yc3xw/edit
1. Description.
It is used to generate an image of a size table or an order image following a
predefined format.
2. Resources.
2.1. Azure repository.
Link: https://tfs.tpos.dev/TMTAICollection/TMT_Intelligent_Chatbot/_git/ChatBot
Branch: dev
Folder: AlgorithmServices/Image/BillImageGeneration
1. Description.
Whenever the chatbot receives an image, it calls the image search service API,
so a model is needed to distinguish product images from bank transaction
images.
The model must be lightweight, so I chose an SVM and used HOG (histogram of
oriented gradients) features extracted from the image as input.
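To illustrate the kind of feature the classifier consumes, the sketch below computes a simplified HOG descriptor for a single grayscale cell in pure Python. It is an assumption-laden simplification: a full HOG pipeline splits the image into a grid of cells and normalizes over blocks, and the actual extraction settings, classifier training, and any dimensionality reduction used in the notebook are not shown here. The function name `hog_cell` is hypothetical.

```python
import math


def hog_cell(image, bins=9):
    """Orientation histogram of one grayscale cell (list of rows of numbers).

    Gradients use central differences on interior pixels; each pixel votes for its
    unsigned orientation bin (0-180 degrees) with its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    # L2-normalize so the descriptor does not depend on overall contrast.
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


# A 5x5 cell with a vertical edge: intensity changes only along x.
cell = [[0, 0, 10, 10, 10] for _ in range(5)]
features = hog_cell(cell)
print(features.index(max(features)))  # 0
```

A fixed-length vector like this, concatenated over all cells, is what a lightweight classifier such as an SVM would take as input.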
2. Resources.
Training code:
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\main.ipynb
Weights:
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\model.pkl
\\smb-ai.tmt.local\Public-AI\Public\Model\BillClassification\pca.pkl