
Face detection with facial landmarks

● Various datasets exist, but most of them are either available for research only OR really small:
○ http://shuoyang1213.me/WIDERFACE/ -> for face detection. I've used it before, but it is not exactly what we need: in this project we have 'localization' -> 'single face, where exactly is it?'
○ https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/ -> best dataset, but for research purposes only
■ HOWEVER: there are pre-trained models in the MMPose framework which we can use. I have experience with this framework, and the models can also be exported to ONNX (platform independent, runs easily on CPU as well):
https://mmpose.readthedocs.io/en/latest/demo.html#d-face-keypoint-demo
○ More datasets here: https://paperswithcode.com/task/facial-landmark-detection
■ Generally small datasets, similar to the one used in the data loading demo on PyTorch's website:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html?highlight=dataloader
■ Mostly created for publication purposes, not really usable :/ (except the one mentioned above)
Face detection with facial landmarks
● Open-source Python library based on traditional (mostly non-deep-learning) techniques that does face detection and landmark generation
○ https://github.com/ageitgey/face_recognition
■ I've used this before. It is pretty good and, especially in this scenario, it might be really easy to set up, but I am not sure how accurate it will be
■ A hybrid of this and facial landmark detection with MMPose could work well; I have to experiment with that
● Many pretrained models here, on various datasets:
https://mmpose.readthedocs.io/en/latest/topics/face.html#aflw-dataset
● Overall, I imagine a system that:
○ Detects the face with a simple module OR generates a bounding box from the facial landmarks (after landmark detection)
○ Detects the landmarks for the eyes; the bounding box is generated heuristically
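A rough sketch of the "bounding box from landmarks" idea, in plain Python. This is only an illustration under my own assumptions: landmarks are taken to be (x, y) pixel tuples, and the function names, the margin, and the eye-box scale are hypothetical, not something taken from MMPose or face_recognition.

```python
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_points(points: Iterable[Point], margin: float = 0.0,
                    image_size: Optional[Tuple[int, int]] = None) -> Box:
    """Tight box around the points, grown by `margin` (fraction of width/height).

    If `image_size` (width, height) is given, the box is clipped to the image.
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one landmark")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    x_min, x_max = x_min - pad_x, x_max + pad_x
    y_min, y_max = y_min - pad_y, y_max + pad_y
    if image_size is not None:
        width, height = image_size
        x_min, y_min = max(0.0, x_min), max(0.0, y_min)
        x_max, y_max = min(width, x_max), min(height, y_max)
    return (x_min, y_min, x_max, y_max)


def eye_box(eye_points: Iterable[Point], scale: float = 1.5) -> Box:
    """Heuristic square box centred on the eye landmarks.

    Eye landmarks are usually much wider than tall, so the box side is
    derived from the eye width only (assumption: scale * eye width).
    """
    x_min, y_min, x_max, y_max = box_from_points(eye_points)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half = (x_max - x_min) * scale / 2
    return (cx - half, cy - half, cx + half, cy + half)


if __name__ == "__main__":
    face_landmarks = [(10, 20), (30, 40), (20, 60)]  # made-up points
    print(box_from_points(face_landmarks, margin=0.5, image_size=(35, 70)))
    print(eye_box([(10, 10), (20, 12), (30, 10)], scale=1.0))
```

The margin and scale values would have to be tuned on real images; the point is only that a face or eye box can be derived after landmark detection without a separate face detector.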
(Credit) card detection
● There is no good open-source solution for detecting (credit) cards in images:
○ This SDK seems to do the job, but it costs a lot IMO, and an equivalent could be developed for our scenario in ~20 hours with ~1'000 annotated images (people holding the card in front of the camera, parallel to the image plane) -> https://github.com/DoubangoTelecom/ultimateCreditCard-SDK
● Solutions:
○ Pure OpenCV: try to detect cards based on edges
■ This could work, though it might be error-prone and I am not sure how reliable it will be. If the users put the card in a required position (in the middle of their forehead, mouth, etc.), that could improve this significantly
○ Annotating a few hundred to a thousand images with various people and cards (I assume this should work not only with credit cards specifically, but with all cards that have the same size as a credit card) -> this is the 'best' way; training an object detector afterwards is pretty straightforward
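For the edge-based option, one cheap sanity check is the aspect ratio: credit cards follow the ID-1 format (85.60 mm x 53.98 mm, so roughly 1.586:1). Below is a minimal sketch of such a filter in plain Python. It assumes the four corners of a candidate quadrilateral have already been found (for example from an OpenCV contour step, which is not shown here) and that the card is held roughly parallel to the image plane; the function names and the 15% tolerance are my own placeholders.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# ID-1 card dimensions in millimetres
CARD_WIDTH_MM = 85.60
CARD_HEIGHT_MM = 53.98
CARD_ASPECT = CARD_WIDTH_MM / CARD_HEIGHT_MM  # roughly 1.586


def side_lengths(quad: Sequence[Point]) -> list:
    """Lengths of the four sides; corners must be ordered around the outline."""
    return [math.dist(quad[i], quad[(i + 1) % 4]) for i in range(4)]


def looks_like_card(quad: Sequence[Point], tolerance: float = 0.15) -> bool:
    """True if the quadrilateral's long/short side ratio is close to ID-1.

    Opposite sides are averaged, so slightly skewed detections still pass.
    `tolerance` is the allowed relative deviation (hypothetical default).
    """
    if len(quad) != 4:
        return False
    a, b, c, d = side_lengths(quad)
    short, long_ = sorted(((a + c) / 2, (b + d) / 2))
    if short == 0:
        return False  # degenerate candidate
    aspect = long_ / short
    return abs(aspect - CARD_ASPECT) / CARD_ASPECT <= tolerance


if __name__ == "__main__":
    landscape = [(0, 0), (856, 0), (856, 539.8), (0, 539.8)]
    square = [(0, 0), (100, 0), (100, 100), (0, 100)]
    print(looks_like_card(landscape), looks_like_card(square))
```

This only filters candidates; it does not replace a trained detector, and it will break down if the card is tilted strongly away from the image plane.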
MMPose results - some bad, but reliable examples
MMPose - bad results
MMPose on test set
● 31 of the 206 items have no detected 'face'
○ This is an issue with the face_recognition library; we could improve on this with our own representative data
● Out of the 31:
○ 24 could still be used based on the keypoint detection (~9 need user input to place the detection ideally on the pupil, but the others are perfectly reasonable even without the face detection mask)
○ The remaining 7 are completely off
● On the remaining 175 images, the faces and all the keypoints were correctly detected, even with masks, but that could be an issue (!)
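To put the counts above into percentages, here is a small worked calculation. It only uses the numbers from this slide (206 test images, 31 without a detected face, 24 of those still usable via keypoints); the variable names are mine.

```python
total = 206                # test images
no_face = 31               # images with no detected face
usable_by_keypoints = 24   # of those 31, still usable via keypoints

detected = total - no_face                      # 175
completely_off = no_face - usable_by_keypoints  # 7
usable = detected + usable_by_keypoints         # 199


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("face detected:  ", pct(detected, total), "%")
    print("usable overall: ", pct(usable, total), "%")
    print("completely off: ", pct(completely_off, total), "%")
```

So the face detector alone misses about 15% of the test set, while fewer than 4% of the images are unusable once the keypoints are taken into account.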
How to move on:
● I think it would be wise to annotate 1000 representative images
○ Do this in two phases (!)
■ Run these models after the data is gathered, and correct the faulty ones
■ I could later fine-tune these models on the resulting annotated dataset
● We would need to hold back ~100 images for testing the trained model, so it would be wise to collect more than 1000 images in total
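The hold-out could be done with a simple seeded shuffle, so the same ~100 test images are set aside every time. A minimal sketch, assuming the images are identified by file name; the names, the total of 1100, and the seed are placeholders.

```python
import random
from typing import List, Sequence, Tuple


def split_holdout(items: Sequence[str], holdout_size: int = 100,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and return (train_items, test_items)."""
    pool = list(items)
    if holdout_size > len(pool):
        raise ValueError("holdout_size is larger than the dataset")
    random.Random(seed).shuffle(pool)  # local RNG, global state untouched
    return pool[holdout_size:], pool[:holdout_size]


if __name__ == "__main__":
    # Hypothetical file names for 1100 collected images
    images = [f"img_{i:04d}.jpg" for i in range(1100)]
    train, test = split_holdout(images, holdout_size=100)
    print(len(train), "for annotation/fine-tuning,", len(test), "held back")
```

The test images should be fixed before any fine-tuning starts and never used for correcting or training the models, otherwise the test numbers would look better than they really are.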
Commercial solutions
● Google Vision API:
https://cloud.google.com/vision/pricing#google_cloud_platform_costs
○ https://cloud.google.com/vision/docs/detecting-faces
○ The first 1000 units per month are free; above that there are costs (but not much)
○ I tested this, but it required a lot of setup, so I am not testing the others
Google Vision API - results (only 1 missed)
Microsoft

Amazon
