A Project Report
on
OBJECT DETECTION
Submitted in partial fulfillment of the requirements for the award of the degree of
Submitted By:
ABDUL RAHMAN ABDUL WAHAB
ABDUL RAHEMAN MOHAMMED AYYUB SHARIF
SHAIKH UBADA ILYAS
Affiliated to
Mahatma Gandhi Mission’s
College of Computer Science & IT,
MGM Campus, Nanded-431605
Certificate
This is to certify that the project entitled "Object Detection" by ABDUL RAHMAN ABDUL WAHAB, ABDUL RAHEMAN MOHAMMED AYYUB SHARIF, and SHAIKH UBADA ILYAS, submitted in partial fulfillment of the requirements for the award of the degree by Swami Ramanand Teerth Marathwada University, Nanded – 431606 during the academic year 2022-23, is a bonafide record of work carried out under my guidance and supervision.
Mr. Gill S.S.
Guide
Examiner 1    Examiner 2
Declaration
We hereby declare that the project entitled "Object Detection" has been completed in the Department of Computer Science & IT, MGM's College of Computer Science & IT, Nanded, under the guidance of Mr. Gill S.S. for the award of the degree. This report comprises only our original work and has not been submitted for the award of any other degree to any university. Due acknowledgement has been made in the text to all material used.
Date: 20/03/2024
Place: Nanded
Acknowledgment
With immense gratitude, I would like to extend special thanks to my project advisor and guide, Mr. Gill S.S., Department of Computer Science, MGM's College of CS & IT, Nanded, for his valuable suggestions, meticulous guidance, and keen interest, as well as for his invaluable and continuous support and help at every step. I would like to express special thanks to all my colleagues. I would like to thank my parents for being my pillars of strength; I am highly obliged to them for their love and support.
Date: 20/03/2024
Place: Nanded
Index
1 Abstract
2 Introduction
7 Result/Output
12 Limitations
13 Conclusion
14 Bibliography
Abstract
This Android application, developed in Android Studio and integrating the TensorFlow Object Detection API, provides users with a convenient tool to identify objects within images. Leveraging the power of machine learning and computer vision, the application offers two distinct methods for object recognition: selecting an image from the device's files or capturing an image using the device's camera. The TensorFlow Object Detection API facilitates robust and accurate detection of objects, enabling users to obtain real-time insights into the contents of their images. With a user-friendly interface and seamless integration into the Android platform, this application serves as a practical solution for individuals seeking object recognition on their devices.
Key components: Android Studio, Java, the TensorFlow Object Detection API, and a pre-trained MobileNet model.
Introduction
This project presents an Android application for object detection, built with Android Studio and Java and leveraging the TensorFlow Object Detection API. The application allows users to identify objects within images using two primary methods: selecting an image file from the device's gallery or capturing an image using the device's camera. Upon selecting or capturing an image, the app utilizes a pre-trained model to detect and label the objects it contains. The result is a practical Android application for object detection, catering to users who require quick and convenient image analysis on their devices.
The scope of this project encompasses the development of an Android application that lets users select images from their device's gallery or capture images using the device's camera, subject to the following constraints:
• The accuracy of object detection may vary depending on factors such as image
quality, lighting conditions, and object complexity.
• The application may experience performance limitations on devices with lower
processing power and memory.
• Real-time processing of object detection for camera-captured images may be
affected by device capabilities and computational resources.
The decision to focus on object detection for Android was driven by several factors.
Data Flow Diagram
[Figure: data flows from the User Interface, through the Object Detection Process, to Result Presentation.]
Components:
User Interaction:
Users interact with the application through the user interface, selecting images or capturing new ones.
Image Selection/Capture:
Users choose an image from the device's gallery or capture a new image using the
device's camera.
Result Presentation:
Detected objects, along with their labels and confidence scores, are displayed to the user.
Users have the option to search for the detected object's name on a search engine for
further information.
Data Movement:
Data flows from the user interaction phase to the image selection/capture phase.
The selected/captured image data is then processed through the object detection
process. The results of the object detection process are presented to the user for
viewing. Optionally, users can initiate a search query based on the detected object's
name.
Literature Review
Object detection is a core computer vision task, and a variety of techniques have been developed over the years to address it, including:
Convolutional Neural Networks (CNNs), which are particularly popular for their high accuracy and scalability, and state-of-the-art architectures like Faster R-CNN, YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector), which offer real-time object detection capabilities.
There are several frameworks and libraries available for implementing object detection. TensorFlow: a widely used deep learning framework that offers pre-trained models and tools for custom model development. PyTorch: another popular deep learning framework with object detection capabilities, offering flexibility and ease of use. OpenCV: a widely-used computer vision library that provides both classical and deep-learning-based detection methods.
Review of Similar Android Applications:
Several Android applications leverage object detection for various purposes, such as visual search and shopping. Notable examples include:
Google Lens: An image recognition tool developed by Google, integrated into the Google Photos app and Google Assistant. It allows users to search for information about detected objects and products, providing relevant information and shopping links.
Amazon Shopping: The mobile app by Amazon includes a feature called "AR View," which utilizes augmented reality and object detection to visualize products in the user's own space before purchase.
Design and Implementation
The user interface design of the Object Detection Android app is aimed at providing a seamless and intuitive experience for users to detect objects in images. The UI comprises the following screens:
Home Screen: Option to select an image from the device's gallery or capture a new image using the device's camera, with a button or icon for each option (gallery and camera).
Image Preview: Display area to show the selected image or the image captured by
the camera. Provides a clear view of the image to users before initiating the object
detection process.
Object Detection Result: Area to display the detected objects along with their corresponding labels and confidence scores. Clear presentation of results for easy interpretation.
The implementation process of the Object Detection Android app involves the
following steps:
Image Input Handling: If the user chooses to select an image from the device's
gallery, the app retrieves the selected image. If the user opts to capture an image
using the device's camera, the app captures the image in real-time.
Object Detection: The selected image undergoes object detection using the TensorFlow Lite model.
Display Results: The detected objects along with their labels and confidence scores are displayed to the user on the screen. The app may highlight the detected objects in the displayed image.
The TensorFlow Object Detection API is integrated into the Android application to
enable object detection functionality. This integration involves the following steps:
Setup TensorFlow Library: Import TensorFlow library into the Android project and
configure dependencies.
Model Loading: Load the pre-trained object detection model into the Android app.
Inference Processing: Perform inference on input images using the loaded model to
detect objects.
Result Rendering: Render the object detection results obtained from TensorFlow in the application's UI.
Source Code
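1: Click Listeners
A minimal sketch of the button wiring, assuming the gallery_btn and camera_btn view IDs from the layouts later in this section and the openGallery()/openCamera() helpers of listing 2:
LinearLayout galleryBtn = findViewById(R.id.gallery_btn);
LinearLayout cameraBtn = findViewById(R.id.camera_btn);
// Open the gallery picker (listing 2).
galleryBtn.setOnClickListener(v -> openGallery());
// Open the camera capture flow (listing 2).
cameraBtn.setOnClickListener(v -> openCamera());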
2: Intents (open gallery and camera)
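A minimal sketch of the two intents, assuming the PICK_IMAGE_REQUEST and REQUEST_IMAGE_CAPTURE request codes used in listing 3:
// Let the user pick an image from the gallery.
private void openGallery() {
    Intent intent = new Intent(Intent.ACTION_PICK,
            MediaStore.Images.Media.EXTERNAL_CONTENT_URI);
    startActivityForResult(intent, PICK_IMAGE_REQUEST);
}
// Capture a new photo with the device camera.
private void openCamera() {
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    startActivityForResult(intent, REQUEST_IMAGE_CAPTURE);
}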
3: ActivityResult Check
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (resultCode == RESULT_OK) {
switch (requestCode) {
case PICK_IMAGE_REQUEST:
handleGalleryResult(data);
break;
case REQUEST_IMAGE_CAPTURE:
handleCaptureResult(data);
break;
}
}
}
4: Object Detection Using TensorFlow
try {
    // Load the wrapper class generated by ML model binding (see listing 6).
    MobilenetV110224Quant model = MobilenetV110224Quant.newInstance(this);
    // Prepare the input: the image resized to 224 x 224 and packed into a
    // ByteBuffer (preprocessing assumed).
    TensorBuffer inputFeature0 =
            TensorBuffer.createFixedSize(new int[]{1, 224, 224, 3}, DataType.UINT8);
    inputFeature0.loadBuffer(byteBuffer);
    // Run inference.
    MobilenetV110224Quant.Outputs outputs = model.process(inputFeature0);
    TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
    // Show the label with the highest score.
    result.setText(labels[getMax(outputFeature0.getFloatArray())]);
    // Releases model resources if no longer used.
    model.close();
} catch (IOException e) {
    // Handle the exception.
    e.printStackTrace();
}
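The getMax helper above returns the index of the highest-scoring class. The report does not include it, so this is a plausible minimal implementation:
// Returns the index of the largest value in the score array.
static int getMax(float[] scores) {
    int maxIndex = 0;
    for (int i = 1; i < scores.length; i++) {
        if (scores[i] > scores[maxIndex]) {
            maxIndex = i;
        }
    }
    return maxIndex;
}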
5: Object Search:
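A minimal sketch of the search flow described under "Search Button Click Functionality" later in this report, assuming the search_btn view from the home-screen layout and the result TextView of listing 4; ACTION_WEB_SEARCH is used here as the concrete search action:
LinearLayout searchBtn = findViewById(R.id.search_btn);
searchBtn.setOnClickListener(v -> {
    // Search the web for the detected object's name.
    Intent intent = new Intent(Intent.ACTION_WEB_SEARCH);
    intent.putExtra(SearchManager.QUERY, result.getText().toString());
    startActivity(intent);
});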
6: MobileNet:
try {
MobilenetV110224Quant model = MobilenetV110224Quant.newInstance(context);
7: Manifest Folder:
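The manifest contents are not reproduced, so the following is a sketch of the entries such an app plausibly declares; the activity name and permissions are assumptions, while the theme matches the style defined later in this section:
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Camera access for the capture flow (assumed); picking from the gallery
         via ACTION_PICK needs no permission. -->
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-feature android:name="android.hardware.camera" android:required="false" />
    <application
        android:label="Object Detection"
        android:theme="@style/Base.Theme.AdvoDetection">
        <!-- Launcher activity (name assumed). -->
        <activity android:name=".MainActivity" android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>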
8: Toolbar:
<LinearLayout
android:id="@+id/toolbar_main"
android:layout_width="match_parent"
android:layout_height="?actionBarSize"
android:background="@color/primary">
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Search Engine"
android:textColor="@color/secondary"
android:textSize="20dp"
android:textStyle="bold"
android:layout_gravity="center"
android:layout_margin="15dp"/>
</LinearLayout>
9: Home Screen Controls:
<LinearLayout
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical"
android:layout_below="@+id/toolbar_main"
android:layout_above="@+id/bottomNavigation1">
<!-- Gallery Button -->
<LinearLayout
android:id="@+id/gallery_btn"
android:layout_width="match_parent"
android:layout_height="100dp"
android:layout_marginStart="20dp"
android:layout_marginEnd="20dp"
android:layout_marginTop="10dp"
android:background="@color/white"
android:elevation="5dp"
android:orientation="horizontal">
<LinearLayout
android:layout_width="match_parent"
android:gravity="center"
android:layout_height="match_parent"
android:layout_weight="1"
android:orientation="vertical">
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Gallery"
android:layout_marginStart="15dp"
android:layout_gravity="start"
android:textStyle="bold"
android:textColor="@color/primary"
android:textSize="18dp"/>
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Select Image from gallery"
android:layout_marginStart="15dp"
android:layout_gravity="start"
android:textColor="@color/black"
android:textSize="15dp"/>
</LinearLayout>
<ImageView
android:layout_width="match_parent"
android:layout_height="match_parent"
android:layout_weight="1"
android:padding="5dp"
android:src="@drawable/gallery_png"/>
</LinearLayout>
<!-- Camera Button -->
<LinearLayout
android:id="@+id/camera_btn"
android:layout_width="match_parent"
android:layout_height="100dp"
android:layout_marginStart="20dp"
android:layout_marginEnd="20dp"
android:layout_marginTop="10dp"
android:background="@color/white"
android:elevation="5dp"
android:orientation="horizontal">
<ImageView
android:layout_width="match_parent"
android:layout_height="match_parent"
android:layout_weight="1"
android:src="@drawable/camera_png"/>
<LinearLayout
android:layout_width="match_parent"
android:gravity="center"
android:layout_height="match_parent"
android:layout_weight="1"
android:orientation="vertical">
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Camera"
android:layout_marginStart="15dp"
android:layout_gravity="start"
android:textStyle="bold"
android:textColor="@color/primary"
android:textSize="18dp"/>
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Capture object's image"
android:layout_marginStart="15dp"
android:layout_gravity="start"
android:textColor="@color/black"
android:textSize="15dp"/>
</LinearLayout>
</LinearLayout>
<!-- Preview ImageView -->
<ImageView
android:visibility="gone"
android:id="@+id/preview_image"
android:layout_width="match_parent"
android:layout_height="200dp"
android:layout_marginStart="20dp"
android:layout_marginEnd="20dp"
android:layout_marginTop="10dp"
android:src="@mipmap/ic_launcher"/>
<!-- Object Search Button -->
<LinearLayout
android:visibility="gone"
android:id="@+id/search_btn"
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:background="@drawable/background_bottombar"
android:layout_margin="20dp"
android:padding="5dp"
android:gravity="center"
android:orientation="horizontal">
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Search Object"
android:gravity="center"
android:padding="5dp"
android:textColor="@color/secondary"/>
<ImageView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:src="@drawable/baseline_search_24"
android:layout_marginStart="5dp"/>
</LinearLayout>
</LinearLayout>
10: Item Selector Drawable and Colors
<selector xmlns:android="http://schemas.android.com/apk/res/android">
    <item
        android:state_selected="true"
        android:color="@color/white"/>
    <item
        android:state_selected="false"
        android:color="@color/secondary"/>
</selector>
<resources>
    <color name="primary">#900C3F</color>
    <color name="secondary">#d6cadd</color>
</resources>
11: Themes (Day and Night)
<resources xmlns:tools="http://schemas.android.com/tools">
    <!-- Base application theme (values/themes.xml). -->
    <style name="Base.Theme.AdvoDetection"
        parent="Theme.Material3.DayNight.NoActionBar">
        <!-- Customize your light theme here. -->
        <item name="colorPrimary">@color/primary</item>
    </style>
</resources>
<resources xmlns:tools="http://schemas.android.com/tools">
    <!-- Base application theme (values-night/themes.xml). -->
    <style name="Base.Theme.AdvoDetection"
        parent="Theme.Material3.DayNight.NoActionBar">
        <!-- Customize your dark theme here. -->
        <item name="colorPrimary">@color/primary</item>
    </style>
</resources>
Dependencies
These are dependencies specified in a Gradle build file for an Android project. Let's
break down each dependency:
implementation("androidx.appcompat:appcompat:1.6.1"):
This dependency imports the AndroidX AppCompat library version 1.6.1.
AppCompat is a support library provided by Google that allows developers to use
modern Android features on older versions of Android. It provides backward-
compatible implementations of many UI components and behaviors introduced in
newer Android versions.
implementation("com.google.android.material:material:1.10.0"):
This dependency imports the Material Components for Android library version
1.10.0. Material Components for Android is a set of UI components and styles
provided by Google to implement Material Design in Android apps. It includes
components like buttons, text fields, cards, and more, following Google's Material
Design guidelines.
implementation("androidx.constraintlayout:constraintlayout:2.1.4"):
This dependency imports the AndroidX ConstraintLayout library version 2.1.4.
ConstraintLayout is a layout manager for Android that allows developers to create
complex layouts with a flat view hierarchy. It enables the creation of responsive and
flexible user interfaces by defining constraints between UI elements.
implementation("org.tensorflow:tensorflow-lite-support:0.1.0"):
This dependency imports the TensorFlow Lite Support library version 0.1.0.
TensorFlow Lite Support provides additional utilities and support for TensorFlow
Lite models on Android devices. It includes functionalities for loading, running, and
managing TensorFlow Lite models, as well as support for common pre- and post-
processing tasks.
implementation("org.tensorflow:tensorflow-lite-metadata:0.1.0"):
This dependency imports the TensorFlow Lite Metadata library version 0.1.0.
TensorFlow Lite Metadata provides tools and utilities for working with metadata
associated with TensorFlow Lite models. It allows developers to access and
manipulate metadata information such as model input/output details, author
information, and model descriptions.
testImplementation("junit:junit:4.13.2"):
This dependency imports the JUnit testing framework version 4.13.2 for unit testing
purposes. JUnit is a popular framework for writing and executing unit tests in Java
and Android projects. It provides annotations and assertions for defining and
verifying test cases, helping developers ensure the correctness of their code.
androidTestImplementation("androidx.test.ext:junit:1.1.5"):
This dependency imports the AndroidX Test JUnit library version 1.1.5 for Android
instrumentation testing. It includes extensions and utilities to enhance JUnit testing
capabilities specifically for Android applications.
androidTestImplementation("androidx.test.espresso:espresso-core:3.5.1"):
This dependency imports the Espresso testing framework version 3.5.1 for UI testing
on Android. Espresso provides a fluent API for writing concise and reliable UI tests,
interacting with UI elements and verifying UI behaviors programmatically.
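Collected together, the dependencies block in the module's build.gradle.kts reads:
dependencies {
    implementation("androidx.appcompat:appcompat:1.6.1")
    implementation("com.google.android.material:material:1.10.0")
    implementation("androidx.constraintlayout:constraintlayout:2.1.4")
    implementation("org.tensorflow:tensorflow-lite-support:0.1.0")
    implementation("org.tensorflow:tensorflow-lite-metadata:0.1.0")
    testImplementation("junit:junit:4.13.2")
    androidTestImplementation("androidx.test.ext:junit:1.1.5")
    androidTestImplementation("androidx.test.espresso:espresso-core:3.5.1")
}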
Default Config
The defaultConfig block specifies configuration settings that apply to all build variants by default. Let's break down each parameter:
applicationId = "com.wozrusfanr.example":
This parameter sets the unique application ID for the Android application. The
application ID is used to uniquely identify the app in the Google Play Store and must be unique across all published apps.
minSdk = 24:
This parameter specifies the minimum Android SDK version required to run the application. Devices running Android versions lower than the specified minimum cannot install or run the app.
targetSdk = 33:
This parameter specifies the target Android SDK version that the application is built
and tested against. It indicates the highest version of the Android SDK on which the app has been verified to behave as expected.
versionCode = 1:
This parameter sets the version code of the application, which is used to differentiate
between different versions of the app. The version code must be an integer value and
should increase with each subsequent version to indicate the progression of the app.
VersionName = "1.0":
This parameter sets the version name of the application, which is a human-readable
string used to identify the version of the app. The version name typically follows a
convention like "major.minor" (e.g., "1.0", "1.1", "2.0") to indicate major and minor
releases.
testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner":
This parameter specifies the instrumentation test runner class to be used for running
AndroidJUnit tests. It indicates the entry point for running instrumented tests on an Android device or emulator.
Together, these parameters define the application ID, SDK versions, version code, version name, and test instrumentation runner. These settings apply to every build variant unless explicitly overridden.
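Assembled from the values above, the block in build.gradle.kts looks like this:
android {
    defaultConfig {
        applicationId = "com.wozrusfanr.example"
        minSdk = 24
        targetSdk = 33
        versionCode = 1
        versionName = "1.0"
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
    }
}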
Build Types
The buildTypes block defines configurations for different build types, such as "debug" or "release". Let's focus on the "release" block:
release { ... }:
This block defines configurations specifically for the "release" build type. The "release" build type is typically used for generating the final, production-ready APK.
isMinifyEnabled = false:
This parameter disables code shrinking and obfuscation for the "release" build. Code shrinking removes unused code and resources from the final APK to reduce its size, while obfuscation makes the code harder to reverse-engineer. Disabling these features can simplify debugging and troubleshooting for the release build, but results in a larger, unobfuscated APK.
proguardFiles(...):
This parameter specifies the ProGuard configuration files to be used for code shrinking and obfuscation: typically the default rules shipped with the Android Gradle plugin plus custom ProGuard rules defined by the developer for additional configuration.
In summary, the buildTypes block defines configurations for different build types in an Android project. For the "release" build type, these settings help ensure the final APK is optimized, secure, and ready for distribution.
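A sketch of the block in build.gradle.kts, assuming the conventional ProGuard file names (the report does not list them):
android {
    buildTypes {
        release {
            isMinifyEnabled = false
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
}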
Compile Options
The compileOptions block specifies compilation options for the Java source code used in the project. Let's break down each parameter:
sourceCompatibility = JavaVersion.VERSION_1_8:
This parameter sets the Java source compatibility version for the project. It specifies the version of the Java language syntax and features that the source code is written in; here it is set to JavaVersion.VERSION_1_8.
targetCompatibility = JavaVersion.VERSION_1_8:
This parameter sets the target Java compatibility version for the project. It specifies
the version of the Java bytecode that the compiled classes will be compatible with. In this case, it's also set to JavaVersion.VERSION_1_8, indicating that the compiled classes target Java 8 runtimes.
These settings ensure that the project's Java source code is written and compiled using Java 8 syntax and features (sourceCompatibility), and the resulting bytecode is compatible with Java 8 (targetCompatibility). By aligning the source and target compatibility versions, developers can leverage Java 8 features such as the lambda expressions used in the source code above.
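The corresponding block in build.gradle.kts:
android {
    compileOptions {
        sourceCompatibility = JavaVersion.VERSION_1_8
        targetCompatibility = JavaVersion.VERSION_1_8
    }
}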
Build Features
The buildFeatures block in an Android project's Gradle build file allows developers
to enable or disable certain build features. In this case, the mlModelBinding feature is enabled:
mlModelBinding = true:
This parameter enables the ML model binding feature for the project. ML model
binding is a feature introduced in Android Studio Arctic Fox (2020.3.1) and higher
that simplifies the process of integrating machine learning (ML) models into
Android apps. With ML model binding enabled, Android Studio generates Java or
Kotlin classes that represent the ML models, making it easier for developers to load
and use these models in their applications. By enabling the mlModelBinding feature, developers can take advantage of the tooling Android Studio provides for integrating ML models into their Android apps. This feature abstracts away some of the complexities associated with loading and using ML models, letting developers concentrate on application functionality.
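In build.gradle.kts this is a one-line switch; it is what generates the MobilenetV110224Quant wrapper class used in the source-code listings:
android {
    buildFeatures {
        mlModelBinding = true
    }
}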
Result/output
1: Home Activity:
The Home Screen serves as the entry point for users to initiate the object detection workflow and offers two options.
Gallery: This option allows users to select an image from their device's file system or gallery.
When the user selects this option, the application opens the device's file explorer or
gallery, enabling them to browse through their stored images. Once the user selects
an image, the application retrieves the selected image for object detection
processing.
Camera: This option enables users to capture a new image using the device's camera in real time. When the user selects this option, the application activates the device's camera interface, allowing them to capture a photo. After capturing the image, the application retrieves the photo for object detection processing; the image may also be saved to the device's storage.
2: Functionality Explanation:
Gallery: Upon selecting this option, the application launches an intent to open the device's
file explorer or gallery. The user can navigate through their stored images and select
the desired image for object detection. After the user selects an image, the
application retrieves the selected image's URI and proceeds with object detection
processing.
Camera: When the user chooses this option, the application launches the device's camera
interface. The user can capture a new image by tapping the capture button within the
camera interface. After capturing the image, the application immediately processes it for object detection.
3: Output Screen:
This ImageView component displays the selected or captured image, allowing users
to visualize the image on which object detection was performed. The displayed
image provides context for the detected objects and enhances user understanding.
This TextView component dynamically displays the names of objects detected in the selected or captured image. As the object detection process identifies objects, their names appear in this view.
This Button component enables users to perform a quick search on the internet for
more information about the detected object. When clicked, the button triggers a
search query using the detected object's name as the search term.
4: Search Button Click Functionality:
Upon clicking the search button, the app creates an intent with the
ACTION_SEARCH action. This intent indicates to the Android system that the app wants to perform a search. The detected object's name, obtained from the TextView displaying the object names, serves as the search query: the app extracts the detected object's name and attaches it to the intent. The app then starts an activity that can handle the search intent, such as a web browser or search app; if multiple installed applications can handle search intents, the user may be prompted to choose the preferred application.
TensorFlow
1: SSD Framework
This section describes our proposed SSD framework for detection (Sec. 2.1), followed by specific model details and experimental results. Fig. 1: SSD framework. (a) SSD only needs an input image and ground truth boxes for each object during training. In a convolutional fashion, we evaluate a small set of default boxes of different aspect ratios at each location in several feature maps with different scales (e.g. 8 × 8 and 4 × 4 in (b) and (c)). For each default box, we predict both the shape offsets and the confidences for all object categories ((c1, c2, · · · , cp)). At training time, we first match these default boxes to the ground truth boxes. For example, we have matched two default boxes with the cat and one with the dog, which are treated as positives and the rest as negatives. The model loss is a weighted sum between localization loss and confidence loss.
2: Model
SSD is based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. The early network layers are based on a standard architecture used for high quality image classification (truncated before any classification layers), which we will call the base network. We then add auxiliary structure to the network to produce detections with the following key features:
Multi-scale feature maps for detection: We add convolutional feature layers to the end of the truncated base network. These layers decrease in size progressively and allow predictions of detections at multiple scales. The convolutional model for predicting detections is different for each feature layer (cf. Overfeat [4] and YOLO [5], which operate on a single scale feature map).
Convolutional predictors for detection: Each added feature layer (or optionally an existing feature layer from the base network) can produce a fixed set of detection predictions using a set of convolutional filters. These are indicated on top of the SSD network architecture in Fig. 2. For a feature layer of size m × n with p channels, the basic predicting element is a small kernel that produces either a score for a category, or a shape offset relative to the default box coordinates, applied at each of the m × n locations.
Fig. 2: A comparison between two single shot detection models: SSD and YOLO
[5]. Our SSD model adds several feature layers to the end of a base network, which
predict the offsets to default boxes of different scales and aspect ratios and their
associated confidences.
3: Face Attributes
Another use case for MobileNet is compressing large systems via distillation [9], a knowledge transfer technique for deep networks. We seek to reduce a large face attribute classifier to a much smaller classifier using the MobileNet architecture. Distillation [9] works by training the small classifier to emulate the outputs of the larger model rather than the ground-truth labels, hence enabling training from large (and potentially infinite) unlabeled datasets. The MobileNet-based classifier achieves a similar mean average precision across attributes (mean AP) as the in-house model while consuming only 1% of the Mult-Adds.
4: Object Detection
MobileNet can also be deployed as an effective base network in modern object detection systems. We report results for MobileNet trained for object detection on COCO data based on the recent work that won the 2016 COCO challenge [10]. In Table 13, MobileNet is compared to VGG and Inception V2 [13] under both the Faster-RCNN [23] and SSD [21] frameworks. In our experiments, SSD is evaluated with 300 input resolution (SSD 300) and Faster-RCNN is compared with both 300 and 600 input resolution (Faster-RCNN 300, Faster-RCNN 600). The Faster-RCNN model evaluates 300 RPN proposal boxes per image. The models are trained on COCO train+val excluding 8k minival images.
MobileNetV1 Model
1: Introduction
Convolutional neural networks have become ubiquitous in computer vision ever since AlexNet [19] popularized deep convolutional neural networks by winning the ImageNet Challenge: ILSVRC 2012 [24]. The general trend has been to make deeper and more complicated networks in order to achieve higher accuracy [27, 31, 29, 8]. However, these advances to improve accuracy are not necessarily making networks more efficient with respect to size and speed. In many real world applications such as robotics, self-driving cars and augmented reality, the recognition tasks need to be carried out in a timely fashion on a computationally limited platform. This paper describes an efficient network architecture and a set of two hyper-parameters in order to build very small, low latency models that can be easily matched to the design requirements for mobile and embedded vision applications. Section 2 reviews prior work in building small models. Section 3 describes the MobileNet architecture and its shrinking hyper-parameters.
2. Prior Work
There has been rising interest in building small and efficient neural networks in the recent literature, e.g. [16, 34, 12, 36, 22]. Many different approaches can be generally categorized into either compressing pretrained networks or training small networks directly. This paper proposes a class of network architectures that allows a model developer to specifically choose a small network matching the resource restrictions of their application. MobileNets primarily focus on optimizing for latency but also yield small networks; many papers on small networks focus only on size and do not consider speed. MobileNets are built from depthwise separable convolutions, initially introduced in [26] and subsequently used in Inception models [13] to reduce the computation in the first few layers. Flattened networks [16] build a network out of fully factorized convolutions. Another small network is Squeezenet [12], which uses a bottleneck approach to design a very small network. Other reduced computation networks include structured transform networks [28] and deep fried convnets [37]. A different approach for obtaining small networks is shrinking, factorizing or compressing pretrained networks.
3. MobileNet Architecture
In this section we first describe the core layers that MobileNet is built on, which are depthwise separable filters, and then describe the MobileNet network structure. A depthwise separable convolution factorizes a standard convolution into a depthwise convolution and a 1×1 pointwise convolution: the depthwise convolution applies a single filter to each input channel, and the pointwise convolution then applies a 1×1 convolution to combine the outputs of the depthwise convolution. A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers, a separate layer for filtering and a separate layer for combining. This factorization has the effect of drastically reducing computation and model size, as quantified below.
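The saving can be made concrete with the MobileNet paper's notation (D_K: kernel size, D_F: feature map size, M: input channels, N: output channels):
standard convolution cost: D_K · D_K · M · N · D_F · D_F
depthwise separable cost: D_K · D_K · M · D_F · D_F + M · N · D_F · D_F
reduction factor: 1/N + 1/D_K²
With 3 × 3 kernels (D_K = 3), this works out to roughly 8 to 9 times less computation.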
The MobileNet structure is built on the depthwise separable convolutions described in the previous section, except for the first layer, which is a full convolution. By defining the network in such simple terms we are able to easily explore network topologies. All layers are followed by a batchnorm [13] and ReLU nonlinearity, with the exception of the final fully connected layer, which has no nonlinearity and feeds into a softmax layer for classification. Down sampling is handled with strided convolution in the depthwise convolutions as well as in the first layer. A final average pooling reduces the spatial resolution to 1 before the fully connected layer. Counting depthwise and pointwise convolutions as separate layers, MobileNet has 28 layers.
It is not enough to define networks simply in terms of a small number of Mult-Adds; it is also important to make sure these operations can be efficiently implemented. (Figure 3. Left: standard convolutional layer with batchnorm and ReLU. Right: depthwise separable convolution with depthwise and pointwise layers, each followed by batchnorm and ReLU.) For instance, unstructured sparse matrix operations are not typically faster than dense matrix operations until a very high level of sparsity. Our model structure puts nearly all of the computation into dense 1 × 1 convolutions. This can be implemented with highly optimized general matrix multiply (GEMM) functions; this approach is used in the popular Caffe package [15], which first reorders the input in memory (im2col) to map convolutions onto GEMM. 1×1 convolutions do not require this reordering in memory and can be implemented directly with GEMM. MobileNet spends the vast majority of its computation in these 1 × 1 convolutions, which also hold 75% of the parameters, as can be seen in Table 2. Nearly all of the additional parameters are in the fully connected layer.
MobileNet models were trained in TensorFlow [1] using RMSprop [33] with asynchronous gradient descent, similar to Inception V3 [31]. However, contrary to training large models, we use less regularization and data augmentation techniques because small models have less trouble with overfitting. When training MobileNets we do not use side heads or label smoothing, and we additionally reduce the amount of image distortion by limiting the size of small crops that are used in large Inception training [31]. Additionally, we found that it was important to put very little or no weight decay (l2 regularization) on the depthwise filters since there are so few parameters in them. For the ImageNet benchmarks in the next section, all models were trained with the same training parameters regardless of the size of the model.
Although the base MobileNet architecture is already small and low latency, many times a specific use case or application may require the model to be smaller and faster. In order to construct these smaller and less computationally expensive models we introduce a very simple parameter α called the width multiplier. The role of the width multiplier α is to thin a network uniformly at each layer. For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN, with typical settings of 1, 0.75, 0.5 and 0.25. α = 1 is the baseline MobileNet and α < 1 gives reduced MobileNets. The width multiplier has the effect of reducing computational cost roughly quadratically, as shown below. It can be applied to any model structure to define a new smaller model with a reasonable accuracy, latency and size trade-off; the reduced structure it defines needs to be trained from scratch.
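In the same notation as before, the cost of a depthwise separable convolution with width multiplier α becomes
D_K · D_K · αM · D_F · D_F + αM · αN · D_F · D_F,
so computational cost and parameter count fall roughly quadratically, by about α².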
4. Experiments
We first investigate the effects of depthwise separable convolutions and the choice of shrinking by reducing the width of the network rather than the number of layers. We then show the trade-offs of reducing the network based on the two hyper-parameters, followed by results on a number of different applications.
In Table 4 we see that, compared to a model built with full convolutions, using depthwise separable convolutions greatly reduces Mult-Adds and parameters at only a small cost in accuracy. We next show results comparing thinner models with the width multiplier to shallower models using fewer layers. To make MobileNet shallower, the 5 layers of separable filters with feature size 14 × 14 × 512 in Table 1 are removed. Table 5 shows that, at similar computation and size, making MobileNets thinner is better than making them shallower.
Table 6 shows the accuracy, computation and size trade offs of shrinking the
MobileNet architecture with the width multiplier α. Accuracy drops off smoothly
until the architecture is made too small at α = 0.25. Table 7 shows the accuracy,
computation and size trade offs for different resolution multipliers by training
MobileNets with reduced input resolutions. Accuracy drops off smoothly across
resolution. Figure 4 shows the trade off between ImageNet Accuracy and
computation for the 16 models made from the cross product of width multiplier α ∈
{1, 0.75, 0.5, 0.25} and resolutions {224, 192, 160, 128}. Results are log linear, with a jump when models get very small at α = 0.25.
We train MobileNet for fine grained recognition on the Stanford Dogs dataset [17].
We extend the approach of [18] and collect an even larger but noisy training set than
[18] from the web. We use the noisy web data to pretrain a fine grained dog
recognition model and then fine tune the model on the Stanford Dogs training set.
Results on Stanford Dogs test set are in Table 10. MobileNet can almost achieve the
state of the art results from [18] at greatly reduced computation and size.
PlaNet [35] casts the task of determining where on earth a photo was taken as a
classification problem. The approach divides the earth into a grid of geographic cells
that serve as the target classes and trains a convolutional neural network on millions
of geo-tagged photos. PlaNet has been shown to successfully localize a large variety
of photos and to outperform Im2GPS [6, 7] that addresses the same task. We re-train
PlaNet using the MobileNet architecture on the same data. While the full PlaNet model based on the Inception V3 architecture [31] has 52 million parameters and 5.74 billion mult-adds, the MobileNet model has only 13 million parameters (the usual 3 million for the body and 10 million for the final layer) and 0.58 billion mult-adds. As shown in Tab. 11, the MobileNet version delivers only slightly decreased performance in spite of being much more compact.
5. Conclusion
We proposed a new model architecture called MobileNets, based on depthwise separable convolutions, leading to an efficient model, and demonstrated how to build smaller and faster MobileNets via the width and resolution multipliers. The architecture shows strong performance when applied to a wide variety of tasks.
Results and Evaluation
Testing across a variety of scenarios and image types demonstrated consistent and reliable detection results.
Processing Time:
The processing time for object detection varied depending on the complexity of the image and the computational resources of the device. On average, the model returned results quickly enough for interactive use when identifying common objects.
Resource Consumption:
Memory usage and battery consumption were monitored during testing to assess the application's resource footprint. The app demonstrated efficient resource utilization, with acceptable levels of memory usage and minimal impact on battery life.
Performance Benchmarking:
The performance of the object detection model on Android was compared with alternative approaches. While such comparisons may vary depending on the specific use case and dataset, our model proved competitive for on-device detection.
Advantages Over Alternatives:
Our solution offers the advantage of being tailored specifically for Android devices, and the TensorFlow Object Detection API provides a robust and well-supported framework for object detection tasks.
Usability and User Experience: Initial user feedback highlighted the application's
intuitive interface and ease of use, particularly in selecting images and viewing
object detection results. Users appreciated the real-time feedback provided during
the object detection process, enhancing their overall experience with the application.
Feature Requests and Suggestions: Some users suggested additional features, such as custom model training, an offline mode, and performance optimizations. This feedback will be considered for future releases.
Future Enhancements
1. Custom Model Training:
Implement the capability for users to train custom object detection models directly within the application, allowing users to define and label their own object categories.
2. Offline Mode:
Enable the application to perform object detection entirely on-device, without requiring internet connectivity.
3. Object Tracking:
Expand the object detection capabilities to include object tracking, enabling users to track the movement and trajectory of detected objects over time.
4. Augmented Reality Overlays:
Implement overlays that render labels for detected objects directly onto the camera view, and enable users to interact with detected objects, for example by tapping them to trigger actions.
5. Accessibility Features:
Enhance accessibility features within the application to cater to users with visual impairments, for example through spoken descriptions of detected objects, for improved accessibility.
Longer-term research directions include:
1. Model Optimization:
Investigate methods for further optimizing the object detection model's performance on mobile devices to achieve even faster processing times and lower resource consumption.
2. Fine-Grained Detection:
Explore methods for detecting and classifying objects with subtle differences or fine-grained categories.
3. Contextual Understanding:
Extend the model's contextual understanding capabilities, enabling it to infer the relationships between detected objects and their surroundings, combining detection, segmentation, and contextual reasoning to provide deeper insights into the detected scene.
4. Continuous Improvement:
Incorporate new training data and user feedback to improve the accuracy and coverage of object detection models.
5. Privacy-Preserving Object Detection:
Research techniques for performing object detection while preserving user privacy, such as federated learning, differential privacy, and on-device model training, to enable secure and privacy-preserving inference.
Limitations
1: Performance Constraints:
Object detection is computationally intensive, and the application may run slowly on devices with limited processing power and memory.
2: Model Accuracy:
The accuracy of object detection heavily relies on the quality of the pre-trained model used in the application. Pre-trained models may not always accurately detect objects that differ significantly from their training conditions and data.
3: Detection Variability:
The object detection model may struggle with identifying objects that are small, partially occluded, or poorly lit. Accuracy can vary depending on the object's size, shape, orientation, and lighting conditions, leading to inconsistencies in results.
4: Internet Dependency:
If the app relies on an online search engine for retrieving additional information about detected objects, it requires a stable internet connection. Lack of internet connectivity limits the search feature, although detection itself runs on-device.
Conclusion
In summary, this project delivered an Android application built with Android Studio, Java, and the TensorFlow Object Detection API, and provided users with a convenient means of identifying objects in images. The Object Detection Android application holds significant implications for various domains, enabling users to identify objects, access relevant information, and enhance their understanding of the world around them. As we continue to evolve and refine the application, we remain committed to advancing the field of computer vision and delivering impactful solutions that empower users worldwide.
Bibliography