Final Project Report

A PROJECT REPORT
on
OBJECTS DETECTION
Submitted in partial fullfillment for the award of degree of
Submitted By:
ABDUL RAHMAN ABDUL WAHAB
ABDUL RAHEMAN MOHAMMED AYYUB SHARIF
SHAIKH UBADA ILYAS
Under the guidance of
Mr. Gill S.S.
Mahatma Gandhi Mission’s

College of Computer Science & IT,
Department Of Computer Science & IT,
MGM Campus, Nanded-431605
Affiliated to
Swami Ramanand Teerth Marathwada University,

Nanded-431606
2023-24
1
Mahatma Gandhi Mission’s
College of Computer Science & IT,
MGM Campus, Nanded-431605
Certificate
This is to certify that the project entitled "objects detection" by ABDUL RAHMAN
ABDUL WAHAB ( ), ABDUL RAHEMAN MOHAMMED AYYUB
SHARIF ( ), SHAIKH UBADA ILYAS ( ) submitted in partial
fulfillment of the requirements for the award of degree of null by Swami Ramanand
Teerth Marathwada University, Nanded – 431606 during the academic year 2022-
23, is a bonafide record of work carried out under my guidance and supervision.
Mr. Gill
S.S.
Guide
Dr.Mrs. Kanchan A. Nandedkar Prof. Shirish L. Kotgire

Head of Department Principal
Examinar-1 Examinar- 2
2
Declaration
We hereby declare that project entitled "objects detection" has been completed in
the Department of Computer Science & IT, MGM’s College of Computer Science &
IT, Nanded and Submitted to Swami Ramanand Teerth Marathwada University
Nanded, under the guidance of Mr. Gill S.S. for the award of degree of null. This
report comprises only our original work and has not been submitted for award of any
other degree to any university. Due acknowledgement has been made in the text to
all other material used.

SHAIKH UBADA ILYAS
Date:20/3/2024
Place:Nanded
3
Acknowledgment
With executive gratitude, I would like to extend special thanks to my project advisor
and guide Mr. Gill S.S. at department of computer science MGM’s College of CS &
IT, Nanded for their valuable suggestion, meticulous guidance, keep interest as well
as encouragement in this complete endeavor. words are inadequate to express my
immense thank you from the core of my heart for their invaluable continuous
encouragement. My cordial thanks to principal prof. Shirish L. kotgire, Head of
Department Dr. Mrs. Kanchan A. Nandedkar, from MGM’s College of CS and IT
Nanded for their guidance at every step and for their invaluable support and help. I
would like to express special thanks to all my colleagues. I would like to thank my
parents to being my pillars of strength. I am highly obligate to them for their love an
encouragement at every point of time.

SHAIKH UBADA ILYAS
Date:20/3/2024
Place:Nanded
4
Index
Sr. No Contents Page. No
1 Abstract 06
2 Introduction 07-08
3 Data Flow Diagram 09-10
4 Literature Review 11-12
5 Design and Implementation 13-14
6 Source code 15-36
7 Result/output 37-40
8 Tensor Flow 41-44
9 MobileNetV1 model 45-52
10 Results and Evaluation 53-54
11 Future Enhancements 55-57
12 Limitations 58
13 Conclusion 59
14 Bibliography 60
5
Abstract
The "Object Detection" Android application, developed using Java in Android
Studio and integrating TensorFlow Object Detection API, provides users with a
convenient tool to identify objects within images. Leveraging the power of machine
learning and computer vision, this application offers two distinct methods for object
recognition: selecting an image from the device's files or capturing an image using
the device's camera. The TensorFlow Object Detection API facilitates robust and
accurate detection of objects, enabling users to obtain real-time insights into the
contents of their images. With a user-friendly interface and seamless integration into
the Android platform, this application serves as a practical solution for individuals
seeking efficient and reliable object identification capabilities on their mobile
devices.
Key components:
Android Studio, Java Programming Language, TensorFlow Object Detection API,
Image Selection from Files, Camera Integration, Object Detection Model.
6
Introduction
Overview of the Project:
The "Object Detection Android App" is an innovative application developed using
Android Studio and Java, leveraging the TensorFlow Object Detection API. This
application allows users to identify objects within images using two primary
methods: selecting an image file from the device's gallery or capturing an image
using the device's camera. Upon selecting or capturing an image, the app utilizes
advanced object detection algorithms powered by TensorFlow to accurately
recognize and label objects present in the image.
Purpose and Objectives:
The primary purpose of this project is to develop a user-friendly and efficient
Android application for object detection, catering to users who require quick and
accurate identification of objects within images. The main objectives include:
• Providing users with a convenient tool to identify objects in images effortlessly.

• Demonstrating the integration of TensorFlow Object Detection API into Android
applications.
• Enhancing user experience by offering both gallery selection and camera capture
functionalities.
• Exploring the capabilities of deep learning-based object detection techniques in a
mobile environment.
Scope and Limitations:
7
The scope of this project encompasses the development of an Android application
capable of real-time object detection using pre-trained models provided by the
TensorFlow Object Detection API. The application allows users to interactively
select images from their device's gallery or capture images using the device's
camera. However, it's important to note some limitations:
• The accuracy of object detection may vary depending on factors such as image
quality, lighting conditions, and object complexity.
• The application may experience performance limitations on devices with lower
processing power and memory.
• Real-time processing of object detection for camera-captured images may be
affected by device capabilities and computational resources.
Motivation Behind Choosing Object Detection for Android:
The decision to focus on object detection for Android was driven by several factors:
• Rising demand for mobile applications with advanced computer vision

capabilities.
• Increasing adoption of deep learning techniques for image recognition tasks.
• Opportunity to explore the integration of cutting-edge technologies like
TensorFlow into mobile development.
• Potential applications in various fields such as augmented reality, image
classification, and accessibility tools.
8
Data Flow Diagram
User Interface
Select Image Capture Image
Image Input Module
Process
Result Presentation
End Process Search Result
9
Components:
User Interaction:
Users interact with the application through the user interface, selecting images or
capturing new images using the camera.
Image Selection/Capture:
Users choose an image from the device's gallery or capture a new image using the
device's camera.
Object Detection Process:
The selected/captured image undergoes object detection using the TensorFlow
Object Detection API.
Display Detected Objects:
Detected objects, along with their labels and confidence scores, are displayed to the
user on the screen.
Search Object Name on Engine:
Users have the option to search for the detected object's name on a search engine for
further information.
Data Movement:
Data flows from the user interaction phase to the image selection/capture phase.
The selected/captured image data is then processed through the object detection
process. The results of the object detection process are presented to the user for
viewing. Optionally, users can initiate a search query based on the detected object's
name.
10
Literature Review
Background Research on Object Detection Techniques:
Object detection is a fundamental task in computer vision with numerous
applications, ranging from security surveillance to autonomous vehicles. Various
techniques have been developed over the years to address this task, including:
Traditional methods such as Haar cascades, Histogram of Oriented Gradients
(HOG), and feature-based methods. Modern deep learning-based approaches like
Convolutional Neural Networks (CNNs), particularly popular for their high accuracy
and scalability. State-of-the-art architectures like Faster R-CNN, YOLO (You Only
Look Once), and SSD (Single Shot Multibox Detector), which offer real-time object
detection capabilities.
Overview of Existing Object Detection Frameworks and Libraries:
There are several frameworks and libraries available for implementing object
detection tasks, catering to different programming languages and platforms. Some
notable ones include:
TensorFlow Object Detection API: Developed by Google, TensorFlow provides a
comprehensive framework for training and deploying object detection models. It
offers pre-trained models and tools for custom model development. PyTorch:
Another popular deep learning framework with object detection capabilities, offering
flexibility and ease of use. OpenCV: A widely-used computer vision library that
provides various object detection algorithms and pre-trained models.
11
Review of Similar Android Applications:
Several Android applications leverage object detection for various purposes, such as
image recognition, augmented reality, and accessibility assistance. Some examples
include:
Google Lens: An image recognition tool developed by Google, integrated into the
Google Photos app and Google Assistant. It allows users to search for information
about objects captured in photos.
CamFind: An Android app that uses image recognition technology to identify
objects and products, providing users with relevant information and shopping links.
Amazon Shopping: The mobile app by Amazon includes a feature called "AR
View," which utilizes augmented reality and object detection to visualize products in
the user's environment before purchase.
12
Design and Implementation
1: User Interface Design
The user interface design of the Object Detection Android app is aimed at providing
a seamless and intuitive experience for users to detect objects in images. The UI
consists of the following components:
Home Screen: Option to select an image from the device's gallery or capture a new
image using the device's camera. Buttons or icons for each option (gallery and
camera) for user interaction.
Image Preview: Display area to show the selected image or the image captured by
the camera. Provides a clear view of the image to users before initiating the object
detection process.
Object Detection Result: Area to display the detected objects along with their
corresponding labels and confidence scores. Clear presentation of results for easy
understanding by the user.
2: Detailed Explanation of the Implementation Process:
The implementation process of the Object Detection Android app involves the
following steps:
Image Input Handling: If the user chooses to select an image from the device's
gallery, the app retrieves the selected image. If the user opts to capture an image
using the device's camera, the app captures the image in real-time.
Object Detection: The selected image undergoes object detection using TensorFlow
Object Detection API.
Display Results: The detected objects along with their labels and confidence scores
13
are displayed to the user on the screen.The app may highlight the detected objects in
the image or provide a separate list of detected objects.
3: Integration of TensorFlow API into the Android Application:
The TensorFlow Object Detection API is integrated into the Android application to
enable object detection functionality. This integration involves the following steps:
Setup TensorFlow Library: Import TensorFlow library into the Android project and
configure dependencies.
Model Loading: Load the pre-trained object detection model into the Android app.
Inference Processing: Perform inference on input images using the loaded model to
detect objects.
Result Rendering: Render the object detection results obtained from TensorFlow
API on the app's user interface.
14
Source Code
1: Buttons Click Event:

change_btn_1.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
Intent intent = new Intent(MainActivity.this,CameraActivity.class);
startActivity(intent);
}
});
home_btn.setOnClickListener(new View.OnClickListener() {
@Override
Intent intent = new Intent(MainActivity.this,MainActivity.class);
startActivity(intent);
}
});
gallery_btn.setOnClickListener(new View.OnClickListener() {
@Override
openGallery();
}
});
camera_btn.setOnClickListener(new View.OnClickListener() {
@Override
captureImage();
}
});
search_btn.setOnClickListener(new View.OnClickListener() {
@Override
search_google(result);
}
});
15
2: Intents (open gallery and camera)
private void openGallery() {

Intent intent = new Intent(Intent.ACTION_PICK,
MediaStore.Images.Media.EXTERNAL_CONTENT_URI);
startActivityForResult(intent, PICK_IMAGE_REQUEST);
}
private void captureImage() {
Intent takePictureIntent = new
Intent(MediaStore.ACTION_IMAGE_CAPTURE);
if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
startActivityForResult(takePictureIntent,
REQUEST_IMAGE_CAPTURE);
}
3: ActivityResult Check
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (resultCode == RESULT_OK) {
switch (requestCode) {
case PICK_IMAGE_REQUEST:
handleGalleryResult(data);
break;
case REQUEST_IMAGE_CAPTURE:
handleCaptureResult(data);
break;
}
}
16 }
4: Object Detection Using TensorFlow
private void predict(){
String[] labels = new String[1001];

int cnt=0;
try {
BufferedReader bufferedReader = new BufferedReader(new
InputStreamReader(getAssets().open("labels.txt")));
String line = bufferedReader.readLine();
while (line!=null){
labels[cnt]=line;
cnt++;
line = bufferedReader.readLine();
}
} catch (IOException e){
e.printStackTrace();
}
try {
MobilenetV110224Quant model =
MobilenetV110224Quant.newInstance(MainActivity.this);
// Creates inputs for reference.
TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 224,
224, 3}, DataType.UINT8);
if (bitmap!= null) {
bitmap = Bitmap.createScaledBitmap(bitmap, 224, 224, true);
inputFeature0.loadBuffer(TensorImage.fromBitmap(bitmap).getBuffer());
} else {
Toast.makeText(MainActivity.this, "bitmap is null",
Toast.LENGTH_SHORT).show();
}
// Runs model inference and gets result.
1
MobilenetV110224Quant.Outputs outputs = model.process(inputFeature0);
TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
// get output result
result.setText(labels[getMax(outputFeature0.getFloatArray())]+"");
// Releases model resources if no longer used.
model.close();
} catch (IOException e) {
// Handle the exception
}
}
5: Object Search:
private void search_google(TextView result) {

String obj = result.getText().toString();
// Create a URI with the search query
Uri searchUri = Uri.parse("https://www.google.com/search?q=" +
Uri.encode(obj));
// Create an Intent to open a web browser
Intent searchIntent = new Intent(Intent.ACTION_VIEW, searchUri);
// Check if there's an app to handle the Intent
if (searchIntent.resolveActivity(getPackageManager()) != null) {
// Start the activity
startActivity(searchIntent);
}
}
1
6: MobileNet:
try {
MobilenetV110224Quant model = MobilenetV110224Quant.newInstance(context);
// Creates inputs for reference.

TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 224, 224,
3}, DataType.UINT8);
inputFeature0.loadBuffer(byteBuffer);
// Runs model inference and gets result.

MobilenetV110224Quant.Outputs outputs = model.process(inputFeature0);
TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
// Releases model resources if no longer used.

model.close();
} catch (IOException e) {
// TODO Handle the exception
}
1
7: Manifest Folder:
<?xml version="1.0" encoding="utf-8"?>

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools">
<uses-permission android:name="android.permission.CAMERA"
tools:ignore="PermissionImpliesUnsupportedChromeOsHardware" /
>
<application
android:allowBackup="true"
android:dataExtractionRules="@xml/data_extraction_rules"
android:fullBackupContent="@xml/backup_rules"
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/Theme.AdvoDetection"
tools:targetApi="31">
<activity
android:name=".CameraActivity"
android:exported="false" />
<activity
android:name=".MainActivity"
android:exported="true">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
</activity>
</application>
</manifest>
2
8: Toolbar:
<LinearLayout
android:id="@+id/toolbar_main"
android:layout_width="match_parent"
android:layout_height="?actionBarSize"
android:background="@color/primary">
<TextView
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Search Engine"
android:textColor="@color/secondary"
android:textSize="20dp"
android:textStyle="bold"
android:layout_gravity="center"
android:layout_margin="15dp"/>
</LinearLayout>
2
8: Home Screen Controls:
<LinearLayout
android:layout_height="match_parent"
android:orientation="vertical"
android:layout_below="@+id/toolbar_main"
android:layout_above="@+id/bottomNavigation1">
// Gallery Button
<LinearLayout
android:id="@+id/gallery_btn"
android:layout_height="100dp"
android:layout_marginStart="20dp"
android:layout_marginEnd="20dp"
android:layout_marginTop="10dp"
android:background="@color/white"
android:elevation="5dp"
android:orientation="horizontal">
<LinearLayout
android:gravity="center"
android:layout_weight="1"
android:orientation="vertical">
<TextView
2
android:text="Gallery"
android:layout_gravity="start"
android:textColor="@color/primary"
android:textSize="18dp"/>
<TextView
android:text="Select Image from gallery"
android:textColor="@color/black"
</LinearLayout>
<ImageView
android:padding="5dp"
android:src="@drawable/gallery_png"/>
</LinearLayout>
2
// Camera Button
<LinearLayout
android:id="@+id/camera_btn"
android:background="@color/white"
android:elevation="5dp"
<ImageView
android:src="@drawable/camera_png"/>
<LinearLayout
android:orientation="vertical">
<TextView
android:text="Camera"
2
android:textColor="@color/primary"
<TextView
android:text="Capture object's image"
android:textColor="@color/black"
</LinearLayout>
</LinearLayout>
// Preview ImageView
<ImageView
android:visibility="gone"
android:id="@+id/preview_image"
android:src="@mipmap/ic_launcher"/>
2
// Object Search Button
<LinearLayout
android:visibility="gone"
android:id="@+id/search_btn"
android:background="@drawable/background_bottombar"
android:layout_margin="20dp"
<TextView
android:text="Search Object"
android:textColor="@color/secondary"/>
<ImageView
android:src="@drawable/baseline_search_24"
android:layout_marginStart="5dp"/>
</LinearLayout>
</LinearLayout>
2
9: Item Selector Draw-able

<selector xmlns:android="http://schemas.android.com/apk/res/android">
<item
android:state_selected="true"
android:color="@color/white"/>
<item
android:state_selected="false"
android:color="@color/secondary"/>
</selector>
10: Color Codes

<resources>
<color name="black">#FF000000</color>
<color name="white">#FFFFFFFF</color>
<color name="primary">#900C3F</color>
<color name="secondary">#d6cadd</color>
</resources>
2
10: Themes Day
<resources xmlns:tools="http://schemas.android.com/tools">

<style name="Base.Theme.AdvoDetection"
parent="Theme.Material3.DayNight.NoActionBar">

<item name="colorPrimary">@color/primary</item>
</style>
<style name="Theme.AdvoDetection" parent="Base.Theme.AdvoDetection" />

</resources>
10: Themes Night
<resources xmlns:tools="http://schemas.android.com/tools">

<style name="Base.Theme.AdvoDetection"
parent="Theme.Material3.DayNight.NoActionBar">

<item name="colorPrimary">@color/primary</item>
</style>
</resources>
2
Dependencies
These are dependencies specified in a Gradle build file for an Android project. Let's
break down each dependency:
implementation("androidx.appcompat:appcompat:1.6.1"):
This dependency imports the AndroidX AppCompat library version 1.6.1.
AppCompat is a support library provided by Google that allows developers to use
modern Android features on older versions of Android. It provides backward-
compatible implementations of many UI components and behaviors introduced in
newer Android versions.
implementation("com.google.android.material:material:1.10.0"):
This dependency imports the Material Components for Android library version
1.10.0. Material Components for Android is a set of UI components and styles
provided by Google to implement Material Design in Android apps. It includes
components like buttons, text fields, cards, and more, following Google's Material
Design guidelines.
implementation("androidx.constraintlayout:constraintlayout:2.1.4"):
This dependency imports the AndroidX ConstraintLayout library version 2.1.4.
ConstraintLayout is a layout manager for Android that allows developers to create
complex layouts with a flat view hierarchy. It enables the creation of responsive and
flexible user interfaces by defining constraints between UI elements.
implementation("org.tensorflow:tensorflow-lite-support:0.1.0"):
This dependency imports the TensorFlow Lite Support library version 0.1.0.
TensorFlow Lite Support provides additional utilities and support for TensorFlow
Lite models on Android devices. It includes functionalities for loading, running, and
managing TensorFlow Lite models, as well as support for common pre- and post-
2
processing tasks.
implementation("org.tensorflow:tensorflow-lite-metadata:0.1.0"):
This dependency imports the TensorFlow Lite Metadata library version 0.1.0.
TensorFlow Lite Metadata provides tools and utilities for working with metadata
associated with TensorFlow Lite models. It allows developers to access and
manipulate metadata information such as model input/output details, author
information, and model descriptions.
testImplementation("junit:junit:4.13.2"):
This dependency imports the JUnit testing framework version 4.13.2 for unit testing
purposes. JUnit is a popular framework for writing and executing unit tests in Java
and Android projects. It provides annotations and assertions for defining and
verifying test cases, helping developers ensure the correctness of their code.
androidTestImplementation("androidx.test.ext:junit:1.1.5"):
This dependency imports the AndroidX Test JUnit library version 1.1.5 for Android
instrumentation testing. It includes extensions and utilities to enhance JUnit testing
capabilities specifically for Android applications.
androidTestImplementation("androidx.test.espresso:espresso-core:3.5.1"):
This dependency imports the Espresso testing framework version 3.5.1 for UI testing
on Android. Espresso provides a fluent API for writing concise and reliable UI tests,
interacting with UI elements and verifying UI behaviors programmatically.
3
Default Config
The defaultConfig block in an Android project's Gradle build file contains
configuration settings that apply to all build variants by default. Let's break down
each parameter specified in this defaultConfig block:
applicationId = "com.wozrusfanr.example":
This parameter sets the unique application ID for the Android application. The
application ID is used to uniquely identify the app in the Google Play Store and must
be unique across all apps.
MinSdk = 24:
This parameter specifies the minimum Android SDK version required to run the
application. Devices running Android versions lower than the specified minimum
SDK version will not be able to install or run the application.
targetSdk = 33:
This parameter specifies the target Android SDK version that the application is built
and tested against. It indicates the highest version of the Android SDK that the app
is aware of and can utilize certain features and behaviors from.
VersionCode = 1:
This parameter sets the version code of the application, which is used to differentiate
between different versions of the app. The version code must be an integer value and
should increase with each subsequent version to indicate the progression of the app.
3
VersionName = "1.0":
This parameter sets the version name of the application, which is a human-readable
string used to identify the version of the app. The version name typically follows a
convention like "major.minor" (e.g., "1.0", "1.1", "2.0") to indicate major and minor
releases.
testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner":
This parameter specifies the instrumentation test runner class to be used for running
AndroidJUnit tests. It indicates the entry point for running instrumented tests on an
Android device or emulator.
In summary, the defaultConfig block defines fundamental configuration settings for
an Android application, including application ID, minimum and target SDK
versions, version code, version name, and test instrumentation runner. These settings
ensure proper functioning, compatibility, and identification of the application across
different devices and versions of Android.
3
Build Types
The buildTypes block in an Android project's Gradle build file specifies
configurations for different build types, such as "debug" or "release". Let's focus on
the "release" build type and its specific configurations:
release { ... }:
This block defines configurations specifically for the "release" build type. The
"release" build type is typically used for generating the final, production-ready
version of the application for distribution.
isMinifyEnabled = false:
This parameter disables code shrinking and obfuscation for the "release" build. Code
shrinking removes unused code and resources from the final APK to reduce its size,
while obfuscation obfuscates code to make it harder to reverse-engineer. Disabling
these features can simplify debugging and troubleshooting for the release build, but
it may result in larger APK sizes and less secure code.
proguardFiles(...):
This parameter specifies the ProGuard configuration files to be used for code
obfuscation and optimization. ProGuard is a tool used for code shrinking,
optimization, and obfuscation in Android applications. The
getDefaultProguardFile("proguard-android-optimize.txt") line specifies the default
ProGuard configuration file provided by the Android SDK, which includes
optimization rules for Android-specific code. The "proguard-rules.pro" file contains
3
custom ProGuard rules defined by the developer for additional configuration.
In summary, the buildTypes block allows developers to define specific
configurations for different build types in an Android project. For the "release" build
type, disabling code shrinking and obfuscation (isMinifyEnabled = false) and
specifying ProGuard configuration files (proguardFiles(...)) are common practices to
ensure the final APK is optimized, secure, and ready for distribution.
3
Compile Options
The compileOptions block in an Android project's Gradle build file specifies
compilation options for the Java source code used in the project. Let's break down
each parameter specified in this block:
sourceCompatibility = JavaVersion.VERSION_1_8:
This parameter sets the Java source compatibility version for the project. It specifies
the version of the Java language syntax and features that the source code is
compatible with. In this case, it's set to JavaVersion.VERSION_1_8, indicating
compatibility with Java 8 syntax and features.
targetCompatibility = JavaVersion.VERSION_1_8:
This parameter sets the target Java compatibility version for the project. It specifies
the version of the Java bytecode that the compiled classes will be compatible with.
In this case, it's also set to JavaVersion.VERSION_1_8, indicating that the compiled
bytecode will target Java 8-compatible bytecode.
These settings ensure that the project's Java source code is written and compiled
using Java 8 syntax and features (sourceCompatibility), and the resulting bytecode is
compatible with Java 8 runtime environments (targetCompatibility). By aligning the
source and target compatibility versions, developers can leverage the features and
improvements introduced in Java 8 while ensuring compatibility with the targeted
Java runtime environments.
3
Build Features
The buildFeatures block in an Android project's Gradle build file allows developers
to enable or disable certain build features. In this case, the mlModelBinding feature
is being enabled. Let's break down what this means:
mlModelBinding = true:
This parameter enables the ML model binding feature for the project. ML model
binding is a feature introduced in Android Studio Arctic Fox (2020.3.1) and higher
that simplifies the process of integrating machine learning (ML) models into
Android apps. With ML model binding enabled, Android Studio generates Java or
Kotlin classes that represent the ML models, making it easier for developers to load
and use these models in their applications. By enabling the mlModelBinding feature,
developers can take advantage of the streamlined workflow provided by Android
Studio for integrating ML models into their Android apps. This feature abstracts
away some of the complexities associated with loading and using ML models,
allowing developers to focus more on building and refining their applications'
functionality.
3
Result/output
1: Home Activity:
The Home Screen serves as the entry point for users to initiate the object detection
process. It provides two primary options for users to choose from:
Select from File or Gallery:
This option allows users to select an image from their device's file system or gallery.
When the user selects this option, the application opens the device's file explorer or
gallery, enabling them to browse through their stored images. Once the user selects
an image, the application retrieves the selected image for object detection
processing.
Capture with Camera:
This option enables users to capture a new image using the device's camera in real-
time. When the user selects this option, the application activates the device's camera
interface, allowing them to capture a photo. After capturing the image, the
application processes it immediately for object detection without saving it to the
device's storage.
3
2: Functionality Explanation:
Select from File or Gallery Option:
Upon selecting this option, the application launches an intent to open the device's
file explorer or gallery. The user can navigate through their stored images and select
the desired image for object detection. After the user selects an image, the
application retrieves the selected image's URI and proceeds with object detection
processing.
Capture with Camera Option:
When the user chooses this option, the application launches the device's camera
interface. The user can capture a new image by tapping the capture button within the
camera interface. After capturing the image, the application immediately processes it
for object detection without saving it to the device's storage.
3
3: Output Screen:
ImageView for Displaying Selected/Captured Image:
This ImageView component displays the selected or captured image, allowing users
to visualize the image on which object detection was performed. The displayed
image provides context for the detected objects and enhances user understanding.
TextView for Displaying Detected Object Name:
This TextView component dynamically displays the names of objects detected in the
selected or captured image. As the object detection process identifies objects, their
names are updated and displayed in the TextView in real-time.
Button for Searching Object Name on Search Engine:
This Button component enables users to perform a quick search on the internet for
more information about the detected object. When clicked, the button triggers a
search query using the detected object's name as the search term.
3
4: Search Button Click Functionality:
Triggering the Search Intent:
Upon clicking the search button, the app creates an intent with the
ACTION_SEARCH action. This intent indicates to the Android system that the app
wants to perform a search operation.
Preparing the Search Query:
The detected object's name, obtained from the TextView displaying the object
names, serves as the search query.The app extracts the detected object's name and
sets it as the query parameter for the search intent.
Launching the Search Intent:
The app then starts an activity that can handle the search intent, such as a web
browser or a search application.If multiple applications on the device can handle
search intents, the user may be prompted to choose the preferred application.
4
TensorFlow
1: The Single Shot Detector (SSD)
This section describes our proposed SSD framework for detection (Sec. 2.1) and the
associated training methodology (Sec. 2.2). Afterwards, Sec. 3 presents dataset-
specific model details and experimental results. Fig. 1: SSD framework. (a) SSD
only needs an input image and ground truth boxes for each object during training. In
a convolutional fashion, we evaluate a small set (e.g. 4) of default boxes of different
aspect ratios at each location in several feature maps with different scales (e.g. 8 × 8
and 4 × 4 in (b) and (c)). For each default box, we predict both the shape offsets and
the confidences for all object categories ((c1, c2, · · · , cp)). At training time, we first
match these default boxes to the ground truth boxes. For example, we have matched
two default boxes with the cat and one with the dog, which are treated as positives
and the rest as negatives. The model loss is a weighted sum between localization
loss (e.g. Smooth L1 [6]) and confidence loss (e.g. Softmax).
4
2: Model
The SSD approach is based on a feed-forward convolutional network that produces
a fixed-size collection of bounding boxes and scores for the presence of object
class
instances in those boxes, followed by a non-maximum suppression step to produce
the final detections. The early network layers are based on a standard architecture
used for high quality image classification (truncated before any classification
layers), which we will call the base network2 We then add auxiliary structure to the
network to produce detections with the following key features:
Multi-scale feature maps for detection We add convolutional feature layers to the
end of the truncated base network. These layers decrease in size progressively and
allow predictions of detections at multiple scales. The convolutional model for
predicting detections is different for each feature layer (cf Overfeat[4] and YOLO[5]
that operate on a single scale feature map).
Convolutional predictors for detection Each added feature layer (or optionally an
existing feature layer from the base network) can produce a fixed set of detection
predictions using a set of convolutional filters. These are indicated on top of the SSD
network architecture in Fig. 2. For a feature layer of size m × n with p channels, the
basic element for predicting parameters of a potential detection is a 3 × 3 × p small
kernel that produces either a score for a category, or a shape offset relative to the
default box coordinates. At each of the m × n locations where the kernel is applied,
it produces an output value.
Fig. 2: A comparison between two single shot detection models: SSD and YOLO
[5]. Our SSD model adds several feature layers to the end of a base network, which
predict the offsets to default boxes of different scales and aspect ratios and their
associated confidences.
4
3: Face Attributes
Another use-case for MobileNet is compressing large systems with unknown or
esoteric training procedures. In a face attribute classification task, we demonstrate a
synergistic relationship between MobileNet and distillation [9], a knowledge transfer
technique for deep networks. We seek to reduce a large face attribute classifier with
75 million parameters and 1600 million Mult-Adds. The classifier is trained on a
multi-attribute dataset similar to YFCC100M [32]. We distill a face attribute
classifier using the MobileNet architecture. Distillation [9] works by training the
classifier to emulate the outputs of a larger model2 instead of the ground-truth
labels, hence enabling training from large (and potentially infinite) unlabeled
datasets. Marrying the scalability of distillation training and the parsimonious
parameterization of MobileNet, the end system not only requires no regularization
(e.g. weight-decay and early-stopping), but also demonstrates enhanced
performances. It is evident from Tab. 12 that the MobileNet-based classifier is
resilient to aggressive model shrinking: it achieves a similar mean average precision
across attributes (mean AP) as the in-house while consuming only 1% the Multi-
Adds.
4: Object Detection
MobileNet can also be deployed as an effective base network in modern object
detection systems. We report results for MobileNet trained for object detection on
COCO data based on the recent work that won the 2016 COCO challenge [10]. In
table 13, MobileNet is compared to VGG and Inception V2 [13] under both Faster-
RCNN [23] and SSD [21] framework. In our experiments, SSD is evaluated with
300 input resolution (SSD 300) and Faster-RCNN is compared with both 300 and
4
600 input resolution (FasterRCNN 300, Faster-RCNN 600). The Faster-RCNN
model evaluates 300 RPN proposal boxes per image. The models are trained on
COCO train+val excluding 8k minival images Table 12. Face attribute classification
using the MobileNet architecture. Each row corresponds to a different hyper-
parameter setting (width multiplier α and image resolution).
4
MobileNetV1 model
1: Introduction
Convolutional neural networks have become ubiquitous in computer vision ever
since AlexNet [19] popularized deep convolutional neural networks by winning the
ImageNet Challenge: ILSVRC 2012 [24]. The general trend has been to make
deeper and more complicated networks in order to achieve higher accuracy [27, 31,
29, 8]. However, these advances to improve accuracy are not necessarily making
networks more efficient with respect to size and speed. In many real world
applications such as robotics, self-driving car and augmented reality, the recognition
tasks need to be carried out in a timely fashion on a computationally limited
platform. This paper describes an efficient network architecture and a set of two
hyper-parameters in order to build very small, low latency models that can be easily
matched to the design requirements for mobile and embedded vision applications.
Section 2 reviews prior work in building small models. Section 3 describes the
MobileNet architecture and two hyper-parameters width multiplier and resolution
multiplier to define smaller and more efficient MobileNets. Section 4 describes
experiments on ImageNet as well a variety of different applications and use cases.
Section 5 closes with a summary and conclusion.
2. Prior Work
There has been rising interest in building small and efficient neural networks in the
recent literature, e.g. [16, 34, 12, 36, 22]. Many different approaches can be
generally categorized into either compressing pretrained networks or training small
networks directly. This paper proposes a class of network architectures that allows a
4
model developer to specifically choose a small network that matches the resource
restrictions (latency, size) for their application. MobileNets primarily focus on
optimizing for latency but also yield small networks. Many papers on small
networks focus only on size but do not consider speed.
MobileNets are built primarily from depthwise separable convolutions initially
introduced in [26] and subsequently used in Inception models [13] to reduce the
computation in the first few layers. Flattened networks [16] build a network out of
fully factorized convolutions and showed the potential of extremely factorized
networks. Independent of this current paper, Factorized Networks[34] introduces a
similar factorized convolution as well as the use of topological connections.
Subsequently, the Xception network [3] demonstrated how to scale up depthwise
separable filters to out perform Inception V3 networks. Another small network is
Squeezenet [12] which uses a bottleneck approach to design a very small network.
Other reduced computation networks include structured transform networks [28] and
deep fried convnets [37]. A different approach for obtaining small networks is
shrinking, factorizing or compressing pretrained networks. Compression based on
product quantization [36], hashing
4
3. MobileNet Architecture
In this section we first describe the core layers that MobileNet is built on which are
depthwise separable filters. We then describe the MobileNet network structure and
conclude with descriptions of the two model shrinking hyperparameters width
multiplier and resolution multiplier.
3.1. Depthwise Separable Convolution
The MobileNet model is based on depthwise separable convolutions which is a form
of factorized convolutions which factorize a standard convolution into a depthwise
convolution and a 1×1 convolution called a pointwise convolution. For MobileNets
the depthwise convolution applies a single filter to each input channel. The
pointwise convolution then applies a 1×1 convolution to combine the outputs the
depthwise convolution. A standard convolution both filters and combines inputs into
a new set of outputs in one step. The depthwise separable convolution splits this into
two layers, a separate layer for filtering and a separate layer for combining. This
factorization has the effect of drastically reducing computation and model size.
Figure 2 shows how a standard convolution 2(a) is factorized into a depthwise
convolution 2(b) and a 1 × 1 pointwise convolution 2(c).
3.2. Network Structure and Training
The MobileNet structure is built on depthwise separable convolutions as mentioned
in the previous section except for the first layer which is a full convolution. By
defining the network in such simple terms we are able to easily explore network
topologies to find a good network. The MobileNet architecture is defined in Table 1.
All layers are followed by a batchnorm [13] and ReLU nonlinearity with the
4
exception of the final fully connected layer which has no nonlinearity and feeds into
a softmax layer for classification. Figure 3 contrasts a layer with regular
convolutions, batchnorm and ReLU nonlinearity to the factorized layer with
depthwise convolution, 1 × 1 pointwise convolution as well as batchnorm and ReLU
after each convolutional layer. Down sampling is handled with strided convolution
in the depthwise convolutions as well as in the first layer. A final average pooling
reduces the spatial resolution to 1 before the fully connected layer. Counting
depthwise and pointwise convolutions as separate layers, MobileNet has 28 layers. It
is not enough to simply define networks in terms of a small number of Mult-Adds. It
is also important to make sure these operations can be efficiently implementable. For
Figure 3. Left: Standard convolutional layer with batchnorm and ReLU. Right:
Depthwise Separable convolutions with Depthwise and Pointwise layers followed by
batchnorm and ReLU. instance unstructured sparse matrix operations are not
typically faster than dense matrix operations until a very high level of sparsity. Our
model structure puts nearly all of the computation into dense 1 × 1 convolutions.
This can be implemented with highly optimized general matrix multiply (GEMM)
functions. Often convolutions are implemented by a GEMM but require an initial
reordering in memory called im2col in order to map it to a GEMM. For instance,
this approach is used in the popular Caffe package [15]. 1×1 convolutions do not
require this reordering in memory and can be implemented directly with GEMM
which is one of the most optimized numerical linear algebra algorithms.
MobileNet spends 95% of it’s computation time in 1 × 1 convolutions which also
has 75% of the parameters as can be seen in Table 2. Nearly all of the additional
parameters are in the fully connected layer.
4
MobileNet models were trained in TensorFlow [1] using RMSprop [33] with
asynchronous gradient descent similar to Inception V3 [31]. However, contrary to
training large models we use less regularization and data augmentation techniques
because small models have less trouble with overfitting. When training MobileNets
we do not use side heads or label smoothing and additionally reduce the
amount image of distortions by limiting the size of small crops that are used in large
Inception training [31]. Additionally, we found that it was important to put very little
or no weight decay (l2 regularization) on the depthwise filters since their are so few
parameters in them. For the ImageNet benchmarks in the next section all models
were trained with same training parameters regardless of the size of the model.
3.3. Width Multiplier: Thinner Models
Although the base MobileNet architecture is already small and low latency, many
times a specific use case or application may require the model to be smaller and
faster. In order to construct these smaller and less computationally expensive models
we introduce a very simple parameter α called width multiplier. The role of the
width multiplier α is to thin a network uniformly at each layer. For a given layer and
width multiplier α, the number of input channels M becomes αM and the number of
output channels N becomes αN.
The computational cost of a depthwise separable convolution with width multiplier α
is: DK · DK · αM · DF · DF + αM · αN · DF · DF (6) where α ∈ (0, 1] with typical
settings of 1, 0.75, 0.5 and 0.25. α = 1 is the baseline MobileNet and α < 1 are
reduced MobileNets. Width multiplier has the effect of reducing computational cost
and the number of parameters quadratically by roughly α 2 Width multiplier can be
applied to any model structure to define a new smaller model with a reasonable
4
accuracy, latency and size trade off. It is used to define a new reduced structure that
needs to be trained from scratch.
4. Experiments
In this section we first investigate the effects of depthwise convolutions as well as
the choice of shrinking by reducing the width of the network rather than the number
of layers. We then show the trade offs of reducing the network based on the two
hyper-parameters: width multiplier and resolution multiplier and compare results to
a number of popular models. We then investigate MobileNets applied to a number of
different applications.
4.1. Model Choices
First we show results for MobileNet with depthwise separable convolutions
compared to a model built with full convolutions. In Table 4 we see that using
depthwise separable convolutions compared to full convolutions only reduces
accuracy by 1% on ImageNet was saving tremendously on mult-adds and
parameters. We next show results comparing thinner models with width multiplier to
shallower models using less layers. To make MobileNet shallower, the 5 layers of
separable filters with feature size 14 × 14 × 512 in Table 1 are removed. Table 5
shows that at similar computation and number of parameters, that making
MobileNets thinner is 3% better than making them shallower.
4.2. Model Shrinking Hyperparameters
Table 6 shows the accuracy, computation and size trade offs of shrinking the
MobileNet architecture with the width multiplier α. Accuracy drops off smoothly
5
until the architecture is made too small at α = 0.25. Table 7 shows the accuracy,
computation and size trade offs for different resolution multipliers by training
MobileNets with reduced input resolutions. Accuracy drops off smoothly across
resolution. Figure 4 shows the trade off between ImageNet Accuracy and
computation for the 16 models made from the cross product of width multiplier α ∈
{1, 0.75, 0.5, 0.25} and resolutions {224, 192, 160, 128}. Results are log linear with
a jump when models get very small at α = 0.25.
4.3. Fine Grained Recognition
We train MobileNet for fine grained recognition on the Stanford Dogs dataset [17].
We extend the approach of [18] and collect an even larger but noisy training set than
[18] from the web. We use the noisy web data to pretrain a fine grained dog
recognition model and then fine tune the model on the Stanford Dogs training set.
Results on Stanford Dogs test set are in Table 10. MobileNet can almost achieve the
state of the art results from [18] at greatly reduced computation and size.
4.4. Large Scale Geolocalizaton
PlaNet [35] casts the task of determining where on earth a photo was taken as a
classification problem. The approach divides the earth into a grid of geographic cells
that serve as the target classes and trains a convolutional neural network on millions
of geo-tagged photos. PlaNet has been shown to successfully localize a large variety
of photos and to outperform Im2GPS [6, 7] that addresses the same task. We re-train
PlaNet using the MobileNet architecture on the same data. While the full PlaNet
model based on the Inception V3 architecture [31] has 52 million parameters and
5.74 billion mult-adds. The MobileNet model has only 13 million parameters with
5
the usual 3 million for the body and 10 million for the final layer and 0.58 Million
mult-adds. As shown in Tab. 11, the MobileNet version delivers only slightly
decreased performance compared to PlaNet despite being much more compact.
Moreover, it still outperforms Im2GPS by a large margin.
5. Conclusion
We proposed a new model architecture called MobileNets based on depthwise
separable convolutions. We investigated some of the important design decisions
leading to an efficient model. We then demonstrated how to build smaller and faster
MobileNets using width multiplier and resolution multiplier by trading off a
reasonable amount of accuracy to reduce size and latency. We then compared
different MobileNets to popular models demonstrating superior size, speed and
accuracy characteristics. We concluded by demonstrating MobileNet’s effectiveness
when applied to a wide variety of tasks. As a next step to help adoption and
exploration of MobileNets, we plan on releasing models in Tensor Flow.
5
Results and Evaluation
Performance Evaluation of the Object Detection Model on Android:
Object Detection Accuracy:
The object detection model exhibited satisfactory accuracy in identifying objects
within images captured by Android devices. Extensive testing across various
scenarios and image types demonstrated consistent and reliable detection results.
Processing Time:
The processing time for object detection varied depending on the complexity of the
image and the computational resources of the device. On average, the model
achieved real-time or near-real-time performance, with minimal latency in detecting
objects.
Resource Consumption:
Memory usage and battery consumption were monitored during testing to assess the
model's impact on device resources. The model demonstrated efficient resource
utilization, with acceptable levels of memory usage and minimal impact on battery
life.
Comparison with Existing Solutions (if Applicable):
Performance Benchmarking:
The performance of the object detection model on Android was compared with
existing solutions and benchmarks available in the literature. While direct
comparisons may vary depending on the specific use case and dataset, our model
generally exhibited competitive performance in terms of accuracy and speed.
5
Advantages Over Alternatives:
Our solution offers the advantage of being tailored specifically for Android devices,
leveraging optimizations and platform-specific features for optimal performance and
user experience. Additionally, the integration with TensorFlow Object Detection
API provides a robust and well-supported framework for object detection tasks.
User Feedback (if Available):
Usability and User Experience: Initial user feedback highlighted the application's
intuitive interface and ease of use, particularly in selecting images and viewing
object detection results. Users appreciated the real-time feedback provided during
the object detection process, enhancing their overall experience with the application.
Feature Requests and Suggestions: Some users provided suggestions for additional
features or improvements, such as support for custom object categories, offline
mode, and performance optimizations. User feedback will be considered for future
updates and enhancements to the application.
5
Future Enhancements
1. Custom Object Detection:
Implement the capability for users to train custom object detection models directly
within the application. Allow users to define and label their own object categories,
enabling personalized object detection for specific use cases.
2. Offline Mode:
Introduce offline mode functionality, allowing users to perform object detection
without requiring an internet connection. Incorporate on-device models or caching
mechanisms to enable object detection in environments with limited or no network
connectivity.
3. Enhanced Object Tracking:
Expand the object detection capabilities to include object tracking, enabling users to
track the movement and trajectory of detected objects over time. Implement
advanced tracking algorithms such as Kalman filters or deep learning-based trackers
for improved object tracking accuracy.
4. Augmented Reality Integration:
Integrate augmented reality (AR) features to overlay information about detected
objects directly onto the camera view. Enable users to interact with detected objects
in real-time, such as accessing additional information, viewing related content, or
triggering actions.
5. Accessibility Features:
Enhance accessibility features within the application to cater to users with visual
impairments or disabilities. Implement features such as voice-guided object
detection, text-to-speech functionality, and compatibility with screen readers for
5
improved accessibility.
Suggestions for Further Research:
1. Real-Time Performance Optimization:
Investigate methods for further optimizing the object detection model's performance
on mobile devices to achieve even faster processing times and lower resource
consumption. Explore techniques such as model quantization, model pruning, and
hardware acceleration to improve inference speed and efficiency.
2. Fine-Grained Object Recognition:
Research techniques for fine-grained object recognition to enable the identification
of specific object attributes, variations, or subcategories within detected objects.
Explore methods for detecting and classifying objects with subtle differences or
variations, such as species of plants or breeds of animals.
3. Contextual Understanding:
Investigate approaches for enhancing the application's contextual understanding
capabilities, enabling it to infer the relationships between detected objects and their
surrounding environment. Explore techniques for scene understanding, semantic
segmentation, and contextual reasoning to provide deeper insights into the detected
objects' context and significance.
4. Collaborative Object Detection:
Explore collaborative object detection frameworks that leverage crowd-sourced data
and user feedback to improve the accuracy and coverage of object detection models.
Develop mechanisms for users to contribute labeled data, corrections, and
annotations to continuously refine and update the object detection capabilities.
5
5. Privacy-Preserving Object Detection:
Research techniques for performing object detection while preserving user privacy
and sensitive information. Investigate methods such as federated learning,
differential privacy, and on-device model training to enable secure and privacy-
preserving object detection in distributed environments.
5
Limitations
1: Performance Constraints:
Object detection algorithms, especially when executed on mobile devices, can be
computationally intensive and may lead to performance issues, particularly on older
or low-end devices. Processing large images or multiple objects simultaneously may
result in longer processing times and increased battery consumption.
2: Model Accuracy:
The accuracy of object detection heavily relies on the quality of the pre-trained
model used in the application. Pre-trained models may not always accurately detect
objects in various real-world scenarios, leading to false positives or missed
detections. Fine-tuning or training custom models specific to certain object
categories may improve accuracy but requires substantial computational resources
and data.
3: Limited Object Recognition:
The object detection model may struggle with identifying objects that are small,
occluded, or have complex backgrounds. Detection performance may vary
depending on the object's size, shape, orientation, and lighting conditions, leading to
inconsistencies in results.
4: Dependency on Internet Connection:
If the app relies on an online search engine for retrieving additional information
about detected objects, it may require a stable internet connection. Lack of internet
connectivity in certain environments or regions could hinder the user's ability to
access contextual information.
5
Conclusion
In conclusion, the Object Detection Android application represents a significant
milestone in the field of computer vision and mobile application development.
Throughout the project, we have explored various aspects of object detection,
implemented a robust Android application using Java and TensorFlow Object
Detection API, and provided users with a convenient means of identifying objects
within images captured by their mobile devices.
The Object Detection Android application holds significant implications for various
domains, including image recognition, augmented reality, e-commerce, and
accessibility. By leveraging the power of object detection technology on mobile
devices, the application empowers users to identify objects in their surroundings,
access relevant information, and enhance their understanding of the world around
them. In conclusion, the Object Detection Android application represents a
culmination of efforts in research, development, and implementation, aimed at
bringing the benefits of object detection technology to mobile devices. As we
continue to evolve and refine the application, we remain committed to advancing the
field of computer vision and delivering impactful solutions that empower users
worldwide.
5
Bibliography
• TensorFlow Object Detection API. (n.d.). Retrieved from

(https://github.com/tensorflow/models/tree/master/research/object_detection)
• TensorFlow: A system for large-scale machine learning. In 12th
{USENIX} Symposium on Operating Systems Design and Implementation
({OSDI} 16) (pp. 265-283).
• MobileNets: Efficient convolutional neural networks for mobile vision

applications. arXiv preprint arXiv:1704.04861.
• Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks

for large-scale image recognition.
• ShuffleNet: An extremely efficient convolutional neural network for
mobile devices. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 6848-6856).

Final Project Report

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Final Project Report

Uploaded by

Copyright:

Available Formats

A PROJECT REPORT

Under the guidance of

Mr. Gill S.S.

Mahatma Gandhi Mission’s

Swami Ramanand Teerth Marathwada University,

ABDUL WAHAB ( ), ABDUL RAHEMAN MOHAMMED AYYUB

SHARIF ( ), SHAIKH UBADA ILYAS ( ) submitted in partial

Dr.Mrs. Kanchan A. Nandedkar Prof. Shirish L. Kotgire

IT, Nanded and Submitted to Swami Ramanand Teerth Marathwada University

all other material used.

ABDUL RAHMAN ABDUL WAHAB

as encouragement in this complete endeavor. words are inadequate to express my

encouragement. My cordial thanks to principal prof. Shirish L. kotgire, Head of

Department Dr. Mrs. Kanchan A. Nandedkar, from MGM’s College of CS and IT

encouragement at every point of time.

ABDUL RAHMAN ABDUL WAHAB

Sr. No Contents Page. No

3 Data Flow Diagram 09-10

4 Literature Review 11-12

5 Design and Implementation 13-14

6 Source code 15-36

8 Tensor Flow 41-44

9 MobileNetV1 model 45-52

10 Results and Evaluation 53-54

11 Future Enhancements 55-57

The "Object Detection" Android application, developed using Java in Android

seeking efficient and reliable object identification capabilities on their mobile

Android Studio, Java Programming Language, TensorFlow Object Detection API,

Image Selection from Files, Camera Integration, Object Detection Model.

Overview of the Project:

The "Object Detection Android App" is an innovative application developed using

advanced object detection algorithms powered by TensorFlow to accurately

recognize and label objects present in the image.

Purpose and Objectives:

The primary purpose of this project is to develop a user-friendly and efficient

accurate identification of objects within images. The main objectives include:

• Providing users with a convenient tool to identify objects in images effortlessly.

Scope and Limitations:

capable of real-time object detection using pre-trained models provided by the

TensorFlow Object Detection API. The application allows users to interactively

camera. However, it's important to note some limitations:

Motivation Behind Choosing Object Detection for Android:

• Rising demand for mobile applications with advanced computer vision

Select Image Capture Image

Image Input Module

End Process Search Result

capturing new images using the camera.

Object Detection Process:

The selected/captured image undergoes object detection using the TensorFlow

Object Detection API.

Display Detected Objects:

user on the screen.

Search Object Name on Engine:

Background Research on Object Detection Techniques:

Object detection is a fundamental task in computer vision with numerous

applications, ranging from security surveillance to autonomous vehicles. Various

Traditional methods such as Haar cascades, Histogram of Oriented Gradients

(HOG), and feature-based methods. Modern deep learning-based approaches like

Overview of Existing Object Detection Frameworks and Libraries:

detection tasks, catering to different programming languages and platforms. Some

notable ones include:

TensorFlow Object Detection API: Developed by Google, TensorFlow provides a

comprehensive framework for training and deploying object detection models. It

provides various object detection algorithms and pre-trained models.