
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

JNANA SANGAMA, BELAGAVI-590018

A Technical Seminar Report On

“Transfer Learning For Automatic Image Orientation Detection
Using Deep Learning And Logistic Regression”

Submitted in partial fulfillment for the award of the degree of


Bachelor of Engineering
In
Computer Science & Engineering
Of Visvesvaraya Technological University, Belagavi
Submitted by:
Alluru Lakshmi Lavanya
1AJ20CS008

UNDER THE GUIDANCE OF


P GopalaKrishna
Asst. Prof, Department of CSE
CITNC

CAMBRIDGE INSTITUTE OF TECHNOLOGY NORTH CAMPUS


KUNDANA, BENGALURU-562110
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

2023-2024
CAMBRIDGE INSTITUTE OF TECHNOLOGY
NORTH CAMPUS
Kundana , Bengaluru – 562110
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the technical seminar entitled “Transfer Learning For Automatic
Image Orientation Detection Using Deep Learning And Logistic Regression” has
been carried out by Alluru Lakshmi Lavanya (1AJ20CS008), a bona fide student of Cambridge
Institute of Technology North Campus, in partial fulfillment of the requirements for the degree
of Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological
University, Belagavi, during the year 2023-24.

Signature of Guide Signature of HOD


P GopalaKrishna Dr. Kavitha
Asst.Prof, Guide Prof and HOD
Dept of CSE , CITNC Dept of CSE, CITNC
DECLARATION

I, Alluru Lakshmi Lavanya, bearing USN 1AJ20CS008, a student of VIII semester, Computer
Science and Engineering, Cambridge Institute of Technology, hereby declare that the seminar entitled
“Transfer Learning For Automatic Image Orientation Detection Using Deep Learning And
Logistic Regression” has been carried out by me and submitted in partial fulfilment of the course
requirements of VIII semester Bachelor of Engineering in Computer Science and Engineering as
prescribed by Visvesvaraya Technological University, Belagavi, during the academic year 2023-2024.

I also declare that, to the best of my knowledge and belief, the work reported here does not form part
of any other report on the basis of which a degree or award was conferred on an earlier occasion on
me or on any other student.

Date:
Place: Bangalore

(Student Signature)
Alluru Lakshmi Lavanya
1AJ20CS008
ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task would be
incomplete without mention of the people who made it possible, whose constant guidance
and encouragement crowned my efforts with success. I take this opportunity to express my
deepest gratitude and appreciation to all those who helped me, directly or indirectly, toward
the successful completion of this seminar work.

I express my deep gratitude to our institute, Cambridge Institute of Technology North
Campus, Bengaluru, which provided an opportunity and platform for fulfilling my dreams
and reaching my goal.

I take it as a great privilege to express my deep gratitude to Dr. Sendamarai P, Principal,
Cambridge Institute of Technology North Campus, Bengaluru, for supporting me and giving
me the opportunity to carry out my seminar.

I would like to express my sincere thanks to my guide, Prof. P GopalaKrishna, Assistant
Professor, Department of Computer Science & Engineering, Cambridge Institute of
Technology North Campus, Bengaluru, for her valuable guidance, encouragement, and
suggestions, which helped me a lot in the completion of the technical seminar.

I sincerely thank our respected HOD, Dr. Kavitha C, Professor and Head of the Department of
Computer Science and Engineering, CITNC, for her constant support, motivation, and
suggestions in completing my seminar work.

I also thank all teaching and non-teaching staff and my parents for their kind cooperation. I
also extend my heartfelt thanks to all who directly or indirectly encouraged me to complete
this seminar.

Alluru Lakshmi Lavanya


1AJ20CS008

ABSTRACT

Humans possess an extraordinary ability to transfer knowledge. When faced with a
new problem or challenge, we naturally tap into our reservoir of past experiences and
apply relevant expertise to resolve it. This ability is a powerful tool that allows
us to navigate tasks with ease and efficiency. For example, if you know how to ride a bicycle
and want to learn to ride a motorbike, your experience with the bicycle will help you
with balancing and steering, making the learning process smoother
than starting from scratch. In the digital era, image processing is used increasingly often in
all aspects of modern life. Image correction can be handled by algorithms based
on computer vision, which can improve the effect of the correction on the image as a
whole. To harness the power of deep neural networks, we applied a convolutional
neural network model pre-trained on the ImageNet database for feature extraction.
Then, we built a multi-class logistic regression classifier that outputs the probabilities
of the four image orientations: 0° (no rotation), 90°, 180°, and 270°.


TABLE OF CONTENTS

DESCRIPTION

ACKNOWLEDGEMENT
ABSTRACT

CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE SURVEY
CHAPTER 3 PROBLEM STATEMENT
CHAPTER 4 ARCHITECTURE
    4.1 SYSTEM ARCHITECTURE
    4.2 PROPOSED SYSTEM ARCHITECTURE
    4.3 FLOW ARCHITECTURE
CHAPTER 5 METHODOLOGY
    5.1 TRANSFER LEARNING
    5.2 CONVOLUTIONAL NEURAL NETWORKS (CNN)
    5.3 DEEP LEARNING
    5.4 LOGISTIC REGRESSION
    5.5 COMPUTER VISION
CHAPTER 6 RESULT
CONCLUSION
FUTURE SCOPE
BIBLIOGRAPHY

CHAPTER 1
INTRODUCTION

Image orientation angle detection is a challenging task for a machine, because it has to
learn the features of an image in such a way that it can detect the arbitrary angle by which
the image is rotated. Although some modern cameras with inertial sensors can correct image
orientation in 90-degree steps, this function is generally not used. In this paper, we
propose a method to detect the orientation angle of a captured image as a post-processing
step, applicable to images captured with any camera (older or newer models) at any tilt angle
(between 0 and 359 degrees). Once the orientation angle is detected, we rotate the image by the
opposite angle to correct its orientation. For a human, it is fairly easy to
estimate the orientation angle of an image from the elements present in it,
but for a machine an image is just a matrix of pixel values. Convolutional neural
networks have made it possible to build an image orientation angle detection model that
predicts the orientation angle accurately enough to outperform the image orientation
techniques previously published in the community.

Orientation correction has long been studied for document analysis.
These methods rely on the special structure of document images, such as the precise shape of
letters or the layout of text in lines. Natural images offer no such structure,
so it is hard for these methods to work properly on them. This paper is
inspired by the work of Fischer et al. on orientation angle estimation
for natural images. For most of this paper, we therefore carry out a comparative study
with Fischer et al. Using a more recent CNN architecture, a modified loss function, and a
different optimization technique on the same COCO dataset, we obtain a model that
outperforms Fischer et al. by a significant margin and gives a state-of-the-art result
on this problem.

Other approaches address related problems. One exploits the interpolation artifacts
introduced when a digital image is rotated; however, it does not work for images that were
not taken upright. Solanki et al. predicted the rotation of printed images by analyzing the pattern
of printer dots, but this method does not work on digital images. Horizon detection is
a special kind of image angle detection, but it depends strongly on a horizon being present
in the image, and most images do not contain one. With
advances in the digital imaging industry, photography, and image understanding, there is
significant demand for digital imagery storage, processing, printing, and retrieval tools.
These tools require information about the orientation of the image to
process and display it correctly. A picture captured with a digital camera is normally
stored in one of four orientations (0°, 90°, 180°, or 270°), so once the rotation has been
detected as a multiple of 90°, correcting the image orientation is straightforward.
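The correction step for a detected multiple of 90° can be sketched as follows. This is a minimal illustration, not the implementation used in the work: the image is assumed to be a plain list of pixel rows, and the helper names `rotate90_ccw`, `rotate`, and `correct_orientation` are hypothetical.

```python
def rotate90_ccw(image):
    """Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotate(image, angle):
    """Rotate counter-clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    for _ in range((angle // 90) % 4):
        image = rotate90_ccw(image)
    return image


def correct_orientation(image, detected_angle):
    """Undo a detected rotation by applying the complementary rotation."""
    return rotate(image, (360 - detected_angle) % 360)


upright = [[1, 2, 3],
           [4, 5, 6]]
# Simulate a photo that was stored rotated by 270 degrees, then repair it.
restored = correct_orientation(rotate(upright, 270), 270)
```

Because the four orientations form a cycle, rotating by 360° minus the detected angle always returns the upright image.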

Today's smartphones and digital cameras have a built-in orientation sensor that
tracks the camera's direction while a picture is taken and stores it in the image's EXIF
metadata. In cheap digital cameras this information is sometimes missing, and it can be erased
by other applications. Detecting image orientation is a difficult task because the content
of digital images is highly variable. The motivation behind this work comes from the
necessity of having correctly oriented images in many critical fields, where
misoriented images cause serious problems. Detecting image orientation is crucial
in domains such as medical diagnosis, robot-assisted automatic
intervention systems (RAIS), authentication systems, and face detection. For instance, in
magnetic resonance imaging (MRI), the position and orientation of slice groups are
critical for achieving high diagnostic image quality and meeting the needs of various clinical
workflows.

Automatic robot-assisted intervention systems (RAIS), such as bronchoscopy, encounter
many bronchial branches with standard orientations. A system that can detect
image orientation during an intervention can significantly help the surgeon avoid
fatigue and minimize mistakes. Most biometric personal authentication systems
that use fingerprints rely on pattern-based matching,
because the matching algorithm requires the pattern's size, type, and orientation in the
aligned fingerprint image. Therefore, a method for detecting the orientation of a
fingerprint image considerably improves the accuracy of these systems.

A further significant application of image orientation detection can be found
in face detection. Images shown upside down to a face detection system
dramatically decrease its performance compared with images shown the right side up.
Thus, a preprocessing system that detects face orientation and corrects it before the image
is sent to a face recognition system can improve detection performance and help security guards
prevent dishonest behavior.

CHAPTER 2
LITERATURE SURVEY

1. Automatic Image Orientation Detection

• Authors: Aditya Vailaya, HongJiang Zhang, and Anil Jain
• Content-based image organization and retrieval has emerged as an important
area in computer vision and multimedia computing due to technological
advances in digital imaging, storage, and networking.
• With the development of digital photography and inexpensive scanners,
it is possible to store vacation and family photographs on personal
computers.
• Automatic image orientation detection is a very difficult problem. Humans
use object recognition and contextual information to identify the correct
orientation of an image.
• Since information regarding the presence of semantic objects (such as sky,
grass, houses, people, and furniture) and their interrelationships cannot be reliably
extracted from general images, the authors rely on low-level visual features,
e.g. spatial color distributions and texture, for orientation detection.
• All image management systems require information about the true image
orientation. When a user scans a picture, she expects the resulting image to be
displayed in its correct orientation, regardless of the orientation in which the
photograph was placed on the scanner.

2. Research on Traditional Art Image Reconstruction Method Based on Computer
Vision Analysis

• Authors: M. Supriya, Muruganandham B, K. Murugan, Muthu R, Manasa
Krishnan K, and Kuncham Anvesh
• The day-to-day lives of people have entered a new age as a
direct result of these technological breakthroughs.
• Traditional art image works reflect the historic talents and aesthetic forms of a
country, and they incorporate a rich national aesthetic spirit as well as the
value of the times. Automatic image orientation detection is a very difficult
problem.
• Humans use object recognition and contextual information to identify the
correct orientation of an image.
• This was because the environment was only two dimensional. This was the
situation since they had no other choice than to use the side.
• The widespread adoption of network and information technology, as well as
the invention and development of its applications, has improved technology
and the analogue image.

3. Construction of Image Processing Model Based on Computer Vision Algorithm

• Author: Lin Shao
• At present, image classification technologies mainly include annotation-based
methods and content-based methods. The manual annotation method is
subjective, and it does not scale to increasingly large image databases.
• Content-based image classification mainly extracts feature values from
images, such as color, texture, and spatial features.
• With the development of information technology, a large number of images
appear every day.
• How to classify these images and make full use of them is a problem that
needs continuous research and innovation in the field of computer vision.
• Finally, by combining the model with the kernel model, the similarity matrix
between images is obtained using cosine similarity, and it is combined
with the kernel model with a certain weight through normalization.


4. Deep Image Orientation Angle Detection

• Authors: Subhadip Maji and Smarajit Bose
• Image orientation angle detection is a challenging task for a machine,
because the machine has to learn the features of an image in such a way
that it can detect the arbitrary angle by which the image is rotated.
• For a human, it is fairly easy to estimate the
orientation angle of an image from the elements present in the image.

CHAPTER 3
PROBLEM STATEMENT

The task of automatic image orientation detection plays a crucial role in various image
processing applications, such as image classification, object detection, and content-based
image retrieval. However, accurately determining the orientation of an image can be
challenging, especially when dealing with large datasets or diverse image types.
Traditional approaches to image orientation detection often rely on handcrafted features
or shallow learning models, which may struggle to generalize well across different
datasets or image categories.

To address these challenges, this seminar aims to explore the effectiveness of
transfer learning combined with deep learning architectures and logistic regression for
automatic image orientation detection. Transfer learning leverages the pre-trained
knowledge from a source domain (e.g., a large image dataset like ImageNet) to improve
the performance of a target domain task (e.g., image orientation detection). By fine-tuning
a pre-trained deep learning model on the target task data, we can effectively leverage the
learned representations and adapt them to the specific requirements of image orientation
detection. By curating a diverse dataset and employing standard evaluation metrics, the
seminar seeks to develop a robust system that outperforms traditional approaches, thus
facilitating more efficient and accurate image orientation detection crucial for various
image processing applications.

Transfer Learning:

Utilize transfer learning techniques to leverage pre-trained models (e.g., VGG,
ResNet, or Inception) trained on large-scale image datasets for feature extraction.
Fine-tune these models on the target dataset to adapt them to the task of image orientation
detection.

Logistic Regression Classification:

Implement logistic regression as a classification layer on top of the pre-trained
deep learning features to predict the orientation labels of the images. Experiment with
different configurations of logistic regression models to optimize performance.

Dataset Preparation and Evaluation:

Curate a diverse dataset of images with different orientations and labels. Split the
dataset into training, validation, and testing sets. Evaluate the performance of the
proposed approach using standard evaluation metrics such as accuracy, precision, recall,
and F1 score.
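The evaluation metrics named above can be computed directly from the true and predicted orientation labels. The following is a minimal sketch in plain Python; the function name `evaluate` and the toy label lists are illustrative assumptions, not results from this work.

```python
def evaluate(y_true, y_pred, classes=(0, 90, 180, 270)):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class


# Hypothetical labels: one upright image is wrongly predicted as 90 degrees.
y_true = [0, 90, 180, 270, 0, 90]
y_pred = [0, 90, 180, 270, 90, 90]
accuracy, per_class = evaluate(y_true, y_pred)
```

Reporting the metrics per orientation, rather than accuracy alone, shows which rotations the classifier confuses with each other.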

Existing System:

In our work, we strive to address all these issues by applying deep learning and
taking advantage of the computational capacity of convolutional neural networks for
feature extraction instead of low-level feature extraction and semantic-based image
extraction. We demonstrate the importance of the transfer learning technique in
improving our image orientation detection task. Transfer learning involves training a base
network on a given dataset and then transferring the learned features to another network
for training.

Advantages:

• Reduced Training Time: Initializing the model from pre-trained weights often leads to
faster convergence during training, reducing the overall time compared with training the
model from scratch.
• Improved Performance: By using the pre-trained representations as a starting point, the
model can achieve better performance on tasks related to image orientation detection
with less data.

Disadvantages:

• Limited Flexibility: While transfer learning provides a convenient starting point for
model training, it may not always capture all the nuances of the target task.
• Complexity: Managing the interactions between different components of the model,
such as feature extraction layers and logistic regression classifiers, can be challenging
and may require careful tuning of hyperparameters.

CHAPTER 4
ARCHITECTURE

4.1 SYSTEM ARCHITECTURE

The system architecture combines transfer learning, deep learning, and logistic regression
for automatic image orientation detection. Preprocessing ensures data uniformity,
followed by fine-tuning pretrained deep learning models to extract features. Logistic
regression then predicts orientation labels. The dataset is split for training, validation, and
testing. Once trained, the system can be deployed into various applications, ensuring
scalability and flexibility.


Fig 1: System Architecture

4.2 PROPOSED SYSTEM ARCHITECTURE

The proposed system architecture for automatic image orientation detection integrates
transfer learning with logistic regression to create a robust solution. It begins with
preprocessing the input dataset to ensure consistency in data quality and format.
Leveraging pretrained deep learning models like VGG, ResNet, or Inception, transfer
learning fine-tunes these models on the target dataset, extracting relevant features crucial
for orientation detection.
Logistic regression then acts as a classification layer, utilizing these features to
predict orientation labels accurately. The dataset undergoes splitting for training,
validation, and testing, facilitating model training, hyperparameter tuning, and
performance evaluation, respectively. Once trained and validated, the system is ready for
seamless deployment and integration into various image processing applications,
promising scalability and flexibility. Incorporating a feedback loop ensures continuous
refinement and enhancement of the model's accuracy over time. This architecture aims to
deliver a reliable and efficient solution for automatic image orientation detection,
addressing diverse application needs effectively.

Fig 2: Proposed System Architecture

4.3 FLOW ARCHITECTURE

The flow architecture for automatic image orientation detection encompasses a systematic
process to ensure accurate and efficient prediction. Beginning with data acquisition, the
system gathers a diverse dataset comprising images of varying orientations. Subsequently,
preprocessing standardizes the dataset, ensuring uniformity in size, format, and quality,
thus preparing it for feature extraction. Leveraging transfer learning, pretrained deep
learning models extract pertinent features from the images.

These features serve as input to a logistic regression classifier, which predicts
orientation labels based on the learned representations. The dataset is then divided into
training, validation, and testing sets for model training, hyperparameter tuning, and
evaluation. Upon successful training and validation, the model is deployed and integrated
into various applications, offering seamless integration and scalability. Additionally, a
feedback loop mechanism allows for continuous refinement of the model's performance
over time, ensuring adaptability to evolving datasets and requirements. This flow
architecture facilitates robust and accurate automatic image orientation detection,
addressing diverse application needs effectively.
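The data side of this flow (building samples with known orientations, then dividing them into training, validation, and testing sets) can be sketched as follows. This is an illustrative outline only: the tiny 2×2 "images", the 70/15/15 split, and the helper names are assumptions, since the report does not state these details.

```python
import random


def rotate90_ccw(image):
    """Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def make_rotated_samples(image):
    """Return four (image, angle) pairs, one per orientation class."""
    samples, current = [], image
    for angle in (0, 90, 180, 270):
        samples.append((current, angle))
        current = rotate90_ccw(current)
    return samples


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle reproducibly and split into train / validation / test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Stand-in "upright images": 25 tiny 2x2 pixel grids.
upright_images = [[[i, i + 1], [i + 2, i + 3]] for i in range(25)]
dataset = [s for img in upright_images for s in make_rotated_samples(img)]
train_set, val_set, test_set = split_dataset(dataset)
```

The sketch follows the order described in the text. A common variation is to split the upright images first and rotate afterwards, so that all four copies of a photo land in the same subset.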


Fig 3: Flow Architecture

CHAPTER 5
METHODOLOGY

5.1 Transfer Learning

In machine learning, transfer learning is the reuse of a pre-trained model on a new
problem. The machine uses the knowledge learned from a previous task to
improve its predictions on a new one. For example, the knowledge gained while training a
classifier to predict whether an image contains food could be used to recognize drinks.

In transfer learning, the knowledge of an already trained machine learning model is
transferred to a different but closely related problem. For example, if you
trained a simple classifier to predict whether an image contains a backpack, you could use
the model's training knowledge to identify other objects such as sunglasses.

Fig 4: Transfer Learning

In computer vision, neural networks typically detect edges in the first layers,
shapes in the middle layers, and task-specific features in the later layers. In transfer
learning, the early and middle layers are reused as they are and only the later layers are
retrained, so the model benefits from the labelled data of the task it was originally trained
on. Transfer learning offers a
number of advantages, the most important of which are reduced training time, improved
neural network performance (in most circumstances), and no need for a large amount
of data. Training a neural model from scratch typically requires a lot of data, and access
to that data isn't always possible.
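The idea of reusing frozen layers and retraining only the final ones can be illustrated with a small NumPy sketch. Everything here is a stand-in: the "pre-trained" layer is a fixed random projection rather than a real ImageNet model, the data are random, and the names `W_frozen`, `extract_features`, and `W_head` are hypothetical. Only the four-class logistic regression head is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen early/middle layers of a pre-trained network.
W_frozen = rng.normal(scale=0.1, size=(8, 16))


def extract_features(x):
    """Frozen feature extractor: linear projection followed by ReLU."""
    return np.maximum(x @ W_frozen, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


# Toy data: 40 inputs with labels 0..3 (standing for 0, 90, 180, 270 degrees).
X = rng.normal(size=(40, 8))
y = rng.integers(0, 4, size=40)
features = extract_features(X)  # computed once; the extractor never changes

# Trainable head: multi-class logistic regression on the frozen features.
W_head, b_head = np.zeros((16, 4)), np.zeros(4)
frozen_before = W_frozen.copy()
initial_loss = cross_entropy(softmax(features @ W_head + b_head), y)
for _ in range(200):
    grad = softmax(features @ W_head + b_head)
    grad[np.arange(len(y)), y] -= 1.0   # d(loss)/d(logits) = probs - one_hot
    grad /= len(y)
    W_head -= 0.5 * features.T @ grad   # gradient step on the head only
    b_head -= 0.5 * grad.sum(axis=0)
final_loss = cross_entropy(softmax(features @ W_head + b_head), y)
```

Because the extractor is frozen, its features can be computed once and cached, which is one reason this style of transfer learning trains quickly.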


Fig 5: Neural Model

5.2 Convolutional Neural Networks (CNN)

In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep
neural networks most commonly applied to analyzing visual imagery. Neural networks are
usually thought of in terms of general matrix multiplications, but a ConvNet instead relies
on a special operation called convolution.

In mathematics, convolution is an operation on two functions
that produces a third function expressing how the shape of one is modified by the
other. We do not need to go deep into the mathematics to understand what a
CNN is or how it works. The bottom line is that the role of the ConvNet is to reduce the
images into a form that is easier to process, without losing features that are critical for
getting a good prediction.
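A small example may make the convolution operation concrete. The sketch below slides a kernel over an image without padding; as in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The function name `conv2d_valid` and the toy image are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# A 3x4 image with a vertical edge between the dark (0) and bright (1) halves.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge_kernel = [[-1, 1]]  # responds where brightness rises left to right
feature_map = conv2d_valid(image, vertical_edge_kernel)
```

The resulting feature map is non-zero only in the column where the edge sits, which is the kind of low-level feature the first layers of a CNN tend to pick up.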


Fig 6: Convolutional Neural Networks

Fig 7: CNN Featuring and Data

THE NON-LINEAR LAYER:

A non-linear layer is added after each convolution operation. It applies an activation
function that introduces non-linearity; without it, the network would not be
expressive enough to model the response variable.
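As an illustration, the rectified linear unit (ReLU) is one widely used activation function; whether it matches the exact activation in every model discussed here is an assumption. It keeps positive responses and sets negative ones to zero.

```python
def relu(feature_map):
    """Apply max(0, v) to every value of a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


activated = relu([[-1.0, 2.0],
                  [0.5, -3.0]])
```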

THE POOLING LAYER:

The pooling layer follows the non-linear layer. It works on the image's
width and height, performing a down-sampling operation on them. As a result, the
size of the image is reduced. This means that if some features were already
recognized during the previous convolution operation, a detailed image is no longer
required for further processing and is reduced into smaller images.
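Down-sampling can be sketched with max pooling, a common choice that keeps the largest value in each window. The 2×2 window, the stride equal to the window size, and the function name `max_pool2d` are assumptions made for this example.

```python
def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in cols]
            for i in rows]


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
pooled = max_pool2d(feature_map)  # 4x4 input becomes a 2x2 output
```

Each output value summarizes a 2×2 region, so the width and height are halved while the strongest responses are preserved.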

Fig 8: The Pooling Layer

FULLY CONNECTED LAYER:

A fully connected layer is attached after the sequence of
convolution, non-linear, and pooling layers. This layer receives the convolutional
network's output data. When a fully connected layer is attached to the
network's end, it produces an N-dimensional vector, where N is the number of
classes from which the model chooses the needed class.
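For the four orientation classes, N = 4. The sketch below shows a fully connected layer that maps a feature vector to four scores, followed by a softmax that turns the scores into class probabilities. The weights are made-up illustrative numbers, not learned values, and the softmax step is an assumption about how the class is chosen.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one weighted sum (plus bias) per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


angles = [0, 90, 180, 270]
features = [1.0, 2.0]                          # toy feature vector
weights = [[1, 0], [0, 1], [-1, 0], [0, -1]]   # one row per class (illustrative)
bias = [0.0, 0.0, 0.0, 0.0]

probs = softmax(dense(features, weights, bias))
predicted_angle = angles[probs.index(max(probs))]
```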

Fig 9: Fully Connected Layer

5.3 Deep Learning

Deep learning methods aim at learning feature hierarchies with features from
higher levels of the hierarchy formed by the composition of lower-level features.
Automatically learning features at multiple levels of abstraction allows a system
to learn complex functions mapping the input to the output directly from data,
without depending completely on human-crafted features.

Deep learning algorithms seek to exploit the unknown structure in the
input distribution to discover good representations, often at multiple levels, with
higher-level learned features defined in terms of lower-level features.

5.4 Logistic Regression

Logistic regression is used for binary classification. It applies the sigmoid function
to a combination of the independent input variables and produces a probability value
between 0 and 1.

For example, with two classes, Class 0 and Class 1, if the value of the logistic
function for an input is greater than 0.5 (the threshold value), the input belongs to Class 1;
otherwise it belongs to Class 0. It is referred to as regression because it is an extension of
linear regression, but it is mainly used for classification problems.

The logistic regression model transforms the continuous output of the linear regression
function into a categorical output using a sigmoid function, which maps any
real-valued combination of the independent input variables to a value between 0 and 1.
This function is known as the logistic function.

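Let the independent input features be:

$$
X = \begin{bmatrix}
x_{11} & \dots & x_{1m} \\
x_{21} & \dots & x_{2m} \\
\vdots & \ddots & \vdots \\
x_{n1} & \dots & x_{nm}
\end{bmatrix}
$$

and the dependent variable is $Y$, which takes only binary values, i.e., 0 or 1:

$$
Y = \begin{cases}
0 & \text{if Class 1} \\
1 & \text{if Class 2}
\end{cases}
$$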

then, apply the multi-linear function to the input variables X.

$$z = \left(\sum_{i=1}^{m} w_i x_i\right) + b$$

Here $x_i$ is the $i$th feature of an observation (a row of $X$), $w = [w_1, w_2, w_3, \dots, w_m]$
is the vector of weights (coefficients), and $b$ is the bias term, also known as the intercept.
This can be written compactly as the dot product of the weights and the input, plus the bias:

$$z = w \cdot X + b$$

Everything discussed so far is linear regression.

Sigmoid Function:

Now we use the sigmoid function where the input will be z and we find the probability
between 0 and 1. i.e., predicted y.

Fig 10: Sigmoid function

As the figure above shows, the sigmoid function $\sigma(z)$ converts a continuous value into
a probability, i.e., a value between 0 and 1:
• $\sigma(z)$ tends towards 1 as $z \to \infty$
• $\sigma(z)$ tends towards 0 as $z \to -\infty$
• $\sigma(z)$ is always bounded between 0 and 1

The probability of belonging to each class can then be measured as:

$$P(y=1) = \sigma(z)$$
$$P(y=0) = 1 - \sigma(z)$$
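The linear function, the sigmoid, and the 0.5 threshold can be put together in a few lines. This is a minimal sketch: the weights, bias, and input are arbitrary illustrative values, and the function names are hypothetical. The sigmoid used is the standard logistic function 1 / (1 + e^(-z)), written in a form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """P(y = 1) for one observation x: sigmoid of the linear score w . x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def predict_class(x, w, b, threshold=0.5):
    """Assign Class 1 if P(y = 1) exceeds the threshold, else Class 0."""
    return 1 if predict_proba(x, w, b) > threshold else 0


x = [1.0, 2.0]      # illustrative input features
w = [0.5, -0.25]    # illustrative weights
b = 0.1             # illustrative bias
p1 = predict_proba(x, w, b)   # P(y = 1)
p0 = 1.0 - p1                 # P(y = 0)
label = predict_class(x, w, b)
```

Here the linear score is 0.5 - 0.5 + 0.1 = 0.1, which is positive, so the predicted probability is slightly above 0.5 and the input is assigned to Class 1.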

Logistic Regression Equation:

The odds are the ratio of something occurring to something not occurring. They differ
from probability, which is the ratio of something occurring to everything that
could possibly occur. The odds are therefore:

$$\frac{p(x)}{1 - p(x)} = e^{z}$$

Applying the natural logarithm to the odds gives the log-odds:

$$
\begin{aligned}
\log\left[\frac{p(x)}{1 - p(x)}\right] &= z \\
\log\left[\frac{p(x)}{1 - p(x)}\right] &= w \cdot X + b \\
\frac{p(x)}{1 - p(x)} &= e^{w \cdot X + b} \\
p(x) &= e^{w \cdot X + b}\,\bigl(1 - p(x)\bigr) \\
p(x) &= e^{w \cdot X + b} - e^{w \cdot X + b}\, p(x) \\
p(x) + e^{w \cdot X + b}\, p(x) &= e^{w \cdot X + b} \\
p(x)\,\bigl(1 + e^{w \cdot X + b}\bigr) &= e^{w \cdot X + b} \\
p(x) &= \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}}
\end{aligned}
$$

The final logistic regression equation is then:

$$p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}$$
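The derivation can be checked numerically: taking the log-odds of the logistic output should recover the linear score z, and the two forms of the final equation should agree. The short sketch below does this for a few arbitrary values of z; the function names are illustrative.

```python
import math


def sigmoid(z):
    """Second form of the final equation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_ratio_form(z):
    """First form of the final equation: e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# For each score z, compare the two forms and invert back through the log-odds.
checks = [(z, sigmoid(z), sigmoid_ratio_form(z), log_odds(sigmoid(z)))
          for z in (-3.0, -0.5, 0.0, 1.2, 4.0)]
```

In every row of `checks` the two probability values coincide and the recovered log-odds equal the original z up to floating-point rounding.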

5.5 Computer Vision

Computer vision is an interdisciplinary scientific field that deals with how computers can
gain high-level understanding from digital images or videos. From the perspective of
engineering, it seeks to understand and automate tasks that the human visual system can
do. Computer vision tasks include methods for acquiring, processing, analyzing, and
understanding digital images, and for extracting high-dimensional data from the real
world to produce numerical or symbolic information, e.g. in the form of decisions.
Understanding in this context means the transformation of visual images (the input of the
retina) into descriptions of the world that make sense to thought processes and can elicit
appropriate action.


This image understanding can be seen as the disentangling of symbolic
information from image data using models constructed with the aid of geometry, physics,
statistics, and learning theory. The scientific discipline of computer vision is concerned
with the theory behind artificial systems that extract information from images. The image
data can take many forms, such as video sequences, views from multiple cameras, or
multi-dimensional data from a device.

CHAPTER 6
RESULT

Our model outperformed the other models. The accuracy of our method on
the SUN-397 dataset reached 98.83%. In contrast, the technique that uses logistic
regression for the decision achieved an accuracy of 81.69%, and the model that applies
convolutional neural networks reached 95.16%. Our proposal, which combines
the power of convolutional neural networks through transfer learning with the
efficiency of logistic regression, delivered state-of-the-art results in the image
orientation detection task. Regarding the logistic-regression-based method, since both it and
our approach use logistic regression for the decision stage, we can argue that its feature
extraction stage, which relies on low-level hand-engineered features, limited that model's
performance.

Their approach extracts low-level features from the image based on
the local binary pattern distribution, which may have good algorithmic complexity but
does not generalize well compared with convolutional neural networks for
this task. This suggests that feature extraction significantly impacts subsequent
classification and detection. As for the CNN-based method, which uses an AlexNet-inspired
CNN architecture pre-trained on the Places365 dataset, both the use of Places365
instead of ImageNet for pre-training and the architectural choice could affect the overall
performance of that model.

Similarly, their method relies on AlexNet, which uses
rectified linear units (ReLU) to add non-linearity, whereas our framework, built on the ResNet
model, increases accuracy by applying batch normalization and skip connections. Our
model also performed excellently on the MIT indoor dataset, with an
accuracy of 98.97%. This result demonstrates the ability of our framework to handle both the
local and global discriminative information that characterizes indoor images. Outdoor
images are often characterized by global image properties only, which makes indoor images
the more difficult case. On the INRIA and Pascal datasets, we observed a decrease in
accuracy.

This may be due to the nature of the images in these datasets, which contain
categories that are rare or absent in the SUN-397 training dataset, such as animals
and bicycles. In addition, some images have an ambiguous orientation. A
further key point is that our model's results highlight that convolutional
neural networks are not fully rotation invariant. This implies that the training data
should cover as many rotations as possible. Alternatively, the
methodology adopted in this study can be followed to tackle this challenge: use a tool for
high-level feature extraction, then apply a high-performance machine learning or deep
learning algorithm that can handle the orientation decision.
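The lack of rotation invariance can be seen even with a single convolution filter. In the toy sketch below, a vertical-edge filter responds strongly to an image with a vertical edge but gives no response once the same image is rotated by 90°. The image, the kernel, and the helper names are illustrative assumptions, not part of the evaluated model.

```python
def rotate90_ccw(image):
    """Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]


def edge_response(image, kernel):
    """Total positive filter response (convolution, ReLU, then sum)."""
    return sum(max(0, v) for row in conv2d_valid(image, kernel) for v in row)


upright = [[0, 0, 1, 1] for _ in range(4)]   # vertical edge in the middle
vertical_edge_kernel = [[-1, 1]]

upright_response = edge_response(upright, vertical_edge_kernel)
rotated_response = edge_response(rotate90_ccw(upright), vertical_edge_kernel)
```

The same content produces different filter responses depending on its rotation. This is why rotated copies are useful as training data, and also why the learned features carry enough information for a classifier to tell the orientations apart.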

Fig 11: The four possible predictions of the model are 0°, 90°, 180°, and 270°, from left to right.

Fig 12: Qualitative results of our model. Above, rotated input images captured non-professionally with a
phone's camera. Below, the same images with corrected orientation after detection of the orientation.
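
Once an orientation has been predicted, correcting the image amounts to rotating it back by the same angle. The following sketch assumes that the predicted label k denotes a counter-clockwise rotation of k × 90 degrees; this label convention and the function name `correct_orientation` are assumptions made for illustration and are not specified in the report.

```python
import numpy as np

# Assumed label convention: label k <-> rotation of ANGLES[k] degrees.
ANGLES = (0, 90, 180, 270)


def correct_orientation(image, predicted_label):
    """Undo a counter-clockwise rotation of ANGLES[predicted_label] degrees."""
    # A negative k rotates clockwise, which reverses np.rot90(image, k).
    return np.rot90(image, k=-predicted_label)


upright = np.arange(12).reshape(3, 4)
rotated = np.rot90(upright, k=1)            # simulate a photo turned by 90 degrees
restored = correct_orientation(rotated, 1)  # suppose the classifier predicted label 1
```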

CONCLUSION

In conclusion, the proposed system architecture for automatic image orientation detection
presents a sophisticated framework that seamlessly integrates transfer learning and
logistic regression. By capitalizing on the strengths of these techniques, the system
achieves a delicate balance between accuracy and efficiency. Through meticulous
preprocessing steps, including standardization of dataset attributes, feature extraction
from pretrained deep learning models, and classification using logistic regression, the
system effectively predicts image orientations across a wide array of datasets. Its
systematic flow architecture ensures a smooth progression from data acquisition to model
deployment, facilitating seamless integration into diverse applications.

Furthermore, the incorporation of a feedback loop mechanism enables continuous
refinement, allowing the system to adapt to evolving datasets and user needs over time.
With its potential to revolutionize image processing tasks by delivering improved
accuracy, speed, and versatility, this approach holds promise for various domains reliant
on automated image orientation detection. The application of transfer learning for
automatic image orientation detection using deep learning and logistic regression presents
a promising approach for addressing this task. By leveraging pre-trained deep learning
models such as convolutional neural networks (CNNs), we can effectively extract features
from images and transfer this knowledge to a logistic regression classifier for orientation
detection.
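
The classification stage described above can be sketched as a multinomial logistic regression trained on feature vectors. To keep the example self-contained, the features below are synthetic Gaussian clusters standing in for the output of a pre-trained CNN; the feature dimension, learning rate, and number of epochs are illustrative assumptions and not values from this report, and the plain gradient-descent solver is chosen only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 4     # 0, 90, 180 and 270 degrees
FEATURE_DIM = 16    # assumed; real CNN feature vectors are much larger

# Synthetic stand-in for CNN features: one Gaussian cluster per orientation.
centers = rng.normal(scale=3.0, size=(NUM_CLASSES, FEATURE_DIM))
y = rng.integers(0, NUM_CLASSES, size=400)
X = centers[y] + rng.normal(size=(400, FEATURE_DIM))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_logistic_regression(X, y, lr=0.01, epochs=300):
    """Multinomial logistic regression trained with batch gradient descent."""
    W = np.zeros((X.shape[1], NUM_CLASSES))
    b = np.zeros(NUM_CLASSES)
    onehot = np.eye(NUM_CLASSES)[y]
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        grad = (probs - onehot) / len(X)  # gradient of the mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


W, b = fit_logistic_regression(X, y)
predicted = softmax(X @ W + b).argmax(axis=1)
accuracy = (predicted == y).mean()
```

In a real pipeline, `X` would be replaced by the feature vectors extracted from the pre-trained network, and the learned weight matrix `W` can be inspected per class, which is one reason a linear classifier is comparatively easy to interpret.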

Through our experimentation, we have demonstrated the effectiveness of this
approach in accurately classifying image orientations across various datasets. By
fine-tuning a pre-trained CNN on a relatively small dataset specific to the task at hand, we can
mitigate the need for extensive data collection and training, thus reducing computational
costs and time requirements. Furthermore, the integration of logistic regression as a
classifier allows for interpretable results and provides insights into the decision-making
process. This combination of deep learning and traditional machine learning techniques
enhances the robustness of the orientation detection system.

FUTURE SCOPE

The future scope of automatic image orientation detection is outlined, highlighting
potential avenues for advancement in the field. Deep learning techniques offer promise
for further improvement, with exploration into advanced architectures like attention
mechanisms and transformer models anticipated to enhance feature extraction and
classification accuracy. Additionally, the development of models capable of incremental
learning and adaptation, as well as efforts towards enhancing interpretability and
addressing ethical considerations, are highlighted as crucial for the field's progression.

Finally, optimizing models for real-time applications and considering societal
implications are emphasized as important directions for future research and development.
By pursuing these avenues, automatic image orientation detection can continue to evolve,
offering increasingly accurate, adaptable, and ethical solutions to meet diverse application
needs. In the realm of automatic image orientation detection, the future holds promising
avenues for advancement and application. Through the continued refinement of transfer
learning coupled with logistic regression, we can anticipate significant improvements in
both accuracy and efficiency.

This progress will be driven by ongoing research into fine-tuning
hyperparameters, exploring novel architectures, and optimizing techniques to push the
boundaries of performance. As datasets become more diverse and expansive, the
scalability of the model will enable its adaptation to a broader range of scenarios and
domains. Real-time applications, such as augmented reality and autonomous systems,
stand to benefit from these developments, as the model becomes more capable of rapid
and accurate orientation detection.

Moreover, efforts to enhance interpretability will ensure that the model's decisions are
transparent and trustworthy, which is essential for applications requiring human oversight.
Integration with other technologies, such as natural language processing and edge
computing, will further expand the model's capabilities and utility. Overall, the future of
automatic image orientation detection holds immense potential for innovation and impact
across various fields and industries.

BIBLIOGRAPHY

[1] Y. Zhao, K. Zeng, Y. Zhao, P. Bhatia, M. Ranganath, M. L. Kozhikkavil, C. Li,
and G. Hermosillo, "Deep learning solution for medical image localization and
orientation detection," Med. Image Anal., vol. 81, Oct. 2022, Art. no. 102529, doi:
10.1016/j.media.2022.102529.

[2] Y. Zou, B. Guan, J. Zhao, S. Wang, X. Sun, and J. Li, "Robotic-assisted automatic
orientation and insertion for bronchoscopy based on image guidance," IEEE Trans.
Med. Robot. Bionics, vol. 4, no. 3, pp. 588-598, Aug. 2022, doi:
10.1109/TMRB.2022.3194320.

[3] S. Bakheet, A. Al-Hamadi, and R. Youssef, "A fingerprint-based verification framework
using Harris and SURF feature detection algorithms," Appl. Sci., vol. 12, no. 4, p.
2028, Feb. 2022.

[4] C. J. Palmer, E. Goddard, and C. W. G. Clifford, "Face detection from patterns of
shading and shadows: The role of overhead illumination in generating
the familiar appearance of the human face," Cognition, vol. 225, Aug. 2022, Art. no.
105172, doi: 10.1016/j.cognition.2022.105172.

[5] R. Bai and X. Guo, "Automatic orientation detection of abstract painting,"
Knowl.-Based Syst., vol. 227, Sep. 2021, Art. no. 107240, doi:
10.1016/j.knosys.2021.107240.

[6] M. G. Johnson, A. A. J. Muday, and J. Schirillo, "When viewing variations in paintings by
Mondrian, aesthetic preferences correlate with pupil size," Psychol. Aesthetics,
Creativity, Arts, vol. 4, no. 3, p. 161, 2010.


LIST OF FIGURES

FIG NO. FIGURE TITLE PAGE NO.

1 SYSTEM ARCHITECTURE 8

2 PROPOSED SYSTEM ARCHITECTURE 9

3 FLOW ARCHITECTURE 10

4 TRANSFER LEARNING 11

5 NEURAL MODEL 12

6 CONVOLUTIONAL NEURAL NETWORKS 12

7 CNN FEATURING & DATA 13

8 THE POOLING LAYER 13

9 FULLY CONNECTED LAYER 14

10 SIGMOID FUNCTION 16

11 DIFFERENT PREDICTIONS OF THE MODEL 19

12 CORRECTED ORIENTATION 19
