2023-2024
CAMBRIDGE INSTITUTE OF TECHNOLOGY
NORTH CAMPUS
Kundana, Bengaluru – 562110
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CERTIFICATE
This is to certify that the technical seminar entitled “Transfer Learning For Automatic
Image Orientation Detection Using Deep Learning And Logistic Regression” has
been carried out by Alluru Lakshmi Lavanya (1AJ20CS008), a bonafide student of Cambridge
Institute of Technology North Campus, in partial fulfillment of the requirements for the
Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological
University, Belagavi, during the year 2023-24.
I, Alluru Lakshmi Lavanya, bearing USN 1AJ20CS008, a student of VIII semester, Computer
Science and Engineering, Cambridge Institute of Technology, hereby declare that the seminar entitled
“Transfer Learning For Automatic Image Orientation Detection Using Deep Learning And
Logistic Regression” has been carried out by me and submitted in partial fulfilment of the course
requirements of VIII semester Bachelor of Engineering in Computer Science and Engineering as
prescribed by Visvesvaraya Technological University, Belagavi, during the academic year 2023-2024.
I also declare that, to the best of my knowledge and belief, the work reported here does not form part
of any other report on the basis of which a degree or award was conferred on an earlier occasion on
any other student.
Date:
Place: Bangalore
(Student Signature)
Alluru Lakshmi Lavanya
1AJ20CS008
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be
incomplete without mention of the people who made it possible, whose constant guidance
and encouragement crowned my efforts with success. I take this opportunity to express my
deepest gratitude and appreciation to all those who helped me directly or indirectly toward
the successful completion of this seminar work.
I take great privilege in expressing my deep gratitude to Dr. Sendamarai P, Principal,
Cambridge Institute of Technology North Campus, Bengaluru, for supporting me and giving
me the opportunity to carry out my seminar.
I sincerely thank our respected HOD, Dr. Kavitha C, Professor and Head of the Department of
Computer Science and Engineering, CITNC, for her constant support, motivation, and
suggestions in completing my seminar work.
I also thank all teaching and non-teaching staff and my parents for their kind cooperation. I
also extend my heartfelt thanks to all who directly or indirectly encouraged me to complete
this seminar.
ABSTRACT
TABLE OF CONTENTS
1. ACKNOWLEDGEMENT
2. ABSTRACT
CHAPTER 4 ARCHITECTURE
CHAPTER 1
INTRODUCTION
To detect image orientation, a machine has to learn the features of an image in such a way
that it can detect the arbitrary angle by which the image is rotated. Some modern cameras
with inertial sensors can correct image orientation in 90-degree steps, but this function is
generally not used. In this work, we present a method to detect the orientation angle of a
captured image as a post-processing step, for images captured with any camera (both older
and newer models) at any tilt angle (between 0 and 359 degrees). After we detect the
orientation angle, we reverse it to correct the orientation of the image. For a human it is
fairly easy to tell the approximate orientation angle of an image from the elements present
in it, but for a machine an image is just a matrix of pixel values. Convolutional Neural
Networks have made it possible to build an image orientation angle detection model which
predicts the orientation angle so accurately that it outperforms the image orientation
techniques previously published in the community.
Orientation correction has long been studied for document analysis. These methods rely
on the special structure of document images, i.e. the precise shape of letters or the
layout of text in lines. Natural images offer no such structure, so it is quite hard for
these methods to work properly on them. This work is inspired by the work of Fischer et
al. on image orientation angle estimation for natural images. For most of this paper we
therefore carry out a comparative study against Fischer et al. Using a more recent CNN
architecture, a modified loss function, and a different optimization technique on the same
COCO dataset, we obtained a model which outperforms Fischer et al. by a significant margin
and gives state-of-the-art results on this problem. Another approach uses the interpolation
artifacts introduced by applying rotation to digital images.
However, this method does not work for images which were not taken upright. Solanki et
al. predicted the rotation of printed images by analyzing the pattern of printer dots, but
this method does not work on digital images. Horizon detection is a special kind of image
angle detection method, but it depends strongly on a horizon being present in the image,
and most images do not contain one. With advances in the digital imaging industry,
photography, and image understanding, there is significant demand for digital imagery
storage, processing, printing, and retrieval tools. These tools require information about
the orientation of the image to process and display it correctly. A picture captured with a
digital camera normally has one of four orientations (0, 90, 180, or 270 degrees), so once
the correct multiple of 90 degrees is detected, it is easy to correct the image orientation.
Today’s smartphones and digital cameras have a built-in orientation sensor that
tracks the camera's direction while taking a picture and stores it in the image's EXIF
metadata. In images from cheap digital cameras this information is sometimes missing, and
it can also be erased by other applications. Detecting image orientation is a difficult
task because the content of digital images is highly variable. The motivation behind this
work comes from the necessity of having correctly oriented images in many critical fields,
where misoriented images cause serious problems. Detecting image orientation is crucial
in domains such as medical diagnosis, robot-assisted automatic intervention systems
(RAIS), authentication systems, and face detection. For instance, in magnetic resonance
imaging (MRI), the position and orientation of slice groups are critical for achieving
high diagnostic image quality and meeting the requirements of various clinical workflows.
In robot-assisted automatic intervention systems (RAIS), such as those used for
bronchoscopy, many bronchial branches with standard orientations are encountered. A system
that can detect image orientation during an intervention can significantly help the surgeon
avoid fatigue and minimize mistakes. Most biometric personal authentication systems
that use fingerprints rely on pattern-based matching, in which the algorithm requires the
pattern's size, type, and orientation in the aligned fingerprint image. Therefore, a method
for detecting the orientation of a fingerprint image considerably improves the accuracy of
these systems.
Likewise, orientation detection in a face recognition system can improve detection
performance and help security guards prevent dishonest behavior.
CHAPTER 2
LITERATURE SURVEY
The task of automatic image orientation detection plays a crucial role in various image
processing applications, such as image classification, object detection, and content-based
image retrieval. However, accurately determining the orientation of an image can be
challenging, especially when dealing with large datasets or diverse image types.
Traditional approaches to image orientation detection often rely on handcrafted features
or shallow learning models, which may struggle to generalize well across different
datasets or image categories.
Transfer Learning:
Curate a diverse dataset of images with different orientations and labels. Split the
dataset into training, validation, and testing sets. Evaluate the performance of the
proposed approach using standard evaluation metrics such as accuracy, precision, recall,
and F1 score.
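The split and the evaluation metrics named above can be sketched in plain Python. This is a minimal illustration only: the function names, the split fractions, and the use of orientation angles as class labels are assumptions made for the example, not details taken from the report.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into train, validation, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 score for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For a four-class orientation problem, the per-class scores would be computed once for each of the labels and then averaged if a single figure is needed.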
Existing System:
In our work, we strive to address all these issues by applying deep learning and
taking advantage of the computational capacity of convolutional neural networks for
feature extraction instead of low-level feature extraction and semantic-based image
extraction. We demonstrate the importance of the transfer learning technique in
improving our image orientation detection task. Transfer learning involves training a base
network on a given dataset and then transferring the learned features to another network,
which is then trained on the target task.
Advantages:
Reduced Training Time: Initializing the model from pretrained weights often leads to
faster convergence during training, reducing the overall time compared with training
the model from scratch.
Improved Performance: By utilizing the pretrained representations as a starting point,
the model can achieve better performance on tasks related to image orientation detection
with less data.
Disadvantages:
Limited Flexibility: While transfer learning provides a convenient starting point for
model training, it may not always capture all the nuances of the target task.
Complexity: Managing the interactions between different components of the model,
such as feature extraction layers and logistic regression classifiers, can be challenging
and may require careful tuning of hyperparameters.
CHAPTER 4
ARCHITECTURE
The system architecture combines transfer learning, deep learning, and logistic regression
for automatic image orientation detection. Preprocessing ensures data uniformity,
followed by fine-tuning pretrained deep learning models to extract features. Logistic
regression then predicts orientation labels. The dataset is split for training, validation, and
testing. Once trained, the system can be deployed into various applications, ensuring
scalability and flexibility.
The proposed system architecture for automatic image orientation detection integrates
transfer learning with logistic regression to create a robust solution. It begins with
preprocessing the input dataset to ensure consistency in data quality and format.
Leveraging pretrained deep learning models like VGG, ResNet, or Inception, transfer
learning fine-tunes these models on the target dataset, extracting relevant features crucial
for orientation detection.
Logistic regression then acts as a classification layer, utilizing these features to
predict orientation labels accurately. The dataset undergoes splitting for training,
validation, and testing, facilitating model training, hyperparameter tuning, and
performance evaluation, respectively. Once trained and validated, the system is ready for
seamless deployment and integration into various image processing applications,
promising scalability and flexibility. Incorporating a feedback loop ensures continuous
refinement and enhancement of the model's accuracy over time. This architecture aims to
deliver a reliable and efficient solution for automatic image orientation detection,
addressing diverse application needs effectively.
The flow architecture for automatic image orientation detection encompasses a systematic
process to ensure accurate and efficient prediction. Beginning with data acquisition, the
system gathers a diverse dataset comprising images of varying orientations. Subsequently,
preprocessing standardizes the dataset, ensuring uniformity in size, format, and quality,
thus preparing it for feature extraction. Leveraging transfer learning, pretrained deep
learning models extract pertinent features from the images.
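The data acquisition step can be illustrated with a small sketch that builds labeled training examples by rotating an upright image into the four orientations, and that corrects an image once its orientation has been predicted. Images are represented here as nested lists of pixel values, and all function names are hypothetical; the report does not specify how its rotated samples were produced.

```python
ORIENTATIONS = (0, 90, 180, 270)

def rotate90_ccw(image):
    """Rotate a 2-D list-of-lists image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def rotate(image, angle):
    """Rotate an image counter-clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    rotated = [list(row) for row in image]
    for _ in range((angle // 90) % 4):
        rotated = rotate90_ccw(rotated)
    return rotated

def make_labeled_rotations(upright_image):
    """Return (rotated_image, angle) pairs for the four orientation classes."""
    return [(rotate(upright_image, angle), angle) for angle in ORIENTATIONS]

def correct_orientation(image, predicted_angle):
    """Undo a predicted rotation by applying the complementary rotation."""
    return rotate(image, (360 - predicted_angle) % 360)
```

Because each training label is generated together with the rotation, no manual annotation is needed as long as the source images are known to be upright.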
CHAPTER 5
METHODOLOGY
The reuse of a pre-trained model on a new problem is known as transfer learning in
machine learning. In transfer learning, a machine uses the knowledge learned from a prior
task to improve prediction on a new task. For example, the knowledge gained while training
a classifier to predict whether an image contains food could be reused to recognize
beverages.
In computer vision, neural networks typically detect edges in the first layers,
shapes in the middle layers, and task-specific features in the later layers. In transfer
learning, the early and middle layers are reused and only the later layers are retrained.
The network thus makes use of the labelled data from the task it was originally trained on.
Transfer learning offers a number of advantages, the most important of which are reduced
training time, improved neural network performance (in most circumstances), and no need
for a large amount of data. Training a neural model from scratch typically requires a lot
of data, but access to that data isn’t always possible.
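The idea of reusing the early layers and retraining only the final layer can be sketched as follows. This is a toy illustration under stated assumptions: `FROZEN_WEIGHTS` stands in for the pretrained layers of a real network, the data are made up, and only the logistic regression head is updated during training.

```python
import math

# Stand-in for the pretrained early/middle layers (assumed values, never updated).
FROZEN_WEIGHTS = [
    [0.5, -0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, -0.5],
]

def extract_features(x):
    """Frozen layer: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, lr=0.5, epochs=500):
    """Train only the logistic regression head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    f = extract_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) > 0.5 else 0

# Toy data: two classes that the frozen features already separate.
samples = [[1, 0, 0, 0], [2, 0, 0, 0], [0, 0, 1, 0], [0, 0, 2, 0]]
labels = [1, 1, 0, 0]
w, b = train_head(samples, labels)
```

Since the feature extractor is fixed, the features can be computed once and cached, which is one reason training is faster than training the whole network.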
The pooling layer follows the nonlinear layer. It works on the image's width and
height, performing a down-sampling operation on them. As a result, the size of the
image is reduced. This means that if some features were already recognized during
the previous convolution operation, a detailed image is no longer required for
further processing and it is compressed into smaller images.
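The down-sampling step described above can be sketched as follows, assuming max pooling with a 2x2 window and stride 2. The text does not name the pooling type, so this choice and the function name are illustrative only.

```python
def max_pool_2x2(feature_map):
    """Down-sample a 2-D feature map by taking the maximum of each 2x2 block.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)  # 4x4 input becomes 2x2 output
```

Each output value keeps only the strongest response in its block, which halves the width and height while preserving the fact that a feature was detected.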
After the series of convolution, non-linear, and pooling layers, a fully connected
layer must be attached. This layer receives the output of the convolutional
network. When a fully connected layer is attached to the end of the network, it
produces an N-dimensional vector, where N is the number of classes from which the
model chooses the required class.
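A fully connected layer that maps a feature vector to an N-dimensional score vector can be sketched as below, with N = 4 for the four orientation classes. The weights are made-up values, and the softmax normalization is an assumption added for illustration; the text only states that the layer outputs one value per class.

```python
import math

def fully_connected(features, weights, biases):
    """Compute one score per class: weights has one row per class."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

ORIENTATIONS = (0, 90, 180, 270)
features = [1.0, 2.0, 0.5]
weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]  # hypothetical values
biases = [0.0, 0.0, 0.0, 0.0]

scores = fully_connected(features, weights, biases)
probs = softmax(scores)
predicted = ORIENTATIONS[probs.index(max(probs))]
```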
Deep learning methods aim at learning feature hierarchies with features from
higher levels of the hierarchy formed by the composition of lower-level features.
Automatically learning features at multiple levels of abstraction allows a system
to learn complex functions mapping the input to the output directly from data,
without depending completely on human-crafted features.
Logistic regression is used for binary classification. It applies the sigmoid function,
which takes the independent variables as input and produces a probability value between
0 and 1. For example, given two classes, Class 0 and Class 1, if the value of the logistic
function for an input is greater than 0.5 (the threshold value), the input belongs to
Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an
extension of linear regression, but it is mainly used for classification problems.
The logistic regression model transforms the continuous-valued output of the linear
regression function into a categorical output using a sigmoid function, which maps any
real-valued combination of the independent variables to a value between 0 and 1. This
function is known as the logistic function.
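The dependent variable Y takes only binary values:
$$
Y = \begin{cases} 0 & \text{if Class 0} \\ 1 & \text{if Class 1} \end{cases}
$$
The linear function applied to the input variables is
$$
z = \left( \sum_{i=1}^{n} w_i x_i \right) + b
$$
where $x_i$ are the input features, $w_i$ the weights, and $b$ the bias. In vector form:
$$
z = w \cdot X + b
$$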
Sigmoid Function:
Now we apply the sigmoid function to z to obtain a probability between 0 and 1, i.e.,
the predicted y:
$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
As the sigmoid function figure shows, it converts the continuous variable into a
probability, i.e., a value between 0 and 1.
$\sigma(z)$ tends towards 1 as $z \to \infty$
$\sigma(z)$ tends towards 0 as $z \to -\infty$
$\sigma(z)$ is always bounded between 0 and 1
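The behaviour of the sigmoid function and the 0.5 threshold rule can be sketched in a few lines. The weights and inputs in the usage lines are made-up values, and the helper names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_score(w, x, b):
    """z = w . x + b"""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b, threshold=0.5):
    """Return 1 if the predicted probability exceeds the threshold, else 0."""
    return 1 if sigmoid(linear_score(w, x, b)) > threshold else 0

w, b = [2.0, -1.0], -0.5            # hypothetical trained parameters
label_a = classify(w, [1.0, 1.0], b)  # z = 0.5, probability above 0.5
label_b = classify(w, [0.0, 1.0], b)  # z = -1.5, probability below 0.5
```

A threshold of 0.5 on the probability is equivalent to checking the sign of z, since the sigmoid equals 0.5 exactly at z = 0.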
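Logistic Regression Equation:
The odds are the ratio of something occurring to something not occurring. This is
different from probability, which is the ratio of something occurring to everything that
could possibly occur. The odds are therefore:
$$
\frac{p(x)}{1 - p(x)} = e^{z}
$$
Taking the natural logarithm of the odds gives the log-odds:
$$
\begin{aligned}
\log\left[\frac{p(x)}{1 - p(x)}\right] &= z \\
\log\left[\frac{p(x)}{1 - p(x)}\right] &= w \cdot X + b
\end{aligned}
$$
Exponentiating both sides and solving for $p(x)$:
$$
\begin{aligned}
\frac{p(x)}{1 - p(x)} &= e^{w \cdot X + b} \\
p(x) &= e^{w \cdot X + b} \, (1 - p(x)) \\
p(x) &= e^{w \cdot X + b} - e^{w \cdot X + b} \, p(x) \\
p(x) + e^{w \cdot X + b} \, p(x) &= e^{w \cdot X + b} \\
p(x) \left(1 + e^{w \cdot X + b}\right) &= e^{w \cdot X + b} \\
p(x) &= \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}}
\end{aligned}
$$
The final logistic regression equation is:
$$
p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}
$$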
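The relationship between probability, odds, and log-odds derived above can be checked numerically. The sketch below uses arbitrary values of z and illustrative function names; it only confirms that the odds of the logistic probability equal e^z and that the log-odds recover z.

```python
import math

def probability(z):
    """p = 1 / (1 + e^(-z)), the logistic regression output."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Ratio of the event occurring to it not occurring."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural logarithm of the odds (the logit)."""
    return math.log(odds(p))

# For each z: odds(p) should equal e^z and log_odds(p) should equal z.
checks = [
    (z, odds(probability(z)), log_odds(probability(z)))
    for z in (-2.0, 0.0, 0.5, 3.0)
]
```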
Computer vision is an interdisciplinary scientific field that deals with how computers can
gain high-level understanding from digital images or videos. From the perspective of
engineering, it seeks to understand and automate tasks that the human visual system can
do. Computer vision tasks include methods for acquiring, processing, analyzing, and
understanding digital images, and the extraction of high-dimensional data from the real
world to produce numerical or symbolic information, e.g. in the form of decisions.
Understanding in this context means the transformation of visual images (the input of the
retina) into descriptions of the world that make sense to thought processes and can elicit
appropriate action.
CHAPTER 6
RESULT
Our model outperformed the other models. The accuracy of our method on the SUN-397
dataset reached 98.83%. In contrast, the technique that uses logistic regression for the
decision achieved an accuracy of 81.69%, and the model that applies convolutional neural
networks reached 95.16%. Our proposal, which combines the power of convolutional neural
networks through transfer learning with the efficiency of logistic regression, delivered
state-of-the-art results in the image orientation detection task. Regarding the first of
these methods, since both it and our approach use logistic regression for the decision
part, we can argue that its feature extraction part, which relies on low-level
hand-engineered features, affected the model's performance.
Their approach extracts low-level features from the image based on the local binary
pattern distribution, which may have good algorithmic complexity but does not generalize
well compared to the application of convolutional neural networks for this task. We
suggest that feature extraction significantly impacts subsequent classification and
detection. Regarding the method that uses an AlexNet-inspired CNN architecture pre-trained
on the Places365 dataset, we can say that the use of the Places365 dataset instead of the
ImageNet dataset for pre-training, as well as the architectural choice, could affect the
overall performance of their model.
Similarly, their method relies on AlexNet, which uses rectified linear units (ReLU) to
add non-linearity, whereas our framework, built on the ResNet model, increases accuracy
by applying batch normalization and skip connections. We also observed that our model
performed excellently on the MIT indoor dataset, with an accuracy of 98.97%. This result
demonstrates the ability of our framework to handle both the local and global
discriminative information that characterizes indoor images. Outdoor images are often
characterized by global image properties only, which makes indoor images the more
difficult case. On the INRIA and Pascal datasets, we noticed a decrease in accuracy.
This may be due to the nature of the images in these datasets, which contain
categories that are rare or do not appear in the SUN-397 training dataset, such as animals
and bicycles. In addition, some images have an ambiguous orientation. A further key point
is that our model's results highlight that convolutional neural networks are not fully
rotation invariant. This implies that the training data should cover as many rotations as
possible; alternatively, the methodology adopted in this study can be followed: use a tool
for high-level feature extraction, then apply a high-performance machine learning or deep
learning algorithm that can handle the orientation decision.
Fig 11: The four different possible predictions of the model are 0°, 90°, 180°, and 270° from left to right,
respectively.
Fig 12: Qualitative results of our model. Above, rotated non-professional input images captured with a
phone's camera. Below, the same images with corrected orientation after orientation detection.
CONCLUSION
In conclusion, the proposed system architecture for automatic image orientation detection
presents a sophisticated framework that seamlessly integrates transfer learning and
logistic regression. By capitalizing on the strengths of these techniques, the system
achieves a delicate balance between accuracy and efficiency. Through meticulous
preprocessing steps, including standardization of dataset attributes, feature extraction
from pretrained deep learning models, and classification using logistic regression, the
system effectively predicts image orientations across a wide array of datasets. Its
systematic flow architecture ensures a smooth progression from data acquisition to model
deployment, facilitating seamless integration into diverse applications.
FUTURE SCOPE
implications are emphasized as important directions for future research and development.
By pursuing these avenues, automatic image orientation detection can continue to evolve,
offering increasingly accurate, adaptable, and ethical solutions to meet diverse application
needs. In the realm of automatic image orientation detection, the future holds promising
avenues for advancement and application. Through the continued refinement of transfer
learning coupled with logistic regression, we can anticipate significant improvements in
both accuracy and efficiency.
Moreover, efforts to enhance interpretability will ensure that the model's decisions are
transparent and trustworthy, which is essential for applications requiring human oversight.
Integration with other technologies, such as natural language processing and edge
computing, will further expand the model's capabilities and utility. Overall, the future of
automatic image orientation detection holds immense potential for innovation and impact
across various fields and industries.
BIBLIOGRAPHY
[2] Y. Zou, B. Guan, J. Zhao, S. Wang, X. Sun, and J. Li, "Robotic-assisted automatic
orientation and insertion for bronchoscopy based on image guidance," IEEE Trans.
Med. Robot. Bionics, vol. 4, no. 3, pp. 588-598, Aug. 2022, doi:
10.1109/TMRB.2022.3194320.
[5] R. Bai and X. Guo, "Automatic orientation detection of abstract painting," Knowl.-
Based Syst., vol. 227, Sep. 2021, Art. no. 107240, doi:
10.1016/j.knosys.2021.107240.
LIST OF FIGURES
1 SYSTEM ARCHITECTURE
3 FLOW ARCHITECTURE
4 TRANSFER LEARNING
5 NEURAL MODEL
10 SIGMOID FUNCTION
12 CORRECTED ORIENTATION