
Abstract

Self-driving cars have emerged as a promising solution to reduce road accidents, congestion, and carbon emissions. However, developing autonomous driving systems that can safely and efficiently operate in complex real-world environments remains a significant challenge, even though the field has advanced rapidly over the last decade thanks to significant technological progress. The aim of this project is to train a neural network to drive an autonomous vehicle agent on the Udacity simulator tracks. Udacity has released the simulator as open-source software, and enthusiasts have built competitions around teaching cars to drive using camera images and deep learning. This paper also provides a comprehensive review of the state-of-the-art research on self-driving cars, with a focus on the technical aspects of perception and decision making.
The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacle mapping, moving obstacle detection and tracking, road mapping, and traffic signalization detection and recognition, among others.
The decision-making system is commonly partitioned as well into many subsystems
responsible for tasks such as route planning, path planning, behaviour selection, motion
planning, and control.
A behavioural cloning technique is used to mimic human driving behaviour in training mode on the track: a dataset is generated in the simulator by a user-driven car in training mode, and the trained deep neural network model then drives the car in autonomous mode.
This paper presents a detailed description of the architecture of the autonomy system by building a self-driving car model using computer vision and a deep neural network. The paper lists prominent self-driving car research platforms developed by academia and technology companies and reported in the media, outlines future research directions, and identifies potential areas for improvement to accelerate the adoption of self-driving cars.
Overall, this paper aims to provide a clear understanding of the current state of research
on self-driving cars and to guide researchers and practitioners towards the development of
safe and reliable autonomous driving systems.

Chapter 1

Introduction
1.1 Definition of Self-driving car
Self-driving cars, also known as autonomous vehicles, are vehicles that are capable of sensing their environment and operating without human input. They use a combination of sensors, such as radar, lidar, cameras, and GPS, along with advanced algorithms and computer systems, to navigate roads and make decisions on their own [1].
Self-driving cars operate in various modes, ranging from partial automation, where the vehicle can assist with some driving tasks but still requires a human driver, to full automation, where the vehicle can operate without human input. These vehicles have the
potential to revolutionize transportation by reducing accidents caused by human error,
improving traffic flow and efficiency, increasing accessibility for people who are unable
to drive, and reducing environmental impact. However, there are also concerns related to
safety, cybersecurity, legal and ethical issues, and the potential impact on employment in
the transportation sector [2].

Fig: 1.1 Self Driving Car [18]

1.2 Levels of Autonomy of Self-Driving Car

Fig 1.2 Levels of Autonomy [20]

1. Level 0 – No Automation: The vehicle does not have any automated features at this level; the driver takes charge of every aspect of driving and performs all driving tasks manually, e.g. steering, braking and acceleration. [3]

2. Level 1 – Driver Assistance: This level of autonomy includes, but is not limited to, technologies such as cruise control and lane-keeping aids; the driver continues to be solely responsible for steering, braking and acceleration. At all times, the driver must remain alert and ready to take control of the vehicle. [3]

3. Level 2 – Partial Automation: At this level, vehicles may be able to control the steering
and acceleration as well as deceleration under certain conditions. However, it is still up to
the driver to keep a close eye on the vehicle's environment and prepare for taking over if
necessary. Adaptive cruise control and lane centering assist are examples of Level 2
features. [3]

4. Level 3 – Conditional Automation: Vehicles at this level can drive autonomously in some circumstances, for example on motorways, and the driver must be prepared to assume control of the vehicle if conditions change. At Level 3, the vehicle is capable of sensing the driving environment and deciding when it can take over driving duties, but the driver must remain alert and ready to take over at any time. [3]

5. Level 4 – High Automation: A vehicle at this level is able to operate on its own in the majority of cases, but under certain circumstances the driver may need to take over control. In regular driving conditions, Level 4 vehicles do not require human input or interaction; however, their autonomous operation may be restricted to specific driving environments or conditions. [3]

6. Level 5 – Full Automation: This is the highest level of autonomy that a vehicle can have, with the ability to drive in all traffic conditions and environments. Human input or intervention is not needed to operate Level 5 vehicles, and passengers can fully relax and take part in other activities while the vehicle drives itself. [3]

It is important to note that the SAE levels are not universally adopted, and some organisations use slightly different definitions for the levels of autonomy. [3]

1.3 Merit and Demerit of Self-Driving Cars


Advantages:
1. Increased Safety: Self-driving cars use multiple sensors and cameras to monitor their surroundings, reducing the risk of accidents that result from human error. [4]

2. Reduced Traffic congestion: Self-driving cars can communicate with each other, which
can help to improve traffic flow and reduce congestion. [4]

3. Increased efficiency: Self-driving vehicles can optimise their routes, speed and fuel consumption, leading to more efficient travel. [4]

4. Improved accessibility: Self-driving cars could make travel more accessible to people who cannot drive themselves, such as those with disabilities or the elderly. [4]

Disadvantages:
1. Cost: Currently, the cost of manufacturing and maintaining autonomous vehicles is so
high that they may be restricted to a limited number of people. [4]

2. Dependence on technology: Self-driving cars rely on complex technology, which can be vulnerable to malfunctions and hacking. [4]

3. Job loss: The widespread adoption of self-driving cars could lead to job losses in
industries such as taxi and truck driving. [4]

4. Ethical dilemmas: In situations where ethics must be taken into account, e.g. in decisions affecting the safety of passengers or pedestrians, self-driving cars may be forced to make ethical decisions. [4]

5. Legal and regulatory challenges: Unpredictability for consumers and manufacturers will
continue to arise from the changing legal and regulatory frameworks related to
autonomous cars. [4]

1.4 Companies Involved in Self-Driving Car Development
There are many companies involved in self-driving car development. Here are some of the
most prominent ones:

1. Waymo: Owned by Alphabet (Google's parent company), Waymo has been described as a leader of the self-driving car industry and has tested its self-driving cars in different parts of the U.S. [5]

2. Tesla: Tesla is best known for its electric vehicles, but it has spent several years developing its own autonomous vehicle technology. It offers an Autopilot system that can assist with driving, but the system is not yet fully autonomous. [5]

3. Uber: Uber has conducted tests with autonomous cars in select cities, although the
programme was briefly suspended following a fatal accident involving one of their
vehicles. [5]

4. General Motors (GM): Through its Cruise Automation subsidiary, GM has made significant investments in autonomous driving technology and has been testing its self-driving cars in San Francisco. [5]

1.5 Impact of Self-Driving Cars

Self-driving cars have the potential to have a significant impact on society in several ways:

1. Increased safety: Enhanced road safety is one of the most important potential benefits of self-driving cars. Human error is one of the leading causes of road accidents, and automated vehicles can largely eliminate it. [5]

2. Reduced traffic congestion: As autonomous cars are equipped with communication tools that allow them to optimize routes and prevent bottlenecks, they can potentially improve traffic flow and reduce road congestion. [5]

3. Improved accessibility: Mobility for people with disabilities or the elderly could potentially be enhanced by autonomous vehicles. The need for car ownership might also be reduced, which would make travel more accessible and affordable. [5]

4. Environmental impact: If self-driving vehicles are designed for greater fuel efficiency and route optimisation to reduce fuel consumption, they might reduce emissions and improve air quality. [5]

5. Job displacement: Jobs such as taxi and truck driving could potentially be lost as a result of autonomous cars. However, they could also help create new jobs in areas such as software development and data analysis. [5]

Chapter 2

Literature Review
Proposed Work
Since the mid-1980s, several universities, research centers, automobile firms, and
companies from other industries throughout the world have studied and produced self-
driving cars (also known as autonomous cars and driverless cars). Navlab's mobile
platform (Thorpe et al., 1991), the University of Pavia and University of Parma's
automobile, ARGO (Broggi et al., 1999), and UBM's vehicles, VaMoRs and VaMP
(Gregor et al., 2002) are important examples of self-driving car research platforms in the
previous two decades. The DARPA Grand Challenge (Buehler et al., 2007) challenged
self-driving automobiles to navigate a 132-mile route through lowlands, dry lake beds,
and mountain passes, involving three tight tunnels and more than 100 acute left and right
turns. There were 23 contenders in this competition, and four cars completed the route
within the time restriction. Stanley (Thrun et al., 2006) from Stanford University took first
prize, with Sandstorm and H1ghlander from Carnegie Mellon University finishing second
and third, respectively. [6]
Researchers propose developing advanced perception algorithms that can accurately
detect and classify objects in complex driving scenarios. This involves leveraging
multiple sensor modalities such as LiDAR, radar, and cameras, and fusing the data from
these sensors to create a comprehensive understanding of the environment. The focus is
on improving object detection, tracking, and prediction capabilities to enhance the
vehicle's perception capabilities and enable safe navigation. The application of machine
learning and deep learning techniques to train models that can understand and predict the
behavior of other road users, analyze complex traffic conditions, and make appropriate
driving judgments is being investigated by researchers. Reinforcement learning
techniques are also used to improve the overall performance of autonomous systems by
optimizing decision-making processes. [7]
High-definition maps are proposed to provide accurate and up-to-date information about
the road network, including lane markings, traffic signs, and traffic flow. The
development of mapping techniques that efficiently create and update these maps in real-
time is a focus of proposed work. Additionally, localization algorithms that can precisely
position the vehicle within the mapped environment are explored to enable accurate
navigation and maneuvering [8].
The proposed control and planning work focuses on building algorithms that provide safe
and efficient self-driving automobile trajectories. The study investigates the application of
optimization approaches such as Model Predictive Control (MPC) and Dynamic
Programming to plan real-time vehicle motions while taking into account aspects such as
traffic laws, obstacle avoidance, and vehicle dynamics. Cooperative planning solutions
that take other cars, pedestrians, and cyclists into account are also examined. [9]

A key component of the proposed effort is understanding how humans interact with self-driving automobiles and their acceptance of autonomous technology. The research focuses
on creating user-friendly interfaces, communication tactics, and trust-building processes
to improve interactions between vehicles and their passengers or other road users. To
improve the entire user experience, studies are done to acquire insights into user views,
expectations, and concerns about self-driving automobiles. In addition, the proposed work
addresses the legal, ethical, and societal ramifications of self-driving cars. To address the
issues connected with autonomous vehicles, research is looking into legal frameworks,
liability restrictions, and insurance policies. The allocation of duty and ethical
implications in crucial situations are also investigated. Furthermore, studies are being
conducted to determine the possible influence of self-driving cars on traffic patterns,
energy usage, and urban planning. [10]

Faraday's law of electromagnetism governs DC motors, and there are numerous techniques to adjust their speed and direction of rotation [27]:
• PWM (pulse-width modulation) is used in this project to regulate the speed of the motor, with values ranging from 0 to 255 in the Arduino IDE platform.
• Reversing the polarity of the DC motors reverses their direction of rotation, and an H-bridge is utilized for this purpose [27].
• The motor driver receives signals from the Arduino UNO and adjusts the speed and direction of rotation of the DC motors based on these inputs. The Raspberry Pi sends four signals to the Arduino UNO, which handles the image processing and machine learning data. [11]
Autonomous vehicles have brought about a technological revolution, particularly in the
field of computer vision, which encompasses image processing, machine learning, and
deep learning algorithms. [12]
Chishti et al. used a process termed "supervised learning", in which a large amount of data was collected with high precision by a driven car [14]. Memon et al. employed another method, known as the "Target Car" method, in which the vehicle obtains data by following the direction and location of another vehicle [13].
Detecting the lanes of the road on which the vehicle is driven is crucial for an autonomous vehicle. Two strategies used to detect the lanes are the Hough Transform, which defines a family of feature-extraction methods, and SVM (Support Vector Machine) based machine learning algorithms that can be applied using lines, edges, and the region of interest. [15]
Such research can yield excellent accuracy once differences in processing capability and light intensity are taken into account. Satozda et al. claim that the accuracy of the same system was increased by enhancing the Hough Transform algorithm for lane detection and introducing HAHT, the hierarchical additive Hough Transform. Another solution is to employ YOLO (You Only Look Once) and CNN (Convolutional Neural Network) algorithms, which recognize the lane and associated objects on highways but not on city streets. [16]
Machine learning is also used to recognize traffic signs. A traffic sign may be identified by using a CNN and converting the input photos to grayscale; compared with other approaches, a deep CNN architecture offers a greater accuracy rate. A similar approach was used for sign detection with a CNN-SVM classification that achieves a 98.6% accuracy rate: it takes a raw image of a sign, converts it to grayscale, and then trains on the image for feature extraction and classification. This model is tested prior to data processing, and if the process fails, it should be trained again for optimal results. [17]
The primary purpose of computer vision is to reconstruct and interpret images and videos captured by digital cameras or optical sensor arrays observing natural scenes. Every scene is made up of pixels, edges, shapes, colors, and textures, which are then processed by artificial intelligence systems. The OpenCV library is used for image processing and machine learning, and it converts images from one format to another depending on the type of data required, such as grayscale, RGB, BGR, or thresholded images. In OpenCV, thresholding is a technique for substituting pixel values: each pixel value is compared with a threshold value. When the value of a pixel is less than or equal to the threshold, it is set to 0 (black); if the value exceeds the threshold, it is set to 1. [18]

g(x,y) = \begin{cases} 0, & \text{if } f(x,y) \le T \\ 1, & \text{if } f(x,y) > T \end{cases}

where f(x, y) is the input pixel intensity and T is the threshold.
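As an illustration of the thresholding operation just described, the following is a minimal OpenCV sketch; the input file name and threshold value are assumptions chosen for the example, and OpenCV scales the "1" output to a chosen maximum value (here 255):

import cv2

# Minimal thresholding sketch; the file name and threshold value T are illustrative assumptions.
image = cv2.imread("test_image.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # compare intensities on a single channel
T = 127
# Pixels <= T become 0 (black); pixels > T become the maximum value (255, i.e. "1" scaled).
_, binary = cv2.threshold(gray, T, 255, cv2.THRESH_BINARY)
cv2.imshow("binary", binary)
cv2.waitKey(0)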

One of the most popular edge detectors is the Canny edge detector. It can recognize edges with high precision, captures as many edges as possible in an image, and produces the lowest level of noise compared with other detectors. The histogram function shows the distribution of pixel values in graphical form: the histogram's x-axis depicts the possible colour values between 0 and 255, while the y-axis shows the number of times each value appears in the image. Pointers, which are variables that store memory addresses, can be used to access the values of the histogram function; they hold additional variables, addresses, or memory elements. [19]
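To make the histogram idea concrete, the following is a minimal Python/OpenCV sketch (the file name is an assumption for illustration):

import cv2

# Compute the grayscale histogram: 256 bins for intensity values 0-255.
gray = cv2.imread("test_image.jpg", cv2.IMREAD_GRAYSCALE)
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
for value, count in enumerate(hist.flatten()):
    # x-axis: intensity value; y-axis: number of pixels with that value.
    print(value, int(count))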

Chapter 3

Methodology
3.1 Technology Behind Self-Driving Cars
Self-driving cars, also known as autonomous vehicles, are equipped with a complex network of sensors, processors and software that work together to enable them to navigate and make decisions without human intervention.
1. Lidar (Light Detection and Ranging): Lidar sensors use lasers to create a 3D map of the environment around the vehicle. This allows the car to detect and track objects, such as other vehicles, pedestrians, and road markings.

2. Radar (Radio Detection and Ranging): Radar sensors use radio waves to detect and track
the speed and distance of the objects around the vehicle. This helps the car to avoid
collisions and adjust its speed to maintain a safe distance from other vehicles.

3. Cameras: Cameras provide a visual feed of the surroundings, which can be used for object
detection, lane detection, and traffic sign recognition.

4. GPS (Global Positioning System): GPS is used to determine the location of the vehicle
and to plan a route to the destination.

5. Machine Learning and Artificial Intelligence: The sensors collect large amounts of data,
which is processed by machine learning algorithms to help the car make decisions in real-
time.

6. Control Systems: The car's control system uses the information gathered by the sensors and processed by the algorithms to make decisions about steering, braking, and acceleration.

3.2 Model Evaluation


3.2.1 Finding Lane Lines

Fig 3.2.1 test_image for finding lane
Loading Images
The purpose of this section is to build a program that can identify lane lines in a picture or a video. When we drive a car, we can see where the lane lines are using our eyes. A car doesn't have any eyes, and that's where computer vision comes in: through complex algorithms, it helps the computer see the world as we do. In our case, we'll be using it to see the road and identify lane lines in a series of camera images.

imread() – This function reads the image from our file and returns it as a multidimensional NumPy array containing the relative intensities of each pixel in the image.
imshow() – This function renders the image and shows it in an output window.
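A minimal sketch of these two calls, assuming the test image shown in Fig 3.2.1 is saved as test_image.jpg:

import cv2

image = cv2.imread("test_image.jpg")   # multidimensional NumPy array of pixel intensities (BGR)
cv2.imshow("result", image)            # render the image in a window
cv2.waitKey(0)                         # keep the window open until a key is pressed
cv2.destroyAllWindows()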

Edge Detection Technique


The goal of edge detection is to identify the boundaries of objects within an image. In essence, we will use it to find regions of an image where there is a sharp change in intensity or a sharp change in colour.
An image is a matrix, an array of pixels. A pixel holds the light intensity at some location in the image, and each pixel's intensity is denoted by a numeric value ranging from zero to 255.
An intensity value of zero indicates no intensity (something completely black), whereas 255 represents maximum intensity (something completely white). A gradient is the change in brightness over a series of pixels. A strong gradient indicates a steep change, whereas a small gradient represents a shallow change. This helps us identify edges in our image, since an edge is defined by a difference in intensity between adjacent pixels: wherever there is a sharp change in intensity, a rapid change in brightness, a strong gradient, there is a corresponding bright pixel in the gradient image. By tracing out all of these pixels, we obtain the edges.
We're going to use this intuition to detect the edges in our own image. This is a multi-step process.
Step 1: Grayscale Conversion of image
A three-channel colour image has red, green and blue channels, so each pixel is a combination of three intensity values, whereas a grayscale image has only one channel, with each pixel holding a single intensity value ranging from zero to 255.

The point is that processing a single-channel grayscale image is faster and less computationally intensive than processing a three-channel colour image.
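A minimal sketch of Step 1, converting the loaded colour image to grayscale with OpenCV:

import cv2
import numpy as np

image = cv2.imread("test_image.jpg")
lane_image = np.copy(image)                           # work on a copy of the original frame
gray = cv2.cvtColor(lane_image, cv2.COLOR_BGR2GRAY)   # three channels reduced to one
cv2.imshow("result", gray)
cv2.waitKey(0)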

Fig 3.2.2 test_image ( Gray scale Image)

Step 2: Image Smoothing

While it's important to accurately catch as many edges in the image as possible, we must filter out any image noise. Image noise can create false edges and ultimately affect edge detection. That's why it's imperative to filter it out and thus smooth the image; this smoothing will be done with a Gaussian filter.
A Gaussian filter is a commonly used linear filter for image smoothing or blurring. It works by convolving the image with a 2D Gaussian function, a bell-shaped curve that gives more weight to pixels closer to the centre and less weight to pixels farther away.

The main idea behind using a Gaussian filter for image smoothing is to remove high-frequency noise from the image while preserving the low-frequency content, such as edges and textures. This is achieved by averaging the values of neighbouring pixels, with the weights determined by the Gaussian function.
The standard deviation (sigma) of the Gaussian function controls the amount of smoothing applied to the image. A larger sigma value results in a smoother image, while a smaller value preserves more detail. In practice, the choice of sigma depends on the characteristics of the image and the level of smoothing desired.

We apply a Gaussian blur to the grayscale image with a five-by-five kernel. The size of the kernel depends on the specific situation, but a five-by-five kernel is a good size for most cases. The operation returns a new image that we simply call blur. Applying the Gaussian blur by convolving our image with a kernel of Gaussian values reduces the noise in the image.
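A minimal sketch of Step 2, applying the five-by-five Gaussian kernel described above:

import cv2

gray = cv2.imread("test_image.jpg", cv2.IMREAD_GRAYSCALE)
# sigma = 0 lets OpenCV derive the standard deviation from the kernel size.
blur = cv2.GaussianBlur(gray, (5, 5), 0)
cv2.imshow("result", blur)
cv2.waitKey(0)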

Fig 3.2.3 test_image.jpg - Blurred Image after applying Gaussian Filter

Step 3: Simple Edge Detection
Now it's time to apply the Canny method to identify edges in our image.
Recall that an edge corresponds to a region of an image where there is a sharp change in intensity; the change in brightness over a series of pixels is the gradient.
A strong gradient indicates a steep change, whereas a small gradient indicates a shallow change.
We first established that an image, since it's composed of pixels, can be read as a matrix, an array of pixel intensities. To compute the gradients in an image, one must recognize that we can also represent the image in a two-dimensional coordinate space, X and Y. The X axis traverses the image's width (the number of columns) and the Y axis goes along the image's height (the number of rows), such that the product of width and height gives the total number of pixels in the image. The point is that we can look at our image not only as an array but also as a continuous function of X and Y. Since it's a mathematical function, we can perform mathematical operations on it, which begs the question: what operator can we use to determine a rapid change in brightness in our image?
What the Canny function does for us is perform a derivative on our function in both the X and Y directions, thereby measuring the change in intensity with respect to adjacent pixels. A small derivative means a small change in intensity, whereas a big derivative means a big change. By computing the derivative in all directions of the image, we are computing the gradients, since the gradient is the change in brightness over a series of pixels. So we call the Canny function and it does all of that for us: it computes the gradient in all directions of our blurred image and then traces the strongest gradients as a series of white pixels.

Notice the two arguments, the low threshold and the high threshold, which allow us to isolate the adjacent pixels that follow the strongest gradients: if the gradient is larger than the upper threshold, it is accepted as an edge pixel; if it is below the lower threshold, it is rejected; if the gradient falls between the two thresholds, it is accepted only if it is connected to a strong edge.
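A minimal sketch of Step 3; the low and high thresholds of 50 and 150 are assumptions chosen at the commonly used 1:3 ratio:

import cv2

gray = cv2.imread("test_image.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
canny = cv2.Canny(blur, 50, 150)   # (image, low threshold, high threshold)
cv2.imshow("result", canny)
cv2.waitKey(0)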

Fig 3.2.3 test_image.jpg ( Gradient Image)
The figure above shows the gradient image, which clearly traces an outline of the edges that correspond to the sharpest changes in intensity. Gradients that exceed the high threshold are traced as bright pixels, identifying the adjacent pixels in the image with the most rapid changes in brightness. Small changes in brightness are not traced at all; accordingly, they are black, as they fall below the threshold.

Step 4: Identify lane lines in the image - REGION OF INTEREST

Fig : 3.2.4 lane image with X and Y axis
The Y axis starts from the first row of pixels and increases downwards. We're going to limit the extent of our field of view to a REGION OF INTEREST, which ultimately traces a triangle: one vertex lies 200 pixels along the X axis and 700 pixels along the Y axis, which is simply at the bottom of the image, and the last vertex lies 550 pixels along X and 250 pixels down the Y axis, ultimately tracing a triangle that isolates the region where we want to identify the lines.
So the next goal is to create an image that's completely black, a mask with the same dimensions as our road image, and fill the part of its area covered by the triangular polygon. First and foremost, we will revert to showing the image with OpenCV rather than Matplotlib.

The polygon variable takes the precise locations of the vertices of the region of interest as its parameters.

Fig: 3.2.5 Area of Interest


We have to fill this mask with our triangle using cv2.fillPoly().

Fig 3.2.5 Region of interest filled with white color (High intensity)
The above figure represents our mask; inside the mask is the enclosed region of the polygon with the specified vertices. Previously, we created a mask with the same dimensions as our road image. We then identified a region of interest in our own image, with very specific vertices along the X and Y axes, that we then used to fill the mask. We're going to use it to show only a specific portion of the image.
Each image is represented as an array of pixels with intensities ranging from 0 to 255. In our image, black is marked as 0 whereas white is marked as 255. Each of these intensities can be represented as a binary number, where the black portion is 0000 and the white portion is 1111. We're going to apply this mask to our Canny image to ultimately show only the region of interest, the region traced by the polygon contour. We do this by applying the bitwise AND operation between the two images; the bitwise AND operation occurs element-wise between the two images, between the two arrays of pixels.

Fig 3.2.6 Pixels representation of Image


Both of these images have the same array shape and therefore the same dimensions and the same number of pixels, which allows us to apply the bitwise AND.

Fig 3.2.7 Applying BITWISE AND OPERATION

Since the operation occurs element-wise, we're taking the bitwise AND of each pair of homologous pixels in the two arrays. The way bitwise AND works is that the black region of the mask, whose pixels have intensity values corresponding to the binary number 0000, is operated against the pixel values in the corresponding region of the other array, and the result is always the binary value 0000. This translates to the number zero, which means all pixel intensities in that region will have a value of ZERO: the region will be completely black, thereby masking it out entirely. Because the operation occurs element-wise, all the white pixels of the mask will likewise be operated against the corresponding region of the other array, and that region will remain unaffected.
We already concluded that since the polygon contour is completely white, the binary representation of each pixel intensity in that region is all ones. Taking the bitwise AND of ones with any other value has no effect, and so in our image, taking the bitwise AND of these two regions also has zero effect, which means we have successfully masked our Canny image to ultimately show only the region of interest, the region traced by the polygon contour.
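Putting the masking steps together, a minimal sketch of the region-of-interest function; the vertices (200, 700) and (550, 250) come from the text above, while the third bottom-right vertex (1100, 700) is an assumption for illustration:

import cv2
import numpy as np

def region_of_interest(image):
    # Triangle that isolates the lane region; the (1100, 700) vertex is an assumed value.
    polygons = np.array([[(200, 700), (1100, 700), (550, 250)]], dtype=np.int32)
    mask = np.zeros_like(image)                  # completely black mask, same shape as the image
    cv2.fillPoly(mask, polygons, 255)            # fill the polygon with white (high intensity)
    masked_image = cv2.bitwise_and(image, mask)  # element-wise AND keeps only the ROI
    return masked_image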

Fig 3.2.8 Region of interest
Step 5: Line Detection using Hough Transform

We have identified the edges in our image and isolated the region of interest. Now we'll make use of a technique that detects straight lines in the image and thus identifies the lane lines. The technique is known as the Hough Transform. We'll start by drawing a 2D coordinate space of X and Y, and inside of it a straight line. The equation of a line is represented as y = mx + b. We are plotting this line as a function of x and y, but it can also be represented in a parameter space, which we will call Hough space, as b versus m.

Fig 3.2.9 a) Image Space b) Hough Space


Now, imagine that instead of a line we have a single dot located at some coordinates. There are many possible lines that can pass through this dot, each line with different values for m and b. Notice that a single point in X and Y space is represented by a line in Hough space: in other words, by plotting the set of lines that go through the point, each line with its own distinct m and b value pair, we produce an entire line of m and b value pairs in Hough space. There are many possible lines that can cross each point individually, each line with a different slope and y-intercept value. However, there is one line that is consistent with both points, and we can determine it by looking at the point of intersection in Hough space. Our gradient image is just a series of white points which represent the edges in our image.
But what are the lines? What are their parameters? How do we identify them? What we're going to do first is split our Hough space into a grid, each bin inside the grid corresponding to the slope and intercept value of a candidate line. Every edge point casts votes for the bins of the lines that could pass through it, and the bin with the maximum number of votes is selected.

Fig 3.2.10 Grid System View
The bin with the maximum number of votes identifies the line.
Displaying lane lines on the original image
The first function is named average_slope_intercept, which takes two parameters: image and lines.
and lines. The purpose of this function is to average the slope and intercept of the detected
lines and return the averaged lines. The function first creates two empty lists, left_fit and
right_fit, to store the slope and intercept values of the left and right lane lines respectively.
Next, the function checks if lines is not None. If lines is None, then the function returns
None. For each line in the lines parameter, the function iterates through the start and end
points of the line represented as x1, y1, x2, y2. Using these coordinates, the function
calculates the slope and intercept of the line using np.polyfit function from NumPy.
The slope value is used to distinguish between left and right lane lines. If the slope is
negative, the line is considered to be part of the left lane, and its slope and intercept values
are appended to the left_fit list. Otherwise, the line is considered to be part of the right
lane, and its slope and intercept values are appended to the right_fit list. After all the lines
have been processed, the function calculates the average slope and intercept values for
both the left and right lanes using np.average function. The resulting values are stored in
left_fit_average and right_fit_average variables.
Finally, the function uses these averaged values to generate the start and end points of the
left and right lane lines using the make_points function. The resulting lines are stored in a
list named averaged_lines, which is returned by the function. The second function is
named display_lines and takes in two parameters: img and lines.
This function generates a new image with the detected lane lines drawn on it. The
function first creates a new numpy array of the same size as the input image img with all
elements initialized to zero. This new array is named line_image. Then, the function
iterates through each line present in the lines parameter. Each line is represented as a list
containing four integers, namely, x1, y1, x2, y2, which are the coordinates of the start and
end points of the line respectively.
For each line, the function draws a line on the line_image array using cv2.line function
from the OpenCV library. The line is drawn using the color blue, represented as (255, 0,
0), and a thickness of 10 pixels. Finally, the main code uses these two functions to
generate the final image with the detected lane lines. The average_slope_intercept
function is first called to generate the averaged lines, which are then passed to the

display_lines function to generate the line image. The resulting line image is then
combined with the original lane image using cv2.addWeighted function to generate the
final output image named combo_image.
The code combo_image = cv2.addWeighted(lane_image, 0.8, line_image, 1, 0) combines
the original image lane_image with a new image line_image that contains the detected
lane lines. The resulting image is stored in the combo_image variable.
The cv2.addWeighted function blends the two images using a weighted sum. In this case,
the lane_image is given a weight of 0.8, while the line_image is given a weight of 1.0.
This means that the lane lines will be more visible in the final image since they have a
higher weight.
The third parameter, gamma, is set to 0, which means no scalar value is added to each
pixel after blending.
Overall, the resulting combo_image will contain the original image with the added lane
lines, with the lane lines being more prominent than the rest of the image. This is useful in
tasks such as lane detection in self-driving cars, where the lane lines need to be clearly
visible for the vehicle's computer vision system to properly detect and navigate within the
lane.
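Gathering the description above into code, a minimal sketch of these helper functions; the make_points helper's 3/5-of-image-height upper limit for the drawn lines is an assumption for illustration:

import cv2
import numpy as np

def make_points(image, line_parameters):
    # Convert an averaged (slope, intercept) pair into pixel start and end points.
    slope, intercept = line_parameters
    y1 = image.shape[0]              # bottom of the image
    y2 = int(y1 * 3 / 5)             # assumed upper extent of the drawn line
    x1 = int((y1 - intercept) / slope)
    x2 = int((y2 - intercept) / slope)
    return [[x1, y1, x2, y2]]

def average_slope_intercept(image, lines):
    # Average the slopes/intercepts of the detected segments into one left and one right line.
    left_fit, right_fit = [], []
    if lines is None:
        return None
    for line in lines:
        x1, y1, x2, y2 = line.reshape(4)
        slope, intercept = np.polyfit((x1, x2), (y1, y2), 1)
        if slope < 0:
            left_fit.append((slope, intercept))
        else:
            right_fit.append((slope, intercept))
    left_line = make_points(image, np.average(left_fit, axis=0))
    right_line = make_points(image, np.average(right_fit, axis=0))
    return [left_line, right_line]

def display_lines(img, lines):
    # Draw the averaged lane lines in blue (BGR 255, 0, 0) with a thickness of 10 pixels.
    line_image = np.zeros_like(img)
    if lines is not None:
        for x1, y1, x2, y2 in np.array(lines).reshape(-1, 4):
            cv2.line(line_image, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 10)
    return line_image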

Fig 3.2.11 Detected Lane line, blended over original image

Capturing lanes on a video
Lane line detection in a video involves detecting the lane lines on each frame of the video
as it plays. This is achieved by applying image processing techniques to each frame of the
video to extract the lane lines. The detected lane lines are then overlaid on the original
video frames to visualize the results.
cap = cv2.VideoCapture("test2.mp4"): This line opens the video file "test2.mp4" and
creates a VideoCapture object cap. This object is used to read frames from the video
stream.
while(cap.isOpened()):: This line starts a loop that will continue until the end of the video
or until the user presses 'q' to quit.
_, frame = cap.read(): This line reads the next frame from the video stream and stores it in
the variable frame. The underscore before frame is used to ignore the first return value
from the cap.read() function.
canny_image = canny(frame): This line applies the Canny edge detection algorithm to the
frame and returns the resulting binary image canny_image.
cropped_canny = region_of_interest(canny_image): This line applies a region of interest
mask to canny_image to remove any edges outside the area of the lane.

lines = cv2.HoughLinesP(cropped_canny, 2, np.pi/180, 100, np.array([]), minLineLength=40, maxLineGap=5):
This line applies the Hough transform to the cropped Canny image to detect lines. The
parameters used here are cropped_canny (the input image), 2 (rho value), np.pi/180 (theta
value), 100 (the minimum number of votes needed to be considered a line), np.array([]) (a
placeholder array), minLineLength=40 (the minimum length of a line to be detected), and
maxLineGap=5 (the maximum gap allowed between line segments to be considered a
single line).
averaged_lines = average_slope_intercept(frame, lines): This line takes the detected lines
from the previous step and uses the average_slope_intercept function to calculate the
slope and intercept of the left and right lanes.
line_image = display_lines(frame, averaged_lines): This line takes the original frame and
the averaged lane lines from the previous step and creates an image line_image with the
detected lane lines drawn on it.
combo_image = cv2.addWeighted(frame, 0.8, line_image, 1, 1): This line overlays
line_image on top of frame using the cv2.addWeighted function. The resulting image is
stored in combo_image.
cv2.imshow("result", combo_image): This line displays the resulting combo_image in a
window with the title "result".
if cv2.waitKey(1) & 0xFF == ord('q'): break: This line waits for a key event for 1
millisecond. If the user presses the 'q' key, the loop is terminated.
cap.release(): This line releases the resources held by the VideoCapture object cap.

cv2.destroyAllWindows(): This line destroys all the windows created by OpenCV.
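Consolidating the video steps described above into one loop (this sketch reuses the region_of_interest, average_slope_intercept and display_lines helpers sketched earlier; the Canny thresholds of 50 and 150 inside canny() are assumptions):

import cv2
import numpy as np

def canny(image):
    # Steps 1-3 from above: grayscale, 5x5 Gaussian blur, Canny edge detection.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    return cv2.Canny(blur, 50, 150)

cap = cv2.VideoCapture("test2.mp4")
while cap.isOpened():
    _, frame = cap.read()
    canny_image = canny(frame)
    cropped_canny = region_of_interest(canny_image)
    lines = cv2.HoughLinesP(cropped_canny, 2, np.pi / 180, 100, np.array([]),
                            minLineLength=40, maxLineGap=5)
    averaged_lines = average_slope_intercept(frame, lines)
    line_image = display_lines(frame, averaged_lines)
    combo_image = cv2.addWeighted(frame, 0.8, line_image, 1, 1)
    cv2.imshow("result", combo_image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()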

3.2.2 Machine Learning


Machine learning is the concept of building computational systems that learn over time from experience. At its most basic level, rather than explicitly programming hard-coded instructions into a system, machine learning uses a variety of algorithms to describe and analyse data, learn from it, improve, and predict useful outcomes.

Learning can be classified as supervised or unsupervised. Supervised learning is the most popular machine learning technique; it acts as a guide that teaches the algorithm what conclusions it should come up with. Supervised learning typically begins with a data set associated with labelled features, which define the meaning of the data, and finds patterns that can be applied to an analytics process.

Another machine learning approach, which occurs between the learner and its environment, is unsupervised learning. In this case, all the learner gets as training is a large data set with no labels, and the learner's task is to process that data, find similarities and differences in the information provided, and act on that information without prior training.

3.2.3 Linear Regression


Linear regression is a supervised learning algorithm that allows us to make predictions based on linearly arranged data sets. For now, let's look at an example of a simple linear regression model, which models the relationship between a quantitative response variable and an explanatory variable.

The response variable is the dependent variable whose value we want to explain and predict based on the values of the independent variable. More specifically, we will look to establish a linear relationship between the price of a house and its size: the price of the house depends on its size and as such represents our response variable, whereas the size of the house is the independent variable whose values we're going to use to try and make predictions about the price.

Fig 3.2.12 Linear Regression with data points

Accordingly, the size of the house is the input and the output is what we're trying to predict, the price itself. We're currently given a set of data points which reflect the prices of some houses based on their sizes, and each house of a given size is given a price label. We fit a linear function that best approximates the relationship between our two variables. This creates a predictive model for our data, showing trends in the data, and can thereby predict the values of new data points which were not initially labelled. This is known as linear regression. The linear regression model also accounts for an error value: our predictive model, through some learning rate, learns to minimize this error by updating its linear function until no further improvement is possible.
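As a tiny illustration of this idea, the sketch below fits a line with NumPy; the house sizes and prices are made-up values, not data from this project:

import numpy as np

sizes = np.array([50.0, 70.0, 90.0, 110.0, 130.0])    # house sizes (independent variable)
prices = np.array([30.0, 41.0, 52.0, 60.0, 72.0])     # house prices (response variable)

# Least-squares fit of price = slope * size + intercept, minimizing the error.
slope, intercept = np.polyfit(sizes, prices, 1)
print(slope * 100 + intercept)                         # predicted price of a 100-unit house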

3.2.4 Introduction to Neural Network (Classification)


Predictive modelling approximates a mapping function which predicts the class or category for a given observation. In the simplest terms, it takes some kind of input X and classifies it, mapping it to some category, generally based on previous data sets. It calculates the likelihood of something belonging to some class. The best way to get a broad sense of how classification works is to go through an example: consider a hospital which tests thousands of patients annually. Our job, based on each person's condition, is to determine whether or not they are likely to become diabetic.

Let's suppose the examination contains information regarding two attributes: the person's blood sugar and their age. Generally speaking, high blood sugar indicates a resistance to insulin; without insulin, a lot of sugar builds up in the bloodstream, leading to diabetes. As for age, as someone gets older they tend to exercise less and gain weight, increasing the risk of diabetes as well.

Obviously, in a real diagnosis a lot of other factors are considered, like blood pressure, cholesterol levels, family history, etc. But for the sake of our model and to keep things simple, let's assume that the higher our blood sugar and the older we are, the more likely we are to become diabetic.

For input 1, one person has a blood sugar level of two millimoles per litre and is 20 years old. This person is young and their blood sugar level is relatively low, so it is unlikely that they would be diabetic; they're in good health.
For input 2, another person is 60 years old with a blood sugar level of 11 millimoles per litre. This person is relatively older with very high blood sugar, so we'll assume it's very likely that they are diabetic.

For input 3, finally, there is another person who is forty-five years old with a blood sugar level of five millimoles per litre. Is this person likely to have the disease or not? This one is a bit of a question mark: we need some kind of predictive model which can predict the category, the class, to which the person belongs and, based on the given input, tell us whether that person is likely to have the disease or not.

We'll start by representing this predictive model inside a two-dimensional Cartesian coordinate system. Blood glucose level will traverse the vertical axis, ranging from zero to 12 millimoles per litre, and age will traverse the horizontal axis.

In our case, the age axis simply ranges from zero to one hundred years. What we'll do now is plot the people we have so far: the first one will be plotted at the coordinates denoting (2 millimoles per litre, 20 years old), located in the bottom left of our coordinate space.

The second person will be plotted at the coordinates denoting 11 millimoles per litre and 60 years old. Let's assume that we ended up testing both of our patients and found that the second person indeed did have diabetes and the first did not. Thus, we can assign each one a label, such that one patient has the disease and the other does not, denoting not being diabetic as blue and actually having the disease as red. How do we determine whether the person who is forty-five years old with five millimoles per litre of glucose has diabetes or not? To which class do we classify this person? Are they likely to have the disease or not? Recall that since classification is a supervised form of learning, we need previously labelled data sets so that we can develop a model which learns how to calculate the likelihood of a new patient belonging to some class.

All of the points we're seeing now are patients whom we ended up testing; we found out which ones were diabetic and which ones weren't. Anyone who didn't have the disease was labelled blue, and anyone who did carry the disease was labelled red.

Fig 3.2.13 Distribution of two classes

Clearly, there's a pattern in our data: the higher your age and blood sugar, the more likely you are to become diabetic. What about the person who's 45 years old with a sugar level of five millimoles per litre? Unlike the others, we haven't actually tested this person; they have not been labelled. But based on the previous data, it can be confidently predicted that this person is not likely to have the disease. We can easily see that previous patients with similar blood sugar and age, whom we actually tested, are not diabetic. Thus we can conclude that this person is likely not diabetic either. Now recall that we previously discussed how we can use lines to classify data into discrete classes; simply enough, our data can be separated by a line.

Fig 3.2.14 Linear Separation between two classes

This line is represented by some linear equation. How do we obtain the equation of this line? There is a clear distinction: patients above the line are diabetic and patients below the line are not. That being said, based on this line, based on our predictive model, which was obtained thanks to previous data sets, we can effectively predict the probability of a new patient having diabetes. If, for example, someone has a blood sugar level of four millimoles per litre and is fifty-five years old, this is a new data point and it's not labelled. We haven't actually tested this person, so we don't know. We need to use our linear model, which we obtained from previously labelled data sets, to make a prediction. Clearly, this coordinate is plotted below the line, so it can be confidently assumed that this person is not likely to be diabetic.
In summary, what we're doing is first looking at previous data sets of patients who have
already been labelled as having the disease or not, and based on that data, we can come up
with a pattern, a predictive linear model, a line that best separates our data into discrete
classes.
Classification in itself is a very important concept in self-driving cars. Implementing an
algorithm that's able to find this line with minimal error will be the entire focus of this
section. Solving classification problems is essential in self-driving cars and can be used to
classify whether an object crossing the road is a car or a pedestrian, a bicycle, or even
identifying different traffic signs.

3.2.5 Perceptron
Previously we saw how algorithms are trained to develop linear models that best classify correctly labelled data points, and how we can use such a linear model to predict the values of new, unlabelled data based on their score. The perceptron is the most basic form of neural network, and it takes its inspiration from the brain. The brain is a complex information processing device with incredible pattern recognition abilities: it is able to efficiently process input from its surrounding environment, categorize it, and generate some kind of output. Similarly, the perceptron is a basic processing element that also receives input, processes it, and generates an output based on a predictive model. More specifically, this predictive linear model was developed based on previously labelled data sets.
Our model node is going to receive two inputs, the age and the blood sugar, x1 and x2 respectively. Suppose we want to predict whether someone is diabetic. The person is 30 years old and has a blood sugar level of three millimoles per litre; as such, the input node x1 is 30 (years) and the input node x2 is three (millimoles per litre). Given the equation of the line, the perceptron will plot the point on the graph and check where the point lies relative to the linear model: if the person's linear combination is positive, the point is plotted below the line and the perceptron outputs a prediction of ONE; otherwise, if the person's linear combination is negative, the point is classified above the line and the perceptron outputs a prediction of ZERO.
In our case, this person who is 30 years old with three millimoles per litre of blood sugar has a positive linear combination and is therefore plotted somewhere below the line, along with the other blue points, in region ONE. Thus we can safely predict that this person is in class one: they are not diabetic.

We also add a bias input with a value of 1 to the perceptron model, which allows the model to shift its decision boundary and predict the correct output, namely that this person is not diabetic.
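A minimal sketch of such a perceptron prediction; the weights and bias below are illustrative assumptions (in practice they are learned from the labelled data):

def perceptron_predict(x1, x2, w1, w2, b):
    # Weighted linear combination of the two inputs plus the bias, passed through a step function.
    score = w1 * x1 + w2 * x2 + b
    return 1 if score >= 0 else 0            # 1: below the line (not diabetic), 0: above the line (diabetic)

w1, w2, b = -0.05, -0.4, 3.0                 # assumed weights and bias for illustration
print(perceptron_predict(30, 3, w1, w2, b))  # age 30, blood sugar 3 mmol/L  -> 1 (not diabetic)
print(perceptron_predict(60, 11, w1, w2, b)) # age 60, blood sugar 11 mmol/L -> 0 (diabetic)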

Fig 3.2.15 Perceptron Model

Initial setup: plotting the two regions of the data with Matplotlib

Fig 3.2.16 Plotting Random Points

Error function

As we saw above, random points have been plotted. The goal is to use this data to train and develop an algorithm that comes up with a line that best separates the data into discrete classes with minimal error. As we know, a line can be represented by an equation in which the weights w1 and w2 dictate the slope of the line. These weights start out as random values, so we're just going to have a random line which, more often than not, is not going to classify our data correctly. The algorithm will display a random line with some equation, then look at this line, identify any errors in how it categorizes the data, and eventually adjust the weights of the linear equation to fit the data better.

Suppose a line is initially drawn by the computer but it doesn't fit the data well and doesn't classify it correctly. To overcome this, we need to determine the error rate, that is, whether it is large or small. Through a gradient, our network is then going to take tiny steps to reduce this error. In every iteration these tiny steps are scaled by the learning rate, which needs to be sufficiently small. At every step, the line will move closer to the misclassified points, and it keeps doing so until there are no errors or until the error is sufficiently small. By doing so, we eventually obtain the best-fitting model.

Calculate the error function

We're going to need a continuous error function, which we'll call E. Looking at this diagram, there are clearly two misclassified points.
The total error results from the sum of the penalties associated with each point, so we can safely assume that the total error is very high. What we'll do is move the line in the direction with the most errors, and keep doing that until all error penalties are sufficiently small. In this way we minimize the error as we adjust the weights of our linear model to better classify the points, thereby decreasing the total error sum.

Sigmoid

The sigmoid function, also known as the logistic function, is commonly used in neural
networks to produce probabilities. The sigmoid function takes any real-valued number as
input and squashes it into a range between 0 and 1. This property makes it suitable for
mapping the output of a neural network to a probability.

The sigmoid function is defined as

\sigma(x) = \frac{1}{1 + \exp(-x)}

where exp(-x) represents the exponential function with base e raised to the power of -x.
In a binary classification problem, the output of a neural network is typically passed
through a sigmoid activation function. This activation function transforms the output into
a probability value between 0 and 1, representing the likelihood of belonging to the
positive class.

Implementation of sigmoid function
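A minimal sketch of such a sigmoid implementation:

import numpy as np

def sigmoid(score):
    # Squashes any real-valued score into the range (0, 1).
    return 1 / (1 + np.exp(-score))

print(sigmoid(0))      # 0.5
print(sigmoid(4.6))    # ~0.99: high probability of the positive class
print(sigmoid(-4.6))   # ~0.01: low probability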

Printing a linear model with 100 blue and red points

Fig 3.2.17 Linear classification with the perceptron model (high error)

Cross Entropy

The general equation for cross entropy is the following.

-\sum \big[\, y \ln(p) + (1 - y) \ln(1 - p) \,\big]
where p is the probability of a point being blue and the variable y corresponds to a label of either ZERO or ONE: if our point is labelled blue (we tested this person and they are healthy), it has a label of one; if our point is red, y equals ZERO.

Cross-entropy is a commonly used loss function in classification tasks, including linear models. It measures the dissimilarity between the predicted probabilities and the true labels and is often used to train models to output probability distributions that are as close as possible to the true distribution. By minimizing the cross-entropy loss, the linear model can learn the best decision boundary that maximizes the separation between different classes and accurately predicts the class probabilities for new, unseen data.
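A minimal sketch of the cross-entropy computation described above (the example labels and probabilities are made up for illustration):

import numpy as np

def cross_entropy(y, p):
    # y: true labels (1 = blue/healthy, 0 = red/diabetic); p: predicted probabilities of being blue.
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(cross_entropy([1, 0, 1], [0.9, 0.1, 0.8]))   # ~0.43: confident, mostly correct predictions
print(cross_entropy([1, 0, 1], [0.1, 0.9, 0.2]))   # ~6.21: confident but wrong predictions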

Gradient descent is an iterative optimization algorithm commonly used to minimize the error or
loss function in machine learning models. It is particularly effective in training neural networks
and other models with a large number of parameters.
The basic idea behind gradient descent is to update the model's parameters in the opposite
direction of the gradient of the loss function with respect to those parameters. By taking steps in
the direction of the steepest descent, the algorithm gradually converges towards the minimum of
the loss function.
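A minimal sketch of gradient descent for the sigmoid/cross-entropy model discussed in this section; the data points, learning rate and epoch count are illustrative assumptions:

import numpy as np

def sigmoid(score):
    return 1 / (1 + np.exp(-score))

def gradient_descent(X, y, learning_rate=0.01, epochs=500):
    # X has one row per point, with a trailing 1 so the last weight acts as the bias.
    weights = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        p = sigmoid(X @ weights)              # predicted probabilities
        gradient = X.T @ (p - y) / n          # gradient of the cross-entropy loss
        weights -= learning_rate * gradient   # small step against the gradient
    return weights

# Columns: blood sugar (mmol/L), age (years), bias term; labels: 1 = not diabetic, 0 = diabetic.
X = np.array([[2.0, 20.0, 1.0], [11.0, 60.0, 1.0], [5.0, 45.0, 1.0], [3.0, 30.0, 1.0]])
y = np.array([1.0, 0.0, 1.0, 1.0])
print(gradient_descent(X, y))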

Code Implementation

Fig 3.2.18 Linear classification with the perceptron model (least error)

3.2.6 Keras Prediction

Keras is a high-level neural networks library written in Python. It is commonly used as an interface for deep learning models, especially in the field of machine learning.
Keras provides a user-friendly and modular API that allows developers to quickly build
and experiment with different types of neural network architectures. It supports various
types of layers, such as dense layers, convolutional layers, recurrent layers, and more.
These layers can be easily stacked together to create complex network structures.
One of the key advantages of Keras is its simplicity and ease of use. It abstracts away many
of the low-level details of building neural networks, making it accessible to both beginners
and experienced practitioners. It also provides a wide range of built-in functions for
common tasks in deep learning, such as activation functions, loss functions, and optimizers.
Keras is built on top of other deep learning libraries, such as TensorFlow and Theano,
which provide efficient numerical computation. This allows Keras to leverage the
computational capabilities of these libraries while providing a more intuitive and user-
friendly interface.
Overall, Keras is a powerful tool for building and training machine learning models,
particularly deep learning models. It simplifies the process of designing and experimenting with neural networks, making it a popular choice among researchers and practitioners in the field of machine learning.
Keras code implementation
#importing libraries

1. n_pts = 500: Sets the number of data points to 500. This variable determines the
number of points in each class.
2. np.random.seed(0): Sets the random seed to ensure reproducibility. By setting the seed
to a specific value (0 in this case), the same random numbers will be generated every
time the code is run.
3. Generating Data for Class A:
• Xa = np.array([np.random.normal(13, 2, n_pts), np.random.normal(12, 2, n_pts)]).T: Generates
n_pts data points for Class A by sampling from a normal distribution. The mean of the
distribution for the x-coordinate is 13, and the standard deviation is 2. Similarly, the
mean for the y-coordinate is 12, and the standard deviation is 2. The np.array function
is used to create a 2-dimensional array, and .T is used to transpose the array.
4. Generating Data for Class B:
 Xb = np.array([np.random.normal(8, 2, n_pts), np.random.normal(6, 2, n_pts)]).T : Generates
n_pts data points for Class B following the same logic as for Class A. The mean for
the x-coordinate is 8, and the mean for the y-coordinate is 6.
5. Combining Data and Creating Labels:
 X = np.vstack((Xa, Xb)): Stacks the data points from Class A and Class B vertically to
create a single array X that contains all the data points.
 y = np.matrix(np.append(np.zeros(n_pts), np.ones(n_pts))).T: Creates the labels for the data
points. It uses np.zeros(n_pts) to generate an array of 0s (representing Class A) with
length n_pts, and np.ones(n_pts) to generate an array of 1s (representing Class B) with
length n_pts. np.append is used to concatenate the two arrays, and np.matrix is used to
convert the resulting 1-dimensional array to a column matrix.
6. Plotting the Data:
 plt.scatter(X[:n_pts,0], X[:n_pts,1]): Plots the data points corresponding to Class A.
X[:n_pts,0] selects the x-coordinates of the first n_pts data points, and X[:n_pts,1]
selects the y-coordinates.

The resulting scatter plot shows two distinct clusters of data points, representing two
different classes (Class A and Class B). Class A is represented by blue dots, and Class
B is represented by orange dots.
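For reference, assembling the statements quoted above gives roughly the following snippet (the second scatter call for Class B and the final plt.show() are assumptions, inferred from the orange points in the figure):

import numpy as np
import matplotlib.pyplot as plt

n_pts = 500
np.random.seed(0)

# Two Gaussian clusters: Class A around (13, 12) and Class B around (8, 6)
Xa = np.array([np.random.normal(13, 2, n_pts), np.random.normal(12, 2, n_pts)]).T
Xb = np.array([np.random.normal(8, 2, n_pts), np.random.normal(6, 2, n_pts)]).T

X = np.vstack((Xa, Xb))
y = np.matrix(np.append(np.zeros(n_pts), np.ones(n_pts))).T

plt.scatter(X[:n_pts, 0], X[:n_pts, 1])   # Class A
plt.scatter(X[n_pts:, 0], X[n_pts:, 1])   # Class B (assumed)
plt.show()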

Fig 3.2.19 Random distribution of data points

Creating the Model

The code provided creates a sequential model using Keras, compiles it with an optimizer
and loss function, and trains the model using the provided data. Here's a breakdown of the
code:

1. Importing Libraries:
 from tensorflow.keras.models import Sequential: Imports the Sequential class from the
Keras models module.
 from tensorflow.keras.layers import Dense: Imports the Dense layer class from the Keras
layers module.
 from tensorflow.keras.optimizers import Adam: Imports the Adam optimizer from the
Keras optimizers module.
2. Model Creation:
 model = Sequential(): Initializes a sequential model, which is a linear stack of layers.
3. Adding Layers:
 model.add(Dense(units=1, input_shape=(2,), activation='sigmoid')): Adds a single dense
layer to the model. This layer has 1 unit, meaning it will output a single scalar value. The
input_shape parameter is set to (2,), indicating that the input data has two features. The
activation function used in this layer is the sigmoid function, which produces probabilities
between 0 and 1.
4. Compiling the Model:
 adam = Adam(lr=0.1): Initializes the Adam optimizer with a learning rate of 0.1.
 model.compile(adam, loss='binary_crossentropy', metrics=['accuracy']): Compiles the
model with the Adam optimizer, binary cross-entropy as the loss function, and accuracy
as the metric to monitor during training. Binary cross-entropy is commonly used in binary
classification tasks to measure the dissimilarity between the predicted probabilities and
the true binary labels.
5. Model Training:
 h = model.fit(x=X, y=y, verbose=1, batch_size=50, epochs=500, shuffle='true'): Trains
the model using the fit() function. The x parameter specifies the input data (X), and the y
parameter specifies the target labels (y). The verbose parameter set to 1 displays the
training progress during each epoch. The batch_size parameter determines the number of
samples per gradient update. The epochs parameter defines the number of times the
training process iterates over the entire dataset. The shuffle parameter set to 'true' shuffles
the data before each epoch to introduce randomness into the training process.
After training the model, the h object will contain information about the training
history, such as the loss and accuracy values for each epoch. This information can be
used to evaluate the model's performance and visualize the training progress.
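Putting the quoted lines together gives roughly the following (X and y come from the data-generation snippet above; the optimizer argument is written learning_rate here, the newer name for the lr argument quoted in the list):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))

adam = Adam(learning_rate=0.1)  # described in the text as Adam(lr=0.1)
model.compile(adam, loss='binary_crossentropy', metrics=['accuracy'])

h = model.fit(x=X, y=y, verbose=1, batch_size=50, epochs=500, shuffle=True)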

1. plt.plot(h.history['accuracy']): Plots the accuracy values from the training history stored in
the h object. The 'accuracy' key retrieves the accuracy values from the history dictionary.
2. plt.legend(['accuracy']): Adds a legend to the plot with the label 'accuracy'. This label
identifies the plotted line.
3. plt.ylabel('accuracy'): Sets the y-axis label as 'accuracy'.
4. plt.xlabel('epoch'): Sets the x-axis label as 'epoch'.
5. plt.plot(h.history['loss']): Plots the loss values from the training history. The 'loss' key
retrieves the loss values from the history dictionary.
6. plt.legend(['loss']): Adds a legend to the plot with the label 'loss'.
7. plt.title('loss'): Sets the title of the plot as 'loss'.
8. plt.xlabel('epoch'): Sets the x-axis label as 'epoch'.
By plotting the accuracy and loss values over the epochs, you can visualize the training
progress of the model. The accuracy plot shows how the model's accuracy improves or
changes throughout the training process. The loss plot, on the other hand, illustrates how
the loss decreases over the epochs. This information can help in evaluating the model's
performance and identifying any potential issues like overfitting or underfitting.
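Assembled from the list above, the plotting calls are simply (h is the history object returned by model.fit):

import matplotlib.pyplot as plt

plt.plot(h.history['accuracy'])
plt.legend(['accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.show()

plt.plot(h.history['loss'])
plt.legend(['loss'])
plt.title('loss')
plt.xlabel('epoch')
plt.show()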

Fig 3.2.20 Loss during training

The provided code defines a function called plot_decision_boundary that visualizes the
decision boundary of a classification model. Here's a breakdown of the code:
1. Function Signature:
 def plot_decision_boundary(X, y, model): Defines a function named
plot_decision_boundary that takes three parameters: X, y, and model. X represents
the input data, y represents the target labels, and model represents the trained
classification model.
2. Generating Meshgrid:
 x_span = np.linspace(min(X[:,0]) - 1, max(X[:,0]) + 1): Generates a range of x-
coordinates that spans the minimum and maximum values of the first feature in the
input data.
 y_span = np.linspace(min(X[:,1]) - 1, max(X[:,1]) + 1): Generates a range of y-
coordinates that spans the minimum and maximum values of the second feature in
the input data.
 xx, yy = np.meshgrid(x_span, y_span): Creates a 2-dimensional grid of points by
combining the x-coordinates and y-coordinates. The resulting xx and yy matrices
represent all the possible combinations of x and y coordinates within the given
ranges.
3. Reshaping and Predicting:
 xx_, yy_ = xx.ravel(), yy.ravel(): Flattens the xx and yy matrices into 1-dimensional
arrays, xx_ and yy_, respectively.
 grid = np.c_[xx_, yy_]: Concatenates the flattened xx_ and yy_ arrays side by side
to create a new 2-dimensional array called grid. Each row of the grid array
represents a point in the 2-dimensional feature space.
 pred_func = model.predict(grid): Uses the trained model to predict the class labels
for each point in the grid. The predict method is applied to the model object, and the
resulting predictions are stored in the pred_func variable.
4. Reshaping and Plotting:
 z = pred_func.reshape(xx.shape): Reshapes the predictions (pred_func) back into
the original shape of the xx matrix. The resulting reshaped predictions are stored in
the z variable.
 plt.contourf(xx, yy, z): Creates a filled contour plot using the xx, yy, and z
variables. The contourf function fills the areas in the plot that correspond to
different predicted classes, effectively visualizing the decision boundary of the
classification model.

By calling the plot_decision_boundary function and passing in the input data (X),
target labels (y), and the trained model (model), you can generate a plot that shows
the decision boundary of the classification model. The decision boundary separates
different classes in the input space, indicating the regions where the model assigns
different class labels.
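A sketch of the function, assembled from the steps above (np.linspace defaults to 50 points per axis, which sets the contour resolution):

import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(X, y, model):
    # Span the feature space with a one-unit margin around the data
    x_span = np.linspace(min(X[:, 0]) - 1, max(X[:, 0]) + 1)
    y_span = np.linspace(min(X[:, 1]) - 1, max(X[:, 1]) + 1)
    xx, yy = np.meshgrid(x_span, y_span)

    # Flatten the grid and predict a probability for every grid point
    xx_, yy_ = xx.ravel(), yy.ravel()
    grid = np.c_[xx_, yy_]
    pred_func = model.predict(grid)

    # Reshape the predictions back onto the grid and draw a filled contour
    z = pred_func.reshape(xx.shape)
    plt.contourf(xx, yy, z)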

The code provided plots the decision boundary of a classification model along with the
data points. It then performs a prediction for a specific point (x=7.5, y=5) and
visualizes it. Here's an explanation of the code:
1. plot_decision_boundary(X, y, model): Calls the plot_decision_boundary function, passing
in the input data (X), target labels (y), and the trained model (model). This generates
a plot showing the decision boundary of the classification model.
2. plt.scatter(X[:n_pts,0], X[:n_pts,1]): Plots the data points corresponding to Class A (first
n_pts data points) as blue dots. X[:n_pts,0] selects the x-coordinates of Class A, and
X[:n_pts,1] selects the y-coordinates.
3. plt.scatter(X[n_pts:,0], X[n_pts:,1]): Plots the data points corresponding to Class B
(remaining data points) as orange dots. X[n_pts:,0] selects the x-coordinates of Class
B, and X[n_pts:,1] selects the y-coordinates.
4. plot_decision_boundary(X, y, model): Calls the plot_decision_boundary function again to
generate another plot of the decision boundary.
5. plt.scatter(X[:n_pts,0], X[:n_pts,1]): Plots the data points corresponding to Class A (first
n_pts data points) as blue dots.
6. plt.scatter(X[n_pts:,0], X[n_pts:,1]): Plots the data points corresponding to Class B
(remaining data points) as orange dots.
7. x = 7.5 and y = 5: Assigns specific values to the variables x and y, representing the
coordinates of a point for which we want to perform a prediction.
After executing this code, you will see two plots: one with the decision boundary
and the data points, and another with only the data points. The last part of the code
sets the values of x and y to specific coordinates (7.5, 5) for which you want to
make a prediction.

The additional code you provided creates a new point with coordinates (x=7.5, y=5),
makes a prediction using the trained model, and visualizes the point on the plot. Here's
an explanation of the code:
1. point = np.array([[x, y]]): Creates a 2D numpy array called point with the
coordinates (x, y). The point is represented as a single-row matrix.
2. prediction = model.predict(point): Uses the predict method of the trained model
(model) to make a prediction for the given point. The point array is passed as the
input to the model's predict method, and the resulting prediction is stored in the
prediction variable.
3. plt.plot([x], [y], marker='o', markersize=10, color="red") : Plots the point (x, y) on the
existing plot as a red dot. The plot function is used with [x] and [y] as the x and
y coordinates, respectively. The marker parameter specifies that the marker
shape should be a circle ('o'), and the markersize parameter sets the size of the
marker to 10 pixels.
4. print("prediction is: ",prediction): Prints the predicted output for the given point.
The predicted output is stored in the prediction variable, which represents the
model's prediction for the specified point (x, y).
By adding this code, you can visualize and obtain the prediction for a specific point
(x=7.5, y=5) on the existing plot. The point will be highlighted as a red dot, and the
predicted output will be printed.
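Assembled, the prediction cell looks roughly like this (the coordinate is named y_coord here to avoid shadowing the label array y expected by plot_decision_boundary):

import numpy as np
import matplotlib.pyplot as plt

x, y_coord = 7.5, 5

plot_decision_boundary(X, y, model)
plt.scatter(X[:n_pts, 0], X[:n_pts, 1])   # Class A
plt.scatter(X[n_pts:, 0], X[n_pts:, 1])   # Class B

point = np.array([[x, y_coord]])
prediction = model.predict(point)
plt.plot([x], [y_coord], marker='o', markersize=10, color="red")
print("prediction is: ", prediction)
plt.show()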

3.2.7 Deep Neural Network


Deep neural networks are commonly used in image classification tasks, including
classifying traffic signals. By using deep neural networks, the model can automatically
learn relevant features and patterns from the images, enabling it to extract information
about traffic signals and classify them accurately. The depth of the network allows for
the extraction of hierarchical features, capturing both low-level details (edges,
textures) and high-level semantic information (traffic signal shapes, colors) necessary
for accurate classification.
a) Non-Linear Models

As discussed above, when a linear model is unable to represent a data set, a non-linear
model is used instead. But how do we obtain this curve? The answer is that we combine
two perceptrons into a third one: two linear models are superimposed to form a single
non-linear model.

Fig 3.2.21 Non-linear model

In our data, both Model 1 and Model 2 misclassify some points. By combining the strengths of
the two models, however, the third model classifies every point correctly. The third model is in
fact a linear combination of the other two models.

Each linear model is treated as an input node containing a linear equation. Let us refer to
the first model as x1 and the second model as x2. The output of each model is multiplied
by a weight, the weighted outputs are added together with a bias to obtain a linear
combination, and, as before, the sigmoid activation function is applied to this combination,
ultimately producing the non-linear curve.

Let us assume w1 is 1.5, w2 is 1, and the bias weight is 0.5. Suppose the first perceptron
model predicts that the red point is in the positive region with probability 0.88, and the
second perceptron model predicts a probability of 0.64. Taking the weighted sum of these
probabilities together with the bias gives

0.88 (1.5) + 0.64 (1) + 1 (0.5) = 2.46

Applying the sigmoid function to this sum,

It produces a probability of 0.92 that the point is positive. The two linear models thus combine
into a non-linear model which classifies our data points better.
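A quick numerical check of this example (values taken from the text above):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Weighted combination of the two perceptron outputs plus the bias term
z = 0.88 * 1.5 + 0.64 * 1.0 + 1 * 0.5
print(round(z, 2), round(sigmoid(z), 2))  # 2.46 0.92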

Architecture

In deep neural networks, two linear models are combined with a sigmoid activation
function to form a non-linear model. This combination allows the network to learn
complex relationships and make non-linear predictions. A linear model assumes a linear
relationship between the input features and the output: it computes the output by
multiplying each input feature by a weight, summing the products, and adding a bias term.
However, linear models alone are limited in their ability to capture non-linear patterns in
the data. To introduce non-linearity, an activation function is applied to the output of the
linear model; in this case the sigmoid function is used. The sigmoid function takes the
output of a linear model and converts it to a value between 0 and 1, which introduces
non-linearity into the model. When a linear model is combined with a sigmoid activation
function, the output of the linear model becomes the input of the sigmoid function, and
this transformed output represents the predictions of the first layer of the deep neural
network.

To create a deep neural network, this process is repeated by stacking multiple layers of
linear models, each followed by a sigmoid activation function. Each layer takes the
previous layer's output as input and produces a new transformed output.

Deep neural networks can learn complex non-linear relationships between input features
and desired outputs by adjusting the weights and biases of the linear models during
training. This allows the network to model complex patterns in the data and make
accurate predictions. Overall, the combination of linear models and sigmoid activation
functions enables deep neural networks to act as powerful non-linear models capable of
solving a variety of complex tasks in areas such as computer vision and natural language
processing.

Fig 3.2.22 Representation of Deep Neural Network with hidden layer [21]


Feedforward and backpropagation are fundamental processes in deep neural network
training. Let's discuss each one.

Feedforward

Feedforward is the process of passing input data through the layers of a neural network
to produce outputs or predictions. The input is passed layer by layer through the network
until it reaches the output layer.
In feedforward, the input data is multiplied by the weights of connections between
neurons in each layer. Each neuron receives a weighted sum of its inputs, applies an
activation function (such as Sigmoid or ReLU), and passes the transformed output as
input to the next layer. This process continues until the output layer is reached and the
final prediction or output of the network is obtained.

The feedforward process computes the network's output based on the current weights
and biases. However, this initial prediction may differ from the desired output, resulting
in the need to adjust the weights through a process called backpropagation.

Backpropagation method

Backpropagation is the process of computing the gradients of the loss function with
respect to the network's weights and biases. This allows the network to learn from its
mistakes by iteratively adjusting the weights to minimize the difference between the
predicted and desired outputs. Backpropagation begins by comparing the network output
to the desired output and calculating the loss. The gradients of the loss with respect to
the weights and biases are then calculated using the chain rule.

The gradients propagate back through the layers of the network, starting at the output
layer and moving toward the input layer. Each layer uses its gradients to update its
weights and biases, shifting them in the direction that reduces the loss. This adjustment is
typically performed using an optimization algorithm such as gradient descent. The
backpropagation process gradually adjusts the weights and biases over many training
samples to improve the network's performance. By repeatedly updating the weights
based on the gradients, the network learns to make better predictions and to minimize
the overall loss.

Feedforward and backpropagation are integral parts of training deep neural networks. The
feedforward pass produces predictions, while backpropagation allows the network to learn
and adjust its weights to improve performance over time. Together they form the basis for
training and optimizing deep neural networks.

Fig: 3.2.23 model trained with deep neural network (Jupyter Lab)

3.2.8 Multiclass classification (classifying traffic signals)


Multiclass classification is a machine learning technique used to classify data into
several different categories or classes. It is especially useful for tasks such as
recognizing handwritten digits and classifying traffic signals.

Handwritten digit recognition trains a multiclass classification algorithm on a data set of
labeled handwritten digits. Each number is assigned to a specific class (usually 0-9). The
goal is to create a model that can accurately identify new handwritten digits and place
them in the appropriate class. Similarly, traffic light classification uses a multiclass
classification algorithm to classify different types of traffic lights based on their
appearance and importance. This classification task is very important for applications
such as self-driving cars and traffic management systems.

The multiclass classification process involves several steps. First, a labeled sample data
set containing handwritten digits and images of traffic lights and their corresponding
classes is collected. These datasets serve as training data for the algorithm. Collected
data is pre-processed to extract meaningful features. For handwritten digits, features
may include pixel values, shape, or orientation. For traffic lights, features can be colors,
shapes, or specific visual patterns.

Fig 3.2.22 Sample traffic signals with different classes [22]

Once the data is preprocessed, various machine learning algorithms or deep learning
techniques can be used to train multiclass classification models. Examples of such
algorithms include logistic regression, decision trees, random forests, and neural
networks. The model is trained using labeled samples from the dataset with the goal of
learning the patterns and features that distinguish each class. After the model is trained,
it is evaluated against a separate dataset to assess its performance; metrics such as
accuracy, precision, recall, and F1 score are commonly used to measure a model's
classification performance. Finally, the trained model can be used to predict the class of
new, unseen instances, enabling it to classify handwritten digits and traffic signs
accurately and efficiently. Overall, multiclass classification plays an important role in
tasks such as handwritten digit recognition and traffic signal classification, providing a
valuable solution for a variety of real-world applications.

Softmax

The sigmoid is a very useful tool for classifying binary data sets. It converts the score x of a
point into a probability between zero and one: the predicted probability approaches one as x
tends to positive infinity and zero as x tends to negative infinity. However, it is not feasible to
use this model for multiclass data classification; its restriction to a single output between zero
and one makes this difficult.

So, a new activation function is introduced, called the Softmax function. The softmax function
is a useful activation function for scenarios involving multiple classes.
Softmax is an activation function that scales numbers/logits into probabilities. The output
of a Softmax is a vector (say v) with probabilities of each possible outcome. The
probabilities in vector v sum to one over all possible outcomes or classes.

Mathematically, for K classes with scores (logits) z1, ..., zK, Softmax is defined as

softmax(z_i) = exp(z_i) / (exp(z_1) + exp(z_2) + ... + exp(z_K)),   for i = 1, ..., K.
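A minimal NumPy illustration of this definition (not project code):

import numpy as np

def softmax(z):
    # Convert a vector of scores (logits) into probabilities that sum to one
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # approximately [0.659 0.242 0.099]
print(softmax(scores).sum())  # 1.0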

3.2.9 Convolutional Neural Network

A convolutional neural network (CNN) is a deep learning model created specifically for the
analysis of visual data, such as images and video. It excels at tasks like image classification,
object detection, and image segmentation.
CNNs are inspired by the visual processing mechanisms of the human brain. They are made
up of numerous interconnected layers, each of which operates on the input data in a
particular way. The fundamental component of a CNN is the convolutional layer, which
applies a number of learnable filters (also known as kernels) to the input image. These
filters look for particular patterns or characteristics, such as textures, corners, or edges.
Convolutional layer: The primary function of a CNN is carried out by the convolutional
layer. Convolutional filters are used to extract distinct features at various spatial regions of
the input image. The filters slide across the input image (convolve), computing dot products
with local patches and producing feature maps.
Activation function: To add non-linearity to the network, an activation function is applied
element-by-element after the convolution operation. The Rectified Linear Unit (ReLU),
which sets negative values to zero and leaves positive values unaltered, is the most widely
used activation function in CNNs.
Pooling layer: The pooling layer provides translation invariance while lowering
computational complexity by downsampling the spatial dimensions of the feature maps. A
popular pooling strategy is max pooling, which selects and keeps the highest value in each
local region while discarding all other values.
Fully connected layer: After several convolutional and pooling layers, the extracted features
are flattened and fed into one or more fully connected layers (also known as dense layers).
Based on the learned features, these layers carry out high-level reasoning and generate
predictions. The final class probabilities are typically produced by the output layer using an
appropriate activation function (for classification, for example, softmax).

Fig 3.2.23 Typical Convolutional Neural Network [23]

Fig: 3.2.23 Representation of Convolutional Layer

MNIST Image Recognition System

Classifying traffic signals

Fig 3.3.1 Traffic Symbol Representation by different class [22]

Fig 3.3.2 Distribution of Training Dataset

LeNet Implementation

LeNet, often referred to as LeNet-5, is a convolutional neural network (CNN) architecture
created in the 1990s by Yann LeCun and his colleagues. One of the earliest CNN models, it
made a substantial contribution to the development of deep learning for image recognition
applications.

LeNet's main goal was to create a system that could read handwritten digits for use in
applications like bank check recognition. With its impressive performance on digit
recognition benchmarks, LeNet set the stage for later developments in deep learning. The
LeNet architecture has seven layers, including an output layer, two fully connected layers,
and three sets of convolutional layers. The layers are broken out as follows:
Input layer: Grayscale images with specified dimensions (often 32x32 or 28x28 pixels) are
fed into the network for further processing.
Convolutional layers: LeNet uses convolutional layers to extract relevant characteristics
from the input images. The first convolutional layer applies a set of learnable filters
(kernels) to the input image; by sliding the filters across the input, this operation creates
several feature maps. The subsequent convolutional layers progressively extract
increasingly intricate characteristics from the feature maps of the preceding layer.
Subsampling (pooling) layers: To improve computational efficiency, LeNet adds a
subsampling layer after each convolutional layer, which reduces the spatial dimensions of
the feature maps. The most popular method is max pooling, which keeps only the highest
value from each pooling region and discards the remainder.
Fully connected layers: The feature maps produced by the convolutional and pooling layers
are flattened into a 1D vector and processed through fully connected layers. These layers
are similar to conventional artificial neural networks in that every neuron is connected to
every neuron in the preceding layer. The fully connected layers learn high-level
representations of the input data.
Output layer: The final fully connected layer feeds an output layer whose number of
neurons typically corresponds to the number of classes in the classification task. The output
layer generates probability scores for each class using an appropriate activation function
(such as softmax).
LeNet minimizes the discrepancy between expected and actual outputs by adjusting the
weights of the network during training using backpropagation and gradient descent
optimization. LeNet's architecture proved that convolutional neural networks are effective
for image classification tasks and paved the way for later developments in deep learning
models. It substantially influenced the design of contemporary CNN architectures and
motivated researchers to investigate and enhance the design.
The model begins with a Conv2D layer that has 60 filters of size 5x5, operating on images
with a 32x32 input shape and a single (grayscale) channel. This layer has 1,560 parameters.
A second Conv2D layer with 60 filters of size 5x5 follows, producing an output shape of
24x24x60; this layer has 90,060 parameters.
A MaxPooling2D layer with a 2x2 pooling size then reduces the spatial dimensions to
12x12x60. After this, two further Conv2D layers with 30 filters of size 3x3 are added,
followed by another MaxPooling2D layer; these layers help to extract more intricate
features from the source images.
The Flatten layer flattens the output of the final MaxPooling2D layer into a vector. A fully
connected Dense layer with 500 units and a ReLU activation function adds 240,500
parameters. To avoid overfitting during training, a Dropout layer with a dropout rate of 0.5
is included. The final Dense layer has 43 units (matching the number of traffic signal
classes) and uses the softmax activation function to produce class probabilities.
The model is compiled with the categorical cross-entropy loss function, the accuracy
metric, and the Adam optimizer with a learning rate of 0.001. A sketch of this architecture
in Keras is shown below.
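The following sketch is assembled from the description above; the activations of the convolutional layers, the use of valid padding, and the position of the second pooling layer are assumptions chosen to be consistent with the stated parameter counts (1,560, 90,060, and 240,500):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def traffic_sign_model(num_classes=43):
    model = Sequential()
    model.add(Conv2D(60, (5, 5), input_shape=(32, 32, 1), activation='relu'))  # 1,560 params
    model.add(Conv2D(60, (5, 5), activation='relu'))                           # 90,060 params
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(30, (3, 3), activation='relu'))
    model.add(Conv2D(30, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(500, activation='relu'))                                   # 240,500 params
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = traffic_sign_model()
model.summary()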
Description of layers of Convolutional Neural Network with parameters

Classifying traffic signals with different image data

Fig 3.3.3 Sample input given to the model to classify traffic signal [22]

The above model predicted class 34; the left turn sign belongs to class 34.

3.4 Behavioral Cloning


Behavioural cloning, also known as imitation learning, is an approach used in machine
learning and robotics to teach an agent or model to replicate a desired behaviour or policy
by learning from demonstrations supplied by an expert. In the context of self-driving cars,
behavioural cloning is a crucial technique for teaching autonomous vehicles to imitate the
behaviour of human drivers.
Behavioral cloning is the process of teaching a model to carry out a task by imitating the
activities of a human expert. A human driver serves as the expert in the case of self-
driving automobiles, giving demonstrations by operating a vehicle while recording
pertinent information, such as pictures or sensor readings, along with matching actions,
such as steering angles or acceleration/deceleration.
The model learns from our driving habits using convolutional neural networks. The network
reviews each image and figures out how to drive the vehicle based on what it sees; it develops
the ability to adjust the steering angle to suit various circumstances. After training our model,
we use the simulator to test it on a different track. The vehicle operates automatically with no
human intervention, and if training went well the car should perform admirably on this
course and drive itself. This method is crucial in the area of autonomous vehicles: the same
kind of approach is used to reproduce the behaviour of real self-driving cars after they have
been trained by human drivers on actual roads. Mastering behavioural cloning therefore
gives a better understanding of the science underlying self-driving cars.

Udacity Simulator
The Self-Driving Car Engineer Nanodegree program is offered by the online learning
platform Udacity, which also offers the Udacity Simulator as a software tool. It is
intended to mimic a computerized environment where students can practice and test
their simulations of self-driving vehicles.
The simulator offers a realistic 3D environment with a variety of tracks and scenarios that
reflect real-world driving situations. Students can test their trained models in autonomous
mode or drive a simulated car manually.
Driving controls: The simulator allows users to manually steer, accelerate, and brake the
virtual car via keyboard inputs or a connected gamepad. By operating the vehicle
themselves, students can gather training data this way.
Data collection: During manual driving, the simulator records a variety of signals, including
steering angles, throttle inputs, and braking pressure. This information can be used to
generate training datasets for machine learning methods such as behavioural cloning.
Autonomous mode: After being trained, a model can be tested in the simulator's
autonomous mode. The trained model takes control of the virtual vehicle and makes
decisions based on the information it receives from the simulated sensors, attempting to
drive autonomously.

Fig 3.4.1 Snapshot of Udacity car simulator [24]

We can test our trained models in autonomous mode or manually operate a simulated car in
the simulator. With several tracks and settings that replicate real-world driving conditions, it
provides a believable 3D experience.

Collecting Data
Students can gather training data by operating the vehicle themselves while recording
pertinent data, such as images from the camera systems, steering angles, throttle inputs, and
braking inputs. The simulator offers multiple camera viewpoints (left, centre, and right),
which helps diversify the training data and improves the model's generalization.

The collected data is saved in a directory along with a CSV file that logs the recorded
images and their corresponding information, such as steering angles. The steering angle
is the primary label used to train the model to imitate the expert driving behaviour.
The collected data can be analyzed, preprocessed, and used to train neural network
models. Techniques such as data augmentation and image flipping can be applied to
improve data balance and model performance.

Fig 3.4.2 Data Collected while driving the car model in training mode

The above data samples are captured from the car's cameras.

Data Loading

Fig 3.4.3 Data Distribution with steering, throttle, reverse, speed

The dataset contains information about the images captured from a simulated
self-driving car. Each row represents a data sample with the following columns:
 "center": Path to the image captured by the center camera.
 "left": Path to the image captured by the left camera.
 "right": Path to the image captured by the right camera.
 "steering": The steering angle value associated with the image (a measure of
the car's turning direction).
 "throttle": The throttle value (speed control) associated with the image.
 "reverse": The reverse value indicating whether the car is in reverse gear or not.
 "speed": The speed of the car.
 The dataset seems to be organized in a tabular format, with each row
representing a specific instance or time frame in the simulation.
 The "center," "left," and "right" columns likely correspond to different camera
perspectives capturing the road ahead.
 The "steering" column provides information about the steering angle,
indicating the direction the car should turn.
 The "throttle" column represents the speed control input for the car.
 The "reverse" column indicates whether the car is in reverse gear or not.
 The "speed" column provides information about the speed of the car at each
instance.

Data Preprocessing

This segment filters and visualizes the dataset using the steering angles of the data samples.
The total number of data samples in the dataset is printed first. The data is then divided into
bins covering ranges of steering angles. For each bin, the code collects the indices of the
samples whose steering angle falls within that bin's range. These indices are shuffled
randomly, and the subset of samples exceeding a predetermined limit (defined by
samples_per_bin) is added to a remove_list. The code then uses the indices in remove_list to
drop the corresponding rows from the dataset, which balances the distribution of steering
angles.
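A sketch of this balancing step, assuming the driving log is loaded into a pandas DataFrame named data with the columns listed earlier (the file name 'driving_log.csv' and the values of num_bins and samples_per_bin are assumptions):

import numpy as np
import pandas as pd
from sklearn.utils import shuffle

data = pd.read_csv('driving_log.csv',
                   names=['center', 'left', 'right', 'steering', 'throttle', 'reverse', 'speed'])

num_bins = 25          # number of steering-angle bins (assumed value)
samples_per_bin = 400  # cap per bin (assumed value)

print('total data:', len(data))
_, bins = np.histogram(data['steering'], num_bins)

remove_list = []
for j in range(num_bins):
    # Indices of samples whose steering angle falls inside bin j
    bin_idx = [i for i in range(len(data['steering']))
               if bins[j] <= data['steering'][i] <= bins[j + 1]]
    bin_idx = shuffle(bin_idx)
    remove_list.extend(bin_idx[samples_per_bin:])  # drop everything beyond the cap

data.drop(data.index[remove_list], inplace=True)
print('remaining:', len(data))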

Data Validation
The provided code segment extracts image paths and steering angles from a DataFrame.
It first prints the details of a specific data sample, displaying information such as image
paths, steering angle, throttle value, reverse flag, and speed. The code then defines a
function called load_img_steering, which iterates over the data and collects the image
paths and corresponding steering angles. For each data sample, it appends the center
image path and steering angle to separate lists. Additionally, it augments the data by
appending the left image path with a slightly increased steering angle and the right
image path with a slightly decreased steering angle. After the loop, the lists are converted
into NumPy arrays and returned from the function as image_paths and steerings,
respectively. These arrays represent the augmented dataset with the image paths and
adjusted steering angles.
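A sketch of load_img_steering consistent with this description (the size of the steering correction for the side cameras, ±0.15, and the image directory layout are assumptions):

import os
import numpy as np

def load_img_steering(datadir, df):
    # Collect image paths and steering angles, augmenting with the side cameras
    image_paths, steerings = [], []
    for i in range(len(df)):
        row = df.iloc[i]
        steering = float(row['steering'])
        # centre camera: use the recorded angle
        image_paths.append(os.path.join(datadir, row['center'].strip()))
        steerings.append(steering)
        # left camera: steer slightly more to the right (+0.15 correction is an assumption)
        image_paths.append(os.path.join(datadir, row['left'].strip()))
        steerings.append(steering + 0.15)
        # right camera: steer slightly more to the left (-0.15 correction is an assumption)
        image_paths.append(os.path.join(datadir, row['right'].strip()))
        steerings.append(steering - 0.15)
    return np.asarray(image_paths), np.asarray(steerings)

# image_paths, steerings = load_img_steering('track/IMG', data)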

Fig 3.4.4 Distribution of Training Set and Validation Set

Preprocessing Image
The code segment defines a function called img_preprocess that performs preprocessing
on an input image. The function takes an image path as input and reads the image using
mpimg.imread(). The image is then cropped to remove the top portion, keeping the region
of interest related to the road. Next, the image is converted from RGB to the YUV color
space using cv2.cvtColor(); this color space conversion helps capture important details for
computer vision tasks. A Gaussian blur with a kernel size of (3,3) is applied to the image
using cv2.GaussianBlur(); this blurring operation helps reduce noise and smooth out the
image. The image is then resized to a fixed size of (200, 66) pixels using cv2.resize(),
ensuring consistent input dimensions for downstream tasks or models. Finally, the image
is normalized by dividing its pixel values by 255 to bring them within the range of 0 to 1.
The function returns the preprocessed image as the output.

In summary, the img_preprocess function takes an image path, applies a series of
preprocessing steps including cropping, color space conversion, blurring, resizing, and
normalization, and returns the preprocessed image. This preprocessing is often
performed to enhance the image quality and extract relevant features before feeding it
into computer vision models or tasks.
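A sketch of the function under these assumptions (the exact crop rows, 60 from the top to 135, are an assumption borrowed from the common NVIDIA-style pipeline):

import cv2
import matplotlib.image as mpimg

def img_preprocess(img_path):
    img = mpimg.imread(img_path)
    img = img[60:135, :, :]                     # crop the top portion (assumed rows)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)  # RGB -> YUV color space
    img = cv2.GaussianBlur(img, (3, 3), 0)      # smooth out noise
    img = cv2.resize(img, (200, 66))            # fixed input size (width, height)
    img = img / 255.0                           # normalize pixel values to [0, 1]
    return img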

Fig 3.4.3 Original Image and Pre-processed Image

Nvidia Model
The NVIDIA model, often known as the NVIDIA End-to-End model or NVIDIA
PilotNet, is a behavioral cloning deep learning model that has been extensively utilized
in simulations of self-driving cars. Researchers at NVIDIA came up with the idea, and it
has successfully mimicked human driving behavior with astounding results.

The NVIDIA model is intended to provide an end-to-end method for autonomous driving by
directly mapping the raw input images from the vehicle's cameras to steering commands. It
does away with the necessity of manually designing intricate perception and control
pipelines or explicitly extracting features. Because it uses the power of CNNs to learn
complex visual representations straight from raw input images, the NVIDIA model has
proven useful in the behavioural cloning technique: it effectively learns to replicate human
driving behaviour by mapping images to steering angles.

It is crucial to highlight that, while the NVIDIA model provides a solid foundation for
behavioral cloning, it may have limits in dealing with difficult cases or generalizing to
unknown conditions. To increase the model's robustness and performance in real-world
driving settings, additional data augmentation techniques, model assembly, or
integrating other components such as perception and planning can be examined.

While the NVIDIA model has shown success in behavioural cloning, it has limitations in
dealing with complicated scenarios such as extreme weather, unusual events, or edge cases.
More robust and advanced self-driving systems may require additional sensor modalities or
the incorporation of higher-level decision-making components.

The provided code defines a function called nvidia_model() that constructs the
architecture of the NVIDIA model for self-driving cars; a sketch assembled from this
description follows the list. Here is an explanation of the function's code:
1. The function begins by creating a sequential model using Sequential(),
which allows for the sequential stacking of layers.
2. The model starts with a Conv2D layer with 24 filters, a kernel size of (5,5),
and a stride of (2,2). This layer operates on input images with
dimensions of (66,200,3) and uses the ReLU activation function.
3. The next layer is another Conv2D layer with 36 filters, the same kernel
size and stride as the previous layer, and the ELU activation function.

4. Following that, there is a Conv2D layer with 48 filters and the ELU
activation function.
5. The subsequent layer is a Conv2D layer with 64 filters and a kernel size of
(3,3). It also uses the ELU activation function.
6. Another Conv2D layer with 64 filters and a kernel size of (3,3) is added,
this time without a stride.
7. A Dropout layer with a dropout rate of 0.5 is included to prevent
overfitting.
8. The output of the previous layers is then flattened into a 1D vector
using the Flatten() layer.
9. Next, a fully connected Dense layer with 100 units and the ELU activation
function is added.
10. Another Dropout layer with a dropout rate of 0.5 is used.
11. The model continues with a Dense layer of 50 units and the ELU
activation function, followed by a Dropout layer.
12. A Dense layer with 10 units and the ELU activation function is added,
along with a Dropout layer.
13. Finally, a single-unit Dense layer is added, serving as the output layer for
regression, as the model predicts a continuous steering angle.
14. The model's optimizer is set to Adam with a learning rate of 1e-3, and
the mean squared error (MSE) loss function is used for training.
15. The model is compiled, and the constructed model is returned.
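A sketch assembled from the fifteen points above (the kernel size and stride of the 48-filter layer are assumptions, chosen to match the preceding layers as in the published PilotNet design):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def nvidia_model():
    model = Sequential()
    model.add(Conv2D(24, (5, 5), strides=(2, 2), input_shape=(66, 200, 3), activation='relu'))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='elu'))  # kernel size and stride assumed
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(100, activation='elu'))
    model.add(Dropout(0.5))
    model.add(Dense(50, activation='elu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='elu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))  # single output: the predicted steering angle
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')
    return model

model = nvidia_model()
print(model.summary())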

Fig 3.4.4 Nvidia Model summary


Training the model

The provided code segment trains a machine learning model using the fit()
function. Here's an explanation of the code:
1. model.fit(): This function is used to train the model. It takes the input data
(X_train) and target labels (y_train) as the training dataset.
2. epochs=30: The epochs parameter specifies the number of times the model
will iterate over the entire training dataset during training. In this case,
the model will be trained for 30 epochs.
3. validation_data=(X_valid, y_valid): The validation_data parameter is used to
specify a separate validation dataset for evaluating the model's
performance during training. It takes the input validation data ( X_valid)
and the corresponding target labels ( y_valid).
4. batch_size=100: The batch_size parameter determines the number of
samples that will be processed by the model at each training iteration.
In this case, the model will process 100 samples at a time.
5. verbose=1: The verbose parameter controls the level of output displayed
during training. Here, it is set to 1, which means training progress will be
displayed in the console for each epoch.
6. shuffle=1: The shuffle parameter specifies whether to shuffle the training
data before each epoch. A value of 1 indicates that the data will be
shuffled, which helps prevent the model from memorizing the order of
the training samples.
Overall, the model.fit() function is used to train the model for a specified number
of epochs using the provided training data. The validation dataset is used to
evaluate the model's performance, and the batch size determines the number
of samples processed at a time. The code will display training progress, and the
data will be shuffled before each epoch to enhance training effectiveness.
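Put together, the training call described above is (X_train, y_train, X_valid, y_valid being the preprocessed image arrays and steering labels from the earlier split):

h = model.fit(x=X_train, y=y_train,
              epochs=30,
              validation_data=(X_valid, y_valid),
              batch_size=100,
              verbose=1,
              shuffle=1)  # any truthy value works; shuffle=True is the idiomatic form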

 loss: 0.0569: This indicates the average loss value achieved during the
training phase. The loss represents the discrepancy between the
predicted output of the model and the actual target values. A lower loss
indicates that the model's predictions are closer to the true values.

 val_loss: 0.0605: This represents the average loss value obtained during the
validation phase. The validation loss is calculated similarly to the training
loss but uses a separate validation dataset that the model hasn't seen
during training. The purpose of the validation loss is to assess how well
the model generalizes to unseen data.
In this case, the training loss is 0.0569, indicating that the model is able to
minimize the discrepancy between its predictions and the training data. The
validation loss of 0.0605 suggests that the model also performs well on the
unseen validation data, as the validation loss is comparable to the training loss.
Overall, having low values for both the training and validation losses indicates
that the model has learned to make accurate predictions and is not overfitting
to the training data. It suggests that the model is capable of generalizing well
to new, unseen data.

Fig 3.4.5 Loss Function on training and Validation set

3.5 Connection and driving the car with simulator


Now we want to send steering commands to the car in the simulator.
WebSocket is used for communicating with the simulator by means of a
socket.io library.

Fig 3.5.1 Connection with Udacity simulator through socket.io [25]

The Udacity simulator connects to the socket.io server on port 4567 in autonomous mode.
Once connected, it transmits messages containing telemetry data to that port. The server then
responds to each message with control commands, which in turn produce new telemetry
messages, and so on.
Driving Code
This code sets up a Socket.IO server using socketio.Server() and creates a Flask application
using Flask(__name__). These two elements allow the simulated vehicle and the driving
model to communicate with each other in real time.
1. Defining a function for image processing

This function is responsible for preprocessing the input images before feeding them into
the neural network model. It crops the image to focus on the road region, converts the
image color space from RGB to YUV, applies Gaussian blur, resizes the image to a
specific dimension, and normalizes the pixel values to a range between 0 and 1.
2. Handling telemetry data from the car simulator

This function handles the 'telemetry' event received from the car simulator. It extracts the
speed and image data from the telemetry message, decodes and preprocesses the image
using the img_preprocess function, and passes the processed image through the trained
neural network model to obtain the predicted steering angle. The throttle value is calculated
from the deviation of the current speed from the specified speed limit.
3. Handling the connection event

This function is an event handler for the 'connect' event, which is triggered when the
driving model connects to the car simulator. It prints a 'connected' message and calls the
send_control function to initialize the steering angle and throttle values to 0.

4. Sending control commands to the car simulator

Using Socket.IO, this function sends control commands to the car simulator. It emits a
steering event carrying a dictionary with the steering angle and throttle values; to be
compatible with the communication protocol, these values are converted into strings with
__str__().

5. Main Execution

In the main execution block, the pretrained neural network is loaded from a file named
'Model.h5'. The Flask application is wrapped with the Socket.IO server using Middleware(),
and an Eventlet WSGI server is started, listening on port 4567. This allows the driving model
to receive telemetry from the car simulator and send control commands back in real time. A
condensed sketch combining these five parts follows.
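The sketch below assumes the event and field names of the standard Udacity simulator protocol ('telemetry', 'steer', 'steering_angle', 'throttle', 'image', 'speed'), a target speed_limit of 10, and an img_preprocess variant that accepts an image array rather than a path; all of these are assumptions rather than details taken from the project code.

import base64
from io import BytesIO

import cv2
import eventlet
import eventlet.wsgi
import numpy as np
import socketio
from flask import Flask
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
app = Flask(__name__)
speed_limit = 10  # assumed target speed

def img_preprocess(img):
    # Array-accepting variant of the preprocessing described in section 3.4
    img = img[60:135, :, :]
    img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (200, 66))
    return img / 255.0

@sio.on('telemetry')
def telemetry(sid, data):
    speed = float(data['speed'])
    image = Image.open(BytesIO(base64.b64decode(data['image'])))
    image = img_preprocess(np.asarray(image))
    steering_angle = float(model.predict(np.array([image]))[0][0])
    throttle = 1.0 - speed / speed_limit  # ease off as speed approaches the limit
    send_control(steering_angle, throttle)

@sio.on('connect')
def connect(sid, environ):
    print('connected')
    send_control(0, 0)

def send_control(steering_angle, throttle):
    sio.emit('steer', data={'steering_angle': steering_angle.__str__(),
                            'throttle': throttle.__str__()})

if __name__ == '__main__':
    model = load_model('Model.h5')
    app = socketio.Middleware(sio, app)  # called socketio.WSGIApp in newer versions
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)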

Chapter 4

Result and Analysis
Before road-testing the trained CNN, we assess the network's performance in simulation.

Fig 4.1 Backend connection representation of Nvidia Model [26]

The simulator uses pre-recorded video from a forward-facing onboard camera on a
human-driven data-collection vehicle, which produces images roughly matching what the
CNN would see if it were driving the car. The recorded steering commands given by the
human driver are time-synchronized with these test videos. To account for deviations from
the ground truth, the simulator transforms the original images; any divergence between the
human-driven path and the ground truth is part of this transformation, and the same
methods as before are used to perform it. The simulator loads the recording of the chosen
test video together with the synchronized steering commands captured with it. It then
sends the first frame of the test video, adjusted for any departure from the ground truth, to
the input of the trained CNN, which returns a steering command for that frame.

Fig 4.2 Udacity simulator on Autonomous mode [24]

The simulator then modifies the next frame of the test video so that the image appears as if
the vehicle were at the position reached by following the steering commands from the CNN.
The new image is then sent to the CNN and the process repeats. The simulator records the
off-center distance (distance from the car to the lane center), the yaw, and the distance
traveled by the virtual car. When the off-center distance exceeds one meter, a virtual human
intervention is triggered, and the virtual vehicle's position and orientation are reset to match
the ground truth of the corresponding frame of the original test video.

Fig 4.3 Driving the car in Simulator on Autonomous mode

Value loss or Accuracy
For every epoch of the training cycle, the loss parameter is taken into consideration in this
evaluation. Keras provides val_loss, the average validation loss after each epoch, to track
the value loss over training. The loss observed during the initial epochs at the beginning of
the training phase is high, but it falls gradually, as is evident from the screenshots below.

The loss over epochs is plotted for comparison. The graph depicts the loss for each of the
three architectures; plotting the loss values between 0 and 0.1 gives a clearer comparison
between the results of the different architectures.

Fig 4.4 Loss over epochs

Chapter 5

Conclusions and Recommendations

This project started with training the models and tweaking parameters to get the best
performance on the tracks, and then trying to generalize the same performance to different
tracks. The models that performed best on one track did not perform as well on Track 2.
Image augmentation and preprocessing were necessary to achieve real-time performance
and better generalization.
Using a CNN to extract spatial features and an RNN to capture temporal features from the
image dataset is a good fit for developing fast neural networks that require less
computation. In future projects, introducing recurrent layers in place of pooling layers
could reduce the loss of information and would be worth exploring.
Interestingly, a combination of real-world data and simulator data can be used to train
these models. This would clarify to what extent a model trained in a simulator can be
transferred to the real world, and vice versa. A great deal of experimentation is being
carried out in the area of autonomous cars, and this project contributes to that effort.

Deep neural network layers have been applied to temporal models for the implementation
of this project. Using parallel branches of network layers that monitor specific behaviour
on separate branches could significantly improve the project's performance: one branch
can contain the CNN layers, the other the RNN layers, with their outputs combined by a
dense layer at the end. Similar issues have been solved using ResNet (deep residual
networks), a modular learning framework. ResNets are deeper than their 'plain'
counterparts (state-of-the-art deep neural networks) yet require a similar number of
parameters (weights). Implementing reinforcement learning approaches for determining
steering angle, throttle, and brake could also be a great way of tackling such problems.
Placing simulated cars and obstacles on the tracks would increase the difficulty of the
problem, but it would be much closer to the real-world environment to which autonomous
vehicles are exposed. A good challenge would be to see how well the model does on
real-world data; the model was tested in the simulator environment, but no test could be
carried out on real-world data.

It should also be tried out on autonomous cars by the big players in the auto industry; that
would be an excellent test of how this model actually works in the real world.

Chapter 6

References
1. Are Autonomous Vehicles Only a Technological Step? The Sustainable
Deployment of Self-Driving Cars on Public Roads. (2019). Contemporary
Readings in Law and Social Justice, 11(2), 22.
https://doi.org/10.22381/crlsj11220193

2. Frison, A. K., Forster, Y., Wintersberger, P., Geisel, V., & Riener, A. (2020,
December 14). Where We Come from and Where We Are Going: A Systematic
Review of Human Factors Research in Driving Automation. Applied Sciences,
10(24), 8914. https://doi.org/10.3390/app10248914

3. Stayton, E., & Stilgoe, J. (2020). It’s Time to Rethink Levels of Automation for
Self-Driving Vehicles. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.3579386

4. T., D. M. (2020, March 31). Self Driving Car. International Journal of Psychosocial
Rehabilitation, 24(5), 380–388. https://doi.org/10.37200/ijpr/v24i5/pr201704

5. Jiang, A. (2021, March 1). Research on the Development of Autonomous Race Cars And
Impact on Self-driving Cars. Journal of Physics: Conference Series, 1824(1), 012009.
https://doi.org/10.1088/1742-6596/1824/1/012009

6. Hőgye-Nagy, G., Kovács, G., & Kurucz, G. (2023, April). Acceptance of self-
driving cars among the university community: Effects of gender, previous
experience, technology adoption propensity, and attitudes toward autonomous
vehicles. Transportation Research Part F: Traffic Psychology and Behaviour, 94,
353–361. https://doi.org/10.1016/j.trf.2023.03.005

7. Ahmed, M., Hegazy, M., Klimchik, A. S., & Boby, R. A. (2022, December). Lidar
and camera data fusion in self-driving cars. Computer Research and Modeling,
14(6), 1239–1253. https://doi.org/10.20537/2076-7633-2022-14-6-1239-1253

8. Gragnaniello, D., Greco, A., Saggese, A., Vento, M., & Vicinanza, A. (2023, April
16). Benchmarking 2D Multi-Object Detection and Tracking Algorithms in
Autonomous Vehicle Driving Scenarios. Sensors, 23(8), 4024.
https://doi.org/10.3390/s23084024

9. Liu, J., Jayakumar, P., Stein, J. L., & Ersal, T. (2016, August 31). A study on
model fidelity for model predictive control-based obstacle avoidance in high-speed

autonomous ground vehicles. Vehicle System Dynamics, 54(11), 1629–1650.
https://doi.org/10.1080/00423114.2016.1223863

10. Liu, J., Jayakumar, P., Stein, J. L., & Ersal, T. (2016, August 31). A study on
model fidelity for model predictive control-based obstacle avoidance in high-speed
autonomous ground vehicles. Vehicle System Dynamics, 54(11), 1629–1650.
https://doi.org/10.1080/00423114.2016.1223863

11. Kolya, A. K., Mondal, D., Ghosh, A., & Basu, S. (2021, July). Direction and Speed
Control of DC Motor Using Raspberry PI and Python-Based GUI. International
Journal of Hyperconnectivity and the Internet of Things, 5(2), 74–87.
https://doi.org/10.4018/ijhiot.2021070105

12. “SVM | Support Vector Machine Algorithm in Machine Learning.”
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
(accessed Jun. 01, 2022).

13. “Autonomous Driving – Car detection with YOLO Model with Keras in Python |
sandipanweb.” https://sandipanweb.wordpress.com/2018/03/11/autonomous-
driving-car-detection-with-yolo-in-python/ (accessed Jun. 01, 2022).

14. Y. Lai, N. Wang, Y. Yang, and L. Lin, “Traffic Signs Recognition and Classification
based on Deep FeatureLearning:,” in Proceedings of the 7th International
Conference on Pattern Recognition Applications and Methods, Funchal, Madeira,
Portugal, 2018, pp. 622–629. doi: 10.5220/0006718806220629.

15. “Python | Thresholding techniques using OpenCV | Set-1 (Simple Thresholding) -
GeeksforGeeks.” https://www.geeksforgeeks.org/python-thresholding-techniques-using-opencv-set-1-simple-thresholding/
(accessed Jun. 01, 2022).

16. “Feature Detectors - Canny Edge Detector.”
https://homepages.inf.ed.ac.uk/rbf/HIPR2/canny.htm (accessed Jun. 01, 2022).

17. Detection of Traffic Sign Using CNN. (2022). Recent Trends in Parallel
Computing. https://doi.org/10.37591/rtpc.v9i1.269

18. Xashimov, B., & Khaydarova, D. (2023, April 11). Using and development of
artificial intelligence on the process of accounting. Новый Узбекистан: Успешный
Международный Опыт Внедрения Международных Стандартов Финансовой
Отчетности, 1(5), 219–223. https://doi.org/10.47689/stars.university-5-pp219-223

19. Currie, S. (2016, July 1). Self-Driving Car.

20. Autonomous Vehicles (AV) and Automated Guided Vehicles (AGV) | zatran. (n.d.).
zatran.com. https://www.zatran.com/en/technology/autonomous-vehicles/

21. Wang, L., Zhou, S., Fang, W., Huang, W., Yang, Z., Fu, C., & Liu, C. (2023, April
13). Automatic Piecewise Extreme Learning Machine-Based Model for S-Parameters
of RF Power Amplifier. MDPI. https://doi.org/10.3390/mi14040840

22. H., Zaibi, A., Ladgham, A., & Sakly, A. (2021, April 30). A Lightweight Model for
Traffic Sign Classification Based on Enhanced LeNet-5 Network. A Lightweight
Model for Traffic Sign Classification Based on Enhanced LeNet-5 Network.
https://doi.org/10.1155/2021/8870529

23. K, A. (2022, June 30). Fashion Recommendation System. International Journal for
Research in Applied Science and Engineering Technology, 10(6), 2470–2474.
https://doi.org/10.22214/ijraset.2022.44362

24. ROS Robotics Projects. (n.d.). O’Reilly Online Learning.
https://www.oreilly.com/library/view/ros-robotics-projects/9781838649326/73747233-97bc-478c-a902-dc9c9ea226c3.xhtml

25. Bakoushin, A. (2020, June 19). Run a self-driving car using JavaScript and
TensorFlow.js. Medium. https://levelup.gitconnected.com/run-a-self-driving-car-
using-javascript-and-tensorflow-js-8b9b3f7af23d

26. machine learning in autonomous driving for Sale OFF 73%. (n.d.). Machine
Learning in Autonomous Driving for Sale OFF 73%.
https://www.cipranchi.ac.in/peruse.aspx?
cname=machine+learning+in+autonomous+driving&cid=23
