
A Practical File on
Deep Learning
Lab Code: EE105722

Supervisor: Kapil Gulati
Submitted by: Ramnaresh Rajbhar (APX-JPR-2020/01613)

Department of Computer Science Engineering


Apex University

Apex School Of Computer & IT

December-2023
Question 1 : Write a program for Data cleaning using pandas.

What Is Data Cleaning?

When working with multiple data sources, there are many chances for data to be
incorrect, duplicated, or mislabeled. If the data is wrong, outcomes and algorithms are
unreliable, even though they may look correct. Data cleaning is the process of changing
or eliminating garbage, incorrect, duplicate, corrupted, or incomplete data in a dataset.
There is no single, absolute way to describe the precise steps in the data cleaning process,
because the steps vary from dataset to dataset. Data cleaning, also called data cleansing
or data scrubbing, is the first step of the general data preparation process. Data cleaning
plays an important part in developing reliable answers within the analytical process and is
considered a basic part of data science fundamentals. The motive of data cleaning is to
construct uniform and standardized datasets that give data analytics and business
intelligence tools easy access to accurate data for each problem.

Why Is Data Cleaning Essential?

Data cleaning is one of the most important tasks a data science professional performs.
Wrong or bad-quality data can be detrimental to processes and analysis. Clean data
ultimately increases overall productivity and provides the highest-quality information for
decision-making.
Here are some reasons why data cleaning is essential:


1. Error-Free Data: When multiple sources of data are combined, there is a high chance
of errors creeping in. Data cleaning removes these errors from the data. Clean data, free
from wrong and garbage values, makes analysis faster and more efficient and saves a
considerable amount of time. Results will not be accurate if the data contains garbage
values, and working with inaccurate data inevitably leads to mistakes. Monitoring errors
and good reporting help to find where errors are coming from and make it easier to fix
incorrect or corrupt data for future applications.
2. Data Quality: The quality of the data is the degree to which it follows the rules of
particular requirements. For example, suppose we have imported customers' phone
numbers, but in some places email addresses were entered instead. Because the
requirement was phone numbers only, the email addresses would be invalid data. Some
pieces of data must follow a specific format, some numbers have to fall in a specific range,
and some data cells might require a particular data type such as numeric or Boolean. In
every scenario, there are mandatory constraints our data should follow; certain conditions
affect multiple fields of data, and particular types of data have unique restrictions. Data
that is not in the required format will always be invalid. Data cleaning helps simplify this
process and avoid useless data values.

3. Accuracy and Efficiency: Accuracy means ensuring the data is close to the correct
values. Even when most of the data in a dataset is valid, we should still establish its
accuracy: data can be authentic and correctly formatted yet still inaccurate. For example,
a customer's address may be stored in the specified format but still not be the right
address, an email may contain an extra character that makes it invalid, or a phone number
may simply be wrong. Determining accuracy usually means relying on other data sources
to cross-check the data. Depending on the kind of data we are using, various resources
can help with this part of cleaning.

4. Complete Data: Completeness is the degree to which all the required values are known.
Completeness is a little more challenging to achieve than accuracy or quality, because it
is nearly impossible to have all the information we need; only known facts can be entered.
We can try to complete data by redoing data-gathering activities such as approaching the
clients again or re-interviewing people. For example, we might need to enter every
customer's contact information, but some of them might not have email addresses. In this
case, we have to leave those columns empty. If a system requires us to fill all columns,
we can enter "missing" or "unknown" there, but entering such values does not make the
data complete. It would still be referred to as incomplete.
5. Maintains Data Consistency: To ensure the data is consistent within the same
dataset or across multiple datasets, we can measure consistency by comparing two
similar systems, or by checking whether the values within one dataset agree with each
other. Consistency can be relational. For example, a customer's age might be recorded
as 25, which is a valid and accurate value, yet the same system also flags that customer
as a senior citizen. In such cases, we have to cross-check the data, similar to measuring
accuracy, and determine which value is true: is the client 25 years old, or a senior citizen?
Only one of these values can be true. There are multiple ways to keep your data
consistent:

• By checking in different systems.


• By checking the source.
• By checking the latest data.

Data Cleaning Cycle

It is the method of analyzing, distinguishing, and correcting untidy, raw data. Data cleaning
involves filling in missing values, handling outliers, and identifying and fixing errors
present in the dataset, although the techniques used may vary from one type of dataset
to another. In this tutorial, we will learn how to clean data using Pandas, following the
standard steps laid out below.
Data Cleaning With Pandas

Data scientists spend a huge amount of time cleaning datasets and getting them into a
form they can work with. Being able to handle messy, missing, inconsistent, noisy, or
nonsensical data is an essential skill for a data scientist. To make this work smooth,
Python's ecosystem offers the Pandas library, which is widely used for data processing
tasks such as cleaning, manipulation, and analysis. Pandas stands for "Python Data
Analysis Library". It provides functions to read, process, and write CSV files, among other
formats. There are numerous data cleaning tools, but the Pandas library provides a really
fast and efficient way to manage and explore data. It does that by giving us Series and
DataFrames, which help us represent data efficiently and manipulate it in various ways.

In this article, we will use the Pandas module to clean our dataset.

We are using a simple dataset for data cleaning, i.e., the iris species dataset. You can
download this dataset from kaggle.com.

Let’s get started with data cleaning step by step.


To start working with Pandas, we need to first import it. We are using Google Colab as
IDE, so we will import Pandas in Google Colab.

# importing module
import pandas as pd
Step 1: Import Dataset

To import the dataset, we use the read_csv() function of Pandas and store the result in a
DataFrame named data. Because the dataset is in tabular format, it is read directly into a
DataFrame: a two-dimensional, mutable data structure in Python made up of rows and
columns, like an Excel sheet.

Python Code:
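A minimal sketch of this step, assuming the downloaded file has been saved as 'Iris.csv' in the working directory (the file name is an assumption about how the Kaggle download is stored):

# reading the CSV file into a pandas DataFrame named 'data'
data = pd.read_csv('Iris.csv')
# displaying the first five rows of the dataset
data.head()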

The head() function is a built-in DataFrame method in Pandas used to display the first
rows of the dataset. We can specify the number of rows by passing a number within the
parentheses; by default, it displays the first five rows. If we want to see the last five rows
of the dataset, we use the tail() function of the DataFrame like this:

# displaying last five rows of the dataset
data.tail()

Step 2: Merge Dataset

Merging the dataset is the process of combining two datasets in one and lining up rows
based on some particular or common property for data analysis. We can do this by using
the merge() function of the dataframe. Following is the syntax of the merge function:

DataFrame_name.merge(right, how='inner', on=None, left_on=None,
    right_on=None, left_index=False, right_index=False, sort=False,
    suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

But in this case, we don't need to merge two datasets, so we will skip this step.
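For illustration only (since we skip this step here), a merge on a shared key column might look like the sketch below; the two small DataFrames and the 'Id' column are made-up examples:

# joining two hypothetical DataFrames on a common 'Id' column
left = pd.DataFrame({'Id': [1, 2], 'SepalLengthCm': [5.1, 4.9]})
right = pd.DataFrame({'Id': [1, 2], 'Species': ['Iris-setosa', 'Iris-setosa']})
merged = left.merge(right, how='inner', on='Id')   # inner join on 'Id'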
Step 3: Rebuild Missing Data

To find and fill in the missing data in the dataset, we will use a few more functions. There
are several ways to find null values, if any are present, in the dataset. Let's see them one by one:

Using isnull() function:

data.isnull()

This function returns a boolean value for every cell of the dataset, indicating whether it is
null or not.

Using isna() function:

data.isna()
This is the same as the isnull() function and provides the same output.

Using isna().any():

data.isna().any()

This function also returns boolean values indicating whether any null value is present, but
it gives results column-wise rather than for the full table.

Using isna().sum():

data.isna().sum()

This function gives the column-wise sum of the null values present in the dataset.
Using isna().any().sum()

data.isna().any().sum()

This function returns a single number: the count of columns that contain any null value.

There are no null values present in our dataset. But if any null values were present, we
could fill those places with another value using the fillna() function of the DataFrame.
Following is the syntax of the fillna() function:

DataFrame_name.fillna(value=None, method=None, axis=None,
    inplace=False, limit=None, downcast=None)

This function fills NA/NaN cells with the given value (for example, 0). You may also drop
null values using the dropna() method when the amount of missing data is relatively small
and unlikely to affect the overall analysis.
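As a short illustration, both calls are shown below; on this dataset they are effectively no-ops because no nulls are present:

data_filled = data.fillna(0)     # replace any NaN cells with 0
data_dropped = data.dropna()     # or drop rows that contain NaN values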

Step 4: Standardization and Normalization

Data standardization and normalization are common practices in machine learning.

Standardization is another scaling technique where the values are centered around the
mean with a unit standard deviation. This means that the mean of the attribute becomes
zero, and the resultant distribution has a unit standard deviation.

Normalization is a scaling technique in which values are shifted and rescaled so that they
end up ranging between 0 and 1. It is also known as Min-Max scaling.
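Purely for illustration, the two scalings could be applied to a numeric column like this; the column name 'SepalLengthCm' is an assumption about how the iris CSV is labelled:

col = data['SepalLengthCm']
standardized = (col - col.mean()) / col.std()              # mean 0, unit standard deviation
normalized = (col - col.min()) / (col.max() - col.min())   # rescaled to the [0, 1] range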



This step is not needed for the dataset we are using. So, we will skip this step.

Step 5: De-Duplicate Data

De-duplication means removing all duplicate values. Duplicate values are not needed in
data analysis; they only reduce the accuracy and efficiency of the results. To find duplicate
values in the dataset, we will use a simple DataFrame method, duplicated(). Let's see the
example:

data.duplicated()

This function returns a boolean value for each row, marking rows that are duplicates of
earlier ones. As we can see, the dataset doesn't contain any duplicate values. If a dataset
does contain duplicate values, they can be removed using the drop_duplicates() function.
Following is the syntax of this function:

DataFrame_name.drop_duplicates(subset=None, keep='first',
    inplace=False, ignore_index=False)
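A short illustration of both calls on our DataFrame (the duplicate count is expected to be 0 for this dataset):

print(data.duplicated().sum())   # number of fully duplicated rows
data = data.drop_duplicates()    # keep only the first occurrence of each row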

Step 6: Verify and Enrich the Data

After removing null, duplicate, and incorrect values, we should verify the dataset and
validate its accuracy. In this step, we have to check that the data cleaned so far makes
sense. If the data is incomplete, we have to enrich it again through data-gathering
activities such as approaching the clients again or re-interviewing people.
Completeness is a little more challenging to achieve than accuracy or quality in the dataset.

Step 7: Export Dataset


This is the last step of the data-cleaning process. After performing all the above
operations, the data is transformed into a clean dataset, and it is ready to export for the
next process in Data Science or Data Analysis.
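A minimal sketch of the export step, assuming we want to write the cleaned DataFrame to a new CSV file (the output file name is an assumption):

data.to_csv('iris_cleaned.csv', index=False)   # write without the DataFrame index column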

Question 2 : Explain Hill Climbing Algorithm

Hill climbing is a simple optimization algorithm used in Artificial Intelligence (AI) to find the
best possible solution for a given problem. It belongs to the family of local search
algorithms and is often used in optimization problems where the goal is to find the best
solution from a set of possible solutions.
• In Hill Climbing, the algorithm starts with an initial solution and then iteratively
makes small changes to it in order to improve the solution. These changes are
based on a heuristic function that evaluates the quality of the solution. The
algorithm continues to make these small changes until it reaches a local
maximum, meaning that no further improvement can be made with the current set
of moves.
• There are several variations of Hill Climbing, including steepest-ascent Hill
Climbing, first-choice Hill Climbing, and simulated annealing. In steepest-ascent
Hill Climbing, the algorithm evaluates all the possible moves from the current
solution and selects the one that leads to the best improvement. In first-choice
Hill Climbing, the algorithm randomly selects a move and accepts it if it leads to
an improvement, regardless of whether it is the best move. Simulated annealing
is a probabilistic variation of Hill Climbing that allows the algorithm to occasionally
accept worse moves in order to avoid getting stuck in local maxima.

def hill_climbing(f, x0, generate_neighbors):
    x = x0  # initial solution
    while True:
        neighbors = generate_neighbors(x)  # generate neighbors of x
        # find the neighbor with the highest function value
        best_neighbor = max(neighbors, key=f)
        if f(best_neighbor) <= f(x):  # if the best neighbor is not better than x, stop
            return x
        x = best_neighbor  # otherwise, continue with the best neighbor
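A quick hedged usage example of the sketch above: maximizing f(x) = -(x - 3)^2 over the integers, where a neighbor is one step left or right (the helper names are assumptions made for this example):

f = lambda x: -(x - 3) ** 2                 # objective to maximize, peak at x = 3
step_neighbors = lambda x: [x - 1, x + 1]   # neighbors: one step left or right
print(hill_climbing(f, 0, step_neighbors))  # -> 3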

Question 3 : Explain Best First Search Algorithm

The implementation of our best-first search algorithm is achieved by the function
best_first() and a modification of the underlying class Graph.
The best_first() function takes three parameters:
• The graph parameter takes an initialized Graph object (see the blog on the
breadth-first search algorithm, the section on graphs).
• The start_vertex parameter takes the starting vertex, which we choose freely
(remember, a graph is not a tree, there is no absolute root).
• The target parameter is the entity we want to find in the graph, enclosed in a
vertex.
class Vertex:
    __slots__ = '_entity', '_h'

    def __init__(self, entity, h=0):
        self._entity = entity
        self._h = h

    # The real-world entity is represented by the Vertex object.
    def entity(self):
        return self._entity

    # The real-world entity has a heuristic value of h.
    def h(self):
        return self._h

    # We have to implement __hash__ to use the object as a dictionary key.
    def __hash__(self):
        return hash(id(self))

    def __lt__(self, other):
        return self.h() < other.h()
from queue import PriorityQueue

def best_first(graph, start_vertex, target):
    # Create the priority queue for open vertices.
    vertices_pq = PriorityQueue()
    # Maps each visited vertex to the edge it was reached by (for path reconstruction).
    visited = {}

    # Adds the start vertex to the priority queue.
    print(f'Visiting/queueing vertex {start_vertex.entity()}')
    vertices_pq.put(start_vertex)
    print('Prioritized vertices (vertex, h(vertex)):',
          *((vert.entity(), vert.h()) for vert in vertices_pq.queue), end=2 * '\n')

    # The starting vertex is visited first and has no leading edges.
    # If we did not put it into 'visited' in the first iteration,
    # it would end up in 'visited' during the second iteration, pointed to
    # by one of its children vertices as a previously unvisited vertex.
    visited[start_vertex] = None

    # Loops until the priority queue gets empty.
    while not vertices_pq.empty():
        # Gets the vertex with the lowest heuristic value.
        vertex = vertices_pq.get()
        print(f'Exploring vertex {vertex.entity()}')
        if vertex.entity() == target:
            return vertex
        # Examine each non-visited adjoining edge/vertex.
        for edge in graph.adjacent_edges(vertex):
            # Gets the second endpoint.
            v_2nd_endpoint = edge.opposite(vertex)
            if v_2nd_endpoint not in visited:
                # Adds the second endpoint to 'visited' and maps
                # the leading edge for the search path reconstruction.
                visited[v_2nd_endpoint] = edge
                print(f'Visiting/queueing vertex {v_2nd_endpoint.entity()}')
                vertices_pq.put(v_2nd_endpoint)
                print('Prioritized vertices (vertex, h(vertex)):',
                      *((vert.entity(), vert.h()) for vert in vertices_pq.queue), end=2 * '\n')
    return None

Question 4 : Explain Means-Ends Analysis


In today's fast-paced and complex world, problem-solving skills are crucial for success in various
domains. Means-ends analysis (MEA) is an effective problem-solving approach that involves
dividing a problem into smaller goals, comparing the current state with the desired goal, and
identifying the means (actions or operators) that reduce the difference between them.

Working

Means-ends analysis uses the following steps to achieve a goal (a code sketch follows this list):
1. Compare the current state with the goal state and identify the difference between them.
2. Select an operator (action) that can reduce that difference.
3. If the operator cannot be applied directly, set up a subgoal to reach a state where it can be applied.
4. Apply the operator and repeat the process until the goal state is reached.
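The same loop can be sketched in a few lines of code. This is a minimal, illustrative sketch only: the Operator tuple, the set-of-facts state representation, and the home/office example are assumptions made for the sketch, not part of any standard library.

from collections import namedtuple

# An operator names an action, the facts it needs, the facts it adds, and the facts it removes.
Operator = namedtuple('Operator', ['name', 'preconds', 'adds', 'deletes'])

def means_ends(state, goal, operators, depth=10):
    """Return (plan, final_state) if a plan is found, otherwise None."""
    if goal <= state:                          # no difference left: goal satisfied
        return [], state
    if depth == 0:                             # give up on overly deep subgoal chains
        return None
    for op in operators:
        if op.adds & (goal - state):           # this operator reduces the difference
            # Subgoal: first reach a state where the operator's preconditions hold.
            sub = means_ends(state, op.preconds, operators, depth - 1)
            if sub is None:
                continue
            pre_plan, pre_state = sub
            new_state = (pre_state | op.adds) - op.deletes   # apply the operator
            rest = means_ends(new_state, goal, operators, depth - 1)
            if rest is not None:
                rest_plan, final_state = rest
                return pre_plan + [op.name] + rest_plan, final_state
    return None                                # no operator closes the remaining gap

# Made-up example: get from home to the office.
ops = [Operator('walk to car',   {'at home'}, {'at car'},    {'at home'}),
       Operator('drive to work', {'at car'},  {'at office'}, {'at car'})]
plan, _ = means_ends({'at home'}, {'at office'}, ops)
print(plan)   # -> ['walk to car', 'drive to work']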

Advantages
• Problem-solving: Ensures a systematic approach by providing a structured framework.
• Clarity and focus: Divides complex goals into smaller subgoals to enhance clarity,
allowing individuals to focus on one step at a time.
• Progress tracking: Achieves goals effectively by tracking the progress and making
timely adjustments.

Applications
• Project management: Project managers can track progress and break down complex
projects into manageable tasks. Management achieves the desired goal by dividing it into
subgoals and linking them with associated actions.
• Technology and innovation: Organizations can drive technological advancements,
increase efficiency, and enhance user experiences by analyzing the state of technology at
the moment, establishing subgoals, and determining ways to achieve those goals.

Conclusion

Means-ends analysis assists individuals and organizations in navigating the complexities of
problem-solving. It promotes efficiency, clarity, and progress by breaking down goals, identifying
subgoals, and strategically planning the means to achieve them.

Question 5 : Explain First-Order Logic

o First-order logic is another way of knowledge representation in artificial
intelligence. It is an extension of propositional logic.
o FOL is sufficiently expressive to represent natural language statements in a
concise way.
o First-order logic is also known as predicate logic or first-order predicate logic. It is
a powerful language that expresses information about objects in an easier way and
can also express the relationships between those objects.
o First-order logic (like natural language) does not only assume that the world
contains facts, as propositional logic does, but also assumes the following things in
the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: these can be unary relations such as red, round, or is adjacent, or n-ary
relations such as the sister of, brother of, has color, or comes between.
o Functions: father of, best friend, third inning of, end of, ......
o As with a natural language, first-order logic has two main parts:
a. Syntax
b. Semantics

Syntax of First-Order logic:

The syntax of FOL determines which collections of symbols form logical expressions in
first-order logic. The basic syntactic elements of first-order logic are symbols. We write
statements in short-hand notation in FOL.

Basic Elements of First-order logic:

Following are the basic elements of FOL syntax:



Constant       1, 2, A, John, Mumbai, cat, ....
Variables      x, y, z, a, b, ....
Predicates     Brother, Father, >, ....
Function       sqrt, LeftLegOf, ....
Connectives    ∧, ∨, ¬, ⇒, ⇔
Equality       =
Quantifier     ∀, ∃

Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences
are formed from a predicate symbol followed by a parenthesized sequence of terms.
o We can represent atomic sentences as Predicate(term1, term2, ......, term n).

Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).
Chinky is a cat: => cat(Chinky).

Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.

First-order logic statements can be divided into two parts:

o Subject: the subject is the main part of the statement.
o Predicate: a predicate can be defined as a relation which binds two atoms
together in a statement.

Consider the statement "x is an integer." It consists of two parts: the first part, x, is the
subject of the statement, and the second part, "is an integer," is known as the predicate.
Quantifiers in First-order logic:

o A quantifier is a language element that generates quantification, and quantification
specifies the quantity of specimens in the universe of discourse.
o These are the symbols that permit us to determine or identify the range and scope of
a variable in a logical expression. There are two types of quantifier:
a. Universal quantifier (for all, everyone, everything)
b. Existential quantifier (for some, at least one)
Universal Quantifier:

The universal quantifier is a symbol of logical representation which specifies that the
statement within its range is true for everything or for every instance of a particular thing.

The universal quantifier is represented by the symbol ∀, which resembles an inverted A.

Note: With the universal quantifier we use the implication "→".

If x is a variable, then ∀x is read as:

o For all x
o For each x
o For every x

Example:

All men drink coffee.

Let the variable x refer to a man, so all x can be represented in the universe of discourse as:

∀x man(x) → drink(x, coffee).

It will be read as: For all x, if x is a man, then x drinks coffee.

Existential Quantifier:

Existential quantifiers are the type of quantifiers which express that the statement within
their scope is true for at least one instance of something.

The existential quantifier is denoted by the logical operator ∃, which resembles a reversed E.
When it is used with a predicate variable, it is called an existential quantifier. For example,
"Some boys are intelligent" can be written as ∃x boys(x) ∧ intelligent(x), read as: there exists
an x such that x is a boy and x is intelligent.
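The meaning of the two quantifiers can be checked mechanically over a small, finite universe of discourse. The following is a minimal sketch only; the three-element domain and the predicate tables are made-up examples, not part of any logic library:

# Evaluating ∀x man(x) → drink(x, coffee) and ∃x man(x) ∧ drink(x, coffee)
# over a tiny finite domain (illustrative data only).
domain = ['Ravi', 'Ajay', 'Chinky']
man = {'Ravi': True, 'Ajay': True, 'Chinky': False}
drinks_coffee = {'Ravi': True, 'Ajay': True, 'Chinky': False}

# Universal: the implication must hold for every element of the domain.
universal = all((not man[x]) or drinks_coffee[x] for x in domain)

# Existential: the conjunction must hold for at least one element.
existential = any(man[x] and drinks_coffee[x] for x in domain)

print(universal, existential)   # -> True True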

Question 6 : Forward and Backward Chaining Algorithm

Forward chaining is also known as forward deduction or forward reasoning when using
an inference engine. The forward-chaining algorithm starts from known facts, triggers all
rules whose premises are satisfied, and adds their conclusions to the known facts. This
process repeats until the problem is solved.

In this type of chaining, the inference engine starts by evaluating existing facts,
derivations, and conditions before deducing new information. An endpoint, or goal, is
achieved through the manipulation of the knowledge that exists in the knowledge base.
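As a concrete illustration, here is a minimal forward-chaining sketch. The rule representation (a list of (premises, conclusion) pairs) and the example facts are assumptions made for this sketch, not part of any standard inference-engine API:

# Minimal forward-chaining sketch: keep firing rules whose premises are all
# known facts until no new conclusions can be added.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)   # premises satisfied: assert the conclusion
                changed = True
    return facts

# Illustrative knowledge base (made-up example).
rules = [({'croaks', 'eats flies'}, 'frog'),
         ({'frog'}, 'green')]
print(forward_chain({'croaks', 'eats flies'}, rules))
# -> {'croaks', 'eats flies', 'frog', 'green'}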
Backward chaining is also known as backward deduction or backward reasoning when
using an inference engine. Here, the inference engine knows the final decision or goal.
The system starts from the goal and works backward to determine which facts must be
asserted so that the goal can be achieved.

In other words, it starts directly with the conclusion (hypothesis) and validates it by
backtracking through a sequence of facts. Backward chaining can be used in debugging,
diagnostics, and prescription applications.

Consider the goal: prove that John is taller than Kim.

• John is the tallest boy in the class, which means:
  Height(John) > Height(anyone in the class)
AND
• John and Kim are both in the same class
AND
• Height(Kim) > Height(anyone in the class except John)
AND
• John is a boy
SO
  Height(John) > Height(Kim)

which aligns with the knowledge base facts. Hence the goal is proved true.
