
HIGHWAY TO PYTHON

2 Books in 1: The Fastest Way for Beginners to Learn Python Programming, Data Science and Machine Learning in 3 Days (or less) + Practical Exercises Included

Aaron Cox
Book 1
Python for Beginners

Book 2
Python Data Science
© Copyright 2020 by Aaron Cox - All rights reserved.
This eBook is provided with the sole purpose of providing relevant
information on a specific topic for which every reasonable effort has been
made to ensure that it is both accurate and reasonable. Nevertheless, by
purchasing this eBook, you consent to the fact that the author, as well as the
publisher, are in no way experts on the topics contained herein, regardless
of any claims as such that may be made within. As such, any suggestions or
recommendations that are made within are done so purely for entertainment
value. It is recommended that you always consult a professional before
undertaking any of the advice or techniques discussed within.
This is a legally binding declaration that is considered both valid and fair by
both the Committee of Publishers Association and the American Bar
Association and should be considered as legally binding within the United
States.
The reproduction, transmission, and duplication of any of the content found
herein, including any specific or extended information, will be considered an
illegal act regardless of the end form the information ultimately takes. This
includes copied versions of the work, whether physical, digital, or audio,
unless express consent of the Publisher is provided beforehand. All
additional rights reserved.
Furthermore, the information that can be found within the pages described
forthwith shall be considered both accurate and truthful when it comes to
the recounting of facts. As such, any use, correct or incorrect, of the
provided information will render the Publisher free of responsibility as to
the actions taken outside of their direct purview. Regardless, there are zero
scenarios where the original author or the Publisher can be deemed liable in
any fashion for any damages or hardships that may result from any of the
information discussed herein.
Additionally, the information in the following pages is intended only for
informational purposes and should thus be thought of as universal. As
befitting its nature, it is presented without assurance regarding its continued
validity or interim quality. Trademarks that are mentioned are done without
written consent and can in no way be considered an endorsement from the
trademark holder.
Python for Beginners
Introduction
Chapter 1: What is Python
Chapter 2: Why Python is the Easiest Language to Learn
Chapter 3: Installing the Interpreter
Chapter 4: Using the Python Shell, IDLE and Writing the First Program
Chapter 5: Variables and Operators
Chapter 6: Data Types in Python
Chapter 7: Making your program Interactive
Chapter 8: Making Choices and Decisions
Chapter 9: Functions and Models
Chapter 10: How to Work with Files
Chapter 11: Object Oriented Programming
Chapter 12: Math and binary
Chapter 13: Exercises
Conclusion
Python Data Science
Introduction
Chapter 1: Installing Python
Chapter 14: Python Libraries to Help with Data Science
Chapter 15: Python Functions
Chapter 16: The Basics of Working with Python
Chapter 17: Data Structures and the A* Algorithm
Chapter 18: Reading data in your script
Chapter 19: Manipulating data
Chapter 20: Probability – Fundamental – Statistics – Data Types
Chapter 21: Distributed Systems & Big Data
Chapter 22: Python in the Real World
Chapter 23: Linear Regression
Conclusion
Python for Beginners
THE CRASH COURSE TO LEARN PYTHON PROGRAMMING IN 3 DAYS (OR LESS).
MASTER ARTIFICIAL INTELLIGENCE FOR DATA SCIENCE AND MACHINE LEARNING + PRACTICAL EXERCISES.
Introduction:
Programming has come a long way. Although the world of programming
started quite some time ago, it was only a few decades ago that it gained
attention from computer experts across the globe. This shift brought in
some great minds whose contributions shaped the entire age of
programming. The GNU project took shape during this era, and so did the
brilliant Linux. New programming languages were born as well, and people
certainly enjoyed them to the utmost.
While most of these programming languages worked, something was
missing. Surely, something could be done to make coding a less tedious
task. That is precisely what a revolutionary new language, named after
Monty Python’s Flying Circus, did for the world. Coding immediately
became much easier for programmers. The language steadily gained
momentum, and today it is set to overtake the only language still ahead of
it and claim the prestigious spot of the world’s most favored language.
This language was the brainchild of Guido van Rossum. First released in
1991, Python has become a byword for efficient and user-friendly
programming. It connected the dots and gave programmers the ease of
coding they had long been yearning for. Naturally, the programming
community received the language well. Today, it is one of the most
important languages for both professionals and students who aim to excel
in fields like machine learning, automation, artificial intelligence, and much
more.
With real-life uses across a wide variety of fields, Python now powers parts
of many major social platforms, web applications, and websites. All of this
sounds interesting and exciting, but what if you have no prior knowledge of
programming? What if you do not understand the basic concepts and still
wish to learn Python?
I am happy to report that this book will give you every possible chance of
learning Python and allow you to jump-start your journey into the world of
programming. It is meant for people who have zero understanding of
programming and may never have written a single line of code before.
I will walk you through all the basic steps, from installation to application.
We will look into various aspects of the language and, hopefully, provide
real-life examples that explain why each aspect matters. The idea of this
book is to prepare you as you learn the core concepts of Python. After this,
you should have no problem choosing your path ahead. The basics always
remain the same, and this book aims to cover each of those basic elements
in the most productive way possible. I will try to keep the learning process
as fun as I can without deviating from the learning itself.
Things You Need!
“Wait. Did you not say I don’t need to know anything about programming?”
Well, yes! You do not have to worry about programming or its concepts at
the moment; when the time comes, I will do my best to explain them.
What is needed of you is something a little more obvious.

Computer: Like I said, obvious! You need a machine of your own to
download and practice the material you learn from here. To make the
most of this book, practice as you read. This greatly increases your
confidence and allows you to keep a steady pace. The specifications do
not matter much; most modern machines (2012 and later) should be able
to run each of the components without any problem.
An internet connection: You will need to download a few files from the
internet.
An Integrated Development Environment (IDE): If, for some reason, this
terminology intimidates you, relax! I will guide you through each and
every step to ensure you have all of these and know what they are all
about. For now, just think of an IDE as a text editor.
A fresh mind: There is no point in learning if your mind is not there with
you. Be fresh, be comfortable. This may take a little practice and a little
time, but it will all be worth it.
That is quite literally all you need. Before we move on to our very first
chapter and start learning the essentials, there is one more thing I would like
to clarify right away.
If you picked up a copy of this book, or are considering it, because it will
teach you all the basics of Python, good choice! However, if you expect that
by the end of it you will be a fully trained professional with an
understanding of machine learning and other advanced Python fields, please
understand that this falls outside its scope.
This book serves as a guide, a crash course of sorts. To learn more advanced
methods and skills, you will first need to establish command over all the
basic elements and components of the language. Once you have done so, I
highly recommend seeking out books meant for advanced learning.
What I recommend is that you keep practicing your code after you have
completed this book. Unlike driving and swimming, which you will
remember for the rest of your life even if you stop doing them, Python keeps
evolving with new versions. It is essential that you stay in practice and keep
writing small programs like simple calculators, number predictors, and so
on. You can find quite a few exercises online.
For advanced courses, refer to Udemy. It is one of the finest sources to gain
access to some exceptional courses and learn new dimensions of
programming, amongst many other fields.
Phew! Now that this is out of the way, I shall give you a minute to flex your
muscles, adjust your seat, have a glass of water; we are ready to begin our
journey into the world of Python.
Chapter 1: What is Python
Python is a multi-purpose language created by Guido van Rossum. The
language boasts a simple syntax that makes it easy for a new learner to
understand and use. This chapter introduces the basics of the Python
language.
Let’s get started!
Python is described as a general-purpose language. It has many applications,
and therefore you can use it to accomplish many different tasks.
Python’s syntax is clean, and programs written in it tend to be short.
Developers who have used Python often describe how much fun it is to code
in. The beauty of Python is that it lets you think more about the task at hand
instead of the language syntax.
Some history of Python
The design of the Python language started back in the 1980s and it was first
launched in February 1991.
Why was Python developed?
Guido van Rossum set out to design a new programming language because
he wanted one that offered a simple syntax, like that of ABC. This
motivation led to the development of a new language named Python.
But you may be wondering: why the name Python?
First, this language wasn’t named after the huge snake called a python. No!
One of van Rossum’s interests was comedy, and he was a great fan of the
British comedy series Monty Python’s Flying Circus, which first aired in
1969. The name of the language was borrowed from that show.
Properties of Python
Easy to learn – The syntax of Python is simple and beautiful. Many
programmers also enjoy writing Python more than other languages. Python
simplifies the art of programming and lets the developer concentrate on the
solution instead of the syntax. For a newbie, it is a great choice to start a
programming career.
Portability – Python programs can run on different platforms (for example
Windows, macOS, and Linux), usually without any changes.
Python is described as a high-level language – In other words, you don’t
need to worry about tedious tasks such as memory management. Whenever
you execute Python code, the interpreter automatically translates it into a
form your computer can run, so there is no need to worry about lower-level
operations.
Object-oriented – Since Python is an object-oriented language, it helps you
structure solutions to complex problems. Object-oriented programming
makes it possible to divide a large problem into smaller parts by building
objects.
Has a huge standard library for common tasks – Python comes with a large
standard library for the programmer to use. As a result, you do not have to
write every line of code yourself; instead, you import the module that
already contains the relevant code.
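As a small illustration of the standard-library point, the sketch below imports two modules that ship with a standard Python 3 installation, `math` and `statistics`, instead of writing the calculations by hand. The list of scores is made-up sample data.

```python
# Both modules are part of Python's standard library; nothing extra to install.
import math
import statistics

scores = [72, 85, 90, 66, 85]  # made-up sample data for illustration

# Instead of writing our own square-root or averaging code, we call library functions.
print(math.sqrt(16))              # 4.0
print(statistics.mean(scores))    # 79.6
print(statistics.median(scores))  # 85
print(statistics.mode(scores))    # 85
```

The same pattern applies to the rest of the standard library: import the module, then call its functions.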
A Brief Application of Python
Web Applications
You can develop scalable web applications using CMSs and frameworks built
on Python. Popular environments for developing web applications include
Pyramid, Django, Django CMS, and Plone.
Popular websites such as Instagram, Mozilla, and Reddit also use Python.
Scientific and Numeric Calculations
There are many Python libraries designed for scientific and numeric
calculations. Libraries such as NumPy and SciPy are used for general
scientific computing, and there are specially designed libraries such as
AstroPy for astronomy.
Additionally, Python is widely used in data mining, machine learning, and
deep learning.
A Great Language for Teaching Programming
Python is an important tool for teaching programming to newbies and
children. It is a capable language with important features, yet it is one of
the easiest languages to learn because its syntax is simple.
Building Software Prototypes
Python is generally slower than Java and C++, so it may not be a great
choice when resources are limited and efficiency is essential. But Python is a
powerful language for building prototypes. For instance, you can use the
Pygame library to develop a prototype of your game first. If you like the
prototype, you can then use C++ to develop the actual game.
Chapter 2: Why Python is the Easiest Language to Learn
Python is an interpreted, object-oriented, dynamically typed, high-level
programming language. Since its birth in the early 1990s, it has gradually
come into wide use for system administration tasks and web programming.
With the continuous development of artificial intelligence in particular,
Python has become one of the most popular programming languages.
The first benefit you will notice with Python is that it is easy to learn. The
language was developed with the beginner in mind, in the hope of bringing
more people into coding. Some traditional languages were hard and bulky,
and unless you were really passionate about your coding work, you would
probably give up long before anything was done. With Python, things are
different: the language was designed to be easy to learn and easy to read,
which made it possible for more people to get into the world of coding.
Even though you will be pleasantly surprised by how easy Python is to learn,
you will also find that it is a powerful language. Don’t let its simplicity fool
you; it has enough power to tackle complex problems. Python handles basic
coding needs, but it can also help you with things like machine learning and
data analysis, and if you have spent any time on these topics, you know they
are not easy.
Python also has many extensions and libraries that extend what it can do.
This is primarily how you get Python to handle more complex tasks. You add
them simply by installing them on your computer or system, and they are
ready to use whenever you are. You can then implement algorithms, finish
your data analysis, and much more. Many Python data science libraries are
available, depending on which step of the process you are working on at the
time.
Why is Python special?
There are hundreds of programming languages available for programmers to
start with. However, according to a survey by Harvard computer scientists,
Python is a leading language among beginners. Let us discuss some of the
reasons that make Python easy to understand for new programmers.
Python has the following major advantages over other programming
languages:
(1) The grammar is concise and clear, and the code is highly readable.
Python's syntax requires mandatory indentation, which reflects the logical
relationship between statements and significantly improves the readability
of the program.
(2) Because it is simple and clear, it is also a programming language with
high development efficiency.
(3) Python is truly cross-platform: for example, the programs we develop
can run on Windows, Linux, and macOS. This is its portability advantage.
(4) It has a large number of rich libraries and extensions. Python is often
nicknamed a glue language because it can easily connect modules written in
other languages, especially C/C++. Using these abundant third-party
libraries, we can develop our applications easily.
(5) The amount of code is small, which improves software quality to a
certain extent. Since a Python program is usually much shorter than the
equivalent program in many other languages, there are fewer places for
errors to hide, which improves the quality of the software.
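Point (1) above, mandatory indentation, is easiest to see in a short example. In this sketch (the function name `classify` and the sample numbers are illustrative only), the indented lines are what belong to the `def`, `for`, and `if`/`else` blocks; changing the indentation would change, or break, the program's logic.

```python
def classify(numbers):
    """Split a list of whole numbers into evens and odds."""
    evens = []
    odds = []
    for n in numbers:        # the indented block below runs once per item
        if n % 2 == 0:       # runs only when n is even
            evens.append(n)
        else:                # runs when n is odd
            odds.append(n)
    return evens, odds       # back at function level: runs once, after the loop


print(classify([1, 2, 3, 4, 5]))  # ([2, 4], [1, 3, 5])

# Leaving out required indentation is an error, not just a style problem.
bad_source = "if True:\nprint('not indented')\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    print("IndentationError:", err.msg)
```

Other languages often mark blocks with braces and treat indentation as optional style; Python uses the indentation itself, which is why the layout of the code always matches its logic.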
Python is very versatile and can be used in the following areas:
(1) Web page development;
(2) Visual (GUI) interface development;
(3) Network programming;
(4) System programming;
(5) Data analysis;
(6) Machine learning (Python has various libraries to support it);
(7) Web crawlers (such as those used by Google);
(8) Scientific calculation (Python is used in many aspects of scientific
computing).
For example, Python is used in many Google services, and YouTube also
makes heavy use of Python. Python is also the language of Pywikibot, a
widely used framework for writing bots that help maintain Wikipedia.
How does Python work?
The principle behind running a Python program is simple. Programs written
in compiled languages such as C/C++ must be converted from source files
into the machine language used by the computer, and a linker then combines
the results into a binary executable file. To run the program, the binary is
loaded from the hard disk into memory and executed.
Python source code, however, does not need to be compiled into a binary
executable. You run programs directly from the source code: the Python
interpreter converts the source code into bytecode and then hands the
compiled bytecode to the Python virtual machine (PVM) for execution.
When we run the Python program, the Python interpreter performs two
steps.
(1) Compiles Source Code into Bytecode
Bytecode is a Python-specific, lower-level representation of the program. It
is not binary machine code, so the machine cannot execute it directly;
instead, the Python virtual machine interprets it. This extra layer is one
reason Python code generally does not run as fast as C/C++.
If the Python process has write permission on the machine, it saves the
program's bytecode in a file with the .pyc extension (Python 3 places these
files in a __pycache__ folder, and does so for imported modules rather than
for the main script you run). If Python cannot write the bytecode to disk, the
bytecode is generated in memory and discarded when the program ends.
When building a program, it is best to give Python permission to write on
the computer: as long as the source code is unchanged, the generated .pyc
file can be reused to improve execution efficiency.
(2) Forwards the compiled bytecode to the Python Virtual Machine (PVM)
for execution.
PVM is short for Python Virtual Machine. It is Python's runtime engine and
part of the Python system: a large loop that runs the bytecode instructions
one after another, completing each operation in turn.
Every Python program is executed through this process, and the results it
produces can then be analyzed and tested before being deployed in new
applications.
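The two steps described above can be observed from Python itself. This minimal sketch uses the built-in `compile()` function to turn source text into a code object (bytecode), the standard `dis` module to print a readable listing of those bytecode instructions, and the built-in `exec()` to have the interpreter run them. The exact instructions listed vary between Python versions, so no listing is shown here.

```python
import dis

source = "x = 2 + 3\nprint(x)\n"

# Step 1: compile the source text into a code object that holds the bytecode.
code_obj = compile(source, "<example>", "exec")

# Print a human-readable listing of the bytecode instructions.
dis.dis(code_obj)

# Step 2: hand the code object to the interpreter's virtual machine to execute.
namespace = {}
exec(code_obj, namespace)  # prints 5
print(namespace["x"])      # 5
```

Normally you never call these functions yourself; running `python` on a file performs both steps automatically.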
Chapter 3: Installing the Interpreter
Many free IDEs and environments for Python are available online, and some
are better than others. With their shortfalls in mind, I consider PyCharm
Community Edition the best software for practicing Python programming.
Python is a common programming language for application development. Its
design focuses on code readability and clear programming for both small
and big projects, and a massive library of modules is available for it. Python
works on various operating systems, such as Windows. Installing Python on
Windows (including Windows Server) is a straightforward process:
download the installer, run it, and adjust a few settings to make Python
easier to use.
PyCharm is the software I recommend to many of my students, although
Anaconda is another option I have found quite useful. PyCharm Community
Edition won’t offer you all the power and capabilities of professional
software, but for beginners it’s more than adequate.
With that in mind, we need only to download and install the software. I will
go through this process with you, step-by-step with pictures.
Step 1: Open your preferred internet browser (Google Chrome, Firefox,
etc.) and search for ‘PyCharm community edition’. You should see the page
link depicted in image 1.2 as your first result.
1.1: Searching PyCharm Community Edition

1.2: First Result for PyCharm Community Edition


Step 2: Once you click on the link, you should see a page like the one
depicted in image 2.1. From here, you can choose which version of
PyCharm to download: Mac, Windows, or Linux for your OS, and the
Professional or Community edition depending on your preferred plan.
NOTE: I will be using the Community Edition throughout this book.
2.1: PyCharm Download Page
Step 3: Once you have chosen your preferred OS and version of PyCharm,
click the ‘Download’ button. You should see the PyCharm installer
downloading at the bottom-left of your screen.
3.1: PyCharm Downloading


Step 4: Once the download is
complete, click on the same box at the bottom-left of your screen. If you no
longer see the box or have closed your browser, locate the downloaded
installer in your Downloads folder. Double-click the icon to start the
installation.
4.1: PyCharm Download Finished

4.2: PyCharm Installer in Downloads Folder


Step 5: Now, we begin the installation. The process is simple, as you need
only click ‘Next’ and then ‘Install’ at the bottom of the installation process
boxes. However, I will be going through each box.
5.1: Box 1 - Introduction

This first box simply introduces the installation process.
Click ‘Next’ to continue.
5.2: Box 2 - Install Location
The second box concerns where the software will be installed.
PyCharm is a relatively small program, requiring less than 1GB of space. If
you want to install PyCharm in a particular folder, click ‘Browse...’ to open
an interface that lets you select one.
RECOMMENDED: The default location is perfectly fine and should cause
no issues, as long as you have enough space for the program on your
system. I recommend you leave this option unaltered. Click ‘Next’ to
continue.
5.3: Box 3 - Additional Installation Options
This step is purely optional, but I do recommend you create a Desktop
Shortcut by checking the box for ‘64-bit Launcher’, for ease of use. What
this will do is place an icon on your desktop, which you can use to quickly
start PyCharm without having to search for it in the Start Menu.
Whether you have selected this option or not, click ‘Next’ to continue.
5.4: Box 4 - Start Menu Folder Selection
This step is similar to 5.2, as you can select the folder where the software is
installed, but in this case, it is the Start Menu folder. If you have enabled the
Desktop Shortcut as recommended in 5.3, this step can be left without
alteration.
However, if you want the application icon to be stored in a specific folder,
you can change that here.
RECOMMENDED: Once more, the default location is perfectly fine and I
recommend you leave it unaltered.
Click ‘Install’ to continue.
5.5: Box 5 - PyCharm Installing
PyCharm is installing and you are on your way to Python programming!
Leave your computer running until the installation is complete.
5.6: Box 6 - PyCharm Finished Installing
PyCharm has been installed on your system and you have one more option
before clicking ‘Finish’. You can choose to run PyCharm now.
If you have unchecked this box and enabled the Desktop Shortcut, you can
find the following icon on your Desktop to start PyCharm.
5.7: PyCharm Desktop Shortcut Icon

Step 6: Once you have started PyCharm up, you should see the following as
depicted in image 6.1. For the first startup, PyCharm asks you to accept
standard terms & conditions before you can use the program.
You can read through these or not, but in order to continue, check the box
that states you have read and accepted the terms of this user agreement.
Once checked, click ‘Continue’.
6.1: Accepting User Agreement
Step 7: The next box presents an option that most programming software
offers. The developers ask whether you allow the software to send data on
your usage to help with bug-fixing and similar improvements, and they
offer an option to read more about it.
You can choose to provide this information or not, you still have full access
to PyCharm.
7.1: Data Sharing Agreement
Step 8: We are in the final stages of this installation process. The remaining
steps are more about preference than anything else. Once they are
completed, you are ready to move on and create a project for coding in.
Choose a theme for your UI. I will be using Darcula, but you can use
whichever you like. Once selected, click ‘Next: Featured plugins’.
8.1: Theme Choosing

8.2: Featured Plugins


Once more, these are preference options. These plugins are aimed at
experienced programmers, and all are optional.
I won’t be using any additional plugins, so once you are ready, click ‘Start
Using PyCharm’.
8.3: Finished and Ready to Start Creating Projects!
With that, you have finished installing and setting up PyCharm!
If you see the screen shown in image 8.3, you are ready to create a project,
an important last step before we start learning some code!
Chapter 4: Using the Python Shell, IDLE and Writing the First Program
Once you have Python on your operating system, the next step is to write
and run a program with Python.
A program is a series of coded instructions that let your computer perform
specific tasks. These coded instructions are known as source code, and they
are what the programmer writes on the computer.
Source code written in Python must be translated before the computer's
central processing unit (CPU) can act on it. Some languages do this with a
compiler, which produces an executable file; Python does it with an
interpreter, which translates and runs the code for you.
In summary, a compiler is a translator that converts a program's source code
into machine language so that it can be executed; this translation process is
known as compiling.
The difference between a compiler and an interpreter is that a compiler
translates the whole program into the machine code of the system and stores
the result, while an interpreter translates and executes the program
instruction by instruction and does not store the result of the translation.
Source code can therefore be executed in two ways: by compiling it first, or
by handing it to an interpreter that executes it immediately.
When we open IDLE on our system, the first screen we see is called the
Shell; we can also think of it as the interpreter of our Python language.
Every time we open the interpreter or Shell, we find the same kind of header,
with information about Python such as the version we are working with and
the date and time of the build. This header tells us that we are working in the
Shell interpreter.
The following example lets us see how the Shell interpreter translates and
executes each instruction as soon as we enter it.
By default, OS X does not come with Python 3 installed. If you want to use
Python 3, you can install it with one of the installers available on
Python.org. This is a good place to go because the installer includes
everything you need to write and execute Python code: the Python shell, the
IDLE development tools, and the interpreter. Unlike with Python 2.X, these
tools are installed as a standard application in the Applications folder.
Make sure that you are able to run both Python IDLE and the Python shell
on any computer you use. Whether you can run them may also depend on
the version you have chosen for your system, so check which version of
Python is installed before proceeding, and then determine which IDLE and
shell you need to work with.
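One way to check the installed version is from Python itself, using the standard `sys` and `platform` modules. From a command prompt, `python --version` (or `python3 --version` on many Mac and Linux systems) prints the same version number. This sketch avoids newer syntax such as f-strings, so it also runs on an old Python 2.7 installation; the messages it prints are only examples.

```python
import platform
import sys

# The full version string; its exact text depends on your installation.
print(sys.version)

# sys.version_info is a tuple-like object that is easy to compare in code.
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

if sys.version_info[0] < 3:
    print("This is Python 2; consider installing Python 3 from python.org.")
else:
    print("Python 3 detected.")

# Which implementation is running, for example "CPython".
print(platform.python_implementation())
```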
Let’s write our first line of Python code, using the phrase every Python
beginner starts with, "Hello World". At the Shell prompt (>>>), the syntax is
written as follows:
>>> print("Hello World")
Hello World
Once the instruction is written, we only have to press the "Enter" key: the
interpreter translates and executes that instruction right away, without
waiting for any additional instructions.
The interpreter can also be used from the command prompt (terminal),
which is available on Windows, Linux, and Mac.
To use the interpreter from the command prompt, simply type python (on
many Mac and Linux systems, python3) and press the "Enter" key. This starts
the Python interpreter, and we know we are in the interpreter because we see
the same header as before.
Now we can start to execute instructions written in Python: type
print("Hello world") and the interpreter translates this line and immediately
shows the result, Hello world.
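The shell runs one instruction at a time. To keep a program, you can save the same instructions in a file and run the whole file with the interpreter. A minimal sketch, assuming a file named `hello.py` (the name is only an example), run from the command prompt with `python hello.py` (or `python3 hello.py`). One difference worth knowing: in the interactive shell, typing an expression such as `2 + 3` displays its value automatically, but in a saved program you must call `print()` to see anything.

```python
# hello.py: a first Python program saved in a file.
# Run it from the command prompt with:  python hello.py

message = "Hello world"
print(message)                   # Hello world

# In a script, expressions are not echoed, so use print() to show results.
print(2 + 3)                     # 5
print("Length:", len(message))   # Length: 11
```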
Chapter 5: Variables and Operators
What are Variables?
A variable is nothing more than a reserved location in memory, a
container if you like, where values are stored. The basic rules relating to
variables are:
Values can be strings, numbers, or another data type
A variable is created when it is first assigned
A variable needs to be assigned before you can reference it
The value stored in a variable may be updated or accessed later
Variables do not require a declaration
Python decides the variable's data type, for example float, int, or string
The Python interpreter allocates the required amount of memory based on
the variable's data type
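The rules above can be seen in a few lines. In this sketch the names `count` and `total` are only illustrative: `count` is created by assignment without any declaration, Python picks its type, the value is later replaced by one of a different type, and `total` is referenced without ever being assigned.

```python
# A variable is created the first time a value is assigned to it; no declaration needed.
count = 10
print(type(count))    # <class 'int'>

# The stored value can be updated later, even with a value of a different type.
count = "ten"
print(type(count))    # <class 'str'>

# Referencing a name before it has been assigned raises a NameError.
try:
    print(total)
except NameError as err:
    print("Error:", err)
```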

Naming Rules for Variables
Like many things in Python, variables come under strict naming conventions:
• A variable name must start with an underscore (_) or a letter (A to Z or a to z)
• The other characters in the name may be underscores, letters or numbers
• Variable names are case sensitive. For example, myname is a different variable from MyName
• Variable names can be any length within reason
• Reserved keywords, such as if, for and while, cannot be used as variable names
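The naming rules can be checked from Python itself. This is a sketch with a hypothetical helper, is_valid_name(), built on the standard str.isidentifier() method and the standard keyword module (whose kwlist holds the reserved keywords). Note that isidentifier() also accepts non-ASCII letters, which Python 3 allows in names.

```python
import keyword

# str.isidentifier() checks the naming rules (letters, digits, underscores,
# not starting with a digit); keyword.iskeyword() rejects reserved words
def is_valid_name(name):
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_name("my_name"))   # True
print(is_valid_name("_count2"))   # True
print(is_valid_name("2count"))    # False: starts with a digit
print(is_valid_name("for"))       # False: reserved keyword
print(keyword.kwlist)             # the full list of reserved keywords
```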

Basic Operators and Assignment Operators
The control flow of a program is the order in which the code is executed, and it is regulated through loops, conditional statements and function calls. We are going to look first at Boolean and comparison operators, followed by the if statement and its variations.
Booleans and Comparison Operators
A Boolean value can be True or False, nothing else. Booleans are used to control program flow and to make comparisons. They represent the truth values of mathematical logic. Booleans were named after the mathematician George Boole, which is why the word always starts with a capital B. By the same token, the two values, True and False, also start with capital letters, because in Python they are special built-in values (writing true or false in lowercase causes an error).
We are going to look at how Booleans work, including comparison operators and logical operators. First, we look at the comparison operators.
Comparison Operators
In computer programming, we use comparison operators to compare two values. The comparison evaluates to a single Boolean value, True or False. These are the comparison operators:
Operator    Description
==          is equal to
!=          is not equal to
<           is less than
>           is greater than
<=          is less than or equal to
>=          is greater than or equal to
To better understand the way they work, let's look at an example where we assign integer values to two variables:
x = 7
y = 9
In this example, x has been assigned a value of 7, so it is less than y, which has been assigned a value of 9.
Using these variables and their values, we can take a closer look at the comparison operators. We are going to write a program that checks whether each operator evaluates to True or to False and then prints the result. To make it easier to follow, each line also prints a string showing which comparison is being made.
x = 7
y = 9
print("x == y:", x == y)
print("x != y:", x != y)
print("x < y:", x < y)
print("x > y:", x > y)
print("x <= y:", x <= y)
print("x >= y:", x >= y)
The output would be:
x == y: False
x != y: True
x < y: True
x > y: False
x <= y: True
x >= y: False
If we follow the logic of mathematics, we can see that Python evaluated each of these expressions as:
Is 7 (x) equal to 9 (y)? False
Is 7 not equal to 9? True
Is 7 less than 9? True
Is 7 greater than 9? False
Is 7 less than or equal to 9? True
Is 7 greater than or equal to 9? False
We used integers for this example, but we could just as easily have used floats. We can also use strings with comparison operators, but remember that string comparisons are case sensitive. Look at a practical example of how strings are compared:
Sally = "Sally"
sally = "sally"
print("Sally == sally:", Sally == sally)
The output would be:
Sally == sally: False
The string "Sally" is not the same as the string "sally" because one begins with a capital letter and the other doesn't. If we add another variable that is also assigned the value "Sally", the two will evaluate as equal:
Sally = "Sally"
sally = "sally"
also_Sally = "Sally"
print("Sally == sally:", Sally == sally)
print("Sally == also_Sally:", Sally == also_Sally)
The output would be:
Sally == sally: False
Sally == also_Sally: True
We can also use the other comparison operators, such as < and >, to compare strings, and we can use comparison operators to evaluate Booleans:
t = True
f = False
print("t != f:", t != f)
Output:
t != f: True
This confirms that True does not equal False.
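Comparing strings with < and > can be surprising at first. The sketch below, with made-up words, relies on the fact that Python compares strings character by character using each character's Unicode code point (ord() returns that number). This is why alphabetical order works for words in the same case, while every uppercase letter sorts before every lowercase one.

```python
# Alphabetical order works for words written in the same case
print("apple" < "banana")   # True
print("pear" > "peach")     # True: 'r' comes after 'c'

# Uppercase letters have smaller code points than lowercase ones
print(ord("S"), ord("s"))   # 83 115
print("Sally" < "sally")    # True
```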
Note that there is a difference between the = and == operators:
x = y   # Sets x as equal to y
x == y  # Evaluates whether x is equal to y
The first one, =, is called an assignment operator. It sets a variable to a value. The second one, ==, is a comparison operator, and it evaluates whether two values are equal.
Logical Operators
We can make use of three different logical operators to combine or negate Boolean values. Each one evaluates an expression down to True or False. Here is what those operators are and what they do:
Operator    Description
and         Evaluates True if both values are True
or          Evaluates True if at least one value is True
not         Evaluates True only if the value is False (it reverses the value)
We use logical operators to check whether two or more expressions are true. For example, we could use a logical operator to check that a grade is a passing grade and that the student is registered in the course; only if both are True is the student assigned the grade. Another example would be checking whether a user is an active, valid customer of an online store, based on whether they have made any purchases within the last 3 months or have been extended store credit.
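The operator table above can be generated rather than memorized. This is a small sketch (the rows list is just an illustrative name) that loops over every combination of True and False and records what and, or and not produce:

```python
# Record and print the result of and, or and not for every combination
rows = []
for a in (True, False):
    for b in (True, False):
        rows.append((a, b, a and b, a or b, not a))
        print(a, b, "| and:", a and b, "| or:", a or b, "| not a:", not a)
```

Only the first row (both True) gives True for and, and only the last row (both False) gives False for or.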
To better understand logical operators, look at the following expressions:
print((9 > 7) and (1 < 3))   # Both of the original expressions evaluate to True
print((7 == 7) or (4 != 4))  # One of the original expressions evaluates to True
print(not (4 <= 2))          # The original expression evaluates to False
The output would be:
True
True
True
Let's break this down:
In the first expression, print((9 > 7) and (1 < 3)), both 9 > 7 and 1 < 3 had to evaluate as True because we used the and operator, and both statements are indeed true.
In the second expression, print((7 == 7) or (4 != 4)), 7 == 7 evaluates to True, so it doesn't matter that 4 != 4 evaluates to False. We used the or operator, so only one of them had to evaluate True. If the and operator had been used instead, the result would have been False.
In the third expression, print(not (4 <= 2)), the not operator negates the False value that the inner expression returns, so the output is True.
Now let's look at some expressions where floats are used instead of integers; this time, we want each one to evaluate to False.
print((-0.1 > 1.5) and (0.7 < 4.1))   # One of the original expressions evaluates to False
print((6.5 == 7.9) or (8.2 != 8.2))   # Both of the original expressions evaluate to False
print(not (-4.7 <= 0.2))              # The original expression evaluates to True
The output would be:
False
False
False
In this example:
For the and operator to evaluate to False, at least one of its expressions must evaluate to False.
For the or operator to evaluate to False, both of its expressions must evaluate to False.
For the not operator to evaluate to False, its inner expression must be True.
Compound statements may also be written with the and, or and not operators:
not ((-0.1 > 1.5) and ((0.7 < 4.1) or (0.2 == 0.2)))
First, let's take a look at the innermost expression, (0.7 < 4.1) or (0.2 == 0.2). This evaluates as True because both statements are True in mathematical terms (with or, only one of them needed to be).
Then, we take the True value that was returned and place it in the next expression out, (-0.1 > 1.5) and (True). This evaluates as False because the first statement is False, and False and True always returns False.
Lastly, the final expression, not (False), evaluates as True, so if we were to print all this out, the output would be:
True
Using Boolean Operators for Controlling Flow
To control how a program flows and what the outcome will be, we use flow control statements, which are made up of conditions and clauses.
A condition evaluates to True or False, and it marks the point in the program where a decision is made.
The clause is the block of code that comes after the condition, and it determines what the program does as a result. To clear it up, if you had a construction of "if y is True, then do this", the clause is the "do this" part. The example below shows the control flow of a program, with a comparison operator working together with a conditional statement.
grade = 75
if grade >= 70:               # Condition
    print("Passing grade")    # Clause
else:
    print("Failing grade")
The program evaluates a student's grade and checks whether it is a passing or a failing grade. If a student has a grade of 75, the condition evaluates as True and the "Passing grade" print statement is triggered. If a student has a grade of 69, the condition evaluates as False and the "Failing grade" print statement is executed.
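To check several students' grades without rewriting the if statement each time, the same condition and clauses can be wrapped in a function. This is a sketch: grade_result() is a hypothetical name, and the passing mark of 70 is taken from the example above. Returning the message instead of printing it makes the result easy to reuse.

```python
def grade_result(grade):
    # The condition grade >= 70 decides which clause runs
    if grade >= 70:
        return "Passing grade"
    else:
        return "Failing grade"

for grade in [75, 69, 70]:
    print(grade, grade_result(grade))
```

Note that a grade of exactly 70 passes, because >= includes the boundary value.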
Chapter 6: Data Types in Python
Having covered the basic operations that can be done in Python, we now move on to a discussion of data types. Computer programming languages have several different methods of storing and interacting with data, and these different methods of representation are the data types you'll interact with. The primary data types in Python are integers, floats and strings. Values of these types can be grouped together in data structures such as lists, tuples and dictionaries. We'll get into data structures after we cover data types.
Integers
Integers in Python are no different from what you were taught in math class: whole numbers, with no decimal point or fractional part. Numbers like 4, 9, 39, -5 and 1215 are all integers. Integers can be stored in variables just by using the assignment operator, as we have seen before.
Floats
Floats are numbers that possess decimal parts. This makes numbers like
-2.049, 12.78, 15.1, and 1.01 floats. The method of creating a float instance
in Python is the same as declaring an integer: just choose a name for the
variable and then use the assignment operator.
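A short sketch with made-up variable names showing that integers and floats are created the same way, and that the built-in type() function reports which one Python chose. When an integer and a float are mixed in arithmetic, the result is a float.

```python
apples = 4              # an integer: a whole number
temperature = -2.049    # a float: it has a decimal part

print(type(apples))       # <class 'int'>
print(type(temperature))  # <class 'float'>

# Mixing an int and a float in arithmetic gives a float
print(apples + 0.5)       # 4.5
```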
String
While we've mainly dealt with numbers so far, Python can also interpret and manipulate text data. Text data is referred to as a "string", and you can think of it as the letters that are strung together in a word or series of words. To create a string in Python, you can use either double quotes or single quotes.
string_1 = "This is a string."
string_2 = 'This is also a string.'
However, while either double or single quotes can be used, it is often recommended that you stick to double quotes where possible. This is because there may be times when you need to nest quotes within quotes, and placing single quotes (or apostrophes) inside double quotes is the usual convention.
Something to keep in mind when using strings is that numerical characters
surrounded by quotes are treated as a string and not as a number.
# The 97 here is a string
Stringy = "97"
# Here it is a number
Numerical = 97
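The difference matters as soon as you do something with the values. In this sketch (lowercase variable names, chosen for illustration), the + operator joins strings end to end but adds numbers, and a string never compares equal to a number even when they look alike.

```python
stringy = "97"
numerical = 97

# + joins strings together but adds numbers
print(stringy + "3")          # 973
print(numerical + 3)          # 100

# A string and a number are never equal, even if they look alike
print(stringy == numerical)   # False
```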
Type Casting in Python
The term "type casting" refers to converting data from one type to another. As you program, you will often find that you need to convert data between types. Python has three helpful built-in functions that allow quick and easy conversion between data types: int(), float() and str().
Each of these functions converts the value placed within its parentheses to the data type named by the function. This means that to convert a float into an integer, you would write the following:
int(3.9324)
Because integers are whole numbers, anything after the decimal point in a float is dropped when it is converted into an integer (for example, 3.9324 becomes 3 and 4.12 becomes 4). Note that you cannot convert a non-numerical string into an integer, so typing int("convert this") would raise an error.
The float() function can convert integers or certain strings into floats. Providing either an integer or an integer in quotes (a string representation of an integer) will convert the provided value into a float: both 5 and "5" become 5.0.
Finally, the str() function converts integers and floats to strings. Plug any numerical value into the parentheses and get back a string representation of it.
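The conversions described above can be tried directly. In this sketch, the extra value -3.9 is an addition for illustration: it shows that int() drops the decimal part by truncating toward zero rather than rounding down. The failing conversion is wrapped in try/except so the error message is printed instead of stopping the program; the error Python raises here is a ValueError.

```python
print(int(3.9324))   # 3  (the decimal part is dropped)
print(int(4.12))     # 4
print(int(-3.9))     # -3 (int() truncates toward zero)
print(float(5))      # 5.0
print(float("5"))    # 5.0
print(str(12.78))    # 12.78, but now as a string

# A non-numerical string cannot be converted and raises a ValueError
try:
    int("convert this")
except ValueError as error:
    print("Error:", error)
```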
List
Lists are just collections of data. When you think about a list in regular life,
you often think of a grocery list or to-do list. These lists are just collections
of items, and that’s precisely what lists in Python are; collections of items.
Lists are convenient because they offer quick and easy storage and retrieval
of items.
Let's say we have a bunch of values that we need to access in our program. We could declare separate variables for all those values, or we could store them all in a single variable as a list. Declaring a list is as simple as using square brackets and separating the items in the list with commas. So, if we wanted to declare a list of fruits, we could do that by doing the following:
fruits = ["apple", "pear", "orange", "banana"]
It's also possible to declare an empty list by just using empty brackets. You can add items to the list with the append() method. We can access the items in the list individually by specifying the position of the item that we want. Remember, Python is zero-based, so the first item in the list is at position 0. How do we select a value from a list? We just declare a variable that references the list and the position of the value:
apple = fruits[0]
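A minimal sketch of the list operations just described: indexing from position 0, starting from an empty list, and adding items with append(). The groceries list is a made-up example, and len() (a built-in function) returns the number of items.

```python
fruits = ["apple", "pear", "orange", "banana"]
print(fruits[0])    # apple  (positions start at 0)
print(fruits[3])    # banana

groceries = []              # an empty list
groceries.append("milk")    # append() adds an item to the end
groceries.append("bread")
print(groceries)            # ['milk', 'bread']
print(len(groceries))       # 2
```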
Tuple
Tuples are very similar to lists, but unlike lists, their contents cannot be
modified once they are created. The items that exist in the tuple when
created will exist for as long as the tuple exists. If it’s unclear as to when
tuples would be useful, they would be helpful whenever you have a list of
items that will never change. For example, consider the days of the week. A
list containing all the days of the week won’t change. In practice, you are
likely to use tuples far less often than you will use lists, but it’s good to be
aware of the existence of tuples.
Functionally, tuples are declared and accessed very similarly to lists. The major difference is that when a tuple is created, parentheses are used instead of square brackets.
this_is_a_tuple = ("these", "are", "values", "in", "a", "tuple")
The items can be accessed with square brackets, just like a list.
word = this_is_a_tuple[0]
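Using the days-of-the-week example from above, this sketch shows a tuple being read like a list and then refusing a change. Trying to assign to a position raises a TypeError, which is caught here so the message can be printed.

```python
days = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")
print(days[0])      # Monday
print(len(days))    # 7

# Tuples cannot be modified after they are created
try:
    days[0] = "Funday"
except TypeError as error:
    print("Error:", error)
```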
Dictionary
Dictionaries hold data that can be retrieved with reference items, or keys.
Dictionaries can be confusing for first-time programmers but try to imagine
a bank filled with a number of safety deposit boxes. There are rows and
rows of these boxes, and the contents of each box can only be accessed
when the correct key is provided. Much like opening a deposit box, the
correct key must be provided to retrieve the value within the dictionary. In
other words, dictionaries contain pairs of keys and the value that can be
accessed with those keys.
When you declare a dictionary, you must provide both the data and the key that will point to that data. Each key must be unique. Evidently, it would be a problem if one key could open multiple boxes, so keys in a dictionary cannot be repeated; you cannot have two keys both named "key1".
The syntax for creating a dictionary in Python is curly braces containing each key on the left side and its value on the right side, separated by a colon. To demonstrate, here's an example of a dictionary:
Dict_example = {"key1": 39}
If you want to create a dictionary with multiple items, all you need to do is separate the items with commas.
Dict_example2 = {"key1": 39, "key2": 21, "key3": 54}
Dictionaries can also be created with the dict() function. You could create the same dictionary as above by passing the keys and their values using the assignment operator, still separating them with commas.
Dict_example3 = dict(key1=39, key2=21, key3=54)
Note that this method uses parentheses instead of curly braces and doesn't put quotes around the keys.
To access items within the dictionary, you need to supply the appropriate key. The syntax for this in Python is dictionary["key"], so in order to get 39 from the dictionary above, you would use this syntax:
number = Dict_example3["key1"]
Since the syntax above selects the value associated with the passed key, you might be able to guess that we can overwrite the data by selecting the value we want and using an assignment operator.
Dict_example3["key1"] = 99
Much like how it is possible to create an empty list with just an empty pair of square brackets, we can also create an empty dictionary by using an empty pair of curly braces when we declare the dictionary.
Dict_example4 = {}
To add data to a dictionary, all we need to do is create a new dictionary
entry and assign a value to it.
Dict_example4["key1"] = 109
To drop values from the dictionary, we use the del command followed by
the dictionary and the key we want to drop.
del Dict_example4["key1"]
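The dictionary operations above fit together in one short sketch with made-up names and ages. It also uses two standard tools not covered in the text: the in operator, which checks whether a key exists, and the get() method, which returns a default value instead of raising a KeyError when the key is missing.

```python
ages = {"Ann": 31, "Ben": 25}

print(ages["Ann"])        # 31
ages["Ben"] = 26          # overwrite an existing value
ages["Cara"] = 40         # add a new key-value pair
del ages["Ann"]           # remove a key and its value
print(ages)               # {'Ben': 26, 'Cara': 40}

# Checking for a key first avoids a KeyError for missing keys
print("Ann" in ages)                  # False
print(ages.get("Ann", "not found"))   # not found

# dict() with keyword arguments builds the same kind of dictionary
print(dict(key1=39, key2=21) == {"key1": 39, "key2": 21})   # True
```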
Chapter 7: Making your program Interactive
Input()
When writing your program or creating an application, you may require users to enter input such as their username and other details. Python provides the input() function, which helps you get and process input from users. Other than entering input, you may require users to perform an action before they can go to the next step. For example, you may need them to press the Enter key on the keyboard to be taken to the next step.
Example:
input("\n\nPress Enter key to Leave.")
Just type the above statement into the interactive Python interpreter, then hit the Enter key on the keyboard. You will then be prompted to press the Enter key: the program waits for an action from the user before it proceeds to the next step.
Notice the use of \n\n. The \n characters create a new line: a single \n creates one, so in this case two blank lines are created before the message. That is how the Python input() function works.
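One detail that trips up beginners is that input() always returns a string, even when the user types digits. This sketch (ask_age() is a hypothetical function) converts the answer with int() before doing arithmetic. The read parameter is an assumption added only so the function can be tried without a keyboard; by default it is the built-in input().

```python
def ask_age(read=input):
    # read defaults to the built-in input(), which shows the prompt and
    # waits until the user types something and presses Enter
    answer = read("How old are you? ")
    # input() always returns a string, so convert it before doing arithmetic
    age = int(answer)
    print("Next year you will be", age + 1)
    return age + 1
```

Calling ask_age() and typing 30 prints "Next year you will be 31". Typing something that is not a whole number would raise a ValueError from int().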
Print()
Python comes with many built-in functions. A good example of such a function is the print() function, which we use for displaying content on the screen. In addition to built-in functions, we can create our own functions in Python. Such functions are referred to as "user-defined functions".
#!/usr/bin/python3
# Define a user-defined function; its body must be indented
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)

# A function only runs when it is called
functionExample()
Triple Quotes
Before we move into triple quotes, keep in mind that you can also create a string like 'this'.
>>> 'this'
'this'
We can create the string without doing anything with it; the interpreter will simply tell you what it is. Next, enter the following into the interpreter (type three single quotes before and after the text). Be careful: mixing a double quote with single quotes here will give an error.
>>> '''line 1
... line 2'''
'line 1\nline 2'
Notice the \n for a new line. Try it with print in front of it and it will put the string on two separate lines.
>>> print('''line 1
... line 2''')
line 1
line 2
If you put \n inside a string, the text that follows it is displayed on a new line.
>>> print('I\ngo')
I
go
Now try a raw string, which keeps backslashes exactly as typed. Try it at the interpreter.
>>> r'string'
'string'
>>> r'string\string2'
'string\\string2'
>>> r"string\string2"
'string\\string2'
>>> print(r"string\string2")
string\string2
>>> r"""string\string2"""
'string\\string2'
>>> print(r"""string\string2""")
string\string2
The last example shows that you can test your expressions at the interpreter as your scripts grow in complexity. The string examples above will become clearer as you progress.
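The interpreter session above can be summed up in a short script. This sketch uses len() to make the difference visible: in a normal string, \n is a single newline character, while in a raw string the backslash is kept, so r"\n" is two characters.

```python
# Triple quotes let a string span several lines; the line break
# is stored as the newline escape sequence \n
text = '''line 1
line 2'''
print(text == "line 1\nline 2")   # True

# In a normal string, \n is ONE character (a newline)
print(len("\n"))     # 1
# In a raw string, the backslash is kept, so r"\n" is TWO characters
print(len(r"\n"))    # 2
print(r"string\string2" == "string\\string2")   # True
```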
Escape characters
Have a look at the following code:

print("\tHi there")
output:
Hi there #tabbed to the right
So what is with the \t?
The backslash (\) character is used to escape characters that need to be interpreted differently by Python. Sounds a bit of a mouthful, right!
Have another look at the output in the example above. Notice how the text (Hi there) is tabbed to the right. Inserting the escape character \t at the beginning of the string results in the string being tabbed to the right.
Adding two escape characters \t would result in the string being tabbed to
the right twice:

print ( "\t\tHi there")


output:
Hi there # tabbed to the right twice

\n is another popular escape character. \n adds a new line to the string.

print("I'm going for a walk\nin the park\nbecause it is a lovely\n\tday")

output:
I'm going for a walk
in the park              #adds new lines
because it is a lovely
        day              #adds new line & a tab to the right

Note that there is no space between the escape character and the text that follows it, as in \thiya.
Here are some of the most regularly used escape characters in Python.
Escape character    Description
\n                  New line
\t                  Horizontal tab
\\                  Backslash
\'                  Single quote
\"                  Double quote
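Although each escape sequence in the table is typed as two characters, it stores just one. This sketch (sequences is an illustrative name) prints each one with the built-in repr(), which shows the escape sequence instead of its effect, together with its length.

```python
sequences = ["\n", "\t", "\\", "\'", "\""]
for sequence in sequences:
    # repr() shows the escape sequence instead of its effect
    print(repr(sequence), "has length", len(sequence))
```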
Let's have a look at some more escape characters.
The following sentences would result in an error when printed:

print("I said "hello mate" and he totally ignored me")


print('He said he'd be there at 2pm')

In the first example, Python thinks that the double quote before hello is the end of the string, so the rest of the line causes a syntax error. Likewise, in the second sentence Python takes the apostrophe in the word he'd to mean the end of the string and throws an error when it reaches the rest of the line. One solution is to wrap the string in single quotes when you intend to use double quotes inside it:

print('I said "hello mate" and he totally ignored me')

And wrap the string in double quotes if you intend to use a lot of apostrophes, as in he'd, there's etc.:

print("He said he'd be there at 2pm but there's no sign of him")
Alternatively, you can use the escape character \

print ( "I said \"hello mate\" and he totally ignored me")


print ('He said he\'d be there at 2pm but there\'s no sign of him')
output:
I said "hello mate" and he totally ignored me
He said he'd be there at 2pm but there's no sign of him

Now what if you just want to print a backslash \ in Python? Yes, you must escape it too.

print ("\") #will result in an error


print ("\\") #use a backslash to escape a backslash
output:
\

So how does this work in real life? Have a look at a snippet from a sample
food menu:
“Available drinks include tea\coffee\water”
To print this, we need to escape each backslash with another backslash:

print ("Available drinks include tea\\coffee\\water")


output:
Available drinks include tea\coffee\water
Chapter 8: Making Choices and Decisions
Conditional Statements
In the code samples we have used until now, you may have noticed that the code follows a pattern of execution that is religiously adhered to: it runs from top to bottom. Did you know it is possible to alter this order of execution? Say, for instance, you want the program to make decisions on its own, performing different actions depending on the situation that comes up, like printing "Good Morning" or "Good Night" depending on the time of day.
This is possible in Python and can be achieved with control flow statements. There are three of these in Python, namely the if, while and for statements. Let's discuss each one of them briefly:
If Statements
The if statement controls whether the block of code that follows it is executed. In Python, that block is marked by indentation rather than being enclosed in braces. The if statement evaluates the expression that follows it. If the expression evaluates to True, the block is executed; if not, the block is skipped. This allows your Python program to make decisions on its own based on a range of factors.
Syntax:
if expression:
    # code to run if the expression evaluates to True
Sample:
The following code displays "x is bigger than y" if x is greater than y:
x = 5
y = 2
if x > y:
    print("x is bigger than y")
Elif
The elif statement is used alongside the if...else statement when you want to choose between several conditions. As the name suggests, elif (short for "else if") is a mixture of the if and else statements. Like else, elif extends an if statement to run other code when the main if expression evaluates to False. Unlike else, however, elif runs its code only when its own condition evaluates to True. So, put simply, whenever you want to run one of several blocks of code depending on which condition is true, use elif.
Syntax:
if condition:
    code to be run if the condition evaluates to True
elif other_condition:
    code to be run if the other condition evaluates to True
else:
    code to be run if none of the conditions evaluate to True
Sample:
The sample shown below prints "Good morning. Rise and shine!" if the period of the day is morning, and "Good night! Sleep well." when it is night. Otherwise, it prints "Have a great day!"
period = "Morning"
if period == "Morning":
    print("Good morning. Rise and shine!")
elif period == "Night":
    print("Good night! Sleep well.")
else:
    print("Have a great day!")
When executed, the result shown below will be output:
Good morning. Rise and shine!
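To choose the greeting from the actual time of day, the if/elif/else chain can test an hour instead of a fixed word. This is a sketch under stated assumptions: greeting() is a hypothetical function, hour is on a 24-hour clock (0 to 23), and the cut-off hours (morning from 5 to 11, night from 21 to 4) are arbitrary choices.

```python
def greeting(hour):
    # hour is assumed to be 0-23 (24-hour clock)
    if 5 <= hour < 12:
        return "Good morning. Rise and shine!"
    elif hour >= 21 or hour < 5:
        return "Good night! Sleep well."
    else:
        return "Have a great day!"

print(greeting(8))    # Good morning. Rise and shine!
print(greeting(23))   # Good night! Sleep well.
print(greeting(15))   # Have a great day!
```

In a real program, the current hour could come from the standard datetime module, for example greeting(datetime.now().hour) after from datetime import datetime.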
While Loop
This type of loop runs a specific block of code for as long as the given condition remains True. Once the condition is no longer true, the loop stops and the program continues with the code after the loop.
This is a very useful feature, because many programs need to repeat the same steps over and over. To give you an idea, suppose the user has to guess a number and has three tries. You want the program to ask the user to guess the number. When the user guesses wrong, the program reduces the number of remaining tries from three to two, informs the user that the number is wrong and then asks them to guess again. This continues until either the user guesses the right number or all the guesses are used up without the user identifying the number.
Imagine just how many times you would have to write the same code over and over again. Thanks to Python, we write it just once underneath the 'while' loop, and the loop takes care of the repetition.
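As a preview, here is a minimal sketch of the guessing game described above. Assumptions: guess_number() is a hypothetical function, and the guesses are passed in as a list so the loop can be tried without typing; a real game would read each guess with int(input(...)). The while condition keeps looping only while tries remain and there are guesses left to check.

```python
def guess_number(secret, guesses, max_tries=3):
    tries_left = max_tries
    index = 0
    # Keep looping while tries remain and there are guesses left to check
    while tries_left > 0 and index < len(guesses):
        guess = guesses[index]
        index += 1
        if guess == secret:
            print("You guessed it!")
            return True
        tries_left -= 1
        print("Wrong number.", tries_left, "tries left.")
    print("Out of tries.")
    return False
```

For example, guess_number(7, [3, 7]) reports one wrong guess and then succeeds, while guess_number(7, [1, 2, 3, 7]) stops after three wrong guesses and never reaches the 7.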
Here's what the syntax for the 'while' loop looks like:
while condition:
    code
    code

You begin by typing the word 'while' followed by the condition. We then add a colon, just like we did for the 'if' statement. Everything that follows is indented to show that it belongs to the loop.
Let us create a simple example from this. We start by creating a variable. Let's give this variable a name and a value like so:
x = 0
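The example stops after creating x. A minimal sketch of how it might continue, assuming we want to count from 0 up to 4 (the limit of 5 and the counted list are illustrative choices): the condition is checked before every pass, and x = x + 1 is what eventually makes it False. Without that line, the loop would never end.

```python
x = 0
counted = []
while x < 5:
    counted.append(x)
    print(x)
    x = x + 1          # move x closer to the stopping point
print("Loop finished, x is now", x)
```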
For Loop
In Python, the for...in statement is a looping statement that lets you iterate over a sequence of objects. That is, it is used to go through every item that makes up a sequence. Take note that a sequence refers to an ordered set of items. Let's consider a code sample; save the file with the name "for.py":
for x in range(1, 7):
    print(x)
else:
    print('The for loop is complete')
Output:
$ python for.py
1
2
3
4
5
6
The for loop is complete
How the for statement Works:
In the code sample above, we print out a sequence of numbers. This sequence is generated with the help of the built-in range function. We give range two numbers, and it returns a sequence of numbers starting from the first number up to the second one. For instance, range(1, 7) produces the sequence (1, 2, 3, 4, 5, 6). By default, range uses a step of 1. If we add a third number to range, it takes the place of the default step. For instance, range(1, 7, 2) produces the sequence [1, 3, 5]. Take note that the range goes up to the second number but does not include it; the second number serves as a boundary the range never reaches or exceeds. Keep in mind that range() generates only one number at a time. So, if you need the full set of numbers at once, call list() on the result of range(). For instance:
list(range(7)) results in the list [0, 1, 2, 3, 4, 5, 6].
Moving on, the for loop iterates over the range: for x in range(1, 7) is the same as for x in [1, 2, 3, 4, 5, 6]. This is like assigning each number in the sequence to x, one at a time, and then running the block of code for every value of x. In this case, the block simply prints each value. Recall that the else block is optional. When it is present, it is executed once after the for loop has finished all its iterations, but it is skipped if the loop is exited with a break statement. Also, recall that for...in loops work on all sequences. Here, the sequence of numbers is produced by the range function, but it is still possible to use any other sequence containing any type of object.
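A short sketch of the three points above, using made-up values: range() with a step, a for loop over a list of strings instead of numbers, and an else block that is skipped because the loop ends with break.

```python
# range() with a step of 2 (start, stop, step); the stop value is excluded
print(list(range(1, 7, 2)))   # [1, 3, 5]

# for works on any sequence, not only numbers
for fruit in ["apple", "pear", "orange"]:
    print(fruit)

# The else block runs only when the loop was not stopped by break
for number in [4, 8, 15]:
    if number > 10:
        print("Found a number bigger than 10:", number)
        break
else:
    print("No number bigger than 10")
```

If the last list held only 4 and 8, break would never run and the else block would print "No number bigger than 10" instead.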
Break
The break statement in Python is used to break out of a loop. That is, it stops the loop from running, even when the loop condition is still True or the sequence of objects has not been completely iterated over. A point worth noting is that when you break out of a while or for loop, the loop's else block, if it has one, is not executed.
Let's consider a code sample. Save the file with the name "break.py":
while True:
    m = input('Enter something: ')
    if m == 'quit':
        break
    print('Length of the string is', len(m))
print('Completed')
When the code is executed, the result is as follows:
$ python break.py
Enter something: Python is easy to learn
Length of the string is 23
Enter something: When my work is over
Length of the string is 20
Enter something: You could make your work fun:
Length of the string is 29
Enter something: Hello, World!
Length of the string is 13
Enter something: quit
Completed
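break works the same way inside a for loop. In this sketch with made-up temperature readings, the loop stops at the first value below zero, so the readings after it are never examined.

```python
temperatures = [12, 9, 3, -1, 5, -4]
first_freezing = None
for temperature in temperatures:
    if temperature < 0:
        first_freezing = temperature
        break          # stop the loop early; 5 and -4 are never examined
    print("Checked", temperature)
print("First temperature below zero:", first_freezing)
```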
Continue
In Python, the continue statement tells the program to skip the remaining statements in the current loop block and continue with the next iteration of the loop. Let's consider a sample of the continue statement in use. Save the file as continue.py.
while True:
    j = input('Write something: ')
    if j == 'quit':
        break
    if len(j) < 5:
        print('Entry is too small')
        continue
    print('Entry is of sufficient length')
    # Process other types of things here...
When the code sample above is executed, the result is as follows:
$ python continue.py
Write something: x
Entry is too small
Write something: 515
Entry is too small
Write something: vwxyz
Entry is of sufficient length
Write something: quit
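The same length check can be run over a fixed list of entries with a for loop, which makes the effect of continue easy to see. In this sketch (entries and accepted are illustrative names), short entries jump straight to the next iteration, so the append() line only runs for entries of five characters or more.

```python
entries = ["x", "515", "vwxyz", "hello world"]
accepted = []
for entry in entries:
    if len(entry) < 5:
        print(repr(entry), "is too small")
        continue          # skip the rest of this iteration
    accepted.append(entry)
print("Accepted:", accepted)
```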
Try & Except
A try...except block can be used to manage errors. However, you or your user can still do something to screw your solution up. For example:
>>> def div(dividend, divisor):
...     try:
...         print(dividend / divisor)
...     except:
...         print("Cannot Divide by Zero.")
...
>>> div(5, "a")
Cannot Divide by Zero.
>>> _
The message in the except block does not describe the error that the input actually caused: dividing a number by a string does not warrant a "Cannot Divide by Zero." message.
For this to work, you need to know more about how to use the except block properly. First of all, you can specify the error that it will capture and respond to by naming the exact exception. For example:
>>> def div(dividend, divisor):
...     try:
...         print(dividend / divisor)
...     except ZeroDivisionError:
...         print("Cannot Divide by Zero.")
...
>>> div(5, 0)
Cannot Divide by Zero.
>>> div(5, "a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in div
TypeError: unsupported operand type(s) for /: 'int' and 'str'
>>> _
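A try block can be followed by several except clauses, one for each error you expect. Each clause then gives a message that matches the actual problem. This sketch handles both ZeroDivisionError and TypeError. It is a variation on the div() function above that returns the result (or None on error) instead of printing it; that return value is a design choice made here so the result can be reused.

```python
def div(dividend, divisor):
    try:
        return dividend / divisor
    except ZeroDivisionError:
        print("Cannot divide by zero.")
    except TypeError:
        print("Both values must be numbers.")
    return None

print(div(10, 4))    # 2.5
div(5, 0)            # Cannot divide by zero.
div(5, "a")          # Both values must be numbers.
```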
Chapter 9: Functions and Models
Functions of the regression analysis
Trend Forecasting
Determine the strength of predictors
Predict an effect
Breaking down regression
There are two basic types of regression: linear and multiple regression,
although there are other methods for more complex data and analysis. Linear
regression uses one independent variable to help forecast the outcome of a
dependent variable. Multiple regression, on the other hand, uses two or
more independent variables to predict a result.
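To make the single-predictor case concrete, here is a minimal sketch that fits a straight line y = a + b*x by ordinary least squares, using only the standard library. The function name fit_line and the sample data are illustrative assumptions. In practice you would typically use a library such as NumPy or scikit-learn.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# made-up data that lies exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

On Python 3.10 and later, the standard library's statistics.linear_regression() performs the same single-predictor fit.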
Regression is very useful to financial and investment institutions. For
example, it can be used to predict the sales of a particular product or
company based on previous sales, GDP growth, and many other factors. The
capital asset pricing model (CAPM) is one of the most common regression
models applied in finance. The example below describes the formulas used
in linear and multiple regression.
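For reference, the two models are commonly written as follows. Y is the dependent variable and X (or X_1 … X_t) are the independent variables. a is the intercept, b (or b_1 … b_t) are the slopes, and u is the error term. This notation is a widely used convention rather than something taken from this book.

```latex
\begin{aligned}
\text{Linear regression:}\quad   & Y = a + bX + u \\
\text{Multiple regression:}\quad & Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_t X_t + u
\end{aligned}
```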

Choosing the best regression model


Selecting the right linear regression model can be hard and confusing, and
trying to model it with only a sample of data does not make it easier. This
section covers some of the most popular statistical methods for choosing a
model, the challenges you might come across, and some practical advice for
selecting the correct regression model.
The process usually begins with a researcher who wants to describe the
relationship between the response variable and the predictors. The
research team responsible for the investigation typically measures many
variables but includes only a few of them in the model. The analysts try to
eliminate the variables that are unrelated and keep the ones that have a
genuine relationship with the response. Along the way, the analysts
consider many possible models.
Statistical methods to use to find the best regression model
To build a good regression model, consider both the variables you
specifically want to test and any other variables that can affect the
response.
Adjusted R-squared and Predicted R-squared
Prefer models with higher adjusted and predicted R-squared values. These
statistics help avoid a key problem with plain R-squared, which never
decreases when you add a term to the model, even a useless one.
• Adjusted R-squared increases only when a new term improves the model more
than would be expected by chance.
• Predicted R-squared is a form of cross-validation that indicates how well
your model generalizes to other data sets.
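The function below is a rough sketch of how these statistics relate. It computes R-squared and adjusted R-squared from observed values, fitted values, and the number of predictors p, using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). Predicted R-squared requires refitting the model with each observation left out, so it is not shown. The function names and example numbers are assumptions for illustration.

```python
def r_squared(actual, fitted):
    """Share of the variance in `actual` explained by `fitted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, fitted, p):
    """R-squared penalized for the number of predictors p."""
    n = len(actual)
    r2 = r_squared(actual, fitted)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


actual = [3, 5, 7, 9, 11]
fitted = [3.2, 4.8, 7.1, 8.9, 11.0]
print(round(r_squared(actual, fitted), 4))
print(round(adjusted_r_squared(actual, fitted, p=1), 4))
```

Adjusted R-squared is always a little lower than R-squared here, and the gap grows as p increases relative to n.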
P-values for the Predictors
In regression, a low p-value indicates a statistically significant term.
The term “Reducing the model” refers to the process of starting with all
candidate predictors in the model and then repeatedly removing the least
significant term until only significant terms remain.
Stepwise regression
This is an automated technique that can identify important predictors
during the exploratory stages of building a model.
Real World Challenges
There are different statistical approaches for choosing the best model.
However, complications still exist.
• A model can only be as good as the variables measured by the study.
• The sample data could be unusual because of the data collection method,
and random sampling can produce false positives and false negatives.
• If you try enough models, some variables will appear significant even
though they are correlated only by chance.
• P-values can change depending on which other terms are included in the
model.
• Studies have found that best subsets regression and stepwise regression
often fail to select the correct model.
Finding the correct Regression Model
Theory
Review research done by other experts and incorporate it into your model.
Before you start a regression analysis, develop some ideas about which
variables are most significant. Building on other people’s results also
makes it easier to decide what data to collect.
Complexity
You may think that complex problems need a complex model. That is not
necessarily the case: studies show that even a simple model can provide
accurate predictions. When models have similar explanatory power, the
simplest one is likely to be the best choice. Start with a simple model and
increase its complexity only gradually, as needed.
How to calculate the accuracy of the predictive model
There are different ways in which you can compute the accuracy of your
model. Some of these methods include:
1. Divide the dataset into a training set and a test set. Build the model on
the training set, then use the test set as a holdout sample to evaluate the
trained model. Next, compare the predicted values with the actual values by
computing an error measure such as the Mean Absolute Percentage Error
(MAPE). As a rough rule of thumb, a MAPE below 10% suggests a good model,
although what counts as acceptable depends on the problem.
2. Another method is to build a confusion matrix and use it to compute the
false positive rate and false negative rate. These measures help you decide
whether to accept the model. Once you take the cost of each type of error
into account, they become a critical part of the decision to accept or
reject the model.
3. Computing the Receiver Operating Characteristic (ROC) curve, the lift
chart, or the area under the ROC curve (AUC) are other ways to decide
whether to accept or reject a model.
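The sketch below shows how the first two checks might be computed by hand. mape() compares actual and predicted values from a holdout set. error_rates() derives the false positive and false negative rates from binary labels. The function names, the 0/1 label encoding, and the sample numbers are assumptions. Libraries such as scikit-learn provide tested versions of these metrics.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate) for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:  # a == 1 and p == 0
            fn += 1
    fpr = fp / (fp + tn)  # share of actual negatives flagged as positive
    fnr = fn / (fn + tp)  # share of actual positives that were missed
    return fpr, fnr


# holdout values for a regression model
print(mape([100, 200, 400], [110, 190, 400]))
# labels for a classification model
print(error_rates([1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1]))
```

The four counts tp, fp, tn, and fn are exactly the cells of a 2×2 confusion matrix.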
Chapter 10: 
                     
How to Work with Files
The next thing we need to focus on when working with Python is knowing how
to work with and handle files. You may be working with some data that you
want to store so that you can pull it up and use it again when it is
needed. You have some choices in how you save the data, how it will be
found, and how it will behave in your code.
When you work with files, the data is saved on disk, so you can reuse it in
your code as often as you like. This section covers what you need to do to
make sure files behave the way they should.
Python lets you open a file in several modes, and each one gives you
different options. A good way to think about this is to compare it with
working on a document in Word. At some point, you save the document so that
it doesn’t get lost and you can find it again. Files in Python work in a
similar way, except that instead of saving pages as you would in Word, you
save the data your code produces.
There are a few operations you can perform when working with files,
including:
• Closing a file you are working on.
• Creating a brand new file to work on.
• Seeking, which moves the current read/write position within a file.
• Writing new data to a file that was created earlier.
Creating new files
The first task we are going to look at is creating a file, since it is hard
to do much else without one. To make a new file and write some data into
it, you open it in your code with Python’s built-in open() function and
choose the mode that matches what you want to do.
When it comes to creating files in Python, there are three modes we are
going to focus on here: write (w), append (a), and exclusive creation (x).
Write mode creates the file if it doesn’t exist and empties it if it does.
Append mode adds new data to the end of the file. Exclusive creation mode
creates a new file but fails with an error if the file already exists.
Write mode is the easiest of the three to work with. Be aware, though, that
opening an existing file in write mode erases its current contents. If you
want to keep what is there and add to it, use append mode instead.
The write() method takes a string and writes it to the file. Combined with
the right mode, it lets you add new information to a file or replace what
is already there. To see it in action, open your editor and run the
following code:
# file handling operations
# writing to a new file hello.txt
f = open('hello.txt', 'w', encoding='utf-8')
f.write("Hello Python Developers!\n")
f.write("Welcome to Python World\n")
f.flush()  # optional: close() also flushes the data to disk
f.close()
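Here is a small sketch showing how the three modes differ. It writes to a temporary directory so that it doesn’t touch your own files, and the file name notes.txt is only an example. Using a with block is a common alternative to calling close() yourself, because the file is closed automatically when the block ends.

```python
import os
import tempfile

folder = tempfile.mkdtemp()              # a fresh, empty directory
path = os.path.join(folder, "notes.txt")

with open(path, "w", encoding="utf-8") as f:   # 'w' creates or truncates
    f.write("first line\n")

with open(path, "a", encoding="utf-8") as f:   # 'a' adds to the end
    f.write("second line\n")

with open(path, encoding="utf-8") as f:        # default mode is 'r'
    contents = f.read()
print(contents)

try:
    with open(path, "x", encoding="utf-8") as f:   # 'x' refuses existing files
        f.write("never written\n")
except FileExistsError:
    print("notes.txt already exists, so mode 'x' failed")
```

Mode 'x' is useful when accidentally overwriting an existing file would be a mistake.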
Next, let’s discuss directories. When you pass open() a bare file name such
as 'hello.txt', Python creates the file in the current working directory.
You can store the file somewhere else by giving a full or relative path, or
by changing the working directory first. If you don’t, the file won’t end
up in the directory you want.
To find the file later, look in whichever directory was current when the
code ran. If you want the file to show up in a different directory, switch
to that directory (or give the full path) before the file is created. After
running the example above, open the file from the current directory (or the
directory you chose) and you will see the message you wrote.
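This minimal sketch shows how the working directory and explicit paths interact, using os.getcwd() and pathlib. The folder and file names are hypothetical, and a temporary directory stands in for a real project folder.

```python
import os
import tempfile
from pathlib import Path

print("Current directory:", os.getcwd())  # where bare file names end up

# build a full path instead of relying on the current directory
project = Path(tempfile.mkdtemp())        # hypothetical project folder
data_dir = project / "data"
data_dir.mkdir()
target = data_dir / "hello.txt"

with open(target, "w", encoding="utf-8") as f:
    f.write("Hello Python Developers!\n")

print("Saved to:", target)
print(target.exists())  # True
```

Because target is a full path, the file lands in data_dir no matter which directory the script is run from.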
This example is deliberately simple. As we go along, you will write much
more complicated code, and there will be times when you want to edit or
overwrite what is in a file. Python makes this possible with only a small
change to the code. Here is an example:
# file handling operations
# overwriting hello.txt with new content
f = open('hello.txt', 'w', encoding='utf-8')
f.write("Hello Python Developers!\n")
f.write("Welcome to Python World\n")
mylist = ["Apple\n", "Orange\n", "Banana\n"]
# writelines() writes every string in the list; it adds no newlines itself
f.writelines(mylist)
f.flush()
f.close()
Because this example opens hello.txt in write mode again, everything that
was in the file before is replaced by what this code writes. The new part
is writelines(), which writes every string in a list to the file. Unlike
print(), it does not add line breaks for you, so each item in the list ends
with "\n". You can use the same syntax to write whatever your program needs.
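One detail that often trips beginners is that writelines() writes the strings exactly as given. This sketch compares writing a list with and without explicit newlines. The file names are illustrative, and a temporary directory is used.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
fruits = ["Apple", "Orange", "Banana"]

# without newlines, the items run together on one line
glued = os.path.join(folder, "glued.txt")
with open(glued, "w", encoding="utf-8") as f:
    f.writelines(fruits)

# adding "\n" to each item puts one fruit per line
lines = os.path.join(folder, "lines.txt")
with open(lines, "w", encoding="utf-8") as f:
    f.writelines(fruit + "\n" for fruit in fruits)

with open(glued, encoding="utf-8") as f:
    print(repr(f.read()))         # 'AppleOrangeBanana'
with open(lines, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['Apple', 'Orange', 'Banana']
```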
What are binary files?
One other thing we need to cover before moving on is writing data to a
binary file. This may sound confusing, but Python makes it simple: you open
the file in binary mode and write bytes instead of text. Sound and image
files, for example, are stored as binary data.
In Python, any data can be written to a binary file as long as you first
convert it to bytes. You can do this with a bytes literal such as b"..." or
by encoding a string with str.encode(). The syntax looks like this:
# write binary data to a file
# open hello.dat in write-binary mode ('wb')
f = open('hello.dat', 'wb')
# binary mode needs bytes: use a b"..." literal or encode a string
f.write(b"I am writing data in binary file!\n")
f.write("Let's write another list\n".encode('utf-8'))
f.close()
Running this code creates the binary file hello.dat. Binary mode is what
you need whenever the data isn’t plain text, such as images, audio, or other
raw bytes. In this mode, Python writes the bytes exactly as given, without
any character encoding or newline translation.
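Here is a short sketch of the round trip between text and bytes. encode() turns a string into bytes for a 'wb' file, and decode() turns the bytes read with 'rb' back into a string. The sample text (which includes a non-ASCII character) and the temporary file are assumptions for the demonstration.

```python
import os
import tempfile

text = "Café menu\n"
raw = text.encode("utf-8")    # str -> bytes
print(raw)                    # b'Caf\xc3\xa9 menu\n'
print(len(text), len(raw))    # 10 11: 'é' needs two bytes in UTF-8

path = os.path.join(tempfile.mkdtemp(), "menu.dat")
with open(path, "wb") as f:
    f.write(raw)

with open(path, "rb") as f:
    data = f.read()           # bytes, exactly as written
print(data.decode("utf-8") == text)  # True
```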
Opening your file up
So far, we have written a new file and saved it, and we have worked with a
binary file as well. These examples cover the basics of creating files, so
that you can pull them up whenever you like.
Now it is time to learn how to open a file again so that you can read it
and use its contents as often as you need. To open a file and read it, use
the following syntax:
# read binary data from a file
# open hello.dat in read-binary mode ('rb')
with open('hello.dat', 'rb') as f:
    data = f.read()
    text = data.decode('utf-8')
    print(text)
If hello.dat was created by the previous example, the output is:
I am writing data in binary file!
Let's write another list
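Text files can also be read without binary mode. In this minimal sketch, the file name and contents are examples written to a temporary directory first. Iterating over the file object yields one line at a time, which avoids loading a large file into memory all at once.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello Python Developers!\nWelcome to Python World\n")

count = 0
with open(path, "r", encoding="utf-8") as f:
    for line in f:                 # one line per iteration, '\n' included
        count += 1
        print(count, line.rstrip("\n"))
```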
Seeking within a file
Finally, let’s look at seeking. We have already seen how to create files,
store data in them in different ways, and open them to read or rewrite
them. Sometimes, though, you don’t want to work through a file from the
very beginning. Every open file keeps track of a current position, which is
where the next read or write will happen, and the seek() method lets you
move that position. For example, if you have already read part of a file
and need to go back to the start, or you want to jump straight to a
particular byte, seek() is the tool to use.
The companion method tell() reports the current position. Calling
seek(offset) moves the position to offset, counted in bytes from the start
of the file. In binary mode, you can also pass a second argument to count
from the current position (1) or from the end of the file (2). In text
mode, it is safest to seek only to 0 or to a value previously returned by
tell().
Working through the different methods we have talked about will help you
do a lot of different things in your code. Whether you want to create a new
file, change its contents, or move around inside it, you can do it all
with the techniques we have gone through.
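Here is a minimal sketch of tell() and seek() on a small binary file, created in a temporary directory for the example. Binary mode is used because it allows arbitrary byte offsets, including offsets measured from the end with os.SEEK_END (which equals 2).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "letters.dat")
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJ")

with open(path, "rb") as f:
    first = f.read(3)          # reads b'ABC'
    print(f.tell())            # 3: position after three bytes
    f.seek(0)                  # jump back to the start
    again = f.read(2)          # b'AB'
    f.seek(-2, os.SEEK_END)    # two bytes before the end
    last = f.read()            # b'IJ'

print(first, again, last)
```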
Chapter 11: 
                     
Object Oriented Programming
Classes are what make Python an object-oriented language, and
object-oriented programming is one of the most effective approaches to
writing software. You write classes to represent real-life objects in your
programs. A class defines the general behavior of an object and holds the
object’s attributes, and you can add more traits along the way. Almost any
real-life object can be modeled with a class. Creating an object from a
class is called instantiation, and the objects you create are called
instances. You can also write classes that extend the functionality of
existing classes.
Object-oriented programming allows you to create different objects. You
will be able to see the world as a programmer does and create things that
exist in your imagination. You will think logically and write programs that
allow you to complete your tasks effectively and efficiently. Classes make
life easier as you move on to more complex tasks in your programming life.
Many of the newer programming languages are object oriented. These are
easier to deal with and can be used in a variety of ways. Python is one of
these object-oriented languages, and you will be able to look at the
objects and determine what they represent. So, if you have a ball in your
program, it should match up to a ball that you would find in real life.
This helps keep things in order, and even a beginner will be able to
recognize how the objects work inside the code.
With that being said, you will also need to look at the attributes in your
code. The attributes are what describe the object. An excellent way to
think about this is to pick an object, such as a box. The attributes are
the things you would use to describe it: in this case, brown, big, sturdy,
square, and so on. These should all make sense to others who look at the
box and want to describe it. For example, you would not give the box
attributes such as bouncing or flying, because those don’t usually describe
a box.
Classes also help you organize the objects you create. Objects that share
the same kind of data and behavior can all be created from the same class,
which makes them easier to find and work with later on. A class can model
anything you like, but it is usually best to keep the attributes and
methods inside a class closely related, so that the code makes sense and
stays easy to work with. You may have to think this through a little
before you get started, but you should be able to organize your objects
into the right classes so that the interpreter does the work the way you
would like.
Creating a Python Class
Python classes can be used to model anything, such as a bird, an animal, or
a human being. In the following example, I will be writing a superman
class. I will give it certain behaviors that you might have seen in a
superman cartoon or movie. You can study the code, understand it, and then
create your own object on its basis. Before I create the superman class, I
will create an eagle class to make things easier for you. Let’s jump into
the Python editor. Each instance created from the Eagle class will have a
name and an age. After that, I will give the Eagle class certain behaviors,
such as flying and attacking prey.
class Eagle():
    """A simple attempt to model an eagle."""
    def __init__(self, ename, eage):
        """Initialize the name and age attributes."""
        self.ename = ename
        self.eage = eage
    def fly(self):
        """Simulate the eagle flying in response to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """Simulate the eagle attacking prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
my_eagle = Eagle('Gamon', 5)
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
>>>
This is an Eagle class. The line my_eagle = Eagle('Gamon', 5) creates an
instance of the class, and the last two lines print its attributes. You can
create as many instances of a class as you want.
class Eagle():
    """A simple attempt to model an eagle."""
    def __init__(self, ename, eage):
        """Initialize the name and age attributes."""
        self.ename = ename
        self.eage = eage
    def fly(self):
        """Simulate the eagle flying in response to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """Simulate the eagle attacking prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
my_eagle = Eagle('Gamon', 5)
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle1 = Eagle('Timmy', 4)
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle2 = Eagle('Flyer', 5)
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
My eagle's name is Timmy.
The eagle is 4 years old.
My eagle's name is Flyer.
The eagle is 5 years old.
>>>
I have created three instances this time. That’s how you can create as many
instances of a class as you want. I have told Python to create three eagles
with different names and ages. When Python reads each of these lines, it
calls the __init__() method to set up the new object. The __init__() method
is a special method that Python runs automatically whenever you create a
new instance of a class. Its name has two leading and two trailing
underscores. Here, the __init__() method takes three parameters: self,
ename, and eage. I will add one more in the following example.
When creating each instance, I passed only the name and age of the bird.
Python supplies self automatically, and the __init__() method stores the
values as attributes. The class also has two methods: one to make the eagle
fly and the other to make it attack prey. You can add as many methods as
you like. In the following example, I will make the program a bit more
complex. Let’s see how it is done.
class Eagle():
    """A simple attempt to model an eagle."""
    def __init__(self, ename, eage, ecolor):
        """Initialize the name, age, and color attributes."""
        self.ename = ename
        self.eage = eage
        self.ecolor = ecolor
    def fly(self):
        """Simulate the eagle flying in response to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """Simulate the eagle attacking prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
    def rest(self):
        """Simulate the eagle resting in response to a command."""
        print(self.ename.title() + " is resting in the nest!")
my_eagle = Eagle('Gamon', 5, 'black')
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle1 = Eagle('Timmy', 4, 'blue')
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle2 = Eagle('Flyer', 5, 'grey')
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
My eagle's name is Timmy.
The eagle is 4 years old.
My eagle's name is Flyer.
The eagle is 5 years old.
>>>
I have added an ecolor parameter to the __init__() method and a new rest()
method to make the class more interactive. Now it is time to call the
methods to make the eagle do what I programmed it to do. I will make the
eagle fly, attack, eat, and rest in the nest. It is really amazing to see it
do the things you want it to do.
class Eagle():
    """A simple attempt to model an eagle."""
    def __init__(self, ename, eage, ecolor):
        """Initialize the name, age, and color attributes."""
        self.ename = ename
        self.eage = eage
        self.ecolor = ecolor
    def fly(self):
        """Simulate the eagle flying in response to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """Simulate the eagle attacking prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
    def eat(self):
        """Simulate the eagle eating in response to a command."""
        print(self.ename.title() + " is eating the rabbit!")
    def rest(self):
        """Simulate the eagle resting in response to a command."""
        print(self.ename.title() + " is resting in the nest!")
my_eagle = Eagle('Gamon', 5, 'black')
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle.fly()
my_eagle.attack()
my_eagle.eat()
my_eagle.rest()
my_eagle1 = Eagle('Timmy', 4, 'blue')
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle1.fly()
my_eagle1.attack()
my_eagle1.eat()
my_eagle1.rest()
my_eagle2 = Eagle('Flyer', 5, 'grey')
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
my_eagle2.fly()
my_eagle2.attack()
my_eagle2.eat()
my_eagle2.rest()
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
Gamon is now flying high in the air.
Gamon is attacking a rabbit!
Gamon is eating the rabbit!
Gamon is resting in the nest!
My eagle's name is Timmy.
The eagle is 4 years old.
Timmy is now flying high in the air.
Timmy is attacking a rabbit!
Timmy is eating the rabbit!
Timmy is resting in the nest!
My eagle's name is Flyer.
The eagle is 5 years old.
Flyer is now flying high in the air.
Flyer is attacking a rabbit!
Flyer is eating the rabbit!
Flyer is resting in the nest!
>>>
To call a method, write the name of the instance (in my case, my_eagle,
my_eagle1, or my_eagle2), followed by a dot and the method name, as in
my_eagle.fly(). You can give any names to the attributes and methods, but
descriptive names will help you read through the code quickly and identify
what is wrong when you receive error messages.
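The chapter’s introduction mentioned that classes can extend existing classes. This hedged sketch illustrates that idea with a hypothetical BaldEagle subclass. It inherits from a condensed Eagle class, reuses Eagle’s __init__() through super(), and adds a describe() method that finally puts ecolor to use. Returning strings instead of printing them is a choice made here so that the behavior is easy to check.

```python
class Eagle:
    """A simple attempt to model an eagle (condensed from the example above)."""

    def __init__(self, ename, eage, ecolor):
        self.ename = ename
        self.eage = eage
        self.ecolor = ecolor

    def fly(self):
        return self.ename.title() + " is now flying high in the air."


class BaldEagle(Eagle):
    """Hypothetical subclass: inherits everything from Eagle and adds more."""

    def __init__(self, ename, eage, ecolor, wingspan_m):
        super().__init__(ename, eage, ecolor)  # reuse Eagle's setup
        self.wingspan_m = wingspan_m           # new attribute

    def describe(self):
        return (f"{self.ename.title()} is a {self.ecolor} bald eagle, "
                f"{self.eage} years old, with a {self.wingspan_m} m wingspan.")


sam = BaldEagle('sam', 7, 'brown', 2.1)
print(sam.fly())        # inherited from Eagle
print(sam.describe())
```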
Chapter 12: 
                     
Math and binary
Whether you’re using a simple command prompt, the Jupyter development
environment (check it out, it’s excellent!), or your favorite Python file
editor, you can import the math module by typing “import math”. This
loads the module and makes a number of mathematical functions available
to you. On a related note, other modules can be imported the same way, so
if you’re interested in how you can extend Python in the future, search for
the list of Python modules. It is quite extensive and diversified, and many of
them are used in fields like data science and machine learning.
One of the functions you’ll have access to is the square root function we
mentioned earlier. This isn’t part of basic Python, but after importing the
module, you can now try the following example:
import math
print (sqrt (36))
Oops! You get a NameError. But why? We imported the math module, didn’t
we? Yes, we did, but the problem is that we still called the function the
same way we would call normal, built-in functions. When using module
functions, we also need to include the name of the module, followed by a
dot and then the function. Here’s how:
import math
x = 36
print (math.sqrt(x))
6.0
Now we have the result of the function, which is 6.0.
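If you’d rather not repeat the module name, Python also lets you import a specific function directly with from … import, or give the module a shorter alias with import … as. A minimal sketch of both forms:

```python
from math import sqrt   # bring sqrt itself into the current namespace
import math as m        # or refer to the module by a shorter alias

x = 36
print(sqrt(x))    # 6.0
print(m.sqrt(x))  # 6.0
```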
The math module we imported adds a great many functions, including
trigonometric, logarithmic, and hyperbolic functions, as well as constants
such as pi. We’re not going to explore all of them, because teaching
mathematics is beyond the scope of this book and only specific fields
require advanced notions beyond the standard operations you have learned.
But if you’re interested nonetheless and you have some math knowledge, you
can play around with the following functions:
math.cos(x): This will return the cosine of x radians.
math.degrees(x): This function converts the angle x from radians to
degrees.
math.e: The value of e is a constant equal to 2.7182… Because it is a
constant rather than a function, it doesn’t take parentheses. The same goes
for pi (math.pi), which is another constant. These are exceptions to the
rule.
math.log(x, y): This will return the logarithm of x to base y. If you leave
out y, it returns the natural logarithm of x.
math.factorial(x): This will return the factorial of x.
There are a lot more functions included in the math module. If you plan to
pursue a path in data science or machine learning, you should look them up
and brush up on your skills in mathematics.
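Here is a quick sketch that tries the functions listed above on arbitrary example inputs. Trigonometric functions expect radians, so math.radians() is used to convert 60 degrees first.

```python
import math

print(math.cos(math.radians(60)))   # about 0.5 (floating-point rounding)
print(math.degrees(math.pi))        # 180.0
print(math.log(8, 2))               # logarithm of 8 to base 2, about 3.0
print(math.log(math.e))             # natural log when no base is given: 1.0
print(math.factorial(5))            # 120
print(math.pi, math.e)              # constants, no parentheses
```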
Binary and Text

Files can be classified into two distinct categories:


Binary Files: These files are used to store computer data in the form of
bytes. This is the computer’s native language, so whatever you see in these
files is unreadable to your eyes. Well, actually, you can learn binary and
find out what every combination of 0 and 1 means, but realistically you
don’t want to go through that. On a side note, if you open this kind of file
using a text editor, you’ll see a bunch of unreadable gibberish. For
instance, you can open an image file inside a text editor and you’ll see
characters, just not arranged in any way you’re familiar with. The text won’t
mean anything to you. Just don’t save the file from the text editor and
overwrite it, or you may corrupt it.
Binary files include the following examples: Executable files (.exe, .bin),
images (jpg, gif), pdf documents, compressed zip files, mp3 audio files,
videos, and fonts.
Text Files: These files are readable because they contain characters. So
when you open them in an editor, you’ll see the text characters you’re used
to. However, this doesn’t mean you’ll understand what you’re reading,
because the content might be code or structured data rather than plain
prose.
Text files include the following examples: Simple text files like .txt and
.csv, source code files like your Python files, and data (json or xml).
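To see the difference for yourself, this sketch writes a short text file and then opens the same file in text mode and in binary mode. The file name and contents are examples. In binary mode you get the raw bytes, so the non-ASCII character shows up as escape codes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
# newline="\n" keeps the line ending the same on every operating system
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("naïve\n")

with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()
with open(path, "rb") as f:
    as_bytes = f.read()

print(repr(as_text))   # 'naïve\n'
print(as_bytes)        # b'na\xc3\xafve\n'
```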
The file types we mention are the most common ones, and you’re undoubtedly
familiar with them. There are many other file types, but each of them falls
into one of these two categories.
Before we get started with practical examples, take note that we’re going to
use Visual Studio Code as our coding editor instead of the usual Jupyter
notebook, Vim or the online Python console. You may have heard of Visual
Studio being used generally with other languages like C#, but it also offers
Python support. You don’t have to use this editor, though; any will do just
fine. We’re using it because it has a handy Explorer bar that shows the
folder/directory we’re in.
Chapter 13: 
                     
Exercises
In your first program, you printed a single statement with the print
function. Keep in mind that you can also print any number of items, even on
the same line and even if they are stored in several variables. This leads
to one of the most common operations you will perform on strings:
concatenation. The concept is simple. All it involves is linking multiple
strings together. Here’s a simple example:
charRace = "human"
charGender = "male"
print (charRace, charGender)
The output will be “human male”.
As you can see, we have two variables, and each one of them holds a string.
We can print both of them by separating the variables with commas in the
print statement; print() puts a space between the items by default. Keep in
mind that there are multiple ways you can do this. For instance, if you are
working with string literals rather than variables, you can drop the comma
and place the strings side by side, and Python will join them. You will
notice a little problem, though. Here’s the example:
print ("school" "teacher")
The result is “schoolteacher”. What happened? We didn’t leave any
whitespace. Take note that whitespace can be part of a string just as
numbers and punctuation marks. If you don’t leave a space, words will be
glued together. The solution is to simply add one blank space before or after
one of the strings, inside the quotes.
Next, let’s see what happens if you try to combine the two methods and
concatenate a variable with a string literal by placing them side by side.
print (charRace "mage")
This is what you will see:
  File "<stdin>", line 1
    print (charRace "mage")
                    ^
SyntaxError: invalid syntax
Congratulations, you got your first syntax error. (The exact layout depends
on your Python version; newer versions add a hint such as “Perhaps you
forgot a comma?”) What’s the problem here? Placing items side by side only
works for two string literals. Between a variable and a string, Python
needs a separator, such as a comma, or an operator.
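Here are a few ways that do work when you combine a variable with a string literal. The names reuse the character-creation example, and "mage" is just sample text.

```python
charRace = "human"

print(charRace, "mage")          # comma: print() adds the space -> human mage
print(charRace + " mage")        # + joins the strings; the space is in the literal
label = f"{charRace} mage"       # f-string (Python 3.6+)
print(label)                     # human mage
```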
Let’s take a look at one more method frequently used to concatenate a set of
strings. Type the following:
x = "orc"
y = " mage"
x+y
As you can see, you can use the + operator with string variables. When
both operands are strings, + concatenates them instead of adding numbers,
so x + y gives 'orc mage'. This method is simple and works just fine for a
few strings. However, if you need to concatenate a large number of strings,
for example inside a loop, repeated + operations create a brand new string
at every step, which wastes time and memory. Whenever you work on a
project, at least a more complex one, code optimization becomes one of your
priorities, and that involves managing the system’s resources properly.
Therefore, if you have to concatenate a large number of strings, collect
them in a list and combine them with the str.join() method instead.
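Here is a minimal sketch of the join() approach for many pieces. The list of words is made up for the example.

```python
words = ["orc", "mage", "of", "the", "northern", "hills"]

# join() builds the result in one step, with the separator between items
title = " ".join(words)
print(title)  # orc mage of the northern hills

# the same pattern works when the pieces are gathered in a loop
parts = []
for i in range(3):
    parts.append(f"level {i}")
summary = ", ".join(parts)
print(summary)  # level 0, level 1, level 2
```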
We’re going to carry on with If and Then in a bit but before we do, there’s
one more thing to consider: comparing variables.
Sometimes it will be useful to look at one variable and then compare that to
another variable. For instance, we might want to compare a string to a
stored password if we’re asking someone to log in. Alternatively, we might
be trying to find out if someone is older or younger than a certain age.
To do this, we have a few symbols and conventions. To ask if something
‘equals’ something else, we use the symbol ‘==’ (‘==’ compares two values,
whereas a single ‘=’ assigns a value to a variable). This is what will
allow us to test certain conditions for our IF, THEN statements. This way
we can say ‘IF’ password is correct, ‘THEN’ proceed.
For example:
Password = "guest"
Attempt = "guest"
if Attempt == Password:
    print("Password Correct")
This essentially tests the imaginary password attempt against the true
password and only says ‘correct’ when the two strings are the same. Notice
that we aren’t actually using the word ‘then’ at any point. In some
programming languages (such as BASIC) you actually do write ‘then’, but
in Python it is implicit: the indented block that follows the colon is what
runs when the condition is true, which is just the same way that loops work!
Python is nice and consistent, and it’s actually a very attractive and
simple language to look at when you code with it well…
(That’s right – programming languages can be attractive! In fact, there is
even such a thing as ‘code poems’!)
We can also use an input to make this a bit more interactive!
Doing this is very easy:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
Try entering the right password and you should be presented with the
correct message – congrats!
There’s just one problem at the moment, which is that our user will still be
able to get into the program if they get the password wrong! And there is
nothing to tell them that they answered incorrectly…
Fortunately, we can fix this with our next statement: ‘else’.
As you might already have guessed, ‘else’ simply tells us what to do if the
answer is not correct.
This means we can say:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
else:
    print("Password Incorrect!")
Note that the ‘else’ statement moves back to be in-line with the initial ‘if’
statement. Try entering wrong passwords on purpose now and the new
program will tell you you’ve made a mistake!
Okay, so far so good! But now we have another problem: even though our
user is entering the password wrong and being told as much, they still
get to see whatever code comes next:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
else:
    print("Password Incorrect!")
print("Secret information begins here...")
Of course this somewhat negates the very purpose of having a password in
the first place!
So now we can use something else we learned earlier – the loop! And better
yet, we’re going to use while True, break and continue. Told you they’d
come in handy!
Password = "guest"
while True:
    Attempt = input("Please enter password: ")
    if Attempt == Password:
        print("Password Correct")
        break
    else:
        print("Password Incorrect!")
        continue
print("Secret information begins here...")
Okay, this is starting to get a little more complex and use multiple concepts
at once, so let’s go through it!
Basically, we are now starting a loop that will continue until interrupted.
Each time that loop repeats itself, it starts by asking for input and waits for
the user to try the password. Once it has that information, it tests the
attempt to see if it is correct or not. If it is, it breaks the loop and the
program continues.
If it’s not? Then the loop refreshes and the user has another attempt to enter
their password!
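Because input() waits for someone at the keyboard, the loop above is awkward to try out repeatedly. The sketch below replays the same while True / break / continue logic with a hypothetical list of typed attempts standing in for input(), plus a tries counter that is not part of the original program, so you can follow each pass of the loop.

```python
# Simulated keyboard input (an assumption for this sketch):
# each call to next() stands in for one input() prompt.
attempts = iter(["wizard", "Guest", "guest"])

Password = "guest"
tries = 0  # hypothetical counter, added only to show how many passes ran
while True:
    Attempt = next(attempts)  # replaces input("Please enter password: ")
    tries += 1
    if Attempt == Password:
        print("Password Correct")
        break                 # leave the loop: the password matched
    else:
        print("Password Incorrect!")
        continue              # go back to the top and ask again
print("Secret information begins here...")
print("Attempts needed:", tries)
```

Note that "Guest" is rejected: string comparison with == is case-sensitive.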
We’ve actually gone on something of a tangent here but you may recall that
the title of this was ‘Comparing Variables’. What if we don’t want to test
whether two variables are the same? What if we want to find out if one
variable is bigger than another? We can ask if something is ‘bigger’ using
the symbol ‘>’ and ask whether it is smaller using the ‘<’ symbol. This is
easy to remember – just look at the small end and the big end of the
character!
Adding an equals sign will make this test inclusive. In other words ‘>=’
means ‘equal or bigger than’, and ‘<=’ means ‘equal or smaller than’.
Likewise, we may also test whether two values, such as two strings, are
different. We do this with ‘!=’, which basically means ‘not equal to’.
Using that last example, we can turn our password test on its head and
achieve the exact same end result:
Password = "guest"
while True:
    Attempt = input("Please enter password: ")
    if Attempt != Password:
        print("Password Incorrect!")
        continue
    else:
        print("Password Correct")
        break
print("Secret information begins here...")
Of course when you get programming you’ll find much more useful ways
to use this symbol!
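To see the comparison symbols on their own, here is a small sketch based on the age check mentioned earlier. The variable names and the age limit of 18 are assumptions chosen for illustration; each comparison simply evaluates to True or False.

```python
age = 17
voting_age = 18  # hypothetical limit used only for this example

print(age > voting_age)    # False: 17 is not bigger than 18
print(age < voting_age)    # True: 17 is smaller than 18
print(age >= voting_age)   # False: 17 is neither bigger than nor equal to 18
print(age != voting_age)   # True: the two values differ
print("guest" != "Guest")  # True: string comparisons are case-sensitive

# The same comparisons drive if/else decisions:
if age >= voting_age:
    status = "old enough"
else:
    status = "too young"
print(status)  # too young
```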
Let’s Make Our First Game!
We’ve talked an awful lot of theory at this point so perhaps it’s time for us
to make our first game! It’s not going to be that much fun, seeing as you’ll
know the answer – but you can get your friends to play it to impress them
with your coding know-how (unfortunately, it’s still not all that fun even
then!).
The game is simply going to get the player to guess the number it is
thinking of and will then give clues to help them get there if they get it
wrong.
CorrectNumber = 16
while True:
    GuessedNumber = int(input("Guess the number I'm thinking of! "))
    if GuessedNumber == CorrectNumber:
        print("Correct!")
        break
    elif GuessedNumber < CorrectNumber:
        print("Too low!")
        continue
    elif GuessedNumber > CorrectNumber:
        print("Too high!")
        continue
print("You WIN!!!")
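One weakness of the game is that int() raises a ValueError if the player types something that isn't a whole number, which stops the program. Below is a sketch of one way to guard against that. The helper functions hint() and read_guess() are hypothetical names introduced here, and a list of simulated guesses replaces input() so the example runs on its own; in the real game you would keep calling input().

```python
def hint(guess, correct):
    """Return the message the game prints for one guess."""
    if guess == correct:
        return "Correct!"
    elif guess < correct:
        return "Too low!"
    else:
        return "Too high!"


def read_guess(text):
    """Convert typed text to an int, or return None if it isn't a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


# Simulated keyboard input for this sketch; the real game would use input().
typed = iter(["ten", "10", "20", "16"])

CorrectNumber = 16
while True:
    GuessedNumber = read_guess(next(typed))
    if GuessedNumber is None:
        print("Please type a whole number.")
        continue
    message = hint(GuessedNumber, CorrectNumber)
    print(message)
    if message == "Correct!":
        break
print("You WIN!!!")
```

Splitting the hint logic into a function also makes it easy to check on its own, without playing the game.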
Conclusion
Now that we have come to the end, I hope you have gathered a basic
understanding of what machine learning is and how you can build a
machine learning model in Python. One of the best ways to begin building a
machine learning model is to practice the code, and also try to write similar
code to solve other problems. It is important to remember that the more you
practice, the better you will get. The best way to go about this is to begin
working on simple problem statements and solving them using the different
algorithms. You can also look for new ways to solve the same problems. Once
you get the hang of the basic problems, you can try using some advanced
methods to solve them.
Thanks for reading to the end!
Python Machine Learning may be the answer that you are looking for when
it comes to all of these needs and more. It is a simple process that can teach
your machine how to learn on its own, similar to what the human mind can
do, but much faster and more efficiently. It has been a game-changer in many
industries, and this guide tried to show you the exact steps that you can take
to make this happen.
There is just so much that a programmer can do when it comes to using
Machine Learning in their coding, and when you add it together with the
Python coding language, you can take it even further, even as a beginner.
The next step is to start putting the knowledge in this guide to good use.
There are a lot of great things that you can do when it
comes to Machine Learning, and when we can combine it with the Python
language, there is nothing that we can’t do when it comes to training our
machine or our computer.
This guide took some time to explore a lot of the different things that you
can do when it comes to Python Machine Learning. We looked at what
Machine Learning is all about, how to work with it, and even a crash course
on using the Python language for the first time. Once that was done, we
moved right into combining the two of these to work with a variety of
Python libraries to get the work done.
You should always work towards exploring different functions and features
in Python, and also try to learn more about the different libraries like SciPy,
NumPy, PyRobotics, and Graphical User Interface packages that you will
be using to build different models.
Python is a high-level language which is both interpreted and
object-oriented. This makes it easy for anybody to understand how the
language works. You can also run the programs that you build in Python on
other platforms. Many of the libraries available for Python offer a variety
of functions that make it easier to work with large data sets.
You will now have gathered that machine learning is a complex concept,
but one that can be understood. It is not a black box full of
undecipherable terms, incomprehensible graphs, or difficult concepts.
Machine learning is easy to understand, and I hope this guide has helped you
understand its basics. You can now begin working on programming and
building models in Python. Ensure that you diligently practice since that is
the only way you can improve your skills as a programmer.
If you have ever wanted to learn how to work with the Python coding
language, or you want to see what Machine Learning can do for you, then
this guide is the ultimate tool that you need! Take a chance to read through
it and see just how powerful Python Machine Learning can be for you.
Python Data Science
THE COMPLETE GUIDE TO DATA
ANALYTICS + MACHINE LEARNING + BIG
DATA SCIENCE + PANDAS PYTHON. THE
EASY WAY TO PROGRAMMING (EXERCISES
INCLUDED).  
Introduction
Data Science might be a relatively new multi-disciplinary field. However,
its integral parts have been individually studied by mathematicians and IT
professionals for decades. Some of these core elements include machine
learning, graph analysis, linear algebra, computational linguistics, and much
more. Because of this seemingly wild combination of mathematics, data
communication, and software engineering, the domain of data science is
highly versatile. Keep in mind that not all data scientists are the same. Each
one of them specializes based on competency and area of expertise. With
that in mind, you might be asking yourself now what the most important,
or most powerful, tool is for anyone aiming to become a data scientist.
This book will focus on the use of Python because this tool is highly
appreciated within the community of data scientists, and it's easy to start
with. This is a highly versatile programming language that is used in a wide
variety of technical fields, including software development and production.
It is powerful, easy to understand, and can handle any kind of program,
whether small or complex.
Python started out in 1991, and it has nothing to do with snakes. As a fun
fact, this programming language loved by both beginners and professionals
was named this way because its creator was a big fan of Monty Python, a
British comedy group. If you're also one of their fans, you might notice
several references to them inside the code, as well as the language's
documentation. But enough about trivia - we're going to focus on Python
due to its support for quick experimentation and for deploying scientific
applications. Here are some of the other core features that explain why
Python is the way to go when learning data science:
Integration: Python can integrate many other tools and even code
written in other programming languages. It can act as a unifying
force that brings together algorithms, data strategies, and languages.
Versatility: Are you a complete beginner who never learned any
kind of programming language, whether procedural or object-oriented?
No problem: Python is considered by many to be the best
tool for aspiring data scientists to grasp the concepts of
programming. You can start coding as soon as you learn the basics!
Power: Python offers every tool you need for data analysis and
more. There is an increasing number of packages and external tools
that can be imported into Python to extend its usability. The
possibilities are truly endless, and that is one of the reasons why
this programming language is so popular in diverse technical fields,
including data science.
Cross-Platform Compatibility: Portability is not a problem, no
matter the platform. Programs and tools written in Python will work
on Windows, Mac, as well as Linux and its many distributions.

Python is a Jack of all trades, master of everything. It is easy to learn,
powerful, and easy to integrate with any other tools and languages, and that
is why this book will focus on it when discussing data science and its many
aspects. Now let’s begin by installing Python.
Chapter 1    
Installing Python
Since many aspiring data scientists never used Python before, we’re going
to discuss the installation process to familiarize you with various packages
and distributions that you will need later.
Before we begin, it's worth taking note that there are two major versions of
Python, namely Python 2 and Python 3. Python 3 is the one to use. Some older
projects still run on Python 2, but the shift to version 3 is essentially
complete. What's important to keep in mind is that there are various
compatibility issues between the two versions. This means that if you write
a program using Python 2 and then run it inside a Python 3 interpreter, the
code might not work. The developers behind Python officially ended support
for Python 2 on January 1, 2020, so version 3 is the one that is being
constantly developed and improved. With that being said, let's go through
the step-by-step installation process.
Step by Step Setup
Start by going to Python's webpage at www.python.org and downloading
Python. Next, we will go through the manual installation, which requires
several steps and instructions. It is not obligatory to set up Python
manually. However, this gives you great control over the installation, and
it's important for future installations that you will perform independently,
depending on each of your projects' specifications. The easier way of
installing Python is to install a scientific distribution, which sets you up
with all the packages and tools you may need (including a lot that you won't
need). If you wish to go through this simplified installation method, head
down to the section about scientific distributions.
When you download Python from the developer's website, make sure to
choose the correct installer depending on your machine's operating system.
Afterward, simply run the installer. Python is now installed. However, it is
not quite ready for our purposes. We will now have to install various
packages. The easiest way to do this is to open the command console and
type "pip" to bring up the package manager. The older "easy_install" tool is
an alternative, but it has been deprecated, and pip is the recommended
choice. If the console reports that the command is not recognized, you need
to install a package manager first. Just head to its website and go through
a basic installation process to get it. But why bother with a package
manager as a beginner?
A package manager like "pip" will make it a lot easier for you to
install/uninstall packages, or roll them back if the package version causes
some incompatibility issues or errors. Because of this advantage of
streamlining the process, most new Python installations come with pip pre-
installed. Now let's learn how to install a package. If you chose "pip,"
simply type the following line in the command console:
pip install < package_name >
If you chose "Easy Install," the process remains the same. Just type:
easy_install < package_name >
Once the command is given, the specified package will be downloaded and
installed together with any other dependencies it requires in order to run.
We will go over the most important packages that you will require in a later
section. For now, it’s enough to understand the basic setup process.

Scientific Distributions
As you saw in the previous section, building your working environment
can be somewhat time-consuming. After installing Python, you need to
choose the packages you need for your project and install them one at a
time. Installing many different packages and tools can lead to failed
installations and errors. This can often result in a massive loss of time for an
aspiring data scientist who doesn't fully understand the subtleties behind
certain errors. Finding solutions to them isn't always straightforward. This
is why you have the option of directly downloading and installing a
scientific distribution.
Automatically building and setting up your environment can save you from
spending time and frustration on installations and allow you to jump
straight in. A scientific distribution usually contains all the libraries you
need, an Integrated Development Environment (IDE), and various tools.
Let’s discuss the most popular distributions and their application.
Anaconda
This is probably the most complete scientific distribution offered by
Continuum Analytics. It comes with close to 200 packages pre-installed,
including Matplotlib, Scikit-learn, NumPy, pandas, and more (we'll discuss
these packages a bit later). Anaconda can be used on any machine, no
matter the operating system, and can be installed next to any other
distributions. The purpose is to offer the user everything they need for
analytics, scientific computing, and mass-processing. It's also worth
mentioning that it comes with its own package manager pre-installed, ready
for you to use in order to manage packages. This is a powerful distribution,
and luckily it can be downloaded and installed for free. However, there is
also an advanced version that requires purchase.
If you use Anaconda, you will be able to access “conda” in order to install,
update, or remove various packages. This package manager can also be
used to create virtual environments (more on that later). For now, let’s focus
on the commands. First, you need to make sure you are running the latest
version of conda. You can check and update by typing the following
command in the command line:
conda update conda
Now, let’s say you know which package you want to install. Type the
following command:
conda install < package_name >
If you want to install multiple packages, you can list them one after another
in the same command line. Here’s an example:
conda install < package_name_1 > < package_name_2 > < package_name_3 >
Next, you might need to update some existing packages. This can be done
with the following command:
conda update < package_name >
You also have the ability to update all the packages at once. Simply type:
conda update --all
The last basic command you should be aware of for now is the one for
package removal. Type the following command to uninstall a certain
package:
conda remove < package_name >
This tool is similar to "pip" and "easy_install," and even though it's usually
included with Anaconda, it can also be installed separately because it works
with other scientific distributions as well.
Canopy
This is another popular scientific distribution, aimed at data scientists
and analysts. It also comes with around 200 pre-installed packages and
includes the most popular ones you will use later, such as Matplotlib and
pandas. If you choose to use this distribution instead of Anaconda, type
the following command to install it:
canopy_cli
Keep in mind that you will only have access to the basic version of Canopy
without paying. If you ever require its advanced features, you will have to
download and install the full version.
WinPython
If you are running on a Windows operating system, you might want to give
WinPython a try. This distribution offers features similar to the ones we
discussed earlier. However, it is community-driven: an open-source tool
that is entirely free.
You can also install multiple versions of it on the same machine, and it
comes with an IDE pre-installed.
Virtual Environments
Virtual environments are often necessary because you are usually locked to
the version of Python you installed. It doesn’t matter whether you installed
everything manually or you chose to use a distribution: keeping many
separate installations side by side on the same machine quickly becomes
hard to manage. The only exception is the WinPython distribution, which is
available only for Windows machines, because it allows you to prepare as
many installations as you want. However, you can create a virtual
environment with the "virtualenv" tool and set up as many different
installations as you need without worrying about these limitations. Here are
a few solid reasons why you should choose a virtual environment:
Testing grounds: It allows you to create a special environment
where you can experiment with different libraries, modules, Python
versions, and so on. This way, you can test anything you can think
of without causing any irreversible damage.
Different versions: There are cases when you need multiple
installations of Python on your computer. Some packages and tools
only work with a certain version. For instance, if you are running
Windows, there are a few useful packages that will only behave
correctly if you are running Python 3.4, which isn’t the most recent
update. Through a virtual environment, you can run different versions
of Python for separate goals.
Replicability: Use a virtual environment to make sure you can run
your project on any other computer or version of Python, aside
from the one you were originally using. You might be required to
run your prototype on a certain operating system or Python
installation, instead of the one you are using on your own computer.
With the help of a virtual environment, you can easily replicate
your project and see if it runs under different circumstances.

With that being said, let’s start installing a virtual environment by typing
the following command:
pip install virtualenv
This will install "virtualenv". However, you will need to make several
decisions before creating the virtual environment. Here are the main ones:
Python version: Decide which version you want “virtualenv” to use.
By default, it will pick up the one it was installed from. Therefore,
if you want to use another Python version, you have to specify it with
the -p option, for instance -p python3.4.
Package installation: By default, each new environment is isolated
from the packages installed on your system, so you have to install
every package again inside it, even when you already have it
installed globally. This can lead to a loss of time and resources.
To avoid this issue, you can use the --system-site-packages option,
which gives the environment access to the packages already
available on your system.
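Relocation: For some projects, you might need to move your virtual
environment to a different Python setup or even to another computer.
In that case, you will have to instruct the tool to make the
environment scripts work on any path. Older releases of virtualenv
offered the --relocatable option for this; it was removed in
virtualenv 20, so with recent versions it is usually simpler to
recreate the environment on the new machine.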

Once you make all the above decisions, you can finally create a new
environment. Type the following command:
virtualenv myenv
This instruction will create a new directory called “myenv” inside the
location, or directory, where you currently are. Once the virtual
environment is created, you need to activate it. On Windows, type:
myenv\Scripts\activate
On macOS and Linux, type:
source myenv/bin/activate
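Python 3 also ships with a built-in venv module that creates lightweight virtual environments much as virtualenv does; on the command line the equivalent is python -m venv myenv. This is a swapped-in alternative rather than the virtualenv tool described above. The sketch below creates an environment programmatically inside a throwaway temporary folder (an assumption, so nothing is left behind) and checks for the pyvenv.cfg file that venv writes into every environment it creates.

```python
import tempfile
import venv
from pathlib import Path

# A throwaway temporary folder stands in for your project directory here.
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = Path(project_dir) / "myenv"

    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # if you want pip bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)

    # venv writes a pyvenv.cfg file at the root of each environment.
    config_exists = (env_dir / "pyvenv.cfg").is_file()
    print("Created", env_dir.name, "| pyvenv.cfg present:", config_exists)
```

Activating an environment made this way works exactly as shown above for virtualenv.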
Necessary Packages
We discussed earlier that the advantages of using Python for data science
are its system compatibility and highly developed system of packages. An
aspiring data scientist will require a diverse set of tools for their projects.
The analytical packages we are going to talk about have been highly
polished and thoroughly tested over the years, and therefore are used by the
majority of data scientists, analysts, and engineers.
Here are the most important packages you will need to install for most of
your work:
NumPy: This analytical library provides the user with support for
multi-dimensional arrays, including the mathematical algorithms
needed to operate on them. Arrays are used for storing data, as well
as for fast matrix operations that are much needed to work out
many data science problems. Python wasn't meant for numerical
computing. Therefore every data scientist needs a package like
NumPy to extend the programming language to include the use of
many high-level mathematical functions. Install this tool by typing
the following command: pip install numpy.
SciPy: You can't read about NumPy without hearing about SciPy.
Why? Because the two complement each other. SciPy is needed to
enable the use of algorithms for image processing, linear algebra,
matrices, and more. Install this tool by typing the following
command: pip install scipy.
pandas: This library is needed mostly for handling diverse data
tables. Install pandas to be able to load data from any source and
manipulate it as needed. Install this tool by typing the following
command: pip install pandas.
Scikit-learn: A much-needed tool for data science and machine
learning, scikit-learn is probably the most important package in your
toolkit. It is used for data preprocessing, error metrics,
supervised and unsupervised learning, and much more. Install this
tool by typing the following command: pip install scikit-learn.
Matplotlib: This package contains everything you need to build
plots from an array. You also have the ability to visualize them
interactively. Not sure what a plot is? It is a graph
used in statistics and data analysis to display the relation between
variables. This makes Matplotlib an indispensable library for
Python. Install this tool by typing the following command: pip
install matplotlib.
Jupyter: No data scientist is complete without Jupyter. This package
is essentially an IDE (though much more) used in data science and
machine learning everywhere. Unlike language-specific environments
such as RStudio, Jupyter is not tied to one language: through
interchangeable "kernels" it can run code in many programming
languages, including Python, R, and Julia. It is both powerful and
versatile because it lets you write code, run it, and visualize the
results in the same environment, and it supports special "magic"
commands. Not only that, it also promotes collaboration, because
notebooks are easy to share as single documents. Install this tool by
typing the following command: pip install jupyter.
Beautiful Soup: Extract information from HTML and XML files
that you have access to online. Install this tool by typing the
following command: pip install beautifulsoup4.
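Once you have run a few of these pip commands, you may want to confirm what is actually available. A minimal sketch using the standard importlib module follows. Note that the name you pip-install is not always the name you import (scikit-learn is imported as sklearn, beautifulsoup4 as bs4); the mapping below reflects the usual import names but is worth double-checking against each package's documentation. Jupyter is left out because it is an application you launch rather than a library you import.

```python
import importlib.util

# pip package name -> name used in an import statement (usual names; verify).
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "beautifulsoup4": "bs4",
}


def check_installed(packages):
    """Return {pip_name: True/False} depending on whether the module can be found.

    find_spec() only locates a top-level module; it does not import it.
    """
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in packages.items()
    }


status = check_installed(PACKAGES)
for name, found in status.items():
    print(f"{name:15} {'installed' if found else 'missing'}")
```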

For now, these seven packages should be enough to get you started and give
you an idea of how to extend Python's abilities. You don't have to
overwhelm yourself just yet by installing all of them; however, feel free to
explore and experiment on your own. We will mention and discuss more
packages later in the book as needed to solve our data science problems.
But for now, we need to focus more on Jupyter, because it will be used
throughout the book. So let’s go through the installation, special commands,
and learn how this tool can help you as an aspiring data scientist.
Using Jupyter
Throughout this book, we will use Jupyter to illustrate various operations
we perform and their results. If you didn’t install it yet, let’s start by typing
the following command:
pip install jupyter
The installation itself is straightforward: pip downloads Jupyter
together with the packages it depends on. Once the setup finishes, we can
run the program by typing the next line:
jupyter notebook
This will open an instance of Jupyter inside your browser. Next, click on
“New” and select the version of Python you are running. As mentioned
earlier, we are going to focus on Python 3. Now you will see an empty
window where you can type your commands.
You might notice that Jupyter uses code cell blocks instead of looking like a
regular text editor. That’s because the program will execute code cell by
cell. This allows you to test and experiment with parts of your code instead
of your entire program. With that being said, let’s give it a test run and type
the following line inside the cell:
In: print ("I'm running a test!")
Now you can click the Run button in the toolbar, choose Run Cells from
the Cell menu, or press Shift+Enter. This will run your code and give you
output, and then a new input cell will
appear. You can also create more cells by hitting the plus button in the
menu. To make it clearer, a typical block looks something like this:
In: < This is where you type your code >
Out: < This is the output you will receive >
The idea is to type your code inside the "In" section and then run it.
Jupyter fills in the "Out" section for you: it shows the value of the last
expression in the cell (text printed with print() appears just below the
cell, without an "Out" label). Comparing that output with the result you
expected is a quick way to test whether your code works.
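A more reliable way than eyeballing the output to check that a cell gives the result you expect is an assert statement. This is a swapped-in technique, not something built into Jupyter's In/Out display; the strings below are just an illustration of what one cell might contain.

```python
# Imagine this is the content of a single notebook cell.
expected = "I'm running a test!"
result = "I'm running" + " a test!"

# An assert stays silent when the check passes and raises
# AssertionError (with this message) when it fails.
assert result == expected, f"expected {expected!r}, got {result!r}"

result  # in Jupyter, the last bare expression is what appears as "Out"
```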
Chapter 14:
Python Libraries to Help with Data Science
Python is one of the best coding languages to work with when you want to do
data science. But the standard library that comes installed with Python
cannot handle all of the work this field requires. This doesn’t mean that
you are stuck, though. There are many extensions and other libraries that
work with Python and can do some wonderful things when it comes to data
science. When you are ready to start analyzing the data that you have
collected and learning valuable insights from it, here are some of the best
libraries to use with Python.
NumPy and SciPy
The first Python libraries for data science that we are going to look at
are NumPy (short for Numerical Python) and SciPy. NumPy is useful because
it lays down the basic premises that we need for scientific computing in
Python. It gives us fast, precompiled functions for numerical and
mathematical routines.
In addition to the benefits listed above, NumPy optimizes numerical
programming in Python by adding powerful data structures, most notably the
multi-dimensional array. This makes it easier for us to compute efficiently
with matrices and multi-dimensional arrays.
Scientific Python, known as SciPy, is built on top of NumPy, so you rarely
see one without the other. SciPy extends what NumPy can do with useful
functions for tasks such as minimization, regression, and more.
When you want to work with these two libraries, install NumPy first and
get it set up and ready to work with Python. From there, you can install
the SciPy library and get to work using Python on any of your goals or
projects that include data science.
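A short sketch of the two libraries working together, assuming both are installed: NumPy stores a small matrix and multiplies it, and SciPy's optimize module finds the minimum of a simple one-variable function. The matrix values and the cost function are made up for illustration.

```python
import numpy as np
from scipy import optimize

# NumPy: a 2-D array and a fast, vectorized matrix product
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0], [6.0]])
product = a @ b                # matrix multiplication -> [[17.], [39.]]
column_means = a.mean(axis=0)  # mean of each column -> [2., 3.]


# SciPy: minimize a one-variable function (its minimum is at x = 3)
def cost(x):
    return (x - 3.0) ** 2 + 1.0


result = optimize.minimize_scalar(cost)
print(product.ravel(), column_means, round(result.x, 4))
```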
Pandas
The second Python library that we can use to help out with data science is
Pandas, the Python Data Analysis Library. As the name suggests, it is built
to help us analyze data.
Pandas is an open-source tool that provides easy-to-use, high-performance
data structures, along with all of the tools you need to complete a data
analysis in Python, no matter what kind of analysis you would like to do.
Industries that like to work with this library for data science include
engineering, social science, statistics, and finance.
The best part about using this library is that it is adaptable, which helps
us get more work done. It works with almost any kind of data you have
collected, including uncategorized, messy, unstructured, and incomplete data.
Once you have the data, this library provides the tools you need to slice,
reshape, merge, and otherwise transform your data sets.
Pandas comes with a variety of features that make it well suited to data science. Some of the best features of the Pandas library include:
1. You can use the Pandas library to reshape the structure of your data.
2. You can use the Pandas library to label series as well as tabular data, so that the data is aligned automatically by its labels.
3. You can use the Pandas library for heterogeneous indexing of the data, and it is also useful for systematic labeling of the data.
4. You can use this library to identify and then fix missing data.
5. This library lets you load and save data in more than one format.
6. You can easily convert data structures from Python and NumPy into Pandas objects.
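The following short sketch demonstrates several of the features above. It assumes pandas and NumPy are installed. The city names and temperatures are made-up sample data, and filling the gap with the column mean is only one of several ways to handle missing values.

```python
import io

import numpy as np
import pandas as pd

# Feature 6: build a labeled DataFrame from a plain Python dict
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune", "Lima"],
    "temp": [4.0, np.nan, 31.0, 19.0],
})

# Feature 4: count the missing value, then fill it with the column mean
missing = int(df["temp"].isna().sum())
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Feature 2: labeled Series line up by label, not by position
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "x"])
aligned = a + b                      # x: 1 + 20, y: 2 + 10

# Group the rows by city and summarize each group in a new table
summary = df.groupby("city")["temp"].agg(["mean", "count"])

# Feature 5: save to CSV text, then load it back
csv_text = df.to_csv(index=False)
reloaded = pd.read_csv(io.StringIO(csv_text))

print(missing)
print(aligned)
print(summary)
print(reloaded.shape)
```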

Matplotlib
When you work on data science, you want to make sure that after gathering and analyzing all of the available data, you also find a good way to present that information to others so they can grasp the insights quickly. Visualizations of some sort, chosen to suit the kind of data you are working with, make it easier to see what information was gathered and how its different parts fit together.
This is where Matplotlib comes in handy. It is a 2D plotting library for Python that can produce publication-quality figures in a variety of formats, and it works in interactive environments across many different platforms. You can use it in Python scripts, the Python and IPython shells, the Jupyter notebook, four graphical user interface toolkits, and many web application servers.
Matplotlib helps with data science by generating the visualizations we need for our data and for the results we get from it. It can generate scatterplots, error charts, bar charts, power spectra, histograms, and line plots, to name a few. If you need some kind of chart or graph to go along with your data analysis, make sure to check out what Matplotlib can do for you.
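The following minimal sketch draws two of the chart types listed above. It assumes Matplotlib is installed. It selects the non-interactive "Agg" backend and saves the figure as a PNG into an in-memory buffer, so the example runs without a display. In a notebook or a script, you would more likely call fig.savefig("chart.png") or plt.show(). The data points are arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")          # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y)              # scatterplot
ax1.set_title("Scatterplot")
ax2.bar(x, y)                  # bar chart of the same values
ax2.set_title("Bar chart")
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # PNG bytes instead of a file on disk
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```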
Scikit-Learn
Scikit-Learn is a Python module that integrates many state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, so it has plenty of applications.
Of all the libraries we have talked about in this guidebook, Scikit-Learn is one of the best Python options when it comes to machine learning. The package focuses on bringing machine learning to non-specialists through a general-purpose, high-level language, and its primary emphasis is on ease of use, performance, documentation, and API consistency.
Another benefit of this library is that it has minimal dependencies and is easy to distribute, and you will find it in many commercial and academic settings. Scikit-Learn exposes a concise and consistent interface to the most common machine learning algorithms, which makes it easier to add machine learning to your data science work.
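To illustrate the consistent interface described above, the following sketch trains two quite different classifiers on the Iris dataset that ships with the library. It assumes scikit-learn is installed. Both models use exactly the same fit and score calls. The choice of models and the 75/25 train/test split are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)                                  # same call for every estimator
    scores[type(model).__name__] = model.score(X_test, y_test)   # mean accuracy on held-out data

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```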
Theano
Theano is another great library to work with during data science, and it is often seen as one of the most highly rated libraries for this work. It lets you define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can also use GPUs and perform efficient symbolic differentiation.
Theano is worth learning, but it comes with a fairly steep learning curve, especially for people who learned programming with Python. Declaring variables and building functions in Theano works quite differently from the Python you have learned.
However, this doesn't mean that the process is impossible. It just means that it takes a bit longer to learn. With some good tutorials and examples, someone who is brand new to Theano can get this coding done. Other great Python libraries, including Pandas and NumPy, can make this a bit easier as well.
TensorFlow
TensorFlow, one of the best Python libraries for data science, was released by Google Brain. It is written mostly in C++ but includes Python bindings, so performance is not something you need to worry about.
One of the best features of this library is its flexible architecture. It lets the programmer deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device while using the same API the whole time.
Few, if any, of the other libraries in this chapter can make this claim. This library is also notable for having been developed by the Google Brain project. You may need to spend a bit more time learning its API than with some of the other libraries. Once you do, you will find that you can use TensorFlow to implement your network design without having to fight the API the way you do with some other options.
Keras
Keras is an open-source Python library that helps you build neural networks through a high-level interface. It is minimalist, which makes it easier to work with. Its code is simple and straightforward while still offering the high-level extensibility that you need. It runs on top of TensorFlow, Theano, or CNTK as the backend. Remember that the Keras API is designed for humans rather than machines, which makes it easier to use and puts the user experience front and center.
Keras follows best practices for reducing cognitive load. It offers consistent and simple APIs, minimizes the number of user actions required for common use cases, and provides clear, actionable feedback when an error does show up.
In Keras, a model is understood as a sequence or a graph of standalone, fully configurable modules that can be plugged together with as few restrictions as possible. Neural layers, optimizers, activation functions, initialization schemes, cost functions, and regularization schemes are all examples of standalone modules that you combine to create a new model. Keras also makes it simple to create new modules, and the existing modules provide lots of examples to work from.
Caffe
The final Python library that we will look at for data science is Caffe. This is a good machine learning library to work with when you want to focus on computer vision. Programmers use it to create deep neural networks that recognize objects in images, and it has also been explored for recognizing visual style.
Caffe offers seamless integration with GPU training and is highly recommended any time you want to train on images. Although this library is preferred for research and academic work, it also has plenty of scope for training models for production. Its expressive architecture encourages application and innovation.
In this library, models and optimization are defined through configuration, without hard coding. You can even switch between CPU and GPU by setting a single flag. This lets you train on a GPU machine and then deploy to commodity clusters, or even to mobile devices.
These are just a few of the libraries you can use with Python, and they will help you get the best results any time you want to explore data science. The standard library that comes with the original Python download cannot handle every part of data science. However, you can easily download and add these libraries and see exactly which steps they help with when it comes to gathering, cleaning, analyzing, and using your data.
Chapter 15: 
                     
Python Functions
Python functions are a good way of organizing the structure of our code. Functions group related sections of code together. In any programming language, the job of functions is to make code more modular and to make it possible to reuse code.
Python comes with many built-in functions. A good example of such a function is the “print()” function, which we use to display content on the screen. We can also create our own functions in Python. Such functions are referred to as “user-defined functions”.
To define a function, we use the “def” keyword, followed by the name of the function and then a pair of parentheses (()).
The parameters, or input arguments, have to be placed inside the parentheses. The function body (the code block) must begin after a colon (:) and has to be indented. Note that by default, arguments are positional. This means that they must be passed in the same order in which the parameters were defined.
Example:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)
We have defined a function named functionExample. A function's parameters act like variables for the function and are usually placed inside the parentheses, but the function above has no parameters. When you run the code above, nothing happens, because we only defined the function and specified what it should do. We never called it. The function can be called as shown below:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
functionExample()
It will print this:
The function code to run
That is how to define and call a basic Python function.
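Because arguments are positional by default, as noted above, Python matches them to parameters purely by order. This small sketch uses a hypothetical describe function to show that swapping the arguments still runs but silently produces the wrong meaning:

```python
def describe(name, age):
    # Arguments are matched by position: the first goes to name, the second to age
    return f"{name} is {age} years old"

right = describe("Ada", 36)
swapped = describe(36, "Ada")   # no error is raised, but the meaning is wrong
print(right)     # Ada is 36 years old
print(swapped)   # 36 is Ada years old
```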


Function Parameters
You can define parameters so that a function works with whatever values are passed in when it is called. Example:
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(10, 5)
The code prints the following result:
The first number is 10
The second number is 5
The sum is 15
We defined a function named additionFunction. The function takes two parameters, namely n1 and n2. Inside it, a variable named result holds the sum of the two parameters. In the last statement, we called the function and passed values for both parameters. The function calculates the value of result by adding the two numbers, and we get the result shown above.
Note that in our function definition, we specified two parameters, n1 and n2. Try calling the function with either more than two arguments or just one, and see what happens. Example:
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(5)
In the last statement of the code above, we pass only one argument, 5, to the function. The program raises an error when executed:
TypeError: additionFunction() missing 1 required positional argument: 'n2'
The error message simply tells us that one argument is missing. What if we call the function with more than two arguments?
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(5, 10, 9)
We also get an error message:
TypeError: additionFunction() takes 2 positional arguments but 3 were given
The error message tells us that the function expects two arguments, but we passed three.
In many programming languages, parameters can be passed to a function either by reference or by value. Python always passes a reference to the object itself (often described as “pass by object reference”). This means that if the function changes the object the parameter refers to, for example by modifying a list in place, the same change is also visible in the calling code. Example:
#!/usr/bin/python3
def referenceFunction(ls1):
    print("List values before change: ", ls1)
    ls1[0] = 800
    print("List values after change: ", ls1)
    return
# Calling the function
ls1 = [940, 1209, 6734]
referenceFunction(ls1)
print("Values outside function: ", ls1)
The code gives this result:
List values before change:  [940, 1209, 6734]
List values after change:  [800, 1209, 6734]
Values outside function:  [800, 1209, 6734]
In this example, the function keeps the reference to the object that was passed in and changes an element of that same list, so the change is also visible outside the function.
In the next example, we again pass a reference, but inside the called function the parameter is reassigned to a brand-new list:
#!/usr/bin/python3
def referenceFunction(ls1):
    ls1 = [11, 21, 31, 41]
    print("Values inside the function: ", ls1)
    return
ls1 = [51, 91, 81]
referenceFunction(ls1)
print("Values outside function: ", ls1)
The code gives this result:
Values inside the function:  [11, 21, 31, 41]
Values outside function:  [51, 91, 81]
Note that the “ls1” parameter is local to the function “referenceFunction”. Reassigning it inside the function does not affect the “ls1” list outside it in any way. As the output shows, the function has no effect on the caller's list.
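The two examples above can be summarized in one sketch. The function names mutate, rebind, and scaled are hypothetical, chosen for illustration. Changing the object through the parameter is visible to the caller, but rebinding the parameter name is not. Returning a new object is the usual way to hand a changed value back.

```python
def mutate(items):
    items.append(4)                  # changes the list object the caller also holds

def rebind(items):
    items = [0, 0, 0]                # points the local name at a new list; caller unaffected

def scaled(items):
    return [x * 10 for x in items]   # builds a new list and hands it back

data = [1, 2, 3]
mutate(data)
print(data)                          # [1, 2, 3, 4]
rebind(data)
print(data)                          # [1, 2, 3, 4]
data = scaled(data)
print(data)                          # [10, 20, 30, 40]
```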
Function Parameter Defaults
Function parameters can have default values, which the person writing the function sets in its definition. Whoever calls the function can then either rely on a default or supply their own value. Parameters with defaults must come after all parameters without defaults in the function definition. Example:
#!/usr/bin/python3
def myFunction(n1, n2=6):
    pass
In the example above, the parameter n2 has been given a default value, unlike n1, and it is written last in the parameter list. Here is how a function with a default parameter behaves when it is called:
#!/usr/bin/python3
def windowFunction(width, height, font='TNR'):
    # printing everything
    print(width, height, font)
windowFunction(245, 278)
The code outputs the following:
245 278 TNR
The parameter font has been given a default value, TNR. In the last line of the code above, we passed only two arguments to the function: the values for the width and height parameters. However, when called, the function printed the values of all three parameters. This means that for a parameter with a default, we don't need to specify its value, or even mention it, when calling the function.
However, you can still specify a value for that parameter when calling the function. If you pass a value different from the default, the parameter takes the new value instead. Example:
#!/usr/bin/python3
def windowFunction(width, height, font='TNR'):
    # printing everything
    print(width, height, font)
windowFunction(245, 278, 'GEO')
The program outputs this:
245 278 GEO
Above, the parameter font was given the default value “TNR”. When calling the function in the last line of the code, we specified a different value for this parameter, “GEO”, and the function printed “GEO”. The default value was overridden.
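The following minimal sketch covers both points from this section. It uses a variant of the earlier myFunction that returns a value; this body is an assumption, because the original only contained pass. The sketch also shows that Python rejects a definition in which a parameter with a default comes before one without. compile() is used so that the error can be caught instead of stopping the script.

```python
def myFunction(n1, n2=6):
    return n1 + n2            # illustrative body; the original example only had pass

print(myFunction(4))          # 10: n2 falls back to its default of 6
print(myFunction(4, 1))       # 5: the default is overridden

# A defaulted parameter placed before a required one is rejected
# when Python compiles the definition, before anything runs.
bad_source = "def broken(n1=6, n2):\n    pass\n"
try:
    compile(bad_source, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)               # True
```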

Chapter 16: 
                     
The Basics of Working with Python
Before we start working with machine learning algorithms, you should first understand the basics of working with Python. If you are already familiar with Python, or you have programming experience in other languages such as C++ or C#, you can probably skip this chapter or simply use it to refresh your memory.
In this chapter we briefly discuss the basic concepts of working with Python. Machine learning and Python go hand in hand for the simple reason that Python is a simple yet powerful and versatile language. Furthermore, there are many modules, packages, and tools designed to extend Python's functionality specifically for machine learning algorithms, as well as data science.
Keep in mind that this is a brief introduction to Python, so we will not be using any IDEs or fancy tools. All you need is the Python shell, in order to test and experiment with your code as you learn. You don't even need to install anything on your computer, because you can simply head to Python's official website and use its online shell. You can find it here:
https://www.python.org/shell/.
Data Types
Knowing the basic data types and how they work is a must. Python has several data types, and in this section we will briefly describe each one and then see it in practice. Don't forget to practice on your own as well, especially if you know little or nothing about Python.
With that in mind, let's explore numbers, strings, lists, dictionaries, and more!
Numbers
In Python, just as in math in general, there are several categories of numbers to work with. Python 3 has integers, floating-point numbers (floats), and complex numbers; Python 2's separate long type was merged into int. The most commonly used ones are integers and floats. Python decides which one you mean from the way you write the number: 26 is an integer, while 26.0 is a float.
Integers, written int for short, are whole numbers that can be either positive or negative. Floats are decimal or fractional numbers. So if you want a value to be an integer, make sure you write it without a decimal point.
Now let's discuss the mathematical operators. Just like in elementary school, you will often work with basic mathematical operations such as addition, subtraction, multiplication, and division. Keep in mind that these are different from the comparison operators, such as greater than, less than, or equal to. Now let's see some examples in code:
x = 99
y = 26
print(x + y)
This basic operation simply prints the sum of x and y, which is 125. You can use this syntax for all the other mathematical operators, no matter how complex your calculation is. Now let's type a command using a comparison operator instead:
x = 99
y = 26
print(x > 100)
As you can see, the syntax is the same. However, we aren't performing a calculation. Instead, we are checking whether the value of x is greater than 100. The result you will get is False, because 99 is not greater than 100.
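The other arithmetic and comparison operators follow the same pattern. The following short sketch reuses x and y from above. Note that / always produces a float, while // and % give the integer quotient and remainder:

```python
x = 99
y = 26

print(type(x), type(2.5))        # <class 'int'> <class 'float'>
print(x - y)                     # 73
print(x * y)                     # 2574
print(x / y)                     # true division always returns a float
print(x // y)                    # 3  (floor division: whole times y fits into x)
print(x % y)                     # 21 (remainder: 99 - 3 * 26)
print(2 ** 10)                   # 1024 (exponent)
print(x == 99, x != y, x <= y)   # True True False
```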
Next, you will learn what strings are and how you can work with them.
Strings
Strings are all about text, whether it's a letter, a digit, or a punctuation mark. However, take note that numbers written as strings are not the same as the number data types. Anything can be defined as a string, but to do so you need to place quotation marks before and after it. Let's take a look at the syntax:
n = "20"
x = 10
Notice that our n variable is a string and not a number, while x is an integer because it lacks the quotation marks. There are many operations you can perform on strings. For instance, you can check how long a string is, or you can concatenate several strings. Let's see how many characters there are in the word "Hello" by using the following function:
len("Hello")
The len function returns the number of characters, which in this case is five. Here's an example of string concatenation. You'll notice that it looks similar to a mathematical operation, but with text:
'42 ' + 'is ' + 'the ' + 'answer'
The result will be "42 is the answer". Pay attention to the syntax: we left a space at the end of each string except the last one. Spaces count as characters in a string, so if we hadn't added them, all of our strings would be concatenated into one word.
Another popular operation is iterating over a string. Here's an example:
bookTitle = "Lord of the Rings"
for c in bookTitle:
    print(c)
The loop prints every character in the string, one per line. Python contains many more string operations. However, these are the ones you will use most often.
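The following sketch shows a few more common string operations, using the same values as above. The key point is that a number stored as a string must be converted with int() before you can do arithmetic with it. Likewise, a number must be converted with str() before it can be concatenated to a string:

```python
n = "20"
x = 10

# A string of digits is still text: convert before mixing with numbers
print(int(n) + x)           # 30   (arithmetic)
print(n + str(x))           # 2010 (concatenation)

title = "Lord of the Rings"
print(len(title))           # 17
print(title[0], title[-1])  # L s  (first and last character)
print(title[:4])            # Lord (a slice)
print(title.upper())        # LORD OF THE RINGS
print(title.split())        # ['Lord', 'of', 'the', 'Rings']
print("Rings" in title)     # True
```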
Now let’s progress to lists.
Lists
This is a data type that you will use often. Lists store data, and they can be changed as needed. Furthermore, you can store objects of different types in them. Here's what a Python list looks like:
n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
The square brackets define the list, and every object separated by a comma
is a list element. Here's an example of a list containing different data types:
myBook = ["title", "somePages", 1, 2.1, 5, 22, 42]
This list holds strings as well as integers and a float. You can also perform many operations on lists. Several of them, such as len(), indexing, slicing, and concatenation with +, use the same syntax as for strings. Try them out!
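The following brief sketch demonstrates a few list operations. It shows which ones mirror string syntax and which ones, such as append() and item assignment, work only because lists can be changed:

```python
n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(len(n))        # 10, the same len() as for strings
print(n[0], n[-1])   # 1 10, indexing works as it does for strings
print(n[2:5])        # [3, 4, 5], and so does slicing
n.append(11)         # lists are mutable: add an element at the end
n[0] = 100           # or replace one in place
print(n[:3], n[-1])  # [100, 2, 3] 11
print(n + [12])      # + builds a new, longer list; n itself is unchanged
print(5 in n)        # True
```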
Dictionaries
Like a list, a dictionary holds a collection of objects, but you don't access its elements by position. Instead, each value is stored under a key, and you need to know the key to look the value up. Take a look at the following example:
units = {'weapon': 'sword', 'soldier': 'archer'}
units['weapon']
The first line contains the dictionary's definition. As you can see, the keys and their values are stored between curly braces. You can identify the keys, "weapon" and "soldier", because each one is followed by a colon and then its value. Keep in mind that while the keys in this example are strings, they can be other immutable data types as well, such as numbers or tuples.
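The following short sketch demonstrates everyday dictionary operations, using the same example. The added keys and values are made up for illustration:

```python
units = {'weapon': 'sword', 'soldier': 'archer'}

print(units['weapon'])             # sword
units['armor'] = 'chainmail'       # add a new key/value pair
units['soldier'] = 'knight'        # replace the value stored under an existing key
print(units.get('horse', 'none'))  # none: get() returns a default for a missing key
print('weapon' in units)           # True: 'in' checks the keys
for key, value in units.items():   # loop over key/value pairs
    print(key, '->', value)

# Keys can be other immutable types too, such as tuples
grid = {(0, 0): 'start', (2, 3): 'goal'}
print(grid[(2, 3)])                # goal
```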
Tuples
This data type is similar to a list, except that its elements cannot be changed once it is defined. Here's an example of a tuple:
n = (1, 43, 'someText', 99, [1, 2, 3])
A tuple is defined between parentheses. In this case, it holds three different data types: a few integers, a string, and a list. You can perform many of the same operations on a tuple as on lists and strings. The difference is that once you declare a tuple, you cannot change it later.
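The following small sketch shows that the example tuple supports indexing, slicing, and len(), but rejects item assignment. One subtlety is that the tuple only fixes which objects it holds. The list stored inside it can therefore still be changed in place.

```python
n = (1, 43, 'someText', 99, [1, 2, 3])

print(n[1], len(n))      # 43 5
print(n[1:3])            # (43, 'someText')

try:
    n[0] = 5             # tuples do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# The tuple's own slots are fixed, but a mutable object stored in
# one of them (here, the list) can still be modified in place.
n[4].append(4)
print(n[4])              # [1, 2, 3, 4]
```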
Conditional Statements
Now that you know the basic data types, it's time for a crash course on more complex operations that involve conditional statements. A conditional statement gives an application a limited ability to think for itself and make a decision based on its assessment of the situation. In other words, it checks a condition, such as the value of a variable, and tells the program how to react based on the outcome of that check.
Python statements are easy to understand because they are logical, and the syntax reflects human thinking. For instance, written in English, the logic looks like this: "If I don't feel well, I won't go anywhere. Else, I will have to go to work." In this example, we instruct the program to check whether you feel unwell. If that condition is false, it means you feel well, and therefore the program moves on to the "else" branch. Both “if” and “if/else” conditionals are used frequently in programming in general. Here's an example of the syntax:
x = 100
if x < 100:
    print("x is small")
This is the most basic form of the statement. It checks whether the condition is true. If it is, the indented line runs; if it isn't, nothing happens. Since x is exactly 100 here, x < 100 is false and nothing is printed. Here's an example that uses the else statement as well:
x = 100
if x < 100:
    print("x is small")
else:
    print("x is large")
print("Print this no matter what")
With the added “else” keyword, we instruct the application to perform a different task when the condition is false. Furthermore, we have a separate statement that lies outside of the conditional statement. It will be executed no matter the outcome.
Another type of conditional uses "elif", which allows the application to check several conditions in turn before it makes a decision. Here's an example:
if (condition1):
    add a statement here
elif (condition2):
    add another statement for this condition
elif (condition3):
    add another statement for this condition
else:
    if none of the conditions apply, do this
Take note that this time we did not write actual code. You already know enough about Python syntax and conditionals to turn all of this into code. What we have here is pseudocode, which is very handy whether you are writing simple Python exercises or working with machine learning algorithms. Pseudocode lets you put your thoughts on "paper" following the structure of a Python program. This makes it a lot easier to organize your ideas and your application, and then to write the code once you've outlined it. With that being said, here's the actual code:
x = 10
if x > 10:
    print("x is larger than ten")
elif x < 4:
    print("x is smaller")
else:
    print("x is pretty small")
Now you know the essentials of conditionals. Use them in combination with what you learned about data types in order to practice. Keep in mind that you need to practice these basic Python concepts in order to understand later how machine learning algorithms work.
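As one practice exercise, the following sketch turns the English example from earlier into code. The variable names and the extra elif branch are illustrative assumptions. The second part shows that conditions can also be combined with and:

```python
feel_well = False
is_weekend = False        # hypothetical extra condition, to show elif

# "If I don't feel well, I won't go anywhere. Else, I will go to work."
if not feel_well:
    plan = "stay home"
elif is_weekend:
    plan = "go out"
else:
    plan = "go to work"
print(plan)               # stay home

# Conditions can be combined with 'and' / 'or'
x = 7
if x > 4 and x <= 10:
    print("x is between 5 and 10")
```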
Loops
Code sometimes needs to run repeatedly until a specific condition is met. This is what loops are for. There are two types, the for loop and the while loop. Let's begin with the for loop:
for x in range(1, 10):
    print(x)
This code runs several times, printing the value of x each time. It prints the numbers 1 through 9: range(1, 10) stops just before 10, so the loop never prints ten.
The while loop, on the other hand, repeats the execution of a code block for as long as the condition we set remains true. When the condition is no longer met, the loop ends, and the application continues with the next lines of code. Here's a while loop in action:
x = 1
while x < 10:
    print(x)
    x += 1
The x variable starts as the integer 1, and we instruct the program to print x as long as it is less than ten. The final statement, x += 1, adds one to x on every pass. Without it, x would never change, the condition would stay true forever, and you would create an infinite loop, which is not something you want. Once x reaches ten, the condition is no longer met. The loop ends after printing 1 through 9, and the application continues executing any code that follows.
Keep in mind that infinite loops can easily happen through mistakes and oversight. Luckily, Python has a solution, the "break" statement, which exits the loop immediately. It is usually placed inside an if statement in the loop body. Here's an example:
while True:
    answer = input("Type command:")
    if answer == "Yes":
        break
Now the loop ends as soon as the user types Yes.
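Because input() waits for the keyboard, the following sketch substitutes a fixed list of made-up commands, read with next(), so the break example can run unattended. It starts by printing two range() calls to show that the stop value is excluded:

```python
print(list(range(1, 10)))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]: the stop value 10 is excluded
print(list(range(1, 11)))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A scripted stand-in for input(), so the example runs without a keyboard
commands = iter(["help", "status", "Yes", "never reached"])

while True:
    answer = next(commands)   # in a real program: answer = input("Type command:")
    print("Got:", answer)
    if answer == "Yes":
        break                 # exit the loop immediately

print("Loop finished after", answer)
```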
Functions
As a beginning machine learner, this is the final Python component you need to understand before learning the cool stuff. Functions make your programs better organized, quicker to write, and easier to work with. They can significantly reduce the amount of code you have to type, because you write a block once and reuse it wherever you need it. Here's an example of the most basic function, to give you an idea of the syntax:
def myFunction():
    print("Hello, I am now a function!")
Functions are declared with the “def” statement, followed by the function's name. Whenever we want to run this block of code, we simply call the function instead of writing the whole code again. For instance, you simply type:
myFunction()
The parentheses after the function name are where you list its parameters. Parameters let you change what the function does with the values you pass in, like this:
def myName(firstname):
    print(firstname + " Smith")
myName("Andrew")
myName("Peter")
myName("Sam")
Here we have a firstname parameter. Whenever we call the function, it prints the value we passed in followed by the word "Smith". This is a really basic example, just so you get a feel for the syntax, but more complex functions are written the same way.
Here's another example with a default parameter value, which is used only when no argument is passed in its place:
def myHobby(hobby="leatherworking"):
    print("My hobby is " + hobby)
myHobby("archery")
myHobby("gaming")
myHobby()
myHobby("fishing")
The calls print the following:
My hobby is archery
My hobby is gaming
My hobby is leatherworking
My hobby is fishing
As you can see, the call without an argument uses the default value we set.
In addition, you can write functions that return something. So far, we have only written functions that perform an action without returning any values or results. Functions that return a value are often more useful, because the result can be stored in a variable and used later in another operation. Here's how the syntax looks in this case:
def square(x):
    return x * x
print(square(5))
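The following short sketch shows why returned values are useful. The result of square() can be stored and reused, while a function that only prints hands back None. The greet function is a hypothetical example.

```python
def square(x):
    return x * x

area = square(5)               # store the returned value...
total = area + square(3)       # ...and reuse it in another operation
print(area, total)             # 25 34

def greet(firstname):
    print("Hello, " + firstname)   # performs an action but returns nothing

result = greet("Andrew")       # prints "Hello, Andrew"
print(result)                  # None: a function without return gives back None
```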
Now that you've gone through a brief Python crash course and you
understand the basics, it's time to learn how to use the right tools and how
to set up your machine learning environment. Don't forget that Python is
only one component of machine learning. However, it's an important one
because it's the foundation, and without it, everything falls apart.
Chapter 17: 
                     
Data Structures and the A* Algorithm
In this chapter, you will learn how to create abstract data structures using the same Python data types you already know. Abstract data structures allow your programs to process data in intuitive ways and to follow the Don't Repeat Yourself (DRY) principle, which means using less code instead of typing out the same operations repeatedly for each case. As you study the examples, you will notice a pattern emerging: pairs of classes that complement each other, with one acting as a node and the other as a container of nodes. In computer science, a structure of linked nodes arranged as a hierarchy, starting from a single root node, is generally referred to as a tree. There are many different types of trees, each with specialized use cases. If you are interested in programming or computer science at all, you may have already heard of binary trees.
One possible type of tree is called an n-ary tree. Unlike a binary tree, whose nodes have at most two children, an n-ary tree contains nodes that can have an arbitrary number of children. A child is simply another instance of a node that is linked to a node above it, called its parent. The parent must have some mechanism for linking to its child nodes, and the easiest way to do this is with a list of objects.
Example Coding #1: A Mock File-System
A natural application of the n-ary tree is a traditional Windows or UNIX file system, in which nodes can be either folders (directories) or individual files. To keep things simple, the following program assumes a single directory as the tree's root.
# ch1a.py
The FileSystem class acts as the tree, and the Node class does most of the work, which is common with tree data structures. Notice also that FileSystem keeps track of an individual ID for each node. The IDs can be used to count the nodes in the file system or to provide lookup functionality.
When it comes to trees, the most onerous task is usually programming a
solution for traversal. The usual way a tree is structured is with a single
node as root, and from that single node, the rest of the tree can be accessed.
Here the function look_up_parent uses a loop to traverse the mock directory
structure, but it can easily be adapted to a recursive solution as well. 
General usage of the program is as follows: instantiate the FileSystem class, declare Node objects with Windows-style directory paths (with each backslash doubled, or the path written as a raw string, so that Python won't mistake the backslashes for escape characters), and then call the add method on them.
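The ch1a.py listing itself is not reproduced in this text. The following is a minimal sketch, not the book's original code, of how the pieces described above might fit together: a FileSystem that hands out IDs and a look_up_parent method that walks down from the root in a loop. The attribute names (path, id, children, next_id) and the use of raw strings for backslash paths are our own assumptions.

```python
class Node:
    """A file or directory in the mock file system (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path      # e.g. r"root\docs\notes.txt"
        self.id = None        # assigned by FileSystem.add
        self.children = []

    @property
    def name(self):
        return self.path.split("\\")[-1]  # last path component


class FileSystem:
    """The tree: one root directory plus a counter that hands out IDs."""

    def __init__(self, root_name="root"):
        self.next_id = 0
        self.root = Node(root_name)
        self._assign_id(self.root)

    def _assign_id(self, node):
        node.id = self.next_id
        self.next_id += 1

    def look_up_parent(self, path):
        """Walk down from the root one component at a time (a loop, not recursion)."""
        parts = path.split("\\")
        if parts[0] != self.root.name:
            raise ValueError(f"path must start at {self.root.name!r}")
        current = self.root
        for part in parts[1:-1]:          # every component except the last
            for child in current.children:
                if child.name == part:
                    current = child
                    break
            else:                          # no break: that directory does not exist
                raise KeyError(f"directory {part!r} not found in {path!r}")
        return current

    def add(self, node):
        parent = self.look_up_parent(node.path)
        parent.children.append(node)
        self._assign_id(node)
        return node


fs = FileSystem()
fs.add(Node(r"root\docs"))
fs.add(Node(r"root\docs\notes.txt"))
fs.add(Node(r"root\music"))
print(fs.next_id)  # 4 nodes, including the root
```

Raw strings (r"...") keep each backslash as a literal character; a plain string such as "root\notes" would turn \n into a newline.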
Example Coding # 2: Binary Search Tree (BST)
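The binary search tree gets the "binary" part of its name from the fact that a node can contain at most two children. While this may sound like a restriction, it is actually a helpful one because the tree becomes intuitive to traverse. The "search" part comes from its ordering rule: values smaller than a node go into its left subtree and the rest go into its right subtree, so a search only has to follow one branch at each step. An n-ary tree, in contrast, can be messy.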
# ch1b.py
As before, the Node class does most of the heavy lifting. This program uses a BST primarily to sort a list of numbers, but it can be generalized to sorting any data type whose values can be compared with one another. There are also a number of auxiliary methods for finding out the size of the tree and which nodes are childless (leaves).
This implementation better illustrates the role that recursion plays in traversing a tree: each node calls a method (for example, insert) on one of its children, creating a chain of calls until a base case, such as an empty child position, is reached.
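The ch1b.py listing is not reproduced here either, so the following is a minimal sketch, not the book's code, of a recursive BST with insert, in-order traversal (which yields the values sorted), size, and leaves. The names BSTNode, in_order, and build_tree are hypothetical.

```python
class BSTNode:
    """A binary search tree node: smaller values go left, the rest go right."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        # Recursive case: hand the value down to the proper child.
        # Base case: that child slot is empty, so the new node goes there.
        if value < self.value:
            if self.left is None:
                self.left = BSTNode(value)
            else:
                self.left.insert(value)
        else:
            if self.right is None:
                self.right = BSTNode(value)
            else:
                self.right.insert(value)

    def in_order(self):
        """Left subtree, this node, right subtree: values come out sorted."""
        if self.left is not None:
            yield from self.left.in_order()
        yield self.value
        if self.right is not None:
            yield from self.right.in_order()

    def size(self):
        """Number of nodes in this subtree."""
        left = self.left.size() if self.left is not None else 0
        right = self.right.size() if self.right is not None else 0
        return 1 + left + right

    def leaves(self):
        """Values of the childless nodes, from left to right."""
        if self.left is None and self.right is None:
            return [self.value]
        found = []
        for child in (self.left, self.right):
            if child is not None:
                found += child.leaves()
        return found


def build_tree(values):
    """Insert every value of a non-empty list into a new BST and return its root."""
    root = BSTNode(values[0])
    for v in values[1:]:
        root.insert(v)
    return root


root = build_tree([8, 3, 10, 1, 6, 14, 4, 7, 13])
print(list(root.in_order()))  # [1, 3, 4, 6, 7, 8, 10, 13, 14]
print(root.size())            # 9
print(root.leaves())          # [1, 4, 7, 13]
```

Any values that support < can be stored, which is what lets the same tree sort strings as well as numbers.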
Example Coding # 3: A* Algorithm
The A* search algorithm is often described as Dijkstra's algorithm with brains. Whereas Dijkstra's algorithm searches almost exhaustively outward from the start until the path is found, A* uses what is called a heuristic, which is a fancy way of saying "educated guess." A* is usually faster because the heuristic points it toward the target, so it explores fewer nodes that lead away from the goal.
First, here's a brief explanation of the algorithm. To simplify things, we will use a square grid with orthogonal movement only (no diagonals). The goal of A* is to find the shortest path from point A to point B, where the position of B is known in advance. B will be the end node and A the start node. To get from A to B, the algorithm scores each candidate node by adding the cost of reaching it from A to an estimate of its remaining distance to B, and it always explores the lowest-scoring node next. An easy way to program this is with a heap, or priority queue, that uses this score as its sort order.
After the start node is added to the heap, the algorithm repeatedly removes the node with the lowest score, evaluates each of its neighbors, and pushes them onto the heap with their own scores. The process repeats until the node removed from the heap is B.
#ch1c.py
In this case, the heuristic is the Manhattan distance: the sum of the absolute differences between the coordinates of the current node and those of the target, |x1 - x2| + |y1 - y2|. The heapq module is used to create a priority queue with f as the priority. Note that the backtrace function simply follows a chain of nodes, each of which has a single parent, from the target back to the start.
You can think of the g variable as the cost of moving from the starting point to a node along the path. Since every move on our grid costs the same, g is simply the number of steps taken so far. The h variable is the estimated distance between the current node and the target. Adding these two together gives you the f variable (f = g + h), which controls the order in which nodes are taken from the priority queue.
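The ch1c.py listing is not reproduced in this text. As a rough sketch, not the book's code, here is grid-based A* under the assumptions above: 0 marks an open cell, 1 a wall, moves are up/down/left/right at a cost of 1, and h is the Manhattan distance. The function names a_star, manhattan, and backtrace are illustrative.

```python
import heapq


def manhattan(a, b):
    """h: sum of absolute coordinate differences on an orthogonal grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def backtrace(parents, node):
    """Follow single-parent links from the goal back to the start."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path[::-1]


def a_star(grid, start, goal):
    """Shortest path on a grid of 0 (open) / 1 (wall), or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                                    # best known cost from start
    parents = {}
    open_heap = [(manhattan(start, goal), 0, start)]  # entries are (f, g, node)
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)      # lowest f comes out first
        if node == goal:
            return backtrace(parents, node)
        if cost > g[node]:
            continue                                  # stale entry: a cheaper route was found
        r, c = node
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = cost + 1                      # every step costs the same
                if new_g < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_g
                    parents[(nr, nc)] = node
                    f = new_g + manhattan((nr, nc), goal)   # f = g + h
                    heapq.heappush(open_heap, (f, new_g, (nr, nc)))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

Storing (f, g, node) tuples lets heapq order entries by f without a custom comparison; the stale-entry check replaces the "decrease key" operation that heapq does not provide.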

Chapter 18: Reading data in your script
Reading data from file
Let's make our data file using Microsoft Excel, LibreOffice Calc, or some other spreadsheet application and save it as a tab-delimited file called ingredients.txt:

food             carb  fat  protein  calories  serving size
pasta            39    1    7        210       56
parmesan grated  0     1.5  2        20        5
Sour cream       1     5    1        60        30
Chicken breast   0     3    22       120       112
Potato           28    0    3        110       148
Fire up your IPython notebook server. Using the New drop-down menu in
the top right corner, create a new Python3 notebook and type the following
Python program into a code cell:
# open file ingredients.txt
with open('ingredients.txt', 'rt') as f:
    for line in f:      # read lines until the end of file
        print(line)     # print each line
Remember that indentation is important in Python programs: it designates nested statements. Run the program using the menu option Cell/Run, the right arrow button, or the Shift-Enter keyboard shortcut. You can have many code cells in your IPython notebooks, but only the currently selected cell is run. Variables generated by previously run cells are accessible, but if you just downloaded a notebook, you need to run all the cells that initialize variables used in the current cell. You can run all the code cells in the notebook by using the menu option Cell/Run All or Cell/Run All Above.
This program will open a file called ingredients.txt and print it line by line. The with statement uses the open file as a context manager: it opens the file and makes it available to the nested statements as f. Here, it is used as an idiom to ensure that the file is closed automatically after we are done reading it. The mode 'rt' means the file is opened for reading in text mode. The indentation before for is required: it shows that for is nested in with and has access to the variable f designating the file. The print function is nested inside for, which means it will be executed for every line read from the file until the end of the file is reached and the for loop ends. Because each line keeps its own newline character and print() adds another, the output appears double-spaced. It takes just three lines of Python code to iterate over a file of any length.
Now, let's extract fields from every line. To do this, we will use the string method split(), which splits a line and returns a list of substrings. By default, it splits the line at every run of white space, but our data is delimited by tabs, so we will split on the tab character instead. The tab character is written as \t in Python.
with open('ingredients.txt', 'rt') as f:
    for line in f:
        fields = line.split('\t')  # split line into separate fields
        print(fields)              # print the fields
The output of this code is:
['food', 'carb', 'fat', 'protein', 'calories', 'serving size\n']
['pasta', '39', '1', '7', '210', '56\n']
['parmesan grated', '0', '1.5', '2', '20', '5\n']
['Sour cream', '1', '5', '1', '60', '30\n']
['Chicken breast', '0', '3', '22', '120', '112\n']
['Potato', '28', '0', '3', '110', '148\n']
Now, each line is conveniently split into a list of fields. The last field contains a pesky \n character designating the end of the line. We will remove it using the strip() method, which strips white space characters from both ends of a string.
After splitting the string into a list of fields, we can access each field using
an indexing operation. For example, fields[0] will give us the first field in
which a food’s name is found. In Python, the first element of a list or an
array has an index 0.
This data is not directly usable yet. All the fields, including those
containing numbers, are represented by strings of characters. This is
indicated by single quotes surrounding the numbers. We want food names
to be strings, but the amounts of nutrients, calories, and serving sizes must
be numbers so we could sort them and do calculations with them. Another
problem is that the first line holds column names. We need to treat it
differently.
One way to do this is to use the file object's method readline() to read the first line before entering the for loop. Another is to use the function enumerate(), which returns not only each line but also its number, starting with zero:
with open('ingredients.txt', 'rt') as f:
    # get the line number and the line itself
    # in i and line, respectively
    for i, line in enumerate(f):
        fields = line.strip().split('\t')  # split line into fields
        print(i, fields)                   # print line number and the fields
This program produces the following output:
0 ['food', 'carb', 'fat', 'protein', 'calories', 'serving size']
1 ['pasta', '39', '1', '7', '210', '56']
2 ['parmesan grated', '0', '1.5', '2', '20', '5']
3 ['Sour cream', '1', '5', '1', '60', '30']
4 ['Chicken breast', '0', '3', '22', '120', '112']
5 ['Potato', '28', '0', '3', '110', '148']
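The readline() alternative mentioned above can be sketched as follows. To keep the example self-contained, io.StringIO stands in for the real file (an assumption for illustration only); with the actual file you would use open('ingredients.txt', 'rt') instead.

```python
import io

# Stand-in for ingredients.txt so the example runs on its own.
sample = io.StringIO(
    "food\tcarb\tfat\tprotein\tcalories\tserving size\n"
    "pasta\t39\t1\t7\t210\t56\n"
    "Potato\t28\t0\t3\t110\t148\n"
)

with sample as f:
    header = f.readline().strip().split('\t')         # consume the first line only
    rows = [line.strip().split('\t') for line in f]  # iteration resumes at line 2

print(header)  # ['food', 'carb', 'fat', 'protein', 'calories', 'serving size']
print(rows)
```

Since readline() has already consumed the header, the loop never sees it, so no line-number check is needed.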
Now we know the number of the current line and can treat the first line differently from all the others. Let's use this knowledge to convert our data from strings to numbers. To do this, Python has the built-in function float(). We have to convert more than one field, so we will use a powerful Python feature called list comprehension.
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:                # if it is the first line
            print(i, fields)      # treat it as a header
            continue              # go to the next line
        food = fields[0]          # keep food name in food
        # convert numeric fields to numbers
        numbers = [float(n) for n in fields[1:]]
        # print line number, food name, and nutritional values
        print(i, food, numbers)
The if statement tests whether a condition is true. To check for equality, you need to use == (a single = is assignment). The index is 0 only for the first line, which is treated differently: we split it into fields, print it, and skip the rest of the loop body using the continue statement.
Lines describing foods are treated differently. After the line is split into fields, fields[0] holds the food's name. We keep it in the variable food. All other fields contain numbers and must be converted.
In Python, we can easily get a subset of a list by slicing. For instance, list1[x:y] returns a new list of the elements of list1 starting with index x and ending with index y-1. (You can also include a stride as a third value, list1[x:y:step]; see help.) If x is omitted, the slice contains the elements from the beginning of the list up to element y-1. If y is omitted, the slice goes from element x to the end of the list. The expression fields[1:] means every field except the first, fields[0].
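A few slices make these rules concrete. The list contents here are arbitrary example values.

```python
list1 = ['a', 'b', 'c', 'd', 'e', 'f']

print(list1[1:4])   # ['b', 'c', 'd']  indices 1, 2, 3 (stops before 4)
print(list1[:3])    # ['a', 'b', 'c']  from the start up to index 2
print(list1[3:])    # ['d', 'e', 'f']  from index 3 to the end
print(list1[::2])   # ['a', 'c', 'e']  optional third value: a stride of 2

fields = ['pasta', '39', '1', '7', '210', '56']
print(fields[1:])   # ['39', '1', '7', '210', '56']  everything except fields[0]
```

A slice always builds a new list, so modifying it does not change the original.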
numbers=[float(n) for n in fields[1:]]
creates a new list, numbers, by iterating over the fields from the second one onward and converting each of them to a floating-point number.
Finally, we want to reassemble the food's name with its nutritional values
already converted to numbers. To do this, we can create a list containing a
single element - food's name - and add a list containing nutrition data. In
Python, adding lists concatenates them.
[food] + numbers
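Putting the steps together for a single line: strip, split on tabs, convert the numeric fields, and concatenate. The line below is the pasta row from our file, written as a string literal so the example runs without the file.

```python
line = 'pasta\t39\t1\t7\t210\t56\n'          # one raw line from the file
fields = line.strip().split('\t')            # remove '\n', then split on tabs
food = fields[0]                             # the name stays a string
numbers = [float(n) for n in fields[1:]]     # every other field becomes a float
record = [food] + numbers                    # '+' concatenates two lists
print(record)  # ['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
```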
Dealing with corrupt data
Sometimes, just one line in a huge file is formatted incorrectly. For instance, it might contain a string that cannot be converted to a number. Unless handled properly, such a situation will crash the program. To handle it, we must use Python's exception handling: parts of a program that might fail should be enclosed in a try ... except block. In our program, one such error-prone part is the conversion of strings into numbers:
numbers=[float(n) for n in fields[1:]]
Let's wrap this line in a try ... except block:
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            print(i, fields)
            continue
        food = fields[0]
        try:                      # watch out for errors!
            numbers = [float(n) for n in fields[1:]]
        except ValueError:        # if a field is not a number
            print(i, line)        # print the offending line and its number
            print(i, fields)      # print how it was split
            continue              # go to the next line without crashing
        print(i, food, numbers)
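To see the exception handling in action without a broken file on disk, here is a small sketch that runs the same logic over a list of strings (a list works like a file in a for loop). The function name parse_lines and the corrupt "one and a half" field are made up for illustration.

```python
def parse_lines(lines):
    """Return (records, bad_line_numbers); the header at line 0 is skipped."""
    records, bad = [], []
    for i, line in enumerate(lines):
        fields = line.strip().split('\t')
        if i == 0:
            continue                           # header
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:                     # float() could not parse a field
            bad.append(i)
            continue                           # keep going instead of crashing
        records.append([fields[0]] + numbers)
    return records, bad


lines = [
    'food\tcarb\tfat\tprotein\tcalories\tserving size\n',
    'pasta\t39\t1\t7\t210\t56\n',
    'parmesan grated\t0\tone and a half\t2\t20\t5\n',   # corrupt on purpose
    'Potato\t28\t0\t3\t110\t148\n',
]
records, bad = parse_lines(lines)
print(records)  # the two good records
print(bad)      # [2]
```

Catching ValueError specifically, rather than using a bare except, avoids silently hiding unrelated bugs such as a misspelled variable name.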

Chapter 19: Manipulating data
Sorting data
In order to do something meaningful with the data, we need a container to
hold it. Let’s store information for each food in a list, and create a list of
these lists to represent all the foods. Having all the data conveniently in one
list allows us to sort it easily.
data = []      # create an empty list to hold data
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            header = fields       # remember the header
            continue
        food = fields[0].lower()  # convert to lower case
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:
            print(i, line)
            print(i, fields)
            continue
        # append food info to data list
        data.append([food] + numbers)
# sort list in place by food name
data.sort(key=lambda a: a[0])
for food in data:   # iterate over the sorted list of foods
    print(food)     # print info for each food
The output is:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
data=[] creates an empty list, and the append() method appends new elements to the list. The sort() method sorts lists in place. If the list contains simple values (such as numbers or strings), they are sorted from small to large or alphabetically by default. (Strings are compared character by character, and uppercase letters sort before lowercase ones, which is why we converted the names to lower case.) We have a list of complex data, and it is not obvious how to sort it. So, we pass a key parameter to the sort() method. This parameter is a function that takes an element of the list and returns a simple value that is used to order the elements in the list. In our case, we used a simple nameless lambda function that took the record for each food and returned its first element, the food's name. So we ended up with the list sorted alphabetically.
We could also sort the list by the second value, which represents the amount
of carbohydrates per serving. All we have to do is change the lambda
function that calculates the key:
data.sort(key=lambda a:a[1])
This will put the foods in a different order:
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
Of course, sorting by the amount of carbohydrates per serving doesn't make much sense, because serving sizes range from 5 grams for parmesan to 148 grams for potatoes. Ordering foods by the amount of protein per calorie might make more sense, since that value roughly reflects the "healthiness" of the food. Once again, all we need to do is change the key function:
data.sort(key=lambda a:a[3]/a[4])
The output is
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
We have the "unhealthiest" food on top. Perhaps we want to start with the healthiest one. To do this, we need to provide another parameter to the sort() method: reverse.
data.sort(key=lambda a:a[3]/a[4], reverse=True)
This will sort the list in descending order:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
Although it is easy to sort by one or several columns in traditional spreadsheet applications, it is much harder to sort by complex expressions that require calculations on values from several columns. In Python, it takes a single key function.
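Two variations on the key function are worth knowing: a named function instead of a lambda, and a tuple key that sorts by several columns at once. The sketch below uses the built-in sorted(), which returns a new list rather than sorting in place; the function name protein_per_calorie is our own.

```python
data = [
    ['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
    ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
    ['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0],
    ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
    ['potato', 28.0, 0.0, 3.0, 110.0, 148.0],
]

def protein_per_calorie(food):
    """The same key as lambda a: a[3]/a[4], written as a named function."""
    return food[3] / food[4]

# sorted() returns a new list and leaves data unchanged (unlike data.sort()).
healthiest = sorted(data, key=protein_per_calorie, reverse=True)
print([f[0] for f in healthiest])

# A tuple key sorts by several "columns": carbs first, then name to break ties.
by_carb_then_name = sorted(data, key=lambda a: (a[1], a[0]))
print([f[0] for f in by_carb_then_name])
# ['chicken breast', 'parmesan grated', 'sour cream', 'potato', 'pasta']
```

Tuples are compared element by element, so the second item only matters when the first items are equal.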
Filtering data
Having our data in a list allows us to filter it with one line of code using list comprehension, but this time we will use a new option for list comprehension: an if clause that allows us to exclude some elements from the new list:
data_filtered = [a for a in data if a[3]/a[4] > 0.09]
for food in data_filtered:
    print(food)
The filtered list is:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
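The same filter can also be written with the built-in filter() function. This sketch reuses the sorted data from above and simply shows that both forms keep the same foods; the variable name threshold is ours.

```python
data = [
    ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
    ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
    ['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
    ['potato', 28.0, 0.0, 3.0, 110.0, 148.0],
    ['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0],
]
threshold = 0.09  # grams of protein per calorie

# List comprehension with an if clause: keep only foods that pass the test.
rich = [a for a in data if a[3] / a[4] > threshold]

# filter() gives the same result; list() collects its lazily produced items.
rich_too = list(filter(lambda a: a[3] / a[4] > threshold, data))

print([a[0] for a in rich])  # ['chicken breast', 'parmesan grated']
```

Most Python programmers find the comprehension easier to read, which is why it is the form used in the text.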
Chapter 20: Probability – Fundamental – Statistics – Data Types
Things are quite straightforward in Knowledge Representation and Reasoning (KR&R) when there is no doubt: formulating and representing propositions is easy. Problems begin to arise when uncertainty makes itself known. Consider, for example, an expert system designed to replace a doctor. A doctor diagnosing a patient has no complete formal knowledge of the patient's condition and no official rules that map symptoms to diagnoses. In this situation, the expert system uses probability to determine which condition the patient most likely has and which cure is most likely to work.
Real-Life Probability Examples
As a mathematical term, probability describes how likely it is that an event will occur, such as taking a green piece out of a bag of assorted colors or drawing an ace from a deck of cards. You use probability in everyday decision-making, even when you have no clue what the consequences will be. Most of the time, you make judgment calls using subjective probability rather than working out an actual probability problem.
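When all outcomes are equally likely, a probability is simply favorable outcomes divided by possible outcomes. The sketch below uses Python's fractions module to keep exact values; the bag with 3 green pieces out of 10 is a hypothetical example.

```python
from fractions import Fraction

def probability(favorable, possible):
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(favorable, possible)

p_ace = probability(4, 52)     # 4 aces in a standard 52-card deck
p_green = probability(3, 10)   # hypothetical bag: 3 green pieces out of 10
print(p_ace, p_green)          # 1/13 3/10
print(float(p_ace))            # about 0.077, i.e. roughly 7.7 percent
```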
Organize around the weather
You use probability almost every day when you make plans with the weather in mind. Meteorologists cannot predict the weather with certainty, so they use instruments and tools to establish the likelihood of snow, hail, or rain. For example, a 60 percent chance of rain means that, on 60 out of 100 past days with the same weather conditions, it rained. Intuitively, you might then prefer closed-toed shoes to sandals, or take an umbrella to work. Meteorologists not only analyze probable weather patterns for that day or week but also examine historical databases to estimate low and high temperatures.
Strategies in sports
Coaches and athletes use probability to choose the best strategies for competitions and games. Before putting a player in the lineup, a baseball coach evaluates that player's batting average. For example, a player with a .200 batting average gets a base hit, on average, two times out of every ten at-bats. A player with a .400 batting average is even more likely to get a hit: about four times out of every ten at-bats. As another example, if a high-school football kicker makes nine out of 15 field goal attempts from over 40 yards in a season, his next attempt from the same distance has about a 60 percent chance of success. We can write this as:
9/15 = 0.60, or 60 percent
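Both sports examples estimate a probability from past results: successes divided by attempts, also called the relative frequency. A minimal sketch, with relative_frequency as an illustrative name:

```python
def relative_frequency(successes, attempts):
    """Empirical probability estimated from past results."""
    return successes / attempts

kicker = relative_frequency(9, 15)   # 9 field goals made in 15 attempts
batter = relative_frequency(2, 10)   # a .200 hitter: 2 hits per 10 at-bats
print(f"{kicker:.0%}")               # 60%
print(f"{batter:.3f}")               # 0.200
```

Such an estimate assumes future attempts resemble past ones; with only 15 attempts, it can be quite far from the kicker's true success rate.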
Insurance options
Probability plays a vital role in analyzing insurance policies and helps you decide which plans and deductible amounts are best for you and your family. For example, when you choose a car insurance policy, you use probability to judge how likely it is that you will need to file a claim. If 12 percent of drivers in your community, or 12 out of every 100, have crashed into a deer over the past year, you will likely consider comprehensive insurance, not just liability, for your car. Also, if car repairs following a deer-related accident run $2,800, you might consider a lower deductible on car repairs so that you are not caught unable to cover the expenses.
Games and recreational activities
You use probability when you play video games, card games, or board games that involve chance or luck. You must weigh the chances of getting the cards you need in poker, or the covert missile you need in a video game. How much risk you are willing to take also depends on the probability of getting those tokens or cards. For example, according to Wolfram MathWorld, the odds against getting three of a kind in a poker hand are 46.3-to-1, about a 2 percent chance. However, you have about a 42 percent chance, or 1.4-to-1 odds, of getting one pair. Probability helps you decide how you intend to play the game once you assess what is at stake.
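The poker figures above can be checked by counting five-card hands with math.comb (Python 3.8 or later). This is our own combinatorial sketch, not a reproduction of MathWorld's method; "odds" here means odds against, (failures / successes).

```python
from math import comb

total = comb(52, 5)  # all possible 5-card hands: 2,598,960

# Three of a kind: choose the rank and 3 of its 4 suits, then 2 other
# distinct ranks with one card (any of 4 suits) each.
three_kind = 13 * comb(4, 3) * comb(12, 2) * 4 ** 2
# One pair: choose the rank and 2 of its suits, then 3 other distinct ranks.
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3

for name, hands in (("three of a kind", three_kind), ("one pair", one_pair)):
    p = hands / total
    odds_against = (total - hands) / hands
    print(f"{name}: probability {p:.1%}, odds {odds_against:.1f}-to-1")
# three of a kind: probability 2.1%, odds 46.3-to-1
# one pair: probability 42.3%, odds 1.4-to-1
```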
Statistics
Modern science is based on statements of probability and statistical significance. In one example, studies have found that cigarette smokers are 20 times more likely to develop lung cancer than nonsmokers. Another study suggests that a catastrophic meteorite impact on Earth is possible within the next 200,000 years. And first-born male children score, on average, 2.82 points higher on IQ tests than second-born males. But why do scientists talk in such ambiguous terms? Why don't they simply say that cigarette smoking causes lung cancer? And why don't they tell people whether we need to establish a colony on the Moon to escape an extraterrestrial disaster?
The reason is that these statements accurately reflect the data. Absolute conclusions are rare in scientific data. Some smokers can reduce their risk of lung cancer by quitting, some smokers never contract the disease, and some die prematurely of something other than lung cancer, such as cardiovascular disease. All data exhibit variability, and the function of statistics is to quantify that variability so that scientists can make more accurate statements about their data.
A common misconception is that statistics offer proof that something is correct or incorrect. Statistics have no such power. Instead, they provide a measure of the probability of observing a specific result. Statistical techniques let scientists put numbers to probability, moving from the statement that someone is more likely to develop lung cancer if they smoke cigarettes to a report that the probability of developing lung cancer is nearly 20 times greater in cigarette smokers than in nonsmokers. The quantification of probability that statistics offers is a powerful tool, and scientists use it extensively, yet they frequently misunderstand it.
Statistics in data analysis
A large number of statistical procedures have been developed for data analysis. They fall into two groups, descriptive and inferential:
Descriptive statistics:
Using measures such as the mean, median, and standard deviation, descriptive statistics let scientists quickly summarize the significant attributes of a dataset. They allow scientists to put research within a broad context while offering a general sense of the group they study. For example, the Cancer Prevention Study 1 (CPS-1), a mortality study, was initiated in 1959. Among other variables, the investigators reported the demographics and ages of the participants so that the study group could be compared with the broader United States population at the time. The volunteers ranged in age from 30 to 108, with a median age of 52 years. The study's subjects were 57 percent female, 2 percent black, and 97 percent white. By comparison, in 1960 the total US population was 51 percent female, about 11 percent black, and 89 percent white. These descriptive statistics easily identified a recognized shortcoming of CPS-1: with 97 percent of participants white, the study did not adequately capture the illness profiles of US minority groups.
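Python's standard statistics module computes the descriptive measures named above. The ages below are hypothetical values for a small group, not the CPS-1 data.

```python
import statistics

# Hypothetical ages for a small study group (NOT the CPS-1 data).
ages = [34, 41, 45, 52, 52, 58, 63, 70, 77]

print("mean:  ", round(statistics.mean(ages), 1))
print("median:", statistics.median(ages))             # 52: the middle value
print("stdev: ", round(statistics.stdev(ages), 1))    # sample standard deviation
print("range: ", min(ages), "to", max(ages))
```

statistics.stdev() divides by n - 1 (the sample formula); use statistics.pstdev() when the data covers the whole population.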
Inferential statistics:
Scientists use inferential statistics when they want to draw conclusions from data: to make inferences about larger populations from smaller samples, to discover relationships between variables in a dataset, and to model patterns in data. In statistics, the term "population" differs from its ordinary meaning of a collection of people. It refers to the larger group about which a dataset is used to make inferences, such as a society, the locations of an oil field, meteor impacts, corn plants, or some other set of measurements.
In scientific studies, generalizing results from small samples to larger populations is essential. For example, although about 1 million and 1.2 million individuals were enrolled in the Cancer Prevention Studies I and II, respectively, they represent only a tiny portion of the 1960 and 1980 United States populations, which totaled about 179 and 226 million. Correlation, hypothesis testing, point estimation, and regression are some of the standard inferential techniques. For example, in 2007 Tor Bjerkedal and Peter Kristensen analyzed the IQ test scores of 250,000 male Norwegian military personnel. According to their analysis, first-born males scored 2.82 +/- 0.07 points higher than second-born males, a statistically significant difference at the 95 percent confidence level.
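A result written as "estimate +/- margin" at a 95 percent confidence level can be computed with a normal approximation: mean +/- z * (standard deviation / sqrt(n)), where z is about 1.96. The sketch below assumes a large enough sample for that approximation (small samples would call for a t-distribution), and the data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * (s / sqrt(n))."""
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95 percent
    return m - z * standard_error, m + z * standard_error

# Hypothetical score differences; real studies use far larger samples.
diffs = [1.9, 3.4, 2.2, 3.1, 2.8, 2.5, 3.6, 2.9, 2.4, 3.3]
low, high = confidence_interval(diffs)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The margin shrinks with the square root of the sample size, which is why a study of 250,000 people can report a margin as small as +/- 0.07.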
A vital concept in data analysis is the phrase "statistically significant," which people often misunderstand. Because of the everyday meaning of the word "significant," most people assume that a significant result is momentous or important. That is not the case. Instead, statistical significance estimates how likely it is that an observed difference or association arose by chance rather than from any actual connection. In other words, statistical significance tests describe the probability that a difference or association as large as the one observed would occur if no real difference or link existed. Because "confidence" has a similar meaning in statistics and in everyday speech, significance is most often expressed in terms of confidence, for example a 95 percent confidence level.
Data Types
To do Exploratory Data Analysis (EDA), you need a clear grasp of measurement scales, also known as data types, because particular statistical measurements can only be used with particular data types. You also need to identify the data types you are handling to select the right visualization method. Data types are a way to categorize different kinds of variables. Now, let's take an in-depth look at the main types of variables, which we may sometimes call measurement scales, along with examples.
Categorical data
Categorical data represents characteristics, such as a person's language or gender. Categorical data can also take numerical values, such as 0 for female and 1 for male, but be aware that those numbers have no mathematical meaning.
Nominal data
Nominal values represent discrete units and are used to label variables that have no quantitative value. They are nothing but "labels." It is important to note that nominal data has no order: changing the order of its values would not change their meaning. For example, if a question asks for your gender and you must choose between female and male, the order of the options has no meaning.
Ordinal data
Ordinal values represent discrete and ordered units. Ordinal data is therefore almost the same as nominal data, except that its ordering matters. For example, a question about your educational background might offer the ordered options elementary, high school, undergraduate, and graduate. There is a difference between undergraduate and high school, and between high school and elementary, but we cannot say whether those differences are equal. This is the major limitation of ordinal data: the differences between values are not really known. Ordinal scales are typically used to measure non-numerical features such as customer satisfaction or happiness.
Numerical Data
Discrete data
We speak of discrete data when its values are separate and distinct. In other words, discrete data can take on only certain values. This type of data can be counted but not measured. It represents information that can be sorted into categories. A perfect example is the number of heads in 100 coin flips. To check whether you are dealing with discrete data, ask two questions: can you count it, and can you divide it into smaller and smaller parts? Data that you can count but cannot divide indefinitely is discrete.
Continuous data
Continuous data represents measurements: its values can only be measured, not counted. For example, you can describe someone's height using intervals on the real number line.
Interval data
Interval values represent ordered units with equal differences between them. In other words, we speak of interval data when a variable contains ordered numeric values and we know the exact differences between those values. For example, a feature containing the temperature of a given place might take values such as -10, -5, 0, +5, +10, and +15 degrees. The drawback of interval values is that they have no "true zero." In the temperature example, 0 degrees does not mean that there is no temperature. You can add and subtract interval data, but you cannot meaningfully multiply or divide it or calculate ratios. Ultimately, because there is no true zero, many descriptive and inferential statistics are hard to apply.
Ratio data
Ratio values are also ordered units with equal differences between them. They are the same as interval values, except that ratio values have an absolute zero. Examples include weight, length, and height.
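The difference between interval and ratio data shows up as soon as you divide. In this sketch we assume the temperatures are in degrees Celsius (interval data) and compare them in kelvins, a temperature scale with a true zero at absolute zero (ratio data); the kelvin scale is our addition, not part of the text above.

```python
def celsius_to_kelvin(c):
    """Kelvin has a true zero (absolute zero), so ratios of kelvins are meaningful."""
    return c + 273.15

warm, cool = 20.0, 10.0                     # degrees Celsius: interval data
print(warm - cool)                          # 10.0: differences are meaningful
print(warm / cool)                          # 2.0, yet 20 C is NOT "twice as hot"
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))  # about 1.035
```

Shifting the zero point changes every ratio but leaves every difference intact, which is exactly why only addition and subtraction are safe on interval data.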
The Importance of Data Types
Data types are an essential concept because statistical techniques can only be used with specific data types. If you analyze categorical data as though it were some other type, your analysis will be wrong. A clear understanding of the data you are dealing with lets you choose the correct method of analysis. Let's go over each data type once more, this time in terms of which statistical techniques can be applied to it. You need to understand the basics of descriptive statistics to follow what comes next; you can read all about descriptive statistics later in this chapter.
Statistical Methods
Nominal data
When dealing with nominal data, you summarize the information with:
Frequencies:
The frequency is the rate at which something occurs within a dataset or over a period of time.
Proportion:
You calculate the proportion by dividing the frequency by the total number of events, that is, how often an event occurs divided by how often it could occur.
Percentage:
The percentage is the proportion multiplied by 100.
Visualization:
A bar chart or a pie chart is all that you need to visualize nominal data. In data science, you can use one-hot encoding to transform nominal data into numeric features.
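The nominal-data summaries and one-hot encoding can be sketched with the standard library alone. The survey answers are hypothetical; in practice, libraries such as pandas provide ready-made helpers for the same job.

```python
from collections import Counter

# Hypothetical survey answers (nominal: the labels have no order).
answers = ["English", "Spanish", "English", "French", "English", "Spanish"]

freq = Counter(answers)                        # frequencies
total = len(answers)
for label, count in freq.most_common():
    proportion = count / total                 # frequency / number of events
    print(f"{label:8} count={count} proportion={proportion:.2f} percent={proportion:.0%}")

# One-hot encoding: one 0/1 column per category, so no fake order is implied.
categories = sorted(freq)                      # ['English', 'French', 'Spanish']
one_hot = [[1 if a == c else 0 for c in categories] for a in answers]
print(categories)
print(one_hot[:2])                             # [[1, 0, 0], [0, 0, 1]]
```

Encoding the languages as 0, 1, 2 instead would suggest that Spanish is somehow "greater" than English; one-hot columns avoid that false ordering.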
Ordinal data
The techniques you use for nominal data can also be applied to ordinal data, and some additional tools are available. You can summarize ordinal data with frequencies, proportions, and percentages, and visualize it with bar charts and pie charts. In addition, you can use the median, mode, percentiles, and interquartile range to summarize your data.
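Because ordinal values have an order, you can map them to ranks and take a median, even though the gaps between ranks mean nothing. A minimal sketch with hypothetical responses; median_low is used so the result is always an actual category rather than an average of two ranks.

```python
import statistics

# Ordered levels: the position in this list is the rank.
levels = ["elementary", "high school", "undergraduate", "graduate"]
rank = {level: i for i, level in enumerate(levels)}

# Hypothetical responses.
responses = ["high school", "graduate", "undergraduate", "high school",
             "elementary", "undergraduate", "high school"]

ranks = sorted(rank[r] for r in responses)
median_level = levels[statistics.median_low(ranks)]  # always a real category
mode_level = statistics.mode(responses)              # the most common answer
print(median_level, mode_level)  # high school high school
```

Averaging the ranks would be a mistake here: it treats the step from elementary to high school as equal to the step from undergraduate to graduate, which ordinal data cannot tell us.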
Continuous data
When you are dealing with continuous data, you can use most techniques to describe your data. To summarize it, you can use the mean, median, range, percentiles, interquartile range, and standard deviation.
Visualization techniques:
To visualize continuous data, a histogram or a box plot comes to mind; they let you check the central tendency, variability, modality, and kurtosis of a distribution. Be aware that a histogram may not reveal outliers. That is the reason for also using box plots.
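The numbers behind a box plot can be computed directly. This sketch uses statistics.quantiles (Python 3.8 or later) on hypothetical data with one extreme value, and applies the common box-plot convention of flagging points more than 1.5 times the interquartile range beyond the quartiles.

```python
import statistics

# Hypothetical measurements with one extreme value.
data = [2, 4, 4, 5, 5, 5, 6, 7, 8, 40]

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print("range: ", max(data) - min(data))
print("median:", q2, " IQR:", iqr)
print("mean:  ", statistics.mean(data), " stdev:", round(statistics.stdev(data), 2))

# Common box-plot rule: points beyond 1.5 * IQR from the quartiles are outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
print("outliers:", outliers)  # [40]
```

Note how the single outlier pulls the mean (8.6) well above the median (5.0), while the median and IQR barely notice it.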
Descriptive Statistics
Descriptive statistical analysis is an essential part of machine learning: since machine learning is all about making predictions, you first need to understand your data. Statistics lets you draw conclusions from data, and descriptive statistics is the necessary initial step. Your dataset needs to go through descriptive statistical analysis. Many people skip this part, miss a considerable number of valuable insights about their data, and reach wrong conclusions. It is better to be careful when running your descriptive statistics: take your time, and make sure your data meets all the prerequisites for further analysis.
Normal Distribution
The normal distribution is one of the most important concepts in statistics, because many statistical tests assume normally distributed data. When plotted, it shows the pattern that many large samples of data follow. It is sometimes called the "Gaussian curve" or the "bell curve."
Many probability calculations and inferential statistics assume a normal distribution. This means that if your data is not normally distributed, you must be careful about which statistical tests you apply to it, since they could lead to wrong conclusions.
A normal distribution is symmetrical, unimodal, and bell-shaped, and it is centered on its mean. In a perfectly normal distribution, each side is an exact mirror of the other.
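Python's standard library includes `statistics.NormalDist`, which is handy for checking the properties described above. This sketch uses an arbitrary mean of 100 and standard deviation of 15; it checks the mirror symmetry, the familiar rule that about 68% of values fall within one standard deviation, and that the mean and median of a normal sample nearly coincide.

```python
import statistics
from statistics import NormalDist

# A normal distribution is fully described by its mean (mu) and standard deviation (sigma)
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the area below mu - sigma equals the area above mu + sigma
left_tail = dist.cdf(85)
right_tail = 1 - dist.cdf(115)

# About 68% of the values lie within one standard deviation of the mean
within_one_sd = dist.cdf(115) - dist.cdf(85)

# For a large normal sample, the mean and the median almost coincide
sample = NormalDist(mu=0, sigma=1).samples(10_000, seed=42)
print(left_tail, right_tail, within_one_sd)
print(statistics.mean(sample), statistics.median(sample))
```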
Central tendency
The mean, the mode, and the median are the three measures of "central tendency." They are the most popular kinds of "average," and each describes the center of the data in a different way.
The mean is the arithmetic average. It is considered the most reliable measure of central tendency for drawing inferences about a population from a sample. Central tendency describes how your data values cluster around the mean, mode, or median. The mean is computed by dividing the sum of all values by the number of values.
The mode is the value or category that occurs most frequently in the data. When no value repeats, a dataset has no mode. A dataset can also have more than one mode. For categorical variables, the mode is the only measure of central tendency, since you cannot compute the average of a variable such as "gender." For categorical variables, you can only report counts and percentages.
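The rules for the mode above can be sketched with `collections.Counter`. The `modes` helper is our own illustration (it follows this section's convention that data with no repeated value has no mode), and the gender values are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 3, 4]))                  # no repeated value, so no mode
print(modes([1, 1, 2, 3, 3]))               # two modes: the data is bimodal
print(modes(["female", "male", "female"]))  # mode of a categorical variable

# For a categorical variable, report counts and percentages instead of an average
genders = ["female", "male", "female", "female"]
counts = Counter(genders)
percentages = {g: 100 * c / len(genders) for g, c in counts.items()}
print(counts, percentages)
```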
Also known as the "50th percentile," the median is the midpoint or "middle" value in your data. The median is much less affected by outliers and skewed data than the mean. For example, suppose a dataset of house prices ranges from $100,000 to $300,000 but also contains houses worth more than $3 million. Because the mean is the sum of all values divided by the number of values, the expensive homes will pull it strongly upward. The median is the "middle" value of all data points, so these outliers will not strongly affect it. Consequently, the median is a much better statistic for describing such data.
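To see the housing-price example in numbers, here is a sketch with made-up prices, using Python's `statistics` module.

```python
import statistics

# Hypothetical prices: most homes cost $100,000 to $300,000, one costs $3.5 million
prices = [100_000, 150_000, 180_000, 200_000, 250_000, 300_000, 3_500_000]

mean_price = statistics.mean(prices)      # pulled far upward by the single outlier
median_price = statistics.median(prices)  # the middle value, barely affected

# Compare with the same data without the expensive home
typical = prices[:-1]
print(f"mean:   {mean_price:,.0f} vs {statistics.mean(typical):,.0f}")
print(f"median: {median_price:,.0f} vs {statistics.median(typical):,.0f}")
```

With these made-up prices, the single outlier moves the mean from about 196,667 to about 668,571, while the median only moves from 190,000 to 200,000.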
Chapter 21:
Distributed Systems & Big Data
Distributed System
A distributed system is a collection of autonomous computers that are interconnected by either a local network or a global network. Distributed systems enable different machines to carry out different processes. Examples of distributed systems include banking systems, airline reservation systems, etc.
A distributed system has several goals. Some of them are listed below:
Scalability - To expand the server and handle growing load without degrading any services.
Heterogeneity - To handle a wide variety of node types.
Transparency - To hide the internal workings so that the user does not see the complexity.
Accessibility - To make resources accessible so that users can access and share them effectively.
Openness - To offer services according to standard rules.
A distributed system has several advantages. Some of them are listed below:
Complexity is hidden from the user in a distributed system.
A distributed system supports scalability.
A distributed system can provide consistency.
A distributed system can be more efficient than a single-machine system.
Some drawbacks of distributed systems are listed below:
Cost - It is more expensive because developing a distributed system is difficult.
Security - It is more vulnerable to hacking because resources are exposed through the network.
Complexity - It is more difficult to understand and use the underlying infrastructure.
Network dependence - Problems in the underlying network may cause issues.
How do I get hands-on with distributed systems?
You can learn distributed systems (DS) concepts by:
1. Building a simple chat application:
Step 1: Start small and implement a simple chat application.
If that works, modify it to support multi-user chat sessions.
You should see some issues here with message ordering.
Step 2: After reading DS theory on causal and other message-ordering techniques, implement each of them in your system, one at a time.
2. Building a simple distributed storage system:
Step 1: Write an Android application (no fancy UI, merely a few buttons) that can insert data into and query the underlying Content Provider. This application should be able to communicate with other devices that run your application.
Step 2: After reading the theory of the Chord protocol and DHTs (distributed hash tables), simulate these protocols in your distributed setup.
For example, assume I run your application in three emulators.
These three instances of your application should form a Chord ring and serve insert/query requests in a distributed fashion, according to the Chord protocol.
If an emulator goes down, you should be able to reassign its keys, based on your hashing algorithm, to the instances that are still running.
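The key-reassignment step in the storage exercise can be explored before touching Android at all. The sketch below is a simplified, single-process illustration of Chord-style key placement, assuming SHA-1 positions on a small ring; it leaves out finger tables, networking, and node joins, and the names `HashRing` and `emulator-1` are hypothetical.

```python
import bisect
import hashlib

def ring_position(name, bits=16):
    """Place a node or key on a ring of size 2**bits using SHA-1 (the hash Chord uses)."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2 ** bits)

class HashRing:
    """Single-process sketch of Chord-style key placement (no networking, no finger tables)."""

    def __init__(self, nodes):
        # Keep nodes sorted by their position on the ring
        self.nodes = sorted(nodes, key=ring_position)

    def owner(self, key):
        """A key is stored on its successor: the first node at or after the key's position."""
        positions = [ring_position(node) for node in self.nodes]
        i = bisect.bisect_left(positions, ring_position(key))
        return self.nodes[i % len(self.nodes)]  # wrap around to the first node

    def remove(self, node):
        """Simulate a failed node; its keys fall through to the next node on the ring."""
        self.nodes.remove(node)

ring = HashRing(["emulator-1", "emulator-2", "emulator-3"])
keys = [f"key-{i}" for i in range(10)]
before = {k: ring.owner(k) for k in keys}

failed = before["key-0"]  # take down whichever node owns key-0
ring.remove(failed)
after = {k: ring.owner(k) for k in keys}

for k in keys:
    print(k, before[k], "->", after[k])
```

Only the keys that belonged to the failed node move; every other key stays where it was, which is the property that makes this kind of hashing attractive for distributed storage.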
WHAT ARE THE APPLICATIONS OF DISTRIBUTED SYSTEMS?
A distributed system is a group of cooperating computers that appears to the end user as a single computer.
Whenever server traffic grows, one has to upgrade the hardware and software configuration of the server to handle it, which is known as vertical scaling. Vertical scaling works well; however, beyond a certain point, a server cannot be scaled any further. Even the best hardware and software cannot provide enough support for enormous traffic.
The following are some applications of distributed systems:
Global Positioning System
World Wide Web
Air traffic control system
Automated banking system
In the World Wide Web application, the information or application is distributed across a number of heterogeneous computer systems, yet to the end user or the browser it appears to be a single system from which the user gets the data.
Multiple computers work simultaneously and share resources in the World Wide Web.
All these systems are fault tolerant. If any one system fails, the application does not fail: the failed computer's task can be handed over to another computer in the system, and all of this happens without the end user or browser noticing.
The elements of the World Wide Web are:
Multiple computers
Common state
Interconnection of the multiple computers
There are three kinds of distributed systems:
Corporate systems
These use separate servers for databases, business intelligence, transaction processing, and web services. These are usually at one site, but could have multiple servers at several locations if continuous service is important.
Large websites: Google, Facebook, Quora, maybe Wikipedia
These resemble corporate systems but are so enormous that they have a character of their own. They are forced to be distributed because of their scale.
Systems serving distributed organizations that cannot rely on network connectivity or need local IT resources
The military will need some unit-level command and control capability. Ideally, every unit (soldier, vehicle, and so on) could act as a node so that there is no central location whose destruction would bring everything down.
Mining operations often have significant industrial capacity at the remotest places and are best served by local IT for inventory control, finance and personnel systems, and specialized accounting and planning systems.
Construction companies often have large projects without significant communications, so they are similar to the mining operations above. In the worst case, they may rely on a driver jumping in his truck with a memory stick and connecting to the internet in some nearby town.
Data Visualization
What is Data Visualization?
Data Visualization is Interactive 
Have you ever booked your flights online and noticed that you can now view seat availability and even pick your seat? Perhaps you have noticed that when you look up information online about another country, you may find a site where all you need to do to get political, economic, geographic, and other information is move your mouse over the area of the country you are interested in.
Perhaps you have put together a business presentation consisting of different levels of complicated marketing and budget information in a simple display, which enables you to review all parts of your report just by clicking on one area of a map, chart, or graph. You may even have made forecasts by adjusting some information and watching the chart change before your eyes.
Warehouses are tracking inventory. Businesses are tracking sales. People are creating visual displays of information that address their needs. The traveler, the student, the ordinary worker, the marketing executive, the warehouse manager, and the CEO are now able to interact with the information they are looking for using data visualization tools.
Data Visualization is Imaginative 
If you can visualize it in your mind, you can visualize it on a computer screen. An eager skier might be interested in comparing the average snowfall at Soldier Mountain, Idaho. Researchers and students may want to compare the average cancer death rate of men to women in Montana or Hawaii. The examples are endless.
Data visualization tools can help entrepreneurs present products on their websites creatively and informatively. State and national government agencies have adopted data visualization to provide useful information to the public. Airlines take advantage of data visualization to be more accommodating. Businesses use data visualization for tracking and reporting. Children use data visualization tools on the home computer to complete research assignments or to satisfy their curiosity about unfamiliar places in the world.
Wherever you go, data visualization will be there. Whatever you need, data visualization can present answers in a helpful way.
Data Visualization is Comprehensive
All of us have looked up information online and found less than helpful presentation formats that either present simple details in a complicated way or show complex information in an even more complex way. All of us have at some time wished that the site had a more user-friendly way of presenting the information.
Information is the language of the 21st century, which means everybody is sending it, and everybody is searching through it. Data visualization can make both the senders and the searchers happy by providing a simple means of presenting complex information.
Data Visualization Basics
Data visualization is the process of displaying information or data in graphical charts, graphs, and figures.
It is used as a means to deliver visual reporting to users on the performance, operations, or general statistics of an application, system, hardware, or virtually any IT asset. Data visualization is typically achieved by extracting data from the underlying IT system. This data is generally in the form of numbers, statistics, and overall activity. The data is processed and displayed on the system's dashboard and in data visualization software.
It is done to help IT administrators get quick, visual, and easy insight into the performance of the underlying system. Most IT performance monitoring applications use data visualization techniques to give an accurate picture of the performance of the monitored system.
Software Visualization
Software visualization is the practice of creating visual tools to map software elements or otherwise display aspects of source code. This can be done with a wide range of programming languages, in different ways, with different criteria and tools.
The principal idea behind software visualization is that by creating visual interfaces, creators can help developers and others understand code or reverse-engineer applications. Much of the power of software visualization has to do with understanding relationships between pieces of code, where specific visual tools, such as windows, will openly present this information. Other features may include various kinds of charts or layouts that developers can use to compare existing code with a particular standard.
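As a small taste of the data behind a software visualization, Python's built-in `ast` module can extract which functions call which; a tool could then draw those relationships as a graph. This is a minimal sketch on a hypothetical snippet of code, and it only detects direct calls to plain names (method calls such as `text.split()` are ignored).

```python
import ast

# Hypothetical code to analyze (it is only parsed, never executed)
source = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    words = parse(load("data.txt"))
    print(len(words))
'''

def call_graph(code):
    """Map each function to the sorted names of the functions it calls directly."""
    graph = {}
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.FunctionDef):
            callees = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            graph[node.name] = sorted(callees)
    return graph

for func, callees in call_graph(source).items():
    print(f"{func} -> {', '.join(callees) or '(no direct calls)'}")
```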
Big Data Visualization
Big data visualization refers to the use of more contemporary visualization techniques to show the relationships within data. Visualization techniques include applications that can display real-time changes and more illustrative graphics, thus going beyond pie, bar, and other charts. These illustrations move away from the use of many rows, columns, and attributes toward a more creative visual representation of the data.
Normally, when businesses need to present relationships among data, they use graphs, bars, and charts to do it. They can also make use of a variety of colors, terms, and symbols. The main problem with this setup, however, is that it does not do a good job of presenting very large data or data that includes huge numbers. Big data visualization uses more interactive, graphical representations, including personalization and animation, to display figures and establish connections among pieces of information.
The Many Faces of Data Visualization
Data visualization has become one of the main "buzz" phrases swirling around the Web these days. With all the promises of Big Data and the IoT (Internet of Things), more organizations are trying to get more value from the voluminous data they generate. This frequently involves complex analytics, both real-time and historical, combined with automation.
A key factor in translating this data into meaningful information, and therefore into informed action, is the means by which this data is visualized. Will it be seen in real time? And by whom? Will it be displayed in colorful bubble charts and trend graphs? Or will it be embedded in high-detail 3D graphics? What is the goal of the visualization? Is it to share information? Enable collaboration? Support decision-making? Data visualization may seem like a simple concept, yet we don't all agree on what it means.
For many organizations, effective data visualization is an important part of doing business. It can even be a matter of life and death (think healthcare and military applications). Data visualization (or information visualization) is a vital part of much scientific research. From particle physics to sociology, creating concise yet powerful visualizations of research data can help researchers quickly identify patterns or anomalies, and can sometimes inspire that warm and fuzzy feeling we get when we feel we've finally wrapped our heads around something.
Today's Visual Culture
We live in a world that seems to be generating new information at a pace that can be overwhelming. With TV, the Web, roadside billboards, and more all competing for our increasingly fragmented attention, the media and corporate America are forced to find new ways of getting their messages through the noise and into our awareness. As a rule, whenever possible, the medium chosen to share the message is visual. Whether it's through an image, a video, a fancy infographic, or a simple icon, we have all become very skilled at processing information visually.
It's a busy world with many things about which we want to be informed. While we all receive information from multiple sources on any given day, only certain bits of that information will have any real impact on the way we think and act as we go about our daily lives. The power of effective data visualization is that it can distill those important details from large sets of data just by placing them in the proper context.
Well-planned data visualization executed in a visually appealing way can lead to faster, more confident decisions. It can shed light on past failures and reveal new opportunities. It can provide a tool for collaboration, planning, and training. It is becoming a necessity for many organizations that want to compete in the marketplace, and those who do it well will set themselves apart.
Chapter 22:
Python in the Real World
Now that you know the basics behind Python programming, you might be wondering where exactly you could apply your knowledge. Keep in mind that you have only just started your journey, so right now, you should focus on practicing all the concepts and techniques you learned. However, having a specific goal in mind can be extremely helpful and motivating.
As mentioned earlier in this book, Python is a powerful and versatile
language with many practical applications. It is used in many fields, from
robotics to game development and web-based application design. In this
chapter, you are going to explore some of these fields to give you an idea
about what you can do with your newly acquired skills.
What is Python Used For?
You're on your way to work listening to your favorite Spotify playlist and
scrolling through your Instagram feed. Once you arrive at the office, you
head over to the coffee machine, and while waiting for your daily boost,
you check your Facebook notifications. Finally, you head to your desk, take
a sip of coffee, and you think, "Hey, I should Google to learn what Python
is used for." At this point, you realize that every technology you just used
has a little bit of Python in it.
Python is used almost everywhere, whether we are talking about a simple app created by a startup company or a giant corporation like Google. Let's go through a brief list of some of the ways you can use Python.
Robotics

You've probably heard about tiny computers like the Raspberry Pi or the Arduino board. They are small, inexpensive devices that can be used in a variety of projects. Some people create cool little weather stations or drones that can scan the area, while others build killer robots because why not. Once the hardware problems are solved, they all need to take care of the software component.
Python is a popular solution, and it is used by hobbyists and professionals alike. These tiny computers don't have much power, so the software running on them has to use resources carefully. After all, resources also consume power, and tiny robots can only pack so much juice. Everything you have learned so far can be used in robotics, because Python works with a wide range of hardware components. Furthermore, there are many Python extensions and libraries specifically designed for the field of robotics.
In addition, Google uses some Python magic in their AI-based self-driving
car. If Python is good for Google and for creating killer robots, what more
can you want?
Machine Learning
You’ve probably heard about machine learning because it is the new
popular kid on the block that every tech company relies on for something.
Machine learning is all about teaching computer programs to learn from
experience based on data you already have. Thanks to this concept,
computers can learn how to predict various actions and results.
Some of the most popular machine learning examples can be found in:
1. Google Maps: Machine learning is used here to determine the
speed of the traffic and to predict the optimal route to your
destination based on several other factors as well.
2. Gmail: Spam used to be a problem, but thanks to Google's
machine learning algorithms, spam can now be easily
detected and filtered out.
3. Spotify or Netflix: Have you noticed how these streaming
platforms seem to know what new things to recommend to you?
That's all because of machine learning. Some algorithms can
predict what you will like based on what you have watched or
listened to so far.

Machine learning involves programming, as well as a great deal of mathematics. Python's simplicity makes it attractive to both programmers and mathematicians. Furthermore, Python has a large number of add-ons and libraries created for numerical computing, data science, and machine learning, such as TensorFlow, NumPy, pandas, and scikit-learn.
Cybersecurity

Data security is one of the biggest concerns of our century. By integrating our lives and businesses into the digital world, we make them vulnerable to unauthorized access. You probably read every month about some government institution or company getting hacked or taken offline. Many of these situations involve poor security due to outdated systems and antiquated programming languages.
Python's popularity also helps its security. A popular language is backed by a large community of experts and testers, so security issues tend to be found and patched quickly. This makes Python a popular language in the field of cybersecurity.
Web Development
As mentioned several times before, Python is simple yet powerful. Many
companies throughout the world, no matter the size, rely on Python to build
their applications, websites, and other tools. Even giants like Google and
Facebook rely on Python for many of their solutions.
We discussed the main advantages of working with Python earlier in the book, so we won't explore them again. However, it is worth mentioning that Python is often used as a glue language, especially in web development. Creating web tools usually involves several different programming languages, database management languages, and so on. Python can act as the integration language, for example by calling C++ code and combining it with other elements. C++ is mentioned because in many tech areas, the performance-critical components are written in C++, which offers excellent performance, while Python is used for high-level customization.
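Python's standard `ctypes` module shows the glue idea on a small scale: it calls a function from a compiled library directly. This sketch assumes a Linux system, where `ctypes.CDLL(None)` exposes the C standard library already loaded into the Python process, and calls C's `abs`. C++ libraries are typically exposed to Python through a C-compatible (`extern "C"`) interface or a binding tool, so the same principle applies.

```python
import ctypes

# Assumption: Linux. CDLL(None) gives access to symbols already loaded into the
# Python process, including the C standard library (this fails on Windows).
libc = ctypes.CDLL(None)

# Declare the C signature int abs(int) so ctypes converts values correctly
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # the compiled C function does the work
```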
Chapter 23:
Linear Regression
The easiest and most basic machine learning algorithm is linear regression.
It will be the first one that we are going to look at, and it is a supervised
learning algorithm. That means that we need both – inputs and outputs – to
train the model.
Mathematical Explanation
Before we get into the coding, let us talk about the mathematics behind this
algorithm.
In the figure above, you see a lot of different points, which all have an x-
value and a y-value. The x-value is called the feature, whereas the y-value
is our label. The label is the result of our feature. Our linear regression
model is represented by the blue line that goes straight through our data. It
is placed so that it is as close as possible to all points at the same time. So
we “trained” the line to fit the existing points or the existing data.
The idea is now to take a new x-value without knowing the corresponding
y-value. We then look at the line and find the resulting y-value there, which
the model predicts for us. However, since this line is quite generalized, we
will get a relatively inaccurate result.
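The fitting and predicting described above can be sketched with NumPy. The points are made up; `np.polyfit` with degree 1 computes the least-squares line, that is, the slope and intercept that minimize the sum of squared vertical distances to the points.

```python
import numpy as np

# Hypothetical feature (x) and label (y) values lying roughly on a line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x, y, 1)

# Predict the label for a new, unseen x-value by reading it off the line
x_new = 6.0
y_pred = slope * x_new + intercept
print(f"y = {slope:.2f} * x + {intercept:.2f}; prediction at x = 6: {y_pred:.2f}")
```

For these points the fitted line is y = 1.97x + 1.09, so the prediction for x = 6 is 12.91.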
However, it must also be mentioned that linear models only really show their effectiveness when we are dealing with numerous features (i.e., higher dimensions).
If we apply this model to school data and try to find a relationship between hours missed, study time, and the resulting grade, we will probably get a less accurate result than by including 30 parameters. In that case, however, we no longer have a straight line or a flat surface but a hyperplane, which is the equivalent of a straight line in higher dimensions.
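In symbols, a linear model with n features can be written as follows (a standard formulation; the weights w and the intercept b are our own notation):

```latex
\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
\qquad
\text{training minimizes} \quad \sum_{i} \left( y_i - \hat{y}_i \right)^2
```

With one feature (n = 1) this is the straight line \hat{y} = w_1 x_1 + b; with two features it is a flat plane; with more features it is a hyperplane. Minimizing the sum of squared errors is ordinary least squares, which is what scikit-learn's LinearRegression fits.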
Preparing Data
Our data is now fully loaded and selected. However, in order to use it as training and testing data for our model, we have to reformat it. scikit-learn also accepts pandas data frames, but converting them to NumPy arrays explicitly keeps things simple. That's why we turn our features into an X array and our label into a Y array.
import numpy as np
X = np.array(data.drop(columns=[prediction]))
Y = np.array(data[prediction])
The method np.array converts the selected columns into an array. The drop
function returns the data frame without the specified column. Our X array
now contains all of our columns, except for the final grade. The final grade
is in the Y array.
In order to train and test our model, we have to split our available data. The
first part is used to get the hyperplane to fit our data as well as possible. The
second part then checks the accuracy of the prediction, with previously
unknown data.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
With the function train_test_split, we divide our X and Y arrays into four arrays. The order must be exactly as shown here. The test_size parameter specifies the fraction of records to use for testing. In this case, it is 0.1, or 10%, which is a common choice. We do this to test how accurate the model is on data that it has never seen before.
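To see what happens inside `train_test_split`, here is a simplified sketch of a shuffled split with NumPy. The `simple_split` function is hypothetical and is not scikit-learn's implementation; in practice, use `train_test_split`, and pass `random_state` if you want the same split on every run.

```python
import numpy as np

def simple_split(X, Y, test_size=0.1, seed=0):
    """Shuffle the rows, then hold back test_size of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))            # random order of row numbers
    n_test = int(np.ceil(test_size * len(X)))    # round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Same order as train_test_split's return value
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

X = np.arange(40).reshape(20, 2)   # 20 hypothetical rows with 2 features each
Y = np.arange(20)                  # one label per row
X_train, X_test, Y_train, Y_test = simple_split(X, Y, test_size=0.1)
print(X_train.shape, X_test.shape)
```

The important detail is that X and Y are indexed with the same shuffled row numbers, so every feature row stays paired with its label.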
Training and Testing
Now we can start training and testing our model. For that, we first define
our model.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, Y_train)
By using the constructor of the LinearRegression class, we create our model. We then use the fit function and pass our training data. Now our model is trained: it has adjusted its hyperplane so that it fits our training values as closely as possible.
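In order to test how well our model performs, we can use the score method and pass our testing data. For a regression model, score returns the coefficient of determination (R²) rather than a classification accuracy; the closer it is to 1, the better the fit.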
accuracy = model.score(X_test, Y_test)
print(accuracy)
Since the splitting of training and test data is always random, we will have
slightly different results on each run. An average result could look like this:
0.9130676521162756
An R² of about 0.91 means the model explains roughly 91 percent of the variance in the final grades of the test set, which is a good result. Now that we know that our model is somewhat reliable, we can enter new data and predict the final grade.
X_new = np.array([[18, 1, 3, 40, 15, 16]])
Y_new = model.predict(X_new)
print(Y_new)
Here we define a new NumPy array with values for our features in the right
order. Then we use the predict method to calculate the likely final grade for
our inputs.
[17.12142363]
In this case, the final grade would probably be 17.
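Since the student data set is not reproduced here, the following sketch runs the same steps on synthetic data, assuming NumPy and scikit-learn are installed. The three features and their weights are invented; the sketch also recomputes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, to show what `score` measures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the grade data: 200 rows, 3 hypothetical features
rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(200, 3))
true_weights = np.array([0.2, 0.3, 0.9])
Y = X @ true_weights + 1.0 + rng.normal(0, 1.0, size=200)  # linear signal plus noise

# random_state fixes the shuffle, so every run gets the same split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=42)

model = LinearRegression()
model.fit(X_train, Y_train)

# score() returns R^2; recompute it by hand to see what it measures
r2 = model.score(X_test, Y_test)
residuals = Y_test - model.predict(X_test)
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((Y_test - Y_test.mean()) ** 2)

print("weights:", model.coef_, "intercept:", model.intercept_)
print("R^2:", r2, "by hand:", r2_manual)
print("prediction:", model.predict(np.array([[10.0, 5.0, 15.0]])))
```

Because the data was generated from known weights, you can check that the learned coefficients land close to 0.2, 0.3, and 0.9, something you cannot do with real data.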
Visualizing Correlations
Since we are dealing with high dimensions here, we can’t draw a graph of
our model. This is only possible in two or three dimensions. However, what
we can visualize are relationships between individual features.
import matplotlib.pyplot as plt
plt.scatter(data['studytime'], data['G3'])
plt.title("Correlation")
plt.xlabel("Study Time")
plt.ylabel("Final Grade")
plt.show()
Here we draw a scatter plot with the function scatter, which shows the relationship between the study time and the final grade.
In this case, we see that the relationship is not really strong. The data is very
diverse and you cannot see a clear pattern.
plt.scatter(data['G2'], data['G3'])
plt.title("Correlation")
plt.xlabel("Second Grade")
plt.ylabel("Final Grade")
plt.show()
However, if we look at the correlation between the second grade and the
final grade, we see a much stronger correlation.
Here we can clearly see that the students with good second grades are very
likely to end up with a good final grade as well. You can play around with
the different columns of this data set if you want to.
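A scatter plot shows the pattern; the Pearson correlation coefficient puts a number on it, from -1 to 1. This sketch uses NumPy's `corrcoef` on made-up values (not the real data set); with the real data frame, `data[['studytime', 'G2', 'G3']].corr()` in pandas would compute the coefficient for every pair of those columns.

```python
import numpy as np

# Made-up values for eight hypothetical students
study_time = np.array([2, 1, 4, 2, 1, 3, 2, 1])
second_grade = np.array([7, 10, 11, 13, 13, 15, 16, 18])
final_grade = np.array([8, 10, 11, 12, 14, 15, 17, 18])

# corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the coefficient we want
r_study = np.corrcoef(study_time, final_grade)[0, 1]
r_second = np.corrcoef(second_grade, final_grade)[0, 1]

print(f"study time vs final grade:   r = {r_study:.2f}")
print(f"second grade vs final grade: r = {r_second:.2f}")
```

A value near 0 (here about -0.19) means a weak linear relationship, while a value near 1 (here about 0.98) means a strong one, matching what the two scatter plots show.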
Conclusion
In conclusion, Python and its data libraries provide some of the strongest computational capabilities for big data analysis. If this is your first time at data programming, you will likely find Python easier to learn and more user-friendly than many other languages.
And so, we've come to the end of this book, which was meant to give you a
taste of data analysis techniques and visualization beyond the basics using
Python. Python is a wonderful tool to use for data purposes, and I hope this
guide stands you in good stead as you go about using it for your purposes.
I have tried to go more in-depth in this book, give you more information on
the fundamentals of data science, along with lots of useful, practical
examples for you to try out.
Please read this guide as often as you need to and don’t move on from a
chapter until you fully understand it. And do try out the examples included
– you will learn far more if you actually do it rather than just reading the
theory.
This was just an overview to recap what you learned in the first book, covering the data types in pandas and how they are used. We also looked at cleaning and manipulating data to handle missing values and perform some string operations.
