
TM112: Introduction to Computing and

Information Technology 2

Meeting #1
Block 1 (Part 1 )
Binary data representation and computation
OU Materials, PPT prepared by Dr. Khaled Suwais
Edited by Dr. Ahmad Mikati
Contents

• Introduction
• 1.1 Representing integers and text in binary
• 1.2 Decimal numbers and some limitations of binary representations
• 1.3 Representing logic operations and logic circuits
• Summary

Introduction

• This part will provide you with a basic understanding of how computers represent and process data.
• After studying it, you will be able to compare some of the
different binary representations of data and reason about
their efficiency.
• Also, you will appreciate that sometimes a particular
representation can lead to errors (with potentially far-
reaching consequences).

1.1 Representing integers and text in
binary
• The printed symbols shown in Figure 1.1 provide
convenient representations of short and long flashes of
light or short and long bleeps of sound, which is how
Morse code is normally transmitted.

1.1 Representing integers and text in
binary
• In any representation it is important that the symbols can
be distinguished from each other.
• Changes in electrical voltages or friction in mechanical
systems can cause random fluctuations, called noise,
which may distort how the symbol is perceived.
• In a binary system there are only two symbols, so it is
generally easier to make them different enough to be
distinguishable – for example, Morse code specifies that a
dash should be three times as long as a dot.

Binary representation systems in
computers
• Some very early computers, such as the ENIAC
(Electronic Numerical Integrator and Computer), tried to
represent data using our usual base-10 system. So 0 volts
was used to represent the digit 0, 1 volt to represent the
digit 1, and so on, all the way up to 9 volts to represent
the digit 9.
• Hence, a lot of circuitry was needed just in order to
distinguish between the different voltages, which took up
a lot of space and generated a lot of heat.
• The advantage of representing data in binary is that only
two ranges of voltage need to be detected.

Converting numbers from binary to
decimal notation
To convert a number from binary to decimal notation, we put its digits in a place-value table and add up the place values of the columns that contain a 1.
So to convert the binary number 1001 into decimal notation, we can use the following table:

Place value    8  4  2  1
Binary digit   1  0  0  1

Decimal number = 8 + 1 = 9
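
The place-value method can be sketched in Python. This is a minimal illustration rather than part of the module materials; the function name binary_to_decimal is an assumption chosen for this example.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place values of every column that holds a 1."""
    total = 0
    place_value = 1  # the rightmost column is worth 1
    for digit in reversed(bits):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        if digit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth twice as much
    return total


print(binary_to_decimal("1001"))  # 8 + 1 = 9
```

Python's built-in int("1001", 2) performs the same conversion; the loop is written out only to mirror the table.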

Converting numbers from decimal to
binary notation
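
The working for this slide is not reproduced in the text. One common way to convert in this direction is repeated division by 2, collecting the remainders; the sketch below assumes that method, and the function name decimal_to_binary is hypothetical.

```python
def decimal_to_binary(number: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if number < 0:
        raise ValueError("only unsigned (non-negative) integers are handled")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        remainders.append(str(number % 2))  # remainder is the next bit, right to left
        number //= 2
    return "".join(reversed(remainders))


print(decimal_to_binary(9))  # 1001
```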

Representing integers in binary
Unsigned integers

• In computer-speak, an unsigned integer is an integer that is greater than or equal to zero.
• An unsigned integer is sufficient for any purpose where a
value does not become negative – for example, a counter
counting upwards from zero.
• The number of unsigned decimal values that we can
represent in binary depends on the number of bits we
have available. If there are 3 bits available, we can
represent 2^3 = 8 values. If we want to include 0, this
means that we can encode all of the unsigned integers
from 0 to 7 in three bits.
Representing integers in binary
Unsigned integers

• The largest decimal value that can be represented in 3 bits
is 7. This is one less than 2^3 (or, in mathematical notation,
2^3 – 1) because one of the 8 available codes has been used up
to represent 0.
• This scheme can be extended to systems with more bits.
• In general, if we have n bits, we can represent 2^n unsigned
integers, and the largest integer that can be represented is
2^n – 1.
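
As a quick check of the general rule, the following sketch computes how many unsigned integers fit in a given number of bits and the largest of them. The function name unsigned_capacity is an assumption made for this example.

```python
def unsigned_capacity(n_bits: int) -> tuple[int, int]:
    """Return (number of codes, largest unsigned integer) for n_bits bits."""
    number_of_codes = 2 ** n_bits
    largest_value = number_of_codes - 1  # one code is used up by 0
    return number_of_codes, largest_value


for n_bits in (3, 8, 16):
    codes, largest = unsigned_capacity(n_bits)
    print(f"{n_bits} bits: {codes} codes, values 0 to {largest}")
```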

Adding unsigned integers in binary
notation

• In decimal notation, adding two positive integers is very
straightforward. The two values are added and the sign of the
result is automatically positive.
• Similarly, if the values are represented as two unsigned
integers in binary notation, their binary values can just be
added.
• So, to add two binary numbers by hand, we use the same
method as adding two decimal numbers. The numbers are
written one under the other, so that each column has the same
place value. Then the digits in each column are added, column
by column from the right, carrying digits as necessary.
Adding unsigned integers in binary
notation

• To illustrate, here is the working to add the two unsigned
integers 110 and 101, where the ‘carry’ digit is shown on the
top row:

  carry   1
            1 1 0
          + 1 0 1
          -------
          1 0 1 1
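
The column-by-column method can be sketched as follows. This is an illustrative implementation only, and the function name add_binary is an assumption.

```python
def add_binary(first: str, second: str) -> str:
    """Add two unsigned binary numbers column by column, from the right."""
    width = max(len(first), len(second))
    first, second = first.zfill(width), second.zfill(width)  # line up the columns
    carry = 0
    result_digits = []
    for bit_a, bit_b in zip(reversed(first), reversed(second)):
        column_sum = int(bit_a) + int(bit_b) + carry
        result_digits.append(str(column_sum % 2))  # digit written in this column
        carry = column_sum // 2                    # digit carried to the next column
    if carry:
        result_digits.append("1")  # a final carry creates a new leftmost column
    return "".join(reversed(result_digits))


print(add_binary("110", "101"))  # 1011 (6 + 5 = 11)
```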

Sign-magnitude representation
Sign-magnitude is the most intuitive method for representing
signed numbers.
The MSB (Most Significant Bit) of a binary number is kept as the
“sign” of the number:
MSB = 1: negative number
MSB = 0: positive number
The remaining bits represent the magnitude (or absolute value) of
the numeric value.
• So for our 3 bits, there would be 2^3 = 8 possible binary codes, which
could be used to encode positive and negative integers as shown in
Table 1.7.

Sign-magnitude representation
In an N-bit word sign-magnitude system:
• 1 bit is used for the sign of the number (MSB).
• N-1 bits are used for the magnitude of the number.
• The largest integer is 2^(N-1) - 1.
• The smallest integer is -(2^(N-1) - 1).

Example: In an 8-bit word sign-magnitude system, give the decimal
representation of the following numbers: 00000001, 10000001
Answer:
• 00000001:
- The MSB is 0: the number is positive.
- The remaining 7 bits are 0000001 (binary) = 1 (decimal).
- The decimal number is +1.
• 10000001:
- The MSB is 1: the number is negative.
- The remaining 7 bits are 0000001 (binary) = 1 (decimal).
- The decimal number is -1.
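
The decoding steps in the example can be sketched in Python. The function name decode_sign_magnitude is hypothetical and the code is only an illustration of the rules stated above.

```python
def decode_sign_magnitude(bits: str) -> int:
    """Decode a sign-magnitude bit string: MSB is the sign, the rest is the magnitude."""
    sign_bit, magnitude_bits = bits[0], bits[1:]
    magnitude = int(magnitude_bits, 2)
    return -magnitude if sign_bit == "1" else magnitude


for word in ("00000001", "10000001", "01111111", "11111111"):
    print(word, "->", decode_sign_magnitude(word))
```

The last two words show the largest and smallest 8-bit values, 2^7 - 1 = 127 and -(2^7 - 1) = -127.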
1.1.5 Representing text in binary
• Most modern systems for encoding text derive in part from ASCII
(American Standard Code for Information Interchange, pronounced
‘askee’), which was developed in 1963.
• In the original ASCII system, upper-case and lower-case letters,
numbers, punctuation and other symbols and control codes (such as
a carriage return, backspace and tab) were encoded in 7 bits. As
computers based on multiples of 8 bits (or a byte) became more
common, the encoding system became an 8-bit system, and so could
be expanded to include more symbols.
• When binary numbers were assigned to each character in the original
ASCII system, careful thought was given to choosing sequences of
values for the characters of the alphabet and numerals that would
make it easy for a computer processor to perform common
operations on them. (These encodings were preserved in the 8-bit
system by simply padding out the leftmost bit with a 0.)
1.1.5 Representing text in binary

• Since 2007, the standard encoding system for characters has been
Unicode Transformation Format-8 (UTF-8), which uses a variable
number of bytes (up to 4 in the current standard; the original design
allowed up to 6) to encode characters in use across the world. However,
in order to maintain backward compatibility, the original 128 ASCII
codes (0 to 127) are preserved in UTF-8.
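
A short sketch using Python's built-in ord() and str.encode() shows the 7-bit ASCII codes and the variable length of UTF-8. The helper names ascii_bits and utf8_length are assumptions made for this example.

```python
def ascii_bits(character: str) -> str:
    """Return the 7-bit ASCII code of a character as a bit string."""
    code = ord(character)
    if code > 127:
        raise ValueError(f"{character!r} is not a 7-bit ASCII character")
    return format(code, "07b")


def utf8_length(character: str) -> int:
    """Return how many bytes UTF-8 uses to encode one character."""
    return len(character.encode("utf-8"))


print(ascii_bits("A"), ascii_bits("a"))  # the two codes differ in a single bit
for character in ("A", "é", "€"):
    print(character, utf8_length(character), "byte(s) in UTF-8")
```

The upper-case and lower-case codes differing in just one bit is an example of the kind of choice that makes common operations, such as changing case, simple for a processor.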
Floating-point numbers and scientific
notation
• Consider the decimal number 2343.56. We could also write this
as 23.4356 × 10^2 or 0.234356 × 10^4 or 234356.0 × 10^-2.
• The decimal point can ‘float’ to any position as long as the
power of 10 is appropriate.
• Scientific notation is a special case of floating-point notation
where there is a single non-zero digit between 1 and 9
(inclusive) to the left of the decimal point.
• So the number 2343.56 can be represented in scientific
notation as 2.34356 × 10^3. Note that the exponent, 3, indicates
that the decimal point should be moved three places to the
right to get back the original decimal notation.
Floating-point numbers and scientific
notation
• The number –0.000654 in decimal notation can be
written as –6.54 × 10^-4 in scientific notation. Here, the
negative exponent (–4) indicates that the decimal point
should be moved 4 places to the left to get back to the
original decimal notation.
• Notice that scientific notation has three distinct parts,
shown in Figure 1.10:
• a sign
• an exponent (the power of 10)
• a mantissa (the decimal number part).
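
The three parts can be extracted with a few lines of Python. This is a rough sketch that works in decimal, as the slides do; the function name scientific_parts is an assumption, and because it relies on floating-point arithmetic the mantissa may differ from the exact value in its last digits.

```python
import math


def scientific_parts(value: float) -> tuple[int, float, int]:
    """Split a non-zero number into (sign, mantissa, exponent) in base 10."""
    if value == 0:
        raise ValueError("zero has no single non-zero leading digit")
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    exponent = math.floor(math.log10(magnitude))  # power of 10
    mantissa = magnitude / 10 ** exponent         # one digit before the point
    return sign, mantissa, exponent


for number in (2343.56, -0.000654):
    sign, mantissa, exponent = scientific_parts(number)
    print(number, "-> sign", sign, "mantissa", round(mantissa, 5), "exponent", exponent)
```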

1.3 Representing logic operations and
logic circuits
• In the previous two sections, we have seen that we can use binary
encodings to represent numerical and textual data. We will now see
that operations, including arithmetical operations such as addition,
and comparison operations such as less than and equals, can be
encoded as one or more logic operations. These logic operations act
on the binary representations of the data.
• To move from the human to the computer view, the integers have to
be encoded as binary representations and the addition operator has
to be encoded as a sequence of logic operations, each of which is
defined by what is called a truth table.

1.3 Representing logic operations and
logic circuits
• A truth table for a logic operation lists all the possible
combinations of input values, and for each possibility gives the
output value for that operation. As the operations we will
consider will always be applied to binary encodings, each input
value must be either a 1 or a 0 and the result of the operation
must also always be a 1 or a 0.
• We will start by looking at the truth tables for three of the
fundamental logical operations defined by Boole. We will then
see how these basic operations can be used as building blocks
for the logic circuits that perform more complex operations.
By the end of this subsection, you will see how these simple
operations can be used to build a logic circuit to add two binary
numbers.
1.3 Representing logic operations and
logic circuits
The NOT operation
• One of the most fundamental operations we might want to perform is
to ‘flip’ a single bit – let’s call the bit x. So if x is 1, we want the result to
be 0, and if x is 0, we want the result to be 1. This operation is called
NOT x and is expressed as x̄ or x’.
The behavior of the NOT operator is characterized by the truth table
shown below.
To physically perform logic operations on
binary data in a computer, we need to use
electrical components. The components
that represent the most fundamental
operations are called logic gates, which
can be combined in a logic circuit in order
to create more complex operations.

The NOT truth table

The NOT logic gate

1.3 Representing logic operations and
logic circuits
• The AND operation
• Most logical operations involve two input values. A truth table for
two binary inputs, x and y, has more rows because there are four
possible permutations (or ways of combining) the two input values,
as shown in Table 1.19.

AND logic gate

AND truth table

1.3 Representing logic operations and
logic circuits
• The OR operation
• The truth table for the logic operation OR (which Boole originally
designated by the symbol +) is shown below.

OR logic gate

OR truth table
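
The truth tables for the three operations can be generated with a short Python sketch. The function names NOT, AND, OR and truth_table are choices made for this illustration, with bits represented as the integers 0 and 1.

```python
from itertools import product


def NOT(x: int) -> int:
    return 1 - x


def AND(x: int, y: int) -> int:
    return x & y


def OR(x: int, y: int) -> int:
    return x | y


def truth_table(operation, number_of_inputs: int):
    """List every combination of input bits with the operation's output."""
    return [
        (inputs, operation(*inputs))
        for inputs in product((0, 1), repeat=number_of_inputs)
    ]


for name, operation, arity in (("NOT", NOT, 1), ("AND", AND, 2), ("OR", OR, 2)):
    print(name)
    for inputs, output in truth_table(operation, arity):
        print("  inputs", inputs, "-> output", output)
```

A one-input operation has two rows and a two-input operation has four, matching the number of possible input combinations.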
Building logic circuits

• Suppose we want to build a logic circuit with two inputs, A and B,
that tests if B is greater than A. The first step is to create a truth table
showing the desired outcomes: if B is greater than A, the result is 1
(True), otherwise the result is 0 (False).

Building logic circuits
• To translate this into a logic expression – that is, a combination of
our logic operations (NOT, AND and OR) – we follow this algorithm.
• Identify each row where the outcome (B > A) is 1.
• If input A is 1, write A; otherwise write NOT A in the logic expression for
the selected row.
• If input B is 1, write B; otherwise write NOT B in the logic expression for
the selected row.
• Join these with an AND.
• The final equation is the sum (OR) of all the deduced logic expressions.
Here only the row with A = 0 and B = 1 has the outcome 1, so this
algorithm yields:

Final equation: A’. B
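
The algorithm above can be sketched in Python. This is one possible reading of the steps, not the module's own code; the names sum_of_products and b_greater_than_a_circuit are hypothetical.

```python
from itertools import product


def sum_of_products(input_names, outcome) -> str:
    """Build a logic expression from the truth-table rows whose outcome is 1."""
    row_expressions = []
    for values in product((0, 1), repeat=len(input_names)):
        if outcome(*values) == 1:  # keep only rows where the outcome is 1
            literals = [
                name if value == 1 else f"NOT {name}"
                for name, value in zip(input_names, values)
            ]
            row_expressions.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(row_expressions) if row_expressions else "0"


def b_greater_than_a_circuit(a: int, b: int) -> int:
    """The derived circuit: a NOT gate on A feeding an AND gate with B."""
    return (1 - a) & b


expression = sum_of_products(("A", "B"), lambda a, b: int(b > a))
print(expression)  # (NOT A AND B)
```

Checking b_greater_than_a_circuit against int(b > a) for all four input combinations confirms that the expression reproduces the truth table.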


Building logic circuits

• Here, the resulting logic expression NOT A AND B tells us that the
logic circuit that is equivalent to this truth table for each combination
of inputs can be constructed from two logic gates:
• a NOT gate with A as an input, which gives an output of NOT A
• an AND gate that takes NOT A and B as inputs, which gives the
required result NOT A AND B as an output.
Building logic circuits

In the above circuit:
The output of gate 1 is: x + y’
The output of gate 2 is: y + z
The output of gate 3 is: (x + y’)(y + z)

1.3.4 What is inside a logic gate?
• How are logic gates actually constructed, and what exactly is
inside a logic gate?
• A logic gate is itself made up of a combination of more
fundamental components that act as on/off switches.
• In early computers, such devices were generally based on
various designs of vacuum tubes (collectively called valves).
• In modern computers, they are based on transistors, which are
formed of layers of semiconducting material such as silicon.

A ‘pluggable’ unit made of valves from an IBM computer of the mid-1950s
A chip containing six inverters
TM112: Introduction to Computing and
Information Technology 2

Meeting #2
Block 1 (Part 3)
Hardware and Software Concept
OU Materials, PPT prepared by Dr. Khaled Suwais
Edited by Dr. Ahmad Mikati
Contents

• Introduction
• 3.1 The processor
• 3.2 Storing and moving data and instructions
• 3.3 Peripherals and pulling it all together
• 3.4 Instructing the processor
• 3.5 Programmers, programming and
programs
• Summary

Introduction
You will learn the answers to questions such as the following.
• How does a data bottleneck occur in a computer and how can it be
avoided?
• How can I melt my computer?
• What are those strange strings of symbols when I get the ‘blue screen
of death’ on my Windows machine?
• How can a sip and a puff help a person with disabilities interact with a
computer?
• How do computers and programmers pull themselves up by their
bootstraps?
• Do you do RISC?
• When is hardware not required for a computer?
3.1 The processor

• The processor of a computer is the part that actually
performs the instructions that we ask the computer to
execute.
• A commercial processor is a wafer of silicon, called a chip or
microchip, on which are etched several hundreds of millions
of the logic gates that are used to store and process
instructions and data.
• Processors come in ‘families’ such as the Intel Core, Celeron
and AMD Athlon, etc.

3.1 The processor
• The arithmetic and logic unit (ALU) and the floating-
point unit (FPU) are at the heart of the processor, as these
are the places where the data is actually manipulated.
ALU
• Contains electronic circuits that perform binary arithmetic, such
as addition, subtraction, multiplication and division on integers.
• Contains circuits to perform logical operations, such as
comparing integers with zero, testing two integers for equality,
testing if one integer is greater than another, etc.
FPU
• It is a common part of most modern processors.
• Its function is very similar to that of the ALU, but it operates on
floating-point numbers using specialised circuitry optimised to
be as efficient as possible when working with floating-point
representations.
3.1.2 Registers and cache memory
• Main memory is a storage area that contains program
instructions and data.
• When a program is first loaded, the corresponding instructions
and data are put into main memory, which is outside the
processor.
• Each instruction and piece of data is held in a ‘chunk’
called a word.
• A word has a fixed size (usually 32 or 64 bits in a modern
computer), and it is handled as a unit by the hardware of the
processor.

3.1.2 Registers and cache memory

As each word gets closer to being processed, it is moved
into the processor, so that it can be accessed more quickly, in
two steps:
• first, the word is moved to cache memory, which is inside the
processor
• then from cache memory to memory locations called registers.
• Registers are very small but very fast areas of memory that are used
as a holding area for instructions and data immediately before they
are needed by the ALU/FPU.

3.1.2 Registers and cache memory
In modern processors, there may be several levels of cache
memory.
• Level 1 cache is the fastest (and smallest), and the aim is to use
this for the data and instructions that will imminently be
transferred to the registers.
• Level 2 cache is a larger but slower cache memory.
• There may be two more levels of cache below Level 2, each with
more capacity but slower speed.


The organisation, relationship between, and speeds of different levels of cache


3.1.2 Registers and cache memory

• Data has to be moved into cache memory from main
memory before it is needed by the processor.
• The use of the cache to speed up execution depends on how
effective the cache management is at predicting future data
use.
• If a sequence of instructions is to be executed, then pre-
loading all these instructions into cache before execution
begins can improve the overall processing speed.

3.1.2 Registers and cache memory
• There are several different types of registers in different
parts of the processor, and each is designed to hold a
particular type of information for a specific function.
• The accumulator is a register within the ALU where an
actual calculation takes place.
• The status register, sometimes called the flags register,
holds further information about the last operation
executed. Each bit in the register represents some
description of the result – is the result zero? Is the result
negative? Is the result too big to be stored in the
accumulator? And so on.
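
The idea of a status register can be illustrated with a toy model of an 8-bit addition. Everything here is an assumption made for the sketch: the function name alu_add, the flag names, and the 8-bit accumulator width do not describe any particular real processor.

```python
def alu_add(a: int, b: int, bits: int = 8) -> tuple[int, dict[str, bool]]:
    """Add two unsigned values in a toy accumulator and report status flags."""
    largest = (1 << bits) - 1           # largest value the accumulator can hold
    full_sum = a + b
    accumulator = full_sum & largest    # only the lowest `bits` bits are kept
    flags = {
        "zero": accumulator == 0,                      # is the result zero?
        "sign": bool(accumulator >> (bits - 1)),       # is the top (sign) bit 1?
        "carry": full_sum > largest,                   # was the result too big?
    }
    return accumulator, flags


for a, b in ((0, 0), (100, 100), (200, 100)):
    print(a, "+", b, "->", alu_add(a, b))
```

In this model, 200 + 100 does not fit in 8 bits, so the accumulator holds only the low 8 bits and the carry flag records that the result was too big.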
3.1.3 The control unit and other
registers
• The control unit has the role of coordinating the
movement of data and instructions within the processor.
It does this by sending out electrical pulses, called control signals,
that activate the necessary connections between main memory,
cache, registers, ALU and FPU, as required, to execute the
instruction.
• The address register holds the memory address of the
next instruction to be executed.

• The data registers are where data is stored when it is on
its way to the ALU or FPU or when a result is on its way
back to main memory.

3.1.4 Multi-core processors
• A multi-core processor is a single chip that contains two or
more independent processors called cores.
• Each core performs the usual functions of loading data and
instructions into registers and performing arithmetic manipulations or
floating-point manipulations, but instructions can be shared between
each of the cores and run at the same time, increasing the overall
speed of programs.
• You may think that four cores all working simultaneously would
make a program run four times as fast. However, this is far from
being the case, for several reasons.
• Firstly, each core requires its share of the data and instructions to be
moved from the shared main memory into cache memory, and from
there into its registers.
• Each core may have its own Level 1 cache memory, but often the
other levels of cache memory are shared between them. This can lead
to delays while the cores wait for data and instructions to be
transferred.
3.1.4 Multi-core processors
• In order to take advantage of multiple cores, the program has to
be written in such a way that a task can be split up into
independent sub-tasks, each of which can be completed by a
core, and then, if necessary, reassembled into a final solution.
This process is called threading – with each of the independent
tasks being coordinated by a separate thread.
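
The split, process and reassemble pattern can be sketched with Python's standard concurrent.futures module. The function names split_into_chunks and threaded_sum are hypothetical. The sketch shows the structure of a threaded program only: in the standard CPython interpreter, threads doing pure Python arithmetic may not actually run on several cores at once, so no speed-up should be assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values: list, number_of_chunks: int) -> list:
    """Split a list into roughly equal, independent sub-lists."""
    chunk_size = max(1, -(-len(values) // number_of_chunks))  # ceiling division
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def threaded_sum(values: list, number_of_threads: int = 4) -> int:
    """Sum a list by giving each thread its own independent sub-task."""
    chunks = split_into_chunks(values, number_of_threads)
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one sub-task per chunk
    return sum(partial_sums)  # reassemble the partial results


print(threaded_sum(list(range(1, 101))))  # 5050
```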


A multi-core processor where each core is processing a separate thread. (L1, Level 1; L2, Level 2.)
3.2 Storing and moving data and instructions
(Main Memory)
• Main memory is where the instructions, and the data they act on, are
loaded from when a program is executed.
• It is volatile memory, which means that its content is lost when the
power is switched off.
• Each byte in main memory is numbered in sequence, so that it has a
unique memory address.
• In main memory, every memory address can be directly accessed,
which is why this type of memory is referred to as random-access
memory (RAM).
• Most forms of memory today are random access, but for historical
reasons we still tend to reserve the acronym RAM for main memory.
• An advantage of any form of random-access memory is that accessing
any location in memory takes the same amount of time, regardless of
whether it is stored at a location with a low or a high memory address.
3.2.3 Buses and clocks
• The wiring that connects the various internal and external
components of a computer is known as a bus. Internal
buses inside the processor connect the various registers
and cache memory together.
• The control bus: this bus carries the control signals
between the processor and main memory (and other
parts of the computer system).
• The address bus: this bus carries the addresses of
memory locations to be accessed.
• The data bus: this bus transfers data from place to
place.
3.2.3 Buses and clocks

• All computers have a processor clock, which sends out
pulses at regular intervals.
• The clock sends a synchronising signal between the
circuits within the processor to ensure that they remain
in step.
• You can think of each pulse of the processor clock as
being like the rhythmic stroke of a pump that regulates
the movement of data and instructions along the buses
within the processor.

3.2.4 The operating system

• Managing the various resources of a computer and
coordinating the hardware components is the job of a
collection of programs known as the operating system.
• In early computers, all the direct interaction between devices,
users and executing programs was coded into each program.
This made programs difficult to write, requiring specialist
knowledge of how to interact with devices connected to the
computer (so called peripheral devices) such as keyboards,
screens, printers and disk drives.
• Without an operating system, the programmers also had to
have specific knowledge of the components of the processor
on which their programs would execute.
3.2.4 The operating system

• The operating system provides an interface between the
program and the rest of the computer system.
• The operating system allows the user, who writes the
program, to interact at a higher level of abstraction with
the computer that executes the program.
• The operating system is independent of the type of
processor, and this makes it possible to talk generally
about using a Windows computer or a Mac.

3.2.4 The operating system
Some of the functions that the operating system provides are as follows:
• Provision of a user interface:
• It provides us with a means of inputting data and instructions, and displaying output in
a form that users can understand.
• Management of multiple programs:
• The operating system supports hardware designed to enable the processor to switch
between different executing programs in order to multitask.
• Management of memory:
• It is the job of the operating system to allocate appropriately sized areas of memory to
each executing program, and to ensure that program instructions and data do not
interfere with each other or with data and instructions of other programs.
• Coordination and control of peripheral devices:
• In order to carry out its tasks, a computer will need to communicate with one or more
peripheral devices. For example, it may wish to receive data from the keyboard or
mouse, read from a file on a disk, send output to the monitor or printer, connect to a
network, and so on.

3.3.2 Secondary memory

• Secondary memory (or secondary storage) is the term given
to the storage devices that contain persistent data.
• In most devices, secondary storage is built into the case and is
usually supplied in the form of a hard disk drive, or a solid-state
drive.
• Secondary memory is used to store program code and data
files that are not immediately needed by the computer system.
• Secondary memory devices usually make up the bulk of the
memory in desktop, laptop and mainframe computers, and
other devices such as mobile phones or tablets, but may be
completely absent in embedded computer systems.
The memory hierarchy

• The fastest memory access is in the registers; however, register
memory is very expensive.
• It is also the case that the registers are built directly into the processor,
so there are usually a fixed number of them – typically fewer than 50.
• The slowest access is to hard disk storage, roughly 10,000 times
slower than for the registers, but hard disk storage memory is much
cheaper and expansion is usually possible by adding more disk drives.


3.5 Programmers, programming and
programs

• Early computer programmers wrote instructions in a form
that could be directly understood by their computer’s
family of processors, i.e. in some dialect of machine
language (also known as machine code).
• These consisted of binary patterns that were entered
directly into the hardware of the machine using plug
boards or panel switches.

3.5 Programmers, programming and
programs
• An assembly language is a programming language that uses human-
readable symbolic instructions and symbolic addresses that translate
into machine language instructions on a one-to-one basis.
• A program written in assembly language has the ability to directly
access all the features and instructions available on the processor it is
designed for.
• Whenever a program is written in a language other than machine
language, the instructions in the original program (called the source
code) need to be converted into equivalent machine language
instructions.

3.5 Programmers, programming and
programs

• The task of converting the source code into machine language is carried
out by special programs called translators.
• When the source code is in assembly language, the program that does
this translation into machine code is called an assembler.
• An assembler takes an assembly language program and generates an
equivalent program in machine language, which can then be loaded into
memory and executed. Since each processor family has a different
machine language, and therefore a different assembly language, they
each require a different assembler.
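
The one-to-one translation performed by an assembler can be illustrated with a toy example. The instruction set below is entirely hypothetical: it assumes an invented 8-bit machine whose instructions consist of a 4-bit opcode followed by a 4-bit operand, and the mnemonics LOAD, ADD, STORE and HALT are made up for the sketch.

```python
# Hypothetical opcodes for an invented 8-bit machine (not a real processor).
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}


def assemble(source: str) -> list[str]:
    """Translate each assembly line into exactly one 8-bit machine instruction."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction: {mnemonic}")
        if not 0 <= operand < 16:
            raise ValueError(f"operand does not fit in 4 bits: {operand}")
        machine_code.append(format(OPCODES[mnemonic], "04b") + format(operand, "04b"))
    return machine_code


program = """
LOAD 5     ; copy the word at address 5 into the accumulator
ADD 6      ; add the word at address 6
STORE 7    ; write the result to address 7
HALT
"""
for instruction in assemble(program):
    print(instruction)
```

Four lines of assembly language produce four machine instructions. A processor family with different opcodes would need a different OPCODES table, which mirrors the point that each family requires its own assembler.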

3.5 Programmers, programming and
programs
• It would be exceptionally tedious (not to mention error-prone) to have
to deal with computer programs by writing in low-level languages and
writing code specifically for each family of processors, so modern
computing is not done in this way.
• Instead, high-level programming languages are used, in which each
instruction in the high-level language is translated into many
instructions in the machine language of the processor on which it is to
be executed.
• High-level programming languages include Python, JavaScript, Java,
C++, Smalltalk, Scratch and a whole range of application-specific
languages that attempt to make the process of writing programs easier
for the human involved.

3.5 Programmers, programming and
programs
• In compilation, the program written in the high-level
language, called the source code or source program, is
used as the input to a translator program called a compiler.
• The compiler translates the entire source program into the
machine language understood by the processor; this
translation is referred to as the object code or object
program.
• The object code is then saved, and it is this machine
language program that is loaded into memory and executed
when the program is executed.
• Languages such as C, C++, and Visual Basic are designed to
be compiled.
3.5 Programmers, programming and
programs
• Whereas a compiler translates all the source code in one go, an
interpreter translates each instruction in the source code only
when it is required for that instruction to be executed.
• There is never a complete translation of the whole of the source
code into machine language, and so no object code program is
generated.
• The advantage of an interpreted language is that the potentially
lengthy process of compilation does not need to be gone through
for each small change in the source code.
• The main disadvantage is that the translation process must take
place every time a program is executed, resulting in a slower
execution of the program. Like compilers, it is also the case that
each processor family needs a different interpreter.
• Languages such as JavaScript, Perl and Basic are designed to be
interpreted.
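
The way an interpreter handles one instruction at a time can be sketched with a toy language. The two commands SET and ADD, and the function name interpret, are invented for this illustration and do not correspond to any real language.

```python
def interpret(source: str) -> dict[str, int]:
    """Translate and execute a toy program one line at a time."""
    variables: dict[str, int] = {}
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is only examined when it is about to be executed.
        command, name, value = line.split()
        if command == "SET":
            variables[name] = int(value)
        elif command == "ADD":
            variables[name] = variables[name] + int(value)
        else:
            raise ValueError(f"line not understood: {line!r}")
    return variables


print(interpret("SET x 2\nADD x 3"))  # {'x': 5}
```

No translated version of the whole program is ever stored, so the same line-by-line work is repeated on every run.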
3.5 Programmers, programming and
programs
• Virtualisation is a term used to describe any configuration
where a physical computer system is emulated using software.
• Using a virtual machine to interpret bytecode as we described above
is just one example, but there are many different kinds of
virtualisation. For example, if you use a Mac, you might have a virtual
machine on your computer that allows you to also run an emulated
Windows platform.
• Cloud computing relies on virtual machines sitting on top of
remote servers, allowing the server’s processing and storage
capacity to be shared between several users by using a software
layer called a hypervisor to act as an intermediary between
multiple ‘guest’ operating systems and the host operating
system that directly interacts with the hardware.

Summary
• In this part, you have learned how the main components
of a computer work together to execute a program.

• Knowing a little bit about processors, memory and various
peripherals will help you to be more aware of how to match
specifications to a person’s particular computing needs.

• We have also explored how code written in high-level
programming languages is turned into the instructions a
processor understands.

TM112: Introduction to Computing and
Information Technology 2

Meeting #3
Block 1 (Part 2- Intro)
Introduction to Python
Collected by Dr. Ahmad Mikati
Why Python?
Python is object-oriented
• Supports concepts such as polymorphism, operator overloading, and multiple inheritance
It's free (open source)
• Downloading and installing Python is free and easy
• Source code is easily accessible
• Free doesn't mean unsupported! Online Python community is huge
It's portable
• Python runs on virtually all major platforms used today
• As long as you have a compatible Python interpreter installed, Python programs will run in exactly
the same manner, irrespective of platform
It's powerful
• Dynamic typing
• Built-in types and tools
• Library utilities
• Third party utilities (e.g. Numeric, NumPy, SciPy)
• Automatic memory management
7 December 2021
Python IDLE

• IDLE: Integrated Development and Learning Environment
• After installing IDLE, you can start writing your Python
programs.
Programming Modes in Python

• Interactive Mode
• gives you immediate feedback
• Not designed to create programs to be saved and run later
• Script Mode
• Write, edit, save, and run (later)
• Save your file using the “.py” extension

Create and run programs in Script Mode
1. Go to the File menu.
2. Make a new file.
3. Give the new file a name, such as firstProgram.py, and save it
with the .py extension.
4. You can now start writing your code.
5. To run your code, save it first and then go to the
Run menu and choose Run Module, or press F5.

Python print() Function
The print() function prints the specified message to the screen, or other
standard output device.

The message can be a string or any other object; the object will be
converted into a string before it is written to the screen.

print("Hello", "how are you?")
Output: Hello how are you?

x = ("apple", "banana", "cherry")
print(x)
Output: ('apple', 'banana', 'cherry')

Your First Python Program
• Python is "case-sensitive":
• print("hello") #correct
• print('hello') #correct
• Print("hello") #error (NameError: the name Print is not defined)
• PRINT("hello") #error (NameError: the name PRINT is not defined)

• "hello" is called a string expression.
• In Python, a string is a series (sequence) of characters
enclosed between " " or ' '.
• When the computer cannot parse a statement because it breaks the
rules of the language, a syntax error is generated; a misspelt name
such as Print is reported as a NameError instead.

String Literals
• String literals in python are surrounded by either single
quotation marks, or double quotation marks.
• 'hello' is the same as "hello".
• Strings can be output to screen using the print function.
For example: print("hello").
• As in many other popular programming languages,
strings in Python are sequences of characters; in Python 3
each character is a Unicode code point.
• However, Python does not have a character data type: a
single character is simply a string with a length of 1.
• Square brackets can be used to access elements of the
string.
String Literals
Example 1: Get the character at position 1 (remember that the first character
has position 0):

>>> a = "Hello, World!"
>>> print(a[1])
e

Example 2 (substring): Get the characters from position 2 up to, but not
including, position 5:

>>> b = "Hello, World!"
>>> print(b[2:5])
llo
Program Documentation

• Comment lines provide documentation about your program.
• Anything after the "#" symbol is a comment.
• Comments are ignored by the computer.

# First Python Program


# September 30, 2019

Variables

• A variable is a name that refers to a value.


• Variables let us store and reuse values in several places.
• But to do this we need to define the variable and then tell it
to refer to a value.
• We do this using an assignment statement.
• Example:
>>> y = 3
>>> y
3

Variables

• You can also assign to multiple names at the same time.

• Example1:
>>> x,y = 2,3
>>> x
2
>>> y
3

• Example2:
>>> x = 5; y = 4; L = [0,1,2]

Variables
• Variable names can contain letters, digits, and the
underscore (the dollar sign is NOT accepted!).
• Variable names cannot contain spaces.
• Variable names cannot start with a number.
• Variable name cannot be a reserved word.
• Case matters: temp and Temp are different variables.
• There are many reserved words, such as:

and, assert, break, class, continue,
def, del, elif, else, except,
finally, for, from, global, if, import,
in, is, lambda, not, or, pass,
raise, return, try, while

• Note: print and exec were reserved words in Python 2 only. In Python 3
they are ordinary built-in functions.
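
Rather than memorising the list, a name can be checked with the standard library's keyword module. The lines below are a minimal sketch; the candidate names are chosen only for illustration.

```python
import keyword

# Names to test; chosen only for illustration.
candidates = ["while", "print", "temp", "lambda", "exec"]

reserved = []
for name in candidates:
    # keyword.iskeyword is True for reserved words of the running Python version.
    if keyword.iskeyword(name):
        reserved.append(name)
        print(name, "is a reserved word")
    else:
        print(name, "is not a reserved word")
```

The full list for the installed Python version is available as keyword.kwlist.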
Data Types in Python
Python 3 supports three different numerical types:
• int (signed integers): positive or negative whole numbers with no
decimal point. Integers can be of unlimited size. (Python 2 had a separate
long type, written with a trailing L; Python 3 merged it into int.)
• float (floating-point real values): real numbers written with
a decimal point dividing the integer and fractional parts. Floats may
also be in scientific notation, with E or e indicating the power of 10
(2.5e2 = 2.5 x 10^2 = 250).
• complex (complex numbers): numbers of the form a + bJ, where a and b
are floats and J (or j) represents the square root of -1 (which is an
imaginary number). The real part of the number is a, and the
imaginary part is b. Complex numbers are rarely needed in introductory
Python programming.
Reading from the keyboard
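• To read from the keyboard:
x = input("enter your text") # the user types: Hello Ahmad
print(x)
The output: Hello Ahmad

• input() always returns a string. To read values (numbers) from the
keyboard we can use eval, which evaluates the typed text as a Python
expression. Because eval runs whatever the user types, int() or float()
is the safer choice when a number is expected (see Casting in Python).
• Example:
>>> x = eval(input("enter your number")) # the user types 5
>>> y = eval(input("enter your number")) # the user types 10
>>> x + y
15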
Casting in Python
• Python converts numbers internally in an expression
containing mixed types to a common type for evaluation.
• Sometimes, you need to explicitly convert a number from one
type to another. This is called casting.
• int(x) to convert x to a plain integer.
• float(x) to convert x to a floating-point number.
• str() to construct string from a wide variety of data types, including
strings, integer literals and float literals
• complex(x) to convert x to a complex number with real part x
and imaginary part zero.
• complex(x,y) to convert x and y to a complex number with
real part x and imaginary part y. x and y are numeric expressions.
• In Python 3 there is no long(x): int(x) handles integers of any size
(long existed only in Python 2).
Casting in Python

>>> x = '100'
>>> y = '-90'
>>> print(x + y)
100-90
Since they are strings, x and y are concatenated.

>>> print(int(x) + int(y))
10
Since they have been cast to integers, the values of x and y are added.
Casting in Python
• Casting to integers:
x = int(input("enter the value")) # the user types 5
y = int(input("enter the value")) # the user types 10
print(x + y) # 15
• Casting to floats:
x = float(input("enter the value")) # the user types 5.0
y = float(input("enter the value")) # the user types 10.0
print(x + y) # 15.0
• Casting to strings:
x = str("s1") # x will be 's1'
y = str(2) # y will be '2'
z = str(3.0) # z will be '3.0'

Math Operators
Operator   Meaning            Example      Result
+          Addition           34 + 1       35
-          Subtraction        34.0 - 0.1   33.9
*          Multiplication     300 * 30     9000
/          Float division     1 / 2        0.5
//         Integer division   1 // 2       0
**         Exponentiation     4 ** 0.5     2.0
%          Remainder          20 % 3       2

The + operator can also be used for string concatenation:
y = "hello"
y = y + " world!"
If statement

● The general form of an if statement is:

if condition:
    block
● Example:
if grade >= 50:
    print("pass") # the block needs indentation
• The condition is a Boolean expression.
• The block is a series of statements.
• If the condition evaluates to True, the block is executed.
Indentation-No Braces

• Python relies on indentation, using whitespace, to define
scope in the code. Other programming languages often use
curly brackets for this purpose.
• All lines must be indented by the same amount to be part of the
same scope (or indented more if part of an inner scope).
• This forces the programmer to use proper indentation, since
the indentation is part of the program!
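
The effect of the indentation levels can be seen in a small sketch. The variable names and the sample temperature are assumptions made for illustration only.

```python
temperature = 30  # sample value, chosen only for illustration
report = ""

if temperature > 25:
    # Indented once: this line belongs to the outer if block.
    report = report + "warm "
    if temperature > 35:
        # Indented further: this line belongs to the inner block only.
        report = report + "very-hot "
# Back at the left margin: this line runs whatever the conditions were.
report = report + "done"

print(report)
```

Changing temperature to 40 would also run the inner block, because both conditions would then be true.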

Python Conditions and If statements

Python supports the usual logical conditions from mathematics:


•Equals: a == b
•Not Equal: a != b
•Less than: a < b
•Less than or equal to: a <= b
•Greater than: a > b
•Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly
in "if statements" and loops.

Example:
a = 33
b = 200
if b > a:
    print("b is greater than a")
The elif Statement
The elif keyword is Python's way of saying "if the
previous conditions were not true, then try this condition".

Example:
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")

Output:
a and b are equal
The else Statement
The else keyword catches anything which isn't caught by
the preceding conditions.
Example:
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

Output:
a is greater than b
Conditional Operators

Common Mistakes

Exercises
1. Write a program that asks the user to enter a length in
centimeters. If the user enters a negative length, the program
should tell the user that the entry is invalid. Otherwise, the
program should convert the length to inches and print out the
result. There are 2.54 centimeters in an inch.
2. Ask the user for a temperature. Then ask them what units,
Celsius or Fahrenheit, the temperature is in. Your program
should convert the temperature to the other unit. The
conversion formulas are:
F = 9C/5 + 32 and C = 5(F - 32)/9
Exercises
Solution of Ex 1:
length = float(input("Enter the length in Centimeters: "))
if length < 0:
    print("Length should be positive!!")
else:
    inch = length / 2.54
    print(length, "Centimeters is:", inch, "Inches")
Output:
Enter the length in Centimeters: 20
20.0 Centimeters is: 7.874015748031496 Inches
>>>
Note:
• You can use the round function to round the result:
print(length, "Centimeters is:", round(inch, 1), "Inches")
• In this case, the result is rounded to 1 decimal place:
20.0 Centimeters is: 7.9 Inches
Exercises
Solution of Ex 2:
temp = float(input("Enter the temperature: "))
unit = input("Enter the unit: C/F: ")
if unit == "C":
    fah = 9/5 * temp + 32
    print(temp, "Celsius is", fah, "Fahrenheit")
else:
    cel = 5/9 * (temp - 32)
    print(temp, "Fahrenheit is", cel, "Celsius")

Output:
Enter the temperature: 50
Enter the unit: C/F: C
50.0 Celsius is 122.0 Fahrenheit
>>>
For Loop
A for loop is used for iterating over a sequence (a list, a
tuple, a dictionary, a set, or a string).
With the for loop we can execute a set of statements once for each
item in a list, tuple, set, etc.

• To repeat statements a fixed number of times, the structure of a for loop is as follows:

for variable_name in range(number_of_times_to_repeat):
    statements to be repeated

• Example 1: The following program will print Hello ten times:

for i in range(10):
    print('Hello')
For Loop
• Example 2: The program below asks the user for a number
and prints its square. It does this three times and then prints
'The loop is done'.

[Figure: program listing and its output. The final print statement has no
indentation, so it is outside the loop.]
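
The listing for Example 2 is not reproduced in this text. A sketch consistent with the description is given below. The prompt wording, the function name square_three_times and its read parameter are assumptions of this sketch, not the original slide code; the parameter only exists so the loop can also be fed prepared answers instead of keyboard input.

```python
def square_three_times(read=input):
    # 'read' defaults to the built-in input function (hypothetical parameter).
    for i in range(3):
        num = float(read('Enter a number: '))
        print('The square of your number is', num * num)
    # Not indented under the for line: this runs once, after the loop ends.
    print('The loop is done.')
```

Calling square_three_times() in IDLE asks for three numbers at the keyboard.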

For Loop

--output--
A
B
C
D
C
D
C
D
C
D
C
D
E
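
The listing that produced this output is not reproduced here. One program that prints exactly these lines is sketched below; it is a reconstruction from the output, not necessarily the original slide code.

```python
print('A')
print('B')
for i in range(5):
    # Both indented lines belong to the loop, so C and D repeat five times.
    print('C')
    print('D')
# Not indented: runs once, after the loop has finished.
print('E')
```

Indenting the last line would move it inside the loop, and E would then be printed five times as well.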
The range function
• To loop through a set of code a specified number of times, we can
use the range() function.
• The range() function returns a sequence of numbers, starting from 0
by default, incrementing by 1 by default, and ending just before a
specified number.
• The value we put in the range function determines how many times
we will loop.
• In Python 3, range produces its numbers one at a time as the loop needs
them rather than building a list. They run from zero (unless another
start is specified) to the value minus one.
• For instance, range(5) produces five values:

0, 1, 2, 3, and 4.
The range function

[Figure: program listing]
• Prints the numbers from 0 to 99.

--output--
0
1
2
3
...
99
The range function
Example
• Since the loop variable i is increased by 1 each time
through the loop, it can be used to keep track of where we
are in the looping process. Consider the example below:

[Figure: example program]
The range function
• If we want the list of values to start at a value other than 0, we can do that by
specifying the starting value.
range(1,5) will produce the list 1, 2, 3, 4.

• Another thing we can do is to get the list of values to go up by more than one at a
time. To do this, we can specify an optional step as the third argument.
range(1,10,2) steps through the list by twos, producing 1, 3, 5, 7, 9.

• To get the list of values to go backwards, we can use a step of -1.


range(5,1,-1) will produce the values 5, 4, 3, 2 in that order.
Note that the range function stops one short of the ending value 1.
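
The forms of range described above can be checked by collecting the values into lists. This is a small sketch; the variable names are chosen for illustration.

```python
# list() collects the values a range produces so they can be displayed.
from_zero = list(range(5))          # start defaults to 0, step to 1
from_one = list(range(1, 5))        # explicit start
odds = list(range(1, 10, 2))        # step of 2
backwards = list(range(5, 1, -1))   # counting down with a step of -1

print(from_zero)
print(from_one)
print(odds)
print(backwards)
```

In every case the ending value itself is left out.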

The range function

Here are a few more examples:

Output

The range function
Example:
for i in range(1,7):
    print(i, i**2, i**3, i**4)

----output----
1 1 1 1
2 4 8 16
3 9 27 81
4 16 64 256
5 25 125 625
6 36 216 1296
>>>
The range function
Here is a program that counts down from 5 and then prints a message:

Output:

Note:
• Python's print() function comes with a parameter called 'end'.
• By default, the value of this parameter is '\n' (the newline character).
• You can end a print call with any character or string using this parameter.
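
The countdown listing is not reproduced in this text. A sketch of such a program is shown below; the final message and the choice of a space for end are assumptions made for illustration.

```python
for i in range(5, 0, -1):
    # end=' ' replaces the default newline, so the numbers share one line.
    print(i, end=' ')
# The message text is illustrative only.
print('Blast off!')
```

Without the end parameter, each number would appear on a line of its own.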

The while loop

• We have already learned about for loops, which allow us to repeat


things a specified number of times.
• Sometimes, though, we need to repeat something, but we don’t
know ahead of time exactly how many times it has to be repeated.
For instance, a game of Tic-tac-toe keeps going until someone wins
or there are no more moves to be made, so the number of turns
will vary from game to game.
• This is a situation that would call for a while loop.

The while loop
While statements have the following basic structure:

while condition:
    action

As long as the condition is true, the while statement will execute the action.

Example:
x = 1
while x < 4:        # as long as x < 4...
    print(x**2)     # print the square of x
    x = x + 1       # increment x by 1
--output--
1
4
9
Only the squares of 1, 2, and 3 are printed, because once x = 4 the
condition is false.
The while loop
The following while and for loops are equivalent
(they have exactly the same effect):

[Figure: the two program listings]

--output--
0
1
2
3
4
5
6
7
8
9
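
The two listings are not reproduced in this text. A pair of loops that both print the numbers 0 to 9 might look like the sketch below; the variable names are assumptions.

```python
# Version 1: the for loop manages the counter itself.
for i in range(10):
    print(i)

# Version 2: the while loop needs the counter set up and increased by hand.
j = 0
while j < 10:
    print(j)
    j = j + 1
```

The while version needs two extra lines, which is why a for loop is usually preferred when the number of repetitions is known in advance.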
The while loop
Pitfall to avoid:

• While statements are intended to be used with changing conditions.
• If the condition in a while statement never changes, the
program will stay in an infinite loop until the user presses Ctrl-C.

Example:
x = 1
while x == 1:
    print('Hello world')
# A so-called infinite loop! Python will keep printing
# "Hello world" because x does not change
The while loop

• The optional else clause runs only if the loop exits normally (not by break).

x = 1
while x < 3:
    print(x)
    x = x + 1
else:
    print('hello')

---output---
1
2
hello
The break Statement
With the break statement we can stop the loop even if the
while condition is true.

• Now, consider this case with break:

x = 1
while x < 5:
    print(x)
    x = x + 1
    break
else:
    print('got here')

--output--
1

Because the loop is left by break, the else clause does not run.
The continue Statement
With the continue statement we can stop the current iteration and
continue with the next.

Comparison between break and continue

Example (break): exit the loop when i is 3:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
--output--
1
2
3

Example (continue): continue to the next iteration if i is 3:
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
--output--
1
2
4
5
6
Functions & Returns

• abs(x) The absolute value of x: the (positive) distance between x and zero.
• ceil(x) The ceiling of x: the smallest integer not less than x.
• cmp(x, y) -1 if x < y, 0 if x == y, or 1 if x > y (Python 2 only; this
function was removed in Python 3).
• exp(x) The exponential of x: e^x.
• fabs(x) The absolute value of x as a float (raises an exception if x
cannot be converted to a float).
• floor(x) The floor of x: the largest integer not greater than x.
• log(x) The natural logarithm of x, for x > 0.
• log10(x) The base-10 logarithm of x, for x > 0.
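
Since Python 3 has no cmp function, the same three-way result can be built from ordinary comparisons. The helper below is a hypothetical sketch, not part of the language.

```python
def compare(x, y):
    # Hypothetical helper, not a Python 3 built-in.
    # Comparisons give True or False, which count as 1 and 0 in arithmetic.
    return (x > y) - (x < y)

print(compare(2, 5))
print(compare(4, 4))
print(compare(7, 3))
```

The three calls give -1, 0 and 1, matching the description of cmp above.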

Functions & Returns
• max(x1, x2,...) The largest of its arguments: the value
closest to positive infinity.
• min(x1, x2,...) The smallest of its arguments: the value
closest to negative infinity.
• modf(x) The fractional and integer parts of x in a two-item tuple.
Both parts have the same sign as x. The integer part is returned as a
float.
• pow(x, y) The value of x**y.
• round(x [,n]) x rounded to n digits from the decimal point.
In Python 3, ties are rounded to the nearest even value: round(0.5) is 0,
round(1.5) is 2 and round(-0.5) is 0. (Python 2 rounded ties away from zero.)
• sqrt(x) The square root of x, for x >= 0.
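
The tie-breaking rule of round and the tuple returned by modf are easy to check at the prompt. The values below are chosen only for illustration.

```python
import math

# Ties go to the nearest even whole number in Python 3.
ties = [round(0.5), round(1.5), round(2.5), round(-0.5)]
print(ties)

# modf returns (fractional part, integer part), both as floats.
parts = math.modf(3.75)
print(parts)
```

Both 1.5 and 2.5 round to 2, which can be surprising if rounding half up is expected.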
Functions & Returns

• Some of the previous functions, like pow() and abs(), are built into the language.

• Consider the following example:

x = -10
print(abs(x))
print(pow(x, 2))
print(pow(x, 3))
y = 2.67
print(round(y))
print(round(y, 1))

--output--
10
100
-1000
3
2.7
Functions & Returns

• Other functions are not built in and require you to import the
math library.

import math

x = 9
print(math.sqrt(x))
y = 7.4
print(math.floor(y))
print(math.ceil(y))

--output--
3.0
7
8
TM112: Introduction to Computing and
Information Technology 2

Meeting #4
Block 1 (Part 2)
Introduction to problem solving in Python
OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati


Contents

• Introduction
• Strategies for success
• Strategies for problem solving
• 2.1 Problem solving using decomposition
• 2.2 Iteration
• 2.3 Problems
• 2.4 Using lists for flexibility
• 2.5 Nested iteration
• Summary

Introduction
• Programming is about solving problems, so while it is
sometimes perceived as difficult, it is closely related to
many things we do every day.
• Python is a great language for beginner programmers as
small programs can be quickly and easily written.
• In this module, programming is viewed as a problem-
solving process that requires you to think about a
problem and decompose it, before writing any code.

Strategies for problem solving

• Often when solving a problem, we can adapt the solution to
a problem solved earlier.
• This ability to reuse solutions is key. It means that, while
we all start as novice programmers, if we keep going, we
can fully expect to end up tackling complex problems.
• When we are writing a program, we are producing an
algorithm.

Problem solving using decomposition
(Algorithms)
• Solving problems, when programming, starts with
working out an algorithm.
• An algorithm tells us or a computer how to carry out a
task.
• An algorithm should be self-contained in that all of the
instructions for the task are included within the algorithm
and nothing else is needed.
• A step-by-step approach allows anyone following the
algorithm to organise their work and know precisely
where they are in the algorithm.
Problem solving using decomposition
(Algorithms)
• Most of the algorithms we meet in everyday life – such as
recipes or instructions for completing a task – are
sequences of individual actions.
• We start at the first action (usually at the top, and
sometimes labelled 1 or a) and work through the actions
in order.
• A key point about sequences is that the outcome of
performing an action is dependent on the outcome of
previous steps.

Problem solving using decomposition
(Make life simple: algorithms in simple English)
We will describe an algorithm using a series of steps
expressed largely in ordinary natural language, but in a
structured way.
• We solve a problem by decomposing (dividing) it into
steps that solve it.
• If the original problem is complex enough, we may first
decompose it into sub-problems and then decompose
these into steps.
• For difficult problems, we may further decompose the
sub-problems.
Problem solving using decomposition
Programming and robotic turtles

• We have seen how to produce an algorithm by


decomposing a problem. How does this relate to writing a
program? Quite simply, we need to keep decomposing
until the steps of our algorithm are simple enough to
translate into single lines of a programming language.
• In this module, the language we are using is Python, and
we will introduce appropriate parts of the language as
needed.

Problem solving using decomposition
Programming and robotic turtles

• In this subsection, we will program a robotic turtle.


• This is a turtle that can be instructed to move around a
two-dimensional space, leaving a trace of its movement
using a coloured pen.
• This provides a lot of opportunities for drawing
interesting shapes, diagrams and indeed pictures.

Problem solving using decomposition
Programming and robotic turtles

• Here is our first simple turtle program. (You can try these
commands in Python and see what happens.)

# Draw start of staircase
from turtle import *
forward(40)
left(90)
forward(40)
right(90)
forward(40)

Notes on the program:
• # Draw start of staircase: a comment; it is not executed.
• from turtle import *: this line states that we will be working with
commands that apply to a turtle.
• forward, left and right: the operations that are available for us to
use on the turtle.
• forward(40): the operation works in terms of pixels. The turtle
always starts pointing horizontally to the right (by default in the middle of
the window), so this line results in the turtle drawing a horizontal line from
left to right of length 40 units.
Problem solving using decomposition
Programming and robotic turtles

• So the program draws the start of a


staircase consisting of three lines.
• Note that the figure includes an
arrowhead representing the turtle.
• While in general we might want to
hide this, and Python hides the
turtle using ht(), for our purposes it
is sometimes useful to see where the
turtle is and where it is pointing. This
is partly because some of our
solutions will be used as part of
larger solutions.
Problem solving using decomposition
Programming and robotic turtles

• If we had written the solution to the previous problem in


natural language (English, in our case), we might have had
something like:
> Draw start of staircase
move forward by 40 units
turn left by 90 degrees

• Our first line uses a ‘>’ symbol, which shows that the first line is
a heading: ‘> Draw start of staircase’, which tells us what we
want to do. It describes the problem we are solving.
• The next two lines are a decomposition of the heading line
above. These two lines achieve the task set out in the heading.
Problem solving using decomposition
Drawing some simple shapes through decomposition

[Figure: the natural-language decomposition alongside its Python translation]
Problem solving using decomposition
Iteration

• Imagine I have decided that I need to walk 10 000 steps


per day to stay fit. I want to know that I have walked 10
000 steps, and I decide I will be more likely to do this if I
have an algorithm to follow.
• I start to write the algorithm:

Having realized that writing the


algorithm out this way is going to
be very tedious, I want a way of
writing down something that
repeats a number of times.
Problem solving using decomposition
Iteration
• We will have a variable called ‘step-counter’, and we
need a way of giving it a value, doing something, and
increasing the value. We also need to stop after 10 000.
• We will use the keyword 'for' to show that we want to do
something a number of times. We will use it as
follows:
[Figure: the step-counting algorithm written with 'for']
• This is an example of iteration (a loop): something gets carried out
repeatedly, a set number of times.
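
The figure with the 'for' version of the algorithm is not reproduced here. In Python the same idea might be sketched as follows. The names step_counter (Python names cannot contain a hyphen) and steps_walked are assumptions of this sketch.

```python
steps_walked = 0
# range(1, 10001) gives step_counter the values 1, 2, ..., 10000 in turn.
for step_counter in range(1, 10001):
    steps_walked = steps_walked + 1   # stands in for "walk a step"

print(steps_walked)
```

The loop gives the counter its first value, increases it and stops after 10 000, which are the three things the algorithm needed.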
Problem solving using decomposition
(Making choices)
Now, suppose that I want to click my fingers only when a
multiple of 10 is reached. Then, I can modify my algorithm as
follows:

• Notice that the if keyword is indented at the same level as ‘Walk a


step’ – since they are two parts of the same sequence. The click step
is indented further, to show it is conditional on the condition
expressed by the if statement.
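
The modified algorithm is not reproduced in this text. A Python sketch of the same idea is shown below; the names step_counter and clicks are assumptions, and counting the clicks stands in for actually clicking.

```python
clicks = 0
for step_counter in range(1, 10001):
    # "Walk a step" would happen here, on every pass through the loop.
    if step_counter % 10 == 0:
        # Indented under the if: runs only when step_counter is a multiple of 10.
        clicks = clicks + 1

print(clicks)
```

The remainder operator % is 0 exactly when the counter is a multiple of 10, so the click happens once every ten steps.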
Problem solving using decomposition
(Programming for repetition)
In this case, for denotes iterating (looping) around a statement or sequence of
statements.

Python

• The program will move the turtle forward by 100 units. It does this by
moving forward 10 units, and repeating this movement ten times.
• range(1, 11) means that the range of numbers starts at 1, counts
upwards by 1 and stops just before 11.

Problem solving using decomposition
(Programming for repetition)
Python
Alternative Solution

• Recall that in Python, when we specify a range using just one
number, the range starts at 0.
• In this case, when using range(10), we get a range of whole numbers
from 0 to 9.
Problem solving using decomposition
(A more powerful approach to design)
• Let’s think about a slightly more complicated problem.
We will design and implement a program to draw two
squares, one below the other, with a gap in between.

Problem decomposition
Problem solving using decomposition
(A more powerful approach to design)
• We have now decomposed the problem into sub-problems,
rather than steps.

The symbol >> refers to sub-problems


Problem solving using decomposition
(A more powerful approach to design)
• Now we can decompose
sub-problems into steps.
• We copied and pasted
from the original square
algorithm! This is known as
reusing solutions.
• Reuse is a mature approach
to program design that saves
time and avoids
introducing errors.
Using lists for flexibility
(Drawing a graph of a fixed number of points)
• In this subsection, we will consider how to draw a line graph.
• We will assume we are plotting the sales of gloves, by a
given company, over the four quarters of the year.
• Now, we have seen how a variable could be used to control a
loop, and that it is essentially a named box that can hold a
value.
• Here we will use variables to hold the sales of gloves for each
quarter.
• We will use the variables gloves1, gloves2, gloves3 and gloves4 to hold the
number, in millions, of gloves sold in the first, second, third
and fourth quarters.
• When working with variables, we need to be able to give
them values. Python has, like many programming
languages, a construct called assignment.
Using lists for flexibility
(Drawing a graph of a fixed number of points)
• The turtle has considerably more operations than we
have seen so far.
In particular, it has an operation goto that allows us to move the
pen to a position, drawing a line as we go.

• Assuming we start at position (0,0), the Python
code goto(40,0) moves the turtle horizontally to
position (40,0), drawing a line if the pen is down. The
first value in the brackets represents the horizontal
displacement (from 0) and the second value represents
the vertical displacement.
Using lists for flexibility
(Drawing a graph of a fixed number of points)
Python
# Produce graph for gloves sales
from turtle import *
# set up the variables
g1=10
g2=8
# produce the x axis
goto(40,0)
goto(0,0)
# produce the y axis
goto(0,100)
goto(0,0)
# Plot data
goto(20, g1*10)
goto(40, g2*10)
ht() # to hide the turtle prompt
Using lists for flexibility
(Drawing a graph of a fixed number of points)
Python
# Produce graph for gloves sales
from turtle import *
# reset the drawing and move the pen to the origin
clear()
pu()
setpos(0,0)
pd()
# set up the variables
g1=10
g2=8
# produce the x axis
goto(40,0)
goto(0,0)
# produce the y axis
goto(0,120)
goto(0,0)
# Plot data
goto(20, g1*10)
dot(5, "blue")
write("g1", False, "center", ("Arial", 8, "bold"))
goto(40, g2*10)
dot(5, "red")
write("g2", False, "left", ("Arial", 8, "bold"))
ht() # to hide the turtle prompt
Using lists for flexibility
(Working with simple lists)
• Now you may have noticed that in the last section we
were doing something repeatedly, but we didn’t use a
loop.
• We needed to plot a number of points, but each point
required us to use a different variable.
• Let’s think about our gloves data. Imagine we want to
know the total sales of gloves for the year.
total = gloves1 + gloves2 + gloves3 + gloves4

Using lists for flexibility
(Working with simple lists)
• A list is a data structure in Python that is a mutable
(or changeable), ordered sequence of elements.
• Each element or value that is inside of a list is called an item.
• Lists are defined by having values between square brackets [ ].
• Lists are great to use when you want to work with many
related values.
• They enable you to keep data together that belongs together,
condense your code, and perform the same methods and
operations on multiple values at once.

Using lists for flexibility
(Working with simple lists)
• To create a list we use square brackets to indicate the start and end of
the list, and separate the items by commas:
L = [1,2,3]
• You can use the print function to print the entire contents of a list:
print(L) #[1, 2, 3] will be printed
• Lists are mutable: individual elements can be reassigned in place.
L[0] = 4
print(L) #[4, 2, 3] will be printed
• The empty list is []. It is the list equivalent of 0 or empty string ''.
• If you have a long list to enter, you can split it across several lines, like below:
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40]

Using lists for flexibility
(Working with simple lists)
• If we had sales data for every day of the year, with
gloves1 holding the data for the first day of the year,
and so on, we could still use the previous approach –
but things become tedious.
• Instead, we would like a way of holding the glove data
together and being able to access it in a flexible way.
• In Python, we can say:
gloves = [10,8,3,5]
• This variable refers to a list of numbers, rather than just
a single number as earlier.
Using lists for flexibility
(Working with simple lists)
Now, to calculate and print the total number of glove
sales:
gloves = [10,8,3,5]
total = 0
for index in range(0, len(gloves)):
    total = total + gloves[index]
print(total)
This gives a loop which gets executed with values for
index of 0, 1, 2, and 3.
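
The same total can be computed without index numbers. The sketch below loops over the items directly and then uses Python's built-in sum function; the loop variable name sales is an assumption.

```python
gloves = [10, 8, 3, 5]

# Loop over the items themselves instead of over index numbers.
total = 0
for sales in gloves:
    total = total + sales
print(total)

# The built-in sum function gives the same result in one step.
print(sum(gloves))
```

The index-based loop is still useful when the position of each item matters, for example when plotting the points of the graph.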

Nested iteration
(Independent nested loops)
• Sometimes we use loops within loops (embedded loops) to solve
more complex problems.

• For example, what if we had monthly sales data of gloves for


several years and wanted to calculate the total sales for each year?
In this case, we might have a list containing a list for each year.

• Consider the problem of producing a ‘times table’: a table showing


all the possible multiplications of two numbers between 1 and 12, in
order.

• Breaking the problem down, we see the need to show all the
multiples of 1, all the multiples of 2, and so on. So we can use a loop.
The sub-problems, such as finding the multiples of 1, 2 and 3, can
also be done using a loop.
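
The yearly-totals idea mentioned above can be sketched with a list that contains one list per year. The figures are made up for illustration, and quarterly rather than monthly values are used to keep the example short.

```python
# Hypothetical quarterly sales (in millions) for three years.
sales_by_year = [
    [10, 8, 3, 5],
    [12, 9, 4, 6],
    [11, 7, 5, 8],
]

yearly_totals = []
for year in sales_by_year:       # outer loop: one pass per year
    total = 0
    for amount in year:          # inner loop: one pass per value in that year
        total = total + amount
    yearly_totals.append(total)  # append adds the total to the end of the list

print(yearly_totals)
```

The inner loop runs to completion for each pass of the outer loop, which is the pattern the times table uses as well.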
Nested iteration
(Independent nested loops)
[Figure: problem decomposition, translated into the Python code below]

# Produce times table
size=12
for row in range(1, size+1):
    for column in range(1, size+1):
        print(row*column, end=' ')
    # Move to a new line
    print()
Nested iteration
(Independent nested loops)
• In the previous case, we just want a space character between the
numbers. We need to include end = ... any time we don’t want print
to move to a new line. So end = ' ' adds a space character without
moving to a new line.
• To prevent print moving to a new line, but without adding a
character, we can say end = ''.

Nested iteration
(Programming the turtle using nested loops)
• Consider a program to produce a number of squares across
the page. A decomposition of this problem is as follows:

• If we now copy the


decomposition from
subsection 2.2.3 for ‘Draw a
square’ into the above
decomposition, we get:
Nested iteration
(Independent nested loops)

[Figure: problem decomposition, translated into the Python code below]

# Draw squares across page
from turtle import *
number_of_shapes=4
for shape in range(1, number_of_shapes+1):
    # Draw a square
    for side in range(1,5):
        forward(40)
        right(90)
    # Move forward to start position of next square
    pu()
    forward(50)
    pd()
Nested iteration
(Dependencies between nested loops)
• In this subsection, we will look at loops where the index of
the inner loop is dependent on the outer loop.

• Consider if we wanted to print a triangle, not using the


turtle robot, but as a block of characters – for example, the
following right-angled triangle:

*
**
***

Nested iteration
(Dependencies between nested loops)

Python
# Print Right-angled triangle
size=3
for line in range(1,size+1):
    for asterisk in range(1, line+1):
        print('*', end='')
    print()
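
As an alternative sketch, the work of the inner loop can be done by string repetition, since '*' * n builds a string of n asterisks. The list rows is an assumption added so the lines can be inspected afterwards.

```python
size = 3
rows = []
for line in range(1, size + 1):
    # '*' * line repeats the asterisk 'line' times, replacing the inner loop.
    row = '*' * line
    rows.append(row)
    print(row)
```

The dependency is the same as in the nested version: the length of each row is decided by the current value of the outer loop variable.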

Summary
After studying this part, you should be able to:
• decompose a simple problem to produce an algorithm,
using sequence, selection and iteration
• translate a simple design of an algorithm into Python
• make use of iteration to produce code that can solve
problems where we need to do things several times
• make use of lists to express the idea that a number of
data items are related
• solve problems involving drawing line-based images using
turtle graphics
• use algorithmic thinking to solve problems.
TM112: Introduction to Computing and
Information Technology 2

Meeting #5
Block 1 (Part 4)
Patterns, algorithms and programs 1
OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati


Contents

• Introduction
• 4.1 Calculate
• 4.2 Document and test
• Lists
• Summary

Introduction
• We will study several types of problems and the
corresponding patterns that you can use to solve them.
• A pattern will be a template to be filled in to obtain a
concrete algorithm.
• When given a problem, if you recognise the type of the
problem, you can then use the corresponding pattern to
create a concrete algorithm for the given problem. That is
much easier and faster than having to design an algorithm
from scratch.

Introduction
• The problem-solving process you saw in Part 2 becomes more
structured, with intermediate steps to go from problem to
algorithm, illustrated in this Figure.

4.1 Calculate
• Throughout this part, we will show you various kinds of
numeric problems, i.e. problems where the inputs and outputs are numbers.
• In this section, we’ll start with the simplest of such
problems, with a single output calculated directly from
the inputs.
• The aims of this section are to recap and summarise some
concepts you already came across in Part 2 and possibly in
TM111, and to set the stage for the structure of the
numeric problems and solutions in the rest of this part.

4.1.1 Numeric expressions

• Python can be used as a simple calculator.


• Python provides integers (whole numbers) like -3 and 0 and
123456789. Unlike many other programming languages, Python integers can
be arbitrarily large (for both positive and negative integers).
• Note that an integer is written as a sequence of digits: even for large
numbers, you can’t use commas or full stops to group digits into thousands
(since Python 3.6, single underscores between digits are allowed, e.g. 1_000_000).
• Python also provides floating-point numbers (decimal numbers) like
-3.2 and 0.5 and 12345.6789.
• Python uses the point as the decimal mark and prints no thousands
separator, so the number one million and a half is written as 1000000.5.

151
4.1.1 Numeric expressions

• Python provides several numeric operators to express
calculations over numbers.
• Besides addition (+), subtraction (-) and multiplication (*),
there are two kinds of division.
• A single division sign (/) represents true division, i.e. the result is
always a floating-point number.
• A double division sign (//) represents floor division, i.e. the result
is rounded down to a whole number (an integer if both operands are integers).
• For example, 1/2 is 0.5, but 1//2 is 0.
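The difference between the two division operators can be checked with a short sketch. The numbers and variable names below are illustrative only and do not come from the module materials.

```python
# Illustrative values only: comparing / and // on the same operands.
exact = 7 / 2        # single slash: the result is a float
floored = 7 // 2     # double slash on two integers: an integer, rounded down
negative = -7 // 2   # "rounded down" means towards minus infinity, not towards zero
whole = 6 / 3        # still a float, even though 3 divides 6 exactly

print(exact, floored, negative, whole)
```

Note that rounding down matters for negative numbers: the floor of -3.5 is -4, not -3.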

152
4.1.1 Numeric expressions
• Python also provides the remainder (modulus) operator,
confusingly written as the percentage sign (%).
• The % operator calculates the remainder of the floor
division of the first number by the second,
• For example, 7%2 (read as “7 modulo 2”) is 1 because 2 fits 3 times
in 7 (3 * 2 = 6) and 1 remains (7 – 6 = 1).
• The remainder operator is useful to check if a number n is a
multiple of another number m; if it is, then n modulo m will
be 0, because m will fit an exact number of times in n.
• For example, 14%7 is 0 because 14 is a multiple of 7.
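The multiple check described above can be sketched in a few lines. The variable names n, m, remainder and is_multiple are chosen here for illustration.

```python
# Check whether n is a multiple of m using the remainder operator.
n = 14
m = 7
remainder = n % m              # what is left after fitting m into n as often as possible
is_multiple = remainder == 0   # True exactly when m fits a whole number of times

print(n, 'modulo', m, 'is', remainder)
print('Is', n, 'a multiple of', m, '?', is_multiple)

seven_mod_two = 7 % 2          # 2 fits 3 times in 7 and 1 remains
print('7 modulo 2 is', seven_mod_two)
```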

153
4.1.2 Formula problems: pattern
Problem 4.1 Brick volume

• Compute the volume of a brick, given its width, length and
height.
• We will systematically follow the problem-solving
workflow in slide 4 and, in the process, introduce the first
problem type and its solution pattern.

154
4.1.2 Formula problems: pattern
Problem 4.1 Brick volume
Decompose the problem
• The first step is to decompose the problem into simpler
sub-problems.
• In this case, we remember from school that usually the
volume of a solid is obtained from the area of its base and
its height. In this problem, the base of the brick is the
rectangular area formed by the width and length.
• Since the volume depends on the area, I have to
decompose the problem in the right order:
• > Compute the volume, given the width, length and height:
• >> Compute the base area, given the width and the length
• >> Compute the volume, given the base area and the height
155
4.1.2 Formula problems: pattern.
Pattern 4.1 Formula
Line  Instruction
1     initialise the input variables
2     set the output variable to the value of the formula applied to the inputs
3     print the output variable


• The pattern is not an algorithm: it doesn’t tell us the precise
variables and values to use, because that will depend on the
problem.

• The pattern is a template that needs to be ‘filled in’ to become an
algorithm. Every algorithm thus obtained is a concrete instance of
the abstract pattern. Therefore we will also use the verb
‘instantiate’ to mean ‘fill in’.
156
4.1.3 Formula problems: algorithm
Instantiate the Pattern
• We have to fill in the pattern twice: once for each sub-
problem.
• The pattern mentions input and output variables.
• In the first sub-problem (computing the area), the inputs are
the width and length and the output is the area.
• In the second sub-problem (computing the volume), the
inputs are the area and the height and the output is the
volume.
• Note how the input of a sub-problem can be the output of a
previous sub-problem.

157
4.1.3 Formula problems: algorithm

• We can now start to develop the algorithm for the first
sub-problem (computing the area) by filling in the
pattern.
• We are going to fill in the pattern bit by bit, so there will
be several intermediate, partially filled-in versions of the
algorithm.
• We first insert the patterns into the problem
decomposition to ensure we won’t forget to fill in all the
steps.

158
4.1.3 Formula problems: algorithm
• Line 3 asks to provide the initial values, but the
sub-problem doesn’t state what the values of the width
and length are.
• Usually the algorithm would ask the user to type in
some values on the keyboard, but to keep things simple
I choose the values myself – preferably small ones
(let’s say 2 and 3) to easily check if the area is
correct in line 5.
• So, line 3 of algorithm 4.1 version 1 is instantiated as
follows:
159
4.1.3 Formula problems: algorithm
• Note how line 3 of the pattern became two lines of the
algorithm.
• Note that we use specific and descriptive variable names
that capture the problem at hand.
• Next we have to fill in what is now line 5.
• The output is the area, and the formula to calculate a
rectangular area is simply the product of width and length.
• Next, in line 6, we replace the output variable with area,
as per the variable used in line 5.
160
4.1.4 Formula problems: program

Implement the algorithm

• The next stage of the problem-solving process is to
translate the algorithm to Python code.
• This is straightforward for formula problems because
they only use assignments, numbers and expressions.
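As a minimal sketch of that translation for Problem 4.1, the two sub-problems might become the program below. The width and length are the values 2 and 3 chosen earlier; the height of 4, the units and the wording of the printed messages are assumptions made for illustration, since the slides do not state them.

```python
# Problem: compute the volume of a brick
# Inputs: width, length and height (units assumed to be centimetres)
width = 2
length = 3
height = 4   # assumed value, chosen small so the result is easy to check by hand

# Sub-problem 1: compute the base area, given the width and the length
area = width * length
print('The base area is', area)

# Sub-problem 2: compute the volume, given the base area and the height
volume = area * height
print('The volume is', volume)
```

The output of the first sub-problem (area) is the input of the second, which is why the two instances of the formula pattern must appear in this order.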

161
4.2 Document and test
• For complicated or less familiar problems, the reader can struggle
to understand what’s going on in the code, even with descriptive
variable names.

• The solution is to document the code, by adding comments.

• Comments should be used to explain the inputs and outputs, in
what units the values are, and any constraints – for example, not
being negative.

• Simple and familiar formulas don’t require explanation.

162
4.2 Document and test
• No matter how simple your program is, you should get
into the habit of testing it to catch any errors.

• Testing a program consists of writing down several pairs
of inputs and the corresponding expected outputs, running
the program for those inputs, and checking whether the
actual outputs match the ones you expected.
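One way to organise such tests, sketched here for the brick volume problem, is to write the pairs of inputs and expected outputs in a list and loop over them. The test values were worked out by hand for this illustration, and the names tests and results are hypothetical.

```python
# Each test is a pair: (width, length, height) and the expected volume.
# Units are assumed to be centimetres; all inputs must be positive.
tests = [
    ((2, 3, 4), 24),
    ((1, 1, 1), 1),
    ((10, 5, 2), 100),
]

results = []
for (width, length, height), expected in tests:
    area = width * length        # sub-problem 1
    volume = area * height       # sub-problem 2
    passed = volume == expected  # compare actual and expected output
    results.append(passed)
    print(width, length, height, 'gives', volume, 'expected', expected, 'passed:', passed)
```

If any line reports passed: False, either the program or the expected value written down for that test is wrong.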

163
4.2 Document and test
• You only have to test your program for admissible input
values, but you should document in the code which values are inadmissible.
• If the user inputs an inadmissible value, it’s their fault, and
all bets are off: your program may crash, output a
nonsensical value – or, even worse, output a value that
seems correct.
• For numeric problems, some of the typical conditions for
an input value to be admissible are: the value is an integer;
positive; negative; non-zero; a multiple of some number n;
a percentage (i.e. a floating-point number from 0 to 1 or
an integer from 0 to 100); larger than another input; etc.
164
How to solve computational
problems- Summary
As a conclusion, we briefly summarize a process to solve
computational problems:

The process consists of:


1- Decomposing the problem into sub-problems.
2- Recognising the type of each sub-problem, and instantiating the
corresponding solution pattern to obtain an algorithm for each sub-
problem.
3- Putting the algorithms together to solve the overall problem.
4- Translating the overall algorithm to Python code.

Finally , programmers are urged to document and test where necessary.

165
Lists

• Lists are written as comma-separated items (also called the
elements of the list) within square brackets. The empty list,
without any elements, is simply written as [].
• An example of a list with three integers is [4, 2, -9].
• Lists can be assigned to variables, e.g. temperatures =
[4, 2, -9]. The variable name should reflect the content
of the list to make the program easier to understand.
• The length of a list is its number of elements. The Python
expression len(temperatures) computes the length of
temperatures, which is 3 in this case.
166
Lists

• There are a number of things which work the same way for lists
as for strings.
• len(): where you can use len(L) to know the number of
items in a list L.
• in: operator which tells you if a list contains something.
• Examples:
if 2 in L:
    print('Your list contains number 2.')
if 0 not in L:
    print('Your list has no zeroes.')

16
7
Lists
• A list is a data structure in Python that is a mutable
(or changeable), ordered sequence of elements.

16
8
Lists
• A list can contain all kinds of things, even other lists.
• For example, this is a valid list:
L = [1, 2.718, 'xyz', [1, 2, 5]]

• To access an item of the list inside the list, use two pairs of square
brackets: the first index selects the inner list and the second selects the
item within it.
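Nested indexing on the list above can be sketched as follows; the variable names inner, last_of_inner and first_letter are chosen here for illustration.

```python
L = [1, 2.718, 'xyz', [1, 2, 5]]

inner = L[3]             # the inner list is the item at index 3
last_of_inner = L[3][2]  # first index picks the inner list, second picks its item
first_letter = L[2][0]   # the same idea works for a string inside a list

print(inner, last_of_inner, first_letter)
```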

16
9
Lists

• Indexing and slicing work exactly as with strings:
L[0] is the first item of the list L.
L[:3] gives the first three items.
• The + operator adds one list to the end of another.
• The * operator repeats a list.
• Following are some examples.
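The original examples were shown as an image that did not survive extraction, so the following is a replacement sketch with made-up values rather than the slide's own code.

```python
L = [10, 20, 30, 40, 50]   # illustrative list

first = L[0]               # indexing starts at 0
last = L[-1]               # negative indices count from the end, as with strings
first_three = L[:3]        # slice up to, but not including, index 3
middle = L[1:4]            # items at indices 1, 2 and 3
joined = L[:2] + [99]      # + builds a new list from two lists
repeated = [0] * 3         # * repeats the list

print(first, last, first_three, middle, joined, repeated)
```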

17
0
Lists

In Python, a for loop is used to iterate over the items of any sequence,
such as a list, string or tuple. A for loop can also step through the
positions (indices) of a sequence by using the built-in function range()
together with len().

Both of the following examples print out the items of a list, one by one, on
separate lines.
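The two examples were images in the original slides; a plausible reconstruction, using a hypothetical list named colours, is:

```python
colours = ['red', 'green', 'blue']   # hypothetical list for illustration

# Example 1: iterate by item
for colour in colours:
    print(colour)

# Example 2: iterate by position, using range() and len()
for i in range(len(colours)):
    print(colours[i])
```

Iterating by item is shorter; iterating by position is useful when the index itself is needed, for instance to start at the second item.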

Lists
More operations on lists:
• Concatenation:
>>> L1 = [0,1,2]; L2 = [3,4,5]
>>> L1+L2
[0, 1, 2, 3, 4, 5]
• Repetition:
>>> L1*3
[0, 1, 2, 0, 1, 2, 0, 1, 2]
• Appending:
>>> L1.append(10)
>>> L1
[0, 1, 2, 10]
• Sorting:
>>> L3 = [2,1,4,3]
>>> L3.sort()
>>> L3
[1, 2, 3, 4]
174
Lists
• Reversal:
>>> L4 = [4,3,2,1]
>>> L4.reverse()
>>> L4
[1,2,3,4]
• Shrinking:
>>> del L4[0] # L4 will be [2,3,4]
>>> L4 = [] # L4 will be []
• Index and slice assignment:
>>> L1 = [10,20,30,40]
>>> L1[1] = 0 # L1 will be [10,0,30,40]
>>> L1[1:4] = [4,5,6] # L1 will be [10,4,5,6]
• Making a list of integers:
>>> list(range(4))
[0, 1, 2, 3]
>>> list(range(1,5))
[1, 2, 3, 4]
175
Example- List of lists
• Suppose we have the following list:
List=[['Ali', 5, 10,15],['Naji',12,12,15],['Fadi',10,14,12],['Rajaa',18,16,14]]
- Print the average of the second grade for all students.
- Print the name of the student and his/her average grade

Accessing the 2nd grade

OR
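The solution code on this slide was an image and is not recoverable, so the following is only a sketch of the two approaches it refers to (iterating by item or by position). It assumes that "the second grade" means the item at index 2 of each inner list, and it renames the list to students for readability.

```python
students = [['Ali', 5, 10, 15], ['Naji', 12, 12, 15],
            ['Fadi', 10, 14, 12], ['Rajaa', 18, 16, 14]]

# Average of the second grade, iterating by item
total = 0
for student in students:
    total = total + student[2]          # index 0 is the name, index 2 the 2nd grade
second_grade_average = total / len(students)
print('Average of the second grade:', second_grade_average)

# OR: the same total, iterating by position with double indexing
total_by_position = 0
for i in range(len(students)):
    total_by_position = total_by_position + students[i][2]

# Name and average grade of each student
averages = []
for student in students:
    name = student[0]
    grades = student[1:]                # everything after the name
    average = sum(grades) / len(grades)
    averages.append([name, average])
    print(name, average)
```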

176
Summary
In this part, you learned further techniques to solve
computational problems by:
• looking in the problem statement for the input(s) and the
output(s)
• thinking about which inputs are not allowed, e.g. negative
numbers
• writing tests (pairs of admissible inputs and their
expected outputs)
• recognising the type of the problem or the types of the
sub-problems
• instantiating the patterns for those problem types
• combining the algorithms for the sub-problems to solve
the whole problem.
177
TM112: Introduction to Computing and
Information Technology 2

Meeting #6
Block 2 (Part 2)
Patterns, Algorithms and Programs 2

Block 2 (Part 4)
Organizing Your Python Code and Data

OU Materials, PPT prepared by Dr. Khaled Suwais


178
Edited by Dr. Ahmad Mikati
Contents

• Introduction
• Generate a List- Append
• Reduce
• Search
• Combine
• The Final Problem
• Summary

179
Introduction
• This part continues to apply the problem-solving approach from Block 1
Part 4, in which we go from problems to programs via patterns and
algorithms.

• Block 1 Part 4 ended with the generation of a sequence of numbers. To
be able to better process sequences of numbers, they will be stored in
lists.

• Lists are a very useful and flexible way of storing multiple data items.
Most Python programs use lists. In this part, I will show some common
problem types on lists and the corresponding solution patterns.

• As part of the overall problem-solving process, this part will also
continue to emphasize the importance of documenting and testing
your programs.

180
• Generate a List

• Sequences and Lists


• Append
• Filter
• Transform

181
Append
• Normally the generated sequence has to be stored for
further processing, and lists are ideally suited for that.
Storing the generated sequence in a list requires just two
changes to Pattern 2.1:
- start with the empty list.
- instead of (or in addition to) printing the value,
append it to the list, i.e. add it to the end of the list.

• By appending each value, the list will store the values in
the same order as they were generated.

182
Append
• In Python, adding a value to the end of a list, say sequence, is
intuitively written as:

sequence = sequence + [value]

Note the square brackets around the value. They tell Python to
concatenate (‘add’ together) two lists, the second of which happens to have one
element in this case. Forgetting the brackets will lead to an error.

• Alternatively, the list method append() adds the value to the end of the
existing list:

sequence.append(value)
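Both ways of appending can be compared in a short sketch. Pattern 2.1 is not reproduced in these slides, so the generating loop below (the even numbers below 10) is an assumption chosen for illustration, as are the variable names.

```python
# Version 1: concatenation with a one-element list
sequence = []                        # start with the empty list
for value in range(0, 10, 2):        # generate 0, 2, 4, 6, 8
    sequence = sequence + [value]

# Version 2: the append() method
appended = []
for value in range(0, 10, 2):
    appended.append(value)

# Forgetting the brackets tries to add an integer to a list, which fails
try:
    sequence = sequence + 10
    brackets_error = None
except TypeError as error:
    brackets_error = str(error)

print(sequence)
print(appended)
print('Error without brackets:', brackets_error)
```

Concatenation builds a new list on every iteration, whereas append() changes the existing list, so append() is usually the more efficient choice for long sequences.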

183
Filter

• Often lists are not generated from scratch
but from an existing list. One common way is
to ‘filter’ the existing list: generate a new list
that only includes those values of the input
list that satisfy some condition.

184
Python editor: hot days program
• Given a list of daily temperatures in degrees Celsius in a certain place,
construct a list of the temperatures above 30. Assume temperatures are
given as whole numbers.

# Problem: obtain the temperatures above a threshold
# Input: a possibly empty list of temperatures in Celsius
daily_temperatures = [28, 33, 29, 32, 27, 42, 29, 36]
# Output: the list of temperatures above 30
hot_days = []  # a new list that will contain the hot temperatures
for temperature in daily_temperatures:
    if temperature > 30:
        hot_days = hot_days + [temperature]  # or hot_days.append(temperature)
print('The hot days had temperatures', hot_days)

O/P: The hot days had temperatures [33, 32, 42, 36]
185
Transform
• Another way to construct a new list from an existing
one is to transform each item of the input list into
one item of the output list.

Problem 2.2 Celsius to Fahrenheit


• Convert a list of temperatures in degrees Celsius to a
list of temperatures in degrees Fahrenheit. Both
input and output temperatures can be floating-point
numbers.
• The pattern for the list transformation problem is as
follows.
186
Python editor: Celsius to Fahrenheit
# Problem: convert Celsius temperatures to Fahrenheit
# Input: a possibly empty list of temperatures in Celsius
celsius_values = [28, 33, 29, 32, 27, 42, 29, 36]
# Output: the list of temperatures in Fahrenheit
fahrenheit_values = []  # a new list that will contain the transformed values
for celsius in celsius_values:
    fahrenheit = celsius * 1.8 + 32
    fahrenheit_values = fahrenheit_values + [fahrenheit]  # or use append
print('The temperatures in Fahrenheit are', fahrenheit_values)

The temperatures in Fahrenheit are [82.4, 91.4, 84.2, 89.6,
80.6, 107.60000000000001, 84.2, 96.8]
187
Reduce
• A typical class of problems on lists is to reduce them to a single value, which
might or might not be in the list. For example, a list of strings might be
reduced to a single string that is the concatenation of all the strings in the
list.
• In this and the next section you will see three types of reduction problems,
illustrated with numeric lists.
- Counting problems involve computing how many numbers are in the list, or
how many numbers in the list satisfy a certain condition, e.g. are non-zero.
- Aggregation problems are a generalization of counting problems. They
involve computing some new number (e.g. the sum or the product) out of all
or part of the numbers in the list.
- Retrieval (or search) problems involve finding a particular number in the list
that satisfies some condition, e.g. the largest number. Contrary to the other
two, in this type of problem the resulting number is always in the list.
• All these problems can be solved by similar programming patterns that use a
for- or while-loop and at least three variables: one for the list containing the
items to be processed, one to step through the items, and one to
incrementally compute the value the list is being reduced to. Learning to
recognize a problem – or part of a problem – as a reduction problem will
allow you to apply those patterns, even to non-numeric lists, instead of
thinking of a solution from scratch.
188
Count
• In its more general form, the counting problem asks for how
many items in a list satisfy some condition. Here’s an
example.

Problem 2.3 Negative Temperatures


• Given a list of daily maximum temperatures over a week-
long period, calculate how many days the temperature was
below zero.

189
Program 2.5 Negative temperatures
• Pattern 2.5 Counting (algorithm):
1. initialize the input list.
2. set a counter to zero.
3. for each item in list:
   a. if the item satisfies the condition:
      i. increment the counter, i.e. add 1 to it.
4. print the counter.

• Python code:
# Problem: count how many days temperature was below zero
# Input: temperatures, a list of 7 numbers
temperatures = [4, -5, 3, 1, 0, 3, -2]
# Output: number of days, a non-negative integer
days = 0
for temperature in temperatures:
    if temperature < 0:
        days = days + 1  # or days += 1
print('The temperature was below 0 for', days, 'days')

The temperature was below 0 for 2 days
190


Count – List Length
• As a further example, let’s look at the quintessential counting problem
on lists.

Problem 2.4 List length


• Compute the length of a list, i.e. count how many items are in the
list.

• It is such a commonly recurring problem that Python has already
solved it: len(some_list) computes the length of some_list.
Nevertheless, it is instructive to see that there is no magic to it.

• The main difference to the general problem is that there is no
condition to check: all items count towards the length of the list.
191
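For Problem 2.4, the counting pattern without a condition can be sketched as below. The list contents and the variable names are made up for illustration; the slides do not show this program.

```python
# Problem: compute the length of a list without using len()
# Input: some_list, a possibly empty list
some_list = ['a', 'b', 'c', 'd']
# Output: length, a non-negative integer
length = 0
for item in some_list:
    length = length + 1      # no condition: every item counts
print('The length of', some_list, 'is', length)
```

For an empty list the loop body never runs, so length keeps its initial value of 0, which is the correct answer.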
Aggregate
• The aim of aggregation is to compute a new number from the
numbers in the list. The resulting number is computed incrementally
as we iterate through the list, by ‘aggregating’ each number in the list
into the result so far.

• A quintessential aggregation problem is the sum of a list of numbers:
each number in the list is added to the result so far.
• Again, this is such a recurrent problem that Python already provides
an expression for it:
- sum(some_list) computes the sum of all the elements of some_list.
- The pattern generalizes Pattern 2.5 for counting problems:

192
Program 2.7 list sum
• Pattern 2.6 Aggregate:
1. initialize the input list.
2. initialize the aggregator with a suitable value.
3. for each number in the list:
   a. if the number satisfies the condition:
      i. update the aggregator according to the value of the number.
4. print the aggregator.

# Problem: compute the sum of a list of numbers
# Input: numbers, a possibly empty list of numbers
numbers = [4, -1, 3, 5, -4]
# Output: total
total = 0
for number in numbers:
    total = total + number
# the loop above can be replaced with: total = sum(numbers)
print('The sum of', numbers, 'is', total)

The sum of [4, -1, 3, 5, -4] is 7
193


Search

• A filtering problem (subsection 2.1.3) can also be seen as a
search problem: retrieve all the items that satisfy a particular
condition. In this section, we look at the problem of retrieving
any one item that satisfies the condition. Such a search
problem is a reduction problem and therefore will be solved in
a manner similar to other reduction problems.

• Search problems can be stated for all types of items, but
again the examples will be with numeric items.

194
Find a Value

• The pattern is similar to the previous reduction patterns: a
variable stores the item found so far, if any, which will be the final
result when the iteration over the list ends.
• The issue is, again, how to initialize the result variable.
• The trick is, again, to think what should happen if the list is empty.
In that case, the loop is not executed, so the final value of the
result variable will be the value it was initialized with.
• If the list is empty, there is no item to be found, and the output
should somehow represent ‘no item’.
• Some programming languages have a ‘null value’ for that; in Python it
is None. Hence the result variable has to be initialized with the null value.

195
Program 2.8 Find Negative Temperature
• Pattern 2.7 Find value:
1. initialize the input list.
2. set found to the null value.
3. for each item in the list:
   a. if the item satisfies the condition:
      i. set found to the item.
4. print found.

# Problem: find a negative temperature during the week
# Input: temperatures, a list of 7 numbers
temperatures = [4, 5, 3, 1, 0, 3, -2]
# Output: a negative temperature, if it exists, or None
negative = None
for temperature in temperatures:
    if temperature < 0:
        negative = temperature
print(temperatures, 'has negative temperature', negative)

[4, 5, 3, 1, 0, 3, -2] has negative temperature -2
196


Find the best value
• Another variation of the retrieval problem is to find the ‘best’ item
among a list, for some criterion of what ‘best’ means. Here is an
example.

Problem 2.5 Find The Coldest Day


• Given a list of daily temperatures, print the lowest value in the list.

• For this problem type, I will assume that the input list is not empty.
This guarantees that there will be a best value.

• The pattern to solve the general problem again uses a variable to
store the result so far, as we iterate over the list, but what is its initial
value? Since we have to go through the whole list to find the best
item, we can take any of the items as the baseline to compare
against. The simplest is to take the first item.
197
Program 2.9 Find Coldest Day
• Pattern 2.8 Find best value:
1. initialize the input with a non-empty list.
2. set best to the first item in the list.
3. for each item in the list:
   a. if the item is better than best:
      i. set best to the item.
4. print best.

# Problem: find the lowest temperature of the week
# Input: a list of 7 numbers
temperatures = [5, 0, -3, 7, 8, 5, 0]
# Output: coldest, the lowest value in temperatures
coldest = temperatures[0]
for temperature in temperatures:
    if temperature < coldest:
        coldest = temperature
print('The lowest of', temperatures, 'is', coldest)

The lowest of [5, 0, -3, 7, 8, 5, 0] is -3
• Note: The first iteration compares the first item with itself, which is
redundant. A plain iterate-by-item loop over the whole list always starts
with the first item; if we iterate by position, we can start with the
second item instead.
198
Program 2.9 Find Coldest Day
If we iterate by position, then we can avoid a redundant comparison
of the first item. The code will be:

# Problem: find the lowest temperature of the week
# Input: a list of 7 numbers
temperatures = [5, 0, -3, 7, 8, 5, 0]
# Output: coldest, the lowest value in temperatures
coldest = temperatures[0]
for i in range(1, len(temperatures)):
    if temperatures[i] < coldest:
        coldest = temperatures[i]
print('The lowest of', temperatures, 'is', coldest)

The lowest of [5, 0, -3, 7, 8, 5, 0] is -3


199
Block 2 (Part 4)
Organizing Your Python Code and Data

200
Functions in Python

• You may recall being told that you can find the size of a list named
gloves by using len(gloves). This involved a Python function,
len().

• The len() function is a built-in Python function: the Python
interpreter will know how to handle it, without you needing to add
any additional code to your program.

201
Function names and arguments
• When we talk about a function, we use its name – in this case, len – followed
by a pair of parentheses: len(). This way, you can see at once that we are
talking about a function rather than, for instance, a variable.

• In addition to a name, a function can have arguments (though not all
functions have arguments). In the example len(gloves), the variable
gloves is the argument of the function.

• When a function name is combined with its argument, the result is an
expression. As illustrated in Figure 4.2, combining the function name len
with the argument gloves results in the expression len(gloves).

Figure 4.2 The expression len(gloves) is composed of the function name len
and the argument gloves
202
Functions
The four parts of a function definition:
1. The reserved word def, which indicates that what
follows is a function definition.
2. The name of the function.
3. A list of arguments enclosed in parentheses (the list may be empty).
4. The body of the function, which is the set of
statements that carry out the work of the function,
noting that:
• A colon ( : ) ends the header line and starts the body of the function.
• The body of the function is indented.

def printHello():
    print("Hello! ")

203
Functions without return

• Some functions perform simple procedural tasks (specified in their bodies)
but do not return any information when they are called.
• Example: Write a function that will display a welcome message to a student with
his/her name. Then, use the function in your program.

def welcome(aName):
    print("Hello", aName)
    print("Welcome to AOU")

welcome("Ahmad")  # Function call

• When the function is called, an actual value for the argument must be used.
• For example, when the function call welcome("Ahmad") is executed, the actual
string "Ahmad" replaces the argument aName, resulting in the following output:

-----output-----
Hello Ahmad
Welcome to AOU
204
Functions with return

• Some functions provide a value as the result of some calculations made in
the function’s body.
• Python provides us with the return statement:
return followed by the value it needs to return
• In order not to lose the returned value, you need to:
• assign it to a variable (for further calculations, if needed)
OR
• print it immediately.
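The two ways of keeping a returned value, and what happens when neither is used, can be sketched with a small hypothetical function named double (it is not part of the module materials).

```python
def double(number):
    # Return twice the given number
    return number * 2

result = double(21)    # option 1: assign the returned value to a variable
print(result + 1)      # the variable can be used in further calculations

print(double(4))       # option 2: print the returned value immediately

double(5)              # the value 10 is computed but then lost
```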
205
Functions with return

• Example: Write a function that takes the height and width as arguments,
calculates the area of a rectangle, and returns it. Then display the area in the
output window. Use the function in your program.

def recArea(aHeight, aWidth):
    area = aHeight * aWidth
    return area

h = float(input("Enter the height: "))
w = float(input("Enter the width: "))
print("Area =", recArea(h, w))  # Function call inside print()

-----output-----
Enter the height: 4
Enter the width: 6
Area = 24.0
206
Functions with Multiple return values
So what if you want to return two values from a function instead of one?
There are a couple of approaches which new programmers take.
Let’s take a look at a simple example.

Approach 1: the returned values are placed in a list.

# example of multiple return values
def profile():
    name = input('Enter your name: ')
    BD = int(input('Enter your birth year: '))
    age = 2020 - BD
    return [name, age]            # the returned values are now in a list

profile_data = profile()          # capturing the returned values in the list profile_data
print('Name:', profile_data[0])   # retrieving each item by its index
print('Age:', profile_data[1])

Approach 2: the returned values are assigned to separate variables.

# example of multiple return values
def profile():
    name = input('Enter your name: ')
    BD = int(input('Enter your birth year: '))
    age = 2020 - BD
    return name, age

name, age = profile()             # the returned values are assigned to variables
print('Name:', name)
print('Age:', age)

Both versions produce the same interaction:

Enter your name: Naji
Enter your birth year: 1999
Name: Naji
Age: 21
207
Hiding Complexity: Interfaces and
Implementations
• When a function is called, the resulting value is referred to as the return
value of the function.

• To use a function, all you need to know are:
- the name of the function.
- what kind of argument(s) it needs.
- what it gives back as a return value.
- what other effects calling the function has.

• There is a name for the items you need to know: the interface of the
function. To use a function, knowing the interface is enough. There’s no
need to know what goes on under the bonnet, i.e. its implementation.

• When the implementation is separated from the interface, thereby hiding
some of the complexity, we also say that an abstraction has been created.
208
The Python Interpreter and Functions

• The Python interpreter deals with a program one line after the
other, starting with the very first line. A normal program line is
executed when the interpreter gets to it. However, functions do
receive special treatment, which we examine in this section.

• When the Python interpreter encounters a function definition, it
reads this definition into memory for later use.

• The interpreter only executes the body of a function when the
function is called.

209
The Python Interpreter and Functions

• On those occasions where a function is called, the parameters are
initialized with the actual arguments and then the statements in its
body are executed.
• On encountering the keyword return, Python does two things:
1. the function stops executing (control is handed back to the
point where the function was called originally)
2. the value of the expression following return is passed back. In
other words, the evaluation of the function expression is
complete and the result is the return value of the function.
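This order of events can be observed with a small sketch. The function first_negative and the events list are hypothetical names introduced here to record what happens when.

```python
events = []   # records the order in which things happen

def first_negative(numbers):
    events.append('body starts')
    for number in numbers:
        if number < 0:
            return number          # the function stops here; the rest is skipped
    events.append('no negative found')
    return None

events.append('after definition')  # the body has not been executed yet
result = first_negative([3, -1, -7])
events.append('after call')

print(events)
print(result)
```

The body only starts after the definition has been read, and the return inside the loop ends the function before -7 is ever examined.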

210
Using Functions-The Benefits
• Replacing duplicate code with a function can make a program
shorter and more readable.

• A further advantage is that it makes it much easier to modify the
code, if needed.

• In short, functions can help us write code that:
- is shorter and consequently more readable.
- is easier to change, making it less likely that errors are
introduced.
- explicitly mirrors the decomposition of a problem and
consequently is easier to understand.
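As a small illustration of replacing duplicate code, the sketch below writes the Celsius-to-Fahrenheit formula twice and then once inside a function. The city names and temperatures are made up, and to_fahrenheit is a hypothetical function name.

```python
# Without a function: the formula is duplicated
london_duplicated = 12 * 1.8 + 32
cairo_duplicated = 30 * 1.8 + 32

# With a function: the formula is written once and reused
def to_fahrenheit(celsius):
    # Convert a temperature from degrees Celsius to degrees Fahrenheit
    return celsius * 1.8 + 32

london = to_fahrenheit(12)
cairo = to_fahrenheit(30)

print(london, cairo)
```

If the formula ever needs correcting, only the body of to_fahrenheit has to change, rather than every place where the calculation appears.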

211
Reusing Code
• There is a more elegant way to reuse functions. Instead of
copying a function into a new program, you can also put all
your functions – say, for drawing figures – into a separate file
(with the .py extension). Let’s call it
figure_drawing_functions.py. At the beginning of
your new program, you then simply add from
figure_drawing_functions import *. This has the
same effect as placing the function definitions themselves at
the beginning of the program. This way, you can create your
own library of figure drawing functions.

• Important note: You will need to make sure that
figure_drawing_functions.py is in the same folder as
your new program, otherwise Python won’t be able to find it.

212
Summary
• In this part, you again practiced the techniques to solve
computational problems by:
1. looking in the problem statement for the input(s) and the
output(s)
2. thinking which inputs are not allowed, e.g. negative
numbers or empty lists
3. writing tests (pairs of admissible inputs and their
expected outputs)
4. recognizing the type of the problem or the types of the
sub-problems
5. instantiating the patterns for those problem types
6. combining the algorithms for the sub-problems to solve
the whole problem.
7. organizing your code using functions.

• You have seen more examples of different thinking
approaches.
213
TM112: Introduction to Computing and
Information Technology 2

Meeting #7
Block 2 (Part 5)
Diving Into Data

OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati


219
Contents

• Introduction
• The Creative Problem Solver
• A Data Analysis Project
• The Geography of Happiness
• Correlation and All That
• Summary

220
Introduction
• Data analysis consists of taking a set of data and computing something
from it – ‘crunching the numbers’ – to extract information.

• In the modern world, datasets can be huge, encompassing thousands,
millions or billions of items. To analyze such vast datasets, we need the
help of computers and programming languages. Fortunately, Python is
well suited for data analysis.

• Data analysis typically has a purpose. What do we want to get out of
the analysis?

• The analysis will involve a more extended use of Python than you have
seen so far, but we will take you through it step by step, and it will give
you valuable experience that will help with your own Python projects.

221
Problem Solving As A Process
• Human beings have taken problem solving further than our fellow
animals, and have been doing so for thousands of years.

• But for most of our human history, problem solving was just something
people did. Only very recently have people thought about problem
solving itself as a process that can be studied. One of the pioneers was
the Hungarian mathematician George Pólya, who set out his ideas in a
ground-breaking book How to Solve It.

• Pólya suggested that problem solving is a process with four stages
(Figure 5.4).

Figure 5.4 Four stages of problem solving

Problem Solving As A Process
• Let’s look briefly at the individual stages. Pólya’s interest was
in mathematical problems, but we have adapted his ideas to
apply to problem solving using a computer.

1. Understand the problem: Before we can solve the problem, we
need a clear idea of where we are starting from and what we want
to get out of it. An example of this is in Block 1 Part 4, in which
we discussed the idea that when tackling a problem, it may be a
good idea to begin by writing down how you will test the solution,
and doing this can help you get a clearer understanding of the
problem.

Problem Solving As A Process
2. Make a plan: This corresponds to devising an appropriate
algorithm. Of course, this isn’t always straightforward,
otherwise we might not feel we have a problem to solve!
But there are some very useful guidelines, generally called
heuristics*, that we can apply.

3. Carry out the plan: Write and test code to implement the
algorithm.

4. Look back: What challenges did we meet and how did we
overcome them? What resources did we use to help us?
What have we learned that we can apply to similar problems
in the future?

* Heuristic: any approach to problem solving, learning, or
discovery that employs a practical method.
Problem Solving As A Process
Heuristics are methods of discovery and invention; rules of thumb that we
can use to help us solve problems.
Below are some heuristics that we think are important:

1. Break the problem into smaller sub-problems.
2. Try to solve a simpler form of the problem.
3. Think if there is a pattern you have seen before.
4. Try working backwards from the solution.
5. Try representing the problem in a different way.
6. Don’t give up too quickly.
7. Don’t be afraid to make mistakes.

• You should recognize the first three of these, particularly 1 and 3,
from your previous study.
• For a heuristic to be useful, we must learn to recognize when it
can be applied, so it’s worth looking at some specific examples in
your book.
A Data Analysis Project
• For our data analysis project, we have chosen to look at some
questions about happiness and well-being, as measured by data
from the Office for National Statistics (ONS).

• You may be surprised that data is collected about happiness, and
perhaps skeptical about whether happiness can even be
measured.

• However, economists have become increasingly aware that to
assess the well-being of a nation, it is not enough to measure only
objective quantities such as gross domestic product. It is also
necessary to take subjective measures into account, such as how
happy people report themselves as being.

A Data Analysis Project
• The ONS (2015) has this to say about why this data is important.

• “The personal well-being statistics are used to inform decision making
among policy-makers, individuals, communities, businesses and civil
society. They complement other traditional measures of progress and
quality of life, such as unemployment and household income.”

• Our example project can’t take such a wide view as the World Happiness
Survey the article mentions, because that would mean pulling together
data from too many different sources and in too many different formats.
To keep the task manageable, we will just be looking at subjective well-
being in the UK.

• The data we need is freely available. Since 2011, the ONS has collected
data on subjective well-being and published it in a format that is quite
easy to work with in Python.
Working With Data in Python
• Figure 5.7 shows how people responded – using a scale from 0 (‘not
at all’) to 10 (‘completely’) – to the happiness question in the financial
years ending in 2012 to 2015.

Figure 5.7 How happy people reported themselves as being yesterday,
measured 2012–2015 (Source: ONS, 2015)

• In this section we shall look at the data that underlies the graph,
explain how it is structured, and show how we can access the data
from a Python program.

• Begin by downloading the code resources zip file from the module
website. Save the zip file to a convenient location and extract the
contents.
Getting Started on The Data
• Our first goal will be to examine the question, “Are we getting
happier as a nation?”. The ONS has already considered this, of
course, but we want to work out the answer for ourselves, so we
can study how the data analysis is done.

• To address our problem, we shall need to read the data into
Python, process it appropriately, then output some results. In
other words, we will decompose the problem as follows:

> Find evidence on whether we are getting happier as a nation

>> Read the ONS data into Python

>> Process the data

>> Output the results


A Data Analysis Project
• At this point, we can usefully apply the heuristic of solving a simpler
problem first. As a first step, we will read the data in, then simply print it
out again without doing anything with it.

• This initial problem can itself be decomposed as follows:

open the data file
for each line
    read it in
    print it out

• The data file is happ_1.txt, which will be stored in the same folder
as our Python program. Opening a file in Python uses fairly self-
explanatory syntax.
file = open('happ_1.txt', 'r')
• The second argument 'r' tells Python we want to read from the file.
(To write to it, we would use 'w'.)
A Data Analysis Project
• Now the file is open, how can we access the data? A file consists of a series of lines,
and applying heuristic 3 (think if there is a pattern you have seen before) suggests using
the same approach as we would to access the items in a list. Python lets us do just that:
we can loop through the lines in a file using a for statement, and print each as we go.

for line in file:
    print(line)

• The complete program is:

file = open('happ_1.txt', 'r')
for line in file:
    print(line)
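
A slightly more defensive variant of the program above, offered as a sketch rather than as the module's own code. Each line read from a file keeps its trailing newline, so print(line) shows an extra blank line between rows; stripping the newline avoids that, and a with statement closes the file automatically. The helper name read_lines and the two-line sample file are assumptions made purely for illustration.

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    lines = []
    # 'with' closes the file automatically, even if an error occurs.
    with open(path, 'r') as file:
        for line in file:
            lines.append(line.rstrip('\n'))
    return lines

# Hypothetical two-line sample standing in for happ_1.txt.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample_path, 'w') as out:
    out.write('0,1.4,1.2\n1,0.9,0.9\n')

sample_lines = read_lines(sample_path)
for line in sample_lines:
    print(line)
```

To use it on the real data, call read_lines('happ_1.txt') from the folder that contains the file.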

Working Out An Average
• To examine whether happiness in the UK increased over the time
period 2012–2015, we will work out the average for each year and
compare the results.

• There is more than one kind of average, but the one we will use
first is the mean.

• Given a simple list of numbers we find the mean by summing the
numbers, then dividing by how many numbers there are.

• For example, the sum of 1, 3, 8, 1, 5, 4, 2, 6 is 30, and there are
8 numbers. So the mean is 30 / 8 = 3.75
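
The calculation above can be sketched in Python as follows. This is an illustrative sketch only; the function name mean and the variable names are assumptions, not part of the module code.

```python
def mean(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)

values = [1, 3, 8, 1, 5, 4, 2, 6]
result = mean(values)
print('mean =', result)  # prints: mean = 3.75
```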

Working Out An Average
• Our ONS data is more complicated than the example above,
though. The column for a particular year – 2012, say – shows the
percentage of people who rated their happiness at each point of
the scale.

1.4% rated it at 0
0.9% rated it at 1
2.2% rated it at 2

• and so on. We can’t just average the percentages; we have to take
the ratings into account as well.

• This is still an aggregation problem, but of a slightly different kind.
To see what we need to do, we’ll again start with a similar but
simpler problem.
Example-Averaging Star Ratings
• In an online star rating system, reviewers award something – for
example, a book – a number of stars between 0 and 5.

• Imagine we want to compare two books on Python – Book A and
Book B – by averaging the ratings given by a number of reviewers.

• Table 5.2 shows the percentages of reviewers awarding the two
books each of the six possible star ratings. (For instance, 9% of
reviewers gave Book A no stars, 27% gave it 1 star, 26% gave it 2
stars, and so on.)

Table 5.2 Star ratings for Book A and Book B (percentage of reviewers)

Rating    No stars   ✩    ✩✩   ✩✩✩   ✩✩✩✩   ✩✩✩✩✩
Book A    9          27   26   19    12     7
Book B    4          21   30   25    18     2

• What algorithm can we use to calculate an average rating for each
book?
Finding An Algorithm
• Let’s look at Book A first. The numbers 9, 27, …, 7 are
percentages, i.e. out of 100. For simplicity, let’s
imagine exactly 100 people rated Book A. Then out of
those 100 people, 9 awarded 0 stars, 27 awarded 1 star,
26 awarded 2 stars, and so on. The total number of stars
awarded is

9*0 + 27*1 + 26*2 + 19*3 + 12*4 + 7*5 = 219

• Now we can calculate the average rating. There were a
total of 219 stars and 100 people, so the average rating
is 219 ÷ 100 = 2.19

Finding An Algorithm
• This can be translated into Python as follows. The list numbers
represents the ratings for Book A.

numbers = [9, 27, 26, 19, 12, 7]
total = 0
for rating in range(6):
    product = rating * numbers[rating]
    total = total + product
average = total / 100
print('average rating =', average)

Finding An Algorithm
• Now suppose that instead of two separate lists for Book A and Book B, we
want to store both sets of data together. What Python data structure can we
use to represent a table such as the data in Table 5.2?

• If we think of the table as a series of rows, with every row being a list, then
we can represent the table as a list of lists, like this.

table = [[9, 27, 26, 19, 12, 7],[4, 21, 30, 25, 18, 2]]

• We can now find the average of each row using the same algorithm as
before, but inside a loop. Note that numbers is now a variable that takes
each of the inner lists in turn.

table = [[9, 27, 26, 19, 12, 7], [4, 21, 30, 25, 18, 2]]
for row in table:
    numbers = row
    total = 0
    for rating in range(6):
        product = rating * numbers[rating]
        total = total + product
    average = total / 100
    print('average rating =', average)
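
One possible way to tidy the same algorithm is to wrap it in a function, so it can be reused for any row of percentages. The sketch below assumes the percentages in each row add up to 100; the name average_rating is hypothetical and enumerate() is used in place of range(6) so the row length is not hard-coded.

```python
def average_rating(percentages):
    """Mean rating, where percentages[r] is the % of reviewers giving rating r."""
    total = 0
    for rating, percentage in enumerate(percentages):
        total = total + rating * percentage
    return total / 100

table = [[9, 27, 26, 19, 12, 7], [4, 21, 30, 25, 18, 2]]
averages = [average_rating(row) for row in table]
print(averages)  # prints: [2.19, 2.38]
```

On these figures Book B has the slightly higher average rating.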
Getting The Data in The Form We Want
• Now we have a working algorithm for
calculating the average of star
ratings, we want to extend it to the
ONS happiness data. However, we
first need to get the data into a
suitable form. Our averaging program
works with a table of numbers,
represented as a list of lists, but the
data is currently in a CSV file.

• We saw earlier how to read this into
Python, and we saw how the data is
printed.

Getting The Data in The Form We Want
• This might look fairly promising, but it turns out each row is
actually a string, not a list at all, so we still have some way to go.
We can tackle the problem in a series of stages, each of which will
take us nearer to our goal (rather like the squirrels we saw earlier).

• The first step is to convert each row into a list of separate values.
There are various ways to do this, but we shall use the Python CSV
reader, which is in a Python module csv.

• Here’s how the code will work. This is just an explanation and you
don’t have to run any code yet. We have numbered the lines so we
can refer to them.

• Before we can use the CSV reader, we must import the Python
module containing it.
Getting The Data in The Form We Want
Line  Python code
1     import csv

• Then we read the data into a table using the following code, which
is an adaptation of the code you saw previously.

Line  Python code
2     file = open('happ_1.txt', 'r')
3     table = []
4     reader = csv.reader(file)
5     for row in reader:
6         table.append(row)
7     print(table)

• The key differences are in lines 3 to 6.


'Happiness',2012,2013,2014,2015,
0,1.4,1.2,1.1,1.0,
1,0.9,0.9,0.8,0.7,
2,2.2,2.0,1.9,1.6,
3,2.7,2.5,2.5,2.3,
4,3.7,3.8,3.4,3.3,
5,9.4,9.3,8.5,8.5,
6,8.7,8.8,8.6,8.2,
7,15.8,16.4,16.1,15.8,
8,23.5,24.2,24.6,24.4,
9,15.4,15.6,16.3,17.0,
10,16.4,15.3,16.3,17.1,

Getting The Data in The Form We Want

• Line 3 creates an empty list that is going to hold the rows of the table.
• Line 4 creates a CSV reader. A CSV reader automatically splits a string
of comma-separated values into a list of individual items.
For example, if it came across the string 'the,cat,sat,on,the,mat'
it would convert it to the list ['the','cat','sat','on','the','mat'].
• Lines 5 and 6 iterate through the rows of the CSV file. Each row is converted
to a list as described above, and the resulting list added to the table.
• Line 7 just prints the final table.
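
To see the splitting behaviour without needing the data file, the sketch below feeds two lines of text to csv.reader through io.StringIO (csv.reader accepts any iterable of strings, not only files). The variable names are assumptions for illustration. Note how the trailing comma on the second line produces an empty string as the last item, and how every item is a string.

```python
import csv
import io

# Two lines of text standing in for a file.
text = 'the,cat,sat,on,the,mat\n0,1.4,1.2,1.1,1.0,\n'

rows_read = list(csv.reader(io.StringIO(text)))
for row in rows_read:
    print(row)
```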
Introducing table_utils
Here are four problems you may have noted.
• The table has borders: on the left, the ratings 0–10, and across the top, the
years 2012–2015. We need to exclude these from the calculation.
• The last column is a series of empty strings; we need to exclude that too.
• Because each year corresponds to a column, not a row, we can't calculate
the averages row by row as we did for the book data.
• The elements are strings, not numbers.
Issues like these are common when we analyze data. The data won't usually be
structured as we'd like, and we need some initial processing to get it into the
form we require.
To overcome these issues, we will process the data using a series of utility
functions. There is a well-known library for data analysis in Python called pandas,
but this does far more than we need and is not part of the standard Python
installation. So instead we have written our own small library table_utils, a sort
of ‘baby pandas’ that does everything we require here.

Introducing table_utils
• We shall use four functions that process a table in
various ways:
• rows(), which selects only that part of a
table between specified rows
• cols(), which selects only that part of a
table between specified columns
• flip(), which swaps the rows and columns
in a table
• to_float(), which converts all the strings
in a table to the equivalent numbers.

These are explained further below.


Getting The Data in The Form We Want
• The function rows() takes three arguments: a
table, a start row and an end row. Numbering
starts at 0, as is usual in Python. The function
returns a new table containing just the rows
beginning with the start row and finishing with
the end row, inclusively (Figure 5.10).

Figure 5.10 New table formed of rows from start to finish inclusive

• For example, starting with the table

1,2,3
4,5,6
7,8,9

• rows() starting at 0 and ending at 1 gives

1,2,3
4,5,6
Getting The Data in The Form We Want
• The function cols() takes three arguments: a
table, a start column and an end column. It returns a
new table containing just the columns beginning
with the start column and finishing with the end
column, inclusively (Figure 5.11).

Figure 5.11 New table formed of columns from start to finish inclusive

• For example, starting with

1,2,3
4,5,6
7,8,9

• cols() starting at 1 and ending at 1 gives

2
5
8
Getting The Data in The Form We Want
• The function flip() takes one argument, a
table, and returns a new table like the original
but with the rows and columns interchanged
(Figure 5.12).

Figure 5.12 Rows and columns are interchanged to form new table

• For example, starting with

1,2,3
4,5,6

• flip() gives

1,4
2,5
3,6

Getting The Data in The Form We Want
• The function to_float() is needed when a table contains numbers
represented by strings, for example '1.23'.

• At first sight, '1.23' may look like a floating-point number – that is, a number
with a decimal point in it. (Floating-point numbers were introduced in Block 1
Part 1.)

• However, because of the quotes, '1.23' isn’t a number as far as Python is
concerned, but simply a sequence of characters. Before we can use it as a
number and do arithmetic with it, we need to tell Python to convert it from a
string to the corresponding floating-point value 1.23.

• The to_float() function takes a table as argument and returns a new table
with all the strings in the table converted to their equivalent floating-point
values, where possible. For instance, '1.23' would become 1.23, as described
above. Any strings that don’t correspond to floating-point values are simply left
unchanged.
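
The source of table_utils is not shown here, so the following is only a sketch of how four functions with the behaviour described above might be written. The bodies are assumptions based on the descriptions and small examples, not the module's actual implementation.

```python
def rows(table, start, end):
    """Rows start..end of table, inclusive, as a new table."""
    return [list(row) for row in table[start:end + 1]]

def cols(table, start, end):
    """Columns start..end of every row, inclusive, as a new table."""
    return [row[start:end + 1] for row in table]

def flip(table):
    """Swap rows and columns (assumes all rows have the same length)."""
    return [list(column) for column in zip(*table)]

def to_float(table):
    """Convert strings to floats where possible; leave the rest unchanged."""
    result = []
    for row in table:
        new_row = []
        for item in row:
            try:
                new_row.append(float(item))
            except ValueError:
                # Not a number, e.g. 'abc' or an empty string: keep as is.
                new_row.append(item)
        result.append(new_row)
    return result

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rows(grid, 0, 1))              # the first two rows
print(cols(grid, 1, 1))              # the middle column
print(flip([[1, 2, 3], [4, 5, 6]]))  # rows and columns interchanged
print(to_float([['1.23', 'abc']]))   # '1.23' converted, 'abc' left alone
```

Each function returns a new table rather than changing its argument, which is what lets the results be chained one after another.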

Getting The Data in The Form We Want
Using these functions, we can get the table into the shape we want as follows:

• select rows 1–11 (to exclude top border, which is row 0)
• select columns 1–4 (to exclude left border and empty values at
far right)
• apply flip()
• apply to_float()

• We can think of this as a sort of production line, with the result
of each operation being passed on to the next (Figure 5.13). You
will recognise this as an algorithmic sequence – an idea
introduced in Block 2 Part 2. It is also an example of heuristic 1,
the familiar idea of breaking a problem up into sub-problems.

Figure 5.13 Operations in sequence
Getting The Data in The Form We Want
• The to_float() function is needed because initially all the
numbers are written in the form of strings and we won’t be able to
do arithmetic on them until they have been converted to floating-
point values.

• Here’s the code, assuming we start with table.

from table_utils import *

table2 = rows(table, 1, 11)
table3 = cols(table2, 1, 4)
table4 = flip(table3)
table5 = to_float(table4)
print(table5)

Getting The Data in The Form We Want
• The next stage is to find the average happiness rating for each year.
We follow the same steps we used to calculate the average book ratings,
but the scale now runs from 0 (‘not at all’) to 10 (‘completely’), so there
are 11 ratings. Each row of table5 holds the percentages for one of the
financial years ending in 2012 to 2015.

x = 2012
for row in table5:
    numbers = row
    total = 0
    for rating in range(11):
        product = rating * numbers[rating]
        total = total + product
    average = total / 100
    print('average rating of happiness in year', x, 'is =', round(average, 2))
    x = x + 1
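
For readers who want to try the whole production line without table_utils or the data file, here is a self-contained sketch. It embeds the ONS figures shown earlier as text and replaces rows(), cols(), flip() and to_float() with plain slicing, zip() and float(); those substitutions, and all the variable names, are assumptions made for illustration rather than the module's method.

```python
import csv
import io

# The ONS figures shown earlier, embedded so the sketch runs without happ_1.txt.
DATA = """'Happiness',2012,2013,2014,2015,
0,1.4,1.2,1.1,1.0,
1,0.9,0.9,0.8,0.7,
2,2.2,2.0,1.9,1.6,
3,2.7,2.5,2.5,2.3,
4,3.7,3.8,3.4,3.3,
5,9.4,9.3,8.5,8.5,
6,8.7,8.8,8.6,8.2,
7,15.8,16.4,16.1,15.8,
8,23.5,24.2,24.6,24.4,
9,15.4,15.6,16.3,17.0,
10,16.4,15.3,16.3,17.1,
"""

table = list(csv.reader(io.StringIO(DATA)))
# Keep rows 1-11 and columns 1-4: drops both borders and the empty last column.
body = [row[1:5] for row in table[1:12]]
# Interchange rows and columns so that each row holds one year's percentages.
by_year = [list(column) for column in zip(*body)]
# Convert every string to a floating-point number.
by_year = [[float(item) for item in row] for row in by_year]

means = []
for percentages in by_year:
    total = 0
    for rating, percentage in enumerate(percentages):
        total = total + rating * percentage
    means.append(total / 100)

for year, average in zip(range(2012, 2016), means):
    print('average rating of happiness in year', year, 'is', round(average, 2))
```

With these figures the mean rises from about 7.29 for 2012 to about 7.46 for 2015.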

More on Summarizing Data
• In the last section, we summarized ONS data on happiness by
calculating the mean. This approach is widely accepted, and the
ONS uses it.

• However, some people argue we should only use the mean when
the data measures a quantity such as weight or height. Their
objection comes from the fact that points on an arbitrary scale (such
as happiness) don’t represent a real quantity in the way weight or
length do. We can’t really say that someone who rates their
happiness as 10 is twice as happy as someone who rates theirs as 5.

• If we accepted the argument that using the mean is not legitimate,
there is another form of average we could use: the median. This is
another example of reduction, because we produce a single number
that summarizes the dataset. Using the median overcomes the
possible objections to using the mean in this case.
More on Summarizing Data
• The median is just the middle value when the data are sorted in
numerical order. If the number of data values is even, so there are two
middle values, we take the value halfway between them (their
mean, in fact).

• For example, given the numbers

3, 1, 8, 4, 7, 6, 4, 2, 5, 9

• we first sort them, getting

1, 2, 3, 4, 4, 5, 6, 7, 8, 9

• The length of the list is even, so we need to find the two middle
values and take the value halfway between them. The middle values
are 4 and 5, so the median is 4.5
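
The procedure just described (sort, then take the middle value or the mean of the two middle values) can be sketched as follows. The function name median is an assumption for illustration; Python's standard library also offers statistics.median, which gives the same result.

```python
def median(numbers):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(numbers)   # sorted() leaves the original list unchanged
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle value.
        return ordered[middle]
    # Even length: halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

data = [3, 1, 8, 4, 7, 6, 4, 2, 5, 9]
result = median(data)
print('median =', result)  # prints: median = 4.5
```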

Dispersion (distributing things over a wide area)
• Another type of reduction is used to assess dispersion – how spread out the
data are. Do they mostly cluster close to the average, or are the data more
spread out, with many values much smaller or much greater than average?

• Figure 5.14 shows two distributions with the same median (and mean), but
(b) is more dispersed than (a).

Figure 5.14 Dispersion

• One measure of dispersion that goes naturally with the median is the
interquartile range. The quartiles are the values that split the data into
quarters, similarly to the way the median splits them in half; see Figure 5.15.
Note that the median is the same thing as Quartile 2.

Figure 5.15 Quartiles

• Then the formula is

Interquartile range = Quartile 3 – Quartile 1.

• If we keep the median the same but increase the interquartile range, the
dispersion increases and we have a graph like Figure 5.14(b), as contrasted
with Figure 5.14(a).
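
As a rough illustration, the interquartile range can be computed with the standard library function statistics.quantiles (available from Python 3.8). Several conventions exist for locating quartiles in a small dataset, so the values below reflect that function's default method and may differ slightly from those given by other tools; the sample data is simply the list used in the median example.

```python
import statistics

data = [3, 1, 8, 4, 7, 6, 4, 2, 5, 9]

# With n=4, quantiles() returns the three cut points: Quartile 1, 2 and 3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print('quartiles:', q1, q2, q3)
print('interquartile range:', iqr)
```

Quartile 2 comes out equal to the median of the data, as noted above.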
Skew
Figure 5.16 Frequency of word lengths in the Gettysburg Address

• A final property of interest a dataset may have is skew. A dataset is skewed if the
values are distributed unevenly about the average, with the tail on one side
stretching off further than on the other. This can be explained best with an example.
Figure 5.16 is a bar chart showing the frequency with which different word lengths occur in a
well-known speech.

• You see there is a peak at 3, which shows that words of length 3 are comparatively common,
and there is quite a long tail off to the right. (The distribution is described as right-skewed.)
Although it is possible to measure degree of skew as a number, it is more common to
comment on any skew and say whether it is to the left (also known as negative skew) or
right (also known as positive skew).

• When data is skewed, the median is often preferred to the mean.


Correlation and All That
• In Sections 5.2 and 5.3, we looked at questions that involve
taking a single dataset and computing a new value from those
in the dataset; recall that this process is called a reduction.

• Block 2 Part 2 discussed several types of reduction. In our
case, finding the mean and the median are examples of
aggregation problems. Finding the happiest location is an
example of a retrieval problem.

• In this section, we consider correlation, which describes a
relationship between two datasets. This is another form of
aggregation problem, but now we compute a new number
from two lists, rather than one.
Visualizing and Measuring Correlation
• When the values in two datasets seem to change together, we
say they are correlated. A good way of visualizing a
correlation is to use a scatterplot like Figure 5.27.

Figure 5.27 Internet access and weekly income (Source: ONS, 2011)

Figure 5.28 Approximate straight fit

• There is nothing special about this data; we just chose it as an
illustration of correlation. Each point on the graph corresponds to a
particular geographical area. You can see that, generally speaking,
areas where the average weekly income was higher tended also to
have a higher percentage of internet access.

• The trend is fairly clear, and we can see that the points all fit roughly
around a straight line, as shown in Figure 5.28.
Calculating and Interpreting
Correlation Coefficients
• Courses on statistics explain in detail how the correlation
coefficient is calculated, but here we are only interested in
using it as a tool, so we have written a Python function for this
purpose.

• corr_coef() takes two lists of numbers, which must be of
equal length, and returns the correlation coefficient between
them.

• To interpret the values returned by the function, we will use
the scale in Table 5.4, loosely based on Cohen (1992).

Calculating and Interpreting
Correlation Coefficients
Table 5.4 Interpretation of correlation coefficient r

Value of r                     Level of correlation
–0.3 to 0.3                    Low
0.3 to 0.5 or –0.3 to –0.5     Moderate
0.5 to 0.9 or –0.5 to –0.9     High
0.9 to 1.0 or –0.9 to –1.0     Very high

• The value of r is sometimes described as a way of
measuring the ‘size of an effect’, which, as you can see,
also ties in with the range of levels shown in Table 5.4
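
The source of corr_coef() is not shown, so the sketch below is an assumption: it computes the widely used Pearson correlation coefficient, which may or may not be exactly what the module's function does. The names pearson_r and level, the made-up income and access figures, and the choice to place boundary values (such as exactly 0.3) in the higher band are all illustrative decisions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError('lists must be of equal length')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

def level(r):
    """Describe r using the bands of Table 5.4."""
    size = abs(r)   # the sign gives the direction, the size gives the strength
    if size < 0.3:
        return 'Low'
    if size < 0.5:
        return 'Moderate'
    if size < 0.9:
        return 'High'
    return 'Very high'

income = [300, 350, 400, 450, 500]   # made-up weekly incomes
access = [60, 66, 70, 79, 85]        # made-up percentages with internet access
r = pearson_r(income, access)
print(round(r, 3), level(r))
```

A value of r close to 1 or –1 means the points lie close to a straight line; the function fails with a division by zero if either list contains only one repeated value.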
Correlation is Not Causation
• When we say A causes B, we mean that A has somehow brought
about B. The cause A is responsible for the effect B.

• Human beings tend to look for relationships between things. It
helps us make sense of the world and contributes to our survival.
But this natural tendency also means we easily confuse
correlation with causation, and see causation where none exists
– especially if it seems intuitively plausible somehow.

• A famous example is the correlation between the viral disease
polio and ice-cream consumption. Polio is now rare, but until the
middle of the twentieth century epidemics occurred regularly.
The disease, which particularly affected children, killed and
disabled many.
Summary
• In this part, you extended your existing problem-solving skills. You were
introduced to Pólya’s four stages of problem solving, and saw how to apply
several useful heuristics.

• You saw that data analysis is a process of analyzing data to extract information
from it. You were introduced to an important source of publicly available data,
the Office for National Statistics, and investigated data to do with well-being.
With the aid of suitable library functions, you used Python to manipulate data in
the form of tables and produce summary information such as averages,
measures of spread, and correlation coefficients.

• Throughout this part, you extended your ability to use Python for larger-scale
problems, and to apply ideas of algorithmic combination you met in Block 2 Part
2. You also met examples of good programming practice, such as keeping a
laboratory notebook, and laying code files out for optimum readability.

• Finally, you applied critical thinking to the results of the data analysis and
considered how reliable the conclusions drawn were likely to be.
TM112: Introduction to Computing and
Information Technology 2

Meeting #8
Block 2 (Part 1 )
Cloud Computing

Block 2 (Part 3)
Mobile Phones


Contents
• Introduction

Cloud Computing
• Cloud Basics
• The Cloud for Businesses
• Changes Driving and Being Driven By Cloud Computing
• Reading About The Cloud And Other Technologies
• Writing About The Cloud And Other Technologies

Mobile Phones
• The Components of a Mobile Phone
• Sensors in Mobile Phones
• Communication Technologies Used By Mobile Devices
• Challenges and Issues

• Summary
Block 2 (Part 1 )
Cloud Computing

Introduction
• The term cloud computing describes a system where users access a
virtual machine or application that is hosted on a remote server
and supplied by a cloud provider.

• The data and information the user exchanges with the server is
delivered via public internet connections, rather than privately
owned networks.

• These days, we are used to making transactions in the cloud. For
example, as an individual you may have used Skype, Dropbox,
Gmail or a myriad of other cloud services.

What is The Cloud?
Cloud computing includes the following five essential characteristics.

• On-demand self-service: the consumer can unilaterally access
server time and network storage, as needed, automatically.

• Broad network access: the services are available over a
suitable network and can be accessed by mobile phones,
tablets, laptops, workstations and other devices.

• Resource pooling: the cloud provider’s computing resources
are pooled to provide a service to multiple customers (so-called
multi-tenancy), with the physical resources (the infrastructure)
and virtual resources (software) dynamically assigned and
reassigned according to consumer demand. The consumer
might not know the exact location of these resources.
What is The Cloud?

• Rapid elasticity and scalability: the resources are elastically
released so that an increase in demand can be accommodated.
Scalability should appear to have unlimited potential to the
consumer and should be on tap at any time.

• Measured service: cloud systems automatically control and
optimise resource use that at some level is ‘metered’ according
to the type of service (storage, processing, bandwidth and user
accounts). Hence resource usage can be monitored, controlled
and reported, providing transparency for both the provider and
consumer of the utilised service.

Where is The Cloud?
The servers that host the virtual machines at the
heart of the cloud are located in large air-conditioned
rooms in data centres (see Figure 1.3).

Figure 1.3 Servers in a data centre

A major consideration when selecting the location of a data centre is latency
(which you will recall is the time it takes for a request to travel from a user
to the data centre and the response to come back again); in general, the
closer a data centre is to the user, the lower the latency.
Where is The Cloud?
• Microsoft has experimented with underwater data
centres, which have a great potential for reducing
latency as half of the world’s population is located
within 125 miles of a coast.

• They also need much less cooling, making them very
energy efficient, particularly if they could also be co-located
with renewable energy sources, such as tidal
or wind energy.

• However, there are also drawbacks, such as access for
maintenance, the effects of thermal pollution on the
oceanic environment and possible issues with data
protection in international waters.
Types of Cloud
• You might think that the idea of a cloud is quite nebulous – an
anonymous, unknown location that is shared by many
unrelated users. In fact, this is a pretty spot on description of a
public cloud.

• Public cloud is owned, managed and operated by a business,
or academic or government organisation, or some
combination of these, and the hardware infrastructure
physically exists on one or more of the premises owned by the
cloud provider or one of their partners.

• Cloud providers, such as Amazon Web Services, Microsoft and
Google, have constructed vast data centres to house
thousands of servers and the necessary associated networking
equipment.

Types of Cloud
• The term private cloud has come to mean a cloud
infrastructure that is for the exclusive use of a single
organisation.

• It may be owned, managed and operated by the organisation,
a third party or some combination of these, and it may exist on
or off the premises of the organisation.

• Although private clouds offer some of the advantages of public
clouds, compared to a public cloud, they are likely to be on a
much smaller scale with far fewer servers.

Types of Cloud
• A third type of cloud, a community cloud, is a cloud
infrastructure that is exclusively for a specific community of
consumers from organisations that have shared concerns (e.g.
mission, security requirements, policy and compliance
considerations). It may be owned, managed and operated by
one or more of the organisations in the community, a third
party, or some combination of them, and it may exist on or off
premises.
• In addition to the above three types, we have a popular
solution for large enterprises that want the privacy and control
of a private cloud, but the scalability and multiple locations of
the public cloud.
A hybrid cloud is a composition of two or more distinct
types of cloud (private, public or community).

The Cloud And I

• If you use Facebook, Twitter, Dropbox or Gmail on a PC,
tablet or phone, your data is already being stored, and
possibly processed, in the cloud.

• The companies behind these social interaction, file
storage and communication tools use vast data centres
that house the servers, databases and networking
equipment to support millions of users around the
world. They have designed their applications to run on
a variety of user devices so that you can choose how
you interact with their services.

The Cloud And I
• One of the most common ways in which individuals use the
public cloud is to store and share files using free packages
such as Dropbox, Google Drive, IDrive, OneDrive and Apple
iCloud Drive.

• Many people entrust their photographs to such sites, but we
also use them to store text documents, music and video.

• The ability to access files from several devices or locations,
and being able to easily share the files with other people, are
usually mentioned by the users of these services as the top
reasons for using storage in the cloud. Protection against
data loss and increased memory space are also cited.
No Such Thing As a Free Lunch
• Many of the services we have mentioned are free to
individual users, but of course there is no such thing as a free
lunch. Some companies mine the data that you provide them
with when you sign on for ‘free’ services.

• Other companies might tempt you in with modest free
services in the hope that you will sign up for additional
chargeable services. Or you may be bombarded by adverts,
the revenue from which goes to the company.

• The fact that Google and other providers of ‘free’ services are
incredibly profitable may suggest that we underestimate the
revenue that they accrue from their ‘free’ offerings.
Are My Documents, Photos and Login
Data Secure in The Cloud?
• In terms of privacy, bear in mind too that your photographs
and documents may be more vulnerable to being seen by
others if they are in the cloud.

• Although data breaches on personal cloud services are
relatively rare, saving sensitive documents such as
unencrypted files containing passwords and bank account
details in the cloud is never a good idea.

• You should also be aware that login details for cloud storage
sites can be a target for hackers; for example, Dropbox
admitted in 2016 that the passwords and email addresses of
nearly 70 million users had been stolen.

Cloud Architecture
• Cloud providers must have a mechanism to allow multiple users
to access the same physical resources, which are usually large
servers located at a distance from the consumers.

• The virtual machine form of virtualisation takes the resources of a
single physical host computer (CPUs, memory and input/output
devices) and divides them into multiple virtual machines.

• Each virtual machine appears to be an independent, self-
contained computer with its own processor, memory and
peripherals, running a standard operating system.

• This kind of virtualisation allows a single piece of hardware (a
server) to be shared by many different cloud customers. This is an
example of abstraction – the user of a virtual machine sitting ‘on
top of’ the server does not need to know the details of the
particular hardware that the server employs.
Cloud Architecture
• Cloud customers may want to access different levels of service.
Consider three cloud customers (individuals or businesses)
who have different needs.

Customer 1
I want additional storage and (possibly) processing capability on which
I can do my own thing. I may want to install my own operating system as
well as build and use my own applications. But someone else can manage
the actual hardware (the infrastructure); I am not interested in that.

Customer 2
Give me a ready-made platform – that is an operating system and
everything I need to execute a program – on which I can build and run my
own applications. I don’t want to worry about how that platform is
deployed or on what hardware, I am only interested in using and writing
my own applications.

Customer 3
I just want to use applications for tasks such as emailing, word
processing, or data crunching. Please don’t bother me with any details
at all; just supply me with access to the applications I want to use.
Cloud Architecture

Figure 1.5 The layers of abstraction in cloud computing. Note that each
layer is dependent on the layers below it

Cloud Architecture
• The lowest layer is the infrastructure layer and is mostly composed
of the physical kit, such as servers, storage and networking hardware.
Customer 1 wants to operate within this layer, which means he is
willing to pay the cloud provider to be responsible for the
infrastructure.

• The middle layer is the platform layer and provides an interface
between the applications and the infrastructure. This layer includes an
operating system plus other software (collectively called middleware)
that is needed to write and run applications. Customer 2 wants to
operate within this layer, which means she is willing to pay the cloud
provider to be responsible for both the infrastructure and the
platform.

• Finally, the top layer, sometimes called the application layer, includes
the data and applications. Customer 3 wants to operate within this
layer, which means he is willing to pay the cloud provider to be
responsible for the infrastructure, the platform and the applications.
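
The three layers and the three customers described above can be modelled in a few lines of Python. This is only an illustrative sketch: the names `LAYERS`, `PROVIDER_MANAGES_UP_TO` and `split_responsibilities` are made up for this example and are not part of any cloud provider's interface.

```python
# Layers ordered from bottom to top, as described in the slides.
LAYERS = ["infrastructure", "platform", "application"]

# Hypothetical mapping: the highest layer the cloud provider manages
# on behalf of each of the three example customers.
PROVIDER_MANAGES_UP_TO = {
    "Customer 1": "infrastructure",
    "Customer 2": "platform",
    "Customer 3": "application",
}


def split_responsibilities(customer):
    """Return (layers the provider manages, layers the customer manages)."""
    top = LAYERS.index(PROVIDER_MANAGES_UP_TO[customer])
    return LAYERS[: top + 1], LAYERS[top + 1:]


for name in PROVIDER_MANAGES_UP_TO:
    provider, customer = split_responsibilities(name)
    print(name, "| provider manages:", provider, "| customer manages:", customer)
```

Because each layer depends on the layers below it, the provider's share is always a contiguous run starting at the bottom of the list, which is why a single index is enough to describe each customer.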
Pizza as a Service
• Figure 1.6 shows a famous infographic developed by Albert Barron, a
Senior Software Client Architect at IBM. In it he uses the different
ways a customer can obtain a pizza dinner as an analogy for who
manages what when a customer (you) buys a cloud service.


Figure 1.6 Pizza as a Service


Cloud Services for Business
• The NIST standard defined three
major cloud services, which still
form the backbone of the services
offered by cloud providers today.

• The leftmost column of Figure 1.7 is
labelled ‘Packaged Software’ – this
is the situation when you buy and
install software on your own
hardware. The remaining columns
provide a comparison of what the
vendor (the cloud provider)
manages and what the customer
(you) manage when you buy
different cloud services.

Figure 1.7 The division of responsibilities when a user
has packaged software, or purchases one of three
different cloud services: Infrastructure as a Service,
Platform as a Service and Software as a Service

Infrastructure as a Service
• If, as a business, you select Infrastructure as a Service (IaaS),
you are essentially outsourcing your hardware needs to a cloud
provider. IaaS cloud providers sell virtual access to off-site
servers, storage and networking hardware. As the customer,
you can build your own platform on this hardware and access it
at any time, paying only for the resources you use.

• IaaS can have many advantages for businesses. It means that
they do not have to buy and maintain expensive equipment.
They also do not have to pay their own IT professionals to run
it, or to manage data security and disaster recovery processes.

An example of a business using Infrastructure as a Service is Netflix, which
decided to move its film distribution business to the public cloud in 2008.

Platform as a Service (PaaS)
• PaaS provides one or more platforms on which a business can run
existing applications or develop and test new ones without being at
risk of compromising their internal systems. It also enables
development teams that are geographically distributed to work
together on the same software project.

• PaaS is particularly useful for companies that build and deploy
software applications that run in the cloud, such as web application
management, application design, app hosting, storage, security and
app development collaboration tools.

Software as a Service (SaaS)

• Any application hosted on a remote server that can be
accessed over the internet is considered to be Software as a
Service (SaaS). This is the kind of cloud computing that you are
most likely to be familiar with as an individual.

• However, these services are also being widely taken up by
businesses that appreciate the scalability when new users
come on board or leave.

User Issues in The Cloud
• Downtime is time during which the cloud services are not available.
The cloud service providers have to juggle demand, and there is
always a danger that they may be overwhelmed.

• Uptime is the time during which the cloud services are available.

• Storing data and important files on external clouds and moving them
across networks always opens up risks. A single vulnerability,
misconfiguration or malicious hacker can cause a security breach across an
entire provider’s cloud.

• Minimised risk: Cloud users might also want to ensure that their data and
files are not accessible to intruders, such as criminals or terrorists.

• Control: Customers have very little control, particularly over any downtime,
trouble-shooting, back-ups and disaster recovery.

New Ways of Working
• The move to cloud-hosted SaaS has also seen a shift in the way software
applications are updated.

• This change in the way software is developed and deployed has also led to
a more agile way of working called continuous delivery. As updates and
bug fixes are completed, they are tested against a version of the current
release; if the tests pass, the update is added to the live application. In
order to achieve this, the developers, who make the software, and the
operational staff, who deal with the software after it is deployed, must
work in tandem to develop, maintain and improve a running service. In
particular, the developer role must take into account the way the software
is deployed and maintained operationally. They must test their new
software against a model of the live application and should expect their
work to be added to it when appropriate.

• The term DevOps (which is an amalgamation of the words development
and operations) has been coined to describe this new culture, which, as
Figure 1.10 shows, aims to integrate the whole software life cycle.
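
The test-then-release step at the core of continuous delivery can be sketched as a simple gate. This is a toy model under stated assumptions, not a real deployment pipeline: the functions `run_tests` and `deliver`, the update dictionaries and the two checks are all hypothetical.

```python
def run_tests(update, tests):
    """Return True only if every test passes for this update."""
    return all(test(update) for test in tests)


def deliver(update, live_application, tests):
    """Add the update to the live application only if all tests pass."""
    if run_tests(update, tests):
        live_application.append(update["description"])
        return True
    return False


# Hypothetical checks standing in for a real automated test suite.
tests = [
    lambda u: bool(u["description"]),      # the update must be described
    lambda u: u["breaks_login"] is False,  # the update must not break login
]

live = ["release 1.0"]
ok = deliver({"description": "fix typo", "breaks_login": False}, live, tests)
bad = deliver({"description": "new login page", "breaks_login": True}, live, tests)
print(ok, bad, live)
```

The failing update never reaches the live list, which mirrors the rule that an update is added to the live application only if the tests pass.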
New Ways of Working

Figure 1.10 The DevOps life-cycle


• DevOps, as a culture, is driven by cloud computing, but it is also driving the use of
cloud-based systems for managing content and synchronizing and sharing files.
Many cloud providers now systematically support DevOps by providing a
platform that includes continuous development tools, which ensure that a
developer’s changes are immediately tested and reported. So the cloud provides
a centralized and cost-effective platform for production, testing and deployment
of software. These cloud resources can also be tracked using different criteria: by
application, developer, user, data, etc., which makes it easier to track the costs
involved.
Block 2 (Part 3)
Mobile Phones and How We Use Them

• The components of a mobile phone

• Smartphones
• What is Inside a Mobile Device?
• Touch-Based Graphical User Interface
• Operating Systems for Mobile Devices

Smartphones

• A smartphone is a mobile phone that performs many of the
functions of a computer.

• Typically, a smartphone will have internet access and an operating
system capable of running downloaded software applications
(known as apps).

• Smartphones also use touch screens and icons, which provide an
easy and intuitive user interface.

What is Inside a Mobile Device?
• The heart of the modern mobile phone consists of a multi-core central
processing unit (CPU), a graphics processing unit (GPU) and digital
signal processor (DSP).

• The CPU (or processor) is the component that directly processes data and
instructions.
• The GPU is optimized to process graphics.

• The DSP takes digitized information – which might be audio, video,
temperature, pressure or position – and then mathematically manipulates
it so that it can be displayed, analyzed or converted to another type of
signal.

• These three units work together to provide enough processing power to
handle multimedia content such as photographic images, music and videos
that are often used on mobile phones. Unlike a personal computer, the
components of a mobile phone are usually packed on a small silicon chip
known as a system on a chip.
What is Inside a Mobile Device?

• Mobile phones usually possess a lot of random access memory (RAM), and
it is usually tightly packed with the processors in a package, known as
a system in a package.

• Solid-state flash memory, like that in a USB stick, usually provides the
storage memory, as this is faster, lighter and uses less power than a disk
drive – important characteristics for a small, portable device.

• Another important component of the mobile phone is the display. Most
mobile phones possess a large touch-screen display, which has the dual
roles of displaying screen output and taking user input.
What is Inside a Mobile Device?

• Almost all mobile phones have one or more peripherals for users to connect
to other devices, such as subscriber identity module (SIM) cards, memory
cards, chargers, headphones and other computers.

• Some common peripheral connectors are micro-USB, SIM, memory card
connectors, microphone and headphone jack.

• Wireless and mobile communications require a radio-frequency (RF)
communication module and one or more antennas to transmit or receive
radio waves and other signals.

• A mobile phone must convert analogue quantities, such as sound picked up
by a microphone or light entering a digital camera, into a digital signal,
which is achieved by an analogue-to-digital converter (ADC).
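
To make the idea of analogue-to-digital conversion concrete, the sketch below quantizes a voltage into an integer code. The 3.3 V reference voltage, the 8-bit resolution and the function name `adc_sample` are assumptions chosen for illustration; real converters in phones differ in resolution and design.

```python
def adc_sample(voltage, v_ref=3.3, bits=8):
    """Quantize an analogue voltage in the range [0, v_ref] to an integer code.

    v_ref and bits are illustrative assumptions, not values from the slides.
    """
    levels = 2 ** bits                       # number of distinct output codes
    clamped = min(max(voltage, 0.0), v_ref)  # keep the input within range
    # Scale to 0 .. levels-1 and round to the nearest whole code.
    return int(clamped / v_ref * (levels - 1) + 0.5)


for v in (0.0, 1.0, 2.0, 3.3):
    print(f"{v:.1f} V -> code {adc_sample(v)}")
```

With 8 bits there are only 256 possible codes, so nearby voltages can map to the same number; using more bits gives a finer approximation of the original analogue quantity.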
Touch-Based Graphical User Interface
• Technological advances have enabled increasingly small and more portable
mobile phones, but small mobile phones are difficult to use, especially for
reading and typing, because of the small screen and keypad buttons. So from
around 2007, manufacturers started reversing the trend of making mobile
phones smaller by designing ones with a big screen.

• Another major trend has been the integration of the large screen with touch
sensors, and by 2010 graphical user interfaces (GUIs) were the norm. Users
can also manipulate graphics as well as text. Figure 3.3 shows some mobile
phones from different eras.


Figure 3.3 The evolution of mobile phone design over three decades
Touch-Based Graphical User Interface
How does a touch screen work?
Usually, touch screens use changes in electrical properties to detect a touch.

• A resistive touch screen uses the principle that the pressure of a finger can
be used to connect two thin conducting layers, causing a change in
electrical resistance at that point. This kind of screen has the advantage
that it works if the user has gloves on, and it is cheap and resistant to
liquids and other contaminants.
• However, resistive touch screens have the disadvantage that the user has
to press down, either with a finger or a stylus, which can cause damage.
Because of these disadvantages, capacitive touch screens are now the
norm on smartphones.
• A capacitive touch screen is based on another electrical property, called
capacitance, which measures how much charge a device (called a
capacitor) can hold.

Operating systems for mobile devices

• As the functionality of mobile phones grew, it became necessary to install
an operating system (OS) to manage user interactions, applications,
memory, hardware and other resources.

• The GUI of a mobile OS is usually based on the concept of direct
manipulation such as tap, swipe and pinch. Many mobile OSs have been
developed, but two of them dominate the market: Google’s Android and
Apple’s iOS.

Android vs. Apple iOS
Android
• Android is by far the most widely used mobile OS. It was initially developed
by a California-based company Android Inc. in 2003, but was acquired by
Google in 2005.
• The heart of Android is based on the most fundamental operations within
the Linux operating system, which is open-source. As a result, Android itself
became an open-source project, known as the Android Open Source
Project (AOSP).

Apple iOS
• iOS is the second-most used mobile OS. It was developed by Apple Inc. and
is used exclusively on Apple’s i-series mobile devices, such as the iPhone and
iPad. The success of iOS is a result of the availability of high-quality
applications and the popularity of the i-devices.

• iOS is a closed-source project, and third-party applications were not initially
supported.
Sensors in Mobile Phones (Overview)
• Modern mobile phones have electronic devices, called sensors,
which can detect the environment around your phone and convert
what they sense into an electrical signal.

• A mobile phone is small, yet it carries many different sensors.
Why do we need so many? Some sensors are needed for
the basic function of a telephone, while others are needed to
provide accurate measurements for enhanced functionalities
such as localization, orientation and user recognition.

Cameras
• Just to briefly recap, an image is made up of a two-dimensional grid of pixels,
and each pixel represents a dot or a square on the image. The resolution of
an image sensor is a measure of the number of pixel sensors it contains.

• The heart of a camera is an image sensor. Most image sensors are now of the
active pixel sensor (APS) type. As an APS is usually made using
complementary metal-oxide semiconductor (CMOS) technology, this type of sensor is also
known as a CMOS image sensor. A picture of a magnified CMOS APS image sensor
is shown in Figure 3.5.

• The resolution and quality of a picture taken by a camera mostly depends on its
image sensor. Generally, the larger the sensor, the better the image quality.

• The heart of a pixel sensor is a photodiode, which is an electronic component
that converts light into an electrical current, coupled with a control circuit, which
allows the image sensor to read and reset the status of an individual pixel sensor.


Figure 3.5 A magnified CMOS APS image sensor
Cameras
• If an image sensor has 2560 columns and 1920 rows of pixel sensors, it
will produce an image that has 2560 pixels along its width and 1920
pixels along its height. The ratio of the width to the height of a
rectangle is known as its aspect ratio, so such an image has an aspect
ratio of 2560:1920.

• The resolution of an image sensor is determined by the number of pixel
sensors it has.
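
The aspect ratio and resolution of the example sensor can be worked out with a short calculation. The function name `describe_sensor` is made up for this sketch; it reduces the ratio to its simplest form using the greatest common divisor and counts the pixel sensors.

```python
from math import gcd


def describe_sensor(columns, rows):
    """Return the simplified aspect ratio and the total number of pixel sensors."""
    divisor = gcd(columns, rows)
    aspect = (columns // divisor, rows // divisor)
    pixels = columns * rows
    return aspect, pixels


aspect, pixels = describe_sensor(2560, 1920)
print(f"aspect ratio {aspect[0]}:{aspect[1]}")
print(f"{pixels} pixel sensors (about {pixels / 1_000_000:.1f} megapixels)")
```

For the 2560 by 1920 sensor, the ratio 2560:1920 simplifies to 4:3 and the sensor contains 4,915,200 pixel sensors, which is roughly 4.9 megapixels.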

To produce a colored image, the pixel sensors need to be able to sense at least
the three primary colors of light, i.e. red, green and blue, because any color of
light can be separated into various amounts of these three colors.

Figure 3.6 An illustration of how color filters work


Gyroscopes
• Gyroscopes are used to measure how quickly an object rotates.
Rotational changes such as tilts or rolls are often undesirable – for
example, in camera shake – and a gyroscope can be used to detect
and reduce this kind of motion.

• However, when gaming, gyroscopes can be used in conjunction
with accelerometers to allow the action on a mobile phone screen
to respond to the rotational movements of the device in a smoother
and more precise way.

• Communication Technologies Used
By Mobile Devices

• Long-range Mobile Communications


• Short-range Wireless Communication Methods
• Wi-Fi Direct
• Bluetooth and NFC

Long-range Mobile Communications
• Mobile phones are connected wirelessly through radio waves to a base
station. A base station coordinates what happens inside each local part of
the mobile phone network, which is called a cell. From the base station,
the calls are routed onward to their destination through cables or
different wireless links via some intermediate subsystems.

• At the end of 2017, the fourth generation of mobile telecommunications
technology, 4G, was available in major UK cities but was not yet offered
by all mobile operators.

• However, activities such as streaming ultra-high-definition videos and
large-scale machine-to-machine communications are increasingly
generating huge amounts of traffic, and soon the current mobile network
will not be able to cope with these demands.

5G
• The main objectives of the 5G network are to:

  - provide a capacity that allows 1000 times more devices to be
    connected than the current network
  - achieve a target peak data rate of 10 Gbps for stationary
    users, 1 Gbps for slow-moving users and no less than 100
    Mbps in urban areas
  - provide real-time interactive applications (with no more than
    1 millisecond latency)
  - improve coverage such that a consistent user experience at
    any time and anywhere is achieved
  - reduce power consumption by up to 90%.


Short-range Wireless Communication Methods

• Advantages of using short-range communication:

  - signals are usually stronger and hence can achieve a higher
    data rate
  - it is usually cheaper (if not free)
  - it uses less power to transmit and receive data.

• A number of technologies for short-range communication are
available, but the most common ones used by mobile phones are
Wi-Fi Direct, Bluetooth and near-field communication (NFC).

Wi-Fi Direct
• Wi-Fi is a very common wireless communication method used by laptops
and mobile devices, often for connecting the devices to a wireless local
area network (WLAN) and to access a broadband internet connection
through an access point.

• Like Wi-Fi devices, Wi-Fi Direct devices usually operate on the 2.4 GHz
and 5 GHz radio frequency bands. It has an indoor range of tens of
meters, which is usually enough to cover a typical home environment,
but the outdoor range can be several times higher.

• As it is a wireless technology, it is by nature not secure, as the data
stream can be intercepted. Encryption and authentication are therefore
often employed to protect the data.

• At the time of writing, Wi-Fi Direct uses the Wi-Fi Protected Access II
encryption (WPA2).

Bluetooth
• Bluetooth is another wireless technology that operates on the 2.4 GHz
radio frequency band. Jim Kardach, an engineer who worked on the
technology in the 1990s explained many years later that the name
Bluetooth

• It has a shorter range and lower data rate than Wi-Fi – often less than 10
meters and 24 Mbps.
• Apart from using encryption, Bluetooth also uses a pairing process to
improve security. The pairing process usually involves some human
input, such as entering the same PIN on both devices.

• Mobile phone users often use Bluetooth to connect auxiliary devices such
as headphones and remote controls to their mobile phones.

Near-field Communication
• Near-field communication (NFC) is a wireless technology that was
developed for contactless payments using debit and credit cards.

• An NFC-enabled device can also act as an electronic identity document or
a keycard.

• The range of NFC is usually a few centimeters, with a modest data rate of
up to 424 kbps.

• NFC is a point-to-point communication technology between two devices,
but as it offers no protection against eavesdropping, the two devices
have to operate in close proximity to prevent interception.
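
The data rates quoted for Bluetooth (24 Mbps) and NFC (424 kbps) can be compared by estimating how long each would take to move the same file. This is a rough sketch: the 3 MB photo size is a made-up example, the function name `transfer_seconds` is hypothetical, and the calculation assumes the quoted peak rate is sustained with no protocol overhead, which real transfers will not achieve.

```python
def transfer_seconds(size_bytes, rate_bits_per_second):
    """Idealized transfer time: 8 bits per byte, no overhead, constant rate."""
    return size_bytes * 8 / rate_bits_per_second


PHOTO_BYTES = 3_000_000  # hypothetical 3 MB photo

# Peak data rates as quoted in the slides.
RATES = {
    "Bluetooth": 24_000_000,  # 24 Mbps
    "NFC": 424_000,           # 424 kbps
}

for name, rate in RATES.items():
    print(f"{name}: {transfer_seconds(PHOTO_BYTES, rate):.1f} s")
```

Under these assumptions the photo takes about 1 second over Bluetooth and close to a minute over NFC, which helps explain why NFC is typically used for small exchanges such as payment details rather than for file transfer.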

Challenges and Issues
1- Coverage
We have all experienced the frustration of not being able to get a mobile
signal. But why does this occur?
Locations:
In the remote countryside, the signals are usually weaker. This is because these
areas have lower population densities. The providers therefore build fewer base
stations, so each has to cover a larger area.

Weather:
• As the propagation of radio waves can be affected by the weather, this can
weaken the signal reception at some places

Challenges and Issues
2- Battery Life
• With a large display, various radio connections and powerful processors,
modern mobile phones consume lots of power.

• This means that heavy users have to carry a spare battery or an
emergency charger. So what are the solutions?

More Efficient Use of Power:
One way is to reduce unnecessary power consumption of the mobile phone.
This can be achieved by automatically turning off or reducing the power of
high-draining components of the mobile phone.

High-power Batteries:
Another way is to improve the energy-storing ability of the battery.

Short Charging Time:
Scientists have also been looking for ways to reduce the charging time of
batteries, using nanotechnology so that a battery can be fully recharged in a few minutes.
Challenges and Issues
3- Implications for Health
• Mobile phones use electromagnetic waves in the microwave range to
transmit and receive data from base stations.

• Many national radiation advisory authorities and government health
organizations, such as Public Health England (PHE), continue to suggest
precautionary measures, such as using a hands-free device to keep the
mobile phone away from the head, to minimize exposure.

Summary
• You learned what ‘the cloud’ is and the kinds of services it can
provide to individuals and businesses. You also considered some of
the issues with the cloud, and how the expectations of the cloud
providers and the cloud consumers can be managed through
customer service agreements.

• Also, you have been introduced to some of the components and
sensors in mobile phones and some of the ways in which mobile
phones communicate with other devices.

• You have also learned about some of the challenges and issues
around the use of mobile phones.

TM112: Introduction to Computing and
Information Technology 2

Meeting #9
Block 2 (Part 6)
Location-based Computing

OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati


Contents

• Introduction
• Making Use of Location
• Introducing GPS: Where in The World Are You?
• Where’s My Phone?
• I’m Home… Wi-Fi Reveals Location Too
• Indoor Tracking: Just The Place for Bluetooth Beacons
• Summary

Introduction
• Where, exactly, are you? How do you know where you are? Who
else knows? And does it matter?

• If I were to ask you these questions, how would you respond? You would
possibly reply to the first question with a physical location that is
meaningful to you (‘at home’, ‘in the living room’), or you might be
travelling, in which case your response might be more vague (‘walking
along the Embankment in London between Waterloo Station and the
Houses of Parliament’).

• More precisely, you might specify a particular address, or even the
latitude and longitude of your current location. In answer to the second
question, you might quickly check, and reply ‘just by looking around’.

Making Use of Location
• Location-based computing is a form of computing that is used to
support the delivery of location-based services. These services
provide information based on the location of the user.

• The service provider needs to know the user’s location in order to
provide the service that the user is interested in.

• The process of determining a location is referred to as
localization.
• To explicitly identify a location as geographically based according to
a known coordinate system, such as latitude and longitude, we
often refer to the location as a geolocation and the process of
localizing an object as geolocating it.

Making Use of Location
• Geocoding refers to the encoding of location information, such as
an address, into a geolocation.
• Reverse geocoding is the process of identifying a location in human
terms (such as an address, or point of interest such as the Eiffel
Tower) from a geolocation.

• The various ways in which location can be determined will be
reviewed in more detail later. But before we get into some of the
technicalities, let’s first consider some important ideas associated
with location-based services, as well as a way of categorizing them.

Are We There Yet? Geofencing
• One of the most widespread uses of geolocation data is for route
planning as part of a navigation system, such as in-car GPS. As well as
supporting navigation, such systems know when you’ve reached your
destination. But how?
• One way is to create a range of services based on the location of
someone or something using a technique known as geofencing.
• A geofence is a notional (virtual) geographical boundary that can be
used to define an area within which particular services may be offered
or withheld.

• Geofencing can be used in two main cases:
• When a tracked object crosses a predefined boundary, the act of
crossing the boundary is detected and used to trigger some action.
• The range of available services or actions may be restricted or
enabled according to whether the tracked item is inside or outside
the geofenced area.
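
Both uses of geofencing can be sketched with a circular geofence. The example below is a simplified illustration, not how any particular navigation product works: it assumes a circular fence defined by a centre and a radius, uses the haversine formula with an approximate Earth radius to measure distance, and the names `distance_m`, `inside` and `crossings` are invented for this sketch. The coordinates are the ones quoted elsewhere in these slides for The Open University and Milton Keynes; the 500 m radius is made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # approximate mean radius of the Earth


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def inside(fence, lat, lon):
    """Case 2: is the tracked item inside the geofenced area?"""
    centre_lat, centre_lon, radius_m = fence
    return distance_m(centre_lat, centre_lon, lat, lon) <= radius_m


def crossings(fence, track):
    """Case 1: report an event each time the track crosses the boundary."""
    events = []
    was_inside = None
    for lat, lon in track:
        now_inside = inside(fence, lat, lon)
        if was_inside is not None and now_inside != was_inside:
            events.append("enter" if now_inside else "exit")
        was_inside = now_inside
    return events


fence = (52.0251, -0.7105, 500)  # hypothetical 500 m fence around the OU campus
track = [(52.0406, -0.7594), (52.0251, -0.7105), (52.0406, -0.7594)]
print(crossings(fence, track))
```

A navigation system could treat the "enter" event for a small fence around the destination as the signal that you have arrived.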
Linking Location and Context
• A single piece of location information on its own may appear to have little
value. However, when collected over time or combined with other sorts of
information, its true value may become more apparent.

• Location information may reveal a considerable amount of behavioral
information about me. It may also be used to associate a behavior with a
particular time and place, particularly when the location data is correlated
with other data, such as my e-book reader usage data.

• One issue with revealing location information is that the type of locations
visited might be used to infer information about our activities.

Linking Location and Context
• As well as using location data to identify particular locations a user has
visited, it is often possible to use a series of time-stamped location readings
to make better estimates of the user’s current activity.

• Location data captured by devices such as GPS-equipped mobile phones
can be used to determine speed, direction of movement and route taken.
• Given two location readings taken at different times, it is possible to
calculate the distance between them.
• Calculating the time between the readings then lets us calculate the
speed. (Recall that speed = distance/time.)
• The combination of this information might indicate that I am in a car,
although we would not necessarily be able to deduce from it whether I
was driving or not.
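
The speed calculation described above can be sketched as follows. This is an illustrative example only: it assumes each reading is a (latitude, longitude, time in seconds) tuple, measures distance with the haversine formula and an approximate Earth radius, and treats the path between the readings as a straight line, so it underestimates the speed along a winding route. The function names and the sample readings are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # approximate mean radius of the Earth


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speed_m_per_s(reading_a, reading_b):
    """Average speed between two (lat, lon, time_in_seconds) readings."""
    lat1, lon1, t1 = reading_a
    lat2, lon2, t2 = reading_b
    elapsed = t2 - t1
    if elapsed <= 0:
        raise ValueError("the second reading must be later than the first")
    return haversine_m(lat1, lon1, lat2, lon2) / elapsed  # speed = distance/time


# Hypothetical readings taken five minutes (300 seconds) apart.
start = (52.0251, -0.7105, 0)
end = (52.0406, -0.7594, 300)
speed = speed_m_per_s(start, end)
print(f"{speed:.1f} m/s, or {speed * 3.6:.0f} km/h")
```

A speed well above walking pace would suggest travel by vehicle, although, as noted above, it would not reveal who was driving.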

Linking Location and Context
• Location-based technologies may also be used to track, or reveal, the
location of other people.

• Many camera phones allow photos to be tagged with information about
who is in the photograph as well as geotagged with location information.

• Many people have a routine that requires them to move between a small
number of locations each day, such as their home, workplace, children’s
school, etc. It is often possible to use just a few such location updates at
different times of the day to identify significant locations in that individual’s
life, and from those, to identify who that individual actually is.

Introducing GPS: Where in The World
Are You?
• The Global Positioning System (GPS) was originally developed for
military use, but has been readily adopted as a consumer-facing
technology, and is the most commonly used satellite navigation system
(hence the abbreviated term ‘satnav’).

• We will look in more detail at how GPS works in Subsection 6.2.3, but
suffice to say for now that GPS is a line-of-sight technology; that is,
there should be an unobstructed view of the satellites from the
receiver, typically limiting the effective operating area of GPS systems
to outdoor locations.

Introducing GPS: Where in The World
Are You?
• More recently produced phone-based navigation apps take the form of
fully connected devices, using a data connection provided via the
mobile phone network.

• These are capable both of receiving data from a range of live,
context-based online information services, such as alerts about upcoming
traffic delays, and of feeding information such as your speed and current
location back to service providers.

• As well as using GPS for individual route-finding, GPS trackers are also
used by many commercial vehicle operators to privately keep track of
the vehicles in their fleet. However, real-time sources of data are also
available for tracking other sorts of vehicle, and increasing amounts of
this information are being made publicly available.
Identifying Where You Are: From
Addresses to Locations

• That GPS helps us locate ourselves is something we take for
granted. But where does it locate us? When giving travel directions,
the name of a town district or village, or a particular postcode,
might be enough to get you into the vicinity of a particular location,
but it still defines a broad geographical area.

• To pinpoint a location more exactly, we need to find a way of
referring to a particular point, rather than an area.

Latitude and Longitude
• In Activity 6.3, you used a Google Maps search to map the
location of a set of latitude and longitude coordinates. A
standard Google search also allows you to run a web search
for the coordinates of a named location. When I searched for
the coordinates of Milton Keynes (see Figure 6.4), I was
presented with the result 52.0406° N, 0.7594° W. This
represents an absolute location defined using the global
coordinate system of latitude and longitude.

Figure 6.4 Global coordinates of Milton Keynes
Latitude and Longitude
• The lines of latitude are imaginary horizontal lines that run parallel to the
equator. The latitude at the equator is defined to be 0 degrees latitude. Lines of
longitude, also known as meridians, run vertically from the North Pole to the
South Pole. The line of longitude that runs through the grounds of the Royal
Observatory at Greenwich was established as the Prime Meridian, or 0 degrees
longitude, in the nineteenth century. The fixed reference point or origin of the
latitude/longitude coordinate scheme is the location where the equator and the
Greenwich meridian cross.

Latitude and Longitude

• Using this scheme, the coordinates of The
Open University in Milton Keynes can be
defined as 52.0251, -0.7105 – which is to
say, 52.0251 degrees north of the equator
and 0.7105 degrees west of the Greenwich
meridian, or 52.0251° N, 0.7105° W.

• You may recall that the Google search
result for a location gave the result using
the degrees north/degrees west
convention, so the result I obtained of
(52.0406° N, 0.7594° W) would be
represented as (52.0406, -0.7594) using the
numerical representation.
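The conversion between the two conventions can be sketched in a few lines of Python. The function name `to_signed` and the accepted input format are assumptions made for illustration; they are not part of the module materials.

```python
import re

def to_signed(coord: str) -> tuple[float, float]:
    """Convert a string such as '52.0406° N, 0.7594° W' to signed decimal degrees."""
    values = []
    for number, hemisphere in re.findall(r"([\d.]+)\s*°\s*([NSEW])", coord):
        value = float(number)
        # South latitudes and west longitudes are negative by convention.
        if hemisphere in "SW":
            value = -value
        values.append(value)
    latitude, longitude = values
    return latitude, longitude

print(to_signed("52.0406° N, 0.7594° W"))  # (52.0406, -0.7594)
```

The sketch assumes the latitude is always written first, as it is in the examples on these slides.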

329
GPS-based Location Detection
• GPS devices can use latitude and longitude coordinates to
represent and communicate geolocation information in an
unambiguous way.

• Even though millions of separate GPS receivers can use the same
GPS satellites to determine their own location, none of the
satellites need to know anything about any of the receivers.
• Rather than responding to individual requests from each separate
receiver, each satellite broadcasts the same information to all the
GPS receivers that are in sight of it. It is then up to each receiver to
work out where it is for itself.

• But how do the GPS receivers determine those coordinates, and
how do they do it without needing to connect to the GPS satellites
directly?
330
GPS-based Location Detection
• As the animation shows, the calculation used to determine the
distance of the receiver from each satellite is based on the time
taken for the signal to go from satellites (with known positions) to
the receiver. This requires very accurate timing and the ability to
use synchronized clocks on the satellites and at the receiver.

• The signal transmitted from a satellite to a GPS receiver contains a
sequence of code that runs continuously and synchronously (in
step) at both the GPS receiver and the GPS satellite transmitter, as
shown in Figure 6.7.

Figure 6.7 GPS code

331
GPS-based Location Detection
• The GPS receiver can compare the received code with the code already
running at the receiver. The received code will appear delayed by the
amount of time that it has taken to propagate from the transmitter – that
is, it will appear to have been shifted in time (see Figure 6.7).
• The adjustment in time needed to bring the received code and the original
code sequences back into alignment is the time it takes for the signal to
travel from the satellite transmitter to the receiver.
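The alignment step can be illustrated with a toy example. The 13-chip code below is hypothetical (real GPS codes are far longer), and the helper names are invented for this sketch. The receiver tries every possible shift and keeps the one at which the received code agrees best with its own copy.

```python
# Hypothetical 13-chip code, used only for illustration.
LOCAL_CODE = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]

def delay(code, chips):
    """Return the code as it would appear after arriving `chips` steps late."""
    n = len(code)
    return [code[(i - chips) % n] for i in range(n)]

def find_shift(local, received):
    """Try every shift and keep the one where the most chips agree."""
    n = len(local)

    def matches(shift):
        return sum(received[i] == local[(i - shift) % n] for i in range(n))

    return max(range(n), key=matches)

received = delay(LOCAL_CODE, 3)
print(find_shift(LOCAL_CODE, received))  # 3
```

Multiplying the recovered shift by the duration of one chip would give the propagation time; that duration is not stated on the slides, so it is left out here.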

• The following calculation shows how we can estimate the time it takes for a
signal to reach the Earth’s surface from a GPS satellite directly above it, at
an altitude of 20 800 km. A radio signal travels at the speed of light, which
is approximately $3 \times 10^{8}$ m/s (300 000 000 meters per second), and which
we assume to be constant.

• The relationship between speed, distance and time is:


speed = distance / time

332
GPS-based Location Detection
• Where time is in seconds, distance in meters and speed in meters
per second.
• As we know the distance from the receiver to the satellite and the
speed of light, we can rearrange the equation to calculate the time
as follows:
• time = distance / speed

• The first thing we need to do is to convert the distance in km into
meters as follows:
$20\,800\ \text{km} = 20\,800\,000\ \text{m} = 2.08 \times 10^{7}\ \text{m}$
My calculation then becomes:
$\text{time in seconds} = \dfrac{2.08 \times 10^{7}\ \text{m}}{3 \times 10^{8}\ \text{m/s}}$

333
GPS-based Location Detection
• By expressing the calculation in scientific notation, we
can divide the first power-of-10 term by the second by
subtracting the exponent of the second number (the
denominator) from the exponent of the first number
(the numerator):
$(2.08 / 3) \times 10^{7-8}\ \text{s} = 0.69 \times 10^{-1}\ \text{s}$
(to 2 significant figures)

• As you should recall, the negative exponent tells us that
if we want to express this in ordinary notation, we
should move the decimal point one place to the left.
Thus the propagation time is 0.069 s (to 2 significant
figures), which is to say, 69 milliseconds (69 ms).
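The same calculation can be checked in Python. This is a minimal sketch using the rounded speed of light from the slides; the variable names are chosen for illustration.

```python
SPEED_OF_LIGHT = 3e8      # m/s, the approximation used in the slides
altitude_km = 20_800      # satellite directly overhead

distance_m = altitude_km * 1000          # convert km to m
time_s = distance_m / SPEED_OF_LIGHT     # time = distance / speed

print(f"{time_s:.3f} s = {time_s * 1000:.0f} ms")  # 0.069 s = 69 ms
```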
334
Two More Challenges to Using GPS
• Activities 6.10 and 6.11 suggest that GPS needs to measure a time
delay of somewhere around 60 or 70 ms to a precision of about 3 ns
for a resolution of 1 meter to be achieved.
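The link between timing precision and position resolution follows from the same relationship, distance = speed × time. The sketch below, with a function name chosen for illustration, shows that a 3 ns timing error corresponds to a little under 1 meter.

```python
SPEED_OF_LIGHT = 3e8  # m/s, the approximation used in the slides

def range_error_m(timing_error_s):
    """Distance a radio signal travels during the given timing error."""
    return SPEED_OF_LIGHT * timing_error_s

print(f"{range_error_m(3e-9):.2f} m")  # 0.90 m
```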

• But we are still faced with two problems when it comes to
determining our location. Firstly, how do we ensure that the clocks
in the receiver and the satellites are synchronised? Secondly, where
are the satellites, so that we can fix our position relative to them?

• In answer to the first question, high-accuracy atomic clocks are
available, but they are expensive. The cost of such a clock can be
justified on the 24 GPS satellites, but not in the millions of low-cost
GPS receivers. This means that the clocks in GPS receivers are not
sufficiently accurate to identify a precise location from three
satellites.
335
Two More Challenges to Using GPS

• A solution to the delay and inaccuracy problem lies in using a
fourth satellite. In a perfectly accurate system, the theoretical
lines joining the GPS receiver to three satellites that the
receiver is using will meet in a single point.

• However, if the calculated distances from each satellite are
slightly inaccurate (because they have been calculated from
slightly inaccurate times), then the theoretical spheres won’t
be in exactly the right place.

• Consequently, the point of intersection won’t be at the right
place on the Earth’s surface and the GPS receiver will report
an erroneous position. This is where the fourth satellite
comes in.
336
Where’s My Phone?
• In the previous section, you learned how GPS can be used to
find the latitude and longitude of a location anywhere on the
planet – as long as enough GPS satellites are in sight.
• In this section, you will learn how locations can be identified
using mobile phone cell towers, rather than GPS satellites.

• Unlike the GPS system, where the satellites are completely
unaware of the location of receivers, cell-tower localization
may reveal the location of the receiver back to the network,
via the cell towers. In addition, communication service
providers may retain this information and associate it with
you, which means that cell-tower localization may have
implications for personal privacy.

337
Localization Using Triangulation and
Trilateration
• If you enable your mobile device to supply location information to
other networked services, they may use that information to deliver
location-dependent services back to your device. But it is also
possible for networks to become aware of your location based on
the physical location from where you connect to the network.

• Using a compass and an accurate scale map, you can locate your
position using two or more bearings. (A bearing is an angle
measured from due north.) Identify two or more visible points that
you can see from your current location and that you can also
identify on the map.

338
Localization Using Triangulation and
Trilateration
• Draw a line through each observed location on the bearing it
is measured on; your location is the point at which the lines
intersect. This technique is known as triangulation, and it uses
angles to known locations to locate your position. However, if we
are trying to identify our location based on just our distance from
multiple cell towers (or GPS satellites), we need a slightly different
technique called trilateration. Note that triangulation is often used
to refer to both trilateration and triangulation. A more general term
– localization – is also used to cover these two techniques (and
more).

• As with GPS, the trilateration technique is at the heart of cell-tower
localization, determining location based on proximity to multiple
cell towers.
339
Approximate Location Determination
• Each base station or cell tower in a terrestrial mobile phone
network has a unique identifier, the Cell ID.

• The Cell ID is included in every transmission sent by a base station.
A mobile device connected to the network detects the Cell ID of the
base stations that are currently within transmission range. Using
this information, together with the strength of the signal received
from each base station, a mobile device can determine the identity
of the base station that is closest to its current location.

• Each Cell ID is geocoded with a latitude and longitude. The device
uses a directory to look up the geocoded location information (the
latitude and longitude) of the Cell ID. The device then has an
estimate of its location based upon the location of the cell tower.

340
Approximate Location Determination
• Figure 6.9 shows two base stations, M1 and M2, with location
coordinates (2, 4) and (5, 2), respectively. The circles represent the
transmission range of the base stations. Mobile device A is in range
of M1 and mobile device B is in range of M2. Using the technique
described, device A can estimate its location to be the coordinates
of M1 – in other words, the estimated location of device A is (2, 4).
Similarly, the estimated location of device B is (5, 2).
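The lookup described above can be sketched as follows. The directory holds the two base stations from Figure 6.9; the signal strengths (in dBm) and the function name are hypothetical values chosen for illustration.

```python
# Directory mapping Cell IDs to geocoded coordinates (values from Figure 6.9).
CELL_DIRECTORY = {"M1": (2, 4), "M2": (5, 2)}

def estimate_location(signal_strengths):
    """signal_strengths maps each in-range Cell ID to a received strength in dBm."""
    # Treat the base station with the strongest signal as the closest one.
    closest = max(signal_strengths, key=signal_strengths.get)
    return CELL_DIRECTORY[closest]

print(estimate_location({"M1": -70}))             # device A: (2, 4)
print(estimate_location({"M1": -95, "M2": -60}))  # hears both, M2 stronger: (5, 2)
```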

Figure 6.9 Two geocoded terrestrial base stations

• The accuracy of the computed location is inversely related to the
transmission range of the base stations: the bigger the transmission
range of a base station, the lower the accuracy.
341
Identifying Cell Tower Locations
• Cell tower localization is more frequently based on locating a device
relative to multiple cell towers based on the relative signal strength
detected from each of them. Your phone calculates this
automatically, but you can explore how this works for yourself using
services such as CellIdFinder or the Google geolocation API.

342
I’m Home… Wi-Fi Reveals Location Too
• In the previous two sections, you learned how GPS and cell-
tower localization can be used to find the location of a mobile
device.

• But you may also have noticed how your web browser
occasionally prompts you for location information when using
your desktop or laptop computer. (Browser location services
can be enabled and disabled through your browser
settings/preferences.)

• This information can then be shared with the website that
requested it in order to provide location-based services. But
how do desktop and laptop computers know where they are?
343
Wi-Fi-hotspot-based Localization
• A cell-tower localization system allows a device to crudely estimate
its location by looking up the location of any cell towers that it can
connect to, and then doing some calculations based on the distance
to each of them. Computing devices can also use Wi-Fi connection
points to estimate location in a related way.

• Just as each cell tower has a unique Cell ID, each Wi-Fi router also
has a unique identifier associated with it in the form of a MAC
address.

• MAC addresses (media access control addresses) are unique
identifiers associated with a device’s network interface. A device’s
MAC address is used at the data link level to deliver traffic to
network-connected devices.

344
Wi-Fi-hotspot-based localization
• Several global databases exist that have location information
associated with Wi-Fi router MAC addresses. These location/MAC
address pairs are often harvested from location-aware mobile
devices that have encountered the Wi-Fi router at a particular
location. As such, the location information may not be reliable – for
example, if the router has been moved since its location was last
confirmed.

• If you look up one or more Wi-Fi router MAC addresses, you can use
the locations associated with them to roughly locate your position.
If you also know the power of the signal received from the router,
and have an estimate of the power with which the router transmits,
you can use that information to improve your estimate of your distance
to the router, and hence improve the accuracy of your computed location.

345
Wi-Fi-hotspot-based localization
• When you grant permission to a website allowing it to use your
location, the browser may request information from the computer
about in-range Wi-Fi router MAC addresses. It can then submit
these addresses to a lookup service which returns the estimated
location of the browser based on the location of the identified Wi-Fi
routers.

• One of the major providers of browser-based location lookup
services is Google’s geolocation service. This allows users to look up
the location of Wi-Fi access points based on the MAC address of the
Wi-Fi router and also mobile phone cell towers based on the cell
tower ID.

346
Tracking Devices Using Wi-Fi
• Can the Wi-Fi network also be used to track the movements of users
carrying Wi-Fi enabled devices?

• There are two ways in which Wi-Fi enabled devices can identify any
nearby access points (Haigh, 2014):

▪ passive mode: in which the device listens out for an announcement
signal, or beacon signal, transmitted by the access point across several
different radio channels in turn

▪ active mode: the typical default mode, in which the device broadcasts
a ‘Who is there?’ packet on each channel and waits for a response. The
broadcast message includes the device MAC address, which means
that the device reveals itself to anyone who happens to be listening.

347
Passing trade: Bluetooth Beacons and
Contextual Alerts
• Originally developed by Ericsson Mobile Communications to replace wired
connections to mobile phones, Bluetooth is now a widely adopted
technology for providing short-range radio communications between
devices. Another major application area is connecting peripherals such as
audio speakers to media players or wearables to mobile phones.

• But Bluetooth can also be used as part of a location-determining system in
which fixed Bluetooth beacons are installed in particular locations, such as
shopping centers or airports. The identifying code of each beacon is
registered in a directory, along with its location. When the beacon is
detected by a passing device, the owner of the device can be prompted
with an alert relating to that location or context by looking up the location
of that beacon.

348
Passing trade: Bluetooth Beacons and
Contextual Alerts
• In contrast to GPS-based services, a predominantly outdoors technology
where a device locates itself with reference to multiple GPS satellites,
beacon technology typically associates a fixed indoor location with a
beacon identifier that alerts passing receivers to the presence of that
beacon.

• Contextual alerts can be brought to a user’s attention by means of a mobile
app which uses Bluetooth to listen out for the presence of a nearby beacon.
In a retail environment such as a shopping center, if a store beacon is
identified and recognized, the app may look up any promotions associated
with that beacon and alert the user to them. Beacons may also be used to
support navigation for users with limited vision: the Wayfindr protocol has
been defined to support apps developed specifically for this user context.

349
Localization Using Bluetooth Beacons

• In the protocol used by iBeacons, the beacon identifier is transmitted along
with a benchmarked power value associated with the beacon at a distance
1 m away from it. The power of a wireless transmission decays according to
the inverse square of the distance from the transmitter. This means that we
can use the power of the signal received by a receiver together with the
transmitted power value to calculate the distance the receiver is from the
transmitter. In effect, we can use the power information as a proxy for the
distance the receiver is from the beacon.
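A minimal sketch of this distance estimate is shown below. It assumes that both power values are expressed in dBm (the slides do not give units) and that the signal follows the ideal inverse-square law, under which the received power falls by 20 dB for every tenfold increase in distance. The function name and the example readings are hypothetical.

```python
def distance_from_power(power_at_1m_dbm, received_dbm, exponent=2.0):
    """Estimate the distance in meters from a beacon.

    exponent=2.0 corresponds to the inverse-square law; real indoor
    environments rarely behave this ideally.
    """
    return 10 ** ((power_at_1m_dbm - received_dbm) / (10 * exponent))

# Hypothetical readings: -59 dBm benchmarked at 1 m, -79 dBm received.
print(distance_from_power(-59, -79))  # 10.0
```

In linear power units the same relationship is distance = sqrt(power at 1 m / received power).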

350
Localization Using Bluetooth Beacons

• So how does this help us determine our position if we are surrounded by
iBeacons?

• Suppose now that there are three beacons in range. Based on the power of
the signal received from each beacon, we can work out the distance to
each beacon, but not necessarily in which direction it is. However, we also
have the identifier for each beacon, which can be used to look up its
physical location. Knowing the location of the three beacons, and the
distance to each of them, allows us to locate the receiver.
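One way to turn three beacon locations and three distances into a position is sketched below for a flat, two-dimensional floor plan. Subtracting the circle equation of the first beacon from the other two leaves two linear equations in x and y. The beacon coordinates are invented for illustration, and the sketch assumes perfectly accurate distances.

```python
import math

def trilaterate(beacons, distances):
    """Locate a receiver from three (x, y) beacon positions and its distance to each."""
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = distances
    # Linear equations obtained by subtracting circle 1 from circles 2 and 3.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("beacons lie on one line; position is ambiguous")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

beacons = [(0, 0), (4, 0), (0, 3)]                     # hypothetical beacon positions
distances = [math.dist(b, (1, 1)) for b in beacons]    # receiver actually at (1, 1)
print(trilaterate(beacons, distances))
```

With noisy distance estimates the three circles will not meet in a single point, so a practical system would look for the best-fitting position instead.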

351
Localization Using Bluetooth Beacons
• In Figure 6.11, three beacons are in range of a receiver. The receiver
uses power readings (the strength of the signals received from the
beacons) and power ratings (the power level 1 m away from each beacon)
to calculate the distance to each beacon. Circles are drawn around each
beacon to indicate the measured distance the receiver is from the
beacon – we just don’t know in which direction each beacon is situated
from us.

Figure 6.11 An idealised view of localization using trilateration


352
Summary
• If you have had several generations of smartphone, you may have noticed
that they have moved on from offering GPS-based location services to
‘location services’ in general. The location of the device may be determined
by a variety of means: GPS, cell-tower localization, Wi-Fi hotspot
localization or even beacon-based/Bluetooth localization. As you have
learned over the previous sections, the triangulation/trilateration principle
that uses estimated distance measures to reference points of known (or at
least, discoverable) location plays a key role in calculating the actual device
location. You have also seen how some localization techniques by their very
nature reveal the user’s location to the wider system, whereas others
maintain a level of anonymity.

• The extent to which location data is collected and stored raises
considerable privacy concerns for some, and disregard from others. But the
first step towards putting proper safeguards in place, if such safeguards are
required, is to develop a good understanding of what the technology is
capable of, and what the limits to it are.
353
TM112: Introduction to Computing and
Information Technology 2

Meeting #10
Block 2 (Part 7)
Dangerous Data

OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati 354


Contents

• Introduction
• Online – The New Front Line
• Information Assets
• Authentication
• Malware
• Cyberwar
• Summary

355
Introduction
• The explosion in goods and services available online, as well as our
society’s desire to socialize online, has made the internet an irresistible
target for people who wish to do us harm, ranging from criminals who
want to steal our money and our identities, to those who abuse
vulnerable people.

• The term hacker has historically been a divisive one, sometimes being used
as a term of admiration for an individual who exhibits a high degree of skill,
as well as creativity in his or her approach to technical problems, and
sometimes (more commonly) applied to an individual who uses this skill for
illegal or unethical purposes.

356

ethical hacker unethical hacker


CIA
• The guiding principles behind information security can be summed up
in a three-letter acronym you are sure to remember: CIA, standing for
confidentiality, integrity and availability.

• We want our information to:

▪ only be read by the right people (confidentiality)
▪ remain unchanged so long as we’re not editing it (integrity)
▪ be available to read and use whenever we want (availability).

• It is important to be able to distinguish between these three aspects of
security.

357
Spear Phishing: The Targeted Attack
You have almost certainly received spam email supposedly coming
from a bank or another company telling you there is a problem with
your account. These emails are phishing for information. Their
senders hope you will respond and provide personal information that
can be used to commit fraud.

In late 2014, the US cybersecurity corporation Cylance reported their
findings of a major hacking operation called Cleaver that broke into
numerous computer systems, extracted large amounts of sensitive
data and caused potentially serious damage.

• Instead of sending millions of messages in the hope of getting a few
responses, Cleaver’s operators targeted their phishing. The
attackers used information stolen in the first part of the operation
to identify and attack people and organizations of interest.

358
Spear Phishing: The Targeted
Attack
• This type of targeting is called a spear phishing attack. A 2012 estimate
(TrendLabs, 2012) suggested that 91% of targeted attacks used spear
phishing at some point. Spear phishing focused on senior
management (who are most likely to have privileged access to
information) is known as whaling.

• Cleaver’s targets had received emails saying that they were being
considered for an important job. They were asked to complete a CV by
following a link to the website easyresumecreatorpro.com where they
could download a copy of a well-known tool called Easy Resume
Creator Pro.

359
The Final Attack: Malicious Software
• Cleaver’s developers created a new, malicious version of the CV writing
application which could be downloaded from
easyresumecreatorpro.com.

• Just like the original, the application allowed users to create a new CV.
When complete, users were encouraged to upload their document so it
could be reviewed by potential employers. In fact, nothing was
uploaded; submitting the CV activated malicious software that had
been downloaded along with the application.

• This malicious software represented a persistent threat to the user. So
long as it was active on the user’s computer, Cleaver had access to that
machine and its data. This piece of malicious software is called
TinyZBot – an example of a backdoor, which is a gap in a computer’s
security that allows attackers to control the computer and/or steal
data.
360
The Final Attack: Malicious Software
• Among its capabilities, TinyZBot could:
▪ log keystrokes: the program recorded which keys were pressed
on a keyboard, which is a common way for hackers to steal user
IDs and passwords from computers they have access to

▪ monitor clipboard activity: the clipboard is an area of memory in
an operating system used to store data that is being copied and
pasted

▪ capture screenshots: another way of stealing data, but it can also
be used by attackers to learn the layout of industrial plants from
on-screen displays

▪ detect security software: many malicious programs attempt
either to hide themselves or to disable the security features of
modern operating systems and antivirus software.
361
Understanding Current Threats

• Now that you have listed your information assets, it is necessary to
consider how they can be compromised.

• Any discussion of protecting assets uses three key terms:

▪ vulnerability: a point at which there is potential for
a security breach
▪ threat: some danger that can exploit a vulnerability
▪ countermeasure: an action to protect assets
against threats and vulnerabilities.

362
Passwords

• Millions of people use online services every day, and it is crucial
that these systems prevent users from accessing each other’s
information. To do this, they need a way of uniquely identifying
each user in a way that prevents users from impersonating one
another. This way includes identification, authentication and
authorization.

• To better understand the difference between the three, consider
the real-life example of entering the AOU campus.

363
Passwords
I. Identification. The process of claiming you are a particular
individual. In our example, when you hand over your AOU
university ID to the university campus security officer, you
identify yourself as an AOU student. Identification doesn’t
prove that you are telling the truth; although you presented an
ID, you might be using a false one.

II. Authentication. The process of proving your identification.
The security officer has to verify that this ID is genuine and
belongs to you and will authenticate your identity by
examining your ID and comparing your face against the
photograph in the ID.

III. Authorization. Follows the processes of identification and
authentication and provides access.
Finally, once the officer is satisfied that the ID is genuine and
that your face indeed matches the photograph in the ID, you are
authorized to enter the campus.
364


Passwords

• With computers and computer systems, you most commonly
perform identification by providing a user ID. Authentication is
performed by comparing the password you provide with the password
stored on the system for that user ID. If they match, authorization
takes place. For instance, authorization may result in you being able
to access your email, bank, shopping or other account.

365
What Happens When You Enter A
Password?
• Imagine you had to create a computer password system for a
website. You might start off by having a user enter their password,
which is transmitted to the site’s server and compared to a stored
password. Only if the two match is the user allowed into the site.

• You can probably recognize a couple of potential vulnerabilities
with this approach.
1. The password is transmitted as plaintext.
If the password is 12345 and sent across the internet, it can be
intercepted by an attacker;
2. The password is also stored as plaintext on the server. An
attack on this server would not only reveal an individual user’s
password, but potentially expose every password belonging
to every user.
366
What Happens When You Enter A Password?
• Fortunately, countermeasures exist for both problems.
1. The first problem is overcome by encrypting
communications between the user and the server.
Encryption is a process that scrambles data so that it cannot
be read by unauthorized parties. (We will talk much more
about encryption in Block 3 Part 3)
2. The second problem is solved by obscuring passwords using a
technique known as hashing.

Hashing is an algorithm performed on data such as a file or message
to produce a checksum called a hash.
The hash is used to verify that data is not modified, tampered with,
or corrupted.
For instance, using a hashing algorithm called MD5:
▪ The MD5 hash of ‘hello’ is always
5d41402abc4b2a76b9719d911017c592
367
What Happens When You Enter a Password?
Hashes have three crucial properties:
1. In practice, every different piece of plaintext produces a different hash.
For example, the hash for ‘hello’ is different from that for ‘Hello’. Although
the only difference is that one word is capitalized and the other is entirely
in lower case, not only are their hashes different, but there is no obvious
resemblance between the hashes.
2. Hashes are always the same length, no matter the length of the
original plaintext. The MD5 hashing algorithm always produces hashes that
are 128 bits long. The hash for a large chunk of Alice is exactly the same
length as that for ‘hello’; so it is impossible to determine the length of the
original text from the hash.
3. It is nearly impossible to transform the hash back into the original data.
Even if you obtain the MD5 hash of a password – e.g.
4a77060f0f04a1bcd2f3b7975f8e6d68 – there is no quick, simple way to
recover the original plaintext solely from the hash.
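The first two properties can be observed directly with Python's standard `hashlib` module. This is a small sketch; the helper name `md5_hex` is chosen for illustration.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 hash of a string as 32 hexadecimal digits (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

lower = md5_hex("hello")
upper = md5_hex("Hello")
long_text = md5_hex("hello " * 10_000)   # a much longer piece of plaintext

print(lower)                              # 5d41402abc4b2a76b9719d911017c592
print(lower != upper)                     # True: one capital letter changes the hash
print(len(lower) * 4, len(long_text) * 4) # 128 128: each hex digit is 4 bits
```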
368
Applying Hashing to Passwords

• When a user wants to log on to their account, their password is
hashed and sent over a secure internet connection. It is then
compared to a hashed password stored for that user. (If the stored
password is encrypted, the computer will first decrypt the hash.)
The user is granted access to the computer only if the two hashes
match.

• Even if the password file for every user is stolen, the attackers still
don’t know the actual passwords they need to enter in order to
access the computer. The users are not immediately at risk.
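The comparison of hashes can be sketched as below. The password file, user ID and password are hypothetical. MD5 is used only to match the slides; real systems typically combine each password with a random salt and use a deliberately slow hashing function.

```python
import hashlib
import hmac

def hash_password(password):
    """Hash a password with MD5, as in the slides (not suitable for real systems)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical password file: only hashes are stored, never the plaintext.
PASSWORD_FILE = {"alice": hash_password("correct horse")}

def log_in(user_id, password):
    """Grant access only if the hash of the supplied password matches the stored hash."""
    stored = PASSWORD_FILE.get(user_id)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(password))

print(log_in("alice", "correct horse"))  # True
print(log_in("alice", "wrong guess"))    # False
```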

369
Attacking Passwords
• Two common techniques are used to obtain passwords:

I. Brute-force attack: The simplest method of breaking passwords
is a brute-force attack, where a computer methodically attempts
to log on using all possible passwords, beginning with ‘A’, then
‘AA’, ‘AB’, and so on – trying each in turn until it stumbles upon an
actual password.

Brute-force attacks are time-consuming, but if enough computers
are employed, brute force will break enough passwords to justify
the time and expense of running the attack in the first place.
Fortunately, brute-force attacks can be easily defeated by
restricting the number of failed attempts that can be made to
access an account before it is locked.
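The sketch below illustrates both the attack and the countermeasure. The candidate generator tries every string of capital letters in order of length, and the `Account` class (a hypothetical stand-in for a real login system, with an assumed limit of three failures) locks itself once that limit is reached.

```python
from itertools import count, product
from string import ascii_uppercase

def candidates():
    """Yield 'A', 'B', ..., 'Z', 'AA', 'AB', ... without end."""
    for length in count(1):
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(letters)

class Account:
    MAX_FAILURES = 3  # assumed lockout threshold

    def __init__(self, password):
        self._password = password
        self.failures = 0

    @property
    def locked(self):
        return self.failures >= self.MAX_FAILURES

    def try_login(self, guess):
        if self.locked:
            return False          # no further guesses are even checked
        if guess == self._password:
            self.failures = 0
            return True
        self.failures += 1
        return False

account = Account("CAB")
for attempts, guess in enumerate(candidates(), start=1):
    if account.try_login(guess) or account.locked:
        break

print(attempts, account.locked)  # 3 True: locked long before 'CAB' is reached
```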
370
Attacking Passwords

II. Dictionary attack: An alternative to brute-force is to attempt to
find passwords that are also found in a dictionary. A dictionary
attack is usually performed on a copy of a stolen password file.
The attack itself is very simple: every password in the password
file is compared to every entry in a dictionary that may contain
popular names, birthdates, easy passwords, etc. Once a working
password is identified, an attacker can use it and the matching
user ID to log in to the hacked site, masquerading as the
legitimate user.

371
A record from a stolen password file:

Username    Hashed password
Fadi2020    570a90bfbf8c7eab5dc5d4e26832d5b1

Hashed dictionary:

Plaintext   Hash
samar       7294001ae51b8cdfd50eb4459ee28182
Fadi2020    570a90bfbf8c7eab5dc5d4e26832d5b1
12345678    c794890af6c9e1b6d9050e056abcc4d3
Aou123      aa2d6e4f578eb0cfaba23beef76c2194
2006199     d5aa1729c8c253e5d917a5264855eab8
qwerty      daa759be97f37e5f7eff5883801aebed

• Hence, hashing, alone, cannot protect passwords from dictionary attacks if
the original password can be found in a dictionary. Matching a hash in the
password file with one from the hashed dictionary means that they
represent the same piece of plaintext.
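The matching process can be sketched as follows. The word list reuses the plaintexts from the table, but the hashes are computed by the code itself with MD5 and are not copied from the table. The second user and their password are invented to show an entry that the dictionary does not contain.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for a stolen password file: user ID -> hashed password.
stolen = {
    "Fadi2020": md5_hex("Fadi2020"),
    "user2": md5_hex("Zx9!unlikely-phrase"),   # hypothetical strong password
}

dictionary = ["samar", "Fadi2020", "12345678", "Aou123", "2006199", "qwerty"]

# Hash every dictionary entry once, then look each stolen hash up in the result.
hashed_dictionary = {md5_hex(word): word for word in dictionary}
cracked = {
    user: hashed_dictionary[hashed]
    for user, hashed in stolen.items()
    if hashed in hashed_dictionary
}

print(cracked)  # {'Fadi2020': 'Fadi2020'}
```

Only the password that appears in the dictionary is recovered; the other hash stays unmatched.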

372
Non-technical Attacks

• Rather than try to steal and break a password file, attackers may
risk stealing passwords from offices and other workplaces.

▪ Attackers may masquerade as office cleaners or couriers and steal
passwords written on pieces of paper or stuck to the computer itself.

▪ Attackers may try to strike lucky by trying obvious passwords such as
‘abc123’, ‘password’ or names of victims’ families, friends or pets.

▪ It takes only a few moments and a removable flash memory drive for
an attacker to install a keylogger program which captures passwords
as they are entered on the keyboard.

373
Password Managers
• A password manager is a computer application that stores passwords in
an encrypted database.

• Most password managers can create new passwords; since computers can
generate and store arbitrarily long pieces of nonsense text – such as
MHpKQCvpYoouTAaPiiWuFKjpNe7qnsbwkrvq3s3cX – password managers
can produce passwords that are highly resistant to both brute-force and
dictionary attacks.
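Generating such a password takes only a few lines with Python's standard `secrets` module, which is designed for security-sensitive random choices. The alphabet and default length below are assumptions chosen for illustration, not settings from any particular password manager.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def generate_password(length=40):
    """Return a random string of letters and digits of the requested length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())
```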

374
Two-factor Authentication
• So, if one password isn’t secure enough, perhaps having two pieces
of information to perform authentication will be more secure? So-
called two-factor authentication will be familiar to you as you will
have used it to withdraw money from an ATM. Here, you must give
the bank two pieces of information:

▪ something you have
▪ something you know.

• In this case, the possession is the data stored on your bank card;
the information you know is your PIN. Individually, neither can
access your account, but when brought together they allow you to
withdraw money.

375
Hardware Security Tokens
• Many banks offer two-factor authentication to online banking
customers, with accounts accessed using a combination of a password
and a four- or six-digit number generated by a small hardware security
token that can be kept in a wallet or attached to a keychain.

• Each token uses a microchip containing a clock and a random number
generator to generate a new password every minute or so. (This type
of changing password is known as a one-time password.) The token is
synchronized with a master computer at the bank which is generating
identical passwords alongside the token.

• When a user logs in to their bank, they are asked to enter the token’s
one-time password into their browser. The bank’s computer will have
also generated the same number. The two values are compared by the
bank; if they match, the user is allowed into their account.
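The idea of a token and a bank computer producing the same number independently can be sketched as below. The slides do not say which algorithm the tokens use, so this example assumes a scheme in the style of the published time-based one-time password approach: both sides hold the same secret and combine it with the current one-minute time slot. The secret and function name are hypothetical.

```python
import hashlib
import hmac
import struct
import time

def one_time_password(secret: bytes, moment: float, step: int = 60, digits: int = 6) -> str:
    """Derive a short numeric code from a shared secret and the current time slot."""
    counter = int(moment // step)                      # same value for a whole minute
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # pick 4 bytes from the digest
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)

SHARED_SECRET = b"hypothetical-shared-secret"
now = time.time()

token_code = one_time_password(SHARED_SECRET, now)     # computed on the token
bank_code = one_time_password(SHARED_SECRET, now)      # computed by the bank

print(token_code == bank_code)  # True: the user is allowed into the account
```

Because the code depends on the time slot, a number captured by an attacker stops being useful once the minute has passed.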
376
How Big is The Threat From
Malicious Software?
• By 2014, nearly one million new pieces of malware (a contraction of
‘malicious software’) were released every day.

• Malware is capable of corrupting or erasing data and rendering
computers useless. It can create fake data; send spam emails; capture
credit card numbers, addresses and passwords; host and share child
pornography; fool users into visiting fraudulent sites; sabotage
industrial and medical machinery; attack government, business and
industrial computers. It can even commit blackmail.

377
How Big is The Threat From
Malicious Software
• Since 2013, a range of malware programs has targeted PCs, quietly
and quickly encrypting crucial data so that it can no longer be accessed
without paying a ransom. If this ransom, sometimes running into
hundreds of pounds, is not paid, the data will be irretrievably lost.

• Some of this ransomware has been linked to organized crime. Just one
piece of ransomware, called Cryptolocker, is estimated to have
‘earned’ $27 million for its creators. In other cases, ransomware
appears to be primarily intended to cause disruption, such as the
WannaCry program which crippled computers in 150 countries during
May 2017, including those belonging to the NHS, O2, Nissan, FedEx
and Russian Railways.

378
What is Malware?

• Malware is a collective term for any type of software that attempts to
harm computers, or the data held on them. It is usually categorized
into three types:

i. viruses
ii. worms
iii. Trojans.

• However, as malware has evolved, the boundaries between the
different categories are beginning to blur.

379
Viruses
• A virus is a program capable of making new copies of itself which are
inserted into applications, data or crucial areas of a computer’s hard
disk.
• Viruses are attached to specific applications on a computer and are
activated when that program first runs.
Most computer viruses are built from three main programming components:

➢ The infection mechanism
The part of the virus responsible for finding new targets by searching for files
on a disk or a new device to infect.

➢ The trigger
An event or condition that activates the virus. The trigger can include a
certain date or time, or an action.

➢ The payload
The destructive code that forms the heart of the virus, which can perform
such tasks as corrupting, destroying or encrypting a user’s data or damaging
the operating system.
380
Worms
• Like a virus, a worm is a self-replicating program designed to make
copies of itself. Unlike a virus, a worm is a standalone application.
Most worms spread through network connections.
• Worms can use triggers to remain dormant on infected machines
until certain times or conditions are met, whereupon their payload is
activated.

381
Trojans
• Unlike viruses and worms, Trojans are not self-replicating; instead, they
are often distributed by email or pop-up adverts on websites,
masquerading as legitimate applications such as screensavers. The Trojan
might even work as advertised – a download accelerator might result in
(slightly) faster downloading, but the Trojan will also contain a destructive
payload.

• Trojans allow attackers to gain control of the computer, copy or delete
personal information, monitor keystrokes looking for passwords or credit
card numbers, or quietly spread to other computers using the PC’s email
software.

382
Other Types of Malware
• In addition to the three types of malware described above and the
ransomware discussed earlier, you may see references to other forms of
malware, including the following.
• Adware
Forces users to view advertising and may report their internet use to
advertisers or its creators.
• Spyware
Attempts to access personal information by monitoring keystrokes
or patterns of activity.
• Rootkits
Hidden programs used by attackers to remotely control or access a
computer.
• Hijackers
Redirect browsers to unwanted websites, either to earn advertising
clicks or to download further malware. Some of the sites
masquerade as legitimate websites and are designed to harvest
personal information such as logins and credit card details.
383
Botnets
• One strand of malware is concerned with recruiting computers into
an army of infected machines coordinated over the internet to
perform a malicious task. Affected machines are called zombies,
whilst their network is known as a botnet (or zombie army).
Individual botnets may consist of tens of thousands or even
millions of machines spread across the world, giving the owner of
the botnet enormous power to cause damage.

• Botnet malware infects computers just like other forms of
malicious software. Initially, it rarely damages the host computer,
but is more concerned with spreading further copies of itself across
a network. The newly infected computer is registered as a member
of the botnet and the malware waits for orders from the botnet’s
controllers.

384
Botnets

Botnets Fall Into Two Broad Categories:

➢ Client–server (Figure 7.9a)

▪ These are the older type of botnet, in which infected machines
are under the control of a remote command and control
server (sometimes called the controller).

▪ Once installed on a computer, the botnet malware contacts
its server via the internet using pre-existing channels such as
the Internet Relay Chat instant messaging program or web
connections. The server not only delivers instructions (such as
‘go to sleep’ or ‘wake up’), but it can also deliver updates to ensure
the malware remains undetected.

385
Botnets
• Botnets Fall Into Two Broad Categories:

➢ Peer-to-peer (Figure 7.9b)

▪ Individual zombies each maintain a short list of known peers
with which they exchange information.

▪ Commands and updates are introduced to the botnet by so-called
commanders and propagate through the botnet as
peers communicate with one another. Peer-to-peer botnets
are much harder to disable than client–server botnets since
they can continue to function so long as a single commander
remains online.

386
Botnets
• Botnets can also be used for the following:
• Spam email
Zombies can be used to send spam messages to every contact in
their address book.

• Click fraud
Most online advertising is paid for on a ‘per-click’ basis, with an
advertiser paying each time a user clicks on an advert. Click fraud
uses software to simulate clicking on an advert.

• Brute-force decryption
Passwords and other forms of secure data can be attacked by brute
force. Botnets share the task amongst many machines, allowing for
faster decryption.

387
Botnets
• Bitcoin mining
Bitcoins are produced through a complex mathematical process
requiring huge amounts of computer power. Rather than invest in
their own computers, criminals can use botnets to create new
Bitcoins.

• Denial-of-service (DoS) attacks
DoS is a method of attacking computers by flooding their network
connections with spurious data that prevents legitimate traffic from
being sent or received. Denial of service can cripple online services if
sufficient traffic can be directed at one site.
Botnets allow thousands or even millions of zombies to collaborate
in an attack; since the attackers are spread across the internet, such
an attack is described as a distributed denial-of-service (DDoS)
attack.

388
Antivirus Software
• Antivirus software aims to detect, isolate and, if necessary, delete malware
on a computer before it can harm data. Antivirus software uses several
techniques to identify malware; the two most common are known as
signatures and heuristics.

• Signatures

➢ A signature is a unique pattern of data created by a malware
program in a computer’s memory or in a file. Antivirus programs
may run invisibly in the background, looking for malware
signatures in files either when they are downloaded or when they
are accessed by opening a file.
• Heuristics

➢ Heuristics are rules used to identify dangerous software based on
previous experience of known malware. The antivirus software will
‘decompile’ the suspicious program back to its source code and
examine it for instructions typical of malware – such as attempting
to replicate itself or overwriting key operating system files.
389
Antivirus Software
➢ Unlike signatures, heuristics do not require specific knowledge
about individual types of malware; they can detect new malware,
for which signatures do not exist, simply by its behaviour.
Unfortunately, since heuristics rely on previous experience to
identify dangerous software, radically new malware (which
appears all too regularly) can pass unnoticed.

➢ Many antivirus programs use a combination of signatures and
heuristics to offer maximum protection.
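
Signature matching, as described on the previous slide, amounts to searching data for known byte patterns. The Python sketch below illustrates the idea only: the signature names and byte patterns are invented for this example and do not correspond to any real malware or antivirus product.

```python
# Hypothetical signature database: name -> byte pattern (invented for illustration).
SIGNATURES = {
    "Example.Virus.A": bytes.fromhex("deadbeef0102"),
    "Example.Worm.B": b"EVIL_PAYLOAD",
}

def scan(data):
    """Return the names of all signatures whose pattern occurs in the data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

clean_file = b"just an ordinary document"
infected_file = b"header" + bytes.fromhex("deadbeef0102") + b"rest of file"

print(scan(clean_file))      # []
print(scan(infected_file))   # ['Example.Virus.A']
```

A scanner like this can only find patterns that are already in its database, which is why signatures are usually combined with heuristics.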

390
Summary
• This part introduced you to cybersecurity, a topic relevant to you as
an individual as well as our society. Awareness of computer security
not only protects you, your family and your data; it is a key
academic skill for anyone wishing to work in the modern IT and
computing industries. It is no longer acceptable, or safe, for devices
and software to fail to include security features that affect their
usability or the safety of their users.
• You have met several key cybersecurity technologies, including
how passwords are processed by computers and how they can be
broken – a topic we will return to in Block 3 Part 3.

• You were then introduced to several different types of malware –
software especially designed to cause harm – and learned how they
spread, function and cause harm. You also learned about some of
the technologies used by antivirus companies to identify, contain
and destroy malware.
391
TM112: Introduction to Computing and
Information Technology 2

Meeting #11
Block 3 (Part 1)
Data on Your Computer: A Private Investigation!

OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati
392


Contents

• Introduction
• Hard Disk Drives
• Solid-State Drives (SSDs)
• Securing and Analyzing A Hard Drive
• System Files and Deleted Files
• Analyzing Main Memory (RAM) and Closing The Case
• Summary

393
The Structure of A Hard Disk Drive

• A hard disk is part of a unit -- often called a disk drive, hard drive or hard
disk drive -- that stores and provides relatively quick access to large
amounts of data on an electromagnetically charged surface or set of
surfaces. Today's computers typically come with a hard disk that can
contain anywhere from billions to trillions of bytes of storage.

• A hard disk is actually a set of stacked disks, like phonograph records.
Each disk has data recorded electromagnetically in concentric circles, or
tracks, on the disk.

Figure 1.4 The inside of a hard disk drive

394
Formatting A Hard Disk
• If there are lots of different operating systems, each with its own kind of
file system, do you need a different kind of disk drive for each? The
answer is no.
This is because we can prepare almost any hard disk drive to work with
any operating system and its file system through a process called
formatting, which is carried out before we try to save any data to the disk.

• The most important thing that happens when a disk is formatted is that at
least one area of the disk must be loaded with the operating system’s file
system in readiness for it to store data. These areas are called partitions.
If you want to run more than one operating system on your machine, you
can even create partitions that have different file systems.

• If you have more than one partition, the formatting process will cause
them to be displayed as separate drives by your operating system –
for example, in Windows Explorer or Finder on a Mac.
• Once a disk has been formatted, you can write data to it.
395
Formatting A Hard Disk
1. Each platter of a hard disk is divided into several concentric tracks.
2. Each track is divided into several sectors, each of which can store the same
amount of data. A sector is the smallest physical storage unit on the disk,
and on most file systems it is fixed at 512 bytes in size.
3. A cluster can consist of one or more consecutive sectors – commonly, a
cluster will have 4 or 8 sectors. As a file is written to the disk, the file
system allocates the appropriate whole number of clusters to store the
file’s data.

396
Formatting A Hard Disk
• In older hard disk drives, the sectors on the outside of the disk had a larger
area than those closer to the center, which meant that they held fewer bits
per unit area and were less efficient at storing data than the inner sectors.
• On modern disks, each sector has the same area so they each store the same
number of bits per unit area.

• Figure 1.5 A comparison of sectors on an older and more modern hard disk
drive

397
Formatting A Hard Disk
• So once a file has been written to one or more clusters,
how does the operating system know where to find the
file again?

• FAT, which stands for File Allocation Table, is the area of the hard
disk that is used as an index of every cluster on the disk and records
whether a cluster is being used or not. It is at the heart of the
file system called FAT32, which was formerly used by Windows
operating systems but is now mainly used with solid-state memory,
such as flash. FAT32 can only cope with a maximum file size of 4 GB.

398
Formatting A Hard Disk
• Windows computers now mainly use a file system called New
Technology File System (NTFS), where a table called a Master File
Table (MFT) does a similar job.

• Apple has a file system that is unimaginatively called The Apple File
System, and the Linux file system is called ext4. They each have
similar tables.

• When a new file is to be saved, the operating system can also use
this table to determine which clusters are in use and which are free
to be allocated.

• The space that is available for files to be written to is referred to as
unallocated space on the disk, and of course this is always a whole
number of clusters’ worth of bytes.
399
Deleting Data From An HDD

What happens when a file is deleted from a hard disk?

• When a file is deleted, the operating system doesn’t erase the file; it
simply makes the clusters that the file occupies available for
reallocation. So the data is still there until it is overwritten, but there
is no reference to it in the file allocation table. However, the
operating system at some point might allocate a new file to one of
those clusters, which overwrites the original data.

• So even if a file has been deleted, the data might still be right there
on the hard disk if it hasn’t been overwritten yet.
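
The behaviour described above can be imitated with a toy model in Python. This is a deliberately simplified sketch, not how FAT or any real file system is implemented: the eight-cluster "disk", the `table` dictionary and the function names are all assumptions made for illustration.

```python
CLUSTER_COUNT = 8
clusters = [b""] * CLUSTER_COUNT      # raw contents of each cluster
table = {}                            # file name -> list of cluster numbers in use

def free_clusters():
    """Return the numbers of all clusters not referenced by the table."""
    used = {n for numbers in table.values() for n in numbers}
    return [n for n in range(CLUSTER_COUNT) if n not in used]

def write_file(name, chunks):
    """Store each chunk in the next free cluster and record it in the table."""
    numbers = free_clusters()[:len(chunks)]
    for number, chunk in zip(numbers, chunks):
        clusters[number] = chunk
    table[name] = numbers

def delete_file(name):
    """Remove only the table entry; the cluster contents are left untouched."""
    del table[name]

write_file("plan.txt", [b"secret ", b"plan"])
delete_file("plan.txt")
print(clusters[0], clusters[1])       # b'secret ' b'plan'  (data still on the disk)

write_file("new.txt", [b"hello"])
print(clusters[0], clusters[1])       # b'hello' b'plan'    (cluster 1 not yet overwritten)
```

After the deletion the table no longer mentions the file, yet its bytes remain readable until another file happens to reuse the same clusters.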

400
Deleting Data From An HDD
• As we have seen, the cluster system means that there is almost always some
unused space in the last cluster when a file is stored. The amount of unused
space depends on the cluster size and the file size. The logical size of a file is
a measure of the number of bytes of data a file actually contains. Its physical
size is almost always bigger than this because it has to be stored in a discrete
number of clusters.

• So, for example, take a file that has a logical size of 1280 bytes. In a system
where there are 4 sectors of 512 bytes in a cluster, the file takes up a whole
cluster (or 2048 bytes), which means that the physical size of the file is 2048
bytes. The difference between 2048 and 1280 is 768, which means that there
is a slack space of 768 bytes.

Figure 1.6 Slack space in a cluster
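
The slack space calculation in this example can be checked with a few lines of Python. The function name `physical_size` and its default parameters (512-byte sectors, 4 sectors per cluster) are taken from the example above and are assumptions, not fixed properties of every file system.

```python
import math

def physical_size(logical_size, sector_size=512, sectors_per_cluster=4):
    """Return the bytes actually occupied on disk: a whole number of clusters."""
    cluster_size = sector_size * sectors_per_cluster
    clusters_needed = math.ceil(logical_size / cluster_size)
    return clusters_needed * cluster_size

logical = 1280
physical = physical_size(logical)
slack = physical - logical
print(physical, slack)   # 2048 768
```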


401
Deleting Data From An HDD
• After a file is deleted, if the newly allocated file does not occupy the
whole of the cluster or clusters, the data in the slack space is not
overwritten. And if the deleted file was larger than the newly allocated
file, there will be bits of the old file still intact, although not visible to the
file system. So data in the slack space might well come from files
considerably older than the deleted file.

• This leftover data, which is called latent data or ambient data, can
provide investigators with clues as to what was originally stored in the
whole cluster, which may in turn provide leads for other enquiries.

402
How to Permanently Delete Data
From An HDD
• For a hard disk drive, there are only three sure-fire ways:

i. Overwriting (wiping or shredding): Some disk management
programs (and even some operating systems) provide an ‘overwrite’
utility that fills every part of the disk with zeros or ones or a
random mix of the two.

ii. Degaussing: Data on HDDs (and other magnetic storage media) is
stored in patterns of magnetization. These patterns can be
disrupted by a powerful magnetic field, and a sufficiently powerful
field can erase an entire disk in a few minutes – a process known as
degaussing.

iii. Physical destruction: Some people and organizations require
that their disks are completely destroyed when no longer
needed.
403
Fragmentation
• Have you noticed that a hard disk drive gets slower the fuller it gets?

• If your operating system tries to save a file that cannot be stored in a
single cluster, the file system breaks up the file into cluster-sized chunks
and tries to save them in contiguous clusters.
• However, if contiguous clusters are not available, the file is fragmented,
which means that the remaining clusters are written elsewhere on the
same disk. That is called fragmentation.

• When you defragment (or defrag) a hard disk, you are using a
software utility that moves the chunks of files to try to arrange them
in contiguous clusters. That is called defragging.

Figure 1.8 (a) A fragmented file occupying 6 clusters and (b) the same
file after defragmentation
404
The Growth of Solid-State Drives
• SSDs are solid-state drives, which use integrated circuits to store
data. They use a technology called flash memory.
• The file and operating systems still maintain the same system of
dividing the memory into logical sectors and clusters, even though
the physical form of a solid-state drive is very different to that of a
spinning disk.
• The operating system doesn’t need to know what physical type of
drive it is reading data from, or writing to, as long as it understands
the logical file storage structure defined by the file system.

• As the stats below show, the use of SSDs in computers is increasing.

405

Figure 1.9 A forecast of the growth of the worldwide use of SSDs in PCs
Comparing SSDs and HDDs
SSDs have several advantages over HDDs:

• One important factor is that fragmentation is not a performance
issue on SSDs, as an SSD is truly a random-access memory. Thus, the
access time for an SSD is the same, regardless of the location of the
data, and so fragmentation does not lead to the same problems as
for an HDD.

• SSDs are less susceptible to mechanical failure.

• SSDs are more resistant to shock and vibration.

• SSDs have lower power consumption.

406
How Flash Memory Works
• On a microscopic level, SSDs are made up of semiconducting materials
that are configured so that they create a whole series of tiny electrically
insulated boxes, which act as memory cells.

• In flash memory, where cells are initially set to 1, a writing operation
can only change a 1 to a 0. Previously stored data must be erased,
with all cells reset to 1 by applying a high-voltage electrical pulse to a
block of cells, before new data can be written to this block.

• Unfortunately, this use of a high-voltage pulse to reset a block of cells
to 1 causes damage to the structure of the cells where the data is
stored and limits flash memory’s lifetime to a finite number of write
cycles.
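
The rule that a write can only turn a 1 into a 0 can be modelled as a bitwise AND. The Python sketch below is a simplified model under stated assumptions: it treats a group of 8 cells as one integer, and the function names `program` and `erase` are chosen for this example. Real flash memory erases much larger blocks at a time.

```python
ERASED = 0b11111111            # an erased group of 8 cells: every bit is 1

def program(current, new_data):
    """Writing can only clear bits (1 -> 0), modelled here as a bitwise AND."""
    return current & new_data

def erase(_current):
    """Erasing resets every bit in the group to 1."""
    return ERASED

cell = ERASED
cell = program(cell, 0b10100101)          # first write onto erased cells
print(format(cell, "08b"))                # 10100101

cell = program(cell, 0b11110000)          # overwrite attempt without erasing
print(format(cell, "08b"))                # 10100000, not the intended 11110000

cell = program(erase(cell), 0b11110000)   # erase first, then write
print(format(cell, "08b"))                # 11110000
```

The second write fails to store the intended value because bits already at 0 cannot be raised back to 1 without an erase.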

407
Deleting Data From An SSD

• You can still physically destroy the drive, but degaussing does not work
because SSDs do not rely on magnetism to store zeros and ones.

• However, many manufacturers provide SSD control software, called TRIM,
that uses any quiet time to ‘garbage collect’ any unreferenced file fragments
and make them ready for writing (it resets only the slack segments).
• Moreover, most SSD manufacturers have a utility for managing and securely
erasing their SSDs using a command called ATA Secure Erase.

• The ATA Secure Erase command resets the whole of the SSD by applying a spike of
voltage to all of the memory cells simultaneously, flushing out all of the stored
electrons and forcing the drive to ‘forget’ all of its data.

408
Copying The Hard Drive and
Allocating A Hash Code
• As you know, data is represented by bits in computer storage, and we are
going to make what is called a disk image of the hard drive – that is, we are
going to copy it, bit for bit. This is a process called dead system imaging,
because we have removed the hard disk from a switched-off computer.

• It will take a while, though. Take a look at that hard drive – can you see
where it says 4 TB? That means there are four terabytes’ worth of bits to
copy over, one at a time. And remember each terabyte is ~10^12 bytes and
each byte has 8 bits.

• At the same time as we copy it, we are going to ensure that the image
cannot be accidentally changed in any way. We are going to use a write
blocker for that – it is a piece of software that makes the image disk
read-only. Then we can work on extracting the data from the image, leaving the
original hard drive untouched. To be sure, we will seal it in an evidence bag as
soon as the copy is made.
409
Copying The Hard Drive and
Allocating A Hash Code
• The piece of software that I will use to make the disk image will also run an
algorithm that calculates a number, called a hash code, from all of the 0s and
1s on the original disk. This hash code provides a single number that is much
smaller than the total number of bits on the disk.

• Once we have made the disk image, we will use the same process to calculate
the hash code for that too. If the hash codes match, we can be certain (or
virtually certain, anyway) that the disk image is a true bit-for-bit copy of the
original disk, which means that we can do all of our investigations on the image
disk, rather than the original disk.
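
Comparing the hash code of the original with that of the image can be sketched in Python. The choice of SHA-256 is an assumption (the slides do not name the algorithm used by the imaging software), and the short byte strings below merely stand in for the contents of a real disk.

```python
import hashlib
import io

def hash_code(stream, chunk_size=4096):
    """Hash a stream of bytes chunk by chunk, as one would for a large disk."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()

original_disk = bytes(range(256)) * 64        # stand-in for the bits on the seized disk
disk_image = bytes(original_disk)             # a bit-for-bit copy
tampered_image = bytearray(original_disk)
tampered_image[100] ^= 0b00000001             # flip a single bit in the copy

original_hash = hash_code(io.BytesIO(original_disk))
print(original_hash == hash_code(io.BytesIO(disk_image)))       # True
print(original_hash == hash_code(io.BytesIO(tampered_image)))   # False
```

Reading in chunks means the whole disk never has to fit in memory at once, which matters for a drive holding terabytes of data.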

410
Copying The Hard Drive and
Allocating A Hash Code
How is this hash code calculated?
• Consider a simple hash code algorithm. Say the original
disk only contains a binary representation of the name
‘TAM’. We could reduce this to a single number by adding up
the ASCII codes for ‘T’, ‘A’ and ‘M’, which gives us a hash code
of: 84 + 65 + 77 = 226

• Suppose now that a dishonest forensic investigator tries to
change the data on the disk image to ‘SAM’ in an attempt to
get ‘TAM’ off the hook. If the court orders the recalculation
of the hash code of that disk image, it will now have the
value: 83 + 65 + 77 = 225
This will not match the hash code of the original disk in the sealed
evidence bag.
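
The additive hash code above takes only a couple of lines of Python. The function name `simple_hash` is chosen for this example.

```python
def simple_hash(text):
    """Add up the ASCII codes of the characters in the text."""
    return sum(ord(character) for character in text)

print(simple_hash("TAM"))   # 226
print(simple_hash("SAM"))   # 225
```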
411
Copying The Hard Drive and
Allocating A Hash Code
• Anagrams of ‘TAM’ will all have the same hash code, so here is an example of
another, slightly better, algorithm. Now we will ‘weight’ the ASCII codes
depending on their position in the word. So, for example, we can calculate
the weighted sum of the characters in ‘TAM’ as follows:
1 × 84 + 2 × 65 + 3 × 77 = 445
• When two sequences of bits have the same hash code we call it a collision,
but the very good hash code algorithms used in practice are designed to
minimize collisions.

• For long sequences of characters, a weighted sum like this can become a very
large number. One way to deal with that problem is to use the modulus operator (%).

• A smaller hash code is often obtained by finding the modulus with a prime
number. So, for example, if we take the weighted value that we got for ‘TAM’
we can find its modulus with a prime number such as 23. So 445 % 23 = 8 and
we can use 8 as the hash code for ‘TAM’.
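
The weighted algorithm and the modulus step can be sketched in Python as follows. The function name `weighted_hash` and its optional `modulus` parameter are assumptions made for this example.

```python
def weighted_hash(text, modulus=None):
    """Multiply each ASCII code by its position (1, 2, 3, ...) and add the results."""
    total = sum(position * ord(character)
                for position, character in enumerate(text, start=1))
    return total if modulus is None else total % modulus

print(weighted_hash("TAM"))       # 445
print(weighted_hash("MAT"))       # 459: the anagram no longer collides
print(weighted_hash("TAM", 23))   # 8
```

Taking the modulus keeps the hash code small, but it also means many different inputs share the same value, so collisions become more likely rather than less.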
412
Reading The Hard Drive
• We need an image mounter. This is a piece of software that enables
the operating system to read and write data to a disk image. Except,
of course, in this case we can’t write anything to our image because
we used the write blocker when we created it. Once the disk image
has been mounted, its content will appear just as a physical disk
would in the computer, so it will look like another drive in Windows
Explorer or Finder on a Mac.

413
Timestamps and Other Metadata
• Metadata is a set of data that describes and gives information about
other data. The important pieces of metadata about a file kept by
any file system include the file’s name, size and path, as well as lots
of other information. It also keeps timestamps, which tell you when
a file was created, modified or deleted.

• You can see that the file size and the size on disk are different. The
first must be the logical size and the second must be the physical
size.

• The ‘Modify’ timestamp tells us the last time that the content of the
file was modified.

• You do have to be a bit careful when looking at timestamps, because
it is possible for someone to alter them by manipulating the
computer clock.
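
Some of this metadata can be read from Python using the standard `os.stat` call. The sketch below creates a temporary file purely for demonstration; the 1280-byte content is an arbitrary choice.

```python
import os
import tempfile
from datetime import datetime

# Create a small temporary file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 1280)
    path = handle.name

info = os.stat(path)
logical_size = info.st_size                         # number of bytes in the file
modified = datetime.fromtimestamp(info.st_mtime)    # the 'Modify' timestamp

print(logical_size)
print(modified)

os.remove(path)                                     # tidy up the temporary file
```

The timestamp is simply a number recorded from the computer's clock at the time of the change, which is why a wrongly set clock produces misleading timestamps.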
414
System Files

• The operating system keeps a log file of events such as logins,
logouts, device changes, system changes, etc.

• Windows and Mac machines sometimes hide files to prevent you
from accidentally deleting them. On this Windows machine, we can
see them if we go to the control panel and select ‘File Explorer
Options’, then select the ‘View’ tab. You could look up how to view
hidden files on a Mac or a Linux machine if you ever need to.

415
The Recycle Bin and Soft-Deleted Files
• Assuming it was in the ‘Business plan’ folder, it looks as if it was deleted, so it
may be gone forever.

• We might be in luck if it was deleted using a soft delete: This is when a file is
deleted, either by pressing the delete button or dragging it to the Recycle
Bin (or the trash can in some operating systems).

• In fact, the file stays exactly where it is on the physical disk, whether it is an
HDD or an SSD. However, the operating system renames the deleted file
with a name that starts with $R and creates an associated file, the $I file, to
contain metadata about the deleted file. It then stores this new file in a
hidden location on the hard drive.

• If you want to recover your file from the Recycle Bin, you need to select it and
choose the option to restore it. When you do this, the data in the $I file in the
hidden location enables the operating system to reinstate its original path and
name, so it can be opened from its original folder in the Explorer window.
416
Hard Deletes
• Recycle Bin keeps your soft deleted files until the garbage consumes
about 5 percent of your computer’s available space. Then it purges
your oldest deleted files to make room for the new ones.

• Hard delete: This is done by either

a) Emptying the Recycle Bin, or
b) Pressing Shift as you select delete.

• When a file is hard deleted from the Recycle Bin of a hard disk, the
data still exists in its original location, but the loss of the $I file and
the removal of any reference to the $R file in the hidden folder
means that the operating system cannot locate the file any more. So
the space it occupied is released so something else can be stored
there.
But until the space is overwritten, the original content is still there.

417
File Carving
There is software available, called file carving or sometimes data carving
software.
• Once the software thinks it has found a file format it recognizes, it
does some further checks on the subsequent bytes to see if they are
compatible with the kind of file identified.
• The software then tries to find the end of the file.

• If the end of the file can’t be found, or if the beginning of the file has
been overwritten or if all else fails, the file carving software will at
least guess where the file ends, knowing where the next header
starts.
• However, if the header is missing, these basic data carving techniques
won’t work.
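
A very basic header-and-footer carving routine can be sketched in Python. It relies on the fact that JPEG files begin with the bytes FF D8 FF and end with FF D9; everything else here (the function name `carve_jpegs`, the fake "disk" contents) is an assumption for illustration, and real carving software performs many more checks.

```python
JPEG_HEADER = bytes.fromhex("ffd8ff")
JPEG_FOOTER = bytes.fromhex("ffd9")

def carve_jpegs(raw):
    """Return byte ranges that look like JPEG files inside raw disk data."""
    carved = []
    start = raw.find(JPEG_HEADER)
    while start != -1:
        search_from = start + len(JPEG_HEADER)
        end = raw.find(JPEG_FOOTER, search_from)
        next_header = raw.find(JPEG_HEADER, search_from)
        if end != -1 and (next_header == -1 or end < next_header):
            # Footer found before any other header: carve header to footer.
            carved.append(raw[start:end + len(JPEG_FOOTER)])
        elif next_header != -1:
            # No footer in time: guess that the file ends where the next header starts.
            carved.append(raw[start:next_header])
        start = next_header
    return carved

# Fake unallocated space: one complete file, one with a missing end, one complete file.
raw = (b"junk" + JPEG_HEADER + b"AAAA" + JPEG_FOOTER + b"slack"
       + JPEG_HEADER + b"BBBB"
       + JPEG_HEADER + b"CCCC" + JPEG_FOOTER)

files = carve_jpegs(raw)
print(len(files))   # 3
```

Because the search starts from the header, data whose header has been overwritten is never found, which matches the limitation noted above.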

418
File Carving
• File carving doesn’t work on SSDs, because the TRIM function will
ensure that the unallocated and slack space will be overwritten, so
there is nothing to find.

• The software we are going to use will let us select the particular types
of file that we are interested in finding. This speeds up the process.

• What the output looks like depends on the software package, but this one
will copy all of the matching file types that it finds onto a new disk.
Then we can search it using filters, such as keywords, to try to find
the specific file that we are looking for.

419
Live Acquisition of Main Memory (RAM)
• The rule for forensic investigations is that if a computer is running when it is
first encountered, then leave it running, so that a live acquisition can be
carried out. If it is not running, then don’t boot it up.

• Live forensic acquisition collects digital evidence in an order that
acknowledges the volatility of the evidence, so as to maximize its
preservation.

• As you recall, RAM is the memory that is used to store instructions and data
just before they are required by the processor.

• RAM is also the place where the operating system is loaded. So it also
contains information about what processes and programs are running, which
networks the computer is connected to, passwords, files that have been
decrypted and the keys that were used to decrypt them.

420
RAM Data Recovery
Then there are registry hives:
• The registry is an area of RAM that is used to store the lowest-level settings
of the operating system.
• A hive is just a space within this registry area. Each time a new user logs onto
a computer, a new hive is created for that user that contains registry
information about their profile, such as their settings, desktop, environment,
network connections and printers.

There may also be file data ‘temporarily’ stored in RAM before it is written to the
hard drive, so RAM analysis can reveal a lot of important information about a
system and its users.

The data stored in RAM disappears when the power is switched off. So, at any
given moment, the state of a system’s volatile memory is not reproducible.
That is why we have to use ‘live acquisition’ if we want to find evidence in RAM.

421
Summary

• You have learned a lot in this part: how data is stored
on a hard disk and in solid-state memory, and how
difficult it is to entirely delete data. You now know how to
collect digital evidence, what metadata is and
how it can be found, and how to carve data from hard
disks and RAM.

422
TM112: Introduction to Computing and
Information Technology 2

Meeting #12
Block 3 (Part 3)
Cryptography: The Secret of Keeping Secrets

OU Materials, PPT prepared by Dr. Khaled Suwais

Edited by Dr. Ahmad Mikati


423
Contents

• Introduction
• Hashing
• Ciphers and Keys: An Introduction to Encryption
• Symmetric Encryption
• Turning The World Upside Down: Asymmetric Cryptography
• Summary

424
Introduction
• Computer security technologies are a double-edged sword: they
not only protect legitimate users from attack, but they can also hide
criminals from law enforcement. The history of computer security
has always been a balance between those who see these
technologies as a benefit to society and those who consider them a
great threat.

• As we move through this part, you should notice that we place
increased emphasis on the conflict between what is technologically
possible and what is socially acceptable.

Can we trust people with data? Should governments dictate how we
use data? Can we trust governments? And can we trust the computers
themselves?

425
Hashing
• We used hashing earlier to obscure passwords stored on computers. In
this context, hashing is used to hide the actual value of the password
from prying eyes, but hashing has many more uses and is crucial to a
wide range of computer technologies.
• Hashing is useful because of two related characteristics:

1. It is a ‘one-way’ operation.
2. A variation of a single bit of data between two otherwise
identical files will result in vastly different hash values

• Many different hashing algorithms have been developed, of which
several have been widely adopted (see Table 3.1).
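
The second characteristic, that a one-bit change produces a very different hash, can be observed with Python's standard `hashlib` module. SHA-256 (a member of the SHA-2 family) and the sample input are choices made for this example.

```python
import hashlib

original = b"TM112"
# Flip the lowest bit of the first byte: 'T' (0x54) becomes 'U' (0x55).
flipped = bytes([original[0] ^ 0b00000001]) + original[1:]

hash_original = hashlib.sha256(original).hexdigest()
hash_flipped = hashlib.sha256(flipped).hexdigest()

print(hash_original)
print(hash_flipped)

# Count how many of the 64 hexadecimal digits differ between the two hashes.
differing = sum(a != b for a, b in zip(hash_original, hash_flipped))
print(differing, "of", len(hash_original), "hex digits differ")
```

Since the two outputs look unrelated, the hash reveals nothing useful about how similar the inputs were.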

426
Hashing
Algorithm                           Hash size (bits)   Published
Message Digest algorithm 5 (MD5)    128                1992
Secure Hash Algorithm 1 (SHA-1)     160                1995
Secure Hash Algorithm 2 (SHA-2)     Up to 512          2001
Secure Hash Algorithm 3 (SHA-3)     Up to 512          2015

Table 3.1 A comparison of four widely used hashing standards

• Whilst hashes are described in terms of the number of bits making up the
hash, they are usually stored and displayed as hexadecimal values, with
every four bits represented by a single hexadecimal digit (0–f). So the
128-bit MD5 hash

1100 0111 1111 0100 0101 0101 1110 0010 0111 0111 0000 0100 0011 0110
0100 0110 1111 0111 1101 1101 0110 0111 1000 0001 1001 1100 0110 1000
0000 0101 0011 1111

• is stored as the 32-character hexadecimal value

c7f455e2 77043646 f7dd6781 9c68053f.
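The conversion above can be checked mechanically. This is a minimal sketch in Python; the function name bits_to_hex is an illustrative choice.

```python
# The 128-bit MD5 hash from the slide, written as 32 groups of four bits.
bits = (
    "1100 0111 1111 0100 0101 0101 1110 0010 "
    "0111 0111 0000 0100 0011 0110 0100 0110 "
    "1111 0111 1101 1101 0110 0111 1000 0001 "
    "1001 1100 0110 1000 0000 0101 0011 1111"
)

def bits_to_hex(bit_string: str) -> str:
    """Turn each space-separated group of four bits into one hexadecimal digit."""
    return "".join(format(int(group, 2), "x") for group in bit_string.split())

hex_value = bits_to_hex(bits)
print(hex_value)       # c7f455e277043646f7dd67819c68053f
print(len(hex_value))  # 32
```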


Collisions
• Hashes are widely used in so-called digital certificates, which are used to
authenticate the origins of software.

• Ideally, a hashing algorithm should produce a unique hash for every
different piece of data. However, hashing algorithms can produce
identical hashes (so-called non-unique hashing) for different pieces of
data – known as a collision.

• Collisions are extremely rare – the first MD5 collision was only found after
hashing 2^50 different pieces of data – but the fact that they exist at all
means it is impossible to completely guarantee the integrity of data hashed
using MD5. It is safe to say that if a malicious party processes enough MD5
hashes, they will find collisions that can be exploited.
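Finding a full MD5 collision is far too expensive for a classroom demonstration, but the idea can be sketched with a deliberately weakened hash. The example below assumes Python's hashlib and keeps only the first 16 bits (four hexadecimal digits) of each MD5 hash; short_hash and find_collision are hypothetical helper names. With only 65,536 possible values, two different inputs must collide within 65,537 attempts.

```python
import hashlib

def short_hash(data: bytes, hex_digits: int = 4) -> str:
    """First few hex digits of the MD5 hash (4 digits = 16 bits): a toy hash."""
    return hashlib.md5(data).hexdigest()[:hex_digits]

def find_collision():
    """Hash "0", "1", "2", ... until two different inputs share a short hash."""
    seen = {}  # short hash -> the input that produced it
    counter = 0
    while True:
        data = str(counter).encode()
        h = short_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
        counter += 1

first, second, shared = find_collision()
print(first, second, "both hash to", shared)
```

The longer the hash, the more inputs must be tried before a collision becomes likely, which is why hash size matters.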

428
Collisions
• The possibility of collisions means the MD5 algorithm cannot guarantee data is
authentic. Nor is it the only hashing algorithm under threat.

• The possibility that SHA-1 collisions could be used to falsify data has
encouraged software developers to redesign their applications, replacing MD5
and SHA-1 with more secure hashing algorithms such as SHA-2.

• Whilst SHA-2 is still considered secure, the US government has approved an
even more secure algorithm – unimaginatively called SHA-3. There is a much
smaller possibility that collisions will be found between SHA-3 hashes than
those for SHA-2.

429
Protecting Hashed Passwords
• Block 2 Part 7 showed how hashes can obscure computer
passwords, but cannot guarantee their safety, since hashed
passwords can still be compromised by a dictionary attack using a
dictionary of hashed words.

• Dictionary attacks are relatively simple to mount and can be
devastatingly effective. Therefore, security designers have
attempted to further strengthen defenses for those people who
choose to use easily guessed passwords. Salting is a process in
which a computer adds a small amount of additional data to a
password before it is hashed.

430
Protecting Hashed Passwords
• For instance:
1. A new user might choose the (terrible) password passw0rd, which is
almost certainly in any attacker’s dictionary and therefore
vulnerable.
2. The computer generates a random number, called the salt, e.g.
73950.
3. The two are joined together, creating a new password; depending
on the implementation of salting, the user’s password is
transformed into either passw0rd73950 or 73950passw0rd.
4. The new value is hashed.
5. The computer securely stores the salt alongside the hash.

• When the user next logs in, they enter their password (passw0rd);
the computer recovers their salt, recombines it with the password
and generates a hash which is compared to the stored hash.
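The steps above can be sketched as follows. This is a minimal illustration, assuming Python's standard hashlib, hmac and secrets modules, SHA-256 as the hash and the salt appended after the password; the function names register, hash_password and check_login are hypothetical.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str, salt: str) -> str:
    """Join the password and the salt, then hash the combined value."""
    return hashlib.sha256((password + salt).encode()).hexdigest()

def register(password: str):
    """Generate a random salt and return it together with the salted hash."""
    salt = secrets.token_hex(8)  # 64 random bits, written as 16 hex digits
    return salt, hash_password(password, salt)

def check_login(attempt: str, salt: str, stored_hash: str) -> bool:
    """Recombine the stored salt with the attempt and compare the hashes."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

salt, stored = register("passw0rd")
print(check_login("passw0rd", salt, stored))  # True
print(check_login("password", salt, stored))  # False
```

Registering the same password twice gives two different stored hashes, because each registration draws a fresh salt.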
431
Why Salt Works
• Salt greatly increases the number of possible hashes any attacker
must test in a dictionary attack. Rather than the attacker having to
generate and test a single hash value for each entry in the
dictionary, they would have to generate and test hashes for every
word combined with every possible salt value.

• For example, the MD5 hash for passw0rd is:

 bed128365216c019988915ed3add75fb

• Without salting, an attacker only needs to test this hash to see if an


account’s password is passw0rd. If we add just a three-bit salt –
which has eight possible values, 000, 001, 010, 011, 100, 101, 110 or
111 – we must now generate and test hashes for:
432
Why Salt Works
 passw0rd000, passw0rd001, passw0rd010, passw0rd011,
passw0rd100, passw0rd101, passw0rd110, passw0rd111,
000passw0rd, 001passw0rd, 010passw0rd, 011passw0rd,
100passw0rd, 101passw0rd, 110passw0rd, 111passw0rd

• Rather than testing one hash for each word in the dictionary, we
now need to test sixteen different hashes. Salting has made the
attack sixteen times more difficult than it would be without salt.
• Real-world salts are much longer than three bits; typically, salting
schemes use equal-length salts and hashes.
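The sixteen candidates listed above can be generated rather than written out by hand. A minimal sketch in Python, where the variable names are illustrative:

```python
from itertools import product

password = "passw0rd"
# All three-bit salts: 000, 001, ..., 111.
salts = ["".join(bits) for bits in product("01", repeat=3)]
# The salt may be placed after or before the password.
candidates = [password + s for s in salts] + [s + password for s in salts]

print(len(salts), len(candidates))  # 8 16
```

For an n-bit salt the same reasoning gives 2 × 2^n candidates per dictionary word, so each extra salt bit doubles the attacker's work.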

433
More password protection-Key stretching

• Key stretching increases the amount of time required for even the
fastest CPUs to create a hash. It has little or no effect on most
legitimate users, even if a hash takes half a second to generate.
• However, if verifying a single password takes half a second, it is
impractical to perform a brute-force attack on that computer in a
reasonable amount of time.

• Key stretching may be problematic for online shopping sites or social
media services where very large numbers of users are constantly
logging in and out.
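One common way to stretch a key is to repeat the hashing step many times. The sketch below assumes Python's hashlib.pbkdf2_hmac (the PBKDF2 scheme), which is one possible technique and not necessarily the one the module has in mind; the iteration counts are arbitrary illustrations and the timings will vary from machine to machine.

```python
import hashlib
import os
import time

def stretched_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte hash, repeating the underlying hash `iterations` times."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)
for iterations in (1, 1_000, 100_000):
    start = time.perf_counter()
    stretched_hash("passw0rd", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed:.4f} s")
```

A legitimate user pays the cost once per login, whereas an attacker pays it for every guess.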

434
More password protection-Encrypting Hashes

• We can further protect the password file using encryption, obscuring
its contents to anyone not possessing a piece of data known as the
key. Even if the password file is stolen, it is useless so long as the
encryption key is not also stolen.

• In the most secure systems, passwords are stored, encrypted and
decrypted by hardware security modules (HSMs) plugged into a USB
or Ethernet port on the host computer (Figure 3.2). HSMs are designed
in such a manner that there is no way to export keys from the HSM in a
usable format. In fact, the only way to steal the keys is to steal the
HSM itself from a highly secure location.

435
Figure 3.2 The YubiHSM, a hardware security module designed to
plug into a USB port on almost any type of computer.
The Benefits and Limitations of
Hashing
• Hashing can:

 confirm data has not been changed since the hash was generated
 obscure passwords from casual inspection.

• Hashing cannot:

 confirm that data has never been changed


 guarantee the confidentiality of data
 authenticate the creator or sender of data.

436
Ciphers and Keys: An Introduction
to Encryption
• Encryption is a field of mathematics concerned with obscuring
information from unwanted viewers in such a way that the original
information can be recovered later. Machine encryption systems
originated during the early twentieth century, including the famous
Enigma codes of the Second World War. For most of history, encryption
was time-consuming, expensive and largely restricted to governments
and businesses.

• The development of the computer, in part to break sophisticated wartime
German and Japanese codes, spurred the development of yet more
complex means of encryption. Computers could perform:
 the mathematical operations that underpin all cryptography
 much more complex mathematics than could be reasonably
expected of a human
 much faster than a human …
 … on much more data than a human could handle.
Ciphers and Keys: An Introduction
to Encryption
• Computer encryption algorithms work on binary data, so any data that
can be represented in binary can be encrypted. It is not an exaggeration to
say that encryption makes much of the modern world possible.
• Some commonplace applications for cryptography include:
 secure banking and payments systems – cryptography safeguards
your money, whether it is sitting in an account, being transferred
between accounts, issued at an ATM or used to shop online
 protecting conversations made over mobile phones and online
telecoms applications such as Skype and WhatsApp
 safeguarding wireless networks
 authenticating data (as seen in Section 3.1)
 securing files stored on hard disks and memory sticks
 authenticating electronic documents
 electronic voting
 preventing piracy of media files, including games, music and movies
 and so on.
Some Terminology
Before going further, it is necessary to introduce some specialized
terminology relevant to cryptography that we will use for the
remainder of this module.

 Plaintext is information that can be directly read by humans or
a machine.
 Ciphertext is the encrypted data.
 A key is a piece of data that determines the value of the
ciphertext when plaintext is encrypted (and vice versa).
 A cipher is the algorithm responsible for turning plaintext into
ciphertext, and for restoring ciphertext to plaintext, using one
or more keys.
 Encryption is the process of converting plaintext to ciphertext.
 Decryption is the process of reverting ciphertext to plaintext
(occasionally ‘decipherment’).
Computer Encryption Keys
• Keys are the second input to an encryption algorithm alongside
the plaintext itself. (For decryption, the key and the ciphertext are
inputs and the plaintext is the output.)
• Different keys allow a single encryption algorithm to produce an
almost limitless number of different outputs.

• An encryption key is a string of bits. The longer the string (the key
length), the greater the number of possible keys. For a key length
of n bits, there are 2^n possible keys (see Table 3.2).

Key length (bits)   Number of keys   Key values
1                   2^1 (2)          0, 1
2                   2^2 (4)          00, 01, 10, 11
3                   2^3 (8)          000, 001, 010, 011, 100, 101, 110, 111
10                  2^10 (1024)      0000000000, 0000000001, 0000000010, …

Table 3.2 The number of possible keys available with differing key lengths
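Table 3.2 can be reproduced with a few lines of Python. This is a minimal sketch; key_count and all_keys are illustrative names.

```python
from itertools import product

def key_count(n: int) -> int:
    """Number of different keys that are n bits long."""
    return 2 ** n

def all_keys(n: int) -> list:
    """Every n-bit key, written as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in (1, 2, 3, 10):
    print(n, key_count(n), all_keys(n)[:3])
```

Each additional bit doubles the number of possible keys.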
The Problem With Short Keys

• Short keys are vulnerable to brute-force attacks, where one or
more computers attempt to decrypt ciphertext by testing every
possible key until they produce recognizable plaintext.

• Testing a million keys per second may sound fast, but this can
easily be achieved by a modest PC. Therefore, keys must be
sufficiently long that they offer a very large number of possible
values. Keys often have lengths of 128, 1,024 or 2,048 bits, giving
2^128, 2^1024 or 2^2048 possible key values. These unimaginably
large numbers render brute-force attacks useless.

• Encryption that is resistant to brute-force attacks and whose
algorithm has no known weaknesses is known as strong
encryption.
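A rough calculation shows why key length matters. The sketch below assumes the rate of one million keys per second quoted above and a worst case in which every key must be tested; the key lengths chosen are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
KEYS_PER_SECOND = 1_000_000  # the rate quoted above for a modest PC

def years_to_test_all_keys(key_length_bits: int) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return 2 ** key_length_bits / KEYS_PER_SECOND / SECONDS_PER_YEAR

for bits in (10, 40, 56, 128):
    print(f"{bits:>4}-bit key: {years_to_test_all_keys(bits):.3g} years")
```

At this rate a 56-bit key space takes a little over two thousand years to exhaust, whereas a 128-bit key space takes around 10^25 years.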
441
Session Keys
• The final type of key listed above is called a session key (or
sometimes a content encryption key or traffic encryption key).
Session keys can offer greater levels of protection than other
forms of encryption:

 New keys are generated for each exchange of data. In the unlikely
event that a session key is broken by an attacker, later exchanges are
protected by different keys.
 Issuing new keys prevents attackers recovering plaintext by
exploiting any similarities between ciphertexts when a single key is
reused on multiple pieces of plaintext.
 Keys are deleted at the end of a session; they cannot be stolen by
hacking or theft of the computer.

• We will revisit session keys later when we see how data is
encrypted on the internet.
Symmetric Encryption

• Highly secure symmetric encryption can be performed at very high
speeds even on modest computer equipment. For this reason,
most encrypted data sent over networks uses one of a relatively
small number of symmetric algorithms.

443
The Data Encryption Standard
(DES)
• The Data Encryption Standard (DES) uses 56-bit keys. The key length
originally proposed by IBM was 64 bits, which was then reduced to 56 bits.

• DES breaks plaintext into 64-bit blocks, each of which is divided
into two 32-bit halves. One half is scrambled using an algorithm (the
F-function) which stretches, mixes and substitutes bits within the
32-bit half. The two halves are recombined, then swapped and the
process repeated. This is repeated sixteen times to produce the
final DES ciphertext. Decryption of DES ciphertext is performed by
reversing the process using the same key.
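The split, scramble and swap structure described above can be sketched with a toy cipher. This is not DES: the mixing function and the way round keys are derived are stand-ins built from SHA-256 purely for illustration, and all function names are hypothetical. What it does share with DES is the 64-bit block, the two 32-bit halves, the sixteen rounds and the fact that decryption runs the same process in reverse with the same key.

```python
import hashlib

ROUNDS = 16
MASK32 = 0xFFFFFFFF

def round_keys(key: bytes) -> list:
    """Derive one subkey per round (a toy key schedule, not the DES one)."""
    return [hashlib.sha256(key + bytes([i])).digest() for i in range(ROUNDS)]

def f_function(half: int, subkey: bytes) -> int:
    """Toy mixing function standing in for the DES F-function."""
    digest = hashlib.sha256(half.to_bytes(4, "big") + subkey).digest()
    return int.from_bytes(digest[:4], "big")

def run_rounds(block: int, subkeys: list) -> int:
    left, right = block >> 32, block & MASK32
    for subkey in subkeys:
        # Scramble one half, mix it into the other half, then swap the halves.
        left, right = right, left ^ f_function(right, subkey)
    # Undo the final swap so that the same routine can also decrypt.
    return (right << 32) | left

def encrypt(block: int, key: bytes) -> int:
    return run_rounds(block, round_keys(key))

def decrypt(block: int, key: bytes) -> int:
    # Same process, same key, with the round keys used in reverse order.
    return run_rounds(block, round_keys(key)[::-1])

plaintext_block = 0x0123456789ABCDEF
ciphertext_block = encrypt(plaintext_block, b"secret key")
print(hex(ciphertext_block))
print(hex(decrypt(ciphertext_block, b"secret key")))  # 0x123456789abcdef
```

Note that the F-function never has to be run backwards: mixing the same scrambled value into a half a second time cancels it out.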

444
The Stopgap: Triple DES
• From 1999 onwards, the US government recommended users of
DES moved to so-called Triple DES (3DES) encryption. Rather than
a new form of encryption, 3DES applies the DES algorithm three
times to each of the plaintext blocks. 3DES is more secure than
DES because it uses a key-bundle usually containing two –
occasionally three – DES keys, giving a key size of either 112 or 168
bits.

• Most implementations of 3DES use two keys to perform three
passes of encryption:

1. the first pass uses the first key in the bundle
2. the second pass re-encrypts the output of the first pass using the
second key
3. the third pass re-encrypts the output of the second pass reusing the
first key.
The Stopgap: Triple DES
• (Less frequently, a third key is used for the third pass.)

• 3DES proved to be a relatively simple way of increasing data
security. It increased key size without requiring developers to
create a new algorithm and prove its security. 3DES quickly
became a global standard and is still found in applications as
diverse as smart cards for public transport and utilities, ‘chip and
PIN’ bank cards and protecting user data in Microsoft Outlook.
Research suggests that 3DES using a three-key bundle will remain
secure against brute force until 2030, by which time advances in
computer processing power will finally make it vulnerable.

446
The replacement: the Advanced
Encryption Standard (AES)
• The US Department of Commerce began replacing DES in 1997 by
soliciting expressions of interest from cryptographers to work
alongside the government in developing a new encryption
standard, unimaginatively called the Advanced Encryption
Standard (AES).

447
Meet Alice and Bob
• From now on, we are going to follow a pair of fictional characters
known as Alice and Bob: two people struggling to have a secret
conversation. Alice and Bob, occasionally joined by further
characters, were created by the cryptographer Ron Rivest in 1978
to explain cryptographic principles. A third character in this story is
the eavesdropper Eve, who desperately wants to know what Alice
and Bob are saying.

• Until relatively recently, symmetric encryption was thought to be
the only way of encrypting data. Before encrypted data could be
exchanged, a symmetric key had to be generated and
shared between Alice and Bob. This creates two related problems:

448
Meet Alice and Bob
 Alice and Bob could meet, generate the key and each leave with a
copy. This might be inconvenient or even dangerous if Eve became
aware of the meeting. Alternatively:
 Either Alice or Bob would generate two copies of the symmetric key.
They would keep one copy and send the other to the other person. Not
only must Alice and Bob trust one another, but one copy of the key
could be lost, stolen or copied by Eve while it is in transit.

• Together, these shortcomings are known as the apparently insoluble
key distribution problem.

• However, between 1969 and 1976, at least three groups of
mathematicians independently discovered how to make the key
distribution problem irrelevant.

• They named their algorithm ‘non-secret encryption’, but it is now called
asymmetric cryptography or public-key cryptography.
How Asymmetric Cryptography
Works?
• Asymmetric cryptography sidesteps the key distribution problem
by having each user create two keys:

1. the private key that the key owner must keep safe and never
distribute.
2. the public key which can be sent to anyone with whom they
want to exchange encrypted information.

• Together, the keys are known as a key pair. Unlike symmetric
encryption, where a single key performs both encryption and
decryption, each asymmetric key has a different purpose:

 The private key is the only key that can decrypt files encrypted
with the corresponding public key.
 The public key is the only key that can decrypt ciphertext
encrypted using the corresponding private key.
How Asymmetric Cryptography
Works?
• The value of one key in a pair cannot easily be determined from
the other. Even if Alice’s public key falls into Eve’s hands, Eve can’t
recreate Alice’s private key. Therefore, the public key can be just
that – public. Public keys can be distributed by insecure methods,
such as email or by posting them to internet public key chain
servers.

• Anyone wanting to send an encrypted message to Alice uses a
copy of her public key to secure the message. The encrypted
message can only be decrypted using Alice’s private key, which she
never shares.

451
Exchanging Secrets Using
Asymmetric Cryptography
• Consider the following scenario:

1. Alice will encrypt a message to Bob using public-key
cryptography. She first needs a copy of Bob’s public key. Alice
can either ask Bob to attach his key to an email, or she can
request a copy of Bob’s public key from a public key chain
server located on the internet.
2. After composing the message, Alice encrypts the plaintext
using her copy of Bob’s public key and sends him the
ciphertext.
3. When Bob receives the ciphertext, he uses his private key,
which has remained safe in his care, to decrypt the ciphertext
and recover the original plaintext.
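The scenario above can be sketched with a toy version of RSA, one well-known asymmetric algorithm (the slides do not name a specific one, so this choice is an assumption). The primes here are tiny and there is no padding, so this is for illustration only; real keys are thousands of bits long. The three-argument pow with a negative exponent needs Python 3.8 or later.

```python
# Bob's key pair (toy sizes).
p, q = 61, 53                        # two secret primes
n = p * q                            # 3233, part of both keys
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

bob_public = (e, n)    # Bob can give this to anyone, including Eve
bob_private = (d, n)   # Bob never shares this

def encrypt(message: int, public_key) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, private_key) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

plaintext = 65                                 # Alice's message, as a number below n
ciphertext = encrypt(plaintext, bob_public)    # step 2: Alice uses Bob's public key
recovered = decrypt(ciphertext, bob_private)   # step 3: Bob uses his private key
print(ciphertext, recovered)
```

Eve may see both the public key and the ciphertext, but without d she cannot reverse the encryption except by factorising n, which is infeasible for real key sizes.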
452
Asymmetric Key Strength

• Unlike symmetric keys, which are rarely longer than 256 bits,
asymmetric keys are typically very large – usually 1,024, 2,048 or
4,096 bits long. Despite their greater length, differences in the
underlying mathematics mean asymmetric keys are not
appreciably more secure than much shorter symmetric keys.
Whilst we can say a 4,096-bit asymmetric key is more secure than
a 1,024-bit asymmetric key, it is much harder to judge its security
relative to symmetric keys.

453
Asymmetric Versus Symmetric Encryption
It is tempting to think that asymmetric encryption’s ability to avoid the key
distribution problem means it can entirely replace symmetric encryption.
In fact, almost all encryption is performed using symmetric encryption
for the following reasons:
1. Symmetric encryption is fast.
Most modern CPUs can perform one or more forms of symmetric
encryption in hardware. Symmetric encryption can also be performed
in software at very high speed, even on modest computers.
2. It uses small keys.
Generating and using symmetric keys is relatively quick compared to
creating and using the much larger asymmetric keys.
3. It is well-suited to encrypting any amount of data.
Unlike asymmetric encryption, symmetric encryption can encrypt data even
if the final file size is unknown – such as encrypting an internet telephone call
whose length is not necessarily known at the outset.
• Rather than one form of encryption being ‘better’ than the other,
the two forms of encryption complement one another.
Using Asymmetric Cryptography to
Authenticate Data
• Asymmetric cryptography not only protects data, but it can also be
used to uniquely identify the author of a piece of data. Asymmetric
cryptography allows creators to ‘sign’ their data using the unique
properties of asymmetric keys.

• A public key can only decrypt ciphertext encrypted with the
corresponding private key, so whoever created the ciphertext must
hold the private key. The data has been ‘signed’ by the holder of
the private key.

• In practice, because asymmetric cryptography is computationally
expensive and time-consuming, it is normal to encrypt the
relatively small hash of a document, rather than the document
itself. The encrypted hash is called a digital signature.
A Simple Digital Signature
• Imagine Alice wants to send a confidential business document to Bob. Both
Alice and Bob need to be confident Eve has not tampered with the document
en route:

1. Alice hashes the document and encrypts the hash using her private key
to produce a digital signature.

2. Alice attaches the digital signature and the document to her email to
Bob.

3. Bob decrypts the digital signature using Alice’s corresponding public key,
revealing the hash.
4. Bob uses the same hashing algorithm as Alice to hash his copy of the
document. He then compares his hash with that from the signature.

5. If the two hashes are identical, then both Bob and Alice can be confident
that the document has not changed in transit.
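The five steps above can be sketched with toy RSA keys and SHA-256. This is an illustration under stated assumptions, not a production signature scheme: there is no padding, the key is only about 150 bits long, and the hash is reduced modulo n so that it fits the toy key. The primes are two known Mersenne primes, and sign, verify and hash_as_int are hypothetical helper names.

```python
import hashlib

# Alice's toy key pair, built from two known Mersenne primes.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                            # Alice's public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # Alice's private exponent (Python 3.8+)

def hash_as_int(document: bytes) -> int:
    """Hash the document and reduce the result so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % n

def sign(document: bytes) -> int:
    """Step 1: Alice encrypts the hash with her private key."""
    return pow(hash_as_int(document), d, n)

def verify(document: bytes, signature: int) -> bool:
    """Steps 3 to 5: Bob decrypts the signature with Alice's public key
    and compares the result with his own hash of the document."""
    return pow(signature, e, n) == hash_as_int(document)

document = b"Pay Bob 100 pounds"
signature = sign(document)

print(verify(document, signature))                # True
print(verify(b"Pay Eve 100 pounds", signature))   # False
```

Eve can change the document, but she cannot produce a matching signature without Alice's private key.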
456
A Simple Digital Signature
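[Figure: Alice holds her private key K̄A and Bob’s public key KB; Bob holds
his private key K̄B and Alice’s public key KA. Alice sends Bob {H(M)}K̄A (the
hash of M encrypted with her private key) together with {M}KB (the message M
encrypted with Bob’s public key). Bob decrypts both, hashes M himself and
compares the result with the H(M) recovered from the signature. If the two
match, Alice sent the message.]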
A Simple Digital Signature

[Figure: Eve intercepts {H(M)}K̄A, {M}KB and forwards {H(M)}K̄A, {M′}KB to Bob
instead. Bob decrypts both and finds that H(M) ≠ H(M′), so M′ did not
originate from Alice.]

• But why has Eve not simply replaced {H(M)}K̄A with {H(M′)}K̄A?
Digital Certificates (public-key
Certificates)
• Eve’s deception succeeded because there was no way for Alice to
determine if the key came from Bob, or, as it turned out, was fake.
Eve’s scheme would fail if genuine keys were authenticated by a
trusted third party, the Certification Authority.

• A Certification Authority (CA) acts as a trusted third party with
the role of issuing digital certificates that bind individuals’
identities to their public keys.

459
Digital Certificates (public-key
Certificates)
A digital certificate will typically include:
• A copy of the public key
• Information about the owner of the key: the owner’s name,
etc.
• Information about the digital certificate: a serial number,
expiry date, etc.
• Information about the CA itself: CA name, its own digital
signature, etc.

460
Secure Web Connections
• Web traffic is not encrypted by default; instead, web pages are
transmitted as plaintext and can be intercepted. Obviously, this
lack of security was a problem to the pioneering online shopping
companies. Some of the first online shops allowed customers to
browse online catalogues but only accepted telephone payments –
which were probably just as insecure.

TLS/SSL: Transport Layer Security/Secure Sockets Layer

• TLS/SSL is used in the majority of web browsers and forms the basis
of the HTTPS protocol.
• The services TLS/SSL provides are:
 data encryption
 client authentication using a username and password, a username
and token, or a digital certificate
 server authentication
 data integrity.
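As an illustration of server authentication, the sketch below uses Python's standard ssl module to build the client-side settings that an HTTPS connection would use. It makes no network connection; it only shows that the default settings require the server to present a digital certificate that matches its hostname. Setting a minimum protocol version is an optional extra chosen for this example.

```python
import ssl

# Default client-side settings, loading the CA certificates trusted by the system.
context = ssl.create_default_context()

# The server must present a valid certificate ...
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# ... and the certificate must match the hostname being contacted.
print(context.check_hostname)                    # True

# Optionally refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```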
Summary
• This part concentrated on the principal technologies that allow us
to securely exchange information over an insecure network. We
began by revisiting hashing, a technology first introduced as a way
of protecting passwords from attackers. However, even hashing
cannot guarantee password security, so we discussed improving
password security through the concept of salting and by
encrypting password data.

• Following on from hashing, we studied encryption concepts and
several different encryption algorithms which can be broadly
divided into symmetric and asymmetric technologies. Important
ideas, such as the requirement to have sufficiently long keys to
defeat brute-force attacks and the key distribution problem, were
all discussed.

462
