Information Technology 2
Meeting #1
Block 1 (Part 1 )
Binary data representation and computation
OU Materials, PPT prepared by Dr. Khaled Suwais
Edited by Dr. Ahmad Mikati
Contents
• Introduction
• 1.1 Representing integers and text in binary
• 1.2 Decimal numbers and some limitations of binary representations
• 1.3 Representing logic operations and logic circuits
• Summary
Introduction
1.1 Representing integers and text in
binary
• The printed symbols shown in Figure 1.1 provide
convenient representations of short and long flashes of
light or short and long bleeps of sound, which is how
Morse code is normally transmitted.
1.1 Representing integers and text in
binary
• In any representation it is important that the symbols can
be distinguished from each other.
• Changes in electrical voltages or friction in mechanical
systems can cause random fluctuations, called noise,
which may distort how the symbol is perceived.
• In a binary system there are only two symbols, so it is
generally easier to make them different enough to be
distinguishable – for example, Morse code specifies that a
dash should be three times as long as a dot.
Binary representation systems in
computers
• Some very early computers, such as the ENIAC
(Electronic Numerical Integrator and Computer), tried to
represent data using our usual base-10 system. So 0 volts
was used to represent the digit 0, 1 volt to represent the
digit 1, and so on, all the way up to 9 volts to represent
the digit 9.
• Hence, a lot of circuitry was needed just in order to
distinguish between the different voltages, which took up
a lot of space and generated a lot of heat.
• The advantage of representing data in binary is that only
two ranges of voltage need to be detected.
Converting numbers from binary to
decimal notation
To convert a number from binary to decimal notation, we put
the number in the table and add up the values of each place
value.
So to convert the binary number 1001 into decimal notation,
we can use the following table
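The place-value method described above can be sketched in Python. This is a minimal illustration rather than part of the original material; the function name binary_to_decimal is an assumption chosen for this example.

```python
def binary_to_decimal(bits):
    """Add up the place values (..., 8, 4, 2, 1) of every 1 bit in a binary string."""
    total = 0
    place_value = 1  # the rightmost bit is worth 2**0 = 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2  # each place to the left is worth twice as much
    return total


print(binary_to_decimal("1001"))  # 8 + 0 + 0 + 1 = 9
```

Python's built-in int("1001", 2) performs the same conversion and can be used to check the result.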
Converting numbers from decimal to
binary notation
Representing integers in binary
Unsigned integers
Adding unsigned integers in binary
notation
Sign-magnitude representation
Sign-magnitude is the most intuitive method for representing
signed numbers.
The MSB (Most Significant Bit) of a binary number is reserved as the
sign of the number:
MSB = 1: negative number
MSB = 0: positive number
The remaining bits represent the magnitude (or absolute value) of
the numeric value.
• So for our 3 bits, there would be 2^3 = 8 possible binary codes, which
could be used to encode positive and negative integers as shown in
Table 1.7.
Sign-magnitude representation
In an N-bit sign-magnitude system:
1 bit (the MSB) is used for the sign of the number.
N-1 bits are used for the magnitude of the number.
The largest integer is 2^(N-1) - 1.
The smallest integer is -(2^(N-1) - 1).
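The sign-magnitude rules above can be sketched in Python. This is an illustrative sketch only; the function names to_sign_magnitude and from_sign_magnitude are assumptions made for this example.

```python
def to_sign_magnitude(value, n_bits):
    """Encode an integer as an n_bits-wide sign-magnitude bit string."""
    limit = 2 ** (n_bits - 1) - 1  # largest magnitude that fits in N-1 bits
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    sign_bit = "1" if value < 0 else "0"  # MSB = 1 means negative
    magnitude = format(abs(value), f"0{n_bits - 1}b")  # zero-padded binary
    return sign_bit + magnitude


def from_sign_magnitude(bits):
    """Decode a sign-magnitude bit string back into an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(to_sign_magnitude(3, 3))     # 011
print(to_sign_magnitude(-3, 3))    # 111
print(from_sign_magnitude("101"))  # -1
```

With 3 bits the representable range is -3 to +3, and both 000 and 100 decode to zero, which is why only seven distinct integers are available from the eight codes.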
• Since 2007, the standard encoding system for characters has been
Unicode Transformation Format-8 (UTF-8), which uses a variable
number of bytes (up to 4 in the current standard) to encode characters
in use across the world. However, in order to maintain backward
compatibility, the original 128 ASCII codes are preserved in UTF-8.
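The variable-length nature of UTF-8 can be observed with Python's built-in str.encode method. The helper name utf8_bytes and the three sample characters are assumptions chosen for illustration.

```python
def utf8_bytes(ch):
    """Return the list of byte values UTF-8 uses for one character."""
    return list(ch.encode("utf-8"))


for ch in ["A", "é", "€"]:
    print(ch, utf8_bytes(ch))
# A uses one byte (its ASCII code, 65); é uses two bytes; € uses three bytes
```

The ASCII character keeps its single-byte code, which is the backward compatibility mentioned above.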
Floating-point numbers and scientific
notation
• Consider the decimal number 2343.56. We could also write this
as 23.4356 × 10^2 or 0.234356 × 10^4 or 234356.0 × 10^-2.
• The decimal point can ‘float’ to any position as long as the
power of 10 is appropriate.
• Scientific notation is a special case of floating-point notation
where there is a single non-zero digit between 1 and 9
(inclusive) to the left of the decimal point.
• So the number 2343.56 can be represented in scientific
notation as 2.34356 × 10^3. Note that the exponent, 3, indicates
that the decimal point should be moved three places to the
right to get back the original decimal notation.
Floating-point numbers and scientific
notation
• The number -0.000654 in decimal notation can be
written as -6.54 × 10^-4 in scientific notation. Here, the
negative exponent (-4) indicates that the decimal point
should be moved 4 places to the left to get back to the
original decimal notation.
• Notice that scientific notation has three distinct parts,
shown in Figure 1.10:
• a sign
• an exponent (the power of 10)
• a mantissa (the decimal number part).
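The three parts of scientific notation can be pulled out of a number using Python's "e" format specifier. This is a minimal sketch; the function name scientific_parts is an assumption made for this example.

```python
def scientific_parts(x):
    """Split a number into (sign, mantissa, exponent) as in scientific notation."""
    text = f"{x:e}"  # for example '-6.540000e-04'
    mantissa_text, exponent_text = text.split("e")
    sign = "-" if x < 0 else "+"
    mantissa = abs(float(mantissa_text))  # single non-zero digit before the point
    exponent = int(exponent_text)         # the power of 10
    return sign, mantissa, exponent


print(scientific_parts(2343.56))    # ('+', 2.34356, 3)
print(scientific_parts(-0.000654))  # ('-', 6.54, -4)
```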
1.3 Representing logic operations and
logic circuits
• In the previous two sections, we have seen that we can use binary
encodings to represent numerical and textual data. We will now see
that operations, including arithmetical operations such as addition,
and comparison operations such as less than and equals, can be
encoded as one or more logic operations. These logic operations act
on the binary representations of the data.
• To move from the human to the computer view, the integers have to
be encoded as binary representations and the addition operator has
to be encoded as a sequence of logic operations, each of which is
defined by what is called a truth table.
1.3 Representing logic operations and
logic circuits
• A truth table for a logic operation lists all the possible
combinations of input values, and for each possibility gives the
output value for that operation. As the operations we will
consider will always be applied to binary encodings, each input
value must be either a 1 or a 0 and the result of the operation
must also always be a 1 or a 0.
• We will start by looking at the truth tables for three of the
fundamental logical operations defined by Boole. We will then
see how these basic operations can be used as building blocks
for the logic circuits that perform more complex operations.
By the end of this subsection, you will see how these simple
operations can be used to build a logic circuit to add two binary
numbers.
1.3 Representing logic operations and
logic circuits
The NOT operation
• One of the most fundamental operations we might want to perform is
to ‘flip’ a single bit – let’s call the bit x. So if x is 1, we want the result to
be 0, and if x is 0, we want the result to be 1. This operation is called
NOT x and is written x̄ or x’.
The behaviour of the NOT operator is characterised by the truth table
shown below:
To physically perform logic operations on
binary data in a computer, we need to use
electrical components. The components
that represent the most fundamental
operations are called logic gates, which
can be combined in a logic circuit in order
to create more complex operations.
The NOT truth table
1.3 Representing logic operations and
logic circuits
The OR operation
• The truth table for the logic operation OR (which Boole originally
designated by the symbol +) is shown below.
OR logic gate
OR truth table
Building logic circuits
Building logic circuits
• To translate this into a logic expression – that is a combination of
our logic operations (NOT, AND and OR) – we follow this algorithm.
• Identify the row where the outcome (B > A) is 1.
• If input A is 1, write A; otherwise write NOT A in the logic expression for
the selected row.
• If input B is 1, write B; otherwise write NOT B in the logic expression for
the selected row.
• Join these with an AND.
• The final expression is the logical sum (OR) of the expressions deduced for every row whose outcome is 1.
This algorithm yields the answer given
• Here, the resulting logic expression NOT A AND B tells us that the
logic circuit that is equivalent to this truth table for each combination
of inputs can be constructed from two logic gates (a NOT gate and an AND gate).
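The derived expression can be checked against the comparison B > A for every combination of inputs. The sketch below is illustrative; the function name b_greater_than_a is an assumption, and Python's not and and operators stand in for the NOT and AND gates.

```python
from itertools import product


def b_greater_than_a(a, b):
    """Logic expression derived from the truth table: (NOT A) AND B."""
    return int((not a) and b)


# Print the full truth table and confirm it matches the comparison B > A
for a, b in product([0, 1], repeat=2):
    assert b_greater_than_a(a, b) == int(b > a)
    print(a, b, b_greater_than_a(a, b))
```

Only the row A = 0, B = 1 produces 1, which is the single row the algorithm selected.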
1.3.4 What is inside a logic gate?
• How are logic gates actually constructed, and what exactly is
inside a logic gate?
• A logic gate is itself made up of a combination of more
fundamental components that act as on/off switches.
• In early computers, such devices were generally based on
various designs of vacuum tube (collectively called valves).
• In modern computers, they are based on transistors, which are
formed of layers of semiconducting material such as silicon.
A ‘pluggable’ unit made of valves from an IBM computer of the mid-1950s
A chip containing six inverters
TM112: Introduction to Computing and
Information Technology 2
Meeting #2
Block 1 (Part 3)
Hardware and Software Concept
OU Materials, PPT prepared by Dr. Khaled Suwais
Edited by Dr. Ahmad Mikati
Contents
• Introduction
• 3.1 The processor
• 3.2 Storing and moving data and instructions
• 3.3 Peripherals and pulling it all together
• 3.4 Instructing the processor
• 3.5 Programmers, programming and
programs
• Summary
Introduction
You will learn the answers to questions such as the following.
• How does a data bottleneck occur in a computer and how can it be
avoided?
• How can I melt my computer?
• What are those strange strings of symbols when I get the ‘blue screen
of death’ on my Windows machine?
• How can a sip and a puff help a person with disabilities interact with a
computer?
• How do computers and programmers pull themselves up by their
bootstraps?
• Do you do RISC?
• When is hardware not required for a computer?
3.1 The processor
3.1 The processor
• The arithmetic and logic unit (ALU) and the floating-
point unit (FPU) are at the heart of the processor, as these
are the places where the data is actually manipulated.
ALU
• Contains electronic circuits that perform binary arithmetic, such
as addition, subtraction, multiplication and division on integers.
• Contains circuits to perform logical operations, such as
comparing integers with zero, testing two integers for equality,
testing if one integer is greater than another, etc.
FPU
• It is a common part of most modern processors.
• Its function is very similar to that of the ALU, but it operates on
floating-point numbers using specialised circuitry optimised to
be as efficient as possible when working with floating-point
representations.
3.1.2 Registers and cache memory
• Main memory is a storage area that contains program
instructions and data.
• When a program is first loaded, the corresponding instructions
and data are put into main memory, which is outside the
processor.
• Each instruction and piece of data is held in a ‘chunk’
called a word.
• A word has a fixed size (usually 32 or 64 bits in a modern
computer), and it is handled as a unit by the hardware of the
processor.
3.1.2 Registers and cache memory
3.1.2 Registers and cache memory
In modern processors, there may be several levels of cache
memory.
• Level 1 cache is the fastest (and smallest), and the aim is to use
this for the data and instructions that will imminently be
transferred to the registers.
• Level 2 cache is a larger but slower cache memory.
• There may be two more levels of cache below Level 2, each with
more capacity but slower speed.
3.1.2 Registers and cache memory
• There are several different types of registers in different
parts of the processor, and each is designed to hold a
particular type of information for a specific function.
• The accumulator is a register within the ALU where an
actual calculation takes place.
• The status register, sometimes called the flags register,
holds further information about the last operation
executed. Each bit in the register represents some
description of the result – is the result zero? Is the result
negative? Is the result too big to be stored in the
accumulator? And so on.
3.1.3 The control unit and other
registers
• The control unit has the role of coordinating the
movement of data and instructions within the processor.
It does this by sending out electrical pulses, called control signals,
that activate the necessary connections between main memory,
cache, registers, ALU and FPU, as required, to execute the
instruction.
• The address register holds the memory address of the
next instruction to be executed.
3.1.4 Multi-core processors
• A multi-core processor is a single chip that contains two or
more independent processors called cores.
• Each core performs the usual functions of loading data and
instructions into registers and performing arithmetic manipulations or
floating-point manipulations, but instructions can be shared between
each of the cores and run at the same time, increasing the overall
speed of programs.
• You may think that four cores all working simultaneously would
make a program run four times as fast. However, this is far from
being the case, for several reasons.
• Firstly, each core requires its share of the data and instructions to be
moved from the shared main memory into cache memory, and from
there into its registers.
• Each core may have its own Level 1 cache memory, but often the
other levels of cache memory are shared between them. This can lead
to delays while the cores wait for data and instructions to be
transferred.
3.1.4 Multi-core processors
• In order to take advantage of multiple cores, the program has to
be written in such a way that a task can be split up into
independent sub-tasks, each of which can be completed by a
core, and then, if necessary, reassembled into a final solution.
This process is called threading, with each of the independent
tasks being coordinated by a separate thread.
A multi-core processor where each core is processing a separate thread. (L1, Level 1; L2, Level 2.)
3.2 Storing and moving data and instructions
(Main Memory)
• Main memory is where the instructions, and the data they act on, are
loaded from when a program is executed.
• It is volatile memory, which means that its content is lost when the
power is switched off.
• Each byte in main memory is numbered in sequence, so that it has a
unique memory address.
• In main memory, every memory address can be directly accessed,
which is why this type of memory is referred to as random-access
memory (RAM).
• Most forms of memory today are random access, but for historical
reasons we still tend to reserve the acronym RAM for main memory.
• An advantage of any form of random-access memory is that accessing
any location in memory takes the same amount of time, regardless of
whether it is stored at a location with a low or a high memory address.
3.2.3 Buses and clocks
• The wiring that connects the various internal and external
components of a computer is known as a bus. Internal
buses inside the processor connect the various registers
and cache memory together.
• The control bus: this bus carries the control signals
between the processor and main memory (and other
parts of the computer system).
• The address bus: this bus carries the addresses of
memory locations to be accessed.
• The data bus: this bus transfers data from place to
place.
3.2.3 Buses and clocks
3.2.4 The operating system
3.2.4 The operating system
Some of the functions that the operating system provides are as follows:
• Provision of a user interface:
• It provides us with a means of inputting data and instructions, and displaying output in
a form that users can understand.
• Management of multiple programs:
• The operating system supports hardware designed to enable the processor to switch
between different executing programs in order to multitask.
• Management of memory:
• It is the job of the operating system to allocate appropriately sized areas of memory to
each executing program, and to ensure that program instructions and data do not
interfere with each other or with data and instructions of other programs.
• Coordination and control of peripheral devices:
• In order to carry out its tasks, a computer will need to communicate with one or more
peripheral devices. For example, it may wish to receive data from the keyboard or
mouse, read from a file on a disk, send output to the monitor or printer, connect to a
network, and so on.
3.3.2 Secondary memory
• It is also the case that the registers are built directly into the processor,
so there are usually a fixed number of them – typically fewer than 50.
3.5 Programmers, programming and
programs
• An assembly language is a programming language that uses human-
readable symbolic instructions and symbolic addresses that translate
into machine language instructions on a one-to-one basis.
• A program written in assembly language has the ability to directly
access all the features and instructions available on the processor it is
designed for.
• Whenever a program is written in a language other than machine
language, the instructions in the original program (called the source
code) need to be converted into equivalent machine language
instructions.
3.5 Programmers, programming and
programs
• The task of converting the source code into machine language is carried
out by special programs called translators.
• When the source code is in assembly language, the program that does
this translation into machine code is called an assembler.
• An assembler takes an assembly language program and generates an
equivalent program in machine language, which can then be loaded into
memory and executed. Since each processor family has a different
machine language, and therefore a different assembly language, they
each require a different assembler.
3.5 Programmers, programming and
programs
• It would be exceptionally tedious (not to mention error-prone) to have
to deal with computer programs by writing in low-level languages and
writing code specifically for each family of processors, so modern
computing is not done in this way.
• Instead, high-level programming languages are used, in which each
instruction in the high-level language is translated into many
instructions in the machine language of the processor on which it is to
be executed.
• High-level programming languages include Python, JavaScript, Java,
C++, Smalltalk, Scratch and a whole range of application-specific
languages that attempt to make the process of writing programs easier
for the human involved.
3.5 Programmers, programming and
programs
• In compilation, the program written in the high-level
language, called the source code or source program, is
used as the input to a translator program called a compiler.
• The compiler translates the entire source program into the
machine language understood by the processor; this
translation is referred to as the object code or object
program.
• The object code is then saved, and it is this machine
language program that is loaded into memory and executed
when the program is executed.
• Languages such as C, C++, and Visual Basic are designed to
be compiled.
3.5 Programmers, programming and
programs
• Whereas a compiler translates all the source code in one go, an
interpreter translates each instruction in the source code only
when it is required for that instruction to be executed.
• There is never a complete translation of the whole of the source
code into machine language, and so no object code program is
generated.
• The advantage of an interpreted language is that the potentially
lengthy process of compilation does not need to be gone through
for each small change in the source code.
• The main disadvantage is that the translation process must take
place every time a program is executed, resulting in a slower
execution of the program. As with compilers, each processor family
needs a different interpreter.
• Languages such as JavaScript, Perl and Basic are designed to be
interpreted.
3.5 Programmers, programming and
programs
• Virtualisation is a term used to describe any configuration
where a physical computer system is emulated using software.
• Using a virtual machine to interpret bytecode as we described above
is just one example, but there are many different kinds of
virtualisation. For example, if you use a Mac, you might have a virtual
machine on your computer that allows you to also run an emulated
Windows platform.
• Cloud computing relies on virtual machines sitting on top of
remote servers, allowing the server’s processing and storage
capacity to be shared between several users by using a software
layer called a hypervisor to act as an intermediary between
multiple ‘guest’ operating systems and the host operating
system that directly interacts with the hardware.
Summary
• In this part, you have learned how the main components
of a computer work together to execute a program.
TM112: Introduction to Computing and
Information Technology 2
Meeting #3
Block 1 (Part 2- Intro)
Introduction to Python
Collected by Dr. Ahmad Mikati
Why Python?
Python is object-oriented
• Supports concepts such as polymorphism, operator overloading, and multiple inheritance
It's free (open source)
• Downloading and installing Python is free and easy
• Source code is easily accessible
• Free doesn't mean unsupported! Online Python community is huge
It's portable
• Python runs on virtually all major platforms used today
• As long as you have a compatible Python interpreter installed, Python programs will run in exactly
the same manner, irrespective of platform
It's powerful
• Dynamic typing
• Built-in types and tools
• Library utilities
• Third party utilities (e.g. Numeric, NumPy, SciPy)
• Automatic memory management
7 December 2021
Python IDLE
• Interactive Mode
• gives you immediate feedback
• Not designed to create programs to be saved and run later
• Script Mode
• Write, edit, save, and run (later)
• Save your file using the “.py” extension
Create and run programs in Script Mode
1. Go to the File menu.
2. Make a new file.
3. Give the new file a name such as firstProgram.py
and save it with the .py extension.
4. You can now start writing your code.
5. To run your code, save it first and then go to the
Run menu and choose Run Module, or press F5.
Python print() Function
The print() function prints the specified message to the screen, or other
standard output device.
The message can be a string or any other object; the object will be
converted into a string before being written to the screen.
Your First Python Program
• Python is "case-sensitive":
• print("hello") #correct
• print('hello') #correct
• Print("hello") #error
• PRINT("hello") #error
String Literals
• String literals in Python are surrounded by either single
quotation marks or double quotation marks.
• 'hello' is the same as "hello".
• Strings can be output to screen using the print function.
For example: print("hello").
• As in many other popular programming languages,
strings in Python are sequences of Unicode characters.
• However, Python does not have a character data type; a
single character is simply a string with a length of 1.
Square brackets can be used to access elements of the
string.
String Literals
Example 1: Get the character at position 1 (remember that the first character
has the position 0):
a = "Hello, World!"
print(a[1])
Output:
e
Program Documentation
Variables
Variables
• Example1:
>>> x,y = 2,3
>>> x
2
>>> y
3
• Example2:
>>> x = 5; y = 4; L = [0,1,2]
Variables
• Variable names can contain letters, numbers, and the
underscore (the dollar sign is NOT accepted!).
• Variable names cannot contain spaces.
• Variable names cannot start with a number.
• A variable name cannot be a reserved word.
• Case matters: temp and Temp are different variables.
• There are many reserved words such as:
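Python's standard keyword module can list the reserved words for the installed version and test whether a given name is one of them. The names checked here ("while" and "temp") are just examples.

```python
import keyword

print(keyword.kwlist)               # the full list of reserved words
print(keyword.iskeyword("while"))   # True: cannot be used as a variable name
print(keyword.iskeyword("temp"))    # False: allowed as a variable name
```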
>>> x = '100'
>>> y = '-90'
>>> print(x + y)
100-90
Since they are strings, x and y will be concatenated.
Casting in Python
• Casting to integers:
x = int(input("enter the value"))    # user enters 5
y = int(input("enter the value"))    # user enters 10
print(x + y)                         # 15
• Casting to floats:
x = float(input("enter the value"))  # user enters 5, stored as 5.0
y = float(input("enter the value"))  # user enters 10, stored as 10.0
print(x + y)                         # 15.0
• Casting to strings:
x = str("s1") # x will be 's1'
y = str(2) # y will be '2'
z = str(3.0) # z will be '3.0'
Math Operators
Operator  Meaning           Example     Result
+         Addition          34 + 1      35
-         Subtraction       34.0 - 0.1  33.9
*         Multiplication    300 * 30    9000
/         Float division    1 / 2       0.5
//        Integer division  1 // 2      0
%         Remainder         20 % 3      2
The + operator can also be used for string concatenation:
y = "hello"
y = y + " world!"
If statement
Python Conditions and If statements
Example:
a = 33
b = 200
if b > a:
    print("b is greater than a")
The elif Statement
The elif keyword is Python’s way of saying "if the
previous conditions were not true, then try this condition".
Example:
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
Output:
a and b are equal
The else Statement
The else keyword catches anything which isn't caught by
the preceding conditions.
Example:
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
Output:
a is greater than b
Conditional Operators
Common Mistakes
Exercises
1. Write a program that asks the user to enter a length in
centimeters. If the user enters a negative length, the program
should tell the user that the entry is invalid. Otherwise, the
program should convert the length to inches and print out the
result. There are 2.54 centimeters in an inch.
2. Ask the user for a temperature. Then ask them what units,
Celsius or Fahrenheit, the temperature is in. Your program
should convert the temperature to the other unit. The
conversion formulas are:
F = 9C/5 + 32 and C = 5(F - 32)/9
Exercises
Solution of Ex 1:
length = float(input("Enter the length in Centimeters: "))
if length < 0:
    print("Length should be positive!!")
else:
    inch = length / 2.54
    print(length, "Centimeters is:", inch, "Inches")
Output:
Enter the length in Centimeters : 20
20.0 Centimeters is: 7.874015748031496 Inches
>>>
Note:
• You can use the round function to round the result:
print(length, "Centimeters is:", round(inch, 1), "Inches")
• In this case, the output will be rounded to 1 decimal place:
20.0 Centimeters is: 7.9 Inches
Exercises
Solution of Ex 2:
temp = float(input("Enter the temperature: "))
unit = input("Enter the unit: C/F: ")
if unit == "C":
    fah = 9/5 * temp + 32
    print(temp, "Celsius is", fah, "Fahrenheit")
else:
    cel = 5/9 * (temp - 32)
    print(temp, "Fahrenheit is", cel, "Celsius")
Output:
Enter the temperature: 50
Enter the unit: C/F: C
50.0 Celsius is 122.0 Fahrenheit
>>>
For Loop
A for loop is used for iterating over a sequence (that is either a list, a
tuple, a dictionary, a set, or a string).
With the for loop we can execute a set of statements, once for each
item in a list, tuple, set etc.
For Loop
• Example 2 The program below asks the user for a number
and prints its square. It does this three times and then prints:
‘The loop is done’.
Note: there is no indentation on the final print statement, so it is
outside the loop.
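A sketch of the kind of program Example 2 describes is given here. To keep it runnable without keyboard input, it assumes the three numbers are supplied in a list named entries, a hypothetical stand-in for three calls to input(); the prompt and message wording are also assumptions.

```python
entries = ["3", "5", "10"]  # hypothetical user entries, standing in for input()

squares = []
for i in range(3):
    # In an interactive program this line would be:
    # num = float(input("Enter a number: "))
    num = float(entries[i])
    squares.append(num * num)
    print("The square of your number is", num * num)
print("The loop is done")  # not indented, so it runs once, after the loop
```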
For Loop
--output--
A
B
C
D
C
D
C
D
C
D
C
D
E
The range function
• To loop through a set of code a specified number of times, we can
use the range() function.
• The range() function returns a sequence of numbers, starting from 0
by default, incrementing by 1 by default, and stopping just before a
specified number.
• The value we put in the range function determines how many times
we will loop.
• The range function produces a sequence of numbers from zero (by
default, unless another start is specified) to the value minus one.
• For instance, range(5) produces five values:
0, 1, 2, 3, and 4.
The range function
• Prints the numbers from 0 to 99.
--output--
0
1
2
3
.
.
.
99
The range function
Example
• Since the loop variable i gets increased by 1 each time
through the loop, it can be used to keep track of where we
are in the looping process. Consider the example below:
The range function
• If we want the list of values to start at a value other than 0, we can do that by
specifying the starting value.
range(1,5) will produce the list 1, 2, 3, 4.
• Another thing we can do is to get the list of values to go up by more than one at a
time. To do this, we can specify an optional step as the third argument.
range(1,10,2) steps through the list by twos, producing 1, 3, 5, 7, 9.
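The start and step arguments can be checked by converting each range to a list. The variable names below are chosen only for this illustration.

```python
default_start = list(range(5))      # [0, 1, 2, 3, 4]
custom_start = list(range(1, 5))    # [1, 2, 3, 4]
with_step = list(range(1, 10, 2))   # [1, 3, 5, 7, 9]

print(default_start)
print(custom_start)
print(with_step)
```

In each case the stop value itself is never included.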
The range function
Output
The range function
Example:
for i in range(1,7):
    print(i, i**2, i**3, i**4)
----output----
1 1 1 1
2 4 8 16
3 9 27 81
4 16 64 256
5 25 125 625
6 36 216 1296
>>>
The range function
Here is a program that counts down from 5 and then prints a message:
Output:
Note:
• Python’s print() function comes with a parameter called ‘end’.
• By default, the value of this parameter is ‘\n’ (the new line character).
• You can end a print statement with any character or string using this parameter.
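One way such a countdown could be written is sketched here, using a negative step and the end parameter described in the note. The final message text is an assumption.

```python
countdown = range(5, 0, -1)  # 5, 4, 3, 2, 1 (a step of -1 counts downwards)

for i in countdown:
    print(i, end=" ")        # end=" " keeps the numbers on one line
print("Blast off!!")         # hypothetical closing message
```

Because end replaces the default new line character with a space, the numbers and the message appear on a single line.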
The while loop
The while loop
While statements have the following basic structure:
while condition:
    action
As long as the condition is true, the while statement will execute the action.
Example:
x = 1
while x < 4:        # as long as x < 4...
    print(x**2)     # print the square of x
    x = x + 1       # increment x by 1
--output--
1    # only the squares of 1, 2, and 3 are printed, because
4    # once x = 4, the condition is false
9
The while loop
The following while and for loops are equivalent
Example:
x = 1
while x == 1:
    print('Hello world')
    # so-called infinite loop! Python will keep printing
    # "Hello world" because x does not change
The while loop
x = 1
while x < 3:
    print(x)
    x = x + 1
else:
    print('hello')
---output---
1
2
hello
The break Statement
With the break statement we can stop the loop even if the
while condition is true:
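A minimal sketch of break in a while loop follows. The variable names and the stopping value 3 are assumptions chosen for illustration.

```python
i = 1
printed = []            # records what was printed, for checking afterwards
while i < 6:
    print(i)
    printed.append(i)
    if i == 3:
        break           # leave the loop even though i < 6 is still true
    i += 1
```

The loop prints 1, 2 and 3 and then stops, although the condition i < 6 has not yet become false.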
The continue Statement
With the continue statement we can stop the current iteration, and
continue with the next.
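A minimal sketch of continue in a while loop follows. The variable names and the skipped value 3 are assumptions chosen for illustration.

```python
i = 0
shown = []              # records what was printed, for checking afterwards
while i < 6:
    i += 1
    if i == 3:
        continue        # skip the rest of this iteration, go back to the test
    print(i)
    shown.append(i)
```

The loop prints 1, 2, 4, 5 and 6. Note that i is incremented before the continue; if the increment came after it, the loop would never move past 3.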
Functions & Returns
Functions & Returns
• max(x1, x2,...) The largest of its arguments: the value
closest to positive infinity
• min(x1, x2,...) The smallest of its arguments: the value
closest to negative infinity
• math.modf(x) The fractional and integer parts of x in a two-item tuple.
Both parts have the same sign as x. The integer part is returned as a
float. (modf belongs to the math module, so it must be imported.)
• pow(x, y) The value of x**y.
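The functions listed above can be tried out as follows. The argument values are arbitrary examples; max, min and pow are built in, while modf is reached through the math module.

```python
import math

largest = max(3, 7.5, -2)        # 7.5
smallest = min(3, 7.5, -2)       # -2
frac, whole = math.modf(-3.25)   # fractional part -0.25, integer part -3.0
power = pow(2, 5)                # 32, the same as 2**5

print(largest, smallest, frac, whole, power)
```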
Functions & Returns
• Other functions are not built-in and require you to import the
math library.
TM112: Introduction to Computing and
Information Technology 2
Meeting #4
Block 1 (Part 2)
Introduction to problem solving in Python
OU Materials, PPT prepared by Dr. Khaled Suwais
• Introduction
• Strategies for success
• Strategies for problem solving
• 2.1 Problem solving using decomposition
• 2.2 Iteration
• 2.3 Problems
• 2.4 Using lists for flexibility
• 2.5 Nested iteration
• Summary
Introduction
• Programming is about solving problems, so while it is
sometimes perceived as difficult, it is closely related to
many things we do every day.
• Python is a great language for beginner programmers as
small programs can be quickly and easily written.
• In this module, programming is viewed as a problem-
solving process that requires you to think about a
problem and decompose it, before writing any code.
Strategies for problem solving
Problem solving using decomposition
(Algorithms)
• Solving problems, when programming, starts with
working out an algorithm.
• An algorithm tells us or a computer how to carry out a
task.
• An algorithm should be self-contained in that all of the
instructions for the task are included within the algorithm
and nothing else is needed.
• A step-by-step approach allows anyone following the
algorithm to organise their work and know precisely
where they are in the algorithm.
Problem solving using decomposition
(Algorithms)
• Most of the algorithms we meet in everyday life – such as
recipes or instructions for completing a task – are
sequences of individual actions.
• We start at the first action (usually at the top, and
sometimes labelled 1 or a) and work through the actions
in order.
• A key point about sequences is that the outcome of
performing an action is dependent on the outcome of
previous steps.
113
Problem solving using decomposition
(Make life simple: algorithms in simple English)
We will describe an algorithm using a series of steps
expressed largely in ordinary natural language, but in a
structured way.
• We solve a problem by decomposing (dividing) it into
steps that solve it.
• If the original problem is complex enough, we may first
decompose it into sub-problems and then decompose
these into steps.
• For difficult problems, we may further decompose the
sub-problems.
114
Problem solving using decomposition
Programming and robotic turtles
115
Problem solving using decomposition
Programming and robotic turtles
116
Problem solving using decomposition
Programming and robotic turtles
• Here is our first simple turtle program. (You can try these
commands in Python and see what happens.)
120
Problem solving using decomposition
Iteration
Python
• The program will move the turtle forward by 100 units. It does this by
moving forward 10 units, and repeating this movement ten times.
• range(1, 11) means that the range of numbers starts at 1, counts
upwards by 1 and stops just before 11.
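A minimal sketch of the same counting, assuming the turtle's forward(10) call is replaced by simple arithmetic so that the example runs without a drawing window:

```python
# range(1, 11) yields 1, 2, ..., 10: ten values, stopping just before 11.
steps = list(range(1, 11))
print(steps)

distance = 0
for step in range(1, 11):
    # In the turtle program this line would be forward(10).
    distance = distance + 10
print(distance)
```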
124
Problem solving using decomposition
(Programming for repetition)
Python
Alternative Solution
125
Problem solving using decomposition
(A more powerful approach to design)
• Let’s think about a slightly more complicated problem.
We will design and implement a program to draw two
squares, one below the other, with a gap in between.
Problem decomposition
126
Problem solving using decomposition
(A more powerful approach to design)
• We have now decomposed the problem into sub-problems,
rather than steps.
133
Using lists for flexibility
(Working with simple lists)
• A list is a data structure in Python that is a mutable
(or changeable), ordered sequence of elements.
• Each element or value that is inside of a list is called an item.
• Lists are defined by having values between square brackets [ ].
• Lists are great to use when you want to work with many
related values.
• They enable you to keep data together that belongs together,
condense your code, and perform the same methods and
operations on multiple values at once.
134
Using lists for flexibility
(Working with simple lists)
• To create a list we use square brackets to indicate the start and end of
the list, and separate the items by commas:
L = [1,2,3]
• You can use the print function to print the entire contents of a list:
print(L) #[1, 2, 3] will be printed
• Lists are mutable: individual elements can be reassigned in place.
L[0] = 4
print(L) #[4, 2, 3] will be printed
• The empty list is []. It is the list equivalent of 0 or empty string ''.
• If you have a long list to enter, you can split it across several lines, like below:
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40]
135
Using lists for flexibility
(Working with simple lists)
• If we had sales data for every day of the year, with
gloves1 holding the data for the first day of the year,
and so on, we could still use the previous approach –
but things become tedious.
• Instead, we would like a way of holding the glove data
together and being able to access it in a flexible way.
• In Python, we can say:
gloves = [10,8,3,5]
• This variable refers to a list of numbers, rather than just
a single number as earlier.
136
Using lists for flexibility
(Working with simple lists)
Now, to calculate and print the total number of glove
sales :
gloves = [10,8,3,5]
total = 0
for index in range(0, len(gloves)):
    total = total + gloves[index]
print(total)
This gives me a loop which gets executed with values for
index of 0, 1, 2, and 3.
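Python can also loop over the items of a list directly, without an index. The sketch below is an alternative to the index-based loop above and produces the same total; the variable name sales is only illustrative.

```python
gloves = [10, 8, 3, 5]

total = 0
for sales in gloves:      # sales takes the values 10, 8, 3 and 5 in turn
    total = total + sales
print(total)
```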
137
Nested iteration
(Independent nested loops)
• Sometimes we use loops within loops (embedded loops) to solve
more complex problems.
• Breaking the problem down, we see the need to show all the
multiples of 1, all the multiples of 2, and so on. So we can use a loop.
The sub-problems, such as finding the multiples of 1, 2 and 3, can
also be done using a loop.
138
Nested iteration
(Independent nested loops)
[Figure: problem decomposition translated into Python code]
139
Nested iteration
(Independent nested loops)
• In the previous case, we just want a space character between the
numbers. We need to include end = ... any time we don’t want print
to move to a new line. So end = ' ' adds a space character without
moving to a new line.
• To prevent print moving to a new line, but without adding a
character, we can say end = ''.
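To illustrate end = ' ' inside nested loops, here is a minimal sketch that prints the first few multiples of 1, 2 and 3. The limits (three rows, five multiples each) are assumptions chosen to keep the output short, not values taken from the slides.

```python
rows = []
for number in range(1, 4):              # outer loop: 1, 2, 3
    line = []
    for multiplier in range(1, 6):      # inner loop: 1 to 5
        multiple = number * multiplier
        line.append(multiple)
        print(multiple, end=' ')        # stay on the same line
    print()                             # move to a new line after each row
    rows.append(line)
```

The list rows is kept only so that the result can be checked; the printing is done entirely by the two print calls.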
140
Nested iteration
(Programming the turtle using nested loops)
• Consider a program to produce a number of squares across
the page. A decomposition of this problem is as follows:
Problem
Decomposition
pd()
Python Code
Nested iteration
(Dependencies between nested loops)
• In this subsection, we will look at loops where the index of
the inner loop is dependent on the outer loop.
*
**
***
143
Nested iteration
(Dependencies between nested loops)
Python
# Print right-angled triangle
size = 3
for line in range(1, size+1):
    for asterisk in range(1, line+1):
        print('*', end='')
    print()
144
Summary
After studying this part, you should be able to:
• decompose a simple problem to produce an algorithm,
using sequence, selection and iteration
• translate a simple design of an algorithm into Python
• make use of iteration to produce code that can solve
problems where we need to do things several times
• make use of lists to express the idea that a number of
data items are related
• solve problems involving drawing line-based images using
turtle graphics
• use algorithmic thinking to solve problems.
145
TM112: Introduction to Computing and
Information Technology 2
Meeting #5
Block 1 (Part 4)
Patterns, algorithms and programs 1
OU Materials, PPT prepared by Dr. Khaled Suwais
• Introduction
• 4.1 Calculate
• 4.2 Document and test
• Lists
• Summary
147
Introduction
• We will study several types of problems and the
corresponding patterns that you can use to solve them.
• A pattern will be a template to be filled in to obtain a
concrete algorithm.
• When given a problem, if you recognise the type of the
problem, you can then use the corresponding pattern to
create a concrete algorithm for the given problem. That is
much easier and faster than having to design an algorithm
from scratch.
148
Introduction
• The problem-solving process you saw in Part 2 becomes more
structured, with intermediate steps to go from problem to
algorithm, illustrated in this Figure.
149
4.1 Calculate
• Throughout this part, we will show you various kinds of
numeric problems, i.e. problems where the inputs and outputs
are numbers.
• In this section, we’ll start with the simplest of such
problems, with a single output calculated directly from
the inputs.
• The aims of this section are to recap and summarise some
concepts you already came across in Part 2 and possibly in
TM111, and to set the stage for the structure of the
numeric problems and solutions in the rest of this part.
150
4.1.1 Numeric expressions
151
4.1.1 Numeric expressions
152
4.1.1 Numeric expressions
• Python also provides the remainder (modulus) operator,
confusingly written as the percentage sign (%).
• The % operator calculates the remainder of the floor
division of the first number by the second.
• For example, 7%2 (read as “7 modulo 2”) is 1 because 2 fits 3 times
in 7 (3 * 2 = 6) and 1 remains (7 – 6 = 1).
• The remainder operator is useful to check if a number n is a
multiple of another number m; if it is, then n modulo m will
be 0, because m will fit an exact number of times in n.
• For example, 14%7 is 0 because 14 is a multiple of 7.
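The two examples above can be checked directly in Python. In this sketch the helper name is_multiple is hypothetical, introduced only for illustration.

```python
print(7 // 2)   # floor division: how many times 2 fits in 7
print(7 % 2)    # remainder left over

def is_multiple(n, m):
    """Return True if n is a multiple of m (m must be non-zero)."""
    return n % m == 0

print(is_multiple(14, 7))
print(is_multiple(15, 7))
```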
153
4.1.2 Formula problems: pattern
Problem 4.1 Brick volume
154
4.1.2 Formula problems: pattern
Problem 4.1 Brick volume
Decompose the problem
• The first step is to decompose the problem into simpler
sub-problems.
• In this case, we remember from school that usually the
volume of a solid is obtained from the area of its base and
its height. In this problem, the base of the brick is the
rectangular area formed by the width and length.
• Since the volume depends on the area, I have to
decompose the problem in the right order:
• > Compute the volume, given the width, length and height:
• >> Compute the base area, given the width and the length
• >> Compute the volume, given the base area and the height
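The decomposition above maps directly onto two assignments. In this sketch the width and length (2 and 3) follow the small values suggested later in this part, while the height of 4 is an assumption made only for illustration.

```python
# Inputs (the height is an assumed sample value)
width = 2
length = 3
height = 4

# Sub-problem 1: compute the base area, given the width and the length
base_area = width * length

# Sub-problem 2: compute the volume, given the base area and the height
volume = base_area * height

print('Volume:', volume)
```

The base area has to be computed first because the volume depends on it, which is why the order of the sub-problems matters.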
155
4.1.2 Formula problems: pattern.
Pattern 4.1 Formula
Line Instruction
2 set the output variable to the value of the formula applied to the
inputs
157
4.1.2 Formula problems: pattern
4.1.3 Formula problems: algorithm
158
4.1.2 Formula problems: pattern
4.1.3 Formula problems: algorithm
• Line 3 asks to provide the initial values, but the sub-problem
doesn’t state what the values of the width and length are.
• Usually the algorithm would ask the user to type in some values
on the keyboard, but to keep things simple I choose the values
myself – preferably small ones (let’s say 2 and 3) to easily check
if the area is correct in line 5.
• So, line 3 of algorithm 4.1 version 1 is instantiated as follows:
159
4.1.2 Formula problems: pattern
4.1.3 Formula problems: algorithm
• Note how line 3 of the pattern
became two lines of the
algorithm.
• Note that we use specific and
descriptive variable names that
capture the problem at hand.
161
4.2 Document and test
• For complicated or less familiar problems, the reader can struggle
to understand what’s going on in the code, even with descriptive
variable names.
162
4.2 Document and test
• No matter how simple your program is, you should get
into the habit of testing it to catch any errors.
163
4.2 Document and test
• You only have to test your program for admissible input
values, and document in the code which values are not admissible.
• If the user inputs an inadmissible value, it’s their fault, and
all bets are off: your program may crash, output a
nonsensical value – or, even worse, output a value that
seems correct.
• For numeric problems, some of the typical conditions for
an input value to be admissible are: the value is an integer;
positive; negative; non-zero; a multiple of some number n;
a percentage (i.e. a floating-point number from 0 to 1 or
an integer from 0 to 100); larger than another input; etc.
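One way to record admissible inputs and tests in code is sketched below. The function name brick_volume and the test values are illustrative assumptions, not taken from the module materials.

```python
def brick_volume(width, length, height):
    """Return the volume of a brick.

    Admissible inputs: all three values must be positive numbers.
    Behaviour for zero or negative values is not defined.
    """
    return width * length * height

# Tests: pairs of admissible inputs and their expected outputs
tests = [
    ((2, 3, 4), 24),
    ((1, 1, 1), 1),
    ((10, 5, 2), 100),
]
for inputs, expected in tests:
    actual = brick_volume(*inputs)
    print(inputs, 'passed' if actual == expected else 'FAILED')
```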
164
How to solve computational
problems- Summary
As a conclusion, we briefly summarize a process to solve
computational problems:
165
Lists
• There are a number of things which work the same way for lists
as for strings.
• len(): where you can use len(L) to know the number of
items in a list L.
• in: operator which tells you if a list contains something.
• Examples:
if 2 in L:
print('Your list contains number 2.')
if 0 not in L:
print('Your list has no zeroes.')
167
Lists
• A list is a data structure in Python that is a mutable
(or changeable), ordered sequence of elements.
168
Lists
• A list can contain all kinds of things, even other lists.
• For example, this is a valid list:
L = [1, 2.718, 'xyz', [1,2,5]]
• To access the items of the list inside a list, you have to use two pairs of
square brackets: L[3] is the inner list and L[3][0] is its first item.
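Using the list L from the example above, a short sketch of nested indexing:

```python
L = [1, 2.718, 'xyz', [1, 2, 5]]

inner = L[3]        # the fourth item is itself a list
print(inner)
print(L[3][0])      # first item of the inner list
print(L[3][2])      # last item of the inner list
```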
169
Lists
170
Lists
In Python, the for loop is used to iterate over the items of any sequence,
including lists, strings and tuples. A for loop can also access the elements
of a container (for example a list, string or tuple) by index, using the
built-in function range().
Both of the following examples print out the items of a list, one-by-one, on
separate lines.
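A plausible reconstruction of the two approaches, assuming a small sample list L:

```python
L = [10, 20, 30]

# Version 1: loop over the items directly
for item in L:
    print(item)

# Version 2: loop over the index positions using range() and len()
for index in range(len(L)):
    print(L[index])
```

Both loops print 10, 20 and 30 on separate lines. The first is shorter; the second is useful when the position of each item is also needed.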
171
Lists
172
Lists
173
Lists
More operations on lists:
• Concatenation:
>>> L1 = [0,1,2]; L2 = [3,4,5]
>>> L1+L2
[0, 1, 2, 3, 4, 5]
• Repetition:
>>> L1*3
[0, 1, 2, 0, 1, 2, 0, 1, 2]
• Appending:
>>> L1.append(10)
>>> L1
[0, 1, 2, 10]
• Sorting:
>>> L3 = [2,1,4,3]
>>> L3.sort()
>>> L3
[1, 2, 3, 4]
174
Lists
• Reversal:
>>> L4 = [4,3,2,1]
>>> L4.reverse()
>>> L4
[1, 2, 3, 4]
• Shrinking:
>>> del L4[0] # L4 will be [2,3,4]
>>> L4 = [] # L4 will be []
• Index and slice assignment:
>>> L1 = [10,20,30,40]
>>> L1[1] = 0 # L1 will be [10,0,30,40]
>>> L1[1:4] = [4,5,6] # L1 will be [10,4,5,6]
• Making a list of integers:
>>> list(range(4))
[0, 1, 2, 3]
>>> list(range(1,5))
[1, 2, 3, 4]
175
Example- List of lists
• Suppose we have the following list:
List=[['Ali', 5, 10,15],['Naji',12,12,15],['Fadi',10,14,12],['Rajaa',18,16,14]]
- Print the average of the second grade for all students.
- Print the name of the student and his/her average grade
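One possible solution is sketched below. It assumes that "the second grade" means the second of each student's three grades, i.e. the item at index 2 of each inner list.

```python
List = [['Ali', 5, 10, 15], ['Naji', 12, 12, 15],
        ['Fadi', 10, 14, 12], ['Rajaa', 18, 16, 14]]

# Average of the second grade (index 2) over all students
total = 0
for student in List:
    total = total + student[2]
second_grade_average = total / len(List)
print('Average of the second grade:', second_grade_average)

# Name and average grade of each student
averages = []
for student in List:
    name = student[0]
    grades = student[1:]                 # everything after the name
    average = sum(grades) / len(grades)
    averages.append(average)
    print(name, average)
```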
176
Summary
In this part, you learned further techniques to solve
computational problems by:
• looking in the problem statement for the input(s) and the
output(s)
• thinking about which inputs are not allowed, e.g. negative
numbers
• writing tests (pairs of admissible inputs and their
expected outputs)
• recognising the type of the problem or the types of the
sub-problems
• instantiating the patterns for those problem types
• combining the algorithms for the sub-problems to solve
the whole problem.
177
TM112: Introduction to Computing and
Information Technology 2
Meeting #6
Block 2 (Part 2)
Patterns, Algorithms and Programs 2
Block 2 (Part 4)
Organizing Your Python Code and Data
• Introduction
• Generate a List- Append
• Reduce
• Search
• Combine
• The Final Problem
• Summary
179
Introduction
• This part continues to apply the problem-solving approach from Block 1
Part 4, in which we go from problems to programs via patterns and
algorithms.
• Lists are a very useful and flexible way of storing multiple data items.
Most Python programs use lists. In this part, I will show some common
problem types on lists and the corresponding solution patterns.
180
• Generate a List
181
Append
• Normally the generated sequence has to be stored for
further processing, and lists are ideally suited for that.
Storing the generated sequence in a list requires just two
changes to Pattern 2.1:
1. start with the empty list.
2. instead of (or in addition to) printing the value,
append it to the list, i.e. add it to the end of the list.
182
Append
• In Python, adding a value to the end of a list, say sequence, is
intuitively written as:
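Python's standard list method for this is append(). A minimal sketch, assuming the sequence to generate is the numbers 1 to 5:

```python
sequence = []                 # start with the empty list
for value in range(1, 6):
    sequence.append(value)    # add value to the end of the list
print(sequence)
```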
183
Filter
184
Python editor: hot days program
• Given a list of daily temperatures in degrees Celsius in a certain place,
construct a list of the temperatures above 30. Assume temperatures are
given as whole numbers.
185
Output: The hot days had temperatures [33, 32, 42, 36]
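The sketch below follows the problem description. The input list is an assumption, chosen so that the result matches the output shown above.

```python
temperatures = [25, 33, 32, 28, 42, 36, 30]   # assumed sample data

hot_days = []
for temperature in temperatures:
    if temperature > 30:                 # strictly above 30
        hot_days.append(temperature)

print('The hot days had temperatures', hot_days)
```

Note that a temperature of exactly 30 is not included, because the problem asks for temperatures above 30.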
Transform
• Another way to construct a new list from an existing
one is to transform each item of the input list into
one item of the output list.
189
Program 2.5 Negative temperatures
• Pattern 2.5 Counting:
1. initialize the input list.
2. set a counter to zero.
3. for each item in list:
   a. if the item satisfies the condition:
      i. increment the counter, i.e. add 1 to it.
4. print the counter.
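Instantiating the counting pattern for negative temperatures might look like this. The sample list is an assumption made for illustration.

```python
temperatures = [3, -2, 0, -7, 5, -1]   # assumed sample data

counter = 0
for temperature in temperatures:
    if temperature < 0:          # the condition: the temperature is negative
        counter = counter + 1
print(counter)
```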
192
Program 2.7 list sum
• Pattern 2.6 Aggregate:
194
Find a Value
195
Program 2.8 Find Negative Temperature
• Pattern 2.7 Find value:
• For this problem type, I will assume that the input list is not empty.
This guarantees that there will be a best value.
200
Functions in Python
• You may recall being told that you can find the size of a list named
gloves by using len(gloves). This involved a Python function,
len().
201
Function names and arguments
• When we talk about a function, we use its name – in this case, len – followed
by a pair of parentheses: len(). This way, you can see at once that we are
talking about a function rather than, for instance, a variable.
def printHello():
    print("Hello! ")
203
Functions without return
• Some functions perform simple procedural tasks (specified in their bodies)
but do not return any information when they are called.
• Example: Write a function that will display a welcome message to a student with
his/her name. Then, use the function in your program.
def welcome(aName):
    print("Hello", aName)
    print("Welcome to AOU")

welcome("Ahmad") #Function call
• When the function is called, an actual value for the argument must be used.
• For example, when the function call welcome('Ahmad') is executed, the actual
string 'Ahmad' replaces the argument aName resulting in the following output:
-----output-----
Hello Ahmad
Welcome to AOU
204
Functions with return
• Example : Write a function that takes the height and width as arguments,
calculates the area of a rectangle, and returns it. Then display the area in the
output window. Use the function in your program.
def recArea(aHeight, aWidth):
    area = aHeight * aWidth
    return area

h = eval(input("Enter the height: "))
w = eval(input("Enter the width: "))
print("Area =", recArea(h, w)) #Function call inside print()
-----output-----
Enter the height: 4
Enter the width: 6
Area = 24
206
Functions with Multiple return values
So what if you want to return two variables from a function instead of one?
There are a couple of approaches which new programmers take.
Let’s take a look at a simple example:
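One common approach, sketched here with a hypothetical function min_and_max, is to return the values separated by a comma. Python packs them into a tuple, which the caller can unpack into separate variables.

```python
def min_and_max(numbers):
    """Return the smallest and largest items of a non-empty list."""
    smallest = numbers[0]
    largest = numbers[0]
    for number in numbers:
        if number < smallest:
            smallest = number
        if number > largest:
            largest = number
    return smallest, largest      # returned as a two-item tuple

low, high = min_and_max([4, 9, 1, 7])
print(low, high)
```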
• There is a name for the items you need to know: the interface of the
function. To use a function, knowing the interface is enough. There’s no
need to know what goes on under the bonnet – its implementation.
• The Python interpreter deals with a program one line after the
other, starting with the very first line. A normal program line is
executed when the interpreter gets to it. However, functions do
receive special treatment, which we examine in this section.
209
The Python Interpreter and Functions
210
Using Functions-The Benefits
• Replacing duplicate code with a function can make a program
shorter and more readable.
211
Reusing Code
• There is a more elegant way to reuse functions. Instead of
copying a function into a new program, you can also put all
your functions – say, for drawing figures – into a separate file
(with the .py extension). Let’s call it
figure_drawing_functions.py. At the beginning of
your new program, you then simply add from
figure_drawing_functions import *. This has the
same effect as placing the function definitions themselves at
the beginning of the program. This way, you can create your
own library of figure drawing functions.
212
Summary
• In this part, you again practiced the techniques to solve
computational problems by:
1. looking in the problem statement for the input(s) and the
output(s)
2. thinking about which inputs are not allowed, e.g. negative
numbers or empty lists
3. writing tests (pairs of admissible inputs and their
expected outputs)
4. recognizing the type of the problem or the types of the
sub-problems
5. instantiating the patterns for those problem types
6. combining the algorithms for the sub-problems to solve
the whole problem.
7. organizing your code using functions.
Meeting #7
Block 2 (Part 5)
Diving Into Data
• Introduction
• The Creative Problem Solver
• A Data Analysis Project
• The Geography of Happiness
• Correlation and All That
• Summary
220
Introduction
• Data analysis consists of taking a set of data and computing something
from it – ‘crunching the numbers’ – to extract information.
• The analysis will involve a more extended use of Python than you have
seen so far, but we will take you through it step by step, and it will give
you valuable experience that will help with your own Python projects.
221
Problem Solving As A Process
• Human beings have taken problem solving further than our fellow
animals, and have been doing so for thousands of years.
• But for most of our human history, problem solving was just something
people did. Only very recently have people thought about problem
solving itself as a process that can be studied. One of the pioneers was
the Hungarian mathematician George Pólya, who set out his ideas in a
ground-breaking book How to Solve It.
222
Problem Solving As A Process
• Let’s look briefly at the individual stages. Pólya’s interest was
in mathematical problems, but we have adapted his ideas to
apply to problem solving using a computer.
223
Problem Solving As A Process
2. Make a plan: This corresponds to devising an appropriate
algorithm. Of course, this isn’t always straightforward,
otherwise we might not feel we have a problem to solve!
But there are some very useful guidelines, generally called
heuristics*, that we can apply.
3. Carry out the plan: Write and test code to implement the
algorithm.
226
A Data Analysis Project
• The ONS (2015) has this to say about why this data is important.
• Our example project can’t take such a wide view as the World Happiness
Survey the article mentions, because that would mean pulling together
data from too many different sources and in too many different formats.
To keep the task manageable, we will just be looking at subjective well-
being in the UK.
• The data we need is freely available. Since 2011, the ONS has collected
data on subjective well-being and published it in a format that is quite
easy to work with in Python.
227
Working With Data in Python
• Figure 5.7 shows how people
responded – using a scale from 0 (‘not
at all’) to 10 (‘completely’) – to the
happiness question in the financial
years ending in 2012 to 2015.
• The data file is happ_1.txt, which will be stored in the same folder
as our Python program. Opening a file in Python uses fairly self-
explanatory syntax.
file = open('happ_1.txt', 'r')
• The second argument 'r' tells Python we want to read from the file.
(To write to it, we would use 'w'.)
230
A Data Analysis Project
• Now the file is open, how can we access the data? A file consists of a series of lines,
and applying the heuristic suggests using the same approach as we would to
access the items in a list. Python lets us do just that: we can loop through the lines
in a file using a for statement, and print each as we go.
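A minimal sketch of that loop follows. So that it can run on its own, it first writes a small stand-in file to a temporary folder; the two lines of content are invented for illustration and are not the ONS data.

```python
import os
import tempfile

# Create a stand-in data file (invented contents, for illustration only)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'happ_1.txt')
out = open(path, 'w')
out.write('first line\nsecond line\n')
out.close()

# Loop through the lines of the file, as we would through a list
lines = []
file = open(path, 'r')
for line in file:
    lines.append(line)
    print(line, end='')    # each line already ends with a newline
file.close()
```

Closing the file once it is no longer needed is good practice, although the slides do not show it.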
231
Working Out An Average
• To examine whether happiness in the UK increased over the time
period 2012–2015, we will work out the average for each year and
compare the results.
• There is more than one kind of average, but the one we will use
first is the mean.
232
Working Out An Average
• Our ONS data is more complicated than the example above,
though. The column for a particular year – 2012, say – shows the
percentage of people who rated their happiness at each point of
the scale.
1.4% rated it at 0
0.9% rated it at 1
2.2% rated it at 2
…
• and so on. We can’t just average the percentages; we have to take
the ratings into account as well.
236
Finding An Algorithm
• Now suppose that instead of two separate lists for Book A and Book B, we
want to store both sets of data together. What Python data structure can we
use to represent a table such as the data in Table 5.2?
• If we think of the table as a series of rows, with every row being a list, then
we can represent the table as a list of lists, like this.
table = [[9, 27, 26, 19, 12, 7],[4, 21, 30, 25, 18, 2]]
• We can now find the average of each row using the same algorithm as
before, but inside a loop. Note that numbers is now a list variable that
takes each of the internal lists in turn.
table = [[9, 27, 26, 19, 12, 7],[4, 21, 30, 25, 18, 2]]
for row in table:
    numbers = row
    total = 0
    for rating in range(6):
        product = rating * numbers[rating]
        total = total + product
    average = total / 100
    print('average rating =', average)
237
Getting The Data in The Form We Want
• Now we have a working algorithm for
calculating the average of star
ratings, we want to extend it to the
ONS happiness data. However, we
first need to get the data into a
suitable form. Our averaging program
works with a table of numbers,
represented as a list of lists, but the
data is currently in a CSV file.
238
Getting The Data in The Form We Want
• This might look fairly promising, but it turns out each row is
actually a string, not a list at all, so we still have some way to go.
We can tackle the problem in a series of stages, each of which will
take us nearer to our goal (rather like the squirrels we saw earlier).
• The first step is to convert each row into a list of separate values.
There are various ways to do this, but we shall use the Python CSV
reader, which is in a Python module csv.
• Here’s how the code will work. This is just an explanation and you
don’t have to run any code yet. We have numbered the lines so we
can refer to them.
• Before we can use the CSV reader, we must import the Python
module containing it.
239
Getting The Data in The Form We Want
• Line  Python code
1  import csv
• Then we read the data into a table using the following code, which
is an adaptation of the code you saw previously.
• Line  Python code
2  file = open('happ_1.txt', 'r')
3  table = []
4  reader = csv.reader(file)
5  for row in reader:
6      table.append(row)
7  print(table)
241
Getting The Data in The Form We Want
• Line 3 creates an empty list that is going to hold the rows of the table.
• Line 4 creates a CSV reader for the file. A CSV reader automatically splits a string
of comma-separated values into a list of individual items.
For example, if it came across the string 'the,cat,sat,on,the,mat'
it would convert it to the list ['the','cat','sat','on','the','mat'].
• Lines 5 and 6 iterate through the rows of the CSV file. Each row is converted
to a list as described above, and the resulting list added to the table.
• Line 7 just prints the final table.
242
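The splitting behaviour can be tried without a data file. In this sketch, io.StringIO lets a string stand in for an open file; the second sample row is an assumption added to show that every row becomes its own list.

```python
import csv
import io

# io.StringIO lets a string stand in for an open file
sample = io.StringIO('the,cat,sat,on,the,mat\n1,2,3\n')

table = []
reader = csv.reader(sample)
for row in reader:
    table.append(row)      # each row is already a list of strings
print(table)
```

Note that the numbers in the second row come back as the strings '1', '2' and '3', not as numbers.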
Introducing table_utils
Here are four problems you may have noted.
• The table has borders: on the left, the ratings 0–10, and across the top the
years 2012-2015. We need to exclude these from the calculation.
• The last column is a series of empty strings; we need to exclude that too
• Because each year corresponds to a column, not a row, we can't calculate
the averages row by row as we did for the book data
• The elements are strings, not numbers.
Issues like this are common when we analyze data. It won't usually be structured as
we'd like and we need some initial processing to get it into the form we require.
To overcome these issues, we will process the data using a series of utility
functions. There is a well-known library for data analysis in Python called pandas,
but this does far more than we need and is not part of the standard Python
installation. So instead we have written our own small library table_utils, a sort
of ‘baby pandas’ that does everything we require here.
243
Introducing table_utils
• We shall use four functions that process a table in
various ways:
• rows(), which selects only that part of a
table between specified rows
• cols(), which selects only that part of a
table between specified columns
• flip(), which swaps the rows and columns
in a table
• to_float(), which converts all the strings
in a table to the equivalent numbers.
1,2,3
4,5,6
245
Getting The Data in The Form We Want
• The function cols() takes three arguments: a
table, a start column and an end column. It returns a
new table containing just the columns beginning
with the start column and finishing with the end
column, inclusively (Figure 5.11).
2
5
8
246
Getting The Data in The Form We Want
• The function flip() takes one argument, a
table, and returns a new table like the original
but with the rows and columns interchanged
(Figure 5.12).
• flip() gives
1,4
2,5
3,6
Figure 5.12 Rows and columns are interchanged to form new table
247
Getting The Data in The Form We Want
• The function to_float() is needed when a table contains numbers
represented by strings, for example '1.23'.
• At first sight, '1.23' may look like a floating-point number – that is, a number
with a decimal point in it – but the quotes show that it is a string, so it must be
converted before it can be used in calculations. (Floating-point numbers were
introduced in Block 1 Part 1.)
• The to_float() function takes a table as argument and returns a new table
with all the strings in the table converted to their equivalent floating-point
values, where possible. For instance, '1.23' would become 1.23, as described
above. Any strings that don’t correspond to floating-point values are simply left
unchanged.
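The source code of table_utils is not shown in these slides. The functions below are stand-ins written from the descriptions above, to illustrate what each operation does; the zero-based, inclusive row and column numbering is an assumption, and the real library may differ.

```python
def rows(table, start, end):
    """Keep rows start..end inclusive (zero-based: an assumption)."""
    return table[start:end + 1]

def cols(table, start, end):
    """Keep columns start..end inclusive (zero-based: an assumption)."""
    return [row[start:end + 1] for row in table]

def flip(table):
    """Swap rows and columns (assumes every row has the same length)."""
    return [[row[i] for row in table] for i in range(len(table[0]))]

def to_float(table):
    """Convert strings to floats where possible; leave the rest unchanged."""
    result = []
    for row in table:
        new_row = []
        for item in row:
            try:
                new_row.append(float(item))
            except ValueError:
                new_row.append(item)
        result.append(new_row)
    return result

table = [['1', '2', '3'], ['4', '5', '6']]
print(flip(table))
print(to_float(cols(table, 1, 2)))
```

Each function returns a new table and leaves its argument untouched, which matches the descriptions above and makes the calls easy to chain.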
248
Getting The Data in The Form We Want
Using these functions, we can get the table into the shape we want as follows:
apply flip()
apply to_float()
250
251
Getting The Data in The Form We Want
• The next stage is to find the average rating of happiness for each year.
We will follow the same steps that we used to calculate the books'
average rating, this time with the scale from 0 (‘not at all’) to 10 (‘completely’)
used for the happiness question in the financial years ending in 2012 to 2015.
x = 2012
for row in table5:
    numbers = row
    total = 0
    for rating in range(11):
        product = rating * numbers[rating]
        total = total + product
    average = total / 100
    print('average rating of happiness in year', x, 'is =', round(average, 2))
    x = x + 1
252
More on Summarizing Data
• In the last section, we summarized ONS data on happiness by
calculating the mean. This approach is widely accepted, and the
ONS uses it.
• However, some people argue we should only use the mean when
the data measures a quantity such as weight or height. Their
objection comes from the fact that points on an arbitrary scale (such
as happiness) don’t represent a real quantity in the way weight or
length do. We can’t really say that someone who rates their
happiness as 10 is twice as happy as someone who rates theirs as 5.
• The length of the list is even, so we need to find the two middle
values and take the value halfway between them. The middle values
are 4 and 5, so the median is 4.5
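As a sketch of the rule, the function below handles both odd and even lengths. The six-item sample list is an assumption, chosen so that its middle values are 4 and 5.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                 # odd length: the middle value
    # even length: halfway between the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 3, 4, 5, 8, 9]))
print(median([7, 1, 3]))
```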
254
Dispersion (distributing things over a wide area)
• Another type of reduction is used to assess dispersion – how spread out the
data are. Do they mostly cluster close to the average, or are the data more
spread out, with many values much smaller or much greater than average?
• Figure 5.14 shows two distributions with the same median (and mean), but
(b) is more dispersed than (a).
Figure 5.14 Dispersion
• One measure of dispersion that goes naturally with the median is the
interquartile range. The quartiles are the values that split the data into
quarters, similarly to the way the median splits them in half; see Figure 5.15.
Note that the median is the same thing as Quartile 2.
• If we keep the median the same but increase the interquartile range, the
dispersion increases and we have a graph like Figure 5.14(b), as contrasted
with Figure 5.14(a).
255
Skew
Figure 5.16 Frequency of word lengths in the Gettysburg Address
• A final property of interest a dataset may have is skew. A dataset is skewed if the
values are distributed unevenly about the average, with the tail on one side
stretching off further than on the other. This can be explained best with an example.
Figure 5.16 is a bar chart showing the frequency with which different word lengths occur in a
well-known speech.
• You see there is a peak at 3, which shows that words of length 3 are comparatively common,
and there is quite a long tail off to the right. (The distribution is described as right-skewed.)
Although it is possible to measure degree of skew as a number, it is more common to
comment on any skew and say whether it is to the left (also known as negative skew) or
right (also known as positive skew).
• The trend is fairly clear, and we can see that the points all fit roughly
around a straight line, as shown in Figure 5.28.
258
Calculating and Interpreting
Correlation Coefficients
• Courses on statistics explain in detail how the correlation
coefficient is calculated, but here we are only interested in
using it as a tool, so we have written a Python function for this
purpose.
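The module's own function is not shown in this text. For readers curious about what such a function computes, here is a sketch of the standard Pearson correlation coefficient; the name correlation and the sample data are assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Assumes at least two items and that neither list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = 0
    spread_x = 0
    spread_y = 0
    for x, y in zip(xs, ys):
        covariance = covariance + (x - mean_x) * (y - mean_y)
        spread_x = spread_x + (x - mean_x) ** 2
        spread_y = spread_y + (y - mean_y) ** 2
    return covariance / math.sqrt(spread_x * spread_y)

r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly increasing
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])    # perfectly decreasing
print(round(r_up, 2), round(r_down, 2))
```

A value close to 1 means the points lie near a rising straight line, a value close to -1 means they lie near a falling one, and a value near 0 means there is little linear relationship.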
259
Calculating and Interpreting
Correlation Coefficients
Table 5.4 Interpretation of correlation coefficient r
Value of r Level of correlation
• You saw that data analysis is a process of analyzing data to extract information
from it. You were introduced to an important source of publicly available data,
the Office for National Statistics, and investigated data to do with well-being.
With the aid of suitable library functions, you used Python to manipulate data in
the form of tables and produce summary information such as averages,
measures of spread, and correlation coefficients.
• Throughout this part, you extended your ability to use Python for larger-scale
problems, and to apply ideas of algorithmic combination you met in Block 2 Part
2. You also met examples of good programming practice, such as keeping a
laboratory notebook, and laying code files out for optimum readability.
• Finally, you applied critical thinking to the results of the data analysis and
considered how reliable the conclusions drawn were likely to be.
262
TM112: Introduction to Computing and
Information Technology 2
Meeting #8
Block 2 (Part 1)
Cloud Computing
Block 2 (Part 3)
Mobile Phones
OU Materials, PPT prepared by Dr. Khaled Suwais
265
Introduction
• The term cloud computing describes a system where users access a
virtual machine or application that is hosted on a remote server
and supplied by a cloud provider.
• The data and information the user exchanges with the server are
delivered via public internet connections, rather than privately
owned networks.
266
What is The Cloud?
Cloud computing includes the following five essential characteristics.
268
Where is The Cloud?
The servers that host the virtual machines at the
heart of the cloud are located in large air-conditioned
rooms in data centres.
(see Figure 1.3)
271
Types of Cloud
• The term private cloud has come to mean a cloud
infrastructure that is for the exclusive use of a single
organisation.
272
Types of Cloud
• A third type of cloud, a community cloud, is a cloud
infrastructure that is exclusively for a specific community of
consumers from organisations that have shared concerns (e.g.
mission, security requirements, policy and compliance
considerations). It may be owned, managed and operated by
one or more of the organisations in the community, a third
party, or some combination of them, and it may exist on or off
premises.
• In addition to the above three types, we have a popular
solution for large enterprises that want the privacy and control
of a private cloud, but the scalability and multiple locations of
the public cloud.
A hybrid cloud is a composition of two or more distinct
types of cloud (private, public or community).
273
The Cloud And I
• One of the most common ways in which individuals use the
public cloud is to store and share files using free packages
such as Dropbox, Google Drive, IDrive, OneDrive and Apple
iCloud Drive.
• The fact that Google and other providers of ‘free’ services are
incredibly profitable may suggest that we underestimate the
revenue that they accrue from their ‘free’ offerings.
276
Are My Documents, Photos and Login
Data Secure in The Cloud?
• In terms of privacy, bear in mind too that your photographs
and documents may be more vulnerable to being seen by
others if they are in the cloud.
• You should also be aware that login details for cloud storage
sites can be a target for hackers; for example, Dropbox
admitted in 2016 that the passwords and email addresses of
nearly 70 million users had been stolen.
277
Cloud Architecture
• Cloud providers must have a mechanism to allow multiple users
to access the same physical resources, which are usually large
servers located at a distance from the consumers.
Customer 1: I want additional storage and (possibly) processing capability on which I can do my own thing. I may want to install my own operating system as well as build and use my own applications. But someone else can manage the actual hardware (the infrastructure); I am not interested in that.
Customer 2: Give me a ready-made platform – that is an operating system and everything I need to execute a program – on which I can build and run my own applications. I don’t want to worry about how that platform is deployed or on what hardware, I am only interested in using and writing my own applications.
Customer 3: I just want to use applications for tasks such as emailing, word processing, or data crunching. Please don’t bother me with any details at all; just supply me with access to the applications I want to use.
279
Cloud Architecture
• The lowest layer is the infrastructure layer and is mostly composed
of the physical kit, such as servers, storage and networking hardware.
Customer 1 wants to operate within this layer, which means he is
willing to pay the cloud provider to be responsible for the
infrastructure.
• Finally, the top layer, sometimes called the application layer, includes
the data and applications. Customer 3 wants to operate within this
layer, which means he is willing to pay the cloud provider to be
responsible for the infrastructure, the platform and the applications.
281
Pizza as a Service
• Figure 1.6 shows a famous infographic developed by Albert Barron, a
Senior Software Client Architect at IBM. In it he uses the different
ways a customer can obtain a pizza dinner as an analogy for who
manages what when a customer (you) buys a cloud service.
282
283
Infrastructure as a Service
• If, as a business, you select Infrastructure as a Service (IaaS),
you are essentially outsourcing your hardware needs to a cloud
provider. IaaS cloud providers sell virtual access to off-site
servers, storage and networking hardware. As the customer,
you can build your own platform on this hardware and access it
at any time, paying only for the resources you use.
284
Platform as a Service (PaaS)
• PaaS provides one or more platforms on which a business can run
existing applications or develop and test new ones without being at
risk of compromising their internal systems. It also enables
development teams that are geographically distributed to work
together on the same software project.
285
Software as a Service (SaaS)
286
User Issues in The Cloud
• Downtime is time during which the cloud services are not available.
The cloud service providers have to juggle demand, and there is
always a danger that they may be overwhelmed.
• Uptime is the time during which the cloud services are available.
• Storing data and important files on external clouds and moving them
across networks always opens up risks. A single vulnerability,
misconfiguration or malicious hacker can cause a security breach across an
entire provider’s cloud.
• Minimised risk: Cloud users might also want to ensure that their data and
files are not accessible to intruders or terrorists.
• Control: Customers have very little control, particularly over any downtime,
trouble-shooting, back-ups and disaster recovery.
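Uptime and downtime are often summarised as an availability percentage. The sketch below shows the arithmetic; the 99.9% figure and the function names are assumptions chosen for illustration, not values from any particular provider.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def availability(uptime_hours, downtime_hours):
    """Fraction of the total time during which the service was available."""
    return uptime_hours / (uptime_hours + downtime_hours)

def downtime_allowed(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in a period at a given availability level."""
    return period_hours * (1 - availability_percent / 100)

print(round(availability(8751.24, 8.76) * 100, 2))  # percentage of time available
print(round(downtime_allowed(99.9), 2))             # hours of downtime per year
```

Even a service that is available 99.9% of the time can therefore be unavailable for several hours over a year.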
287
New Ways of Working
• The move to cloud-hosted SaaS has also seen a shift in the way software
applications are updated.
• This change in the way software is developed and deployed has also led to
a more agile way of working called continuous delivery. As updates and
bug fixes are completed, they are tested against a version of the current
release; if the tests pass, the update is added to the live application. In
order to achieve this, the developers, who make the software, and the
operational staff, who deal with the software after it is deployed, must
work in tandem to develop, maintain and improve a running service. In
particular, the developer role must take into account the way the software
is deployed and maintained operationally. They must test their new
software against a model of the live application and should expect their
work to be added to it when appropriate.
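The test-then-deploy step described above can be sketched as follows. This is a toy model only: the release is represented as a dictionary, and the names `run_tests` and `deliver` are hypothetical rather than part of any real deployment tool.

```python
def run_tests(tests, candidate):
    """Run every test against the candidate release; True only if all pass."""
    return all(test(candidate) for test in tests)

def deliver(update, current_release, tests):
    """Apply an update to a copy of the current release and deploy it if tests pass."""
    candidate = {**current_release, **update}   # model of the live application plus the change
    if run_tests(tests, candidate):
        return candidate                        # the update goes live
    return current_release                      # the live application is left unchanged

# Hypothetical release data and checks, for illustration only.
live = {'version': 41, 'greeting': 'Hello'}
tests = [lambda app: app['greeting'] != '', lambda app: app['version'] > 0]

live = deliver({'version': 42, 'greeting': 'Hi'}, live, tests)
print(live['version'])
live = deliver({'version': 43, 'greeting': ''}, live, tests)   # fails a test
print(live['version'])
```

The second update is rejected because it fails a test, so the live application stays at the previous version.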
290
• The components of a mobile phone
• Smartphones
• What is Inside a Mobile Device?
• Touch-Based Graphical User Interface
• Operating Systems for Mobile Devices
291
Smartphones
292
What is Inside a Mobile Device?
• The heart of the modern mobile phone consists of a multi-core central
processing unit (CPU), a graphics processing unit (GPU) and a digital
signal processor (DSP).
• The CPU (or processor) is the component that directly processes data and
instructions.
• The GPU is optimized to process graphics.
• Solid-state flash memory, like that in a USB stick, usually provides the
storage memory, as this is faster, lighter and uses less power than a disk
drive – important characteristics for a small, portable device.
• Almost all mobile phones have one or more peripherals for users to connect
to other devices, such as subscriber identity module (SIM) cards, memory
cards, chargers, headphones and other computers.
• Another major trend has been the integration of the large screen with touch
sensors, and by 2010 graphical user interfaces (GUIs) were the norm. Users
can also manipulate graphics as well as text. Figure 3.3 shows some mobile
phones from different eras.
296
Figure 3.3 The evolution of mobile phone design over three decades
Touch-Based Graphical User Interface
How does a touch screen work?
Usually, touch screens use changes in electrical properties to detect a touch.
• A resistive touch screen uses the principle that the pressure of a finger can
be used to connect two thin conducting layers, causing a change in
electrical resistance at that point. This kind of screen has the advantage
that it works if the user has gloves on, and it is cheap and resistant to
liquids and other contaminants.
• However, resistive touch screens have the disadvantage that the user has
to press down, either with a finger or a stylus, which can cause damage.
Because of these disadvantages, capacitive touch screens are now the
norm on smartphones.
• A capacitive touch screen is based on another electrical property, called
capacitance, which measures how much charge a device (called a
capacitor) can hold.
297
Operating systems for mobile devices
298
Android vs. Apple iOS
Android
• Android is by far the most widely used mobile OS. It was initially developed
by the California-based company Android Inc. in 2003; the company was acquired by
Google in 2005.
• The heart of Android is based on the most fundamental operations within
the Linux operating system, which is open-source. As a result, Android itself
became an open-source project, known as the Android Open Source
Project (AOSP).
Apple iOS
• iOS is the second-most used mobile OS. It was developed by Apple Inc. and
is used exclusively on Apple’s i-series mobile devices, such as the iPhone and
iPad. The success of iOS is a result of the availability of high-quality
applications and the popularity of the i-devices.
300
Cameras
• Just to briefly recap, an image is made up of a two-dimensional grid of pixels,
and each pixel represents a dot or a square on the image. The resolution of
an image sensor is a measure of the number of pixel sensors it contains.
• The heart of a camera is an image sensor. Most image sensors are now of the
active pixel sensor (APS) type. As an APS is usually made using
complementary metal-oxide semiconductor (CMOS) technology, this type of sensor is also
known as a CMOS image sensor. A picture of a magnified CMOS APS image sensor
is shown in Figure 3.5.
• The resolution and quality of a picture taken by a camera mostly depend on its
image sensor. Generally, the larger the sensor, the better the image quality.
301
To produce a colored image, the pixel sensors need to be able to sense at least
the three primary colors of light, i.e. red, green and blue, because any color of
light can be separated into various amounts of these three colors.
303
• Communication Technologies Used
By Mobile Devices
304
Long-range Mobile Communications
• Mobile phones are connected wirelessly through radio waves to a base
station. A base station coordinates what happens inside each local part of
the mobile phone network, which is called a cell. From the base station,
the calls are routed onward to their destination through cables or
different wireless links via some intermediate subsystems.
305
5G
• The main objectives of the 5G network are to:
307
Wi-Fi Direct
• Wi-Fi is a very common wireless communication method used by laptops
and mobile devices, often for connecting the devices to a wireless local
area network (WLAN) and to access a broadband internet connection
through an access point.
• Like Wi-Fi devices, Wi-Fi Direct devices usually operate on the 2.4 GHz
and 5 GHz radio frequency bands. Wi-Fi Direct has an indoor range of tens of
meters, which is usually enough to cover a typical home environment,
but the outdoor range can be several times greater.
• At the time of writing, Wi-Fi Direct uses the Wi-Fi Protected Access II
encryption (WPA2).
308
Bluetooth
• Bluetooth is another wireless technology that operates on the 2.4 GHz
radio frequency band. Jim Kardach, an engineer who worked on the
technology in the 1990s explained many years later that the name
Bluetooth
• It has a shorter range and lower data rate than Wi-Fi – often less than 10
meters and 24 Mbps.
• Apart from using encryption, Bluetooth also uses a pairing process to
improve security. The pairing process usually involves some human
input, such as entering the same PIN on both devices.
• Mobile phone users often use Bluetooth to connect auxiliary devices such
as headphones and remote controls to their mobile phones.
309
Near-field Communication
• Near-field communication (NFC) is a wireless technology that was
developed for contactless payments using debit and credit cards.
• The range of NFC is usually a few centimeters, with a modest data rate of
up to 424 kbps.
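To get a feel for what these data rates mean in practice, the sketch below compares ideal transfer times at the Bluetooth rate (24 Mbps) and the NFC rate (424 kbps) quoted in these slides. The 5 MB file size is an assumption, and real transfers are slower because of protocol overheads.

```python
def transfer_time_seconds(size_megabytes, rate_bits_per_second):
    """Ideal transfer time: file size in bits divided by the data rate."""
    size_bits = size_megabytes * 1_000_000 * 8   # 1 MB taken as 10^6 bytes
    return size_bits / rate_bits_per_second

BLUETOOTH_BPS = 24_000_000   # 24 Mbps
NFC_BPS = 424_000            # 424 kbps

photo_mb = 5  # hypothetical file size
print(round(transfer_time_seconds(photo_mb, BLUETOOTH_BPS), 1), 's over Bluetooth')
print(round(transfer_time_seconds(photo_mb, NFC_BPS), 1), 's over NFC')
```

The difference (under two seconds against more than a minute and a half) suggests why NFC suits small exchanges such as payment details rather than file transfer.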
310
Challenges and Issues
1- Coverage
We have all experienced the frustration of not being able to get a mobile
signal. But why does this occur?
Locations:
In the remote countryside, the signals are usually weaker. This is because, in areas
with lower population densities, the providers build fewer base
stations, so each has to cover a larger area.
Weather:
• As the propagation of radio waves can be affected by the weather, this can
weaken the signal reception at some places
311
Challenges and Issues
2- Battery Life
• With a large display, various radio connections and powerful processors,
modern mobile phones consume lots of power.
High-power Batteries:
Another way is to improve its energy-storing ability.
313
Summary
• You learned what ‘the cloud’ is and the kinds of services it can
provide to individuals and businesses. You also considered some of
the issues with the cloud, and how the expectations of the cloud
providers and the cloud consumers can be managed through
customer service agreements.
• You have also learned about some of the challenges and issues
around the use of mobile phones.
314
TM112: Introduction to Computing and
Information Technology 2
Meeting #9
Block 2 (Part 6)
Location-based Computing
• Introduction
• Making Use of Location
• Introducing GPS: Where in The World Are You?
• Where’s My Phone?
• I’m Home… Wi-Fi Reveals Location Too
• Indoor Tracking: Just The Place for Bluetooth Beacons
• Summary
316
Introduction
• Where, exactly, are you? How do you know where you are? Who
else knows? And does it matter?
• If I were to ask you these questions, how would you respond? You would
possibly reply to the first question with a physical location that is
meaningful to you (‘at home’, ‘in the living room’), or you might be
travelling, in which case your response might be more vague (‘walking
along the Embankment in London between Waterloo Station and the
Houses of Parliament’).
317
Making Use of Location
• Location-based computing is a form of computing that is used to
support the delivery of location-based services. These services
provide information based on the location of the user.
318
Making Use of Location
• Geocoding refers to the encoding of location information, such as
an address, into a geolocation.
• Reverse geocoding is the process of identifying a location in human
terms (such as an address, or point of interest such as the Eiffel
Tower) from a geolocation.
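The two directions can be sketched with a toy lookup table. The gazetteer below is hypothetical and holds only two approximate entries; real services use very large databases, and the crude comparison of degree offsets in `reverse_geocode` is an assumption that is only adequate for widely separated places.

```python
# Hypothetical gazetteer: place name -> (latitude, longitude) in decimal degrees.
# North and east are positive; south and west are negative.
GAZETTEER = {
    'Eiffel Tower': (48.8584, 2.2945),
    'Milton Keynes': (52.0406, -0.7594),
}

def geocode(place):
    """Turn a place name into a geolocation, or None if it is unknown."""
    return GAZETTEER.get(place)

def reverse_geocode(lat, lon):
    """Return the known place closest to the given geolocation."""
    def squared_offset(place):
        place_lat, place_lon = GAZETTEER[place]
        return (place_lat - lat) ** 2 + (place_lon - lon) ** 2
    return min(GAZETTEER, key=squared_offset)

print(geocode('Milton Keynes'))
print(reverse_geocode(48.86, 2.29))
```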
319
Are We There Yet? Geofencing
• One of the most widespread uses of geolocation data is for route
planning as part of a navigation system, such as in-car GPS. As well as
supporting navigation, such systems know when you’ve reached your
destination. But how?
• One way is to create a range of services based on the location of
someone or something using a technique known as geofencing.
• A geofence is a notional (virtual) geographical boundary that can be
used to define an area within which particular services may be offered
or withheld.
• One issue with revealing location information is that the type of locations
visited might be used to infer information about our activities.
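A circular geofence can be sketched as a centre point plus a radius: a device is inside the fence when its distance from the centre is no more than the radius. The code below uses the haversine formula for the distance; the 200 m radius, the test positions and the function names are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def inside_geofence(position, centre, radius_m):
    """True if the position lies within radius_m of the fence centre."""
    return distance_m(*position, *centre) <= radius_m

destination = (52.0406, -0.7594)   # fence centre (latitude, longitude)
print(inside_geofence((52.0410, -0.7590), destination, 200))   # a few tens of metres away
print(inside_geofence((52.1000, -0.7594), destination, 200))   # several kilometres away
```

A navigation system could treat the moment the first check becomes true as having reached the destination.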
321
Linking Location and Context
• As well as using location data to identify particular locations a user has
visited, it is often possible to use a series of time-stamped location readings
to make better estimates of the user’s current activity.
322
Linking Location and Context
• Location-based technologies may also be used to track, or reveal, the
location of other people.
• Many people have a routine that requires them to move between a small
number of locations each day, such as their home, workplace, children’s
school, etc. It is often possible to use just a few such location updates at
different times of the day to identify significant locations in that individual’s
life, and from those, to identify who that individual actually is.
323
Introducing GPS: Where in The World
Are You?
• The Global Positioning System (GPS) was originally developed for
military use, but has been readily adopted as a consumer-facing
technology, and is the most commonly used satellite navigation system
(hence the abbreviated term ‘satnav’).
• We will look in more detail at how GPS works in Subsection 6.2.3, but
suffice to say for now that GPS is a line-of-sight technology; that is,
there should be an unobstructed view of the satellites from the
receiver, typically limiting the effective operating area of GPS systems
to outdoor locations.
324
Introducing GPS: Where in The World
Are You?
• More recently produced phone-based navigation apps take the form of
fully connected devices, using a data connection provided via the
mobile phone network.
• These are capable of both receiving data from a range of live, context-
based online information services, such as alerts about upcoming traffic
delays, as well as feeding information such as your speed and current
location back to service providers.
• As well as using GPS for individual route-finding, GPS trackers are also
used by many commercial vehicle operators to privately keep track of
the vehicles in their fleet. However, real-time sources of data are also
available for tracking other sorts of vehicle, and increasing amounts of
this information are being made publicly available.
325
Identifying Where You Are: From
Addresses to Locations
326
Latitude and Longitude
• In Activity 6.3, you used a Google Maps search to map the location of a set of
latitude and longitude coordinates. A standard Google search also allows you to
run a web search for the coordinates of a named location. When I searched for the
coordinates of Milton Keynes (see Figure 6.4), I was presented with the result
52.0406° N, 0.7594° W. This represents an
Figure 6.4 Global coordinates of Milton Keynes
328
Latitude and Longitude
329
GPS-based Location Detection
• GPS devices can use latitude and longitude coordinates to
represent and communicate geolocation information in an
unambiguous way.
• Even though millions of separate GPS receivers can use the same
GPS satellites to determine their own location, none of the
satellites need to know anything about any of the receivers.
• Rather than responding to individual requests from each separate
receiver, each satellite broadcasts the same information to all the
GPS receivers that are in sight of it. It is then up to each receiver to
work out where it is for itself.
331
GPS-based Location Detection
• The GPS receiver can compare the received code with the code already
running at the receiver. The received code will appear delayed by the
amount of time that it has taken to propagate from the transmitter – that
is, it will appear to have been shifted in time (see Figure 6.7).
• The adjustment in time needed to bring the received code and the original
code sequences back into alignment is the time it takes for the signal to
travel from the satellite transmitter to the receiver.
• The following calculation shows how we can estimate the time it takes for a
signal to reach the Earth’s surface from a GPS satellite directly above it, at
an altitude of 20 800 km. A radio signal travels at the speed of light, which
is approximately 3 × 10⁸ m/s (300 000 000 meters per second), and which
we assume to be constant.
332
GPS-based Location Detection
• Where time is in seconds, distance in meters and speed in meters
per second.
• As we know the distance from the receiver to the satellite and the
speed of light, we can rearrange the equation to calculate the time
as follows:
• time = distance / speed
333
GPS-based Location Detection
• By expressing the calculation in scientific notation, we
can divide the first power-of-10 term by the second by
subtracting the exponent of the second number (the
denominator) from the exponent of the first number
(the numerator):
time = (2.08 × 10⁷) / (3 × 10⁸) s = (2.08 / 3) × 10⁷⁻⁸ s = 0.69 × 10⁻¹ s
(to 2 significant figures)
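The same calculation can be checked in a few lines of Python. This is only a restatement of time = distance / speed using the altitude and the approximate speed of light given in these slides; the variable names are chosen for the example.

```python
SPEED_OF_LIGHT = 3e8        # metres per second (approximate)
ALTITUDE_KM = 20_800        # satellite directly overhead

distance_m = ALTITUDE_KM * 1000           # 2.08 × 10^7 m
time_s = distance_m / SPEED_OF_LIGHT      # time = distance / speed

print(f'{time_s:.2g} s')                  # rounded to 2 significant figures
print(f'{time_s * 1000:.0f} ms')          # the same delay in milliseconds
```

The delay is roughly 0.069 s, or about 69 milliseconds.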
337
Localization Using Triangulation and
Trilateration
• If you enable your mobile device to supply location information to
other networked services, they may use that information to deliver
location-dependent services back to your device. But it is also
possible for networks to become aware of your location based on
the physical location from where you connect to the network.
• Using a compass and an accurate scale map, you can locate your
position using two or more bearings. (A bearing is an angle
measured from due north.) Identify two or more visible points that
you can see from your current location and that you can also
identify on the map.
338
Localization Using Triangulation and
Trilateration
• Draw a line through each observed point along the bearing on which it
was measured; your location is the point at which the lines
intersect. This technique is known as triangulation, and it uses
angles to known locations to locate your position. However, if we
are trying to identify our location based on just our distance from
multiple cell towers (or GPS satellites), we need a slightly different
technique called trilateration. Note that triangulation is often used
to refer to both trilateration and triangulation. A more general term
– localization – is also used to cover these two techniques (and
more).
340
Approximate Location Determination
• Figure 6.9 shows two base stations, M1 and M2, with location
coordinates (2, 4) and (5, 2), respectively. The circles represent the
transmission range of the base stations. Mobile device A is in range
of M1 and mobile device B is in range of M2. Using the technique
described, device A can estimate its location to be the coordinates
of M1 – in other words, the estimated location of device A is (2, 4).
Similarly, the estimated location of device B is (5, 2).
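A minimal sketch of this approximate technique follows. The base station coordinates come from the Figure 6.9 description; the transmission range and the devices' true positions are hypothetical values chosen so that each device is in range of exactly one base station.

```python
from math import dist

# Base stations from Figure 6.9: name -> location coordinates.
BASE_STATIONS = {'M1': (2, 4), 'M2': (5, 2)}
RANGE = 1.5   # hypothetical transmission range, same units as the coordinates

def estimate_location(true_position):
    """Estimate a device's location as the coordinates of a base station in range."""
    for name, coords in BASE_STATIONS.items():
        if dist(true_position, coords) <= RANGE:
            return coords
    return None   # no base station in range

# Hypothetical true positions for devices A and B (unknown to the devices themselves).
print(estimate_location((2.5, 4.5)))   # device A
print(estimate_location((5.5, 1.0)))   # device B
```

The estimate can be wrong by up to the full transmission range, which is why the method is only approximate.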
342
I’m Home… Wi-Fi Reveals Location Too
• In the previous two sections, you learned how GPS and cell-
tower localization can be used to find the location of a mobile
device.
• But you may also have noticed how your web browser
occasionally prompts you for location information when using
your desktop or laptop computer. (Browser location services
can be enabled and disabled through your browser
settings/preferences.)
• Just as each cell tower has a unique Cell ID, each Wi-Fi router also
has a unique identifier associated with it in the form of a MAC
address.
344
Wi-Fi-hotspot-based localization
• Several global databases exist that have location information
associated with Wi-Fi router MAC addresses. These location/MAC
address pairs are often harvested from location-aware mobile
devices that have encountered the Wi-Fi router at a particular
location. As such, the location information may not be reliable – for
example, if the router has been moved since its location was last
confirmed.
• If you look up one or more Wi-Fi router MAC addresses, you can use
the locations associated with them to roughly locate your position.
If you also know the power of the signal from the router, and an
estimate of the signal power associated with it, you can use that
information to improve your estimate of your distance to the
router, and hence improve the accuracy of your computed location.
345
Wi-Fi-hotspot-based localization
• When you grant permission to a website allowing it to use your
location, the browser may request information from the computer
about in-range Wi-Fi router MAC addresses. It can then submit
these addresses to a lookup service which returns the estimated
location of the browser based on the location of the identified Wi-Fi
routers.
346
Tracking Devices Using Wi-Fi
• Can the Wi-Fi network also be used to track the movements of users
carrying Wi-Fi enabled devices?
• There are two ways in which Wi-Fi enabled devices can identify any
nearby access points (Haigh, 2014):
active mode: the typical default mode, in which the device broadcasts
a ‘Who is there?’ packet on each channel and waits for a response. The
broadcast message includes the device MAC address, which means
that the device reveals itself to anyone who happens to be listening.
347
Passing trade: Bluetooth Beacons and
Contextual Alerts
• Originally developed by Ericsson Mobile Communications to replace wired
connections to mobile phones, Bluetooth is now a widely adopted
technology for providing short-range radio communications between
devices. Another major application area is connecting peripherals such as
audio speakers to media players or wearables to mobile phones.
348
Passing trade: Bluetooth Beacons and
Contextual Alerts
• In contrast to GPS-based services, a predominantly outdoors technology
where a device locates itself with reference to multiple GPS satellites,
beacon technology typically associates a fixed indoor location with a
beacon identifier that alerts passing receivers to the presence of that
beacon.
349
Localization Using Bluetooth Beacons
• Suppose now that there are three beacons in range. Based on the power of
the signal received from each beacon, we can work out the distance to
each beacon, but not necessarily in which direction it is. However, we also
have the identifier for each beacon, which can be used to look up its
physical location. Knowing the location of the three beacons, and the
distance to each of them, allows us to locate the receiver.
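The final step (known beacon positions plus a distance to each) can be sketched as a small trilateration routine. Subtracting the circle equations in pairs leaves two linear equations in the receiver's x and y coordinates. The beacon positions, the receiver position and the function name `trilaterate` are assumptions made for this example, and the distances are taken to be exact, which real power-based estimates are not.

```python
from math import dist, isclose

def trilaterate(beacons, distances):
    """Locate a receiver from three beacon positions and the distance to each."""
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = distances
    # Subtracting pairs of circle equations leaves two linear equations:
    #   a*x + b*y = c   and   d*x + e*y = f
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x2), 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError('beacons lie on a straight line; position is ambiguous')
    return ((c * e - b * f) / det, (a * f - c * d) / det)

beacons = [(0, 0), (4, 0), (0, 3)]     # hypothetical positions looked up from beacon identifiers
receiver = (1, 1)                      # true position, used only to simulate the measurements
distances = [dist(receiver, b) for b in beacons]
print(trilaterate(beacons, distances))
```

With noisy distance estimates the three circles rarely meet at a single point, so practical systems look for a best-fit position instead.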
351
Localization Using Bluetooth Beacons
• In Figure 6.11, three beacons are in range of a receiver. The receiver uses
power readings (the strength of the signals received from the beacons) and
power ratings (the power level 1 m away from each beacon) to calculate the
distance to each beacon. Circles are drawn around each beacon to indicate the
measured distance the receiver is from the beacon – we just don’t know in which
direction each beacon is.
Figure 6.11 An idealised view of localization using trilateration
Meeting #10
Block 2 (Part 7)
Dangerous Data
• Introduction
• Online – The New Front Line
• Information Assets
• Authentication
• Malware
• Cyberwar
• Summary
355
Introduction
• The explosion in goods and services available online, as well as our
society’s desire to socialize online, has made the internet an irresistible
target for people who wish to do us harm, ranging from criminals who
want to steal our money and our identities to those who abuse
vulnerable people.
• The term hacker has historically been a divisive one, sometimes being used
as a term of admiration for an individual who exhibits a high degree of skill,
as well as creativity in his or her approach to technical problems, and
sometimes (more commonly) applied to an individual who uses this skill for
illegal or unethical purposes.
356
357
Spear Phishing: The Targeted Attack
You have almost certainly received spam email supposedly coming
from a bank or another company telling you there is a problem with
your account. These emails are phishing for information. Their
senders hope you will respond and provide personal information that
can be used to commit fraud.
358
Spear Phishing: The Targeted
Attack
• This type of targeting is called a spear phishing attack. A 2012 estimate
(TrendLabs, 2012) suggested that 91% of targeted attacks used spear
phishing at some point. Spear phishing focused on senior
management (who are most likely to have privileged access to
information) is known as whaling.
• Cleaver’s targets had received emails saying that they were being
considered for an important job. They were asked to complete a CV by
following a link to the website easyresumecreatorpro.com where they
could download a copy of a well-known tool called Easy Resume
Creator Pro.
359
The Final Attack: Malicious Software
• Cleaver’s developers created a new, malicious version of the CV writing
application which could be downloaded from
easyresumecreatorpro.com.
• Just like the original, the application allowed users to create a new CV.
When complete, users were encouraged to upload their document so it
could be reviewed by potential employers. In fact, nothing was
uploaded; submitting the CV activated malicious software that had
been downloaded along with the application.
362
Passwords
I. Identification. The process of claiming you are a particular
individual. In our example, when you hand over your AOU
university ID to the university campus security officer, you
identify yourself as an AOU student. Identification doesn’t
prove that you are telling the truth; although you presented a
university ID, you might be using a false one.
365
What Happens When You Enter A
Password?
• Imagine you had to create a computer password system for a
website. You might start off by having a user enter their password,
which is transmitted to the site’s server and compared to a stored
password. Only if the two match is the user allowed into the site.
• Even if the password file for every user is stolen, the attackers still
don’t know the actual passwords they need to enter in order to
access the computer. The users are not immediately at risk.
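The reason a stolen password file does not immediately reveal passwords is that such a file can hold hashes rather than the passwords themselves. The sketch below shows the idea; the choice of SHA-256, the example password and the names `hash_password` and `check_login` are assumptions, since these slides do not say which hash function is used.

```python
import hashlib
import hmac

def hash_password(password):
    """Return the SHA-256 hash of a password as a hexadecimal string."""
    return hashlib.sha256(password.encode('utf-8')).hexdigest()

# Hypothetical password file: only the hashes are stored, never the passwords.
password_file = {'Fadi2020': hash_password('correct horse')}

def check_login(username, attempt):
    """Hash the attempt and compare it with the stored hash for that user."""
    stored = password_file.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(attempt))

print(check_login('Fadi2020', 'correct horse'))
print(check_login('Fadi2020', 'qwerty'))
```

The server never needs to keep the original password: it hashes whatever the user types and compares the result with the stored hash.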
369
Attacking Passwords
• Two common techniques are used to obtain passwords:
371
A record from a stolen password file:
Username    Hashed password
Fadi2020    570a90bfbf8c7eab5dc5d4e26832d5b1

Plaintext   Hash
samar       7294001ae51b8cdfd50eb4459ee28182
Fadi2020    570a90bfbf8c7eab5dc5d4e26832d5b1
2006199     d5aa1729c8c253e5d917a5264855eab8
qwerty      daa759be97f37e5f7eff5883801aebed
• Hence, hashing, alone, cannot protect passwords from dictionary attacks if
the original password can be found in a dictionary. Matching a hash in the
password file with one from the hashed dictionary means that they
represent the same piece of plaintext.
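The matching step can be sketched in a few lines. The hash function here is SHA-256, chosen as an assumption because the slides do not name the function behind the table, so the hash values produced will differ from those shown. The stolen records are invented for the example.

```python
import hashlib

def hash_password(password):
    """Return the SHA-256 hash of a password as a hexadecimal string."""
    return hashlib.sha256(password.encode('utf-8')).hexdigest()

def build_hashed_dictionary(words):
    """Map the hash of every dictionary word back to its plaintext."""
    return {hash_password(word): word for word in words}

def dictionary_attack(stolen_records, hashed_dictionary):
    """Return the usernames whose stolen hash matches a dictionary entry."""
    return {user: hashed_dictionary[h]
            for user, h in stolen_records.items() if h in hashed_dictionary}

dictionary = ['samar', 'Fadi2020', '2006199', 'qwerty']   # plaintexts from the table above
stolen = {                                                 # hypothetical stolen records
    'Fadi2020': hash_password('Fadi2020'),
    'Samar': hash_password('MHpKQCvpYoou'),                # not a dictionary word
}
print(dictionary_attack(stolen, build_hashed_dictionary(dictionary)))
```

Only the account whose password appears in the dictionary is recovered; the other hash matches nothing and stays protected.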
372
Non-technical Attacks
• Rather than try to steal and break a password file, attackers may
risk stealing passwords from offices and other workplaces.
It takes only a few moments and a removable flash memory drive for
an attacker to install a keylogger program which captures passwords
as they are entered on the keyboard.
373
Password Managers
• A password manager is a computer application that stores passwords in
an encrypted database.
• Most password managers can create new passwords; since computers can
generate and store arbitrarily long pieces of nonsense text – such as
MHpKQCvpYoouTAaPiiWuFKjpNe7qnsbwkrvq3s3cX – password managers
can produce passwords that are highly resistant to both brute-force and
dictionary attacks.
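As a minimal sketch of how such a password might be generated, the code below uses Python's standard `secrets` module. The 40-character length and the alphabet of letters and digits are assumptions; real password managers offer many more options.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # 62 possible characters

def generate_password(length=40):
    """Build a password from characters chosen by a cryptographically secure generator."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

password = generate_password()
print(password, len(password))
# With 62 possible characters per position there are 62**40 candidates for brute force.
print(f'{62 ** 40:.2e} possible passwords')
```

Because the characters are chosen at random, the result will not appear in any dictionary, and the number of possible passwords is far too large to try one by one.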
374
Two-factor Authentication
• So, if one password isn’t secure enough, perhaps having two pieces
of information to perform authentication will be more secure? So-
called two-factor authentication will be familiar to you as you will
have used it to withdraw money from an ATM. Here, you must give
the bank two pieces of information:
• In this case, the possession is the data stored on your bank card;
the information you know is your PIN. Individually, neither can
access your account, but when brought together they allow you to
withdraw money.
375
Hardware Security Tokens
• Many banks offer two-factor authentication to online banking
customers, with accounts accessed using a combination of a password
and a four- or six-digit number generated by a small hardware security
token that can be kept in a wallet or attached to a keychain.
• When a user logs in to their bank, they are asked to enter the token’s
one-time password into their browser. The bank’s computer will have
also generated the same number. The two values are compared by the
bank; if they match, the user is allowed into their account.
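One way the token and the bank could arrive at the same number is sketched below. It assumes an HOTP-style scheme (HMAC-SHA1 over a shared counter, as standardised in RFC 4226); the slides do not say which scheme any particular bank uses, and the shared secret here is only a test value.

```python
import hashlib
import hmac
import struct

def one_time_password(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP-style code: HMAC-SHA1 of the counter, truncated to a few digits."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation offset
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

# Hypothetical secret, stored in the token and on the bank's computer.
shared_secret = b"12345678901234567890"

token_code = one_time_password(shared_secret, counter=0)   # shown on the token
bank_code = one_time_password(shared_secret, counter=0)    # computed by the bank
print(token_code, hmac.compare_digest(token_code, bank_code))
```

Both sides hold the same secret and counter, so they compute the same value without the secret ever being sent over the network.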
376
How Big is The Threat From
Malicious Software
• By 2014, nearly one million new pieces of malware (a contraction of
‘malicious software’) were released every day.
377
How Big is The Threat From
Malicious Software
• Since 2013, a range of malware programs have targeted PCs, quietly
and quickly encrypting crucial data so that it can no longer be accessed
without paying a ransom. If this ransom, sometimes running into
hundreds of pounds, is not paid, the data will be irretrievably lost.
• Some of this ransomware has been linked to organized crime. Just one
piece of ransomware, called Cryptolocker, is estimated to have
‘earned’ $27 million for its creators. In other cases, ransomware
appears to be primarily intended to cause disruption, such as the
WannaCry program which crippled computers in 150 countries during
May 2017, including those belonging to the NHS, O2, Nissan, FedEx
and Russian Railways.
378
What is Malware?
i. viruses
ii. worms
iii. Trojans.
379
Viruses
• A virus is a program capable of making new copies of itself which are
inserted into applications, data or crucial areas of a computer’s hard
disk.
• Viruses are attached to specific applications on a computer and are
activated when that program first runs.
Most computer viruses are built from three main programming components:
The trigger
An event or condition that activates the virus. The trigger can include a
certain date or time, or an action.
The payload
The destructive code that forms the heart of the virus, which can perform
such tasks as corrupting, destroying or encrypting a user’s data or damaging
the operating system.
380
Worms
• Like a virus, a worm is a self-replicating program designed to make
copies of itself. Unlike a virus, a worm is a standalone application.
Most worms spread through network connections.
• Worms can use triggers to remain dormant on infected machines
until certain times or conditions whereupon their payload is
activated.
381
Trojans
• Unlike viruses and worms, Trojans are not self-replicating; instead, they
are often distributed by email or pop-up adverts on websites,
masquerading as legitimate applications such as screensavers. The Trojan
might even work as advertised – a download accelerator might result in
(slightly) faster downloading, but the Trojan will also contain a destructive
payload.
382
Other Types of Malware
• In addition to the three types of malware described above and the
ransomware discussed earlier, you may see references to other forms of
malware, including the following.
• Adware
Forces users to view advertising and may report their internet use to
advertisers or its creators.
• Spyware
Attempts to access personal information by monitoring keystrokes
or patterns of activity.
• Rootkits
Hidden programs used by attackers to remotely control or access a
computer.
• Hijackers
Redirect browsers to unwanted websites, either to earn advertising
clicks or to download further malware. Some of the sites
masquerade as legitimate websites and are designed to harvest
personal information such as logins and credit card details.
383
Botnets
• One strand of malware is concerned with recruiting computers into
an army of infected machines coordinated over the internet to
perform a malicious task. Affected machines are called zombies,
whilst their network is known as a botnet (or zombie army).
Individual botnets may consist of tens of thousands or even
millions of machines spread across the world, giving the owner of
the botnet enormous power to cause damage.
384
Botnets
385
Botnets
• Botnets Fall Into Two Broad Categories:
386
Botnets
• Botnets can also be used for the following:
• Spam email
Zombies can be used to send spam messages to every contact in
their address book.
• Click fraud
Most online advertising is paid for on a ‘per-click’ basis, with an
advertiser paying each time a user clicks on an advert. Click fraud
uses software to simulate clicking on an advert.
• Brute-force decryption
Passwords and other forms of secure data can be attacked by brute
force. Botnets share the task amongst many machines, allowing for
faster decryption.
387
Botnets
• Bitcoin mining
Bitcoins are produced through a complex mathematical process
requiring huge amounts of computer power. Rather than invest in
their own computers, criminals can use botnets to create new
Bitcoins.
• Denial-of-service (DoS) attacks
DoS is a method of attacking computers by flooding their network
connections with spurious data that prevents legitimate traffic from
being sent or received. Denial of service can cripple online services if
sufficient traffic can be directed at one site.
Botnets allow thousands or even millions of zombies to collaborate
in an attack; since the attackers are spread across the internet, such
an attack is described as a distributed denial-of-service (DDoS)
attack.
388
Antivirus Software
• Antivirus software aims to detect, isolate and, if necessary, delete malware
on a computer before it can harm data. Antivirus software uses several
techniques to identify malware; the two most common are known as
signatures and heuristics.
• Signatures
390
Summary
• This part introduced you to cybersecurity, a topic relevant to you as
an individual as well as our society. Awareness of computer security
not only protects you, your family and your data; it is a key
academic skill for anyone wishing to work in the modern IT and
computing industries. It is no longer acceptable, or safe, for devices
and software to fail to include security features that affect their
usability or the safety of their users.
• You have met several key cybersecurity technologies, including
how passwords are processed by computers and how they can be
broken – a topic we will return to in Block 3 Part 3.
Meeting #11
Block 3 (Part 1)
Data on Your Computer: A Private Investigation!
• Introduction
• Hard Disk Drives
• Solid-State Drives (SSDs)
• Securing and Analyzing A Hard Drive
• System Files and Deleted Files
• Analyzing Main Memory (RAM) and Closing The Case
• Summary
393
The Structure of A Hard Disk Drive
• A hard disk is part of a unit -- often called a disk drive, hard drive or hard
disk drive -- that stores and provides relatively quick access to large
amounts of data on an electromagnetically charged surface or set of
surfaces. Today's computers typically come with a hard disk that can
contain anywhere from billions to trillions of bytes of storage.
394
Formatting A Hard Disk
• If there are lots of different operating systems, each with its own kind of
file system, do you need a different kind of disk drive for each? The
answer is no.
This is because we can prepare almost any hard disk drive to work with
any operating system and its file system by going through a process,
called formatting it, before we try to save any data to it.
• The most important thing that happens when a disk is formatted is that at
least one area of the disk must be loaded with the operating system’s file
system in readiness for it to store data. These areas are called partitions.
If you want to run more than one operating system on your machine, you
can even create partitions that have different file systems.
• If you have more than one partition, the formatting process will cause
them to be displayed as separate drives by your operating system –
for example, in Windows Explorer or Finder on a Mac.
• Once a disk has been formatted, you can write data to it.
395
Formatting A Hard Disk
1. Each platter of a hard disk is divided into several concentric tracks.
2. Each track is divided into several sectors, each of which can store the same
amount of data. A sector is the smallest physical storage unit on the disk,
and on most file systems it is fixed at 512 bytes in size.
3. A cluster can consist of one or more consecutive sectors – commonly, a
cluster will have 4 or 8 sectors. As a file is written to the disk, the file
system allocates the appropriate whole number of clusters to store the
file’s data.
396
Formatting A Hard Disk
• In older hard disk drives, the sectors on the outside of the disk had a larger
area than those closer to the center, which meant that they held fewer bits
per unit area and were less efficient at storing data than the inner sectors.
• On modern disks, each sector has the same area so they each store the same
number of bits per unit area.
• Figure 1.5 A comparison of sectors on an older and more modern hard disk
drive
397
Formatting A Hard Disk
• So once a file has been written to one or more clusters,
how does the operating system know where to find the
file again?
• FAT, which stands for File Allocation Table, is the area of the hard
disk that is used as an index of every cluster on the disk and records
whether a cluster is being used or not. It is what is at the heart of the
file system called FAT32, which used to be used by Windows
operating systems, but is now mainly used with solid-state memory,
such as flash. It can only cope with a maximum file size of 4 GB.
398
Formatting A Hard Disk
• Windows computers now mainly use a file system called New
Technology File System (NTFS), where a table called a Master File
Table (MFT) does a similar job.
• Apple has a file system that is unimaginatively called The Apple File
System, and the Linux file system is called ext4. They each have
similar tables.
• When a file is deleted, the operating system doesn’t erase the file; it
simply makes the clusters that the file occupies available for
reallocation. So the data is still there until it is overwritten, but there
is no reference to it in the file allocation table. However, the
operating system at some point might allocate a new file to one of
those clusters, which overwrites the original data.
• So even if a file has been deleted, the data might still be right there
on the hard disk if it hasn’t been overwritten yet.
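The behaviour described above can be modelled with a toy file table. This is a simplified sketch, not how FAT32 or NTFS is actually implemented; the eight-cluster 'disk' and the function names are invented for illustration.

```python
# Toy model: a list of clusters plus an index from file names to cluster numbers.
clusters = [b""] * 8
file_table = {}            # file name -> list of cluster numbers
free = set(range(8))       # cluster numbers available for allocation

def write_file(name, chunks):
    """Store each chunk in the lowest-numbered free cluster."""
    used = []
    for chunk in chunks:
        number = min(free)
        free.remove(number)
        clusters[number] = chunk
        used.append(number)
    file_table[name] = used

def delete_file(name):
    """Remove the index entry only; the cluster contents are left untouched."""
    free.update(file_table.pop(name))

write_file("plan.txt", [b"secret ", b"plan"])
delete_file("plan.txt")
after_delete = clusters[0] + clusters[1]
print("plan.txt" in file_table, after_delete)    # no index entry, data still there

write_file("new.txt", [b"hello"])                # reuses cluster 0
print(clusters[0], clusters[1])                  # cluster 1 still holds old data
```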
400
Deleting Data From An HDD
• As we have seen, the cluster system means that there is almost always some
unused space in the last cluster when a file is stored. The amount of unused
space depends on the cluster size and the file size. The logical size of a file is
a measure of the number of bytes of data a file actually contains. Its physical
size is almost always bigger than this because it has to be stored in a discrete
number of clusters.
• So, for example, take a file that has a logical size of 1280 bytes. In a system
where there are 4 sectors of 512 bytes in a cluster, the file takes up a whole
cluster (or 2048 bytes), which means that the physical size of the file is 2048
bytes. The difference between 2048 and 1280 is 768, which means that there
is a slack space of 768 bytes.
• This leftover data, which is called latent data or ambient data, can
provide investigators with clues as to what was originally stored in the
whole cluster, which may in turn provide leads for other enquiries.
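The arithmetic in the example above can be checked with a short sketch. The function name and default values are chosen for illustration; the defaults match the example of 4 sectors of 512 bytes per cluster.

```python
import math

def physical_size(logical_size, sector_size=512, sectors_per_cluster=4):
    """Round the logical size up to a whole number of clusters."""
    cluster_size = sector_size * sectors_per_cluster
    clusters_needed = math.ceil(logical_size / cluster_size)
    return clusters_needed * cluster_size

logical = 1280
physical = physical_size(logical)
slack = physical - logical
print(physical, slack)   # 2048 768
```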
402
How to Permanently Delete Data
From An HDD
• For a hard disk drive, there are only three sure-fire ways:
• When you defragment (or defrag) a hard disk, you are using a
software utility that moves the chunks of files to try to arrange them
in contiguous clusters.
Figure 1.8 (a) A fragmented file occupying 6 clusters and (b) the same
file after defragmentation
404
The Growth of Solid-State Drives
• SSDs are solid-state drives, which use integrated circuits to store
data. They use a technology called flash memory.
• The file and operating systems still maintain the same system of
dividing the memory into logical sectors and clusters, even though
the physical form of a solid-state drive is very different to that of a
spinning disk.
• The operating system doesn’t need to know what physical type of
drive it is reading data from, or writing to, as long as it understands
the logical file storage structure defined by the file system.
405
Figure 1.9 A forecast of the growth of the worldwide use of SSDs in PCs
Comparing SSDs and HDDs
There are many advantages for SSDs over HDDs:
406
How Flash Memory Works
• On a microscopic level, SSDs are made up of semiconducting materials
that are configured so that they create a whole series of tiny electrically
insulated boxes, which act as memory cells.
407
Deleting Data From An SSD
• You can still physically destroy the drive, but degaussing does not work
because SSDs do not rely on magnetism to store zeros and ones.
• The ATA Secure Erase command resets the whole of the SSD by applying a spike of
voltage to all of the memory cells simultaneously, flushing out all of the stored
electrons and forcing the drive to ‘forget’ all of its data.
408
Copying The Hard Drive and
Allocating A Hash Code
• As you know, data is represented by bits in computer storage and we are
going to make what is called a disk image of the hard drive – that is, we are
going to copy it, bit for bit. This is a process called dead system imaging,
because we have removed the hard disk from a switched off computer.
• It will take a while, though. Take a look at that hard drive – can you see
where it says 4 TB? That means there are four terabytes’ worth of bits to
copy over, one at a time. And remember that each terabyte is ~10^12 bytes and
each byte has 8 bits.
• At the same time as we copy it, we are going to ensure that the image
cannot be accidentally changed in any way. We are going to use a write
blocker for that – it is a piece of software that makes the image disk
read-only. Then we can work on extracting the data from the image, leaving the
original hard drive untouched. To be sure, we will seal it in an evidence bag as
soon as the copy is made.
409
Copying The Hard Drive and
Allocating A Hash Code
• The piece of software that I will use to make the disk image will also run an
algorithm that calculates a number, called a hash code, from all of the 0s and
1s on the original disk. This hash code provides a single number that is much
smaller than the total number of bits on the disk.
• Once we have made the disk image, we will use the same process to calculate
the hash code for that too. If the hash codes match, we can be certain (or
virtually certain, anyway) that the disk image is a true bit-for-bit copy of the
original disk, which means that we can do all of our investigations on the image
disk, rather than the original disk.
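A minimal sketch of the comparison, assuming SHA-256 as the hashing algorithm (the slides do not name the one used by the imaging software). In-memory byte streams stand in for the original disk and its image.

```python
import hashlib
import io

def hash_stream(stream, chunk_size=1024 * 1024):
    """Hash a stream chunk by chunk, so a large disk need not fit in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()

disk_bytes = b"\x00\x01" * 5000                  # stand-in for the original disk
original_hash = hash_stream(io.BytesIO(disk_bytes))

image_hash = hash_stream(io.BytesIO(disk_bytes))                    # true copy
altered_hash = hash_stream(io.BytesIO(b"\x01" + disk_bytes[1:]))    # one byte changed

print(image_hash == original_hash)     # matching hashes: image is a faithful copy
print(altered_hash == original_hash)   # a single changed byte alters the hash
```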
410
Copying The Hard Drive and
Allocating A Hash Code
How is this hash code calculated?
• Consider a simple hash code algorithm. Say the original
disk only contains a binary representation of the name
‘TAM’. We could reduce this to a single number by adding up
the ASCII codes for ‘T’, ‘A’ and ‘M’, which gives us a hash code
of 84 + 65 + 77 = 226.
• One way to deal with that problem is to use the modulus operator (%).
• A smaller hash code is often obtained by finding the modulus with a prime
number. So, for example, if we take the weighted value that we got for ‘TAM’
we can find its modulus with a prime number such as 23. So 445 % 23 = 8 and
we can use 8 as the hash code for ‘TAM’.
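The three hash codes mentioned above can be sketched as follows. The weighting scheme is an assumption: multiplying each ASCII code by its position (1, 2, 3, …) reproduces the weighted value 445 quoted for ‘TAM’, since 1 × 84 + 2 × 65 + 3 × 77 = 445. The anagram ‘MAT’ is added here only to illustrate why a plain sum is a weak hash.

```python
def sum_hash(text):
    """Add up the ASCII codes of the characters."""
    return sum(ord(ch) for ch in text)

def weighted_hash(text):
    # Assumed weighting: first character x1, second x2, third x3, ...
    return sum(position * ord(ch) for position, ch in enumerate(text, start=1))

def modulus_hash(text, prime=23):
    """Shrink the weighted value by taking the remainder on division by a prime."""
    return weighted_hash(text) % prime

print(sum_hash("TAM"), sum_hash("MAT"))            # 226 226
print(weighted_hash("TAM"), weighted_hash("MAT"))  # 445 459
print(modulus_hash("TAM"))                         # 8
```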
412
Reading The Hard Drive
• We need an image mounter. This is a piece of software that enables
the operating system to read and write data to a disk image. Except,
of course, in this case we can’t write anything to our image because
we used the write blocker when we created it. Once the disk image
has been mounted, its content will appear just as a physical disk
would in the computer, so it will look like another drive in Windows
Explorer or Finder on a Mac.
413
Timestamps and Other Metadata
• Metadata is a set of data that describes and gives information about
other data. The important pieces of metadata about a file kept by
any file system include the file’s name, size and path, as well as lots
of other information. It also keeps timestamps, which tell you when
a file was created, modified or deleted.
• You can see that the file size and the size on disk are different. The
first must be the logical size and the second must be the physical
size.
• The ‘Modify’ timestamp tells us the last time that the content of the
file was modified.
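As a rough illustration, some of this metadata can be read from a running system with Python's standard library. The temporary file is created only for the demonstration; a forensic tool would read the same kind of fields from the mounted disk image instead.

```python
import os
import tempfile
from datetime import datetime

# Create a small throwaway file so that there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(b"Business plan draft")
    path = handle.name

info = os.stat(path)
metadata = {
    "name": os.path.basename(path),
    "path": path,
    "logical_size": info.st_size,                        # bytes of actual content
    "modified": datetime.fromtimestamp(info.st_mtime),   # the 'Modify' timestamp
}
print(metadata)
os.remove(path)
```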
415
The Recycle Bin and Soft-Deleted Files
• Assuming it was in the ‘Business plan’ folder, it looks as if it was deleted, so it
may be gone forever.
• We might be in luck if it was deleted using a soft delete: This is when a file is
deleted, either by pressing the delete button or dragging it to the Recycle
Bin (or the trash can in some operating systems).
• In fact, the file stays exactly where it is on the physical disk, whether it is an
HDD or an SSD. However, the operating system renames the deleted file
with a name that starts with $R and creates an associated file, the $I file, to
contain metadata about the deleted file. It then stores this new file in a
hidden location on the hard drive.
• If you want to recover your file from the Recycle Bin, you need to select it and
choose the option to restore it. When you do this, the data in the $I file in the
hidden location enables the operating system to reinstate its original path and
name, so it can be opened from its original folder in the Explorer window.
416
Hard Deletes
• Recycle Bin keeps your soft deleted files until the garbage consumes
about 5 percent of your computer’s available space. Then it purges
your oldest deleted files to make room for the new ones.
• When a file is hard deleted from the Recycle Bin of a hard disk, the
data still exists in its original location, but the loss of the $I file and
the removal of any reference to the $R file in the hidden folder
means that the operating system cannot locate the file any more. So
the space it occupied is released so something else can be stored
there.
But until the space is overwritten, the original content is still there.
417
File Carving
There is software available, called file carving or sometimes data carving
software.
• Once the software thinks it has found a file format it recognizes, it
does some further checks on the subsequent bytes to see if they are
compatible with the kind of file identified.
• The software then tries to find the end of the file.
• If the end of the file can’t be found, or if the beginning of the file has
been overwritten or if all else fails, the file carving software will at
least guess where the file ends, knowing where the next header
starts.
• However, if the header is missing, these basic data carving techniques
won’t work.
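The steps above can be sketched for one file type. This is a simplified illustration that assumes JPEG files, which start with the bytes FF D8 FF and end with FF D9; real carving tools perform many more checks, and the test data is invented.

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

def carve_jpegs(raw: bytes):
    """Return byte runs that look like JPEG files inside raw disk data."""
    found = []
    start = raw.find(JPEG_HEADER)
    while start != -1:
        body_start = start + len(JPEG_HEADER)
        next_header = raw.find(JPEG_HEADER, body_start)
        end = raw.find(JPEG_FOOTER, body_start)
        if end != -1 and (next_header == -1 or end < next_header):
            # Normal case: the footer was found before any other header.
            found.append(raw[start:end + len(JPEG_FOOTER)])
        elif next_header != -1:
            # No footer found: guess that the file ends where the next one starts.
            found.append(raw[start:next_header])
        start = next_header
    return found

raw_disk = (b"junk" + JPEG_HEADER + b"AAAA" + JPEG_FOOTER + b"more junk"
            + JPEG_HEADER + b"BBBB" + b"\x00\x00"
            + JPEG_HEADER + b"CC" + JPEG_FOOTER)
carved = carve_jpegs(raw_disk)
print(len(carved))
```

The second file in the test data has lost its footer, so its end is guessed from the position of the next header.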
418
File Carving
• File carving doesn’t work on SSDs, because the TRIM function will
ensure that the unallocated and slack space will be overwritten, so
there is nothing to find.
• The software we are going to use will let us select the particular types
of file that we are interested in finding. This speeds up the process.
• What the output looks like depends on the software package, but this one
will copy all of the matching file types that it finds onto a new disk.
Then we can search it using filters, such as keywords, to try to find
the specific file that we are looking for.
419
Live Acquisition of Main Memory (RAM)
• The rule for forensic investigations is that if a computer is running when it is
first encountered, then leave it running. If it is not running, then don’t boot it
up (live acquisition).
• Live forensic acquisition provides for digital evidence collection in the order
that acknowledges the volatility of the evidence and collects it to maximize
the preservation of evidence.
• As you recall, RAM is the memory that is used to store instructions and data
just before they are required by the processor.
• RAM is also the place where the operating system is loaded. So it also
contains information about what processes and programs are running, which
networks the computer is connected to, passwords, files that have been
decrypted and the keys that were used to decrypt them.
420
RAM Data Recovery
Then there are registry hives:
• The registry is an area of RAM that is used to store the lowest-level settings
of the operating system.
• A hive is just a space within this registry area. Each time a new user logs onto
a computer, a new hive is created for that user that contains registry
information about their profile, such as their settings, desktop, environment,
network connections and printers.
There may also be file data ‘temporarily’ stored in RAM before it is written to the
hard drive, so RAM analysis can reveal a lot of important information about a
system and its users.
The data stored in RAM disappears when the power is switched off. So, at any
given moment, the state of a system’s volatile memory is not reproducible.
That is why we have to use ‘live acquisition’ if we want to find evidence in RAM.
421
Summary
• You have learned a lot in this part: how data is stored
on a hard disk and in solid-state memory, and how
difficult it is to entirely delete data. You now know how to
collect digital evidence, what metadata is and
how it can be found, and how to carve data from hard
disks and RAM.
422
TM112: Introduction to Computing and
Information Technology 2
Meeting #12
Block 3 (Part 3)
Cryptography: The Secret of Keeping Secrets
• Introduction
• Hashing
• Ciphers and Keys: An Introduction to Encryption
• Symmetric Encryption
• Turning The World Upside Down: Asymmetric Cryptography
• Summary
424
Introduction
• Computer security technologies are a double-edged sword: they
not only protect legitimate users from attack, but they can also hide
criminals from law enforcement. The history of computer security
has always been a balance between those who see these
technologies as a benefit to society and those who consider them a
great threat.
425
Hashing
• We used hashing earlier to obscure passwords stored on computers. In
this context, hashing is used to hide the actual value of the password
from prying eyes, but hashing has many more uses and is crucial to a
wide range of computer technologies.
• Hashing is useful because of two related characteristics:
1. It is a ‘one-way’ operation.
2. A variation of a single bit of data between two otherwise
identical files will result in vastly different hash values
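The second characteristic can be observed directly. The sketch below assumes SHA-256 and an invented message; it flips a single bit of the input and counts how many bits of the hash change.

```python
import hashlib

def bits_different(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Pay Bob 100 pounds"
altered = bytes([original[0] ^ 0b00000001]) + original[1:]   # flip one bit

hash_original = hashlib.sha256(original).digest()
hash_altered = hashlib.sha256(altered).digest()

print(bits_different(original, altered), "input bit differs")
print(bits_different(hash_original, hash_altered), "of",
      len(hash_original) * 8, "hash bits differ")
```

For a good hash function, roughly half of the output bits are expected to change.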
426
Hashing
Algorithm Hash size (bits) Published
• Whilst hashes are described in terms of the number of bits making up the
hash, they are usually stored and displayed as hexadecimal values, with
every four bits represented by a single hexadecimal digit (0–f). So the
128-bit MD5 hash
1100 0111 1111 0100 0101 0101 1110 0010 0111 0111 0000 0100 0011 0110
0100 0110 1111 0111 1101 1101 0110 0111 1000 0001 1001 1100 0110 1000
0000 0101 0011 1111
is written in hexadecimal as c7f455e277043646f7dd67819c68053f.
• Collisions are extremely rare – the first MD5 collision was only found after
hashing 2^50 different pieces of data – but the fact that they exist at all
means it is impossible to completely guarantee the integrity of data hashed
using MD5. It is safe to say that if a malicious party processes enough MD5
hashes, they will find collisions that can be exploited.
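The grouping of four bits per hexadecimal digit, described for the 128-bit MD5 hash above, can be checked with a short sketch. The bit string is copied from the slide; the variable names are illustrative.

```python
md5_bits = (
    "1100 0111 1111 0100 0101 0101 1110 0010 0111 0111 0000 0100 0011 0110 "
    "0100 0110 1111 0111 1101 1101 0110 0111 1000 0001 1001 1100 0110 1000 "
    "0000 0101 0011 1111"
)

groups = md5_bits.split()                     # one group of four bits per hex digit
hex_digits = "".join(format(int(group, 2), "x") for group in groups)

print(len(groups) * 4, "bits ->", len(hex_digits), "hexadecimal digits")
print(hex_digits)
```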
428
Collisions
• The possibility of collisions means the MD5 algorithm cannot guarantee data is
authentic. Nor is it the only hashing algorithm under threat.
• The possibility that SHA-1 collisions could be used to falsify data has
encouraged software developers to redesign their applications, replacing MD5
and SHA-1 with more secure hashing algorithms such as SHA-2.
429
Protecting Hashed Passwords
• Block 2 Part 7 showed how hashes can obscure computer
passwords, but cannot guarantee their safety, since hashed
passwords can still be compromised by a dictionary attack using a
dictionary of hashed words.
430
Protecting Hashed Passwords
• For instance:
1. A new user might choose the (terrible) password passw0rd, which is
almost certainly in any attacker’s dictionary and therefore
vulnerable.
2. The computer generates a random number, called the salt, e.g.
73950.
3. The two are joined together, creating a new password; depending
on the implementation of salting, the user’s password is
transformed into either passw0rd73950 or 73950passw0rd.
4. The new value is hashed.
5. The computer securely stores the salt alongside the hash.
• When the user next logs in, they enter their password (passw0rd);
the computer recovers their salt, recombines it with the password
and generates a hash which is compared to the stored hash.
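The registration and login steps above can be sketched as follows. This assumes SHA-256 as the hash and a five-digit decimal salt appended to the password, as in passw0rd73950; the function names are invented, and real systems use much longer salts and dedicated password-hashing functions.

```python
import hashlib
import secrets

def hash_with_salt(password: str, salt: str) -> str:
    """Steps 3 and 4: join the password and the salt, then hash the result."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()

def register(password: str):
    """Steps 2 to 5: pick a random salt and return it with the salted hash."""
    salt = str(secrets.randbelow(100000)).zfill(5)     # e.g. "73950"
    return salt, hash_with_salt(password, salt)

def check_login(attempt: str, salt: str, stored_hash: str) -> bool:
    """Recombine the stored salt with the attempt and compare the hashes."""
    return secrets.compare_digest(hash_with_salt(attempt, salt), stored_hash)

salt, stored_hash = register("passw0rd")
print(check_login("passw0rd", salt, stored_hash))   # correct password
print(check_login("password", salt, stored_hash))   # wrong password
```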
431
Why Salt Works
• Salt greatly increases the number of possible hashes any attacker
must test in a dictionary attack. Rather than the attacker having to
generate and test a single hash value for each entry in the
dictionary, they would have to generate and test hashes for every
word combined with every possible salt value.
bed128365216c019988915ed3add75fb
• Rather than testing one hash for each word in the dictionary, we
now need to test sixteen different hashes. Salting has made a
brute-force attack sixteen times more difficult than without using
salt.
• Real-world salts are much longer than four bits; typically, salting
schemes use equal-length salts and hashes.
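The extra work that salt creates for an attacker can be counted in a short sketch. It assumes a four-bit salt, which gives the sixteen values mentioned above, placed in front of each word and hashed with MD5; the three-word dictionary is made up.

```python
import hashlib

SALT_BITS = 4                                   # assumed: 2^4 = 16 possible salts
salts = [format(value, "04b") for value in range(2 ** SALT_BITS)]
dictionary = ["passw0rd", "qwerty", "letmein"]  # hypothetical attacker's word list

# Without salt the attacker needs one hash per word; with salt, one per
# (salt, word) combination.
candidates = {hashlib.md5((salt + word).encode("utf-8")).hexdigest(): (salt, word)
              for word in dictionary for salt in salts}

print(len(dictionary), "hashes without salt")
print(len(candidates), "hashes with a", SALT_BITS, "bit salt")
```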
433
More password protection-Key stretching
• Key stretching increases the amount of time required for even the
fastest CPUs to create a hash. It has little or no effect on most
legitimate users, who will barely notice if a hash takes half a second
to generate.
• However, if verifying a single password takes half a second, it is
impossible to perform a brute-force attack on that computer in a
reasonable amount of time.
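One common way to stretch a key is to repeat the hashing many times. The sketch below assumes PBKDF2 from Python's `hashlib`, which the slides do not name; the iteration counts are illustrative and the timings depend on the machine.

```python
import hashlib
import os
import time

def stretched_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a hash by running HMAC-SHA256 the given number of times."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = os.urandom(16)
for iterations in (1, 100_000):
    start = time.perf_counter()
    digest = stretched_hash("passw0rd", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed:.4f} s")
```

A delay that is hardly noticeable for one login is multiplied by every guess an attacker has to make.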
434
More password protection-Encrypting Hashes
435
Figure 3.2 The YubiHSM, a hardware security module designed to
plug into a USB port on almost any type of computer.
The Benefits and Limitations of
Hashing
• Hashing can:
confirm data has not been changed since the hash was generated
obscure passwords from casual inspection.
• Hashing cannot:
436
Ciphers and Keys: An Introduction
to Encryption
• Encryption is a field of mathematics concerned with obscuring
information from unwanted viewers in such a way that the original
information can be recovered later. Machine encryption systems
originated during the early twentieth century, including the famous
Enigma codes of the Second World War. For most of history, encryption
was time-consuming, expensive and largely restricted to governments
and businesses.
• An encryption key is a string of bits. The longer the string (the key
length), the greater the number of possible keys. For a key length
of n, there are 2^n possible keys (see Table 3.2).
Key length    Number of keys    Key values
1             2^1 (2)           0, 1
…
10            2^10 (1024)       0000000000, 0000000001, 0000000010, …
Table 3.2 The number of possible keys available with differing key lengths
440
The Problem With Short Keys
• Testing a million keys per second may sound fast, but this can
easily be achieved by a modest PC. Therefore, keys must be
sufficiently long that they offer a very large number of possible
values. Keys often have lengths of 128, 1024 or 2048 bits, giving
2^128, 2^1024 or 2^2048 possible key values. These numbers are
unimaginably large, rendering brute-force attacks useless.
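The effect of key length on a brute-force attack can be estimated with a little arithmetic. The sketch assumes the rate of one million keys per second mentioned above and a worst case in which every key has to be tried; the function names are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def number_of_keys(key_length_bits: int) -> int:
    """A key of n bits has 2^n possible values."""
    return 2 ** key_length_bits

def years_to_try_all(key_length_bits: int, keys_per_second: int = 1_000_000) -> float:
    """Worst-case time to test every possible key at the given rate."""
    return number_of_keys(key_length_bits) / keys_per_second / SECONDS_PER_YEAR

for bits in (10, 56, 128):
    print(f"{bits:>3} bits: {number_of_keys(bits)} keys, "
          f"about {years_to_try_all(bits):.3g} years")
```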
New keys are generated for each exchange of data. In the unlikely
event that a session key is broken by an attacker, later exchanges are
protected by different keys.
Issuing new keys prevents attackers recovering plaintext by
exploiting any similarities between ciphertexts when a single key is
reused on multiple pieces of plaintext.
Keys are deleted at the end of a session; they cannot be stolen by
hacking or theft of the computer.
443
The Data Encryption Standard
(DES)
• The Data Encryption Standard (DES) uses 56-bit keys; the key was
originally 64 bits long as proposed by IBM, and was then reduced to 56 bits.
• DES breaks plaintext into 64-bit blocks, each of which is divided
into two 32-bit halves. One half is scrambled using an algorithm (the
F-function) which stretches, mixes and substitutes bits within the
32 bits. The two halves are recombined, then swapped and the
process repeated. This is repeated sixteen times to produce the
final DES ciphertext. Decryption of DES ciphertext is performed by
reversing the process using the same key.
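The split, scramble and swap structure described above can be sketched as a toy cipher. This is not DES: the scrambling function and the key schedule below are invented stand-ins, used only to show why running the same sixteen rounds with the round keys in reverse order undoes the encryption.

```python
MASK32 = 0xFFFFFFFF

def f_function(half: int, round_key: int) -> int:
    # Hypothetical stand-in for DES's F-function: mixes a 32-bit half with a key.
    mixed = ((half ^ round_key) * 0x9E3779B1) & MASK32
    return ((mixed << 7) | (mixed >> 25)) & MASK32      # rotate left by 7 bits

def feistel(block: int, round_keys) -> int:
    """Run one 64-bit block through the rounds; used for both directions."""
    left, right = block >> 32, block & MASK32
    for key in round_keys:
        # Scramble one half, combine it with the other, then swap the halves.
        left, right = right, left ^ f_function(right, key)
    return (right << 32) | left                         # undo the final swap

def make_round_keys(key: int, rounds: int = 16):
    # Hypothetical key schedule, not the one DES uses.
    return [(key >> shift) & MASK32 for shift in range(rounds)]

round_keys = make_round_keys(0x0123456789ABCDEF)
plaintext = int.from_bytes(b"TM112 ok", "big")          # one 64-bit block
ciphertext = feistel(plaintext, round_keys)
recovered = feistel(ciphertext, round_keys[::-1])       # same key, reversed rounds
print(recovered.to_bytes(8, "big"))
```

The scrambling function itself never has to be inverted, which is what allows decryption to reuse the same process with the same key.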
444
The Stopgap: Triple DES
• From 1999 onwards, the US government recommended users of
DES moved to so-called Triple DES (3DES) encryption. Rather than
a new form of encryption, 3DES applies the DES algorithm three
times to each of the plaintext blocks. 3DES is more secure than
DES because it uses a key-bundle usually containing two –
occasionally three – DES keys, giving a key size of either 112 or 168
bits.
446
The replacement: the Advanced
Encryption Standard (AES)
• The US Department of Commerce began replacing DES in 1997 by
soliciting expressions of interest from cryptographers to work
alongside the government in developing a new encryption
standard, unimaginatively called the Advanced Encryption
Standard (AES).
447
Meet Alice and Bob
• From now on, we are going to follow a pair of fictional characters
known as Alice and Bob: two people struggling to have a secret
conversation. Alice and Bob, occasionally joined by further
characters, were created by the cryptographer Ron Rivest in 1978
to explain cryptographic principles. A third character in this story is
the eavesdropper Eve, who desperately wants to know what Alice
and Bob are saying.
448
Meet Alice and Bob
Alice and Bob could meet, generate the key and each leave with a
copy. This might be inconvenient or even dangerous if Eve became
aware of the meeting. Alternatively:
Either Alice or Bob would generate two copies of the symmetric key.
They would keep one copy and send the other to the other person. Not
only must Alice and Bob trust one another, but one copy of the key
could be lost, stolen or copied by Eve while it is in transit.
1. the private key that the key owner must keep safe and never
distribute.
2. the public key which can be sent to anyone with whom they
want to exchange encrypted information.
The private key is the only key that can decrypt files encrypted
with the corresponding public key.
The public key is the only key that can decrypt ciphertext
encrypted using the corresponding private key.
450
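The relationship between the two keys can be illustrated with a toy example. It assumes RSA-style arithmetic, which the slides do not name, and uses tiny textbook primes that offer no security at all; it needs Python 3.8 or later for the modular inverse.

```python
# Toy key pair built from two small primes (for illustration only).
p, q = 61, 53
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: inverse of e modulo phi

public_key, private_key = (e, n), (d, n)

def apply_key(value: int, key) -> int:
    """Encrypt or decrypt a number by raising it to the key's exponent mod n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65

# Encrypted with the public key, decrypted only with the private key.
ciphertext = apply_key(message, public_key)
print(apply_key(ciphertext, private_key) == message)

# Encrypted with the private key, decrypted only with the public key.
signed = apply_key(message, private_key)
print(apply_key(signed, public_key) == message)
```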
How Asymmetric Cryptography
Works
• The value of one key in a pair cannot easily be determined from
the other. Even if Alice’s public key falls into Eve’s hands, Eve can’t
recreate Alice’s private key. Therefore, the public key can be just
that – public. Public keys can be distributed by insecure methods,
such as email or by posting them to internet public key chain
servers.
451
Exchanging Secrets Using
Asymmetric Cryptography
• Consider the following scenario:
• Unlike symmetric keys, which are rarely longer than 256 bits,
asymmetric keys are typically very large – usually 1,024, 2,048 or
4,096 bits long. Despite their greater length, differences in the
underlying mathematics mean asymmetric keys are not
appreciably more secure than much shorter symmetric keys.
Whilst we can say a 4,096-bit asymmetric key is more secure than
a 1,024-bit asymmetric key, it is much harder to judge its security
relative to symmetric keys.
453
Asymmetric Versus Symmetric Encryption
It is tempting to think that asymmetric encryption’s ability to avoid the key
distribution problem means it can entirely replace symmetric encryption.
In fact, almost all encryption is performed using symmetric encryption
for the following reasons:
1. Symmetric encryption is fast.
Most modern CPUs can perform one or more forms of symmetric
encryption in hardware. Symmetric encryption can also be performed
in software at very high speed, even on modest computers.
2. It uses small keys.
Generating and using symmetric keys is relatively quick compared to
creating and using the much larger asymmetric keys.
1. Alice hashes the document and encrypts the hash using her private key
to produce a digital signature.
2. Alice attaches the digital signature and the document to her email to
Bob.
3. Bob decrypts the digital signature using Alice’s corresponding public key,
revealing the hash.
4. Bob uses the same hashing algorithm as Alice to hash his copy of the
document. He then compares his hash with that from the signature.
5. If the two hashes are identical, then both Bob and Alice can be confident
that the document has not changed in transit.
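The five steps above can be sketched with a toy key pair. The example assumes RSA-style arithmetic and SHA-256, neither of which is named in the slides; the primes are far too small for real use, the hash is shortened to 128 bits so that it fits below the modulus, and the function names are invented.

```python
import hashlib

# Toy key pair for Alice (illustration only, requires Python 3.8+).
P, Q = 2 ** 61 - 1, 2 ** 89 - 1          # two known primes
N = P * Q
E = 65537                                 # Alice's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))         # Alice's private exponent

def document_hash(document: bytes) -> int:
    # First 128 bits of SHA-256, so that the value is smaller than N.
    return int.from_bytes(hashlib.sha256(document).digest()[:16], "big")

def sign(document: bytes) -> int:
    """Step 1: Alice hashes the document and encrypts the hash with her private key."""
    return pow(document_hash(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Steps 3 to 5: Bob decrypts the signature and compares it with his own hash."""
    return pow(signature, E, N) == document_hash(document)

document = b"Meet at noon"
signature = sign(document)                      # step 2: sent with the document

print(verify(document, signature))              # unchanged document
print(verify(b"Meet at nine", signature))       # document altered in transit
```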
456
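A Simple Digital Signature
[Diagram: Alice holds her own private key and Bob's public key KB; Bob holds
his own private key and Alice's public key KA. Alice sends Bob H(M) encrypted
with her private key, together with M encrypted with KB. Bob recovers H(M)
from the signature and M from the ciphertext, computes the hash of M himself
and compares the two values. If they match: yes, Alice sent the message.]
457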
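A Simple Digital Signature
[Diagram: in transit, the encrypted message {M}KB is replaced by {M′}KB, while
the signature, H(M) encrypted with Alice's private key, is unchanged. Bob
recovers H(M) from the signature and M′ from the ciphertext, and computes H(M′).
Since H(M) ≠ H(M′), M′ did not originate from Alice. The slide also asks
whether the signature on H(M) could be replaced by a signature on H(M′).]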
Digital Certificates (public-key
Certificates)
• Eve’s deception succeeded because there was no way for Alice to
determine if the key came from Bob, or, as it turned out, was fake.
Eve’s scheme would fail if genuine keys were authenticated by a
trusted third party, the Certification Authority.
459
Digital Certificates (public-key
Certificates)
A digital certificate will typically include:
• A copy of the public key
• Information about the owner of the key: the owner’s name,
etc.
• Information about the digital certificate: a serial number,
expiry date, etc.
• Information about the CA itself: CA name, its own digital
signature, etc.
460
Secure Web Connections
• Web traffic is not encrypted by default; instead, web pages are
transmitted as plaintext and can be intercepted. Obviously, this
lack of security was a problem to the pioneering online shopping
companies. Some of the first online shops allowed customers to
browse online catalogues but only accepted telephone payments –
which were probably just as insecure.
462