EC02 Informatica Infocod

Informática
Ing. Aeronáutica
Joan Vila Carbó & Enrique Hernández Orallo

Information coding
Wednesday, 12 February 2014
Information coding
Basic concepts
Numbering systems
Encoding signed numbers
Encoding real numbers
Encoding texts
Redundancy and compression
Introduction
Basic concepts
All information processed by a digital computer needs to be encoded, that is,
transformed into a form of representation suitable for the computer.
- Numerical values: magnitudes used by computer applications related to geometry
(lengths, angles, ...), physics (pressure, temperature, volumes, forces, ...),
mathematics, statistics, finance, etc.
- Text information in different formats, such as books, reports, manuals, etc.
- Media information: graphics, images, videos, sounds, etc.
- Computer programs.
Computers encode information using binary numbers rather than decimal
numbers.
- Binary encoding is introduced shortly.
Information range: the interval between the lower and upper limits of a
magnitude, i.e., the set of all different values or codes that the information may take.
- Example: assume that a magnitude such as a length is encoded in decimal with 3
integer digits and 2 decimal digits. The information range is [000.00 ... 999.99].
Information accuracy: the resolution of the information, i.e., the smallest
representable difference between two values.
- Example: in the above representation, the accuracy is 0.01 units.
Information volume: the amount of information, computed as the number of
measurements (information instances) of a magnitude times the number of digits of
each measurement.
- Example: in the above representation, 1000 length measurements have a
volume of 5000 digits.
- Useful to compute the required capacity of a storage device.
Information compression: reduction of the information volume by
- Removing redundancy
  - Represent repeated items in a compact form. Example: 1000 consecutive white
    pixels in a graphic.
- Reducing accuracy
  - Use a representation with fewer digits.
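As an illustration of removing redundancy, the sketch below applies run-length encoding to a row of pixels. The slides do not name a specific technique, so run-length encoding and the function names `rle_encode` and `rle_decode` are assumptions chosen for the example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of equal items into (item, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (item, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# 1000 consecutive white pixels followed by 3 black ones.
row = ["white"] * 1000 + ["black"] * 3
encoded = rle_encode(row)
print(encoded)                      # [('white', 1000), ('black', 3)]
print(rle_decode(encoded) == row)   # True
```

The 1003 stored items shrink to two pairs, and decoding recovers the row exactly, so no accuracy is lost.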
Numbering systems
Positional numbering systems
Numbers are represented as a sequence of digits where each digit has a weight
according to its position.
Numbering in base b uses the set of digits D = {0, 1, 2, ..., b-1}.
Assume X is represented as $X_n X_{n-1} \ldots X_2 X_1 X_0 . X_{-1} X_{-2} \ldots X_{-m}$ in base b. Its value is:
$$X = \sum_{i=-m}^{n} X_i \, b^i \qquad (1)$$
Example:
- In decimal, b = 10 and D = {0, 1, 2, ..., 9}:
$$1{,}234.56_{10} = 1 \cdot 10^3 + 2 \cdot 10^2 + 3 \cdot 10^1 + 4 \cdot 10^0 + 5 \cdot 10^{-1} + 6 \cdot 10^{-2}$$
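The weighted sum of eq. (1) can be sketched in a few lines. The function name `positional_value` and the choice of passing the integer and fractional digits as two lists are assumptions made for this example; `Fraction` is used only to keep the fractional weights exact.

```python
from fractions import Fraction

def positional_value(int_digits, frac_digits, base):
    """Evaluate eq. (1): the sum of digit * base**position.

    int_digits  = [X_n, ..., X_1, X_0]
    frac_digits = [X_-1, ..., X_-m]
    """
    n = len(int_digits) - 1
    total = Fraction(0)
    for k, digit in enumerate(int_digits):
        total += digit * Fraction(base) ** (n - k)    # weights b^n ... b^0
    for k, digit in enumerate(frac_digits, start=1):
        total += digit * Fraction(base) ** (-k)       # weights b^-1 ... b^-m
    return total

print(float(positional_value([1, 2, 3, 4], [5, 6], 10)))  # 1234.56
```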
The binary system
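In binary, b = 2 and D = {0, 1}.
Converting from binary to decimal: use eq. (1):
$$110100_2 = 1 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = 52_{10}$$
- The powers are read "2 squared, 2 cubed, 2 to the fourth, ..."
Converting from decimal to binary: divide successively by 2 until the quotient is less than 2. The binary digits are the last quotient followed by the remainders, read from last to first:
- 26 / 2 = 13, remainder 0 (2 into 26 goes 13 times and 0 is left over)
- 13 / 2 = 6, remainder 1
- 6 / 2 = 3, remainder 0
- 3 / 2 = 1, remainder 1
- 1 < 2: stop
$$26_{10} = 11010_2$$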
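The successive-divisions procedure can be sketched as follows. The function name `to_binary` is an assumption for this example. Unlike the worked example, the loop keeps dividing until the quotient reaches 0 instead of stopping at a quotient below 2, which yields the same digits.

```python
def to_binary(value):
    """Convert a non-negative integer to a binary string by successive divisions by 2."""
    if value == 0:
        return "0"
    bits = []
    while value > 0:
        value, remainder = divmod(value, 2)
        bits.append(str(remainder))   # remainders come out least significant first
    return "".join(reversed(bits))

print(to_binary(26))  # 11010
print(to_binary(52))  # 110100
```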
A bit is a binary digit.
The range of a representation in base b with n digits is $[0 \ldots b^n - 1]$.
- The number of codes, $2^n$ in binary, corresponds to PR(2,n): the n-permutations of 2 elements with repetition.
- Example: b = 2, n = 3
  - $[0 \ldots 2^3 - 1]$ = [000, 001, 010, 011, 100, 101, 110, 111]
A byte is an 8-bit binary code.
- The range of this representation is $[0_{10} \ldots 255_{10}]$.
The information volume in the binary system is usually measured as the number
of bytes. Multiples of the byte:
- KB Kilobyte = $2^{10}$ bytes = 1,024 bytes
- MB Megabyte = $2^{20}$ bytes = 1,024 Kbytes
- GB Gigabyte = $2^{30}$ bytes = 1,024 Mbytes
- TB Terabyte = $2^{40}$ bytes = 1,024 Gbytes
Binary codes can be used to encode numbers, colors, products of a vending
machine, etc.
Information volume: it is usual to round the result to the nearest power of two.
- Example: capacity of a disk to store the sequence of codes of the products sold by a
vending machine. Assume that the machine capacity is 1000 products and there are
16 different products.
  - 16 products require a 4-bit encoding.
  - 4 bits x 1000 products = 4000 bits = 4000/8 bytes = 500 bytes ≈ $2^9$ bytes = 0.5 Kbyte
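The vending-machine calculation can be reproduced with a short sketch. The function names are assumptions for this example, and the rounding helper rounds upward to the next power of two (so the data always fits), which for 500 bytes coincides with the nearest power of two.

```python
import math

def storage_bytes(n_codes, n_items):
    """Bytes needed to store n_items codes drawn from n_codes different values."""
    bits_per_item = max(1, (n_codes - 1).bit_length())   # 16 codes -> 4 bits
    total_bits = bits_per_item * n_items
    return math.ceil(total_bits / 8)

def next_power_of_two(value):
    """Smallest power of two that is >= value (for value >= 1)."""
    return 1 << (value - 1).bit_length()

size = storage_bytes(16, 1000)
print(size, next_power_of_two(size))  # 500 512
```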
Binary arithmetic
Truth tables for operations with one bit:
[Figure: one-bit truth tables. Their footnotes mark the two overflow cases below.]
- (*) 0 minus 1 = 1, borrow 1
- (**) 1 plus 1 = 0, carry 1
Overflow: an operation yields a result that exceeds the representation range.
- Using a 1-bit representation, 1 + 1 is 10, which requires a 2-bit representation.
- We say the carry is 1.
Algorithms for operations with two bits or more:
[Figure: worked examples of binary addition, subtraction, and multiplication, with the carry and borrow digits marked.]
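Multi-bit addition repeats the one-bit rule column by column, from right to left, passing the carry along. The sketch below shows this for addition only; the function name `add_binary` and the use of bit strings are assumptions for the example.

```python
def add_binary(a, b):
    """Add two bit strings column by column, right to left, propagating the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))   # sum bit for this column
        carry = total // 2              # carry into the next column
    if carry:
        result.append("1")              # the result needs one more bit
    return "".join(reversed(result))

print(add_binary("1", "1"))        # 10
print(add_binary("1011", "0110"))  # 10001
```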
Hexadecimal ("hex") system
In hexadecimal, b = 16 and D = {0, 1, 2, ..., 9, A, B, C, D, E, F}.
Converting from hex to decimal:
$$\mathrm{0x7F9A} = 7 \cdot 16^3 + \mathrm{F} \cdot 16^2 + 9 \cdot 16^1 + \mathrm{A} \cdot 16^0 = 7 \cdot 16^3 + 15 \cdot 16^2 + 9 \cdot 16^1 + 10 \cdot 16^0 = 32{,}666_{10}$$
Converting from decimal to hex: use the successive-divisions algorithm, dividing by 16.
Converting between binary and hex:
- Group the binary digits in fours, starting from the right. Four binary digits correspond to one hex digit:
$$1000\ 1010\ 1100\ 0010_2 = \mathrm{8AC2}_{16}$$
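Both hex conversions can be sketched briefly. The names `HEX_DIGITS`, `hex_to_decimal` and `binary_to_hex` are assumptions for this example; the first function accumulates the weighted digits in Horner form, which gives the same result as summing digit times power of 16.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text):
    """Accumulate hex digits left to right: value = value * 16 + digit."""
    value = 0
    for ch in text.upper():
        value = value * 16 + HEX_DIGITS.index(ch)
    return value

def binary_to_hex(bits):
    """Group bits in fours from the right; each group maps to one hex digit."""
    padded = bits.zfill(-(-len(bits) // 4) * 4)   # pad to a multiple of 4
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

print(hex_to_decimal("7F9A"))              # 32666
print(binary_to_hex("1000101011000010"))   # 8AC2
```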

Encoding signed numbers
Sign-and-magnitude representation
Allocate the most significant bit to represent the sign.
- 0 for positive numbers, 1 for negative numbers.
The remaining bits indicate the magnitude (or absolute value).
Example:
- $00101010_2 = 42_{10}$, $10101010_2 = -42_{10}$
Representation range with n bits: $[-(2^{n-1} - 1), \ldots, 2^{n-1} - 1]$.
- With 8 bits: [-127, ..., +127]
Disadvantages:
- Two different zeros: 00000000 (0) and 10000000 (-0).
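A minimal sketch of sign-and-magnitude encoding and decoding follows. The function names `sm_encode` and `sm_decode` are assumptions for this example.

```python
def sm_encode(value, n):
    """Encode value in n-bit sign-and-magnitude as a bit string."""
    limit = 2 ** (n - 1) - 1
    if abs(value) > limit:
        raise OverflowError(f"{value} does not fit in {n} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n - 1}b")

def sm_decode(bits):
    """Decode a sign-and-magnitude bit string: first bit is the sign."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_encode(42, 8), sm_encode(-42, 8))   # 00101010 10101010
print(sm_decode("10000000"))                 # 0  (the "negative zero" code)
```

Both 00000000 and 10000000 decode to 0, which is the double-zero drawback listed above.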
Two's-complement representation
The representation of a negative number -x in n bits is defined as its two's
complement.
The two's complement can be calculated in decimal as $2^n - x$ modulo $2^n$:
$$-x \rightarrow C_2(x,n) = (2^n - x) \bmod 2^n \qquad (\text{with } |x| < 2^n)$$
Examples:
- $C_2(3,4) = (2^4 - 3) \bmod 2^4 = 13_{10} = 1101_2 \rightarrow -3$
- $C_2(3,8) = (2^8 - 3) \bmod 2^8 = 253_{10} = 1111\,1101_2$ (sign extension) $\rightarrow -3$
- $C_2(-3,4) = (2^4 + 3) \bmod 2^4 = 3_{10} = 0011_2 \rightarrow 3$
- $C_2(0,4) = (2^4 - 0) \bmod 2^4 = 0_{10} = 0000_2 \rightarrow 0$
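The definition translates almost directly into code. The function name `c2` mirrors the notation of the slides; returning the code as a bit string is an assumption made for readability.

```python
def c2(x, n):
    """Two's complement of x in n bits: (2**n - x) mod 2**n, as a bit string."""
    code = (2 ** n - x) % 2 ** n
    return format(code, f"0{n}b")

print(c2(3, 4))    # 1101
print(c2(3, 8))    # 11111101
print(c2(-3, 4))   # 0011
print(c2(0, 4))    # 0000
```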
The two's complement of a binary n-bit representation is a new representation
with range $[-2^{n-1}, \ldots, 2^{n-1} - 1]$ in which:
- Codes $[0, \ldots, 2^{n-1} - 1]$ → positive integers $[0, \ldots, 2^{n-1} - 1]$
- Codes $[2^{n-1}, \ldots, 2^n - 1]$ → negative integers $[-2^{n-1}, \ldots, -1]$
Example:
- n = 4 → range = [-8, 7]
- codes [0, 7] → positive integers [0, 7], codes [8, 15] → negative integers [-8, -1]
binary code   unsigned   two's compl.
0000          0          0
0001          1          1
0010          2          2
0011          3          3
0100          4          4
0101          5          5
0110          6          6
0111          7          7
1000          8          -8
1001          9          -7
1010          10         -6
1011          11         -5
1100          12         -4
1101          13         -3
1110          14         -2
1111          15         -1
Converting from decimal to two's complement: write the absolute value in binary with n bits and, to change its sign, take the two's complement of the code:
1. Invert the bits
2. Add one
- Example:
  - C2(3,4): 0011 →(invert)→ 1100 →(add 1)→ 1101
  - C2(-3,4): 1101 →(invert)→ 0010 →(add 1)→ 0011
Converting from two's complement to decimal:
- If it is a positive number (MSB = 0), apply the weighted-digits eq. (1):
  - Example: $0111_2 = 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 7_{10}$
- If it is a negative number (MSB = 1), compute the two's complement and
apply the weighted-digits eq. (1) to get the absolute value. Then change the sign of
the absolute value.
  - Example: 1111 →(invert)→ 0000 →(add 1)→ 0001
  - $0001_2 = 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 1_{10} \rightarrow -1_{10}$
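The invert-and-add-one rule and the decoding procedure can be sketched as follows. The function names `negate_bits` and `c2_to_decimal` are assumptions for this example.

```python
def negate_bits(bits):
    """Two's complement of a bit string: invert every bit, then add one."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(inverted, 2) + 1) % 2 ** n, f"0{n}b")

def c2_to_decimal(bits):
    """Decode an n-bit two's-complement code into a signed integer."""
    if bits[0] == "0":                    # MSB = 0: positive, weighted digits
        return int(bits, 2)
    return -int(negate_bits(bits), 2)     # MSB = 1: negate, then change the sign

print(negate_bits("0011"))     # 1101
print(negate_bits("1101"))     # 0011
print(c2_to_decimal("0111"))   # 7
print(c2_to_decimal("1111"))   # -1
```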
Property 1: $x - y = x + (-y) = x + C_2(y,n)$
- Example:
  - 2 - 3 = 0010 - 0011 = 1111 = $-1_{10}$
  - 2 - 3 = 2 + (-3) = 0010 + 1101 = 1111 = $-1_{10}$
Property 2: only one zero → 0...0000
Property 3: sign extension.
- Positive numbers have MSB = 0
- Negative numbers have MSB = 1
- $C_2(3,4) = 1101_2 \rightarrow C_2(3,8) = 1111\,1101_2$ (sign extension)
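Properties 1 and 3 can be checked with a small sketch. The function names `subtract` and `sign_extend` are assumptions for this example; any carry out of the n bits is simply discarded.

```python
def subtract(x, y, n):
    """Compute x - y as x + C2(y, n), discarding any carry out of n bits."""
    mask = 2 ** n - 1
    c2_y = (2 ** n - y) & mask
    return format((x + c2_y) & mask, f"0{n}b")

def sign_extend(bits, n):
    """Widen a two's-complement code to n bits by repeating its sign bit."""
    return bits[0] * (n - len(bits)) + bits

print(subtract(2, 3, 4))         # 1111
print(sign_extend("1101", 8))    # 11111101
print(sign_extend("0011", 8))    # 00000011
```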
Excess-K representation
Excess-K (also called biased representation) of an n-bit representation is a
representation with range $[-K, \ldots, 2^n - 1 - K]$ that uses a pre-specified number K
as a biasing value to displace the origin of the representation, so as to map the
most negative number of the representation (-K) to the code 0000.
Example:
- Excess-K, K = 8, n = 4 → range = [-8, 7]
- codes [0, 7] → negative integers [-8, -1], codes [8, 15] → positive integers [0, 7]
binary code   unsigned   excess-8
0000          0          -8
0001          1          -7
0010          2          -6
0011          3          -5
0100          4          -4
0101          5          -3
0110          6          -2
0111          7          -1
1000          8          0
1001          9          1
1010          10         2
1011          11         3
1100          12         4
1101          13         5
1110          14         6
1111          15         7
Converting from decimal to excess-K:
- Add K to x in decimal and then convert the result to binary.
- Examples: assume n = 4, K = 8.
  - x = -3 →(add 8)→ -3 + 8 = 5 →(binary)→ 0101
  - x = 3 →(add 8)→ 3 + 8 = 11 →(binary)→ 1011
Converting from excess-K to decimal:
- Convert the code to decimal and then subtract K.
- Examples: assume n = 4, K = 8.
  - x = 0011 →(decimal)→ 3 →(subtract 8)→ 3 - 8 = -5
  - x = 1011 →(decimal)→ 11 →(subtract 8)→ 11 - 8 = 3
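Both directions of the excess-K conversion are one-line computations. The function names `to_excess_k` and `from_excess_k` are assumptions for this example.

```python
def to_excess_k(x, k, n):
    """Encode x in n-bit excess-K: add the bias, then write the result in binary."""
    code = x + k
    if not 0 <= code < 2 ** n:
        raise OverflowError(f"{x} is outside the excess-{k} range for {n} bits")
    return format(code, f"0{n}b")

def from_excess_k(bits, k):
    """Decode an excess-K code: read it as unsigned, then subtract the bias."""
    return int(bits, 2) - k

print(to_excess_k(-3, 8, 4), to_excess_k(3, 8, 4))          # 0101 1011
print(from_excess_k("0011", 8), from_excess_k("1011", 8))   # -5 3
```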
Property 1: it is monotonically increasing, which makes comparisons (>, <,
etc.) easy to perform.
Property 2: the excess-K representation with $K = 2^{n-1}$ matches the two's-complement
representation after inverting the most significant bit (sign bit).
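Property 2 can be verified exhaustively for n = 4 and K = 8. This is only a check over the sixteen codes, not a proof; the helper name `flip_msb` is an assumption for this example.

```python
def flip_msb(bits):
    """Invert only the most significant bit of a bit string."""
    return ("1" if bits[0] == "0" else "0") + bits[1:]

matches = []
for value in range(-8, 8):
    excess_8 = format(value + 8, "04b")     # excess-K code with K = 2**(4-1)
    twos = format(value % 16, "04b")        # 4-bit two's-complement code
    matches.append(flip_msb(excess_8) == twos)

print(all(matches))  # True
```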
Summary of signed representations
binary code   unsigned   sign & magnitude   two's compl.   excess-8
0000          0          0                  0              -8
0001          1          1                  1              -7
0010          2          2                  2              -6
0011          3          3                  3              -5
0100          4          4                  4              -4
0101          5          5                  5              -3
0110          6          6                  6              -2
0111          7          7                  7              -1
1000          8          -0                 -8             0
1001          9          -1                 -7             1
1010          10         -2                 -6             2
1011          11         -3                 -5             3
1100          12         -4                 -4             4
1101          13         -5                 -3             5
1110          14         -6                 -2             6
1111          15         -7                 -1             7
Encoding real numbers
Fixed-point representation
It has a fixed number of digits n after the decimal point and (sometimes) a fixed
number of digits m before the decimal point.
- Examples in decimal: n = 3, m = 4 → 0200.003, 1200.100
- Examples in binary: n = 3, m = 4 → 0100.001, 1100.100
Drawbacks: loss of accuracy and overflow.
- Overflow occurs, for example, in a fixed-point multiplication, whose result can
have as many bits as the sum of the number of bits in the two
operands.
- Loss of accuracy occurs, for example, after a sequence of operations with truncated
results.
  - Example with n = 3 and m = 4:
  - 0023.941 x 0000.001 = 0000.023 (the exact product, 0.023941, is truncated).
  - If we want to recover the original number: 0000.023 x 1000.000 = 0023.000
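The loss of accuracy can be reproduced by storing each value as an integer count of thousandths. The names `SCALE`, `to_fixed`, `fixed_mul` and `show` are assumptions for this example, which handles non-negative values only.

```python
SCALE = 1000  # n = 3 decimal digits after the point

def to_fixed(text):
    """Parse a decimal string such as '0023.941' into an integer count of thousandths."""
    whole, frac = text.split(".")
    return int(whole) * SCALE + int(frac.ljust(3, "0")[:3])

def fixed_mul(a, b):
    """Multiply two fixed-point values and truncate the result back to 3 decimals."""
    return (a * b) // SCALE

def show(value):
    """Format a fixed-point value with m = 4 integer and n = 3 fractional digits."""
    return f"{value // SCALE:04d}.{value % SCALE:03d}"

small = fixed_mul(to_fixed("0023.941"), to_fixed("0000.001"))
print(show(small))                                   # 0000.023
print(show(fixed_mul(small, to_fixed("1000.000"))))  # 0023.000
```

The digits .941 are discarded by the first truncation and cannot be recovered by scaling back up.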
Floating-point representation
It consists of a fixed number of significant digits, called the mantissa, which are
scaled using an exponent. The base for the scaling is usually 2 or 10:
$$\text{mantissa} \times \text{base}^{\text{exponent}}$$
- Examples of the same number using different exponents (scaling factors):
  - $1125.0 \times 10^0 = 112.5 \times 10^1 = 11.25 \times 10^2 = 1.125 \times 10^3 = 0.1125 \times 10^4$
- The point can float, i.e., be placed anywhere relative to the significant digits of
the number.
Normalized representation: the one in which the point follows the most significant
nonzero digit: $1.125 \times 10^3$.
Advantage: it supports a much wider range of values with the same number of
digits.
IEEE Standard for Floating-Point Arithmetic (IEEE 754)
It describes several formats with different accuracies. A given format comprises:
- Finite numbers, described by three integers (s, c, q). The value of the number is:
$$(-1)^s \times c \times b^q$$
- Two infinities: $+\infty$ and $-\infty$.
- Two kinds of NaN (Not a Number).
Finite numbers:
- s: the sign (zero or one).
- c: the mantissa (also called significand or coefficient).
  - Uses sign-and-magnitude format. The sign of the mantissa is the sign bit.
  - Normalized format → the point follows the most significant nonzero digit. In base 2
    this digit is always a 1, so it is implied and there is no need to store it.
- q: the exponent.
  - Uses excess-K representation with $K = 2^{ne-1} - 1$, where ne is the number of bits of the exponent.
  - K = 15 for IEEE-16, K = 127 for IEEE-32, and K = 1023 for IEEE-64.
- b: the base, which may be 2 or 10.
Name               Base   Total digits   Mantissa digits   Exponent digits   Min. number (normalized)   Max. number
Half precision     2      16             10+1              5                 6.1035 x 10^-5             6.5504 x 10^4
Single precision   2      32             23+1              8                 1.1754 x 10^-38            3.4028 x 10^38
Double precision   2      64             52+1              11                2.2250 x 10^-308           1.7977 x 10^308
[Figure: bit layout of the three formats. Sign: 1 bit in each format; exponent: 5, 8 and 11 bits; mantissa: 10, 23 and 52 bits.]
Converting from floating-point to decimal. Example in single precision: $\mathrm{7F7F\,FFFF}_{16}$
- Sign: leading bit → 0 → positive number.
- Exponent: 8 bits after the sign → $1111\,1110_2 = 254_{10}$.
  - It is in excess-127 → 254 - 127 = 127
- Mantissa: 23 bits after the exponent plus the implied bit, which is always 1.
  - It is 1.11111...1, represented in sign-and-magnitude.
  - Use eq. (1) to get the value in decimal:
$$1 \cdot 2^0 + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + \cdots + 1 \cdot 2^{-23} = 1.9999998807907_{10} \approx 2$$
Result: $+2 \times 2^{127} \approx 3.4028 \times 10^{38}$
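The field-by-field decoding can be sketched with integer shifts and masks, and cross-checked against Python's standard `struct` module. The function name `decode_single` is an assumption for this example, and the sketch handles normalized numbers only (no zeros, subnormals, infinities or NaNs).

```python
import struct

def decode_single(hex_text):
    """Split an IEEE 754 single-precision pattern into its fields and value."""
    word = int(hex_text.replace(" ", ""), 16)
    sign = word >> 31
    exponent = (word >> 23) & 0xFF           # 8 bits, stored in excess-127
    fraction = word & 0x7FFFFF               # 23 stored mantissa bits
    mantissa = 1 + fraction / 2 ** 23        # add the implied leading 1
    value = (-1) ** sign * mantissa * 2.0 ** (exponent - 127)
    return sign, exponent - 127, mantissa, value

sign, exp, mantissa, value = decode_single("7F7F FFFF")
print(sign, exp)   # 0 127
print(mantissa)    # just below 2
print(value)       # 3.4028234663852886e+38
# Cross-check against the library's own decoding of the same four bytes.
print(struct.unpack(">f", bytes.fromhex("7F7FFFFF"))[0] == value)  # True
```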
Converting from decimal to floating-point (1). Example: -29.6875 to double precision
- Convert the absolute value of the number to binary. Convert the integral and
fractional parts separately:
  - $29_{10} = 11101_2$
  - 0.6875 x 2 = 1.375 → 1
  - 0.3750 x 2 = 0.750 → 0
  - 0.75 x 2 = 1.5 → 1
  - 0.5 x 2 = 1.0 → 1
  - $0.6875_{10} = 0.1011_2 \rightarrow 29.6875_{10} = 11101.1011_2 = 11101.1011_2 \times 2^0$
- Normalize the number: $11101.1011_2 \times 2^0 \rightarrow 1.11011011_2 \times 2^4$
- Generate the mantissa. Omit the implied one. Fill with zeros on the right up to the
52 bits of the mantissa. Using hex notation:
  - $1101\ 1011\ 0000\ 0000 \ldots 0000_2 = \mathrm{D\ B000\ 0000\ 0000}_{16}$
Converting from decimal to floating-point (2). Example: -29.6875 to double precision
- Generate the exponent, expressed in excess-1023 (the bias for IEEE-64 is
1023). Add the bias:
  - $4_{10} + 1023_{10} = 1027_{10} = 100\ 0000\ 0011_2 = 403_{16}$
- Set the sign bit: 1 → negative
- Place the sign, exponent, and mantissa into the fields of the IEEE format:
Result: $-29.6875_{10} = \mathrm{C03D\ B000\ 0000\ 0000}_{16}$ (IEEE-64)
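The hand conversion can be checked with Python's standard `struct` module, which packs a float into its IEEE 754 double-precision bytes. The function name `encode_double` is an assumption for this example.

```python
import struct

def encode_double(value):
    """Return the IEEE 754 double-precision pattern of value as 16 hex digits."""
    return struct.pack(">d", value).hex().upper()

pattern = encode_double(-29.6875)
print(pattern)   # C03DB00000000000

bits = format(int(pattern, 16), "064b")
print(bits[0])                       # 1  (sign)
print(int(bits[1:12], 2) - 1023)     # 4  (exponent after removing the bias)
print(bits[12:24])                   # 110110110000  (start of the mantissa)
```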
Encoding texts
The ASCII code
American Standard Code for Information Interchange
[Table: ASCII chart giving the decimal and hex code of each character.]
The American Standard Code for Information Interchange is a 7-bit coding
scheme that supports the English alphabet and control characters.
Example:
- sends to the console the following codes (decimal):
- 65 110 32 9 32 65 83 67 73 73 32 13 32 10 32 116 101 120 116 32 10
Drawback: it lacks symbols for other languages.
Solutions:
- Extended 8-bit ASCII coding: ISO 8859-1 standard, known as ISO Latin 1
- Unicode ...
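The sequence of decimal codes in the example can be decoded to see which characters it contains. This sketch uses Python's built-in `bytes` type; the variable names are assumptions for this example.

```python
codes = [65, 110, 32, 9, 32, 65, 83, 67, 73, 73, 32, 13, 32,
         10, 32, 116, 101, 120, 116, 32, 10]

text = bytes(codes).decode("ascii")
print(repr(text))   # 'An \t ASCII \r \n text \n'

# Control characters are the codes below 32 (plus 127).
controls = [c for c in codes if c < 32 or c == 127]
print(controls)     # [9, 13, 10, 10]
```

The sequence mixes printable characters with the control characters tab (9), carriage return (13) and line feed (10).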
Handling ASCII codes in Matlab
char() : converts ASCII codes to characters
>> char(65)
ans =
A
>> char([65,66,67,68])
ans =
ABCD
abs() : converts strings to arrays of ASCII codes
>> abs('A')
ans =
65
>> abs('Hello\n')
ans =
72 101 108 108 111 92 110
The Unicode standard (ISO/IEC 10646)
An attempt to create a universal character set with support for most of the world's
writing systems.
It is not only a character chart; it defines a complete encoding methodology. It deals
with aspects like:
- Character properties (upper and lower case)
- Rules for composition of characters with different types of accents
- Normalization rules for obtaining equivalent forms, etc.
It specifies a name and a unique numeric identifier for each character or symbol,
named the code point.
Originally this identifier was intended to be coded as a 16-bit integer, but over
time this proved to be insufficient.
Unicode defines three encoding forms under the name UTF (Unicode Transformation
Format):
- UTF-8: byte-oriented coding with variable-length symbols (1 to 4 bytes per Unicode
character).
  - One byte: the characters of US-ASCII, a total of 128 characters.
  - Two bytes: a total of 1920 characters. Includes Romance-language characters with diacritics, and
    Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac ...
  - Three bytes: the rest of the Unicode Basic Multilingual Plane which, together with the previous groups,
    includes the CJK characters: Chinese, Japanese and Korean.
  - Four bytes: the Supplementary Multilingual Plane (mathematical symbols, Linear B syllabary and
    ideograms, Old Persian, Phoenician ...) and the Supplementary Ideographic Plane
    (rarely used Han characters).
- UTF-16: it uses one 16-bit code for the Basic Multilingual Plane (BMP) and two 16-bit
codes (surrogate pairs) for the additional, less frequent planes.
- UTF-32: fixed-length 32-bit encoding, the simplest of the three.
ASCII was the most common encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8.
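The different lengths of the three encoding forms can be observed with Python's built-in string encoders. The four sample characters (a Latin letter, an accented letter, the euro sign and a musical symbol outside the BMP) are chosen for this example and do not come from the slides.

```python
# One sample per UTF-8 length: U+0041, U+00E9, U+20AC, U+1D11E.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

lengths = []
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, no byte-order mark
    utf32 = ch.encode("utf-32-be")
    lengths.append((len(utf8), len(utf16), len(utf32)))
    print(f"U+{ord(ch):04X}  utf-8: {len(utf8)}  utf-16: {len(utf16)}  utf-32: {len(utf32)}")
```

Only the last sample, which lies outside the BMP, needs four bytes (a surrogate pair) in UTF-16, while UTF-32 always uses four bytes.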
Formatted text
A markup language is a way to encode a document which, in addition to the
text, includes labels or markings that specify the structure of the text.
- Examples: HTML, nroff, troff, LaTeX, RTF
RTF (Rich Text Format), used for text editing:
{\rtf1\ansi\ansicpg1252\cocoartf1138
{\fonttbl\f0\froman\fcharset0 TimesNewRomanPSMT;}
{\colortbl;\red255\green255\blue255;}
\pard
This is a {\b boldface} example.
}
Presentational markup: used by traditional text editors. Marking is performed by
the text editor in such a way that the marking is hidden from human users,
producing the WYSIWYG (What You See Is What You Get) effect.
Procedural markup: used by LaTeX and some HTML editors. In these systems
the user explicitly writes the formatting labels in the source file.
Redundancy and compression
Redundant encoding
Information may get corrupted when it is transmitted through communication
lines or stored on disks or other storage devices.
Redundancy is used either to detect errors or to detect and correct them.
- Error detection: parity bit, checksums
- Error detection and correction: error-correcting codes (ECC). They require higher levels of redundancy.
A parity bit is a redundant bit added to a set of bits to ensure that the number of
bits with value 1 in the outcome is even or odd. For the 7-bit code 100 0011:
- Even parity: 1100 0011
- Odd parity: 0100 0011
Parity bits are often used when transmitting ASCII characters from/to peripherals.
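Generating and checking a parity bit can be sketched as follows. The function names `add_parity` and `check_parity`, and the choice of placing the parity bit in front of the 7 data bits, are assumptions that follow the layout of the example codes.

```python
def add_parity(bits, even=True):
    """Prepend a parity bit so the total number of 1s is even (or odd)."""
    ones = bits.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + bits

def check_parity(code, even=True):
    """Return True when the received code has the expected parity."""
    return code.count("1") % 2 == (0 if even else 1)

print(add_parity("1000011"))               # 11000011
print(add_parity("1000011", even=False))   # 01000011
print(check_parity("11000011"))            # True
print(check_parity("11000111"))            # False: one bit was flipped
```

A single parity bit detects any odd number of flipped bits but cannot tell which bit is wrong, so it cannot correct the error.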
[Figure: serial transmission between two modems.]
Information compression
Data compression: the process of re-encoding information using fewer
bits than the original representation uses.
- Goal: to reduce the information volume and the consumption of expensive
resources, such as hard disk space or transmission bandwidth.
It has a cost: extra processing for compressing and decompressing.
- Trade-off between the costs of encoding and decoding: time-consuming
compression → time-efficient decompression, and vice versa.
Two types of compression:
- Lossless compression: the encoded data is not distorted or modified, so the original
can be exactly reconstructed from the compressed data.
  - Example: text compression. ZIP format
- Lossy compression: the original data is only approximately represented, so only
an approximation of the original data can be reconstructed.
  - Example: image/audio/video compression. JPEG, MPEG, MP3 formats
Lossless compression
Lossless algorithms usually exploit statistical redundancy in such a way that
more frequent data are represented with fewer bits.
Huffman coding
- Example: text with only four characters: , A, B, C with frequencies 45%,
35%, 15% and 5% respectively.
[Figure: Huffman tree built from the probabilities 0.45, 0.35, 0.15 and 0.05 (intermediate nodes 0.20, 0.55 and 1.00), giving code lengths of 1, 2, 3 and 3 bits.]
- Compression ratio, relative to a fixed-length code of 2 bits per character:
$$r = 1 - \frac{0.45 \cdot 1 + 0.35 \cdot 2 + 0.15 \cdot 3 + 0.05 \cdot 3}{2} = 12.5\%$$
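The code lengths and the compression ratio can be reproduced with a compact Huffman construction based on a priority queue. The function name `huffman_lengths` is an assumption for this example, and "S" is a placeholder name for the most frequent character, which is not legible in the source.

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    tiebreak = count()   # unique index, so equal weights never compare the dicts
    heap = [(weight, next(tiebreak), {symbol: 0}) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees puts every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"S": 45, "A": 35, "B": 15, "C": 5}   # percentages; "S" is a placeholder
lengths = huffman_lengths(freqs)
average = sum(freqs[s] * lengths[s] for s in freqs) / 100
print(sorted(lengths.items()))   # [('A', 2), ('B', 3), ('C', 3), ('S', 1)]
print(average)                   # 1.75
print(1 - average / 2)           # 0.125
```

The average of 1.75 bits per character against 2 bits for a fixed-length code gives the 12.5% ratio.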
Lossy compression
It compresses data by discarding (losing) some of it.
Usually based on perceptual coding: transforming the raw data obtained from a
device to a domain that more accurately reflects the information content.
- Example: a sound file can be more efficiently represented as the frequency
spectrum over time than as the amplitude levels.
Lossy encoding/decoding programs are usually known as codecs.
Key point: the required accuracy or Quality of Service (QoS).
- Example: image qualities for video conference 640x480, 800x600,
1920x1080, ...