
BC-0504

Natureza da Informação
Introduction and some bits about the
History of Information Theory

David Correa Martins Junior


david.martins@ufabc.edu.br

Prof. David Correa Martins Junior (CMCC)


Email: david.martins@ufabc.edu.br
Email subject: [NI] Subject

Delta Building (São Bernardo), room 253


Research interests
Pattern recognition applied to Bioinformatics and Systems
Biology.

Course Goals

Goal:

To present the foundations of the Nature of Information

I.e., the main concepts about information representation and quantification.

Prerequisites
No formal prerequisites
High-school-level background is assumed in:

Set theory
Combinatorial analysis
Probability and statistics
Logarithms properties
Text interpretation

Assessments & Grade


Assessments
Midterm Exam: 50%, November 3rd
Final Exam: 50%, December 8th
Retake Exam: December 15th

Replaces the lower of the two exam grades
Open to anyone who wants to try to improve the final grade
Covers all the course content!

Assessment & Final grade


Assessment

Average >= 8,5: A
7,0 <= average < 8,5: B
6,0 <= average < 7,0: C
5,0 <= average < 6,0: D
Average < 5,0: F

Tidia
http://tidia-ae.ufabc.edu.br
Log in using your institutional credentials. The professor will register the students.

Course material
Course material: will be made available on Tidia (http://tidia-ae.ufabc.edu.br), in the Repositório (Resources) section:
Class slides
Exercise lists and answers

IMPORTANT: study the material *BEFORE* the class
Bring your own questions, to solve them and facilitate learning!
YOU are the main person responsible for your learning.
The content is extensive: study throughout the course (not just before the exams)

Estimated schedule

Dates         Content
22, 28/09     Course introduction; Data, Information, Knowledge; History of Information Theory; Semiotics
29/09, 06/10  Enumeration systems, base conversion, the bit, Boolean algebra
13, 20/10     Codes, source coding, digital/analog conversion
26/10, 27/10  Information Theory
03/11         Midterm exam

Estimated schedule (continued)

Dates          Content
09, 10/11      Efficient coding and data compression
17, 23/11      Error detection and correction
24/11, 01/12   Neural coding and the genetic code
07/12          Exercises and review class
08/12          Final exam
15/12          Retake exam

Recommended reading:
Decoding the Universe. Charles Seife (2006), Penguin Books.
(Although a popular-science book, this is the one that best captures the multidisciplinary focus of the course. It provides a roadmap that can be filled in by covering each topic in greater depth. There is only one copy in the library; a PDF version exists.)

Sistemas Digitais: fundamentos e aplicações. Thomas L. Floyd.
(Essential for the numbering systems, the Hamming code, A/D and D/A conversion, and the Shannon-Nyquist sampling theorem. Boolean algebra is also well explained. There are several copies in the library.)

Sistemas de Comunicação Analógicos e Digitais. Simon Haykin (2004).
(Information Theory, data compression and error detection/correction.)

An Introduction to Information Theory: Symbols, Signals and Noise. John R. Pierce, Dover.
(Although somewhat dated, it presents the challenges faced by the pioneers of information theory and the solutions found for them.)

Informational Axis
The advances of science and technology are multiplying our capacity to collect, process, produce and use information, taking it to levels never reached before. This brings:
new opportunities
new social questions
further advances in science and technology, fostering a virtuous cycle

Informational Axis
Fundamentals and processes:
Nature of Information: what is information, and how can we represent or measure it?
Information Processing: manipulation and treatment of information, in both its human and computational aspects (processing)
Information Communication: transmission and distribution of information, and its impact

[Diagram: the three informational axes, Natureza da Informação (Nature of Information), Transformação da Informação (Transformation of Information) and Comunicação da Informação (Communication of Information), each viewed at three levels: abstract/theoretical (conceptual), concrete/technological (support) and social/human (use). Example topics shown include: the bit, entropy, analog vs. digital, Shannon capacity, information theory, symbols and signals, noise; theory of computation, signal processing, cryptography, complexity, programming, data mining, machine translation; learning, the brain, knowledge, reason/emotion, social networks; communication systems, networks and traffic, electronics, photonics, new technologies; human language, the Internet, the information society and economy, regulation/ethics; computer organization, data compression; communication theory, channel capacity, the Gaussian channel, genetic information, coding; stochastic processes, order and disorder, chaos, transforms, sampling; senses/perception, cognition and action, intelligence, consciousness, memory.]

What is this course really about?

The course Natureza da Informação shows how information is present in our lives.
Not just at the technological level (telecommunications, computers, the internet)
But also at the biological (DNA, the brain) and human (language, semiotics) levels

Data, Information and Knowledge

Data
Data: a basic element that quantifies or qualifies something
Does not carry any intrinsic meaning by itself
Initial perception about the subject
Identified by symbolic characteristics
Ex.: screw 63 weighs 60 grams

Data
A set of facts about the world;
They are usually quantifiable;
Can be easily captured and stored in computational devices;
Do not carry meaning, nor can they be used in judgements;
So actions cannot be taken based on data alone.

Data
Types
Alphanumeric
Product code, price, amount, etc.

Images
Photographs

Audio
Video

etc...

Data
Basic facts
E.g.: shopping at the supermarket

Code   Description  Individual Price  Quantity
12314  Chocolate    R$ 3,50           2
86456  Milk         R$ 1,50           8
45675  Butter       R$ 2,00           1
54387  Juice        R$ 3,00           4
57871  Cheese       R$ 5,00           1
89452  Beer         R$ 1,50           6

Information
Information: interpreted and contextualized data
Requires data interpretation
Is the result of processed data; it is useful for decision-making

Answers questions like who, what, where and when

Ex.: screw 25 is the heaviest of the group

Data + processing = information

Information
A conjugated data set that has relevance and purpose;
Can be transformed by humans and submitted to judgement
Semiotics (quality of information)
Information is analyzed to produce knowledge
Constitutes a base for action (decision-making)

Information
Transformed data with additional information

Code   Description  Individual Price  Quantity  Total Price
12314  Chocolate    R$ 3,50           2         R$ 7,00
86456  Milk         R$ 1,50           8         R$ 12,00
45675  Butter       R$ 2,00           1         R$ 2,00
54387  Juice        R$ 3,00           4         R$ 12,00
57871  Cheese       R$ 5,00           1         R$ 5,00
89452  Beer         R$ 1,50           6         R$ 9,00
Total                                 22        R$ 47,00

(Annotation on the slide: dairy products sold.)

Information
Can help to increase the profit

Code   Description  Individual Price  Quantity  Total Price
12314  Chocolate    R$ 3,50           2         R$ 7,00
86456  Milk         R$ 1,50           8         R$ 12,00
45675  Butter       R$ 2,00           1         R$ 2,00
54387  Juice        R$ 3,00           4         R$ 12,00
57871  Cheese       R$ 5,00           1         R$ 5,00
89452  Beer         R$ 1,50           6         R$ 9,00
Total                                 22        R$ 47,00

(Annotation on the slide: beverage products sold.)

Knowledge
Knowledge: the ability to create a model and suggest actions or decisions to take
Comprehension, analysis and synthesis start at the knowledge level
Required level for taking smart decisions

Answers questions like "how"

Ex.: the definition of "heavier", comparison rules, procedures (models)
Information + processing = knowledge
(processing = experience, training, etc.)

Knowledge
A structured and organized set of information;
Requires intelligent human judgement
Semiotics: assignment of meaning to information;
Offers rules, comparisons, deductions and implications

Knowledge
Ex. supermarket:
The items X and Y are frequently bought

together
Milk and chocolate

So put them near each other

Hierarchy

This hierarchy represents both the sequence between the elements and the relative volume of each one

Data x Information x
Knowledge

Data x Information x
Knowledge

Data are raw numbers or labels.
A grid segmentation could produce information (e.g., more or less populous segments) from these raw data.
Rules that describe when a segment is red or green, for example, are part of knowledge.

What is information?
Latin etymology
from in- "into" + formare "to form, to shape" (to give form to, to delineate)

Different perspectives
Biology
Linguistics
Physics
Computer Science
What is the relationship between information and communication?
Communication: information exchange between actors

Information Theory
Information is inversely proportional to the probability of occurrence of a fact
Higher-probability facts are less informative
Examples:
The first time we listen to a record, it brings new musical knowledge. But after listening to it many times, we can predict the next chords, so the record brings no more information to us.
We can all predict the missing O in the word L_VE, because it is a common word. So it is unnecessary to write that character at that position.

Goal: to quantify information. The meaning or quality is, a priori, irrelevant.
The quality or meaning of information is the concern of semiotics
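The idea that rarer events are more informative can be sketched with a few lines of Python. This is an illustrative sketch (not from the course materials) of the standard self-information formula, I(p) = -log2(p) bits:

```python
import math

def self_information(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

# Rarer events carry more information than common ones:
print(self_information(0.5))    # a fair coin flip: 1.0 bit
print(self_information(0.125))  # a 1-in-8 event: 3.0 bits
```

As p approaches 1 (a near-certain fact, like the O in L_VE), the information content approaches zero.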

History of Information Theory

Claude E. Shannon (1916-2001) is the father of Information Theory. His book The Mathematical Theory of Communication was published in 1949.

Shannon
In 1948, Shannon published the paper "A Mathematical Theory of Communication", republished as a book the following year
Before him, isolated works had moved step by step towards a general theory of communication
Nowadays Communication Theory (or Information Theory) is a huge research area, with many books and symposiums on the subject

Information Theory
Information Theory is a broad theory that involves a lot of mathematics.
The bit is the fundamental unit used to quantify information.

At UFABC there is a dedicated course on Information Theory.

Information Theory
Information theory allows us to:
Say how many information bits can be sent per second through a certain communication channel
Measure the rate at which a source can produce information
Say how to efficiently represent, or encode, messages to be transmitted through some channel
Say how we can avoid transmission errors
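The rate at which a source produces information is measured by its Shannon entropy. A minimal Python sketch (illustrative only, not part of the course materials) of the standard formula H = -sum(p * log2(p)):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits per symbol, of a source with the given
    symbol probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform 4-symbol source needs 2 bits per symbol on average;
# a skewed source can be encoded with fewer bits on average.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

The entropy is the lower bound on the average number of bits per symbol that any lossless code for the source can achieve.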

The pillars of Information Theory

Thermodynamic studies
Cryptography and the computers built during World War II
Transmission technology, starting with the Morse code and telephony

Entropy and thermodynamics

Boltzmann, the father of statistical thermodynamics and, unknowingly, of information theory
The entropy equation is inscribed on Boltzmann's tomb
The bases of Information Theory were provided by thermodynamic studies
But Boltzmann did not know that his studies would become the bases of Information Theory

Industrial Revolution
1769: Watt patented his steam engine
Research followed to improve the machine's efficiency

Carnot's Heat engine

[Diagram: Carnot's heat engine. Heat Q1 flows from the hot source into the engine, part is converted into work and the remainder, Q2, is rejected to the cold source. Run in reverse (refrigeration), work is used to pump heat Q2 out of the cold source and deliver Q1 to the hot source.]

Rudolf Clausius
It is impossible to stop the universe's tendency toward thermodynamic equilibrium (1860)
In other words, entropy always tends to increase

Impossible to stop the tendency to thermodynamic equilibrium

[Diagram: heat engines between a hot source and a cold source. No engine can use its work output to pump all of the rejected heat Q2 back to the hot source; the flow toward equilibrium cannot be fully reversed.]

The second law of thermodynamics
Entropy measures thermodynamic equilibrium
More entropy means more equilibrium
The second law of thermodynamics says that the entropy of any system tends to increase with time, up to its maximum value

S = k log(W)
The entropy equation inscribed on Boltzmann's tomb
The atoms of a gas tend to disperse uniformly
The entropy of the universe always increases
Later in the course we will see the bridge between the entropies of Boltzmann and Shannon

Cryptography and Computers in World War II

"AF is short of water"
These words changed the course of the war in the Pacific. The Japanese cryptographic code (JN-25) had been broken by the Americans.

Admiral Yamamoto: "Let's attack AF!"

Commander Rochefort: "AF??? What is AF??? Could AF be Midway Island?"

To test the hypothesis, Midway broadcast an unencrypted message: "Midway Island is out of water!"

Intercepted Japanese transmission: "AF is out of water!"

Commander Rochefort: "AHA!!! Got you!!!"

End of War
The Americans lay in wait for the Japanese arrival
Four Japanese aircraft carriers were destroyed
The end of World War II had begun

European U-boats

Enigma
The German Arthur Scherbius invented the Enigma, a machine to encrypt messages
About 3x10^114 possible states
Brute force would require every atom in the universe computing a trillion keys per second, since the beginning of the universe

Turing and colleagues broke the Enigma code
The substitution of every character changed at each keystroke
The symbol corresponding to the character F was never F itself
Routine sentences (e.g. "the weather is good today") allowed the destruction of U-boats and hastened the end of the war

The theoretical Turing Machine
The machine reads, writes and erases symbols on an unbounded tape
The Turing Machine is computationally universal
I.e., any current computer can be simulated by a Turing Machine

Allied cryptography helped to end World War II and started the Information Era
Turing contributed to the end of World War II

Cryptography
A simple cryptographic method is to replace each letter (character) with the letter placed N positions forward in the alphabet
The Caesar cipher

Try to decode the following message:
Q xgpvq jqlg guvcxc owkvq hqtg
O vento hoje estava muito forte ("The wind today was very strong")
n = -2

Redundancy

-s v-n-s d- c--nc-- e d- t-cn-l-g-- -st- m-lt-pl-c-ndo -s n-ss-s c-p-c-d-d-s d- c-l-t-r, tr-t-r, g-r-r - -t-l-z-r -nf-rm--es.
(A Portuguese sentence with its vowels removed; it is still largely readable.)

Our brain takes advantage of redundancy to decode texts like the following (a Portuguese paragraph in which the inner letters of each word are scrambled):
De aorcdo com uma peqsiusa de uma
uinrvesriddae ignlsea, no ipomtra em
qaul odrem as Lteras de uma plravaa
etso, a ncia csioa iprotmatne que a
piremria e tmlia Lteras etejasm no lgaur
crteo. O rseto pdoe ser uma bguana ttaol,
que vco anida pdoe ler sem pobrlmea.
Itso poqrue ns no lmeos cdaa Ltera
isladoa, mas a plravaa cmoo um tdoo.
Sohw de bloa.
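The scrambling effect above can be reproduced programmatically. A small illustrative Python sketch (scramble_word is a hypothetical helper, not from the course materials) that shuffles each word's inner letters while keeping the first and last letters fixed:

```python
import random

def scramble_word(word, rng):
    """Shuffle a word's inner letters, keeping the first and last
    letters in place (words of up to 3 letters are unchanged)."""
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + ''.join(inner) + word[-1]

rng = random.Random(0)  # seeded for reproducibility
print(' '.join(scramble_word(w, rng)
               for w in "reading scrambled words is surprisingly easy".split()))
```

The output remains readable because, as the slide notes, we recognize whole word shapes rather than reading letter by letter.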

Or like this one, in which digits stand in for similar-looking letters...

35T3 P3QU3N0 T3XTO 53RV3 4P3N45 P4R4
M05TR4R COMO NO554 C4B34
CONS3GU3 F4Z3R CO1545 1MPR3551ON4ANT35!
R3P4R3 N155O! NO COM3O 35T4V4
M310 COMPL1C4DO, M45 N3ST4 L1NH4 SU4
M3NT3 V41 D3C1FR4NDO O CD1GO
QU453 4UTOM4T1C4M3NT3, S3M PR3C1S4R
P3N54R MU1TO, C3RTO? POD3 F1C4R
B3M ORGULHO5O D155O! SU4 C4P4C1D4D3
M3R3C3! P4R4BN5!

Technologies for
Information Transmission

Origins of the modern Information Theory
There are mathematical analogies between information entropy and the entropies of Thermodynamics and Statistical Mechanics, but the modern Information Theory is rooted in the origins of electrical communication

Telegraph origins
1838: Samuel B. Morse worked with Alfred Vail on a code known today as the Morse code:
Alphabetic characters are represented by spaces, dots and dashes
Electric transmission was achieved by representing spaces by the absence of current, dots by short current pulses, and dashes by long ones

Combinations of these symbols were associated with characters

E (the most frequent character in English) was assigned a single dot.
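The frequency-aware design can be seen directly in the code table. A small illustrative Python sketch (only a handful of letters shown; the mappings are the standard international Morse assignments):

```python
# Frequent English letters get short codes (E is a single dot);
# rarer letters get longer ones: an early instance of source coding.
MORSE = {'E': '.', 'T': '-', 'A': '.-', 'N': '-.',
         'S': '...', 'O': '---', 'I': '..'}

def to_morse(text):
    """Encode text into Morse code, separating letters with spaces."""
    return ' '.join(MORSE[ch] for ch in text.upper())

print(to_morse("SOS"))  # ... --- ...
```

Assigning the shortest codewords to the most probable symbols is exactly the idea later formalized by efficient source codes such as Huffman coding.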

Morse Code

[Chart: the Morse code alphabet]

The Morse code and Information Theory
Important question:
Could another mapping (dots, dashes, spaces) allow faster telegraphic transmission of English texts?

Answer:
Using modern Information Theory, it can be shown that the gain in transmission rate would be about 15% at most.
This suggests that Morse intuitively attacked one of the main problems addressed by Information Theory.

Telegraph limitations
Limitations summary

Limits related to the signal transmission speed
Interference (noise)
Hard to distinguish among many possible current values
Current intensity limited to avoid destroying the cable insulation

A more precise mathematical analysis was required

Contributions to the
Information Theory
Many people contributed mathematically to information theory in the 19th century:

Lord Kelvin (William Thomson)
Alexander Graham Bell (inventor of the telephone, 1876)
Henri Poincaré
Oliver Heaviside
Michael Pupin
G. A. Campbell (AT&T)

But the biggest contribution came from Joseph Fourier

Fouriers contribution
Fourier based his works on the sine function

Showed that any function (including electric signals)


can be decomposed in a sum of sines with different
amplitudes, phases and frequencies.
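Fourier's decomposition works in reverse too: summing sines of the right amplitudes and frequencies reconstructs a signal. A minimal Python sketch (illustrative, not from the course materials) of the classic example, the Fourier series of a square wave, which uses only odd sine harmonics:

```python
import math

def square_wave_partial(t, n_terms):
    """Partial Fourier series of a square wave:
    (4/pi) * sum over odd k of sin(k*t)/k, using n_terms odd harmonics."""
    return (4 / math.pi) * sum(
        math.sin(k * t) / k for k in range(1, 2 * n_terms, 2))

# With more sine terms, the sum approaches the square wave's value (1.0)
# at t = pi/2:
print(square_wave_partial(math.pi / 2, 1))   # about 1.273 (a single sine)
print(square_wave_partial(math.pi / 2, 50))  # about 1.006
```

Each added harmonic sharpens the waveform; this is why a channel's bandwidth (how many harmonics it can carry) limits how faithfully sharp telegraph pulses can be transmitted.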

Kolmogorov's and Wiener's contributions
In the 1940s, the Russian Kolmogorov and the American Wiener independently solved the problem of estimating the correct signal from a noisy observation.

Finally Shannon
So when Shannon published his work in 1948, much had already been done
In a certain way, he synthesized, and produced new knowledge about, all these problems previously studied by other researchers

Shannons contribution
But we could say that his great contribution was to answer the following questions:
How can we encode (with electric signals) a message from a source so as to transmit it as fast as possible through a channel that introduces noise with a certain pattern?
How fast can we transmit a certain message through a specific channel without errors?

Everything we have seen until now is part of what we call Information Theory. In this course we will see introductory topics of this theory.

To do before next class


Study the slides and answer the exercise list corresponding to the first week (Semiotics)
Section Repositório (Resources) on Tidia
The list does not need to be handed in for grading, but it is fundamental to do the exercises and study before classes

...and consequently for the exams
