
Discussion #34

Data Compression
"Mercifully, our errors can soon be swallowed
up by resilient repentance, showing the faith
to try again--whether in a task or in a
relationship. Such resilience is really an
affirmation of our true identities! Spirit sons
and daughters of God need not be
permanently put down when lifted up by
Jesus' Atonement. Christ's infinite Atonement
thus applies to our finite failures!"
- Neal A. Maxwell
Data Compression
How big?
• Image 1024x1024x3
– 3 Million bytes (3 MB)
• Audio - 48000 x 10 min x 60 sec/min x 2
– 58 million bytes (58 MB)
• Video
– 640 x 480 x 10 minutes
– 307,200 pixels x 600 sec x 30 fps
– 5.5 billion pixels x 3 bytes/pixel = 16.6 billion bytes (17 GB)
• Compression (reduce the size)
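The sizes on this slide are plain multiplication. A minimal sketch, assuming 3 bytes (RGB) per pixel for the image and video, and taking the audio factor of 2 exactly as written on the slide (it could mean two bytes per sample or two channels):

```python
# Uncompressed sizes of the slide's examples (assumptions noted inline).
image_bytes = 1024 * 1024 * 3          # 3 bytes (RGB) per pixel
audio_bytes = 48000 * 10 * 60 * 2      # 48,000 samples/s for 10 minutes, times the slide's factor of 2
video_pixels = 640 * 480 * 600 * 30    # 10 min = 600 s at 30 frames per second
video_bytes = video_pixels * 3         # assumption: 3 bytes (RGB) per pixel

for name, size in [("image", image_bytes), ("audio", audio_bytes), ("video", video_bytes)]:
    print(f"{name}: {size:,} bytes")
```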
Problem
• Reduce the size of a data object
– Text
– Image
– Audio
– Video
• How to do it
– Cheat in ways that the user can’t see
– Coherence
Ways to cheat
• Text generally uses fewer than 128 possible characters.
– Use 7 bits instead of 8 (12.5% smaller)
• For text, some characters are more common
than others
– Use fewer bits for common characters, more
bits for infrequently used characters
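One well-known way to give common characters fewer bits is Huffman coding. The slides do not name a specific method, so the sketch below is an illustration of the idea rather than the lecture's own algorithm. The function name `huffman_code` and the sample text are chosen here for the example.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each character to a bit string."""
    counts = Counter(text)
    if len(counts) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie_breaker, {char: code_so_far})
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)      # the two least frequent groups
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

text = "four score and seven years ago"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(bits, "bits, versus", 8 * len(text), "bits at 8 bits per character")
```

The space, the most frequent character in this sample, never gets a longer code than a character that appears only once.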
Ways to cheat
• People can’t see more than 64 levels of gray
– Use 6 bits instead of 8 (25%)
• People don’t see color as well as B/W
– Use 6 bits for B/W and much less for color
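Dropping gray levels the eye cannot distinguish can be sketched as keeping only the top 6 bits of each 8-bit value. The helper names and the sample pixel values are made up for illustration. The step is lossy: the restored value can differ from the original by up to 3.

```python
def quantize_gray(value, bits=6):
    """Keep only the top `bits` bits of an 8-bit gray value."""
    return value >> (8 - bits)

def restore_gray(q, bits=6):
    """Map a quantized level back to the 0-255 range (approximate)."""
    return q << (8 - bits)

pixels = [0, 37, 128, 200, 255]                 # hypothetical 8-bit gray values
stored = [quantize_gray(p) for p in pixels]     # each fits in 6 bits (0..63)
restored = [restore_gray(q) for q in stored]
print(stored)
print(restored)
```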
Coherence
• If we know the previous value of
something, then we generally have a good
idea what the next value will be

• 3 Techniques
– Run length encoding
– Reuse of subsequences
– Prediction and error
Run length encoding
• Values are frequently repeated.
– Instead of storing each value, store a single
value with a count of how many times to repeat
[Figure: example image on a pixel grid, columns 0-9 and rows 0-11]

• 12 x 10 = 120 pixels
• 120 pixels x 3 bytes/pixel = 360 bytes
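Run length encoding as described here can be sketched in a few lines. The function names and the sample row are assumptions for illustration. A real format that stores the count in one byte would also have to split runs longer than 255.

```python
def rle_encode(values):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # same value as the previous one: extend the run
        else:
            runs.append([v, 1])       # new value: start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out

row = ["W"] * 4 + ["R"] * 3 + ["W"] * 3    # hypothetical 10-pixel row
runs = rle_encode(row)
print(runs)
```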
Run encoded
[Figure: the grid with run counts listed beside each row]
• RGB - 3 bytes
• Count - 1 byte
• Entries - 23
• Space - 4 x 23 = 92 bytes
• Compression - (360-92)/360 = 74%
Run encoded - with indexed color
[Figure: the grid with run counts listed beside each row]
• 4 colors - 12 bytes
• Index - 2 bits
• Count - 6 bits
• Entries - 23
• Space - 12 + 1 x 23 = 35 bytes
• Compression - (360-35)/360 = 90%
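The space figures for the two run-encoded versions can be checked with a short calculation. The entry count of 23 is taken from the slides. `pack_run` is a hypothetical helper showing how a 2-bit color index and a 6-bit count could share one byte.

```python
raw_bytes = 12 * 10 * 3                # 120 pixels at 3 bytes each = 360
entries = 23                           # number of runs, from the slides

rgb_runs = entries * (3 + 1)           # 3-byte color + 1-byte count per run
indexed_runs = 4 * 3 + entries * 1     # 4-color table (12 bytes) + 1 byte per run

def saving(compressed, original=raw_bytes):
    """Fraction of the original size that compression removes."""
    return (original - compressed) / original

def pack_run(index, count):
    """Pack a 2-bit color index and a 6-bit count into a single byte."""
    assert 0 <= index < 4 and 0 < count < 64
    return (index << 6) | count

print(rgb_runs, f"{saving(rgb_runs):.0%}")
print(indexed_runs, f"{saving(indexed_runs):.0%}")
```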
Run encoding

Works well

HELLO
Run encoding

Works Badly
Run encoding

Works well
Run encoding

Not good

Too much variation in the rose
Run encoding - text
four score and seven years ago, our fathers brought forth on this continent

Not good

No repetition
Run Encoding - Audio

Not good

No repetition
Reuse common sequences
[Figure: pixel grid example (columns 0-9, rows 0-11), repeated over several slides]
Reuse common sequences

Works really well

Used in GIF format


Reuse common sequences

Works fair

Blacks are good


Rose has some similarities
Reuse common sequences

Works really well


Reuse common sequences

Works poorly
Reuse common sequences
Video

Works really well

Copy pieces from last frame into this frame

One technique in MPEG


Reuse common sequences
Text
• Reuses words and phrases

• Works fairly well

• Most common text compression technique
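Reusing earlier sequences can be sketched as an LZ77-style encoder, which replaces a repeated stretch with an (offset, length) reference to where it appeared before. The slides do not say which variant they have in mind (GIF, for instance, uses the dictionary-based LZW), so this greedy version and its names `lz_encode` and `lz_decode` are only an illustration.

```python
def lz_encode(data, window=255, min_len=3):
    """Greedy back-reference encoder: emits literals or (offset, length) pairs."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):        # candidate earlier positions
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))       # reuse an earlier sequence
            i += best_len
        else:
            tokens.append(data[i])                    # no useful match: store the literal
            i += 1
    return tokens

def lz_decode(tokens):
    """Rebuild the text by copying literals and back-references."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])                 # copy one symbol at a time (allows overlap)
        else:
            out.append(t)
    return "".join(out)

tokens = lz_encode("abcabcabcabc")
print(tokens)
```

Twelve characters shrink to three literals and one reference. Text with no repeated phrases would come out as literals only, with no saving.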


Prediction + error
• Given previous values, predict what the
next value will be

• When it is not quite right, store the error

• The error almost always takes fewer bits than the value
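The simplest predictor guesses that the next value equals the previous one, so the stored error is just the difference. A minimal sketch with made-up sample values:

```python
def delta_encode(samples):
    """Predict each sample as the previous one; store only the error."""
    errors, prev = [], 0
    for s in samples:
        errors.append(s - prev)
        prev = s
    return errors

def delta_decode(errors):
    """Rebuild the samples by adding each error to the running prediction."""
    samples, prev = [], 0
    for e in errors:
        prev += e
        samples.append(prev)
    return samples

samples = [1000, 1002, 1005, 1004, 1004, 1007]   # hypothetical slowly varying signal
errors = delta_encode(samples)
print(errors)
```

After the first entry, every error fits in a few bits, while each original value needs at least 10.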
Linear prediction
• A line through the previous values predicts the next value
[Figure sequence: the prediction applied at successive points]
– Little error
– More error
– Still more error
– Less error
– Less error
– Little error
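Linear prediction can be sketched by extending the line through the last two samples and storing how far the actual value falls from it. The function names and the sample data are assumptions for illustration.

```python
def linear_residuals(samples):
    """Keep the first two samples; after that store actual - predicted."""
    out = list(samples[:2])
    for i in range(2, len(samples)):
        predicted = 2 * samples[i - 1] - samples[i - 2]   # line through the last two points
        out.append(samples[i] - predicted)
    return out

def linear_restore(residuals):
    """Rebuild the samples by repeating the prediction and adding the stored error."""
    samples = list(residuals[:2])
    for e in residuals[2:]:
        predicted = 2 * samples[-1] - samples[-2]
        samples.append(predicted + e)
    return samples

ramp = [10, 20, 30, 40, 50, 45, 40]    # hypothetical: straight ramp, then a corner
print(linear_residuals(ramp))
```

The error is zero along the straight sections and large only at the corner, where the line through the previous points no longer describes the data.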
Linear Prediction
More error
• Look closer

Little Error
Linear Prediction
• Prediction + error
• Shades of black
• Follows shade of rose
• Rose detail is error off shade

• Prediction + error + cheating = JPEG
Video
• Copy from previous frame
• Store error for small details

• MPEG
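The idea of predicting a frame from the previous one and storing only the error can be sketched on tiny one-dimensional "frames". Real MPEG coding is far more involved (it works on blocks with motion compensation and transforms). The functions and pixel values below are hypothetical and show only the principle.

```python
def frame_diff(prev, cur):
    """Store only the pixels that differ from the previous frame, as (index, error)."""
    return [(i, c - p) for i, (p, c) in enumerate(zip(prev, cur)) if c != p]

def frame_apply(prev, diff):
    """Rebuild the current frame from the previous frame plus the stored errors."""
    cur = list(prev)
    for i, err in diff:
        cur[i] += err
    return cur

frame1 = [10, 10, 10, 80, 80, 10, 10, 10]   # hypothetical 8-pixel frames
frame2 = [10, 10, 10, 80, 85, 12, 10, 10]
diff = frame_diff(frame1, frame2)
print(diff)
```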
Text
• N-Grams

• Use the last N letters to predict the next letter

• Store errors

• English is quite regular
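N-gram prediction can be sketched by counting which letter most often follows each N-letter context, then storing a letter only when that guess is wrong. The names `train`, `encode`, and `decode` and the sample sentence are assumptions for illustration. The sketch trains and encodes on the same text for brevity, whereas in practice the encoder and decoder would have to share the same model.

```python
from collections import Counter, defaultdict

def train(text, n=2):
    """Count which letter follows each n-letter context."""
    model = defaultdict(Counter)
    for i in range(n, len(text)):
        model[text[i - n:i]][text[i]] += 1
    return model

def encode(text, model, n=2):
    """Emit None where the model's top guess is right, else the literal letter."""
    out = list(text[:n])                               # first n letters stored as-is
    for i in range(n, len(text)):
        guesses = model.get(text[i - n:i])
        guess = guesses.most_common(1)[0][0] if guesses else None
        out.append(None if guess == text[i] else text[i])
    return out

def decode(tokens, model, n=2):
    """Replay the same predictions, filling in each None with the model's guess."""
    text = list(tokens[:n])
    for t in tokens[n:]:
        if t is None:
            t = model["".join(text[-n:])].most_common(1)[0][0]
        text.append(t)
    return "".join(text)

sample = "the cat and the hat and the bat"
model = train(sample)
tokens = encode(sample, model)
print(tokens.count(None), "of", len(sample), "letters predicted correctly")
```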


Review
• Cheat
– Exploit weaknesses in what people can perceive
• Coherence
– Run encoding (count repetitions)
– Reuse (reference pieces from previous data)
– Predict + error

• Know when each technique will or will not work
