Data Compression
"Mercifully, our errors can soon be swallowed
up by resilient repentance, showing the faith
to try again--whether in a task or in a
relationship. Such resilience is really an
affirmation of our true identities! Spirit sons
and daughters of God need not be
permanently put down when lifted up by
Jesus' Atonement. Christ's infinite Atonement
thus applies to our finite failures!"
- Neal A. Maxwell
Data Compression
How big?
• Image 1024x1024x3
– 3 Million bytes (3 MB)
• Audio - 48000 x 10 min x 60 sec/min x 2
– 58 million bytes (58 MB)
• Video
– 640 x 480 pixels/frame x 10 minutes
– 307,200 pixels x 600 sec x 30 fps
– 5.5 billion pixels x 3 bytes/pixel = 16.6 billion bytes (17 GB)
• Compression (reduce the size)
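The sizes above can be checked with a few lines of arithmetic. This is only a sketch: `BYTES_PER_PIXEL = 3` is an assumption carried over from the 1024x1024x3 image, and the audio factors are used exactly as the slide lists them. Multiplying the video factors gives about 5.5 billion pixels, or about 16.6 billion bytes at 3 bytes per pixel.

```python
# Raw (uncompressed) sizes, using the factors listed on the slide.
BYTES_PER_PIXEL = 3  # assumption: 24-bit RGB, as in the 1024x1024x3 image

image_bytes = 1024 * 1024 * BYTES_PER_PIXEL
audio_bytes = 48000 * 10 * 60 * 2        # the slide's four factors, unchanged
video_pixels = 640 * 480 * 600 * 30      # pixels/frame x seconds x frames/sec
video_bytes = video_pixels * BYTES_PER_PIXEL

print(f"image: {image_bytes:,} bytes")
print(f"audio: {audio_bytes:,} bytes")
print(f"video: {video_pixels:,} pixels, {video_bytes:,} bytes")
```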
Problem
• Reduce the size of a data object
– Text
– Image
– Audio
– Video
• How to do it
– Cheat in ways that the user can’t see
– Coherence
Ways to cheat
• Text generally uses fewer than 128
distinct characters.
– Use 7 bits instead of 8 (12.5% smaller)
• For text, some characters are more common
than others
– Use fewer bits for common characters, more
bits for infrequently used characters
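The slide does not name a specific scheme for giving common characters fewer bits. Huffman coding is one well-known way to do it, so the sketch below uses it as an illustration. The function name `huffman_code` and the sample sentence are made up for this example.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {char: bitstring}; frequent characters get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: code built so far})
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in a.items()}
        merged.update({ch: "1" + code for ch, code in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "this is an example of text with common and rare letters"
codes = huffman_code(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(encoded_bits, "bits, versus", 8 * len(text), "bits at 8 bits per character")
```

The code table itself also has to be stored or agreed on in advance, which this sketch does not count.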
Ways to cheat
• People can’t see more than 64 levels of gray
– Use 6 bits instead of 8 (25%)
• People don’t see color as well as B/W
– Use 6 bits for B/W and much less for color
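A minimal sketch of the 6-bit gray trick, assuming 8-bit input pixels and simple truncation of the two low bits (the slide does not say how the levels are reduced). The helper names are hypothetical.

```python
def to_6_bits(gray8):
    """Drop the two least significant bits: 256 levels become 64."""
    return gray8 >> 2

def back_to_8_bits(gray6):
    """Approximate reconstruction; the dropped detail is not recovered."""
    return gray6 << 2

pixels = [0, 37, 128, 200, 255]
stored = [to_6_bits(p) for p in pixels]
restored = [back_to_8_bits(s) for s in stored]
print(stored)
print(restored)
```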
Coherence
• If we know the previous value of
something, then we generally have a good
idea what the next value will be
• 3 Techniques
– Run length encoding
– Reuse of subsequences
– Prediction and error
Run length encoding
• Values are frequently repeated.
– Instead of storing each value, store a single
value with a count of how many times to repeat
[Figure: pixel grid, 10 columns (0-9) by 12 rows (0-11)]
• 12 x 10 = 120 pixels
• 120 pixels x 3 bytes/pixel = 360 bytes
Run encoded
[Figure: the same 10 x 12 pixel grid with the length of each run listed beside it]
• Run lengths - 12, 6, 4, 1, 9, 1, 9, 1, 9, 1, 3, 6, 1, 9, 1, 9, 1, 9, 1, 9, 1, 5, 12
• RGB - 3 bytes
• Count - 1 byte
• Entries - 23
• Space - 4 x 23 = 92 bytes
• Compression - (360-92)/360 = 74%
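The run-length figures can be reproduced with a short sketch. The run lengths are the ones read off the slide; the colors in `PALETTE` are hypothetical placeholders, since the slide's actual colors are not recoverable.

```python
def rle_encode(pixels):
    """Collapse consecutive equal values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# Run lengths from the slide; colors are made up for illustration.
RUN_LENGTHS = [12, 6, 4, 1, 9, 1, 9, 1, 9, 1, 3, 6,
               1, 9, 1, 9, 1, 9, 1, 9, 1, 5, 12]
PALETTE = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 0, 255)]

pixels = []
for i, n in enumerate(RUN_LENGTHS):
    pixels.extend([PALETTE[i % len(PALETTE)]] * n)

runs = rle_encode(pixels)
raw_bytes = len(pixels) * 3          # 3 bytes of RGB per pixel
rle_bytes = len(runs) * (3 + 1)      # 3 bytes RGB + 1 byte count per run
saving = (raw_bytes - rle_bytes) / raw_bytes
print(len(runs), raw_bytes, rle_bytes, f"{saving:.0%}")
```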
Run encoded - with indexed color
[Figure: the same 10 x 12 pixel grid with the length of each run listed beside it]
• 4 colors - 12 bytes
• Index - 2 bits
• Count - 6 bits
• Entries - 23
• Space - 12 + 1 x 23 = 35 bytes
• Compression - (360-35)/360 = 90%
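With four colors, a 2-bit index and a 6-bit count fit in one byte per run. The sketch below assumes the index goes in the high bits and the count in the low bits; the slide does not specify a layout. The palette indices assigned to each run are placeholders.

```python
def pack_run(index, count):
    """Pack a 2-bit palette index and a 6-bit count into one byte."""
    if not 0 <= index < 4:
        raise ValueError("index must fit in 2 bits")
    if not 1 <= count <= 63:
        raise ValueError("count must fit in 6 bits")
    return (index << 6) | count

def unpack_run(byte):
    """Inverse of pack_run: return (index, count)."""
    return byte >> 6, byte & 0x3F

# Run lengths from the slide; the index per run is made up for illustration.
RUN_LENGTHS = [12, 6, 4, 1, 9, 1, 9, 1, 9, 1, 3, 6,
               1, 9, 1, 9, 1, 9, 1, 9, 1, 5, 12]
packed = bytes(pack_run(i % 4, n) for i, n in enumerate(RUN_LENGTHS))

palette_bytes = 4 * 3                   # 4 colors x 3 bytes RGB
total_bytes = palette_bytes + len(packed)
print(total_bytes, f"{(360 - total_bytes) / 360:.0%}")
```

A run longer than 63 pixels would have to be split into several entries under this layout.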
Run encoding
[Figures: example images, captioned in order "Works well" (image with the text "HELLO"), "Works badly", "Works well", "Not good", "Not good - no repetition"]
Run Encoding - Audio
• Not good
• No repetition
Reuse common sequences
[Figure: pixel grid, 10 columns (0-9) by 12 rows (0-11), shown over four slides]
[Figures: example images, captioned "Works fair" and "Works poorly"]
Reuse common sequences
Video
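The slides do not name an algorithm for reusing sequences. One common form of the idea is an LZ77-style encoder that replaces a repeated run of values with a (distance, length) reference to an earlier copy. The sketch below assumes that form; the token format and the `window` and `min_match` parameters are hypothetical.

```python
def lz_encode(data, window=255, min_match=3):
    """Greedy encoder: emit ('lit', value) or ('copy', distance, length)."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search earlier positions for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append(("copy", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def lz_decode(tokens):
    """Rebuild the data; copies are done one value at a time so that a
    reference may overlap the values it is producing."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return out

data = list("abcabcabcabcxyz")
tokens = lz_encode(data)
print(len(data), "values ->", len(tokens), "tokens")
```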
Linear prediction
• A line through the previous values predicts the next value
[Figures: series of plots, annotated in order "Little error", "More error", "Less error", "Less error", "Little error"]
[Figures: example images, annotated "More error", "Look closer", "Little error"]
Linear Prediction
• Prediction + error
• Shades of black
• Prediction follows the shade of the rose
• Rose detail is the error relative to the predicted shade
• Prediction + error + cheating = JPEG
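Prediction plus error can be sketched on a one-dimensional signal. The predictor extends the line through the two previous samples, and only the difference from the actual value is stored. This is a lossless illustration under that assumption, not a description of how JPEG itself is implemented; the sample values are made up.

```python
def predict(prev2, prev1):
    """Extend the straight line through the two previous samples."""
    return 2 * prev1 - prev2

def encode(samples):
    """Keep the first two samples, then store only prediction errors."""
    errors = [samples[i] - predict(samples[i - 2], samples[i - 1])
              for i in range(2, len(samples))]
    return samples[:2], errors

def decode(head, errors):
    """Re-run the same predictor and add back each stored error."""
    out = list(head)
    for e in errors:
        out.append(predict(out[-2], out[-1]) + e)
    return out

samples = [10, 12, 14, 16, 19, 22, 25, 24, 23]
head, errors = encode(samples)
print(errors)
```

Where the signal follows a line the errors are zero or small, so they need fewer bits than the raw samples. The errors grow where the signal changes direction.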
Video
• Copy from previous frame
• Store error for small details
• MPEG
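The copy-and-correct idea can be sketched as a per-pixel difference between consecutive frames. This is a deliberately simplified illustration with made-up one-row frames; real codecs such as MPEG are considerably more elaborate.

```python
def frame_delta(prev, curr):
    """Per-pixel difference between consecutive frames."""
    return [c - p for p, c in zip(prev, curr)]

def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the error."""
    return [p + d for p, d in zip(prev, delta)]

frame1 = [50, 50, 50, 80, 80, 50, 50, 50]
frame2 = [50, 50, 50, 50, 82, 80, 50, 50]

delta = frame_delta(frame1, frame2)
changed = sum(1 for d in delta if d != 0)
print(delta, f"{changed} of {len(delta)} pixels changed")
```

Most entries of the delta are zero, so it compresses well with the earlier techniques such as run length encoding.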
Text
• N-Grams
• Store errors
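One way to read "N-Grams, store errors" is: predict each character from the characters before it, and store a literal only when the prediction is wrong. The sketch below assumes that reading, with a model built from the text itself and shared by encoder and decoder. All names are hypothetical, and `n` must be at least 2.

```python
from collections import Counter, defaultdict

def build_model(text, n=2):
    """Map each (n-1)-character context to its most frequent next character."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        counts[text[i - n + 1:i]][text[i]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def encode(text, model, n=2):
    """Store None where the model guesses right, the literal otherwise."""
    out = list(text[:n - 1])
    for i in range(n - 1, len(text)):
        guess = model.get(text[i - n + 1:i])
        out.append(None if guess == text[i] else text[i])
    return out

def decode(tokens, model, n=2):
    """Replay the model's guesses, substituting stored literals on misses."""
    chars = list(tokens[:n - 1])
    for tok in tokens[n - 1:]:
        ctx = "".join(chars[-(n - 1):])
        chars.append(model[ctx] if tok is None else tok)
    return "".join(chars)

text = "the theme of the thesis"
model = build_model(text)
tokens = encode(text, model)
hits = tokens.count(None)
print(f"{hits} of {len(text)} characters predicted correctly")
```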