Data Compression

Discussion #34
"Mercifully, our errors can soon be swallowed
up by resilient repentance, showing the faith
to try again--whether in a task or in a
relationship. Such resilience is really an
affirmation of our true identities! Spirit sons
and daughters of God need not be
permanently put down when lifted up by
Jesus' Atonement. Christ's infinite Atonement
thus applies to our finite failures!"
- Neal A. Maxwell
Data Compression
How big?
• Image 1024x1024x3
– 3 Million bytes (3 MB)
• Audio - 48000 x 10 min x 60 sec/min x 2
– 58 million bytes (58 MB)
• Video
– 640 x 480 x 10 minutes
– 307,200 pixels/frame x 600 sec x 30 fps
– 5.5 billion pixels; at 3 bytes/pixel, 16.6 billion bytes (17 GB)
• Compression (reduce the size)
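The sizes above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and the video figure assumes 3 bytes per pixel, the same as the image example.

```python
# Raw (uncompressed) sizes from the "How big?" slide, in bytes.
image_bytes = 1024 * 1024 * 3          # width x height x 3 bytes per pixel
audio_bytes = 48000 * 10 * 60 * 2      # samples/sec x min x sec/min x the slide's factor of 2
video_pixels = 640 * 480 * 600 * 30    # pixels/frame x seconds x frames/sec
video_bytes = video_pixels * 3         # assumption: 3 bytes per pixel, as for the image

for name, size in [("image", image_bytes),
                   ("audio", audio_bytes),
                   ("video", video_bytes)]:
    print(f"{name}: {size:,} bytes")
```

The video total is about 5.5 billion pixels, which becomes roughly 16.6 billion bytes once each pixel takes 3 bytes.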
Problem
• Reduce the size of a data object
– Text
– Image
– Audio
– Video
• How to do it
– Cheat in ways that the user can’t see
– Coherence
Ways to cheat
• Text generally uses fewer than 128
distinct characters.
– Use 7 bits instead of 8 (12.5% smaller)
• For text, some characters are more common
than others
– Use fewer bits for common characters, more
bits for infrequently used characters
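One well-known way to give common characters fewer bits is Huffman coding. The slides do not name a specific method, so treat this as one possible sketch. The function name `huffman_code_lengths` and the sample text are illustrative assumptions.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {char: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:                      # a single symbol still needs 1 bit
        return {ch: 1 for ch in freq}
    # Heap items: (weight, tie_breaker, {char: depth so far}).
    # The tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)     # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        # Merging pushes every character in both subtrees one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

text = "four score and seven years ago"
lengths = huffman_code_lengths(text)
freq = Counter(text)
variable_bits = sum(freq[ch] * lengths[ch] for ch in freq)
print("8-bit:", 8 * len(text), "7-bit:", 7 * len(text), "variable:", variable_bits)
```

In this sample the space is the most frequent character, so its code is never longer than the code for a rare letter such as "f".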
Ways to cheat
• People can’t see more than 64 levels of gray
– Use 6 bits instead of 8 (25%)
• People don’t see color as well as B/W
– Use 6 bits for B/W and much less for color
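A minimal sketch of the 64-gray-level idea, assuming 8-bit input values. The helper names are hypothetical. Dropping the two low bits is one simple way to get from 8 bits to 6; the discarded detail cannot be recovered.

```python
def quantize_to_6_bits(gray8):
    """Map an 8-bit gray level (0-255) to a 6-bit level (0-63) by dropping 2 low bits."""
    return gray8 >> 2

def restore_to_8_bits(gray6):
    """Expand a 6-bit level back to 8 bits; the dropped detail stays lost."""
    # Replicating the top bits makes 0 map to 0 and 63 map back to 255.
    return (gray6 << 2) | (gray6 >> 4)

worst_error = max(abs(v - restore_to_8_bits(quantize_to_6_bits(v)))
                  for v in range(256))
print("worst-case error in gray levels:", worst_error)
```

The error stays within a few gray levels out of 256, which is the kind of difference the slide claims people cannot see.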
Coherence
• If we know the previous value of
something, then we generally have a good
idea what the next value will be
• 3 Techniques
– Run length encoding
– Reuse of subsequences
– Prediction and error
Run length encoding
• Values are frequently repeated.
– Instead of storing each value, store a single
value with a count of how many times to repeat
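As a rough sketch, run length encoding and its inverse might look like this. The function names and the sample row of pixels are illustrative only.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1            # same value as the current run: extend it
        else:
            runs.append([v, 1])         # new value: start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

row = ["W"] * 4 + ["B"] + ["W"] * 5     # one 10-pixel row: white, one black, white
encoded = rle_encode(row)
print(encoded)
```

Ten pixels collapse to three (value, count) entries, and decoding restores the row exactly.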
[Figure: pixel grid, 10 columns (0-9) by 12 rows (0-11)]
• 12 x 10 = 120 pixels
• 120 pixels x 3 bytes/pixel = 360 bytes
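Run encoded
[Figure: the 10 x 12 pixel grid with each run labeled by its length]
• Run lengths as listed: 12, 6, 4, 1, 9, 1, 9, 1, 9, 1, 3, 6, 1, 9, 1, 9, 1, 9, 1, 9, 1, 5, 12
• Each entry
– RGB - 3 bytes
– Count - 1 byte
• Entries - 23
• Space - 4 x 23 = 92 bytes
• Compression - (360-92)/360 = 74%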
Run encoded - with indexed color
[Figure: the same 10 x 12 pixel grid with the same 23 runs]
• 4 colors - 12 bytes
• Each entry
– Index - 2 bits
– Count - 6 bits
• Entries - 23
• Space - 12 + 1 x 23 = 35 bytes
• Compression - (360-35)/360 = 90%
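The byte counts on the two run-encoded slides can be reproduced from the 23 run lengths read off the figure. This is a sketch with made-up variable names. It assumes the 4-color table costs 3 bytes per color and that a 2-bit index plus a 6-bit count pack into one byte, which is how the slide's arithmetic works out.

```python
# Run lengths from the figure, in the order they are listed.
run_lengths = [12, 6, 4, 1, 9, 1, 9, 1, 9, 1, 3, 6,
               1, 9, 1, 9, 1, 9, 1, 9, 1, 5, 12]

pixels = sum(run_lengths)                       # should match the 12 x 10 grid
raw_bytes = pixels * 3                          # 3 bytes of RGB per pixel

rle_rgb_bytes = len(run_lengths) * (3 + 1)      # RGB (3 bytes) + count (1 byte) per run

palette_bytes = 4 * 3                           # 4 colors x 3 bytes each
rle_indexed_bytes = palette_bytes + len(run_lengths) * 1   # 2-bit index + 6-bit count

def saving(compressed_bytes):
    """Fraction of the raw size that compression removes."""
    return (raw_bytes - compressed_bytes) / raw_bytes

print(f"raw: {raw_bytes}, RLE: {rle_rgb_bytes} ({saving(rle_rgb_bytes):.0%} saved)")
print(f"indexed RLE: {rle_indexed_bytes} ({saving(rle_indexed_bytes):.0%} saved)")
```

The longest run is 12, so every count fits comfortably in the 6 bits allotted to it.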
Run encoding
Works well
HELLO
Run encoding
Works Badly
Run encoding
Works well
Run encoding
Not good
Too much variation in the rose
Run encoding - text
"four score and seven years ago, our fathers brought forth on this continent"
• Not good
• No repetition
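A quick check of why run encoding fails on this text, sketched with the standard `itertools.groupby`. The 2-bytes-per-run cost (one character plus a one-byte count) is an assumption made for illustration.

```python
from itertools import groupby

text = ("four score and seven years ago, our fathers "
        "brought forth on this continent")

# groupby clusters consecutive equal characters, which is exactly a run.
runs = [(ch, len(list(group))) for ch, group in groupby(text)]

raw_bytes = len(text)          # 1 byte per character
rle_bytes = 2 * len(runs)      # assumption: 1 byte for the character + 1 for the count

print(f"raw: {raw_bytes} bytes, run encoded: {rle_bytes} bytes")
```

No letter ever repeats back to back here, so every run has length 1 and the "compressed" form is twice the size of the original.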
Run Encoding - Audio
Not good
No repetition
Reuse common sequences
[Figure: four pixel grids, each 10 columns (0-9) by 12 rows (0-11)]
Works really well
Used in GIF format
Works fairly well
Blacks are good
Rose has some similarities
Works really well
Works poorly
Video
Works really well
Copy pieces from last frame into this frame
One technique in MPEG

Text
• Reuses words and phrases
• Works fairly well
• Most common text compression technique
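One concrete form of sequence reuse is LZW, the dictionary scheme used by the GIF format. The sketch below is a plain textbook version, not GIF's exact bit-level variant. The function names are illustrative, and it assumes every character has a code point below 256.

```python
def lzw_compress(text):
    """Return a list of integer codes; repeated sequences get a single code."""
    table = {chr(i): i for i in range(256)}     # start with all single characters
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate                 # keep growing a known sequence
        else:
            codes.append(table[current])        # emit the longest known sequence
            table[candidate] = len(table)       # remember the new, longer sequence
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text, regrowing the same table the compressor built."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The only code that can be missing is the one being defined right now.
            entry = previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)

text = "the rain in spain stays mainly in the plain " * 4
codes = lzw_compress(text)
print(len(text), "characters ->", len(codes), "codes")
```

Because words and phrases recur, later copies are replaced by single codes that point back to sequences already seen.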

Prediction + error
• Given previous values, predict what the
next value will be
• When it is not quite right, store the error
• The error almost always takes fewer bits than the value
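A minimal sketch of prediction + error, using the simplest possible predictor: "the next value equals the previous one". The function names and sample values are made up for illustration.

```python
def encode_errors(samples):
    """Predict each sample as the previous one and keep only the errors."""
    errors = []
    previous = 0                       # nothing is known before the first sample
    for s in samples:
        errors.append(s - previous)    # error = actual - predicted
        previous = s
    return errors

def decode_errors(errors):
    """Rebuild the samples by adding each error back onto the prediction."""
    samples = []
    previous = 0
    for e in errors:
        previous += e
        samples.append(previous)
    return samples

samples = [100, 102, 105, 107, 108, 108, 107, 105]
errors = encode_errors(samples)
print(errors)
```

After the first entry, which has to carry the full starting value, every error is a small number that needs far fewer bits than the sample it replaces.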
Linear prediction
• A line through the previous values predicts the next value
[Figure sequence: the prediction applied at successive points, with the error labeled little, more, still more, less, less, little]
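Linear prediction can be sketched as extending the line through the two previous samples. This two-point form is an assumption; the slides only say "line through previous". The function name and test signals are hypothetical.

```python
def linear_prediction_errors(samples):
    """Errors when samples[n] is predicted from the line through the two before it."""
    errors = []
    for n in range(2, len(samples)):
        predicted = 2 * samples[n - 1] - samples[n - 2]   # extend the line one step
        errors.append(samples[n] - predicted)
    return errors

ramp = [10, 12, 14, 16, 18, 20]      # a straight line: predicted exactly
curve = [0, 1, 4, 9, 16, 25]         # a gentle curve: small, steady error
jumpy = [10, 50, 5, 60, 0, 70]       # no coherence: errors exceed the values

for name, signal in [("ramp", ramp), ("curve", curve), ("jumpy", jumpy)]:
    print(name, linear_prediction_errors(signal))
```

The errors grow where the signal bends and shrink where it straightens out, and when the data has no coherence the error can be larger than the value itself.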
Linear Prediction
More error
• Look closer
Little Error
Linear Prediction
• Prediction + error
• Shades of black
• Follows shade of rose
• Rose detail is the error relative
to the shade
• Prediction + error +
cheating = JPEG
Video
• Copy from
previous frame
• Store error for
small details
• MPEG
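The frame-to-frame idea can be sketched with a one-dimensional "frame" of gray values. This is a deliberate simplification with hypothetical names: it uses the previous frame unchanged as the prediction, whereas MPEG also shifts blocks to follow motion before storing the error.

```python
def frame_difference(previous, current):
    """Per-pixel error when the previous frame is used as the prediction."""
    return [c - p for p, c in zip(previous, current)]

def rebuild_frame(previous, difference):
    """Recover the current frame from the previous frame plus the stored error."""
    return [p + d for p, d in zip(previous, difference)]

frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]   # bright object moved one pixel right

difference = frame_difference(frame1, frame2)
print(difference)
```

Most of the difference is zero, so only the pixels at the edges of the moving object need to be stored.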
Text
• N-Grams
• Use the last N letters to predict the next letter
• Store errors
• English is quite regular
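A toy sketch of N-gram prediction with N = 3: learn which letter most often follows each 3-letter context, then store only the letters the model gets wrong. The function names and the tiny training text are illustrative assumptions, not a real language model.

```python
from collections import Counter, defaultdict

def train_ngram(text, n):
    """Map each n-letter context to the letter that most often followed it."""
    counts = defaultdict(Counter)
    for i in range(n, len(text)):
        counts[text[i - n:i]][text[i]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def prediction_misses(text, model, n):
    """Return (position, actual letter) wherever the model predicts wrongly."""
    return [(i, text[i]) for i in range(n, len(text))
            if model.get(text[i - n:i]) != text[i]]

def rebuild(prefix, length, misses, model, n):
    """Regenerate the text from its first n letters plus the stored misses."""
    corrections = dict(misses)
    out = list(prefix)
    for i in range(n, length):
        guess = model.get("".join(out[i - n:i]))
        out.append(corrections.get(i, guess))    # stored letter wins over the guess
    return "".join(out)

n = 3
training = "the cat sat on the mat and the rat sat on the hat " * 3
model = train_ngram(training, n)

message = "the cat sat on the mat "
misses = prediction_misses(message, model, n)
print(len(misses), "misses out of", len(message) - n, "predicted letters")
```

Only the missed letters need to be stored, and the more regular the text, the fewer misses there are.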

Review
• Cheat
– Exploit weakness in what people can perceive
• Coherence
– Run encoding (count repetitions)
– Reuse (reference pieces from previous data)
– Predict + error
• Know when each technique will or will not work
