Image and video compression
University of the Philippines - Diliman August 2006
Diane Lingrand
[email protected]
http://www.polytech.unice.fr/~lingrand
Today's menu
• Ideas for image compression
• Huffman coding
• LZW coding
• Discrete transforms: Fourier, cosine
• Well-known formats:
– GIF, PNG, JPEG
Why do we need to compress images?
• Storage :
– hard drive
– digital camera, PDA, …
• Transmission
– Internet
– Radio waves
Objectives
• Speed of compression / decompression
• Robustness of decompression
• Compression ratio
• Amount of information:
– lossless or lossy compression
• Quality:
– the best possible, given the abilities of human vision
– sufficient for detecting information
Error measurement
• Difficult and complex problem
• MSE = Mean Square Error
• PSNR = Peak Signal-to-Noise Ratio
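For an M × N original image I and its decompressed version Ĩ:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I(i,j) - \tilde{I}(i,j)\right)^2$$
$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}} \quad \text{(in dB, for 8-bit images)}$$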
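Both measures can be sketched in a few lines of Python, assuming images stored as lists of rows of 8-bit values (the function names `mse` and `psnr` are illustrative). A lossless decompression gives MSE = 0, so the PSNR is infinite in that case.

```python
import math


def mse(original, decoded):
    """Mean square error between two images of the same size (lists of rows)."""
    n = len(original) * len(original[0])
    return sum((a - b) ** 2
               for row_o, row_d in zip(original, decoded)
               for a, b in zip(row_o, row_d)) / n


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    e = mse(original, decoded)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)


original = [[10, 20], [30, 40]]
decoded = [[12, 20], [30, 36]]   # two pixels altered by the compression
print(mse(original, decoded))    # (2^2 + 4^2) / 4 = 5.0
print(f"{psnr(original, decoded):.2f} dB")
```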
Huffman coding: descending phase
value n_i:     n1    n2    n3    n4    n5    n6    n7    n8
probability:   0.30  0.18  0.15  0.10  0.10  0.09  0.05  0.03
1st step: from m = 8 to m = 7
n7 and n8 have the smallest probabilities: we group them into one element n7,8 of probability 0.08
...
last step: only one element is left, of probability 1
Huffman coding: ascending phase
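merged elements (created during the descending phase): 0.08, 0.17, 0.20, 0.32, 0.38, 0.62, 1
going back up from the element of probability 1, each merged element gives bit 0 to one of its two parts and bit 1 to the other; the code of n_i is the sequence of bits read from the root down to n_i (order of reading)
value n_i:     n1    n2    n3    n4    n5    n6    n7     n8
probability:   0.30  0.18  0.15  0.10  0.10  0.09  0.05   0.03
code:          01    11    001   100   101   0000  00010  00011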
Huffman coding
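• without Huffman coding (fixed-length code for 8 values)
– sum of: p(ni) × 3 bits = 3 bits
• with Huffman coding
– sum of: p(ni) × li, where li is the length of the code of ni
• In our example:
3 bits versus 2.77 bits per value
image (640×480 = 307200 pixels): saving of 0.23 × 307200 = 70656 bits = 8832 bytes ≈ 8.6 kB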
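A minimal Python sketch of both phases, assuming the probabilities are given as a dict (the names `huffman_code` and `probs` are illustrative). It uses a priority queue (`heapq`) to find the two least probable elements at each step instead of re-sorting the list; the code lengths match the ascending phase, but the 0/1 assignment, and therefore the exact bit patterns, may differ.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so that heapq never has to compare dicts
    # Each heap entry: (probability, tie, {symbol: code built so far})
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Descending phase: group the two elements of smallest probability
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        # Ascending phase, done incrementally: one more bit in front of
        # every code of each of the two groups (0 for one, 1 for the other)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


probs = {"n1": 0.30, "n2": 0.18, "n3": 0.15, "n4": 0.10,
         "n5": 0.10, "n6": 0.09, "n7": 0.05, "n8": 0.03}
codes = huffman_code(probs)
avg = sum(probs[s] * len(c) for s, c in codes.items())
print(codes)
print(f"average length: {avg:.2f} bits (3 bits with a fixed-length code)")
```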
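LZW or Lempel-Ziv-Welch
• Used by:
– the GIF format (color images using 8 bits)
– TIFF (not always)
– .Z files (compress)
• Not used by .gzip or .gz files (GNU zip), which rely on DEFLATE (LZ77 + Huffman coding)
• patented by Unisys (the GIF format was defined by CompuServe); the patents have since expired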
LZW
• Lossless compression
• Good for images with large uniform areas
• Algorithm: splits the sequence of pixels into words and gives a code to each word
• Pixels are considered as a 1D array (vertical redundancy is not exploited)
LZW : algorithm
• The string of pixels is split into the longest strings already present in the table
• Construction of a table:
– we begin with single pixels
– then we add strings of pixels, longer and longer
• The code of a string does not depend on the string's length
Example of LZW coding
w <- empty string
while (read a char k)
    if wk is already in the dictionary
        w <- wk
    else
        add wk to the dictionary
        output the code of w
        w <- k
output the code of w
Input: ABRACADABRACADA...
w     k    wk     exists?   output       new entry   address
A     B    AB     no        @(A)         AB          100
B     R    BR     no        @(B)         BR          101
R     A    RA     no        @(R)         RA          102
A     C    AC     no        @(A)         AC          103
C     A    CA     no        @(C)         CA          104
A     D    AD     no        @(A)         AD          105
D     A    DA     no        @(D)         DA          106
A     B    AB     yes
AB    R    ABR    no        @(AB)=100    ABR         107
R     A    RA     yes
RA    C    RAC    no        @(RA)=102    RAC         108
C     A    CA     yes
CA    D    CAD    no        @(CA)=104    CAD         109
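The coding loop translates almost line by line into Python. This is a sketch under two assumptions: characters stand for pixel values, and the dictionary starts with the 256 byte values, so new entries get addresses 256, 257, … where the example uses 100, 101, … for readability.

```python
def lzw_encode(text):
    """LZW-encode a string; returns the list of emitted codes."""
    table = {chr(i): i for i in range(256)}  # single characters
    next_code = 256                          # first free address
    w = ""
    out = []
    for k in text:
        wk = w + k
        if wk in table:
            w = wk                  # keep extending the current string
        else:
            out.append(table[w])    # output the code of w
            table[wk] = next_code   # add wk to the dictionary
            next_code += 1
            w = k
    if w:
        out.append(table[w])        # flush the last pending string
    return out


print(lzw_encode("ABRACADABRACADA"))
# → [65, 66, 82, 65, 67, 65, 68, 256, 258, 260, 262]
```

Codes 256, 258 and 260 correspond to @(AB)=100, @(RA)=102 and @(CA)=104 in the table.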
Example of LZW decoding
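previous string <- empty string
while (read a code k)
    if k is in the dictionary
        current string <- string stored at address k
    else (k is the next free address)
        current string <- previous string + 1st char of previous string
    output current string
    c <- 1st char of current string
    if previous string is not empty
        store previous string + c at the next free address
    previous string <- current string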
Decoding the codes of ABRACADABRACADA...
code received   current string   output   c    new entry   address   previous string
A               A                A        A                          A
B               B                B        B    AB          100       B
R               R                R        R    BR          101       R
A               A                A        A    RA          102       A
C               C                C        C    AC          103       C
A               A                A        A    CA          104       A
D               D                D        D    AD          105       D
100             AB               AB       A    DA          106       AB
102             RA               RA       R    ABR         107       RA
104             CA               CA       C    RAC         108       CA
...
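A matching Python sketch of the decoder, with the same assumption of a dictionary initialised to the 256 byte values. It also handles the one case the basic loop misses: the encoder may emit a code it created at the immediately preceding step (strings like "AAA…"), which the decoder has not stored yet.

```python
def lzw_decode(codes):
    """Decode LZW codes produced with a table initialised to the 256 byte values."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256  # next free address
    previous = ""
    out = []
    for k in codes:
        if k in table:
            current = table[k]
        elif k == next_code and previous:
            # Code created by the encoder at the previous step and not yet
            # known here: it can only be previous + 1st char of previous
            current = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {k}")
        out.append(current)  # output current string
        if previous:
            # New entry = previous string + 1st char of current string
            table[next_code] = previous + current[0]
            next_code += 1
        previous = current
    return "".join(out)


# Codes of ABRACADABRACADA (256, 258, ... play the role of 100, 102, ...)
print(lzw_decode([65, 66, 82, 65, 67, 65, 68, 256, 258, 260, 262]))
```

The decoder rebuilds exactly the same dictionary as the encoder, so the table never has to be transmitted.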
Discrete transforms
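• Discrete Fourier transform
• Discrete cosine transform
– smaller coefficients
– real coefficients
$$F(u,v) = \frac{2}{N}\,C(u)\,C(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\cos\frac{(2x+1)u\pi}{2N}\cos\frac{(2y+1)v\pi}{2N}$$
with $C(u) = 1/\sqrt{2}$ if $u = 0$, $C(u) = 1$ otherwise, and likewise for $C(v)$ (JPEG uses N = 8)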
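A direct (slow, O(N⁴)) Python sketch of the 2-D DCT, assuming the orthonormal DCT-II normalization written above (the function name `dct2` is illustrative). On a smooth block, almost all the energy ends up in a few low-frequency coefficients, which is what makes the later steps of JPEG efficient.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    n = len(block)

    def c(k):
        # Normalization factor: 1/sqrt(2) for k = 0, 1 otherwise
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = 2 / n * c(u) * c(v) * s
    return out


# A smooth 8x8 block (gentle gradient)
block = [[100 + 2 * x + y for y in range(8)] for x in range(8)]
coeffs = dct2(block)
total = sum(v * v for row in coeffs for v in row)
low = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(f"share of energy in the 2x2 top-left coefficients: {low / total:.4f}")
```

Because the transform is orthonormal, the total energy of the coefficients equals the energy of the pixels.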
Zig-zag scanning
• The largest coefficients are located in the top-left part of the transformed image (low frequencies)
• Order in which the 64 coefficients of an 8×8 block are scanned:
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
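The scanning order can be generated rather than stored. This sketch (function name illustrative) walks the anti-diagonals row + col = d and reverses the direction on every other diagonal, which reproduces the 8×8 order above.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scanning order."""
    order = []
    for d in range(2 * n - 1):  # anti-diagonal: row + col = d
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            cells.reverse()     # even diagonals are read bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block):
    """Flatten a square block into a 1-D list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Rank of each position (1 = first coefficient read), as in the table
rank = [[0] * 8 for _ in range(8)]
for i, (r, c) in enumerate(zigzag_order(), start=1):
    rank[r][c] = i
for row in rank:
    print(" ".join(f"{v:2d}" for v in row))
```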
Run length coding
• Many coefficients are zero:
– we count the zeros before each non-zero value
– Example:
• 200 0 80 0 0 4 0 0 0 0 1 …
• is replaced by the pairs (number of zeros, value):
• 0 200 1 80 2 4 4 1 …
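The pairing rule of the example, sketched in Python (function name illustrative). Trailing zeros produce no pair here; in JPEG they are signalled by a special end-of-block symbol, which this sketch leaves out.

```python
def run_length_pairs(coeffs):
    """Replace a sequence by (number of zeros before, non-zero value) pairs."""
    pairs = []
    zeros = 0
    for c in coeffs:
        if c == 0:
            zeros += 1
        else:
            pairs.append((zeros, c))
            zeros = 0
    return pairs  # trailing zeros are dropped (end of block)


print(run_length_pairs([200, 0, 80, 0, 0, 4, 0, 0, 0, 0, 1]))
# → [(0, 200), (1, 80), (2, 4), (4, 1)]
```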
JPEG
• Encoding by 8x8 blocks
• Several steps:
– DCT
– Quantization
– Zig-zag scanning
– Run-length coding
– Huffman coding
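Quantization is the only lossy step of the chain. A minimal sketch, assuming a hypothetical quantization table whose step grows with the frequency u + v (it is not the table recommended by the JPEG standard): each DCT coefficient is divided by its step and rounded, so small high-frequency coefficients become zeros that the run-length coding then removes.

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its quantization step and round."""
    return [[round(c / q) for c, q in zip(row, qrow)]
            for row, qrow in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Decoder side: multiply back; the rounding error is what is lost."""
    return [[lev * q for lev, q in zip(row, qrow)]
            for row, qrow in zip(levels, qtable)]


# Hypothetical table: step 1 for the DC coefficient, up to 29 at (7, 7)
qtable = [[1 + 2 * (u + v) for v in range(8)] for u in range(8)]

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 1024.0, -35.0, 12.0, 5.0
levels = quantize(coeffs, qtable)
print(levels[0][0], levels[0][1], levels[1][0], levels[7][7])
print(dequantize(levels, qtable)[0][1])  # -35 comes back as -36
```

Python's `round` rounds halves to the even integer; a real codec may use a different rounding rule.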
Example: one image stored with different formats and JPEG quality settings
– JPEG, original: 88 kB
– GIF: 232 kB
– JPEG 50 %: 68 kB
– JPEG 25 %: 36 kB
– JPEG 12 %: 20 kB
– JPEG 5 %: 12 kB
– JPEG 1 %: 8 kB
About wavelets (1)
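• Replace the DCT in JPEG 2000
• Principle:
– decomposition of the signal on a wavelet basis
• Wavelet basis:
– generated by scaling and translation of a "mother" wavelet ψ:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \quad (a,b)\in\mathbb{R}^2,\ a\neq 0$$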
About wavelets (2)
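• Orthogonal basis (dyadic scales a = 2^j and translations b = k 2^j):
$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\left(2^{-j}t - k\right), \quad (j,k)\in\mathbb{Z}^2$$
• Wavelet coefficients:
$$c_{j,k} = \langle f, \psi_{j,k}\rangle = \int_{-\infty}^{+\infty} f(t)\,\psi_{j,k}(t)\,dt$$
• Haar's basis:
$$\psi(t) = \begin{cases} 1 & 0 \le t < 1/2 \\ -1 & 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$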
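On a discrete signal, one level of the orthonormal Haar decomposition replaces each pair of samples by a scaled average and a scaled difference. This Python sketch assumes an even-length 1-D signal (function names illustrative); in uniform areas the detail coefficients are zero, which is what the compression exploits. Applying the same step again to the averages gives the next, coarser scale.

```python
import math

S = 1 / math.sqrt(2)  # scaling of the orthonormal Haar basis


def haar_step(signal):
    """One level of the 1-D Haar wavelet transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from one level of approximation and detail."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) * S, (a - d) * S]
    return out


signal = [4, 4, 6, 6, 10, 12, 8, 8]
approx, detail = haar_step(signal)
print([round(v, 3) for v in approx])
print([round(v, 3) for v in detail])  # zero wherever two neighbours are equal
```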
JPEG 2000
• progressive bit stream
• efficient compression, lossless or lossy
• regions of interest can be selected for different compression rates
• includes mechanisms for error resilience
Examples using JPEG 2000
http://jpeg2000.epfl.ch
Figure: the same image compressed with JPEG 2000 and with JPEG, at 80 kB, 40 kB and 20 kB
Video formats and video compression
Analog formats
• Composite formats:
– separation of luminance / chrominance
– PAL, SECAM, NTSC
– primary colors for NTSC in 1954:
• red = 612 nm, green = 530 nm and blue = 472 nm
– luminance: EY = 0.30 ER + 0.59 EG + 0.11 EB
– chrominance: Dr = ER - EY and Db = EB - EY
– the UER (Union Européenne de Radiodiffusion, i.e. the European Broadcasting Union) decided to use the same equations for PAL and SECAM (but with different λ), to allow the compatibility of black-and-white TV with color TV and vice versa
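The luminance and color-difference equations as a tiny Python function (name illustrative). For a gray or white input both color differences vanish, so the luminance alone is the full black-and-white picture, which is the compatibility argument above.

```python
def rgb_to_y_dr_db(er, eg, eb):
    """Luminance EY and color differences Dr, Db from the primary signals."""
    ey = 0.30 * er + 0.59 * eg + 0.11 * eb
    return ey, er - ey, eb - ey


print(rgb_to_y_dr_db(1.0, 1.0, 1.0))  # white: (about 1, about 0, about 0)
print(rgb_to_y_dr_db(1.0, 0.0, 0.0))  # pure red
```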
Digital formats
• Components:
– analog: Y Dr Db
– digital: Y Cr Cb
• Allows lossless copies
• Image dimensions:
– 525 lines, 60 fields / s (30 interlaced frames / s)
– 625 lines (576 active), 50 fields / s (25 interlaced frames / s)
• Format 4:2:2
• Format 4:2:0 (DVD)
Luminance / Chrominance coding:
format 4:2:2 ....
Historically, 4 represents the luminance sampling frequency of 13.5 MHz
Figure: positions of the luminance and chrominance samples in the 4:4:4, 4:2:2, 4:2:0 and 4:1:1 formats
Digitization
• As for still images:
– Sampling
– Quantization
• Maximal frequencies:
– audio: 20 kHz
– video: 6 MHz
– Nyquist's theorem: the sampling frequency must exceed twice the maximal frequency (2 × 6 MHz = 12 MHz for video), hence Fe(Y) = 13.5 MHz and Fe(Cr) = Fe(Cb) = 6.75 MHz
Video compression
• Properties:
– 25 to 30 images / second (video rate)
– speed of coding/decoding
• Size of the data:
– 1 image in 4:2:2 format, 8 bits: 810 kB
• (720 + 360 + 360) = 1440 bytes / line × 576 lines
– 1 second of video: 21 MB
– 1 CD of 650 MB ≈ 31 s of video
Figure: Y, Cr and Cb samples along a line; a line lasts 64 µs, of which 52 µs are active; 625 lines in total, 576 active lines
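The size computation generalizes to the other sampling formats. A Python sketch, assuming a 720 × 576 frame, 8 bits per sample, 25 images/s and the (horizontal, vertical) chroma subsampling factors listed in `SUBSAMPLING` (all names illustrative). The 810 kB figure corresponds to 829 440 bytes with 1 kB = 1024 bytes; the per-second and CD figures use 1 MB = 10^6 bytes.

```python
# Chroma subsampling factors (horizontal, vertical) for each format
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2), "4:1:1": (4, 1)}


def frame_bytes(fmt, width=720, height=576, bits=8):
    """Size in bytes of one uncompressed frame (Y plus Cr and Cb)."""
    h, v = SUBSAMPLING[fmt]
    luma = width * height
    chroma = (width // h) * (height // v)  # samples per color-difference plane
    return (luma + 2 * chroma) * bits // 8


size = frame_bytes("4:2:2")        # 1440 bytes/line * 576 lines
per_second = 25 * size             # 25 images / s
cd_seconds = 650e6 / per_second    # 650 MB CD
print(size, size / 1024, per_second / 1e6, round(cd_seconds, 1))
for fmt in SUBSAMPLING:
    print(fmt, frame_bytes(fmt))
```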
Ideas for video compression
• compression of each frame (= image)
• motion estimation
• if frame (n+1) is almost the same as frame (n), only encode the differences
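The last idea in its simplest form, without motion compensation: transmit the difference between consecutive frames, which is mostly zeros when little changes. A Python sketch with hypothetical 4 × 4 frames (function names illustrative).

```python
def frame_difference(prev, curr):
    """Residual to encode when frame n+1 is close to frame n."""
    return [[c - p for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def reconstruct(prev, residual):
    """Decoder side: frame n+1 = frame n + residual."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prev, residual)]


frame_n = [[100] * 4 for _ in range(4)]
frame_n1 = [row[:] for row in frame_n]
frame_n1[1][2] = 120  # only one pixel changes
residual = frame_difference(frame_n, frame_n1)
nonzero = sum(1 for row in residual for v in row if v != 0)
print(f"{nonzero} non-zero value(s) out of 16")
```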
Compression ratio
• Still images
– lossless: 3:1
– lossy: 10:1 (still good quality)
• Video:
– broadcast applications: 15:1 < σ < 40:1
– processing: σ ≃ 5:1
• How to compute the compression ratio:
– the original format must be known (4:2:2 8 bits, …)
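A worked example of the ratio, assuming the original is 720 × 576 video in 4:2:2, 8 bits, 25 images/s (on average 2 samples per pixel: 1 Y + 1/2 Cr + 1/2 Cb) and a hypothetical compressed rate of 5 Mbit/s. The result falls inside the broadcast range.

```python
def compression_ratio(original_bits_per_s, compressed_bits_per_s):
    """sigma = size of the original data / size of the compressed data."""
    return original_bits_per_s / compressed_bits_per_s


# Original format: 4:2:2, 8 bits, 720 x 576 pixels, 25 images / s
raw_bits_per_s = 720 * 576 * 2 * 8 * 25   # 165 888 000 bit/s
sigma = compression_ratio(raw_bits_per_s, 5_000_000)  # hypothetical 5 Mbit/s
print(f"sigma = {sigma:.1f}:1")
```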
Compression standards
• 1989: JPEG (Joint Photographic Experts Group)
• M-JPEG = Motion JPEG
– real-time compression / decompression at 25 or 30 images / s
– problems: synchronization with sound; the transformation of JPEG into M-JPEG is not standardized, so there are several methods and they are incompatible
DV
• similar to M-JPEG, but:
– standardized
– efficient quantization tables
• 4:1:1 or 4:2:0
• consumer and professional video products (with few differences)
MPEG1, MPEG2
Moving Picture Experts Group
• 1992: MPEG 1
– standard for low-resolution moving pictures, for multimedia applications
– JPEG + temporal redundancies
– rate: 1.5 Mbit/s for video and sound
– VHS quality, compatible with CD-ROM and Video CD
– 1 CD = 650 MB = 74 minutes (video and sound)
• 1994: MPEG 2 (the DVD standard)
– derived from MPEG 1, with higher quality
• standard video (3 to 10 Mbit/s)
• high definition (300 Mbit/s)
– MPEG 3 was initially intended for high definition but has been merged into MPEG 2
The latest: MPEG 4
• MPEG 4 AVC or H.264
– better compression ratio
– 4x4 blocks
– prediction using several images
• e.g.: blinking
• Windows Media 9 (WM9)
– similar quality
– not a standard: proprietary
SIF
• SIF (Source Input Format)
– half spatial resolution and half temporal resolution (1 field out of 2)
– 360 pixels by 288 lines at 25 Hz
Figure: from 4:2:2 (Y 720 × 576) to SIF: the odd fields are kept (vertical subsampling, 288 lines), then horizontal subsampling (360 pixels), for Y and Cr, Cb; sizes compared with TV 4/3 (625 lines) and HDTV 16/9 (1250 lines)
GOP (Group Of Pictures)
• composed of 3 types of images
– I (intra): coded using JPEG
– P (predicted):
• predicted from a previous I or P
• coded using motion vectors and the prediction error
• can propagate errors
• about 3 times smaller than I
– B (bidirectional):
• computed by bidirectional interpolation from past and/or future I or P, using motion vectors
• the smallest: about 6 times smaller than I
• don't propagate errors
GOP
• a GOP begins with an I and ends just before the next I
• Typical organization: GOP of 12 images
– M = 3 (distance between 2 P)
– N = 12 (distance between 2 I)
I B B P B B P B B P B B
Figure: prediction links between the images of the GOP
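A Python sketch that builds the display-order pattern from N and M and estimates the size of a GOP in units of one I image, assuming the rough ratios of the previous slide (P ≈ I/3, B ≈ I/6; function names illustrative). It also illustrates why a longer GOP compresses more: the expensive I image is shared by more images.

```python
def gop_pattern(n=12, m=3):
    """Image types of one GOP in display order: I first, then a P every m images."""
    return "".join("I" if i == 0 else ("P" if i % m == 0 else "B") for i in range(n))


def relative_gop_size(pattern, p_ratio=3, b_ratio=6):
    """GOP size in units of one I image, assuming P = I/3 and B = I/6."""
    cost = {"I": 1.0, "P": 1.0 / p_ratio, "B": 1.0 / b_ratio}
    return sum(cost[t] for t in pattern)


for n in (12, 24):
    gop = gop_pattern(n)
    gain = len(gop) / relative_gop_size(gop)  # versus coding every image as I
    print(" ".join(gop), f"-> gain {gain:.2f}")
```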
GOP
• with a long GOP, the compression ratio is higher
• for a given compression ratio, a long GOP gives a better image quality
• access to an image:
– a GOP cannot be cut
– the GOP is the random access unit
• decompression:
– the GOP length determines the latency
Motion estimation
• Block = 8x8 (cf. JPEG)
• Macroblock: built from 4 luminance blocks (16x16 pixels) and 2 or 4 chrominance blocks
• Motion estimation on macroblocks:
– Search for similar macroblocks between an image and the previous one
– Computation of motion vectors (translation)
– Computation of the predicted image using the motion vectors
– Comparison between the predicted image and the image => prediction errors
– Coding and transmission of the motion vectors and of the prediction errors
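The search for similar macroblocks is often done by block matching. This Python sketch assumes a full search in a ±4 pixel window with the sum of absolute differences (SAD) as similarity criterion; the criterion, window size and function names are assumptions, since the slides do not specify them. The toy frames use 8×8 blocks to stay small.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def motion_vector(cur, ref, bx, by, size=16, search=4):
    """Full search in a (2*search+1)^2 window; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Ignore candidate blocks that fall outside the reference image
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames: the current image is the previous one moved 2 pixels right, 1 down
W = H = 32
ref = [[(7 * x + 13 * y) % 256 for x in range(W)] for y in range(H)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(W)]
       for y in range(H)]
print(motion_vector(cur, ref, bx=8, by=8, size=8))  # → (-2, -1, 0)
```

The vector points from the current block to its match in the previous image; a cost of 0 means a perfect prediction, so only the vector needs to be sent for that block.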
MPEG 4 and 7
• for multimedia applications
– hybrid coding of both natural and synthetic video
– interactive modes allowing a user to interact with the contents
– compatibility with low-bandwidth channels
– robustness to noise
– copyright protection
– possibility to search for information in the video
MPEG 4 and 7 (continued)
• MPEG 4:
– object-oriented coding, which requires segmentation
• MPEG 7:
– standardizes the way the content of a video is described (textual, visual and audio criteria, …)
– does not standardize the extraction of information or the search engine