Image and video compression

(1)

University of the Philippines - Diliman August 2006

Diane Lingrand

http://www.polytech.unice.fr/~lingrand

(2)

Today's menu

• Ideas for image compression

• Huffman coding

• LZW coding

• Discrete transforms: Fourier, cosine

• Well known formats :

– GIF, PNG, JPEG

(3)

Why do we need to compress an image ?

• Storage :

– hard drive

– digital camera, PDA, …

• Transmission

– Internet

– Radio waves

(4)

Objectives

• Speed of compression / decompression

• Robustness of decompression

• Compression ratio

• Quantity of information:

– with or without loss of data

• Quality:

– the best, according to our visual abilities

– sufficient for detecting information

(5)

Error measurement

• Difficult and complex problem

• MSE = Mean Square Error

• PSNR = Peak Signal-to-Noise Ratio (computed from the MSE)
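The two measures can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definitions (MSE as the mean of squared pixel differences, PSNR as 10 log10(peak² / MSE) with a peak value of 255 for 8-bit images); the function names `mse` and `psnr` are chosen here for illustration.

```python
import math

def mse(original, decoded):
    """Mean square error between two images given as equal-size lists of rows."""
    total = 0
    count = 0
    for row_a, row_b in zip(original, decoded):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)

a = [[10, 20], [30, 40]]
b = [[12, 20], [30, 36]]
print(mse(a, b))   # (4 + 0 + 0 + 16) / 4 = 5.0
print(psnr(a, b))  # PSNR in dB for the same pair
```

A higher PSNR means a smaller error, but neither measure fully reflects perceived visual quality.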

(6)

Huffman coding: descending phase

values n_i:     n₁    n₂    n₃    n₄    n₅    n₆    n₇    n₈

probabilities:  0.30  0.18  0.15  0.10  0.10  0.09  0.05  0.03

1st step : from m=8 to m=7

n₇ and n₈ have the smallest probabilities : we group them in one element n_7,8 of probability 0.08

...

last step : there is only one element left of probability 1

(7)

Huffman coding: ascending phase

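values n_i:     n₁    n₂    n₃    n₄    n₅    n₆    n₇    n₈

probabilities:  0.30  0.18  0.15  0.10  0.10  0.09  0.05  0.03

successive groupings: 0.08 = 0.03 + 0.05 ; 0.17 = 0.08 + 0.09 ; 0.20 = 0.10 + 0.10 ; 0.32 = 0.17 + 0.15 ; 0.38 = 0.18 + 0.20 ; 0.62 = 0.30 + 0.32 ; 1 = 0.62 + 0.38

at each grouping, one branch is labelled 0 and the other 1

order of reading: from the root (probability 1) down to the value

codes:          01    11    001   100   101   0000  00010 00011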

(8)

Huffman coding

• without Huffman coding

– sum of : p(n_i) * 3 bits

• with Huffman coding

– sum of : p(n_i) * l_i

• In our example :

3 bits versus 2.77 bits per pixel on average

image (640*480) : saving of 307200 * 0.23 = 70656 bits ≈ 8.6 kB
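The grouping procedure can be checked with a short Python sketch. It is only an illustration, assuming the eight probabilities of the example; it computes code lengths rather than the codes themselves, and the names `huffman_code_lengths`, `probs` and `average` are chosen here.

```python
import heapq
from itertools import count

def huffman_code_lengths(probabilities):
    """Return {symbol: code length} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so that the heap never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # group the two elements with the smallest probabilities
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # every symbol below the new node gets one more bit
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"n1": 0.30, "n2": 0.18, "n3": 0.15, "n4": 0.10,
         "n5": 0.10, "n6": 0.09, "n7": 0.05, "n8": 0.03}
lengths = huffman_code_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)  # sum of p(n_i) * l_i
print(lengths)
print(round(average, 2))
```

The average is the "sum of p(n_i) * l_i" of the slide, to be compared with the 3 bits of a fixed-length code.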

(9)

LZW or Lempel-Ziv Welch

• Used by :

– GIF format (color images using 8 bits)

– TIFF (not always)

– .Z files (compress)

• Note: .gz files (GNU zip) use DEFLATE (LZ77 + Huffman coding), not LZW

• LZW was patented by Unisys (GIF was defined by CompuServe); the patents have since expired

(10)

LZW

• Compression without loss

• Good for images with large uniform areas

• Algorithm: splits the set of pixels into words and gives a code to each word

• Considers pixels as a 1D array (vertical redundancy is not exploited)

(11)

LZW : algorithm

• Splitting the string of pixels into the longest strings

• Construction of a table :

– we begin with pixels alone

– then, we consider strings of pixels, longer and longer

• The code for a string does not depend on the string's length

(12)

Example of LZW coding

w <- empty string
while (read a char k)
    if wk is already in the dictionary
        w <- wk
    else
        add wk to the dictionary
        return code of w
        w <- k
return code of w

ABRACADABRACADA...

w    k    wk    exists?   return       entry   @
A    B    AB    no        @(A)         AB      100
B    R    BR    no        @(B)         BR      101
R    A    RA    no        @(R)         RA      102
A    C    AC    no        @(A)         AC      103
C    A    CA    no        @(C)         CA      104
A    D    AD    no        @(A)         AD      105
D    A    DA    no        @(D)         DA      106
A    B    AB    yes
AB   R    ABR   no        @(AB)=100    ABR     107
R    A    RA    yes
RA   C    RAC   no        @(RA)=102    RAC     108
C    A    CA    yes
CA   D    CAD   no        @(CA)=104    CAD     109
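The coding loop can be sketched in Python as follows. Two conventions are assumptions made for this illustration: single characters are coded by their `ord()` value (standing in for @(A), @(B), …), and new entries are numbered from 100 as in the table. The function name `lzw_encode` is hypothetical.

```python
def lzw_encode(text, first_free=100):
    """LZW-encode a string; returns the list of codes and the final dictionary."""
    # assumption: every ord(ch) is below first_free (true for uppercase letters)
    dictionary = {ch: ord(ch) for ch in set(text)}
    next_code = first_free
    w = ""
    output = []
    for k in text:
        wk = w + k
        if wk in dictionary:
            w = wk                         # keep extending the current string
        else:
            dictionary[wk] = next_code     # new entry at the first free address
            next_code += 1
            output.append(dictionary[w])   # emit the code of the longest known string
            w = k
    if w:
        output.append(dictionary[w])       # flush the last string
    return output, dictionary

codes, table = lzw_encode("ABRACADABRACADA")
print(codes)
print(table["AB"], table["CAD"])
```

With this numbering, `table` contains the same entries 100 to 109 as the table above.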

(13)

Example of LZW decoding

previous string <- empty string
while (read a code k)
    current string <- *k    (the string stored at address k)
    return current string
    c <- 1st char of current string
    if previous string is not empty
        @free <- previous string + c
    previous string <- current string

ABRACADABRACADA...

...

received code   current string   output   c   entry   @     previous string
A               A                A        A                 A
B               B                B        B   AB      100   B
R               R                R        R   BR      101   R
A               A                A        A   RA      102   A
C               C                C        C   AC      103   C
A               A                A        A   CA      104   A
D               D                D        D   AD      105   D
100             AB               AB       A   DA      106   AB
102             RA               RA       R   ABR     107   RA
104             CA               CA       C   RAC     108   CA
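A possible Python sketch of the decoding loop is given below. It assumes the same conventions as the example (codes below 100 are the `ord()` values of single characters, new entries start at 100) and adds one case that the example never meets: a received code that is not yet in the table, which can only stand for the previous string followed by its own first character. The function name `lzw_decode` is hypothetical.

```python
def lzw_decode(codes, first_free=100):
    """Rebuild the text from LZW codes, reconstructing the table on the fly."""
    dictionary = {}            # multi-character entries only: code -> string
    next_code = first_free
    previous = ""
    output = []
    for k in codes:
        if k < first_free:
            current = chr(k)                     # single character
        elif k in dictionary:
            current = dictionary[k]
        else:
            current = previous + previous[0]     # code not yet in the table
        output.append(current)
        if previous:
            # same entry as the one created by the coder at this step
            dictionary[next_code] = previous + current[0]
            next_code += 1
        previous = current
    return "".join(output)

print(lzw_decode([65, 66, 82, 65, 67, 65, 68, 100, 102, 104, 106]))
```

The decoder never receives the dictionary: it rebuilds it one step behind the coder, which is why the table does not need to be transmitted.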

(14)

Discrete transforms

• Discrete Fourier transform

• Discrete cosine transform

– smaller coefficients

– real coefficients
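As an illustration, here is a direct (slow) Python sketch of the 2D discrete cosine transform of a square block. It assumes the DCT-II with the normalization used by JPEG for 8x8 blocks (factor 1/4, and C(0) = 1/√2, C(k) = 1 otherwise); the slide's own formula is not reproduced here, and the name `dct_2d` is chosen for this sketch.

```python
import math

def dct_2d(block):
    """2D DCT-II of a square block (list of rows); all coefficients are real."""
    n = len(block)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = (2 / n) * c(u) * c(v) * s   # 2/n = 1/4 when n = 8
    return coeffs

flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # a uniform block has only a DC coefficient
```

For a uniform block, all the energy goes into the top left coefficient and the others vanish, which is what makes the transform useful for compression.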

(15)

Zig-zag scanning

• The highest coefficients are located in the top left part of the image transform

64 63 59 58 50 49 37 36
62 60 57 51 48 38 35 22
61 56 52 47 39 34 23 21
55 53 46 40 33 24 20 11
54 45 41 32 25 19 12 10
44 42 31 26 18 13  9  4
43 30 27 17 14  8  5  3
29 28 16 15  7  6  2  1
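The scanning order can be generated rather than stored. The sketch below is one possible way to do it in Python, walking the anti-diagonals of the block in alternating directions; numbering the positions from 64 down to 1 gives back the 8x8 matrix of this slide. The names `zigzag_order` and `rank_matrix` are chosen for this sketch.

```python
def zigzag_order(n=8):
    """List of (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()                 # even diagonals go bottom-left to top-right
        order.extend(diagonal)
    return order

def rank_matrix(n=8):
    """Matrix holding n*n at the first scanned position, down to 1 at the last."""
    ranks = [[0] * n for _ in range(n)]
    for position, (row, col) in enumerate(zigzag_order(n)):
        ranks[row][col] = n * n - position
    return ranks

for row in rank_matrix():
    print(" ".join(f"{value:2d}" for value in row))
```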

(16)

Run length coding

• Many coefficients are zero:

– we count the zeros between two non-zero values

– Example:

• 200 0 80 0 0 4 0 0 0 0 1 …

• is replaced by:

• 0 200 1 80 2 4 4 1 …
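The example above can be reproduced with a few lines of Python. This is a minimal sketch: the output is a flat list alternating "number of zeros" and "non-zero value", and trailing zeros are simply dropped, which is an assumption made here (real coders emit an end-of-block marker instead). The name `run_length_encode` is hypothetical.

```python
def run_length_encode(values):
    """Return [zeros before v1, v1, zeros before v2, v2, ...] for the non-zero values."""
    encoded = []
    zeros = 0
    for v in values:
        if v == 0:
            zeros += 1                   # one more zero in the current run
        else:
            encoded.extend([zeros, v])   # run length, then the non-zero value
            zeros = 0
    return encoded                       # trailing zeros are dropped in this sketch

print(run_length_encode([200, 0, 80, 0, 0, 4, 0, 0, 0, 0, 1]))
```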

(17)

JPEG

• 8x8 blocks encoding

• Several steps :

– DCT

– Quantization

– Zig-zag scanning

– Run-length coding

– Huffman coding

(18)

JPEG : original ; 88 kB

(19)

GIF : 232 kB

(20)

JPEG : 50 % ; 68 kB

(21)

JPEG : 25 % ; 36 kB

(22)

JPEG : 12 % ; 20 kB

(23)

JPEG : 5 % ; 12 kB

(24)

JPEG : 1 % ; 8 kB

(25)

About wavelets (1)

• Replace DCT in JPEG 2000

• Principle :

– Decomposition of the signal on a wavelet basis

• Wavelet basis :

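– generated by scaling and translation of a “mother” wavelet ψ:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad (a,b)\in\mathbb{R}^2,\ a\neq 0$$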

(26)

About wavelets (2)

• Orthogonal basis :

• Wavelet coefficients :

• Haar's basis :
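To give an idea of what a wavelet decomposition does, here is a Python sketch of one level of the Haar transform of a 1D signal. It assumes the orthonormal convention (division by √2) and an even-length signal; the names `haar_step` and `haar_inverse_step` are chosen for this illustration and the slide's own formulas are not reproduced.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2)
    averages = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return averages, details

def haar_inverse_step(averages, details):
    """Rebuild the signal from the coefficients of haar_step."""
    s = math.sqrt(2)
    signal = []
    for a, d in zip(averages, details):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal

averages, details = haar_step([4, 4, 6, 2])
print(averages, details)
print(haar_inverse_step(averages, details))
```

The detail coefficients are zero wherever two neighbouring samples are equal, so smooth regions produce many zeros that compress well.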

(27)

JPEG 2000

• progressive binary stream

• efficient compression with or without loss of data

• regions of interest can be selected for different compression rates

• includes a mechanism for error robustness

(28)

Examples using JPEG 2000

http://jpeg2000.epfl.ch

80 kB 40 kB

(29)

JPEG 2000 --- JPEG

20 kB

(30)

Video formats and video compression

(31)

Analog formats

• Composite formats:

– separation of luminance and chrominance

– PAL, SECAM, NTSC

– primary colors for NTSC in 1954:

• red = 612 nm, green = 530 nm and blue = 472 nm

– luminance: E_Y = 0.30 E_R + 0.59 E_G + 0.11 E_B

– chrominance: D_r = E_R - E_Y and D_b = E_B - E_Y

– the EBU (European Broadcasting Union; in French UER, Union Européenne de Radiodiffusion) decided to use the same equations for PAL and SECAM (but with different λ) to allow compatibility of black-and-white TV with color TV, and vice versa
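The luminance and chrominance equations translate directly into code. The Python sketch below assumes components normalized between 0 and 1; the function names `rgb_to_ydrdb` and `ydrdb_to_rgb` are chosen here for illustration.

```python
def rgb_to_ydrdb(e_r, e_g, e_b):
    """Luminance E_Y and colour differences D_r, D_b from the three primaries."""
    e_y = 0.30 * e_r + 0.59 * e_g + 0.11 * e_b
    return e_y, e_r - e_y, e_b - e_y

def ydrdb_to_rgb(e_y, d_r, d_b):
    """Inverse conversion: green is recovered from the luminance equation."""
    e_r = d_r + e_y
    e_b = d_b + e_y
    e_g = (e_y - 0.30 * e_r - 0.11 * e_b) / 0.59
    return e_r, e_g, e_b

print(rgb_to_ydrdb(0.5, 0.5, 0.5))  # a grey pixel: both differences are (nearly) zero
print(ydrdb_to_rgb(*rgb_to_ydrdb(0.2, 0.6, 0.9)))
```

For a grey image the two colour differences vanish, so a black-and-white receiver can use E_Y alone.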

(32)

Digital formats

• Components:

– analog: Y Dr Db

– digital: Y Cr Cb

• Allows copies without loss

• Image dimensions:

– 525 lines, 60 fields / s (30 images / s)

– 625 lines (576 active), 50 fields / s (25 images / s)

• Format 4:2:2

• Format 4:2:0 (DVD)

(33)

Luminance / Chrominance coding: format 4:2:2 ....

Historically, the 4 represents the sampling frequency of luminance, 13.5 MHz

[Figure: sample grids for 4:4:4, 4:2:2, 4:2:0 and 4:1:1, with a legend marking the luminance samples]

(34)

Digitization

• As for still images

– Sampling

– Quantization

• Maximal frequencies:

– audio: 20 kHz

– video: 6 MHz

– Nyquist's theorem (sample at more than twice the maximal frequency): Fe(Y) = 13.5 MHz

– Fe(Cr) = 6.75 MHz = Fe(Cb)

(35)

Video compression

• Properties :

– 25 to 30 images / second

– video rate

– speed of coding/decoding

• Size of the data:

– 1 image in 4:2:2 format, 8 bits: 810 kB

• 720+360+360 = 1440 bytes / line * 576 lines

– 1 second of video (25 images): about 21 MB

– 1 CD of 650 MB ≈ 31 s of video

[Figure: Y, Cr and Cb samples along one video line: 52 µs active out of 64 µs; 625 lines in total, 576 active lines]
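The size computation can be reproduced step by step in Python. The sketch assumes 25 images per second and decimal megabytes (1 MB = 10^6 bytes); with binary units the figures change slightly. The names `frame_bytes`, `per_second` and `cd_seconds` are chosen here.

```python
def frame_bytes(width=720, active_lines=576):
    """Bytes for one 4:2:2 image at 8 bits per sample."""
    # per line: `width` Y samples, plus width/2 Cr samples and width/2 Cb samples
    bytes_per_line = width + width // 2 + width // 2
    return bytes_per_line * active_lines

per_frame = frame_bytes()                 # bytes in one image
per_second = per_frame * 25               # assumption: 25 images per second
cd_seconds = 650 * 10**6 / per_second     # assumption: 1 MB = 10**6 bytes

print(per_frame, "bytes per image")
print(per_second, "bytes per second")
print(round(cd_seconds, 1), "seconds on a 650 MB CD")
```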

(36)

Ideas for video compression

• compression of each frame (= image)

• motion estimation

• if frame (n+1) is almost the same as frame (n), only encode the differences
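The last idea can be illustrated with a deliberately naive Python sketch: only the pixels that changed between two frames are stored. Frames are small lists of rows here, and the names `frame_difference` and `apply_difference` are hypothetical; real coders work on blocks and on prediction errors rather than on individual pixels.

```python
def frame_difference(previous, current):
    """List the pixels that changed, as (row, column, new value)."""
    changes = []
    for r, (row_p, row_c) in enumerate(zip(previous, current)):
        for c, (p, q) in enumerate(zip(row_p, row_c)):
            if p != q:
                changes.append((r, c, q))
    return changes

def apply_difference(previous, changes):
    """Rebuild the current frame from the previous one and the list of changes."""
    frame = [row[:] for row in previous]
    for r, c, value in changes:
        frame[r][c] = value
    return frame

frame_n = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
frame_n1 = [[1, 1, 1], [1, 7, 1], [1, 1, 1]]
changes = frame_difference(frame_n, frame_n1)
print(changes)
```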

(37)

Compression ratio

• Still images

– without loss: 3:1

– with loss: 10:1 (still good quality)

• Video:

– broadcast applications: 15:1 < σ < 40:1

– processing: σ ≃ 5:1

• How to compute the compression ratio:

– it is necessary to know the original format (4:2:2 8 bits, …)

(38)

Compression standards

• 1989: JPEG (Joint Photographic Experts Group)

• M-JPEG = Motion JPEG

– compression / decompression in real time, 25 or 30 images / s

– problems: synchronisation with sound; the transformation of JPEG into M-JPEG is not standardized (several methods, incompatibilities)

(39)

DV

• similar to M-JPEG, but :

– standardized

– efficient quantization tables

• 4:1:1 or 4:2:0

• consumer or professional video products (with few differences)

(40)

MPEG1, MPEG2

Moving Picture Experts Group

• 1992: MPEG 1

– standard for moving images at low resolution, for multimedia applications

– JPEG + temporal redundancies

– rate: 1.5 Mbits/s for video and sound

– VHS quality, compatible with CD-ROM and Video CD

– 1 CD = 650 MB = 74 minutes (video and sound)

• 1994: MPEG 2 (the DVD standard)

– derived from MPEG 1, with higher quality

• standard video (3 to 10 Mbits/s)

• high definition (300 Mbits/s)

– MPEG 3 was initially designed for high definition but is now included in MPEG 2

(41)

The latest “MPEG 4”

• MPEG 4 AVC or H.264

– better compression rate

– 4x4 blocks

– prediction using several images

• ex: blinking

• Windows Media 9 (WM9)

– similar quality

– not a standard: proprietary

(42)

SIF

• SIF (Source Input Format)

– half spatial resolution and half temporal resolution (1 field out of 2)

– 360 pixels by 288 lines at 25 Hz

[Figure: derivation of SIF from 4:2:2 for Y and for Cr, Cb: keep the odd fields, then horizontal under-sampling, then vertical under-sampling; labelled dimensions: 720, 360, 576, 288, 144]

TV 4/3: 625 lines; HDTV 16/9: 1250 lines

(43)

GOP (Group Of Pictures)

• composed of 3 types of images

– I (intra): coded using JPEG

– P (predicted):

• predicted from a previous I or P

• coded using only motion vectors

• can propagate errors

• 3 times smaller than I

– B (bidirectional):

• computed by bidirectional interpolation from past or future I or P images, using motion vectors

• the smallest: 6 times smaller than I

• do not propagate errors

(44)

GOP

• a GOP begins with an I and ends just before the next I

• Typical organization: GOP of 12 images

– M = 3 (distance between 2 P)

– N = 12 (distance between 2 I)

I B B P B B P B B P B B

[Figure: arrows showing the prediction between images]
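The role of the two parameters can be shown with a small Python sketch that lists the image types of one GOP in display order. The function name `gop_pattern` and its arguments `n` and `m` (for N and M) are chosen for this illustration.

```python
def gop_pattern(n=12, m=3):
    """Image types of one GOP: an I first, a P every m images, B elsewhere."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return " ".join(types)

print(gop_pattern())        # I B B P B B P B B P B B
print(gop_pattern(6, 2))    # a shorter GOP with M = 2
```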

(45)

GOP

• with a long GOP, the compression ratio is higher

• for a given compression rate, a long GOP gives a better image quality

• access to an image :

– not possible to cut a GOP

– GOP = random access unit

• decompression :

– GOP = latency

(46)

Motion estimation

• Block = 8x8 (cf JPEG)

• Macroblock: built from 4 blocks of luminance and 2 or 4 blocks of chrominance

• Motion estimation on macroblocks:

– Search for similar macroblocks between an image and the previous one

– Computation of motion vectors (translation)

– Computation of the predicted image using motion vectors

– Comparison between the predicted image and the actual image => prediction errors

– Coding and transmission of the motion vectors and the prediction errors
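The search for a similar block can be sketched in Python as an exhaustive block matching. This is only an illustration under simplifying assumptions: tiny 2x2 blocks instead of macroblocks, a sum of absolute differences (SAD) as similarity measure, and a small search window. The names `sad` and `best_motion_vector` are hypothetical.

```python
def sad(frame_a, frame_b, ya, xa, yb, xb, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ya + i][xa + j] - frame_b[yb + i][xb + j])
               for i in range(size) for j in range(size))

def best_motion_vector(previous, current, y, x, size=2, search=2):
    """Find the block of `previous` closest to the block of `current` at (y, x)."""
    h, w = len(previous), len(previous[0])
    best = (0, 0)
    best_cost = sad(current, previous, y, x, y, x, size)   # no motion
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= h - size and 0 <= px <= w - size:
                cost = sad(current, previous, y, x, py, px, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost   # translation and remaining prediction error

previous = [[0] * 6 for _ in range(6)]
current = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        previous[1 + i][1 + j] = 9   # bright square at (1, 1) in the previous image
        current[2 + i][3 + j] = 9    # the same square, moved to (2, 3)

print(best_motion_vector(previous, current, 2, 3))
```

The returned vector points from the block in the current image to its best match in the previous one; the remaining cost corresponds to the prediction error that still has to be coded.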

(47)

MPEG 4 and 7

• for multimedia applications

– hybrid coding of both natural and synthetic video

– interactive modes allowing a user to interact with the contents

– compatibility with low-bandwidth channels

– robustness to noise

– copyright protection

– possibility to search for information in the video

(48)

MPEG 4 and 7 (continued)

• MPEG 4 :

– object oriented coding => we need segmentation

• MPEG 7 :

– standardization of the way the content of a video is described (textual criteria, visuals, sounds, …)