Videotelephony and Videoconference

Transcrição

Videotelephony and Videoconference
VIDEOTELEPHONY AND
VIDEOCONFERENCE
OVER ISDN
Fernando Pereira
Instituto Superior Técnico
Comunicação de Áudio e Vídeo, Fernando Pereira
Digital Video
Comunicação de Áudio e Vídeo, Fernando Pereira
Video versus Images
• Still Image Services – No strong temporal requirements; no realtime notion.
• Video Services (moving images) – There is a need to strictly
follow strong delay requirements to provide a good illusion of
motion; essential to provide real-time performance.
For each image and video service, it is possible to associate a quality
target (quality of service); the first impact of this target is the
selection of the right spatial and temporal resolutions to use.
Comunicação de Áudio e Vídeo, Fernando Pereira
Why Does Video Information Have to be
Compressed ?
A video sequence is created and consumed as a set of
images, happening at a certain temporal rate (F), each
of them with a spatial resolution of M×
×N luminance
and chrominance samples and a certain number of
bits per sample (L)
This means the total number of (PCM) bits
- and thus the required bandwidth and memory –
necessary to digitally represent a video sequence is
HUGE !!!
Comunicação de Áudio e Vídeo, Fernando Pereira
Videotelephony: Just an Example
• Resolution: 10 images/s with 288×
×360 luminance samples and
144 × 188 samples for each chrominance, with 8 bit/sample
[(360 × 288) + 2 × (180 × 144)] × 8 × 10 = 12.44 Mbit/s
• Reasonable bitrate: e.g. 64 kbit/s for an ISDN B channel
=> Compression Factor: 12.44 Mbit/s/64 kbit/s ≈ 194
The usage or not of compression/source implies the
possibility or not to deploy services and thus the existence
or not of certain industries, e.g. DVD.
Comunicação de Áudio e Vídeo, Fernando Pereira
Digital Video: Why is it So Difficult ?
Serviço
Resolução
espacial –
Lum. (Y)
Resolução
espacial –
Crom. (U,V)
Resolução
temporal
Factor de
forma
Débito
binário
(PCM)
TV alta
definição
1152 × 1920
576 × 960
50 img/s
16/9
1.3 Gbit/s
TV (qualidade
difusão, DVD)
576 × 720
576 × 360
25 img/s
entrelaçadas
4/3
166 Mbit/s
TV (gravação
CD)
288 × 360
144 × 180
25 img/s
progressivas
4/3
31 Mbit/s
Videotelefonia
e Videoconfer.
288 × 360
144 × 180
10 img/s
progressivas
4/3
12.4 Mbit/s
Videotelefonia
móvel
144 × 180
72 × 90
5 img/s
progressivas
4/3
1.6 Mbit/s
Comunicação de Áudio e Vídeo, Fernando Pereira
Video Coding/Compression: a Definition
Efficient representation (this means with a smaller than
PCM number of bits) of a periodic sequence of
(correlated) images, satisfying the relevant
requirements, e.g. acceptable quality, error robustness,
random access.
And the service requirements change with the
services/applications and the corresponding
funcionalities ...
Comunicação de Áudio e Vídeo, Fernando Pereira
Coding … and Decoding ...
Encoder
Decoder
Comunicação de Áudio e Vídeo, Fernando Pereira
How Big
Have to be
the
Compression
Factors ?
Serviço
Resolução
espacial
Resolução
temporal
Débito
binário
codificado
34 Mbit/s
Factor de
compressão
50 img/s
Débito
binário
(PCM)
1.3 Gbit/s
TV alta
definição
1152 × 1920
576 × 960
TV alta
definição
1152 × 1920
50 img/s
1.3 Gbit/s
17 Mbit/s
76.5
TV (qualidade
difusão, DVD)
576 × 720
25 img/s
entrelaçadas
166 Mbit/s
6 Mbit/s
27.5
TV (qualidade
difusão, DVD)
576 × 720
166 Mbit/s
3 Mbit/s
55
576 × 360
25 img/s
entrelaçadas
TV (gravação
CD)
288 × 360
144 × 180
25 img/s
progressivas
31 Mbit/s
1.15 Mbit/s
27
Videoconfer.
288 × 360
144 × 180
25 img/s
progressivas
31 Mbit/s
2 Mbit/s
15.5
Videoconfer.
288 × 360
144 × 180
10 img/s
progressivas
12.4 Mbit/s
384 kbit/s
32.3
Videotelefonia
fixa
288 × 360
10 img/s
progressivas
12.4 Mbit/s
64 kbit/s
194
Videotelefonia
móvel (GSM)
144 × 180
5 img/s
progressivas
1.6 Mbit/s
13 kbit/s
100
38.2
576 × 960
576 × 360
144 × 180
72 × 90
Comunicação de Áudio e Vídeo, Fernando Pereira
Where Does Compression Come From ?
• REDUNDANCY – Regards the similarities, correlations and predictability
amoig the image samples, within (space) and between (temporal) images.
- Elimination or reduction of redundancy does not imply information losses
since only ‘repetitions’ are removed; this means decoded images are
mathematically equal to the original images.
• IRRELEVANCY – Regards the information which is imperceptible for the
human visual systems.
- Elimination or reduction of irrelevancy implies losses in an irreversible
process; however the decoded images should have ‘transparent’ quality since
the removed information should not be perceptible.
Video coding algorithms have to manage these 2 concepts; this means it is
essential for them to know about the information statistics
(redundancy) and the human visual system behavior (irrelevancy).
Comunicação de Áudio e Vídeo, Fernando Pereira
The Basic Coding Chain
Original
Video
Symbol
Generator
(Model)
Bits
Symbols
Entropy
Coding
Comunicação de Áudio e Vídeo, Fernando Pereira
Coding ...
The (source) encoder represents
the images using a set of
symbols, and later bits, using
in the best way the available
coding tools.
Video encoders may be more or
less complex, and thus more or
less efficient for the same
content.
The encoder extracts the
video information ‘juice’ !
Comunicação de Áudio e Vídeo, Fernando Pereira
Interoperability as a Major Requirement:
Standards to Assure that More is not Less ...
• Compression is essential for digital audiovisual
services where interoperability is a major
requirement.
• Interoperability requires the specification and adoption of
standards, notably audiovisual coding standards.
• To allow some evolution of the standards and some
competition in the market between compatible products from
different companies, standards must specify the minimum set
of technology possible, typically the bitstream syntax and the
decoding process (not encoding process).
Comunicação de Áudio e Vídeo, Fernando Pereira
Standards: a Trade
-off between Fixing and
Trade-off
Inovating
Normative !
Encoder
Decoder
Comunicação de Áudio e Vídeo, Fernando Pereira
Video Coding Standards …
• ITU-T H.120 (1984) - Videoconference (1.5 - 2 Mbit/s)
• ITU-T H.261 (1988) – Audiovisual services (videotelephony and
videoconference) at p×
×64kbit/s, p=1,…,30
• ISO/IEC MPEG-1 (1990)- CD-ROM Video
• ISO/IEC MPEG-2 also ITU-T H.262 (1993) – Digital TV
• ITU-T H.263 (1996) – PSTN and mobile video
• ISO/IEC MPEG-4 (1998) – Audiovisual objects, improved efficiency
• ISO/IEC MPEG-4 AVC also ITU-T H.264 (2003) – Improved efficiency
Comunicação de Áudio e Vídeo, Fernando Pereira
ITU-T H.320 Terminals
Videotelephony and
Videoconference
Comunicação de Áudio e Vídeo, Fernando Pereira
Videotelephony and Videoconference
Personal (bidirectional) communications in real-time !
Comunicação de Áudio e Vídeo, Fernando Pereira
ITU
-T H.320 Recommendation: Motivation
ITU-T
• The starting of the work towards Rec. H.320 and H.261 goes back
to 1984 when it was acknowledged that:
• There was an increase in the demand for image-based services,
notably videotelephony and videoconference.
• There was a growing availability of 64, 384 e 1536/1920 kbit/s
digital lines as well as ISDN lines.
• There was a need to make available image-based services and
terminals for the digital lines mentioned above.
• The acknowledgement that Rec. H.120, just issued at that time,
for videoconference services, was already obsolete in terms of
compression efficiency due to the fast development in the are of
video compression.
Comunicação de Áudio e Vídeo, Fernando Pereira
ISDN: Motivation and Objectives
• Growing usage of digital technology for the transmission and switching of
speech due to their several benefits
• Growing demand for data transmission services, notably influenced by the
increasing availability of computers
• Advantages of service integration in a single network
• Need to create services able to justify the digitization of the local network to
evolve later to the Broadband ISDN
To offer to the users the capability to establish digital connections through a
limited number of access types supporting a variety of services for data,
speech, audio, image and video.
Considering the size of the existing communication infrastructure, the
evolution to ISDN should be gradual and mostly based in the available
resources.
Comunicação de Áudio e Vídeo, Fernando Pereira
ISDN: Basic Features
• Narrowband-ISDN – Should offer bitrates up to a few Mbit/s
- Based on the existing telephone network (PSTN) with large
accessibility which guarantees a large geographical coverage
- Involves the usage of digital transmission and switching centrals
beside signal processing
- Uses a new type of signalling between public central (ITU-T nº 7)
• Broadband-ISDN – Should offer bitrate up to several hundreds of
Mbit/s integrating all services in terms of access and network
- Involves a longer time frame and depends on the availability in a
large scale of broadband transmission means such as fiber optic
- Based on the asynchronous transfer mode (ATM) and the usage of
fixed size packets
- Offers dynamic resource allocation through a traffic contract
Comunicação de Áudio e Vídeo, Fernando Pereira
Basic ISDN Channels
• B-Channel - 64 kbit/s – B-channel connections may be
performed with circuit-switching, packet-switching or rented
lines.
• D-Channel - 16 ou 64 kbit/s – D-channels have the main function
to transport the signalling information associated to B-channels;
in the idle periods, they may be used to transmit user data using
packet-switching
• H-Channel - 384, 1536 ou 1920 kbit/s – Offer connections with
higher bitrates.
Comunicação de Áudio e Vídeo, Fernando Pereira
ISDN Access
• ISDN channels are grouped according to 2 access types offered
to the users:
• Basic Access (2B+D) – Consists in 2 B-channels with kbit/s (fullduplex) and one D-channel with 16 kbit/s (full-duplex); this
configuration corresponds to a total bitrate of 192 kbit/s,
including synchronization information and frame overhead.
• Primary Access – Offers 2 configurations related to the digital
transmission hierarchies:
- 2048 kbit/s (30B+D) for Europe
- 1544 kbit/s (23B+D) for USA/Canada/Japan
Comunicação de Áudio e Vídeo, Fernando Pereira
Videotelephony and Videoconference: Main
Features
• Personal communications (point to point
or multipoint to multipoint)
• Symmetric bidirectional communications
(all nodes involved have the same similar
features)
• Critical delay requirements
• Low or intermediate quality requirements
• Strong psychological and sociological
impacts
Comunicação de Áudio e Vídeo, Fernando Pereira
Rec. H.320 Terminal
Comunicação de Áudio e Vídeo, Fernando Pereira
Establishing a
H.320 Connection
Rec. H.242 defines the
communication protocol
between H.320 terminals for
the initial phase of the
connection.
Comunicação de Áudio e Vídeo, Fernando Pereira
Rec. H.221: Frame Structure
Rec. H.221 defines
the frame
structure for
audiovisual
services using
single or multiple
64 kbit/s
channels.
Frame structure for a 64 kbit/s channel – ISDN B-channel
Comunicação de Áudio e Vídeo, Fernando Pereira
Video Coding:
Rec. ITU-T H.261
Comunicação de Áudio e Vídeo, Fernando Pereira
Recommendation H.261: Objectives
Efficient coding of
videotelephony and
videoconference sequences
with a minimum
acceptable quality using
bitrate from 40 kbit/s to
2Mbit/s targeting
synchronous channel
(ISDN) at p×
×64 kbit/s, with
p=1,...,30.
This is the first international video coding standard with
meaningful adoption thus introducing the notion of
backward compatibility in video coding standards.
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261 Basic Coding Scheme
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261: Signals to Code
• The signals to code for each image are the luminance (Y) and 2
chrominances, named CB and CR or U and V.
• The samples are quantized according to Rec. ITU-R BT-601:
- Black = 16; White = 235; Null colour difference = 128
- Peak colour difference (U,V) = 16 and 240
• The coding algorithm operates over progressive (non-interlaced)
content at 29.97 image/s. The temporal resolution may be
decreased by jumping 1, 2 or 3 images between each transmitted
image.
Comunicação de Áudio e Vídeo, Fernando Pereira
RGB to YUV Conversion
Comunicação de Áudio e Vídeo, Fernando Pereira
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261: Image Format
Two spatial resolutions are possible:
• CIF (Common Intermediate Format) - 288 ×
352 pels for luminance (Y) and 144 × 176 pels
for each chrominance (U,V) this means a
4:2:0 subsampling format, with ‘quincux’
positioning, progressive, 30 frame/s with a 4/3
form factor.
• QCIF (Quarter CIF) – Similar to CIF with
half spatial resolution in both directions this
means 144 × 176 pels for luminance and 72 ×
88 pels for each chrominance.
All H.261 codecs must work with QCIF ands
some may be able to work also in CIF.
Comunicação de Áudio e Vídeo, Fernando Pereira
Chrominance Subsampling Formats
Formato
Crominância
Amostras
Lum/Linha
Linhas
Lum/Imagem
Amostras
Crom/Linha
Linhas
Crom/Imagem
Factor
Subamostr.
Horizontal
Factor
Subamostr.
Vertical
4:4:4
4:2:2
4:2:0
4:1:1
4:1:0
720
720
720
720
720
576
576
576
576
576
720
360
360
180
180
576
576
288
576
144
2:1
2:1
4:1
4:1
2:1
4:1
Comunicação de Áudio e Vídeo, Fernando Pereira
Groups Of Blocks (GOBs), Macroblocks and
Blocks
QCIF
GOB 1
GOB 2
GOB 3
GOB 4
GOB 5
GOB 6
GOB 7
GOB 8
GOB 9
GOB 10
GOB 11
GOB 12
CIF
The video sequence is
spatially organized
according to a
hierarchical structure
with 4 levels:
-
Images
Group of Blocks (GOB)
Macroblocks (MB)
Blocks
1
2
Y
3
U
5
V
6
4
4:2:0
Comunicação de Áudio e Vídeo, Fernando Pereira
Comunicação de Áudio e Vídeo, Fernando Pereira
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261: Coding Tools
• Temporal Redundancy
• Predictive coding: sending differences and motion
compensation
• Spatial Redundancy
• Transform coding (Discrete Cosine Transform, DCT)
• Statistical Redundancy
• Huffman entropy coding
• Irrelevancy
• Quantization of DCT coefficients
Comunicação de Áudio e Vídeo, Fernando Pereira
Exploring
Temporal Redundancy
Comunicação de Áudio e Vídeo, Fernando Pereira
Temporal Prediction and Prediction Error
• Temporal prediction is based on the principle that, locally, each
imagem may be represented using as reference a part of some
preceding image, typically the previous one.
• The prediction quality strongly determines the coding
performance since it defines the amount of information to code
and transmit this means the energy of the error/difference signal
called prediction error.
error
• The lower is the prediction error the lower is the
information/energy to transmit and thus
- Better quality may be achieved for a certain available bitrate
- Lower bitrate is needed to achieve a certain video quality
Comunicação de Áudio e Vídeo, Fernando Pereira
Predictive or Differential Coding: Basic
Scheme
In Rec. H.261, there is no quantization in the temporal domain.
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261 Temporal Prediction
H.261 temporal prediction includes 2 tools which have both the
target to eliminate/reduce the temporal redundancy in the PCM
video signal:
Sending the Differences
Motion Compensation
Comunicação de Áudio e Vídeo, Fernando Pereira
Temporal Redundancy: Sending the
Differences
The idea is that only the
new information in the
image (this means what
changes from the
previous image) is sent;
the previous image
works as an easy
prediction of the current
image.
...
There are no losses !
Comunicação de Áudio e Vídeo, Fernando Pereira
Computing the Differences: an Example
Image t
Image t-1
Differences
Comunicação de Áudio e Vídeo, Fernando Pereira
Coding and Decoding ...
Comunicação de Áudio e Vídeo, Fernando Pereira
Motion Estimation and Compensation
Motion estimation and compensation has to target to improve the
temporal predictions for each image zone by detecting, estimating
and compensating the motion in the image.
• Motion estimation is not normative (it is part of the encoder) but the
so called block matching is the most used technique.
• In H.261, motion compensation is made at macroblock level. The
usage of motion compensation for each MB is optional and decided
by the encoder.
Motion estimation implies a very high computational effort. This
justifies the usage of fast motion estimation methods which try to
decrease the complexity compared to full search without much
quality losses.
Comunicação de Áudio e Vídeo, Fernando Pereira
Reference image
Image to code
Comunicação de Áudio e Vídeo, Fernando Pereira
Temporal Redundancy: Motion Estimation
t
Comunicação de Áudio e Vídeo, Fernando Pereira
Search, Where ?
Searching area
Reference image
Image to code
Comunicação de Áudio e Vídeo, Fernando Pereira
Motion Vectors at Different Spatial Resolutions
Comunicação de Áudio e Vídeo, Fernando Pereira
Searching Area: an Efficiency
-Complexity
Efficiency-Complexity
Trade
-off
Trade-off
t
Comunicação de Áudio e Vídeo, Fernando Pereira
MBs to Code and Prediction MBs
Reference image
Current image
Comunicação de Áudio e Vídeo, Fernando Pereira
Motion Compensation: an Example
Image t
Image t-1
Diff. WITHOUT motion comp.
Differences
WITH motion
comp
Motion vectors
Comunicação de Áudio e Vídeo, Fernando Pereira
The 33-- Steps Motion Estimation Algorithm
Fast motion
estimation
algorithms offer
lower complexity
at the cost of some
quality decrease
since predictions
are less optimal
and thus the
prediction error is
higher !
Comunicação de Áudio e Vídeo, Fernando Pereira
Motion Compensation Decision Characteristic
X
db – difference block
dbd – displaced block difference
X
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261 Motion Estimation
• One motion vectors may be transmitted for each macroblock (if
convenient).
• Motion vectors may take values from -15 to + 15 pels, in the vertical
and horizontal directions, only the integer values.
• Only motion vectors referencing existing zones in the reference
(previous) image are valid.
• The motion vectors transmitted for each MB is used for the 4
luminance blocks in the MB. The chrominance motion vector is
computed by dividing by 2 and truncating the luminance motion
vector.
• O positive value for the horizontal or vertical motion vector
components means the prediction must be made using the samples in
the previous image, spatially located to the right and below the
samples to be predicted.
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261 Motion Vectors Coding
• To exploit the redundancy between the motion vectors of adjacent
MBs, each motion vector is differentially coded as the difference
between the motion vector of the actual MB and its prediction this
means the motion vector of the preceding MB.
• The motion vectors prediction is null when:
- The actual MB is number 1, 12 or 23
- The last transmitted MB is not adjacent
to the actual MB
- The preceding and contiguous MB did
not use motion compensation
Comunicação de Áudio e Vídeo, Fernando Pereira
VLC Coding Table
for (Differential)
Motion Vectors
Comunicação de Áudio e Vídeo, Fernando Pereira
Exploring Spatial
Redundancy and
Irrelevancy
Comunicação de Áudio e Vídeo, Fernando Pereira
After Time, the Space
Actual image
+
+
-
DCT
Transform
Prediction error
Prediction image,
motion compensated
Comunicação de Áudio e Vídeo, Fernando Pereira
Transform Coding
Transform coding involves the division of the image in N×
×N blocks to
which the transform is applied, producing blocks with N×
×N
coefficients.
A transform is formally defined through the formulas with the direct and
inverse transforms:
F(u,v) = Σi=0 N-1 Σ j=0 N-1 f(i,j) A(i,j,u,v)
f(i,j) = Σu=0 N-1 Σ v=0 N-1 F(u,v) B(i,j,u,v)
where
f(i,j) – input signal (in space)
A (i,j,u,v) – direct transform basis functions
F(u,v) – transform coefficients
B (i,j,u,v) – inverse transform basis functions
Comunicação de Áudio e Vídeo, Fernando Pereira
The Transform Coefficients
Comunicação de Áudio e Vídeo, Fernando Pereira
Discrete Cosine Transform (DCT)
The DCT is one of the sinusoidal transforms which means its basis
functions are sampled sinusoidal functions.
N −1 N −1
2
 u( 2 j + 1)   v ( 2k + 1) 
F ( u, v ) = C ( u)C (v )∑∑ f ( j , k ) cos π
 cos π

N
2
N
2
N

 

j =0 k =0
2
f ( j, k ) =
N
 u( 2 j + 1)   v ( 2k + 1) 
C
(
u
)
C
(
v
)
F
(
u
,
v
)
cos
π  cos
π

∑∑
 2N
  2N

u =0 v =0
N −1 N −1
The DCT is undoubtedly the most used transform in image and
video coding since its performance is close to the KLT
compacting performance for signals with high correlation and
there are fast implementation solutions available.
Comunicação de Áudio e Vídeo, Fernando Pereira
Bidimensional DCT Basis Functions (N=8
(N=8))
Comunicação de Áudio e Vídeo, Fernando Pereira
DCT
KLT
Comunicação de Áudio e Vídeo, Fernando Pereira
Spatial Redundancy: DCT Transform
X
X
X
X
X
X
X
X
X X
X
X
X
X
X
X
X
X
X X X X X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X X X
X
X
X
X
X
X
X X
X
X
X
X
X
X
X X
X
X
X
X
X
X
X
Spatial domain
(samples)
X
DCT
X
X
X
X X
X X
X
Frequency domain
(DCT coefficients)
Comunicação de Áudio e Vídeo, Fernando Pereira
The DCT Transform in H.261
• In H.261, the DCT is applied to blocks with 8×8 samples.
samples This value
results from a trade-off between the exploitation of the spatial
redundancy and the computational complexity.
• The DCT coefficients to transmit are selected using non-normative
thresholds allowing the consideration of psycho visual criteria in the
coding process targeting the maximization of the subjective quality.
• To exploit the irrelevancy in the original signal, the DCT coefficients
to transmit for each block are quantized.
quantized
• Since the signal energy is compacted in the upper, left corner of the
coefficients’ matrix and the different human visual system sensibility
to the various frequencies, the quantized coefficients are zig-zag
scanned to assure that more important coefficients are always
transmitted before less important ones.
Comunicação de Áudio e Vídeo, Fernando Pereira
H.261 Quantization
• Rec. H.261 uses as quantization steps
all even values between 2 and 62 (32
quantizers available).
• Within each MB, all DCT
coefficients are quantized with the
same quantization step with the
exception of the DC coefficients for
Intra MBs which are always
quantized with step 8.
Example quantization
characteristic
• Rec. H.261 normatively defines the
regeneration values for the
quantized coefficients but not the
decision values which may be
selected to implement different
quantization characteristics,
uniform or not.
Comunicação de Áudio e Vídeo, Fernando Pereira
Serializing the DCT Coefficients
• The transmission of the quantized DCT
coefficients requires to send the
decoder two types of information about
the coefficients: their position and
amplitude.
• For each DCT coefficient to transmit,
its position and amplitude is
represented using a bidimensional
symbol called
(run, level)
• where the run indicates the number of
null coefficients before the coefficient
under coding, and the level indicates
the quantized value of the coefficient.
Comunicação de Áudio e Vídeo, Fernando Pereira
Scanning the DCT Coefficients
Zigzag scanning
Each DCT coefficients block is represented
as a sequence of (run, level) pairs, e.g.
(0,124), (0, 25), (0,147), (0, 126), (3,13), (0,
147), (1,40) ...
Comunicação de Áudio e Vídeo, Fernando Pereira
Comunicação de Áudio e Vídeo, Fernando Pereira
Exploring
Statistical
Redundancy
Comunicação de Áudio e Vídeo, Fernando Pereira
Statistical Redundancy: Entropy Coding
Entropy coding
CONVERTS SYMBOLS IN BITS !
Using the statistics of the symbols to transmit to achieve
additional (lossless) compression by allocating in a
clever way bits to the input symbol stream.
• A, B, C, D -> 00, 01, 10, 11
• A, B, C, D -> 0, 10, 110, 111
Comunicação de Áudio e Vídeo, Fernando Pereira
Huffman Coding
Huffman coding is one of the entropy coding tools which allows to
exploit the fact that the symbols produced by the encoder do not
have equal probability.
• To each generated symbol is attributed a codeword which size
(in bits) is inversely proportional to its probability.
• The usage of variable length codes implies the usage of an
output buffer to ‘smooth’ the bitrate flow, if a synchronous
channel is available.
• The increase in coding efficiency is ‘paid’ with an increase in the
sensibility to channel errors.
Comunicação de Áudio e Vídeo, Fernando Pereira
VLC Table
for
Macroblock
Addressing
Comunicação de Áudio e Vídeo, Fernando Pereira
Combining the
Tools ...
Comunicação de Áudio e Vídeo, Fernando Pereira
The H.261 Symbolic Model
Original
Video
Symbol
Generator
(Model)
Bits
Symbols
Entropy
Coder
A video sequence is represented as a set of images structured
in macroblocks, each of them represented with motion
vectors and/or (intra or inter) DCT coefficients.
Comunicação de Áudio e Vídeo, Fernando Pereira
Encoder: the Winning Cocktail !
Originals
+
DCT
Symbols
Gener.
Quantiz.
Entropy
coder
Buffer
Inverse
Quantiz.
Entropy
coder
Inverse
DCT
+
Motion
det./comp.
Previous
frame
Comunicação de Áudio e Vídeo, Fernando Pereira
Decoder: the Slave !
Data
Buffer
Demux.
Huffman
decoder
IDCT
+
Motion
comp.
Data
Comunicação de Áudio e Vídeo, Fernando Pereira
Output Buffer
• The production of bits by the encoder is highly non-uniform in
time essentially because:
• Variation in spatial detail for the various parts of each image
• Variation of temporal activity along time
• Entropy coding of the coded symbols
To adapt the variable bitrate flow produced by the encoder to the
constant bitrate flow transmitted by the channel, an output
buffer is used, which adds some delay.
Comunicação de Áudio e Vídeo, Fernando Pereira
Bitrate Control
The encoder must efficiently control the evolution in time of its bit
production in order it maximizes the decoded quality for the
synchronous channel available.
Rec. H.261 does not specify what type of bitrate control must be used;
various tools are available:
• Changing the temporal resolution/frame rate
• Changing the spatial resolution, e.g. CIF to QCIf and vice-versa
• Controlling the macroblock classification
• Changing the quantization step value
The bitrate control strategy has a huge impact on the video quality
that may be achieved with a certain bitrate !
Comunicação de Áudio e Vídeo, Fernando Pereira
Quantization Step versus Buffer Fullness
The bitrate control solution
recognized as more efficient,
notably in terms of the
granularity and frequency of
the control, controls the
quantization step as a
function of the output buffer
fullness.
Quantization step control
Video sequence
Encoder
Output
buffer
Binary flow
Comunicação de Áudio e Vídeo, Fernando Pereira
The Importance of Well Choosing !
To well exploit the redundancy and irrelevancy in the video
sequence, the encoder has to adequately select:
• Which coding tools are used for each MB, depending of its
characteristics;
• Which set of symbols is the best to represent each MB, e.g. motion
vector and DCT coefficients.
While the encoder has the mission to take important decisions and
make important choices, the decoder is a ‘slave’, limited to follow
the ‘orders’ sent by the encoder; decoder intelligence is only shown
for error concealment.
Comunicação de Áudio e Vídeo, Fernando Pereira
Macroblock Classification
• Macroblocks are the basic coding unit since it is at the macroblock
level that the encoder selects the coding tools to use.
• Each coding tool is more or less adequate to a certain type of MB;
it is important that, for each MB, the right coding tools are
selected.
• Since Rec. H,261 includes several coding tools, it is the task of the
encoder to select the best tools for each MB; MBs are thus
classified following the tools used for their coding.
• When only spatial redundancy is exploited, MBs are INTRA coded;
if also temporal redundancy is exploited, MBs are INTER coded.
Comunicação de Áudio e Vídeo, Fernando Pereira
Macroblock Classification Table
Comunicação de Áudio e Vídeo, Fernando Pereira
Hierarchical Information Structure
• Image
- Resynchronization (Picture header)
- Temporal resolution control
- Spatial resolution control
• Group of Blocks (GOB)
- Resynchronization (GOB header)
- Quantization step control (mandatory)
• Macroblock
- Motion estimation and compensation
- Quantization step control (optional)
- Selection of coding tools (MB classification)
• Block
- DCT
Comunicação de Áudio e Vídeo, Fernando Pereira
Coding Syntax: Image and GOB Levels
Comunicação de Áudio e Vídeo, Fernando Pereira
Coding Syntax: MB and Block Levels
Comunicação de Áudio e Vídeo, Fernando Pereira
Error Protection for the H.261 Binary Flow
• Error protection for the H.261 binary flow is implemented by
using a BCH (511,493) - Bose-Chaudhuri-Hocquenghem – block
coding.
• The usage of the channel coding bits (also parity bits) at the
decoder is optional.
• The syndrome polynomial to generate the parity bits is
g (x) = (x9+ x4+ x) ( x9+ x6+ x4+ x3+ 1)
Comunicação de Áudio e Vídeo, Fernando Pereira
Error Protection for the H.261 Binary Flow
The final video signal stream structure (multiframe with 512×
×8 =
4096 bits):
Transmission
S1
S7
S2
S8
S1S2S3S4S5...S8 – Alignment sequence
S1
Code bits
Parity bits
(1)
(493)
(18)
1
(1)
0
(1)
Video bits
(492)
Stuffing bits (1's)
(492)
00011011
When decoding, realignment is
only valid after the good
reception of 3 alignment
sequences (S1S2 ...S8).
Comunicação de Áudio e Vídeo, Fernando Pereira
Error Concealment
• Even when channel coding is used, some residual (transmission)
errors may end at the source decoder.
• Residual errors may be detected at the source detector due to
syntactical and semantic inconsistencies.
• For digital video, the most basic error concealment techniques
imply:
- Repeating the co-located data from previous frame
- Repeating data from previous frame after motion compensation
• Error concealment for non-detected errors may be performed
through post-processing.
Comunicação de Áudio e Vídeo, Fernando Pereira
Error Concealment and Post
-Processing
Post-Processing
Examples
Comunicação de Áudio e Vídeo, Fernando Pereira
Final Comments
• Rec. H.261 has been the first video coding international
standard with significant adoption.
• As the first relevant video coding standard, Rec. H.261 has
established legacy and backward compatibility requirements
which influenced the next standards to come, notably in terms of
technology selected.
• Many products and services have been available based on Rec.
H.261.
• However, Rec. H.261 does not represent anymore the state-ofthe-art on video coding (remind this standard is from the end of
the eighties).
Comunicação de Áudio e Vídeo, Fernando Pereira
Bibliography
• Videoconferencing and Videotelephony, Richard Schaphorst,
Artech House, 1996
• Image and Video Compression Standards: Algorithms and
Architectures, Vasudev Bhaskaran and Konstantinos
Konstantinides, Kluwer Academic Publishers, 1995
• Multimedia Communications, Fred Halsall, Addison-Wesley,
2001
• Multimedia Systems, Standards, and Networks, A. Puri & T.
Chen, Marcel Dekker, Inc., 2000
Comunicação de Áudio e Vídeo, Fernando Pereira

Documentos relacionados

MPEG-4 Visual - Multimedia Signal Processing Group, IT-Lx

MPEG-4 Visual - Multimedia Signal Processing Group, IT-Lx conveyance by a variety of transport layers or storage media Comunicação de Áudio e Vídeo, Fernando Pereira

Leia mais