Videotelephony and Videoconference
Transcrição
Videotelephony and Videoconference
VIDEOTELEPHONY AND VIDEOCONFERENCE OVER ISDN Fernando Pereira Instituto Superior Técnico Comunicação de Áudio e Vídeo, Fernando Pereira Digital Video Comunicação de Áudio e Vídeo, Fernando Pereira Video versus Images • Still Image Services – No strong temporal requirements; no realtime notion. • Video Services (moving images) – There is a need to strictly follow strong delay requirements to provide a good illusion of motion; essential to provide real-time performance. For each image and video service, it is possible to associate a quality target (quality of service); the first impact of this target is the selection of the right spatial and temporal resolutions to use. Comunicação de Áudio e Vídeo, Fernando Pereira Why Does Video Information Have to be Compressed ? A video sequence is created and consumed as a set of images, happening at a certain temporal rate (F), each of them with a spatial resolution of M× ×N luminance and chrominance samples and a certain number of bits per sample (L) This means the total number of (PCM) bits - and thus the required bandwidth and memory – necessary to digitally represent a video sequence is HUGE !!! Comunicação de Áudio e Vídeo, Fernando Pereira Videotelephony: Just an Example • Resolution: 10 images/s with 288× ×360 luminance samples and 144 × 188 samples for each chrominance, with 8 bit/sample [(360 × 288) + 2 × (180 × 144)] × 8 × 10 = 12.44 Mbit/s • Reasonable bitrate: e.g. 64 kbit/s for an ISDN B channel => Compression Factor: 12.44 Mbit/s/64 kbit/s ≈ 194 The usage or not of compression/source implies the possibility or not to deploy services and thus the existence or not of certain industries, e.g. DVD. Comunicação de Áudio e Vídeo, Fernando Pereira Digital Video: Why is it So Difficult ? Serviço Resolução espacial – Lum. (Y) Resolução espacial – Crom. (U,V) Resolução temporal Factor de forma Débito binário (PCM) TV alta definição 1152 × 1920 576 × 960 50 img/s 16/9 1.3 Gbit/s TV (qualidade difusão, DVD) 576 × 720 576 × 360 25 img/s entrelaçadas 4/3 166 Mbit/s TV (gravação CD) 288 × 360 144 × 180 25 img/s progressivas 4/3 31 Mbit/s Videotelefonia e Videoconfer. 288 × 360 144 × 180 10 img/s progressivas 4/3 12.4 Mbit/s Videotelefonia móvel 144 × 180 72 × 90 5 img/s progressivas 4/3 1.6 Mbit/s Comunicação de Áudio e Vídeo, Fernando Pereira Video Coding/Compression: a Definition Efficient representation (this means with a smaller than PCM number of bits) of a periodic sequence of (correlated) images, satisfying the relevant requirements, e.g. acceptable quality, error robustness, random access. And the service requirements change with the services/applications and the corresponding funcionalities ... Comunicação de Áudio e Vídeo, Fernando Pereira Coding … and Decoding ... Encoder Decoder Comunicação de Áudio e Vídeo, Fernando Pereira How Big Have to be the Compression Factors ? Serviço Resolução espacial Resolução temporal Débito binário codificado 34 Mbit/s Factor de compressão 50 img/s Débito binário (PCM) 1.3 Gbit/s TV alta definição 1152 × 1920 576 × 960 TV alta definição 1152 × 1920 50 img/s 1.3 Gbit/s 17 Mbit/s 76.5 TV (qualidade difusão, DVD) 576 × 720 25 img/s entrelaçadas 166 Mbit/s 6 Mbit/s 27.5 TV (qualidade difusão, DVD) 576 × 720 166 Mbit/s 3 Mbit/s 55 576 × 360 25 img/s entrelaçadas TV (gravação CD) 288 × 360 144 × 180 25 img/s progressivas 31 Mbit/s 1.15 Mbit/s 27 Videoconfer. 288 × 360 144 × 180 25 img/s progressivas 31 Mbit/s 2 Mbit/s 15.5 Videoconfer. 288 × 360 144 × 180 10 img/s progressivas 12.4 Mbit/s 384 kbit/s 32.3 Videotelefonia fixa 288 × 360 10 img/s progressivas 12.4 Mbit/s 64 kbit/s 194 Videotelefonia móvel (GSM) 144 × 180 5 img/s progressivas 1.6 Mbit/s 13 kbit/s 100 38.2 576 × 960 576 × 360 144 × 180 72 × 90 Comunicação de Áudio e Vídeo, Fernando Pereira Where Does Compression Come From ? • REDUNDANCY – Regards the similarities, correlations and predictability amoig the image samples, within (space) and between (temporal) images. - Elimination or reduction of redundancy does not imply information losses since only ‘repetitions’ are removed; this means decoded images are mathematically equal to the original images. • IRRELEVANCY – Regards the information which is imperceptible for the human visual systems. - Elimination or reduction of irrelevancy implies losses in an irreversible process; however the decoded images should have ‘transparent’ quality since the removed information should not be perceptible. Video coding algorithms have to manage these 2 concepts; this means it is essential for them to know about the information statistics (redundancy) and the human visual system behavior (irrelevancy). Comunicação de Áudio e Vídeo, Fernando Pereira The Basic Coding Chain Original Video Symbol Generator (Model) Bits Symbols Entropy Coding Comunicação de Áudio e Vídeo, Fernando Pereira Coding ... The (source) encoder represents the images using a set of symbols, and later bits, using in the best way the available coding tools. Video encoders may be more or less complex, and thus more or less efficient for the same content. The encoder extracts the video information ‘juice’ ! Comunicação de Áudio e Vídeo, Fernando Pereira Interoperability as a Major Requirement: Standards to Assure that More is not Less ... • Compression is essential for digital audiovisual services where interoperability is a major requirement. • Interoperability requires the specification and adoption of standards, notably audiovisual coding standards. • To allow some evolution of the standards and some competition in the market between compatible products from different companies, standards must specify the minimum set of technology possible, typically the bitstream syntax and the decoding process (not encoding process). Comunicação de Áudio e Vídeo, Fernando Pereira Standards: a Trade -off between Fixing and Trade-off Inovating Normative ! Encoder Decoder Comunicação de Áudio e Vídeo, Fernando Pereira Video Coding Standards … • ITU-T H.120 (1984) - Videoconference (1.5 - 2 Mbit/s) • ITU-T H.261 (1988) – Audiovisual services (videotelephony and videoconference) at p× ×64kbit/s, p=1,…,30 • ISO/IEC MPEG-1 (1990)- CD-ROM Video • ISO/IEC MPEG-2 also ITU-T H.262 (1993) – Digital TV • ITU-T H.263 (1996) – PSTN and mobile video • ISO/IEC MPEG-4 (1998) – Audiovisual objects, improved efficiency • ISO/IEC MPEG-4 AVC also ITU-T H.264 (2003) – Improved efficiency Comunicação de Áudio e Vídeo, Fernando Pereira ITU-T H.320 Terminals Videotelephony and Videoconference Comunicação de Áudio e Vídeo, Fernando Pereira Videotelephony and Videoconference Personal (bidirectional) communications in real-time ! Comunicação de Áudio e Vídeo, Fernando Pereira ITU -T H.320 Recommendation: Motivation ITU-T • The starting of the work towards Rec. H.320 and H.261 goes back to 1984 when it was acknowledged that: • There was an increase in the demand for image-based services, notably videotelephony and videoconference. • There was a growing availability of 64, 384 e 1536/1920 kbit/s digital lines as well as ISDN lines. • There was a need to make available image-based services and terminals for the digital lines mentioned above. • The acknowledgement that Rec. H.120, just issued at that time, for videoconference services, was already obsolete in terms of compression efficiency due to the fast development in the are of video compression. Comunicação de Áudio e Vídeo, Fernando Pereira ISDN: Motivation and Objectives • Growing usage of digital technology for the transmission and switching of speech due to their several benefits • Growing demand for data transmission services, notably influenced by the increasing availability of computers • Advantages of service integration in a single network • Need to create services able to justify the digitization of the local network to evolve later to the Broadband ISDN To offer to the users the capability to establish digital connections through a limited number of access types supporting a variety of services for data, speech, audio, image and video. Considering the size of the existing communication infrastructure, the evolution to ISDN should be gradual and mostly based in the available resources. Comunicação de Áudio e Vídeo, Fernando Pereira ISDN: Basic Features • Narrowband-ISDN – Should offer bitrates up to a few Mbit/s - Based on the existing telephone network (PSTN) with large accessibility which guarantees a large geographical coverage - Involves the usage of digital transmission and switching centrals beside signal processing - Uses a new type of signalling between public central (ITU-T nº 7) • Broadband-ISDN – Should offer bitrate up to several hundreds of Mbit/s integrating all services in terms of access and network - Involves a longer time frame and depends on the availability in a large scale of broadband transmission means such as fiber optic - Based on the asynchronous transfer mode (ATM) and the usage of fixed size packets - Offers dynamic resource allocation through a traffic contract Comunicação de Áudio e Vídeo, Fernando Pereira Basic ISDN Channels • B-Channel - 64 kbit/s – B-channel connections may be performed with circuit-switching, packet-switching or rented lines. • D-Channel - 16 ou 64 kbit/s – D-channels have the main function to transport the signalling information associated to B-channels; in the idle periods, they may be used to transmit user data using packet-switching • H-Channel - 384, 1536 ou 1920 kbit/s – Offer connections with higher bitrates. Comunicação de Áudio e Vídeo, Fernando Pereira ISDN Access • ISDN channels are grouped according to 2 access types offered to the users: • Basic Access (2B+D) – Consists in 2 B-channels with kbit/s (fullduplex) and one D-channel with 16 kbit/s (full-duplex); this configuration corresponds to a total bitrate of 192 kbit/s, including synchronization information and frame overhead. • Primary Access – Offers 2 configurations related to the digital transmission hierarchies: - 2048 kbit/s (30B+D) for Europe - 1544 kbit/s (23B+D) for USA/Canada/Japan Comunicação de Áudio e Vídeo, Fernando Pereira Videotelephony and Videoconference: Main Features • Personal communications (point to point or multipoint to multipoint) • Symmetric bidirectional communications (all nodes involved have the same similar features) • Critical delay requirements • Low or intermediate quality requirements • Strong psychological and sociological impacts Comunicação de Áudio e Vídeo, Fernando Pereira Rec. H.320 Terminal Comunicação de Áudio e Vídeo, Fernando Pereira Establishing a H.320 Connection Rec. H.242 defines the communication protocol between H.320 terminals for the initial phase of the connection. Comunicação de Áudio e Vídeo, Fernando Pereira Rec. H.221: Frame Structure Rec. H.221 defines the frame structure for audiovisual services using single or multiple 64 kbit/s channels. Frame structure for a 64 kbit/s channel – ISDN B-channel Comunicação de Áudio e Vídeo, Fernando Pereira Video Coding: Rec. ITU-T H.261 Comunicação de Áudio e Vídeo, Fernando Pereira Recommendation H.261: Objectives Efficient coding of videotelephony and videoconference sequences with a minimum acceptable quality using bitrate from 40 kbit/s to 2Mbit/s targeting synchronous channel (ISDN) at p× ×64 kbit/s, with p=1,...,30. This is the first international video coding standard with meaningful adoption thus introducing the notion of backward compatibility in video coding standards. Comunicação de Áudio e Vídeo, Fernando Pereira H.261 Basic Coding Scheme Comunicação de Áudio e Vídeo, Fernando Pereira H.261: Signals to Code • The signals to code for each image are the luminance (Y) and 2 chrominances, named CB and CR or U and V. • The samples are quantized according to Rec. ITU-R BT-601: - Black = 16; White = 235; Null colour difference = 128 - Peak colour difference (U,V) = 16 and 240 • The coding algorithm operates over progressive (non-interlaced) content at 29.97 image/s. The temporal resolution may be decreased by jumping 1, 2 or 3 images between each transmitted image. Comunicação de Áudio e Vídeo, Fernando Pereira RGB to YUV Conversion Comunicação de Áudio e Vídeo, Fernando Pereira Comunicação de Áudio e Vídeo, Fernando Pereira H.261: Image Format Two spatial resolutions are possible: • CIF (Common Intermediate Format) - 288 × 352 pels for luminance (Y) and 144 × 176 pels for each chrominance (U,V) this means a 4:2:0 subsampling format, with ‘quincux’ positioning, progressive, 30 frame/s with a 4/3 form factor. • QCIF (Quarter CIF) – Similar to CIF with half spatial resolution in both directions this means 144 × 176 pels for luminance and 72 × 88 pels for each chrominance. All H.261 codecs must work with QCIF ands some may be able to work also in CIF. Comunicação de Áudio e Vídeo, Fernando Pereira Chrominance Subsampling Formats Formato Crominância Amostras Lum/Linha Linhas Lum/Imagem Amostras Crom/Linha Linhas Crom/Imagem Factor Subamostr. Horizontal Factor Subamostr. Vertical 4:4:4 4:2:2 4:2:0 4:1:1 4:1:0 720 720 720 720 720 576 576 576 576 576 720 360 360 180 180 576 576 288 576 144 2:1 2:1 4:1 4:1 2:1 4:1 Comunicação de Áudio e Vídeo, Fernando Pereira Groups Of Blocks (GOBs), Macroblocks and Blocks QCIF GOB 1 GOB 2 GOB 3 GOB 4 GOB 5 GOB 6 GOB 7 GOB 8 GOB 9 GOB 10 GOB 11 GOB 12 CIF The video sequence is spatially organized according to a hierarchical structure with 4 levels: - Images Group of Blocks (GOB) Macroblocks (MB) Blocks 1 2 Y 3 U 5 V 6 4 4:2:0 Comunicação de Áudio e Vídeo, Fernando Pereira Comunicação de Áudio e Vídeo, Fernando Pereira Comunicação de Áudio e Vídeo, Fernando Pereira H.261: Coding Tools • Temporal Redundancy • Predictive coding: sending differences and motion compensation • Spatial Redundancy • Transform coding (Discrete Cosine Transform, DCT) • Statistical Redundancy • Huffman entropy coding • Irrelevancy • Quantization of DCT coefficients Comunicação de Áudio e Vídeo, Fernando Pereira Exploring Temporal Redundancy Comunicação de Áudio e Vídeo, Fernando Pereira Temporal Prediction and Prediction Error • Temporal prediction is based on the principle that, locally, each imagem may be represented using as reference a part of some preceding image, typically the previous one. • The prediction quality strongly determines the coding performance since it defines the amount of information to code and transmit this means the energy of the error/difference signal called prediction error. error • The lower is the prediction error the lower is the information/energy to transmit and thus - Better quality may be achieved for a certain available bitrate - Lower bitrate is needed to achieve a certain video quality Comunicação de Áudio e Vídeo, Fernando Pereira Predictive or Differential Coding: Basic Scheme In Rec. H.261, there is no quantization in the temporal domain. Comunicação de Áudio e Vídeo, Fernando Pereira H.261 Temporal Prediction H.261 temporal prediction includes 2 tools which have both the target to eliminate/reduce the temporal redundancy in the PCM video signal: Sending the Differences Motion Compensation Comunicação de Áudio e Vídeo, Fernando Pereira Temporal Redundancy: Sending the Differences The idea is that only the new information in the image (this means what changes from the previous image) is sent; the previous image works as an easy prediction of the current image. ... There are no losses ! Comunicação de Áudio e Vídeo, Fernando Pereira Computing the Differences: an Example Image t Image t-1 Differences Comunicação de Áudio e Vídeo, Fernando Pereira Coding and Decoding ... Comunicação de Áudio e Vídeo, Fernando Pereira Motion Estimation and Compensation Motion estimation and compensation has to target to improve the temporal predictions for each image zone by detecting, estimating and compensating the motion in the image. • Motion estimation is not normative (it is part of the encoder) but the so called block matching is the most used technique. • In H.261, motion compensation is made at macroblock level. The usage of motion compensation for each MB is optional and decided by the encoder. Motion estimation implies a very high computational effort. This justifies the usage of fast motion estimation methods which try to decrease the complexity compared to full search without much quality losses. Comunicação de Áudio e Vídeo, Fernando Pereira Reference image Image to code Comunicação de Áudio e Vídeo, Fernando Pereira Temporal Redundancy: Motion Estimation t Comunicação de Áudio e Vídeo, Fernando Pereira Search, Where ? Searching area Reference image Image to code Comunicação de Áudio e Vídeo, Fernando Pereira Motion Vectors at Different Spatial Resolutions Comunicação de Áudio e Vídeo, Fernando Pereira Searching Area: an Efficiency -Complexity Efficiency-Complexity Trade -off Trade-off t Comunicação de Áudio e Vídeo, Fernando Pereira MBs to Code and Prediction MBs Reference image Current image Comunicação de Áudio e Vídeo, Fernando Pereira Motion Compensation: an Example Image t Image t-1 Diff. WITHOUT motion comp. Differences WITH motion comp Motion vectors Comunicação de Áudio e Vídeo, Fernando Pereira The 33-- Steps Motion Estimation Algorithm Fast motion estimation algorithms offer lower complexity at the cost of some quality decrease since predictions are less optimal and thus the prediction error is higher ! Comunicação de Áudio e Vídeo, Fernando Pereira Motion Compensation Decision Characteristic X db – difference block dbd – displaced block difference X Comunicação de Áudio e Vídeo, Fernando Pereira H.261 Motion Estimation • One motion vectors may be transmitted for each macroblock (if convenient). • Motion vectors may take values from -15 to + 15 pels, in the vertical and horizontal directions, only the integer values. • Only motion vectors referencing existing zones in the reference (previous) image are valid. • The motion vectors transmitted for each MB is used for the 4 luminance blocks in the MB. The chrominance motion vector is computed by dividing by 2 and truncating the luminance motion vector. • O positive value for the horizontal or vertical motion vector components means the prediction must be made using the samples in the previous image, spatially located to the right and below the samples to be predicted. Comunicação de Áudio e Vídeo, Fernando Pereira H.261 Motion Vectors Coding • To exploit the redundancy between the motion vectors of adjacent MBs, each motion vector is differentially coded as the difference between the motion vector of the actual MB and its prediction this means the motion vector of the preceding MB. • The motion vectors prediction is null when: - The actual MB is number 1, 12 or 23 - The last transmitted MB is not adjacent to the actual MB - The preceding and contiguous MB did not use motion compensation Comunicação de Áudio e Vídeo, Fernando Pereira VLC Coding Table for (Differential) Motion Vectors Comunicação de Áudio e Vídeo, Fernando Pereira Exploring Spatial Redundancy and Irrelevancy Comunicação de Áudio e Vídeo, Fernando Pereira After Time, the Space Actual image + + - DCT Transform Prediction error Prediction image, motion compensated Comunicação de Áudio e Vídeo, Fernando Pereira Transform Coding Transform coding involves the division of the image in N× ×N blocks to which the transform is applied, producing blocks with N× ×N coefficients. A transform is formally defined through the formulas with the direct and inverse transforms: F(u,v) = Σi=0 N-1 Σ j=0 N-1 f(i,j) A(i,j,u,v) f(i,j) = Σu=0 N-1 Σ v=0 N-1 F(u,v) B(i,j,u,v) where f(i,j) – input signal (in space) A (i,j,u,v) – direct transform basis functions F(u,v) – transform coefficients B (i,j,u,v) – inverse transform basis functions Comunicação de Áudio e Vídeo, Fernando Pereira The Transform Coefficients Comunicação de Áudio e Vídeo, Fernando Pereira Discrete Cosine Transform (DCT) The DCT is one of the sinusoidal transforms which means its basis functions are sampled sinusoidal functions. N −1 N −1 2 u( 2 j + 1) v ( 2k + 1) F ( u, v ) = C ( u)C (v )∑∑ f ( j , k ) cos π cos π N 2 N 2 N j =0 k =0 2 f ( j, k ) = N u( 2 j + 1) v ( 2k + 1) C ( u ) C ( v ) F ( u , v ) cos π cos π ∑∑ 2N 2N u =0 v =0 N −1 N −1 The DCT is undoubtedly the most used transform in image and video coding since its performance is close to the KLT compacting performance for signals with high correlation and there are fast implementation solutions available. Comunicação de Áudio e Vídeo, Fernando Pereira Bidimensional DCT Basis Functions (N=8 (N=8)) Comunicação de Áudio e Vídeo, Fernando Pereira DCT KLT Comunicação de Áudio e Vídeo, Fernando Pereira Spatial Redundancy: DCT Transform X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Spatial domain (samples) X DCT X X X X X X X X Frequency domain (DCT coefficients) Comunicação de Áudio e Vídeo, Fernando Pereira The DCT Transform in H.261 • In H.261, the DCT is applied to blocks with 8×8 samples. samples This value results from a trade-off between the exploitation of the spatial redundancy and the computational complexity. • The DCT coefficients to transmit are selected using non-normative thresholds allowing the consideration of psycho visual criteria in the coding process targeting the maximization of the subjective quality. • To exploit the irrelevancy in the original signal, the DCT coefficients to transmit for each block are quantized. quantized • Since the signal energy is compacted in the upper, left corner of the coefficients’ matrix and the different human visual system sensibility to the various frequencies, the quantized coefficients are zig-zag scanned to assure that more important coefficients are always transmitted before less important ones. Comunicação de Áudio e Vídeo, Fernando Pereira H.261 Quantization • Rec. H.261 uses as quantization steps all even values between 2 and 62 (32 quantizers available). • Within each MB, all DCT coefficients are quantized with the same quantization step with the exception of the DC coefficients for Intra MBs which are always quantized with step 8. Example quantization characteristic • Rec. H.261 normatively defines the regeneration values for the quantized coefficients but not the decision values which may be selected to implement different quantization characteristics, uniform or not. Comunicação de Áudio e Vídeo, Fernando Pereira Serializing the DCT Coefficients • The transmission of the quantized DCT coefficients requires to send the decoder two types of information about the coefficients: their position and amplitude. • For each DCT coefficient to transmit, its position and amplitude is represented using a bidimensional symbol called (run, level) • where the run indicates the number of null coefficients before the coefficient under coding, and the level indicates the quantized value of the coefficient. Comunicação de Áudio e Vídeo, Fernando Pereira Scanning the DCT Coefficients Zigzag scanning Each DCT coefficients block is represented as a sequence of (run, level) pairs, e.g. (0,124), (0, 25), (0,147), (0, 126), (3,13), (0, 147), (1,40) ... Comunicação de Áudio e Vídeo, Fernando Pereira Comunicação de Áudio e Vídeo, Fernando Pereira Exploring Statistical Redundancy Comunicação de Áudio e Vídeo, Fernando Pereira Statistical Redundancy: Entropy Coding Entropy coding CONVERTS SYMBOLS IN BITS ! Using the statistics of the symbols to transmit to achieve additional (lossless) compression by allocating in a clever way bits to the input symbol stream. • A, B, C, D -> 00, 01, 10, 11 • A, B, C, D -> 0, 10, 110, 111 Comunicação de Áudio e Vídeo, Fernando Pereira Huffman Coding Huffman coding is one of the entropy coding tools which allows to exploit the fact that the symbols produced by the encoder do not have equal probability. • To each generated symbol is attributed a codeword which size (in bits) is inversely proportional to its probability. • The usage of variable length codes implies the usage of an output buffer to ‘smooth’ the bitrate flow, if a synchronous channel is available. • The increase in coding efficiency is ‘paid’ with an increase in the sensibility to channel errors. Comunicação de Áudio e Vídeo, Fernando Pereira VLC Table for Macroblock Addressing Comunicação de Áudio e Vídeo, Fernando Pereira Combining the Tools ... Comunicação de Áudio e Vídeo, Fernando Pereira The H.261 Symbolic Model Original Video Symbol Generator (Model) Bits Symbols Entropy Coder A video sequence is represented as a set of images structured in macroblocks, each of them represented with motion vectors and/or (intra or inter) DCT coefficients. Comunicação de Áudio e Vídeo, Fernando Pereira Encoder: the Winning Cocktail ! Originals + DCT Symbols Gener. Quantiz. Entropy coder Buffer Inverse Quantiz. Entropy coder Inverse DCT + Motion det./comp. Previous frame Comunicação de Áudio e Vídeo, Fernando Pereira Decoder: the Slave ! Data Buffer Demux. Huffman decoder IDCT + Motion comp. Data Comunicação de Áudio e Vídeo, Fernando Pereira Output Buffer • The production of bits by the encoder is highly non-uniform in time essentially because: • Variation in spatial detail for the various parts of each image • Variation of temporal activity along time • Entropy coding of the coded symbols To adapt the variable bitrate flow produced by the encoder to the constant bitrate flow transmitted by the channel, an output buffer is used, which adds some delay. Comunicação de Áudio e Vídeo, Fernando Pereira Bitrate Control The encoder must efficiently control the evolution in time of its bit production in order it maximizes the decoded quality for the synchronous channel available. Rec. H.261 does not specify what type of bitrate control must be used; various tools are available: • Changing the temporal resolution/frame rate • Changing the spatial resolution, e.g. CIF to QCIf and vice-versa • Controlling the macroblock classification • Changing the quantization step value The bitrate control strategy has a huge impact on the video quality that may be achieved with a certain bitrate ! Comunicação de Áudio e Vídeo, Fernando Pereira Quantization Step versus Buffer Fullness The bitrate control solution recognized as more efficient, notably in terms of the granularity and frequency of the control, controls the quantization step as a function of the output buffer fullness. Quantization step control Video sequence Encoder Output buffer Binary flow Comunicação de Áudio e Vídeo, Fernando Pereira The Importance of Well Choosing ! To well exploit the redundancy and irrelevancy in the video sequence, the encoder has to adequately select: • Which coding tools are used for each MB, depending of its characteristics; • Which set of symbols is the best to represent each MB, e.g. motion vector and DCT coefficients. While the encoder has the mission to take important decisions and make important choices, the decoder is a ‘slave’, limited to follow the ‘orders’ sent by the encoder; decoder intelligence is only shown for error concealment. Comunicação de Áudio e Vídeo, Fernando Pereira Macroblock Classification • Macroblocks are the basic coding unit since it is at the macroblock level that the encoder selects the coding tools to use. • Each coding tool is more or less adequate to a certain type of MB; it is important that, for each MB, the right coding tools are selected. • Since Rec. H,261 includes several coding tools, it is the task of the encoder to select the best tools for each MB; MBs are thus classified following the tools used for their coding. • When only spatial redundancy is exploited, MBs are INTRA coded; if also temporal redundancy is exploited, MBs are INTER coded. Comunicação de Áudio e Vídeo, Fernando Pereira Macroblock Classification Table Comunicação de Áudio e Vídeo, Fernando Pereira Hierarchical Information Structure • Image - Resynchronization (Picture header) - Temporal resolution control - Spatial resolution control • Group of Blocks (GOB) - Resynchronization (GOB header) - Quantization step control (mandatory) • Macroblock - Motion estimation and compensation - Quantization step control (optional) - Selection of coding tools (MB classification) • Block - DCT Comunicação de Áudio e Vídeo, Fernando Pereira Coding Syntax: Image and GOB Levels Comunicação de Áudio e Vídeo, Fernando Pereira Coding Syntax: MB and Block Levels Comunicação de Áudio e Vídeo, Fernando Pereira Error Protection for the H.261 Binary Flow • Error protection for the H.261 binary flow is implemented by using a BCH (511,493) - Bose-Chaudhuri-Hocquenghem – block coding. • The usage of the channel coding bits (also parity bits) at the decoder is optional. • The syndrome polynomial to generate the parity bits is g (x) = (x9+ x4+ x) ( x9+ x6+ x4+ x3+ 1) Comunicação de Áudio e Vídeo, Fernando Pereira Error Protection for the H.261 Binary Flow The final video signal stream structure (multiframe with 512× ×8 = 4096 bits): Transmission S1 S7 S2 S8 S1S2S3S4S5...S8 – Alignment sequence S1 Code bits Parity bits (1) (493) (18) 1 (1) 0 (1) Video bits (492) Stuffing bits (1's) (492) 00011011 When decoding, realignment is only valid after the good reception of 3 alignment sequences (S1S2 ...S8). Comunicação de Áudio e Vídeo, Fernando Pereira Error Concealment • Even when channel coding is used, some residual (transmission) errors may end at the source decoder. • Residual errors may be detected at the source detector due to syntactical and semantic inconsistencies. • For digital video, the most basic error concealment techniques imply: - Repeating the co-located data from previous frame - Repeating data from previous frame after motion compensation • Error concealment for non-detected errors may be performed through post-processing. Comunicação de Áudio e Vídeo, Fernando Pereira Error Concealment and Post -Processing Post-Processing Examples Comunicação de Áudio e Vídeo, Fernando Pereira Final Comments • Rec. H.261 has been the first video coding international standard with significant adoption. • As the first relevant video coding standard, Rec. H.261 has established legacy and backward compatibility requirements which influenced the next standards to come, notably in terms of technology selected. • Many products and services have been available based on Rec. H.261. • However, Rec. H.261 does not represent anymore the state-ofthe-art on video coding (remind this standard is from the end of the eighties). Comunicação de Áudio e Vídeo, Fernando Pereira Bibliography • Videoconferencing and Videotelephony, Richard Schaphorst, Artech House, 1996 • Image and Video Compression Standards: Algorithms and Architectures, Vasudev Bhaskaran and Konstantinos Konstantinides, Kluwer Academic Publishers, 1995 • Multimedia Communications, Fred Halsall, Addison-Wesley, 2001 • Multimedia Systems, Standards, and Networks, A. Puri & T. Chen, Marcel Dekker, Inc., 2000 Comunicação de Áudio e Vídeo, Fernando Pereira
Documentos relacionados
MPEG-4 Visual - Multimedia Signal Processing Group, IT-Lx
conveyance by a variety of transport layers or storage media Comunicação de Áudio e Vídeo, Fernando Pereira
Leia mais