Video Compression with Intra/Inter Mode Switching and a Dual Frame Buffer Λ - PDF

Video Compression with Intra/Inter Mode Switching and a Dual Frame Buffer Λ Athanasios Leontaris and Pamela C. Cosman University of California, San Diego Abstract Video codecs that use motion compensation

Please download to get full document.

View again

of 10
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.


Publish on:

Views: 17 | Pages: 10

Extension: PDF | Download: 0

Video Compression with Intra/Inter Mode Switching and a Dual Frame Buffer Λ Athanasios Leontaris and Pamela C. Cosman University of California, San Diego Abstract Video codecs that use motion compensation have recently achieved performance improvements from the use of intra/inter mode switching decisions within a ratedistortion framework. A separate development has involved the use of multiple frame prediction, in which more than one past reference frame is available for motion estimation. In this paper, we show that using a dual frame buffer (one short term frame and one long term frame available for prediction) together with intra/inter mode switching improves the compression performance of the coder. Also, we improve the mode switching algorithm with the use of half-pel motion vectors. 1 Introduction Most of today s hybrid video codecs use motion compensated prediction to efficiently encode a raw input video stream. A block in the current frame is predicted from a displaced block in the previous frame. The difference between the original one and its prediction is compressed and transmitted along with the displacement (motion) vectors. Called inter coding, this is the basic approach found in the video coding standards MPEG, MPEG-2, MPEG-4 [1], H.263 [10] and the latest and state-of-the-art H.26L. The idea of using more than one past reference frame to improve coding efficienc y is not new. The first mention [3] of multiple reference frames dates almost a decade back; it was shown that the meansuared error (MSE) between the current frame and the predicted one strictly decreases by using multiple temporal frames for motion compensation. Another early attempt to code an image using a so-called library of past frame components can be found in [2], and made use of vector uantization. Multiple frame prediction was also treated in [8]. In [9], only two time-differential frames were used, thus reuiring a relatively modest increase in computational complexity. We refer to this as a dual frame buffer. One frame was the previous one, as in many hybrid codecs, and the second one contained a reference Λ Supported in part by the National Science Foundation, the Center for Wireless Communications at UCSD, and the CoRe program of the State of California. The authors are with the Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr. MC 0407, La Jolla, CA fpcosman, frame from the more distant past that was periodically updated according to a predefined rule. In [7], the authors use a linear weighted combination of two frames, primarily to enhance the error robustness of the codec. Error robustness was also studied within a multiple-reference frame scheme in [4] and [6]. As these papers showed, there is a significant gain in reconstructed PSNR to be obtained, at the expense of increased computational burden and memory complexity. Motion estimation is the main performance bottleneck in a hybrid video coding system, and can account for more than 80-90% of the total encoding time. Thus adding even one additional frame buffer can double the encoding time. The same is true with memory reuirements, where the increase is also linear and thus prohibitive as the number of reference frames grows large. Some efforts were directed to finding fast and efficient algorithms for motion estimation, but some performance penalty is sustained as it is traded-off for reduced computational complexity [14]. In this paper, we show how using a dual frame buffer together with an algorithm for intra/inter mode switching decisions can lead to improved compression performance. The paper is organized as follows. In Section 2 we review the ROPE algorithm [11] for intra/inter mode switching. In Section 3, we show how this algorithm can be used in the context of a dual frame buffer. The use of half-pel motion vectors is covered in Section 4. Results and conclusions are presented in Section 5. 2 Baseline ROPE Algorithm Recent attempts to switch coding modes according to error robustness criteria can be found in [12] and [11]. Our work makes use of the recursive optimal per-pixel estimate (ROPE) algorithm [11] which provides mode decisions in hybrid video coders operating over packet erasure channels. In general, inter-mode achieves higher compression efficienc y than intramode, at the cost of potentially severe error propagation. A single error in a past frame may corrupt all subseuent frames if inter-coding is used repeatedly. This error propagation can only be stopped by transmitting and successfully receiving an intra-coded macroblock. The problem that arises is how to optimally select between intra- and inter-coding for a particular macroblock, such that both error resilience and coding efficienc y are achieved. We assume that the video bitstream is transmitted over a packet erasure channel. Each frame is partitioned into Groups Of Blocks (GOB). Each GOB contains a single horizontal slice of macroblocks (MBs) and is transmitted as a single packet. Each packet can be independently received and decoded, due to resynchronization markers. Thus, a loss of a single packet wipes out one slice of MBs, but keeps the rest of the frame unharmed. Let p be the probability of packet erasure, which is also the erasure probability for each single pixel. When the erasure is detected by the decoder, error concealment is applied. The decoder replaces the lost macroblock by one from the previous frame, using as motion vector (MV) the median of the MVs of the three closest macroblocks in the GOB above the lost one. If the GOB above has also been lost (or the 3 nearest MBs were all intra-coded and therefore have no motion vectors), then the all-zero (0; 0) MV is used, and the lost macroblock is replaced with the co-located one from the previous frame. We will now summarize the ROPE algorithm [11] in some detail as these euations will prove useful in elaborating our proposed method. Frame n of the original video signal is denoted f n, which is compressed and reconstructed at the encoder as ^f n. The decoded (and possibly error-concealed) reconstruction of frame n at the receiver is denoted by f ~ n. The encoder does not know f ~ n, and treats it as a random variable. Let fn i denote the original value of pixel i in frame n, and let i ^f n denote its encoder reconstruction. The reconstructed value at the decoder, possibly after error concealment, is denoted by f ~ n i. The expected distortion for pixel i is: d i n = Ef(f i n ~ f i n )2 g =(f i n )2 2f i n Ef ~ f i n g + Ef( ~ f i n )2 g (1) Calculation of d i n reuires the first and second moments of the random variable of the estimated image seuence f ~ n i. To compute these, recursion functions are developed in [11], in which it is necessary to separate out the cases of intra- and inter-coded MBs. For an intra-coded MB, f ~ n i = ^f n i with probability 1 p, corresponding to correct receipt of the packet. If the packet is lost, but the previous GOB is correct, the concealment based on the median motion vector leads the decoder to associate pixel i in the current frame with pixel k in the previous frame. Thus, f ~ i n = f ~ k with probability p(1 p). Finally, if both current and previous GOB-packets are lost, f ~ n i = f ~ i (occurs with probability p2 ). So the two moments for a pixel in an intra-coded MB are [11]: Ef ~ f i n g =(1 p)( ^f i n )+p(1 p)ef ~ f k g + p2 Ef ~ f i g (2) Ef( f ~ n i )2 g =(1 p)( ^f n i )2 + p(1 p)ef( f ~ k + p 2 Ef( f ~ i (3) For an inter-coded MB, let us assume that its true motion vector is such that pixel i is predicted from pixel j in the previous frame. Thus, the encoder prediction of this pixel is ^f j. The prediction error, ei n, is compressed, and the uantized residue is ^ei n. The encoder reconstruction is: ^f n i = ^f j +^ei n (4) The encoder transmits ^e i n and the MB s motion vector. If the packet is correctly received, the decoder knows ^e i n and the MV, but must still use its own reconstruction of pixel j in the previous frame, f ~ j, which may differ from the encoder value ^f j. Thus, the decoder reconstruction of pixel i is given by: ~f i n = ~ f j +^ei n (5) Again, the encoder models ~ f j as a random variable. The derivation of the moments is similar to the intra-coded MB for the last two cases, but differs for the first case where there is no transmission error (probability 1 p). The first and second moment of ~ f i n for a pixel in an inter-coded MB is then given by: Ef ~ f i n g =(1 p) ^e i n + Ef ~ f j g + p(1 p)ef ~ f k g + p2 Ef ~ f i g (6) Ef( f ~ n i )2 g = (1 p) (^e i n )2 +2^e i n Ef f ~ j g + Ef( f ~ j + p(1 p)ef( ~ f k + p 2 Ef( ~ f i (7) These recursions are performed at the encoder in order to calculate the expected distortion at the decoder. The encoder can exploit this result in its encoding decisions, to optimally choose the coding mode for each MB. 2.1 Rate-Distortion Framework The ROPE algorithm takes into account the expected distortion due to both compression and transmission errors for optimal mode switching. The encoder switches between intraor inter-coding on a macroblock basis, in an optimal fashion for a given bit rate and packet loss rate. The goal is to minimize the total distortion D subject to a bit rate constraint R. Using a Lagrange multiplier, the ROPE algorithm minimizes the total cost J = D + R. Individual MB contributions to this cost are additive, thus it can be minimized on a macroblock basis. Therefore, the encoding mode for each MB is chosen by minimizing min mode J MB =min mode (D MB + R MB ) (8) where the distortion D MB of the MB is the sum of the distortion contributions of the individual pixels. Rate control is achieved by modifying as in [13]. Both the coding mode and the uantization step size are chosen to minimize the Lagrangian cost. This is computationally complex for the encoder, but it enhances coding efficienc y. The resulting bitstream is compatible with an unmodified hybrid video encoder. We note that while the ROPE algorithm is optimal under the given assumptions, there is potential for improvement by incorporating the motion vector choice into the rate-distortion framework, by correctly estimating distortion for half-pel vectors (the algorithm only models distortion for integer motion vectors) or by taking into account the fact that all pixels in a GOB are either lost together, or transmitted correctly. 3 Dual frame buffer extension Our research has focused on using a dual frame buffer together with optimal mode switching within a rate-distortion framework. The basic use of the dual frame buffer is as follows. While encoding frame n, the encoder and decoder both maintain two reference frames in memory. The short-term reference frame is frame n 1. The long term reference frame varies from as recent as frame n 2 to as old as frame n N. When the long term frame is n N, then, when the encoder moves on to encoding frame n +1, the short term reference frame will move forward one to frame n, and the long term reference frame will jump forward to be frame n 1. The long term reference frame will then remain static for N frames, and then jump forward again. Each macroblock can be encoded in one of three coding modes: intra coding, inter coding using the short term buffer (inter-st-coding), and inter coding using the long term buffer (inter-lt-coding). The choice among these three will be made using an extended version of the ROPE algorithm, as described below. Once the coding mode is chosen, the syntax for encoding the bit stream is almost identical to the standard case of the single frame buffer. The only modification is that, if inter coding is chosen, a single bit will be sent to indicate use of the short term or long term frame. We now describe how the choice is made among the coding modes. As before, we use f n, ^f n, and ~ f n to denote the original frame n, the encoder reconstruction of the compressed frame, and the decoder version of the frame, respectively. We assume that the long-term frame buffer was updated l frames ago. Thus, it contains ^f n l at the transmitter and f ~ n l at the receiver. The expected distortion for pixel i in frame n is given by Euation 1. To compute the moments in Euation 1, the recursion steps for pixels in intra-coded and inter-st-coded MBs are identical to the corresponding steps in the original ROPE algorithm. For a pixel in an inter-lt-coded MB, we assume that the true motion vector of the MB is such that pixel i in frame n is predicted from pixel j in frame n l, where l 1. The encoder prediction of this pixel is ^f j n l. The prediction error ei n is compressed, and the uantized residue is denoted by ^e i n. The encoder reconstruction of the pixel is: As the receiver does not have access to ^f j n l, it uses ~ f j n l : ^f i n =^ei n + ^f j n l (9) ~f i n =^ei n + ~ f j n l (10) When the MB is lost, the median motion vector from the three nearest MBs is calculated and used to associate pixel i in the current frame with pixel k in the previous frame. Using the same arguments as in the original ROPE algorithm, we compute the first and second moments of f ~ n i for a pixel in an inter-lt-coded MB, Ef f ~ n i g =(1 p) ^e i n + Ef f ~ j n l g + p(1 p)ef f ~ k g + p2 Ef f ~ i g (11) Ef( f ~ n i )2 g = (1 p) (^e i n )2 +2^e i n Ef f ~ j n l g + Ef( f ~ j n l )2 g + p(1 p)ef( ~ f k + p 2 Ef( ~ f i (12) We note that error concealment is still done using the previous frame n 1 and not the long term frame. This is done regardless of whether the three MBs above are inter- ST-coded or inter-lt-coded, or some combination of the two. The motion vectors may be highly uncorrelated. If the upper GOB is also lost, then we conceal the MB using the co-located block from the previous frame. The presence of neighboring uncorrelated motion vectors negatively affects motion vector coding efficienc y. There is a bit rate loss due to inaccurate prediction of motion vectors from their neighboring motion vectors. Furthermore, compression efficienc y is reduced because we use one bit for every inter-coded MB, to specify the frame buffer. (This overhead could be reduced by using run length coding on the bits, but we do not do this as it incurs penalties in terms of buffering at the decoder and a risk of catastrophic error if the RLC encoded frame buffer selection stream is lost.) Nonetheless, as experimental results will show, the rate-distortion optimization models these additional bits, and is still able to yield superior compression performance. Since the uantization parameter QP takes values from 1 to 31, the original ROPE algorithm optimizes over 62 potential combinations of coding modes (intra or inter) and uantization parameters. With the extra coding mode inter-lt, the search for optimal coding parameters is conducted over 93 combinations. There is a computational increase of approximately 50% for the rate-distortion optimization portion of the encoder. Furthermore, motion estimation complexity is approximately doubled. Hence the total encoding time of the modified encoder is roughly 1.8 times that of the baseline ROPE encoder. 4 Half-Pixel Accuracy Extension to the ROPE Algorithm The use of integer motion vectors limits the reference choices in the previous frame. Most video codecs show a performance advantage when half-pel motion vectors are implemented, as the encoder is now presented with many more options in the search for the best-match block. The use of an additional reference frame likewise presents the encoder with more options for the best match block. We wished to see how the gains from an additional frame buffer compared to those from adding a half-pel grid, and also whether the two approaches could be used together for greater benefit. The use of a half-pel grid in a standard video codec reuires the generation of the halfpel values using some kind of interpolation, and then reuires a four-fold increase in the motion vector search. However, simply adding a half-pel grid within the ROPE algorithm, and attempting to run the optimal mode switching over it, incurs a far more substantial complexity penalty than this, as discussed below. Since the accurate use of a half-pel grid is prohibitive, another approach would be to use a half-pel grid only for finding and transmitting motion vectors, but to leave it out of the ROPE distortion calculation altogether. This is what is done in [11], and it provides some improvement over the use of strictly integer motion vectors. However, as we will now discuss, an approximate modeling of the half-pels within the ROPE algorithm provides further improvement, while avoiding the computational complexity of the fully accurate modeling of a half-pel grid in ROPE. We assume that error concealment (EC) is still done using only the integer portion of the motion vectors, and therefore Euations 2 and 3 for the intra-coded MBs are unchanged. Returning to Euations 6 and 7 for the inter-coded MBs, we see the terms ^e i n, Ef f ~ k g, Ef f ~ i g, Ef( f ~ k and Ef( f ~ i remain unchanged. However, the calculation of Ef f ~ j g and Ef( f ~ j has become critical. Pixel coordinate j now points to a position in an interpolated grid that covers an area four times that of the original image. For this calculation, we differentiate among three types of pixels on the half-pel grid: pixels that coincide with actual (original) pixel positions (called integer-indexed pixels, they do not need to be interpolated), pixels that lie between two integer-indexed pixels (either horizontally or vertically), and pixels that lie diagonally between four integer-indexed pixels. We use bilinear interpolation, so the interpolated value is simply the average of the two or four neighboring integer-indexed pixels. For the integer-indexed pixels, the recursion euations are identical to those of the baseline ROPE algorithm, and the estimation is optimal. Horizontally or Vertically Interpolated Pixel: For a horizontally or vertically interpolated pixel, we assume that j on the interpolated pixel domain corresponds to a pixel that was interpolated using pixels k 1 and k 2 in the original pixel domain. The first moment is computationally tractable: h i 1+Ef f ~ k 1 g + Ef f ~ k 2 g (13) Ef ~ f j g = 1 2 But the expression for the second moment is: Ef( f ~ j = 1 h 1+Ef( f ~ k 1 4 )2 g + Ef( f ~ k 2 i + 2Ef f ~ k 1 g +2Ef f ~ k 2 g +2Ef f ~ k 1 ~ f k 2 g (14) The last term reuires calculating the correlation of matrices whose horizontal/vertical dimension euals the number of pixels in the image. This is computationally infeasible for images of typical size. We will approximate it using the cosine ineuality: Ef( ~ f j» 1 4 h 1+Ef( ~ f k 1 + Ef( ~ f k 2 + 2Ef ~ f k 1 g +2Ef ~ f k 2 g +2 Ef( ~ f k 1 Ef( ~ f k 2 (15) Diagonally Interpolated Pixel: For a diagonally interpolated pixel, we assume that j on the interpolated pixel grid is the result of interpolating pixels k 1, k 2, k 3 and k 4 in the original pixel domain. The first moment can be computed exactly as: h i 2+Ef f ~ k 1 g + Ef f ~ k 2 g + Ef f ~ k 3 g + Ef f ~ k 4 g (16) Ef ~ f j g = 1 4 The accurate but intractable expression for the second moment is: Ef( ~ f j = 1 16 h 4+Ef( ~ f k 1 + Ef( ~ f k 2 + Ef( ~ f k 3 + Ef( ~ f k 4 + 4Ef f ~ k 1 g +4Ef f ~ k 2 g +4Ef f ~ k 3 g +4Ef f ~ k 4 g + 2Ef f ~ k 1 ~ f k 2 g +2Ef f ~ k 1 ~ f k 3 g +2Ef f ~ k 1 ~ f k 4 g i + 2Ef f ~ k 2 ~ f k 3 g +2Ef f ~ k 2 ~ f k 4 g +2Ef f ~ k 3 ~ f k 4 g (17) Applying the same approximation as with the horizontal/vertical case, we obtain: Ef( ~ f j» 1 16 h 4+Ef( ~ f k 1 + Ef( ~ f k 2 + Ef( ~ f k 3 + Ef( ~ f k 4 + 4Ef f ~ k 1 g +4Ef f ~ k 2 g +4Ef f ~ k 3 g +4Ef f ~ k Ef( f ~ k 1 Ef( f ~ k 2 +2 Ef( f ~ k 1 Ef( f ~ k 4 +2 Ef( ~ f k 2 Ef( ~ f k 4 +2 g Ef( f ~ k 1 Ef( f ~ k 3 Ef( ~ f k 2 Ef( ~ f k 3 Ef( ~ f k 3 Ef( ~ f k 4 (18) and we use this upper limit to approximate the second moment. 5 Results We modified an existing H.263+ video codec [10] in two
Related Search
Similar documents
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks