Abstract: We show that traditional waveform coding and 3-D model-based coding are not competing alternatives but should be combined to support and complement each other. Both approaches are combined such that the generality of waveform coding and the efficiency of 3-D model-based coding are available where needed. The combination is achieved by providing the block-based video coder with a second reference frame for prediction, which is synthesized by the model-based coder. The model-based coder uses a parameterized 3-D head model specifying the shape and color of a person. We therefore restrict our investigations to typical videotelephony scenarios that show head-and-shoulder scenes. Motion and deformation of the 3-D head model constitute facial expressions, which are represented by facial animation parameters (FAPs) based on the MPEG-4 standard. An intensity-gradient-based approach that exploits the 3-D model information is used to estimate the FAPs as well as illumination parameters that describe changes of brightness in the scene. Model failures and objects that are not known at the decoder are handled by standard block-based motion-compensated prediction, which is not restricted to a particular scene content but results in lower coding efficiency. A Lagrangian approach is employed to determine the most efficient prediction for each block from either the synthesized model frame or the previous decoded frame. Experiments on five video sequences show that bit-rate savings of about 35% are achieved at equal average PSNR when comparing the model-aided codec to TMN-10, the state-of-the-art test model of the H.263 standard. This corresponds to a gain of 2-3 dB in PSNR when encoding at the same average bit-rate.
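The per-block choice between the two reference frames can be illustrated with a minimal sketch. It assumes the usual Lagrangian cost J = D + λR, with D taken here as the sum of squared differences of the prediction alone and R as a bit count supplied by the caller. The abstract does not spell out the cost terms, so the function names, the candidate labels, and the rate values are all hypothetical and are not taken from the codec described above.

```python
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[int]]  # 2-D block of luma samples (hypothetical layout)


def ssd(a: Block, b: Block) -> int:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def choose_reference(
    block: Block,
    candidates: List[Tuple[str, Block, int]],
    lam: float,
) -> Tuple[str, float]:
    """Pick the candidate prediction with the lowest Lagrangian cost.

    Each candidate is (label, predicted_block, rate_bits). The cost is
    J = D + lam * R, where D is the SSD of the prediction (an assumed,
    simplified distortion measure) and R is the given rate in bits.
    """
    best_label, best_cost = "", float("inf")
    for label, prediction, rate_bits in candidates:
        cost = ssd(block, prediction) + lam * rate_bits
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, best_cost
```

With a toy 2x2 block, a "model" prediction that is closer to the original but costs more bits wins at a small λ, while a cheaper "previous" prediction wins once λ weights rate more heavily. This is the trade-off that lets the coder fall back on the previous decoded frame wherever the synthesized model frame is not worth its cost.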