Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data

We present a new method to reconstruct a temporally
coherent 3D animation from single- or multi-view RGB-D video data
using unbiased feature point sampling. Given RGB-D video data in
the form of a 3D point cloud sequence, our method first extracts feature
points using both color and depth information. In the subsequent
steps, these feature points are used to match the 3D point clouds of
consecutive frames independently of their resolution. Our new
motion-vector-based dynamic alignment method then fully reconstructs
a spatio-temporally coherent 3D animation. We perform extensive
quantitative validation using novel error functions to analyze the
results. We show that despite the limiting factors of temporal and
spatial noise associated with RGB-D data, it is possible to extract
temporal coherence from the data and faithfully reconstruct a temporally
coherent 3D animation from RGB-D video data.
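The abstract does not give the details of the matching or alignment steps. Purely as an illustration of the general idea of aligning consecutive frames through sparse feature points (so that the dense point counts of the two clouds need not agree), here is a minimal Python sketch. Everything in it is an assumption, not the method of this paper: the hypothetical `FeaturePoint` structure, the mutual nearest-neighbour matching in descriptor space, the `max_desc_dist` threshold, and the inverse-distance weighting used to spread sparse motion vectors over the dense cloud are illustrative choices only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FeaturePoint:
    # Hypothetical structure: a 3D location (from depth) plus an
    # appearance descriptor (e.g. derived from color).
    position: Vec3
    descriptor: Tuple[float, ...]


def match_features(src: List[FeaturePoint], dst: List[FeaturePoint],
                   max_desc_dist: float = 0.5) -> List[Tuple[int, int]]:
    """Mutual nearest-neighbour matching in descriptor space (assumed scheme).

    Only feature points are compared, so the dense resolutions of the two
    frames do not need to agree.
    """
    if not src or not dst:
        return []

    def nearest(f: FeaturePoint, candidates: List[FeaturePoint]) -> int:
        return min(range(len(candidates)),
                   key=lambda j: math.dist(f.descriptor, candidates[j].descriptor))

    matches = []
    for i, f in enumerate(src):
        j = nearest(f, dst)
        # Keep the pair only if it is mutual and the descriptors are close enough.
        if nearest(dst[j], src) == i and \
                math.dist(f.descriptor, dst[j].descriptor) <= max_desc_dist:
            matches.append((i, j))
    return matches


def motion_vectors(src: List[FeaturePoint], dst: List[FeaturePoint],
                   matches: List[Tuple[int, int]]) -> List[Tuple[Vec3, Vec3]]:
    """Return (anchor position in frame t, displacement to frame t+1) pairs."""
    out = []
    for i, j in matches:
        a, b = src[i].position, dst[j].position
        out.append((a, (b[0] - a[0], b[1] - a[1], b[2] - a[2])))
    return out


def propagate(points: List[Vec3], vectors: List[Tuple[Vec3, Vec3]],
              power: float = 2.0) -> List[Vec3]:
    """Displace each dense point of frame t by an inverse-distance-weighted
    average of the sparse feature motion vectors (hypothetical weighting)."""
    moved = []
    for p in points:
        acc = [0.0, 0.0, 0.0]
        wsum = 0.0
        for anchor, v in vectors:
            d = math.dist(p, anchor)
            if d == 0.0:
                # Point coincides with a feature: use that feature's motion directly.
                acc, wsum = list(v), 1.0
                break
            w = 1.0 / d ** power
            wsum += w
            for k in range(3):
                acc[k] += w * v[k]
        if wsum == 0.0:
            moved.append(p)  # no motion information: leave the point in place
        else:
            moved.append(tuple(p[k] + acc[k] / wsum for k in range(3)))
    return moved


# Usage: two features translate by 0.1 along x between frames t and t+1.
frame_t = [FeaturePoint((0.0, 0.0, 1.0), (1.0, 0.0)),
           FeaturePoint((1.0, 0.0, 1.0), (0.0, 1.0))]
frame_t1 = [FeaturePoint((1.1, 0.0, 1.0), (0.0, 1.0)),
            FeaturePoint((0.1, 0.0, 1.0), (1.0, 0.0))]
m = match_features(frame_t, frame_t1)          # [(0, 1), (1, 0)]
vecs = motion_vectors(frame_t, frame_t1, m)
aligned = propagate([(0.5, 0.0, 1.0)], vecs)   # approximately [(0.6, 0.0, 1.0)]
```

Because only the sparse features are matched, the dense clouds of the two frames may contain different numbers of points; the propagation step then carries the estimated motion to every point of frame t.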