GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

1University of Southern California, 2Google, 3Pennsylvania State University, 4Max Planck Institute for Intelligent Systems


We propose Gaussian flow, a dense 2D motion flow created by splatting 3D Gaussian dynamics, which significantly benefits tasks such as 4D generation and 4D novel view synthesis. (a) From monocular videos generated by Lumiere and Sora, our model generates 4D Gaussian Splatting fields with high-quality appearance, geometry, and motion. (b) For 4D novel view synthesis, the motions in our generated 4D Gaussian fields are smooth and natural, even in highly dynamic regions where existing methods suffer from undesirable artifacts.

Abstract

Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for content with rich motions that is hard for existing methods to handle. The common color-drifting issue in 4D generation is also resolved with the improved Gaussian dynamics. Extensive experiments demonstrate our method's effectiveness through superior visual quality, and quantitative and qualitative evaluations show that it achieves state-of-the-art results on both 4D generation and 4D novel view synthesis.

Video

Method



Between two consecutive frames, a pixel $x_{t_1}$ is pushed to $x_{i,t_2}$ by the motion of 2D Gaussian $i$ from $i^{t_1}$ to $i^{t_2}$. We track $x_{t_1}$ in Gaussian $i$ by normalizing it into the canonical Gaussian space as $\hat{x}_i$ and unnormalizing it back into image space to obtain $x_{i,t_2}$. We denote this shift contribution from Gaussian $i$ as $flow^G_{i,t_1,t_2}$. The Gaussian flow $flow^G_{t_1,t_2}(x_{t_1})$ at pixel $x_{t_1}$ is defined as the weighted sum of the shift contributions from all Gaussians covering the pixel ($i$ and $j$ in our example), where the weights are the alpha-composition weights used in rendering. The Gaussian flow of the entire image can therefore be obtained efficiently by splatting 3D Gaussian dynamics and rendering with alpha composition, implemented analogously to the original 3D Gaussian Splatting pipeline.
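
To make this construction concrete, below is a minimal NumPy sketch of the per-pixel computation, not the authors' CUDA rasterizer. The function name, the argument layout, and the use of a Cholesky factor for the projected 2D covariance (any factorization Sigma = A A^T, e.g. rotation-scaling, would serve) are illustrative assumptions.

    import numpy as np

    def gaussian_flow_at_pixel(x_t1, means_t1, means_t2, covs_t1, covs_t2, alphas):
        # Per-pixel Gaussian flow: alpha-composited sum of the shift
        # contributions from the 2D Gaussians covering pixel x_t1,
        # sorted front to back (illustrative sketch, not the paper's kernel).
        #   means_*: (N, 2) projected Gaussian centers at t1 / t2
        #   covs_* : (N, 2, 2) projected 2D covariances at t1 / t2
        #   alphas : (N,) per-Gaussian opacities evaluated at x_t1
        flow = np.zeros(2)
        transmittance = 1.0
        for mu1, mu2, cov1, cov2, alpha in zip(means_t1, means_t2, covs_t1, covs_t2, alphas):
            A1 = np.linalg.cholesky(cov1)  # factor Sigma = A A^T (assumed; any factor works)
            A2 = np.linalg.cholesky(cov2)
            x_hat = np.linalg.solve(A1, x_t1 - mu1)  # normalize into canonical space at t1
            x_t2 = A2 @ x_hat + mu2                  # unnormalize into image space at t2
            w = transmittance * alpha                # alpha-composition weight, as in color rendering
            flow += w * (x_t2 - x_t1)                # shift contribution flow^G_{i,t1,t2}
            transmittance *= 1.0 - alpha
        return flow

In the full method, the flow for all pixels is rendered in a single splatting pass alongside color, which is what makes the supervision efficient.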


Results

1. Qualitative results on Consistent4D dataset


2. Gaussian flow and optical flow



Visualization of optical and Gaussian flows on the input view and a novel view. "Ours (no flow)" denotes our model without flow supervision, while "Ours" is our full model. Note that optical flow values on the background should be ignored, since dense optical flow algorithms also compute correspondences among background pixels. We compute the optical flow $flow^o_{t_1t_2}$ on rendered sequences with AutoFlow. From columns #1 and #4, both rendered sequences on the input view exhibit high-quality optical flow, indicating correct motion and appearance. Comparing the Gaussian flows in columns #2 and #5, the underlying Gaussians move inconsistently without flow supervision. This stems from the ambiguity between appearance and motion when optimizing with only a photometric loss on a single input view. Aligning the Gaussian flow to the optical flow drastically improves the irregular motions (column #3) and creates high-quality dynamic motions (column #6) on novel views.
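
As a sketch of how this alignment could be wired in, the snippet below applies an L1 penalty between the rendered Gaussian flow and the precomputed optical flow, masked to foreground pixels as discussed above. The L1 form, the mask argument, and the function name are our assumptions; the exact loss in the paper may differ.

    import torch

    def flow_supervision_loss(gaussian_flow, optical_flow, fg_mask):
        # L1 alignment between the differentiably rendered Gaussian flow
        # (H, W, 2) and a precomputed optical flow map (H, W, 2), restricted
        # to foreground pixels (background flow is noise, as noted above).
        m = fg_mask.float()                                  # (H, W) boolean mask
        diff = (gaussian_flow - optical_flow).abs().sum(-1)  # per-pixel L1 over (u, v)
        return (diff * m).sum() / m.sum().clamp(min=1.0)

Because the Gaussian flow is produced by a differentiable splatting pass, this loss backpropagates directly to the Gaussian dynamics.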


More Results


3. Qualitative results on DyNeRF dataset



The left column shows novel-view renderings and depth maps from RT-4DGS, which suffers from artifacts in dynamic regions and can hardly handle the time-varying specular effects on the moving glossy object. The right column shows the same method optimized with our flow supervision during training. Please refer to our paper and video for more results.


Video Results





Acknowledgments


We thank the authors of DreamGaussian, DreamGaussian4D, Consistent4D, RT-4DGS, Dynamic3DGaussians, and 3DGS for their great work and for releasing their code.

BibTeX

@article{gao2024gaussianflow,
  author    = {Quankai Gao and Qiangeng Xu and Zhe Cao and Ben Mildenhall and Wenchao Ma and Le Chen and Danhang Tang and Ulrich Neumann},
  title     = {GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation},
  year      = {2024},
}