HNeRV: A Hybrid Neural Representation for Videos
CVPR 2023
- Hao Chen¹, Matthew Gwilliam¹, Ser-Nam Lim²,
- Abhinav Shrivastava¹
Abstract
Implicit neural representations store videos as neural networks and have performed well on vision tasks such as video compression and denoising. With a frame index and/or positional index as input, implicit representations (NeRV, E-NeRV, etc.) reconstruct video frames from fixed, content-agnostic embeddings. Such embeddings largely limit the regression capacity and the internal generalization needed for video interpolation. In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), in which learnable, content-adaptive embeddings act as the decoder input. Besides the input embedding, we introduce HNeRV blocks, which distribute model parameters evenly across the entire network so that higher layers (layers near the output) have more capacity to store high-resolution content and video details. With content-adaptive embeddings and a redesigned architecture, HNeRV outperforms implicit methods (NeRV, E-NeRV) on the video regression task in both reconstruction quality and convergence speed, and shows better internal generalization. As a simple and efficient video representation, HNeRV also shows decoding advantages in speed, flexibility, and deployment compared to traditional codecs (H.264, H.265) and learning-based compression methods. Finally, we explore the effectiveness of HNeRV on downstream tasks such as video compression and video inpainting.
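The difference between a content-agnostic and a content-adaptive embedding can be illustrated with a small sketch. Everything here is an assumption made for illustration: `positional_embedding` uses a sinusoidal encoding of the frame index with hypothetical defaults, and `content_embedding` is a toy average-pooling stand-in for a learned encoder, not the HNeRV encoder itself.

```python
import math

def positional_embedding(t, num_freqs=4, base=1.25):
    """Fixed embedding of a normalized frame index t in [0, 1].

    Depends only on t, never on pixel values (content-agnostic).
    """
    emb = []
    for i in range(num_freqs):
        angle = (base ** i) * math.pi * t
        emb.extend([math.sin(angle), math.cos(angle)])
    return emb

def content_embedding(frame, grid=2):
    """Toy stand-in for a learned encoder (assumption, not the HNeRV encoder).

    Average-pools a 2D frame (list of rows) into grid*grid values, so the
    embedding changes whenever the frame content changes.
    """
    block_h = len(frame) // grid
    block_w = len(frame[0]) // grid
    emb = []
    for gy in range(grid):
        for gx in range(grid):
            values = [frame[y][x]
                      for y in range(gy * block_h, (gy + 1) * block_h)
                      for x in range(gx * block_w, (gx + 1) * block_w)]
            emb.append(sum(values) / len(values))
    return emb

dark = [[0.0] * 4 for _ in range(4)]
bright = [[1.0] * 4 for _ in range(4)]

# The index-based embedding is the same vector for any video at t = 0.5.
index_emb = positional_embedding(0.5)
# The content-based embeddings differ because the frames differ.
dark_emb = content_embedding(dark)
bright_emb = content_embedding(bright)
```

In a hybrid setup, the content-based vector (here `dark_emb` or `bright_emb`) would be what the decoder receives, instead of `index_emb`.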
1) HNeRV overview
2) HNeRV architecture
Balanced parameters
K: kernel sizes, ranging from Kmin to Kmax;
r = Cin / Cout: channel reduction factor between consecutive layers.
We increase kernel sizes and channel widths (smaller r) for higher layers to balance parameters across the network.
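The effect of this balancing can be checked with a rough parameter count. The sketch below assumes each decoder stage is a single convolution followed by a pixel-shuffle upsampling step, that the kernel size grows by 2 per stage up to a maximum, and that channels shrink by a hypothetical `reduce` factor per stage; the function names and the example settings are illustrative, not taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def stage_params(c0, strides, reduce, k_min, k_max):
    """Per-stage parameter counts for an assumed conv + pixel-shuffle decoder.

    c0      : channels entering the first stage
    strides : upsampling factor of each stage
    reduce  : factor by which the channel count is divided per stage
    k_min, k_max : kernel size grows as min(k_min + 2*i, k_max) (assumed schedule)
    """
    counts = []
    c_in = c0
    for i, s in enumerate(strides):
        c_out = max(1, int(c_in / reduce))
        k = min(k_min + 2 * i, k_max)
        # Pixel shuffle needs s*s times more conv output channels.
        counts.append(conv_params(c_in, c_out * s * s, k))
        c_in = c_out
    return counts

# Fixed 3x3 kernels and halving channels: early stages dominate.
front_loaded = stage_params(64, [2, 2, 2, 2], reduce=2, k_min=3, k_max=3)
# Growing kernels and slower channel reduction: later stages gain capacity.
balanced = stage_params(64, [2, 2, 2, 2], reduce=1.2, k_min=1, k_max=5)

for name, counts in (("front-loaded", front_loaded), ("balanced", balanced)):
    share = counts[-1] / sum(counts)
    print(f"{name}: {counts}, last-stage share = {share:.3f}")
```

With these example settings, the last stage holds only a small fraction of the parameters in the front-loaded configuration and a much larger fraction in the balanced one.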
3) Video Regression
Visualization results
4) Video Decoding
5) Internal Generalization
Embedding interpolation results
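Embedding interpolation can be sketched as a blend of the embeddings of two neighboring frames; the blended vector would then be fed to the trained decoder to produce a frame that was not seen during training. The linear blend, the `alpha` weight, and the function name are assumptions made for illustration, and the decoder itself is not shown.

```python
def interpolate_embedding(emb_prev, emb_next, alpha=0.5):
    """Linearly blend two frame embeddings.

    alpha = 0 returns emb_prev, alpha = 1 returns emb_next,
    alpha = 0.5 gives the midpoint used for a frame halfway between them.
    """
    if len(emb_prev) != len(emb_next):
        raise ValueError("embeddings must have the same length")
    return [(1 - alpha) * a + alpha * b for a, b in zip(emb_prev, emb_next)]

# Embeddings of two frames seen in training (toy values).
seen = {0: [0.0, 2.0], 2: [1.0, 4.0]}

# Embedding for the held-out frame 1, halfway between frames 0 and 2.
held_out = interpolate_embedding(seen[0], seen[2])
```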
6) Video Compression
Overall results on the UVG dataset
Best & worst cases on the UVG dataset
7) Video restoration
HNeRV input HNeRV output
Citation