AdaptiveSplat

Abstract

Current feed-forward 3D reconstruction methods predict pixel-aligned Gaussian primitives, resulting in highly redundant representations. A natural solution is to prune the redundant Gaussians, but naive pruning introduces severe artifacts and often requires inference-time fine-tuning, which breaks the feed-forward paradigm. Prior work has shown that high-frequency regions require more Gaussian primitives, while low-frequency regions can be represented with significantly fewer. Motivated by this, we propose a novel approach that explicitly controls the number of Gaussians by leveraging local texture information. Our approach has three key components: (1) texture estimation to capture spatial variation in scene detail, (2) texture-aware pruning that removes redundant Gaussians from low-frequency regions, and (3) an adaptive Gaussian head that predicts the modified attributes of the retained primitives without breaking the feed-forward paradigm. Experiments on RE10K, ACID, DL3DV, Tanks and Temples, and DTU demonstrate the effectiveness of our approach, while ablation studies validate the contributions of its key components.

Methodology

AdaptiveSplat plugs into any feed-forward backbone (e.g. MASt3R, VGGT, pixelSplat, AnySplat): the backbone produces an initial point cloud, which we make controllable through texture-guided pruning and an adaptive Gaussian head.

AdaptiveSplat pipeline — **(a)** Initial point cloud from the feed-forward backbone. **(b)** SuperCluster formation with DWT-based texture analysis to locate low-texture regions. **(c)** Texture-driven masks guide the Gaussian head to predict optimized attributes for the retained primitives.

Texture-based scene energy

We apply a single-level Discrete Wavelet Transform (DWT) to each input view and aggregate the horizontal, vertical and diagonal detail responses into a per-pixel map of high-frequency scene energy.
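
As a rough illustration, the energy map can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the Haar wavelet, the sum-of-squares aggregation, the nearest-neighbour upsampling back to input resolution, and the function name `haar_texture_energy` are all choices made here for illustration; the text above does not specify them.

```python
import numpy as np

def haar_texture_energy(gray):
    """Per-pixel high-frequency energy from a single-level 2D Haar DWT.

    `gray` is a 2D array (H, W). The wavelet and the aggregation rule are
    assumptions made for this sketch.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Replicate the last row/column if needed so both dimensions are even.
    gray = np.pad(gray, ((0, h % 2), (0, w % 2)), mode="edge")

    # The four pixels of every non-overlapping 2x2 block.
    a = gray[0::2, 0::2]  # top-left
    b = gray[0::2, 1::2]  # top-right
    c = gray[1::2, 0::2]  # bottom-left
    d = gray[1::2, 1::2]  # bottom-right

    # Orthonormal Haar detail sub-bands.
    horiz = (a + b - c - d) / 2.0  # top row minus bottom row
    vert = (a - b + c - d) / 2.0   # left column minus right column
    diag = (a - b - c + d) / 2.0   # diagonal difference

    # Aggregate the three responses (assumed here: sum of squares).
    energy = horiz**2 + vert**2 + diag**2

    # The sub-bands are half resolution; upsample so every input pixel
    # receives the energy of its 2x2 block, then crop away the padding.
    energy = np.repeat(np.repeat(energy, 2, axis=0), 2, axis=1)
    return energy[:h, :w]
```

A constant image yields zero energy everywhere, while fine patterns such as a checkerboard yield uniformly high energy, which is the behaviour the pruning stage relies on.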

Texture-aware pruning

Points are grouped into SuperClusters via K-means over position and colour. Low-energy (smooth) clusters are pruned first to meet the target pruning budget, while textured regions are retained.
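
The clustering and pruning step can be sketched as follows. This is an illustrative sketch only: the plain Lloyd-style K-means, the `colour_weight` parameter, the use of the mean per-cluster energy as the ranking score, and the rule of breaking ties inside the last cluster by per-point energy are assumptions introduced here, as are the function names. The per-point `energy` array is assumed to hold, for each pixel-aligned point, the texture energy at its source pixel.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centre.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return labels

def texture_aware_keep_mask(xyz, rgb, energy, prune_ratio, k=8,
                            colour_weight=1.0):
    """Boolean keep-mask that removes the smoothest clusters first.

    xyz: (N, 3) positions, rgb: (N, 3) colours, energy: (N,) texture
    energy per point, prune_ratio: fraction of points to remove.
    """
    features = np.hstack([xyz, colour_weight * rgb]).astype(np.float64)
    labels = kmeans(features, k)

    # Mean texture energy of each cluster (empty clusters get 0).
    counts = np.bincount(labels, minlength=k)
    sums = np.bincount(labels, weights=energy, minlength=k)
    cluster_energy = sums / np.maximum(counts, 1)

    # Rank points by cluster energy, then by their own energy, and
    # prune from the smooth end until the budget is met exactly.
    n_prune = int(round(prune_ratio * len(xyz)))
    order = np.lexsort((energy, cluster_energy[labels]))
    keep = np.ones(len(xyz), dtype=bool)
    keep[order[:n_prune]] = False
    return keep, labels
```

Ranking by cluster energy rather than per-point energy means whole smooth regions are thinned together, which matches the description of pruning low-energy clusters first while leaving textured regions intact.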

Adaptive Gaussian head

A DPT head fuses ViT features with the retention masks to re-predict the scale, rotation, opacity and spherical harmonics of the kept primitives, so fewer, larger Gaussians cover the pruned regions.

Citation

@inproceedings{singhal2026adaptivesplat,
  title     = {AdaptiveSplat: Texture-Aware Controllable 3D Gaussian
               Allocation for Feed-Forward Reconstruction},
  author    = {Singhal, Badrinath and K G, Srihari and Iyer, Sreehari
               and Dhiman, Ankit and Babu, R. Venkatesh},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026},
  organization = {Vision and AI Lab, Indian Institute of Science}
}