Recent developments in Gaussian splatting have enabled high-fidelity 3D reconstruction from multi-view images, but pixel-aligned methods such as MASt3R often produce redundant primitives and inconsistent geometry in few-view settings. We propose CSG-Fusion, a feed-forward framework that selectively integrates pixel-aligned pointmaps to reduce redundant primitives and produce compact, consistent 3D structures. Our approach uses a matching prior with spatial thresholds to prune overlapping Gaussians, forming a coherent base 3D model, and then applies a mask-based feature aggregation module that merges local features and improves photometric consistency with fewer primitives. To enforce cross-view agreement after fusion, we further incorporate context-view supervision that aligns appearance and geometry across perspectives. Experiments on the large-scale ScanNet++ and object-level DTU benchmarks demonstrate both the efficiency and the generalization of our method. Compared with leading pose-known and pose-free approaches, our method achieves higher rendering quality with substantially fewer Gaussians.
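
The pruning idea in the abstract (a matching prior combined with a spatial threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_matched_gaussians`, the threshold `tau`, and the representation of matches as index pairs are all assumptions made for illustration, and only Gaussian centers are considered.

```python
import numpy as np

def prune_matched_gaussians(centers_a, centers_b, matches, tau=0.05):
    """Hypothetical sketch: drop view-B Gaussians that duplicate view-A ones.

    centers_a, centers_b: (N, 3) and (M, 3) arrays of Gaussian centers
    matches: (K, 2) integer array of (index_in_a, index_in_b) correspondences
    tau: assumed spatial threshold, in the same units as the centers
    """
    idx_a, idx_b = matches[:, 0], matches[:, 1]
    # Distance between each matched pair of centers.
    dist = np.linalg.norm(centers_a[idx_a] - centers_b[idx_b], axis=1)
    # A view-B Gaussian is redundant if its match in view A is closer than tau.
    redundant = np.zeros(len(centers_b), dtype=bool)
    redundant[idx_b[dist < tau]] = True
    keep_b = ~redundant
    # Base model: all view-A Gaussians plus the non-redundant view-B ones.
    fused = np.concatenate([centers_a, centers_b[keep_b]], axis=0)
    return fused, keep_b

# Toy example: the first matched pair nearly coincides, the second does not.
centers_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
centers_b = np.array([[0.01, 0.0, 0.0], [1.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
matches = np.array([[0, 0], [1, 1]])
fused, keep_b = prune_matched_gaussians(centers_a, centers_b, matches, tau=0.05)
```

In this toy case only the first view-B Gaussian falls within the threshold of its match, so it is dropped and the fused set holds five centers instead of six. A full pipeline would also need to handle the remaining Gaussian attributes (scale, rotation, opacity, color), which this sketch leaves out.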
inproceedings
BibTeXKey: XJC+25