LiWi: Layering in the Wild
Yu He¹*
Fang Li¹*
Haoyang Tong¹,²
Lichen Ma¹
Xinyuan Shan¹
Jingling Fu¹
Dong Chen¹
Luohang Liu¹
Junshi Huang¹†
Yan Li¹
¹JD.com
²MAIS & NLPR, CASIA
*Equal contribution, †Corresponding author
Figure 1: Comparison overview of layered decomposition results on in-the-wild images.
TL;DR: LiWi tackles in-the-wild image layering with an Agent-driven Data Decomposition pipeline for scalable supervision, a shadow-guided learning objective for photometric effects, and a degradation-restoration objective for cleaner boundaries.
Abstract
Recent advances in generative models have enabled impressive layered image generation, yet their success is largely confined to the graphic design domain. Layering of in-the-wild images remains an underexplored problem, which limits fine-grained editing and other real-world applications of natural images. Two challenges stand out: the lack of scalable layered training data, and the modeling of object interactions in natural images, such as illumination effects and structural boundaries. To address these bottlenecks, we propose a novel framework for high-fidelity natural image decomposition. First, we introduce an Agent-driven Data Decomposition (ADD) pipeline that orchestrates agents and tools to synthesize layered data without manual intervention. Using this pipeline, we construct LiWi-100k, a large-scale dataset of over 100,000 high-quality layered in-the-wild images. Second, we present a framework that jointly improves photometric fidelity and alpha boundary accuracy: shadow-guided learning explicitly models illumination effects, and a degradation-restoration objective provides boundary-correction supervision by recovering clean foreground images from degraded ones. Extensive experiments demonstrate that our framework achieves state-of-the-art performance in natural image decomposition, outperforming existing models on the RGB L1 and Alpha IoU metrics.
Agent-driven Data Decomposition
Figure 2: Overview of the Agent-driven Data Decomposition pipeline.

As shown in Figure 2, the ADD pipeline uses agents and specialized tools to decompose in-the-wild images automatically. Background and foreground layers are curated separately and then recombined by the Layered Image Curator, which applies consistency checks to ensure the quality of the final layered compositions.

Learning in-the-wild image layering requires supervision that is rarely available in real photographs. Unlike PSD-style assets with explicitly authored layers, natural images entangle foreground appearance, occlusion, cast shadows, reflections, and illumination changes into a flattened RGB observation. To address this bottleneck, LiWi introduces a multi-agent system that automatically synthesizes high-quality layered samples from in-the-wild images.
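The page does not spell out which consistency checks the Layered Image Curator runs. As an illustration only, one natural check is to recomposite the curated layers and compare the result with the source photograph. The sketch below assumes straight-alpha RGBA layers in [0, 1] ordered back-to-front with an opaque background; `composite_over`, `passes_recomposition_check`, and the `max_l1` threshold are hypothetical names and values, not part of LiWi.

```python
import numpy as np


def composite_over(layers):
    """Alpha-composite straight-alpha RGBA layers back-to-front ("over" operator).

    layers: list of H x W x 4 float arrays in [0, 1], background first.
    Assumes the background layer is fully opaque, so the running composite
    stays opaque and this simple form of "over" is exact.
    """
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3))
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out


def passes_recomposition_check(image, layers, max_l1=0.02):
    """Hypothetical curator check: recomposited layers should reproduce the source image.

    max_l1 is an illustrative threshold, not a value taken from LiWi.
    """
    recon = composite_over(layers)
    return float(np.mean(np.abs(recon - image))) <= max_l1


# Toy example: a random opaque background with a red 2 x 2 foreground patch.
rng = np.random.default_rng(0)
bg = np.concatenate([rng.random((4, 4, 3)), np.ones((4, 4, 1))], axis=-1)
fg = np.zeros((4, 4, 4))
fg[1:3, 1:3] = [1.0, 0.0, 0.0, 1.0]
image = composite_over([bg, fg])
print(passes_recomposition_check(image, [bg, fg]))  # True
```

A real curator would likely combine such a reconstruction test with semantic checks (for example, that the background layer no longer contains the removed object), which this sketch does not attempt.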

LiWi-100k
Figure 3: Distribution summary of the LiWi-100k dataset.

Figure 3 summarizes the composition of LiWi-100k across diverse natural scenes and structural layouts. This distribution supports training and evaluation for layer decomposition beyond graphic design templates.

Shadow-Guided Learning
Figure 4: Shadow-guided learning models photometric footprints induced by foreground objects.

As illustrated in Figure 4, real-world photographs contain cast shadows, illumination variations, and contact darkening that are difficult to assign to ordinary foreground or background layers. LiWi introduces an explicit shadow layer to represent these foreground-induced photometric footprints, which helps the model separate illumination effects from semantic layers and improves color consistency in decomposition.
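The page does not state how the shadow layer is parameterized or composited. A minimal sketch, assuming the shadow is a single-channel darkening map s in [0, 1] that multiplies the shadow-free background B before the foreground F is composited with the "over" operator, i.e. I = α_F · F + (1 − α_F) · (1 − s) · B. The function name and layer shapes are hypothetical.

```python
import numpy as np


def composite_with_shadow(background, shadow, foreground):
    """Recompose an image from background, shadow, and foreground layers.

    Assumed (hypothetical) layer model:
      background: H x W x 3, shadow-free RGB in [0, 1]
      shadow:     H x W x 1, darkening strength s in [0, 1] (0 = no shadow)
      foreground: H x W x 4, straight-alpha RGBA in [0, 1]
    The shadow multiplicatively darkens the background, then the
    foreground is composited on top.
    """
    shaded_bg = background * (1.0 - shadow)
    rgb, alpha = foreground[..., :3], foreground[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * shaded_bg


# Toy 1 x 3 strip: an opaque object on the left pixel, a half-strength
# cast shadow on the middle pixel, and untouched background on the right.
background = np.full((1, 3, 3), 0.8)
shadow = np.array([[[0.0], [0.5], [0.0]]])
foreground = np.zeros((1, 3, 4))
foreground[0, 0] = [0.2, 0.4, 0.6, 1.0]
image = composite_with_shadow(background, shadow, foreground)
print(image[0, :, 0])  # [0.2 0.4 0.8]
```

Under this assumed model, removing the foreground and zeroing the shadow map returns the clean background, which is the kind of separation between illumination effects and semantic layers that an explicit shadow layer is meant to provide.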

Degraded Boundary Refinement
Figure 5: Illustration of the degradation-restoration process for boundary refinement.

Figure 5 shows the degraded boundary refinement process. Natural image layers often suffer from boundary erosion, dilation, or blur around thin structures and object edges. LiWi addresses this with a degradation-restoration objective: the model is trained to recover clean foregrounds from deliberately degraded boundary observations, providing extra supervision for sharper alpha boundaries and more accurate foreground reconstruction.
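The exact degradations, kernel sizes, and whether they act on RGB, alpha, or both are not given on this page. A minimal sketch of building degradation-restoration training pairs, assuming the three failure modes named above (erosion, dilation, blur) are simulated on the alpha matte with k × k window filters; `degrade_alpha`, `make_restoration_pair`, and `l1_restoration_loss` are hypothetical helpers, and the restoration model itself is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(alpha, k, reduce):
    """Apply a k x k sliding-window reduction (min, max, or mean) with edge padding."""
    pad = k // 2
    padded = np.pad(alpha, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k) for odd k
    return reduce(windows, axis=(-2, -1))


def degrade_alpha(alpha, mode, k=3):
    """Simulate a boundary error on a 2-D alpha matte in [0, 1].

    "erode" shrinks the matte, "dilate" grows it, "blur" softens edges.
    k must be odd so the output keeps the input shape.
    """
    if mode == "erode":
        return _window_reduce(alpha, k, np.min)
    if mode == "dilate":
        return _window_reduce(alpha, k, np.max)
    if mode == "blur":
        return _window_reduce(alpha, k, np.mean)
    raise ValueError(f"unknown mode: {mode}")


def make_restoration_pair(rgb, alpha, mode, k=3):
    """Return (degraded input, clean target) RGBA foregrounds for a restoration objective."""
    clean = np.concatenate([rgb, alpha[..., None]], axis=-1)
    degraded = np.concatenate([rgb, degrade_alpha(alpha, mode, k)[..., None]], axis=-1)
    return degraded, clean


def l1_restoration_loss(pred, clean):
    """L1 loss between a model's restored foreground and the clean target."""
    return float(np.mean(np.abs(pred - clean)))


alpha = np.zeros((5, 5))
alpha[1:4, 1:4] = 1.0  # a 3 x 3 opaque square
print(degrade_alpha(alpha, "erode").sum())   # 1.0 (only the centre survives)
print(degrade_alpha(alpha, "dilate").sum())  # 25.0 (grows to fill the grid)
```

In training, a model would receive the degraded foreground and be penalized by `l1_restoration_loss` against the clean one, so it learns to correct eroded, dilated, or blurred edges.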

Quantitative Results
Table 1: Quantitative results on the LiWi-100k test set.

Table 1 reports the quantitative results on LiWi-100k. The original Qwen-Image-Layered model shows a clear domain gap when transferred from graphic designs to in-the-wild images. Fine-tuning on LiWi data already improves both RGB reconstruction and alpha estimation, while the full LiWi framework further reduces RGB L1 and improves Alpha soft IoU across all edit settings.
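The page names RGB L1 and Alpha soft IoU but does not give their formulas. A minimal sketch, assuming RGB L1 is the mean absolute error over all pixels and channels and soft IoU is the common min/max generalization of IoU to continuous mattes; the definitions behind the reported numbers (for example, masking or premultiplication) may differ.

```python
import numpy as np


def rgb_l1(pred_rgb, gt_rgb):
    """Mean absolute error over all pixels and channels (values in [0, 1]); lower is better."""
    return float(np.mean(np.abs(pred_rgb - gt_rgb)))


def alpha_soft_iou(pred_alpha, gt_alpha):
    """Soft IoU between two alpha mattes in [0, 1]; higher is better.

    Intersection is the pixelwise min, union the pixelwise max.
    Two empty mattes are treated as a perfect match.
    """
    inter = np.minimum(pred_alpha, gt_alpha).sum()
    union = np.maximum(pred_alpha, gt_alpha).sum()
    if union == 0:
        return 1.0
    return float(inter / union)


gt = np.zeros((4, 4))
gt[:, :2] = 1.0    # left half fully opaque
pred = np.zeros((4, 4))
pred[:, :3] = 0.5  # three columns at half opacity
print(alpha_soft_iou(pred, gt))  # 0.4
```

The min/max form reduces to ordinary IoU when both mattes are binary, while still penalizing partially transparent errors such as the half-opacity spill in the example.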

Table 2: Quantitative results on the Crello benchmark.

Table 2 shows that LiWi also outperforms prior methods on the Crello benchmark. Although Crello consists of raster graphic designs rather than natural photographs, LiWi still lowers RGB L1 and raises Alpha soft IoU, indicating that the method generalizes beyond the in-the-wild setting.

Table 3: Zero-shot foreground segmentation results on an external benchmark.

Table 3 shows that LiWi's alpha quality transfers to an external benchmark. Even without dedicated segmentation training on DIS-5K, the model remains competitive on fine boundary metrics, consistent with the boundary supervision provided by the degradation-restoration objective.
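The page does not list which metrics Table 3 reports. To illustrate how a predicted alpha matte can be scored zero-shot as a foreground mask, the sketch below computes two metrics commonly used on DIS-5K and related segmentation benchmarks: MAE and the F-measure with β² = 0.3, maximized over binarization thresholds. It is not the official evaluation code and does not cover boundary-specific measures; all function names are hypothetical.

```python
import numpy as np


def mae(pred_alpha, gt_mask):
    """Mean absolute error between a predicted matte and a binary ground-truth mask."""
    return float(np.mean(np.abs(pred_alpha - gt_mask)))


def f_measure(pred_alpha, gt_mask, threshold=0.5, beta2=0.3):
    """F-beta score after binarizing the alpha matte at `threshold`.

    beta2 = 0.3 follows the usual salient/dichotomous segmentation convention,
    weighting precision more than recall.
    """
    pred = pred_alpha >= threshold
    gt = gt_mask >= 0.5
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))


def max_f_measure(pred_alpha, gt_mask, num_thresholds=255):
    """Best F-measure over evenly spaced thresholds in [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return max(f_measure(pred_alpha, gt_mask, t) for t in thresholds)


gt = np.zeros((4, 4))
gt[:, :2] = 1.0
soft = 0.9 * gt  # a slightly under-confident matte of the correct shape
print(max_f_measure(soft, gt))  # 1.0
```

Because an alpha matte is continuous, threshold-free metrics such as MAE reward soft, well-calibrated edges, while the maximized F-measure rewards getting the foreground shape right at some operating point.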

Qualitative Results
Figure 6: Results of the LiWi framework on the test set.

Figure 6 summarizes the visual results of the LiWi framework on the test set. LiWi consistently produces cleaner decompositions for in-the-wild scenes, better preserving foreground completeness while removing residual shadows and other photometric artifacts from the background.

Figure 7: Visualization of LiWi layered samples with two and three layers.

Figure 7 highlights layered samples from the LiWi dataset with two- and three-layer compositions. It illustrates the diversity of foreground-background arrangements supported by the proposed data construction pipeline.

Figure 8: Visualization of LiWi-100k across multiple scenes and layouts.

Figure 8 presents a broader view of LiWi-100k across multiple scene types and object layouts. It reflects the coverage of the proposed dataset in terms of natural appearance, structural complexity, and layered composition diversity.

Citation

If you find our work interesting, please consider citing our paper:

@misc{he2026liwi,
  title         = {LiWi: Layering in the Wild},
  author        = {He, Yu and Li, Fang and Tong, Haoyang and Ma, Lichen and Shan, Xinyuan and Fu, Jingling and Chen, Dong and Liu, Luohang and Huang, Junshi and Li, Yan},
  year          = {2026},
  eprint        = {2605.14552},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  doi           = {10.48550/arXiv.2605.14552},
  url           = {https://arxiv.org/abs/2605.14552}
}
Model Access: Open the access form