INTERLACE

Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models
CVPR 2026

Parsa Madinei · Ryan Solgi · Ziqi Wen · Jonathan Skaza · Miguel Eckstein · Ramtin Pedarsani
UC Santa Barbara

Abstract

We introduce INTERLACE, a novel framework that prunes redundant layers in Vision-Language Models (VLMs) while maintaining performance through sample-efficient fine-tuning. Existing layer pruning methods cause a significant performance drop when applied to VLMs. In contrast, INTERLACE analyzes triplets of consecutive layers to identify local redundancy: it removes the more redundant of the first two layers, fine-tunes the remaining layer to compensate for the lost capacity, and freezes the third layer to serve as a stable anchor during fine-tuning.

By fine-tuning only a subset of layers on just 1% of the FineVision dataset for one epoch, INTERLACE achieves 88.9% average performance retention after dropping 25% of the network, outperforming alternative pruning methods by 28.4%.


Figure 1. INTERLACE identifies local redundancy by calculating cosine similarity over triplets of consecutive layers. In each selected triplet, the more redundant of the first two layers is dropped (red), the other is fine-tuned (cyan), and the third is frozen as a stable anchor (blue). The performance comparison (top right) shows that INTERLACE outperforms alternative pruning methods by 28.4%.

Method

1. Triplet Analysis

Compute cosine similarity across triplets of consecutive layers to identify locally redundant regions in the network.
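The exact redundancy statistic beyond "cosine similarity" is not spelled out here, so the following is a minimal sketch assuming the score for a triplet is the token-averaged cosine similarity between its input and output hidden states; the function name triplet_redundancy is illustrative, not the reference implementation.

```python
import torch.nn.functional as F

def triplet_redundancy(hidden_states):
    """Score each triplet of consecutive layers by how little it changes
    the representation (higher similarity = more redundant triplet).

    hidden_states: per-layer activations, e.g. the tuple returned by a
    transformers forward pass with output_hidden_states=True; entry i is
    the input to layer i, shaped [batch, seq_len, hidden_dim].
    """
    scores = []
    for i in range(len(hidden_states) - 3):
        # similarity between the triplet's input (into layer i) and its
        # output (out of layer i+2), averaged over all tokens
        sim = F.cosine_similarity(
            hidden_states[i].flatten(0, 1),
            hidden_states[i + 3].flatten(0, 1),
            dim=-1,
        ).mean()
        scores.append((i, sim.item()))
    return scores  # [(triplet_start_layer, similarity), ...]
```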

2. Layer Assignment

Within each selected triplet: drop the more redundant of the first two layers, fine-tune the other, and freeze the third as a stable anchor.
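Continuing the sketch above, one way to realize the assignment is to rank triplets by redundancy and, within each selected triplet, compare the input/output similarity of its first two layers (again an assumption about the exact per-layer measure; handling of overlap between selected triplets is omitted for brevity):

```python
import torch.nn.functional as F

def assign_roles(hidden_states, triplet_scores, num_triplets):
    """Mark each selected triplet's layers as dropped / finetuned / frozen."""

    def layer_sim(i):
        # redundancy of a single layer: similarity of its input and output
        return F.cosine_similarity(
            hidden_states[i].flatten(0, 1),
            hidden_states[i + 1].flatten(0, 1),
            dim=-1,
        ).mean().item()

    roles = {}
    # most redundant triplets first
    for start, _ in sorted(triplet_scores, key=lambda s: -s[1])[:num_triplets]:
        drop = max(start, start + 1, key=layer_sim)  # more redundant of the first two
        keep = start if drop == start + 1 else start + 1
        roles[drop] = "dropped"
        roles[keep] = "finetuned"
        roles[start + 2] = "frozen_anchor"
    return roles
```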

3. Efficient Fine-Tuning

Train only the selected layers on 1% of FineVision for a single epoch using standard cross-entropy loss with DeepSpeed ZeRO-3.
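A sketch of the corresponding model surgery, assuming a Qwen-style checkpoint whose decoder blocks live in model.model.layers; the training loop itself (cross-entropy on 1% of FineVision for one epoch under DeepSpeed ZeRO-3) is standard and omitted:

```python
import torch.nn as nn

def prepare_for_finetuning(model, roles):
    # freeze every parameter, then selectively unfreeze below
    for p in model.parameters():
        p.requires_grad = False

    # unfreeze only the layers selected for fine-tuning; the flags
    # travel with the modules when the list is re-packed below
    for i, layer in enumerate(model.model.layers):
        if roles.get(i) == "finetuned":
            for p in layer.parameters():
                p.requires_grad = True

    # physically remove the dropped layers and update the config
    model.model.layers = nn.ModuleList(
        layer for i, layer in enumerate(model.model.layers)
        if roles.get(i) != "dropped"
    )
    model.config.num_hidden_layers = len(model.model.layers)
    return model
```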


Results

Comparison with Pruning Methods (Qwen3-VL-8B, 25% pruning)

Method            Sparsity   Fine-Tune   TTFT Speedup   Text/Chart   GVQA   Perception   Inst&Sci   Avg    Rel. Perf.
Dense             0%         no          1.00x          79.3         79.1   76.5         74.9       77.8   97.1%
Dense-FT          0%         yes         1.00x          83.2         80.2   75.8         82.4       80.5   100.0%
Wanda 2:4         50%        no          0.97x          6.1          7.8    5.7          10.7       7.2    8.9%
Magnitude 2:4     50%        no          0.97x          6.2          7.6    7.9          10.6       7.7    9.5%
SLEB              25%        no          1.12x          43.4         54.1   48.4         51.3       48.6   60.5%
SLEB-FT           25%        yes         1.12x          50.5         43.8   41.4         47.4       46.0   57.1%
INTERLACE (Ours)  25%        yes         1.18x          74.5         73.6   64.9         72.8       71.6   88.9%

Performance Across Pruning Ratios (Relative to Baseline, CoT Enabled)

Model         10% Drop   15% Drop   20% Drop   25% Drop
Qwen3-VL-8B   94.0%      92.1%      86.9%      86.1%
Qwen3-VL-4B   93.9%      91.9%      88.0%      81.7%

Ablation Studies (Relative to INTERLACE = 100%)

Method        Text/Chart   GVQA   Perception   Inst&Sci   Avg
Consecutive   76.6         55.9   54.8         71.4       65.1
Random        87.9         86.9   77.5         87.9       85.1
Interlace-OA  95.8         96.2   89.6         98.9       94.9
Interlace-TN  99.4         97.3   99.3         98.3       98.7

Pretrained Models

All pruned models are available on HuggingFace for direct inference:

Model         Layer Drop   Relative Performance
Qwen3-VL-8B   10%          94.0%
Qwen3-VL-8B   15%          92.1%
Qwen3-VL-8B   20%          86.9%
Qwen3-VL-8B   25%          86.1%
Qwen3-VL-4B   10%          93.9%
Qwen3-VL-4B   15%          91.9%
Qwen3-VL-4B   20%          88.0%
Qwen3-VL-4B   25%          81.7%
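A minimal loading sketch using the transformers Auto classes, assuming the released checkpoints follow the standard vision-to-sequence format; the repo id below is a placeholder, not an actual checkpoint name:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "your-org/interlace-qwen3-vl-8b-drop25"  # placeholder repo id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # place weights across available devices
)
```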

Citation

@inproceedings{madinei2026interlace,
  title={INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models},
  author={Madinei, Parsa and Solgi, Ryan and Wen, Ziqi and Skaza, Jonathan and Eckstein, Miguel and Pedarsani, Ramtin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}