We introduce INTERLACE, a novel framework that prunes redundant layers in Vision-Language Models (VLMs) while maintaining performance through sample-efficient finetuning. Existing layer pruning methods cause a significant performance drop when applied to VLMs. In contrast, we analyze triplets of consecutive layers to identify local redundancy: we remove the most redundant of the first two layers, finetune the remaining layer to compensate for the lost capacity, and freeze the third layer to serve as a stable anchor during finetuning.
By finetuning only a subset of layers on just 1% of the FineVision dataset for one epoch, INTERLACE achieves 88.9% average performance retention after dropping 25% of the network, outperforming alternative pruning methods by 28.4%.
Figure 1. INTERLACE identifies local redundancy by computing cosine similarity over triplets of consecutive layers. In each selected triplet, the most redundant of the first two layers is dropped (red), the other is finetuned (cyan), and the third is frozen as a stable anchor (blue). The performance comparison (top right) shows that INTERLACE outperforms alternative pruning methods by 28.4%.
1. Compute cosine similarity across triplets of consecutive layers to identify locally redundant regions of the network (see the scoring sketch below).
2. Within each selected triplet: drop the most redundant of the first two layers, finetune the other, and freeze the third as a stable anchor.
3. Train only the selected layers on 1% of FineVision for a single epoch using standard cross-entropy loss with DeepSpeed ZeRO-3 (see the training sketch below).
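A minimal sketch of the scoring step referenced above, assuming per-layer hidden states have already been collected from a small calibration batch (`hidden_states[j]` is the input to layer `j`; all names here are illustrative, not the released implementation):

```python
import torch
import torch.nn.functional as F

def rank_triplets(hidden_states: list[torch.Tensor]) -> list[dict]:
    """Score triplets of consecutive layers by local redundancy.

    A layer whose output is nearly parallel to its input changes the
    representation little, so input/output cosine similarity serves as
    a per-layer redundancy score.
    """
    sims = [
        F.cosine_similarity(
            hidden_states[j].flatten(0, -2),      # input to layer j: (tokens, dim)
            hidden_states[j + 1].flatten(0, -2),  # output of layer j
            dim=-1,
        ).mean().item()
        for j in range(len(hidden_states) - 1)
    ]  # sims[j] = redundancy of layer j

    # For each triplet (j, j+1, j+2): the more redundant of the first
    # two layers is the drop candidate, the other is finetuned, and the
    # third is frozen as the anchor.
    triplets = []
    for j in range(len(sims) - 2):
        drop, tune = (j, j + 1) if sims[j] >= sims[j + 1] else (j + 1, j)
        triplets.append(
            {"drop": drop, "tune": tune, "anchor": j + 2, "score": sims[drop]}
        )
    # Ranking triplets by the drop candidate's redundancy is an
    # assumption; the exact selection criterion may differ.
    return sorted(triplets, key=lambda t: t["score"], reverse=True)
```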
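And a corresponding sketch of the training setup, assuming a HuggingFace-style decoder with a `model.model.layers` ModuleList; the 1%-of-FineVision dataloader and the DeepSpeed ZeRO-3 launch configuration are omitted:

```python
import torch

def prepare_for_finetuning(model, triplets: list[dict]):
    """Remove each triplet's drop layer, then freeze every parameter
    except those of the layers selected for finetuning (the anchors
    and all untouched layers stay frozen)."""
    layers = model.model.layers
    drop = {t["drop"] for t in triplets}
    tune = {t["tune"] for t in triplets}

    # Rebuild the decoder stack without the dropped layers and remap
    # old layer indices onto the shortened stack.
    kept_ids = [i for i in range(len(layers)) if i not in drop]
    remap = {old: new for new, old in enumerate(kept_ids)}
    model.model.layers = torch.nn.ModuleList([layers[i] for i in kept_ids])

    # Freeze everything, then unfreeze only the finetuned layers.
    for p in model.parameters():
        p.requires_grad = False
    for old in tune:
        for p in model.model.layers[remap[old]].parameters():
            p.requires_grad = True
    return model
```

Training then proceeds with the model's standard next-token cross-entropy loss; only the unfrozen layers receive gradient updates.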
Table 1. Comparison with alternative pruning methods. TTFT = time to first token; Rel. Perf. is performance relative to Dense-FT.

| Method | Sparsity | Fine-Tune | TTFT Speedup | Text/Chart | GVQA | Perception | Inst&Sci | Avg | Rel. Perf. |
|---|---|---|---|---|---|---|---|---|---|
| Dense | 0% | – | 1.00x | 79.3 | 79.1 | 76.5 | 74.9 | 77.8 | 97.1% |
| Dense-FT | 0% | ✓ | 1.00x | 83.2 | 80.2 | 75.8 | 82.4 | 80.5 | 100.0% |
| Wanda 2:4 | 50% | – | 0.97x | 6.1 | 7.8 | 5.7 | 10.7 | 7.2 | 8.9% |
| Magnitude 2:4 | 50% | – | 0.97x | 6.2 | 7.6 | 7.9 | 10.6 | 7.7 | 9.5% |
| SLEB | 25% | – | 1.12x | 43.4 | 54.1 | 48.4 | 51.3 | 48.6 | 60.5% |
| SLEB-FT | 25% | ✓ | 1.12x | 50.5 | 43.8 | 41.4 | 47.4 | 46.0 | 57.1% |
| INTERLACE (Ours) | 25% | ✓ | 1.18x | 74.5 | 73.6 | 64.9 | 72.8 | 71.6 | 88.9% |
Table 2. Relative performance retention of INTERLACE across layer-drop ratios.

| Model | 10% Drop | 15% Drop | 20% Drop | 25% Drop |
|---|---|---|---|---|
| Qwen3-VL-8B | 94.0% | 92.1% | 86.9% | 86.1% |
| Qwen3-VL-4B | 93.9% | 91.9% | 88.0% | 81.7% |
Table 3. Ablation of layer selection strategies.

| Method | Text/Chart | GVQA | Perception | Inst&Sci | Avg |
|---|---|---|---|---|---|
| Consecutive | 76.6 | 55.9 | 54.8 | 71.4 | 65.1 |
| Random | 87.9 | 86.9 | 77.5 | 87.9 | 85.1 |
| Interlace-OA | 95.8 | 96.2 | 89.6 | 98.9 | 94.9 |
| Interlace-TN | 99.4 | 97.3 | 99.3 | 98.3 | 98.7 |
All pruned models are available on HuggingFace for direct inference.
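For example, a pruned checkpoint can be loaded with `transformers` as follows; the repo id below is a placeholder, and depending on the base model a model-specific processing pipeline may be needed instead:

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

repo = "your-org/interlace-pruned-vlm"  # placeholder; use the actual repo id

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForVision2Seq.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/demo.jpg"},
    {"type": "text", "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```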