Title

Adaptive Greedy Layer Pruning: Iterative Layer Pruning with Subsequent Model Repurposing

Abstract

Reducing the memory requirements of pre-trained language models (PLMs) at inference time is a key challenge. In this paper, we rigorously investigate the possibility of progressively removing layers from PLMs during their fine-tuning process, in such a way that their final task performance degrades only minimally. Our proposed approach not only provides a considerable reduction in the inference cost of using PLMs, but also highlights the importance of individual layers by identifying those with a marginal contribution to downstream task performance. Our experiments, encompassing seven diverse tasks, corroborate that excluding less pertinent transformer layers enables more efficient inference without serious degradation of task performance. Indeed, we were able to omit up to 2.2x more layers from the investigated PLMs (depending on the backbone model) than a strong layer pruning baseline while preserving no less than 95% of the performance of the full backbone model.
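The abstract describes the procedure only at a high level. The snippet below is a minimal sketch of one plausible reading of such an iterative greedy loop: repeatedly drop the layer whose removal hurts a validation metric the least, then briefly re-fine-tune ("repurpose") the pruned model before the next round. The function names `greedy_layer_pruning`, `evaluate`, and `finetune`, as well as the use of the 95% score as a stopping criterion, are illustrative assumptions and not the authors' exact implementation.

```python
# Hedged sketch of iterative greedy layer pruning with subsequent
# re-fine-tuning. `evaluate` and `finetune` are caller-supplied
# placeholders (e.g., a validation pass and a short training loop);
# they are assumptions, not part of the paper's published code.
from typing import Callable, List, Tuple

import torch.nn as nn


def greedy_layer_pruning(
    layers: nn.ModuleList,                         # transformer layers of the PLM
    evaluate: Callable[[nn.ModuleList], float],    # task metric on validation data (higher = better)
    finetune: Callable[[nn.ModuleList], None],     # brief re-fine-tuning after each removal
    min_relative_score: float = 0.95,              # assumed stopping threshold (95% of full-model score)
) -> Tuple[List[int], List[int]]:
    """Iteratively remove the layer whose exclusion degrades the metric least."""
    kept = list(range(len(layers)))
    full_score = evaluate(nn.ModuleList([layers[i] for i in kept]))
    removed: List[int] = []

    while len(kept) > 1:
        # Score the model with each remaining layer individually removed.
        candidates = []
        for idx in kept:
            trial = nn.ModuleList([layers[i] for i in kept if i != idx])
            candidates.append((evaluate(trial), idx))

        best_score, best_idx = max(candidates)
        if best_score < min_relative_score * full_score:
            break                                  # further pruning would cost too much performance

        kept.remove(best_idx)
        removed.append(best_idx)
        # "Repurpose" the pruned model: fine-tune the surviving layers on the task.
        finetune(nn.ModuleList([layers[i] for i in kept]))

    return removed, kept
```

Because `nn.ModuleList` holds references to the original layer modules, the `finetune` call updates the surviving layers in place; the list of removed indices also indicates which layers contributed least to the downstream task under this greedy criterion.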