New Research Offers a Solution to “Catastrophic Forgetting” in Large Language Models
A new approach to fine-tuning large language models (LLMs) aims to address a common problem: “catastrophic forgetting,” where a model loses previously learned abilities when adapted for new tasks. Researchers at the University of Illinois Urbana-Champaign have developed a method to retrain LLMs more efficiently, minimizing the risk of this knowledge loss and reducing significant computational costs.
The research, detailed in a recent paper, focuses on two vision-language models, LLaVA and Qwen 2.5-VL. The team’s core idea is to avoid retraining the entire model, instead concentrating on updating only specific components. This matters because training a new LLM can be incredibly expensive – costing millions of dollars, taking weeks, and generating substantial carbon emissions.
Initially, the researchers investigated the cause of catastrophic forgetting by fine-tuning the models on a series of tasks and evaluating performance. They observed a surprising phenomenon: while performance initially dropped on some tasks, it often recovered on others not directly related to the training data. This led them to hypothesize that forgetting isn’t a true loss of memory, but rather a shift in the model’s output bias caused by the new task distribution.
Further experimentation revealed that tuning only the self-attention projection layers resulted in strong performance on target tasks without any decline in performance on other tasks. Conversely, tuning the model’s multi-layer perceptron (MLP) – its internal decision-making component – increased the likelihood of biased outputs and a temporary drop in accuracy on held-out tasks.
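In practice, this kind of selective tuning can be done by freezing every parameter except the attention projections before training. The sketch below is a minimal illustration using a toy PyTorch module; the parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) follow common Hugging Face conventions and are an assumption here – real models may name these layers differently, and the paper itself does not prescribe a specific implementation.

```python
# Sketch: freeze all parameters except self-attention projection layers.
# Parameter names ("q_proj" etc.) are assumed Hugging Face-style names;
# check model.named_parameters() for the actual names in a given model.
import torch.nn as nn

class ToyAttention(nn.Module):
    """Stand-in for a transformer self-attention block's projections."""
    def __init__(self, dim=8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)

class ToyBlock(nn.Module):
    """Stand-in for one transformer layer: attention + MLP."""
    def __init__(self, dim=8):
        super().__init__()
        self.self_attn = ToyAttention(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.Linear(4 * dim, dim))

ATTN_KEYS = ("q_proj", "k_proj", "v_proj", "o_proj")

def freeze_all_but_attention(model: nn.Module) -> None:
    # Only parameters whose names contain an attention-projection key
    # stay trainable; everything else (including the MLP) is frozen.
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in ATTN_KEYS)

model = ToyBlock()
freeze_all_but_attention(model)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

An optimizer built afterwards with `filter(lambda p: p.requires_grad, model.parameters())` will then update only the attention projections.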
The researchers discovered that by freezing the “down projection” of the MLP and only tuning the “up/gating projections,” they could achieve similar learning results to full MLP tuning, but with significantly less forgetting. This targeted approach offers a more controlled and reproducible method for fine-tuning.
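The same freezing pattern applies to the MLP variant described above. This sketch assumes a LLaMA-style gated MLP, where `gate_proj` and `up_proj` expand to the hidden dimension and `down_proj` maps back; these names are an assumption for illustration, not taken from the paper.

```python
# Sketch: within a gated MLP, freeze the down projection and leave the
# up/gating projections trainable. The LLaMA-style layer names used here
# are an assumption; adapt them to the model at hand.
import torch.nn as nn

class ToyGatedMLP(nn.Module):
    """Gated MLP stand-in: gate/up projections expand, down projects back."""
    def __init__(self, dim=8, hidden=32):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden)
        self.up_proj = nn.Linear(dim, hidden)
        self.down_proj = nn.Linear(hidden, dim)

def freeze_down_projection(mlp: nn.Module) -> None:
    # Freeze only parameters belonging to the down projection,
    # keeping the up/gating projections trainable.
    for name, param in mlp.named_parameters():
        if "down_proj" in name:
            param.requires_grad = False

mlp = ToyGatedMLP()
freeze_down_projection(mlp)
```

Because gradients stop flowing into `down_proj`, the layer that writes back into the residual stream stays fixed, which is one plausible reading of why this variant drifts less.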
By focusing on these narrow segments of the model, enterprises can drastically reduce compute costs and better manage output drift. While the study was limited to vision-language models due to resource constraints, the researchers believe their findings are broadly applicable to other LLMs and modalities. The research suggests a path towards more efficient and effective LLM customization for real-world applications, allowing models to adapt to new tasks without sacrificing existing knowledge.