
Paper deep dive

Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

Yuwen Tan, Boqing Gong

Year: 2025 · Venue: arXiv preprint · Area: Model Editing · Type: Position · Embeddings: 77

Models: CLIP

Abstract

Machine unlearning removes certain training data points and their influence on AI models (e.g., when a data owner revokes their decision to allow models to learn from the data). In this position paper, we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models (FMs). We support this position based on practical needs and insights from cognitive studies. Practically, tracing data cannot meet the diverse unlearning requests for FMs, which may come from regulators, enterprise users, product teams, etc., having no access to FMs' massive training data. Instead, it is convenient for these parties to issue an unlearning request about the knowledge or capability FMs (should not) possess. Cognitively, knowledge-tracing unlearning aligns more closely with how the human brain forgets than tracing individual training data points does. Finally, we provide a concrete case study about a vision-language FM to illustrate how an unlearner might instantiate the knowledge-tracing machine unlearning paradigm.

Tags

ai-safety (imported, 100%) · model-editing (suggested, 92%) · position (suggested, 88%)

Links

PDF not stored locally.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%

Last extracted: 3/12/2026, 5:17:15 PM

Summary

The paper proposes 'Knowledge-Tracing Machine Unlearning' for foundation models (FMs), shifting the focus from removing specific training data points (data-tracing) to removing specific knowledge or capabilities. This approach addresses practical needs for stakeholders without access to original training data and aligns with cognitive theories of human forgetting. The authors provide a case study on a vision-language model (CLIP) to demonstrate the feasibility of this paradigm.

Entities (4)

Foundation Models · technology · 99%
Knowledge-Tracing Machine Unlearning · methodology · 98%
CLIP · model · 97%
Data-Tracing Machine Unlearning · methodology · 95%

Relation Signals (3)

CLIP used in Case Study

confidence 98% · we provide a concrete case study about Contrastive Language-Image Pretraining (CLIP)

Knowledge-Tracing Machine Unlearning applied to Foundation Models

confidence 95% · we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models

Knowledge-Tracing Machine Unlearning improves upon Data-Tracing Machine Unlearning

confidence 92% · we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models

Cypher Suggestions (2)

Identify models that have been subjected to knowledge-tracing unlearning · confidence 95% · unvalidated

MATCH (m:Model)-[:SUBJECTED_TO]->(k:Methodology {name: 'Knowledge-Tracing Machine Unlearning'}) RETURN m

Find all methodologies related to machine unlearning · confidence 90% · unvalidated

MATCH (m:Methodology)-[:RELATED_TO]->(u:UnlearningTechnique) RETURN m, u

Full Text

76,888 characters extracted from source content.


arXiv:2506.11253v1 [cs.CV] 12 Jun 2025

Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

Yuwen Tan, Boston University, yuwentan@bu.edu · Boqing Gong, Boston University, bgong@bu.edu

Abstract

Machine unlearning removes certain training data points and their influence on AI models (e.g., when a data owner revokes their decision to allow models to learn from the data). In this position paper, we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models (FMs). We support this position based on practical needs and insights from cognitive studies. Practically, tracing data cannot meet the diverse unlearning requests for FMs, which may come from regulators, enterprise users, product teams, etc., having no access to FMs' massive training data. Instead, it is convenient for these parties to issue an unlearning request about the knowledge or capability FMs (should not) possess. Cognitively, knowledge-tracing unlearning aligns more closely with how the human brain forgets than tracing individual training data points does. Finally, we provide a concrete case study about a vision-language FM to illustrate how an unlearner might instantiate the knowledge-tracing machine unlearning paradigm.

1 Introduction

"The brain is always trying to forget the information it has already learned" [27]. The human brain possesses the ability to selectively forget past experiences and knowledge [16,70,72] in response to environmental changes during the process of memory and learning, which helps optimize cognitive resources. Forgetting is not a negative process but a natural and indispensable part of cognition [71], supporting the abstraction and automation needed to acquire semantic and procedural knowledge [61]. This work is about machine unlearning [9,6,81] for foundation models (FMs) [4,7,68,62]. Such models are trained on large-scale data and have achieved human-level performance across diverse tasks.
To enhance their adaptability and efficiency in dynamic environments, it is highly appealing that FMs can learn continuously and selectively unlearn, akin to humans. To this end, a pivotal question naturally arises: can FMs achieve selective forgetting like humans? Conventionally, the exploration of selective forgetting mechanisms in FMs [18,54,24,48] has primarily been driven by privacy and safety concerns, following the machine unlearning (MU) paradigm initially designed for task-specialized models rather than general-purpose FMs. Under the regulation of the "right to be forgotten" [69], users may request to revoke their data and erase its influence from an AI model. MU, also known as data forgetting, aims to handle such requests by removing privacy-sensitive and undesirable information from models while simultaneously preserving model utility. However, current efforts in MU predominantly trace training data points, failing to handle similar requests at higher semantic levels (e.g., a product team might request to remove all people signals from a model). This gap becomes especially significant for FMs because many parties interact with FMs, such as data providers, legal and policy regulators, application developers, and end users. Having no access to FMs' training data, they may instead deliver their unlearning requests using high-level semantic descriptions.

† Code and project page: https://1yuwen.github.io/Knowledge-Tracing-MU-Page/. Preprint. Under review.

Figure 1: Machine unlearning, also known as data forgetting in some works, aims to remove certain training data points and their influence on an AI model. It is challenging to apply this data-tracing paradigm to foundation models for various reasons. We propose to lift data to knowledge for foundation model unlearning, allowing one to request the unlearning of specific knowledge or capabilities of a foundation model. [The figure contrasts the two paradigms over a visual taxonomy: Nature → Plant, Animal, Fungi → Flower, Tree, Dog, Cat, Mushroom → Rose, Lily, Daisy, Terrier, Retriever, Pointer → Fire Lily, Tiger Lily, Boston Terrier, Yorkshire Terrier, Golden Retriever, Flat-Coated Retriever, etc.]

In this paper, we propose to lift data-tracing in foundation model unlearning (FMU) to knowledge-tracing as an initial step towards closing that gap. Figure 1 shows an exemplar realization of this position using a taxonomy of visual knowledge. It is a versatile interface between an unlearner and those who might issue unlearning requests at various levels of knowledge granularity, being responsive to real-world applications besides its strong analogy to how human brains forget. Suppose the request is to remove an FM's visual recognition capability about Pointer, a dog breed. The unlearner has sufficient flexibility to develop effective algorithms for this request, e.g., by collecting data labeled as Pointer, designing regularizers to preserve the model's performance on other classes, especially Pointer's parent class Dog, and so on. Table 1 summarizes the key differences between existing MU that traces data and the advocated knowledge-tracing FMU.

Table 1: Data-tracing machine unlearning vs. knowledge-tracing foundation model unlearning

|  | Data-tracing machine unlearning | Knowledge-tracing foundation model unlearning |
| Requester | Users, data providers | Anyone |
| Request to remove | Certain training data points | Model's knowledge or capability |
| Purpose | Privacy, safety | Privacy, safety, model capacity, human-like, etc. |
| Models of interest | (often) Task-specialized models | General-purpose foundation models |
| Retention set | (often) ✔ | (default) ✘ |
| Oracle model | Retrained over remaining training data | ✘ |

Knowledge-tracing FMU is highly beneficial for both FM stakeholders and the development of more advanced FMs.
From a practical view, it meets the incredibly diverse unlearning requests, which may come from anyone involved in the FM ecosystem, better than data-tracing MU does. Indeed, many parties in the FM ecosystem have no direct access to the original training data at all. Transitioning from data-tracing unlearning to knowledge-tracing broadens FMU's scope, moving beyond the deletion of data points. This is not to downgrade the significance of existing data-tracing MU, which remains imperative for privacy considerations (e.g., a user deauthorizes the use of their data by FMs), but only to showcase additional impacts of the advocated knowledge-tracing FMU. Moreover, knowledge-tracing FMU aligns more closely with the human brain's forgetting process than data-level deletion, capturing how humans selectively retain and discard abstract knowledge and experience. In return, FMs can likely benefit from this unlearning process by freeing up model capacity for the efficient acquisition of new knowledge in the future.

Following the proposed position, we conduct a concrete case study about unlearning fine-grained object classes from a vision-language FM. Over time, humans tend to forget specific details while retaining abstract concepts. Accordingly, we choose some fine-grained concepts as the unlearning targets, not any particular training examples, and the goal is to effectively unlearn these concepts while maintaining the FM's recognition ability over coarse-level classes and the remaining fine-grained ones. We envision a scenario in which an unlearner sources image examples for unlearning from hierarchical image classification datasets rather than the FM's original training set. We do not use any extra retention images in the experiments. Extensive experiments demonstrate that existing data-tracing MU methods are applicable to the case study, but their performance could be strengthened in future work for more satisfactory unlearning results.
We stress that this case study is meant to support our position and spark discussion rather than provide a definitive solution to the challenges. Finally, we complement the case study by discussing other scenarios beyond the vision-language domain.

The structure of this paper is as follows. First, we provide a concise review of data-tracing MU, revisit a prevalent formalization, and introduce its confluence with FMs, to offer readers the background of our position. We then articulate our position driven by various unlearning requests from the FM community and highlight the importance of knowledge-tracing unlearning from a cognitive science perspective. Next, we present a detailed case study about a vision-language FM, analyzing it from multiple perspectives. To broaden the discussion, we include more examples and the limitations of our case study. We conclude the paper with discussions about more related work, alternative views, and potential impacts to contextualize our position.

2 Existing MU traces training data points

This section reviews MU and focuses on how the research unrolls across the security, machine learning, and broader AI communities. We show that the existing MU works trace training data points (e.g., from a user who decided to deauthorize the use of their data by machine learners).

2.1 Data-tracing MU: A concise review

The concept of MU was first introduced in a pioneering study by Cao and Yang [9], who proposed to transform learning algorithms into a summation form rapidly amenable to data deletion. In the ensuing years, from 2015 to 2018, the studies about MU [8,44,10] primarily focused on learning systems' security and privacy aspects. MU started to gain traction in the machine learning and broader AI communities [28,77] after an influential work that applied an exact MU approach to deep neural networks for image classification [6].
Between 2019 and 2023, numerous MU works emerged to enhance unlearning quality for task-specialized neural networks [25,13,50,82]. Moreover, a competition [81] hosted in conjunction with NeurIPS 2023 heightened extensive interest in MU. Notably, the works reviewed above are data-tracing because they operate on the data level, striving to remove some training data points (e.g., deauthorized by their owners) and their influence on a learning system or model.

We can reiterate the formalization of MU in [81] to give readers a concrete understanding of MU's data-tracing essence. The initial step is to train a model θ_0 using a learning algorithm A on a given training dataset D_train = {(x_i, y_i)}_{i=1}^N. Then, the MU setup is to divide the training set into a forgetting set D_f and a retention set D_r, where D_f ∪ D_r = D_train and D_f ∩ D_r = ∅. An unlearner attempts to remove the influence of D_f ⊂ D_train from the model θ_0. Intuitively, the unlearner can retrain a new model θ_r ← A(D_r) from scratch on the retention set, often viewed as an oracle model as a result of MU. However, retraining is arguably resource-intensive and impractical, especially when multiple unlearning requests arrive sequentially. To overcome this limitation, the key is to design an unlearning algorithm U that directly modifies the original model θ_0 for each unlearning request, denoted by θ_u ← U(θ_0, D_f, D_r), such that the unlearned model θ_u is as close to the oracle θ_r as possible. Measuring the difference between the two models is yet another heated topic under discussion, along with the evaluation protocols for MU; we refer readers to [78,81,54,76] if they are interested in related works.

2.2 Data-tracing MU for FMs

The data-tracing momentum in MU carried over to the confluence of MU and FMs, or FMU in short. The term FMs was coined by [4], referring to big models trained on broad data adaptable to a wide range of downstream tasks. Eldan and Russinovich [18] unlearned Harry Potter books from a language FM [80].
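To make the data-tracing formalization of Section 2.1 concrete, here is a minimal sketch in Python. All names are illustrative: a toy linear least-squares model and plain gradient ascent stand in for the learning algorithm A and the unlearning algorithm U, which the formalization leaves abstract.

```python
import numpy as np

def split_for_unlearning(D_train, forget_idx):
    """Partition the training set into a forgetting set D_f and a
    retention set D_r, with D_f ∪ D_r = D_train and D_f ∩ D_r = ∅."""
    forget_idx = set(forget_idx)
    D_f = [ex for i, ex in enumerate(D_train) if i in forget_idx]
    D_r = [ex for i, ex in enumerate(D_train) if i not in forget_idx]
    return D_f, D_r

def unlearn(theta_0, D_f, D_r, lr=0.1, steps=1):
    """Illustrative unlearner U(θ_0, D_f, D_r): gradient *ascent* on the
    squared loss over the forgetting set, for a linear model θ.  The aim
    is that θ_u approximates the retrain-from-scratch oracle θ_r = A(D_r)."""
    theta = theta_0.copy()
    X_f = np.array([x for x, y in D_f])
    y_f = np.array([y for x, y in D_f])
    for _ in range(steps):
        grad = X_f.T @ (X_f @ theta - y_f) / len(D_f)  # dL/dθ on D_f
        theta += lr * grad  # ascend: *increase* the loss on forgotten data
    return theta
```

Real unlearners replace the ascent step with the more careful updates reviewed above (gradient difference, KL minimization, relabeling, etc.), but the interface θ_u ← U(θ_0, D_f, D_r) is the same.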
Some studies explored MU to prevent text-to-image FMs from generating harmful content and undesirable styles [24,26]. Most recently, Cheng and Amiri [15], Li et al. [48], and Poppi et al. [66] made initial efforts on multimodal FMU. Despite these early works and some new benchmarks [57,49,48], there remains no satisfactory research playground when it comes to FMU. Thaker et al. [76] experimentally showed that one could game existing FMU benchmarks rather than making real progress. Liu et al. [54] pointed out several challenges of MU for large language models, such as the generality, authentication, and precision of an unlearning algorithm and its outcome. We celebrate and welcome these studies and discussions, which are much needed to formalize a reasonable research playground for FMU. This work adds to this discussion an actionable proposal, as elaborated below.

3 Lifting data to knowledge for FMU

This work proposes to lift the focus on training data points to knowledge and capabilities for foundation model unlearning (FMU). Take the knowledge hierarchy in Figure 1, for example. While existing FMU accepts unlearning requests on the data point level only, we additionally allow one to request FMU at the knowledge level (e.g., please unlearn Flat-Coated Retriever from a vision-language model without hurting the model's other capabilities). More concretely, an unlearning request for FMs consists of a forget set D_f ⊂ {data, knowledge} and nothing else, i.e., the retention set D_r is left unspecified, or D_r = ∅. We contend that this request format is a user-friendly interface between unlearners and all relevant parties that might issue unlearning requests to FMs. Meanwhile, it provides unlearners sufficient flexibility to develop practical algorithms by translating the knowledge-level requests to data sets, constraints, and auxiliary models, to name a few.

3.1 Who might request FMU?
Figure 2: Foundation model unlearning requests may come from different members of the AI community. Not all members have access to the training data. They may instead issue unlearning requests as high-level semantic descriptions. [The figure depicts a data provider, a model developer, users, legislatures, and enterprise users surrounding an FM.]

As illustrated in Figure 2, FMs are not exclusive to model developers; they are also the focal point of many other parties like data providers, product developers, legal and policy regulators, and researchers in the community. Existing works on FMU mainly tried to remove the influence of some training examples from models, a scenario typically associated with data providers or model developers who possess direct access to the training data. Indeed, a common user could become a data provider to FMs at a certain point, and yet they could also withdraw the authorization about the use of their data at a later time, hence necessitating targeted unlearning of specific samples. For model developers, discarding data that has become irrelevant or obsolete helps preserve the model's accuracy and usability. Following legal and regulatory requirements, regulators must ensure that FMs are free from harmful, malicious, and undesirable content. These legislative entities often have no access to training data, and instead, it is more convenient for them to deliver the regulations as requests to unlearn at the knowledge level. Enterprise users may use FMs for specialized tasks that require unlearning undesired features. Finally, end users might dislike certain behaviors of an FM for cultural or personal reasons and request the model to avoid/unlearn those. Overall, the unlearning requests from different parties of the FM ecosystem are extremely diverse, expressed at both data and knowledge levels. In response to the wide range of needs in the real world, FMU cannot trace training data points only. Instead, we advocate for knowledge-tracing FMU.
Beyond this practical argument, we also draw inspiration from cognitive science.

3.2 Knowledge-tracing FMU akin to human forgetting

We reinforce the significance of knowledge-tracing FMU using insights from cognitive and psychology studies about forgetting. Although forgetting is often perceived as harmful and frustrating in daily life [2], it is, in fact, an essential part of the human cognition process [61,27,72]. It plays a vital role in knowledge acquisition, serving as a foundation for developing semantic and procedural understanding by enabling abstraction and automation [61]. With limited cognitive capacity, humans excel at selectively forgetting at different levels, from instances to events to abstract knowledge, allowing them to prioritize relevant knowledge and enhance future learning [27,3,16]. Although one might argue that FMs do not necessarily need to learn from how human brains work to achieve human-level intelligence, drawing ideas from cognitive findings has been beneficial for machine learning and unlearning in general. Examples include unlearning for memory optimization [75] and the forget-and-relearn framework [102]. To this end, knowledge-tracing FMU is more akin to human forgetting than the data-tracing formalization. If FMs could selectively unlearn irrelevant information or abstract away unnecessary details, much like human development, they would become better at acquiring new knowledge in a lifelong learning scheme [86] efficiently and adaptively.

4 Case Study

Figure 3: Illustration of fine-grained vision-concept forgetting. The unlearned model fails to recognize the forgotten concepts yet still identifies their corresponding coarse-grained categories. [The figure shows CLIP prompts such as "A photo of an Oudi A1" and "A photo of an Oudi Q5" before unlearning; after unlearning the fine-grained A1 and Q5, the model falls back to "A photo of an Oudi" while retaining other fine-grained concepts such as Oudi A8.]
Following this work's position, we provide a concrete case study about Contrastive Language-Image Pretraining (CLIP) [68] to bridge the position with real-world applications and, in return, explore the position in depth, spanning multiple factors and perspectives. We envision that Oudi Inc., a car manufacturer and an enterprise user of the CLIP model, has retired their O1 sedan for some reason. Accordingly, Oudi's product team requests that the Oudi O1 concept be unlearned from CLIP. An unlearner is equipped with existing MU methods developed in the research community but realizes they all operate on the training data points. The unlearner cannot access CLIP's training data; instead, they assemble a set of exemplar Oudi O1 images as the proxy forgetting set D_f (but no retention set, for convenience). Figure 3 illustrates this vision, and we formalize it as follows.

4.1 FMU for visual recognition: Experiment setup

Denote by x and y an object image and its class label, respectively. We cast the class label to a knowledge ontology and, for simplicity, we consider a taxonomy of two levels of object classes. Denote by y^c the parent of label y, i.e., the coarse-grained label of image x. Let C be the set of fine-grained classes, y ∈ C. The unlearning request is at the fine-grained level, D_f ⊆ C. Notably, the forgetting set is a subset of the fine-grained classes rather than training data points. The unlearner then enhances the forgetting set with images and hierarchical labels, D_hf = {(x_i, y_i, y_i^c) | y_i ∈ D_f}, aiming to remove CLIP's visual recognition capacity for these requested classes without impairing CLIP's other usage.

4.1.1 Datasets for unlearning

We compile two fine-grained visual recognition datasets, CompCars-S and ImgnetDogs, of manmade and natural objects, respectively. CompCars-S is a subset of CompCars [90], a large-scale fine-grained car dataset with images from different viewpoints.
It includes an extensive range of subcategories and a unique hierarchical structure. The subset we selected is relatively balanced and, more importantly, CLIP-friendly in that the CLIP model achieves high recognition accuracy on it. ImgnetDogs is a subset of ImageNet-1K [17], consisting of 99 fine-grained breeds of dogs worldwide. We randomly select 200 training images for each dog breed and use the corresponding validation subset in ImageNet as our test set. We use WordNet [22] to find the coarse-grained labels for the dog breeds. Please see the appendices for more details on the two datasets.

4.1.2 Unlearning methods

While the unlearning requests in this case study happen at the class level, D_f ⊆ C, we allow an unlearner to enhance them by collecting data for the forgetting classes: D_hf = {(x_i, y_i, y_i^c) | y_i ∈ D_f}. Hence, we are able to experiment with state-of-the-art data-tracing MU methods: gradient ascent (GA) [36,77,43] for the loss computed over the (enhanced) forgetting set, gradient difference (GD) [51], KL minimization [93], random labeling (Relabeling) [25], task vector [34], weight saliency unlearning (SalUn) [19], maximizing entropy (ME+GD) [95], and negative preference optimization (NPO) [98]. We refer readers to the appendices for more details of these methods.

A coarse-grained "retention set". Some of these methods depend on a retention set, which our unlearner does not have due to the inaccessibility of CLIP's training data. Instead, we obtain an unconventional "retention set", D_r^Parent = {(x_i, y_i^c) | (x_i, y_i, y_i^c) ∈ D_hf}, consisting of the images in the unlearner-assembled forgetting set D_hf and their coarse-grained class labels y_i^c, leveraging the fact that the unlearner is supposed to preserve CLIP's recognition performance over these labels, which are parents of the forgetting classes in the object taxonomy.
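The construction of the enhanced forgetting set D_hf and the coarse-grained "retention set" D_r^Parent can be sketched as follows. `examples` and `parent_of` are hypothetical stand-ins for the unlearner-collected images and a WordNet-style parent lookup; neither name comes from the paper.

```python
def build_enhanced_sets(forget_classes, examples, parent_of):
    """Given knowledge-level forget classes D_f ⊆ C, assemble the enhanced
    forgetting set D_hf = {(x, y, y_c) | y ∈ D_f} and the coarse-grained
    "retention set" D_r^Parent = {(x, y_c)}, which reuses the same images
    with their parent labels.

    examples:  iterable of (image, fine_label) pairs the unlearner collected
    parent_of: mapping fine label -> coarse label (e.g., from WordNet)
    """
    D_hf = [(x, y, parent_of[y]) for x, y in examples if y in forget_classes]
    D_r_parent = [(x, y_c) for x, y, y_c in D_hf]
    return D_hf, D_r_parent
```

Note that no extra retention images are needed: the coarse-grained "retention set" is derived entirely from the forgetting images, matching the setup above.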
A hinge loss for gradient ascent (GA). GA is the core of the above MU methods except task vector and relabeling, and yet GA is prone to over-forgetting [84,79]. We alleviate this issue using a controllable and bounded hinge loss:

max[0, m + SIM(x_i, y_i) − max_{y ≠ y_i, y ∈ C} SIM(x_i, y)]    (1)

where SIM(x, y) is the CLIP similarity between image x and label y, and m is the margin, a nonnegative hyper-parameter controlling the magnitude of forgetting. A larger margin requires more unlearning effort. We can compare this hinge loss with NPO [98], another approach designed to avoid GA's over-forgetting. While NPO also bounds its loss, it suffers from the initial model's mistakes, as shown empirically by Fan et al. [20]. In contrast, our loss effectively mitigates excessive unlearning by 0-clipping; if the initial model makes a mistake at a data point (x_i, y_i), the loss is 0 when m = 0. We note a concurrent work [11] that applies the hinge loss to LLMs.

Regularization using the enhanced forgetting set D_hf. We find two intuitive regularization techniques universally effective for all MU methods studied in this work. Both help maximize the use of the images in the enhanced forgetting set D_hf. Given an input image x_i, CLIP can return its similarities to all coarse-grained labels. We normalize them into a valid distribution. The first regularizer is a KL-divergence between such distributions induced by the original CLIP and the one to be unlearned. The second regularizer is defined similarly, except that the distributions are over the fine-grained classes not covered by the forgetting set.
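A minimal sketch of the hinge loss of Eq. (1) and a KL regularizer of the kind described above, assuming precomputed CLIP similarity scores; the function names are ours, not the paper's.

```python
import numpy as np

def hinge_unlearning_loss(sim_row, y_idx, margin=0.0):
    """Bounded hinge loss of Eq. (1):
    max[0, m + SIM(x, y) - max_{y' != y, y' in C} SIM(x, y')].

    sim_row: CLIP similarities of one image to every class in C
    y_idx:   index of the (to-be-forgotten) true label y

    The loss is 0 once the true-class similarity falls `margin` below the
    best competing class, which caps over-forgetting (the 0-clipping above).
    """
    sim_true = sim_row[y_idx]
    others = np.delete(sim_row, y_idx)
    return max(0.0, margin + sim_true - others.max())

def kl_regularizer(p_orig, p_unlearned, eps=1e-12):
    """KL(p_orig || p_unlearned) between the label distributions induced by
    the original and the unlearned CLIP, as in the first regularizer above
    (the second is identical but over the untargeted fine-grained classes)."""
    p = np.clip(p_orig, eps, 1.0)
    q = np.clip(p_unlearned, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

During unlearning, one would minimize the hinge loss over D_hf (reducing the forgotten classes' similarity) while adding the KL terms to keep the coarse-grained and untargeted fine-grained predictions close to the original model's.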
Quality-utility trade-off. Given a dataset described above, the forgetting quality and utility are metrics calculated within this dataset. Denote by θ_0 and θ_u the CLIP models before and after unlearning, respectively. We define forgetting quality as the model's degradation in recognition accuracy for the forgetting classes D_f ⊆ C after unlearning: Q = 1 − Ā(D_f), where Ā(·) = ACC(·; θ_u)/ACC(·; θ_0), i.e., Ā(D_f) is the accuracy of the unlearned model θ_u over the forgetting classes D_f scaled by that of the original model θ_0. The higher the forgetting quality, the better, as it indicates how much of the targeted knowledge has been removed from CLIP. The utility cares about the unlearned model's preservation of visual recognition performance over the classes other than the targeted forgetting ones. Importantly, we calculate utility using the full taxonomy of class labels; for the two datasets in this work, the scope of interest includes both D_r = C \ D_f, the retention classes at the same level as the forgetting ones, and their parent classes in the taxonomy, represented as D_r^Parents and D_f^Parents. Specifically, the utility of an unlearned model is U = (Ā(D_r) + Ā(D_r^Parents) + Ā(D_f^Parents))/3, where Ā is the same scaled accuracy function as used in defining the forgetting quality. We then define a Q-U score as the harmonic mean of quality and utility, inspired by the F-score: Q-U = 2QU/(Q + U).

Preservation of general capabilities. Radford et al. [68] demonstrated CLIP's remarkable zero-shot image classification performance over multiple datasets, which should not be impaired by the requested unlearning as long as those class labels have no overlap with the forgetting set D_f. To test this general ability of unlearned CLIP, we follow [68,39] to use several image classification datasets [42,21,60,41,64,5] to assess the zero-shot classification performance of the model.

Table 2: Fine-grained concept removal results on ImgnetDogs.

| Method | D_f test coarse↑ | D_f test fine↓ | D_r test coarse↑ | D_r test fine↑ | Quality↑ | Utility↑ | Q-U↑ | Zero-shot↑ |
| Origin CLIP [68] | 86.20 | 93.40 | 50.88 | 65.55 | – | – | – | 83.24 |
| GA [36] | 1.00 | 0.00 | 7.80 | 1.57 | 100.00 | 6.30 | 11.85 | 78.55 |
| GDiff [51] | 69.60 | 0.00 | 40.54 | 9.30 | 100.00 | 58.21 | 73.58 | 80.89 |
| GA+KL [93] | 77.40 | 3.00 | 41.28 | 35.96 | 96.79 | 75.26 | 84.68 | 81.66 |
| Relabeling [25] | 44.80 | 43.80 | 29.57 | 45.64 | 53.10 | 59.91 | 56.30 | 81.32 |
| SalUn [19] | 47.80 | 34.80 | 30.49 | 46.52 | 62.74 | 62.12 | 62.43 | 81.77 |
| ME+GD [95] | 95.20 | 53.20 | 45.12 | 46.79 | 43.04 | 88.69 | 57.52 | 81.70 |
| Task vector [34] | 79.60 | 36.60 | 44.58 | 62.38 | 60.81 | 91.71 | 73.13 | 82.57 |
| NPO+KL [98] | 88.00 | 8.00 | 49.33 | 53.91 | 91.43 | 93.06 | 92.24 | 82.20 |
| NHL+KL (Ours) | 88.20 | 2.00 | 48.23 | 54.56 | 97.86 | 92.68 | 95.20 | 82.53 |

4.2 Results

Main comparison results. Table 2 shows the results of various MU baselines on the ImgnetDogs dataset. GA-based methods achieve high forgetting quality but suffer from a significant drop in retained fine-grained concept recognition accuracy due to their unbounded optimization loss. Without a regularization term, the fine-grained accuracy on the retention set drops sharply to 1.57%. Introducing a KL-divergence regularization term on the forget set helps preserve utility, raising the retention set accuracy to 35.96%. Relabeling performs poorly in fine-grained unlearning, exhibiting low forgetting quality and model utility. The Q-U score of SalUn is better than that of the relabeling method (62.43% vs. 56.30%). The ME method disrupts the intrinsic relationships among fine-grained concepts, leading to a significant reduction in the accuracy of the retained concepts. The task-vector method struggles to unlearn fine-grained concepts, resulting in low forgetting quality while maintaining high model utility.
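For concreteness, the quality, utility, and Q-U definitions from the evaluation setup can be computed as sketched below; this is an illustrative helper with hypothetical dictionary keys, not the authors' released code.

```python
def qu_score(acc_before, acc_after):
    """Quality-utility trade-off as defined in the evaluation setup.
    acc_before / acc_after map each evaluation split to the accuracy of
    θ_0 / θ_u.  Expected keys: 'Df', 'Dr', 'Dr_parents', 'Df_parents'."""
    A = lambda k: acc_after[k] / acc_before[k]               # scaled accuracy Ā
    Q = 1.0 - A("Df")                                        # forgetting quality
    U = (A("Dr") + A("Dr_parents") + A("Df_parents")) / 3.0  # utility
    QU = 2 * Q * U / (Q + U) if Q + U > 0 else 0.0           # harmonic mean
    return Q, U, QU
```

For example, an unlearned model that keeps 10% of the original accuracy on D_f while matching the original model everywhere else gets Q = 0.9, U = 1.0, and Q-U ≈ 0.947.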
Unlike the unbounded loss in the GA-based methods, the unlearning optimization loss for NPO is bounded, avoiding catastrophic collapse and achieving better unlearning performance. Our proposed method (NHL), incorporating KL divergence, attains a Q-U score of 95.20%, nearly 3% higher than the NPO method. We also report the average zero-shot classification accuracy of the unlearned model. The results indicate that forgetting specific fine-grained concepts generally does not significantly impair the model's generalizability, except in the case of the GA method without regularization, which experiences notable degradation. Moreover, models employing relabeling-based unlearning methods exhibit a more pronounced decline in generalizability.

Table 3: Fine-grained concept removal results vs. different difficulty levels on ImgnetDogs.

| Difficulty | Method | D_f test coarse↑ | D_f test fine↓ | D_r test coarse↑ | D_r test fine↑ | Quality↑ | Utility↑ | Q-U↑ |
| Difficult | CLIP [68] | 86.20 | 93.40 | 50.88 | 65.55 | – | – | – |
| Difficult | NHL+KL (Ours) | 88.20 | 2.00 | 48.23 | 54.56 | 97.86 | 92.68 | 95.20 |
| Medium | CLIP [68] | 75.00 | 82.80 | 52.13 | 66.74 | – | – | – |
| Medium | NHL+KL (Ours) | 6.00 | 0.40 | 50.29 | 58.29 | 99.52 | 94.61 | 97.00 |
| Easy | CLIP [68] | 60.73 | 75.82 | 53.66 | 67.42 | – | – | – |
| Easy | NHL+KL (Ours) | 64.36 | 0.73 | 52.21 | 63.09 | 99.04 | 96.95 | 97.98 |

Unlearning results for the fine-grained forgetting classes of various difficulty levels. Like humans, FMs demonstrate varying degrees of memorization for concepts, leading to different difficulty levels for unlearning. In our case study, we quantify concept memorization using the model's confidence scores about the concepts, offering a simpler alternative to traditional metrics [101,100]. We conduct three sets of experiments under difficult, medium, and easy unlearning settings, corresponding to decreasing average confidence scores of the concepts to be unlearned. As shown in Table 3, removing difficult, high-confidence concepts causes a more substantial drop in model utility compared to easy, low-confidence ones.
This highlights the importance of avoiding excessive unlearning of low-confidence concepts and carefully regulating the unlearning of high-confidence concepts to preserve utility in future work.

Table 4: Comparison of unlearning performance (Q-U metrics) on the OOD dataset.

| Methods | Quality↑ | Utility↑ | Q-U↑ |
| GDiff [51] | 87.57 | 77.02 | 81.96 |
| GA+KL [93] | 85.24 | 84.50 | 84.87 |
| NPO+KL [98] | 34.98 | 96.60 | 51.37 |
| NHL+KL (Ours) | 27.20 | 97.92 | 42.58 |

Limitation of data-tracing MU methods. While we applied data-tracing MU methods to the case study, we contend they exhibit significant limitations for knowledge-tracing FMU. Most existing data-tracing MU methods yield a subpar quality-utility trade-off and zero-shot generalization in our case study. Although NPO and our proposed method perform better than others in the quality-utility trade-off, they have poor robustness under the out-of-dataset test (Table 4), where models were unlearned on ImgnetDogs and evaluated on OxfordPet. The results show that all data-tracing MU methods, including ours, fail to tackle knowledge-tracing MU, which underscores the limitations of current data-tracing MU methods. We expect that future techniques will be natively designed for knowledge-tracing FMU.

4.3 Discussions

Our case study uses a taxonomy to represent knowledge structures for its flexibility rather than its completeness. We can extend it to higher abstraction levels, such as forgetting retriever while retaining dog. One can also refine it further by subdividing golden retriever into finer-grained categories or attributes. In this structure, each abstract concept corresponds to an inner node, and the granularity of the hierarchy determines the specificity of knowledge encoded in the leaf nodes. We acknowledge that a real-world ontology should be more complex than ours. A knowledge graph embedded in an LLM can be exponentially large.
Exploring alternative structural representations and unlearning setups, such as graph-based knowledge unlearning, is a promising direction for future research on knowledge-tracing FMU. Moreover, defining and quantifying the boundaries of knowledge remains a fundamental challenge in knowledge-tracing unlearning. What constitutes retention knowledge given a request to unlearn certain knowledge? How do we best draw the fine line between them? Finally, we present additional knowledge-tracing unlearning scenarios, along with potential strategies for constructing the corresponding forgetting and retention datasets.

Retrieval: Forgetting targets are visual concepts such as "Golden Retriever." The forgetting dataset consists of image-text pairs related to the target concepts, with images sourced from public datasets and captions generated by proprietary VLMs [1] and verified by humans. The retention dataset includes semantically similar but distinct concepts (e.g., other dog breeds) to assess the specificity of forgetting. General vision-language benchmarks [14] can be used to evaluate overall generalization.

VQA (e.g., LLaVA [52]): Forgetting targets include visual entities such as "Donald Trump." The forgetting dataset comprises images of the target paired with QA examples, open-ended or multiple-choice, generated using GPT-4o [1] and verified manually. The retention dataset involves QA pairs about related but different concepts (e.g., other public figures). General VQA benchmarks [23] assess broader reasoning abilities.

Text QA (e.g., LLaMA [80]): Forgetting targets are private entity-level facts, such as details about "Harry Potter" characters. The forgetting dataset consists of QA pairs or passages explicitly referencing those facts, generated or collected to ensure contextual diversity. The retention set includes text about similar but untargeted entities. Evaluation relies on QA datasets such as Natural Questions [45] and TriviaQA [38].
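As a toy illustration of the dataset-construction strategies above, the following sketch (the helper name and captions are hypothetical, not from the paper) partitions caption-indexed examples into a forgetting set and a retention set by concept mention:

```python
def build_splits(examples, forget_terms, retain_terms):
    """Partition caption-indexed examples into forgetting and retention sets.
    An example enters D_f if its caption mentions any forgetting target,
    and D_r if it mentions a designated related-but-distinct retention concept."""
    d_f, d_r = [], []
    for ex in examples:
        text = ex["caption"].lower()
        if any(t in text for t in forget_terms):
            d_f.append(ex)
        elif any(t in text for t in retain_terms):
            d_r.append(ex)
    return d_f, d_r

# toy corpus; a real pipeline would add VLM-generated captions and human checks
examples = [
    {"id": 1, "caption": "A Golden Retriever playing fetch"},
    {"id": 2, "caption": "A Beagle asleep on the couch"},
    {"id": 3, "caption": "A city skyline at night"},
]
d_f, d_r = build_splits(examples, ["golden retriever"], ["beagle"])
```

Simple substring matching is only a starting point; in practice, retrieval or human verification would be needed to keep the two sets clean and disjoint.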
We leave the specific implementation and study of these cases to future work.

5 Alternative Views

While we argue for prioritizing research on knowledge-tracing FMU, one might argue that data-tracing MU should remain the top priority even for FMs because the resulting methods are generally applicable. Indeed, we anticipate that unlearning methods in the proposed knowledge-tracing paradigm will still rely on data for unlearning. One might also take a different view of the insights we draw from cognitive science: airplanes fly in a way different from how birds fly, so it is not necessary to design FMU frameworks following the human brain's forgetting mechanism.

There could also be a more radical alternative view that FMs do not need unlearning at all, because the scaling law and hardware innovation allow them to continually grow and learn new information without losing previously acquired capabilities. Instead of prioritizing research on FMU, the focus should then be on continual learning of FMs, where selective forgetting could be a subtopic or a property naturally emerging in an FM's continual learning process.

Another research priority one might pursue is the evaluation of MU. We have already witnessed some works on this topic [76,78,74], which call for more comprehensive and solid benchmarks for MU research. In data-tracing MU, one can obtain an oracle model by retraining a model over the retention set. However, such a model is often not supplied with existing MU benchmarks, and it remains unclear how to leverage the oracle model to evaluate MU methods. Currently, there is no widely accepted standard for evaluating knowledge-level unlearning. Through this position paper, we hope to inspire future work that advances the evaluation criteria.

6 More related work

Besides the works reviewed in Section 2, our position and case study are also broadly related to the following works.
MU on vision. The SISA framework [6] has advanced MU in the classification task, with subsequent efforts [88,89] enhancing retraining efficiency. Recent research has shifted towards approximate MU that modifies trained models directly. Early approaches employing Hessian approximations [28,73] faced high computation costs. More general methods have been introduced for class-wise unlearning in deep neural networks [13,50,43,19,53]. The concept of MU has also been extended to diffusion models [24,63,26,99], aiming to prevent the generation of harmful or unethical content.

MU for LLMs. How to remove the influence of undesirable data from pre-trained LLMs [54,74,33,49,37,67,11] has received growing attention. Various unlearning methods have been proposed, including gradient ascent [36], random relabeling [92,93], and regenerating desirable answers [18] or safe tokens [35], demonstrating effective unlearning capabilities. Additionally, approaches combining gradient ascent with KL divergence [82,12,92] or gradient descent [92,12] have been widely adopted. Task-vector-based techniques [96,55,29] and weight-importance strategies [87,94] further enhance unlearning precision while preserving utility. Input-based unlearning methods [65,32] have emerged as a complementary solution for black-box LLM unlearning.

Multi-modality MU. Compared to single-modality MU, unlearning for multimodal vision-language models [15,48,56,91,66] remains largely underexplored. SIU [48] proposed an efficient method for unlearning visual concepts in the pre-trained LLaVA [52] using just one image during training. MultiDelete [15] proposed a multi-modality unlearning method for fine-tuned FMs on image-text and graph-text datasets. CLIPErase [91] and Safe-CLIP [66] explored machine unlearning on the CLIP model. Inspired by TOFU [57], a new benchmark, FIUBENCH [56], which contains fictitious facial identity data, has been proposed to evaluate unlearning methods on fine-tuned VLMs.
Model editing. Model editing, or knowledge editing [59,31,85], shares similarities with unlearning, as both seek to modify the model while preserving its generalization capabilities. However, the two differ fundamentally: model editing focuses on predefined updates to address hallucinations in pre-trained models, whereas unlearning involves removing information without predefined outputs. While much of the existing research has concentrated on editing large language models [58,59,85], recent efforts have introduced new benchmarks for editing VLMs [31,97,30,47].

7 Conclusion

This position paper is on the confluence of MU and FMs, or FMU in short. We have provided a historical review of MU and FMU, which exposes that existing works trace data, removing specific training examples' influence from FMs. We argue that this setup is impractical for many FM users because they have no or limited access to FMs' massive training data. Instead, we advocate for a shift toward knowledge-tracing FMU to meet diverse unlearning requests from all FM stakeholders. Besides this practical argument, we also draw insights from cognitive science, which back the view that knowledge-tracing FMU aligns with human-like memory processes. We have provided a detailed case study about CLIP, a vision-language FM, to explore our position further. The unlearning requests are formalized as the removal of specific fine-grained object recognition capabilities. We encourage the research community to pay attention to what to unlearn (knowledge or data) as they expand investigations into MU and FMU.

References

[1] OpenAI. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.
[2] Lee Averell and Andrew Heathcote. The form of the forgetting curve and the fate of memories. Journal of Mathematical Psychology, 55(1):25–35, 2011.
[3] Robert A Bjork and Elizabeth L Bjork.
Forgetting as the friend of learning: Implications for teaching and self-regulated learning. Advances in Physiology Education, 43(2):164–167, 2019.
[4] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
[5] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101: Mining discriminative components with random forests. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI, pages 446–461. Springer, 2014.
[6] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pages 141–159. IEEE, 2021.
[7] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[8] Yinzhi Cao. Machine unlearning: Repairing learning models in adversarial environments. In Big Data Analytics in Cybersecurity, pages 137–167. Auerbach Publications, 2017.
[9] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pages 463–480. IEEE, 2015.
[10] Yinzhi Cao, Alexander Fangxiao Yu, Andrew Aday, Eric Stahl, Jon Merwine, and Junfeng Yang. Efficient repair of polluted machine learning systems via causal unlearning. In Proceedings of the 2018 Asia Conference on Computer and Communications Security, pages 735–747, 2018.
[11] Sungmin Cha, Sungjun Cho, Dasol Hwang, and Moontae Lee. Towards robust and parameter-efficient knowledge unlearning for LLMs.
In The Thirteenth International Conference on Learning Representations, 2025.
[12] Jiaao Chen and Diyi Yang. Unlearn what you want to forget: Efficient unlearning for LLMs. arXiv preprint arXiv:2310.20150, 2023.
[13] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7766–7775, 2023.
[14] Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
[15] Jiali Cheng and Hadi Amiri. MultiDelete for multimodal machine unlearning. In European Conference on Computer Vision, pages 165–184. Springer, 2025.
[16] Ronald L Davis and Yi Zhong. The biology of forgetting: A perspective. Neuron, 95(3):490–503, 2017.
[17] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[18] Ronen Eldan and Mark Russinovich. Who's Harry Potter? Approximate unlearning in LLMs. arXiv preprint arXiv:2310.02238, 2023.
[19] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508, 2023.
[20] Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity prevails: Rethinking negative preference optimization for LLM unlearning. arXiv preprint arXiv:2410.07163, 2024.
[21] Li Fei-Fei, Rob Fergus, and Pietro Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories.
In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pages 178–178. IEEE, 2004.
[22] Christiane Fellbaum. WordNet: An electronic lexical database. MIT Press, 1998.
[23] Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, et al. MME: A comprehensive evaluation benchmark for multimodal large language models. arXiv preprint arXiv:2306.13394, 2023.
[24] Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2426–2436, 2023.
[25] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.
[26] Chao Gong, Kai Chen, Zhipeng Wei, Jingjing Chen, and Yu-Gang Jiang. Reliable and efficient concept erasure of text-to-image diffusion models. In European Conference on Computer Vision, pages 73–88. Springer, 2025.
[27] Lauren Gravitz. The importance of forgetting. Nature, 571(July):S12–S14, 2019.
[28] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019.
[29] Xinshuo Hu, Dongfang Li, Baotian Hu, Zihao Zheng, Zhenyu Liu, and Min Zhang. Separate the wheat from the chaff: Model deficiency unlearning via parameter-efficient module operation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 18252–18260, 2024.
[30] Han Huang, Haitian Zhong, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. KEBench: A benchmark on knowledge editing for large vision-language models. arXiv preprint arXiv:2403.07350, 2024.
[31] Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan.
VLKEB: A large vision-language model knowledge editing benchmark. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024.
[32] James Y Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, and Muhao Chen. Offset unlearning for large language models. arXiv preprint arXiv:2404.11045, 2024.
[33] Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, and Naoya Inoue. On effects of steering latent representation for large language model unlearning. arXiv preprint arXiv:2408.06223, 2024.
[34] Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022.
[35] Yoichi Ishibashi and Hidetoshi Shimodaira. Knowledge sanitization of large language models. arXiv preprint arXiv:2309.11852, 2023.
[36] Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. arXiv preprint arXiv:2210.01504, 2022.
[37] Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, and Jun Zhao. RWKU: Benchmarking real-world knowledge unlearning for large language models. arXiv preprint arXiv:2406.10890, 2024.
[38] Mandar Joshi, Eunsol Choi, Daniel S Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551, 2017.
[39] Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Self-regulating prompts: Foundational model adaptation without forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15190–15200, 2023.
[40] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Fei-Fei Li. Novel dataset for fine-grained image categorization: Stanford Dogs. In Proc.
CVPR Workshop on Fine-Grained Visual Categorization (FGVC), volume 2, 2011.
[41] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 554–561, 2013.
[42] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[43] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.
[44] Chanhee Kwak, Junyeong Lee, Kyuhong Park, and Heeseok Lee. Let machines unlearn: Machine unlearning and the right to be forgotten. In 2017 Americas Conference on Information Systems: A Tradition of Innovation, AMCIS 2017. Americas Conference on Information Systems, 2017.
[45] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et al. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7:453–466, 2019.
[46] Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Peiyuan Zhang, Yanwei Li, Ziwei Liu, et al. LLaVA-OneVision: Easy visual task transfer. arXiv preprint arXiv:2408.03326, 2024.
[47] Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, and Bozhong Tian. MIKE: A new benchmark for fine-grained multimodal entity knowledge editing. arXiv preprint arXiv:2402.14835, 2024.
[48] Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, and Sheng Bi. Single image unlearning: Efficient machine unlearning in multimodal large language models. arXiv preprint arXiv:2405.12523, 2024.
[49] Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, et al. The WMDP benchmark: Measuring and reducing malicious use with unlearning. arXiv preprint arXiv:2403.03218, 2024.
[50] Shen Lin, Xiaoyu Zhang, Chenyang Chen, Xiaofeng Chen, and Willy Susilo. ERM-KTP: Knowledge-level machine unlearning via knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20147–20155, 2023.
[51] Bo Liu, Qiang Liu, and Peter Stone. Continual learning and private unlearning. In Conference on Lifelong Learning Agents, pages 243–254. PMLR, 2022.
[52] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 36:34892–34916, 2023.
[53] Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu, et al. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.
[54] Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, et al. Rethinking machine unlearning for large language models. arXiv preprint arXiv:2402.08787, 2024.
[55] Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, and Meng Jiang. Towards safer large language models through machine unlearning. arXiv preprint arXiv:2402.10058, 2024.
[56] Yingzi Ma, Jiongxiao Wang, Fei Wang, Siyuan Ma, Jiazhao Li, Xiujun Li, Furong Huang, Lichao Sun, Bo Li, Yejin Choi, et al. Benchmarking vision language model unlearning via fictitious facial identity dataset. arXiv preprint arXiv:2411.03554, 2024.
[57] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C Lipton, and J Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. arXiv preprint arXiv:2401.06121, 2024.
[58] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning.
Fast model editing at scale. arXiv preprint arXiv:2110.11309, 2021.
[59] Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning, pages 15817–15831. PMLR, 2022.
[60] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE, 2008.
[61] Simon Nørby. Why forget? On the adaptive value of memory loss. Perspectives on Psychological Science, 10(5):551–578, 2015.
[62] OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[63] Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, and Gayoung Lee. Direct unlearning optimization for robust and safe text-to-image models. arXiv preprint arXiv:2407.21035, 2024.
[64] Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, and CV Jawahar. Cats and dogs. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3498–3505. IEEE, 2012.
[65] Martin Pawelczyk, Seth Neel, and Himabindu Lakkaraju. In-context unlearning: Language models as few-shot unlearners. arXiv preprint arXiv:2310.07579, 2023.
[66] Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. Safe-CLIP: Removing NSFW concepts from vision-and-language models. In European Conference on Computer Vision, pages 340–356. Springer, 2025.
[67] Xinchi Qiu, William F Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp, and Nicholas D Lane. PISTOL: Dataset compilation pipeline for structural unlearning of LLMs. arXiv preprint arXiv:2406.16810, 2024.
[68] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision.
In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[69] Protection Regulation. Regulation (EU) 2016/679 of the European Parliament and of the Council. Regulation (EU), 679:2016, 2016.
[70] Avery A Rizio and Nancy A Dennis. The neural correlates of cognitive control: Successful remembering and intentional forgetting. Journal of Cognitive Neuroscience, 25(2):297–312, 2013.
[71] Henry L Roediger III, Yana Weinstein, and Pooja K Agarwal. Forgetting: Preliminary considerations. In Forgetting, pages 15–36. Psychology Press, 2010.
[72] Tomás J Ryan and Paul W Frankland. Forgetting as a form of adaptive engram cell plasticity. Nature Reviews Neuroscience, 23(3):173–186, 2022.
[73] Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34:18075–18086, 2021.
[74] Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A Smith, and Chiyuan Zhang. MUSE: Machine unlearning six-way evaluation for language models. arXiv preprint arXiv:2407.06460, 2024.
[75] Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, and Angela Fan. Not all memories are created equal: Learning to forget by expiring. In International Conference on Machine Learning, pages 9902–9912. PMLR, 2021.
[76] Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, and Virginia Smith. Position: LLM unlearning benchmarks are weak measures of progress. arXiv preprint arXiv:2410.02879, 2024.
[77] Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling SGD: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pages 303–319. IEEE, 2022.
[78] Anvith Thudi, Hengrui Jia, Ilia Shumailov, and Nicolas Papernot.
On the necessity of auditable algorithmic definitions for machine unlearning. In 31st USENIX Security Symposium (USENIX Security 22), pages 4007–4022, 2022.
[79] Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, and Ningyu Zhang. To forget or not? Towards practical knowledge unlearning for large language models. arXiv preprint arXiv:2407.01920, 2024.
[80] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[81] Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, et al. Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition. arXiv preprint arXiv:2406.09073, 2024.
[82] Lingzhi Wang, Tong Chen, Wei Yuan, Xingshan Zeng, Kam-Fai Wong, and Hongzhi Yin. KGA: A general machine unlearning framework based on knowledge gap alignment. arXiv preprint arXiv:2305.06535, 2023.
[83] Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024.
[84] Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, and Masashi Sugiyama. Unlearning with control: Assessing real-world utility for large language model unlearning. arXiv preprint arXiv:2406.09179, 2024.
[85] Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, and Jundong Li. Knowledge editing for large language models: A survey. ACM Computing Surveys, 57(3):1–37, 2024.
[86] Zhenyi Wang, Enneng Yang, Li Shen, and Heng Huang.
A comprehensive survey of forgetting in deep learning beyond continual learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[87] Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, and Deyi Xiong. DEPN: Detecting and editing privacy neurons in pretrained language models. arXiv preprint arXiv:2310.20138, 2023.
[88] Yinjun Wu, Edgar Dobriban, and Susan Davidson. DeltaGrad: Rapid retraining of machine learning models. In International Conference on Machine Learning, pages 10355–10366. PMLR, 2020.
[89] Haonan Yan, Xiaoguang Li, Ziyao Guo, Hui Li, Fenghua Li, and Xiaodong Lin. ARCANE: An efficient architecture for exact machine unlearning. In IJCAI, volume 6, page 19, 2022.
[90] Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3973–3981, 2015.
[91] Tianyu Yang, Lisen Dai, Zheyuan Liu, Xiangqi Wang, Meng Jiang, Yapeng Tian, and Xiangliang Zhang. CLIPErase: Efficient unlearning of visual-textual associations in CLIP. arXiv preprint arXiv:2410.23330, 2024.
[92] Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, and Xiang Yue. Machine unlearning of pre-trained large language models. arXiv preprint arXiv:2402.15159, 2024.
[93] Yuanshun Yao, Xiaojun Xu, and Yang Liu. Large language model unlearning. arXiv preprint arXiv:2310.10683, 2023.
[94] Charles Yu, Sullam Jeoung, Anish Kasi, Pengfei Yu, and Heng Ji. Unlearning bias in language models by partitioning gradients. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6032–6048, 2023.
[95] Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, and Min Lin. A closer look at machine unlearning for large language models. arXiv preprint arXiv:2410.08109, 2024.
[96] Jinghan Zhang, Junteng Liu, Junxian He, et al.
Composing parameter-efficient modules with arithmetic operation. Advances in Neural Information Processing Systems, 36:12589–12610, 2023.
[97] Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, and Xiaojun Wan. MC-MKE: A fine-grained multimodal knowledge editing benchmark emphasizing modality consistency. arXiv preprint arXiv:2406.13219, 2024.
[98] Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. arXiv preprint arXiv:2404.05868, 2024.
[99] Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. arXiv preprint arXiv:2405.15234, 2024.
[100] Kairan Zhao and Peter Triantafillou. Scalability of memorization-based machine unlearning. arXiv preprint arXiv:2410.16516, 2024.
[101] Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, and Peter Triantafillou. What makes unlearning hard and what to do about it. arXiv preprint arXiv:2406.01257, 2024.
[102] Hattie Zhou, Ankit Vani, Hugo Larochelle, and Aaron Courville. Fortuitous forgetting in connectionist networks. arXiv preprint arXiv:2202.00155, 2022.

A Further Details of Related Work

In this section, we provide more details on the unlearning setups of existing unlearning work. We systematically categorize the unlearning tasks, models, and targets of related papers in Table 5.

Table 5: Experiment setup details for existing machine unlearning work.

| Related Work | Task | Unlearned Model / Target |
|---|---|---|
| Golatkar et al. [25] | Image classification | All-CNN / Entire class or a hundred images of the class |
| Jang et al. [36] | Unlearn privacy information | GPT-Neo / Privacy instances |
| Chen et al. [13] | Image classification | All-CNN and ResNet / Entire class |
| Lin et al. [50] | Image classification | ResNet / Entire class |
| Fan et al. [19] | Image classification and generation | ResNet and DDPM / Random samples and entire class |
| Chen and Yang [12] | Classification and summarization | Fine-tuned T5 and T3 model / Random instances |
| Zhang et al. [96] | Reduce toxicity | GPT-2 model / All instances |
| Gandikota et al. [24] | Text-to-image generation | Stable Diffusion model / Predefined concepts |
| Wang et al. [84] | Synthetic author profiles QA | Fine-tuned Llama-2-7B / Random entities |
| Maini et al. [57] | Synthetic author profiles QA | Fine-tuned Llama-2-7B / Random entities |
| Zhang et al. [98] | Synthetic author profiles QA | Fine-tuned Llama-2-7B / Random entities |
| Wu et al. [87] | Privacy information forgetting | Fine-tuned BERT-base model / All instances |
| Eldan and Russinovich [18] | Unlearn the Harry Potter books | Pre-trained Llama-2-7B model / All instances |
| Yao et al. [93] | Unlearn the Harry Potter books | Fine-tuned Llama model / All instances |
| Yao et al. [92] | Removing copyrighted data | Pre-trained Yi-6B / Pre-training samples |
| Jin et al. [37] | Remove celebrity information | Pre-trained LLaMA-3 and Phi-3 / Predefined entities |
| Li et al. [48] | Unlearn visual concepts | Pre-trained LLaVA / Predefined visual concepts |
| Li et al. [49] | Remove hazardous knowledge | Pre-trained ZEPHYR-7B and YI-34B / Hazardous VQA |
| Poppi et al. [66] | Unlearn unsafe embeddings | Pre-trained CLIP / Unsafe images and texts |

B More details of the Dataset

Table 6: Hierarchy fine-grained recognition dataset details.

| Dataset | Coarse Num. | Fine Num. | Training Num. | Testing Num. |
|---|---|---|---|---|
| CompCars-S | 48 | 292 | 26,630 | 8,943 |
| ImgnetDogs | 14 | 99 | 19,800 | 4,950 |

CompCars-S. The original dataset comprises 161 coarse and 1687 fine classes; however, the classification accuracy across these classes is notably low. Some coarse-grained categories contain only one fine-grained category, and some fine-grained categories have few images. Consequently, we implemented a filtering process on the original dataset.
The process is as follows: initially, at the coarse-grained level, each category must include at least two fine-grained categories, and each fine-grained category must contain no fewer than 90 images; otherwise, the category is removed. Subsequently, we used a pre-trained CLIP model (ViT-L/14) to refine the dataset further: images and car models are retained only if the accuracy of the fine class is above 20%; otherwise, the corresponding car model categories and images are removed. The dataset details are presented in Table 6.

ImgnetDogs. The construction of the ImgnetDogs dataset is based on WordNet [22]. The StanfordDogs dataset, introduced in [40], is also a fine-grained dog breed recognition dataset and forms a subset of ImageNet. However, some fine-grained dog categories in StanfordDogs are assigned to highly abstract coarse categories across different semantic levels. We therefore selectively chose fine-grained categories with clear, well-defined, higher-level coarse semantic information from the original ImageNet dataset.

C More Details of the Case Study Setting

Unlearning fine-grained concepts that the model initially fails to recognize, or recognizes with low accuracy, is meaningless. Therefore, the selected concepts for unlearning should meet a predefined accuracy threshold. In our case study, we focus on unlearning fine-grained classes with an accuracy above 90%. For the medium and easy unlearning settings in ImgnetDogs, the overall accuracy of the unlearned fine-grained classes is 82% and 75%, respectively. The specific fine-grained concepts unlearned for each dataset are detailed in Table 7.

Table 7: Unlearned fine-grained concepts for each dataset.
| Dataset | Unlearned fine classes |
| --- | --- |
| CompCars-S | Acura MDX, Lexus RX, Jaguar XK, MINI CABRIO, Audi A7, Audi A5 coupe, Cadillac SRX, Corvette, Mustang |
| ImgnetDogs Difficult | German short-haired pointer, Boston terrier, West Highland white terrier, Labrador retriever, golden retriever, German shepherd dog, keeshond, Samoyed, Pomeranian, Border terrier |
| ImgnetDogs Medium | Irish setter, Gordon setter, basset hound, Airedale terrier, Shih-Tzu, miniature pinscher, Alaskan malamute, flat-coated retriever, Chesapeake Bay retriever, Sealyham terrier |
| ImgnetDogs Easy | English setter, beagle, whippet, Ibizan hound, Dandie Dinmont terrier, standard poodle, Border collie, Blenheim spaniel, cairn terrier, Doberman, groenendael |

D Baseline Machine Unlearning Methods

Gradient Ascent. Gradient Ascent (GA) [36, 77, 43] is a straightforward yet effective unlearning method applied in various unlearning settings. GA maximizes the prediction loss on the forgetting set, which can be formulated as

$$\mathcal{L}_{\mathrm{GA}} = \sum_{(x_i, y_i) \in \mathcal{D}_f} \log p_\theta(y_i \mid x_i). \tag{2}$$

Gradient Difference. Gradient Difference (GDiff) [51] introduces a regularization term on the retaining dataset, which helps maintain the model's ability on the retained classes. Combining the GA loss with the GD loss, the GDiff objective can be formulated as

$$\mathcal{L}_{\mathrm{GD}} = \sum_{(x_i, y_i^c) \in \mathcal{D}_f} \left[ -\log p_\theta(y_i^c \mid x_i) \right], \tag{3}$$

$$\mathcal{L}_{\mathrm{GDiff}} = \mathcal{L}_{\mathrm{GA}} + \mathcal{L}_{\mathrm{GD}}. \tag{4}$$

KL Minimization. Unlike GD, KL minimization [93] minimizes the KL divergence between the predictions of the unlearned model and the original model:

$$\mathcal{L}_{\mathrm{KL}} = \sum_{(x_i, y_i^c) \in \mathcal{D}_f} \mathrm{KL}\!\left( p_{\theta_0}(y_i^c \mid x_i) \,\middle\|\, p_\theta(y_i^c \mid x_i) \right). \tag{5}$$

Random Labeling. By fine-tuning the original model on the forgetting set with randomly reassigned labels [25], the relabeling method overwrites the information associated with the original labels.
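The gradient-ascent and gradient-difference objectives (Eqs. 2-4) can be sketched in plain Python over per-sample logits. This is a minimal illustration only, not the paper's implementation, which operates on CLIP similarity logits in a deep-learning framework:

```python
import math

def log_softmax(logits):
    """Row-wise log-softmax for a list of logit vectors."""
    out = []
    for row in logits:
        m = max(row)
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        out.append([v - lse for v in row])
    return out

def ga_loss(logits, labels):
    # Eq. (2): summed log-likelihood over the forget set D_f;
    # gradient ascent on this quantity degrades the forgotten classes.
    lp = log_softmax(logits)
    return sum(lp[i][y] for i, y in enumerate(labels))

def gd_loss(coarse_logits, coarse_labels):
    # Eq. (3): cross-entropy on the coarse labels y^c, which preserves
    # coarse-grained recognition while the fine concept is unlearned.
    lp = log_softmax(coarse_logits)
    return -sum(lp[i][y] for i, y in enumerate(coarse_labels))

def gdiff_loss(f_logits, f_labels, c_logits, c_labels):
    # Eq. (4): L_GDiff = L_GA + L_GD.
    return ga_loss(f_logits, f_labels) + gd_loss(c_logits, c_labels)
```

With uniform logits over K classes, each log-probability is -log K, which makes the two terms cancel exactly in `gdiff_loss` for one sample each.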
The optimization objective for relabeling is

$$\mathcal{L}_{\mathrm{Relabel}} = \sum_{(x_i,\,\cdot) \in \mathcal{D}_f} \left[ -\log p_\theta(y_{\mathrm{rand}} \mid x_i) \right], \tag{6}$$

where $y_{\mathrm{rand}}$ is randomly chosen from the label set and $y_{\mathrm{rand}} \neq y_f$.

Negative Preference Optimization. To address the catastrophic-collapse problem of GA, NPO [98] introduces the bounded optimization loss

$$\mathcal{L}_{\mathrm{NPO}} = -\frac{2}{\beta} \sum_{(x_i, y_i) \in \mathcal{D}_f} \log \sigma\!\left( -\beta \log \frac{p_\theta(y_i \mid x_i)}{p_{\theta_0}(y_i \mid x_i)} \right). \tag{7}$$

Task Vectors. The task-vector approach [34, 55] first computes a forgetting-set-specific vector

$$\tau_f = \theta_{\mathrm{tune}} - \theta_0, \tag{8}$$

where $\theta_{\mathrm{tune}}$ denotes the model tuned on the forgetting set $\mathcal{D}_f$ and $\theta_0$ the original trained model. The task vector is then negated and applied to the original weights to obtain the unlearned model:

$$\theta_u = \theta_0 - \alpha \tau_f. \tag{9}$$

SalUn. SalUn [19] introduces a gradient-based weight-saliency map to identify the parameters important for unlearning. The saliency map is defined as

$$m_s = \mathbb{I}\!\left[\, \bigl|\nabla_\theta \mathcal{L}(\theta, \mathcal{D}_f)\bigr|_{\theta = \theta_0} > \alpha \,\right], \tag{10}$$

where $\mathbb{I}[\cdot]$ denotes the element-wise indicator function and $\alpha$ is a predefined threshold controlling the selection. The method selectively updates parameters with high gradient magnitudes using a relabeling strategy while freezing the remaining parameters to preserve the model's utility.

ME. ME [95] minimizes the KL divergence between the unlearned model's output distribution and the uniform distribution:

$$\mathcal{L}_{\mathrm{ME}} = \sum_{(x_i, y_i) \in \mathcal{D}_f} \mathrm{KL}\!\left( U_K \,\middle\|\, p_\theta(\cdot \mid x_i) \right), \tag{11}$$

where $U_K$ is the uniform distribution over the $K$ classes.

Table 8: Prompts for the CompCars-S and ImgnetDogs datasets.
| Dataset | Prompts |
| --- | --- |
| CompCars-S | 'a photo of a {}', 'a photo of the {}', 'a photo of my {}', 'i love my {}!', 'a photo of my dirty {}', 'a photo of my clean {}', 'a photo of my new {}', 'a photo of my old {}' |
| ImgnetDogs | 'a bad photo of a {}', 'a photo of many {}', 'a sculpture of a {}', 'a photo of the hard to see {}', 'a low resolution photo of the {}', 'a rendering of a {}', 'graffiti of a {}', 'a bad photo of the {}', 'a cropped photo of the {}', 'a tattoo of a {}', 'the embroidered {}', 'a photo of a hard to see {}', 'a bright photo of a {}', 'a photo of a clean {}', 'a photo of a dirty {}', 'a dark photo of the {}', 'a drawing of a {}', 'a photo of my {}', 'the plastic {}', 'a photo of the cool {}', 'a close-up photo of a {}', 'a black and white photo of the {}', 'a painting of the {}', 'a painting of a {}', 'a pixelated photo of the {}', 'a sculpture of the {}', 'a bright photo of the {}', 'a cropped photo of a {}', 'a plastic {}', 'a photo of the dirty {}', 'a jpeg corrupted photo of a {}', 'a blurry photo of the {}', 'a photo of the {}', 'a good photo of the {}', 'a rendering of the {}', 'a {} in a video game', 'a photo of one {}', 'a doodle of a {}', 'a close-up photo of the {}', 'a photo of a {}', 'the origami {}', 'the {} in a video game', 'a sketch of a {}', 'a doodle of the {}', 'an origami {}', 'a low resolution photo of a {}', 'the toy {}', 'a rendition of the {}', 'a photo of the clean {}', 'a photo of a large {}', 'a rendition of a {}', 'a photo of a nice {}', 'a photo of a weird {}', 'a blurry photo of a {}', 'a cartoon {}', 'art of a {}', 'a sketch of the {}', 'an embroidered {}', 'a pixelated photo of a {}', 'itap of the {}', 'a jpeg corrupted photo of the {}', 'a good photo of a {}', 'a plushie {}', 'a photo of the nice {}', 'a photo of the small {}', 'a photo of the weird {}', 'the cartoon {}', 'art of the {}', 'a drawing of the {}', 'a photo of the large {}', 'a black and white photo of a {}', 'the plushie {}', 'a dark photo of a {}', 'itap of a {}', 'graffiti of the {}', 'a toy {}', 'itap of my {}', 'a photo of a cool {}', 'a photo of a small {}', 'a tattoo of the {}' |

E Training Details

We use a pre-trained ViT-L/14 CLIP
model as the base model in all experiments. The prompts for each dataset are provided in Table 8. The unlearning process is trained for 8 epochs using the Adam optimizer. The batch size is set to 32 for the ImgnetDogs dataset and 16 for CompCars-S. For GA-based methods, the initial learning rate is $8 \times 10^{-8}$; for SalUn it is $2 \times 10^{-7}$; and for all other methods it is $1 \times 10^{-7}$. We save the checkpoint for evaluation when the unlearning accuracy on the training set stops decreasing. All experiments are conducted on a single NVIDIA RTX A6000 GPU. Additional training details for the baseline methods are provided in Table 9. Since no retain set is used during training, the KL divergence and gradient-ascent terms are applied solely to the forget set to preserve the model's coarse-recognition capabilities.

Table 9: Training details and hyper-parameters of the baselines.

| Method | Optimization loss function | Lr | Hyper-parameters |
| --- | --- | --- | --- |
| GA | L_GA(x_f, y_f) | 8e-8 | - |
| GDiff | L_GA(x_f, y_f) + L_GD(x_f, y_f^c) | 8e-8 | - |
| ME+GD | L_ME(x_f, y_f) + L_GD(x_f, y_f^c) | 1e-7 | - |
| Task Vector | L_GD(x_f, y_f) + 0.05 * L_GA(x_f, y_f^c) | 1e-7 | α = 1.5 |
| KL | L_GA(x_f, y_f) + α_c KL(x_f, y_f^c) + α_f KL(x_f, y_f) | 8e-8 | α_c = 5, α_f = 20 |
| NPO+KL | L_NPO(x_f, y_f) + α_c KL(x_f, y_f^c) + α_f KL(x_f, y_f) | 1e-7 | β = 0.5, α_c = 5, α_f = 20 |
| NHL+KL | L_u(x_f, y_f) + α_c KL(x_f, y_f^c) + α_f KL(x_f, y_f) | 1e-7 | m = 2, α_c = 10, α_f = 20 |
| Relabel | L_Relabel(x_f, ·) | 1e-7 | - |
| SalUn | L_Relabel(x_f, ·) | 2e-7 | α = 0.1 |

Table 10: Generalization performance across different baseline methods for the unlearned model.
| Dataset | Stanford Cars | Food101 | Flower102 | Caltech101 | CIFAR-100 | Avg↑ |
| --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 77.75 | 92.32 | 79.18 | 91.11 | 75.82 | 83.24 |
| GA [36] | 75.43 | 89.26 | 74.42 | 89.73 | 63.90 | 78.55 |
| GDiff [51] | 77.10 | 90.75 | 77.36 | 90.57 | 68.67 | 80.89 |
| GA+KL [93] | 76.59 | 91.47 | 78.08 | 90.78 | 71.40 | 81.66 |
| NPO+KL [98] | 77.07 | 91.90 | 78.26 | 90.65 | 73.12 | 82.20 |
| Relabeling [25] | 76.88 | 91.38 | 76.26 | 89.25 | 72.81 | 81.32 |
| Task Vector [34] | 77.15 | 92.05 | 78.53 | 90.28 | 74.85 | 82.57 |
| SalUn [19] | 77.50 | 91.56 | 76.66 | 89.31 | 73.83 | 81.77 |
| ME+GD [95] | 77.14 | 91.56 | 76.48 | 89.50 | 73.80 | 81.70 |
| NHL+KL (Ours) | 77.24 | 92.00 | 78.81 | 90.68 | 73.94 | 82.53 |

Table 11: Comparison of fine-grained concept removal results across different baseline methods on the OOD dataset.

| Setting | D_f coarse↑ | D_f fine↓ | D_r coarse↑ | D_r fine↑ | Quality↑ | Utility↑ | Q-U↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 92.18 | 99.10 | 73.98 | 91.54 | - | - | - |
| GDiff [51] | 85.77 | 12.32 | 54.77 | 58.59 | 87.57 | 77.02 | 81.96 |
| GA+KL [93] | 87.78 | 14.63 | 63.31 | 66.57 | 85.24 | 84.50 | 84.87 |
| NPO+KL [98] | 94.09 | 64.43 | 69.34 | 87.94 | 34.98 | 96.60 | 51.37 |
| NHL+KL (Ours) | 93.59 | 72.14 | 72.15 | 88.09 | 27.20 | 97.92 | 42.58 |

F More results

F.1 More results on the ImgnetDogs dataset.

Details of the zero-shot classification results are shown in Table 10. We evaluated several unlearning methods on the OxfordPet dataset, regarded as an out-of-domain evaluation dataset. According to the results shown in Table 11, nearly all unlearning methods struggled to achieve high-quality

Table 12: Comparison of fine-grained concept removal results across different baseline methods on ImgnetDogs (Medium Unlearn).
| Method | D_f coarse↑ | D_f fine↓ | D_r coarse↑ | D_r fine↑ | Quality↑ | Utility↑ | Q-U↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 75.00 | 82.80 | 52.13 | 66.74 | - | - | - |
| GA [36] | 22.2 | 0.00 | 30.70 | 2.63 | 100.00 | 30.81 | 47.11 |
| GDiff [51] | 58.4 | 0.00 | 41.05 | 18.05 | 100.00 | 61.22 | 75.95 |
| GA+KL [93] | 69.00 | 1.20 | 49.01 | 40.38 | 98.55 | 82.18 | 89.62 |
| NPO+KL [98] | 74.6 | 4.40 | 50.20 | 57.30 | 94.69 | 93.88 | 94.28 |
| Relabeling [25] | 50.60 | 49.40 | 39.87 | 51.28 | 40.34 | 73.59 | 52.11 |
| Task vector [34] | 77.60 | 13.80 | 54.40 | 60.09 | 83.33 | 96.68 | 89.51 |
| SalUn [19] | 55.00 | 41.40 | 42.45 | 54.29 | 50.00 | 78.71 | 61.15 |
| ME+GD [95] | 83.60 | 44.80 | 43.30 | 48.67 | 45.89 | 85.33 | 59.68 |
| NHL+KL (Ours) | 76.00 | 0.40 | 50.29 | 58.29 | 99.52 | 94.60 | 97.00 |

Table 13: Comparison of fine-grained concept removal results across different baseline methods on ImgnetDogs (Easy Unlearn).

| Method | D_f coarse↑ | D_f fine↓ | D_r coarse↑ | D_r fine↑ | Quality↑ | Utility↑ | Q-U↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 60.73 | 75.82 | 53.66 | 67.42 | - | - | - |
| GA [36] | 24.55 | 0.00 | 18.39 | 3.89 | 100.00 | 26.82 | 42.29 |
| GDiff [51] | 71.09 | 0.00 | 45.41 | 6.73 | 100.00 | 64.87 | 78.69 |
| GA+KL [93] | 63.82 | 0.36 | 49.52 | 6.23 | 99.52 | 77.05 | 86.85 |
| NPO+KL [98] | 64.55 | 6.55 | 54.02 | 60.66 | 91.38 | 96.65 | 93.94 |
| Relabeling [25] | 37.09 | 32.18 | 33.18 | 44.57 | 57.55 | 63.00 | 60.16 |
| Task vector [34] | 68.36 | 4.91 | 48.59 | 60.86 | 80.10 | 97.94 | 88.12 |
| SalUn [19] | 39.64 | 28.19 | 34.98 | 45.55 | 62.82 | 66.00 | 64.37 |
| ME+GD [95] | 86.18 | 42.18 | 53.80 | 49.18 | 44.36 | 90.98 | 59.64 |
| NHL+KL (Ours) | 64.36 | 0.73 | 52.21 | 63.09 | 99.04 | 96.95 | 97.98 |

forgetting, except for GA-based methods. While the GA-based methods demonstrated superior unlearning performance, they significantly decreased performance on the non-unlearned fine-grained concepts. Since the CLIP model is non-generative, its classification evaluations are based on a closed set, requiring predefined class names for testing. The limited number of categories in the OxfordPet dataset compared to the training set also affects the performance of these unlearning methods. Future work will further improve the unlearning method and extend this case study to generative models [46, 83] with fine-grained recognition capabilities. Additionally, we provide results for the medium and easy unlearning settings in Table 12 and Table 13.
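The Q-U columns in the tables above combine forgetting Quality and model Utility into a single score; the reported values are numerically consistent with the harmonic mean of the two (an inference from the tables, since the text does not define the metric explicitly):

```python
def qu_score(quality, utility):
    """Harmonic mean of forgetting Quality and model Utility.

    Matches the Q-U columns reported in the tables to two decimal
    places (inferred definition, not stated explicitly in the text).
    """
    return 2.0 * quality * utility / (quality + utility)
```

For example, NPO+KL in the medium setting reports Quality 94.69 and Utility 93.88, and `qu_score(94.69, 93.88)` reproduces the tabulated Q-U of 94.28.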
Across the different memorization settings, our method consistently performs best, while the relabeling-based methods consistently perform worst. The task-vector method performs well only in the medium and easy settings, indicating that it is unsuitable for unlearning high-memorization concepts. Furthermore, the NPO method's forgetting quality is not very high in the low-memorization settings, demonstrating its limitation.

Table 14: Unlearning performance with different numbers of forgotten training samples per fine-grained class.

| Samples Num. | Quality↑ | Utility↑ | Q-U↑ |
| --- | --- | --- | --- |
| 10 | 70.02 | 95.58 | 80.83 |
| 20 | 80.09 | 94.97 | 86.89 |
| 30 | 93.36 | 94.20 | 93.78 |
| 50 | 94.65 | 93.69 | 94.17 |
| 100 | 95.72 | 92.58 | 94.12 |
| 150 | 96.15 | 93.19 | 94.65 |
| 200 | 97.86 | 92.68 | 95.20 |

Results with varying numbers of forgetting training samples. Table 14 illustrates the influence of the number of forgetting training samples on the unlearning performance of our proposed method. When the number of forgetting training samples is too small, such as only 10 images per category, achieving effective unlearning is challenging, resulting in a lower forget quality (70%). Unlearning quality improves as the number of forgotten samples increases; however, this comes at the cost of reduced model utility. Notably, the improvement in unlearning

Table 15: Comparison of fine-grained concept removal results across different baseline methods on CompCars-S.
| Method | D_f coarse↑ | D_f fine↓ | D_r coarse↑ | D_r fine↑ | Quality↑ | Utility↑ | Q-U↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 92.78 | 92.10 | 73.29 | 71.04 | - | - | - |
| GA [36] | 0.00 | 0.00 | 3.27 | 1.28 | 100.00 | 2.09 | 4.09 |
| GDiff [51] | 88.66 | 2.75 | 69.75 | 18.62 | 97.02 | 72.31 | 82.86 |
| GA+KL [93] | 45.02 | 1.38 | 41.93 | 8.21 | 98.51 | 39.10 | 55.98 |
| NPO+KL [98] | 89.69 | 16.15 | 70.13 | 39.82 | 82.46 | 82.80 | 82.63 |
| Relabeling [25] | 59.11 | 25.43 | 58.70 | 43.22 | 72.39 | 68.21 | 70.24 |
| Task vector [34] | 82.82 | 28.52 | 68.48 | 60.91 | 69.03 | 89.48 | 77.94 |
| SalUn [19] | 64.26 | 23.71 | 57.69 | 43.37 | 74.25 | 69.67 | 71.89 |
| ME+GD [95] | 99.66 | 28.18 | 77.83 | 37.96 | 69.40 | 84.47 | 76.20 |
| NHL+KL (Ours) | 87.97 | 2.41 | 68.68 | 59.04 | 97.39 | 90.54 | 93.84 |

Table 16: Generalization performance across different baseline methods for the unlearned model.

| Dataset | Food101 | Flower102 | Caltech101 | OxfordPet | CIFAR-100 | Avg↑ |
| --- | --- | --- | --- | --- | --- | --- |
| Origin CLIP [68] | 92.32 | 79.18 | 91.11 | 93.59 | 75.82 | 86.40 |
| GA [36] | 92.19 | 78.71 | 90.92 | 93.57 | 73.18 | 85.71 |
| GDiff [51] | 92.29 | 79.61 | 91.01 | 93.76 | 74.32 | 86.20 |
| GA+KL [93] | 92.32 | 79.17 | 91.05 | 93.62 | 74.07 | 86.05 |
| NPO+KL [98] | 92.26 | 78.91 | 90.95 | 93.10 | 75.61 | 86.34 |
| Relabeling [25] | 91.77 | 76.99 | 90.18 | 90.11 | 73.17 | 84.44 |
| Task Vector [34] | 92.30 | 78.74 | 91.02 | 93.16 | 75.45 | 86.13 |
| SalUn [19] | 91.52 | 76.35 | 90.07 | 88.14 | 73.51 | 83.92 |
| ME+GD [95] | 91.22 | 75.05 | 90.28 | 86.26 | 73.21 | 83.20 |
| NHL+KL (Ours) | 92.26 | 78.91 | 90.95 | 93.10 | 75.61 | 86.17 |

effectiveness becomes less significant beyond 30 samples, highlighting the sample efficiency of our proposed unlearning method.

F.2 More results on the CompCars-S dataset.

The comparison of different baseline methods on the CompCars-S dataset is presented in Table 15 and Table 16. On this dataset, gradient ascent outperforms the KL divergence method. Additionally, relabeling-based methods fail to achieve effective unlearning, similar to their behavior on the ImgnetDogs dataset. Notably, our proposed method significantly outperforms the other unlearning techniques on CompCars-S. Moreover, the generalization ability of most unlearned models remains largely unaffected; the exceptions are the relabeling-based method and gradient ascent without regularization, both of which exhibit substantial degradation.
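The zero-shot evaluations above follow CLIP's prompt-ensemble protocol with the templates in Table 8: each template's placeholder is filled with a class name, and the normalized text embeddings are averaged into one classifier weight per class. A minimal sketch, where `encode_text` is a hypothetical stand-in for CLIP's text encoder:

```python
import math

def build_zero_shot_classifier(class_names, templates, encode_text):
    """Build one unit-norm classifier weight vector per class by averaging
    the L2-normalized text embeddings of all prompt templates, as in CLIP
    zero-shot evaluation. `encode_text` maps a string to an embedding
    (an assumed stand-in for the actual CLIP text encoder)."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    weights = []
    for name in class_names:
        embs = [normalize(encode_text(t.format(name))) for t in templates]
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        weights.append(normalize(mean))
    return weights
```

Classification then scores an image embedding against each class weight by cosine similarity, which is why the limited, closed label set of OxfordPet matters for the OOD evaluation above.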