Paper deep dive
AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models
Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian
Abstract
With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures becomes essential for models' cultural safety and responsible global applications. Existing studies consider cultural safety and cultural knowledge separately and neglect that the former should be grounded in the latter. This severely prevents LLMs from yielding culture-specific respectful responses. Consequently, adaptive cultural safety remains a formidable task. In this work, we propose to jointly model cultural safety and knowledge. First and foremost, paired cultural-safety and cultural-knowledge data are the key prerequisite for this research. However, the cultural diversity across regions and the subtlety of cultural differences pose significant challenges to the creation of such paired evaluation data. To address this issue, we propose a novel framework that integrates the curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and heavy manual verification. Accordingly, we obtain a dataset named AdaCultureSafe containing 4.8K manually decomposed fine-grained cultural descriptions and the corresponding 48K manually verified safety- and knowledge-oriented queries. Upon the constructed dataset, we evaluate three families of popular LLMs on their cultural safety and knowledge proficiency, via which we make a critical discovery: no significant correlation exists between their cultural safety and knowledge proficiency. We then delve into the utility-related neuron activations within LLMs to investigate the potential cause of this absence of correlation, which can be attributed to the differing objectives of pre-training and post-alignment. We finally present a knowledge-grounded method, which significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.
Tags
Links
- Source: https://arxiv.org/abs/2603.08275v1
- Canonical: https://arxiv.org/abs/2603.08275v1
PDF not stored locally. Use the link above to view on the source site.
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%
Last extracted: 3/13/2026, 12:42:33 AM
Summary
The paper introduces AdaCultureSafe, a novel dataset and framework designed to jointly model and evaluate cultural knowledge and cultural safety in Large Language Models (LLMs). The authors identify a critical lack of correlation between cultural knowledge proficiency and cultural safety compliance in existing LLMs, attributing this to differences in pre-training and post-alignment objectives. They propose a knowledge-grounded method that significantly improves cultural safety by integrating explicit cultural knowledge into the response generation process.
Entities (5)
Relation Signals (3)
AdaCultureSafe → evaluates → Large Language Models
confidence 95% · Upon the constructed dataset, we evaluate three families of popular LLMs on their cultural safety and knowledge proficiency
Llama3.1-8B → improvedby → Knowledge-grounded method
confidence 95% · a representative open-source LLM (Llama3.1-8B) yields notable improvements: a 19.9% gain in respect score
Cultural Knowledge → grounds → Cultural Safety
confidence 90% · we propose a knowledge-grounded method, which significantly enhances cultural safety by enforcing the integration of knowledge
Cypher Suggestions (2)
- Find all models evaluated using the AdaCultureSafe dataset. · confidence 90% · unvalidated
  MATCH (d:Dataset {name: 'AdaCultureSafe'})-[:EVALUATES]->(m:Model) RETURN m.name
- Identify the relationship between cultural concepts and model performance. · confidence 85% · unvalidated
  MATCH (c:Concept)-[:GROUNDS]->(s:Concept) MATCH (m:Model)-[:PERFORMS_ON]->(s) RETURN m.name, s.name
Full Text
68,514 characters extracted from source content.
AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

Hankun Kang (Wuhan University, kanghankun@whu.edu.cn), Di Lin (Wuhan University, lindi5522@mails.jlu.edu.cn), Zhirong Liao (Wuhan University, zhir.l@whu.edu.cn), Pengfei Bai (Wuhan University, baipfei@whu.edu.cn), Xinyi Zeng (Tsinghua University, zengxinyi20@mails.ucas.ac.cn), Jiawei Jiang (Wuhan University, jiawei.jiang@whu.edu.cn), Yuanyuan Zhu (Wuhan University, yyzhu@whu.edu.cn), Tieyun Qian (Wuhan University and Zhongguancun Academy, qty@whu.edu.cn)

Abstract. With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures becomes essential for models' cultural safety and responsible global applications. Existing studies consider cultural safety and cultural knowledge separately and neglect that the former should be grounded in the latter. This severely prevents LLMs from yielding culture-specific respectful responses. Consequently, adaptive cultural safety remains a formidable task. In this work, we propose to jointly model cultural safety and knowledge. First and foremost, paired cultural-safety and cultural-knowledge data are the key prerequisite for this research. However, the cultural diversity across regions and the subtlety of cultural differences pose significant challenges to the creation of such paired evaluation data. To address this issue, we propose a novel framework that integrates the curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and heavy manual verification. Accordingly, we obtain a dataset named AdaCultureSafe containing 4.8K manually decomposed fine-grained cultural descriptions and the corresponding 48K manually verified safety- and knowledge-oriented queries.
Upon the constructed dataset, we evaluate three families of popular LLMs on their cultural safety and knowledge proficiency, via which we make a critical discovery: no significant correlation exists between their cultural safety and knowledge proficiency. We then delve into the utility-related neuron activations within LLMs to investigate the potential cause of this absence of correlation, which can be attributed to the differing objectives of pre-training and post-alignment. We finally present a knowledge-grounded method, which significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.

Warning: This paper may contain harmful examples by nature.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. KDD '26, Jeju, Korea. © 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-X-X/2018/06. https://doi.org/X.X

CCS Concepts: • Security and privacy → Human and societal aspects of security and privacy; • Social and professional topics → Cultural characteristics; • Computing methodologies → Natural language processing.

Keywords: Cultural Safety, Cultural Knowledge, Large Language Models

ACM Reference Format: Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, and Tieyun Qian. 2026. AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models.
In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26, August 09–13, 2026, Jeju, Korea). ACM, New York, NY, USA, 12 pages. https://doi.org/X.X

arXiv:2603.08275v1 [cs.CL] 9 Mar 2026

1 Introduction

LLMs present impressive capabilities across a wide range of tasks [42, 51, 55], such as extensive knowledge [11, 22], semantic understanding [19, 46], and reasoning [29, 48, 50, 53]. Unfortunately, LLMs also raise serious safety and ethical concerns such as toxicity and harmful jailbreaks [6, 20, 28]. In particular, with their widespread global usage, it is critical to ensure the adaptive cultural safety of large language models (LLMs) across diverse cultural contexts for their responsible global deployment [3, 4, 35]. In fact, as a fundamental prerequisite for adaptive cultural safety, LLMs must first understand diverse cultural knowledge to enable contextually adaptive cultural respect. Cultural safety and cultural knowledge are inherently interdependent: adaptive cultural safety frameworks cannot be meaningfully established without grounding in region-specific cultural knowledge. Even with safety constraints, LLMs may fail to capture fine-grained cross-cultural nuances and thus cannot respond adaptively, thereby causing cross-cultural disrespect and undermining their reliability for global deployment. Most existing studies focus on only one aspect: either cultural knowledge in LLMs [34, 40, 58] or the cultural safety of LLMs [2, 16, 37]. Although some pioneering efforts [54] have started to consider both cultural knowledge and cultural safety, they typically treat them as separate and independent components, without accounting for their intrinsic interdependence (see Fig. 1 (a)). This limitation severely hinders the establishment of adaptive cultural safety for LLMs in diverse global cultural contexts.
To bridge this gap, we propose a unified paradigm that jointly models cultural knowledge and adaptive cultural safety with explicit consideration of their intrinsic interdependence. This requires a fine-grained dataset with paired cultural-knowledge and cultural-safety evaluation assets. However, due to cultural pluralism and subtle cross-cultural contextual differences, both deriving granular cultural descriptions and designing targeted assessment queries pose substantial technical difficulties. To overcome these, we introduce a framework integrating authoritative cultural description curation, LLM-automated query generation, and manual validation, with which we build the high-quality AdaCultureSafe dataset.

Firstly, we collect cultural knowledge materials covering 22 countries across six continents from three authoritative platforms, with manual validation to ensure data quality. We scrape raw text, filter out noise, and restructure heterogeneous web content into consistent formats. As the original materials mix diverse cultural topics and limit fine-grained analysis, we further manually decompose them into 4.8K granular cultural descriptions, each dedicated to one distinct cultural topic.

Secondly, we build paired queries for each cultural description to support cultural knowledge and safety evaluation. Following standard query designs [23, 31, 40], we develop multiple-choice queries to assess LLMs' cultural knowledge mastery and open-ended queries to evaluate their cultural respect [14, 41, 60]. All queries are manually validated to ensure strict alignment with the target description and factual correctness of ground-truth answers. This results in 24K cultural knowledge queries and 24K cultural safety queries, respectively. Compared with existing datasets, AdaCultureSafe supports joint evaluation of LLMs' knowledge mastery and cultural safety compliance over diverse fine-grained cultural descriptions, as shown in Fig. 1 (b).
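The topic-level pairing just described (one fine-grained description carrying both knowledge and safety queries) can be sketched as a simple record type. The field and class names below are illustrative, not the dataset's actual schema; the sample content is taken from the paper's own India example.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeQuery:
    # Multiple-choice query testing mastery of one cultural description.
    question: str
    options: list[str]
    answer: str  # ground-truth choice

@dataclass
class SafetyQuery:
    # Open-ended offensive query probing cultural respect.
    scene: str   # contextual scene related to the description
    query: str   # the offensive question itself

@dataclass
class CulturalTopic:
    # One single-topic cultural description with its paired queries
    # (AdaCultureSafe attaches 5 of each kind per description).
    country: str
    description: str
    knowledge_queries: list[KnowledgeQuery] = field(default_factory=list)
    safety_queries: list[SafetyQuery] = field(default_factory=list)

topic = CulturalTopic(
    country="India",
    description=("In India, feet are considered the dirtiest part of the "
                 "body, and it is rude to display the soles of one's feet."),
    knowledge_queries=[KnowledgeQuery(
        question="Which body part is viewed as the most unclean in India?",
        options=["Hands", "Feet", "Head", "Shoulders"],
        answer="Feet")],
)
```

Keeping both query lists on the same record is what enables the per-topic correlation analysis later in the paper: Acc and Respect are computed over the same description, never over unrelated topic sets.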
We introduce three types of evaluation metrics: (1) individual metrics: accuracy for knowledge mastery and respect score for cultural safety compliance; (2) joint metric: an F1 score combining accuracy and respect score; (3) correlation metric: the Spearman correlation between accuracy and respect score. We perform joint evaluations on three LLM families using AdaCultureSafe and identify several key observations: LLMs show cultural knowledge and safety biases across countries, and performance differs noticeably across models. More importantly, we derive the following critical finding. Cultural safety compliance shows a significantly weak correlation with the corresponding cultural knowledge mastery across LLMs, with a near-zero Spearman coefficient. This demonstrates that cultural safety is not grounded in cultural knowledge: strong cultural safety does not imply proficient knowledge, nor does thorough cultural knowledge ensure reliable cultural safety. We ascribe this to the absence of inherent dependence between these two capabilities in LLMs.

Furthermore, to explore the root causes of the weak correlation, we analyze utility-specific neuron activation in LLMs for cultural knowledge and safety to reveal their intrinsic mechanisms. We find that task-relevant layers contain fewer shared activated neurons for cultural knowledge than for cultural safety. We infer that cultural knowledge is learned in pre-training in a highly specialized fashion [26, 35, 61], leading to limited overlap, whereas cultural safety is imposed via post-training on general non-knowledge data for universal safety constraints, yielding more shared neurons. This discrepancy may explain the weak correlation between these two capabilities.

[Figure 1: Comparison between existing studies and our work. Existing studies are unpaired for cultural topics and ignore joint analysis on the same cultural topics, which is our focus. Panel (a) shows existing unpaired, separate evaluations of cultural knowledge and cultural safety; panel (b) shows our topic-level paired joint and correlation evaluation, illustrated with examples such as the Indian taboo on displaying the soles of one's feet and the Vietnamese taboos on touching the top of the head and on standing with hands on hips or crossed arms during conversation.]

Finally, we propose an initial cultural knowledge-grounded framework to improve cultural safety by grounding model responses in explicit cultural knowledge.
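One simple way to realize such knowledge grounding is to prepend the relevant cultural description to the safety query before generation. The prompt template below is a minimal sketch of that pattern under this assumption; it is not the paper's exact implementation, and `grounded_prompt` is a hypothetical helper. The example text is drawn from the paper's Vietnam illustration.

```python
def grounded_prompt(description: str, offensive_query: str) -> str:
    # Inject the relevant cultural knowledge so the model's answer is
    # anchored in it, rather than produced from generic safety priors.
    return (
        "Relevant cultural knowledge:\n"
        f"{description}\n\n"
        "Answer the following question respectfully, grounding your "
        "response in the cultural knowledge above:\n"
        f"{offensive_query}"
    )

prompt = grounded_prompt(
    "In Vietnam, the top of the head is considered the most important "
    "part of the body, and touching it is considered rude.",
    "Why is this head-touching taboo irrational?",
)
```

The key design choice is that the description paired with the query at dataset-construction time is the one injected at inference time, which is what makes the grounding topic-specific rather than a blanket safety instruction.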
As a proof-of-concept validation on AdaCultureSafe, a representative open-source LLM (Llama3.1-8B) yields notable improvements: a 19.9% gain in respect score and a 12.7% lift in F1.

Our main contributions are summarized as follows:
- We initiate the study of joint evaluation of LLMs' cultural knowledge and cultural safety, and construct AdaCultureSafe, the first high-quality, fine-grained dataset with paired cultural knowledge and safety evaluation assets. It is built upon authoritative cultural materials and refined via LLM-assisted generation and rigorous manual validation.
- Through systematic and in-depth analysis, we uncover a critical finding: LLMs' cultural safety capability exhibits a significantly weak correlation with cultural knowledge mastery, contradicting the common assumption that cultural safety should be grounded in cultural understanding.
- Guided by this pivotal observation, we propose a novel cultural knowledge-grounded method to enhance cultural safety by explicitly anchoring model responses in cultural knowledge. Empirical results on AdaCultureSafe validate its effectiveness, establishing a promising research direction for building culturally grounded and safe LLMs.

2 Related Work

The cultural abilities of LLMs are critical to supplying culturally suitable interaction with local users, where cultural knowledge and cultural safety are two crucial aspects of those abilities.

2.1 Cultural Knowledge in LLMs

Cultural knowledge refers to culture-specific knowledge, such as cultural commonsense, daily habits, social norms, and other cultural aspects [17, 31, 54]. Cultural knowledge is the cornerstone of culture-related applications of LLMs, and much research focuses on cultural knowledge for LLMs [13, 25].
For example, Yin et al. [58] probe the geo-diverse cultural commonsense within pre-trained language models via different prompts. Nguyen et al. [34] collect and distill an assertion dataset of cultural knowledge from LLMs and improve the cultural sensitivity of dialogue responses. Li et al. [27] collect a small set of human-written social norms specific to America and China to generate a bilingual dialogue dataset on social norm adherence or violation. In addition, Qiu et al. [36] construct a benchmark that employs cultural knowledge to enhance the cultural and social awareness of LLM-based web agents. Chiu et al. [5] introduce a cultural benchmark to measure the cultural knowledge in LLMs via red-teaming testing and reveal the differences among varying LLMs in cultural knowledge. Overall, all these studies ignore the connection between cultural knowledge and the cultural safety of LLMs, as well as the cornerstone role of cultural knowledge in improving cultural safety.

2.2 Cultural Safety of LLMs

The cultural safety of LLMs refers to LLMs' innocuous reactions in culturally relevant contexts [3, 59], such as respect for different cultures [1]. With the global proliferation of LLMs, researchers have started to focus on cultural safety and have achieved many advancements. For instance, Ashraf et al. [2] and Huang et al. [16] perform cultural analysis based on Arabic data to emphasize LLMs' safety in Arab-region-specific cultural contexts and reinforce the need for culturally specific safety enhancements to ensure the responsible deployment of LLMs. Sukiennik et al. [45], Navigli et al. [33], and Naous et al. [32] reveal cultural bias in LLMs and highlight the need for culturally adaptable LLMs. El Mekki et al. [9] also propose linguistically diverse and culturally aware LLMs for local communities such as Moroccan Darija and Egyptian Arabic. Moreover, Qiu et al.
[37] extend the consideration of cultural safety into the multimodal field and improve the cultural safety of large vision-language models. These cultural-safety-related studies overlook the basic role of cultural knowledge in building culture-specific safety in LLMs, and the correlation between the two is still underexplored. In general, existing studies separately consider improving either cultural knowledge or cultural safety and ignore the correlation between them. These problems are the focus of our work.

3 Construction of AdaCultureSafe

As Fig. 2 shows, we carry out three essential steps to build AdaCultureSafe: (1) cultural knowledge collection, (2) query generation for cultural knowledge and safety, and (3) human verification.

3.1 Cultural Knowledge Descriptions Collection

We aim to evaluate the cultural knowledge grasp of LLMs and their associated cultural safety compliance. As a prerequisite, we have to collect reliable descriptions of cultural knowledge. To this end, we collect cultural materials from three professional sources: (1) the Ministry of Foreign Affairs of the People's Republic of China, which supplies cultural materials about different countries/regions, such as local taboos and etiquette; (2) Cultural Atlas, an Australian educational resource providing comprehensive cultural information, such as daily common sense; and (3) Commisceo, an expert-led training institution specializing in cultural intelligence and cross-cultural guidance. Firstly, we collect the raw text from these sources and organize it country by country. Due to the presence of noise, such as redundant delimiters and irrelevant descriptions, we manually check and clean the text of every country to ensure quality. Next, we manually check the collected text to ensure its consistency with the actual content by tracing the associated sources.
Consequently, we obtain well-organized, high-quality cultural materials from 22 different countries on six continents. Subsequently, considering that the collected cultural materials mix multiple topics, which hinders the evaluation of LLMs' cultural knowledge grasp and cultural safety on granular cultural topics, we disassemble the cultural materials into individual fine-grained descriptions, ensuring that only one topic is discussed in each description. Finally, we obtain country-wise fine-grained individual cultural descriptions. Each country owns multiple local cultural descriptions, and every single one of them describes one topic of country-specific culture. Based on this, we are able to construct queries to granularly evaluate the cultural knowledge grasp and safety compliance of LLMs on all fine-grained cultural topics in each country.

3.2 LLM-automated Query Generation

Based on the cultural descriptions, we further create evaluation queries to assess the cultural knowledge grasp and safety compliance of LLMs for every description.

Cultural Knowledge. To evaluate the knowledge grasp of LLMs on different cultural descriptions, we employ the widely used multiple-choice query format [10, 23, 31, 40]. We first employ a superior LLM, Qwen3-max, as an automated query generation tool based on its strong instruction-following and generation ability. In detail, we give it a cultural description and require it to yield 5 (empirically determined) multiple-choice queries strictly related to the description. Each generated query is composed of three parts: (1) the question, (2) the candidate options, and (3) the true choice.
By feeding the queries of one description into the target LLM and asking the model to select choices from the candidate options, we can measure the knowledge proficiency of the target LLM regarding that cultural description: if the target LLM recognizes the cultural knowledge of the description, its selections should be consistent with the true choices of the queries.

Cultural Safety. To evaluate the cultural safety compliance of LLMs for every cultural description, we employ a widely used form of open-ended offensive query to measure whether the responses of the target LLM are respectful enough toward the local culture [14, 41, 60]. Specifically, we give one cultural description and require Qwen3-max to yield 5 (empirically determined) offensive queries strictly related to the description, each composed of two parts: (1) a contextual scene related to the cultural description and (2) the open-ended offensive query.

[Figure 2: The construction framework of AdaCultureSafe. The pipeline comprises (1) culture collection from three sources with noise cleaning and structuring, (2) automated, topic-level paired query generation for knowledge and safety, (3) LLM evaluation yielding individual metrics, joint F1, and correlation strength, and (4) human verification covering description disassembly (veracity, intra-topic consistency), knowledge-query generation (relevance, factual consistency), and safety-query generation (relevance, offensiveness).]

We then feed the offensive queries into the target LLM and ask it to respond. If the target LLM is not culturally safe for the given cultural description, its responses will comply with these offensive queries and thus exhibit non-inclusiveness.

3.3 Human Verification

To ensure data quality, we conduct strict manual verification during construction.

Verification of Cultural Description Disassembly. Firstly, we have to ensure the reliability of the disassembled cultural descriptions. We manually verify them against the following requirements: (1) Veracity. Each individual description has to come from the associated collected cultural materials and be consistent with the corresponding parts, ensuring that the descriptions reliably derive from the cultural materials. (2) Intra-topic Consistency. Every individual description has to discuss one specific cultural topic, enabling us to conduct fine-grained evaluations of LLMs on individual cultural topics.

Verification of Query Generation for Cultural Knowledge Grasp Evaluation. Secondly, we manually check the generated queries for cultural knowledge. The queries need to satisfy the following conditions: (1) Relevance. The queries have to be strictly related to the given cultural description. (2) Factual Consistency.
The queries and their options have to be consistent with the given cultural description, e.g., the true choices must align with the cultural facts within the description.

Verification of Query Generation for Cultural Safety Evaluation. Finally, we manually check the generated queries for cultural safety. The queries have to satisfy the requirements listed below: (1) Relevance. The queries have to be strictly related to the given cultural description. (2) Offensiveness. The queries have to be offensive and non-inclusive toward the given cultural description to test whether the target LLM is culturally safe, e.g., a query implying that the culture's greeting style is backward and should be abandoned.

Quality Control. We implement a rigorous quality control pipeline during verification to ensure all curated content fully complies with the aforementioned requirements. Specifically, each criterion is carefully inspected. Any content failing to satisfy even a single condition is revised accordingly: automatically regenerated for minor issues or manually modified for complex cases, depending on the nature of the non-compliance. This revision process proceeds iteratively until the content meets all requirements and passes all checks. To avoid infinite loops and ensure efficiency, we set a maximum number of attempts, empirically fixed at 10 iterations. If a query still fails to satisfy all requirements after reaching this limit, i.e., 10 consecutive generations or revisions are invalid, it is formally excluded from the final dataset. By combining automated generation and strict manual verification, we construct evaluation queries tailored to fine-grained cultural descriptions, explicitly targeting both cultural knowledge and cultural safety.
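The regenerate-until-valid loop with its cap of 10 attempts can be sketched as below. `generate` and `passes_all_checks` are hypothetical stand-ins for the LLM call and the combined automatic/manual criteria (relevance, factual consistency or offensiveness, etc.); the paper's actual pipeline also routes complex failures to manual revision, which is omitted here.

```python
MAX_ATTEMPTS = 10  # empirically fixed cap from the paper

def curate_query(description, generate, passes_all_checks):
    """Regenerate a query for one cultural description until it
    satisfies every verification criterion; return None (i.e., the
    item is excluded from the dataset) after 10 invalid attempts."""
    for _ in range(MAX_ATTEMPTS):
        query = generate(description)
        if passes_all_checks(query):
            return query
    return None

# Toy stand-ins: the third generation attempt is the first valid one.
attempts = iter(["bad", "bad", "good"])
result = curate_query(
    "some cultural description",
    generate=lambda d: next(attempts, "bad"),
    passes_all_checks=lambda q: q == "good",
)
```

Returning `None` rather than keeping a best-effort query reflects the paper's choice to drop items outright, which keeps every retained query fully compliant with all criteria.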
Each query is tightly paired with its corresponding cultural description, enabling granular, instance-level assessment of LLMs' cultural knowledge and adaptive cultural safety capabilities. An example and its structural breakdown are illustrated in Fig. 3.

[Figure 3: Sample content and structure of AdaCultureSafe.]

Consequently, the constructed dataset, named AdaCultureSafe, contains 4.8K fine-grained cultural descriptions spanning 22 countries on six continents, and it provides 24K queries for cultural knowledge evaluation and 24K queries for cultural safety assessment over these descriptions. The detailed statistics of the AdaCultureSafe dataset are shown in Tab. 2 in Appendix A.

3.4 Evaluating LLMs with AdaCultureSafe

To evaluate the cultural safety compliance and knowledge proficiency of LLMs, we adopt three categories of evaluation metrics: (1) individual metrics, which assess the standalone performance of LLMs on cultural knowledge and cultural safety, respectively; (2) a joint metric, which quantifies the comprehensive performance by combining cultural knowledge and cultural safety; and (3) a correlation strength metric, which measures the strength of association between cultural knowledge and cultural safety. Detailed definitions of these metrics are presented as follows.

Individual Metrics. The individual metrics consist of accuracy for evaluating LLMs' cultural knowledge grasp and a quantitative safety score for measuring the respect of LLM responses. The accuracy metric is defined as:

$$\mathrm{Acc} = \frac{1}{|D|}\sum_{i=1}^{|D|}\mathrm{Acc}_i = \frac{1}{|D|}\sum_{i=1}^{|D|}\frac{\sum_{j=1}^{|Q_i|}\mathbb{I}(\hat{y}_{ij} = y_{ij})}{|Q_i|}, \tag{1}$$

where $|D|$ denotes the number of descriptions, $|Q_i|$ refers to the number of queries equipped for the $i$-th description, and $\mathbb{I}(\cdot)$ is the indicator function. $\hat{y}_{ij}$ and $y_{ij}$ denote the choice of the target model and the true choice for the $j$-th query of the $i$-th description, respectively. A higher value of $\mathrm{Acc}$ indicates that the LLM achieves better cultural knowledge proficiency. The safety score is formulated as:

$$\mathrm{Respect} = \frac{1}{|D|}\sum_{i=1}^{|D|}\mathrm{Respect}_i = \frac{1}{|D|}\sum_{i=1}^{|D|}\frac{\sum_{j=1}^{|Q_i|}s_{ij}(r_{ij})}{|Q_i|}, \tag{2}$$

where $s_{ij}$ represents the assigned respect score of the response $r_{ij}$ of the target LLM to the $j$-th query $Q_{ij}$ of the $i$-th description. A higher $\mathrm{Respect}$ value signifies more respectful and inclusive behavior of the target LLM across diverse cultures. Following [54], we utilize Qwen3-max to assign respect scores to the responses, and we also examine the inter-rater reliability between Qwen3-max and human scorers, with an intraclass correlation coefficient (ICC) of 0.9001 and a Cohen's Kappa of 0.7978, showing the reliability of Qwen3-max scoring.

Joint Metric. To characterize the integrated capability of LLMs in both cultural knowledge and cultural safety, we employ the F-score as the joint metric, computed as:

$$F_1 = \frac{1}{|D|}\sum_{i=1}^{|D|}\frac{2\cdot\mathrm{Acc}_i\cdot\mathrm{Respect}_i}{\mathrm{Acc}_i + \mathrm{Respect}_i}. \tag{3}$$

Correlation Strength Metric. This metric quantifies the extent to which cultural safety is correlated with cultural knowledge in LLMs. Following common practice in statistical analysis, we adopt the Spearman correlation coefficient:

$$\mathrm{Corr} = \mathrm{Spearman}\left(\{\mathrm{Acc}_i\}_{i=1}^{|D|},\ \{\mathrm{Respect}_i\}_{i=1}^{|D|}\right), \tag{4}$$

where $\mathrm{Spearman}(\cdot,\cdot)$ denotes the Spearman rank correlation function. A higher $\mathrm{Corr}$ value signifies a stronger statistical association between cultural safety and cultural knowledge in LLMs.
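The per-description metrics defined in Eqs. (1)–(4) are straightforward to compute. The sketch below is a dependency-free illustration; the Spearman coefficient is implemented as the Pearson correlation of ranks without tie correction, which matches the standard definition only when there are no tied values (in practice one would use `scipy.stats.spearmanr`).

```python
def acc_i(preds, golds):
    # Eq. (1), inner term: fraction of correct multiple-choice answers
    # among the |Q_i| knowledge queries of one description.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def respect_i(scores):
    # Eq. (2), inner term: mean respect score s_ij over one
    # description's safety-query responses.
    return sum(scores) / len(scores)

def f1(acc_list, respect_list):
    # Eq. (3): harmonic mean of Acc_i and Respect_i per description,
    # averaged over all |D| descriptions.
    return sum(2 * a * r / (a + r)
               for a, r in zip(acc_list, respect_list)) / len(acc_list)

def spearman(xs, ys):
    # Eq. (4): Pearson correlation of the rank vectors (no ties assumed).
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A near-zero `spearman` value over the per-description `acc_i` and `respect_i` lists is exactly the signal the paper reports: knowing a culture's facts and responding to it respectfully vary almost independently across topics.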
4 Experiments 4.1 Experimental Setup To jointly explore cultural safety and knowledge in different LLMs, we conduct three types of experiments as follows: •Evaluating cultural safety and knowledge in LLMs across individual, joint, and correlation examinations, aiming to jointly evaluate cultural safety and knowledge in LLMs and explore the correlation between them. •Probing the activation of neurons in LLMs associated with cultural safety and knowledge, which explores the inherent difference between cultural safety and knowledge. •A cultural knowledge-grounded method to improve cultural safety, where we incorporate cultural knowledge into the responses of LLMs to cultural safety queries. Our experiments are all conducted in the same environment (detailed parameter settings are shown in the Appendix B), and we use three widely recognized families of LLMs (near-parameter-level: Llama3.1-8B [8], Mistral-7B [18], and Qwen2.5-7B [38]). 1 4.2 Evaluation and Probing Analysis Evaluation Results. Tab. 1 shows the evaluation results on dif- ferent LLMs. We find that in all LLMs, the correlation between the capabilities of cultural knowledge and cultural safety is sig- nificantly weak, with the Spearman values close to zero and even 1 More evaluation results of the Qwen2.5 family with different parameter levels are shown in the Appendix B, including Qwen2.5-7B, Qwen2.5-14B, and Qwen2.5-32B. KDD ’26, August 09–13, 2026, Jeju, KoreaHankun Kang et al. Table 1: The evaluation results of LLMs. The best performance within the same column is highlighted in bold italics, and the best performance within the same row is marked in blue, green, and red for Acc, Respect, and F1, respectively. The subscripts in the column of Corr represent the p-value of the Spearman correlation coefficient. * denotes the significance (p<0.05). 
Countries | Llama3.1-8B: Acc↑ / Respect↑ / F1↑ / Corr (p) | Mistral-7B: Acc↑ / Respect↑ / F1↑ / Corr (p) | Qwen2.5-7B: Acc↑ / Respect↑ / F1↑ / Corr (p)
Afghanistan | 88.20 / 56.74 / 65.60 / −0.11 (0.10) | 80.27 / 44.71 / 54.74 / −0.03 (0.70) | 90.40 / 50.55 / 62.56 / −0.06 (0.36)
Australia | 90.03 / 57.22 / 67.99 / 0.03 (0.71) | 83.78 / 64.84 / 70.61 / 0.05 (0.47) | 92.02 / 70.29 / 78.35 / 0.02 (0.78)
Brazil | 86.44 / 57.11 / 65.70 / −0.05 (0.56) | 77.72 / 50.09 / 57.52 / −0.01 (0.88) | 86.84 / 58.85 / 66.91 / −0.02 (0.84)
Canada | 87.64 / 59.92 / 69.10 / 0.04 (0.63) | 84.81 / 66.23 / 72.81 / 0.21* (0.01) | 90.32 / 71.89 / 78.78 / 0.23* (0.00)
China | 87.44 / 51.75 / 61.78 / −0.15* (0.04) | 82.81 / 51.62 / 60.42 / −0.15* (0.03) | 90.45 / 59.72 / 69.13 / −0.19* (0.01)
Colombia | 85.62 / 54.45 / 63.76 / −0.04 (0.62) | 80.50 / 47.78 / 57.09 / −0.01 (0.92) | 89.75 / 54.88 / 66.13 / −0.05 (0.51)
Ethiopia | 85.65 / 60.61 / 67.83 / −0.16* (0.01) | 77.99 / 54.22 / 60.83 / 0.03 (0.60) | 88.19 / 57.96 / 67.83 / 0.14* (0.03)
India | 89.38 / 57.06 / 67.05 / −0.12* (0.05) | 84.83 / 54.53 / 63.72 / 0.00 (0.95) | 91.30 / 59.58 / 70.19 / 0.05 (0.46)
Iraq | 83.93 / 57.89 / 65.44 / −0.03 (0.61) | 77.50 / 49.40 / 56.61 / −0.08 (0.21) | 85.77 / 52.42 / 62.47 / −0.02 (0.77)
Japan | 84.12 / 53.44 / 62.61 / −0.04 (0.50) | 78.30 / 59.54 / 64.75 / 0.14* (0.02) | 87.04 / 61.69 / 70.19 / 0.10 (0.11)
South Korea | 87.96 / 52.71 / 63.68 / −0.10 (0.11) | 82.79 / 52.82 / 61.78 / 0.01 (0.83) | 89.46 / 59.21 / 69.35 / −0.06 (0.37)
Malaysia | 85.49 / 57.88 / 66.14 / 0.06 (0.34) | 80.19 / 51.75 / 59.74 / −0.04 (0.55) | 87.08 / 56.20 / 65.78 / −0.01 (0.83)
Mexico | 87.83 / 57.49 / 67.33 / 0.04 (0.54) | 80.67 / 50.34 / 58.94 / 0.01 (0.94) | 89.95 / 54.68 / 65.80 / −0.00 (0.96)
New Zealand | 89.12 / 61.07 / 70.25 / 0.00 (0.97) | 83.63 / 64.01 / 70.20 / 0.04 (0.56) | 90.50 / 65.68 / 74.36 / 0.10 (0.13)
Philippines | 84.87 / 57.00 / 64.99 / −0.20* (0.00) | 80.81 / 51.59 / 60.13 / −0.01 (0.86) | 89.63 / 54.76 / 65.78 / −0.14* (0.05)
Russia | 83.89 / 48.86 / 58.96 / −0.07 (0.29) | 74.60 / 46.16 / 53.57 / −0.00 (0.99) | 85.09 / 50.07 / 60.73 / 0.04 (0.53)
Saudi Arabia | 85.71 / 47.90 / 59.13 / −0.02* (0.76) | 79.81 / 42.90 / 53.15 / −0.09 (0.13) | 87.64 / 48.48 / 60.46 / −0.08 (0.18)
Spain | 84.02 / 57.12 / 64.70 / −0.15* (0.03) | 78.95 / 54.46 / 61.67 / 0.10 (0.16) | 87.67 / 60.57 / 68.96 / −0.10 (0.16)
Thailand | 86.22 / 59.24 / 67.27 / −0.01 (0.90) | 80.25 / 56.27 / 63.02 / 0.00 (0.99) | 90.04 / 60.27 / 70.00 / 0.02 (0.79)
USA | 90.40 / 54.96 / 66.69 / 0.09 (0.14) | 86.60 / 67.86 / 74.17 / 0.05 (0.42) | 93.58 / 72.66 / 80.54 / −0.01 (0.93)
Ukraine | 83.68 / 57.68 / 65.33 / −0.00 (0.95) | 75.96 / 51.69 / 57.98 / −0.01 (0.85) | 87.55 / 56.89 / 66.77 / 0.11 (0.11)
Vietnam | 87.35 / 57.53 / 66.43 / −0.10 (0.12) | 82.03 / 52.08 / 61.00 / 0.03 (0.66) | 89.60 / 57.47 / 67.81 / −0.00 (0.99)
Overall | 86.58 / 56.06 / 65.28 / −0.04* (0.00) | 80.65 / 53.94 / 61.61 / 0.04* (0.01) | 89.07 / 58.79 / 68.55 / 0.03* (0.02)

weakly negative in some cases, e.g., the correlation of −0.20 observed for Llama3.1-8B in the Philippines. In addition, Fig. 4 presents how cultural safety performance changes as knowledge capability increases in different countries, where we observe that the improvement in cultural knowledge across different countries is not accompanied by a synchronous upward trend in cultural safety performance. Furthermore, Fig. 5 displays the joint scatter distributions of cultural safety and knowledge capabilities, where we notice that the distribution of cultural respect scores shows no obvious correlation with that of cultural knowledge accuracies: cultural safety values are highly scattered, whereas the corresponding cultural knowledge values are tightly clustered.

All the above results suggest that there is no significant correlation between cultural safety and knowledge, and that the former is not grounded in the latter. Besides, as shown in Tab. 1, we also observe that LLMs generally exhibit strong proficiency in cultural knowledge, with accuracies almost always exceeding 80%, but inferior capabilities in cultural safety, with respect scores of roughly 50%–60%. Among them, Qwen2.5-7B generally performs the best in both cultural knowledge and cultural safety tasks. At the same time, we notice that LLMs exhibit biased capabilities of cultural safety and knowledge across countries: they perform well in some countries, such as the United States of America (USA) and Australia, and almost attain their optimal performance in the USA, but they perform poorly in countries like Saudi Arabia and Russia. Based on these results, we draw the following important findings.
Finding 1: No significant correlation exists between cultural safety and cultural knowledge, and the former is not grounded in the latter: superior cultural safety does not indicate proficient knowledge, nor does thorough cultural knowledge imply the presence of cultural safety, which hinders the generation of adaptively respectful responses for specific cultures. We analyze the reason for this phenomenon in the subsequent subsection.

Finding 2: Cultural safety capability lags behind cultural knowledge, as reflected in respect scores that are lower than accuracies. We infer that the cultural safety task may be inherently more difficult than the cultural knowledge task.

Finding 3: Biased capabilities of both cultural safety and cultural knowledge exist across countries. We conjecture that when optimizing LLMs for cultural ability, weights are unintentionally biased toward certain cultures, which may be constrained by the scale of available corpora from various countries or regions [12, 30].

Finding 4: Different LLMs possess different cultural safety and cultural knowledge capabilities, which is intuitive as LLMs adopt distinct training schemes or structures. Nevertheless, the results can guide LLMs in building their cultural capabilities, especially cultural safety, to facilitate responsible global deployment.

Figure 4: Trends in the performance of cultural safety and cultural knowledge across different countries. Left: Llama3.1-8B. Center: Mistral-7B. Right: Qwen2.5-7B. The country names are abbreviated with ISO 3166-1 codes.

Figure 5: Correlation between cultural safety and knowledge in LLMs. Left: Llama3.1-8B. Center: Mistral-7B. Right: Qwen2.5-7B.

Probing Analysis.
Existing studies widely recognize that MLP blocks contain utility-specific neurons whose activations directly reflect the model's underlying operational mechanism [7, 49, 52, 57] (a detailed theoretical formulation is provided in Appendix C). To further investigate the potential causes of the weak correlation, we analyze the activation of neurons associated with cultural safety and knowledge by feeding queries into LLMs. Specifically, to conduct task-level activation analysis, we first need to identify the task-level layers relevant to these two tasks. According to prior studies [24, 43, 44, 56], the functional roles of LLM layers undergo a gradual transition: shallow layers specialize in low-level linguistic processing such as grammar and syntax, middle layers focus on activating task-level abilities and on semantic understanding, and final layers are dedicated to generation-level token mapping and coherence maintenance. Based on this conclusion, we examine the layer-wise distribution of shared activated neurons between the cultural safety and knowledge tasks. As illustrated in Fig. 6, in shallow layers, the overlapping activated neurons between the cultural safety and knowledge tasks are predominantly concentrated in the low-frequency ranges of both tasks. This suggests that in shallow layers, LLMs primarily focus on capturing query-specific basic patterns, e.g., varying grammatical clues of queries, which requires activating a diverse set of neurons that are not shared across queries. As the layers become deeper, high-frequency activated neurons increase in the overlapping set. We attribute this to the functional specialization of deeper LLM layers: as layers deepen, LLMs begin to activate shared neurons that are associated with both task execution and text generation. These shared neurons support more abstract, high-level functionalities, including task-level reasoning, semantic mapping, and token-level generation.
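To make the probing procedure concrete, the following minimal sketch reads out activated neurons in a single MLP block, following the formulation $E_{l+1} = F(X_l W_{l1}) W_{l2}$ given in Appendix C, and compares the neuron sets of two queries. The tiny weight matrix, the ReLU choice for $F$, the zero activation threshold, and the toy query vectors are our illustrative assumptions, not the authors' actual configuration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, x):
    # w: list of rows with shape (hidden_dim, input_dim)
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def activated_neurons(x, w1, threshold=0.0):
    """Indices of intermediate neurons whose activation F(x W1) exceeds the threshold."""
    return {i for i, h in enumerate(relu(matvec(w1, x))) if h > threshold}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two activated-neuron index sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Toy example: a "knowledge" query and a "safety" query activate
# overlapping but distinct neuron sets in one hypothetical MLP block.
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 hidden neurons, 2-dim input
knowledge_set = activated_neurons([1.0, 0.0], w1)  # -> {0}
safety_set = activated_neurons([1.0, 1.0], w1)     # -> {0, 1}
shared = knowledge_set & safety_set
```

Counting how often each neuron appears in such sets across many queries, per layer, yields the overlap-frequency distributions of Fig. 6; averaging pairwise Jaccard values within a task yields the sharing statistics of Fig. 7, where higher values indicate that the same neurons are reused across queries.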
Given that the middle layers are identified as the key region for task-level pattern extraction, we further analyze the Jaccard similarity coefficients (hereafter referred to as Jaccard) of the activated neurons associated with different queries within the middle layers (see Fig. 7). A higher Jaccard index indicates greater sharing of activated neurons and implies lower specialization for specific queries and, to a certain extent, weaker cultural adaptability. We observe that the Jaccard values for cultural knowledge are lower than those for cultural safety. This can be explained by the inherent differences in the functional requirements of the two tasks. Cultural knowledge tasks involve processing diverse, context-specific knowledge entries, e.g., specific cultural norms and historical facts, which necessitates the activation of specialized neurons tailored to each unique knowledge entry. As a result, the sharing of activated neurons is relatively low. In contrast, cultural safety is governed by general, non-knowledge-specific behavioral norms and ethical constraints, e.g., avoiding cultural insensitivity and maintaining respect, which are consistent across different cultural queries. This consistency leads to more shared activated neurons across queries. Hence, we attribute the weak correlation between cultural safety and knowledge to the fact that the latter is specialized during pre-training [10, 26, 35, 61], while the former is conducted through non-knowledge-specific post-alignment [3, 21, 47], where cultural safety is not grounded by knowledge.

Figure 6: The frequency distribution of overlapped activated neurons at different layers (32 in total). Left: shallow layers (1st). Right: middle layers (29th).

Figure 7: The Jaccard for overlapping activated neurons in cultural knowledge (Left) and safety (Right).

4.3 Cultural-Knowledge-Grounded Method

Method.
From the above results, we observe that LLMs exhibit good mastery of cultural knowledge, yet their performance in ensuring cultural safety remains limited. Hence, we propose to implant cultural knowledge into the responses to cultural safety-related queries when constructing the training data. Specifically, we adopt Direct Preference Optimization (DPO) [39] with preference-paired data to fine-tune the LLM, where each preference pair comprises a query and its associated positive and negative responses. Firstly, we instruct the model to generate varying cultural queries for the cultural descriptions and further generate the response pairs for the queries as follows: the positive response is required to be grounded in the cultural knowledge relevant to the query, and the negative response is designed to be culturally offensive or generically polite. Formally, we enhance the cultural safety of LLMs with the objective:

$$\max_{\theta} \; \mathbb{E}_{(q,\, y_p,\, y_n)} \log \sigma\!\left(\beta\!\left[\log \frac{p_\theta(y_p \mid q)}{p_\theta(y_n \mid q)} - \log \frac{p_{\mathrm{ref}}(y_p \mid q)}{p_{\mathrm{ref}}(y_n \mid q)}\right]\right), \quad (5)$$

where $q$, $y_p$, and $y_n$ refer to the query and its positive and negative responses, respectively. $p_\theta$ and $p_{\mathrm{ref}}$ denote the trained and reference models, $\sigma$ is the sigmoid function, and $\beta$ is the scaling factor.

Results. For illustration, we select the cultural descriptions of China and choose Llama3.1-8B as the model to be trained. Fig. 8 shows the performance changes on cultural safety; the complete results are shown in Tab. 4 in Appendix B due to limited space. It is clear that the cultural safety of Llama3.1-8B is improved across all countries, even though we only employ a small number (555 in this experiment) of generated training samples based on the cultural descriptions of China. This strongly confirms the remarkable potential of knowledge-grounded cultural safety.

Figure 8: Performance changes of Llama3.1-8B after training.
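Per preference pair, the objective in Eq. (5) reduces to a negative log-sigmoid of a $\beta$-scaled log-probability margin, which the sketch below computes numerically. This is our illustration only: the toy log-probabilities are invented, and a real implementation would score full positive and negative responses with both the trained and reference models.

```python
import math

def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * preference margin), cf. Eq. (5)."""
    margin = (logp_pos - logp_neg) - (ref_logp_pos - ref_logp_neg)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the trained model separates the knowledge-grounded positive response
# from the offensive/generic negative more than the reference model does,
# the margin is positive and the loss drops below log(2).
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)       # zero margin -> log(2)
improved = dpo_loss(-1.0, -5.0, -2.0, -2.0)  # positive margin -> smaller loss
```

Minimizing this loss over the generated preference pairs pushes the policy to prefer the culturally grounded response, which is the mechanism behind the safety gains reported above.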
We also find that the correlation between cultural safety and knowledge remains weak, with a Spearman value of −0.03 (p = 0.04), and we argue that further in-depth research on this topic is warranted, e.g., more tailored knowledge-grounded cultural safety alignment methods.

5 Conclusion

In this work, we construct a dataset named AdaCultureSafe to facilitate the joint exploration of cultural safety and knowledge in LLMs. Furthermore, we uncover several critical findings by evaluating LLMs on AdaCultureSafe, including that the correlation between cultural safety and cultural knowledge is significantly weak. We then explore the potential causes of this weak correlation and attribute it to the different objectives of cultural knowledge pre-training and cultural safety post-alignment. Finally, we propose an innovative and effective knowledge-grounded method to enhance cultural safety.

6 Limitations and Ethical Considerations

Limitations. We currently focus on static cultures; since cultures are dynamic and continuously shaped by aspects such as social evolution, it is worthwhile to study evolving cultures in the future. While our dataset spans 22 countries across six continents, providing broad coverage of cultures, it may not fully capture the diversity of cultural nuances worldwide. Expanding the dataset to include additional cultures of other countries or regions will improve its applicability. In addition, employing native languages to explore the cultures is also worthwhile.

Ethical Considerations. This research collects data from publicly accessible websites, with no access to private or personally identifiable information (PII). No human subjects are involved, and informed consent is not required. Raw data are manually filtered to ensure compliance and quality. The complete dataset will be made publicly available under the CC BY 4.0 license after the paper is accepted for publication: https://huggingface.co/datasets/k3l/AdaCultureSafe.
References

[1] Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, Abdelrahim A Elmadany, Omer Nacar, El-Moatez-Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, et al. 2025. Palm: A culturally inclusive and linguistically diverse dataset for arabic llms. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 32871–32894.
[2] Yasser Ashraf, Yuxia Wang, Bin Gu, Preslav Nakov, and Timothy Baldwin. 2025. Arabic dataset for LLM safeguard evaluation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 5529–5546.
[3] Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, and Fajri Koto. 2025. Indosafety: Culturally grounded safety for llms in indonesian languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 9146–9177.
[4] Somnath Banerjee, Sayan Layek, Hari Shrawgi, Rajarshi Mandal, Avik Halder, Shanu Kumar, Sagnik Basu, Parag Agrawal, Rima Hazra, and Animesh Mukherjee. 2025. Navigating the cultural kaleidoscope: A hitchhiker's guide to sensitivity in large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 7580–7617.
[5] Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, et al. 2025. CulturalBench: A robust, diverse and challenging benchmark for measuring LMs' cultural knowledge through human-AI red-teaming.
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 25663–25701. [6]Sooyung Choi, Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Xing Xie, and JinYeong Bak. 2025. Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights. In Proceedings of the 63rd Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vienna, Austria, 31742–31768. [7]Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. 2022. Knowledge neurons in pretrained transformers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8493–8502. [8]Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv e-prints (2024), arXiv–2407. [9]Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, and Muhammad Abdul-Mageed. 2025. Nilechat: Towards linguistically diverse and culturally aware llms for local communities. In Proceedings of the 2025 Conference on Empir- ical Methods in Natural Language Processing. 10978–11002. [10]Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lacalle, and Mikel Artetxe. 2024. Bertaqa: How much do language models know about local culture? Advances in Neural Information Processing Systems 37 (2024), 34077–34097. [11]Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Jie Shi, Xiang Wang, Xiangnan He, and Tat-Seng Chua. 2025. AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models. In The Thirteenth International Confer- ence on Learning Representations. [12]Emanuel Z Fenech-Borg, Tilen P Meznaric-Kos, Milica D Lekovic-Bojovic, and Arni J Hentze-Djurhuus. 2025. The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases. 
arXiv preprint arXiv:2508.12411 (2025). [13]Yi Fung, Ruining Zhao, Jae Doo, Chenkai Sun, and Heng Ji. 2024. Massively multi-cultural knowledge acquisition & lm benchmarking. arXiv preprint arXiv:2402.09369 (2024). [14]Tianle Gu, Zeyang Zhou, Kexin Huang, Liang Dandan, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Yujiu Yang, Yan Teng, Yu Qiao, et al.2024. Mllmguard: A multi-dimensional safety evaluation suite for multimodal large language models. Advances in Neural Information Processing Systems 37 (2024), 7256–7295. [15]Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al.2022,. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations. [16]Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Song Dingjie, Zhihong Chen, Mosen Alharthi, Bang An, Juncai He, et al.2024. AceGPT, localiz- ing large language models in Arabic. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 8139–8163. [17]Jing Huang and Diyi Yang. 2023. Culturally aware natural language inference. In Findings of the Association for Computational Linguistics: EMNLP 2023. 7591–7609. [18]Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, De- vendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. Mistral 7B. arXiv:2310.06825 [cs.CL] https: //arxiv.org/abs/2310.06825 [19]Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, et al.2024. Graph Chain- of-Thought: Augmenting Large Language Models by Reasoning on Graphs. In Findings of the Association for Computational Linguistics ACL 2024. 163–184. 
[20]Seongho Joo, Hyukhun Koh, and Kyomin Jung. 2025. Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 25500–25535. [21] Raviraj Bhuminand Joshi, Rakesh Paul, Kanishk Singla, Anusha Kamath, Michael Evans, Katherine Luna, Shaona Ghosh, Utkarsh Vaidya, Eileen Margaret Peters Long, Sanjay Singh Chauhan, et al.2025. CultureGuard: Towards Culturally- Aware Dataset and Guard Model for Multilingual Safety Applications. In Proceed- ings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. 2666–2685. [22]Cheongwoong Kang and Jaesik Choi. 2023. Impact of Co-occurrence on Fac- tual Knowledge of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023. 7721–7735. [23]Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, and Alice Oh. 2024. CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 3335–3346. [24]Ge Lei and Samuel J Cooper. 2025. The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis. arXiv preprint arXiv:2502.10871 (2025). [25] Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, and Xing Xie. 2024. Culturellm: Incorporating cultural differences into large language models. Advances in Neural Information Processing Systems 37 (2024), 84799–84838. [26]Huihan Li, Arnav Goel, Keyu He, and Xiang Ren. 2025. Attributing Culture- Conditioned Generations to Pretraining Corpora. In The Thirteenth International Conference on Learning Representations. [27] Oliver Li, Mallika Subramanian, Arkadiy Saakyan, Sky CH-Wang, and Smaranda Muresan. 2023. 
NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 15732–15744. [28]Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, and Yang Liu. 2024. Efficient detection of toxic prompts in large language models. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 455–467. [29]Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Yuan-Fang Li, Chen Gong, and Shirui Pan. 2025. Graph-constrained Reasoning: Faithful Reasoning on Knowl- edge Graphs with Large Language Models. In Forty-second International Confer- ence on Machine Learning. [30] Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, and Stefano Ermon. 2024. Large language models are geographically biased. In Proceedings of the 41st International Conference on Machine Learning. 34654–34669. [31]Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, et al. 2024. Blend: A benchmark for llms on everyday knowledge in diverse cultures and languages. Advances in Neural Information Processing Systems 37 (2024), 78104–78146. [32]Tarek Naous, Michael J Ryan, Alan Ritter, and Wei Xu. 2024. Having beer after prayer? measuring cultural bias in large language models. In Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers). 16366–16393. [33] Roberto Navigli, Simone Conia, and Björn Ross. 2023. Biases in large language models: origins, inventory, and discussion. ACM Journal of Data and Information Quality 15, 2 (2023), 1–21. [34]Tuan-Phong Nguyen, Simon Razniewski, and Gerhard Weikum. 2024. Cultural commonsense knowledge for intercultural dialogues. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 1774– 1784. 
[35] Siddhesh Pawar, Junyeong Park, Jiho Jin, Arnav Arora, Junho Myung, Srishti Yadav, Faiz Ghifari Haznitrama, Inhwa Song, Alice Oh, and Isabelle Augenstein. 2025. Survey of cultural awareness in language models: Text and beyond. Computational Linguistics (2025), 1–96.
[36] Haoyi Qiu, Alexander Richard Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu. 2025. Evaluating cultural and social awareness of llm web agents. In Findings of the Association for Computational Linguistics: NAACL 2025. 3978–4005.
[37] Haoyi Qiu, Kung-Hsiang Huang, Ruichen Zheng, Jiao Sun, and Nanyun Peng. 2025. Multimodal cultural safety: Evaluation frameworks and alignment strategies. arXiv preprint arXiv:2505.14972 (2025).
[38] Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tianyi Tang, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, and Zihan Qiu. 2025. Qwen2.5 Technical Report. arXiv:2412.15115 [cs.CL] https://arxiv.org/abs/2412.15115
[39] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems 36 (2023), 53728–53741.
[40] David Romero, Chenyang Lyu, Haryo Wibowo, Santiago Góngora, Aishik Mandal, Sukannya Purkayastha, Jesus-German Ortiz-Barajas, Emilio Cueva, Jinheon Baek, Soyeong Jeong, et al. 2024. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
Advances in Neural Information Processing Systems 37 (2024), 11479–11505. [41] Paul Röttger, Fabio Pernisi, Bertie Vidgen, and Dirk Hovy. 2025. Safetyprompts: a systematic review of open datasets for evaluating and improving large language model safety. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 27617–27627. [42]Jaidev Shah, Iman Barjasteh, Amey Barapatre, Rana Forsati, Gang Luo, Fan Wu, Yuan Fang, Xue Deng, Blake Shepard, Ronak Shah, et al.2025. Towards Web- scale Recommendations with LLMs: From Quality-aware Ranking to Candidate Generation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 2514–2524. [43] Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Nikul Patel, Jalal Naghiyev, Yann LeCun, and Ravid Shwartz-Ziv. 2025. Layer by Layer: Uncovering Hidden Representations in Language Models. In Forty-second International Conference on Machine Learning. [44] Xinyuan Song, Keyu Wang, PengXiang Li, Lu Yin, and Shiwei Liu. 2025. De- mystifying the roles of llm layers in retrieval, knowledge, and reasoning. arXiv preprint arXiv:2510.02091 (2025). [45]Nicholas Sukiennik, Chen Gao, Fengli Xu, and Yong Li. 2025. An evaluation of cultural value alignment in llm. arXiv preprint arXiv:2504.08863 (2025). [46]Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Saizhuo Wang, Chen Lin, Yeyun Gong, Lionel Ni, Heung-Yeung Shum, and Jian Guo. 2024. Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. In The Twelfth International Conference on Learning Representations. [47] Hisami Suzuki, Satoru Katsumata, Takashi Kodama, Tetsuro Takahashi, Kouta Nakayama, and Satoshi Sekine. 2025. AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output. arXiv preprint arXiv:2506.02372 (2025). [48]Armin Toroghi, Ali Pesaranghader, Tanmana Sadhu, and Scott Sanner. 2025. LLM- based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases. 
In The Thirteenth International Conference on Learning Representations. [49] Elena Voita, Javier Ferrando, and Christoforos Nalmpantis. 2024. Neurons in large language models: Dead, n-gram, positional. In Findings of the Association for Computational Linguistics: ACL 2024. 1288–1301. [50] Rongzheng Wang, Shuang Liang, Qizhi Chen, Jiasheng Zhang, and Ke Qin. 2025. Graphtool-instruction: Revolutionizing graph reasoning in llms through decom- posed subtask instruction. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1492–1503. [51]Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, et al.2025. SoAy: A solution-based LLM API-using methodology for academic information seeking. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 2660–2671. [52]Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson. 2024. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. In Forty-first International Conference on Machine Learning. [53]Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al.2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837. [54]Jincenzi Wu, Jianxun Lian, Dingdong Wang, and Helen Meng. 2025. Socialcc: Interactive evaluation for cultural competence in language agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 33242–33271. [55]Qinchen Yang, Zhiqing Hong, Dongjiang Cao, Haotian Wang, Zejun Xie, Tian He, Yunhuai Liu, Yu Yang, and Desheng Zhang. 2025. AddrLLM: Address rewriting via large language model on nationwide logistics data. 
In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 2756–2767. [56] Yukang Yang, Declan Iain Campbell, Kaixuan Huang, Mengdi Wang, Jonathan D Cohen, and Taylor Whittington Webb. 2025. Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models. In Forty-second Interna- tional Conference on Machine Learning. [57] Yunzhi Yao, Ningyu Zhang, Zekun Xi, Mengru Wang, Ziwen Xu, Shumin Deng, and Huajun Chen. 2024. Knowledge circuits in pretrained transformers. Advances in Neural Information Processing Systems 37 (2024), 118571–118602. [58]Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, and Kai-Wei Chang. 2022. GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2039–2055. [59]Da Yin, Haoyi Qiu, Kung-Hsiang Huang, Kai-Wei Chang, and Nanyun Peng. 2024. Safeworld: Geo-diverse safety alignment. Advances in Neural Information Processing Systems 37 (2024), 128734–128768. [60]Zonghao Ying, Aishan Liu, Siyuan Liang, Lei Huang, Jinyang Guo, Wenbo Zhou, Xianglong Liu, and Dacheng Tao. 2026. Safebench: A safety evaluation framework for multimodal large language models. International Journal of Computer Vision 134, 1 (2026), 18. [61] Chen Zhang, Zhiyuan Liao, and Yansong Feng. 2025. Cross-Lingual Transfer of Cultural Knowledge: An Asymmetric Phenomenon. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Association for Computational Linguistics, Vienna, Austria, 147– 157. [62]Chongwen Zhao and Kaizhu Huang. 2025. Unraveling llm jailbreaks through safety knowledge neurons. arXiv preprint arXiv:2509.01631 (2025). A Supplementary Construction Information Data Sources. 
The three sources from which we collect cultural materials are listed below:
• The Ministry of Foreign Affairs of the People's Republic of China, which supplies cultural materials about different countries/regions, such as local taboos and etiquette. Website: https://cs.mfa.gov.cn/zggmcg/ljmdd/.
• Cultural Atlas, an Australian educational resource providing comprehensive cultural information, such as daily common sense. Website: https://culturalatlas.sbs.com.au/countries.
• Commisceo, which focuses on building the ability to work across cultures and supplies practical cultural information like greetings. Website: https://commisceo-global.com/categories/country-guides/.

Instructions. To guide LLMs to generate outputs aligned with our expectations, we employ carefully designed instructions. In detail, Fig. 9, Fig. 10, and Fig. 11 show the instructions we used.

Dataset Statistics. The constructed dataset, named AdaCultureSafe, contains 4.8K fine-grained cultural descriptions spanning 22 countries on six continents. Each description is equipped with roughly 5 cultural knowledge queries and 5 cultural safety-oriented queries. In total, AdaCultureSafe provides 24K queries for cultural knowledge evaluation and 24K queries for cultural safety assessment. The detailed statistics of the AdaCultureSafe dataset are shown in Tab. 2.

B Supplementary Experimental Information

Generation Parameters. Regarding the parameters for query generation and LLM responding, we set the temperature to 0 and top_P to 0.9 for the generations.

Experiments on LLMs With Different Parameter Sizes. Tab. 3 shows the evaluation results of Qwen2.5 family LLMs with 7B, 14B, and 32B parameters. We find that the performance of both cultural safety and knowledge improves as the parameter scale increases. Nevertheless, the correlation between cultural safety and knowledge remains weak.

Training Parameters.
Our method employs LoRA-based techniques [15] during DPO, and Tab. 5 shows the parameter settings. Tab. 4 shows the complete results of Llama3.1-8B after being trained on cultural knowledge-grounded safety data.

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models. KDD ’26, August 09–13, 2026, Jeju, Korea.

Figure 9: Cultural safety query generation.
Figure 10: Cultural knowledge query generation.
Figure 11: Scoring the respect of responses.

Table 2: Dataset statistics. The numbers in brackets are the average number of queries generated for each description.

Countries       Cultural Descriptions   Queries of Knowledge   Queries of Safety
Afghanistan     231                     1.1K (4.97)            1.2K (5.00)
Australia       193                     1.0K (4.95)            1.0K (5.00)
Brazil          160                     0.8K (4.98)            0.8K (5.00)
Canada          157                     0.8K (4.98)            0.8K (5.00)
China           199                     1.0K (5.00)            1.0K (4.99)
Colombia        160                     0.8K (4.99)            0.8K (5.00)
Ethiopia        240                     1.2K (4.97)            1.2K (5.00)
India           243                     1.2K (4.96)            1.2K (5.00)
Iraq            228                     1.1K (5.00)            1.1K (5.00)
Japan           285                     1.4K (4.97)            1.4K (5.00)
South Korea     240                     1.2K (4.97)            1.2K (5.00)
Malaysia        224                     1.1K (4.93)            1.1K (5.00)
Mexico          187                     0.9K (4.96)            0.9K (5.00)
New Zealand     251                     1.2K (4.96)            1.3K (5.00)
Philippines     190                     0.9K (4.96)            0.9K (5.00)
Russia          225                     1.1K (4.96)            1.1K (5.00)
Saudi Arabia    261                     1.3K (4.98)            1.3K (5.00)
Spain           215                     1.0K (4.90)            1.1K (5.00)
Thailand        241                     1.2K (5.00)            1.2K (5.00)
USA             260                     1.3K (4.96)            1.3K (5.00)
Ukraine         228                     1.1K (4.96)            1.1K (5.00)
Vietnam         220                     1.1K (4.96)            1.1K (5.00)
Overall         4.8K                    24K (4.97)             24K (5.00)

Table 3: The evaluation results of LLMs. In the original typeset table, the best performance within the same column is highlighted in bold italics, and the best performance within the same row is marked in blue, green, and red for Acc, Respect, and F1, respectively. The values in parentheses in the Corr entries are the p-values of the Spearman correlation coefficient. * denotes significance (p<0.05). Each cell lists Acc / Respect / F1 / Corr (p).

Countries       Qwen2.5-7B                             Qwen2.5-14B                            Qwen2.5-32B
Afghanistan     90.40 / 50.55 / 62.56 / −0.06 (0.36)   91.06 / 64.04 / 73.25 / −0.02 (0.71)   91.87 / 64.78 / 74.07 / −0.04 (0.56)
Australia       92.02 / 70.29 / 78.35 / 0.02 (0.78)    93.83 / 79.01 / 84.88 / 0.06 (0.39)    96.22 / 78.01 / 85.14 / −0.01 (0.86)
Brazil          86.84 / 58.85 / 66.91 / −0.02 (0.84)   89.59 / 68.96 / 75.97 / −0.03 (0.70)   89.03 / 71.76 / 77.06 / −0.03 (0.73)
Canada          90.32 / 71.89 / 78.78 / 0.23* (0.00)   92.61 / 78.09 / 83.64 / 0.04 (0.59)    92.99 / 79.68 / 84.80 / 0.01 (0.95)
China           90.45 / 59.72 / 69.13 / −0.19* (0.01)  92.66 / 72.99 / 79.93 / −0.12 (0.08)   93.77 / 72.93 / 80.80 / 0.05 (0.48)
Colombia        89.75 / 54.88 / 66.13 / −0.05 (0.51)   91.38 / 66.61 / 75.48 / −0.01 (0.93)   92.38 / 70.19 / 78.36 / 0.11 (0.19)
Ethiopia        88.19 / 57.96 / 67.83 / 0.14* (0.03)   91.44 / 70.89 / 78.28 / 0.07 (0.26)    90.79 / 73.44 / 79.55 / −0.03 (0.61)
India           91.30 / 59.58 / 70.19 / 0.05 (0.46)    92.51 / 73.82 / 80.20 / −0.05 (0.44)   93.28 / 74.85 / 81.44 / −0.02 (0.75)
Iraq            85.77 / 52.42 / 62.47 / −0.02 (0.77)   88.49 / 67.63 / 74.72 / 0.02 (0.80)    88.40 / 68.01 / 75.09 / −0.00 (0.97)
Japan           87.04 / 61.69 / 70.19 / 0.10 (0.11)    89.93 / 75.82 / 81.05 / 0.15* (0.01)   91.49 / 75.06 / 80.93 / 0.01 (0.88)
South Korea     89.46 / 59.21 / 69.35 / −0.06 (0.37)   92.54 / 72.01 / 79.57 / −0.02 (0.76)   93.38 / 72.16 / 80.22 / −0.05 (0.41)
Malaysia        87.08 / 56.20 / 65.78 / −0.01 (0.83)   90.92 / 70.47 / 77.99 / 0.09 (0.16)    92.61 / 71.71 / 79.64 / 0.10 (0.15)
Mexico          89.95 / 54.68 / 65.80 / −0.00 (0.96)   90.11 / 70.64 / 77.44 / −0.03 (0.67)   92.97 / 70.15 / 78.80 / 0.12 (0.10)
New Zealand     90.50 / 65.68 / 74.36 / 0.10 (0.13)    94.10 / 76.57 / 83.30 / 0.08 (0.23)    94.50 / 77.47 / 84.06 / −0.01 (0.86)
Philippines     89.63 / 54.76 / 65.78 / −0.14* (0.05)  90.11 / 69.99 / 76.49 / −0.19* (0.01)  91.05 / 70.12 / 77.37 / −0.20* (0.00)
Russia          85.09 / 50.07 / 60.73 / 0.04 (0.53)    88.49 / 62.07 / 70.59 / 0.02 (0.75)    88.22 / 64.90 / 72.77 / −0.03 (0.65)
Saudi Arabia    87.64 / 48.48 / 60.46 / −0.08 (0.18)   91.17 / 61.42 / 71.73 / 0.04 (0.51)    91.34 / 63.79 / 73.49 / −0.02 (0.78)
Spain           87.67 / 60.57 / 68.96 / −0.10 (0.16)   89.60 / 70.87 / 77.20 / −0.01 (0.92)   90.31 / 71.72 / 78.31 / −0.08 (0.22)
Thailand        90.04 / 60.27 / 70.00 / 0.02 (0.79)    91.78 / 73.49 / 80.02 / 0.05 (0.47)    92.86 / 72.01 / 79.38 / −0.07 (0.30)
USA             93.58 / 72.66 / 80.54 / −0.01 (0.93)   95.04 / 78.82 / 85.46 / 0.00 (1.00)    96.12 / 81.31 / 87.56 / 0.00 (0.96)
Ukraine         87.55 / 56.89 / 66.77 / 0.11 (0.11)    90.07 / 67.97 / 75.75 / 0.04 (0.52)    90.29 / 70.71 / 77.47 / 0.05 (0.41)
Vietnam         89.60 / 57.47 / 67.81 / −0.00 (0.99)   91.66 / 71.20 / 78.15 / 0.01 (0.88)    93.34 / 72.85 / 80.48 / 0.05 (0.46)
Overall         89.07 / 58.79 / 68.55 / 0.03* (0.02)   91.36 / 71.08 / 78.26 / 0.04* (0.01)   92.17 / 72.15 / 79.40 / 0.03 (0.07)

Table 4: Complete results of finetuned Llama3.1-8B. Overall performance: Acc: 86.80, Respect: 67.22, F1: 73.55, Corr: −0.03 (0.04). The values in parentheses are the p-values of the Spearman correlation coefficient.

Country         Acc↑    Respect↑  F1↑     Corr↑ (p)
Afghanistan     89.03   66.46     73.70   −0.15 (0.02)
Australia       89.27   69.98     76.81   −0.00 (0.99)
Brazil          86.56   67.71     73.67   −0.01 (0.92)
Canada          87.52   70.60     76.79   0.10 (0.22)
China           87.14   64.05     71.08   −0.14 (0.06)
Colombia        86.50   65.47     72.41   −0.07 (0.38)
Ethiopia        86.25   71.03     75.89   −0.02 (0.71)
India           89.55   66.68     74.30   −0.01 (0.85)
Iraq            83.66   67.62     72.37   −0.05 (0.44)
Japan           84.12   68.24     72.95   −0.00 (0.94)
South Korea     88.44   66.61     74.08   −0.15 (0.02)
Malaysia        85.94   66.01     72.55   0.08 (0.22)
Mexico          88.80   69.89     76.66   0.02 (0.81)
New Zealand     89.04   72.64     78.05   −0.06 (0.32)
Philippines     84.97   65.56     71.51   −0.17 (0.02)
Russia          83.07   60.96     67.68   −0.12 (0.07)
Saudi Arabia    86.48   58.48     67.67   −0.05 (0.42)
Spain           84.46   70.18     74.30   −0.07 (0.28)
Thailand        87.39   69.93     75.33   0.02 (0.80)
USA             90.71   65.85     74.99   0.10 (0.12)
Ukraine         83.68   67.21     71.75   −0.01 (0.92)
Vietnam         87.08   69.02     74.64   −0.06 (0.37)

Table 5: The training parameter settings of our method.

Parameters          Values
Lora_r              16
Lora_alpha          32
Learning rate       5e-5
Batch size          2
Epochs              2
Targeted modules    q_proj, v_proj

C Neurons Within LLMs
LLMs are composed of multiple layers, each of which includes two core blocks, i.e., an attention block and an MLP block, and every MLP block contains two fully connected layers. Existing studies find that many neurons reside in the MLP blocks. Formally [7, 62], for the MLP block in the $l$-th layer, the forward process can be formulated as:

$E^{l+1} = F(X^l W^l_1) W^l_2$,  (6)

where $E^{l+1} \in \mathbb{R}^{1 \times d}$ refers to the output features of the MLP block, and $X^l \in \mathbb{R}^{1 \times d}$ is the input features of the MLP block. $W^l_1 \in \mathbb{R}^{d \times w}$ and $W^l_2 \in \mathbb{R}^{w \times d}$ denote the weight matrices of the two fully connected layers within the MLP block, respectively. Note that the bias parameters of the fully connected layers are omitted for simplicity. $F$ represents the non-linear activation function.
Here, $W^l_2$ can be viewed as a set of neurons $W^l_{2,i}$, $i = 1, 2, \ldots, w$, and $A^l = F(X^l W^l_1) \in \mathbb{R}^{1 \times w}$ accordingly gives the activation strengths of these neurons, with $A^l_i$ the activation strength of the $i$-th neuron.
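The forward pass in Eq. (6) and the reading of the rows of $W^l_2$ as neurons can be sketched numerically. The following is a minimal illustration with made-up dimensions ($d = 4$, $w = 8$) and a ReLU activation; these are assumptions for the example, not the configuration of any model evaluated above.

```python
import numpy as np

# Minimal sketch of the MLP-block forward pass in Eq. (6).
# Dimensions d=4, w=8 and the ReLU activation are illustrative choices.
rng = np.random.default_rng(0)
d, w = 4, 8

X = rng.normal(size=(1, d))   # input features X^l
W1 = rng.normal(size=(d, w))  # first fully connected layer W^l_1
W2 = rng.normal(size=(w, d))  # second fully connected layer W^l_2

def F(z):
    # Non-linear activation; ReLU here (real LLMs often use GELU/SiLU).
    return np.maximum(z, 0.0)

A = F(X @ W1)  # activation strengths A^l of the w neurons, shape (1, w)
E = A @ W2     # output features E^{l+1}, shape (1, d)

# The i-th neuron is the row W2[i]; its contribution to the output is
# its activation strength A[0, i] times that row, and summing the
# contributions of all w neurons recovers E^{l+1} exactly.
contrib = sum(A[0, i] * W2[i] for i in range(w))
assert np.allclose(contrib, E[0])
```

Under this view, locating utility-related neurons amounts to inspecting which entries of $A^l$ activate strongly for a given class of inputs.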
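The Corr columns in Tab. 3 and Tab. 4 report Spearman correlation coefficients with their p-values. As a reference for how the coefficient itself is obtained, here is a minimal pure-Python sketch (rank correlation without tie handling; in practice one would use `scipy.stats.spearmanr`, which also returns the p-value). The input lists below are illustrative, not data from the paper.

```python
def ranks(xs):
    # Assign rank 1..n by ascending value (no tie handling in this sketch).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank + 1.0
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotone pairs give rho = 1.0 (or -1.0 when reversed).
assert abs(spearman([1, 2, 3, 4], [10, 20, 30, 40]) - 1.0) < 1e-9
assert abs(spearman([1, 2, 3, 4], [40, 30, 20, 10]) + 1.0) < 1e-9
```

A coefficient near 0 with a large p-value, as in most rows of Tab. 3, indicates no significant monotone relationship between the safety and knowledge scores.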
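The settings in Tab. 5 map onto a LoRA configuration roughly as follows. This is a hedged sketch: the field names mirror the common LoRA/PEFT convention (`lora_r`, `lora_alpha`, `target_modules`), not the authors' actual training script, and the dataclass is purely illustrative.

```python
from dataclasses import dataclass

# Illustrative container for the LoRA-DPO settings reported in Tab. 5.
@dataclass(frozen=True)
class LoraDpoSettings:
    lora_r: int = 16                  # rank of the low-rank update
    lora_alpha: int = 32              # scaling factor
    learning_rate: float = 5e-5
    batch_size: int = 2
    epochs: int = 2
    target_modules: tuple = ("q_proj", "v_proj")  # attention projections

cfg = LoraDpoSettings()
# LoRA adapts W as W + (alpha / r) * B @ A, so alpha / r fixes the
# effective scale of the learned update; here alpha / r = 2.
scaling = cfg.lora_alpha / cfg.lora_r
```

Restricting the targeted modules to the query and value projections keeps the number of trainable parameters small during DPO.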