← Back to papers

Paper deep dive

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

Year: 2026 · Venue: arXiv preprint · Area: cs.MA · Type: Preprint · Embeddings: 25

Abstract

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies across multiple agents, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.

Tags

ai-safety (imported, 100%) · alignment-training (suggested, 80%) · csma (suggested, 92%) · preprint (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/13/2026, 12:07:53 AM

Summary

The paper introduces the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a multi-agent framework designed to improve LLM alignment with human values. By instantiating multiple moral agents representing distinct normative perspectives (Authority, Care, Fairness, Loyalty, Sanctity) and fusing their outputs using rank- and score-based aggregation (CFA), the system leverages cognitive diversity to mitigate conflicts and improve response quality compared to single-agent baselines.

Entities (5)

Combinatorial Fusion Analysis · methodology · 100%
VAS-CFA · framework · 100%
Direct Preference Optimization · algorithm · 95%
Moral Integrity Corpus · dataset · 95%
Cognitive Diversity · metric · 90%

Relation Signals (4)

VAS-CFA utilizes Combinatorial Fusion Analysis

confidence 100% · We propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA)

VAS-CFA trained on Moral Integrity Corpus

confidence 95% · The Moral Integrity Corpus (MIC) [27] is used for fine-tuning the individual agents.

Combinatorial Fusion Analysis measures Cognitive Diversity

confidence 90% · CFA characterizes a scoring system A with a score function s_A... cognitive diversity (CD) between A and B

VAS-CFA outperforms CVA-GS

confidence 90% · It also outperforms previous multi-agent results by CVA-GS and CVA-GS-DYN.

Cypher Suggestions (2)

Find all frameworks that utilize a specific methodology · confidence 90% · unvalidated

MATCH (f:Framework)-[:UTILIZES]->(m:Methodology {name: 'Combinatorial Fusion Analysis'}) RETURN f.name

Identify performance comparisons between models · confidence 85% · unvalidated

MATCH (a:Model)-[r:OUTPERFORMS]->(b:Baseline) RETURN a.name, b.name

Full Text

24,305 characters extracted from source content.


Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Abstract

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies across multiple agents, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.

Index Terms— cognitive diversity, combinatorial fusion analysis, large language models, multi-agent systems, value alignment

1 Introduction

Aligning large language models (LLMs) with human values is critical because models pretrained on broad web corpora can produce outputs that are untruthful, unsafe, or misaligned with user intentions [2, 23]. Alignment methods were developed to close these gaps. In recent years, numerous techniques have been developed to better align LLMs with human values, several of which we summarize below. The canonical alignment approach is Reinforcement Learning from Human Feedback (RLHF).
This process involves supervised fine-tuning on demonstrations, followed by policy optimization using a reward model trained from human pairwise preferences, yielding strong reductions in toxic or unsafe outputs [19]. To reduce the high cost of human labeling, RLAIF [3] largely replaces human raters with AI judges guided by a written constitution that supplies critiques and preference labels, producing a harmless but not evasive assistant with far fewer human ratings. Other methods aim to simplify the complex optimization process. For instance, Direct Preference Optimization (DPO) [20] replaces online RL with a simple supervised objective over preference pairs, yet still matches or exceeds the performance of RLHF with proximal policy optimization (PPO).

[Fig. 1: The diagram for the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA): agents A–E each produce a response; responses are decomposed into a pool of n moral units; the CFA step applies score and rank combinations, both averaged (ASC/ARC) and weighted by diversity strength (WSCDS/WRCDS); a paraphraser then selects the best model and paraphrases the top unit.]

Moving beyond pairwise preferences, researchers are now leveraging richer textual signals to directly steer model behavior. For instance, Self-Refine [18] instructs a model to revise its own output, improving both human preference and task performance scores without any additional weight updates. Similarly, Reflexion [21] converts feedback into verbal self-supervision by having the model generate brief reflections that guide its subsequent attempts, yielding better reasoning performance without additional training. Unifying these concepts, Diverse AI Feedback (DAIF) [25] combines critique, refinement, and preference into a single pipeline. It routes tasks to the most informative feedback type, outperforming single-feedback baselines.
This demonstrates that heterogeneous feedback is a powerful mechanism for improving alignment. A growing literature warns that RLxF can result in narrow objectives and miss crucial ethical complexity [6]. In response, methods are emerging to address this gap. User-Driven Value Alignment [10] documents users' strategies to notice, contest, and correct biased outputs, motivating interfaces for bottom-up steering and collective auditing. At community scale, STELA [4] conducts deliberative norm-elicitation with underrepresented groups and derives a community ruleset for alignment, positioning alignment criteria as iterative and revisable.

The system proposed in this paper contrasts with prior alignment methods centered on a single agent. The Value Alignment System using CFA (VAS-CFA) assembles multiple, diverse moral agents, each instantiated to capture a distinct normative perspective, and fuses their signals via a combinatorial fusion analysis framework with score-based and rank-based aggregation. By leveraging diversity across agents rather than relying on a single evaluator, and by explicitly integrating rankings into the fusion step, VAS-CFA offers a new mechanism for steering LLMs toward behavior that better reflects human values.

2 Combinatorial Fusion Analysis (CFA) and Kemeny Rank Space (K_n)

Combinatorial Fusion Analysis (CFA) provides methods and workflows for combining multiple scoring systems (MSS) (e.g., multiple classifier systems, multiple expert systems, multiple language models, and multiple agent systems) in computational learning and modeling, informatics, and intelligent ML/AI systems [12, 15]. CFA characterizes a scoring system A with a score function s_A, a derived rank function r_A, and a function that relates normalized score values to rank values [12, 15, 14].

[Fig. 2: Rank-score function graphs for question q_8657: f_A, f_B, f_C, f_D, and f_E refer to agents A, B, C, D, and E w.r.t. Authority, Care, Fairness, Loyalty, and Sanctity, respectively.]

Let A be a scoring system on the dataset D = {d_1, …, d_n}. Let s_A: D → R be a score function. The rank function r_A: D → N is derived by sorting the score values in decreasing order and assigning increasing rank values to the data items in D. The rank-score function (RSF) f_A: N → R is defined as f_A(i) = s_A(r_A^{-1}(i)) = (s_A ∘ r_A^{-1})(i). For scoring systems A and B, the cognitive diversity (CD) between A and B, CD(A, B), is defined as the difference between f_A and f_B [12, 14, 13]:

CD(A, B) = d(f_A, f_B) = ( Σ_{i=1}^{n} (f_A(i) − f_B(i))^2 / (n − 1) )^{1/2}.

In addition, the diversity strength of the scoring system A, DS(A), is the average of CD(A, A′) between A and all other scoring systems A′ under consideration.

[Fig. 3: F1 BERTScore across 26 combinations under four CFA combination types (ASC, WSCDS, ARC, WRCDS) for question q_8657 (ASC sorted in non-decreasing order in each model group).]
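These constructions are mechanical enough to sketch in a few lines of NumPy. The block below is an illustration, not the authors' code: the three scoring systems and four items are invented toy data. It derives rank functions, rank-score functions, cognitive diversity, diversity strength, and the four fusion types used in the paper (ASC/ARC and the diversity-strength-weighted WSCDS/WRCDS defined below).

```python
import numpy as np
from itertools import combinations

def rank_and_rsf(scores):
    """Rank values (1 = highest score) and the rank-score function f(i) = score at rank i."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks, scores[order]

def cognitive_diversity(rsf_a, rsf_b):
    """CD(A,B) = ( sum_i (f_A(i) - f_B(i))^2 / (n - 1) )^(1/2)."""
    n = len(rsf_a)
    return float(np.sqrt(np.sum((rsf_a - rsf_b) ** 2) / (n - 1)))

# Invented normalized scores: three scoring systems over four data items.
scores = {
    "A": np.array([0.9, 0.4, 0.7, 0.1]),
    "B": np.array([0.8, 0.6, 0.3, 0.2]),
    "C": np.array([0.5, 0.9, 0.35, 0.6]),
}
ranks = {k: rank_and_rsf(v)[0] for k, v in scores.items()}
rsfs = {k: rank_and_rsf(v)[1] for k, v in scores.items()}

# Diversity strength DS(A): average CD between A and every other system.
ds = {k: np.mean([cognitive_diversity(rsfs[k], rsfs[j]) for j in scores if j != k])
      for k in scores}

subset = ["A", "B", "C"]
asc = np.mean([scores[k] for k in subset], axis=0)    # average score combination
arc = np.mean([ranks[k] for k in subset], axis=0)     # average rank combination (lower = better)
wscds = sum(scores[k] * ds[k] for k in subset) / sum(ds[k] for k in subset)
wrcds = sum(ranks[k] / ds[k] for k in subset) / sum(1.0 / ds[k] for k in subset)

# With the paper's five agents, subsets of size 2..5 give 26 fusible combinations.
n_combos = sum(1 for t in range(2, 6) for _ in combinations("ABCDE", t))

print("DS:", {k: round(v, 4) for k, v in ds.items()})
print("ASC winner:", int(np.argmax(asc)), "| ARC winner:", int(np.argmin(arc)))
print("subsets of >= 2 of 5 agents:", n_combos)
```

Score combinations select the item with the highest fused score, while rank combinations select the lowest combined rank; varying the toy scores is a quick way to probe the paper's observation [11] that rank combination tends to help when cognitive diversity is large.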
The weighted score combination by diversity strength (WSCDS) and the weighted rank combination by diversity strength (WRCDS) are defined, for each t in {2, 3, 4, 5}, as follows:

s_WSCDS(d_i) = Σ_{A_j ∈ A′} s_{A_j}(d_i) · DS(A_j) / Σ_{A_j ∈ A′} DS(A_j), d_i ∈ D,

s_WRCDS(d_i) = Σ_{A_j ∈ A′} r_{A_j}(d_i) · (1 / DS(A_j)) / Σ_{A_j ∈ A′} (1 / DS(A_j)), d_i ∈ D,

where A′ is any subset of the full scoring-system set A = {A_1, …, A_5} containing at least 2 elements, i.e., A′ ⊆ A and |A′| = t with t ∈ {2, 3, 4, 5}; s_{A_j}(d_i) and r_{A_j}(d_i) are the score and rank assigned by scoring system A_j to the data item d_i, respectively; and DS(A_j) is the diversity strength of scoring system A_j.

CFA has been applied to a wide variety of domain applications in scientific discovery and decision making, including more recent results in drug discovery and materials science [22, 24]. It was shown [11] that rank combination tends to perform better w.r.t. larger cognitive diversity. The CFA architecture entails both a continuous Euclidean space and a discrete rank space, corresponding to score function vectors and rank function vectors, respectively. When the score function is a 1-1 function, the rank function on the dataset D with n elements is a complete permutation of the n positive integers [1, n] = (1, 2, 3, …, n). In this case, the set of complete permutations on [1, n] is the symmetric group of order n under the composition operator ([8] and [16] suppl. inf.), denoted S_n. If we consider each permutation as a vertex in a graph of n! vertices and define adjacency between two vertices as a swap of an adjacent pair, the resulting graph is the bubble-sort Cayley graph B_n. (This Cayley graph is generated by the set of all transpositions of adjacent pairs.)
The graph B_n is (n−1)-regular with connectivity n−1, and B_n can be recursively constructed from n copies of B_{n−1} [16, 26]. When the score function is not 1-1, the derived ranking has ties. In this case, the ranking is not a complete permutation. Kemeny and Snell proposed a metric which includes tie rankings [17]. The Kemeny rank space K_n with this metric has more vertices than n! (the number of vertices of B_n). Works related to K_n include rank aggregation, multi-layer combinatorial fusion, and priority ranking [1, 26, 5].

3 VAS-CFA workflow

We propose a new framework that integrates Combinatorial Fusion Analysis into multi-agent aggregation. We refer to it as the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA). Figure 1 shows the main steps of the proposed system. In the first step of our VAS-CFA workflow, we fine-tuned five value-specific agents, Authority (A), Care (B), Fairness (C), Loyalty (D), and Sanctity (E), starting from the OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 SFT checkpoint, using Direct Preference Optimization (DPO) with QLoRA [7] on a single NVIDIA A100-40GB. The Moral Integrity Corpus (MIC) [27] is used for fine-tuning the individual agents. The MIC dataset is publicly available and provides a large set of 113.8K prompt-response pairs with human-revised answers and rich ethical annotations. It uses fixed data splits of 91.0K/11.4K/11.4K samples for train/validation/test. Each moral agent was trained independently from the same base for 1 epoch with β = 0.1, learning rate 1 × 10^{-5}, per-device batch size 2, and 8 gradient-accumulation steps. QLoRA loads the 12B base model in 4-bit NF4 and trains only LoRA adapters, enabling practical single-GPU fine-tuning. We chose DPO because it optimizes human preferences without a reward model or online RL, which simplifies the pipeline and improves training stability and sample efficiency relative to PPO-based RLHF.
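To see why DPO needs no reward model, consider its per-pair objective. The sketch below is illustrative only (not the authors' training code): the log-probabilities are hypothetical summed token log-probs of the chosen (w) and rejected (l) responses under the policy and a frozen reference model, and β = 0.1 matches the paper's setting.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit reward margin).

    Each response's implicit reward is its policy-vs-reference log-probability
    ratio, so preferences are optimized without a separate reward model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written in a numerically stable form.
    return float(np.log1p(np.exp(-margin)))

# Before training, the policy equals the reference: margin 0, loss log(2).
print(dpo_loss(-12.0, -14.0, -12.0, -14.0))
# Raising the chosen response (and lowering the rejected one) relative
# to the reference shrinks the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -14.0))
```

Minimizing this loss over a dataset of preference pairs is a plain supervised objective, which is the stability and simplicity advantage the paper cites over PPO-based RLHF.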
After fine-tuning five moral agents, each aligned with a distinct moral value, we obtain, for every test prompt, one generated response from each agent (Figure 1(a)). A naïve approach is to aggregate these five responses, following the value-fusion strategy of Dognin et al. [9], which combines outputs using contextual information. However, direct aggregation risks semantic conflict: different agents may express incompatible moral commitments, yielding diluted or incoherent answers and weakening value alignment. Motivated by this, we decompose each agent's output into moral units using GPT-4.1 nano, where each unit conveys a single moral claim (Figure 1(b)). This design is further supported by the observation that human-revised ground-truth answers are typically brief and contain only one or two moral ideas, whereas conversational model outputs are longer and multi-thematic. For example, Agent B (Care) produces the response "Promoting intelligence should be prioritized to ensure your child grows up healthy and prosperous", which is decomposed into three units: "Promoting intelligence should be prioritized", "Ensuring your child grows up healthy is important", and "Ensuring your child grows up prosperous is important". We then pool together all moral units extracted from the five agent responses.

To score each unit for its alignment with each of the five moral values, we train a "moral classifier". Concretely, we encode the human-revised answer using SentenceTransformer (all-MiniLM-L6-v2) and fit a logistic regression model for multi-label prediction. Given a moral unit, the classifier returns five scores, one per moral value, yielding the scoring table used in the CFA step (Figure 1(c)). Interpreting each column as a value-specific scoring system, we obtain five scoring systems A, B, C, D, and E corresponding to the five moral agents, respectively.
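The moral classifier step can be sketched with scikit-learn. The paper encodes text with all-MiniLM-L6-v2 sentence embeddings; to keep this sketch runnable without model downloads, it substitutes a TF-IDF encoder, and the training texts and labels are invented for illustration, not drawn from MIC. What carries over is the multi-label logistic-regression scoring: one probability per moral value per unit, i.e., one row of the CFA scoring table.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

VALUES = ["Authority", "Care", "Fairness", "Loyalty", "Sanctity"]

# Tiny invented training set: (text, multi-hot label over the five values).
train_texts = [
    "Respect your elders and obey the rules",   # Authority
    "Make sure no one gets hurt",               # Care
    "Everyone deserves an equal share",         # Fairness
    "Stand by your family and friends",         # Loyalty
    "Keep the ritual pure and sacred",          # Sanctity
    "Obey the law and protect the vulnerable",  # Authority + Care
]
train_labels = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
])

# One binary logistic regression per moral value over shared text features.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(train_texts, train_labels)

# Score a moral unit: five probabilities -> one row of the CFA scoring table.
unit = "Ensuring your child grows up healthy is important"
probs = clf.predict_proba([unit])[0]
for value, p in zip(VALUES, probs):
    print(f"{value:9s} {p:.3f}")
```

Stacking these rows for every pooled moral unit, and reading each column as a value-specific scoring system, yields exactly the five scoring systems A–E that feed the CFA step.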
With the five scoring systems A, B, C, D, and E representing the five base moral agents, we use the CFA described in Section 2 to produce Σ_{i=2}^{5} C(5, i) = 26 combinations. In each of the 26 combinations, the CFA framework, using diversity strength as weight, gives rise to four types of combinations: (1) average score combination (ASC), (2) weighted score combination by diversity strength (WSCDS), (3) average rank combination (ARC), and (4) weighted rank combination by diversity strength (WRCDS). Next, we compare these 26 candidates, in each of the four types of CFA combinations (1)-(4), against the human-revised answer and retain the single best unit per configuration, yielding four units per prompt. In this study we proceed with the top unit only and pass it to a paraphraser that is prompted to answer the user's question while preserving the unit's moral content (Figure 1(d)). This step is necessary because a moral unit encodes a moral claim, not necessarily a complete answer. If multiple units were selected, an aggregation module would instead fuse them into a coherent answer.

4 Results

In this section, we report results for the VAS-CFA framework. Figure 2 plots the rank-score function graphs for question q_8657 within the test dataset. For each question, the five value-specific scoring systems yield five curves, which we overlay to visualize their mutual variation. Our results show that the five agents exhibit notable cognitive diversity across most of the questions in the test dataset.

Table 1: Performance summary (ROUGE-L and F1 BERTScore).

Model | (a) F1 ROUGE-L | (b) F1 BERTScore
(i) Individual moral agents
A | 0.0925 | 0.8569
B | 0.0821 | 0.8533
C | 0.1249 | 0.8628
D | 0.1376 | 0.8663
E | 0.1343 | 0.8653
(ii) Fusion methods with CFA
VAS-CFA: ASC | 0.1594 | 0.8831
VAS-CFA: ARC | 0.1691 | 0.8847
VAS-CFA: WSCDS | 0.1598 | 0.8832
VAS-CFA: WRCDS | 0.1692 | 0.8849
(iii) Fusion methods without CFA
Raw aggregation* | 0.1318 | 0.8654
CVA-GS [9] | 0.1120 | 0.8728
CVA-GS-DYN [9] | 0.1450 | 0.8754

Note: Agents A, B, C, D, and E refer to Authority, Care, Fairness, Loyalty, and Sanctity, respectively. * Raw aggregation = aggregating the original five responses from the moral agents.

Results of the 26 combinations in Figure 3 for question q_8657 show that rank combinations (ARC/WRCDS) consistently outperform score combinations (ASC/WSCDS) due to the cognitive diversity between moral agents exhibited in Figure 2 [11]. Moreover, diversity strength (DS) as weight gives rise to a desirable non-linear combination among individual agents. Next, we evaluate VAS-CFA using the ROUGE-L metric. ROUGE-L measures the longest common subsequence between a system output and a reference, capturing sentence-level overlap while allowing for non-contiguous matches; we report the standard F1 score, which balances precision and recall against the human-revised answer. Table 1(a) summarizes the results across three groups of models. Group (i) consists of the five base models. Group (ii) lists results from each of the four combinations (ASC, ARC, WSCDS, WRCDS). Group (iii) consists of raw aggregation, CVA-GS, and CVA-GS-DYN [9]. Finally, Table 1(b) summarizes F1 BERTScore results across the same three groups (i), (ii), and (iii) as in Table 1(a). In both cases, (a) F1 ROUGE-L and (b) F1 BERTScore in Table 1, VAS-CFA results exhibit improvement over results from the single moral agents. VAS-CFA also outperforms the previous multi-agent results of CVA-GS and CVA-GS-DYN. In addition, rank combinations (ARC/WRCDS) outperform score combinations (ASC/WSCDS) due to cognitive diversity between agents [11].

5 Conclusion

In this paper, we introduced the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that advances value alignment by operationalizing multi-agent aggregation.
Unlike prior approaches that rely on a single evaluator or narrowly defined reward signals, VAS-CFA instantiates multiple moral agents, each aligned with a distinct normative perspective, and integrates their outputs through Combinatorial Fusion Analysis (CFA). This design explicitly leverages cognitive diversity, using both rank- and score-based fusion to mitigate redundancy, resolve conflicts, and produce more coherent, value-sensitive responses. Our experimental results demonstrate that VAS-CFA is robust and consistently outperforms single-agent models and existing aggregation baselines across standard evaluation metrics. These findings highlight the potential of multi-agent fusion as a powerful mechanism for capturing pluralistic values and improving alignment quality.

References

[1] S. Akbari and A. R. Escobedo (2023) Beyond Kemeny rank aggregation: a parameterizable-penalty framework for robust ranking aggregation with ties. Omega 119, p. 102893.
[2] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané (2016) Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[3] Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. (2022) Constitutional AI: harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
[4] S. Bergman, N. Marchal, J. Mellor, S. Mohamed, I. Gabriel, and W. Isaac (2024) STELA: a community-centred approach to norm elicitation for AI alignment. Scientific Reports 14 (1), p. 6616.
[5] W. D. Cook and L. M. Seiford (1978) Priority ranking and consensus formation. Management Science 24 (16), p. 1721–1732.
[6] A. Dahlgren Lindström, L. Methnani, L. Krause, P. Ericson, Í. M. de Rituerto de Troya, D. Coelho Mollo, and R. Dobbe (2025) Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through reinforcement learning from human feedback. Ethics and Information Technology 27 (2), p. 28.
[7] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer (2023) QLoRA: efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems 36, p. 10088–10115.
[8] P. Diaconis (1988) Group representations in probability and statistics. Lecture Notes–Monograph Series 11.
[9] P. Dognin, J. Rios, R. Luss, P. Sattigeri, M. Liu, I. Padhi, M. Riemer, M. Nagireddy, K. Varshney, and D. Bouneffouf (2025) Contextual value alignment. In ICASSP 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. 1–5.
[10] X. Fan, Q. Xiao, X. Zhou, J. Pei, M. Sap, Z. Lu, and H. Shen (2025) User-driven value alignment: understanding users' perceptions and strategies for addressing biased and discriminatory statements in AI companions. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, p. 1–19.
[11] D. F. Hsu and I. Taksa (2005) Comparing rank and score combination methods for data fusion in information retrieval. Information Retrieval 8 (3), p. 449–480.
[12] D. F. Hsu, Y. Chung, and B. S. Kristal (2006) Combinatorial fusion analysis: methods and practices of combining multiple scoring systems. In Advanced Data Mining Technologies in Bioinformatics, p. 32–62.
[13] D. F. Hsu, B. S. Kristal, Y. Hao, and C. Schweikert (2019) Cognitive diversity: a measurement of dissimilarity between multiple scoring systems. Journal of Interconnection Networks 19 (01), p. 1940001.
[14] D. F. Hsu, B. S. Kristal, and C. Schweikert (2010) Rank-score characteristics (RSC) function and cognitive diversity. In International Conference on Brain Informatics, p. 42–54.
[15] D. F. Hsu, B. S. Kristal, and C. Schweikert (2024) Combinatorial fusion analysis. Computer 57 (09), p. 96–100.
[16] N. Jiang, M. Quazi, C. Schweikert, D. Hsu, T. Oprea, and S. Sirimulla (2023) Enhancing ADMET property models performance through combinatorial fusion analysis. ChemRxiv.
[17] J. G. Kemeny and J. L. Snell (1962) Mathematical Models in the Social Sciences, chapter Preference Rankings: An Axiomatic Approach. MIT Press, Cambridge.
[18] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, et al. (2023) Self-Refine: iterative refinement with self-feedback. Advances in Neural Information Processing Systems 36, p. 46534–46594.
[19] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. (2022) Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35, p. 27730–27744.
[20] R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn (2023) Direct preference optimization: your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, p. 53728–53741.
[21] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao (2023) Reflexion: language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36, p. 8634–8652.
[22] Y. Tang, Z. Li, M. A. N. Nellikkal, H. Eramian, E. M. Chan, A. J. Norquist, D. F. Hsu, and J. Schrier (2021) Improving data and prediction quality of high-throughput perovskite synthesis with model fusion. Journal of Chemical Information and Modeling 61 (4), p. 1593–1602.
[23] Y. Wang, W. Zhong, L. Li, F. Mi, X. Zeng, W. Huang, L. Shang, X. Jiang, and Q. Liu (2023) Aligning large language models with human: a survey. arXiv preprint arXiv:2307.12966.
[24] J. Yang, Y. Chen, T. Shen, B. S. Kristal, and D. F. Hsu (2005) Consensus scoring criteria for improving enrichment in virtual screening. Journal of Chemical Information and Modeling 45 (4), p. 1134–1146.
[25] T. Yu, T. Lin, Y. Wu, M. Yang, F. Huang, and Y. Li (2025) Diverse AI feedback for large language model alignment. Transactions of the Association for Computational Linguistics 13, p. 392–407.
[26] X. Zhong, L. Hurley, S. Sirimulla, C. Schweikert, and D. Hsu (2019) Combining multiple ranking systems on the generalized permutation rank space. In 2019 IEEE 5th International Conference on Big Data Intelligence and Computing (DATACOM), p. 123–129.
[27] C. Ziems, J. Yu, Y. Wang, A. Halevy, and D. Yang (2022) The Moral Integrity Corpus: a benchmark for ethical dialogue systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), p. 3755–3773.