
Paper deep dive

Ensemble Debates with Local Large Language Models for AI Alignment

Ephraiem Sarabamoun

Year: 2025 | Venue: arXiv preprint | Area: Scalable Oversight | Type: Empirical | Embeddings: 0

Models: DeepSeek-R1-32B, DeepSeek-R1-7B, Mistral-7B, Phi-3-3.8B

Abstract

As large language models (LLMs) take on greater roles in high-stakes decisions, alignment with human values is essential. Reliance on proprietary APIs limits reproducibility and broad participation. We study whether local open-source ensemble debates can improve alignment-oriented reasoning. Across 150 debates spanning 15 scenarios and five ensemble configurations, ensembles outperform single-model baselines on a 7-point rubric (overall: 3.48 vs. 3.13), with the largest gains in reasoning depth (+19.4%) and argument quality (+34.1%). Improvements are strongest for truthfulness (+1.25 points) and human enhancement (+0.80). We release code, prompts, and a debate dataset, providing an accessible and reproducible foundation for ensemble-based alignment evaluation.
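The headline comparison (ensemble 3.48 vs. baseline 3.13 on a 7-point rubric) is an average of per-debate rubric scores. A minimal sketch of that aggregation, using hypothetical score lists rather than the paper's actual 150-debate dataset:

```python
# Hypothetical per-debate rubric scores (1-7 scale), for illustration only;
# the real dataset spans 150 debates, 15 scenarios, and 5 configurations.
ensemble_scores = [3.6, 3.4, 3.5, 3.4]
baseline_scores = [3.1, 3.2, 3.1, 3.1]

def mean(scores):
    """Average rubric score across debates."""
    return sum(scores) / len(scores)

gain = mean(ensemble_scores) - mean(baseline_scores)
print(f"ensemble={mean(ensemble_scores):.2f} "
      f"baseline={mean(baseline_scores):.2f} gain={gain:+.2f}")
```

The paper's per-dimension deltas (e.g. reasoning depth +19.4%) would come from the same aggregation restricted to one rubric dimension.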

Tags

ai-safety (imported, 100%) | alignment-training (suggested, 80%) | empirical (suggested, 88%) | scalable-oversight (suggested, 92%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: not_run | Model: - | Prompt: - | Confidence: 0%

Entities (0)

No extracted entities yet.

Relation Signals (0)

No relation signals yet.

Cypher Suggestions (0)

No Cypher suggestions yet.

Full Text

No full-text extraction is stored for this paper yet.