Paper deep dive
Between Generation and Judgment: A Cloud-Native Framework for Adversarial Evaluation of LLM Alignment
Diego E. G. Caetano De Oliveira, C. Miers, Marcos A. Simplicio, Victor Takashi Hayashi
Models: DeepSeek-V3, GPT-4o, Llama 3.3 70B Instruct, Mixtral 8x7B Instruct
Abstract
We present a cloud-native, end-to-end pipeline that unifies automated attack generation with a modular LLM-as-a-Judge, supports static/adaptive corpora across open and proprietary models, enables calibrated multi-judge consensus, and emits audit-ready cost/latency telemetry, addressing gaps left by non-cloud-native scripts, decoupled attack/judgment, and single-judge bias. Evaluated under two budgets (MAX_PROMPTS = 10 and 32) on three targets (Llama 3.3 70B Instruct, Mixtral 8x7B Instruct, DeepSeek-V3), ASR spans 0-37.5% (10 prompts) and 0-26.6% (32 prompts); judge accuracy ranges from 92.93% (open Llama 3.3 70B Instruct) to 98.94% (proprietary GPT-4o); mean judgment latency is 1.2-5.6 s (GPT-4o to DeepSeek-V3); and unit cost is $0.10-$2.18 per 1k adjudications (Mixtral 8x7B to GPT-4o). These results motivate a tiered policy: lightweight open-family judges for high-throughput triage, with cross-family/ensemble judges for low-confidence cases, offering a practical blueprint for continuous, auditable adversarial evaluation of LLM alignment at cloud scale.
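The tiered policy the abstract proposes (a lightweight open-family judge for high-throughput triage, escalating low-confidence verdicts to a cross-family ensemble) can be illustrated with a minimal routing sketch. The judge stubs, the 0.8 confidence threshold, and the majority-vote rule below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a tiered LLM-as-a-Judge policy:
# a cheap judge triages every response; verdicts below a confidence
# threshold escalate to a cross-family ensemble decided by majority vote.
# Judge functions and the 0.8 threshold are illustrative stubs.
from collections import Counter
from typing import Callable

Judge = Callable[[str], tuple[str, float]]  # response -> (verdict, confidence)

def tiered_judge(response: str,
                 triage: Judge,
                 ensemble: list[Judge],
                 threshold: float = 0.8) -> str:
    verdict, conf = triage(response)
    if conf >= threshold:
        return verdict                      # cheap path: high-throughput triage
    votes = Counter(j(response)[0] for j in ensemble)
    return votes.most_common(1)[0][0]       # escalated path: majority vote

# Stub judges standing in for real model calls.
def cheap_judge(r):
    return ("unsafe", 0.95) if "attack" in r else ("safe", 0.5)

def judge_a(r): return ("safe", 0.9)
def judge_b(r): return ("safe", 0.9)
def judge_c(r): return ("unsafe", 0.9)

print(tiered_judge("attack payload", cheap_judge, [judge_a, judge_b, judge_c]))  # unsafe (triage)
print(tiered_judge("benign query", cheap_judge, [judge_a, judge_b, judge_c]))    # safe (ensemble 2-1)
```

In this setup the unit-cost spread reported above ($0.10 vs. $2.18 per 1k adjudications) is what makes the cheap-first routing pay off: expensive cross-family judges are invoked only on the low-confidence minority.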
Tags
Links
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/11/2026, 12:38:53 AM
Summary
The paper introduces a cloud-native framework for the adversarial evaluation of LLM alignment, integrating automated attack generation with a modular LLM-as-a-Judge system. The framework supports multi-judge consensus, provides audit-ready telemetry, and addresses biases found in single-judge setups. Empirical evaluation on models like Llama 3.3, Mixtral, and DeepSeek-V3 demonstrates the framework's effectiveness in cost, latency, and accuracy, proposing a tiered policy for scalable evaluation.
Entities (5)
Relation Signals (4)
Cloud-Native Framework → evaluates → Llama 3.3 70B Instruct
confidence 95% · Evaluated under two budgets on three targets-Llama 3.3 70B Instruct
Cloud-Native Framework → evaluates → Mixtral 8x7B Instruct
confidence 95% · Evaluated under two budgets on three targets-Mixtral 8x7B Instruct
Cloud-Native Framework → evaluates → DeepSeek-V3
confidence 95% · Evaluated under two budgets on three targets-DeepSeek-V3
GPT-4o → acts as → LLM-as-a-Judge
confidence 90% · judge accuracy ranges from 92.93% (open Llama 3.3 70B Instruct) to 98.94% (proprietary GPT-4o)
Cypher Suggestions (2)
Find all LLMs evaluated by the framework · confidence 90% · unvalidated
MATCH (f:Framework {name: 'Cloud-Native Framework'})-[:EVALUATES]->(m:LLM) RETURN m.name
Identify models acting as judges · confidence 90% · unvalidated
MATCH (m:LLM)-[:ACTS_AS]->(r:Role {name: 'LLM-as-a-Judge'}) RETURN m.name
Full Text
850 characters extracted from source content.
Between Generation and Judgment: A Cloud-Native Framework for Adversarial Evaluation of LLM Alignment | IEEE Conference Publication | IEEE Xplore. © Copyright 2026 IEEE - All rights reserved.