Paper deep dive

AutORAN: LLM-driven Natural Language Programming for Agile xApp Development

Xin Li, Shiming Yu, Leming Shen, Jianing Zhang, Yuanqing Zheng, Yaxiong Xie

Year: 2026 · Venue: arXiv preprint · Area: cs.NI · Type: Preprint · Embeddings: 83

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 98%

Last extracted: 3/22/2026, 6:06:48 AM

Summary

AutORAN is an LLM-driven natural language programming framework designed to automate the development of xApps for Open Radio Access Network (O-RAN) systems. It addresses the complexity and time-consuming nature of manual xApp development by using a multi-stage pipeline that includes requirement refinement, O-RAN domain knowledge injection, and staged validation to ensure generated code is compliant with O-RAN standards and interfaces.

Entities (5)

AutORAN · framework · 100%
LLM · technology · 100%
O-RAN · network-architecture · 100%
xApp · software-application · 100%
RIC · network-component · 95%

Relation Signals (3)

AutORAN → automates → xApp development

confidence 100% · AutORAN, the first LLM-driven natural language programming framework for agile xApps that automates the entire xApp development pipeline.

xApp → executes on → RIC

confidence 100% · These xApps execute on the RIC to perform real-time or near-real-time monitoring and control tasks.

AutORAN → uses → LLM

confidence 100% · AutORAN, the first LLM-driven natural language programming framework.

Cypher Suggestions (2)

Identify the execution environment of xApps · confidence 95% · unvalidated

MATCH (x:Software {type: 'xApp'})-[:EXECUTES_ON]->(r:Component {type: 'RIC'}) RETURN x, r

Find all components related to the AutORAN framework · confidence 90% · unvalidated

MATCH (a:Framework {name: 'AutORAN'})-[:USES|AUTOMATES]->(target) RETURN a, target

Abstract

Abstract: Traditional RAN systems are closed and monolithic, stifling innovation. The openness and programmability enabled by Open Radio Access Network (O-RAN) are envisioned to revolutionize cellular networks with control-plane applications called xApps. The development of xApps (typically by third-party developers), however, remains time-consuming and cumbersome, often requiring months of manual coding and integration, which hinders the roll-out of new functionalities in practice. To lower the barrier of xApp development for both developers and network operators, we present AutORAN, the first LLM-driven natural language programming framework for agile xApps that automates the entire xApp development pipeline. In a nutshell, AutORAN turns high-level user intents into swiftly deployable xApps within minutes, eliminating the need for manual coding or testing. To this end, AutORAN builds a fully automated xApp generation pipeline, which integrates multiple functional modules (from user requirement elicitation, AI/ML function design and validation, to xApp synthesis and deployment). We design, implement, and comprehensively evaluate AutORAN on representative xApp tasks. Results show AutORAN-generated xApps can achieve similar or even better performance than the best known hand-crafted baselines. AutORAN drastically accelerates the xApp development cycle (from user intent elicitation to roll-out), streamlining O-RAN innovation.


Full Text


SUBMITTED FOR REVIEW TO IEEE TRANSACTIONS ON MOBILE COMPUTING

Index Terms: Open RAN, xApp Generation, Near-RT RIC, LLM-assisted Coding.

I. INTRODUCTION

In recent years, Open Radio Access Network (O-RAN) [1]–[8] has emerged as a promising paradigm for more intelligent and flexible cellular networks.
O-RAN disaggregates the traditional RAN into multiple modularized units and standardizes the interfaces for interoperability. A key element of O-RAN is the RAN Intelligent Controller (RIC), which acts as the network's intelligence layer by hosting modular control-plane applications called xApps. These xApps execute on the RIC to perform real-time or near-real-time monitoring and control tasks, e.g., anomaly detection [9]–[11] and traffic steering [12], [13] for self-optimizing networks. This open and programmable architecture unleashes innovation by allowing third-party developers to introduce new RAN functionalities via xApps. However, it also creates an urgent need for fast and flexible xApp development: network operators must be able to develop and update xApps at the pace of evolving service demands.

Xin Li, Shiming Yu, Leming Shen, Jianing Zhang, and Yuanqing Zheng are with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China (e-mail: cs-xin.li@connect.polyu.hk; shiming.yu@connect.polyu.hk; leming.shen@connect.polyu.hk; jianing98.zhang@connect.polyu.hk; csyqzheng@comp.polyu.edu.hk). Yaxiong Xie is with the Department of Computer Science and Engineering, University at Buffalo, NY, USA (e-mail: yaxiongx@buffalo.edu). This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

[Fig. 1. Conventional xApp development hinders O-RAN innovation. AutORAN presents an automated agile xApp development framework. (Diagram: RU, DU, CU, and a RIC hosting xApps with a KPM database; network operators move from traditional xApp development to asking an AutORAN agent "I want an xApp...", labeled as an xApp programming paradigm shift.)]

Developing a new xApp today, however, remains time-consuming and cumbersome. It requires deep expertise in O-RAN architecture, with its complex interface specifications and protocols, and in AI/ML algorithms for network control.
Thus, third-party xApp developers often face tremendous challenges aligning with operators' requirements, hindering the rollout of new functionalities in practice. As illustrated in Fig. 1, introducing a new service via xApps typically involves either costly in-house development or reliance on external vendors. In-house development demands substantial time and resources, while outsourcing raises serious concerns about data privacy and lack of transparency. Moreover, as networks scale and become more heterogeneous, operators may need to deploy hundreds of specialized xApps to cover diverse use cases [2], [14]–[17]. This manual development paradigm has become a major bottleneck to O-RAN innovation, calling for a radically more efficient approach to xApp creation.

To address these challenges, we introduce AutORAN, the first LLM-driven natural language programming framework for agile xApp development. AutORAN turns high-level user intents into deployable xApps in an end-to-end automated pipeline. A network operator can thus simply describe a desired network functionality or policy in natural language, and AutORAN generates a corresponding xApp to fulfill the requirement. This approach is inspired by recent advances in large language models (LLMs) and agentic AI [18]–[21], which have exhibited a remarkable ability to interpret complex instructions and even generate code from specifications. However, off-the-shelf LLMs (e.g., GPT-4 [19]) cannot directly generate correct or deployable xApps for O-RAN. General-purpose prompting fails because LLMs lack O-RAN domain knowledge, often violate strict interface semantics, or hallucinate control logic inconsistent with operator policies.
Our key insight is to combine LLMs' language understanding with a domain-specific development pipeline that injects O-RAN knowledge, enforces interface compliance, and performs staged validation. This enables AutORAN to bridge high-level intent and low-level RAN control, making LLM-based xApp generation both feasible and reliable.

arXiv:2603.18604v1 [cs.NI] 19 Mar 2026

[Fig. 2. O-RAN architecture. (Diagram: UE, RU, DU, CU-CP, and CU-UP linked by the O-Fronthaul and F1 interfaces; a near-RT RIC hosting xApps 1 to N over a messaging infrastructure with conflict mitigation, subscription management, management services, security, and a shared data layer, connected to the RAN over E2; a non-RT RIC within Service and Management Orchestration connected via A1 and O1; real-time, near-RT, and non-RT control loops.)]

Realizing this vision requires us to overcome several technical challenges through careful system design. AutORAN addresses the following major issues that make naive LLM-based xApp generation infeasible:

(1) Under-specified requirements: Network operators often struggle to precisely articulate their xApp needs due to the complexity of O-RAN interfaces and jargon. Natural language requests can be incomplete or ambiguously structured, causing an LLM to misinterpret the true intent. AutORAN mitigates this by interacting with the user via a requirement refinement and structuring module. We define a structured template that guides the user to specify key details of the desired xApp (e.g., objectives, input metrics, policies). This requirement structuring process helps capture the intent accurately and gives the LLM a clear, unambiguous specification to work from.

(2) Lack of O-RAN domain knowledge: State-of-the-art LLMs are trained on general-purpose text and lack specialized knowledge of O-RAN standards and interfaces. As a result, a vanilla LLM has no built-in understanding of the E2 interface or Key Performance Metrics (KPMs), and it often overlooks system constraints or protocol semantics.
This can lead to hallucinated code or non-compliant API usage when generating an xApp. AutORAN injects domain knowledge into the generation process through an automated knowledge retrieval module. Given the refined user requirements, AutORAN retrieves relevant O-RAN documentation and examples and feeds them as context to the LLM. By equipping the LLM with up-to-date O-RAN specifications and constraints, we ensure the generated code respects the proper interfaces and semantics (i.e., no missing or wrong API calls) and aligns with real RAN data models. This in-context knowledge injection [22] greatly improves code reliability.

(3) Entangled xApp logic and interfaces: A typical xApp program comprises two parts: (i) the core logic, often an AI/ML algorithm implementing the control functionality, such as an anomaly detector or scheduler; and (ii) the interfacing code that connects to the RAN by configuring subscriptions, retrieving measurements, and actuating control via standard O-RAN messages. Generating an entire xApp in one shot is error-prone: the LLM might entangle the control logic with O-RAN API usage in incorrect ways, leading to functional or integration failures. AutORAN tackles this with a two-stage generation and validation process. We first prompt the LLM to synthesize the standalone algorithmic core function, ensuring the control logic is correct. Only then do we prompt the LLM (with the validated core as context) to integrate the O-RAN interfacing code and produce the complete xApp. This staged approach cleanly separates concerns, enabling the LLM to focus on each step. By the end, AutORAN produces a fully formed xApp that not only implements the desired policy, but also adheres to O-RAN's interface requirements and can be executed on the RIC without manual fixes.

We build AutORAN on a real-world 5G testbed using the open-source srsRAN stack [23].
We evaluate AutORAN on multiple representative xApp use cases, including anomaly detection, interference classification, and slice scheduling, which cover various network monitoring and control scenarios. Results demonstrate that AutORAN can generate functionally correct xApps that match or even exceed the effectiveness of the best hand-crafted ones. By eliminating most manual development effort, AutORAN dramatically accelerates the xApp development pipeline: a new xApp can be realized and rolled out in a matter of hours rather than the weeks or months required in the traditional paradigm.

In summary, this paper makes the following key contributions:
• We present AutORAN, the first framework to automate xApp development using LLMs, transforming traditional manual processes into natural-language-driven workflows. This paradigm shift significantly reduces development effort and lowers the barrier for operators to create xApps.
• We propose a suite of novel techniques to adapt LLMs into xApp development agents, injecting O-RAN domain knowledge and structuring the generation process to handle complex RAN interfaces and control logic. These innovations empower the LLM to understand O-RAN specifications, comply with standard interfaces, and generate correct and efficient xApp code for diverse functionalities.
• We design, implement, and extensively evaluate AutORAN on a live O-RAN testbed. Across diverse use cases, the AutORAN-generated xApps demonstrate both strong functional performance and seamless deployment viability, achieving comparable or superior results to human-built xApps while requiring minimal human effort.

II. BACKGROUND AND MOTIVATION

A. O-RAN vs. Traditional RAN

Traditional RAN architectures are built on vertically integrated hardware and software stacks provided by a single vendor. While this approach simplifies system integration, it leads to limited flexibility, increased cost, and vendor lock-in.
As a result, launching new features often requires vendor hardware upgrades, which are costly and time-consuming. Unlike traditional RANs that rely on monolithic, vendor-locked solutions, O-RAN promotes interoperability by disaggregating the RAN into multiple units with distinct functionalities. In Fig. 2, the O-RU handles physical-layer and RF signal processing; the O-DU executes real-time protocol stack functions, including MAC and RLC; and the O-CU manages non-real-time operations such as RRC and PDCP. Meanwhile, to enable agile and intelligent control, O-RAN separates RAN control intelligence into the near-RT RIC [24] and the non-RT RIC [25]. Operating at timescales ranging from 10 milliseconds to 1 second, the near-RT RIC enables dynamic control with xApps. Information exchange between the two RICs is supported via the A1 interface. By defining the interfaces between the units, O-RAN supports interoperability among units from different vendors and offers network operators more unit options. TABLE I summarizes the key differences between traditional RAN and O-RAN.

TABLE I
TRADITIONAL RAN VS. O-RAN
Aspects | Traditional RAN | O-RAN
Architecture | Monolithic, integrated hardware/software | Disaggregated into modular components
Interface | Proprietary | Standardized
Vendor Ecosystem | Closed, single-vendor | Open, multi-vendor
Control Logic | Embedded in software stack | Realized through xApps
Upgrade Flexibility | Constrained by vendor-specific dependencies | High flexibility in upgrading components
Function Development | Black-box implementation | Software-defined, programmable

[Fig. 3. xApp workflow. (Message sequence among the non-RT RIC, near-RT RIC, and E2 nodes: E2 Setup Request (E2 node ID, RAN function) and E2 Setup Response (RIC ID, accepted RAN function) establish the E2 connection; RIC Subscription Request and Response set up the subscription; RIC Indication messages (E2SM-KPM encoded) deliver measurements; the xApp issues RIC Control actions; the non-RT RIC sends xApp update requests.)]

B. xApp Workflow

xApps enable fine-grained, near-RT control over O-RAN operations. These applications, hosted on the near-RT RIC, subscribe to specific telemetry streams (e.g., network KPMs), execute control logic, and transmit decisions to underlying RAN units via the E2 interface. Each xApp follows a structured lifecycle including deployment, registration, data subscription, decision execution, and control feedback. Specifically, upon deployment, an xApp registers its service capabilities and communication endpoints with the RIC. It then subscribes to measurements such as signal-to-noise ratio, handover events, or buffer occupancy, streamed in near-RT from gNodeBs or other RAN nodes. The xApp processes the stream using control logic (typically rule-based policies or AI/ML models) and issues corresponding control actions, e.g., adjusting scheduling priorities or triggering handovers. This closed-loop interaction enables responsive network control. In parallel, the non-RT RIC can influence xApp behavior via the A1 interface, delivering high-level policy directives, model updates, or intent-driven objectives. Fig. 3 shows the typical workflow of an anomaly detection xApp.

[Fig. 4. Traditional xApp development in O-RAN vs. automated xApp development. (Diagram: in the traditional path, a network operator's request "I want an xApp to detect anomalies in O-RAN based on the KPI metrics?" is handled by a professional developer through requirement analysis and design, development environment setup, intelligence and policy module, message handling logic, and testing and debugging; in the LLM-based path, the AutORAN agent's user interface, O-RAN knowledge retrieval, function design and validation, and automated xApp synthesis modules produce a pool of anomaly detection xApps, and the best is deployed to the O-RAN system of RU, DU, CU, and RIC.)]

C. Current xApp Development Process

While O-RAN provides programmability and architectural openness, it comes at the cost of increased complexity in xApp development.
xApp developers must navigate heterogeneous vendor units and constantly evolving specifications. Developing new xApps typically involves highly specialized AI/ML model design, meticulous parameter tuning, model training on collected datasets, and extensive testing and iteration, all of which require deep domain expertise and familiarity with the O-RAN specifications (Fig. 4). As a result, practical xApp development is time-consuming (spanning weeks or months) even for experienced developers. For network operators, this not only delays innovation but also increases reliance on third-party developers, hindering agility in xApp development, raising concerns about user data, and increasing the deployment costs of new functionalities. These challenges highlight the pressing need for an agile xApp development paradigm.

D. Towards Automated xApp Development

Recent LLMs (e.g., GPT-4o [26], DeepSeek-Coder [27]) have made significant progress in program synthesis. By translating high-level task descriptions provided by users into functional programs, LLMs can significantly reduce the effort required in traditional program development, offering new opportunities for automated xApp development. Fig. 4 illustrates the transition from conventional xApp development to the proposed LLM-based generation. Rather than manually crafting control functions, we envision that xApp developers or network operators could express their demands in natural language. An AI agent then automatically translates these demands into executable xApps.
This paradigm shift from function-level programming (by professional xApp developers) to intent-driven xApp generation (initiated by either developers or network operators) could streamline xApp development [28].

[Fig. 5. AutORAN overview. (Diagram: user requirements pass through the User-AutORAN Interface (§3.1: requirement refinement and structuring), O-RAN Knowledge Retrieval (§3.2: O-RAN code and background database), xApp Function Design and Validation (§3.3: algorithm outline generation, detailed design generation, code generation, code validation), and Automated xApp Synthesis (§3.4: interface matching, algorithm integration, xApp validation) to produce an xApp.)]

Although promising, directly applying general-purpose LLMs to the O-RAN context faces tremendous practical challenges. First, LLMs are unaware of underlying system constraints, protocol semantics, or architectural structures, which are highly specialized and complex, yet essential to xApp development. Moreover, limitations such as prompt length and model opacity hinder their ability to process raw network telemetry or effectively encode system-specific context. Directly prompting AI agents for xApp development often results in incomplete or overly generic implementations [29]. For example, when we instructed several advanced models (e.g., Claude Opus 4.1 [30], GPT-5 [31], and Cursor [32]) with a task such as “Design a Python-based xApp to detect anomalies in O-RAN based on KPI metrics”, the generated outputs consistently exhibited serious flaws. The Claude-generated framework, for instance, contained non-standard RIC interfaces (e.g., a fabricated subscribe_to_kpis method rather than an E2AP/E2SM-KPM subscription), invalid control paths (e.g., policies not conforming to A1 or E2SM-RC), synthetic data pipelines unrelated to actual KPM ranges, and empty or undeployable API servers. The other models produced highly similar skeletons.
These observations provide an important insight. Because LLMs are fundamentally next-token predictors, an imprecise or hallucinated token sequence for an O-RAN-specific construct compounds in all subsequent predictions, and the overall program quickly diverges from compliant implementations. In contrast, when external knowledge is supplied to guide function-level generation and integration, the produced tokens align more closely with O-RAN semantics, making it possible to assemble a coherent and executable xApp.

To address these limitations, we develop a set of novel solutions for automated xApp generation. By integrating structured input design, domain-aware prompt engineering, and automated validation mechanisms, we transform LLMs from generic coding assistants into specialized development agents tailored for O-RAN. In the following, we present the system architecture and the techniques that bridge the gap between general-purpose LLMs and xApp generation.

III. AUTORAN DESIGN

This section presents the detailed architecture and technical modules of AutORAN. As shown in Fig. 5, AutORAN consists of four core functional modules: User-AutORAN Interface, Domain Knowledge Retrieval, xApp Function Design and Validation, and Automated xApp Synthesis. These modules collaboratively transform user-provided requirements into deployable xApps. The automated framework boosts xApp development efficiency, protects user data from third-party developers, and lowers the barrier to xApp development.

A. User-AutORAN Interface

Traditional xApp development requires mastering detailed O-RAN interfaces, data streams, and APIs, posing a high entry barrier. AutORAN addresses this by providing a user-friendly interface where developers or operators specify desired functionalities in natural language. This design hides O-RAN complexity and enables non-experts to quickly develop and deploy xApps.
Specifically, users first specify task requirements, operational objectives, and high-level control policies in natural language, such as “Design a Python-based xApp to detect anomalies in O-RAN based on KPI metrics”. Such natural language specifications are intentionally designed to abstract away implementation details, allowing non-expert users to focus on expressing their intents.

Practical challenge. LLMs cannot generate high-quality code that meets user expectations when prompted with simple instructions [33]. The main reason is that user requirements are often unstructured and under-specified, lacking sufficient detail for correct design and implementation. To tackle this challenge, we develop a requirement refinement and structuring module with a user requirement template specifically tailored for xApp development.

Requirement Refinement and Structuring. This module progressively guides users to specify precise and structured requirements through multi-round elicitation. To achieve this, AutORAN first parses the initial user requirement using a lightweight intent extraction module, which identifies key task components such as objectives (e.g., anomaly detection, traffic classification), expected data modalities (e.g., KPMs), and control targets. Any missing or under-specified fields are flagged as unresolved and trigger follow-up interactions. To handle ambiguous or incomplete input, AutORAN can engage users in follow-up dialogues, automatically generating targeted questions to elicit additional details, such as the specific type of anomalies, relevant data sources, or required detection granularity. To determine what information needs to be clarified, AutORAN maps the initial user input to a predefined User Requirement Template corresponding to the intended xApp type (e.g., anomaly detection, interference classification, traffic classification).
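The missing-field check that drives follow-up questions can be shown in a few lines. What follows is a minimal sketch, not AutORAN's implementation. The template keys are borrowed from the fields the paper names (task objective, input modality, temporal resolution, output format). The registry layout, the `unresolved_fields`/`follow_up_questions` helpers, and the question wording are all hypothetical. The sketch also assumes the LLM-based intent extraction has already produced a dict of fields.

```python
# Hypothetical template registry. The paper names the fields but not how
# they are encoded, so this layout is an assumption.
TEMPLATES = {
    "anomaly_detection": [
        "task_objective",
        "input_modality",
        "temporal_resolution",
        "output_format",
    ],
}

# Hypothetical follow-up questions, one per template field.
QUESTIONS = {
    "task_objective": "Which specific types of anomalies should the xApp detect?",
    "input_modality": "Which data sources (e.g., which KPMs) should it use?",
    "temporal_resolution": "At what time granularity should detection run?",
    "output_format": "What should the xApp output (labels, scores, or alerts)?",
}


def unresolved_fields(xapp_type: str, parsed_intent: dict) -> list:
    """Return template fields that are missing or empty in the parsed intent."""
    return [f for f in TEMPLATES[xapp_type] if not parsed_intent.get(f)]


def follow_up_questions(xapp_type: str, parsed_intent: dict) -> list:
    """Turn each unresolved field into one targeted question for the user."""
    return [QUESTIONS[f] for f in unresolved_fields(xapp_type, parsed_intent)]


if __name__ == "__main__":
    # Fields an intent extractor might pull from
    # "detect anomalies in O-RAN based on KPI metrics".
    intent = {"task_objective": "anomaly detection", "input_modality": "KPMs"}
    for question in follow_up_questions("anomaly_detection", intent):
        print(question)
```

Here only the temporal resolution and output format would be asked about. Each user answer would be merged into `intent` and the check repeated until the list is empty, which mirrors the multi-round elicitation described above.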
The User Requirement Template specifies essential fields such as the task objective, input modality, temporal resolution, and output format, ensuring that requirements are captured in a consistent and complete form.

B. O-RAN Knowledge Retrieval

To bridge the gap between user intents (specified in user requirement templates) and detailed algorithm design and implementation, AutORAN includes an O-RAN knowledge retrieval module. This module gathers relevant knowledge from diverse sources (e.g., O-RAN specifications) to support accurate interpretation of user intents and xApp generation.

Precise Keyword Extraction. AutORAN first automatically identifies keywords from the standardized user requirements in the context of O-RAN, which is essential for retrieving highly relevant domain knowledge. Naive methods risk missing essential contextual information or adding irrelevant noise [34]. For instance, extracting generic terms like “anomaly detection” without specifying the application domain may retrieve less relevant knowledge from other fields such as finance or healthcare. Conversely, extracting excessively fine-grained keywords may limit the flexibility and generalizability of the knowledge retrieval module. To address this challenge, AutORAN adopts a structured keyword extraction strategy that decomposes each keyword into two semantic fields: the functional task (e.g., “anomaly detection”, “traffic classification”) and the target domain (e.g., “in O-RAN”, “in near-RT RIC”, “based on KPMs”). This two-part structure ensures that the extracted keywords precisely reflect both the algorithmic intent and the deployment context.
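The two-part (task plus domain) keyword structure can be illustrated with a rule-based post-processor. In AutORAN the LLM does this work, guided by Prompt 1. The regex heuristic below is a hypothetical stand-in that only shows the intended outcome: "anomaly detection" is too general, so it gets a default domain; "anomaly detection in O-RAN based on past hour KPMs" is too specific, so it is split into "anomaly detection in O-RAN" and "KPMs in O-RAN". The literal connectives " in " and " based on ", and the default domain "O-RAN", are assumptions.

```python
import re

# Toy pattern for an over-specific phrase: "<task> in <domain> based on <data>".
OVER_SPECIFIC = re.compile(r"(.+?) in (.+?) based on (.+)")


def split_overly_specific(phrase: str) -> list:
    """Split 'TASK in DOMAIN based on DATA' into two task+domain keywords."""
    m = OVER_SPECIFIC.fullmatch(phrase)
    if m is None:
        return [phrase]
    task, domain, data = m.groups()
    # Keep only the last word of the data qualifier, e.g. "past hour KPMs" -> "KPMs".
    return [f"{task} in {domain}", f"{data.split()[-1]} in {domain}"]


def refine_keywords(phrases: list, default_domain: str = "O-RAN") -> list:
    """Give every phrase a task part and a domain part, dropping duplicates."""
    refined = []
    for phrase in phrases:
        if " in " not in phrase:
            # Too general: no target domain, so attach the default one.
            candidates = [f"{phrase} in {default_domain}"]
        else:
            candidates = split_overly_specific(phrase)
        for candidate in candidates:
            if candidate not in refined:
                refined.append(candidate)
    return refined
```

For example, `refine_keywords(["anomaly detection", "anomaly detection in O-RAN based on past hour KPMs"])` collapses both inputs to the two mid-granularity keywords `["anomaly detection in O-RAN", "KPMs in O-RAN"]`.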
As shown in Prompt 1, the system explicitly instructs LLMs to balance specificity and generality by extracting phrases at the right granularity. For example, it prefers “anomaly detection in O-RAN” over both “anomaly detection” (too general) and “anomaly detection in O-RAN based on past hour KPMs” (too specific, and better split into “anomaly detection in O-RAN” and “KPMs in O-RAN”). This trade-off ensures that the retrieved domain knowledge is both highly relevant and widely applicable across similar xApp tasks.

Prompt 1: Keyword Extraction
*User Problem* <user input>...</user input>
*Target* To effectively understand the user problem, please identify key concepts that provide essential background knowledge.
*Rules* Extract task-related and domain-related keyword phrases from the user problem. The keyword should contain two parts: the core task or function, and the domain context. Avoid generic or overly specific expressions.
*Response Format* Term1, Term2, ...

Efficient Knowledge Storage and Retrieval. As a first step, AutORAN invokes a web search engine to retrieve relevant technical repositories, O-RAN specifications, and open-source xApp development libraries from the Internet. This step runs before local indexing so that any domain knowledge missing from the local database is supplemented in time. To enhance efficiency and reliability, the search is scoped to a predefined list of authoritative sources (as in Prompt 2), ensuring that the retrieved content is both precise and relevant to the keyword and the development task. The search results are then passed into the structuring and embedding pipeline for integration into the local knowledge base. AutORAN first converts the retrieved information into a structured format (e.g., Markdown).
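The scoping and structuring steps just described can be sketched with the standard library. The paper does not publish its source list, so the allowlist below is hypothetical. Its entries are simply Wikipedia (which Prompt 2 names) plus two standards bodies. The Markdown wrapper is likewise only one plausible shape for a "structured format". A result counts as authoritative if its host is an allowlisted domain or a subdomain of one.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the paper says only that search is scoped to a
# predefined list of authoritative sources (Prompt 2 prioritizes Wikipedia
# or official sites); this exact set is an assumption.
AUTHORITATIVE_DOMAINS = {"wikipedia.org", "o-ran.org", "3gpp.org"}


def is_authoritative(url: str, allowlist=AUTHORITATIVE_DOMAINS) -> bool:
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowlist)


def scope_results(urls: list, allowlist=AUTHORITATIVE_DOMAINS) -> list:
    """Keep allowlisted results in their original order, without duplicates."""
    kept = []
    for url in urls:
        if is_authoritative(url, allowlist) and url not in kept:
            kept.append(url)
    return kept


def to_markdown(title: str, body: str, source_url: str) -> str:
    """Wrap one retrieved item as a small Markdown document for later embedding."""
    return f"# {title}\n\n{body}\n\nSource: {source_url}\n"
```

The suffix check uses `"." + d` rather than a bare `endswith(d)`, so a look-alike host such as `evil-wikipedia.org` is rejected while `en.wikipedia.org` is accepted.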
Next, AutORAN uses an embedding model to encode the knowledge items into a dense vector space (i.e., embeddings), enabling semantic-level retrieval beyond simple keyword matching. Finally, the embeddings are organized and stored in a local knowledge base, categorized by content type, such as O-RAN background and specifications, algorithm principles, performance optimization methods, and coding patterns. Throughout the code generation process, AutORAN proactively retrieves relevant information from the knowledge base for reference. For instance, when a user requests “anomaly detection in O-RAN”, the knowledge base provides information about the KPMs commonly used for this functionality in the O-RAN literature, best practices for AI/ML model design for xApps, and a variety of performance evaluation metrics.

Prompt 2: Knowledge Search
*User Problem* <user input>...</user input>
*Target* To enrich domain knowledge, please search for the fundamental definitions or background of given terms ....
*Rules* Prioritize Wikipedia or official sites. Exclude implementation details. Filter out irrelevant content.
*Response Format* URL1, URL2, ...

C. xApp Function Design and Validation

This module generates executable code for the xApp implementation and validates it automatically. It aims to produce high-quality, optimized algorithms tailored to user requirements through a multi-stage prompting framework guided by Chain-of-Thought (CoT) reasoning strategies [35]. In particular, the reasoning process breaks the xApp algorithm design down into multiple manageable sub-components for fine-grained code generation and program synthesis. This divide-and-conquer process has three key stages: algorithm outline generation, detailed design generation, and code generation with validation.
Prompt 3: Algorithm Outline Generation
*User Problem* <user input>...</user input>
*Target* To address the user problem effectively, please provide an algorithm outline step by step to solve the user problem based on the background information.
*Rules* Provide a step-by-step algorithm design. Each step should have a clear goal and describe what actions will be performed. Focus on input/output relationship between steps.
*Response Format* Step 1: [Title] ... ; Step 2: [Title] ... ; ...

Algorithm Outline Generation. Given the high complexity of xApp algorithms, AutORAN is instructed to first generate a high-level outline of the target algorithm, decomposing the solution into a sequence of logical steps or functional modules. To ensure the relevance and clarity of the output, AutORAN employs a specially designed prompt (Prompt 3) that reiterates the user problem description, specifies the generation target, and enforces strict quality assurance rules for consistency and task alignment. For example, when a user specifies an anomaly detection task based on O-RAN KPMs, AutORAN generates an outline that begins with dataset loading and preprocessing, followed by feature selection, model construction, prediction logic, and evaluation using the provided metrics (e.g., accuracy, F1 score). Each step is designed to ensure input-output consistency, robustness to missing data, and compatibility with a standalone execution script.

Detailed Design Generation. Next, AutORAN is prompted to expand each step in the outline by specifying concrete operations, data processing methods, feature selection strategies, and decision criteria. For example, in the context of anomaly detection, the LLM may recommend KPMs such as PRB utilization, user throughput, or handover rates as critical input features for training the detection model.
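The three-stage, divide-and-conquer prompting of this module can be sketched as a small driver. The stages are outline, detailed design, and code generation, with a validation loop that feeds run logs back to the model. This is a sketch under stated assumptions, not AutORAN's code. `llm` is a hypothetical callable that maps a prompt string to a response string, and the prompt texts are loose paraphrases of Prompts 3 and 4. Validation here only checks that the generated script exits cleanly in a fresh interpreter. In the paper, generated algorithms are evaluated on a public dataset (e.g., SpotLight for anomaly detection).

```python
import os
import subprocess
import sys
import tempfile

# Elision patterns forbidden by the rules of Prompt 4.
ELISION_MARKERS = ("# ... (other functions remain unchanged)", "# ... (same as before)")


def run_candidate(code: str, timeout: float = 60.0) -> tuple:
    """Run generated code in a fresh interpreter; return (succeeded, output log)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} s"
    finally:
        os.remove(path)


def check(code: str) -> tuple:
    """Reject elided responses outright; otherwise execute the code."""
    if any(marker in code for marker in ELISION_MARKERS):
        return False, "Elided code is not allowed; return the full program."
    return run_candidate(code)


def generate_core_function(problem: str, llm, max_rounds: int = 3) -> str:
    """Outline -> detailed design -> code, then repair the code from run logs."""
    outline = llm(f"Provide a step-by-step algorithm outline for: {problem}")
    design = llm(f"Expand each step with concrete operations and features:\n{outline}")
    code = llm(f"Write one standalone Python script implementing:\n{design}")
    for attempt in range(1, max_rounds + 1):
        ok, log = check(code)
        if ok:
            return code
        if attempt < max_rounds:
            # Self-correcting step: feed the failure log back to the model.
            code = llm(
                f"The code failed. Output log:\n{log}\n"
                f"Fix the errors and return the full code:\n{code}"
            )
    raise RuntimeError(f"no runnable candidate after {max_rounds} rounds")
```

Generating several variants and keeping the best scorer, as the paper also describes, would amount to calling `generate_core_function` repeatedly, possibly in parallel, and ranking the results on the evaluation metrics.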
Prompt 4: Detailed Design Generation
*User Problem* <user input>...</user input>
*Target* To ensure the generated code is executable and robust, please analyze the provided output logs ..., identify the problem, and modify the code to fix the errors if the compiler or interpreter cannot successfully run the code.
*Rules* The following case is not allowed:
# ... (other functions remain unchanged)
# ... (same as before)

Code Generation with Validation. Once the detailed design is complete, AutORAN triggers the Code Generation module to translate each design specification into executable code segments and then integrates them into a complete program (i.e., the algorithm part of an xApp). In practice, however, the input data is collected from different O-RAN interfaces and protocols, exhibits significant variability, and lacks ground-truth labels for model training. This limitation prevents AutORAN from automatically verifying the correctness of the algorithm, let alone improving its performance iteratively. To tackle this challenge, we instruct AutORAN to first download public datasets corresponding to the user problem (e.g., SpotLight [9] for anomaly detection). The dataset is then fed into the generated algorithm for evaluation, and the error logs are recorded and fed back to AutORAN for iterative refinement using the prompt shown in Prompt 4. This self-correcting loop produces stable, functional code with minimal human intervention. Additionally, AutORAN can generate multiple algorithm variants for the same task, each exploring different model architectures, feature engineering techniques, or parameter configurations, and then select the variant that performs best on the evaluation dataset across multiple metrics. The variants can be evaluated in parallel, but generating them increases the xApp generation cost.

D. Automated xApp Synthesis

After validating the core algorithms and functional components with either public or local datasets, the final step is to integrate them into a deployable xApp. Unlike general-purpose program synthesis, O-RAN xApps must satisfy strict requirements on correctness, interface compliance, timing feasibility, and runtime stability. Conventional LLM-based code generation often stops once the code is syntactically correct or shows good offline accuracy, but AutORAN treats validation as a central objective and evaluates the correctness, robustness, timing behavior, and near-RT deployability of the generated control logic. Nevertheless, the complexity of O-RAN platforms and their dynamic operating environments present significant practical challenges.

Challenges. One key challenge is interface matching: xApps must comply with specific service models (e.g., E2SM-KPM, E2SM-RC) and encode control messages in standardized formats. Any mismatch between the expected and available metrics, such as expecting per-UE throughput when only cell-level aggregates are reported, can result in functional failures. Moreover, policy enforcement adds another layer of complexity. xApps must respect dynamic control rules defined by the RIC, such as slicing constraints, control priorities, and resource allocation limits. Even if an anomaly is correctly detected, an xApp must verify whether the intended action (e.g., resource reallocation) is permissible under current policies. Compatibility validation is also essential to ensure that generated xApps conform to runtime constraints such as configuration schemas, protocol versions, and execution time bounds.
For example, if the model inference of an xApp exceeds the 1-second near-RT latency requirement, the xApp is unusable regardless of its correct logic or high accuracy. These challenges collectively highlight the need for robust validation, adaptive interfacing, and policy-awareness mechanisms in the final deployment phase.

To tackle these challenges, we first design an xApp template based on sample code from the O-RAN ALLIANCE, composed of multiple placeholder functions. The template begins with a system initialization module that sets up the runtime environment and registers the xApp with the RIC platform. A configuration parser loads deployment-specific parameters, including E2 subscription settings and service model bindings. Incoming messages from the E2 interface (e.g., periodic KPMs) are handled by a dedicated event processing module that extracts relevant metrics and prepares them for analysis. The core decision-making logic (i.e., the generated algorithm) is inserted into a processing unit that analyzes the input features and determines appropriate control actions. If the xApp is subject to A1 policy constraints, a policy interpretation module reads and enforces the relevant operator rules. Finally, the output of the decision logic is encoded and sent to the RAN via a control message dispatch module, ensuring that the xApp completes the loop from monitoring to actuation in compliance with near-RT requirements.

To generate the final xApp program, we design a function-filling module consisting of three stages: interface matching, algorithm integration, and xApp validation.

Interface Matching. AutORAN first searches the knowledge base to retrieve relevant interface specifications, API formats, and policy requirements for the target xApp.
We then use a few-shot in-context learning prompt (Prompt 5) that instructs AutORAN to fill in the interface-related placeholder functions consistently with the retrieved knowledge and user requirements. This ensures that the generated xApp conforms to the interface specifications and data formats.

Prompt 5: Interface Matching
*User Problem* <user input>...</user input>
*Target* To ensure the xApp communicates correctly with O-RAN components, please generate complete and specification-aligned interface functions by filling in the placeholder sections of the template code, based on the retrieved domain knowledge and user requirements.
*Rules*
- Refer to the knowledge base.
- The generated functions should follow the data formats and message structures defined in O-RAN specifications.
- All code must be complete and self-contained.

Algorithm Integration. The algorithms and functional modules generated by AutORAN take public or local datasets as input (§I-C), rather than the real-world data streams reported via O-RAN interfaces. Thus, the algorithms must be modified and adapted before being integrated into an xApp. Specifically, we design Prompt 6, which instructs AutORAN to replace the offline data loading and evaluation code with interface-driven input and output handling logic. This includes adapting input pipelines to consume KPMs from E2 messages, restructuring the output format to align with xApp control actions, and embedding the algorithm logic into a real-time loop.

Prompt 6: Algorithm Integration
*User Problem* <user input>...</user input>
*Target* To transform the algorithm into a deployable xApp component, please adapt the code to read real-time E2 KPM inputs, process them with the existing algorithm logic, and output actionable results to the RIC system using standardized control message formats.
*Rules*
- Refer to the knowledge base.
- Follow the previous code template ....
- Avoid any blocking operations.
- All code must be complete with no placeholders.
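The template and its two filling stages can be pictured with a small skeleton. This is a hypothetical sketch, not the O-RAN ALLIANCE sample code or a FlexRIC API: E2 indications arrive as plain dicts, the "control message" is a JSON string appended to a list instead of an E2SM-RC encoding, and the configuration keys (`metric`, `window`, `threshold`, `target_share`, `policy`) are invented for illustration. The decision logic shows how an offline detector can become a non-blocking, per-message update over a sliding window, in the spirit of Prompt 6.

```python
import json
from collections import deque
from typing import Optional


class XAppSkeleton:
    """Hypothetical xApp template with its placeholder functions filled in."""

    def __init__(self, config_json: str) -> None:
        # System initialization + configuration parser (deployment-specific parameters).
        self.config = json.loads(config_json)
        self.window: deque = deque(maxlen=self.config["window"])
        self.sent: list[str] = []  # stands in for the control message dispatch channel

    def on_indication(self, msg: dict) -> None:
        # Event processing: extract the subscribed KPM from an incoming E2-style indication.
        self.window.append(msg["kpm"][self.config["metric"]])
        action = self.decide()
        if action is not None and self.policy_allows(action):
            self.dispatch(action)

    def decide(self) -> Optional[dict]:
        # Core decision logic: streaming form of an offline detector (window mean vs. threshold).
        if len(self.window) < self.window.maxlen:
            return None  # wait until the window is full
        if sum(self.window) / len(self.window) > self.config["threshold"]:
            return {"type": "reallocate_prb", "share": self.config["target_share"]}
        return None

    def policy_allows(self, action: dict) -> bool:
        # Policy interpretation: an A1-style cap on the PRB share (hypothetical rule).
        return action["share"] <= self.config["policy"]["max_prb_share"]

    def dispatch(self, action: dict) -> None:
        # Control message dispatch: a real xApp would encode an E2SM-RC control message here.
        self.sent.append(json.dumps(action))


config = {"metric": "prb_util", "window": 3, "threshold": 80.0,
          "target_share": 0.6, "policy": {"max_prb_share": 0.7}}
xapp = XAppSkeleton(json.dumps(config))
for prb in [70, 85, 90, 95]:  # full-window means: 81.7 and 90.0
    xapp.on_indication({"kpm": {"prb_util": prb}})
print(xapp.sent)  # two control messages, since both full-window means exceed 80
```

Keeping each handler step O(window) and free of I/O waits is one simple way to respect the "avoid any blocking operations" rule; the policy check sits between decision and dispatch so that a correct detection never turns into a disallowed action.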
xApp Validation. To enhance the reliability of the integrated xApp, we further design an auxiliary function that supports essential tasks such as data parsing, logging, and configuration management. In addition, AutORAN performs syntax checking and static code analysis using SonarQube [36] to detect potential issues in code structure, API usage, and data type handling. This process effectively improves the reliability and maintainability of the generated xApp. However, existing code analysis methods are limited to verifying code-level semantics and structural correctness; they do not assess the functional logic of the xApp. Specifically, they do not validate whether each interface is correctly implemented or whether real-time communication and control actions are performed as expected. To address this limitation, we perform system-level validation in the experimental evaluation (§V-D), where we deploy and execute generated xApps on a real-world O-RAN testbed. This enables us to validate both code correctness and runtime functionality.

IV. AUTOMATED DEPLOYMENT AND EXECUTION

We now present the xApp deployment and execution process in real-world environments. The xApp execution module is tightly coupled with O-RAN, so the entire workflow from xApp generation to deployment can be validated end-to-end within an operational O-RAN environment.

Inter-unit Communication. The generated xApps are deployed on the near-RT RIC. Internally, they communicate via the RIC Message Router (RMR) [37], which handles inter-component messaging within the RIC software stack. The xApps communicate with other O-RAN units via the standardized E2 interface. In particular, they receive real-time telemetry from base stations (e.g., gNBs) and transmit control messages to dynamically influence network behavior.
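The core idea behind message-type-based routing, as used by RMR, is that a sender tags each message with a type and a routing table decides which endpoints receive it. The toy, in-process router below illustrates only that idea under our own assumptions; it is not the RMR library API, and the message-type IDs are hypothetical placeholders rather than the values used on O-RAN RIC platforms.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[bytes], object]

# Hypothetical message-type IDs (not real RIC message-type values).
KPM_INDICATION = 1
CONTROL_REQUEST = 2


class ToyRouter:
    """Routes payloads to registered handlers by message type, mimicking a routing table."""

    def __init__(self) -> None:
        self.routes: defaultdict[int, list[Handler]] = defaultdict(list)

    def add_route(self, mtype: int, handler: Handler) -> None:
        self.routes[mtype].append(handler)

    def send(self, mtype: int, payload: bytes) -> int:
        """Deliver the payload to every endpoint registered for mtype; return the delivery count."""
        handlers = self.routes.get(mtype, [])  # .get() avoids creating empty routes
        for handler in handlers:
            handler(payload)
        return len(handlers)


router = ToyRouter()
e2_outbox: list[bytes] = []  # stands in for the E2 path toward the gNB
router.add_route(CONTROL_REQUEST, e2_outbox.append)
# The xApp consumes KPM indications and answers each one with a control request.
router.add_route(KPM_INDICATION, lambda p: router.send(CONTROL_REQUEST, b"ctrl:" + p))
router.send(KPM_INDICATION, b"cell-1")
print(e2_outbox)  # [b'ctrl:cell-1']
```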
Automated xApp Deployment. The generated xApps are packaged as Docker containers in accordance with the FlexRIC xApp framework [38] and deployed onto the RIC host. Deployment scripts include all environment configurations needed to integrate with platform services such as logging, status monitoring, and configuration management. The RIC orchestrator then launches the containerized xApps, which automatically establish initial connections to the RMR, the subscription manager, and internal databases.

Dynamic Registration Procedure. Once deployed, the xApps register their presence and functional capabilities with the E2 Service Manager of the RIC. This involves subscribing to specific metrics (e.g., RLC buffer occupancy, PRB utilization, handover statistics) emitted by the base stations, depending on the functionalities of the generated xApps. For instance, for an anomaly detection xApp, AutORAN ensures that only the relevant data streams are subscribed to, minimizing telemetry transmission overhead.

Real-time Execution and Decision Logic. When the xApps operate in closed-loop control mode, telemetry data collected via the E2 interface is continuously streamed into the xApps' processing engine. The core algorithms generated by AutORAN process these data streams in real time for tasks such as anomaly detection, load balancing, and interference mitigation. Based on the processing results, control actions (e.g., RAN parameter reconfiguration, traffic steering) are generated and sent back to the base stations via E2 to adapt network configurations.

Standard-compliant Interface Integration. The generated xApps are designed to comply with the O-RAN specifications.
To this end, AutORAN automatically generates task-specific E2 Service Model (E2SM) handlers, such as E2SM-KPM for KPM reporting and E2SM-RC for RAN control messaging. These modules ensure seamless interaction with base stations while complying with the message formats, telemetry structures, and control interfaces defined in the O-RAN standard. Adhering to standard-compliant interfaces allows the generated xApps to be deployed on a near-RT RIC without extra adjustments.

Fig. 6. Testbedding with SDR and COTS UE. (Setup: a host machine running FlexRIC as the RIC, srsRAN as the CU & DU, and Open5GS as the 5G core; a USRP radio unit connected over 10GbE; a COTS UE.)

V. IMPLEMENTATION AND EVALUATION

In this section, we introduce the implementation and evaluation of AutORAN, aiming to answer the following key research questions: (1) How effective is AutORAN for automated xApp development? (2) Can AutORAN-generated xApps be automatically deployed and executed on real-world O-RAN platforms under near-RT control loop constraints? (3) How does each technical module contribute to the overall performance of AutORAN?

A. Experiment Setup

Hardware. We implement AutORAN on the software-defined srsRAN [23] stack. As shown in Fig. 6, we use an Ubuntu 22.04.1 LTS workstation with an Intel Xeon(R) E5-2620 v4 CPU and 32 GB RAM, and USRP X310 units serving as the gNodeB. We run Open5GS [39] on the workstation as the core network. User equipment (UEs) includes OnePlus 8T and Xiaomi 13 Pro smartphones with programmable SIM/USIM cards (sysmoISIM-SJA2 [40]). For control-plane support, we use the FlexRIC framework [38] as the near-RT RIC to host xApps and manage the RAN in near real time.

Software. We use GPT-4 [19] as the default LLM and LangChain [41] as the orchestration framework for prompt management, module integration, and tool wrapping.
We implement all backend logic in Python 3.10 and use libraries such as the OpenAI SDK, FAISS [42] for embedding-based retrieval, and FastAPI [43] for interface interactions. SonarQube [36] is used for static code analysis.

B. Open-Source Dataset Performance

We compare AutORAN against state-of-the-art baselines (two anomaly detection xApps and one interference classification model) to evaluate whether AutORAN-developed xApps meet user intents. For fair comparison, we evaluate on the same dataset that each baseline uses.

TABLE I
ACCURACY COMPARISON WITH SPOTLIGHT [9]
Method | Metric | MAC | NETWORK | PDCP | RADIO | MIXED
AutORAN | Precision | 97.3% | 98.9% | 92.1% | 78.8% | 97.6%
AutORAN | Recall | 97.5% | 98.9% | 91.8% | 81.5% | 97.6%
SpotLight | Precision | 93.6% | 94% | 100% | 95% | 95.5%
SpotLight | Recall | 100% | 92% | 93% | 93% | 94.5%
Z Score | Precision | 74.4% | 75% | 69.2% | 58.8% | 65.4%
Z Score | Recall | 82.4% | 83.6% | 69.4% | 59.7% | 79.2%
LSTM | Precision | 74.5% | 65.2% | 73.5% | 92.05% | 69.5%
LSTM | Recall | 6.3% | 37.6% | 7.1% | 54.2% | 10.5%

1) Evaluation Metrics: We adopt two sets of metrics.
AutORAN-generated xApp Evaluation. (1) Precision, Recall Rate, and F1 Score are the primary indicators of model correctness. (2) VRAM Usage evaluates the runtime efficiency of the generated AI/ML algorithms.
AutORAN Evaluation. (3) Synthesis Time quantifies the total duration from the moment a user requirement is input to the completion of xApp generation; (4) Number of Bugs assesses code quality and the robustness of AutORAN's xApp generation, where each syntactic or semantic error caught during compilation or testing counts as one bug; (5) Iteration-to-Success Count measures the number of correction cycles required to synthesize a fully functional program; (6) One-Shot Success Rate measures the proportion of xApps that execute without errors on the very first generation, without additional iterations.

2) Baseline 1 - SpotLight for Anomaly Detection: SpotLight [9] collects a comprehensive 5G O-RAN dataset containing over 100 million KPM datapoints sampled at 100 ms resolution.
The dataset covers 600+ metrics across MAC, RLC, PDCP, Radio, and platform components, and captures both synthetic and real-world anomalies such as CPU contention and fronthaul congestion. SpotLight adopts a two-stage distribution learning method (i.e., JVGAN+MRPI) to detect anomalous behavior in RAN KPMs. For fair comparison, we use AutORAN to synthesize an anomaly detection xApp based on the user requirements. We also report the performance of a typical Z-score detection algorithm and an LSTM-based autoencoder for comparison.

Results. Table II lists the precision and recall on five KPM subsets (MAC, Network, PDCP, Radio, and Mixed), corresponding to five different O-RAN anomalies. AutORAN-developed xApps achieve high precision and recall on most subsets and outperform the Z-score baseline on every subset. Notably, AutORAN surpasses SpotLight in both precision and recall on the Network and Mixed subsets, and in precision on MAC, while SpotLight remains ahead on PDCP and Radio. The Mixed result is particularly encouraging because this subset aggregates multiple anomaly cases. Further analysis of the generated code reveals that AutORAN can automatically select and preprocess the most relevant KPMs while discarding noisy or weakly correlated knowledge during prompt construction. AutORAN also applies task-specific filtering strategies during data preparation, leading to more robust features and better generalization across anomaly types.

Fig. 7. Performance of AutORAN-developed anomaly detection xApp on SpotLight dataset: (a) quality improvement across iterations; and (b) xApp variants.
Fig. 8. Performance of AutORAN-developed anomaly detection xApp on MobiWatch: (a) quality increment across iterations; and (b) diverse xApp variants.

To examine the model iteration process of AutORAN and evaluate the performance of its automatically generated xApps, we analyze both accuracy and code quality across multiple design and validation cycles. As shown in Fig. 7(a),
each iteration yields consistent improvements, with increased accuracy and fewer bugs. On average, the initial implementation contains approximately five bugs per trial across 25 independent runs. These bugs typically stem from incomplete logic, improper data preprocessing, and incorrect control flow or algorithm boundaries, all of which require refinement to achieve functional correctness. Notably, the number of bugs decreases significantly after two to three iterations, with accuracy improving in parallel. This is driven by AutORAN's code improvement module, which adapts subsequent prompts based on insights from prior errors and system feedback. Over time, the nature of the refinements shifts: early iterations focus on correcting structural issues such as flawed logic or misused KPMs, while later iterations address finer details including hyperparameter tuning, code formatting, and exception handling.

Beyond accuracy, AutORAN also supports interactive customization. Users can specify preferences during development, such as the trade-off between computational workload and model accuracy. To demonstrate this flexibility, we independently developed three xApp variants using AutORAN, each with distinct requirements (e.g., adjusting VRAM usage per iteration). As shown in Fig. 7(b), the variants (i.e., AutORAN-1 to AutORAN-3) differ in feature selection and AI/ML model architecture. Higher accuracy generally correlates with higher VRAM consumption, allowing operators to select the most suitable variant for the available hardware resources.

3) Baseline 2 - MobiWatch for Anomaly Detection: MobiWatch [10] monitors link-layer and session-layer (RRC and NAS) messages using MobiFlow [10], a telemetry pipeline that transforms packet traces into structured signaling features. The dataset includes both benign and adversarial traces.
MobiWatch implements two versions of anomaly detection using Autoencoders [44] and LSTM [45], denoted MobiWatch-1 and MobiWatch-2, respectively.

Results. Table IV lists the average accuracy, precision, and recall of both MobiWatch and AutORAN-developed xApps. Because the MobiWatch dataset is simpler than that of Baseline 1 (i.e., it includes fewer KPMs), AutORAN reaches 100% accuracy on both benign and attack traces and outperforms MobiWatch on the same dataset, most notably on benign traces. Fig. 8(a) illustrates the model iteration process. We observe that for simpler tasks, AutORAN converges to a satisfactory model more quickly, in fewer than three iterations. In addition, AutORAN provides various xApp options, as shown in Fig. 8(b). Users can choose simpler models (e.g., AutORAN-2, which does not use a GPU) to achieve comparable performance.

TABLE IV
ACCURACY COMPARISON WITH MOBIWATCH [10]
Method | Dataset | Accuracy | Precision | Recall
AutORAN | Benign | 100% | 100% | N/A
AutORAN | Attack | 100% | 100% | 100%
MobiWatch-1 | Benign | 93.23% | 93.23% | N/A
MobiWatch-1 | Attack | 100% | 100% | 100%
MobiWatch-2 | Benign | 91.15% | 91.15% | N/A
MobiWatch-2 | Attack | 95% | 88.68% | 100%

4) Baseline 3 - IC for Interference Classification: This dataset was collected in [46] using a real srsRAN-based O-RAN testbed. It includes two modalities: 10,000 spectrograms (128×128 grayscale) recorded over the air from USRP-based receivers, and over 25,000 uplink KPM traces including SINR, BLER, MCS, and throughput metrics. Half of the samples correspond to clean transmissions, while the other half represent continuous-wave jamming scenarios.

TABLE I
ACCURACY COMPARISON WITH IC [46]
Method | Type of xApps (Dataset) | Accuracy
AutORAN | InterClass-Spec | 92.9%
AutORAN | InterClass-KPM | 98.8%
IC [46] | InterClass-Spec | 98%
IC [46] | InterClass-KPM | 97.9%

Results. We generated two versions of AutORAN-IC xApps using either the spectrogram data (InterClass-Spec) or the KPM subset (InterClass-KPM). As shown in Table I, AutORAN achieves detection accuracies of 92.9% and 98.8% on the two respective subsets, comparable to the CNN and DNN models proposed in the baseline paper [46]: it slightly exceeds the baseline on KPM data (98.8% vs. 97.9%) but trails it on spectrograms (92.9% vs. 98%).
However, we observed significant performance variance across different generated versions: some models achieve less than 80% accuracy, while others exceed 95%. The underlying reasons are twofold. First, unlike time-series KPM data, spectrograms are high-dimensional images that require more complex algorithms for effective and robust data preprocessing. Second, inappropriate selection of AI/ML model architectures and hyperparameters from the vast configuration space can lead to unstable performance [47]. To tackle this issue, we may further integrate architecture-level reasoning and guided search strategies into the generation process. Overall, the experimental results demonstrate that AutORAN is capable of generating xApp functions for radio-layer tasks that adopt more complex, image-based AI/ML models.

C. Private & Domain-Specific Performance

To further assess the generalizability and practical utility of AutORAN, we evaluate its performance on slice scheduling, a more complex, control-oriented xApp that requires real-time radio resource allocation under multi-slice QoS constraints. Unlike pattern recognition tasks such as anomaly detection, slice scheduling involves reasoning over competing service objectives and issuing control actions that adhere to policy requirements within the RAN. Given the operator-specific nature of slice configurations and their strong dependence on deployment environments, no public datasets currently exist for benchmarking slice scheduling performance.

Fig. 9. Performance of AutORAN-generated slice scheduling xApp: (a) latency (ms) vs. throughput (Mbps) per slice, Ref vs. xApp for eMBB, URLLC, mMTC, Gaming, and IoT; (b) QoS satisfaction rate over time steps, Greedy vs. AutORAN.
To address the lack of public slice scheduling datasets, we construct a synthetic dataset that emulates realistic per-slice telemetry and QoS policies, grounded in standardized O-RAN interface specifications. A key advantage of this synthetic approach is that it excludes publicly available samples that may have been encountered during LLM pre-training, thereby mitigating memorization effects and enabling a more accurate evaluation of AutORAN's generalization capabilities. We extend the standard service types defined by 3GPP [48] to five representative slice categories, each embodying distinct QoS objectives: eMBB emphasizes sustained high throughput, URLLC targets ultra-low latency, mMTC prioritizes energy-efficient massive connectivity, Gaming demands low latency with stable jitter, and IoT focuses on reliable transmission under low data-rate conditions. These slice definitions are instantiated across the O-RAN stack, influencing A1 policy configurations, guiding E2 node telemetry reporting, and serving as templates for SMO-driven service orchestration. The synthetic dataset is built to closely replicate slice-level telemetry patterns observed in operational O-RAN deployments. Its telemetry generation process is grounded in the standardized E2SM-KPM service model and captures a broad set of per-slice metrics, including Physical Resource Block (PRB) utilization, end-to-end latency, throughput, active UE count, packet loss, jitter, Reference Signal Received Power (RSRP), and Physical Downlink Control Channel (PDCCH) utilization. These metrics are sampled at regular intervals from gNBs and streamed to the near-RT RIC, reflecting the operational data flow of a real O-RAN deployment. In addition to raw measurements, slice-specific QoS targets and relative priorities are derived from A1 policy profiles, so the dataset encodes not only observed performance states but also policy-driven intents.
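The pairing of sampled telemetry with policy-derived targets can be illustrated with a small generator. This is a hypothetical sketch: the QoS target values, the linear load model linking PRB utilization to latency and throughput, and the three-metric subset are our assumptions, not the calibrated distributions used for the paper's dataset. It only shows how per-slice KPM samples checked against A1-style targets yield a per-step QoS satisfaction rate of the kind plotted in Fig. 9(b).

```python
import random

# Hypothetical per-slice QoS targets, standing in for A1 policy profiles.
QOS_TARGETS = {
    "eMBB": {"min_thp_mbps": 50.0, "max_latency_ms": 40.0},
    "URLLC": {"min_thp_mbps": 5.0, "max_latency_ms": 10.0},
    "mMTC": {"min_thp_mbps": 1.0, "max_latency_ms": 100.0},
    "Gaming": {"min_thp_mbps": 20.0, "max_latency_ms": 20.0},
    "IoT": {"min_thp_mbps": 0.5, "max_latency_ms": 60.0},
}


def sample_slice_kpms(rng: random.Random, slice_name: str) -> dict[str, float]:
    """One synthetic per-slice sample; latency rises and throughput falls with PRB load (assumed model)."""
    target = QOS_TARGETS[slice_name]
    prb_util = rng.uniform(0.2, 1.0)
    latency = target["max_latency_ms"] * (0.5 + 0.7 * prb_util) + rng.gauss(0.0, 1.0)
    thp = target["min_thp_mbps"] * (1.6 - 0.8 * prb_util) * rng.uniform(0.9, 1.1)
    return {"prb_util": prb_util, "latency_ms": max(latency, 0.0), "thp_mbps": thp}


def meets_target(slice_name: str, kpms: dict[str, float]) -> bool:
    """Check one sample against its slice's policy-derived QoS targets."""
    target = QOS_TARGETS[slice_name]
    return (kpms["latency_ms"] <= target["max_latency_ms"]
            and kpms["thp_mbps"] >= target["min_thp_mbps"])


def qos_satisfaction_trace(steps: int, seed: int = 0) -> list[float]:
    """Fraction of slices meeting their QoS targets at each time step."""
    rng = random.Random(seed)  # seeded so the synthetic trace is reproducible
    trace = []
    for _ in range(steps):
        met = sum(meets_target(s, sample_slice_kpms(rng, s)) for s in QOS_TARGETS)
        trace.append(met / len(QOS_TARGETS))
    return trace


trace = qos_satisfaction_trace(200)
print(min(trace), max(trace), sum(trace) / len(trace))
```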
Encoding both measured performance and policy targets allows downstream modules and generated xApps to reason jointly about runtime performance and compliance with operator objectives. To enhance realism, we further calibrate statistical distributions and inter-metric correlations using empirical trends reported in prior measurement studies. In particular, SpotLight [9] provides fine-grained insights into PRB allocation and interference patterns, while MobiWatch [10] reports user mobility and throughput dynamics across slices. These studies guide our selection of value ranges, correlations, and noise models, ensuring that the resulting synthetic dataset remains representative of realistic RAN telemetry.

Fig. 10. Execution time of AutORAN-generated xApps on real-world testbed: CDFs of execution time (ms) for AutORAN-1/2/3 on (a) Anomaly-1, (b) Anomaly-2, and (c) Slice Scheduling.

We evaluate the AutORAN-generated slice scheduling xApp by jointly examining its effect on per-slice throughput/latency trade-offs and on the temporal evolution of QoS satisfaction. As shown in Fig. 9(a), each marker pair represents the requirement (Ref) and the AutORAN-generated xApp for the same slice type. Across all five slice categories, the xApp consistently shifts the operating point toward the top-left region, indicating simultaneous throughput gains and latency reductions. URLLC slices, in particular, exhibit the largest latency improvement, confirming that AutORAN can generate control logic that prioritizes delay-sensitive services while preserving throughput. Fig. 9(b) further presents the QoS satisfaction rate over time under two scheduling strategies: a greedy heuristic baseline and AutORAN.
While the greedy approach achieves only moderate compliance and fluctuates heavily with traffic changes, the AutORAN-generated xApp maintains consistently higher satisfaction rates by dynamically reallocating PRBs in response to real-time slice performance feedback, ensuring robust QoS across heterogeneous and time-varying loads.

D. Real-World Execution Performance

To assess the deployability of AutORAN-generated xApps, we conducted full integration and runtime testing on a real-world O-RAN testbed. For each representative function (two anomaly detection applications and one slice scheduling application), we automatically generated three distinct algorithmic variants of the xApp, denoted AutORAN-1/2/3, and onboarded them into the near-RT RIC platform via standard integration procedures.

We measured the end-to-end execution time of the complete control loop, encompassing KPI stream reception, model inference, and control policy execution. As shown in Fig. 10(a,b), the execution times of the two anomaly detection xApps range over 20–110 ms and 55–90 ms, respectively. The slightly higher latency of the second xApp is attributed to its more intensive telemetry analysis. For slice scheduling (Fig. 10(c)), execution times span 250–420 ms, reflecting the added complexity of multi-slice QoS optimization.
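A simple harness for this kind of measurement might look as follows. It is a hypothetical sketch, not the instrumentation behind Fig. 10: `loop_once` stands in for one complete KPI-reception, inference, and control iteration, `dummy_loop` is a placeholder workload, and the 1 s budget reflects the upper bound of the near-RT RIC control-loop range. Reporting the median, 95th percentile, and maximum mirrors the CDF view in Fig. 10, where the tail matters more than the average for meeting the deadline.

```python
import statistics
import time
from typing import Callable

NEAR_RT_BUDGET_MS = 1000.0  # upper end of the near-RT control-loop range (1 s)


def profile_control_loop(loop_once: Callable[[], None], runs: int = 50) -> dict:
    """Time repeated end-to-end iterations (receive KPMs -> infer -> send control) and summarize."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        loop_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": statistics.quantiles(samples_ms, n=20)[-1],  # 95th percentile cut point
        "max_ms": max(samples_ms),
        "within_budget": max(samples_ms) <= NEAR_RT_BUDGET_MS,
    }


def dummy_loop() -> None:
    """Hypothetical stand-in for one KPM-to-control iteration."""
    window = [float(i % 97) for i in range(5_000)]  # pretend KPM window
    _ = sum(window) / len(window) > 50.0  # pretend inference and decision


print(profile_control_loop(dummy_loop))
```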
The measured execution times demonstrate that AutORAN-generated xApps are deployable within the sub-second latency constraints of near-RT RIC control loops, even when accommodating more sophisticated control tasks.

Fig. 11. Impact of different LLMs and programming languages on AutORAN-generated xApps: average F1 score or accuracy (%) and synthesis time (min) for GPT-4o, GPT-3.5, DeepSeek, Claude-3, LLaMA 3.1, and GPT-5 on (a) Baseline 1, (b) Baseline 2, and (c) Baseline 3; (d) number of bugs vs. iteration-to-success count for C++, C, and Python.

E. Key Impact Factors

To further examine the critical determinants of xApp function performance and development outcomes, we conducted a set of in-depth experiments isolating key design and environmental factors. Specifically, we vary the base LLM used for synthesis, the code iteration count and bug resolution cycles, and the programming language, and we assess the importance of each modular step in the AutORAN generation pipeline. The insights derived from these analyses not only reveal the inner mechanisms of our framework but also inform practical decisions for developers using AutORAN for future xApps.

We test six LLMs for comparison: GPT-4o [19], GPT-3.5 [49], DeepSeek-R1 [50], Claude 3 [30], LLaMA 3.1 (70B) [51], and GPT-5 [31]. Each model was used to independently generate xApps, which were then evaluated in terms of accuracy/F1 score and total synthesis time. As shown in Fig.
11(a), the xApp generated via GPT-4o consistently achieves strong performance across all datasets, while GPT-5 delivers nearly identical accuracy but with longer synthesis time. This superior performance stems from three key factors: (1) stronger chain-of-thought reasoning, leading to better decomposition of high-level xApp logic; (2) more semantically aligned code with fewer hallucinations or misinterpretations of O-RAN domain terms; and (3) more modular and syntactically complete outputs, reducing post-generation correction. Moreover, GPT-4o-based AutORAN requires less than 20 minutes to generate most of the xApps. Other models exhibit greater variability: some tend to produce verbose code with redundant control logic, while others rely on overly generic patterns or omit low-level implementation details. Importantly, LLM performance is not uniform across xApp generation tasks: anomaly detection based on structured KPM telemetry (Baseline 1) is generally easier to synthesize, whereas tasks involving more complex or unstructured modalities pose greater challenges. For Baseline 2 (Fig. 11(b)), which includes fine-grained latency and jitter traces, GPT-4o and GPT-5 produced xApps with over 95% accuracy, whereas GPT-3.5 and DeepSeek-based variants often misinterpreted the input schema, yielding accuracy drops of 10–15%. Claude 3 showed intermediate performance, producing mostly correct feature extraction logic but occasionally omitting edge-case handling. For Baseline 3, results varied more substantially depending on both the chosen LLM and the input modality (spectrogram vs. KPM).
While GPT-4o and GPT-5 successfully generated convolution-based pipelines that matched or exceeded the original baselines, GPT-3.5 and DeepSeek struggled to produce valid preprocessing stages for spectrograms, sometimes defaulting to generic feedforward models unsuitable for high-dimensional inputs. These results reinforce that xApp synthesis quality depends not only on the target task modality but also on the LLM's ability to align generated code with domain-specific structures and constraints.

We explicitly prompted AutORAN to synthesize functionally equivalent xApps in Python, C, and C++. Each version was tested across 10 independent trials using the same requirements and input dataset. We then recorded the cumulative bug count before the first successful run and the iteration-to-success count to assess code reliability and structural difficulty. As shown in Fig. 11(d), Python xApps are more efficient and concise, averaging only 1.8 bugs per version and requiring only 3 iterations to reach a fully functional implementation. This advantage can be attributed to Python's high-level syntax, which is closer to natural language than that of C and C++. Its concise, human-readable structure enables LLMs to translate high-level user requirements into functional code more effectively [52]. In contrast, C and C++ xApps exhibit more bugs, often due to errors in memory management, pointer arithmetic, and data type handling, and require more refinement iterations to converge.

F. Ablation Study

We conducted ablation studies to separately evaluate the contribution of each technical module. Specifically, we consider four key modules: Requirement Refinement and Structuring, Background Knowledge Retrieval, xApp Function Design, and Code Validation.
In each ablation setting, we disabled one module while keeping the others intact, then generated 25 xApps per dataset and recorded two metrics: the one-shot success rate and the iteration-to-success count. The results in Table V show that the four modules make distinct contributions and together enable robust, reliable, high-performance xApp synthesis: ablating any single module noticeably degrades overall performance.

SUBMITTED FOR REVIEW TO IEEE TRANSACTIONS ON MOBILE COMPUTING

TABLE V
ABLATION STUDY (REQUIREMENT REFINEMENT AND STRUCTURING, KNOWLEDGE RETRIEVAL, FUNCTION DESIGN, AND VALIDATION)

Method         | One-Shot Success Rate   | Iteration-to-Success Count
               | Baseline 1 | Baseline 2 | Baseline 1 | Baseline 2
w/o R. (§I-A)  | 0.88       | 0.84       | 8          | 3
w/o K. (§I-B)  | 0.76       | 0.80       | 14         | 5
w/o F. (§I-C)  | 0.90       | 0.80       | 8          | 3
w/o V. (§I-C)  | 0.84       | 0.92       | 10         | 6
AutORAN        | 0.92       | 0.96       | 5          | 3

Specifically, a significant drop occurs when the Background Knowledge Retrieval module is excluded. Without this module, AutORAN lacks the essential O-RAN technical knowledge (e.g., O-RAN KPMs, optimization heuristics, and SOTA algorithm templates) needed for in-context reasoning throughout the xApp generation process. As a result, the generated xApps tend to adopt simplistic algorithms with generic logic flow, leading to a one-shot success rate below 60%. Without this module, we also observed a dramatic increase in the number of code refinement iterations. In addition, removing the Requirement Refinement and Structuring module also degrades performance. Without it, user inputs are passed directly to LLMs in an ambiguous and unstructured form, hampering AutORAN's ability to interpret user intents accurately. Consequently, the generated xApps often produce misaligned outputs in incorrect formats. Moreover, when the xApp Function Design module is disabled, AutORAN loses its chain-of-thought (CoT) prompting and skips important intermediate reasoning steps.
This results in code that appears syntactically correct but often suffers from logical flaws and incomplete functionality. Lastly, ablating the Code Validation module disables automated syntax checking and static analysis before final output, which leads to more immediate runtime failures. In summary, these findings show that the complete AutORAN pipeline is essential for consistently generating functional xApps: the modules complement one another, allowing AutORAN to reliably generate executable xApps across diverse O-RAN control scenarios.

VI. RELATED WORK

xApp Development. O-RAN promotes openness and modularity, enabling flexible integration of heterogeneous components across the RAN. Numerous studies [53]–[60] have explored various aspects of O-RAN, but xApp development [37] remains cumbersome due to system complexity and evolving specifications [61]. To improve deployment efficiency and multi-service support, OREO [15] formulates xApp orchestration as a multi-service deployment problem and enables xApp sharing. Spotlight [9] focuses on functionality design and proposes a telemetry-driven xApp for explainable anomaly detection. For control intelligence, DRL-based xApps [62] adopt deep reinforcement learning to allocate O-RAN resources and meet traffic and slicing demands. Recent work [63] explores natural-language-based intent translation for O-RAN slice control, but it focuses on simple rule-based control and does not support code synthesis or validation. In contrast, AutORAN presents an end-to-end automation framework for agile xApp development, which allows developers and even network operators to quickly transform high-level business ideas into readily deployable xApps, thereby drastically shortening the xApp development cycle.
LLMs for Code Generation. By exploiting the remarkable language understanding and processing capabilities of LLMs, researchers have developed numerous intelligent applications, such as mobile task automation [64], [65] and IoT data interpretation [66], [67]. LLM-based code generation [68]–[71] automatically translates user requirements described in natural language into executable programs. A pioneering work, MapCoder [70], decomposes the code generation process into four stages (retrieval, planning, coding, and debugging), each orchestrated by LLM prompts. Most existing LLM-based code generation systems, however, focus on general-purpose programming tasks and are not tailored to specific application domains. AutORAN draws inspiration from these works and develops novel technical modules and prompts tailored to automated xApp development. Compared with existing works, the key novelty of AutORAN lies in the systematic design and implementation of a fully automated pipeline built specifically for agile xApp development.

VII. CONCLUSION

AutORAN marks a paradigm shift in xApp development, from manual programming by experts to fully automated xApp generation and deployment. AutORAN aims to streamline xApp development with the latest advances in LLMs and agentic AI, thereby simplifying xApp programming, shortening the launch time of new features, and stimulating O-RAN innovation. To this end, AutORAN builds an end-to-end automated xApp generation pipeline that integrates a set of novel techniques for user intent elicitation, knowledge retrieval, code generation, and deployment. Evaluations show that AutORAN can generate performant xApps (on par with hand-crafted SOTA baselines) with little programming effort and in much less time. These encouraging results underscore the potential of AutORAN to accelerate O-RAN innovation.

REFERENCES

[1] O-RAN Alliance, "https://www.o-ran.org", 2024.
[2] X. Foukas, B. Radunovic, M. Balkwill, and Z. Lai, “Taking 5G RAN analytics and control to a new level,” in Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023, p. 1–16.
[3] J. Xing, J. Gong, X. Foukas, A. Kalia, D. Kim, and M. Kotaru, “Enabling resilience in virtualized RANs with Atlas,” in Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023, p. 1–15.
[4] X. Foukas and B. Radunovic, “Concordia: Teaching the 5G vRAN to share compute,” in Proceedings of the ACM SIGCOMM 2021 Conference, 2021, p. 580–596.
[5] N. Lazarev, T. Ji, A. Kalia, D. Kim, I. Marinos, F. Y. Yan, C. Delimitrou, Z. Zhang, and A. Akella, “Resilient baseband processing in virtualized RANs with Slingshot,” in Proceedings of the ACM SIGCOMM 2023 Conference, 2023, p. 654–667.
[6] X. Foukas, T. S. Ukyab, B. Radunovic, S. Ratnasamy, and S. Shenker, “RANBooster: Democratizing advanced cellular connectivity through fronthaul middleboxes,” in Proceedings of the ACM SIGCOMM 2025 Conference, 2025, p. 742–757.
[7] X. Cheng, J. Zhang, N. Ding, N. Li, Y. Li, T. Wu, W. Xu, J. Zhang, and Q. Sun, “Integrated AI and communications: A two-way catalysis toward 6G and beyond,” Journal of Communications and Information Networks, vol. 10, no. 3, p. 191–200, 2025.
[8] Y. Chen, J. Guo, Y. Sun, H. Yao, Y. Liu, and Y. He, “Sensing resource scheduling in 5G vRAN: An elastic approach,” ACM Transactions on Internet of Things, vol. 6, no. 3, p. 1–24, 2025.
[9] C. Sun, U. Pawar, M. Khoja, X. Foukas, M. K. Marina, and B. Radunovic, “Spotlight: Accurate, explainable and efficient anomaly detection for Open RAN,” in Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024, p. 923–937.
[10] H. Wen, P. Sharma, V. Yegneswaran, P. Porras, A. Gehani, and Z. Lin, “6G-XSec: Explainable edge security for emerging OpenRAN architectures,” in Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, 2024, p. 77–85.
[11] S.-E. Hong, J. Moon, and J. Na, “Design of interpretable and enhanced anomaly detection xApp for traffic steering in O-RAN,” in IEEE 36th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2025, p. 1–6.
[12] A. Lacava, M. Polese, R. Sivaraj, R. Soundrarajan, B. S. Bhati, T. Singh, T. Zugno, F. Cuomo, and T. Melodia, “Programmable and customized intelligence for traffic steering in 5G networks using Open RAN architectures,” IEEE Transactions on Mobile Computing, vol. 23, no. 4, p. 2882–2897, 2023.
[13] R. Ntassah, G. M. Dell’Aera, and F. Granelli, “xApp for traffic steering and load balancing in the O-RAN architecture,” in IEEE International Conference on Communications, 2023, p. 5259–5264.
[14] M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understanding O-RAN: Architecture, interfaces, algorithms, security, and research challenges,” IEEE Communications Surveys & Tutorials, vol. 25, no. 2, p. 1376–1411, 2023.
[15] F. Mungari, C. Puligheddu, A. Garcia-Saavedra, and C. F. Chiasserini, “O-RAN intelligence orchestration framework for quality-driven xApp deployment and sharing,” IEEE Transactions on Mobile Computing, vol. 24, no. 6, p. 4811–4828, 2025.
[16] N. N. Sapavath, B. Kim, K. Chowdhury, and V. K. Shah, “Experimental study of adversarial attacks on ML-based xApps in O-RAN,” in IEEE Global Communications Conference, 2023, p. 6352–6357.
[17] A. Scalingi, S. D’Oro, F. Restuccia, T. Melodia, and D. Giustiniano, “Det-RAN: Data-driven cross-layer real-time attack detection in 5G Open RANs,” in IEEE Conference on Computer Communications, 2024, p. 41–50.
[18] J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim, “A survey on large language models for code generation,” ACM Transactions on Software Engineering and Methodology, vol. 35, no. 2, p. 1–72, 2026.
[19] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
[20] C. Fiandrino, L. Bonati, S. D’Oro, M. Polese, T. Melodia, and J. Widmer, “EXPLORA: AI/ML explainability for the Open RAN,” Proceedings of the ACM on Networking, vol. 1, no. CoNEXT3, p. 1–26, 2023.
[21] G. Aguzzi, N. Farabegoli, and M. Viroli, “A language-based approach to macroprogramming for IoT systems through large language models,” ACM Transactions on Internet of Things, vol. 6, no. 4, p. 1–30, 2025.
[22] Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, H. Xia, J. Xu, Z. Wu, B. Chang et al., “A survey on in-context learning,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2024, p. 1107–1128.
[23] srsRAN: Open source 4G/5G software radio access network, "https://www.srsran.com/", 2023.
[24] O-RAN Working Group 3, “O-RAN Near-RT RIC: Architecture 7.0,” O-RAN.WG3.TS.RICARCH-R004-v07.00 Technical Specification, 2025.
[25] O-RAN Working Group 2, “O-RAN Non-RT RIC: Architecture 6.0,” O-RAN.WG2.Non-RT-RIC-ARCH-R004-v06.00 Technical Specification, 2024.
[26] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “GPT-4o system card,” arXiv preprint arXiv:2410.21276, 2024.
[27] D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y. Wu, Y. Li et al., “DeepSeek-Coder: When the large language model meets programming–the rise of code intelligence,” arXiv preprint arXiv:2401.14196, 2024.
[28] AI-RAN Alliance, "https://ai-ran.org/", 2024.
[29] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” ACM Transactions on Information Systems, vol. 43, no. 2, p. 1–55, 2025.
[30] Claude Opus 4.1, "https://www.anthropic.com/news/claude-opus-4-1", 2025.
[31] GPT-5, "https://openai.com/index/introducing-gpt-5/", 2025.
[32] Cursor, "https://cursor.com/", 2025.
[33] A. A. Abbassi, L. Da Silva, A. Nikanjam, and F. Khomh, “Unveiling inefficiencies in LLM-generated code: Toward a comprehensive taxonomy,” arXiv preprint arXiv:2503.06327, 2025.
[34] Y. Zhu, H. Yuan, S. Wang, J. Liu, W. Liu, C. Deng, H. Chen, Z. Liu, Z. Dou, and J.-R. Wen, “Large language models for information retrieval: A survey,” ACM Transactions on Information Systems, vol. 44, no. 1, p. 1–54, 2025.
[35] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, p. 24824–24837, 2022.
[36] SonarQube, "https://www.sonarsource.com/", 2024.
[37] J. F. Santos, A. Huff, D. Campos, K. V. Cardoso, C. B. Both, and L. A. DaSilva, “Managing O-RAN networks: xApp development from zero to hero,” IEEE Communications Surveys & Tutorials, vol. 28, p. 800–840, 2025.
[38] R. Schmidt, M. Irazabal, and N. Nikaein, “FlexRIC: An SDK for next-generation SD-RANs,” in Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, 2021, p. 411–425.
[39] Open5GS, "https://open5gs.org", 2024.
[40] sysmoISIM SJA2, "https://sysmocom.de/products/sim/sysmousim/", 2021.
[41] LangChain, "https://www.langchain.com/", 2024.
[42] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou, “The Faiss library,” IEEE Transactions on Big Data, vol. 12, p. 346–361, 2025.
[43] S. Ramírez et al., “FastAPI.” [Online]. Available: https://fastapi.tiangolo.com
[44] D. E. Rumelhart, G. E. Hinton, R. J. Williams et al., “Learning internal representations by error propagation,” 1985.
[45] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, p. 1735–1780, 1997.
[46] A. Chiejina, B. Kim, K. Chowdhury, and V. K. Shah, “System-level analysis of adversarial attacks and defenses on intelligence in O-RAN based cellular networks,” in Proceedings of the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2024, p. 237–247.
[47] H. Li, P. Chaudhari, H. Yang, M. Lam, A. Ravichandran, R. Bhotika, and S. Soatto, “Rethinking the hyperparameters for fine-tuning,” arXiv preprint arXiv:2002.11770, 2020.
[48] 3GPP, 3rd Generation Partnership Project, "https://www.3gpp.org/", 2024.
[49] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, p. 1877–1901, 2020.
[50] A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “DeepSeek-V3 technical report,” arXiv preprint arXiv:2412.19437, 2024.
[51] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “LLaMA: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
[52] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le et al., “Program synthesis with large language models,” arXiv preprint arXiv:2108.07732, 2021.
[53] B. Brik, H. Chergui, L. Zanzi, F. Devoti, A. Ksentini, M. S. Siddiqui, X. Costa-Pérez, and C. Verikoukis, “Explainable AI in 6G O-RAN: A tutorial and survey on architecture, use cases, challenges, and future research,” IEEE Communications Surveys & Tutorials, vol. 27, no. 5, p. 2826–2859, 2024.
[54] B. Agarwal, R. Irmer, D. Lister, and G.-M. Muntean, “Open RAN for 6G networks: Architecture, use cases and open issues,” IEEE Communications Surveys & Tutorials, vol. 28, p. 2881–2924, 2025.
[55] X. Foukas, N. Nikaein, M. M. Kassem, M. K. Marina, and K. Kontovasilis, “FlexRAN: A flexible and programmable platform for software-defined radio access networks,” in Proceedings of the 12th International on Conference on emerging Networking EXperiments and Technologies, 2016, p. 427–441.
[56] X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, “Network slicing in 5G: Survey and challenges,” IEEE Communications Magazine, vol. 55, no. 5, p. 94–100, 2017.
[57] J. Groen, S. D’Oro, U. Demir, L. Bonati, M. Polese, T. Melodia, and K. Chowdhury, “Implementing and evaluating security in O-RAN: Interfaces, intelligence, and platforms,” IEEE Network, vol. 39, no. 1, p. 227–234, 2024.
[58] W.-H. Ko, U. Ghosh, U. Dinesha, R. Wu, S. Shakkottai, and D. Bharadia, “EdgeRIC: Empowering real-time intelligent optimization and control in NextG cellular networks,” in 21st USENIX Symposium on Networked Systems Design and Implementation, 2024, p. 1315–1330.
[59] A. Calagna, S. Maxenti, L. Bonati, S. D’Oro, T. Melodia, and C. F. Chiasserini, “CORMO-RAN: Lossless migration of xApps in O-RAN,” arXiv preprint arXiv:2506.19760, 2025.
[60] L. P. Rachakonda, M. Siddula, and V. Sathya, “A comprehensive study on IoT privacy and security challenges with focus on spectrum sharing in next-generation networks (5G/6G/beyond),” High-Confidence Computing, vol. 4, no. 2, p. 100220, 2024.
[61] C. Kilinc, M. K. Marina, M. Usama, S. Ergut, J. Crowcroft, T. Gundogdu, and I. Akinci, “JADE: Data-driven automated jammer detection framework for operational mobile networks,” in IEEE Conference on Computer Communications, 2022, p. 1139–1148.
[62] M. Martínez-Morfa, C. R. De Mendoza, C. Cervelló-Pastor, and S. Sallent, “DRL-based xApps for dynamic RAN and MEC resource allocation and slicing in O-RAN,” in 15th International Conference on Network of the Future, 2024, p. 106–114.
[63] X. Wu, J. Farooq, Y. Wang, and J. Chen, “LLM-xApp: A large language model empowered radio resource management xApp for 5G O-RAN,” in Symposium on Networks and Distributed Systems Security, Workshop on Security and Privacy of Next-Generation Networks (FutureG 2025), San Diego, CA, 2025.
[64] H. Wen, Y. Li, G. Liu, S. Zhao, T. Yu, T. J.-J. Li, S. Jiang, Y. Liu, Y. Zhang, and Y. Liu, “AutoDroid: LLM-powered task automation in Android,” in Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024, p. 543–557.
[65] S. Lee, J. Choi, J. Lee, M. H. Wasi, H. Choi, S. Ko, S. Oh, and I. Shin, “MobileGPT: Augmenting LLM with human-like app memory for mobile task automation,” in Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024, p. 1119–1133.
[66] H. Xu, L. Han, Q. Yang, M. Li, and M. Srivastava, “Penetrative AI: Making LLMs comprehend the physical world,” in Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, 2024, p. 1–7.
[67] S. Ji, X. Zheng, and C. Wu, “HARGPT: Are LLMs zero-shot human activity recognizers?” in IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things, 2024, p. 38–43.
[68] L. Shen, Q. Yang, Y. Zheng, and M. Li, “AutoIoT: LLM-driven automated natural language programming for AIoT applications,” in Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, 2025, p. 468–482.
[69] L. Shen, Q. Yang, X. Huang, Z. Ma, and Y. Zheng, “GPIoT: Tailoring small language models for IoT program synthesis and development,” in Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025, p. 199–212.
[70] M. A. Islam, M. E. Ali, and M. R. Parvez, “MapCoder: Multi-agent code generation for competitive problem solving,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, p. 4912–4944.
[71] D. Huang, J. M. Zhang, M. Luck, Q. Bu, Y. Qing, and H. Cui, “AgentCoder: Multi-agent-based code generation with iterative testing and optimisation,” arXiv preprint arXiv:2312.13010, 2023.