
Paper deep dive

A Computationally Efficient Learning of Artificial Intelligence System Reliability Considering Error Propagation

Fenglian Pan, Yinwei Zhang, Yili Hong, Larry Head, Jian Liu

Year: 2026 · Venue: arXiv preprint · Area: cs.AI · Type: Preprint · Embeddings: 119

Abstract

Artificial Intelligence (AI) systems are increasingly prominent in emerging smart cities, yet their reliability remains a critical concern. These systems typically operate through a sequence of interconnected functional stages, where upstream errors may propagate to downstream stages, ultimately affecting overall system reliability. Quantifying such error propagation is essential for accurate modeling of AI system reliability. However, this task is challenging due to: i) data availability: real-world AI system reliability data are often scarce and constrained by privacy concerns; ii) model validity: recurring error events across sequential stages are interdependent, violating the independence assumptions of statistical inference; and iii) computational complexity: AI systems process large volumes of high-speed data, resulting in frequent and complex recurrent error events that are difficult to track and analyze. To address these challenges, this paper leverages a physics-based autonomous vehicle simulation platform with a justifiable error injector to generate high-quality data for AI system reliability analysis. Building on this data, a new reliability modeling framework is developed to explicitly characterize error propagation across stages. Model parameters are estimated using a computationally efficient, theoretically guaranteed composite likelihood expectation–maximization algorithm. Its application to the reliability modeling for autonomous vehicle perception systems demonstrates its predictive accuracy and computational efficiency.

Tags

ai-safety (imported, 100%) · csai (suggested, 92%) · preprint (suggested, 88%)

Links

Open PDF directly →

Intelligence

Status: not_run | Model: - | Prompt: - | Confidence: 0%

Entities (0)

No extracted entities yet.

Relation Signals (0)

No relation signals yet.

Cypher Suggestions (0)

No Cypher suggestions yet.

Full Text

118,500 characters extracted from source content.


A Computationally Efficient Learning of Artificial Intelligence System Reliability Considering Error Propagation

Fenglian Pan1, Yinwei Zhang2, Yili Hong3, Larry Head2, and Jian Liu2
1Department of Industrial and Systems Engineering, UNC at Charlotte, Charlotte, NC, USA
2Department of Systems and Industrial Engineering, University of Arizona, Tucson, AZ, USA
3Department of Statistics, Virginia Tech, Blacksburg, VA, USA
Corresponding author: jianliu@arizona.edu

Abstract

Artificial Intelligence (AI) systems are increasingly prominent in emerging smart cities, yet their reliability remains a critical concern. These systems typically operate through a sequence of interconnected functional stages, where upstream errors may propagate to downstream stages, ultimately affecting overall system reliability. Quantifying such error propagation is essential for accurate modeling of AI system reliability. However, this task is challenging due to: i) data availability: real-world AI system reliability data are often scarce and constrained by privacy concerns; ii) model validity: recurring error events across sequential stages are interdependent, violating the independence assumptions of statistical inference; and iii) computational complexity: AI systems process large volumes of high-speed data, resulting in frequent and complex recurrent error events that are difficult to track and analyze. To address these challenges, this paper leverages a physics-based autonomous vehicle simulation platform with a justifiable error injector to generate high-quality data for AI system reliability analysis. Building on this data, a new reliability modeling framework is developed to explicitly characterize error propagation across stages. Model parameters are estimated using a computationally efficient, theoretically guaranteed composite likelihood expectation–maximization algorithm.
Its application to the reliability modeling for autonomous vehicle perception systems demonstrates its predictive accuracy and computational efficiency.

Keywords: Autonomous vehicle simulation, Error propagation modeling, Composite likelihood estimation, Multistage AI systems, Scalable statistical inference

1 Introduction

Artificial intelligence (AI) systems are becoming increasingly prevalent in various industry sectors, including transportation (Mnyakin, 2023), information technology (Chan et al., 2019), manufacturing (Peres et al., 2020), and healthcare (Balagurunathan et al., 2021). While AI technologies have unlocked unprecedented potential for innovation and efficiency in real-world applications, their reliability has emerged as a critical barrier to broader adoption. This challenge is particularly pronounced in safety-critical applications, such as autonomous vehicles (AVs), where failures in an AI system may lead to severe consequences. For example, in a recent tragic incident, the AI system of an AV misclassified the white side of a trailer as a bright sky, resulting in a fatal crash (NHTSA, 2017). Such incidents undermine public trust in AI technology and highlight the urgent need for scientifically grounded reliability modeling methods (Zou et al., 2022). Motivated by this need, this paper aims to develop a statistical reliability modeling framework, with applications to AI systems in AVs, to enable systematic quantification of AI system performance. AI systems in AVs usually integrate multiple sensors and AI/machine learning (ML) algorithms to enable situation awareness, decision-making, and control for AV navigation. In practice, an AI system operates through a series of functionally interconnected stages, where the outputs of one stage serve as the essential inputs for a functionally connected subsequent stage.
Within each stage, one or more modules may operate in parallel to provide functional redundancy, ensuring that if one module fails or produces erroneous outputs, the remaining modules can compensate. As shown in Fig. 1(a), the data acquisition stage of a three-stage AI system segment in an AV employs both camera and LiDAR sensors to capture complementary information: high-resolution images and three-dimensional (3-D) point cloud data that "depict" the surrounding driving environment. These heterogeneous, but complementary, data streams are processed in a subsequent object detection stage, where AI/ML algorithms perform 2-D detection on image data and 3-D detection on LiDAR point clouds, respectively, to identify vehicles, pedestrians, and other relevant objects. In the subsequent object localization stage, the detected outputs from both modules are fused to determine precise spatial locations of the objects on the road, enabling the AV to construct an accurate, real-time map of its immediate surroundings. While this multi-module, multi-stage structure is critical for ensuring overall system reliability and the safety of an AV, each module or stage may generate or be exposed to errors under diverse and potentially challenging driving conditions. For example, in Fig. 1(a), the camera module may capture low-quality images in adverse weather conditions, such as heavy rain or dense fog. The 2-D object detection module may fail to detect the car on the road due to the inherent limitations of its own algorithm or the degraded quality of the input image. Such error events tend to recur intermittently over time and can be represented as random points along the time domain, as shown in Fig. 1(b). Individually, these errors may not immediately result in catastrophic failure of the AI system. However, due to the functional dependencies among stages, some errors may propagate to downstream stages and trigger additional errors in subsequent stages.
This phenomenon is referred to as error propagation (EP) in the rest of this paper. As illustrated in Fig. 1(b), a low-quality image error at time $t_3$ in the data acquisition stage propagates to the 2-D object detection stage and causes a 2-D detection error at time $t_6$, which further propagates to the localization stage and leads to a localization error at time $t_9$. To better understand EP, this paper distinguishes two types of errors, defined as follows.

• Definition 1. A Primary Error is an error event caused by nonconforming performance of a module at a certain functional stage itself, which is independent of the performance/input of modules in upstream stages. For example, in Fig. 1(c), even with a high-quality image captured by a camera module in the data acquisition stage at $t_1$, the 2-D object detection module still failed to detect the object in the scene at $t_4$, indicating that this mis-detection error event arises directly (or "primarily") from the inherent limitation of the object detection algorithm. Such primary errors are marked as solid-filled points in Fig. 1(c).

• Definition 2. A Propagated Error is an error event triggered by erroneous input from an upstream stage. For instance, in Fig. 1(c), the 2-D object detection error at time $t_6$ is triggered by an excessively noisy image captured by the camera sensor module in the data acquisition stage at time $t_3$. Such propagated errors are marked as hollow-filled points in Fig. 1(c).

Figure 1: An Illustration of the EP in a Critical Segment of a Multi-Stage AI System

It is essential to explicitly model EP for AI system reliability analysis. This is because EP effects may accumulate across stages over time, potentially degrading system performance and ultimately leading to failure of the entire AI system. Neglecting such effects may result in an inaccurate system reliability model that underestimates the risk of system-level failure.
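To make the two definitions concrete, the Fig. 1(b) timeline can be encoded as a small event log. The sketch below is a hypothetical encoding, not from the paper; note that in real observed data the primary/propagated labels are latent and must be inferred, which is precisely the modeling challenge discussed next.

```python
# Hypothetical encoding of the Fig. 1(b) timeline: each error event is a
# (time, stage, kind) tuple, where kind is "primary" or "propagated".
# In real logged data, kind is NOT observed -- it is the latent quantity
# that the reliability model must infer.
events = [
    (3.0, "data_acquisition", "primary"),   # low-quality image at t_3
    (6.0, "2d_detection", "propagated"),    # triggered by the t_3 error
    (9.0, "localization", "propagated"),    # triggered by the t_6 error
]

def propagated_share(events):
    """Fraction of logged errors that are propagated rather than primary."""
    n_prop = sum(1 for _, _, kind in events if kind == "propagated")
    return n_prop / len(events)

# Two of the three errors in this toy log trace back to one upstream error,
# illustrating how a single primary error can dominate downstream behavior.
share = propagated_share(events)
```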
Nevertheless, modeling EP in AI systems presents several unique challenges, including:

(1) Data challenge: lack of readily available AI reliability data. A fundamental challenge in statistical AI system reliability modeling and analysis is the scarcity of reliability data. This is especially true for rare system-failure event data, such as AV disengagement events (not necessarily fatalities) (Min et al., 2022), which would require real-world tests with vehicles driven for hundreds of millions of miles to collect (Kalra and Paddock, 2016; Zheng et al., 2025). In contrast, AV simulation can offer a cost-effective and highly repeatable means of data generation. Simulation platforms, such as Automated Driving Toolbox, CARLA, and Autoware, can provide a variety of driving environments, traffic agents (e.g., vehicles and pedestrians), and rule-based traffic flows (Li and Okhrin, 2023). Nevertheless, concerns regarding the fidelity, realism, and representativeness of simulated scenarios relative to real-world driving conditions continue to limit the reliability of simulation data.

(2) Model challenge: latent and probabilistic EP. The presence of EP introduces several fundamental challenges for AI system reliability modeling. First, EP makes the error events across stages NOT independent given all of the known factors, which violates the independence assumptions of statistical reliability modeling. Second, EP is inherently latent, i.e., downstream errors do not carry labels indicating whether they are primary or propagated, making these error types indistinguishable given the observed data. Third, EP itself is probabilistic rather than deterministic. An upstream error may increase the risk of downstream failures, but does not guarantee their occurrence. These characteristics jointly complicate EP modeling in AI systems.

(3) Inference challenge: inaccurate and inefficient EP estimation.
The latent and probabilistic nature of EP tends to confound primary errors with propagated ones, thereby obscuring the original source of system failures and leading to biased model inference. Furthermore, the scale of recurring error event data collected across all system modules and stages, spanning the entire system operational duration, is vast. Performing inference on such datasets while simultaneously differentiating the effects of EP creates a significant computational bottleneck for traditional estimation methods. To address the data challenge, existing studies have proposed collecting AI system reliability data from real-world testing or simulation, as summarized in Table 1. Most publicly available AI system reliability data are collected independently at either the module or system level. For example, at the module level, the KITTI dataset (Geiger et al., 2013) provides reliability data of multiple AI system modules in moving AVs on the road, e.g., camera images, laser scans, GPS measurements, and IMU accelerations. Similarly, classification errors were collected from an image recognition model to assess the reliability of ML algorithms (Lian et al., 2021; Faddi et al., 2024). At the system level, the Autonomous Vehicle Tester (AVT) program (California Department of Motor Vehicles, 2024) reports disengagement event data collected during real-world testing of AV systems, providing valuable insights into AV reliability. Additionally, an online database (AI Incident, 2024) documents incidents involving the use of AI systems that result in harm or near-harm consequences. However, because cross-stage functional connections are absent, such disjoint module-level or system-level data alone cannot be used to model the EP.
Table 1: Existing AI System Reliability Data

Dataset | Level | Source
KITTI (Geiger et al., 2013) | Module | Real-world
ML/AI Algorithm Error (Lian et al., 2021; Faddi et al., 2024) | Module | Simulation
AV Disengagement (California Department of Motor Vehicles, 2024) | System | Real-world
General AI Incident Data (AI Incident, 2024) | System | Real-world

To address the reliability modeling challenge, various stochastic approaches have been developed to model different types of reliability data, such as time-to-event data and degradation data (Gorjian et al., 2010). For example, a Bayesian hazard model was introduced to consider latent heterogeneity in lifetime data (Li and Liu, 2016), while a Bayesian nonparametric model was proposed to model heterogeneous time-to-event data (Li et al., 2017). For degradation data, Wiener-process-based methods (Zhang et al., 2018), Gamma process models (Pan and Balakrishnan, 2011), and Bayesian models (Yuan et al., 2019) have been widely employed. In the context of AI reliability analysis, Wang et al. (2020) proposed a new data fusion model that utilizes a set of degradation data to monitor the reliability of AI systems. These methods primarily focus on single failure events or continuous degradation until failure and are well-suited for modeling lifetime and degradation data. However, they are not directly applicable to recurrent-event data, which involves modeling events that occur repeatedly over time. To analyze recurrent event data, approaches such as the Homogeneous Poisson Process (HPP) (Hossain et al., 1993; Yang et al., 2024), the Non-homogeneous Poisson Process (NHPP) (Pham and Zhang, 2003), and the Renewal Process (RP) (Pyke, 1961) are commonly employed. Among these methods, NHPP models have gained particular popularity. For instance, the NHPP has been used to model recurrent disengagement events observed during autonomous vehicle driving tests (Min et al., 2022).
Similarly, software reliability growth models (SRGMs) (Zhao et al., 2019) have been built using the same dataset to analyze AV reliability over time. While successful in predicting module-level or system-level reliability, these methods did not consider the event-level interdependency between different modules/systems (i.e., the cross-stage EP). To address this limitation, a multi-stage Hawkes Process (MSHP) (Pan et al., 2022) was recently proposed to explicitly quantify the EP in a multi-stage serial perception system in AVs. This method assumes only one module in each functional stage and cannot model scenarios with multiple modules within a stage. A subsequent study (Pan et al., 2024) introduced an event-triggering point process that extends the model to handle such scenarios. Despite these advancements, how to efficiently estimate the model parameters from massive reliability data has not been investigated in these models. To address the inference challenge, likelihood-based methods have been widely used for model estimation. A well-established approach is maximum likelihood estimation (MLE) (Pan et al., 2025), which is popular for its asymptotic efficiency and desirable statistical properties. However, MLE can pose significant challenges in terms of convergence and computational efficiency, particularly when estimating models with a large number of parameters (Kapur and Pecht, 2014). These difficulties are further compounded in the presence of incomplete data, where the need to account for unobserved or latent information (e.g., the latent EP in this paper) adds additional layers of mathematical and computational complexity. To address these limitations, researchers have explored alternative ways to implement MLE in settings with incomplete or partially observed data.
One of the most widely used approaches in such contexts is the expectation–maximization (EM) algorithm, which is designed to handle latent information by iteratively alternating between two steps: (1) the expectation (E) step, which computes the expected log-likelihood given the observed data and current parameter estimates, and (2) the maximization (M) step, which updates the parameters by maximizing this expected log-likelihood (Dempster, 1977). The EM algorithm was first applied to estimate the parameters of SRGMs in (Okamura et al., 2003) for the Exponential, Gamma, and Weibull distributions, and later in (Okamura and Dohi, 2013) for other fault detection time distributions. However, the update rule of the EM algorithm for all distribution parameters can be computationally complex when applied to a large dataset (Zeephongsekul et al., 2016). In this context, some studies have explored extensions of the EM algorithm to pseudo-likelihood settings, including pseudo-EM methods for network tomography (Liang and Yu, 2003), pairwise EM approaches for spatial generalized linear mixed models (Varin et al., 2005), and composite likelihood EM formulations for multivariate hidden Markov models (Gao and Song, 2011). These developments primarily focus on latent state sequence models or high-dimensional correlated data. To date, there has been limited work on developing composite likelihood EM methods for recurrent-event data, particularly for models that incorporate latent event-propagation mechanisms. This paper develops a comprehensive methodology to address challenges in data, modeling, and estimation for AI system reliability, with explicit consideration of EP. First, we develop an error injection (EI) mechanism within a scalable physics-based simulation platform to systematically generate and collect recurring error event data across multiple interconnected stages of AI systems.
Building upon these data, we propose an intensity-decomposition approach that explicitly models latent and probabilistic EP by mathematically distinguishing the effects of primary and propagated errors. To efficiently facilitate inference under latent and probabilistic EP, we introduce latent variables that indicate error types and design a composite likelihood–based EM algorithm. The proposed approach can achieve higher computational efficiency compared to conventional full-likelihood EM methods while preserving theoretical properties. The main contributions of this research include:

• A systematic data collection framework, which systematically generates and collects reliability data across multiple interconnected stages of an AI system. The framework enables targeted perturbations in specific modules at user-defined times and probabilities. By incorporating real-world testing information, it helps bridge the gap between simulation and real-world testing, providing more realistic and justifiable reliability data.

• An intensity-decomposition-based EP modeling approach, which captures latent and probabilistic EP by explicitly differentiating primary and propagated errors and decomposing the overall error effects into interpretable intensity components, thereby relaxing unrealistic independence assumptions across system modules.

• A new composite likelihood EM (CLEM) algorithm for point process model inference, which exploits localized composite likelihood construction and substantially reduces computational cost compared with the conventional full-likelihood EM approach. A stepwise Friedman Test selection procedure is further introduced to guide the construction of the composite likelihood, ensuring that estimation accuracy remains comparable to that of the standard EM algorithm. Moreover, the proposed framework preserves key theoretical properties, including the ascent property and statistical consistency.
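To illustrate the latent-variable idea behind this kind of EM estimation, consider a single downstream module with one upstream neighbor. Given current guesses of the primary rate and the propagation parameters, an E-step can assign each observed error a probability of being propagated, equal to the propagated share of the total intensity at its timestamp. The following is a minimal sketch under assumed parameter values and event times, not the paper's CLEM implementation:

```python
import math

def propagated_responsibility(t_down, upstream, lam0, alpha, beta):
    """E-step responsibility: P(the error at t_down is propagated).

    The propagated intensity is a sum of exponentially decaying kernels
    triggered by earlier upstream events; the primary intensity is the
    constant rate lam0. The responsibility is their ratio at t_down.
    """
    lam_p = sum(alpha * math.exp(-beta * (t_down - tj))
                for tj in upstream if tj < t_down)
    return lam_p / (lam0 + lam_p)

# A downstream error just after an upstream error is very likely propagated;
# one long after all upstream errors is almost surely primary.
upstream = [1.0, 4.0]
r_near = propagated_responsibility(1.1, upstream, lam0=0.1, alpha=2.0, beta=1.0)
r_far = propagated_responsibility(30.0, upstream, lam0=0.1, alpha=2.0, beta=1.0)
```

An M-step would then re-estimate the parameters from these soft labels, and the two steps alternate until convergence.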
Table 2 summarizes the limitations of the existing methods and the features of the proposed methods in addressing the data, model, and inference challenges in AI system reliability modeling.

Table 2: Challenges and Solutions: Existing vs. Proposed Methods

Challenges | Limitations of Existing Approaches | Features of Proposed Methods
Data | Disjoint module- or system-level data without EP information | Systematic data collection with EP information
Data | Expensive and time-consuming real-world tests with limited scenarios | Scalable, EI-enabled simulation with diverse driving scenarios
Model | Potential EP effects neglected | EP effects explicitly modeled as a quantifiable hazard intensity component
Inference | Computationally prohibitive/inefficient | Computationally efficient with theoretical guarantee

The remainder of this paper is organized as follows. Section 2 introduces the methodology. Section 3 demonstrates the advantages of the proposed method based on a numerical case study and a physics-based simulation case study. The conclusion and future work are given in Section 4.

2 Methodology

The objective of this study is to effectively and efficiently analyze the reliability of AI systems in AVs, with a particular focus on accurate EP effect quantification. This analysis relies on extensive reliability data and subsequent modeling and estimation techniques, which can be implemented in the integrative methodological framework presented in this section, including data simulation, model formulation, parameter estimation, and performance evaluation.

2.1 Data Simulation with a Justifiable EI-Enabled Framework

To address the data challenge by collecting reliability data with EP information, we propose a physics-based simulation framework to systematically generate AI system reliability data for AVs under diverse traffic scenarios. As illustrated in Fig. 2, the framework comprises three main components.
• A physics-based AV simulation platform, which is used to simulate AV performance in diverse traffic scenarios. As illustrated in Fig. 2(a), this platform consists of two main elements: (i) the environment, which integrates diverse physical models, including infrastructures, driving scenarios, and traffic-related agents, facilitating a high-fidelity simulation environment that closely mirrors real-world driving conditions; and (ii) the ego vehicle, which interacts with the driving environment via an AI system. This AI system fuses data from multiple sensors (e.g., camera, RADAR, and LiDAR) to perceive the surroundings, leveraging AI/ML algorithms for object detection, localization, and path planning.

• An EI-enabled framework, which is built on the Robot Operating System (ROS) (Quigley et al., 2009) that adopts a publisher-subscriber mechanism to facilitate data transmission between modules at upstream (publisher) and downstream (subscriber) stages in an AI system. With this mechanism, sensors and perception modules act as upstream publishers, continuously generating data streams such as camera images, LiDAR point clouds, or radar signals. These data streams are published as topics in ROS. Downstream modules, such as object detection, tracking, and sensor fusion nodes, function as subscribers. They receive relevant topics, process the incoming data, and produce higher-level perception outputs such as detected objects, lane boundaries, or free-space maps. As illustrated in Fig. 2(b), the proposed EI framework involves the following three key steps: (i) create a publisher node to transmit data from the upstream modules to downstream modules through ROS topics, (ii) inject realistic errors (e.g., randomly remove the point cloud data or add Gaussian noise to the image data) into the ROS topics, and (iii) create a subscriber node to receive erroneous data in downstream modules. This design allows for targeted EI into specific AI modules at user-defined timestamps and probabilities.
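The three steps above can be sketched with a minimal in-process stand-in for the publish/subscribe mechanism. The class and function names below are illustrative only; a real deployment would use ROS nodes and topics rather than this toy dispatcher:

```python
import random

# Minimal in-process stand-in for a publisher-subscriber mechanism
# (illustrative names; a real system would use ROS nodes and topics).
class Topic:
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)

def attach_error_injector(topic, p_err, corrupt, rng):
    """Wrap topic.publish so each message is corrupted with probability p_err."""
    original_publish = topic.publish
    def injected_publish(message):
        if rng.random() <= p_err:      # the EI indicator fires
            message = corrupt(message)
        original_publish(message)
    topic.publish = injected_publish

# Downstream subscriber simply logs what it receives.
received = []
camera = Topic("/camera/image")
camera.subscribe(received.append)

# With p_err = 1.0 every published frame is corrupted before delivery.
attach_error_injector(camera, p_err=1.0,
                      corrupt=lambda img: "noisy:" + img,
                      rng=random.Random(0))
camera.publish("frame_001")
```

Intercepting the topic between publisher and subscriber, rather than modifying either module, mirrors the design intent: errors can be injected into any module pair without touching the modules themselves.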
Formally, given a module $m_s$, a user-defined timestamp $t^{err}_{m_s}$, and a probability $p^{err}_{m_s}$, the EI indicator function is defined as $f(m_s, t^{err}_{m_s}, p^{err}_{m_s}) = \mathbb{1}\{u \le p^{err}_{m_s}\}$, $u \sim U(0,1)$, where $u$ is a random number drawn from a uniform distribution and $\mathbb{1}\{\cdot\}$ is an indicator function. Here, $f(m_s, t^{err}_{m_s}, p^{err}_{m_s}) = 1$ indicates that an error is successfully injected into module $m_s$ at time $t^{err}_{m_s}$ with probability $p^{err}_{m_s}$, whereas $f(m_s, t^{err}_{m_s}, p^{err}_{m_s}) = 0$ indicates that an error is not successfully injected into module $m_s$ at time $t^{err}_{m_s}$. Fig. 2(b) shows a successful EI application in the perception module, where clear camera images and dense LiDAR point clouds (blue and red car) are transformed into noisy images and sparse point clouds. With this EI framework, we can design the EI probabilities based on empirical findings from real-world tests, which can narrow the gap between simulation and reality. In addition, for rare events, instead of waiting for errors to occur naturally, the EI can accelerate the error occurrence process, providing an efficient method to generate and log sufficient error event data for reliability modeling.

• A data logging component, which is responsible for systematically collecting recurrent error events from multiple modules across different functional stages throughout the continuous operation of the AV. Fig. 2(c1) shows a representative layout of a three-stage segment of a multi-stage AI system, where $s$ is the stage index and $m_s$ is the module index at stage $s$. The recorded error events can be organized hierarchically at module, stage, and system levels, as illustrated in Fig. 2(c2).
For each error event, the logging component captures key information, including its error type, occurrence time, stage index, and module index (see Subsection 2.2 for notation details). By jointly tracking error events across stages over time, the logged data captures the temporal and cross-stage dependencies among modules, thereby providing the necessary information to characterize and model EP throughout the multi-stage AI system.

Figure 2: An Illustration of the Error Injection Framework in a Physics-Based Simulation Platform

2.2 Model Formulation

Consider a multi-stage AI system with $S$ functional stages, each of which fulfills its functionality with $M_s$ module(s), where $M_s \ge 1$. Let $m_s \in \{1, \ldots, M_s\}$ denote the index of the $m_s$-th module at stage $s$, where $s \in \{1, 2, \ldots, S\}$. At the module level, let $\mathbf{t}_{m_s} = [t^1_{m_s}, \ldots, t^i_{m_s}, \ldots, t^{n_{m_s}}_{m_s}]^\top \in \mathbb{R}^{n_{m_s} \times 1}$ denote the vector of observed error event times collected from module $m_s$. Here, $t^i_{m_s}$ is the timestamp of the $i$-th error event in module $m_s$ and $n_{m_s}$ is the number of error events observed in module $m_s$, as illustrated in Fig. 3(a). At the stage level, the collection of error events observed in stage $s$ is represented by $\mathbf{t}_s = \cup_{m_s=1}^{M_s} \mathbf{t}_{m_s}$, and the overall system-level error events are represented by $\mathbf{t} = \cup_{s=1}^{S} \mathbf{t}_s$, as shown in Fig. 3(d) and (g), respectively. The error events, $\mathbf{t}$, collected from an AI system can be naturally characterized as a multivariate point process evolving over time (Cox and Isham, 1980). In general, a point process is described by its conditional intensity function (CIF, "intensity" hereafter), which quantifies the instantaneous probability of an event occurring at time $t$, given the event history up to that time.
Mathematically, the module-level intensity associated with module $m_s$ is defined as

$$\lambda_{m_s}(t \mid \mathbf{t}_{m_s}) = \lim_{dt \to 0} \frac{E[N(t+dt) - N(t) \mid \mathbf{t}_{m_s}]}{dt}, \quad (1)$$

where $N(t)$ denotes the cumulative number of events up to time $t$, and $E[N(t+dt) - N(t) \mid \mathbf{t}_{m_s}]$ is the expected number of new events in the infinitesimal interval $(t, t+dt]$ conditional on the history $\mathbf{t}_{m_s}$. This intensity function provides a principled way to model the stochastic occurrence of errors at a given module $m_s$. It serves as the fundamental building block of the proposed EP model.

Figure 3: Illustration of Error Event Differentiation

It is worth noting that the raw error event observations in Fig. 3(a), (d), and (g) do not carry labels that indicate whether they are primary errors or propagated errors. In other words, EP is latent in the observed data. To explicitly model this latent EP, the module-level error events $\mathbf{t}_{m_s}$ can be differentiated into two categories: (i) primary errors, denoted by $\mathbf{t}^0_{m_s}$, and (ii) propagated errors, denoted by $\mathbf{t}^p_{m_s}$. One illustrative example of this distinction is shown in Fig. 3(a)-(c). Similarly, stage-level error events $\mathbf{t}_s$ can be differentiated into primary errors, $\mathbf{t}^0_s$, and propagated errors, $\mathbf{t}^p_s$, as illustrated in Fig. 3(d)-(f). System-level errors, $\mathbf{t}$, can be differentiated into primary errors, $\mathbf{t}^0$, and propagated errors, $\mathbf{t}^p$, as shown in Fig. 3(g)-(i). Based on these event decompositions, this paper decomposes the total error intensity of module $m_s$ as the sum of a primary error intensity and an accumulated propagated error intensity caused by stage $s-1$. To make this decomposition interpretable and mathematically tractable, we make two assumptions regarding the propagation mechanism, as below.

• Assumption 1.
Errors in module $m_s$ can be propagated only from its immediate preceding stage, which accumulates the propagation effects from further upstream stages. • Assumption 2. Given the external environmental factors, such as rainy and/or windy weather, the error events collected from the parallel modules within the same stage are independent.

Under these two assumptions, the total intensity of module $m_s$ can be decomposed as
$$\lambda_{m_s}(t \mid \mathcal{T}_{m_s}) = \lambda_{m_s}^0(t \mid \mathcal{T}_{m_s}^0) + \sum_{m_{s-1}=1}^{M_{s-1}} \lambda_{m_s,m_{s-1}}^p(t \mid \mathcal{T}_{m_{s-1}}^p), \quad (2)$$
where the first term, $\lambda_{m_s}^0(t \mid \mathcal{T}_{m_s}^0)$, denotes the primary error intensity of module $m_s$, and the second term, $\sum_{m_{s-1}=1}^{M_{s-1}} \lambda_{m_s,m_{s-1}}^p(t \mid \mathcal{T}_{m_{s-1}}^p)$, represents the collection of propagated error intensities inherited from the immediately preceding stage $s-1$. Specifically, the primary error intensity, $\lambda_{m_s}^0(t \mid \mathcal{T}_{m_s}^0)$, characterizes the instantaneous occurrence rate of errors caused by module $m_s$ itself, independent of upstream modules. It reflects the intrinsic reliability of module $m_s$ without considering EP. The propagated error intensity, $\lambda_{m_s,m_{s-1}}^p(t \mid \mathcal{T}_{m_{s-1}}^p)$, captures the instantaneous occurrence rate of errors at module $m_s$ that are probabilistically triggered by errors occurring in an upstream module $m_{s-1}$. This term explicitly captures the functional dependencies between module $m_{s-1}$ and module $m_s$ and provides a mechanism for modeling EP throughout the multi-stage AI system.

Appropriate parameterization of both the primary error intensity and the propagated error intensity in Eq. (2) is critical for accurately modeling AI system reliability. This paper assumes that, without considering EP, the primary error intensity of module $m_s$ is constant over time, i.e.,
$$\lambda_{m_s}^0(t \mid \mathcal{T}_{m_s}^0) = \lambda_{m_s}^0, \quad \lambda_{m_s}^0 > 0. \quad (3)$$
This is because AVs are subject to strict safety and reliability standards, such as ISO 26262 (Kafka, 2012), which emphasize minimizing variability in error rates to ensure safe operation. To meet these standards, AI modules in AVs undergo rigorous development, testing, and validation processes, all aimed at delivering consistent performance under normal operating conditions (i.e., without considering EP). For the propagated error intensity, we adopt a different characterization. We assume that an error occurring in an upstream module tends to propagate immediately to downstream modules. This effect naturally diminishes over time because the vehicle's decision-making process is primarily driven by the most recent input data. As new information continuously arrives, earlier errors are corrected, filtered, or overridden, thereby reducing their long-term influence on the system. To capture this mechanism, we model the propagated error process as a function of the upstream error process, modulated by an exponential decay function, i.e.,
$$\lambda_{m_s,m_{s-1}}^p(t \mid \mathcal{T}_{m_{s-1}}^p) = \sum_{t_{m_{s-1}}^j < t} \alpha_{m_s,m_{s-1}} \cdot \exp\left[-\beta_{m_s,m_{s-1}} \cdot (t - t_{m_{s-1}}^j)\right], \quad (4)$$
where the parameter $\alpha_{m_s,m_{s-1}}$ models the probabilistic nature of latent EP: $\alpha_{m_s,m_{s-1}} = 0$ indicates that no EP exists from module $m_{s-1}$ to module $m_s$, whereas a positive $\alpha_{m_s,m_{s-1}}$ quantifies the instantaneous EP effect for errors occurring in module $m_{s-1}$ to propagate to module $m_s$. The parameter $\beta_{m_s,m_{s-1}}$ models the temporal decay rate of the EP effects from module $m_{s-1}$ to module $m_s$. Both $\alpha_{m_s,m_{s-1}}$ and $\beta_{m_s,m_{s-1}}$ are uniquely associated with a module pair, $(m_{s-1}, m_s)$, characterizing the latent and probabilistic EP between modules $m_{s-1}$ and $m_s$. A higher $\alpha_{m_s,m_{s-1}}$ indicates a greater instantaneous EP effect, and a lower $\beta_{m_s,m_{s-1}}$ indicates a longer-lasting EP effect.
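To make the intensity decomposition concrete, the sketch below numerically evaluates the total module intensity of Eqs. (2)-(4) for a single downstream module; the single upstream module, its event times, and the parameter values are hypothetical illustrations, not values from the paper.

```python
import math

def propagated_intensity(t, upstream_times, alpha, beta):
    """Propagated error intensity (Eq. 4): each upstream error before t
    contributes alpha * exp(-beta * elapsed), decaying over time."""
    return sum(alpha * math.exp(-beta * (t - tj))
               for tj in upstream_times if tj < t)

def total_intensity(t, lam0, upstream_histories, alphas, betas):
    """Total module intensity (Eqs. 2-3): a constant primary rate lam0
    plus the propagated intensities from each upstream module."""
    return lam0 + sum(propagated_intensity(t, hist, a, b)
                      for hist, a, b in zip(upstream_histories, alphas, betas))

# Hypothetical example: one upstream module with errors at t = 1.0 and 3.0.
lam = total_intensity(4.0, lam0=0.5, upstream_histories=[[1.0, 3.0]],
                      alphas=[0.3], betas=[0.3])
```

Note that the propagated term vanishes before the first upstream error occurs, so the module's intensity then reduces to its intrinsic rate $\lambda_{m_s}^0$, consistent with Eq. (3).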
2.3 Model Estimation

With the model established, this section develops a likelihood-based estimation algorithm for the parameters defined in Eqs. (2)-(4). This algorithm is designed to: (i) explicitly distinguish and quantify primary errors and propagated errors, (ii) improve computational efficiency, and (iii) provide theoretical guarantees for convergence and consistency.

2.3.1 Likelihood Function

Let $\Theta_{m_s} = \{\lambda_{m_s}^0, \alpha_{m_s,m_{s-1}}, \beta_{m_s,m_{s-1}} : m_{s-1} = 1, \ldots, M_{s-1}\}$ denote the parameter set of module $m_s$. The complete parameter set of an AI system can then be defined as $\Theta = \{\Theta_{m_s} : s = 1, \ldots, S;\ m_s = 1, \ldots, M_s\}$.

Proposition 1. Let $T$ denote the length of the observation window for all modules. The overall log-likelihood across all the modules and stages in an AI system is
$$\ell(\Theta) = \sum_{s=1}^{S} \sum_{m_s=1}^{M_s} \ell_{m_s}(\Theta_{m_s} \mid \mathcal{T}_{m_s}), \quad (5)$$
where $\ell_{m_s}$ is the log-likelihood for module $m_s$ and can be formulated as
$$\ell_{m_s}(\Theta_{m_s} \mid \mathcal{T}_{m_s}) = \sum_{i=1}^{n_{m_s}} \log\left(\lambda_{m_s}(t_{m_s}^i)\right) - \int_0^T \lambda_{m_s}(t)\, dt, \quad (6)$$
and $\lambda_{m_s}(t)$ is defined in Eqs. (2)-(4).

The parameters can be estimated by finding the value of $\Theta$ that maximizes $\ell(\Theta)$ in Eqs. (5)-(6). Standard numerical routines for maximum likelihood estimation (MLE) can be directly applied to obtain parameter estimates. However, this model requires simultaneously estimating $M_1 + \sum_{s=2}^{S} M_s(2 M_{s-1} + 1)$ parameters, where $M_s$, $s = 1, \ldots, S$, denotes the number of modules at stage $s$. Even for a moderate $M_s$, such high-dimensional optimization can become computationally intensive and numerically unstable (Veen and Schoenberg, 2008). Moreover, beyond the computational challenges, directly applying MLE does not distinguish between the primary errors $\mathcal{T}^0$ and the propagated errors $\mathcal{T}^p$, which limits its ability to quantify the latent and probabilistic EP within the system.
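The module log-likelihood in Eq. (6) can be written out directly for the exponential kernel, since the compensator integral has a closed form. The sketch below assumes a single upstream module and $\beta > 0$; it is an illustration of Eq. (6), not the authors' implementation.

```python
import math

def module_loglik(events, lam0, upstream_times, alpha, beta, T):
    """Log-likelihood of one module (Eq. 6) under the intensity of
    Eqs. (2)-(4), assuming one upstream module and beta > 0.

    Closed-form compensator for the exponential kernel:
    lam0*T + sum_j (alpha/beta) * (1 - exp(-beta*(T - t_j))).
    """
    loglik = 0.0
    for t in events:  # sum of log-intensities at the observed events
        lam = lam0 + sum(alpha * math.exp(-beta * (t - tj))
                         for tj in upstream_times if tj < t)
        loglik += math.log(lam)
    # subtract the integrated intensity over [0, T]
    compensator = lam0 * T + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (T - tj)))
        for tj in upstream_times)
    return loglik - compensator
```

Summing this quantity over all modules and stages gives the overall log-likelihood of Eq. (5).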
2.3.2 Latency Indicator

To distinguish between primary and propagated errors, we introduce a latency indicator variable, $I_{m_s}^i$, for each error event $t_{m_s}^i \in \mathcal{T}_{m_s}$, defined as
$$I_{m_s}^i = \begin{cases} 0, & \text{if } t_{m_s}^i \text{ is a primary error}, \\ j, & \text{if } t_{m_s}^i \text{ is triggered by an upstream error } t_{m_{s-1}}^j \in \mathcal{T}_{s-1}, \end{cases} \quad (7)$$
where $j \neq 0$ denotes the index of an error event in module $m_{s-1}$ in the previous stage $s-1$. Let $I$ denote the collection of all latency indicators, i.e., $I = \{I_{m_s}^i : s = 1, \ldots, S;\ m_s = 1, \ldots, M_s;\ i = 1, \ldots, n_{m_s}\}$. Given $I$, the error events $\mathcal{T}$ can be decomposed into two disjoint subsets, $\mathcal{T} = \mathcal{T}^0 \cup \mathcal{T}^p$, where $\mathcal{T}^0 = \{t_{m_s}^i \in \mathcal{T} \mid I_{m_s}^i = 0\}$ and $\mathcal{T}^p = \{t_{m_s}^i \in \mathcal{T} \mid I_{m_s}^i = j\}$.

Inference of the latency indicators $I$ is critical for model estimation based on the likelihood functions defined in Eqs. (5)-(6). By treating $I$ as missing values, the estimation can be implemented with an EM algorithm, as described in (Pan et al., 2024). However, the EM algorithm becomes computationally prohibitive when the number of error events is large. This is because the decay kernel in Eq. (4), $\exp[-\beta_{m_s,m_{s-1}} \cdot (t - t_{m_{s-1}}^j)]$, has infinite time support, implying that every historical error event in upstream stages could potentially propagate to the downstream stages and contribute to their error intensity functions. Thus, in the E-step of the EM algorithm, evaluating the distribution of each latency indicator $I_{m_s}^i$ requires looping over the complete history of upstream error events $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}$.
It involves computing a series of conditional probabilities: $p_{m_s}^{i(0)} = \Pr(I_{m_s}^i = 0)$, the probability that $t_{m_s}^i$ is a primary error, and $p_{m_s}^{i(j)} = \Pr(I_{m_s}^i = j)$, the probability that $t_{m_s}^i$ is a propagated error triggered by the upstream error $t_{m_{s-1}}^j$. As the observation window expands, the number of such probabilities grows quadratically with the number of events, resulting in a substantial computational burden for the E-step.

2.3.3 Composite Likelihood Expectation-Maximization Algorithm

To alleviate the computational burden imposed by the EM algorithm, this paper proposes a composite likelihood (CL) approach, which facilitates statistical estimation and inference by constructing a pseudo-likelihood as the product of a collection of tractable component likelihoods, with the specific components selected according to the modeling context. Building upon this approach, we develop a CL version of the EM algorithm, CLEM, that maximizes the CL to estimate the model parameter $\Theta$ in the presence of missing labels, i.e., the latent EP represented by the latency indicators $I_{m_s}^i$ in Eq. (7). To the best of our knowledge, this work is the first to develop a CLEM algorithm for point process model inference. Specifically, we formulate a CLEM framework for point processes by partitioning the observation window into multiple sub-windows and constructing a CL from block-likelihood contributions evaluated within each sub-window. Unlike conventional CLEM algorithms that typically combine marginal or conditional likelihood components, the proposed method preserves the structural form of the point process likelihood locally within each sub-window based on observed events. This formulation preserves the essential EP mechanism within localized temporal neighborhoods while improving computational tractability by imposing a finite temporal support on the triggering effect.
Such an assumption is practically reasonable in autonomous driving contexts, where decision-making primarily relies on the most recent data inputs. As new data continuously arrive, earlier errors are progressively corrected or overridden, thereby limiting their long-term propagated influence.

Let the observation window $[0, T)$ be partitioned into $K$ non-overlapping sub-windows of equal length $d = T/K$, where $T$ is the length of the observation window. The selection of $K$ will be discussed in detail in Section 2.3.4. For each sub-window $k$, $k = 1, \ldots, K$, we redefine the module-level, stage-level, and system-level error events as $\mathcal{T}_{m_s}^k = \{t_{m_s}^i \mid (k-1)d \leq t_{m_s}^i < kd\}$, $\mathcal{T}_s^k = \cup_{m_s=1}^{M_s} \mathcal{T}_{m_s}^k$, and $\mathcal{T}^k = \cup_{s=1}^{S} \mathcal{T}_s^k$, respectively. The composite log-likelihood is then defined as
$$\ell_{CL}(\Theta \mid \mathcal{T}) = \sum_{k=1}^{K} \ell(\Theta \mid \mathcal{T}^k), \quad (8)$$
where each component $\ell(\Theta \mid \mathcal{T}^k)$ is derived analogously to Eqs. (5)-(6), but with observation window $[(k-1)d, kd)$ and the restricted event set $\mathcal{T}^k$. Accordingly, the latency indicator for each event $t_{m_s}^i \in \mathcal{T}^k$ can be simplified to
$$I_{m_s}^{i,k} = \begin{cases} 0, & \text{if } t_{m_s}^i \text{ is a primary error}, \\ j, & \text{if } t_{m_s}^i \text{ is triggered by an upstream error } t_{m_{s-1}}^j \in \mathcal{T}_{s-1}^k, \end{cases} \quad (9)$$
where the condition $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}^k$, rather than $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}$ as in Eq. (7), implies that only EP scenarios within each sub-window are considered. In other words, we do not need to track the entire event history of stage $s-1$. In this setting, let $I_{CL}^k = \{I_{m_s}^{i,k} : s = 1, \ldots, S;\ m_s = 1, \ldots, M_s;\ i = 1, \ldots, n_{m_s}^k\}$ denote the collection of latency indicators associated with event set $\mathcal{T}^k$, where $n_{m_s}^k$ is the number of errors in module $m_s$ that occurred in sub-window $k$. The collection of latency indicators across all sub-windows is then defined as $I_{CL} = \cup_{k=1}^{K} I_{CL}^k$.
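The sub-window construction above amounts to bucketing event times by $d = T/K$; a minimal sketch (hypothetical event times) is:

```python
def partition_events(events, T, K):
    """Partition event times in [0, T) into K equal-length sub-windows,
    producing the restricted event sets used by the composite
    likelihood in Eq. (8)."""
    d = T / K  # sub-window length
    blocks = [[] for _ in range(K)]
    for t in sorted(events):
        k = min(int(t // d), K - 1)  # min() guards floating-point edge cases
        blocks[k].append(t)
    return blocks

# Hypothetical event times on [0, 3) split into K = 3 sub-windows.
blocks = partition_events([0.5, 2.7, 1.2, 2.9], T=3.0, K=3)
```

Each `blocks[k]` then plays the role of $\mathcal{T}^k$, and only parent-child pairs inside the same block are considered by the CLEM algorithm.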
Evaluating the distribution of each latency indicator $I_{m_s}^{i,k}$ requires computing only the conditional probabilities $p_{m_s}^{i(0)} = \Pr(I_{m_s}^{i,k} = 0)$, the probability that $t_{m_s}^i$ is a primary error, and $p_{m_s}^{i(j)} = \Pr(I_{m_s}^{i,k} = j)$, the probability that $t_{m_s}^i$ is a propagated error triggered by the upstream error $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}^k$.

With $\ell_{CL}(\Theta \mid \mathcal{T})$ defined in Eq. (8), our objective is to develop a CLEM algorithm that produces the maximum CL estimate of the model parameter $\Theta$ in the presence of the missing data $I_{CL}$. Suppose the CLEM algorithm has completed the $a$th iteration and produced an update $\hat{\Theta}^{(a)}$, $a = 0, 1, 2, \ldots$. At the $(a+1)$th iteration, the CL-E step calculates the expected value of the complete-data $\ell_{CL}(\Theta \mid \mathcal{T})$ with respect to the conditional distribution of the missing data $I_{CL}$ given the observed data $\mathcal{T}$ and the current estimates $\hat{\Theta}^{(a)}$, i.e.,
$$Q(\Theta \mid \hat{\Theta}^{(a)}) = E_{I_{CL} \mid \mathcal{T}, \hat{\Theta}^{(a)}}\left[\ell_{CL}(\Theta \mid \mathcal{T}, I_{CL})\right] = \sum_{k=1}^{K} E_{I_{CL}^k \mid \mathcal{T}^k, \hat{\Theta}^{(a)}}\left[\ell(\Theta \mid \mathcal{T}^k, I_{CL}^k)\right]. \quad (10)$$
It is worth noting that, in evaluating the Q function in Eq. (10), the proposed CLEM only requires computing likelihood contributions based on subset-specific data $\mathcal{T}^k$, i.e., $\ell(\Theta \mid \mathcal{T}^k, I_{CL}^k)$, and then aggregating them across all sub-windows. In contrast, the conventional EM algorithm requires evaluating the likelihood using the full dataset $\mathcal{T}$, i.e., $\ell(\Theta \mid \mathcal{T}, I)$, which leads to substantially higher computational complexity. The proposed CLEM algorithm iterates the CL-E and CL-M steps until convergence.

• CL-E Step: Given the current update $\hat{\Theta}^{(a)}$, obtain the expected CL function $Q$ defined in Eq. (10).

• CL-M Step: Maximize $Q(\Theta \mid \hat{\Theta}^{(a)})$ with respect to $\Theta$ to produce an updated $\hat{\Theta}^{(a+1)}$.
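The conditional probabilities needed in the CL-E step follow the usual branching-structure form for intensity-based models: each candidate parent is weighted by its contribution to the intensity at the event time. The sketch below assumes a single upstream module and hypothetical parameter values; it illustrates the within-sub-window responsibilities only, not the full CLEM iteration.

```python
import math

def cl_e_step(events_k, upstream_k, lam0, alpha, beta):
    """Within-sub-window CL-E step sketch: for each downstream event in
    sub-window k, compute the posterior probability that it is a primary
    error (key 0) or was triggered by the j-th upstream event in the
    same sub-window (key j, 1-based), per Eq. (9)."""
    resp = []
    for t in events_k:
        w0 = lam0  # unnormalized weight of the "primary error" branch
        wj = [alpha * math.exp(-beta * (t - tj)) if tj < t else 0.0
              for tj in upstream_k]  # weights of candidate upstream parents
        z = w0 + sum(wj)
        resp.append({0: w0 / z, **{j + 1: w / z for j, w in enumerate(wj)}})
    return resp
```

Because only upstream events inside the same sub-window are candidate parents, each event's responsibility vector stays short, which is exactly where the computational savings of CLEM come from.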
Details of the Q-function calculation in the CL-E step and the parameter updates for $\Theta$ in the $(a+1)$th iteration of the CL-M step are provided in Appendix A.

In the proposed CLEM algorithm, the observation window is partitioned into $K$ sub-windows, and a CL is constructed by considering only the EP effects within each sub-window. The choice of $K$ represents a trade-off between statistical accuracy and computational efficiency. If $K$ is too large, each sub-window becomes temporally narrow, causing many cross-sub-window EP effects to be ignored and potentially leading to biased parameter estimation. For example, missing cross-sub-window EP effects tends to underestimate the EP strength parameter $\alpha_{m_s,m_{s-1}}$ in Eq. (4), as some true triggering contributions are omitted. On the other hand, if $K$ is too small, the CL formulation approaches the full-likelihood formulation, thereby diminishing the computational advantage of CLEM.

2.3.4 Selection of K

To systematically determine an optimal $K$ value, we propose a stepwise Friedman testing framework. The Friedman test is a nonparametric group hypothesis test commonly used to examine whether the median performance of multiple algorithms or treatments evaluated on the same datasets is equal. It does not assume normality or homoscedasticity of the data (Sheldon et al., 1996). Instead, it relies on within-replication rankings of the performance measures, making it particularly suitable for comparing the performance of the proposed CLEM algorithm with different choices of $K$ (Xia et al., 2025). Specifically, we consider a candidate set $\mathcal{K} = \{K_1 = 1 < K_2 < \cdots < K_M\}$ and construct the corresponding method set $\mathcal{M} = \{\text{EM}, \text{CLEM}(K_2), \ldots, \text{CLEM}(K_M)\}$, where CLEM reduces to EM when $K = 1$. Each method is evaluated using a predefined performance metric $e(\cdot, \cdot)$ on the same dataset across $R$ independent replications.
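The stepwise procedure can be sketched with SciPy's Friedman test. This is a simplified rendering of the idea (the first comparison already contains three methods, as the Friedman test requires); the performance values and candidate set are hypothetical, and details such as the stopping bookkeeping follow Algorithm 1 only loosely.

```python
from scipy.stats import friedmanchisquare

def stepwise_friedman(perf, alpha=0.05):
    """Stepwise selection of K: `perf` maps each candidate K (K = 1
    meaning plain EM) to a list of R per-replication performance values
    (e.g., RRMSE). The comparison set is expanded until the Friedman
    test rejects equality of medians; the largest K whose performance
    remains comparable to EM is returned."""
    ks = sorted(perf)              # e.g. [1, 2, 5, 10, ...]
    best = ks[0]
    for b in range(2, len(ks)):    # first test uses three methods
        group = [perf[k] for k in ks[:b + 1]]
        _, pval = friedmanchisquare(*group)
        if pval < alpha:           # rejection: stop immediately
            break
        best = ks[b]               # largest K still comparable to EM
    return best
```

With rank-balanced (hence indistinguishable) performance for the small $K$ values and a clearly inflated error for a large $K$, the procedure stops at the last comparable candidate.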
In this paper, we use the relative root mean square error discussed in Section 3.1.2 as the performance metric. Let $b = 1, 2, \ldots$ denote the step index corresponding to candidate value $K_{b+1}$. Starting from $b = 1$, we gradually expand the comparison set $\mathcal{M}_b = \{\text{EM}, \text{CLEM}(K_2), \ldots, \text{CLEM}(K_{b+1})\}$ to include CLEM with increasingly larger $K_{b+1}$. At each step $b$, a Friedman test is performed to examine whether the median performances of the methods in the current comparison set remain statistically indistinguishable. If the null hypothesis of equal medians is not rejected, the comparison set is expanded further. Once the null hypothesis is rejected, the procedure stops, and the optimal $K^*$ is selected as the largest $K_{b+1}$ whose performance remains statistically comparable to that of EM. The complete stepwise Friedman testing procedure is summarized in Algorithm 1.

Algorithm 1: Stepwise Friedman Test for Selecting the Optimal $K^*$

Input: Candidate set $\mathcal{K} = \{K_1 = 1 < K_2 < \cdots < K_M\}$; method set $\mathcal{M} = \{\text{EM}, \text{CLEM}(K_2), \ldots, \text{CLEM}(K_M)\}$; the number of repeated estimation runs conducted for each method, $R$; performance metric $e(\cdot, \cdot)$; significance level $\alpha$.
Output: Selected optimal $K^*$ such that the median estimation performance of CLEM($K^*$) is not significantly different from the median estimation performance of the EM algorithm.
Method:
for each method $m \in \mathcal{M}$ do
    Run method $m$ on the same data for $R$ replications and record the performance matrix $e(\cdot, \cdot)$
Stepwise testing: Initialize $b \leftarrow 1$, $K^* \leftarrow K_1$
while true do
    Construct comparison set $\mathcal{M}_b = \{\text{EM}, \text{CLEM}(K_2), \ldots, \text{CLEM}(K_{b+1})\}$
    Perform a Friedman test at level $\alpha$ for
        $H_0^{(b)}$: the median estimation performances of all methods in $\mathcal{M}_b$ are equal
        $H_1^{(b)}$: not all medians are equal.
    if $H_0^{(b)}$ is rejected then
        break // stop immediately
    else
        $b \leftarrow b + 1$; $K^* \leftarrow K_{b+1}$ // expand the comparison set
        if $b = M$ then break
return $K^*$

2.3.5 Computational Complexity and Statistical Properties

While the stepwise Friedman test provides a data-driven strategy for selecting $K$, this choice introduces a trade-off between computational efficiency and statistical accuracy. To fully understand this trade-off, this paper formally examines both the computational complexity and the statistical properties of CLEM, with particular emphasis on how the value of $K$ influences CLEM performance.

Figure 4: Comparing the Computational Loads of EM and CLEM

Computational Complexity Analysis. In an EM algorithm, each E-step requires calculating probabilities of the form $p_{m_s}^{i(0)}$ and $p_{m_s}^{i(j)}$. This involves tracking the entire event history $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}$ for every error $t_{m_s}^i \in \mathcal{T}$, leading to a computational complexity of $O(N^2)$, where $N = |\mathcal{T}|$ and $|\cdot|$ denotes the cardinality operator, i.e., the number of elements in a set. Such quadratic complexity becomes prohibitive when $N$ is large. However, the proposed CLEM algorithm restricts EP to occur only within sub-windows of length $d$. Each CL-E step requires tracking only the partial history $t_{m_{s-1}}^j \in \mathcal{T}_{s-1}^k$ for each error $t_{m_s}^i \in \mathcal{T}^k$, with $k \in [K]$. This strategy reduces the complexity to $O(\sum_{k=1}^{K} N_k^2)$, where $N_k = |\mathcal{T}^k|$, yielding significant savings when $\sup_k |\mathcal{T}^k| \ll |\mathcal{T}|$.

A simplified example is shown in Fig. 4 to illustrate the computational cost improvement of the CLEM algorithm over the EM algorithm. In this example, there are five errors at stage $s-1$ (red dots) and two errors at stage $s$ (red triangles). Fig.
4 (a) shows the computational cost of the EM algorithm, where every historical error at stage $s-1$ has a probability of propagating to stage $s$, as indicated by the dashed red lines, resulting in eight possible EP scenarios, all of which must be evaluated during the E-step of the EM algorithm. In contrast, Fig. 4 (b) presents the computational cost of the CLEM algorithm, where the observation window is divided into three sub-windows. Only EP scenarios within each sub-window are considered, while cross-sub-window scenarios are excluded. As a result, the number of potential EP scenarios is reduced to two, meaning that the CL-E step needs to evaluate only these two scenarios. This reduction directly lowers the number of likelihood evaluations required and thus substantially decreases the overall computational burden. This simplified example uses only a few errors for illustration; the computational savings become far more significant in practice when the number of events is large.

Ascent Property. Although the proposed CLEM algorithm employs a composite log-likelihood to approximate the full log-likelihood, it preserves the desirable ascent property. In other words, the composite log-likelihood is guaranteed to be non-decreasing at each CLEM iteration. This monotonicity is important for ensuring numerical stability and supporting convergence of the iterative estimation procedure. Formally, given the observed data $\mathcal{T}$, the composite log-likelihood $\ell_{CL}(\Theta \mid \mathcal{T})$ defined in Eq. (8) and the sequence of CLEM parameter estimators $\{\hat{\Theta}^{(a)}\}_{a=0,1,\ldots}$ satisfy
$$\ell_{CL}(\hat{\Theta}^{(a)} \mid \mathcal{T}) \leq \ell_{CL}(\hat{\Theta}^{(a+1)} \mid \mathcal{T}), \quad (11)$$
with equality holding if and only if $Q(\hat{\Theta}^{(a+1)} \mid \mathcal{T}, I_{CL}, \hat{\Theta}^{(a)}) = Q(\hat{\Theta}^{(a)} \mid \mathcal{T}, I_{CL}, \hat{\Theta}^{(a)})$.

Statistical consistency.
To study the asymptotic behavior of the proposed CLEM estimator, consider the composite score function associated with the composite log-likelihood in Eq. (8),
$$U_{K,d}(\Theta) = \sum_{k=1}^{K} U_{k,d}(\Theta), \quad (12)$$
where
$$U_{k,d}(\Theta) = \frac{\partial}{\partial \Theta} \log f(\mathcal{T}^k \mid \Theta), \quad (13)$$
$f(\mathcal{T}^k \mid \Theta)$ denotes the likelihood contribution from the $k$th sub-window, and $d$ represents the sub-window length. Let $\hat{\Theta}_{K,d}$ denote the CLEM estimator defined as the solution to $U_{K,d}(\Theta) = 0$, and let $\Theta_0$ denote the true parameter vector. Assume that the parameter space of $\Theta$ is compact and that the following conditions are satisfied:

(i) The expected composite score satisfies $E[U_{K,d}(\Theta)] = 0$ if and only if $\Theta = \Theta_d$, where $\Theta_d$ denotes the pseudo-true parameter associated with the composite likelihood approximation;

(ii) There exists a nonnegative function $\tau(\cdot)$ such that
$$\left\| \frac{f^{(1)}(\mathcal{T}^k \mid \Theta)}{f(\mathcal{T}^k \mid \Theta)} \right\| < \tau(n^k), \quad (14)$$
$$\left\| \frac{f^{(2)}(\mathcal{T}^k \mid \Theta)}{f(\mathcal{T}^k \mid \Theta)} \right\| < \tau(n^k), \quad (15)$$
where $E[\tau(n^k)^2] < \infty$ and $n^k = |\mathcal{T}^k|$ denotes the number of observed events in the $k$th sub-window.

Then, the CLEM estimator $\hat{\Theta}_{K,d}$ converges in probability to $\Theta_d$ as the number of sub-windows $K \to \infty$. Furthermore, if $E[U_{K,d}(\Theta)] \to 0$ as the sub-window length $d \to \infty$ only when $\Theta = \Theta_0$, then $\Theta_d \to \Theta_0$ as $d \to \infty$.

The first result implies that, for a fixed $d$, increasing $K$ yields convergence of the estimator to a pseudo-true parameter $\Theta_d$, which maximizes the composite likelihood. In general, $\Theta_d \neq \Theta_0$ because restricting EP to localized temporal neighborhoods introduces an approximation to the full likelihood. The second result indicates that as the sub-window length $d$ increases, the locality constraint weakens and the composite likelihood more closely approximates the full likelihood. Consequently, $\Theta_d \to \Theta_0$.
These results show that the proposed CLEM algorithm is statistically consistent when the effective sample size increases, that is, as the number of sub-windows $K \to \infty$, the sub-window length $d \to \infty$, and consequently the total observation horizon $T = K \cdot d \to \infty$. The proofs of the ascent property and statistical consistency are provided in Appendix B.

2.4 Model Evaluation

The performance of the proposed model is evaluated in terms of estimation accuracy, prediction accuracy, and computational efficiency.

Estimation accuracy. The relative root mean square error (RRMSE) (Min et al., 2022) is used to measure the accuracy of parameter estimation. It is defined as
$$RRMSE = \sqrt{\frac{1}{P} \sum_{p=1}^{P} \left( \frac{\theta_p - \hat{\theta}_p}{\theta_p} \right)^2}, \quad (16)$$
where $\theta_p \in \Theta$ and $\hat{\theta}_p \in \hat{\Theta}$ denote the true and estimated values of the $p$th parameter, respectively, $p = 1, \ldots, P$, and $P$ is the total number of parameters. The mean RRMSE (MRRMSE) across $R$ replications is then computed as
$$MRRMSE = \frac{1}{R} \sum_{r=1}^{R} RRMSE_r. \quad (17)$$

Prediction accuracy. The mean absolute error (MAE) is adopted to assess prediction accuracy. Suppose error events occurring in module $m_s$ are observed up to time $\tau$. Using these events, parameters can be estimated via the proposed method in Section 2.3. Substituting the estimates into Eqs. (2)-(4) yields the predictive intensity $\hat{\lambda}_{m_s}(t)$. Based on this, the predicted number of error events during a future interval $[\tau, \tau + \Delta\tau)$ is given by (Van Lieshout, 2012):
$$\hat{N}_{m_s \mid [\tau, \tau + \Delta\tau)} = \int_{\tau}^{\tau + \Delta\tau} \hat{\lambda}_{m_s}(t)\, dt. \quad (18)$$
Let $N_{m_s \mid [\tau, \tau + \Delta\tau)}$ denote the actual number of error events observed in the same interval. For replication $r$ ($r = 1, \ldots, R$), denote the predicted and actual numbers of error events as $\hat{N}^r_{m_s \mid [\tau, \tau + \Delta\tau)}$ and $N^r_{m_s \mid [\tau, \tau + \Delta\tau)}$, respectively.
The MAE is then defined as
$$MAE_{m_s \mid [\tau, \tau + \Delta\tau)} = \frac{1}{R} \sum_{r=1}^{R} \left| N^r_{m_s \mid [\tau, \tau + \Delta\tau)} - \hat{N}^r_{m_s \mid [\tau, \tau + \Delta\tau)} \right|, \quad (19)$$
which evaluates the prediction performance in module $m_s$ during $[\tau, \tau + \Delta\tau)$.

Computational efficiency. Computational efficiency is measured by the average computation time across $R$ replications.

To summarize, the methodological framework is illustrated in Fig. 5.

Figure 5: An Illustration of the Proposed Methodological Framework

3 Case Study

This section demonstrates the effectiveness and computational efficiency of the proposed CLEM-based EP modeling framework for AI system reliability analysis. Section 3.1 presents a numerical simulation showing that, with an appropriate choice of $K$, the proposed CLEM algorithm substantially reduces computational time while maintaining estimation accuracy comparable to that of the standard EM algorithm. Section 3.2 reports the results from a physics-based AV simulation, demonstrating that the proposed EP modeling framework improves reliability prediction performance relative to benchmark models.

3.1 Numerical Case Study

3.1.1 Experiment Setup

In this case study, we simulate error event data from two consecutive stages (i.e., Stage 1 and Stage 2) using the thinning algorithm proposed in (Lewis and Shedler, 1979). Specifically, we consider two modules at Stage 1 (i.e., $m_{1_1}$ and $m_{2_1}$) and one module at Stage 2 (i.e., $m_{1_2}$), as shown in Figure 6. For $m_{1_1}$ and $m_{2_1}$ at Stage 1, the error events are simulated from a univariate homogeneous Poisson process with intensity $\lambda_{1_1} = \lambda_{2_1} = 0.2$.
For module $m_{1_2}$ at Stage 2, the error events are simulated with intensity
$$\lambda_{1_2}(t) = \lambda_{1_2}^0 + \sum_{m=1}^{2} \sum_{j: t_{m_1}^j < t} \alpha_{1_2,m_1} \cdot \beta_{1_2,m_1} \cdot \exp\left[-\beta_{1_2,m_1} \cdot (t - t_{m_1}^j)\right], \quad (20)$$
where we set the parameters as follows: $\lambda_{1_2}^0 = 0.5$ and $\alpha_{1_2,1_1} = \alpha_{1_2,2_1} = \beta_{1_2,1_1} = \beta_{1_2,2_1} = 0.3$. These values are chosen to roughly align with the error rates reported for each module (Pan et al., 2024). To simulate datasets of varying sizes, we set the simulation window length $T$ to 500, 1000, 2500, and 5000. For each parameter configuration, we repeatedly generated the three types of error events from the three distinct modules 100 times (i.e., $R = 100$). Model estimation was performed using both the EM and the CLEM algorithms, with the number of sub-windows $K$ set to 2, 5, 10, 20, 50, 100, 250, and 500.

Figure 6: Numerical Simulation Scenario

3.1.2 Result - Estimation Accuracy

Table 3 presents the means and standard deviations of 100 repeated parameter estimates using the EM algorithm and the CLEM algorithm with different choices of $K$. The results show that the means of the CLEM estimates with $K = 2, 5, 10, 20, 50$, and $100$, as well as those from the EM algorithm, closely align with the true parameter values. However, noticeable bias emerges when $K = 250$ and $500$, which indicates that excessively large values of $K$ may reduce estimation accuracy. This reduction occurs because, under a fixed simulation time window $T$, increasing $K$ divides the window into many smaller sub-windows. The CLEM algorithm then approximates the full likelihood using CLs constructed within each small sub-window, potentially ignoring important temporal dependencies and resulting in information loss. The results shown in Table 3 are based on a simulation with an overall AI system operation time window of $T = 5{,}000$.
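The downstream process of Eq. (20) can be simulated by thinning: generate candidate points from a dominating homogeneous Poisson process and accept each with probability $\lambda(t)/\bar{\lambda}$. The sketch below is an Ogata-style variant of the Lewis-Shedler idea under the paper's kernel form; the upstream event times are hypothetical inputs, and the crude global bound $\bar{\lambda} = \lambda^0 + \alpha\beta \cdot (\text{number of upstream events})$ is a simplification chosen for brevity, not the paper's implementation.

```python
import math
import random

def simulate_downstream(upstream, lam0, alpha, beta, T, seed=1):
    """Thinning sketch for the downstream intensity of Eq. (20):
    lam0 + sum over past upstream events of alpha*beta*exp(-beta*dt).
    `upstream` is a list of per-module upstream event-time lists."""
    rng = random.Random(seed)
    all_up = sorted(t for hist in upstream for t in hist)

    def lam(t):  # target intensity at time t
        return lam0 + sum(alpha * beta * math.exp(-beta * (t - tj))
                          for tj in all_up if tj < t)

    # crude global upper bound: each exp term is at most 1
    lam_bar = lam0 + alpha * beta * len(all_up)
    events, t = [], 0.0
    while t < T:
        t += rng.expovariate(lam_bar)          # next candidate point
        if t < T and rng.random() <= lam(t) / lam_bar:
            events.append(t)                   # accept with prob lam(t)/lam_bar
    return events
```

A tighter, time-varying bound would accept more candidates per proposal, but the global bound keeps the sketch short and still yields an exact sample from the process.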
Table 3: Parameter Estimation Results Based on the EM and CLEM Algorithms (T = 5,000)

| Method | $K$ | $\hat{\lambda}_{m_s}^0$ | $\hat{\alpha}_{m_s,1_{s-1}}$ | $\hat{\alpha}_{m_s,2_{s-1}}$ | $\hat{\beta}_{m_s,1_{s-1}}$ | $\hat{\beta}_{m_s,2_{s-1}}$ |
|---|---|---|---|---|---|---|
| True | - | 0.5 | 0.3 | 0.3 | 0.3 | 0.3 |
| EM | - | 0.481§ (0.015)† | 0.295 (0.047) | 0.294 (0.037) | 0.287 (0.040) | 0.289 (0.035) |
| CLEM | 2 | 0.480 (0.015) | 0.295 (0.044) | 0.295 (0.039) | 0.286 (0.041) | 0.289 (0.036) |
| | 5 | 0.480 (0.015) | 0.297 (0.045) | 0.298 (0.042) | 0.284 (0.037) | 0.287 (0.035) |
| | 10 | 0.480 (0.016) | 0.298 (0.042) | 0.298 (0.039) | 0.284 (0.039) | 0.286 (0.034) |
| | 20 | 0.482 (0.015) | 0.295 (0.045) | 0.295 (0.041) | 0.285 (0.039) | 0.288 (0.032) |
| | 50 | 0.487 (0.015) | 0.287 (0.045) | 0.288 (0.043) | 0.289 (0.039) | 0.289 (0.034) |
| | 100 | 0.494 (0.014) | 0.281 (0.041) | 0.282 (0.039) | 0.290 (0.040) | 0.291 (0.032) |
| | 250 | 0.512 (0.013) | 0.261 (0.043) | 0.260 (0.038) | 0.290 (0.041) | 0.290 (0.038) |
| | 500 | 0.533 (0.013) | 0.251 (0.041) | 0.250 (0.039) | 0.261 (0.039) | 0.262 (0.037) |

§: Mean value across 100 estimates. †: Values in parentheses represent the standard deviation of the 100 estimates.

Table 3 provides the estimation results for each individual parameter. To evaluate the overall estimation accuracy of the EM and CLEM algorithms, we use the MRRMSE metric defined in Eqs. (16)-(17). Fig. 7 presents the MRRMSE values for the EM and the CLEM algorithms with different choices of $K$ across different lengths of the simulation time window $T$. The results show that when $K = 2, 5, 10$, and $20$, the MRRMSE values of the CLEM algorithm are comparable to those of the EM algorithm across all values of $T$, indicating that dividing $T$ into a small number of sub-windows in CLEM does not compromise estimation accuracy. In addition, as $T$ increases, the CLEM algorithm can tolerate a larger number of sub-windows without a loss in accuracy. For instance, when $T = 5{,}000$, the MRRMSE is still comparable with that of the EM algorithm even when $K$ increases to 50 or 100.
However, when $K$ becomes excessively large (e.g., $K = 250$ or $500$), the MRRMSE of the CLEM algorithm increases substantially, owing to the information loss incurred when the full likelihood is approximated by CLs over a large number of small sub-windows.

Figure 7: MRRMSE Values of the EM and CLEM Algorithms

3.1.3 Result - Computational Efficiency

Fig. 8 presents the computational time required by the EM and the CLEM algorithms for different $K$ across different lengths of the simulation time window $T$. There are several interesting findings. First, as expected, a larger $T$ results in longer computational times for both methods, because a larger $T$ produces a larger dataset and hence a heavier computational burden. This result highlights the importance of developing computationally efficient algorithms for large-scale data settings and longer AI system operations. Second, compared to the EM algorithm, the CLEM algorithm demonstrates a significant reduction in computation time, especially as $K$ and $T$ increase. This improvement arises because, by dividing the data into smaller segments, the CLEM algorithm reduces the computational load within each sub-window. Third, when the dataset size is small (e.g., $T = 500$), the computational advantage of CLEM over EM becomes less significant.

Figure 8: Computational Time of the EM and CLEM Algorithms

3.1.4 Optimal K Selection

The results in Sections 3.1.2 and 3.1.3 demonstrate that the choice of $K$ influences both the estimation accuracy and the computational efficiency of the CLEM algorithm. To further illustrate this trade-off, Fig. 9 presents the MRRMSE values and computational times of the EM and the CLEM algorithms with varying $K$. In each subplot, the red dashed line indicates the MRRMSE, with the red interval representing the 95% confidence interval (CI) of the RRMSE values across 100 replications. The blue solid line represents the corresponding mean computational time.
This figure clearly illustrates that increasing $K$ yields substantial reductions in computational time, but at the expense of increased MRRMSE values. These findings highlight the importance of carefully selecting an appropriate $K$ to avoid compromising estimation accuracy for the sake of computational efficiency.

Figure 9: Comparison of MRRMSE and Computation Time

The stepwise Friedman test introduced in Section 2.3.4 is employed to determine the optimal $K^*$ for the CLEM algorithm. In this case study, the candidate set is $\mathcal{K} = \{2, 5, 10, 20, 50, 100, 250, 500\}$, and the corresponding method set is $\mathcal{M} = \{\text{EM}, \text{CLEM}(2), \text{CLEM}(5), \text{CLEM}(10), \text{CLEM}(20), \text{CLEM}(50), \text{CLEM}(100), \text{CLEM}(250), \text{CLEM}(500)\}$. The RRMSE is used as the evaluation metric. The objective is to select the largest $K^* \in \mathcal{K}$ such that the median RRMSE of CLEM($K^*$), denoted $M_{\text{CLEM}(K^*)}$, is not significantly different from the median RRMSE of EM, denoted $M_{\text{EM}}$.

We begin with step 1, where the comparison set is $\mathcal{M}_1 = \{\text{EM}, \text{CLEM}(2), \text{CLEM}(5)\}$. The hypotheses are formulated as:
$H_0^{(1)}$: $M_{\text{EM}} = M_{\text{CLEM}(2)} = M_{\text{CLEM}(5)}$;
$H_1^{(1)}$: not all medians are equal.
The results, shown in Table 4, fail to reject $H_0^{(1)}$ with $p = 0.181$, indicating no significant difference among $M_{\text{EM}}$, $M_{\text{CLEM}(2)}$, and $M_{\text{CLEM}(5)}$. In step 2, the comparison set is extended to $\mathcal{M}_2 = \{\text{EM}, \text{CLEM}(2), \text{CLEM}(5), \text{CLEM}(10)\}$. Again, the null hypothesis $H_0^{(2)}$ is not rejected, with $p = 0.226$. However, at step 3, which adds CLEM(20) to the comparison set, the null hypothesis $H_0^{(3)}$ is rejected with $p = 0.026$, indicating that CLEM(20) introduces a significant difference among $M_{\text{EM}}$, $M_{\text{CLEM}(2)}$, $M_{\text{CLEM}(5)}$, $M_{\text{CLEM}(10)}$, and $M_{\text{CLEM}(20)}$.
Based on these results, we select $K^* = 10$ as the optimal number of sub-windows. Table 4 reports the stepwise Friedman test results for $T = 500$, while additional results for $T = 1000$, $2500$, and $5000$ are provided in Appendix C. Using the same selection criterion, we obtain the following optimal choices: $K^* = 20$ for $T = 1000$, $K^* = 50$ for $T = 2500$, and $K^* = 100$ for $T = 5000$. A noteworthy finding is that the optimal sub-window length, defined as $T/K^*$, remains consistently equal to 50 across all examined values of $T$. This consistency suggests that the performance of the CLEM algorithm depends more fundamentally on the sub-window length than on the total simulation time $T$.

Table 4: Stepwise Friedman Test Results for Selecting the Optimal K (T = 500)

| Hypotheses | p-value | Conclusion |
|---|---|---|
| $H_0^{(1)}$: $M_{\text{EM}} = M_{\text{CLEM}(2)} = M_{\text{CLEM}(5)}$; $H_1^{(1)}$: not all medians are equal | 0.181 | Do not reject $H_0^{(1)}$ |
| $H_0^{(2)}$: $M_{\text{EM}} = M_{\text{CLEM}(2)} = M_{\text{CLEM}(5)} = M_{\text{CLEM}(10)}$; $H_1^{(2)}$: not all medians are equal | 0.226 | Do not reject $H_0^{(2)}$ |
| $H_0^{(3)}$: $M_{\text{EM}} = M_{\text{CLEM}(2)} = M_{\text{CLEM}(5)} = M_{\text{CLEM}(10)} = M_{\text{CLEM}(20)}$; $H_1^{(3)}$: not all medians are equal | 0.026 | Reject $H_0^{(3)}$ᵃ |

ᵃ The rejection decision is made if the p-value is smaller than 0.05.

3.2 Physics-based Simulation Case Study

This case study is enabled by a physics-based simulation platform equipped with an error injector (EI). The EI provides the capability to inject errors into any functional module $m_s$ at user-defined timestamps $t_{m_s}^{err}$, with each injection occurring under a specified probability $p_{t_{m_s}^{err}}$.

3.2.1 Experiment Setup

This case study investigates EP between the object detection stage and the object localization stage. As illustrated in Fig.
10, the object detection stage consists of two modules, i.e., the 2-D detection module and the 3-D detection module, while the object localization stage is composed of a single module, the object localization module. In this study, errors are injected into both modules of the object detection stage, and their subsequent propagation into the object localization stage is examined.

Figure 10: Physics-Based Simulation Scenario

We consider two EI settings, as defined in Section 2.1 and summarized in Table 5. Setting I consists of four scenarios in which errors are continuously injected over the entire simulation interval, i.e., $t^{\mathrm{err}}_{m_s} \in [0, 200)$, with varying injection probabilities. These scenarios are designed to emulate the performance of the 2-D and 3-D detection modules under persistent weather conditions: clear (Sce. 1), snowy (Sce. 2), rainy (Sce. 3), and foggy (Sce. 4). Setting II, by contrast, introduces errors intermittently, with injections applied only during the time intervals $[50, 100)$ and $[150, 200)$. This setting reflects intermittent driving conditions, where errors occur sporadically rather than persistently. Corresponding injection probabilities are applied to simulate the performance of the 2-D and 3-D detection modules under intermittent snow (Sce. 5), rain (Sce. 6), and fog (Sce. 7). The EI probabilities $p^{\mathrm{err}}_{t, m_s}$ listed in Table 5 are chosen based on empirical findings reported in (Hassaballah et al., 2020; Kilic et al., 2021); these values approximate the error occurrence rates of 2-D and 3-D detection modules operating under real-world adverse weather conditions. In each scenario, error-event data are simulated at a sampling frequency of 20 Hz for a total duration of 200 seconds. Three types of error events are collected: 2-D miss-detection errors, 3-D miss-detection errors, and miss-localization errors. The simulation is repeated thirty times under each scenario.
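The injection mechanism just described can be sketched as a Bernoulli draw at each 20 Hz timestep inside the active windows. This is a minimal sketch, not the platform's actual EI implementation; the function and variable names are ours, and the example uses the Setting II, Sce. 5 configuration from Table 5.

```python
# Minimal error-injector sketch: at each 20 Hz timestep inside an injection
# window, a module error occurs with the scenario's injection probability.
import numpy as np

def inject_errors(p_err, windows, duration=200.0, freq=20, seed=None):
    """Return timestamps at which injected errors occur.

    p_err    : injection probability per timestep (e.g., 0.40 for 2-D detection)
    windows  : list of [start, end) intervals where injection is active
    duration : total simulation time in seconds
    freq     : sampling frequency in Hz
    """
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0 / freq)
    active = np.zeros_like(t, dtype=bool)
    for lo, hi in windows:
        active |= (t >= lo) & (t < hi)
    hits = active & (rng.random(t.size) < p_err)
    return t[hits]

# Sce. 5 (intermittent snow): 2-D detection errors only in [50, 100) and [150, 200)
errors_2d = inject_errors(0.40, [(50, 100), (150, 200)], seed=1)
print(len(errors_2d), errors_2d.min(), errors_2d.max())
```

With 2000 active timesteps and p = 0.40, roughly 800 injected errors are expected per run, all confined to the two intermittent windows.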
Table 5: Error Injection Probability in Object Detection Stage

| Module | Sce. 1 (Clear) | Sce. 2 (Snow) | Sce. 3 (Rain) | Sce. 4 (Fog) | Sce. 5 (Snow) | Sce. 6 (Rain) | Sce. 7 (Fog) |
|---|---|---|---|---|---|---|---|
| 2-D detection | 0.00 | 0.40 | 0.55 | 0.60 | 0.40 | 0.55 | 0.60 |
| 3-D detection | 0.00 | 0.25 | 0.50 | 0.60 | 0.25 | 0.50 | 0.60 |

Sces. 1–4 constitute Setting I (persistent EI); Sces. 5–7 constitute Setting II (intermittent EI).

Table 6: Summary of Comparison Methods

| Benchmarks | Model | $\lambda_{m_s}(t; \theta)$ | Parameters |
|---|---|---|---|
| NHPP | Musa–Okumoto (MO) (Musa and Okumoto, 1984) | $\theta_2 (1 + \theta_2 \theta_1 t)^{-1}$ | $\theta = (\theta_1, \theta_2)^\top$, $\theta_1 > 0$, $\theta_2 > 0$ |
| NHPP | Gompertz (Ohishi et al., 2009) | $\theta_1 \theta_2^t \theta_3^{\theta_2^t} \log(\theta_2) \log(\theta_3)$ | $\theta = (\theta_1, \theta_2, \theta_3)^\top$, $\theta_1 > 0$, $0 < \theta_2, \theta_3 < 1$ |
| HPP | Poisson (Sahinoglu, 1992) | $\theta_1$ | $\theta = (\theta_1)^\top$, $\theta_1 > 0$ |

With the simulated data, we evaluate the prediction accuracy of the proposed model relative to the benchmark reliability models summarized in Table 6. Specifically, we estimate the intensity function of the proposed model and of each benchmark using the data collected from the first 180 seconds (i.e., $\tau = 180$) and subsequently use the estimated intensity function to predict the number of errors in the following interval $[180, 200)$ (i.e., $\Delta\tau = 20$). This procedure is repeated thirty times (i.e., $R = 30$). The mean absolute error (MAE) of each model is then calculated by Eqs. (18)–(19).

3.2.2 Results: Prediction Accuracy

Fig. 11 shows the MAEs of the proposed model and the benchmarks across the seven scenarios in Table 5. Overall, the proposed method (shown as the red line) consistently achieves the lowest MAEs among the compared models. In Setting I (Sces. 1–4), the proposed model and the benchmarks exhibit comparable prediction accuracy. This similarity can be attributed to the nature of Setting I, in which errors are injected continuously throughout the simulation interval. Under these persistent weather conditions, error occurrence remains relatively stable, allowing the benchmark models to capture the underlying pattern effectively.
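The evaluation procedure above (fit on $[0, 180)$, predict the error count on $[180, 200)$ by integrating the fitted intensity) can be sketched for the benchmark intensities in Table 6. The parameter values below are illustrative placeholders, not fitted estimates, and `predicted_count` is our own helper name.

```python
# The benchmarks in Table 6 predict the number of errors on [tau, tau + dtau)
# as the integral of their intensity function over that interval.
import numpy as np
from scipy.integrate import quad

def musa_okumoto(t, th1, th2):
    # lambda(t) = theta2 / (1 + theta2 * theta1 * t)
    return th2 / (1.0 + th2 * th1 * t)

def gompertz(t, th1, th2, th3):
    # lambda(t) = theta1 * theta2**t * theta3**(theta2**t) * log(theta2) * log(theta3)
    return th1 * th2**t * th3**(th2**t) * np.log(th2) * np.log(th3)

def poisson(t, th1):
    # homogeneous Poisson process: constant rate theta1
    return th1

def predicted_count(intensity, tau, dtau, *theta):
    """Expected number of errors on [tau, tau + dtau): integral of lambda."""
    val, _ = quad(intensity, tau, tau + dtau, args=theta)
    return val

tau, dtau = 180.0, 20.0
print(predicted_count(poisson, tau, dtau, 0.5))   # 0.5 errors/s * 20 s = 10
```

The MAE in Eqs. (18)–(19) would then compare these predicted counts against the observed counts on $[180, 200)$, averaged over the $R = 30$ replications.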
In contrast, in Setting II (Sces. 5–7), the proposed method demonstrates a clear advantage, with MAEs significantly lower than those of the benchmarks. This improvement indicates that, by explicitly considering EP, the proposed model can more accurately predict error occurrences under intermittent EI conditions, where errors arise sporadically and pose greater challenges for conventional benchmark models. This property makes the proposed CLEM-based modeling approach uniquely advantageous for analyzing the reliability of AI systems applied to perception and decision-making in uncontrolled environments.

Figure 11: Prediction Accuracy Comparison Based on Physics-Based Simulation

4 Conclusion

AI system reliability modeling is critical for the development and safe deployment of autonomous technologies. This paper addresses the challenges of data, modeling, and estimation in AI system reliability analysis, with particular emphasis on error propagation modeling. Leveraging a physics-based simulation platform, we develop a justifiable error injector to systematically generate error-event data from AI systems in autonomous vehicles. Based on these simulated data, we propose a generic intensity-decomposition framework that explicitly distinguishes between primary errors and propagated errors. To efficiently estimate latent error propagation, we design a CLEM algorithm that is computationally scalable and supported by theoretical guarantees. A numerical case study demonstrates that the CLEM algorithm significantly reduces computation time while maintaining estimation accuracy comparable to the standard EM algorithm. Furthermore, a physics-based simulation case study illustrates that explicitly modeling error propagation enables the proposed framework to achieve higher prediction accuracy than benchmark reliability models.
While this study advances reliability modeling of AI systems by explicitly considering error propagation, several future research directions remain. First, extending the proposed framework to real-world operational data beyond simulation will be critical for validating its practical applicability and robustness across diverse driving environments. Second, integrating uncertainty quantification and adaptive learning could enhance the model's ability to update reliability assessments as new data are collected. Finally, extending the methodology to other safety-critical domains, such as healthcare, energy systems, and industrial automation, offers the potential to generalize the framework into a broader class of AI system reliability models.

Appendix A: Details of the CLEM Algorithm

Given an initial parameter estimate $\hat{\theta}^{(0)}$, the parameter is updated iteratively by alternating between the CL-E step and the CL-M step. Let $a = 0, 1, 2, \ldots$ denote the iteration index. The $(a+1)$th iteration proceeds according to the following steps:

• CL-E step: Given $\hat{\theta}^{(a)}$, the CL-E step computes the expected value of the complete-data composite log-likelihood $\ell_{CL}(\theta \mid \cdot)$ with respect to the conditional distribution of the missing data $I_{CL}$ given the observed data $\mathbf{t}$ and the current estimate $\hat{\theta}^{(a)}$, i.e.,

$$
Q(\theta \mid \hat{\theta}^{(a)})
= E_{I_{CL} \mid \mathbf{t}, \hat{\theta}^{(a)}}\!\left[\ell_{CL}(\theta \mid \mathbf{t}, I_{CL})\right]
= \sum_{k=1}^{K}\sum_{s=1}^{S}\sum_{m_s=1}^{M_s}\sum_{i=1}^{n_{m_s}^k}\Bigg[ p_{m_s}^{i(0)} \log \lambda_{m_s}^0
+ \sum_{\substack{j:\, t_{m_{s-1}}^j \in \mathbf{t}_{s-1}^k \\ t_{m_{s-1}}^j < t_{m_s}^i}} p_{m_s}^{i(j)} \log \lambda_{m_s}^p\!\left(t_{m_s}^i - t_{m_{s-1}}^j\right)
- \int_{kd-d}^{kd} \lambda_{m_s}(t)\, dt \Bigg],
\tag{A.1}
$$

where the probabilities $p_{m_s}^{i(0)}$ and $p_{m_s}^{i(j)}$ explicitly represent whether an event $t_{m_s}^i$ is a primary error or a propagated error triggered by event $t_{m_{s-1}}^j \in \mathbf{t}_{s-1}^k$.
These probabilities are linked to the primary intensity and propagated intensity defined in Eqs. (3)–(4) as

$$
p_{m_s}^{i(0)} = \frac{\lambda_{m_s}^0}{\lambda_{m_s}(t_{m_s}^i)}, \qquad
p_{m_s}^{i(j)} = \frac{\lambda_{m_s, m_{s-1}}^p(t_{m_s}^i - t_{m_{s-1}}^j)}{\lambda_{m_s}(t_{m_s}^i)}.
\tag{A.2}
$$

• CL-M step: Given the probabilities in Eq. (A.2) and $\hat{\theta}^{(a)}$, the update $\hat{\theta}^{(a+1)}$ is obtained by maximizing $Q(\theta \mid \mathbf{t}, I_{CL}, \hat{\theta}^{(a)})$ in Eq. (A.1), i.e.,

$$
\hat{\theta}^{(a+1)} = \arg\max_{\theta \ge 0}\, Q(\theta \mid \mathbf{t}, I_{CL}, \hat{\theta}^{(a)}).
\tag{A.3}
$$

Specifically, each parameter is updated in closed form as

$$
\hat{\lambda}_{m_s}^{0\,(a+1)} = \frac{\sum_{k=1}^{K}\sum_{i=1}^{n_{m_s}^k} p_{m_s}^{i(0)}}{T},
\tag{A.4}
$$

$$
\hat{\alpha}_{m_s, m_{s-1}}^{(a+1)} = \frac{\sum_{k=1}^{K}\sum_{j=1}^{n_{m_{s-1}}^k}\sum_{i=1}^{n_{m_s}^k} p_{m_s}^{i(j)}}{\sum_{k=1}^{K}\sum_{j=1}^{n_{m_{s-1}}^k}\left(1 - \exp\!\left(-\beta_{m_s, m_{s-1}}^{(a)}\,(kd - t_{m_{s-1}}^j)\right)\right)},
\tag{A.5}
$$

$$
\hat{\beta}_{m_s, m_{s-1}}^{(a+1)} = \frac{\sum_{k=1}^{K}\sum_{j=1}^{n_{m_{s-1}}^k}\sum_{i=1}^{n_{m_s}^k} p_{m_s}^{i(j)}}{A + B},
\tag{A.6}
$$

where $A = \sum_{k=1}^{K}\sum_{j=1}^{n_{m_{s-1}}^k}\sum_{i=1}^{n_{m_s}^k} p_{m_s}^{i(j)} \cdot (t_{m_s}^i - t_{m_{s-1}}^j)$ and $B = \sum_{k=1}^{K}\sum_{j=1}^{n_{m_{s-1}}^k} \hat{\alpha}_{m_s, m_{s-1}}^{(a+1)} \cdot (kd - t_{m_{s-1}}^j) \cdot \exp\!\left(-\beta_{m_s, m_{s-1}}^{(a)}\,(kd - t_{m_{s-1}}^j)\right)$.

Appendix B: Proof of Ascent Property and Statistical Consistency

B.1 Proof of Ascent Property

Ascent Property. Given the observed data $\mathbf{t}$, the composite log-likelihood $\ell_{CL}(\theta \mid \mathbf{t})$ defined in (8) and the sequence of CLEM estimators $\hat{\theta}^{(a)}$, $a = 1, 2, \ldots$, satisfy

$$
\ell_{CL}(\hat{\theta}^{(a-1)} \mid \mathbf{t}) \le \ell_{CL}(\hat{\theta}^{(a)} \mid \mathbf{t}),
\tag{B.1.1}
$$

with equality if and only if $Q(\hat{\theta}^{(a)} \mid \mathbf{t}, I_{CL}, \hat{\theta}^{(a)}) = Q(\hat{\theta}^{(a-1)} \mid \mathbf{t}, I_{CL}, \hat{\theta}^{(a)})$. Let $\Theta$ denote the parameter space for $\theta$.
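The CL-E/CL-M updates of Appendix A can be sketched compactly for a single upstream/downstream module pair. This sketch assumes the exponential propagation kernel $\lambda^p(u) = \alpha\beta e^{-\beta u}$ implied by the form of Eqs. (A.5)–(A.6); the data layout (`windows`, sub-window length `d`) and the function name are ours.

```python
# One CL-E / CL-M iteration, a sketch of Eqs. (A.2) and (A.4)-(A.6).
# `windows` is a list of (upstream_times, downstream_times) per sub-window,
# each a sorted NumPy array; d is the sub-window length, so T = K * d.
import numpy as np

def clem_iteration(windows, d, lam0, alpha, beta):
    T = len(windows) * d
    sum_p0 = 0.0        # total responsibility assigned to primary errors
    sum_pj = 0.0        # total responsibility assigned to propagated errors
    sum_pj_dt = 0.0     # A in Eq. (A.6)
    sum_surv = 0.0      # denominator of Eq. (A.5)
    for k, (up, down) in enumerate(windows, start=1):
        end = k * d
        sum_surv += np.sum(1.0 - np.exp(-beta * (end - up)))
        for ti in down:
            dt = ti - up[up < ti]                 # lags to candidate triggers
            lam_p = alpha * beta * np.exp(-beta * dt)
            lam = lam0 + lam_p.sum()              # total intensity at t_i
            sum_p0 += lam0 / lam                  # CL-E step, Eq. (A.2)
            sum_pj += np.sum(lam_p / lam)
            sum_pj_dt += np.sum(lam_p / lam * dt)
    lam0_new = sum_p0 / T                         # Eq. (A.4)
    alpha_new = sum_pj / sum_surv                 # Eq. (A.5)
    B = 0.0                                       # B in Eq. (A.6)
    for k, (up, _) in enumerate(windows, start=1):
        end = k * d
        B += np.sum(alpha_new * (end - up) * np.exp(-beta * (end - up)))
    beta_new = sum_pj / (sum_pj_dt + B)           # Eq. (A.6)
    return lam0_new, alpha_new, beta_new

# Toy data: two sub-windows of length d = 10
windows = [(np.array([1.0, 4.0]), np.array([1.5, 5.0])),
           (np.array([12.0]), np.array([13.0, 17.0]))]
lam0, alpha, beta = clem_iteration(windows, 10.0, 0.1, 0.5, 1.0)
print(lam0, alpha, beta)
```

Iterating this function until the parameters stabilize reproduces, for this simplified two-module case, the alternating CL-E/CL-M scheme described above.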
For each sub-window $k = 1, \ldots, K$, define the conditional density of the collection of latent indicators $I_{CL}^k = \{I_{m_s}^{i,k}: s = 1, \ldots, S;\ m_s = 1, \ldots, M_s;\ i = 1, \ldots, n_{m_s}^k\}$ given $\mathbf{t}^k$ as $\Pr(I_{CL}^k \mid \mathbf{t}^k, \theta)$. In the CLEM framework, the expectation step (CL-E step) defines

$$
Q(\theta \mid \theta')
= \sum_{k=1}^{K} E_{I_{CL}^k \mid \mathbf{t}^k, \theta'}\!\left[\log \Pr(\mathbf{t}^k, I_{CL}^k \mid \theta)\right]
= \sum_{k=1}^{K}\sum_{I_{CL}^k} \log \Pr(\mathbf{t}^k, I_{CL}^k \mid \theta)\, \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta').
\tag{B.1.2}
$$

Analogously, define the composite H-function as

$$
H(\theta \mid \theta') = \sum_{k=1}^{K}\sum_{I_{CL}^k} \log \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta)\, \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta').
\tag{B.1.3}
$$

The following two lemmas are critical in establishing Theorem 1.

Lemma 1. For any $(\theta, \theta') \in \Theta \times \Theta$, $Q(\theta \mid \theta') - H(\theta \mid \theta') = \ell_{CL}(\theta \mid \mathbf{t})$.

Proof.

$$
Q(\theta \mid \theta') - H(\theta \mid \theta')
= \sum_{k=1}^{K}\sum_{I_{CL}^k} \log \frac{\Pr(\mathbf{t}^k, I_{CL}^k \mid \theta)}{\Pr(I_{CL}^k \mid \mathbf{t}^k, \theta)}\, \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta')
= \sum_{k=1}^{K}\sum_{I_{CL}^k} \log \Pr(\mathbf{t}^k \mid \theta)\, \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta')
= \sum_{k=1}^{K} \log \Pr(\mathbf{t}^k \mid \theta)
= \ell_{CL}(\theta \mid \mathbf{t}).
\tag{B.1.4}
$$

Lemma 2. For any $(\theta, \theta') \in \Theta \times \Theta$, $H(\theta \mid \theta') \le H(\theta' \mid \theta')$.

Proof. By Jensen's inequality and the concavity of $\log$,

$$
H(\theta \mid \theta') - H(\theta' \mid \theta')
= \sum_{k=1}^{K}\sum_{I_{CL}^k} \log \frac{\Pr(I_{CL}^k \mid \mathbf{t}^k, \theta)}{\Pr(I_{CL}^k \mid \mathbf{t}^k, \theta')}\, \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta')
\le \sum_{k=1}^{K} \log \sum_{I_{CL}^k} \Pr(I_{CL}^k \mid \mathbf{t}^k, \theta) = 0,
\tag{B.1.5}
$$

which implies $H(\theta \mid \theta') \le H(\theta' \mid \theta')$.

Proof of Theorem 1. Given the CLEM sequence $\hat{\theta}^{(a)} = \arg\max_{\theta} Q(\theta \mid \hat{\theta}^{(a-1)})$ for $a = 1, 2, \ldots$, Lemma 1 gives

$$
\ell_{CL}(\hat{\theta}^{(a)} \mid \mathbf{t}) = Q(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}) - H(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}).
$$

Since $\hat{\theta}^{(a)}$ maximizes $Q(\theta \mid \hat{\theta}^{(a-1)})$, we have $Q(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}) \ge Q(\hat{\theta}^{(a-1)} \mid \hat{\theta}^{(a-1)})$. Meanwhile, by Lemma 2, $H(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}) \le H(\hat{\theta}^{(a-1)} \mid \hat{\theta}^{(a-1)})$.
Combining these two inequalities yields

$$
\ell_{CL}(\hat{\theta}^{(a)} \mid \mathbf{t})
= Q(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}) - H(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)})
\ge Q(\hat{\theta}^{(a-1)} \mid \hat{\theta}^{(a-1)}) - H(\hat{\theta}^{(a-1)} \mid \hat{\theta}^{(a-1)})
= \ell_{CL}(\hat{\theta}^{(a-1)} \mid \mathbf{t}),
\tag{B.1.6}
$$

which establishes the monotone ascent property in (B.1.1). Equality holds if and only if $Q(\hat{\theta}^{(a)} \mid \hat{\theta}^{(a-1)}) = Q(\hat{\theta}^{(a-1)} \mid \hat{\theta}^{(a-1)})$.

B.2 Proof of Statistical Consistency

Statistical Consistency. Suppose that the parameter space $\Theta$ is compact and the following conditions hold: (1) $E[U_{K,d}(\theta)] = 0$ if and only if $\theta = \theta_d$; (2) there exists a nonnegative function $\tau(\cdot)$ such that

$$
\left\| \frac{f^{(1)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)} \right\| < \tau(n^k),
\tag{B.2.1}
$$

$$
\left\| \frac{f^{(2)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)} \right\| < \tau(n^k),
\tag{B.2.2}
$$

where $E[\tau(n^k)^2] < \infty$ and $n^k = |\mathbf{t}^k|$ denotes the number of events in the $k$th sub-window. Then $\hat{\theta}_{K,d}$ converges in probability to $\theta_d$ as $K \to \infty$. Furthermore, if $E[U_{K,d}(\theta)] \to 0$ as $d \to \infty$ only when $\theta = \theta_0$, then $\theta_d \to \theta_0$ as $d \to \infty$.

To establish Theorem 2, we build upon the following lemmas.

Lemma 3 (Theorem 3.1 in Crowder (1986)). Let $\hat{\theta}_{K,d}$ be the solution to $U_{K,d}(\theta) = 0$. Define $B_\epsilon = \{\|\theta - \theta_d\| < \epsilon\}$ for some $\epsilon > 0$. Then $P[\hat{\theta}_{K,d} \in B_\epsilon] \to 1$ if the following conditions hold:

(A1) $E[U_{K,d}(\theta)] \to 0$ only at $\theta = \theta_d$;
(A2) $\inf_{\Theta - B_\epsilon} \|E[U_{K,d}(\theta)]\| \ge C_\epsilon$ for some $C_\epsilon > 0$;
(A3) $\sup_{\Theta - B_\epsilon} \|U_{K,d}(\theta) - E[U_{K,d}(\theta)]\| \to 0$ in probability.

Lemma 4. If $E[U_{K,d}(\theta)]$ is continuous in $\theta$, then (A1) and (A2) in Lemma 3 are satisfied.

Proof. Since $E[U_{K,d}(\theta)]$ is continuous and equals zero only at $\theta = \theta_d$, (A1) holds. Continuity also implies boundedness of $E[U_{K,d}(\theta)]$ on the compact space $\Theta$, and since $E[U_{K,d}(\theta)]$ is bounded away from zero outside $B_\epsilon$, there exists $C_\epsilon > 0$ such that $\inf_{\Theta - B_\epsilon} \|E[U_{K,d}(\theta)]\| \ge C_\epsilon$, verifying (A2).
Next, we prove the continuity of $E[U_{K,d}(\theta)]$ on $\Theta$ through Lemmas 5 and 6.

Lemma 5. If there exists a nonnegative function $\tau(\cdot)$ satisfying $\|f^{(1)}(\mathbf{t}^k \mid \theta)/f(\mathbf{t}^k \mid \theta)\| < \tau(n^k)$, $\|f^{(2)}(\mathbf{t}^k \mid \theta)/f(\mathbf{t}^k \mid \theta)\| < \tau(n^k)$, and $E[\tau(n^k)^2] < \infty$, then $E[U'_{K,d}(\theta)]$ is bounded.

Proof. Since

$$
U'_{K,d}(\theta) = \sum_{k=1}^{K}\left[\frac{f^{(2)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)} - \left(\frac{f^{(1)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right)^2\right],
$$

we have

$$
E[U'_{K,d}(\theta)]
= E\left[\sum_{k=1}^{K}\left(\frac{f^{(2)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)} - \left(\frac{f^{(1)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right)^2\right)\right]
\le \sum_{k=1}^{K} E\left[\left\|\frac{f^{(2)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right\|\right] + \sum_{k=1}^{K} E\left[\left\|\frac{f^{(1)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right\|^2\right]
\le \sum_{k=1}^{K} E[\tau(n^k)] + \sum_{k=1}^{K} E[\tau(n^k)^2]
< \infty.
\tag{B.2.3}
$$

Thus, $E[U'_{K,d}(\theta)]$ is bounded, which proves Lemma 5.

Lemma 6. If $U_{K,d}(\theta)$ is continuous and differentiable on the compact set $\Theta$, and $E[U'_{K,d}(\theta)]$ is bounded, then $E[U_{K,d}(\theta)]$ is continuous on $\Theta$.

Proof. By the Mean Value Theorem, for any $\delta$, there exists $\tilde{\theta} \in (\theta, \theta + \delta)$ such that

$$
E[U_{K,d}(\theta + \delta) - U_{K,d}(\theta)] = E[U'_{K,d}(\tilde{\theta})] \cdot \|\delta\|.
$$

Given $\|E[U'_{K,d}(\theta)]\| < M$, for any $\epsilon^* > 0$, choose $\delta^* = \epsilon^*/M$. Then, whenever $\|\theta - \theta^*\| < \delta^*$, we have $\|E[U_{K,d}(\theta)] - E[U_{K,d}(\theta^*)]\| < \epsilon^*$. Hence, $E[U_{K,d}(\theta)]$ is continuous.

Having established continuity, we have verified (A1) and (A2) in Lemma 3. Next, we verify (A3) using Lemmas 7–9.

Lemma 7 (Guan (2006)). If for any $\epsilon > 0$ and $\eta > 0$, there exist $\delta > 0$ and $K' < \infty$ such that the following two conditions are satisfied for $K > K'$, then (A3) in Lemma 3 is verified:

(C1) $\sup_{\|\theta_1 - \theta_2\| < \delta} \|E[U_{K,d}(\theta_1)] - E[U_{K,d}(\theta_2)]\| < \epsilon$;
(C2) $\Pr\left[\sup_{\|\theta_1 - \theta_2\| < \delta} \|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\| > \epsilon/2\right] < \eta$.
We have proved in Lemma 6 that $E[U_{K,d}(\theta)]$ is continuous on $\Theta$, which directly implies (C1) in Lemma 7. Thus, to verify (A3) in Lemma 3, we only need to verify (C2) in Lemma 7, which we do via Lemmas 8 and 9.

Lemma 8. If there exists a nonnegative function $\tau(\cdot)$ such that $\|f^{(1)}(\mathbf{t}^k \mid \theta)/f(\mathbf{t}^k \mid \theta)\| < \tau(n^k)$, $\|f^{(2)}(\mathbf{t}^k \mid \theta)/f(\mathbf{t}^k \mid \theta)\| < \tau(n^k)$, and $E[\tau(n^k)^2]$ is bounded, then for any $\eta > 0$ there exists $M_\eta < \infty$ such that $\Pr[\sup_{\theta} \|U'_{K,d}(\theta)\| > M_\eta] < \eta$.

Proof. We have

$$
\|U'_{K,d}(\theta)\|
\le \sum_{k=1}^{K}\left\|\frac{f^{(2)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right\| + \sum_{k=1}^{K}\left\|\frac{f^{(1)}(\mathbf{t}^k \mid \theta)}{f(\mathbf{t}^k \mid \theta)}\right\|^2
\le \sum_{k=1}^{K}\left[\tau(n^k) + \tau(n^k)^2\right].
\tag{B.2.4}
$$

Since $\tau(\cdot)$ is nonnegative and $E[\tau(n^k)^2]$ is bounded, there exists $k_0 < \infty$ such that $\sum_{k=1}^{K} E[\tau(n^k) + \tau(n^k)^2] < k_0$. By the Markov inequality,

$$
\Pr\left\{\sum_{k=1}^{K}\left[\tau(n^k) + \tau(n^k)^2\right] > \frac{k_0}{\eta}\right\}
\le \frac{\sum_{k=1}^{K} E[\tau(n^k) + \tau(n^k)^2]}{k_0/\eta}
\le \eta.
\tag{B.2.5}
$$

Letting $M_\eta = k_0/\eta$, we have

$$
\Pr\left\{\sup_{\theta} \|U'_{K,d}(\theta)\| > M_\eta\right\}
\le \Pr\left\{\sum_{k=1}^{K}\left[\tau(n^k) + \tau(n^k)^2\right] > M_\eta\right\}
\le \eta.
\tag{B.2.6}
$$

Lemma 9. If for any $\eta > 0$ there exists $M_\eta < \infty$ such that $\Pr[\sup_{\theta} \|U'_{K,d}(\theta)\| > M_\eta] < \eta$, then $\Pr\left[\sup_{\|\theta_1 - \theta_2\| < \delta} \|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\| > \epsilon/2\right] < \eta$.

Proof. By the continuity of $U_{K,d}(\theta)$ and the Mean Value Theorem, for any $\theta_1, \theta_2$ with $\|\theta_1 - \theta_2\| < \delta$ there exists $\tilde{\theta}$ such that

$$
\|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\|
= \|U'_{K,d}(\tilde{\theta})\| \cdot \|\theta_1 - \theta_2\|
< \|U'_{K,d}(\tilde{\theta})\| \cdot \delta,
\tag{B.2.7}
$$

which implies $\sup_{\|\theta_1 - \theta_2\| < \delta} \|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\| < \sup_{\tilde{\theta}} \|U'_{K,d}(\tilde{\theta})\| \cdot \delta$.
Thus, for any $\epsilon, \eta$, there exists $M_\eta = \epsilon/(2\delta) < \infty$ such that $\sup_{\|\theta_1 - \theta_2\| < \delta} \|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\| > \epsilon/2$ implies $\sup_{\tilde{\theta}} \|U'_{K,d}(\tilde{\theta})\| \cdot \delta > \epsilon/2$, which further implies

$$
\Pr\left(\sup_{\|\theta_1 - \theta_2\| < \delta} \|U_{K,d}(\theta_1) - U_{K,d}(\theta_2)\| > \frac{\epsilon}{2}\right)
< \Pr\left(\sup_{\tilde{\theta}} \|U'_{K,d}(\tilde{\theta})\| > \frac{\epsilon}{2\delta} = M_\eta\right)
< \eta.
$$

This proves Lemma 9, i.e., (C2) in Lemma 7. Thus far, we have verified (A1)–(A3) in Lemma 3, which implies that $\hat{\theta}_{K,d}$ converges in probability to $\theta_d$ as $K \to \infty$. Next, in Lemma 10, we prove that if $E[U_{K,d}(\theta)] \to 0$ as $d \to \infty$ only at $\theta = \theta_0$, then $\theta_d \to \theta_0$ as $d \to \infty$.

Lemma 10. If $\hat{\theta}_{K,d}$ converges in probability to $\theta_d$ as $K \to \infty$, then for any $\epsilon > 0$ there exists $d' \in \mathbb{R}$ such that for any $d > d'$,

$$
\|\theta_d - \theta_0\|_\infty < \epsilon.
\tag{B.2.8}
$$

Proof. Suppose the statement is false. Then there exists $\epsilon^*$ such that for any $d' > 0$ there exists $\tilde{d} > d'$ with $\|\theta_{\tilde{d}} - \theta_0\|_\infty > \epsilon^*$. Therefore, there exists a sequence $\{\theta^*_{\tilde{d}_n}\}_{n=1}^{\infty}$ such that $\tilde{d}_n \to \infty$ and $\|\theta^*_{\tilde{d}_n} - \theta_0\|_\infty > \epsilon^*$. Since $\Theta$ is compact, there exists a subsequence $\{\theta^*_{\tilde{d}_{n_k}}\}_{k=1}^{\infty}$ such that $\theta^*_{\tilde{d}_{n_k}} \to \theta^*_0 \ne \theta_0$ as $k \to \infty$, with $\|\theta^*_0 - \theta_0\|_\infty > \epsilon^*$. Define $\Phi_d(\theta) = E[U_{K,d}(\theta)]$. Since $U_{K,d}(\theta)$ is continuous and differentiable, by the Mean Value Theorem we have

$$
\|\Phi_{\tilde{d}_{n_k}}(\theta^*_{\tilde{d}_{n_k}}) - \Phi_{\tilde{d}_{n_k}}(\theta^*_0)\|
= \|\Phi'_{\tilde{d}_{n_k}}(\tilde{\theta}_0)\| \cdot \|\theta^*_{\tilde{d}_{n_k}} - \theta^*_0\|
\le M \cdot \|\theta^*_{\tilde{d}_{n_k}} - \theta^*_0\|.
\tag{B.2.9}
$$

We know that $\Phi_{\tilde{d}_{n_k}}(\theta^*_{\tilde{d}_{n_k}}) = E[U_{K, \tilde{d}_{n_k}}(\theta^*_{\tilde{d}_{n_k}})] \to 0$ as $\tilde{d}_{n_k} \to \infty$ and that $\Phi'_{\tilde{d}_{n_k}}(\theta)$ is bounded and continuous.
Then,

$$
\lim_{\tilde{d}_{n_k} \to \infty} \|\Phi_{\tilde{d}_{n_k}}(\theta^*_{\tilde{d}_{n_k}}) - \Phi_{\tilde{d}_{n_k}}(\theta^*_0)\|
\le M \cdot \lim_{\tilde{d}_{n_k} \to \infty} \|\theta^*_{\tilde{d}_{n_k}} - \theta^*_0\| = 0,
$$

and since $\Phi_{\tilde{d}_{n_k}}(\theta^*_{\tilde{d}_{n_k}}) \to 0$, it follows that $\lim_{\tilde{d}_{n_k} \to \infty} \Phi_{\tilde{d}_{n_k}}(\theta^*_0) = 0$. This contradicts the assumption that $\lim_{d \to \infty} \Phi_d(\theta) = 0$ only at $\theta = \theta_0$. Thus, Lemma 10 is proved.

Appendix C: Additional Results in Section 3.1.4

Tables 7, 8, and 9 report the stepwise Friedman test results for $T = 1000$, $2500$, and $5000$. Using the same selection criterion, we obtain the following optimal choices: $K^* = 20$ for $T = 1000$, $K^* = 50$ for $T = 2500$, and $K^* = 100$ for $T = 5000$.

Table 7: Stepwise Friedman Test Results for Selecting the Optimal K ($T = 1000$)

| Hypothesis | p-value | Conclusion |
|---|---|---|
| $H_0^{(1)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)}$ vs. $H_1^{(1)}$: not all medians are equal | 0.756 | Do not reject $H_0^{(1)}$ |
| $H_0^{(2)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)}$ vs. $H_1^{(2)}$: not all medians are equal | 0.147 | Do not reject $H_0^{(2)}$ |
| $H_0^{(3)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)} = M_{\mathrm{CLEM}(20)}$ vs. $H_1^{(3)}$: not all medians are equal | 0.137 | Do not reject $H_0^{(3)}$ |
| $H_0^{(4)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(50)}$ vs. $H_1^{(4)}$: not all medians are equal | <0.001 | Reject $H_0^{(4)}$ ᵃ |

ᵃ Rejection decision is made if the p-value is smaller than 0.05.
Table 8: Stepwise Friedman Test Results for Selecting the Optimal K ($T = 2500$)

| Hypothesis | p-value | Conclusion |
|---|---|---|
| $H_0^{(1)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)}$ vs. $H_1^{(1)}$: not all medians are equal | 0.533 | Do not reject $H_0^{(1)}$ |
| $H_0^{(2)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)}$ vs. $H_1^{(2)}$: not all medians are equal | 0.705 | Do not reject $H_0^{(2)}$ |
| $H_0^{(3)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)} = M_{\mathrm{CLEM}(20)}$ vs. $H_1^{(3)}$: not all medians are equal | 0.846 | Do not reject $H_0^{(3)}$ |
| $H_0^{(4)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(50)}$ vs. $H_1^{(4)}$: not all medians are equal | 0.106 | Do not reject $H_0^{(4)}$ |
| $H_0^{(5)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(100)}$ vs. $H_1^{(5)}$: not all medians are equal | <0.001 | Reject $H_0^{(5)}$ ᵃ |

ᵃ Rejection decision is made if the p-value is smaller than 0.05.
Table 9: Stepwise Friedman Test Results for Selecting the Optimal K ($T = 5000$)

| Hypothesis | p-value | Conclusion |
|---|---|---|
| $H_0^{(1)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)}$ vs. $H_1^{(1)}$: not all medians are equal | 0.210 | Do not reject $H_0^{(1)}$ |
| $H_0^{(2)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)}$ vs. $H_1^{(2)}$: not all medians are equal | 0.080 | Do not reject $H_0^{(2)}$ |
| $H_0^{(3)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = M_{\mathrm{CLEM}(5)} = M_{\mathrm{CLEM}(10)} = M_{\mathrm{CLEM}(20)}$ vs. $H_1^{(3)}$: not all medians are equal | 0.126 | Do not reject $H_0^{(3)}$ |
| $H_0^{(4)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(50)}$ vs. $H_1^{(4)}$: not all medians are equal | 0.108 | Do not reject $H_0^{(4)}$ |
| $H_0^{(5)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(100)}$ vs. $H_1^{(5)}$: not all medians are equal | 0.107 | Do not reject $H_0^{(5)}$ |
| $H_0^{(6)}$: $M_{\mathrm{EM}} = M_{\mathrm{CLEM}(2)} = \cdots = M_{\mathrm{CLEM}(250)}$ vs. $H_1^{(6)}$: not all medians are equal | <0.01 | Reject $H_0^{(6)}$ ᵃ |

ᵃ Rejection decision is made if the p-value is smaller than 0.05.

References

AI Incident (2024) Note: [Online]. Artificial Intelligence Incident Database: https://incidentdatabase.ai, accessed: December 26, 2024. Cited by: Table 1, §1. Y. Balagurunathan, R. Mitchell, and I. El Naqa (2021) Requirements and reliability of AI in the medical context. Physica Medica 83, p. 72–78. Cited by: §1. California Department of Motor Vehicles (2024) Autonomous vehicle tester program. Note: [Online]. Available at: https://w.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/, accessed: September 01, 2024. Cited by: Table 1, §1. L. Chan, I. Morgan, H. Simon, F. Alshabanat, D. Ober, J. Gentry, D. Min, and R.
Cao (2019) Survey of AI in cybersecurity for information technology management. In 2019 IEEE Technology & Engineering Management Conference (TEMSCON), p. 1–8. Cited by: §1. D. R. Cox and V. Isham (1980) Point processes. Vol. 12, CRC Press. Cited by: §2.2. M. Crowder (1986) On consistency and inconsistency of estimating equations. Econometric Theory 2 (3), p. 305–330. Cited by: §B.2. A. P. Dempster, N. M. Laird, and D. B. Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39 (1), p. 1–38. Cited by: §1. Z. Faddi, K. da Mata, P. Silva, V. Nagaraju, S. Ghosh, G. Kul, and L. Fiondella (2024) Quantitative assessment of machine learning reliability and resilience. Risk Analysis, in press. Cited by: Table 1, §1. X. Gao and P. X. Song (2011) Composite likelihood EM algorithm with applications to multivariate hidden Markov model. Statistica Sinica, p. 165–185. Cited by: §1. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun (2013) Vision meets robotics: the KITTI dataset. The International Journal of Robotics Research 32 (11), p. 1231–1237. Cited by: Table 1, §1. N. Gorjian, L. Ma, M. Mittinty, P. Yarlagadda, and Y. Sun (2010) A review on reliability models with covariates. In Engineering Asset Lifecycle Management: Proceedings of the 4th World Congress on Engineering Asset Management (WCEAM 2009), 28-30 September 2009, p. 385–397. Cited by: §1. Y. Guan (2006) A composite likelihood approach in fitting spatial point process models. Journal of the American Statistical Association 101 (476), p. 1502–1512. Cited by: §B.2. M. Hassaballah, M. A. Kenk, K. Muhammad, and S. Minaee (2020) Vehicle detection and tracking in adverse weather using a deep learning framework. IEEE Transactions on Intelligent Transportation Systems 22 (7), p. 4230–4242. Cited by: §3.2.1. S. A. Hossain and R. C. Dahiya (1993) Estimating the parameters of a non-homogeneous Poisson-process model for software reliability. IEEE Transactions on Reliability 42 (4), p. 604–612.
Cited by: §1. P. Kafka (2012) The automotive standard iso 26262, the innovative driver for enhanced safety assessment & technology for motor cars. Procedia Engineering 45, p. 2–10. Cited by: §2.2. N. Kalra and S. M. Paddock (2016) Driving to safety: how many miles of driving would it take to demonstrate autonomous vehicle reliability?. Transportation Research Part A: Policy and Practice 94, p. 182–193. Cited by: item 1. K. C. Kapur and M. G. Pecht (2014) Reliability engineering. John Wiley & Sons. Cited by: §1. V. Kilic, D. Hegde, V. Sindagi, A. B. Cooper, M. A. Foster, and V. M. Patel (2021) Lidar light scattering augmentation physics-based simulation of adverse weather conditions for 3d object detection. arXiv preprint arXiv:2107.07004. Cited by: §3.2.1. P. W. Lewis and G. S. Shedler (1979) Simulation of nonhomogeneous poisson processes by thinning. Naval research logistics quarterly 26 (3), p. 403–413. Cited by: §3.1.1. D. Li and O. Okhrin (2023) Modified ddpg car-following model with a real-world human driving experience with carla simulator. Transportation research part C: emerging technologies 147, p. 103987. Cited by: item 1. M. Li, J. Han, and J. Liu (2017) Bayesian nonparametric modeling of heterogeneous time-to-event data with an unknown number of sub-populations. IISE Transactions 49 (5), p. 481–492. Cited by: §1. M. Li and J. Liu (2016) Bayesian hazard modeling based on lifetime data with latent heterogeneity. Reliability Engineering & System Safety 145, p. 183–189. Cited by: §1. J. Lian, L. Freeman, Y. Hong, and X. Deng (2021) Robustness with respect to class imbalance in artificial intelligence classification algorithms. Journal of Quality Technology 53, p. 505–525. Cited by: Table 1, §1. G. Liang and B. Yu (2003) Maximum pseudo likelihood estimation in network tomography. IEEE Transactions on Signal Processing 51 (8), p. 2043–2053. Cited by: §1. J. Min, Y. Hong, C. B. King, and W. Q. 
Meeker (2022) Reliability analysis of artificial intelligence systems using recurrent events data from autonomous vehicles. Journal of the Royal Statistical Society Series C: Applied Statistics 71 (4), p. 987–1013. Cited by: item 1, §1, §2.4. M. Mnyakin (2023) Applications of ai, iot, and cloud computing in smart transportation: a review. Artificial Intelligence in Society 3 (1), p. 9–27. Cited by: §1. J. D. Musa and K. Okumoto (1984) A logarithmic poisson execution time model for software reliability measurement. In Proceedings of the 7th international conference on Software engineering, p. 230–238. Cited by: Table 6. NHTSA (2017) Self-Driving Uber Car Kills Pedestrian in Arizona, Where Robots Roam. The New York Times (en-US). External Links: ISSN 0362-4331, Link Cited by: §1. K. Ohishi, H. Okamura, and T. Dohi (2009) Gompertz software reliability model: estimation algorithm and empirical validation. Journal of Systems and software 82 (3), p. 535–543. Cited by: Table 6. H. Okamura and T. Dohi (2013) Application of em algorithm to nhpp-based software reliability assessment with ungrouped failure time data. Stochastic Reliability and Maintenance Modeling: Essays in Honor of Professor Shunji Osaki on his 70th Birthday, p. 285–313. Cited by: §1. H. Okamura, Y. Watanabe, and T. Dohi (2003) An iterative scheme for maximum likelihood estimation in software reliability modeling. In 14th International Symposium on Software Reliability Engineering, 2003. ISSRE 2003., p. 246–256. Cited by: §1. F. Pan, Y. Zhang, L. Head, J. Liu, M. Elli, and I. Alvarez (2022) Quantifying error propagation in multi-stage perception system of autonomous vehicles via physics-based simulation. In 2022 Winter Simulation Conference (WSC), p. 2511–2522. Cited by: §1. F. Pan, Y. Zhang, J. Liu, L. Head, M. Elli, and I. Alvarez (2024) Reliability modeling for perception systems in autonomous vehicles: a recursive event-triggering point process approach. 
Transportation Research Part C: Emerging Technologies 169, p. 104868. Cited by: §1, §2.3.2, §3.1.1. F. Pan, Y. Zhou, C. Vivas-Valencia, N. Kong, C. Ott, M. S. Jalali, and J. Liu (2025) Modeling opioid overdose events recurrence with a covariate-adjusted triggering point process. PLOS Computational Biology 21 (5), p. e1012889. Cited by: §1. Z. Pan and N. Balakrishnan (2011) Reliability modeling of degradation of products with multiple performance characteristics based on gamma processes. Reliability Engineering & System Safety 96 (8), p. 949–957. Cited by: §1. R. S. Peres, X. Jia, J. Lee, K. Sun, A. W. Colombo, and J. Barata (2020) Industrial artificial intelligence in industry 4.0 - systematic review, challenges and outlook. IEEE Access 8, p. 220121–220139. Cited by: §1. H. Pham and X. Zhang (2003) NHPP software reliability and cost models with testing coverage. European Journal of Operational Research 145 (2), p. 443–454. Cited by: §1. R. Pyke (1961) Markov renewal processes: definitions and preliminary properties. The Annals of Mathematical Statistics, p. 1231–1242. Cited by: §1. M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng, et al. (2009) ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, Vol. 3, p. 5. Cited by: 2nd item. M. Sahinoglu (1992) Compound-Poisson software reliability model. IEEE Transactions on Software Engineering 18 (7), p. 624. Cited by: Table 6. M. R. Sheldon, M. J. Fillyaw, and W. D. Thompson (1996) The use and interpretation of the Friedman test in the analysis of ordinal-scale data in repeated measures designs. Physiotherapy Research International 1 (4), p. 221–228. Cited by: §2.3.4. M. N. Van Lieshout (2012) On estimation of the intensity function of a point process. Methodology and Computing in Applied Probability 14 (3), p. 567–578. Cited by: §2.4. C. Varin, G. Høst, and Ø. Skare (2005) Pairwise likelihood inference in spatial generalized linear mixed models.
Computational statistics & data analysis 49 (4), p. 1173–1191. Cited by: §1. A. Veen and F. P. Schoenberg (2008) Estimation of space–time branching process models in seismology using an em–type algorithm. Journal of the American Statistical Association 103 (482), p. 614–624. Cited by: §2.3.1. F. Wang, J. Du, Y. Zhao, T. Tang, and J. Shi (2020) A deep learning based data fusion method for degradation modeling and prognostics. IEEE Transactions on Reliability 70 (2), p. 775–789. Cited by: §1. S. Xia, Y. Zhang, K. Lansey, and J. Liu (2025) Penalized spatial-temporal sensor fusion for detecting and localizing bursts in water distribution systems. Information Fusion 117, p. 102912. Cited by: §2.3.4. H. Yang, F. Pan, D. Tong, H. E. Brown, and J. Liu (2024) Measurement error–tolerant poisson regression for valley fever incidence prediction. IISE transactions on healthcare systems engineering 14 (4), p. 305–317. Cited by: §1. R. Yuan, M. Tang, H. Wang, and H. Li (2019) A reliability analysis method of accelerated performance degradation based on bayesian strategy. IEEE Access 7, p. 169047–169054. Cited by: §1. P. Zeephongsekul, C. L. Jayasinghe, L. Fiondella, and V. Nagaraju (2016) Maximum-likelihood estimation of parameters of nhpp software reliability models using expectation conditional maximization algorithm. IEEE Transactions on Reliability 65 (3), p. 1571–1583. Cited by: §1. Z. Zhang, X. Si, C. Hu, and Y. Lei (2018) Degradation data analysis and remaining useful life estimation: a review on wiener-process-based methods. European Journal of Operational Research 271 (3), p. 775–796. Cited by: §1. X. Zhao, V. Robu, D. Flynn, K. Salako, and L. Strigini (2019) Assessing the safety and reliability of autonomous vehicles from road testing. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), Vol. , p. 13–23. External Links: Document Cited by: §1. S. Zheng, J. M. Clark, F. Salboukh, P. Silva, K. da Mata, F. Pan, J. Min, J. Lian, C. B. 
King, L. Fiondella, et al. (2025) DR-air: a data repository bridging the research gap in ai reliability. Quality Engineering, p. 1–22. Cited by: item 1. X. Zou, D. B. Logan, and H. L. Vu (2022) Modeling public acceptance of private autonomous vehicles: value of time and motion sickness viewpoints. Transportation Research Part C: Emerging Technologies 137, p. 103548. Cited by: §1.