← Back to papers

Paper deep dive

Efficient Policy Learning with Hybrid Evaluation-Based Genetic Programming for Uncertain Agile Earth Observation Satellite Scheduling

Junhua Xue, Yuning Chen

Year: 2026Venue: arXiv preprintArea: cs.AIType: PreprintEmbeddings: 100

Abstract

Abstract:The Uncertain Agile Earth Observation Satellite Scheduling Problem (UAEOSSP) is a novel combinatorial optimization problem and a practical engineering challenge that aligns with the current demands of space technology development. It incorporates uncertainties in profit, resource consumption, and visibility, which may render pre-planned schedules suboptimal or even infeasible. Genetic Programming Hyper-Heuristic (GPHH) shows promise for evolving interpretable scheduling policies; however, their simulation-based evaluation incurs high computational costs. Moreover, the design of the constructive method, denoted as Online Scheduling Algorithm (OSA), directly affects fitness assessment, resulting in evaluation-dependent local optima within the policy space. To address these issues, this paper proposes a Hybrid Evaluation-based Genetic Programming (HE-GP) for effectively solving UAEOSSP. A Hybrid Evaluation (HE) mechanism is integrated into the policy-driven OSA, combining exact and approximate filtering modes: exact mode ensures evaluation accuracy through elaborately designed constraint verification modules, while approximate mode reduces computational overhead via simplified logic. HE-GP dynamically switches between evaluation models based on real-time evolutionary state information. Experiments on 16 simulated instance sets demonstrate that HE-GP significantly outperforms handcrafted heuristics and single-evaluation based GPHH, achieving substantial reductions in computational cost while maintaining excellent scheduling performance across diverse scenarios. Specifically, the average training time of HE-GP was reduced by 17.77\% compared to GP employing exclusively exact evaluation, while the optimal policy generated by HE-GP achieved the highest average ranks across all scenarios.

Tags

ai-safety (imported, 100%)csai (suggested, 92%)preprint (suggested, 88%)safety-evaluation (suggested, 80%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 97%

Last extracted: 3/13/2026, 12:50:15 AM

Summary

The paper introduces Hybrid Evaluation-based Genetic Programming (HE-GP) to solve the Uncertain Agile Earth Observation Satellite Scheduling Problem (UAEOSSP). By integrating a Hybrid Evaluation (HE) mechanism into an Online Scheduling Algorithm (OSA), the approach dynamically switches between exact and approximate filtering modes to balance computational efficiency with solution quality, outperforming traditional GPHH and handcrafted heuristics.

Entities (5)

GPHH · methodology · 100%HE-GP · algorithm · 100%UAEOSSP · problem · 100%HE mechanism · mechanism · 95%OSA · component · 95%

Relation Signals (4)

HE mechanism integratedinto OSA

confidence 100% · A Hybrid Evaluation (HE) mechanism is integrated into the policy-driven OSA

HE-GP solves UAEOSSP

confidence 100% · this paper proposes a Hybrid Evaluation-based Genetic Programming (HE-GP) for effectively solving UAEOSSP

HE-GP utilizes HE mechanism

confidence 100% · The HE-GP framework utilizes a conventional GP architecture for population evolution while incorporating a novel Hybrid Evaluation (HE) mechanism

GPHH evolves scheduling policies

confidence 95% · Genetic Programming Hyper-Heuristic (GPHH) shows promise for evolving interpretable scheduling policies

Cypher Suggestions (2)

Map the components of the HE-GP framework. · confidence 95% · unvalidated

MATCH (a:Algorithm {name: 'HE-GP'})-[:UTILIZES]->(m:Mechanism)-[:INTEGRATED_INTO]->(c:Component) RETURN m.name, c.name

Find all algorithms designed to solve the UAEOSSP problem. · confidence 90% · unvalidated

MATCH (a:Algorithm)-[:SOLVES]->(p:Problem {name: 'UAEOSSP'}) RETURN a.name

Full Text

100,173 characters extracted from source content.

Expand or collapse full text

Efficient Policy Learning with Hybrid Evaluation-Based Genetic Programming for Uncertain Agile Earth Observation Satellite Scheduling Junhua Xue , Yuning Chen Abstract The Uncertain Agile Earth Observation Satellite Scheduling Problem (UAEOSSP) is a novel combinatorial optimization problem and a practical engineering challenge that aligns with the current demands of space technology development. It incorporates uncertainties in profit, resource consumption, and visibility, which may render pre-planned schedules suboptimal or even infeasible. Genetic Programming Hyper-Heuristic (GPHH) shows promise for evolving interpretable scheduling policies; however, their simulation-based evaluation incurs high computational costs. Moreover, the design of the constructive method, denoted as Online Scheduling Algorithm (OSA), directly affects fitness assessment, resulting in evaluation-dependent local optima within the policy space. To address these issues, this paper proposes a Hybrid Evaluation-based Genetic Programming (HE-GP) for effectively solving UAEOSSP. A Hybrid Evaluation (HE) mechanism is integrated into the policy-driven OSA, combining exact and approximate filtering modes: exact mode ensures evaluation accuracy through elaborately designed constraint verification modules, while approximate mode reduces computational overhead via simplified logic. HE-GP dynamically switches between evaluation models based on real-time evolutionary state information. Experiments on 16 simulated instance sets demonstrate that HE-GP significantly outperforms handcrafted heuristics and single-evaluation based GPHH, achieving substantial reductions in computational cost while maintaining excellent scheduling performance across diverse scenarios. Specifically, the average training time of HE-GP was reduced by 17.77% compared to GP employing exclusively exact evaluation, while the optimal policy generated by HE-GP achieved the highest average ranks across all scenarios. I Introduction Earth observation satellites (EOSs) are spaceborne platforms engineered to fulfill a wide range of observation requirements across disciplines such as agriculture and economics, making the EOS scheduling problem (EOSSP) a longstanding focus in optimization research [1].With the rapid advancement of satellite technologies and the escalating demand for satellite imagery across various applications, the optimization of scheduling processes for Agile EOSs (AEOSs) has attracted increasing attention. The AEOSs feature three degrees of freedom in attitude control (roll, pitch, and yaw), and their enhanced maneuverability enables them to handle overlapping observation requests and complex operations [2]. Compared to EOSSP, AEOSSP’s search space is considerably larger, with theoretically innumerable observation windows (OWs) within visible time windows (VTWs) for each request [3]. Contemporary research primarily concentrates on satellite scheduling challenges that incorporate considerations of agile maneuverability [4]. The integration of artificial intelligence and cyber-physical systems has driven a growing demand for autonomous satellite scheduling, particularly under the constraints of limited onboard computational resources [5]. Autonomous scheduling requires satellites to dynamically adjust schedules based on real-time state information while effectively managing environmental uncertainties. However, most existing AEOSSP studies tend to oversimplify real-world operational conditions, frequently adopting static and deterministic problem formulations. These conventional static models and their associated solution methodologies present inherent limitations in effectively supporting the development of autonomous scheduling, as AEOSs inherently encounter various resource-related and task-related uncertainties during scheduling [6]. While recent research has considered single-source uncertainties such as cloud cover impacts [7, 8, 9] and the dynamic arrival of observation requests [3], systematic investigations into multi-uncertainty integration remain insufficient. To bridge this critical gap and better align theoretical research with practical requirements, this study introduces the Uncertain AEOSSP (UAEOSSP) [10]. The UAEOSSP explicitly characterizes profit, resource consumption, and visibility as stochastic variables, thereby providing a more accurate depiction of the operational environment. Consequently, it holds substantial theoretical and practical significance for advancing autonomous scheduling capabilities. Markov Decision Processes (MDPs) provide a robust framework for uncertain sequential decision-making, enabling dynamic schedule generation based on real-time state information and supporting autonomous satellite operations [11, 12, 13]. They simulate the on-board autonomous decision-making process of satellites, where irreversible decisions are made based on real-time state information. Scheduling policies play a crucial role in methodologies that employ the MDP framework for modeling autonomous scheduling. Handcrafted scheduling policies, such as priority-based sequential construction procedures [14], have demonstrated some efficacy, but their performance is highly scenario-dependent and contingent upon specific optimization objectives. Moreover, designing effective policies tailored to specific scenarios requires substantial time and domain expertise. Recent developments in Machine Learning (ML) technologies have prompted numerous studies to utilize ML methodologies to generate scheduling policies [11, 15, 16, 17, 18]. These approaches derive optimization policies from historical datasets and data-driven patterns. However, most existing satellites are equipped solely with CPU hardware, whereas complex models, such as deep neural networks, require high-performance GPU resources [19]. Additionally, these network models suffer from the “black box” problem, which limits transparency and interpretability and hinders their direct application in satellite scheduling scenarios that impose stringent engineering reliability requirements [20]. Genetic Programming Hyper-Heuristic (GPHH) has emerged as a promising approach to address the challenges associated with interpretability in scheduling optimization [21]. As a population-based evolutionary approach, GPHH evolves heuristic policies via genetic operations rather than generating specific schedules directly [22]. Unlike deep neural network models, the evolved policies can be expressed as transparent and interpretable mathematical formulations, thereby enhancing user comprehension and trust. The efficacy of such mathematically represented policies has been demonstrated across a variety of real-world scheduling problems [20]. Furthermore, prior studies have successfully applied GPHH to develop robust policies for uncertain scheduling [23, 24, 25]. To date, some scholars have employed GPHH to address the AEOSSP while incorporating uncertainty considerations: Wei et al. proposed a knowledge-transfer based GP for multi-objective dynamic AEOSSP [3]. Chen et al. were the first to apply GPHH to solve the AEOSSP under conditions involving multiple uncertainties [10]. Although GPHH has demonstrated remarkable performance in studying AEOSSP under uncertainty [3, 10], it still exhibits certain limitations that warrant further investigation. Currently, there is a paucity of research analyzing the design and impact of evaluation models, despite their indispensable and critical role within GPHH. The evaluation of GP individuals is intrinsically linked to constructive methods [11], with the design of these methods directly influencing the fitness evaluation of identical policies. This relationship causes the distribution of local optima within the policy space to depend on the construction method employed. While prior studies have predominantly focused on the effects of enhancements to genetic operators [26, 27], modifications to evaluation can steer distinct evolutionary trajectories and potentially improve the algorithm’s search efficacy. In addition, GPHH is distinguished by high computational requirements, largely attributable to the time-intensive evaluation process [28]. Some research has sought to mitigate these computational costs by constraining policy complexity through multi-objective optimization methods [29, 30]. However, this approach either restricts the potential effectiveness of evolved policies or lacks a comprehensive and systematic investigation specifically aimed at enhancing evaluation efficiency within the context of the problem. Therefore, achieving an optimal balance between reducing the computationally expensive evaluation overhead and preserving strong algorithmic performance remains a significant and unresolved challenge in applying GPHH to the UAEOSSP. Building on the above considerations, this study introduces an innovative Hybrid Evaluation-based Genetic Programming (HE-GP) approach designed to address the UAEOSSP effectively. The HE-GP framework utilizes a conventional GP architecture for population evolution while incorporating a novel Hybrid Evaluation (HE) mechanism within a policy-driven Online Scheduling Algorithm (OSA). This HE mechanism integrates both exact and approximate filtering strategies: the exact filtering mode ensures accuracy through meticulously designed constraint verification modules, whereas the approximate filtering mode reduces computational demands by employing simplified logical procedures. Unlike evaluation models that rely on a single mode, the HE-OSA dynamically alternates between filtering modes based on the real-time evolutionary status of the GP population, thereby achieving an optimal balance between computational efficiency and search effectiveness. The principal contributions of this paper are summarized as follows: • A HE-GP was developed to address the UAEOSSP by integrating a novel-designed HE mechanism within the policy-based OSA. This integration enabled efficient policy evaluation through adaptive switching between exact and approximate filtering modes, thereby enhancing both the algorithm’s efficiency and its search performance. • The HE-GP and its evolved scheduling policies were evaluated across different configurations, verifying the effectiveness and superiority of the proposed HE mechanism in reducing computational cost and improving solution quality. • In-depth analysis was conducted on the impact of the HE mechanism on algorithm evolutionary characteristics and the composition of evolved scheduling policies, identifying key feature terminals for optimal policy design and providing valuable theoretical references for subsequent research on GPHH and AEOS autonomous scheduling. The remainder of this paper is organized as follows. Section I delineates the problem description and mathematical formulation of the UAEOSSP, and also reviews related work pertinent to this study. Section I introduces the basic framework of GPHH for solving the UAEOSSP, with an emphasis on the policy-based OSA. Section IV details the implementation of the HE mechanism. Section V describes the generation of experimental instance sets and the configuration of algorithmic parameters. Section VI presents a comparative analysis of manually designed heuristics, HE-GP, and GP employing a single evaluation model across 16 simulated instance sets, followed by a detailed discussion. Finally, Section VII concludes the paper and outlines directions for future research. I Background I-A Uncertain Agile Earth Observation Satellite Scheduling Problem The AEOSSP is an oversubscribed planning and scheduling challenge focused on producing feasible, practical plans for satellites in orbit. This study introduced a stochastic variant of AEOSSP that accounts for uncertainty in problem parameters. However, developing scheduling plans in practical management contexts often entails significant complexity, as they must integrate intricate operational considerations, including regulatory constraints and specific user requests [2]. To facilitate the analysis of UAEOSSP, some reasonable simplifications and assumptions are proposed as follows: • The satellite’s available resources and the VTWs for all requests are known a priori. • Only point targets are considered in the requests. • AEOS operates in a uniform-speed ground imaging mode, and its memory consumption is directly proportional to the imaging duration. • AEOS is equipped with a single imaging payload and can observe only one request at a time. • Cloud cover affects imaging quality, which in turn affects the profit per request. Since the uncertainty in profit has been incorporated into UAEOSSP, the current model does not account for the impact of partial cloud cover. The UAEOSSP involves a set of candidate observation requests R. For each request ri∈Rr_i∈ R, there exists a corresponding VTW [w​sri,w​eri][ws_r_i,we_r_i]. Each request is characterized by its required imaging duration d​uridu_r_i and an expected profit prip_r_i. This paper focuses on the planning and scheduling of a single satellite in orbit with a maximum onboard memory capacity denoted by m​m​cmmc. At any t within the [w​sri,w​eri][ws_r_i,we_r_i], the satellite assumes a unique attitude a​t​t,riatt_t,r_i that enables the observation of rir_i. The satellite executes different requests through attitude maneuvers. Each rir_i requires a continuous observation duration d​uridu_r_i for complete imaging. During the observation period of rir_i, the satellite’s attitude angle varies over time. Upon completion of imaging rir_i, the corresponding profit is obtained. Considering uncertainties during scheduling, the actual profit, resource consumption, and visibility associated with rir_i may vary under different environmental conditions. Let E denote the set of scenarios representing the same situation under varying environmental conditions. For a given e​n​v∈Eenv∈ E, the actual profit pri¯​(e​n​v) p_r_i(env) and visibility v​i​sri¯​(e​n​v) vis_r_i(env) of rir_i are assumed to be known in advance. However, the actual data write rate c​rri¯​(e​n​v) cr_r_i(env) during imaging of rir_i is environment-dependent and cannot be predetermined, introducing uncertainty into the resource consumption of rir_i. The variable xri¯​(e​n​v) x_r_i(env) indicates whether rir_i is observed in scenario e​n​venv, and yri,rj¯​(e​n​v) y_r_i,r_j(env) represents the sequencing order between rir_i and rjr_j. The decision variables are shown as (1) and (2). In a feasible schedule (i.e., solution), the [o​sri,o​eri][os_r_i,oe_r_i] of each selected rir_i must also be determined. xri¯​(e​n​v)=1, if ​ri​ is selected0,otherwise x_r_i(env)= cases1, if r_i is selected\\ 0,~otherwise& cases (1) yri,rj¯​(e​n​v)=1,if ​rj​ is selected immediatelyafter request ​ri0,otherwise y_r_i,r_j(env)= cases1,&if r_j is selected immediately\\ &after request r_i\\ 0,&otherwise cases (2) The objective function aims to maximize the expected total profit in an uncertain scheduling scenario and is formulated as shown in (3). max⁡∑i=1|R|xri¯​(e​n​v)⋅pri¯​(e​n​v)|E| _i=1 R x_r_i(env)· p_r_i(env) E (3) Here, the numerator denotes the total actual profit across all environments, while the denominator is the total number of environments. Constraint (4) ensures that the total memory consumption does not exceed the maximum memory capacity m​m​cmmc. ∑i=1|R|(xri¯​(e​n​v)⋅c​rri¯​(e​n​v)⋅d​uri)≤m​m​c _i=1 R ( x_r_i(env)· cr_r_i(env)· du_r_i)≤ mmc (4) Constraint (5) stipulates that if rir_i is invisible, it cannot be scheduled for observation. xri¯​(e​n​v)=0,∀v​i​sri¯​(e​n​v)=0 x_r_i(env)=0,∀ vis_r_i(env)=0 (5) Constraint (6) requires that, for any observed rir_i, the OW must lie entirely within [w​sri,w​eri][ws_r_i,we_r_i]. w​sri≤o​sri¯​(e​n​v)<o​eri¯​(e​n​v)≤w​eri,∀xri¯​(e​n​v)=1ws_r_i≤ os_r_i(env)< oe_r_i(env)≤ we_r_i,∀ x_r_i(env)=1 (6) Constraint (7) specifies that the duration for each observed rir_i must equal d​uridu_r_i to ensure complete imaging. o​eri¯​(e​n​v)−o​sri¯​(e​n​v)=d​uri,∀xri¯​(e​n​v)=1 oe_r_i(env)- os_r_i(env)=du_r_i,∀ x_r_i(env)=1 (7) Constraint (8) ensures that the attitude transition time between two consecutive observations does not exceed the time interval between their OWs. The transition angle Δ​g g between the two attitudes is calculated based on the differences in pitch (γ), roll (η), and yaw (ϑ ) angles, as shown in (9). Furthermore, the transition time can be modeled as a piecewise linear function, as described in (10) [2, 18]. o​eri¯​(e​n​v)+T​r​a​n​s​(a​t​to​eri¯,ri,a​t​to​erj¯,rj) oe_r_i(env)+Trans (att_ oe_r_i,r_i,att_ oe_r_j,r_j ) (8) ≤o​erj¯​(e​n​v),∀yri,rj¯​(e​n​v)=1 ≤ oe_r_j(env),∀ y_r_i,r_j(env)=1 Δg(atto​eri¯,ri, g(att_ oe_r_i,r_i, atto​erj¯,rj)=|γ(atto​eri¯,ri)−γ(atto​erj¯,rj)| att_ oe_r_j,r_j)= γ(att_ oe_r_i,r_i)-γ(att_ oe_r_j,r_j) (9) +|η​(a​t​to​eri¯,ri)−η​(a​t​to​erj¯,rj)| + η(att_ oe_r_i,r_i)-η(att_ oe_r_j,r_j) +|ϑ​(a​t​to​eri¯,ri)−ϑ​(a​t​to​erj¯,rj)| + (att_ oe_r_i,r_i)- (att_ oe_r_j,r_j) T​r​a​n​s​(Δ​g)=a1+Δ​gv1,Δ​g≤θ11a2+Δ​gv2,θ20≤Δ​g≤θ21…an+Δ​gvn,θn​0≤Δ​g≤θn​1 Trans( g)= (10) Here, a, v, and θ are satellite-specific parameters associated with the function T​r​a​n​(⋅)Tran(·). Constraint (11) prohibits any rir_i from having a sequential relationship with itself. yri,ri¯​(e​n​v)=0, y_r_i,r_i(env)=0, (11) To address the issue of representing the first and last requests in the observation sequence within the decision variable y¯​(e​n​v) y(env), dummy requests are introduced as per [31]. The corresponding constraints on the dummy requests are specified as (12). xr0¯​(e​n​v)=xr|R|+1¯​(e​n​v)=1 x_r_0(env)= x_r_ R +1(env)=1 (12) Constraints (13) represents that if rir_i is observed, it must have exactly one successor and one predecessor request. xri¯​(e​n​v) x_r_i(env) =∑j=1|R|yri,rj¯​(e​n​v) = _j=1 R y_r_i,r_j(env) (13) =∑i=1|R|yri,rj¯​(e​n​v),∀xri¯​(e​n​v)=1 = _i=1 R y_r_i,r_j(env),∀ x_r_i(env)=1 Constraint (14) ensures that if rjr_j is observed immediately after rir_i, both requests must be included in the schedule. xri¯​(e​n​v)=1∧xrj¯​(e​n​v)=1,∀yri,rj¯​(e​n​v)=1 x_r_i(env)=1 x_r_j(env)=1,∀ y_r_i,r_j(env)=1 (14) The DFJ constraint proposed by Dantzig et al. in 1954 is an effective method of avoiding subloops on the path [32], which can be expressed by (15). ∑ri∈S∑rj∈Syri,rj¯​(e​n​v)≤|S|−1, _r_i∈ S _r_j∈ S y_r_i,r_j(env)≤ S -1, (15) ∀S⊆1,2,⋯,|R|,S≠∅ ∀ S \1,2,·s, R \,S≠ Constraint (16) is the domain of decision variables, which is shown as (16). xri¯​(e​n​v)∈0,1,yri,rj¯​(e​n​v)∈0,1, x_r_i(env)∈\0,1\, y_r_i,r_j(env)∈\0,1\, (16) o​sri¯​(e​n​v)∈R+,o​eri¯​(e​n​v)∈R+ os_r_i(env)∈ R^+, oe_r_i(env)∈ R^+ I-B Related Work The AEOSSP has arisen from advances in satellite technology, notably the increased maneuverability of agile satellites, and has been shown to be NP-hard [33]. Since exact methods are only suitable for small-scale instances, heuristic [34, 35] and metaheuristic [2, 36] approaches have predominantly been employed to address AEOSSP. Most studies focus on static and deterministic AEOSSP models, which oversimplify real-world conditions. Simultaneously, the associated solution methodologies encounter significant challenges when addressing stochastic variations in the environment [10]. The growing demand for autonomous scheduling capabilities imposes more stringent requirements on contemporary research in the field of AEOSSP [5], with a primary focus on enabling satellites to effectively manage the uncertainties inherent in the scheduling process. Incorporating uncertainty fundamentally enhances the relevance of research on AEOSSP to the requirements of aerospace engineering. In recent years, the investigation of AEOSSP models and solution methodologies that incorporate uncertainty has garnered significant scholarly attention. Several studies have examined the impact of cloud cover uncertainty on satellite observations [7, 8, 9]. The dynamic nature of demand arrivals and cancellations, a practical challenge in satellite scheduling, has also emerged as a prominent research area [37, 3, 38]. Furthermore, some research has addressed uncertainties related to resource consumption [39] and emergency events [40] within the context of satellite scheduling. Nevertheless, existing studies predominantly focus on single uncertainty, with limited integration of multiple uncertainties. To more effectively address the AEOSSP amid uncertainties, considerable research has focused on scheduling methodologies with rapid-response capabilities. These methodologies are primarily categorized into rescheduling and policy-based scheduling approaches. Rescheduling approaches predominantly focus on managing unforeseen occurrences, such as the arrival of new demand, and may involve either creating an entirely new schedule or modifying the existing solutions [41, 42]. The drawback of rescheduling approaches lies in their strong dependence on the initial plan and incur high computational costs, leading to slow responses to frequently occurring uncertainties. Policy-based approaches provide scheduling policies that facilitate the generation of schedules rather than directly optimizing them. These policies can guide constructive methods to produce solutions, often derived from handcrafted heuristic rules [14] or ML algorithms [18, 43]. Although the network-based policies provide AEOS with enhanced autonomous scheduling functionalities, their applicability is constrained by limited interpretability, often characterized as a “black box” [20, 23]. In many engineering fields, especially in the aerospace sector, maintaining users’ trust and confidence in machine systems is of paramount importance. Therefore, it is essential to develop autonomous scheduling capabilities for satellites that incorporate policies ensuring both operational effectiveness and interpretability, thereby providing reliable support [13]. To ensure the interpretability of the scheduling policies, GPHH has been extensively applied to scheduling optimization over the past decade. It can generate interpretable policies with mathematical or logical expressions [44]. Recent developments in GPHH have primarily focused on designing algorithmic architectures and operators to improve both the efficiency of the evolutionary process and the quality of the resulting evolved policies. Partial techniques are listed as follows: multi-objective evaluation to regulate policy complexity [13, 45, 30]; transfer learning to enhance the initial population’s performance [26, 3]; ensemble learning to increase decision robustness [46, 47]; and niching to maintain population diversity [27, 48]. However, to our knowledge, there remains a notable lack of systematic research focused on evaluation, despite its indispensable role in the evolutionary process of GPHH. Evaluation is a crucial aspect of GPHH that warrants further investigation. In scheduling optimization, training time is often prolonged due to inefficient evaluation processes [28]. Moreover, different evaluation models can steer the evolutionary process in distinct directions, as variations in the evaluation model may produce different schedules for the same policy. Firstly, the inefficiency in the evaluation process directly impedes the training speed. Therefore, minimizing runtime is crucial for the practical implementation of GPHH [49]. Existing studies mainly accelerate evaluation from three perspectives: fitness inheritance and imitation methods estimate a GP individual’s fitness based on related individuals rather than performing full evaluations [50, 51]; multi-fidelity fitness approximation methods can balance fidelity and computational expense by employing multi-fidelity simulations and surrogate models to expedite the search for optimal solutions, ultimately yielding acceptable results [28, 52, 53]; and parallel and distributed computing techniques have been utilized to alleviate computational burdens [54]. Besides parallel and distributed computing techniques, other methods mainly employ approximate evaluation for GP individuals to mitigate the high computational cost associated with exact evaluation. Approximate evaluation models are widely utilized, and they simplify the evaluation process by leveraging exact evaluation models to generate approximate results, thereby accelerating training [55]. Nonetheless, discrepancies exist between approximate and exact models, so the algorithm’s performance is highly contingent upon the design of the approximate model [28]. The fitness values derived from approximate evaluation differ from those obtained through exact evaluation, potentially causing inconsistencies in the identification of local and global optima. Currently, there is still limited research addressing the trade-offs and integration between approximate and precise evaluations. Most existing studies primarily concentrate on incorporating approximate models to accelerate training, provided that the reduction in result quality remains within acceptable bounds. In summary, we introduce an AEOSSP model that better reflects real-world conditions by incorporating multiple uncertainties, and we solve it using the GPHH method. Given the high computational cost of GPHH and its dependence on evaluations, this study innovatively designs a technique that integrates both exact and approximate evaluations to achieve a better balance between computational efficiency and evolutionary quality. I Genetic Programming Hyper-Heuristic for UAEOSSP This section introduces a basic GPHH framework for solving UAEOSSP, and the specific designs of each component are provided. I-A The Overall Framework of GPHH Algorithm 1 illustrates the architecture of the GPHH used to solve the UAEOSSP. Initially, a population of N GP individuals, each representing a scheduling policy, is randomly generated. The evaluation utilizes a constructive method grounded in MDP, allowing the satellite to progressively generate a schedule in accordance with the scheduling policy. Based on GP individuals’ fitness values, the GP population undergoes iterative refinement using genetic operators to continuously improve the effectiveness of scheduling policies. Once the stopping criteria are met, the evolutionary process terminates and returns the highest-performing scheduling policy. Specifically, the constructive method employed for evaluation must be suitable for the UAEOSSP, thereby providing an objective and effective basis. For any scheduling scenarios, this policy-driven, timeline-based MDP should be capable of producing feasible schedules, ensuring that all constraints are satisfied. Algorithm 1 The Overall Framework of GPHH 1:The number of individuals in the GP population N, the Maximum number of iterations T, a training dataset t​r​a​i​n​S​e​ttrainSet 2:The optimal policy s​p∗sp^* 3:p​o​p←Init​(N)pop (N); \\ Initialize a GP population. (See Section I-B) 4:b​e​s​t​G​e​n←∅,b​e​s​t​F​i​t←−∞bestGen← ,bestFit←-∞; 5:for t←1t← 1 to T do 6: f​i​t​s←Evaluation​(p​o​p,t​r​a​i​n​S​e​t)fits (pop,trainSet); \\ Evaluate the current population using training instances. (See Section I-C) 7: \\ Genetic Operators is applied to generate offspring. (See Section I-D) 8: p​o​p′←Selection​(p​o​p,f​i​t​s)pop (pop,fits); 9: p​o​p′←Crossover​(p​o​p′)pop (pop ); 10: p​o​p′←Mutation​(p​o​p′)pop (pop ); 11: p​o​p←Reproduction​(p​o​p,p​o​p′)pop (pop,pop ); 12: Update s​p∗sp^* in p​o​ppop; 13:end for 14:return s​p∗sp^*; I-B Individual Representation and Initialization GP individuals represent heuristic scheduling policies that guide decision-making. Each GP individual is encoded as a tree structure comprising function nodes and terminal nodes, which can be translated into a mathematical expression. For instance, the scheduling policy illustrated in Fig. 1 can be expressed as |R​P−R​R|+max⁡(C​T,5.20) RP-R + (CT,5.20). Within the dynamic decision-making process, the scheduling policy is applied prior to each decision to compute heuristic values for candidate requests, which serve as the basis for selecting the optimal request r∗r^*. The initial population is generated using Koza’s half-and-half method [56], in which individuals are created with equal probability via either the full or grow method. Figure 1: An example of the tree representation of a scheduling policy. I-C A Constructive Method for Fitness Evaluation The policy-driven, timeline-based MDP model proposed herein is named the Online Scheduling Algorithm (OSA). OSA must be capable of generating feasible schedules based on GP individuals (i.e., scheduling strategies) within specific environmental contexts to evaluate their effectiveness. For the AEOSSP with time-dependent transition time, some researchers have modeled it as an MDP [3, 11, 17]. Given that the UAEOSSP incorporates unpredictable information, decision points are not predetermined [57]. Furthermore, the uncertainty associated with specific parameters results in indeterminate state transitions, thereby rendering the UAEOSSP effectively a partially observable MDP [58]. A request that is still feasible is defined as a candidate request. Each candidate request has an earliest start time of OW, and can be observed from this time until the end of its VTW. In the MDP of the UAEOSSP, the action is to select a candidate request according to the scheduling policy s​psp at each step, determine its earliest OW, and execute the observation at that time to reach the next state. Each request has a corresponding actual profit, which serves as the reward the satellite receives after completing each action. The objective of this paper is to identify a scheduling policy that performs robustly on average across diverse environmental conditions. Accordingly, the fitness of s​psp is defined as the mean total profit obtained over multiple scenarios, as formalized in (17). F​i​t​(s​p)=∑e​n​v∈EO​S​A​(s​p,e​n​v)|E|Fit(sp)= _env∈ EOSA(sp,env) E (17) Here, O​S​A​(s​p,e​n​v)OSA(sp,env) represents the total profit obtained by policy s​psp in the scheduling scenario under environment e​n​venv, and E denotes a set of distinct environments. Algorithm 2 provides the specific implementation process for OSA. During the execution of OSA, the satellite’s state is classified as either active (working) or inactive (idle). The satellite is deemed idle immediately following initialization or upon completion of a prior observation. Decision points arise exclusively when the satellite is idle, at which juncture the scheduling policy must determine the optimal action based on the current information. Once a decision is made under the s​psp, the satellite transitions to the active state. A new decision point is triggered each time the satellite re-enters the idle state. Initially, all requests are incorporated into the candidate request pool U. Prior to each decision, the feasibility of requests within U is verified, and their earliest OW start times are computed. The heuristic value of each ri∈Ur_i∈ U is calculated using s​psp, and the one with the optimal heuristic value is selected for execution. Scheduling terminates and returns the schedule and total profit if U=∅U= or if the onboard memory is insufficient. Otherwise, the satellite iteratively recalculates heuristic values for candidate requests and makes decisions accordingly. Notably, the proposed OSA exhibits inherent adaptability to dynamic scenarios. Newly arriving requests can be directly appended to U, while canceled requests can be removed, followed by appropriate schedule adjustments. This renders the constructive method is readily extendable to dynamic scheduling problems. Algorithm 2 The Online Scheduling Algorithm for Solving UAEOSSP 1:A scheduling policy s​s​(⋅)s(·), a request set R, a scheduling scenario e​n​venv, the maximum memory capacity m​m​cmmc, the actual profit p​(⋅)p(·), the actual write code rate c​r​(⋅)cr(·), the imaging duration d​u​r​(⋅)dur(·) 2:A feasible schedule s​o​l=(u,o​su,o​eu)sol=\(u,os_u,oe_u)\, fitness of the given policy f​i​tfit 3:s​o​l←∅sol← , f​i​t←0fit← 0, t←0t← 0; 4:U←RU← R; \\ U is the pool of unobserved requests 5:while U≠∅U≠ do 6: U′,O​Su←F​i​l​t​e​r​(U,t,s​o​l)U ,OS_u← Filter(U,t,sol); \\ Update U to ensure the feasibility of candidate requests, and the OW start time for each is calculated. (See Section IV) 7: u∗←arg⁡maxu∈U′⁡s​s​(u,F)u^*← _u∈ U \s(u,F)\; \\ u∗u^* is the selected request with optimal heuristic value 8: m​m​c←m​m​c−d​u​r​(u∗)×c​r​(u∗)mmc← mmc-dur(u^*)× cr(u^*); 9: if m​m​c<0mmc<0 then 10: break; \\ The scheduling terminates due to insufficient memory capacity. 11: end if 12: t←O​Su​(u∗)+d​u​r​(u∗)t← OS_u(u^*)+dur(u^*); 13: s​o​l←s​o​l∪u∗,O​Su​(u∗),tsol← sol∪\u^*,OS_u(u^*),t\, f​i​t←f​i​t+p​(u∗)fit← fit+p(u^*); 14: U←U′∖u∗U← U u^*; 15:end while 16:return s​o​lsol, f​i​tfit; The principal computational challenge associated with the OSA arises from state updates and policy-based decision-making. The computational complexity inherent in policy-based decision-making is directly proportional to the size of the policy, with more complex policies demanding increased computational resources. Therefore, expediting state update procedures represents a crucial objective for improving the overall efficiency of the OSA. The principal innovation of this study is the introduction of a HE-based OSA (HE-OSA), which facilitates rapid filtering of the U and efficient computation of the earliest OWs for ri∈Ur_i∈ U. HE-OSA encompasses both exact and approximate evaluation models, each utilizing two distinct filtering modes: exact filtering and approximate filtering. The methodologies for approximate and exact filtering are described in Sections IV-B and IV-A, respectively. In contrast to employing a single filtering mode or allocating iterations to each mode through a two-stage process, this study proposes an adaptive switching mechanism that leverages evolutionary state information, as detailed in Section IV-C. The objective of UAEOSSP requires multiple training instances for evaluation. However, directly assessing GP individuals on the entire training set incurs prohibitive computational costs. To mitigate this expense, a mini-batch rotation mechanism [59] is employed, wherein each iteration evaluates fitness using only one mini-batch. A round-robin sampling strategy is adopted to sequentially select mini-batches, ensuring systematic and balanced utilization of training instances throughout the evolutionary process. I-D Genetic Operators The genetic operators of the EH-GP include selection, crossover, mutation, and population reproduction: I-D1 Selection The tournament selection operator with a tournament size of Ts​i​z​eT_size is employed. Ts​i​z​eT_size individuals are randomly sampled without replacement from the parent population, and the individual with the optimal fitness is selected to enter the offspring population. I-D2 Crossover The single-point crossover operator is utilized. Two individuals are randomly selected from the offspring population, and crossover points are chosen from their non-leaf nodes. The subtrees rooted at these crossover points are exchanged to generate two new offspring individuals, which replace the original selected individuals. I-D3 Mutation The uniform mutation operator is applied to introduce genetic diversity into the population. GP individuals are selected for mutation with a preset mutation probability. For the selected individual, a random node is chosen from its tree structure, and a new subtree generated by the half-and-half method replaces the selected node. I-D4 Reproduction At the end of each iteration, the parent population and the newly generated offspring population are combined. The tournament selection is reimplemented on the combined population to screen individuals with excellent fitness, which constitute the next-generation population, and maintain the same scale of the GP population during evolution. IV Hybrid Evaluation Mechanism Building upon the framework outlined in Section I, this section proposes a hybrid evaluation (HE) mechanism designed to enhance the performance of the OSA used for evaluation. The HE mechanism offers a novel evaluation strategy for GP individuals, enabling adaptive switching between exact and approximate evaluation models. Different evaluation models correspond to distinct filtering modes, which also account for differences among evaluation models under HE-GP. OSA is a constructive-based MDP model, which is also widely used for evaluation in scheduling optimization [3, 15, 16]. During the autonomous scheduling process, state updates and policy-based decision-making are indispensable (see Section I-C). The decision logic employed is deterministic, meticulously calculating heuristic values for all candidate requests based on a given scheduling policy and selecting the optimal one. The filtering in the state updates is also heuristic due to the diversity of OWs. Beyond ensuring that any schedule meets the constraints, the agility of AEOS extends the VTWs, making the OW for each candidate request non-unique. The state update requires removing unobservable requests from U and recalculating the OW for each ri∈Ur_i∈ U. Therefore, the distinction between different filtering modes lies in the selection of OWs for candidate requests. In contrast to vehicle routing planning problems, where path information between request points can be predetermined [60], the transition time between any two requests cannot be predetermined. Consequently, the OWs of candidate requests are neither fixed nor rapidly computable, as illustrated in Fig. 2. The frequent necessity to verify constraints and update OWs imposes substantial computational overhead during evaluation, directly affecting training efficiency. Hence, developing efficient procedures for removing infeasible requests from U and updating OWs is essential to mitigate costly evaluation overhead. Figure 2: An example of the AEOS attitude transition is provided. The duration necessary for the attitude transition between two identical candidate requests exhibits variability. Prior studies have typically selected the earliest O​WriOW_r_i that meets the constraints to observe rir_i [38, 31]. This approach can be considered a form of exact filtering, which determines whether to remove candidate requests based on the occurrence of the earliest OW. However, this commonly used filtering mode design in the past still has certain shortcomings: 1. The frequent computation of earliest OWs significantly increases the computational load, whereas pruning techniques can effectively prevent unnecessary resource expenditure. For example, the maximum transition time m​T​r​a​n​smTrans can be pre-estimated based on VTW information, thereby avoiding updates for candidate requests with later OW start times. 2. The exact determination of OWs depends on heuristic strategies that do not guarantee optimal solutions when combined with specific scheduling policies. Different OW selection heuristics can produce varying performance outcomes for the same policies and may guide the algorithm toward different local optima. Therefore, incorporating a hybrid evaluation mechanism can promote the exploration of diverse local optima, thereby improving the global search capability. Based on the above considerations, the main contribution of the proposed HE mechanism is the improvement of the traditional evaluation approach, which relies on a fixed exact model. It introduces a more efficient approximate filtering mode and possesses the capability to adaptively switch between filtering modes based on the current evolutionary state information. The two meticulously designed modes incorporate multiple check modules to verify the feasibility of candidate requests. The key difference between the exact evaluation model and the approximate evaluation model lies in how U is updated. IV-A Exact Filtering Mode Fig. 3 illustrates the flowchart of exact filtering mode utilized in the HE mechanism. Figure 3: Flowchart of the exact filtering mode. The exact filtering mode employed in the exact evaluation model comprises four check modules, each tasked with conducting specific constraint checks or making pruning decisions. These four modules are described as follows: (1) Pruning Judgment Check w​sri≥tn​o​w+m​T​r​a​n​sws_r_i≥ t_now+mTrans (18) Here, w​sriws_r_i denotes the start time of the VTW for request rir_i, tn​o​wt_now is the current time, and m​T​r​a​n​smTrans represents the maximum transition time, which depends solely on the satellite’s maneuvering capabilities and the spatial distribution of requests. Candidate requests in U are sorted by w​sriws_r_i. Satisfaction of this condition implies that all subsequent requests in U do not require recomputation of OWs. (2) Complete Imaging Check w​eri<tn​o​w+d​u​rriwe_r_i<t_now+dur_r_i (19) Here, w​eriwe_r_i is the end time of the VTW for request rir_i, and d​u​rridur_r_i is the required observation duration. If this inequality holds, request rir_i cannot be fully imaged, enabling rapid exclusion of requests that cannot be completely observed in a single operation. Given that satellites typically execute requests in a jump-observation mode, many candidate requests between adjacent observations are unobservable; this module efficiently identifies and removes them. (3) Memory Check d​u​rri×c​r>m​m​cn​o​wdur_r_i× cr>mmc_now (20) Here, c​rcr is the write code rate during imaging and m​m​cn​o​wmmc_now denotes the current available memory. If this condition is met, request rir_i violates memory constraints and is discarded. (4) Exact Earliest Observation Window Check The existence of a feasible OW is a necessary condition for the executability of a candidate request. This check employs a two-stage binary search algorithm to precisely compute the earliest OW for candidate requests. If no OW satisfies the constraints for a given request, it is removed from U. The proposed two-stage binary search algorithm is designed to efficiently and accurately calculate the earliest OW [o​sri,o​eri][os_r_i,oe_r_i]. The theoretical foundation of this search algorithm is based on the time-delay monotonicity property of AEOS attitude transitions [31]. Specifically, if an earliest OW exists for a request, there exists a unique time point at which the satellite completes the attitude transition at o​srios_r_i. Algorithm 3 delineates the procedure for identifying the earliest OW start time of candidate request rir_i. Initially, the left pointer l​plp is set to w​sriws_r_i and the right pointer r​prp to w​eriwe_r_i. The first stage identifies the right endpoint r′r such that any time within the interval [l,r′][l,r ] guarantees complete observation of request rir_i. The second stage locates the earliest o​srios_r_i, ensuring that the earliest OW complies with all constraints. If the pointers l and r coincide, this point corresponds to the earliest o​srios_r_i; otherwise, the satellite cannot observe request rir_i. Algorithm 3 The two-stage binary search algorithm 1:the current moment t, a request id i, the current attitude angles a​t​t​N​o​wattNow, the start time of VTW w​sws, the end time of VTW w​ewe, the time required for imaging d​u​rdur, the desired precision p​r​epre 2:the earliest start time of OW e​s​test 3:e​s​t←−1est←-1; \\ If the returned e​s​t=−1est=-1, it indicates that rir_i does not have a feasible OW. 4:l​p←w​s​(i)lp← ws(i), r​p←w​e​(i)rp← we(i); 5:while l​p≤r​plp≤ rp do 6: m​i​d←(l​p+r​p)/2mid←(lp+rp)/2; 7: if m​i​d+d​u​r​(i)≤w​e​(i)mid+dur(i)≤ we(i) then 8: a​t​t​N​e​x←GetAttitude​(i,m​i​d)attNex (i,mid); \\ Get the satellite attitude angle required to observe rir_i at time m​i​dmid 9: if m​i​d≥t+Trans​(a​t​t​N​o​w,a​t​t​N​e​x)mid≥ t+Trans(attNow,attNex) then 10: r​p←m​i​d−p​r​erp← mid-pre; 11: e​s​t←m​i​dest← mid; 12: else 13: l​p←m​i​d+p​r​elp← mid+pre; 14: end if 15: else 16: r​p←m​i​d−p​r​erp← mid-pre; 17: end if 18:end while 19:return e​s​test; Given that the width of the VTW for rir_i is w​wriww_r_i, the time complexity of the search algorithm is O​(log⁡w​wri)O( w_r_i), representing a substantial improvement over the brute-force search complexity of O​(w​wri)O(w_r_i). IV-B Approximate Filtering Mode Fig. 4 illustrates the flowchart of approximate filtering mode involved in the HE mechanism. Figure 4: Flowchart of the approximate filtering mode. Obviously, the approximate filtering mode does not include the memory check module, while a novel verification module is employed to compute OWs. The specific design of this module is as follows: (5) Approximate Earliest Observation Window Check In the context of UAEOSSP, the VTWs of requests are deterministic and known a priori. Regardless of the order of the requests, the maximum transition time m​t​t​(ri,rj)mtt(r_i,r_j) for each request pair (ri,rj)(r_i,r_j) can be preprocessed. This value is calculated based on the maximum angular difference within the VTWs of the two requests, as follows: m​t​t​(ri,rj)=m​t​t​(rj,ri)= mtt(r_i,r_j)=mtt(r_j,r_i)= (21) T​r​a​n​(maxtri,trj⁡∇g​(a​t​tri,tri,a​t​trj,trj)), Tran ( _t_r_i,t_r_j \∇ g (att_r_i,t_r_i,att_r_j,t_r_j ) \ ), ∀tri∈[w​sri,w​eri],∀trj∈[w​srj,w​erj] ∀ t_r_i∈ [ws_r_i,we_r_i ],∀ t_r_j∈ [ws_r_j,we_r_j ] After completing the preceding request rp​r​er_pre, the OW start time o​srios_r_i for request rir_i is updated as: o​sri= os_r_i= max⁡(w​sri,tn​o​w+m​a​x​T​r​a​n​(rp​r​e,ri)), (ws_r_i,t_now+maxTran(r_pre,r_i) ), (22) ∀ri∈U ∀ r_i∈ U If the updated OW start time satisfies the following inequality: o​sri+d​u​rri≤w​erios_r_i+dur_r_i≤ we_r_i (23) then request rir_i is retained; otherwise, it is removed from U due to constraint violation. Given that any m​t​t​(ri,rj)mtt(r_i,r_j) can be preprocessed in advance, the time complexity of the verification module is O​(1)O(1). This represents a significant improvement compared to the O​(log⁡w​wri)O( w_r_i) time complexity required to calculate the exact earliest OW. IV-C Adaptive Switching Methodology To quantify the evolutionary status of the algorithm, two indicators are introduced: the evolutionary stage factor f​a​ce​sfac_es and the population diversity factor f​a​cp​dfac_pd. f​a​ce​sfac_es measures evolutionary progress based on the current generation g relative to the total number of generations G. f​a​cp​dfac_pd assesses population diversity by evaluating the uniqueness of fitness values within the population. Define a mapping Nb:μ→ℕN_b:μ that counts the occurrences of elements, where Nb​(x)N_b(x) denotes the frequency of element x in set b. For the population p​o​ppop with corresponding fitness set f​i​t​sfits, let the deduplicated fitness set be f​i​t​s^=f∈f​i​t​s∣Nf​i​t​s​(f)≥1 fits=\f∈ fits N_fits(f)≥ 1\. The two factors are computed as follows: f​a​ce​s=gG,f​a​cp​d=|f​i​t​s^||f​i​t​s|fac_es= gG,fac_pd= | fits||fits| (24) The HE mechanism relies on the two aforementioned factors to regulate the likelihood of employing either exact or approximate filtering modes. Fundamentally, its rationale is that as the evolutionary process progresses or population diversity decreases, the need for exact evaluation increases. During the initial phases of evolution, the focus is on global exploration by generating diverse strategies aimed at covering potentially optimal regions. At this stage, the primary evaluation criterion prioritizes efficiency over accuracy. Conversely, in the middle to later stages of evolution, as the population’s overall fitness improves, the algorithm should shift toward local exploitation. This involves the exact evaluation of high-quality policies to more effectively differentiate their fitness values. When population diversity declines, meaning that most individuals have similar fitness values, exact evaluation model should be employed to provide accurate fitness feedback. The fitness values of GP individuals serve as phenotypic indicators reflecting similarity among individuals. Accordingly, the probability of employing the exact filtering mode, denoted Pe​x​a​c​tP_exact, is defined as follows: Pe​x​a​c​t=φe​s×f​a​c​t​o​re​s+φp​d×(1−f​a​c​t​o​rp​d)P_exact= _es× factor_es+ _pd×(1-factor_pd) (25) where φe​s _es and φp​d _pd are weighting coefficients satisfying φe​s+φp​d=1 _es+ _pd=1. Let ρ be a random variable uniformly distributed over [0,1][0,1]. The evolutionary indicator function is then defined as follows: ​(g)=1,if ​ρ<Pexact0,otherwiseI(g)= cases1,&if ρ<P_exact\\ 0,&otherwise cases (26) For any scheduling policy s​psp, the fitness evaluation function is modified as follows: F​i​t​(s​p,g)=1|E|​∑e​n​v∈EO​S​A​(s​p,e​n​v,​(g))Fit(sp,g)= 1|E| _env∈ EOSA(sp,env,I(g)) (27) The online scheduling function O​S​A​(⋅)OSA(·) is specified by: O​S​A​(s​p,e​n​v,)=O​S​Aa​p​p​r​o​(s​p,e​n​v), if ​=0O​S​Ae​x​a​c​t​(s​p,e​n​v), if ​=1OSA(sp,env,I)= casesOSA_appro(sp,env), if I=0\\ OSA_exact(sp,env), if I=1& cases (28) Here, O​S​Aa​p​p​r​o​(⋅)OSA_appro(·) and O​S​Ae​x​a​c​t​(⋅)OSA_exact(·) denote the approximate and exact evaluation models, respectively. The evaluation function can be further expressed as follows: Fit(sp,g)=1|E|∑e​n​v∈E[(1−(g))⋅ Fit(sp,g)= 1|E| _env∈ E[ (1-I(g) )· (29) OSAa​p​p​r​o(sp,env)+(g)⋅OSAe​x​a​c​t(sp,env)] OSA_appro(sp,env)+I(g)· OSA_exact(sp,env)] V Experimental Design Instance generation was conducted on a laptop equipped with a 13th-generation Intel Core i9-13980HX processor operating at 2.20 GHz and 15.6 GB of RAM. The satellite scheduling simulations and instance generation were performed using MATLAB R2020b with the Satellite Toolkit (STK) version 12. All algorithms were developed in Python and executed on a workstation with a 32-core Intel Xeon Gold 6459C processor. V-A Instance Generation This study addresses the UAEOSSP, in which request profit, visibility, and imaging data write rate are modeled as stochastic variables. An instance set comprises problem instances that share identical deterministic parameters but differ in realizations of random variables, representing varying environmental conditions. Due to the lack of established benchmarks, the STK is used to simulate 16 distinct AEOSSP scenarios. For each scenario, random parameters are sampled from specified distributions to generate corresponding UAEOSSP instances, with a scheduling horizon spanning from 00:00:00 to 24:00:00 UTC on September 1, 2024. The satellite configurations utilized in the scenarios are based on prior works [38, 31]. The satellite’s spatial position is characterized by six classical orbital elements: semi-major axis (a), eccentricity (e), inclination (i), argument of perigee (ω), right ascension of the ascending node (R​A​A​NRAAN), and mean anomaly (m). Table I presents the initial orbital parameters of the AEOS. Attitude constraints restrict the pitch and roll angles within the interval [−27∘,27∘][-27 ,27 ]. The imaging data write rate is fixed at 3.5 GB/s. Parameters governing the attitude transition time function T​r​a​n​()Tran() are configured as detailed in Table I. TABLE I: The Orbital parameters of the AEOS Parameter α​(m)α(m) e(∘)e( ) i(∘)i( ) ω(∘)ω( ) RAAN(∘)RAAN( ) m(∘)m( ) Value 68781376878137 0.000.00 0.000.00 0.000.00 360.00360.00 360.00360.00 TABLE I: The parameters of the function T​r​a​n​()Tran() Segment αi _i viv_i θi​0 _i0 θi​1 _i1 1 5 1 0 15 2 10 2 15 40 3 16 2.5 40 90 4 22 3 90 - The number of requests ranges from 50 to 200 in increments of 50, consistent with prior single-satellite scheduling studies [15, 16]. m​m​cmmc is set to 2048 GB for small-scale scenarios (50 or 100 requests) and 4096 GB for larger instances (150 or 200 requests). Unlike previous studies that randomly generate coordinates within geographic regions [2], this study employs a ground-track-based method to ensure all requests possess valid visibility windows. The scheduling time interval [0,S​T][0,ST] uses S​T∈3600,7200ST∈\3600,7200\ seconds, where request coordinates are generated by randomly selecting points along the satellite trajectory within [0,S​T][0,ST] and applying perturbations in the range [−2∘,2∘][-2 ,2 ]. The imaging duration d​u​rridur_r_i follows a normal distribution N​(25,9)N(25,9), and the request profit is modeled as N​(2×d​u​rri,100)N(2× dur_r_i,100), following [38, 31]. Uncertain parameters are modeled using probability distributions. Following [59], request profit and write code rate are modeled using gamma distributions G​m​(α,β)Gm(α,β), which are commonly employed for non-negative random variables. The expected value and variance are α/βα/β and α/β2α/β^2, respectively. For fixed expected values (corresponding to deterministic AEOSSP parameters), varying α produces different distribution shapes. Fig. 5 illustrates the probability density functions for a fixed expectation of 20 with varying α. Notably, as α→∞α→∞, the gamma distribution converges to a Gaussian distribution. In this study, αp=30 _p=30 and αc​r=350 _cr=350 are used to simulate realistic variations in profit and write code rate, with αc​r _cr reflecting the relatively stable write code rate observed during satellite operations. Request visibility uncertainty is modeled using cloud coverage probability pc​c∈0.15,0.30p_c∈\0.15,0.30\. Each scheduling scenario generates 50 test instances and 100 training instances. GP evolution utilizes only the training set, while performance is assessed on the test set. Concurrently, the exact evaluation model is employed throughout the testing phase to assess scheduling performance, thereby guaranteeing fairness. Figure 5: Probability density function curves are presented with a fixed mean of 20 and varying values of the parameter α. α takes values from 20 to 100 in increments of 20. It is evident that the larger the α, the steeper the curve. Based on the described instance design, 16 scheduling scenarios were constructed. Instance sets are named to reflect their parameter configurations; for example, the instance set “50_36_20_0.15” denotes a scenario with 50 requests, a request distribution time range of [0,3600][0,3600] seconds, a maximum onboard memory of 2048 GB, and a cloud coverage probability of 0.15. V-B Parameter Settings Table I enumerates the feature terminals employed, categorized into profit, memory, and time groups, collectively constituting the terminal set for the HE-GP. To mitigate unit inconsistencies among features, all terminals are normalized to the interval [0,1][0,1] prior to input into the scheduling policy. The function set comprises +,−,×,÷,max,min,abs\+,-,×, , , ,abs\, with all functions safeguarded against computational errors. Specifically, division is implemented as protected division, returning 1 when the denominator is zero. TABLE I: The terminal set developed for UAEOSSP consists of three distinct categories of features. Category Terminal Description Profit R​PRP Real profit. This feature is normalized using the min-max scaling method prior to application [61]. R​P​P​URPPU Real profit per unit of time. This feature is normalized via min-max scaling method. Memory E​M​CEMC Expected memory consumption. This feature is normalized via min-max scaling method. E​M​U​REMUR Expected memory usage ratio. This feature is computed as the expected memory consumption divided by the remaining onboard memory. R​M​PRMP Remaining memory percentage. This feature is calculated as the current remaining memory divided by the m​m​cmmc. Time C​TCT Current time. This feature is calculated by dividing the current time by the total scheduling duration (i.e., S​TST). R​I​S​TRIST Request imaging start time. This feature is defined by the following equation: RIST_r_i = osri- tnow+ cT - tnow+ c tn​o​wt_now is the current time; T is the total scheduling time; o​srios_r_i is the earliest observable time of the request rir_i; c is a small constant. R​R​PRRP Remaining request percentage. This feature is obtained as the number of candidate requests divided by the total number of requests. F​RFR Full ranking. All requests are ordered by the start time of their VTWs from earliest to latest. The feature is computed as the rank of the request divided by the total number of requests. R​R Relative Ranking. Considering the rank of requests within the candidate pool U, candidate requests in U are similarly sorted by the start time of their VTWs. This feature is calculated as the rank of the request divided by the total number of candidate requests. Other C Constant. The values are generated from a uniform random distribution over the interval [−1,1][-1,1]. The GP parameters utilized are detailed in Table IV and align with those commonly adopted in recent related research [24, 3]. Operator selection and population reproduction employ binary tournament selection. During training, a mini-batch rotation mechanism is used: the 100 training instances per scenario are partitioned into 20 mini-batches, each containing 5 instances. TABLE IV: The GP parameter settings for HE-GP. Parameter Value Parameter Value Population size 200 Tournament size 2 Number of iterations 50 Maximum depth 8 Crossover probability 0.80 Initial minimum depth 2 Mutation probability 0.15 Initial maximum depth 6 Furthermore, the hyperparameters f​a​ce​sfac_es and f​a​cp​dfac_pd are also involved in the evaluation process. An increased value of f​a​ce​sfac_es causes the algorithm to prioritize selection based on the degree of evolution, resulting in a higher likelihood of utilizing approximate evaluation model during the early stages of evolution. The sensitivity analysis of these two parameters is detailed in Section VI-A; therefore, specific values are not assigned at this stage. VI Results and Discussion To assess the effectiveness of the proposed HE-GP algorithm in addressing the UAEOSSP, this section presents comparative experiments involving 16 scheduling scenario instances. The benchmark methods include two categories of handcrafted heuristic algorithms. Additionally, two GP-based methods utilizing a single evaluation model are incorporated: the GP algorithm based on exact evaluation (E-GP) and the GP algorithm based on approximate evaluation (AE-GP). Each of the three GP-based approaches was independently executed 10 times across all 16 scheduling scenarios. Three evaluation metrics are introduced: Relative Percentage Deviation (RPD), which intuitively assesses solution quality by quantifying the deviation of each solution from the optimal solution [62], as shown in (30); Relative Gap (RG), which measures the relative difference between two values, as shown in (31); and Average Rank, which represents the average rank of performance across all scenarios. RPD(%)=Cb​e​s​t−Cm​e​t​h​o​dCb​e​s​t×100%RPD(\%)= C_best-C_methodC_best× 100\% (30) RG(%)=CA−CBCA×100%RG(\%)= C_A-C_BC_A× 100\% (31) Here, Cb​e​s​tC_best denotes the best value obtained among all algorithms, and Cm​e​t​h​o​dC_method represents the value achieved by a specific algorithm (for instance, CAC_A corresponds to the value obtained by algorithm A). VI-A Parameter Sensitivity Analysis The novel HE mechanism modulates the preference for various filtering modes through two hyperparameters, f​a​ce​sfac_es and f​a​cp​dfac_pd, which are constrained such that their sum equals one. To assess the effects of these parameters on the performance of HE-GP, a sensitivity analysis was conducted across 16 scheduling scenarios. The HE-GP with varying parameter settings is denoted as HE-GP(face​s(fac_es, facp​d)fac_pd), with both parameters ranging from 0 to 1 in increments of 0.1. Each parameter configuration was independently executed ten times. Fig. 6 presents a heatmap illustrating the frequency with which each (f​a​ce​s,f​a​cp​d)(fac_es,fac_pd) setting surpasses alternative settings in average performance across 16 scenarios, along with its mean ranking within these scenarios. The Wilcoxon rank-sum test confirmed that there are no significant differences in the results across various (face​s(fac_es, facp​d)fac_pd) at a 95% confidence level. The results indicate no significant differences among the parameter settings. Additionally, no single parameter setting consistently outperforms the others in most scenarios. Based on the average rank and heatmap, HE-GP demonstrates enhanced performance when the weights assigned to the two factors are approximately balanced. These findings underscore the efficacy of the HE mechanism’s design, which adaptively adjusts the evaluation model based on the evolutionary stage and population diversity. Figure 6: Heatmap of the frequency with which HE-GP outperforms other settings across 16 scenarios under various (f​a​ce​s,f​a​cp​d)(fac_es,fac_pd) settings. A larger square indicates that the parameter setting corresponding to the row outperforms others in a greater number of scenarios on average. The average rank represents the mean rank across 16 scenarios for each setting. In general, the parameter settings (0.4,0.6)(0.4,0.6), (0.5,0.5)(0.5,0.5), and (0.6,0.4)(0.6,0.4) demonstrate superior average performance compared to other settings. Notably, (0.6,0.4)(0.6,0.4) achieves the highest average rank of 4.25 and outperforms other parameter settings in the majority of evaluated scenarios. Therefore, subsequent experiments will employ the parameter pair (0.8,0.2)(0.8,0.2) to further investigate HE-GP’s performance characteristics. VI-B Scheduling Performance The benchmark algorithms considered in this study comprise Look-Ahead Heuristics (LAHs) [63] and Manually Designed Heuristics (MDHs) [59]. LAHs operate by selecting a request from a set of anticipated future requests at each decision point, guided by a greedy selection rule. The effectiveness of LAHs depends on both the length of the look-ahead horizon and the specific rule used. For instance, with a look-ahead step size of three, the algorithm selects one request from the next three (or fewer) forthcoming requests at each decision step. Notably, when the look-ahead step size is set to one, LAHs consistently select the nearest observable request. In contrast, MDHs are scheduling policies developed from experiential knowledge, are highly interpretable, and can be seamlessly integrated with the proposed OSA to produce schedules. A comprehensive overview of both LAHs and MDHs is provided as follows: • LAH1: Employs a look-ahead length of 1, thereby considering only the candidate request nearest to the satellite at each step. • LAH2: Selects the request with the highest actual profit among look-ahead requests, with the look-ahead length varying between 2 and 20. • LAH3: Chooses the request with the highest ratio of actual profit to imaging time among look-ahead requests, with the look-ahead length ranging from 2 to 20. • MDH1: Computes the value density of candidate requests as follows, prioritizing requests with higher values: M​D​H​1​(ri)=p¯ri​(e​n​v)d​u​rri+T​r​a​n​s​(a​t​trp​r​e,trp​r​e,a​t​to​sri,ri)MDH1(r_i)= p_r_i(env)dur_r_i+Trans(att_r_pre,t_r_pre,att_os_r_i,r_i) Here, the previously fulfilled request and its associated observation end time are denoted as rp​r​er_pre and trp​r​et_r_pre, respectively. The attitude information at that specific time is represented as a​t​trp​r​e,trp​r​eatt_r_pre,t_r_pre. • MDH2: Calculates the interval time from the current moment to the start of observation for each candidate request using: MDH2(ri)=maxTrans(attrp​r​e,trp​r​e,atto​sri,ri), MDH2(r_i)= \Trans (att_r_pre,t_r_pre,att_os_r_i,r_i ), osri−tp​r​e os_r_i-t_pre\ At each decision point, the candidate request with the minimal interval time is selected. This approach is analogous to the nearest-node selection strategy in the Travelling Salesman Problem. • MDH3: Applies MDH1 when m​m​cn​o​wmmc_now is less than half of m​m​cmmc; otherwise, MDH2 is used. In addition to the aforementioned handcrafted heuristic algorithms, two variants of HE-GP were introduced to facilitate a comparative performance analysis. The first variant, Exact Evaluation-based GPHH (E-GP), employs exact evaluation methods, whereas the second, Approximate Evaluation-based GPHH (AE-GP), uses approximate evaluation methods. The efficacy of these novel design algorithms was assessed using their best and average performance metrics from 10 independent experimental runs. Each HE-GP variant and the comparison algorithms were executed on 16 distinct UAEOSSP instance sets, with results systematically recorded. Note that the UAEOSSP dataset is partitioned into a training set of 100 instances and a test set of 50; this comparison focuses exclusively on algorithmic performance on the test set. Table V summarizes the performance outcomes of the various algorithms. Overall, HE-GP achieved the highest average ranking of 1.4375 and identified the optimal policy in 9 scenarios. The AE-GP, which relies solely on the approximate evaluation model, demonstrated inferior overall performance relative to the other two GP-based approaches, and its average performance across all scenarios is worse than HE-GP. Nonetheless, all three GP-based methods surpassed the handcrafted heuristic algorithms across 16 scenarios. Specifically, HE-GP achieved average performance improvements of 4.857% and 12.011% over the best results of the LAHs and MDHSs, respectively. Furthermore, the Wilcoxon rank-sum test was employed to assess the statistical significance of performance differences among the algorithms. The results indicate that, with 95% confidence, the HE-GP demonstrates significantly superior average performance compared to handcrafted heuristic algorithms across 16 scenarios, except for LAH2 and LAH3. Simultaneously, although no substantial differences are observed among the GP-based methods, HE-GP unexpectedly demonstrates superior performance in the average ranking metric and outperforms E-GP in most scenarios. TABLE V: Average performance (standard deviation) (RPD), with bold indicating the optimal average performance in the corresponding scenario. Win/Draw/Lose shows the performance of the comparison algorithm compared to HE-GP. Average rank gives the average ranking of each algorithm’s average performance across different scenarios. Scenario LAHs-Best MDHs-Best E-GP AE-GP HE-GP 50_36_20_0.15 1333.94(-)(4.03%) 1283.52(-)(8.12%) 1384.90(15.59)(0.20%) 1383.12(17.25)(0.33%) 1387.74(12.25)(0.00%) 50_36_20_0.30 1206.97(-)(11.94%) 1293.89(-)(4.42%) 1336.81(14.71)(1.07%) 1334.78(9.45)(1.22%) 1351.09(8.07)(0.00%) 50_72_20_0.15 1353.01(-)(3.96%) 1253.45(-)(12.22%) 1394.88(17.48)(0.84%) 1398.63(15.98)(0.57%) 1406.58(14.32)(0.00%) 50_72_20_0.30 1212.65(-)(12.46%) 1273.00(-)(7.13%) 1363.76(16.98)(0.00%) 1359.69(13.01)(0.30%) 1361.16(14.82)(0.19%) 100_36_20_0.15 1492.58(-)(3.74%) 1301.99(-)(18.92%) 1548.36(21.11)(0.00%) 1542.15(21.09)(0.40%) 1547.80(9.25)(0.04%) 100_36_20_0.30 1471.70(-)(2.11%) 1286.54(-)(16.80%) 1492.52(13.59)(0.68%) 1495.56(15.09)(0.48%) 1502.75(24.94)(0.00%) 100_72_20_0.15 1499.17(-)(4.11%) 1270.18(-)(22.87%) 1560.74(17.01)(0.00%) 1543.39(29.71)(1.12%) 1558.49(13.03)(0.14%) 100_72_20_0.30 1478.57(-)(2.14%) 1252.04(-)(20.62%) 1499.77(26.25)(0.70%) 1496.70(25.14)(0.90%) 1510.23(22.59)(0.00%) 150_36_40_0.15 2646.37(-)(6.87%) 2631.33(-)(7.48%) 2827.78(24.36)(0.01%) 2767.12(74.62)(2.21%) 2828.19(38.85)(0.00%) 150_36_40_0.30 2647.44(-)(3.93%) 2571.17(-)(7.02%) 2751.63(13.94)(0.00%) 2662.42(46.69)(3.35%) 2731.55(25.98)(0.73%) 150_72_40_0.15 2743.25(-)(4.25%) 2456.76(-)(16.41%) 2845.46(48.67)(0.51%) 2836.35(41.64)(0.82%) 2859.87(34.19)(0.00%) 150_72_40_0.30 2636.81(-)(5.23%) 2417.71(-)(14.76%) 2763.08(56.68)(0.42%) 2766.40(36.60)(0.30%) 2774.66(34.56)(0.00%) 200_36_40_0.15 2880.63(-)(4.34%) 2809.14(-)(6.99%) 3000.01(45.43)(0.19%) 2927.36(42.83)(2.67%) 3005.64(23.70)(0.00%) 200_36_40_0.30 2851.83(-)(2.42%) 2721.55(-)(7.33%) 2921.01(34.26)(0.00%) 2844.34(73.92)(2.69%) 2919.43(30.54)(0.05%) 200_72_40_0.15 2874.93(-)(5.46%) 2594.81(-)(16.84%) 3031.89(29.21)(0.00%) 3003.36(44.82)(0.95%) 3018.35(23.59)(0.45%) 200_72_40_0.30 2857.73(-)(2.55%) 2568.64(-)(14.09%) 2930.58(37.31)(0.00%) 2882.61(73.74)(1.66%) 2926.25(41.33)(0.15%) Win/Draw/Lose 0/0/16 0/0/16 7/0/9 0/0/16 N/A Average Rank 4.0625 4.8750 1.7500 2.8750 1.4375 The HE mechanism proposed in this study aims to reduce evaluation overhead while improving the algorithm’s search efficiency by adaptively switching between exact and approximate evaluation models. These models employ distinct filtering modes, which may result in variations in the schedules generated by an identical policy, thus affecting the policy’s fitness. This design essentially perturbs the evolutionary search process by introducing evaluation noise, which can enhance exploration capabilities during the early stages of the algorithm or when population diversity is low. This study investigates this phenomenon by analyzing the evolutionary process of GP-based methods. Fig. 7 displays the performance of the optimal policies obtained by the three GP-based methods on the test set throughout the evolutionary process. All methods utilize the exact evaluation model for testing to ensure a fair comparison. The evolutionary trajectories indicate that HE-GP is more adept at escaping local optima during evolution (except <150_72_40_0.15>, <200_36_40_0.15>), as evidenced by continuous improvements in the best policy rather than stagnation. For instance, in scenario <100_36_20_0.30>, both E-GP and AE-GP exhibit premature convergence, with no enhancement in the best policy over an extended period. In contrast, HE-GP achieves improvements through perturbations induced by the HE mechanism. Overall, HE-GP shows superior performance in scenarios of small and medium scale. From the perspective of evolutionary potential, the frequencies with which HE-GP, E-GP, and AE-GP exhibited the most significant advancements in Fig. 7 correspond to a ratio of 12:4:012:4:0. Additionally, the highest counts of iterations showing improvements in optimal performance follow a ratio of 8:6:28:6:2. These observations suggest that, in certain cases, once E-GP and AE-GP become trapped in local optima, escaping solely through traditional genetic operations appears to be challenging. For instance, in scenario <100_72_20_0.15>, AE-GP shows no improvement in optimal performance from generation 13 to 50, whereas HE-GP continuously finds better policies and eventually surpasses E-GP. Figure 7: The optimal performance of the evolutionary process in GP-based methods across 16 scenarios. The random seed is set to 1. VI-C Training Time The findings in Section VI-B confirm that HE-GP is comparable to E-GP and even surpasses it based on certain metrics. To verify the positive impact of the HE mechanism on the algorithm, we compared the training times of the two methods, with training time serving as a key indicator of efficiency. Table VI presents a comparative analysis of training and evaluation times between E-GP and HE-GP. Compared to E-GP, HE-GP achieves an average reduction of 17.77% in training time and 17.78% in evaluation time across 16 scenarios. These findings clearly demonstrate that the approximate evaluation model and adaptive switching mechanism introduced by HE substantially decrease evaluation costs. Notably, both E-GP and HE-GP allocate over 99% of their total runtime to evaluation, indicating that evaluation efficiency is the predominant factor influencing overall algorithm runtime. Furthermore, the evaluation overhead is proportional to the number of requests: an increase in requests results in more decision points within the timeline-based decision process and a higher volume of requests requiring updates at each decision step, including feasibility checks, determination of OWs, and calculation of heuristic values. TABLE VI: Comparison of training time and evaluation time between E-GP and HE-GP. Gap is the percentage reduction of HE-GP compared to E-GP. Evaluation time ratio = evaluation time / training time. Scenario Average Training Time (seconds) Average Evaluation Time (seconds) Evaluation Ratio (%) E-GP HE-GP Gap (%) E-GP HE-GP Gap (%) E-GP HE-GP 50_36_20_0.15 668.56 591.30 11.56 666.58 589.45 11.57 99.70 99.69 50_36_20_0.30 540.47 435.26 19.47 538.50 433.51 19.50 99.63 99.60 50_72_20_0.15 581.64 472.21 18.81 579.55 470.38 18.84 99.64 99.61 50_72_20_0.30 467.50 346.26 25.93 465.50 344.45 26.00 99.57 99.48 100_36_20_0.15 1125.24 961.38 14.56 1123.74 959.92 14.58 99.87 99.85 100_36_20_0.30 952.20 831.19 12.71 950.64 829.66 12.73 99.84 99.81 100_72_20_0.15 933.28 829.53 11.12 931.78 828.08 11.13 99.84 99.82 100_72_20_0.30 770.55 768.87 0.22 768.93 767.35 0.20 99.79 99.80 150_36_40_0.15 2242.68 1573.54 29.84 2241.21 1572.34 29.84 99.93 99.92 150_36_40_0.30 1867.37 1481.23 20.68 1865.80 1479.87 20.68 99.91 99.91 150_72_40_0.15 2037.04 1644.73 19.26 2035.47 1643.21 19.27 99.92 99.91 150_72_40_0.30 1718.74 1475.19 14.17 1717.01 1473.49 14.18 99.90 99.88 200_36_40_0.15 2439.90 1788.97 26.68 2439.10 1788.19 26.69 99.97 99.96 200_36_40_0.30 2291.57 1683.05 26.55 2290.65 1682.18 26.56 99.96 99.95 200_72_40_0.15 2257.89 1836.10 18.68 2256.94 1835.22 18.68 99.96 99.95 200_72_40_0.30 2079.62 1786.79 14.08 2078.60 1785.77 14.09 99.95 99.94 Average - - 17.77 - - 17.78 99.84 99.82 Fig. 8 presents line graphs depicting the cumulative training time and the average size throughout the evolutionary processes of E-GP and HE-GP. In certain scenarios, the average size of HE-GP exceeds that of E-GP for the majority of iterations (e.g., <50_36_20_0.15>, <100_36_20_0.30>). It suggests that the improvement in evaluation efficiency is not due to simplifying the policy used to accelerate heuristic value computation. In <100_72_20_0.30>, the average size of HE-GP during the iteration process is significantly larger than that of E-GP, which explains why the reduction in training time for HE-GP compared to E-GP in this scenario is minimal. When the average sizes of the two algorithms are similar, HE-GP exhibits a shorter training time. Collectively, the experimental findings corroborate that the HE mechanism effectively improves training efficiency. Meanwhile, the comparable average sizes suggest that HE-GP and E-GP exhibit a similar extent of coverage within the policy space, further substantiating the superior performance of HE-GP analyzed in Section VI-B. Figure 8: Line chart of cumulative training time and average size of E-GP and HE-GP in the 16 scenarios. Average size refers to the mean size of all policies in GP population. VI-D Component Analysis This study investigates three GP-based methods (i.e., E-GP, AE-GP, and HE-GP) and their evolved scheduling policies by analyzing feature frequencies and the mathematical significance of the resulting tree-based policies. Fig. 9 presents the sizes of the optimal policies generated by these methods across 10 independent runs on 16 scenarios. Despite differences in their evaluation approaches, the distribution of policy sizes is similar across the three methods, with most policies ranging from 20 to 60 nodes and relatively few outside this range. This suggests that modifications to the evaluation process do not significantly affect the policy structure. Figure 9: The size of the optimal scheduling policy for all running results of GP-based methods on 16 scenarios (with 10 independent runs for each scenario) Although assessing the importance of terminal nodes solely based on their frequency is imprecise, due to potential confounding effects from redundant structures (e.g., the expression “X−X-X” artificially inflates the frequency of terminal X [13]), frequency analysis can nonetheless provide a partial indication of the algorithm’s preference for certain terminals. Fig. 10 shows the frequency distribution of terminals, including both features and functions, within the optimal scheduling policies obtained from 10 independent runs across 16 scenarios. Notably, the frequency of feature terminals is remarkably similar across the three GP-based methods. Among these, Profit-R​PRP, Memory-E​M​U​REMUR, and Time-R​I​S​TRIST/R​R exhibit relatively high frequencies. In particular, R​PRP appears most frequently, underscoring its critical role in shaping policy logic. Additionally, E​M​U​REMUR, R​I​S​TRIST, R​R, and R​P​P​URPPU are significant contributors to optimal policies. Conversely, E​M​CEMC, which represents expected memory consumption, occurs least frequently, possibly because it is less intuitive and less influential than E​M​U​REMUR (expected memory usage ratio). Figure 10: The frequency of feature terminals and functions in the optimal scheduling policies among all the running results of GP-based methods on 16 scenarios (with 10 independent runs for each scenario). The darker the color, the higher the frequency. Scheduling policies serve as priority functions that guide decision-making and are encoded as tree-structured genotypes, which can be represented as mathematical expressions. Analyzing these expressions facilitates a deeper understanding of the underlying decision-making logic. Two robust scheduling policies, denoted S​P1SP_1 and S​P2SP_2, derived from HE-GP in <50_36_20_0.15>and <50_36_20_0.30>, respectively, are presented below: S​P1= SP_1= max(EMUR,||R||)+max(EMUR, (EMUR, | |R | | )+ (EMUR, (32) EMUR)÷(RP÷(0.5239+RP)) EMUR) (RP (0.5239+RP ) ) S​P2= SP_2= R​R+min⁡(|−0.8010|,E​M​U​R÷R​P) R+ ( |-0.8010 |,EMUR RP ) (33) +(R​R​P+F​R) + (RRP+FR ) In (32), E​M​U​REMUR exhibits a positive correlation with the priority assigned to candidate requests, whereas R​PRP shows a negative correlation. This relationship also holds in (33). In (33), ranking information makes a significant role, incorporating both absolute ranking (F​RFR) and relative ranking (R​R). A negative correlation exists between reward and heuristic value in both (32) and (33). This counterintuitive finding suggests that HE-GP can identify policies that are challenging to discern based on expert experience but prove effective in scheduling scenarios. Nonetheless, the structures within (32) contain redundancies, exemplified by expressions like max⁡(E​M​U​R,E​M​U​R) (EMUR,EMUR), which can be simplified to E​M​U​REMUR. The simplified expressions for S​P1SP_1 are shown in (34): S​P1′= SP_1 = max(EMUR,R)+EMUR÷ (EMUR,R )+EMUR (34) (R​P÷(0.5239+R​P)) (RP (0.5239+RP) ) VII Conclusion This research investigates the UAEOSSP, a more realistic extension of the conventional AEOSSP, by incorporating uncertainties related to profit, resource consumption, and visibility. The problem is formulated as a stochastic programming model to effectively represent the uncertain environmental conditions characteristic. Inspired by recent advancements in GPHH for scheduling optimization, this study applies GPHH to solve the UAEOSSP. To mitigate the computational burden associated with evaluation and enhance algorithmic performance, a novel HE mechanism was developed and integrated into GPHH. This HE mechanism accelerates the filtering of candidate requests within the MDP employed for evaluation by implementing a rigorously designed constraint-checking procedure. Two evaluation models are incorporated into the HE mechanism: an exact evaluation model and an approximate evaluation model, with the latter further improving computational efficiency based on the former. Moreover, the HE mechanism adopts an adaptive switching technique that dynamically alternates between the two models in response to the evolutionary state information. Extensive experimental evaluations were conducted across 16 scheduling scenarios, comparing HE-GP with LAHs, MDHs, E-GP, and AE-GP. Analyzing the average performance of the evolved scheduling policies, HE-GP achieved an average rank of 1.4375, the highest among all algorithms considered. HE-GP outperformed the GP utilizing a single evaluation model in more than half of the scenarios and surpassed handcrafted algorithms in all scenarios. Moreover, HE-GP achieved an average reduction of 17.77% in training time compared to E-GP, which relies solely on the exact evaluation model. The findings reveal that the HE mechanism not only speeds up training efficiency but also effectively alleviates evolutionary stagnation, as evidenced by HE-GP exhibiting better continuous optimization capability and greater optimization magnitude compared to E-GP and AE-GP. Component analyses of the evolved scheduling policies confirmed their interpretability, while feature frequency analyses identified key terminals (e.g., profit-related and memory-related features) that contribute substantially to the optimal policies. These analyses validated the interpretability of the evolved policies and provided valuable insights to inform future research in this field. Notwithstanding these promising results, this study acknowledges several limitations. First, the current implementation is limited to single-AEOS scheduling; future research could extend the research to include constellations of multiple AEOSs. Second, the hyperparameters of the HE mechanism require further optimization through systematic tuning, and the robustness of the evolved policies should be assessed under a wider range of environmental conditions. Third, existing GP-based methods inevitably produce scheduling policies that include redundant structures, which not only reduce decision-making efficiency but also hinder user understanding. In summary, the proposed HE-GP constitutes a notable advancement in addressing the UAEOSSP. The interpretability of the derived scheduling policies renders them especially appropriate for practical implementation in aerospace contexts, where reliability and transparency are critical. This study not only advances the domain of satellite scheduling but also provides valuable insights into the development of effective evaluation frameworks for GP-based optimization methods. References [1] B. Ferrari, J.-F. Cordeau, M. Delorme, M. Iori, and R. Orosei, “Satellite scheduling problems: A survey of applications in earth and outer space observation,” Computers & Operations Research, vol. 173, p. 106875, Jan. 2025. [2] X. Liu, G. Laporte, Y. Chen, and R. He, “An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time,” Computers & Operations Research, vol. 86, p. 41–53, Oct. 2017. [3] L. Wei, M. Chen, L. Xing, Q. Wan, Y. Song, Y. Chen, and Y. Chen, “Knowledge-transfer based genetic programming algorithm for multi-objective dynamic agile earth observation satellite scheduling problem,” Swarm and Evolutionary Computation, vol. 85, p. 101460, Mar. 2024. [4] X. Wang, G. Wu, L. Xing, and W. Pedrycz, “Agile earth observation satellite scheduling over 20 years: Formulations, methods, and future directions,” IEEE Systems Journal, vol. 15, no. 3, p. 3881–3892, Sep. 2021. [5] K. Thangavel, R. Sabatini, A. Gardi, K. Ranasinghe, S. Hilton, P. Servidia, and D. Spiller, “Artificial intelligence for trusted autonomous satellite operations,” Progress in Aerospace Sciences, vol. 144, p. 100960, Jan. 2024. [6] D. Ouelhadj and S. Petrovic, “A survey of dynamic scheduling in manufacturing systems,” Journal of Scheduling, vol. 12, no. 4, p. 417–431, Aug. 2009. [7] X. Wang, G. Song, R. Leus, and C. Han, “Robust earth observation satellite scheduling with uncertainty of cloud coverage,” IEEE Transactions on Aerospace and Electronic Systems, vol. 56, no. 3, p. 2450–2461, Jun. 2020. [8] C. Han, Y. Gu, G. Wu, and X. Wang, “Simulated annealing-based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 53, no. 5, p. 2863–2874, May 2023. [9] W. Jianjiang, H. Xuejun, and H. Chuan, “Reactive scheduling of multiple eoss under cloud uncertainties: Model and algorithms,” Journal of Systems Engineering and Electronics, vol. 32, no. 1, p. 163–177, Feb. 2021. [10] Y. Chen, J. Xue, W. Gu, and M. Shao, “An effective genetic programming hyper-heuristic for uncertain agile satellite scheduling,” in 2025 11th International Conference on Big Data and Information Analytics (BigDIA). Nha Trang, Vietnam: IEEE, Nov. 2025, p. 311–318. [11] M. Chen, Y. Du, K. Tang, L. Xing, Y. Chen, and Y. Chen, “Learning to construct a solution for the agile satellite scheduling problem with time-dependent transition times,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 10, p. 5949–5963, Oct. 2024. [12] D. Eddy and M. Kochenderfer, “Markov decision processes for multi-objective satellite task planning,” in 2020 IEEE Aerospace Conference. Big Sky, MT, USA: IEEE, Mar. 2020, p. 1–12. [13] S. Wang, Y. Mei, and M. Zhang, “Towards interpretable routing policy: A two stage multi-objective genetic programming approach with feature selection for uncertain capacitated arc routing problem,” 2020 IEEE Symposium Series on Computational Intelligence (ssci), p. 2399–2406, 2020. [14] R. Xu, H. Chen, X. Liang, and H. Wang, “Priority-based constructive algorithms for scheduling agile earth observation satellites with total priority maximization,” Expert Systems with Applications, vol. 51, p. 195–206, Jun. 2016. [15] J. Chun, W. Yang, X. Liu, G. Wu, L. He, and L. Xing, “Deep reinforcement learning for the agile earth observation satellite scheduling problem,” Mathematics, vol. 11, no. 19, p. 4059, Sep. 2023. [16] M. Wang, Z. Zhou, Z. Chang, E. Chen, and R. Li, “Deep reinforcement learning for agile earth observation satellites scheduling problem with variable image duration,” Applied Soft Computing, vol. 169, p. 112575, Jan. 2025. [17] L. Wei, Y. Chen, M. Chen, and Y. Chen, “Deep reinforcement learning and parameter transfer based approach for the multi-objective agile earth observation satellite scheduling problem,” Applied Soft Computing, vol. 110, p. 107607, 2021. [18] M. Chen, Y. Chen, Y. Chen, and W. Qi, “Deep reinforcement learning for agile satellite scheduling problem,” in 2019 IEEE Symposium Series on Computational Intelligence (SSCI). Xiamen, China: IEEE, Dec. 2019, p. 126–132. [19] H. Liu, H. Liu, Y. Kuang, J. Wang, and B. Li, “Deep symbolic optimization for combinatorial optimization: Accelerating node selection by discovering potential heuristics,” Jun. 2024. [20] Y. Mei, Q. Chen, A. Lensen, B. Xue, and M. Zhang, “Explainable artificial intelligence by genetic programming: A survey,” IEEE Transactions on Evolutionary Computation, vol. 27, no. 3, p. 621–641, Jun. 2023. [21] E. Burke, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, and R. Qu, “A survey of hyper-heuristics,” School of Computer Science and Information Technology University of Nottingham Jubilee Campus, Tech. Rep. No. NOTTCS-TR-SUB-0906241418-2747, Jan. 2009. [22] E. Burke, G. Kendall, J. Newall, E. Hart, P. Ross, and S. Schulenburg, “Hyper-heuristics: An emerging direction in modern search technology,” in Handbook of Metaheuristics, F. Glover and G. A. Kochenberger, Eds. Boston, MA: Springer US, 2003, p. 457–474. [23] S. Wang, Y. Mei, and M. Zhang, “Explaining genetic programming-evolved routing policies for uncertain capacitated arc routing problems,” IEEE Transactions on Evolutionary Computation, vol. 28, no. 4, p. 918–932, Aug. 2024. [24] Z. Sun, Y. Mei, F. Zhang, H. Huang, C. Gu, and M. Zhang, “Multi-tree genetic programming hyper-heuristic for dynamic flexible workflow scheduling in multi-clouds,” IEEE Transactions on Services Computing, vol. 17, no. 5, p. 2687–2703, Sep. 2024. [25] C. Zhang, J. Yang, and N. Wang, “Multitree genetic programming with rule reconstruction for dynamic task scheduling in integrated cloud–edge satellite–terrestrial networks,” IEEE Internet of Things Journal, vol. 12, no. 12, p. 21 429–21 442, Jun. 2025. [26] M. Ansari Ardeh, Y. Mei, and M. Zhang, “Genetic programming with knowledge transfer and guided search for uncertain capacitated arc routing problem,” IEEE Transactions on Evolutionary Computation, vol. 26, no. 4, p. 765–779, Aug. 2022. [27] S. Wang, Y. Mei, M. Zhang, and X. Yao, “Genetic programming with niching for uncertain capacitated arc routing problem,” IEEE Transactions on Evolutionary Computation, vol. 26, no. 1, p. 73–87, Feb. 2022. [28] F. Zhang, S. Nguyen, and M. Zhang, “Collaborative multifidelity-based surrogate models for genetic programming in dynamic flexible job shop scheduling,” IEEE Transactions on Cybernetics, vol. 52, no. 8, p. 8142–8156, 2022. [29] S. Wang, Y. Mei, and M. Zhang, “A multi-objective genetic programming approach with self-adaptive α dominance to uncertain capacitated arc routing problem,” in 2021 IEEE Congress on Evolutionary Computation (Cec). Kraków, Poland: IEEE, Jun. 2021, p. 636–643. [30] —, “Two-stage multi-objective genetic programming with archive for uncertain capacitated arc routing problem,” in Proceedings of the Genetic and Evolutionary Computation Conference. Lille France: ACM, Jun. 2021, p. 287–295. [31] X. Chu, Y. Chen, and Y. Tan, “An anytime branch and bound algorithm for agile earth observation satellite onboard scheduling,” Advances in Space Research, vol. 60, no. 9, p. 2077–2090, Nov. 2017. [32] G. Dantzig, R. Fulkerson, and S. Johnson, “Solution of a large-scale traveling-salesman problem,” Journal of the Operations Research Society of America, vol. 2, no. 4, p. 393–410, 1954. [33] M. Lemaître, G. Verfaillie, F. Jouhaud, J.-M. Lachiver, and N. Bataille, “Selecting and scheduling observations of agile satellites,” Aerospace Science and Technology, vol. 6, no. 5, p. 367–381, Sep. 2002. [34] G. Peng, R. Dewil, C. Verbeeck, A. Gunawan, L. Xing, and P. Vansteenwegen, “Agile earth observation satellite scheduling: An orienteering problem with time-dependent profits and travel times,” Computers & Operations Research, vol. 111, p. 84–98, Nov. 2019. [35] R. Kandepi, H. Saini, R. K. George, S. Konduri, and R. Karidhal, “Agile earth observation satellite constellations scheduling for large area target imaging using heuristic search,” Acta Astronautica, vol. 219, p. 670–677, Jun. 2024. [36] G. Peng, G. Song, Y. He, J. Yu, S. Xiang, L. Xing, and P. Vansteenwegen, “Solving the agile earth observation satellite scheduling problem with time-dependent transition times,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 3, p. 1614–1625, Mar. 2022. [37] H. Chen, S. Peng, C. Du, and J. Li, Earth Observation Satellites: Task Planning and Scheduling. Singapore: Springer Nature Singapore, 2023. [38] X. Chu, Y. Chen, and L. Xing, “A branch and bound algorithm for agile earth observation satellite scheduling,” Discrete Dynamics in Nature and Society, vol. 2017, p. 1–15, 2017. [39] A. Maillard, “Flexible scheduling for an agile earth-observing satelllite,” in Proceedings of the 24th International Conference on Artificial Intelligence, ser. IJCAI’15. Buenos Aires, Argentina: AAAI Press, 2015, p. 4379–4380. [40] W. Yang, L. He, X. Liu, and Y. Chen, “Onboard coordination and scheduling of multiple autonomous satellites in an uncertain environment,” Advances in Space Research, vol. 68, no. 11, p. 4505–4524, Dec. 2021. [41] S. Liu, Y. Chen, L. Xing, and X. Guo, “Time-dependent autonomous task planning of agile imaging satellites,” Journal of Intelligent & Fuzzy Systems, vol. 31, no. 3, p. 1365–1375, Sep. 2016. [42] X. Zhu, J. Wang, X. Qin, J. Wang, Z. Liu, and E. Demeulemeester, “Fault-tolerant scheduling for real-time tasks on multiple earth-observation satellites,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 11, p. 3012–3026, Nov. 2015. [43] K. Li, T. Zhang, and R. Wang, “Deep reinforcement learning for multiobjective optimization,” IEEE Transactions on Cybernetics, vol. 51, no. 6, p. 3103–3114, Jun. 2021. [44] M. Taleby Ahvanooey, Q. Li, M. Wu, and S. Wang, “A survey of genetic programming and its applications,” KSII Transactions on Internet and Information Systems, vol. Vol.13, p. 1765–1793, Apr. 2019. [45] S. Wang, Y. Mei, and M. Zhang, “A multi-objective genetic programming algorithm with α dominance and archive for uncertain capacitated arc routing problem,” IEEE Transactions on Evolutionary Computation, vol. 27, no. 6, p. 1633–1647, Dec. 2023. [46] —, “Novel ensemble genetic programming hyper-heuristics for uncertain capacitated arc routing problem,” in Proceedings of the Genetic and Evolutionary Computation Conference. Prague Czech Republic: ACM, Jul. 2019, p. 1093–1101. [47] S. Wang, Y. Mei, J. Park, and M. Zhang, “Evolving ensembles of routing policies using genetic programming for uncertain capacitated arc routing problem,” in 2019 IEEE Symposium Series on Computational Intelligence (Ssci). Xiamen, China: IEEE, Dec. 2019, p. 1628–1635. [48] Y. Zakaria, Y. Zakaria, A. BahaaElDin, and M. Hadhoud, “Niching-based feature selection with multi-tree genetic programming for dynamic flexible job shop scheduling,” in Studies in Computational Intelligence. Cham: Springer International Publishing, 2021, p. 3–27. [49] Z.-H. Zhan, L. Shi, K. C. Tan, and J. Zhang, “A survey on evolutionary computation for complex continuous optimization,” Artificial Intelligence Review, vol. 55, no. 1, p. 59–110, Jan. 2022. [50] C. Sun, J. Zeng, J. Pan, S. Xue, and Y. Jin, “A new fitness estimation strategy for particle swarm optimization,” Information Sciences, vol. 221, p. 355–370, 2013. [51] J.-H. Chen, D. E. Goldberg, S.-Y. Ho, and K. Sastry, “Fitness inheritance in multi-objective optimization,” in Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, ser. GECCO’02. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002, p. 319–326. [52] C. Sun, Y. Jin, R. Cheng, J. Ding, and J. Zeng, “Surrogate-assisted cooperative swarm optimization of high-dimensional expensive problems,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 4, p. 644–660, Aug. 2017. [53] H. Wang, Y. Jin, and J. Doherty, “A generic test suite for evolutionary multifidelity optimization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 6, p. 836–850, Dec. 2018. [54] X.-F. Liu, Z.-H. Zhan, J.-H. Lin, and J. Zhang, “Parallel differential evolution based on distributed cloud computing resources for power electronic circuit optimization,” in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, ser. GECCO ’16 Companion. New York, NY, USA: Association for Computing Machinery, 2016, p. 117–118. [55] T. Hildebrandt and J. Branke, “On using surrogates with genetic programming,” Evolutionary Computation, vol. 23, no. 3, p. 343–367, Sep. 2015. [56] J. R. Koza, “Genetic programming as a means for programming computers by natural selection,” Statistics and Computing, vol. 4, no. 2, p. 87–112, Jun. 1994. [57] F. Zhang, Y. Chen, and Y. Chen, “Evolving constructive heuristics for agile earth observing satellite scheduling problem with genetic programming,” in 2018 IEEE Congress on Evolutionary Computation (CEC), Jul. 2018, p. 1–7. [58] Y. He, L. Xing, Y. Chen, W. Pedrycz, L. Wang, and G. Wu, “A generic markov decision process model and reinforcement learning method for scheduling agile earth observation satellites,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 3, p. 1463–1474, Mar. 2022. [59] Y. Liu, Y. Mei, M. Zhang, and Z. Zhang, “Automated heuristic design using genetic programming hyper-heuristic for uncertain capacitated arc routing problem,” in Proceedings of the Genetic and Evolutionary Computation Conference. Berlin Germany: ACM, Jul. 2017, p. 290–297. [60] B. Golden, X. Wang, and E. Wasil, The Evolution of the Vehicle Routing Problem: A Survey of VRP Research and Practice from 2005 to 2022, ser. Synthesis Lectures on Operations Research and Applications. Cham: Springer Nature Switzerland, 2023. [61] P. J. Muhammad Ali, “Investigating the impact of min-max data normalization on the regression performance of k-nearest neighbor with different similarity measurements,” Aro-the Scientific Journal of Koya University, vol. 10, no. 1, p. 85–91, Jun. 2022. [62] X. Lin, Y. Chen, J. Xue, B. Zhang, Y. Chen, and C. Chen, “Parallel machine scheduling with job family, release time, and mold availability constraints: Model and two solution approaches,” Memetic Computing, vol. 16, no. 3, p. 355–371, Sep. 2024. [63] K. Sun, G. Bai, Y. Chen, R. He, and L. Xing, “Action planning for agile earth-observing satellite mission planning problem,” Journal of National University of Defense Technology, vol. 34, no. 6, p. 141–147, 2012.