
Paper deep dive

A Self-Evolving Defect Detection Framework for Industrial Photovoltaic Systems

Haoyu He, Yu Duan, Wenzhen Liu, Hanyuan Hang, Qiantu Tuo, Xiaoke Yang, Rui Li

Year: 2026 · Venue: arXiv preprint (arXiv:2603.14869v1) · Area: cs.AI · Type: Preprint · Embeddings: 52

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 97%

Last extracted: 3/22/2026, 5:15:42 AM

Summary

The paper introduces SEPDD, a self-evolving framework for industrial photovoltaic (PV) defect detection. It addresses challenges like long-tailed defect distributions, domain shifts, and evolving defect taxonomies by integrating automated model optimization with a continual self-evolving learning mechanism. SEPDD outperforms existing baselines and human experts on both public and private industrial EL datasets.

Entities (4)

Electroluminescence (EL) imaging · technology · 100%
Photovoltaic (PV) modules · hardware · 100%
SEPDD · framework · 100%
EcoFlow Inc · organization · 95%

Relation Signals (3)

SEPDD detects Photovoltaic (PV) modules

confidence 100% · SEPDD is designed for industrial photovoltaic defect detection.

SEPDD utilizes Electroluminescence (EL) imaging

confidence 100% · SEPDD is a self-evolving framework for industrial photovoltaic inspection using EL imaging.

SEPDD developed by EcoFlow Inc

confidence 90% · The authors are affiliated with EcoFlow Inc and developed the SEPDD framework.

Cypher Suggestions (2)

Identify technologies used by the SEPDD framework · confidence 95% · unvalidated

MATCH (f:Framework {name: 'SEPDD'})-[:UTILIZES]->(t:Technology) RETURN t.name

Find all frameworks developed by a specific organization · confidence 90% · unvalidated

MATCH (f:Framework)-[:DEVELOPED_BY]->(o:Organization {name: 'EcoFlow Inc'}) RETURN f.name

Abstract

Reliable photovoltaic (PV) power generation requires timely detection of module defects that may reduce energy yield, accelerate degradation, and increase lifecycle operation and maintenance costs during field operation. Electroluminescence (EL) imaging has therefore been widely adopted for PV module inspection. However, automated defect detection in real operational environments remains challenging due to heterogeneous module geometries, low-resolution imaging conditions, subtle defect morphology, long-tailed defect distributions, and continual data shifts introduced by evolving inspection and labeling processes. These factors significantly limit the robustness and long-term maintainability of conventional deep-learning inspection pipelines. To address these challenges, this paper proposes SEPDD, a Self-Evolving Photovoltaic Defect Detection framework designed for evolving industrial PV inspection scenarios. SEPDD integrates automated model optimization with a continual self-evolving learning mechanism, enabling the inspection system to progressively adapt to distribution shifts and newly emerging defect patterns during long-term deployment. Experiments conducted on both a public PV defect benchmark and a private industrial EL dataset demonstrate the effectiveness of the proposed framework. Both datasets exhibit severe class imbalance and significant domain shift. SEPDD achieves a leading mAP50 of 91.4% on the public dataset and 49.5% on the private dataset. It surpasses the autonomous baseline by 14.8% and human experts by 4.7% on the public dataset, and by 4.9% and 2.5%, respectively, on the private dataset.

Tags

ai-safety (imported, 100%) · csai (suggested, 92%) · preprint (suggested, 88%)

Links


Full Text

51,492 characters extracted from source content.


A Self-Evolving Defect Detection Framework for Industrial Photovoltaic Systems

Haoyu He, Yu Duan, Wenzhen Liu, Hanyuan Hang, Qiantu Tuo, Xiaoke Yang, Rui Li

Abstract—Reliable photovoltaic (PV) power generation requires timely detection of module defects that may reduce energy yield, accelerate degradation, and increase lifecycle operation and maintenance (O&M) costs during field operation. Electroluminescence (EL) imaging has therefore been widely adopted for PV module inspection. However, automated defect detection in real operational environments remains challenging due to heterogeneous module geometries, low-resolution imaging conditions, subtle defect morphology, long-tailed defect distributions, and continual data shifts introduced by evolving inspection and labeling processes. These factors significantly limit the robustness and long-term maintainability of conventional deep-learning inspection pipelines. To address these challenges, this paper proposes SEPDD, a Self-Evolving Photovoltaic Defect Detection framework designed for evolving industrial PV inspection scenarios. SEPDD integrates automated model optimization with a continual self-evolving learning mechanism, enabling the inspection system to progressively adapt to distribution shifts and newly emerging defect patterns during long-term deployment. Experiments conducted on both a public PV defect benchmark and a private industrial EL dataset demonstrate the effectiveness of the proposed framework. Both datasets exhibit severe class imbalance and significant domain shift. SEPDD achieves a leading mAP50 of 91.4% on the public dataset and 49.5% on the private dataset. It surpasses the autonomous baseline by 14.8% and human experts by 4.7% on the public dataset, and by 4.9% and 2.5%, respectively, on the private dataset. By improving the robustness and adaptability of automated PV inspection, the proposed framework supports more reliable maintenance of distributed PV systems.
Index Terms—Photovoltaic systems, PV module reliability, electroluminescence inspection, defect detection, operation and maintenance.

This work was supported by EcoFlow R&D Projects. (Corresponding author: Rui Li.) H. He, Y. Duan, W. Liu, H. Hang, Q. Tuo, X. Yang, and R. Li are with EcoFlow Inc, Shenzhen, China (ricky.li@ecoflow.com).

I. INTRODUCTION

PHOTOVOLTAIC (PV) generation has become one of the fastest-growing renewable energy sources and plays an increasingly important role in modern power systems. Ensuring the long-term reliability and sustained energy yield of PV modules is essential for stable power generation and effective lifecycle operation and maintenance (O&M) management. In practical deployments of PV systems, various defects and degradation mechanisms—such as microcracks, hot spots, grid-line interruptions, black cores, and potential-induced degradation—may gradually emerge during long-term field operation. If not detected in time, these defects can significantly reduce power output, accelerate module degradation, and increase maintenance costs [1, 2]. Consequently, efficient and reliable defect inspection has become a key component of PV system reliability management.

In practice, PV module inspection commonly relies on imaging-based diagnostic techniques such as electroluminescence (EL), infrared, and visible-light imaging. Among these approaches, EL imaging is particularly effective in revealing internal electrical and structural defects within photovoltaic cells. However, conventional inspection workflows often require manual interpretation of EL images, which is labor-intensive, subjective, and difficult to scale to large fleets of distributed PV systems. To improve inspection efficiency and consistency, deep-learning–based approaches have been increasingly explored for automated PV defect detection [3, 4].
Recent studies have investigated Transformer-based visual architectures [5], multi-scale feature fusion strategies [6], and attention-enhanced object detection models for PV defect localization [7]. Despite these advances, deploying learning-based inspection systems in real operational environments remains fundamentally challenging.

First, PV inspection datasets are typically limited in size and often exhibit strongly long-tailed distributions, where a few dominant defect types account for most samples while many reliability-critical defects appear only rarely [3, 4]. Under such conditions, standard supervised learning pipelines are easily dominated by head classes, leading to poor recall for rare defects [8, 9, 10]. However, these rare defects often correspond to high-risk failure modes that directly affect long-term PV reliability and energy yield [2].

Second, PV defect patterns often exhibit substantial morphological complexity and semantic ambiguity [11, 12]. Their visual appearance may vary significantly across module designs, installation conditions, and imaging devices. Consequently, samples within the same defect category may present large intra-class variations, while visually similar patterns may correspond to different defect types under low-resolution electroluminescence (EL) imaging. For example, the efficiency degradation induced by finger interruptions depends on a complex interaction between their size, position, and number, and some visually observable defects may not necessarily lead to measurable efficiency loss [11]. Moreover, defects on abnormal PV cells often exhibit visual characteristics similar to the textured background in EL images, which makes them difficult to distinguish from normal patterns or impurities using conventional inspection algorithms [12].

Third, PV inspection environments are inherently dynamic.
Variations in imaging devices, illumination conditions, installation locations, and operational environments introduce persistent distribution shifts between historical training data and newly collected inspection data, which may significantly degrade the performance of models trained on static datasets [13, 14]. In addition, the defect taxonomy itself may evolve over time as new defect patterns emerge during long-term operation. Conventional inspection pipelines typically assume a fixed label space and require repeated manual annotation and full model retraining when new defect categories appear, making them difficult to maintain in evolving industrial environments.

These observations suggest that PV defect inspection should not be treated as a one-time closed-set learning problem, but rather as a long-term adaptive learning task. Effective inspection systems must not only achieve strong detection performance on existing datasets, but also remain robust under small and imbalanced data, adapt to distribution shifts across operational environments, and continuously incorporate newly emerging defect categories.

To address these challenges, this paper proposes SEPDD, a self-evolving defect detection framework for industrial photovoltaic inspection. Unlike conventional pipelines that optimize models only once during offline training, SEPDD introduces a closed-loop evolution mechanism that continuously improves the detection system as new data and defect patterns appear. Specifically, SEPDD integrates automated architecture search with a self-evolving learning strategy, enabling robust defect detection under long-tailed data distributions, distribution shifts, and dynamically expanding defect taxonomies.
The main contributions of this work are summarized as follows:

(i) We identify key challenges in automated PV defect inspection, including limited training data, long-tailed defect distributions, evolving inspection workflows, and emerging defect categories, and propose SEPDD, a self-evolving framework integrating automated model optimization with continual model adaptation to address these challenges in industrial PV inspection.

(ii) We provide a self-evolving framework that combines a search strategy to balance exploration and exploitation (e.g., top-k selection and merge over the search graph) with a per-node workflow that integrates code generation and iterative refinement, so that each node yields high-quality, deployment-ready code before it is used to expand the search.

(iii) Extensive experiments on both public and industrial EL datasets demonstrate that SEPDD significantly improves rare-defect detection performance while maintaining real-time inference capability.

The remainder of this paper is organized as follows. Section II presents the industrial PV defect inspection setting and related work on photovoltaic defect detection and long-tailed learning, highlighting the practical challenges that motivate this work. Section III presents the SEPDD framework and describes its self-evolving search mechanism and automated pipeline optimization for PV defect detection. Section IV describes the industrial PV inspection setting and experimental datasets, and reports the experimental results and performance analysis. Section V concludes the paper.

II. INDUSTRIAL PV DEFECT DETECTION

This section describes the industrial PV defect inspection setting, the key challenges for automated defect detection, and how these challenges motivate the proposed SEPDD framework.

A. Problem Setting

Reliable PV power generation critically depends on the early identification of module defects before they propagate into energy-reducing failure modes during field operation.
Undetected microcracks, metallization failures, dark regions, or localized hotspots may expand under environmental stressors such as thermal cycling, humidity–freeze, and mechanical loading, eventually leading to irreversible degradation, energy-yield loss, and increased operation and maintenance (O&M) costs over the module lifetime [15, 16]. Accurate defect detection is therefore essential for ensuring long-term PV system reliability and stable energy yield.

1) Industrial EL inspection setting: EL imaging is widely used for PV module defect inspection because it reveals electrical inhomogeneities and structural defects within photovoltaic cells that are difficult to detect using visible-light or infrared imaging. However, industrial EL inspection environments differ substantially from the controlled acquisition conditions of most public datasets. As reported in PV reliability surveys and inspection studies [15, 11], real-world EL images often exhibit heterogeneous module geometries, non-uniform spatial resolutions, acquisition noise, and evolving defect characteristics across imaging devices and production batches. Compared with curated academic datasets, industrial inspection datasets typically contain limited annotated samples and exhibit greater morphological diversity. Public EL datasets such as [11] provide high-quality annotations under relatively consistent imaging conditions, whereas industrial inspection data typically exhibit greater variation in module formats, image quality, and defect characteristics. These differences significantly increase the difficulty of reliable automated defect detection.

2) Data characteristics: limited annotated data and long-tailed distributions: Industrial EL inspection datasets often contain limited annotated samples because EL imaging and expert annotation are expensive and time-consuming.
As a result, the available training data provide insufficient statistical coverage of defect variations, which increases the risk of overfitting in deep learning models. In addition, PV defect datasets typically exhibit highly long-tailed distributions, where a small number of dominant defect categories account for most samples while many reliability-critical defects appear only rarely. Such imbalance is known to degrade the performance of conventional supervised learning methods, particularly for tail classes [17]. Missing these rare defects may lead to significant reliability risks and long-term energy-yield losses in PV systems.

B. Challenges

Although deep-learning–based PV inspection methods have demonstrated strong performance on curated academic datasets, deploying them in real industrial environments introduces several fundamental challenges.

1) Challenge 1: Complex defect morphology under low-resolution EL imaging: PV defects originate from diverse mechanical, electrical, and material degradation mechanisms [15]. Consequently, defects may exhibit widely varying shapes, scales, contrasts, and textures in EL imagery. Industrial inspection further increases this complexity due to heterogeneous panel geometries, lower imaging resolution, and acquisition noise. As a result, samples belonging to the same defect category may show large intra-class variation, while visually similar patterns may correspond to different defect types. Although modern object detection architectures and multi-scale feature representations have improved defect recognition performance [11, 12], these models are typically trained on relatively clean datasets and may not generalize well to industrial inspection scenarios.

2) Challenge 2: Distribution shifts across operational environments: Industrial PV inspection data are subject to persistent distribution shifts caused by variations in imaging devices, illumination conditions, module geometries, and production processes.
As these factors evolve over time, the statistical characteristics of EL images may differ substantially from those of historical training data. Consequently, models trained on static datasets often experience significant performance degradation when deployed in new inspection environments. Although domain adaptation techniques can partially mitigate such shifts by aligning feature distributions across domains [18], maintaining stable detection performance under continuously changing operational conditions remains difficult.

3) Challenge 3: Emerging defect categories and evolving label space: Unlike academic benchmarks with fixed label taxonomies, the defect category space in industrial PV inspection is inherently dynamic. As module technologies, materials, and manufacturing processes evolve, previously unseen defect types may gradually emerge. These new defect categories are typically rare and sparsely labeled in early stages but may pose significant risks to module reliability. Conventional detection pipelines assume a fixed label space and require manual re-annotation and full model retraining when new defect categories appear. Continual learning techniques aim to address such evolving tasks by enabling models to learn new categories without catastrophic forgetting [19], but applying them effectively in industrial inspection systems remains challenging.

In summary, industrial PV defect inspection is characterized by small datasets, complex defect morphology, evolving data distributions, and dynamically emerging defect categories. These characteristics expose the limitations of static, closed-set defect detection pipelines and highlight the need for adaptive inspection systems.

C. From Current Approaches to SEPDD

Recent advances in AutoML and neural architecture search have shown that automated optimization pipelines can significantly reduce manual engineering effort in computer vision tasks [20].
However, directly applying automated optimization to industrial PV EL inspection remains challenging due to severe class imbalance, limited training data, and evolving inspection environments.

In practice, domain experts often improve detection performance through manual architectural modifications, customized training strategies, and extensive hyperparameter tuning. While these expert-driven approaches can be effective, they require substantial effort and are difficult to maintain as inspection conditions evolve. Industrial PV inspection therefore requires detection systems that can maintain robust performance under changing environments while adapting to newly emerging defect patterns with minimal human intervention.

To address these requirements, we propose SEPDD, a self-evolving photovoltaic defect detection framework designed for industrial EL inspection environments. Instead of treating model development as a one-time optimization process, SEPDD introduces a closed-loop self-evolving mechanism that continuously monitors system performance and triggers automated model adaptation when necessary. Through iterative model evolution and automated optimization, SEPDD enables robust defect detection under long-tailed data distributions, distribution shifts, and dynamically expanding defect taxonomies.

III. SEPDD: A SELF-EVOLVING PV DEFECT DETECTION FRAMEWORK

Ensuring high PV energy yield and long-term module reliability requires timely and accurate detection of defects that may otherwise evolve into energy-reducing failure modes during field operation. As discussed in Section II, industrial EL inspection is challenged by limited and long-tailed data, complex defect morphology under low-resolution imaging, persistent distribution shifts, and the continual emergence of new defect patterns. These characteristics make static, one-shot optimization pipelines difficult to maintain in practice.
To address these challenges, we propose SEPDD, a self-evolving framework for photovoltaic defect detection designed for industrial EL inspection environments. Unlike conventional pipelines that optimize a detector only once during offline training, SEPDD introduces a closed-loop evolution mechanism that continuously monitors system status and improves the detection pipeline when necessary. The framework is inspired by recent progress in code-space exploration and automated model optimization (e.g., AIDE [21]), but its core contribution lies in a self-evolving iterative mechanism that enables the detector to adapt autonomously to distribution shifts, newly emerging defect categories, and evolving operational conditions.

A. Framework Overview

SEPDD is an autonomous, trigger-driven framework for photovoltaic defect detection. It functions as an automated maintenance system that continuously monitors the operational context and updates the detector when the current pipeline can no longer maintain the expected performance. For a given inspection task, the solution is represented by source code that defines the ML training pipeline and produces the best-performing trained model.

[Fig. 1: Overall architecture of the SEPDD framework. The evolution cycle is triggered by pre-defined indicators (retraining failure, label evolution, environmental change, periodic evolution). The self-evolving search is tree-based with a merge action, where a node represents an exploration; valid and buggy nodes are distinguished. Each node orchestrates a self-contained evolution pipeline (Idea Generator, Code Creator, Validator, Analyzer, Executor, Code Refiner) that produces high-quality and stable code with a deployment-ready model. Inputs include the task (e.g., "Create a training, evaluation, and test pipeline for photovoltaic (PV) defect detection..."), knowledge (e.g., "The instances of different labels are highly unbalanced. The most updated YOLO model is..."), the environmental context (available resources and constraints), and a journal.]

A key characteristic of SEPDD is that it is knowledge-based. Its monitoring logic and adaptation strategies incorporate external domain knowledge, including expert-validated defect characteristics, system-level operational constraints, and updated task knowledge. This is particularly important for photovoltaic inspection, where specialized prior knowledge remains essential for rigorous and reliable defect analysis.

The framework tracks a set of indicators to determine when an evolution cycle should be activated. A typical trigger is retraining failure, where routine refreshment can no longer recover the desired performance. Other triggers include label evolution, in which new defect categories appear; operational or environmental changes, such as changes in imaging conditions or deployment constraints; and periodic evolution, which is scheduled to maintain model freshness and long-term stability. Once a trigger is activated, SEPDD automatically enters an evolution cycle to update the detector. Depending on the trigger type and operational context, this update may involve targeted retraining, feature representation adjustment, or architectural modification. Evolution is thus tied to indicators that signal when static pipelines begin to fail under the conditions in Section II.

B. Self-Evolving Search Framework

1) Search framework: Building on ideas from AIDE [21] and aira-dojo [22], we construct a self-evolving, graph-based search framework equipped with enhanced operators for iterative pipeline optimization. Each expansion step creates a new node that encapsulates an entire exploration of the code space, including model architecture, training strategy, and execution outcome.
Edges record the evolutionary relationship between successive explorations. The root node represents the initial state of the system, where the detector is either absent or initialized with a predefined configuration. This structure lets the search explore multiple pipeline variants while keeping a clear lineage, so adaptation can proceed as data or defect characteristics change.

2) Search strategy: SEPDD adopts a top-k search strategy to balance exploitation and exploration. At each iteration, the parent for expansion is selected from the current top-k best-performing nodes, which promotes exploitation of strong candidates. Among those candidates, the node with the minimum child count is expanded, which encourages exploration by avoiding over-commitment to a single branch. This design is also related to Monte Carlo Tree Search (MCTS), but is augmented with mechanisms that improve information flow and reduce node isolation [23]:

• Journal access. We maintain a global journal that records the complete evolutionary history and acts as both a storage layer and an information hub. Although each new node is primarily derived from its parent, the journal exposes global context so that high-performing patterns can influence subsequent transformations. This shared memory stabilizes the search and reduces drift toward degenerate solutions, which matters when feedback is noisy or conditions vary across runs. Over successive iterations, the framework thereby accumulates evidence of which pipeline choices succeed and which fail, so that future explorations can favor effective strategies and avoid ineffective ones.

• Merge action. SEPDD periodically examines a set of promising nodes, extracts their effective patterns, and analyzes both their strengths and failure cases. These nodes are then treated collectively as candidate parents, and their distilled components are merged to form a new node.
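The top-k selection rule described above can be sketched in a few lines of Python. The node representation (a dict with `score` and `children` keys) is an illustrative assumption, not SEPDD's actual data model.

```python
def select_parent(nodes, k=3):
    """Top-k parent selection: restrict the candidate pool to the k
    best-scoring nodes (exploitation), then expand the candidate with
    the fewest children (exploration, avoiding over-commitment to a
    single branch)."""
    top_k = sorted(nodes, key=lambda n: n["score"], reverse=True)[:k]
    return min(top_k, key=lambda n: len(n["children"]))
```

For example, among three top-scoring nodes the one that has never been expanded is chosen, even if a sibling has a slightly higher score.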
An independent LLM-powered module is used to produce a detailed analysis of promising candidates. Merge promotes cross-branch communication and yields solutions that combine strengths from multiple explorations, so that results generalize across defect types and imaging conditions.

3) Operator set: The self-evolving process is implemented through a set of modular, reusable, LLM-powered operators, each responsible for a specific function in the evolution pipeline. Together they aim for stable, deployable pipelines suited to industrial EL (e.g., complex morphology, imbalanced data, evolving conditions):

• Idea Generator serves as the strategic exploration component. It analyzes the current codebase, task requirements, and performance metrics, and then generates prioritized suggestions for improvement. It supports exploration of diverse solution strategies (e.g., architecture or training changes) while preserving focus on high-impact changes. We provide the prompt template of the base LLM in Fig. 2 with critical components.

[Fig. 2: Prompt template for Idea Generator — components: Task Context (task description, data description, task-specific requirements); Parent Node (code, execution output, strategies of this code); Summaries of Other Solutions (model/training strategies summary, strength analysis, weakness analysis); System Rules (e.g., all task requirements must be satisfied, print metrics explicitly to stdout); Reasoning Steps and Instructions (e.g., avoid vague suggestions); Output Format.]

• Code Creator functions as the primary code synthesis engine. It translates task requirements and strategic suggestions into complete, executable Python implementations. It operates in both initial-generation and improvement modes, and proactively fixes issues beyond those explicitly specified.

• Analyzer provides a multi-faceted evaluation mechanism that combines static code analysis with dynamic execution analysis.
It inspects code structure, syntax, and logic, parses execution results, and extracts performance metrics from terminal output. It determines whether a candidate requires debugging and produces feedback for subsequent refinement, so that only validated candidates advance.

• Code Refiner performs focused bug fixing and code refinement. Guided by the analyzer, it addresses identified issues while proactively discovering additional problems through comprehensive code examination. It leverages previous failed attempts to avoid redundant debugging strategies.

4) Inter-node pipeline: An orchestrator unifies the above operators and supporting tools into a structured evolution workflow. Each node in the search graph follows the same pipeline: Idea Generator → Code Creator → loop(Validator → Analyzer → Executor → Analyzer → Code Refiner). The Validator first checks syntax and linting errors and then runs the candidate code in a lightweight “debug” mode using minimal hyperparameters. The Executor subsequently performs the full run. Performance metrics and execution traces are fed back to the operators to guide node evaluation and future search decisions. Nodes are ranked by their performance, promising nodes are expanded, and branches that encounter two consecutive faulty nodes are terminated to avoid wasted computation.

Algorithm 1: Workflow of one node (single expansion step).
  input:  parent node, SEPDD input, journal
  output: new node
  // Code generation
  1   suggestions ← IdeaGenerator(parent_node)
  2   code ← CodeCreator(parent_node, suggestions)
  // Code refinement
  3   repeat
  4     syntax_info, exec_output ← Validator(code)
  5     buggy, analysis ← Analyzer(code, syntax_info, exec_output)
  6     if not buggy then
  7       exec_output ← Executor(code)
  8       buggy, analysis ← Analyzer(code, syntax_info, exec_output)
  9       if not buggy then
  10        break
  11      end
  12    end
  13    code ← CodeRefiner(code, exec_output, analysis)
  14  until maximum debug depth is reached
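The per-node loop of Algorithm 1 can be sketched as ordinary Python with the LLM-powered operators stubbed behind a dict of callables. The dict-of-operators interface, signatures, and return shape are illustrative assumptions for the sketch, not SEPDD's actual code.

```python
def run_node(parent, operators, max_debug_depth=3):
    """One expansion step: generate code from the parent node, then
    validate, execute, analyze, and refine until the candidate is
    clean or the debug depth is exhausted (cf. Algorithm 1)."""
    suggestions = operators["idea_generator"](parent)
    code = operators["code_creator"](parent, suggestions)
    buggy, exec_output, analysis = True, None, None
    for _ in range(max_debug_depth):
        # Lightweight check: syntax/lint plus a minimal "debug" run.
        syntax_info, exec_output = operators["validator"](code)
        buggy, analysis = operators["analyzer"](code, syntax_info, exec_output)
        if not buggy:
            # Full run, then re-analyze the real execution output.
            exec_output = operators["executor"](code)
            buggy, analysis = operators["analyzer"](code, syntax_info, exec_output)
            if not buggy:
                break
        code = operators["code_refiner"](code, exec_output, analysis)
    return {"code": code, "output": exec_output, "buggy": buggy}
```

With trivial stub operators, the loop terminates after one iteration and returns the full-run output, which is what the orchestrator would rank nodes by.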
Algorithm 1 summarizes the workflow of a single node and the input/output of each operator or tool. The debug loop ensures that each node produces functional and evaluated code before being used as a parent node. Additionally, the debug loop records successful and unsuccessful attempts to guide subsequent corrections. As a result, the search process builds upon solutions that remain robust under the variability of industrial EL data.

In summary, the self-evolving mechanism constitutes the core of SEPDD. Triggers start evolution when static pipelines no longer suffice; the graph-based search with top-k and merge explores and consolidates effective variants; the operator pipeline and journal ensure that code is validated and that strong patterns propagate. The system thus keeps learning, from both successful and failed explorations, what works under the current data and task and what does not. These choices address the principal challenges of industrial PV defect inspection: complex defect morphology under low-resolution and noisy EL, distribution shifts across environments, and emerging or rare defect categories. In practice, SEPDD improves detection of reliability-critical defects, reduces the probability of defective modules entering outdoor deployment, and contributes to more stable long-term energy yield, lower O&M and warranty-related costs, and enhanced robustness against production-line variation.

IV. EXPERIMENTS

This section evaluates the proposed SEPDD framework on both public and private industrial EL defect detection datasets. The experiments are designed to verify whether SEPDD can effectively address the key challenges identified in Section II, namely long-tailed label distributions, distribution shifts across inspection environments, and the emergence of new defect categories. To this end, we consider both full-label and
To this end, we consider both full-label and reduced-label regimes, and compare SEPDD against AIDE and expert-based baselines.

A. Experimental Setup

1) Datasets and splits: We use two EL defect detection benchmarks. The PVEL-AD [24] (public) dataset comprises 4,050 training images and 450 test/validation images, whereas the EF industrial (private) dataset comprises 814 training images and 90 test/validation images. The label distributions are shown in Fig. 3. Both datasets are in YOLO format with multi-class defect annotations. Compared with the public benchmark, the industrial dataset exhibits more heterogeneous panel formats, lower image resolution, and a substantially more severe long-tailed class distribution, making it a more challenging testbed for evaluating robustness under industrial conditions.

[Fig. 3: Defect distribution. PVEL-AD: black_core 13.2%, crack 16.2%, finger 38.0%, horizontal_dislocation 10.2%, short_circuit 6.3%, thick_line 12.6%, vertical_dislocation 1.8%, star_crack 1.7%. EF industrial: microcracks 75.1%, lightAndShade 9.0%, blackSlice 10.7%, blackEdging 1.7%, brokenGridLines 0.9%, blackSpots 2.6%.]

2) Baselines: We compare three baselines. AIDE denotes the automatically optimized baseline. Without explicit domain knowledge, AIDE typically selects YOLOv8m [25] (or a comparable mid-size backbone) and trains it using a standard detection pipeline, but without the refined self-evolving search strategy and code stability enhancement introduced in SEPDD. To bracket the range of human-driven approaches, we further consider two expert baselines. Expert-YOLO is a human-expert-tuned detector based on YOLO11 [26] or YOLO12 [27, 28], with the best result retained. It uses the same detector family as our pipeline, but relies on manual model selection and hyperparameter tuning.
Expert-FRCN adopts the Faster R-CNN–based few-shot object detector of [29], serving as a representative public-domain detector transferred to the industrial PV EL scenario. This baseline is included to highlight the difficulty of cross-domain adaptation. Both expert baselines require substantial manual effort, including repeated tuning of architectures, training recipes, and hyperparameters, which SEPDD is designed to reduce.

3) Base model for SEPDD: Code generation uses Qwen3-coder-plus, as in AIDE, to keep the comparison fair. Analysis operators (such as the Idea Generator and Analyzer) use Qwen3.5-plus. Unless otherwise stated, reported results use this Qwen-based setup (SEPDD (Qwen)).

4) Metrics: We adopt standard object detection evaluation metrics: Precision (P), Recall (R), mAP50, and mAP50-95.

B. Quantitative Results and Analysis

1) Public benchmark: As shown in Table I, the public PVEL-AD benchmark is relatively clean and well curated, allowing all methods to achieve competitive performance. AIDE already provides a reasonable automated baseline, and both Expert-YOLO and Expert-FRCN further improve mAP50. SEPDD achieves the highest recall and the highest mAP50 among all compared methods. The substantial recall gain over AIDE suggests that the self-evolving mechanism can capture weak, subtle, or partially annotated patterns that are difficult for a single fixed pipeline to exploit. This result indicates that iterative refinement is beneficial even under relatively idealized conditions.

2) EF industrial dataset: Detection results. Results on the EF industrial dataset are reported in Table II. Compared with the public benchmark, all methods experience a noticeable degradation in absolute performance, which is consistent with the more challenging industrial conditions, including lower resolution, stronger class imbalance, and domain shift.
In particular, Expert-FRCN drops sharply, indicating that a public-domain few-shot detector transfers poorly to private PV EL data without explicit adaptation. This observation is consistent with the challenge of distribution shift discussed in Section I. SEPDD achieves the best overall performance on this dataset, with the largest improvement over AIDE among all methods. Moreover, the gain of SEPDD over AIDE is larger on the industrial dataset than on the public one, suggesting that the benefit of self-evolution becomes more pronounced when the task is harder and the data are more constrained. This result supports the claim that SEPDD is particularly effective under the realistic industrial conditions targeted in this work. The detector has been deployed in EF production settings, reducing the detection time per image from 60 s to 2 s and supporting more than 22,000 devices.

Token usage. On one EF industrial run, SEPDD consumed 1.36M input and 0.23M output tokens (1.59M total); AIDE with 50 nodes consumed 0.88M input and 0.19M output tokens (1.06M total). SEPDD uses roughly 50% more tokens than AIDE in this setting, which reflects the additional cost of the journal, merge, and per-node validation–refinement loop; the shallower search and fewer effective nodes in SEPDD (e.g., 18 nodes with a single primary edge) keep the difference moderate relative to the gain in detection performance and pipeline stability.

3) One-label-added simulation: In real-world industrial settings, we often encounter emerging defect categories of small instance counts (challenge 3). To evaluate the performance

TABLE I: PVEL-AD dataset. Bold: best per column; underline: second.

Method            Precision  Recall  mAP50  mAP50-95
AIDE                   79.9    69.4   76.6      49.4
Expert-YOLO            79.9    80.4   86.7      55.7
Expert-FRCN            75.0    69.7   88.7      63.5
SEPDD (Qwen)           81.9    91.3   91.4      57.2
SEPDD (GPT-5.1)        88.2    84.8   90.3      62.9

TABLE II: EF industrial dataset. Bold: best per column; underline: second.
Method            Precision  Recall  mAP50  mAP50-95
AIDE                   43.4    49.2   44.6      27.7
Expert-YOLO            58.4    45.9   47.0      29.8
Expert-FRCN            38.8    32.9   36.0      18.8
SEPDD (Qwen)           49.2    54.1   49.5      30.7
SEPDD (GPT-5.1)        58.9    48.2   50.9      30.4

in such a setting, we simulate the emergence of a new defect category with only a small number of labeled instances. Specifically, we remove one low-frequency label from the training set and re-run all methods. The results in Tables I and II correspond to the full-label setting. For PVEL-AD, we remove star_crack; for the EF industrial dataset, we remove blackEdging. The results are shown in Fig. 4, where the darker bars correspond to the one-label-removed setting and the lighter bars correspond to the full-label setting.

This experiment directly evaluates the challenge of an evolving label space. On both public and industrial datasets, SEPDD maintains its performance better than the compared baselines when a rare label is removed and later treated as newly introduced. By contrast, AIDE and Expert-FRCN degrade more noticeably, especially in the industrial scenario. This indicates that re-training a fixed pipeline is often insufficient when supervision for a newly appearing category is scarce. Because SEPDD automates hyperparameter search, architecture search, and pipeline refinement within the self-evolving loop, it can adapt to new categories without sacrificing overall robustness. Across both the full-label and one-label-added settings (Tables I and II, Fig. 4), SEPDD consistently matches or outperforms the baselines in mAP50 and mAP50-95. Taken together, these results show that SEPDD improves both detection coverage and adaptability, especially when the task involves strong class imbalance, domain shift, and newly emerging defect categories.

C. Result Visualization and Observation

Visual comparisons complement the quantitative results by showing how different methods behave on representative defect patterns. Fig.
5 compares detection outputs on the public benchmark across eight defect categories, namely finger, star crack, black core, crack, thick line, horizontal dislocation, vertical dislocation, and short circuit. Each row corresponds to one defect type, and each column shows the original image, ground truth, AIDE, Human Expert, and SEPDD. Fig. 6 presents representative industrial examples, including microcracks, light and shade, black spots, broken grid lines, and black slice.

[Fig. 4: One-label-added regime: full-label vs. one-label-removed results per method and metric (P, R, mAP50, mAP50-95) for AIDE, Expert-YOLO, Expert-FRCN, and SEPDD on the PVEL-AD and EF industrial datasets. Darker bars: one-label-removed; lighter bars: full-label.]

On the public dataset, all methods generally identify the major defects correctly. However, SEPDD more often recovers subtle or low-contrast defects that are missed by AIDE and, in some cases, by the expert-tuned models. This is particularly evident for rare categories such as thick line in Fig. 5. On the industrial dataset, the advantage of SEPDD is more pronounced. Under lower resolution and stronger long-tailed imbalance, SEPDD maintains better coverage of weak or relatively ambiguous defects, such as light and shade vs. black slice in Fig. 6, which is consistent with the gains reported in Table II and Fig. 4. These qualitative observations align with the central goal of SEPDD: improving defect coverage and robustness under the same challenges emphasized throughout this paper, namely complex morphology, domain shift, and evolving defect patterns.

D.
Code Strategy and Pipeline Analysis and Insights

The gains of SEPDD over AIDE and the expert baselines arise from the combination of an autonomous training pipeline and the self-evolving mechanism introduced in Section I. The underlying training pipeline, which is shared by AIDE and SEPDD, already incorporates a large knowledge base about PV EL defects and data characteristics. SEPDD further improves this pipeline through iterative self-evolution. In what follows, we first summarize code-level insights and then discuss how the self-evolving search progresses in practice.

[Fig. 5: Detection comparison across eight public defect categories. Each row: one defect type; columns: Original, Ground truth, AIDE, Human Expert (Expert-YOLO), Self-Evolving (SEPDD).]

[Fig. 6: Detection comparison on EF industrial defect types (microcracks, light and shade, black spots, broken grid lines, black slice). Columns: Original, Ground truth, AIDE, Human Expert (Expert-YOLO), Self-Evolving (SEPDD).]

1) Code-level insights: The challenges in Section I (long-tailed and limited data, complex defect morphology, distribution shift) motivate the directions in which the code evolves; the insights below are the resulting design choices that address those challenges.

(i) Long-tailed and limited data. Because industrial EL data are often limited and long-tailed, the search favors configurations that better represent minority defect categories. Classification loss and class imbalance: Setting cls=1.0 and using inverse-square-root-weighted oversampling (rare classes repeated more often, subject to a cap) is the resulting choice; focal loss is not used because it is unsupported in the current framework. Augmentation: Stronger augmentation (e.g., mosaic=0.7, mixup=0.15, degrees=5, translate=0.10, scale=0.5) improves recall and mAP50 when the backbone and learning rate are chosen accordingly.
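The inverse-square-root-weighted oversampling mentioned above can be sketched as follows. This is a hypothetical illustration only: the exact weighting, normalization, and cap that SEPDD's search selected are not given in the paper, and the function name and cap value are invented here.

```python
# Hypothetical sketch of inverse-square-root-weighted oversampling:
# rare classes receive larger repeat factors, subject to a cap.
import math
from collections import Counter

def repeat_factors(class_counts, cap=8.0):
    """Per-class repeat factors proportional to 1/sqrt(count), capped."""
    # Normalize so the most frequent class has repeat factor 1.0.
    max_count = max(class_counts.values())
    factors = {}
    for cls, n in class_counts.items():
        raw = math.sqrt(max_count / n)   # = (1/sqrt(n)) / (1/sqrt(max_count))
        factors[cls] = min(raw, cap)     # cap keeps rare classes from exploding
    return factors

# Illustrative long-tailed counts (not the actual dataset statistics):
counts = Counter({"microcracks": 9000, "blackSlice": 1300, "blackEdging": 200})
f = repeat_factors(counts)
```

With these counts, the dominant class keeps a factor of 1.0 while the rarest class is repeated roughly sqrt(45) ≈ 6.7 times, staying under the cap.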
Optimizer and learning rate: AdamW with a lower initial learning rate (lr0=8e-4) and cosine decay pairs well with a larger backbone and stronger augmentation. Backbone: A larger backbone (e.g., YOLO11l) helps only when combined with a matching augmentation and optimization recipe; model size alone does not guarantee improvement.

(ii) Complex morphology and distribution shift. Because PV EL imagery exhibits complex defect morphology (small, low-contrast defects such as microcracks and gridline breaks) and imaging conditions vary across environments, the search converges to PV-specific and robustness-oriented choices. Small defects: Overly strong augmentation can displace or distort tiny targets; effective configurations use augmentation strong enough to improve generalization without degrading small structures. Panel structure: Moderate rotation and translation keep defects on-panel and preserve panel layout, so pipelines remain valid when module geometries and imaging conditions vary. Resolution: An input size of 640 works well with the selected augmentation; larger resolution or tile-based inference with NMS can help for very high-resolution imagery. Localization: A stronger box loss and label smoothing improve bounding-box quality and confidence calibration when defect shapes and contrasts vary widely.

(iii) What did not work.
The search correctly avoids the following: focal loss (fl_gamma, fl_alpha unsupported; training fails when enabled); very high input resolution (e.g., 896) or excessively strong mosaic/mixup, which did not outperform the selected configuration; and using a larger backbone without an appropriate learning rate and augmentation strategy.

2) Self-evolving progress: When merging or distilling information across the search graph, SEPDD keeps only the primary edge, namely the best parent-child lineage along the current best branch; see Section I for the complete evolution pipeline. Fig. 7 shows the evolution tree from one run on the EF industrial setting, where the search expands from the root through two first-level branches (Nodes 1 and 2) and then continues along the strongest-performing branch, with the associated metrics (mAP50-95 and mAP50) recorded at each node.

[Fig. 7: Evolution tree from one EF industrial run (code-like view). Each node is annotated with its mAP50-95 and mAP50 (e.g., Node 1: 0.2764/0.4433, Node 6: 0.2970/0.4578, Node 10: 0.3069/0.4954); Node 11 has no valid metrics. Bold: best branch 0 → 1 → 6 → 10.]

The best mAP50 in the run is achieved at Node 10 (0.4954), which also has the best mAP50-95 (0.3069) among valid nodes; the primary edge is therefore 0 → 1 → 6 → 10. Node 11 is invalid (no valid metrics) and is not used in the primary edge. The branch under Node 2 (Nodes 3, 4, 7, 8) does not surpass the best branch.
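Primary-edge extraction of this kind can be sketched as picking the best valid node and walking parent links back to the root. The node layout below is an assumption for illustration (the paper only describes the behavior, not the data structure), and the parent links beyond the reported best branch are guessed.

```python
# Hypothetical sketch of primary-edge extraction: select the best valid
# node by mAP50 and walk parent links back to the root.

def primary_edge(nodes):
    """nodes: dict id -> {"parent": id or None, "mAP50": float or None}."""
    # Invalid nodes (no metrics) are excluded from best-node selection.
    valid = {i: n for i, n in nodes.items() if n["mAP50"] is not None}
    best = max(valid, key=lambda i: valid[i]["mAP50"])
    path = [best]
    while nodes[path[-1]]["parent"] is not None:
        path.append(nodes[path[-1]]["parent"])
    return list(reversed(path))  # root -> ... -> best node

# Subset of the run described above; parent links assumed for illustration.
tree = {
    0:  {"parent": None, "mAP50": None},    # root, no metrics of its own
    1:  {"parent": 0,    "mAP50": 0.4433},
    6:  {"parent": 1,    "mAP50": 0.4578},
    10: {"parent": 6,    "mAP50": 0.4954},
    11: {"parent": 10,   "mAP50": None},    # invalid node, excluded
    2:  {"parent": 0,    "mAP50": 0.4325},
}
```

On this subset, the walk recovers the reported primary edge 0 → 1 → 6 → 10 while skipping the invalid Node 11.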
Overall, the run produces 18 nodes, and the best configuration corresponds to Node 10. The detector and metrics reported in the main tables (e.g., Table II) correspond to the best node selected from such a run. This behavior highlights several important properties of SEPDD. First, the journal and merge operations improve stability by reducing search drift. Second, the framework can achieve strong performance gains with a relatively shallow tree, which improves search efficiency. Third, the framework naturally incorporates domain knowledge through updated knowledge inputs, task-specific triggers, and specialized operators for PV EL inspection. Finally, the overall design substantially reduces the repeated learn-tweak-try cycle required by manual expert tuning. Owing to its design, AIDE [21] may produce a much larger solution tree, often involving on the order of 50 nodes or more, with many branches devoted to drafting, debugging, and refining code. If not carefully constrained, AIDE-generated solutions may also exploit shortcuts that improve reported metrics without delivering commensurate gains in actual defect detection performance. In contrast, SEPDD records only workable nodes and achieves strong results with fewer nodes and a shallower search tree, which is consistent with its design objective of stable and efficient self-evolution.

V. CONCLUSION

This paper presented SEPDD, a self-evolving photovoltaic defect detection framework for industrial electroluminescence (EL) inspection. SEPDD was developed to address key challenges in industrial PV defect inspection, including limited and long-tailed training data, complex defect morphology under low-resolution imaging, persistent distribution shifts, and newly emerging defect categories. By combining automated model optimization with a trigger-driven self-evolving mechanism, the proposed framework enables robust and adaptive defect detection with minimal human intervention.
Experiments on both a public benchmark and a private industrial EL dataset showed that SEPDD consistently improves the detection of reliability-critical and low-contrast defects, particularly under severe class imbalance and domain shift. The results further indicate that SEPDD adapts more effectively to evolving defect categories than fixed training pipelines and manually tuned baselines. By improving the robustness and maintainability of industrial PV inspection, SEPDD supports more reliable defect screening, reduced maintenance risk, and more stable long-term energy yield of PV systems. Future work will extend the framework to multimodal inspection and fleet-level PV reliability management.

REFERENCES

[1] E. Özkalay, H. Quest, A. Gassner, A. Virtuani, G. C. Eder, C. Buerhop-Lutz, G. Friesen, and C. Ballif, “Three decades, three climates: environmental and material impacts on the long-term reliability of photovoltaic modules,” EES Solar, 2025.
[2] Y. Tang, S. Poddar, M. Kay, and F. Rougieux, “Understanding and reducing the risk of extreme photovoltaic degradation,” IEEE Journal of Photovoltaics, 2025.
[3] Q. Liu, M. Liu, C. Wang, and Q. M. J. Wu, “An efficient cnn-based detector for photovoltaic module cells defect detection in electroluminescence images,” Solar Energy, vol. 266, p. 112245, 2023.
[4] M. W. Akram, J. Bai, C. Xuan, X. Xu, J. Hu, and S. Wu, “Advancing photovoltaic cells defect detection in electroluminescence images through exploring multiple object detectors,” Solar Energy Materials and Solar Cells, vol. 285, p. 113777, 2025.
[5] J. P. C. Barnabé, L. P. Jiménez, G. Fraidenraich, E. R. D. De Lima, and H. F. Santos, “Quantification of damages and classification of flaws in mono-crystalline photovoltaic cells through the application of vision transformers,” IEEE Access, 2023.
[6] S. Chen, Y. Lu, G. Qin, and X.
Hou, “Polycrystalline silicon photovoltaic cell defects detection based on global context information and multi-scale feature fusion in electroluminescence images,” Materials Today Communications, vol. 42, p. 110627, 2024.
[7] D. Lang and Z. Lv, “A pv cell defect detector combined with transformer and attention mechanism,” Scientific Reports, vol. 14, p. 72019, 2024.
[8] H. He and E. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
[9] Y. Cui, M. Jia, T. Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in CVPR, 2019, pp. 9268–9277.
[10] J. Tan, X. Lu, G. Zhang, and J. Yin, “Equalization loss for long-tailed object recognition,” in CVPR, 2020, pp. 11662–11671.
[11] J. Deitsch, V. Christlein, S. Berger et al., “Automatic classification of defective photovoltaic module cells in electroluminescence images,” Solar Energy, vol. 185, pp. 455–468, 2019.
[12] J. Wang, L. Bi, P. Sun et al., “Deep-learning-based automatic detection of photovoltaic cell defects in electroluminescence images,” Sensors, vol. 23, no. 1, p. 297, 2022.
[13] C. Del Pero, N. Aste, F. Leonforte, and F. Sfolcini, “Long-term reliability of photovoltaic c-si modules – a detailed assessment based on the first italian bipv project,” Solar Energy, vol. 274, p. 112074, 2023.
[14] A. Kumar, H. Ganesan, V. Saini, S. Sharma, and A. Agrawal, “An assessment of photovoltaic module degradation for life expectancy: A comprehensive review,” Engineering Failure Analysis, vol. 152, p. 107863, 2023.
[15] M. Aghaei, A. Fairbrother, A. Gok et al., “Review of degradation and failure phenomena in photovoltaic modules,” Renewable and Sustainable Energy Reviews, vol. 149, p. 112160, 2022.
[16] J. Kim, M. Rabelo, S. Padi et al., “A review of the degradation of photovoltaic modules for life expectancy,” Energies, vol. 14, no. 14, p. 4278, 2021.
[17] Y. Zhang, B. Kang, B.
Hooi et al., “Deep long-tailed learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[18] M. Wang and W. Deng, “Deep domain adaptation for machine learning: A survey,” Neurocomputing, vol. 338, pp. 78–94, 2020.
[19] G. Parisi, R. Kemker, J. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019.
[20] T. Elsken, J. Metzen, and F. Hutter, “Neural architecture search: A survey,” Journal of Machine Learning Research, vol. 20, pp. 1–21, 2019.
[21] Z. Jiang, D. Schmidt, D. Srikanth, D. Xu, I. Kaplan, D. Jacenko, and Y. Wu, “Aide: Ai-driven exploration in the space of code,” arXiv preprint arXiv:2502.13138, 2025.
[22] E. Toledo, K. Hambardzumyan, M. Josifoski, R. Hazra, N. Baldwin, A. Audran-Reiss, M. Kuchnik, D. Magka, M. Jiang, A. M. Lupidi et al., “Ai research agents for machine learning: Search, exploration, and generalization in mle-bench,” arXiv preprint arXiv:2507.02554, 2025.
[23] S. Du, X. Yan, D. Jiang, J. Yuan, Y. Hu, X. Li, L. He, B. Zhang, and L. Bai, “Automlgen: Navigating fine-grained optimization for coding agents,” arXiv preprint arXiv:2510.08511, 2025.
[24] B. Su, Z. Zhou, and H. Chen, “Pvel-ad: A large-scale open-world dataset for photovoltaic cell anomaly detection,” IEEE Transactions on Industrial Informatics, vol. 19, no. 1, pp. 404–413, 2022.
[25] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics yolov8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[26] G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
[27] Y. Tian, Q. Ye, and D. Doermann, “Yolo12: Attention-centric real-time object detectors,” 2025. [Online]. Available: https://github.com/sunsmarterjie/yolov12
[28] Y. Tian, Q. Ye, and D. Doermann, “Yolo12: Attention-centric real-time object detectors,” arXiv preprint arXiv:2502.12524, 2025.
[29] X. Wang, T. E. Huang, T. Darrell, J. E. Gonzalez, and F.
Yu, “Frustratingly simple few-shot object detection,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 9919–9928.