EU AI Act Article 15: Accuracy and Robustness Guide

Accuracy, robustness, and cybersecurity are not aspirational properties. Under Article 15, they are legal requirements, and the deployer's ongoing monitoring of declared metrics is an enforceable duty. Understanding what the provision actually mandates is the prerequisite for building a monitoring regime that satisfies it.

Key takeaways

Article 15 requires high-risk AI systems to achieve appropriate accuracy, robustness, and cybersecurity throughout their lifecycle. The provider declares the metrics; the deployer monitors performance against them continuously.
Robustness under Article 15(4) means resilience against errors, faults, and inconsistencies arising within the system or its operating environment. Resilience against adversarial manipulation falls under the cybersecurity requirement in Article 15(5), which names data poisoning explicitly; for AI agents, prompt injection belongs to the same category of attack.
Article 26(5) converts the Article 15 technical requirement into a deployer monitoring obligation. Performance degradation below declared metrics is a risk management event under Article 9 and triggers the duty to inform the provider in accordance with Article 72.
Insurers evaluating AI liability cover treat declared accuracy metrics and monitoring evidence as core underwriting criteria. Munich Re's aiSure product settles parametrically on accuracy metric breaches, making the declared metrics a direct financial instrument, not only a compliance artefact.

The structure of Article 15

Article 15 comprises five paragraphs. Read together, they establish a framework for the technical performance of high-risk AI systems that runs from initial design through deployment and into the full operational life of the system.

Article 15(1) sets the foundational principle: high-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and perform consistently in those respects throughout their lifecycle. The word "appropriate" introduces proportionality, and it is read against the system's intended purpose: what is appropriate for a medical diagnostic system differs from what is appropriate for a recruitment screening tool. Both are subject to the requirement. The standard adjusts; the obligation does not disappear.

Article 15(3) requires that the levels of accuracy and the relevant accuracy metrics be declared in the accompanying instructions for use, whose content is governed by Article 13. The same metrics belong in the technical documentation required under Article 11. This is the mechanism by which an abstract performance requirement becomes a measurable compliance criterion. The deployer's monitoring obligation under Article 26(5) attaches to those declared metrics.

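Article 15(2) tasks the Commission with encouraging the development of benchmarks and measurement methodologies for accuracy and robustness. Article 15(4) addresses robustness, including technical redundancy solutions. Article 15(5) addresses cybersecurity, with specific attention to adversarial attacks. The substantive requirements are examined separately below.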

Accuracy: declared metrics and the deployer's monitoring duty

The term "accuracy" in Article 15 is used in the statistical sense: it refers to the degree to which the system's outputs correspond to the correct or expected results for a given set of inputs, as determined by the evaluation methodology chosen by the provider during conformity assessment. The AI Act does not prescribe a universal accuracy metric. It requires the provider to choose and declare the metrics appropriate to the system's intended purpose.

For a machine learning classification system, accuracy metrics typically include measures such as precision, recall, F1 score, and area under the receiver operating characteristic curve. For a regression model, they might include mean absolute error or root mean square error. For a generative AI system used in a high-risk context, accuracy metrics are more difficult to specify but are not thereby optional. The provider must find a measurable proxy and declare it.
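As a rough illustration of what a declared metric looks like once it has to be computed, the sketch below derives precision, recall, and F1 from binary labels, and mean absolute error and root mean square error from numeric predictions. The function names and sample data are illustrative assumptions; the provider's declared evaluation methodology, not this sketch, defines the authoritative calculation.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean square error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}
```

Whichever metrics are declared, the deployer needs to be able to reproduce the same calculation on operational data, which is why the definition of the positive class and the averaging method should be written down alongside the number.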

The distinction between machine learning systems and rule-based systems is relevant here. Systems that rely primarily on explicit logical rules encoded by developers behave deterministically: the same input produces the same output every time, and the accuracy of the rule set can be evaluated through structured testing. Machine learning systems, particularly those using neural networks, behave statistically: performance depends on how well the training distribution covers the inputs seen in operation, and accuracy can degrade as the operational distribution drifts away from the training distribution. Recital 12 of the AI Act refers to both machine learning approaches and logic- and knowledge-based approaches, and the distinction has a direct consequence for monitoring. A rule-based system may need only periodic review of the rule set for continued accuracy. A machine learning system requires continuous distribution monitoring to detect accuracy drift before it becomes a compliance failure.
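One way to make "continuous distribution monitoring" concrete is a drift statistic computed per input feature. The sketch below uses the Population Stability Index (PSI), which is a common industry choice rather than anything the Regulation prescribes. The function name, the equal-width binning, and the smoothing constant are assumptions made for illustration.

```python
import math


def population_stability_index(reference, live, bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one numeric feature.

    Bins are equal-width over the reference range; live values outside
    that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps the logarithm defined when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift. Those cut-offs are conventions, not legal thresholds, and a deployer would need to justify whichever values it adopts.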

Article 26(5) is the provision that converts the provider's accuracy declaration into a deployer obligation. Deployers must monitor the operation of the high-risk AI system on the basis of the instructions for use and inform the provider when the system is not meeting the performance indicators described in those instructions. The word "monitor" is used in its operational sense. It implies instrumentation, data collection, threshold setting, and a process for acting when thresholds are breached. A deployer who accepts a system, puts it into production, and does not measure whether the declared metrics are being achieved is in breach of Article 26(5) from the first day of operation.
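The threshold-setting step described above can be sketched as a comparison between declared and measured values. The class and function names below are hypothetical, and the sketch assumes each declared metric comes with a direction (higher is better, or lower is better). A metric that was not measured at all is reported as a breach, on the assumption that an unmeasured metric cannot be shown to be met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredMetric:
    name: str
    threshold: float
    higher_is_better: bool = True


def find_breaches(declared, measured):
    """Return (name, measured_value, threshold) for every unmet metric."""
    breaches = []
    for metric in declared:
        value = measured.get(metric.name)
        if value is None:
            # No measurement means no evidence the metric is being met.
            breaches.append((metric.name, None, metric.threshold))
            continue
        if metric.higher_is_better:
            unmet = value < metric.threshold
        else:
            unmet = value > metric.threshold
        if unmet:
            breaches.append((metric.name, value, metric.threshold))
    return breaches
```

A non-empty result is what should start the provider notification and review process; the comparison itself carries no legal weight.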

When monitoring reveals that declared metrics are not being met, the deployer has two immediate obligations: notify the provider, and treat the event as a risk management input under Article 9. The risk management system required by Article 9 must identify, analyse, evaluate, and address known and reasonably foreseeable risks throughout the lifecycle of the system. An accuracy degradation event is precisely the type of risk that the Article 9 system is designed to handle. It should trigger a documented review, a determination of root cause, a decision on whether continued operation is appropriate, and, where a serious incident has occurred, notification to the market surveillance authority under Article 26(5).

Robustness: performance under non-ideal conditions

Article 15(4) requires that high-risk AI systems be as resilient as possible regarding errors, faults, or inconsistencies that may occur within the system or the environment in which it operates, in particular due to their interaction with natural persons or other systems. The operative idea is performance under non-ideal conditions. No deployed AI system operates only on inputs that precisely match its training distribution. Robustness is the property that determines how the system performs when conditions deviate from that ideal.

Article 15 distinguishes two categories of deviation. The first is incidental: ordinary errors, sensor faults, data quality problems, integration inconsistencies. These arise from the imperfect reality of operational environments and fall under Article 15(4). The second is adversarial: inputs that are specifically designed to cause the system to behave in ways that diverge from its intended purpose. These fall under the cybersecurity requirement in Article 15(5). Read together, the two paragraphs cover the full spectrum from benign data quality issues to deliberate manipulation.

For deployers, the robustness requirement has two practical dimensions. The first is input validation. The system must include, or the deployer must implement, controls that detect inputs outside the expected distribution and handle them gracefully, either by refusing to process them, by flagging them for human review, or by producing a conservative output that acknowledges uncertainty. A system that produces confident outputs in response to out-of-distribution inputs is not robust in the Article 15(4) sense, regardless of its accuracy on standard benchmarks.
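The three handling options named above (refuse, flag for human review, accept) can be sketched as a routing function placed in front of the model. Range checks on named features are used here as a deliberately simple stand-in for out-of-distribution detection; the feature names, ranges, and confidence floor are assumptions, and a production system would likely need a stronger detector.

```python
from enum import Enum


class Route(Enum):
    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    REFUSE = "refuse"


def route_input(features, expected_ranges, confidence, min_confidence=0.8):
    """Decide how to handle one input before its output is used.

    expected_ranges maps feature name -> (low, high) observed during validation.
    """
    missing = [name for name in expected_ranges if name not in features]
    if missing:
        return Route.REFUSE, f"missing features: {missing}"
    outside = [
        name for name, (low, high) in expected_ranges.items()
        if not low <= features[name] <= high
    ]
    if outside:
        return Route.HUMAN_REVIEW, f"outside expected range: {outside}"
    if confidence < min_confidence:
        return Route.HUMAN_REVIEW, "model confidence below floor"
    return Route.ACCEPT, "within expected operating conditions"
```

Returning a reason string alongside the route gives the monitoring records something to log, so that the share of refused or escalated inputs can itself be tracked over time.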

The second dimension is degradation mode. Article 15(4) provides that robustness may be achieved through technical redundancy solutions, including backup or fail-safe plans, so that incidental faults do not translate into harm. This is sometimes described as graceful degradation. A high-risk AI system that fails silently, continuing to produce outputs while internally operating in a degraded state, does not satisfy the provision. The system must expose its operational state in a way that allows the oversight persons assigned under Article 14 and Article 26(2) to detect degraded performance and act on it.
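Exposing operational state can be as simple as a sliding-window health indicator that oversight staff can read. The sketch below is one possible shape, not a prescribed design: the class name, window size, and the two rate thresholds are assumptions chosen for illustration.

```python
from collections import deque


class HealthMonitor:
    """Tracks recent request outcomes and reports a coarse operational state."""

    def __init__(self, window=50, degraded_at=0.05, failed_at=0.25):
        self.outcomes = deque(maxlen=window)  # True = request handled correctly
        self.degraded_at = degraded_at
        self.failed_at = failed_at

    def record(self, ok):
        self.outcomes.append(bool(ok))

    def state(self):
        if not self.outcomes:
            return "unknown"
        failure_rate = self.outcomes.count(False) / len(self.outcomes)
        if failure_rate >= self.failed_at:
            return "failed"
        if failure_rate >= self.degraded_at:
            return "degraded"
        return "normal"
```

The point of the design is that the state is queryable from outside the system, so a human overseer does not have to infer degradation from the outputs alone.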

Cybersecurity: adversarial attacks and the agent-specific risk of prompt injection

Article 15(5) is the cybersecurity paragraph. It requires that high-risk AI systems be resilient against attempts by unauthorised third parties to alter their use, outputs, or performance by exploiting system vulnerabilities. The paragraph then lists the AI-specific attacks that technical solutions must address where appropriate: manipulation of the training data set (data poisoning), manipulation of pre-trained components used in training (model poisoning), inputs designed to cause the model to make a mistake (adversarial examples or model evasion), confidentiality attacks, and model flaws.

Data poisoning refers to the contamination of training or fine-tuning data by an adversary who inserts samples that cause the model to learn incorrect associations or to behave differently on specific trigger inputs. It is the attack vector most relevant to providers during the development phase, but deployers who fine-tune a base model or who provide feedback data that influences subsequent model updates are within the poisoning threat surface.

Adversarial examples are inputs that have been crafted to cause the model to produce an incorrect or unintended output, typically by adding imperceptible perturbations to otherwise normal inputs. For vision systems, the classical example is image perturbations that cause misclassification with high confidence. For language models, the equivalent is adversarial text designed to elicit responses outside the model's intended scope or to override instructions.

For AI agents specifically, the most operationally significant attack vector under Article 15(5) is prompt injection. Prompt injection is a technique in which adversarial content embedded in the environment that the agent is processing, whether a document, a web page, a database record, or a message, contains instructions that the agent interprets as authoritative commands and acts on. The effect is that an unauthorised third party can alter the agent's behaviour without direct access to the model or its configuration. Because AI agents operating in high-risk contexts typically process external content as part of their task, the prompt injection attack surface is significant and must be addressed as part of the Article 15(5) cybersecurity requirement.

The practical compliance requirement for deployers is to verify, before putting an AI agent into service, that the provider has specified the system's resilience against prompt injection in the technical documentation, and that the deployer's operational environment includes controls to reduce the injection attack surface. These controls include input sanitisation, privilege separation between the agent's action scope and the content it processes, and monitoring for anomalous action sequences that may indicate a successful injection. For more on the certification of performance and resilience properties, the Agent Certified framework provides a structured evaluation methodology for AI systems deployed in regulated contexts.
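Privilege separation between what the agent reads and what it may do can be sketched as an authorisation check in front of every tool call. The action names and the "tainted session" flag below are hypothetical; the idea is that once the agent has processed unverified external content, actions with side effects require human approval, and unknown actions are denied by default.

```python
# Hypothetical action names; a real deployment would list its own tools.
READ_ONLY = {"read_record", "summarise"}
SIDE_EFFECTS = {"send_email", "update_record", "approve_payment"}


def authorise(action, session_tainted, human_approved=False):
    """Return True if the agent may execute `action`.

    session_tainted means the agent has processed external, unverified
    content (a web page, an uploaded document) during this session.
    """
    if action in READ_ONLY:
        return True
    if action in SIDE_EFFECTS:
        return (not session_tainted) or human_approved
    return False  # deny anything not explicitly listed


def filter_plan(plan, session_tainted):
    """Split a proposed action sequence into allowed and blocked actions."""
    allowed = [a for a in plan if authorise(a, session_tainted)]
    blocked = [a for a in plan if not authorise(a, session_tainted)]
    return allowed, blocked
```

A non-empty blocked list is also a useful monitoring signal: a sudden rise in blocked side-effecting actions after the agent reads a particular source is the kind of anomalous sequence that may indicate an injection attempt.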

Technical redundancy and the connection to Article 9 risk management

Article 15(4) provides that the robustness of high-risk AI systems may be achieved through technical redundancy solutions, which may include backup or fail-safe plans. The permissive wording is proportionality language. Not every high-risk AI system requires full redundancy. The assessment of what is appropriate sits within the risk management system required by Article 9.
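A fail-safe plan can be illustrated as a wrapper that falls back to a conservative handler when the primary system raises an error, and that labels each result with the mode that produced it. The function and key names are assumptions for illustration; what counts as a safe fallback depends entirely on the use case.

```python
def with_fail_safe(primary, fail_safe, on_fallback=None):
    """Wrap `primary` so that any exception routes the request to `fail_safe`."""

    def call(request):
        try:
            return {"output": primary(request), "mode": "primary"}
        except Exception as exc:
            if on_fallback is not None:
                on_fallback(exc)  # e.g. write to the risk register or alert staff
            return {"output": fail_safe(request), "mode": "fail_safe"}

    return call
```

Tagging the result with its mode keeps the fallback visible: downstream users and oversight staff can see that an output came from the backup path rather than from the model.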

The connection between Article 15 and Article 9 is structural. Article 9 requires a continuous, iterative risk management process to be maintained throughout the lifecycle of the high-risk AI system. The obligation to establish that system rests with the provider; deployers feed it through their monitoring and notification duties. The three properties addressed by Article 15 are precisely the properties whose degradation creates the risks that Article 9 is designed to manage. An accuracy failure is a risk event. A robustness failure is a risk event. A successful cybersecurity attack is a risk event. Each must flow into the Article 9 risk register, trigger the appropriate evaluation procedure, and result in documented risk treatment decisions.

Post-market monitoring under Article 72 is the mechanism that closes the loop. Article 72 requires providers of high-risk AI systems to actively collect and review data on the performance of their systems after they are placed on the market or put into service. This includes data on accuracy, robustness, and cybersecurity incidents. Deployers feed into this process by notifying the provider, as required by Article 26(5), when they observe performance deviations. The provider then integrates that information into the post-market monitoring plan and updates the risk management system accordingly. The deployer is not merely a passive user in this framework. It is a data source for the provider's lifecycle compliance obligations.

For the full picture of how Article 15 fits within the deployer's compliance obligations, the Article 9 risk management guide sets out the structure and documentation requirements of the risk management system in detail.

Documentation checklist: what a deployer must have in place

Compliance with Article 15 and the associated deployer obligations under Article 26(5) requires a coherent set of documents. The minimum is a performance monitoring register, an incident response procedure for accuracy degradation events, and a cybersecurity controls log. Each is described below.

Performance monitoring register

The performance monitoring register records the accuracy metrics declared by the provider, the monitoring methodology the deployer uses to track those metrics, the frequency of measurement, the thresholds at which the monitoring system triggers a review, and the actual measurements over time. It should be kept as a live document and updated at every measurement cycle. The register is the primary evidence of compliance with Article 26(5). A deployer who cannot produce it during a market surveillance inspection has no way to demonstrate that monitoring was occurring.

The register must also record the baseline conditions under which the declared metrics were established. If the provider declared accuracy metrics on a specific benchmark dataset or under specific operational conditions, the monitoring methodology must be designed to test comparable conditions. Measuring performance on a different distribution and comparing it to the declared metrics produces a meaningless compliance record.
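The fields listed above map naturally onto a small data structure. The sketch below is one possible layout, with hypothetical class and field names, and it assumes a metric where higher values are better. It serialises to JSON so that the register can be exported for an inspection.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Measurement:
    measured_on: str   # ISO 8601 date
    value: float
    sample_size: int


@dataclass
class RegisterEntry:
    metric: str
    declared_value: float
    review_threshold: float
    baseline_conditions: str   # dataset or conditions the provider declared against
    frequency_days: int
    measurements: list = field(default_factory=list)

    def record(self, measurement):
        self.measurements.append(measurement)

    def latest_needs_review(self):
        """True if the most recent measurement is below the review threshold."""
        return bool(self.measurements) and (
            self.measurements[-1].value < self.review_threshold
        )

    def to_json(self):
        return json.dumps(asdict(self), indent=2)
```

Keeping the baseline conditions in the same record as the measurements makes it harder to compare a live figure against a declared value that was established on a different distribution.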

Incident response procedure for accuracy degradation

The incident response procedure specifies what the deployer does when the performance monitoring register shows that the system is not meeting declared metrics. It must define the thresholds that constitute a breach, the person or role responsible for reviewing the breach, the procedure for notifying the provider under Article 26(5), the criteria for determining whether continued operation is appropriate pending investigation, and the escalation path that leads to notification of the market surveillance authority if the breach involves a serious incident as defined in Article 3(49).

The procedure should be short enough to be usable under operational pressure. A five-page flowchart that requires three levels of approval before a notification can be sent is not operationally effective. The decision logic for the most common scenarios should fit on a single reference card that the monitoring personnel can access immediately.
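The reference-card idea can be sketched as a function that turns two yes/no answers into an ordered list of steps. The step names are hypothetical labels, and the logic only mirrors the procedure described above; it is not a substitute for legal assessment of whether an event is a serious incident.

```python
def triage(metric_breached, serious_incident):
    """Return the ordered steps for the monitoring staff to follow."""
    if not metric_breached and not serious_incident:
        return ["continue_monitoring"]
    steps = [
        "record_in_register",
        "assign_reviewer",
        "notify_provider",
        "decide_on_continued_operation",
    ]
    if serious_incident:
        steps.append("notify_market_surveillance_authority")
    return steps
```

Because the function has only two inputs and four possible outcomes, the whole decision table fits on a single card, which is the property the procedure needs under operational pressure.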

Cybersecurity controls log

The cybersecurity controls log records the controls the deployer has implemented to address the Article 15(5) attack vectors: data poisoning prevention (if relevant to the deployment context), input validation and sanitisation, prompt injection mitigation for AI agents, anomaly detection configuration, and incident response for suspected attacks. It should record the date each control was implemented, the specification it implements, and the review schedule.

For AI agents operating in environments where prompt injection is a plausible attack vector, the cybersecurity controls log should specifically record how the deployer has addressed the injection risk: what content sources the agent processes, what privilege controls limit the agent's action scope in response to injected instructions, and how anomalous action sequences are detected and escalated.
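A controls log with a review schedule is only useful if overdue reviews surface automatically. The sketch below shows one way to structure an entry and list the controls whose review date has passed. The class, field names, and sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Control:
    name: str
    attack_vector: str
    implemented_on: date
    specification: str
    review_every_days: int
    last_reviewed: date

    def next_review(self):
        return self.last_reviewed + timedelta(days=self.review_every_days)


def overdue(controls, today):
    """Names of controls whose scheduled review date has already passed."""
    return [c.name for c in controls if c.next_review() < today]
```

Passing the current date in as a parameter, rather than reading the system clock inside the function, keeps the check reproducible when the log is audited later.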

The insurance dimension: accuracy metrics as underwriting instruments

The connection between Article 15 compliance and AI insurance coverage is closer than the compliance framing suggests. Insurers writing AI liability cover in the European market are evaluating the same properties that Article 15 mandates, because those properties are direct predictors of claim frequency and severity.

Munich Re's aiSure product, the most structurally advanced AI performance insurance product currently available in the European market, includes parametric trigger provisions that settle on accuracy metric breaches without requiring the insured to prove a causal chain to a downstream harm. The mechanism is straightforward: the policy schedule specifies the accuracy threshold and the measurement methodology; when the monitoring register shows that the threshold has been breached for a defined period, the parametric payout is triggered. The Article 15(3) accuracy declaration is therefore not only a compliance artefact. It is the contractual foundation of the insurance product. An insured who cannot produce a clear accuracy declaration and a monitoring register is not insurable under that product structure.
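The general shape of a "breached for a defined period" condition can be sketched as a run-length check over the measurement series. This is a generic illustration only: the function name and parameters are assumptions, and it does not reproduce the terms of any actual policy.

```python
def parametric_trigger(values, threshold, min_consecutive):
    """Index of the measurement at which the trigger fires, or None.

    The trigger fires once `min_consecutive` measurements in a row
    fall below `threshold`; a single recovery resets the count.
    """
    run = 0
    for index, value in enumerate(values):
        run = run + 1 if value < threshold else 0
        if run >= min_consecutive:
            return index
    return None
```

Requiring consecutive breaches rather than a single one filters out measurement noise, which is why the measurement methodology and frequency matter as much as the threshold itself.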

The AIUC-1 standard, the emerging AI underwriting criteria framework referenced by European insurers active in the AI liability market, treats cybersecurity resilience as a separate rating factor alongside accuracy and robustness. A system with strong declared accuracy metrics but no documented cybersecurity controls for adversarial attacks receives a higher risk loading. The Article 15(5) cybersecurity controls log described above is the evidence base that addresses that rating factor.

For deployers preparing both compliance documentation and insurance coverage discussions, the most efficient approach is to treat the Article 15 documentation package as the primary evidence base and map it to the underwriting questionnaire, rather than maintaining separate compliance and insurance records. The questions are the same. Only the framing differs. For further context on how certification of AI performance properties connects to coverage eligibility, see agentcertified.eu.

For the full set of deployer obligations under Article 26, including how the Article 26(5) monitoring duty sits within the broader compliance architecture, the Article 26 complete guide provides the systematic treatment. For the human oversight requirements that complement the Article 15 performance requirements, the Article 14 guide addresses staffing, authority, and documentation.

Frequently asked questions

What does Article 15 of the EU AI Act require from high-risk AI systems?

Article 15 of Regulation (EU) 2024/1689 requires that high-risk AI systems achieve an appropriate level of accuracy, robustness, and cybersecurity throughout their lifecycle. The provider must declare accuracy metrics in the instructions for use and technical documentation. Deployers must monitor performance against those declared metrics on an ongoing basis under Article 26(5). Accuracy failures are risk management events under Article 9.

What is the difference between accuracy and robustness under Article 15?

Accuracy refers to the correctness of outputs relative to the ground truth the system was designed to approximate, expressed as measurable metrics declared by the provider. Robustness refers to the system's resilience under non-ideal conditions: errors, faults, inconsistencies, and adversarial or unexpected inputs that were not in the training distribution. A system can be accurate on standard inputs and non-robust against edge cases or adversarial manipulation. Both properties must be maintained throughout the operational lifecycle.

Does Article 15 apply to AI agents and does it cover prompt injection?

Yes. Article 15(5) requires that high-risk AI systems be resilient against attempts by unauthorised third parties to alter their use, outputs, or performance. For AI agents that receive and process natural language instructions, this includes prompt injection attacks, where adversarial content in the environment is crafted to alter the agent's action sequence. The obligation applies to the full input surface, not only to numerical or structured data inputs.

What is the deployer's monitoring obligation under Article 26(5) in relation to Article 15?

Article 26(5) requires deployers to monitor the operation of the high-risk AI system on the basis of the instructions for use and to inform the provider when accuracy metrics are not being met. This is a continuous obligation, not a one-time acceptance check. The deployer must establish a monitoring regime that tracks the system's performance against the provider's declared accuracy metrics, logs deviations, and triggers the notification and risk management procedures required under Articles 9 and 72 when performance degrades.

How does the Digital Omnibus affect Article 15 compliance timelines?

The Digital Omnibus on AI, currently in trilogue as of June 2026, proposes to delay the high-risk obligations from 2 August 2026 to 2 December 2027 for most high-risk AI systems listed in Annex III. Until the Omnibus is formally adopted and published in the Official Journal, the original 2 August 2026 deadline remains legally binding. GPAI models with systemic risk under Articles 51 to 55 are not covered by the proposed delay. Article 15 applies to high-risk AI systems rather than to GPAI models as such; the separate obligations for those models under Chapter V have applied since 2 August 2025.

References

Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), OJ L, 12.7.2024.
Article 15, Regulation (EU) 2024/1689, accuracy, robustness, and cybersecurity requirements for high-risk AI systems.
Article 26(5), Regulation (EU) 2024/1689, deployer obligation to monitor operation on the basis of the instructions for use and to inform the provider.
Article 9, Regulation (EU) 2024/1689, risk management system requirements for high-risk AI systems.
Article 11, Regulation (EU) 2024/1689, technical documentation requirements for providers of high-risk AI systems.
Article 13, Regulation (EU) 2024/1689, instructions for use, including required accuracy disclosures.
Article 72, Regulation (EU) 2024/1689, post-market monitoring obligations for providers.
Recital 12, Regulation (EU) 2024/1689, on the techniques covered by the definition of an AI system, including machine learning and logic- and knowledge-based approaches.
Recital 51, Regulation (EU) 2024/1689, on accuracy measurement and the need for lifecycle performance monitoring.
European Insurance and Occupational Pensions Authority. Opinion on artificial intelligence governance in the insurance and occupational pensions sectors. Frankfurt, August 2025.
Munich Re. aiSure: AI performance insurance product documentation. Munich, 2025. Schedule D: parametric accuracy trigger provisions.
NIST. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. NIST AI 100-2e2023. National Institute of Standards and Technology, 2024.
Perez, F. and Ribeiro, I. Prompt Injection: A Taxonomy of Attacks Against Language Model Agents. arXiv preprint, 2023.
