Validation Is Where Submissions Succeed or Fail

A rejected regulatory submission doesn’t just delay a timeline; it restarts a clock. For a pharma company with a drug nearing a patent cliff, or a biotech running low on runway, the weeks lost to a validation failure carry a real financial cost. Yet most validation failures are preventable: wrong PDF settings, missing bookmarks, broken hyperlinks, incorrect page sizes.

This is where AI has genuine practical value in eCTD validation: not in replacing the validation rules (which should be deterministic), but in helping teams prepare documents that pass validation on the first attempt.

The 23 Rules DnXT’s Validator Checks

Our ComplianceValidator implements 23 rules across eight categories. Every rule produces exactly one result per document: pass or fail. There are no probabilities, no confidence scores, no “maybe.” Regulatory validation requires binary answers.

Structure (3 rules)

  • PDF Version — must be PDF 1.4–1.7 (per FDA/EMA technical specifications)
  • Valid Structure — well-formed PDF with valid cross-reference table
  • Linearized — optimized for web viewing (auto-fixable)
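
As a rough illustration of how mechanical a Structure rule can be, here is a minimal sketch of a PDF version check. It is not the ComplianceValidator's code; the function name and the choice to scan only the first 1024 bytes are assumptions made for this example. It also reads only the file header, whereas a PDF may override its header version through a /Version entry in the document catalog, so a production check would need to look there too.

```python
import re

# Hypothetical helper for illustration only (not DnXT's implementation).
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")
ALLOWED_VERSIONS = {(1, 4), (1, 5), (1, 6), (1, 7)}


def pdf_version_ok(data: bytes) -> bool:
    """Return True if the header near the start of the file declares PDF 1.4 to 1.7."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return False  # no recognizable PDF header
    version = (int(match.group(1)), int(match.group(2)))
    return version in ALLOWED_VERSIONS
```

The function returns a plain boolean, which matches the pass-or-fail contract described above.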

Font (3 rules)

  • Font Embedding — all fonts must be embedded (auto-fixable)
  • Font Subsetting — embedded fonts should be subsetted to reduce file size
  • Standard Fonts Only — no custom fonts that might render differently on agency systems

Bookmark (4 rules)

  • Bookmarks Required — documents over a certain page count must have bookmarks
  • Bookmark Structure — bookmarks must follow the document’s heading hierarchy
  • Bookmark Zoom Inheritance — bookmarks must use “Inherit Zoom” setting (auto-fixable)
  • No Broken Bookmarks — every bookmark must point to a valid page

Link (3 rules)

  • Relative Links — hyperlinks between documents must be relative, not absolute
  • No Broken Links — every hyperlink must resolve to a valid target
  • Valid Destinations — link destinations must be within the submission structure
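
The Relative Links and Valid Destinations rules can be sketched with the standard library alone. The example below is an assumption-laden illustration, not our validator's code: the function name, the problem strings, and the idea of passing paths relative to the submission root are all hypothetical. It does not check that the target file exists, which is what No Broken Links would add.

```python
import posixpath
from urllib.parse import urlsplit


def check_link(source_doc: str, target: str) -> list[str]:
    """Return the problems found for one inter-document link (empty list means pass).

    source_doc: path of the linking PDF relative to the submission root.
    target:     the raw link target as stored in the PDF.
    """
    parts = urlsplit(target)
    # A scheme, a host, or a leading slash means the link is not relative.
    if parts.scheme or parts.netloc or target.startswith(("/", "\\")):
        return ["absolute link"]
    # Resolve the target against the folder that contains the source document.
    base = posixpath.dirname(source_doc)
    resolved = posixpath.normpath(posixpath.join(base, parts.path))
    # A normalized path that still begins with ".." has escaped the root.
    if resolved == ".." or resolved.startswith("../"):
        return ["destination outside submission"]
    return []
```

For example, a link from m2/25-clin-over/overview.pdf to ../../m5/53-clin-stud-rep/csr.pdf#page=3 resolves to m5/53-clin-stud-rep/csr.pdf and passes, while the same link written as an https URL is reported as absolute.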

Metadata, Security, Page, Content (10 rules)

  • Metadata: title metadata required (auto-fixable); no sensitive metadata (auto-fixable)
  • Security: no password protection, copy allowed, print allowed
  • Page: page size compliance (Letter or A4 per region), orientation check
  • Content: no JavaScript, no multimedia, no embedded files (security requirements from all major health authorities)
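
Page size compliance comes down to simple arithmetic on the page dimensions. The sketch below assumes sizes are given in PDF points (1/72 inch), uses a one-point tolerance, and maps only the two regions whose page size this article states (Letter for the US, A4 for the EU); the names and the tolerance are illustrative choices, not values taken from our validator.

```python
# Page sizes in points: Letter is 8.5 x 11 in, A4 is 210 x 297 mm.
PAGE_SIZES = {"Letter": (612.0, 792.0), "A4": (595.28, 841.89)}

# Assumed mapping for illustration; other regions are left out on purpose.
REGION_PAGE = {"US": "Letter", "EU": "A4"}


def page_size_ok(width: float, height: float, region: str, tol: float = 1.0) -> bool:
    """Return True if the page matches the region's expected size within tol points.

    Orientation is ignored here because it is a separate rule: the shorter
    side is compared with the expected width and the longer with the height.
    """
    expected_w, expected_h = PAGE_SIZES[REGION_PAGE[region]]
    short_side, long_side = sorted((width, height))
    return abs(short_side - expected_w) <= tol and abs(long_side - expected_h) <= tol
```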

Multi-Region Validation

Different health authorities have different requirements. Our validator supports five regions, each with its own validation criteria:

  • US (FDA) — US regional v3.3 DTD, Letter page size, FDA-specific file naming conventions
  • EU (EMA) — EU regional v3.1.1 DTD, A4 page size, EU-specific module requirements
  • JP (PMDA) — JP regional DTD, bilingual requirements for certain document types
  • CA (Health Canada) — CA regional DTD, bilingual (English/French) validation
  • AU (TGA) — AU regional DTD, Australia-specific documentation requirements

The validator loads region-specific ValidationCriteria.xml files and resolves DTDs dynamically, using a regex pattern that matches any <region>-regional-*.dtd rather than hardcoding file names. This fix came out of a real incident in which the validator recognized only US DTDs.
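
A minimal sketch of that kind of pattern-based lookup follows. The function name is hypothetical and the file names in the usage note are made up for illustration; the real validator's pattern and directory layout may differ.

```python
import re


def match_regional_dtds(filenames, region):
    """Return, sorted, the names that look like <region>-regional-*.dtd."""
    # Build the pattern from the region code instead of listing file names.
    pattern = re.compile(
        rf"{re.escape(region.lower())}-regional-.*\.dtd", re.IGNORECASE
    )
    # fullmatch rejects near misses such as "us-regional-v3-3.dtd.bak".
    return sorted(name for name in filenames if pattern.fullmatch(name))
```

Given a folder listing that contains us-regional-v3-3.dtd, eu-regional-v3-1-1.dtd and ich-ectd-3-2.dtd, calling match_regional_dtds(listing, "eu") picks out only the EU file, and adding a new region needs no code change as long as its DTD follows the same naming shape.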

The ResultGuard Pattern

One of our harder-won lessons: for every selected validation criterion, the report must contain exactly one <result status="true|false"/> element. Zero results is a bug. Two results is a bug. Anything other than true/false is a bug.

We learned this when validation reports showed grey boxes — criteria that were selected but produced no result. The PDF renderer showed these as empty checkboxes, visually identical to “user didn’t select this criterion.” An auditor looking at the report couldn’t tell whether a check passed, failed, or never ran.

The ResultGuard pattern wraps every criterion evaluation: it counts results before and after, and if a criterion produces no result (usually because a sub-validator swallowed an exception), ResultGuard injects a red X with the message “INTERNAL: rule produced no result element.” This makes failures visible instead of silent.
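
The idea can be sketched in a few lines. This is an illustration under stated assumptions, not our production ResultGuard: the function name, the criterion attribute, and every message except the quoted "no result element" text are hypothetical, and the handling of raised exceptions and duplicate results is one plausible reading of the exactly-one-result rule.

```python
import xml.etree.ElementTree as ET

NO_RESULT_MESSAGE = "INTERNAL: rule produced no result element"


def guarded_evaluate(report, criterion_id, evaluate):
    """Run one criterion and leave exactly one <result status="true|false"/> for it."""
    before = len(report.findall("result"))  # count results before the rule runs
    error = None
    try:
        evaluate(report)  # the rule is expected to append one <result> element
    except Exception as exc:
        error = f"INTERNAL: rule raised {type(exc).__name__}"  # assumed wording
    produced = report.findall("result")[before:]  # whatever the rule appended
    if (
        error is None
        and len(produced) == 1
        and produced[0].get("status") in ("true", "false")
    ):
        produced[0].set("criterion", criterion_id)
        return produced[0]
    # Zero, duplicate, or malformed results: replace them with one visible failure.
    for element in produced:
        report.remove(element)
    if error is None:
        error = (
            NO_RESULT_MESSAGE
            if not produced
            else "INTERNAL: rule produced an invalid or duplicate result"
        )
    return ET.SubElement(
        report, "result", criterion=criterion_id, status="false", message=error
    )
```

A rule that silently returns without appending anything now yields a failing result carrying the internal message, so the report renderer never has a selected criterion with nothing to draw.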

Where AI Helps — and Where It Doesn’t

We draw a clear line between what AI does and what deterministic rules do in eCTD validation:

AI helps with preparation

  • Document classification — AI suggests the correct eCTD module and section based on document content. A clinical study report gets classified as Module 5.3, not Module 3.2. The human reviews and confirms.
  • Predictive issue detection — Before the user runs full validation, AI can flag documents that are likely to fail (e.g., a PDF without bookmarks that’s over 10 pages).
  • Cover letter drafting — AI generates a draft cover letter based on submission context. The human reviews and signs.

Deterministic rules handle validation

  • PDF compliance checking — The 23 rules above are deterministic. A PDF either has embedded fonts or it doesn’t. There’s no AI judgment involved.
  • DTD validation — eCTD XML structure either conforms to the DTD or it doesn’t.
  • File naming conventions — Agency-specific naming rules are exact string matches.

We don’t use AI for the actual validation because regulatory compliance requires reproducible, deterministic results. If you run the same document through validation twice, you must get the same answer both times. Probabilistic AI models don’t guarantee that.

Auto-Fix Capabilities

Five of the 23 rules are auto-fixable: linearization, bookmark zoom, title metadata, sensitive metadata removal, and font embedding. When a document fails one of these rules, the system can fix it automatically and re-validate — with the fix recorded in the audit trail.

Auto-fix is limited to mechanical corrections with no judgment component. We don’t auto-fix bookmark structure (because the correct hierarchy is a judgment call), and we don’t auto-fix broken links (because the correct target requires context about the submission).
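
The fix, re-validate, and record loop might look like the sketch below. Everything named here is hypothetical: the rule identifiers, the dictionary-based rule and fixer registries, and the shape of the audit entries are assumptions for illustration. The sketch also re-runs only the rule that failed, whereas a real pipeline would probably re-run the full rule set after any fix, since one correction can affect another rule.

```python
from datetime import datetime, timezone

# The five auto-fixable rules named in the article; the identifiers are made up.
AUTO_FIXABLE = {
    "linearized",
    "bookmark-zoom",
    "title-metadata",
    "sensitive-metadata",
    "font-embedding",
}


def validate_with_autofix(doc, rules, fixers, audit):
    """Run each rule; auto-fix and re-check where allowed; log every fix.

    rules:  {rule_id: check(doc) -> bool}
    fixers: {rule_id: fix(doc) -> new doc}
    audit:  list that receives one entry per attempted fix
    """
    results = {}
    for rule_id, check in rules.items():
        passed = check(doc)
        if not passed and rule_id in AUTO_FIXABLE and rule_id in fixers:
            doc = fixers[rule_id](doc)
            passed = check(doc)  # re-validate with the same deterministic rule
            audit.append({
                "rule": rule_id,
                "action": "auto-fix",
                "passed_after_fix": passed,
                "at": datetime.now(timezone.utc).isoformat(),
            })
        results[rule_id] = passed
    return doc, results
```

Because the allow-list is checked before any fixer runs, a fixer registered by mistake for a judgment-based rule such as bookmark structure is never invoked.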

The practical value of AI in eCTD validation isn’t replacing the validator — it’s reducing the number of documents that reach the validator with preventable issues. Better preparation means fewer validation cycles, which means faster submissions.

This article was written by the DnXT Solutions team. If you’ve found a validation scenario we don’t cover well, we’d like to hear about it at se******@***********ns.com.