What AI-Powered eCTD Validation Actually Looks Like in Practice

Validation Is Where Submissions Succeed or Fail

A rejected regulatory submission doesn’t just delay a timeline; it restarts a clock. For a pharma company with a drug nearing its patent cliff, or a biotech running low on runway, the weeks lost to a validation failure have real financial impact. Yet most validation failures are preventable: wrong PDF settings, missing bookmarks, broken hyperlinks, incorrect page sizes.

This is where AI has genuine practical value in eCTD validation: not in replacing the validation rules (which should be deterministic), but in helping teams prepare documents that pass validation on the first attempt.

The 23 Rules DnXT’s Validator Checks

Our ComplianceValidator implements 23 rules across eight categories. Every rule produces exactly one result per document: pass or fail. There are no probabilities, no confidence scores, no “maybe.” Regulatory validation requires binary answers.

Structure (3 rules)

PDF Version — must be PDF 1.4–1.7 (per FDA/EMA technical specifications)
Valid Structure — well-formed PDF with valid cross-reference table
Linearized — optimized for web viewing (auto-fixable)
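
To make the "one rule, one binary result" idea concrete, here is a minimal sketch of how a check like PDF Version could be written. It assumes the version is read from the `%PDF-x.y` file header only; the names `RuleResult` and `check_pdf_version` are illustrative and are not the ComplianceValidator API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    """Hypothetical result type: a rule either passed or it did not."""
    rule_id: str
    passed: bool
    detail: str


# The PDF header looks like "%PDF-1.7" and normally sits at the start of the file.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def check_pdf_version(data: bytes) -> RuleResult:
    # Search only the first 1024 bytes so a stray match deep in the file is ignored.
    match = _HEADER.search(data[:1024])
    if match is None:
        return RuleResult("pdf_version", False, "no %PDF header found")
    major, minor = int(match.group(1)), int(match.group(2))
    # Accept PDF 1.4 through 1.7 only. Assumption: the header is authoritative;
    # a fuller check would also look at a /Version entry in the document catalog.
    passed = major == 1 and 4 <= minor <= 7
    return RuleResult("pdf_version", passed, f"found PDF {major}.{minor}")
```

Every code path returns exactly one `RuleResult`, including the case where the header is missing, so the caller never has to interpret an absent answer.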

Font (3 rules)

Font Embedding — all fonts must be embedded (auto-fixable)
Font Subsetting — embedded fonts should be subsetted to reduce file size
Standard Fonts Only — no custom fonts that might render differently on agency systems

Bookmark (4 rules)

Bookmarks Required — documents over a certain page count must have bookmarks
Bookmark Structure — bookmarks must follow the document’s heading hierarchy
Bookmark Zoom Inheritance — bookmarks must use “Inherit Zoom” setting (auto-fixable)
No Broken Bookmarks — every bookmark must point to a valid page

Link (3 rules)

Relative Links — hyperlinks between documents must be relative, not absolute
No Broken Links — every hyperlink must resolve to a valid target
Valid Destinations — link destinations must be within the submission structure
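
The link rules above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the validator's actual code: link targets are treated as URI references, document paths are POSIX-style and relative to the submission root, and the function names are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def is_relative_link(target: str) -> bool:
    """True if the target has no scheme, no host, and no leading slash."""
    parts = urlsplit(target)
    if parts.scheme or parts.netloc:
        return False  # e.g. https://..., file://..., //server/share
    return not parts.path.startswith("/")


def resolves_inside_submission(source_doc: str, target: str) -> bool:
    """Check that a relative link does not climb out of the submission root.

    source_doc is the linking document's path relative to the submission
    root, for example "m2/25-clin-over/overview.pdf" (illustrative path).
    """
    if not is_relative_link(target):
        return False
    # Drop any "#page=3" style fragment and decode percent-escapes.
    link_path = unquote(urlsplit(target).path)
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(source_doc), link_path)
    )
    # After normalization, a leading ".." means the link escaped the root.
    return not (resolved == ".." or resolved.startswith("../"))
```

Whether the resolved file actually exists (the No Broken Links rule) would be a separate lookup against the submission's file list.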

Metadata, Security, Page, Content (10 rules)

Metadata (2 rules): title metadata required (auto-fixable), no sensitive metadata (auto-fixable)
Security (3 rules): no password protection, copy allowed, print allowed
Page (2 rules): page size compliance (Letter or A4 per region), orientation check
Content (3 rules): no JavaScript, no multimedia, no embedded files; these are security requirements from all major health authorities
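
As an example of how mechanical these checks are, a page size rule can be reduced to a comparison in PDF points (1/72 inch). The sketch below is hypothetical: the 1-point tolerance and the function name are assumptions, and orientation is deliberately ignored because it is a separate rule.

```python
# Nominal page sizes in PDF points (width, height), portrait orientation.
PAGE_SIZES_PT = {
    "Letter": (612.0, 792.0),   # 8.5 x 11 in
    "A4": (595.28, 841.89),     # 210 x 297 mm
}


def page_size_ok(width_pt: float, height_pt: float, expected: str,
                 tolerance_pt: float = 1.0) -> bool:
    """Return True if the page matches the expected size in either orientation.

    tolerance_pt is an assumed allowance for rounding by PDF producers.
    """
    exp_short, exp_long = PAGE_SIZES_PT[expected]
    short, long_ = sorted((width_pt, height_pt))
    return (abs(short - exp_short) <= tolerance_pt
            and abs(long_ - exp_long) <= tolerance_pt)
```

The expected size would come from the region configuration, for example Letter for US and A4 for EU.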

Multi-Region Validation

Different health authorities have different requirements. Our validator supports five regions, each with its own region-specific validation criteria:

US (FDA) — US regional v3.3 DTD, Letter page size, FDA-specific file naming conventions
EU (EMA) — EU regional v3.1.1 DTD, A4 page size, EU-specific module requirements
JP (PMDA) — JP regional DTD, bilingual requirements for certain document types
CA (Health Canada) — CA regional DTD, bilingual (English/French) validation
AU (TGA) — AU regional DTD, Australia-specific documentation requirements

The validator loads region-specific validation criteria files and resolves DTD specifications dynamically, using a flexible pattern that matches any regional DTD rather than hardcoding file names. We adopted this approach after a real incident in which the validator only recognized US specifications.
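
A pattern-based lookup of this kind might look like the following sketch. The file naming scheme (`<region>-regional[-v<version>].dtd`) and the function name are assumptions made for illustration; the real criteria files and DTD names may differ.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme: two-letter region code, "-regional", optional version.
_REGIONAL_DTD = re.compile(
    r"^(?P<region>[a-z]{2})-regional(?:-v[\d-]+)?\.dtd$", re.IGNORECASE
)


def find_regional_dtd(file_names: Iterable[str], region: str) -> Optional[str]:
    """Return the first DTD file name matching the region, or None."""
    for name in sorted(file_names):
        match = _REGIONAL_DTD.match(name)
        if match and match.group("region").lower() == region.lower():
            return name
    return None
```

Because the region code is a capture group rather than a literal, adding a new region requires no code change, which is the failure mode a hardcoded US file name cannot avoid.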

Making Validation Failures Visible

One of our harder-won lessons: for every selected validation criterion, the report must contain exactly one result — pass or fail. Zero results is a bug. Two results is a bug.

We learned this when validation reports showed grey boxes — criteria that were selected but produced no result. The report renderer showed these as empty checkboxes, visually identical to “user didn’t select this criterion.” An auditor looking at the report couldn’t tell whether a check passed, failed, or never ran.

Our result guard mechanism wraps every criterion evaluation: it counts results before and after, and if a criterion produces no result (usually because an internal error was silently swallowed), the guard injects a visible failure marker. This makes errors visible instead of silent — an important principle when your validation reports are part of a regulatory submission.
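
The guard described above can be sketched in a few lines. This is a simplified illustration with hypothetical names (`Result`, `run_guarded`), not the production mechanism. It also treats exceptions and duplicate results as failures, which is an assumption that goes slightly beyond the zero-result case described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    criterion_id: str
    passed: bool
    detail: str


def run_guarded(criterion_id: str,
                evaluate: Callable[[List[Result]], None],
                report: List[Result]) -> None:
    """Run one criterion and guarantee it leaves exactly one result."""
    before = len(report)
    error = None
    try:
        evaluate(report)
    except Exception as exc:  # make the error visible instead of swallowing it
        error = exc
    produced = report[before:]
    if error is None and len(produced) == 1:
        return  # the normal case: exactly one pass or fail
    # Anything else is a bug in the criterion: replace it with a visible failure.
    del report[before:]
    if error is not None:
        detail = f"criterion raised {type(error).__name__}: {error}"
    elif not produced:
        detail = "criterion produced no result"
    else:
        detail = f"criterion produced {len(produced)} results, expected 1"
    report.append(Result(criterion_id, False, detail))
```

With this wrapper, a criterion that silently returns nothing shows up in the report as an explicit failure rather than a grey box.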

Where AI Helps — and Where It Doesn’t

Here is where we draw a clear line between what AI does and what deterministic rules do in eCTD validation:

AI helps with preparation

Document classification — AI suggests the correct eCTD module and section based on document content. A clinical study report gets classified as Module 5.3, not Module 3.2. The human reviews and confirms.
Predictive issue detection — Before the user runs full validation, AI can flag documents that are likely to fail (e.g., a PDF without bookmarks that’s over 10 pages).
Cover letter drafting — AI generates a draft cover letter based on submission context. The human reviews and signs.

Deterministic rules handle validation

PDF compliance checking — The 23 rules above are deterministic. A PDF either has embedded fonts or it doesn’t. There’s no AI judgment involved.
DTD validation — eCTD XML structure either conforms to the DTD or it doesn’t.
File naming conventions — Agency-specific naming rules are exact string matches.

We don’t use AI for the actual validation because regulatory compliance requires reproducible, deterministic results. If you run the same document through validation twice, you must get the same answer both times. Probabilistic AI models don’t guarantee that.

Auto-Fix Capabilities

Five of the 23 rules are auto-fixable: linearization, bookmark zoom, title metadata, sensitive metadata removal, and font embedding. When a document fails one of these rules, the system can fix it automatically and re-validate — with the fix recorded in the audit trail.

Auto-fix is limited to mechanical corrections with no judgment component. We don’t auto-fix bookmark structure (because the correct hierarchy is a judgment call), and we don’t auto-fix broken links (because the correct target requires context about the submission).
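
The fix, re-validate, and record cycle can be sketched as follows. The `Rule` type, the dictionary stand-in for a parsed PDF, and the audit entry fields are all hypothetical; the point is only that a rule without a `fix` function is never touched automatically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Document = Dict[str, object]  # stand-in for a parsed PDF


@dataclass
class Rule:
    rule_id: str
    check: Callable[[Document], bool]
    fix: Optional[Callable[[Document], None]] = None  # None = not auto-fixable


def validate_with_autofix(doc: Document, rules: List[Rule],
                          audit_trail: List[dict]) -> Dict[str, bool]:
    """Run each rule once; attempt a fix only where one is defined."""
    results: Dict[str, bool] = {}
    for rule in rules:
        passed = rule.check(doc)
        if not passed and rule.fix is not None:
            rule.fix(doc)
            passed = rule.check(doc)  # re-validate after the fix
            audit_trail.append({
                "rule": rule.rule_id,
                "action": "auto_fix",
                "passed_after_fix": passed,
            })
        results[rule.rule_id] = passed
    return results
```

A rule such as bookmark structure would simply be registered without a `fix`, so a failure there stays a failure until a person corrects the document.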

The practical value of AI in eCTD validation isn’t replacing the validator — it’s reducing the number of documents that reach the validator with preventable issues. Better preparation means fewer validation cycles, which means faster submissions.

This article was written by the DnXT Solutions team. If you’ve found a validation scenario we don’t cover well, we’d like to hear about it at se******@***********ns.com.