PDF Remediation: A Practical Guide to PDF/UA

PDF remains the workhorse format for reports, journals, forms and archival documents, yet most PDFs in circulation are effectively unreadable for people who rely on screen readers. A visually polished document can be a wall of silence for assistive technology if its underlying structure is missing or wrong. PDF remediation is the discipline of fixing that structure after the fact: adding tags, correcting reading order, describing images and making interactive elements operable without a mouse.

In our remediation work at Emayyam, the same problems surface again and again, regardless of whether the source file came from InDesign, Word or a scanned original. This guide walks through the core remediation tasks in the order we typically tackle them, the standard that defines success, and the testing routine that tells you whether a document genuinely works for real users rather than merely passing an automated checker.

What PDF/UA actually requires

PDF/UA, formally ISO 14289, is the international standard for universally accessible PDF. Where WCAG describes outcomes for web content in general, PDF/UA specifies how a PDF file must be built: all meaningful content must be tagged, decorative content must be marked as an artifact, tags must follow a logical structure, and the document must declare its language and title. It is the benchmark most procurement teams and accessibility auditors now reference for documents.
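The document-level requirements are the easiest ones to check mechanically. Here is a minimal sketch, assuming the document catalog and metadata have already been read into plain dictionaries by whatever PDF library you use. The key names follow the PDF specification (/StructTreeRoot, /MarkInfo, /Lang, /ViewerPreferences); the function name and the dictionary layout are our own illustrative assumptions, not part of any tool.

```python
def document_level_issues(catalog: dict, metadata: dict) -> list[str]:
    """Return human-readable problems for the document-level basics.

    `catalog` mimics the PDF document catalog and `metadata` the XMP
    metadata. Both are simplified stand-ins, not a real PDF parser.
    """
    issues = []
    if "/StructTreeRoot" not in catalog:
        issues.append("no tag tree (/StructTreeRoot missing)")
    if not catalog.get("/MarkInfo", {}).get("/Marked", False):
        issues.append("document is not marked as tagged (/MarkInfo /Marked)")
    if not catalog.get("/Lang"):
        issues.append("no default language (/Lang)")
    if not metadata.get("dc:title"):
        issues.append("no document title in metadata (dc:title)")
    if not catalog.get("/ViewerPreferences", {}).get("/DisplayDocTitle", False):
        issues.append("title is not displayed by the viewer (/DisplayDocTitle)")
    return issues


# Hypothetical example: a file with a language but nothing else in place.
untagged_catalog = {"/Lang": "en-GB"}
for issue in document_level_issues(untagged_catalog, {"dc:title": ""}):
    print(issue)
```

A check like this only confirms that the declarations exist. Whether the tags behind them are correct is the subject of the rest of this guide.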

The heart of compliance is the tag tree. Tags are invisible structural labels, such as headings, paragraphs, lists, tables and figures, that assistive technologies use to navigate and announce content. A remediated PDF needs a tag for every piece of real content, the right tag type in the right place, and nothing in the tree that should not be there. Most remediation time is spent building or repairing this tree.

Getting the reading order right

Sighted readers infer reading order from layout: columns, sidebars, captions and pull quotes all signal sequence visually. A screen reader follows the order of the tag tree instead, and when that order is wrong the document becomes incoherent, with sidebars interrupting sentences or captions read pages away from their figures. Multi-column layouts, text wrapped around images and complex magazine-style spreads are the usual trouble spots.

We fix reading order by walking the tag tree alongside the visual page, not by trusting the Order panel alone. The test is simple: read the tags top to bottom and ask whether the narrative still makes sense. Running headers and footers, page numbers and watermarks should be marked as artifacts so they never interrupt the flow. Footnotes are real content, so they stay tagged and are placed where they make sense in the sequence. This step is tedious, but it determines whether everything else you do matters.
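The "read the tags top to bottom" test can be pictured as a depth-first walk of the tag tree. The sketch below uses a deliberately simplified model: the Tag class and the sample page are hypothetical, and in a real PDF artifacts sit outside the tag tree rather than appearing as nodes. We model them as nodes only to show that they are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A simplified structure element: a type, optional text, child tags."""
    kind: str                      # e.g. "H1", "P", "Div", "Artifact"
    text: str = ""
    children: list["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> list[str]:
    """Depth-first walk: the sequence in which content would be announced."""
    if tag.kind == "Artifact":     # artifacts are never announced
        return []
    spoken = [f"{tag.kind}: {tag.text}"] if tag.text else []
    for child in tag.children:
        spoken.extend(reading_order(child))
    return spoken


# Hypothetical page where a sidebar was tagged in the middle of a sentence.
page = Tag("Document", children=[
    Tag("Artifact", "Annual Report | Page 3"),
    Tag("H1", "Results"),
    Tag("P", "Revenue grew in every region"),
    Tag("Div", children=[Tag("P", "Sidebar: how we measure revenue")]),
    Tag("P", "and margins held steady."),
])
for line in reading_order(page):
    print(line)
```

Reading the printed sequence aloud makes the defect obvious: the sidebar lands between the two halves of the sentence, and the fix is to move the Div after the second paragraph in the tree.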

Alt text that actually helps

Every meaningful image needs alternate text, and writing it well is harder than tagging it. Good alt text conveys the purpose of the image in context, not a literal inventory of its contents. A logo needs only the organization name; a process diagram may need a structured description; a chart should state the trend or comparison the author intended the reader to take away, with the underlying data offered in a table when precision matters.

Purely decorative images, borders, flourishes and background textures should be artifacted, not described, because empty announcements waste a listener's time. For complex figures in STM (scientific, technical and medical) content, we often pair short alt text with a longer description placed in the visible text or an appendix. The discipline is editorial as much as technical, which is why remediation teams need writers, not just tool operators.
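No script can judge whether alt text is meaningful, but one can flag the obvious failures before an editor starts work. This is a rough sketch under our own assumptions: the function name, the placeholder pattern and the sample figures are illustrative, and the pattern will miss plenty of poor alt text.

```python
import re
from typing import Optional

# Alt text that is only a generic word, a number, or a file name.
PLACEHOLDER = re.compile(
    r"^(image|figure|picture|graphic)?\s*\d*$|\.(png|jpe?g|gif|tiff?)$",
    re.IGNORECASE,
)


def alt_text_issue(alt: Optional[str]) -> Optional[str]:
    """Return a problem description, or None if nothing obvious is wrong."""
    if alt is None:
        return "missing alt text"
    stripped = alt.strip()
    if not stripped:
        return "empty alt text (artifact the image if it is decorative)"
    if PLACEHOLDER.search(stripped):
        return "placeholder or file name used as alt text"
    return None


# Hypothetical figures collected from a tag tree.
figures = {
    "logo": "Emayyam",
    "chart": "chart_final_v2.png",
    "photo": None,
}
for name, alt in figures.items():
    print(name, "->", alt_text_issue(alt) or "no obvious problem")
```

A result of "no obvious problem" means only that the text is not a placeholder. Whether it conveys the purpose of the image in context still needs a human reader.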

Tables: where most remediations fail

Tables are the single most common failure point we see in audit work. A screen reader user navigates a table cell by cell, relying on header associations to know what each value means. That requires correctly tagged header cells, defined scope for rows and columns, and, for complex tables, explicit header IDs linking data cells to the headers that govern them. Without these, a table is just numbers read aloud in sequence.

Layout tables, meaning tables used purely to position content, should not be tagged as tables at all. Merged and split cells need careful span settings, and tables that were flattened into images must be rebuilt as real text. When a source table is genuinely too convoluted, the kindest fix is often to simplify or split it, which requires a conversation with the content owner rather than a silent workaround.

Tag header cells as TH, not TD
Set scope for row and column headers
Use header IDs for multi-level tables
Never tag layout tables as data tables
Rebuild image-only tables as live text
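The first three checks in the list above can be sketched in a few lines. This is an illustration, not a validator: each cell is a plain dictionary whose "id", "scope" and "headers" entries loosely mirror the PDF attributes /ID, /Scope and /Headers, and the function name and sample data are hypothetical.

```python
def table_issues(cells: list[dict]) -> list[str]:
    """Check header associations in a simplified list of table cells.

    Each cell has a "tag" ("TH" or "TD") and optional "id", "scope"
    and "headers" entries.
    """
    issues = []
    header_ids = {c["id"] for c in cells if c["tag"] == "TH" and "id" in c}
    if not any(c["tag"] == "TH" for c in cells):
        issues.append("table has no TH cells")
    for index, cell in enumerate(cells):
        # Scope values defined for PDF table headers: Row, Column, Both.
        if cell["tag"] == "TH" and cell.get("scope") not in {"Row", "Column", "Both"}:
            issues.append(f"cell {index}: TH without a valid scope")
        # Every referenced header ID must point at an existing TH.
        for ref in cell.get("headers", []):
            if ref not in header_ids:
                issues.append(f"cell {index}: header id {ref!r} does not exist")
    return issues


cells = [
    {"tag": "TH", "id": "region", "scope": "Column"},
    {"tag": "TH", "id": "q1"},                        # scope forgotten
    {"tag": "TD", "headers": ["region"]},
    {"tag": "TD", "headers": ["q1", "q2"]},           # "q2" was never defined
]
print(table_issues(cells))
```

For the sample data this reports the header cell without scope and the dangling reference to "q2". Layout tables and image-only tables cannot be detected this way; those remain a visual judgement.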

Forms and interactive elements

Interactive PDF forms add another layer: every field needs an accessible name, usually supplied via the tooltip, that tells the user exactly what to enter. Tab order must follow the visual and logical flow of the form, radio buttons need both group names and distinct option values, and required fields and error conventions should be stated in text rather than signalled by colour alone.
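The naming and tab-order checks lend themselves to a quick script. In the sketch below, /T (field name), /TU (tooltip) and /Tabs (page tab order, where /S means "follow the structure") are real PDF keys, while the function, the dictionary layout and the sample form are assumptions made for illustration.

```python
def form_issues(fields: list[dict], page: dict) -> list[str]:
    """Flag fields without a usable accessible name and a suspect tab order."""
    issues = []
    if fields and page.get("/Tabs") != "/S":
        issues.append("page tab order does not follow the structure (/Tabs is not /S)")
    for field in fields:
        name = field.get("/T", "<unnamed>")
        tooltip = (field.get("/TU") or "").strip()
        if not tooltip:
            issues.append(f"{name}: no tooltip (/TU), so no accessible name")
        elif tooltip == name:
            issues.append(f"{name}: tooltip only repeats the internal field name")
    return issues


# Hypothetical form: one good field, one without a tooltip, one lazy tooltip.
fields = [
    {"/T": "email", "/TU": "Email address (required)"},
    {"/T": "dob", "/TU": ""},
    {"/T": "phone", "/TU": "phone"},
]
for issue in form_issues(fields, {"/Tabs": "/R"}):
    print(issue)
```

Note that the first tooltip states "required" in words, which is the text-based convention the paragraph above asks for.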

Links deserve the same care. Link annotations must sit inside link tags with text that makes sense out of context, because screen reader users frequently pull up a list of links and nothing else. We also check that any buttons or JavaScript-driven behaviours are keyboard operable, since a form that demands a mouse excludes a large share of assistive technology users by default.
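One way to approximate the "list of links and nothing else" experience is to pull the link text out of the document and look at it in isolation. A small sketch, assuming the link texts have already been extracted; the set of generic phrases is our own illustrative list and is far from complete.

```python
# Phrases that say nothing when heard without the surrounding sentence.
GENERIC_LINK_TEXT = {"click here", "here", "read more", "more", "link", "learn more"}


def vague_links(link_texts: list[str]) -> list[str]:
    """Return the link texts that make no sense out of context."""
    flagged = []
    for text in link_texts:
        # Normalise case, whitespace and trailing punctuation before comparing.
        normalised = " ".join(text.lower().split()).rstrip(".:")
        if normalised in GENERIC_LINK_TEXT:
            flagged.append(text)
    return flagged


links = ["Click here", "Download the annual report (PDF)", "Read more."]
print(vague_links(links))
```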

Testing with Acrobat, PAC and JAWS

Automated checking comes first. Acrobat's built-in accessibility checker catches missing tags, a missing language setting and a missing document title, while PAC, the free PDF Accessibility Checker, validates against PDF/UA far more rigorously and renders a screen reader preview of the tag structure. Treat a clean automated pass as the entry ticket, not the finish line, because no tool can judge whether alt text is meaningful or reading order is sensible.

The decisive test is listening. We read every remediated document with a screen reader, typically JAWS or NVDA, navigating by headings, jumping between tables and tabbing through form fields the way a real user would. Five minutes of listening exposes problems that hours of visual inspection miss. If a remediation team never puts on headphones, its compliance claims rest on hope rather than evidence.

Building a sustainable remediation workflow

Remediation scales badly if every document is a one-off heroic effort. High-volume programmes need triage that sorts simple documents from complex ones, templates and source-file fixes that prevent recurring defects, and a quality gate that combines automated validation with human screen reader review. The cheapest remediation is the one you avoid by fixing the Word or InDesign template upstream, before a thousand broken PDFs are generated from the same flawed source.
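The quality gate can be as simple as a rule that refuses to release a document until both halves of the review are done. The sketch below is one possible shape for such a rule; the ReviewRecord fields and the file names are hypothetical and would map onto whatever tracking system a programme already uses.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """Hypothetical tracking record for one remediated document."""
    name: str
    automated_errors: int          # error count reported by an automated checker
    screen_reader_reviewed: bool   # has a person listened to the document?
    open_findings: int             # reviewer findings not yet resolved


def passes_quality_gate(record: ReviewRecord) -> bool:
    """Release only when the automated and human checks are both clean."""
    return (
        record.automated_errors == 0
        and record.screen_reader_reviewed
        and record.open_findings == 0
    )


batch = [
    ReviewRecord("report.pdf", 0, True, 0),
    ReviewRecord("form.pdf", 0, False, 0),     # clean checker run, nobody listened
    ReviewRecord("scan.pdf", 2, True, 1),
]
for record in batch:
    status = "release" if passes_quality_gate(record) else "hold"
    print(record.name, "->", status)
```

The second record is the case the gate exists for: a clean automated pass without a listening session is held back rather than shipped.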

If you are starting a remediation programme, begin with a small representative batch: one report, one form, one scanned legacy document. Remediate them to PDF/UA, test them with PAC and a screen reader, and document every decision. That pilot becomes your playbook, your cost baseline and your training material, and it will tell you more about your real backlog than any estimate produced from a file count.