There’s a lot of chatter on the intertubes of late around the thoroughness and accuracy of various accessibility testing approaches: manual, automated, and Artificial Intelligence (AI)-based testing. I thought it might be good to take a look at these comparisons, to get a better sense of when each approach can be effective.
Instead of writing a general overview of each approach, I felt it would be more valuable to look at specific Web Content Accessibility Guidelines (WCAG) success criteria and the pros and cons of each method for testing them. With this post, I am starting a series of articles covering all of the WCAG success criteria. Let’s see how this goes; if I start going down a rabbit hole, I’ll revisit my approach.
Up first: WCAG Success Criterion 1.1.1 (Non-text Content).
Explanation of the success criterion
WCAG Success Criterion 1.1.1 (Non-text Content) falls under the principle of “Perceivable” (the P in POUR: Perceivable, Operable, Understandable, Robust) and the guideline “Text Alternatives.” The goal of this success criterion is to ensure that non-text information (images, photos, icons, etc.) is available to everyone, no matter how they take in the content: visually, via a screen reader, through a refreshable braille display, and so on.
Testing via automated testing
Automated tools efficiently scan entire pages, screens, and websites for missing or incorrect text alternatives using predefined rules, offering high scalability by checking thousands of pages in a single pass. They detect common issues like missing alt attributes and improper use of `alt=""`, which makes them well suited to quick, preliminary compliance checks.
However, their accuracy is limited: they can detect a missing `alt` attribute, but they cannot judge whether a description is accurate or appropriate. Their context awareness is low, focusing only on the presence or absence of alt text.
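To make the “predefined rules” idea concrete, here is a minimal sketch of the kind of DOM check an automated scanner runs. The function name and the three rules are my own illustration, not any particular tool’s implementation; real engines apply many more rules with far more nuance.

```ts
// Toy version of the rule-based checks an automated scanner performs.
interface Finding {
  element: Element;
  rule: string;
  message: string;
}

function checkNonTextContent(root: Document | Element = document): Finding[] {
  const findings: Finding[] = [];

  // Rule 1: <img> elements must carry an alt attribute (even alt="" for decorative images).
  root.querySelectorAll("img:not([alt])").forEach((img) => {
    findings.push({ element: img, rule: "img-alt", message: "Image is missing an alt attribute." });
  });

  // Rule 2: buttons need an accessible name (text content, aria-label, or aria-labelledby).
  root.querySelectorAll("button").forEach((btn) => {
    const hasName =
      btn.textContent?.trim() ||
      btn.getAttribute("aria-label") ||
      btn.getAttribute("aria-labelledby");
    if (!hasName) {
      findings.push({ element: btn, rule: "button-name", message: "Button has no accessible name." });
    }
  });

  // Rule 3: inline SVGs should expose an accessible name via <title> or ARIA.
  root.querySelectorAll("svg").forEach((svg) => {
    const hasName =
      svg.querySelector("title") ||
      svg.getAttribute("aria-label") ||
      svg.getAttribute("aria-labelledby");
    if (!hasName) {
      findings.push({ element: svg, rule: "svg-name", message: "SVG has no accessible name." });
    }
  });

  return findings;
}

// Usage: log every finding on the current page.
checkNonTextContent().forEach((f) => console.warn(`[${f.rule}] ${f.message}`, f.element));
```

Notice that a check like this can only verify that an attribute or element exists; it cannot tell whether the text is meaningful, which is exactly the limitation shown in the second set of examples below.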
Examples of what automated testing excels at
Each of the following is flagged by rule-based checks because a required attribute or element is missing entirely.

A photographic image (`mountain.png`) with no `alt` attribute:

```html
<img src='mountain.png'>
```

An icon-only button with no text or accessible name:

```html
<button>
  <i class="fa fa-envelope"></i>
</button>
```

An inline SVG with no `<title>`, `aria-label`, or `role="img"`:

```html
<svg width="50" height="50" viewBox="0 0 200 200" xmlns="http://www.w3.org/2000/svg">
  <!-- Background Circle -->
  <circle cx="100" cy="100" r="80" fill="#3498db" stroke="#2980b9" stroke-width="5" />
  <!-- Triangle -->
  <polygon points="100,40 40,160 160,160" fill="#e74c3c" stroke="#c0392b" stroke-width="5" />
</svg>
```
Examples of automated testing’s limitations
Each of the following passes a rule-based check, because the required attribute or element is present, yet the text alternative itself is poor; judging that requires context a rule cannot supply.

An image whose `alt` text exists but is meaningless:

```html
<img src='mountain.png' alt='placeholder'>
```

An icon button with a visually hidden label; a tool can confirm the label exists, but not whether “Send” matches what the button actually does:

```html
<button>
  <i class="fa fa-envelope"></i>
  <span class="sr-only">Send</span>
</button>
```

An SVG with an accessible name, but a generic one (“Shape”) that conveys nothing about the graphic:

```html
<svg width="50" height="50" viewBox="0 0 200 200" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="logo">
  <title id="logo">Shape</title>
  <!-- Background Circle -->
  <circle cx="100" cy="100" r="80" fill="#3498db" stroke="#2980b9" stroke-width="5" />
  <!-- Triangle -->
  <polygon points="100,40 40,160 160,160" fill="#e74c3c" stroke="#c0392b" stroke-width="5" />
</svg>
```
Testing via AI
AI analyzes images to generate contextually appropriate text alternatives, detecting issues like missing descriptions and flagging potential problems. It excels at bulk text alternative generation and serves as a useful tool for preliminary reviews. It may be able to assist with some of our text alternative issues above.
AI offers moderate to high accuracy in generating text alternatives, though it may lack full contextual understanding and misinterpret complex images. It is faster than manual efforts but slower than rule-based automation. With moderate context awareness, AI can infer meaning but may still require validation. Its high scalability allows for analyzing large datasets, though AI-generated descriptions may lack nuance and need review.
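As a rough sketch of how AI might slot into this workflow, the snippet below sends an image to a vision-capable model and asks for a draft description, which a person then accepts or rewrites. The endpoint URL, model name, and response shape are placeholders of my own, not any specific vendor’s API.

```ts
// Hypothetical client for a vision-capable model; the URL, model name,
// and response shape are placeholders and will differ per provider.
interface DraftAltText {
  imageUrl: string;
  draft: string;      // AI-generated suggestion
  needsReview: true;  // always route suggestions to a human reviewer
}

async function suggestAltText(imageUrl: string): Promise<DraftAltText> {
  const response = await fetch("https://example.com/vision-model/describe", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "example-vision-model",
      image: imageUrl,
      prompt:
        "Describe this image in one short sentence suitable as HTML alt text. " +
        "If the image appears purely decorative, respond with DECORATIVE.",
    }),
  });

  const data = (await response.json()) as { description: string };
  return { imageUrl, draft: data.description, needsReview: true };
}

// Usage: generate drafts in bulk, then hand them to a person for final wording.
const imagesMissingAlt = ["https://example.com/mountain.png"];
Promise.all(imagesMissingAlt.map(suggestAltText)).then((drafts) =>
  drafts.forEach((d) => console.log(`${d.imageUrl}: "${d.draft}" (pending human review)`)),
);
```

The hard-coded `needsReview` flag reflects the point above: AI-generated descriptions can lack nuance, so they should be treated as drafts rather than final text alternatives.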
Manual testing
Human testers can review non-text content, such as images, icons, and charts, for WCAG compliance with high accuracy. Their strong context awareness allows them to assess content meaning and intent, catching issues like missing or incorrect alt text, images misclassified as meaningful or decorative, and descriptions that do not match what the image actually conveys. Manual testing is best suited for evaluating complex visuals, ensuring meaningful descriptions, and refining intent-based assessments.
While high in quality, manual reviews are time-consuming and require expertise, making them difficult to scale for large websites. Each element must be assessed individually, which demands significant effort.
Which approach is best?
No single approach guarantees perfect text alternatives. However, combining the strengths of each approach produces far better coverage than any one of them alone. Automated testing is ideal for quickly scanning entire websites or apps and performing preliminary compliance checks. AI can enhance this process by generating bulk text alternatives and assisting with reviews. Manual testing remains essential for ensuring meaningful descriptions, evaluating complex visuals, and assessing intent.
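One way the three layers could fit together in practice is sketched below. It reuses the two hypothetical helpers from earlier (`checkNonTextContent` and `suggestAltText`) and is meant only to illustrate the sequencing, not any specific product’s pipeline.

```ts
// Layered workflow sketch: automated scan first, AI drafts second,
// human review always last. Assumes the hypothetical helpers defined earlier.
async function reviewPageNonTextContent(): Promise<void> {
  // 1. Automated pass: cheap and broad, catches structural misses.
  const findings = checkNonTextContent(document);

  // 2. AI pass: draft suggestions for images flagged as missing alt text.
  for (const finding of findings) {
    if (finding.rule === "img-alt") {
      const src = (finding.element as HTMLImageElement).src;
      const suggestion = await suggestAltText(src);
      console.log(`Suggested alt for ${src}: "${suggestion.draft}"`);
    }
  }

  // 3. Manual pass: a person confirms each description conveys the image's
  //    purpose in context — the step neither tool can replace.
  console.log(`${findings.length} item(s) queued for human review.`);
}
```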