Test development
Some accessibility evaluation problems to be addressed by Eval TF


The objective of W3C's new Evaluation Methodology Task Force (Eval TF) is to develop an internationally harmonized methodology for evaluating the conformance of websites to WCAG 2.0. In this article, we try to identify some of the key problems ahead.

Author: Detlev Fischer, BIK Test Development

The objective of Eval TF falls into three parts. We quote each part here and list, as bullet points underneath, the problems that we think will need to be addressed.

Selecting representative samples of web pages from entire websites; this includes defining approaches for dynamically generated websites and web applications, large-scale surveys, and other contexts.

  • Whole-part problem: When checking success criteria (SC) for dynamically generated parts of a page, evaluators want to avoid re-testing redundant content while still covering all aspects that may constitute an accessibility problem.
  • Full page coverage problem (WCAG conformance statement 2): With pages having many different and possibly interdependent states, exhaustive testing can become very laborious.
  • Complete processes problem (WCAG conformance statement 2): It is not always easy to demarcate a complete process; high redundancy across process steps is likely for many SC; and test scores may be distorted when they include multi-page processes.
  • Tracking / documentation problem: Sample selection cannot always be URL-based, and dynamically generated states often cannot be saved locally for documentation.
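To illustrate the tracking / documentation problem, here is a minimal sketch, assuming a hypothetical crawler that supplies the rendered markup of each visited state: sampled page states are identified by a content hash rather than by URL alone, so that distinct dynamic states of the same URL can be told apart and identical states are not sampled twice. All function names here are illustrative assumptions, not part of any Eval TF deliverable.

```python
import hashlib

# Hypothetical sketch: identify a sampled page state by its URL plus a
# hash of its rendered markup, since the URL alone cannot distinguish
# dynamically generated states.
def state_id(url: str, rendered_html: str) -> str:
    digest = hashlib.sha256(rendered_html.encode("utf-8")).hexdigest()[:12]
    return f"{url}#{digest}"

# Collect unique states while crawling; states with identical rendered
# markup are skipped, which also touches on the whole-part problem.
def collect_sample(states):
    seen, sample = set(), []
    for url, html in states:
        sid = state_id(url, html)
        if sid not in seen:
            seen.add(sid)
            sample.append(sid)
    return sample
```

A real methodology would of course need a more robust notion of "state" (normalized markup, interaction history), but the hashing idea shows how non-URL-based samples could still be documented reproducibly.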

Carrying out evaluation of individual web pages using WCAG 2.0 Techniques; this includes defining approaches for selecting appropriate WCAG 2.0 Techniques and assessing Accessibility-Support assumptions.

  • Testing level problem: Accessibility testing must be positioned at the level of SC, not at the level of individual WCAG techniques, since compliant alternative techniques usually exist (witness the disclaimers that accompany the test for each technique).
  • Failure coverage problem: If testing is therefore failure-based, the absence of documented failures does not necessarily imply that an SC is actually met.
  • UA-AT variance problem: Witnessed presentation, behaviour, and the exposure of accessible names will vary depending on the user agent (UA) and assistive technology (AT) used for testing.
  • Alternative versions problem: What measure should determine whether compliant alternative versions were indeed unavoidable, or whether the default pages should have been made accessible in the first place?
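The failure coverage problem can be made concrete with a small sketch of a failure-based check, here for WCAG failure F65 (omitting the alt attribute on img elements). The class and function names are illustrative assumptions; the point is that such a check can only report failure instances it finds, and an empty result does not prove the success criterion is met.

```python
from html.parser import HTMLParser

# Minimal failure-based check (cf. WCAG 2.0 failure F65): report img
# elements that lack an alt attribute. Finding no failures does not
# by itself establish conformance with SC 1.1.1.
class AltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.failures = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.failures.append(attr_map.get("src", "(no src)"))

def check_alt(html: str):
    checker = AltChecker()
    checker.feed(html)
    return checker.failures
```

Even this tiny example shows why automated checks must be complemented by human judgement: an alt attribute that is present but misleading would pass this check while still failing the SC.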

Aggregating individual results into an overall conformance statement; this includes defining approaches for assessing the relative impact of failures, potentially through incorporating tolerance metrics.

  • Prioritization problem: Failures can have a severe or merely minor impact. Determining the impact of a particular failure instance requires human assessment and should be reflected in any test score. (Example: a missing or misleading alt attribute on a graphical navigation menu is critical, while it is relatively unimportant on some page footer logo.)
  • Critical failures problem: Critical failures must not be drowned out by statistically calculated scores.
  • Tolerance assessment problem: In human evaluation, assessments of tolerance are likely to differ. Arbitrating results may require multiple testers or quality review / assurance processes
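A sketch of how the prioritization and critical failures problems interact in aggregation, assuming hypothetical per-instance impact ratings and penalty weights that are purely illustrative and not part of any WCAG or Eval TF metric: any failure rated critical overrides the statistical average outright.

```python
# Hypothetical aggregation sketch: ratings is a list of (impact, weight)
# pairs, with impact one of "minor", "major", or "critical". The penalty
# weights are illustrative assumptions chosen for this example only.
def aggregate(ratings):
    # A single critical failure fails the page outright, so it cannot
    # be drowned out by many passing or minor-impact results.
    if any(impact == "critical" for impact, _ in ratings):
        return 0.0
    penalty = {"minor": 0.1, "major": 0.5}
    total = sum(weight for _, weight in ratings) or 1.0
    score = 1.0 - sum(penalty[i] * w for i, w in ratings) / total
    return max(score, 0.0)
```

Whatever concrete metric Eval TF settles on, the structural point stands: critical failures need an override rule, and the remaining impact assessments still depend on human judgement, which is where the tolerance assessment problem comes in.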