Test development
Web accessibility evaluation: The BITV-Test in the international and European context


How does the German BITV-Test compare to other accessibility evaluation approaches in Europe? Extended notes of a talk given at the European Blind Union Technical Group Meeting, Leipzig, 26 September 2009

Author: Detlev Fischer, Manager BIK Test Development

1 Introduction

In the field of web accessibility, the Web Content Accessibility Guidelines (WCAG), developed by the W3C's Web Accessibility Initiative and published as a W3C Recommendation, have the status of a de facto standard. This explains why the latest version of the guidelines, WCAG 2.0, is currently being used as the basis for a revision of the BITV, the German directive setting out the accessibility requirements for federal web sites and graphical user interfaces. WCAG is also a common point of reference for all other approaches identified so far.

The BITV-Test was developed in order to support and monitor the implementation of the BITV across federal and other web sites. The development was funded by the German Federal Ministry of Labour and Social Affairs (BMAS). The test has made compliance with the BITV regulation verifiable, giving commissioners of web sites an instrument to measure the level of accessibility reached and, in turn, a means to assess the quality of the web agencies they employ.

How does the German BITV-Test compare to other accessibility evaluation approaches in Europe?

While the German BITV-Test is widely used and accepted as a workable approach for checking the degree of accessibility of web sites, it is instructive to compare it to other approaches found in Europe. Given the same web site, would different evaluation approaches arrive at similar results? Is it enough to check web accessibility at one given point in time, or are there other approaches that help ensure 'sustainable' accessibility over the lifetime of a web site? Can automated test procedures employed in other countries simplify the time-consuming manual expert evaluation necessary to discover many of the most serious accessibility barriers?

The aim of this talk

While not attempting to answer these questions conclusively or comprehensively, this talk reviews some approaches to accessibility evaluation found across Europe, and compares these to the approach chosen in the BITV-Test.

The interest of the BIK project, originator of the BITV-Test, in this comparison is two-fold:

  1. We wish to identify other accessibility evaluation approaches, or certain aspects of them, that can enrich and improve our own methodology. We believe that a lively exchange about the practicalities of evaluation, testing and ranking, including the assessment of emergent web techniques, will benefit all parties involved, especially in a situation where many actors are updating their methods to reflect WCAG 2.0. So this talk is an open invitation to start a mutual exchange.
  2. We want to point out some benefits of the evaluation approach followed in the BITV-Test, hoping that developments elsewhere in Europe might benefit from the experience gathered in seven years of test practice and during the continuous refinement of the testing methodology.

2 Accessibility evaluation approaches – a fragmented picture across Europe

The picture of web accessibility evaluation approaches across Europe is rather uneven. A number of approaches exist, most of them referring in some way to WCAG 1.0. Some are linked to European projects centred around the development of UWEM (Unified Web Evaluation Methodology). However, it is not easy to ascertain the exact role that UWEM plays in the actual evaluation process when dealing with individual web sites, for example, regarding the selection of pages to be tested, the set of instructions for testers with specific references to testing tools, the criteria for assessing compliance, and the ranking scheme used.

The approaches found are by and large positioned on a national level, or may be initiatives by companies or individual organisations within Member States representing the interests of particular segments within the group of users with disabilities. Beyond WCAG as a common point of reference and, in some cases, a reference to UWEM, there seems to be little commonality with regard to the actual approach, or methodology, followed in accessibility evaluation. The degree of commonality is, however, difficult to establish, since in most cases the details of the approach are not publicly documented.

Why is there no European standard for eAccessibility?

Given the global character of the World Wide Web and the body regulating it, it is perhaps not surprising that the European Commission has not played a more prominent role in defining standards for eAccessibility. To be sure, the Commission does promote "eAccessibility" and has also commissioned a number of studies and projects, many of them aimed at measuring the uptake of accessible web design across Europe. Perhaps one of the motives behind these studies was to demonstrate the poor performance of many public sector websites across Europe, and thereby, lend more weight to the call for a binding European standard for eAccessibility – a call often supported by the interest groups of disabled persons.

However, a European standard (witness a recent suggestion by Viviane Reding to introduce a European Disability Act adopting WCAG 2.0 as binding for Europe) is less popular with national governments, which fear being forced to implement changes that, given the current poor accessibility of many public sites, may be deemed too disruptive and too expensive, even if no one would dare make that argument publicly. So in the absence of a European standard, the EU currently seems largely reduced to hand-waving, pointing out that international ICT standards or recommendations such as WCAG are currently not eligible for association with EU legislation and policies.

Mandate 376 with little visibility

In another area, accessibility is to be included in standard requirements for EU public procurement processes, following Mandate 376, issued by the European Commission to the European Standardisation Organisations (ESOs).

Communication about the state of play regarding Mandate 376 does not seem high on the agenda; four years into the development work, there is no joint website and precious little information on progress made so far. Could it be that progress has been too slow or too painful to be granted public airing? Phase 1 of Mandate 376 promised a ‘Report on testing and certification schemes’, but so far (to my knowledge), no one involved in Mandate 376 has ever requested information about the approach taken by the BIK project, developer of the BITV-Test, one of the best-documented approaches around.

3 The role of UWEM

The EU financed the development of UWEM, which follows WCAG 1.0. While UWEM includes testing procedures, it is not specific about the actual procedure to be applied (which tools to use and how to use them). Some characteristics of UWEM:

  • UWEM differentiates whether a test procedure is fully automatable or not (most are not)
  • UWEM has just fail or pass per checkpoint (no grey areas)
  • Some critical tests are missing (logical tab order, detection of keyboard traps)
  • Some questionable checks are included (e.g., use of the q element, which does not work as expected in Internet Explorer up to version 7)
  • Apparently UWEM has mainly been applied in automated approaches

The mapping of national test procedures to UWEM, which is the basis of the EURACERT label, seems largely a failure, judging by the few sites that have taken the trouble to apply for the label. The EURACERT site shows no activity since October 2007. There has been talk of initiatives to update the WCAG 1.0-based UWEM to WCAG 2.0. The WAB Cluster has published a migration plan, but there has been no progress since. Currently, some of the WAB Cluster partners are looking for resources to carry out the work.

4 Some common problems in accessibility evaluation

The following points list some of the common problems encountered in accessibility evaluation.

The scope of automatic testing is limited. However, tools such as WAVE can help spotlight obvious deficits and clear omissions, and there is a role for automatic testing in the pre-screening of sites. Some of the most critical elements, however, cannot be tested automatically: take, for example, tab order, scalability ('zoom page' and 'zoom text only'), or the use of meaningful alternative texts on images.
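
To make concrete what such pre-screening tools actually compute, here is a minimal Python sketch of the WCAG 2.0 contrast ratio calculation for two solid colours (the formula is taken from the WCAG 2.0 definition of relative luminance; the function names are my own):

```python
def relative_luminance(rgb):
    """WCAG 2.0 relative luminance of an sRGB colour given as 0-255 channels."""
    def linearise(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two solid colours, ranging from 1:1 to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background gives the maximum ratio of 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```

Note that such a computation presupposes a single, known background colour; exactly this assumption fails for text placed over photographs or gradients, which is why fully automated contrast checks remain inconclusive.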

Some organisations (Fraunhofer, for example) are trying hard to automate colour contrast checks, but this is bound to be inconclusive, e.g. for text placed over background images with more than one colour, such as photographs.

The results of automatic testing may also be misleading. For example, checking for the presence of non-empty alt attributes on images cannot determine whether the content of the alt attribute is actually useful for the image in question. It can even encourage implementations (in a CMS, for example) that by default set an image's alt attribute to the image URL, producing meaningless and counterproductive information for screen reader users. It is then down to the diligence of the editor to replace these defaults with meaningful texts, something that is easy to forget.
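
The limitation can be illustrated with a small heuristic check, sketched here in Python using only the standard library (the class name and the heuristics are invented for illustration; a real evaluation still needs human judgement per image):

```python
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    """Flags img elements whose alt text is missing, empty, or looks
    auto-generated (e.g. a CMS copying the file name into alt)."""
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src, alt = attrs.get("src", ""), attrs.get("alt")
        if alt is None:
            self.findings.append((src, "alt attribute missing"))
        elif alt.strip() == "":
            self.findings.append((src, "alt empty (acceptable only for decoration)"))
        elif alt in src or alt.lower().endswith((".jpg", ".png", ".gif")):
            # Non-empty, so a naive automated check would pass it -- yet it
            # is just the file name and useless to screen reader users.
            self.findings.append((src, "alt looks like a file name or URL"))

checker = AltChecker()
checker.feed('<img src="/img/chart.png" alt="/img/chart.png">'
             '<img src="/img/logo.png" alt="Company logo">')
for src, problem in checker.findings:
    print(src, "->", problem)
```

Even this refinement only catches the most blatant cases; whether "Company logo" is an adequate description of the second image is still a contextual, human decision.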

Some of the grading and rating schemes used are questionable, since they may not fairly reflect the severity of accessibility problems. Schemes that are not documented, so that decisions cannot be checked and verified independently, are difficult to rely on because of their lack of transparency. This does not mean that every accessibility test has to be open; clearly, some companies that provide private services to their customers need to keep their methods to themselves. The situation is different, however, for approaches that hand out quality seals claiming WCAG compliance. Any approach to rating an entire site has to cope with a number of methodological problems:

  • priorities are needed to account for the severity of accessibility problems
  • some critical failures can justify an overall downgrading
  • weighting of instances of problems or failures on any given page can distort results
  • contextual judgements may require domain experience (e.g., is an error symbol clear enough from context, without alt and title attribute?)
  • changes to the page (e.g., dynamic elements, embedded multimedia, AJAX updates to content, processes based on progressive steps) mean that a single URI may no longer be sufficient to describe what is being tested
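
The first three points above can be made concrete with a hypothetical rating sketch (checkpoint names, weights and labels invented for illustration): a naive pass-ratio score rates a site favourably even though a single keyboard trap makes it unusable, while a scheme with a critical-failure override downgrades the whole site.

```python
# Hypothetical checkpoint results: (name, passed, critical).
results = [
    ("meaningful alt texts", True,  False),
    ("form labels",          True,  False),
    ("page titles",          True,  False),
    ("colour contrast",      True,  False),
    ("no keyboard trap",     False, True),   # one critical failure
]

def naive_score(results):
    """Pass ratio: drowns one serious 'fail' in many irrelevant 'passes'."""
    return sum(passed for _, passed, _ in results) / len(results)

def rated(results):
    """Downgrade the overall result if any critical checkpoint fails."""
    if any(critical and not passed for _, passed, critical in results):
        return "inaccessible"
    return "accessible" if naive_score(results) == 1.0 else "partly accessible"

print(naive_score(results))  # 0.8 -- looks good on paper
print(rated(results))        # inaccessible
```

The design choice here is the override rule: no amount of passed checkpoints can compensate for a failure that locks some users out entirely.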

There is an overlap between accessibility and usability. Something technically accessible can nevertheless be unusable. (It is worth pointing out that many of the requirements checked in WCAG imply aspects of usability: whether page titles, alternative texts of images and form labels used are meaningful can only be determined when the domain and functional context of the site can be understood.)

Finally, there are always errors and oversights in human judgement. Even in tests carried out by experienced testers, people tend to overlook or misinterpret things. This is why the BITV-Test mandates two testers for final tests: after testing independently, they talk through all checkpoints where their assessments differ in an arbitration phase before finalising a jointly agreed result.

5 Comparing different schemes of accessibility testing

Various accessibility evaluation schemes and quality labels exist in Europe. Any list of them is unlikely to be exhaustive.

It is difficult to compare the approaches because many are either not publicly documented in detail or, perhaps, not documented at all. Some organisations are reluctant to share information on the approach since they see it as a competitive asset. Finally, there is the language problem. Some procedures may exist in other EU countries which I simply didn't come across.

This is not the place to go into great detail about the differences between all these approaches. However, a comparison is worthwhile. A short review, based on an email request with a list of questions about aspects such as sample size, documentation of test steps, training of testers, and the state of WCAG 2.0 adoption, is currently under preparation. The results will soon be published on this site.

Sample and scope

Regarding sample size, there is a lot of variation across European approaches. Commonly, the sample contains important pages (such as the start page, a contact page with form input, and typical content pages) and reflects the different templates used. Some schemes explicitly search for weaknesses; others try to cover a large number of pages, apparently to make results more objective. The trade-off of large samples (UWEM, for example, prescribes a test of 30 pages) is that such tests are simply too time-consuming for scrupulous manual testing, which may encourage the use of automatic checks that can only ever give a coarse impression of whether accessibility has been on the agenda at all. This may be acceptable in benchmarking studies, which can disregard the individual sites covered. For any particular site, however, this is not the way to reach a dependable assessment of accessibility.

Focusing on fewer pages in manual testing (the BITV-Test sets the minimum at three pages) introduces some uncertainty, since important accessibility problems on pages not tested may go unnoticed. But the result for those pages that are tested is much more valid, especially if the test is carried out by two independent testers who afterwards compare and arbitrate their results. Defining the test sample by initially scrutinising the site's pages is an art that takes into account the size of the site and the number of different templates used.

Regarding the scope of testing, it will become more important to devise explicit testing procedures for non-HTML content. Many of the European approaches are currently being overhauled to reflect WCAG 2.0 with its claimed neutrality regarding web techniques. This raises the issue of how to test PDFs, multimedia and web apps presented, for example, in Flash or Silverlight.

Sustainability of results

Another aspect of scope is the sustainability of results over time. Some schemes, such as AccessiWeb (FR) and Access-for-All (CH), offer a quality seal for a limited period only, usually a couple of years, and then require re-testing. The underlying problem is that any external consultant will have only limited influence on the "full lifecycle of accessibility" at a customer's site. Repeated tests are expensive, and checking that labels are actually taken down after two years when no re-test is commissioned, and acting upon non-compliance, adds administrative effort. Is it really done? The German DIN CERTCO scheme for the certification of web sites never took off, simply because it was too expensive; DIN CERTCO has now shelved this service.

Accessibility of a site also rests on the accessibility knowledge and level of skill of online editors. So in the end, sustained accessibility is down to employee training and knowledge management within an organisation.

Testing approach and documentation

If the approach taken in accessibility evaluation is not properly documented, test procedures and methods risk being non-transparent and prone to errors and inter-personal variation. Documentation is also vital for the training of new testers, where fully documented and comparable procedures enable the fine-tuning and discussion of assessments applied to specific pages. Here the approaches differ a lot. Some, like AccessiWeb and the BITV-Test, have detailed online documentation; others have test step instructions used internally only. Some of the latter schemes, such as Access-for-all, plan to publish documentation for their new procedure based on WCAG 2.0.

Rating scheme

Large quantities of sites (such as those covered in European benchmark studies) are difficult or impossible to tackle without automatic testing. But ratings based on automatic testing are questionable unless they are merely used as a pre-filter (as in the case of the RNIB, where a large number of local authority sites are regularly tested). Rating schemes can distort results, drown serious 'fails' in many irrelevant 'passes', or fail to downgrade the overall result as 'inaccessible' when one or a few critical flaws make the entire site unusable (e.g., keyboard traps, image-based navigation without alternative texts or implemented using background images, or the use of a visual CAPTCHA with no audio alternative).

6 Changes to the BITV Test in response to WCAG 2.0

Given that WCAG 2.0 has been released and that a new German regulation, BITV 2, is imminent, what are the plans of the BIK project for its own accessibility evaluation procedure, the BITV-Test?

There will not be many fundamental changes, because the BITV-Test has been continuously updated, with annual revisions reflecting the technical changes that over time rendered some of the requirements of WCAG 1.0 obsolete. The test has recently been updated with an eye to WCAG 2.0, reflecting those changes in success criteria that were not in contradiction to the current German regulation, BITV.

Checkpoints will be renamed to map onto the WCAG 2.0 numbering scheme (in some cases, a success criterion will have several checkpoints). Some checkpoints will be added, such as audio control or error identification, suggestion and prevention. It will then be important to reflect in the checkpoints the requirements for 'accessibility support' and the proper use of alternative versions. These do not map onto particular WCAG 2.0 success criteria, but they are nevertheless included in the normative part of the recommendation (see the section "Conformance" in the Recommendation). Some checks without a correspondence in WCAG 2.0 will be dropped.

7 Outlook

For the BITV-Test, there are many areas for development to ensure that the procedure is able to cover the accessibility of future web technologies. The emergence of web technologies beyond HTML/CSS/Javascript means that new accessibility evaluation procedures need to be developed and tested in order to cover content in those media.

Another aspect is the need for clear rules regarding the testing of dynamic elements and page instances, including their documentation. BIK is unlikely to deal with quality assurance beyond testing a "time slice" or snapshot of a web site. One could argue that quality assurance is the responsibility of providers: if an organisation's policy includes a commitment to sustained accessibility, it needs to put in place the things required to maintain it (such as training online editors, commissioning repeat tests, etc.).

The BITV-Test has had a difficult time in the European accessibility context, perhaps because it seemed at odds with the approach taken in UWEM and has always taken a critical stance towards automated testing. However, both approaches may be seen as complementary and may benefit from each other, especially as UWEM as it stands is now outdated.

It is clear that countries and organisations have a vested interest in protecting the investment in their chosen approach and may not want to abandon it for some new scheme. However, looking at other approaches is always beneficial. The BIK project will therefore continue to look for co-operation with other organisations in Europe. The aim is not to devise a unified evaluation procedure (this would be far too ambitious and unlikely to work), but to define a platform for joint discussions of testing methodology and practice, and for the exchange of views regarding new technical developments and fair demands regarding accessibility.

We have posted a concept sketch for a project centred around test procedures in a wiki. Other organisations are invited to add information on their own approach and to define the aims that such a project should have. To prevent spam, the discussion is currently open to registered users only. If you want to participate, send a mail to fischer [at] and I will send you the login details for registering.