Why accessibility APIs matter

This morning, Victor from PayPal and I got into an exchange on Twitter regarding the ChromeVox extension. ChromeVox is a Chrome extension that provides screen reading functionality for blind users. Through keyboard commands, the user can navigate page content at different granularities: object by object, heading by heading, form control by form control, and so on.

Wait, you might say, but this is screen reader functionality I know from NVDA or JAWS on Windows, or VoiceOver on the Mac! And you’re absolutely right! The difference is that ChromeVox works only in Chrome’s web content area, not in the menus or toolbars.

That in itself is not a bad thing at all. The problem comes with the way ChromeVox does what it does. But in order to understand that, we need to dive into a bit of history first.

Back in the late 1990s, the web was becoming more important every day, and Windows was the dominant operating system. With the big browser war coming to an end and Microsoft’s Internet Explorer the clear winner, it was apparent that, in order to make the web accessible to blind users and those with other disabilities, assistive technologies had to deal with IE first and foremost. Other browsers like Netscape quickly lost relevance in this regard, and Firefox wasn’t even thought of yet.

Microsoft took sort of a mixed approach. They were bound by their own first implementation of an accessibility API, called Microsoft Active Accessibility, or MSAA. MSAA had severe limitations: while it knew very well how to make a button or checkbox accessible, it had no concept of document-level accessibility, or of concepts such as headings, paragraphs, or hyperlinks. Microsoft managed to evolve MSAA bit by bit over time, giving it more role mappings (meaning different types of content had more vocabulary for saying what they were), but severe limitations remained when it came to dealing with actual text and the attributes of that text.

But screen readers wanted more than just paragraphs, links, form fields and buttons. So the Windows assistive technologies were forced into a combined approach, using some information provided by MSAA and other information provided by the browser’s document object model (DOM). APIs to get at that DOM have existed in IE since version 5.0, and for the most part, the way screen readers access rich web content in IE has not changed since 1999. Screen readers still have to scrape the DOM to get all the information needed to make web content accessible in IE.

In 2001, Aaron Leventhal of Netscape, later IBM, started working on a first implementation of accessibility in the Mozilla project, which later became Firefox. To make it easier for assistive technology vendors to come aboard and support Mozilla in addition to Internet Explorer, a decision was made to mimic what IE was exposing. That set of interfaces ships with Firefox to this day and is still used by some Windows screen readers, although we nowadays strongly evangelize the use of a proper API and hope to deprecate what we call the ISimpleDOM interfaces in the future.

In 2004/2005, accessibility people at Sun Microsystems and the GNOME Foundation, as well as other parties, became interested in making Firefox accessible on the Linux desktop. However, this platform had no concept similar to the interfaces used on Windows, and a whole new and enhanced set of APIs had to be invented to satisfy the needs of the Linux accessibility projects. Other software packages, like much of the GNOME desktop itself and OpenOffice, also adopted these APIs and became accessible. While some basic concepts are still based on the foundation laid by MSAA, the APIs on Linux quickly surpassed those basics by a wide margin.

Around the same time, work began on the NVDA project, the first, and to date only, open-source screen reader on Windows. The NVDA project leaders were interested in making Firefox accessible, giving users a fully open-source combination of screen reader and browser. However, they did not plan to build into NVDA the screen-scraping technology used by other Windows screen readers; they wanted API-level access to all content right from the start. Out of this requirement, an extension to MSAA called IAccessible2 was born, which enhanced MSAA with concepts already present in the Linux accessibility APIs. As a consequence, the two are very similar in capability and nomenclature.

In parallel to that, Apple had been developing their own set of APIs to make OS X accessible to people with visual impairments. Universal Access and the NSAccessibility protocol are the result: accessibility at the API level that likewise does not require the screen reader to scrape video display content to get at the information. This protocol differs in many of its details, but offers roughly the same capabilities.

Within Firefox, this meant that gaps that were previously only pluggable by using the browser DOM directly needed to be closed with proper API mappings. Over time, these became very rich and powerful. There is a platform-independent layer with all capabilities, and platform-specific wrappers on top which abstract and occasionally slightly modify the exposed information to make it suitable for each platform. The JavaScript bridges in Firefox for Android and Firefox OS, which talk to TalkBack and a speech synthesizer respectively, use the platform-independent layer to access all information. Whenever we find that the JavaScript code needs to access information from the DOM directly, we halt and plug the hole in the platform-independent APIs instead, since there will no doubt be a situation where NVDA or Orca could run into the same gap.

So, to recap: in IE, much information is gathered by looking at the browser DOM directly, even by NVDA, because there is no other way. In Firefox, some legacy screen readers on Windows also use this technique, which Firefox provides as a compatibility measure, but newer implementations like NVDA use our IAccessible2 implementation, with no DOM access, to give users the full web experience.

Safari on OS X obviously uses Apple’s NSAccessibility protocol. On Windows, Safari has since been discontinued, and it never had much MSAA support to speak of.

Google Chrome also exposes its information through Apple’s NSAccessibility protocol on OS X, and uses MSAA and IA2, at least to some degree, on Windows.

And what does ChromeVox use?

Here’s the big problem: ChromeVox uses DOM access exclusively to provide access to web content for blind users. As far as I could tell, it does not use any of Chrome’s own accessibility APIs. On the contrary: the first thing ChromeVox does is set aria-hidden on the document node, so that Chrome does not expose the web site to VoiceOver or any other accessibility consumer on OS X or Windows. In essence, both Chrome and ChromeVox perform their own analysis of a page’s HTML and CSS to construct its content. And the problem is: the two do not match. An example is the three popup buttons at the top of Facebook that display the number of friend requests, messages, and notifications. While Chrome exposes this information correctly to VoiceOver, ChromeVox only reads the button label if the count is something other than 0. Otherwise, the buttons sound as though they were unlabeled.
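As a concrete illustration, here is a minimal, hypothetical JavaScript sketch of the kind of document-level suppression described above. The stand-in element object and the function name are my own inventions so the snippet runs outside a browser; the real extension operates on the live document’s root node.

```javascript
// Hypothetical sketch, not ChromeVox's actual code. A stand-in element
// object keeps the example self-contained and runnable outside a browser.
const documentRoot = {
  attrs: {},
  setAttribute(name, value) { this.attrs[name] = String(value); },
  getAttribute(name) { return name in this.attrs ? this.attrs[name] : null; },
};

// Setting aria-hidden="true" on the root node prunes the entire subtree
// from the accessibility tree that platform consumers would otherwise read.
function hideFromPlatformConsumers(root) {
  root.setAttribute('aria-hidden', true);
}

hideFromPlatformConsumers(documentRoot);
console.log(documentRoot.getAttribute('aria-hidden')); // prints "true"
```

In a real page, that one attribute on the root is enough to make VoiceOver or NVDA see nothing of the document at all.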

In my half hour of testing, I found several pages with these inconsistencies between what Chrome exposes and what ChromeVox reads to users. An example quite to the contrary is the fact that Google Docs is only accessible if one uses Chrome and ChromeVox. What Chrome exposes to VoiceOver or NVDA is not sufficient to gain the same level of access to Google Docs.

If you are a web developer, you can imagine what this means! Even if you go through the trouble of testing your web site with Chrome and Safari to make sure they expose their information to VoiceOver, it is not guaranteed that ChromeVox users will benefit, too. Likewise, if you test with ChromeVox exclusively, you have no certainty that the other APIs are able to cope with your content.

Web developers on Windows have also learned these lessons the hard way with different screen readers in different browsers: because with IE, each screen reader is forced to do its own interpretation of HTML, at least on some level, results will undoubtedly differ.

There is a harmonization effort going on at the W3C to make sure browsers interoperate in what they expose on each platform for different element types. However, if prominent testing tools like ChromeVox, or some legacy screen readers on Windows, hang on to their own HTML interpretation methods even when APIs are available to provide them with that information, this effort is made very difficult, and a big extra burden is put on those web developers who are making every effort to make their sites or web apps accessible.

When we started work at Mozilla to make both Firefox for Android and Firefox OS accessible, we made a conscious decision that these platforms needed to use the same API as the desktop variants. Why? Because we wanted to make absolutely sure that we deliver the same information on any platform for any given HTML element or widget. Web developers can count on us at Mozilla to always ensure that if your content works in a desktop environment, it is highly likely to also speak properly on Firefox OS or through TalkBack in Firefox for Android. That is why our JS bridge does not use its own DOM access methods: there are no interpretation gaps and no API divergence.

And here’s my plea to all who still use their own DOM scraping methods on whichever platform: stop using them! If the browser provides an API, use it whenever possible! You will make your product less prone to breakage from changes or additions to the HTML spec and its supported elements. As an example: does anyone remember earlier versions of the most prominent screen reader on Windows suddenly no longer exposing certain Facebook content because Facebook had started using header and footer elements? The screen reader didn’t know about those and ignored everything contained within those elements’ opening and closing tags. If I remember correctly, it required users to buy an expensive upgrade to get a fix for that problem. NVDA and Orca users with Firefox, on the other hand, simply continued to enjoy full access to Facebook when this change occurred, because Firefox accessibility already knew how to deal with the header and footer elements and told NVDA and Orca everything they needed to know.

On the other hand, if you are using DOM scraping because you find that something is missing in the APIs provided by the browser, work with the browser vendor! Identify the gaps! Take the proposals to close those gaps to the standards bodies at the W3C or the HTML accessibility task force! If you’re encountering that gap, it is very likely others will, too!

And last but not least: Help provide web developers with a safer environment to work in! Web apps on the desktop and mobile operating systems are becoming more and more important every day! Help to ensure people can provide accessible solutions that benefit all by not making their lives unnecessarily hard!

18 thoughts on “Why accessibility APIs matter”

  1. Thanks for the detailed background, Marco! It really helps highlight the need for the API approach.

    It’s already challenging enough for web developers to come to grips with how they should be making their sites accessible, since most of us aren’t users of screenreaders or other AT (except our glasses! :P), and having multiple competing screenreader models is a strong disincentive to do good testing. We’re starting to finally emerge from the nightmare of browser incompatibility, thanks to HTML5, and the same should be done for AT.

    (Disclaimer: I work for W3C, but I’m not an accessibility expert, and my opinions are my own. I’m working to make SVG accessible, and this article will definitely help inform how I think about the problem space.)


  2. Marco.
    Everything you say is correct. However…
    There are realities you can’t ignore. Not all accessibility APIs are given
    the necessary level of attention by the companies that maintain them. As a result, users may end up with different experiences when trying to use the same browser on different operating systems, e.g. Chrome, IE or even Firefox. Ultimately, the experience users get on a specific platform depends on the desktop-level APIs that a particular browser has to talk to. This, as you point out, can potentially be remedied by the browser accessibility APIs, but this approach in itself may call for hacks and workarounds.

    Having worked on a number of very dynamic and content-rich products, I can recall a number of occasions where the implementation of ideas was much complicated because this or that screen reader / browser combo did not play to spec or support the feature we needed. An attempt to work with AT vendors did not always result in the fix we wanted because the vendors had their own priorities to address first.

    As I said on Twitter, my hunch is that ChromeVox was introduced to cover the needs of Chrome OS first and foremost, but since they share the same browser, the extension is also available for all flavors of Chrome.
    Interestingly enough, those users who opt into the Google universe are most likely to get a consistent experience across Chrome browsers on different operating systems, because the DOM is more consistent across browsers than accessibility APIs are. The only thing Chrome and ChromeVox need from the OS is the text-to-speech synthesizer.
    But there’s more…

    ChromeVox can now introduce experimental features, such as support for math expressions, good treatment of HTML5, and anything else that improves the user experience.

    In conclusion…
    I think you are correct with regard to accessibility APIs and how everyone should work toward improving them. However, we know many inconsistencies and oversights exist, and some browsers are far away from addressing those in the near future. (Here I hope I am wrong regarding that future!)
    ChromeVox closes accessibility gaps for some of their products and also takes the opportunity to experiment with technologies that are not available in any other screen reader or accessibility API.

    I personally treat ChromeVox as another tool in the toolbox. If I can’t get something done in Safari with VoiceOver, I will try Chrome with or without ChromeVox, NVDA, JAWS and whatever else!


  3. Hi Victor,

    thanks for your comment!

    I realize why Google are doing the things they’re doing. In fact, we at Mozilla are also doing experimental stuff, but we always make sure our platform-independent APIs know about the new stuff. An example is the newly proposed role of “key” in ARIA, which will be exposed through regular APIs so Firefox OS accessibility, for which it was thought up initially, can start using it immediately. Once there is consensus on how to expose this through the platform-dependent APIs such as IAccessible2, ATK and what not, we just have to change a single line of code to expose the new role.

    So our approach is much more centered around the API level of things than just getting it exposed somehow. I dare say that ours is a little more future-proof. 😉


  4. Marco, thanks for the thoughtful post.

    As a developer, I use ChromeVox heavily for “smoke testing” content for accessibility. I work on OS X and debug in Chrome, along with everyone I work with, our clients, and most of the devs I meet. That’s because OS X has a good development suite and the debugging features in Chrome have so far been superior to other browsers.

    Since Chrome on OS X is our de facto standard, we’re left with two options really: VoiceOver or ChromeVox. VoiceOver behaves similarly in Chrome and Safari, so that’s not an issue. The primary drawbacks are as follows:

    1. VoiceOver sometimes introduces performance problems when disabled and re-enabled repeatedly.
    2. VoiceOver is a black box: no release notes, changelogs, source code, docs, etc.
    3. Switching VoiceOver on and off when tabbing from code to browser is disruptive to dev workflow.

    Overall, from a dev standpoint, right now ChromeVox is the most appealing option. If Apple can iron out the performance issues and give us docs for VoiceOver, and if we could find a way to sandbox announcements to the browser, I’d personally be more inclined to use it.


  5. As Victor said: “ChromeVox can now introduce experimental features, such as support for math expressions, good treatment of HTML5, and anything else that improves the user experience.”

    I agree that this is good for experimental purposes; however, it’s important to note that this doesn’t actually mean it will improve accessibility for the majority of general users of assistive technologies.

    Having worked on many interaction designs through the AccDC TSG for the same purpose, to find out which configurations for common design patterns are most widely supported for AT users, I can easily prove that full W3C standards compliance does not always coincide with accessibility.

    Regardless of support variations and browser Accessibility API differences, it is a mistake for developers to test new components and new features using screen reader and browser combinations that are not used by the majority of disabled Assistive Technology users as part of the QA process.

    For example, currently the most widely used screen readers are JAWS and NVDA with IE and FF on the Windows platform, plus VoiceOver on iOS for mobile, followed by TalkBack on Android. It’s also important to know that JAWS is hard coded to work best in IE, and NVDA is hard coded to work best in FF, which often manifests within complex ARIA implementations. Additionally, full keyboard support needs to be tested with no screen reader running, which is often overlooked. Doing so also ensures accessibility for voice navigation software like Dragon.

    If features are only available or usable within ChromeVox in Chrome, and are only tested with this in mind, it will severely limit the overall accessibility of any application for the majority of disabled users around the world.


  6. Marco,

    You wrote: “An example quite to the contrary is the fact that Google Docs is only accessible if one uses Chrome and ChromeVox. What Chrome exposes to VoiceOver or NVDA is not sufficient to gain the same level of access to Google Docs.”

    I work on Google Docs accessibility, and support NVDA, VoiceOver, and JAWS – please let me know what doesn’t work for you with any of these! JAWS takes a bit of convincing to work with the live region and interactions we have going, but NVDA and VoiceOver tend to work fairly well for me. You should have the ability to do everything using (for instance) NVDA and Firefox that you can do with Chrome and ChromeVox.

    I can’t really speak to the OS accessibility API and ChromeVox side of things (JavaScript geek), but I’d very much like to talk in more detail about the state and future of web APIs from the web application side in order to make things more accessible for all.


  7. Great post,

    Victor, I sort of agree with you, but don’t you think it’s odd that ChromeVox doesn’t use Chrome’s API? It seems to be a duplication of work.

    Your examples were between different companies, not the same company!

    Google’s a big company so I can see how that would happen, but strategically, shouldn’t that work be put into the API rather than developing a separate parser? Even just for their own benefit.


  8. “It’s also important to know that JAWS is hard coded to work best in IE.”
    Bryan, while that technically may be true, would you not say that Firefox offers superior support for WAI-ARIA, and so arguably works better with JAWS than IE does on modern web pages?


  9. I understand what you mean about FF being more compliant with current standards, but unfortunately this doesn’t always translate into better support in ATs if the standard isn’t equally supported by the screen reader.

    Here is an example of this, using a simple button that specifies two explicit labels using aria-labelledby according to spec:

    Shared Text

    Expand Section

    If viewed using JAWS 14 in IE, it announces both when tabbing to the button.
    If viewed in FF, however, it ignores the second, internal label and only announces the first, shared one, causing an accessibility issue when more than one button references the same technique.

    This is a JAWS bug, and one I’ve already reported to FS, but it shows what I mean.

    It’s always important to test using the most widely used combinations in order to identify where something may be semantically incorrect versus unsupported by the AT.


  10. Hmm, it looks like the blog strips the tag markup.

    Here is the markup with the less and greater signs converted to brackets:

    [h2 id="lgnd"] Shared Text [/h2]
    [button aria-labelledby="lgnd lbl"]
    [span id="lbl"] Expand Section [/span]
    [/button]


  11. Excellent post! I fear I must take issue with Bryan on his JAWS with IE and FF comment. Respected sir, I must point out to you that this combination is the dominant one in the United States, but not necessarily in the rest of the world. Places like the Indian subcontinent, Europe, South America and Africa have adopted Linux as their OS of choice and Orca as the screen reader, due to the fact that these are free as in beer and free as in libre, coupled with the fact that they offer wide multilingual support. I know of several initiatives in India and Africa that have brought the use of computers to blind people for the very first time using these tools. I would argue that a platform that caters to such a potentially wide audience should receive lots of attention from testers in the developer community. Totally agree with you on the unsuitability of ChromeVox as a prime candidate for this purpose, though. I vote for FF with NVDA, Orca and TalkBack, then IE with JAWS and Safari with VoiceOver, and then, if you really want to, ChromeVox to round things out.


  12. So in 2005, the Windows standard is MSAA, which has been around for seven years or so. A screenreader written to support MSAA would support your application if you wrote your application to support MSAA.

    Ah, but there are limitations to MSAA: so in 2005 Microsoft produced UIA, a new accessibility API with lots more power. Better yet, the two technologies bridge. If your older application provides MSAA, not UIA, a screenreader asking for the latest UIA will get the MSAA information served up in UIA wrapping, so it works. If your application supports UIA but a screenreader requests MSAA, then the UIA gets turned into MSAA for it. All works fine – backwards compatible.

    So all we needed was for people making software on Windows to support MSAA or UIA and screenreaders on Windows would work really well!

    However, this doesn’t suit the commercial interests of Sun, Netscape and IBM, nor the ideological interests of Mozilla. So they introduce another accessibility API in 2006, IAccessible2. Now, if you’re writing a screenreader for Windows, you have ANOTHER DIFFERENT API to support. And if you are a screenreader user trying to use software like, say, OpenOffice, you have to wait until when, 2013, for MSAA to be supported properly so you can use the File menu.

    Don’t get me wrong: Microsoft failed badly by waiting until Windows 8 to provide a UIA interface to Internet Explorer and the client area in Office. If they had done that in 2005 then this might all have been different and much better: IAccessible2 would have died a death, NVDA could have used MSAA/UIA for pretty much everything, and so on.

    But the fact remains that there were established accessibility standards for Windows, that they worked, and they were ignored for mainly commercial and ideological reasons by the people behind IAccessible2. And that’s made it harder for people who need to use accessibility APIs, developers and users: as you put it yourself, “unnecessarily hard”. If it’s bad to have to use accessibility APIs plus DOMs, why is not bad to have to use multiple accessibility APIs? Why no UIA access to Firefox or OpenOffice?

    Politics, sadly.


  13. I thought I’d share another perspective which seems to be the exact inverse of the approach you are advocating, Marco. I posted this to the WebAIM listserv and the full thread can be accessed here:

    Here is the initial post. I’d be interested in your take.

    Subject: Google Chrome Frame for Screen Readers?
    Hello, all.

    I wonder if anyone has ever thought of developing a JavaScript library that
    emulated the behavior of the most standards-compliant, open source,
    fully-featured screen reader/browser combination (not sure what that would
    be). The way I’m envisioning it, the library would apply aria-hidden to
    everything on the page and then use JavaScript to respond to key commands
    by adding content to a single ARIA live region. Basically, the live region
    would function as a virtual buffer of sorts.

    Why would you reinvent the wheel like this? Well, basically it would allow
    developers to code to standards, and ensure that if the user was accessing
    the page with a screen reader that supported aria-hidden and live regions,
    they could at least have a minimally accessible experience. The library
    could include code that would let the user turn this emulation on or off,
    allowing them to continue using whatever system they were comfortable
    with. However, by doing this, developers wouldn’t have to code to and test
    on every single permutation of screen reader and browser and this would
    encourage writing standards-compliant, accessible code. It might also give
    AT vendors a common target. It would be something vaguely similar to the
    approach Google used with its Chrome Frame tool.

    Individual developers could add it to their pages, but it also wouldn’t be
    hard, I would think, to create a browser extension/bookmarklet that let the
    end users apply the library to any page they wanted.

    Okay, have at it! Please tell me what obvious things I’m missing here.
    I’ve never done anything of this scope before (and I’m not volunteering),
    so perhaps it is just impossibly complex or would incur some unacceptably
    huge performance hit. Or perhaps it is wrong from some theoretical or
    ethical perspective. I’d love to hear what others think about this, as I
    suspect the answers will help me (and perhaps others) understand better the
    details of how the technology works. Thanks!



    1. Hi Rob,

      I read your thread on WebAIM but, due to travel, did not have a chance to catch up with all the mail surrounding it. Let me just say that I doubt this will work. Content JavaScript does not have access to all the layout information the browser core has, which makes up a big part of the information gathered for accessibility. Some of this information is, or may be, sensitive or security-relevant, and therefore should not be exposed to content.

      That’s why I think the creation of an accessibility tree that has all the information should be the browser’s responsibility. The screen reader or other assistive technology should be the consumer, not do its own interpretation. That’s one thing I’ve been criticizing about JAWS and some other ATs in the past: they do their own HTML parsing even in browsers that give them full accessibility information, like Firefox, and thus in effect become yet another browser, adding to the hell of browser inconsistencies that web developers already have to deal with even without accessibility in play.

      Since accessibility is hard enough for the non-experienced web developer, we in the browser-vendor accessibility community should try to be good citizens and provide the most consistency possible. The Protocols and Formats Working Group and accessibility-related task forces at the W3C aim to do exactly that. And the fact that Microsoft Edge is now going full-stack accessibility API as well, taking away the ability for screen readers to parse the DOM, is a testament that this approach is correct. Firefox, Safari, and Chrome are doing it as well; even ChromeVox, if rumors ring true, is getting away from parsing the DOM on its own and using Chrome’s information instead.


  14. Thanks for the reply, Marco. Could you give me some examples of specific tasks you couldn’t accomplish by inspecting the DOM but that you could by accessing the Platform APIs? I’m just trying to get clearer about where that line is.


  15. This is a great post, and the history lesson was immensely helpful!

    As I’m writing this it’s now 2019 and I’m wondering if there’s been any movement from ChromeVox or if things are basically still the same.

