A Browser That Talks Back

How to chat your way around the Web.

Illustration by Robert Neubecker

If phone makers get their way, pretty soon we’ll all tote little Web browsers in our pockets, whipping them out 10 times a day instead of running back to a desktop screen. Sounds great, until you try to navigate Mr. Pizza Man’s online menu on your cell phone’s tiny screen. By the time you’ve thumb-typed or stylus-tapped your way through the dozens of yummy options, you could have ordered 100 pizzas the old-fashioned way. Wouldn’t this all be easier if you could talk to your browser as if it were the pizza guy?

It certainly would, based on my test drive of the Opera browser for Windows, which comes with built-in voice support. At least in this beta version, Opera’s most useful voice-activated features are the commands that control the browser itself. All I had to do was plug in my USB headset (if you’ve got a laptop, you can just use the built-in speaker and microphone) and turn on the voice feature in the preferences panel. Instantly, the browser obeyed my commands. “Opera, back!” I said, and the back button clicked. “Opera, next link! Opera, open link!” It all seems like a cute gimmick at first, but as I write this article I’m finding it easier to shout at Opera to scroll up and scroll down than to reach over a whole 2 inches to grab my mouse. (If you do want a cute gimmick, though, say, “Opera, speak!” and listen to MC Stephen Hawking read angst-ridden LiveJournal entries.)

The browser itself is fast and consistently crash-free. Opera has been ahead of the curve on tech specs and performance since the mid-1990s, but Wired News recently dubbed it “The Forgotten Browser”: latecomer Firefox now gets gushing reviews for features Opera had years ago. If Opera wants to strike back by luring away headset-wearing freaks like me, it has a bit more work to do. The speech-recognition software, which comes from IBM, performed well even with the stereo cranking a couple of feet away, but there are a few missing features. You can’t speak URLs into the browser yet. It would also be nice if Opera had a “search” command, so that shouting “Opera, search Paul Boutin!” would look for the phrase “Paul Boutin” using my default search engine. (A complete list of Opera’s voice commands is available online.)
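
To make the wished-for “search” command concrete, here is a minimal Python sketch of how already-recognized speech could be turned into a browser action. It is not Opera’s implementation. The function name `parse_command`, the wake word handling, and the search URL `https://search.example/` are all assumptions made up for illustration, and the speech recognition itself is taken as given.

```python
from urllib.parse import quote_plus

# Hypothetical search endpoint; a real browser would use the user's default engine.
SEARCH_URL = "https://search.example/?q={query}"
WAKE_WORD = "opera"  # commands in the article all start with "Opera, ..."


def parse_command(spoken):
    """Turn recognized text into an (action, argument) pair, or None.

    Assumes the recognizer has already produced plain text such as
    "Opera, search Paul Boutin!".
    """
    words = spoken.strip().rstrip("!.?").split()
    # Ignore anything that does not start with the wake word.
    if not words or words[0].rstrip(",").lower() != WAKE_WORD:
        return None
    command = words[1:]
    if not command:
        return None
    # Hypothetical "search" command: everything after it is the phrase.
    if command[0].lower() == "search" and len(command) > 1:
        phrase = " ".join(command[1:])
        # Quote the phrase so the engine looks for it as an exact phrase.
        return ("search", SEARCH_URL.format(query=quote_plus(f'"{phrase}"')))
    # Anything else is treated as a built-in command name, e.g. "back".
    return (" ".join(command).lower(), None)
```

Under these assumptions, `parse_command("Opera, back!")` yields the action `"back"` with no argument, while speech that lacks the wake word is ignored.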

Speech-driven computer interfaces are nothing new; you can buy IBM’s ViaVoice for about $45. But the Opera browser is significant because it adds support for a new markup language called X+V that takes on the annoyances of using the Web on a mobile gadget. X+V, short for XHTML plus Voice, has been jointly developed by IBM, Motorola, and Opera. The language makes it easy for Web designers to hide special tags on their sites that voice-enabled browsers can both speak (“Would you like a small, medium, or large?”) and listen for (“Give me extra anchovies.”).

This old IBM demo video shows how much easier it is to book a flight on your PDA if you can just say your flight number or destination city rather than having to type it out. (Note to IBM: In your next demo, replace Boston with Albuquerque—that’ll make the speech advantage obvious.) If you’ve installed the Opera browser, check out IBM’s demo menus for ordering pizza and Chinese takeout. Each menu only has a few items, so it may not be obvious why it’s so great to say “ginger chicken” instead of just clicking the checkbox. But imagine a real Chinese menu with 100 items that’s displayed on a tiny phone screen. With X+V, words with unusual spellings or pronunciations can be programmed into the page phonetically. It’s elementary to have the menu ask if you want soup or rice, too.

IBM’s demos are kind of dull, but Opera’s talking pages for Web developers are a cutup. One shows how to rig a page to parse sentences so it seems like it understands the user. Tell it, “I want to shut down the computer,” and it replies, “Why do you want to shut down the poor computer?” View the page’s source code, and you’ll see that it just listens for any sentence that matches “I want to ___ the ___” and plugs those words into a canned reply. Another page shows how a site’s voice could be flipped from male to female to match a customer profile.
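
The fill-in-the-blanks trick described above is easy to imitate outside a browser. The Python sketch below is not the demo page’s actual source, which is written in X+V markup. It only illustrates the same idea with a regular expression, and the names `PATTERN` and `canned_reply` are invented for this example.

```python
import re

# Template from the article: "I want to ___ the ___".
# The two named groups capture the blanks; trailing punctuation is ignored.
PATTERN = re.compile(
    r"^i want to (?P<action>.+?) the (?P<thing>.+?)[.!?]*$",
    re.IGNORECASE,
)


def canned_reply(utterance):
    """Return a canned reply if the sentence fits the template, else None."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    # No understanding happens here: the captured words are simply
    # plugged back into a fixed sentence.
    return f"Why do you want to {match['action']} the poor {match['thing']}?"


if __name__ == "__main__":
    print(canned_reply("I want to shut down the computer"))
```

A sentence that does not fit the template returns `None`, which is where a real page would presumably fall back to a generic prompt.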

It’ll probably be a couple more years before X+V makes its way into corporate Web sites—the technology is still being developed, and Opera hasn’t added voice support to its mobile browser yet. But the markup language itself is incredibly easy to write. I set up a talking demo page myself in five minutes and am now working on making it listen to me. A pro Web programmer could add voice interaction to a page quickly, making sites like Amazon and Orbitz much easier to use from a PDA or smartphone. Finally, a breakthrough for those of us who love versatility, portability, and the sound of our own voice.