How Wolfram Alpha could change software

The upstart "computational knowledge engine" claims its results are original works, raising important questions about software and intellectual property

Don't call Wolfram Alpha a search engine. Billed by its creators at Wolfram Research as a "computational knowledge engine," Wolfram Alpha uses mathematical techniques to cross-reference myriad specialized databases, producing unique results for each query. For example, query Wolfram Alpha for "San Francisco New York elevation" and you get back a page explaining that, at 52 feet above sea level, San Francisco is 60 percent higher than New York. (The same query at Google yields links to airline ticketing sites, a review of a Pilates studio in San Francisco, and a blurb about a New York burger joint.)

But that's not all that separates Wolfram Alpha from traditional search engines. Try cutting and pasting from the results page. You can't, and with good reason. According to Wolfram Alpha's terms of use, its knowledge engine is "an authoritative source of information," because "in many cases the data you are shown never existed before in exactly that way until you asked for it." Therefore, "failure to properly attribute results from Wolfram Alpha is not only a violation of [its license terms], but may also constitute academic plagiarism or a violation of copyright law."

In other words, Wolfram Research is claiming that each page of results returned by the Wolfram Alpha engine is a unique, copyrightable work, like a report or term paper. That makes Wolfram Alpha different not just from classic search engines, but from most software. While software companies routinely retain sole ownership of their software and license it to users, Wolfram Research has taken the additional step of claiming ownership of the output of the software itself. It's a bold assertion, and one that could have significant ramifications for the software industry as a whole.

The mad mathematician of the software industry

Stephen Wolfram, the brains behind Wolfram Alpha, has no qualms about bucking established practice. A mathematical prodigy, he published his first scientific paper at 15; studied at Eton, Oxford, and Caltech; and held staff positions at a number of universities, but left academia in the late 1980s to focus on entrepreneurship and independent research. He hasn't authored a proper, peer-reviewed paper in years, but in 2002 he published "A New Kind of Science," a 1,200-page tome that he claimed would revolutionize science and introduce "a whole new way of looking at the universe."

"It may sound arrogant, but I have moved pretty far away from what most scientists know about," Wolfram told Technology Review in 1997. "That means there are fewer and fewer people I can talk to about what I am doing. Your typical top scientist does not know this stuff."

Not everyone agrees. Wolfram's critics describe him as egotistical and dismissive; his research, undisciplined and flawed. Cosma Shalizi, a statistician at Carnegie Mellon University whose work overlaps Wolfram's, fears that in his isolation Wolfram has become "a crank in the classic mold." "A New Kind of Science," Shalizi says, is not the landmark book Wolfram claims it is; rather, it is "a rare blend of monster raving egomania and utter batshit insanity."

If his science is suspect, however, Wolfram's business acumen is above dispute. Mathematica, the computational toolkit that is privately held Wolfram Research's main software product, retails for around $2,500 per seat in many markets and has been wildly successful. The Wolfram Alpha engine was itself written in Mathematica, making it a valuable marketing tool for Wolfram's software offerings, if nothing else.

But Wolfram has much bigger plans for Alpha. The launch team says it's thinking of the project in terms of a 20-year-plus timeline and that it will always be a work in progress. The latter is generally true of any active site; in Wolfram's case, however, finding ways to profit from the knowledge engine must surely be an ongoing concern. No wonder the company sees its query results as intellectual property.

Is software output copyrightable?

Wolfram might be right. It is at least theoretically possible to copyright works generated by machines. Consider electronic music, for example. But some things can't be copyrighted, including recipes, simple instructions, and other trivial bits of information. For Wolfram Alpha to claim copyright protection for its query results, its pages must be such original presentations of information that they qualify as unique works of authorship.

How unique are they really? Wolfram claims that its knowledge engine is powered by exclusive, proprietary sources of "curated data," but many of the actual data points it works with are nothing more than commonplace facts. The query "300 feet in centimeters," for example, returns equally useful results whether you query Wolfram Alpha or Google; Wolfram Alpha merely dresses up the same answer. But as Wolfram Alpha improves, not every query will be quite so simple. It's easy to imagine cases where Wolfram's claims could be upheld.
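To see just how commonplace that particular answer is, consider that it takes a single multiplication by a fixed, published constant: one international foot is defined as exactly 0.3048 meters, or 30.48 centimeters. The Python sketch below is purely illustrative. The helper name `feet_to_cm` is hypothetical, and nothing here reflects how Wolfram Alpha or Google actually compute their results.

```python
from fractions import Fraction

# One international foot is defined as exactly 0.3048 m, i.e. 30.48 cm.
# Fraction keeps the arithmetic exact, avoiding binary floating-point rounding.
CM_PER_FOOT = Fraction(3048, 100)


def feet_to_cm(feet):
    """Convert a length in feet to centimeters (hypothetical helper)."""
    return Fraction(feet) * CM_PER_FOOT


if __name__ == "__main__":
    result = feet_to_cm(300)
    # 300 feet converts to a whole number of centimeters, so print it as an int.
    print(f"300 feet = {int(result)} centimeters")
```

Any calculator produces the same figure. Copyright in the result would have to rest on the presentation built around it, not on the fact itself.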

The problem is that under current copyright law, where copyright protection applies, it is automatic. Contrary to popular belief, you don't need to file a form with the government to copyright your work, or even attach a copyright notice to it. If the work qualifies, copyright exists as soon as the work is fixed in tangible form, even if it is only a letter, a doodle, or a novel that will live out its days in a box in your closet. So if copyright applies to Wolfram Alpha's output in some cases, by extension the same rules apply to every other information service in similar cases.

Copyright traps for software developers and users

Consider your Gmail inbox. That's your mail, right? But wait -- you didn't write it, someone else did. So it's theirs -- right? But the mail only exists in a database on Google's servers, and when you display the mail you're really viewing a custom presentation of the raw data in your Web browser, created exclusively by Google. Is that presentation a unique work? What about when you search your mail or organize it into threads? Now you're asking Gmail to perform transformations on the data held in its stores and present it in new ways. Are these views of data any less unique than Wolfram Alpha's query results?
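Threading is a good example of that kind of transformation. Gmail's actual threading rules aren't described here, so the following Python sketch assumes a deliberately naive approach: it strips "Re:" and "Fwd:" prefixes from subject lines and groups messages that share the remaining subject. The `Message` type, its fields, and the function names are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    sent_at: int  # hypothetical timestamp, e.g. seconds since the epoch


# Matches any run of leading reply/forward prefixes such as "Re: Fwd: ".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes and surrounding whitespace, then lowercase."""
    return _PREFIX.sub("", subject).strip().lower()


def build_threads(messages):
    """Group messages by normalized subject; each thread is sorted oldest first."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg.subject)].append(msg)
    return {key: sorted(msgs, key=lambda m: m.sent_at) for key, msgs in threads.items()}


if __name__ == "__main__":
    inbox = [
        Message("alice", "Lunch?", 100),
        Message("bob", "Re: Lunch?", 200),
        Message("carol", "Budget", 150),
    ]
    for subject, thread in build_threads(inbox).items():
        print(subject, [m.sender for m in thread])
```

None of the messages in the example is altered. The threaded view is simply a new arrangement of other people's text, produced by rules the service provider wrote, which is exactly what makes the ownership question murky.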

If the answer is no, then where do we draw the line? Wolfram Alpha and Gmail are both examples of SaaS (software as a service), but even desktop software is offered under strict license terms. Suppose you have an Excel spreadsheet full of numbers that you input, but then you ask Excel to generate a series of complex graphs based on rules, formulae, and templates designed by Microsoft. Or what about pivot tables? What about mash-ups or tools like Mozilla Jetpack? If unique presentations based on software-based manipulation of mundane data are copyrightable, who retains what rights to the resulting works?
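A pivot table shows how little the "mundane data" changes while its presentation changes a great deal. The Python sketch below is not how Excel implements pivot tables. It is a minimal illustration of the same group-and-summarize idea, and the `pivot_sum` function and the sample sales figures are hypothetical.

```python
from collections import defaultdict


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for each (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for display and comparison.
    return {r: dict(cols) for r, cols in table.items()}


# Hypothetical user-entered data: the "numbers that you input."
sales = [
    {"region": "East", "quarter": "Q1", "amount": 100.0},
    {"region": "East", "quarter": "Q1", "amount": 50.0},
    {"region": "East", "quarter": "Q2", "amount": 75.0},
    {"region": "West", "quarter": "Q1", "amount": 20.0},
]

if __name__ == "__main__":
    summary = pivot_sum(sales, "region", "quarter", "amount")
    for region, by_quarter in sorted(summary.items()):
        print(region, by_quarter)
```

The user supplies every number, while the software supplies the rules that reshape them. If the reshaped view were a unique work, it would be unclear whether it belonged to the user, the vendor, or both.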

Of course, these are all questions for the lawyers. Until the answers are tested in the courts, however, the issue of copyright for software output will remain a gray area. In the meantime, software developers and their legal counsel must anticipate how intellectual property law might affect users of their software, consumers of their services, their business partners, and themselves, particularly as computing moves ever further onto the Web.

Copyright © 2009 IDG Communications, Inc.

InfoWorld Technology of the Year Awards 2023. Now open for entries!