Thursday, June 02, 2011

The Web of Data -- WolframAlpha is two years old

There are many ways to retrieve stored data. Text search à la Google is the most common, but we also have relational databases like Zoho Creator, hierarchical taxonomies like the Yahoo Directory or the Dewey Decimal system used in libraries, and keyword tags like the label terms in the right-hand column of this blog.

WolframAlpha, which just celebrated its second birthday, stores and retrieves structured data, but it goes beyond retrieval by using the data for computation. For example, when I entered "New York to Los Angeles at 100 miles/hour," it computed the straight-line distance between the two cities from their geo-coordinates and then computed the time it would take to travel that distance at 100 miles/hour.
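WolframAlpha doesn't say exactly how it does this, but the two steps can be sketched roughly as follows. This minimal Python sketch assumes a spherical Earth (the haversine formula, mean radius 3,958.8 miles) and approximate city-center coordinates that I picked for illustration; WolframAlpha's own geodesy and coordinates may well differ.

```python
import math

# Mean Earth radius in miles; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_MILES = 3958.8

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def travel_hours(distance_miles, speed_mph):
    """Time in hours to cover a distance at a constant speed."""
    return distance_miles / speed_mph

# Approximate (lat, lon) city centers, assumed for this example only.
NEW_YORK = (40.71, -74.01)
LOS_ANGELES = (34.05, -118.24)

distance = great_circle_miles(*NEW_YORK, *LOS_ANGELES)
print(f"{distance:.0f} miles, {travel_hours(distance, 100):.1f} hours at 100 mph")
```

With these coordinates the distance comes out between 2,400 and 2,500 miles, so the trip takes roughly 24 to 25 hours at 100 miles/hour.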

It also showed the assumptions it made: that I meant the city of New York, not the state or the financial note, and that I meant Los Angeles, California, not Los Angeles, Chile. It inferred that I wanted a travel time from the fact that my query included a velocity (100 miles/hour). It also "knows" that velocity is one kind of a broader class, rate.
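Here is a toy version of that interpretation step, to show how a rate in the query can change what gets computed. Everything in it is hypothetical: the query pattern, the table of default readings for ambiguous names, the set of velocity units, and the one-entry class table are stand-ins I made up, not WolframAlpha's actual parser or data.

```python
import re

# Hypothetical default readings for ambiguous place names. A real system would
# rank many candidate meanings; this table simply picks one.
DEFAULT_READINGS = {
    "new york": "New York City, New York",
    "los angeles": "Los Angeles, California",
}

# Hypothetical units that signal a velocity (length per time).
VELOCITY_UNITS = {"miles/hour", "mph", "km/h"}

# Hypothetical fragment of a quantity taxonomy: velocity is a kind of rate.
PARENT_CLASS = {"velocity": "rate"}

# Matches queries shaped like "<place> to <place> at <number> <unit>".
QUERY = re.compile(
    r"^(?P<src>.+?) to (?P<dst>.+?) at (?P<num>\d+(?:\.\d+)?) (?P<unit>\S+)$",
    re.IGNORECASE,
)

def interpret(query):
    """Return a dict describing the assumed meaning of a travel query, or None."""
    m = QUERY.match(query.strip())
    if m is None:
        return None
    unit = m["unit"].lower()
    kind = "velocity" if unit in VELOCITY_UNITS else None
    return {
        "from": DEFAULT_READINGS.get(m["src"].lower(), m["src"]),
        "to": DEFAULT_READINGS.get(m["dst"].lower(), m["dst"]),
        "value": float(m["num"]),
        "unit": unit,
        "quantity": kind,
        "quantity_class": PARENT_CLASS.get(kind),
        # A velocity in the query is taken as a request for a travel time.
        "task": "travel time" if kind == "velocity" else "distance",
    }

print(interpret("New York to Los Angeles at 100 miles/hour"))
```

The displayed "assumptions" correspond to the dictionary lookups: each one silently picks a meaning, which is why it is useful for the system to show them.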

It easily handled unit conversions. When I modified the query to "Los Angeles to New York at 1 inch per hour," it reversed the direction of the arrow in the diagram and told me the trip would take 17,804 years 3 months 24 days 22 hours 35 minutes.
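The conversion itself is simple arithmetic: turn the distance into inches (5,280 feet times 12 inches per mile), divide by the rate to get a duration, and split that duration into calendar units. In this sketch the calendar conventions (a 365.25-day year and a month of one twelfth of a year) are my assumptions; WolframAlpha may use different ones, which would shift the months and days in its answer.

```python
INCHES_PER_MILE = 63_360  # 5,280 ft x 12 in, exact

# Assumed calendar conventions: a Julian year of 365.25 days,
# and a month of exactly one twelfth of that year.
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365.25 * MINUTES_PER_DAY
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12

def travel_minutes(distance_miles, rate_inches_per_hour):
    """Minutes needed to cover a distance (in miles) at a rate in inches/hour."""
    inches = distance_miles * INCHES_PER_MILE
    return inches / rate_inches_per_hour * 60

def breakdown(total_minutes):
    """Split a duration into whole years, months, days, hours and minutes."""
    remaining = round(total_minutes)
    parts = []
    for name, size in (("years", MINUTES_PER_YEAR), ("months", MINUTES_PER_MONTH),
                       ("days", MINUTES_PER_DAY), ("hours", 60), ("minutes", 1)):
        count = int(remaining // size)
        remaining -= count * size
        parts.append((name, count))
    return parts

# 2,450 miles is an assumed, approximate New York to Los Angeles distance.
print(breakdown(travel_minutes(2450, 1)))
```

With that approximate distance the result is about 17,700 years, close to WolframAlpha's figure; the exact digits depend on the distance and conventions used.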

My next query was "calories in 2 slices of bread and 2 tablespoons of peanut butter and 2 tablespoons of jelly." It calculated the weight of the ingredients, looked up the calories in each, and displayed the total: 440 calories. It also displayed other nutrition facts, such as the amounts of fat, saturated fat, and cholesterol.
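The calculation has two steps: convert each portion (slices, tablespoons) to a weight, then multiply that weight by a calories-per-gram figure. A minimal Python sketch of that two-step lookup follows; the food table uses placeholder numbers picked to keep the arithmetic easy, not real nutrition data, and the `Food` type and `totals` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Food:
    name: str
    grams_per_unit: dict       # portion unit -> grams, e.g. {"tablespoon": 15.0}
    calories_per_gram: float

def totals(items, foods):
    """items: iterable of (quantity, unit, food_name); returns (grams, calories)."""
    grams = calories = 0.0
    for quantity, unit, name in items:
        food = foods[name]
        weight = quantity * food.grams_per_unit[unit]   # step 1: portion -> weight
        grams += weight
        calories += weight * food.calories_per_gram     # step 2: weight -> calories
    return grams, calories

# PLACEHOLDER values, not real nutrition figures.
FOODS = {
    "bread": Food("bread", {"slice": 25.0}, 3.0),
    "peanut butter": Food("peanut butter", {"tablespoon": 15.0}, 6.0),
    "jelly": Food("jelly", {"tablespoon": 20.0}, 2.5),
}

QUERY = [(2, "slice", "bread"), (2, "tablespoon", "peanut butter"),
         (2, "tablespoon", "jelly")]

grams, calories = totals(QUERY, FOODS)
print(grams, calories)  # 120.0 430.0 with these placeholder numbers
```

The same structure explains the sandwich result below: if "sandwich" isn't in the table, the system can only sum the ingredients it does recognize, using default portions.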

WolframAlpha has information on many types of objects, but its coverage is limited. When I asked for the "calories in a peanut butter and jelly sandwich," it overlooked the bread and assumed two tablespoons of peanut butter and one of jelly. It does not have data on "sandwiches."

But the next version may. Wolfram is constantly adding new data. During WolframAlpha's second year, they added data in these categories: US Economy, International Data, US Social Statistics, Culture and Media, Geography, Astronomy, Chemistry, Earth Sciences, Engineering, Health and Medicine, Life Sciences, Materials, Physics, Money and Finance, Units and Measures, Math, and Technology and Computer Systems.

It is noteworthy that Wolfram has decided that its own staff will add new data. This contrasts with Freebase, a structured data storage and retrieval system to which any user can add data, wiki style. (Freebase lacks WolframAlpha's ability to compute.)

Freebase and WolframAlpha are building a "web of data," as opposed to a web of HTML and JavaScript pages. They include semantic information: they know what the data they store means.
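One common way to store data that "knows about itself" is as subject-predicate-object triples, the model behind RDF, plus rules that derive new facts from old ones. The sketch below is hand-rolled Python rather than a real RDF library, and its vocabulary (`is_a`, `subclass_of`) and facts are illustrative assumptions; it mimics the kind of subclass inference that lets a system conclude that a velocity is a rate.

```python
# Facts as (subject, predicate, object) triples, loosely in the style of RDF.
# The predicate names and the facts themselves are illustrative assumptions.
TRIPLES = {
    ("velocity", "subclass_of", "rate"),
    ("rate", "subclass_of", "quantity"),
    ("100 miles/hour", "is_a", "velocity"),
    ("Los Angeles", "is_a", "city"),
    ("city", "subclass_of", "place"),
}

def superclasses(cls, triples):
    """All classes reachable from cls by following subclass_of (transitively)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subclass_of" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def types_of(entity, triples):
    """Direct types of an entity plus every class those types inherit from."""
    direct = {o for s, p, o in triples if s == entity and p == "is_a"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred

print(sorted(types_of("100 miles/hour", TRIPLES)))  # ['quantity', 'rate', 'velocity']
```

Nothing in the stored facts says directly that "100 miles/hour" is a rate; that conclusion is derived, which is the sense in which such a system can infer things it was never told.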

Is this the future of the Web? Tim Berners-Lee, the inventor of the Web protocols, is now focusing his attention on the semantic web, and Google has acquired Metaweb, the company behind Freebase. Google knows you are an instance of the class person. What can they infer about you?