What don’t the major search engines do well?
That’s a burning question these days from way too many parties. It
comes up in John Battelle’s most recent post (one that reminds me how
effective a writer he is), which culminates in a plea for teaching
search literacy in schools, and in journalist Cyrus Farivar’s thoughtful
commentary on it. It’s a hot topic on Alt Search Engines,
where Charles Knight finds and promotes startups that beat the engines
at their own game. And of course it’s the raison d’être for all those
alternative engines.
There are two ways to address the problem of engines’ shortcomings.
One way is to make the engines smarter. The other is to make the users
smarter. With any luck we’ll meet somewhere in the middle.
It’s interesting to see startups’ different approaches to making themselves smarter.
Powerset, for example,
which was acquired by Microsoft, only parses Wikipedia pages so far, so
it’s hard to gauge the real value of the technology. Powerset will
summarize pages and also claims to show “factz” (ugh) and meaning. Many
of the suggested searches under “meaning” also relate to facts, such as
“What did Caravaggio paint,” “What awards did ‘No Country for Old Men’
win,” and “When did earthquakes hit Tokyo.” The results are clearly
organized, but any real power is still hidden under the hood.
True Knowledge, still
in beta, also prides itself on facts, but the kind spelled with an s.
Company strategists wrote in an email update earlier this month, “We
are really excited that our knowledge base has expanded to the point
that it now boasts over 110 million facts, which has greatly increased
the number of questions we can answer.” I wonder how many billions of
facts are out there, and which are the ones people either currently use
search engines for or would use search engines for if they trusted
that search engines had those answers. Some context would be useful to
ground the big number.
The engines also continue to progress in answering factual questions, even if
they don’t provide direct answers to 110 million questions. One True
Knowledge example is “When is Easter Sunday 2009.” True Knowledge
answers this perfectly with a big, bold answer, but it’s also in the
description of the first result in Google.
Hakia, another contender, has had
loftier sights from the start. All of its sample queries on the
homepage strive to address more than facts. These queries include “What
is the most effective way to lose weight,” “What are benefits of a
long-term care insurance,” “Is bottled water better than tap water,”
and “Most common risk factors for stroke.” Even more surprising is that
the first three queries don’t have clear, quantifiable answers. The
weight loss query, rather than responding with “put down the Mallomars,
turn off Hulu, and blow the dust off the Wii Fit already!,” brings up a
bunch of results — including many from librarian-certified sites — with
highlighted text that helps answer the query. The user then can make
informed decisions. Hakia also has thousands of fact-filled pages in
its galleries, but it’s focused on addressing higher-level queries.
Facts are an important area for search engines to cover. Anyone who’s
old enough to remember the struggles of finding facts in reference
books or libraries will appreciate how much time search engines save.
They even proved useful at my family’s lunch celebration for my birthday
this weekend. My dad tried to recall the name of the actor who played
President George H.W. Bush in Oliver Stone’s movie “W.,” just noting it
was an “old guy who plays mobsters,” which I instantly pegged as James
Cromwell. My nephew checked IMDB.com from his mom’s iPhone and proved
me right. It felt good.
Better indexing of facts will make engines more useful, but it
won’t make them smarter. For that, they need to be more intelligent,
and better at indexing meaning. It’s the difference between knowledge
and wisdom.
No matter what, we’ll still require human search literacy. Even
facts are subject to interpretation. With a simple phrase like “Joe the
Plumber,” one person can think it refers to a plumber named Joe, while another
can assert that it’s someone named Sam who isn’t licensed as a plumber.
We can’t assume search engines will ever sort it out perfectly like
the eponymous protagonist of the movie “WALL-E.” We can aim to make
sure we’re smart enough to make the most of all the engines do for us.
People reacted to this story.
Philosophers sometimes define knowledge as “true, justified belief.” Powerset’s Factz are beliefs, in that they occur as a proposition in Wikipedia, and are justified, at least in the trivial sense that they are verified by the Wikipedia community. We’re trying to stay away (for now) from the question of truth (for more, see a blog post I wrote about truth).
Leaving philosophical musing aside, Factz are assertions in Wikipedia that we derive from sentences via our syntactic parse. By reading every sentence, we can pull out these subject-verb-object triples. What’s neat about them is that we’re able to succinctly summarize a lot of information about a topic that’s strewn throughout Wikipedia. Our Factz project is just the first in a whole line of features enabled by Powerset’s semantic index. And you’re absolutely right that there’s a lot more going on under the hood in Powerset than just what you see exposed already.
We’d love to chat with you in more detail, so feel free to drop me a line if you’re curious.
Cheers,
Mark Johnson
Powerset Sr. Program Manager