The Chronicle of Higher Education
From the issue dated January 21, 2000
http://chronicle.com/weekly/v46/i20/20b00601.htm
Searching for the Right Search Engine
By ROBERT BERKMAN
Researchers now have it all on the World Wide Web: facts on virtually
any topic, available from the far corners of the globe, unfiltered by
reporters, editors, or publishers, and usually free. But sometimes we
feel that we have too much information -- often way too much -- and
that it may not be correct.
Despite the latest flurry of prime-time ads by search-engine vendors
boasting that they can find anything you want online, search engines
can't distinguish among Web pages based on their contents. The only
way researchers can pinpoint information on the Web is if they learn
how to do efficient Web searches, and which engines are best for which
purposes.
One important lesson is to understand the range of search tools now
available. Many researchers don't realize that they can use
hierarchical indexes, standard search engines, alternative search
engines, meta search engines, and databases -- and that those tools
are not all the same.
In a hierarchical index -- probably the best known is Yahoo
(http://www.yahoo.com) -- people trained to categorize information,
such as librarians and indexers, examine Web sites and put them in
categories and subcategories. Thus, when you do a search on a
hierarchical index, it is much more likely that what you find will be
relevant to what you are looking for.
The drawback to hierarchical indexes is that they are extremely
selective. Because they are created by human beings rather than by
computers, they can include only a tiny portion of what is available
on the Web. Of course, in these days of abundant information, that may
not be such a bad thing.
Yahoo uses a standard search engine as well. For that reason, the
results of a search on Yahoo are split into several sections.
"Category matches" inform you if your topic matches one of Yahoo's
existing categories. "Site matches" are the sites that have been
indexed and categorized. "Web pages" provide links to pages located by
the search engine. Yahoo also groups results into two other sections:
"related news," for any news item it locates on your subject, and "Net
events," which are mostly chat sites.
Yahoo is by no means the only hierarchical index, and some of the many
others are aimed specifically at academic users. The latter group
includes AlphaSearch (http://www.calvin.edu/library/as), BUBL Link
(http://www.bubl.ac.uk/link), and Infomine (http://infomine.ucr.edu).
Then there are the standard search engines. Popular ones include
AltaVista (http://www.altavista.com), Excite (http://www.excite.com),
Go Network (http://infoseek.go.com), and HotBot
(http://hotbot.lycos.com). Unlike hierarchical indexes, standard
search engines send out software "robots" or "spiders" to search the
Web and index the pages in each site they encounter. The engines then
calculate mathematically how relevant the pages are to your search
terms; each engine uses its own algorithm to rank pages. Factors in
the calculation include the frequency and placement of your keywords
on a page, and their occurrence in the descriptions that owners write
of their pages, which are invisible to users. The search engine puts
the pages that get the highest score at the top of the list of
results.
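To make the idea of a relevance score concrete, here is a minimal sketch in Python. It is not any real engine's algorithm, since each engine keeps its own formula. The function names, the field weights, and the sample pages are all assumptions made up for illustration. It simply counts how often each keyword appears and gives more credit to a keyword in the title or in the owner's description than to one in the body.

```python
import re

# Hypothetical weights: a keyword in the title counts more than one in the
# owner-written description, which counts more than one in the body text.
FIELD_WEIGHTS = {"title": 3.0, "description": 2.0, "body": 1.0}


def relevance_score(query, page):
    """Toy score: keyword frequency, weighted by where the keyword appears."""
    terms = re.findall(r"\w+", query.lower())
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"\w+", page.get(field, "").lower())
        for term in terms:
            score += weight * words.count(term)
    return score


def rank_pages(query, pages):
    """Return the pages sorted from highest score to lowest."""
    return sorted(pages, key=lambda p: relevance_score(query, p), reverse=True)


# Made-up sample pages, for illustration only.
sample_pages = [
    {"url": "http://example.org/cooking",
     "title": "Cooking tips",
     "description": "",
     "body": "Use solar ovens sometimes."},
    {"url": "http://example.org/solar",
     "title": "Solar power basics",
     "description": "solar energy guide",
     "body": "Solar panels convert light."},
]

for page in rank_pages("solar energy", sample_pages):
    print(page["url"], relevance_score("solar energy", page))
```

With those weights, the page that mentions the keywords in its title and description lands above the page that mentions one keyword once in passing. Change the weights and the order can change, which is one reason different engines return different lists for the same query.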
Savvy researchers will avoid standard search engines when they have a
very broad subject. Instead, they will use a hierarchical index to
find just a few relevant, well-cataloged sites.
Alternative search engines, which take various approaches to ranking
and sorting the pages that they find, are often more helpful than
standard engines. Northern Light (http://www.northernlight.com), for
instance, ranks Web pages as a standard search engine does. But
instead of displaying all of its results in a single listing, it sorts
pages into categories and groups the results into folders. As an
example, a search for "alternative energy" creates folders with labels
such as "solar power," "air pollution," and "National Technical
Information Service," which includes documents from that agency. And
the folders contain subfolders. Within the solar-power folder, for
instance, are folders for "photovoltaic systems" and "government
sites." That arrangement of material can help you determine which
groups of pages are most likely to be relevant to your needs.
Ask Jeeves (http://www.askjeeves.com) takes an altogether different
approach. Instead of entering keywords, you type a question in plain
English -- perhaps "Is there evidence of life on Mars?" Ask Jeeves has
recorded millions of questions that users have asked it, and has found
Web sites that answer those questions.
The first thing that Ask Jeeves does after getting your query is to
scan its database of questions and answers. It then gives you a list
of questions that it "thinks" you want the answer to. If you select
one of them, it lists sites that contain the answers. Ask Jeeves
doesn't always work, but it can save you time, and it is fun to use.
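How Ask Jeeves decides which stored questions resemble yours is not public, so the following Python sketch is only an assumption about one simple way it could be done: compare the words in your question with the words in each stored question, and list the closest matches first. The function names and the tiny question database are hypothetical.

```python
import re


def word_set(text):
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_questions(user_question, stored_questions, top_n=3):
    """Return stored questions ordered by word overlap with the user's question."""
    asked = word_set(user_question)
    scored = []
    for question in stored_questions:
        known = word_set(question)
        union = asked | known
        # Jaccard similarity: shared words divided by all distinct words.
        overlap = len(asked & known) / len(union) if union else 0.0
        if overlap > 0:
            scored.append((overlap, question))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:top_n]]


# Hypothetical database: each stored question maps to sites that answer it.
question_db = {
    "Is there life on Mars?": ["http://example.org/mars-life"],
    "How far is Mars from Earth?": ["http://example.org/mars-distance"],
    "What is the capital of France?": ["http://example.org/paris"],
}

suggestions = suggest_questions("Is there evidence of life on Mars?", question_db)
print(suggestions[0], question_db[suggestions[0]])
```

Selecting one of the suggested questions then amounts to looking up the sites stored with it, which is the second step described above.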
Google (http://www.google.com) takes yet another tack. Like other
search engines, it first matches up your keywords to the pages it has
collected in its index. Then, however, it ranks each page based on how
many other pages link to it -- and how many link to those pages in
turn. The pages you see at the top of your list of results are those
with the highest number of links from other pages. The idea is that such
popularity is meaningful, just as a diner that has many trucks parked
in front probably serves better food than the diner whose parking lot
is empty. The approach works. After several years of being a loyal
AltaVista user, I am now a "googler."
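The link-counting idea can be sketched in a few lines of Python. This is a simplified illustration and not Google's actual formula. The function name, the damping value, and the four-page sample Web are assumptions. Each page repeatedly passes a share of its score to the pages it links to, so a link from a popular page is worth more than a link from an obscure one.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up sample Web: pages a, b, and d all link to c, and c links to a.
sample_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(sample_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sample, page c comes out on top because three pages link to it. Page a comes second even though only one page links to it, because that one page is the popular c.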
Oingo (http://www.oingo.com) has an even more radical approach. The
site's slogan is "We know what you mean," and Oingo conducts a
"conceptual search" to make sure that it understands your request. Ask
it to search for "china," for example, and it will ask you to choose
"porcelain" or any of the various geographical Chinas. Once you make a
selection, Oingo will display "directory hits" and "Web hits." The
site combines a hierarchical index and a search engine (it uses
AltaVista), although the conceptual search applies only to its
directory results.
Search engines that search other engines are called meta search
engines. Among the popular ones are Dogpile (http://www.dogpile.com),
Inference Find (http://www.inferencefind.com), and MetaCrawler
(http://www.metacrawler.com). The concept here is that because no
single search engine indexes the entire Web, using a meta search
engine allows a researcher to scan more sites. The downside is that
such an engine needs to use a "lowest common denominator" search
statement, so that all of the search engines that it searches
understand the request. Therefore, meta search engines are not a very
good choice for complex searches, involving, say, Boolean logic.
(Dogpile does include some Boolean-search capabilities.)
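The "lowest common denominator" problem is easy to see in a small Python sketch. Nothing here reflects how Dogpile, Inference Find, or MetaCrawler are actually built. The function names are hypothetical, and the two "engines" are stand-in functions that return canned results rather than real services. The sketch strips a query down to plain keywords, sends it to every engine, and merges the answers while dropping duplicates.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def simplify_query(query):
    """Reduce a query to plain keywords that every engine can accept."""
    cleaned = query.replace("(", " ").replace(")", " ").replace('"', " ")
    # Dropping the operators loses the logic of the original request,
    # which is exactly the drawback of a lowest-common-denominator query.
    return " ".join(w for w in cleaned.split() if w not in BOOLEAN_OPERATORS)


def meta_search(query, engines, limit=10):
    """Send one simplified query to each engine and merge the results."""
    simple = simplify_query(query)
    merged = []
    seen = set()
    for name, search in engines.items():
        for url in search(simple):
            if url not in seen:  # skip pages another engine already returned
                seen.add(url)
                merged.append((url, name))
    return merged[:limit]


# Stand-in engines that return canned results, for illustration only.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]


results = meta_search("solar AND (wind OR hydro)", {"A": engine_a, "B": engine_b})
for url, source in results:
    print(source, url)
```

The query "solar AND (wind OR hydro)" reaches each engine as the bare keywords "solar wind hydro", so the merged list covers more of the Web but no longer honors the Boolean logic.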
A completely different strategy is to search a database on the Web.
Hundreds of databases originally searchable on CD-ROM or through
proprietary online dial-up services are now available on the Web, and
new databases are continually being born there as well. That makes it
possible to search rich databases with a standard Web browser,
although in many cases, the researcher must pay a fee or be affiliated
with a university that subscribes to the database. The fee-based sites
typically filter the data they contain, increasing the likelihood that
the results will be relevant to a search; many also offer superior
search capabilities, so requests can be more precise.
The many new, free databases on the Web can also be helpful. A site
that does an excellent job of identifying and sorting free databases
is The BigHub (http://www.thebighub.com). Through its "specialty
search categories," it allows you to search more than 1,500 databases
on the Web, many of which are oriented toward academics.
What new tools for searching the Web are on the horizon? At a recent
conference, I heard about "vortals," vertical portals that provide
information from only a designated slice of the Web. For example, a
vortal might search only those sites and pages that have to do with
health care. VerticalNet (http://www.verticalnet.com) offers portals
to industries including communications and advanced technologies.
Although the concept is a good one, the jury is still out on vortals'
usefulness.
Farther down the road are visual representations of search results.
Those search tools display their results graphically, allowing you to
see at a glance which items are the most relevant. A service called
NewsMaps (http://www.newsmaps.com), for example, displays the results
of your search as a thematic map. Topographical markers indicate
clusters of similar documents -- the most similar ones are piled up
into little hills. According to Cartia, the company behind the
technology, the maps are created automatically by an algorithm that
"reads documents, extracts the content, and organizes the collection
into a map." You can view some sample maps at the site.
No matter which search tool you choose, you will get the best results
if you know what information you need, know the advantages and
disadvantages of the various ways to search the Web, and regularly
practice doing research online. Despite technological innovation, the
best research tool remains the human brain.
Robert Berkman is a member of the faculty of the graduate
media-studies program at the New School University, and conducts
workshops on searching the Internet. He is the author of Find it Fast:
How to Uncover Expert Information on Any Subject.