VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Subject:
From:
catherine turner <[log in to unmask]>
Reply To:
catherine turner <[log in to unmask]>
Date:
Sat, 1 May 1999 13:32:35 +0100
Content-Type:
text/plain
Parts/Attachments:
text/plain (403 lines)
I thought this was useful.

Catherine

From Computer Shopper

 #33  Searching the Net
          Like a little whippet out of a trap, Kay Ewbank hares all over
          the Net extracting precisely the information she wants. No
          rubbish, no messing. Good work, sparky

How many times have you wanted to find something on the World Wide Web,
only to fail miserably? You either find no matches whatsoever or, even
worse, you get matches one to 21 of 54,931. What most people forget, if
they ever knew, is that there are lots of things you can do to improve
the success rate of your Web queries - to make sure you get the results
you want and only the results you want.
One example that shows just why you need to learn how to write clever
Web queries is this e-mail that someone posted on a newsgroup:

I've just been having such an astonishing time with a search engine that
I thought it might amuse you!
I put in 'Plastic Bottles'. Straightforward, you might think?
I got the programme for the Dorset Arms Darts Team; an ad for
'Ewe-Boost' for that post-lambing slump; a chocolate-tasting event; a
record of council tenants' complaints sent to Chichester Council and the
home-page of 'Jasper', who was apparently sent to live in a bird-park
when aged 4...
I haven't found any bottles yet!!!

It has to be said that this list is unusually weird and entering the
phrase 'plastic bottles' in most search engines gets you far too many
manufacturers of plastic bottles for anyone's wishes. However, the
general principle applies; Internet searches are strange, show you
things that seem completely irrelevant and, worse, sometimes don't show
you anything useful at all.

The key, the secret
So how can you make your Web queries work and give you the answer that
you want? The problem comes from the amount of information there is out
there on the Internet. The number of publicly available documents runs
into hundreds of millions and it's doubling every six months. Our
current situation must be something similar to the early days of the
printing press; the amount of information that's on offer just keeps
increasing and no-one has formalised the way to manage it. You can make
guesses yourself over what's available, but most people use a search
engine. These are special Web sites where someone else has spent a lot
of time and money building up an index so that when you enter a search
phrase, you are shown Web sites that might possibly be of interest.
Search engines come in two types - real search engines and directories.
Directories show you the information already organised into a structure
that you can recognise. Yahoo is the best known example of this and if
you look at www.yahoo.com, you'll find some headings at the top
(weather, shopping, accommodation). At the right hand side will be
breaking news headlines, but the majority of the page shows you
headings - business & economy; recreation & sport; reference;
entertainment; health and so on. Click on any of the headings and you're
shown sub-categories - games; fitness; travel; sport and so on in the
'recreation & sport' category. Click again and you go on down the list
until you find the specific category that interests you. There is a
search box and at any time you can enter a phrase to search on within
the category you're looking for.
Real search engines are rather different. The best known of the engines
include AltaVista, Infoseek and Excite. A search engine indexes the
words in all or part of documents within Web sites. It can also take a
list of index words provided by the Web site creator as being relevant
to that particular site. This is why you can sometimes end up at
surprising sites, because it comes down to what the Web site creators
thought their site was about.
The biggest problem with search engines is that they show you far too
much; the dreaded message 'matches one to 21 of 35,000,000' on a list of
possible documents is not a harbinger of a quick solution. Another point
to note is that even the best and biggest of the search engines indexes
only as much as a third of the documents on the Internet, so search
engines aren't infallible.

Ten top tips to try
So what can you do to improve your odds? The short answer is, lots. The
way you structure a query, and the words you use, can make the
difference between getting a match of a dozen worthwhile sites, or
finding nothing at all or millions of irrelevant pages. Here are our top
ten tips for clever queries:

1 Use nouns as the keywords for your queries. Try 'plastic bottle
manufacturer', not 'making plastic bottles' or 'better plastic bottles'.
The reason for this is that most search engines don't index on verbs or
adjectives - just nouns.

2 Don't use plurals. Use 'plastic bottle', not 'plastic bottles'. You
can also tell the search engine to look for all the forms of the words
by adding what's known as a wildcard. This is a special character that
means 'look for this word and all words starting with this word'. The
character that's used is the asterisk - '*'. If you put in 'plastic
bottle*', you'll get bottle, bottles, bottlemakers and anything else
starting with bottle. This can result in getting twice as many useful
matches.
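
As a rough illustration of the wildcard idea (this is not how any
particular search engine is implemented, and the function name
wildcard_match is made up for the example), a trailing asterisk can be
treated as 'starts with':

```python
def wildcard_match(term: str, word: str) -> bool:
    """Return True if `word` satisfies `term`.

    A trailing '*' in `term` means "this word and anything starting
    with it"; without the asterisk the words must be identical.
    """
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


# A hypothetical handful of indexed words.
words = ["bottle", "bottles", "bottlemakers", "bottom", "plastic"]
matches = [w for w in words if wildcard_match("bottle*", w)]
print(matches)
```

Without the asterisk, 'bottle' would match only the exact word, which is
why the plural form on its own can miss pages.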

3 Use plenty of keywords. Most people put one word, or at most two
words, in their search. Put in half a dozen or more and you'll be a lot
closer to getting the match you need. Each keyword cuts down the number
of erroneous matches and, if you use enough, you can cut out almost all
the irrelevant stuff. For instance, if Terri had used 'plastic bottle*
manufacture*', that would have avoided all the incorrect matches she did
get. This is one of the most difficult ideas to use in practice, because
there's a feeling that you don't know what it is you want to see.
However, you probably have more of an idea than you realise and you can
almost certainly come up with three or four words. If you're worried
that you won't get any useful matches, you aren't using the right search
engine - more of that later.

4 Match all the words in your search. The default behaviour in most
search engines, unless you tell them differently, is that you'll get
matches on the words in your search as though they were separated by an
OR operator. In other words, if you enter 'plastic bottles' most search
engines assume you mean plastic OR bottles, so anything with plastic or
bottles will come up in your match. What most people want is an implicit
AND between the words - plastic AND bottles. To be sure you get this,
put a plus sign in front of each of the words in the list: '+plastic
+bottles'.
This works in nearly all the search engines on the Internet.

5 Take out irrelevant matches. If you're looking for references to
plastic bottles but don't want any of the manufacturers' sites to come
up in the list, you would want a query that was the equivalent of
'plastic and bottles but not manufacturer'.
The way to do this is to put in the words you don't want with a minus
sign in front of the word. For example, '+plastic +bottle
-manufacturer'.
You usually find the words you don't want by doing a search and being
frustrated by lots of erroneous matches!
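
To make the plus and minus signs concrete, here is a small sketch of how
such a query could be checked against the text of a page. It is only an
illustration under simple assumptions (whole-word matching, no
wildcards); the names parse_query and page_matches are invented for the
example and do not come from any real search engine.

```python
import re


def parse_query(query):
    """Split a query into required (+), excluded (-) and plain words."""
    required, excluded, plain = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            plain.append(token)
    return required, excluded, plain


def page_matches(query, text):
    """Return True if the page text satisfies the query."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, plain = parse_query(query)
    if any(w in words for w in excluded):
        return False  # a minus word rules the page out
    if not all(w in words for w in required):
        return False  # every plus word must be present
    if plain and not required:
        # Plain words behave like OR: any one of them is enough.
        return any(w in words for w in plain)
    return True


query = "+plastic +bottle -manufacturer"
print(page_matches(query, "How to recycle a plastic bottle at home"))
print(page_matches(query, "Acme, manufacturer of every plastic bottle"))
```

The first page passes because both plus words are present and the minus
word is absent; the second is rejected by the minus word alone.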
One of the things people don't remember is that if you enter multiple
words in your query, the matches you get find the pages and sites where
the words appear, even if they're not anywhere near each other. For
example, putting in 'plastic bottles' will find any site with the word
plastic or bottles. Even if you put a plus in front of each of the
words, all that means is that the word plastic has to be in the page and
so does the word bottles. If you really only want the sites that are
about plastic bottles, then you can alter this default behaviour by
specifying that the words you want form a phrase. To do this, put double
quotes around the words, as in "plastic bottles".
This will limit the search so it returns only those Web pages with the
phrase.
Everything described so far works on all the basic search engines.
However, you can also make use of metasearch engines, which are
essentially search engines for search engines. These give you more
commands at your disposal to make your query even cleverer.
So long as the site you're using supports it, you can make use of a
number of boolean search operators to improve your query. Some of the
operators are equivalent to options we've seen so far. You can type
'plastic and bottles', rather than '+plastic +bottles'. Similarly, you
can use 'and not' instead of the '-' operator. Other choices let you be
more specific about the way your words or phrases should occur.
For example, the phrase "plastic bottles" wouldn't find any Web page
that had the phrase 'plastic drinks bottles', but if you used the 'near'
operator, as in 'plastic near bottles', you'd get a match on plastic
drinks bottles. Similar, but more specific, operators include 'before'
and 'after' - 'plastic before bottles' finds only those pages where
plastic is near, and occurs before, bottles. 'Bottles after plastic' has
the same effect.
Near, before and after only match when the words or phrases you're
searching for occur within a few words of each other. If you enter a
search and don't find the site you want, it is worth re-running the
query without the 'near' operator, as you could well find documents that
actually match the words in your query, but where the words just happen
not to occur in close proximity.
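
The difference between a phrase match and the 'near' and 'before'
operators can be sketched as follows. This is a toy model only: the
article says the words must be 'within a few words of each other'
without giving a number, so the window of three words used here is an
assumption, as are the function names.

```python
import re


def tokens(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near(text, first, second, window=3, ordered=False):
    """True if `first` and `second` occur within `window` words.

    With ordered=True, `first` must also come before `second`,
    which models 'first before second'.
    """
    words = tokens(text)
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    for i in first_at:
        for j in second_at:
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= window:
                return True
    return False


page = "We sell plastic drinks bottles in bulk"
print(has_phrase(page, "plastic bottles"))             # phrase search
print(near(page, "plastic", "bottles"))                # plastic near bottles
print(near(page, "bottles", "plastic", ordered=True))  # bottles before plastic
```

Here the phrase search fails because 'drinks' sits between the two
words, 'near' succeeds, and the ordered check fails because bottles does
not come before plastic on this page.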


6 Know how matches are ranked. The first words or phrases in your query
are treated as more important than those occurring later in the query.
Another thing that affects the ranking is the number of times a keyword
appears in the document; the more times, the higher the ranking. If the
keyword appears in the title of the document, that will increase the
ranking. You can improve the likelihood of seeing the answers you want
by remembering these rules and by phrasing your searches accordingly.
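
The three ranking rules above can be turned into a toy scoring function.
The actual formulas used by search engines are not given in the article,
so the weights here (earlier words count more, a title hit is worth five
body hits) are invented purely to show the rules interacting.

```python
import re


def tokens(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, body):
    """Toy relevance score for one page; higher means ranked earlier."""
    body_words = tokens(body)
    title_words = set(tokens(title))
    total = 0.0
    for position, term in enumerate(tokens(query)):
        weight = 1.0 / (position + 1)              # earlier query words matter more
        total += weight * body_words.count(term)   # more occurrences, higher score
        if term in title_words:
            total += weight * 5                    # assumed bonus for a title match
    return total


supplier = score("plastic bottle",
                 "Plastic bottle suppliers",
                 "We supply every plastic bottle, in any plastic you like")
darts = score("plastic bottle",
              "Dorset Arms Darts Team",
              "Trophies are plastic this year")
print(supplier > darts)
```

Swapping the order of the query words changes the weights, which is why
it pays to put the most important word first.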

7 Use site filters to target your queries. Most of the search engines
will let you target your queries by limiting which types of site your
search should consider. To use a site filter, you obviously need to
understand how a site reference is made up. The URL (Uniform Resource
Locator) is the name of the site that appears in the location box in
your browser. The URL is made up of a number of elements. For example,
if you take a page on Microsoft's Web site, the URL looks like this:

http://www.microsoft.com/office/default.htm

You can break this down into sub-components, as follows.
The 'http://' is the standard prefix that goes on the front of all Web
site addresses. Your browser will put it on the front of Web addresses
without it. In terms of site filters, don't use it and don't enter it.
Secondly, www.microsoft is the subdomain name. It usually, though not
always, starts with www (standing for World Wide Web). You can usually
ignore the www bit in site filtering. The .com, or similar endings such
as the .co.uk in www.computershopper.co.uk, is the major domain name. It
can be one of the most useful restrictions to apply in filters. Many
companies use the .com domain or .co.uk, .co.de or whatever. The com
domain is supposed to refer to international companies with sites in
more than one country, but because of their limited grasp of geography
you'll find it used by a lot of USA companies, no matter how small, and
some UK companies who'd like to pretend they're big and important. Other
major domains are:

     edu - educational sites, universities, etc.
     gov - government departments
     mil - military departments
     net - Internet service providers and services
     org - non-profit-making organisations

Other domains you might encounter include info, arts, rec and web, all
of which are fairly self-evident.
The major domain is one of the best ways to limit a search. If you tell
your search engine that you want to only look in sites ending .edu, then
you'll avoid all the commercial sites out there. Limit your search to
.co.uk and you'll hopefully avoid most of the irrelevant USA
information.
The rest of the URL is the path within the Web site - in the case of the
Microsoft site, it tells you that we're looking at the Office area and
the main page within that. To limit your search to a particular type of
site, the most common method is to prefix your query by the filter. For
example, 'url:plastic' would show only pages where the word 'plastic'
appears in the URL of the site, while 'title:plastic bottles' finds only
pages with the phrase plastic bottles in the title of the document. This
can be particularly useful in limiting the number of matches returned.
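
For anyone who wants to see the URL breakdown mechanically, Python's
standard urllib.parse module splits an address into the same pieces. The
helper names describe and in_domain are made up for this sketch, and it
assumes the address includes the 'http://' prefix, since urlsplit cannot
pick out the host name without it.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a URL into the parts discussed above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "prefix": parts.scheme,              # 'http'
        "host": host,                        # 'www.microsoft.com'
        "last_label": host.split(".")[-1],   # 'com', 'uk', 'edu', ...
        "path": parts.path,                  # '/office/default.htm'
    }


def in_domain(url, suffix):
    """True if the URL's host name ends with the given domain suffix."""
    host = urlsplit(url).hostname or ""
    suffix = suffix.lstrip(".").lower()
    return host == suffix or host.endswith("." + suffix)


info = describe("http://www.microsoft.com/office/default.htm")
print(info["host"], info["last_label"], info["path"])
print(in_domain("http://www.computershopper.co.uk/", ".co.uk"))
print(in_domain("http://www.computershopper.co.uk/", ".edu"))
```

A domain filter is then just a test like in_domain applied to every
candidate page before anything else is considered.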

8 Use date filters. One of the problems with the Web is that no-one
takes off the garbage. Looking for classical music concerts at London's
Wigmore Hall comes up with lots of interesting matches, until you notice
they happened in 1996 or 1997. Use a date filter if you need to make
sure the information is at least vaguely relevant. One point to note is
that the date used by the search engines is in most cases the date the
page was indexed, not the date it was created. The index is often a few
weeks out of date, so you should have a slightly wider range than you
first think of.
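
The advice to widen the range can be shown with a few lines of date
arithmetic. The three weeks of slack is an assumption standing in for
'a few weeks', and the function names are invented for the example.

```python
from datetime import date, timedelta


def widen_range(start, end, slack_days=21):
    """Pad a date range on both sides to allow for indexing lag."""
    slack = timedelta(days=slack_days)
    return start - slack, end + slack


def indexed_within(indexed_on, start, end):
    """True if the page's index date falls inside the range."""
    return start <= indexed_on <= end


# You want pages from April 1999 ...
wanted_start, wanted_end = date(1999, 4, 1), date(1999, 4, 30)
search_start, search_end = widen_range(wanted_start, wanted_end)

# ... but a page written in late April was only indexed on 10 May.
page_indexed_on = date(1999, 5, 10)
print(indexed_within(page_indexed_on, wanted_start, wanted_end))
print(indexed_within(page_indexed_on, search_start, search_end))
```

The strict range misses the page; the padded range catches it.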

9 Search what you've already found. Most search engines have an option
to look within the matches already returned, so you can start off with a
fairly general search, then home in on the bits that interest you if the
number of matches is too large. Many of the search engines express the
'search existing matches' idea by using the pipe operator, expressed as
the | symbol. You use it like this: 'plastic bottles' | manufacture.
This would look for the word manufacture within the matches on plastic
bottles.
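
Searching within existing matches amounts to filtering twice. The sketch
below uses plain substring matching on a made-up list of page
descriptions; a real engine would work from its index, so treat this
only as a picture of the idea behind the pipe.

```python
def search(pages, term):
    """Return the pages whose text contains `term` (case-insensitive)."""
    term = term.lower()
    return [page for page in pages if term in page.lower()]


# Hypothetical page descriptions standing in for a search index.
pages = [
    "Plastic bottle recycling scheme in Dorset",
    "Acme Ltd: plastic bottle manufacture and supply",
    "Glass jar manufacture since 1920",
    "Dorset Arms Darts Team programme",
]

first_pass = search(pages, "plastic bottle")      # the general search
second_pass = search(first_pass, "manufacture")   # the part after the |
print(second_pass)
```

Because the second search only sees the first set of matches, the glass
jar page never appears even though it mentions manufacture.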


10 Choose the right search engine for the job. Some engines are better
than others for specific types of searches and there are thousands of
them out there. The table shows our recommendations of what's out there
and what's worked when we've searched the Web, but new sites and engines
are appearing all the time and you might find that the particular
information you're looking for is returned more effectively from other
search engines. One good place to look for descriptions of the more
specialised contenders is at www.beaucoup.com, where you'll find details
of around a thousand engines organised by areas such as music, science,
politics, health and fitness. It's also worth remembering that sites are
available for searching for people's e-mail addresses, phone numbers and
so on, such as the online yellow pages or Yahoo's people search.


VICUG-L is the Visually Impaired Computer User Group List.
To join or leave the list, send a message to
[log in to unmask]  In the body of the message, simply type
"subscribe vicug-l" or "unsubscribe vicug-l" without the quotations.
 VICUG-L is archived on the World Wide Web at
http://maelstrom.stjohns.edu/archives/vicug-l.html

