VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Subject:
From: Jamal Mazrui <[log in to unmask]>
Reply To: VICUG-L: Visually Impaired Computer Users' Group List
Date: Fri, 2 Oct 1998 08:12:48 -0400
Content-Type: TEXT/PLAIN
Parts/Attachments: TEXT/PLAIN (2634 lines)

From the web page http://www.samizdat.com/script/title.htm

How to get the most out of AltaVista Search

Script for a speech by Richard Seltzer, Internet Evangelist,
Digital Equipment


About this tutorial --

This presentation is set up with live links to AltaVista. If you
click to see an example, you will connect to AltaVista,
automatically launching that particular search.

That's one of the powers of AltaVista -- there is a unique URL
for each and every search that you do. Hence you can bookmark a
search, and later click on that bookmark to launch that same
search again and get fresh results. Or you also can do a cut and
paste of the URL associated with a search. That's what I did for
this presentation. So when I click on an example in my word
slides, I launch the search I want at AltaVista. (That would be
very complex to do if this capability hadn't been built in to
AltaVista in the first place.)
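
As a rough illustration of why a bookmarkable search works, here is a minimal Python sketch that packs a query into a URL. The endpoint (search.example) and the parameter name "q" are placeholders invented for this example; they are not AltaVista's actual address format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, for illustration only.
BASE_URL = "https://search.example/query"


def search_url(query: str) -> str:
    """Return a URL that carries the whole query, so it can be bookmarked."""
    return BASE_URL + "?" + urlencode({"q": query})


if __name__ == "__main__":
    # Quotation marks and spaces are percent-encoded so the URL stays valid.
    print(search_url('"Richard Seltzer"'))
```

Because everything the search needs is in the URL itself, clicking the saved link simply repeats the query against whatever the index holds at that moment.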

Once you are connected to AltaVista, to see an example, you can
experiment by editing the query or by entering new queries. To
return to this script, use the Back button on your browser.

This talk and the companion piece ("I want to be found", which
you should go through after this one) are based on my research
for the book The AltaVista Search Revolution (published by
Osborne/McGraw-Hill) and also my experience with my own personal
Web site.

Please feel free to send me your questions at
[log in to unmask]. I'll do my best to get back to you
promptly. And in the near future, we plan to post the most
interesting and useful questions and answers at this Web site.
If demand warrants, we may also open a chat room, with regular
hours for you to ask your questions there.

----------
 introduction

What you might not realize about AltaVista Search


Simple

Advanced

Web

Newsgroups

Putting it together for search


AltaVista is a very powerful piece of software. But most people
don't realize that. Normally, when you go to a search engine,
you just type in a couple of words. If you get a good result,
fine. If not, you try somewhere else. But very few people ever
look at the help files. They basically presume that if it's
free, it's got to be simple. But this is a complex program. If
you had just received something like Microsoft Office and had
never used it before in your life, and you were going to use it
for the first time, you'd probably look at the documentation.
But people never look at it for this.

My goal is not that you remember all the different commands, but
rather to give you a sense of the wide range of things that you
can do with AltaVista, so you can go to the help files and
figure it all out when you need to. If you don't know that you
can do it, you never ask the question.

That's the reason for going through this. I'm going to show you
some things that you probably never realized were there.

Basically, there are three different ways of searching at
AltaVista. There's a single index, but there are three ways of
moving through it. Which one you choose is mostly a matter of
your own personality and style, your own way of looking for
things.

First we'll talk about Simple and Advanced search, then we'll
talk about the graphical way of looking for things. We'll
discuss searching the Web and also the Usenet newsgroups.

----------
 Simple search

Simple Search


+ and -

phrases -- "richard seltzer" "to be or not to be"

capitalization -- eXcursion

punctuation and spaces

accent marks (foreign languages)

* (after 3 characters, for up to 5; or as element of phrase)


Click on the word Simple to connect to AltaVista in Simple
search mode. This is what AltaVista looks like when you go there
for the first time. Simple search is the default.

Louis Monier, the lead developer of AltaVista, loves Simple
search. He uses it all the time.

Advanced search might seem more natural for an engineer. That's
where you get to use Boolean logic, with AND, OR, AND NOT so you
can very precisely specify what you want.

Simple search is intended so you can just type a lot of words,
without knowing anything about what was happening in the
background, and still get useful results.

They wanted to let people use a precise way of searching so they
kept the Advanced search -- which is really how the whole
mechanism works in the background, using the ANDs, ORs, etc. But
they also wanted a way that the ordinary person just coming to
this search engine for the first time could get useful results,
even in the case, which happens so often, that they've never
looked at the help files.

One of the interesting things about Simple search is that I
could enter a long string of words -- say 30 of them -- with no
punctuation. I just type in any word that has anything to do
with whatever I'm trying to find. If I then click, I might get a
result back that says there were 5 million matches. But I'd have
nothing to worry about, because those pages that have all 30 of
those words will be the ones that are on the top of the list.
And way down at the 5 million mark would be the ones that only
had one of them. So the great likelihood would be that the
things that I'm looking for would be right up front. And the
more words I type in, the better the results I get.
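
To make the idea concrete, here is a toy Python sketch of "pages that match more of my words come first." It is only a model of the behavior described above, not AltaVista's real ranking; the function name and the sample pages are made up for the example.

```python
def rank_by_matches(query: str, pages: dict) -> list:
    """Order pages by how many distinct query words each one contains."""
    words = set(query.lower().split())
    scores = {
        name: len(words & set(text.lower().split()))
        for name, text in pages.items()
    }
    # Keep only pages with at least one match, best matches first.
    hits = [name for name in pages if scores[name] > 0]
    return sorted(hits, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    pages = {
        "a": "sailing boats and harbor charts",
        "b": "harbor news",
        "c": "cooking with garlic",
    }
    print(rank_by_matches("sailing harbor charts", pages))
```

A page that matches only one word is still in the list, just far down, which is why a huge match count is nothing to worry about.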

Note that on this page, the Web is the default. If I click on
the down arrow next to the word "Web", I see the choice of
"Usenet", which means newsgroups. I can search either the Web or
the newsgroups, but not both at the same time.

They added "language tags" recently. Now I can click on
"Language" and have a choice of limiting my search to any one of
dozens of different languages. For instance, I could search for
only documents that are in French or Serbo-Croatian or Korean.

The underlying index at AltaVista understands nothing about any
language. At the basic index level it is dumb, and that dumbness
is a tremendous strength. Search engines that are built around
the syntax of any given language lock themselves out of the rest
of the world. AltaVista just captures all the text that it
finds. Within a couple weeks of when they went live, they were
surprised to get email from some people in Korea who had gone in
and using their Korean keyboards had typed in queries and had
gotten good results to Korean pages. AltaVista was simply
capturing the underlying code.

You run into some problems with some Asian languages, where
there are a variety of encodings of the same language. But
basically, in most cases, this gives you what you need. That's
an amazing power.

There are many ways in which this basic notion of a seemingly
disorderly index is a source of great power. I'll mention other
instances of that as we go along.


Still looking at the default starting page (which is the Simple
search page), note that there is a "preferences" button. You can
set the defaults the way you want them for your future visits.
For instance, you can pre-set the language, or Advanced (instead
of Simple), or text-only (so your pages will load faster), etc.


We're beginning with the notion that you can type a bunch of
words and get an interesting result.

Now say in this string of words you've chosen, some of those
words are more important than others. There might be two or
three terms that you really absolutely want to have in the
result. In that case, put a plus sign in front of the terms that
you really want. There also might be words that you know might
be confused with what you want. For instance, if I were doing a
search for "Digital," I might want to exclude "watches". So I
would put a minus in front of "watch."

It's also very helpful, when you've done a quick search and
you've seen that the first few items in the results list were
things that you felt really shouldn't be there -- they weren't
the kind of thing you were looking for -- see what's in common
about them. Then add one or more terms to your query with a
minus sign in front of each, to eliminate those when you submit
your query again.

So with pluses and minuses you can focus your search very
quickly.
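
Here is a small Python sketch of how plus and minus could be applied to a page. It assumes exact whole-word matching and made-up helper names; it illustrates the rule, not the engine's actual code.

```python
def parse_simple(query: str):
    """Split a query into required (+), excluded (-), and optional words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(query: str, text: str) -> bool:
    """True if the text has every + word, no - word, and some query word."""
    required, excluded, optional = parse_simple(query)
    words = set(text.lower().split())
    if excluded & words:
        return False
    if not required <= words:
        return False
    return bool(required) or bool(optional & words)


if __name__ == "__main__":
    print(matches("+digital -watch", "digital equipment corporation"))
    print(matches("+digital -watch", "a digital watch"))
```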


When I enter a string of words with spaces between them,
AltaVista provides a list of Web pages that contain any one of
the words -- the first or the second or the third...

For instance, if I search for my own name, if I just type the
words Richard Seltzer (separated by a space), I would get every
instance of either Richard or Seltzer. That would be a huge
number, and it wouldn't be particularly useful to me.

If I put quotation marks around a set of words, like "Richard
Seltzer", that tells AltaVista to look for that phrase -- those
particular words in that exact order. That's a very powerful
capability, made possible by the fact that AltaVista indexes
every single word it finds.

Before AltaVista came along, some search engines relied on
knowledge of syntax of a particular language to try to cut down
on how much work they were asking their machine to do, and to
cut down on the size of their index. They might throw away the
little words, like: a, and, the, or, but... AltaVista doesn't
throw away any words. It keeps all of them. And because it keeps
all of them and remembers not just what the words are, but also
their exact order on the page, I could go in and type in a
paragraph or longer out of a book, with quotation marks around
it, and do a search to see if somebody has plagiarized my work.

There are a lot of things you can do to refine your search based
on this capability. In this case, instead of getting back a
couple hundred thousand matches for Richard space Seltzer, with
a search for "Richard Seltzer" in quotation marks, I get back
about 500; and most of the ones at the top of the list are, in
fact, about me. (Later, I'll talk about how to make that
happen.)

Consider the quotation "to be or not to be". If somebody had
made the syntactic decision that little words didn't mean
anything, none of those words would be in the index. But, as it
is, they are all there, and you can search for that quote, or
for any other quote.
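
The reason phrase search needs every word and its position can be shown with a toy positional index in Python. This is a simplified sketch under my own assumptions (whitespace tokenizing, tiny in-memory index), not a description of how AltaVista stores its data.

```python
from collections import defaultdict


def build_index(pages: dict) -> dict:
    """Map every word, however small, to the (page, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((name, pos))
    return index


def phrase_search(index: dict, phrase: str) -> set:
    """Return pages where the words of the phrase appear consecutively, in order."""
    words = phrase.lower().split()
    if not words:
        return set()
    # Start from every occurrence of the first word, then check each next word.
    candidates = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        postings = set(index.get(word, []))
        candidates = {
            (name, pos) for name, pos in candidates
            if (name, pos + offset) in postings
        }
    return {name for name, _ in candidates}


if __name__ == "__main__":
    pages = {
        "hamlet": "to be or not to be that is the question",
        "other": "not to be or to be",
    }
    print(phrase_search(build_index(pages), "to be or not to be"))
```

If the index had thrown away "to", "be", "or" and "not", there would be nothing left to look up for this quote.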

When you get up to the size that the index is now, this becomes
very interesting. AltaVista recently grew its index. For a long
time it was at about 30 million pages. Now it has over 100
million pages. There's a lot of information there. And if you
haven't been there in the last month or two, you ought to take
another look. It's much richer than it was before.


The way that AltaVista deals with capital letters can prove very
useful when you want to search for brand names with unusual
capitalization.

If you type a word all in lower case, AltaVista will search
for both lower case and upper case. But if any letter is in
upper case, it looks for that and only that.

Marketing people like to put capital letters in strange places
to make the spelling of their brand names unique. For instance,
Digital had a product called eXcursion, spelled with a capital
X. If I search for eXcursion, I get exactly what I want, because
only mentions of that product will match that unique spelling.
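
The capitalization rule is short enough to write down as a Python sketch. The function name is mine, and this only models the rule as stated above.

```python
def term_matches(query_term: str, page_word: str) -> bool:
    """All-lowercase query terms ignore case; any capital forces an exact match."""
    if query_term.islower():
        return query_term == page_word.lower()
    return query_term == page_word


if __name__ == "__main__":
    print(term_matches("excursion", "eXcursion"))   # lowercase query, any case
    print(term_matches("eXcursion", "excursion"))   # capital X must match exactly
```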


This brings up another point that is important about AltaVista.
Getting information from AltaVista is the reverse of how you
would have normally thought of getting information at a library.

If you talk to a reference librarian, if you have a question
that has many possible ways to answer it, the librarian will be
happy -- pull a book off the shelf, give you an answer, and you
walk away as a happy customer. But if you ask for something that
is rare and difficult to find, that librarian is going to start
tearing her hair out, and it's going to take a long time and be
very painful to get that answer.

With AltaVista, the more rare, the more unique, the more hard to
find something is, the easier it is to find. Because there are
over a hundred million pages in the index, chances are excellent
that what you want is mentioned. And because it's rare, you are
likely to get only a few matches and they are likely to be
exactly what you want.

So if the doctor gives you a prescription and you'd really like
to know what it is before you take it, enter that ridiculously long
word and you'll find it here.

If you get a strange error message on your PC or your
workstation -- it's a crazy combination of letters and numbers
-- type that in at AltaVista. Chances are you'll probably find
somebody talking about that problem.

The more rare it is, the easier your search is because you don't
have to worry about how to refine your search. What you want is
just going to be there at the top of your list.

And using capitalization, like using quotation marks, is a way
of making your request rare.


The way punctuation is handled seems totally bizarre. At
AltaVista, all punctuation was created equal. It's very
democratic. It doesn't matter if it was a period, a comma, a
slash, an underscore, a hyphen -- they are all the same.

When they first told me this, when I was writing the book, I
asked, "what???" It didn't make sense to me. Then they pointed
out -- think of the old PDP-11 line of computers that was made
by Digital. Was it PDP11/70 or PDP/11-70 or PDP_11/70? And even
if you know what the right way to spell it is, do you think that
people posting things about a product like that would remember
to spell it the right way? If AltaVista took punctuation
literally and matched only a period to a period or a slash to a
slash, then to find a product that is complex with that kind of
punctuation in it, I would have to go to Advanced search and
make a complex query -- this or this or this or this, and try to
imagine all the ways that people could have fouled up that
spelling. But because all punctuation is equal, I just need to
put in one way and I'll get all my results. That's pretty
simple.
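
One way to picture "all punctuation is equal" is to treat every punctuation mark as the same separator before comparing. The Python sketch below does that; it is my own simplified assumption about the behavior, and in this sketch a spelling with no punctuation at all between two parts (for instance between "PDP" and "11") would split differently.

```python
import re


def tokens(text: str) -> list:
    """Lowercase the text and treat any run of punctuation as one separator."""
    return re.sub(r"[\W_]+", " ", text.lower()).split()


if __name__ == "__main__":
    for spelling in ("PDP-11/70", "PDP/11-70", "PDP_11/70"):
        print(spelling, tokens(spelling))
```

All three spellings reduce to the same sequence of pieces, so a query typed one way finds pages written the other ways.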


Accent marks are handled very much the same way as capital
letters. If I type a word in without accent marks, it's going to
look for those letters with or without accents. If I type them
with accent marks, it will only look for them with accents. So
in other words, if I type in "elephant", it will look for the
English word elephant and the French word with the acute accent
on the e's. If I have my keyboard set up to do French
characters, and type in the French word with accents, it will
only match with the French word.
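
The accent rule mirrors the capitalization rule, and a Python sketch makes the parallel plain. The helper names are hypothetical, and the accent stripping relies on Unicode normalization, which is my choice for the illustration rather than anything the developers described.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove combining accent marks, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_matches(query_term: str, page_word: str) -> bool:
    """Unaccented queries match with or without accents; accented ones match exactly."""
    if strip_accents(query_term) == query_term:
        return query_term == strip_accents(page_word)
    same_form = unicodedata.normalize("NFC", page_word)
    return unicodedata.normalize("NFC", query_term) == same_form


if __name__ == "__main__":
    print(accent_matches("elephant", "éléphant"))
    print(accent_matches("éléphant", "elephant"))
```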

Originally, that would have been the way, with the AltaVista
index, that you would have searched for foreign languages. You
would have looked for a particular accent or a particular common
word, and then you could have found most of the pages in that
language. Basically, that's what "Language Tags" does. They put
a piece of software that sits on top of the dumb index and it
tests pages based on things that are characteristic of a
particular language.


You are probably used to using an asterisk (*) as a wild card
that can stand for missing letters or words in a search query.
AltaVista lets you do that, but the limitations are interesting.
They have a lot to do with how AltaVista was developed.

AltaVista was developed by researchers at Digital Equipment's
laboratories in Palo Alto, CA. They were trying to climb Mount
Everest. They wanted to take on a challenge and that was the
biggest challenge that they could find. The challenge as they
perceived it was to make it possible for millions of people to
get to tens of millions of pages, and to get really quick
results. They weren't out to arrive at academic purity. They
were trying to get practical results. So when they were faced with an
instance like this -- yes, they would like people to use an
asterisk, because wildcards are handy. But if they let you put it
at the beginning of a word, it would mean that they would have
to search through 100 million pages to answer your request. So
they arrived at an interesting compromise, and they might change
the details at some point as circumstances change. If you type
in three characters first, then you can use an asterisk to stand
for up to five characters. So if I want to search for the
English spelling as well as the American spelling of a word like
color (colour), I can throw in an asterisk where the "u" would be
(colo*r). Or very often, I use the asterisk to stand for the
plural, so I can search for both singular and plural at the same
time (dog*).

You can also use an asterisk for an element in a phrase. If I
have a phrase in quotation marks, like "one if by land and two
if by sea", and maybe I can't remember one of those words, then
I can throw in an asterisk to stand for a missing word ("one if
by * and two if by sea").
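
The single-word wildcard rule (three characters first, then up to five more) can be sketched as a translation into a regular expression. This Python example is an assumption-laden model: the function names are mine, and it treats "characters" as letters, digits, or underscore.

```python
import re


def wildcard_pattern(term: str):
    """Turn a term like 'colo*r' into a regex; * stands for 0 to 5 characters."""
    parts = term.lower().split("*")
    if len(parts) > 1 and len(parts[0]) < 3:
        raise ValueError("need at least three characters before the *")
    return re.compile(r"\w{0,5}".join(re.escape(p) for p in parts))


def wildcard_matches(term: str, word: str) -> bool:
    """True if the whole word fits the wildcard term."""
    return wildcard_pattern(term).fullmatch(word.lower()) is not None


if __name__ == "__main__":
    print(wildcard_matches("colo*r", "colour"), wildcard_matches("colo*r", "color"))
    print(wildcard_matches("dog*", "dogs"), wildcard_matches("dog*", "dogmatically"))
```

Requiring a fixed prefix means the engine only has to look at words that start with those letters, instead of scanning everything.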

----------
 Advanced search

Advanced Search


to get more than 200 matches (link:)

NEAR (up to 10 words away) (Richard NEAR Seltzer)

dates

AND, OR, AND NOT

parentheses to group complex queries

counts (accurate for low numbers; for thousands+ = estimate)


There are a number of reasons why you might want to use Advanced
search. Let's take a look.

Here's a ranking box (small form at the top) and a query box
(the large form, labelled "Boolean").

What's the difference between query and ranking? The best
example that I can think of has to do with cooking. I'm a lousy
cook. I know nothing about cooking. I know so little about
cooking that I can't use a cookbook, because I don't know what
the dishes are called. I wouldn't know what a sautéed something
or other was. But I can go to AltaVista in Advanced search, and
put recipe in the query box, and in the ranking box I can put in
a list of everything that happens to be in the refrigerator
right now. I submit the query, and the things at the top of the
list are those recipes that have most of those ingredients in
it. I'll probably learn something about categories by doing
that, but I didn't need to think of the world that way to get
good results.
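
The split between the query box and the ranking box can be pictured as "filter first, then sort." Here is a toy Python sketch of that idea; the function name and the sample pages are invented, and the scoring (count of ranking words present) is my simplification.

```python
def advanced_search(query_word: str, ranking_words: list, pages: dict) -> list:
    """Keep pages containing query_word; order them by ranking words matched."""
    wanted = {w.lower() for w in ranking_words}
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if query_word.lower() in words:          # the query decides what qualifies
            hits.append((len(wanted & words), name))
    hits.sort(key=lambda pair: pair[0], reverse=True)  # the ranking decides order
    return [name for _, name in hits]


if __name__ == "__main__":
    pages = {
        "omelet": "recipe eggs cheese butter",
        "salad": "recipe lettuce tomato",
        "news": "eggs cheese prices",
    }
    print(advanced_search("recipe", ["eggs", "cheese", "tomato"], pages))
```

The "news" page mentions two ingredients but never qualifies, because it fails the query; the ranking words only reorder what the query lets through.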

Another example -- say you need to hire somebody. Go to Advanced
search and type "resume" in the query box; and in the ranking
box, just list the qualifications that you are looking for.
There are over 800,000 resumes out on the Web today -- just
plain HTML pages that AltaVista has indexed. So when I do this,
the people who have most of those qualifications will be right
on the top of the list, right off the bat.

It's quick and simple. And it's a different way of thinking,
because you don't have to think of categories.

We're so used to thinking the way libraries are organized, like
with the Dewey Decimal System. Does it belong in this niche, or
this niche, or this niche? But you don't have to think about
niches. That's part of the power of it. You don't need to know
how the information is organized to find the information you
want.


There are a few other unique capabilities in Advanced search.

For instance, I can specify the date -- the date that a Web page
or a newsgroup item was posted. When I was doing research for
the book, one of the very first things I did was search
newsgroups. I searched day by day by day for every instance of
"AltaVista", because I wanted to know not just what the
developers thought about it, but what the users were using it
for, which isn't always the same thing.


They also recently added the ability to get a "precise count" of
matches.

When AltaVista first came out, some of the most interesting
articles written about it were totally bogus. Reporters got the
wrong idea and got enthusiastic in the wrong direction. You
found articles in the Wall St. Journal and the New York Times
writing things like, "We did this search at AltaVista and it
showed that John Lennon is more popular than Jesus Christ." They
were just looking at the number of matches when they did
searches. Well, when the developers were designing AltaVista, as
part of the ranking mechanism, they needed to have a rough feel
for how many of these things there were, before they put them in
any kind of order. And they only needed a rough feel. An order
of magnitude was pretty good. And if they were within a factor
of two, that was great. But reporters weren't taking it that
way. They were taking those numbers literally.

There was also a student up in Canada who did a very clever
program when AltaVista first came out, that again was totally
bogus, unfortunately. He wanted to track instances of Windows 95
being mentioned on the Web. So he wrote a little program that
went in every day and did that search at AltaVista and brought
the number back and plugged it into a graph so he could see over
time, day by day, how many mentions there were. But it was
totally random. He could have had a hundred thousand now and a
minute later done the same search and come out at fifty
thousand. And as far as the developers were concerned -- hey,
that's in the right ballpark.

So you have to be careful before jumping to conclusions.

Today, there is a little box you can check in Advanced search
that allows you to get a more precise count. It won't be totally
precise, but it will be better than you get in Simple search, if
you just do an ordinary search. Basically, the developers are
saying, "If you don't want us to go to all the trouble of giving
you the matches, if all you want us to do is count them, then
okay, the cycles we save from not doing that we'll put into
getting you a more precise count." So you can get a better
count. But I wouldn't bet my life on it anyway, because if the
machine happens to be busy at the time you go to it, they will
truncate the count. So it is still imprecise, but it is better.
And if you do it at an odd time of day, it will be more reliably
better.


In Advanced search you can also use the NEAR command.

Most of the Boolean Algebra operators have their equivalents in
Simple search. NEAR is something you can only do in Advanced.

Why would I want to use NEAR?

I showed you the example of searching for myself, with my name
as a phrase, in quotation marks -- "Richard Seltzer". The
problem of doing a search like that is, it wouldn't capture any
of the odd instances like Seltzer, Richard or Richard W.
Seltzer. Richard NEAR Seltzer gives every instance of the words
Richard and Seltzer within ten words of one another and in any
order.
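
Here is a small Python sketch of what NEAR checks: two words within ten positions of each other, in either order. The tokenizing (split on spaces, strip common punctuation) is my own assumption for the example.

```python
def near(text: str, a: str, b: str, window: int = 10) -> bool:
    """True if words a and b occur within `window` words of each other."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


if __name__ == "__main__":
    print(near("Seltzer, Richard", "Richard", "Seltzer"))
    print(near("Richard W. Seltzer", "Richard", "Seltzer"))
```

Both odd forms of the name pass, which is exactly what the quoted phrase would have missed.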

Actually, I had a case just a week ago where I was trying to
find an old friend, let's call her "Elaine Wilcox." Her phone
number was unlisted. I knew she was a professor somewhere in
Illinois and hence was probably listed on the Web. But a search
of her first name and last name in quotes didn't get her. Then
when I used Advanced search and entered Elaine NEAR Wilcox, all
of a sudden I found her right away in a list of the faculty, but
her name appeared there with an initial in the middle, like
Elaine I. Wilcox, so the other kind of searching would have
never found this instance.

Hence, when you are using AltaVista, don't just presume that
your objective is to get a small number of matches. You ought to
make sure that your search has captured the full range of what
you are in fact looking for and that you haven't, just by the way
you constructed it, cut out the very thing that you are looking for.


AltaVista also lets you use a whole series of commands like
link: that let you discover information that you were never able
to get before.

I mention this command here to help you understand why you might
sometimes want to see more than 200 matches.

link: followed by a domain name or a complete URL gives you
every Web page that has a hypertext link to a particular site or
to a particular page. That is very useful information. It's
useful if you are running your own Web site and you want to know
who has links to your site -- those are folks that you want to
get in touch with, you want to know what they are doing, you want to
know why they are linking to you; you might want to link to
them. Also, if you change the address of a page of yours, if you
change your directory structure, you'd like to know what pages
have hyperlinks to the old addresses, so you can go to the
Webmasters of those pages and tell them so that they change to
the new addresses, so you don't lose that link.

Normally, when I'm going through AltaVista, whether in Simple
search or Advanced search, it may say that there were half a million
or ten million matches, but I'm only going to see 200 of them.
I'll see 10 per page. I can click at the bottom of the page --
next, next, next... But after I've gotten 200, clicking for next
won't give me anything more.

Once again, the developers are trying to serve the needs of the
greatest number of people, as efficiently as possible. Well,
that 200 limit is going to serve the needs of 99.999...% of
people. Very rarely do you want to go beyond that. But there are
some times when you need to go beyond that. And link: is one of
those times.

For instance, if I search for link:samizdat.com, I see there are
over 700 Web pages with hyperlinks to pages at my personal Web
site. That's all good information. I don't want to throw away
any of them, and the ranking doesn't make any difference
whatsoever to me. Every one of those is equally important to me.

So they have made it so if in Advanced search I enter a query
and leave the ranking box blank, they will give me all the
results. Once again, it's a compromise. You have said, "I don't
need to have the ranking." So that is saving them cycles. So in
return, they are willing to give you all the results. You go to
the bottom; you hit next; and you will get more than 200.


The other commands here in Advanced search, you are probably
very familiar with -- AND, OR, AND NOT. AND is the same as
putting pluses in front of two words in Simple search. OR is the
same as a space. AND NOT is the same as a minus sign.

You can also use parentheses to group complex queries.

Let me give you an example of that. I'm not sure how far in you
can nest one set of parentheses inside another, but it can go
pretty far.

In this example -- Digital has always been a little bit
schizophrenic, are we Digital? or are we DEC? So if we want to
use the link: command and wanted to get complete results, then
we could have

(link:digital.com OR link:dec.com) AND NOT (host:digital.com OR
host:dec.com)

host: means what pages are at that site. So with this query, I'm
saying, "Give me all the pages that link to Digital that aren't
at Digital's own sites." Because why should I want to know about
our own internal links. I just want to know those folks who
aren't at Digital who are linking to Digital pages.

That query gives you a sense of how you can organize your
thoughts. With parentheses, you can say what operation you want
performed before what other operation.
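
If it helps to see the grouping, the same query can be mimicked with Python sets, where OR is a union and AND NOT is a difference. The page addresses below are invented sample data, and the two dictionaries merely stand in for what link: and host: would return.

```python
# Hypothetical sample data: pages that link to each site, and pages hosted there.
LINKS_TO = {
    "digital.com": {"a.edu/page", "digital.com/home"},
    "dec.com": {"b.org/list", "dec.com/index"},
}
HOSTED_AT = {
    "digital.com": {"digital.com/home"},
    "dec.com": {"dec.com/index"},
}


def outside_links(sites: list) -> set:
    """(link:s1 OR link:s2 ...) AND NOT (host:s1 OR host:s2 ...)."""
    linking = set().union(*(LINKS_TO.get(s, set()) for s in sites))
    internal = set().union(*(HOSTED_AT.get(s, set()) for s in sites))
    return linking - internal


if __name__ == "__main__":
    print(sorted(outside_links(["digital.com", "dec.com"])))
```

Each pair of parentheses in the query corresponds to one of the two unions, and both are worked out before the AND NOT is applied.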

You don't need to capitalize the operators -- AND, OR, AND NOT,
NEAR. I just do that for convenience and not to get confused.
Lower case would work just as well. It's a matter of personal
taste.

----------
 Web search

Web Search


text:

title:

link:

anchor:

url:

host:

domain:

image:

applet:

object:


We already mentioned two special Web search commands -- link:
and host:

Sometimes you know or suspect that the information you want is
at a particular site. For instance, if you wanted to limit your
search to pages at MIT, in Simple search, you could begin your
query with +host:mit.edu

By the way, if all your pages are indexed at AltaVista, you can
use host: to provide an index of your own site. For instance,
+host:samizdat.com will list every Web page at my site. And
+host:samizdat.com followed by query terms will launch a search
that is limited just to my site. You can use the same technique
I used here for the examples to put a hyperlink at your site
which automatically launches a search of your site at AltaVista.
Just tell your visitors to enter their query after the term
+host:domainname which will automatically appear in the query
box.

I very rarely use text: or title: But with those commands, you
can limit your search to that part of the Web page.

anchor: is for the words that are highlighted on a Web page that
you click on to go to another page. You might remember that once
you clicked on a certain set of words on a particular page and
you want to go back there. Well, you could search for anchor:
followed by those words.

With url: I can limit a search to a particular directory. Maybe
someone has an internal site on our intranet or has a hosted
site at an ISP and hasn't registered for his own domain name. In
that case url: followed by the main directory for that person's
pages, will give me a list of all the indexed pages that are
there. Or I can check to see if a particular page is in the
index at AltaVista by typing url: followed by the complete
address of that page.

domain: here means .edu, .com, .net, and also all the
identifiers for different countries. So I can limit a search to
pages in France, for instance, by beginning my query with
+domain:fr.

Keep in mind that the more you know about the Internet, the more
effective and useful your searches are likely to be. For
instance, today, companies in third-world countries tend to put
their Web sites in the US or Europe. They don't put them in
their own country because the lines are too slow. So doing a
search for domain:co does not give me a picture of Web sites of
companies in Colombia. Rather it shows a very small and random
subset.

image: is the name of a graphic image. Today, AltaVista only
indexes text. But it does index the names of images. That comes
in handy. For instance, you may wish to pull in a picture of
Jupiter to add to your site. Just do image:jupiter or
image:jupiter.gif or image:jupiter.jpg, and you'll get a list of
all the images that have that for a name, some of which are
likely to be public domain pictures from NASA.

applet: is a handy command. I can do applet:* and it will give
me a list of every Java applet on the Web. And, of course, I can
limit it. I could do +domain:fr +applet:* and ask for every Java
applet on sites in France. In other words, you can combine these
commands in interesting and powerful ways.

object: is very much like applet: only it's for Active-X
objects. (There are nearly six times as many applets as Active-X
objects on the Web today.)

The applet: command is also useful if you have written applets
yourself and put them up on the Web. Maybe you did it for
the good of humanity and you were hoping other people would pick
them up and use them. Well, you could search by the name of the
applet, because people would probably have kept the name. Or
maybe you put it on the Web and you didn't want anybody to steal
it. Well, you could search for the name, because people would still
probably have kept it, and you'll be able to find copies that way.

----------
 newsgroups

Newsgroup search


current, immediate

candid

articles stored at AltaVista

from:

subject:

newsgroups:

summary:

keywords:


I love newsgroups. I loved them long before there was a Web. I
don't use them now as often as I did in the past, because there
are so many groups and so many items in each of them that they
have become unwieldy and time-consuming. But AltaVista makes
them much easier to use now than they were a little while ago.

Clicking here will show you what a result list looks like. I
chose Usenet instead of Web, and entered my query as usual. Now
in my results, in the far right column, here are the email
addresses of the people who posted items. To send email to any
of them, just click on the address.

The next column to the left is a list of all the newsgroups that
had matches to my query.

The next column to the left is the list of the items themselves.
If I click on an item, I get that item served up to me by
AltaVista itself. When I search the Web and click on an item, I
get connected to a different Web site. But when I'm searching
newsgroups, AltaVista acts as a newsreader and gives me the
item directly from its own files of over 14,000 newsgroups. It
keeps these items anywhere from three weeks to about six months,
depending on the whim of the person who runs the machine. If he
likes the group, it stays a long time. But don't count on
anything being there more than about three weeks. AltaVista is
not meant to serve as a newsgroup archive.

Because these items are stored at AltaVista, when you retrieve
one, the words in your query are highlighted in bold in the
text displayed.

If the item you want happens to be a computer program or a
picture, you can click on "B" to retrieve it as a binary file,
so it won't be corrupted in the transmission.

And if you happen to be far away from AltaVista and you'd prefer
to get the news item from a local newsserver rather than from
AltaVista, just click on "L". For instance, somebody in Russia
might find that handy. You get the very same item, but perhaps
faster.

Now why would you want to search through newsgroups? There are
plenty of good reasons. I've always found that the information
in newsgroups is often more current, and sometimes more accurate
as well, than the information on Web sites. Corporate Web pages
are typically written like annual reports. They go through
committees. And they take forever to be posted. It might be
weeks after a press release first appeared before it was posted
on the Web site, because of corporate procedures.

If I go to a newsgroup, I'm going to see what people are saying
today -- what they are saying candidly, what they think, what
they feel. They don't get approvals for what they post there.

I can also limit my search to particular aspects of a newsgroup
item. For Web searches, you could use commands like link: and
host:. For newsgroups, you can use from:, subject:, newsgroups:,
summary:, and keywords:.

You can use this tool to do some very interesting competitive
research. For instance, you could search

+from:ford.com

which looks for items posted by employees of Ford,

+newsgroups:comp

which limits the search to computer-related newsgroups.

Then I can search for mentions of Digital's products or
competitors' products and find out what people at Ford are
saying about them today. Or I could search for mentions of "bug"
or "problem". I can get an early warning if people are saying
nasty or good things about us or our competitors. Or maybe I can
get an early warning about major projects that are going out for
bids. It's a useful source.
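
To show what the from: and newsgroups: restrictions are doing,
here is a small Python sketch that filters a list of articles the
same way. The sample articles are invented for illustration, and
the matching rules (from: matches the end of the address,
newsgroups: matches the start of the group name) are my
assumptions about the behavior, not a description of AltaVista's
internals.

```python
# Invented sample articles; not real Usenet postings.
ARTICLES = [
    {"from": "user1@ford.com", "newsgroups": "comp.sys.dec",
     "subject": "driver problem"},
    {"from": "user2@example.org", "newsgroups": "comp.lang.java",
     "subject": "applet bug"},
    {"from": "user3@ford.com", "newsgroups": "rec.autos",
     "subject": "road trip"},
]


def matches(article, sender=None, group=None):
    """Return True if the article passes every restriction given."""
    if sender is not None:
        if not article["from"].lower().endswith(sender.lower()):
            return False
    if group is not None:
        if not article["newsgroups"].lower().startswith(group.lower()):
            return False
    return True


# Equivalent in spirit to: +from:ford.com +newsgroups:comp
hits = [a for a in ARTICLES if matches(a, sender="ford.com", group="comp")]
for article in hits:
    print(article["newsgroups"], "-", article["subject"])
```

Only the first sample article passes both restrictions: the
second is not from ford.com, and the third is not in a comp
group.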

----------
 putting it together

Putting it all together


value of rare words, trademark spellings

checking spelling and usage

recipe by ingredients

trademark search

competitor search

customer search

bookmarking


You can put all these capabilities together in interesting ways.

Just consider what you can do with newsgroups.

Before AltaVista came along, newsgroups were in a success
crisis. There were simply too many of them. There was simply too
much information.

In the early days, you might subscribe to a dozen or so and keep
track of those on a regular basis. But they just kept
splintering and splintering until there was no way that I could
keep track of all the groups related to subjects I'm interested
in. Either they were getting so monstrous that a single group was
getting a thousand postings a day, or there were so many little
splintered parts that I had to go to so many separate groups
that I couldn't keep track of it. It became useless to me.

Now, with AltaVista, I'm able to create my own newsgroups. I can
carefully construct a query that focuses on the kind of
information that I really need. I run that and if the results
look good, I simply bookmark that search and the next day I go
in and click on that.

With the "genome" example, that subject cuts across many
boundaries. Within many different newsgroups, the same subject
is discussed.

And if I wanted to, if I had my own Web pages, I could set up my
page with anchors that say basically, "if you are interested in
X, click here". And I would have carefully constructed all these
great queries that are going to get people exactly the kinds of
things they want out of newsgroups. And they don't have to learn
the commands at AltaVista. I've just done this for them.

Keep in mind that back in the early days of the Web, many pages
consisted of nothing more than lists of hyperlinks to other
pages. People would post their carefully constructed lists of
pages on particular topics. The problem was that those lists
soon got out of date -- because so many new pages were added to
the Web every day and so many URLs went out of date as
Webmasters moved files around. So now, rather than scramble to
try to keep such a list current, you can try to construct a
query at AltaVista which produces similar results, and put a
link from your page to that particular query, so visitors at
your site who click on that link will get the latest results.
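
Here is a minimal Python sketch of building such a link. The
address in SEARCH_BASE and the parameter name "q" are
placeholders I made up, because the exact format of an AltaVista
search URL is not spelled out in this talk. In practice, you
would copy the real URL from your browser after running the
search.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical address; replace it with the URL your browser shows
# after you run the search you want to keep.
SEARCH_BASE = "http://search.example/query"


def search_url(query, base=SEARCH_BASE, param="q"):
    """Return a URL that re-runs the query each time it is opened."""
    return base + "?" + urlencode({param: query})


def search_link(label, query):
    """Return an HTML anchor that launches the query when clicked."""
    return '<a href="%s">%s</a>' % (escape(search_url(query)), escape(label))


print(search_link("If you are interested in the genome, click here",
                  "+genome"))
```

Note that the plus sign has to be encoded in the URL, which
urlencode takes care of. That is one reason to generate these
links rather than type them by hand.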


I mentioned "precise count" in Advanced search. One practical
use of that is as a spelling checker. For instance, in
fast-changing fields -- like the Internet -- we're constantly
coming up with new terms. What's the right way to spell those
words? What's the proper usage? I can use Advanced search, click
on "count only", and put in the different variant spellings that
I can imagine and see how each of them scores. If there's an
order of magnitude difference, there's my answer. I don't have
to wait ten years for dictionaries to catch up. This is what
current usage is.
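
The comparison itself is simple enough to sketch in Python. The
function below takes the counts you read off the screen and
reports a winner only if it beats the runner-up by an order of
magnitude. The function name and the counts shown are
placeholders for illustration, not real AltaVista results.

```python
def dominant_spelling(counts, factor=10):
    """Return the spelling that beats the runner-up by `factor`.

    `counts` maps each variant spelling to the number of matches
    reported for it. Returns None when no variant clearly wins.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_count), (_, runner_up) = ranked[0], ranked[1]
    if best_count >= factor * max(runner_up, 1):
        return best
    return None


# Placeholder numbers; type in the counts you actually get.
counts = {"spelling-a": 12000, "spelling-b": 900, "spelling-c": 40}
print(dominant_spelling(counts))
```

When the function returns None, the usage is still unsettled, and
that is useful to know as well.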

----------
 refine

"Refine"


help in finding anything

dynamic categories (not pre-determined)

information X-rays


We've talked about Simple and Advanced search. The third way of
searching at AltaVista is with Refine. (Up until a little while
ago, this was called "LiveTopics".) The Refine button is right
next to the Search button.

Some people use this method of searching all the time -- it is a
matter of taste.

Personally, I find it much more valuable for getting kinds of
information that otherwise I wouldn't be able to get -- showing
relationships among information, like an information x-ray. It
isn't precise, but when you start doing it regularly, you learn
to interpret what you see.

----------
 refine, your way

Help in Finding Anything -- Your Way


recipe: Advanced -- no categories, what's in refrigerator

recipe: Simple with Live Topics -- + recipe +chocolate -- see
the categories, see the x-ray

manual: rare terms -- +bj200 +driver*

automatic: printer, driver, +canon


Let's take a look at how this works.

I talked about my example with the refrigerator and recipes.
Well, if I had done a search for +recipe +chocolate, to get all
the recipes that have chocolate in them, and instead of
"search", I clicked on "refine," this is what I would see first
-- a list of 20 categories, created on the fly, based on the
frequency with which these words appear on pages that match the
query.

This, like Language Tags, is a piece of software that rides on
top of AltaVista, and it does require some semantic
understanding, because for this purpose you do need to throw
away the little words.

Today, if you do an AltaVista search in Russian and click on
Refine, what comes up is just a list of the 20 most common words
in the Russian language, which is of no particular use to
anyone. Somebody with a knowledge of Russian has to go through
and modify that software for that particular language. It has to
be done one language at a time. It works very well for English.

Now, in addition to those categories, you also see
sub-categories. In other words, on pages that match my query
that also have the word "butter" these other words are the most
common.
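
To give a feel for how categories and sub-categories could be
produced on the fly, here is a simplified Python sketch. It
counts how many matching pages contain each word, after throwing
away the little words. This is my own simplification, not the
actual Refine software: the stop-word list is a tiny assumed
sample and the sample pages are invented.

```python
import re
from collections import Counter

# A tiny assumed list of "little words"; a real one would be longer
# and would have to be prepared separately for each language.
STOP_WORDS = {"the", "a", "and", "of", "in", "to", "with", "for"}


def words(text):
    """Return the set of meaningful words on a page."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def top_terms(pages, query_terms, limit=20):
    """Most common words on the matching pages, excluding the query."""
    counts = Counter()
    for page in pages:
        counts.update(words(page) - set(query_terms))
    return [term for term, _ in counts.most_common(limit)]


def sub_terms(pages, query_terms, category, limit=20):
    """Most common words on the pages that also contain `category`."""
    matching = [page for page in pages if category in words(page)]
    return top_terms(matching, set(query_terms) | {category}, limit)


# Invented pages standing in for the matches to +recipe +chocolate.
PAGES = [
    "chocolate recipe with butter and sugar",
    "recipe for chocolate cake with butter",
    "chocolate recipe: butter, cream",
]
print(top_terms(PAGES, ["recipe", "chocolate"], limit=1))
print(sorted(sub_terms(PAGES, ["recipe", "chocolate"], "butter")))
```

Without the stop-word step, words like "with" and "and" would
crowd out the useful categories, which is the problem described
above for languages the software has not been adapted to.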

Here I can move my cursor to the box next to a category word and
click to add that term to my query or subtract that term from my
query. Basically, I'm just editing my query in a different mode
-- plus this and minus that.

People who don't want to learn how to construct commands with
pluses or minuses, or who like a list of words to prompt their
memory or their imagination as they try to put together their
query, can do it this way.

I can't say that I've ever done that myself.

What I find much more interesting is the graphical view of the
same information. Once you have the list of categories, you get
the graphical view by clicking on GRAPH.

Here, if I put the cursor over any of the category terms, I see
the sub-categories. And once again, I can click to add or
subtract any of these terms.

Let's consider how this capability can be used.

First let's consider how people think.

Around the time AltaVista came out, I got this laptop with
Windows 95. I had a printer already and wanted it to work with
this new machine. Of course, I needed to get the driver. For
that, all I had to do was go to AltaVista and do a search for
+bj200 +driver* and the first item on the list of matches was a
page that in fact had the driver I wanted; and within five
minutes, I had my printer working. That's an example of using a
rare term to get what you want quickly.

Another way of looking at this same problem might be to think of
the word "printer". I think of the specific -- the model number.
But some other people might think of the category -- printer.
That person could do a search for printer, click on refine, and
graph, and go through a series of searches -- adding and
subtracting categories and clicking on search or refine again.
And that person would find the same piece of software, maybe
from a different page, and following a very different trail.
There is more than one way to skin a cat, and AltaVista provides
you with three ways (Simple, Advanced, and Refine).

----------
 refine, x-rays

Information X-Rays


snapshot of fast-changing fields (biochemistry categories,
x-ray; "management consulting")

first-cut market segmentation (intranet, "electronic commerce",
isp)


To me, this capability is not as important for finding as it is
for telling me something about information.

Say there is a fast-changing field, like biochemistry. My son
was majoring in that. If I do Refine for a query for
"biochemistry", I get the top 20 categories, as of today,
statistically determined from 100 million Web pages, followed by
the sub-categories. If the field changes fast, I'd want to keep
track of that if that was my career.

How do the pieces fit together? A graphical view might make the
relationships clearer to me. If there is a line connecting two
terms, that means that pages with the one thing are likely to
have the other as well. If terms are isolated, there is very
little overlap.
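
One way to picture how such lines could be drawn is the Python
sketch below. It links two terms when the pages that contain them
overlap heavily. The overlap measure (pages with both terms
divided by pages with either term) and the 0.5 threshold are
assumptions of mine. The talk does not say how the real graph is
computed.

```python
from itertools import combinations


def term_links(pages, terms, threshold=0.5):
    """Return pairs of terms that tend to appear on the same pages.

    `pages` is a list of sets of words, one set per page.
    """
    links = []
    for a, b in combinations(sorted(terms), 2):
        with_a = {i for i, page in enumerate(pages) if a in page}
        with_b = {i for i, page in enumerate(pages) if b in page}
        either = with_a | with_b
        if either and len(with_a & with_b) / len(either) >= threshold:
            links.append((a, b))
    return links


# Invented pages, each reduced to the set of category words on it.
PAGES = [
    {"enzyme", "protein"},
    {"enzyme", "protein"},
    {"enzyme"},
    {"genetics"},
]
print(term_links(PAGES, {"enzyme", "protein", "genetics"}))
```

In this made-up sample, "enzyme" and "protein" share most of
their pages and get a line between them, while "genetics" shares
none and stays isolated.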

Now, my son just graduated from Yale last May and got a job in
management consulting. That's a field that didn't exist when I
went to college; so I was interested in getting a quick look at
the field -- what does it include? what is the typical subject
matter that firms like that deal with? Now, when he was choosing
among companies that were potential employers, he also could
limit his search to one or more of those companies and see what
they covered (or at least what their Web sites covered).

You can also use this capability for first-cut market
segmentation. Here is a search for intranet. Imagine how much
money we normally pay market research firms to tell us about
intranet or "electronic commerce" or isp. These are brand-new,
fast-changing markets. We can pay them a few hundred thousand
dollars to go off for six months and come back and tell us what
the world was like a year before. Or, you could go to AltaVista
and do one of these quick searches to get a feeling for what are
the major categories, provide that information to the market
research company, and say, "Okay, charge us half as much and get
us the results a lot faster. This is your starting point." Or
you might be able to get to the point where you can derive
useful information directly from these x-rays yourself. For
instance, are there islands which are totally isolated from the
rest of the categories -- those might be naturals to consider as
separate niches.
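
If you write down which terms are connected by lines, finding the
islands can be automated. The Python sketch below groups terms
that are connected to one another, directly or through other
terms; any group that stands alone is an island. The term names
and links are invented for illustration and would come from
whatever you see in the graph.

```python
def islands(terms, links):
    """Group terms into clusters that are connected by links."""
    neighbours = {term: set() for term in terms}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(terms):
        if start in seen:
            continue
        # Walk outward from `start` until no new terms turn up.
        group, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in group:
                continue
            group.add(term)
            stack.extend(neighbours[term] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Invented example: three linked terms and one term with no lines.
TERMS = ["intranet", "firewall", "groupware", "hosting"]
LINKS = [("intranet", "firewall"), ("firewall", "groupware")]
for group in islands(TERMS, LINKS):
    print(group)
```

A group with only one or two members and no lines to the rest is
a candidate for the kind of separate niche described above.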

I call these "x-rays" because this is not precise. It takes a
while for a doctor to learn how to interpret what he is looking
at.

----------
 refine, portraits

Portraits


portrait of the entire Internet -- +*

portrait of God

portrait of a country -- Colombia

portrait of an individual ("richard seltzer")


There are fun things that you can do with this capability, too.

This is my favorite elevator speech. If you have a wireless
Internet connection on your laptop, and you have an elevator in
the building, and you are trying to explain what the Internet is
to your boss's boss, you can connect to AltaVista, do a search
for +* -- you'll get about 100 million matches (perhaps
truncated a little bit if the machine is busy). Then hit Refine
and Graph to get the graphical view. That is a picture of the
entire Web -- 100 million pages. It would make a great magazine
cover -- this is the Web.

You can do many other fun things as well.

Here is a picture of God.

This example helps make the point that your results are only as
good as how well you thought out what you are trying to do
before you submit your query. It's like "I think therefore I
am." To what extent did you stack the deck, did you put your
answer in already from the way you constructed the query? If I
search for "God" with a capital G, I've limited my search to
English language and to certain religions right off the bat. So
my results end up very Judeo-Christian. If I want to get a
worldwide view of religion, I'd have to go to Advanced search
and enter every way I could imagine translations of the word
"god," and names of gods, and related terms.

This is what I get when I search for "Colombia." Now maybe this
is a picture of the country Colombia. But the words here look
totally bizarre. What does Colombia have to do with Gabon, Togo,
Micronesia... ? Maybe this is telling me something very
interesting about international drug trafficking. Or maybe my
query is not well suited for what I'm trying to find. For
instance, it is quite possible that the name "Colombia" is
common in many different parts of the world -- it isn't just the
name of a country; it is also a family name and the name of
numerous towns and cities. To get mentions of the country and
only the country, I'd have to work harder. So before you jump to
conclusions, consider -- could there be a problem with the way
you asked the question? What is it that you are trying to learn?

There are also interesting repercussions when you put everything
about yourself on the Web. My little Web site has over 700
documents, some of which are complete books. Just about
everything I've written over the last 30-40 years is all sitting
on the Web and every page of that is indexed at AltaVista. The
last time I did a search for "Richard Seltzer", the picture
looked like a case of schizophrenia -- two separate, unconnected
pieces. It's a little strange looking at yourself in a mirror
and seeing something like that.

----------
 refine, web-site analysis

Web-site analysis


major research site (host:harvard.edu)

your own site (host:samizdat.com) (host:digital.com)

subset of your own site (+host:digital.com +internet)

competitors' sites (what do they target? how are they allocating
resources?)


What useful things can you do with this?

Here's a look at host:harvard.edu. Most major universities have
dozens, if not hundreds of different Web sites, with different
people running them. They have overlapping information spread
across these sites. If I went to Harvard and tried to find
something just by going through their front door, I'd get lost
very quickly. But by doing this, I'm getting an x-ray view of
all of Harvard's Web sites. I could then decide that I'm
interested in neurology and disease and I'll take a look at
everything having to do with those at Harvard, regardless of
which server they happen to sit on at Harvard. Not depending on
how the Webmasters at Harvard set up their navigation buttons;
but finding my own way through it.

Here's my own Web site.

And here's what Digital's Web site looks like.

You can also combine commands. This is +host:digital.com +internet.
If you are looking for vendors, this is a good thing to do --
check what the x-ray of their Web site or a sub-set of their Web
site looks like. They may say that they are a full-service
vendor and they have this and that. Well, take a look.

You can also take a look at the competition and your particular
product area at their sites. You might want to compare that with
pictures of your own company's site. The comparison might tell
you something about how the various companies are allocating
their resources.

You can use this to check trends -- it might be sites that link
to your pages, it might be changes in a fast-changing field.
Whatever it is you could make these kinds of pictures and save
them. You can't save them directly as files, because the
information comes up in a Java applet. But you can capture the
images as images. I use PaintShop Pro to save them and print
them.

If there is a subject that you are following regularly and that
tends to change frequently, you might want to save a series of
snapshots. And at some point in time, a significant change might
jump out at you in the graphical view, whereas if you were just
looking at a list of thousands of matches, that could slip by
unnoticed.

----------
 reference


Followup and Future Reference


Richard Seltzer -- [log in to unmask], [log in to unmask]

http://altavista.digital.com

http://www.digital.com/info/internet

http://www.samizdat.com/

Business on the World Wide Web, live chat sessions, Thursdays,
noon to 1 PM Eastern Standard Time (GMT -5),
http://www.web-net.org

AltaVista newsletter (Cobb Group) http://www.cobb.com/alt


You are now ready to go through the companion tutorial "I want to
be found" -- a speech on how to use AltaVista to improve your
Web site and to drive more traffic in your direction.

Remember, please feel free to send me your questions at
[log in to unmask]. I'll do my best to get back to you
promptly. And in the near future, we plan to post the most
interesting and useful questions and answers at this Web site.
If demand warrants, we may also open a chat room, with regular
hours for you to ask your questions there.

----------
         I-School, I want to be found, AltaVista, tutorial

How to use AltaVista Search to improve your Web site

Script for a speech by Richard Seltzer, Internet Evangelist,
Digital Equipment


About this tutorial --

This talk and the companion piece ("I want to find", which you
should go through before this one) are based on my research for
the book The AltaVista Search Revolution (published by
Osborne/McGraw-Hill) and also my experience with my own personal
Web site.

Please feel free to send me your questions at
[log in to unmask]. I'll do my best to get back to you
promptly. And in the near future, we plan to post the most
interesting and useful questions and answers at this Web site.
If demand warrants, we may also open a chat room, with regular
hours for you to ask your questions there.

----------
 AltaVista Search for Information Providers


how AltaVista works

implications of AltaVista

tool to fix Web sites

adding and subtracting URLs

what AV doesn't index

ranking rules

exclusion

flypaper


In this presentation, we'll look at how to set up your pages to
make it more likely that you'll be found. You may not at first
think that's important, but it's very important. The world is
changing. Nobody would have ever imagined a year ago the number
of people today who have their own personal Web pages. Geocities
-- one site -- has over a million people's personal Web pages.
And there are lots of others making similar offers, with ISPs
everywhere offering free Web space.

The combination of personal Web pages and full-text search, with
AltaVista, makes for a very interesting kind of environment.
There would be no point in having the personal Web pages if
nobody's ever going to find you. But when you do have this
capability, then all of a sudden, being found becomes very
important to you, and the Internet becomes much less like a
library, and much more like a social event -- where plain text
pages become an invitation for dialogue.

I'm going to be talking a lot from my personal experience. I
have my own personal Web pages that I get free from an ISP. I
run a little site with about 12 Megabytes of text. Keep in mind
that if you don't put graphics up on the Web, you can fit an
awful lot of stuff into about 10-12 Megabytes. In fact, you can
put 20 copies of Huckleberry Finn in 10 Megabytes, if you don't
use graphics. You have a choice: putting up 20 books on the Web
for free vs. maybe putting up a dozen pictures of your family
and your pets. So if you do get your own personal Web pages,
keep in mind what the power of that could be if you don't limit
yourself.

----------
 how it works

How AltaVista Works


Scooter following trail of URLs (1000s of threads)

full text index


First, let's look at how AltaVista works -- from the perspective
of an information provider.

How is your information going to get into AltaVista, and then
how are people going to find you? And how can you use AltaVista
as a tool to improve your pages, to get more traffic to your
site, to get your pages higher on the ranking lists when people
are searching for things that are related to your content and
your business?

AltaVista has an index that is built by sending out a crawler (a
robot program) that captures text and brings it back.

The main crawler is called "Scooter." (It now has a few cousins,
too, which have specialized jobs to do to help keep the index
current, such as checking for "dead" links -- pages that have
been moved or gone away and should be removed from the index.)
Scooter sends out thousands of threads simultaneously. 24 hours
a day, 7 days a week, Scooter and its cousins access thousands
of pages at a time, like thousands of blind users grabbing text,
pulling it back, throwing it into the indexing machines so the
next day that text can be in the index. And at the same time,
they pull off, from all those pages, every hyperlink that they
find, to put in a list of where to go to next. Because, of
course, there is no one central registry for Web pages. When you
create a Web page, you don't have to tell a soul. You just put
it up there. There is no central place for the crawler to go to
and say, "Tell me about all the pages out there." No, it has to
discover the pages by going from link to link to link. Because
of that, you can't predict with assurance when AltaVista might
find a new page of yours.
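
The discovery process can be pictured with a small Python sketch.
It is a toy, single-threaded model of my own, not Scooter's
actual code. Instead of fetching real pages, it reads links from
a made-up table, so the page names are placeholders.

```python
from collections import deque


def crawl(link_map, seeds):
    """Visit pages by following links, starting from `seeds`.

    `link_map` maps each page to the list of pages it links to.
    Newly discovered links go to the end of the waiting line.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    visited = []
    while frontier:
        url = frontier.popleft()
        visited.append(url)  # the real crawler would index the text here
        for link in link_map.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Made-up site map: nothing links to "new-page", so it is never found.
LINKS = {
    "home": ["about", "articles"],
    "about": [],
    "articles": ["halloween"],
    "halloween": [],
    "new-page": [],
}
print(crawl(LINKS, ["home"]))
```

A page that nobody links to never enters the waiting line at all,
which is why a brand new page can go undiscovered for a long
time.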

Imagine you are playing a game with over 100 million pages out
there and you have a few thousand threads going all the time. A
thousand is a big number, but it's nowhere near as big a number
as a hundred million. So what do you think the odds are that
it's going to find your page in the next week? in the next
month? or even in six months?

Yes, in a typical day Scooter and its cousins visit over 10
million pages. If there are a lot of hyperlinks from other pages
to yours, that increases your chances of being found. But if
this is your own personal site, or if this is a brand new Web
page, that's not too likely.

So what can you do? We'll go over this in greater detail
later, but, basically, you can go to AltaVista and at the bottom
of the page, click on ADD URL, and simply type in the URL of
your new page. The crawler will immediately fetch that page and
hand it off to the index machines to be added to the index,
probably by the next day. So instead of waiting for this random,
many-month process, you can take control when your page is
indexed.

My personal view is -- ignore the directions at AltaVista where
it says only type in the URL for your home page. If you do that,
once again, the pages that are linked to directly from your home
page will be put at the end of this huge line of what to go to
next. And after they are fetched, then the URLs in those pages
will go to the end of the list; and so on, until your whole site
is found and indexed. If you want control over the process, if
you want all your pages in the index tomorrow, then you should
add each and every page individually.


Full-text index is a very important concept.

Large companies often misunderstand how AltaVista works.

There is such a thing as a "metatag" for "keywords", which
really confuses Webmasters. Webmasters often think that
AltaVista and other search engines only search for things that
appear in metatags -- special commands embedded in the header of
a Web page. That is not the case. Every word on the page counts.
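
To make "every word counts" concrete, here is a minimal Python
sketch of a full-text index. It is a bare-bones illustration of
the general idea, not AltaVista's implementation, and the page
names and text are invented.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of pages it appears on.

    `pages` maps a page name to its text. No keywords or
    categories are needed; the words themselves are the index.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, *required):
    """Return the pages that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in required]
    return set.intersection(*sets) if sets else set()


# Invented pages for illustration.
PAGES = {
    "page1": "Halloween stories and legends",
    "page2": "Premium wines for sale",
}
index = build_index(PAGES)
print(search(index, "halloween"))
print(search(index, "premium", "wines"))
```

Nothing in this index looks at metatags or at where on the page a
word appears. Any word anywhere in the text can lead a searcher
to the page.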

The implications of full-text search only became clear to me
when I was talking to Brian Reid, director of one of Digital's
research laboratories in Palo Alto, California. He mentioned
that he had saved every email message that he sent or received
for over 15 years. He just threw them on a disk. He's in a
research environment; he has as much disk space as he wants. He
didn't bother to put them in any directory structure. He didn't
bother to name them. He didn't have to. He just threw them on
disk. He knew that with the direction of technology sooner or
later that would be valuable to him. Along comes AltaVista, and
he can find anything he wants. He can search by date, by a
person's name, by a phrase, by anything he wants and get what he
wants immediately.

By the way, now, there is a free personal version of AltaVista
Search that you can download from the AltaVista site. It will
index all of your mail, and your Word files, and your text
files, and your HTML files. It will index a number of file types
that can't be indexed at the public site.

What I got from talking to Brian Reid was a sense of the value
of disorganized information.

We have been trained to think that order is good and disorder is
bad, but there are times when that isn't the case.

There are interesting possible applications for full-text
indexing as a complement to databases.

Many people have been trained to think, "If I have a lot of
information and need to retrieve it later, the only way to
handle that is with a database." That means you have to define
fields, categorize information, etc. There is a lot of work that
goes into setting that up, and a lot of maintenance work.

In the AltaVista style, there are no categories and there is no
maintenance. It's all there in flat disorganized files, and that
disorder is valuable -- because basically any time that you use
your human intelligence to make judgements on information to
split it up into categories, you are making that information
less accessible and less flexible in the long run.

Think of the old-fashioned work environment where there were
rows and rows of file cabinets. And a clerk would go through,
very carefully filing things, day after day. Eventually, he gets
a gold watch and goes away, and suddenly hundreds or even
thousands of files will never be found -- no matter how well
that person followed the rules.

Likewise, with a database, you are putting information into
pigeonholes. What happens when the categories of the world
change?

Five years ago, there was no Web. Many of the categories that we
normally use in our day-to-day business lives today didn't exist
five years ago. Any set of information that we categorized back
then would be much less useful to us today. And five years from
now in the future, the world will have changed again.

There are many ways in which, if you don't have to categorize
information, you can have better access to it.

Another thing to keep in mind -- which is more a function of how
search engines work -- is that public Internet search engines
don't index information in databases.

When I talked about Advanced search [in the companion speech, "I
want to find"], I used the example of recruiting employees with
special qualifications. I showed how you could enter the word
"resume" in the query box and then list all the qualifications
you are looking for in the ranking box, and those resumes with
most of those qualifications would appear at the top of the list
of matches.
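
The effect of the ranking box can be sketched in Python as
follows. The scoring rule here (one point for each ranking term
that appears on the page) is a simplification I am assuming for
illustration; the actual ranking rules are not given here. The
sample pages are invented.

```python
import re


def tokens(text):
    """Return the set of words on a page, lower-cased."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(pages, query_word, ranking_terms):
    """Pages containing `query_word`, best match first.

    A page scores one point for each ranking term it contains.
    Ties are broken by page name so the order is predictable.
    """
    scored = []
    for url, text in pages.items():
        found = tokens(text)
        if query_word in found:
            score = sum(1 for term in ranking_terms if term in found)
            scored.append((-score, url))
    return [url for _, url in sorted(scored)]


# Invented pages for illustration.
PAGES = {
    "r1": "Resume: Java, Unix, networking",
    "r2": "Resume: Java",
    "other": "Java tutorial",
}
print(rank(PAGES, "resume", ["java", "unix", "networking"]))
```

The page without the word "resume" is dropped entirely, and the
resume listing more of the qualifications comes out on top.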

Today there are over 3500 Web sites devoted to jobs. Every one
of those sites uses a database. And what are they putting into
that database? Just text. They are just putting in resumes and
job descriptions. That's information that would work beautifully
in an AltaVista environment. But because they use databases,
that information is not indexed at AltaVista.

So if I went to the trouble of entering my resume at 200
different job-related Web sites, a headhunter going to AltaVista
and looking to hire someone with credentials just like mine will
never find me, won't even know that I exist. But if I simply put
my resume up as a plain text page on the Web and ADD URL at
AltaVista, then I could be found very easily. And, in fact, the
word "resume" appears on over 800,000 Web pages in the AltaVista
index.

It's an interesting phenomenon that in this case a database
gives you less.

Full-text indexing is very interesting from another point of
view as well. Think of all the vast sets of information in the
world that have never been put into electronic form, because it
would have been too costly to create a database for them, to
categorize them. Think of the property records for the city of
Boston. You know how much money lawyers get to do title searches
whenever real estate changes hands. Well, with this technology,
you could simply scan all that information -- page after page of
it -- post it all as plain text Web pages, and ADD URL at
AltaVista. Then I could search for "99 State St." and get a list
of links to every page where that address was mentioned. Nobody
would have had to organize that information at all, but I could
get back everything that I wanted to know.

There are many huge sets of information like that. If you think
of them in this different way, you can do things with them that
you never imagined before.

Remember about eight years ago the Federal government suddenly
decided that they needed to know what country everybody was born
in. They needed to know, among all the people who are living and
working in the U.S., who wasn't born here? So suddenly,
employers were required to include this information in their
Personnel databases so they could report on it. Nobody had it
there. They all had to go back and rewrite their programs and
contact all their employees and fill out the forms that way,
because their databases were static. Now, if in addition to
their databases, they had had, indexed in an AltaVista style,
free-form interviews that a Personnel person could have had with
people when they came on board -- "Tell me about yourself..." --
they could have captured much of that right off the bat. So
unstructured information has value in unexpected ways.

If you let your imagination go on that concept, there are
probably ways you could use that technology that had never
occurred to you before.

----------
 site design

Implications of full-text search


Users don't come in by front door

-- you don't control the context, the user's experience

-- need to provide navigation buttons on every page

host: url: and bookmarks

-- AltaVista provides index of your site or any subset of the
Internet

No need to organize information to be able to retrieve it later


What are the implications of full-text search from the
perspective of Web-site design?

This became clear to me from my own little site, back around the
time when AltaVista first came out. In the fall of 1995, there
were some other search engines out there that were already
heading in this same general direction. On Halloween day, I went
to a tele-seminar about the Internet. It was given by a couple of
friends of mine who are professors at the Harvard Business
School. Being professors, they had to explain everything in
terms of the whole history of mankind. They were trying to talk
about how to make a successful business on the Web. And they
took, as an example, Virtual Vineyards, which sells premium
wines on the Web.

Their analysis of Virtual Vineyards was that, first, they don't
own the infrastructure -- that's the Internet and that's
available to anybody. Second, they don't own the product,
because you could go down to the corner store and buy the same
bottle of wine. So what is it that differentiates them? What
makes this a successful business?

Their conclusion was -- the context of the user's experience.
Then they showed a videotape with the people who had designed
the Web pages and ran the business. They talked very proudly
about why they put this button here and that one there. And the
way they described it felt to me like a branching adventure
story, where you are going down a long hallway and are you going
to open the door here? or the door there? And are you going to
fight the dragon or run for the hills? And it's a very long
hallway, and after maybe 20 choices, then you are going to
decide which wine you want to buy.

Now, just around that same time, I had a little article on my
Web site about Halloween. This was something I had written about
20 years before, that had never been published. (I put
everything on my Web site.) Suddenly, around Halloween time, I
was getting three times as many hits on that article as I was on
my home page. So it was beginning to dawn on me that something
was happening. People weren't going through the front door. When
you have full-text search engines, people can come in anywhere
-- any page at your site is a potential entry point. There is
nothing special about a home page. All pages are created equal,
as far as search engines are concerned. And when most traffic is
driven by search engines, don't waste all that time on a home
page. Pay attention to each and every individual page.

So when I heard them talking about how they had designed Virtual
Vineyards, I thought, "They are in trouble." Because all of a
sudden, they didn't have control of that context anymore. People
could come in through the windows or the back door and go
anywhere.

There are still some instances today, but there were many back
then, of Web sites that were designed like the old Burma Shave
ads (if you're old enough to remember those). You'd drive along
the highway and there would be a series of about a dozen signs,
and there would be just a couple words on each, and they were
about a hundred yards apart, and there was a really good
punchline when you got to the end. Well, doing a search at
AltaVista, I would suddenly be reading the seventh sign, and I
wouldn't have a clue what the context was, or where to go to
next, or what was going on. It was totally bizarre.

So if you have a Web site, what do you do in this environment?

Naturally, you need to provide navigation buttons on every page.

My basic design principle is "maximum content for minimum
clicks." I don't want people to have to click 20 times to come
to a decision. It shouldn't take more than three clicks -- two
is even better -- to go from anywhere at a Web site to any other
page at that same site. Keep it easy. Remember, it's easy for
your visitors to click on a bookmark to go back to a search
engine and go anywhere else they want on the Web. At any moment,
there are thousands of competing pages out there. I want to give
them the information they need fast, rather than wait for them
to get frustrated.
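
One way to check the "three clicks" rule is to treat the site as a
map of which page links to which, and measure the longest click
path between any two pages. This is only a sketch: the page names
and the link map below are made up for illustration.

```python
from collections import deque

def max_clicks(links):
    """Return the largest number of clicks needed between any two pages.

    `links` maps each page to the pages it links to. Returns None if
    some page cannot be reached from some other page at all.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    worst = 0
    for start in pages:
        # Breadth-first search gives the fewest clicks from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, ()):
                if nxt not in dist:
                    dist[nxt] = dist[page] + 1
                    queue.append(nxt)
        if len(dist) < len(pages):
            return None
        worst = max(worst, max(dist.values()))
    return worst

# Hypothetical site where every page carries the same navigation buttons.
site = {
    "index.htm": ["halloween.htm", "books.htm"],
    "halloween.htm": ["index.htm", "books.htm"],
    "books.htm": ["index.htm", "halloween.htm"],
}
```

With navigation buttons on every page, as in the made-up site
above, any page is one click from any other. A result of None means
a visitor who lands on some page by way of a search engine has no
path to part of the site.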


Another interesting little trick -- you can do a search for
host: followed by a domain name (like host:samizdat.com) to see
every page from a particular Web site that is in the index. If I
have been rigorous and every time I add a new Web page or make a
significant change to a Web page, I go to AltaVista and ADD URL;
then I can use AltaVista as a free index of my site.

This dawned on me when people were coming to me and saying, "You
have a lot of good stuff at your site, but it's just too
complicated. There are so many things there. Can't you put some
search software on your site." Well, I didn't want to spend any
money on that. I didn't want to have to think of key words. I
didn't want to have to go through all the maintenance problems
when you are running your own separate search engine at your
site. But AltaVista indexes it all automatically for me now.

First, I did it the simplest way. I made a hyperlink at my site.
"Click here to launch a search at AltaVista and to search for
only pages at this site. Just add your query after what you see
there in the box." They'd click, get to AltaVista, and the query
+host:samizdat.com would already be in the box. (The URL I
linked to was a unique URL generated by doing a search at
AltaVista for +host:samizdat.com. I simply did a cut and paste
to make a hyperlink to that at my site.)

Later, somebody who had seen my site sent me a neat piece of
code that let me do this using forms. So today it looks a bit
more sophisticated. But it's a very simple notion -- if you have
a small site, you don't need to buy software to make it easy for
visitors to search your site. Just piggyback off AltaVista.
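
The cut-and-paste link described above amounts to a search address
with the query already filled in. Here is a rough sketch of building
such an address. The endpoint and the parameter name "q" are
placeholders, not AltaVista's actual address format; copy the real
URL from your own search, as described above.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint and parameter name for illustration only.
SEARCH_ENDPOINT = "http://search.example/query"

def site_search_url(domain, extra_terms=""):
    """Build a search URL restricted to one site with the host: command."""
    query = f"+host:{domain} {extra_terms}".strip()
    # urlencode escapes "+" and ":" so the query survives inside a URL.
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})
```

Calling site_search_url("samizdat.com") yields a link that opens the
search with +host:samizdat.com already in the box, and visitors add
their own words after it.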

----------
 tool for fixing Web sites

Tool to Fix Web Sites


site inventory (host:)

-- security/confidentiality

-- personal information (SSN, home address, home phone)

-- dates (obsolete page or AltaVista hasn't yet indexed latest
version)

-- same title used twice

-- pages without title


Here are some fix-it things you can do using AltaVista.

Somebody could start a business in the next ten minutes, simply
using AltaVista and going to Web sites and seeing the problems
they have. Then send the Webmaster an email message with a quick
diagnosis. "I've seen these kinds of problems at your site. I
could help you fix them very quickly. Just send me a check for
$200." It's just a marketing problem; the technology is sitting
there at AltaVista.

For instance, I can do a search host: followed by the domain
name, and then I can search for security terms -- company
confidential, proprietary, top secret, etc. At a large site,
you'd be amazed at how many instances there are of somebody
accidentally putting up something that shouldn't have been put
up.

Personal information -- many times people will include their
Social Security Number, their home telephone number, their home
street address -- information they probably don't want to be on
the public Web. It was in a printed document before, and
somebody just moved it onto the Web and didn't give it that
extra second thought, "Do you want that on the Web?"
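
If you have the text of your own pages on hand, the same kind of
check can be run locally before anything goes public. This is a
swapped-in local scan, not an AltaVista search. The phrase list
comes from the terms mentioned above, and the pattern for Social
Security Numbers assumes the common 3-2-4 digit layout.

```python
import re

CONFIDENTIAL_PHRASES = ("company confidential", "proprietary", "top secret")
# Assumption: SSNs written as 123-45-6789; other layouts are not caught.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_sensitive(text):
    """Return a list of warning labels for one page of text."""
    lowered = text.lower()
    hits = [phrase for phrase in CONFIDENTIAL_PHRASES if phrase in lowered]
    if SSN_PATTERN.search(text):
        hits.append("possible SSN")
    return hits
```

A hit only means the page deserves that extra second thought. The
word "proprietary" can appear quite innocently.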


Any time a list of matches comes up at AltaVista, each item has
the date that the page was posted on the Web. So
in a quick scan I can determine, "60% of pages at this site are
more than six months old. And 25% are more than two years old."
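
The arithmetic behind that quick scan is simple. A sketch, using
four invented page dates and treating "six months" as 183 days and
"two years" as 730 days:

```python
from datetime import date

def share_older_than(page_dates, days, today):
    """Percentage of pages whose date is more than `days` before `today`."""
    if not page_dates:
        return 0.0
    old = sum(1 for d in page_dates if (today - d).days > days)
    return 100.0 * old / len(page_dates)

# Invented dates, as they might be copied from a list of matches.
PAGE_DATES = [date(1998, 9, 1), date(1998, 1, 1),
              date(1997, 12, 1), date(1996, 5, 1)]
TODAY = date(1998, 10, 2)
```

For these sample dates, three of the four pages are more than six
months old and one of the four is more than two years old.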


Many Webmasters still don't realize that HTML titles are the
most important part of their pages.

Before AltaVista came along, the HTML title was a throw-away.
Nobody paid attention to it.

This isn't the name of the file. This isn't the name in big bold
letters across the top. This is the HTML title -- part of the
header for the document.

That used to be a relatively insignificant piece of information.
And many folks putting together Web sites and doing cut and
paste to use one page as a template for another can easily make
mistakes. I've ended up with 3 or 4 or 5 pages all with the same
title, because I forgot I was cutting and pasting. And it wasn't
until the next time I took a look at my site with
+host:samizdat.com at AltaVista that I realized that I had made
that mistake.

You also can very easily put up a page without an HTML title.
And even folks who are very knowledgeable about the Internet do
that.

The other day I checked a very professional Internet publishing
site and saw that of the two dozen pages from it that were in
the AltaVista index, more than half showed up as "No Title".
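
You can catch both mistakes before a crawler does. The sketch below
assumes you already have the HTML of your pages in hand (here as a
dictionary of made-up file names); it reads the HTML title from each
page and reports pages with no title and titles used more than once.

```python
from collections import defaultdict
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()

def audit_titles(pages):
    """pages maps URL -> HTML. Returns (untitled URLs, duplicated titles)."""
    by_title = defaultdict(list)
    untitled = []
    for url, html in pages.items():
        title = page_title(html)
        if title:
            by_title[title].append(url)
        else:
            untitled.append(url)
    duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return untitled, duplicates

# Hypothetical pages: two share a title, one has none.
SAMPLE_PAGES = {
    "a.htm": "<html><head><title>Halloween</title></head></html>",
    "b.htm": "<html><head><title>Halloween</title></head></html>",
    "c.htm": "<html><body>No title here</body></html>",
}
```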

What's the importance of the HTML title? Two things.

First, when I get a ranking list -- that is what appears as the
name of that page on the list. "No Title" doesn't really attract
people to click to go to your page.

Second, in the ranking rules (which I discuss in more detail
later), the HTML title is the number one, most important thing
on your page. When people search for your kind of information,
what are the words they are most likely to use? Those words
belong in your HTML title and also in the first couple lines of
text. When you leave that blank, or put unimportant words there,
or you blunder and put the same title on many different pages,
you've just thrown away the best way to get free traffic to your
site.

Those are the kinds of things that you can tell about anybody's
site very quickly.

----------
 add URL

Adding and Subtracting URLs


normal mode: submit main page, AltaVista finds all the rest

new or significantly changed page: submit that URL

dead URL -- submit URL and AltaVista will delete page from index


Several times I've mentioned ADD URL.

Let's take a closer look. Just connect to AltaVista and scroll
down to the bottom of the page. Just click on ADD/REMOVE URL.
Then once again, scroll to the bottom of the page, where the
form appears. There you should type in the complete URL for the
particular page you want to ADD. When you do that, you will very
quickly get back a message saying that the page has been fetched
and that it will be added to the index in the next day or two.
So making sure that your content is current in the index is
under your control -- it's up to you.

Keep in mind that you don't have to have any special authority
to ADD URL. This is not a directory, like Yahoo!, where the
information provider has to submit information and has to prove
they are who they say they are. No, all you are doing is saying,
"Here's a URL. You wonderful dumb, blind crawler, please go and
check this out." AltaVista doesn't believe a word you say except
that there's probably a URL out there. It will go and check and
bring back whatever text it finds at that address. All it knows
is what it found from that page; not what you told it.

If you give it a URL for a page that doesn't exist, it will come
back with Error 404, which means there is no such page (not
that the crawler couldn't get there due to some transient
problem, but that a page with that address does not exist on
that server). Then if that page was in the index, it will remove
that page from the index the next day.
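
The rule can be pictured with a toy model. This is not AltaVista's
code, just an illustration of the behavior described: the index is a
plain dictionary, the fetch function is a stand-in for the crawler,
and the update happens at once rather than the next day.

```python
def process_submission(index, url, fetch):
    """Apply one ADD URL request to a toy index.

    `fetch(url)` is a stand-in for the crawler and returns
    (status_code, text). The submitter's word counts for nothing;
    only what the fetch brings back changes the index.
    """
    status, text = fetch(url)
    if status == 200:
        index[url] = text          # page exists: add or refresh it
        return "indexed"
    if status == 404:
        index.pop(url, None)       # no such page: drop it if present
        return "removed"
    return "unchanged"             # transient problem: leave the index alone

# Hypothetical Web: one live page, everything else is missing.
LIVE_PAGES = {"http://www.example.com/new.htm": "fresh text"}

def fake_fetch(url):
    if url in LIVE_PAGES:
        return 200, LIVE_PAGES[url]
    return 404, ""
```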

This is very important from several perspectives. Say you have
changed the directory structure at your Web site. First, you
should go to AltaVista and ADD URL for all the old addresses to
remove the old information from the index. Then you should ADD
URL for all the new addresses.

Then, as I mentioned [in "I want to be found"], you can
use the command link: followed by a Web address to find out what
Web pages have hyperlinks to a particular page or a particular
URL. So you can use AltaVista to search for link: for each and
every page that you have moved. Then you can send email to the
Webmaster of sites that have links to those pages that you moved
and ask them to update their links.
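
For a site with many moved pages, it helps to write the chores down
as a list. A small sketch, assuming you keep a table of old
addresses and their new homes (the addresses below are invented):

```python
def migration_plan(moved):
    """moved maps old URL -> new URL. Returns the chores in order."""
    plan = []
    # Old addresses first: they no longer exist, so submitting them
    # gets the stale entries removed from the index.
    for old in moved:
        plan.append(("ADD URL", old))
    # Then the new addresses, so the current pages get indexed.
    for new in moved.values():
        plan.append(("ADD URL", new))
    # Finally, one link: search per old address to find who links to it.
    for old in moved:
        plan.append(("SEARCH", f"link:{old}"))
    return plan

# Hypothetical move of one page into a new directory.
MOVED = {
    "http://www.example.com/old/story.htm":
        "http://www.example.com/new/story.htm",
}
```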

Also, very often, when you do a search at AltaVista and it comes
back with hundreds of thousands of matches, maybe two or three
out of the first ten don't exist anymore. And you get upset,
"Why don't they keep that thing up to date." Well, it's up to
you to keep it up to date. We talked about thousands of threads
bouncing around among a hundred million pages. Don't expect that
in your lifetime this thing will ever be perfect. This is
pragmatic. Their job is not to produce perfection. If it were,
the company would go out of business. It would simply be far too
costly to keep the index anywhere near perfection by automatic
means. But whenever you find an instance of a page that doesn't
exist anymore, if you simply click on ADD URL and type that URL
in, then the next day, that page will be removed from the index.
If we can get millions of people doing that on their own, then
the index will be kept up-to-date pretty well. That's a lot
better than writing some fancy new code and putting 50 more
machines out there.

That's my seat-of-the-pants way of trying to deal with that
situation.

----------
 tool to improve

Tool to Improve Web Sites


who links to you -- link:

-- use Advanced and no rank to get all

who links to old pages, old addresses, etc.

embed URLs in your pages for

-- up-to-date indices of your site and/or others

-- complex AltaVista searches tailored for your audience


You can use AltaVista in a variety of ways to improve your Web
site. I already talked about using link: and I also talked about
embedding URLs for specific searches in your pages, such as for
an index of your site.

There is another interesting use of that capability.

In the early days of the Web, quite often you would find Web
sites that consisted of nothing but lists of hyperlinks to other
pages. People would post their carefully constructed lists of
pages on particular topics. The problem was that those lists
soon got out of date -- because so many new pages were added to
the Web every day and so many URLs went out of date as
Webmasters moved files around. So now, rather than scramble to
try to keep such a list current, you can try to construct a
query at AltaVista which produces similar results, and put a
link from your page to that particular query, so visitors at
your site who click on that link will get the latest results.

I could put a whole set of AltaVista search links on my page,
saying, "if you are interested in X, click here". I could have
carefully constructed all these great queries that are going to
get people exactly the kinds of things they want out of
newsgroups and Web pages. I just cut and paste the unique URLs
that those searches generate at AltaVista and make hyperlinks
from my pages. Then visitors to my site don't have to learn all
the commands to take advantage of the power of AltaVista. I've
done the work for them.

Depending on your interests, you might have a dozen or two dozen
links at your pages that are launching particular searches at
AltaVista that would give valuable information to the kinds of
people you are trying to attract to your site. So you are
providing a useful service to your audience, and you haven't
spent a dime. You've spent a little time maybe doing the very
kinds of searches that you'd want to do anyway.

----------
 what it doesn't index

What AltaVista Doesn't Index (importance of plain text)


sites that require registration/password

databases

dynamic pages

info inside frames

graphics (but image:)

multimedia files (but applet: and object:)

Acrobat and PostScript files

text files larger than 100K truncated at 100K

comments

-- rule of thumb -- design for the blind, label everything
clearly (ALT)


Keep in mind that AltaVista doesn't index everything.

Actually, the way this works is much to the advantage of small
companies and individuals. It wasn't intended that way. It just
works out that way.

All of the expensive, fancy things that large corporations do
lock out search engines.

If you don't have the money or the time or the knowledge to do
the fancy expensive things, you are in a much better position.
You are going to get much more traffic at less cost.

Large corporations doing fancy things and inadvertently locking
out the search engines, then have to spend a lot of money on
promotion to drive in the traffic that they threw away by doing
the expensive things. Consider that logic.


First, sites that require any kind of registration or password
lock out search engines.

Password or not, if there's a box you have to fill in -- this is
a dumb, blind robot. It can't fill out any forms. As soon as it
comes to a form, it stops. That's all the farther it goes.


Databases -- as I mentioned before, a crawler cannot get content
from a database, because it cannot fill out a form.

Now if you are talking about the intranet version of AltaVista,
there is for that a tool kit, which allows you to take some
information from databases and put that in a form that AltaVista
can index. But that works because you are in charge -- that's
your database. The crawler from the public AltaVista Search
site is not going to dig into other people's databases and pull
out information.


Dynamic pages -- it seems like a great idea to make it so
everybody who comes to your site gets a unique experience. And
there's some excellent software that lets you pull pieces out
of databases to create unique user experiences based on cookies
or on profile information.

But there is one problem with that. When a search engine crawler
arrives, that's like dividing by zero. The crawler halts
immediately, because it sees ahead of it an infinite number of
pages.

This is one reason why nobody can say how many pages there are
on the Web, total. Every dynamic site has an infinite number of
pages. How many millions of dynamic sites are there out there?


Information inside frames -- I love this. This wasn't
intentional. This is true of all search engines today. AltaVista
will index what's in the outside of the frame, but not what
appears in the window of the frame. Unless you have a no-frames
version of your page, the information in the window of the frame
does not get indexed.

I love that because I hate frames -- in most instances. There
are some very good uses for frames. Actually, I was thinking of
using frames for the on-line version of this tutorial, with the
script on the outside, and the examples -- with live searches at
AltaVista -- could appear inside the frame.

But in most instances, the frame is just a nuisance, making it
easy to put a flashing ad in front of my face, and eating up my
screen space. When I'm using a laptop with a small screen, it is
a serious problem to have a third of my screen space eaten up
with a useless frame.

One way or another, sites are firmly punished by search engines
for using frames.


AltaVista also won't index graphics. Have you ever been to a
site that has a huge picture that takes two or three minutes to
paint across your screen at modem speeds? And all the words are
embedded in that .gif. A search engine can't do a thing with
that. Unless the Webmaster put ALT text behind the picture,
describing it and listing those important words, just like a
blind person would stop right there, the crawler stops.


Multi-media files (audio and video) and information that's in
Java applets can't be indexed.


These are limitations today. I don't say that they will always
be limitations. At some point in the future we will have very
good voice recognition and it will become possible to index
voice files. At some time, it will be possible to quickly and
accurately match patterns and you may be able to search for
images. But if you are designing your Web site for today, you
need to be aware of the consequences of your page design for
search engines.


Acrobat and PostScript files -- the intranet version of
AltaVista can index those today, I believe. But the public
AltaVista Search site, which drives traffic to your public Web
site, does not.


Another strange limitation is a pragmatic compromise, intended
to help optimize the performance of AltaVista. They will only
index the first 100 Kbytes of text. So if you have an entire
book, it's best to break it up into chapters, and then all the
text can be indexed.

They will pull out the hyperlinks from the whole document, but
they will only index the first 100 Kbytes.
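
Breaking a long text into pieces that fit under the limit can be
done mechanically. The sketch below splits on blank lines between
paragraphs. It assumes "100 Kbytes" means 102,400 bytes, and it does
not try to cut a single paragraph that is itself larger than the
limit.

```python
LIMIT = 100 * 1024  # assumption: "100 Kbytes" taken as 102,400 bytes

def split_for_indexing(text, limit=LIMIT):
    """Split text at paragraph breaks into chunks of at most `limit` bytes."""
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        para_size = len(para.encode("utf-8")) + 2  # +2 for the blank line
        if current and size + para_size > limit:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting at chapter headings, as suggested above, is friendlier to
readers. A byte count like this one is just a safety net for very
long chapters.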


Comments aren't indexed at all.

When AltaVista first came out, there were a lot of wise-guy
Webmasters who thought they were going to fool everybody. "I
figure everybody searches for the word 'sex.' I don't have any
sex at my site, but I want people to stumble across my site. So
I'm going to put the word 'sex' three thousand times as
comments. And any time that anybody searches for 'sex' that will
come up first."

People actually tried that. They tried doing the same kind of
thing in wallpaper. They tried everything imaginable to fool
search engines.

But two factors come into play here. The first is that AltaVista
doesn't index comments at all. The developers presumed that
comments are private communications by Webmasters to themselves
and to others who work technically on the site. It's information
that is not meant to be public and hence should not be indexed.

The other thing is that AltaVista only counts to two. They
designed that in at the beginning. They didn't want the Web
to get totally corrupted with people repeating words uselessly.

You can imagine how bad it could get. Just look at a phone book.
How much of the phone book is AAAA... from people trying to be
near the front of the phone book. And you can imagine how
cluttered the Web would be with useless text if people thought
that repetition actually got them to the head of the list. "I
have 5 million copies of that word here. Why aren't I higher on
the list?" It would have been an endless disaster. Fortunately,
they avoided that.
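
Those two factors can be illustrated with a toy word counter. This
is a guess at the general idea, not AltaVista's actual indexing
code: it throws away HTML comments and tags, then caps every word's
count at two, which is one reading of "only counts to two."

```python
import re
from collections import Counter

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z0-9]+")

def capped_counts(html, cap=2):
    """Count visible words, ignoring comments, never counting past `cap`."""
    visible = TAG.sub(" ", COMMENT.sub(" ", html))
    counts = Counter(WORD.findall(visible.lower()))
    return {word: min(n, cap) for word, n in counts.items()}
```

Under this model, a word repeated three thousand times in a comment
counts for nothing, and a word repeated three thousand times in the
visible text counts the same as a word used twice.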


My rule of thumb is -- design for the blind. Label everything.
Of course, that is easier to do with your personal site or an
intranet site than it is with a public corporate site.

The blind are some of the best users of the Internet today. They
use text-only browsers and text-to-voice converters, and they
are able to navigate very well unless people put up these
blockades and brick walls of pages with big images that you need
to see to understand.

If you need to have a picture, be sure to label everything
clearly with ALT text in the background, to explain what a
sighted person would see.

Remember, a crawler really is very much like a blind user.
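
Checking for unlabeled pictures is easy to automate. A minimal
sketch, assuming you have a page's HTML as a string; the file names
in the sample are invented.

```python
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    """Records the src of every <img> that has no usable ALT text."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            # A missing, empty, or blank alt is treated as unlabeled.
            if not (attr.get("alt") or "").strip():
                self.missing.append(attr.get("src", "(no src)"))

def images_missing_alt(html):
    audit = AltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing

# Hypothetical page: one labeled picture, one unlabeled.
SAMPLE_HTML = ('<img src="banner.gif">'
               '<img src="logo.gif" alt="Site logo: open book">')
```

Any picture this reports is one that both a crawler and a visitor
using a text-to-voice converter will pass over in silence.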

----------
 ranking rules

Ranking rules (so you'll be found and appear higher on results
lists)


HTML title

first couple lines of text

metatags

repetition doesn't work


I've talked about some of these along the way.

The HTML title and the first couple lines of text are the most
important part of your pages.

Say you want to put your resume on the Web. Keep these rules in
mind. You don't put your name first. Who cares about your name?
If they know your name, that isn't somebody you're trying to have
find you. They already know you. You want to be found by people
who never heard of you before. So don't waste any letter in the
HTML title on your own name. The first word should be "resume".
After that, list your main qualifications and the kinds of jobs
that you are looking for. Put the same kinds of things in the
first couple lines of text. That's what will come up as the
default description in the match list, and it's also the
second most important position for ranking.

Further down the list come metatags. (Metatags are brief
instructions that you can put in the header of your Web
pages. For details on how to use them, check the help files at
AltaVista.) As I mentioned before, many corporations
misunderstand the role of metatags.

Yes, there is a metatag for "description." It's very simple to
do. You just follow the metatag format and enter a couple
lines that describe what your page or your site is about. These
are the words that you would like to appear as the description
for your page in results lists at AltaVista, instead of the
default, which is the first couple lines of text.

Yes, you can do that. But what people don't realize is that for
ranking purposes, the first couple lines of text still take
priority over the metatag.

Also, you can have a metatag for "keywords." But people
misunderstand that term "keywords" and they start acting as if
AltaVista were a database. The purpose of the keyword metatag is
simply to allow you to throw in some synonyms -- words that are
appropriate for what's on your page, that describe what's there,
but those words may not actually appear on that page.

One of the best uses for keyword metatags is for foreign
translations of the main words on your page, so, for instance,
somebody searching in French will find that page.

But many Webmasters think that by putting words in keyword
metatags they are getting some advantage in the ranking. No,
those words are worth little more than any other word in the
main text of the page. There is nothing "key" about it. You have
simply added a few more words to the page in a place that is not
visible.
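
To make the pieces concrete, here is a sketch that assembles the
header of a page from a title, a description, and a list of
keywords. The meta names "description" and "keywords" are the
standard ones. The resume wording is invented, and the French term
stands in for the translation idea mentioned above.

```python
from html import escape

def build_head(title, description, keywords):
    """Return an HTML header with a title and the two metatags."""
    keyword_text = ", ".join(keywords)
    return "\n".join([
        "<head>",
        f"<title>{escape(title)}</title>",
        f'<meta name="description" content="{escape(description)}">',
        f'<meta name="keywords" content="{escape(keyword_text)}">',
        "</head>",
    ])

# Hypothetical resume page: the searcher's words first, no personal name.
SAMPLE_HEAD = build_head(
    "Resume: technical editor, Web publishing, Boston",
    "Resume of a technical editor seeking Web publishing work.",
    ["resume", "curriculum vitae", "redacteur technique"],
)
```

The title carries the words a stranger would search for, and the
keywords add only synonyms and translations that are not already in
the visible text.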


As mentioned before, repetition doesn't work. AltaVista only
counts to two.


If you keep these ranking rules in mind, it should be relatively
easy for you -- whatever little site you are running -- to get
high ranking on the searches that matter most to you, because
large corporations haven't caught on and are often so tied down
with graphics rules related to branding that they can't make the
sensible adjustments. You'll see many corporate pages are in
frames and frames only, and have HTML titles that have no
significance for search purposes, and are set up so the first
words on the page are totally random because they just happen to
be associated with graphic elements dictated by branding --
words that don't really tell anyone anything. So, basically,
they are throwing away the opportunity to have search engines
drive traffic to their sites.

What does it mean to drive traffic to a site? Take the example
of my own little site on free Web space I get from an ISP. I
have over 700 documents at my site. Some of them are complete
books. I do nothing to publicize the site. I simply make sure
that everything is indexed at AltaVista. On an average day, I
get about 600 to 700 visitors. Since the end of May when I
started getting good statistics from Acunet, where the pages
sit, I've had over 75,000 different people come to my site.
That's what you throw away when you do the things that stop you
from being properly indexed.

I love this environment where the little guy can compete on good
terms.

And I would strongly suggest that the people who are in charge
of branding rules at large corporations should take a good look
at how AltaVista works and find a compromise that gives them the
right look-and-feel, but doesn't chase traffic away. Because you
then have to spend money trying to attract that same traffic.

----------
 metatags

Metatags


plain clear English = better

description

key words (good for synonyms)


Metatags -- the basic rule of thumb for metatags is that plain
English is always best. Metatags are a band aid to help you deal
with pages that don't clearly state what they are about in clear
text, right up front. Do it right to begin with, and you don't
need metatags at all, and you'll get far better results in terms
of search engine traffic than you would if you depended on
metatags.

Keep in mind that the Internet is a different place than what we
are normally used to. One of the worst things you can possibly
do is take existing brochures and other material and simply put
it unedited up on the Web. Marketing brochures are not written
to inform. They are written to tease. They are not supposed to
answer questions. They are supposed to make people ask more
questions and ask for the next brochure and then ask again and
then talk to a salesperson. If you give an answer, they are
afraid that the salesperson will never get in the loop.

The Web is different. People on the Web want answers. They don't
want to have to click forever to get those answers. And if they
don't get the answer from you and get it quickly, they are going
to go to somebody else and get it there.

So you want to write material for the Web in plain clear
English. This is both for purposes of getting properly indexed
and ranked, but also for properly serving your audience.

----------
 exclusion

Exclusion


directory indexing feature (turn it off)

can exclude individual files or directories

can use it to try to control user context

newsgroups (X-No-Archive: Yes)


Exclusion -- there are some times when you might not want to be
indexed.

Say you just created some pages, and you want to get comments
from a few people, but you aren't ready to tell the whole world
that the pages are live. You just want to tell a few people to
come in and test them before you make your public announcement.

AltaVista and many of the other crawlers obey a Robot Exclusion
Standard. You just create a simple little text file and name it
"robots.txt". You can be very precise about what you want to
exclude. You can exclude a particular crawler or all crawlers
from your entire site or from particular directories or from
particular files. You can get the details about how to do it in
the help files at AltaVista.

You can't do this in an individual Web document. You have to do
it in the top level directory of the Web server. So if your
pages are hosted at an ISP, you'll need to ask the ISP's
Webmaster for help.
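
A short sample shows how such a file reads. The sketch below uses
Python's standard robots.txt parser to check which addresses a
well-behaved crawler would skip. The directory and file names, the
host, and the crawler name "ExampleCrawler" are all made up for
illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of one directory
# and one individual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /trivia/answers.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="ExampleCrawler"):
    """True if a crawler obeying the file above may fetch `url`."""
    return parser.can_fetch(agent, url)
```

With this file, pages under /drafts/ and the trivia answers are off
limits, while the trivia questions and everything else remain open
to crawlers. Keep in mind that the file is a request that polite
crawlers honor, not a lock.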

In any case, if you want to use robot exclusion, keep in mind
that Web server software often comes with a directory indexing
feature. If your software has that feature and that happens to
be on, then any crawler that comes to your site is going to grab
everything right out of the index. So even if you had set up for
robot exclusion, that wouldn't do you any good. So the first
thing you have to do is shut off the directory indexing feature.

I talked earlier about how you no longer have control over the
context of the visitor's experience. Excluding search crawlers
from particular files can give you a way to reassert a little
bit of control over the context. For instance, if I wanted to do
a trivia contest at my site, I could put robot exclusion on the
pages with the answers; so people wouldn't be able to find those
pages randomly -- they'd only find the pages with the questions.

If you want to do a one-two punch like that, robot exclusion
lets you.


Newsgroups -- one thing that people liked about newsgroups in
the past was that they were, in many cases, close-knit groups or
communities. A few thousand people might "lurk" or just read
some items without ever posting, and maybe a few hundred would
regularly participate to some extent, and maybe a dozen or so
would be really active. And when you posted something to a
newsgroup people understood what you said in the context of the
continuing dialogue, the on-going threads of discussion. You
might be sarcastic. You might make remarks that out of context
could be misinterpreted. But you felt comfortable because you
knew who the audience was. Now people are coming in by way of
AltaVista and other search engines, and all of a sudden your
remark can be seen totally out of context. So, if you feel that
there is any reason that you would be embarrassed if your boss or
spouse or anybody else saw this item out of context, then when
you post it all you have to do is include in the first line the
terms "X-No-Archive: Yes". Lots of newsreader software has that
as a choice to fill in. In that case, just add the "Yes" on that
line. Otherwise, type it in as a line.
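
For those who post by program rather than by hand, the same line
can be set as a message header. A minimal sketch using Python's
standard email library; the sender, newsgroup, and text are
invented, and actually sending the post is left out.

```python
from email.message import EmailMessage

def newsgroup_post(sender, newsgroup, subject, body, archive=True):
    """Build a post; with archive=False, ask archives to skip it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if not archive:
        msg["X-No-Archive"] = "Yes"
    msg.set_content(body)
    return msg

# Hypothetical sarcastic remark that should stay in its own context.
SAMPLE_POST = newsgroup_post(
    "user@example.com", "misc.test", "Re: frames",
    "Oh sure, frames are wonderful.", archive=False,
)
```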

----------
 flypaper

Flypaper


friends find me -- why? what are they really looking for?

Ebooks Multimedia

Spit and Iceland

Trudeau, Aronsohn, and Heather

Barb and Elcom (database vs. Web page)

Bill Ransom, Deane Rink, etc.


This is my last slide, aside from a list of references for
followup. It's also my favorite.

This is a principle called "flypaper," which I discovered at my
little Web site. It comes from the combination of search engines
and lots of personal Web pages.

I started getting two or three email messages a week from old
friends I hadn't heard from for twenty to thirty years. At
first, I thought, "Gosh. Amazing. They're looking for me." Then
I started thinking, "Why would they be looking for me?"

I hardly knew these guys. I wouldn't look for them.

I pushed back. I found out they weren't looking for me. They
were looking for themselves.

They went to AltaVista and searched for their own names; and
there's so much stuff at my site that anybody who had any
contact with me over the last thirty years or so is mentioned
there. They found themselves, and they wrote to me.

The more I talked to people, the more it became clear to me that
this is a matter of human nature mixing in strange ways with
technology.

Human nature is that people first look for themselves or for
their friends or for things that are near and dear to them. And
only after they have done that do they do research.

We designed this site for research, but it's used for these
other purposes, probably far more.

But you'd never learn that by checking the logs at AltaVista.
Because the logs will give you the top terms that people have
searched for. Well, Joe Blow from Minneapolis looking for his
own name won't appear very high on that list. But that's the
first thing he'd look for. And lots of people are doing that.

So it occurred to me, there's a business model there. If people
act that way, I can build a better mousetrap.

Instead of going out with flyswatters to try to find things,
I'll create flypaper to get them to come to me.

Say, for instance, you have a business proposal. You've been
trying to get through to someone who you know uses the Web. This
guy doesn't answer your email, doesn't return your phone calls.
You have a really good message for him.

Well, create a Web page. The first word in the HTML title and
the first line of text -- the guy's name. Then add in those same
places everything you can think of that is near and dear to him.
Then you add the message that you really want him to see.
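A minimal sketch of such a page, generated with Python. The function name build_flypaper_page and the sample interests are hypothetical, and the layout is only one way to put the name first in both the HTML title and the first line of text.

```python
from html import escape


def build_flypaper_page(name, near_and_dear, message):
    """Return HTML with the person's name first in the title and the text."""
    # Name first, then the things that are near and dear to that person.
    lead = escape(", ".join([name] + list(near_and_dear)))
    return (
        "<html>\n"
        "<head><title>" + lead + "</title></head>\n"
        "<body>\n"
        "<p>" + lead + "</p>\n"
        "<p>" + escape(message) + "</p>\n"
        "</body>\n"
        "</html>\n"
    )


page = build_flypaper_page(
    "Joe Blow",                             # placeholder name
    ["Minneapolis", "small-press books"],   # hypothetical interests
    "I have a proposal I think you will like. Please write to me.",
)
print(page)
```

The same words go in the title and in the first paragraph on purpose, since those are the two places named above.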

You don't need to have hyperlinks to this page from anywhere in the
world. You just go to AltaVista and ADD URL.

The next time that guy searches for himself, he finds you.

Believe me, it really changes the whole dialogue. He's coming to
you; you didn't go to him.


There are two kinds of flypaper. That's what I call "targeted
flypaper." Another kind of flypaper is "generalized flypaper."

I put anything and everything at my site. One of the items I
posted is a list of every book I've read for the last 39 years
-- since I was in junior high school. I'm obsessive. I've kept
such a list. It was in electronic form. It took me just a few
minutes to post it on the Web. So why not? It's my site. Who
cares? If somebody wants to come to it, fine.

It turns out that that is the most trafficked page at my site.
And I get email from authors, their editors, and their agents.
They are searching for themselves and then for the stuff that's
near and dear to them.

I get very good dialogues going with them and with readers who
particularly like those same books.

If you start thinking in that sort of vein, then the kinds of
material you put on your Web site will be very different from
what you are probably doing today.


I have a few examples here of bizarre things that have happened
to me.

Back in the early 1970s, I self-published a book of mine called
The Lizard of Oz. That was back when a lot of people were
playing around with self-publishing and small press publishing.
All of a sudden, thanks to photo-offset, it had become cheap to
print; and we made the mistake of thinking that printing was the
same as publishing. So we printed all these books and went
around to small-press bookfairs and met a lot of people and had
a lot of fun and sold very few books. You couldn't get
distribution, you couldn't get them into book stores.

So this book of mine had been gathering dust in the basement for
over 20 years. And along comes the Web and all of a sudden
distribution is free. So one of the first things I did when I
got my own free little Web space was to put up the full text of
just about everything I had ever written.

Shortly after I put up The Lizard of Oz, I got email from a
little company in San Francisco that does interactive CD ROMs
for kids. A brand new company. They couldn't afford the time or
the money to have acquisition editors out looking for material.
So they were using AltaVista to search the Web for stories that
might be useful for their product line. They found my book. They
loved it. And two weeks later, we had a contract. They still
haven't come out with the CD ROM. I hope they do it soon. But it
was a very good contact, and one that I would have never made. I
could have never found them, because they were new and wouldn't
have been listed anywhere. But they found me. And because they
found me, it was a very quick and different kind of dialogue. I
wasn't trying to sell the book to them. They were trying to
convince me to let them publish it.


The next example is even more bizarre.

I wrote a movie script a long time ago and nothing has ever
happened with it.

Now this didn't lead to a business deal. I'm not going to be
famous tomorrow.

This was an eye-opener though.

I got email from a producer in Iceland. We went back and forth,
and I sent him the script. He was interested, but we didn't
arrive at a deal.

But would you ever have imagined that there were producers in
Iceland? If you were trying to sell a script, would that have
been on your list of places to try?


The next example is totally off the wall. This was due to my
list of books.

I got email from someone who collects Garry Trudeau books (the
cartoonist who does Doonesbury). He saw that I had read
Trudeau's first book, which he had self-published back when he
was an undergraduate at Yale. I happened to have been at Yale at
the time. I picked it up at the Co-op. It had been gathering
dust ever since.

I thought, "Okay. I'll sell the book. I don't have any real need
for it. I haven't looked at it in nearly thirty years."

Well, the guy took a further look at my Web site. He saw that my
daughter is into acting. She goes to Sarah Lawrence. This year
she is spending her junior year in London, studying drama. And at the
time we were doing this correspondence, she was in Los Angeles
for the summer, staying with my sister, trying to get summer
jobs in acting. It turned out the guy who had done the query was
Lee Aronsohn, who at various times has been executive producer
and writer of a number of popular TV series, such as Cybill and
Grace Under Fire. He, out of the blue, suggested that in
exchange for the book, my daughter could get an audition for a
show.

She didn't get a part, but she got an audition with someone who
has won Emmys for casting. She got to meet the people. She got a
sense of how the business worked. This was an invaluable kind of
experience for her.

There is no way in the world that I would have ever come up with
that as a business model.


The general message from that is -- don't limit yourself to your
own imagination.

You have ideas sitting in your drawers. You have works in
progress. You have things that are almost there. Don't you edit
yourself out of the possibility.

If you have something that might work, that might fly, that
somebody might be interested in, put it up and get it well indexed
and see what kind of response you get.

You might be afraid that someone will "steal" your idea.

Remember that there is such a thing as copyright law. When I put
a book up, like The Lizard of Oz, I include a copyright notice.
And I say, "Permission is hereby granted to disseminate this in
electronic form for non-profit purposes or for your own purposes.
If you wish to do something commercial with it or to print it,
then contact me. Here's my email address."

Now what protection does that give me? Well, if somebody is
going to do a commercial edition of that thing, and they are
going to make a lot of money out of it, I'm going to hear about
it, and I'll sue them. And that would be fine.

If somebody wants to plagiarize... If I were paranoid I could
use AltaVista periodically to search the Web for selected
paragraphs of my works. I don't do that all the time, but I
could.
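One way to prepare such a check is sketched below in Python. It picks the longest sentences from a piece of text and wraps each in double quotes so it can be pasted into the search box as an exact phrase. The function name phrase_queries, the word limits, and the choice to drop punctuation are all assumptions made for this sketch.

```python
import re


def phrase_queries(text, count=3, min_words=6, max_words=12):
    """Turn the longest sentences of a text into quoted phrase queries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        # Keep only the words; punctuation is dropped from the phrase.
        words = re.findall(r"[A-Za-z0-9']+", sentence)
        if len(words) >= min_words:
            candidates.append(words[:max_words])
    # Longer runs of words are less likely to match by coincidence.
    candidates.sort(key=len, reverse=True)
    return ['"' + " ".join(words) + '"' for words in candidates[:count]]


# Hypothetical sample text standing in for a paragraph of your own work.
sample = (
    "It was a dark night. The lantern swung slowly over the empty "
    "harbor while nobody watched. Then the wind changed direction "
    "and carried the sound of bells inland."
)
for query in phrase_queries(sample):
    print(query)
```

Each printed line is one phrase to search for. A match on a page you do not recognize is a reason to take a closer look, not proof of copying.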

The more frequent question I get from an audience like this is
the fear that somebody will take your idea. I'm talking about
writing fiction for kids and copyrighting that. But if you have
an idea for a new business, what's going to happen if somebody
steals that idea?

Take my advice with a grain of salt. But I believe that if the
world is ready for an idea, more than one person is likely to
have it. And your best protection is to let the world know that
you have this idea so that you then become part of the dialogue
as it goes to its next stages. And you begin the dialogue and
take it to the next step. You get your writings about this
subject well indexed. You get in touch with other people with
similar interests, and maybe you build a company out of the
people you pull together that way.

But, if you take that idea, as we always have in the past, and
keep it in the bottom drawer, afraid that somebody is going to
steal the idea; then two years later, you are going to read in
the Wall St. Journal about a $10 million company that's doing
that same thing that you had the idea for.

If you are talking about software code and the kind of invention
that typically involves patent lawyers, that's another matter
and you should get legal advice. But if you have a business
idea, I'd say, tell the world about it; don't hide it. Tell them
as loudly as possible, so everybody knows that you told them
about it. And then you can become part of the on-going dialogue.

The underlying principle of this advice is the realization that
the Web is an awful place to put totally polished, finished
text. One after another, on-line magazines prove too expensive
to sustain themselves, and they fold. But the Web is a
great place to put ideas and works in progress.

When you put finished text on the Web, that's like saying, "I
have just returned from the mountain. I have seen it. Don't
bother to write me. This is the final answer."

What you want to say instead is, "I think this is headed in the right
direction. I might be 80% there, or maybe 50%. But I feel good
about this. I want to talk to people out there and take this to
the next step. Please send me your reactions."

Then when people send their reactions, treat them with respect.
And with their permission, if they've written something cogent,
whether they agree with you or disagree with you, add it to your
document as a letter to the editor and then go back and index
again at AltaVista and let that document grow.

I mentioned my Halloween article. That's what I did with that.
It was just a little article -- a couple of pages long. Now it
rambles on and on and on, because for three years I've had
people sending me messages saying they agree or disagree
strongly, and they often write very well about it. I've added
those comments as letters to the editor, and some of it is very
informative and some is well-argued opinion. But your one little
static document suddenly becomes the start of a social
experience, without needing any fancy software to do it.

Take those ideas that you have been editing out, that you've been
chopping off at the legs, and give them a chance. Let the world
find them. Let the world do something with them.

To me that's what's exciting -- when you put the pieces
together: your own creativity and the power of a search engine
like AltaVista.

----------
 reference


Followup and Future Reference


Richard Seltzer -- [log in to unmask], [log in to unmask]

http://altavista.digital.com

http://www.digital.com/info/internet

http://www.samizdat.com/

Business on the World Wide Web, live chat sessions, Thursdays,
noon to 1 PM Eastern Daylight Time (GMT -5),
http://www.web-net.org

AltaVista newsletter (Cobb Group) http://www.cobb.com/alt


If you haven't done so already, you might want to go through the
companion tutorial "I want to find" -- a speech on how to use
AltaVista to find what you want on the Web and in newsgroups.

Remember, please feel free to send me your questions at
[log in to unmask]. I'll do my best to get back to you
promptly. And in the near future, we plan to post the most
interesting and useful questions and answers at this Web site.
If demand warrants, we may also open a chat room, with regular
hours for you to ask your questions there.

----------
End of Document





