LISTSERV - VICUG-L Archives - LISTSERV.ICORS.ORG

VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Subject:

Re: Question About Message Format

From:

Kelly Pierce <[log in to unmask]>

Reply To:

Kelly Pierce <[log in to unmask]>

Date:

Thu, 12 Jun 2003 19:30:23 -0500

Content-Type:

text/plain

From: "Jacob Joehl" <[log in to unmask]>
Sent: Wednesday, May 28, 2003 11:24 AM


> Hi all.  I am wondering what the =20 symbols in several messages on
> this list are.  Thank you.

Dear Jacob,

This question is asked occasionally on the list.  Here's a highly
informed response offered to this question in 1998.

Kelly



From [log in to unmask] Jun 30 06:47:40 1998
Date: Tue, 30 Jun 1998 06:46:11 -0400
From: Saul J Rosenberg <[log in to unmask]>
Reply to: Saul J Rosenberg <[log in to unmask]>
To: [log in to unmask]
Subject: =20 and international text

It's an interesting question.

The equals codes support an early try at the internationalization of
global email systems.  That they show up in your email reader is a
result of uneven software support.

The =20 at the end of some paragraphs represents a space character
(20 is the hexadecimal Ascii code for a space) and can be ignored.
It is an artifact of whatever system is used to generate the text.
Other "=##" codes, where ## is a two-digit hexadecimal number, often
appear for characters that are not in "7 bit Ascii" such as true
quote marks, or letters from European languages that are not in
English, or letters combined with diacritical marks (such as an e
with a grave accent).
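
For the technically inclined, here is a minimal sketch of how a mail
reader could turn these codes back into text, using the quopri module
from Python's standard library.  The sample line and the choice of
Latin-1 as the character set are assumptions made for illustration;
a real message names its character set in its headers.

```python
import quopri

# Hypothetical sample: a line as it might appear undecoded in a mail reader.
raw = b"caf=E9 au lait=20"

# Each "=XX" becomes the single byte whose hexadecimal value is XX.
decoded_bytes = quopri.decodestring(raw)

# Assumption: the sender used ISO-8859-1 (Latin-1).
text = decoded_bytes.decode("latin-1")

print(repr(text))
```

Here =E9 turns into an e with an acute accent and =20 turns into an
ordinary trailing space.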

Another common use is a single equals sign at the end of lines,
not followed by any digits.  This represents an optional inserted line
wrap, when the original author typed a long paragraph as a single
continuous line, and the email system inserted temporary line
breaks every perhaps 60-70 characters as a kindness so that mere
mortals could read the text.
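
The same standard-library module removes these optional line wraps.
This is only a sketch with a made-up sample message; it shows that an
equals sign directly before a line break makes the two pieces join
back into one line.

```python
import quopri

# Hypothetical sample: one long typed line, wrapped by the mail system.
raw = (
    b"This was typed as one long line, but the mail system wrapped =\n"
    b"it for transport.\n"
)

# "=" followed by a line break is an optional wrap and is dropped on decoding.
decoded = quopri.decodestring(raw).decode("ascii")

print(decoded)
```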

---------------------

What follows is a deeper explanation for the technically inclined ...

The equals codes pass special characters transparently thru the global
email system, without them being mangled.  Without these codes, a
person at a Spanish computer may enter a "c-cedilla", but it might
show up on a system from a different country as a different character,
or be deleted as an invalid code, or worse, happen to match an internal
software flag byte and totally screw up the rest of the message.
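
A short Python sketch of the mangling problem follows.  The byte value
E7 and the three character sets are picked only as examples: the same
8 bit byte comes out as a different character depending on which
character set the receiving system assumes.

```python
# One raw 8 bit byte, as it might travel through the mail system unencoded.
raw = b"\xe7"

as_latin1 = raw.decode("latin-1")      # Western European character set
as_cp437 = raw.decode("cp437")         # original IBM PC / DOS character set
as_cyrillic = raw.decode("iso8859_5")  # Cyrillic character set

for label, ch in [("latin-1", as_latin1), ("cp437", as_cp437),
                  ("iso8859_5", as_cyrillic)]:
    print(label, repr(ch), f"U+{ord(ch):04X}")

# A strict 7 bit system has no meaning for this byte at all.
try:
    raw.decode("ascii")
    rejected_by_ascii = False
except UnicodeDecodeError:
    rejected_by_ascii = True
print("rejected by 7 bit Ascii:", rejected_by_ascii)
```

Under Latin-1 the byte is a c-cedilla, under the DOS code page it is a
Greek tau, and under the Cyrillic set it is a Cyrillic letter.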

Since the equals codes represent these foreign characters using
vanilla 7 bit Ascii (an equals sign plus two hexadecimal digits),
they pass as-is thru all email systems.  In theory, your email
reader, knowing the original
language's character set, should reassemble them on your end to
the desired character.  That many email readers don't do this
speaks to the many varieties of email systems out there, and the
many different character set representations for the many languages
and dialects.
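
The sending side can be sketched the same way.  The word and the
Latin-1 character set below are assumptions for illustration; the
point is that every byte of the encoded form fits in 7 bits, and that
a reader who knows the character set gets the original text back.

```python
import quopri

# Hypothetical sample word containing a c-cedilla.
original = "gar\u00e7on"

# Assumption: the sender's system uses Latin-1, where c-cedilla is one 8 bit byte.
eight_bit = original.encode("latin-1")

# The equals codes replace that byte with plain 7 bit Ascii characters.
encoded = quopri.encodestring(eight_bit)
print(encoded)

# Every byte now has a value below 128, so it survives 7 bit mail systems.
all_seven_bit = all(b < 128 for b in encoded)
print("7 bit clean:", all_seven_bit)

# The receiving reader reverses the steps, given the same character set.
round_trip = quopri.decodestring(encoded).decode("latin-1")
print(round_trip == original)
```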

The base problem is technical -- most current systems use one byte
(8 bits) to hold one character.  8 bits can represent 256 symbols.
But there are more than 256 characters and variations necessary,
so there is a perpetual squabble because no one standard can fit all.
And even if they managed to squeeze in the major European languages,
there is still the rest of the world (remember them ?)

There is a new technical solution that is taking hold.
The latest character representation, called Unicode, uses two bytes
(16 bits) per character, which can hold 65536 symbols.
That is ample to cover not only Europe, but most of the world,
even including Chinese / Japanese / Korean and other oriental
languages which have thousands of symbols.
(To some people's regret, they turned down Klingon :-)

Using 16 bits rather than 8 doubles the (uncompressed) storage
requirements.  However, computer systems and disks are growing more
powerful and larger very quickly, so more systems are expected to
support Unicode over time, for true international text support.
(a motivation to give up on DOS)
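
The arithmetic can be checked with a few lines of Python.  The
encoding names are assumptions that do not appear in the message
above: "latin-1" stands in for a one byte per character system, and
"utf-16-le" for the two byte form described here.  Later versions of
Unicode go beyond 65536 characters, and the variable-length "utf-8"
form is shown only as a comparison.

```python
# Number of distinct values that fit in 8 and in 16 bits.
symbols_8_bit = 2 ** 8
symbols_16_bit = 2 ** 16
print(symbols_8_bit, symbols_16_bit)

text = "Simple questions often have interesting answers."

one_byte_form = text.encode("latin-1")    # one byte per character
two_byte_form = text.encode("utf-16-le")  # two bytes per character for this text
utf8_form = text.encode("utf-8")          # one byte per character for plain English

print(len(text), len(one_byte_form), len(two_byte_form), len(utf8_form))
```

For plain English text the two byte form is exactly twice the size,
while the utf-8 form stays the same size as the one byte form.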

Unicode will make things both easier and harder for screen readers
handling international text.

Easier because there is one unique character for every symbol.
The screen reader does not have to guess based on which language /
dialect it thinks is the default for that message.

Harder, because there are 30,000 plus symbols for the full Unicode
support.  (English centric non-Politically Correct remark -- most
of these are not used for English / European letters, so our
transition will be easy.)
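
As a sketch of the "easier" half, Python's standard unicodedata module
holds one official name for each Unicode character, so a program does
not have to guess.  The three sample characters are chosen only for
illustration, and nothing here claims that any particular screen
reader works this way.

```python
import unicodedata

# Sample characters: c-cedilla, e with grave accent, a "true" opening quote mark.
samples = ["\u00e7", "\u00e8", "\u201c"]

# Look up the single official name attached to each character.
names = {ch: unicodedata.name(ch) for ch in samples}

for ch, name in names.items():
    print(f"U+{ord(ch):04X}", name)
```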

Simple questions often have interesting answers.  Hope this helped.

Regards
Saul

-------------------

> From: Ted Martin <[log in to unmask]>
> Date:         Mon, 29 Jun 1998 16:55:25 +0100
> Subject:      =20

> Could somebody please explain to a novice, and a newcomer to this
> list, why =20 appears at the end of every paragraph in articles
> taken from journals such as the New York Times?

> Just curious.
> --
> Ted Martin

Saul J Rosenberg                       [log in to unmask]
Object Developers Group, Inc (ODG)    http://www.objdev.org


VICUG-L is the Visually Impaired Computer User Group List.
To join or leave the list, send a message to
[log in to unmask]  In the body of the message, simply type
"subscribe vicug-l" or "unsubscribe vicug-l" without the quotations.
 VICUG-L is archived on the World Wide Web at
http://maelstrom.stjohns.edu/archives/vicug-l.html
