Re: Cohen's paper on byte order
Date: 04/30/03

Date: Wed, 30 Apr 2003 03:37:06 GMT

Douglas A. Gwyn wrote:
: wrote:

: > It happens that serial data, sent through an RS232 port, is sent LSB
: > first, to allow interoperability of 7-bit and 8-bit ASCII. (It would break
: > down if 9-bit ASCII were used, though, because of spurious start bits.)

: I think you're using "ASCII" in a notional sense. ASCII itself is
: inherently a 7-bit code; a transmitted RS-232-C frame includes one
: start bit (SPACE), 7 data bits, an optional parity bit, and at
: least one stop bit (MARK), leading to a confusion of UART data
: with ASCII codes.

There is now an 8-bit international standard which uses the American
version of ITA 5 in the first half.

You are correct that the standards ASCII-65 and ASCII-68 are 7-bit codes.
However, in normal conversation, any national-use variant of ITA 5, any
binary character code whose first 128 characters correspond to those of
ASCII, can be loosely referred to as ASCII.

I'll admit that there are few people who would go as far as to refer to
UNICODE as "16-bit ASCII", however.

: > And because of those situations, the "cultural" reasons to favor
: > big-endianness are sufficient. (It's also easier *pedagogically*
: > to teach people how to understand how machine language works
: > using a big-endian computer as an example.

: I don't think either of those has been convincingly demonstrated.
: In fact, the intricacies of subdividing words into smaller
: uniform units is tamed very nicely by consistent use of *little*
: endian conventions.

: In fact a great number of popular processor architectures,
: including the ubiquitous Intel x86 family, have adopted
: little-endian conventions.

: In my opinion, a natural cultural bias toward big-endianness is
: a good reason to teach the *opposite* convention, to clarify
: the concepts involved. Maybe that would have cut down on MKS's
: confusion in this thread.

Now, I can recognize the legitimacy of this reasoning. It does make sense
to clarify concepts by illustrating systems that are different from what
one is used to. Thus, in the "new math", students were introduced to
arithmetic in different base systems.

It all depends on one's goals. Yes, in teaching the *theory of
computation*, an understanding of underlying principles is needed. But in
the early stages, when the goal is understanding the basics - what a
program is, and how to write one - building computers that are
little-endian creates an unnecessary layer of difficulty.

Computers should be as easy to understand as is possible _without_
compromising performance. There was a time, with some short word-length
computers, when multi-word arithmetic was slightly simplified by using
the little-endian convention. On the other hand, little-endian storage
means that byte-wise string comparisons can't be applied to unsigned
integers.
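
To make the trade-off above concrete, here is a minimal sketch in Python. It assumes 32-bit unsigned values and 16-bit words, and the helper names (be_bytes, le_bytes, add_le_words) are invented for illustration. It shows that a byte-string comparison orders big-endian integers correctly but not little-endian ones, and that multi-word addition on little-endian data can walk the words in storage order, carrying as it goes.

```python
import struct

def be_bytes(n):
    """Pack an unsigned 32-bit integer most significant byte first."""
    return struct.pack(">I", n)

def le_bytes(n):
    """Pack an unsigned 32-bit integer least significant byte first."""
    return struct.pack("<I", n)

a, b = 256, 255  # a > b numerically

# Python compares bytes objects lexicographically, much as memcmp does in C.
big_endian_agrees = (be_bytes(a) > be_bytes(b)) == (a > b)
little_endian_agrees = (le_bytes(a) > le_bytes(b)) == (a > b)

def add_le_words(x, y, bits=16):
    """Add two equal-length lists of words stored least significant first.

    The carry moves in the same direction as the storage order, which is
    the simplification little-endian gave short word-length machines.
    A carry out of the last word is discarded.
    """
    mask = (1 << bits) - 1
    out, carry = [], 0
    for xw, yw in zip(x, y):
        total = xw + yw + carry
        out.append(total & mask)
        carry = total >> bits
    return out

# 0x0001FFFF + 0x00000001, each stored as [low word, high word]
total_words = add_le_words([0xFFFF, 0x0001], [0x0001, 0x0000])
```

Here big_endian_agrees comes out true and little_endian_agrees comes out false, because the first byte of the little-endian form of 255 is 0xFF while that of 256 is 0x00.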

: > For that matter, using decimal instead of binary
: > would _also_ be an advantage, but that _would_ involve too much
: > inefficiency to tolerate.)

: In fact, several successful early computers used decimal
: representation for integers (usually but not always as BCD,
: i.e. four actual bits encoding a base-10 digit). It wasn't
: much of an advantage and is now relegated to the dustbin of
: history.

I'm well aware that early computers used BCD, or excess-3, or even did
arithmetic on character strings (BCDIC, in the IBM 1401). This made sense
when values were being input and output after performing only a small
number of calculations, and they needed to be stored in a human-readable
form. It's still true, because some operating systems use special
characters to delimit records in files, that it is sometimes difficult to
store data even on hard drives in binary. This is another convention I do
not approve of.
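
For readers who have not met the decimal encodings mentioned above, here is a rough sketch in Python of packed BCD and excess-3. The function names are made up for this example, and it does not model any particular machine's format. It shows why such data stayed readable in a dump: each 4-bit group is one decimal digit.

```python
def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD, two digits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )

def from_packed_bcd(data):
    """Decode packed BCD back to an integer, rejecting non-decimal nibbles."""
    value = 0
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError("not a valid BCD digit")
        value = value * 100 + hi * 10 + lo
    return value

def to_excess3(n):
    """Encode each decimal digit as a 4-bit code equal to the digit plus 3."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    return [int(d) + 3 for d in str(n)]

packed = to_packed_bcd(1401)
dump = packed.hex()          # the hex dump reads as the decimal number
codes = to_excess3(190)
# In excess-3, inverting the four bits of a code gives the nines' complement.
complemented = [15 - c for c in codes]
```

With these definitions, dump is the string "1401", and complemented equals the excess-3 encoding of 809, the nines' complement of 190.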

Except for using a character code where the letters of the alphabet were
noncontiguous, the IBM 360 did it right. The internal data formats all
were easily understandable, data files could contain binary and character
fields without any interference in determining the end of a record, and
even instruction opcodes could be decoded by hand from a hex dump.

Try that with an Itanium! It has instruction formats where the bits of an
address field are scattered through one or more 41-bit instruction words.
Of course, only the compiler has to get its hands dirty with the machine
code, and designing the format to fit everything in and allow the maximum
optimization does make sense. But deliberate obfuscation can be avoided.

Intel happened, of course, to have the 8088 available, while Motorola
didn't see the point of a 68008 until later. It is a happenstance that
gave Intel a giant market for its chips, giving it the money to finance
R&D expenditures no one else can match. Sadly, neither the Amiga nor the
Atari ST took off explosively; but then, they weren't open systems, like
the IBM PC and the Apple II, so that may not have meant all that much,
even if they had.

John Savard
