Recent Furore over Presenting ASCII in Decimal Form as Standard.



From Wikipedia, the free encyclopedia


The Universal Character Set (UCS) is defined by the ISO/IEC 10646
International Standard as a character set on which many encodings are
based. It contains nearly a hundred thousand abstract characters, each
identified by an unambiguous name and an integer number called its
code point.
Characters (letters, numbers, symbols, ideograms, logograms, etc.)
from the many languages, scripts, and traditions of the world are
represented in the UCS with unique code points. The inclusiveness of
the UCS is continually improving as characters from previously
unrepresented writing systems are added.
Since 1991, the Unicode Consortium has worked with ISO to develop The
Unicode Standard ("Unicode") and ISO/IEC 10646 in tandem. The
repertoire, character names, and code points of Version 2.0 of Unicode
exactly match those of ISO/IEC 10646-1:1993 with its first seven
published amendments. After the publication of Unicode 3.0 in February
2000, corresponding new and updated characters entered the UCS via
ISO/IEC 10646-1:2000.


-----------------------------------------------------

*Unquote.* From here onwards the text is my own (Adacrypt).

This is the rock-hard standard, and anybody who has taken the trouble
will note that ASCII is now absorbed into Unicode and that Unicode
itself is kept in step with the larger ISO/IEC 10646. ASCII sits right
at the beginning of the huge library of language codes in Unicode. The
reader will no doubt notice that the code points are written in
hexadecimal, but for the printable ASCII characters that I use (the
Basic Latin block) this makes little practical difference: the set is
small, just 95 characters at code points 32 to 126 in decimal (20 to
7E in hexadecimal), and the same integer values can be written in
either base. It is therefore still quite in order, in my view, to use
the decimal representation as the way of expressing ASCII numerical
values until Unicode has become a 'fait accompli' in worldwide
commerce, which is not the case at present. Strictly speaking I should
be using the hexadecimal representation of ASCII, and certainly not,
as has been suggested, a binary representation.
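Decimal and hexadecimal are simply two spellings of the same integer code point. A minimal sketch, assuming the printable set is code points 32 to 126 (the names PRINTABLE and notations are illustrative only), lists that range and shows one character in decimal, hexadecimal and Unicode's U+ notation. Note that the size of the set, 95, is itself written 5F in hexadecimal.

```python
# Printable ASCII (assumed here): code points 32 (space) through 126 ('~').
PRINTABLE = [chr(cp) for cp in range(32, 127)]


def notations(ch: str) -> str:
    """Describe one character's code point in decimal, hex and U+ notation."""
    cp = ord(ch)
    return f"{ch!r}: decimal {cp}, hex 0x{cp:02X}, Unicode U+{cp:04X}"


print(f"count: {len(PRINTABLE)} (0x{len(PRINTABLE):X} in hex)")
for ch in (" ", "A", "~"):
    print(notations(ch))
```

Running it prints `count: 95 (0x5F in hex)` followed by lines such as `'A': decimal 65, hex 0x41, Unicode U+0041`.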

The UCS definition specifically states (last lines of paragraph 1
above) that each character is identified by an unambiguous name and an
integer number, its code point. Note well: that is an integer, not a
particular binary bit pattern, whose meaning is ambiguous because it
depends on the representation chosen (one's or two's complement, guard
bits, etc.).
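The distinction between an integer and a bit pattern can be made concrete. A code point is one integer however it is written, whereas a raw 8-bit pattern only becomes a number once a representation is chosen. The sketch below assumes 8-bit patterns and compares only unsigned and two's-complement readings; read_byte is a hypothetical helper name.

```python
def read_byte(bits: str) -> tuple[int, int]:
    """Read an 8-bit pattern as (unsigned value, two's-complement signed value)."""
    raw = int(bits, 2).to_bytes(1, "big")  # raises OverflowError beyond 8 bits
    unsigned = int.from_bytes(raw, "big", signed=False)
    signed = int.from_bytes(raw, "big", signed=True)
    return unsigned, signed


# The code point of 'A' is one integer, whichever base is used to write it.
assert ord("A") == 65 == 0x41 == 0b1000001

print(read_byte("01000001"))  # (65, 65): top bit clear, both readings agree
print(read_byte("11101001"))  # (233, -23): top bit set, the readings differ
```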

Obviously and sensibly, the Unicode Consortium specifies binary
'encoding forms' (UTF-8, UTF-16 and UTF-32) for managing and
presenting ASCII and the other national codes (we should no longer be
talking about these separately but should be saying Unicode) in binary,
for the convenience of computing. But ISO 10646, the standard proper,
still remains just that, without prejudice: an ISO standard that
defines its code points as integers rather than as binary patterns.
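The difference between a code point and its encoding forms can be seen with Python's built-in codecs. In this sketch (encoding_forms is a hypothetical helper name), the big-endian UTF-16 and UTF-32 variants are assumed so that no byte order mark is emitted. The same integer code point yields different byte sequences in each form.

```python
def encoding_forms(ch: str) -> dict[str, str]:
    """Show one code point and its bytes in three Unicode encoding forms."""
    cp = ord(ch)
    return {
        "code point": f"U+{cp:04X} ({cp} decimal)",
        # Big-endian codecs are used so that no byte order mark is emitted.
        "UTF-8": ch.encode("utf-8").hex(" "),
        "UTF-16BE": ch.encode("utf-16-be").hex(" "),
        "UTF-32BE": ch.encode("utf-32-be").hex(" "),
    }


for ch in ("A", "é"):
    print(encoding_forms(ch))
```

For 'A' the three forms are `41`, `00 41` and `00 00 00 41`. For 'é' (U+00E9, 233 decimal) UTF-8 needs two bytes, `c3 a9`.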

Suggesting binary numbers as the standard way of presenting ASCII
(now a part of the UCS) is not acceptable under ISO 10646. Binary
representation of ASCII is an 'encoding form' designated by the
Unicode Consortium; it is simply a detail of implementation and
nothing more.

The irrefutable fact also remains that ASCII character values are best
known in object-oriented programming languages as enumeration types,
which have integer positions and integer values in the code, so it is
sensible to think and talk in decimal representation and not binary
representation.
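In Ada, for example, the predefined type Character is an enumeration type, and the attributes Character'Pos and Character'Val convert between a character and its integer position. Python has no character enumeration type, but the built-ins ord and chr play the same role. The following is a minimal sketch; to_positions and from_positions are hypothetical helper names.

```python
def to_positions(text: str) -> list[int]:
    """Integer position (code point) of each character, like Ada's 'Pos."""
    return [ord(ch) for ch in text]


def from_positions(values: list[int]) -> str:
    """Rebuild the text from integer positions, like Ada's 'Val."""
    return "".join(chr(v) for v in values)


positions = to_positions("Ada 95")
print(positions)                  # [65, 100, 97, 32, 57, 53]
print(from_positions(positions))  # Ada 95
```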

In any case, these positions are integers, as ISO 10646 requires, and
writing them as decimal integers rather than as binary strings respects
that requirement.

Binary representation is really only suitable for academic tutorials.
It was an unfortunate day for cryptography when byte values became the
way of presenting alphanumeric data; the better way is to present
alphanumeric data as itself, with no binary intermediate form except
perhaps the machine-level encoding used in email transmission. That is
what my cipher does now.
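To illustrate working on characters as themselves rather than on bytes, here is a toy Vigenère-style shift over the 95 printable ASCII characters. It combines integer positions modulo 95, so every ciphertext character is again a printable keyboard character. This is only a hypothetical sketch of the general idea: it is not the cipher referred to above, and it is not secure, since a repeating-key shift is easily broken.

```python
FIRST = 32  # code point of the first printable ASCII character (space)
SIZE = 95   # printable ASCII runs from 32 to 126 inclusive


def _index(ch: str) -> int:
    """Position of a printable ASCII character within the 95-character set."""
    i = ord(ch) - FIRST
    if not 0 <= i < SIZE:
        raise ValueError(f"{ch!r} is not printable ASCII")
    return i


def _shift(text: str, key: str, direction: int) -> str:
    """Add (direction=+1) or subtract (direction=-1) key positions mod 95."""
    if not key:
        raise ValueError("key must not be empty")
    out = []
    for n, ch in enumerate(text):
        k = _index(key[n % len(key)])
        # Modular arithmetic keeps every result inside the printable set.
        out.append(chr(FIRST + (_index(ch) + direction * k) % SIZE))
    return "".join(out)


def encipher(plaintext: str, key: str) -> str:
    return _shift(plaintext, key, +1)


def decipher(ciphertext: str, key: str) -> str:
    return _shift(ciphertext, key, -1)


message = "Attack at dawn!"
secret = encipher(message, "K3y~")
print(secret)
print(decipher(secret, "K3y~"))  # Attack at dawn!
```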

Binary representation of alphanumeric characters has sent many good
people down the wrong path in cryptography!

The defence rests.

QED - Adacrypt