Re: Bypassing of web filters by using ASCII



On 21 Jun 2006 at 13:11, k.huwig@xxxxxxxxx wrote:


1. problem description

The character set ASCII encodes every character with 7 bits. Internet
connections transmit octets with 8 bits. If the content of such a
transmission is encoded in ASCII, the most significant bit must be ignored.


Not quite. The most significant bit must be set to zero when encoding (from RFC 20: "For
concreteness, we suggest the use of standard 7-bit ASCII embedded in an 8 bit byte whose
high order bit is always 0"). So a byte whose high bit is set is simply illegal in
US-ASCII, which leads to the following point:

When a message carries a description (in our case the charset specification, i.e.
charset=US-ASCII) that is inconsistent with the message data (in our case bytes with the
high bit set, which fall outside the specified charset), what is a message reader to do?
Security-wise, the best option would be to reject the message. Of course, this leads to a
less than ideal user experience. So the obvious solution is to virtually modify one of the
elements (either the message description or the message data) so that consistency is
established.
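
To make the "reject" option concrete, here is a minimal Python sketch of a strict reader.
The function name is made up for illustration, and this is not how any of the browsers
mentioned actually behaves; it only shows the check "high bit set means not US-ASCII".

```python
def decode_strict_us_ascii(data: bytes) -> str:
    """Return the text if every byte is 7-bit clean, otherwise raise ValueError.

    Hypothetical helper: models a reader that refuses data which
    contradicts a declared charset=US-ASCII.
    """
    for offset, byte in enumerate(data):
        if byte & 0x80:  # high-order bit set: not legal in US-ASCII
            raise ValueError(
                f"byte 0x{byte:02X} at offset {offset} is outside US-ASCII"
            )
    return data.decode("ascii")
```

A reader built this way never has to guess which of the two elements to "fix"; the price
is that a mislabeled but harmless page is not displayed at all.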

Now, IE changes the data, i.e. sets each msb to zero, and thus establishes consistency:
the data becomes a valid US-ASCII byte stream. Firefox and Opera, I assume, take the other
path and modify the message description to read "ISO-8859-1". That also establishes
consistency, as the byte stream is now valid ISO-8859-1 data.
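
The two repair strategies can be sketched in a few lines of Python. This is only an
illustration under my own assumptions, not the browsers' actual code: the function names
are hypothetical, and the Firefox/Opera behaviour is, as said, my guess.

```python
def repair_data(data: bytes) -> str:
    """Change the data: clear the msb of every byte, then read it as US-ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def repair_description(data: bytes) -> str:
    """Change the description: leave the bytes alone and read them as ISO-8859-1."""
    return data.decode("iso-8859-1")


# Example input: "<script>" with the high bit set on every byte,
# so no byte in the payload equals an ASCII "<" or ">".
payload = bytes(b | 0x80 for b in b"<script>")

print(repair_data(payload))         # the markup reappears
print(repair_description(payload))  # accented letters and symbols, no "<"
```

Both functions return a result that is consistent with *some* charset label; they just
disagree on which of the two elements was wrong. A filter that reads the bytes one way
while the browser reads them the other way is where the bypass comes from.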

Of the tested browsers Firefox 1.5, Opera 8.5 and InternetExplorer 6,
only the InternetExplorer does this correctly, the others evaluate the
bit and display the characters as if they were from the character set
ISO-8859-1.

So what I don't understand is why IE's "solution" is any better than that of Opera/Firefox.

Why is modifying the data (msb) any better than modifying the data-description (charset)?

Please note: the attack you described is interesting and elegant. I just have reservations
about the statement that IE's approach is correct (vs. the other browsers). I was involved
in research around similar situations in which the strict RFC was violated and different
products interpreted the data differently. In such cases, I think we should be cautious
about declaring which product is "correct" (except that, naturally, it's more correct
security-wise to reject the message altogether).

Food for thought,
-Amit


