Re: Compressing hash strings into two directions - hashing idea
From: Juuso Hukkanen (juuso929_at_tele3d.net)
Date: 03/20/05
- Next message: CBFalconer: "Re: Thou shalt have no other gods before the ANSI C standard"
- Previous message: Gib Bogle: "Re: Ellipse versus hyperbola, surrogate factoring"
- In reply to: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Next in thread: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Reply: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Sun, 20 Mar 2005 21:34:54 +0200
On 18 Mar 2005 11:52:37 -0800, "Matt Mahoney" <matmahoney@yahoo.com>
wrote:
On 18 Mar 2005 11:52:37 -0800, "Matt Mahoney" <matmahoney@yahoo.com>
wrote:
<snip>
(See. End/conclutions; This is my last LONG post about this idea. If
later asked, I will try comment shortly. But I simply do not know more
/ intend to use this self. But who knows this may provide ideas to
someone)
>First, a secure hash should have a fixed size that does not depend on
>the input. How do you achieve that?
That is so correct and I have no good answer to that. I posted this
idea "double compression + joining for others to utilize. Perhaps
there is some hash designer somewhere with an exellent checksum
builder, but with an unsolved unstrirring problem. I assumed from very
beginning that if this method is good for something, it should be
prevention of unstirrability.
Multilayered defences are generally concidered best. If this
unstirrability would be shown+made strong. Perhaps this algorithm
could be added to an (almost) ready cryptographic hash function, which
had been designed to give message digests of certain length. After all
someone may come here next year with an algorithm idea "exellent
checksums, but backwards traceable what to do?" Having the other
"cryptographic checksum' making algoritm below would also help in
guaranteeing that a short message which comes to this "final" algoritm
part - will be already an evenly distributed hash.
**** Here ends the part which discusses the unstirring property ****
>> The only thing I claim is that; "if a string is compressed twice into
>> two directions <using two different compressors (TX Matt)> and then
>> the strings are joined byte by byte, that result-string will not
>> unstir/reveal back to original".
****_______________________________________________****
>From now on I am on an alien territory, a terrain which requires other
twisting than just compression. Basically I should not enter soft
alien ground, because the answers (which I am not qualified to give)
may/will cast a shadow over the original "unstirrability" claim. Also
remember that any inaccuracy/stupidity of answers may be caused by the
fact that I know just about nothing about anything relevant.
So, I should just say that the pre-hasher part (pre-hasher),must
quarantee that each of closely related strings do not produce a
pre-message-digests, which are closely related to each others.
>Second, if the hash size is n bits, then it must meet the following
>requirements:
>1. You cannot find two different inputs x and y that produce the same
>hash (H(x) = H(y)) with less work than the time to compute 2^(n/2)
>hashes (collision resistance).
I think there are quite good changes to that, but the use of
pre-hasher is required. As later shown the possibly made slowness of
this suggested hasher, might increase the systems overall resistance
against flaws (type MD5 / SHA-1) in pre-hasher.
A lossless compression has a good ability of having exactly one
uncompressed form per one compressed form. Thus each right->left
compression can be quaranteed to be (100%) sure to not have a single
collision. The same is true for left-right compression. Lets
Illustrate isolated cases:
---------
| |
| | <--< right-left = ZERO collisions with other right-left's
| |
| |
---------
---------
| |
Left-right= ZERO collisions with other left-rights >--> | |
| |
| |
---------
Potential problems begin first when the two hashes are combined. At
this stage there should be two hashes full of different random-like
data.
Combining the random-like strings is not be expected to make a result
string to obtain any entropy reducing properties.
I'd think collisions in this group could be easiest to manufacture, if
the pre-hasher fails in distributing the message-databits evenly.
>2. Given a hash H(x) you cannot find any x producing the same hash with
>less than 2^n work (preimage resistance).
I would think there is a especially strong resistance against these
attacks. Because without knowing the original. The hash should give no
glue about the original message. I believe that is due to attempted
try of borrowing the powers of (reversed) counting argument. i.e. You
cant get data data for two random strings from one equally sized
random string.
>3. Given x, you cannot find y != x such that H(x) = H(y) with less than
>2^n work (second preimage resistance).
This depends on fitness of pre-hashing / twisting before compressions.
>4. n should be large enough that 1-3 are computationally infeasible.
Again, this also depends on fitness of pre-hashing / twisting before
compressions.
>How does your algorithm meet these requirements? Have you computed the
>actual work required?
No, I presented an idea in hope of someone finding use for the
unstirrability property.
>Here are some weaknesses.
>1. Given that your hash is variable sized, I can easily brute force it
>by using small or easily compressible inputs.
Not, if a pre-hasher is used - that guarantees non-compressability and
size.
>2. For most compression algorithms, changing the end of the input does
>not change the beginning of the output. This allows incremental
>attacks.
If you change the end of original message(digest), that end will be
the beginning of the (reversed) another string, thus the changes will
be there.
>3. Compressing twice does nothing in terms of either reducing the input
>size or adding security.
Most likely correct - in most simple versions. I added that 'twice' in
'claim' in order to make sure that if someone would claim and want
some kind of test challenge, I'd make sure that there would not be for
example branches of huffman tree lying around.
But in (preferred) more advanced model, there sure can be
additional security found from compressing multiple times. Namely that
can be used in removing statistical errors caused by compression
algorithms etc. That happens by compressing multiple (twice) times and
saving all compressed files and finally joining all produced strings
byte by byte. Saving all compressed strings also gives possibilities
for twisted amount of data-twisting. (ref. previous reply SERIES3)
>4. Compression does not remove all redundancy. In fact, there is no
>way to do this (as proved by Kolmogorov). In any case, redundancy
>removal provides no protection against collision or second preimage
>attacks.
I believe that. The only protective effect, which could be obtained
would perhaps be against attacks which would try to utilize a
distribution weakness in mentioned pre-hasher. For example In
practical use of hashes, the hashing speed has some significance and
that is measured in Mbytes / sec. This suggested dual compression
hasher would probably be very slow in dealing with whole megabyte
messages, but it sure can make the required rounds for a short
pre-hash strings fast enough. Now lets assume that there would be a
found vulnerability in pre-hasher. lets call the imaginary pre-hashers
to MDx5 and SHAx-1. Now assume that some a newly published
vulnerabiity is suddenly promising a collision in 2^28 rounds. That
would be a for some disaster, but luckily the-compression rounds 14
per each attack do slow down testing of small brute-force passwords.
The slowness of compression part-hasher would ofcause not slow down
the whole file checksum calculations, they would allways be fast in
terms of Mbytes/sec.
>5. Some data is not compressible. A hash would be the same size as the
>input. How would this be useful?
Examples in SERIE3 last post, those grown parts can be used in
increasing a twisted amount of twisting. Damn - I love that
data-twisting stuff.
Ok,SERI3_3(even more advanced), assume I have 14 random strings all of
sizes between 21-30 bytes. On average 5 extra bytes in each array
(20 original). Thus I have 160 (14x5x8) extra bits I need to get rid
of. But where shall I take those bits away? How about one by one - on
left or right side of individual strings. Side decition based of
whether the sum of all whole bytes in ALL the 14 strings is odd or
even.
Lets start at string 1:
what is the summ of all bytes in all 14 strings ?
ANSWER: 44831
-> action If(string_len (string_this >= 20))
{
remove_bit_from_left(string_this);
move_strings_five_last_bits_to_become_strings_first(string_previous);
}
else
{
remove_bit_from_right(string_this);
swap_middle_bytes(string_previous,string_next);
}
,thus each random side bit removal causes changes to other strings,
eather by rotating an another string positions, or causing mutations
between strings. Each caused mutation, thus causes causes a large
number of other mutations.
Before all 140 bits have been randomly removed, an average of 70 byte
mutations have been performed, and 70 5 bit (frame shift mutations)
rotations performed. In addition to that that bit removals are likely
to cause a lot of unbalanced strings truncations. In addition this bit
removing scheme forces an attacer into a tube, which it must follow
from start to end in order of getting through.
Ok, now the strings are all truncated to 20 bytes, and now when they
are joined. There are all kinds of mutations and substring rotations
caused by the removed parts. That is how I see compressing
uncompressable data would be useful.
>You have not described what compression algorithm you intend to use.
>There are sure to be other weaknesses that depend on this.
Objection, I have not said that I intend to use anything - unless
someone wants a challenge. However If there really was a gun against
my head or someone who wants, to take that challenge I'd consider
takin two of following compressors + ofcause remove compressors their
header data parts.
bijectiuve arithmetic / arib32 (David A Scott)
http://bijective.dogma.net/arib.zip
Range Coder (Dmitry Subbotin &Mikael Lundqvist)
http://www.geocities.com/mikaellq/range.tgz
Context Tree Weighting : CTW version 0.1 (CTW)
http://www.ele.tue.nl/ctw/download.html
Adaptive Huffman : lzhuf (Haruyasu Yoshizaki)
http://www.cs.umu.se/~isak/Snippets/lzhuf.c
Adaptive bijective Huffman (David A. Scott)
http://bijective.dogma.net/ah2.zip
Ladies and Gentleman, I think I have delivered my suggestion to the
group, it might not be worth anything to anyone, ever, but who knows.
Atleast the technique seemed not been videly used in 1800th! century.
My suggestions as conclutions are:
1) Method possible is an easy way to make a checksum which does not
(easily) unstir.
2) The hashing method provides - probably poor shielding against
general collisions and very weak second preimage resistance.
3) Methods unstirrability is perhaps packed by the some powers of
reversed counting argument
4) stirring is based on only the stirred data
5) Each string is guaranteed to be collision free until the point of
joining the compressed high entropy strings. After joining the strings
collisions may occur at undetermined rate.
6) Most likely this method would be totally worthless as a secure
hasher, but someone might find a way of utilizing this to some purpose
for example in combination to an another hash-algorithm or cipher.
Could those be somewhat aggreeable conclutions?
If someone finds any use for this hashing technique, please, call it
lets say... "Juuso's half hash".
I Thank You all Very much for you comments and instructions, I learned
valuable lessions - hopefully someone of You also got / will get some
kind of fruitfull ideas.Unfortunately my time schedule (or my skills)
do not allow me to discuss this much further. Next week I will however
submit a simple pseudo-random seed algorithm+code. If Collisions
against that intention are not too overwhelming.
Juuso
(remove non-alphabetic tokens from mail-address )
<OT>
I am preparing to launch (on 2005/06/15) an open source project for
building a C based programming language and after that I need LOTS of
helping hands and minds. Language will include C (except the typical
overflow functions) and most of the modern stuff missing from C -
including encryption. If you are interested in being reminded/
participating drop me a line.
</OT>
- Next message: CBFalconer: "Re: Thou shalt have no other gods before the ANSI C standard"
- Previous message: Gib Bogle: "Re: Ellipse versus hyperbola, surrogate factoring"
- In reply to: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Next in thread: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Reply: Matt Mahoney: "Re: Compressing hash strings into two directions - hashing idea"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
|