Vigenere++ Proposal of a (new?) cipher

From: Stephen Cantini (nospam_at_nospam)
Date: 10/29/03


Date: Wed, 29 Oct 2003 01:48:59 +0100

Hello,

I'm an IT student and I have developed an autokey Vigenere cipher variant
with shuffling that looks quite strong. I'm not a cryptography expert at
all, so maybe this algorithm is weak by definition and I simply can't see
it. For what I know, it seems to be quite secure against statistical and
known-plaintext attacks, and it has really a simple (and quite fast)
implementation.

If you take the time to read this post, I'd really appreciate any comment
(see Questions section). I know that there are many security experts in
here, and I would really be happy to know their thoughts.

For those interested in challenges, there is a small (and quite easy) one in
the Examples section...no monetary reward, sorry! :)

Algorithm description follows (sorry for the 80 column pre-formatting...)
You can download it too, together with a DOS executable for testing
from http://www.codewiz.org/~nevez/vigpp.html

Thanks
Stephen

-----------------------------------------------------------------------

Vigenere++
Proposal of a new encryption algorithm based on Vigenere cipher

Vigenere++ tries to improve the original Vigenere cipher by addressing its two
main weaknesses (vulnerability to statistical attacks and known-plaintext
attacks) with the use of hash based automatic key generation and an
additional ciphertext shuffling phase.

The algorithm uses Fowler/Noll/Vo hashing
    http://www.isthe.com/chongo/tech/comp/fnv/
which is a fast hash function with a low collision rate and the Mersenne
Twister pseudo-random number generator
    http://www.math.keio.ac.jp/~matumoto/emt.html
which is four times faster than rand(), has a period of 2^19937-1 and a
623-dimensional equidistribution property proved.

The FNV-1 hash function converts an indefinite amount of input characters into
a single unsigned 32bit value. It is currently recognized as one of the best
hash function (for CollisionRate/Speed ratio), and is used in many
widespread applications. Hashing a character costs one multiplication and one
bitwise XOR only.
The Mersenne Twister generator generates pseudo-random 32bit unsigned
integers, and it is initialized with a 32bit unsigned integer as "seed".

>From now on I will use "P[i]" to indicate the i-th letter (or octet) of the
plaintext, "C[i]" to indicate the i-th letter (or octet) of the ciphertext and
"K[i]" to indicate the i-th letter (or octet) of the key. "PLEN" is the length
of the plaintext, "KLEN" the length of the key, "ASIZE" is the cardinality of
the alphabet used.

Encryption PASS 1 : Autokey Vigenere ------------------------------------------

This phase tries to convert plaintext into something as near as possible to
random characters, by generating a random key as long as the plaintext and
(possibly) having no "cycles" inside (repeating sequences of characters).

Algorithm description:

. HASH is initialized to the hash function applied to the reversed key (the key
  read from right to left).
. The pseudo-random number generator is initialized with the same value, as
  initial seed.
. For each character of index "i" of the plaintext (from first to last):
  . Modify K[i mod KLEN] in this way: add the old character (by hashing)
    to HASH, modify the resulting HASH by XOR-ing it with a new pseudo-random
 number (obtained from the Mersenne Twister generator) and reduce it
 (by xor-folding) to a value less than ASIZE. This final value replaces the
 original key character.
  . Apply standard Vigenere: C[i]=(P[i]+modifiedK[i mod KLEN]) mod ASIZE
  . Add P[i] to HASH (by hashing it)

As you can see, HASH is updated during the process by adding every
character of the plaintext and the generated key to it, and thus the final
encrypted i-th character value depends on (assuming no cyclic collisions):

. All the i-1 preceding plaintext characters
. All the i-1 "preceding" key characters
. The current (i-th) key character
. The i-th pseudo-random number (i.e. the generator seed i.e. the entire key)

Also note that the original base key is never used directly in the encryption
process (this attenuates the problem of a dumb key like a sequence of zeroes).

Encryption PASS 2 : Shuffling -------------------------------------------------

This phase shuffles the output of pass 1 by swapping each character with itself
or one of those following it, from the last to the first one. By going
backward, the first part of the ciphertext achieves better diffusion: this is
useful because usually the first part of a file contains the headers,
which are easily predictable. At the same time, by going backward with respect
to pass 1, much stronger avalanche effect is achieved (see "Considerations"
below). The choice of the swap character is based on the hashing of the
preceding plaintext and key (as in pass 1), but this time the random number
generator is not used. (see "Possible improvements" below)

Algorithm description:

. HASH is initialized to the hash function applied to the key (read from left
  to right) followed by the last character of the plaintext.
. The key is re-initialized to its initial value.
. For each character of index "i" produced with pass 1 (from last-1 to first):
  . j = i+(HASH mod (PLEN-i-1))+1
  . Swap the characters of index "i+1" and "j"
  . Add to HASH (by hashing) the character "i" and KEY[i mod KLEN]

Note that the same character can be swapped several times, possibly
travelling the entire PLEN distance.

Decryption --------------------------------------------------------------------

Decryption obviously works by reversing the process: first the ciphertext is
"un-shuffled". This must be done in reverse order to do it correctly (from
first character to last). To eliminate the problem of recreating the last hash
value of pass 1 (needed to begin "un-shuffling") the value itself can be stored
at the end of the ciphertext file during encryption. The value should be
encrypted in some way: my test executable stores
LASTHASH^hash(entire_key)^hash(entire_key_reversed) at the end of ciphertext.
The second pass of the decryption process is the inverse autokey vigenere: the
key is regenerated and its values subtracted from the plaintext.

Algorithm analysis ------------------------------------------------------------

The two passes together produce good avalanche effect: changing one character
of the plaintext will change every following ciphertext character during the
first pass. Moreover, with pass 2 the characters preceding it will be affected
too, since they will probably be swapped with different ones.
The same can be said for the key - changing one character alters the
initialization of HASH and the random number generator, thus affecting the
encryption from the start.
Regarding the random number generator, it's important to say that even without
it the hash function itself applied to both the plaintext and the key behaves
like a good random function, with low probability of having a short period.
Without this behaviour it would be much easier to decrypt the ciphertext by
trying all of the possible 2^32 seeds of the MT generator until we
find something that is a standard weak Vigenere cipher. Adding those numbers,
together with pass 2, simply helps covering worst case scenarios, if they will
ever happen, and complicates the attacker's task.
Because of this "strong" casuality of the ciphertext, to my eyes there's no
easy way to recover the original key length nor to gather statistical data.
Known-plaintext attacks are also greatly reduced because of pass 2: again
apparently there's no easy way to know where the weak encrypted data has been
repositioned inside the ciphertext.

Examples ----------------------------------------------------------------------

In the following examples the "hash key" is the key generated during pass 1
as it would have been without XOR-ing HASH with MT pseudo-random numbers (thus
showing if the hash function alone "works well" or not).
ASIZE is equal to 64, and values are mapped with this order (from "A"=0 to
"."=63): "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ."

plaintext :"This is a sample plaintext for a new autokey encryption algorithm"
base key :"EncRyPT"
ciphertext:"f5VYYwCai1.10qW7Yfqqx7ieNYRDJhV00x78NQR1KKOwvkdnYVZPcTRiEysiPZOs7"
hash key :"dICUGpDOHkL1K3e0tf3cAhI6rIpwdvVVvWKjF.8Tg1pG4Wk6nMnjqM8zZiRNaJ4ja"

- Changing one character of the plaintext
plaintext :"This is a sample plaintext for A new autokey encryption algorithm"
base key :"EncRyPT"
ciphertext:"OfRVuBL1NDu5gZmMYHcTU 4qKxnid7RWOMKQGqPs0y770N0xvQRFfTJhmYjPsk.kY"
hash key :"dICUGpDOHkL1K3e0tf3cAhI6rIpwdvVVWnaU1YdBYIvPBmm9RHbYVtN2vWU8LeVxw"

- Changing one character of the key
plaintext :"This is a sample plaintext for a new autokey encryption algorithm"
base key :"EncryPT"
ciphertext:"8GuEKFjjOIUATGISL LONRbjIH7U9gTzoNjYDraWxHw2x31orMKDk8PEtyBRIOc9B"
hash key :"nHwSrt.jvGFsyxG.GGzj.ozWNPV09IhuIlNeHAi152jz9zWT8CeyVPaiR4gaqGAMQ"

- Adding one character to the key
plaintext :"This is a sample plaintext for a new autokey encryption algorithm"
base key :"EncRyPT2"
ciphertext:"57hAItVvQ7M XszFLCaIQh0hJZ9KOzgennNgP6Yh57HVmESnH7GmgWrFFh8CEzD.4"
hash key :"HQbK04Ym5jOdG8d9OFnf5thipGhNpe.O5Toy19 HsNJE26BavhYatXMl54DMoyMHJ"

- Dumb plaintext and key
plaintext :"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
base key :"AAAA"
ciphertext:"8Nmf6JJiN.8hdhUSebAsEH9SyaRi.mUAoIrE3mAQeZJdg7yoFaSc0IFYtMv0SQigS"
hash key :"2OX dJF.ou6S5 XelqyFnyNVtNDx81lzfV3JkDduiyB0kU5HDB.wAwJHoPMpnBLhl"

- One character key
plaintext :"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
base key :"A"
ciphertext:"pCtDUD41rUEnlVrowKJE3dC64UxypmVb9GRsaZPV BhoPbwG5DrTSzI5acPNwfI3J"
hash key :"OGEZJ4qE4hk0vqePfWVlDYRIjS OXuywslbWKUa6ah86IStxacrojbrGcmHV 76Of"

- A cipher proposal is not a real one without a challenge, right? Decrypt this:
plaintext : ??? (Written in english language, quite ripetitive)
base key : ??? (not too long...)
ciphertext:"ipUejq0ysk7hrmpH15gK7OBW1pecPUCZfYvNX5ekuw8boyn36M.qVdEYh9nyHV8"

If you can break it, please post the method you have used!

Testing -----------------------------------------------------------------------

You can download a DOS executable from http://www.codewiz.org/~nevez/vigpp.html
to test this algorithm yourself. It allows encryption/decryption of plaintexts
with ASIZE=64 (6 bit per character), as in the examples above. Please do not
compare speed of operation with other algorithm's implementations, as this one
is NOT optimized in any way. If my idea turns out to be a good one, I'll
immediately add the source code, so that everybody can modify it
and optimize it, together with a normal (8 bit per character) version.
The executable produces an auxiliary file during encryption, called
"<filename>.key" that contains the "hash key" mentioned above.
As an empiric test of security, compressing it with any compression tool can
at least reveal if the hash key contains cycles. If you can compress the hash
key to less than 75% or so of the original size, then you have found a weak
plain/key pair. Please post it!!!
The reason behind 75% is that only 6 bits are used in each character of the
ciphertext.

Known weaknesses --------------------------------------------------------------

. The avalanche effect coded inside the two passes does not behave well on
  disturbed trasmissions (i.e. altering one character of the ciphertext
  prevents the correct decryption of the rest)

Questions ---------------------------------------------------------------------

1) Does the algorithm contain evident weaknesses?
2) How does it compare to: (a) Vigenere (b) DES (c) OTP
   in terms of security (by key length) and efficiency (by # of ops)?
3) Is there any part that could be simplified/removed without affecting
   security (and thus improving performance)?
4) What kind of attacks can be done to such a cipher?
5) Would the use of a secure hash function make the cipher stronger?
   I don't think so...a secure hash function has the additional benefit of
   making it very difficult to find another string that hashes to the same
   value, right? In this context I can't see how this would improve security,
   and it would surely impact performance.

Possible improvements (to my eyes) --------------------------------------------

. One way to improve the algorithm would be altering the initialization of the
  HASH value of the two passes by XOR-ing it with the sum
  of all the characters of the plaintext. By doing this, a small
  change to the plaintext will affect EVERY char of the ciphertext starting
  from the first of the first pass, increasing the avalanche effect.
  This is always true if we modify just one character, and is highly probable
  if we modify more than one (because of checksum collisions).

. Another possible improvement would be adding random number XOR to pass 2 too.
  This would require an additional memory buffer during decryption
  to hold the pre-generated random numbers, as it is not possible to
  generate those numbers in reverse order.

. Efficiency could be improved by hashing only one character for each
  iteration (the character obtained by XOR-ing together the key char and the
  plaintext char), but this would probably increase the probability of creating
  cycles in the key.

Posted from X-Privat Free NNTP server - www.x-privat.org



Relevant Pages

  • Re: Combined Signature and Encryption Schemes.
    ... A block cipher on the Plaintext, this gives me the CipherText ... A Mac on the Ciphertext ... digitally signing the MAC value would remove the need for a hash pass ...
    (sci.crypt)
  • Re: (newbie) question on modification of polyalphabetic substitution cipher
    ... Then you would have a plaintext autokey cipher. ... > assume, for a simplified example, an ordinary 26 character alphabet, ... assume that the ciphertext is obtained from the plaintext by the ... > since the attacker needs only to try a few small values of n, ...
    (sci.crypt)
  • (newbie) question on modification of polyalphabetic substitution cipher
    ... assume, for a simplified example, an ordinary 26 character alphabet, ... assume that the ciphertext is obtained from the plaintext by the ... since the attacker needs only to try a few small values of n, ...
    (sci.crypt)
  • Re: OTP and message integrity.
    ... exactly what the hash is of, which is something of a shame. ... If the hash is of the plaintext, and the IV used for CBC mode encryption ... difference between the old and new hashes; the scheme therefore falls to ... the IV and the CBC ciphertext (all of which can be done through ...
    (sci.crypt)
  • Re: post-doomsday computing
    ... successive bit in the random stream to the corresponding plaintext ... the result is the ciphertext. ... the entropy per character in the OTP must be ... Paul Leyland, pleyland@ | Thought I'd something more to say. ...
    (sci.crypt)