Re: Encryption key longer than text to encrypt

rossum wrote:
On 3 Jan 2007 14:10:25 -0800, "Jean-François Michaud"
<cometaj@xxxxxxxxxxx> wrote:

rossum wrote:
On 3 Jan 2007 09:17:52 -0800, "Jean-François Michaud"
<cometaj@xxxxxxxxxxx> wrote:

rossum wrote:
3 Since the cypher needs to be able to resist a chosen plaintext
attack, I do not see what you gain from scrambling the plaintext.

The idea is, since the PRNG is weaker than RNG used for OTP is to try
and cut down chances of text being recovered more easily by using
heuristics (since the text is scrambled). Attackers won't be able to
expect relevant text to fill in the whole encrypted message.
This does not do what you seem to think it does. You are interleaving
the non-random plaintext into the (almost) random bogus data. All the
attacker has to do is to decrypt a few KB of data and look for an
excess of common plaintext characters, ETAOIN etc for English text.
This is purely statistical and is not affected by shuffling. That
will quickly narrow down possible keys.

To be able to decrypt the data, the attacker would have to know the
keystream that was XORED against interleaved plaintext + bogus data.
Don't we hit the same constraints as we do with OTP, notably, any key
that we would use would allow us to decrypt to anything we want? Also,
an attacker would have to deal with reconstructing messages from mixed
characters which is a much larger space than simply brute forcing
through all possible data to extract potential candidates through
heuristic pruning.
The attacker does not need to know the keystream, the attacker just
needs to know, or guess, the key. All the attacker has to do is to
brute force her way through all possible keys decrypting plaintext
with each key. The statistical checks will pull out those decryptions
with an excess of common characters. Those keys will be selected for
further detailed anlysis.

Actually, the idea is to generate 2 keystreams, both of them being
generated deterministically on both BOB and ALICE's side given that
they can both have the same virtual machine states. One of the keys
would be used as the key to unlock the Bogus Data + Text and another
keystream which is the original bogus data would be used to unscramble
text and recover it.
As I said, this does not recover the text, all it can do is to recover
the differences between the two.

If you XOR the first keystream against the encrypted data, you end up
with bogus data + interleaved plaintext, right? Isn't this basic XORing

From there, the scrambling algorithm can be run to unscramble
text. The data would be interleaved through the bogus data as a
function of say the state the rotors end in after the bogus data is
generated. When comes the time to decrypt, the same bogus data can be
generated by the receiving party and the same rotor state achieved so
that the same scrambling process can be used to reverse the scramble
and restore order to plaintext.
Looking back at the example in your OP, you had:

Message: H O L LE

That looks like replacement to me, not interleaving. Interleaving
would not lose characters but might disrupt the scrambling, depending
on the scrambling algorithm you used.

For example:

Message: S TA R T A T 8 (Start at 8)

Alice now recovers the combined bogus data and message. She then
looks for differences:

It wouldn't be a difference function, this wouldn't work. It would be a
positional scramble based on the state of rotors after the bogus data
has been generated.
You will need to explain this more clearly. Your OP had a simple
replacement, which implies just looking for differences from the
expected calculated Bogus Data. I think that you need to specify your
algorithm more carefully here.

Differences: S TA R T A * * (Start a )

How can Alice tell that the T and 8 at the two asterisked positions
are actually part of the message and not just bogus data? As it
stands your idea here loses data, roughly 1 in 256 bytes will get
lost. That is not a good thing in a cypher.

Or maybe the scrambling/unscrambling can be a
function of the machine state after generating both keystreams. I think
this would be more secure since the machine state is supposed to remain
unknown to the attackers (it is, in essense, the key to the whole
encryption scheme).

1) The text does not appear in order, it is scrambled. In your
example, Alice does not recover "HELLO", she recovers "HOLLE" which is
the order in which the message characters appear. How does Alice know
what the scrambling order is? There is nothing in your description
about transmitting the scrambling order to Alice.

2) By chance, Alice will fail to recover 1 in 256 characters when the
character in the Message matches the corresponding character in the
Bogus Data: if the Bogus Data was "LLLLLLLLLLL..." she would recover
"HOE". How will Alice know when a character is actually a Message
character rather than a Bogus Data character?

The unscrambling/reconstruction would happen automatically as a
function of either the bogus data generated by the virtual machine or
as a function of the latest state of the virtual machine after the
latest key generation. No work on Alice's part has to take place for
her to decode the message.
You need to give more detail here about how the unscrambling would
work. A fixed algorithm is of no value as any attacker can simply
undo any scrambling. A keyed algorithm requires sending the key
somehow which complicates key handling.

I would suggest that you drop the shuffling and bogus data ideas. If
you do retain the bogus data then you need to modify it so that all of
the text is easily recoverable.

As a difference function, it wouldn't work, it would have to be a
dynamic positional scramble based on something that both parties have
but that the attacker doesn't have, notably, the key. For a rotor
machine, the key keeps changing, so it seems to be an appropriate
source to determine HOW and WHERE characters will go in the bogus data.
If it is generated by the algorithm then the attacker can generate it
as well with her guessed key. You are not adding any uncertainty to
the attacker's task. The one key will give her the keystream, bogus
data and message scrambling. She still has the same sized keyspace to
run her brute force attack through. The complications you are adding
to your cypher are not increasing the uncertainty of the attacker.

Even if the attacker knew about the scrambling algorithm, he would have
to know what the state of the machine is after the bogus data
generation to know HOW and WHERE the actual data is.
The attacker is guessing the key, so when she has a statistically
interesting decryption she just runs it through the known unscrambling
algorithm using the currently guessed key. She has the algorithm and
a possible key. She can unscramble the message as easily as Alice

I understand this very clearly, I'm merely pondering the idea that
brute forcing 4096 bits (512 rotors and 256 cogs) is not feasable and
that an attack against the potential weaknesses of the keystream
generated by the PRNG might actually help to figure the real key out,
this is why I'm exploring the idea. You are saying that the keystream
is as strong as the key, but what I perceive is that the bounds for the
keystream is AT MOST as strong as the key and at the very least
completely insecure. What are the preconditions to meet for the
relationship to become true?

What are the preconditions to meet this requirement?

A is key
B is keystream

How does one make sure that the potentially looser cryptographic
strength for keystream B <= A becomes B = A? This is dependent on the
properties of the algorithm, but I'm uncertain as to what they would

Jean-Francois Michaud