# Re: Padding Question (Cramer-Shoup 98 Crypto System)

From: Mark Wooding (mdw_at_nsict.org)
Date: 11/24/04

```Date: Wed, 24 Nov 2004 13:08:35 +0000 (UTC)

```

Oliver Moeller <omoeller@verify-it.de> wrote:

> I have a Java implementation of Cramer-Shoup 98 Crypto System (see
> links below). Version 2.1 now guarantees to de-crypt to the original
> message by introducing a safe way to pad the clear text message up to
> the required block size.

Cramer-Shoup is IND-CCA2 out of the box, so your padding scheme doesn't
have to have any particularly interesting security properties. CS
deterministic. The /only/ job padding needs to do is injectively
transform your message into a member of the appropriate group. This,
alas, is nontrivial.

CS wants to work in a group of prime order. Your suggestions just
encode the message as a large integer and hope for the best, without
regard to whether the result is in the right subgroup of F_p^*.

The CS paper suggests using safe primes (i.e., p = 2 q + 1 with q
prime); then once you've encoded your message as an integer 1 <= m < q,
you can use m^2 as your message representative. Recovery is a matter of
finding the square root in the right range, noting that x^2 = (p - x)^2
(mod p) and decoding the result.

If you're using more general primes then you have a more difficult
problem. If p = q h + 1 for prime q, then you could proceed as above,
using m^h as your message representative. However, extraction of h-th
roots mod p is a nontrivial job, particularly if you want the specific
one which lies between 1 and q, so this rapidly looks like a bad idea.
I found http://www.ma.utexas.edu/users/voloch/Preprints/roots.pdf which
shows how to find roots in finite fields, but doesn't seem to be of much
use if you really want /this particular one/. I think this approach is
doomed to failure.

Encoding a message as an integer in 1, ..., q - 1 is straightforward.
Suppose that 2^{8 b - 1} <= q < 2^{8 b}, i.e., q needs b bytes to write
it down. Your message M must be b - 2 bytes long or less. Let M' = 00
|| ZZ || 01 || M', where ZZ is a string of zero bytes long enough to
make M' be b bytes long. Transform M' into an integer using the big-
endian convention. Decoding is obvious.

If you're doing CS on an elliptic curve then life is actually a little
easier. Suppose we need b bytes to represent a field element. Choose
some n, about 4 is enough; you'll be able to represent messages M which
are b - n - 2 bytes long with failure probability 2^{-8 n}. Simply
encode M as M' = 00 || I || ZZ || 01 || M, where I is an n-byte counter
starting at 0, and ZZ is a string of zero bytes long enough to make M'
be b bytes long. Now, transform M' into a field element x (using OS2FEP
or whatever) and see if there's a y such that (x, y) is (a) on your
curve and (b) in the right subgroup. If not, then step I on one and try
again. Each try will succeed with probability about 1/2^{1 + h}, where
h is the curve's cofactor. Decoding is again obvious.

Actually, the hybrid approach seems superior in almost every respect.
Choose a random group element, and hash it to obtain a session key for a
symmetric IND-CCA2 encryption scheme. Encrypt your group element with
CS and append the ciphertext of your message encrypted under the session
key. (In practice, DLIES is almost certainly a better bet -- it's
simpler and faster. But I'm guessing this is about CS being cool rather
than practical, and there's nothing wrong with that.)

-- [mdw]