Re: Comments on interface?



Jonathan Lee <chorus@xxxxxxx> writes:

> BlockCipher provides
> * Padding (and interface for selecting padding)
> * Modes of Operation (and interface)
> * A common interface for the encryption and decryption, eg.,
> encrypt(plaintext, sizeofplaintext, outputbuffer,
> sizeofoutputbuffer, key, sizeofkey, initvector);
>
> Derived class *MUST*
> * Implement the forward and reverse transforms for encryption/
> decryption
> * Similarly implement the construction of a key schedule
> * Provide a function that returns the current blocksize
> * Specify a byte order in its constructor (i.e., what endian for
> mapping memory to blocks)

This seems to me to be throwing too much into the interface. A block
cipher is a really simple thing. It has a key (which can often be
expanded into tables to improve performance), and the key selects one of
a (very large) number of invertible functions which operate on
fixed-size blocks of binary data.

I'd say that the interface to a block cipher should look like this:

* The block cipher can be interrogated to find out (a) the size of
blocks it works on and (b) which size(s) of key it accepts.

* You instantiate a block cipher by specifying a key. (This allows
the expansion into tables I mentioned above.)

* An instance provides operations which implement the forward and
inverse functions selected by its key.
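
To make that concrete, here is a rough C++ sketch of such an interface. All the names (`BlockCipher`, `ToyCipher`, `accepts_key_size`, `encrypt_block`, `decrypt_block`) are invented for illustration, and the toy cipher is a placeholder with no security at all; it only shows where the key expansion and the forward and inverse functions would live.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface: a keyed, invertible function on fixed-size blocks.
class BlockCipher {
public:
    virtual ~BlockCipher() = default;
    virtual std::size_t block_size() const = 0;                 // in bytes
    virtual void encrypt_block(std::uint8_t* block) const = 0;  // forward, in place
    virtual void decrypt_block(std::uint8_t* block) const = 0;  // inverse, in place
};

// Toy stand-in for a real cipher. NOT secure; illustration only.
class ToyCipher final : public BlockCipher {
public:
    static constexpr std::size_t kBlockSize = 8;

    // Interrogation: which key sizes (in bytes) are accepted.
    static bool accepts_key_size(std::size_t n) { return n == 8 || n == 16; }

    // Instantiating with a key is where a real cipher would expand
    // the key into its tables.
    explicit ToyCipher(const std::vector<std::uint8_t>& key) {
        assert(accepts_key_size(key.size()));
        for (std::size_t i = 0; i < kBlockSize; ++i)
            k_[i] = static_cast<std::uint8_t>(key[i] ^ key[key.size() - 1 - i]);
    }

    std::size_t block_size() const override { return kBlockSize; }

    void encrypt_block(std::uint8_t* b) const override {
        for (std::size_t i = 0; i < kBlockSize; ++i)
            b[i] = static_cast<std::uint8_t>((b[i] ^ k_[i]) + k_[(i + 1) % kBlockSize]);
    }

    void decrypt_block(std::uint8_t* b) const override {
        for (std::size_t i = 0; i < kBlockSize; ++i)
            b[i] = static_cast<std::uint8_t>(
                static_cast<std::uint8_t>(b[i] - k_[(i + 1) % kBlockSize]) ^ k_[i]);
    }

private:
    std::array<std::uint8_t, kBlockSize> k_{};  // the "expanded key"
};
```

Note that nothing here mentions padding, IVs or modes: the object is just the keyed permutation and its inverse.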

Unfortunately life is unkind and this is a little too simple -- see
below. But the most important difference I've made is that I've thrown
out all of the modes-of-operation stuff.

Why? Lots of reasons:

* A mode of operation is a way of using a block cipher to achieve some
higher-level cryptographic objective. So it stands to reason that
it should live outside the block cipher itself.

* The set of modes of operation is open-ended. Trying to cram them
all into a base class means that (in most object-oriented
languages[1]) the original implementer of the base class is
specially privileged, and nobody else can come along later and
provide new modes which look like the existing ones.

* Not all modes are applicable to all block ciphers.

For this reason, I'd keep modes of operation as separate entities which
accept a block cipher as a parameter, rather than trying to fold them
both together.
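
As a sketch of what `accept a block cipher as a parameter' might look like in C++, here is CBC written as a pair of free functions. The `BlockCipher` interface, `ToyCipher`, `cbc_encrypt` and `cbc_decrypt` are all hypothetical names of mine; the toy cipher is insecure, and the sketch assumes the input is already a whole number of blocks (no padding).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical minimal cipher interface (assumed, not from any library).
struct BlockCipher {
    virtual ~BlockCipher() = default;
    virtual std::size_t block_size() const = 0;
    virtual void encrypt_block(std::uint8_t* block) const = 0;
    virtual void decrypt_block(std::uint8_t* block) const = 0;
};

// Toy stand-in with 4-byte blocks. NOT secure; illustration only.
struct ToyCipher final : BlockCipher {
    std::uint8_t k;
    explicit ToyCipher(std::uint8_t key) : k(key) {}
    std::size_t block_size() const override { return 4; }
    void encrypt_block(std::uint8_t* b) const override {
        for (std::size_t i = 0; i < 4; ++i)
            b[i] = static_cast<std::uint8_t>((b[i] ^ k) + (i + 1));
    }
    void decrypt_block(std::uint8_t* b) const override {
        for (std::size_t i = 0; i < 4; ++i)
            b[i] = static_cast<std::uint8_t>(static_cast<std::uint8_t>(b[i] - (i + 1)) ^ k);
    }
};

// The mode lives outside the cipher and works with any BlockCipher.
Bytes cbc_encrypt(const BlockCipher& c, const Bytes& iv, Bytes data) {
    const std::size_t n = c.block_size();
    assert(iv.size() == n && data.size() % n == 0);
    const std::uint8_t* prev = iv.data();
    for (std::size_t off = 0; off < data.size(); off += n) {
        for (std::size_t i = 0; i < n; ++i) data[off + i] ^= prev[i];  // chain
        c.encrypt_block(&data[off]);
        prev = &data[off];
    }
    return data;
}

Bytes cbc_decrypt(const BlockCipher& c, const Bytes& iv, const Bytes& ct) {
    const std::size_t n = c.block_size();
    assert(iv.size() == n && ct.size() % n == 0);
    Bytes pt = ct;
    for (std::size_t off = 0; off < ct.size(); off += n) {
        c.decrypt_block(&pt[off]);
        // Previous ciphertext block, or the IV for the first block.
        const std::uint8_t* prev = (off == 0) ? iv.data() : &ct[off - n];
        for (std::size_t i = 0; i < n; ++i) pt[off + i] ^= prev[i];
    }
    return pt;
}
```

Somebody else can add a new mode later by writing another function of the same shape, without touching the cipher classes.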

[1] Such languages are obviously deficient, but they're popular anyway.

> Some things that trouble me:
> - Is it a good idea for cipher to have a "default" blocksize? Or
> should the user be forced to specify one?

Umm. The block size is fixed by the block cipher. (Oh, there are some
block ciphers which have variable block size, but in practice you want
to nail down a small collection of sizes and implement them separately
for performance reasons.)

> - I don't like specifying the byte order in the derived class, but I
> see no way else to do this if I want my base class to provide padding
> (I'm going for code re-use here).

Byte order is a messy issue. (This is the `see below' part.) For best
performance you want to implement your block cipher in terms of units
larger than individual bytes. This means that somewhere you need to
convert between raw bytes and these larger units; but this conversion is
dependent on the block cipher.

The obvious answer is to have the block cipher handle this conversion,
since it obviously knows how to do it right. The problem is that modes
of operation can also benefit from operating on larger units. In many
cases (but not all!) you can get away with only doing a single
conversion each way if you expose the conversion process in the block
cipher's interface.

So, I'd probably add:

* Operators which convert between byte vectors of the appropriate
size, and an internal representation which is actually processed by
the function and inverse-function operators.

* A description of the internal representation, sufficient for your
modes of operation to decide whether they can make use of it.

Exactly what the description looks like is an engineering decision.
Something like `vector of <N> items, each of which is an integer formed
from <M> bytes taken in <big-endian> order' would seem general enough to
me. If your mode doesn't like the internal format, it'll have to
convert to a format it does like (probably via a byte vector).
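
One possible shape for such a description, sketched in C++. The names (`BlockRepr`, `ByteOrder`, `load_block`, `store_block`, `compatible`) are hypothetical, and for simplicity the sketch holds every item in a 64-bit integer whatever <M> is; a real implementation would presumably use the cipher's native word type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteOrder { Big, Little };

// Hypothetical description: `words` items, each an integer formed from
// `bytes_per_word` bytes taken in `order`.
struct BlockRepr {
    std::size_t words;
    std::size_t bytes_per_word;  // 1..8 in this sketch
    ByteOrder order;
    std::size_t block_bytes() const { return words * bytes_per_word; }
};

// Bit position of byte i within a word, for the given byte order.
static std::size_t shift_for(const BlockRepr& r, std::size_t i) {
    return 8 * (r.order == ByteOrder::Big ? r.bytes_per_word - 1 - i : i);
}

// Byte vector -> internal representation.
std::vector<std::uint64_t> load_block(const BlockRepr& r,
                                      const std::vector<std::uint8_t>& bytes) {
    assert(r.bytes_per_word >= 1 && r.bytes_per_word <= 8);
    assert(bytes.size() == r.block_bytes());
    std::vector<std::uint64_t> out(r.words, 0);
    for (std::size_t w = 0; w < r.words; ++w)
        for (std::size_t i = 0; i < r.bytes_per_word; ++i)
            out[w] |= static_cast<std::uint64_t>(bytes[w * r.bytes_per_word + i])
                      << shift_for(r, i);
    return out;
}

// Internal representation -> byte vector.
std::vector<std::uint8_t> store_block(const BlockRepr& r,
                                      const std::vector<std::uint64_t>& words) {
    assert(r.bytes_per_word >= 1 && r.bytes_per_word <= 8);
    assert(words.size() == r.words);
    std::vector<std::uint8_t> out(r.block_bytes());
    for (std::size_t w = 0; w < r.words; ++w)
        for (std::size_t i = 0; i < r.bytes_per_word; ++i)
            out[w * r.bytes_per_word + i] =
                static_cast<std::uint8_t>(words[w] >> shift_for(r, i));
    return out;
}

// A mode can work on the cipher's words directly only if the layouts agree;
// otherwise it has to go via a byte vector.
bool compatible(const BlockRepr& cipher, const BlockRepr& mode) {
    return cipher.words == mode.words &&
           cipher.bytes_per_word == mode.bytes_per_word &&
           cipher.order == mode.order;
}
```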

If you're very brave, you might try to have block ciphers accommodate
multiple internal representations. For example, AES can be implemented
equally well using big- and little-endian representations, but
big-endian fits in better with GCM. It's possible that there might be a
similar win for little-endian representation with a different mode.

> - Should key and initialization vector be provided at the time of
> calling encrypt()? Or is it more useful to have a SetKey() method and
> SetInitVec()? Or should setting the initialization vector go with
> setting the mode of operation?

This is a matter for the interface of your modes of operation. I've
been deliberately quiet about this issue, since it's very complicated,
and different modes have different requirements.

If you're encrypting large amounts of data, you'll almost certainly want
a restartable interface to encryption. This will suggest that your mode
interface should start with a `set-IV' operation, and then process data
in chunks, keeping some state in between. This can get messy, since
you'll either need to impose restrictions on the sizes of chunk you can
process (e.g., must be a multiple of the block size) or do some slightly
hairy buffering of data to cope with the misalignment, and cope with a
final incomplete block at the end in some way.
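
The buffering part can be sketched independently of any particular mode. In the C++ below, `ChunkedProcessor`, `BlockFn`, `update`, `pending` and `reset` are names I've made up; the per-block callback stands in for whatever the mode does to one block (including any chaining state it keeps), and a `set-IV' operation would call `reset` before starting a new message. It copies a byte at a time for clarity, which you would not do in production code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using BlockFn = std::function<void(std::uint8_t*)>;  // transforms one block in place

// Hypothetical restartable wrapper: accepts chunks of any size, hands whole
// blocks to `process`, and holds back a trailing partial block between calls.
class ChunkedProcessor {
public:
    ChunkedProcessor(std::size_t block_size, BlockFn process)
        : n_(block_size), process_(std::move(process)) {
        assert(n_ > 0);
    }

    // Start a new message: discard any buffered partial block.
    void reset() { pending_.clear(); }

    // Feed a chunk; returns the output for every block completed by it.
    Bytes update(const Bytes& chunk) {
        Bytes out;
        for (std::uint8_t byte : chunk) {
            pending_.push_back(byte);
            if (pending_.size() == n_) {
                process_(pending_.data());
                out.insert(out.end(), pending_.begin(), pending_.end());
                pending_.clear();
            }
        }
        return out;
    }

    // The final incomplete block, left for the caller to pad or reject.
    const Bytes& pending() const { return pending_; }

private:
    std::size_t n_;
    BlockFn process_;
    Bytes pending_;
};
```

What to do with `pending()` at the end of the message (pad it, steal ciphertext, or refuse) is exactly the decision that differs from mode to mode.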

I'd think about a mode of operation as being something that you
initialize once with a key, and then use to process a number of messages
(each of which needs an IV), each of which consists of a number of
chunks. But if you can get away with a simpler model then by all means
do!

Life gets even harder. Some modes take multiple arbitrary-sized inputs:
e.g., GCM accepts an IV, message, and additional data, all of which can
be very large; and it can often be the case that much or all of the
additional data is the same for different messages.

Also, different modes have different requirements on IVs: e.g., CBC
requires that the IV be unpredictable in advance, whereas GCM requires
only that the IVs be distinct. This may affect how your application
wants to generate the IVs: if it already has a message counter available
then this can be used directly as the IV for GCM, but it's not adequate
for CBC. (In CBC, you could encrypt the counter to serve as an IV; in
fact, you can even use the same key as used for the main encryption
without damaging the security.)
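
A small C++ sketch of the two ways of turning a message counter into an IV. The function names (`counter_block`, `distinct_iv`, `unpredictable_iv`) and the `BlockFn` callback type are hypothetical; `encrypt_block` is assumed to be the block cipher's forward function under a suitable key, operating in place on one block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using BlockFn = std::function<void(std::uint8_t*)>;  // encrypts one block in place

// Encode a 64-bit message counter big-endian into the last 8 bytes of a
// zero-filled buffer of `size` bytes.
Bytes counter_block(std::uint64_t counter, std::size_t size) {
    assert(size >= 8);
    Bytes out(size, 0);
    for (std::size_t i = 0; i < 8; ++i)
        out[size - 1 - i] = static_cast<std::uint8_t>(counter >> (8 * i));
    return out;
}

// GCM-style requirement: IVs need only be distinct, so the encoded counter
// can be used as it stands.
Bytes distinct_iv(std::uint64_t counter, std::size_t iv_size) {
    return counter_block(counter, iv_size);
}

// CBC-style requirement: IVs must be unpredictable, so run the counter
// through the block cipher first.
Bytes unpredictable_iv(std::uint64_t counter, std::size_t block_size,
                       const BlockFn& encrypt_block) {
    Bytes iv = counter_block(counter, block_size);
    encrypt_block(iv.data());
    return iv;
}
```

Either way the application only has to keep a counter; the difference is whether the mode lets you use it raw.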

I'm sorry this is all very complicated. If your application doesn't
need full generality, I encourage you to simplify as much as you can --
but no more! And hope that you don't need to extend it in the future...

-- [mdw]