Re: Voice encryption (Stream vs CBC mode)

From: Peter Fairbrother (
Date: 12/12/03

Date: Fri, 12 Dec 2003 00:06:17 +0000

David Wagner wrote:

> Peter Fairbrother wrote:
>> And I still don't know of any forgery attacks that are of importance in a
>> typical VoIP application,
> Well, ok. But...
> That's a slightly dangerous style of reasoning. Remember, we
> didn't know of any attacks on IPSec, or WEP, or you name it, when
> they were initially proposed, and so it seemed ok to omit the
> MAC. Once you find the attack, sure, you add a MAC, but by then
> it may be too late.

Provable assurance is very nice, but I'm not too sure what proven integrity
(if a MAC gives that) really means in terms of defending against attack.

>> when CBC is used (which means no chosen plaintext
>> forgeries) with sync-based IV's (which give replay protection)
> No, they don't. There are still cut-and-paste attacks, which
> allow one to paste in traffic. A block or two will be garbled,
> but if that's voice data, maybe no one will notice. Whether this
> matters to your VoIP application, I don't know.

I don't follow that. Suppose 64 packets per second and a one-byte counter
in each packet: that's 4 seconds between counter repeats. We assume very few
packets will take 4 seconds to arrive, and we also have a counter in the
boxen which increments every 4 seconds. Concatenate the box counter and the
packet counter, hash with key, and there's your IV.
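
A rough sketch of that IV derivation, in Python. The post only says "hash
with key", so HMAC-SHA256 is my assumption here, as are the 4-byte width of
the box counter, the 16-byte IV length (use 8 for a 64-bit block cipher) and
the name derive_iv. Treat it as an illustration, not a vetted design.

```python
import hashlib
import hmac
import struct


def derive_iv(iv_key: bytes, box_counter: int, packet_counter: int,
              iv_len: int = 16) -> bytes:
    """Keyed hash of (box counter || packet counter), truncated to iv_len bytes.

    HMAC-SHA256 stands in for "hash with key"; that choice is an assumption.
    """
    if not 0 <= packet_counter <= 0xFF:
        raise ValueError("packet counter must fit in one byte")
    # 4-byte big-endian box counter followed by the one-byte packet counter.
    message = struct.pack(">IB", box_counter, packet_counter)
    return hmac.new(iv_key, message, hashlib.sha256).digest()[:iv_len]


key = bytes(range(16))            # placeholder key, for illustration only
iv_a = derive_iv(key, 7, 200)     # packet 200 in 4-second interval 7
iv_b = derive_iv(key, 8, 200)     # same packet counter, next interval
print(iv_a.hex(), iv_b.hex(), iv_a != iv_b)
```

The same packet counter in a different 4 second interval gives an unrelated
IV, which is the property the argument relies on.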

There are only 3 or 4 blocks per IV. Actually you could have a new IV for
each block if you wanted, I don't know what that mode might be called!

Any cut-and-pasted blocks will be either inside the 4 second interval or
outside it. If they are inside it and close to sync, I can't see anything
useful an attacker could do with them; if they are too far out, they will be
ignored as being too far out-of-sync. If they are outside the 4 second
interval, the IV will be different and they won't decrypt to the real
plaintext, or to any predictable plaintext; it will be garbage.
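
One way the receiving box might apply that rule, sketched in Python. The
tolerance MAX_SKEW and the function name are hypothetical, and the post does
not say how the receiver tracks sync, so this is only my reading of it: work
out which 4 second interval a packet's one-byte counter must belong to, and
drop anything too far from where the receiver expects to be.

```python
PACKETS_PER_INTERVAL = 256  # a one-byte counter wraps every 256 packets (4 s at 64/s)
MAX_SKEW = 32               # hypothetical tolerance: 32 packets, 0.5 s either side


def resolve_box_counter(local_box, local_pkt, received_pkt):
    """Return the box counter to use for this packet's IV, or None to ignore it."""
    # Signed distance from the expected packet counter, folded into [-128, 127].
    delta = (received_pkt - local_pkt + 128) % PACKETS_PER_INTERVAL - 128
    if abs(delta) > MAX_SKEW:
        return None  # too far out of sync
    # Absolute position in the stream, then the interval that position falls in.
    position = local_box * PACKETS_PER_INTERVAL + local_pkt + delta
    if position < 0:
        return None
    return position // PACKETS_PER_INTERVAL


print(resolve_box_counter(10, 5, 250))    # slightly late, from the previous interval
print(resolve_box_counter(10, 250, 3))    # slightly early, from the next interval
print(resolve_box_counter(10, 100, 200))  # far out of sync
```

A packet replayed from an older interval that happens to land inside the
window is decrypted with the current interval's IV, not the one it was
encrypted under.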

Different calls will of course use different keys, and there will be forward
secrecy. Won't there.

Isn't there some newly discovered confidentiality-and-integrity mode? I
remember reading about something like that, but it seems to have gone away.

The problem in encrypted VoIP is traffic volume. Making the poor box do a
lot of work decrypting isn't going to mess things up the way increasing
ciphertext size does (even in weeny hardware: you need a lot more hardware
to do the compression than the encryption).

Latency problems mean that you need lots of small packets; the idea is to
get latency down below about 150 ms.

You get x ms worth of speech. You can't really compress it until you have
the whole sample, so that's x ms gone. Compression takes c ms, encryption
takes e ms, then you send it. The 'net has a latency l and takes s ms per
packet, and you can't start decryption until half the packet has been sent.
Decryption takes d, and decoding takes g. That's x + c + e + l + s/2 + d + g
ms. And you want all that to be less than 75 ms, so the return trip takes
less than 150 ms. I missed a fair bit out as well. It's a pain, and only
just possible to do even without encryption. Adding extra to the packets is
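
To put numbers on the latency sum x + c + e + l + s/2 + d + g, here is a
small Python sketch. The symbols are the ones used above; the figures plugged
in are hypothetical, picked only to show how little headroom a 75 ms one-way
budget leaves.

```python
ONE_WAY_BUDGET_MS = 75.0  # half of the 150 ms round-trip target


def one_way_latency_ms(x, c, e, l, s, d, g):
    """x + c + e + l + s/2 + d + g, all in milliseconds.

    x: speech per packet, c: compression, e: encryption, l: network latency,
    s: time to send one packet, d: decryption, g: the final decode step.
    """
    return x + c + e + l + s / 2 + d + g


# Hypothetical figures for illustration only; none of these are measurements.
example = dict(x=20, c=5, e=1, l=30, s=10, d=1, g=5)
total = one_way_latency_ms(**example)
headroom = ONE_WAY_BUDGET_MS - total
print(f"one-way latency {total} ms, headroom {headroom} ms")
```

With figures like these, anything that lengthens the packet eats into s and
so into the little headroom that is left.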

For instance, on a modem, you have 32kb/s max (a 56k modem is 56kb/s
downstream and 32kb/s upstream full-duplex), but you have to allow for other
traffic and slow lines and so on, so call it 24 kb/s max traffic. About 2/5
of that will be overhead like headers etc, so you have to get decent voice into
~15 kb/s. Hard to do. Something like Speex at 14.5 kb/s fixed rate is
probably a good start for a freeware project.
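
The bandwidth arithmetic, worked through in Python. The 24 kb/s and 2/5
figures are from the paragraph above and the 64 packets per second is the
rate supposed earlier in this post; the per-packet payload is just what
falls out of those numbers, not a measured value.

```python
from fractions import Fraction

LINK_BPS = 24_000            # usable upstream after allowing for other traffic
OVERHEAD = Fraction(2, 5)    # share taken by headers etc.
PACKETS_PER_SECOND = 64      # packet rate supposed earlier in the post

# Exact rational arithmetic avoids floating-point noise in the result.
voice_bps = LINK_BPS * (1 - OVERHEAD)
payload_bits_per_packet = voice_bps / PACKETS_PER_SECOND
payload_bytes_per_packet = payload_bits_per_packet / 8

print(f"voice budget: {float(voice_bps) / 1000} kb/s")
print(f"payload per packet: {float(payload_bytes_per_packet)} bytes")
```

That comes to 14.4 kb/s for voice, or roughly 28 bytes of payload per
packet. If the cipher has 64-bit blocks (my assumption, the post doesn't
name one), that is about 3.5 blocks per packet.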

There are header compression fixes, but they tend to need specialised 'net
hardware. And what about broadband, you ask? Nah. It needs to be available
for everyone, even those poor souls with modems, or you can't call most

Another good thing about modems is that you can just plug into a phone and
away you go, encrypted telephony, and you can find out if the man is in
first. Not yet untraceable tho', sadly.

Peter Fairbrother