Re: Verifying CRCs

From: Rob Warnock (
Date: 07/12/04

Date: Sun, 11 Jul 2004 22:23:08 -0500

Francois Grieu <> wrote:
| "Skybuck Flying" wrote:
| > Well for hardware I don't know... Seems a bit weird to me... Since you
| > need to receive bytes first anyway ?
| > Or can a crc calculation be done, bit by bit ?
| Yes, CRC calculation in hardware can be done while bits are received
| (after removing "transparency" elements)

Moreover, the CRC calculation can be done in hardware one bit at a
time, or 8 bits at a time, or 32, 64, 128 bits &c. at a time, with
fairly small amounts of logic. Doing the Autodin-II CRC-32 (Ethernet,
FDDI, ATM/AAL5, etc.) in parallel 32 bits at a time (that is, consuming
32 input bits per clock) takes about 500 XOR-gate-equivalents (or maybe
less with multi-input XORs).

[Little-known fact: An M-bit CRC can be done N bits at a time even
if N >> M. E.g., a CRC-32 can be done 128 bits at a time (per clock).
Just takes a little more hardware...]

| > I am not sure... but maybe ethernet adds a crc32 to the end of the frame.
| It does.

And it pre-initializes the result to all one's *and* it one's-complements
the result before appending it to the outgoing frame [thus improving its
ability to detect lost runs of zeros at points in the frame where the
current running CRC happens to be zero].

| > That would be unfortunate since frames can have variable sizes.
| No problem at all with a canonical CRC, which can be entered in the
| CRC engine as data.

And at least for the Autodin-II CRC-32, the "magic" constant that
results from a valid CRC at the receiver does *not* depend on the
frame length.

| > A better technique at least for calculating the crc32... is to put the crc
| > in the header of the frame. That way the location of the crc32 is always
| > known and can be calculated... as soon as the payload bytes arrive.
| And then the sender must compute the CRC on the whole frame before
| sending it, making an extra pass in the data, adding latency...

Not only *that*, but it can double (or in the worst case, triple!)
the required memory bandwidth!

| CRC at start of data is never used in any data transmission context,
| to the best of my knowledge.

Although putting multiple CRCs in a packet is done by some protocols.
E.g., DEC's DDCMP link-level protocol had a short, fixed-length header
which contained both the length of the following variable-length data
section *and* a CRC-16 over just the header (including that data length).
Then the data section which followed had a completely separate CRC-16
over just the variable-length data. The result was that the receiver
saw a good header CRC, it could (with high confidence) trust the length
field and dynamically set up the DMA transfer for the following variable-
length section to be just that length.

| Computing a 16-bit CRC in software on an 8-bit CPU only cost 2 table
| lookups, a XOR, and few load/store, for each byte of data.

A bit of ancient trivia: With the standard BSC/SDLC CRC-16, the table
can be compressed into 12 bits in width by stripping out "bit columns"
that are all zeros, resulting in a table that is just 256 12-bit words.
Thus you need only one table lookup per byte (though you must do a few
shifts/masks/XORs to regenerate the 16 bits from the 12-bit value).
This is how we did CRC-16's in software on the DEC PDP-8 circa 1972.
[Two of them per packet: see note above about DDMCP... ;-} ]

Similarly, computing a CRC-32 in software on a 32-bit CPU costs only
one table lookup, two XORs, and a byte-fetch for each byte of data
(assuming you're doing the whole packet in a tight loop, otherwise
add another 32-bit store & load per byte).

| > So instead the packet can be stored and crc32 calculations done later.
| Adding latency, and implying an extra pass in the data.

And extra memory bandwidth and CPU consumption.


Rob Warnock <>
627 26th Avenue <URL:>
San Mateo, CA 94403 (650)572-2607