Re: Linear Shiftback Registers
 From: rpw3@xxxxxxxx (Rob Warnock)
 Date: Mon, 31 Aug 2009 21:20:40 0500
Phil Carmody <thefatphil_demunged@xxxxxxxxxxx> wrote:
+
 Tom St Denis <tom@xxxxxxx> writes:
 > Phil Carmody <thefatphil_demun...@xxxxxxxxxxx> wrote:
 >> I saw 1/x being used before I saw x being used. (Not in defining/finding
 >> the polynomial, but in the actual generation of values from the LFSR.)
 >
 > Really? I was kinda joking about that. I know it was used in the
 > OMAC2 spec before he just replaced 1/x with x^2.

 When I first saw LFSRs, in hardware, the shift was done so that the
 XOR of all of the tapping points was injected into the shiftedin end
 of the register. This is the reverse of the usual practice in s/w where
 the shiftedout end of the register XORs with all of the the tapping
 points. So I think it's fair to view it as division rather than
 multiplication.
+
I believe that it's the case that if some polynomial generates a
maximallength sequence with one style of feedback wiring then you
can always find a corresponding reciprocal polynomial that gives the
same output sequence with the other style of wiring, and viceversa.
But while simple, slow, smallpolynomial hardware LFSRs [like I used
to build on perfboard! ;} ] often used the "XOR all the taps into
the one input" method, these days fast, largepolynomial hardware LFSRs
tend to use the "XOR the output *into* various stagetostage taps"
method [which is truly the "division" method, b.t.w.], the reason being
that it's easier/faster to do the latter with FPGAs or hardcoded silicon,
since you don't have a huge, honking multiinput XOR that needs to
propagate in one clock(!) and you also have only *one* feedback wire
to route back from the output, not many.
In any event, even in hardware these days most people use some variant
of the "matrix multiplication" method [of which I've written here before]
when implementing checksums [such as CRC32] in order to be able to
process more than one input data bit per clock. E.g., using ~500 XOR
gates [or equivalent lookup table bits in FPGA logic blocks] you can
perform a CRC32 step on 32 bits of input data *per clock*. [The wiring
of the XORs is just the 32nd power of the 1inputbit/step CRC32 matrix.]
In fact, by using even more gates, it's possible to perform CRC32 on
*64* bits of input data per clock. [You have to be a little careful
about alignment/fixup issues at the end of packets which aren't an
exact multiple of 64 bits long, but that's a mere detail... ;} ]
The corresponding software technique is the venerable tablelookup method[1],
e.g., <http://www.faqs.org/faqs/compressionfaq/part1/section26.html>,
which contains C code [written by yours truly] for byteatatime CRC32.
Unfortunately, the needed tables rapidly explode in size when processing
more than one input byte per step: For a 32bit polynomial, the table is
1 KiB for 1 byte/step, 256 KiB for 2 B/step, and *16 GiB* for 4 B/step.
Cache effects alone would make calling the 1 B/step version four times
be *MUCH* faster than calling the 4 B/step version once! ;}
Rob
[1] Which I first learned from a paper by Stu Wecker [then at DEC]:
Wecker, S., "A TableLookup Algorithm for Software
Computation of Cyclic Redundancy Check (CRC),"
Digital Equipment Corporation memorandum, 1974.
There might have been prior art from IBM even then, but I'm not sure.

Rob Warnock <rpw3@xxxxxxxx>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)5722607
.
 FollowUps:
 Re: Linear Shiftback Registers
 From: Phil Carmody
 Re: Linear Shiftback Registers
 Next by Date: DES Key Generation
 Next by thread: Re: Linear Shiftback Registers
 Index(es):