Re: Thou shalt have no other gods before the ANSI C standard

From: Randy Howard (randyhoward_at_FOOverizonBAR.net)
Date: 02/27/05


Date: Sun, 27 Feb 2005 01:01:43 GMT


Ok, as there have been a lot of responses, I've tried to
collect them all in one place. Interesting how much
attention four lines of code can receive.

According to Randy Howard:
[edited to use k for index var]

> int k;
> int hash_value = 0;
>
> for (k=0; k<strlen(string);k++)
> hash_value = (hash_value + string[k]) % TBL_SIZE;
>
> How many things can you find to comment about in those few lines?

First of all, I've substituted k above, since it shouldn't have
the same newsreader/spellchecker affliction as in the original.
Comments about capitalization have been skipped.

It's also fair to assume (since it was originally in the prof's
assignment) that string has type char *, comments about that
have been skipped.

Comments about indenting have been skipped.

I've taken the remaining comments and sort of folded them all
together with fairly heavy editing.

1 strlen() is likely to be recomputed many times, thus making the
  whole algorithm quadratic wrt to string length. This can be
  prevented by precomputing strlen() once outside the loop. One
  cannot rely upon the compiler to do this magically. There may
  be some C compiler in existence that will do so, but this has not
  been demonstrated (yet).

2 one could replace "k<strlen(string)" by "string[k] != '\0' ".

3 Doing the % inside the loop might be slow also. However, doing %
  outside the loop means you have to cope with possible arithmetic
  overflow inside it.

4 As values are extracted, they may yield negative values when
  "char" is a signed type. "hash_value" may end up with a negative
  value, which would be bad if that value is used for indexing an
  array (probable).

5 Depending on the architecture, size of int, size of char and value
  of TBL_SIZE, the addition may yield a signed integer overflow.

6 Since C mandates "truncation towards 0", modulus and division by
  constants are a bit more complicated on signed integers (for
  instance, a division by 2 cannot be implemented with a simple
  shift: it yields the wrong result with some negative operands).
  Making "hash_value" an unsigned number makes things easier for
  the compiler to produce faster code.

7 "string" is a name which is reserved in standard C

8 "int", of course, is not very good for an array index. (size_t)

9 Using a signed integer type, especially if it's going to be summing
  with no range reduction within the loop.

10 Failure to bracket the "then" statement in {} could cause problems
  during program maintenance.

11 space characters aren't in short supply. Readability suffers.

12 There might be an old compiler somewhere for which "hash_value"
  is too long.

13 The algorithm computes values which may be quite biased for
  typical strings (it depends on what strings are used, of course).
  An almost infinite variety are available in the literature.
  Suffice it to say that the details would be best left to the
  implementer after learning the nature of the expected data set
  for the target application.

A fairly long list for so short a code fragment. :-)

-- 
Randy Howard (2reply remove FOOBAR)
"Making it hard to do stupid things often makes it hard
 to do smart ones too." -- Andrew Koenig


Relevant Pages

  • Re: [EGN] Hoisting Loop Invariants (Was: Re: [EGN] Numerical Accuracy)
    ... compiler out there somewhere that did as you claim. ... > the programmer has this knowledge, then the programmer should not use ... >> string in a loop, regardless of the blatant inefficiency of doing so. ...
    (comp.programming)
  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... The MAXIMUM size of a working string, accessed in its entirety, is ... compiler would generate a loop), ... > natural incompetence in the field of programming. ...
    (comp.programming)
  • Re: HELP, INFINIT LOOP... simple LINKED LIST
    ... > producing an infinit loop i think, but i can't figure out where.. ... let your compiler catch the obvious ... > typedef struct node ... > String Data; ...
    (comp.lang.c)
  • Re: Fast way to read string one char at a time
    ... > implementing the foreach loop on a string, I somehow expected it to be ... > implemented using some compiler magic. ...
    (microsoft.public.dotnet.framework.performance)
  • extension_pack
    ... It is used to set upper loop -- limits for non-deterministic values thus avoiding the use of access -- types and enabling the functions to be used for synthesizeable code. ... DivisorVal: integer) return std_logic_vector; function "/"(DividendVal: string; DivisorVal: integer) return std_logic_vector; ... for loopVar in 0 to slvVal'length/4-1 loop ... end loop; if then return not resultVar; -- "width mismatch" errors here are due to improper sizing of the vector that this function is assigned to else return resultVar; -- "width mismatch" errors here are due to improper sizing of the vector that this function is assigned to end if; ...
    (comp.lang.vhdl)