Re: Can SHA-1 produce dupe hash values?

From: Damian Menscher (menscher+crypt_at_uiuc.edu)
Date: 09/30/04


Date: Wed, 29 Sep 2004 22:42:35 +0000 (UTC)

car <car_member@newsguy.com> wrote:
> Hi, all! I am in an area that processes customer transaction records. To keep
> me (the data collection and manipulation service) from being able to see real
> customer IDs, the transaction source has implemented SHA-1 with "salt" to
> produce one-way hash text. I see the resulting hash value as the customer ID.

> I have been told that the way they are encrypting the customer ID, the resulting
> text will be consistent and distinct. They said it is practically impossible to
> get the same hash value for two different IDs. Is that true of the
> implementation of SHA-1 with the same salt every time? Does it matter what the
> length of the customer id is (too short, too long)?

> I am on a quick deadline to implement the handling of the customer IDs (all old IDs
> have to be deleted), so I do not have much time to check out SHA-1 in depth, but
> I thought I remember hearing that one-way hashes could produce the same output
> for two different inputs - is that true? I just want to know how sure I can be
> that these hash texts will uniquely identify one and only one customer. It is
> OK for me to associate a customer's data together for analysis (customers who
> bought x also bought y within 30 days, etc), the restriction is on my putting
> transactions to an actual named John Doe. Since we got rid of the customer
> detail tables, the restriction is a moot point.

> Funny, my question is not how secure is the method, but instead how reliably
> unique is the result of the method...

I'm not a cryptographer, but since nobody else is answering, I'll
give it a shot:

As you say, there can be collisions. However, random collisions are
governed by the so-called "birthday" bound: you would need roughly
2^80 distinct inputs (80 is half the SHA-1 hash size of 160 bits)
before an accidental collision becomes likely. So, you should
be fine until you get about 1,000,000,000,000,000,000,000,000
customers.
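
To put a number on that, here is a minimal Python sketch of the usual
birthday approximation, p ~ 1 - exp(-n(n-1) / 2^161), for n random
160-bit hashes. The function name is made up for illustration, and it
assumes SHA-1 output behaves like uniformly random bits, which says
nothing about deliberately constructed collisions.

```python
import math

HASH_BITS = 160  # SHA-1 output size in bits


def collision_probability(n_items: int, bits: int = HASH_BITS) -> float:
    """Approximate chance that at least two of n_items random hashes collide.

    Uses the birthday approximation p ~ 1 - exp(-n(n-1) / 2^(bits+1)).
    Assumes hash outputs are uniformly random (an idealization).
    """
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


# A billion customers versus the 2^80 birthday bound
p_billion = collision_probability(10**9)
p_bound = collision_probability(2**80)
print(f"10^9 customers: {p_billion:.3e}")
print(f"2^80 customers: {p_bound:.3f}")
```

Even at a billion customers the estimate stays far below anything worth
worrying about; it only becomes sizable near 2^80 inputs.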

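Since you're not worried about security, only uniqueness, the length
and salt probably don't matter: SHA-1 always produces a 160-bit
output regardless of the input length, and the same salt with the
same ID always gives the same hash.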
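
As a rough illustration of the consistency point, the sketch below hashes
IDs with a fixed salt using Python's hashlib. The salt value, the function
name, and the choice to prepend the salt to the ID are all assumptions; the
original post does not say how the transaction source combines them.

```python
import hashlib

# Hypothetical salt; the real one is known only to the transaction source.
SALT = b"example-salt"


def hashed_customer_id(customer_id: str, salt: bytes = SALT) -> str:
    """Return the hex SHA-1 digest of salt + customer_id (assumed scheme)."""
    return hashlib.sha1(salt + customer_id.encode("utf-8")).hexdigest()


short_hash = hashed_customer_id("7")          # very short ID
long_hash = hashed_customer_id("7" * 500)     # very long ID

# Same salt and same ID give the same hash every time.
print(hashed_customer_id("7") == short_hash)
# Output length is 40 hex characters (160 bits) whatever the ID length.
print(len(short_hash), len(long_hash))
```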

Damian Menscher

-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher@uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-

