Re: SSN encryption

drfremove_at_nber.org
Date: 09/29/05


Date: 29 Sep 2005 13:01:26 -0700

Peter Pearson wrote:
> drfremove@nber.org wrote:
>
> > We want to encrypt social security numbers in a database.
>
> You've triggered Pearson's predictable "clarify your requirements"
> lecture. Regular sci.crypt readers can move along.
>
> You need clarity in your requirements. Without clarity,
> (a) you won't get useful help, and (b) you can't tell
> whether or not the final design meets your requirements.
> If you're not sure where you're going, you'll never know
> whether you've arrived.
>
> "We want to encrypt X" is not a requirement; rather, it's
> an implementation suggestion. A good requirement sounds
> like this: "It should be computationally infeasible for
> someone knowing A, B, and C, but not knowing D, to guess
> the value of E with probability of success greater than F."
>
> If you can't separate the data-security function from the
> rest of your application cleanly enough to articulate the
> necessary requirements, then you'll have to hire a cryptographer
> who can study the whole application.
>
> So .... you have a database that includes SSNs. It appears that
> you're using the SSN as an index into the database, and that
> you want to deny somebody the ability to extract certain
> information from the database.
>

In the United States, there are many datasets whose content need not be
kept secret, as long as the content cannot be associated with a
particular individual. For example, the census department releases a
public use dataset, which anyone can purchase, that includes much
private information on a 1% sample of all households. Since none of
the information included allows one to identify exactly which household
is referred to, there is no violation of confidentiality. Most
government sponsored surveys promise respondents that their responses
will not be released in "identifiable" form. The content can be
released, but not with identifiers or with sufficient detail that
individuals could be identified. So often birth year will be included,
but not birth day. Name, address and SSN are always excluded. The
national archives and the census department offer hundreds of such
surveys for sale.
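
As a rough illustration of that kind of release rule, here is a minimal
sketch that drops direct identifiers and coarsens a birth date to a birth
year. The field names (name, address, ssn, birth_date) and the sample
record are made up for the example; real public use files follow far more
detailed disclosure rules than this.

```python
from datetime import date

# Hypothetical field names; the text above only says that name,
# address and SSN are always excluded.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}


def deidentify(record):
    """Return a copy of record without direct identifiers,
    keeping only the year of the birth date."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    birth = out.pop("birth_date", None)
    if birth is not None:
        out["birth_year"] = birth.year
    return out


# Invented record; area number 000 is never issued as a real SSN.
record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "ssn": "000-12-3456",
    "birth_date": date(1950, 3, 14),
    "income": 41000,
}
public = deidentify(record)
```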

Now we have a dataset which is not actually public use, but is used by
a dozen or so researchers who have signed confidentiality agreements.
So there is no requirement that the SSNs be hidden. However, we have
been asked if it wouldn't be possible to remove the identifiers from
the research datasets, as an extra level of protection. That way an
accidental release wouldn't be quite so serious, since an intruder
would have difficulty identifying whom any particular record
referred to. It seems to be a reasonable request, and several
government agencies have published statements that they do this with
similar datasets. We are interested in following that lead.

> Why not just encrypt the entire database with a secret key?
>

Then we would have to give each researcher the secret key, and they
would have to work with the unencrypted dataset, leaving it possibly
exposed if there were a breakdown in the other security precautions.

Encrypting a dataset that is in constant use doesn't seem to accomplish
much. I understand it might be effective for backup tapes - then the
tapes could be stored in an insecure location. But if the data is used
every day, then the unencrypted dataset will always be available, and
the fact that there is another encrypted copy doesn't seem to be very
significant.

The statistical software used in analysis does not support encryption.

> Is the database shared? If so, how much are the users
> trusted? Can they share a secret key? Must they all be
> able to modify the database?

The database has several users. They are trustworthy but may possibly
make mistakes. They are subject to "shoulder surfing", etc.

They do not update the database, but they make extracts and merge
extracts from multiple tables for analysis. These temporary tables
persist for weeks or months, and may be used frequently.

Now perhaps you can see the logic to my request. If the SSN can be
suitably hashed, then the original records including the SSN can be
stored offline, and analysis can go on as before, with table merges
easily done on the hashed SSN, but the accidental release of a record
will not include an individual identifier, mitigating concerns about
violations of outer security barriers.
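
A minimal sketch of that workflow, assuming a keyed hash (HMAC-SHA256)
is an acceptable stand-in for "suitably hashed". The key, the table
contents and the column name "pid" are all invented for illustration. A
key is used because a plain unkeyed hash of a 9-digit number could be
reversed by simply trying all 10^9 possible SSNs.

```python
import hashlib
import hmac


def pseudonymize_ssn(ssn, key):
    """Map an SSN to a keyed hash; the same SSN always gives the same value."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()


def strip_ssn(rows, key):
    """Replace the 'ssn' column of each row with a 'pid' pseudonym column."""
    return [
        {"pid": pseudonymize_ssn(r["ssn"], key),
         **{k: v for k, v in r.items() if k != "ssn"}}
        for r in rows
    ]


def merge_on(left, right, field):
    """Inner join of two lists of dicts on a shared field."""
    index = {row[field]: row for row in right}
    return [{**row, **index[row[field]]} for row in left if row[field] in index]


key = b"example-key-kept-offline"  # placeholder; a real key would be random
earnings = [{"ssn": "000-12-3456", "income": 41000},
            {"ssn": "000-65-4321", "income": 52000}]
health = [{"ssn": "000123456", "visits": 3}]

# The research tables carry only the pseudonym, yet still merge correctly.
merged = merge_on(strip_ssn(earnings, key), strip_ssn(health, key), "pid")
```

The originals with real SSNs, and the key, would stay offline; only the
stripped tables would be handed to the analysis software.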

From a cryptographic point of view, I thought the main difficulty is
that the intruder can probably identify himself in the database, and
perhaps some close relations, if he knows enough about them. This would
give him plaintext for several encrypted fields, and we wouldn't want
that to be sufficient to recover the key. Practically, it would be nice
if the hash were no more than 9 characters, so we didn't have to modify
any of the documentation when we replaced the field. (Beyond pointing
out that SSN was encrypted).
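
One way to picture the 9-character constraint, as a sketch only: truncate
an HMAC-SHA256 value to 9 base32 characters. HMAC is designed so that
seeing known input/output pairs should not reveal the key, which is the
known-plaintext concern above. The catch is that 9 base32 characters
carry only 45 bits, so two different SSNs can occasionally map to the
same value; the sketch checks for that while building the offline lookup
table. The function names and keys are hypothetical. A collision-free
9-digit mapping would need a format-preserving cipher instead, which is
not shown here.

```python
import base64
import hashlib
import hmac


def normalize(ssn):
    """Strip punctuation and check that nine digits remain."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return digits


def short_pseudonym(ssn, key, length=9):
    """Keyed hash of the SSN, truncated to `length` base32 characters."""
    mac = hmac.new(key, normalize(ssn).encode("ascii"), hashlib.sha256).digest()
    return base64.b32encode(mac).decode("ascii")[:length]


def build_table(ssns, key):
    """Build the pseudonym -> SSN table kept offline, refusing collisions."""
    table = {}
    for ssn in ssns:
        digits = normalize(ssn)
        pid = short_pseudonym(digits, key)
        if pid in table and table[pid] != digits:
            raise ValueError("pseudonym collision; choose a new key")
        table[pid] = digits
    return table
```

If build_table raises, picking a fresh key and rebuilding is one simple
way out, at the cost of re-deriving every pseudonym.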
>
> --
> Peter Pearson
> To get my email address, substitute:
> nowhere -> spamcop, invalid -> net

Hope this is enough information to make clear what we are looking for.
It really isn't that obscure.

Daniel Feenberg
feenberg isat nber dotte org


