- From: Dimona <mmm@xxxxxxx>
- Date: Fri, 26 Jun 2009 21:27:05 +0200
Find everything you need copied and posted below, no need to visit any website:
Response Part 2: Complexities of Key Management
Published by jresch on June 24, 2009 in Uncategorized.
In a recent post we discussed some drawbacks of encryption. We received a lot of feedback, both negative and positive. Now we’d like to explore some of the math behind that claim and clarify some of the points we made.
This is the second post in a 3 part series. We will discuss the following points in these three posts:
Today’s encryption systems are designed to provide a sufficient level of security for protecting against today’s threats. These systems are not designed to provide security for long term storage over decades. The security benefits of Dispersal are more resistant to advances in computing power and mathematical discoveries.
Today’s storage systems generally make a trade-off between confidentiality and availability. Systems which accomplish both do exist, but often at the expense of low storage efficiency. Dispersal allows the user to achieve high levels of confidentiality and availability while minimizing storage inefficiency.
Given current laws, companies must disclose data loss regardless of whether data was encrypted or not. Dispersal changes the conversation because a typical loss event will result in the original data being mathematically impossible to recover from any lost components.
Confidentiality vs. Availability
When storing important confidential data, one has two goals:
Availability: The data should always be available to the authorized entities, and never be lost
Confidentiality: The data should be kept private and inaccessible to unauthorized entities
So simultaneously we must keep the data as available as possible to the right individuals while also keeping it as unavailable as possible to the wrong individuals. Traditionally, confidentiality is achieved using encryption. By encrypting the data all efforts toward confidentiality can focus on the key, because encrypted data remains private to someone who doesn’t have the key.
One could stop here, having achieved a decent level of confidentiality but only if one doesn’t care about losing the data. In the current state, such data would be highly vulnerable to loss. If the hard drive, CD-ROM, thumb drive or whatever media storing the key is lost, breaks or becomes corrupted then the encrypted data will remain forever irretrievable. Likewise any similar problem with the media storing the encrypted data will cause a loss of data.
One way to deal with the threat of loss in this situation is to make multiple copies. Backup the encrypted data to tape, or to another off-site location. Likewise store multiple copies of the key in different locations to prevent the failure or loss of any one device from causing a loss of data. In doing so one will have achieved a decent level of availability.
In making all these copies to maintain availability, what have we done to our confidentiality? We now have multiple copies of encrypted data, and multiple copies of the key. Each copy represents another attack vector for an adversary, and another thing which attention must be focused on protecting. The more we try to enhance availability the worse things become for confidentiality.
If only we could have the best of both worlds: a high level of availability AND confidentiality.
Secret Sharing schemes promise just that: high availability and confidentiality. They have been known for a long time, having been invented independently by Adi Shamir and George Blakley in 1979, and some key management systems use it for storing keys.
The basic operation of a secret sharing scheme is to take a secret, in this case a key, and create some number of shares from it. To recover the secret, some user-defined minimum number of shares must be used in the calculation. This provides high availability; a number of shares can be lost and so long as you can obtain the minimum number required, you can get the key back. For example, lets say we created 5 shares for a key, and made the minimum number required for recovery 3. In such a case we can tolerate the loss of any two shares, much as if we had made 3 copies but without the associated loss in confidentiality.
In fact, the security of storing the key as shares greatly enhances the confidentiality, beyond that of storing a single copy of a key. Again using the 5/3 (5 shares, 3 needed) example, if a single share is exposed, stolen, or otherwise compromised the privacy of the key is maintained. Even if two shares are simultaneously revealed, the confidentiality of the data is maintained, because the threshold of 3 is not met. The practicality for an adversary to locate and steal multiple shares from potentially different locations is much harder to pull off than gaining access to a single key at a single location.
Information Theoretic Security
Another benefit of secret sharing schemes is that they provide information theoretic security. This means that with less than the threshold number of shares, there simply isn’t enough information available to figure out what the secret is, even if one had infinite computing power or time to try to crack it. To provide this level of security, however, comes at a cost: Every share needs to be the same size as the original secret. This is generally not a problem when storing something small like a key, which is usually less than a Kilobyte in size, however it makes secret sharing schemes impractical for bulk data storage.
To see why, imagine you are designing a secure storage system which needs to store 500 GB of data. You learned of secret sharing systems and want to use one to securely store that 500 GB of data. Lets say you decide to create 5 shares of which any 3 are needed to recover the data. To do this will require buying 2,500 GB of storage, since each of the 5 shares will be of equal size to the data being stored. Therefore you have a 5-fold increase in storage requirements and therefore a 5-fold increase in the cost of the system.
Wouldn’t it be great if there were a way to store information efficiently while keeping the availability and confidentiality of secret sharing schemes?
Dispersed StorageTM Technology
Dispersal, when combined with an All-or-Nothing Transform (AONT) makes such a thing possible. To achieve this level of efficiency requires that we abandon information theoretic security, but the practical benefits of information theoretic security are minor. One-Time-Pad encryption is a type of encryption which is provably unbreakable, because it provides information theoretic security, but like secret sharing, its theoretical benefits simply don’t outweigh its practical costs when used for bulk data encryption. Given that the level of security provided by the AONT can be set arbitrarily high (there is no limit to the length of key it uses for the transformation), information theoretic security is not necessary as one can simply use a key so long that it could not be cracked before the stars burn out.
By dropping the requirement for information theoretic security we can achieve the highest theoretically possible efficiency for the given level of fault tolerance. Cleversafe is unique in providing information dispersal with an AONT and was the first to make secret-sharing-like systems for efficient bulk data storage. Unlike shares, slices are only a fraction of the size of the original data, specifically they are 1/threshold the size. If we again look at the 5/3 example, instead of storing 5 shares each of which is 500 GB, we would instead store 5 slices, each of which is (500/3) or 167 GB. Therefore the total storage requirements would be 5*167 or 835 GB, this is 1/3rd the cost of the 2,500 GB system.
The availability and confidentiality benefits of dispersal become even greater the wider the dsNet. A common configuration we recommend is 16/10, that is 16 slices, with a threshold of 10. This setup has greater availability than say the 5/3 example because we can tolerate the loss of 6 slices simultaneously, compared with 2 for the 5/3 setup. Furthermore it becomes more confidential because an attacker would need to compromise 10 slices, not just 3. Surprisingly, the system is even more efficient, a 16/10 configuration would need 16 50 GB slices, so it needs only 800 GB not 835. Note that this is less costly than creating just a single copy which would require a total of 1,000 GB.
With Dispersed StorageTM Technology there are fewer headaches than with existing key management systems. This is true for a number of reasons. The first is that one no longer has to worry about having to trade availability for confidentiality, or vice versa. One need not choose which is more important; both are important, and the storage system should reflect that fact. Dispersal provides the high availability and confidentiality of secret sharing schemes, and it can do it for your data directly. No separate system for storing keys is needed, it would be superfluous and only hurt availability.
Availability will always be detrimentally affected by adding a key management system, because it is one more thing which can fail. If the key management system fails and you lose your keys, then you’ve also lost all your encrypted data. Short of replicating the key many times or storing shares in many locations, existing key management systems cannot match the availability that information dispersal provides. Therefore the key management system will be the weakest link in the chain and the more likely path to data loss.
The All-or-Nothing Transform merges the key and the data such that only one storage system is required, it meets both the confidentiality and availability requirements. The AONT is also not vulnerable to advances in math or quantum computers as RSA and ECC are, nor does it at any point rely on passwords which can be easily cracked or forgotten.
Generate random symmetric key: R
Calculate hash of encrypted data: H
Append (H XOR R) to the encrypted data
Information Dispersal (configured with N/K)
Add padding to data up to next multiple of K bytes (using PKCS5)
Divide padded data into K equal pieces, these are the first K slices
Using Reed-Solomon codes, compute (N-K) additional slices to provide forward error correction
Send each of the N slices to a different storage node
Information Dispersal (configured with N/K)
Request slices from any K of N storage nodes
Using Reed-Solomon codes, compute any of the first K slices that are missing
Concatenate the first K slices in the correct order
Remove padding from the concatenated data
Strip the appended masked key from the end of the data
Calculate the hash of the encrypted data: H
XOR the hash with the masked key to recover the random key: R
Use the key to decrypt the data
Ordinarily the availability world be hurt by using an All-or-Nothing transform and storing just the divided pieces in different locations, this is where the IDA comes to the rescue. By creating additional code slices we can recover from situations in which some of the data slices are missing or otherwise not available. Just how many are required is determined by the user-configurable threshold, which in the above diagrams is 5. So so long as any 5 are still recoverable then the entirety of the All-Or-Nothing Transformed data can be recovered, and thus the original data can be as well.
Looking at how the decode operation works, one can see the criticality in having ALL of the data. Without all of the data in its entirety one cannot compute the digest of the encrypted data, and therefore one cannot un-mask the random key. Without knowledge of the random key NOTHING of the original data can be decrypted.
It may seem magical that all these goals can be achieved, but the benefits are not without cost. Dispersed StorageTM Systems have requirements beyond those of most existing storage systems. To achieve the full benefits of dispersal requires infrastructure: multiple sites, high bandwidth connections, at least as many computers as the IDA’s configured width, memory for managing TCP connections, and CPU power for encoding and decoding slices. For these reasons dispersal may not be an option for everyone, however if one has massive storage requirements, desires the utmost reliability or security, or already has the existing infrastructure then dispersal may be the ideal storage solution.
Please stay tuned for the final blog posts in this series where we will explore disclosure laws and their impact on businesses.
« Response Part 1: Future Processing Power
2 Responses to “Response Part 2: Complexities of Key Management”
Feed for this Entry Trackback Address
June 26, 2009 at 8:31 am
As a follow-up of my questions on the previous article, i have some comments:
1. As stated in the Ron Rivest paper about AONT, “We note that the all-or-nothing transformation is not itself “encryption”, since it makes no use of any secret key information”. The interesting property is that if you do not have ALL the data, you cannot decrypt it.
2. At the end, there is no secret sharing scheme, such as Shamir or Blakley. The IDA scheme is the application of Reed-Solomon codes.
3 I’m not sure but i think that because you can reconstruct the data without having it all (but having several blocks on different copies) that collides with the AONT.
May you please give me your analysis on these 3 points?
June 26, 2009 at 9:07 am
In point one, I think what Rivest was saying is that the AONT state is easily reversible, so long as no data is missing. In that sense the data isn’t even really encrypted, since there is no additional key involved. The security in our system comes from the way that without a threshold number of slices one cannot get all the data.
Right we do not use a secret sharing scheme like one proposed by Shamir or Blakley. The secret schemes they proposed are threshold based schemes which rely on information theoretic security to ensure that with less than a threshold number of shares no information can be obtained. Instead we use Reed-Solomon codes to provide the threshold system, and the AONT to ensure that with less than a threshold number of slices it is very difficult to get any information about the original data. By difficult, I mean as hard as brute forcing the random symmetric key used in the transform.
Regarding your final point you are right. We start with the All-Or-Nothing Transformed data, but the IDA makes it such that the data becomes a “Some-Or-Nothing” scheme, in that some threshold number of slices is needed to get all of the original All-Or-Nothing transformed data. The number of bytes needed to get to that point is still the same as the the length of the transformed data, we just have more options as to where we may retrieve slices from. Without the IDA, splitting the transformed data would always be an N of N system, the IDA allows it to become a K of N system much like the secret sharing schemes proposed by Shamir and Blakley, but without the massive overhead those schemes have.
I hope this clarifies things for you, please let me know if you still have any lingering questions.
- Prev by Date: Encryption Response Part 2: Complexities of Key Management -- http://bit.ly/dcyeo
- Next by Date: Re: Tips to keep your PC from crashing
- Previous by thread: Encryption Response Part 2: Complexities of Key Management -- http://bit.ly/dcyeo