Re: Paper & pencil password algorithm



Maaartin <grajcar1@xxxxxxxxx> wrote:

TO Ilmari Karonen: Thank you for your support.

Agreed. Thanks.

James Taylor wrote:

Really? This non-linear quality must be something a bit deeper than I
thought.

You should find the definition of linear on the internet and look at
it first. What I wrote were only some consequences important in this
context, the definition is the base.

For the things to be even more confusing, normaly you define
http://en.wikipedia.org/wiki/Linearity
over a
http://en.wikipedia.org/wiki/Vector_space
which itself is defined over a
http://en.wikipedia.org/wiki/Field_(mathematics)
but actually sometimes we need to speak about linearity over an
algebraic structure similar to vector space but defined over a
http://en.wikipedia.org/wiki/Ring_(mathematics)
So I'll write only "structure" in the following.

Well, I've been doing a lot of reading over the last few days, of the
above links and many others besides. While I really don't understand it
all, I think I may have absorbed by osmosis some level of partial
understanding, or at least an illusion of understanding.

I find this whole subject area much more fascinating than I would have
imagined, and I wish I had started studying this all much earlier in
life. Sadly, given that I have to earn a living, and have many other
goals to achieve before I die, I think I'm going to have to limit how
far down this rabbit hole I explore.

That's fascinating, I wish I could see how that linear
multiplication works. Seeing that might help me understand
the essential property required for non-linearity.

Look again at
a + b + 2*c = 9
a + b + c = 6
5*a + b + c = 10
which is the same like
a + b + c + c = 9
a + b + c = 6
a + a + a + a + a + b + c = 10
and compared it to
a * b * c * c = 512
a * b * c = 64
a * a * a * a * a * b * c = 1024

Hmmm, I notice the right sides are all powers of two:
512=2^9, 64=2^6, 1024=2^10.

So the multiplication equations could be rewritten as:
a * b * c^2 = 2^9
a * b * c = 2^6
a^5 * b * c = 2^10

Taking logs of both sides (log base 2 for convenience):
log(a) + log(b) + 2*log(c) = 9
log(a) + log(b) + log(c) = 6
5*log(a) + log(b) + log(c) = 10

This looks sufficiently similar to the additive equations above that I
think I can say with confidence that:

log(a) = 1 => a = 2^1 = 2
log(b) = 2 => b = 2^2 = 4
log(c) = 3 => c = 2^3 = 8

Which one is easier? How much easier?

Well, other than taking logs at the start and exponents at the end, it
worked exactly the same way. Oh, I guess that proves your point that the
multiplications, in this particular case, can be solved as a set of
linear equations.

Smart combining operation from different groups is the key to strength.


Ah, ok. But then why isn't the combination of fold and sum sufficient?

No, cause fold - in this sense - does nothing.

Nothing? Surely a permutation is not nothing. Perhaps you mean it only
does another operation in the same vector space, and thus both
operations can be represented by a single matrix or a single set of
linear equations.

Can you write down matrices for sum and fold? And is there any matrix
for foldsum?

I would love to know the answer to that. I think a permutation matrix
might be simply lots of zeros with a 1 once in each row, but it's
trickier to imagine a matrix that performs an ordered accumulative sum
of the digits with an additional one added in on the leftmost digit. I'm
not even convinced it's possible.

Are they not different operations? What kind of smart combining do I
need?

Some examples:
- addition and xor - but there's nothing like xor in decimal
- addition and multiplication
- addition and data dependent permutation
- addition and non-linear substitution

The latter two ideas sound interesting. How would data dependent
permutation work? Could that be as simple as changing the start point of
the fold depending on the odd/even-ness of the first digit? For example:

12345678 starts with a 1, an odd number, so do "odd" fold
1.2.3.4.
8 7 6 5
18273645

23456789 starts with a 2, an even number, so do normal "even" fold
.2.3.4.5
9 8 7 6
92837465

Would that be sufficient, when combined with the summing pass, to be
non-linear? If so, I think that would not add much at all to the time
required, and so it would be a worthwhile trade-off of effort for
security gained.

Presumably a non-linear substitution would be any substitution that
couldn't be represented by adding a constant to each digit. So I guess
this:

0123456789
4567890123

would be linear, but this:

0123456789
5061728394

would not be linear. Am I right?

In any case, such substitutions don't seem to add much in the way of
diffusion/mixing, while adding a fair bit of cognitive load. I therefore
prefer the permutation technique.

Now pls try this:
3*a + 5*b = 3
2*a + 8*b = 8
You can surelly get a real-valued solution. But what about solving it
modulo 10?

I'm willing to make the effort to learn, but I'm starting to think
you're giving me too much homework to do Martin! ;-)

The 3 and 8 on the right hand side of each equality are really multiple
possible values. In place of the 3 it is (3 or 13 or 23 or 33 or...) and
in place of the 8 it is (8 or 18 or 28 or 38 etc...). That's like having
two more unknowns. The equations could be re-written as:

3*a + 5*b = 3 + 10*x
2*a + 8*b = 8 + 10*y

It seems that we need two more equations in order to fix the two extra
unknown non-negative integers x and y. We can put an upper bound on x
and y because in the worst case (where both a and b are 9) we can see
that x is at most 6, and y is at most 8. Maybe I should just see where a
bit of substitution gets me:

3*a + 5*b = 3 + 10*x => a = (3 + 10*x - 5*b)/3
Substituting in the second equation:
2*(3 + 10*x - 5*b)/3 + 8*b = 8 + 10*y
2 + 20/3*x - 10/3*b + 8*b = 8 + 10*y
(8 - 10/3)*b = 8 - 2 + 10*y - 20/3*x
b = (6 + 10*y - 20/3*x)/(24/3 - 10/3)
b = (6 + 10*y - 20/3*x)*3/(24 - 10)
b = (18 + 30*y - 20*x)/14
b = (9 + 15*y - 10*x)/7

Now let's tabulate b against possible values of x and y and look for
non-negative single digit integer values of b:

7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
x
0 1 2 3 4 5 6
+---------------------------------------------------
0 | 9/7 -1/7 -11/7 -3 -31/7 -41/7 -51/7
1 | 24/7 2 4/7 -6/7 -16/7 -26/7 -36/7
y 2 | 39/7 29/7 19/7 9/7 -1/7 -11/7 -3
3 | 54/7 44/7 34/7 24/7 2 4/7 -6/7
4 | 69/7 59/7 7 39/7 29/7 19/7 9/7
5 | 12 74/7 64/7 54/7 44/7 34/7 24/7
6 | 99/7 89/7 79/7 69/7 59/7 7 39/7
7 | 114/7 104/7 94/7 12 74/7 64/7 54/7
8 | 129/7 17 109/7 99/9 89/7 79/7 69/7

There are only four possibilities:

b=2, x=1, y=1 [A]
b=2, x=4, y=3 [B]
b=7, x=2, y=4 [C]
b=7, x=5, y=6 [D]

Now substituting back into: a = (3 + 10*x - 5*b)/3

[A]: a = (3 + 10*1 - 5*2)/3 = 1
[B]: a = (3 + 10*4 - 5*2)/3 = 11
[C]: a = (3 + 10*2 - 5*7)/3 = -4
[D]: a = (3 + 10*5 - 5*7)/3 = 6

So solutions [A] and [D] both fit the equations with a single digit
result.
Either:
a=1 and b=2
or:
a=6 and b=7

So it seems, even with the additional restriction of a and b both having
to be single digit integer values, there was not enough information in
the two equations given to establish beyond doubt the value of a and b.

What is the lesson you wanted me to learn from this?

In what sense is a mapping from one set of things to another
set of a different type be said to be linear or not?

Just simple: Imagine a "natural order" of letters, i.e., ABCDE....
Let your transformation be like
A=00, B=01, C=02, ...
Then http://en.wikipedia.org/wiki/Caesar_cipher corresponds with a
addition of three, so it's (affine) linear.
But a better transform is not.

I can see how a mapping between two equivalent sets can be linear or
not. For example:

ABCDEF or 123456 but not ABCDEF nor 123456
EFABCD 345612 BCADFE 365124

but I don't see how the following mappings are linear or not:

ABCDEF ABCDEF 123456 123456
123456 152634 ABCDEF FBCADE

They are just one set of symbols mapped to another. There is no inherent
ordering in the symbols unless you choose to impose one, and no
relationship between the symbols until you define it. They're just
symbols.

If both add and xor are linear,

They are, but not over the same structure, you can see it here:
Try to simplify
a + b + c + a

2*a + b + c

a ^ b ^ c ^ a

b ^ c (the a's cancel)

(a ^ b) + (c ^ a)

No simplification possible.

(a + b) ^ (c + a)

No simplification possible.

Ok, I think you made the point.

So, perhaps we can hide the state of the Lagged Fibonacci Generator
(LFG) by running it in a separate parallel set of digits instead of
running it inline.

Without any OBVIOUS disclosure.

Actually, WITH obvious disclosure.

I think extending the hash is the most complicated step if it should
be secure.

Yes, I agree. And the more I try to think of ways around it, the more I
agree.

You asked me about the timing of the individual parts of the algorithm.
I haven't yet had time to really test this rigorously, but I intend to
do so, and to compare the various ideas we have had so far. This
includes appending a fixed phrase *before* hashing, and other schemes. I
will report back with the results.

By extending the hash you always get a longer password but expose you
key.

Yes, that is the essential problem and it seems unavoidable.

I have to keep reminding myself not to get too carried away with making
this system uncrackable. For most of the threats I mentioned (spyware,
phishing, keyloggers, crooked site admins, etc) the collector of
passwords will not know there is an algorithmic relationship and, even
if they did, they would not know the exact naming scheme used, so they
wouldn't have accurate plaintext names to go with each cipher password.

You mean the URL's? I'm quite sure the spyware would collect them
nicely paired with passwords.

No, I have never intended to use the full URLs. A short name like
"google" or "amazon" is enough, and maybe you'd go as far as including
the rest of the domain name if there was a likelihood of confusion. For
example nic.uk is different from nic.us and would require a different
password. After the name you would likely add a version number or date,
perhaps a username too if you had more than one account at the given
site. The exact convention you choose to use would vary between each
individual using this password system, so an attacker would be unlikely
to guess it with 100% accuracy.

Furthermore, they wouldn't be motivated to crack the algorithm because
the victim is just one of many thousands of others using much worse
passwords and reusing them on lots of sites.

Sure, that's a sort of http://en.wikipedia.org/wiki/
Security_through_obscurity#Security_through_minority

Not exactly the same. Security through minority is hoping to be more
secure by using a system used by so few that the hackers don't feel it's
worth designing attack for it. Here, we are hoping that the hackers
won't consider it worth trying to crack our password system because
there are so many easier passwords to guess, and so easier ways to
collect more passwords than by cracking an algorithmic system.

The only threat I think requires this system to be secure is the
possibility of a targeted attack, such as a malicious work colleague
with whom I previously shared the algorithm and who has the ability to

I wouldn't share the final version of the algorithm with anybody. You
can share the ideas, you can teach the algorithm, but you can also
save some steps for yourself.

Yes, perhaps, but my goal is to find a system that *everyone* can use to
improve their password quality and security. If I do keep anything to
myself it will be very minor details, not the basic algorithm. I
wouldn't want secrecy of the algorithm to be necessary for it to be
secure enough for the purpose.

Extending a short hash is much quicker and as long
as my method of extension does not reveal
information about my key, I'll be happy with it.

But than maybe one foldsum round could be more secure.

Are you suggesting that I could append the result of a third foldsum
round to the output of the second round to make a password twice as
long? Wouldn't that also reveal too much about my key?

Remember that the key and hash should be sufficient to make the password
unpredictable to someone who has seen only a handful of passwords, so
the purpose of the extension is only to beat rainbow tables and blind
brute-force.

The main problem is that - if done as described - the extension
reveals your key. Using a different key for this could make it
acceptable.

Of course, the extension would reveal just as much about the 2nd key as
it currently does about the 1st key. The only benefit is that, without
the 1st key, the attacker cannot predict your other passwords. He can
however work out what the hashes were using the 2nd key, and may be able
to reverse them (given the linearity of foldsum) and thus be able to
work out the 1st key too.

The extension doesn't need to be terribly secure in itself.
So, if we can find no better extension mechanism, I'll just add a fixed
suffix to the password, or repeat the password to the required length.

That's fine. Or use something simple and stupid after the conversion
to letters.

I love that idea. It's the best idea so far.

I'm quite sure it's less work than your LFG, but I wrote a very short
program doing it. It uses the following "key"
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
and extends a random 10 characters string to 20 characters. Here's the
output:

5LIwLTE6ti6JMFuMxU7j
G2xGVNd3rzHyWes3HO40
zcjhOTy6yW0kPzzdiU7X
d3Mom0ax0JeNnb14p1yK
JLUtzpor34KV0p4Muqs5
[snip]

Looks great, and not obviously a pattern.

Can you crack it? I think you can,

Well, I spent several hours on it, encouraged by spotting that the 11th
char is always one after the 1st char. However, I was not able to see
the relationship between the 12th char and the 2nd. Please explain how
it works.

but here I used a different key (taken from wiki)

You mean Wikipedia? Please give me the URL.

and the same algorithm:

UOG4QTiTooI9UcwW3HHw
ONU5m6cs8rWIpk5F24oo
mS6b0TLOqqp4tAuOrHWu
uE4e4BtbUui33hIQqRri
eWCEKgUOdEqKBIoNQTWQ
[snip]

Well, as I couldn't see how the first one was working, I cannot judge
how much harder it is to see this one.

What was the key used? Now imagine you know neither the algorithm nor
the key. Maybe it suffices your needs

I think it probably will, but I need to see how it works first so I can
make a judgement. Please describe how it works. Thanks.

Try to break your original LFG idea. This will help you more than
asking and reading.

Yeah right! I'm not sure I could.

Consider the following digit sequence
a ? ? ? ? ? b c
If I got it right, a+b=c holds. Now pair the digits like
a? ?? ?? bc
and use your square and get for example
X ? ? Y
Imagine you see
Z ? ? Y
somewhere. Translate it back to digits like
? ? ? ? ? ? b c
But you know the first question mark, don't you? So you can say that X
and Z lies in the same row. That's bad.

That's clever. Thanks.

I think there's an algorithm doable in five minutes and secure enough,

I share that belief. It may look nothing like what we currently have
though. It may all be doable in a single step based on a two dimensional
representation of the hashing process that doubles up as the
pseudo-random generator.

I don't understand what you mean in the last sentence.

I only mean that a two-dimentional hashing technique might be more
effective than our current one-dimentional ideas, and if we were clever
about the design of it then we could use the final hash-result-grid
itself to generate the random digit sequence, thus saving manual work
*and* not leaking information about the key.

I'm just hand waving, of course, because I haven't yet designed such a
hash technique, it's just a dream. I think 2d would be harder to do
neatly on paper than 1d hashing in separate rows too.

Let me know how long the other steps takes. Maybe we'll find something
simple avoiding the transformation to digits and back.

Yes, that's a possibility, although I really quite like the digits
because (in base 10 anyway) they are easy to manipulate, and also the
Polybius square gives us some nice fractionation.

I will do some timing tests just as soon as I find some time. I have had
to make myself stay up late (3am now) just so I can finish writing this
reply before another week passes. During the daylight I have to get on
with my real work, unfortunately.

--
James Taylor
.



Relevant Pages

  • Re: Paper & pencil password algorithm
    ... of the digits with an additional one added in on the leftmost digit. ... Take one secret digit, let's say 1, an odd number, so write the first ... Multiplying each digit by a constant is linear as well (just think ... I'd encode the message using some constrained character sets: ...
    (sci.crypt)
  • Re: Top 500 is out
    ... Computing pi by a more sensible algorithm will use a couple of dozen ... gives details, describes them as "almost linear time", and specifically ... We give algorithms for the computation of the d-th digit of certain ...
    (comp.arch)
  • Re: Top 500 is out
    ... that these algorithms ... | gives details, describes them as "almost linear time", and specifically ... | We give algorithms for the computation of the d-th digit of certain ...
    (comp.arch)
  • Re: decrementing arrays
    ... A 16 digit number ... "One linear run" ... requirements analysis in 90% of the battle. ...
    (comp.lang.c)
  • Re: Calculating two free axes
    ... You can permuate the coordinates if vn is zero to get ... On paper you solve systems of linear equations easily. ... generate a set of linearly independent vectors huh? ... algorithm to do this while I have. ...
    (comp.graphics.algorithms)