# Password Strength Estimation Using a Markov Model in Haskell
Not an actual library, or even a useful program. Made for fun as a learning exercise.
Running `cabal v2-run` launches a REPL with the following commands (a sample session is sketched after the list):

- `addPassword "password_to_add"`: adds the given password to the model currently in use.
- `createWeights "path_to_training_directory" "charset"`: `path_to_training_directory` is the directory in which `split -l100000 inputfile` has been run, because `inputfile` may be very large, containing more than a few million lines, each holding one password. This may take a long time (about 8 minutes for the RockYou password set, which contains 14 million passwords). The newly created weights are loaded automatically. `charset` is the alphabet from which passwords are constructed, written as one long string, for example `abcdefghijklmnopqrstuvwxyz0123456789`.
- `loadWeights "path_to_weight"`: `path_to_weight` is a weight file previously generated by the program. This loads much faster than creating the weights in the first place (within 20-30 seconds for RockYou).
- `writeWeights "path_to_output"`: `path_to_output` is a path including the filename; `.wt` is appended to the filename automatically. Writing the weights out lets the modified model be loaded much faster next time.
- `passwordStrength "password"`: prints the strength estimate for the given password.
- `exit`: leaves the REPL.
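For illustration, a session might look something like this; the prompt and output formatting are guesses rather than captured program output (the 10.49 figure for "password" comes from the estimation example later in this README):

```
$ cabal v2-run
> loadWeights "resultweight.wt"
> passwordStrength "password"
10.49
> addPassword "hunter2"
> writeWeights "resultweight2"
> exit
```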
This repo contains `resultweight.wt`, an already-created weight file constructed from the leaked RockYou password set. The model can be built up even further by adding passwords from other sources. Fair warning though: this model is incredibly memory hungry. The weight file may be around 100 MB on disk, but upon loading, the program's RAM usage blows up to about 2 GB.
Most passwords are based on some form of natural language, with modifications that are perhaps meant to make them stronger. In natural language, some letters are more likely to follow a particular character than others.
This relation can be made concrete with a Markov chain: the letters are the states, and the edge from 'a' to 'b' records the total number of times a word contained the transition from 'a' to 'b'. To get a useful Markov chain, we must feed in a large number of commonly used passwords. That is the model-creation step in this program.
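As a rough sketch of that step (the names here are illustrative, not this repo's actual internals), a bigram model can be represented as a map from character transitions to counts:

```haskell
-- Sketch of the model-creation step described above. The model is simply
-- a count of every character-to-character transition seen in training.
import qualified Data.Map.Strict as Map

type Model = Map.Map (Char, Char) Int

-- Record every adjacent character pair of a single password.
addOne :: Model -> String -> Model
addOne model pw =
  foldl (\m t -> Map.insertWith (+) t 1 m) model (zip pw (drop 1 pw))

-- Fold a whole training set of passwords into a model.
buildModel :: [String] -> Model
buildModel = foldl addOne Map.empty
```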
One method of breaking passwords is to generate candidate passwords from a Markov model, walking the chain and emitting guesses in decreasing order of likelihood, i.e. when faced with multiple possible transitions, preferring the one with the highest probability. This is much more efficient than brute force, and is a popular method in practice.
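A minimal sketch of the greedy version of that walk, reusing the illustrative `Model` type from the previous sketch (a real cracker enumerates many candidates in decreasing probability order; this version only follows the single most likely path):

```haskell
import qualified Data.Map.Strict as Map
import Data.List (maximumBy)
import Data.Ord (comparing)

type Model = Map.Map (Char, Char) Int

-- The most frequent character to follow c, if any transition from c exists.
nextChar :: Model -> Char -> Maybe Char
nextChar model c =
  case [ (b, n) | ((a, b), n) <- Map.toList model, a == c ] of
    [] -> Nothing
    xs -> Just (fst (maximumBy (comparing snd) xs))

-- Generate a guess of up to n further characters from a seed character.
greedyGuess :: Model -> Char -> Int -> String
greedyGuess _     c 0 = [c]
greedyGuess model c n =
  case nextChar model c of
    Nothing -> [c]
    Just c' -> c : greedyGuess model c' (n - 1)
```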
If we take any password, break it up into n-grams, walk the Markov chain to find the probability of each n-gram, and multiply those probabilities together, we get the probability of that password being generated by the Markov model.
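For the bigram (n = 2) case, that computation might look like the sketch below, where a transition's probability is its count divided by the total count of transitions leaving the same character (again using the illustrative `Model` type, not the repo's real API):

```haskell
import qualified Data.Map.Strict as Map

type Model = Map.Map (Char, Char) Int

-- Probability of the transition (a, b), relative to all transitions out of a.
transitionProb :: Model -> (Char, Char) -> Double
transitionProb model (a, b) =
  let outOfA = sum [ n | ((a', _), n) <- Map.toList model, a' == a ]
      seen   = Map.findWithDefault 0 (a, b) model
  in if outOfA == 0 then 0 else fromIntegral seen / fromIntegral outOfA

-- Multiply the probabilities of every adjacent pair in the password.
passwordProbability :: Model -> String -> Double
passwordProbability model pw =
  product (map (transitionProb model) (zip pw (drop 1 pw)))
```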
Now we take the Shannon self-information (surprisal) of that probability, -log2(P(x)), where P(x) is the probability of x. This is the final value that is returned.
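Continuing the sketch above, the returned value would be computed as:

```haskell
-- The final score: the surprisal -log2 P(x) of the password under the
-- model, reusing passwordProbability from the sketch above. A probability
-- of 0 (a transition never seen in training) yields Infinity here.
passwordStrength' :: Model -> String -> Double
passwordStrength' model pw = negate (logBase 2 (passwordProbability model pw))
```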
For example, the output for "password", one of the most commonly used passwords, is approximately 10.49. This implies that under the given Markov model, "password" can be predicted in about 2^10.49 guesses, which is roughly 1,400.
A score greater than 60 is pretty safe against current cracking hardware.
This could potentially be used with a live website (after some modifications, perhaps by exposing an API). Every time a user registers, their new password is fed into the Markov chain to update its state, so the system keeps learning.
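A hypothetical sketch of that integration, keeping the model in a mutable reference and updating it on every registration (`addOne` is the model-update helper from the first sketch; none of these names are real API):

```haskell
import Data.IORef

-- Feed each newly registered password into the shared model.
onRegister :: IORef Model -> String -> IO ()
onRegister ref newPassword = modifyIORef' ref (\m -> addOne m newPassword)
```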