
Neural Network Techniques for Proactive Password Checking
Angelo Ciaramella, Paolo D’Arco, Alfredo De Santis, Clemente Galdi, and
Roberto Tagliaferri, Senior Member, IEEE

Abstract—This paper deals with the access control problem. We assume that valuable resources need to be protected against
unauthorized users and that, to this aim, a password-based access control scheme is employed. Such an abstract scenario captures
many applicative settings. The issue we focus our attention on is the following: Password-based schemes provide a certain level of
security as long as users choose good passwords, i.e., passwords that are hard to guess in a reasonable amount of time. In order to
force the users to make good choices, a proactive password checker can be implemented as a submodule of the access control
scheme. Such a checker, any time the user chooses/changes his own password, decides on the fly whether to accept or refuse the
new password, depending on its guessability. Hence, the question is: How can we get an effective and efficient proactive password
checker? By means of neural networks and statistical techniques, we answer the above question, developing suitable proactive
password checkers. Through a series of experiments, we show that these checkers have very good performance: Error rates are
comparable to those of the best existing checkers, implemented on different principles and by using other methodologies, and the
memory requirements are better in several cases. It is the first time that neural network technology has been fully and successfully
applied to designing proactive password checkers.

Index Terms—System security, access control, passwords, machine learning, neural networks.

1 INTRODUCTION

The Access Control Problem: A Challenge. The design of efficient and secure protocols for protecting partially shared or private resources from unauthorized users' accesses is a big challenge for computer scientists. Even though several suitable techniques have been proposed in the literature over the years, e.g., biometric identification schemes [20], [13] and challenge-response protocols based on smart cards [22], password-based schemes are still frequently used due to their simplicity.

Password-Based Schemes and Dictionary Attacks. In a password-based access control scheme, a user who wishes to gain access to a resource or a system executes a (possibly interactive) protocol whose goal is to "prove knowledge" of some secret information, i.e., the password. As an example, the familiar login-password scheme to get access to a computer constitutes the basic authentication scheme implemented by every operating system.

These schemes seem to be secure if the user keeps his password secret. Unfortunately, this is not true. Indeed, a password can be retrieved not only when the user accidentally discloses it, but also when the password is easy to guess, i.e., it belongs to a small dictionary of words [19], [15], [14]. In this case, all words in the dictionary can be checked until a match is found in a reasonable amount of time. Such attacks are referred to as dictionary attacks.

A solution developed in order to strengthen the password-based scheme is the one-time password approach [11], in which the user, by means of a passphrase, generates a list of passwords that are used to log in to a remote host just once. However, such a technique is still not secure against dictionary attacks. Indeed, Request for Comments 2289 [12] requires that, in order to reduce the risks related to dictionary attacks, the length of the secret information used to generate the one-time password sequence has to be at least 10 characters.

Notice that, starting from [2], a lot of research has been done in order to design password-based authentication schemes that are secure against dictionary attacks [25], [8], [18], [7]. We stress that the problem of choosing good passwords is not restricted to access control of network system hosts. Indeed, passwords are also often used, for example, to protect private information such as cryptographic keys or data files. In general, to increase the security level of password-based systems, we need a method to reduce the efficacy of dictionary attacks. This goal can be achieved if users are not allowed to choose easy-to-guess passwords.

Weak and Strong Passwords. To simplify our discussion, we will informally use the terms weak or bad for easy-to-guess passwords and strong or good for hard-to-guess ones. Notice that, according to the definition of dictionary attack, weak basically means a condition of membership in some dictionary of words that can be exhaustively checked in a reasonable amount of time, while strong refers to the opposite condition. These two notions are computational in nature. Hence, a password is weak if it can be found in a reasonable amount of time, while it is strong if the search requires unavailable resources of time or space, i.e., it can be any element of a big dictionary constructed over a given alphabet. It follows that a strong password looks like a random string.

Previous Work on the Subject. Several papers have addressed the issue of choosing good passwords. In the literature, different techniques have been proposed in order to discourage/remove the choice of easy-to-guess passwords (see [26] for a recent overview). Proactive password checking is a promising technique. A proactive password checker is a program that interacts with the user when he changes his password. It checks the proposed new password, and the change is allowed only if it is hard to guess. If the password is easy to guess, the system asks the user to type in another password instead. The philosophy on which these programs are based is that the user has the ability to select a password, but the system enables the selection of nontrivial ones only.

Proactive Password Checkers. Conceptually, a proactive password checker is a simple program. It holds a list of weak passwords that must be rejected. When the user chooses or wishes to change his password, it checks for membership in the list. If the password is found in the list, the substitution is not enabled and a short justification is given; otherwise, the substitution is allowed. However, a straightforward implementation of such a program is not suitable for two reasons: The list of weak passwords can be very long and cannot be kept in the first levels (i.e., cache and main memory) of the memory hierarchy. Also, the time for checking membership can be high, which implies an unacceptably long wait for the user. Therefore, several proactive password checkers that aim at reducing the time and space complexities of the trivial approach have been proposed (see [17], [10], [23], [16]). All these models are an improvement over the straightforward scheme. However, both the straightforward scheme and these checkers have low predictive power when tested on new dictionaries of words, i.e., they do not perform well if passwords are chosen from dictionaries which have not been considered during the setup phase of the checker. Indeed, a desirable feature of a proactive password checker is the ability to correctly classify passwords which do not appear in the initial set. To this aim, an interesting approach for designing a proactive password checker is the one applied in [1]. The problem of password classification is therein viewed as a machine learning problem: The system, in a training phase, using dictionaries of examples of weak and strong passwords, gets the knowledge for distinguishing weak passwords from strong ones. This knowledge is represented by means of a decision tree. Later on, the decision tree is used for classification. The experimental results reported in [1] showed a meaningful enhancement of the error rates of previous solutions. The same technique was subsequently applied in [5], where the power of the checker was increased by exploring another key idea of machine learning in the construction of the decision tree: the Minimum Description Length Principle (MDLP). Finally, [6] put forward the possibility of using neural networks for proactive password checking. Instead of using standard computing techniques, the classifier was implemented by means of a perceptron, the simplest example of a neural network. The extended abstract pointed out the efficiency and efficacy of the approach compared to previous proposals.

Our Contribution. In this paper, we fully develop the approach of [6]. We discuss and analyze proactive password checkers based on multilayer neural networks. We evaluate the performance of several network topologies and of a combined approach comprising standard preprocessing techniques of the inputs and neural networks. We compare the performance of our system with those obtained by [5], [16]. The results obtained show that proactive password checkers based on this technology are a suitable alternative to currently available solutions and, for resource-constrained devices (e.g., smart cards), they might represent the best choice.

A. Ciaramella and R. Tagliaferri are with the Dipartimento di Matematica ed Informatica, Università di Salerno, Via Ponte Don Melillo, I-84084, Fisciano (SA), Italy. E-mail: {ciaram, robtag}@dmi.unisa.it.
P. D'Arco and A. De Santis are with the Dipartimento di Informatica ed Applicazioni, Università di Salerno, Via Ponte Don Melillo, I-84084, Fisciano (SA), Italy. E-mail: {paodar, ads}@dia.unisa.it.
C. Galdi is with the Dipartimento di Scienze Fisiche, Università di Napoli "Federico II", Via Cinthia, Complesso Monte S. Angelo, I-80126, Napoli, Italy. E-mail: c.galdi@na.infn.it.
Manuscript received 6 Dec. 2005; accepted 17 May 2006; published online 2 Nov. 2006.
For information on obtaining reprints of this article, please send e-mail to tdsc@computer.org and reference IEEECS Log Number TDSC-0167-1205.

2 A MATHEMATICAL FRAMEWORK

A Model [4]. Let $P$ be the set of all acceptable passwords, let $p$ be an element chosen from $P$, and let $s$ be a function used to select the password $p$ from $P$. Then, denote by $p'$ a guess for the password $p$ and assume that it takes a constant amount of time $T = t(p')$ to determine whether this guess is a correct one, i.e., if $p' = p$.

We can model the choice of $p$ in $P$ with a random variable $G$, taking values in $P$. These values are assumed according to a probability distribution $P_G$ over the elements of $P$ that is induced by the selection function $s$. Moreover, assuming that $s$ is known, the time to guess $p$ can be represented with a random variable $F_{P_G}$, which assumes real values according to $P_G$.

If $G$ is uniformly distributed on $P$, i.e., $P_G = U$, and no prior knowledge of the authentication function (the function used by the operating system to check the equality of a guess with the true password) is available, then, as pointed out in [4], to guess the selected password $p$, we have to try, on average, $|P|/2$ passwords from $P$, and the expected running time is $E(F_U) = T \cdot |P|/2$.

Notice that, in this model, there is a correspondence between the set $S$ of selection functions and the set $D_P$, the set of all probability distributions on the set $P$. Therefore, we can characterize the bad selection functions $s$ to choose $p$ in $P$ with those probability distributions $P_G$ such that

$$E(F_{P_G}) \le k \cdot E(F_U). \qquad (1)$$

The parameter $k \in [0, 1]$ defines a lower bound on the suitability of a given selection function, represented by the distribution $P_G$. If $p$ is chosen according to a probability distribution $P_G$ that satisfies (1), we say that $p$ is easy to guess.

A family of bad selection functions is represented by language dictionaries, where the dictionary can be seen as the image set of a selection function $s$. The words in the dictionary are a small subset of all the strings that can be constructed with the symbols of a given alphabet. According to our model, the distribution induced by languages is skewed on $P$, since they assign nonzero values only to a small subset of elements; therefore, $E(F_{P_G})$ is much smaller than $E(F_U)$. Hence, it is sufficient to try a number of passwords smaller than $|P|/2$ to guess the chosen $p$.
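For concreteness, here is a small worked instance of the model; the numbers are illustrative assumptions of ours (a 95-character printable alphabet, length-8 passwords, and a guessing time of $T = 1\,\mu\text{s}$), not measurements from the paper:

$$E(F_U) = T \cdot \frac{|P|}{2} = 10^{-6} \cdot \frac{95^8}{2}\ \text{s} \approx 3.3 \times 10^{9}\ \text{s} \approx 105\ \text{years},$$

whereas a selection function whose image is a 330,000-word dictionary gives $E(F_{P_G}) \approx 10^{-6} \cdot 165{,}000\ \text{s} \approx 0.17\ \text{s}$, which satisfies (1) for any practical value of $k$.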
To guarantee the security of the system against illegal accesses, we have to require that the selection function does not localize a small subset of $P$. This means that we have to find a method to discard those probability distributions $P_G$ on $P$ such that $E(F_{P_G})$ is too small. If $E(F_U)$ is very large and we can force $P_G$ to look like $U$, then the goal is obtained.

Our Point of View. In the above abstract model, a proactive password checker can be viewed as a tool for checking whether a password $p$ is chosen from $P$ according to a suitable selection function, i.e., a function which induces a probability distribution that looks like the uniform one. Such a viewpoint is useful in order to understand why, in implementing our proactive password checker, we use statistical techniques. Indeed, the password selection problem can be cast as a particular instance of pattern recognition problems, and the most general and natural framework in which to formulate solutions to pattern recognition problems is a statistical framework.

A Concrete Setting. The above is a general analysis of the password selection problem. In order to derive practical results, we need to carefully specify the password space $P$. We consider the set of all the strings composed of "printable" ASCII characters with length less than or equal to 8 (i.e., the length allowed for passwords by Unix-like operating systems). This set is reported in Table 1. The charset is divided into "weak" characters (namely, all the letters), the digits, and "strong" characters. This partition is motivated by the empirical evidence that strong characters do not usually appear in passwords. On the other hand, digits are considered "weaker" than strong characters because of users' habit of using numbers in their passwords, e.g., birth dates. We stress that this partition should be considered as an experimental setting and that it can be modified in order to fit per-site password policies.

TABLE 1
The Charset

3 PATTERN RECOGNITION AND NEURAL NETWORKS

Pattern Recognition. Pattern recognition is a solid area of study. It encompasses a wide range of information processing problems of great practical significance, from speech recognition and classification to fault detection in machinery and medical diagnosis. From an abstract point of view, it concerns function approximation and object classification. In this paper, we restrict our attention to object classification. Loosely speaking, we assume that a given universe of objects can be partitioned into different classes according to some characteristics. The recognition problem consists in associating each object to a class. Generally, a pattern recognition system is a two-part device: a feature extractor and a classifier. It takes as input an object and outputs the classification. A feature is a measurement taken over the input object that has to be classified. The values of the measurements are usually real numbers and are arranged in a vector, called the feature vector. The set of possible feature vectors is called the feature space. The feature extractor of a pattern recognition system simply takes measurements over the object and passes the feature vector to the classifier. The classifier applies a given criterion, implemented by means of a discriminant function, to establish in which class the object belongs. A discriminant function is a function that maps the feature vector into the classification space and, usually, defines a boundary among the classes. If the discriminant function is a linear function, i.e., it defines a boundary in the classification space that looks like a hyperplane, the classifier is said to be linear. Of course, a linear classifier can be used if the classes themselves can be separated by means of a straight line. When this happens, we say that the problem is linearly separable. As we will see later, the system we are looking for is a pattern recognition system which takes as input a password, extracts some features from it, and passes them to a classifier which outputs a decision. In our abstract view, if the classifier says that the password is weak, then it means that the selection function the user is applying is not suitable.

Neural Networks. Neural Networks (NNs, for short) can be considered as a statistical technique in pattern recognition. They implement nonlinear mappings from several input variables to several output variables, where the form of the mapping is governed by a number of adjustable parameters. An NN learns how to compute a mapping by trial and error, through a certain parameter optimization algorithm. Such an algorithm, due to the biological premises of the theory of NNs, is called a learning algorithm. During the learning process (also called training), the network receives a sequence of examples and adapts its internal parameters to match the desired input-output functionality. The knowledge to compute the mapping is therefore acquired during this learning process, and it is stored in the modified values of the internal parameters. Several learning algorithms have been developed in order to teach an NN to compute a certain mapping. We will use an NN to implement the classifier in our pattern recognition system, i.e., the proactive password checker.
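Before turning to learning strategies, here is a minimal Python sketch of the two-part device just described; the function names and the toy length-based discriminant are our own illustration, not the paper's checker:

```python
# A sketch of the two-part pattern recognition device: a feature extractor
# maps a password to a feature vector, and a classifier (a discriminant
# function plus a threshold) maps the feature vector to a decision.
from typing import Callable, List

def make_checker(extract: Callable[[str], List[float]],
                 discriminant: Callable[[List[float]], float],
                 threshold: float = 0.5) -> Callable[[str], bool]:
    """Return a proactive checker: True = accept (strong), False = reject."""
    def check(password: str) -> bool:
        v = extract(password)                 # feature vector
        return discriminant(v) >= threshold   # boundary in classification space
    return check

# Toy instantiation: a single feature (length) and a linear discriminant.
checker = make_checker(lambda p: [len(p)], lambda v: v[0] / 16.0)
print(checker("s7!Kp2#x"), checker("hello"))  # True False
```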

Learning Typologies. Let us suppose that a sufficiently large set of examples is available. Two main learning strategies can be adopted in general:

. Supervised learning—This learning typology requires that target output values of the network are known for all the input patterns of the training set. Examples of algorithms implementing such a strategy are the Multilayer Perceptron and Radial Basis Functions.
. Unsupervised learning—This learning typology can be applied when target answers of the network for the input patterns of the training set are unknown. Unsupervised learning teaches the network to discover by itself correlations and similarities among the input patterns of the training set. Examples of algorithms implementing such a strategy are Self-Organizing Maps, Hopfield Nets, and PCA NNs.

In the following, we consider NNs which apply the first learning strategy. In this case, an NN can be regarded simply as a particular choice of a function of the form

$$y(\mathbf{x}; \mathbf{w}), \qquad (2)$$

where $\mathbf{x}$ denotes the input vector and $\mathbf{w}$ denotes the vector of adjustable parameters, called weights. Learning in NNs, i.e., adaptation of the value of $\mathbf{w}$ during the training process for learning a certain mapping, is usually formulated in terms of minimizing an error function with regard to $\mathbf{w}$.

Generalization and Early Stopping: Training, Validation, and Test Sets. The goal of the training procedure does not consist of exactly modeling a given set of objects, but rather in learning the mapping underlying such a data set. Indeed, the network should exhibit good performance over new, unseen inputs. A method to control the complexity of the learning process is called early stopping. More precisely, a training procedure can be seen as an iterative procedure which aims at reducing the error function over the training set. This error decreases as a function of the number of iterations of the procedure. However, the error measured on an independent data set, referred to as the validation set, often shows a decrement followed by an increment when the network starts to overfit. Therefore, it is convenient to stop the process at the point of smallest error on the validation set. Indeed, it is known that the network we get at this point has the highest generalization power. The performance of the selected network should be confirmed by using another independent data set, referred to as the test set.
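The early stopping procedure can be written as a generic training loop; the following is a minimal sketch assuming hypothetical train_epoch() and error() callables supplied by the caller, not the authors' Netlab implementation:

```python
# Early stopping: keep the network with the smallest validation error, and
# stop once that error has not improved for `patience` consecutive epochs.
import copy

def train_with_early_stopping(net, train_epoch, error, train_set,
                              val_set, max_epochs=500, patience=20):
    best_err, best_net, stale = float("inf"), copy.deepcopy(net), 0
    for epoch in range(max_epochs):
        train_epoch(net, train_set)           # one pass of the optimizer
        val_err = error(net, val_set)         # error on the validation set
        if val_err < best_err:                # new minimum: remember this net
            best_err, best_net, stale = val_err, copy.deepcopy(net), 0
        else:
            stale += 1                        # validation error is rising
            if stale >= patience:
                break                         # overfitting has set in
    return best_net                           # highest generalization power
```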
Single Layer Networks. Single Layer Networks, or Single Layer Perceptrons (SLPs), implement the well-known statistical techniques of linear regression and function approximation [3]. Such NNs have a single layer of adaptive weights between the inputs and the outputs (see Fig. 1).

Fig. 1. Single layer perceptron.

More precisely, the input values to the network are denoted by $x_i$, for $i = 1, \ldots, d$. The network, for $k = 1, \ldots, c$, forms $c$ linear combinations of these inputs, producing a set of intermediate variables $a_k$ defined by

$$a_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{0k} = \sum_{i=1}^{d} w_{ik} x_i + w_{0k}. \qquad (3)$$

One variable $a_k$ is associated with each output unit. The values $w_{ik}$ represent the weights and the values $w_{0k}$ represent the bias parameters. Moreover, to each output unit is associated an activation function of its own inputs. The choice of the activation functions for the output units depends on the application. The simplest choice is the linear function of the form

$$y_k(\mathbf{x}; \mathbf{w}) = a_k(\mathbf{x}). \qquad (4)$$

Another choice, when multiple independent attributes are involved, is given by using independent logistic sigmoidal activation functions, applied to each of the outputs independently, defined by

$$y_k(\mathbf{x}; \mathbf{w}) = \frac{1}{1 + e^{-a_k(\mathbf{x})}}. \qquad (5)$$

One of the most used methods for training an SLP is the least-squares learning algorithm [3], [21]. Nevertheless, it is also possible to take advantage of the linear (or near linear) structure of the network and use a particularly efficient special-purpose learning algorithm known as Iterated Reweighted Least Squares (IRLS) [21].

Notice that, in a classification problem, an SLP is used as follows: Once the network has been trained applying the aforementioned early stopping method, a new vector is classified by giving it as input to the network, computing the output unit activations, and assigning the vector to the class whose output unit has the largest activation value.

It is possible to show that an SLP, by appropriately instantiating the output units with specific activation functions, implements various forms of linear discriminant functions [3]. This implies that an SLP always defines a linear decision boundary among the classes. Unfortunately, there are some problems that cannot be solved using a linear decision boundary or, in other words, that are not linearly separable. Hence, SLPs correspond to a narrow class of possible discriminant functions and, in many situations, may not represent the optimal choice.
Multilayer Perceptron. The Multilayer Perceptron (MLP) is probably the most widely used architecture for practical applications of NNs. Usually, the network consists of two layers of adaptive weights with full connectivity between inputs and hidden units and between hidden units and outputs (see Fig. 2). From the theory of NNs [3], it is well known that this architecture is capable of universal approximation, in the sense that it can approximate to arbitrary accuracy any continuous function from a compact region of the input space, provided the number of hidden units is sufficiently large and provided the weights and biases are chosen appropriately. In practice, this means that, if there is enough data to estimate the network parameters, an MLP can model any smooth function.

Fig. 2. Multilayer perceptron.

The input values to the network are denoted by $x_i$, for $i = 1, \ldots, d$. The first layer of the network forms $M$ linear combinations of these inputs, producing a set of intermediate activation variables $a_j^{(1)}$, for $j = 1, \ldots, M$, defined by

$$a_j^{(1)} = \sum_{i=1}^{d} w_{ij}^{(1)} x_i + w_{0j}^{(1)}, \qquad (6)$$

with one variable $a_j^{(1)}$ associated with each hidden unit. The values $w_{ij}^{(1)}$ represent the weights of the first layer of wires, while the values $w_{0j}^{(1)}$ represent the bias parameters associated with the hidden units. The variables $a_j^{(1)}$ are then transformed by the nonlinear functions of the hidden layer. In our application, we restrict the attention to the hyperbolic tangent (tanh) activation functions, since this is the most appropriate choice for classification problems [3]. The outputs of the hidden units are therefore given by

$$z_j = \tanh\left(a_j^{(1)}\right). \qquad (7)$$

The $z_j$ are then transformed by the second layer of weights and biases, yielding the second-layer activation values $a_k^{(2)}$, for $k = 1, \ldots, c$:

$$a_k^{(2)} = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}. \qquad (8)$$

Finally, these values are passed through the output-unit activation functions (e.g., linear, sigmoidal, softmax [3]), producing the output values $y_k(\mathbf{x}; \mathbf{w})$.
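The two-layer forward pass of (6)-(8), with tanh hidden units and a linear output unit as used in this paper, is sketched below in NumPy; the array shapes and the random initialization are ours:

```python
import numpy as np

def mlp_forward(W1, b1, W2, b2, x):
    """W1: (M, d), b1: (M,), W2: (c, M), b2: (c,), x: (d,)."""
    a1 = W1 @ x + b1    # first-layer activations, eq. (6)
    z = np.tanh(a1)     # hidden-unit outputs, eq. (7)
    a2 = W2 @ z + b2    # second-layer activations, eq. (8)
    return a2           # linear output units: y_k = a_k^(2)

# Example topology matching the paper's setting: d = 4 inputs, M = 8 hidden
# nodes, c = 1 output; parameters here are randomly initialized, not trained.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
y = mlp_forward(W1, b1, W2, b2, np.array([0.8, 1.0, 0.5, 0.3]))
```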
For networks having differentiable activation functions, such as tanh, there exists a powerful and computationally efficient learning algorithm called error back-propagation [3], [21]. However, to estimate the weights of a two-layer MLP, we can also adopt different optimization strategies (e.g., conjugate gradients, scaled conjugate gradients, the quasi-Newton method [3]). In our implementation, we use the quasi-Newton optimization algorithm, which is more stable than the back-propagation algorithm [3], [21].

Preprocessing of the Input. An NN can implement any arbitrary functional mapping between multidimensional spaces. However, in real applications, a straightforward use of a network to map the raw input data directly into the required output variables often is not a suitable choice. In practice, it is nearly always advantageous to apply preprocessing transformations to the input data before they are presented to a network. Data preprocessing is one of the most important stages in the development of a solution, and the choice of preprocessing steps can have a significant effect on the performance of the network. One of the most important forms of preprocessing consists of the reduction of the dimensionality of the input data. To this aim, several approaches require forming linear or nonlinear combinations of the original measurements on the object in order to generate inputs for the network (i.e., the feature vector). The principal motivation for dimensionality reduction is that it can help to alleviate the worst effects of the so-called curse of dimensionality (see [3] for details).

Principal Component Analysis. A widely used preprocessing technique is Principal Component Analysis (or PCA, for short), also known as the Karhunen-Loève transformation [3]. Let us briefly describe classical PCA: Let $\mathbf{x}_1, \ldots, \mathbf{x}_N$ be a set of $d$-dimensional vectors. Our goal is to map the $d$-dimensional column vectors $\mathbf{x}_i$ to $m$-dimensional vectors $\mathbf{z}_i$, where $m < d$. To this aim, notice that a generic vector $\mathbf{x}_i$ can be represented, without loss of generality, as a linear combination of a set of $d$ orthonormal vectors $\mathbf{u}_j$:

$$\mathbf{x}_i = \sum_{j=1}^{m} x_{ij} \mathbf{u}_j + \sum_{j=m+1}^{d} x_{ij} \mathbf{u}_j. \qquad (9)$$

Since our goal is to reduce the space dimensionality while keeping as much information as possible, we would like to retain only a subset of $m < d$ of the basis vectors $\mathbf{u}_j$, so that we use only $m$ coefficients $x_{ij}$ for representing $\mathbf{x}_i$; i.e., the $m$-dimensional vector $\mathbf{z}_i$ corresponding to the $d$-dimensional vector $\mathbf{x}_i$ is given by $\mathbf{z}_i = (x_{ij_1}, \ldots, x_{ij_m})$ with regard to a suitably chosen set of $d$ orthonormal vectors $\mathbf{u}_1, \ldots, \mathbf{u}_d$.

Hence, we need to find a set of $d$ orthonormal vectors such that, by retaining only $m$ of them, we maximize the information held by the vectors in the $m$-dimensional space. It is possible to show that the directions of maximum variance (i.e., the directions that bring most of the information in the original vectors) are parallel to the eigenvectors corresponding to the largest eigenvalues of the covariance matrix defined by

$$\Sigma = \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T, \qquad (10)$$

where

$$\bar{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i \qquad (11)$$

and the $\mathbf{x}_i$ are the original feature vectors. For this reason, the set of vectors $\mathbf{u}_j$ we need is just the set of the eigenvectors(1) $\mathbf{u}_j$ of $\Sigma$ with the highest eigenvalues $\lambda_j$. Such eigenvectors of $\Sigma$ represent a new system of coordinates. Each eigenvector is called a principal component. In this way, each vector $\mathbf{x}_i$ in the original $d$-dimensional space is represented by $\mathbf{z}_i$ in this new system of coordinates for the $m$-dimensional space.

(1) Notice that $\mathbf{u}_j$ is an eigenvector of $\Sigma$ with eigenvalue $\lambda_j$ if and only if $\Sigma \mathbf{u}_j = \lambda_j \mathbf{u}_j$.
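The PCA of (9)-(11) reduces to an eigendecomposition of the covariance matrix; a minimal NumPy sketch follows (we center the data, as the definition in (10) suggests; variable names are ours):

```python
import numpy as np

def pca_fit(X, m):
    """X: (N, d) matrix of feature vectors; returns mean and (d, m) basis."""
    xbar = X.mean(axis=0)                 # sample mean, eq. (11)
    D = X - xbar
    Sigma = D.T @ D                       # covariance matrix, eq. (10)
    evals, evecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
    return xbar, evecs[:, ::-1][:, :m]    # top-m principal components

def pca_project(x, xbar, U):
    """Coordinates of x in the new m-dimensional system, per eq. (9)."""
    return U.T @ (x - xbar)
```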
4 EXPERIMENTAL EVALUATION AND COMPARISON

In order to verify the applicability of NNs to proactive password checking systems, we have run several experiments with both single-layer and multilayer perceptrons. We have applied these models using different architectures and using a PCA preprocessing technique. The obtained results are compared to identify the NN model that yields the best performance. In the following, we describe in detail the dictionaries and the features we have used and the experimental results we have obtained.

Dictionaries. The dictionaries we have used for the training and testing phases of the NNs are reported in Table 2. More specifically, the weak dictionary is composed of 327,878 words from the English dictionary. All the words are lowercase, and no special symbol belongs to any of these words.

TABLE 2
Dictionaries Description

We created three types of dictionaries of strong passwords. The first type is composed of strings whose characters are pseudorandomly sampled from the set of characters reported in Table 1. This dictionary is composed of 30,000 words of length 6 (the dictionary strong.0.1), 30,000 words of length 7 (the dictionary strong.0.2), and 30,000 words of length 8 (the dictionary strong.0.3). Notice that the set of strong characters is approximately one-third of the whole charset; hence, this strategy for constructing strong passwords does not rule out the possibility of obtaining pseudorandom passwords composed of only lowercase letters.

For this reason, we have constructed the dictionaries strong.1.x and strong.2.x so as to force each password in these dictionaries to present either strong characters or some digits. More precisely, each word in strong.1.x contains at least one strong character or at least two digits. Similarly, each word in strong.2.x contains at least two strong characters or three digits. Each dictionary is composed of 30,000 words, of length 6 in strong.y.1, length 7 in strong.y.2, and length 8 in strong.y.3.

To simulate usual users' behavior, we have also used noisy dictionaries. The idea underlying the construction of such dictionaries is that users, in order to remember their passwords, might substitute in a weak word one or more characters with strong ones and/or use both lowercase and uppercase letters. Hence, the dictionary noise.0.x is obtained by substituting, in each word, x strong characters in randomly chosen positions. Finally, the dictionaries noise.1.x are constructed from noise.0.x by substituting half of the lowercase letters with the corresponding uppercase ones.

Features. The four features we have used for the classification process are the following: Classes, #Strong Characters, Digrams, and Upper-Lower Distribution. More precisely (a code sketch of all four features follows the list):

. Classes: It is reasonable to consider the set of ASCII characters divided into classes of different strength. Commonly, passwords are composed of letters; this means that all (uppercase and lowercase) letters must have low values. In a second class, we can put the digits 0, ..., 9. This is because it is not usual to find a digit in a password, but it is not so unusual, either. In the last class, called the class of strong characters, we can put every character that does not belong to the first two classes. To mark the distance among these classes, we have assigned to the class of letters a value equal to 0.2, to the class of digits a value equal to 0.4, and, to the last class, 0.6. The overall value of a password is computed by summing up the values associated with each character in the password. Notice that, since the feature is a sum, the longer the password, the higher the value.
. #Strong Characters: The second feature is the number of strong characters (as defined in Table 1) contained in the password.
. Upper-Lower Distribution: The value of this feature is computed by the following formula:

$$|UPP - LOW| / \ell_{et},$$

where $UPP$ is the number of uppercase letters, $LOW$ is the number of lowercase letters, and $\ell_{et}$ is the number of letters in the password. The presence of this feature is due to the observation that passwords that contain both uppercase and lowercase letters are slightly stronger than passwords composed of lowercase (uppercase) letters only.
. Digrams: This feature looks at the types of digrams present in the password. More precisely, we say that a digram is an alternance if the two characters of the digram belong to different classes. The checker scans the password, analyzes all the digrams from left to right, and assigns values to each of them. In a password with $n$ characters, we consider all $n - 1$ possible digrams. The more alternances the password has, the higher the value.
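The following Python sketch implements the four features; the 0.2/0.4/0.6 class values and the $|UPP - LOW| / \ell_{et}$ formula follow the text, while scoring each alternance as a count of 1 is our own placeholder, since the paper does not list the per-digram values:

```python
import string

def char_class(c):
    if c in string.ascii_letters: return 0.2   # letters: weak
    if c in string.digits:        return 0.4   # digits: intermediate
    return 0.6                                 # everything else: strong

def features(pw):
    classes = sum(char_class(c) for c in pw)              # Classes
    strong = sum(1 for c in pw if char_class(c) == 0.6)   # #Strong Characters
    upp = sum(1 for c in pw if c.isupper())
    low = sum(1 for c in pw if c.islower())
    let = upp + low
    ul = abs(upp - low) / let if let else 0.0             # Upper-Lower Distr.
    digrams = sum(1 for a, b in zip(pw, pw[1:])           # Digrams: number of
                  if char_class(a) != char_class(b))      # alternances
    return [classes, strong, ul, digrams]

print(features("pa55w!rd"))  # [2.4, 1, 1.0, 4]
```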
Experimental Results. The aim of these experiments is to compare an SLP and an MLP in classifying words as weak or strong. We use a supervised learning strategy to train the NN. In order to implement such a strategy, we need to construct a training set (as well as the validation and test sets required by the early stopping method [3]) by using the dictionaries described in Table 2. As a starting point, we consider a data set, referred to as Data_Set_1, constructed as follows: We assume that weak and the noisy dictionaries contain weak passwords, while the remaining ones, namely strong.x.y, for x = 0, 1, 2 and y = 1, 2, 3, contain strong passwords. We label each weak (respectively, strong) password with a 0 (respectively, 1).

The training, validation, and test sets have been obtained by collecting the labeled passwords in order to form two big dictionaries of weak and strong passwords and by assigning a randomly chosen 60 percent of the two dictionaries to the training set, a randomly chosen 20 percent to the validation set, and the remaining 20 percent to the test set.

Our experiments were carried out using algorithms of the Netlab Toolbox [21]. More precisely, the training algorithm used for the SLP is IRLS [21], with activation functions for the output units given by the logistic sigmoidal function. Moreover, we have used the quasi-Newton optimization algorithm as the training algorithm for the MLP, with tanh activation functions for the nodes of the hidden layer and a linear function for the nodes of the output layer. The number of inputs for both NNs is four, and the number of outputs is one. In the case of the MLP, we have considered several instances of the network by changing the number of hidden nodes from four to 10.

Performance. In Table 3, we show the classification rates with respect to Data_Set_1, obtained with different NNs. From these results, it is clear that MLPs achieve better performance than SLPs, even when the number of hidden nodes is small. Since the classification rate is highest when the number of hidden nodes is eight, we report in Table 4 the classification rates of such an NN on all the dictionaries we have constructed.

TABLE 3
Classification Percentage of Data_Set_1 with Different NNs, No PCA

TABLE 4
Classification of the Dictionaries of Data_Set_1 Using an Eight-Hidden-Nodes MLP

From the analysis of the correlation matrix, we note that the first two principal components (see Fig. 3) carry meaningful information (more than 90 percent of the information). This allows us to use a linear PCA technique to extract two features and to obtain a two-dimensional dictionary. More precisely, denoting the four features of the training set as

$$x_1 = \text{Classes}, \quad x_2 = \text{\#Strong Characters}, \quad x_3 = \text{Upper-Lower Distribution}, \quad x_4 = \text{Digrams},$$

we obtain two features ($y_1$ and $y_2$). These features are a linear combination of the four source features, and they can be described by the following equation:

$$y_i = \sum_{j=1}^{4} \alpha_{ij} x_j \quad \text{for } i = 1, 2, \qquad (12)$$

where the coefficients $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{i4})$ are the coefficients of the principal components of the covariance matrix, i.e.,

$$\alpha_1 = [0.1331\ \ 0.7435\ \ 0.5892\ \ 0.0434], \qquad \alpha_2 = [0.0422\ \ 0.3982\ \ 0.3892\ \ 0.8296]. \qquad (13)$$

Fig. 3. Eigenvalues plot of the training set.

In Table 5, we show the results obtained by the different NNs when preprocessing the inputs using the PCA technique. We point out that the results obtained using the best NN (eight hidden nodes) on each dictionary are very close to those obtained without applying PCA. In Fig. 4, we plot the two-dimensional data set and the decision boundary obtained with the MLP with eight hidden nodes.

TABLE 5
Classification Percentage of Data_Set_1 with Different NNs, with PCA

Fig. 4. Data visualization and contour plot obtained by using an MLP with eight hidden nodes with PCA on the whole dictionary Data_Set_1: (.) class with label 1, (+) class with label 0.
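The projection (12) with the coefficients (13) is a single matrix-vector product, as the sketch below shows; the feature vector reuses the made-up example from the earlier feature sketch, and any minus signs lost in the extraction of (13) would flip components without changing the procedure:

```python
# Projecting the four features onto the first two principal components,
# per eqs. (12)-(13).
import numpy as np

alpha = np.array([[0.1331, 0.7435, 0.5892, 0.0434],   # alpha_1
                  [0.0422, 0.3982, 0.3892, 0.8296]])  # alpha_2
x = np.array([2.4, 1.0, 1.0, 4.0])  # [Classes, #Strong, Upper-Lower, Digrams]
y = alpha @ x                       # two-dimensional input for the NN
print(y)                            # [1.82574 4.20708]
```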
Along the lines of the above experiments, we have carried out a second set of experiments by using a new data set, referred to as Data_Set_2, constructed by changing Data_Set_1 so that the words in noise.1.2 are labeled with 1, i.e., noise.1.2 is a strong dictionary. We report in Table 6 the classification rates obtained by applying an SLP and an MLP with different numbers of hidden nodes, trained using this new training set, but with no PCA preprocessing.

TABLE 6
Classification Percentage of Data_Set_2 with Different NNs, No PCA

Again, we note that by using an MLP we obtain a better result than by using an SLP. Moreover, since the best result is obtained using an MLP with 10 hidden nodes, we classify the original dictionaries using this network configuration. In Table 7, we show the obtained classification rates.

TABLE 7
Classification of the Dictionaries of Data_Set_2 by Using an MLP with 10 Hidden Nodes

In Table 8, we show the results obtained on this data set using an SLP and different MLPs with four, six, eight, and 10 hidden nodes, respectively, and applying the PCA preprocessing technique. We note that the results are again very similar to the ones obtained without using the PCA technique. In Fig. 5, we plot the two-dimensional data set and the obtained decision boundary. We can also note from this figure that the classification is hard using an SLP.

TABLE 8
Classification Percentage of Data_Set_2 with Different NNs, with PCA

Fig. 5. Data visualization and contour plot obtained by using an MLP with 10 hidden nodes with PCA on the whole dictionary Data_Set_2: (.) class with label 1, (+) class with label 0.

From these results, it is clear that the choice to consider noise.1.2 a strong dictionary reduces the overall error rate and significantly reduces the gap among the errors associated with different dictionaries. Furthermore, we notice that the percentage of false negatives, i.e., the misclassified weak passwords, is negligible. The false negatives in Table 7 are the passwords in the dictionaries weak, noise.0.x, and noise.1.1 classified as strong, and their percentage with regard to the number of all weak passwords is 0.4 percent. This property ensures that a negligible number of user passwords will be easily guessable. Notice that most of the errors are due to the noise.1.1 dictionary, composed of passwords of the dictionary weak in which a randomly chosen character has been substituted with a strong character and half of the letters have been substituted with their uppercase versions.
Comparison with Other Classification Methods. For the sake of completeness, we have compared the results obtained with MLPs with the following classification methods:

. Kernel Functions: Radial Basis Functions (RBFs) are powerful kernel-based techniques for interpolation and classification in multidimensional spaces. Basically, an RBF is a function which implements a distance criterion with respect to a center. Radial basis functions have been applied in the area of NNs. Such networks have three layers: the input layer, the hidden layer with the RBF functions, and a linear output layer. The most popular choices for RBFs are the Gaussian functions. For a complete introduction to NNs based on RBFs, the reader is referred to [3].
. Fuzzy Models: Fuzzy Relational Neural Networks (FRNNs) have been introduced in [9]. FRNNs apply fuzzy rules in order to classify objects. These fuzzy rules are obtained by combining fuzzy relations learned during the training process. The composition of such relations is accomplished by using suitable norms (e.g., t-norms).

The experiments considered so far show that the percentages of classification obtained with and without PCA preprocessing are substantially the same. On the other hand, PCA preprocessing allows one to construct a smaller NN and enables a visual representation of the classification process. For these reasons, in using the RBF and FRNN models, we have only considered the two data sets obtained by applying the PCA preprocessing.

With respect to Data_Set_1, we obtain, by using an RBF-based NN with 10 hidden nodes, 92.1818 percent correct classification on the training set and 92.2148 percent on the test set. With respect to Data_Set_2, we obtain 94.8316 percent correct classification on the training set and 94.8507 percent on the test set.

By using an FRNN with 10 hidden nodes on Data_Set_1, we get 95.9108 percent correct classification on the training set and 95.8644 percent on the test set. With respect to Data_Set_2, we obtain 97.194 percent correct classification on the training set and 97.2195 percent on the test set.

In Figs. 6, 7, 8, and 9, we plot the decision boundaries obtained in the above experiments.

Fig. 6. Decision boundary of the RBF model with 10 hidden nodes and with PCA on Data_Set_1.
Fig. 7. Decision boundary of the RBF model with 10 hidden nodes and with PCA on Data_Set_2.
Fig. 8. Decision boundary of the FRNN model with 10 hidden nodes and with PCA on Data_Set_1.
Fig. 9. Decision boundary of the FRNN model with 10 hidden nodes and with PCA on Data_Set_2.

Notice that the experiments executed with FRNNs have also shown that a smaller number of hidden nodes is sufficient to get high classification rates. Indeed, an FRNN with five hidden nodes correctly classifies 93.9547 percent of the training set and 93.9114 percent of the test set of Data_Set_1. At the same time, it correctly classifies 97.1391 percent of the training set and 97.1782 percent of the test set of Data_Set_2.
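For reference, the three-layer RBF architecture described above amounts to Gaussian radial activations followed by a linear output layer; the sketch below uses made-up centers, widths, and weights, not those learned in the paper:

```python
import numpy as np

def rbf_forward(centers, width, w, b, x):
    """centers: (M, d), width: scalar, w: (M,), b: scalar, x: (d,)."""
    d2 = ((centers - x) ** 2).sum(axis=1)    # squared distances to centers
    phi = np.exp(-d2 / (2.0 * width ** 2))   # Gaussian radial activations
    return w @ phi + b                       # linear output layer

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # two hidden units, 2-D input
print(rbf_forward(centers, 0.5, np.array([1.0, -1.0]), 0.0,
                  np.array([0.2, 0.1])))
```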
Comparison with Other Password Checkers. We compared our password checker with those described in [5] and [16]. We chose these proactive password checkers since the system described in [5] outperforms all the previous solutions described in the literature, and the second one is derived from Crack [15], the most used password cracker. Both these systems can be adapted to a per-site policy.

Hyppocrates: This system uses decision trees to classify passwords. More precisely, Hyppocrates creates a decision tree from the training set, which consists of positive and negative examples, namely, weak and strong passwords. The result of the training process is a trained tree. Starting from this tree, it is possible to construct a pruned one. We used the pruned tree generated by the system, since it requires less space than the trained tree while maintaining almost the same classification rates. For a more detailed description, we refer the reader to [5].

We ran several experiments, and in Tables 9, 10, 11, 12, 13, and 14, we report the results of the most relevant ones. Notice that Hyppocrates is case-insensitive. Thus, from the system's point of view, the dictionaries noise.0.x and noise.1.x are exactly the same, while, in NN-based checkers, such dictionaries are distinct. For this reason, we only consider the dictionaries noise.1.x both for the training and the testing phases.

The idea behind the experiments is to test different trees constructed with training sets in which the weak passwords are sampled from weak, noise.1.1, and noise.1.2.

In all the experiments, except Experiment 5, the strong dictionary used for the training is randomly generated by Hyppocrates, and its size is 10 percent of the size of the weak dictionary used for the same phase. The value 10 percent is suggested by the authors of [5] in order to optimize the performance of the system. In Experiment 5, we use strong.1.x, with x = 1, 2, 3, during the training phase.

In Experiment 1, we trained the system by using the dictionary weak, transformed by Hyppocrates in the following way: Each word, with probability one-third, is not modified; with probability two-thirds, it is modified by substituting a strong character in a randomly chosen position. Notice that two-thirds of the words in the resulting dictionary have the same characteristics as words belonging to noise.1.1.

TABLE 9
Hyppocrates—noise.1.2 Considered in Training Phase as Strong

TABLE 10
Hyppocrates—noise.1.2 Considered in Training Phase as Strong

TABLE 11
Hyppocrates—noise.1.2 Considered in Training Phase as Strong

TABLE 12
Hyppocrates—noise.1.2 Considered in Training Phase as Weak
In Experiment 2, we considered the case in which examples of weak passwords are taken from noise.1.1. In Experiment 3, half of the weak passwords are taken from weak and the other half are taken from noise.1.1.

In all the above experiments, the passwords in noise.1.2 are classified as strong. In order to force the system to classify such a dictionary as weak, in the training phase of Experiments 4 and 5, we used half the examples of weak passwords from noise.1.2.

Finally, in Experiment 6, one-third of the examples of weak passwords are sampled from weak, one-third from noise.1.1, and the remaining third from noise.1.2, as suggested by the authors in [5].

CrackLib: This password checker also provides the possibility of adapting the system in order to tune its performance to local policies. In this case, the training is carried out by using only the dictionary containing the weak passwords that have to be rejected. We report in Tables 15 and 16 the results of the following experiments: In Experiment 1, the system has been constructed by using half of the dictionary weak. In Experiment 2, the system has been constructed by using the whole dictionary weak. In Experiment 3, we have used half the words in weak and half the words in noise.0.1. In Experiment 4, we have used noise.0.1. Finally, in Experiment 5, we have used both weak and noise.0.1.

TABLE 13
Hyppocrates—noise.1.2 Considered in Training Phase as Weak

TABLE 14
Hyppocrates—noise.1.2 Considered in Training Phase as Weak

TABLE 15
Classification of the Dictionaries Used to Construct Data Set 1 and Data Set 2 Provided by Cracklib in Several Experiments

TABLE 16
Classification of the Dictionaries Used to Construct Data Set 1 and Data Set 2 Provided by Cracklib in Several Experiments

Performance Evaluation. From the test results reported in Table 4 and Table 7, we can state that the system presented in this paper correctly classifies almost all the weak passwords. More precisely, if noise.1.2 is considered to be a weak dictionary (Table 4), the percentage of misclassified weak passwords, i.e., weak passwords classified as strong, is 0.5 percent. On the other hand, the classification error on strong passwords is 25.5 percent. If noise.1.2 is considered to be a strong dictionary (see Table 7), the error rate over weak passwords decreases to 0.4 percent and, simultaneously, the error rate over the strong passwords decreases to 10 percent.
For Hyppocrates, we note that Experiments 1, 2, and 3 show a very good classification rate for weak passwords. Indeed, if we consider the dictionary noise.1.2 a strong one, the percentage of weak passwords that are misclassified is 0 for Experiments 2 and 3. Furthermore, the memory requirements for the pruned tree are almost comparable to the size of the NN (see Table 17).

TABLE 17
Space Requirement

On the other hand, we note that, for Hyppocrates, it is hard to classify noise.1.2 as a weak dictionary. The results of Experiments 4, 5, and 6 show that, if we consider noise.1.2 as a weak dictionary, the percentage of misclassified passwords becomes unacceptable. Only in Experiment 6 is the percentage of misclassification of weak passwords 0.67 percent, but the price we pay is that the size of the pruned tree grows to 76 KB. Hence, the size of a tree that classifies noise.1.2 as a weak dictionary (and that does not induce misclassification on the other weak dictionaries) is quite big.

The results of the experiments for Cracklib show very poor performance with regard to the classification of weak passwords (see Tables 15 and 16). Indeed, while it correctly classifies almost all the strong passwords, it fails in classifying weak passwords if they have not been used during the training phase.

Memory Requirement. As a final remark, we would like to stress that the space needed to store an NN, once it has been trained, consists of a few bytes and is independent of the size of the training set. In our case, the PCA matrix consists of eight real values. An MLP with h hidden nodes, with four inputs and one output, requires storing 5h real values for the weights and biases needed to compute the intermediate activation variables and h + 1 values for computing the second-layer activation variable, along with the bias for the output unit.

This means that, in an MLP with 10 hidden nodes with PCA preprocessing, if a real value is encoded using 8 bytes, the total number of bytes to be stored is 552, independently of the size of the training set. In contrast, previous solutions presented in [5], [1], [16] require the size of the information to be stored, once the system has been trained, to be dependent on the specific training set. In Table 17, we report the space required by the different experiments we have presented in this paper. It is clear that previous solutions require an amount of information that is, in most cases, considerably larger than that of the solution presented in this paper.
to be stored, once the system has been trained, to be [14] “John the Ripper” password cracker, http://www.openwall.
dependent on the specific training set. In Table 17, we report com/john, 2006.
[15] A.D. Muffett, "Crack 5.0," http://www.crypticide.com/users/alecm/, 1997.
[16] A.D. Muffett, "Cracklib v2.7: A Proactive Password Sanity Library," http://www.crypticide.com/users/alecm/, 1997.
[17] J.B. Nagle, "An Obvious Password Detector," Usenet News, 1988.
[18] J. Katz, R. Ostrovsky, and M. Yung, "Efficient Password-Authenticated Key Exchange Using Human-Memorable Passwords," Proc. Eurocrypt '01, pp. 475-495, 2001.
[19] D.V. Klein, "Foiling the Cracker: A Survey of, and Improvements to, Password Security," Proc. Second USENIX Workshop Security, pp. 5-14, 1990.
[20] R. de Luis-García, C. Alberola-López, O. Aghzout, and J. Ruiz-Alzola, "Biometric Identification Systems," Signal Processing, vol. 83, no. 12, pp. 2539-2557, 2003.
[21] I.T. Nabney, NETLAB: Algorithms for Pattern Recognition. Springer-Verlag, 2002.
[22] C. Schnorr, "Efficient Identification and Signature for Smart Cards," Proc. Eurocrypt '89, pp. 239-252, 1989.
[23] E. Spafford, "OPUS: Preventing Weak Password Choices," Computers and Security, vol. 11, no. 3, 1992.
[24] W. Stallings, Network and Internetwork Security: Principles and Practice. Prentice Hall, 1995.
[25] T. Wu, "The Secure Remote Password Protocol," Proc. ISOC Network and Distributed System Security Symp., pp. 97-111, 1998.
[26] J. Yan, "A Note on Proactive Password Checking," Proc. ACM New Security Paradigms Workshop, Sept. 2001.
Angelo Ciaramella received the laurea degree (cum laude) and the PhD degree in computer science from the University of Salerno, Italy, in 1998 and 2002, respectively. He is currently a postdoctoral researcher with the Department of Mathematics and Computer Science at the University of Salerno. He works on nonlinear PCA for periodicities detection and independent component analysis in blind source separation for linear, convolutive, and single-channel mixtures. He also works on fuzzy and neurofuzzy systems. He is the author of several publications in the area of soft computing and signal processing.

Paolo D'Arco received the PhD degree in computer science from the University of Salerno in February 2002. From November 2001 to September 2002, he was a postdoctoral fellow at the Centre for Applied Cryptographic Research, Department of Combinatorics and Optimization, University of Waterloo (Canada). Since December 2003, he has been an assistant professor at the University of Salerno. His research interests include cryptography and data security.

Alfredo De Santis received the laurea degree in computer science (cum laude) from the University of Salerno in 1983. Since 1984, he has been with the Dipartimento di Informatica ed Applicazioni of the University of Salerno, in 1984-1986 as an instructor in charge of the computer laboratory, in 1986-1990 as a faculty researcher, and since November 1990 as a professor of computer science. From November 1991 to October 1995 and November 1998 to October 2001, he was the chair of the Dipartimento di Informatica ed Applicazioni of the University of Salerno. He was the chairman of the graduate program in computer science at the University of Salerno: ciclo XII (1996-2000), ciclo XIII (1997-2001), ciclo XIV (1998-2002), ciclo XV (1999-2002), and ciclo XVI (2000-2003). From September 1987 to February 1990, he was a visiting scientist at the IBM T.J. Watson Research Center, Yorktown Heights, New York. He spent August 1994 at the International Computer Science Institute (ICSI), Berkeley, California, as a visiting scientist. He was the program chairman of Eurocrypt '94, of the Fifth Italian Conference on Theoretical Computer Science, 1995, and of the Security in Communication Networks Conference, 1996. He was cochair of the Advanced School on Computational Learning and Cryptography, Vietri sul Mare, Italy, 1993. He served on the scientific program committee of several international conferences. He was the editor of the Proceedings of the Fifth Italian Conference on Theoretical Computer Science, World Scientific, 1996. He was the editor of the volume Advances in Cryptology—Eurocrypt '94 and coeditor of the volume Sequences II: Methods in Communication, Security and Computer Science (Springer-Verlag, 1993). His research interests include algorithms, data security, cryptography, communication networks, information theory, and data compression.

Clemente Galdi received the laurea degree (cum laude) and the PhD degree in computer science from the University of Salerno (Italy) in 1997 and 2002, respectively. From May to September 2001, he visited Telcordia Technologies, New Jersey. From November 2001 to October 2004, he was a postdoctoral fellow with the Department of Computer Engineering and Informatics of the University of Patras and the Computer Technology Institute, Patras, Greece. Since April 2006, he has been an assistant professor at the University of Napoli "Federico II." His research interests include cryptography, data security, and algorithms.

Roberto Tagliaferri received the laurea degree in computer science from the University of Salerno, Italy, in 1984. From 1986 to 1999, he was a researcher with the Department of Computer Science at the University of Salerno. Since 2000, he has been an associate professor with the Department of Mathematics and Informatics of the University of Salerno. His research covers the area of neural nets: neural dynamics, fuzzy neural nets, clustering and data visualization techniques, and their applications to signal and image processing with astronomical and geological data, bioinformatics, and medical computer-aided diagnosis. He has been cochairman of special sessions at AMSE ISIS '97, at IJCNN '99, IJCNN '01, IJCNN '03, WILF '03, IJCNN '04, IJCNN '05, WILF '05, and IJCNN '06, and coeditor of a special issue of Neural Networks. He presented tutorials on "Learning with Multiple Machines: ECOC Models versus Bayesian Framework" at IJCNN '03 and on "Visualization of High Dimensional Scientific Data" at IJCNN '05. He is the author of more than 100 publications in the area of neural networks. Since 1995, he has been coeditor of the Proceedings of the Italian Workshops on Neural Nets (WIRN). He was Secretary of SIREN (Società Italiana Reti Neuroniche) from 1994 to 2005. Currently, he is a cochair of the Bioinformatics SIG of the INNS, a member of the Director Council of the IIASS (International Institute for Advanced Scientific Studies) E.R. Caianiello, a senior member of the IEEE, and a member of INFN, INFM, and AIIA.

