Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005
Abstract— Nowadays, the integer prime-factorization problem finds its application often in modern cryptography. Artificial Neural Networks (ANNs) have been applied to the integer prime-factorization problem. A composed number N is applied to the ANNs, and one of its prime factors p is obtained as the output. Previously, neural networks dealing with the input and output data in a decimal format have been proposed. However, the accuracy is not sufficient. In this paper, a neural network following a binary approach is proposed. The input N as well as the desired output p are expressed in a binary form. The proposed neural network is expected to be more stable, i.e. less sensitive to small errors in the network outputs. Simulations have been performed and the results are compared with the results reported in the previous study. The number of required searches for the true prime number can be greatly reduced. Furthermore, the probability density function of the training patterns is investigated and the need for different data creation and/or selection techniques is shown.

I. INTRODUCTION

Since ancient times, mathematicians have been fascinated by the integer prime-factorization problem, also known as the prime decomposition problem. In mathematics, more specifically in number theory, the fundamental theorem of arithmetic [1] states that every positive integer greater than 1 can be written as a product of prime numbers in only one unique way¹. Although it is fairly easy to compute the product of a number of given primes, it appears to be very difficult to decompose a given product into its unique primes. Up to the present, there exists no known deterministic or randomized polynomial-time algorithm for finding the factorization of a given composed number [2].

Nowadays, the integer factorization problem finds its application often in modern cryptography. The computational security of many cryptographic systems, including the well-known and widely applied RSA public-key algorithm, relies on the difficulty of factoring large integers. If a fast method for solving the integer factorization problem were found, then several important cryptosystems would become insecure [2], [3]. Therefore, studies on factoring large integers are very important for the development of more secure cryptosystems.

In this paper, first the ability of artificial neural networks (ANNs) to solve the integer factorization problem is investigated. A binary approach is proposed, expected to be more stable, i.e. less sensitive to small errors in the network output, compared to the previously used decimal approach. Simulations have been performed and the results are compared with the results reported in the previous independent study. Moreover, instances of larger composites N and consequently larger primes p are investigated. Finally, the probability density of the training patterns is examined and the need for a different way to create or select the training data, in order to solve the integer factorization problem for numbers composed of larger primes, is shown.

¹ Ignoring the ordering of the factors.

II. PROBLEM DESCRIPTION

In this paper, artificial neural networks are applied in order to factor integers N which are the product of two odd² primes p and q, i.e. N = p · q. Throughout the article we will assume p ≤ q. To factor an integer composed of two prime numbers, it is enough to find one of the two primes. The other prime can then be obtained through a single division. Here, given N, we focus on obtaining the smaller prime p of the two. In other words, we try to approximate the mapping N → p.

Although it has not been applied in our research, it should be mentioned for later reference that it has been proved [4] that N can be factored using any multiple of φ(N) = (p − 1)(q − 1).

² We have chosen to consider only numbers composed of odd primes for two reasons. First of all, this allowed us to make a straightforward comparison with the results reported in the previous study. Secondly, the case where N is composed of an even prime, resulting in N being an even number, is trivial anyway, especially in a binary format.

III. NEURAL NETWORK FOR FACTORIZATION

A. Network Design

The neural networks used in our experiments are multilayer feed-forward networks with a single hidden layer. Although we have experimented with various multilayer feed-forward neural networks, the best results were obtained with networks having only a single hidden layer.

Networks constructed with neurons having a sinusoidal activation function in the hidden layer performed better compared
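The closing remark of Section II can be made concrete for its simplest case: when φ(N) itself is known (rather than an arbitrary multiple, which requires the probabilistic method of [4]), the factors follow from elementary algebra, since φ(N) = N − (p + q) + 1 gives the sum p + q, and p, q are then the roots of x² − (p + q)x + N = 0. A minimal sketch (the function name is ours, not from the paper):

```python
from math import isqrt

def factor_from_phi(N, phi):
    """Recover the odd primes p <= q from N = p*q and phi = (p-1)*(q-1).

    phi = N - (p+q) + 1, so the sum s = p+q is known, and p, q are
    the roots of x^2 - s*x + N = 0.
    """
    s = N - phi + 1                 # p + q
    d = isqrt(s * s - 4 * N)        # sqrt of the discriminant (p - q)^2
    p, q = (s - d) // 2, (s + d) // 2
    assert p * q == N
    return p, q

# Example: N = 15 = 3 * 5, phi = 2 * 4 = 8
print(factor_from_phi(15, 8))   # -> (3, 5)
```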
B. Training and Test Data

This being our first study related to the integer factorization problem, we dealt with relatively small composed integers, restricted by an upper bound, and consequently small prime numbers. This enabled us to generate and use all possible patterns N → p within the limitation N < M. A certain percentage of the data set was used for training, while the remaining part was used to test the performance of the network after learning had converged.

Both the input N and the desired output p were represented in a binary form. During evaluation, any output of an output neuron greater than or equal to zero was considered as an upper bit, while any output less than zero was considered as a lower bit.

IV. PERFORMANCE MEASURE

A. Measures for Bit Errors

To evaluate the network performance, two different measures are considered. The first measure, which we call the binary complete measure and denoted by β0, indicates the percentage of the data for which the network produces the correct value for every output bit.

However, it can be known that the least significant bit of p will always be equal to one, because N is a product of two odd primes. Also, in all our performed experiments the ANNs always correctly output this least significant bit. Taking this knowledge into consideration, c(k) given in Eq. (1) can be replaced by:

    c′(k) = \binom{b-1}{k} = \frac{(b-1)!}{k!\,(b-1-k)!}    (3)

Analogously, the maximum number of required trial and error procedures in order to obtain the correct output for data within βi can be defined by:

    S′_{β_i} = \sum_{l=1}^{i} c′(l)    (4)

V. SIMULATIONS

The neural networks used in our simulations have been developed using the Java Object-Oriented Neural Engine (JOONE), an open source neural net framework implemented in Java [8].
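The exhaustive pattern generation described in Section III-B can be sketched as follows. The helper names and bit widths are our assumptions, not from the paper; 14 input bits and 7 output bits are consistent with the 14-20-7 topology used for N < 10000:

```python
def primes_upto(n):
    """Return all primes <= n via a simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def to_bits(x, width):
    """Binary representation of x, most significant bit first."""
    return [(x >> i) & 1 for i in range(width - 1, -1, -1)]

def make_patterns(M, n_bits, p_bits):
    """All patterns N -> p with N = p*q < M, p <= q, and p, q odd primes."""
    odd_primes = [p for p in primes_upto(M // 3) if p > 2]
    patterns = []
    for i, p in enumerate(odd_primes):
        for q in odd_primes[i:]:          # enforce p <= q
            N = p * q
            if N >= M:
                break                     # primes are sorted, so stop here
            patterns.append((to_bits(N, n_bits), to_bits(p, p_bits)))
    return patterns

patterns = make_patterns(10000, 14, 7)
```

A fixed split of such a pattern list (e.g. 66% training / 34% test) reproduces the data sets used in the experiments.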
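Equations (3) and (4) are cheap to evaluate directly. The sketch below (names ours) uses b = 7 output bits, the width needed for p when N ≤ 10000; the resulting values S′β1 = 6, S′β2 = 21 and S′β3 = 41 reappear in the comparison of Section VI-B:

```python
from math import comb

def c_prime(k, b):
    """Eq. (3): number of candidate outputs with exactly k bit errors,
    with the (always correct) least significant bit excluded from the b bits."""
    return comb(b - 1, k)

def s_prime(i, b):
    """Eq. (4): maximum number of trial and error procedures needed to
    recover the true prime for data within the beta_i measure."""
    return sum(c_prime(l, b) for l in range(1, i + 1))

# For N <= 10000 the prime p fits in b = 7 output bits:
print([s_prime(i, 7) for i in range(1, 4)])   # -> [6, 21, 41]
```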
Results are reported for single neural networks (SNNs) and multi-modal neural networks (MNNs) consisting of three independent neural networks. All presented results are averaged over 10 independent runs for each problem instance.

Tables I and II show the results of networks trained on the problem instances N < 1000 and N < 10000, respectively. Here, 66% of the data set was used for training. This allowed us to make a straightforward comparison with the results reported in the previous study [9].

TABLE I
RESULTS FOR NETWORKS TRAINED WITH 66% OF THE DATA SET WITH N = p · q ≤ 1000

TABLE II
RESULTS FOR NETWORKS TRAINED WITH 66% OF THE DATA SET WITH N = p · q ≤ 10000

| Topology        | Epochs           | β0  | β1  | β2  | β3  | Data  |
| SNN 14-20-7     | 5000             | 64% | 81% | 93% | 98% | Train |
|                 |                  | 53% | 69% | 83% | 94% | Test  |
| MNN-3 14-20-7   | 15000 (3 · 5000) | 72% | 85% | 95% | 99% | Train |
|                 |                  | 62% | 73% | 85% | 94% | Test  |

For larger problem instances where N ≤ 100000, the neural networks are still able to obtain satisfactory results, as shown in Table III.

TABLE III
RESULTS FOR NETWORKS TRAINED WITH 66% OF THE DATA SET WITH N = p · q ≤ 100000

| Topology        | Epochs            | β0  | β1  | β2  | β3  | β4  | Data  |
| SNN 17-50-9     | 10000             | 49% | 63% | 77% | 89% | 97% | Train |
|                 |                   | 45% | 58% | 72% | 86% | 95% | Test  |
| MNN-3 17-50-9   | 30000 (3 · 10000) | 55% | 66% | 79% | 90% | 97% | Train |
|                 |                   | 51% | 61% | 73% | 86% | 96% | Test  |

Trying smaller training sets, the networks maintain the ability to adapt to the training data and generalize on the test data. This is illustrated in Table IV, where neural networks were trained with 33% of the data set, while the remaining part was used for validation.

Finally, the results for the problem instance N < 1000000, where only 10% of the data set was used for training, are shown in Table V.

TABLE IV
RESULTS FOR NETWORKS TRAINED WITH 33% OF THE DATA SET WITH N = p · q ≤ 100000

| Topology        | Epochs            | β0  | β1  | β2  | β3  | β4  | Data  |
| SNN 17-50-9     | 10000             | 51% | 66% | 81% | 92% | 98% | Train |
|                 |                   | 44% | 57% | 72% | 86% | 95% | Test  |
| MNN-3 17-50-9   | 30000 (3 · 10000) | 59% | 72% | 84% | 94% | 99% | Train |
|                 |                   | 52% | 62% | 74% | 86% | 95% | Test  |

TABLE V
RESULTS FOR NETWORKS TRAINED WITH 10% OF THE DATA SET WITH N = p · q ≤ 1000000

VI. COMPARISON BETWEEN PREVIOUS APPROACH AND CURRENT STUDY

A. Differences in Approach

Previously, Meletiou et al. investigated the ability of ANNs to factor integers and reported promising results [9]. Here we address the same problem; however, two major differences exist between their study and ours. The differences occur in the approach of solving the integer factorization problem by ANNs; more specifically, the differences are in the representation of the data and in the function to approximate.

Meletiou et al. dealt with the data in decimal form, where the input data N < M was normalized to the space S = [−1, 1] by splitting it up into M sub-spaces. The network output was transformed back into an integer number within the interval [0, M] using the inverse operation. According to their paper, this normalization step played a crucial role in the whole procedure, aiming to transform the data in such a way that the network would find it easier to adapt to. However, in our opinion this normalization step might lead to problems as the problem space, i.e. the upper bound M of N, starts to grow. Whenever larger problem instances are considered, the network output range, restricted to [−1, 1], will be divided into more, but smaller, sub-spaces during the normalization step. Then, even very small differences in the network output might lead to values far removed from the desired target output after denormalization.

This problem is already briefly addressed in their work by introducing a second performance measure besides the complete measure. The complete measure, denoted by µ0, indicates the percentage of the data for which the network is able to compute the exact target value. The second measure, called the near measure and denoted by µ±k, indicates the percentage of the data for which the difference between the
desired and actual output does not exceed ±k of the real target. However, introducing this second measure does not solve the problem and is more an admission of the existence of the problem. As long as the network output is within a small distance from the desired output, this approach is acceptable. However, we believe that as the problem size starts to grow, it might be very difficult to maintain this approach.

Therefore, in our research we have chosen to deal with the data in a binary form, expecting that this approach leads to a more stable network. It is our assumption that by applying a binary approach, the danger that small differences in the network output lead to values far removed from the desired output is greatly reduced, because the restricted output [−1.0, 1.0] of the network output neurons is divided into only two large sub-spaces, as opposed to the M very small sub-spaces of the decimal approach.

From a different point of view, when the ANNs deal with the data in a binary form, the problem can be seen as a classification problem instead of a function approximation problem, where every output neuron has to decide to which class the input belongs.

The second difference is related to the function to approximate. Meletiou et al. focused on approximating the mapping N → φ(N), while we tried to approximate the mapping N → p. Both mappings enable one to factor integers. However, the possible output range for N → p is much smaller than for N → φ(N) when N is bounded by a certain upper value. Secondly, it takes less work to verify the validity of the network output and to compute the complete factorization of N after obtaining p than after obtaining φ(N).

One more difference worth mentioning is that Meletiou et al. used two interesting, but non-straightforward, techniques to overcome convergence to local minima: the so-called deflection technique [10] and the function "stretching" method [11]. We were able to obtain satisfactory results without the use of these kinds of techniques.

B. Comparison of Results

In Tables VI and VII the results reported by Meletiou et al. are shown for the problem instances N ≤ 1000 and N ≤ 10000 respectively, where 66% of the data set was used for training.

TABLE VI
RESULTS REPORTED BY MELETIOU ET AL. FOR NETWORKS TRAINED WITH 66% OF THE DATA SET WITH N = p · q ≤ 1000

| Topology | Epochs | µ0 | µ±2 | µ±5 | µ±10 | µ±20 | Data  |
| 1-3-5-1  | 60000  | 5% | 20% | 40% | 60%  | 80%  | Train |
|          |        | 5% | 20% | 40% | 50%  | 80%  | Test  |
| 1-7-8-1  | 50000  | 6% | 20% | 50% | 70%  | 100% | Train |
|          |        | 5% | 20% | 50% | 70%  | 90%  | Test  |

TABLE VII
RESULTS REPORTED BY MELETIOU ET AL. FOR NETWORKS TRAINED WITH 66% OF THE DATA SET WITH N = p · q ≤ 10000

| Topology | Epochs | µ0 | µ±2 | µ±5 | µ±10 | µ±20 | Data  |
| 1-5-5-1  | 80000  | 3% | 15% | 35% | 65%  | 90%  | Train |
|          |        | 5% | 20% | 40% | 60%  | 90%  | Test  |

Comparing those results with the results reported in Tables I and II, it can be easily noticed that our proposed neural networks outperform the networks proposed in the study of Meletiou et al.

First of all, observing the results for N ≤ 10000, the values for the (binary) complete measure, that is 64% and 53% for the training data and the test data respectively in the case of single neural networks, are much higher than the percentages 3% and 5% for the training data and test data respectively reported in the study of Meletiou et al.

Moreover, in an attempt to make a fair comparison of the overall performance of the two different networks, the average number of required trial and error procedures in order to obtain the true prime number for the reported test data will be taken into consideration. Regarding the test data for N ≤ 10000, Meletiou et al. reported 5% and 20% for the successive measures µ0 and µ±2, respectively. This means that 20% − 5% = 15% of the data is within µ±2 and not within µ0. Furthermore, for that 15% of the data the true prime is sure to be found within 4 trial and error procedures³. Therefore, we will assume that it takes 2 trial and error procedures on average. By accumulating the average number of trial and error procedures for each data set proportional to their size, it is possible to calculate the average number of trial and error procedures for all the reported data:

    \frac{15 \cdot \frac{4}{2} + 20 \cdot \frac{4+10}{2} + 20 \cdot \frac{10+20}{2} + 30 \cdot \frac{20+40}{2}}{90} = 15.2

³ The data is within µ±2; therefore the correct output can be obtained within the range of 2 above and 2 below the actual output, resulting in a maximum of 4 trial and error procedures.

A similar calculation can be performed on the data reported for our simulations. Considering the test data for N ≤ 10000, we have reported 53% and 69% for the successive binary measures β0 and β1, respectively. Therefore, 69% − 53% = 16% of the test data is within β1 and not within β0, and for that 16% at most 6 trial and error procedures are required in order to find the true prime according to Eq. (4). Therefore, we will assume that it takes half that number, i.e. 3 trial and error procedures, on average. The average number of trial and error procedures for all the reported data can be given by:

    \frac{16 \cdot \frac{6}{2} + 14 \cdot \frac{6+21}{2} + 11 \cdot \frac{21+41}{2}}{94} = 6.1

Although these average values for the required number of trial and error procedures have some inaccuracy, because not all the test data (90% and 94%) is used in the calculations, in our opinion these values still give a very good indication of the performance gain that can be achieved by the newly proposed method. We believe that these results support the choice for a binary approach.

Moreover, a positive side effect is that the number of training epochs in our experiments, 5000 and 15000 for SNNs and MNNs respectively, is much lower compared to the 50000–80000 training epochs reported in the study of Meletiou et al., who, among other algorithms, also applied the RPROP learning algorithm [6].
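The two weighted averages derived above can be re-checked with a few lines of arithmetic; the slice weights (percentages of test data) and the per-slice average trial counts are taken directly from the derivation in the text:

```python
def weighted_average(slices, total):
    """Average trial-and-error count over slices of (percentage, avg trials)."""
    return sum(pct * avg for pct, avg in slices) / total

# Meletiou et al., test data for N <= 10000 (measures mu_0 ... mu_+-20):
meletiou = [(15, 4 / 2), (20, (4 + 10) / 2), (20, (10 + 20) / 2), (30, (20 + 40) / 2)]
# Our networks, test data for N <= 10000 (measures beta_0 ... beta_3):
ours = [(16, 6 / 2), (14, (6 + 21) / 2), (11, (21 + 41) / 2)]

print(round(weighted_average(meletiou, 90), 1))  # -> 15.2
print(round(weighted_average(ours, 94), 1))      # -> 6.1
```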
[6] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. of the IEEE International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586–591.
[7] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318–362, 1986.
[8] P. Marrone. (2005) Java Object-Oriented Neural Engine (JOONE). [Online]. Available: http://www.jooneworld.com
[9] G. C. Meletiou, D. K. Tasoulis, and M. N. Vrahatis, "A first study of the neural network approach in the RSA cryptosystem," in 6th IASTED International Conference on Artificial Intelligence and Soft Computing (ASC 2002), Banff, Canada, July 2002, pp. 483–488.
[10] G. D. Magoulas, M. N. Vrahatis, and G. S. Androulakis, "On the alleviation of the problem of local minima in backpropagation," Nonlinear Analysis T.M.A., vol. 30, no. 7, pp. 4545–4550, 1997.
[11] K. E. Parsopoulos, V. P. Plagianakos, G. D. Magoulas, and M. N. Vrahatis, "Objective function 'stretching' to alleviate convergence to local minima," Nonlinear Analysis T.M.A., vol. 47, no. 5, pp. 3419–3424, 2001.