BIRTHDAY PARADOX: A SIMULATION OF SHARED BIRTHDAY EXPERIMENTS
Stephen Hogan (50217631)
Postgraduate Diploma in IT (Evening)
Dublin City University School of Computing
hogans2@mail.dcu.ie
Abstract: The birthday paradox (footnote 1) is one of the best-known problems in probability. It asks: "How many randomly chosen people are needed to achieve at least a 50% probability that some pair will have been born on the same day?" Since the chance of any two given persons having the same birthday is remote, many of us would expect this number to be rather large. However, this is not the case, hence the paradox.

A simulation was conducted to run multiple trials in which the birthdays of the people in a group of a given size are compared. The probability is estimated as the number of successful experiments as a proportion of the total number of experiments performed.

As the theoretical value is already known (23), it also served as the target result for the algorithm, which was implemented in Java, and this result was achieved.
1. INTRODUCTION

The premise dates from 1938 (footnote 2): in a room of just 23 people, there is a probability of just over 50% that two of them share the same birthday (ignoring years of birth). In a room of 75 people, the chance of two people having matching birthdays exceeds 99.9%. This is one particular case of exponential (Saliusian) sets, where duplicates are allowed. Exponents are not intuitive, which is why our linear thinking leads us to an incorrect estimate.

Theoretical mathematical models and proofs have been derived in previous works and are outside the scope of this paper. The objective here is not to put the formulae into action but to highlight a proof by example: a simulation of birthdates for groups of people is analysed, and the resulting probability is compared against the aforementioned premise.

The following two sections, Background and Method, share the following assumptions:

• There are only 365 days in a year, i.e. leap years are ignored. This also means ignoring the omission of the leap day in years that are divisible by 100 but not by 400.
• Birth years are ignored.
• People's birthdays are equally distributed throughout the year, i.e. influencing elements such as seasonality are not factored in. In real life, birthday distributions are not uniform: not all dates are equally likely.
• The date of one person's birthday does not affect the date of another person's birthday, i.e. twins, triplets, etc. are not considered.
1 This is not a paradox in the literal sense; it just highlights the fact that people expect the value to be much larger.
2 American Mathematical Monthly, 1938: Zoe Emily Schnabel, "The estimation of the total fish population of a lake", under the name of capture-recapture statistics.
2. BACKGROUND
Mathematical Model

One of the basic rules of probability is that the sum of the probability that an event will happen and the probability that the event will not happen is always 1. In other words, the chance that something either does or does not happen is always 100%.

If we can work out the probability that no two people will have the same birthday, we can use this rule to find the probability that two people will share a birthday:

P(event happens) + P(event does not happen) = 1
→ P(two people share birthdays) + P(no two people share birthdays) = 1
→ P(two people share birthdays) = 1 - P(no two people share birthdays)
The formula for the probability that n people have different birthdays (month and day) is (footnote 3):

$$\frac{365!}{(365 - n)! \cdot 365^{n}} \qquad (1)$$
Therefore, the probability that at least two of them share the same birthday is:

$$1 - \frac{365!}{(365 - n)! \cdot 365^{n}} \qquad (2)$$
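As a numerical check on the formula for the probability of a shared birthday, the following minimal Java sketch evaluates it for a given n and searches for the smallest group size that reaches a target probability. It is an illustration only, not the program described in this paper: the class and method names are hypothetical, and the factorial ratio is rewritten as the product (365/365) * (364/365) * ... * ((365 - n + 1)/365), because 365! is far too large to be held in a double.

```java
final class BirthdayTheory {
    static final int DAYS_IN_YEAR = 365;

    /** Probability that at least two of n people share a birthday (365 equally likely days). */
    static double probabilityOfMatch(int n) {
        if (n > DAYS_IN_YEAR) {
            return 1.0; // more people than days: a match is certain
        }
        double noMatch = 1.0;
        // Product form of 365! / ((365 - n)! * 365^n), avoiding huge factorials.
        for (int i = 0; i < n; i++) {
            noMatch *= (double) (DAYS_IN_YEAR - i) / DAYS_IN_YEAR;
        }
        return 1.0 - noMatch;
    }

    /** Smallest n whose match probability is at least the target (target must be <= 1). */
    static int smallestGroupSize(double target) {
        if (target > 1.0) {
            throw new IllegalArgumentException("target probability must not exceed 1");
        }
        int n = 1;
        while (probabilityOfMatch(n) < target) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.printf("P(22) = %.4f%n", probabilityOfMatch(22));
        System.out.printf("P(23) = %.4f%n", probabilityOfMatch(23));
        System.out.println("Smallest n with P >= 0.5: " + smallestGroupSize(0.5));
    }
}
```

Under the stated assumptions, the search for a target of 0.5 stops at n = 23, in agreement with the theoretical value quoted in the abstract.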

When (2) is graphed, as in Figure 1, the curve crosses the 50% probability level at a group size of 23 people:

Figure 1: A graph showing the approximate probability of at least two people sharing a birthday amongst a certain number of people (footnote 4).
3 See Dr Math FAQ: The Birthday Problem (http://mathforum.org/dr.math/faq/faq.birthdayprob.html)
4 Wikipedia: Birthday Problem (http://en.wikipedia.org/wiki/Birthday_paradox)
Comment [JM1]: Keep it formal and to
the point.
Comment [JM2]: Use a reference rather
than a footnote.
Comment [JM3]: A reference would be
welcome here.
Comment [JM4]: This could be compressed by simply giving a single equation and pointing the reader to a reference where more detail is given.

Comment [JM5]: Mixed font size.
3. METHOD
Programming Methodology

The simulation was written in the Java language (footnote 5). A text file listing multiple trial records serves as input to the program. Each record holds the following parameters (the defaults outlined here are geared to solving the problem in question):

• Number of matches to be checked (default: K = 2)
• Number of trials (various large values)
• Starting group size (default: N = 2)
• Group size increment (default: 1)
• Terminating probability (default: 0.5)
Assumptions:

• The group size will never be greater than 1,000.
• If the starting value of N is less than K, the simulation starts with N = K, since it is impossible to find, say, three (i.e. K) matches in a group of two (i.e. N) persons.
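The parameters and assumptions above could be read along the following lines. This is a sketch only: the exact layout of the input file is not specified here, so it is assumed that each record consists of five whitespace-separated values in the order listed above. The names TrialConfigSketch, TrialConfig and parse are hypothetical and are not taken from the program in Appendix 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class TrialConfigSketch {
    static final int MAX_GROUP_SIZE = 1000; // assumption stated in the text

    /** One record of the input file (hypothetical holder class). */
    static final class TrialConfig {
        final int matches;                 // K
        final int numTrials;
        final int startGroupSize;          // N, raised to K if it was smaller
        final int increment;
        final double terminatingProbability;

        TrialConfig(int matches, int numTrials, int startGroupSize,
                    int increment, double terminatingProbability) {
            this.matches = matches;
            this.numTrials = numTrials;
            this.startGroupSize = startGroupSize;
            this.increment = increment;
            this.terminatingProbability = terminatingProbability;
        }
    }

    /** Reads records of five whitespace-separated values until the input ends. */
    static List<TrialConfig> parse(Scanner in) {
        List<TrialConfig> configs = new ArrayList<>();
        while (in.hasNextInt()) {
            int k = in.nextInt();
            int trials = in.nextInt();
            int n = in.nextInt();
            int inc = in.nextInt();
            // parseDouble avoids locale-dependent decimal separators.
            double p = Double.parseDouble(in.next());
            if (n > MAX_GROUP_SIZE) {
                throw new IllegalArgumentException("group size exceeds " + MAX_GROUP_SIZE);
            }
            // K matches cannot occur among fewer than K people, so start at N = K.
            configs.add(new TrialConfig(k, trials, Math.max(n, k), inc, p));
        }
        return configs;
    }

    public static void main(String[] args) {
        List<TrialConfig> configs = parse(new Scanner("2 10000 2 1 0.5  3 20000 2 1 0.5"));
        for (TrialConfig c : configs) {
            System.out.println("K=" + c.matches + " trials=" + c.numTrials
                    + " N=" + c.startGroupSize + " P=" + c.terminatingProbability);
        }
    }
}
```

Because Scanner treats spaces and line breaks alike, the same sketch would accept one value per line or one record per line.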

Beginning with a group size of N = 2 people, we initialise an array with N random birthdays (the random number generator is seeded with the current time). We compare every pairwise combination (number of matches K = 2) of people in the group and check whether any two persons have the same birthday.

This is repeated "number of trials" times, with a different randomly generated group each time. If the proportion of these trials in which two persons share a birthday exceeds the terminating probability (P = 0.5), the simulation terminates. Otherwise, N is incremented by the group size increment (1), and the entire set of trials is repeated with randomised groups of (starting group size = 2) + (group size increment = 1) = 3 people, and so on.
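The loop just described can be sketched in Java as follows, for the default case K = 2 with a direct pairwise comparison. This is an illustrative outline rather than the code in Appendix 2: the names PairwiseSimulationSketch, anyPairShares and findGroupSize are hypothetical, and the random generator is passed in as a parameter so that a run can be repeated with a fixed seed.

```java
import java.util.Random;

final class PairwiseSimulationSketch {
    static final int DAYS_IN_YEAR = 365;
    static final int MAX_GROUP_SIZE = 1000; // assumption stated in the text

    /** Compares every pair of people and reports whether any two birthdays are equal. */
    static boolean anyPairShares(int[] birthday) {
        for (int i = 0; i < birthday.length; i++) {
            for (int j = i + 1; j < birthday.length; j++) {
                if (birthday[i] == birthday[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the first group size whose estimated probability exceeds the terminating value, or -1. */
    static int findGroupSize(int numTrials, int startSize, int increment,
                             double terminatingProbability, Random rng) {
        if (numTrials < 1 || increment < 1) {
            throw new IllegalArgumentException("numTrials and increment must be positive");
        }
        for (int groupSize = Math.max(startSize, 2); groupSize <= MAX_GROUP_SIZE; groupSize += increment) {
            int successes = 0;
            int[] birthday = new int[groupSize];
            for (int trial = 0; trial < numTrials; trial++) {
                for (int i = 0; i < groupSize; i++) {
                    birthday[i] = rng.nextInt(DAYS_IN_YEAR) + 1; // day of year, 1 to 365
                }
                if (anyPairShares(birthday)) {
                    successes++;
                }
            }
            double probability = (double) successes / numTrials;
            if (probability > terminatingProbability) {
                return groupSize;
            }
        }
        return -1; // terminating probability not reached within the size limit
    }

    public static void main(String[] args) {
        Random rng = new Random(System.currentTimeMillis()); // seeded with the current time
        System.out.println("Group size: " + findGroupSize(100_000, 2, 1, 0.5, rng));
    }
}
```

The pairwise check costs on the order of N squared comparisons per trial, which is acceptable for groups of around 23 people but grows quickly for larger N.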

Reading the controlling values from a file allows us to evaluate the number of people needed for various terminating probabilities, to check for birthdays shared by more than two persons, and to vary the number of trials. A review of the Java source code in Appendix 2 should reveal other variations.

For each trial, a number of random birthdays are generated and placed into an array (here, we use the day of the year, 1 to 365, often loosely called the Julian date format). These birthdays are sorted, and the array is then iterated through to find equal values in consecutive elements, denoting a success. Once the trials have completed, the probability is evaluated as (footnote 6):

currProbability = numSameBirthday / numTrials
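The sort-and-scan step and the probability estimate above might look like the following in Java. This is a sketch under assumptions: it uses the standard library's Arrays.sort in place of the Quicksort implementation referenced in [4], it assumes K is at least 2, and the names SortedTrialSketch, hasRunOfLength and estimateProbability are hypothetical. The cast to double matters because numSameBirthday / numTrials with two int operands would truncate to 0.

```java
import java.util.Arrays;
import java.util.Random;

final class SortedTrialSketch {
    static final int DAYS_IN_YEAR = 365;

    /** In a sorted array, k equal values (k >= 2) appear as a run of k consecutive elements. */
    static boolean hasRunOfLength(int[] sorted, int k) {
        int run = 1;
        for (int i = 1; i < sorted.length; i++) {
            run = (sorted[i] == sorted[i - 1]) ? run + 1 : 1;
            if (run >= k) {
                return true;
            }
        }
        return false;
    }

    /** Estimates P(at least k people share a birthday) for one group size. */
    static double estimateProbability(int groupSize, int k, int numTrials, Random rng) {
        int numSameBirthday = 0;
        int[] birthday = new int[groupSize];
        for (int trial = 0; trial < numTrials; trial++) {
            for (int i = 0; i < groupSize; i++) {
                birthday[i] = rng.nextInt(DAYS_IN_YEAR) + 1; // day of year, 1 to 365
            }
            Arrays.sort(birthday);
            if (hasRunOfLength(birthday, k)) {
                numSameBirthday++;
            }
        }
        // Cast first: integer division would always yield 0 here.
        return (double) numSameBirthday / numTrials;
    }

    public static void main(String[] args) {
        Random rng = new Random(System.currentTimeMillis());
        System.out.println(estimateProbability(23, 2, 100_000, rng));
    }
}
```

Sorting first reduces the match check to a single pass over the array, and the same pass extends naturally to K greater than 2.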
4. RESULTS & DISCUSSION

Because the program reads an input file, multiple scenarios can be run. Table 1 shows sample data from the input file (in the format described in Section 3):

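Record  Matches (K)  Trials   Starting group size (N)  Group size increment  Terminating probability
1       2            10000    2                        1                     0.5
2       2            20000    2                        1                     0.5
3       2            30000    2                        1                     0.5
4       2            50000    2                        1                     0.5
5       2            700000   2                        1                     0.5
6       2            1000000  2                        1                     0.5
7       2            2000000  2                        1                     0.5
8       2            5000000  2                        1                     0.5

Table 1: Input file sample data.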
5 Adapted from [2] (http://www.comp.nus.edu.sg/~cs1101cl/labs_sem2_0405/lab3/oddweek/paradox.c)
6 See Appendix 1: Pseudocode.

The last record in this table (Simulation 8, with 5,000,000 trials) is graphed in Figure 2. Graphs for all records in this table (Appendix 4) reveal, interestingly, that time performance dips in proportion to the probability curve as the group size tends to 23 people, irrespective of the number of trials:

[Chart "Simulation 8": Probability (left axis, 0.0 to 0.6) and Time in ms (right axis, 0 to 25000) plotted against No. People (1 to 22).]
Figure 2: Simulation 8
5. CONCLUSIONS & RECOMMENDATIONS

This paper reports on a simulated empirical study of the Birthday Paradox. The findings suggest close agreement between the theoretical and (simulated) actual probabilities. Specifically, irrespective of the number of trials, at least 23 people (with previously unknown birthdays) are needed in a room to achieve at least a 50% probability of a shared birthday.

The input file allows for variations: matches among more than two people (K > 2), a group size increment greater than 1, and other terminating probabilities.

A further recommendation would be to refine the sorting algorithm, or to adopt a faster sorting mechanism, especially for large N and K. Parallelisation of Quicksort, for example, would be ideal, as the partitions can be sorted independently and synchronisation is not a requirement. Java threading would be one approach to this.
Another approach to solving this problem would be to base it on collisions, by tracking each person as they enter the room and checking whether there is a match with any person already present. Only an array of 365 elements would be needed, and a random date would be generated for each person until either the entire group is exhausted or a match has been found. The author decided against this approach after initially selecting it, as it would not be possible to measure time performance as fluidly.
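For comparison, the collision-based alternative described above can be sketched as follows. It was not the approach adopted in this paper, so the code is purely illustrative: the names CollisionSketch, trialHasCollision and estimateProbability are hypothetical, and only the K = 2 case is covered.

```java
import java.util.Random;

final class CollisionSketch {
    static final int DAYS_IN_YEAR = 365;

    /** People "enter the room" one at a time; stop at the first repeated birthday. */
    static boolean trialHasCollision(int groupSize, Random rng) {
        boolean[] seen = new boolean[DAYS_IN_YEAR]; // one flag per day of the year
        for (int person = 0; person < groupSize; person++) {
            int day = rng.nextInt(DAYS_IN_YEAR); // index 0 to 364
            if (seen[day]) {
                return true; // someone already in the room has this birthday
            }
            seen[day] = true;
        }
        return false; // whole group exhausted without a match
    }

    /** Proportion of trials in which a collision occurred. */
    static double estimateProbability(int groupSize, int numTrials, Random rng) {
        int collisions = 0;
        for (int trial = 0; trial < numTrials; trial++) {
            if (trialHasCollision(groupSize, rng)) {
                collisions++;
            }
        }
        return (double) collisions / numTrials;
    }

    public static void main(String[] args) {
        Random rng = new Random(System.currentTimeMillis());
        System.out.println(estimateProbability(23, 100_000, rng));
    }
}
```

Each trial needs no sorting and stops early on a match, so the work per trial is at most linear in the group size.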
6. REFERENCES
[1] Birthday Paradox. Wikipedia.com (http://en.wikipedia.org/wiki/Birthday_paradox)
[2] CS1101C Lab 3: Birthday Paradox. National University of Singapore, School of Computing (http://www.comp.nus.edu.sg/~cs1101cl/labs_sem2_0405/lab3/oddweek/)
[3] How to Generate Random Numbers. About.com (http://java.about.com/od/javautil/a/randomnumbers.htm)
[4] Quick Sort Implementation with median-of-three partitioning and cutoff for small arrays. Java-Tips.org (http://www.java-tips.org/java-se-tips/java.lang/quick-sort-implementation-with-median-of-three-partitioning-and-cutoff-for-small-a.html)
Comment [JM6]: Why not use algebraic notation as in (1) and (2)? Also: this eqn is not numbered.
Comment [JM7]: Explain the entries in the table.
Comment [JM8]: Why capitalise?
Comment [JM9]: You need to explain
what Simulation 8 is.
Comment [JM10]:We
APPENDIX 1\u2013 PSEUDOCODE
SET variables

    numTrials = any large number
    currGroupSize = 2
    groupIncrement = 1
    currProbability = 0.0
    terminatingProbability = 0.5

DO
{
    SET variable numSameBirthday = 0   (reset for every group size)
    DO FOR EACH trial FROM 1 TO numTrials
    {
        SET RANDOM number (1 TO 365) FOR EACH ELEMENT FROM birthday[1] TO birthday[currGroupSize]
        Sort birthday[] into ascending order
        Check for match between consecutive elements: IF TRUE THEN numSameBirthday = numSameBirthday + 1
    } END FOR
    currProbability = numSameBirthday / numTrials
    IF currProbability <= terminatingProbability THEN currGroupSize = currGroupSize + groupIncrement
} WHILE currProbability <= terminatingProbability

REPORT currGroupSize AND currProbability
Comment [JM11]: Complexity of t
could have been discussed in the paper.