IN THE CLASSROOMRICK HESSE, Feature Editor, Mercer University
SIMULATING THE BIRTHDAY PROBLEMby Rick Hesse, Mercer University I want to share a nice spreadsheet template that simulates the "birthday problem," which is presented in most statistics books when discussing probability. Most students are surprised that there is better than a 70% chance that in a group of 30 people randomly chosen, at least two share the same birth date (not necessarily the same year). Shown in Figure 1 is the complete spreadsheet template, which will be explained in detail. For this example run, there are two people born on November 3rd (day 307) and thus a match. For this run, the original birthdays are in column B while column C contains the same birthdays in ascending order. Column A contains the number of people you wish to simulate in the group, which in this case is 25. Cells C2..D2 give the range of the birthday number (1-365, excluding leap year) to simply show the data ranges. Starting in cell B4 is the simulated result of a random birthday: Lotus: @INT(($D$2-$C$2)*@RAND+1) <%0>) Of course this assumes an equal chance of being born on any of the 365 days, which is probably not correct, but close enough to being uniform for our purposes. Once these birthday numbers have been calculated, we need to determine if there are any matches. The easiest way to do this is to sort column B so that if there are two or more birthdays on the same day, these numbers will be adjacent to each other. But when this sorting is done, the numbers get recalculated (even with recalculation turned off, there are problems later on)! Therefore we use column C to do a Paste Special Value (or Range Value in classic Lotus) to "freeze" or save these numbers. Then we can sort these "frozen" values in column C and check to see if there are any matches in column D. The logic formula checks to see if the previous date is the same as the current one. Lotus (D5): C5=C4 Excel (D5): =(C5=C4)*1 It is necessary in Excel to change the logic result of True/False to a number by multiplying the logic statement in parentheses by 1. Even if 0 or 1 is showing, Excel doesn't recognize it as numeric unless you multiply by 1. With either spreadsheet it now is a simple matter to check if the sum of column D is 0. If it is, there is at least one birthday match. In Figure 1, there were two birthday matches. The formula in D4 is:
Lotus (D4):
Excel (D4):
The actual probability of there being at least two people with the
same birthday is given in E4 with the calculations of žnž people
not having the same birthday shown from E5 on down.
E4: 1-E28.
The math formula for the probability of no one with the same
birthday out of "n" people is P(n) = (366-n)/365*P(n-1) with P(1)
= 1. The formula for this is:
E5: ($D$2-A4)/$D$2
E6: E5*($D$2-A5)/$D$2 and copy down
The keyboard macro shown in G2..G5 is for classic Lotus. This macro
calculates a new set of random birthdays in column B, copies the
values over to column C, sorts it, and calculates any matches in
column D. This macro will also work in Lotus 4, but must be
rewritten to work in Excel. The macro is named \R (for recalculate)
and its range is defined as G2..G5. The text can be all in one cell
or broken down into successive cells as shown in Figure 2.
As a class or lab exercise, students can run this 20 times or so
and determine if the probability comes close to the expected value.
The template can also be easily changed for a different number of
people in the group.
Dr. Rick Hesse
|