ScottRomanowski
Wayne Hadady posted this to the ASL mailing list, 16 February 2001:
Re the test method I use, I posted something about it a long while
back. I can't find that old post. Here is a new rendition of the
procedure.
Roll the die at least 100 times, tallying the number of times each
face appears. Compare the tallies to the min/max number of
appearances one would reasonably expect (per stat theory) if the
die were truly fair. If any face appears too many or too few times,
reject the die, or continue to test.
If you continue to test, retain the data and add to it the tallies
of an additional 100 trials. As before, compare the new sums to the
expected min/max per face, but for 200 rolls (as opposed to 100).
If still not fair, perform another additional 100 trials, etc.
Thus, you might test an initially bad-seeming die into "goodness,"
but you never discard the accumulated data, and each successive test
boosts reliability (owing to the increased number of trials).
Eventually, the die tests fair (use it) or you tire of testing it,
and retire it.
A die will typically test fair (20 pct confidence band) within 300
trials. It might take 1000 or more to get there, though.
Here are the rejection boundaries I use:
#rolls    20 pct      10 pct      5 pct
t100      11--22      10--24      8--25
t200      26--41      24--43      22--45
t300      41--59      38--62      36--64
t400      56--77      53--80      51--82
t500      72--95      69--98      66--101
t600      87--113     84--116     81--119
t700      103--130    99--134     96--137
t800      119--148    115--152    112--155
t900      135--165    131--169    127--173
t1000     151--183    146--187    143--191
t1100     167--200    162--205    158--209
t1200     183--217    178--222    174--226
t1300     199--235    194--240    189--244
t1400     214--252    209--257    205--262
t1500     231--269    225--275    221--279
To pass, the tally for each face of the die must fall strictly inside
the desired confidence band; the listed boundary values themselves are
rejections. EX: for a t100 test, if =any= face appears 11 times or
fewer, or 22 times or more, that die fails at the 20 pct level.
A truly fair die will fail the 20 pct band (giving a =false= indication
of unfairness) 20 pct of the time. When that happens, you can
look at the 10 pct and 5 pct bands for a clue as to how badly it
failed. Based on that, you might test further (if it passed at the
10 pct level, say) or discard it (if it failed at the 5 pct level,
say).
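The false-failure rate can also be estimated empirically. The following is a rough Monte Carlo sketch, assuming Python's `random` module is an adequate stand-in for a fair die; the function names, trial count, and seed are hypothetical. It applies the t100 20 pct boundaries (11 and 22) to all six faces at once, so the rate it reports for the whole die need not match the per-face figure.

```python
import random


def fails_t100(tallies, low=11, high=22):
    """True if any face count is <= low or >= high (t100 row, 20 pct)."""
    return any(count <= low or count >= high for count in tallies)


def false_fail_rate(trials=2000, rolls=100, seed=1):
    """Fraction of simulated fair dice that fail the t100 20 pct band."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    failures = 0
    for _trial in range(trials):
        tallies = [0] * 6
        for _roll in range(rolls):
            tallies[rng.randrange(6)] += 1
        if fails_t100(tallies):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    print(f"fair dice failing t100 at 20 pct: {false_fail_rate():.3f}")
```

Changing `low`, `high`, and `rolls` to another row of the table gives the corresponding estimate for larger samples.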
I test to pass at the 20 pct level. If I get there in the first 100
rolls, and stick there after the next 100, I might rule the die fair
and go on to do other stuff (it's tedious work). If the die is difficult
(passing at 10 or 5 pct, but not 20), I might keep testing it
for some hundreds of trials (keeping an eye on the result of each t100
as I build the data file). A truly fair die should eventually overcome
the skew of some unlucky t100 test and drift into the 20 pct band.
The dice I retired were all failures at the 10 or 5 pct level, after
up to 1500 trials, with consistent-looking defects in their t100 tests.
A die defect is a propensity toward too few or too many appearances
of one or more faces. A consistent-looking defect is one that occurs
in two or more t100 tests while building the data for that die. If
a die is trending slowly (or not at all) toward the 20 pct band, and I
see, in several t100s, the same defect (too many 5s, or too few 4s,
say), I am likely to give up on it well before 1500 trials.
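The idea of a consistent-looking defect can be sketched as a tally of which faces go out of bounds in each separate t100 batch. This is a minimal illustration with hypothetical names and made-up batch data. It assumes the t100 20 pct boundaries (11 and 22) define "too few" and "too many"; the post does not say which band is used for that judgment.

```python
from collections import Counter


def t100_defects(tallies, low=11, high=22):
    """Return the defects in one 100-roll batch as (face, kind) pairs."""
    defects = []
    for face, count in enumerate(tallies, start=1):
        if count <= low:
            defects.append((face, "too few"))
        elif count >= high:
            defects.append((face, "too many"))
    return defects


def consistent_defects(batches, min_repeats=2):
    """Defects that occur in at least min_repeats separate t100 batches."""
    seen = Counter()
    for tallies in batches:
        seen.update(t100_defects(tallies))
    return sorted(d for d, n in seen.items() if n >= min_repeats)


# Hypothetical per-batch tallies (each sums to 100 rolls).
batches = [
    [15, 16, 14, 17, 23, 15],  # too many 5s
    [18, 17, 16, 11, 20, 18],  # too few 4s
    [14, 15, 17, 16, 24, 14],  # too many 5s again
]

print(consistent_defects(batches))  # [(5, 'too many')]
```

Here the surplus of 5s repeats in two batches and is flagged, while the single shortage of 4s is not.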
The math from which I derived the bands can be found in the REA
Statistics Problem Solver, problem 16-56, on page 623 of the 1985
printing.
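For readers without the book, bands of this kind can be approximated as follows. This is not the REA derivation, which is not reproduced in the post. It is a standard normal approximation to the binomial count of one face, with a continuity correction, and the function name is hypothetical. It agrees with the t100 row of the table but can be off by a count for some of the larger samples, so the tabulated values may come from a slightly different calculation.

```python
import math
from statistics import NormalDist


def approx_bounds(rolls, band_pct, faces=6):
    """Approximate (low, high) rejection boundaries for one face.

    Assumption: the face count is Binomial(rolls, 1/faces), approximated
    by a normal distribution, with band_pct split evenly between tails.
    """
    p = 1 / faces
    mean = rolls * p
    sd = math.sqrt(rolls * p * (1 - p))
    # Two-sided critical value, e.g. about 1.28 for the 20 pct band.
    z = NormalDist().inv_cdf(1 - band_pct / 200)
    half_width = z * sd + 0.5  # 0.5 is the continuity correction
    low = math.floor(mean - half_width)   # largest count rejected as too few
    high = math.ceil(mean + half_width)   # smallest count rejected as too many
    return low, high


for band in (20, 10, 5):
    print(band, approx_bounds(100, band))
# 20 (11, 22)
# 10 (10, 24)
# 5 (8, 25)
```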
It can be argued that my method will fail twenty percent of all dice,
even if every die is fair. I believe studying the t100 results for
patterns, in addition to looking at the cumulative results, provides
better reliability than that. I've made no attempt to quantify the
degree to which that improves overall confidence in the results. One
wouldn't expect a truly fair die to consistently fail several t100
tests in a similar way, though, so when that happens, there are
grounds to doubt the die.
This test is fast in that a die might be passed in just a few hundred
rolls. OTOH, it lets you keep growing the trial size to some
satisfying level, if the quick test leaves you in doubt.
Wayne