Saturday, December 24, 2016

Binomial Parameter Estimates

More on estimating class qualification probabilities.

Choosing a Character Attribute Generation Method

On p. 11 of the DMG four alternatives for generating character attributes are given. How should a referee decide which one to use?

The referee might have a sense for how easy it should be to roll up a prestige class. Does he want the players all playing paladins and rangers instead of fighters? If so, he should consider the chances listed in the previous post. The 4d6kh3 with re-arrangement method gives the best chances overall.

Did the Designers Know the Odds?

If you asked the designers forty years ago to guess the chance of rolling a paladin, how close would they have been? The calculation requires writing code and running it on a modern computer, so the exact chance was probably unknown in the 1970s. Still, the designers could have made a reasonable guess just by rolling lots of characters.

Monte Carlo Method

To estimate the chance of rolling a class, the designers could have rolled 100 characters and counted how many of them qualified. If 35 of them qualified, the estimated chance is 35%. This is the Monte Carlo method applied to the binomial parameter estimation problem.

The Monte Carlo method is slick! It is easy to understand. Complicated mathematics is avoided.

Monte Carlo can be used to verify complicated mathematics. After writing code to do the exact calculation, I wrote code which uses Monte Carlo to make sure the exact code is in the right ballpark.

Often it isn't enough to get an estimate; we also have specify how accurate it is. We can increase our accuracy by doing more rolls. For example, we will get a better estimate of the chance of a paladin if we roll 1000 characters instead of 100 characters. It is natural to ask how many characters must we roll to get an answer which is accurate enough for our purpose.

Normal Approximation Interval

One way to represent the accuracy of an estimate is with a confidence interval. For binomial parameter estimation, one way to get a confidence interval is with the normal approximation:
$$p \pm z_{1 - \frac{α}{2}} \sqrt{\frac{p \cdot (1 - p)}{n}}$$Here n is the number of characters generated, p is the observed fraction, and α is the chance of being outside the confidence interval. When α is 5%, z1-α/2 is 1.96.  In R one can compute z1-α/2 with

  qnorm(1-alpha/2)

The normal approximation suggests that if we increase the number of simulations 10-fold, the length of the confidence interval will shrink to about 0.32 of its previous size.

The normal approximation doesn't work well when the observed fraction p is close to zero or 1.  For example, when n is 100 and p is .01, the 95% confidence interval is [-.95%, 2.95%] even though we know a negative value is not possible. This is a consequence of using the normal distribution as an approximation for a distribution with non-negative support.

In the extreme cases where p is exactly 0 or 1, the normal approximation gives an obviously invalid interval with zero length. The usual advice is to never use the normal approximation when there are fewer than 5 successes or 5 failures. That is, there should be at least 5 characters which qualify as a paladin and 5 characters which don't.

Wilson Score Interval

At the cost of some extra complexity, there is a formula which works better when p is near zero or one:
$$\frac{1}{1 + \frac{1}{n}z_{1-α/2}^2} \bigg[p + \frac{1}{2n} z_{1-α/2}^2 \pm z \sqrt{\frac{1}{n} p (1 - p) + \frac{1}{4n^2} z_{1-α/2}^2}  \bigg]$$

Beta Distribution

We can look for a distribution which describes the possibilities for the parameter we are estimating. This is more informative than a confidence interval. Bayesian statistics gives us a way to do this, but we must start with an initial distribution, the prior, which is somewhat arbitrarily chosen. We use the data we observe and the prior to compute the posterior distribution, which is the distribution we seek.

When estimating a binomial parameter, the beta distribution is convenient because it makes computation of the posterior easy. The beta distribution has two parameters. If the prior has parameters α and β and we observe m successes in n trials, then the posterior is a beta distribution with parameters α + m and β + n - m. We can think of the parameters as tallies of the number of successes and failures we observe.

How should α and β be chosen for the prior? If we set them both to 1, then the prior is identical to the uniform distribution. That is, we are saying that before we look at the data we think each possible value for the parameter is equally likely. For those who know information theory, the beta distribution with parameters (1, 1) is the beta distribution with highest entropy. Such a prior, incidentally, was how Laplace solved the sunrise problem.

Suppose we decide to start with a beta (1, 1) prior. We roll 100 characters and 35 of them qualify. Then the posterior parameters are (36, 66). We can use R to compute the chance the actual probability is within any interval. For example that chance that the true probability is inside [.32, .38] is 47.4%:

  > pbeta(.38, 36, 66) - pbeta(.32, 36, 66)
  [1] 0.4739478

If we think we already know the mean and variance for the distribution, we can use them to set the parameters of the prior. Here is how to use Mathematica and the formulas for the mean and variance to solve for the parameters:

  Solve[{μ == α/(α+β),
         σ^2 == α β/((α+β)^2 * (α+β+1))},
        {α,β}]

Here are the formulas for the parameters, given mean μ and variance σ2:
$$α = \frac{μ^2 - μ^3 - μ σ^2}{σ^2}$$ $$β  = \frac{(-1 + μ) (-μ + μ^2 + σ^2)}{σ^2}$$

Hoeffding's Inequality

Hoeffding's Inequality is invaluable when doing Monte Carlo estimates. It directly tells us how much data we need to ensure our estimate is near the true value with a given probability:
$$ \mathrm{P}\big(\big|p - \mathrm{E}[p]\big| \geq t\big) \leq 2 e^{-2nt^2}$$
Here p is the estimate, E[p] is the true value, t is half the length of our confidence interval, and n is the number of observations. If we know the probability and interval we want, we recast the inequality as:
$$ \frac{\mathrm{ln} \frac{2}{\mathrm{P}(|p - \mathrm{E}[p]| \geq t)}}{2 t^2} \leq n$$ If we want to be 95% certain the estimate is within a tenth of a percent, we get this lower bound on n:
$$ \frac{\mathrm{ln} \frac{2}{.05}}{2(0.001)^2} \approx 9.22 \times 10^5 \leq n$$ The number might seem discouragingly large, but generating a million characters is feasible on modern computers; just write code to do it!