Chapter 12 Discrete Random Variables, Part II
12.1 The Bernoulli Distribution
The Bernoulli distribution is perhaps the simplest discrete distribution. Suppose we have a random variable \(X\) that can take on only one of two values, which we’ll denote as \(X \in \{ 0, 1 \}\). The Bernoulli distribution is often framed in terms of experiments, where the \(X=1\) outcome is labeled as a “success.” With two possible outcomes, there are only two probabilities: the “success” probability \(\Pr(X=1)=\pi\) and \(\Pr(X=0)=1-\pi\). It is important to note that \(\pi\) here is a probability between 0 and 1. It is not the constant 3.141592\(\ldots\)
Stated more formally, a Bernoulli random variable \(X \in \{0, 1\}\) has the probability mass function \[\Pr(x)= \left\{ \begin{array}{ccl} \pi & \, & \mbox{if }\, x=1 \\ 1-\pi && \mbox{if } \, x=0 \end{array} \right.\] A shorthand for the above PMF is to write that \(X\) is distributed Bernoulli(\(\pi\)), or \(X \sim \mbox{Bernoulli}(\pi)\).
A coin toss is a good example of a Bernoulli distribution. Suppose \(X \in \{ H, T \}\) is a random variable, representing the outcome of flipping a coin. If the coin is “fair,” then \(\pi=.5\), \(X \sim \mbox{Bernoulli}(.5)\), and the PMF is \[\Pr(x)= \left\{ \begin{array}{ccl} .5 & \, & \mbox{if }\, x=H \\ .5 && \mbox{if } \, x=T \end{array} \right.\]
The Bernoulli distribution allows for any valid success probability \(0 \le \pi \le 1\). Suppose we have a coin that lands heads up 70% of the time. The Bernoulli PMF for this RV \(X \in \{ H, T \}\) is \[\Pr(x)= \left\{ \begin{array}{ccl} .7 & \, & \mbox{if }\, x=H \\ .3 && \mbox{if } \, x=T \end{array} \right.\] And, again, a shorthand for this PMF is \(X \sim \mbox{Bernoulli}(.7)\).
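For readers following along in R, the Bernoulli PMF is available through `dbinom()` with `size = 1` (a binomial distribution with a single trial reduces to a Bernoulli). A minimal sketch for the 70% coin, coding heads as \(x=1\) and tails as \(x=0\):

```r
# Bernoulli(.7) PMF for the 70% coin, coding heads as x = 1 and tails as x = 0.
# dbinom() with size = 1 (a single trial) reduces to the Bernoulli PMF.
dbinom(1, size = 1, prob = 0.7)  # Pr(X = 1) = Pr(heads) = 0.7
dbinom(0, size = 1, prob = 0.7)  # Pr(X = 0) = Pr(tails) = 0.3
```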
12.2 The Expected Value of a Random Variable
The expected value \(E(X)\) of a random variable \(X\) is a characteristic of its probability mass function \(\Pr(x).\) It is the distribution analog to the sample mean.
Suppose the RV \(X\) takes values \(X \in \{1, 2, 3 \}\) with probabilities \(\Pr(X=1)=.3\), \(\Pr(X=2)=.2\), and \(\Pr(X=3)=.5\). The expected value of \(X\) is calculated as \[E(X) = \sum_{i=1}^3 x_i \, \Pr(X=x_i) = 1 (.3) + 2 (.2) + 3 (.5) = 2.2\] For discrete random variables, the expected value can be thought of as a weighted average of the outcome values, where the weights are the probabilities. More generally, for any RV \(X\) with a finite set of outcomes \(X \in \{x_1, x_2, \ldots , x_k \}\) and a valid probability mass function specifying the probability \(\Pr(x_i)\) of observing outcome \(x_i\), the expected value is calculated as \[E(X) = \sum_{i=1}^k x_i \, \Pr(x_i)\]
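Since the practice sessions below suggest using R for these calculations, here is a minimal sketch of the weighted-average calculation for this example (the variable names `x` and `p` are my own):

```r
# E(X) as a weighted average: outcomes weighted by their probabilities.
x <- c(1, 2, 3)     # outcomes of X
p <- c(.3, .2, .5)  # Pr(X = x); a valid PMF sums to 1
sum(x * p)          # E(X) = 1(.3) + 2(.2) + 3(.5) = 2.2
```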
Let’s take a look at a few more examples. First, consider the Bernoulli distribution above where \(X \sim \mbox{Bernoulli}(.7)\), now coding heads as \(X=1\) and tails as \(X=0\). The probability that \(X=1\) is .7 and the probability of observing a zero is .3. The expected value of \(X\), \(E(X)\), is \[E(X) = \sum_x x \, \Pr(x) = 0 (.3) + 1 (.7) = .7\] Now, consider the Bernoulli distribution representing a fair coin toss: \(X \sim \mbox{Bernoulli}(.5)\). Here, the expected value of \(X\) is \[E(X) = \sum_x x \, \Pr(x) = 0 (.5) + 1 (.5) = .5\] Notice that for any random variable \(X\) distributed Bernoulli(\(\pi\)) (i.e., with success probability \(\Pr(X=1)=\pi\)), the expected value is \[E(X) = 0 (1-\pi) + 1 (\pi) = \pi\]
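The same weighted sum works for any success probability. A quick check in R (`ev_bern` is a helper name I’ve made up for this sketch):

```r
# For X ~ Bernoulli(prob), the weighted sum 0(1 - prob) + 1(prob) collapses to prob.
ev_bern <- function(prob) sum(c(0, 1) * c(1 - prob, prob))
ev_bern(0.7)  # 0.7
ev_bern(0.5)  # 0.5
```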
As a final example, suppose random variable \(X\) can take values \(\{ 0, 1, 2, 3, 4 \}\) with probabilities \(\Pr(0)=.1\), \(\Pr(1)=.3\), \(\Pr(2)=.2\), \(\Pr(3)=.3\), and \(\Pr(4)=.1\). The expected value for this distribution is \[E(X) = 0 (.1) + 1 (.3) + 2 (.2) + 3 (.3) + 4 (.1) = 2\]
Interactive Example: Expected Value vs Sample Mean
The following interactive example demonstrates the difference between the expected value of a random variable and the sample mean. The default distribution is a fair six-sided die. The random variable is \(X\), a roll of the die. The probability distribution for \(X\) is shown in the table: \(\Pr(x)=\frac{1}{6}\) for each \(X\in \{1, 2, 3, 4, 5, 6\}\). The calculation for the expected value \(E(X)\) is shown below that. In comparison, the bottom half of the window displays a random sample of values. For the default case, think of the sample as rolling the die \(N\) times. You can change the sample size \(N\) in the left panel. The sample mean \(\bar{X}\) is shown at the bottom.
Keep the distribution as it is (rolling a fair die) and generate a few new samples. Notice that each time you draw a new sample of data, the mean will likely change, but the PMF and the expected value \(E(X)\) do not. That’s because the expected value \(E(X)\) is a characteristic of \(X\)’s probability distribution. To calculate \(E(X)\), we use the probability distribution, not the data sample. In contrast, the sample mean \(\bar{X}\) is a characteristic of the sample data. We calculate the sample mean using the data sample, not the probability distribution, which is why the sample mean \(\bar{X}\) tends to change when we draw a new sample. Finally, notice that as you draw different samples, the sample mean \(\bar{X}\) rarely, if ever, equals the expected value \(E(X)\) exactly.
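If you want to reproduce this behavior outside the interactive window, here is a rough R analog: the PMF pins down \(E(X)\) once and for all, while each call to `sample()` yields a new data set and hence a new \(\bar{X}\). (The sample size of 100 is an arbitrary choice.)

```r
# Fair six-sided die: E(X) is fixed by the PMF and never changes.
x <- 1:6
p <- rep(1/6, 6)
sum(x * p)  # E(X) = 3.5

# A sample of N = 100 rolls: the sample mean varies from sample to sample.
rolls <- sample(x, size = 100, replace = TRUE, prob = p)
mean(rolls)  # close to 3.5, but rarely exactly equal
```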
Work through this interactive example ten or more times, drawing new samples and changing the sample size. In the left panel, you can select another distribution, the Bernoulli. For the Bernoulli, you can change the “success” probability \(\pi\), as well as the sample size. The distribution, the expected value \(E(X)\), a new sample, and the sample mean \(\bar{X}\) will be shown in the right panel.
Practice Session: Expected Value of a Discrete Random Variable
In this practice session, you are shown the PMF for the random variable \(X\). The outcomes are displayed along the top of the table, with their probabilities immediately beneath. Calculate the expected value \(E(X)\) and submit your answer in the left panel. You will be shown whether your answer is correct or not. You can use R, a calculator, or pencil and paper to calculate the expected value. Try three or four problems, or until you feel comfortable with the calculations. If you are having difficulty with the calculations, you may want to review prior material in this section, as well as the previous interactive example.
12.3 The Variance of a Random Variable
The variance \(V(X)\) of a random variable \(X\) is a characteristic of its probability mass function \(\Pr(x)\). For a random variable \(X\), its variance is defined as \[V(X) = E\left\{ \left[ X-E(X) \right]^2 \right\}\]
Let’s briefly dissect this. First, we have \(X-E(X)\). At the heart of the variance is the deviation of the random variable \(X\) from its expected value \(E(X)\). Second, that deviation (or difference) is then squared: \(\left[ X-E(X) \right]^2\). So variance is based on the squared deviation of \(X\) from its expected value \(E(X)\). Finally, notice that the squared deviation is “wrapped” in an expected value \(V(X) = E\left\{ \left[ X-E(X) \right]^2 \right\}\). Again, expected value is a distribution characteristic similar to the average. Very roughly speaking, the variance of \(X\) can be thought of as the average squared deviation of \(X\) from its mean \(E(X)\).
As with calculating \(E(X)\), we use the outcome values and their probabilities to calculate \(V(X)\). There are two common equations used to calculate an RV’s variance. One way is to use the definition of variance above. The second is derived from it: \[\begin{align} V(X) & = E\left\{ \left[ X-E(X) \right]^2 \right\} \\ & \color{gray}{\small \,\, = E\left\{ X^2 - 2 X E(X)+E(X)^2 \right\} } \\ & \color{gray}{\small \,\, = E(X^2) - 2 E(X) E(X)+E(X)^2 } \\ & = E(X^2) - E(X)^2 \end{align}\]
We now have two equations for the variance of a RV \(X\). If we know the outcomes of \(X\) and their probabilities, we can calculate \(V(X)\). However, there’s one technicality we need to address before doing so. In both equations, we need to calculate the expected value of a function of \(X\). The second version of \(V(X)\) is usually the easiest to use in practice, so we’ll start with that: \[V(X) = E(X^2) - E(X)^2\]
We’ve already seen how to calculate \(E(X)\). For the last term above, we simply square that value. Let’s refer back to the previous example PMF: \(X \in \{1, 2, 3 \}\) with probabilities \(\Pr(X=1)=.3\), \(\Pr(X=2)=.2\), and \(\Pr(X=3)=.5\). There, we found that \(E(X)=2.2\), so \(E(X)^2 = (2.2)^2 = 4.84\).
But how do we calculate \(E(X^2)\)? Fortunately, it’s not much different from calculating \(E(X)\): instead of substituting the values of \(x_i\) into the summation, we substitute \(x_i^2\):
\[\begin{align}
E(X^2) &= \sum_{i=1}^3 x_i^2 \Pr(x_i) \notag \\
& = (1^2)(.3) + (2^2)(.2) + (3^2)(.5) \notag \\
&= 5.6
\end{align}\]
Returning to our \(V(X)\) calculation, we therefore have \[\begin{align} V(X) &= E(X^2) - E(X)^2 \notag \\ & = 5.6 - (2.2)^2 \notag \\ &= .76 \end{align}\]
We can also calculate \(V(X)\) using \(V(X) = E\left\{ \left[ X-E(X) \right]^2 \right\}\). Again, we are faced with calculating the expected value of a function of \(X\). However, we handle this similarly to the previous case: \[\begin{align} V(X) &= E\left\{ \left[ X-E(X) \right]^2 \right\} \notag \\ & = \sum_{i=1}^3\left[ x_i-E(X) \right]^2 \Pr(x_i) \notag \\ & = \sum_{i=1}^3\left[ x_i-2.2 \right]^2 \Pr(x_i) \notag \\ & = (1-2.2)^2 (.3) + (2-2.2)^2 (.2) + (3-2.2)^2 (.5) \notag \\ & = .76 \end{align}\]
The following practice session gives you an opportunity to calculate \(V(X)\) for different probability mass functions. As practical advice, I would recommend conducting the calculation as follows (a short R sketch implementing these steps appears after the list):
1. Calculate \(E(X) = \sum_{i=1}^k x_i \Pr(x_i)\)
2. Calculate \(E(X^2) = \sum_{i=1}^k x_i^2 \Pr(x_i)\)
3. Plug the results of steps 1 and 2 into \(V(X) = E(X^2)-E(X)^2\)
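Here is that three-step recipe as a short R sketch, using the example PMF from above; the definitional form is included as a cross-check:

```r
x <- c(1, 2, 3)
p <- c(.3, .2, .5)

EX  <- sum(x * p)    # step 1: E(X)   = 2.2
EX2 <- sum(x^2 * p)  # step 2: E(X^2) = 5.6
EX2 - EX^2           # step 3: V(X)   = 0.76

# Cross-check via the definition V(X) = E{[X - E(X)]^2}:
sum((x - EX)^2 * p)  # also 0.76
```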
Interactive Example: Variance of an RV vs Sample Variance
In this example, you are shown the PMF for the random variable \(X\). The outcomes are displayed along the top of the table, with their probabilities immediately beneath. The default PMF is for a fair die, where the outcomes are \(X \in \{ 1, 2, 3, 4, 5, 6 \}\) and each has probability 1/6 of occurring. The variance \(V(X)\) of random variable \(X\) is calculated using the distribution information, e.g., using \(E(X)\) and \(E(X^2)\). In the lower half, a random sample of data has been drawn. Using this sample of data, the sample variance \(s^2\) is calculated. Refresh the sample a few times. Notice that the PMF never changes, so \(E(X)\) and \(V(X)\) do not change. However, different samples will produce different results for the sample variance. The interactive example allows you to change the sample size and to switch to a Bernoulli distribution. The variance calculations are slightly more involved than those for the expected value. Try four or five different variations of the distribution and sample size, until you understand (1) the difference between a random variable’s variance and a sample variance and (2) how to calculate a random variable’s variance \(V(X)\).
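As with the expected value, you can mimic this interactive example in R. The variance \(V(X)\) comes from the PMF and is fixed; R’s `var()` computes the sample variance \(s^2\), which changes with every new sample. (The sample size of 50 is arbitrary.)

```r
# Fair die: V(X) = E(X^2) - E(X)^2 = 91/6 - 3.5^2, about 2.9167 (fixed by the PMF).
x <- 1:6
p <- rep(1/6, 6)
sum(x^2 * p) - sum(x * p)^2  # V(X)

# Sample variance s^2 from N = 50 rolls; varies across samples.
rolls <- sample(x, size = 50, replace = TRUE, prob = p)
var(rolls)
```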
Practice Session: Variance of a Discrete Random Variable
In this practice session, you are shown the PMF for the random variable \(X\). The outcomes are displayed along the top of the table, with their probabilities immediately beneath. Calculate the variance \(V(X)\) and submit your answer in the left panel. You can use R, a calculator, or pencil and paper to calculate the variance. Try four or five variance calculations, or until you feel comfortable with the steps.