Joint Continuous Distribution & Expectations (2)
A committee of two is randomly selected from three teachers, two students, and one parent.
Let \(X\) be the number of teachers on the committee, and \(Y\) the number of students.
Find:
- the marginal pmf of \(Y\)
- the conditional pmf of \(Y\) given that \(X = 0\)
- the correlation between \(X\) and \(Y\)
\(X\) and \(Y\) have joint probability mass function \(p(x, y)\) given by
\[ p(x, y) = \frac{\binom{3}{x} \binom{2}{y} \binom{1}{2 - x - y}}{\binom{6}{2}} \]
for \(x \in \{0, 1, 2\}\) and \(y \in \{0, 1, 2\}\) with the constraint \(1 \leq x + y \leq 2\) (the single parent can fill at most one of the two seats).
Hence, we can fill in the following table, remembering that the committee has exactly two people.
X \ Y | 0 | 1 | 2 |
---|---|---|---|
0 | \ | 2/15 ≈ 0.13 | 1/15 ≈ 0.07 |
1 | 3/15 = 0.20 | 6/15 = 0.40 | \ |
2 | 3/15 = 0.20 | \ | \ |
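If it helps to double-check the table, here is a minimal enumeration sketch (Python, standard library only; the variable names are just illustrative) that evaluates the joint pmf formula directly.

```python
# Enumerate p(x, y) = C(3, x) C(2, y) C(1, 2 - x - y) / C(6, 2) over all
# feasible (x, y) pairs for a committee of exactly two people.
from math import comb

total = comb(6, 2)  # 15 equally likely committees
p = {}
for x in range(3):            # number of teachers
    for y in range(3):        # number of students
        parents = 2 - x - y   # remaining seats filled by the single parent
        if 0 <= parents <= 1:
            prob = comb(3, x) * comb(2, y) * comb(1, parents) / total
            if prob > 0:
                p[(x, y)] = prob

print(p)  # matches the table: p[(1, 1)] = 6/15 = 0.4, p[(0, 2)] = 1/15, etc.
```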
Summing each column of the table, the marginal pmf of \(Y\) is
\[ p_Y(y) = \begin{cases} 6/15, & y = 0 \\ 8/15, & y = 1 \\ 1/15, & y = 2 \\ \end{cases} \]
The conditional pmf of \(Y\) given that \(X = 0\) follows directly from the table, since \(P(X = 0) = 3/15\):
\[ p_{Y\mid X}(y|0) = \begin{cases} 0, & y = 0 \\ 2/3, & y = 1 \\ 1/3, & y = 2 \\ \end{cases} \]
- The Correlation between \(X\) and \(Y\)
First, find the expectations of \(X\) and \(Y\):
\[ \begin{align} E(X) &= 1 \times \frac{9}{15} + 2 \times \frac{3}{15} = 1 \\ E(Y) &= 1 \times \frac{8}{15} + 2 \times \frac{1}{15} = \frac{2}{3} \approx 0.67 \\ \end{align} \]
Hence, noting that the only outcome with \(xy \neq 0\) is \((x, y) = (1, 1)\), so \(E(XY) = \frac{6}{15}\),
\[ \begin{align} Cov(X, Y) &= E(XY) - E(X)E(Y) \\ &= \frac{6}{15} - 1 \times \frac{2}{3} \\ &= -\frac{4}{15} \approx -0.27 \end{align} \]
Now, in order to find the correlation, we also need to find the standard deviation of the two random variables.
\[ \begin{align} Var(X) & = E(X^2) - \mu_X^2 = 1^2 \times \frac{9}{15} + 2^2 \times \frac{3}{15} - 1^2 = \frac{9 + 12 - 15}{15} = \frac{6}{15} \\ Var(Y) & = E(Y^2) - \mu_Y^2 = 1^2 \times \frac{8}{15} + 2^2 \times \frac{1}{15} - \frac{4}{9} = \frac{12}{15} - \frac{4}{9} = \frac{16}{45} \approx 0.36 \\ \end{align} \]
Therefore, the correlation is,
\[ Cor(X, Y) = \frac{Cov(X, Y)}{\sqrt{6/15}\sqrt{16/45}} = -\frac{1}{\sqrt{2}} \approx -0.7071 \]
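The moment calculations can be verified with a short continuation of the same idea (again a sketch; the joint pmf is typed in directly from the table above).

```python
# Verify E(X), E(Y), Cov(X, Y), and the correlation from the joint pmf table.
from math import sqrt

p = {(0, 1): 2/15, (0, 2): 1/15, (1, 0): 3/15, (1, 1): 6/15, (2, 0): 3/15}

EX = sum(x * q for (x, y), q in p.items())                 # 1.0
EY = sum(y * q for (x, y), q in p.items())                 # 2/3
EXY = sum(x * y * q for (x, y), q in p.items())            # 6/15
VarX = sum(x * x * q for (x, y), q in p.items()) - EX**2   # 6/15
VarY = sum(y * y * q for (x, y), q in p.items()) - EY**2   # 16/45

cov = EXY - EX * EY                      # -4/15
rho = cov / (sqrt(VarX) * sqrt(VarY))    # -0.7071... = -1/sqrt(2)
print(cov, rho)
```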
1 Laws of Multivariate Expectation
- \(E(c) = c\)
- \(E(cg(X, Y)) = cE(g(X, Y))\)
- \(E(g_1(X, Y) + g_2(X, Y) + \cdots + g_k(X, Y)) = E(g_1(X, Y)) + \cdots + E(g_k(X, Y))\)
- If \(X \perp Y\) then \(E(g(X)h(Y)) = E(g(X))E(h(Y))\)
Proof. All of these follow from the single-variable results by simply replacing the pmf with the joint pmf.
2 Independence
We say that \(Y_1, Y_2, \ldots, Y_n\) are pairwise independent iff
\[ p(y_i, y_j) = p(y_i)p(y_j) \qquad \text{for all $i < j$} \]
We say that \(Y_1, Y_2, \ldots, Y_n\) are totally (i.e., mutually) independent iff
\[ \begin{align} & p(y_i, y_j) = p(y_i)p(y_j) & \text{for all $i < j$} \\ & p(y_i, y_j, y_k) = p(y_i)p(y_j)p(y_k) & \text{for all $i < j < k$} \\ & \dots \\ & p(y_1, y_2, \ldots, y_n) = p(y_1)p(y_2) \ldots p(y_n) & \\ \end{align} \]
3 Multinomial Distribution
This is simply a generalisation of the binomial distribution.
Consider \(n\) independent and identical trials, on each of which there are \(k\) possible outcomes. On each trial let \(p_i\) be the probability of outcome \(i\) and let \(Y_i\) be the total number of trials with outcome \(i\) (\(i = 1, \ldots k\)).
Then \(Y_1, Y_2, \ldots , Y_k\) have a multinomial distribution with joint pmf
\[ p(y_1, y_2, \ldots, y_k) = \frac{n!}{y_1 !y_2 ! \ldots y_k !} p_1^{y_1}p_2^{y_2}\cdots p_k^{y_k}, \qquad y_i \in \{ 0, 1, \ldots, n\}, \sum_{j = 1}^k y_j = n, \]
(with \(p_i \in [0, 1]\) and \(p_1 + p_2 + \cdots + p_k = 1\)).
We write \(Y_1, Y_2, \ldots, Y_k \sim \text{Mult}(n; p_1, p_2, \ldots , p_k)\).
3.1 Multinomial Coefficients
The number of ways of partitioning \(n\) distinct objects into \(k\) distinct groups is
\[ \binom{n}{y_1 \ldots y_k} = \frac{n!}{y_1 ! y_2 ! \ldots y_k !}, \qquad y_i = \text{count in $i$-th group} \]
Trial | 1 | 2 | 3 | 4 | 5 | … | n |
---|---|---|---|---|---|---|---|
Group | 1 | 3 | 5 | 1 | 1 | … | 2 |

Table 1: An example sequence assigning each trial to a group.
Given the sequence in Table 1, we can state the probability as \[ \begin{align} P(\text{sequence}) &= p_1p_3p_5p_1p_1\cdots p_2 \\ &= p_1^{y_1} \ldots p_k^{y_k} \end{align} \]
The probability of observing the counts \(y_1, y_2, \ldots, y_k\) (regardless of the order in which the outcomes occur) is then \[ P(y_1, y_2, \ldots , y_k) = \binom{n}{y_1 \ldots y_k} p_1^{y_1}p_2^{y_2} \ldots p_k^{y_k} \]
3.2 Example
On 10 rolls of a die, what is the probability of obtaining 3 even numbers and 2 ones?
Let \(Y_1\) be the number of even numbers, \(Y_2\) the number of ones, and \(Y_3\) the number of threes and fives (the outcomes that are neither even nor one).
Then \(Y_1, Y_2, Y_3 \sim \text{Mult}(10; 1/2, 1/6, 1/3)\) with pmf \[ p(y_1, y_2, y_3) = \frac{10!}{y_1 ! y_2 ! y_3 !} \left( \frac{1}{2} \right)^{y_1} \left( \frac{1}{6} \right)^{y_2} \left( \frac{1}{3} \right)^{y_3} \] So, \[ p(3, 2, 5) =\frac{10!}{3! 2! 5!} \left( \frac{1}{2} \right)^3 \left( \frac{1}{6} \right)^2 \left( \frac{1}{3} \right)^5 = 0.03601 \]
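A quick sketch of the same calculation (Python standard library; the helper name multinomial_pmf is just an illustration, not a library function):

```python
# Evaluate the multinomial pmf n!/(y_1! ... y_k!) p_1^{y_1} ... p_k^{y_k}.
from math import factorial

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for y in counts:
        coef //= factorial(y)          # multinomial coefficient
    prob = float(coef)
    for y, p in zip(counts, probs):
        prob *= p ** y
    return prob

print(multinomial_pmf([3, 2, 5], [1/2, 1/6, 1/3]))  # ~0.0360
```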
4 Three Important Theorems Regarding Linear Combinations of Random Variables
Theorem 1
\[ E \left\{ \sum_{i = 1}^n a_i Y_i \right\} = \sum_{i = 1}^n a_i\mu_i, \qquad \text{where } \mu_i = E(Y_i) \]
Theorem 2 \[ Var\left( \sum_{i = 1}^n a_i Y_i \right) = \sum_{i=1}^n a_i^2 \sigma_i^2 + 2 \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^n a_i a_j \sigma_{ij} = \sum_{i = 1}^n \sum_{j = 1}^n a_i a_j \sigma_{ij} \] where \(\sigma_i^2 = Var(Y_i)\) and \(\sigma_{ij} = Cov(Y_i, Y_j)\) (so that \(\sigma_{ii} = \sigma_i^2\)).
Theorem 3 \[ Cov\left( \sum_{i = 1}^n a_i Y_i, \sum_{i = 1}^n b_i Y_i \right) = \sum_{i=1}^n \sum_{j = 1}^n a_i b_j \sigma_{ij} \]
Proof. Theorem 1 follows immediately from the linearity of expectation, and Theorem 2 is the special case of Theorem 3 obtained by taking \(b_i = a_i\). Hence, we only need to prove Theorem 3.
\[ \begin{align} \text{LHS} & = E \left[ \left( \sum_{i = 1}^n a_i Y_i - E\left( \sum_{i = 1}^n a_i Y_i \right) \right) \left( \sum_{j = 1}^n b_j Y_j - E\left( \sum_{j=1}^n b_j Y_j \right) \right) \right] \\ & = E \left[ \left( \sum_{i = 1}^n a_i Y_i - \sum_{i = 1}^n a_i \mu_i \right) \left( \sum_{j = 1}^n b_j Y_j - \sum_{j = 1}^n b_j \mu_j \right) \right] \\ & = E \left[ \left( \sum_{i = 1}^n a_i (Y_i - \mu_i) \right) \left( \sum_{j = 1}^n b_j (Y_j - \mu_j) \right) \right] \\ & = \sum_{i = 1}^n \sum_{j = 1}^n a_i b_j E\left\{ (Y_i - \mu_i) (Y_j - \mu_j) \right\} = RHS \\ \end{align} \]
Suppose that \(Y_1\), \(Y_2\), and \(Y_3\) are three rv’s with means 2, -7, and 5, variances 10, 6, and 9, and covariances \(\sigma_{12} = -1\), \(\sigma_{13} = 3\), and \(\sigma_{23} = 0\).
Find:
- \(E(3Y_1 - 2Y_2 + Y_3)\)
- \(Var(3Y_1 - 2Y_2 + Y_3)\)
- \(Cov(3Y_1 - 2Y_2, Y_2 + 8 Y_3)\)
- \(E(3Y_1 - 2Y_2 + Y_3)\)
\[ \begin{align} E(3Y_1 - 2Y_2 + Y_3) & = 3 \mu_1 - 2\mu_2 + \mu_3 \\ & = 3 \times 2 - 2 \times (-7) + 5 = 25 \\ \end{align} \]
- \(Var(3Y_1 - 2Y_2 + Y_3)\)
\[ \begin{align} Var(3Y_1 - 2Y_2 + Y_3) & = \sum_{i = 1}^n \sum_{j = 1}^n a_i a_j \sigma_{ij} \\ & = 3 \times 3 \times 10 + 3 \times (-2) \times (-1) + 3 \times 1 \times 3 \\ & \qquad + (-2) \times 3 \times (-1) + (-2) \times (-2) \times 6 + (-2) \times 1 \times 0 \\ & \qquad + 1 \times 3 \times 3 + 1 \times (-2) \times 0 + 1 \times 1 \times 9 \\ & = 105 + 30 + 18 = 153 \\ \end{align} \]
- \(Cov(3Y_1 - 2Y_2, Y_2 + 8 Y_3)\)
Now, in order to use Theorem 3, both linear combinations must be expressed in terms of the same list of variables. Therefore, we pad with zero coefficients:
\[ Cov(3 Y_1 - 2 Y_2, Y_2 + 8 Y_3) = Cov(3 Y_1 - 2 Y_2 + 0 Y_3, 0 Y_1 + Y_2 + 8 Y_3) \]
Then, applying Theorem 3,
\[ \begin{align} Cov(3 Y_1 - 2 Y_2, Y_2 + 8 Y_3) &= Cov(3 Y_1 - 2 Y_2 + 0 Y_3, 0 Y_1 + Y_2 + 8 Y_3) \\ & = 3(1)\sigma_{12} + 3(8)\sigma_{13} + (-2)1\sigma_{22} + (-2)8\sigma_{23} \\ & = 3(-1) + 24(3) - 2(6) - 16(0) \\ & = 57 \end{align} \]
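All three answers can be checked numerically by collecting the means in a vector and the variances and covariances in a matrix (a sketch, assuming NumPy is available):

```python
# Mean vector and covariance matrix of (Y1, Y2, Y3) from the problem statement.
import numpy as np

mu = np.array([2.0, -7.0, 5.0])
Sigma = np.array([[10.0, -1.0, 3.0],
                  [-1.0,  6.0, 0.0],
                  [ 3.0,  0.0, 9.0]])

a = np.array([3.0, -2.0, 1.0])    # coefficients of 3Y1 - 2Y2 + Y3
print(a @ mu)                     # E(3Y1 - 2Y2 + Y3)   = 25
print(a @ Sigma @ a)              # Var(3Y1 - 2Y2 + Y3) = 153

b = np.array([3.0, -2.0, 0.0])    # 3Y1 - 2Y2
c = np.array([0.0,  1.0, 8.0])    # Y2 + 8Y3
print(b @ Sigma @ c)              # Cov(3Y1 - 2Y2, Y2 + 8Y3) = 57
```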
As an application of these theorems, let us derive the mean and variance of the hypergeometric distribution. In order to determine the mean with Theorem 1, we first need to decompose the random variable into a linear combination of smaller chunks. In fact, we need to know the marginal distribution of each of these random variables, while noting that the individual trials are clearly not independent. So, let \(X_i\) be the random variable that indicates whether the outcome at trial \(i\) is a success (1) or not (0). Assuming \(Y \sim \text{HyperGeom}(N, r, n)\) and \(X_1 + \cdots + X_n = Y\), the marginal distribution of \(X_i\) is the following: \[ \begin{align} P(X_i = 1) &= \sum_{\text{all configurations with $X_i = 1$}} p(x_1, x_2, \ldots , x_{i-1}, 1, x_{i+1}, \ldots , x_n) \\ & = \sum_{\text{all configurations with $X_i = 1$}} P(X_i = 1 \mid X_1 = x_1, \ldots, X_{i - 1} = x_{i - 1}, X_{i + 1} = x_{i+1}, \ldots, X_{n} = x_n) P(\text{all other configurations}) \\ & = \sum_{y' = 0}^{\min(r, n - 1)}{\left(\frac{r - y'}{N - n + 1}\right) \frac{\binom{r}{y'}\binom{N - r}{n - 1 - y'}}{\binom{N}{n - 1}}} \\ & = \frac{1}{(N - n + 1)\binom{N}{n - 1}} \sum_{y' = 0}^{\min(r, n - 1)}{(r - y') \binom{r}{y'}\binom{N - r}{n - 1 - y'}} \\ & = \frac{1}{(N - n + 1)\binom{N}{n - 1}} \sum_{y' = 0}^{\min(r, n - 1)}{r \binom{r - 1}{y'}\binom{N - r}{n - 1 - y'}} \\ & = \frac{r}{(N - n + 1)\binom{N}{n - 1}} \sum_{y' = 0}^{\min(r, n - 1)}{ \binom{r - 1}{y'}\binom{N - r}{n - 1 - y'}} \\ & = \frac{r}{(N - n + 1)\binom{N}{n - 1}} \binom{N - 1}{n - 1} \\ & = \frac{r \binom{N-1}{n-1}}{(N-n+1)\frac{N}{N-n+1} \binom{N-1}{n-1}} \\ & = \frac{r}{N} \\ \end{align} \tag{1}\]
Note that:
- Going from line 2 to line 3, we have conditioned on the number \(y'\) of successes among the other \(n - 1\) draws \(X_1, \ldots , X_{i - 1}, X_{i + 1}, \ldots, X_n\): given that those draws contain \(y'\) successes, draw \(i\) is made from the remaining \(N - n + 1\) items, of which \(r - y'\) are successes.
- Going from line 3 to line 4 involves only algebraic rearrangement.
- Going from line 4 to line 5 uses the relatively simple combinatorial identity \((r - k)\binom{r}{k} = r \binom{r-1}{k}\).
- Going from line 5 to line 6 requires only moving \(r\) out from the summation.
- Going from line 6 to line 7 uses Vandermonde’s identity, under the assumption that \(r \geq n - 1\) (see footnote 1). The identity states that \(\sum_{k = 0}^{m} \binom{a}{k} \binom{b}{m - k} = \binom{a + b}{m}\); it can be proven by comparing the coefficient of \(x^m\) in the expansion of \((1+x)^a(1+x)^b\) with that in \((1+x)^{a+b}\), since the two are the same polynomial.
- Going from line 7 to line 8 uses the identity \(\binom{N}{n - 1} = \frac{N}{N - n + 1}\binom{N - 1}{n - 1}\), which follows directly from writing both binomial coefficients in factorial form.
- Going from line 8 to line 9 requires only algebraic manipulations.
Then, from this probability, it is clear that each individual trial still follows a Bernoulli distribution, \(X_i \sim \text{Bern}(r/N)\). Therefore, the mean and variance of each \(X_i\) follow from existing results. However, the covariances need to be derived: \[ \begin{align} Cov(X_i, X_j) & = E(X_i X_j) - E(X_i)E(X_j) \\ & = \sum_{x_i = 0}^1 \sum_{x_j = 0}^1 x_i x_j P(X_i = x_i, X_j = x_j) - \left( \frac{r}{N} \right)^2 \\ & = P(X_i = 1, X_j = 1) - \left( \frac{r}{N} \right)^2 \\ & = \frac{r(r-1)}{N(N-1)} - \left( \frac{r}{N} \right)^2 = \frac{r(r-N)}{N^2(N-1)} \\ \end{align} \tag{2}\] Note that:
- Going from the second-last to the last line can be proven with a similar argument to the one above. However, an intuitive way of thinking about it is to use the law of total probability: first find \(P(X_i = 1)\) and then the conditional probability \(P(X_j = 1 \mid X_i = 1)\). This argument works because we can assume, without loss of generality, that the unconditional probability is taken for the earlier of the two draws and the conditional probability for the later one. A more formal approach would use a similar argument to Equation 1 but replace the first factor in the summation in line 3 with \[\frac{\binom{r - y'}{2}\binom{N - (n - 2) - (r - y')}{0}}{\binom{N - n + 2}{2}}\] which is simply a special case of the hypergeometric probability in which both remaining draws must be successes. (A small simulation check of these results is sketched below.)
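Here is the simulation check mentioned above (a sketch assuming NumPy; the values of \(N\), \(r\), and \(n\) are arbitrary). Each replicate draws \(n\) items without replacement and records the success indicators \(X_1, \ldots, X_n\).

```python
# Estimate P(X_i = 1) and Cov(X_i, X_j) for draws without replacement.
import numpy as np

rng = np.random.default_rng(0)
N, r, n = 20, 8, 5
reps = 100_000

population = np.array([1] * r + [0] * (N - r))   # r successes, N - r failures
draws = np.array([rng.permutation(population)[:n] for _ in range(reps)])

print(draws[:, 0].mean())                        # ~ r/N = 0.4
print(np.cov(draws[:, 0], draws[:, 1])[0, 1])    # ~ r(r - N)/(N^2 (N - 1)) = -0.0126
```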
We can now apply Theorem 1 and Theorem 2 to obtain the required mean and variance of the hypergeometric distribution.
\[ \begin{align} E(Y) & = E \left( \sum_{i = 1}^n X_i \right) \\ & = \sum_{i=1}^n E(X_i) \\ & = \frac{nr}{N} \\ Var(Y) & = Var\left( \sum_{i=1}^n X_i \right) \\ & = \sum_{i = 1}^n \sum_{j = 1}^n \sigma_{ij} \\ & = \left(\sum_{i = 1}^n \sigma_i^2\right) + \left( 2 \sum_{i=1}^{n-1} \sum_{j = i + 1}^n \sigma_{ij} \right) \\ & = n \frac{r}{N} \left(1 - \frac{r}{N}\right) + n(n-1) \frac{r(r-N)}{N^2(N-1)} \\ & = \frac{nr}{N} \left( 1 - \frac{r}{N} + \frac{(n - 1)(r - N)}{N(N - 1)} \right) \\ & = \frac{nr}{N} \left( \frac{N^2 - N}{N(N - 1)} - \frac{rN - r}{N(N - 1)} + \frac{(n - 1)(r - N)}{N(N - 1)} \right) \\ & = \frac{nr}{N} \left( \frac{N^2 - rN + rn - nN}{N(N-1)}\right) \\ & = \frac{nr}{N} \left( \frac{N(N - n) - r(N - n)}{N(N-1)} \right) \\ & = \frac{nr(N - r)(N - n)}{N^2(N - 1)} \\ \end{align} \]
This agrees with the formulas given on the formula sheet.
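As an additional check, the following sketch (standard library only; the chosen values of \(N\), \(r\), \(n\) are arbitrary) computes the mean and variance exactly from the hypergeometric pmf and compares them with the formulas just derived.

```python
# Exact mean and variance by summing over the hypergeometric pmf.
from math import comb

def hypergeom_moments(N, r, n):
    support = range(max(0, n - (N - r)), min(r, n) + 1)
    pmf = {y: comb(r, y) * comb(N - r, n - y) / comb(N, n) for y in support}
    mean = sum(y * p for y, p in pmf.items())
    var = sum(y * y * p for y, p in pmf.items()) - mean ** 2
    return mean, var

N, r, n = 20, 8, 5
print(hypergeom_moments(N, r, n))                                  # (2.0, 0.947...)
print(n * r / N, n * r * (N - r) * (N - n) / (N ** 2 * (N - 1)))   # same values
```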
Use the above three theorems to find \(Cov(Y_i, Y_j)\) when \(i \neq j\) and \(Y_1, Y_2, \ldots , Y_k \sim \text{Mult}(n; p_1, p_2, \ldots, p_k)\).
I don’t think there is an easy way to apply the three theorems directly here. Instead, a common approach is to introduce two sets of indicator variables that record whether the outcome of each trial falls in class \(i\) or class \(j\), and then expand the covariance of the two resulting sums (see the sketch below).
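As a sketch of where that argument leads (the closed form \(Cov(Y_i, Y_j) = -np_ip_j\) is the standard result for the multinomial distribution), here is a small simulation check assuming NumPy, using the die example from earlier:

```python
# Estimate Cov(Y_1, Y_2) for multinomial counts and compare with -n p_1 p_2.
import numpy as np

rng = np.random.default_rng(1)
n, probs = 10, [1/2, 1/6, 1/3]
counts = rng.multinomial(n, probs, size=200_000)    # each row is (Y1, Y2, Y3)

print(np.cov(counts[:, 0], counts[:, 1])[0, 1])     # ~ -n p_1 p_2 = -0.833
print(-n * probs[0] * probs[1])
```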
5 Continuous Multivariate Probability Distributions
\(Y_1, Y_2, \ldots , Y_n\) have a continuous multivariate probability distribution if their joint cdf \[ F(y_1, y_2, \ldots , y_n) = P(Y_1 \leq y_1, Y_2 \leq y_2, \ldots , Y_n \leq y_n) \] is continuous everywhere.
The joint pdf of \(Y_1, Y_2, \ldots, Y_n\) is then \[ f(y_1, y_2, \ldots , y_n) = \frac{\partial^n F(y_1, y_2, \ldots, y_n)}{\partial y_1 \partial y_2 \ldots \partial y_n} \]
All the definitions and results made for discrete joint distributions also hold for continuous ones, except that summations must be replaced by integrals, and \(p\)’s need to be replaced by \(f\)’s.
Thus: \[ \idotsint_{\mathbb{R}^n} f(y_1, y_2, \ldots, y_n) \, dy_1 dy_2 \ldots dy_n = 1 \]
Also, we can calculate probabilities by evaluating the integral over the correct region. Similarly to the discrete case, we define the following (assuming two random variables for ease of demonstration):
- \(f_X(x) = \int f(x, y) \, dy\) (marginal pdf of \(X\))
- \(f_{X|Y}(x|y) = \frac{f(x, y)}{f(y)}\) (conditional pdf of \(X\) given \(Y = y\))
- \(E(g(X, Y)) = \iint g(x, y) f(x, y) \, dx dy\)
- \(E(c) = c\), etc.
6 Conditional Expectation
Now, with the definition of conditional probability in hand, we can define the conditional expectation as follows. \[ E(X \mid Y = y) = \begin{cases} \sum_x xp(x \mid y), & \text{if $X$ is discrete} \\ \int x f(x\mid y) \, dx, & \text{if $X$ is continuous} \\ \end{cases} \]
Suppose \(X\) and \(Y\) are two continuous random variables with joint pdf \[ f(x, y) = cxy, \qquad 0 < x < 2y < 4 \] Find:
- \(P(X > 1, Y < 1)\)
- \(E(Y)\)
- \(\rho\)
- \(E(X|Y = 1)\)
- \(P(X > 1, Y < 1)\)
This question is not hard once you recognise that it is simply a multivariate integration problem. However, we first need to determine the constant \(c\).
\[ \begin{align} 1 & = \int_{x=0}^4 \int_{y=\frac{1}{2}x}^2 cxy \, dydx \\ & = c\int_{x=0}^4 x \left[ \frac{1}{2}y^2 \right]_{\frac{1}{2}x}^2 dx \\ & = c \int_{x=0}^4 x \left( 2 - \frac{1}{8} x^2 \right) dx \\ & = c \left[ x^2 - \frac{1}{32}x^4 \right]_{0}^4 \\ & = c \left(16 - \frac{256}{32}\right) = 8c \\ \end{align} \]
Therefore, we know that \(c = \frac{1}{8}\). Now, we can proceed to evaluate the integral.
\[ \begin{align} P(X > 1, Y < 1) & = \int_{x=1}^2 \int_{y=\frac{1}{2}x}^1 \frac{xy}{8} \, dy dx \\ & = \frac{1}{8} \int_{x=1}^2 x \left[ \frac{y^2}{2} \right]_{\frac{1}{2}x}^1 \, dx \\ & = \frac{1}{8} \int_{x=1}^2 x \left( \frac{1}{2} - \frac{x^2}{8} \right) \, dx \\ & = \frac{1}{8} \left[ \frac{x^2}{4} - \frac{x^4}{32} \right]_1^2 \\ & = \frac{9}{256} \approx 0.035 \end{align} \]
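Both the normalising constant and this probability can be cross-checked numerically (a sketch assuming SciPy is available):

```python
# Numerical double integration over the region 0 < x < 2y < 4.
from scipy.integrate import dblquad

f = lambda y, x: x * y / 8   # joint pdf; dblquad passes the inner variable first

# Normalisation: x in (0, 4), y in (x/2, 2) should integrate to 1 when c = 1/8.
total, _ = dblquad(f, 0, 4, lambda x: x / 2, lambda x: 2)
print(total)                 # ~1.0

# P(X > 1, Y < 1): x in (1, 2), y in (x/2, 1).
prob, _ = dblquad(f, 1, 2, lambda x: x / 2, lambda x: 1)
print(prob)                  # 9/256 ~ 0.0352
```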
- \(E(Y)\)
In order to find the expectation, we need the marginal distribution of \(Y\); however, the two steps can be combined into one double integral.
\[ \begin{align} E(Y) & = \int_{y = 0}^2 y \int_{x=0}^{2y} \frac{xy}{8} \, dxdy \\ & = \int_{y = 0}^2 \frac{y^2}{8} \left[ \frac{x^2}{2} \right]_0^{2y} \, dy \\ & = \int_{y=0}^2 \frac{y^2}{8} \left( 2y^2 \right) = \int_{y=0}^2 \frac{y^4}{4} \, dy \\ & = \left[ \frac{y^5}{20} \right]_0^2 = \frac{32}{20} = \frac{8}{5} \end{align} \]
- \(\rho\)
From the inner integral above, the marginal pdf of \(Y\) is \(f_Y(y) = \frac{y^3}{4}\) for \(0 < y < 2\), so
\[ \begin{align} E(Y^2) &= \int_{y = 0}^2 y^2 \frac{y^3}{4} \, dy \\ & = \frac{2^6}{24} = \frac{8}{3} \\ Var(Y) & = \frac{8}{3} - \left( \frac{8}{5} \right)^2 \\ & = \frac{8}{75} \\ \end{align} \]
We also need to determine the same quantities for \(X\).
\[ \begin{align} E(X) & = \int_{y = 0}^2 \int_{x=0}^{2y} x f(x, y) \, dxdy \\ &= \int_{y=0}^2 \frac{y}{8} \int_{x=0}^{2y} x^2 \, dxdy \\ & = \int_{y = 0}^2 \frac{y}{8} \frac{8y^3}{3} \, dy \\ & = \left[ \frac{y^5}{15} \right]_0^2 = \frac{32}{15} \\ E(X^2) & = \int_{y = 0}^2 \int_{x=0}^{2y} x^2 f(x, y) \, dxdy \\ &= \int_{y=0}^2 \frac{y}{8} \int_{x=0}^{2y} x^3 \, dxdy \\ & = \int_{y = 0}^2 \frac{y}{8} \frac{16y^4}{4} \, dy = \int_{y = 0}^2 \frac{y^5}{2} \, dy \\ & = \left[ \frac{y^6}{12} \right]_0^2 = \frac{16}{3} \\ Var(X) & = E(X^2) - (E(X))^2 \\ & = \frac{16}{3} - \left( \frac{32}{15} \right)^2 \\ & = \frac{176}{225} \\ E(XY) & = \int_{y = 0}^2 \int_{x=0}^{2y} xy f(x, y) \, dxdy \\ &= \int_{y=0}^2 \frac{y^2}{8} \int_{x=0}^{2y} x^2 \, dxdy \\ & = \int_{y = 0}^2 \frac{y^2}{8} \frac{8y^3}{3} \, dy = \int_{y = 0}^2 \frac{y^5}{3} \, dy \\ & = \left[ \frac{y^6}{18} \right]_0^2 \\ & = \frac{32}{9} \\ Cov(X, Y) & = E(XY) - E(X)E(Y) \\ & = \frac{32}{9} - \frac{32}{15}\frac{8}{5} = \frac{32}{225} \\ \rho & = \frac{Cov(X, Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}} \\ & = \frac{\frac{32}{225}}{\sqrt{\frac{176}{225}}\sqrt{\frac{8}{75}}} = 0.4924 \end{align} \]
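The same numerical-integration idea (again a sketch assuming SciPy; the helper E is just an illustration) can be used to cross-check these moments and the correlation:

```python
# Numerically compute E(g(X, Y)) = iint g(x, y) f(x, y) dx dy over the support.
from math import sqrt
from scipy.integrate import dblquad

f = lambda y, x: x * y / 8   # joint pdf on 0 < x < 2y < 4

def E(g):
    val, _ = dblquad(lambda y, x: g(x, y) * f(y, x), 0, 4,
                     lambda x: x / 2, lambda x: 2)
    return val

EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
VarX = E(lambda x, y: x * x) - EX ** 2       # 176/225
VarY = E(lambda x, y: y * y) - EY ** 2       # 8/75
rho = (EXY - EX * EY) / (sqrt(VarX) * sqrt(VarY))
print(EX, EY, rho)                           # 32/15, 8/5, ~0.4924
```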
- \(E(X|Y = 1)\)
Let’s first calculate the conditional distribution.
\[ \begin{align} f(x \mid y) &= \frac{f(x, y)}{f(y)} \\ & = \frac{\frac{xy}{8}}{\frac{y^3}{4}} \\ & = \frac{x}{2y^2} \, (0 < x < 2y) \\ \end{align} \]
Then, we can find the expectation as,
\[ \begin{align} E(X \mid Y = y) & = \int_0^{2y} \frac{x^2}{2y^2} dx \\ & = \frac{1}{2y^2} \frac{8y^3}{3} \\ & = \frac{4y}{3} \\ \end{align} \]
Therefore, \(E(X | Y = 1) = \frac{4}{3}\).
6.1 Random Expectations
Definition 1 (Random Expectations) By \(E(X\mid Y)\), we denote the function \(E(X\mid Y = y)\) with \(y\) replaced by \(Y\). This makes \(E(X \mid Y)\) itself a random variable, to which we can again apply expectations.
For example, the example above gives \[ E(X \mid Y) = \frac{4}{3} Y \] Note that \(E(X \mid Y)\) is a random variable that is a function of \(Y\) rather than \(X\), since we have already taken the expectation over \(X\).
6.2 The Law of Iterated Expectation
This theorem is commonly used in Bayesian Inference as we will discuss in CH16.
Theorem 4 (Law of Iterated Expectation) \[ E(X) = E(E(X \mid Y)) \]
Proof. \[ \begin{align} E(E(X\mid Y)) & = \int E(X | Y = y) f(y) \, dy = \int \left( \int x f(x \mid y) \, dx \right) f(y) \, dy \\ & = \iint x f(x, y) \, dxdy = E(X) \end{align} \]
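As a quick sanity check using the earlier example, where \(E(X \mid Y) = \frac{4}{3}Y\), \(E(Y) = \frac{8}{5}\), and \(E(X) = \frac{32}{15}\):
\[ E\{E(X \mid Y)\} = E\left( \frac{4}{3} Y \right) = \frac{4}{3} \times \frac{8}{5} = \frac{32}{15} = E(X) \]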
Footnotes
I believe this is generally a plausible assumption to make; otherwise it would be possible to run out of items while performing the draws.↩︎