Coursera was launched in 2012 by Daphne Koller and Andrew Ng with the goal of providing life-changing learning experiences to students around the world. Today, Coursera is a worldwide online learning platform that gives anyone, anywhere access to online courses and degrees from top institutions and corporations.

### Probabilistic Graphical Models 1: Representation Coursera Quiz Answers

### Week 1 Quiz Answers

#### Quiz 1: Basic Definitions

Q1. **Factor product.**

Let $X, Y$ be binary variables, and let $Z$ be a variable that takes on values 1, 2, or 3.

If $\phi_1(X,Y)$ and $\phi_2(Y,Z)$ are the factors shown below, compute the selected entries (marked by a '?') in the factor $\psi(X, Y, Z) = \phi_1(X,Y) \cdot \phi_2(Y, Z)$, giving your answer according to the ordering of assignments to variables as shown below.

Separate each of the 3 entries of the factor with spaces, e.g., an answer of

0.1 0.2 0.3

means that $\psi(1,1,2) = 0.1$, $\psi(1,2,1) = 0.2$, and $\psi(2,1,3) = 0.3$. Give your answers as exact decimals without any trailing zeroes.
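As a sketch of the factor product operation, the Python below stores each factor as a dictionary keyed by assignment tuples and multiplies the entries that agree on the shared variable $Y$. The function name `factor_product` and all factor values are hypothetical, since the quiz's tables are not reproduced here.

```python
from itertools import product

def factor_product(phi1, vars1, phi2, vars2, cards):
    """Multiply two table factors stored as {assignment tuple: value}.

    cards maps each variable name to its list of values.
    Returns (psi, out_vars), with psi keyed by assignments over out_vars.
    """
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    psi = {}
    for assignment in product(*(cards[v] for v in out_vars)):
        a = dict(zip(out_vars, assignment))
        # Each output entry multiplies the input entries consistent with it.
        psi[assignment] = (phi1[tuple(a[v] for v in vars1)]
                           * phi2[tuple(a[v] for v in vars2)])
    return psi, out_vars

# Hypothetical factor values (not the quiz's tables).
cards = {"X": [1, 2], "Y": [1, 2], "Z": [1, 2, 3]}
phi1 = {(1, 1): 0.5, (1, 2): 0.8, (2, 1): 0.1, (2, 2): 0.0}
phi2 = {(1, 1): 0.5, (1, 2): 0.7, (1, 3): 0.1,
        (2, 1): 0.2, (2, 2): 0.3, (2, 3): 0.4}
psi, out_vars = factor_product(phi1, ["X", "Y"], phi2, ["Y", "Z"], cards)
print(out_vars)        # ['X', 'Y', 'Z']
print(psi[(1, 1, 2)])  # phi1(1, 1) * phi2(1, 2)
```

Each entry of $\psi$ is one entry of $\phi_1$ times one entry of $\phi_2$, so the result has one entry per joint assignment to $X$, $Y$, $Z$.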

Q2. **Factor reduction.**

Let $X, Z$ be binary variables, and let $Y$ be a variable that takes on values 1, 2, or 3.

Now say we observe $Y=1$. If $\phi(X,Y,Z)$ is the factor shown below, compute the missing entries of the reduced factor $\psi(X, Z)$ given that $Y=1$, giving your answer according to the ordering of assignments to variables as shown below.

As before, separate the 4 entries of the factor by spaces.
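Factor reduction keeps only the rows consistent with the evidence and drops the observed variable from the scope. A minimal sketch, assuming the same dictionary representation; the helper `reduce_factor` is hypothetical, and the factor values simply encode each assignment ($100x + 10y + z$) so the surviving rows are easy to read.

```python
def reduce_factor(phi, scope, evidence):
    """Keep entries consistent with `evidence` and drop the observed variables."""
    keep = [i for i, v in enumerate(scope) if v not in evidence]
    reduced = {}
    for assignment, value in phi.items():
        if all(assignment[scope.index(v)] == val for v, val in evidence.items()):
            reduced[tuple(assignment[i] for i in keep)] = value
    return reduced, [scope[i] for i in keep]

# Hypothetical phi(X, Y, Z); the value 100x + 10y + z just labels the row.
phi = {(x, y, z): 100 * x + 10 * y + z
       for x in (1, 2) for y in (1, 2, 3) for z in (1, 2)}
psi, scope = reduce_factor(phi, ["X", "Y", "Z"], {"Y": 1})
print(scope, psi)  # ['X', 'Z'] {(1, 1): 111, (1, 2): 112, (2, 1): 211, (2, 2): 212}
```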

Q3. **Properties of independent variables.**

Assume that A and B are independent random variables. Which of the following options are always true? You may select 1 or more options.

- $P(B \mid A) = P(B)$
- $P(A,B) = P(A) \times P(B)$
- $P(A) = P(B)$
- $P(A) \neq P(B)$
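A quick numerical check of what independence does and does not imply: the sketch below builds a joint over two binary variables as a product of hypothetical marginals, so $A$ and $B$ are independent by construction, then verifies $P(B \mid A) = P(B)$ and $P(A,B) = P(A)P(B)$. The marginals themselves are arbitrary and unrelated to each other.

```python
from itertools import product

# Hypothetical marginals; the joint is their product, so A and B are independent.
p_a = {0: 0.3, 1: 0.7}
p_b = {0: 0.6, 1: 0.4}
joint = {(a, b): p_a[a] * p_b[b] for a, b in product(p_a, p_b)}

def conditional_b_given_a(joint, a):
    """P(B | A = a) computed from the joint table."""
    p_a_marginal = sum(p for (ai, _), p in joint.items() if ai == a)
    return {b: joint[(ai, b)] / p_a_marginal for (ai, b) in joint if ai == a}

for a in p_a:
    cond = conditional_b_given_a(joint, a)
    # P(B | A) = P(B) and P(A, B) = P(A) * P(B) for every assignment.
    assert all(abs(cond[b] - p_b[b]) < 1e-12 for b in p_b)
    assert all(abs(joint[(a, b)] - p_a[a] * p_b[b]) < 1e-12 for b in p_b)

# Independence alone says nothing about whether P(A) and P(B) are equal.
print(p_a == p_b)  # False for these values
```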

Q4. **Factor marginalization.**

Let $X, Z$ be binary variables, and let $Y$ be a variable that takes on values 1, 2, or 3.

If $\phi(X,Y,Z)$ is the factor shown below, compute the entries of the factor

$$\psi(Y, Z) = \sum_X \phi(X,Y,Z),$$

giving your answer according to the ordering of assignments to variables as shown below.

Separate the 4 entries of the factor with spaces, and do not add any extra trailing or leading zeroes or decimal points.
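Marginalizing out $X$ adds together every pair of entries that differ only in $X$. A minimal sketch of $\psi(Y,Z) = \sum_X \phi(X,Y,Z)$ with a hypothetical helper `marginalize` and placeholder values ($100x + 10y + z$), not the quiz's table:

```python
from collections import defaultdict

def marginalize(phi, scope, var):
    """Sum `var` out of a table factor stored as {assignment tuple: value}."""
    idx = scope.index(var)
    out = defaultdict(float)
    for assignment, value in phi.items():
        # Drop var's position from the key and accumulate the value.
        out[assignment[:idx] + assignment[idx + 1:]] += value
    return dict(out), scope[:idx] + scope[idx + 1:]

# Hypothetical phi(X, Y, Z) with X, Z binary and Y in {1, 2, 3}.
phi = {(x, y, z): 100 * x + 10 * y + z
       for x in (1, 2) for y in (1, 2, 3) for z in (1, 2)}
psi, scope = marginalize(phi, ["X", "Y", "Z"], "X")
print(scope, psi[(1, 1)])  # ['Y', 'Z'] 322.0
```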

#### Quiz 2: Bayesian Network Fundamentals

Q1. **Factorization.**

- $P(A,B,C,D) = P(A)P(B)P(C \mid A)P(C \mid B)P(D \mid B)$
- $P(A,B,C,D) = P(A)P(B)P(C \mid A,B)P(D \mid B)$
- $P(A,B,C,D) = P(A)P(B)P(C)P(D)$
- $P(A,B,C,D) = P(A)P(B)P(A,B \mid C)P(B \mid D)$
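A Bayesian network's joint distribution is the product of each node's CPD given its parents. The sketch below assumes a hypothetical binary network $A \rightarrow C \leftarrow B$, $B \rightarrow D$ (the quiz's graph is in its figure) with made-up CPDs, and checks that the resulting product is a valid distribution.

```python
from itertools import product

# Hypothetical CPDs for A -> C <- B, B -> D (all variables binary).
p_a = {0: 0.4, 1: 0.6}
p_b = {0: 0.7, 1: 0.3}
p_c_given_ab = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(C=1 | a, b)
p_d_given_b = {0: 0.2, 1: 0.8}                                        # P(D=1 | b)

def bern(p1, value):
    """Probability of a binary value, given P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def joint(a, b, c, d):
    # Chain rule for Bayesian networks: one CPD factor per node.
    return p_a[a] * p_b[b] * bern(p_c_given_ab[(a, b)], c) * bern(p_d_given_b[b], d)

total = sum(joint(*x) for x in product([0, 1], repeat=4))
print(round(total, 10))  # 1.0
```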

Q2. **Independent parameters.**

If you haven't come across the term before, here's a brief explanation: A multinomial distribution over $m$ possibilities $x_1, \ldots, x_m$ has $m$ parameters, but $m-1$ independent parameters, because we have the constraint that all parameters must sum to 1, so that if you specify $m-1$ of the parameters, the final one is fixed. In a CPD $P(X \mid Y)$, if $X$ has $m$ values and $Y$ has $k$ values, then we have $k$ distinct multinomial distributions, one for each value of $Y$, and we have $m-1$ independent parameters in each of them, for a total of $k(m-1)$. More generally, in a CPD $P(X \mid Y_1, \ldots, Y_r)$, if each $Y_i$ has $k_i$ values, we have a total of $k_1 \times \ldots \times k_r \times (m-1)$ independent parameters.

**Example**: Let's say we have a graphical model that just has $X \rightarrow Y$, where both variables are binary. In this scenario, we need 1 parameter to define the CPD of $X$. The CPD of $X$ contains two entries, $P(X = 0)$ and $P(X = 1)$. Since the sum of these two entries has to be equal to 1, we only need one parameter to define the CPD.

Now we look at $Y$. The CPD for $Y$ contains 4 entries, which correspond to $P(Y = 0 \mid X = 0)$, $P(Y = 1 \mid X = 0)$, $P(Y = 0 \mid X = 1)$, and $P(Y = 1 \mid X = 1)$. Note that $P(Y = 0 \mid X = 0)$ and $P(Y = 1 \mid X = 0)$ should sum to 1, so we need 1 independent parameter to describe those two entries; likewise, $P(Y = 0 \mid X = 1)$ and $P(Y = 1 \mid X = 1)$ should also sum to 1, so we need 1 independent parameter for those two entries.

Therefore, we need 1 independent parameter to define the CPD of $X$ and 2 independent parameters to define the CPD of $Y$.

- 4
- 3
- 12
- 8
- 6
- 7
- 11
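The counting rule in the explanation above translates directly into code: each node contributes (product of its parents' cardinalities) times (its own cardinality minus 1). A minimal sketch with a hypothetical helper `num_independent_params`; the usage line reproduces the $X \rightarrow Y$ example (1 + 2 = 3).

```python
from math import prod

def num_independent_params(cards, parents):
    """Independent parameters of a Bayesian network with table CPDs.

    cards: variable -> number of values; parents: variable -> list of parents.
    A CPD P(X | Y_1..Y_r) contributes k_1 * ... * k_r * (m - 1).
    """
    return sum(prod(cards[p] for p in parents.get(x, [])) * (cards[x] - 1)
               for x in cards)

# The X -> Y example from the text, both binary.
print(num_independent_params({"X": 2, "Y": 2}, {"Y": ["X"]}))  # 3
```

`math.prod` of an empty iterable is 1, so a root node contributes just $m - 1$.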

Q3. **\*Inter-causal reasoning.**

Calculate P(Accident = 1 | Traffic = 1) and P(Accident = 1 | Traffic = 1, President = 1). Separate your answers with a space, e.g., an answer of

0.15 0.25

means that P(Accident = 1 | Traffic = 1) = 0.15 and P(Accident = 1 | Traffic = 1, President = 1) = 0.25. Round your answers to two decimal places and write a leading zero, like in the example above.
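Inter-causal reasoning ("explaining away") can be checked by brute-force enumeration over the joint. The sketch assumes the v-structure President $\rightarrow$ Traffic $\leftarrow$ Accident with hypothetical CPD values, not the quiz's numbers; observing a second cause of traffic lowers the posterior on an accident.

```python
from itertools import product

# Hypothetical CPDs (not the quiz's): President -> Traffic <- Accident.
p_pres = 0.01
p_acc = 0.1
p_traffic = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(T=1 | pres, acc)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def joint(pres, acc, traffic):
    return bern(p_pres, pres) * bern(p_acc, acc) * bern(p_traffic[(pres, acc)], traffic)

def prob_acc_given(evidence):
    """P(Accident = 1 | evidence) by summing the joint over consistent assignments."""
    num = den = 0.0
    for pres, acc, traffic in product([0, 1], repeat=3):
        assignment = {"President": pres, "Accident": acc, "Traffic": traffic}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(pres, acc, traffic)
        den += p
        if acc == 1:
            num += p
    return num / den

p1 = prob_acc_given({"Traffic": 1})
p2 = prob_acc_given({"Traffic": 1, "President": 1})
print(round(p1, 2), round(p2, 2))  # 0.35 0.14
```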

#### Quiz 3: Bayesian Network Independencies

Q1. **Independencies in a graph.**

Which pairs of variables are independent in the graphical model below, given that none of them have been observed? You may select 1 or more options.

- A, B
- C, D
- A, E
- D, E
- None – there are no pairs of independent variables.

Q2. **\*Independencies in a graph.** (An asterisk marks a question that is more challenging. Congratulations if you get it right!)

- None – given E, there are no pairs of variables that are independent.
- A, B
- A, C
- A, D
- B, D
- D, C
- B, C
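Independence questions like Q1 and Q2 can be checked mechanically with d-separation. The sketch below uses the standard moralized-ancestral-graph criterion: keep only the query and evidence nodes plus their ancestors, marry co-parents, drop edge directions, delete the evidence nodes, and test connectivity. The example graph is hypothetical, since the quiz's graph is in its figure.

```python
def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, x, y, given=()):
    """Test (x ⊥ y | given) via the moralized ancestral graph."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, []) if p in keep]
        for p in ps:                      # undirected child-parent edges
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):        # marry co-parents
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:                          # search, never entering observed nodes
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - given:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True

# Hypothetical graph: A -> C <- B, C -> D (child -> list of parents).
g = {"C": ["A", "B"], "D": ["C"]}
print(d_separated(g, "A", "B"), d_separated(g, "A", "B", {"D"}))  # True False
```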

Q3. **I-maps.** I-maps can also be defined directly on graphs as follows. Let $I(G)$ be the set of independencies encoded by a graph $G$. Then $G_1$ is an I-map for $G_2$ if $I(G_1) \subseteq I(G_2)$.

Which of the following statements about I-maps are true? You may select 1 or more options.

- A graph K is an I-map for a graph G if and only if all of the independencies encoded by K are also encoded by G.
- A graph K is an I-map for a graph G if and only if K encodes all of the independences that G has and more.
- An I-map is a function $f$ that maps a graph $G$ to itself, i.e., $f(G) = G$.
- The graph $K$ that is the same as the graph $G$, except that all of the edges are oriented in the opposite direction as the corresponding edges in $G$, is always an I-map for $G$, regardless of the structure of $G$.
- I-maps are Apple’s answer to Google Maps

Q4. **\*Naive Bayes.**

Assume a population size of 10,000. Which of the following statements are true in this model? You may select 1 or more options.

- Say we observe that 1000 people have the flu, out of which 500 people have a headache (and possibly other symptoms) and 500 have a fever (and possibly other symptoms). We would expect that approximately 250 people with the flu also have both a headache and fever.
- Say we observe that 1000 people have a headache (and possibly other symptoms), out of which 500 people have the flu (and possibly other symptoms), and 500 people have a fever (and possibly other symptoms). We would expect that approximately 250 people with a headache also have both the flu and a fever.
- Say we observe that 500 people have a headache (and possibly other symptoms) and 500 people have a fever (and possibly other symptoms). Without more information, we cannot estimate how many people have both a headache and fever.
- Say we observe that 500 people have a headache (and possibly other symptoms) and 500 people have a fever (and possibly other symptoms). We would expect that approximately 250 people have both a headache and fever.
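The naive Bayes assumption is that symptoms are conditionally independent given the class variable (Flu), so within the flu population the fraction with both symptoms is the product of the individual fractions. A one-function sketch with a hypothetical name `expected_both`:

```python
def expected_both(n_class, n_symptom1, n_symptom2):
    """Expected number with both symptoms among n_class people, assuming the
    symptoms are independent given membership in that class."""
    return n_class * (n_symptom1 / n_class) * (n_symptom2 / n_class)

print(expected_both(1000, 500, 500))  # 250.0
```

The same product is not justified when conditioning on a symptom, or when not conditioning at all: in the naive Bayes structure the symptoms are dependent through the shared Flu parent, and Flu directly influences Fever.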

Q5. **I-maps.**

Suppose $(A \perp B) \in \mathcal{I}(P)$, and $G$ is an I-map of $P$, where $G$ is a Bayesian network and $P$ is a probability distribution. Is it necessarily true that $(A \perp B) \in \mathcal{I}(G)$?

- Yes
- No

#### Quiz 4: Octave/Matlab installation

Q1. The platform requires us to have one graded assignment in every honors lesson, so we have to ask: have you successfully installed Octave or MATLAB?

- Yes
- No

### Week 2 Quiz Answers

#### Quiz 1: Template Models

Q1. **Markov Assumption.**

If a dynamic system $X$ satisfies the Markov assumption for all time $t \geq 0$, which of the following statements must be true? You may select 1 or more options.

- $(X^{(t+1)} \perp X^{(0:(t-1))} \mid X^{(t)})$
- $P(X^{(t+1)}) = P(X^{(t-1)})$ for all possible values of $X$
- $(X^{(t+1)} \perp X^{(0:(t-1))})$
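The Markov assumption can be checked numerically on a small chain. The sketch builds the joint over $X^{(0)}, X^{(1)}, X^{(2)}$ for a hypothetical 2-state chain, confirms that conditioning on $X^{(1)}$ makes $X^{(0)}$ irrelevant, and shows that without that conditioning $X^{(2)}$ still depends on $X^{(0)}$. The initial distribution and transition matrix are made up for illustration.

```python
from itertools import product

# Hypothetical 2-state chain: initial distribution and transition matrix.
p0 = [0.5, 0.5]
T = [[0.9, 0.1],   # T[i][j] = P(X^(t+1) = j | X^(t) = i)
     [0.2, 0.8]]

# Joint over (X^(0), X^(1), X^(2)) under the Markov factorization.
joint = {(a, b, c): p0[a] * T[a][b] * T[b][c]
         for a, b, c in product(range(2), repeat=3)}

def cond_prob_x2(x1=None, x0=None):
    """P(X^(2) = 1 | the given values of X^(1) and/or X^(0))."""
    def match(k):
        return (x1 is None or k[1] == x1) and (x0 is None or k[0] == x0)
    den = sum(p for k, p in joint.items() if match(k))
    num = sum(p for k, p in joint.items() if match(k) and k[2] == 1)
    return num / den

# Given the present X^(1), the past X^(0) adds no information.
for x1 in range(2):
    assert abs(cond_prob_x2(x1=x1, x0=0) - cond_prob_x2(x1=x1, x0=1)) < 1e-12

# Marginally, X^(2) still depends on X^(0).
print(cond_prob_x2(x0=0), cond_prob_x2(x0=1))
```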

Q2. **Independencies in DBNs.**

- $(X^{(t+1)} \perp X^{(t)} \mid X^{(t-1)})$
- $(O^{(t)} \perp X^{(t-1)} \mid X^{(t)})$
- $(O^{(t)} \perp O^{(t-1)})$
- $(X^{(t)} \perp X^{(t-1)})$

Q3. **Applications of DBNs.**

For which of the following applications might one use a DBN (i.e. the Markov assumption is satisfied)? You may select 1 or more options.

- Modeling the behavior of people, where a person’s behavior is influenced by only the behavior of people in the same generation and the people in his/her parents’ generation.
- Modeling data taken at different locations along a road, where the data at each location is influenced by only the data at the same location and at the location directly to the East
- Modeling time-series data, where the events at each time-point are influenced by only the events at the one time-point directly before it
- Predicting the probability that today will be a snow day (school will be closed because of the snow), when this probability depends only on whether yesterday was a snow day.

Q4. **Plate Semantics.**

Let A and B be random variables inside a common plate indexed by i. Which of the following statements must be true? You may select 1 or more options.

- For each i, A(i) and B(i) have the same CPDs.
- For each i, A(i) and B(i) have edges connecting them to the same variables outside of the plate.
- If there is an instance of A for some i, then there is no instance of B for that i.
- There is an instance of A and an instance of B for every i.

Q5. **\*Plate Interpretation.**

- Whether a specific teacher T taught a specific course C at school S
- None of these options can represent X in the grounded model
- Whether a specific teacher T is a tough grader
- Whether someone with expertise E taught something of difficulty D at a place in location L
- Whether someone with expertise E taught something of difficulty D at school S

Q6. **Grounded Plates.**

Using the same plate model, now assume that there are $s$ schools, $t$ teachers in each school, and $c$ courses taught by each teacher. How many instances of the Location variable are there?

- $s$
- $stc$
- $s^2$
- $t$
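Grounding a plate model creates one copy of a template variable for every combination of indices of the plates that enclose it, so nested plates multiply. A minimal sketch with a hypothetical helper `ground`; which plates enclose Location is shown in the quiz figure, so the example uses a generic variable `V` and made-up plate sizes.

```python
from itertools import product

def ground(var, enclosing_plates, plate_sizes):
    """List the ground instances of template variable `var`.

    enclosing_plates: plates containing `var`, outermost first.
    plate_sizes: indices per plate (per enclosing index, for nested plates).
    """
    ranges = [range(1, plate_sizes[p] + 1) for p in enclosing_plates]
    return [f"{var}({', '.join(map(str, idx))})" if idx else var
            for idx in product(*ranges)]

# Hypothetical sizes: s = 2 schools, t = 3 teachers per school, c = 4 courses per teacher.
sizes = {"School": 2, "Teacher": 3, "Course": 4}
print(ground("V", ["School"], sizes))                             # ['V(1)', 'V(2)']
print(len(ground("V", ["School", "Teacher", "Course"], sizes)))   # 24
```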

Q7.

- $K \cdot L \cdot M$
- $K \cdot (L + M)$
- $L \cdot M$
- $(L \cdot M)^K$

Q8. **Template Models.** Consider the plate model from the previous question. What might P represent?

- Whether a specific product PROD was consumed by consumer C in market M
- Whether a specific product PROD was consumed by consumer C in all markets
- Whether a specific product of brand q was consumed by a consumer with age t in a market of type m that is in location a
- Whether a specific product PROD was consumed by consumer C in market M in location L

Q9.

- (a)
- (b)
- (c)

Q10. **\*Unrolling DBNs.** Below are 2-TBNs that could be unrolled into DBNs. Consider these unrolled DBNs (note that there are no edges within the first time-point). In which of them will $(X^{(t)} \perp Z^{(t)} \mid Y^{(t)})$ hold for all $t$, assuming $Obs^{(t)}$ is observed for all $t$ and $X^{(t)}$ and $Z^{(t)}$ are never observed? You may select 1 or more options.

- (a)
- (b)
- (c)

#### Quiz 2: Structured CPDs

Q1. **Causal Influence.** Consider the CPD below. What is the probability that $E = e_0$ in the following graph, given an observation $A = a_0, B = b_1, C = c_1, D = d_1$? Note that, for the pairs of probabilities that make up the leaves, the probability on the left is the probability of $e_0$, and the probability on the right is the probability of $e_1$.
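Reading a tree CPD means starting at the root and following the branch for each tested variable until a leaf $(P(e_0), P(e_1))$ is reached. The tree below is hypothetical (the quiz's tree is in its figure), and `lookup` is an illustrative helper.

```python
# Hypothetical tree CPD for P(E | A, B, C, D).
# Internal node: (variable name, {value: subtree}); leaf: (P(e_0), P(e_1)).
tree = ("A", {
    "a0": ("B", {
        "b0": (0.9, 0.1),
        "b1": ("C", {"c0": (0.3, 0.7), "c1": (0.6, 0.4)}),
    }),
    "a1": ("D", {"d0": (0.2, 0.8), "d1": (0.5, 0.5)}),
})

def lookup(node, assignment):
    """Walk from the root, following the branch for each tested variable."""
    while isinstance(node[0], str):
        var, children = node
        node = children[assignment[var]]
    return node

p_e0, p_e1 = lookup(tree, {"A": "a0", "B": "b1", "C": "c1", "D": "d1"})
print(p_e0)  # 0.6 in this hypothetical tree
```

Variables not tested on the path taken (here $D$) have no effect on the result for that assignment.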

Q2. **Independencies with Deterministic Functions.** In the following Bayesian network, the node B is a deterministic function of its parent A. Which of the following is an independence statement that holds in the network? You may select 1 or more options.

- $(A \perp B \mid C, D)$
- $(B \perp D \mid C)$
- $(A \perp D \mid B)$
- $(C \perp D \mid B)$

Q3. **Independencies in Bayesian Networks.** For the network in the previous question, let B no longer be a deterministic function of its parent A. Which of the following is an independence statement that holds in the modified Bayesian network? You may select 1 or more options.

- $(B \perp D \mid A)$
- $(C \perp D \mid B)$
- $(C \perp D \mid A)$
- $(A \perp D \mid C)$

Q4. **Context-Specific Independencies in Bayesian Networks.** Which of the following are context-specific independences that **do** exist in the tree CPD below? (Note: Only consider independencies in this CPD, ignoring other possible paths in the network that are not shown here. You may select 1 or more options.)

- $(E \perp_c C \mid b^0, d^0)$
- $(E \perp_c D \mid a^0)$
- $(E \perp_c D \mid b^1)$
- $(E \perp_c D, B \mid a^1)$
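A context-specific independence $(E \perp_c Y \mid \text{context})$ holds in a tree CPD when, within that context and for every setting of the remaining parents, changing $Y$ never changes the distribution over $E$. The sketch checks this by enumeration on a hypothetical tree with binary parents (values 0 and 1, matching the $a^0$/$a^1$ notation); it is not the quiz's tree.

```python
from itertools import product

# Hypothetical tree CPD for P(E | A, B, C, D); leaves are (P(e^0), P(e^1)).
tree = ("A", {
    0: ("B", {0: (0.9, 0.1), 1: ("C", {0: (0.3, 0.7), 1: (0.6, 0.4)})}),
    1: ("D", {0: (0.2, 0.8), 1: (0.5, 0.5)}),
})
domains = {v: [0, 1] for v in "ABCD"}

def leaf(node, assignment):
    while isinstance(node[0], str):
        var, children = node
        node = children[assignment[var]]
    return node

def csi_holds(targets, context):
    """(E ⊥_c targets | context), checked over all settings of the other parents."""
    free = [v for v in domains if v not in context and v not in targets]
    for rest in product(*(domains[v] for v in free)):
        base = {**context, **dict(zip(free, rest))}
        dists = {leaf(tree, {**base, **dict(zip(targets, t))})
                 for t in product(*(domains[v] for v in targets))}
        if len(dists) > 1:
            return False
    return True

print(csi_holds(["D"], {"A": 0}))  # True: the a^0 branch never tests D
print(csi_holds(["D"], {"A": 1}))  # False: the a^1 branch splits on D
```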

#### Quiz 3: BNs for Genetic Inheritance PA Quiz

**Q1. This quiz is a companion quiz to Programming Assignment: Bayes Nets for Genetic Inheritance. Please refer to the writeup for the programming assignment for instructions on how to complete this quiz.**

James and Rene come to a genetic counselor because they are deciding whether to have another child or adopt. They want to know the probability that their unborn child will have cystic fibrosis.

Consider the Bayesian network for cystic fibrosis. We consider a person's phenotype variable to be "observed" if the person's phenotype is known. Order the probabilities of their unborn child having cystic fibrosis in the following situations from smallest to largest: (1) No phenotypes are observed (nothing clicked), (2) Jason has cystic fibrosis, (3) Sandra has cystic fibrosis.

- (3), (1), (2)
- (1), (2), (3)
- (3), (2), (1)
- (2), (3), (1)
- (1), (3), (2)

Q2. James never knew his father Ira because Ira passed away in an accident when James was a few months old. Now James comes to the genetic counselor wanting to know if Ira had cystic fibrosis. The genetic counselor wants your help in determining the probability that Ira had cystic fibrosis. Consider the Bayesian network for cystic fibrosis. We consider a person's phenotype variable to be "observed" if the person's phenotype is known. Order the probabilities of Ira having had cystic fibrosis in the following situations from smallest to largest: (1) No phenotypes are observed (nothing clicked), (2) Benjamin has cystic fibrosis, (3) Benjamin and Robin have cystic fibrosis.

- (1), (3), (2)
- (3), (2), (1)
- (2), (3), (1)
- (1), (2), (3)
- (3), (1), (2)

Q3. Recall that, for a trait with 2 alleles, the CPD for genotype given parents’ genotypes has 27 entries, and 18 parameters were needed to specify the distribution. How many parameters would be needed if the trait had 3 alleles instead of 2?
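The 27 entries and 18 parameters quoted above follow from counting unordered genotypes: with $n$ alleles there are $g = n(n+1)/2$ genotypes, the CPD has $g^3$ entries, and each of the $g^2$ parent combinations needs $g - 1$ free parameters. A sketch of that count (hypothetical helper name), assuming the same convention carries over to more alleles:

```python
def genotype_cpd_params(num_alleles):
    """(entries, independent parameters) of a table CPD
    P(child genotype | mother genotype, father genotype)."""
    g = num_alleles * (num_alleles + 1) // 2   # unordered pairs of alleles
    entries = g ** 3
    params = g * g * (g - 1)                   # g - 1 free values per parent combination
    return entries, params

print(genotype_cpd_params(2))  # (27, 18), matching the numbers quoted above
```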

Q4. You will now gain some intuition for why decoupling a Bayesian network can be worthwhile. Consider a **non-decoupled** Bayesian network for cystic fibrosis with **3 alleles** over the pedigree that was used in sections 2.4 and 3.3. How many parameters are needed to specify all probability distributions across the entire network?

Q5. Now consider the **decoupled** Bayesian network for cystic fibrosis with **3 alleles** over the pedigree that was used in sections 2.4 and 3.3. How many parameters are needed to specify all of the probability distributions across the entire network?

**Hint**: A child cannot inherit an allele that is not present in either parent, so there aren’t as many degrees of freedom here as there might be without that context-specific information.

Q6. Consider the **decoupled** Bayesian network for cystic fibrosis with three alleles that you constructed in section 3.3. We consider a person’s gene copy variable to be “observed” if the person’s allele for that copy of the gene is known.

James and Rene are debating whether to have another child or adopt a child. They are concerned that, if they have a child, the child will have cystic fibrosis because both of them have one F allele observed (their other gene copy is not observed), even though neither of them has cystic fibrosis. You want to give them advice, but they refuse to tell you whether anyone else in their family has cystic fibrosis. What is the **probability** (NOT a percentage) that their unborn child will have cystic fibrosis?

Q7. Consider a Bayesian network for spinal muscular atrophy (SMA), in which there are multiple genes and 2 phenotypes.

Let $n$ be the number of genes involved in SMA and $m$ be the maximum number of alleles per gene. How many parameters are necessary if we use a table CPD for the probabilities for phenotype given copies of the genes from both parents?

- $O(m^2)$
- Depends on the phenotype
- $O(mn)$
- $O(m+n)$
- $O(n)$
- $O(2^n)$
- $O(m^{2n})$
- $O(4^n)$

Q8. Consider the Bayesian network for spinal muscular atrophy (SMA), in which there are multiple genes and two phenotypes.

Let $n$ be the number of genes involved in SMA and $m$ be the maximum number of alleles per gene. How many parameters are necessary if we use a sigmoid CPD for the probabilities for phenotype given copies of the genes from both parents?

- $O(\max(m,n))$
- $O(m)$
- $O((mn)^2)$
- Depends on the phenotype
- $O(m^2 n)$
- $O(mn)$
- $O(n)$
- $O(m+n)$
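To see how the two CPD families scale, the sketch counts parameters for a binary phenotype. It assumes the table CPD has one free entry per joint configuration of the $2n$ gene copies, and that the sigmoid CPD uses one weight per allele of each gene (shared by both copies) plus a bias; the programming assignment's exact parameterization may differ. All names are hypothetical.

```python
import math

def table_cpd_params(n_genes, m_alleles):
    # Binary phenotype: one free entry per configuration of 2n gene copies.
    return m_alleles ** (2 * n_genes)

def sigmoid_cpd_params(n_genes, m_alleles):
    # Assumed: one weight per allele of each gene, plus one bias term.
    return n_genes * m_alleles + 1

def sigmoid_cpd(weights, bias, copies):
    """P(phenotype = 1 | gene copies) = sigmoid(bias + sum of allele weights).

    weights[g][a]: weight of allele a of gene g.
    copies: list of (allele on copy 1, allele on copy 2), one pair per gene.
    """
    z = bias + sum(weights[g][a1] + weights[g][a2]
                   for g, (a1, a2) in enumerate(copies))
    return 1.0 / (1.0 + math.exp(-z))

for n in (1, 2, 4):
    print(n, table_cpd_params(n, 3), sigmoid_cpd_params(n, 3))
```

Under this parameterization each allele adds a fixed amount to the log-odds of the phenotype, which is what keeps the parameter count small.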

Q9. Consider genes A and B that might be involved in spinal muscular atrophy. Assume that A has 2 alleles $A_1$ and $A_2$, and B has 2 alleles, $B_1$ and $B_2$. Which of the following relationships between A and B can a sigmoid CPD capture?

- Allele $A_1$ and allele $B_1$ make a person equally more likely to have SMA, but when both are present the effect on SMA is the same as when only one is present.
- Neither gene A nor gene B contribute to SMA.
- Allele $A_1$ and allele $B_1$ make a person more likely to have SMA when both of these alleles are present, but neither affect SMA otherwise.
- Allele $A_1$ makes a person more likely to have SMA, while allele $B_1$ independently makes a person less likely to have SMA.
- When the alleles are $A_1$ and $B_2$ or $A_2$ and $B_1$ the person has SMA; otherwise the person does not have SMA.
- Gene A contributes to SMA, but gene B does not contribute to SMA and thus does not affect the effects of gene A on SMA.
- Alleles $A_1$ and $B_1$ each independently make a person likely to have SMA.

Q10. Consider the Bayesian network for spinal muscular atrophy that we provided in spinalMuscularAtrophyBayesNet.net. We consider a person’s gene copy variable to be “observed” if the person’s allele for that copy of that gene is known.

Now say that Ira and Robin come to the genetic counselor because they are debating whether to have a biological child or adopt and are concerned that their child might have spinal muscular atrophy. They have some genetic information, but because sequencing is still far too expensive to be affordable for everyone, their information is limited to only a few genes and to only 1 chromosome in each pair of chromosomes.

Order the probabilities of their unborn child having spinal muscular atrophy in the following situations from smallest to largest: (1) No genetic information or phenotypes are observed (nothing clicked), (2) Ira and Robin each have at least 1 M allele, (3) Ira and Robin each have at least 1 M allele and at least 1 B allele.

- (3), (1), (2)
- (1), (2), (3)
- (1), (3), (2)
- (3), (2), (1)
- (2), (3), (1)

Q11. Consider the Bayesian network for spinal muscular atrophy that we provided in spinalMuscularAtrophyBayesNet.net.

No longer interested in finding out whether his father had cystic fibrosis, James comes to the genetic counselor with another question: Did his father have spinal muscular atrophy? The genetic counselor now wants your help in figuring this out. This time, however, James has other information for you: both he and Robin have spinal muscular atrophy.

What is the **probability** (NOT a percentage) that Ira had spinal muscular atrophy?

### Week 3 Quiz Answers

#### Quiz 1: Markov Networks

Q1. **Factor Scope.** Let $\phi(c,e)$ be a factor in a graphical model, where $c$ is a value of $C$ and $e$ is a value of $E$. What is the scope of $\phi$?

- {A, B, C, E}
- {C, E}
- {A, C, E}
- {C}

Q2.

- C, D
- No pair of variables are independent of each other.
- D, E

Q3. **Factorization.** Which of the following is a valid Gibbs distribution over this graph?

- $\phi(A, B, C, D, E, F)$
- $\phi(A) \times \phi(B) \times \phi(C) \times \phi(D) \times \phi(E) \times \phi(F)$
- $\frac{\phi(A, B, D) \times \phi(C, E, F)}{Z}$, where $Z$ is the partition function
- $\frac{\phi(A) \times \phi(B) \times \phi(C) \times \phi(D) \times \phi(E) \times \phi(F)}{Z}$, where $Z$ is the partition function
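A Gibbs distribution multiplies non-negative factors and divides by the partition function $Z$, the sum of that product over all joint assignments. A minimal sketch by enumeration, with a hypothetical helper `gibbs` and a made-up pairwise potential `agree` on the chain $A - B - C$:

```python
from itertools import product

def gibbs(factors, variables, domains):
    """P(x) = (1/Z) * prod_i phi_i(x restricted to scope_i), by enumeration."""
    unnormalized = {}
    for values in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, values))
        score = 1.0
        for scope, phi in factors:
            score *= phi(*(x[v] for v in scope))
        unnormalized[values] = score
    Z = sum(unnormalized.values())  # partition function
    return {k: s / Z for k, s in unnormalized.items()}, Z

def agree(u, v):
    # Hypothetical potential favoring equal values; non-negative, not a probability.
    return 2.0 if u == v else 1.0

P, Z = gibbs([(("A", "B"), agree), (("B", "C"), agree)], ["A", "B", "C"],
             {"A": [0, 1], "B": [0, 1], "C": [0, 1]})
print(Z)  # 18.0
```

Without the division by $Z$, the factor product sums to 18 here rather than 1, so it is not yet a probability distribution.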

Q4. **Factors in Markov Network.** Let $\phi(A,B,C)$ be a factor in a probability distribution that factorizes over a Markov network. Which of the following must be true? You may select 1 or more options.

- A, B, and C form a clique in the network.
- $\phi(a,b,c) \leq 1$, where a is a value of A, b is a value of B, and c is a value of C.
- $\phi(a,b,c) \geq 0$, where a is a value of A, b is a value of B, and c is a value of C.
- A, B, and C do not form a clique in the network.
- There is no path from A to B, no path from B to C, and no path from A to C in the network.

#### Quiz 2: Independencies Revisited

Q1. **I-Maps.** Graph $G$ (shown below) is a perfect I-map for distribution $P$, i.e., $\mathcal{I}(G)=\mathcal{I}(P)$. Which of the other graphs is an I-map (**not** necessarily a perfect map) for $P$?

- III
- None of the above
- II
- I and III

Q2. **I-Equivalence.** In the figure below, graph $G$ is I-equivalent to which other graph(s)?

- I
- III
- None of the above
- I and III

Q3. **\*I-Equivalence.** Let Bayesian network $G$ be a simple directed chain $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_n$ for some number $n$. How many Bayesian networks are I-equivalent to $G$, including $G$ itself?

- $n$
- $n!$
- $2^{(n-1)}$
- $2n$
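Two DAGs are I-equivalent exactly when they have the same skeleton and the same immoralities (v-structures $X \rightarrow Z \leftarrow Y$ with $X$ and $Y$ non-adjacent). The sketch applies that test to every orientation of a chain's edges; every orientation of a tree skeleton is acyclic, so enumerating the $2^{n-1}$ flips covers all candidate networks. Function names are hypothetical.

```python
from itertools import product

def skeleton(edges):
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """V-structures x -> z <- y whose endpoints x and y are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {(x, z, y) for z, ps in parents.items() for x in ps for y in ps
            if x < y and frozenset((x, y)) not in skel}

def i_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))

def count_equivalent_chains(n):
    """Count orientations of X1 - ... - Xn that are I-equivalent to the chain."""
    chain = [(i, i + 1) for i in range(1, n)]
    count = 0
    for flips in product([False, True], repeat=n - 1):
        edges = [(v, u) if f else (u, v) for (u, v), f in zip(chain, flips)]
        count += i_equivalent(chain, edges)
    return count

print([count_equivalent_chains(n) for n in range(1, 6)])  # [1, 2, 3, 4, 5]
```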

### Week 4 Quiz Answers

#### Quiz 1: Decision Theory

Q1. **Utility Curves.** What does the point marked $A$ on the $Y$ axis correspond to? (Mark all that apply.)

- $0.5\,U(\$0) + 0.5\,U(\$1000)$
- $U(\ell)$ where $\ell$ is a lottery that pays \$0 with probability 0.5 and \$1000 with probability 0.5.
- $U(\$500)$

Q2. **Utility Curves.** What does the point marked $B$ on the $Y$ axis correspond to? (Mark all that apply.)

- $U(\$500)$
- $U(\ell)$ where $\ell$ is a lottery that pays \$0 with probability 0.5 and \$1000 with probability 0.5.
- $0.5\,U(\$0) + 0.5\,U(\$1000)$
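The utility of a lottery is its expected utility, $U(\ell) = 0.5\,U(\$0) + 0.5\,U(\$1000)$, which for a concave (risk-averse) curve lies below $U(\$500)$, the utility of receiving the lottery's expected payout for sure. A sketch assuming a hypothetical square-root utility, since the quiz's curve is only given as a figure:

```python
import math

def U(dollars):
    # Hypothetical concave (risk-averse) utility function.
    return math.sqrt(dollars)

eu_lottery = 0.5 * U(0) + 0.5 * U(1000)   # U(lottery) = its expected utility
u_sure = U(500)                           # utility of the expected payout, for sure
certainty_equivalent = eu_lottery ** 2    # sure amount with the same utility (inverse of sqrt)

print(round(eu_lottery, 2), round(u_sure, 2), round(certainty_equivalent, 2))
```

Here the certainty equivalent is \$250: a risk-averse agent with this utility would trade the lottery for any sure amount above that.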

Q3. **Expected Utility.** In the simple influence diagram on the right, with the CPD for $M$ and the utility function $V$, what is the expected utility of the action $f^1$?

- 5
- 2
- 0
- 20

Q4. **\*Uninformative Variables.** In the influence diagram on the right, what is an appropriate way to have the model account for the fact that if the Test wasn't performed ($t^0$), then the survey is uninformative?

- Set $P(S \mid M, t^0)$ so that $S$ takes some new value "not performed" with probability 1.
- Set $P(S \mid M, t^0)$ to be uniform.
- Set $P(S \mid M, t^0) = P(S \mid M, t^1)$.
- Set $P(S \mid M, t^0)$ so that $S$ takes the value $s^0$ with probability 1.

#### Quiz 2: Decision Making PA Quiz

Q1. This quiz is a companion quiz to the Programming Assignment on Decision Making. Please refer to the writeup for the programming assignment for instructions on how to complete this quiz.

We have provided an instantiated influence diagram FullI (complete with a decision rule for D) in the file FullI.mat. What is the expected utility for this influence diagram? Please round to the nearest tenth (i.e., 1 decimal place), do not include commas, and do not write the number in scientific notation.


Q2. Run ObserveEvidence.m on FullI to account for the following: We have been informed that variable 3 in the model, which models an overall genetic risk for ARVD, has value 2 (indicating the presence of genetic risk factors). Then run SimpleCalcExpectedUtility on the modified influence diagram. What happened to the expected utility? (Hint: ObserveEvidence does not re-normalize the factors so that they are again valid CPDs unless the normalize flag is set to 1. If you do not use the normalize flag, you can use NormalizeCPDFactors.m to do the normalization.)

- It substantially decreased.
- It did not change.
- It substantially increased.
- The expected utility might or might not change because there is some randomness in the process for determining the expected utility.

Q3. Why can we explicitly enumerate all the possible decision rules while we often cannot enumerate over all possible CPDs?

- If there is one choice in a decision rule, at least one choice must have a 0 probability, where in a general CPD, no entries are restricted to having 0 probabilities.
- In an influence diagram, each decision node cannot have more than 1 parent, while in a general Bayes net, a node can have many parents.
- All choices have a probability of either 0 or 1, where in a general CPD, choices could take on any value in [0, 1].
- We can actually always enumerate over all possible CPDs.

Q4. Let a decision node $D$ take on $d$ possible values. Let it have $m$ parents that can each take on $n$ possible values. How many possible decision rules $\delta_D$ are there?

- $d(n^m)$
- $d(m^n)$
- $d^{(n^m)}$
- $d^{(2n^m)}$
- $dnm$
- $2d(n^m)$
- $d^{(m^n)}$
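A deterministic decision rule picks one of the $d$ actions for each joint assignment to the parents, and there are $n^m$ such assignments. Enumerating the rules for small values makes the count concrete; `decision_rules` is a hypothetical helper.

```python
from itertools import product

def decision_rules(d, m, n):
    """Yield every deterministic decision rule as {parent assignment: action}."""
    parent_assignments = list(product(range(n), repeat=m))
    for choices in product(range(d), repeat=len(parent_assignments)):
        yield dict(zip(parent_assignments, choices))

rules = list(decision_rules(d=2, m=2, n=2))
print(len(rules))  # 16, i.e. 2 ** (2 ** 2)
```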

Q5. Consider an influence diagram with 1 decision node $D$ that can take on $d$ values. Let $D$ have $m$ parents that can each take on $n$ values. Assume that running sum-product inference takes $O(S)$ time. What is the run-time complexity of running OptimizeMEU on this influence diagram?

- $O(Sdnm)$
- $O(S + dn^m)$
- $O(S + n^m)$
- $O(S + dnm)$
- $O(Sdn^m)$
- $O(d^{(n^m)})$
- $O(Sn^m)$
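The idea behind MEU optimization for a single decision can be sketched as two phases: one inference run produces an expected-utility factor $\mu(D, \mathrm{Pa}_D)$, then a single pass over its $d \cdot n^m$ entries picks the best action for each parent assignment. This is an illustrative Python sketch, not the assignment's Octave OptimizeMEU; the factor values are hypothetical.

```python
from itertools import product

def optimize_meu(mu, d, m, n):
    """Choose argmax_a mu(pa, a) for every parent assignment pa.

    mu maps (parent assignment tuple, action) -> expected utility, as produced
    by one run of inference; this loop touches each of the d * n**m entries once.
    """
    rule, meu = {}, 0.0
    for pa in product(range(n), repeat=m):
        best = max(range(d), key=lambda a: mu[(pa, a)])
        rule[pa] = best
        meu += mu[(pa, best)]
    return rule, meu

# Hypothetical expected-utility factor: d = 2 actions, m = 1 binary parent (n = 2).
mu = {((0,), 0): 1.0, ((0,), 1): 3.0, ((1,), 0): 5.0, ((1,), 1): 2.0}
rule, meu = optimize_meu(mu, d=2, m=1, n=2)
print(rule, meu)  # {(0,): 1, (1,): 0} 8.0
```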

Q6. In which of the following situations does it make sense to use OptimizeWithJointUtility instead of OptimizeLinearExpectations?

- When the bottleneck in inference is in enumerating the large number of possible assignments to the parents of the utility variables, and each utility variable has a disjoint set of parents.
- When there are large factors in the random-variables part of the influence diagram, making inference over the network slow, and there are only a few utility factors, each involving a small number of variables.
- When the scopes of the utility factors are large compared to the scopes of the other (random variable) factors.
- When every random variable in the network is a parent of at least one other utility factor.

Q7. In the field below, enter the dollar value of the test T1, rounded to the nearest cent (e.g., "1.23" means that you would pay $1.23 for the test; any more than that, and your net utility will be lower than if you didn't perform any test). Do not precede the amounts with dollar signs.


Q8. In the field below, enter the dollar value of the test T2, rounded to the nearest cent (e.g., "1.23" means that you would pay $1.23 for the test; any more than that, and your net utility will be lower than if you didn't perform any test). Do not precede the amounts with dollar signs.


Q9. In the field below, enter the dollar value of the test T3, rounded to the nearest cent (e.g., "1.23" means that you would pay $1.23 for the test; any more than that, and your net utility will be lower than if you didn't perform any test). Do not precede the amounts with dollar signs.

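The value of a test is how much the maximum expected utility (MEU) improves when its result is observed before deciding, compared with deciding without it. The sketch computes that difference for a hypothetical one-decision problem (state $M$, action, test result $T$); all numbers and names are made up, and converting utility to dollars depends on the assignment's own utility scale, which is not modeled here.

```python
# Hypothetical single-decision problem: unknown state M, action, optional test T.
p_m = {0: 0.7, 1: 0.3}
utility = {(0, "wait"): 0, (1, "wait"): -100, (0, "treat"): -20, (1, "treat"): -30}
p_t_given_m = {0: {"neg": 0.9, "pos": 0.1}, 1: {"neg": 0.2, "pos": 0.8}}
actions = ["wait", "treat"]

def meu_without_test():
    return max(sum(p_m[m] * utility[(m, a)] for m in p_m) for a in actions)

def meu_with_test():
    # For each test result, pick the action with the highest joint-weighted utility.
    total = 0.0
    for t in ("neg", "pos"):
        total += max(sum(p_m[m] * p_t_given_m[m][t] * utility[(m, a)] for m in p_m)
                     for a in actions)
    return total

voi = meu_with_test() - meu_without_test()
print(round(voi, 2))
```

Observing a result can never lower the MEU, because ignoring the result is always one of the available strategies; the test is worth buying only if its cost is below this difference.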

### Week 5 Quiz Answers

#### Quiz 1: Representation Final Exam

Q1. **Template Model Representation.** Consider the following scenario:

On each campus there are several Computer Science students and several Psychology students (each student belongs to one xor the other group). We have a binary variable $L$ for whether the campus is large, a binary variable $S$ for whether the CS student is shy, a binary variable $C$ for whether the Psychology student likes computers, and a binary variable $F$ for whether the Computer Science student is friends with the Psychology student. Which of the following plate models can represent this scenario?

- None of these plate models can represent this scenario
- (B)
- (D)
- (A)

Q2. **Partition Function.** Which of the following is a use of the partition function?

- One can divide factor products by the partition function in order to convert them into probabilities.
- The partition function is the probability of each variable in the graph taking on a specific value.
- The partition function describes the probability that it is possible to partition the graph into groups of connected variables, where each variable within a group has the same value.
- The partition function is used only in the context of Bayesian networks, not Markov networks.

Q3. **\*I-Equivalence.** Let $T$ be any directed tree (not a polytree) over $n$ nodes, where $n \geq 1$. A directed tree is a traditional tree, where each node has at most one parent and there is only one root, i.e., all but one node has exactly one parent. (In a polytree, nodes may have multiple parents.) How many networks (including itself) are I-equivalent to $T$?

- $n$
- $n+1$
- $n!$
- Depends on the specific structure of $T$.

Q4. **\*Markov Network Construction.** Consider the unrolled network for the plate model shown below, where we have $n$ students and $m$ courses. Assume that we have observed the grade of all students in all courses. In general, what does a pairwise Markov network that is a minimal I-map for the conditional distribution look like? (Hint: the factors in the network are the CPDs reduced by the observed grades. We are interested in modeling the conditional distribution, so we do not need to explicitly include the Grade variables in this new network. Instead, we model their effect by appropriately choosing the factor values in the new network.)

- A fully connected graph with instantiations of the Difficulty and Intelligence variables.
- Impossible to tell without more information on the exact grades observed.
- A fully connected bipartite graph where instantiations of the Difficulty variables are on one side and instantiations of the Intelligence variables are on the other side.
- A graph over instantiations of the Difficulty variables and instantiations of the Intelligence variables, not necessarily bipartite; there could be edges between different Difficulty variables, and there could also be edges between different Intelligence variables.
- A bipartite graph where instantiations of the Difficulty variables are on one side and instantiations of the Intelligence variables are on the other side. In general, this graph will not be fully connected.

Q5. **Grounded Plates.**

Which of the following is a valid grounded model for the plate shown? You may select 1 or more options.

- (b) (watch out: options are not in order)
- (a) (watch out: options are not in order)
- (c) (watch out: options are not in order)

Q6. **Independencies in Markov Networks.**

Consider the following set of factors: $\Phi = \{\Phi_1(A, B), \Phi_2(B, C, D), \Phi_3(D), \Phi_4(C, E, F)\}$. Now, consider a Markov network $G$ such that $P_\Phi$ factorizes over $G$. Which of the following is an independence statement that holds in the network? You may select 1 or more options.

- $(A \perp E \mid B)$
- $(C \perp E \mid B)$
- $(C \perp D \mid A)$
- $(B \perp E \mid C)$
- $(B \perp E \mid A)$
- $(A \perp F \mid C)$
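In a Markov network, $(X \perp Y \mid Z)$ holds when every path from $X$ to $Y$ passes through $Z$. The sketch assumes $G$ is the induced graph of the factor set (an edge between every pair of variables sharing a scope), the minimal graph over which $P_\Phi$ factorizes; a denser $G$ would encode fewer independencies. Helper names are hypothetical.

```python
from itertools import combinations

def induced_graph(scopes):
    """Connect every pair of variables that appear together in some scope."""
    adj = {}
    for scope in scopes:
        for v in scope:
            adj.setdefault(v, set())
        for u, v in combinations(scope, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def separated(adj, x, y, given):
    """True if every path from x to y passes through a node in `given`."""
    given = set(given)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - given - seen:
            seen.add(w)
            stack.append(w)
    return True

# Scopes of Phi_1(A, B), Phi_2(B, C, D), Phi_3(D), Phi_4(C, E, F).
G = induced_graph([("A", "B"), ("B", "C", "D"), ("D",), ("C", "E", "F")])
print(separated(G, "A", "E", {"B"}))  # True
```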

Q7. **Factorization of Probability Distributions.**

Consider a directed graph $G$. We construct a new graph $G'$ by removing one edge from $G$. Which of the following is always true? You may select 1 or more options.

- If $G$ and $G'$ were undirected graphs, the answers to the other options would not change.
- Any probability distribution $P$ that factorizes over $G$ also factorizes over $G'$.
- Any probability distribution $P$ that factorizes over $G'$ also factorizes over $G$.
- No probability distribution $P$ that factorizes over $G$ also factorizes over $G'$.

Q8. **Template Model in CRF.**

The CRF model for OCR with only singleton and pairwise potentials that you played around with in PA3 and PA7 is an instance of a template model, with variables $C_1, \ldots, C_n$ over the characters and observed images $I_1, \ldots, I_n$. The model we used is a template model in that the singleton potentials are replicated across different $C_i$ variables, and the pairwise potentials are replicated across character pairs. The structure of the model is shown below:

Now consider the advantages of this particular template model for the OCR task, as compared to a non-template model that has the same structure, but where there are distinct singleton potentials for each $C_i$ variable, and distinct potentials for each pair of characters. Which of the following about the advantage of using a template model is true? You may select 1 or more options.

- The same template model can be used for words of different lengths.
- The template model can incorporate position-specific features, e.g. q-u occurs more frequently at the beginning of a word, while a non-template model cannot.
- The inference is significantly faster with the template model.
- Parameter sharing could make the model less susceptible to over-fitting when there is less training data.
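Parameter sharing means one singleton table and one pairwise table are reused at every position, so a single parameter set scores words of any length and its size does not grow with the word. The sketch uses hypothetical log-potentials over a three-letter alphabet and omits the image-dependence of the real OCR singleton potentials for brevity; `num_params` contrasts the tied (template) and untied parameter counts.

```python
# Hypothetical tied log-potentials; a real OCR CRF's singleton terms also depend on I_i.
ALPHABET = "abc"
singleton = {c: 0.1 * i for i, c in enumerate(ALPHABET)}
pairwise = {(c1, c2): (0.5 if c1 == c2 else 0.0) for c1 in ALPHABET for c2 in ALPHABET}

def log_score(word):
    """Unnormalized log-score using the same tables at every position."""
    return (sum(singleton[c] for c in word)
            + sum(pairwise[(c1, c2)] for c1, c2 in zip(word, word[1:])))

def num_params(num_chars, length, tied):
    k = num_chars
    if tied:
        return k + k * k  # independent of word length
    return length * k + (length - 1) * k * k

print(log_score("ab"), log_score("abcab"))  # one parameter set, two word lengths
print(num_params(26, 5, tied=True), num_params(26, 5, tied=False))  # 702 2834
```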

**Review:**

Based on our experience, we encourage you to enroll in this course so you can pick up new skills from specialists. We trust it will be worthwhile.