- From the course Probabilistic Models and Inference Algorithms for Machine Learning, Prof. Dahua Lin
- All contents are from the course, together with my own understanding.
Basic Concepts
- The key idea behind graphical models is factorization
- A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
- Graphical models can be understood in two ways:
- as a data structure that describes a joint distribution in a factorized manner
- as a compact representation of a family of distributions that share a set of conditional independencies
- These two views are in fact equivalent.
Categories of Graphical Models:
- Bayesian Networks (Directed Acyclic Graphs)
- Markov Random Fields (Undirected Graphs)
- Chain Graphs (Directed acyclic graphs over undirected components)
- Factor Graphs
Directed Acyclic Graph
- A graph G is called a directed acyclic graph (DAG) if it has no directed cycles.
- Since the edges of a directed graph have direction, each directed edge distinguishes a parent from a child.
- A vertex s is called an ancestor of t, and t a descendant of s, denoted s ≺ t, if there exists a directed path from s to t.
- Topological Ordering: a topological ordering of a directed graph G = (V, E) is a linear ordering of the vertices such that for each edge (s, t) ∈ E, s comes before t.
- A finite directed graph is acyclic if and only if it has a topological ordering.
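The "acyclic iff it has a topological ordering" fact above can be checked constructively with Kahn's algorithm, which is also how one orders the variables of a Bayesian network for sampling. A minimal sketch (graph and vertex names are hypothetical):

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm: return a topological ordering of the vertices,
    or None if the directed graph contains a cycle."""
    in_degree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for s, t in edges:          # edge (s, t): s is the parent of t
        children[s].append(t)
        in_degree[t] += 1
    queue = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            in_degree[c] -= 1
            if in_degree[c] == 0:
                queue.append(c)
    # If some vertices were never emitted, they lie on a directed cycle.
    return order if len(order) == len(vertices) else None

# A DAG: a -> b -> d, a -> c -> d
print(topological_order("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# A directed cycle: a -> b -> a, so no topological ordering exists
print(topological_order("ab", [("a", "b"), ("b", "a")]))  # None
```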
Bayesian Networks
- Given a DAG G = (V, E), we say a joint distribution over X_V factorizes according to G if its density p can be expressed as
p(x_V) = ∏_{s∈V} p_s(x_s | x_{π(s)})
- Such a model is called a Bayesian Network over G.
- π(s) is the set of s's parents, which can be empty.
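The factorization p(x_V) = ∏_{s∈V} p_s(x_s | x_{π(s)}) can be made concrete with a tiny example. A minimal sketch, assuming binary variables on the hypothetical DAG A → B, A → C, with CPTs chosen purely for illustration:

```python
# Bayesian network over A -> B, A -> C: the joint factorizes as
# p(a, b, c) = p(a) * p(b | a) * p(c | a), where π(B) = π(C) = {A}.
p_a = {0: 0.7, 1: 0.3}                        # p(A); π(A) is empty
p_b_given_a = {0: {0: 0.9, 1: 0.1},           # p(B | A = 0)
               1: {0: 0.2, 1: 0.8}}           # p(B | A = 1)
p_c_given_a = {0: {0: 0.6, 1: 0.4},
               1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """Evaluate the joint density as the product of local conditionals."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]

# Because each factor is a valid conditional distribution, the joint
# automatically sums to 1 -- no normalizing constant is needed.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0 (up to floating point)
```

Note the contrast with MRFs below: here no global normalization is required, precisely because each factor is itself a conditional distribution.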
Markov Random Fields
- Consider an undirected graph G = (V, E).
- A clique is a fully connected subset of vertices.
- A clique is called maximal if it is not properly contained in another clique (i.e., adding any other vertex would make the set no longer a clique).
- C(G) denotes the set of all maximal cliques.
- Consider an undirected graph G = (V, E); we say a joint distribution of X_V factorizes according to G if its density p can be expressed as
p(x_V) = (1/Z) ∏_{C∈C(G)} ψ_C(x_C)
- This is called a Markov Random Field over G.
- ψ_C : X_C → R+ are called factors.
- The normalizing constant Z is needed to ensure the distribution is properly normalized:
Z = ∫ ∏_{C∈C(G)} ψ_C(x_C) μ(dx)
- The ψ_C are also called compatibility functions; they need not be marginal or conditional distributions.
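For discrete variables the integral defining Z becomes a finite sum. A minimal sketch, assuming binary variables on a hypothetical 3-cycle A–B–C with pairwise compatibility functions (note they are not distributions):

```python
import itertools

def psi(u, v):
    """Pairwise compatibility function rewarding agreement between neighbours.
    It is not a distribution: its values need not sum to anything in particular."""
    return 2.0 if u == v else 1.0

cliques = [(0, 1), (1, 2), (0, 2)]   # edges of the triangle A-B-C

def unnormalized(x):
    """Product of factors: prod_{C in C(G)} psi_C(x_C)."""
    prod = 1.0
    for i, j in cliques:
        prod *= psi(x[i], x[j])
    return prod

# Z sums the unnormalized mass over all 2^3 configurations.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=3))

def p(x):
    """Properly normalized MRF density p(x) = (1/Z) prod psi_C(x_C)."""
    return unnormalized(x) / Z

print(Z)             # normalizing constant
print(p((0, 0, 0)))  # all-equal configurations get the highest probability
```

Brute-force summation is only viable for tiny models; in general computing Z is the hard part of working with MRFs.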
Analysis of Conditional Independence
- The graphical structure also encodes a set of conditional independencies among the variables.
- Consider a joint distribution over (X, Y, Z). X and Y are called conditionally independent given Z, denoted X ⟂ Y | Z, if
Pr(X ∈ A, Y ∈ B | Z) = Pr(X ∈ A | Z) Pr(Y ∈ B | Z)
or, more generally,
E_{X,Y|Z}[f(X) g(Y)] = E_{X|Z}[f(X)] E_{Y|Z}[g(Y)]
- Suppose the conditional distributions X | Z and Y | Z have densities p_{X|z} and p_{Y|z}. Then X ⟂ Y | Z if the following equality holds almost surely:
p_{(X,Y)|z}(x, y) = p_{X|z}(x) p_{Y|z}(y)
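The density criterion p_{(X,Y)|z}(x, y) = p_{X|z}(x) p_{Y|z}(y) is easy to check exhaustively for discrete variables. A minimal sketch: build a joint with X ⟂ Y | Z by construction (all tables hypothetical), then verify the factorization holds at every point:

```python
import itertools

# Construct p(x, y, z) = p(z) p(x | z) p(y | z), which has X ⟂ Y | Z built in.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def cond_xy(x, y, z):
    """p(x, y | z) recovered from the joint table."""
    return joint[(x, y, z)] / p_z[z]

# Verify p(x, y | z) = p(x | z) p(y | z) at every configuration.
for x, y, z in itertools.product((0, 1), repeat=3):
    lhs = cond_xy(x, y, z)
    rhs = p_x_given_z[z][x] * p_y_given_z[z][y]
    assert abs(lhs - rhs) < 1e-12
print("X ⟂ Y | Z verified")
```

Note that X and Y are typically not marginally independent here: integrating out Z couples them.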
Factor Graphs
- An MRF does not always fully reveal the factorized structure of a distribution.
- A factor graph can sometimes give a more accurate characterization of a family of distributions.
- A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
- A variable x and a factor f are linked in a factor graph if the factor involves x as an argument.
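The bipartite structure above is trivial to represent in code. A minimal sketch, with hypothetical variable and factor names: factors fa(x1, x2) and fb(x2, x3), whose scopes determine every link. An MRF over the same three variables with cliques {x1, x2} and {x2, x3} could not distinguish this from a single factor over all three variables; the factor graph makes the split explicit.

```python
# Factor graph as a bipartite adjacency structure: each factor is linked
# exactly to the variables appearing in its scope (its arguments).
factor_scopes = {
    "fa": ("x1", "x2"),
    "fb": ("x2", "x3"),
}

# Derive the variable-side adjacency from the factor scopes.
variable_neighbours = {}
for f, scope in factor_scopes.items():
    for v in scope:
        variable_neighbours.setdefault(v, []).append(f)

print(variable_neighbours)
# {'x1': ['fa'], 'x2': ['fa', 'fb'], 'x3': ['fb']}
```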