- From the course Probabilistic Models and Inference Algorithms for Machine Learning, Prof. Dahua Lin
- All contents are from the course, together with my own understanding.
Basic Concepts
- The key idea behind graphical models is factorization
- A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
- Graphical models can be understood in two ways:
- as a data structure that describes a joint distribution in a factorized manner
- as a compact representation of a family of distributions that share a set of conditional independencies
- These two views are in fact equivalent.
Categories of Graphical Models:
- Bayesian Networks (Directed Acyclic Graphs)
- Markov Random Fields (Undirected Graphs)
- Chain Graphs (Directed acyclic graphs over undirected components)
- Factor Graphs
Directed Acyclic Graph
- A graph G is called a directed acyclic graph (DAG) if it has no directed cycles.
- Since the edges of a directed graph have direction, each directed edge distinguishes a parent from a child.
- A vertex s is called an ancestor of t, and t a descendant of s, denoted s ≺ t, if there exists a directed path from s to t.
- Topological Ordering: a topological ordering of a directed graph G = (V, E) is a linear ordering of the vertices such that for each edge (s, t) ∈ E, s comes before t.
- A finite directed graph is acyclic if and only if it has a topological ordering.
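The "acyclic iff it has a topological ordering" fact above can be checked constructively with Kahn's algorithm, which is also how one orders the variables of a Bayesian network for sampling. A minimal sketch (graph and vertex names are hypothetical):

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm: return a topological ordering of the vertices,
    or None if the directed graph contains a cycle."""
    in_degree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for s, t in edges:          # edge (s, t): s is the parent of t
        children[s].append(t)
        in_degree[t] += 1
    queue = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            in_degree[c] -= 1
            if in_degree[c] == 0:
                queue.append(c)
    # If some vertices were never emitted, they lie on a directed cycle.
    return order if len(order) == len(vertices) else None

# A DAG: a -> b -> d, a -> c -> d
print(topological_order("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# A directed cycle: a -> b -> a, so no topological ordering exists
print(topological_order("ab", [("a", "b"), ("b", "a")]))  # None
```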
Bayesian Networks
- Given a DAG G = (V, E), we say a joint distribution over X_V factorizes according to G if its density p can be expressed as
p(x_V) = ∏_{s∈V} p_s(x_s | x_{π(s)})
- Such a model is called a Bayesian Network over G.
- π(s) is the set of s's parents, which can be empty.
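The factorization p(x_V) = ∏_{s∈V} p_s(x_s | x_{π(s)}) can be made concrete with a tiny example. A minimal sketch, assuming binary variables on the hypothetical DAG A → B, A → C, with CPTs chosen purely for illustration:

```python
# Bayesian network over A -> B, A -> C: the joint factorizes as
# p(a, b, c) = p(a) * p(b | a) * p(c | a), where π(B) = π(C) = {A}.
p_a = {0: 0.7, 1: 0.3}                        # p(A); π(A) is empty
p_b_given_a = {0: {0: 0.9, 1: 0.1},           # p(B | A = 0)
               1: {0: 0.2, 1: 0.8}}           # p(B | A = 1)
p_c_given_a = {0: {0: 0.6, 1: 0.4},
               1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """Evaluate the joint density as the product of local conditionals."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]

# Because each factor is a valid conditional distribution, the joint
# automatically sums to 1 -- no normalizing constant is needed.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0 (up to floating point)
```

Note the contrast with MRFs below: here no global normalization is required, precisely because each factor is itself a conditional distribution.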
Markov Random Fields
- Consider an undirected graph G = (V, E).
- A clique is a fully connected subset of vertices.
- A clique is called maximal if it is not properly contained in another clique (i.e., adding any other vertex would make the set no longer a clique).
- C(G) denotes the set of all maximal cliques.
- Consider an undirected graph G = (V, E); we say a joint distribution of X_V factorizes according to G if its density p can be expressed as
p(x_V) = (1/Z) ∏_{C∈C(G)} ψ_C(x_C)
- This is called a Markov Random Field over G.
- ψ_C : X_C → R+ are called factors.
- The normalizing constant Z is needed to ensure the distribution is properly normalized:
Z = ∫ ∏_{C∈C(G)} ψ_C(x_C) μ(dx)
- The ψ_C are also called compatibility functions; they need not be marginal or conditional distributions.
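For discrete variables the integral defining Z becomes a finite sum. A minimal sketch, assuming binary variables on a hypothetical 3-cycle A–B–C with pairwise compatibility functions (note they are not distributions):

```python
import itertools

def psi(u, v):
    """Pairwise compatibility function rewarding agreement between neighbours.
    It is not a distribution: its values need not sum to anything in particular."""
    return 2.0 if u == v else 1.0

cliques = [(0, 1), (1, 2), (0, 2)]   # edges of the triangle A-B-C

def unnormalized(x):
    """Product of factors: prod_{C in C(G)} psi_C(x_C)."""
    prod = 1.0
    for i, j in cliques:
        prod *= psi(x[i], x[j])
    return prod

# Z sums the unnormalized mass over all 2^3 configurations.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=3))

def p(x):
    """Properly normalized MRF density p(x) = (1/Z) prod psi_C(x_C)."""
    return unnormalized(x) / Z

print(Z)             # normalizing constant
print(p((0, 0, 0)))  # all-equal configurations get the highest probability
```

Brute-force summation is only viable for tiny models; in general computing Z is the hard part of working with MRFs.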
Analysis of Conditional Independence
- The graphical structure also encodes a set of conditional independencies among the variables.
- Consider a joint distribution over (X, Y, Z). X and Y are called conditionally independent given Z, denoted X ⟂ Y | Z, if
Pr(X ∈ A, Y ∈ B | Z) = Pr(X ∈ A | Z) Pr(Y ∈ B | Z)
or, more generally,
E_{X,Y|Z}[f(X) g(Y)] = E_{X|Z}[f(X)] E_{Y|Z}[g(Y)]
- Suppose the conditional distributions X | Z and Y | Z have densities p_{X|z} and p_{Y|z}. Then X ⟂ Y | Z if the following equality holds almost surely:
p_{(X,Y)|z}(x, y) = p_{X|z}(x) p_{Y|z}(y)
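The density criterion p_{(X,Y)|z}(x, y) = p_{X|z}(x) p_{Y|z}(y) is easy to check exhaustively for discrete variables. A minimal sketch: build a joint with X ⟂ Y | Z by construction (all tables hypothetical), then verify the factorization holds at every point:

```python
import itertools

# Construct p(x, y, z) = p(z) p(x | z) p(y | z), which has X ⟂ Y | Z built in.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def cond_xy(x, y, z):
    """p(x, y | z) recovered from the joint table."""
    return joint[(x, y, z)] / p_z[z]

# Verify p(x, y | z) = p(x | z) p(y | z) at every configuration.
for x, y, z in itertools.product((0, 1), repeat=3):
    lhs = cond_xy(x, y, z)
    rhs = p_x_given_z[z][x] * p_y_given_z[z][y]
    assert abs(lhs - rhs) < 1e-12
print("X ⟂ Y | Z verified")
```

Note that X and Y are typically not marginally independent here: integrating out Z couples them.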
Factor Graphs
- An MRF does not always fully reveal the factorized structure of a distribution.
- A factor graph can sometimes give a more accurate characterization of a family of distributions.
- A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
- A variable x and a factor f are linked in a factor graph if the factor involves x as an argument.
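The bipartite structure above is trivial to represent in code. A minimal sketch, with hypothetical variable and factor names: factors fa(x1, x2) and fb(x2, x3), whose scopes determine every link. An MRF over the same three variables with cliques {x1, x2} and {x2, x3} could not distinguish this from a single factor over all three variables; the factor graph makes the split explicit.

```python
# Factor graph as a bipartite adjacency structure: each factor is linked
# exactly to the variables appearing in its scope (its arguments).
factor_scopes = {
    "fa": ("x1", "x2"),
    "fb": ("x2", "x3"),
}

# Derive the variable-side adjacency from the factor scopes.
variable_neighbours = {}
for f, scope in factor_scopes.items():
    for v in scope:
        variable_neighbours.setdefault(v, []).append(f)

print(variable_neighbours)
# {'x1': ['fa'], 'x2': ['fa', 'fb'], 'x3': ['fb']}
```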