Point-wise Mutual Information

Yao et al. (2019) give a clear definition of point-wise mutual information (PMI):

\[\begin{aligned} PMI(i, j) &= \log \frac{p(i,j)}{p(i)p(j)} \\ p(i, j) &= \frac{\#(i,j)}{\#W} \\ p(i) &= \frac{\#(i)}{\#W} \end{aligned}\]

where \(\#(i)\) is the number of sliding windows in the corpus that contain word \(i\), \(\#(i,j)\) is the number of sliding windows that contain both word \(i\) and word \(j\), and \(\#W\) is the total number of sliding windows in the corpus. \(p(j)\) is defined analogously to \(p(i)\).
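The counting scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by Yao et al.: the function names, the toy sentence, the window size of 3, and the choice to treat a document shorter than the window as a single window are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations

def window_counts(tokens, window_size):
    """Count the sliding windows that contain each word and each unordered word pair."""
    # Assumption: a document shorter than the window forms exactly one window.
    if len(tokens) <= window_size:
        windows = [tokens]
    else:
        windows = [tokens[k:k + window_size]
                   for k in range(len(tokens) - window_size + 1)]
    word_count, pair_count = Counter(), Counter()
    for window in windows:
        vocab = sorted(set(window))  # a word counts at most once per window
        word_count.update(vocab)
        pair_count.update(combinations(vocab, 2))  # pairs stored in sorted order
    return word_count, pair_count, len(windows)

def pmi(i, j, word_count, pair_count, n_windows):
    """PMI(i, j) = log(p(i, j) / (p(i) p(j))) for two distinct words i and j."""
    p_ij = pair_count[tuple(sorted((i, j)))] / n_windows
    if p_ij == 0:
        return float("-inf")  # the words never share a window
    p_i = word_count[i] / n_windows
    p_j = word_count[j] / n_windows
    return math.log(p_ij / (p_i * p_j))

# Hypothetical toy corpus: 6 tokens and window size 3 give 4 windows.
tokens = "the cat sat on the mat".split()
word_count, pair_count, n_windows = window_counts(tokens, window_size=3)
cat_sat = pmi("cat", "sat", word_count, pair_count, n_windows)
print(cat_sat)
```

Here "cat" appears in 2 windows, "sat" in 3, and both together in 2, so the printed value is \(\log\frac{2/4}{(2/4)(3/4)} = \log\frac{4}{3}\).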

Levy et al. (2014) write the same quantity directly in terms of counts:

\[PMI(i,j) = \log\frac{\#(i,j)\#W}{\#(i)\#(j)} \]
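The count form follows from the probability form by substitution. Assuming \(p(j) = \frac{\#(j)}{\#W}\), by analogy with \(p(i)\), all but one factor of \(\#W\) cancel:

\[\log \frac{p(i,j)}{p(i)p(j)} = \log \frac{\#(i,j)/\#W}{\left(\#(i)/\#W\right)\left(\#(j)/\#W\right)} = \log\frac{\#(i,j)\#W}{\#(i)\#(j)} \]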

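For a fixed corpus and window size, \(\#W\) is a constant, so dropping it only shifts every score by the same offset \(\log \#W\) and leaves the ranking of word pairs unchanged:

\[PMI(i, j) - \log \#W = \log\frac{\#(i,j)}{\#(i)\#(j)} \]

The shifted score is no longer centered at zero, so any test on the sign of PMI (for example, keeping only pairs with positive PMI) still needs the full formula.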
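A quick numerical check of what dropping \(\#W\) does, as a sketch with made-up counts (the word pairs, the counts, and the function names are hypothetical): the score without \(\#W\) differs from the count-based PMI by exactly \(\log \#W\) for every pair, so the ordering of pairs is preserved while the sign is not.

```python
import math

# Hypothetical counts per word pair: (#(i,j), #(i), #(j)), with #W = 1000 windows.
n_windows = 1000
counts = {
    ("graph", "network"): (40, 100, 120),
    ("graph", "text"): (10, 100, 300),
    ("text", "label"): (25, 300, 50),
}

def full_pmi(n_ij, n_i, n_j, n_w):
    """Count-based PMI: log(#(i,j) #W / (#(i) #(j)))."""
    return math.log(n_ij * n_w / (n_i * n_j))

def score_without_w(n_ij, n_i, n_j):
    """The same expression with the constant #W dropped."""
    return math.log(n_ij / (n_i * n_j))

full = {pair: full_pmi(*c, n_windows) for pair, c in counts.items()}
dropped = {pair: score_without_w(*c) for pair, c in counts.items()}

# Every pair is shifted by the same constant, log(#W).
offsets = [full[pair] - dropped[pair] for pair in counts]
# Sorting by either score gives the same order of pairs.
same_ranking = sorted(counts, key=full.get) == sorted(counts, key=dropped.get)
# The full PMI has mixed signs here; the dropped-constant score is always negative.
signs_differ = any(v > 0 for v in full.values()) and all(v < 0 for v in dropped.values())

print(offsets, same_ranking, signs_differ)
```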

References

Liang Yao et al., 2019. Graph Convolutional Networks for Text Classification. AAAI.

Omer Levy et al., 2014. Neural Word Embedding as Implicit Matrix Factorization. NIPS.