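This paper applies graph neural networks (GNNs) to predicting node importance in a knowledge graph (KG) and was accepted at KDD 2019. It proposes the GENI model, whose aggregation step propagates a single scalar (a score) per node rather than the node's embedding.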
Introduction
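A knowledge graph can be viewed as a directed multi-relational graph, in which a pair of nodes may be connected by more than one edge.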
Given a KG, estimating the importance of each node is a crucial task that enables a number of applications such as recommendation, query disambiguation, and resource allocation optimization.
An importance score is a value that represents the significance or popularity of a node in the KG.
Method
table of symbols
score aggregation
At layer $\ell$, the center node $i$ updates its score estimate as a weighted sum of the previous-layer score estimates $s^{\ell-1}(j)$ of its neighbors and of itself:

$$s^{\ell}(i)=\sum_{j \in N(i) \cup\{i\}} \alpha_{i j}^{\ell}\, s^{\ell-1}(j)$$

To obtain the initial score $s^{0}(i)$, the model maps the node embedding $\vec{z}_{i}$ to a scalar through a fully connected layer:

$$s^{0}(i)=\text{ScoringNetwork}\left(\vec{z}_{i}\right)$$

Because the aggregation operates on scalars rather than vectors, GENI differs from most other GNN models.
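The two formulas above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scoring network is reduced to a single linear layer, the attention weights `alpha` are given as fixed numbers instead of being learned, and all names (`scoring_network`, `aggregate_scores`, the toy graph) are hypothetical.

```python
def scoring_network(z, w, b):
    """Map an embedding z to a scalar initial score s^0(i).

    Assumption: one linear layer with weights w and bias b.
    """
    return sum(wk * zk for wk, zk in zip(w, z)) + b


def aggregate_scores(prev_scores, neighbors, alpha):
    """One layer: s^l(i) = sum over j in N(i) ∪ {i} of alpha[i][j] * s^{l-1}(j)."""
    new_scores = {}
    for i in prev_scores:
        # The set avoids counting i twice if it is also listed as its own neighbor.
        targets = set(neighbors.get(i, ())) | {i}
        new_scores[i] = sum(alpha[i][j] * prev_scores[j] for j in targets)
    return new_scores


# Toy data (hypothetical): three nodes with 2-d embeddings.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0]}
w, b = [0.5, 0.25], 0.0
s0 = {i: scoring_network(z, w, b) for i, z in embeddings.items()}
# s0: a=0.5, b=0.5, c=0.75

neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
alpha = {  # each row sums to 1, as a softmax output would
    "a": {"a": 0.5, "b": 0.25, "c": 0.25},
    "b": {"a": 0.5, "b": 0.5},
    "c": {"c": 1.0},
}
s1 = aggregate_scores(s0, neighbors, alpha)
```

Each node carries only one float between layers, which is the point of the design: the layer never touches the embeddings again after `s0` has been computed.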
Predicate-Aware Attention Mechanism
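A knowledge graph is usually written as a set of (subject, predicate, object) triples, each of which can be read as an edge given by (source node, edge type, target node). To obtain better aggregation weights $\alpha_{ij}^{\ell}$, it is reasonable to make $\alpha_{ij}^{\ell}$ depend on the types of the edges between $i$ and $j$. Let $p_{ij}^{m}$ denote the type (predicate) of the $m$-th edge between $i$ and $j$, and let $\phi(p_{ij}^{m})$ be the vector representation of that predicate. The weights $\alpha_{ij}^{\ell}$ are then computed with an attention mechanism: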
$$\alpha_{i j}^{\ell}=\frac{\exp \left(\sigma_{a}\left(\sum_{m} \vec{a}_{\ell}^{\top}\left[s^{\ell}(i)\,\|\,\phi\left(p_{i j}^{m}\right)\,\|\,s^{\ell}(j)\right]\right)\right)}{\sum_{k \in N(i) \cup\{i\}} \exp \left(\sigma_{a}\left(\sum_{m} \vec{a}_{\ell}^{\top}\left[s^{\ell}(i)\,\|\,\phi\left(p_{i k}^{m}\right)\,\|\,s^{\ell}(k)\right]\right)\right)}$$
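A minimal sketch of this predicate-aware attention in plain Python may help to read the formula. Several things here are assumptions that the text does not state: LeakyReLU is used for $\sigma_a$, the self-loop $i \to i$ is given a dedicated `"self"` predicate, and the predicate embeddings, the attention vector `a`, and the toy scores are made-up numbers. The function and variable names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(i, scores, edges, phi, a, sigma=leaky_relu):
    """Return {j: alpha_ij} for every j in N(i) ∪ {i}.

    edges[i] maps each neighbor j (including i itself) to the list of
    predicate names on the parallel edges between i and j.
    phi maps a predicate name to its embedding (a list of floats).
    a is the attention vector of length 1 + len(embedding) + 1.
    """
    logits = {}
    for j, predicates in edges[i].items():
        total = 0.0
        for p in predicates:  # sum over the m parallel edges between i and j
            features = [scores[i]] + phi[p] + [scores[j]]  # [s(i) || phi(p) || s(j)]
            total += sum(ak * fk for ak, fk in zip(a, features))
        logits[j] = sigma(total)
    # Softmax over the neighborhood; subtracting the max keeps exp() stable.
    top = max(logits.values())
    exps = {j: math.exp(v - top) for j, v in logits.items()}
    norm = sum(exps.values())
    return {j: e / norm for j, e in exps.items()}


# Toy data (hypothetical).
scores = {"a": 0.5, "b": 0.5, "c": 0.75}
phi = {"self": [0.0, 0.0], "directed": [1.0, 0.0], "acted_in": [0.0, 1.0]}
edges = {"a": {"a": ["self"], "b": ["directed", "acted_in"], "c": ["acted_in"]}}
a = [0.1, 0.3, -0.2, 0.4]

alpha_a = attention_weights("a", scores, edges, phi, a)
```

Note that the two edges between `a` and `b` are summed before the nonlinearity, which mirrors the $\sum_m$ inside $\sigma_a$ in the formula.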
Centrality Adjustment
In general, a node with a higher in-degree tends to be more important, so an initial centrality score is computed from the in-degree $d(i)$ as $c(i)=\log (d(i)+\epsilon)$. This raw value does not capture the relationship between in-degree and centrality precisely, so two learnable parameters $\gamma$ and $\beta$ are added:

$$c^{*}(i)=\gamma \cdot c(i)+\beta$$

Combining $c^{*}(i)$ with the output of the last layer, $s^{L}(i)$, gives the final score of node $i$:

$$s^{*}(i)=\sigma_{s}\left(c^{*}(i) \cdot s^{L}(i)\right)$$
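The centrality adjustment is simple enough to write out directly. The sketch below assumes ReLU for $\sigma_s$ (the text only names the function) and treats $\gamma$ and $\beta$ as plain numbers rather than trained parameters; the function name and default `eps` are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def centrality_adjusted_score(s_last, in_degree, gamma, beta, eps=1e-9, sigma=relu):
    """Compute s*(i) = sigma(c*(i) * s^L(i)) from the last-layer score s^L(i)."""
    c = math.log(in_degree + eps)  # c(i) = log(d(i) + eps)
    c_star = gamma * c + beta      # c*(i) = gamma * c(i) + beta
    return sigma(c_star * s_last)


# Same last-layer score, different in-degrees (toy values).
low = centrality_adjusted_score(1.0, in_degree=10, gamma=1.0, beta=0.0)
high = centrality_adjusted_score(1.0, in_degree=100, gamma=1.0, beta=0.0)
```

With a positive `gamma`, the node with the larger in-degree ends up with the larger final score, which is the prior the adjustment is meant to encode.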
architecture
To make the attention more effective, the model uses multi-head attention.
We define $s_{h}^{\prime \ell-1}(j)$ to be node $j$'s score that is estimated by the $(\ell-1)$-th layer and fed into the $h$-th SA (score aggregation) head in the $\ell$-th (i.e., the next) layer, which in turn produces an aggregation $s_{h}^{\ell}(i)$ of these scores:
$$s_{h}^{\ell}(i)=\sum_{j \in \mathcal{N}(i) \cup\{i\}} \alpha_{i j}^{h, \ell}\, s_{h}^{\prime \ell-1}(j)$$
Layer $\ell$ thus produces $H^{\ell}$ values $s_{h}^{\ell}(i)$, one per head; their average, $s_{h}^{\prime \ell}(i)$, is the input to layer $\ell+1$.
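As a rough sketch of one multi-head layer: every head aggregates the same incoming scores with its own attention weights, and the head outputs are averaged. The per-head weights are supplied as fixed numbers here, whereas in the model they would come from each head's own attention parameters; all names and values are hypothetical.

```python
def multi_head_layer(prev_scores, neighbors, head_alphas):
    """Run H score-aggregation heads and average their outputs.

    head_alphas is a list of H dicts, one per head, with head_alphas[h][i][j]
    holding the attention weight alpha_ij of head h.
    """
    num_heads = len(head_alphas)
    averaged = {}
    for i in prev_scores:
        targets = set(neighbors.get(i, ())) | {i}
        head_outputs = [
            sum(alpha[i][j] * prev_scores[j] for j in targets)  # s_h^l(i)
            for alpha in head_alphas
        ]
        averaged[i] = sum(head_outputs) / num_heads  # input to layer l+1
    return averaged


# Toy data (hypothetical): two nodes, two heads.
prev_scores = {"a": 1.0, "b": 3.0}
neighbors = {"a": ["b"], "b": ["a"]}
head_alphas = [
    {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}},  # head 1: uniform
    {"a": {"a": 1.0, "b": 0.0}, "b": {"a": 0.0, "b": 1.0}},  # head 2: self only
]
next_scores = multi_head_layer(prev_scores, neighbors, head_alphas)
# next_scores: a = (2.0 + 1.0) / 2 = 1.5, b = (2.0 + 3.0) / 2 = 2.5
```

Averaging keeps the layer output a single scalar per node, so stacking layers does not change the shape of what is passed along.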