Script event prediction requires a model to predict the subsequent event given an existing event context.
In this paper, the authors first extract narrative event chains from a large news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. To solve the inference problem on the NEEG, they present a scaled graph neural network (SGNN) that models event interactions and learns better event representations based on network embedding.
1. NEEG Construction
We extract a set of narrative event chains S = {s1, s2, s3, ..., sN}, where si = {T, e1, e2, e3, ..., em}. For example, si can be {T = customer, walk(T, restaurant, -), seat(T, -, -), read(T, menu, -), order(T, food, -), serve(waiter, food, T), eat(T, food, fork)}. T is the protagonist entity shared by all the events in the chain. ei is an event consisting of four components {p(a0, a1, a2)}, where p is the predicate verb, and a0, a1, a2 are the subject, object, and indirect object of the verb, respectively.
We represent event ei by its abstract form (vi, ri), where vi is the non-lemmatized predicate verb and ri is the grammatical dependency relation of vi to the chain entity T; for example, ei = (eats, subj). This kind of event representation is called predicate-GR [1].
We count all the predicate-GR bigrams in the training event chains and regard each predicate-GR bigram as an edge li in E. Each li is a directed edge vi → vj with a weight w, computed as

w(vi → vj) = count(vi, vj) / Σk count(vi, vk),

where count(vi, vj) is the frequency with which the bigram (vi, vj) appears in the training event chains.
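The edge-weight computation above can be sketched as follows: count predicate-GR bigrams over chains and normalize each count by the total outgoing count of the source node. This is my own minimal sketch, not the authors' code; the `"verb:relation"` string encoding of predicate-GR events is an assumption for illustration.

```python
# Sketch (not the authors' code): NEEG edge weights from predicate-GR
# bigram counts, w(vi -> vj) = count(vi, vj) / sum_k count(vi, vk).
from collections import Counter, defaultdict

def build_edge_weights(chains):
    """chains: list of event chains, each a list of predicate-GR strings
    like 'walk:subj'. Returns {(vi, vj): w}, normalized per source node."""
    bigrams = Counter()
    out_total = defaultdict(int)
    for chain in chains:
        for vi, vj in zip(chain, chain[1:]):   # adjacent event pairs
            bigrams[(vi, vj)] += 1
            out_total[vi] += 1
    return {(vi, vj): c / out_total[vi] for (vi, vj), c in bigrams.items()}

# Toy chains: 'walk:subj' is followed by 'seat:subj' once and by
# 'order:subj' once, so each outgoing edge gets weight 0.5.
chains = [["walk:subj", "seat:subj", "order:subj"],
          ["walk:subj", "order:subj"]]
w = build_edge_weights(chains)
```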
2. Scaled Graph Neural Network
Only a subgraph (as shown in Figure 3(b)) with context and candidate event nodes is fed into GGNN [2] for each training instance.
As shown in Figure 3(c), the overall framework of SGNN has three main components.
The first part is a representation layer, which learns the initial event representation by composing pretrained word embeddings of an event's verb and arguments. For arguments that consist of more than one word, we follow [1] and use only the head word identified by the parser. Given an event ei = {p(a0, a1, a2)} and the word embeddings vp, va0, va1, va2 of its verb and arguments, the event vector vei is obtained by a mapping function vei = f(vp, va0, va1, va2). Three semantic composition methods are considered (experiments show concatenation is best):
• Average: use the mean of the verb and all argument vectors as the representation of the whole event.
• Nonlinear Transformation: ve = tanh(Wp · vp + W0 · va0 + W1 · va1 + W2 · va2 + b) (2), where Wp, W0, W1, W2, b are model parameters.
• Concatenation: Concatenate the verb and all argument vectors as the representation of the whole event.
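The three composition methods can be sketched in numpy as follows. This is a minimal illustration with my own function names and toy dimensions, not the paper's implementation; with identity weight matrices and zero bias, the nonlinear transformation reduces to tanh of the vector sum, which is used below as a sanity check.

```python
# Sketch of the three semantic composition methods (my naming, not the paper's).
import numpy as np

def compose_average(vp, va0, va1, va2):
    # Mean of verb and argument embeddings
    return (vp + va0 + va1 + va2) / 4.0

def compose_concat(vp, va0, va1, va2):
    # Concatenation: output dimension is 4x the word-embedding dimension
    return np.concatenate([vp, va0, va1, va2])

def compose_nonlinear(vp, va0, va1, va2, Wp, W0, W1, W2, b):
    # ve = tanh(Wp vp + W0 va0 + W1 va1 + W2 va2 + b), Eq. (2)
    return np.tanh(Wp @ vp + W0 @ va0 + W1 @ va1 + W2 @ va2 + b)

d = 4
rng = np.random.default_rng(0)
vp, va0, va1, va2 = (rng.normal(size=d) for _ in range(4))
avg = compose_average(vp, va0, va1, va2)
cat = compose_concat(vp, va0, va1, va2)
# Identity weights / zero bias: nonlinear output equals tanh(4 * average)
Wp = W0 = W1 = W2 = np.eye(d)
nl = compose_nonlinear(vp, va0, va1, va2, Wp, W0, W1, W2, np.zeros(d))
```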
The second part is a gated graph neural network (GGNN), which is used to model the interactions among events and update the initial event representations.
The inputs to GGNN are two matrices h(0) and A, where h(0) = {ve1, ve2, ..., ven, vec1, vec2, ..., veck} contains the initial context and subsequent candidate event vectors (n = 8 context events and k = 5 candidates, the same as [1]), and A is the corresponding subgraph adjacency matrix.
The basic recurrence of GGNN is:

a(t) = Aᵀ h(t−1) + b (4)
z(t) = σ(Wz a(t) + Uz h(t−1)) (5)
r(t) = σ(Wr a(t) + Ur h(t−1)) (6)
c(t) = tanh(W a(t) + U (r(t) ⊙ h(t−1))) (7)
h(t) = (1 − z(t)) ⊙ h(t−1) + z(t) ⊙ c(t) (8)

Eq. (4) is the step that passes information between different nodes of the graph via the directed adjacency matrix A; a(t) contains the activations from edges. The remaining equations are GRU-like updates that incorporate information from the other nodes and from the previous time step to update each node's hidden state. z(t) and r(t) are the update and reset gates, and σ is the logistic sigmoid function.
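The recurrence above can be sketched as one propagation step in numpy. The parameter shapes and right-multiplication convention here are my assumptions for a readable sketch, not the paper's exact parameterization.

```python
# One GGNN propagation step (sketch; shapes are my assumptions).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, params):
    """h: (n, d) node states; A: (n, n) weighted adjacency;
    params: dict of weight matrices (d, d) and bias b (d,)."""
    a = A.T @ h + params["b"]                             # message passing, Eq. (4)
    z = sigmoid(a @ params["Wz"] + h @ params["Uz"])      # update gate
    r = sigmoid(a @ params["Wr"] + h @ params["Ur"])      # reset gate
    c = np.tanh(a @ params["W"] + (r * h) @ params["U"])  # candidate state
    return (1 - z) * h + z * c                            # gated state update

n, d = 3, 4
rng = np.random.default_rng(1)
h0 = rng.normal(size=(n, d))
A = np.abs(rng.normal(size=(n, n)))  # toy weighted adjacency
params = {k: 0.1 * rng.normal(size=(d, d))
          for k in ["Wz", "Uz", "Wr", "Ur", "W", "U"]}
params["b"] = np.zeros(d)
h1 = ggnn_step(h0, A, params)
```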
The third part is used to compute the relatedness scores between context and candidate events (experiments show that a Euclidean-distance-based score metric works best).
(Attention mechanism) We use an attentional neural network to calculate the relative importance of each context event with respect to each subsequent event candidate.
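Putting the last two parts together, a candidate's score can be sketched as an attention-weighted sum of Euclidean-based relatedness between the candidate and each context event. The dot-product relevance used for the attention weights here is my simplification, not the paper's exact attention network.

```python
# Sketch: attention-weighted Euclidean relatedness between context events
# and a candidate (dot-product attention is my assumption, not the paper's).
import numpy as np

def euclid_score(a, b):
    # Negative Euclidean distance, so larger means more related
    return -np.linalg.norm(a - b)

def candidate_score(H_ctx, h_c):
    """H_ctx: (n, d) context event states; h_c: (d,) candidate state."""
    u = H_ctx @ h_c                           # raw relevance (simplified)
    alpha = np.exp(u - u.max())               # numerically stable softmax
    alpha /= alpha.sum()                      # attention weights over context
    return sum(a * euclid_score(h, h_c) for a, h in zip(alpha, H_ctx))

# A candidate identical to the context should outscore an orthogonal one.
H = np.array([[1.0, 0.0], [1.0, 0.0]])
s_near = candidate_score(H, np.array([1.0, 0.0]))
s_far = candidate_score(H, np.array([0.0, 1.0]))
```

At prediction time, the candidate with the highest score would be chosen as the subsequent event.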
References:
[1] Mark Granroth-Wilding and Stephen Clark. What happens next? Event prediction using a compositional neural network model. In AAAI, 2016.
[2] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. ICLR, 2016.