A language model estimates the probability that a sequence of words is a sentence a human would say. As a by-product of training it, we obtain embeddings for the whole vocabulary.


An unconditional language model simply assigns probabilities to sequences of words. Equivalently, given the first n-1 words, it predicts the probability distribution of the next word.

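Because of the chain rule of probability, p(w1, w2, …, wn) = p(w1) × p(w2 | w1) × … × p(wn | w1, …, wn-1), we only need to train the next-word distribution p(wt | w1, …, wt-1).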
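A minimal sketch of how the chain rule turns next-word probabilities into a sequence probability. The table `BIGRAM` and the helper names are hypothetical, the numbers are made up, and the toy model only looks at the previous word (a simplification; a real model conditions on the whole history).

```python
import math

# Hypothetical toy next-word model: p(next | previous word).
# "<s>" marks the start of the sentence. All numbers are made up.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"sleeps": 1.0},
    "dog": {"sleeps": 1.0},
}

def next_word_prob(history, word):
    """p(word | history); this toy model only uses the last word of the history."""
    prev = history[-1] if history else "<s>"
    return BIGRAM.get(prev, {}).get(word, 0.0)

def sequence_log_prob(words):
    """Chain rule: log p(w_1..w_n) = sum over t of log p(w_t | w_1..w_{t-1})."""
    total = 0.0
    for t, word in enumerate(words):
        p = next_word_prob(words[:t], word)
        if p == 0.0:
            return float("-inf")  # the model considers this sequence impossible
        total += math.log(p)
    return total

log_p = sequence_log_prob(["the", "cat", "sleeps"])
print(math.exp(log_p))  # product of 0.6, 0.5 and 1.0
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on long sequences.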

【NLP】Conditional Language Models


Conditional LMs

A conditional language model assigns probabilities to sequences of words, W = (w1, w2, …, wt), given some conditioning context x.


For example, in the translation task we are given the original sentence and its translation. The original sentence is the conditioning context, and from it we predict the target sentence.


Data for training conditional LMs:

  To train conditional language models, we need paired samples (x, W): a conditioning context together with its output word sequence.

Such tasks include translation, summarisation, caption generation, and speech recognition.


How to evaluate conditional LMs?

  • Traditional methods: use cross-entropy or perplexity (hard to interpret, easy to implement).
  • Task-specific evaluation: compare the model's most likely output to a human-generated expected output, using metrics such as BLEU, METEOR, ROUGE… (okay to interpret, easy to implement).
  • Human evaluation: hard to implement.
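As a rough illustration of the traditional metrics in the list above, the sketch below computes cross-entropy (in bits per token) and perplexity from the probabilities a model assigns to each reference token. The function name and the input probabilities are assumptions made for the example.

```python
import math

def cross_entropy_and_perplexity(token_probs):
    """token_probs: the model's probability for each reference token,
    given the conditioning context x and the preceding reference tokens."""
    n = len(token_probs)
    cross_entropy = -sum(math.log2(p) for p in token_probs) / n  # bits per token
    perplexity = 2.0 ** cross_entropy
    return cross_entropy, perplexity

# Made-up example: the model gives probability 0.25 to each of 4 reference tokens.
ce, ppl = cross_entropy_and_perplexity([0.25, 0.25, 0.25, 0.25])
print(ce, ppl)  # 2.0 4.0
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 words at each step.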


Algorithmic challenges:

Given the conditioning context x, we want to find the output word sequence with the highest probability. We cannot rely on greedy search, which may miss that sequence and may not even produce a well-formed sentence.

Instead, we use beam search.
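A small sketch of why greedy search can miss the most probable sequence. The two-step model below is hypothetical, with made-up probabilities for a single fixed context x. Greedy search commits to the locally best first word, while the best complete sequence starts with the other word.

```python
import itertools

# Hypothetical two-step model p(w_t | x, w_<t) for one fixed context x.
STEP1 = {"a": 0.6, "b": 0.4}
STEP2 = {"a": {"x": 0.55, "y": 0.45}, "b": {"x": 0.9, "y": 0.1}}

def sequence_prob(w1, w2):
    """Probability of the two-word sequence (w1, w2)."""
    return STEP1[w1] * STEP2[w1][w2]

def greedy():
    """Pick the most probable word at each step, never revisiting a choice."""
    w1 = max(STEP1, key=STEP1.get)
    w2 = max(STEP2[w1], key=STEP2[w1].get)
    return (w1, w2)

def exhaustive():
    """Score every possible sequence (only feasible for tiny examples)."""
    candidates = itertools.product(STEP1, ["x", "y"])
    return max(candidates, key=lambda seq: sequence_prob(*seq))

print(greedy(), sequence_prob(*greedy()))          # ('a', 'x'), about 0.33
print(exhaustive(), sequence_prob(*exhaustive()))  # ('b', 'x'), about 0.36
```

Exhaustive search grows exponentially with sequence length, which is why an approximate method that keeps several hypotheses alive is used in practice.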


We focus on "encoder-decoder" models, which learn a function that maps x into a fixed-size vector and then use a language model to "decode" that vector into a sequence of words, W.


Model: Kalchbrenner and Blunsom (2013), abbreviated K&B 2013

A simple encoder: just sum the word vectors of the source sentence (very easy)
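A minimal sketch of such an additive encoder, assuming the source sentence is represented as the element-wise sum of its word vectors. The embedding table and its values are hypothetical.

```python
# Hypothetical 3-dimensional word embeddings; the values are made up.
EMBEDDINGS = {
    "ich": [0.1, 0.0, 0.2],
    "bin": [0.0, 0.3, 0.1],
    "hier": [0.2, 0.1, 0.0],
}

def additive_encode(words):
    """Encode a source sentence as the element-wise sum of its word vectors."""
    dim = len(next(iter(EMBEDDINGS.values())))
    c = [0.0] * dim
    for word in words:
        for i, value in enumerate(EMBEDDINGS[word]):
            c[i] += value
    return c

print(additive_encode(["ich", "bin", "hier"]))  # roughly [0.3, 0.4, 0.3]
```

Because addition ignores order, this encoder gives (up to floating-point rounding) the same vector for any permutation of the source words, which is the price of its simplicity.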

Another encoder, the CSM (convolutional sentence model) encoder: use a CNN to encode the source sentence

The decoder: an RNN decoder
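The sketch below shows one step of an RNN decoder conditioned on the source encoding c. The exact parameterisation is not given in the text, so the form used here (adding c to the recurrent update at every step) is an assumption, and all names and parameter values are made up for illustration.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decoder_step(h_prev, prev_word_vec, c, W_h, W_w, P):
    """One decoder step (illustrative parameterisation, an assumption):

    h_t = tanh(W_h h_{t-1} + W_w e(w_{t-1}) + c)
    p(w_t | x, w_<t) = softmax(P h_t)
    """
    pre = [a + b + s for a, b, s in
           zip(matvec(W_h, h_prev), matvec(W_w, prev_word_vec), c)]
    h = [math.tanh(v) for v in pre]
    return h, softmax(matvec(P, h))

# Made-up parameters: hidden size 2, embedding size 2, vocabulary size 3.
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_w = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
c = [0.3, -0.2]  # source encoding produced by the encoder

h, probs = decoder_step([0.0, 0.0], [0.1, 0.1], c, W_h, W_w, P)
print(h, probs)
```

To generate a sentence, the step is repeated: a word is chosen from `probs`, its embedding becomes `prev_word_vec`, and `h` becomes `h_prev` for the next step.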

(Figure: computation graph of the model.)


Sutskever et al. model (2014):

- Important: a classic model.

(Figure: computation graph of the model.)


Some tricks for the Sutskever et al. model:

  • Read the input sequence "backwards": +4 BLEU

  • Use an ensemble of m independently trained models (at decoding time):
  1. Ensemble of 2 models: +3 BLEU
  2. Ensemble of 5 models: +4.5 BLEU

  • We want to find the most probable (MAP) output given the input, i.e. W* = argmax_W p(W | x). We approximate this with beam search: +1 BLEU

    (Figure: beam search example with beam size 2.)
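A compact sketch of beam search over a next-word model. The interface `step_fn(prefix)`, the end-of-sentence symbol `</s>`, and the toy probability table are all assumptions made for the example. With beam size 1 the procedure reduces to greedy search, while beam size 2 recovers a more probable sequence in this toy case.

```python
import math

def beam_search(step_fn, beam_size, max_len, eos="</s>"):
    """step_fn(prefix) -> {word: p(word | x, prefix)}.
    Returns the best (words, log_prob) pair found."""
    beams = [((), 0.0)]  # unfinished hypotheses: (prefix, log-probability)
    finished = []
    for _ in range(max_len):
        # Extend every hypothesis on the beam by every possible next word.
        candidates = []
        for prefix, score in beams:
            for word, p in step_fn(prefix).items():
                if p > 0.0:
                    candidates.append((prefix + (word,), score + math.log(p)))
        # Keep only the beam_size best extensions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda item: item[1])

# Hypothetical toy model for one fixed context x; the numbers are made up.
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def toy_step(prefix):
    # After two words, the toy model always ends the sentence.
    return TABLE.get(prefix, {"</s>": 1.0})

print(beam_search(toy_step, beam_size=1, max_len=5))  # starts with 'a' (greedy)
print(beam_search(toy_step, beam_size=2, max_len=5))  # starts with 'b'
```

Practical decoders usually add refinements such as length normalisation, which this sketch leaves out.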

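The ensembling trick listed above can be sketched as follows. The text does not say how the models' predictions are combined, so averaging the next-word distributions at each decoding step is an assumption here, and the function name and numbers are made up.

```python
def ensemble_next_word(distributions):
    """Average the next-word distributions predicted by m models.

    distributions: list of dicts {word: probability}, one per model,
    all for the same context x and the same output prefix.
    """
    m = len(distributions)
    vocab = set().union(*distributions)  # union of the dictionaries' keys
    return {w: sum(d.get(w, 0.0) for d in distributions) / m for w in vocab}

# Two hypothetical models that disagree about the next word.
model_1 = {"x": 0.8, "y": 0.2}
model_2 = {"x": 0.4, "y": 0.6}
print(ensemble_next_word([model_1, model_2]))  # x is about 0.6, y about 0.4
```

The averaged distribution can then be fed to the search procedure in place of a single model's prediction.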

Example application: image caption generation

Encoder: CNN

Decoder: RNN, or a conditional n-gram LM (different from an RNN, but still useful)


This requires an existing dataset of images paired with captions.

The Kiros et al. model is an example of this.



















