1. SVD Based Methods

    1.1 Word-Document Matrix

    In this method we loop over the documents and, each time word i appears in document j, add one to entry X_ij. This gives a |V| × M matrix, where M is the number of documents, so it scales with the size of the corpus.

    1.2 Window-based Co-occurrence Matrix

    In this method we count the number of times each word appears inside a window of a particular size around the word of interest, and accumulate these counts over all the words in the corpus.
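    The counting procedure above can be sketched as follows; the toy corpus and window size are illustrative assumptions:

```python
# A minimal sketch of a window-based co-occurrence matrix.
# The corpus and window size are illustrative, not from the text.
import numpy as np

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
window = 1  # count co-occurrences within 1 word on either side

vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}

X = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        # look at the neighbors of w within the window
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                X[idx[w], idx[words[j]]] += 1
```

    Because co-occurrence is counted in both directions, the resulting matrix is symmetric.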

    1.3 Advantages: Both of these methods give us word vectors that are more than sufficient to encode semantic and syntactic information.

    1.4 Shortcomings:

    The dimensions of the matrix change very often (new words are added frequently and the corpus changes in size).

    The matrix is extremely sparse since most words do not co-occur.

    The matrix is very high dimensional in general.

    Quadratic cost to train (i.e., to perform SVD).
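    Despite these shortcomings, the SVD pipeline itself is simple: decompose the co-occurrence matrix and keep the top k singular directions as word vectors. A minimal sketch, where the toy matrix and the choice k = 2 are illustrative assumptions:

```python
# A minimal sketch of extracting k-dimensional word vectors from a
# co-occurrence matrix X via truncated SVD; X and k are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(8, 8))
X = X + X.T  # make it symmetric, like a co-occurrence matrix

U, S, Vt = np.linalg.svd(X)
k = 2
word_vectors = U[:, :k] * S[:k]  # each row is a k-dimensional word embedding
```

    The cubic-to-quadratic cost of this decomposition as the vocabulary grows is the training cost referred to above.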


  2. Iteration Based Methods

    2.1 CBOW Model

    Key idea: predicting a center word from the surrounding context.

    Unknowns: two matrices, V ∈ R^{n×|V|} and U ∈ R^{|V|×n}.

    Notation for CBOW Model:

    • w_i: word i from vocabulary V

    • V ∈ R^{n×|V|}: input word matrix

    • v_i: the input vector representation of word w_i (the i-th column of V)

    • U ∈ R^{|V|×n}: output word matrix

    • u_i: the output vector representation of word w_i (the i-th row of U)

    Steps:

    • We generate our one-hot word vectors x^{(c-m)}, …, x^{(c-1)}, x^{(c+1)}, …, x^{(c+m)} for the input context of size m.

    • We get our embedded word vectors for the context: v_{c-m} = V x^{(c-m)}, v_{c-m+1} = V x^{(c-m+1)}, …, v_{c+m} = V x^{(c+m)}.

    • Average these vectors to get v̂ = (v_{c-m} + v_{c-m+1} + … + v_{c+m}) / 2m.

    • Generate a score vector z = U v̂.

    • Turn the scores into probabilities: ŷ = softmax(z).

    • We desire our generated probabilities, ŷ, to match the true probabilities, y, which is the one-hot vector of the actual center word.
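    The CBOW steps above can be sketched as a single forward pass; the toy sizes (|V| = 5, n = 3, m = 2) and random matrices are illustrative assumptions, not trained parameters:

```python
# A minimal sketch of one CBOW forward pass.
# Sizes and matrices are illustrative assumptions.
import numpy as np

V_size, n, m = 5, 3, 2                 # vocab size, embedding dim, context radius
rng = np.random.default_rng(0)
V = rng.standard_normal((n, V_size))   # input word matrix, V in R^{n x |V|}
U = rng.standard_normal((V_size, n))   # output word matrix, U in R^{|V| x n}

context_ids = [0, 1, 3, 4]             # indices of the 2m context words
X = np.eye(V_size)                     # columns are one-hot vectors
v_hat = sum(V @ X[:, c] for c in context_ids) / (2 * m)  # average context embeddings
z = U @ v_hat                          # score vector in R^{|V|}
y_hat = np.exp(z) / np.exp(z).sum()    # softmax -> probabilities over the vocab
```

    Training would then push ŷ toward the one-hot vector of the true center word.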

    2.2 Skip-Gram Model

    Key idea: predicting the surrounding context words given a center word.

    Steps:

    • We generate our one-hot input vector x for the center word.

    • We get the embedded word vector for the center word: v_c = V x.

    • Since there is no averaging, just set v̂ = v_c.

    • Generate 2m score vectors, u_{c-m}, …, u_{c-1}, u_{c+1}, …, u_{c+m}, using u = U v_c.

    • Turn each of the scores into probabilities: y = softmax(u).

    • We desire our generated probability vector to match the true probabilities y^{(c-m)}, …, y^{(c-1)}, y^{(c+1)}, …, y^{(c+m)}, the one-hot vectors of the actual output words.
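    The skip-gram steps above can likewise be sketched as one forward pass; the toy sizes and random matrices are illustrative assumptions. Note that because u = U v_c does not depend on the context position, the same score vector is compared against each of the 2m context one-hot vectors:

```python
# A minimal sketch of one skip-gram forward pass.
# Sizes and matrices are illustrative assumptions.
import numpy as np

V_size, n = 5, 3                       # vocab size, embedding dim
rng = np.random.default_rng(1)
V = rng.standard_normal((n, V_size))   # input word matrix
U = rng.standard_normal((V_size, n))   # output word matrix

center = 2
x = np.zeros(V_size)
x[center] = 1.0                        # one-hot input vector for the center word
v_c = V @ x                            # center word embedding
u = U @ v_c                            # scores, shared across all 2m context positions
y = np.exp(u) / np.exp(u).sum()        # softmax; compared to each context one-hot vector
```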

