【Posted】:2016-02-03 00:41:38
【Question】:
I am trying to run the very simple Word2Vec example given in the documentation here:
https://spark.apache.org/docs/1.4.1/api/python/_modules/pyspark/ml/feature.html#Word2Vec
from pyspark import SparkContext, SQLContext
from pyspark.mllib.feature import Word2Vec
sqlContext = SQLContext(sc)
sent = ("a b " * 100 + "a c " * 10).split(" ")
doc = sqlContext.createDataFrame([(sent,), (sent,)], ["sentence"])
model = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model").fit(doc)
model.getVectors().show()
model.findSynonyms("a", 2).show()
TypeError Traceback (most recent call last)
<ipython-input-4-e57e9f694961> in <module>()
5 sent = ("a b " * 100 + "a c " * 10).split(" ")
6 doc = sqlContext.createDataFrame([(sent,), (sent,)], ["sentence"])
----> 7 model = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model").fit(doc)
8 model.getVectors().show()
9 model.findSynonyms("a", 2).show()
TypeError: __init__() got an unexpected keyword argument 'vectorSize'
Any idea why this fails?
【Discussion】:
-
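It fails because you imported Word2Vec from the wrong package. The example you copied is for the DataFrame-based ml package (pyspark.ml.feature), but your import pulls in the RDD-based mllib version (pyspark.mllib.feature), whose constructor does not accept keyword arguments such as vectorSize.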
Tags: python machine-learning apache-spark pyspark word2vec