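【Posted】: 2016-12-12 20:19:52
【Question】:
I am trying to use the Python package "requests" in a program that uses pyspark. I have installed the package, and it works in a plain Python program with "import requests", but in the pyspark program the same import fails with "ImportError: No module named requests".
Code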
def get_text(s):
    # Imported inside the function, so the import runs on the worker that executes the task.
    import requests
    url = s
    data = requests.get(url).text
    return data
Calling the function
newrdd = newrdd.map(get_text)
Error output
16/12/12 15:42:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 48, node090.cm.cluster): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/local/hadoop-2/tmp/hadoop-yarn/nm-local-dir/usercache/wdps1615/appcache/application_1480500761259_0178/container_1480500761259_0178_01_000003/pyspark.zip/pyspark/worker.py", line 172, in main
process()
File "/local/hadoop-2/tmp/hadoop-yarn/nm-local-dir/usercache/wdps1615/appcache/application_1480500761259_0178/container_1480500761259_0178_01_000003/pyspark.zip/pyspark/worker.py", line 167, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/local/hadoop-2/tmp/hadoop-yarn/nm-local-dir/usercache/wdps1615/appcache/application_1480500761259_0178/container_1480500761259_0178_01_000003/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
for obj in iterator:
File "/var/scratch/wdps1615/spark-2.0.2-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", line 1507, in func
File "/var/scratch/wdps1615/Entitytext.py", line 45, in get_text
import requests
ImportError: No module named requests
【Comments】:
- Can you run pip freeze in the same folder that contains your script?
- Yes, 'requests==2.12.3' is in the list.
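Note that pip freeze only describes the interpreter on the machine where it is run, while the traceback above shows the import failing inside a YARN container on node090.cm.cluster. A minimal sketch for checking which interpreters can actually see the module is given below. The helper names probe_module and probe_partition are hypothetical, and the commented usage assumes an already created SparkContext named sc. It is a diagnostic aid under those assumptions, not a confirmed fix.

```python
import importlib
import socket
import sys


def probe_module(module_name):
    """Report whether module_name can be imported by the current interpreter."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        importable = False
    # Hostname and interpreter path identify which Python actually ran the check.
    return (socket.gethostname(), sys.executable, importable)


def probe_partition(module_name):
    """Build a function for mapPartitions that yields one probe result per partition."""
    def run(_rows):
        yield probe_module(module_name)
    return run


# Hypothetical usage, assuming an existing SparkContext named `sc`:
#   print(probe_module("requests"))  # result on the driver
#   rdd = sc.parallelize(range(8), 8)
#   print(rdd.mapPartitions(probe_partition("requests")).distinct().collect())  # results on the executors
```

If the driver reports True while some executor hosts report False, the package is missing from the Python environment those worker nodes use, which would match the ImportError in the traceback.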
Tags: python apache-spark python-requests pyspark rdd