Executing Python script in Azure ML Studio (answers)

【Question title】: Executing Python script in Azure ML Studio
【Posted】: 2018-11-12 03:27:14
【Question description】:

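I want to create a web service that uses Python, BeautifulSoup and NLTK to produce a summary of the text at a given URL.

But I am getting the following error in Azure ML Studio.

Schematic of the experiment in Azure:

The Enter Data module contains a URL from Wikipedia.

The Execute Python Script module has the following code: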

import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
    wiki = dataframe1[0].to_string()
    page = ur.urlopen(wiki)
    soup = BeautifulSoup(page)
    df= pd.DataFrame([soup.find_all('p')[0].get_text()], columns =['article_text'])
    return dataframe1,

Running this experiment produces the following error:

    Error 0085: The following error occurred during script evaluation, please view the output log for more information:
    ---------- Start of error message from Python interpreter ----------
    Caught exception while executing function: Traceback (most recent call last):
      File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1876, in get_loc
        return self._engine.get_loc(key)
      File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
      File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
      File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
      File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
    KeyError: 0
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "C:\server\invokepy.py", line 199, in batch
        odfs = mod.azureml_main(*idfs)
      File "C:\temp\84d7e9fbcfe54596a2e7de022b4d236c.py", line 23, in azureml_main
        wiki = dataframe1[0][0].to_string()
      File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1992, in __getitem__
        return self._getitem_column(key)
      File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1999, in _getitem_column
        return self._get_item_cache(key)
      File "C:\pyhome\lib\site-packages\pandas\core\generic.py", line 1345, in _get_item_cache
        values = self._data.get(item)
      File "C:\pyhome\lib\site-packages\pandas\core\internals.py", line 3225, in get
        loc = self.items.get_loc(item)
      File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1878, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
      File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
      File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
      File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
    KeyError: 0
    Process returned with non-zero exit code 1

    ---------- End of error message from Python  interpreter  ----------
    Start time: UTC 11/11/2018 15:34:21
    End time: UTC 11/11/2018 15:34:30

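I am using Anaconda 4.0/Python 3.5 to run this snippet.
When I assign the URL directly to the variable wiki, the code runs successfully on my local machine.
I am not sure why the value cannot be read from the input dataframe1.
The input dataframe has no header, so dataframe1[0] should return the URL directly.

Thanks for any help with this.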

【Question comments】:

Tags: python pandas beautifulsoup

【Solution 1】:

Your dataframe1 looks like this:

dataframe1 = {'Col1' : ['https://en.wikipedia.org/wiki/Finite_element_method']}

The key is not an integer index but the column name 'Col1'. You can fix that with

wiki = dataframe1['Col1'].to_string(index=0)

But that causes another problem: if the URL is too long, it gets truncated to

https://en.wikipedia.org/wiki/Finite_element....

So it is better to use

wiki = dataframe1['Col1'][0]
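To see why `dataframe1[0]` fails, here is a minimal sketch that can be run locally. The DataFrame is only an assumed stand-in for what the Enter Data module passes to the script, using the column name 'Col1' mentioned above. The names `lookup_failed`, `url_by_label` and `url_by_position` are made up for illustration.

```python
import pandas as pd

# Assumed stand-in for the script's first input: one column named 'Col1'.
dataframe1 = pd.DataFrame(
    {"Col1": ["https://en.wikipedia.org/wiki/Finite_element_method"]}
)

# dataframe1[0] asks for a *column* labelled 0, which does not exist here.
try:
    dataframe1[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Label-based access: column 'Col1', then the row labelled 0.
url_by_label = dataframe1["Col1"][0]

# Position-based access: first row, first column, whatever the column is called.
url_by_position = dataframe1.iloc[0, 0]

print(lookup_failed)
print(url_by_label == url_by_position)
```

If you would rather not depend on the column name at all, the position-based `iloc[0, 0]` form is one option.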

Another error is

return dataframe1,

which should be

return df,

Fixed code:

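import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup

def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
    # The input column is named 'Col1'; take the URL from its first row.
    wiki = dataframe1['Col1'][0]
    page = ur.urlopen(wiki)
    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(page, 'html.parser')
    # Put the text of the first <p> element into a one-row DataFrame.
    df = pd.DataFrame([soup.find_all('p')[0].get_text()], columns=['article_text'])
    # Return the new DataFrame (as a one-element tuple), not the input.
    return df,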
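If you want to check the fixed script before uploading it, one option is a small local harness. This is only a sketch under assumptions: `FAKE_HTML` and `test_input` are made-up stand-ins for the Wikipedia page and for the Enter Data output, and `urlopen` is replaced with a mock so that no network access is needed. It does not reproduce the Studio runtime itself.

```python
import io
import urllib.request as ur
from unittest import mock

import pandas as pd
from bs4 import BeautifulSoup


def azureml_main(dataframe1=None, dataframe2=None):
    # Same logic as the fixed script, with an explicit parser passed to BeautifulSoup.
    wiki = dataframe1["Col1"][0]
    page = ur.urlopen(wiki)
    soup = BeautifulSoup(page, "html.parser")
    df = pd.DataFrame([soup.find_all("p")[0].get_text()], columns=["article_text"])
    return df,


# Hypothetical canned page, used instead of a real HTTP response.
FAKE_HTML = b"<html><body><p>First paragraph.</p><p>Second.</p></body></html>"

# Hypothetical input mimicking a single 'Col1' column holding one URL.
test_input = pd.DataFrame({"Col1": ["https://example.invalid/page"]})

# Patch urlopen on the module object so the call inside azureml_main hits the mock.
with mock.patch.object(ur, "urlopen", return_value=io.BytesIO(FAKE_HTML)):
    (result,) = azureml_main(test_input)

print(result)
```

Unpacking with `(result,) = ...` also confirms that the function returns a one-element tuple containing the new DataFrame rather than the input.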

【Comments】:

You're welcome. Please consider marking the answer as correct.