【Question title】: Executing Python script in Azure ML Studio
【Posted】: 2018-11-12 03:27:14
【Question】:

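I want to create a web service that uses Python, BeautifulSoup, and NLTK to summarize the text at a given URL.

However, I am getting the error shown below in Azure ML Studio.

[Figure: diagram of the experiment in Azure]

The URL in the Enter Data module is a Wikipedia page.

The Execute Python Script module contains the following code: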

import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
    wiki = dataframe1[0].to_string()
    page = ur.urlopen(wiki)
    soup = BeautifulSoup(page)
    df= pd.DataFrame([soup.find_all('p')[0].get_text()], columns =['article_text'])
    return dataframe1,

Running this experiment produces the following error:

    Error 0085: The following error occurred during script evaluation, please view the output log for more information:
    ---------- Start of error message from Python interpreter ----------
    Caught exception while executing function: Traceback (most recent call last):
      File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1876, in get_loc
        return self._engine.get_loc(key)
      File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
      File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
      File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
      File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
    KeyError: 0
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "C:\server\invokepy.py", line 199, in batch
        odfs = mod.azureml_main(*idfs)
      File "C:\temp\84d7e9fbcfe54596a2e7de022b4d236c.py", line 23, in azureml_main
        wiki = dataframe1[0][0].to_string()
      File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1992, in __getitem__
        return self._getitem_column(key)
      File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1999, in _getitem_column
        return self._get_item_cache(key)
      File "C:\pyhome\lib\site-packages\pandas\core\generic.py", line 1345, in _get_item_cache
        values = self._data.get(item)
      File "C:\pyhome\lib\site-packages\pandas\core\internals.py", line 3225, in get
        loc = self.items.get_loc(item)
      File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1878, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
      File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
      File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
      File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
    KeyError: 0
    Process returned with non-zero exit code 1

    ---------- End of error message from Python interpreter ----------
    Start time: UTC 11/11/2018 15:34:21
    End time: UTC 11/11/2018 15:34:30
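  1. I am using Anaconda 4.0/Python 3.5 to run this snippet.
  2. When I assign the URL directly to the variable wiki, the code runs successfully on my local machine.
  3. I am not sure why the value cannot be read from the input dataframe1.
  4. The input dataframe has no header, so dataframe1[0] should fetch the URL directly.

Thanks for any help with this.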

【Comments】:

    Tags: python pandas beautifulsoup


    【Solution 1】:

    Your dataframe1 looks like this

    dataframe1 = {'Col1' : ['https://en.wikipedia.org/wiki/Finite_element_method']}
    

    The key is not an integer index but the column label 'Col1'. You can fix the lookup with

    wiki = dataframe1['Col1'].to_string(index=0)
    

    However, this causes another error: if the URL is too long, it gets truncated

    https://en.wikipedia.org/wiki/Finite_element....
    

    So it is better to use

    wiki = dataframe1['Col1'][0]
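
The difference between the failing and the working lookup can be reproduced locally. The following is a minimal sketch, assuming the input has a single column labelled 'Col1' as described above; the DataFrame is built by hand here rather than by Azure ML, and the `.iloc` variant is an addition that is not part of the original answer.

```python
import pandas as pd

# Hand-built stand-in for the module input; the 'Col1' label comes from the answer.
url = "https://en.wikipedia.org/wiki/Finite_element_method"
dataframe1 = pd.DataFrame({"Col1": [url]})

# dataframe1[0] looks for a COLUMN labelled 0, which does not exist here.
try:
    dataframe1[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Select the column by label, then the first row: this yields the raw string.
by_label = dataframe1["Col1"][0]

# Purely positional access works whatever the column happens to be called.
by_position = dataframe1.iloc[0, 0]

print(lookup_failed, by_label == url, by_position == url)
```

Reading the cell directly returns the Python string itself, so no display formatting is involved.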
    

    Another error is

    return dataframe1,
    

    which should be

    return df,
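
Returning dataframe1 would pass the input URL through unchanged and discard the newly built frame. The trailing comma also matters, because it turns the return value into a one-element tuple. A small sketch of both points, using a hypothetical helper `build_output` and hand-made data in place of the scraping step:

```python
import pandas as pd

def build_output(dataframe1):
    # Hypothetical stand-in for the scraping step: derive a NEW frame from the input.
    df = pd.DataFrame(["first paragraph text"], columns=["article_text"])
    # The trailing comma makes this a one-element tuple containing df.
    return df,

source = pd.DataFrame({"Col1": ["https://en.wikipedia.org/wiki/Fluid_mechanics"]})
result = build_output(source)

print(type(result).__name__, list(result[0].columns))
```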
    

    Fixed code

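    import pandas as pd
    import urllib.request as ur
    from bs4 import BeautifulSoup

    def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
        # Read the URL from the first row of the 'Col1' column
        wiki = dataframe1['Col1'][0]
        page = ur.urlopen(wiki)
        # Name the parser explicitly so BeautifulSoup does not have to guess one
        soup = BeautifulSoup(page, 'html.parser')
        # Keep the text of the first <p> element as the article text
        df = pd.DataFrame([soup.find_all('p')[0].get_text()], columns=['article_text'])
        # Return the new frame as a one-element tuple
        return df,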
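
To check the logic without a network connection, the download step can be made replaceable. This is only a sketch under stated assumptions: the `fetch` parameter and the `first_paragraph` helper are hypothetical names introduced for illustration, not part of Azure ML or of the original answer, and the input frame and HTML are canned test data.

```python
import pandas as pd
from bs4 import BeautifulSoup

def first_paragraph(html):
    # Parse with the standard-library parser and return the text of the first <p>.
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("p")[0].get_text()

def azureml_main(dataframe1=None, dataframe2=None, fetch=None):
    # `fetch` is a hypothetical hook (not an Azure ML feature): it maps a URL to HTML.
    if fetch is None:
        import urllib.request as ur
        fetch = lambda url: ur.urlopen(url).read()
    # Read the URL from the first row of the 'Col1' column.
    wiki = dataframe1["Col1"][0]
    df = pd.DataFrame([first_paragraph(fetch(wiki))], columns=["article_text"])
    return df,

# Offline check with canned HTML instead of a real request.
fake_input = pd.DataFrame({"Col1": ["https://example.invalid/page"]})
fake_html = "<html><body><p>Hello fluid.</p><p>Second.</p></body></html>"
(out,) = azureml_main(fake_input, fetch=lambda url: fake_html)
print(out["article_text"][0])
```

When `fetch` is left at its default, the function falls back to `urllib.request.urlopen`, so it behaves like the fixed code when called with only the two dataframes.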
    

    【Comments】:

    • You're welcome. Please consider marking the answer as accepted.