【Question title】: Iterating over Excel rows to get URL from Google Search and write them in new columns on Excel
【Posted】: 2021-08-17 14:23:39
【Question description】:

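I'm working on a project that you can see on Colab. In short, I'm running a Google search on the values of a specific column in an Excel file. I've included the link in case you want to take a look.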

So basically my code searches Google for the values in column F and returns the URLs I need in columns G, H, I, J and K.

The code is as follows:

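import pandas as pd
# Assumption: `search` comes from the googlesearch module, whose keyword
# arguments (tld, lang, num, start, stop, pause) match the call below.
from googlesearch import search

FILE_NAME = "planilha.xlsx"
QUERY_LIST = ("Site Oficial", "Linkedin", "Facebook", "Instagram", "Twitter")
TAB_NAME = "Sheet1"


def _get_company_information(company_name):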
    """Retrieve the information based on the Query List on the given company name."""
    list_links = []

    # for query_item, query_validators in QUERY_LIST.items():
    for query_item in QUERY_LIST:
        for query_result in search(
                f"{company_name} {query_item}",
                tld='com.br', lang='pt-br', num=1, start=0, stop=1, pause=1.0
        ):
            list_links.append(query_result)
    return list_links


if __name__ == '__main__':
    xl = pd.ExcelFile(FILE_NAME)

    with pd.ExcelWriter("output_"+FILE_NAME, mode="w", engine="openpyxl") as writer:
        print("- Parsing Excel file")
        df1 = xl.parse(TAB_NAME)

        # Get single row by iteration
        for row_number, row_data in df1.iterrows():
            company_name = row_data.get("Organização - Nome fantasia")

            print(f"-- Getting info for company: {company_name} . . .")
            df_company_info = _get_company_information(company_name=company_name)
            df1.loc[row_number, QUERY_LIST] = df_company_info
            print(f"-- Got info: {df_company_info} !!!")
            print()
            
        print("- Updating DF . . .")
        df1.to_excel(writer, index=False)
        print("- Completed!!!")

Running this code produces the following output and error:

- Parsing Excel file
-- Getting info for company: NORDEA DO BRASIL REPRESENTACOES LTDA . . .
-- Got info: ['https://www.emis.com/php/company-profile/BR/Nordea_do_Brasil_Representacoes_Ltda_en_2321032.html', 'https://no.linkedin.com/company/nordea', 'https://www.facebook.com/Nordea/', 'https://www.instagram.com/nordea_sverige/', 'https://twitter.com/nordea'] !!!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-7f6b4a1d3574> in <module>()

AttributeError: 'str' object has no attribute 'sheet_state'

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
9 frames
/usr/local/lib/python3.7/dist-packages/openpyxl/writer/workbook.py in get_active_sheet(wb)
     59     visible_sheets = [idx for idx, sheet in enumerate(wb._sheets) if sheet.sheet_state == "visible"]
     60     if not visible_sheets:
---> 61         raise IndexError("At least one sheet must be visible")
     62 
     63     idx = wb._active_sheet_index

IndexError: At least one sheet must be visible

Strangely, it seems to work up to the fifth row of my sheet and then breaks with the error above.

Any idea what is going wrong?

【Question discussion】:

    Tags: python xlsxwriter google-search-api pandas.excelwriter


    【Solution 1】:

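    Has your problem been solved?
    I haven't run your code, but if your computer uses a language other than English, the worksheet created in the Excel output file will not be called "Sheet1", and you have to pass the name explicitly to to_excel(), for example: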

    df1.to_excel(writer, sheet_name='Planilha1', index=False)
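
To see which sheet name actually lands in the output file, the suggestion above can be tried in isolation. The following is only a minimal sketch: the one-row DataFrame and the temporary output path are hypothetical stand-ins for the asker's data, and it assumes pandas and openpyxl are installed. It does not show whether naming the sheet resolves the reported error, only that the sheet in the written workbook carries the name passed to to_excel().

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the parsed spreadsheet; the column name is
# taken from the question, the single row is only for illustration.
df1 = pd.DataFrame(
    {"Organização - Nome fantasia": ["NORDEA DO BRASIL REPRESENTACOES LTDA"]}
)

# Hypothetical output location, written to a temporary directory.
output_path = os.path.join(tempfile.mkdtemp(), "output_planilha.xlsx")

# Write the frame with an explicit sheet name, as suggested above.
with pd.ExcelWriter(output_path, mode="w", engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="Planilha1", index=False)

# Read the workbook back and list the sheet names it contains.
with pd.ExcelFile(output_path) as xl:
    sheet_names = xl.sheet_names

print(sheet_names)
```

Checking sheet_names on the input workbook in the same way is a quick way to confirm that the name passed to xl.parse() in the question exists in that file.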
    

    【Discussion】:
