Iterating over Excel rows to get URLs from Google Search and write them to new columns in Excel

【Question title】: Iterating over Excel rows to get URL from Google Search and write them in new columns on Excel
【Posted】: 2021-08-17 14:23:39
【Question】:

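I am working on a project, which you can see on Colab. In short, I am running a Google search on the values in a specific column of an Excel file. I have linked the file in case you want to take a look.

Basically, my code searches Google for the values in column F and returns the URLs I need in columns G, H, I, J, and K.

The code is as follows: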

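import pandas as pd
from googlesearch import search  # assumed source of search(), based on its keyword arguments

FILE_NAME = "planilha.xlsx"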
QUERY_LIST = ("Site Oficial", "Linkedin", "Facebook", "Instagram", "Twitter")
TAB_NAME = "Sheet1"

def _get_company_information(company_name):
    """Retrieve the information based on the Query List on the given company name."""
    list_links = []

    # for query_item, query_validators in QUERY_LIST.items():
    for query_item in QUERY_LIST:
        for query_result in search(
                f"{company_name} {query_item}",
                tld='com.br', lang='pt-br', num=1, start=0, stop=1, pause=1.0
        ):
            list_links.append(query_result)
    return list_links

if __name__=='__main__':
    xl = pd.ExcelFile(FILE_NAME)

    with pd.ExcelWriter("output_"+FILE_NAME, mode="w", engine="openpyxl") as writer:
        print("- Parsing Excel file")
        df1 = xl.parse(TAB_NAME)

        # Get single row by iteration
        for row_number, row_data in df1.iterrows():
            company_name = row_data.get("Organização - Nome fantasia")

            print(f"-- Getting info for company: {company_name} . . .")
            df_company_info = _get_company_information(company_name=company_name)
            df1.loc[row_number, QUERY_LIST] = df_company_info
            print(f"-- Got info: {df_company_info} !!!")
            print()
            
        print(f"- Updating DF . . .")
        df1.to_excel(writer, index=False)
        print(f"- Completed!!!")

Running this code produces the following output and error:

- Parsing Excel file
-- Getting info for company: NORDEA DO BRASIL REPRESENTACOES LTDA . . .
-- Got info: ['https://www.emis.com/php/company-profile/BR/Nordea_do_Brasil_Representacoes_Ltda_en_2321032.html', 'https://no.linkedin.com/company/nordea', 'https://www.facebook.com/Nordea/', 'https://www.instagram.com/nordea_sverige/', 'https://twitter.com/nordea'] !!!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-7f6b4a1d3574> in <module>()

AttributeError: 'str' object has no attribute 'sheet_state'

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
9 frames
/usr/local/lib/python3.7/dist-packages/openpyxl/writer/workbook.py in get_active_sheet(wb)
     59     visible_sheets = [idx for idx, sheet in enumerate(wb._sheets) if sheet.sheet_state == "visible"]
     60     if not visible_sheets:
---> 61         raise IndexError("At least one sheet must be visible")
     62 
     63     idx = wb._active_sheet_index

IndexError: At least one sheet must be visible

The strange thing is that it seems to work up to the fifth row of my sheet and then breaks with the error above.

Any idea what is going wrong?

【Question comments】:

Tags: python xlsxwriter google-search-api pandas.excelwriter

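【Solution 1】:

Has your problem been solved?
I have not run your code, but if your computer uses a language other than English, the worksheet created in the Excel output file will not be named "Sheet1", and you have to pass the name explicitly to to_excel(), for example: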

df1.to_excel(writer, sheet_name='Planilha1', index=False)
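
For illustration, here is a minimal sketch of how the explicit sheet name could fit into the whole flow. It rests on assumptions that the question does not confirm. One possible reason for a failure partway through the rows is a search that returns fewer than five links, so that the row assignment raises inside the `with` block before `to_excel()` has created any sheet. The sketch therefore builds the full DataFrame first and opens the writer only at the end. The helpers `pad_links`, `add_link_columns`, `write_sheet` and `fake_lookup` are hypothetical names. `fake_lookup` stands in for the real Google search so the example runs offline, and the in-memory buffer stands in for the output file.

```python
import io

import pandas as pd

QUERY_LIST = ("Site Oficial", "Linkedin", "Facebook", "Instagram", "Twitter")


def pad_links(links, size, fill=""):
    """Return exactly `size` items: drop extras, pad missing ones with `fill`."""
    links = list(links)[:size]
    return links + [fill] * (size - len(links))


def add_link_columns(df, name_column, lookup):
    """Return a copy of df with one new column per QUERY_LIST entry."""
    rows = [pad_links(lookup(name), len(QUERY_LIST)) for name in df[name_column]]
    links = pd.DataFrame(rows, columns=list(QUERY_LIST), index=df.index)
    return pd.concat([df, links], axis=1)


def write_sheet(df, target, sheet_name):
    """Write df to a path or file-like object under an explicit sheet name."""
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


def fake_lookup(company_name):
    # Hypothetical stand-in for the real search; it returns only two links
    # on purpose, to show that short results are padded instead of failing.
    base = f"https://example.com/{company_name}"
    return [f"{base}/site", f"{base}/linkedin"]


companies = pd.DataFrame({"Organização - Nome fantasia": ["ACME", "Globex"]})

# All lookups finish before the writer is opened, so a failing lookup
# cannot leave the writer with an empty workbook to save.
result = add_link_columns(companies, "Organização - Nome fantasia", fake_lookup)

buffer = io.BytesIO()
write_sheet(result, buffer, sheet_name="Planilha1")

# Read the workbook back to check which sheet name was actually written.
buffer.seek(0)
sheet_names = pd.ExcelFile(buffer, engine="openpyxl").sheet_names
print(sheet_names)
```

To use it on the real data, you would pass your own search function in place of `fake_lookup` and a file path such as "output_planilha.xlsx" in place of the buffer.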

【Comments】: