[Question title]: How can I clean up the response for this script to make it more readable?
[Posted]: 2020-12-22 04:32:46
[Question description]:

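How can I convert this script's output into a cleaner format, such as CSV? When I save the response as text, the formatting is terrible. I tried using writer.writerow, but I could not get that method to work with my variables.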

import requests
from bs4 import BeautifulSoup


url = "https://www.rockauto.com/en/catalog/ford,2015,f-150,3.5l+v6+turbocharged,3308773,brake+&+wheel+hub,brake+pad,1684"

response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')

meta_tag = soup.find('meta', attrs={'name': 'keywords'})

category = meta_tag['content']

linecodes = []
partnos = []
descriptions = []
infos = []
for tbody in soup.select('tbody[id^="listingcontainer"]'):
    tmp = tbody.find('span', class_='listing-final-manufacturer')
    linecodes.append(tmp.text if tmp else '-')

    tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
    partnos.append(tmp.text if tmp else '-')

    tmp = tbody.find('span', class_='span-link-underline-remover')
    descriptions.append(tmp.text if tmp else '-')

    tmp = tbody.find('div', class_='listing-text-row')
    infos.append(tmp.text if tmp else '-')


for row in zip(linecodes,partnos,infos,descriptions):
    result = category + ' | {:<20} | {:<20} | {:<80} | {:<80}'.format(*row)
    with open('complete.txt', 'a+') as f:
        f.write(result + '\n')  # '\n' is the newline escape; '/n' writes a literal slash and n
        print(result)

[Comments]:

    Tags: python database csv beautifulsoup python-requests


    [Solution 1]:
    • You can put the data into a pandas dataframe
    • Remove the last for-loop from the original code
    # imports
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    # set pandas display options to display more rows and columns
    pd.set_option('display.max_columns', 700)
    pd.set_option('display.max_rows', 400)
    pd.set_option('display.min_rows', 10)
    
    # your code
    url = "https://www.rockauto.com/en/catalog/ford,2015,f-150,3.5l+v6+turbocharged,3308773,brake+&+wheel+hub,brake+pad,1684"
    
    response = requests.get(url)
    data = response.text
    soup = BeautifulSoup(data, 'html.parser')
    
    meta_tag = soup.find('meta', attrs={'name': 'keywords'})
    
    category = meta_tag['content']
    
    linecodes = []
    partnos = []
    descriptions = []
    infos = []
    for tbody in soup.select('tbody[id^="listingcontainer"]'):
        tmp = tbody.find('span', class_='listing-final-manufacturer')
        linecodes.append(tmp.text if tmp else '-')
    
        tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
        partnos.append(tmp.text if tmp else '-')
    
        tmp = tbody.find('span', class_='span-link-underline-remover')
        descriptions.append(tmp.text if tmp else '-')
    
        tmp = tbody.find('div', class_='listing-text-row')
        infos.append(tmp.text if tmp else '-')
    

    Add the code for the dataframe

    # create dataframe
    df = pd.DataFrame(zip(linecodes,partnos,infos,descriptions), columns=['codes', 'parts', 'info', 'desc'])
    
    # add the category column
    df['category'] = category
    
    # break the category column into multiple columns if desired
    # skip the last 2 columns, because they are empty
    df[['cat_desc', 'brand', 'model', 'engine', 'cat_part']] = df.category.str.split(',', expand=True).iloc[:, :-2]
    
    # drop the unneeded category column
    df.drop(columns='category', inplace=True)
    
    # save to csv
    df.to_csv('complete.txt', index=False)
    
    # display(df)
                  codes       parts                            info                                 desc                   cat_desc  brand   model                 engine    cat_part
    0           CENTRIC    30016020  Rear; w/ Manual parking brake   Semi-Metallic; w/Shims and Hardware  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad
    1           CENTRIC    30116020  Rear; w/ Manual parking brake         Ceramic; w/Shims and Hardware  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad
    2  DYNAMIC FRICTION  1551160200     Rear; Manual Parking Brake                5000 Advanced; Ceramic  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad
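
The category split in the dataframe code above can be checked offline with plain Python. This is only a sketch: `sample_category` is an assumed keywords string reconstructed from the output shown above (it was not fetched from the site), and `split_category` and `FIELD_NAMES` are hypothetical names used for illustration.

```python
# Assumed sample value, reconstructed from the output above rather than
# fetched from the page: five fields followed by two empty trailing fields.
sample_category = "2015 FORD F-150 Brake Pad,FORD,F-150,3.5L V6 Turbocharged,Brake Pad,,"

# Hypothetical column names, matching the ones used in the dataframe code.
FIELD_NAMES = ["cat_desc", "brand", "model", "engine", "cat_part"]


def split_category(category):
    """Split the comma-separated string and drop the last two (empty) fields."""
    parts = category.split(",")[:-2]
    return dict(zip(FIELD_NAMES, parts))


fields = split_category(sample_category)
for name, value in fields.items():
    print(f"{name}: {value}")
```

If the real keywords string has a different number of trailing empty fields, the `[:-2]` slice (and the matching `.iloc[:, :-2]` in the pandas version) would need adjusting.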
    

    [Discussion]:
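
Since the question mentions writer.writerow, here is a minimal standard-library sketch of the same idea without pandas. The helper `write_parts`, the `HEADER` names, and the single sample row are assumptions for illustration (the row is copied from the output shown in the answer). It writes to an in-memory buffer so it runs without network access.

```python
import csv
import io

# Hypothetical header; the column names mirror the ones used in the answer.
HEADER = ["category", "codes", "parts", "info", "desc"]


def write_parts(f, category, linecodes, partnos, infos, descriptions):
    """Write a header row, then one CSV row per part, to the open file f."""
    writer = csv.writer(f)
    writer.writerow(HEADER)
    for row in zip(linecodes, partnos, infos, descriptions):
        # writerow takes one sequence per row, so the variables go in a list
        writer.writerow([category, *row])


# Sample data copied from the output shown in the answer.
buffer = io.StringIO()
write_parts(
    buffer,
    "2015 FORD F-150 Brake Pad",
    ["CENTRIC"],
    ["30016020"],
    ["Rear; w/ Manual parking brake"],
    ["Semi-Metallic; w/Shims and Hardware"],
)
print(buffer.getvalue())
```

To write a real file instead, pass a file opened with `open('complete.csv', 'w', newline='')`; the csv module documentation recommends `newline=''` so that the writer controls line endings.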
