【Question Title】: Specifying the List Title in an eBay Scrape
【Posted】: 2016-01-31 19:04:03
【Question】:

I'm new to programming and to computers in general. I have a little basic knowledge, and I'm trying to finish an eBay scraper I've been working on.

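import requests
from bs4 import BeautifulSoup

# url is the eBay search-results page being scraped (its value is not shown in the question)
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')

titles = soup.find_all('a', {'class': 'vip'})
titles = str(titles)  # turns the list of Tag objects into one long string of HTML
print(titles)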
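Part of the trouble is the str(titles) line. find_all() returns a ResultSet (a list of Tag objects), and str() flattens it into one long string of raw HTML, so any split() on it cuts through tags and attributes instead of separating the listings. A minimal sketch using two shortened, hypothetical links modeled on the output shown next (parsed with Python's built-in html.parser rather than lxml, so only beautifulsoup4 is needed):

```python
from bs4 import BeautifulSoup

# Two shortened, hypothetical listing links modeled on the scraped output
html = (
    '<a class="vip" href="#1" title="Click this link to access FUSE BOX">FUSE BOX</a>'
    '<a class="vip" href="#2" title="Click this link to access REPAIR MANUAL">REPAIR MANUAL</a>'
)
soup = BeautifulSoup(html, "html.parser")

titles = soup.find_all("a", {"class": "vip"})  # ResultSet: a list of Tag objects
flattened = str(titles)                        # one big string of raw markup

# Splitting the flattened string mixes tag syntax, attributes and title words
print(flattened.split()[:4])

# Keeping the ResultSet lets you handle one listing (one Tag) at a time
for tag in titles:
    print(tag.name, tag["href"])
```

The takeaway is to keep working with the list of tags and only pull text out of each tag individually.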

This code prints a list like the following:

[
<a class="vip" href="http://www.ebay.com/itm/HAYNES-HONDA-CIVIC-DEL-SOL-1991-1995-REPAIR-MANUAL-/121848077988?hash=item1c5eb61ea4:g:M5gAAOSwf-VWahyW&amp;vxp=mtr" title="Click this link to access HAYNES HONDA CIVIC DEL SOL 1991-1995 REPAIR MANUAL">HAYNES HONDA CIVIC DEL SOL 1991-1995 REPAIR MANUAL</a>, 

<a class="vip" href="http://www.ebay.com/itm/1992-1995-HONDA-CIVIC-DEL-SOL-FUSE-BOX-/320502127733?hash=item4a9f6a5c75:m:mHAEk2bNyOI4W8-qUiEnGWw&amp;vxp=mtr" title="Click this link to access 1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX">1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX</a>, 

<a class="vip" href="http://www.ebay.com/itm/HAYNES-HONDA-CIVIC-DEL-SOL-1991-1995-REPAIR-MANUAL-/111847121189?hash=item1a0a9ba125:g:M5gAAOSwf-VWahyW&amp;vxp=mtr" title="Click this link to access HAYNES HONDA CIVIC DEL SOL 1991-1995 REPAIR MANUAL">HAYNES HONDA CIVIC DEL SOL 1991-1995 REPAIR MANUAL</a>,

<a class="vip" href="http://www.ebay.com/itm/1996-2000-HONDA-CIVIC-DEL-SOL-all-models-repair-manual-/401035736130?hash=item5d5f97b442:g:UjAAAOSwxN5WXm-M&amp;vxp=mtr" title="Click this link to access 1996-2000 HONDA CIVIC &amp; DEL SOL - all models - repair manual">1996-2000 HONDA CIVIC &amp; DEL SOL - all models - repair manual</a>, 

<a class="vip" href="http://www.ebay.com/itm/Haynes-42024-1992-1995-Honda-Civic-and-del-sol-repair-manual-/321920364888?hash=item4af3f2f158:g:9GoAAOSwLzdWRog0&amp;vxp=mtr" title="Click this link to access Haynes 42024 1992-1995 Honda Civic and del sol repair manual.">Haynes 42024 1992-1995 Honda Civic and del sol repair manual.</a>, 

<a class="vip" href="http://www.ebay.com/itm/1988-2000-honda-acura-civic-integra-del-sol-pvc-valve-17130-pm6-003-oem-a137-/141782449307?hash=item2102e47c9b:g:iB8AAOSw0HVWAg5x&amp;vxp=mtr" title="Click this link to access 1988-2000 honda acura civic integra del sol pvc valve 17130-pm6-003 oem a137">1988-2000 honda acura civic integra del sol pvc valve 17130-pm6-003 oem a137</a>, 

<a class="vip" href="http://www.ebay.com/itm/88-2000-Honda-Civic-5-Speed-Manual-Shift-Knob-OEM-CRX-EF-EG-Si-Del-Sol-89-96-/262152990857?hash=item3d09893089:g:1x4AAOSwlV9WT7fj&amp;vxp=mtr" title="Click this link to access 88 - 2000 Honda Civic 5 Speed Manual Shift Knob OEM CRX EF EG Si Del Sol 89 96">88 - 2000 Honda Civic 5 Speed Manual Shift Knob OEM CRX EF EG Si Del Sol 89 96</a>, 

<a class="vip" href="http://www.ebay.com/itm/1988-2000-Honda-Civic-CRX-EF-SI-DX-HF-Del-Sol-Manual-Shift-Knob-OEM-88-91-/281859356536?hash=item41a0207778:g:XTAAAOSwMmBVj5mr&amp;vxp=mtr" title="Click this link to access 1988-2000 Honda Civic CRX EF SI DX HF Del Sol Manual Shift Knob OEM 88-91">1988-2000 Honda Civic CRX EF SI DX HF Del Sol Manual Shift Knob OEM 88-91</a>, 

<a class="vip" href="http://www.ebay.com/itm/Chilton-Repair-Manual-Honda-Civic-Del-Sol-1996-00-/262123397570?hash=item3d07c5a1c2:g:decAAOSwA4dWNs07&amp;vxp=mtr" title="Click this link to access Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00">Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00</a>
]

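What I want to do at this point is extract only the listing titles, so that I can count the frequency of each word across the whole set of listings.

Expected output: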

[
'1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX',
'Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00',
'ETC'
]
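Once the titles are a plain list of strings, counting how often each word appears fits collections.Counter. A minimal sketch, assuming a "word" is a run of letters and digits after lowercasing (so "CIVIC/DEL" counts as "civic" and "del", and "&" and "," are dropped); that tokenizing rule is an illustrative choice, not the only reasonable one. The titles are copied from the output above:

```python
import re
from collections import Counter

# Titles as plain strings, copied from the scraped output above
titles = [
    "1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX",
    "HAYNES HONDA CIVIC DEL SOL 1991-1995 REPAIR MANUAL",
    "Chilton Repair Manual Honda Civic & Del Sol, 1996-00",
]

# Lowercase so "HONDA" and "Honda" match, then keep runs of letters/digits as words
word_counts = Counter(
    word for title in titles for word in re.findall(r"[a-z0-9]+", title.lower())
)

for word, count in word_counts.most_common(5):
    print(word, count)
```

Splitting on plain whitespace instead would keep tokens such as "civic/del" and "sol," as separate words, which usually skews the counts.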

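If you look closely, the title appears in two places on each line. First in the attribute

title="Click this link to access..."

and then again right after it, as the link text running to the end of the line (the closing </a>). I tried using string.split() and other variations, but I couldn't work out how to use it so that it picks out only the words in the title. I keep getting a jumble of different words, one letter per line, or just the whole list item, and so on.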
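If you would rather take the text from the title attribute, there is no need to split the whole printed list: read the attribute from each tag and strip the fixed prefix. A sketch, assuming every title attribute starts with exactly "Click this link to access " as in the output above (eBay's markup may differ or change); the two links are shortened copies from that output:

```python
from bs4 import BeautifulSoup

PREFIX = "Click this link to access "  # assumed fixed prefix, as seen in the scraped output

html = (
    '<a class="vip" href="#1" '
    'title="Click this link to access 1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX">'
    '1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX</a>'
    '<a class="vip" href="#2" '
    'title="Click this link to access Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00">'
    'Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00</a>'
)
soup = BeautifulSoup(html, "html.parser")

titles = []
for a in soup.find_all("a", {"class": "vip"}):
    text = a.get("title", "")      # empty string if the tag has no title attribute
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]  # keep only the listing title after the prefix
    titles.append(text)

print(titles)
```

Slicing after startswith() is used instead of str.removeprefix() so the sketch also runs on Python versions older than 3.9.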

Does anyone know a good way to do this?

【Question Discussion】:

  • I'd suggest joining the eBay Developers Program and using their API. It's better, and it's free. go.developer.ebay.com

Tags: python-3.x beautifulsoup python-requests screen-scraping


【Solution 1】:

You need to call .get_text() on each element you find:

[a.get_text() for a in soup.find_all('a', {'class': 'vip'})]
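A self-contained way to try this list comprehension, using two links from the question's output (hrefs shortened to "#", and html.parser swapped in for lxml so only beautifulsoup4 is required). Note that get_text() returns the decoded text, so "&amp;" comes back as "&":

```python
from bs4 import BeautifulSoup

# Two links from the question's output, with the hrefs shortened
html = """
<a class="vip" href="#" title="Click this link to access 1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX">1992 - 1995 HONDA CIVIC/DEL SOL FUSE BOX</a>
<a class="vip" href="#" title="Click this link to access Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00">Chilton Repair Manual Honda Civic &amp; Del Sol, 1996-00</a>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() returns only the link text, with entities such as &amp; decoded
titles = [a.get_text() for a in soup.find_all("a", {"class": "vip"})]
print(titles)
```

With the real page, keep the question's requests and lxml setup and drop the str(titles) line before applying the comprehension.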

【Discussion】:

  • @indianhearts The code provided loops over the a elements with class="vip" and gets the text of each element it finds. What is the problem? Thanks.