Scrapy: getting data from a table always returns empty

【Question title】: Scrapy get data from a table always null
【Posted】: 2020-03-15 20:50:20
【Question description】:

I am trying to use Scrapy to get text or data from a table, but the table has no class. The relevant part of the HTML structure looks like this:

<div class="content_e">
    <div class="content-ranklist">
        <div class="rank-title"><span><h1><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Beijing gourmet restaurant
            </font></font></h1></span><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Updated on November 20th</font></font>
        </div>
        <section class="ranklist-table">
            <table>
                <tbody>
                    <tr>
                        <th class="th-label-0">
                            <div><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Ranking</font></font>
                            </div>
                        </th>
                    </tr>
                    <tr>
                        <td class="td-rank">
                            <div class="td-div-1"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">1</font></font>
                            </div>
                        </td>

I have tried to solve the problem in several different ways, but I always get None or []. This is what I tried:

        response.css('div.content-ranklist section.ranklist-table table').extract()
        response.css('div.content-ranklist section.ranklist-table table tr td.td-shopName').extract()
        response.css('//td[contains(@class, "td-shopName")]/text()').extract()
        response.xpath("//table/tbody/tr//td[@class='td-shopName']//a[@class='J_shopName']").extract()

The result is always None or [].

This is the output:

[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-
[]
=-=-=-=-

I was trying to get this class:
[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/40x4o.png

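【Comments】:

What do you want to extract?
The td elements inside the table.
Please show sample output.
I edited my question with the results I got after testing my code.
I'm asking about the expected output for the HTML source shown here! The text you want to extract!

Tags: python html web-scraping scrapy tags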

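【Solution 1】:

Since the table is loaded through an XHR request, it is not part of the initial HTML that Scrapy downloads, which is why the selectors return nothing. One way around this is to render the page in a real browser first: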

from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Firefox()
url = 'http://www.dianping.com/shoplist/shopRank/pcChannelRankingV2?rankId=83f473b08cba2af53642a889d8802c50'
browser.get(url)
time.sleep(3)  # wait 3 seconds for the XHR-loaded table to render
html = browser.page_source
browser.quit()  # close the browser once the rendered HTML is captured
soup = BeautifulSoup(html, features='html.parser')

# Each shop name is an <a class="J_shopName"> element; print its link.
links = soup.find_all('a', attrs={'class': 'J_shopName'})
for link in links:
    print(link.get('href'))

The output is:

http://www.dianping.com/shop/68193557
http://www.dianping.com/shop/112393652
http://www.dianping.com/shop/93227192
http://www.dianping.com/shop/132799437
http://www.dianping.com/shop/67917756
http://www.dianping.com/shop/17637181
http://www.dianping.com/shop/102198900
http://www.dianping.com/shop/130316435
http://www.dianping.com/shop/121684828
http://www.dianping.com/shop/130834244
http://www.dianping.com/shop/129948761
http://www.dianping.com/shop/73410505
http://www.dianping.com/shop/129320981
http://www.dianping.com/shop/111876029
http://www.dianping.com/shop/93659299

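【Comments】:

The myhtml variable is a string, and I don't know whether Scrapy can work with a string like that. What I know is something like `response.css("table tr td a:J_shopname")`. But, as I said, the result is None.
Use requests to parse the HTML source! Give me the link and I'll edit my answer.
Here is the link: dianping.com/shoplist/shopRank/…
Which class do you want to parse?
The J_shopName class.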