【Question title】: How to extract data with XPath in Python when the HTML classes have the same name
【Posted】: 2020-06-20 03:46:52
【Question description】:

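I am trying to capture the values 51011020, Recife, and Boa Viagem separately, but I can't see how an expression can tell these elements apart, because they all share the same class names.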

In [24]: response.xpath('//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()')
Out[24]: 
[<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='51011020'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Recife'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Boa Viagem'>]

When I run the code above, it returns all three values together. How can I get each one separately? An explanation would be much appreciated.

<div class="h3us20-5 jHoWDW">
    <div class="h3us20-2 fMOiyI">
        <div flexDirection="column" class="sc-jTzLTM sc-ksYbfQ uUqze">
            <span weight="semiBold" theme="[object Object]" tag="span" color="dark" font-weight="400" class="sc-ifAKCX dqTZSU">Localização</span>
            <div class="h3us20-4 eowFbc"></div>
            <div data-testid="ad-properties" class="sc-bwzfXH h3us20-0 cBfPri">
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">CEP</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Município</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Bairro</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd>
                    </div>
                </div>
            </div>
        </div>
        <div class="h3us20-4 hrzRZZ"></div>
    </div>
</div>

【Question comments】:

    Tags: python html xpath scrapy


    【Solution 1】:

    Since you need the values separately, you need 3 different XPath expressions.

    You can use positional indexes ([1], [2], [3]):

    (//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[1]/text()
    (//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[2]/text()
    (//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[3]/text()
    

    Or a text predicate (.="") combined with an axis (following-sibling):

    //dt[.="CEP"]/following-sibling::dd/text()
    //dt[.="Município"]/following-sibling::dd/text()
    //dt[.="Bairro"]/following-sibling::dd/text()
    

    Output in both cases:

    51011020
    Recife
    Boa Viagem
    

    【Comments】:
