【Question title】: Selecting values of tags within tags
【Posted】: 2016-10-21 09:35:10
【Question】:

This is the part of the HTML I am interested in:

<div class="mreinfwpr" id="mhd">
    <p class="mreinfp">Hours of Operation <a href="javascript:void(0);" class="" id="vhall" onclick="houroperate('all')">(View all)</a><a href="javascript:void(0);" class="dn" id="swless" onclick="houroperate('less')">(Show less)</a></p>
    <ul id="hroprt" class="alstdul">
        <li class="mreinfli">
                                <span class="mreinflispn1">Today</span><span class="mreinflispn2"><span>11:30 am - 11:30 pm</span>
                            </span><span class="mreinflispn3">Closed Now</span>  </li>
    </ul>
    <!-- View All Work Timings Vertically  -->
    <ul class="alstdul dn" id="statHr">
                <li class="mreinfli">
                <span class="mreinflispn1"> Monday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Tuesday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Wednesday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Thursday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Friday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Saturday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
                <li class="mreinfli">
                <span class="mreinflispn1"> Sunday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
            </li>
        </ul>

</div>

                <div class="mreinfwpr">
    <p class="mreinfp">Also Listed in</p>
    <ul class="alstdul">


                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Pubs-<near>-Indira-Nagar-2nd-Stage/ct-1000027567" title="Pubs in Indira-Nagar-2nd-Stage, Bangalore">Pubs</a>

                                <!--    <li class="spc"></li> -->

                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Pizza-Outlets-<near>-Indira-Nagar-2nd-Stage/ct-50105" title="Pizza Outlets in Indira-Nagar-2nd-Stage, Bangalore">Pizza Outlets</a>

                                <!--    <li class="spc"></li> -->



                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Restaurants-<near>-Indira-Nagar-2nd-Stage/ct-304085" title="Restaurants in Indira-Nagar-2nd-Stage, Bangalore">Restaurants</a>

                                <!--    <li class="spc"></li> -->

                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Lounge-Bars-<near>-Indira-Nagar-2nd-Stage/ct-597637" title="Lounge Bars in Indira-Nagar-2nd-Stage, Bangalore">Lounge Bars</a>

                                <!--    <li class="spc"></li> -->



                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Microbrewery-Pubs-<near>-Indira-Nagar-2nd-Stage/ct-1041785821" title="Microbrewery Pubs in Indira-Nagar-2nd-Stage, Bangalore">Microbrewery Pubs</a>

                                <!--    <li class="spc"></li> -->

                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Nightlife-Restaurants-<near>-Indira-Nagar-2nd-Stage/ct-1041746883" title="Nightlife Restaurants in Indira-Nagar-2nd-Stage, Bangalore">Nightlife Restaurants</a>

                                <!--    <li class="spc"></li> -->



                        <li>
                    <a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Foodie-Delight-<near>-Indira-Nagar-2nd-Stage/ct-1041818989" title="Foodie Delight in Indira-Nagar-2nd-Stage, Bangalore">Foodie Delight</a>

                                <!--    <li class="spc"></li> -->


                                <!--    <li class="spc"></li> -->




                                <!--    <li class="spc"></li> -->


                                <!--    <li class="spc"></li> -->

                                    <li>
                <a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');
                        openDiv('alsp');">more...</a>
            </li>
            </ul>
</div>
        <div class="mreinfwpr">
    <p class="mreinfp">Services</p>
                        <span class="srihd">General</span>
            <ul class="alstdul">
                                                                <!-- <tr  > -->
                                            <li><img class="srimg" src="http://www.justdial.com/public/images/icon/bar.png" width="20" height="20" /><span class="sritxt">Bar                                                   </span></li>
                                                <!-- <td class="spc"></td> -->
                                    <li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Outdoor Seating                                                  </span></li>
                                                <!-- </tr> -->
                                                            <!-- <tr  > -->
                                            <li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Alcohol                                                  </span></li>
                                                <!-- <td class="spc"></td> -->
                                    <li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">AC                                                   </span></li>
                                                <!-- </tr> -->
                                                            <!-- <tr class="reset" > -->
                                            <li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">WiFi                                                 </span></li>
                                                <!-- <td class="spc"></td> -->
                                    <li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Dinein                                                   </span></li>
                                                <!-- </tr> -->
                                    </ul>
        </div>
         <div class="mreinfwpr">
    <p class="mreinfp">Modes of Payment</p>
    <ul class="alstdul">

                                    <li>Cash</td>
                                <!-- <td class="spc"></td> -->
                                <li>Master Card</td>
                                </li>

                                    <li>Visa Card</td>
                                <!-- <td class="spc"></td> -->
                                <li>Debit Cards</td>
                                </li>

                                    <li>Credit Card</td>
                                <!-- <td class="spc"></td> -->
                </div>
            <div class="mreinfwpr">
    <p class="mreinfp">Year Established</p>
    <ul class="alstdul">
        <li> 2010</li>
    </ul>
</div>

I want the data under the "Also Listed in" category, i.e.:

Also Listed in

Pubs 
Pizza Outlets 
Restaurants 
Lounge Bars 
Microbrewery Pubs 
Nightlife Restaurants 
Foodie Delight 
more...

I tried:

also_listed_in=bSoup.findAll("a", { "onclick" : "_ct('alsocat', 'dtpg', '17592186044416');" })

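This returns the data I need. The problem is that the onclick attribute of the "a" tag, i.e. onclick="_ct('alsocat', 'dtpg', '17592186044416');", changes from one URL of this kind to the next. However, I noticed that the leading part, _ct('alsocat', 'dtpg', stays the same across all such URLs. Please help me get the required data.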

【Question discussion】:

    Tags: python-2.7 beautifulsoup findall


    【Solution 1】:

    You can match on the part of the onclick text that does not change:

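    import re
    from bs4 import BeautifulSoup

    # `html` holds the page source shown in the question.
    soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

    # The regex is searched against each onclick value, so the constant prefix is enough.
    print(soup.find_all("a", onclick=re.compile(r"_ct\('alsocat', 'dtpg'")))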
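
As a minimal sketch of the regex approach, the snippet below runs it on a trimmed, well-formed sample instead of the full page. The sample markup, the second ID ('99999'), and the `href="#"` placeholders are assumptions made for illustration; only the onclick prefix comes from the question.

```python
import re

from bs4 import BeautifulSoup

# Trimmed sample modelled on the "Also Listed in" block (IDs are made up).
html = """
<ul class="alstdul">
  <li><a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="#">Pubs</a></li>
  <li><a onclick="_ct('alsocat', 'dtpg', '99999');" href="#">Pizza Outlets</a></li>
  <li><a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');">more...</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only the constant prefix; the trailing ID is allowed to vary.
pattern = re.compile(r"_ct\('alsocat', 'dtpg'")
links = soup.find_all("a", onclick=pattern)

# Keep just the visible link text.
names = [a.get_text(strip=True) for a in links]
print(names)
```

Note that the "more..." link carries a different onclick value (`_ct('morlstdin', ...)`), so this pattern skips it. If that entry is wanted too, it would need its own match.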
    

    If _ct('alsocat' is unique to these links, you can use a CSS "starts with" attribute selector:

    # The attribute value is quoted so the selector stays valid CSS.
    print(soup.select("a[onclick^=\"_ct('alsocat'\"]"))
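
A short sketch of the selector variant, again on an assumed, trimmed sample rather than the real page. The second ID ('123') and the `href="#"` placeholders are made up; the `title` values follow the pattern seen in the question's markup.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the IDs and href values are placeholders for illustration.
html = """
<ul class="alstdul">
  <li><a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="#"
         title="Pubs in Indira-Nagar-2nd-Stage, Bangalore">Pubs</a></li>
  <li><a onclick="_ct('alsocat', 'dtpg', '123');" href="#"
         title="Restaurants in Indira-Nagar-2nd-Stage, Bangalore">Restaurants</a></li>
  <li><a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');">more...</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# [attr^="value"] matches attributes whose value starts with the given string.
# The value is double-quoted because it contains a parenthesis and single quotes.
selector = "a[onclick^=\"_ct('alsocat'\"]"

results = [(a.get_text(strip=True), a["title"]) for a in soup.select(selector)]
print(results)
```

Compared with the regex version, the selector can only test a fixed prefix, which is all this case appears to need.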
    

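    【Discussion】:

    • Thanks a lot. One more quick question: how do I get the different modes of payment (Cash, Master Card, Visa Card, Debit Cards, Credit Card)? The set of tags around them is not unique.
      • The same set of tags is common to Modes of Payment, Year Established, and Also Listed in.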
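
The follow-up about payment modes is not answered in the thread. One possible approach, sketched here under assumptions, is to anchor on the section heading text (the `<p class="mreinfp">` element) rather than on the shared class names, then read the list that follows it. The helper name `section_items` is hypothetical, and the sample below is a cleaned-up, well-formed version of the markup. The real page contains stray `</td>` tags and unclosed `<li>` elements, so results on it may differ depending on the parser.

```python
from bs4 import BeautifulSoup

# Cleaned-up sample; the original markup is malformed and may parse differently.
html = """
<div class="mreinfwpr">
  <p class="mreinfp">Modes of Payment</p>
  <ul class="alstdul">
    <li>Cash</li>
    <li>Master Card</li>
    <li>Visa Card</li>
  </ul>
</div>
<div class="mreinfwpr">
  <p class="mreinfp">Year Established</p>
  <ul class="alstdul">
    <li> 2010</li>
  </ul>
</div>
"""


def section_items(soup, heading):
    """Return the <li> texts of the <ul> that follows the given heading.

    Hypothetical helper: it assumes each section is a <p class="mreinfp">
    heading followed by a sibling <ul>, as in the question's markup.
    """
    for p in soup.find_all("p", class_="mreinfp"):
        if p.get_text(strip=True) == heading:
            ul = p.find_next_sibling("ul")
            if ul is None:
                return []
            return [li.get_text(strip=True) for li in ul.find_all("li")]
    return []  # heading not found


soup = BeautifulSoup(html, "html.parser")
payment_modes = section_items(soup, "Modes of Payment")
year_established = section_items(soup, "Year Established")
print(payment_modes)
print(year_established)
```

Because the lookup keys on the heading text, the same helper should work for any section that follows this heading-then-list layout, even though every list shares the `alstdul` class.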