【发布时间】:2019-06-09 02:41:46
【问题描述】:
我编写了一个正则表达式来从网页中抓取数据。但是我收到了提到的错误。我无法找到解决方案。 有人建议
try:
code
except:
Attribute error
原代码:
import urllib.request
import bs4
import re
url ='https://ipinfo.io/AS7018'
def url_to_soup(url):
req = urllib.request.Request(url)
opener = urllib.request.build_opener()
html = opener.open(req)
soup = bs4.BeautifulSoup(html, "html.parser")
return soup
s = str(url_to_soup(url))
#print(s)
asn_code, name = re.search(r'<h3 class="font-semibold m-0 t-xs-24">(?P<ASN_CODE>AS\d+) (?P<NAME>[\w.\s]+)</h3>', s)\
.groups() # Error code
print(asn_code)
""" This is where the error : From above code """
country = re.search(r'.*href="/countries.*">(?P<COUNTRY>.*)?</a>',s).group("COUNTRY")
print(country)
registry = re.search(r'Registry.*?pb-md-1">(?P<REGISTRY>.*?)</p>',s, re.S).group("REGISTRY").strip()
print(registry)
# flag re.S make the '.' special character match any character at all, including a newline;
ip = re.search(r'IP Addresses.*?pb-md-1">(?P<IP>.*?)</p>',s, re.S).group("IP").strip()
print(ip)
【问题讨论】:
标签: python regex web-scraping beautifulsoup