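【Posted】:2021-03-03 02:37:58
【Question】:
I'm trying to make a request using the socket module in Python. It successfully sends the request, receives the response, and decodes it. When I look at the HTML document, everything is correct except that 3-4 seemingly random strings of varying length appear inside it. I think my code is correct, but I'm not 100% sure. Here is my code: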
import select
import socket
import ssl

from bs4 import BeautifulSoup

# Not defined in the original snippet; presumably the HTTP line terminator.
newline = "\r\n"

def recive_data(get, timeout):
    # Wait up to `timeout` seconds for the socket to become readable.
    ready = select.select([get], [], [], timeout)
    if ready[0]:
        return get.recv(4096)
    return b""

def get_file(website, port, file, https=False):
    data = []
    new_data = ""
    if https:
        get = ssl.create_default_context().wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM), server_hostname=website)
    else:
        get = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    get.connect((website, port))
    get.sendall(f"GET {file} HTTP/1.1\r\nHost: {website}:{port}\r\n\r\n".encode())
    # Keep reading until no data arrives within the timeout.
    while True:
        new_data = recive_data(get, 5).decode()
        if new_data != "" and new_data != None:
            data.append(new_data)
            new_data = ""
        else:
            break
    data = "".join(data)
    # Split the header block from the body at the first blank line.
    header = data[0:data.find(newline+newline)]
    data = data[data.find(newline+newline):data.rfind(f"{newline}0{newline}{newline}")]
    data = BeautifulSoup(data, 'html.parser').prettify()
    get.close()
    return (header, data)
If I pass in https://stackoverflow.com, it outputs:
30d
<!DOCTYPE html>
<html class="html__responsive html__unpinned-leftnav">
<head>
<title>
Stack Overflow - Where Developers Learn, Share, & Build Careers
</title>
<link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196" rel="shortcut icon"/>
<link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a" rel="apple-touch-icon"/>
<link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a" rel="image_src"/>
<link href="/opensearch.xml" rel="search" title="Stack Overflow" type="application/opensearchdescription+xml"/>
<meta content="Stack Overflow is the largest, most trusted online communi
20d0
ty for developers to learn, share their programming knowledge, and build their careers." name="description"/>
<meta content="width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0" name="viewport"/>
<meta content="website" property="og:type">
and so on... However, some websites have more of these strings than others, and I can't figure out why. Any help is greatly appreciated!
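A possible explanation, offered as a sketch rather than a confirmed diagnosis: strings such as 30d and 20d0 look like the hexadecimal chunk-size lines that an HTTP/1.1 server sends when it answers with Transfer-Encoding: chunked. If that is the case here, the body has to be de-chunked before it is decoded and parsed. The helper name decode_chunked below is hypothetical and not part of the original code. It assumes the complete body (everything after the blank line that ends the headers) is already in memory as bytes, and it ignores chunk extensions and trailer fields.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into its raw payload.

    Hypothetical helper: assumes `body` holds the entire chunked body
    and that any trailer fields after the last chunk can be ignored.
    """
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with "<hex size>[;extensions]\r\n".
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk-size line")
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        if pos + size > len(body):
            raise ValueError("truncated chunk data")
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)


if __name__ == "__main__":
    sample = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
    print(decode_chunked(sample))
```

Such a helper would only apply when the response headers actually contain Transfer-Encoding: chunked, and it should run on the raw bytes before .decode() is called, since chunk sizes count bytes rather than characters.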
【Comments】:
Tags: python html python-3.x https get