【Posted】: 2020-11-04 18:07:37
【Question】:
My Python 3 code:
import sys

import requests

url = sys.argv[1]
r = requests.get(url, stream=True)
chunk_size = 20000
with open('metadata.pdf', 'wb') as fd:
    # Write the response body to disk in chunks
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
It saves the content to metadata.pdf, but that is not the actual PDF content; it is this HTML page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- $HTMLid: index.html /main/6 11-Jun-2004.13:54:09 $ -->
<head>
<title>Allied Waste</title>
<script language="JavaScript">
<!--
if (top != self) {
top.location = self.location;
}
function doRedirect() {
document.login.submit();
}
function init () {
var initChar = /^\?/;
var list = top.location.search.replace(initChar,"");
var parms = list.split('&');
for ( ct=0; ct < parms.length; ct++ ) {
vals = parms[ct].split('=');
switch ( vals[0] ) {
case "unitCode":
document.login.unitCode.value = unescape(vals[1]);
if ( document.login.unitCode.value == 'undefined' || document.login.unitCode.value == '' )
document.login.unitCode.value = "ALW";
break;
default:
document.login.unitCode.value = "ALW";
break;
}
}
document.login.submit();
}
//-->
</script>
</head>
<body onload="init()">
<form name="login" action="inetSrv" method="post">
<input type="hidden" name="type" value="SignonService"/>
<input type="hidden" name="action" value="SignonPrompt"/>
<input type="hidden" name="client" value="701122300"/>
<input type="hidden" name="unitCode" value=""/>
</form>
</body>
</html>
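Any help is appreciated: how can I save the real content of the file instead of this HTML? It should be an actual PDF, but when I download it I get this HTML page.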
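The HTML returned above is a sign-on page whose script auto-submits the hidden `login` form to `inetSrv`, so a plain `requests.get` never gets past it. The following is a minimal sketch, not a confirmed fix: it only extracts the form fields a browser would post. `FormFieldParser`, `login_request`, and the example.com URL used below are hypothetical names introduced for illustration; the `ALW` fallback mirrors the page's own `init()` script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldParser(HTMLParser):
    """Collect the action and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs carry the values the browser would submit
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def login_request(page_html, page_url):
    """Return (absolute form URL, form fields) for the sign-on page."""
    parser = FormFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    # The page's init() script falls back to "ALW" when no unitCode is given
    if not fields.get("unitCode"):
        fields["unitCode"] = "ALW"
    return urljoin(page_url, parser.action or ""), fields
```

The returned URL and fields could then be sent with `requests.Session().post(action_url, data=fields)` before requesting the PDF from the same session, so that any cookies set by the sign-on step are reused. Whether that is enough depends on the server; the `SignonPrompt` action suggests it may still ask for credentials.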
Update:
The response from the server when I use a session in Python:
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html>\n\n \n<head><title></title>\n \n<LINK REL="StyleSheet" HREF="styles/mainStyle.css">\n</head>\n\n<body>\n<div style="float: left; border: 1px solid black; background-color: #FFFFFF; padding: 5px">\n\t<div class="TitleFont">Operation failed</div>\n\t<div class="TitleFont">Reason</div>\n\t<div>\n\t<div class="custom-message-box">\n\t\t\t\t<div class="ErrorFont" ALIGN="left" >A server error has occurred.</div>\n\t\t\t\t<div class="ErrorFont" ALIGN="left" >Error reference id: DLY-00716</div>\n\t\t\t\t<div class="ErrorFont" ALIGN="left" >Time: Wed Jul 15 05:33:12 CDT 2020</div>\n\t</div>\n\t</div>\n\t<div style="width: 600px">\n\t\t<p class="form-style-text">\n\t\tIf contacting customer support, please quote the above error reference id. You may be able to press the browser Back button to return to the previous screen. Otherwise you may need to login again. We apologize for the inconvenience.\n\t\t</p>\n\t</div>\n</div>\n\n</body>\n</html>\n\n'
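Both responses shown above are HTML rather than a PDF, so it may help to check the body before writing it to disk. This is a minimal sketch, assuming the response comes from `requests.get(url, stream=True)` as in the question; `looks_like_pdf` and `save_if_pdf` are hypothetical helpers, and the check relies on PDF files beginning with the bytes `%PDF-`.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(first_bytes, content_type=""):
    """Return True if the first bytes and Content-Type are consistent with a PDF."""
    if "text/html" in content_type.lower():
        return False
    return first_bytes.startswith(PDF_MAGIC)


def save_if_pdf(response, path, chunk_size=20000):
    """Stream `response` to `path` only if it looks like a PDF.

    `response` is expected to behave like a requests.Response that was
    obtained with stream=True. Returns True if the file was written.
    """
    chunks = iter(response.iter_content(chunk_size))
    first = next(chunks, b"")
    content_type = response.headers.get("Content-Type", "")
    if not looks_like_pdf(first, content_type):
        # Probably a login or error page; do not save it as a .pdf
        return False
    with open(path, "wb") as fd:
        fd.write(first)
        for chunk in chunks:
            fd.write(chunk)
    return True
```

With the code from the question, `save_if_pdf(r, 'metadata.pdf')` would return False for both HTML responses shown here instead of silently saving them under a .pdf name.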
【Discussion】:
- Please share the URL where the PDF file is located.
Tags: html python-3.x pdf python-requests