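【Question title】: curl vs. wget produce different redirects and results
【Posted】: 2014-08-23 20:29:07
【Question description】:

The following URL was posted in another question.

With wget you get the CSV file as expected, but curl ends up being redirected to different content. I would like to know what the difference between these two commands is, or how to get the same result with curl.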

wget

wget --output-document=test.csv --no-check-certificate 'https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv'

curl

curl --location --insecure --output test.csv 'https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv'

Updated with header information

Header comparison

wget 1

--2014-07-03 09:54:30--  https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv
Resolving docs.google.com... 74.125.226.98, 74.125.226.100, 74.125.226.102, ...
Connecting to docs.google.com|74.125.226.98|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Moved Temporarily
  Content-Type: text/html; charset=UTF-8
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Pragma: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Date: Thu, 03 Jul 2014 13:54:30 GMT
  X-Robots-Tag: noindex, nofollow, nosnippet
  Location: https://www.google.com/url?q=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&sa=p
  Set-Cookie: NID=67=D4vu38cFuNFB-qdFSdaVBpLKJ94VcnpcVDfEpoyECGG-EesJlxBW4Rwb-AA-XAF7ztGOAIzx3u2YYqwRlt516cv3i6jSa9Pazf3uK-hyR5p5QoEYaZ-MqRpj9H_utLwU;Domain=.google.com;Path=/;Expires=Fri, 02-Jan-2015 13:54:30 GMT;HttpOnly
  P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Alternate-Protocol: 443:quic
  Transfer-Encoding: chunked
Location: https://www.google.com/url?q=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&sa=p [following]

curl 1

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Thu, 03 Jul 2014 13:59:48 GMT
X-Robots-Tag: noindex, nofollow, nosnippet
Location: https://www.google.com/url?q=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&sa=p
Set-Cookie: NID=67=QTFWWFkySepW985crZ2dZk1JfQ8gGj_H59HwYp-SMcOvYl0J4JU3VfDGCqppxFcEPt-e48qr0yJOx2ImUKH65LlgvuLyF3Ec842bPFq-BFg9a7YWEP_5Uq8YJrJ58taL;Domain=.google.com;Path=/;Expires=Fri, 02-Jan-2015 13:59:48 GMT;HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked

wget 2

--2014-07-03 09:54:30--  https://www.google.com/url?q=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&sa=p
Resolving www.google.com... 74.125.225.144, 74.125.225.145, 74.125.225.148, ...
Connecting to www.google.com|74.125.225.144|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  X-Frame-Options: ALLOWALL
  Location: https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv&pref=2
  Cache-Control: private
  Content-Type: text/html; charset=UTF-8
  Set-Cookie: PREF=ID=1f6208c8ba0c71f9:FF=0:TM=1404395670:LM=1404395670:S=HaS679Z5xbmJBKs7; expires=Sat, 02-Jul-2016 13:54:30 GMT; path=/; domain=.google.com
  Date: Thu, 03 Jul 2014 13:54:30 GMT
  Server: gws
  Content-Length: 311
  X-XSS-Protection: 1; mode=block
  Alternate-Protocol: 443:quic
Location: https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv&pref=2 [following]

curl 2

HTTP/1.1 302 Found
X-Frame-Options: ALLOWALL
Location: https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv&pref=2
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=432f03534cff2fd2:FF=0:TM=1404395989:LM=1404395989:S=1NwOiUYJQYKfn6qF; expires=Sat, 02-Jul-2016 13:59:49 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=EjeYW1PP63Nxk5upQVhEVreT_prZXQYQy4WVKZCHkY3cXffcTWyvXIJkt4Tg07LUoHo3GSkEg6qDh5ff5ESGhksbjT50ytYRd0SyKp7quyorpbT4GMhnbORlkFfTNdRc; expires=Fri, 02-Jan-2015 13:59:49 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Thu, 03 Jul 2014 13:59:49 GMT
Server: gws
Content-Length: 311
X-XSS-Protection: 1; mode=block
Alternate-Protocol: 443:quic

wget 3

--2014-07-03 09:54:31--  https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv&pref=2
Connecting to docs.google.com|74.125.226.98|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Moved Temporarily
  Content-Type: text/html; charset=UTF-8
  Location: https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv
  Date: Thu, 03 Jul 2014 13:54:31 GMT
  Expires: Thu, 03 Jul 2014 13:54:31 GMT
  Cache-Control: private, max-age=0
  X-Content-Type-Options: nosniff
  X-Frame-Options: SAMEORIGIN
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Alternate-Protocol: 443:quic
  Transfer-Encoding: chunked
Location: https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv [following]

curl 3

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=utf-8
Location: https://www.google.com/accounts/ServiceLogin?service=wise&passive=1209600&continue=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&followup=https://docs.google.com/spreadsheet/ccc?key%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&ltmpl=sheets
Content-Length: 2270
Set-Cookie: NID=67=NdTD41weGlHPUtsUMwF0a7ugZ5Hfof3Q8CFsy2gQcJuBaH8ugZIYppe2PWWhP5fEMtdToEi76-lQH_lAJUeLEkNo1nObesgzEnSSg3HEJeb-5vYrAs4fwR7bM7Ourxeh;Domain=.google.com;Path=/;Expires=Fri, 02-Jan-2015 13:59:49 GMT;HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Thu, 03 Jul 2014 13:59:49 GMT
Expires: Thu, 03 Jul 2014 13:59:49 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE

wget 4 (final)

--2014-07-03 09:54:31--  https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv
Reusing existing connection to docs.google.com:443.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Type: text/csv; charset=utf-8
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Pragma: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Date: Thu, 03 Jul 2014 13:54:31 GMT
  X-Robots-Tag: noindex, nofollow, nosnippet
  Content-Disposition: attachment; filename="Download Test Spreadsheet.csv"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Alternate-Protocol: 443:quic
  Transfer-Encoding: chunked

curl 4

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=UTF-8
Location: https://accounts.google.com/ServiceLogin?service=wise&passive=1209600&continue=https%3A%2F%2Fdocs.google.com%2Fspreadsheet%2Fccc%3Fkey%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&followup=https%3A%2F%2Fdocs.google.com%2Fspreadsheet%2Fccc%3Fkey%3D0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc%26output%3Dcsv%26pref%3D2&ltmpl=sheets
Content-Length: 556
Date: Thu, 03 Jul 2014 13:59:49 GMT
Expires: Thu, 03 Jul 2014 13:59:49 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Alternate-Protocol: 443:quic

curl 5 (final)

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=10893354; includeSubDomains
Set-Cookie: GAPS=1:v3eXsN1lqmN5ryz1eyf2iMBP2uoIGg:wiYHYyLrGeoRHUfk;Path=/;Expires=Sat, 02-Jul-2016 13:59:49 GMT;Secure;HttpOnly;Priority=HIGH
X-Frame-Options: DENY
Date: Thu, 03 Jul 2014 13:59:49 GMT
Expires: Thu, 03 Jul 2014 13:59:49 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 0
Server: GSE
Alternate-Protocol: 443:quic

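【Comments】:

  • Apart from being "redirected to something different", what result does curl actually produce?
  • Well, in the end wget downloads the CSV file, whereas curl downloads an entire HTML page that has nothing to do with the CSV.
  • You will have to provide more details. Right now I get a 404 Not Found for the URL in the question. A good starting point would be the verbose output of curl and wget. Even better would be debug logs from both.
  • Thanks for the suggestion. I have fixed the 404 error in the wget command and added a header comparison.

Tags: http redirect curl wget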


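【Solution 1】:

A good debugging technique is to open the link with the developer tools open in Chrome and watch the Network tab. Every request in that tab can be right-clicked to reveal a cURL command that downloads the same information.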
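A similar hop-by-hop view can be had on the command line. The following is only a sketch: `summarize_hops` is a hypothetical helper name, and it assumes the input is a header dump in the format curl writes with `--dump-header`. It reduces the dump to each response's status line, its `Location` target, and the names of any cookies it sets, which makes it easier to spot the hop where two runs diverge.

```shell
#!/bin/sh
# Sketch: condense an HTTP header dump to a few lines per hop.
# summarize_hops is a hypothetical helper, not part of curl or wget.
# It reads raw response headers on stdin (for example from
# `curl --dump-header -`) and prints, for every response:
#   - the status line
#   - the redirect target, if any
#   - the name of each cookie the response sets
summarize_hops() {
    tr -d '\r' | awk '
        /^HTTP\//                    { print; next }
        tolower($1) == "location:"   { print "  -> " $2 }
        tolower($1) == "set-cookie:" { split($2, kv, "="); print "  cookie " kv[1] }
    '
}

# Example call (needs network access; "$url" is whatever URL you are debugging):
# curl --silent --location --dump-header - --output /dev/null "$url" | summarize_hops
```

Running the example call once as shown and once with `--cookie tmp.cookie` added should make any difference in the redirect chain visible side by side.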

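In your case the problem seems to be that wget handles cookies for you and cURL does not. The header dumps in the question fit this: both tools receive the same redirects on the first two requests, which set cookies, and only diverge on the third. Passing a file name to --cookie turns on curl's cookie engine, so cookies received during the redirect chain are sent back on later requests. This should be easy to fix: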

curl 'https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv' --location --cookie tmp.cookie
# Foo,Bar,Baz
# 1,2,3
# 4,5,6
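For use in a script, the same idea can be wrapped so that the cookie file is created and cleaned up explicitly. This is a minimal sketch under stated assumptions: `fetch_with_cookies` is a hypothetical helper name, and it relies only on the standard curl options `--cookie`, `--cookie-jar`, `--location`, `--silent`, `--show-error` and `--output`.

```shell
#!/bin/sh
# Sketch: download a URL with curl while keeping cookies across redirects,
# which is what wget does by default.
# fetch_with_cookies is a hypothetical helper, not part of curl.
#   $1 = URL to fetch
#   $2 = file to write the response body to
fetch_with_cookies() {
    url=$1
    out=$2
    jar=$(mktemp) || return 1
    # --cookie "$jar"     enables the cookie engine (the file starts out empty)
    # --cookie-jar "$jar" writes received cookies back to the same file
    # --location          follows the redirect chain
    curl --silent --show-error --location \
         --cookie "$jar" --cookie-jar "$jar" \
         --output "$out" "$url"
    status=$?
    rm -f "$jar"
    return "$status"
}

# Example call (needs network access):
# fetch_with_cookies 'https://docs.google.com/spreadsheet/ccc?key=0At2sqNEgxTf3dEt5SXBTemZZM1gzQy1vLVFNRnludHc&output=csv' test.csv
```

The temporary jar is deleted after each call, so no cookies persist between downloads; keep the file instead if a session should be reused.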

【Comments】:
