【Question title】: Download a cookie to make a new GET request
【Posted】: 2013-12-11 12:47:39
【Question description】:

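I am trying to make a PHP GET request to a website:

The problem is that the website only processes my request if I attach the cookie information to the request headers.

Or, shown in pictures: if I disable cookies in my browser, I get:

This means the website recognizes that this is my first "visit" to the site.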

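The trouble is that if I now use the search bar in the top-right corner, the site does not process the request: it just shows the same (generic) screen.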

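For example: if I have cookies disabled and I search for "AAPL", it shows no results.

Now, if I have cookies enabled, the request is processed correctly:

and the "AAPL" results are displayed.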

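You can also try it yourself:

With cookies enabled, visit http://www.pennystocktweets.com/user_posts/feeds?cat=search&lptyp=prep&usrstk=AAPL

With cookies disabled, visit the same link again: http://www.pennystocktweets.com/user_posts/feeds?cat=search&lptyp=prep&usrstk=AAPL

Now compare the two responses: only the first one is correct.

This means the website only works after the client has downloaded a cookie and then sends another (new) GET request to the server with that cookie information attached.

(Does this mean the website needs a session cookie to function properly?)

Now I am trying to mimic the request using Apache HttpClient, as follows: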

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class downloadTweets {

  private String cookies;
  private HttpClient client = new DefaultHttpClient();
  private final String USER_AGENT = "Mozilla/5.0";

  public static void main(String[] args) throws Exception {

    String  ticker  = "AAPL";  
    String  lptyp   = "prep";  
    int     opid    = 0;
    int     lpid    = 0;

    downloadTweets test = new downloadTweets();

    String url = test.constructURL(ticker, lptyp, opid, lpid);

    // make sure cookies are turned on
    CookieHandler.setDefault(new CookieManager());

    downloadTweets http = new downloadTweets();

    String page = http.GetPageContent(url, ticker);

    System.out.println(page);
  }

  public String constructURL(String ticker, String lptyp, int opid, int lpid)
  {
      String link = "http://www.pennystocktweets.com/user_posts/feeds?cat=search" +

              "&lptyp="     +   lptyp   +
              "&usrstk="    +   ticker;

      if (opid != 0)
      {
          link = link +
              "&opid="      +   opid    +
              "&lpid="      +   lpid;
      }

      return link;
  }

  private String GetPageContent(String url, String ticker) throws Exception {

    HttpGet request = new HttpGet(url);

    String RefererLink = "http://www.pennystocktweets.com/search/post/" + ticker.toUpperCase();

    request.setHeader("Host", "www.pennystocktweets.com");
    request.setHeader("Connection", "Keep-alive");
    request.setHeader("Accept", "*/*");
    request.setHeader("X-Requested-With", "XMLHttpRequest");
    request.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36");
    request.setHeader("Referer", RefererLink);
    request.setHeader("Accept-Language", "nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4,fr;q=0.2");

    HttpResponse response = client.execute(request);
    int responseCode = response.getStatusLine().getStatusCode();

    System.out.println("\nSending 'GET' request to URL : " + url);
    System.out.println("Response Code : " + responseCode);

    BufferedReader rd = new BufferedReader(
                new InputStreamReader(response.getEntity().getContent()));

    StringBuffer result = new StringBuffer();
    String line = "";
    while ((line = rd.readLine()) != null) {
        result.append(line);
    }

    // set cookies
    setCookies(response.getFirstHeader("Set-Cookie") == null ? "" : 
                     response.getFirstHeader("Set-Cookie").toString());

    return result.toString();

  }

  public String getCookies() {
    return cookies;
  }

  public void setCookies(String cookies) {
    this.cookies = cookies;
  }
}

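The same thing holds here: if I attach (my own) cookie information, the response is correct; if I don't, it is not.

But I don't know how to obtain the cookie information and then use it in a new GET request.

So my question is:

How can I make 2 requests to a website, such that:

in the first GET request, I obtain the cookie information from the website and store it in my Java program;

in the second GET request, I use the stored cookie information (as a header) to make a new request.

Note: I don't know whether the cookie is a normal cookie or a session cookie, but I suspect it is a session cookie.

Any help is much appreciated!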

【Question comments】:

    Tags: java apache http-headers xmlhttprequest session-cookies


    【Solution 1】:

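    As the Apache Commons HttpClient documentation states in its "HttpClient Cookie handling" section: HttpClient supports automatic management of cookies, including allowing the server to set cookies and automatically return them to the server when required. It is also possible to manually set cookies to be sent to the server.

    Whenever the HTTP client receives cookies, they are persisted in the HttpState and automatically added to new requests. This is the default behavior.

    In the Commons HttpClient sample that follows (class HttpTest), we can see the cookies returned by two GET requests. We cannot directly see the cookies sent to the server, but we can use a protocol/network sniffer or a tool such as ngrep to view the data that goes over the wire: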
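The "persist, then add to new requests" behavior described above can be tried without any network access. The following is a minimal sketch that uses the JDK's own `java.net.CookieManager` (the class the question already instantiates) rather than Commons HttpClient's `HttpState`, so it illustrates the idea and not the library's internals. The class name `CookieJarSketch`, the method `replayCookie`, the host `www.example.com` and the cookie `session=abc123` are all placeholders invented for this illustration.

```java
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CookieJarSketch {

    /**
     * Feeds one Set-Cookie value into a fresh cookie jar, then asks the jar
     * which Cookie header it would attach to a follow-up request to the same URI.
     */
    public static String replayCookie(String uriText, String setCookieValue) throws Exception {
        URI uri = URI.create(uriText);
        CookieManager jar = new CookieManager(); // in-memory store, default accept policy

        // Step 1: pretend the first response carried this Set-Cookie header.
        jar.put(uri, Map.of("Set-Cookie", List.of(setCookieValue)));

        // Step 2: ask the jar what should be sent on the next request to the same URI.
        Map<String, List<String>> toSend = jar.get(uri, Collections.emptyMap());
        return String.join("; ", toSend.getOrDefault("Cookie", List.of()));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and cookie, not data from the real site.
        System.out.println(replayCookie("http://www.example.com/feeds", "session=abc123; Path=/"));
    }
}
```

The point of the sketch is that the caller never copies header strings around: the jar decides, per URI, which stored cookies apply to the next request.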

    import java.io.IOException;
    
    import org.apache.commons.httpclient.Cookie;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpException;
    import org.apache.commons.httpclient.HttpMethod;
    import org.apache.commons.httpclient.HttpState;
    import org.apache.commons.httpclient.cookie.CookiePolicy;
    import org.apache.commons.httpclient.methods.GetMethod;
    
    public class HttpTest {

        public static void main(String[] args) throws HttpException, IOException {
            String url = "http://www.whatarecookies.com/cookietest.asp";
            HttpClient client = new HttpClient();
            client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);

            // First GET: the server sets its cookies, which the client stores in its HttpState.
            HttpMethod method = new GetMethod(url);
            int res = client.executeMethod(method);
            System.out.println("Result: " + res);
            printCookies(client.getState());

            // Second GET through the same client: the stored cookies are sent back automatically.
            method = new GetMethod(url);
            res = client.executeMethod(method);
            System.out.println("Result: " + res);
            printCookies(client.getState());
        }

        // Prints every cookie currently held in the client's state.
        public static void printCookies(HttpState state) {
            System.out.println("Cookies:");
            Cookie[] cookies = state.getCookies();
            for (Cookie cookie : cookies) {
                System.out.println("  " + cookie.getName() + ": " + cookie.getValue());
            }
        }
    }
    

    This is the output:

    Result: 200
    Cookies:
      active_template::468: %2Fresponsive%2Fthree_column_inner_ad
      3b74de5a1c2f311bee7bca5c368aaa4e: b326b5062b2f0e69046810717534cb09
    Result: 200
    Cookies:
      active_template::468: %2Fresponsive%2Fthree_column_inner_ad%2C+3b74de5a1c2f311bee7bca5c368aaa4e%3Db326b5062b2f0e69046810717534cb09
      3b74de5a1c2f311bee7bca5c368aaa4e: b326b5062b2f0e69046810717534cb09
    

    Here is an excerpt from ngrep:

    MacBook$ sudo ngrep -W byline -d en0 "" host www.whatarecookies.com
    interface: en0 (192.168.11.0/255.255.255.0)
    filter: (ip) and ( dst host www.whatarecookies.com )
    #####
    T 192.168.11.70:56267 -> 54.228.218.117:80 [AP]
    GET /cookietest.asp HTTP/1.1.
    User-Agent: Jakarta Commons-HttpClient/3.1.
    Host: www.whatarecookies.com.
    .
    
    ####
    T 54.228.218.117:80 -> 192.168.11.70:56267 [A]
    HTTP/1.1 200 OK.
    Server: nginx/1.4.0.
    Date: Wed, 27 Nov 2013 10:22:14 GMT.
    Content-Type: text/html; charset=iso-8859-1.
    Content-Length: 36397.
    Connection: keep-alive.
    Vary: Accept-Encoding.
    Vary: Cookie,Host,Accept-Encoding.
    Set-Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad; expires=Fri, 29-Nov-2013 10:22:01 GMT; path=/; domain=whatarecookies.com; httponly.
    Set-Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09; expires=Thu, 27-Nov-2014 10:22:01 GMT.
    X-Middleton-Response: 200.
    Cache-Control: max-age=0, no-cache.
    X-Mod-Pagespeed: 1.7.30.1-3609.
    .
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
    ...
    
    ##
    T 192.168.11.70:56267 -> 54.228.218.117:80 [AP]
    GET /cookietest.asp HTTP/1.1.
    User-Agent: Jakarta Commons-HttpClient/3.1.
    Host: www.whatarecookies.com.
    Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad.
    Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09.
    .
    
    ##
    T 54.228.218.117:80 -> 192.168.11.70:56267 [A]
    HTTP/1.1 200 OK.
    Server: nginx/1.4.0.
    Date: Wed, 27 Nov 2013 10:22:18 GMT.
    Content-Type: text/html; charset=iso-8859-1.
    Content-Length: 54474.
    Connection: keep-alive.
    Vary: Accept-Encoding.
    Vary: Cookie,Host,Accept-Encoding.
    Set-Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad%2C+3b74de5a1c2f311bee7bca5c368aaa4e%3Db326b5062b2f0e69046810717534cb09; expires=Fri, 29-Nov-2013 10:22:05 GMT; path=/; domain=whatarecookies.com; httponly.
    Set-Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09; expires=Thu, 27-Nov-2014 10:22:05 GMT.
    X-Middleton-Response: 200.
    Cache-Control: max-age=0, no-cache.
    X-Mod-Pagespeed: 1.7.30.1-3609.
    .
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
    ...
    
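The excerpt shows what is sent back: only the `name=value` part of each `Set-Cookie` line, without `expires`, `path`, `domain` or `httponly`. If the header has to be built by hand, as the question proposes, the conversion could look roughly like the sketch below. It uses the JDK's `java.net.HttpCookie` parser; the class name `SetCookieToCookieSketch` and the method `toCookieHeader` are made up for this illustration. Unlike the Commons HttpClient trace above, which sends one `Cookie:` line per cookie, the sketch joins all pairs into a single header value separated by `"; "`, the form browsers commonly use. Note also that the question's `getFirstHeader("Set-Cookie")` returns only the first of possibly several `Set-Cookie` headers.

```java
import java.net.HttpCookie;
import java.util.List;
import java.util.StringJoiner;

public class SetCookieToCookieSketch {

    /** Builds one Cookie request-header value from the Set-Cookie values of a response. */
    public static String toCookieHeader(List<String> setCookieValues) {
        StringJoiner pairs = new StringJoiner("; ");
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                // Only name and value go back to the server; the attributes
                // (expires, path, domain, httponly) are instructions for the client.
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return pairs.toString();
    }

    public static void main(String[] args) {
        // The two Set-Cookie values from the first response in the ngrep excerpt.
        List<String> fromFirstResponse = List.of(
            "active_template::468=%2Fresponsive%2Fthree_column_inner_ad; "
                + "expires=Fri, 29-Nov-2013 10:22:01 GMT; path=/; domain=whatarecookies.com; httponly",
            "3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09; "
                + "expires=Thu, 27-Nov-2014 10:22:01 GMT");
        System.out.println("Cookie: " + toCookieHeader(fromFirstResponse));
    }
}
```

The resulting string could then be passed to something like `request.setHeader("Cookie", ...)` on the second request, although letting the client's own cookie store do this is usually less error-prone.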

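    【Comments】:

    • So I use CookieOrigin cookieOrigin = context.getCookieOrigin(); CookieSpec cookieSpec = context.getCookieSpec(); to get the cookie information, and then send it with httpclient.execute(httpget, context); (together with the lines above)?
    • The cookies should be sent automatically. That is part of httpclient. A CookiePolicy can override how cookies are handled.
    • Thanks! I'll give it a try.
    • I had to correct my answer, because that was for Apache HttpClient >= 4. For HttpClient 3.x, the Cookies page states: HttpClient supports automatic management of cookies, including allowing the server to set cookies and automatically return them to the server when required. No context is needed.
    • I can't manage to make a correct request. Could you help me by showing the exact code needed for the cookie part?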
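For the cookie part asked about in the last comment, here is a self-contained sketch of the two-request flow. It deliberately avoids both Apache libraries and uses only the JDK (Java 11 or later): `java.net.http.HttpClient` with a `CookieManager` as its cookie handler, talking to a throwaway local server. That server is a hypothetical stand-in for the real site, and the names `TwoRequestSketch`, `startStandInServer`, `fetchTwice`, `demo` and the cookie `session=abc123` are assumptions made for this example. Two details of the question's code are worth comparing against it: `CookieHandler.setDefault(...)` configures the JDK's own URL connections, not Apache HttpClient, and the question sends only one request, so there is no earlier response from which a cookie could have been stored.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.CookieManager;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TwoRequestSketch {

    /** Hypothetical stand-in for the real site: sets a cookie when none is sent and reports what it saw. */
    public static HttpServer startStandInServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String seen = exchange.getRequestHeaders().getFirst("Cookie");
            if (seen == null) {
                exchange.getResponseHeaders().add("Set-Cookie", "session=abc123; Path=/");
            }
            byte[] body = ("cookie seen by server: " + seen).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends the same GET twice through ONE client, so both requests share its cookie jar. */
    public static String[] fetchTwice(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .cookieHandler(new CookieManager())
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // Request 1: no cookie yet; the response's Set-Cookie is stored by the jar.
        String first = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        // Request 2: the stored cookie is attached by the client itself.
        String second = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        return new String[] {first, second};
    }

    /** Runs the round trip against the stand-in server and returns both response bodies. */
    public static String[] demo() throws Exception {
        HttpServer server = startStandInServer();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            return fetchTwice(uri);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        for (String body : demo()) {
            System.out.println(body);
        }
    }
}
```

The design choice that matters is reusing one client instance for both requests. For the real site, the first request would presumably go to the search page used as the Referer in the question and the second to the feeds URL, but which page actually sets the cookie is not something this thread confirms.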