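【Question title】: Setting proxy in Goutte
【Posted】: 2016-06-18 19:41:14
【Question】:

I tried to set a proxy by following Guzzle's documentation, but it does not work. Goutte's official GitHub page is dead, so I could not find anything there.

Does anyone know how to set a proxy?

This is what I tried: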

$client = new Client();
$client->setHeader('User-Agent', $user_agent);
$crawler = $client->request('GET', $request, ['proxy' => $proxy]);

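【Question comments】:

  • Did you find an answer? I am facing the same problem.
  • No. I just went back to PHP cURL. It works better and produces fewer errors.

Tags: php web-scraping goutte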


【Solution 1】:

I solved the problem like this:

    $url = 'https://api.myip.com';
    $client = new \Goutte\Client;
    $client->setClient(new \GuzzleHttp\Client(['proxy' => 'http://xx.xx.xx.xx:8080']));
    $get_html = $client->request('GET', $url)->html();
    var_dump($get_html);
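
To confirm that the proxy is actually being used, the address reported by the IP-echo service can be compared with the machine's own public address. The following is only a minimal sketch: `extractReportedIp()` and `proxyIsInEffect()` are hypothetical helpers, and the JSON shape `{"ip": "..."}` is an assumption about what the service returns.

```php
<?php
// Hypothetical helper (not part of Goutte or Guzzle): pull the reported IP
// out of an IP-echo response body.
// Assumption: the body is JSON shaped like {"ip":"203.0.113.7", ...}.
function extractReportedIp(string $body): ?string
{
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['ip']) || !is_string($data['ip'])) {
        return null;
    }

    // Accept only syntactically valid IPv4/IPv6 addresses.
    return filter_var($data['ip'], FILTER_VALIDATE_IP) !== false ? $data['ip'] : null;
}

// Hypothetical helper: treat the proxy as in effect when the service sees a
// valid address that differs from the machine's own public address.
function proxyIsInEffect(string $body, string $directIp): bool
{
    $reported = extractReportedIp($body);

    return $reported !== null && $reported !== $directIp;
}
```

With a BrowserKit-based client, the raw body should be available through `$client->getResponse()->getContent()`, which is easier to decode than the crawler's HTML rendering.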

【Comments】:

  • Check this issue: newer versions seem to require a different configuration. Tested with PHPUnit 9.5.2.

【Solution 2】:

You have the right idea, but in Goutte\Client::doRequest(), when the Guzzle request is created

$guzzleRequest = $this->getClient()->createRequest(
        $request->getMethod(),
        $request->getUri(),
        $headers,
        $body
);

no options are passed to the request object. So if you want to use a proxy, extend the Goutte\Client class, override its doRequest() method, and replace that code with

$guzzleRequest = $this->getClient()->createRequest(
        $request->getMethod(),
        $request->getUri(),
        $headers,
        $body,
        $request->getParameters()
);

Example override class:

<?php

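namespace igancev\override;

// Assumption: these are the Guzzle 3 exception classes imported by the Goutte
// client this method was copied from. Without the imports, the catch blocks
// below would resolve the names inside this namespace and never match.
use Guzzle\Http\Exception\BadResponseException;
use Guzzle\Http\Exception\CurlException;

class Client extends \Goutte\Client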
{
    protected function doRequest($request)
    {
        $headers = array();
        foreach ($request->getServer() as $key => $val) {
            $key = implode('-', array_map('ucfirst', explode('-', strtolower(str_replace(array('_', 'HTTP-'), array('-', ''), $key)))));
            if (!isset($headers[$key])) {
                $headers[$key] = $val;
            }
        }

        $body = null;
        if (!in_array($request->getMethod(), array('GET','HEAD'))) {
            if (null !== $request->getContent()) {
                $body = $request->getContent();
            } else {
                $body = $request->getParameters();
            }
        }

        $guzzleRequest = $this->getClient()->createRequest(
            $request->getMethod(),
            $request->getUri(),
            $headers,
            $body,
            $request->getParameters()
        );

        foreach ($this->headers as $name => $value) {
            $guzzleRequest->setHeader($name, $value);
        }

        if ($this->auth !== null) {
            $guzzleRequest->setAuth(
                $this->auth['user'],
                $this->auth['password'],
                $this->auth['type']
            );
        }

        foreach ($this->getCookieJar()->allRawValues($request->getUri()) as $name => $value) {
            $guzzleRequest->addCookie($name, $value);
        }

        if ('POST' == $request->getMethod() || 'PUT' == $request->getMethod()) {
            $this->addPostFiles($guzzleRequest, $request->getFiles());
        }

        $guzzleRequest->getParams()->set('redirect.disable', true);
        $curlOptions = $guzzleRequest->getCurlOptions();

        if (!$curlOptions->hasKey(CURLOPT_TIMEOUT)) {
            $curlOptions->set(CURLOPT_TIMEOUT, 30);
        }

        // Let BrowserKit handle redirects
        try {
            $response = $guzzleRequest->send();
        } catch (CurlException $e) {
            if (!strpos($e->getMessage(), 'redirects')) {
                throw $e;
            }

            $response = $e->getResponse();
        } catch (BadResponseException $e) {
            $response = $e->getResponse();
        }

        return $this->createResponse($response);
    }
}

Then try sending the request:

$client = new \igancev\override\Client();
$proxy = 'http://149.56.85.17:8080'; // free proxy example
$crawler = $client->request('GET', $request, ['proxy' => $proxy]);

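【Solution 3】:

You can create a custom Guzzle client and assign it to the Goutte client. When Goutte makes a request through Guzzle, the client's default configuration is used. That configuration is passed to the Guzzle constructor.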

    $guzzle = new \GuzzleHttp\Client(['proxy' => 'http://192.168.1.1:8080']);
    $goutte = new \Goutte\Client();
    $goutte->setClient($guzzle);
    $crawler = $goutte->request($method, $url);
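
Because the proxy lives in the Guzzle client's default configuration, the same array can also carry per-scheme settings. The sketch below assumes a Guzzle version (6 or later) whose `proxy` option accepts an array with `http`, `https` and `no` keys. `buildGuzzleProxyConfig()` is a hypothetical helper, not part of either library.

```php
<?php
// Hypothetical helper: build the constructor options for \GuzzleHttp\Client.
// Assumption: the installed Guzzle accepts the array form of the 'proxy'
// option ('http', 'https', and 'no' for hosts that bypass the proxy).
function buildGuzzleProxyConfig(
    string $httpProxy,
    ?string $httpsProxy = null,
    array $noProxyHosts = []
): array {
    $proxy = [
        'http'  => $httpProxy,
        // Fall back to the HTTP proxy when no separate HTTPS proxy is given.
        'https' => $httpsProxy ?? $httpProxy,
    ];

    if ($noProxyHosts !== []) {
        $proxy['no'] = array_values($noProxyHosts);
    }

    return ['proxy' => $proxy];
}

$config = buildGuzzleProxyConfig('http://192.168.1.1:8080', null, ['localhost']);
```

The resulting `$config` would then take the place of the inline array, as in `new \GuzzleHttp\Client($config)`, before the client is handed to `setClient()`.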
    

【Solution 4】:

You can pass the proxy directly in a Goutte or Guzzle request:

    // Assumed imports for the class aliases used below
    use Goutte\Client as GoutteClient;
    use GuzzleHttp\Client;

    $proxy = 'xx.xx.xx.xx:xxxx';

    $goutte = new GoutteClient();
    echo $goutte->request('GET', 'https://example.com/', ['proxy' => $proxy])->html();

The same approach works in Guzzle:

    $Guzzle = new Client();
    $GuzzleResponse = $Guzzle->request('GET', 'https://example.com/', ['proxy' => $proxy]);
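
The proxy here is written as a bare `host:port`, while the other answers include an explicit `http://` scheme. As a minimal sketch, the value could be normalized before it is handed to the client. `normalizeProxy()` is a hypothetical helper, and defaulting to `http` is an assumption, not something either library requires.

```php
<?php
// Hypothetical helper: turn "host:port" into "scheme://host:port" and reject
// values that lack a host or a port.
function normalizeProxy(string $proxy, string $defaultScheme = 'http'): string
{
    $proxy = trim($proxy);
    if ($proxy === '') {
        throw new InvalidArgumentException('Proxy address must not be empty.');
    }

    // Prepend the default scheme only when none is present.
    if (strpos($proxy, '://') === false) {
        $proxy = $defaultScheme . '://' . $proxy;
    }

    $parts = parse_url($proxy);
    if ($parts === false || empty($parts['host']) || empty($parts['port'])) {
        throw new InvalidArgumentException("Proxy must include a host and a port: {$proxy}");
    }

    return $proxy;
}
```

For example, `normalizeProxy('10.0.0.1:8080')` yields `http://10.0.0.1:8080`, and a value that already carries a scheme is returned unchanged apart from trimming.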
      

【Solution 5】:

For the latest version, use the following. The Goutte client instance extends Symfony\Component\BrowserKit\HttpBrowser:

    use Symfony\Component\HttpClient\HttpClient;
    use Goutte\Client;

    $client = new Client(HttpClient::create(['proxy' => 'http://xx.xx.xx.xx:80']));
    ...
        
