【Question title】: PHP cURL request not working but works fine in Postman
【Posted】: 2018-03-04 18:51:04
【Question description】:

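I am trying to log in to the MCA portal (POST URL: http://www.mca.gov.in/mcafoportal/loginValidateUser.do).

I tried logging in with the Postman app for Google Chrome, and it works fine. However, it does not work from PHP or Python: I cannot log in from either.

Here is the PHP code: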

$url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do"; 

$post_fields = array();

$post_fields['userNamedenc']='hGJfsdnk`1t';
$post_fields['passwordenc']='675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
$post_fields['accessCode'] = "";

$str = call_post_mca($url, $post_fields);
$str = str_replace(" ","",$str);   

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$input_id =  '//input[@id="login_accessCode"]/@value';
$input_val = $xpath->query($input_id)->item(0);
$input_val1 = $input_val->nodeValue;

$url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do"; 

$post_fields['userNamedenc']='hGJfsdnk`1t';
$post_fields['passwordenc']='675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
$post_fields['accessCode'] = $input_val1;  //New Accesscode 

function  call_post_mca($url, $params)
{   
    #$user_agent = getRandomUserAgent();
    $user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";
    $str = "";
    foreach($params as $key=>$value)
    {
        $str = $str . "$key=$value" . "&";
    }
    $postData = rtrim($str, "&");

    $ch = curl_init();  

    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_HEADER, false); 
    #curl_setopt($ch, CURLOPT_CAINFO, DOC_ROOT . '/includes/cacert.pem');

    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);   


    curl_setopt($ch,CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_REFERER, $url); 

    $cookie= DOC_ROOT . "/cookie.txt";
    curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
    curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie); 

    $output=curl_exec($ch);

    curl_close($ch);
    return $output;

}

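Any idea what is missing?

【Discussion】:

  • What error do you get? "Not working" is too generic
  • Yes @Tarun, I should be able to log in via cURL. It works in Postman, but with PHP/cURL it fails and still shows the login page
  • Yes @Kiran, but let me ask you again: what error are you getting?
  • There is no error. It just does not log in. It shows the login page again, without any error. Since the username/password are correct, it should serve the dashboard page (which is what happens in Postman)
  • This needs cookie handling: you first have to fetch login.do to get a fresh accessCode, and only then submit to the login URL. Your code is still wrong

Tags: php codeigniter curl web-scraping postman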


【Solution 1】:

The website issues a redirect, so you need to add

CURLOPT_FOLLOWLOCATION => 1

to your options array. When in doubt about cURL, try

$status = curl_getinfo($curl);
echo json_encode($status, JSON_PRETTY_PRINT);

which gives:

{
"url": "http:\/\/www.mca.gov.in\/mcafoportal\/loginValidateUser.do?userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=-825374456",
"content_type": "text\/plain",
"http_code": 302,
"header_size": 1560,
"request_size": 245,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 0,
"total_time": 1.298891,
"namelookup_time": 0.526375,
"connect_time": 0.999786,
"pretransfer_time": 0.999844,
"size_upload": 0,
"size_download": 0,
"speed_download": 0,
"speed_upload": 0,
"download_content_length": 0,
"upload_content_length": -1,
"starttransfer_time": 1.298875,
"redirect_time": 0,
"redirect_url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
"primary_ip": "115.114.108.120",
"certinfo": [],
"primary_port": 80,
"local_ip": "192.168.1.54",
"local_port": 62524
}

As you can see, you get a 302 redirect status, but redirect_count is 0. After adding the option, I get:

{
"url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
"content_type": "text\/html;charset=ISO-8859-1",
"http_code": 200,
"header_size": 3131,
"request_size": 376,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 1,
"total_time": 2.383609,
"namelookup_time": 1.7e-5,
"connect_time": 1.7e-5,
"pretransfer_time": 4.4e-5,
"size_upload": 0,
"size_download": 42380,
"speed_download": 17779,
"speed_upload": 0,
"download_content_length": 42380,
"upload_content_length": -1,
"starttransfer_time": 0.30734,
"redirect_time": 0.915858,
"redirect_url": "",
"primary_ip": "14.140.191.120",
"certinfo": [],
"primary_port": 80,
"local_ip": "192.168.1.54",
"local_port": 62642
}
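
The difference between the two dumps can also be checked in code rather than by eye. The following is a minimal sketch that relies only on the array shape returned by curl_getinfo(); the function name describe_transfer is made up for illustration and is not part of cURL.

```php
<?php
// Hypothetical helper: summarise a curl_getinfo() array.
// It flags a 3xx response whose redirect_url was reported but not followed.
function describe_transfer(array $info): string
{
    $code  = $info['http_code'] ?? 0;
    $count = $info['redirect_count'] ?? 0;
    $next  = $info['redirect_url'] ?? '';

    if ($code >= 300 && $code < 400 && $next !== '') {
        return "redirect to $next was not followed";
    }
    if ($count > 0) {
        return "followed $count redirect(s), final status $code";
    }
    return "no redirect, status $code";
}
```

Note that a followed redirect ending on login.do with status 200 says nothing about whether the login itself succeeded.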

EDIT: send the URL-encoded parameters in the request body, and follow redirects

// The parameters are already URL-encoded (%60 is the backtick), so the string is
// sent as-is; urlencode() on the whole string would also escape "=" and "&".
$str = "userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=-825374456";
$curl = curl_init();
curl_setopt_array(
    $curl , array (
    CURLOPT_URL            => "http://www.mca.gov.in/mcafoportal/loginValidateUser.do" , // <- removed parameters here
    CURLOPT_RETURNTRANSFER => true ,
    CURLOPT_ENCODING       => "" ,
    CURLOPT_FOLLOWLOCATION => 1 ,
    CURLOPT_MAXREDIRS      => 10 ,
    CURLOPT_TIMEOUT        => 30 ,
    CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1 ,
    CURLOPT_CUSTOMREQUEST  => "POST" ,
    CURLOPT_POSTFIELDS     => $str,       // <- added this here
    CURLOPT_HTTPHEADER     => array (
        "cache-control: no-cache"
    ) ,
)
);

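【Discussion】:

  • I tried this option in the cURL options, but it still returns the login page.
  • In case it helps: the access code keeps changing, so I need to fetch it with a first request and then send a second request that includes the access code
  • Unfortunately the login page is buggy: it returns HTTP 200 OK even when the login fails. It should respond with HTTP 403 Forbidden; someone should contact them and help them fix it. Here is a more reliable way to detect login errors: $domd = @DOMDocument::loadHTML ( $html ); $xp = new DOMXPath ( $domd ); $loginErrors = $xp->query ( '//ul[@class="errorMessage"]' ); if ($loginErrors->length > 0) { echo 'encountered following error(s) logging in: '; foreach ( $loginErrors as $err ) { echo $err->textContent, PHP_EOL; } die (); } else { echo "logged in successfully!"; }. TL;DR: 200 OK != success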
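
The error check from the last comment can be wrapped in a small function so it is reusable after every login attempt. This is a sketch, not the commenter's code: the name find_login_errors is invented here, and it assumes the site keeps reporting failures in a ul element with class errorMessage, as the comment describes.

```php
<?php
// Returns the text of every <ul class="errorMessage"> found in the page.
// An empty array only means that no error list was found.
function find_login_errors(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $errors = [];
    foreach ($xpath->query('//ul[@class="errorMessage"]') as $node) {
        $errors[] = trim($node->textContent);
    }
    return $errors;
}
```

An empty result is best combined with a positive check, such as looking for the expected welcome text, before treating the login as successful.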
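【Solution 2】:

Reproducing the problem

I did the same thing in Postman as in your screenshot, but could not log in:

The only difference I can see is that your request includes a cookie, which I suspect is why you can log in without all the other input fields. And there appear to be many input fields:

Using Postman

So I used Postman Interceptor to capture all the fields sent during login, including the captcha and the access code, and I was able to log in:

Update 1

I found that once you have solved the captcha and logged in, you can log out and then log in again without displayCaptcha and userEnteredCaptcha in the form data, provided you use the same cookie as for the successful login. You only need to get a valid accessCode from the login page.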
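
Pulling a fresh accessCode out of the login page could look like the sketch below. It assumes the value lives in an input element named accessCode, which is the field name used in this thread; the function name extract_access_code is hypothetical.

```php
<?php
// Returns the value of the first <input name="accessCode">, or null if absent.
function extract_access_code(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="accessCode"]/@value');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}
```

The returned code is only useful when it is submitted with the same cookie session that fetched the page.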

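【Discussion】:

  • I would also suggest that you change your password after this.
  • This does not look like a solution. Could you please provide the code with which you were able to log in?
  • Sorry, the catch is that I can only log in after supplying the accessCode and the captcha. If you need the code, check pastebin.com/GjLYGqPf. Note, though, that I had to put the accessCode and captcha values into the script by hand to make it work. If you need it to be automatic, you need a captcha solver such as captchatronix.com/api.php
  • This code does not actually work. Does it work in your setup?
  • Yes, but as I mentioned, the only way it works is: 1. entering the accessCode and captcha into the script by hand, and 2. copying the cookie from the browser (in which I viewed the captcha) into cookies.txt.
【Solution 3】:

@yvesleborg and @tarun-lalwani gave the right hints: you need to handle cookies and redirects. Even so, it did not always work for me. My guess is that the site operator requires some delay between the two requests.

I rewrote your code a bit to experiment with it. mycurl.php: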

function my_curl_init() {
    $url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do"; 
    $user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";

    $ch = curl_init();  

    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    return $ch;
}

/*
 * first call in order to get accessCode and sessionCookie
 */
$ch = my_curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, __DIR__ . "/cookie.txt"); // else cookielist is empty

$output = curl_exec($ch);

file_put_contents(__DIR__ . '/loginValidateUser.html', $output);

// save cookie info
$cookielist = curl_getinfo($ch, CURLINFO_COOKIELIST);
//print_r($cookielist);

curl_close($ch);

// parse accessCode from output
$re = '/\<input.*name="accessCode".*value="([-0-9]+)"/';
preg_match_all($re, $output, $matches, PREG_SET_ORDER, 0);
if ($matches) {
    $accessCode = $matches[0][1];

    // debug
    echo "accessCode: $accessCode" . PHP_EOL;   


    /*
     * second call in order to login
     */ 

    $post_fields = array(
        'userNamedenc' => 'hGJfsdnk`1t',
        'passwordenc'  => '675894242fa9c66939d9fcf4d5c39d1830f4ddb9',
        'accessCode'   => $accessCode
    );

    $cookiedata = preg_split('/\s+/', $cookielist[0]);
    $session_cookie = $cookiedata[5] . '=' . $cookiedata[6];

    // debug
    echo "sessionCookie: $session_cookie" . PHP_EOL;
    file_put_contents(__DIR__ . '/cookie2.txt', $session_cookie);

    /* 
     * !!! pause !!!
     */  
    sleep(20);

    // debug
    echo "curl -v -L -X POST -b '$session_cookie;' --data 'userNamedenc=hGJfsdnk`1t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=$accessCode' http://www.mca.gov.in/mcafoportal/loginValidateUser.do > loginValidateUser2.html";

    $ch = my_curl_init();  

    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);   

    curl_setopt($ch, CURLOPT_COOKIE, $session_cookie);

    $output = curl_exec($ch);

    file_put_contents(__DIR__ . '/loginValidateUser2.html', $output);

    curl_close($ch);
}

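The script makes two requests to the website. The output of the first is used to read the accessCode and to store the session cookie. Then, after a short pause, the second request is sent with the accessCode, the session information, and the login credentials.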
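
The cookie handling in the script depends on the layout of the entries returned for CURLINFO_COOKIELIST: each entry is a line in Netscape cookie-file format with seven tab-separated fields, the last two being name and value. A stricter version of that split might look like this sketch; the function name is invented, and the JSESSIONID cookie in the usage comment is only an assumed example, not something confirmed for this site.

```php
<?php
// Turns one Netscape-format cookie line into a "name=value" pair for
// CURLOPT_COOKIE. Returns null when the line does not have seven fields.
function cookie_pair_from_line(string $line): ?string
{
    // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
    $fields = explode("\t", $line);
    if (count($fields) < 7) {
        return null;
    }
    return $fields[5] . '=' . $fields[6];
}

// Usage with an assumed cookie name:
// cookie_pair_from_line("www.mca.gov.in\tFALSE\t/\tFALSE\t0\tJSESSIONID\tabc123");
```

An alternative that avoids manual parsing is to reuse one cURL handle for both requests and set CURLOPT_COOKIEFILE to an empty string, which turns on cURL's in-memory cookie engine.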

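I tested it from a terminal (php -f mycurl.php) with PHP 5.6. The script prints all the relevant debug information, outputs a curl command that can be run in a terminal, and logs the HTML and cookie information to files in the same folder as the script.

Running the script too often does not work: the login will fail. So take your time and wait a few minutes between attempts. Or change your IP ;)

Hope that helps.

【Discussion】:

  • Strange. In my tests, if you try to log in without solving the captcha you get this error: Enter valid Letters shown. PS: you can check for login errors like this: $domd = @DOMDocument::loadHTML ( $html ); $xp = new DOMXPath ( $domd ); $loginErrors = $xp->query ( '//ul[@class="errorMessage"]' ); if ($loginErrors->length > 0) { echo 'encountered following error(s) logging in: '; foreach ( $loginErrors as $err ) { echo $err->textContent, PHP_EOL; } die (); } else { echo "logged in successfully!"; }
  • Strange indeed. I tested it again, and the file loginValidateUser2.html generated by my script above contains Welcome Kiran. So the captcha does not seem to be mandatory for logging in programmatically (at least not for me...).
【Solution 4】:

"it doesnt work either in PHP/Python": as others have already pointed out, that is because you are using your browser's existing cookie session, in which the captcha has already been solved. Clear your browser cookies, get a fresh cookie session, and do not solve the captcha, and Postman will not be able to log in either.

"Any idea what is missing ?" Several things, among them a number of login POST parameters (browserFlag, loginType, __checkbox_dscBasedLoginFlag, and more). Your encoding loop is also buggy: $str = $str . "$key=$value" . "&"; pretty much only works as long as both keys and values contain nothing but [a-zA-Z0-9] characters, and since your userNamedenc contains a grave accent character (`), the loop is not sufficient. A fixed loop would be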

foreach($params as $key=>$value){
    $str = $str . urlencode($key)."=".urlencode($value) . "&";
}
$str=substr($str,0,-1);

However, this is exactly why the http_build_query function exists; the whole loop and the trimming after it can be replaced with this one line:

$str=http_build_query($params);
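
To make the difference concrete, the short sketch below builds the body both ways from the values posted in the question. The variable names $naive and $encoded are chosen here for illustration only.

```php
<?php
$params = [
    'userNamedenc' => 'hGJfsdnk`1t',
    'passwordenc'  => '675894242fa9c66939d9fcf4d5c39d1830f4ddb9',
    'accessCode'   => '',
];

// Naive concatenation, as in the question: the backtick is sent unencoded.
$naive = '';
foreach ($params as $key => $value) {
    $naive .= "$key=$value&";
}
$naive = rtrim($naive, '&');

// http_build_query() percent-encodes keys and values (the backtick becomes %60).
$encoded = http_build_query($params);

echo $naive, PHP_EOL;
echo $encoded, PHP_EOL;
```

The encoded form matches the hGJfsdnk%601t spelling that appears in the Postman-generated URL elsewhere in this thread.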

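Also, you appear to be trying to log in without a pre-existing cookie session, which will not work. When you make a GET request to the login page you receive a cookie and a unique captcha; the captcha answer is tied to your cookie session and has to be solved before you attempt to log in, and you have not provided any code for handling the captcha. In addition, when you parse the "userName" input element it defaults to "Enter Username"; JavaScript empties it and userNamedenc is used instead, and you have to replicate this in PHP. There is also an input element named "dscBasedLoginFlag" that is removed by JavaScript, which you have to do in PHP as well. There is an input element named "Cert" that has a default value, but that value is cleared by JavaScript, so do the same in PHP. And there is an input element named "newUserRegistration" that is removed by JavaScript, so remove it too.

Here is what you should do: make a GET request to the login page, save the cookie session and make sure to send it with all further requests, then parse out all of the login form's elements and add them to your login request (but careful: there are 2 forms, and 1 of them belongs to the search bar; only parse the children of the login form and do not mix the two). Remember to clear or remove the special input tags to emulate the JavaScript, as described above. Then make a GET request to the captcha URL, making sure to send the session cookie, and solve the captcha. Finally, send the login request with the captcha answer, userNamedenc, passwordenc, and all the other elements parsed out of the login page. That should work. As for solving the captcha programmatically: the captcha does not look too hard, and cracking it could probably be automated, but until someone actually does that, you can use Deathbycaptcha to do it for you. Note that it is not a free service.

Here is a fully tested, working example implementation, using my hhb_curl library (from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php) and the Deathbycaptcha API: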

<?php
declare(strict_types = 1);
header ( "content-type: text/plain;charset=utf8" );
require_once ('hhb_.inc.php');
const DEATHBYCATPCHA_USERNAME = '?';
const DEATHBYCAPTCHA_PASSWORD = '?';
$hc = new hhb_curl ( '', true );
$hc->setopt(CURLOPT_TIMEOUT,20);// im on a really slow net atm :(
$html = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/login.do' )->getResponseBody (); // cookie session etc
$domd = new DOMDocument (); @$domd->loadHTML ( $html ); // instance call: static loadHTML() is an error on PHP 8
$inputs = getDOMDocumentFormInputs ( $domd, true ) ['login'];
$params = [ ];
foreach ( $inputs as $tmp ) {
    $params [$tmp->getAttribute ( "name" )] = $tmp->getAttribute ( "value" );
}
assert ( isset ( $params ['userNamedenc'] ), 'username input not found??' );
assert ( isset ( $params ['passwordenc'] ), 'passwordenc input not found??' );
$params ['userName'] = ''; // defaults to "Enter Username", cleared with javascript
unset ( $params ['dscBasedLoginFlag'] ); // removed with javascript
$params ['Cert'] = ''; // cleared to emptystring with javascript
unset ( $params ['newUserRegistration'] ); // removed with javascript
unset ( $params ['SelectCert'] ); // removed with javascript
$params ['userNamedenc'] = 'hGJfsdnk`1t';
$params ['passwordenc'] = '675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
echo 'parsed login parameters: ';
var_dump ( $params );
$captchaRaw = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/getCapchaImage.do' )->getResponseBody ();
$params ['userEnteredCaptcha'] = solve_captcha2 ( $captchaRaw );
// now actually logging in.
$html = $hc->setopt_array ( array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query ( $params ) 
) )->exec ( 'http://www.mca.gov.in/mcafoportal/loginValidateUser.do' )->getResponseBody ();
var_dump ( $hc->getStdErr (), $hc->getStdOut () ); // printing debug data
$domd = new DOMDocument (); @$domd->loadHTML ( $html ); // instance call: static loadHTML() is an error on PHP 8
$xp = new DOMXPath ( $domd );
$loginErrors = $xp->query ( '//ul[@class="errorMessage"]' );
if ($loginErrors->length > 0) {
    echo 'encountered following error(s) logging in: ';
    foreach ( $loginErrors as $err ) {
        echo $err->textContent, PHP_EOL;
    }
    die ();
}
echo "logged in successfully!";
/**
 * solves the captcha manually, by doing: echo ANSWER>captcha.txt
 *
 * @param string $raw_image
 *          raw image bytes
 * @return string answer
 */
function solve_captcha2(string $raw_image): string {
    $imagepath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.png';
    $answerpath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.txt';
    @unlink ( $imagepath );
    @unlink ( 'captcha.txt' );
    file_put_contents ( $imagepath, $raw_image );
    echo 'the captcha is saved in ' . $imagepath . PHP_EOL;
    echo ' waiting for you to solve it by doing: echo ANSWER>' . $answerpath, PHP_EOL;
    while ( true ) {
        sleep ( 1 );
        if (file_exists ( $answerpath )) {
            $answer = trim ( file_get_contents ( $answerpath ) );
            echo 'solved: ' . $answer, PHP_EOL;
            return $answer;
        }
    }
}
function solve_captcha(string $raw_image): string {
    echo 'solving captcha, hang on, with DEATBYCAPTCHA this usually takes between 10 and 20 seconds.';
    {
        // unfortunately, CURLFile requires a filename, it wont accept a string, so make a file of it
        $tmpfileh = tmpfile ();
        fwrite ( $tmpfileh, $raw_image ); // TODO: error checking (incomplete write or whatever)
        $tmpfile = stream_get_meta_data ( $tmpfileh ) ['uri'];
    }
    $hc = new hhb_curl ( '', true );
    $hc->setopt_array ( array (
            CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
            CURLOPT_POSTFIELDS => array (
                    'username' => DEATHBYCATPCHA_USERNAME,
                    'password' => DEATHBYCAPTCHA_PASSWORD,
                    'captchafile' => new CURLFile ( $tmpfile, 'image/png', 'captcha.png' ) 
            ) 
    ) )->exec ();
    fclose ( $tmpfileh ); // when tmpfile() is fclosed(), its also implicitly deleted.
    $statusurl = $hc->getinfo ( CURLINFO_EFFECTIVE_URL ); // status url is given in a http 300x redirect, which hhb_curl auto-follows
    while ( true ) {
        // wait for captcha to be solved.
        sleep ( 10 );
        echo '.';
        $json = $hc->setopt_array ( array (
                CURLOPT_HTTPHEADER => array (
                        'Accept: application/json' 
                ),
                CURLOPT_HTTPGET => true 
        ) )->exec ()->getResponseBody ();
        $parsed = json_decode ( $json, false );
        if (! empty ( $parsed->captcha )) {
            echo 'captcha solved!: ' . $parsed->captcha, PHP_EOL;
            return $parsed->captcha;
        }
    }
}
function getDOMDocumentFormInputs(\DOMDocument $domd, bool $getOnlyFirstMatches = false): array {
    // :DOMNodeList?
    $forms = $domd->getElementsByTagName ( 'form' );
    $parsedForms = array ();
    $isDescendantOf = function (\DOMNode $decendant, \DOMNode $ele): bool {
        $parent = $decendant;
        while ( NULL !== ($parent = $parent->parentNode) ) {
            if ($parent === $ele) {
                return true;
            }
        }
        return false;
    };
    // i can't use array_merge on DOMNodeLists :(
    $merged = function () use (&$domd): array {
        $ret = array ();
        foreach ( $domd->getElementsByTagName ( "input" ) as $input ) {
            $ret [] = $input;
        }
        foreach ( $domd->getElementsByTagName ( "textarea" ) as $textarea ) {
            $ret [] = $textarea;
        }
        foreach ( $domd->getElementsByTagName ( "button" ) as $button ) {
            $ret [] = $button;
        }
        return $ret;
    };
    $merged = $merged ();
    foreach ( $forms as $form ) {
        $inputs = function () use (&$domd, &$form, &$isDescendantOf, &$merged): array {
            $ret = array ();
            foreach ( $merged as $input ) {
                // hhb_var_dump ( $input->getAttribute ( "name" ), $input->getAttribute ( "id" ) );
                if ($input->hasAttribute ( "disabled" )) {
                    // ignore disabled elements?
                    continue;
                }
                $name = $input->getAttribute ( "name" );
                if ($name === '') {
                    // echo "inputs with no name are ignored when submitted by mainstream browsers (presumably because of specs)... follow suite?", PHP_EOL;
                    continue;
                }
                if (! $isDescendantOf ( $input, $form ) && $form->getAttribute ( "id" ) !== '' && $input->getAttribute ( "form" ) !== $form->getAttribute ( "id" )) {
                    // echo "this input does not belong to this form.", PHP_EOL;
                    continue;
                }
                if (! array_key_exists ( $name, $ret )) {
                    $ret [$name] = array (
                            $input 
                    );
                } else {
                    $ret [$name] [] = $input;
                }
            }
            return $ret;
        };
        $inputs = $inputs (); // sorry about that, Eclipse gets unstable on IIFE syntax.
        $hasName = true;
        $name = $form->getAttribute ( "id" );
        if ($name === '') {
            $name = $form->getAttribute ( "name" );
            if ($name === '') {
                $hasName = false;
            }
        }
        if (! $hasName) {
            $parsedForms [] = array (
                    $inputs 
            );
        } else {
            if (! array_key_exists ( $name, $parsedForms )) {
                $parsedForms [$name] = array (
                        $inputs 
                );
            } else {
                $parsedForms [$name] [] = $inputs; // was $tmp, which is undefined in this function
            }
        }
    }
    unset ( $form, $tmp, $hasName, $name, $i, $input );
    if ($getOnlyFirstMatches) {
        foreach ( $parsedForms as $key => $val ) {
            $parsedForms [$key] = $val [0];
        }
        unset ( $key, $val );
        foreach ( $parsedForms as $key1 => $val1 ) {
            foreach ( $val1 as $key2 => $val2 ) {
                $parsedForms [$key1] [$key2] = $val2 [0];
            }
        }
    }
    return $parsedForms;
}

Example usage: in a terminal, run php foo.php | tee test.html, and after a few seconds it will say:

the captcha is saved in /home/captcha.png
 waiting for you to solve it by doing: echo ANSWER>/home/captcha.txt

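Then look at the captcha in /home/captcha.png, solve it, and in another terminal run: echo ANSWER>/home/captcha.txt. The script will now log in and dump the logged-in HTML to test.html, which you can open in a browser to confirm that it really is logged in. Screenshot of a run: https://image.prntscr.com/image/_AsB_0J6TLOFSZuvQdjyNg.png

Also note that I wrote 2 captcha solver functions. One, solve_captcha, uses the Deathbycaptcha API and will not work until you provide a valid, credited Deathbycaptcha account on lines 5 and 6; that service is not free. The other, solve_captcha2, asks you to solve the captcha yourself, and tells you where the captcha image is saved (so you can go and look at it) and what command to run to supply the answer. Just replace solve_captcha with solve_captcha2 on line 28 to solve it manually, or the other way around. The script has been fully tested with solve_captcha2, but the Deathbycaptcha solver is untested because my Deathbycaptcha account is empty (if you want to donate so that I can actually test it, send 7 USD to the PayPal account divinity76@gmail.com with a link to this thread, and I will buy the cheapest Deathbycaptcha credit pack and test it).

  • Disclaimer: I am not affiliated with Deathbycaptcha in any way (other than having been a customer of theirs a few years ago), and this post is not sponsored.

【Discussion】:

    【Solution 5】:

    The simplest thing you can do is have Postman generate the PHP code, since you already have the request working there. Here is a link about getting PHP code from Postman. You can then compare the Postman sample with your own code.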

    <?php
    
    $curl = curl_init();
    
    curl_setopt_array($curl, array(
      CURLOPT_URL => "http://www.mca.gov.in/mcafoportal/loginValidateUser.do?userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=",
      CURLOPT_RETURNTRANSFER => true,
      CURLOPT_ENCODING => "",
      CURLOPT_MAXREDIRS => 10,
      CURLOPT_TIMEOUT => 30,
      CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
      CURLOPT_CUSTOMREQUEST => "POST",
      CURLOPT_HTTPHEADER => array(
        "cache-control: no-cache",
        "postman-token: b54abdc0-17be-f38f-9aba-dbf8f007de99"
      ),
    ));
    
    $response = curl_exec($curl);
    $err = curl_error($curl);
    
    curl_close($curl);
    
    if ($err) {
      echo "cURL Error #:" . $err;
    } else {
      echo $response;
    }
    

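    What jumps out at me immediately is "hGJfsdnk`1t". The backtick '`' may be treated as an escape character. That could well trigger an error whose handling redirects back to the login page. Postman probably has something built in that renders the character as "hGJfsdnk%601t". That would explain why this works in Postman but not in your code.

    Here is the status of this request: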

    {
    "url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
    "content_type": "text\/html;charset=ISO-8859-1",
    "http_code": 200,
    "header_size": 3020,
    "request_size": 821,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 1,
    "total_time": 2.920125,
    "namelookup_time": 8.2e-5,
    "connect_time": 8.7e-5,
    "pretransfer_time": 0.000181,
    "size_upload": 0,
    "size_download": 42381,
    "speed_download": 14513,
    "speed_upload": 0,
    "download_content_length": -1,
    "upload_content_length": -1,
    "starttransfer_time": 0.320995,
    "redirect_time": 2.084554,
    "redirect_url": "",
    "primary_ip": "115.114.108.120",
    "certinfo": [],
    "primary_port": 80,
    "local_ip": "192.168.1.3",
    "local_port": 45086
    }
    

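    This indicates a successful login.

    【Discussion】:

    • No. Unfortunately the login page is buggy: it returns HTTP 200 OK even when the login fails. It should have responded with HTTP 403 Forbidden; someone should contact them and help them fix it. Here is a more reliable way to detect login errors: $domd = @DOMDocument::loadHTML ( $html ); $xp = new DOMXPath ( $domd ); $loginErrors = $xp->query ( '//ul[@class="errorMessage"]' ); if ($loginErrors->length > 0) { echo 'encountered following error(s) logging in: '; foreach ( $loginErrors as $err ) { echo $err->textContent, PHP_EOL; } die (); } else { echo "logged in successfully!"; }
    • The home page may well be buggy, but this does return "Welcome Guest" at the top, just as Postman does.
    • When you log in with the credentials provided by the OP, it says Welcome Kiran (not Guest). If you want to see for yourself, try the code from my answer: php foo.php | tee foo.html, solve the captcha, then open foo.html in a browser
    • Here is a screenshot from when I ran it, saved the HTML, and opened it in a browser: image.prntscr.com/image/_AsB_0J6TLOFSZuvQdjyNg.png
    【Solution 6】:

    Honestly, this is one of the weirdest sites I have seen in a long time. The first step was to understand how it works, so I decided to use Chrome to see what happens when we log in with wrong data.

    Observations:

    • It blanks the username and password fields
    • It generates SHA1 hashes of the username and password fields and sets them in userNamedenc and passwordenc respectively
    • We can override the username and password directly in JavaScript and log in to your account by overriding the details from the console.
    • Many different requests generate cookies, but none of them look useful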
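
If the page really does put a plain, unsalted SHA-1 hex digest into passwordenc, as the second observation suggests, the value could be reproduced in PHP as sketched below. This is an assumption: the 40-character passwordenc in the question is consistent with SHA-1, but nothing in this thread confirms the exact scheme, and the function name is made up.

```php
<?php
// Assumed scheme: passwordenc is the lowercase hex SHA-1 of the plain password.
function encode_password_field(string $plainPassword): string
{
    return sha1($plainPassword);
}

// The result is always 40 hexadecimal characters.
echo strlen(encode_password_field('example-password')), PHP_EOL;
```

The value still has to be compared against what the browser actually sends before relying on it.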

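    So the way to solve this is to follow these steps

    • GET the login URL login.do
    • Get the form details, including the access code, from the response
    • Submit the form to loginValidateUser.do

    The form sends the following parameters

    Now one interesting part is the post data below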

    displayCaptcha:true
    userEnteredCaptcha:strrty
    

    If we override displayCaptcha to false, the captcha is no longer required. A wonderful bypass

    displayCaptcha: false
    

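    Next was writing all of the above in PHP, but the site is so odd that many attempts failed. In the end I realized we need to make it look more like a browser login, and I also felt that delays were needed between calls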

    <?php
    
        require_once("curl.php");
    
        $curl = new CURL();
        $default_headers = Array(
            "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding" => "deflate",
            "Accept-Language" => "en-US,en;q=0.8",
            "Cache-Control" => "no-cache",
            "Connection" => "keep-alive",
            "DNT" => "1",
            "Pragma" => "no-cache",
            "Referer" => "http://www.mca.gov.in/mcafoportal/login.do",
            "Upgrade-Insecure-Requests" => "1"
        );
    
        // Get the login page 
        $curl
            ->followlocation(0)
            ->cookieejar("")
            ->verbose(1)
            ->get("http://www.mca.gov.in/mcafoportal/login.do")
            ->header($default_headers)
            ->useragent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
            ->execute();
    
    
        // Save the postfileds and access code as we would need them later for the POST field
        $post = $curl->loadInputFieldsFromResponse()
            ->updatePostParameter(array(
                "displayCaptcha" => "false",
                "userNamedenc" => "hGJfsdnk`1t",
                "passwordenc" => "675894242fa9c66939d9fcf4d5c39d1830f4ddb9",
                "userName" => "",
                "Cert" => ""))
            ->referrer("http://www.mca.gov.in/mcafoportal/login.do")
            ->removePostParameters(
                Array("dscBasedLoginFlag", "maxresults", "fe", "query", "SelectCert", "newUserRegistration")
            );
    
        $postfields = $curl->getPostFields();
    
        var_dump($postfields);
    
    
        // Access some dummy URLs to make it look like browser
        $curl
            ->get("http://www.mca.gov.in/mcafoportal/js/global.js")->header($default_headers)->execute()->sleep(2)
            ->get("http://www.mca.gov.in/mcafoportal/js/loginValidations.js")->header($default_headers)->execute()->sleep(2)
            ->get("http://www.mca.gov.in/mcafoportal/css/layout.css")->header($default_headers)->execute()->sleep(2)
            ->get("http://www.mca.gov.in/mcafoportal/img/bullet.png")->header($default_headers)->execute()->sleep(2)
            ->get("http://www.mca.gov.in/mcafoportal/getCapchaImage.do")->header($default_headers)->execute()->sleep(2);
    
    
        // POST to the login form the postfields saved earlier
        $curl
            ->sleep(20)
            ->header($default_headers)
            ->postfield($postfields)
            ->referrer("http://www.mca.gov.in/mcafoportal/login.do")
            ->post("http://www.mca.gov.in/mcafoportal/loginValidateUser.do")
            ->execute(false)
            ->sleep(3)
            ->get("http://www.mca.gov.in/mcafoportal/login.do")
            ->header($default_headers)
            ->execute(true);
    
        // Get the response from last GET of login.do
        $curl->getResponseText($output);
    
        //Check if user name is present in the output or not
        if (stripos($output, "Kiran") !== false) {
            echo "Hurray!!!! Login succeeded";
        } else {
            echo "Login failed please retry after sometime";
        }
    

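    When I run the code, it works some of the time and fails some of the time. My observations:

    • Only one login is allowed at a time, so I am not sure whether someone else was using the login while I was testing
    • Without the delays it fails most of the time
    • There is no obvious reason for a failed login, other than the site doing something server-side to block the request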
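
Given that attempts fail intermittently and that pauses seem to help, one option is to wrap the whole login attempt in a retry loop with a delay. The sketch below is generic: retry_with_delay is a hypothetical helper, and the callable stands in for whatever code performs one complete login attempt and returns true on success.

```php
<?php
// Calls $attempt up to $maxTries times, sleeping $delaySeconds between tries.
// $attempt receives the 1-based try number and must return true on success.
function retry_with_delay(callable $attempt, int $maxTries, int $delaySeconds): bool
{
    for ($try = 1; $try <= $maxTries; $try++) {
        if ($attempt($try) === true) {
            return true;
        }
        if ($try < $maxTries) {
            sleep($delaySeconds); // no pause after the final failed try
        }
    }
    return false;
}
```

Long delays and a small number of tries are probably the safer choice here, since the thread suggests that frequent attempts make the site block the request.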

    The reusable curl.php that I created and use for the chained method calls is as follows

    <?php
    
    class CURL
    {
        protected $ch;
        protected $postfields;
    
        public function getPostFields() {
            return $this->postfields;
        }
    
        public function newpost()
        {
            $this->postfields = array();
            return $this;
        }
    
        public function addPostFields($key, $value)
        {
            $this->postfields[$key]=$value;
            return $this;
        }
    
        public function __construct()
        {
            $ch       = curl_init();
            $this->ch = $ch;
            $this->get()->followlocation()->retuntransfer(); //->connectiontimeout(20)->timeout(10);
        }
    
        function url($url)
        {
            curl_setopt($this->ch, CURLOPT_URL, $url);
            return $this;
        }
    
        function verbose($value = true)
        {
            curl_setopt($this->ch, CURLOPT_VERBOSE, $value);
            return $this;
        }
    
        function post($url='')
        {
            if ($url !== '')
                $this->url($url);
            curl_setopt($this->ch, CURLOPT_POST, count($this->postfields));
            curl_setopt($this->ch, CURLOPT_POSTFIELDS, http_build_query($this->postfields));
            return $this;
        }
    
        function postfield($fields)
        {
            if (is_array($fields)){
                $this->postfields = $fields;
            }
            return $this;
        }
    
        function close()
        {
            curl_close($this->ch);
            return $this;
        }
    
        function cookieejar($cjar)
        {
            curl_setopt($this->ch, CURLOPT_COOKIEJAR, $cjar);
            return $this;
        }
    
        function cookieefile($cfile)
        {
            curl_setopt($this->ch, CURLOPT_COOKIEFILE, $cfile);
            return $this;
        }
    
        function followlocation($follow = 1)
        {
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, $follow);
            return $this;
        }
    
        function loadInputFieldsFromResponse($response ='')
        {
            if ($response)
                $doc = $response;
            else
                $doc = $this->lastCurlRes;
    
    
            /* @var $doc DOMDocument */
            //simplexml_load_string($data)
            $this->getResponseDoc($doc);
            $this->postfields = array();
    
            foreach ($doc->getElementsByTagName('input') as $elem) {
                /* @var $elem DomNode */
                $name = $elem->getAttribute('name');
    //            if (!$name)
    //                $name = $elem->getAttribute('id');
                if ($name)
                    $this->postfields[$name] = $elem->getAttribute("value");
    
            }
    
            return $this;
        }
    
        function retuntransfer($transfer=1)
        {
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, $transfer);
            return $this;
        }
    
        function connectiontimeout($connectiontimeout)
        {
            curl_setopt($this->ch, CURLOPT_CONNECTTIMEOUT, $connectiontimeout);
            return $this;
        }
    
        function timeout($timeout)
        {
            curl_setopt($this->ch, CURLOPT_TIMEOUT, $timeout);
            return $this;
        }
        function useragent($useragent)
        {
            curl_setopt($this->ch, CURLOPT_USERAGENT, $useragent);
            return $this;
        }
    
        function referrer($referrer)
        {
            curl_setopt($this->ch, CURLOPT_REFERER, $referrer);
            return $this;
        }
    
        function getCURL()
        {
            return $this->ch;
        }
    
        protected $lastCurlRes;
        protected $lastCurlResInfo;
    
        function get($url = '')
        {
            if ($url !== '')
                $this->url($url);
            curl_setopt($this->ch, CURLOPT_POST, 0);
            curl_setopt($this->ch, CURLOPT_HTTPGET, true);
            return $this;
        }
    
        function sleep($seconds){
            sleep($seconds);
            return $this;
        }
    
        function execute($output=false)
        {
            $this->lastCurlRes = curl_exec($this->ch);
    
            if ($output == true)
            {
                echo "Response is \n " . $this->lastCurlRes;
                file_put_contents("out.html", $this->lastCurlRes);
            }
            $this->lastCurlResInfo = curl_getinfo($this->ch);
            $this->postfields = array();
            return $this;
        }
    
        function header($headers)
        {
            //curl_setopt($this->ch, CURLOPT_HEADER, true);
            curl_setopt($this->ch, CURLOPT_HTTPHEADER, $headers);
            return $this;
        }
        function getResponseText(&$text){
            $text = $this->lastCurlRes;
            return $this;
        }
    
    
        /*
         *
        * @param DOMDocument $doc
        *
        *
        */
        function getResponseDoc(&$doc){
            $doc = new DOMDocument();
            libxml_use_internal_errors(false);
            libxml_disable_entity_loader();
            @$doc->loadHTML($this->lastCurlRes);
            return $this;
        }
    
        function removePostParameters($keys) {
            if (!is_array($keys))
                $keys = Array($keys);
    
            foreach ($keys as $key){
                if (array_key_exists($key, $this->postfields))
                    unset($this->postfields[$key]);
            }
    
            return $this;
        }
    
        function keepPostParameters($keys) {
            $delete = Array();
            foreach ($this->postfields as $key=>$value){
                if (!in_array($key, $keys)){
                    array_push($delete, $key);
                }
            }
    
            foreach ($delete as $key) {
                unset($this->postfields[$key]);
            }
    
            return $this;
        }
    
        function updatePostParameter($postarray, $encoded=false)
        {
            if (is_array($postarray))
            {
                foreach ($postarray as $key => $value) {
                    if (is_null($value))
                        unset($this->postfields[$key]);
                    else
                        $this->postfields[$key] = $value;
                }}
            elseif (is_string($postarray))
            {
                $parr = preg_split("/&/",$postarray);
                foreach ($parr as $postvalue) {
                    if (($index = strpos($postvalue, "=")) != false)
                    {
                        $key = substr($postvalue, 0,$index);
                        $value = substr($postvalue, $index + 1);
                        if ($encoded)
                            $this->postfields[$key]=urldecode($value);
                        else
                            $this->postfields[$key]=$value;
                    }
                    else
                        $this->postfields[$postvalue] = "";
                }
    
    
            }
    
            return $this;
        }
    
        function getResponseXml(){
            //SimpleXMLElement('<INPUT/>')->asXML();
        }
    
        function SSLVerifyPeer($verify=false)
        {
            curl_setopt($this->ch, CURLOPT_SSL_VERIFYPEER, $verify);
            return $this;
        }
    }
    
    ?>
    

    【Discussion】:

    • Thanks a lot. I appreciate you taking the time. It gave me insight; I will try your code and see whether it helps.