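It works in Postman but not in PHP/Python because (as others have already pointed out) Postman is using your browser's existing cookie session, which has already solved the captcha. Clear your browser cookies, get a fresh cookie session, and do NOT solve the captcha, and Postman won't be able to log in either.
"Any idea what is missing ?" Several things. Among them are several POST parameters of the login request (browserFlag, loginType, __checkbox_dscBasedLoginFlag and more).
Your encoding loop is also buggy: $str = $str . "$key=$value" . "&";
pretty much only works as long as both keys and values contain nothing but [a-zA-Z0-9] characters,
and since your userNamedenc contains a grave accent (backtick) character, your encoding loop is insufficient.
A fixed loop would be
foreach($params as $key=>$value){
    $str = $str . urlencode($key)."=".urlencode($value) . "&";
}
$str=substr($str,0,-1);
but
this is exactly why we have the http_build_query function. The whole loop and the trim that follows can be replaced with this one line:
$str=http_build_query($params);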
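To make the difference concrete, here is a minimal sketch comparing naive concatenation with http_build_query. The 'comment' parameter and its value are made up for illustration; only userNamedenc comes from the question.

```php
<?php
declare(strict_types=1);

// 'comment' is a hypothetical parameter, added only to show what goes wrong
// when a value contains the separator characters & and =.
$params = [
    'userNamedenc' => 'hGJfsdnk`1t',
    'comment'      => 'a&b=c d',
];

// Naive concatenation: keys and values are pasted in without any encoding.
$naive = '';
foreach ($params as $key => $value) {
    $naive .= "$key=$value&";
}
$naive = substr($naive, 0, -1); // trim the trailing "&"

// http_build_query urlencodes every key and value.
$encoded = http_build_query($params);

echo $naive, PHP_EOL;   // userNamedenc=hGJfsdnk`1t&comment=a&b=c d
echo $encoded, PHP_EOL; // userNamedenc=hGJfsdnk%601t&comment=a%26b%3Dc+d

// Decode both strings the way a server would and compare with the input.
parse_str($naive, $fromNaive);
parse_str($encoded, $fromEncoded);
var_dump($fromEncoded === $params); // bool(true)
var_dump($fromNaive === $params);   // bool(false): "comment" was split into 2 parameters
```

The naive string decodes to 3 parameters instead of 2, because the unescaped & and = inside the value are read as separators.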
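Likewise, you appear to be trying to log in without a pre-existing cookie session,
and that won't work. When you make a GET request to the login page, you get a cookie and a unique captcha.
The captcha answer is tied to your cookie session and must be solved before you attempt to log in.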
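If you would rather not pull in a library, the cookie session can be kept with PHP's plain curl extension. The following is only a sketch under that assumption: the helper names new_session_handle, session_get and session_post are made up for this example, and the commented-out usage at the bottom is not executed here.

```php
<?php
declare(strict_types=1);

/**
 * Create a curl handle that remembers cookies between requests.
 * (new_session_handle is a hypothetical helper name.)
 */
function new_session_handle()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => '',   // empty string: enable the cookie engine, cookies are kept in memory
        CURLOPT_RETURNTRANSFER => true, // return the response body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 20,
    ]);
    return $ch;
}

/** GET a URL on the shared handle, so the session cookies are sent along. */
function session_get($ch, string $url): string
{
    curl_setopt_array($ch, [CURLOPT_URL => $url, CURLOPT_HTTPGET => true]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('GET failed: ' . curl_error($ch));
    }
    return $body;
}

/** POST urlencoded form parameters on the same handle (same cookie session). */
function session_post($ch, string $url, array $params): string
{
    curl_setopt_array($ch, [
        CURLOPT_URL        => $url,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($params),
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('POST failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage sketch (needs network access, so it is left commented out):
// $ch = new_session_handle();
// $loginHtml = session_get($ch, 'http://www.mca.gov.in/mcafoportal/login.do');
// $captchaPng = session_get($ch, 'http://www.mca.gov.in/mcafoportal/getCapchaImage.do');
```

The important part is that every request goes through the same handle, so the cookie received with the login page is also sent with the captcha request and with the final login request.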
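You also didn't provide any code for handling the captcha. In addition, the parsed "userName" input element defaults to "Enter Username"; it is emptied with JavaScript and replaced with userNamedenc, and you must replicate this in PHP.
The form also has an input element named "dscBasedLoginFlag", which is removed with JavaScript; you have to do that part in PHP too.
It has an input element named "Cert" with a default value, but that value is cleared with JavaScript; do the same in PHP.
And it has an input element named "newUserRegistration", which is removed with JavaScript; do that as well.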
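Here is what you should do: make a GET request to the login page, save the cookie session and make sure to supply it with all further requests, then parse all of the login form's elements and add them to your login request. (Careful: there are 2 forms with inputs on the page, and 1 of them belongs to the search bar. Parse only the children of the login form; do not mix the 2.) Remember to clear or remove the special input tags to emulate the JavaScript, as described above.
Then make a GET request to the captcha URL, making sure to supply the session cookie, and solve the captcha.
Then make the final login request with the captcha answer, userNamedenc, passwordenc, and all the other elements
parsed out of the login page. That should work. As for solving the captcha programmatically:
the captcha doesn't look too hard, and cracking it can probably be automated, but until someone actually does that,
you can use Deathbycaptcha to do it for you. Note that it is not a free service.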
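The "parse only the login form" step can be sketched with DOMXPath alone. The markup below is a hypothetical, trimmed-down stand-in for the real page (which has more fields), the form id "login" is assumed, and login_form_params is a name invented for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the login page: a search form plus a login form.
$html = <<<'HTML'
<html><body>
<form id="search"><input name="q" value=""></form>
<form id="login">
  <input name="userName" value="Enter Username">
  <input name="userNamedenc" value="">
  <input name="passwordenc" value="">
  <input name="dscBasedLoginFlag" value="N">
  <input name="Cert" value="default">
  <input name="newUserRegistration" value="x">
</form>
</body></html>
HTML;

/** Collect name => value for the named inputs inside one specific form. */
function login_form_params(string $html, string $formId): array
{
    $domd = new DOMDocument();
    @$domd->loadHTML($html); // @ silences warnings about sloppy real-world HTML
    $xp = new DOMXPath($domd);
    $params = [];
    // Only inputs that are descendants of the chosen form and have a name.
    foreach ($xp->query('//form[@id="' . $formId . '"]//input[@name]') as $input) {
        $params[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $params;
}

$params = login_form_params($html, 'login');

// Emulate what the page's JavaScript does before submitting.
$params['userName'] = '';                // defaults to "Enter Username", cleared
$params['Cert'] = '';                    // default value cleared
unset($params['dscBasedLoginFlag']);     // removed
unset($params['newUserRegistration']);   // removed

var_dump($params);
```

The search bar's input never ends up in $params, because the XPath query only matches descendants of the login form.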
Here is a fully tested, working example implementation, using my hhb_curl library (from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php) and the Deathbycaptcha API:
<?php
declare(strict_types = 1);
header ( "content-type: text/plain;charset=utf-8" );
require_once ('hhb_.inc.php');
const DEATHBYCATPCHA_USERNAME = '?';
const DEATHBYCAPTCHA_PASSWORD = '?';
$hc = new hhb_curl ( '', true );
$hc->setopt ( CURLOPT_TIMEOUT, 20 ); // I'm on a really slow connection at the moment :(
$html = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/login.do' )->getResponseBody (); // cookie session etc
$domd = new DOMDocument (); @$domd->loadHTML ( $html ); // loadHTML() is not static; calling it statically is an error in PHP 8
$inputs = getDOMDocumentFormInputs ( $domd, true ) ['login'];
$params = [ ];
foreach ( $inputs as $tmp ) {
    $params [$tmp->getAttribute ( "name" )] = $tmp->getAttribute ( "value" );
}
assert ( isset ( $params ['userNamedenc'] ), 'username input not found??' );
assert ( isset ( $params ['passwordenc'] ), 'passwordenc input not found??' );
$params ['userName'] = ''; // defaults to "Enter Username", cleared with javascript
unset ( $params ['dscBasedLoginFlag'] ); // removed with javascript
$params ['Cert'] = ''; // cleared to emptystring with javascript
unset ( $params ['newUserRegistration'] ); // removed with javascript
unset ( $params ['SelectCert'] ); // removed with javascript
$params ['userNamedenc'] = 'hGJfsdnk`1t';
$params ['passwordenc'] = '675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
echo 'parsed login parameters: ';
var_dump ( $params );
$captchaRaw = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/getCapchaImage.do' )->getResponseBody ();
$params ['userEnteredCaptcha'] = solve_captcha2 ( $captchaRaw );
// now actually logging in.
$html = $hc->setopt_array ( array (
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => http_build_query ( $params )
) )->exec ( 'http://www.mca.gov.in/mcafoportal/loginValidateUser.do' )->getResponseBody ();
var_dump ( $hc->getStdErr (), $hc->getStdOut () ); // printing debug data
$domd = new DOMDocument (); @$domd->loadHTML ( $html );
$xp = new DOMXPath ( $domd );
$loginErrors = $xp->query ( '//ul[@class="errorMessage"]' );
if ($loginErrors->length > 0) {
    echo 'encountered following error(s) logging in: ';
    foreach ( $loginErrors as $err ) {
        echo $err->textContent, PHP_EOL;
    }
    die ();
}
echo "logged in successfully!";
/**
 * solves the captcha manually, by doing: echo ANSWER>captcha.txt
 *
 * @param string $raw_image
 *        	raw image bytes
 * @return string answer
 */
function solve_captcha2(string $raw_image): string {
    $imagepath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.png';
    $answerpath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.txt';
    // remove leftovers from a previous run, so an old answer is not picked up
    @unlink ( $imagepath );
    @unlink ( $answerpath );
    file_put_contents ( $imagepath, $raw_image );
    echo 'the captcha is saved in ' . $imagepath . PHP_EOL;
    echo ' waiting for you to solve it by doing: echo ANSWER>' . $answerpath, PHP_EOL;
    while ( true ) {
        sleep ( 1 );
        if (file_exists ( $answerpath )) {
            $answer = trim ( file_get_contents ( $answerpath ) );
            echo 'solved: ' . $answer, PHP_EOL;
            return $answer;
        }
    }
}
function solve_captcha(string $raw_image): string {
    echo 'solving captcha, hang on, with DEATHBYCAPTCHA this usually takes between 10 and 20 seconds.';
    {
        // unfortunately, CURLFile requires a filename, it won't accept a string, so make a file of it
        $tmpfileh = tmpfile ();
        fwrite ( $tmpfileh, $raw_image ); // TODO: error checking (incomplete write or whatever)
        $tmpfile = stream_get_meta_data ( $tmpfileh ) ['uri'];
    }
    $hc = new hhb_curl ( '', true );
    $hc->setopt_array ( array (
        CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
        CURLOPT_POSTFIELDS => array (
            'username' => DEATHBYCATPCHA_USERNAME,
            'password' => DEATHBYCAPTCHA_PASSWORD,
            'captchafile' => new CURLFile ( $tmpfile, 'image/png', 'captcha.png' )
        )
    ) )->exec ();
    fclose ( $tmpfileh ); // when a tmpfile() handle is fclose()d, the file is also implicitly deleted.
    $statusurl = $hc->getinfo ( CURLINFO_EFFECTIVE_URL ); // status url is given in an HTTP 3xx redirect, which hhb_curl auto-follows
    while ( true ) {
        // wait for captcha to be solved.
        sleep ( 10 );
        echo '.';
        $json = $hc->setopt_array ( array (
            CURLOPT_HTTPHEADER => array (
                'Accept: application/json'
            ),
            CURLOPT_HTTPGET => true
        ) )->exec ()->getResponseBody ();
        $parsed = json_decode ( $json, false );
        if (! empty ( $parsed->captcha )) {
            echo 'captcha solved!: ' . $parsed->captcha, PHP_EOL;
            return $parsed->captcha;
        }
    }
}
function getDOMDocumentFormInputs(\DOMDocument $domd, bool $getOnlyFirstMatches = false): array {
    // :DOMNodeList?
    $forms = $domd->getElementsByTagName ( 'form' );
    $parsedForms = array ();
    $isDescendantOf = function (\DOMNode $decendant, \DOMNode $ele): bool {
        $parent = $decendant;
        while ( NULL !== ($parent = $parent->parentNode) ) {
            if ($parent === $ele) {
                return true;
            }
        }
        return false;
    };
    // i can't use array_merge on DOMNodeLists :(
    $merged = function () use (&$domd): array {
        $ret = array ();
        foreach ( $domd->getElementsByTagName ( "input" ) as $input ) {
            $ret [] = $input;
        }
        foreach ( $domd->getElementsByTagName ( "textarea" ) as $textarea ) {
            $ret [] = $textarea;
        }
        foreach ( $domd->getElementsByTagName ( "button" ) as $button ) {
            $ret [] = $button;
        }
        return $ret;
    };
    $merged = $merged ();
    foreach ( $forms as $form ) {
        $inputs = function () use (&$domd, &$form, &$isDescendantOf, &$merged): array {
            $ret = array ();
            foreach ( $merged as $input ) {
                // hhb_var_dump ( $input->getAttribute ( "name" ), $input->getAttribute ( "id" ) );
                if ($input->hasAttribute ( "disabled" )) {
                    // ignore disabled elements?
                    continue;
                }
                $name = $input->getAttribute ( "name" );
                if ($name === '') {
                    // echo "inputs with no name are ignored when submitted by mainstream browsers (presumably because of specs)... follow suit?", PHP_EOL;
                    continue;
                }
                // an input belongs to this form if it is a descendant of it, or if its form="" attribute names this form's id
                if (! $isDescendantOf ( $input, $form ) && ($form->getAttribute ( "id" ) === '' || $input->getAttribute ( "form" ) !== $form->getAttribute ( "id" ))) {
                    // echo "this input does not belong to this form.", PHP_EOL;
                    continue;
                }
                if (! array_key_exists ( $name, $ret )) {
                    $ret [$name] = array (
                        $input
                    );
                } else {
                    $ret [$name] [] = $input;
                }
            }
            return $ret;
        };
        $inputs = $inputs (); // sorry about that, Eclipse gets unstable on IIFE syntax.
        $hasName = true;
        $name = $form->getAttribute ( "id" );
        if ($name === '') {
            $name = $form->getAttribute ( "name" );
            if ($name === '') {
                $hasName = false;
            }
        }
        if (! $hasName) {
            $parsedForms [] = array (
                $inputs
            );
        } else {
            if (! array_key_exists ( $name, $parsedForms )) {
                $parsedForms [$name] = array (
                    $inputs
                );
            } else {
                // a second form with the same id/name: append its inputs ($tmp was undefined here)
                $parsedForms [$name] [] = $inputs;
            }
        }
    }
    unset ( $form, $inputs, $hasName, $name );
    if ($getOnlyFirstMatches) {
        foreach ( $parsedForms as $key => $val ) {
            $parsedForms [$key] = $val [0];
        }
        unset ( $key, $val );
        foreach ( $parsedForms as $key1 => $val1 ) {
            foreach ( $val1 as $key2 => $val2 ) {
                $parsedForms [$key1] [$key2] = $val2 [0];
            }
        }
    }
    return $parsedForms;
}
Example usage: in a terminal, run php foo.php | tee test.html, and after a few seconds it will say:
the captcha is saved in /home/captcha.png
waiting for you to solve it by doing: echo ANSWER>/home/captcha.txt
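Then look at the captcha in /home/captcha.png, solve it, and in another terminal run: echo ANSWER>/home/captcha.txt. The script will now log in and dump the logged-in HTML into test.html, which you can open in your browser to confirm that it really did log in. Screenshot of it running: https://image.prntscr.com/image/_AsB_0J6TLOFSZuvQdjyNg.png
Also note that I made 2 captcha solver functions. The first, solve_captcha, uses the Deathbycaptcha API and won't work until you provide a valid, credited Deathbycaptcha account on lines 5 and 6, which is not free. The other, solve_captcha2, asks you to solve the captcha yourself: it tells you where the captcha image is saved (so you can go have a look at it) and what command to run to provide the answer. Line 28 calls solve_captcha2 for manual solving; replace it with solve_captcha to use Deathbycaptcha, or vice versa. The script has been fully tested with solve_captcha2, but the Deathbycaptcha solver is untested because my Deathbycaptcha account is empty (if you'd like to make a donation so I can actually test it, send 7 dollars to the PayPal account divinity76@gmail.com with a link to this thread, and I'll buy the cheapest Deathbycaptcha credit pack and actually test it).
- Disclaimer: I'm not affiliated with Deathbycaptcha in any way (except that I was a customer of theirs a few years ago), and this post is not sponsored.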