Scraping a large HTML file with Simple HTML DOM: answers

【Question title】: Simple HTML DOM: scraping a large HTML file
【Posted】: 2013-07-30 03:10:23
【Question】:

I need to scrape a large HTML file (for example: http://www.indianrail.gov.in/mail_express_trn_list.html) using Simple HTML DOM. I started with a simple script:

<?php
require "simple_html_dom.php";
echo file_get_html('http://www.indianrail.gov.in/mail_express_trn_list.html')->plaintext;
?>

Nothing is displayed, just a blank page, and the Apache error.log file contains these messages:

 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3
 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3

Meanwhile, every other page (for example: http://www.indianrail.gov.in/special_trn_list.html) works fine with the same script.

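【Comments on the question】:

Have you tried file_get_contents instead of file_get_html? php.net/manual/en/function.file-get-contents.php
I can reproduce the problem. I'll dig into it and let you know.
@Fred I tried that, but I get the same error.
@DevZer0 Waiting for your reply. Thanks a lot :)
@krizna These answers on SO may help: stackoverflow.com/a/6006379/1415724 and stackoverflow.com/a/6519443/1415724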

Tags: php html parsing dom file-get-contents

【Solution 1】:

The problem appears to be the MAX_FILE_SIZE constant defined in simple_html_dom.

You can adjust it by editing the define('MAX_FILE_SIZE', 600000); line in the simple_html_dom.php file.
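
The "Trying to get property of non-object" notice is consistent with file_get_html returning a non-object (false) instead of a parsed document, which the library versions I am aware of do when the content is empty or longer than MAX_FILE_SIZE. A minimal sketch of a guard around that call follows. fetch_plaintext is a hypothetical helper name, not part of simple_html_dom, and it takes the loader as a parameter so the failure path can be exercised without the library or the network:

```php
<?php
/**
 * Hypothetical helper (not part of simple_html_dom): returns the plain text
 * of a parsed page, or null when the loader did not return an object.
 *
 * $loader is any callable that takes a URL and returns a parsed document
 * object or false, for example 'file_get_html'.
 */
function fetch_plaintext($loader, $url)
{
    $dom = call_user_func($loader, $url);

    // A false (or otherwise non-object) result means loading or parsing
    // failed, so reading ->plaintext here would raise the notice from the
    // question.
    if (!is_object($dom)) {
        return null;
    }

    return $dom->plaintext;
}

// Assumed usage with the library from the question:
// require "simple_html_dom.php";
// $text = fetch_plaintext('file_get_html', 'http://www.indianrail.gov.in/mail_express_trn_list.html');
// echo $text === null ? 'Could not load or parse the page' : $text;
```

With a guard like this the script reports a failure explicitly instead of rendering a blank page. It does not raise the size limit itself, so the MAX_FILE_SIZE change described above is still needed for the large page.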

【Comments】:

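I tried define('MAX_FILE_SIZE', 6000000000000000000); but no luck, still the same error. Thanks.
Define a realistic number; I set it to 12600000.
It seems to work, but now I get a different error: exit signal Segmentation fault (11).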