【Question title】: How do I get the complete HTML source?
【Posted】: 2012-04-11 12:21:30
【Question】:

I use the code below to fetch the HTML source from a URL, but the result does not contain the whole source. Is the buffer size the problem, or the string size?

            HttpURLConnection connection = null;
            String response = null;

            try
            {
                // URL-encode the search term so spaces and special characters survive the POST
                // (requires import java.net.URLEncoder)
                String parameters = "aranan=" + URLEncoder.encode(et.getText().toString(), "UTF-8");

                URL url = new URL("http://www.fragmanfan.com/arama.asp");
                connection = (HttpURLConnection) url.openConnection();
                connection.setDoOutput(true);
                connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

                OutputStreamWriter request = new OutputStreamWriter(connection.getOutputStream());
                request.write(parameters);
                request.flush();
                request.close();

                BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null)
                {
                    sb.append(line).append('\n');
                }
                // Closing the outermost reader also closes the wrapped streams
                reader.close();

                // The server's reply to the search request is stored in the response variable
                response = sb.toString();
                // You can perform UI operations here
                browser.loadDataWithBaseURL(null, response, "text/html", "UTF-8", null);
            }
            catch (IOException e)
            {
                // Report the error instead of swallowing it; a failed read can look like a truncated page
                e.printStackTrace();
            }
            finally
            {
                if (connection != null)
                {
                    connection.disconnect();
                }
            }
        } 
    }); 

I tried things like BufferedReader reader = new BufferedReader(isr, 8192); but that did not help.
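
The buffer size passed to BufferedReader only controls how many characters are fetched from the underlying stream per read. It does not limit how much text the readLine() loop can collect in total. The following is a minimal sketch that can be run as a standalone Java program to check this. The class name ReadAllDemo, the helper names, and the synthetic document are made up for illustration, and the input comes from a StringReader rather than from the real site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAllDemo {

    // Reads every line from the source using the given buffer size
    // and returns the text with a '\n' after each line.
    public static String readAll(Reader source, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source, bufferSize)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Builds a synthetic HTML-like document with lineCount lines (illustration only).
    public static String makeDocument(int lineCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++) {
            sb.append("<p>line ").append(i).append("</p>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String document = makeDocument(10000);
        String small = readAll(new StringReader(document), 16);
        String large = readAll(new StringReader(document), 8192);
        // Both buffer sizes return the complete document
        System.out.println(small.equals(document) && large.equals(document));
    }
}
```

If the same loop returns the whole text here, the truncation is more likely to come from the request itself or from how the result is inspected than from the reader or the String.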

【Comments】:

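  • Have you tried fetching the source from the same URL with a plain standalone Java application?
  • No, not in Java. But I am not getting the complete HTML page source, only about 892 words. Is it a string size problem? Or buffering? How can I fix it?
  • Try running the same logic in plain Java first. That way you can easily figure out what is happening. If it works in Java, it should work on Android too.
  • How do I try that? Is it the same code? It works on Android, but the result does not contain all of the HTML source :(
  • We use double instead of int when we need more room. So is there something bigger than String that I can use instead of String?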

Tags: android string inputstream bufferedreader


【Solution 1】:

Create a WebRequest class, then make your request and read the response. I tried it against that site and it works.

WebRequest response = new WebRequest("http://www.fragmanfan.com/arama.asp?aranan=kurtlar", PostType.GET);
String htmltext = response.Get();
browser.loadDataWithBaseURL(null, htmltext, "text/html", "UTF-8", null);

WebRequest.class

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.UnknownHostException;
import java.nio.charset.Charset;
import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;

public class WebRequest {
  public enum PostType{
    GET, POST;
  }

  public String _url;
  public String response = "";
  public PostType _postType;
  CookieStore _cookieStore = new BasicCookieStore();

  public WebRequest(String url) {
    _url = url;
    _postType = PostType.POST;
  }

  public WebRequest(String url, CookieStore cookieStore) {
    _url = url;
    _cookieStore = cookieStore;
    _postType = PostType.POST;
  }

  public WebRequest(String url, PostType postType) {
    _url = url;
    _postType = postType;
  }

  public String Get() {
    HttpClient httpclient = new DefaultHttpClient();

    try {
      // Create local HTTP context
      HttpContext localContext = new BasicHttpContext();

      // Bind custom cookie store to the local context
      localContext.setAttribute(ClientContext.COOKIE_STORE, _cookieStore);

      HttpResponse httpresponse;
      if (_postType == PostType.POST)
      {
        HttpPost httppost = new HttpPost(_url);
        httpresponse = httpclient.execute(httppost, localContext);
      }
      else
      {
        HttpGet httpget = new HttpGet(_url);
        httpresponse = httpclient.execute(httpget, localContext);
      }

      StringBuilder responseString = inputStreamToString(httpresponse.getEntity().getContent());

      response = responseString.toString();
    }
    catch (UnknownHostException e) {
      e.printStackTrace();
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    finally {
      // When HttpClient instance is no longer needed,
      // shut down the connection manager to ensure
      // immediate deallocation of all system resources
      httpclient.getConnectionManager().shutdown();
    }

    return response;
  }

  private StringBuilder inputStreamToString(InputStream is) throws IOException {
    String line;
    StringBuilder total = new StringBuilder();

    // Wrap a BufferedReader around the InputStream, decoding it as ISO-8859-9 (Turkish)
    BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("iso-8859-9")));
    try {
      // Read the response until the end; readLine() strips line terminators, so add them back
      while ((line = rd.readLine()) != null) {
        total.append(line).append('\n');
      }
    } finally {
      // Closing the reader also closes the underlying stream
      rd.close();
    }

    // Return full string
    return total;
  }
}

【Comments】:

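  • But it still does not contain the whole page source. Try WebRequest response = new WebRequest("fragmanfan.com/fan/film-1", PostType.GET); and you will see that you do not get the whole page.
  • I tried it, and I see the whole page. Why do you think you are getting only part of the page? Here is a screenshot of the page footer as I see it in the WebView: img14.imageshack.us/img14/573/resim2cf.png
  • I also logged the last 100 characters returned by the request while debugging. You can see the end of the page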
【Solution 2】:

I ran into a similar problem.

I used Log.i("tag", html), but the logger has a maximum message length, so my HTML text was cut off. There are two solutions:

  1. Split your HTML into small chunks
  2. Increase the maximum message length, as shown in this post: link
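
The first option can be sketched as follows. This is only an illustration: the class name LogChunker, the helper splitIntoChunks, and the 4000-character chunk size are assumptions (the exact logcat limit is not stated in this thread). The sketch prints with System.out so that it runs as plain Java; on Android each chunk would be passed to Log.i instead.

```java
import java.util.ArrayList;
import java.util.List;

public class LogChunker {

    // Assumed per-message limit; pick a value safely below the logger's real maximum.
    public static final int MAX_CHUNK = 4000;

    // Splits text into consecutive pieces of at most maxChunk characters.
    public static List<String> splitIntoChunks(String text, int maxChunk) {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChunk) {
            int end = Math.min(text.length(), start + maxChunk);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a long HTML response
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            html.append("<p>row ").append(i).append("</p>");
        }
        for (String chunk : splitIntoChunks(html.toString(), MAX_CHUNK)) {
            // On Android this line would be: Log.i("tag", chunk);
            System.out.println(chunk);
        }
    }
}
```

Joining the chunks back together yields the original string, so nothing is lost. Only the way the text is logged changes.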

【Comments】:
