【Question title】:How to fetch a webpage through a TCP socket using an HTTP request in Java
【Posted】:2022-01-26 12:39:20
【Question】:

My university assignment is to fetch a webpage from any web server by URL, using a TCP socket and an HTTP GET request.

I am not getting an HTTP/1.0 200 OK response from any server.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;
import java.net.*;
public class DCCN042 {

    public static void main(String[] args) {
            Scanner inpt = new Scanner(System.in);
                System.out.print("Enter URL: ");
                String url = inpt.next();
                TCPConnect(url); 
            }
   public static void TCPConnect(String url) {
        try {
            String hostname = new URL(url).getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for  " + hostname);
            String path = new URL(url).getPath();
            System.out.println("Requested Path on the server: " + path);
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // Follow the HTTP protocol of GET <path> HTTP/1.0 followed by an empty line
            if (!hostname.equals(url)) {
                //Request Line
                out.println("GET " + path + " HTTP/1.1");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            } else {
                //Request Line
                out.println("GET / HTTP/1.0");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            }
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URl");
            e.printStackTrace();
        }
    }
}

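I create a TCP socket to the web server, passing it the IP address returned by InetAddress.getHostAddress() and port 80. I split the path and hostname out of the URL with getPath() and getHost(), and I use that same path and hostname in the HTTP GET request.

Response from the server: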

Enter URL: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    Loading contents of Server: stackoverflow.com
    151.101.65.69 is IP Adress for  stackoverflow.com
    Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
    HTTP/1.1 301 Moved Permanently
    cache-control: no-cache, no-store, must-revalidate
    location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
    feature-policy: microphone 'none'; speaker 'none'
    content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
    Accept-Ranges: bytes
    Transfer-Encoding: chunked
    Date: Mon, 27 Dec 2021 15:00:17 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Served-By: cache-qpg1263-QPG
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1640617217.166650,VS0,VE338
    Vary: Fastly-SSL
    X-DNS-Prefetch-Control: off
    Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
    
    0

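What I need is the HTML source of this webpage, together with an HTTP/1.0 200 OK response.

【Question comments】:

  • Also, HTTPS does not communicate over plain sockets. So you should either use SSLSocket for HTTPS or find a site that does not use HTTPS.
  • @geobreze, I was not using an SSL socket while requesting "https". Thank you, it worked.

Tags: java html http sockets tcpclient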


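【Solution 1】:

This happens because you are using a plain Socket with the port hardcoded to 80. That means that no matter whether the URL typed in starts with http or https, you are always making the request over the insecure http protocol.

In that case the server tells you, as Samuel L. Jackson would put it: "Hey, mf! You are trying to reach me over the f insecure HTTP protocol. Use the secure one, mf: f HTTPS." So it responds with a 301 (which simply means "use this URL instead of the original one"), with the Location header pointing to the correct URL, the https one.

The Location in the 301 looks like the very same URL you requested, but it is not: your code hardcodes http, and the server answers with a redirect to https.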
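To see this redirect from code rather than by reading the raw output, the status line and the Location header can be pulled out of the response. The following is only a minimal sketch: the class RedirectCheck and its methods are hypothetical names introduced here, not part of the original program, and it assumes the status line and header lines have already been read from the socket as strings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not from the original post): detects a redirect in a raw HTTP response.
final class RedirectCheck {

    // Extracts the numeric code from a status line such as "HTTP/1.1 301 Moved Permanently".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Collects "name: value" lines into a map keyed by the lower-cased header name,
    // because header names are case-insensitive ("location" and "Location" are the same).
    static Map<String, String> headers(Iterable<String> headerLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                result.put(name, line.substring(colon + 1).trim());
            }
        }
        return result;
    }

    // Returns the Location value for a 3xx response, or null when the response is not a redirect.
    static String redirectTarget(String statusLine, Iterable<String> headerLines) {
        int code = statusCode(statusLine);
        if (code < 300 || code > 399) {
            return null;
        }
        return headers(headerLines).get("location");
    }

    public static void main(String[] args) {
        // Two of the header lines from the response shown in the question.
        List<String> headerLines = Arrays.asList(
                "cache-control: no-cache, no-store, must-revalidate",
                "location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
        System.out.println(redirectTarget("HTTP/1.1 301 Moved Permanently", headerLines));
    }
}
```

If redirectTarget returns a URL whose scheme differs from the one you connected with, that is the signal to reconnect using the other kind of socket.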

To make your code use https, replace the plain Socket with:

SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket)factory.createSocket(ia, 443);

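Note that I did not use ip: with https the certificate has to correspond to the domain, so connecting by bare IP address can make the TLS handshake fail with a certificate-related error.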
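Putting the two lines above into a complete request could look roughly like the sketch below. It is not the original poster's final code: the class HttpsGetSketch and its methods are hypothetical, and it makes a few choices of its own. It sends "Connection: close" instead of keep-alive so that the readLine() loop ends when the server closes the connection, it omits Accept-Encoding so that the body is not compressed, and it writes explicit CRLF line endings because println() uses the platform line separator. The body may still arrive with chunked transfer encoding, in which case chunk-size lines will appear in the output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical sketch (not from the original post): one GET request over TLS on port 443.
final class HttpsGetSketch {

    // Builds a minimal HTTP/1.1 GET request; an empty path is sent as "/".
    static String buildRequest(String host, String path) {
        String target = (path == null || path.isEmpty()) ? "/" : path;
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Accept: */*\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Connects by hostname (not by IP), sends the request and returns the raw response text.
    static String fetch(String host, String path) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            StringBuilder response = new StringBuilder();
            String line;
            // Ends when the server closes the connection, which "Connection: close" asks for.
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("stackoverflow.com",
                "/questions/33015868/java-simple-http-get-request-using-tcp-sockets"));
    }
}
```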

Now, whether to use a Socket or an SSLSocket is something you have to handle programmatically, based on the user's input.
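One possible way to make that decision is to look at the scheme of the URL the user typed. The sketch below is an assumption about how this could be structured, not something taken from the original answer: the class SocketChooser and its methods are hypothetical, and it parses the input with java.net.URI rather than the java.net.URL used in the question.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper (not from the original post): picks the socket type and port from the URL scheme.
final class SocketChooser {

    static boolean isHttps(URI uri) {
        return "https".equalsIgnoreCase(uri.getScheme());
    }

    // Uses the port written in the URL if there is one, otherwise 443 for https and 80 for http.
    static int portFor(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return isHttps(uri) ? 443 : 80;
    }

    // Opens a TLS socket for https and a plain socket for http, always connecting by hostname.
    static Socket open(URI uri) throws IOException {
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        SocketFactory factory = isHttps(uri)
                ? SSLSocketFactory.getDefault()
                : SocketFactory.getDefault();
        return factory.createSocket(uri.getHost(), portFor(uri));
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
        System.out.println(uri.getHost() + " -> port " + portFor(uri) + ", TLS: " + isHttps(uri));
    }
}
```

Since SSLSocket is a subclass of Socket, the rest of the program can keep working with the returned Socket's input and output streams without caring which kind it received.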

【Comments】:
