Read txt-files to mySQL database faster in Java: answers

[Question title]: Read txt-files to mySQL database faster in Java
[Posted]: 2015-03-24 04:48:36
[Question]:

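I'm trying to read more than 17,000 files (each containing between 100 and 23,000 lines) and parse the data into a MySQL database. The problem is that it runs far too slowly, and I can't tell where the bottleneck is.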

private void readFile() {
    // conn is a java.sql.Connection field of the enclosing class
    PreparedStatement prepStatement = null;

    String queryInsItem = "INSERT IGNORE INTO item VALUES(?)";
    String queryInsUser = "INSERT IGNORE INTO user VALUES(?)";
    String queryInsRating = "INSERT IGNORE INTO rating VALUES(?,?,?,?)";

    try {
        int x = 1;
        int itemID = 0;
        int userID = 0;
        int rating = 0;
        java.util.Date date = null;
        java.sql.Date sqlDate = null;
        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH);
        String line = null;

        conn.setAutoCommit(false);
        System.out.println("Loading...");
        File dir = new File("src/bigdata/training_set/");
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null) {
            for (File itemFile : directoryListing) {
                // try-with-resources closes the reader even if parsing throws
                try (BufferedReader in = new BufferedReader(new FileReader(itemFile))) {
                    // First line is "[item_id]:"; strip the trailing ':'
                    line = in.readLine();
                    itemID = Integer.parseInt(line.substring(0, line.length() - 1));
                    userID = 0;
                    rating = 0;
                    date = null;
                    sqlDate = null;

                    // Add to item table
                    prepStatement = conn.prepareStatement(queryInsItem);
                    prepStatement.setInt(1, itemID);
                    prepStatement.executeUpdate();
                    conn.commit();
                    prepStatement.close();

                    while ((line = in.readLine()) != null) {
                        // Splits "[user_id],[rating],[date]" into the corresponding variables
                        userID = Integer.parseInt(line.substring(0, line.indexOf(",")));
                        rating = Integer.parseInt(line.substring(line.indexOf(",") + 1, line.lastIndexOf(",")));
                        date = dateFormat.parse(line.substring(line.lastIndexOf(",") + 1));

                        sqlDate = new java.sql.Date(date.getTime());

                        // Add to user table (new statement and a commit for every line)
                        prepStatement = conn.prepareStatement(queryInsUser);
                        prepStatement.setInt(1, userID);
                        prepStatement.executeUpdate();
                        conn.commit();
                        prepStatement.close();

                        // Add to rating table (again a new statement and a commit per line)
                        prepStatement = conn.prepareStatement(queryInsRating);
                        prepStatement.setInt(1, userID);
                        prepStatement.setInt(2, itemID);
                        prepStatement.setInt(3, rating);
                        prepStatement.setDate(4, sqlDate);
                        prepStatement.executeUpdate();
                        conn.commit();
                        prepStatement.close();
                    }
                }
                System.out.println("File " + x++ + " done.");
            }
        }
    } catch (IOException | ParseException | SQLException e) {
        e.printStackTrace();
    }

    System.out.println("Done.");
}

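I first split each line with str.split and then changed it to indexOf/lastIndexOf, but, as mentioned in question 19486077, there was no noticeable improvement. Others in the same thread suggested using threads, but is that the right approach in my case?

Here is a snippet of the raw data: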

5317:
2354291,3,2005-07-05
185150,2,2005-07-05
868399,3,2005-07-05

The format above means:

[item_id]:
[user_id],[rating],[date]
[user_id],[rating],[date]
[user_id],[rating],[date]
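A minimal sketch of a parser for this format, assuming each file has exactly one `[item_id]:` header line followed by comma-separated rating lines with ISO dates (`yyyy-MM-dd`). The `Rating` and `RatingFileParser` names are hypothetical; `java.time.LocalDate.parse` replaces the question's `SimpleDateFormat` because the sample dates are already ISO-8601.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one parsed rating line plus the file's item_id.
final class Rating {
    final int userId;
    final int itemId;
    final int rating;
    final LocalDate date;

    Rating(int userId, int itemId, int rating, LocalDate date) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
        this.date = date;
    }
}

final class RatingFileParser {
    // Reads "[item_id]:" from the first line, then one Rating per
    // "[user_id],[rating],[date]" line.
    static List<Rating> parse(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null || !header.trim().endsWith(":")) {
            throw new IOException("expected an \"[item_id]:\" header line");
        }
        String h = header.trim();
        int itemId = Integer.parseInt(h.substring(0, h.length() - 1));

        List<Rating> ratings = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines, e.g. at the end of a file
            }
            int first = line.indexOf(',');
            int last = line.lastIndexOf(',');
            int userId = Integer.parseInt(line.substring(0, first));
            int rating = Integer.parseInt(line.substring(first + 1, last));
            // LocalDate.parse expects ISO-8601 (yyyy-MM-dd), matching the sample dates.
            LocalDate date = LocalDate.parse(line.substring(last + 1));
            ratings.add(new Rating(userId, itemId, rating, date));
        }
        return ratings;
    }
}
```

If a `java.sql.Date` is needed for a `PreparedStatement`, `java.sql.Date.valueOf(rating.date)` converts a `LocalDate` directly. As the question notes, switching between `split` and `indexOf` barely matters; the parsing is cheap compared with the per-row database round trips.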

[Discussion]:

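MySQL natively supports loading text files into database tables, and it is usually much faster than code that issues INSERT statements: dev.mysql.com/doc/refman/5.1/en/load-data.html. You may be better off converting the files into a format that can be loaded into MySQL with LOAD DATA INFILE. In my experience, it's faster.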
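A sketch of the conversion step, under several assumptions: each source file is rewritten as CSV rows `user_id,item_id,rating,date` (the column order the question's `rating` INSERT uses), and the `rating` table's column order matches that, since no column list is given. `LoadDataConverter` and its methods are hypothetical names. Running `LOAD DATA LOCAL INFILE` from JDBC also assumes `local_infile` is enabled on the server and the MySQL Connector/J URL sets `allowLoadLocalInfile=true`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class LoadDataConverter {
    // Rewrites one "[item_id]:" file as CSV rows "user_id,item_id,rating,date".
    // Returns the number of rows written.
    static int convert(BufferedReader in, Writer out) throws IOException {
        String header = in.readLine();
        if (header == null || !header.trim().endsWith(":")) {
            throw new IOException("expected an \"[item_id]:\" header line");
        }
        String h = header.trim();
        String itemId = h.substring(0, h.length() - 1); // drop the trailing ':'
        int rows = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            int first = line.indexOf(',');
            if (first < 0) {
                throw new IOException("malformed line: " + line);
            }
            out.write(line, 0, first + 1);                 // "user_id,"
            out.write(itemId);                             // "item_id"
            out.write(line, first, line.length() - first); // ",rating,date"
            out.write('\n');
            rows++;
        }
        return rows;
    }

    // No column list: CSV fields map to the rating table's columns in table order,
    // just as the question's "INSERT IGNORE INTO rating VALUES(?,?,?,?)" does.
    static String loadDataSql(String csvPath) {
        return "LOAD DATA LOCAL INFILE '" + csvPath + "' IGNORE INTO TABLE rating"
                + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'";
    }

    // Assumes the path contains no single quote; backslashes become '/'.
    static void load(Connection conn, Path csv) throws SQLException {
        String path = csv.toAbsolutePath().toString().replace('\\', '/');
        try (Statement st = conn.createStatement()) {
            st.execute(loadDataSql(path));
        }
    }
}
```

One way to use it: call `convert` for every file in the training-set directory, writing into a single `Writer` (for example one from `java.nio.file.Files.newBufferedWriter`), close it, and then call `load` once, so MySQL ingests all rows in one statement instead of one INSERT and commit per line.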

Tags: java mysql performance readfile text-parsing

[Solution 1]:

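If you have an AUTO_INCREMENT PRIMARY KEY on these tables, be aware that INSERT IGNORE burns through ids like crazy.
"Batch" the INSERTs. If you collect 100-1000 rows, build a single INSERT with those rows, and then execute that statement, the insertion will run about 10 times faster.
Don't try to batch all 23,000 rows at once; you may run into some problem (it's hard to predict which).
On the other hand, if you can use LOAD DATA on these files, you can get rid of all the parsing code! It will run at least as fast as batched inserts.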
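A minimal sketch of the batching advice, assuming the question's `rating` table and a chunk size of 1000 rows (the upper end of the suggested range). `BatchedRatingInsert`, `Row` and `CHUNK` are hypothetical names. Each chunk becomes one multi-row `INSERT IGNORE ... VALUES (?,?,?,?),(?,?,?,?),...` statement with one commit, instead of a new statement and a commit per line.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchedRatingInsert {
    static final int CHUNK = 1000; // rows per INSERT statement

    // Hypothetical row type: one parsed "[user_id],[rating],[date]" line plus item_id.
    static final class Row {
        final int userId;
        final int itemId;
        final int rating;
        final Date date;

        Row(int userId, int itemId, int rating, Date date) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.date = date;
        }
    }

    // Builds "INSERT IGNORE INTO <table> VALUES (?,..),(?,..),..." for `rows` groups
    // of `columns` placeholders each.
    static String multiRowInsert(String table, int columns, int rows) {
        StringBuilder group = new StringBuilder("(");
        for (int c = 0; c < columns; c++) {
            if (c > 0) group.append(',');
            group.append('?');
        }
        group.append(')');
        StringBuilder sb = new StringBuilder("INSERT IGNORE INTO ").append(table).append(" VALUES ");
        for (int r = 0; r < rows; r++) {
            if (r > 0) sb.append(',');
            sb.append(group);
        }
        return sb.toString();
    }

    // Inserts all rows in chunks of CHUNK: one statement and one commit per chunk.
    static void insertAll(Connection conn, List<Row> rows) throws SQLException {
        conn.setAutoCommit(false);
        for (int start = 0; start < rows.size(); start += CHUNK) {
            int end = Math.min(start + CHUNK, rows.size());
            String sql = multiRowInsert("rating", 4, end - start);
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int p = 1; // JDBC parameter indexes are 1-based
                for (Row r : rows.subList(start, end)) {
                    ps.setInt(p++, r.userId);
                    ps.setInt(p++, r.itemId);
                    ps.setInt(p++, r.rating);
                    ps.setDate(p++, r.date);
                }
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}
```

The same helper covers the `user` table with `multiRowInsert("user", 1, n)`. An alternative worth trying is plain JDBC batching (`addBatch()` / `executeBatch()`) with Connector/J's `rewriteBatchedStatements=true` URL property, which asks the driver to rewrite the batch into multi-row INSERTs itself.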

[Discussion]:

I went with the LOAD DATA approach. It's now at least 60% faster! Thanks!