MultiThread runs slower than single process: answers

【Question Title】: MultiThread runs slower than single process
【Posted】: 2016-04-07 23:24:45
【Question Description】:

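I was asked to create a simple program that creates 1000 text files, each with a random number of lines, counts how many lines there are in total via multi-threading / a single process, and then deletes those files.

Now something strange happened during testing: counting all the files linearly is always a bit faster than counting them in a multi-threaded way, which has sparked quite a bit of academic theorizing among my classmates.

When I use Scanner to read all the files, everything works as expected: the 1000 files are read in about 500 ms linearly and about 400 ms threaded.

However, when I use BufferedReader, the times drop to about 110 ms linear and 130 ms threaded.

Which part of the code causes this bottleneck, and why?

EDIT: To clarify, I'm not asking why Scanner is slower than BufferedReader.

Full compilable code (though you should change the output path used for file creation):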

```java
import java.io.*;
import java.util.Random;
import java.util.Scanner;

/**
 * Builds text files with a random number of lines and counts them with 
 * one process or multi-threading.
 * @author Hazir
 */// CLASS MATALA_4A START:
public class Matala_4A {

    /* Finals: */
    private static final String MSG = "Hello World";

    /* Privates: */
    private static int count;
    private static Random rand;

    /* Private Methods: */
    /**
     * Draws the next value from the shared random generator.
     * @return The new random value.
     */
    private static synchronized int getRand() {
        return rand.nextInt(1000);
    }

    /**
     * Increments the lines-read counter by a value.
     * @param val The amount to be incremented by.
     */
    private static synchronized void incrementCount(int val) {
        count+=val;
    }

    /**
     * Sets the lines-read counter to 0 and initializes the random generator
     * with the seed 123.
     */
    private static void Initialize() {
        count=0;
        rand = new Random(123);
    }

    /* Public Methods: */
    /**
     * Creates n files with a random number of lines.
     * @param n The number of files to be created.
     * @return String array with all the file paths.
     */
    public static String[] createFiles(int n) {
        String[] array = new String[n];
        for (int i=0; i<n; i++) {
            array[i] = String.format("C:\\Files\\File_%d.txt", i+1);
            try (   // Try with Resources: 
                    FileWriter fw = new FileWriter(array[i]); 
                    PrintWriter pw = new PrintWriter(fw);
                    ) {
                int numLines = getRand();
                for (int j=0; j<numLines; j++) pw.println(MSG);
            } catch (IOException ex) {
                System.err.println(String.format("Failed Writing to file: %s", 
                        array[i]));
            }
        }
        return array;
    }

    /**
     * Deletes all the files whose paths are specified 
     * in the fileNames array.
     * @param fileNames The files to be deleted.
     */
    public static void deleteFiles(String[] fileNames) {
        for (String fileName : fileNames) {
            File file = new File(fileName);
            if (file.exists()) {
                file.delete();
            }
        }
    }

    /**
     * Creates numFiles files.<br>
     * Counts how many lines are in all the files via multi-threading.<br>
     * Deletes all the files when finished.
     * @param numFiles The number of files to be created.
     */
    public static void countLinesThread(int numFiles) {
        Initialize();
        /* Create Files */
        String[] fileNames = createFiles(numFiles);
        Thread[] running = new Thread[numFiles];
        int k=0;
        long start = System.currentTimeMillis();
        /* Start all threads */
        for (String fileName : fileNames) {
            LineCounter thread = new LineCounter(fileName);
            running[k++] = thread;
            thread.start();
        }
        /* Join all threads */
        for (Thread thread : running) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                // Shouldn't happen.
            }
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("threads time = %d ms, lines = %d",
                end-start,count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    /**
     * Creates numFiles files.<br>
     * Counts how many lines are in all the files in one process.<br>
     * Deletes all the files when finished.
     * @param numFiles The number of files to be created. 
     */
    @SuppressWarnings("CallToThreadRun")
    public static void countLinesOneProcess(int numFiles) {
        Initialize();
        /* Create Files */
        String[] fileNames = createFiles(numFiles);
        /* Iterate Files*/
        long start = System.currentTimeMillis();
        LineCounter thread;
        for (String fileName : fileNames) {
            thread = new LineCounter(fileName);
            thread.run(); // same process
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("linear time = %d ms, lines = %d",
                end-start,count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    public static void main(String[] args) {
        int num = 1000;
        countLinesThread(num);
        countLinesOneProcess(num);
    }

    /**
     * Auxiliary class designed to count the number of lines in a text file.
     */// NESTED CLASS LINECOUNTER START:
    private static class LineCounter extends Thread {

        /* Privates: */
        private String fileName;

        /* Constructor: */
        private LineCounter(String fileName) {
            this.fileName=fileName;
        }

        /* Methods: */

        /**
         * Reads a file and counts the number of lines it has.
         */
        @Override
        public void run() {
            int count=0;
            try ( // Try with Resources:
                    FileReader fr = new FileReader(fileName);
                    //Scanner sc = new Scanner(fr);
                    BufferedReader br = new BufferedReader(fr);
                    ) {
                String str;
                for (str=br.readLine(); str!=null; str=br.readLine()) count++;
                //for (; sc.hasNext(); sc.nextLine()) count++;
                incrementCount(count);
            } catch (IOException e) {
                System.err.println(String.format("Failed Reading from file: %s", 
                fileName));            
            }
        }
    } // NESTED CLASS LINECOUNTER END;
} // CLASS MATALA_4A END;
```

【Question Discussion】:

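@kstandell No, countLinesOneProcess() is not multi-threaded. It calls the thread's .run() method without using .start(), so it simply runs as an ordinary method call on the calling thread.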
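A tiny sketch of the point in this comment, using a lambda instead of the question's LineCounter subclass: calling run() directly executes the body on the calling thread, while start() hands it to a new thread. The class RunVsStart and its method are made-up names for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: which thread actually executes the body of a Thread?
final class RunVsStart {

    // Returns the name of the thread that executed the Runnable.
    static String executingThread(boolean useStart) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(
                () -> seen.set(Thread.currentThread().getName()), "worker");
        if (useStart) {
            worker.start(); // a new thread named "worker" runs the Runnable
            try {
                worker.join(); // wait for it to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            worker.run(); // ordinary method call: runs on the current thread
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("run():   " + executingThread(false)); // prints "run():   main"
        System.out.println("start(): " + executingThread(true));  // prints "start(): worker"
    }
}
```

This is why the "linear" timing in the question really is single-threaded: every LineCounter.run() executes one after another on main.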
This is not how you benchmark Java! These numbers are completely meaningless.
What a convoluted assignment.

Tags: java multithreading java.util.scanner bufferedreader

【Solution 1】:

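The bottleneck is the disk.

You can only access the disk with one thread at a time, so using multiple threads doesn't help, and the time spent switching between threads lowers your overall performance.

Multithreading is only worthwhile when you need to split work that waits on long I/O operations from different sources (for example network and disk, two different disks, or many network streams), or when you have CPU-intensive operations that can be split across different cores.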

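Remember that for a good multithreaded program you always need to consider:

the time spent switching context between threads
whether the long I/O operations can be done in parallel or not
whether there is intensive CPU time spent on computation
whether the CPU computation can be split into subproblems
the complexity of sharing data between threads (semaphores or synchronization)
multithreaded code is harder to read, write, and maintain than a single-threaded application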

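【Discussion】:

Even though I'm using an SSD drive?
@GiladMitrani It depends on the exact makeup of the I/O. If reading from the disk is faster than the processing, then there may be a benefit. In most cases, however, even an SSD is slower than a process as simple as line counting, and the OS overhead of scheduling the I/O won't gain you anything compared with the downsides that multithreading brings.
If the computation is very fast (compared with disk access), then yes, multithreading brings no improvement.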

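【Solution 2】:

There could be several factors:

Most importantly, avoid accessing the disk from multiple threads at the same time (though since you're using an SSD, you may get away with it). On a normal hard disk, however, switching from one file to another can cost you about 10 ms of seek time (depending on how the data is cached).
1000 threads is too many; try using the number of cores * 2. Beyond that, the extra time is just wasted on context switching.
Try using a thread pool. Total times are between 110 ms and 130 ms, and part of that comes from creating the threads.
Generally, do more work in the test. A 110 ms timing isn't always very accurate, and it also depends on which other processes or threads are running at the time.
Try switching the order of the tests to see whether it makes a difference (caching can be an important factor):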
```
countLinesThread(num);
countLinesOneProcess(num);
```
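A minimal sketch of the thread-pool suggestion, assuming Java 8+ and java.nio.file instead of the question's FileReader and String paths. The pool size of availableProcessors() * 2 follows this answer's rule of thumb rather than a measured optimum, and PooledLineCounter and writeAndCount are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: count lines in many files with a small fixed-size pool
// instead of starting one Thread per file.
final class PooledLineCounter {

    // Counts the lines of one file with a BufferedReader, as in the question.
    static int countLines(Path file) throws IOException {
        int count = 0;
        try (BufferedReader br = Files.newBufferedReader(file)) {
            while (br.readLine() != null) count++;
        }
        return count;
    }

    // Submits one task per file to a pool of (cores * 2) threads and sums the results.
    static long countAll(List<Path> files) {
        int poolSize = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Integer> task = () -> countLines(file);
                results.add(pool.submit(task)); // pool threads are reused across tasks
            }
            long total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("line counting failed", e.getCause());
        } finally {
            pool.shutdown(); // lets the pool threads exit once idle
        }
    }

    // Demo helper: writes files with the given line counts into a temp directory,
    // counts them with the pool, then deletes everything again.
    static long writeAndCount(int[] lineCounts) {
        try {
            Path dir = Files.createTempDirectory("linecount");
            List<Path> files = new ArrayList<>();
            try {
                for (int i = 0; i < lineCounts.length; i++) {
                    Path file = dir.resolve("File_" + (i + 1) + ".txt");
                    Files.write(file, Collections.nCopies(lineCounts[i], "Hello World"));
                    files.add(file);
                }
                return countAll(files);
            } finally {
                for (Path file : files) Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndCount(new int[] {3, 0, 5})); // prints 8
    }
}
```

The threads are created once per pool rather than once per file, which removes the per-file thread creation cost mentioned above; whether that wins over the linear loop still depends on the disk and cache behavior.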

Also, depending on the system, currentTimeMillis() may have a resolution of only 10 to 15 ms, so timings of short runs are not very accurate.

```
long start = System.currentTimeMillis();
long end = System.currentTimeMillis();
```
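A small sketch that combines the advice in this answer and the comments below it (time with System.nanoTime() instead of currentTimeMillis(), do more work, repeat the measurement): it runs a task a few times untimed, then reports the median of several timed runs. NanoTimer and medianNanos are hypothetical helpers, and the warm-up and run counts in main are arbitrary.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Sketch: time a task with System.nanoTime(), discarding warm-up runs and
// reporting the median of several measured runs.
final class NanoTimer {

    static volatile long sink; // keeps the demo work observable so it is not optimized away

    // Runs task `warmups` times untimed, then `runs` times timed; returns the median in ns.
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmups; i++) {
            task.run(); // warm-up: JIT compilation and, for file work, the OS disk cache
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // elapsed-time clock, unrelated to wall-clock time
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        long nanos = medianNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            sink = sum;
        }, 5, 11);
        System.out.printf("median = %d us%n", TimeUnit.NANOSECONDS.toMicros(nanos));
    }
}
```

Because the warm-up runs also pull the files into the OS cache, wrapping both the threaded and the linear variant this way should make the test order matter much less.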

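【Discussion】:

Switching the order of the tests consistently made the threaded count faster. Can you tell me what kind of "caching" is happening, and where I can see it?
@GiladMitrani - It's the OS disk cache. On first access, the data or disk pages are read into the cache. All later accesses are served directly from the cache, which is much faster (especially with a normal hard disk).
FYI: the accuracy of currentTimeMillis() can be improved by running Thread.sleep(Long.MAX_VALUE) in a separate thread when the program starts.
@ferrybig - System.nanoTime() may be a better solution in this case. code.google.com/p/javasimon/wiki/SystemTimersGranularity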

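【Solution 3】:

The number of threads used is very important. A single process trying to switch between 1000 threads (you create a new thread for every file) is probably the main reason it's slower.

Try using, say, 10 threads to read the 1000 files, and you will see a noticeable speedup.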
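A sketch of the "10 threads for 1000 files" idea that still uses plain Thread objects like the question's code: each worker takes every numThreads-th item and adds its partial sum to a shared counter once at the end. To keep it self-contained it sums fake per-file line counts; in the real program the work function would open and count a file. StripedWorkers and sumWithThreads are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Sketch: process N items with a fixed number of plain Threads,
// giving each thread an interleaved share of the items.
final class StripedWorkers {

    static <T> long sumWithThreads(List<T> items, int numThreads, ToLongFunction<T> work) {
        if (numThreads <= 0) throw new IllegalArgumentException("numThreads must be positive");
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            final int offset = t;
            Thread worker = new Thread(() -> {
                long local = 0;
                // thread t handles items t, t + numThreads, t + 2 * numThreads, ...
                for (int i = offset; i < items.size(); i += numThreads) {
                    local += work.applyAsLong(items.get(i));
                }
                total.addAndGet(local); // one shared update per thread, not per item
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        List<Integer> fakeLineCounts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) fakeLineCounts.add(i % 7);
        System.out.println(sumWithThreads(fakeLineCounts, 10, Integer::longValue)); // prints 2997
    }
}
```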

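【Discussion】:

Using one thread per file is required by the assignment. However, when I use Scanner, the slower method, even using 1000 threads saves me some time.
@GiladMitrani It's simple, mate: the reason Scanner is faster with multiple threads is that switching threads is actually cheaper than the file reading.

【Solution 4】: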

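If the actual time needed for computation is negligible compared with the time needed for I/O, then the potential benefit of multithreading is negligible too: one thread can saturate the I/O just fine and then performs the very fast computation; more threads cannot speed this up. Instead, the usual threading overhead applies, and locking penalties inside the I/O implementation may actually reduce throughput.

I think the potential benefit is greatest when the CPU time needed to process a chunk of data is longer than the time needed to fetch it from disk. In that case, all threads except the one currently reading (if any) can compute, and execution speed should scale well with the number of cores. Try checking large prime candidates from a file, or cracking encrypted lines (which, silly enough, amount to much the same thing).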
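As a sketch of the contrast this answer draws, here is a CPU-heavy workload: checking prime candidates by naive trial division. Each candidate costs far more CPU than it costs to obtain, so splitting the candidates across availableProcessors() threads can scale with the core count, unlike line counting. The candidates are generated in memory rather than read from a file to keep the example self-contained, and ParallelPrimes is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

// Sketch: a CPU-bound task where the work per item dwarfs the cost of reading it.
final class ParallelPrimes {

    // Deliberately naive trial division: roughly sqrt(n) / 2 divisions per candidate.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d <= n / d; d += 2) { // d <= n / d avoids overflow of d * d
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts primes among the candidates using one interleaved task per core.
    static long countPrimes(long[] candidates) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < cores; t++) {
                final int offset = t;
                Callable<Long> task = () -> {
                    long found = 0;
                    for (int i = offset; i < candidates.length; i += cores) {
                        if (isPrime(candidates[i])) found++;
                    }
                    return found;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] candidates = LongStream.rangeClosed(1, 100_000).toArray();
        System.out.println(countPrimes(candidates)); // prints 9592
    }
}
```

Timing this with one thread versus availableProcessors() threads should show the scaling this answer describes, which the line-counting workload cannot.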

【Discussion】: