【Question title】: Seek through a FileStream, then use a StreamReader to read from there
【Posted】: 2016-07-15 14:08:02
【Question】:

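I want to be able to seek to a point in a FileStream and then use a StreamReader to read forward from there. Then I want to seek forward again and read another chunk of data with the StreamReader.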

const int BufferSize = 4096;
var buffer = new char[BufferSize];

var endpoints = new List<long>();

using (var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
{ 
    var fileLength = fileStream.Length;

    var seekPositionCount = fileLength / concurrentReads;

    long currentOffset = 0;
    for (var i = 0; i < concurrentReads; i++)
    {
        var seekPosition = seekPositionCount + currentOffset;

        // seek the file forward
        fileStream.Seek(seekPosition, SeekOrigin.Current);

        // setting true at the end is very important, keeps the underlying fileStream open.
        using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize, true))
        {
            // this also seeks the file forward the amount in the buffer...
            int bytesRead;
            var totalBytesRead = 0;
            while ((bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                totalBytesRead += bytesRead;

                var found = false;

                var gotR = false;

                for (var j = 0; j < buffer.Length; j++)
                {
                    if (buffer[j] == '\r')
                    {
                        gotR = true;
                        continue;
                    }

                    if (buffer[j] == '\n' && gotR)
                    {
                        // so we add the total bytes read, minus the current buffer amount read, then add how far into the buffer we actually read.
                        seekPosition += totalBytesRead - BufferSize + j;
                        endpoints.Add(seekPosition);
                        found = true;
                        break;
                    }
                }

                if (found) break;
            }
        }
        
        // we need to seek to the position we got to in the StreamReader (but not going by how much was read).
        fileStream.Seek(seekPosition, SeekOrigin.Current);

        currentOffset += seekPosition;
    }
}

return endpoints;

However, I get two entries in endpoints and then it exits.

(bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0

I believe the arguments you pass to ReadAsync relate only to the buffer, so I take the index parameter to mean "start filling buffer at index".

I can't tell from the Reference Source how this value is used.

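I assume (and can't find evidence to back this up) that when you open a StreamReader it uses the underlying Stream as its guide, so when you ask it to read some bytes, it reads from the underlying Stream's current position...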

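But my results don't show that. They seem to indicate that the StreamReader starts from the beginning of the Stream every time. However, I can't find evidence that this is how it actually works...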

Seeking

Is my understanding of seeking correct? If I call Seek like this:

fileStream.Seek(seekPosition, SeekOrigin.Current);

If the file is at position 300 and I want to get to 600, should the seekPosition variable above be 600?

The Reference Source suggests otherwise:

else if (origin == SeekOrigin.Current) {
    // Don't call FlushRead here, which would have caused an infinite
    // loop.  Simply adjust the seek origin.  This isn't necessary
    // if we're seeking relative to the beginning or end of the stream.
    offset -= (_readLen - _readPos);
}
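For reference, the documented behaviour of Stream.Seek can be checked with a small sketch. It uses a MemoryStream as a stand-in for the file (an assumption made only to keep the example self-contained; the class name SeekDemo is hypothetical). With SeekOrigin.Current the offset is added to the current position, so moving from 300 to 600 takes an offset of 300. The adjustment in the excerpt above appears to compensate for FileStream's own internal read buffer rather than change that contract.

```csharp
using System;
using System.IO;

class SeekDemo
{
    static void Main()
    {
        // 2000 bytes of dummy data held in memory.
        using (var stream = new MemoryStream(new byte[2000]))
        {
            stream.Seek(300, SeekOrigin.Begin);
            Console.WriteLine(stream.Position); // 300

            // Relative seek: the offset is added to the current position.
            stream.Seek(300, SeekOrigin.Current);
            Console.WriteLine(stream.Position); // 600

            // Absolute seek: the offset is the target position itself.
            stream.Seek(600, SeekOrigin.Begin);
            Console.WriteLine(stream.Position); // 600

            // Passing 600 with SeekOrigin.Current moves a further 600 bytes.
            stream.Seek(600, SeekOrigin.Current);
            Console.WriteLine(stream.Position); // 1200
        }
    }
}
```

So if the stream is at 300 and the target is 600, either pass 300 with SeekOrigin.Current or pass 600 with SeekOrigin.Begin.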

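【Comments】:

  • StreamReader keeps its own buffer; you have to call its DiscardBufferedData() method to force it to resynchronize with the FileStream.
  • After every read?
  • @HansPassant Ah, the source code says
  • @HansPassant It's interesting, because it says that applies if you need to re-read, but if you are only ever seeking forward, why would it be inefficient...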
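The buffering described in the first comment can be observed directly. The sketch below is an illustration only (the class name ReadAheadDemo and the in-memory test data are assumptions, not part of the original code): it reads a single character through a StreamReader and then prints the position of the underlying stream, which has already moved ahead by however much the reader pulled into its buffer.

```csharp
using System;
using System.IO;
using System.Text;

class ReadAheadDemo
{
    static void Main()
    {
        // 10,000 single-byte characters, so one char corresponds to one byte.
        var data = Encoding.UTF8.GetBytes(new string('a', 10000));

        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.UTF8, false, 4096, true))
        {
            reader.Read(); // consume exactly one character

            // The reader filled its internal buffer from the stream, so the
            // stream position is far beyond the one character consumed.
            Console.WriteLine(stream.Position);
        }
    }
}
```

This is why the stream position after a read through the reader cannot be used to work out how far the reader has actually got.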

Tags: c# filestream streamreader


【Solution 1】:

Thanks to Hans Passant, I got the answer:

var buffer = new char[BufferSize];

var endpoints = new List<long>();

using (var fileStream = this.CreateMultipleReadAccessFileStream(fileName))
{
    var fileLength = fileStream.Length;

    var seekPositionCount = fileLength / concurrentReads;

    long currentOffset = 0;
    for (var i = 0; i < concurrentReads; i++)
    {
        var seekPosition = seekPositionCount + currentOffset;

        // seek the file forward
        // fileStream.Seek(seekPosition, SeekOrigin.Current);

        // setting true at the end is very important, keeps the underlying fileStream open.
        using (var streamReader = this.CreateTemporaryStreamReader(fileStream))
        {
            // this is poor on performance, hence why you split the file here and read in new threads.
            streamReader.DiscardBufferedData();
            // you have to advance the fileStream here, because of the previous line
            streamReader.BaseStream.Seek(seekPosition, SeekOrigin.Begin);
            // this also seeks the file forward the amount in the buffer...
            int bytesRead;
            var totalBytesRead = 0;
            while ((bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                totalBytesRead += bytesRead;

                var found = false;

                var gotR = false;

                for (var j = 0; j < bytesRead; j++)
                {
                    if (buffer[j] == '\r')
                    {
                        gotR = true;
                        continue;
                    }

                    if (buffer[j] == '\n' && gotR)
                    {
                        // add the characters read before this buffer plus the index within it (this assumes one byte per character).
                        seekPosition += totalBytesRead - bytesRead + j;
                        endpoints.Add(seekPosition);
                        found = true;
                        break;
                    }
                    // if we have found new line then move the position to 
                }

                if (found) break;
            }
        }

        currentOffset = seekPosition;
    }
}

return endpoints;

Note the new part. Instead of doing this twice:

fileStream.Seek(seekPosition, SeekOrigin.Current);

I now use SeekOrigin.Begin and go through the StreamReader to advance the underlying base stream:

// this is poor on performance, hence why you split the file here and read in new threads.
streamReader.DiscardBufferedData();
// you have to advance the fileStream here, because of the previous line
streamReader.BaseStream.Seek(seekPosition, SeekOrigin.Begin);

Calling DiscardBufferedData means the reader always starts from the underlying stream's current position.
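The discard-then-seek pattern can be tried in isolation with a minimal sketch. The helper ReadLineAt, the class name ResyncDemo and the in-memory data are hypothetical and serve only as illustration; they are not part of the code above.

```csharp
using System;
using System.IO;
using System.Text;

class ResyncDemo
{
    // Hypothetical helper: reads one line starting at the given byte offset.
    static string ReadLineAt(StreamReader reader, long byteOffset)
    {
        // Drop whatever the reader buffered from an earlier position...
        reader.DiscardBufferedData();
        // ...then move the underlying stream to an absolute position.
        reader.BaseStream.Seek(byteOffset, SeekOrigin.Begin);
        return reader.ReadLine();
    }

    static void Main()
    {
        // Three 7-byte lines (5 letters/digits plus "\r\n"), ASCII only.
        var data = Encoding.ASCII.GetBytes("line0\r\nline1\r\nline2\r\n");

        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, new UTF8Encoding(false), false, 4096, true))
        {
            Console.WriteLine(ReadLineAt(reader, 7));  // line1
            Console.WriteLine(ReadLineAt(reader, 14)); // line2
            Console.WriteLine(ReadLineAt(reader, 0));  // line0
        }
    }
}
```

Seek offsets are measured in bytes, while StreamReader reports counts of characters. The two only coincide for single-byte text such as the ASCII data used here, so offsets derived from character counts may need adjusting for files that contain multi-byte UTF-8 sequences.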

【Comments】:
