【Question title】: How can I return a record from a CSV file using the byte position of a line?
【Posted】: 2022-11-16 20:53:54
【Question description】:

I have an assets.csv file that is 172 MB, with one million rows and 16 columns. I want to read it using an offset (byte, line, or record). In the code below, I am using the byte value.

I have stored the required positions (the byte() value of record.position(), saved in assets_index.csv), and I want to read a specific line of assets.csv using the saved offset.
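
A minimal sketch of how the saved offsets might be loaded back for lookup, assuming the two-column asset_id,offset_inbytes layout shown later in this question and a header on the first line. load_index is a hypothetical helper name and uses only the standard library:

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Parses index lines of the form `asset_id,offset_inbytes` into a map
/// from asset id to byte offset. The first line is assumed to be a header.
fn load_index<R: BufRead>(reader: R) -> io::Result<HashMap<String, u64>> {
    let mut index = HashMap::new();
    for line in reader.lines().skip(1) {
        let line = line?;
        // Skip blank lines rather than failing on them.
        if line.trim().is_empty() {
            continue;
        }
        let (id, offset) = line.split_once(',').ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "expected `id,offset`")
        })?;
        let offset: u64 = offset
            .trim()
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        index.insert(id.trim().to_string(), offset);
    }
    Ok(index)
}
```

Passing a BufReader wrapped around the opened assets_index.csv would give a map in which looking up an asset id returns the byte offset to seek to.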

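I am able to get the output, but I feel there must be a better way to read from a CSV file based on a byte position.

Please advise. I am new to programming and also new to Rust, and I have learned a lot from tutorials.

The format of assets.csv is as follows: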

asset_id,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation
1000001,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000

I use another function to obtain Position { byte: 172999933, line: 1000000, record: 999999 }.
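
The question does not show the function that produces these positions. As a rough sketch of one way such byte offsets could be collected with only the standard library, assuming every record occupies exactly one line (no quoted fields with embedded newlines), the hypothetical helper line_offsets records where each line starts:

```rust
use std::io::{self, BufRead};

/// Returns the byte offset at which each line of `reader` starts.
/// Assumes one record per line (no quoted fields containing newlines).
fn line_offsets<R: BufRead>(mut reader: R) -> io::Result<Vec<u64>> {
    let mut offsets = Vec::new();
    let mut buf = Vec::new();
    let mut pos: u64 = 0;
    loop {
        buf.clear();
        // read_until returns the number of bytes consumed, delimiter included.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        offsets.push(pos);
        pos += n as u64;
    }
    Ok(offsets)
}
```

The first offset belongs to the header line, so data records start at index 1 of the returned vector.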

The format of assets_index.csv is as follows:

asset_id,offset_inbytes
1999999,172999933

use std::{error::Error, io};

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 172999933 as u64;

    let file_path = "assets.csv";

    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_path(file_path)?;

    let mut wtr = csv::Writer::from_writer(io::stdout());

    let mut record = csv::ByteRecord::new();

    while rdr.read_byte_record(&mut record)? {
        
        let pos = &record.position().expect("position of record");

        if pos.byte() == asset_pos
        { 
            wtr.write_record(&record)?; 
            break;
        }     
    }

    wtr.flush()?;

    Ok(())
}

$ time ./target/release/testcsv
1999999,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000

Time elapsed in readcsv() is: 239.290125ms

./target/release/testcsv  0.22s user 0.02s system 99% cpu 0.245 total

【Question discussion】:

    Tags: csv rust offset


    【Solution 1】:

    Instead of from_path, you can use from_reader with a File, and seek in that file before creating the CSV reader:

    use std::{error::Error, fs, io::{self, Seek}};
    
    fn read_from_position() -> Result<(), Box<dyn Error>> {
        let asset_pos = 0x115 as u64; // offset to only record in example
        let file_path = "assets.csv";
    
        let mut f = fs::File::open(file_path)?;
        f.seek(io::SeekFrom::Start(asset_pos))?;
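        // The file is already positioned at a data record, so disable header
        // handling; otherwise the reader treats the first record it sees as
        // the header row and skips it.
        let mut rdr = csv::ReaderBuilder::new()
            .has_headers(false)
            .flexible(true)
            .from_reader(f);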
    
        let mut wtr = csv::Writer::from_writer(io::stdout());
        let mut record = csv::ByteRecord::new();
    
        rdr.read_byte_record(&mut record)?;
        wtr.write_record(&record)?;
        wtr.flush()?;
        Ok(())
    }
    

    The first record read will then be the one you are looking for.
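
For the assets.csv layout in the question, where each record sits on a single line, the same seek-then-read idea can also be sketched with only the standard library. read_line_at is a hypothetical helper, not part of the csv crate, and it would not handle quoted fields that contain newlines:

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Seeks to `offset` and returns the line that starts there, without the
/// trailing line terminator. Assumes the record does not span several lines.
fn read_line_at<R: Read + Seek>(mut src: R, offset: u64) -> io::Result<String> {
    // Jump straight to the record instead of scanning from the start.
    src.seek(SeekFrom::Start(offset))?;
    let mut reader = BufReader::new(src);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    // Strip the "\n" or "\r\n" that read_line leaves at the end.
    let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
    Ok(trimmed.to_string())
}
```

Because the read starts at the stored offset, the cost no longer grows with the position of the record in the file, unlike the loop in the question that compares every record's position.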

    【Discussion】:
