【Title】: Renaming multiple file paths INSIDE a file with updated file paths
【Posted】: 2022-01-09 11:43:19
【Question】:

I have a file named experiments.txt that contains the arguments for a Python script named script.py.

../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True

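Assume the folder structure and files are as shown below. Note that the csv files in data/ correspond to the files listed in experiments.txt, but carry updated timestamps.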

data
 |___20211117_09-10-50CST_raw_fold_results-mlr.csv
 |___20211117_09-11-35CST_raw_fold_results-rf.csv
src
 |___script.py
 |___experiments.txt

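I want to replace the first argument (e.g., ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv) on every line of experiments.txt with the corresponding updated file in data/, so that experiments.txt (or a new file such as experiments-2.txt) becomes: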

..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

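I know I could write a clumsy solution in Python, but my solution seems suboptimal at best and very poorly designed at worst. How can I perform this task in bash (it seems well suited to the job, but I am not sure how to go about it)?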

# This sample solution is written in `.ipynb` in the `src/` directory
import os
from pathlib import Path

cwd = os.getcwd()  # src
replacement_fnames = [file for file in os.listdir(os.path.join(cwd, '..', 'data'))]
with open('experiments.txt', 'r') as fobj:
    lines = [line.strip() for line in fobj.readlines()]

    # The replacement lines for the file `experiments-2.txt` will be
    # appended to this empty string
    write_str = ''

    for line in lines:

        # A line in the file is of the form
        # `path <SPACE> opts`, therefore splitting the line into a
        # list delimited by a space `' '` allows access to the `path`
        # by indexing 0
        space_separated_line = line.split(' ')
        cur_path = Path(space_separated_line[0])
        cur_fname = Path(cur_path).name

        # File names are separated by model name... in this case
        # `mlr` and `rf`... by splitting the file name into a list
        # delimited by `-`, then the last element of that list is the
        # name of the model
        # e.g., cur_fname = 20211015_08-09-50CST_raw_fold_results-mlr.csv
        # cur_fname.split('-') --> ['20211015_08-09-50CST_raw_fold_results', 'mlr.csv']
        cur_fname_model_name = cur_fname.split('-')[-1] 

        for replacement_fname in replacement_fnames:

            # Extract model name from the replacement fname in the same
            # fashion as done for cur_fname
            replacement_fname_model_name = replacement_fname.split('-')[-1]

            if replacement_fname_model_name == cur_fname_model_name:
                space_separated_line[0] = os.path.join(Path(cur_path).parent, replacement_fname)
                
        write_str += ' '.join(space_separated_line) + '\n'

print('Original:')
print('\n'.join(lines))
print()
print('Replaced:')
print(write_str)

with open('experiments-2.txt', 'w') as fobj:
    fobj.write(write_str)

## Output
# Original:
# ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
# ../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True

# Replaced:
# ..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
# ..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

【Comments】:

    Tags: python bash file


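    【Solution 1】:

    Assuming a filename such as 20211015_08-09-50CST_raw_fold_results-mlr.csv can be split into a variable prefix 20211015_08-09- and a fixed substring 50CST_raw_fold_results-mlr.csv, we can look up the files that currently exist in the data directory by their fixed substring.
    Then would you please try: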

    #!/bin/bash
    
    declare -A map                          # associative array to map filenames
    for f in ../data/*.csv; do              # find the csv filenames in the ../data dir
        f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$f")"
                                            # remove the variable prefix (dirname and the date)
        map[$f2]=$f                         # map the fixed substring of the filename to the fullpath
    done
    
    while read -r path opts; do             # read line of experiments.txt and break into variables
        f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$path")"
                                            # remove the variable prefix (dirname and the date)
        f=${map[$f2]}                       # map filename via the fixed substring
        if [[ -n $f ]]; then                # if the variable $f is not empty, the file exists
            echo "${f//\//\\} $opts"        # replace slashes with backslashes and write to "experiments-2.txt"
        fi
    done < experiments.txt > experiments-2.txt
    
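    • In the for f in ../data/*.csv; do loop, suppose f is assigned ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv. The sed command sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' then strips the prefix, and f2 is assigned 50CST_raw_fold_results-mlr.csv.
    • map[$f2]=$f stores the full pathname ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv in the associative array (known as a dictionary in python) under the key 50CST_raw_fold_results-mlr.csv.
    • In the while loop that follows, we use the fixed substring as the key to look up the full pathname and replace the filename with it.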
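
The key extraction and lookup described in the bullets above can be traced on the sample filenames from the question. This is only a minimal sketch: the names PREFIX, fixed_part, and lookup are made up for illustration, and the file list is hard-coded instead of being read from ../data.

```python
import re

# Same pattern as the sed expression above: strips the directory, the
# 8-digit date, and the first two time fields.
PREFIX = re.compile(r'.*[0-9]{8}_[0-9]{2}-[0-9]{2}-')

def fixed_part(path):
    """Return the part of the filename that stays the same across updates."""
    return PREFIX.sub('', path, count=1)

# Hard-coded stand-in for the contents of ../data (assumption for this sketch)
new_files = [
    '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv',
    '../data/20211117_09-11-35CST_raw_fold_results-rf.csv',
]
lookup = {fixed_part(f): f for f in new_files}

old = '../data/20211015_08-09-50CST_raw_fold_results-mlr.csv'
print(fixed_part(old))          # 50CST_raw_fold_results-mlr.csv
print(lookup[fixed_part(old)])  # ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv
```

Note that the match only works because the old and new filenames share everything after the second time field. If the seconds differed between the two files, the keys would differ and the lookup would miss.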

    [Alternative]
    If we translate the bash script above into python, it will look like:

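    #!/usr/bin/python
    
    import glob
    import re
    
    # Same pattern as the sed expression: the directory plus the variable date prefix
    prefix = r'.*\d{8}_\d{2}-\d{2}-'
    
    # Map the fixed substring of each csv filename to its full path
    # (named path_map so that the built-in map is not shadowed)
    path_map = {re.sub(prefix, '', f): f for f in glob.glob('../data/*.csv')}
    with open('experiments.txt', 'r') as f, open('experiments-2.txt', 'w') as fw:
        for line in f:
            path, opts = line.strip().split(' ', 1)   # first argument, remaining options
            f2 = re.sub(prefix, '', path)             # fixed substring of the old filename
            if f2 in path_map:
                # replace slashes with backslashes and write the updated line
                fw.write(' '.join([path_map[f2], opts]).replace('/', '\\') + '\n')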
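
One detail of the script above is that .replace('/', '\\') is applied to the whole output line, so a slash inside the options would be converted as well. If that matters, a possible variation is to convert only the path with pathlib.PureWindowsPath, which renders backslash separators on any platform. The helper name to_windows and the --out option below are hypothetical and only serve to illustrate the difference.

```python
from pathlib import PureWindowsPath

def to_windows(path):
    """Render a forward-slash relative path with backslash separators."""
    return str(PureWindowsPath(path))

path = '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv'
opts = '--out results/plots'   # hypothetical option that contains a slash

# Only the path is converted; the options are left untouched.
print(to_windows(path) + ' ' + opts)
# ..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --out results/plots
```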
    

    For reference.

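    【Discussion】:

    • From the src/ directory, I tried these commands; however, experiments-2.txt contains only ../data/data1-2.csv --plot True
    • I looked around, and it seems sed might also be a good tool here? The question I posted is a toy problem; in reality the filenames in the data directory are simply names such as 20211010_raw-model_name1.csv that correspond to the updated data. In experiments.txt, the first argument might then be 20200909_raw-model_name1.csv, which is the one to be replaced. So my idea is to iterate over the names of the updated data in the data directory and, if the model name (delimited by '-') matches a line in experiments.txt, change the first argument of that line.
    • Thank you for the feedback. However, I still cannot tell why my code did not produce the result you expected. Could you update your question with your actual filenames and the expected result so that I can reproduce the problem? As you mention, sed is an option for replacing the filenames, but I have decided that bash alone is sufficient for the example you provided. BR.
    • I have updated the question to reflect the current problem. I have also updated the sample Python solution, which I hope illustrates exactly what I am looking for. Thanks!!
    • Thank you for the clear update. Now I think I understand. Could you test with the updated script? Cheers!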
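
The matching rule proposed in the comments above (key each file on the model name that follows the last '-') could be sketched as follows. The filenames are the ones quoted in the comment; the ../data/ prefix, the --plot True options, and the helper names model_key and update_line are assumptions for illustration, not part of the answer's script.

```python
import posixpath

def model_key(path):
    """Return the text after the last '-' in the file name, e.g. 'model_name1.csv'."""
    return posixpath.basename(path).rsplit('-', 1)[-1]

def update_line(line, new_paths):
    """Swap the first argument of `line` for the new path with the same model key.

    Lines whose model has no updated file are returned unchanged.
    """
    path, sep, opts = line.partition(' ')
    by_model = {model_key(p): p for p in new_paths}
    return by_model.get(model_key(path), path) + sep + opts

new_paths = ['../data/20211010_raw-model_name1.csv']
print(update_line('../data/20200909_raw-model_name1.csv --plot True', new_paths))
# ../data/20211010_raw-model_name1.csv --plot True
```

Keying on the model name alone assumes that each model has exactly one updated file in the data directory; if two files share a model name, the later one in the list wins.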