Renaming multiple file paths inside a file with updated file paths: answers

【Question title】: Renaming multiple file paths INSIDE a file with updated file paths
【Posted】: 2022-01-09 11:43:19
【Question】:

I have a file named experiments.txt that contains the arguments for a Python script named script.py.

../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True

Assume the folder structure and files shown below. Note that the csv files in data/ are not the same as the ones listed in experiments.txt.

data
 |___20211117_09-10-50CST_raw_fold_results-mlr.csv
 |___20211117_09-11-35CST_raw_fold_results-rf.csv
src
 |___script.py
 |___experiments.txt

I want to replace the first argument (e.g., ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv) on every line of experiments.txt with the updated data file, so that experiments.txt (or a new file such as experiments-2.txt) becomes

..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

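I know I could write a clumsy solution in Python, but mine seems suboptimal at best and very poorly designed at worst. How can I perform this task in bash (it seems well suited to the job, but I am not sure how)?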

# This sample solution is written in `.ipynb` in the `src/` directory
import os
from pathlib import Path

cwd = os.getcwd()  # src
replacement_fnames = [file for file in os.listdir(os.path.join(cwd, '..', 'data'))]
with open('experiments.txt', 'r') as fobj:
    lines = [line.strip() for line in fobj.readlines()]

    # The replacement lines for the file `experiments-2.txt` will be
    # appended to this empty string
    write_str = ''

    for line in lines:

        # A line in the file is of the form
        # `path <SPACE> opts`, therefore splitting the line into a
        # list delimited by a space `' '` allows access to the `path`
        # by indexing 0
        space_separated_line = line.split(' ')
        cur_path = Path(space_separated_line[0])
        cur_fname = Path(cur_path).name

        # File names are separated by model name... in this case
        # `mlr` and `rf`... by splitting the file name into a list
        # delimited by `-`, then the last element of that list is the
        # name of the model
        # e.g., cur_fname = 20211015_08-09-50CST_raw_fold_results-mlr.csv
        # cur_fname.split('-') --> ['20211015_08-09-50CST_raw_fold_results', 'mlr.csv']
        cur_fname_model_name = cur_fname.split('-')[-1] 

        for replacement_fname in replacement_fnames:

            # Extract model name from the replacement fname in the same
            # fashion as done for cur_fname
            replacement_fname_model_name = replacement_fname.split('-')[-1]

            if replacement_fname_model_name == cur_fname_model_name:
                space_separated_line[0] = os.path.join(Path(cur_path).parent, replacement_fname)
                
        write_str += ' '.join(space_separated_line) + '\n'

print('Original:')
print('\n'.join(lines))
print()
print('Replaced:')
print(write_str)

with open('experiments-2.txt', 'w') as fobj:
    fobj.write(write_str)

## Output
# Original:
# ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
# ../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True

# Replaced:
# ..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
# ..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

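【Question comments】:

Tags: python bash file

【Solution 1】:

Assuming that a file name such as 20211015_08-09-50CST_raw_fold_results-mlr.csv can be split into a variable prefix 20211015_08-09- and a fixed substring 50CST_raw_fold_results-mlr.csv, we can use the fixed substring to look up the matching file that currently exists in the data directory.
Then would you please try: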

#!/bin/bash

declare -A map                          # associative array to map filenames
for f in ../data/*.csv; do              # find the csv filenames in the ../data dir
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$f")"
                                        # remove the variable prefix (dirname and the date)
    map[$f2]=$f                         # map the fixed substring of the filename to the fullpath
done

while read -r path opts; do             # read line of experiments.txt and break into variables
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$path")"
                                        # remove the variable prefix (dirname and the date)
    f=${map[$f2]}                       # map filename via the fixed substring
    if [[ -n $f ]]; then                # if the variable $f is not empty, the file exists
        echo "${f//\//\\} $opts"        # replace slashes with backslashes and write to "experiments-2.txt"
    fi
done < experiments.txt > experiments-2.txt

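In the for f in ../data/*.csv; do loop, suppose f is assigned ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv. The sed command sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' then strips the prefix, and f2 is assigned 50CST_raw_fold_results-mlr.csv.
map[$f2]=$f stores the full path ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv in the associative array (the equivalent of a dictionary in Python) under the key 50CST_raw_fold_results-mlr.csv.
In the following while loop, we use the fixed substring as the key to look up the full path and replace the file name.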
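
To make the key extraction concrete, here is a minimal Python sketch of the same prefix stripping, applied to the example file names from the question. The helper name fixed_key is hypothetical, and the pattern mirrors the sed expression above, so it carries the same assumption: the seconds field and everything after it are identical in the old and the new file name.

```python
import re

# Mirrors the sed expression: directory part, 8-digit date, then hours and minutes.
PREFIX = re.compile(r'.*\d{8}_\d{2}-\d{2}-')

def fixed_key(path):
    """Return the part of the file name after the variable prefix (hypothetical helper)."""
    return PREFIX.sub('', path, count=1)

old = '../data/20211015_08-09-50CST_raw_fold_results-mlr.csv'
new = '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv'

print(fixed_key(old))                    # key derived from the line in experiments.txt
print(fixed_key(new))                    # key derived from the file in ../data
print(fixed_key(old) == fixed_key(new))  # the lookup only works if these agree
```

If the two keys differ for a given pair of files, the lookup finds nothing and the script silently skips that line, so printing the keys like this may be a quick way to debug an empty or incomplete experiments-2.txt.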

[Alternative]
If we translate the bash script above into Python, it looks like this:

#!/usr/bin/python

import glob
import re

map = {re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', f) : f for f in glob.glob('../data/*.csv')}
with open('experiments.txt', 'r') as f, open('experiments-2.txt', 'w') as fw:
    for line in f:
        path, opts = line.strip().split(' ', 1)
        f2 = re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', path)
        if f2 in map:
            fw.write(' '.join([map[f2], opts]).replace('/', '\\') + '\n')

FYI.

【Comments】:

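From the src/ directory, I tried these commands; however, experiments-2.txt only contains ../data/data1-2.csv --plot True
I looked around, and it seems sed might also be a good tool here? The question I posted is a toy problem; in reality the file names in the data directory are just the names of the updated data, such as 20211010_raw-model_name1.csv. In experiments.txt, the first argument to be replaced might then be 20200909_raw-model_name1.csv. So my idea is to iterate over the names of the updated data in the data directory and, if the model name (delimited by '-') matches a line in experiments.txt, change that line's first argument.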
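
The idea described in this comment could be sketched in Python as follows. This is only an illustration under stated assumptions: the model name is taken to be everything after the last '-' in the file name, each model has exactly one updated file, and the names model_key and rewrite_lines are hypothetical. It works on in-memory lists so that it does not depend on a particular directory layout.

```python
import posixpath

def model_key(path):
    """Return the text after the last '-' of the file name (hypothetical helper)."""
    return posixpath.basename(path).rsplit('-', 1)[-1]

def rewrite_lines(lines, updated_names, data_dir='../data'):
    """Swap the first argument of each line for the updated file of the same model."""
    # Assumption: one updated file per model, so a plain dict is enough.
    by_model = {model_key(name): name for name in updated_names}
    result = []
    for line in lines:
        # partition keeps lines without options intact instead of raising.
        path, sep, opts = line.strip().partition(' ')
        replacement = by_model.get(model_key(path))
        if replacement is not None:
            path = posixpath.join(data_dir, replacement)
        result.append(path + sep + opts)
    return result

lines = ['../data/20200909_raw-model_name1.csv --plot True']
updated = ['20211010_raw-model_name1.csv']
print(rewrite_lines(lines, updated))
```

Lines whose model has no updated file are passed through unchanged here, which differs from the scripts in the answer (they drop such lines). The sketch emits forward slashes; if backslashes are wanted as in the question's expected output, each result could be passed through .replace('/', '\\').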
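Thank you for the feedback. However, I still do not see why my code did not produce the result you expected. Could you update your question with your actual file names and the expected result so that I can reproduce the problem? As you mention, sed is an option for replacing the file names, but I decided that bash alone is sufficient for the example you provided. BR.
I have updated the question to reflect the current problem. I have also updated the sample Python solution, which I hope illustrates exactly what I am looking for. Thanks!!
Thank you for the clear update. I think I understand now. Could you test it with the updated script? Cheers!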