【Question Title】: How to get new files coming into a folder while processing files for the same folder in shell script
【Posted】: 2016-05-19 19:25:38
【Question】:

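At the beginning of my shell script, I have a FOR loop that scans a folder to see whether there are any files in it, and if there are, I need to process each one. Processing takes some time (say, a few minutes), depending on how many files are in the folder.

The problem is that new files may arrive in the folder while the existing ones are being processed, and my tests show that these new files are not picked up and processed. Is there a way to detect new files that appear while the FOR loop is running?

I have considered checking the folder for new files periodically, but I don't want to reprocess the existing files. More importantly, since this is only the beginning of the script, I don't want the FOR loop to repeat too many times. Thanks.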

for aFile in "$mydir"/*
do
   # some tasks that may take 30 secs or so to finish for each file
   :   # placeholder so the loop body is not empty
done
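A quick way to see why the new files are missed (a sketch, not from the original post): the shell expands "$mydir"/* into a fixed list of file names once, before the first iteration, so files created while the loop is running are never visited. The function name demo_glob_snapshot and the throwaway directory from mktemp -d are illustrative only.

```shell
#!/bin/bash
# Sketch: the glob in a for loop is expanded once, before the first
# iteration, so files created inside the loop are not visited.
demo_glob_snapshot() {
    local dir count=0
    dir=$(mktemp -d) || return 1
    touch "$dir/file1" "$dir/file2"

    for aFile in "$dir"/*; do
        count=$((count + 1))
        # simulate a new file arriving while we are busy processing
        touch "$dir/new-$count"
    done

    rm -rf "$dir"
    echo "$count"   # number of iterations actually performed
}

demo_glob_snapshot   # prints 2: new-1 and new-2 were never visited
```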

【Question Discussion】:

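  • Have you considered refactoring to use fsnotifywait? I have a bash script that sits in an infinite loop and traps signals. I have another bash script that runs fsnotifywait on a given directory and sends a signal to the first script when the right file-system event occurs.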

Tags: bash shell


【Solution 1】:

How about something like this:

#!/bin/bash -xe

# create some dummy files to start with
touch filea
touch fileb

function analyzeFile() {
    echo "analyzing $1"
    sleep 10    # dummy for the real stuff you need to do
}

declare stillGettingSomething
declare -A alreadyAnalyzed

stillGettingSomething=true
while [ $stillGettingSomething ]; do
    stillGettingSomething=false    # prevent endless looping

    for i in ./file*; do
        # idea: see also http://superuser.com/questions/195598/test-if-element-is-in-array-in-bash 

        if [[ ${alreadyAnalyzed[$i]} ]]; then
            echo "$i was already analyzed before; skipping it immediately"
            continue
        fi

        alreadyAnalyzed[$i]=true    # Memorize the file which we visited
        stillGettingSomething=true  # We found some new file; we have to run another scan iteration later on

        analyzeFile "$i"

        # create some new files for the purpose of demonstration
        echo "creating file $i-latecreate"
        touch "$i-latecreate"
    done

done

The script produces the following trace (excerpt):

+ declare stillGettingSomething
+ declare -A alreadyAnalyzed
+ stillGettingSomething=true
+ '[' true ']'
+ stillGettingSomething=false
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./filea
+ echo 'analyzing ./filea'
analyzing ./filea
+ sleep 10
+ echo 'creating file ./filea-latecreate'
creating file ./filea-latecreate
+ touch ./filea-latecreate
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./fileb
+ echo 'analyzing ./fileb'
analyzing ./fileb
+ sleep 10
+ echo 'creating file ./fileb-latecreate'
creating file ./fileb-latecreate
+ touch ./fileb-latecreate
+ '[' true ']'
+ stillGettingSomething=false
+ for i in './file*'
+ [[ -n true ]]
+ echo './filea was already analyzed before; skipping it immediately'
./filea was already analyzed before; skipping it immediately
+ continue
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./filea-latecreate
+ echo 'analyzing ./filea-latecreate'
analyzing ./filea-latecreate
+ sleep 10

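The idea behind it is to use an associative array that remembers which files have already been processed. If a file has already been processed, we skip it the next time we come across it. We keep rescanning as long as a scan pass turns up at least one new file.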
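A compact, self-contained sketch of the same idea, assuming bash 4+ (for local -A). The function process_until_quiet and the file names a, b and c are hypothetical, and the touch stands in for a file arriving mid-run. Because the loop only stops after a full pass finds nothing new, c is picked up on the second pass and the third pass ends the loop.

```shell
#!/bin/bash
# Sketch of the "remember what was processed, rescan until nothing new" idea.
# Requires bash 4+ for associative arrays.
shopt -s nullglob   # an empty directory yields no words instead of a literal '*'

process_until_quiet() {
    local dir=$1 f found
    local -A seen=()          # keys are paths already processed
    local -a order=()         # processing order, for display

    while :; do
        found=false
        for f in "$dir"/*; do
            [[ -n ${seen[$f]} ]] && continue   # processed in an earlier pass
            seen[$f]=1
            found=true
            order+=("${f##*/}")
            # hypothetical stand-in for the real work: processing "a"
            # drops a new file "c" into the directory mid-run
            [[ ${f##*/} == a ]] && touch "$dir/c"
        done
        $found || break       # a full pass found nothing new: stop
    done
    echo "${order[*]}"
}

dir=$(mktemp -d)
touch "$dir/a" "$dir/b"
process_until_quiet "$dir"    # prints: a b c
rm -rf "$dir"
```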

Edit: cleaned-up code

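Here is a cleaned-up variant of the code above, with the demonstration-only code removed and kept as close as possible to the original requirements.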

#!/bin/bash
# Needs bash 4+ (associative arrays); plain sh has no declare -A.

shopt -s nullglob    # an empty folder yields no words instead of a literal '*'

function analyzeFile() {
    echo "analyzing $1"
    sleep 10    # dummy for the real stuff you need to do
}

declare stillGettingSomething
declare -A alreadyAnalyzed

stillGettingSomething=true
# compare explicitly: [ $stillGettingSomething ] would also succeed for "false"
while [ "$stillGettingSomething" = true ]; do
    stillGettingSomething=false    # prevent endless looping

    for i in "$mydir"/*; do
        [ -f "$i" ] || continue    # only process regular files

        if [[ ${alreadyAnalyzed[$i]} ]]; then
            echo "$i was already analyzed before; skipping it immediately"
            continue
        fi

        alreadyAnalyzed[$i]=true    # Memorize the file which we visited
        stillGettingSomething=true  # We found some new file; we have to run another scan iteration later on

        analyzeFile "$i"
    done
done

【Discussion】:

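  • Thanks. But when does the while loop end? For me it is an infinite loop.
  • That is only an infinite loop for demonstration purposes. Just remove the two lines echo "creating file $i-latecreate" and touch $i-latecreate from the loop and it will work, also when you add more files.
【Solution 2】: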

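This is an interesting problem, and there are many ways to solve it. One way is to keep track of which files have been finished, and on each loop iteration process the first file that has not been done yet, e.g.,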

cd "$mydir" || exit 1
# make a donedir to put placeholder dummy files
mkdir -p donedir

while true; do

  # find first file with no corresponding dummy file in donedir
  # (we are inside $mydir, so the marker path is donedir/<name>;
  #  '|' is used as the sed delimiter because the path contains '/')
  newfile=$(find * -maxdepth 0 -type f |
    sed 's|.*|[ ! -f "donedir/&" ] \&\& echo "&"|' |
    sh | head -n1)

  # break out of the loop if there aren't any
  [ "$newfile" = "" ] && break

  # do your thing with $newfile...

  # record that you're done with $newfile
  touch "donedir/$newfile"
done

A more efficient strategy is to move each file into donedir once you are done with it:

cd "$mydir" || exit 1
mkdir -p donedir

while true; do

  # find first file (donedir itself is skipped by -type f)
  newfile=$(find * -maxdepth 0 -type f | head -n1)

  # break out of the loop if there aren't any
  [ "$newfile" = "" ] && break

  # do your thing with $newfile...

  # done with $newfile...
  mv "$newfile" donedir
done

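You could also keep track of which files are done with, for example, an associative array as EagleRainbow suggested, but that approach has two drawbacks: 1. unnecessary complexity, and 2. it does not automatically remember which files are done across different runs of the script.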
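To illustrate point 2, here is a sketch under these assumptions: bash, a hypothetical process_pending function, and each call standing in for a separate run of the script. It records completion as marker files in donedir, like the first snippet above, but finds unprocessed files with a plain glob loop instead of the sed | sh pipeline. Since the state lives on disk rather than in memory, a second run does nothing until a new file shows up.

```shell
#!/bin/bash
# Sketch: marker files in donedir remember finished work across separate
# runs, which an in-memory associative array cannot do.
shopt -s nullglob

# Process every regular file in $1 that has no marker in $1/donedir yet;
# rescan until a full pass finds nothing new. Prints what it processed.
process_pending() {
    local dir=$1 f name found
    local -a processed=()
    mkdir -p "$dir/donedir"
    while :; do
        found=false
        for f in "$dir"/*; do
            [ -f "$f" ] || continue              # skips donedir itself
            name=${f##*/}
            [ -e "$dir/donedir/$name" ] && continue
            # ... real work on "$f" would go here ...
            processed+=("$name")
            touch "$dir/donedir/$name"           # record completion on disk
            found=true
        done
        $found || break
    done
    echo "${processed[*]}"
}

dir=$(mktemp -d)
touch "$dir/a" "$dir/b"
process_pending "$dir"    # first run:  prints "a b"
process_pending "$dir"    # second run: prints an empty line
touch "$dir/c"
process_pending "$dir"    # third run:  prints "c"
rm -rf "$dir"
```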

【Discussion】:

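  • Thanks. But the while loop seems to be an infinite loop. As I said, this is only the beginning of the script; I can't have an infinite loop there.
  • [ "$newfile" = "" ] && break breaks out of the loop as soon as there are no new files, and # do your thing with $newfile is where you do the required processing for each new file.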
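To check that the move-to-donedir loop does terminate, here is a sketch that runs it against a throwaway directory. drain_dir is a hypothetical wrapper; its body runs in a subshell so the cd does not leak into the caller, and touch late simulates a file arriving while a is being processed.

```shell
#!/bin/bash
# Sketch: the move-to-donedir loop from Solution 2, run against a throwaway
# directory, to show that it stops once no unprocessed file is left.
drain_dir() (
    cd "$1" || exit 1
    mkdir -p donedir
    order=()
    while true; do
        # first regular file in the directory (donedir is skipped by -type f)
        newfile=$(find * -maxdepth 0 -type f | head -n1)
        [ "$newfile" = "" ] && break   # nothing left: leave the loop
        order+=("$newfile")
        # hypothetical: a new file arrives while "a" is being processed
        [ "$newfile" = a ] && touch late
        mv "$newfile" donedir          # done with $newfile
    done
    echo "${order[*]}"
)

dir=$(mktemp -d)
touch "$dir/a" "$dir/b"
drain_dir "$dir"     # prints: a b late
rm -rf "$dir"
```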