A pure Bash cutting script that does not work efficiently: answers

【Question title】: A pure Bash cutting script that does not work efficiently
【Posted】: 2017-01-28 22:38:51
【Question description】:

The solutions already posted that use awk or sed are fairly standard, and they help when something does not work properly.

For example:

StringStr="ValueA:ValueB,ValueC:ValueC" ; 

echo ${StringStr} | gawk -F',' 'BEGIN{}{for(intx=1;intx<=NF;intx++){printf("%s\n",$(intx))}}END{}'

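This does produce the same result, but a restricted user who can log in to their account and has fewer options (for example, because awk or gawk is disallowed for some specific reason) has to produce something that works every time.

For valid reasons I developed my own Bash function library on github.com, but it uses a technique that does not work as expected. Here is a working example:

This technique uses Bash's "remove matching prefix pattern" and "remove matching suffix pattern" expansions. The goal is to take a string of chained information and extract the inserted elements using the simplest possible Bash shell constructs.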
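As a quick reference, the four expansions this technique builds on can be sketched as follows. The variable names `record`, `first`, `rest`, `head` and `last` are illustrative and do not come from the post.

```shell
#!/usr/bin/env bash
record="VariableA:InformationA,VariableB:InformationB,VariableC:InformationC"

rest=${record#*,}     # remove the shortest prefix matching '*,'
last=${record##*,}    # remove the longest prefix matching '*,'
head=${record%,*}     # remove the shortest suffix matching ',*'
first=${record%%,*}   # remove the longest suffix matching ',*'

printf '%s\n' "${first}" "${rest}" "${head}" "${last}"
```

A single `#` or `%` removes as little as possible, while the doubled forms `##` and `%%` remove as much as possible.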

So far I have a first statement that builds a string in a specific format, for example:

StringPattern="__VALUE1__:__VALUE2__,"

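The format assumes that several patterns of the StringPattern type are appended to a chain. The trailing ',' is used to split and separate the strings of the form VALUE1:VALUE2.

StringStorage is appended to many times, once per StringPattern. Here are two examples:

1 - Sample 1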

StringPattern="VariableA:InformationA,"
StringStorage="${StringStorage}${StringPattern}" ;

2 - Sample 2

StringPattern="VariableB:InformationB,"
StringStorage="${StringStorage}${StringPattern}" ;

At this point, StringStorage correctly holds this information:

StringStorage="VariableA:InformationA,VariableB:InformationB,"

Now, using StringStorage, a Bash algorithm that mixes "remove matching prefix pattern" and "remove matching suffix pattern" does work for this case:

### Description of IntCsvCount
### Removes every comma separator ',' from StringStorage and subtracts
### the length of the result from the original length.
### This produces IntCsvCount == 2
IntCsvCount=$( cstr=${StringStorage//,/} ; echo $(( ${#StringStorage} - ${#cstr} )) ) ;

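### Description of bstr and astr
### bstr receives each extracted sequence; astr is the working copy
### of StringStorage that the loop consumes (assumed here, because the
### loop below reads astr).
bstr="" ;
astr="${StringStorage}" ;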

### Description of for
### Using IntCsvCount, count from 0 to the last element,
### which is ${IntCsvCount}-1, or 1 in my example.

for (( intx=0 ; intx <= ${IntCsvCount}-1 ; intx++ )) ; do
  ### Extract the first segment with "remove matching suffix pattern",
  ### ${parameter%word}, where word is ${astr#*,} ("remove matching
  ### prefix pattern"), that is, $astr with everything up to and
  ### including the first ',' removed.
  bstr=${astr%*${astr#*,}} ;
  ### Drop the $bstr part by restarting astr at offset ${#bstr};
  ### the length kept is
  ### [ longer string size ] - [ shorter string size ].
  astr=${astr:${#bstr}:$(( ${#astr} - ${#bstr}))} ;
  echo -ne "Element: ${bstr}\n" ;
done

This should produce the following output.

Element: VariableA:InformationA,
Element: VariableB:InformationB,
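For reference, a minimal self-contained sketch of the same loop follows. It assumes the working copy `astr` is initialized from `StringStorage`, and it swaps in a simpler pair of expansions (`%%,*` and `#*,`) in place of the nested expression; the `elements` array is a hypothetical addition used only to collect the results.

```shell
#!/usr/bin/env bash
StringStorage="VariableA:InformationA,VariableB:InformationB,"

# Count separators: length with commas minus length without them.
stripped=${StringStorage//,/}
IntCsvCount=$(( ${#StringStorage} - ${#stripped} ))

astr=${StringStorage}   # assumption: work on a copy of StringStorage
elements=()             # hypothetical array collecting each element
for (( intx = 0; intx < IntCsvCount; intx++ )); do
  bstr=${astr%%,*}      # text before the first ','
  astr=${astr#*,}       # text after the first ','
  elements+=("${bstr}")
  printf 'Element: %s\n' "${bstr}"
done
```

Unlike the listing in the post, this variant prints each element without its trailing separator.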

Putting this into a function only requires changing the separator from ',' to ':' in order to extract "VariableA" and "InformationA".
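A minimal sketch of such a function, assuming a hypothetical name `split_pair` and hypothetical output variables `PairName` and `PairValue` (none of these come from the author's library):

```shell
#!/usr/bin/env bash
# split_pair is a hypothetical helper, not part of the author's library.
# It splits "Name:Value" at the first ':' into PairName and PairValue.
split_pair() {
  local pair=${1%,}        # drop one trailing ',' if present
  PairName=${pair%%:*}     # text before the first ':'
  PairValue=${pair#*:}     # text after the first ':'
}

split_pair "VariableA:InformationA,"
printf '%s -> %s\n' "${PairName}" "${PairValue}"
```

Splitting at the first ':' means a value may itself contain further colons without being cut.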

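The problem starts with non-uniform strings. As observed on this board, sentence examples and the cutting step should work on non-uniform strings, but the example here does not work. I have received more than one suggestion to use gawk, sed, or even cut, but this algorithm does not work for the following example: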

astr="master|ZenityShellEval|Variable declaration|Added Zenity font support to allow choosing both font-name and size and parsing the zenition option, notice --font option require a space between font and size.|20170127|"

which comes from

astr=$( zenity --width=640 --height=600 --forms --show-header --text="Commit Message" --add-entry="Branch name" --add-entry="function" --add-entry="section" --add-entry="commit Message" --add-calendar="Commit Date" --forms-date-format="%Y%m%d" --separator='|' ) ;

I also force the output to look the way a StringPattern should look: astr="${astr}|" ;

The code is the same, except that the separator has been changed from ',' to '|':

IntCsvCount=$( cstr=${astr//|/} ; echo $(( ${#astr} - ${#cstr} )) ) ;
bstr="" ;
for (( intx=0 ; intx <= ${IntCsvCount}-1 ; intx++ )) ; do
  bstr=${astr%*${astr#*|}} ;
  astr=${astr:${#bstr}:$(( ${#astr} - ${#bstr}))} ;
  echo -ne "Element: ${bstr}\n" ;
done

At this point the output looks like this:

Element:master|ZenityShellEval|Variable declaration|Added Zenity font support to allow choosing both font-name and size and parsing the zenition option, notice --font option require a space between font and size.|20170127|
Element:
Element:
Element:

Is there any reason why it should not work every time?
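The thread does not pin down the cause. One way to reproduce the same symptom, offered here as an assumption and not as a diagnosis of the zenity string, is a field that contains glob characters such as `[...]`. The inner `${astr#*|}` is unquoted, so its result is interpreted as a pattern instead of literal text, the suffix match fails, and `bstr` receives the whole string. The `sample` value below is made up for illustration.

```shell
#!/usr/bin/env bash
sample='master|fix [x] flag|20170127|'   # hypothetical input with brackets

# Unquoted inner expansion: "[x]" acts as a bracket expression, so the
# suffix pattern no longer matches and nothing is removed.
unquoted=${sample%*${sample#*|}}

# Quoted inner expansion: the remainder is matched literally.
quoted=${sample%"${sample#*|}"}

printf 'unquoted: %s\nquoted:   %s\n' "${unquoted}" "${quoted}"
```

Quoting the inner expansion keeps the two-line technique intact while making it independent of the characters in the data.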

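【Question discussion】:

This is hard to follow. Could you reduce it to your input, the actual and expected output, and the code you tried in order to get that result?

Tags: string bash matching text-extraction

【Solution 1】:

So, you posted this AWK script: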

BEGIN{}{for(intx=1;intx<=NF;intx++){printf("%s\n",$(intx))}}END{}

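If I understand correctly, you are saying that it does exactly what you want, and the only problem is that you do not want to depend on AWK?

In that case, you are making things more complicated than they need to be. You can use Bash's substring replacement directly: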

str=ValueA:ValueB,ValueC:ValueC
printf '%s\n' "${str//,/$'\n'}"

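【Discussion】:

As far as I know, the awk script was an alternative, used to calculate traffic from iptables for a network provider. But as mentioned, it is only an alternative; the point is to keep a working solution in no more than 2 lines of code: bstr=${astr%*${astr#*|}} ; astr=${astr:${#bstr}:$(( ${#astr} - ${#bstr}))} ;. This solution pops the first (left-most) separated element one at a time until the string in astr is empty. It is efficient and needs fewer CPU cycles per iteration. I should post something in my github Fnct.D; the aim is to develop quickly and keep things clear.

【Solution 2】:

If I understand the end of your question correctly, you have a string like astr="master|ZenityShellEval|Variable declaration|Added Zenity font support to allow choosing both font-name and size and parsing the zenition option, notice --font option require a space between font and size.|20170127|" and you want the following output: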

Element: master
Element: ZenityShellEval
Element: Variable declaration
Element: Added Zenity font support to allow choosing both font-name and size and parsing the zenition option, notice --font option require a space between font and size.
Element: 20170127

The simplest way I can think of is the following:

s="${astr%|}"; echo "Element: ${s//|/$'\n'Element: }";

Also, don't forget arrays! I think they would come in handy for what you are doing. The following also produces the desired output:

(IFS='|'; declare -a a=(${astr}); printf "Element: %s\n" "${a[@]}")

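Bash Hackers Wiki has a great page on arrays; I suggest you take a look.

【Discussion】:

The two most important things to keep are bstr=${astr%*${astr#*|}} ; astr=${astr:${#bstr}:$(( ${#astr} - ${#bstr}))} ;, which finally started working for most CSV input with fewer declarations and fewer workarounds. For arrays, I like to use declare -a Array=( ${astr//|/ } ) ; which expands the CSV with spaces and lets the array split it into elements. It then needs a for item in ${Array[@]} ; do echo ${item} ; done, and adding 3 lines is not efficient.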
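One caveat with the space-replacement approach described in this comment: it also splits fields that contain spaces themselves, such as "Variable declaration". A small sketch of the difference, using a shortened, made-up version of the string from the question (the array names `by_space` and `by_pipe` are illustrative):

```shell
#!/usr/bin/env bash
astr='master|Variable declaration|20170127|'   # shortened sample

# Replace '|' with spaces, then rely on default word splitting.
declare -a by_space=( ${astr//|/ } )

# Split on '|' only; spaces inside a field are preserved.
IFS='|' read -ra by_pipe <<< "${astr}"

printf 'by_space has %d elements\n' "${#by_space[@]}"
printf 'by_pipe has %d elements\n' "${#by_pipe[@]}"
```

Here `by_space` ends up with four elements because "Variable" and "declaration" are separated, while `by_pipe` keeps the three original fields.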

【Solution 3】:

Along the same lines as the previous answers:

IFS="|" read -ra arr<<<"${astr}"
printf "Element: %s\n" "${arr[@]}"
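A point that may matter for form input is how `read -ra` treats empty fields. The sketch below uses a made-up `sample` string and array name `fields`; it shows that an empty field between two separators is kept, while the single trailing separator does not add an extra element.

```shell
#!/usr/bin/env bash
sample='master||20170127|'   # hypothetical input with an empty middle field

# IFS is changed only for this read command, not for the rest of the script.
IFS='|' read -ra fields <<< "${sample}"

printf 'count: %d\n' "${#fields[@]}"
printf 'Element: [%s]\n' "${fields[@]}"
```

Note that `read` consumes a single line, so a field containing a newline would cut the input short.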

I thought I would add that your original awk is a bit bloated:

echo -n "ValueA:ValueB,ValueC:ValueC" | awk '1' RS=","

And of course, an awk version for the current problem:

awk 'NF && $0 = "Element: " $0' RS="|" <<<"$astr"

【Discussion】: