How to find and remove a long pattern in text files using Linux sed with regex

【Question title】: How to find and remove a long pattern in text files using linux sed with regex
【Posted】: 2019-04-02 16:41:24
【Question description】:

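I am parsing a large number of bibtex files into R for some data analysis. However, the abstracts regularly cause problems, and I would like to remove them beforehand using sed.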

I found that

sed 's/Abstract\s\=\s[{][{]//' < file.bib

successfully removes the start of the Abstract entry, and

sed 's/[}][}]\,//' < file.bib

removes the closing braces and the comma.

However, I cannot find any way to combine the two so that everything in between is removed as well. For example, I tried:

sed 's/^Abstract\s\=\s[{][{][\s\S]*[}][}]\,$//' < file.bib

This is what a bibtex reference looks like:

@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and presents a
   framework with which to organize the literature. We argue that unit of
   analysis provides one critical distinction in the organizational
   learning literature and research objective provides another. The
   resulting two-by-two matrix contains four categories of research, which
   we have called: (2) residues (organizations as residues of past
   learning); (2) communities (organizations as collections of individuals
   who can learn and develop); (3) participation (organizational
   improvement gained through intelligent activity of individual members),
   and (4) accountability (organizational improvement gained through
   developing individuals' mental models). We also propose a distinction
   between the terms organizational learning and the learning organization.
   Our subsequent analysis identifies relationships between disparate parts
   of the literature and shows that these relationships point to individual
   mental models as a critical source of leverage for creating learning
   organizations. A brief discussion of the work of two of the most visible
   researchers in this field, Peter Senge and Chris Argyris, provides
   additional support for this type of change strategy.}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}

And this is what I would like it to look like:

@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}

【Question discussion】:

Tags: regex sed

【Solution 1】:

This might work for you (GNU sed):

sed '/^Abstract = {{/,/.*}},$/d' file

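This uses the range operator , together with the delete command d to delete every line from the one that starts with Abstract = {{ through the one that ends with }},.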
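To see the range delete in action, here is a minimal sketch that runs the same command on a shortened copy of the entry from the question. The temporary file and the variable names sample and result are illustrative only and not part of the original answer.

```shell
# Build a small sample entry (abbreviated from the question's example).
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and presents a
   framework with which to organize the literature.
   additional support for this type of change strategy.}},
DOI = {{10.1177/1350507698291001}},
}
EOF

# Delete every line from the one starting with "Abstract = {{"
# through the next line that ends with "}},".
# The leading .* in the end address is optional; /}},$/ matches the same lines.
result=$(sed '/^Abstract = {{/,/.*}},$/d' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The three Abstract lines disappear and the remaining fields are printed unchanged. To rewrite files instead of printing to standard output, GNU sed's -i option can be added (for example -i.bak to keep a backup copy of each file).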

【Discussion】:
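One caveat that may matter for some files: when the end of a range is a regex, sed only starts looking for it on the line after the one that opened the range. If an abstract fits on a single line, the range would therefore run on to the next line ending in }}, and delete that field too. The following is a sketch of a variant that also checks the opening line, by appending lines with N until the buffer ends with }},. The sample data is made up for illustration.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Month = {{MAR}},
Abstract = {{A one-line abstract.}},
DOI = {{10.1177/1350507698291001}},
Abstract = {{A two-line
   abstract.}},
ISSN = {{1350-5076}},
EOF

# On a line starting with "Abstract = {{":
#   :a          label for the loop
#   /}},$/!{..} while the buffer does not yet end with "}},",
#               append the next input line (N) and loop again (ba)
#   d           delete the collected lines once the end is found
result=$(sed '/^Abstract = {{/{
  :a
  /}},$/!{
    N
    ba
  }
  d
}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this input, the plain range command would also remove the DOI line, whereas the loop keeps Month, DOI and ISSN.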

【Solution 2】:

You could try piping the sed commands into each other, one after the other. Something like this:

sed 's/Abstract\s\=\s[{][{]//' < file.bib | sed 's/[}][}]\,//'

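You could also try using the OR regex operator in your pattern (written \| in GNU sed's basic regular expressions), for example:

sed 's/Abstract\s\=\s[{][{]\|[}][}]\,//' < file.bib

Either of these should strip both delimiters, although the abstract text between them stays in place. I hope this helps.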
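sed normally applies a substitution to one line at a time, so a single s command cannot span the opening Abstract = {{ and the closing }}, when they sit on different lines. Below is a minimal sketch of one way to combine the two patterns into a single substitution. It assumes GNU sed (for the -z option) and assumes the abstract text itself contains no } characters. The sample file is illustrative.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and
   presents a framework with which to organize the literature.}},
DOI = {{10.1177/1350507698291001}},
EOF

# -z (GNU sed) splits input on NUL bytes instead of newlines, so a file
# without NUL bytes is handled as one record and a match can span lines.
# The leading \n ties the match to the start of a line.
# [^}]* stops at the first "}", hence the assumption of no braces in the abstract.
result=$(sed -z 's/\nAbstract = {{[^}]*}},//g' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because the whole file is held in memory as one record, this fits files of moderate size; for abstracts that may contain braces, a line-range delete is the safer choice.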

【Discussion】: