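【Posted】: 2020-03-27 22:26:54
【Question】:
I am looking for an R solution for parsing a text file of quotations (like the one shown below) into a data.frame with one observation per quotation and the variables text and source, as described below.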
DIAGRAMS are of great utility for illustrating certain questions of vital statistics by
conveying ideas on the subject through the eye, which cannot be so readily grasped when
contained in figures.
--- Florence Nightingale, Mortality of the British Army, 1857

To give insight to statistical information it occurred to me, that making an
appeal to the eye when proportion and magnitude are concerned, is the best and
readiest method of conveying a distinct idea.
--- William Playfair, The Statistical Breviary (1801), p. 2

Regarding numbers and proportions, the best way to catch the imagination is to speak to the eyes.
--- William Playfair, Elemens de statistique, Paris, 1802, p. XX.

The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.
--- Charles Joseph Minard
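Here, each quotation is a paragraph, separated from the next by "\n\n". Within a paragraph, all lines up to the one beginning with --- make up the text, and whatever follows --- is the source.
I figured I could first split the lines into paragraphs (separated by '\\n\\n+', i.e. two or more consecutive newlines, which means one or more blank lines), but I have not been able to get that to work.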
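The paragraph-splitting idea above could be sketched roughly as follows in base R. This is only a minimal sketch under stated assumptions: `parse_quotes` is a hypothetical helper name, quotations are assumed to be separated by at least one blank line, and the first line in a paragraph that starts with `---` is assumed to mark the start of the source.

```r
# Hypothetical helper (not from any package): turn the lines of a quotations
# file into a data.frame with one row per quotation.
parse_quotes <- function(lines) {
  # Join all lines into one string, then split into paragraphs wherever
  # one or more blank (or whitespace-only) lines occur.
  txt <- paste(lines, collapse = "\n")
  paras <- trimws(strsplit(txt, "\n[[:space:]]*\n+")[[1]])
  paras <- paras[nzchar(paras)]

  rows <- lapply(paras, function(p) {
    ln <- strsplit(p, "\n", fixed = TRUE)[[1]]
    i <- grep("^---", ln)[1]          # first line that starts the source
    if (is.na(i)) {
      # Assumption: a paragraph without a '---' line has no source.
      text <- ln
      src <- character(0)
    } else {
      text <- ln[seq_len(i - 1)]
      src <- sub("^---[[:space:]]*", "", ln[i:length(ln)])
    }
    data.frame(
      text = paste(trimws(text), collapse = " "),
      source = if (length(src)) paste(trimws(src), collapse = " ") else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Small example using two of the quotations shown above.
lines <- c(
  "Regarding numbers and proportions, the best way to catch the imagination is to speak to the eyes.",
  "--- William Playfair, Elemens de statistique, Paris, 1802, p. XX.",
  "",
  "The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.",
  "--- Charles Joseph Minard"
)
quotes <- parse_quotes(lines)
```

For a real file, `lines` would presumably come from something like `readLines("quotes.txt")`, where the file name is an assumption. Wrapped text lines are joined with a single space, so each quotation ends up as one string in the text column.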
【Comments】:
- Could you share the code you have tried?
Tags: r parsing paragraph quotations