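【Posted】: 2015-01-20 04:46:46
【Question】:
I received data in a .txt file that I need to format into something I can upload to a database. The text is marked up with tags. Depending on the tag, the data needs to be dumped into a specific txt file, tab-delimited. I have rarely used Perl in my life, but I know Perl can handle this kind of job easily; I just don't know where to start. Outside of Java, SQL, and R, I'm useless. Here is an example of a single entry (I have nearly 1,000 to process):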
<PaperTitle>True incidence of all complications following immediate and delayed breast reconstruction.</PaperTitle>
<Abstract>BACKGROUND: Improved self-image and psychological well-being after breast reconstruction are well documented. To determine methods that optimized results with minimal morbidity, the authors examined their results and complications based on reconstruction method and timing. METHODS: The authors reviewed all breast reconstructions after mastectomy for breast cancer performed under the supervision of a single surgeon over a 6-year period at a tertiary referral center. Reconstruction method and timing, patient characteristics, and complication rates were reviewed. RESULTS: Reconstruction was performed on 240 consecutive women (94 bilateral and 146 unilateral; 334 total reconstructions). Reconstruction timing was evenly split between immediate (n = 167) and delayed (n = 167). Autologous tissue (n = 192) was more common than tissue expander/implant reconstruction (n = 142), and the free deep inferior epigastric perforator was the most common free flap (n = 124). The authors found no difference in the complication incidence with autologous reconstruction, whether performed immediately or delayed. However, there was a significantly higher complication rate following immediate placement of a tissue expander when compared with delayed reconstruction (p = 0.008). Capsular contracture was a significantly more common late complication following immediate (40.4 percent) versus delayed (17.0 percent) reconstruction (p < 0.001; odds ratio, 5.2; 95 percent confidence interval, 2.3 to 11.6). CONCLUSIONS: Autologous reconstruction can be performed immediately or delayed, with optimal aesthetic outcome and low flap loss risk. However, the overall complication and capsular contracture incidence following immediate tissue expander/implant reconstruction was much higher than when performed delayed. Thus, tissue expander placement at the time of mastectomy may not necessarily save the patient an extra operation and may compromise the final aesthetic outcome.</Abstract>
<BookTitle>Book1</BookTitle>
<Publisher>Publisher01, Boston</Publisher>
<Edition>1st</Edition>
<EditorList>
<Editor>
<LastName>Lewis</LastName>
<ForeName>Philip M</ForeName>
<Initials>PM</Initials>
</Editor>
<Editor>
<LastName>Kiffer</LastName>
<ForeName>Michael</ForeName>
<Initials>M</Initials>
</Editor>
</EditorList>
<Page>19-28</Page>
<Year>2008</Year>
<AuthorList>
<Author ValidYN="Y">
<LastName>Sullivan</LastName>
<ForeName>Stephen R</ForeName>
<Initials>SR</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Fletcher</LastName>
<ForeName>Derek R D</ForeName>
<Initials>DR</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Isom</LastName>
<ForeName>Casey D</ForeName>
<Initials>CD</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Isik</LastName>
<ForeName>F Frank</ForeName>
<Initials>FF</Initials>
</Author>
</AuthorList>
//
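PaperTitle, Abstract, and Page need to go into the Papers.txt file
PaperTitle, BookTitle, Edition, Publisher, and Year need to go into the Book.txt file
PaperTitle and all editor data (LastName, ForeName, Initials) need to go into Editors.txt
PaperTitle and all author information (LastName, ForeName, Initials) need to go into Authors.txt
// marks the end of an entry. All files need to be tab-delimited. While I won't turn down finished code, I'm hoping for at least some ideas that point me in the right direction; with code that parses out at least one of the files (such as Book.txt), I could most likely figure out the rest from there. Thanks so much.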
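【Comments】:
-
I would start by looking at the Config::General module to handle the parsing and the Text::CSV_XS module to generate the output files.
-
It sounds like you need XML::Twig. Please show the file contents that this data should produce.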
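-
As a starting point for the four files described in the question, here is a minimal sketch in core Perl (no CPAN modules). It is not the approach suggested in the comments above. It uses plain regular expressions because the sample as shown is not well-formed XML (there is no single root element, entries are separated by `//`, and the abstract contains a bare `<`), so a strict XML parser would likely need the input cleaned up first. The helper names (`field`, `people`, `rows_for`, `parse_entries`) and the input file name are hypothetical. The sketch assumes each simple tag appears at most once per entry and that columns are written in the order listed in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text of the first <Tag>...</Tag> in $xml, or '' if the tag is absent.
# Runs of whitespace (including tabs and newlines) are collapsed to one
# space so a value cannot break the tab-delimited output.
sub field {
    my ($xml, $tag) = @_;
    my ($text) = $xml =~ m{<$tag\b[^>]*>\s*(.*?)\s*</$tag>}s;
    return '' unless defined $text;
    $text =~ s/\s+/ /g;
    return $text;
}

# One [LastName, ForeName, Initials] triple per <Editor> or <Author> block.
# The \b keeps "Editor" from matching the enclosing <EditorList> tag.
sub people {
    my ($xml, $tag) = @_;
    my @blocks = $xml =~ m{<$tag\b[^>]*>(.*?)</$tag>}gs;
    my @rows;
    for my $block (@blocks) {
        push @rows, [ map { field($block, $_) } qw(LastName ForeName Initials) ];
    }
    return @rows;
}

# Map one entry to the rows it contributes to each output file.
sub rows_for {
    my ($entry) = @_;
    my $title = field($entry, 'PaperTitle');
    return {
        'Papers.txt'  => [ [ $title, map { field($entry, $_) } qw(Abstract Page) ] ],
        'Book.txt'    => [ [ $title, map { field($entry, $_) } qw(BookTitle Edition Publisher Year) ] ],
        'Editors.txt' => [ map { [ $title, @$_ ] } people($entry, 'Editor') ],
        'Authors.txt' => [ map { [ $title, @$_ ] } people($entry, 'Author') ],
    };
}

# Split the input on lines consisting only of "//" and keep the chunks
# that actually contain an entry.
sub parse_entries {
    my ($text) = @_;
    return grep { /<PaperTitle\b/ } split m{^//\s*$}m, $text;
}

# Usage (hypothetical file name): perl split_entries.pl entries.txt
if (@ARGV) {
    local $/;                 # slurp mode: read the whole input at once
    my $text = <>;
    my %out;                  # output file name => open filehandle
    for my $entry (parse_entries($text)) {
        my $rows = rows_for($entry);
        for my $file (sort keys %$rows) {
            $out{$file} ||= do { open my $fh, '>', $file or die "$file: $!"; $fh };
            print { $out{$file} } join("\t", @$_), "\n" for @{ $rows->{$file} };
        }
    }
    close $_ for values %out;
}
```

Editors.txt and Authors.txt get one row per person, each prefixed with the PaperTitle so the rows can be joined back to Papers.txt and Book.txt after loading. If the real data turns out to be well-formed XML, or can be made so, a proper parser such as the XML::Twig module mentioned above would be the more robust choice.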
Tags: perl parsing text tabs tags