【Posted】:2011-12-02 20:08:46
【Question】:
I have just started learning sed. I want to extract and print the text between > and < in a file like this:
<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>
<br /><b>Engineering & IT/Computer Science</b><br />
<div id="ctl00_ContentPlaceHolder1_divEngITMerit">
<span id="ctl00_ContentPlaceHolder1_lblEngITSelListNo">3rd Provisional Selection List</span>
<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>
Name:
<span id="ctl00_ContentPlaceHolder1_lblName">SIDRA SHAHID</span>
Father Name:
<span id="ctl00_ContentPlaceHolder1_lblFatherName">SHAHID RAFEEQ AHMAD</span>
I have written this command:
sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p' myfile.txt
The problem is that it returns some of the values, such as 12029, but not Selected in MS COMPUTER SCIENCE. What am I doing wrong?
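For reference, a minimal sketch of what the command does on the table row from the sample, plus one possible alternative. The pattern is anchored at ^, so it captures only the first gap between > and < on each line; on the table row that gap is the empty string between <tr> and the first <td ...>. The helper name extract_text is hypothetical, and the tag-stripping approach assumes that no tag spans several lines and that no > appears inside an attribute value.

```shell
#!/bin/sh
# Table row copied from the sample above (the line whose text was missed).
row="<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>"

# Original command: anchored at ^, so only the first ">...<" gap is kept.
# Here that gap is empty (between <tr> and <td ...>), so an empty line is printed.
original=$(printf '%s\n' "$row" | sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p')

# Alternative sketch (hypothetical helper): turn every tag into a newline,
# then delete the lines that are left empty.
extract_text() {
    sed -e 's/<[^>]*>/\
/g' | sed -e '/^[[:space:]]*$/d'
}

cells=$(printf '%s\n' "$row" | extract_text)

printf 'original: [%s]\n' "$original"
printf '%s\n' "$cells"
```

On this row, original is empty, while the helper prints the three cell texts on separate lines, ending with Selected in MS COMPUTER SCIENCE.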
【Discussion】:
-
You should use an XML parser instead. What if there are entities in it?
-
I'll drop this link in the comments in case someone finds it useful: stackoverflow.com/questions/1732348/…
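The concern about entities raised in the comments can be illustrated with a small sketch. The sample line below is hypothetical (it assumes the page source encodes the ampersand as &amp;); a sed that only strips tags passes the entity through undecoded, which a real parser would not.

```shell
#!/bin/sh
# Hypothetical input: the same heading, with the ampersand written as an entity.
sample='<b>Engineering &amp; IT/Computer Science</b>'

# Strip the tags only; sed knows nothing about character references.
stripped=$(printf '%s\n' "$sample" | sed -e 's/<[^>]*>//g')

printf '%s\n' "$stripped"
# prints: Engineering &amp; IT/Computer Science
```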