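【Posted】: 2014-06-17 13:09:04
【Question】:
I'm trying to write a program that reads an RSS news feed and writes each article's date, title, and body to a .txt file. I only started learning C# two days ago, but I have experience with other languages. The program works for some feeds, but in others (Reuters, for example) every article body is followed by an "email this article"-style link, and I can't seem to get rid of it when copying the text. I run the program over the whole feed.
For example, here is the XML for one news item: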
<item>
<title>Pimco's Ivascyn sees 'significant' opportunity in European bank assets</title>
<link>http://feeds.reuters.com/~r/news/wealth/~3/vUJ74S5mXQg/story01.htm</link>
<category domain="">PersonalFinance</category>
<pubDate>Mon, 16 Jun 2014 15:37:52 GMT</pubDate>
<guid isPermaLink="false">http://www.reuters.com/article/2014/06/16/us-investing-pimco-ivascyn-idUSKBN0ER1VV20140616?feedType=RSS&feedName=PersonalFinance</guid>
<description>NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/></description>
<feedburner:origLink>http://reuters.us.feedsportal.com/c/35217/f/654211/s/3b8e7c6b/sc/2/l/0L0Sreuters0N0Carticle0C20A140C0A60C160Cus0Einvesting0Epimco0Eivascyn0EidUSKBN0AER1VV20A140A6160DfeedType0FRSS0GfeedName0FPersonalFinance/story01.htm</feedburner:origLink>
</item>
But when I run the program, I get:
Mon, 16 Jun 2014 15:37:52 GMT
Pimco's Ivascyn sees 'significant' opportunity in European bank assets
NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/>
**********
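I'm trying to remove the last two lines of markup that follow the article body. I added the asterisks myself to separate the articles.
Here is my code: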
using System;
using System.IO;
using System.Text;
using System.Xml;
namespace XmlReading
{
    class RssReading
    {
        static void Main(string[] args)
        {
            // Create a StreamWriter object to write to a text file.
            // Both backslashes must be escaped in a regular string literal.
            StreamWriter sw = new StreamWriter("C:\\Users\\Testing002.txt");
            XmlDocument xmlDoc = new XmlDocument();
            // Load the RSS feed page.
            xmlDoc.Load("http://feeds.reuters.com/news/wealth");
            // Select every item node in the feed.
            XmlNodeList itemNodes = xmlDoc.SelectNodes("//rss/channel/item");
            foreach (XmlNode itemNode in itemNodes)
            {
                // Read the title.
                XmlNode titleNode = itemNode.SelectSingleNode("title");
                // Read the date.
                XmlNode dateNode = itemNode.SelectSingleNode("pubDate");
                // Read the body.
                XmlNode bodyNode = itemNode.SelectSingleNode("description");
                if (titleNode != null && dateNode != null && bodyNode != null)
                {
                    /* Xpath of article body, and of extra links.
                     * //*[@id="bodyblock"]/ul/li[2]/div/text()
                     * //*[@id="bodyblock"]/ul/li[2]/div/div
                     */
                    // Write to the console as well, just to check the output.
                    Console.WriteLine(dateNode.InnerText);
                    sw.WriteLine(dateNode.InnerText);
                    Console.WriteLine(titleNode.InnerText);
                    sw.WriteLine(titleNode.InnerText);
                    // XmlNode.Value is null for element nodes, so use InnerText here too.
                    Console.WriteLine(bodyNode.InnerText);
                    sw.WriteLine(bodyNode.InnerText);
                    Console.WriteLine("**********\n\n\n");
                    sw.WriteLine("**********\n\n\n");
                    sw.WriteLine(" ");
                    sw.WriteLine(" ");
                }
            }
            sw.Close();
            Console.ReadKey(true);
        }
    }
}
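Thanks in advance for any help or suggestions.
【Comments】:
-
Your "XML code" is not the XML structure of the RSS feed. It is its HTML rendering. Please provide the XML structure you are trying to process.
-
Sorry, my mistake. I have corrected it now.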
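One possible way to drop the trailing links, offered as a minimal sketch rather than a definitive fix: the text of `description` is an HTML fragment, so the markup can be cut off before it is written to the file. The helper name `RssText.CleanDescription` and the sample string are hypothetical. The sketch assumes the unwanted links always start at a `<div class="feedflare">` block, as in the item shown in the question, and it uses a simple regular expression to drop any tags that remain (such as the 1x1 tracking `<img>`). That regular expression is crude, and a real HTML parser would be safer for arbitrary markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class RssText
{
    // Hypothetical helper: returns only the plain-text part of an RSS <description>.
    public static string CleanDescription(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Assumption: the feed appends its share links in a <div class="feedflare"> block.
        int cut = html.IndexOf("<div class=\"feedflare\"", StringComparison.OrdinalIgnoreCase);
        if (cut >= 0) html = html.Substring(0, cut);

        // Remove any remaining tags, then decode entities such as &amp;.
        string text = Regex.Replace(html, "<[^>]+>", string.Empty);
        return WebUtility.HtmlDecode(text).Trim();
    }
}

class Demo
{
    static void Main()
    {
        // Shortened, made-up sample in the same shape as the item in the question.
        string raw = "NEW YORK (Reuters) - The expected unloading...<div class=\"feedflare\">\n"
                   + "<a href=\"http://example.invalid/\"><img src=\"x\" border=\"0\"></img></a>\n"
                   + "</div><img src=\"y\" height=\"1\" width=\"1\"/>";
        Console.WriteLine(RssText.CleanDescription(raw));
    }
}
```

In the loop from the question, this would be used as `sw.WriteLine(RssText.CleanDescription(bodyNode.InnerText));`. Feeds that do not contain a `feedflare` block still pass through the tag-stripping step, so they are left as plain text.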