How to get rid of lists when writing a parsed XML file to a text file using BeautifulSoup?

【Question title】: How to get rid of lists when writing the parsed xml file into text file using BeautifulSoup?
【Posted】: 2017-06-24 20:49:48
【Question description】:

I have an XML file in the following format:

<date>31,March,2001</date>
<post>



       urlLink The Register reports on "war driving"  - the wireless equivalent of war dialing.  Instead of having your modem dial into thousands of networks until you get in, you just drive within range of a wireless net with your wireless-equipped laptop and hack away.    related:   urlLink The latest issue of CIO  has a great feature on wireless.



</post>

I want to extract the content of each post and write it to a new line of my output text file. Here is my parsing code:

from bs4 import BeautifulSoup as Soup
def parseLog(file):
        with open(file, 'rb') as handler:
            soup = Soup(handler, "html.parser")
            for message in soup.findAll('post'):
                #print(len(str(message).strip()))
                content = message.contents
                if(len(str(content).strip()) > 300):
                    re.sub("[^a-zA-Z0-9]", "", str(content))
                    with open(dest, 'a', encoding="utf-8") as f:
                        f.write(str(message.contents) + "\n")

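However, the output file now contains the content of each post as a list. There are also unwanted "\r" and "\n" characters everywhere (I tried re.sub() to get rid of them, but it had no effect):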

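['\r\n\r\n\r\n \r\n Quotable Mindjack! From Mike Sugarbaker\'s urlLink review of Lemon: "If you have no patience for the lengthy musings of smart lunatics, Lemon is not for you. But you read Mindjack, so you probably like that kind of thing, right?"\r\n \r\n\r\n \r\n']
['\r\n\r\n\r\n \r\n I\'m not sure I like the direction urlLink FEED has just taken. "The Filter", a new weblog linking to outside content, is now featured more prominently than FEED\'s original content. It\'s also unclear what the difference is between the Filter and urlLink Plastic.\r\n \r\n\r\n \r\n']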

How can I get rid of these?

【Question discussion】:

Tags: python xml python-3.x xml-parsing beautifulsoup

【Solution 1】:

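message.contents is a list, so str(message.contents) writes the list's repr, brackets and escape sequences included. Use get_text() instead, which returns a plain string: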

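import re
from bs4 import BeautifulSoup as Soup

def parseLog(file, dest):
    # dest (the output path) is passed in here; the question used it as a global.
    with open(file, 'rb') as handler:
        soup = Soup(handler, "html.parser")
    # Open the output file once rather than once per post.
    with open(dest, 'a', encoding="utf-8") as f:
        for message in soup.findAll('post'):
            content = message.get_text()  # a string, not a list
            if len(content.strip()) > 300:
                # re.sub returns a new string, so its result must be kept.
                # The original pattern "[^a-zA-Z0-9]" would also delete spaces;
                # collapsing whitespace runs is assumed to be the intent.
                content = re.sub(r"\s+", " ", content).strip()
                f.write(content + "\n")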
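
To make the difference concrete, here is a minimal sketch on an in-memory string shaped like the sample in the question. The sample text and the helper name normalize are made up for illustration, and collapsing whitespace with r"\s+" is one possible cleanup, assumed here to match what the asker wanted rather than taken from the answer itself.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample, shaped like the <post> element in the question.
xml = (
    "<date>31,March,2001</date>\r\n"
    "<post>\r\n\r\n   urlLink The Register reports on war driving.  \r\n\r\n</post>"
)

soup = BeautifulSoup(xml, "html.parser")
post = soup.find("post")

# .contents is a list of child nodes; str() of it yields the list's repr,
# which is where the brackets and the visible escape sequences come from.
as_list = str(post.contents)

# .get_text() returns the text of the element as one plain string.
as_text = post.get_text()

def normalize(text):
    """Collapse every run of whitespace (including \\r and \\n) to one space."""
    # re.sub returns a new string; strings are immutable, so calling it
    # without using the return value changes nothing.
    return re.sub(r"\s+", " ", text).strip()

clean = normalize(as_text)
print(clean)
```

Writing clean + "\n" to the output file then gives one post per line, with no list brackets and no stray line breaks inside a post.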

【Discussion】: