How to save multiple outputs in multiple files, where each file's title comes from an object in Python?

【Question title】: How to save multiple output in multiple file where each file has a different title coming from an object in python?
【Posted】: 2016-10-02 14:54:58
【Question】:

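I am scraping an RSS feed from a website (http://www.gfrvitale.altervista.org/index.php/autismo-in?format=feed&type=rss). I wrote a script that extracts and cleans the text of each feed item. My main problem is saving the text of each item to a separate file, and I also need to name each file with the title extracted from that item. My code is: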

for item in myFeed["items"]:
    time_structure = item["published_parsed"]
    dt = datetime.fromtimestamp(mktime(time_structure))

    if dt > t:

        link = item["link"]
        response = requests.get(link)
        doc = Document(response.text)
        doc.summary(html_partial=False)

        # extracting text
        h = html2text.HTML2Text()

        # converting
        h.ignore_links = True  # ignore links
        h.skip_internal_links = True  # skip internal links
        h.inline_links = True
        h.ignore_images = True  # ignore image links
        h.ignore_emphasis = True
        h.ignore_anchors = True
        h.ignore_tables = True

        testo = h.handle(doc.summary())  # extracted text

        s = doc.title() + "." + " " + testo  # content to write to the final file

        tit = item["title"]

        # save each file with its proper title
        with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
            f.write(s)
            f.close()

The error is:

  File "<ipython-input-57-cd683dec157f>", line 34
    with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
                                 ^
SyntaxError: invalid syntax

【Question comments】:

Tags: python rss feed

【Solution 1】:

The comma needs to go after %tit, not before it.

It should be:

# save each file with its proper title
# (the with statement closes the file, so no explicit f.close() is needed)
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)
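
To see why the comma's position matters: `%` is a binary operator, so the whole expression `"testo_%s" % tit` has to be complete before the comma that separates it from the next argument. A minimal sketch of that; `build_filename` and the sample title are hypothetical names used only for illustration:

```python
def build_filename(title):
    # The % operator combines the format string and its value into one
    # expression; the result is a single string that can be passed as
    # the first argument of open().
    return "testo_%s" % title


# Hypothetical title, standing in for item["title"].
name = build_filename("Autismo")
print(name)
```

Written as `"testo_%s", %tit`, the comma ends the first argument early and the parser then meets a `%` with nothing on its left, which is the reported SyntaxError.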

However, if your file name contains invalid characters, this will raise an error (i.e. [Errno 22]).

You can try this code:

...
tit = item["title"]
# Not the best way, but it could help for now (it would be better to build a list of characters to strip)
tit = tit.replace(' ', '').replace("'", "").replace('?', '')

with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)
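
The "list of stop characters" idea can be sketched with a regular expression. This is only one possible approach, not something from the original code: `safe_filename` is a hypothetical helper, and the character set is an assumption based on the characters Windows refuses in file names (the traceback in the comments points at a Windows Anaconda install).

```python
import re

# Characters that Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Return `title` with characters that are invalid on Windows replaced."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename(u"La Comunicazione Facilitata? Parliamone..."))
```

It would be used as `"testo_%s" % safe_filename(tit)`. The sketch does not handle reserved device names such as CON or NUL, and two different titles can still collapse to the same file name.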

Another way, using nltk:

from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')  # keep only runs of word characters
tit = item["title"]
tit = tokenizer.tokenize(tit)
tit = ''.join(tit)
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)
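
If installing nltk just for this feels heavy, the same `\w+` pattern can be applied with the standard `re` module. A minimal sketch, assuming the goal is only to keep word characters; `words_only` is a hypothetical helper name:

```python
import re


def words_only(title):
    # Keep only runs of word characters (letters, digits, underscore),
    # then join them, which drops spaces and punctuation.
    return "".join(re.findall(r"\w+", title, flags=re.UNICODE))


print(words_only(u"La Comunicazione Facilitata? Parliamone..."))
```

Like the nltk version, this removes the spaces between words, so the resulting names are safe but harder to read.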

【Comments】:

I did that, but it doesn't work; I get this error: C:\Anaconda2\lib\codecs.pyc in open(filename, mode, encoding, errors, buffering) 894 # Force opening of the file in binary mode 895 mode = mode + 'b' --> 896 file = __builtin__.open(filename, mode, buffering) 897 if encoding is None: 898 return file IOError: [Errno 22] invalid mode ('wb') or filename: u'testo_La Comunicazione Facilitata? Parliamone.
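The code is correct. The comma goes after %tit, not before it. That is a different error. I'll look into it.
What is the desired output format? (i.e. .csv, .txt)
Your problem is the name string u'testo_La Comunicazione Facilitata?.. You can't name your file like that. I edited my answer to offer a solution for this.
@estebanpdl with your edit it works better, but I still have problems with some titles. So I'll try adding more replacements, and then I'll let you know how it goes. Thanks a lot!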

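【Solution 2】:

First, you misplaced the comma; it should go after %tit, not before it.

Second, you don't need to close the file, because the with statement you are using does that for you automatically. And where does codecs come from? I don't see it anywhere else... Anyway, the correct with statement is: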

with open("testo_%s" %tit, "w", encoding="utf-8") as f:
     f.write(s)
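
One caveat: the built-in `open` only accepts `encoding` on Python 3. On Python 2, `io.open` takes the same arguments. A small sketch that should run on both; the values of `tit` and `s` and the temporary directory are assumptions made so the example is self-contained:

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the question's variables.
tit = u"Autismo"
s = u"Titolo. Testo estratto dall'articolo."

# Write into a temporary directory so the sketch does not clutter
# the working directory.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, u"testo_%s" % tit)

# io.open accepts `encoding` on both Python 2 and Python 3; in text mode
# it expects unicode text, hence the u"" literals.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(s)

# Read the file back to check what was written.
with io.open(path, "r", encoding="utf-8") as f:
    print(f.read())
```

This does not remove the need to clean the title first: a name containing `?` will still be rejected on Windows regardless of which `open` is used.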

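【Comments】:

I ran the code above, but it gives an error. Now I'm trying this: with io.open("testo_"+tit, "w", encoding="utf-8") as f: f.write(s)
What error does it give? You should provide something we can work with... And for the naming you should stick with "testo_%s" %tit, because I don't think "testo_"+tit will work (but I could be wrong)
I ran the code above, but it gave me an error. It says the function doesn't accept arguments like %tit. Now I'm trying this: with io.open("testo_"+tit, "w", encoding="utf-8") as f: f.write(s) It partly works, because it saves the first item with the correct title and then stops. I get this new error: IOError: [Errno 22] Invalid argument: u'testo_La Comunicazione Facilitata? Parliamone...'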