Variable file name is not being seen as a file and cannot be opened

【Question title】: Variable file name is not being seen as a file and cannot be opened
【Posted】: 2013-12-11 22:27:12
【Question】:

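This is my third day using Python, and I'm sure I'm overlooking something simple.

I'm trying to index a list of html file names, set the indexed html file name to a var, and then open that file. The plan is to iterate through the list of file names.

Unfortunately, the var is not being read as a file, but as a name.

I thought this would be an easy question to answer, but I just haven't found it.

So, what am I doing wrong? Any help would be greatly appreciated.

Here is my code: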

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(file)
input_file = file_list[0]
orig_file = open(input_file, 'w')

I know I'm missing something simple, but it's driving me crazy!

Update:

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(os.path.join(root, file))
input_file = file_list[0]
orig_file = open(input_file, 'w')
soup = BeautifulSoup(orig_file)
title = soup.find('title')
main_txt = soup.findAll(id='main')[0]
toc_txt = soup.findAll(class_='toc-indentation')[0]

And then the crash:

Traceback (most recent call last):
  File "C:\Aptana\beautiful\B-1.py", line 47, in <module>
    soup = BeautifulSoup(orig_file)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 161, in __init__
    markup = markup.read()
io.UnsupportedOperation: not readable

Thanks, adsmith! Let me know if you have any other questions.

orig_file prints as: <_io.TextIOWrapper name="C:\Aptana\Beautiful mode=" r encoding="cp1252">

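【Question discussion】:

This code is clear at a glance. What do you mean by "not being read as a file, but as a name"? What does the program do, and what do you expect it to do?

Tags: python file-io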

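【Solution 1】:

It looks to me like your current working directory is not the directory you are walking, so a bare file name cannot be found. Try this: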

import os

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            # os.walk yields bare names; join with root to get a usable path
            file_list.append(os.path.join(root, file))
input_file = file_list[0]
orig_file = open(input_file, 'w')
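To illustrate why the join matters, here is a minimal sketch that builds a throwaway directory tree and collects full paths from it. The helper name find_html_files and the demo file names are assumptions made up for this example, not part of the original code.

```python
import os
import tempfile


def find_html_files(top):
    """Return full paths of every .html file under `top` (hypothetical helper)."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.html'):
                # `name` alone is just 'page.html'; joining it with `root`
                # gives a path that works from any working directory.
                paths.append(os.path.join(root, name))
    return sorted(paths)


# Build a small temporary tree so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, 'sub'))
for rel in ('index.html', os.path.join('sub', 'page.html'), 'notes.txt'):
    with open(os.path.join(demo_dir, rel), 'w') as f:
        f.write('<html><title>demo</title></html>')

html_files = find_html_files(demo_dir)
for path in html_files:
    print(path)
```

Every entry in html_files can be passed straight to open(), including the one found in the subdirectory, which a bare file name could not reach.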

I would also strongly suggest using a "with" block (a context manager) rather than orig_file = open(file) followed by orig_file.close(). Implement it as follows:

# walk through your directory as you're doing already
input_file = file_list[0]  # you know this is only for the first file, right?
with open(input_file, 'w') as orig_file:
    pass  # do stuff to the file
# once you're out of the block, the file automagically closes, which catches
# all kinds of accidental breaks in cases of error or exception.
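The closing behaviour can be checked with a small sketch. It assumes a temporary demo file and a deliberately raised ValueError standing in for a failure during processing; neither comes from the original code.

```python
import os
import tempfile

# Create a temporary demo file to read back.
fd, demo_path = tempfile.mkstemp(suffix='.html')
os.close(fd)
with open(demo_path, 'w') as f:
    f.write('<title>demo</title>')

try:
    with open(demo_path, 'r') as orig_file:
        first_line = orig_file.readline()
        # Simulated failure in the middle of processing.
        raise ValueError('simulated failure while processing')
except ValueError:
    pass

# The with block closed the file even though an exception was raised.
closed_after_error = orig_file.closed
print(closed_after_error)

os.remove(demo_path)
```

With a plain open() and close() pair, the close() call would have been skipped when the exception was raised.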

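It looks like your problem is that you are opening the file with the "write" flag rather than the "read" flag. I don't actually know what BeautifulSoup does, but a quick Google makes it look like a screen parser. Open orig_file with 'r' instead of 'w'.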

orig_file = open(input_file, 'r')  # your way
# or the better way ;)
with open(input_file, 'r') as orig_file:
    pass  # do stuff to it in the block

This is better, because opening a file with 'w' blanks the file :)
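Both effects of 'w' can be seen in a short sketch: the read attempt fails with the same io.UnsupportedOperation shown in the traceback, and the file's contents are gone afterwards. The temporary file and its contents are assumptions for the demonstration.

```python
import io
import os
import tempfile

# Create a temporary demo file with some markup in it.
fd, demo_path = tempfile.mkstemp(suffix='.html')
os.close(fd)
with open(demo_path, 'w') as f:
    f.write('<title>demo</title>')

# Reading in 'r' mode works and leaves the file untouched.
with open(demo_path, 'r') as f:
    text_before = f.read()

# Opening in 'w' mode truncates the file immediately and forbids reading.
read_failed = False
with open(demo_path, 'w') as f:
    try:
        f.read()
    except io.UnsupportedOperation:
        read_failed = True

size_after = os.path.getsize(demo_path)
print(text_before, read_failed, size_after)

os.remove(demo_path)
```

Note that any file already opened with 'w' in earlier runs has been emptied, so it may need to be restored before reading it gives anything useful.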

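【Discussion】:

First of all, thanks, adsmith!
I tried your code and everything seemed fine until the next piece of code, which uses Beautiful Soup, and then it broke. Here is what was returned: <_io.TextIOWrapper name="C:\\Aptana\\Beautiful\\Administration+Guide.html" mode="w" encoding="cp1252"> Traceback (most recent call last): File "C:\Aptana\beautiful\B-1.py", line 47, in <module> soup = BeautifulSoup(orig_file) File "C:\Python33\lib\site-packages\bs4\__init__.py", line 161, in __init__ markup = markup.read() io.UnsupportedOperation: not readable Any ideas?
Show me the code and we'll figure out why :). It sounds like you may be trying to use file_list as both a list of file names and a list of file paths. Please edit your question with the code that is now failing.
You are opening the file for writing, not for reading. If you need to read it rather than write it, try open(input_file,'r').
I tried 'r', which led to an index problem on the [0], and nothing was returned from the soup.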
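One possible explanation for the index problem, offered as a guess rather than a diagnosis: the earlier open(..., 'w') calls would already have emptied the file, so a later read finds no markup, the search returns an empty list, and [0] raises IndexError. The sketch below uses plain lists as stand-ins for the search results (no BeautifulSoup call is made), and first_or_none is a hypothetical helper name.

```python
def first_or_none(results):
    """Return the first item of `results`, or None if it is empty (hypothetical helper)."""
    return results[0] if results else None


# Stand-ins for what a search such as soup.findAll(id='main') might return.
matches_from_empty_file = []                  # nothing found in an emptied file
matches_from_real_file = ['<div id="main">']  # one match found

# Indexing an empty result list is what raises the error.
try:
    matches_from_empty_file[0]
    raised = False
except IndexError:
    raised = True

print(raised)
print(first_or_none(matches_from_empty_file))
print(first_or_none(matches_from_real_file))
```

Checking for an empty result before indexing turns the crash into a value that can be tested for.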

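【Solution 2】:

I believe a similar question can be found here: How to read file attributes in directory?

The answers there may contain the information you are looking for (using os.stat, or os.path to get the actual path to the file).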
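As a rough sketch of what those two modules offer, assuming a temporary demo file created just for this example:

```python
import os
import tempfile

# Create a temporary demo file to inspect.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'page.html')
with open(demo_path, 'w') as f:
    f.write('<title>demo</title>')

full_path = os.path.abspath(demo_path)    # absolute path to the file
is_file = os.path.isfile(full_path)       # True only if the path names an existing file
size_bytes = os.stat(full_path).st_size   # file size in bytes, from os.stat

print(full_path, is_file, size_bytes)
```

Checking os.path.isfile before calling open() is a quick way to tell whether a name actually resolves to a file from the current working directory.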

【Discussion】:

Thanks, 杂食! I hadn't seen that, although I did do some searching. I'll try to do better in the future.