12. Files and Input/Output

I. File Objects

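1. File objects can be used to access not only ordinary disk files but also any other kind of "file" at an abstract level.

2. Once the appropriate "hooks" are in place, you can access other objects that expose a file-like interface just as if they were ordinary files.

3. The built-in function open() returns a file object, which is needed for every subsequent operation on that file.

4. A file is simply a contiguous sequence of bytes.
5. Data transfer often uses byte streams, whether the stream is made up of single bytes or of large chunks of data.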
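
As a small illustration of the "file-like interface" idea in points 1 and 2, the sketch below uses io.StringIO from the standard library as the stand-in "file". The helper count_lines is a hypothetical name introduced only for this example, and the sketch assumes Python 3.

```python
import io

def count_lines(stream):
    """Count the lines of any object that supports file-style iteration."""
    return sum(1 for _ in stream)

# io.StringIO keeps its data in memory but offers the same read/iterate
# interface as a file object returned by open().
fake_file = io.StringIO("first line\nsecond line\nthird line\n")
line_count = count_lines(fake_file)

# The usual file methods work as well: rewind and read one line back.
fake_file.seek(0)
first_line = fake_file.readline()
fake_file.close()
```

count_lines never checks what kind of object it receives, so a file object returned by open() could be passed in the same way.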

II. Built-in File Functions: open() and file()

Opening a file

Both open() and file() can be used to open files. On success they return a file object; if the operation fails, Python raises an IOError exception.

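1. open() and file() have the same functionality and are interchangeable.
2. Anywhere you see open() used, file() can be substituted for it.
3. Using open() to read and write files is recommended. It is also the only one of the two available in Python 3, which removed file().

4. Basic syntax: file_object = open(file_name, access_mode='r', buffering=-1)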

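file_name is a string containing the name of the file to open; it may be a relative or an absolute path.

The optional argument access_mode is also a string; it specifies the mode in which the file is opened.

A file opened with 'r' or 'U' mode must already exist.

If access_mode is not given, the default value 'r' is used.

The other optional argument, buffering, selects the buffering strategy used when accessing the file: 0 means unbuffered, 1 means line-buffered (one line at a time), and any other value greater than 1 is used as the buffer size.

Omitting this argument or passing a negative value selects the system default: line buffering for teletype-like (tty) devices and normal buffering for all other devices.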
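
The arguments described above can be sketched as follows. This is only an illustration: the file names are hypothetical, a temporary directory stands in for a real working directory, and the code is written so that it should also run on Python 3 (where IOError is an alias of OSError).

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")

# 'w' creates the file; buffering=1 requests line buffering.
f = open(path, "w", 1)
f.write("hello\n")
f.close()

# No access_mode is given, so the default 'r' is used.
f = open(path)
content = f.read()
f.close()

# 'r' requires an existing file; a missing one raises IOError.
missing = os.path.join(tmp_dir, "no_such_file.txt")
try:
    open(missing, "r")
    open_failed = False
except IOError:
    open_failed = True
```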

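Notes:

1. Opening a file means working with the disk, and disk access requires kernel mode, so the Python interpreter calls the operating system's file I/O interface.
Chinese-language versions of Windows default to the GBK code page, while Linux defaults to UTF-8. So if a file on Windows is not GBK-encoded, declare its encoding in the open() call
so that the bytes are decoded with the matching rules, which prevents garbled text (mojibake).

2. open() has three main modes: read (r), write (w), and append (a). Read is the default.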
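
The encoding issue in note 1 can be reproduced with a short sketch. It assumes Python 3, whose open() accepts an encoding argument (in Python 2, io.open() offers the same parameter); the file name and sample text are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "gbk_demo.txt")
text = "你好"

# Write the text as GBK bytes, as a program on Chinese Windows might.
with open(path, "w", encoding="gbk") as f:
    f.write(text)

# Declaring the matching encoding decodes the bytes correctly.
with open(path, "r", encoding="gbk") as f:
    decoded = f.read()

# Reading the same bytes as UTF-8 fails: they are not valid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Passing encoding explicitly makes the result independent of the platform default, which is the point of the note above.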

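Closing a file

There are two ways to close a file:

1. Call f.close(), where f is the variable bound to the handle returned by open().

2. Let the file be closed automatically when the program exits. Either way, data can be lost during writes if the file is not closed in time: written data is first held in memory and only reaches the disk when the buffer is flushed or the file is closed, so if the program crashes
on an exception while the file is still open, the buffered data is lost without ever being written to disk. An improvement on the first method is to use: with open('filename') as f.
with creates a block; put the file operations inside it, and the file is closed automatically as soon as the block ends, even if the block exits through an exception.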

The syntax is as follows:

with open('f1.txt') as f1, open('f2.txt') as f2:
    ......
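
To make the behaviour of with concrete, here is a minimal sketch; the file name and the simulated failure are assumptions made only for the example. It checks f.closed after the block, once after a normal exit and once after an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as f:
    f.write("saved\n")
closed_after_block = f.closed    # leaving the block closed the file

# The file is also closed when the block exits through an exception,
# and the data written before the error still reaches the disk.
try:
    with open(path, "a") as g:
        g.write("also saved\n")
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed

with open(path) as f:
    saved = f.read()
```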

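File object access modes

A file object has its own attributes and methods. Let us first look at the access modes. (+ and b can be combined with the other characters to form a mode; for example, rb opens a file read-only in binary mode. The mode argument is optional and defaults to r.)

(Note: a file should be closed promptly once you are done with it; check the f.closed attribute to confirm whether it has been closed.)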

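mode	function
r	Read-only (the default; raises an exception if the file does not exist). The file pointer is placed at the beginning of the file.
w	Write-only (not readable). Creates the file if it does not exist; if it exists, its content is discarded before writing.
a	Append (write-only). Creates the file if it does not exist; if it exists, new content is added at the end.
r+	Read and write. The file must exist. Writes overwrite the characters at the current file pointer position: if the file contains "Hello, World" and "hi" is written right after opening, the content becomes "hillo, World".
b	Binary mode (for example, when uploading an ISO image over FTP). In Python 2 it makes no difference on Linux but must be specified on Windows when handling binary files.
w+	Read and write, writing first. If the file exists, it is truncated; if it does not exist, a new file is created.
a+	Like a, but also readable. If the file exists, the file pointer is placed at the end and writes are appended; if it does not exist, a new file is created for reading and writing.
rb	Binary read. Read-only; raises an exception if the file does not exist.
wb	Binary write. Write-only; creates the file if it does not exist, truncates it if it does.
ab	Binary append. Write-only; content is added at the end of the file, and the file is created if it does not exist.
rt	Text read. Read-only; raises an exception if the file does not exist.
wt	Text write. Write-only; creates the file if it does not exist. If the file exists, it is emptied first and then opened.
at	Text append. Write-only; creates the file if it does not exist. If the file exists, content is added at the end without emptying it.
rb+	Binary read and write. Raises an exception if the file does not exist.
wb+	Binary read and write. Creates the file if it does not exist. If the file exists, it is emptied first and then opened.
ab+	Binary read and append. Content can be read from the start of the file (after seeking), writes are added at the end, and the file is created if it does not exist.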

Note: when a file is opened with b, the content read back is of a bytes type, and data to be written must also be supplied as bytes.
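
A few of the modes in the table can be checked with a short sketch. It assumes Python 3 (where rb returns bytes); the file name and the helper read_all are hypothetical and exist only for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

def read_all(p):
    """Return the whole file as text."""
    with open(p, "r") as f:
        return f.read()

# 'w' creates the file (or truncates an existing one).
with open(path, "w") as f:
    f.write("Hello, World")

# 'r+' keeps the content and overwrites from the current position (offset 0).
with open(path, "r+") as f:
    f.write("hi")
after_r_plus = read_all(path)

# 'a' always writes at the end of the file.
with open(path, "a") as f:
    f.write("!")
after_append = read_all(path)

# 'rb' returns bytes rather than text.
with open(path, "rb") as f:
    raw = f.read()

# 'w' again: the old content is discarded.
with open(path, "w") as f:
    f.write("new")
after_truncate = read_all(path)
```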

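Universal newline support (UNS)

1. Different platforms use different characters to mark the end of a line.
2. When a file is opened with the 'U' flag, every line separator is replaced with the newline character NEWLINE (\n) by the time it is returned through Python's input methods.
3. UNS applies only when reading text files.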
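
The effect of universal newlines can be seen in the following sketch. It assumes Python 3, where this translation is the default for text-mode reads (newline=None) and the 'U' flag is no longer needed; the file name and sample data are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# Write raw bytes so that all three line-ending conventions reach the disk.
with open(path, "wb") as f:
    f.write(b"dos\r\nold mac\runix\n")

# Text mode with the default newline=None translates every ending to '\n'.
with open(path, "r") as f:
    lines = f.readlines()

# newline='' still splits on every ending but returns it untranslated.
with open(path, "r", newline="") as f:
    raw_lines = f.readlines()
```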

III. Built-in File Methods
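
Before the full help() listing, here is a brief sketch of the most commonly used methods it documents (writelines, readline, readlines, read, seek, tell). It is an illustration under assumptions: the file name is hypothetical and the code targets Python 3, where read(size) counts characters in text mode rather than bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods.txt")

with open(path, "w") as f:
    # writelines() does not add newlines, so each string carries its own.
    f.writelines(["alpha\n", "beta\n", "gamma\n"])

with open(path, "r") as f:
    first = f.readline()     # one line, newline retained
    position = f.tell()      # current offset, just past the first line
    rest = f.readlines()     # remaining lines as a list
    at_eof = f.read()        # empty string once EOF is reached
    f.seek(0)                # move back to the start of the file
    head = f.read(5)         # at most 5 characters
```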

>>> help(open)
Help on built-in function open in module __builtin__:

open(...)
    open(name[, mode[, buffering]]) -> file object
    
    Open a file using the file() type, returns a file object.  This is the
    preferred way to open a file.  See file.__doc__ for further information.

>>> help(file)
Help on class file in module __builtin__:

class file(object)
 |  file(name[, mode[, buffering]]) -> file object
 |  
 |  Open a file.  The mode can be 'r', 'w' or 'a' for reading (default),
 |  writing or appending.  The file will be created if it doesn't exist
 |  when opened for writing or appending; it will be truncated when
 |  opened for writing.  Add a 'b' to the mode for binary files.
 |  Add a '+' to the mode to allow simultaneous reading and writing.
 |  If the buffering argument is given, 0 means unbuffered, 1 means line
 |  buffered, and larger numbers specify the buffer size.  The preferred way
 |  to open a file is with the builtin open() function.
 |  Add a 'U' to mode to open the file for input with universal newline
 |  support.  Any line ending in the input file will be seen as a '\n'
 |  in Python.  Also, a file so opened gains the attribute 'newlines';
 |  the value for this attribute is one of None (no newline read yet),
 |  '\r', '\n', '\r\n' or a tuple containing all the newline types seen.
 |  
 |  'U' cannot be combined with 'w' or '+' mode.
 |  
 |  Methods defined here:
 |  
 |  __delattr__(...)
 |      x.__delattr__('name') <==> del x.name
 |  
 |  __enter__(...)
 |      __enter__() -> self.
 |  
 |  __exit__(...)
 |      __exit__(*excinfo) -> None.  Closes the file.
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __init__(...)
 |      x.__init__(...) initializes x; see help(type(x)) for signature
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __setattr__(...)
 |      x.__setattr__('name', value) <==> x.name = value
 |  
 |  close(...)
 |      close() -> None or (perhaps) an integer.  Close the file.
 |      
 |      Sets data attribute .closed to True.  A closed file cannot be used for
 |      further I/O operations.  close() may be called more than once without
 |      error.  Some kinds of file objects (for example, opened by popen())
 |      may return an exit status upon closing.
 |  
 |  fileno(...)
 |      fileno() -> integer "file descriptor".
 |      
 |      This is needed for lower-level file interfaces, such os.read().
 |  
 |  flush(...)
 |      flush() -> None.  Flush the internal I/O buffer.
 |  
 |  isatty(...)
 |      isatty() -> true or false.  True if the file is connected to a tty device.
 |  
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |  
 |  read(...)
 |      read([size]) -> read at most size bytes, returned as a string.
 |      
 |      If the size argument is negative or omitted, read until EOF is reached.
 |      Notice that when in non-blocking mode, less data than what was requested
 |      may be returned, even if no size parameter was given.
 |  
 |  readinto(...)
 |      readinto() -> Undocumented.  Don't use this; it may go away.
 |  
 |  readline(...)
 |      readline([size]) -> next line from the file, as a string.
 |      
 |      Retain newline.  A non-negative size argument limits the maximum
 |      number of bytes to return (an incomplete line may be returned then).
 |      Return an empty string at EOF.
 |  
 |  readlines(...)
 |      readlines([size]) -> list of strings, each a line from the file.
 |      
 |      Call readline() repeatedly and return a list of the lines so read.
 |      The optional size argument, if given, is an approximate bound on the
 |      total number of bytes in the lines returned.
 |  
 |  seek(...)
 |      seek(offset[, whence]) -> None.  Move to new file position.
 |      
 |      Argument offset is a byte count.  Optional argument whence defaults to
 |      0 (offset from start of file, offset should be >= 0); other values are 1
 |      (move relative to current position, positive or negative), and 2 (move
 |      relative to end of file, usually negative, although many platforms allow
 |      seeking beyond the end of a file).  If the file is opened in text mode,
 |      only offsets returned by tell() are legal.  Use of other offsets causes
 |      undefined behavior.
 |      Note that not all file objects are seekable.
 |  
 |  tell(...)
 |      tell() -> current file position, an integer (may be a long integer).
 |  
 |  truncate(...)
 |      truncate([size]) -> None.  Truncate the file to at most size bytes.
 |      
 |      Size defaults to the current file position, as returned by tell().
 |  
 |  write(...)
 |      write(str) -> None.  Write string str to file.
 |      
 |      Note that due to buffering, flush() or close() may be needed before
 |      the file on disk reflects the data written.
 |  
 |  writelines(...)
 |      writelines(sequence_of_strings) -> None.  Write the strings to the file.
 |      
 |      Note that newlines are not added.  The sequence can be any iterable object
 |      producing strings. This is equivalent to calling write() for each string.
 |  
 |  xreadlines(...)
 |      xreadlines() -> returns self.
 |      
 |      For backward compatibility. File objects now include the performance
 |      optimizations previously implemented in the xreadlines module.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  closed
 |      True if the file is closed
 |  
 |  encoding
 |      file encoding
 |  
 |  errors
 |      Unicode error handler
 |  
 |  mode
 |      file mode ('r', 'U', 'w', 'a', possibly with 'b