Answer to the updated question -
This code does not fail regardless of how wide the data is. You can adapt it to your own needs.
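import numpy as np
import pandas as pd

# Read each line as one string column (the default separator is a tab, so this assumes the file contains no tabs)
df = pd.read_table('file.txt', header=None)
# Collapse runs of uneven whitespace into a single space
df = df[0].apply(lambda x: ' '.join(x.split()))
# An empty dataframe to hold the output (object dtype, because it will store strings)
out = pd.DataFrame(np.nan, index=df.index, columns=['col1', 'col2', 'col3', 'col4', 'col5'], dtype=object)
n_cols = 5  # number of columns
for i in range(n_cols - 2):
    if i == 0 or i == 1:
        # Peel col1 and col2 off the left
        out.iloc[:, i] = df.str.partition(' ').iloc[:, 0]
        df = df.str.partition(' ').iloc[:, 2]
    else:
        # Peel col5 off the right
        out.iloc[:, 4] = df.str.rpartition(' ').iloc[:, 2]
        df = df.str.rpartition(' ').iloc[:, 0]
# What remains is "<col3> <col4>": the last token is col4, the rest is col3
out.iloc[:, 3] = df.str.rpartition(' ').iloc[:, 2]
out.iloc[:, 2] = df.str.rpartition(' ').iloc[:, 0]
print(out)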
+---+------------+-------------+----------------+-------+--------+
| | col1 | col2 | col3 | col4 | col5 |
+---+------------+-------------+----------------+-------+--------+
| 0 | 1541783101 | 8901951488 | file.log | 12345 | 123456 |
| 1 | 1541783401 | 21872967680 | other file.log | 23456 | 123 |
| 2 | 1541783701 | 3 | third file.log | 23456 | 123 |
+---+------------+-------------+----------------+-------+--------+
Note - the code is hard-coded for 5 columns. It can also be generalized.
Previous answer -
Use pd.read_fwf() to read a fixed-width file.
In your case:
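One possible way to generalize this is sketched below. It assumes the layout is always some fixed number of single-token columns on the left, some fixed number on the right, and one free-text column in the middle that may contain spaces. The function name `split_lines` and the parameters `n_left` and `n_right` are made up for illustration, and the sample lines are typed inline instead of being read from file.txt.

```python
import pandas as pd

def split_lines(lines, n_left=2, n_right=2):
    """Split whitespace-separated lines into n_left + 1 + n_right columns.

    The first n_left and last n_right tokens become their own columns;
    everything in between is joined back into one middle column.
    (Hypothetical helper, not part of pandas.)
    """
    rows = []
    for line in lines:
        tokens = line.split()  # also collapses uneven whitespace
        if len(tokens) < n_left + n_right:
            raise ValueError(f'too few fields in line: {line!r}')
        end = len(tokens) - n_right
        middle = ' '.join(tokens[n_left:end])
        rows.append(tokens[:n_left] + [middle] + tokens[end:])
    columns = [f'col{i + 1}' for i in range(n_left + 1 + n_right)]
    return pd.DataFrame(rows, columns=columns)

# Sample lines mirroring the output shown above
sample = [
    '1541783101  8901951488 file.log 12345 123456',
    '1541783401 21872967680 other file.log 23456 123',
    '1541783701           3 third file.log 23456 123',
]
out = split_lines(sample)
print(out)
```

To run it on a real file, something like `split_lines(open('file.txt'))` should work, since iterating a file yields its lines. All values come back as strings, so convert the numeric columns afterwards if needed.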
pd.read_fwf('file.txt', header=None)
+---+----------+-----+-------------------+-------+--------+
| | 0 | 1 | 2 | 3 | 4 |
+---+----------+-----+-------------------+-------+--------+
| 0 | 20181201 | 3 | file.log | 12345 | 123456 |
| 1 | 20181201 | 12 | otherfile.log | 23456 | 123 |
| 2 | 20181201 | 200 | odd file name.log | 23456 | 123 |
+---+----------+-----+-------------------+-------+--------+
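By default read_fwf() infers the column boundaries from the data. If the inference goes wrong, the boundaries can be passed explicitly through `colspecs`. The sketch below builds a small fixed-width sample in memory; the column positions and the column names are assumptions chosen to match that sample, not values taken from the real file.

```python
import io
import pandas as pd

# Hypothetical sample rows, formatted into fixed-width text below
records = [
    (20181201, 3, 'file.log', 12345, 123456),
    (20181201, 12, 'otherfile.log', 23456, 123),
    (20181201, 200, 'odd file name.log', 23456, 123),
]
# Widths 8, 3, 17, 5, 6 with one space between columns
text = '\n'.join(
    f'{d:<8} {n:>3} {name:<17} {a:>5} {b:>6}' for d, n, name, a, b in records
)

# Half-open [start, end) character ranges, one per column
colspecs = [(0, 8), (9, 12), (13, 30), (31, 36), (37, 43)]
names = ['date', 'num', 'filename', 'val1', 'val2']  # illustrative names

fwf = pd.read_fwf(io.StringIO(text), colspecs=colspecs, header=None, names=names)
print(fwf)
```

This only works while every value fits inside its column range. Once a value is wider than its column, as in the updated question, splitting on whitespace is the safer route.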