Remove strings from a text file and keep the floats

【Question title】: Remove string from text file, keep float
【Posted】: 2012-12-03 15:58:38
【Question】:

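I want to remove the lines in a text file that contain strings, as well as the empty lines. The file looks like the sample below. As you can see, the header repeats itself throughout the file, and the number of data lines varies from block to block. I need to import it into numpy as an array. At first the file used commas as the decimal separator; at least I was able to change that.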

I tried this, but it doesn't work at all:

from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line


z.close()

Sample data:

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

【Comments】:

if line[0].isdigit(): whatever()

Tags: python string file

【Solution 1】:

Python initially reads every element of a file as a string unless you cast it, so your approach of checking the type won't work.
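To illustrate the point, here is a minimal sketch: every line arrives as a `str`, and the only way to find out whether it holds numbers is to try the cast. The helper name `parse_floats` and the two sample lines are my own, taken from the data in the question.

```python
def parse_floats(line):
    """Return the line's fields as floats, or None if any field is not numeric."""
    try:
        return [float(token) for token in line.split()]
    except ValueError:
        # At least one field could not be cast, so treat it as a header line.
        return None


lines = ["Points :    4223\n", "1   60.102783   0.020013755\n"]
for line in lines:
    # type(line) is str for both lines; only the cast tells them apart.
    print(type(line).__name__, parse_floats(line))
```

Note that a blank line has no fields, so this sketch returns an empty list for it rather than None; you would still need to skip those separately.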

Your best bet is probably to use a regular expression to filter out the lines that contain non-data characters.

import re

with open("datafile") as f:
    for line in f:
        # Skip every line that contains something other than digits, '-', '.' or whitespace
        if re.search(r"[^-0-9.\s]", line):
            continue
        # Skip empty lines
        if len(line.strip()) == 0:
            continue
        # Keep the rest
        print(line)
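Since the goal is an array rather than printed lines, the same filter can collect the values instead. This is only a sketch: the function name `numeric_rows` and the shortened sample lines are my own, and it assumes every line that passes the filter contains only well-formed numbers.

```python
import re

# Same pattern as above: any character that is not a digit, '-', '.' or whitespace.
NON_NUMERIC = re.compile(r"[^-0-9.\s]")


def numeric_rows(lines):
    """Return the numeric lines as lists of floats, skipping headers and blank lines."""
    rows = []
    for line in lines:
        if NON_NUMERIC.search(line) or not line.strip():
            continue
        rows.append([float(token) for token in line.split()])
    return rows


sample = [
    "bla bla\n",
    "\n",
    "Points :    4223\n",
    "1   60.102783   0.020013755 89.109558\n",
    "1   60.107666   0.020006953 89.025749\n",
]
rows = numeric_rows(sample)
print(rows)
```

Passing `rows` to `numpy.array` should then give the 2-D array the question asks for, provided every row has the same number of columns.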

【Comments】:

Wow, thanks a lot! All I had to do was change if re.search("[^-0-9.\s]"): to if re.search("[^-0-9 .\s]",line): continue
@Starter2 If it answered your question, could you mark it as the answer? ;)

【Solution 2】:

Why don't you use numpy.loadtxt? It has an interface that suits cases like this well.
See the numpy.loadtxt documentation.

import numpy as np
yourArry = np.loadtxt('yourfilename.txt', skiprows=7)

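Also, since you have a header (which should be found at the top of a file), you can split the file into several files. You can do that in Python, or with the UNIX command csplit. Here is how to do it and what you get: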

oz123@:~/tmp> csplit -k data.txt   '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx
xx00  xx01  xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

oz123@:~/tmp> cat xx02
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462
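
For the Python route, a rough sketch of the same split done in memory might look like this. The function name `split_blocks` and the shortened sample lines are my own; like the csplit call, it assumes every block starts with a line beginning with "bla".

```python
def split_blocks(lines, marker="bla"):
    """Group lines into blocks, starting a new block at each line that begins with marker."""
    blocks = []
    for line in lines:
        # Open a new block at every marker line, and for any lines before the first marker.
        if line.startswith(marker) or not blocks:
            blocks.append([])
        blocks[-1].append(line)
    return blocks


sample = [
    "bla bla\n",
    "\n",
    "Points :    4223\n",
    "1   60.102783   0.020013755\n",
    "bla bla\n",
    "1   60.107666   0.020006953\n",
]
blocks = split_blocks(sample)
print(len(blocks))
```

Unlike csplit, this does not produce an empty first piece (the xx00 above) when the input starts with the marker line. Each block still carries its own header lines, so they have to be skipped before the data is converted.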

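【Comments】:

Can you give an example? My reading of the documentation doesn't show a way to handle headers scattered throughout the file.
@StevenRumbalski, I guess it assumes the header really is at the top, not somewhere in the middle of the file.
@Oz123 Unfortunately, that is not the case in the OP's question.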
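@Chris, the OP probably got all the data files from some instrument. That instrument (I'm guessing here) spits out individual files, and for some reason the OP stacked them into one file. Splitting them into multiple files and then reading them should not be a problem...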
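@Oz123, the instrument automatically appends the data for the different cycles during the test. I did not stack them. I'm building a GUI to analyze the data, so I don't want the user to have to import multiple files. It may take more time, but it will be easier for the user.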