Python, lists, arrays, tuples, data - How to deal with a particular dataset

[Posted]: 2016-02-15 18:04:25
[Question]:

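I hope this time I give you enough information to explain myself. I am trying to read velocity data written in vector notation so that I can (for now) plot some XY scatter plots. The file looks like this: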

#               x            0.0025             0.005            0.0075              0.01             0.015              0.02              0.03              0.04              0.05              0.06              0.08               0.1              0.12              0.14              0.16              0.18               0.2
#               y                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0
#               z                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0                 0
#            Time
           50                 (0.0007558915435 -0.0004561530839 -0.0004827045695)                 (0.002621093455 -0.0004982563588 -0.0004670886403)                 (0.004284814163 -0.0004701779131 -0.0003427572777)                 (0.005427856321 -0.0004415657508 -0.0002581055849)                 (0.009283872431 -0.0003824524669 -9.862169137e-05)                 (0.01336058599 -0.0003623751773 -3.007799017e-05)                 (0.02241437059 -0.0002222313074 0.0001136439177)                 (0.03056537385 -4.38083924e-05 0.0002682758253)                 (0.038580681 -4.613463513e-06 0.0002734791838)                 (0.04315368113 7.912822938e-05 0.0002553115381)                 (0.04920978201 0.0001259194082 0.0001679574544)                 (0.05178246176 3.113282703e-05 8.74525373e-05)                 (0.05351566041 -6.546046173e-07 5.251841968e-05)                 (0.05470950178 5.582683289e-06 5.456222367e-05)                 (0.05765609801 1.604055123e-05 5.61024635e-05)                 (0.05910960178 8.390667426e-06 5.051911761e-05)                 (0.06047027361 -3.362615186e-06 5.137448521e-05)
          100                 (-0.03638183522 -0.0004212943087 -0.0001445116086)                 (-0.04599742972 1.934674765e-05 0.0002080845418)                 (-0.0263580529 0.0007034850972 0.0007206210834)                 (-0.005878665916 0.0009878563826 0.0009139785036)                 (0.03751451082 0.0008459502289 0.0008117077564)                 (0.06155058308 0.0007058376794 0.0007077796084)                 (0.09253546972 0.0005743407599 0.0005878527131)                 (0.1056482525 0.0004776711045 0.0005015883363)                 (0.1147274675 0.0003535542095 0.0003873958082)                 (0.1197626602 0.0003578742091 0.0003643755411)                 (0.1264856441 0.0003138045371 0.0003051010097)                 (0.1307027216 0.0002453538171 0.0002362933067)                 (0.1347570923 0.000177587389 0.0001672847755)                 (0.1366348914 0.0001554091899 0.000144292499)                 (0.1398319486 0.0001272587836 0.000111811677)                 (0.141127784 0.0001160117874 9.894530615e-05)                 (0.1422487007 0.0001054244658 8.819660841e-05)
          150                 (-0.05825943888 0.0001136539473 0.0004206885026)                 (-0.04572555779 0.0007272639883 0.0005475238907)                 (0.001189305157 0.001076000002 0.0006294173999)                 (0.02934769975 0.0009229883365 0.0006037649856)                 (0.07194848666 0.0006515992717 0.0005186304839)                 (0.09490965777 0.0005256600022 0.0004767879994)                 (0.1233413075 0.0004350708279 0.0004479392071)                 (0.1347607461 0.0003609992666 0.0003952444021)                 (0.1426707096 0.0002771968784 0.0003190311903)                 (0.147209712 0.0002727655531 0.0003053133615)                 (0.1532548565 0.0002247845037 0.0002564816634)                 (0.1570851548 0.0001718066583 0.0002036570558)                 (0.1608564722 0.0001242749078 0.0001549789597)                 (0.1626047646 0.0001093818898 0.0001393982173)                 (0.1656239159 9.055609841e-05 0.0001172163492)                 (0.1668961273 8.334132321e-05 0.0001085831113)                 (0.168037179 7.648813655e-05 0.0001009290741)
... and so on down to ...
        10000

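The "..." means there is more data, but I had to trim it "a bit" to keep it readable. The data is whitespace-delimited. I would like to learn a better way to handle this kind of data so that I can read it, plot it, or write it out in another format, with or without the parentheses.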
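Since the only non-numeric characters in the data rows are the parentheses, one option is to turn them into spaces and let NumPy read the rest as plain whitespace-separated columns. This is a minimal sketch, assuming every data row is complete (the time value followed by the same number of "(ux uy uz)" groups, unlike the truncated `10000` row in the excerpt). The function names `load_probe_file` and `save_without_parentheses` and the output layout are illustrative choices, not part of the original question.

```python
import numpy as np

def load_probe_file(path):
    """Read a probe file into (x, times, U).

    Assumes: line 1 is '#  x  x1 x2 ...', every header line starts with '#',
    and each data row is 'time (ux uy uz) (ux uy uz) ...' with the same
    number of vectors on every row.
    """
    with open(path, "r") as f:                      # text mode
        lines = f.readlines()
    x = np.array([float(v) for v in lines[0].split()[2:]])  # skip '#' and 'x'
    # Replace '(' and ')' with spaces so every number is a plain column;
    # np.loadtxt skips the '#' header lines as comments.
    cleaned = [ln.replace("(", " ").replace(")", " ") for ln in lines]
    data = np.loadtxt(cleaned, ndmin=2)
    times = data[:, 0]                              # first column: time
    U = data[:, 1:].reshape(len(times), -1, 3)      # (n_times, n_probes, 3)
    return x, times, U

def save_without_parentheses(path, times, U):
    """Write one row per time step: time ux1 uy1 uz1 ux2 uy2 uz2 ..."""
    flat = np.column_stack([times, U.reshape(len(times), -1)])
    np.savetxt(path, flat)
```

Calling `x, times, U = load_probe_file("probe.inp")` then gives `U[i, j]` as the vector of probe `j` at time step `i`.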

I am thinking of reading it as a list, stripping the "()" characters, and plotting the data by slicing the list. Or should I use arrays instead?

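In either case, should I treat each vector as a tuple? Or as a list (inside a list or an array)? Or should each number be a separate member of the list, in which case I have to be careful when plotting the X, Y or Z components?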
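One common layout, offered here as one option rather than the only answer, is a single 3-D NumPy array of shape (n_times, n_probes, 3): each vector is a row of length 3, and a whole component is a slice instead of a loop over tuples. In the sketch below the small array holds the first three probes of the t = 50 and t = 100 rows from the question, and `plot_component` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

# Probe X positions (from the '#  x' header line) and two time steps.
x = np.array([0.0025, 0.005, 0.0075])
times = np.array([50.0, 100.0])
U = np.array([
    [[0.0007558915435, -0.0004561530839, -0.0004827045695],
     [0.002621093455, -0.0004982563588, -0.0004670886403],
     [0.004284814163, -0.0004701779131, -0.0003427572777]],
    [[-0.03638183522, -0.0004212943087, -0.0001445116086],
     [-0.04599742972, 1.934674765e-05, 0.0002080845418],
     [-0.0263580529, 0.0007034850972, 0.0007206210834]],
])                      # shape (n_times, n_probes, 3)

ux = U[:, :, 0]         # X component of every probe at every time step
uy = U[:, :, 1]         # Y component
uz = U[:, :, 2]         # Z component

def plot_component(x, U, times, t_index, comp=0):
    """XY scatter plot of one velocity component vs probe X position."""
    fig, ax = plt.subplots()
    ax.scatter(x, U[t_index, :, comp])
    ax.set_xlabel("x")
    ax.set_ylabel("U" + "xyz"[comp])
    ax.set_title("t = %g" % times[t_index])
    return fig, ax
```

For example, `plot_component(x, U, times, 1)` followed by `plt.show()` plots Ux along the probe line at t = 100. With a list of tuples you would instead need a comprehension or `zip(*vectors)` to pull out all the X components.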

I have already written some code, but I'm stuck. I only slept two hours last night and I'm paying for it now :-(

Code:

import glob
import numpy as np
import matplotlib.pyplot as plt

#=============================================================================#
# The header of Velocity (U) probes shows the XYZ coordinates in separate     #
# lines. To work with the center line along the wake, we may assume Y=Z=0.    #
# Thus, we are interested in the values of X, in the first line of the file.  #
# The first character of each header line is '#', and the second character is #
# the coordinate, 'x' for the first line.                                     #
# The first element of interest will be [2] of the list                       #
#=============================================================================#

inFile = glob.glob("*.inp")  # list of files in current directory for input.

for Ufile in inFile:
    print("File Opened: ", Ufile)
    fi = open(Ufile, "rb")       # opening input file for reading.

    fileroot = Ufile[0:-4]       # keeping input file root for output file
    outfile = fileroot + '.out'  # adding extension
    fo = open(outfile, "wb")     # opening output file for writing

    try:
        inHead = fi.readlines()[0]  # Read X-coordinates and transform to float
        inHead = inHead.split()
        outHead = inHead[2:]

        inData = fi.readlines()[4:]    # Read data as strings. Skipping header
        r = 0
        for line in inData:
            fila = line.split()        # Dividing each row into elements
            c = 0
            for elem in fila:
                if elem[0] == '(':     # Slicing undesired character
                    elem = elem[1:]
                    fila[c] = float(elem)  # Converting string to float
                elif elem[-1] == ')':      # Slicing undesired character
                    elem = elem[0:-1]
                    fila[c] = float(elem)  # Converting string to float
                else:
                    fila[c] = float(elem)  # Converting string to float
                c += 1        # Tracking which row element the loop is at
            inData[r] = fila  # Updating list row with '(' and ')' removed
            r += 1


    finally:
        print("File Closed: ", Ufile)
        fi.close()
        fo.close()

Some of the indentation may display incorrectly after pasting the code here. The code shows what it is supposed to do.

Thanks in advance.

[Comments on the question]:

You could use re.findall to search for the tuples. I believe the whole program would be about 5 lines.
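Along the lines of the re.findall suggestion, here is a sketch assuming each vector is written as exactly three numbers inside one pair of parentheses. The pattern and the names `VECTOR`, `parse_row` and `parse_file` are my own, not from the comment.

```python
import re

# One "(a b c)" group; each capture is a number containing no spaces or parens.
VECTOR = re.compile(r"\(\s*([^\s()]+)\s+([^\s()]+)\s+([^\s()]+)\s*\)")

def parse_row(line):
    """Return (time, [(ux, uy, uz), ...]) for one data row."""
    time = float(line.split()[0])                   # leading time value
    vectors = [tuple(float(v) for v in group) for group in VECTOR.findall(line)]
    return time, vectors

def parse_file(path):
    """Parse every non-blank, non-'#' line of a probe file."""
    with open(path, "r") as f:
        return [parse_row(ln) for ln in f if ln.strip() and not ln.startswith("#")]
```

Each vector comes back as a tuple of three floats; `numpy.array(vectors)` turns one row into an (n_probes, 3) array if you need slicing.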

Tags: python arrays list file-io tuples

[Solution 1]:

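First, you should use open(Ufile, "r") instead of open(Ufile, "rb") (and "w" instead of "wb"), because you are working with text files. Second, inData = fi.readlines()[4:] reads nothing: the file pointer is already at the end of the file because you called inHead = fi.readlines()[0] before it. You can reset it with fi.seek(0). Better still, read all the lines into one variable once and use it for both inHead and inData. Third, you don't output anything... You can also replace some of the code with elem.rstrip(')').lstrip('(').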
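Applying those three points to the question's loop could look like the sketch below: text mode, a single `readlines()` shared by header and data, the strip calls instead of the index checks, and an actual write to the `.out` file. The function names and the output layout (X coordinates on the first line, then one cleaned row per time step) are assumptions for illustration.

```python
import glob

def clean_rows(lines):
    """Data rows as lists of floats, with '(' and ')' removed.

    Assumes lines[0:4] are the '#' header lines (x, y, z, Time).
    """
    rows = []
    for line in lines[4:]:                  # skip the 4 header lines
        fields = line.split()
        if not fields:                      # ignore blank lines
            continue
        rows.append([float(elem.rstrip(")").lstrip("(")) for elem in fields])
    return rows

def process_file(in_path):
    """Read one .inp file and write a parenthesis-free .out file next to it."""
    with open(in_path, "r") as fi:          # text mode, not "rb"
        lines = fi.readlines()              # read once, reuse for header and data
    x_coords = [float(v) for v in lines[0].split()[2:]]   # skip '#' and 'x'
    rows = clean_rows(lines)

    out_path = in_path[:-4] + ".out"        # same root, new extension
    with open(out_path, "w") as fo:         # text mode, not "wb"
        # Assumed layout: X coordinates first, then one row per time step.
        fo.write(" ".join(str(v) for v in x_coords) + "\n")
        for row in rows:
            fo.write(" ".join(str(v) for v in row) + "\n")
    return x_coords, rows

if __name__ == "__main__":
    for Ufile in glob.glob("*.inp"):        # every input file in the current directory
        process_file(Ufile)
        print("Processed:", Ufile)
```

The `with` blocks also replace the `try`/`finally` pair, since they close the files even if an exception is raised.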

[Comments]:

Thank you, platinhom. Still working on it, but making progress. I hope to post a solution soon!

[Solution 2]:

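Using a list of lists (vectors) for each row may be your solution. But if each row contains a set of vectors, putting them into a numpy array would be the way forward.

When it comes to handling the string data, however, the following should help: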

import numpy as np

#The following assumes the data is read as lines of text in the following format
txt=["           50                 (0.0007558915435 -0.0004561530839 -0.0004827045695) .... (0.06047027361 -3.362615186e-06 5.137448521e-05)",
     "          100                 (-0.03638183522 -0.0004212943087 -0.0001445116086) .... (0.1422487007 0.0001054244658 8.819660841e-05)",
     "          150                 (-0.05825943888 0.0001136539473 0.0004206885026) ....  (0.168037179 7.648813655e-05 0.0001009290741)"]
complete_list = []

for line in txt:
    line_part = line.split('(')
    header = int(line_part[0].strip(' '))  #changed from .rstrip(' ')
    vector_list = []
    for vector in line_part[1:]:
        coords = vector.split(' ')
        X = float(coords[0])
        Y = float(coords[1])
        Z = float(coords[2].rstrip(')'))
        vector_list.append([X,Y,Z])
    vector_array = np.array(vector_list)
    complete_list.append([header,vector_array])

#addressing can be done as follows:
line = 1
vector =2
print("header\n",complete_list[line][0])
print("vector\n",complete_list[line][1][vector])
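If every row has the same number of vectors, the `[header, vector_array]` pairs collected in `complete_list` can be stacked into one array of times and one 3-D array, which makes component slicing and plotting direct. A self-contained sketch, with a small hand-made `complete_list` (first and last vectors of the t = 50 and t = 100 rows) standing in for the parsed file:

```python
import numpy as np

# Stand-in for the parsed result: [time, (n_probes, 3) array] per row.
complete_list = [
    [50, np.array([[0.0007558915435, -0.0004561530839, -0.0004827045695],
                   [0.06047027361, -3.362615186e-06, 5.137448521e-05]])],
    [100, np.array([[-0.03638183522, -0.0004212943087, -0.0001445116086],
                    [0.1422487007, 0.0001054244658, 8.819660841e-05]])],
]

times = np.array([header for header, _ in complete_list])
# np.stack requires every vector_array to have the same shape.
U = np.stack([vectors for _, vectors in complete_list])   # (n_times, n_probes, 3)

ux_last_probe = U[:, -1, 0]   # X component at the last probe, for every time step
```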

[Comments]:

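Thanks Colin, still working on it. I changed the way I read the file: first I "dump" everything into a list, then I handle splitting and slicing the lines and have fun :-)
Sorry to ask again, Colin, but I'm stuck. Your suggestion is the closest to my problem, but I still can't figure out how to solve it. As I said when I posted the question, I have many lines from which I need to read numerical data, and I want to put it into an array. I can change the last line in the for loop to "np.array(vector_list.append([U,V,W]))", but I get an error message: ValueError: invalid literal for int() with base 10: ' ', on the "header = int(... " line.
@CarlosE.MV, OK, it sounds like the error is further along. Could you please post the first two lines of data in full? It seems to be a simple matter of interpreting the data.
I have edited my original post to add the data you asked for. Too many characters for this comment box. Thanks!
@CarlosE.MV Sorry it took so long. I have edited my answer, as I hadn't accounted for the spaces before the "header" number. Hopefully this removes the error you were getting. I also added another loop to process all the data lines and gave an example of how to read the results. Hopefully this works now!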