【Question Title】: CSV parsing & testing with Python and Pandas
【Posted】: 2019-11-22 22:27:55
【Question】:

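I am trying to test a series of output CSV files in Python. For each CSV, I try to read and parse it so that I can test the following. (I am currently testing with the PyCharm IDE and the Windows CLI.)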

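  1. Assert that the CSV file exists
  2. Assert that the CSV header contains x columns
  3. Assert that certain column headers are present (e.g. TITLE, DOB and MAN_ID)
  4. Then generate an HTML report with the results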
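The four checks above can be sketched without any test framework: the function below runs checks 1-3 and collects the outcomes in a DataFrame, and `DataFrame.to_html` turns that table into a basic HTML report for check 4. The function names, the result columns and the expected values are illustrative assumptions, not part of the original script.

```python
import os

import pandas as pd


def run_checks(csv_path, expected_count, required_columns):
    """Run checks 1-3; return one row per check with columns check/passed/detail."""
    results = []
    exists = os.path.exists(csv_path)
    results.append(("file exists", exists, csv_path))
    if exists:
        # nrows=0 parses only the header line, which is all checks 2 and 3 need
        columns = list(pd.read_csv(csv_path, nrows=0).columns)
        results.append(("column count", len(columns) == expected_count,
                        f"expected {expected_count}, found {len(columns)}"))
        missing = [c for c in required_columns if c not in columns]
        results.append(("required columns", not missing,
                        f"missing: {missing}" if missing else "all present"))
    return pd.DataFrame(results, columns=["check", "passed", "detail"])


def write_html_report(results, report_path):
    """Check 4: save the results table as a simple HTML page."""
    results.to_html(report_path, index=False)
```

For the file in this question, a call might look like `write_html_report(run_checks(r"C:\Work\Tests\test.csv", 19, ["TITLE", "DOB", "MAN_ID"]), "report.html")`. If the file is missing, the later checks are skipped, so the report shows a single failed row.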

Here is the sample test CSV file I am using:


MAN_ID,TITLE,YOB,MOB,DOB,BDT,DT,RC_ID,EC_ID,L_ID,PID,CS_ID,PSV,GSV,GSC_ID,RSV,RSC_ID,ESV,ESC_ID
1,,1946,5,2,00:00:00.019460,,,,,0,,,F,,,,,
9,,1981,2,21,00:00:00.019810,,,,,9,,,M,,,,,
8,,1957,12,12,00:00:00.019571,,,,,8,,,M,,,,,
7,,1990,3,19,00:00:00.019900,,,,,7,,,F,,,,,
6,,1976,8,18,00:00:00.019760,,,,,6,,,F,,,,,
5,,1976,11,10,00:00:00.019761,,,,,5,,,M,,,,,
4,,1981,7,19,00:00:00.019810,,,,,4,,,M,,,,,
3,,1989,1,8,00:00:00.019890,,,,,4,,,M,,,,,
2,,1985,3,28,00:00:00.019850,,,,,4,,,M,,,,,
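To reproduce the problem without a file on disk, the sample above can be passed to `pd.read_csv` through `io.StringIO`, which behaves like an open text file. A minimal sketch; the variable names are arbitrary:

```python
import io

import pandas as pd

# The sample test.csv from the question, held in memory
SAMPLE_CSV = """\
MAN_ID,TITLE,YOB,MOB,DOB,BDT,DT,RC_ID,EC_ID,L_ID,PID,CS_ID,PSV,GSV,GSC_ID,RSV,RSC_ID,ESV,ESC_ID
1,,1946,5,2,00:00:00.019460,,,,,0,,,F,,,,,
9,,1981,2,21,00:00:00.019810,,,,,9,,,M,,,,,
8,,1957,12,12,00:00:00.019571,,,,,8,,,M,,,,,
7,,1990,3,19,00:00:00.019900,,,,,7,,,F,,,,,
6,,1976,8,18,00:00:00.019760,,,,,6,,,F,,,,,
5,,1976,11,10,00:00:00.019761,,,,,5,,,M,,,,,
4,,1981,7,19,00:00:00.019810,,,,,4,,,M,,,,,
3,,1989,1,8,00:00:00.019890,,,,,4,,,M,,,,,
2,,1985,3,28,00:00:00.019850,,,,,4,,,M,,,,,
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (9, 19)
```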


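Here is what I have tried so far:

  1. Asserting whether the file exists (this works)
  2. Converting my CSV into a pandas DataFrame (I can't get this to work correctly yet)
  3. Then using that DataFrame to make assertions on the columns, column names, null values, etc. (this currently fails, because step 2 fails)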

My current Python test script:

import csv
from os import path

import pandas as pd

# Raw string, so the "\t" in "\test.csv" is not read as a TAB character
p = r"C:\Work\Tests\test.csv"

# Check 1: test.csv must exist
assert path.exists(p), "test.csv file does not exist"

with open(p, newline='') as file:
    reader = csv.reader(file)
    header = next(reader)  # the first line (header/titles) is consumed here
    data = [row for row in reader]  # read the remaining data rows


def main():
    # Report whether test.csv is present
    print("File exists: " + str(path.exists(p)))


def printAll():
    print(header)   # print just the header columns
    print(data[1])  # print the second data row
    # print(data)  # print all CSV data
    # print(pd.read_csv(p))  # print CSV data as a table of rows and columns
    print('Success')


if __name__ == "__main__":
    main()
    printAll()

    # Load only the columns of interest into a DataFrame
    df = pd.read_csv(p, usecols=['TITLE', 'DOB', 'MAN_ID'])
    print("Printing DataFrame:")
    print(df)
    print(df.head())
    print(df.shape)         # (rows, columns)
    print(len(df.index))    # number of rows
    print(len(df.columns))  # number of columns

    print(len(pd.read_csv(p)))  # row count of the full file
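Whatever the cause of the DataFrame problem, the Windows path literal deserves attention: in an ordinary Python string, the `\t` in `"C:\Work\Tests\test.csv"` is an escape sequence for a TAB character, so the string no longer names the intended file. The sketch below shows the escape and three spellings that avoid it; it only compares strings, so it runs on any OS.

```python
from pathlib import PureWindowsPath

# In a normal string literal "\t" is a TAB, so a segment like "\test.csv" silently changes
assert "\test.csv" == "\t" + "est.csv"

# Three spellings that keep the backslashes intact
raw = r"C:\Work\Tests\test.csv"        # raw string: backslashes are literal
escaped = "C:\\Work\\Tests\\test.csv"  # each backslash doubled
forward = "C:/Work/Tests/test.csv"     # Python's open() on Windows accepts "/" too

assert raw == escaped
assert "\t" not in raw
# PureWindowsPath normalises "/" to "\", so all three name the same file
assert str(PureWindowsPath(forward)) == raw
```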


Please advise: 1. How do I get a pandas DataFrame from the CSV file and then make the assertions and produce the report I need? 2. Can I also do all of this with pytest?

Thanks in advance for your time.

【Comments】:

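  • If my answer below is satisfactory, could you please accept it as the correct answer? Thanks.
  • Thanks Run-out, yes, I'm about to do that. But do you know whether I can do all of the above tests with pytest?
  • Have you tried the pytest documentation? You need to create a test file containing test functions with assert statements. You can import your main program into the test file and then make assertions to verify the information, as in the answer below. Then run the test file from the terminal, and it will tell you whether the assert statements passed.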

Tags: python-3.x pandas csv


【Solution 1】:

Use pd.read_csv('filename'):

import pandas as pd

df = pd.read_csv('parsing.csv')
print(df)

   MAN_ID  TITLE   YOB  MOB  DOB              BDT  DT  RC_ID  EC_ID  L_ID  \
0       1    NaN  1946    5    2  00:00:00.019460 NaN    NaN    NaN   NaN   
1       9    NaN  1981    2   21  00:00:00.019810 NaN    NaN    NaN   NaN   
2       8    NaN  1957   12   12  00:00:00.019571 NaN    NaN    NaN   NaN   
3       7    NaN  1990    3   19  00:00:00.019900 NaN    NaN    NaN   NaN   
4       6    NaN  1976    8   18  00:00:00.019760 NaN    NaN    NaN   NaN   
5       5    NaN  1976   11   10  00:00:00.019761 NaN    NaN    NaN   NaN   
6       4    NaN  1981    7   19  00:00:00.019810 NaN    NaN    NaN   NaN   
7       3    NaN  1989    1    8  00:00:00.019890 NaN    NaN    NaN   NaN   
8       2    NaN  1985    3   28  00:00:00.019850 NaN    NaN    NaN   NaN   

Remaining columns:

   PID  CS_ID  PSV GSV  GSC_ID  RSV  RSC_ID  ESV  ESC_ID  
0    0    NaN  NaN   F     NaN  NaN     NaN  NaN     NaN  
1    9    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
2    8    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
3    7    NaN  NaN   F     NaN  NaN     NaN  NaN     NaN  
4    6    NaN  NaN   F     NaN  NaN     NaN  NaN     NaN  
5    5    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
6    4    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
7    4    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
8    4    NaN  NaN   M     NaN  NaN     NaN  NaN     NaN  
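Since the question only needs TITLE, DOB and MAN_ID, the columns can be selected either while parsing, with `usecols`, or after parsing, with double-bracket indexing. (Wrapping the result in braces, as in `{pd.read_csv(p)}`, builds a Python set, and a DataFrame cannot be hashed, so that raises `TypeError`.) The two options differ in column order, as this small stand-in CSV shows:

```python
import io

import pandas as pd

# Small stand-in for test.csv, with the columns in the file's order
csv_text = "MAN_ID,TITLE,YOB,DOB\n1,,1946,2\n9,,1981,21\n"
wanted = ["TITLE", "DOB", "MAN_ID"]

# Option 1: parse only the wanted columns; they come back in file order
df_usecols = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Option 2: parse everything, then select; columns follow the order in `wanted`
df_selected = pd.read_csv(io.StringIO(csv_text))[wanted]

print(list(df_usecols.columns))   # ['MAN_ID', 'TITLE', 'DOB']
print(list(df_selected.columns))  # ['TITLE', 'DOB', 'MAN_ID']
```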

Total number of columns:

df.shape[1]

19
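`df.shape` is a `(rows, columns)` tuple, so check 2 from the question becomes a plain assertion. The expected count of 19 is an assumption taken from the sample header; the stand-in below uses that header plus two sample rows:

```python
import io

import pandas as pd

# Stand-in for test.csv: the sample header plus two of its rows
csv_text = (
    "MAN_ID,TITLE,YOB,MOB,DOB,BDT,DT,RC_ID,EC_ID,L_ID,PID,CS_ID,PSV,GSV,GSC_ID,RSV,RSC_ID,ESV,ESC_ID\n"
    "1,,1946,5,2,00:00:00.019460,,,,,0,,,F,,,,,\n"
    "9,,1981,2,21,00:00:00.019810,,,,,9,,,M,,,,,\n"
)
df = pd.read_csv(io.StringIO(csv_text))

EXPECTED_COLUMNS = 19  # assumption: taken from the sample header

n_rows, n_cols = df.shape
assert n_cols == EXPECTED_COLUMNS, f"expected {EXPECTED_COLUMNS} columns, found {n_cols}"
assert n_rows > 0, "CSV has a header but no data rows"
```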

Column names:

df.columns

Index(['MAN_ID', 'TITLE', 'YOB', 'MOB', 'DOB', 'BDT', 'DT', 'RC_ID', 'EC_ID',
       'L_ID', 'PID', 'CS_ID', 'PSV', 'GSV', 'GSC_ID', 'RSV', 'RSC_ID', 'ESV',
       'ESC_ID'],
      dtype='object')
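`df.columns` is a pandas `Index`, and comparing it to a list with `==` works element by element (and raises `ValueError` when the lengths differ). Converting it with `list()` first gives a single True/False. The helper below is a hypothetical addition that also reports what is missing or unexpected, which makes a failed assertion easier to read; the short header is a stand-in where the names match but the order does not:

```python
import io

import pandas as pd


def compare_header(df, expected):
    """Describe how df's header differs from the expected list of names."""
    actual = list(df.columns)
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
        "same_order": actual == expected,
    }


# Stand-in: same names as expected, but MOB and DOB are swapped
df = pd.read_csv(io.StringIO("MAN_ID,TITLE,YOB,DOB,MOB\n1,,1946,2,5\n"))
EXPECTED_HEADER = ["MAN_ID", "TITLE", "YOB", "MOB", "DOB"]

result = compare_header(df, EXPECTED_HEADER)
print(result)  # {'missing': [], 'unexpected': [], 'same_order': False}
# In a real test: assert result["same_order"], result
```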

Check whether a specific column name is present:

'YOB' in df.columns

True
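`in` checks a single label. For check 3, where several headers must all be present, collecting the missing ones gives a more useful failure message than a chain of `in` tests. `missing_columns` is an illustrative helper name:

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["TITLE", "DOB", "MAN_ID"]  # from the question


def missing_columns(df, required):
    """Return the required column names absent from df, in the given order."""
    return [c for c in required if c not in df.columns]


df = pd.read_csv(io.StringIO("MAN_ID,TITLE,YOB,DOB\n1,,1946,2\n"))
missing = missing_columns(df, REQUIRED_COLUMNS)
assert not missing, f"missing required columns: {missing}"

# Equivalent yes/no form when the missing names are not needed
assert set(REQUIRED_COLUMNS).issubset(df.columns)
```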

Check for null values:

df.isna()

  MAN_ID  TITLE    YOB    MOB    DOB    BDT    DT  RC_ID  EC_ID  L_ID    PID  \
0   False   True  False  False  False  False  True   True   True  True  False   
1   False   True  False  False  False  False  True   True   True  True  False   
2   False   True  False  False  False  False  True   True   True  True  False   
3   False   True  False  False  False  False  True   True   True  True  False   
4   False   True  False  False  False  False  True   True   True  True  False   
5   False   True  False  False  False  False  True   True   True  True  False   
6   False   True  False  False  False  False  True   True   True  True  False   
7   False   True  False  False  False  False  True   True   True  True  False   
8   False   True  False  False  False  False  True   True   True  True  False   

Remaining columns:

  CS_ID   PSV    GSV  GSC_ID   RSV  RSC_ID   ESV  ESC_ID  
0   True  True  False    True  True    True  True    True  
1   True  True  False    True  True    True  True    True  
2   True  True  False    True  True    True  True    True  
3   True  True  False    True  True    True  True    True  
4   True  True  False    True  True    True  True    True  
5   True  True  False    True  True    True  True    True  
6   True  True  False    True  True    True  True    True  
7   True  True  False    True  True    True  True    True  
8   True  True  False    True  True    True  True    True  
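A full True/False grid is hard to assert on directly. Reducing it per column is usually more useful: `isna().sum()` counts nulls, `isna().all()` flags columns that are entirely empty (like TITLE in the sample), and `isna().any()` on selected columns supports an assertion. Which columns must be non-null is an assumption here:

```python
import io

import pandas as pd

# Stand-in: TITLE is blank everywhere, GSV is blank in one row
df = pd.read_csv(io.StringIO("MAN_ID,TITLE,DOB,GSV\n1,,2,F\n9,,21,M\n8,,12,\n"))

null_counts = df.isna().sum()  # number of NaN values per column
empty_flags = df.isna().all()  # True for columns with no values at all
all_null = empty_flags[empty_flags].index.tolist()

REQUIRED_NON_NULL = ["MAN_ID", "DOB"]  # assumption: columns that must always be filled
offenders = [c for c in REQUIRED_NON_NULL if df[c].isna().any()]
assert not offenders, f"null values found in {offenders}"

print(null_counts)
print(all_null)  # ['TITLE']
```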

【Comments】:
