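【Question title】:Turn nested list of str into nested list of respective datatypes
【Posted】:2021-07-07 08:25:46
【Question description】:

I basically have a nested list of strings, and my goal is to turn it into a list of their respective datatypes. This works for strings, dates and floats, but my problem is that the code also identifies integers as floats. I tried to solve this with a nested try-except, but it does not work. Maybe someone has a solution?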

from datetime import datetime


l = [['Middle Management', '5', '5584.10', '2019-02-03', '12', '100'],
 ['Lower Management', '2', '3925.52', '2016-04-18', '12', '100'],
 ['Upper Management', '1', '7174.46', '2019-01-02', '10', '200'],
 ['Middle Management', '5', '2921.92', '2018-02-02', '14', '300'],
 ['Middle Management', '7', '2921.92', '2017-09-09', '17', '400'],
 ['Upper Management', '10', '2921.92', '2020-01-01', '11', '500'],
 ['Lower Management', '2', '2921.92', '2019-08-17', '11', '500'],
 ['Middle Management', '5', '2921.92', '2017-11-21', '15', '600'],
 ['Upper Management', '7', '2921.92', '2018-08-18', '18', '700']]

columns = len(l[0]) #the number of columns is given by the number of objects in the header list, at least in a clean CSV
without_header = l[1:]

types_list = []
looping_list = []

for x in range(0, columns):
        looping_list = [item[x] for item in without_header]
        worklist = []
        for b in looping_list:
            try:
                float(b)
                try:
                    b.is_integer() #this is where it fails!
                    worklist.append(int)
                except:
                    worklist.append(float)
            except:
                try:
                    b=datetime.strptime(b, "%Y-%m-%d")
                    worklist.append(type(b))
                except:
                    worklist.append(type(b))
        types_list.append(worklist)

types_list

My current output is:

[[str, str, str, str, str, str, str, str, str],
 [float, float, float, float, float, float, float, float, float],
 [float, float, float, float, float, float, float, float, float],
 [datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime],
 [float, float, float, float, float, float, float, float, float],
 [float, float, float, float, float, float, float, float, float]]

But what I want is:

[[str, str, str, str, str, str, str, str, str],
 [int, int, int, int, int, int, int, int, int],
 [float, float, float, float, float, float, float, float, float],
 [datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime,
  datetime.datetime],
 [int, int, int, int, int, int, int, int, int],
 [int, int, int, int, int, int, int, int, int]]

【Comments on the question】:

    Tags: python list csv types


    【Solution 1】:

    You can use a function that is applied to each value and returns the detected type:

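    import re, datetime
    def get_type(val):
       # (pattern, type) pairs tried in order; raw strings keep the regex backslashes intact,
       # and the ^...$ anchors make each pattern match the whole value
       p = [[r'^\d{4}-\d{2}-\d{2}(?:\s\d{2}:\d{2}:\d{2})?$', datetime.datetime], [r'^\d+$', int], [r'^\d+\.\d+$', float]]
       return next((b for a, b in p if re.search(a, val)), str)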
    
    l = [['Middle Management', '5', '5584.10', '2019-02-03', '12', '100'], ['Lower Management', '2', '3925.52', '2016-04-18', '12', '100'], ['Upper Management', '1', '7174.46', '2019-01-02', '10', '200'], ['Middle Management', '5', '2921.92', '2018-02-02', '14', '300'], ['Middle Management', '7', '2921.92', '2017-09-09', '17', '400'], ['Upper Management', '10', '2921.92', '2020-01-01', '11', '500'], ['Lower Management', '2', '2921.92', '2019-08-17', '11', '500'], ['Middle Management', '5', '2921.92', '2017-11-21', '15', '600'], ['Upper Management', '7', '2921.92', '2018-08-18', '18', '700']]
    result = [[get_type(b) for b in i] for i in zip(*l)]
    

    Output:

    [[<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>], 
     [<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>], 
     [<class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'float'>], 
     [<class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>], 
     [<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>], 
     [<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>]]
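The regex approach depends on each pattern covering the whole value rather than a fragment of it. As a small illustration (the `DATE` name and the sample strings are made up for this sketch), `re.search` accepts a match anywhere in the string, while `re.fullmatch` requires the entire string to match:

```python
import re

# Hypothetical date pattern without anchors, used only for this comparison.
DATE = r"\d{4}-\d{2}-\d{2}"

# re.search finds the pattern anywhere inside the string.
print(bool(re.search(DATE, "ref 2019-02-03 draft")))     # True

# re.fullmatch only succeeds if the whole string matches the pattern.
print(bool(re.fullmatch(DATE, "ref 2019-02-03 draft")))  # False
print(bool(re.fullmatch(DATE, "2019-02-03")))            # True
```

Note that a regex only checks the shape of the value, so a string such as '2019-13-45' would still look like a date here.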
    

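    【Comments】:

      【Solution 2】:

      Changing b.is_integer() to int(b) fixes the problem. b.is_integer() only returns a Boolean (True or False) and never raises for a non-integer value, so it cannot drive a try/except. In addition, b is still a str at that point, and str has no is_integer method, so the call raises AttributeError and the bare except appends float every time.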
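A short sketch of why the original check always ended up in the except branch, and of the int(b) replacement. The helper name `int_or_float` is made up for this example and assumes its argument is already known to be numeric:

```python
b = "5"

# str has no is_integer method, so calling b.is_integer() raises
# AttributeError, which the bare except in the question silently caught.
print(hasattr(b, "is_integer"))          # False

# On a real float, is_integer() returns a bool and does not raise.
print(float("5584.10").is_integer())     # False


def int_or_float(s):
    """Return int if int(s) succeeds, otherwise float (s is assumed numeric)."""
    try:
        int(s)          # raises ValueError for strings such as '5584.10'
        return int
    except ValueError:
        return float


print(int_or_float("5"))         # <class 'int'>
print(int_or_float("5584.10"))   # <class 'float'>
```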

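      【Comments】:

        【Solution 3】:

        You don't need a nested try/except. Just convert the value to float first and check whether it is a whole number, as below. This is a more natural way to implement it.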

        b = float(b)
        if b.is_integer():
          worklist.append(int)
        else:
          worklist.append(float)
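The same check can be wrapped in a small function to see how it behaves on a few inputs. This is only a sketch: the name `numeric_type` is made up, and it assumes the string is numeric (float() raises ValueError otherwise). One consequence worth knowing is that a string like '5.0' is reported as int, which may or may not be what you want:

```python
def numeric_type(s):
    """Classify a numeric string as int or float via float.is_integer()."""
    value = float(s)  # raises ValueError for non-numeric strings
    return int if value.is_integer() else float


print(numeric_type("12"))       # <class 'int'>
print(numeric_type("5584.10"))  # <class 'float'>
print(numeric_type("5.0"))      # <class 'int'>
```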
        

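        【Comments】:

          【Solution 4】:

          A commonly used solution is to check whether the value can be converted to int.

          When the string contains a decimal point, trying to convert it to int raises an error, and the value is then recorded as a float.

          Another point to note is that you should catch the exception type you expect rather than catching all exceptions. For example, when float(b) fails it raises ValueError, so it is better to catch only that. Anything unexpected then propagates to a higher level.

          It is also worth noting that if a column holds dates in a format other than the one you specified, it will be treated as a string. It may be worth adding a comment about this to save debugging time when such a case comes up in the future.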

          from datetime import datetime
          
          
          l = [['Middle Management', '5', '5584.10', '2019-02-03', '12', '100'],
           ['Lower Management', '2', '3925.52', '2016-04-18', '12', '100'],
           ['Upper Management', '1', '7174.46', '2019-01-02', '10', '200'],
           ['Middle Management', '5', '2921.92', '2018-02-02', '14', '300'],
           ['Middle Management', '7', '2921.92', '2017-09-09', '17', '400'],
           ['Upper Management', '10', '2921.92', '2020-01-01', '11', '500'],
           ['Lower Management', '2', '2921.92', '2019-08-17', '11', '500'],
           ['Middle Management', '5', '2921.92', '2017-11-21', '15', '600'],
           ['Upper Management', '7', '2921.92', '2018-08-18', '18', '700']]
          
          columns = len(l[0]) #the number of columns is given by the number of objects in the header list, at least in a clean CSV
          without_header = l[1:]
          
          types_list = []
          looping_list = []
          
          for x in range(0, columns):
                  looping_list = [item[x] for item in without_header]
                  worklist = []
                  for b in looping_list:
                      try:
                          float(b)
                          try:
                              int(b)  # raises ValueError for a decimal string such as '5584.10'
                              worklist.append(int)
                          except ValueError:
                              worklist.append(float)
                      except ValueError:
                          try:
                              b=datetime.strptime(b, "%Y-%m-%d")
                              worklist.append(type(b))
                          except ValueError:
                              worklist.append(type(b))
                  types_list.append(worklist)
          
          types_list
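The nested try/except above can also be flattened into one helper that tries each conversion in turn. This is a minimal sketch, not the answer's code: `infer_type`, `date_format` and the two-row `rows` sample are names chosen for the illustration, and it relies on int(), float() and datetime.strptime() each raising ValueError on a string they cannot parse. Be aware that int() and float() are lenient (for example, surrounding whitespace and 'nan' are accepted).

```python
from datetime import datetime


def infer_type(value, date_format="%Y-%m-%d"):
    """Return int, float, datetime or str for a single string value."""
    # Try int before float: every integer string is also a valid float.
    for caster in (int, float):
        try:
            caster(value)
            return caster
        except ValueError:
            pass
    try:
        datetime.strptime(value, date_format)
        return datetime
    except ValueError:
        # Dates in any other format end up here and are reported as str.
        return str


rows = [
    ["Middle Management", "5", "5584.10", "2019-02-03", "12", "100"],
    ["Lower Management", "2", "3925.52", "2016-04-18", "12", "100"],
]

# zip(*rows) transposes the rows so each inner list describes one column.
types_list = [[infer_type(v) for v in column] for column in zip(*rows)]
print(types_list)
```

Catching only ValueError keeps the behaviour described above: anything unexpected, such as passing a non-string value, still surfaces as an error instead of being silently classified.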
          

          【Comments】:
