【Question title】: Read ".db" dictionary type file into pandas DataFrame
【Posted】: 2018-01-05 10:36:47
【Question】:

How can I import a file containing the following data into a pandas DataFrame? It is saved as "data.db", a format I am not familiar with.

{"hostname":"136.243.73.66","ip":"136.243.73.66","port":16600,"TCPPort":15600,"UDPPort":14600,"seen":1,"connected":0,"tried":0,"weight":1,"dateTried":null,"dateLastConnected":null,"dateCreated":{"$$date":1514997141045},"isTrusted":true,"key":"5d301e5f46bb6e0db9d379c19c451cc6905c09885e5529c7a3c5d2750674db5ab8c3e714edd4fd53fee86499db4f93f8","remoteKey":null,"lastConnections":[],"_id":"FTpfNM4c4OuXAS6d"}
    {"hostname":"45.77.187.45","ip":"45.77.187.45","port":16600,"TCPPort":15600,"UDPPort":14600,"seen":1,"connected":0,"tried":0,"weight":1,"dateTried":null,"dateLastConnected":null,"dateCreated":{"$$date":1514997141046},"isTrusted":true,"key":"c752b75fbc74eaba10745937d50abd2decf71c509aff49db6662a180ba76fa3f74e5118ad905adb3b6873c250270f85f","remoteKey":null,"lastConnections":[],"_id":"f6Gn2xXyoeMrSvi8"}
    {"hostname":"mainnet.deviota.com","ip":null,"port":16600,"TCPPort":15600,"UDPPort":14600,"seen":1,"connected":0,"tried":0,"weight":1,"dateTried":null,"dateLastConnected":null,"dateCreated":{"$$date":1514997141048},"isTrusted":true,"key":"a923372977f65fe08f472916f671a1749963cea36701682761307af8537c52d4e2414f4e5b471898ef84a0957b5deec3","remoteKey":null,"lastConnections":[],"_id":"oVKsMubQ5rtAhfpq"}
    {"hostname":"mainnet2.deviota.com","ip":null,"port":16600,"TCPPort":15600,"UDPPort":14600,"seen":1,"connected":0,"tried":0,"weight":1,"dateTried":null,"dateLastConnected":null,"dateCreated":{"$$date":1514997141049},"isTrusted":true,"key":"9aae219149f088c9295de31125fb1f39060dd4fe1540c048f2bd097375298703c43d49dc48bb609f708f8b6e2578f7f2","remoteKey":null,"lastConnections":[],"_id":"rkQpS6BimYvfDIZU"}

This is as close as I have gotten, but the field labels still end up inside the table cells.

import pandas as pd

file_path = "path_to_file/data.db"

def read_data():
    with open(file_path) as f:
        # Splitting on commas keeps each "label":value pair as one raw string
        return [x.split(',') for x in f.readlines()]

a = read_data()

pd.DataFrame(a)

【Comments】:

    Tags: python pandas dataframe import


    【Solution 1】:

    You can use read_json with the parameter lines=True, because the file holds one JSON document per line.

    df = pd.read_json('sample.db', lines=True)
    print(df)
    
       TCPPort  UDPPort               _id  connected                dateCreated  \
    0    15600    14600  FTpfNM4c4OuXAS6d          0  {'$$date': 1514997141045}   
    1    15600    14600  f6Gn2xXyoeMrSvi8          0  {'$$date': 1514997141046}   
    2    15600    14600  oVKsMubQ5rtAhfpq          0  {'$$date': 1514997141048}   
    3    15600    14600  rkQpS6BimYvfDIZU          0  {'$$date': 1514997141049}   
    
       dateLastConnected  dateTried              hostname             ip  \
    0                NaN        NaN         136.243.73.66  136.243.73.66   
    1                NaN        NaN          45.77.187.45   45.77.187.45   
    2                NaN        NaN   mainnet.deviota.com           None   
    3                NaN        NaN  mainnet2.deviota.com           None   
    
       isTrusted                                                key  \
    0       True  5d301e5f46bb6e0db9d379c19c451cc6905c09885e5529...   
    1       True  c752b75fbc74eaba10745937d50abd2decf71c509aff49...   
    2       True  a923372977f65fe08f472916f671a1749963cea3670168...   
    3       True  9aae219149f088c9295de31125fb1f39060dd4fe1540c0...   
    
      lastConnections   port  remoteKey  seen  tried  weight  
    0              []  16600        NaN     1      0       1  
    1              []  16600        NaN     1      0       1  
    2              []  16600        NaN     1      0       1  
    3              []  16600        NaN     1      0       1  
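To try the lines=True behaviour without the original file, here is a minimal sketch that feeds two shortened records through an in-memory buffer. The `sample` string and the reduced set of fields are assumptions made for illustration; with the real file you would pass its path instead of the `io.StringIO` object.

```python
import io
import pandas as pd

# Two shortened records in the same one-object-per-line layout as data.db.
sample = (
    '{"hostname":"136.243.73.66","ip":"136.243.73.66","port":16600,"_id":"FTpfNM4c4OuXAS6d"}\n'
    '{"hostname":"mainnet.deviota.com","ip":null,"port":16600,"_id":"oVKsMubQ5rtAhfpq"}\n'
)

# lines=True tells read_json to parse every line as its own JSON document.
df = pd.read_json(io.StringIO(sample), lines=True)

print(df.shape)
print(sorted(df.columns))
```

Each JSON key becomes a column and each line becomes a row, which is why the labels no longer appear inside the cells.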
    

    If you want to extract the value from the dictionaries in the dateCreated column, add apply:

    df = pd.read_json('sample.db', lines=True)
    # Replace each {'$$date': ...} dict with the integer it wraps
    df['dateCreated'] = df['dateCreated'].apply(lambda x: x.get('$$date'))
    print(df)
       TCPPort  UDPPort               _id  connected    dateCreated  \
    0    15600    14600  FTpfNM4c4OuXAS6d          0  1514997141045   
    1    15600    14600  f6Gn2xXyoeMrSvi8          0  1514997141046   
    2    15600    14600  oVKsMubQ5rtAhfpq          0  1514997141048   
    3    15600    14600  rkQpS6BimYvfDIZU          0  1514997141049   
    
       dateLastConnected  dateTried              hostname             ip  \
    0                NaN        NaN         136.243.73.66  136.243.73.66   
    1                NaN        NaN          45.77.187.45   45.77.187.45   
    2                NaN        NaN   mainnet.deviota.com           None   
    3                NaN        NaN  mainnet2.deviota.com           None   
    
       isTrusted                                                key  \
    0       True  5d301e5f46bb6e0db9d379c19c451cc6905c09885e5529...   
    1       True  c752b75fbc74eaba10745937d50abd2decf71c509aff49...   
    2       True  a923372977f65fe08f472916f671a1749963cea3670168...   
    3       True  9aae219149f088c9295de31125fb1f39060dd4fe1540c0...   
    
      lastConnections   port  remoteKey  seen  tried  weight  
    0              []  16600        NaN     1      0       1  
    1              []  16600        NaN     1      0       1  
    2              []  16600        NaN     1      0       1  
    3              []  16600        NaN     1      0       1  
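The extracted integers look like Unix timestamps in milliseconds. That is an assumption, since the question does not say how the file was written. If it holds, a possible next step is to convert them with pd.to_datetime. The small DataFrame below is a hand-built stand-in for the result of read_json.

```python
import pandas as pd

# Hand-built stand-in with the same nested shape as the dateCreated column.
df = pd.DataFrame({
    "_id": ["FTpfNM4c4OuXAS6d", "f6Gn2xXyoeMrSvi8"],
    "dateCreated": [{"$$date": 1514997141045}, {"$$date": 1514997141046}],
})

# Pull the integer out of each dict.
millis = df["dateCreated"].apply(lambda d: d.get("$$date"))

# Assumption: the integers are milliseconds since the Unix epoch.
df["dateCreated"] = pd.to_datetime(millis, unit="ms")

print(df)
```

Under that assumption the first value corresponds to 2018-01-03 16:32:21.045 UTC, which is consistent with the date of the question.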
    

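    【Comments】:

      【Solution 2】:

      .db is not a specific file type, although the extension is often used for SQLite files. This file, however, appears to be just a series of JSON documents, one per line.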

      import json

      def read_data():
          with open(file_path) as f:
              # Parse each line as its own JSON document
              return [json.loads(x) for x in f]
      
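The parsed list of dicts can then be handed to pandas. The sketch below assumes a pandas version that provides pd.json_normalize (1.0 or later) and uses an in-memory buffer with two shortened records in place of the real file. The names `records` and `flat` are illustrative only.

```python
import io
import json
import pandas as pd

# Stand-in for open(file_path): two shortened records, one JSON object per line.
f = io.StringIO(
    '{"hostname":"136.243.73.66","dateCreated":{"$$date":1514997141045},"_id":"FTpfNM4c4OuXAS6d"}\n'
    '{"hostname":"45.77.187.45","dateCreated":{"$$date":1514997141046},"_id":"f6Gn2xXyoeMrSvi8"}\n'
)

# Parse each non-blank line into a dict; a blank line would make json.loads raise.
records = [json.loads(line) for line in f if line.strip()]

# json_normalize flattens nested dicts into dotted column names,
# so dateCreated becomes a column named "dateCreated.$$date".
flat = pd.json_normalize(records)

print(flat.columns.tolist())
```

Compared with read_json, this route gives you a chance to inspect or filter the raw records before building the DataFrame.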

      【Comments】:
