Add columns to a .DBF file and load data from a tab-delimited file

【Question title】: Add columns to a .DBF file and load data from a tab delimited file
【Posted】: 2021-12-10 18:29:01
【Question description】:

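I have a few thousand .DBF files that need to be updated. Each file currently has a header with one or two columns, (name, geo path) or just (name), and a single row of data.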

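The update consists of:

Deleting the geo path column if it exists
Adding the additional columns
Populating the new columns (only) in the data row with data from a tab-delimited file.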

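I am unable to add the columns and populate them. (Error below.)

At a high level, my intent is:

Iterate over the records in the source file
For each record:
Open the corresponding .dbf file (in the current directory) using the "file name" in the record
Add the additional column headers
Populate the new columns with the data from the current record in the source file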

Source file - tab delimited, columns: GUID C(24), VenueName C(100), Network C(50), Address1 C(75), City C(50), SVID float, FileName (e.g. test1.dbf)
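A minimal sketch of reading a source file like this with the standard-library csv module. It assumes the file starts with a header row using the names GUID, VenueName, NetworkName, Address1, City, SalesVenueID and FileName; the sample data and the helper name read_source_rows are made up for illustration and are not part of the original question.

```python
import csv
import io


def read_source_rows(stream):
    """Yield one dict per data row of a tab-delimited stream with a header row."""
    # DictReader consumes the header line itself, so no manual header check is needed.
    reader = csv.DictReader(stream, delimiter="\t")
    for row in reader:
        yield row


# Hypothetical sample standing in for the real source file.
sample = io.StringIO(
    "GUID\tVenueName\tNetworkName\tAddress1\tCity\tSalesVenueID\tFileName\n"
    "abc123\tMain Hall\tNetA\t1 High St\tSpringfield\t42\ttest1.dbf\n"
)

rows = list(read_source_rows(sample))
print(rows[0]["FileName"], rows[0]["VenueName"])  # test1.dbf Main Hall
```

Reading by column name rather than by position means the code does not silently shift if a column is added to or removed from the source file.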

This works for step 1:

import dbf    
    
file = open('testing.tsv')
for line in file:   #'GUID\tVenueName\tNetworkName\tAddress1\tCity\tSalesVenueID\n'
    fields = line.strip().split('\t')
    res=[fields[0], fields[1], fields[2], fields[3], fields[4],fields[5], fields[6]]
    print("the elements of the list are : " + str(res))
    #fields[0]=Guid, fields[1]=VenueName, fields[2]=NetworkName, fields[3] = Address1, fields[4]=City, fields[5]=SalesVenueID, fields[6]=FilePath
    filename=fields[6]
    print("filename is: " + str(filename))
    if not str(filename)=='FileName':
        with dbf.Table(str(filename)) as db:
                     
            try: 
                db.delete_fields('geojson')
            except Exception:
                pass
            db.pack()

For steps 2 and 3, this slightly different snippet writes the columns, but fails at the table.append step with the error 'data to append must be a tuple, dict, record, or template; not a '.

import dbf    
    
file = open('testing.tsv')
for line in file:   #'GUID\tVenueName\tNetworkName\tAddress1\tCity\tSalesVenueID\n'
    fields = line.strip().split('\t')
    res=[fields[0], fields[1], fields[2], fields[3], fields[4],fields[5], fields[6]]
    print("the elements of the list are : " + str(res))
    #fields[0]=Guid, fields[1]=VenueName, fields[2]=NetworkName, fields[3] = Address1, fields[4]=City, fields[5]=SalesVenueID, fields[6]=FilePath
    currentfilename=str(fields[6])
    print("filename is: " + currentfilename)
    if not currentfilename=='FileName':   #Ignore header row

        # create an in-memory table
        table = dbf.Table(
            filename=currentfilename,
            field_specs='GUID C(24); VenueName C(100); Network C(50) ; Address1 C(75); City C(50); SVID C(20)',
            on_disk=True,
            )
        table.open(dbf.READ_WRITE)

        # add some records to it
        for datum in (
            (tuple(res))
            ):
            table.append(datum)

        # iterate over the table, and print the records
        for record in table:
            print(record)
            print('--------')

        table.pack()

See the screenshot for the contents of the variable 'res'.

Edited to include the final (working) code:

import dbf

file = open('SourceFile.tsv')
for line in file:
    #'GUID\tVenueName\tNetworkName\tAddress1\tCity\tSalesVenueID\n'
    fields = line.strip().split('\t')
    res=[fields[0], fields[1], fields[2], fields[3], fields[4]]
    print("the elements of the list are : " + str(res))
    #fields[0]=VenueName, fields[1]=NetworkName, fields[2] = Address1, fields[3]=City, fields[4]=SalesVenueID, fields[5]=FilePath
    currentfilename=str(fields[5])
    print("filename is: " + currentfilename)
    if not currentfilename=='FileName':   #Ignore header row

        # create or open existing file
        table = dbf.Table(
            filename=currentfilename,
            field_specs='VenueName C(100); Network C(50) ; Address1 C(75); City C(50); SVID C(20)',
            on_disk=True,
            )
        table.open(dbf.READ_WRITE)

        # Update record
        VenueName=fields[1]

        try:
            table.append(tuple(res))

        except Exception as e:
            print('--------')
            print('--------')
            print('Error on VenueName: ' + VenueName )
            print(e)
            pass

        # iterate over the table, and print the records
        for record in table:
            print(record)
            print('--------')
        table.pack()

【Question comments】:

Tags: python shapes dbf

【Solution 1】:

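When you iterate over tuple(res) to get datum, it looks like you are pulling out individual field values, and at least one of them is a string. Judging by the error message, I'd bet the Table.append() method expects a whole row at a time, so the res variable itself would be the argument to pass, since it represents the entire row.

Also, the values of fields and res look identical (unless you are intentionally storing only the first 7 values of fields). If that is the case, you can just refer to fields rather than having two variables play the same role. Of course, if doing it the way you do now makes the code easier for you to read, that is fine too.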
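The difference between looping over the row and passing the row can be seen without the dbf library at all. The sketch below uses made-up sample values for res; it only shows what each datum is inside the loop, and makes no claim about how Table.append() is implemented.

```python
# Hypothetical row, shaped like the res list in the question.
res = ["abc123", "Main Hall", "NetA", "1 High St", "Springfield", "42"]

# Looping over tuple(res) visits the field values one by one,
# so each datum is a single str, not a row.
datum_types = [type(datum).__name__ for datum in (tuple(res))]
print(datum_types)  # ['str', 'str', 'str', 'str', 'str', 'str']

# The extra parentheses do not add a level of nesting: (x) is just x.
assert (tuple(res)) == tuple(res)

# The whole row is the tuple itself; this is the kind of value a
# row-level append call would be given.
row = tuple(res)
print(type(row).__name__, len(row))  # tuple 6
```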

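【Comments】:

I added the output of the res variable to the question above; it still gives the error.
Is it the same error? table.append(res) should at least provide the tuple being asked for.

【Solution 2】:

Untested: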

import dbf

NEW_FIELDS = "GUID C(24);VenueName C(100);Network C(50);Address1 C(75);City C(50);SVID C(20)"

    
file = open('testing.tsv')
for line in file:   #'GUID\tVenueName\tNetworkName\tAddress1\tCity\tSalesVenueID\n'
    fields = line.strip().split('\t')
    guid, venue_name, network, address1, city, svid, filename = fields
    if filename != 'FileName':
        with dbf.Table(filename) as db:
            # remove geojson if it exists
            if 'geojson' in dbf.fields(db):
                db.delete_fields('geojson')
            # add the new fields
            db.add_fields(NEW_FIELDS)
            with db[0] as rec:         # the first and only record
                rec.guid = guid
                rec.venuename = venue_name
                rec.network = network
                rec.address1 = address1
                rec.city = city
                rec.svid = svid

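A few comments on the code:

The file is opened in text mode, so all lines are already str (no need to keep calling str() on the data)
The pack() function removes deleted rows; it has no effect on deleted columns
!= is preferable to not something == something_else
Assigning multiple variables at once is common; just make sure you have enough values (in other words, if the header line ends with "...\tCity\n", the multiple assignment will fail).
Changing the list to a tuple did not change your algorithm: you were still trying to append one data field at a time; the correct line is: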

table.append(tuple(res)) # add the whole row at once
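The note on multiple assignment can be made concrete with a small sketch. It assumes seven tab-separated fields per line, as in the code above; the helper name split_row, the constant EXPECTED_FIELDS and the sample lines are hypothetical and only there for illustration.

```python
EXPECTED_FIELDS = 7  # GUID, VenueName, Network, Address1, City, SVID, FileName


def split_row(line):
    """Split a tab-delimited line, returning None when the field count is off."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        return None
    return tuple(fields)


# Made-up lines: one complete row and one header that stops at City.
good = "abc123\tMain Hall\tNetA\t1 High St\tSpringfield\t42\ttest1.dbf\n"
short = "GUID\tVenueName\tNetworkName\tAddress1\tCity\n"

row = split_row(good)
guid, venue_name, network, address1, city, svid, filename = row

# A direct seven-way unpack of the short line raises ValueError.
try:
    a, b, c, d, e, f, g = short.rstrip("\n").split("\t")
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(filename, split_row(short), unpack_failed)  # test1.dbf None True
```

Using rstrip("\n") instead of strip() is a deliberate choice in this sketch: strip() would also remove trailing tabs, so a row whose last fields are empty would come out with too few values.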

If there are any errors in the code above, please leave a comment, and good luck!

【Comments】: