Iterating file paths in a Python dataframe: answers

[Question title]: Iterating file paths in python dataframe
[Posted]: 2021-09-18 06:11:13
[Question]:

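I have a dataframe called filedataframe that contains all the file paths. My code works for extracting what I want from a single XML file, but it is currently set up for one file only. How can I iterate over the dataframe filedataframe so that the code uses each of its file paths? I want to collect rootId, file_Name, unique_ID and employee_badge together with the corresponding file path.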

import re
import pathlib
import os
import pandas as pd
import xml.etree.ElementTree as ET

filesdataframe = []
# example path would be Defined Contributions,

xmlfile = r'INVESTING.cdm'
# We are parsing it.
tree = ET.parse(xmlfile)
# We then get the root.
root = tree.getroot()

for elm in root.findall('.//{object}IntraModelReport'):
    print(elm.text)

for Model in root.findall('.//{object}IntraModelReport'):
    # attributes of the current element, not the leftover `elm` from the loop above
    rootId = Model.attrib
    file_Name = Model.find("{attribute}Code").text
    unique_ID = Model.find("{attribute}ObjectID").text
    employee_badge = Model.find("{attribute}Creator").text
    print(rootId, file_Name, unique_ID, employee_badge)
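The `{object}` and `{attribute}` prefixes in these tag names are ElementTree's `{namespace-URI}tag` notation. A minimal sketch of how the lookups resolve, assuming the .cdm file declares namespaces like `xmlns:o="object"` and `xmlns:a="attribute"`; the sample XML below is made up for illustration and is not taken from a real INVESTING.cdm:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample. The namespace URIs are literally
# "object" and "attribute", so ElementTree expands o:IntraModelReport to
# "{object}IntraModelReport" and a:Code to "{attribute}Code".
SAMPLE = """<Model xmlns:o="object" xmlns:a="attribute">
  <o:IntraModelReport Id="o5">
    <a:ObjectID>ABC-123</a:ObjectID>
    <a:Code>REPORT_1</a:Code>
    <a:Creator>E1001</a:Creator>
  </o:IntraModelReport>
</Model>"""

root = ET.fromstring(SAMPLE)
for report in root.findall('.//{object}IntraModelReport'):
    # report.attrib holds the element's own (un-namespaced) attributes
    print(report.attrib,
          report.find('{attribute}Code').text,
          report.find('{attribute}ObjectID').text,
          report.find('{attribute}Creator').text)
```

With this sample it prints `{'Id': 'o5'} REPORT_1 ABC-123 E1001`.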

[Question comments]:

Tags: python pandas dataframe loops iteration

[Answer 1]:

Try this.

import re
import pathlib
import os
import pandas as pd
import xml.etree.ElementTree as ET
from typing import Dict, List

def process_single_xmlfile(xmlfile: str, verbose: bool = False) -> Dict:
    """Parse one XML file and collect the fields of every IntraModelReport."""
    tree = ET.parse(xmlfile)
    root = tree.getroot()

    if verbose:
        for elm in root.findall('.//{object}IntraModelReport'):
            print(elm.text)

    package: Dict = {'xmlfile': xmlfile, 'models': []}
    for Model in root.findall('.//{object}IntraModelReport'):
        # attributes of the current element, not a leftover loop variable
        rootId = Model.attrib
        file_Name = Model.find("{attribute}Code").text
        unique_ID = Model.find("{attribute}ObjectID").text
        employee_badge = Model.find("{attribute}Creator").text
        if verbose:
            print(rootId, file_Name, unique_ID, employee_badge)
        package['models'].append(dict(
            rootId=rootId,
            file_Name=file_Name,
            unique_ID=unique_ID,
            employee_badge=employee_badge,
        ))
    return package

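#### LOOP OVER

# all the results will be stored in this list
extracts: List[Dict] = []
# xmlfiles is a list of XML file paths: you need to provide this.
# If your paths are in a DataFrame, iterate over the column that holds them
# (e.g. filedataframe['path'], where "path" is a hypothetical column name),
# not over the DataFrame itself, which would yield its column labels.
for xmlfile in xmlfiles:
    # set verbose=True to enable printing
    extracts.append(process_single_xmlfile(xmlfile, verbose=False))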
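For the original goal (one row per IntraModelReport, next to the file path it came from), here is a sketch that builds a result DataFrame directly. It assumes the paths are in a DataFrame column named `path` (hypothetical) and that the files use the `object`/`attribute` namespaces as in the question; the sample files it writes are made up for the demo. It uses `Element.findtext`, which returns `None` for a missing child instead of raising `AttributeError` the way `find(...).text` does.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

import pandas as pd


def extract_reports(xmlfile: str) -> List[Dict]:
    """Return one row per IntraModelReport, tagged with the file it came from."""
    root = ET.parse(xmlfile).getroot()
    rows = []
    for model in root.findall('.//{object}IntraModelReport'):
        rows.append({
            'xmlfile': xmlfile,
            'rootId': dict(model.attrib),
            # findtext returns None when a child element is missing
            'file_Name': model.findtext('{attribute}Code'),
            'unique_ID': model.findtext('{attribute}ObjectID'),
            'employee_badge': model.findtext('{attribute}Creator'),
        })
    return rows


def extract_all(paths: Iterable[str]) -> pd.DataFrame:
    """Loop over any iterable of paths (a list or a Series) and stack all rows."""
    rows: List[Dict] = []
    for path in paths:
        rows.extend(extract_reports(path))
    return pd.DataFrame(rows)


# --- demo with made-up sample files (not real .cdm content) ---
SAMPLE = """<Model xmlns:o="object" xmlns:a="attribute">
  <o:IntraModelReport Id="o5">
    <a:ObjectID>{oid}</a:ObjectID>
    <a:Code>{code}</a:Code>
    <a:Creator>{creator}</a:Creator>
  </o:IntraModelReport>
</Model>"""

tmpdir = tempfile.mkdtemp()
paths = []
for i, creator in enumerate(['E1001', 'E1002']):
    path = os.path.join(tmpdir, f'model_{i}.cdm')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(SAMPLE.format(oid=f'ID-{i}', code=f'REPORT_{i}', creator=creator))
    paths.append(path)

# "path" is a hypothetical column name; use the column that holds your paths.
filedataframe = pd.DataFrame({'path': paths})
result = extract_all(filedataframe['path'])
print(result[['xmlfile', 'file_Name', 'employee_badge']])
```

Returning flat rows instead of the nested `{'xmlfile': ..., 'models': [...]}` package makes the final `pd.DataFrame(rows)` step a one-liner; the nested form from the answer would need flattening first.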

[Comments]:

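I do need a little help understanding what is going on. I get a syntax error on the package = dict line. Also, I have all the file paths in "filedataframe". Do I need to convert this dataframe into a list called "xmlfiles"?
@StevenMarsh Yes, the line you mentioned had a typo; I have just corrected it. You can replace xmlfiles with filedataframe and that will also work. But if filedataframe is a list, you may want to rename the variable, because the name filedataframe suggests a dataframe of files.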
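Whether that replacement works depends on what filedataframe actually is. A short sketch, assuming it is a pandas DataFrame whose file paths sit in a column named `path` (the column name and paths are hypothetical):

```python
import pandas as pd

# Hypothetical DataFrame: the file paths sit in a column named "path".
filedataframe = pd.DataFrame({'path': ['a/INVESTING.cdm', 'b/SAVINGS.cdm']})

# Iterating the DataFrame itself yields its column labels, not the rows.
print(list(filedataframe))        # ['path']

# Take the single column (a Series) instead, or convert it to a plain list.
xmlfiles = filedataframe['path'].tolist()
print(xmlfiles)                   # ['a/INVESTING.cdm', 'b/SAVINGS.cdm']
```

If filedataframe is already a plain list of paths, as `filesdataframe = []` in the question suggests, it can be passed to the loop directly.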