How to walk through and parse a folder of JSON files, then output to a single file: answers

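【Question title】: How to walk through and parse folder of json files then output to single file
【Posted】: 2014-02-12 13:11:38
【Question】:

I have a folder of JSON files from which I want to parse specific key-value pairs. I then want to append those pairs to a dictionary and output that dictionary (as JSON lines) to a new JSON file. At the moment I can't even get the files in the folder to parse, let alone get the parsed data into a dictionary for printing. Here is my code: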

import json, os

FbDict=[]

topdir=os.getcwd() 

def main():        

    for root, dirs, files in os.walk(topdir):            
        for f in files:                        
            if f.lower().endswith((".json")):                    
                json_data = open(f, 'r+').read().decode("utf-8")
                jdata = json.loads(json_data)   
                fname=f.split(".json")[0]
                for k, v in jdata.items(): 
                    if isinstance(v, dict):                                                                
                        try:
                            dataFormat = {"created_at":v['data'][0]['created_time'],"user":v['data'][0]['from']['id'],
                                               "id":v['data'][0]['id'],"name":v['data'][0]['from']['name'],"text":v['data'][0]['message']}                                        
                            FbDict.append(json.dumps(dataFormat, separators=(',', ':')))
                        except KeyError:
                            continue                            

if __name__ == '__main__':
    main()
    with open ('fbFile', 'w') as f:
        f.write(FbDict)

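【Comments on the question】:

What is wrong with your code? Does it produce an error? Does it run but produce the wrong output?
@larsks Oh yes, that. It gives me [Errno 2] No such file or directory, and names a file that is not in the folder but that I had open in Notepad earlier. I don't understand how this code is sorting through other files that aren't part of the folder?

Tags: python json parsing dictionary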

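【Solution 1】:

Here is the part of the Python documentation you are missing:

http://docs.python.org/2/library/os.html#os.walk

Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).

Right now you are iterating over files, which holds bare filenames without any path information, so open(f) looks for each name in the current working directory rather than in the directory where os.walk found it. Add the path information and you should stop getting those "No such file or directory" errors.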
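
As a minimal sketch of that advice, the loop below joins each bare name with the directory that os.walk reports for it. The helper name find_json_files is made up for illustration and does not come from the question.

```python
import os

def find_json_files(topdir):
    """Return full paths of all .json files under topdir (hypothetical helper)."""
    paths = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # descending into subdirectories on its own.
    for dirpath, dirnames, filenames in os.walk(topdir):
        for name in filenames:
            if name.lower().endswith(".json"):
                # `name` has no directory part; prepend the directory it was found in
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Each returned path can be passed straight to open(), whichever directory the script is started from.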

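【Comments】:

That makes sense, I just don't know where to add it?
@user2338089 - Do you want to search only the top-level files, or all JSON files, recursing through every subdirectory?
@Robᵩ The further the better, so all subdirectories would help.

【Solution 2】:

Thanks to @rmunn and @Rob for the help; here is the update: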

import io, json, os

FbDict = []  # collected records, one compact JSON string per entry

def main():
    # os.walk already descends into every subdirectory of the start directory
    for root, dirs, files in os.walk(os.getcwd()):
        for name in files:
            if name.lower().endswith(".json"):
                # names in `files` carry no path, so join them with `root`
                path = os.path.join(root, name)
                # io.open decodes UTF-8 on Python 2 and 3; `with` closes the file
                with io.open(path, 'r', encoding='utf-8') as f:
                    jdata = json.load(f)
                for k, v in jdata.items():
                    if isinstance(v, dict):
                        try:
                            post = v['data'][0]
                            dataFormat = {"created_at": post['created_time'], "user": post['from']['id'],
                                          "id": post['id'], "name": post['from']['name'], "text": post['message']}
                        except KeyError:
                            continue
                        # FbDict holds strings, so compare the serialized form
                        line = json.dumps(dataFormat, separators=(',', ':'))
                        if line not in FbDict:
                            FbDict.append(line)

if __name__ == '__main__':
    main()
    # write one JSON object per line; `with` closes the output file
    with open('fbFile.json', 'w') as f_out:
        for line in FbDict:
            f_out.write(line + '\n')
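
For comparison, the same steps could be split into small functions, with a set for the duplicate check. This is only a sketch under a few assumptions: it targets Python 3, the names extract_record, collect_records, and write_json_lines are hypothetical, and the field layout (data[0], from, message) is copied from the code above rather than from any documented format. Unlike the code above, it skips files that are not valid JSON or whose top level is not an object instead of raising.

```python
import json
import os

def extract_record(value):
    """Build one flat record from a nested entry, or return None if it doesn't fit."""
    try:
        post = value["data"][0]
        return {
            "created_at": post["created_time"],
            "user": post["from"]["id"],
            "id": post["id"],
            "name": post["from"]["name"],
            "text": post["message"],
        }
    except (KeyError, IndexError, TypeError):
        # missing key, empty list, or an unexpected type somewhere in the path
        return None

def collect_records(topdir):
    """Walk topdir recursively and return unique records as compact JSON strings."""
    seen = set()   # fast duplicate check
    lines = []     # keeps first-seen order for the output
    for root, dirs, files in os.walk(topdir):
        for name in sorted(files):
            if not name.lower().endswith(".json"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                try:
                    jdata = json.load(f)
                except ValueError:
                    continue  # not valid JSON (or not valid UTF-8); skip the file
            if not isinstance(jdata, dict):
                continue
            for value in jdata.values():
                if not isinstance(value, dict):
                    continue
                record = extract_record(value)
                if record is None:
                    continue
                line = json.dumps(record, separators=(",", ":"), sort_keys=True)
                if line not in seen:
                    seen.add(line)
                    lines.append(line)
    return lines

def write_json_lines(lines, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f_out:
        for line in lines:
            f_out.write(line + "\n")
```

A call such as write_json_lines(collect_records(os.getcwd()), out_path) would reproduce the script's behavior. Choosing an out_path outside the walked folder keeps a later run from trying to parse its own output.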

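【Comments】:

Nice job finding your own answer and posting it. If the answer is correct, you can also accept your own answer. Two criticisms: 1) you don't need 'r+'; 'r' would be more appropriate, and 2) you never close the file you opened. Try with open(f,'r') as f: json_data = f.read().decode('utf-8').
@Robᵩ Thanks for your help! I'll add your suggestions to the update. How do I add the ability to go through subfolders?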