Python: Convert markdown table to JSON (answers)

【Question Title】: Python: Convert markdown table to json
【Posted】: 2021-05-17 00:43:02
【Question】:

I'm trying to figure out the simplest way to convert some markdown table text into JSON using only Python. For example, take this input string:

| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |

The desired output should look like this:

[
    {"Some Title":"Dark Souls","Some Description":"This is a fun game","Some Number":5},
    {"Some Title":"Bloodborne","Some Description":"This one is even better","Some Number":2},
    {"Some Title":"Sekiro","Some Description":"This one is also pretty good","Some Number":110101}
]

Note: ideally the output should be RFC 8259 compliant, i.e. keys and values should be wrapped in double quotes " rather than single quotes '.
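To illustrate the difference this note is about: printing a Python dict shows its repr, which uses single quotes, while json.dumps from the standard library always writes double-quoted strings and escapes any quote characters inside a cell. The cell text below is a made-up example containing both kinds of quote.

```python
import json

# Hypothetical row whose cell contains both ' and "
row = {"Some Title": "It's \"great\""}

python_repr = repr(row)      # Python literal syntax, not JSON
json_text = json.dumps(row)  # RFC 8259 JSON: double quotes, inner " escaped as \"

print(python_repr)
print(json_text)

# The JSON text parses back to exactly the same data
assert json.loads(json_text) == row
```

This is also why a blanket replace of ' with " is unsafe: it would corrupt the apostrophe in the cell, whereas json.dumps escapes only what JSON requires.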

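I've seen some JS libraries that can do this, but I need a Python-only solution. Could someone explain the quickest way to achieve this, so that I don't have to write my own parser?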

Thanks for any help!

【Question Discussion】:

Tags: python parsing markdown

【Solution 1】:

You can treat it as a multiline string and parse it line by line, splitting on \n and |.
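Concretely, splitting a single row on | leaves an empty string before the first pipe and after the last one, which is why the outer fragments have to be discarded. A small illustration using one row of the question's table:

```python
# One data row from the table in the question
line = "| Dark Souls | This is a fun game           | 5           |"

# Splitting on '|' yields '' before the first pipe and after the last one
parts = line.split('|')

# Drop those outer fragments and trim the column padding from each cell
cells = [part.strip() for part in parts[1:-1]]
print(cells)  # ['Dark Souls', 'This is a fun game', '5']
```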

Simple code to do this:

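import json

my_str = '''| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |'''

def mrkd2json(inp):
    # strip() drops leading/trailing blank lines so they don't become empty rows
    lines = inp.strip().split('\n')
    ret = []
    keys = []
    for i, l in enumerate(lines):
        if i == 0:
            # Header row; the outer pipes leave '' at keys[0] and keys[-1]
            keys = [_i.strip() for _i in l.split('|')]
        elif i == 1:
            # Skip the |---|---| delimiter row
            continue
        else:
            # Map each inner cell to its header, skipping the empty outer fragments
            ret.append({keys[_i]: v.strip() for _i, v in enumerate(l.split('|')) if 0 < _i < len(keys) - 1})
    # json.dumps always writes double-quoted strings, as RFC 8259 requires
    return json.dumps(ret, indent=4)

print(mrkd2json(my_str))

Output:
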
[
    {
        "Some Title": "Dark Souls",
        "Some Description": "This is a fun game",
        "Some Number": "5"
    },
    {
        "Some Title": "Bloodborne",
        "Some Description": "This one is even better",
        "Some Number": "2"
    },
    {
        "Some Title": "Sekiro",
        "Some Description": "This one is also pretty good",
        "Some Number": "110101"
    }
]
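The question's desired output has Some Number as a JSON number (5), whereas this answer keeps every cell as a string ("5"). A minimal sketch of one way to close that gap, assuming any cell that parses as a finite int or float should become a number; convert_cell is a hypothetical helper, and the rows are copied from the output above.

```python
import json
import math

def convert_cell(value):
    """Hypothetical helper: return value as an int or float if it looks numeric."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        number = float(value)
    except ValueError:
        return value  # not numeric: keep the original string
    # float() also accepts "nan" and "inf", which RFC 8259 JSON cannot represent
    return number if math.isfinite(number) else value

rows = [
    {"Some Title": "Dark Souls", "Some Description": "This is a fun game", "Some Number": "5"},
    {"Some Title": "Sekiro", "Some Description": "This one is also pretty good", "Some Number": "110101"},
]

typed = [{key: convert_cell(val) for key, val in row.items()} for row in rows]
print(json.dumps(typed, indent=4))
```

Because it converts every column, a numeric-looking title (say, a game called 1942) would also become a number; applying convert_cell only to columns known to be numeric avoids that.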

PS: I don't know of a library that does this; I'll update the answer if I find one!

【Discussion】:

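Cheers, thanks for the input! I hadn't even thought of simply splitting the string; that's definitely a good approach. One thing that just occurred to me (even though I didn't mention it in my original post) is that we lose the information about each cell's alignment. Any thoughts on that? Maybe an additional field per cell that stores the alignment?
@Kyu96 As in whether it's centered or left/right aligned? That's interesting, let me look into it! But if your original question is solved, please mark this as the answer!
Yes, that's what I mean (left, center and right alignment). Let me know if you have any ideas. Also, I noticed a small problem with your solution: it uses single quotes instead of double quotes, which is invalid according to RFC 8259. A simple .replace("'", '"') isn't an option either, because quotes may be part of a cell's content. If you can find a fix, I'll mark it as solved :)
@Kyu96 Just use json.dumps; I've updated the solution.
Thanks! That solved it! Let me know if you have any ideas about the alignment.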
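On the alignment question: in GitHub Flavored Markdown, alignment is encoded only in the delimiter row (:--- left, :---: center, ---: right, no colon means the renderer's default), and both answers skip that row. A minimal sketch, assuming GFM-style tables, that reads it into a per-column mapping; parse_alignment is a hypothetical helper, and the colon-bearing delimiter row is a modified version of the question's table (the original has no colons, so every column would come out as null).

```python
import json

def parse_alignment(delimiter_line):
    """Hypothetical helper: map each cell of a |---| row to its GFM alignment."""
    alignments = []
    for cell in delimiter_line.strip().strip('|').split('|'):
        cell = cell.strip()
        starts, ends = cell.startswith(':'), cell.endswith(':')
        if starts and ends:
            alignments.append('center')
        elif ends:
            alignments.append('right')
        elif starts:
            alignments.append('left')
        else:
            alignments.append(None)  # no colon: default alignment
    return alignments

header = ['Some Title', 'Some Description', 'Some Number']
# Modified delimiter row (assumption) so that each alignment appears once
delimiter = '|:-----------|:----------------------------:|------------:|'

column_alignment = dict(zip(header, parse_alignment(delimiter)))
print(json.dumps(column_alignment))
```

Storing one mapping per column, rather than an extra field on every cell, keeps the row objects in the shape the question asked for; the mapping can be emitted alongside the rows, e.g. in a wrapper object.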

【Solution 2】:

My approach is very similar to @Kuldeep Singh Sidhu's:


md_table = """
| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |
"""

result = []

# strip() removes the newlines right after the opening and before the closing """
for n, line in enumerate(md_table.strip().split('\n')):
    if n == 0:
        # Header row; [1:-1] drops the empty strings outside the outer pipes
        header = [t.strip() for t in line.split('|')[1:-1]]
    elif n > 1:
        # n == 1 is the |---| delimiter row, so data rows start at n == 2
        values = [t.strip() for t in line.split('|')[1:-1]]
        result.append(dict(zip(header, values)))

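The resulting result list (a Python object, so it prints with single quotes; pass it to json.dumps to get double-quoted RFC 8259 JSON) is: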

[{'Some Title': 'Dark Souls',
  'Some Description': 'This is a fun game',
  'Some Number': '5'},
 {'Some Title': 'Bloodborne',
  'Some Description': 'This one is even better',
  'Some Number': '2'},
 {'Some Title': 'Sekiro',
  'Some Description': 'This one is also pretty good',
  'Some Number': '110101'}]
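Going one step further, a sketch that wraps this approach in a reusable function, returns JSON text directly, and checks that the second line really is a delimiter row instead of assuming it. md_table_to_json, split_row and the DELIMITER_ROW pattern are hypothetical names, and the check assumes GFM-style tables where the outer pipes are optional.

```python
import json
import re

# Illustrative pattern (assumption) for a delimiter row such as |---|:---:|---:|
# hyphens with optional colons, outer pipes optional
DELIMITER_ROW = re.compile(r'^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$')

def split_row(line):
    """Split one table row into trimmed cells, dropping the outer pipes if present."""
    line = line.strip()
    if line.startswith('|'):
        line = line[1:]
    if line.endswith('|'):
        line = line[:-1]
    return [cell.strip() for cell in line.split('|')]

def md_table_to_json(md_table):
    """Convert a markdown table string to a JSON array of row objects."""
    # Ignore blank lines, e.g. the ones around a triple-quoted string
    lines = [line for line in md_table.splitlines() if line.strip()]
    if len(lines) < 2 or not DELIMITER_ROW.match(lines[1].strip()):
        raise ValueError('second line is not a markdown table delimiter row')
    header = split_row(lines[0])
    rows = [dict(zip(header, split_row(line))) for line in lines[2:]]
    return json.dumps(rows, indent=4)

md_table = """
| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |
"""

print(md_table_to_json(md_table))
```

Like both answers, this still splits naively on |, so an escaped pipe (\|) inside a cell, which GitHub Flavored Markdown allows, would be split into two cells.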

【Discussion】:

Thanks! Do you know how to make sure the alignment information in the table isn't lost?