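【Question title】: Parsing nested indented text into lists
【Posted】: 2014-03-21 03:16:01
【Question】:

Hi,

Maybe someone can help me get started.

I have nested, indented text similar to the sample below, and I need to parse it into a nested list structure, for example: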

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

Nested_Lists = ['Test1', 
    ['NeedHelp', 
        ['GotStuck', 
            ['Sometime', 
            'NoLuck']]], 
    ['NeedHelp2', 
        ['StillStuck', 
        'GoodLuck']]
]

Nested_Lists = ['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']]], ['NeedHelp2', ['StillStuck', 'GoodLuck']]]

Any help with Python 3 would be appreciated.

【Question discussion】:

  • Is the text indented with tabs or spaces?
  • The text is indented with spaces.

Tags: parsing python-3.x text-indent


【Solution 1】:

You can use Python's tokenizer to parse the indented text:

from tokenize import NAME, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NAME:
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.string) # add to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

Example:

from io import BytesIO
from pprint import pprint
pprint(parse(BytesIO(TXT.encode('utf-8'))), width=20)

Output:

['Test1',
 ['NeedHelp',
  ['GotStuck',
   ['Sometime',
    'NoLuck']]],
 ['NeedHelp2',
  ['StillStuck',
   'GoodLuck']]]
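
To see why this works, it can help to look at the token stream that `parse` reacts to. The following is only an illustrative sketch: `SAMPLE` and `structure_tokens` are made-up names, not part of the answer. It lists the type of every NAME, INDENT and DEDENT token the tokenizer emits for a shortened version of the text.

```python
from io import BytesIO
from tokenize import tokenize, tok_name, NAME, INDENT, DEDENT

# Shortened, hypothetical sample in the same shape as TXT.
SAMPLE = b"Test1\n    NeedHelp\n        GotStuck\n    NeedHelp2\n"

def structure_tokens(data):
    """Return the type names of the tokens that parse() cares about."""
    return [
        tok_name[t.type]
        for t in tokenize(BytesIO(data).readline)
        if t.type in (NAME, INDENT, DEDENT)
    ]

for name in structure_tokens(SAMPLE):
    print(name)
```

Each deeper line is preceded by one INDENT token, and every level that closes produces one DEDENT token, including the levels still open at the end of the input. That is the information `parse` uses to push and pop lists on its stack.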

【Discussion】:

  • J.F. Sebastian, thank you very much for this example.
【Solution 2】:

I hope my solution is understandable. If not, please ask.

def nestedbyindent(string, indent_char=' '):
    splitted, i = string.splitlines(), 0
    def first_non_indent_char(string):
        for i, c in enumerate(string):
            if c != indent_char:
                return i
        return -1
    def subgenerator(indent):
        nonlocal i
        while i < len(splitted):
            s = splitted[i]
            title = s.lstrip()
            if not title:
                i += 1
                continue
            curr_indent = first_non_indent_char(s)
            if curr_indent < indent:
                break
            elif curr_indent == indent:
                i += 1
                yield title
            else:
                yield list(subgenerator(curr_indent))
    return list(subgenerator(-1))

>>> nestedbyindent(TXT)[0]  # starting at indent -1 wraps the result in one extra list
['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']],
'NeedHelp2', ['StillStuck', 'GoodLuck']]]
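
The function above relies on `nonlocal i` so that every recursive `subgenerator` call advances the same line cursor. Here is a minimal sketch of that mechanism on its own, using a hypothetical `take_pairs` helper that has nothing to do with indentation:

```python
def take_pairs(items):
    """Group items into pairs using a cursor shared with a nested function."""
    i = 0  # cursor owned by take_pairs, advanced by next_item

    def next_item():
        nonlocal i  # rebind the enclosing i instead of creating a new local
        item = items[i]
        i += 1
        return item

    pairs = []
    while i + 1 < len(items):
        pairs.append((next_item(), next_item()))
    return pairs

print(take_pairs(["a", "b", "c", "d", "e"]))
```

Without the `nonlocal` declaration, `i += 1` inside `next_item` would raise `UnboundLocalError`, because the assignment would make `i` a local variable of the nested function.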

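【Discussion】:

  • SzieberthAdam, thank you very much. It looks clean and works well. My own failed attempts were always a mess :) I will have to study it, especially the use of `nonlocal`, which I had never come across. Thanks.
  • "Maybe you can accept my answer": do you mean the thing you can upvote on Stack Overflow? I only registered a few days ago. In any case, it was very helpful.
  • Just click the pipe (check mark) sign to the left of my answer. It is below the upvote/downvote counter.
  • I see. I can't, it needs 15 reputation. By the way, the answer above matches the expected output more closely: there, 'NeedHelp2' starts a new list. But yours was really helpful too.
  • Haha, I completely missed that.
【Solution 3】:

This is a very un-Pythonic and verbose answer, but it seems to work.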

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

outString = '['
level = 0
first = 1
for i in TXT.split("\n")[1:]:
    count = 0
    for j in i:
        if j!=' ':
            break
        count += 1
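    # 4 spaces = 1 indent. Use // so count stays an int: in Python 3, "/"
    # returns a float, and '[' * float raises TypeError.
    count //= 4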
    if i.lstrip()!='':
        itemStr = "'" + i.lstrip() + "'"
    else:
        itemStr = ''
    if level < count:
        if first:
            outString += '['*(count - level) + itemStr
            first = 0
        else:
            outString += ',' + '['*(count - level) + itemStr
    elif level > count:
        outString += ']'*(level - count) + ',' + itemStr
    else:
        if first:
            outString += itemStr
            first = False
        else:
            outString += ',' + itemStr
    level = count
if len(outString)>1:
    outString = outString[:-1] + ']'
else:
    outString = '[]'

output = eval(outString)
#['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']], 'NeedHelp2', ['StillStuck', 'GoodLuck']]]
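
The string built above is a plain list literal, so `ast.literal_eval` from the standard library could stand in for `eval`. It accepts only literals and rejects anything else, which matters if the input text is not trusted. A minimal sketch, where `out_string` is a shortened, hypothetical stand-in for `outString`:

```python
import ast

# Shortened stand-in for the string the loop above builds.
out_string = "['Test1',['NeedHelp',['GotStuck']],'NeedHelp2']"

nested = ast.literal_eval(out_string)
print(nested)

# Anything that is not a literal is refused rather than executed.
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Note that either way, an item containing an apostrophe would still break the hand-built string, since items are wrapped in single quotes without escaping.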

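【Discussion】:

  • Hi, thanks. It gives me an error on line 31: `outString += ',' + '['*(count - level) + itemStr` TypeError: can't multiply sequence by non-int of type 'float'
  • @user3426681 Hmm... something like `'x'*0.1` raises this error. Check whether `count` and `level` are of type int.
  • @user2931409 It needs something like `count = int(count / 4)  # 4 space = 1 indent`. The output is then similar to nestedbyindent[0], which is not exactly the same as the example: in the expected output, 'NeedHelp2' starts a new list.
【Solution 4】:

Building on this answer: if the whole line should be kept, and the lines contain more than just variable names, `t.type == NAME` can be replaced with `t.type == NEWLINE`, and that if-statement can append the stripped line instead of `t.string`. Something like this: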

from tokenize import NEWLINE, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NEWLINE:
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.line.strip()) # add entire line to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

Otherwise, each line is split into separate tokens at whitespace, parentheses, brackets, and so on.
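
A small sketch of the difference, using a made-up input line (`LINE` is hypothetical): collecting NAME tokens breaks the line into pieces, while the NEWLINE token still carries the full source line in its `line` attribute.

```python
from io import BytesIO
from tokenize import tokenize, NAME, NEWLINE

# Hypothetical line with several words and parentheses.
LINE = b"Need Help (now)\n"
tokens = list(tokenize(BytesIO(LINE).readline))

# One entry per identifier: the line is split and the parentheses are lost.
names = [t.string for t in tokens if t.type == NAME]

# One entry per logical line: the text survives intact.
whole_lines = [t.line.strip() for t in tokens if t.type == NEWLINE]

print(names)
print(whole_lines)
```

This still assumes every line can be tokenized as Python source. Text with unbalanced quotes or brackets may make the tokenizer raise an error.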

【Discussion】:
