【Question title】: How can I sort this data as if it were in a dictionary?
【Posted】: 2021-06-19 01:49:15
【Question description】:

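I have a text file that lists products, each followed by the stores where it is available. Store lines start with a tab character; product lines do not.

To make the data easier to read, I want to arrange it as a dictionary, with each store name as a key and the list of products sold there as the value. For example: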

{
    'Store1' : ['product1', 'product2'],
    'Store2' : ...
}

This is a sample of the data I have, with the stores listed under each product:

Crucial Ballistix BLT8G4D26BFT4K
- Infor-Ingen
- Bip
- PC Factory
- MyBox
Patriot Signature Line PSD48G266681
- PC Express
- Soluservi
Kingston KCP426NS6/8
- YouTech
- Bip

The expected output should look like this (pretty-printed):

{
    'Infor-Ingen' : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'Bip'         : ['Crucial Ballistix BLT8G4D26BFT4K',
                     'Kingston KCP426NS6/8'                 ],
    'PC Factory'  : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'MyBox'       : ['Crucial Ballistix BLT8G4D26BFT4K'     ],
    'PC Express'  : ['Patriot Signature Line PSD48G266681'  ],
    'Soluservi'   : ['Patriot Signature Line PSD48G266681'  ],
    'YouTech'     : ['Kingston KCP426NS6/8'                 ]
}

This is the code I have:

from collections import OrderedDict

od = OrderedDict()
tienda, producto ,otra,aux,wea= [], [],[], [],[]

with open("rams.txt","r") as f:
    data = f.readlines()
    for linea in data:
        linea = linea.strip('\n')
        if '\t' in linea:
            tienda.append(linea.strip('\t'))
            aux.append(linea.strip("\t").strip("\n"))
        else:
            otra.append(aux)
            aux=[]
            producto.append(linea)
            aux.append(linea.strip("\n"))
    tienda = sorted(list(set(tienda)))
    for i in range(1,len(otra)):
        wea=[]
        for key in tienda:
            if key in otra[i]:
                wea.append(otra[i][0])
                od[key] = wea

The problem is that when I print the dictionary, I get something like this:

('Bip', ['Crucial Ballistix BLT8G4D26BFT4K ']), ('Infor-Ingen', ['Crucial Ballistix BLT2K8G4D26BFT4K ']), ('MyBox', ['Crucial Ballistix CT16G4DFD8266']),..)

【Comments on the question】:

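Is your problem the way the brackets are printed? That output comes from the default __str__ and __repr__ methods defined by the OrderedDict class. There are ways to change those methods, but I would suggest building the output yourself. Take a look at this question, especially this answer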
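To illustrate the comment above: printing an OrderedDict shows its repr, which wraps the contents in the class name. The sketch below uses a small made-up sample (not the asker's file) and shows two standard-library options that give brace-delimited output, assuming that is all that is wanted: converting to a plain dict, or using json.dumps with an indent.

```python
import json
from collections import OrderedDict

# Tiny illustrative sample, not the asker's real data.
od = OrderedDict()
od["Bip"] = ["Crucial Ballistix BLT8G4D26BFT4K", "Kingston KCP426NS6/8"]
od["YouTech"] = ["Kingston KCP426NS6/8"]

# str() of an OrderedDict falls back to its repr, which includes the class name.
as_ordered = str(od)

# A plain dict prints with braces; insertion order is preserved on Python 3.7+.
as_plain = str(dict(od))

# json.dumps produces an indented layout (note: strings are double-quoted).
as_json = json.dumps(od, indent=4)

print(as_ordered)
print(as_plain)
print(as_json)
```

Neither option aligns the columns the way the expected output in the question does; for that, a custom formatter is still needed.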

Tags: python dictionary

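【Solution 1】:

You have a few problems in the way you parse the file. Given the format of the data, it helps to step back and work out what you are trying to accomplish.

The file consists of lines that can be treated as groups:

a non-indented line containing the product name
followed by indented lines containing the stores that carry that product

So when you read a product, you should remember it until you read the next product.

For each store you read, you should add the current product to the list of products available at that store. For that you need a dictionary whose keys are store names and whose values are lists of products.

Keep in mind that before appending a product, you must check whether the store already exists in the dictionary.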
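The membership check described above can also be folded into a single call with dict.setdefault. The following is a minimal sketch, not the answerer's code: the function name group_by_store and the in-memory sample list are assumptions made for illustration, and store lines are assumed to start with a tab as stated in the question.

```python
def group_by_store(lines):
    """Map each store name to the list of products sold there."""
    products_by_store = {}
    current_product = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            # Store line: file it under the most recently seen product.
            if current_product is not None:
                # setdefault creates the empty list the first time a store is seen.
                products_by_store.setdefault(line.strip(), []).append(current_product)
        else:
            # Product line: remember it until the next product line.
            current_product = line.strip()
    return products_by_store


# Hypothetical sample standing in for the contents of the text file.
sample = [
    "Crucial Ballistix BLT8G4D26BFT4K\n",
    "\tInfor-Ingen\n",
    "\tBip\n",
    "Kingston KCP426NS6/8\n",
    "\tBip\n",
]
result = group_by_store(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly in place of the sample list.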

One way to solve it is:

products_by_store = dict()
with open("rams.txt", "r") as f:
    cur_prod = None  # product that the following store lines belong to
    for linea in f:
        linea = linea.strip('\n')
        if '\t' in linea:
            # Store line: record the current product under this store.
            linea = linea.strip('\t')
            if cur_prod:
                if linea not in products_by_store:
                    products_by_store[linea] = [cur_prod]
                else:
                    products_by_store[linea].append(cur_prod)
        else:
            # Product line: remember it until the next product appears.
            cur_prod = linea
for k, v in products_by_store.items():
    print(k, v)

This produces the following output:

Infor-Ingen ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
Bip ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K', 'Kingston KCP426NS6/8']
PC Factory ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
MyBox ['Crucial Ballistix Tactical Tracer BLT8G4D26BFT4K']
PC Express ['Patriot Signature Line PSD48G266681']
Soluservi ['Patriot Signature Line PSD48G266681']
YouTech ['Kingston KCP426NS6/8']

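Of course, you should adapt this to your own needs. You mentioned wanting an ordered collection. Once the dictionary is built, sorting its elements should be straightforward.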
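As a rough sketch of that sorting step, assuming alphabetical order by store name is what is wanted (the question does not say so explicitly), a new dictionary can be built from the sorted items. The sample values below are hypothetical stand-ins for the parsed result.

```python
# Hypothetical result of the parsing step (sample values only).
products_by_store = {
    "YouTech": ["Kingston KCP426NS6/8"],
    "Bip": ["Kingston KCP426NS6/8", "Crucial Ballistix BLT8G4D26BFT4K"],
    "Infor-Ingen": ["Crucial Ballistix BLT8G4D26BFT4K"],
}

# Build a new dict whose keys are in alphabetical order and whose
# product lists are sorted too. Dicts keep insertion order on Python 3.7+,
# so a plain dict is enough and OrderedDict is optional.
sorted_stores = {
    store: sorted(products)
    for store, products in sorted(products_by_store.items())
}

for store, products in sorted_stores.items():
    print(store, products)
```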

【Discussion】:

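【Solution 2】:

First things first: regarding simply calling print() on the class, keep in mind the generally accepted purpose of __str__() (which is what print() calls to work its magic). It is meant to be a human-readable representation of the object.

So the default __str__() of OrderedDict is doing exactly what is expected. That is not necessarily what you want to see in your particular case, but the solution is to realize that this is best done as an abstraction built on OrderedDict.

Part of the power of Python (as an object-oriented language) is its ability to define new classes based on existing ones, adding whatever extra behavior or state you want.

For your case, I would implement an OrderedDict subclass and change the output of __str__() to produce whatever format you need, for example: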

from collections import OrderedDict

class ProductDb(OrderedDict):
    # Optional file to constructor to load immediately.

    def __init__(self, fspec=None):
        super().__init__()
        if fspec is not None:
            self.load(fspec)

    # Allow reloading at any point.

    def load(self, fspec):
        # Remove all existing information.

        self.clear()

        # For the Espanol-challenged amongst us:
        #     archivo = file
        #        este = this
        #       linea = line
        #    producto = product
        #      tienda = shop

        with open(fspec, 'r') as archivo:
            # To handle missing product line at start of file,
            # start with a fixed value. If first line IS a
            # product, it will simply replace this fixed value.
            # Then we process each line, sans newline character.

            este_producto = 'UNKNOWN'

            for linea in archivo.readlines():
                linea = linea.strip('\n')

                # Tienda lines start with tabs, producto lines do not.

                if '\t' in linea:
                    tienda = linea.strip('\t')

                    # Make NEW shops start with empty product list.
                    # Then we can just add current product to the
                    # list, not caring if shop was new.

                    if tienda not in self:
                        self[tienda] = []
                    self[tienda].append(este_producto)
                else:
                    # Change current product so subsequent
                    # stores pick that up instead.

                    este_producto = linea

        # Then, for each dictionary entry (store), de-dupe
        # and sort list (products), giving sorted products
        # within each store (stores keep file order). Use a
        # copy of the keys, this ensures no changes to the
        # dictionary while you're iterating over it.

        for key in list(self.keys()):
            self[key] = sorted(list(set(self[key])))

    def __str__(self):
        def spc(n): return " " * n

        # Get maximum store/product lengths for formatting.

        max_st = max([len(st) for st in self])
        max_pr = max([len(pr) for st in self for pr in self[st]])

        out = ""
        st_sep = f"{{\n{spc(4)}"
        for st in self:
            out += f"{st_sep}'{st}'{spc(max_st-len(st))} : "
            pr_sep = f"["
            for pr in self[st]:
                out += f"{pr_sep}'{pr}'"
                pr_sep = f",{spc(max_pr-len(pr))}\n{spc(max_st+10)}"
            out += f"{spc(max_pr-len(self[st][-1])+1)}]"
            st_sep = f",\n{spc(4)}"
        out += f"\n}}"

        return out

xyzzy = ProductDb('infile.txt')
print(xyzzy)

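You will notice that I have also made some fairly substantial changes to the file loader, beyond just turning it into a method of the class.

Your original file-loading code does not need to be anywhere near as complex as it currently is. Specifically, you can get rid of all those temporary lists by building the dictionary of lists on the fly (the comments in the code should hopefully explain things).

I used the following infile.txt test file (with a tab at the start of each store line):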

Crucial Ballistix BLT8G4D26BFT4K
    Infor-Ingen
    Bip
    PC Factory
    MyBox
Patriot Signature Line PSD48G266681
    PC Express
    Soluservi
Kingston KCP426NS6/8
    YouTech
    Bip

The output is as follows, which is pretty much what you asked for:

{
    'Infor-Ingen' : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'Bip'         : ['Crucial Ballistix BLT8G4D26BFT4K',
                     'Kingston KCP426NS6/8'                ],
    'PC Factory'  : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'MyBox'       : ['Crucial Ballistix BLT8G4D26BFT4K'    ],
    'PC Express'  : ['Patriot Signature Line PSD48G266681' ],
    'Soluservi'   : ['Patriot Signature Line PSD48G266681' ],
    'YouTech'     : ['Kingston KCP426NS6/8'                ]
}

【Discussion】: