【Question Title】: Parsing and sorting keys in Python dictionary
【Posted】: 2013-04-13 20:04:42
【Question】:

I created the following dictionary:

code_dictionary = {u'News; comment; negative': u'contradictory about news', u'News; comment': u'something about news'}

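I now want to write some Python code that goes through the dictionary's keys and separates out each individual code together with its corresponding values. So for the dictionary above, I'd like to end up with: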

News: 'contradictory about news', 'something about news'
comment: 'contradictory about news', 'something about news'
negative: 'contradictory about news'

The end result can be a dictionary, a list, or tab- or comma-separated text.

You can see my attempt here:

from bs4 import BeautifulSoup as Soup
f = open('transcript.xml','r')
soup = Soup(f)
#print soup.prettify()


#searches text for all w:commentrangestart tags and makes a dictionary that matches ids with text
textdict = {}
for i in soup.find_all('w:commentrangestart'):
        # variable 'key' is assigned to the tag id
        key = i.parent.contents[1].attrs['w:id']
        key = str(key)
        #variable 'value' is assigned to the tag's text
        value= ''.join(i.nextSibling.findAll(text=True))
        # key / value pairs are added to the dictionary 'textdict'
        textdict[key]=value
print "Transcript Text = " , textdict

# makes a dictionary that matches ids with codes        
codedict = {}
for i in soup.find_all('w:comment'):
        key = i.attrs['w:id']
        key = str(key)
        value= ''.join(i.findAll(text=True))
        codedict[key]=value
print "Codes = ", codedict

# makes a dictionary that matches all codes with text
output = {}
for key in set(textdict.keys()).union(codedict.keys()):
        print "key= ", key
        txt = textdict[key]
        print "txt = ", txt
        ct = codedict[key]
        print "ct= ", ct
        output[ct] = txt
        #print "output = ", output
print "All code dictionary = ", output

#codelist={}
#for key in output:
#   codelist =key.split(";")
#print "codelist= " , codelist


code_negative = {}
code_news = {}
print output.keys()
for i in output:
    if 'negative' in output.keys():
        print 'yay'
        code_negative[i]=textdict[i]
        print 'text coded negative: ' , code_negative
    if 'News' in i:
        code_news[i]=textdict[i]
        print 'text coded News: ' ,code_news

But for some reason, when I run the last block, I keep getting a KeyError:

code_negative = {}
code_news = {}
for i in output:
    if 'negative' in output.keys():
        code_negative[i]=textdict[i]
        print 'text coded negative: ' , code_negative
    if 'News' in i:
        code_news[i]=textdict[i]
        print 'text coded News: ' ,code_news

Any ideas? Thanks!

【Discussion】:

  • Use the split function to iterate over what you want, something like a for loop in which you return i.split(';'). That should let you iterate over what you need.
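Applied to the last loop in the question, the split suggestion could look like the sketch below (Python 3; the sample dictionary stands in for the `output` built from the XML, which is an assumption). The KeyError most likely comes from `textdict[i]`: `textdict` is keyed by comment ids, whereas `i` iterates over the keys of `output`, which are code strings such as `'News; comment'`. Separately, `'negative' in output.keys()` compares against whole keys, so it only matches a key that is exactly `'negative'`. (The earlier loop over `set(textdict.keys()).union(codedict.keys())` can also raise KeyError when an id appears in only one of the two dictionaries.)

```python
# Stand-in for the `output` dict built in the question:
# each key is a "; "-separated string of codes, each value is the coded text.
output = {u'News; comment; negative': u'contradictory about news',
          u'News; comment': u'something about news'}

code_negative = {}
code_news = {}
for key in output:
    codes = key.split('; ')             # e.g. ['News', 'comment', 'negative']
    if 'negative' in codes:             # test this key's own codes, not output.keys()
        code_negative[key] = output[key]  # look the text up in output, not textdict
    if 'News' in codes:
        code_news[key] = output[key]

print('text coded negative: {}'.format(code_negative))
print('text coded News: {}'.format(code_news))
```

Checking membership in the split list rather than with `'News' in key` avoids accidental substring matches (for example a code like `Newsletter`).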

Tags: python dictionary xml-parsing beautifulsoup


【Answer 1】:

If I understand the question correctly, the following code should work:

from collections import defaultdict

out = defaultdict(list)
for k, v in code_dictionary.viewitems():
    for item in k.split('; '):
        out[item].append(v)
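A self-contained run of this approach on the question's sample data, as a sketch assuming Python 3.7+ (dicts keep insertion order there; `viewitems()` does not exist in Python 3, so `items()` is used, which also works in Python 2). The helper name `group_by_code` is hypothetical. Sorting the codes and joining each list reproduces the layout shown in the question:

```python
from collections import defaultdict

# Sample data copied from the question.
code_dictionary = {
    u'News; comment; negative': u'contradictory about news',
    u'News; comment': u'something about news',
}

def group_by_code(codes_to_text, sep='; '):
    """Map each individual code to the list of texts whose key contains it."""
    out = defaultdict(list)
    for key, text in codes_to_text.items():
        for code in key.split(sep):
            out[code].append(text)
    return dict(out)

grouped = group_by_code(code_dictionary)

# One code per line, followed by its quoted texts, as in the question.
for code in sorted(grouped):
    print('{}: {}'.format(code, ', '.join(repr(t) for t in grouped[code])))
```

This prints the three lines from the question in the order `News`, `comment`, `negative`, because uppercase letters sort before lowercase ones.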

【Discussion】:

【Answer 2】:
output = {u'News; comment; negative': u'contradictory about news', u'News; comment': u'something about news'}
negatives = []
comments = []
news = []
for k, v in output.items():
    key_parts = k.split('; ')
    key_parts = [part.lower() for part in key_parts]
    if 'negative' in key_parts:
        negatives.append(v)
    if 'news' in key_parts:
        news.append(v)
    if 'comment' in key_parts:
        comments.append(v)
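The question also accepts tab- or comma-separated text. As a sketch (Python 3, standard-library `csv` module), the same case-insensitive matching as this answer can be generalized to every code found in the keys instead of three hard-coded lists, and the result written one row per code. Writing to `io.StringIO` is only for demonstration; any object with a `write()` method, such as a file opened with `newline=''`, can be passed to `csv.writer` instead.

```python
import csv
import io

output = {u'News; comment; negative': u'contradictory about news',
          u'News; comment': u'something about news'}

# Same lowercased matching as the answer, but every code found in a key
# gets its own list, so no code names are hard-coded.
groups = {}
for key, text in output.items():
    for part in key.split('; '):
        groups.setdefault(part.lower(), []).append(text)

# One row per code: the code first, then each associated text, tab-separated.
buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
for code in sorted(groups):
    writer.writerow([code] + groups[code])

tsv_text = buf.getvalue()
print(tsv_text)
```

Changing `delimiter='\t'` to `delimiter=','` gives comma-separated output; `csv` then quotes any text that itself contains a comma.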
    

【Discussion】:
