【Question title】: Parsing data to lists of lists
【Posted】: 2017-09-02 13:51:23
【Question】:

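I'm running into a problem when trying to parse data into lists. I'm trying to collect information about departments and their subjects. However, since each department has a different number of subjects, I need to build a list of lists so that the data can be linked together later. I managed to get past the index errors; the problem seems to come from how the subject list is built.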

from lxml import html
import requests

page = requests.get('URL')

page_source_code = html.fromstring(page.text)


departments_list = []
subject_list = []

for dep in range(1,3):
    departments = page_source_code.xpath('tag'
                                         +str(dep)+']tag/text()')

    ### print(dep, departments)
    # xpath() returns a list; skip departments that produced no match
    if departments:
        departments_list.append(departments[0])


    for sub in range(1,20):
        subjects = page_source_code.xpath('tag'
                                      +str(dep)+']tag'
                                      +str(sub)+']tag/text()')
        ### print(sub, subjects)
        # every subject is appended to the same flat list
        if subjects:
            subject_list.append(subjects[0])

print('Department list ------ ', len(departments_list), departments_list, '\n')
print('Subject list ------ ', len(subject_list), subject_list)

My output looks like this:

Department list ------  2 ['Department_1', 'Department_2'] 

Subject list ------  7 ['Subject_1'(dep_1), 'Subject_2 '(dep_1), 'Subject_3 '(dep_1), 'Subject_4'(dep_1), 'Subject_5'(dep_2), 'Subject_6 '(dep_2), 'Subject_7 '(dep_2)]

This code puts all the subjects into a single flat list. What I want is this:

Subject list ------  7 [['Subject_1'(dep_1), 'Subject_2 '(dep_1), 'Subject_3 '(dep_1), 'Subject_4'(dep_1)], ['Subject_5'(dep_2), 'Subject_6 '(dep_2), 'Subject_7 '(dep_2)]]
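One common way to get the nested structure above is to create a fresh inner list inside the outer (department) loop, fill it while walking that department's subjects, and append it to the outer list once per department. The sketch below is a minimal illustration under stated assumptions: it uses the standard-library `xml.etree.ElementTree` instead of `lxml`, and parses a small hypothetical document (`SAMPLE`, with made-up `dept`, `name`, and `subject` tags) because the real URL and XPath expressions are elided in the question. `parse_departments` is likewise a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page; the question's
# actual URL and XPath expressions are elided, so these tags are made up.
SAMPLE = """
<root>
  <dept><name>Department_1</name>
    <subject>Subject_1</subject><subject>Subject_2</subject>
    <subject>Subject_3</subject><subject>Subject_4</subject>
  </dept>
  <dept><name>Department_2</name>
    <subject>Subject_5</subject><subject>Subject_6</subject>
    <subject>Subject_7</subject>
  </dept>
</root>
"""


def parse_departments(xml_text):
    """Return (departments_list, subject_list), where subject_list holds
    one inner list of subjects per department."""
    root = ET.fromstring(xml_text)
    departments_list = []
    subject_list = []
    for dept in root.findall("dept"):
        departments_list.append(dept.findtext("name"))
        # Key change: a new inner list per department, filled by the
        # inner loop and then appended to the outer list as one unit.
        dept_subjects = []
        for subject in dept.findall("subject"):
            dept_subjects.append(subject.text)
        subject_list.append(dept_subjects)
    return departments_list, subject_list


departments_list, subject_list = parse_departments(SAMPLE)
print("Department list ------ ", len(departments_list), departments_list)
print("Subject list ------ ", len(subject_list), subject_list)

# Because both lists are aligned by index, they can be linked later:
by_department = dict(zip(departments_list, subject_list))
print(by_department)
```

Note that `len(subject_list)` is now the number of departments (2 here), not the total number of subjects. Applied to the question's loop, the same idea means creating an empty inner list right after `for dep in ...:` and appending it to `subject_list` after the `for sub in ...:` loop finishes.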

【Question discussion】:

    Tags: python list multidimensional-array xml-parsing


    【Answer 1】:

    You need to declare two separate subject lists before the loops, and check whether the word 'dep_1' or 'dep_2' appears in the subjects[0] string.

         # declare one list per department, before both loops
         sub_list1 = []
         sub_list2 = []
    
         # inside the second for loop, in place of subject_list.append(subjects[0])
         # (subjects is a list, so test the string subjects[0], not the list)
         if 'dep_1' in subjects[0]:
             sub_list1.append(subjects[0])
         else:
             sub_list2.append(subjects[0])
    
         # after both loops, combine them into a list of lists
         subject_list = [sub_list1, sub_list2]
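The answer's approach hard-codes two lists, so it only works for exactly two departments. A minimal sketch that generalizes the same idea groups subjects by their department marker in a dictionary and then returns the groups as a list of lists. It relies on the answer's assumption, which the question does not confirm, that each scraped subject string contains a marker such as `dep_1`; the sample strings and the `group_by_department` name are hypothetical.

```python
import re

# Assumption carried over from the answer: each subject string contains
# its department marker, e.g. "Subject_1 (dep_1)".
DEP_PATTERN = re.compile(r"dep_\d+")


def group_by_department(subjects):
    """Group flat subject strings into a list of lists, one inner list per
    department marker, in the order the markers first appear."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for subject in subjects:
        match = DEP_PATTERN.search(subject)
        # Strings without a marker are collected under the key None.
        key = match.group(0) if match else None
        groups.setdefault(key, []).append(subject)
    return list(groups.values())


flat = ["Subject_1 (dep_1)", "Subject_2 (dep_1)",
        "Subject_5 (dep_2)", "Subject_6 (dep_2)"]
print(group_by_department(flat))
```

If the marker is not actually part of the scraped text, grouping inside the department loop (one inner list per `dep` iteration) avoids the string matching altogether.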
    

    【Discussion】:
