Get data from XML using pandas (answers)

【Question title】: get data from xml using pandas
【Posted】: 2016-12-22 05:36:21
【Question】:

I am trying to get some data from XML using pandas. I currently have "working" code, where by "working" I mean it almost works.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?"


response = requests.get(url).content
soup = BeautifulSoup(response)

tables = soup.find_all('tabela_rozklad')

tags = ['dzien', 'godz', 'ilosc', 'tyg', 'id_naucz', 'id_sala',
        'id_prz', 'rodz', 'grupa', 'id_st', 'sem', 'id_spec']

df = pd.DataFrame()
for table in tables:
    all = map(lambda x: table.find(x).text, tags)
    df = df.append([all])

df.columns = tags

a = df[(df.sem == "1")]
a = a[(a.id_spec == "0")]
a = a[(a.dzien == "1")]
print(a)

So I get an error on a = df[(df.sem == "1")]:

File "pandas\index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas\index.c:4443)
File "pandas\index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas\index.c:4289)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13733)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13687)

Reading other Stack Overflow questions, I saw people suggest using df.loc, so I changed this line to

a = df.loc[(df.sem == "1")]

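Now the code runs, but the result says that this row does not exist. I should mention that the problem occurs only with the "sem" tag; the rest works perfectly, but unfortunately I need to use this tag. I would appreciate it if someone could explain what causes this error and how to fix it.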

【Question comments】:

Tags: python pandas web service

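【Solution 1】:

You can add ignore_index=True to append to avoid a duplicated index. Then you need to select the column sem with [], because sem is also a pandas function (DataFrame.sem, the standard error of the mean):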

df = pd.DataFrame()
for table in tables:
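    # one list of cell texts per <tabela_rozklad> element
    row = [table.find(x).text for x in tags]
    # ignore_index=True gives a fresh 0..n-1 index (DataFrame.append requires pandas < 2.0)
    df = df.append([row], ignore_index=True)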

df.columns = tags
#print (df)

a = df[(df['sem'] == '1') & (df.id_spec == "0") & (df.dzien == "1")]

print(a)
    dzien godz ilosc tyg id_naucz id_sala id_prz rodz grupa id_st sem id_spec
0       1    1     2   0       52      79     13    W     1    13   1       0
1       1    3     2   0       12      79     32    W     1    13   1       0
2       1    5     2   0       52      65     13   Ćw     1    13   1       0
3       1   11     2   0      201       3     70   Ćw    10    13   1       0
4       1    5     2   0       36      78     13   Ps     5    13   1       0
5       1    5     2   1       18      32    450   Ps     3    13   1       0
6       1    5     2   2       18      32    450   Ps     4    13   1       0
7       1    7     2   1       18      32    450   Ps     7    13   1       0
8       1    7     2   2       18      32    450   Ps     8    13   1       0
9       1    7     2   0       66      65    104   Ćw     1    13   1       0
10      1    7     2   0      283       3    104   Ćw     5    13   1       0
11      1    7     2   0      346       5    104   Ćw     8    13   1       0
12      1    7     2   0      184      29     13   Ćw     7    13   1       0
13      1    9     2   0       66      65    104   Ćw     2    13   1       0
14      1    9     2   0      346       5     70   Ćw     8    13   1       0
15      1    9     1   0       73       3    203   Ćw     9    13   1       0
16      1   10     1   0       73       3    203   Ćw    10    13   1       0
17      1    9     2   0      184      19     13   Ps    13    13   1       0
18      1   11     2   0      184      19     13   Ps    14    13   1       0
19      1   11     2   0       44      65     13   Ćw     9    13   1       0
87      1    9     2   0      201      54    463    W     1    17   1       0
88      1    3     2   0       36      29     13   Ćw     2    17   1       0
89      1    3     2   0      211       5     70   Ćw     1    17   1       0
90      1    5     2   0      211       5     70   Ćw     2    17   1       0
91      1    7     2   0       36      78     13   Ps     4    17   1       0
105     1    1     2   1       11      16     32   Ps     2    18   1       0
106     1    1     2   2       11      16     32   Ps     3    18   1       0
107     1    3     2   0       51       3    457    W     1    18   1       0
110     1    5     2   2       11      16     32   Ps     1    18   1       0
111     1    7     2   0       91      64     97   Ćw     2    18   1       0
112     1    5     2   0      283       3    457   Ćw     2    18   1       0
254     1    5     1   0       12      29     32   Ćw     6    13   1       0
255     1    6     1   0       12      29     32   Ćw     5    13   1       0
462     1    7     2   0       98       1    486    W     1    19   1       0
463     1    9     1   0       91       1    484    W     1    19   1       0
487     1    5     2   0      116      19     13   Ps     1    17   1       0
488     1    7     2   0      116      19     13   Ps     2    17   1       0
498     1    5     2   0        0       0    431   Ps     2    17   1       0
502     1    5     2   0        0       0    431   Ps    15    13   1       0
503     1    5     2   0        0       0    431   Ps    16    13   1       0
504     1    5     2   0        0       0    431   Ps    19    13   1       0
505     1    5     2   0        0       0    431   Ps    20    13   1       0
531     1   13     2   0      350      79    493    W     1    13   1       0
532     1   13     2   0      350      79    493    W     2    17   1       0
533     1   13     2   0      350      79    493    W     1    18   1       0
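On newer pandas versions the loop above no longer runs, because DataFrame.append was removed in pandas 2.0. The following is a minimal sketch, not the answerer's code, assuming pandas is installed: it swaps in the standard-library xml.etree.ElementTree for BeautifulSoup, collects one dict per <tabela_rozklad> element, and builds the DataFrame once, which also yields a clean 0..n-1 index without ignore_index. The inline XML sample, its <root> element, and the helper names parse_rows and select are hypothetical; the real response is assumed to use the same child tags as in the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd

TAGS = ['dzien', 'godz', 'ilosc', 'tyg', 'id_naucz', 'id_sala',
        'id_prz', 'rodz', 'grupa', 'id_st', 'sem', 'id_spec']

# Hypothetical sample: the real response's root element name is assumed here.
SAMPLE_XML = """<root>
<tabela_rozklad><dzien>1</dzien><godz>1</godz><ilosc>2</ilosc><tyg>0</tyg>
<id_naucz>52</id_naucz><id_sala>79</id_sala><id_prz>13</id_prz><rodz>W</rodz>
<grupa>1</grupa><id_st>13</id_st><sem>1</sem><id_spec>0</id_spec></tabela_rozklad>
<tabela_rozklad><dzien>1</dzien><godz>3</godz><ilosc>2</ilosc><tyg>0</tyg>
<id_naucz>12</id_naucz><id_sala>79</id_sala><id_prz>32</id_prz><rodz>W</rodz>
<grupa>1</grupa><id_st>13</id_st><sem>2</sem><id_spec>0</id_spec></tabela_rozklad>
<tabela_rozklad><dzien>1</dzien><godz>5</godz><ilosc>2</ilosc><tyg>0</tyg>
<id_naucz>52</id_naucz><id_sala>65</id_sala><id_prz>13</id_prz><rodz>Ćw</rodz>
<grupa>1</grupa><id_st>13</id_st><sem>1</sem><id_spec>0</id_spec></tabela_rozklad>
</root>"""


def parse_rows(xml_text, tags=TAGS):
    """Collect one dict per <tabela_rozklad> element, keyed by tag name."""
    root = ET.fromstring(xml_text)
    # findtext returns the child's text (None if the child is missing)
    return [{tag: table.findtext(tag) for tag in tags}
            for table in root.iter('tabela_rozklad')]


def select(df, sem='1', id_spec='0', dzien='1'):
    """Filter with df['col'] indexing; df.sem would be the DataFrame.sem method."""
    mask = (df['sem'] == sem) & (df['id_spec'] == id_spec) & (df['dzien'] == dzien)
    return df[mask]


# Build the DataFrame once; it gets a default 0..n-1 RangeIndex automatically.
df = pd.DataFrame(parse_rows(SAMPLE_XML), columns=TAGS)
a = select(df)
print(a)
```

Against the live service you would pass requests.get(url).content to parse_rows instead of the sample (ET.fromstring accepts bytes), assuming the response uses no XML namespace. All values stay strings, which is why the comparisons use "1" and "0" as in the original code.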

【Discussion】:

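This works really well, but can you explain why my code doesn't?
Thanks for accepting. The problem is a name clash: if your columns are named sum, mean, count or sem, you cannot select them with df.sum, df.count, df.mean or df.sem, because all of these are pandas methods (sum, mean, ...). For such columns you need to select with df['sum'], df['mean'], ... df['sem'].
See the documentation here and check the warning: The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed.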
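A minimal sketch reproducing the name clash, assuming a recent pandas; the two-row frame is hypothetical. It shows that df.sem resolves to the DataFrame.sem method rather than the column, that comparing a method with a string simply gives False, and that df[False] is then looked up as a column label and raises KeyError, which is consistent with the get_loc traceback in the question.

```python
import pandas as pd

# Hypothetical two-row frame with a column literally named "sem".
df = pd.DataFrame({'sem': ['1', '2'], 'dzien': ['1', '1']})

# Attribute access finds the DataFrame.sem method (standard error of the mean),
# because normal attribute lookup wins before pandas falls back to columns.
print(callable(df.sem))  # True

# Comparing a bound method with a string is just the Python bool False ...
print(df.sem == '1')  # False

# ... so the filter becomes df[False], treated as a column label -> KeyError.
try:
    df[df.sem == '1']
except KeyError as exc:
    print('KeyError:', exc)

# Bracket indexing always refers to the column, so this filter works.
print(df[df['sem'] == '1'])
```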