【Posted】:2019-06-15 14:20:43
【Question】:
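So, I created two Series of 100 elements each and ORed them together. But first I sorted the first Series (isBilly), which means the indexes were no longer aligned. I expected an error, or a bad result. Instead I got a third Series with 126 elements! That was a surprise. Any ideas why?
Note the four "Richardson" rows in the billy_or_peter output listing. There are four values: two True and two False.
I thought there might be some kind of "Cartesian product" producing 200 rows. Instead I see 126 rows, which is strange.
Thoughts?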
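A minimal sketch that reproduces the count, assuming (as the `df.loc["Richardson", :]` output suggests) that the last_name index contains repeated labels. When `|` is applied to two Series whose indexes are not equal, pandas first aligns them on their labels with an outer join. A label that occurs n times in one Series and m times in the other is matched occurrence by occurrence, giving n × m rows. So the product is per label rather than across the whole Series: with the same labels on both sides the result has Σ n² rows, which is 100 only when every label is unique and could reach 100 × 100 if all labels were the same. The six toy labels below are hypothetical.

```python
import pandas as pd

# Hypothetical labels: "a" once, "b" twice, "c" three times (6 rows in total).
labels = ["c", "a", "b", "c", "b", "c"]
left = pd.Series([True, False, False, False, True, False], index=labels)

# Sorting reorders the labels, so the two indexes are no longer equal...
right = left.sort_index()
print(left.index.equals(right.index))  # False

# ...which makes `|` align on labels first: each occurrence of a repeated
# label is paired with every occurrence of that label in the other Series.
combined = left | right
print(len(combined))  # 14

# Predicted length: sum of squared label counts = 1**2 + 2**2 + 3**2 = 14
counts = left.index.value_counts()
print(int((counts ** 2).sum()))  # 14
```

If this explanation holds, `(df.index.value_counts() ** 2).sum()` on the question's df should give the same 126.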
# df (not shown) is indexed by last_name
# .loc also accepts a boolean condition to filter rows of data (.iloc only accepts a plain boolean array, not a boolean Series)
# Using .loc with a boolean test returns only the rows where the result is True
only_billys = df.loc[df["first_name"] == "Billy", :]
print(only_billys)
only_peters = df.loc[df["first_name"] == "Peter", :]
print(only_peters)
print()
only_richardsons = df.loc["Richardson", :]
print(only_richardsons)
print()
isBilly = (df["first_name"] == "Billy").sort_index()
print(isBilly.describe())
print()
isPeter = (df["first_name"] == "Peter")
print(isPeter.describe())
print()
billy_or_peter = isPeter | isBilly
print(billy_or_peter.describe())
print(billy_or_peter)
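One way to avoid the surprise, sketched under the assumption that only the final listing needs to be sorted: build both masks from the same unsorted column, so their indexes are identical and `|` works element by element, or do it in one step with `Series.isin`, then call `sort_index()` on the result. The four-row DataFrame is a hypothetical stand-in for the question's df, which isn't shown.

```python
import pandas as pd

# Hypothetical stand-in for the question's df: indexed by last_name,
# with "Richardson" appearing twice.
df = pd.DataFrame(
    {"first_name": ["Peter", "Billy", "Donald", "Billy"]},
    index=pd.Index(["Richardson", "Clark", "Richardson", "Price"], name="last_name"),
)

# Both masks come from the same column, so their indexes are identical and
# `|` combines them position by position, with no label alignment.
is_peter = df["first_name"] == "Peter"
is_billy = df["first_name"] == "Billy"
billy_or_peter = is_peter | is_billy

# Equivalent one-step version.
billy_or_peter_isin = df["first_name"].isin(["Billy", "Peter"])

# Sort the combined result, not one of the operands, if sorted output is wanted.
billy_or_peter_sorted = billy_or_peter.sort_index()

print(len(billy_or_peter))  # 4, same as len(df)
print(billy_or_peter.equals(billy_or_peter_isin))  # True
print(billy_or_peter_sorted)
```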
Output
(only_billys)
id first_name Phone Number Time zone
last_name
Clark 20 Billy 62-(213)345-2549 Asia/Makassar
Andrews 23 Billy 86-(859)746-5367 Asia/Chongqing
Price 59 Billy 86-(878)547-7739 Asia/Shanghai
(only_peters)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
(only_richardsons)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
Richardson 25 Donald 62-(259)282-5871 Asia/Jakarta
(isBilly.describe() - sorted index)
count 100
unique 2
top False
freq 97
Name: first_name, dtype: object
(isPeter.describe())
count 100
unique 2
top False
freq 99
Name: first_name, dtype: object
(billy_or_peter.describe() - 126 rows???)
count 126
unique 2
top False
freq 121
Name: first_name, dtype: object
(billy_or_peter listing - notice 4 Richardsons where before there were only 2)
last_name
Adams False
Allen False
Andrews True
Austin False
Baker False
Banks False
Bell False
Berry False
Bishop False
Black False
Brooks False
Brown False
Bryant False
Bryant False
Bryant False
Bryant False
Burke False
Butler False
Butler False
Butler False
Butler False
Carroll False
Chapman False
Chavez False
Clark True
Collins False
Cook False
Day False
Day False
Day False
...
Price True
Reid False
Reyes False
Rice False
*Richardson True
*Richardson True
*Richardson False
*Richardson False
Riley False
Roberts False
Robertson False
Robinson False
Rogers False
Scott False
Shaw False
Shaw False
Shaw False
Shaw False
Simmons False
Snyder False
Sullivan False
Torres False
Tucker False
Vasquez False
Wagner False
Walker False
Washington False
Watkins False
Wells False
Williamson False
Name: first_name, Length: 126, dtype: bool
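The four Richardson rows follow from label alignment pairing every occurrence with every occurrence. In the question's data Richardson occurs twice (Peter and Donald), so isPeter holds True and False for that label and isBilly holds False and False. Pairing each left value with each right value gives 2 × 2 = 4 rows: the two pairs that include Peter's True evaluate to True, and the two that include Donald's False evaluate to False, which matches the listing. The three rows below are hypothetical, modelled on the question's output; the extra Clark row is there only so that sorting actually changes the index order (with Richardson alone, the sorted index would equal the original and no alignment would happen).

```python
import pandas as pd

# Hypothetical rows modelled on the question's output: two Richardsons
# (Peter and Donald) and one Clark (Billy), indexed by last_name.
df = pd.DataFrame(
    {"first_name": ["Peter", "Billy", "Donald"]},
    index=pd.Index(["Richardson", "Clark", "Richardson"], name="last_name"),
)
print(df.index.is_unique)  # False: "Richardson" is repeated

is_peter = df["first_name"] == "Peter"                 # Richardson: True, False
is_billy = (df["first_name"] == "Billy").sort_index()  # Richardson: False, False

billy_or_peter = is_peter | is_billy
print(len(billy_or_peter))  # 5: Clark once, Richardson 2 x 2 = 4 times

# Count True and False among the aligned Richardson rows.
richardsons = billy_or_peter.loc["Richardson"]
print(int(richardsons.sum()), int(richardsons.eq(False).sum()))  # 2 2
```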
【Discussion】:
Tags: python pandas dataframe series