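【Posted】:2020-04-10 15:19:52
【Question】:
I am writing a script that draws a sample from each category in an Excel file. The script runs, but the results are not what I expect: I only get 2 samples. I want the script to draw 1%, 3%, or 5% from each category, unless the category contains only a few items, in which case I want a sample of 2. I have copied the code below. Sorry for the large block of text; I thought seeing the whole code would help. Any help resolving this would be greatly appreciated.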
#imports
import pandas as pd
#read file
df = pd.read_excel(r"C:\Users\***\Desktop\***.xlsx")
#check for certain condition (Y)
df2 = df.loc[(df['Track Item']=='Y')]
print(len(df2))
#unique categories and subcategories
categories = df2['Category'].unique()
subcategories = df2['Subcategory'].unique()
#check for empty subcategories
subcategory = df2['Subcategory'].isnull().all()
#taking a sample based on whether subcategory is empty and the number of y-tracked items
if subcategory == True:
    def sample_per(df2):
        if len(df2) >= 1500:
            for category in categories:
                return df2.loc[(df2["Category"] == category)].apply(lambda x: x.sample(n=2) if
                    x.size*0.01 < 2 else x.sample(frac=0.01))
        elif len(df2) < 15000 and len(df2) > 10000:
            for category in categories:
                return df2.loc[(df2["Category"] == category)].apply(lambda x: x.sample(n=2) if
                    x.size*0.03 < 2 else x.sample(frac=0.03))
        else:
            for category in categories:
                return df2.loc[(df2["Category"] == category)].apply(lambda x: x.sample(n=2) if
                    x.size*0.05 < 2 else x.sample(frac=0.05))
else:
    def sample_per(df2):
        if len(df2) >= 1500:
            for subcategory in subcategories:
                return df2.loc[(df2["Subcategory"] == subcategory)].apply(lambda x: x.sample(n=2) if
                    x.size*0.01 < 2 else x.sample(frac=0.01))
        elif len(df2) < 15000 and len(df2) > 10000:
            for subcategory in categories:
                return df2.loc[(df2["Subcategory"] == subcategory)].apply(lambda x: x.sample(n=2) if
                    x.size*0.03 < 2 else x.sample(frac=0.03))
        else:
            for subcategory in subcategories:
                return df2.loc[(df2["Subcategory"] == subcategory)].apply(lambda x: x.sample(n=2) if
                    x.size*0.05 < 2 else x.sample(frac=0.05))
#result of sample_per function
final = sample_per(df2)
The spacing may look off because the lines are long; the indentation is correct.
【Discussion】:
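A few things in the question's code may explain the result. A `return` inside a `for` loop exits the function on the first iteration, so only the first category is ever sampled. `DataFrame.apply` defaults to `axis=0`, so the lambda receives one column at a time rather than one category. And because the first test is `len(df2) >= 1500`, the `elif` branch for 10000 to 15000 rows can never run. The following is a minimal sketch of one possible alternative using `groupby`, not a definitive fix. The helper names `pick_fraction` and `sample_per_group` are hypothetical, and the thresholds assume that `1500` was meant to be `15000`.

```python
import pandas as pd


def pick_fraction(n_rows):
    """Choose the sampling fraction from the total row count.

    Assumed thresholds (the question's code has 1500, taken here
    to be a typo for 15000): 15000 or more -> 1%,
    more than 10000 -> 3%, otherwise 5%.
    """
    if n_rows >= 15000:
        return 0.01
    if n_rows > 10000:
        return 0.03
    return 0.05


def sample_per_group(df, group_col, frac, min_n=2, random_state=None):
    """Sample `frac` of the rows in each group, but at least `min_n` rows.

    Groups with fewer than `min_n` rows are returned whole.
    Note: groupby drops rows whose key is NaN by default.
    """
    parts = []
    for _, group in df.groupby(group_col):
        # Rows requested by the fraction, raised to the minimum...
        n = max(min_n, round(len(group) * frac))
        # ...but never more rows than the group actually has.
        n = min(n, len(group))
        parts.append(group.sample(n=n, random_state=random_state))
    # Collect every group's sample instead of returning inside the loop.
    return pd.concat(parts) if parts else df.iloc[0:0]
```

With the question's variables, this could be called as `final = sample_per_group(df2, "Category", pick_fraction(len(df2)))`, or with `"Subcategory"` as the grouping column when that column is not entirely empty.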
Tags: python pandas if-statement lambda