一、统计数据频率
1. values_counts
pd.value_counts(df.column_name) df.column_name.value_counts() Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)[source] Return a Series containing counts of unique values.
参数详解
normalize : boolean, default False If True then the object returned will contain the relative frequencies of the unique values. sort : boolean, default True Sort by values. ascending : boolean, default False Sort in ascending order. bins : integer, optional Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data. dropna : boolean, default True Don’t include counts of NaN.
参数示例讲解
index = pd.Index([3, 1, 2, 3, 4, np.nan]) index.value_counts() Out[144]: 3.0 2 4.0 1 2.0 1 1.0 1 dtype: int64 index.value_counts(normalize=True) Out[145]: 3.0 0.4 4.0 0.2 2.0 0.2 1.0 0.2 dtype: float64 index.value_counts(bins=3) Out[146]: (2.0, 3.0] 2 (0.996, 2.0] 2 (3.0, 4.0] 1 dtype: int64 index.value_counts(dropna=False) Out[148]: 3.0 2 NaN 1 4.0 1 2.0 1 1.0 1 dtype: int64
In [21]: data=pd.DataFrame(pd.Series([1,2,3,4,5,6,11,1,1,1,1,2,2,2,2,3]).values.reshape(4,4),columns=['a','b','c','d']) In [22]: data Out[22]: a b c d 0 1 2 3 4 1 5 6 11 1 2 1 1 1 2 3 2 2 2 3 In [23]: pd.value_counts(data.a) Out[23]: 1 2 2 1 5 1 Name: a, dtype: int64 In [26]: pd.value_counts(data.a).sort_index() Out[26]: 1 2 2 1 5 1 Name: a, dtype: int64 In [27]: pd.value_counts(data.a).sort_index().index Out[27]: Int64Index([1, 2, 5], dtype='int64') In [28]: pd.value_counts(data.a).sort_index().values Out[28]: array([2, 1, 1], dtype=int64)