Pandas: Filling random empty rows with data

【Question title】: Pandas: Filling random empty rows with data
【Posted】: 2018-12-08 19:09:09
【Question】:

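I have a dataframe with several columns that are currently empty. I want a fraction of their cells filled with data drawn from a normal distribution, while the rest stay blank. So, for example, if 60% of the elements should be blank, then 60% will be blank and the other 40% will be filled. I already have the normal distribution through numpy, but I am trying to figure out how to pick random rows to fill. Currently, the only way I can think of involves FOR loops, which I would rather avoid.

Does anyone have an idea of how to randomly fill the empty elements of a dataframe? Below is the code I use for the random numbers.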

data.loc[data['ColumnA'] == 'B', 'ColumnC'] = np.random.normal(1000, 500, rowsB).astype('int64')

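【Comments】:

How about getting random indices of rows and adding data to them?
I can think of plenty of ways to do what I think you are trying to do. The problem is that I can't tell exactly what you are trying to do, and I don't want to waste time guessing. You can improve your question by producing a minimal reproducible example. Edit your question and I'm sure you'll get your answer.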

Tags: python-3.x pandas random

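【Solution 1】:

piRSquared's suggestion is a good one. We can only guess at what problem is being solved. Having just browsed some of the most recent unanswered pandas questions, it gets a lot worse.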

import pandas as pd
import numpy as np

# Some redundancy here: I build an empty DataFrame to pretend I start, like you, with an existing one.
df = pd.DataFrame(index=range(11), columns=list('abcdefg'))
num_cells = np.prod(df.shape)

# Make a 1-D array with the numbers from 1 to the number of cells.
arr = np.arange(1, num_cells + 1)

# In-place shuffle: this is the key randomization operation.
np.random.shuffle(arr)

# Reshape the shuffled values to the DataFrame's 2-D shape.
arr = arr.reshape(df.shape)

# Place the shuffled values, normalized by the number of cells, into the DataFrame.
df = pd.DataFrame(index=df.index, columns=df.columns, data=arr / float(num_cells))

# Use np.where to keep 40% of the cells as ones and set the other 60% to NaN.
df = pd.DataFrame(np.where(df > 0.6, 1.0, np.nan), index=df.index, columns=df.columns)

# Now sample a full set from the normal distribution.
# Multiplying by NaN nullifies the sampled value, while multiplying by 1 retains it.
df = df * np.random.normal(1000, 500, df.shape)

So you are left with a random 40% of the cells containing draws from your normal distribution.
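The code above fills random cells across the whole frame, whereas the question asks about random rows of a single column. A minimal sketch of that row-wise variant follows. It reuses the column names `ColumnA` and `ColumnC` from the question; the sample values, the seed, and the 40% fraction are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical data: column names follow the question, the values are made up.
data = pd.DataFrame({"ColumnA": list("ABBABBBABB"), "ColumnC": np.nan})

# Rows eligible for filling, mirroring the question's .loc condition.
eligible = data.index[data["ColumnA"] == "B"].to_numpy()

# Pick 40% of the eligible rows at random, without replacement.
n_fill = int(round(0.4 * len(eligible)))
chosen = rng.choice(eligible, size=n_fill, replace=False)

# Fill only the chosen rows; every other row keeps its NaN.
data.loc[chosen, "ColumnC"] = rng.normal(1000, 500, n_fill).round()
```

No loop is needed because `.loc` accepts the whole array of chosen labels at once.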

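If your dataframe is large, you can rely on the stability of the uniform rand() function: thresholding uniform draws will leave close to the desired fraction of cells. Here I did not do that, and instead explicitly determined how many cells fall above and below the threshold.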
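A rough sketch of the simpler uniform-draw approach mentioned above, assuming a larger frame than the 11-row example (the 1000-row size and the seed are arbitrary choices for illustration). Each cell is kept independently with probability 0.4, so the filled fraction is only approximately 40%:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical larger empty frame; the size is an arbitrary assumption.
df = pd.DataFrame(index=range(1000), columns=list("abcdefg"), dtype=float)

# Each cell is kept independently with probability 0.4.
keep = rng.random(df.shape) < 0.4

# Draw a full set of normal samples, then blank out the cells that are not kept.
samples = rng.normal(1000, 500, df.shape)
df = pd.DataFrame(np.where(keep, samples, np.nan), index=df.index, columns=df.columns)

# Fraction of filled cells: close to 0.4, but not exactly 0.4.
filled_fraction = df.notna().to_numpy().mean()
print(filled_fraction)
```

The shuffle-based version trades a little extra work for an exact count of filled cells, which matters more on small frames.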

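【Comments】:

Hi. I'd like to know how I can use this code if I want to fill a dataframe that has different column types, e.g. some columns with integers and some with random dates. How can I fill such a dataframe with random dates and numbers? Thanks.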
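For the mixed-type case raised in this comment, one possible sketch is to build each column with its own dtype and blank out cells with `Series.where`. The column names `count` and `when`, the value ranges, the start date, and the helper `random_keep` are all hypothetical, introduced only for illustration. The nullable `Int64` dtype is used so that missing integers do not force the column to float.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
n_rows = 10

def random_keep(n, frac):
    """Hypothetical helper: boolean array with exactly round(frac * n) True values."""
    keep = np.zeros(n, dtype=bool)
    keep[rng.choice(n, size=int(round(frac * n)), replace=False)] = True
    return keep

# Fully populated columns first: random integers and random dates (made-up ranges).
ints = pd.Series(rng.integers(0, 2000, n_rows), dtype="Int64")
dates = pd.Series(
    pd.Timestamp("2018-01-01") + pd.to_timedelta(rng.integers(0, 365, n_rows), unit="D")
)

# Keep a random 40% of each column; the rest become <NA> or NaT respectively.
df = pd.DataFrame({
    "count": ints.where(random_keep(n_rows, 0.4)),
    "when": dates.where(random_keep(n_rows, 0.4)),
})
```

Each column gets its own mask here, so the filled rows differ between columns; reusing one mask would fill the same rows everywhere.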