Normalizing a list of numbers in Python: answers

【Question title】: Normalizing a list of numbers in Python
【Posted】: 2015-01-03 07:21:53
【Question】:

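I need to normalize a list of values to fit a probability distribution, i.e. every value between 0.0 and 1.0.

I understand how to normalize, but I was curious whether Python has a function that automates this.

I'd like to go from: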

raw = [0.07, 0.14, 0.07]

to:

normed = [0.25, 0.50, 0.25]

【Question comments】:

Why not [0.5, 1.0, 0.5]?
@Joran Because the OP wants sum(normed) == 1.0 (ignoring floating-point error).
If you want to normalize between different ranges, see this post: How to normalize a list of positive and negative decimal number to a specific range

Tags: python probability

【Answer 1】:

Use:

norm = [float(i)/sum(raw) for i in raw]

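to normalize against the sum, which ensures that the result always sums to 1.0 (or as close to it as floating point allows).

Use

norm = [float(i)/max(raw) for i in raw]

to normalize against the maximum.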
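
A minimal sketch of the two variants above with the denominator computed once, outside the comprehension. It assumes Python 3 (where / is true division) and a non-empty list with a non-zero sum and maximum; the function names normalize_by_sum and normalize_by_max are illustrative, not from any library.

```python
def normalize_by_sum(values):
    """Scale values so that they add up to 1.0 (up to rounding)."""
    total = sum(values)  # computed once, not once per element
    return [v / total for v in values]


def normalize_by_max(values):
    """Scale values so that the largest one becomes 1.0."""
    largest = max(values)
    return [v / largest for v in values]


raw = [0.07, 0.14, 0.07]
by_sum = normalize_by_sum(raw)  # roughly [0.25, 0.5, 0.25]
by_max = normalize_by_max(raw)  # roughly [0.5, 1.0, 0.5]
```

Only the first variant yields something usable as a probability distribution; dividing by the maximum does not make the values sum to 1.0.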

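【Comments】:

Nice. It may be worth noting that computing the sum up front, rather than once per element inside the comprehension, is more efficient. So: s = sum(raw); norm = [float(i)/s for i in raw]
Is it the same as (np.array(x) / np.array(x).sum()) / np.array(x).max()?
@alvas Sorry - I can't say for sure with numpy - but assuming that dividing an array by a single value divides every element of the array, it looks correct.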

【Answer 2】:

Try:

normed = [i/sum(raw) for i in raw]

normed
[0.25, 0.5, 0.25]

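【Comments】:

【Answer 3】:

There isn't any function in the standard library (to my knowledge) that does this, but there are definitely modules out there that do. However, it is simple enough that you can write your own function: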

def normalize(lst):
    s = sum(lst)
    # Build a list explicitly: in Python 3, map() returns a lazy iterator
    return [float(x) / s for x in lst]

Sample output:

>>> normed = normalize(raw)
>>> normed
[0.25, 0.5, 0.25]

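【Comments】:

This is one of the two answers that pull sum() out of the loop... I still prefer my own answer, but I think this one deserves a +1, precisely for the helper variable s = sum(lst).
normalize([1,0,-1]) will raise ZeroDivisionError :)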
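
The zero-sum case from the last comment can be turned into an explicit error. This is only a sketch: the name safe_normalize and the choice of ValueError are assumptions made for illustration, and Python 3 division is assumed.

```python
def safe_normalize(lst):
    """Return lst scaled to sum to 1.0; reject inputs whose sum is zero."""
    s = sum(lst)
    if s == 0:
        # Dividing would raise ZeroDivisionError; fail with a clearer message.
        raise ValueError("cannot normalize: the values sum to zero")
    return [x / s for x in lst]


ok = safe_normalize([1, 1, 2])

try:
    safe_normalize([1, 0, -1])
    message = None
except ValueError as exc:
    message = str(exc)
```

Note that a list with negative entries cannot become a probability distribution by dividing by its sum, even when the sum is non-zero, because some results stay negative.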

【Answer 4】:

How long is the list you are going to normalize?

def psum(it):
    "This function makes explicit how many calls to sum() are done."
    print("Another call!")
    return sum(it)

raw = [0.07, 0.14, 0.07]
print("How many calls to sum()?")
print([r / psum(raw) for r in raw])

print("\nAnd now?")
s = psum(raw)
print([r / s for r in raw])

# if one doesn't want auxiliary variables, it can be done inside
# a list comprehension, but in my opinion it's quite Baroque
print("\nAnd now?")
print([r / s for s in [psum(raw)] for r in raw])

Output

# How many calls to sum()?
# Another call!
# Another call!
# Another call!
# [0.25, 0.5, 0.25]
# 
# And now?
# Another call!
# [0.25, 0.5, 0.25]
# 
# And now?
# Another call!
# [0.25, 0.5, 0.25]
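
The same experiment can be run without printing by counting the calls. In this sketch the helper counted_sum and the counter dictionary are made up for illustration; the point is that the naive comprehension calls sum() once per element, so its cost grows quadratically with the list length.

```python
counter = {"calls": 0}


def counted_sum(values):
    """Wrapper around sum() that records how often it is called."""
    counter["calls"] += 1
    return sum(values)


raw = [0.07, 0.14, 0.07]

# Naive version: sum() is evaluated again for every element.
counter["calls"] = 0
naive = [r / counted_sum(raw) for r in raw]
naive_calls = counter["calls"]

# Hoisted version: sum() is evaluated exactly once.
counter["calls"] = 0
s = counted_sum(raw)
hoisted = [r / s for r in raw]
hoisted_calls = counter["calls"]
```

Both versions produce the same list; only the number of passes over the data differs.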

【Comments】:

【Answer 5】:

Try this:

from __future__ import division

raw = [0.07, 0.14, 0.07]  

def norm(input_list):
    norm_list = list()

    if isinstance(input_list, list):
        sum_list = sum(input_list)

        for value in input_list:
            tmp = value / sum_list
            norm_list.append(tmp)

    return norm_list

print(norm(raw))

This will do what you asked. But I would suggest trying min-max normalization.

Min-max normalization:

def min_max_norm(dataset):
    # Defined before the check so that non-list input returns an empty list
    norm_list = list()

    if isinstance(dataset, list):
        min_value = min(dataset)
        max_value = max(dataset)

        for value in dataset:
            tmp = (value - min_value) / (max_value - min_value)
            norm_list.append(tmp)

    return norm_list

【Comments】:

Thanks for the min-max normalization code.
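
One case the min-max formula does not cover is a list whose values are all equal, where max - min is zero and the division fails. A possible guard is sketched below; the name safe_min_max_norm and the decision to map a constant list to 0.0 are assumptions, not part of the answer above.

```python
def safe_min_max_norm(dataset):
    """Rescale values linearly so that the minimum maps to 0.0 and the maximum to 1.0."""
    values = list(dataset)
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal; mapping them to 0.0 is an arbitrary choice.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


scaled = safe_min_max_norm([2, 4, 6])
constant = safe_min_max_norm([5, 5])
```

Keep in mind that min-max scaling puts every value in [0.0, 1.0] but does not make the values sum to 1.0, so the result is not a probability distribution.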

【Answer 6】:

If your list contains negative numbers, this is how you would normalize it:

a = range(-30,31,5)
norm = [(float(i)-min(a))/(max(a)-min(a)) for i in a]
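
The same idea extends to an arbitrary target range, with min() and max() evaluated once instead of once per element. The function rescale and its parameters new_min and new_max are hypothetical names chosen for this sketch; Python 3 is assumed.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Map values linearly so that min(values) -> new_min and max(values) -> new_max."""
    values = list(values)
    lo, hi = min(values), max(values)  # computed once, outside the comprehension
    if hi == lo:
        raise ValueError("all values are equal; the source range is empty")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


a = range(-30, 31, 5)
unit = rescale(a)               # -30 maps to 0.0, 30 maps to about 1.0
signed = rescale(a, -1.0, 1.0)  # -30 maps to -1.0, 30 maps to about 1.0
```

With the default arguments this gives the same values as the comprehension above, up to floating-point rounding.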

【Comments】:

【Answer 7】:

If you are willing to use numpy, you can get a faster solution.

import random, time
import numpy as np

a = random.sample(range(1, 20000), 10000)
since = time.time(); b = [i/sum(a) for i in a]; print(time.time()-since)
# 0.7956490516662598

since = time.time(); c=np.array(a);d=c/sum(a); print(time.time()-since)
# 0.001413106918334961

【Comments】:

Are you sure this equation is right? In d I …
@ScipioAfricanus random.sample only gives you integers here. If you need floats, look at `np.random.uniform` or something similar.

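【Answer 8】:

If you work with data, pandas is often the simple key.

This particular code puts raw into a single column and then normalizes each row by the column total. (We could also put it into a single row and normalize each column by the row total. Just change the axis values, where 0 is for rows and 1 is for columns.)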

import pandas as pd


raw = [0.07, 0.14, 0.07]  

raw_df = pd.DataFrame(raw)
normed_df = raw_df.div(raw_df.sum(axis=0), axis=1)
normed_df

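normed_df then holds a single column with the normalized values (approximately 0.25, 0.50, 0.25).

From there you can keep playing with the data!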

【Comments】:

【Answer 9】:

If you want to use scikit-learn, you can use:

from sklearn.preprocessing import normalize

x = [1,2,3,4]
normalize([x]) # array([[0.18257419, 0.36514837, 0.54772256, 0.73029674]])
normalize([x], norm="l1") # array([[0.1, 0.2, 0.3, 0.4]])
normalize([x], norm="max") # array([[0.25, 0.5 , 0.75, 1.]])

【Comments】:

Or a completely different normalization: from sklearn.utils.extmath import softmax or from scipy.special import softmax
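
For readers unfamiliar with softmax, here is a standard-library sketch of the idea: exponentiate each value and divide by the sum of the exponentials. It is not the implementation used by the libraries named in the comment, and the function below is written only for illustration.

```python
import math


def softmax(values):
    """Map arbitrary real numbers to positive weights that sum to 1.0."""
    values = list(values)
    shift = max(values)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1, 2, 3, 4])
```

Unlike dividing by the sum, softmax accepts negative inputs and always returns strictly positive weights, but it does not preserve the ratios between the original values.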

【Answer 10】:

Using scikit-learn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1,2,3]).reshape(-1, 1)
scaler = MinMaxScaler()
scaler.fit(data)
print(scaler.transform(data))

【Comments】:

【Answer 11】:

Here is a one-liner similar to the top answer that is not terribly inefficient (it computes the sum only once):

norm = (lambda the_sum:[float(i)/the_sum for i in raw])(sum(raw))

A similar approach works for lists that contain negative numbers:

norm = (lambda the_max, the_min: [(float(i)-the_min)/(the_max-the_min) for i in raw])(max(raw),min(raw))

【Comments】: