【Question title】: Similarity between lists of floats [closed]
【Posted】: 2021-08-18 21:52:23
【Question】:

I have a list of floats that I want to compare with other lists to get a similarity score in Python.

The list I want to compare:

[0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001]

One of the other lists:

[0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000]

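I tried converting them to strings and using the fuzzywuzzy library, python-Levenshtein, and difflib to compare the strings and get a ratio, but that did not give me the results I wanted, and those approaches are very slow. I searched and could not find anything on this.

What is the best way to compare two lists of floats?

I would like to know whether there is a native way to measure the similarity of lists of floats, or a library that does the job, in the same way that there are many examples for string comparison.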

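【Question comments】:

  • What is the expected output in this specific case? Also, when are two numbers considered similar? How do you measure similarity?
  • The expected output is a number between 0 and 100, or between 0 and 1. 100 means identical and 0 means completely different.
  • In an element-by-element comparison, 0.0001 and 0.0002 are more similar than 0.0001 and 0.0005, and so on. All elements need to be compared and a single score needs to be output. I am sure there are libraries or methods for checking whether one list of floats is similar to another, but I could not find anything.
  • You need to specify what 0% and 100% difference mean. For example: what is the percentage difference between 0.1 and 0.2? Between 0.1 and 100? In which case would the difference be 0%? What if one of the numbers tends to infinity?
  • The most likely reason your question was downvoted is that you did not define your problem statement clearly. You need to provide a numerical metric for what "similar" means in your case, because "similar" is not a well-defined mathematical concept here.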

Tags: python python-3.x levenshtein-distance difflib


【Answer 1】:

I don't think the question is entirely clear, but you can see whether the following approach helps:

import numpy as np

# l1 is the list to compare; l2 is one of the other lists from the question.
l1 = np.array([0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001])
l2 = np.array([0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])

# Mean squared error: the average of the squared element-wise differences.
mse1 = ((l1 - l2)**2).mean()
# Out: 6.699999999999999e-08

# Same l1, but the first three values of l2 are now off by about 1.
l2 = np.array([1.0000,1.0002,1.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])

mse2 = ((l1 - l2)**2).mean()
# Out: 0.15000006700000001

# The first pair of lists is closer than the second.
mse1 < mse2
# Out: True

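You won't get a value between 0 and 1, but you can compare the results: the closer the value is to 0, the more alike the two lists are. mse stands for mean squared error. There are other metrics that may be relevant to you, such as msle (mean squared logarithmic error) and mae (mean absolute error).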
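If a bounded score is needed, one possible sketch is to compute the mean absolute error and squash it into the range (0, 1]. This is only one of many reasonable definitions, not something the question or the answer prescribes. The helper names `mae` and `similarity` and the `scale` parameter are assumptions introduced for illustration; `scale` is the error at which the score drops to 0.5 and has to be chosen to suit the data.

```python
import numpy as np


def mae(a, b):
    """Mean absolute error between two equal-length sequences of floats."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b).mean()


def similarity(a, b, scale):
    """Map the MAE to a score in (0, 1].

    Returns 1.0 for identical lists and approaches 0 as the lists diverge.
    `scale` is a hypothetical tuning parameter: the MAE that yields 0.5.
    """
    return 1.0 / (1.0 + mae(a, b) / scale)


# The two lists from the question.
l1 = [0.0000, 0.0003, -0.0001, 0.0002, 0.0001, 0.0003, 0.0000, 0.0000,
      -0.0002, 0.0002, -0.0002, 0.0002, 0.0000, 0.0000, -0.0002, 0.0000,
      0.0000, 0.0000, -0.0002, -0.0001]
l2 = [0.0000, 0.0002, 0.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000,
      0.0001, 0.0003, -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002,
      0.0002, 0.0005, -0.0010, 0.0000]
# The modified list from the answer: the first three values are off by about 1.
l3 = [1.0000, 1.0002, 1.0000] + l2[3:]

# Assumed scale: differences of about 1e-4 count as "half similar".
score_close = similarity(l1, l2, scale=1e-4)
score_far = similarity(l1, l3, scale=1e-4)
print(score_close, score_far)
```

With these inputs the MAE between `l1` and `l2` is about 1.8e-4, so `score_close` is well above `score_far`. Multiply the score by 100 to get a 0 to 100 scale.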

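【Comments】:

  • Thanks. I hope this helps others too: string comparison is a well-explained topic, but working with lists of numbers is not well explained for people who are not mathematically inclined.
  • @ElyesLounissi, glad the answer helped. Next time, try to provide the expected output; that will increase your chances of getting more answers. Feel free to upvote the answer if you like it. Otherwise: happy coding!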