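【Posted】:2011-09-29 18:07:15
【Question】:
I want to compute the mutual information between two features in Java.
I have already read Calculating Mutual Information For Selecting a Training Set in Java, but that was a discussion of whether mutual information was appropriate for the poster, with only some light pseudocode on how to implement it.
My current code is below, but I am hoping there is a way to optimise it, as I have a large amount of information to process. I am aware that calling out to another language/framework may improve speed, but for now I would like to focus on solving this in Java.
Any help is much appreciated.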
public static double calculateNewMutualInformation(double frequencyOfBoth, double frequencyOfLeft,
        double frequencyOfRight, int noOfTransactions) {
    if (frequencyOfBoth == 0 || frequencyOfLeft == 0 || frequencyOfRight == 0)
        return 0;
    // supp = f11
    double supp = frequencyOfBoth / noOfTransactions; // P(x,y)
    double suppLeft = frequencyOfLeft / noOfTransactions; // P(x)
    double suppRight = frequencyOfRight / noOfTransactions; // P(y)
    double f10 = (suppLeft - supp); // P(x) - P(x,y)
    double f00 = (1 - suppRight) - f10; // 1 - P(x) - P(y) + P(x,y)
    double f01 = (suppRight - supp); // P(y) - P(x,y)
    // H(X) = -(P(x) * log(P(x)) + (1 - P(x)) * log(1 - P(x)))
    double HX = -1 * ((suppLeft * MathUtils.logWithoutNaN(suppLeft)) + ((1 - suppLeft) * MathUtils.logWithoutNaN(1 - suppLeft)));
    // H(Y) = -(P(y) * log(P(y)) + (1 - P(y)) * log(1 - P(y)))
    double HY = -1 * ((suppRight * MathUtils.logWithoutNaN(suppRight)) + ((1 - suppRight) * MathUtils.logWithoutNaN(1 - suppRight)));
    double one = (supp * MathUtils.logWithoutNaN(supp)); // P(x,y) * log(P(x,y))
    double two = (f10 * MathUtils.logWithoutNaN(f10)); // f10 * log(f10)
    double three = (f01 * MathUtils.logWithoutNaN(f01)); // f01 * log(f01)
    double four = (f00 * MathUtils.logWithoutNaN(f00)); // f00 * log(f00)
    // H(X,Y) = -(sum of p * log(p) over the four cells f11, f10, f01, f00)
    double HXY = -1 * (one + two + three + four);
    // Mutual information H(X) + H(Y) - H(X,Y), normalised by H(X)
    return (HX + HY - HXY) / (HX == 0 ? MathUtils.EPSILON : HX);
}
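For reference, the quantities the method above appears to compute can be written out as follows. This is a reading of the posted code, not something stated in the post: the symbols p_{11}, p_{10}, p_{01}, p_{00} are notation introduced here for `supp`, `f10`, `f01` and `f00`, and log is the natural logarithm, since the code calls `Math.log`.

```latex
\[
\begin{aligned}
p_{11} &= P(x,y), \quad p_{10} = P(x) - P(x,y), \quad p_{01} = P(y) - P(x,y), \\
p_{00} &= 1 - P(x) - P(y) + P(x,y), \\
H(X)   &= -\bigl[\, P(x)\log P(x) + (1 - P(x))\log(1 - P(x)) \,\bigr], \\
H(Y)   &= -\bigl[\, P(y)\log P(y) + (1 - P(y))\log(1 - P(y)) \,\bigr], \\
H(X,Y) &= -\sum_{i,j \in \{0,1\}} p_{ij}\log p_{ij}, \\
I(X;Y) &= H(X) + H(Y) - H(X,Y), \\
\text{returned value} &= \frac{I(X;Y)}{H(X)} .
\end{aligned}
\]
```

Here `suppLeft` is P(x), `suppRight` is P(y), and `HX`, `HY`, `HXY` are H(X), H(Y), H(X,Y). The result is therefore the mutual information normalised by H(X), not the raw mutual information.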
public class MathUtils {
    public static final double EPSILON = 0.000001;

    // Math.log variant that never returns NaN or -Infinity
    public static double logWithoutNaN(double value) {
        if (value == 0) {
            return Math.log(EPSILON);
        } else if (value < 0) {
            return 0;
        }
        return Math.log(value);
    }
}
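One possible direction for the optimisation asked about, offered as a sketch and not a measured result: if the frequencies are integer counts and many feature pairs share the same `noOfTransactions`, the entropy of counts c_1..c_m summing to N equals ln N - (1/N) * sum(c_i * ln c_i), so the values k * ln(k) can be precomputed once and each call then needs only array lookups instead of eight `Math.log` calls. The class name `MutualInformationTable` and its method are hypothetical. Unlike the posted method, this version does not return 0 when `frequencyOfBoth` is 0.

```java
/**
 * Sketch: normalised mutual information for two binary features, computed
 * from integer counts with a precomputed table of k * ln(k).
 * Class and method names are hypothetical, not from the original post.
 */
final class MutualInformationTable {
    private final int n;           // total number of transactions
    private final double logN;     // ln(n), cached
    private final double[] kLogK;  // kLogK[k] = k * ln(k), with kLogK[0] = 0

    MutualInformationTable(int noOfTransactions) {
        if (noOfTransactions <= 0) {
            throw new IllegalArgumentException("noOfTransactions must be positive");
        }
        n = noOfTransactions;
        logN = Math.log(n);
        kLogK = new double[n + 1];
        for (int k = 1; k <= n; k++) {
            kLogK[k] = k * Math.log(k);
        }
    }

    /** Returns (H(X) + H(Y) - H(X,Y)) / H(X), or 0 when X is constant. */
    double normalizedMutualInformation(int both, int left, int right) {
        int n10 = left - both;             // x present, y absent
        int n01 = right - both;            // x absent, y present
        int n00 = n - left - right + both; // neither present
        if (both < 0 || n10 < 0 || n01 < 0 || n00 < 0) {
            throw new IllegalArgumentException("inconsistent counts");
        }
        if (left == 0 || left == n) {
            return 0; // H(X) is 0, so there is nothing to normalise by
        }
        // Entropy from counts: ln(n) - (sum of c * ln(c)) / n
        double hx = logN - (kLogK[left] + kLogK[n - left]) / n;
        double hy = logN - (kLogK[right] + kLogK[n - right]) / n;
        double hxy = logN - (kLogK[both] + kLogK[n10] + kLogK[n01] + kLogK[n00]) / n;
        return (hx + hy - hxy) / hx;
    }
}
```

The table costs 8 bytes per transaction (about 80 MB for ten million transactions), so whether this helps depends on the data size and on how many pairs are evaluated per table. Profiling the original method first would show whether the `Math.log` calls are the bottleneck at all.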
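【Comments】:
-
Have you measured the performance and found it to be slow?
-
Nice question, but could you map each symbol to its variable in the context of mutual information? I am a bit confused.
Tags: java optimization machine-learning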