Simple accord.net machine learning example: answers

【Question title】: Simple accord.net machine learning example
【Posted】: 2017-03-27 04:24:47
【Question】:

I'm new to machine learning and also new to accord.net (I write code in C#).

I want to create a simple project that looks at a simple oscillating time series, lets accord.net learn it, and then predicts what the next value will be.

This is what the data (time series) looks like:

X - Y

Then I want it to predict the following:

X - Y

Could you help me with some examples of how to solve this?

【Question discussion】:

Tags: c# machine-learning accord.net

【Solution 1】:

One simple approach is to use an Accord ID3 decision tree.

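The trick is working out which inputs to use. You can't just train on X: the tree won't learn anything about future values of X from that. But you can build some features derived from X (or from previous values of Y) that will be useful.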
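To make the point about raw X concrete, here is a small sketch in plain C# (no Accord.NET; the class and method names are hypothetical) that does no learning and only checks coverage. With training x = 1..12 and unseen x = 13..24, none of the raw test X values was ever seen in training, whereas every value of each x % n feature was, so the tree has already seen examples for every feature value it will later be queried with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers comparing raw X with the x % n features used in the answer.
public static class FeatureCoverage
{
    // Same features as CalcInputs in the answer: x % 2 .. x % 6.
    public static int[] ModFeatures(int x) => new[] { x % 2, x % 3, x % 4, x % 5, x % 6 };

    // True if the raw value x itself occurs in the training data.
    public static bool RawXSeen(int x, IEnumerable<int> trainingXs) => trainingXs.Contains(x);

    // True if, for every feature position, the query's value for that feature
    // also occurs for some training x.
    public static bool AllFeatureValuesSeen(int x, IEnumerable<int> trainingXs)
    {
        int[] query = ModFeatures(x);
        for (int i = 0; i < query.Length; i++)
        {
            int position = i;
            int value = query[position];
            if (!trainingXs.Any(t => ModFeatures(t)[position] == value))
                return false;
        }
        return true;
    }
}
```

Because 12 training cases cover every remainder of x % n for n up to 6, `AllFeatureValuesSeen` holds for every x in 13..24, while `RawXSeen` is false for all of them.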

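Normally, for a problem like this, you would base each prediction on features derived from previous values of Y (the thing being predicted) rather than from X. However, that assumes you can observe Y sequentially between predictions (so you can't predict for an arbitrary X), so I'll stick to the question as asked.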

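I had a go at building an Accord ID3 decision tree to solve the problem, shown below. I used a few different values of x % n as features, hoping the tree could work out the answer from them. In fact, if I had added (x-1) % 4 as a feature, the tree could do it in a single level using just that attribute, but I think the point is to let the tree find the pattern.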
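The single-feature remark can be checked directly. The sketch below (plain C#; `IsSufficient` is a hypothetical helper, `CalcY` mirrors the answer's code) groups x = 1..24 by a candidate feature and tests whether every group maps to exactly one y, which is the condition for a one-level split on that feature to be exact. It confirms that (x-1) % 4 is sufficient. So is x % 4, which is already in the feature list, since it is just (x-1) % 4 shifted by one. x % 2 or x % 3 alone is not sufficient.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleFeatureCheck
{
    // Same sequence and CalcY as in the answer.
    static readonly int[] ysequence = { 1, 2, 3, 2 };

    public static int CalcY(int x) => ysequence[(x - 1) % 4];

    // True if feature(x) uniquely determines CalcY(x) over the given xs,
    // i.e. all xs sharing a feature value also share the same y.
    public static bool IsSufficient(Func<int, int> feature, IEnumerable<int> xs) =>
        xs.GroupBy(feature)
          .All(group => group.Select(x => CalcY(x)).Distinct().Count() == 1);
}
```

For example, `IsSufficient(x => x % 2, Enumerable.Range(1, 24))` is false because odd x values map to both y = 1 and y = 3.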

Here is the code:

    using System;
    using System.Linq;
    using Accord.MachineLearning.DecisionTrees;
    using Accord.MachineLearning.DecisionTrees.Learning;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class AccordID3Tests // class name is illustrative
    {
        // this is the sequence y follows
        int[] ysequence = new int[] { 1, 2, 3, 2 };

        // this generates the correct Y for a given X (x >= 1)
        int CalcY(int x) => ysequence[(x - 1) % 4];

        // this generates the inputs: a few different mods of x
        int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };

        // for http://stackoverflow.com/questions/40573388/simple-accord-net-machine-learning-example
        [TestMethod]
        public void AccordID3TestStackOverFlowQuestion2()
        {
            // build the training data set
            int numtrainingcases = 12;
            int[][] inputs = new int[numtrainingcases][];
            int[] outputs = new int[numtrainingcases];

            Console.WriteLine("\t\t\t\t x \t y");
            for (int x = 1; x <= numtrainingcases; x++)
            {
                int y = CalcY(x);
                inputs[x - 1] = CalcInputs(x);
                outputs[x - 1] = y;
                Console.WriteLine("TrainingData \t " + x + "\t " + y);
            }

            // define how many values each input can have (x % n lies in 0..n-1)
            DecisionVariable[] attributes =
            {
                new DecisionVariable("Mod2", 2),
                new DecisionVariable("Mod3", 3),
                new DecisionVariable("Mod4", 4),
                new DecisionVariable("Mod5", 5),
                new DecisionVariable("Mod6", 6)
            };

            // number of output classes: labels run from 0 to Max, so +1 (y itself never uses 0)
            int classCount = outputs.Max() + 1;

            // create the tree
            DecisionTree tree = new DecisionTree(attributes, classCount);

            // create a new instance of the ID3 algorithm
            ID3Learning id3learning = new ID3Learning(tree);

            // learn the training instances; this populates the tree
            id3learning.Learn(inputs, outputs);

            Console.WriteLine();
            // now try to predict some cases that weren't in the training data
            for (int x = numtrainingcases + 1; x <= 2 * numtrainingcases; x++)
            {
                int[] query = CalcInputs(x);

                int answer = tree.Decide(query); // makes the prediction

                Assert.AreEqual(CalcY(x), answer); // check the tree got it right
                Console.WriteLine("Prediction \t\t " + x + "\t " + answer);
            }
        }
    }

Here is the output it produces:

                         x   y
    TrainingData     1   1
    TrainingData     2   2
    TrainingData     3   3
    TrainingData     4   2
    TrainingData     5   1
    TrainingData     6   2
    TrainingData     7   3
    TrainingData     8   2
    TrainingData     9   1
    TrainingData     10  2
    TrainingData     11  3
    TrainingData     12  2

    Prediction       13  1
    Prediction       14  2
    Prediction       15  3
    Prediction       16  2
    Prediction       17  1
    Prediction       18  2
    Prediction       19  3
    Prediction       20  2
    Prediction       21  1
    Prediction       22  2
    Prediction       23  3
    Prediction       24  2

I hope this helps.

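Edit: following the comments, here is the example modified to train on previous values of the target (Y) instead of on features derived from the time index (X). This means you can't start training at the very beginning of the series, because you need a look-back history of previous Y values (with five look-back values, x must be at least 6). In this example I start at x=9 simply because it keeps the same sequence: x=9 is at the same point in the cycle as x=1.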

    using System;
    using System.Linq;
    using Accord.MachineLearning.DecisionTrees;
    using Accord.MachineLearning.DecisionTrees.Learning;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class AccordID3PreviousYTests // class name is illustrative
    {
        // this is the sequence y follows
        int[] ysequence = new int[] { 1, 2, 3, 2 };

        // this generates the correct Y for a given X (x >= 1)
        int CalcY(int x) => ysequence[(x - 1) % 4];

        // the inputs are now the five previous values of y (so x must be >= 6)
        int[] CalcInputs(int x) => new int[] { CalcY(x - 1), CalcY(x - 2), CalcY(x - 3), CalcY(x - 4), CalcY(x - 5) };
        //int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };

        // for http://stackoverflow.com/questions/40573388/simple-accord-net-machine-learning-example
        [TestMethod]
        public void AccordID3TestTestStackOverFlowQuestion2()
        {
            // build the training data set
            int numtrainingcases = 12;
            int starttrainingat = 9;
            int[][] inputs = new int[numtrainingcases][];
            int[] outputs = new int[numtrainingcases];

            Console.WriteLine("\t\t\t\t x \t y");
            for (int x = starttrainingat; x < numtrainingcases + starttrainingat; x++)
            {
                int y = CalcY(x);
                inputs[x - starttrainingat] = CalcInputs(x);
                outputs[x - starttrainingat] = y;
                Console.WriteLine("TrainingData \t " + x + "\t " + y);
            }

            // define how many values each input can have (y values lie in 0..3)
            DecisionVariable[] attributes =
            {
                new DecisionVariable("y-1", 4),
                new DecisionVariable("y-2", 4),
                new DecisionVariable("y-3", 4),
                new DecisionVariable("y-4", 4),
                new DecisionVariable("y-5", 4)
            };

            // number of output classes: labels run from 0 to Max, so +1 (y itself never uses 0)
            int classCount = outputs.Max() + 1;

            // create the tree
            DecisionTree tree = new DecisionTree(attributes, classCount);

            // create a new instance of the ID3 algorithm
            ID3Learning id3learning = new ID3Learning(tree);

            // learn the training instances; this populates the tree
            id3learning.Learn(inputs, outputs);

            Console.WriteLine();
            // now try to predict some cases that weren't in the training data
            for (int x = starttrainingat + numtrainingcases; x <= starttrainingat + 2 * numtrainingcases; x++)
            {
                int[] query = CalcInputs(x);

                int answer = tree.Decide(query); // makes the prediction

                Assert.AreEqual(CalcY(x), answer); // check the tree got it right
                Console.WriteLine("Prediction \t\t " + x + "\t " + answer);
            }
        }
    }
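In the situation described in the comments, only the observed Y series is available, so the look-back inputs have to come from the array of observations rather than from `CalcY(x)`. Here is a minimal sketch, assuming integer observations as in the answer. `LagFeatures`, `BuildLagDataset` and `NextQuery` are hypothetical names, not Accord.NET APIs.

```csharp
using System;

public static class LagFeatures
{
    // Builds (inputs, outputs) pairs where each input row holds the previous
    // `window` values, most recent first (like y-1, y-2, ... in the answer).
    public static (int[][] inputs, int[] outputs) BuildLagDataset(int[] series, int window)
    {
        if (window < 1 || series.Length <= window)
            throw new ArgumentException("series must be longer than window");

        // The first `window` values have no full history, so they only ever
        // appear as inputs, never as training targets.
        int count = series.Length - window;
        var inputs = new int[count][];
        var outputs = new int[count];
        for (int t = window; t < series.Length; t++)
        {
            var row = new int[window];
            for (int k = 1; k <= window; k++)
                row[k - 1] = series[t - k];
            inputs[t - window] = row;
            outputs[t - window] = series[t];
        }
        return (inputs, outputs);
    }

    // Input row for predicting the value that follows the last observation.
    public static int[] NextQuery(int[] series, int window)
    {
        var row = new int[window];
        for (int k = 1; k <= window; k++)
            row[k - 1] = series[series.Length - k];
        return row;
    }
}
```

The returned arrays have the same `int[][]` / `int[]` shape that the answer passes to `id3learning.Learn`, and `NextQuery` builds the input for `tree.Decide` when predicting the value after the last observation. Because the first `window` values are never targets, the edited example above cannot start training at x=1.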

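You could also consider training on the differences between previous Y values. That works better when the relative change in Y matters more than its absolute value.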
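Here is a minimal sketch of the difference idea, assuming integer Y values as in the answer (class and method names are hypothetical). Differences can be negative, but the answer's tree uses class labels starting at 0 (`classCount = outputs.Max()+1`). For that reason, this sketch shifts the differences up by an assumed offset before training and undoes the shift after a prediction.

```csharp
using System;
using System.Linq;

public static class DifferenceFeatures
{
    // d[i] = series[i + 1] - series[i]
    public static int[] Differences(int[] series)
    {
        if (series.Length < 2)
            throw new ArgumentException("need at least two values");
        var d = new int[series.Length - 1];
        for (int i = 1; i < series.Length; i++)
            d[i - 1] = series[i] - series[i - 1];
        return d;
    }

    // Shifts values up by an assumed offset (at least the largest possible drop)
    // so they can serve as non-negative class labels / decision values.
    public static int[] Shift(int[] values, int offset) =>
        values.Select(v => v + offset).ToArray();

    // Undoes the shift on a predicted difference and adds it to the last level.
    public static int NextValue(int lastValue, int predictedShiftedDiff, int offset) =>
        lastValue + (predictedShiftedDiff - offset);
}
```

The shifted differences can then go through the same look-back construction as the Y values themselves, with the offset added to the number of values each `DecisionVariable` can take.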

【Discussion】:

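This is great. I learned a lot from this example (how to produce the inputs and outputs), and it works well. But in my real situation I can't compute anything from the X values, because it's a time series (e.g. x1 = 3:00am, x2 = 4:00am, x3 = 5:00am). I only have a time series of Y values and want to find the pattern in it to help predict the next Y value... does that make sense?
Sure. For a time series it's more natural to use previous values of the target (Y), at least when the actual time is irrelevant and the pattern lies in the relationship between values.
I'll edit the answer to show how to modify the example to train on previous Y values.
Thanks a lot, I really appreciate the quick response and the help.
Thanks @reddal. What would you suggest if the output Ys are real numbers with no fixed number of classes? For example, we have a series of numbers such as { 0.4, 0.9, 0.3, 1.2, 0.7 } and want to predict the next value.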