Supervised Learning

Supervised learning algorithms take a dataset and use its features to learn a relationship with a corresponding set of labels. This process is known as training and, once it is complete, we hope that the algorithm will do a good job of predicting the labels of brand-new data for which it has no explicit knowledge of the true labels. For example, we might train a supervised algorithm on a set of images of common animals together with their corresponding labels (e.g. dog, cat, chicken). The algorithm will exploit features of the images, such as the number of legs or the colour, to find patterns that link images with their correct labels. After successful training we can use the trained algorithm to predict the labels of a brand-new set of unseen images, and we generally judge its performance by its accuracy on those images. Supervised learning can be applied to a wide range of problems, such as email spam detection or stock price prediction. The decision tree is an example of a supervised learning algorithm.

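The animal example above can be sketched in a few lines of code. This is a minimal illustration, not a real decision tree: it trains a one-feature "decision stump" (the simplest possible tree) on a made-up dataset where each animal is described by its number of legs and comes with a known label.

```python
# A minimal sketch of supervised learning: a one-feature "decision stump",
# the simplest possible decision tree. The dataset is invented for
# illustration: each animal has one feature (number of legs) and a label.

def train_stump(features, labels):
    """Find the threshold and label assignment that best fit the data."""
    best = None
    classes = sorted(set(labels))  # assumes exactly two classes
    for threshold in sorted(set(features)):
        for left, right in [(classes[0], classes[1]), (classes[1], classes[0])]:
            preds = [left if f <= threshold else right for f in features]
            correct = sum(p == y for p, y in zip(preds, labels))
            if best is None or correct > best[0]:
                best = (correct, threshold, left, right)
    _, threshold, left, right = best
    # The trained model is just a rule learned from the labelled data.
    return lambda f: left if f <= threshold else right

# Training data: number of legs, with the true label known in advance.
legs   = [4, 4, 2, 2, 4, 2]
labels = ["dog", "dog", "chicken", "chicken", "dog", "chicken"]

predict = train_stump(legs, labels)
print(predict(2))  # a new, unseen two-legged animal -> "chicken"
print(predict(4))  # a new, unseen four-legged animal -> "dog"
```

After training, the stump is applied to observations it has never seen, which is exactly how we would judge a real supervised model.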

Unsupervised Learning

Unsupervised learning algorithms, on the other hand, work with data that isn’t explicitly labelled. Instead, they attempt to find some sort of underlying structure in the data. Are some observations clustered into groups? Are there interesting relationships between different features? Which features carry most of the information? Unlike supervised learning, there is generally no need to train unsupervised algorithms, as they can be applied directly to the data of interest. Also in contrast to supervised learning, assessing the performance of an unsupervised algorithm is somewhat subjective and depends largely on the specific details of the task. Unsupervised learning is commonly used in tasks such as text mining and dimensionality reduction. K-means is an example of an unsupervised learning algorithm.

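K-means, mentioned above, can be sketched in pure Python. Note there are no labels anywhere in this example: the algorithm is handed raw numbers and finds the cluster structure itself. The data and the naive centroid initialisation are simplifications for illustration.

```python
# A minimal sketch of unsupervised learning: k-means on 1-D points.
# No labels are involved -- the algorithm only looks for structure
# (here, two clusters) in the data itself. Data are made up.

def kmeans_1d(points, k, iters=10):
    # Naive initialisation: first k distinct values (real implementations
    # use smarter seeding such as k-means++).
    centroids = sorted(set(points))[:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 2) for c in centroids))  # -> [1.0, 10.0]
```

The two groups emerge purely from the geometry of the data, and whether they are "good" clusters is a judgement call, which is the subjectivity of evaluation mentioned above.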

Supervised vs Unsupervised Learning in 2 Minutes

Breaking the Dichotomy

In recent years a number of paradigms have appeared that don’t quite fit under the supervised and unsupervised labels. Semi-supervised learning is just what it sounds like: approaches that combine some labelled and some unlabelled data. Labelling is often an expensive, time-consuming process, so there are many situations where we would like to combine information from a small amount of labelled data with a larger amount of unlabelled data. Closely related is active learning, in which a learning algorithm can query a user to label the particular observations that would add the most information. A slightly different situation is where we would like an algorithm to learn from experience. For example, in a game such as chess we might like an algorithm to learn by playing many games, using the result of each game as a sort of label. At a very high level we might hope that good moves would be labelled as winning moves, while bad moves would be correlated with losing. This is known as reinforcement learning and has received a lot of attention in recent years. Most machine learning approaches are quite narrow in the tasks they can achieve, whereas meta-learning is concerned with generalisability, or learning to learn. An example might be an algorithm that should identify animals in images but is trained only on cats and dogs. In this situation, a good meta-learning algorithm would be able to identify brand-new animals that it has never seen. There are plenty of other approaches that don’t fit neatly under supervised or unsupervised learning, but I hope this post gives a useful introduction to the topic.

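The "learning from the result of each game" idea can be made concrete with tabular Q-learning, the classic introductory reinforcement learning algorithm. The toy environment below (a short corridor with a reward at one end), the rewards, and the hyper-parameters are all invented for illustration; chess would need a vastly larger state space, but the mechanism of reward acting as a delayed label is the same.

```python
# A sketch of reinforcement learning: tabular Q-learning on a toy 1-D
# corridor. States 0..4; stepping right from state 3 reaches the goal
# (reward +1), everything else gives reward 0. All details are invented
# for illustration.

import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly take the best-known move, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == GOAL else 0.0
        # The eventual reward acts as a delayed "label" for each move.
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# After many episodes, the greedy policy steps right in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

No individual move is ever labelled by a human; the value of early moves is inferred by propagating the end-of-game reward backwards, which is what distinguishes this from ordinary supervised learning.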

All images are my own.


Translated from: https://towardsdatascience.com/supervised-vs-unsupervised-learning-in-2-minutes-72dad148f242
