Machine Learning Paradigms
Deep learning is a vast field, centered around an algorithm whose shape is determined by millions or even billions of variables and is constantly being altered — the neural network. It seems that every other day, overwhelming numbers of new methods and techniques are proposed.
In general, however, deep learning in the modern era can be broken down into three fundamental learning paradigms. Within each lies an approach to, and belief about, learning that offers significant potential for increasing the current power and scope of deep learning.
- Hybrid learning — how can modern deep learning methods cross the boundaries between supervised and unsupervised learning to accommodate vast amounts of unused, unlabeled data?
- Composite learning — how can different models or components be connected in creative ways to produce a composite model greater than the sum of its parts?
- Reduced learning — how can both the size and information flow of models be reduced, for both performance and deployment purposes, while maintaining the same or greater predictive power?
The future of deep learning lies in these three paradigms of learning, each of which is heavily interconnected with the others.
Hybrid Learning
This paradigm seeks to cross the boundaries between supervised and unsupervised learning. It is often used in business contexts because labelled data is scarce and expensive. In essence, hybrid learning is an answer to the question:
How can I use supervised methods to solve, or work in conjunction with, unsupervised problems?
For one, semi-supervised learning is gaining ground in the machine learning community for its ability to perform exceptionally well on supervised problems with very little labelled data. For example, a well-designed semi-supervised GAN (Generative Adversarial Network) achieved over 90% accuracy on the MNIST dataset after seeing only 25 training examples.
Semi-supervised learning is designed for datasets with large amounts of unlabeled data and small amounts of labelled data. Whereas traditionally a supervised learning model would be trained on one part of the data and an unsupervised model on the other, a semi-supervised model can combine labelled data with insights extracted from unlabeled data.
The semi-supervised GAN (abbreviated SGAN) is an adaptation of the standard Generative Adversarial Network. Its discriminator not only outputs 0/1 to indicate whether an image is real or generated, but also outputs the class of the item (multi-output learning).
This is premised on the idea that, by learning to differentiate between real and generated images, the discriminator learns their structures without concrete labels. With additional reinforcement from a small amount of labelled data, semi-supervised models can achieve top performance with minimal supervised data.
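The two-headed discriminator objective can be sketched numerically. The function below (plain NumPy, with made-up batch values) combines a binary real/generated loss over every example with a classification loss over only the labelled subset; the function name and toy numbers are illustrative, not drawn from any particular SGAN implementation.

```python
import numpy as np

def sgan_discriminator_loss(p_real, y_real, class_probs, y_class):
    """Combined loss for an SGAN discriminator with two heads.

    p_real      : predicted probability that each image is real (sigmoid head)
    y_real      : 1 for real images, 0 for generated ones
    class_probs : predicted class distribution (softmax head)
    y_class     : integer class labels; -1 marks unlabeled examples,
                  which contribute nothing to the classification term
    """
    eps = 1e-9
    # Real/generated head: binary cross-entropy over every example.
    bce = -np.mean(y_real * np.log(p_real + eps)
                   + (1 - y_real) * np.log(1 - p_real + eps))
    # Class head: cross-entropy over the labelled subset only.
    labelled = y_class >= 0
    ce = -np.mean(np.log(class_probs[labelled, y_class[labelled]] + eps)) \
        if labelled.any() else 0.0
    return bce + ce

# Toy batch: two real labelled images, one generated (unlabeled) image.
p_real = np.array([0.9, 0.8, 0.2])
y_real = np.array([1.0, 1.0, 0.0])
class_probs = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4]])
y_class = np.array([0, 1, -1])   # -1: no label for the generated image
loss = sgan_discriminator_loss(p_real, y_real, class_probs, y_class)
```

Unlabeled and generated images still shape the discriminator through the first term, which is exactly how the unlabeled data contributes.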
You can read more about SGANs and semi-supervised learning here.
GANs are also involved in another area of hybrid learning — self-supervised learning, in which unsupervised problems are explicitly framed as supervised ones. GANs artificially create supervised data through the introduction of a generator; labels are created to identify real/generated images. From an unsupervised premise, a supervised task is created.
Alternatively, consider the use of encoder-decoder models for compression. In their simplest form, these are neural networks with a small number of nodes in the middle that represent a bottleneck, a compressed form. The two sections on either side are the encoder and the decoder.
The network is trained to produce the same output as its vector input (a supervised task artificially created from unsupervised data). Because there is a deliberately placed bottleneck in the middle, the network cannot passively pass the information along; instead, it must find the best way to preserve the content of the input in a small unit such that the decoder can reasonably reconstruct it.
After training, the encoder and decoder are split apart and can be used at the sending and receiving ends of compressed or encoded data to transmit information in an extremely small form with little to no loss. They can also be used to reduce the dimensionality of data.
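The bottleneck idea can be demonstrated with a minimal linear autoencoder in NumPy. Everything here (the synthetic data, the 2-unit bottleneck, the learning rate) is an illustrative assumption; a practical autoencoder would use nonlinear layers and a framework such as Keras or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "unsupervised" data: 200 points in 8 dimensions that really live
# on a 2-dimensional subspace, so a 2-unit bottleneck can capture them.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

# A linear autoencoder: encoder W_e (8 -> 2), decoder W_d (2 -> 8),
# trained to reconstruct its own input (the self-created supervised task).
W_e = rng.normal(scale=0.1, size=(8, 2))
W_d = rng.normal(scale=0.1, size=(2, 8))
lr = 0.01

def loss(X, W_e, W_d):
    return np.mean((X @ W_e @ W_d - X) ** 2)

initial = loss(X, W_e, W_d)
for _ in range(500):
    code = X @ W_e                    # compressed representation
    recon = code @ W_d
    err = recon - X                   # gradient of the loss w.r.t. recon,
                                      # up to a constant factor
    W_d -= lr * code.T @ err / len(X)
    W_e -= lr * X.T @ (err @ W_d.T) / len(X)

final = loss(X, W_e, W_d)
```

After training, `X @ W_e` alone serves as the compressed representation, and `W_d` alone decodes it — the split described above.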
As another example, consider a large collection of texts, perhaps comments from a digital platform. Through a clustering or manifold-learning method, we can generate cluster labels for collections of texts, then treat these as labels (provided the clustering is well done).
After each cluster is interpreted (e.g. cluster A represents comments complaining about a product, cluster B represents positive feedback, etc.), a deep NLP architecture like BERT can then be used to classify new texts into these clusters, all with completely unlabeled data and minimal human involvement.
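A minimal sketch of this pipeline, in plain NumPy with toy vectors standing in for text embeddings (a real pipeline would embed the texts with TF-IDF or BERT and would likely use a library clustering implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal k-means: returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Stand-ins for text embeddings: two well-separated blobs playing the
# role of "complaints" vs "praise" comments.
complaints = rng.normal(loc=0.0, size=(50, 16))
praise = rng.normal(loc=5.0, size=(50, 16))
X = np.vstack([complaints, praise])

labels = kmeans(X, k=2)
# The cluster ids now serve as pseudo-labels: a supervised classifier
# (e.g. a fine-tuned BERT) could be trained on (text, label) pairs.
```

The clustering step never sees a human label; the "supervision" for the downstream classifier is manufactured entirely from structure in the data.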
This is yet another fascinating application of converting unsupervised tasks into supervised ones. In an era where the vast majority of data is unlabeled, there is tremendous value and potential in building creative bridges across the boundary between supervised and unsupervised learning with hybrid learning.
Composite Learning
Composite learning seeks to utilize the knowledge not of one model but of several. The belief is that through unique combinations or injections of information — both static and dynamic — deep learning can continually go deeper in understanding and performance than a single model can.
Transfer learning is an obvious example of composite learning, premised on the idea that a model's weights can be borrowed from a model pretrained on a similar task and then fine-tuned on a specific task. Pretrained models like Inception or VGG-16 are built with architectures and weights designed to distinguish between several different classes of images.
If I were to train a neural network to recognize animals (cats, dogs, etc.), I wouldn't train a convolutional neural network from scratch, because it would take too long to achieve good results. Instead, I'd take a pretrained model like Inception, which has already stored the basics of image recognition, and train it for a few additional epochs on my dataset.
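The freeze-and-fine-tune idea can be sketched in NumPy. The "pretrained backbone" here is just a fixed random projection standing in for Inception's convolutional layers, and the data is synthetic; the point is that only the new classification head is trained while the borrowed weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone (e.g. Inception's convolutional
# layers): a fixed feature extractor whose weights are never updated.
W_pretrained = rng.normal(size=(32, 8))          # hypothetical weights

def features(X):
    return np.maximum(X @ W_pretrained, 0.0)     # frozen ReLU features

# New task: a small labelled dataset for binary classification.
X = rng.normal(size=(100, 32))
true_w = rng.normal(size=8)
y = (features(X) @ true_w > 0).astype(float)     # synthetic labels

# "Fine-tuning": train only the new classification head (logistic
# regression by gradient descent); the backbone stays frozen.
F = features(X)
w_head = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head)))
    w_head -= 0.05 * F.T @ (p - y) / len(X)

accuracy = np.mean((F @ w_head > 0) == (y == 1))
```

In a framework like Keras the same pattern is expressed by loading the pretrained model, setting its layers to non-trainable, and attaching a fresh output head.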
Similarly, word embeddings in NLP neural networks map words physically closer to related words in an embedding space (e.g. 'apple' and 'orange' have a smaller distance than 'apple' and 'truck'). Pretrained embeddings like GloVe can be placed into neural networks as a starting point: an already effective mapping of words to numerical, meaningful entities.
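The distance claim can be checked with a toy example. The 4-dimensional vectors below are invented for illustration; real GloVe embeddings are 50-300 dimensional and learned from corpus co-occurrence statistics.

```python
import numpy as np

# Hypothetical toy embeddings: the two fruits share directions,
# the vehicle does not.
embeddings = {
    "apple":  np.array([0.9, 0.8, 0.1, 0.0]),
    "orange": np.array([0.8, 0.9, 0.2, 0.1]),
    "truck":  np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_distance(a, b):
    """1 - cosine similarity: smaller means more related."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

d_fruit = cosine_distance(embeddings["apple"], embeddings["orange"])
d_other = cosine_distance(embeddings["apple"], embeddings["truck"])
```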
Less obviously, competition can also stimulate knowledge growth. For one, Generative Adversarial Networks borrow from the composite learning paradigm by fundamentally pitting two neural networks against each other. The generator’s goal is to trick the discriminator, and the discriminator’s goal is not to be tricked.
Competition among models will be referred to as 'adversarial learning', not to be confused with the other type of adversarial learning, which refers to designing malicious inputs that exploit weak decision boundaries in models.
Adversarial learning can stimulate models, usually of different types, such that the performance of one model is expressed relative to the performance of the others. There is still a lot of research to be done in adversarial learning, with the generative adversarial network as the subfield's only prominent creation so far.
Competitive learning, on the other hand, is similar to adversarial learning but is performed at the node-by-node scale: nodes compete for the right to respond to a subset of the input data. Competitive learning is implemented in a 'competitive layer', in which a set of neurons are identical except for randomly distributed weights.
Each neuron's weight vector is compared to the input vector, and the neuron with the highest similarity — the 'winner-take-all' neuron — is activated (output = 1). The others are 'deactivated' (output = 0). This unsupervised technique is a core component of self-organizing maps and feature discovery.
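A minimal winner-take-all layer in NumPy, following the description above; the class name, the learning rate, and the dot-product similarity are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

class CompetitiveLayer:
    """Winner-take-all layer: only the most similar neuron fires,
    and only the winner's weights move toward the input."""

    def __init__(self, n_neurons, n_inputs, lr=0.1):
        # Identical neurons except for randomly distributed weights.
        self.W = rng.normal(scale=0.1, size=(n_neurons, n_inputs))
        self.lr = lr

    def forward(self, x):
        similarity = self.W @ x           # compare each weight vector to x
        out = np.zeros(len(self.W))
        out[similarity.argmax()] = 1.0    # winner takes all
        return out

    def update(self, x):
        winner = (self.W @ x).argmax()
        # Classic competitive learning rule: pull the winner toward x.
        self.W[winner] += self.lr * (x - self.W[winner])

layer = CompetitiveLayer(n_neurons=4, n_inputs=3)
x = np.array([1.0, 0.0, 0.0])
out = layer.forward(x)
for _ in range(50):
    layer.update(x)
```

Repeated presentation of similar inputs drives one neuron's weights toward that region of input space, which is how such layers discover features without labels.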
Another interesting example of composite learning is neural architecture search. In simplified terms, a neural network (usually a recurrent one) in a reinforcement learning environment learns to generate the best neural network for a dataset — the algorithm finds the best architecture for you! You can read more about the theory here and about an implementation in Python here.
Ensemble methods are also a staple of composite learning. Deep ensemble methods have been shown to be very effective, and the end-to-end stacking of models, like encoders and decoders, has risen in popularity.
Much of composite learning consists of figuring out unique ways to build connections between different models. It is premised on the idea that:
A single model, even a very large one, performs worse than several smaller models/components, each delegated to specialize in part of the task.
For example, consider the task of building a chatbot for a restaurant.
We can segment it into three separate parts (pleasantries/chit-chat, information retrieval, and an action) and design a model that specializes in each. Alternatively, we can delegate a single model to perform all three tasks.
It should be no surprise that the compositional model can perform better while taking up less space. Additionally, these sorts of nonlinear topologies can be easily constructed with tools like Keras’ functional API.
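The decomposition can be sketched structurally. The rule-based functions below are stand-ins for the three specialized models, and the router stands in for a trained intent classifier; in a real system each would be a neural component, wired together with something like Keras' functional API.

```python
# A structural sketch of the three-component restaurant chatbot.
# Every function here is a hypothetical, rule-based placeholder.

def chitchat(text):
    return "Hello! How can I help you today?"

def retrieve_info(text):
    # Placeholder knowledge-base lookup.
    facts = {"hours": "We are open 11am-10pm.",
             "menu": "Pasta, pizza, salads."}
    for key, answer in facts.items():
        if key in text:
            return answer
    return "I couldn't find that information."

def take_action(text):
    return "Your reservation request has been passed to the restaurant."

def route(text):
    """Stand-in for an intent classifier that delegates each message
    to the component specialized for it."""
    text = text.lower()
    if any(w in text for w in ("book", "reserve", "order")):
        return take_action(text)
    if any(w in text for w in ("hours", "menu", "price")):
        return retrieve_info(text)
    return chitchat(text)

print(route("What are your hours?"))     # handled by information retrieval
print(route("Reserve a table for two"))  # handled by the action component
```

Each component can be trained, evaluated, and replaced independently, which is precisely the advantage of the composite design.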
In order to process an increasing diversity of data types, like videos and 3-dimensional data, researchers must build creative composite models.
Read more about compositional learning and the future here.
Reduced Learning
The size of models, particularly in NLP — the epicenter of the flurried excitement in deep learning research — is growing, by a lot. The most recent GPT-3 model has 175 billion parameters. Comparing it to BERT is like comparing Jupiter to a mosquito (well, not literally). Is the future of deep learning bigger?
Very arguably, no. GPT-3 is very powerful, admittedly, but history has shown repeatedly that the 'successful sciences' are the ones with the largest impact on humanity. Whenever academia strays too far from reality, it usually fades into obscurity. This was the case when neural networks were briefly forgotten in the late 1900s: there was so little available data that the idea, however ingenious, was useless.
GPT-3 is another language model, and it can write convincing text. Where are its applications? Yes, it could generate, for instance, answers to a query. There are, however, more efficient ways to do this (e.g. traverse a knowledge graph and use a smaller model like BERT to output an answer).
It simply does not seem that GPT-3's massive size, not to mention an even larger model, is feasible or necessary, given the drying up of computational power.
“Moore’s Law is kind of running out of steam.” - Satya Nadella, CEO of Microsoft
Instead, we’re moving towards an AI-embedded world, where a smart refrigerator can automatically order groceries and drones can navigate entire cities on their own. Powerful machine learning methods should be able to be downloaded onto PCs, mobile phones, and small chips.
This calls for lightweight AI: making neural networks smaller while maintaining performance.
It turns out that, directly or indirectly, almost everything in deep learning research has to do with reducing the necessary number of parameters, which goes hand-in-hand with improving generalization and hence performance. For example, the introduction of convolutional layers drastically reduced the number of parameters neural networks need to process images. Recurrent layers incorporate the idea of time while reusing the same weights, allowing neural networks to process sequences better and with fewer parameters.
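The parameter savings from convolution can be made concrete with simple arithmetic (the 224x224x3 input and 64 outputs form a common but illustrative configuration):

```python
# Parameter counts for a first layer on a 224x224x3 image with
# 64 output units/channels.

# Fully connected: every input pixel connects to every output unit.
dense_params = (224 * 224 * 3) * 64 + 64    # weights + biases

# Convolutional: one shared 3x3 kernel per (input, output) channel pair.
conv_params = (3 * 3 * 3) * 64 + 64         # weights + biases

print(dense_params)   # 9633856
print(conv_params)    # 1792
```

Weight sharing cuts the layer from millions of parameters to under two thousand, which is exactly the "reduction" the paradigm is after.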
Embedding layers explicitly map entities to numerical values with physical meaning, so the burden is not placed on additional parameters. Under one interpretation, dropout layers explicitly block parameters from operating on certain parts of an input. L1/L2 regularization ensures a network utilizes all of its parameters by preventing any of them from growing too large and pushing each to maximize its information value.
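Both ideas reduce to a few lines. The sketch below shows inverted dropout and an L2 penalty term in NumPy; the rates and toy values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p):
    """Inverted dropout: zero a fraction p of activations at training
    time and rescale the rest so the expected value is unchanged."""
    if p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: penalizes any single
    parameter for growing too large."""
    return lam * np.sum(weights ** 2)

a = np.ones(1000)
dropped = dropout(a, p=0.5)   # roughly half the activations zeroed
penalty = l2_penalty(np.array([1.0, -2.0, 0.5]), lam=0.01)
```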
With the creation of specialized layers, networks require fewer and fewer parameters for more complex and larger data. Other, more recent methods explicitly seek to compress the network.
Neural network pruning seeks to remove synapses and neurons that don't provide value to the output of a network. Through pruning, networks can maintain their performance while removing almost all of themselves.
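Magnitude pruning, the simplest pruning scheme, can be sketched in a few lines of NumPy (the 90% pruning fraction is illustrative; real pruning pipelines typically iterate pruning with retraining):

```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest
    absolute values."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    threshold = np.sort(flat)[k]          # k-th smallest magnitude
    return np.where(np.abs(weights) < threshold, 0.0, weights)

W = rng.normal(size=(64, 64))             # a hypothetical weight matrix
W_pruned = magnitude_prune(W, fraction=0.9)
sparsity = np.mean(W_pruned == 0.0)
```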
Other methods, like Patient Knowledge Distillation, find ways to compress large language models into forms downloadable onto, for example, users' phones. This was a necessary consideration for the Google Neural Machine Translation (GNMT) system that powers Google Translate, which needed to provide a high-performing translation service accessible offline.
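The soft-target objective at the core of knowledge distillation can be sketched in NumPy. Note that this is the generic distillation loss, not Patient Knowledge Distillation specifically (which additionally matches intermediate layers); the logits and temperature below are made up.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the softened teacher and student
    distributions: the student is trained to mimic the teacher."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-9))

teacher = np.array([3.0, 1.0, 0.2])                 # hypothetical logits
close = distillation_loss(np.array([2.9, 1.1, 0.1]), teacher)
far = distillation_loss(np.array([0.1, 0.2, 3.0]), teacher)
```

A small student minimizing this loss inherits the large teacher's behavior at a fraction of the parameter count.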
In essence, reduced learning centers around deployment-centric design. This is why most reduced-learning research comes from the research departments of companies. One aspect of deployment-centric design is not to blindly chase performance metrics on datasets, but to focus on potential issues when a model is deployed.
For instance, the previously mentioned adversarial inputs are malicious inputs designed to trick a network. Spray paint or stickers on signs can trick self-driving cars into accelerating well over the speed limit. Part of responsible reduced learning is not only making models lightweight enough for use, but also ensuring they can accommodate corner cases not represented in datasets.
Reduced learning is perhaps getting the least research attention in deep learning, because “we managed to achieve good performance with a feasible architecture size” isn't nearly as sexy as “we achieved state-of-the-art performance with an architecture consisting of kajillions of parameters”.
Inevitably, when the hyped pursuit of a higher fraction of a percentage point dies away, as the history of innovation has shown it will, reduced learning — which is really just practical learning — will receive more of the attention it deserves.
Summary
- Hybrid learning seeks to cross the boundaries of supervised and unsupervised learning. Methods like semi-supervised and self-supervised learning are able to extract valuable insights from unlabeled data, something incredibly valuable as the amount of unsupervised data grows exponentially.
- As tasks grow more complex, composite learning deconstructs one task into several simpler components. When these components work together — or against each other — the result is a more powerful model.
- Reduced learning hasn’t received much attention as deep learning rides out a hype phase, but soon enough practicality and deployment-centric design will emerge.
Thanks for reading!