The AI@Unity interns help shape the world
Every summer, we recruit interns to help us deliver on our mission to empower Unity developers with AI and Machine Learning tools and services. Last summer, the AI@Unity group had four fantastic interns working on impactful projects: Aryan Mann, PSankalp Patro, Ruo-Ping Dong, and Jay Patel. Read on to find out more about their experiences and achievements interning at Unity.
As Unity grows, our internship program grows as well. For 2020, the size of our internship program will increase by 40% globally. The AI@Unity organization has tripled the number of intern positions for 2020, and we are hiring in San Francisco and Bellevue for a total of 19 open positions, ranging from software development to machine learning research. If you are interested in our 2020 internship program, please apply here.
Aryan Mann: Game Simulations SDK
The Game Simulations team aspires to support game developers throughout the game creation phase by enabling them to leverage large-scale game simulations to test, validate, and optimize their games. This is intended to augment traditional human playtesting: developers can analyze their game by launching a large number of parallel simulations and analyzing the resulting data. The simulations and resulting analysis can be used to test the game (ensure there are no crashes), answer critical design questions (is the game balanced?), and even optimize the game (find the game settings that achieve the desired balance). The Game Simulations team builds both an SDK and a cloud service. The SDK enables developers to instrument the data/metrics they wish to track, while the cloud service enables running game builds (instrumented with the SDK) at unprecedented scale and processing/analyzing the resulting data.
Problem: Playtesting takes a lot of time
Playtesting is the process of assessing and validating game design. Before game studios release their game to thousands, even millions of players, they will often run it by a small group of people who examine the various systems that dictate gameplay and give feedback. The scale and timing of playtests are not limited to polishing the game after it is developed; rather, playtesting is a continuous process that runs from the prototype right up until the release of the game. Currently, playtesting is a very manual process where studios hire people to play the game and fill out surveys about their gameplay experience. Observing hours of gameplay, analyzing it, and getting feedback can be tedious and downright impractical.
To help explore the value of running game simulations for automated playtesting, I worked closely with Illogika, a veteran studio that has created AAA experiences such as Lara Croft GO and Cuphead. They are currently developing a new racing game called Rogue Racers (pictured above), which blends traditional infinite runner gameplay with competitive arcade-style mechanics such as spells and power-ups. They had a few design questions and wanted to see if the Game Simulations service could help answer those.
Solution: Automated playtesting with simulations
Illogika envisions a game that requires a large degree of skill to be competitive, yet is forgiving to new players. As such, they initially wanted the maximum difference in completion times between two equally skilled players to be five seconds. To help evaluate this, I set up a playtest where two bots competed against each other on a specific map, and I used the Game Simulations SDK to track two metrics:
- "Completion Time" tracked how many seconds it took the first bot to cross the finish line.
- "Completion Difference" recorded the difference in completion time between the first and second bots, in seconds.
We then performed thousands of simulations of the game across the bots' different skill settings to analyze these two metrics. Looking at the data in the graph below, we found that even between bots of similar skill levels, the "Completion Difference" was sometimes higher than the five seconds Illogika wanted.
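To make that kind of analysis concrete, here is a minimal sketch of how per-run metrics like these might be aggregated, assuming the simulation service exports one row per run. The column names and synthetic data below are illustrative stand-ins, not the actual export format of the service.

```python
# A minimal sketch of the analysis described above, with hypothetical
# column names and synthetic data standing in for exported run results.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_runs = 1000

# Stand-in for data exported from thousands of simulation runs.
results = pd.DataFrame({
    "bot_skill": rng.integers(1, 6, size=n_runs),           # skill setting 1-5
    "completion_time": rng.normal(95.0, 8.0, size=n_runs),  # seconds, first bot
    "completion_difference": np.abs(rng.normal(2.0, 2.5, size=n_runs)),
})

# How often does the gap between two equally skilled bots exceed 5 seconds?
by_skill = results.groupby("bot_skill")["completion_difference"]
summary = by_skill.agg(["mean", "max", lambda s: (s > 5.0).mean()])
summary.columns = ["mean_gap_s", "max_gap_s", "frac_over_5s"]
print(summary)
```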
While this data was enough to answer the design question, Illogika demonstrated how the data we generated could provide additional insights that had escaped our notice. They noticed that the average "Completion Difference", shown as the red line above, was merely two seconds. We even found that bots of different skill levels still finished close to two seconds apart. This meant that the game did not value skill as much as Illogika wanted. They hypothesized that this was due to their emergency boost system being too proactive and powerful: when a player fell behind, they would get a speed boost to catch up, which, in its configuration at the time, made the gameplay require less skill. With this keen insight, Illogika reworked their emergency boost system to provide comeback opportunities while still enabling more skilled players to thrive.
From here, we wanted to explore the game settings that would best achieve the design goals Illogika had in mind. To support this, we expanded our simulations to try out a large number of combinations of game parameters, helping Illogika understand how the two metrics above change with three specific game parameters. An evolution of these experiments was presented at the Unite Copenhagen keynote. Additionally, I helped evolve the Game Simulations SDK to support time-series metrics. This is helpful for validating a game's economy and understanding how a player's account balance (e.g. points, coins) evolves over a large number of sessions.
PSankalp Patro: Training adaptable agents
The ML-Agents Toolkit is an open-source project that aims to enable developers to leverage Deep Reinforcement Learning (DRL) to train playable and non-playable characters. By simply instrumenting a character's inputs (how it perceives the environment), actions (what decisions it can take), and rewards (a signal for achieving a desired behavior), developers can train a character or game entity to learn a desired behavior as a byproduct of repeated interaction between the character and the environment (the world in which the character resides). From each interaction, the environment sends a reward signal to the character. The character then tries to learn the behavior that earns it the maximum reward over time.
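This observe-act-reward loop can be sketched in a few lines of Python. The toy FetchEnv class and reward values below are illustrative stand-ins for a Unity environment, not the ML-Agents API.

```python
# A minimal sketch of the agent-environment interaction loop described
# above; FetchEnv and its reward scheme are hypothetical stand-ins.
import numpy as np

class FetchEnv:
    """Toy stand-in: the agent must move its x-position toward a stick."""
    def reset(self):
        self.agent_x, self.stick_x = 0.0, np.random.uniform(-5, 5)
        return np.array([self.agent_x, self.stick_x])   # observation

    def step(self, action):
        self.agent_x += float(np.clip(action, -1, 1))   # action: move left/right
        done = abs(self.agent_x - self.stick_x) < 0.5   # reached the stick?
        reward = 1.0 if done else -0.01                 # reward signal
        return np.array([self.agent_x, self.stick_x]), reward, done

env = FetchEnv()
obs, episode_return = env.reset(), 0.0
for _ in range(200):
    action = np.sign(obs[1] - obs[0])   # trivial "policy": walk toward the stick
    obs, reward, done = env.step(action)
    episode_return += reward
    if done:
        break
print("episode return:", episode_return)
```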
Problem: The pitfall of overfitting
Consider the in-house pet Puppo, trained using DRL (if you're not familiar with Puppo, check out this blog post). The ML-Agents Toolkit enabled us to teach Puppo to play fetch in a flat garden. Through repeated trials of throwing the stick, Puppo learned to walk and to fetch the stick, guided by the reward signals it received from the environment every time it retrieved the stick.
But what happens if we train Puppo to play fetch in a garden with rough terrain? The previous ML-Agents setup would only allow us to train Puppo on a single fixed terrain. When we then play fetch on a different terrain, there is a drop in performance, and Puppo often gets stuck.
This is a common pitfall in deep reinforcement learning termed overfitting. Overfitting reduces the reliability, flexibility, and usability of our trained characters at test time. This poses a serious hindrance to developers trying to train their characters, as the characters may display undesirable behavior when the environment is even slightly modified.
Solution: Generalized Training
The project I worked on for the summer aims to mitigate overfitting by training characters (Puppo, in this instance) to learn the task over multiple variations of the training environment, as opposed to a single fixed environment. This allows characters to ignore trivial aspects that do not affect the task at hand. My project alters the conventional training protocol by introducing an additional step in the training pipeline: the periodic modification of the environment (e.g. the roughness of the terrain that Puppo plays in).
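In the toolkit itself this step is driven by a sampler configuration, but the control flow can be sketched as follows. The make_env factory, the resampling interval, and the roughness range below are hypothetical stand-ins, not the ML-Agents implementation.

```python
# A minimal sketch of the extra training-pipeline step: periodically
# resampling an environment parameter (here, terrain roughness) instead
# of reusing one fixed environment. make_env is a hypothetical stand-in.
import numpy as np

def make_env(terrain_roughness: float):
    """Hypothetical factory returning an environment built with the given
    terrain roughness; stands in for resetting a Unity scene."""
    return {"terrain_roughness": terrain_roughness}

resample_interval = 50   # episodes between environment variations
n_episodes = 500

for episode in range(n_episodes):
    if episode % resample_interval == 0:
        # Draw a new terrain variation rather than training on one terrain.
        roughness = np.random.uniform(0.0, 1.0)
        env = make_env(roughness)
    # ... run one training episode of Puppo in `env` here ...
```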
Let’s look at the performance of Puppo, who is now trained over multiple terrains. The terrain used for testing here is identical to the terrain used to test the Puppo in the earlier setup. As the newly trained Puppo traverses the terrain, we can clearly see how much quicker it is able to fetch the stick. In addition, it doesn’t get stuck as often either. It seems like Puppo has learned to play fetch better!
The new training procedure is particularly helpful when training characters for tasks with dynamic environments. Agents that generalize better do not need to be retrained as often when the environment changes during game development. Overfitting is a highly active research area in the field of Reinforcement Learning (as well as the wider field of Machine Learning). To learn more about overfitting and progress made to address the issue, check out the following research paper, which formed the basis of the project: Assessing Generalization in Deep Reinforcement Learning. Visit the ML-Agents Toolkit documentation on Generalized Training for a detailed description of how to use this new training option.
Ruo-Ping Dong: Speeding up ML-Agents training
At the start of the summer, the ML-Agents Toolkit could only be used with CPU or single-GPU training. Training for some complex games can take a long time, since those games are data-intensive (large batch sizes) and may use more complex neural networks. For my project, I wanted to understand the impact of GPU use on training performance and speed up some of our slower environments with multi-GPU training.
Problem: Training models takes a lot of time
Training a reinforcement learning algorithm takes a lot of time, most of which is spent either simulating (running the Unity game to collect data) or updating the model using the collected data. In a previous release of ML-Agents, we improved the former by providing a mechanism to launch multiple Unity environments in parallel (on a single machine). This project addresses the latter by providing the ability to leverage multiple GPUs during the model update phase.
Solution: Leveraging multiple GPUs for training
We replaced the original Proximal Policy Optimization (PPO) implementation with one that creates one copy of the model for each GPU. All of the models share the same neural network weights. When there is enough data to perform an update, each GPU processes a subset of the training batch in parallel. The multi-GPU policy then aggregates and averages the gradients from all GPUs and applies the updated weights to all models.
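The gradient aggregation described above can be illustrated with a small NumPy sketch. The quadratic loss below is a stand-in for the real PPO objective, and the "GPUs" are simulated by splitting the batch into shards; this is not the ML-Agents code.

```python
# A minimal NumPy sketch of the data-parallel update: each "GPU" computes
# gradients on its shard of the batch, the gradients are averaged, and the
# shared weights are updated once. The MSE loss stands in for PPO's.
import numpy as np

def grad(w, x, y):
    """Gradient of mean squared error 0.5*||x @ w - y||^2 / n w.r.t. w."""
    return x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x, true_w = rng.normal(size=(1024, 8)), rng.normal(size=8)
y = x @ true_w
w = np.zeros(8)            # shared model weights
n_gpus, lr = 4, 0.1

for step in range(100):
    shards = zip(np.array_split(x, n_gpus), np.array_split(y, n_gpus))
    grads = [grad(w, xs, ys) for xs, ys in shards]  # one gradient per GPU
    w -= lr * np.mean(grads, axis=0)                # average, then update

print("weight error:", np.linalg.norm(w - true_w))
```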
We tested the effect of multi-GPU training by measuring the update time during a training run of the Obstacle Tower environment. We tested using three separate models available via ML-Agents: a small "Simple" Convolutional Neural Network (CNN), the "Nature" CNN described in Mnih et al., and the ResNet described in Espeholt et al. While multiple GPUs had a minimal impact on performance for the smaller models, we saw a substantial improvement in performance for the larger ResNet model.
Data pipeline optimization
Looking closer at the update time, we noticed that ML-Agents was spending a substantial amount of time feeding data into the graph: pulling stored data from the training buffer, transforming input data into TensorFlow tensors, moving data onto GPU devices, and so on. We wanted to improve this processing time by preparing the data for subsequent update batches in parallel with performing optimization on the current batch.
We implemented this by adapting our trainer to use the TensorFlow Dataset API, which takes care of all data pipeline operations, including batching, shuffling, repeating, and prefetching. The experimental results showed a 20-40% improvement in update time for both CPU and GPU training.
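For reference, here is a minimal sketch of such a tf.data pipeline, shown in eager TF2 style for brevity (the trainer at the time used TF1 graph mode). The buffer field names and sizes are illustrative assumptions, not the ML-Agents trainer code.

```python
# A minimal tf.data pipeline: shuffle, batch, repeat, and prefetch so the
# next batch is prepared while the current one is used for the update.
import numpy as np
import tensorflow as tf

buffer = {
    "obs": np.random.randn(10000, 84).astype(np.float32),
    "actions": np.random.randn(10000, 4).astype(np.float32),
}

dataset = (
    tf.data.Dataset.from_tensor_slices(buffer)
    .shuffle(buffer_size=2048)   # shuffling
    .batch(256)                  # batching
    .repeat()                    # repeating
    .prefetch(1)                 # prefetch the next batch during the update
)

for batch in dataset.take(3):
    # ... one optimization step on `batch` would run here ...
    print(batch["obs"].shape, batch["actions"].shape)
```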
Jay Patel: Exploring image generation and design
Content creation is a broad and important component of game development, one in which machine learning may play an increasing role in the near future. In particular, the ability of machine learning algorithms to generate novel 2D images has improved dramatically in recent years. The potential applications of this technology for the creation of 3D worlds are plentiful, from machine-generated textures to level design and more. For my internship, I focused on exploring one particular technique for human-guided image generation: Transparent Latent-space Generative Adversarial Network (TL-GAN).
Problem: Outputs of GANs are hard to control
Before we can understand TL-GAN, we first need to understand GANs, the older and simpler model on which it is based. GAN stands for "Generative Adversarial Network," a deep neural network that can learn to generate new data with the same distribution as the training data it has seen. For example, if our training data consists of a large set of car images, the GAN would train on those images and eventually learn to create new, unique images of cars.
The primary shortcoming of this approach is that it does not provide a human-understandable way of controlling the output. A random noise vector controls the exact image produced by the generator, but which noise values correspond to which image features is not something a human can work out, because the mapping is simply too complex.
What we want is control over the features in the images. We can generate images of cars, but what if we want to control the color of the car, or the number of seats? Any content-generation tool becomes much more powerful if a human can easily and intuitively direct it. The random noise vector that we feed into the generator is called the latent code. To gain control over the features, we need to understand this latent space. This is where TL-GAN comes into the picture.
Solution: TL-GAN
The TL-GAN model provides a way to control the output of the generator, but it requires a trained GAN as one of its components. So, my first goal was to train a GAN. As a test case, I took on the task of generating images of cars, based on a large training set of car images.
TL-GAN has three major components: a GAN, a feature extractor, and a generalized linear model. The roles of these three components are as follows:
- GAN: Generates synthetic car images from a random latent-space noise vector, as discussed above. Trained using a large dataset of unlabeled images.
- Feature Extractor: A multi-class classifier that outputs labels for a given car image. Trained on a smaller set of labeled car images. Once trained, we can pair the feature extractor with the GAN to produce a large labeled dataset of {random latent vector, features} pairs by running synthetic car images through the feature extractor.
- Generalized Linear Model (GLM): Trained to understand the latent space in terms of the multi-class labels we have. We train this using the large {random latent vector, features} dataset compiled from the GAN and feature extractor above (see the sketch below).
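To make the pipeline concrete, here is a minimal sketch with stand-in components: the generator and feature-extractor stubs take the place of trained networks, and the GLM is fit with ordinary least squares. Everything here is a hypothetical illustration of the data flow, not the actual TL-GAN code.

```python
# A minimal sketch of the TL-GAN pipeline with stand-in components.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_features, n_samples = 64, 5, 10000

W_true = rng.normal(size=(latent_dim, n_features))   # hidden stub structure

def generator(z):
    """Stub for the trained GAN generator (latent vector -> image)."""
    return z   # a real generator would return an image tensor

def feature_extractor(images):
    """Stub for the trained multi-class feature classifier."""
    return images @ W_true   # e.g. scores for color, seat count, ...

# 1) Sample latents, 2) generate images, 3) label them with the extractor.
z = rng.normal(size=(n_samples, latent_dim))
features = feature_extractor(generator(z))

# 4) Fit a linear model from latent space to feature scores.
W_fit, *_ = np.linalg.lstsq(z, features, rcond=None)

# To increase feature 0 (say, "redness") in an image, move its latent
# code along that feature's learned axis.
axis = W_fit[:, 0] / np.linalg.norm(W_fit[:, 0])
z_edit = z[0] + 2.0 * axis
```

A slider in a design tool would then simply scale the step along each feature's axis before feeding the edited latent code back to the generator.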
Unfortunately, I did not have time to finish training TL-GAN. However, once trained, this model could allow a user to design the generated car in a tool with a randomize button and a slider for each supported feature. The exciting thing is that if we can do this for cars, we can do it for anything.
Our 2020 Internship Program
Our 2019 summer interns were a fantastic addition to the Game Simulations, ML-Agents, and Visual Machine Learning teams (some of them will return next year as full-time team members or for another internship). We will continue to expand our internship program in Summer 2020. If you want the opportunity to work on an aspirational project that will have an impact on the experiences of millions of players, please apply!
Translated from: https://blogs.unity3d.com/2019/10/21/the-aiunity-interns-help-shape-the-world/