Many of you have been using our Game Performance Reporting service to get crash reports, and have asked questions about our plans going forward. My team and I have read all your posts, so we know you all want cool new features! Please, keep the requests and ideas coming! We’ve spent the past few months reworking the infrastructure from one that supports a project to one that supports a super-cool, multi-tiered web application to handle all the millions of Unity games out there. Read on to find out some more on what we’re up to.
Background
Unity Game Performance started as a Unity Hack Week project a year ago, with the simple goal of trying new things. We had people from different backgrounds contributing and each one took a piece of the crash service, which consists of:
- The JavaScript UI
- The Rails API that the UI uses
- The crash report intake pipeline
The UI changes are probably the most visible ones. You may have noticed the launch of developer.cloud.unity3d.com already, which aims to unify access to our growing number of services.
Of the three pieces of the crash service, the one that has changed most in the last 12 months has been the intake pipeline. These intake changes (fortunately) are less visible, but they are crucial, because we want to support every Unity game made.
How It Used To Work
Originally, the intake pipeline looked something like this:
Editor Plugin -> Node -> SQS + DynamoDB -> Rails -> MySQL
The editor plugin listened for exceptions, batched them, then sent them to Node. Node listened for the events, it put them in DynamoDB, then it sent an SQS message to Rails stating where to find the event in Dynamo. Rails then got it back out of Dynamo, processed it, and stored the data in MySQL. Though this workflow was really easy to set up, it’s not very elegant to say the least.
At that time, SQS had a fairly small message-size limit; not enough to store exceptions of all sizes. This is why the SQS message merely states where the event is stored in Dynamo. SQS has since increased the message-size limit to 2GB (which would have relieved our problem with storing exceptions). At first, we stored every event we received in Dynamo, just in case we made a huge mistake, because we could always re-import the data by replaying the events.
What Happened When We Went Live
We launched our little hack project during GDC ‘15, and we got way more activity than we expected. We were expecting thousands of exceptions a day—but we got millions. We had to rate-limit certain projects that were sending thousands of exceptions per second.
Outside of operational issues, we noticed that our setup had one big bottleneck: the time spent putting things into SQS and Dynamo, only to grab them back out in Rails, process them, and put them into the database. The Rails side alone took around 75ms per exception!
One positive thing about the original setup was the way that accepting an event and processing an event were decoupled. This design made it easy to start and stop processing while we updated the code, without dropping ANY events.
What We Did Next
In the abstract, processing a crash report consists of the following steps:
- Fingerprint it,
- Find or create it by the fingerprint,
- Increment the counter,
- Associate it with the operating systems and platforms we saw it on.
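The four steps above can be sketched in Go as follows. This is only an illustration: the `crashReport` fields, the in-memory `groups` map, and the SHA-1 fingerprinting are assumptions standing in for the real payload and the MySQL tables, which the post doesn’t detail.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// crashReport is a simplified stand-in for the real crash payload.
type crashReport struct {
	StackTrace string
	OS         string
	Platform   string
}

// crashGroup aggregates identical crashes under one fingerprint.
type crashGroup struct {
	Count     int
	Platforms map[string]bool
}

// groups stands in for the crash table keyed by fingerprint.
var groups = map[string]*crashGroup{}

// fingerprint derives a stable key from the stack trace (step 1).
func fingerprint(r crashReport) string {
	sum := sha1.Sum([]byte(r.StackTrace))
	return hex.EncodeToString(sum[:])
}

// process runs steps 2-4: find-or-create, increment, associate.
func process(r crashReport) *crashGroup {
	fp := fingerprint(r)
	g, ok := groups[fp]
	if !ok {
		g = &crashGroup{Platforms: map[string]bool{}}
		groups[fp] = g
	}
	g.Count++
	g.Platforms[r.OS] = true
	g.Platforms[r.Platform] = true
	return g
}

func main() {
	g := process(crashReport{"NullReferenceException at Foo.Bar", "Windows 7", "standalone"})
	g = process(crashReport{"NullReferenceException at Foo.Bar", "Android 5.0", "mobile"})
	fmt.Println(g.Count) // 2: the same stack trace lands in one group
}
```

Because identical stack traces hash to the same fingerprint, the two reports above collapse into a single group with a count of two and both platforms associated.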
Of course, I set out to replace just the fast thing that I didn’t like (Node) with something else that I hadn’t learned yet (Golang). I tried this, but realized it wouldn’t work any better, because the AWS libraries for Golang were very young. So I decided to try replacing the whole intake pipeline, just to simplify it.
My goal was to write something like this:
Editor Plugin -> Go -> MySQL
I wanted something really simple and fast. I didn’t want disk space alerts from verbose logging, or memory alerts from abused Ruby processes. Here’s how my process went:
My initial implementation was a literal translation from Rails. It did all the same MySQL select statements, then created the rows or updated the counters.
My first optimization was to remove all the statements that were duplicated between reports. These duplicates were SELECT statements, such as ‘SELECT id FROM operating_systems WHERE name = "Windows 7"’. These statements were completely safe to cache in the app, and I made great use of Hashicorp’s Go LRU hash to do it. Then I performed the same optimization to cache crash fingerprints, so that I didn’t have to query the database each time I saw the same exception.
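The caching idea can be sketched roughly like this. The real service used Hashicorp’s golang-lru; to keep this example dependency-free it uses a plain map guarded by a mutex instead, and the `lookup` callback is a hypothetical stand-in for the MySQL SELECT.

```go
package main

import (
	"fmt"
	"sync"
)

// osIDCache memoizes lookups of the form
// SELECT id FROM operating_systems WHERE name = ?
// so repeated reports never hit the database for the same name.
type osIDCache struct {
	mu     sync.Mutex
	ids    map[string]int64
	lookup func(name string) int64 // stand-in for the database query
}

func (c *osIDCache) get(name string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.ids[name]; ok {
		return id // cache hit: no database round-trip
	}
	id := c.lookup(name)
	c.ids[name] = id
	return id
}

func main() {
	queries := 0
	c := &osIDCache{
		ids: map[string]int64{},
		lookup: func(name string) int64 { // fake SELECT for illustration
			queries++
			return 42
		},
	}
	c.get("Windows 7")
	c.get("Windows 7")
	fmt.Println(queries) // 1: the second call was served from the cache
}
```

A plain map never evicts, which is why the real service reached for an LRU: it bounds memory while still keeping the hottest rows resident.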
I had to implement a fair amount of locking around each of these LRU hashes, which didn’t feel very Go-like, but it worked. One thing I did was use finer-grained locks so that I could update different keys concurrently.
The next bottleneck I hit was regarding writes: each write event caused me to increment the counter. My database was dutifully counting from 1 to 100,000,000. One at a time.
I knew I wanted to batch my writes, but I wanted to do it in a robust way. I leveraged Hashicorp’s LRU hash again, which provides an on-evict hook. That way, when a crash report was evicted from memory, it was written to the database. But then I thought, “What if I don’t get enough unique crash reports to cause an eviction?” So, I hacked it and added another method that lets you make an entry with a Time To Live (“TTL”).
It’s important to note that the TTL lives on each entry. That way, each TTL eviction is staggered, so that it doesn’t create a thundering herd of database writes.
Given all the above considerations, an AWS t2.medium instance can (burst) process about 10,000 req/s, which is pretty decent.
We also plan to have edge servers in different regions. Your games will send reports to the servers in the closest geographic region. Those servers will do the same batching, then they will forward the events to the area where the database lives. They’ll be using the same eviction hook to make an HTTPS request instead of a database call.
TL;DR: I know there hasn’t been much news around Game Performance Reporting, but we haven’t forgotten about it. I hope this story helped you understand what we’ve been doing behind the scenes. Keep talking to us on our forum!
Source: https://blogs.unity3d.com/2015/12/02/the-state-of-game-performance-reporting/