1. Pursuing Performance Problems (read)

  Pursuing Performance Problems provides an exploration of the Unity Profiler and a series of methods to profile our application, detect performance bottlenecks, and perform root-cause analysis.

2. Scripting Strategies (read)

  Scripting Strategies deals with best practices for our Unity C# script code: minimizing MonoBehaviour callback overhead, improving interobject communication, and more.

3. The Benefits of Batching (read)

  The Benefits of Batching explores Unity's Dynamic Batching and Static Batching systems, and how they can be utilized to ease the burden on the Rendering Pipeline.

4. Kickstart Your Art

  Kickstart Your Art helps you understand the underlying technology behind art assets and learn how to avoid common pitfalls with importing, compression, and encoding.

5. Faster Physics

  Faster Physics is about investigating the nuances of Unity's internal Physics Engines for both 3D and 2D games, and how to properly organize our physics objects for improved performance.

6. Dynamic Graphics (read)

  Dynamic Graphics provides an in-depth exploration of the Rendering Pipeline and how to improve applications that suffer rendering bottlenecks on the GPU or CPU; how to optimize graphical effects such as lighting, shadows, and Particle Effects; ways to optimize Shader code; and some specific techniques for mobile devices.

7. Virtual Velocity and Augmented Acceleration

  Virtual Velocity and Augmented Acceleration focuses on the new entertainment mediums of Virtual Reality (VR) and Augmented Reality (AR), and includes several techniques for optimizing performance that are unique to apps built for these platforms.

8. Masterful Memory Management (read)

  Masterful Memory Management examines the inner workings of the Unity Engine and the Mono Framework, and how memory is managed within these components to protect our application from excessive heap allocations and runtime garbage collection.

9. Tactical Tips and Tricks

  Tactical Tips and Tricks closes the book with a multitude of useful techniques used by Unity professionals to improve project workflow and Scene management.

1. Pursuing Performance Problems

  The Unity Profiler

The different subsystems it can gather data for are listed as follows:

  •   CPU consumption (per-major subsystem)
  •   Basic and detailed rendering and GPU information
  •   Runtime memory allocations and overall consumption
  •   Audio source/data usage
  •   Physics Engine (2D and 3D) usage
  •   Network messaging and operation usage
  •   Video playback usage
  •   Basic and detailed user interface performance (new in Unity 2017)
  •   Global Illumination statistics (new in Unity 2017)

There are generally two approaches to make use of a profiling tool: instrumentation and benchmarking (although, admittedly, the two terms are often used interchangeably).

Instrumentation typically means taking a close look into the inner workings of the application by observing the behavior of targeted function calls, where and how much memory is being allocated, and generally getting an accurate picture of what is happening, with the hope of finding the root cause of a problem. However, this is normally not an efficient way to start looking for performance problems, because profiling any application comes with a performance cost of its own.

When a Unity application is compiled in Development Mode (determined by the Development Build flag in the Build Settings menu), additional compiler flags are enabled, causing the application to generate special events at runtime, which get logged and stored by the Profiler. Naturally, this will cause additional CPU and memory overhead at runtime due to all of the extra workload the application takes on. Even worse, if the application is being profiled through the Unity Editor, then even more CPU and memory will be spent ensuring that the Editor updates its interface, renders additional windows (such as the Scene window), and handles background tasks. This profiling cost is not always negligible. In excessively large projects, it can sometimes cause wildly inconsistent behavior when the Profiler is enabled. In some cases, the inconsistency is significant enough to cause completely unexpected behavior due to changes in event timings and potential race conditions in asynchronous behavior. This is a necessary price we pay for a deep analysis of our code's behavior at runtime, and we should always be aware of its presence.

Before we get ahead of ourselves and start analyzing every line of code in our application, it would be wiser to perform a surface-level measurement of the application. We should gather some rudimentary data and perform test scenarios during a runtime session of our game while it runs on the target hardware; the test case could simply be a few seconds of gameplay, playback of a cut scene, a partial playthrough of a level, and so on. The idea of this activity is to get a general feel for what the user might experience and keep watching for moments when performance becomes noticeably worse. Such problems may be severe enough to warrant further analysis.

This activity is commonly known as benchmarking, and the important metrics we're interested in are often the number of frames per second (FPS) being rendered, overall memory consumption, how CPU activity behaves (looking for large spikes in activity), and sometimes CPU/GPU temperature. These are all relatively simple metrics to collect and can be used as a first approach to performance analysis for one important reason: it will save us an enormous amount of time in the long run, since it ensures that we only spend our time investigating problems that users would notice.
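
A surface-level FPS metric like the one described above can be gathered with a few lines of script. The following is a minimal sketch (the class name and sampling interval are our own choices, not from the book), which averages the frame count over a short window so that single-frame spikes don't dominate the reading:

```csharp
using UnityEngine;

// Minimal FPS counter sketch: averages frames over a fixed interval
// to smooth out single-frame spikes.
public class FpsCounter : MonoBehaviour {
    private const float SampleInterval = 0.5f; // seconds per sample window
    private int _frames;
    private float _elapsed;
    private float _currentFps;

    void Update() {
        _frames++;
        _elapsed += Time.unscaledDeltaTime;
        if (_elapsed >= SampleInterval) {
            _currentFps = _frames / _elapsed;
            _frames = 0;
            _elapsed = 0f;
        }
    }

    void OnGUI() {
        GUI.Label(new Rect(10, 10, 150, 20), _currentFps.ToString("0.0") + " FPS");
    }
}
```

Attach this to any GameObject in a standalone build to get a rough reading directly on the target hardware.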

We should dig deeper into instrumentation only after a benchmarking test indicates that further analysis is required. It is also very important to benchmark by simulating actual platform behavior as much as possible if we want a realistic data sample. As such, we should never accept benchmarking data generated through Editor Mode as representative of real gameplay, since Editor Mode comes with some additional overhead costs that might mislead us, or hide potential race conditions in a real application. Instead, we should hook the profiling tool into the application while it is running in a standalone format on the target hardware.

Many Unity developers are surprised to find that the Editor sometimes calculates the results of operations much faster than a standalone application does. This is particularly common when dealing with serialized data such as audio files, Prefabs, and Scriptable Objects. This is because the Editor will cache previously imported data and is able to access it much faster than a real application would.

      Launching the Profiler

      Editor or standalone instances

Ensure that the Development Build and Autoconnect Profiler flags are enabled.

Unity 2017 Game Optimization (by Chris Dickinson)

      Connecting to a WebGL instance

      Remote connection to an iOS device

https://docs.unity3d.com/Manual/TroubleShootingIPhone.html
      Remote connection to an Android device

https://docs.unity3d.com/Manual/TroubleShootingAndroid.html
      Editor profiling

We can profile the Editor itself. This is normally used when trying to profile the performance of custom Editor Scripts.
    The Profiler window

The Profiler window is split into four main sections:

  • Profiler Controls
  • Timeline View
  • Breakdown View Controls
  • Breakdown View


      Profiler controls

        Add Profiler

        Record

        Deep Profile

Enabling the Deep Profile option re-compiles our scripts with a much deeper level of instrumentation, allowing it to measure each and every invoked method.

        Profile Editor

        Connected Player

        Clear

        Load

        Save

        Frame Selection

      Timeline View

      Breakdown View Controls

      Breakdown View

        The CPU Usage Area

Hierarchy mode reveals most callstack invocations, while grouping similar data elements and global Unity function calls together for convenience. For instance, rendering delimiters, such as BeginGUI() and EndGUI() calls, are combined together in this mode. Hierarchy mode is helpful as a first step to determine which function calls cost the most CPU time to execute.

Raw Hierarchy mode is similar to Hierarchy mode, except that it will separate global Unity function calls into separate entries rather than combining them into one bulk entry. This tends to make the Breakdown View more difficult to read, but may be helpful if we're trying to count how many times a particular global method is invoked, or determining whether one of these calls is costing more CPU/memory than expected. For example, each BeginGUI() and EndGUI() call will be separated into its own entry, making it clearer how many times each is being called compared to the Hierarchy mode.

Perhaps the most useful mode for the CPU Usage Area is the Timeline mode option (not to be confused with the main Timeline View). This mode organizes CPU usage during the current frame by how the call stack expanded and contracted during processing.

Timeline mode organizes the Breakdown View vertically into different sections that represent different threads at runtime, such as the Main Thread, the Render Thread, and various background job threads (the Unity Job System), used for loading activity such as Scenes and other assets. The horizontal axis represents time, so wider blocks are consuming more CPU time than narrower blocks. The horizontal size also represents relative time, making it easy to compare how much time one function call took compared to another. The vertical axis represents the callstack, so deeper chains represent more calls in the callstack at that time.

Under Timeline mode, blocks at the top of the Breakdown View are functions (or, technically, callbacks) called by the Unity Engine at runtime (such as Start(), Awake(), or Update()), whereas blocks underneath them are functions that those functions called into, which can include functions on other Components or regular C# objects.

The Timeline mode offers a very clean and organized way to determine which particular method in the callstack consumes the most time and how that processing time measures up against other methods being called during the same frame. This allows us to gauge the method that is the biggest cause of performance problems with minimal effort.

For example, let's assume that we are looking at a performance problem in the following screenshot. We can tell, with a quick glance, that there are three methods that are causing a problem, and they each consume similar amounts of processing time, due to their similar widths:


In the previous screenshot, we have exceeded our 16.667 millisecond budget with calls to three different MonoBehaviour Components. The good news is that we have three possible methods through which we can find performance improvements, which means lots of opportunities to find code that can be improved. The bad news is that increasing the performance of one method will only improve about one-third of the total processing for that frame. Hence, all three methods may need to be examined and optimized in order to get back under budget.
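
The 16.667 millisecond figure above is simply the per-frame budget for a 60 FPS target; dividing 1,000 milliseconds by the target frame rate gives the budget for any other target. A trivial helper (our own sketch, not from the book) makes the arithmetic explicit:

```csharp
// Per-frame time budget in milliseconds for a given target frame rate.
// 1000 / 60 => ~16.667 ms; 1000 / 30 => ~33.333 ms.
static float FrameBudgetMs(int targetFps) {
    return 1000f / targetFps;
}
```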

        The GPU Usage Area

The GPU Usage Area is similar to the CPU Usage Area, except that it shows method calls and processing time as it occurs on the GPU. Relevant Unity method calls in this Area will relate to cameras, drawing, opaque and transparent geometry, lighting and shadows, and so on.

The GPU Usage Area offers hierarchical information similar to the CPU Usage Area and estimates time spent calling into various rendering functions such as Camera.Render() (provided rendering actually occurs during the frame currently selected in the Timeline View).

        The Rendering Area

The Rendering Area provides some generic rendering statistics that tend to focus on activities related to preparing the GPU for rendering, which is a set of activities that occur on the CPU (as opposed to the act of rendering, which is activity handled within the GPU and is detailed in the GPU Usage Area). The Breakdown View offers useful information, such as the number of SetPass calls (otherwise known as Draw Calls), the total number of batches used to render the Scene, the number of batches saved from Dynamic Batching and Static Batching and how they are being generated, as well as memory consumed for textures.

        The Memory Area

Simple mode provides only a high-level overview of memory consumption by subsystem. These include Unity's low-level Engine, the Mono framework (the total heap size being watched by the Garbage Collector), graphical assets, audio assets and buffers, and even the memory used to store data collected by the Profiler.

Detailed mode shows memory consumption of individual GameObjects and MonoBehaviours for both their Native and Managed representations. It also has a column explaining the reason why an object may be consuming memory and when it might be deallocated.

        The Audio Area

The Audio Area grants an overview of audio statistics and can be used both to measure CPU usage from the audio system and total memory consumed by Audio Sources (both playing and paused) and Audio Clips.

The Breakdown View provides lots of useful insight into how the Audio System is operating and how various audio channels and groups are being used.

        The Physics 3D and Physics 2D Areas

There are two different Physics Areas, one for 3D physics (Nvidia's PhysX) and another for the 2D physics system (Box2D). This Area provides various physics statistics, such as Rigidbody, Collider, and Contact counts.

        The Network Messages and Network Operations Areas

These two Areas provide information about Unity's Networking System, which was introduced during the Unity 5 release cycle. The information presented will depend on whether the application is using the High-Level API (HLAPI) or the Transport Layer API (TLAPI) provided by Unity. The HLAPI is an easier-to-use system for managing Player and GameObject network synchronization automatically, whereas the TLAPI is a thin layer that operates just above the socket level, allowing Unity developers to conjure up their own networking system.

        The Video Area

If our application happens to make use of Unity's VideoPlayer API, then we might find this Area useful for profiling video playback behavior.

        The UI and UI Details Areas

These Areas are new in Unity 2017 and provide insight into applications making use of Unity's built-in User Interface System.

        The Global Illumination Area

The Global Illumination Area is another new Area in Unity 2017, and gives us a fantastic amount of detail into Unity's Global Illumination (GI) system.

  Best approaches to performance analysis

    Verifying script presence

Sometimes, there are things we expect to see, but don't. These are usually easy to spot, because the human brain is very good at pattern recognition and spotting differences we didn't expect. Meanwhile, there are times where we assume that something has been happening, but it hasn't. These are generally more difficult to notice, because we're often scanning for the first kind of problem, and we're assuming that the things we don't see are working as intended. In the context of Unity, one problem that manifests itself this way is verifying that the scripts we expect to be operating are actually present in the Scene.
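
One way to guard against this (a sketch of our own, not from the book; the component checked here is just an illustrative example) is a quick runtime assertion that an expected script is actually present:

```csharp
using UnityEngine;

// Sketch of a script-presence check: warns at startup if no enabled
// instance of an expected component type exists in the Scene.
public class ScriptPresenceCheck : MonoBehaviour {
    void Start() {
        if (FindObjectOfType<AudioListener>() == null) {
            Debug.LogWarning("Expected an AudioListener in the Scene, but none was found.");
        }
    }
}
```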

    Verifying script count

Preventing casual mistakes such as this is essential for good productivity, since experience tells us that if we don't explicitly disallow something, then someone, somewhere, at some point, for whatever reason, will do it anyway. This is likely to cost us a frustrating afternoon hunting down a problem that eventually turns out to be caused by human error.

    Verifying the order of events

Unity applications mostly operate as a series of callbacks from Native code to Managed code.
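
The relative order of these callbacks can be verified empirically by attaching a simple logging component (our own sketch) and watching the Console:

```csharp
using UnityEngine;

// Sketch for observing Unity's Native-to-Managed callback order.
// Awake and OnEnable fire before Start; FixedUpdate, Update, and
// LateUpdate then repeat (FixedUpdate on the physics timestep).
public class CallbackOrderLogger : MonoBehaviour {
    void Awake()       { Debug.Log("Awake"); }
    void OnEnable()    { Debug.Log("OnEnable"); }
    void Start()       { Debug.Log("Start"); }
    void FixedUpdate() { Debug.Log("FixedUpdate"); }
    void Update()      { Debug.Log("Update"); }
    void LateUpdate()  { Debug.Log("LateUpdate"); }
}
```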

    Minimizing ongoing code changes

    Minimizing internal distractions

Vertical Sync (otherwise known as VSync) is used to match the application's frame rate to the refresh rate of the device it is being displayed on. For example, a monitor may run at 60 Hertz (60 cycles per second); if the rendering loop in our game is running faster than this, it will sit and wait until that time has elapsed before outputting the rendered frame. This feature reduces screen tearing, which occurs when a new image is pushed to the monitor before the previous image has finished, so that for a brief moment part of the new image overlaps the old image.

Executing the Profiler with VSync enabled will probably generate a lot of noisy spikes in the CPU Usage Area under the WaitForTargetFPS heading, as the application intentionally slows itself down to match the frame rate of the display. These spikes often appear very large in Editor Mode, since the Editor is typically rendering to a very small window, which doesn't take a lot of CPU or GPU work to render.

This will generate unnecessary clutter, making it harder to spot the real issue(s). We should ensure that we disable the VSync checkbox under the CPU Usage Area when we're on the lookout for CPU spikes during performance tests. We can disable the VSync feature entirely by navigating to Edit | Project Settings | Quality and then to the sub-page for the currently selected platform.

We should also ensure that a drop in performance isn't a direct result of a massive number of exceptions and error messages appearing in the Editor Console window. Unity's Debug.Log() and similar methods, such as Debug.LogError() and Debug.LogWarning(), are notoriously expensive in terms of CPU usage and heap memory consumption, which can then cause garbage collection to occur and even more lost CPU cycles (refer to Chapter 8, Masterful Memory Management, for more information on these topics).
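
One common mitigation (a sketch of our own, not from the book; the ENABLE_LOGS symbol name is an assumption) is to route logging through methods marked with [Conditional], so that calls, including the evaluation of their arguments, are compiled away entirely unless the symbol is defined under Player Settings | Scripting Define Symbols:

```csharp
using System.Diagnostics;

// Logging wrapper sketch: calls to these methods (and their argument
// evaluation) are stripped by the compiler unless ENABLE_LOGS is defined.
public static class Log {
    [Conditional("ENABLE_LOGS")]
    public static void Info(string message) {
        UnityEngine.Debug.Log(message);
    }

    [Conditional("ENABLE_LOGS")]
    public static void Warning(string message) {
        UnityEngine.Debug.LogWarning(message);
    }
}
```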

    Minimizing external distractions

    Targeted profiling of code segments

      Profiler script control

The Profiler can be controlled in script code through the Profiler class.
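
For example, Profiler.BeginSample() and Profiler.EndSample() wrap a section of code so that it appears as a named entry in the CPU Usage Area (in Unity 2017 the class lives in the UnityEngine.Profiling namespace; the class and method names below are our own illustration):

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Manual instrumentation sketch: the wrapped section shows up as a
// named sample in the Profiler's CPU Usage Area.
public class ProfiledBehaviour : MonoBehaviour {
    void Update() {
        Profiler.BeginSample("ProfiledBehaviour.ExpensiveWork");
        ExpensiveWork();
        Profiler.EndSample();
    }

    void ExpensiveWork() {
        // placeholder for the code being measured
    }
}
```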
      Custom CPU Profiling

using System;
using System.Diagnostics;

public class CustomTimer : IDisposable {
    private string _timerName;
    private int _numTests;
    private Stopwatch _watch;

    // Give the timer a name and a count of the number of tests we're running.
    public CustomTimer(string timerName, int numTests) {
        _timerName = timerName;
        _numTests = numTests;

        if (_numTests <= 0) {
            _numTests = 1;
        }

        _watch = Stopwatch.StartNew();
    }

    // Automatically called when the 'using()' block ends.
    public void Dispose() {
        _watch.Stop();
        float ms = _watch.ElapsedMilliseconds;
        UnityEngine.Debug.Log(string.Format(
            "{0} finished: {1:0.00} milliseconds total, " +
            "{2:0.000000} milliseconds per-test for {3} tests",
            _timerName, ms, ms / _numTests, _numTests));
    }
}

const int numTests = 1000;
using (new CustomTimer("My Test", numTests)) {
    for(int i = 0; i < numTests; ++i) {
        TestFunction();
    }
} // the timer's Dispose() method is automatically called here
