This article is split into the following parts. Click a title to jump to that part.
- Part 1: What is Unity scriptable render pipeline (SRP)
- Part 2: Why SRP – What’s wrong with the default renderer
- Part 3: Why SRP – Unleash the Power of SRP
- Part 4: Implement details – basics
- Part 5: Implement details – wind
Unleash the power of SRP
With SRP, we can make full use of the culling results from Unity's culling module, including shadow culling. For every light, we decide exactly how and when to render its shadows, and we no longer need the extra shadow camera mentioned in the last section.
In what follows, I'll first introduce the render pipeline I chose for our project, then explain the key features of this flow. Finally, some implementation details will be presented.
Frustum Aligned Render Pipeline
This render pipeline is based on the cluster lighting method. Because ours is a PS VR project, several features were added to make it work more efficiently in VR. The general procedure of the render pipeline is shown in the image below.
Cluster lighting: As we discussed in previous posts, GPU light culling methods are well suited to today's graphics APIs. A large number of lights can be processed efficiently by
Tiled lighting and cluster lighting are both widely used in all kinds of games right now. So why did we choose cluster lighting as our final solution?
First, cluster lighting covers the whole camera frustum regardless of any depth information. So objects that do not write depth can still make full use of the light culling results, as we discussed before.
Second, this is a choice for VR. Our project is a VR title, which means we have two viewports to be concerned about. Tiled lighting is more of a screen-space solution because it needs the depth information. The VR render target is twice as large as a 1080p render target, or even bigger. Can we do the calculation for one eye and use the results to shade both eyes? Clearly not. In VR, the closer an object is to the camera, the larger its screen-space offset between the two eyes, which means that the depth values inside a single tile vary a lot from one eye to the other. With cluster lighting, everything happens in frustum space. The two eyes' frustums are very close together compared to the world scale, and they are parallel. We can use one frustum that is slightly bigger horizontally and encapsulates both eye frustums. We then treat this frustum as our destination frustum for all the cluster subdivision and calculations. During shading, we transform the pixel's world position into this frustum's space and fetch the light information for that position.
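To make the shared frustum concrete, here is a minimal top-down (x, z) sketch of one possible construction. It is an assumption on my side rather than the exact method used in the project: it assumes symmetric, parallel eye frustums, and it keeps the field of view unchanged while pulling the apex back behind the midpoint of the two eyes until the near-plane width covers both. The names `Frustum`, `eye_frustum` and `combined_frustum` are hypothetical. The vertical direction needs no extra work, because pulling the apex back only makes the frustum taller at every depth.

```python
import math
from dataclasses import dataclass

@dataclass
class Frustum:
    """Top-down view of a symmetric frustum looking along +z (hypothetical helper)."""
    apex_x: float      # horizontal position of the apex
    apex_z: float      # position of the apex along the view direction
    tan_half_h: float  # tangent of the horizontal half field of view
    near_z: float      # absolute z of the near plane
    far_z: float       # absolute z of the far plane

    def contains(self, x: float, z: float, eps: float = 1e-9) -> bool:
        depth = z - self.apex_z
        half_width = depth * self.tan_half_h
        return self.near_z <= z <= self.far_z and abs(x - self.apex_x) <= half_width + eps

def eye_frustum(eye_x: float, half_fov_h: float, near: float, far: float) -> Frustum:
    # Each eye sits on the z = 0 plane and looks straight ahead.
    return Frustum(eye_x, 0.0, math.tan(half_fov_h), near, far)

def combined_frustum(eye_separation: float, half_fov_h: float,
                     near: float, far: float) -> Frustum:
    # Assumption: same field of view, apex pulled back so that the frustum is
    # exactly eye_separation / 2 wider on each side at every depth.
    tan_half = math.tan(half_fov_h)
    pull_back = (eye_separation * 0.5) / tan_half
    return Frustum(0.0, -pull_back, tan_half, near, far)
```

With a 64 mm eye separation and a 50 degree half angle, the apex only moves back by about 27 mm, which fits the observation that the two frustums are very close compared to the world scale.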
Volumetric light/fog: As with cluster lighting, we chose a frustum-aligned 3D texture as our volumetric rendering solution, for two reasons.
First, this method does not need depth information, so transparent objects get the same coherent volumetric effect as opaque objects.
Second, in VR, any screen-space method costs twice as much as it does without VR. This 3D method can use the same shared frustum as cluster lighting and produce the final result with a computation cost similar to non-VR rendering.
Another reason for this choice is that the texels of the volumetric lighting texture are aligned with the light clusters, which makes fetching the light list very easy in the volumetric lighting compute pass.
With both methods doing their calculations in frustum-aligned units, “Frustum Aligned VR-Compatible” in the title makes sense now.
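The alignment between volumetric texels and light clusters can be illustrated with one shared indexing function. The sketch below is written under assumptions that are not stated in this article: the grid size `GRID` is a made-up value, the depth slices are distributed exponentially between the near and far planes, and the position is already expressed in the view space of the frustum used for clustering. The function names are hypothetical as well.

```python
import math

GRID = (16, 8, 24)  # hypothetical cluster counts along x, y and depth

def cluster_coord(x, y, z, tan_half_h, tan_half_v, near, far, grid=GRID):
    """Map a view-space position to integer cluster coordinates, or None if outside."""
    if not (near <= z <= far):
        return None
    # Normalised horizontal and vertical position inside the frustum, in [0, 1].
    u = x / (z * tan_half_h) * 0.5 + 0.5
    v = y / (z * tan_half_v) * 0.5 + 0.5
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None
    # Assumption: exponential depth slicing, so slices are thinner near the camera.
    w = math.log(z / near) / math.log(far / near)
    cx = min(int(u * grid[0]), grid[0] - 1)
    cy = min(int(v * grid[1]), grid[1] - 1)
    cz = min(int(w * grid[2]), grid[2] - 1)
    return cx, cy, cz

def flat_index(coord, grid=GRID):
    """Linear index into a per-cluster buffer such as a light list table."""
    cx, cy, cz = coord
    return cx + grid[0] * (cy + grid[1] * cz)
```

If the volumetric texture has the same resolution as the cluster grid, the texel coordinate of a voxel is already its cluster coordinate, so the compute pass can read the light list at `flat_index(texel)` without any extra transform.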
Hybrid shadow map: We all know that rendering geometry takes time on both the CPU and the GPU. In a game, the main camera is always moving, so the view matrix is always changing and the scene has to be re-rendered every frame. But do shadows need to be updated every frame? Not always. If a light is static and all the objects inside its shadow range are static, we don't need to update the shadow map every frame. From this came the idea of partially updating shadows. The shadow map of a light only needs to be updated under the following circumstances:
- The transform of a light is changed;
- The transform of a shadow caster inside the light is changed;
- The shadow map of a light currently does not exist.
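The three rules above can be sketched as a small cache that remembers the state each shadow map was rendered with. This is only an illustration under assumptions: the `ShadowCache` class, the string ids and the tuple transforms are hypothetical and stand in for whatever the real pipeline tracks.

```python
class ShadowCache:
    """Remembers, per light, the transforms its cached shadow map was rendered with."""

    def __init__(self):
        # light_id -> (light_transform, {caster_id: caster_transform})
        self._entries = {}

    def needs_update(self, light_id, light_transform, caster_transforms):
        entry = self._entries.get(light_id)
        if entry is None:
            return True  # rule 3: no shadow map exists for this light yet
        cached_light, cached_casters = entry
        if cached_light != light_transform:
            return True  # rule 1: the light itself moved
        # Rule 2: a caster moved. Comparing the whole dict also catches casters
        # that entered or left the light's range since the last render.
        return cached_casters != caster_transforms

    def store(self, light_id, light_transform, caster_transforms):
        # Call this right after the shadow map has been re-rendered.
        self._entries[light_id] = (light_transform, dict(caster_transforms))
```

A typical frame would call `needs_update` for every visible shadow-casting light, re-render only the maps for which it returns `True`, and then call `store` for those lights.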
Our project is not an open-world game. So, for directional lights, we render an offline shadow map of the whole scene into a very large texture, 4K to 8K depending on the size of the scene. The last cascade always uses this static shadow map, and the nearer cascades all use dynamic shadows. Rainbow Six Siege uses a combination of a static shadow map like the one above and a dynamic shadow map that only updates dynamic objects, for the cascades between the first and the last. This can save a lot of CPU time and GPU shadow rendering time, but it increases shadow map sampling time and may introduce more branches in the shader code. We may try it in the future; it should be a win.
For local lights, if the light itself and the shadow casters in it are all static, the shadow map should also be rendered offline. If the light is dynamic, or there are always dynamic shadow casters inside the light's range, the shadow map should be updated every frame as long as the light is visible. There is a third situation: the light itself is static, but there may or may not be dynamic objects in it, e.g. an enemy walks into a static light during battle. In that case, if the light has no shadow map in the current shadow map atlas, render it, as rule 3 says. If the shadow map exists, only update it when rule 1 or rule 2 applies.
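The three cases for local lights can be summarised as a small decision function. This is a sketch only; the enum names and the idea of tagging each light with a caster mobility value are assumptions made for illustration.

```python
from enum import Enum

class CasterMobility(Enum):
    ALL_STATIC = 1         # nothing inside the light's range ever moves
    SOMETIMES_DYNAMIC = 2  # dynamic objects may or may not be inside the range
    ALWAYS_DYNAMIC = 3     # there is always a dynamic caster inside the range

class ShadowMode(Enum):
    BAKED = 1        # rendered offline, never updated at runtime
    EVERY_FRAME = 2  # re-rendered each frame while the light is visible
    ON_DEMAND = 3    # cached in the atlas, re-rendered only when a rule fires

def choose_shadow_mode(light_is_static: bool, casters: CasterMobility) -> ShadowMode:
    if not light_is_static or casters is CasterMobility.ALWAYS_DYNAMIC:
        return ShadowMode.EVERY_FRAME
    if casters is CasterMobility.ALL_STATIC:
        return ShadowMode.BAKED
    return ShadowMode.ON_DEMAND
```

Only lights in the `ON_DEMAND` mode need the per-frame check against the three update rules.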
Async Compute: Most modern GPUs have a unified shader architecture, like AMD GCN. A unified shader engine means that every SIMD unit can execute any type of shader work, so that all the GPU resources can be put to use. Some tasks are ALU heavy. Some are RW heavy. Some are busy with context switching. When all the tasks in flight come from the same shader program, they compete with each other for the same GPU resources, which causes stalls and wastes computational resources.
So modern GPUs generally have two kinds of pipes: the Graphics Pipe and the Compute Pipe. The shader processors can fetch and execute tasks from both pipes simultaneously and sync at a point chosen by the user. The Graphics Pipe can carry all types of shader tasks, but the Compute Pipe can only dispatch compute shader tasks. So if we have a compute shader to execute and it has no dependency on the tasks currently in the graphics pipe, it can be dispatched asynchronously to use the GPU resources that the graphics tasks are not competing for. This is called async compute.
Async compute works best when the overlapping tasks need different GPU resources. For example, rendering G-Buffers needs a lot of texture reads and render target writes, so it is an RW-heavy process. Alongside it we can run ALU-heavy compute tasks such as the cluster-light intersection shader. In this situation, an AO shader or a texture blur shader, which also need a lot of RW operations, may not be a good fit for async compute.
Unity SRP also supports this feature on DX11-level APIs. As the flow graph in the last section showed, we use this feature to save some time across the whole render pipeline. For example, during the depth prepass the GPU is starved of shader work because of context switches, so we can fill the idle wavefronts with the volumetric lighting compute tasks.
In practice, async compute is not as efficient as it sounds, because of resource contention. So profile it carefully with GPU tools.
- GPU base light culling algorithm study
- Implementing the cluster lighting method
- Volumetric fog/light rendering
- Rendering Rainbow Six Siege, https://www.youtube.com/watch?v=RAy8UoO2blc&t=191s