
Frustum Aligned Rendering Solution in VR with Unity’s Scriptable Rendering Pipeline – Part 2

This article is split into the following parts; click a title to jump to that part.

Why SRP

What’s wrong with the default renderer

The default Unity renderer offers two rendering paths: the default forward path and the default deferred path.

The default forward renderer has the best compatibility, but the way it handles multiple lights is inefficient. In the forward path, an object lit by multiple lights is drawn once per light: if an object is lit by 5 lights, its mesh is drawn 5 times and the results are added together to produce the final shading color. This taxes both the CPU and the GPU as the scene and lighting conditions grow more complex. On the CPU side, the batch count rises rapidly with the light count, and because different objects are lit by different sets of lights, they cannot be batched together even when they share the same material. On the GPU side, a single mesh must be processed several times through the entire graphics pipeline to accumulate the final lighting: input assembly, vertex shading, clipping, rasterization, the various per-fragment tests, and the final blend are all repeated for one mesh. So unless we are targeting a really low-end platform, the default forward renderer is not a good choice, especially for a PBR pipeline.
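To make that cost concrete, here is a rough conceptual sketch of what the multi-pass forward path amounts to, expressed with CommandBuffer.DrawMesh. This is not Unity's actual internals; the mesh, matrix, and materials are hypothetical placeholders, and addMat is assumed to blend additively like a ForwardAdd pass.

```csharp
// Conceptual sketch only (not Unity's real forward-path code): multi-pass
// forward lighting replays the whole mesh once per light and blends additively.
using UnityEngine;
using UnityEngine.Rendering;

public static class ForwardLightingSketch
{
    // 'mesh', 'matrix', 'baseMat', and 'addMat' are hypothetical placeholders;
    // addMat is assumed to use additive blending (Blend One One).
    public static void Record(CommandBuffer cmd, Mesh mesh, Matrix4x4 matrix,
                              Material baseMat, Material addMat, Light[] lights)
    {
        // Base pass: ambient plus the first light.
        cmd.DrawMesh(mesh, matrix, baseMat);

        // Every additional light re-runs the full pipeline for the same mesh.
        for (int i = 1; i < lights.Length; ++i)
        {
            // Per-light data would be uploaded here (position, color, range).
            cmd.SetGlobalVector("_SketchLightPos", lights[i].transform.position);
            cmd.SetGlobalColor("_SketchLightColor", lights[i].color);
            cmd.DrawMesh(mesh, matrix, addMat); // N lights => N draws per mesh
        }
    }
}
```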

Figure 1. Forward rendering in the Unity default pipeline. In the frame debugger, we can see that the plane mesh is drawn as many times as there are lights.
Figure 2. Deferred rendering in the Unity default pipeline. In the frame debugger, we can see that for every light, an approximate proxy geometry of the light is drawn to the buffer, which results in repeated G-buffer reads and render target writes.

The default deferred renderer also puts compatibility first. Any platform that supports multiple render targets can use it, and it only requires vertex and pixel shaders. After the G-buffers are rendered, lights are rendered one by one as proxy geometry onto the final render target, reading surface information from the G-buffer textures. G-buffer rendering is not affected by the light setup, so it batches well. The lighting part, however, has two drawbacks. First, every light is drawn in a separate pass, so for a G-buffer region covered by multiple lights, the G-buffer textures are read multiple times. Second, a light may not actually light any pixel even though its proxy covers many pixels in screen space, which means a lot of texel fetches end up wasted; you could first read only the depth buffer and check whether the pixel is inside the light's range, but we all know that branching, unless executed carefully, does not run well on GPU threads. Another major drawback of the default deferred pipeline is that it only supports the standard BRDF. When we have translucent, anisotropic, or transparent objects, it falls back to the forward method and the lighting problem appears again.
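For illustration, a minimal sketch of the light-volume idea follows, again with hypothetical assets: each point light is drawn as a proxy sphere whose lighting material is assumed to blend additively and sample the G-buffer textures. Every pixel the sphere covers pays the G-buffer reads, even where the light contributes nothing.

```csharp
// Hedged sketch of deferred light-volume rendering: after the G-buffer pass,
// each point light is drawn as a proxy sphere that reads the G-buffers.
// 'sphereMesh' and 'deferredLightMat' are hypothetical assets.
using UnityEngine;
using UnityEngine.Rendering;

public static class DeferredLightingSketch
{
    public static void RecordLights(CommandBuffer cmd, Mesh sphereMesh,
                                    Material deferredLightMat, Light[] lights)
    {
        foreach (Light light in lights)
        {
            if (light.type != LightType.Point) continue;

            // Scale the proxy sphere to the light's range; every pixel it
            // covers re-reads the G-buffers, lit or not.
            Matrix4x4 proxy = Matrix4x4.TRS(light.transform.position,
                                            Quaternion.identity,
                                            Vector3.one * light.range * 2f);
            Vector3 p = light.transform.position;
            cmd.SetGlobalVector("_SketchLightPosRange",
                                new Vector4(p.x, p.y, p.z, light.range));
            cmd.SetGlobalColor("_SketchLightColor", light.color * light.intensity);
            cmd.DrawMesh(sphereMesh, proxy, deferredLightMat);
        }
    }
}
```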

This situation is understandable if we look at the range of projects made with Unity. From consoles and PC to mobile and even the web, it is hard for a single render pipeline to guarantee compatibility and at the same time achieve maximum efficiency.

Possible solutions

Besides the default render pipelines, Unity also exposes some lower-level graphics features that allow a degree of customization: the Graphics APIs and the CommandBuffer APIs. With these APIs, you can do additional work alongside the default pipeline, and combined with custom shaders, some rendering techniques can be implemented. However, when you try to reorganize a whole render pipeline with these add-on APIs, all kinds of problems still stand in your way.
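As a concrete example of these extension points, the snippet below (a minimal sketch; the post-effect material is a hypothetical placeholder) attaches a CommandBuffer to a camera event so that extra work runs inside the default pipeline:

```csharp
// Inject extra rendering work into the default pipeline at a fixed camera
// event. 'blurMat' is a hypothetical post-effect material.
using UnityEngine;
using UnityEngine.Rendering;

[RequireComponent(typeof(Camera))]
public class CommandBufferHook : MonoBehaviour
{
    public Material blurMat; // hypothetical material applied during the blit
    private CommandBuffer cmd;

    void OnEnable()
    {
        cmd = new CommandBuffer { name = "Custom blit after opaques" };
        int tempId = Shader.PropertyToID("_TempRT");
        cmd.GetTemporaryRT(tempId, -1, -1);  // -1, -1 = camera pixel size
        cmd.Blit(BuiltinRenderTextureType.CameraTarget, tempId, blurMat);
        cmd.Blit(tempId, BuiltinRenderTextureType.CameraTarget);
        cmd.ReleaseTemporaryRT(tempId);
        GetComponent<Camera>().AddCommandBuffer(CameraEvent.AfterForwardOpaque, cmd);
    }

    void OnDisable()
    {
        GetComponent<Camera>().RemoveCommandBuffer(CameraEvent.AfterForwardOpaque, cmd);
        cmd.Release();
    }
}
```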

At the start of my work, I had already realized the lighting problems of the default render pipeline, and the CommandBuffer features were available by then. So the tiled lighting method came to mind, and it seemed fairly easy to implement with the CommandBuffer features: dispatching compute shaders, binding resources, and so on. For every light in the scene, we attach a MonoBehaviour to it and collect the light data with a manager class. In MonoBehaviour.OnPreRender(), we do all the light-tile intersection in a compute shader and bind all the related compute and constant buffers to the shaders, then run the default forward render pipeline with tiled lighting shaders (a minimal sketch of this setup appears after Figure 3). Up to this point everything works fine, except that we cannot directly reuse the frustum and occlusion culling results of Unity's culling module; either redoing the culling by hand or processing every light in the scene in the compute shader wastes computation on the CPU or the GPU.

That is a minor problem compared to shadow rendering. Unity does not expose any reference to its shadow maps on the CPU side, which means we cannot manually bind the default-rendered shadow maps to the GPU. Oops, no shadow maps are available during pixel shading. So, render them again by hand? It seems the only way to go! Let's do all the custom shadowing with an additional shadow camera. In a test scene with several cubes and spheres, it works fine. Then, tested in a scene from the actual project, the frame rate is terrible, even much worse than default forward rendering. What's going on? After profiling, the GPU side is actually fine, but on the CPU side the shadow camera costs a whole lot of time. Unity treats the shadow camera as an ordinary camera (of course, why not?) and does all the work an ordinary camera does: culling, batch organizing, and so on. The CPU time just climbs rapidly when there are multiple shadow-casting lights in the scene.

Figure 3. Doing clustered lighting in the default pipeline. In a scene with a directional light using two shadow cascades, each cascade is rendered with the shadow camera, and the shadow camera rendering costs a lot of CPU time (inside the red box).
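For reference, here is a minimal sketch of the light-collection and compute-dispatch setup described above. The compute shader, its kernel name CullLights, the 16x16 tile size, and the buffer names are assumptions for illustration, not the project's actual code; the component is assumed to live on the camera so that OnPreRender fires.

```csharp
// Sketch: gather scene lights on the CPU, upload them to a ComputeBuffer,
// and dispatch a hypothetical light-tile culling compute shader.
using System.Collections.Generic;
using UnityEngine;

public class TiledLightManager : MonoBehaviour
{
    struct GPULight { public Vector4 positionRange; public Vector4 color; }

    public ComputeShader tileCulling;  // hypothetical culling compute shader
    private ComputeBuffer lightBuffer;
    private int kernel;
    const int MaxLights = 256;

    void OnEnable()
    {
        lightBuffer = new ComputeBuffer(MaxLights, 32); // 2 x float4 per light
        kernel = tileCulling.FindKernel("CullLights");  // assumed kernel name
    }

    void OnPreRender() // requires this script to sit on the camera
    {
        var gpuLights = new List<GPULight>();
        foreach (Light l in FindObjectsOfType<Light>())
        {
            if (l.type != LightType.Point || gpuLights.Count >= MaxLights) continue;
            Vector3 p = l.transform.position;
            gpuLights.Add(new GPULight {
                positionRange = new Vector4(p.x, p.y, p.z, l.range),
                color = (Vector4)(l.color * l.intensity)
            });
        }
        lightBuffer.SetData(gpuLights.ToArray());

        // Light-tile intersection on the GPU; one thread group per 16x16 tile.
        tileCulling.SetInt("_LightCount", gpuLights.Count);
        tileCulling.SetBuffer(kernel, "_Lights", lightBuffer);
        tileCulling.Dispatch(kernel, (Screen.width + 15) / 16,
                                     (Screen.height + 15) / 16, 1);

        // Make the light data visible to the tiled lighting shaders.
        Shader.SetGlobalBuffer("_Lights", lightBuffer);
    }

    void OnDisable() { lightBuffer.Release(); }
}
```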

Now we can draw the conclusion. The lighting process of the default Unity render pipelines is not efficient enough, especially when it falls back to forward rendering, and even with customization APIs like CommandBuffer, it is still difficult to replace the full rendering process of the objects placed in a scene.

