Graphics Programming

GPU based Light Culling Algorithm Study

Modern games need more and more dynamic lights, and the increasing complexity of game scenes makes it harder and harder to determine light-object intersections in object space on the CPU side.

Deferred shading was developed to solve this kind of problem by moving the actual lighting process from object space to screen space. However, every light on screen still has to be drawn in its own pass, so the G-buffers are read repeatedly, which creates a bandwidth problem.

So can we draw all the lights in one pass? Of course we can. Several GPU-based light culling algorithms have been developed to solve this kind of problem.

There are basically two solutions to this problem: Tiled Lighting and Clustered Lighting.

Tiled Solution

Tiled lighting is a screen-space method. To draw all the lights in one pass, the brute-force solution would be to calculate a light list for every pixel on the screen, read the G-buffer information, and then do all the lighting. However, this is too expensive to compute, and a per-pixel light list costs a lot of memory. So we reduce the computation from per-pixel to per-tile by dividing the screen into tiles. Each tile may cover 16×16 pixels, 32×32 pixels, or whatever size gives the best balance for your project (Figure 1).

Figure 1. Basic concept of tiled rendering
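As a rough illustration of the tile split described above, here is a minimal Python sketch of the bookkeeping involved. The names `TILE_SIZE`, `tile_grid`, and `tile_index` are hypothetical and chosen only for this example; a real implementation would do this in a compute shader.

```python
TILE_SIZE = 16  # assumed tile edge in pixels; 32 is another common choice


def tile_grid(width, height, tile_size=TILE_SIZE):
    """Return (tiles_x, tiles_y), rounding up so partial tiles are covered."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    return tiles_x, tiles_y


def tile_index(px, py, width, tile_size=TILE_SIZE):
    """Map a pixel coordinate to the flat index of the tile that contains it."""
    tiles_x = (width + tile_size - 1) // tile_size
    return (py // tile_size) * tiles_x + (px // tile_size)
```

At 1920×1080 with 16-pixel tiles this gives a 120×68 grid, so the light lists are stored for 8160 tiles instead of roughly two million pixels.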

Then, for each tile, we calculate the sub-frustum of the view frustum that corresponds to the tile's position on screen. We intersect each tile's sub-frustum with all the visible punctual lights, which gives us the light list of each tile. During deferred shading, we calculate which tile a pixel belongs to and light it with the light list of that tile.
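The sub-frustum test can be sketched as follows. This is a simplified illustration, not a definitive implementation: it assumes a view space where the camera sits at the origin and looks down +z, a screen that divides evenly into tiles, and lights given as bounding spheres `(center, radius)`. All function names are hypothetical.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tile_side_planes(tx, ty, tiles_x, tiles_y, tan_half_fov_x, tan_half_fov_y):
    """Inward-facing side planes of a tile's sub-frustum (all pass through the eye)."""
    # Tile bounds on the z = 1 plane in view space.
    x0 = (2.0 * tx / tiles_x - 1.0) * tan_half_fov_x
    x1 = (2.0 * (tx + 1) / tiles_x - 1.0) * tan_half_fov_x
    y0 = (2.0 * ty / tiles_y - 1.0) * tan_half_fov_y
    y1 = (2.0 * (ty + 1) / tiles_y - 1.0) * tan_half_fov_y
    normals = [(1.0, 0.0, -x0), (-1.0, 0.0, x1), (0.0, 1.0, -y0), (0.0, -1.0, y1)]
    return [_normalize(n) for n in normals]


def sphere_touches_tile(center, radius, planes):
    """Conservative test: reject only if the sphere is fully behind some plane."""
    return all(_dot(n, center) >= -radius for n in planes)


def cull_lights_for_tile(lights, planes):
    """Return the indices of the lights whose bounding sphere touches the tile."""
    return [i for i, (c, r) in enumerate(lights) if sphere_touches_tile(c, r, planes)]
```

The plane test is conservative: it can keep a light that only grazes a frustum corner, but it never drops a light that actually reaches the tile.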

Depth Bound

To make the intersection results tighter, a depth pre-pass first renders all the objects in the scene. During tiled light culling, we read every depth value inside each tile from the depth buffer and calculate the min and max depth. This gives a more tightly bounded frustum volume, so the lights we keep are much more likely to actually light the scene (Figure 2).

Figure 2. Use depth information to tighten the frustum extent
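A minimal sketch of the depth bound step is shown below. It assumes the depth buffer has already been converted to linear view-space depth and is stored as a list of rows; the function names are hypothetical. On the GPU, the min and max are typically computed per thread group rather than with a Python loop.

```python
def tile_depth_bounds(depth, tx, ty, tile_size):
    """Return (z_min, z_max) of the view-space depths inside tile (tx, ty)."""
    rows = depth[ty * tile_size:(ty + 1) * tile_size]
    values = [z for row in rows for z in row[tx * tile_size:(tx + 1) * tile_size]]
    return min(values), max(values)


def sphere_in_depth_range(center_z, radius, z_min, z_max):
    """True if the light's depth extent overlaps the tile's [z_min, z_max]."""
    return center_z + radius >= z_min and center_z - radius <= z_max
```

A light passes culling for a tile only if it touches the tile's side planes and also overlaps this depth range.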

Discontinuity Problem

Even with depth bounds, tiled lighting still suffers from depth discontinuities. When the depth within a tile varies a lot, the light list might contain a light that lies inside the range between min and max depth but doesn't actually light any pixel in the tile. Two techniques address this:

  1. 2.5D Culling. For every tile frustum whose depth range is too large, we split the depth range again and intersect the light with each subrange of the tile frustum (Figure 3).

Figure 3. Separate the depth extent into smaller units to mitigate the depth discontinuity problem

  2. Fine Pruned Tiled Light Lists (FPTL). This method adds an additional pass after tiled light culling that prunes each tile's light list down to the lights that actually affect the pixels inside the tile.
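One way to realize the depth subrange idea behind 2.5D culling is a bitmask over equal depth cells, sketched below. The cell count of 32 and all function names are assumptions made for this example; depths are assumed to be linear view-space values.

```python
NUM_CELLS = 32  # assumed number of depth cells per tile (fits a 32-bit mask)


def cell_of(z, z_min, z_max, num_cells=NUM_CELLS):
    """Index of the depth cell containing z, clamped to the valid range."""
    if z_max <= z_min:
        return 0
    t = (z - z_min) / (z_max - z_min)
    return min(num_cells - 1, max(0, int(t * num_cells)))


def geometry_mask(depths, z_min, z_max):
    """Set one bit for every depth cell that contains at least one pixel."""
    mask = 0
    for z in depths:
        mask |= 1 << cell_of(z, z_min, z_max)
    return mask


def light_mask(center_z, radius, z_min, z_max):
    """Set the bits of all depth cells covered by the light's depth extent."""
    lo = cell_of(center_z - radius, z_min, z_max)
    hi = cell_of(center_z + radius, z_min, z_max)
    mask = 0
    for c in range(lo, hi + 1):
        mask |= 1 << c
    return mask


def light_survives(depths, center_z, radius):
    """Keep the light only if it shares a depth cell with some geometry."""
    z_min, z_max = min(depths), max(depths)
    if center_z + radius < z_min or center_z - radius > z_max:
        return False
    return (geometry_mask(depths, z_min, z_max) & light_mask(center_z, radius, z_min, z_max)) != 0
```

With a tile that sees a foreground object near depth 1 and a background near depth 9, a small light floating at depth 5 passes the plain min-max test but is rejected here, because it shares no depth cell with any pixel.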

Cluster Solution

As a screen-space solution that depends on the depth buffer, tiled lighting has one major drawback: transparent objects cannot use the light list information, because they have no depth information in the depth buffer. This is where clustered lighting comes in.

The core idea of clustered lighting is actually pretty similar to tiled lighting. The only difference is how the camera frustum is split. Tiled lighting splits the frustum based on screen-space tile size and depth bounds. Clustered lighting also splits the frustum based on screen-space tile size, but in the Z direction it no longer depends on the depth bounds. Instead, we directly divide each tile's sub-frustum into several z slices (Figure 4), so the whole frustum is divided into 3D sub-frustums called clusters. After the light-cluster intersection, we have the light information for every subspace inside the camera view. This means the light lists no longer depend on any scene geometry, and we don't need depth information during the culling computation.

Figure 4. Separate the whole camera frustum into clusters regardless of the depth information
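The z slicing can be sketched as below. The text does not say how the slices are spaced; this example assumes exponential spacing between the near and far planes, which is a common choice, though uniform spacing would work with the same structure. The function names are hypothetical, and depths are positive view-space distances.

```python
def slice_bounds(k, num_slices, near, far):
    """Near and far depth of slice k under the assumed exponential spacing."""
    ratio = far / near
    z0 = near * ratio ** (k / num_slices)
    z1 = near * ratio ** ((k + 1) / num_slices)
    return z0, z1


def slices_touched_by_light(center_z, radius, num_slices, near, far):
    """Indices of the z slices overlapped by the light's depth extent."""
    touched = []
    for k in range(num_slices):
        z0, z1 = slice_bounds(k, num_slices, near, far)
        if center_z + radius >= z0 and center_z - radius <= z1:
            touched.append(k)
    return touched
```

Combining the touched slices with the tiles whose side planes the light overlaps gives the set of clusters the light must be added to, with no reference to the depth buffer.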

After the cluster-light intersections, we have the light list of every cluster. During shading, for every visible pixel, we calculate which cluster it belongs to based on its view-space or world-space position, fetch that cluster's light list, and do all the PBR lighting in one pass.
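The per-pixel lookup can be sketched as follows. This is an illustration under stated assumptions: exponential z slicing between `near` and `far`, a flat slice-major cluster layout, and light lists stored as one Python list per cluster. The names `cluster_index` and `lights_for_pixel` are hypothetical.

```python
import math


def cluster_index(px, py, view_z, tile_size, tiles_x, tiles_y, num_slices, near, far):
    """Flat index of the cluster containing a pixel with positive view depth view_z."""
    tx = px // tile_size
    ty = py // tile_size
    # Inverse of the assumed exponential slicing: the slice index grows with log(z).
    t = math.log(view_z / near) / math.log(far / near)
    k = min(num_slices - 1, max(0, int(t * num_slices)))
    return (k * tiles_y + ty) * tiles_x + tx


def lights_for_pixel(px, py, view_z, cluster_lights, **grid):
    """Fetch the light list of the cluster that the pixel falls into."""
    return cluster_lights[cluster_index(px, py, view_z, **grid)]
```

Because the lookup needs only a screen position and a view depth, it works the same way for opaque and transparent surfaces.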


  • Optimizing tile-based light culling
  • A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
  • Forward+: Bringing Deferred Lighting to the Next Level
  • Clustered Deferred and Forward Shading
