About Kite & Lightning
We are a creative development studio with cinematic film sensibilities forging experiences to move people.

Variance Shadow Maps

Posted on February 24, 2014

Real-time shadows are still an annoyance in real-time graphics. Surprisingly, even the latest next-gen games use a multitude of shadowing techniques to compensate for each one’s shortcomings. (Crytek shadow techniques: http://www.crytek.com/download/Playing%20with%20Real-Time%20Shadows.pdf)

The solution? Visual scoping. Be cognizant of the techniques available and craft the art direction within those constraints. Our art direction is heavy cinematic lighting. Our assumptions are:

  • A small number of lights that don’t move. Lights don’t move around in real life
  • Physically plausible pipeline (area lights, not point lights; we still have a ways to go here)
  • Aliasing is the worst offender. Give up hard shadows in favor of soft shadows if there’s going to be sampling
  • 60 fps hard minimum at 1080p

So we have a lot of wiggle room within this art direction but we still need some sort of shadowing mechanism for dynamic characters.

Variance Shadow Maps Overview

My first inclination was to implement Variance Shadow Maps (VSMs) because they are very fast. VSMs use a probability distribution to compute shadow visibility. The idea behind them is to separate the terms of the shadowing function into occluder terms (things that go into the shadow map) and receiver terms (the scene you’re rendering). This lets us pre-filter the shadow map: Gaussian blur, mipmapping, and bilinear/trilinear sampling all help prevent aliasing and biasing problems such as shadow acne. The initial insight for this technique came from computing volumetric shadows (Deep Shadow Maps by Lokovic & Veach).

So, what does it mean when people talk about the shadow test as being a function? Our shadow test is normally a function that returns 1 if a fragment is not in shadow and 0 if a fragment is in shadow. This is a Heaviside step function of the occluder depth d_o and the receiver depth d_r.
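Written out, with d_o the occluder depth stored in the shadow map and d_r the depth of the fragment being shaded, the test can be expressed as follows. The symbol H and the choice of ≥ at equality are assumptions made here for illustration:

```latex
f(d_o, d_r) = H(d_o - d_r), \qquad
H(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}
```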



VSMs approximate this function with a probabilistic function instead of a Heaviside step function.
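Following the VSM paper's convention that 1 means fully lit, the probabilistic version of the test is presumably the probability that the occluder depth is at or beyond the receiver depth:

```latex
f(d_r) = P(d_o \ge d_r)
```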


Here d_o becomes a random variable that represents the distribution of occluder depths. Instead of each texel in a shadow map representing a single depth value, it represents a distribution of depth values. This is powerful because most shadow bias/acne problems come from the quantization of the shadow map:


In traditional shadow mapping, the red lines show the depth sample stored at each texel. The teal object spans multiple depths at each texel because it’s curved. When the camera renders the pixels depicted by arrows, we get self-shadowing because of this quantization.

VSM Deets: Sprinkle That Math Magic

So, instead of storing a single depth value, we store a distribution of depth values at each texel. P(d_o ≥ d_r) is the probability that a depth drawn from that distribution is at least as far from the light as our current fragment, i.e. the fraction of the filter region that does not occlude it. So, how do we store this distribution? Well, we store its first two moments, which are enough to recover its mean and variance.
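In the continuous form used by the VSM paper, the two moments are the expected depth and the expected squared depth over the filter region (a reconstruction; the integral bounds are an assumption):

```latex
E(x) = \int_{-\infty}^{\infty} x \, p(x) \, dx, \qquad
E(x^2) = \int_{-\infty}^{\infty} x^2 \, p(x) \, dx
```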


Here x is the occluder depth at a shadow-map texel, p(x) is our filter weight, and E(x) is the expected value of the distribution in this neighborhood (which is the result of averaging/filtering the shadow-map texels).

Bringing back that undergrad Probability, we can compute the variance and the mean:
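These are the standard identities, written in terms of the two stored values E(x) and E(x²):

```latex
\mu = E(x), \qquad \sigma^2 = E(x^2) - E(x)^2
```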


Using the one-sided version of Chebyshev’s inequality, we can compute an upper bound, p_max, for P(d_o ≥ d_r), the probability that the fragment is lit.
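As given in the VSM paper, with x standing for the occluder depth d_o, t for the receiver depth d_r, and μ and σ² for the mean and variance of the filtered depth distribution, the bound is:

```latex
P(x \ge t) \le p_{\max}(t) = \frac{\sigma^2}{\sigma^2 + (t - \mu)^2}, \qquad t > \mu
```

The bound only applies when t > μ; a fragment at or in front of the mean occluder depth is simply treated as fully lit.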


Fortunately, this upper bound is a good enough approximation for planar receivers. For a detailed explanation and assumptions, you can check out the VSM paper: http://www.punkuser.net/vsm/vsm_paper.pdf

So to recap, here are the general steps to VSM:

  • Render a shadow map and store z and z*z to a render texture. Use linear z-depth; perspective-corrected z (aka z/w, which is what is stored in the depth buffer) is horrible. For floating point textures, you can remap the linear z to [-1,1]. You can enable the usual AA features on the texture (MSAA, bilinear/trilinear sampling, etc.)
  • Optionally blur the shadow map (box or gaussian filter)
  • Generate mipmaps
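  • Render the scene as usual. For the shadow test, use Chebyshev’s inequality to compute p_max whenever the fragment is farther from the light than the filtered mean depth (otherwise treat it as fully lit). p_max is your visibility factor: 1 is fully lit, 0 is fully shadowed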
  • Attenuate the light contribution by p_max
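The steps above can be sketched on the CPU in a few lines of Python. This is only an illustration of the math, not the Unity shader: the function names, the `min_variance` clamp, and the sample depths are all assumptions made for the example.

```python
def filtered_moments(depths):
    """Box-filter a neighbourhood of occluder depths into (E[z], E[z^2])."""
    n = len(depths)
    mean = sum(depths) / n
    mean_sq = sum(z * z for z in depths) / n
    return mean, mean_sq


def chebyshev_upper_bound(mean, mean_sq, receiver_depth, min_variance=1e-6):
    """Return p_max, an upper bound on the lit fraction of the filter region."""
    # A fragment at or in front of the mean occluder depth is treated as fully lit.
    if receiver_depth <= mean:
        return 1.0
    # Variance from the two stored moments; the clamp (an assumed value)
    # guards against a zero or slightly negative variance from rounding.
    variance = max(mean_sq - mean * mean, min_variance)
    d = receiver_depth - mean
    return variance / (variance + d * d)


# Hypothetical filter region: half covered by an occluder at depth 0.2,
# half showing the floor itself at depth 0.8.
mean, mean_sq = filtered_moments([0.2, 0.2, 0.8, 0.8])
p_max = chebyshev_upper_bound(mean, mean_sq, receiver_depth=0.8)

# Attenuate a (made-up) light colour by the visibility factor.
lit_colour = [c * p_max for c in (1.0, 0.9, 0.8)]
print(p_max, lit_colour)
```

With these made-up depths p_max comes out at roughly 0.5, which matches the half-covered filter region. In this simple case of a flat occluder over a flat receiver the bound is exact.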

VSMs are extremely fast because of this pre-filtering (blur, mipmaps, anisotropic/bilinear filtering), and you get a nice fall-off at the shadow edges. However, one unavoidable problem is that you get light leaking and peter panning when you have high variance in your depth distribution:


High variance in the depth distribution caused by multiple overlapping occluders
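A small worked example shows where the leak comes from. The depths below are made up for illustration, and `p_max` here is a hypothetical helper, not code from the actual implementation.

```python
def p_max(depths, receiver_depth):
    """Chebyshev upper bound on visibility from box-filtered occluder depths."""
    n = len(depths)
    mean = sum(depths) / n
    variance = sum(z * z for z in depths) / n - mean * mean
    if receiver_depth <= mean:
        return 1.0
    d = receiver_depth - mean
    return variance / (variance + d * d)


# Two overlapping occluders at depths 0.2 and 0.6, floor at depth 0.8.
# Every texel in the filter region is closer to the light than the floor,
# so the floor should receive no light at all.
overlapping = [0.2, 0.2, 0.6, 0.6]
floor_depth = 0.8

true_visibility = sum(z >= floor_depth for z in overlapping) / len(overlapping)
leaked = p_max(overlapping, floor_depth)
print(true_visibility, leaked)
```

The true visibility is 0, but the bound evaluates to about 0.2 because the two occluders inflate the variance. That leftover 20% is the light leak.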

Implementation Fun That Drives You Crazy

While I was implementing this, I ran into a couple of gotchas. For anyone implementing this in Unity, here are some pitfalls that you can avoid:


Careful with DirectX vs OpenGL: DirectX NDC.z goes from 0 to 1 whereas OpenGL NDC.z goes from -1 to 1 (x and y go from -1 to 1 for both). I was effectively halving my depth resolution in DirectX by rescaling depth into the range 0.5 to 1!
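The difference is easy to see in a tiny Python sketch. The function names are hypothetical, and the "wrong" remap is only a guess at the kind of mistake described above (applying the OpenGL remap to DirectX depth).

```python
def ndc_z_to_unit(z_ndc, api):
    """Map NDC z to [0, 1] for the given graphics API convention."""
    if api == "opengl":
        return z_ndc * 0.5 + 0.5  # [-1, 1] -> [0, 1]
    if api == "directx":
        return z_ndc              # already in [0, 1]
    raise ValueError(f"unknown api: {api}")


def wrong_directx_remap(z_ndc):
    """Presumed bug: the OpenGL remap applied to DirectX depth."""
    return z_ndc * 0.5 + 0.5      # [0, 1] -> [0.5, 1], half the range wasted


print(ndc_z_to_unit(-1.0, "opengl"), ndc_z_to_unit(1.0, "opengl"))
print(wrong_directx_remap(0.0), wrong_directx_remap(1.0))
```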



Much better but you can still see the effects of peter panning & light leakage

The next problem turned out to be two problems. A) I was using the wrong projection matrix for the DX render path. Unity uses OpenGL projection matrices by default, so you have to call GL.GetGPUProjectionMatrix() to get the right version. That caused a change in handedness, which led to 7 hours of shader debugging to find that Cull Back had turned into Cull Front. B) You shouldn’t front-face cull; you’ll get this winnowing effect where the shadows shrink.


This is the correct image which shows a bit of light leakage but better contact shadows and not as much peter panning.


I also noticed a difference between computing VSMs on linearized depth vs perspective-corrected z (z/w, which is what’s stored in the depth buffer). This was surprising, since I was using 32-bit floating point textures. There’s a little bit less light bleeding.


Using perspective-corrected depth values (z/w) vs using linearized depth (z values mapped linearly from [Near Plane, Far Plane] to [0,1])

Sadly, my foundation in Probability isn’t solid enough to explain this. If you can explain it, please add some insight in the comments. I’d imagine it has something to do with how perspective correction alters the variance of our depth distribution random variable.
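For reference, the two encodings can be compared with a short Python sketch. It does not settle the question, but it shows how differently they space out depth values. The formula assumes a DirectX-style projection with NDC z in [0, 1], and the clip planes are made-up values.

```python
def linear_depth(z_view, near, far):
    """View-space depth mapped linearly from [near, far] to [0, 1]."""
    return (z_view - near) / (far - near)


def perspective_depth(z_view, near, far):
    """z/w for a DirectX-style perspective projection (NDC z in [0, 1])."""
    return (far / (far - near)) * (1.0 - near / z_view)


near, far = 0.1, 100.0  # hypothetical clip planes
for z_view in (0.1, 1.0, 10.0, 100.0):
    print(z_view, linear_depth(z_view, near, far), perspective_depth(z_view, near, far))
```

With these planes, z/w has already used more than 90% of its range by the time the view-space depth reaches 1.0, while the linear encoding has used under 1%. The moments E(x) and E(x²) are therefore computed over very differently shaped distributions in the two cases.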

Ultimately, I didn’t settle on this technique as the light leaking artifact was too detrimental. The standard light leakage reduction technique created some “cartoony” fattened shadows. Instead, I switched over to Exponential Shadow Maps, which are even faster and better.
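The reduction technique referred to here is presumably the common one that cuts off low p_max values and rescales the rest. That is an assumption, and the function names and the `amount` parameter below are illustrative. A minimal Python sketch:

```python
def linstep(lo, hi, v):
    """Linear ramp from 0 at `lo` to 1 at `hi`, clamped to [0, 1]."""
    return min(max((v - lo) / (hi - lo), 0.0), 1.0)


def reduce_light_bleeding(p_max, amount):
    """Treat any visibility below `amount` as fully shadowed, rescale the rest."""
    return linstep(amount, 1.0, p_max)


# A leaked visibility of 0.2 is clamped to 0 with amount = 0.3,
# but a legitimate penumbra value is darkened as well.
print(reduce_light_bleeding(0.2, 0.3), reduce_light_bleeding(0.65, 0.3))
```

Because every penumbra value gets pushed toward 0, the soft edge of the shadow darkens along with the leak, which is consistent with the fattened look described above.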


Stay tuned for a detailed follow-up…