On Vertex Shader Performance

These tutorials are simple and should run fast enough, but it is still important to look at the performance implications of various operations. In this tutorial, we present three ways of moving vertex data: transform it yourself on the CPU and upload the result to buffer objects; generate transform parameters on the CPU and have the vertex shader use them to do the transform; or put as much as possible in the vertex shader and have the CPU provide only the most basic parameters. Which is the best to use?

This is not an easy question to answer. However, it is almost always the case that transforming vertices on the CPU will be slower than doing it on the GPU. The only time it will not be is when you need to perform the exact same transformations many times within the same frame. And even then, it is better to do the transformation once on the GPU, save the result in a buffer object, and pull from that buffer later. This is called transform feedback, and it will be covered in a later tutorial.

Between the other two methods, which is better really depends on the specific case. Take our example. In one case, we compute the offset on the CPU and pass it to the GPU, which applies it to each vertex position. In the other case, we provide only a time parameter, and for every vertex, the GPU must compute the exact same offset. This means the vertex shader is doing a lot of per-vertex work that always produces the same answer.
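To make the comparison concrete, here is a sketch of what the two vertex shaders might look like. This is illustrative rather than a verbatim listing; the uniform names (offset, time) and the five-second loop are assumptions:

    // Approach 1: the CPU computes the offset; the shader just applies it.
    #version 330

    layout(location = 0) in vec4 position;
    uniform vec2 offset;   // updated by the CPU every frame

    void main()
    {
        gl_Position = position + vec4(offset, 0.0, 0.0);
    }

    // Approach 2: the CPU provides only a time value; every vertex
    // recomputes the exact same offset.
    #version 330

    layout(location = 0) in vec4 position;
    uniform float time;    // elapsed time in seconds

    void main()
    {
        const float loopDuration = 5.0;
        const float scale = 3.14159 * 2.0 / loopDuration;
        float currTime = mod(time, loopDuration);
        vec4 offset = vec4(cos(currTime * scale) * 0.5,
                           sin(currTime * scale) * 0.5,
                           0.0, 0.0);
        gl_Position = position + offset;
    }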

Even so, that does not mean the shader-side computation is always slower. What matters is the overhead of changing data. Changing a uniform takes time, but changing a vec4 uniform typically takes no more time than changing a single float, due to the way that many cards handle floating-point math. The real question is how the cost of doing more complex operations in a vertex shader compares with how often those operations need to be done.

The second vertex shader we use, the one that computes the offset itself, does a fair amount of complex math. Sine and cosine are not particularly fast to compute; they require quite a few operations each. And since the offset does not change from vertex to vertex within a single rendering call, performance-wise it would be best to compute the offset once on the CPU and pass it as a uniform value.
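On the CPU side, that approach might look something like the following sketch. It assumes a FreeGLUT-style timer and a vec2 offset uniform whose location was queried at initialization; the function and variable names are hypothetical:

    // A minimal sketch, assuming the tutorial framework's OpenGL headers
    // and FreeGLUT are available.
    #include <math.h>

    void UpdateOffsetUniform(GLuint program, GLint offsetLocation)
    {
        const float loopDuration = 5.0f;
        const float scale = 3.14159f * 2.0f / loopDuration;

        // GLUT_ELAPSED_TIME reports milliseconds since program start.
        float elapsedTime = glutGet(GLUT_ELAPSED_TIME) / 1000.0f;
        float currTime = fmodf(elapsedTime, loopDuration);

        // Compute the circular offset once per frame on the CPU...
        float xOffset = cosf(currTime * scale) * 0.5f;
        float yOffset = sinf(currTime * scale) * 0.5f;

        // ...and hand it to the vertex shader as a single vec2 uniform.
        glUseProgram(program);
        glUniform2f(offsetLocation, xOffset, yOffset);
        glUseProgram(0);
    }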

And that is, in fact, how rendering is typically done: vertex shaders are given transformation values that were pre-computed on the CPU. But this is neither the only way nor always the best way. In some cases, it is useful to compute the offsets via parameterized values passed to the vertex shader.

This works best when the vertex shader's inputs are abstracted away. That is, rather than passing a position, the user passes more general information, and the shader generates the position for a particular time or some other parameter. This can be done for particle systems based on forces: the vertex shader evaluates the force functions for the given time, and can thus compute the location of the particle at an arbitrary point in its life.
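A hypothetical particle vertex shader along those lines might look like this, where the attribute names and the constant gravity force are assumptions made for the sketch:

    #version 330

    // Per-particle inputs: where it was emitted and its launch velocity.
    layout(location = 0) in vec4 initialPosition;
    layout(location = 1) in vec3 initialVelocity;

    uniform float time;   // seconds since the particle was emitted

    const vec3 gravity = vec3(0.0, -9.8, 0.0);

    void main()
    {
        // Closed-form result of integrating a constant force:
        // p(t) = p0 + v0*t + (1/2)*a*t^2
        vec3 displacement = initialVelocity * time
                          + 0.5 * gravity * (time * time);
        gl_Position = initialPosition + vec4(displacement, 0.0);
    }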

This approach has another advantage, one we have already seen: by passing high-level information to the shader and letting it do complex math, you can affect much more than a simple offset. The color animation in the fragment shader would not have been possible with just an offset; high-level parameterization gives shaders a great deal of freedom.
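As an illustration, a fragment shader given the same time parameter can animate color in a way that no positional offset could express. A sketch, with the colors and loop duration chosen arbitrarily:

    #version 330

    out vec4 outputColor;

    uniform float time;   // the same time value the vertex shader receives

    const vec4 firstColor  = vec4(1.0, 1.0, 1.0, 1.0);
    const vec4 secondColor = vec4(0.0, 1.0, 0.0, 1.0);
    const float loopDuration = 10.0;

    void main()
    {
        // Blend between the two colors as time loops from 0 to 1.
        float currLerp = mod(time, loopDuration) / loopDuration;
        outputColor = mix(firstColor, secondColor, currLerp);
    }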
