Shared Uniforms

The World Space example had a few annoyances in it. Of particular pain was the fact that, whenever the perspective projection matrix or the world-to-camera matrix changed, we had to change uniforms in 3 programs. They all used the same value; it seems strange that we should have to go through so much trouble to change these uniforms.

Also, 3 programs is a relatively simple case. When dealing with real examples, the number of programs can get quite large.

There is a way to share uniforms between programs. To do this, we use a buffer object to store uniform data, and then tell our programs to use this particular buffer object to find their uniform data. A buffer object that stores uniforms is commonly called a uniform buffer object.

It is important to understand that there is nothing special about a uniform buffer. Any of the things you could do with a regular buffer object can be done with a uniform buffer object. You can bind it to the GL_ARRAY_BUFFER and use it for vertex data, you can use it for indexed rendering with GL_ELEMENT_ARRAY_BUFFER, and many other things that buffer objects can be used for. Now granted, that doesn't mean that you should, only that you can.

The example World with UBO uses a uniform buffer object to store the camera and perspective matrices.

Uniform Blocks

This begins with how the vertex shaders are defined.

Example 7.10. UBO-based Vertex Shader

#version 330

layout(location = 0) in vec4 position;

layout(std140) uniform GlobalMatrices
{
    mat4 cameraToClipMatrix;
    mat4 worldToCameraMatrix;
};

uniform mat4 modelToWorldMatrix;

void main()
{
    vec4 temp = modelToWorldMatrix * position;
    temp = worldToCameraMatrix * temp;
    gl_Position = cameraToClipMatrix * temp;
}

The definition of GlobalMatrices looks like a struct definition, but it is not. It defines a uniform block. A uniform block is a series of uniform definitions whose data is not stored in the program object, but instead must come from a uniform buffer.

The name GlobalMatrices is used to identify this particular uniform block. This block has two members, both of type mat4. The order of the members in a uniform block is very important.

Notice that nothing else needs to change in the vertex shader. The modelToWorldMatrix is unchanged, and uses of the uniform block's members do not even need to be scoped with the GlobalMatrices name.

The layout(std140) part modifies the definition of the uniform block. Specifically, it specifies the uniform block layout.

Buffer objects are unformatted arrays of bytes. Therefore, something must determine how the shader interprets a uniform buffer object's contents. OpenGL itself defines this to a degree, but the layout qualifier modifies the definition.

OpenGL is very clear about how each element within a uniform block is laid out. Floating-point values are just the C++ representation of floats, so you can copy them directly from objects like glm::vec4.

Matrices are slightly trickier due to the column-major vs. row-major issue. The glUniformMatrix* functions all had a parameter that defines what order the matrix data given to the function is in. Similarly, a layout qualifier can specify row-major or column-major; these tell OpenGL how the matrices are stored in the buffer object. The default is column-major, and since GLM stores its matrices in column-major order, we can use the defaults.
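As a rough illustration of what column-major storage means for the floats in the buffer, the sketch below maps a (row, column) pair to a position in a flat array of 16 floats. The names columnMajorIndex and makeTranslation are hypothetical helpers written for this illustration; they are not part of OpenGL or GLM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: index of element (row, col) in a 4x4 matrix
// that is stored column by column in a flat array of 16 floats.
inline std::size_t columnMajorIndex(std::size_t row, std::size_t col)
{
    assert(row < 4 && col < 4);
    return col * 4 + row;
}

// Hypothetical helper: builds a column-major translation matrix.
// The translation sits in the fourth column, i.e. floats 12, 13 and 14.
inline std::array<float, 16> makeTranslation(float x, float y, float z)
{
    std::array<float, 16> m{}; // value-initialized: all zeros
    for (std::size_t i = 0; i < 4; ++i)
        m[columnMajorIndex(i, i)] = 1.0f; // identity diagonal
    m[columnMajorIndex(0, 3)] = x;
    m[columnMajorIndex(1, 3)] = y;
    m[columnMajorIndex(2, 3)] = z;
    return m;
}
```

In a row-major layout the same translation would instead occupy floats 3, 7 and 11, which is why the buffer contents and the block's matrix ordering have to agree.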

What OpenGL does not directly specify is the spacing between elements in the uniform block. This allows different hardware to position elements where it is most efficient for them. Some shader hardware can place 2 vec3's directly adjacent to one another, so that they only take up 6 floats. Other hardware cannot handle that, and must pad each vec3 out to 4 floats.

Normally, this would mean that, in order to set any values into the buffer object, you would have to query the program object for the byte offsets for each element in the uniform block.

However, by using the std140 layout, this is not necessary. The std140 layout has an explicit layout specification set down by OpenGL itself. It is basically a kind of lowest-common-denominator among the various different kinds of graphics hardware. The upside is that it allows you to easily know what the layout is without having to query it from OpenGL. The downside is that some space-saving optimizations may not be possible on certain hardware.
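To make the idea of a fixed layout more concrete, here is a minimal sketch that computes std140 byte offsets for a handful of member types. It is deliberately incomplete: arrays, structs and non-square matrices are left out, and the names GlslType, Std140Info, std140Info and std140Offsets are hypothetical, invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subset of GLSL types; arrays and structs are not handled.
enum class GlslType { Float, Vec2, Vec3, Vec4, Mat4 };

struct Std140Info
{
    std::size_t alignment; // base alignment in bytes
    std::size_t size;      // bytes occupied by the member
};

// Base alignment and size under std140 for the types listed above.
inline Std140Info std140Info(GlslType type)
{
    switch (type)
    {
    case GlslType::Float: return {4, 4};
    case GlslType::Vec2:  return {8, 8};
    case GlslType::Vec3:  return {16, 12}; // aligned like a vec4, occupies 12 bytes
    case GlslType::Vec4:  return {16, 16};
    case GlslType::Mat4:  return {16, 64}; // four vec4 columns
    }
    assert(false && "unhandled type");
    return {0, 0};
}

// Returns the byte offset of each member, in declaration order.
inline std::vector<std::size_t> std140Offsets(const std::vector<GlslType> &members)
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (GlslType type : members)
    {
        const Std140Info info = std140Info(type);
        // Round the cursor up to the member's base alignment.
        cursor = (cursor + info.alignment - 1) / info.alignment * info.alignment;
        offsets.push_back(cursor);
        cursor += info.size;
    }
    return offsets;
}
```

For a block holding two mat4 members, as GlobalMatrices does, this yields offsets 0 and 64. Two consecutive vec3 members come out at 0 and 16, which is the padded case: std140 does not allow the tighter 6-float packing that some hardware could otherwise use.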

One additional feature of std140 is that the uniform block is sharable. Normally, OpenGL allows the GLSL compiler considerable leeway to make optimizations. For instance, if a GLSL compiler detects that a uniform is unused in a program, it is allowed to mark it as unused, in which case glGetUniformLocation returns -1 for it. It is legal to set a value to a location of -1, but no data will actually be set.

If a uniform block is marked with the std140 layout, then the ability to disable uniforms within that block is entirely removed. All uniforms must have storage, even if this particular program does not use them. This means that, as long as you declare the same uniforms in the same order within a block, the storage for that uniform block will have the same layout in any program. As a result, multiple different programs can use the same uniform buffer.

The other two alternatives to std140 are packed and shared. The default, shared, prevents the uniform optimization, thus allowing the block's uniform buffer data to be shared among multiple programs. However, the user must still query layout information about where each uniform is stored. packed allows uniform optimization, so these blocks cannot be shared between programs at all.

For our needs, std140 is sufficient. It's also a good first step in any implementation; moving to packed or shared as needed should generally be done only as an optimization. The rules for the std140 layout are spelled out explicitly in the OpenGL Specification.

Uniform Block Indices

Uniforms inside a uniform block do not have individual uniform locations. After all, they do not have storage within a program object; their data comes from a buffer object.

So instead of calling glGetUniformLocation, we have a new function.

data.globalUniformBlockIndex =
    glGetUniformBlockIndex(data.theProgram, "GlobalMatrices");

The function glGetUniformBlockIndex takes a program object and the name of a uniform block. It returns a uniform block index that is used to refer to this uniform block. This is similar to how a uniform location value is used to refer to a uniform, rather than directly using its string name.

Uniform Buffer Creation

Now that the programs have a uniform block, we need to create a buffer object to store our uniforms in.

Example 7.11. Uniform Buffer Creation

glGenBuffers(1, &g_GlobalMatricesUBO);
glBindBuffer(GL_UNIFORM_BUFFER, g_GlobalMatricesUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(glm::mat4) * 2, NULL, GL_STREAM_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, 0);

For all intents and purposes, this is identical to the way we created other buffer objects. The only difference is the use of the GL_UNIFORM_BUFFER binding target.

The GL_ARRAY_BUFFER target has a specific meaning. When something is bound to that target, calling glVertexAttribPointer will cause the buffer object bound to that target to become the source for that particular attribute, as defined by the function call. The GL_ELEMENT_ARRAY_BUFFER target also has a meaning; it specifies where indices come from for indexed rendering. The element array binding is even stored as part of a VAO's data (recall that the array buffer binding is not stored in the VAO).

GL_UNIFORM_BUFFER does not really have an intrinsic meaning like these other two. Having something bound to this binding means nothing as far as any other function of OpenGL is concerned. Oh, you can call buffer object functions on it, like glBufferData as above. But it does not have any other role to play in rendering. The main reason to use it is to preserve the contents of more useful binding points. It also communicates to someone reading your code that this buffer object is going to be used to store uniform data.

Note

This is not entirely correct. OpenGL is technically allowed to infer something about your intended use of a buffer object based on what target you first use to bind it. So by allocating storage for this buffer in GL_UNIFORM_BUFFER, we are signaling something to OpenGL, which can change how it allocates storage for the buffer.

However, OpenGL is not allowed to make any behavioral changes based on this. It is still legal to use a buffer allocated on the GL_UNIFORM_BUFFER target as a GL_ARRAY_BUFFER or in any other buffer object usage. It just may not be as fast as you might want.

We know that this buffer needs to be the size of two glm::mat4's. Two things guarantee this: the std140 layout, and the size of glm::mat4, which just so happens to correspond to how large a GLSL mat4 is when stored in a uniform buffer.

The reshape function is guaranteed to be called after our init function. That's why we can allocate this buffer without filling in a default matrix. The reshape function is as follows:

Example 7.12. UBO-based Perspective Matrix

void reshape (int w, int h)
{
    glutil::MatrixStack persMatrix;
    persMatrix.Perspective(45.0f, (w / (float)h), g_fzNear, g_fzFar);
    
    glBindBuffer(GL_UNIFORM_BUFFER, g_GlobalMatricesUBO);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(glm::mat4), glm::value_ptr(persMatrix.Top()));
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
    
    glViewport(0, 0, (GLsizei) w, (GLsizei) h);
    glutPostRedisplay();
}

This function just uses glBufferSubData to upload the matrix data to the buffer object. Since we defined the perspective matrix as the first matrix in our uniform block, it is uploaded to byte 0.

The display function is what uploads the world-to-camera matrix to the buffer object. It is quite similar to what it used to be:

Example 7.13. UBO-based Camera Matrix

const glm::vec3 &camPos = ResolveCamPosition();

glutil::MatrixStack camMatrix;
camMatrix.SetMatrix(CalcLookAtMatrix(camPos, g_camTarget, glm::vec3(0.0f, 1.0f, 0.0f)));

glBindBuffer(GL_UNIFORM_BUFFER, g_GlobalMatricesUBO);
glBufferSubData(GL_UNIFORM_BUFFER, sizeof(glm::mat4), sizeof(glm::mat4), glm::value_ptr(camMatrix.Top()));
glBindBuffer(GL_UNIFORM_BUFFER, 0);

The world-to-camera matrix is the second matrix, so we start the upload at the end of the previous matrix.
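The two offsets used in these uploads (byte 0 and byte sizeof(glm::mat4)) can also be checked against a CPU-side mirror of the block. The following is only a sketch: GlobalMatricesBlock and writeMatrix are hypothetical names, plain float arrays stand in for glm::mat4, and a 4-byte float is assumed. writeMatrix imitates what glBufferSubData does to the buffer's bytes; it does not call OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical CPU-side mirror of the GlobalMatrices block. Each matrix is
// assumed to be 16 tightly packed 4-byte floats.
struct GlobalMatricesBlock
{
    float cameraToClipMatrix[16];
    float worldToCameraMatrix[16];
};

// The byte offsets and total size that the upload code relies on.
static_assert(offsetof(GlobalMatricesBlock, cameraToClipMatrix) == 0,
              "perspective matrix starts at byte 0");
static_assert(offsetof(GlobalMatricesBlock, worldToCameraMatrix) == 64,
              "camera matrix starts right after the first matrix");
static_assert(sizeof(GlobalMatricesBlock) == 128,
              "the block occupies two 64-byte matrices");

// Copies one matrix into the mirror at a byte offset, much as
// glBufferSubData copies bytes into a buffer object.
inline void writeMatrix(GlobalMatricesBlock &block, std::size_t byteOffset,
                        const float (&matrix)[16])
{
    assert(byteOffset + sizeof(matrix) <= sizeof(block));
    std::memcpy(reinterpret_cast<unsigned char *>(&block) + byteOffset,
                matrix, sizeof(matrix));
}
```

With a mirror like this, the offset argument could be written as offsetof(GlobalMatricesBlock, worldToCameraMatrix) instead of a hand-computed size, which may be easier to keep correct if members are added to the block later.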

Uniform Buffer Binding

Thus far, we have a uniform buffer object into which we store matrices. And we have a program that has a uniform block that needs a uniform buffer to get its uniforms from. Now, the final step is to create the association between the uniform block in the programs and the uniform buffer object itself.

Your first thought might be that there would be a function like glUniformBuffer that takes a program, a uniform block index, and the uniform buffer to associate that block with. But this is not the case; attaching a uniform buffer to a program's block is more complicated. And this is a good thing if you think about it.

It works like this. The OpenGL context (effectively a giant struct containing each piece of data used to render) has an array of uniform buffer binding points. Buffer objects can be bound to each of these binding points. For each uniform block in a program, there is a reference, not to a buffer object, but to one of these uniform buffer binding points. This reference is just a numerical index: 0, 1, 2, etc.

A diagram should make it clearer:

Figure 7.7. Uniform Buffer and Block Binding Points


The program object is given an index that represents one of the slots in the context. The uniform buffer is bound to one of those slots. Therefore, when you render with that program, the uniform buffer that is in the slot specified by the program will be where the program gets its uniform data from.
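The indirection can be modeled in a few lines of ordinary C++. This is a toy model only: ToyContext, ToyProgram, resolveBuffer and kNumBindingPoints are hypothetical names, and nothing here touches real OpenGL state. The actual number of binding points is implementation-dependent and can be queried with GL_MAX_UNIFORM_BUFFER_BINDINGS.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical slot count for the toy model; real limits come from OpenGL.
constexpr std::size_t kNumBindingPoints = 8;

struct ToyContext
{
    // Buffer object name bound to each uniform buffer binding point (0 = none).
    std::array<unsigned, kNumBindingPoints> uniformBufferBindings{};
};

struct ToyProgram
{
    // Binding point index referenced by this program's uniform block.
    std::size_t blockBinding = 0;
};

// Which buffer a program would read its block data from when rendering:
// the program names a slot, and the context says what is in that slot.
inline unsigned resolveBuffer(const ToyContext &context, const ToyProgram &program)
{
    assert(program.blockBinding < kNumBindingPoints);
    return context.uniformBufferBindings[program.blockBinding];
}
```

If two ToyProgram objects both refer to slot 2, changing uniformBufferBindings[2] once changes what both of them resolve to. Neither program has to be touched.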

Therefore, to use a uniform buffer, one first must tell the program object which binding point in the context to find the buffer. This association is made with the glUniformBlockBinding function.

glUniformBlockBinding(data.theProgram, data.globalUniformBlockIndex,
    g_iGlobalMatricesBindingIndex);

The first parameter is the program, the second is the uniform block index queried before. The third is the uniform buffer binding point that this block should use.

After doing this for each program, the uniform buffer must be bound to that binding point. This is done with a new function, glBindBufferRange.

glBindBufferRange(GL_UNIFORM_BUFFER, g_iGlobalMatricesBindingIndex,
    g_GlobalMatricesUBO, 0, sizeof(glm::mat4) * 2);

This function works similarly to glBindBuffer; in addition to binding the buffer to the GL_UNIFORM_BUFFER target, it also binds the buffer to the given uniform buffer binding point. Lastly, it takes an offset and a size, the last two parameters. This allows you to put uniform data at different places in a buffer object, subject to the implementation's offset alignment requirement. You could have the uniform data for several uniform blocks in several programs all in one buffer object. The offset and size parameters say where a given block's data begins and how big it is.
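One detail matters when several blocks share a buffer: for GL_UNIFORM_BUFFER, the offset passed to glBindBufferRange must be a multiple of the implementation's GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which is queried with glGetIntegerv. The sketch below shows one way the start offsets might be computed. alignUp and packBlocks are hypothetical helpers, and the alignment value is passed in by the caller rather than queried here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds offset up to the next multiple of alignment (alignment must be > 0).
inline std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

// Given the byte size of each block's data, returns where each block could
// start inside one shared buffer so that every start offset is aligned.
inline std::vector<std::size_t> packBlocks(const std::vector<std::size_t> &blockSizes,
                                           std::size_t alignment)
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : blockSizes)
    {
        cursor = alignUp(cursor, alignment);
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

Assuming an alignment of 256 bytes purely for illustration, blocks of 128, 16 and 64 bytes would start at offsets 0, 256 and 512. Each offset, together with the block's size, would then be what is passed as the last two arguments of glBindBufferRange.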

The reason this is better than directly binding a buffer object to the program object can be seen in exactly where all of this happens. Both of these functions are called as part of initialization code. glUniformBlockBinding is called right after creating the program, and similarly glBindBufferRange is called immediately after creating the buffer object. Neither one needs to ever be changed. Yes, we change the contents of the buffer object. But where it is bound never changes.

The global constant g_iGlobalMatricesBindingIndex is, as the name suggests, global. By convention, all programs get their buffer data from this index. Because of this convention, if we wanted to use a different buffer, we would not have to update every program that needs to use that buffer. Sure, for one or two programs, that would be a simple operation. But real applications can have hundreds of programs. Being able to establish this kind of convention makes using uniform buffer objects much easier than if they were directly associated with programs.

The Viewpoint

In the World Space example, we drew the camera's look-at target directly in camera space, bypassing the world-to-camera matrix. Doing that with uniform buffers would be harder, since we would have to set the uniform buffer value twice in the same frame. This is not particularly difficult, but it could be a drain on performance.

Instead, we just use the camera's target position to compute a model-to-world matrix that always positions the object at the target point.

Example 7.14. Viewing Point with UBO

glDisable(GL_DEPTH_TEST);

glutil::PushStack push(modelMatrix);

modelMatrix.Translate(g_camTarget);
modelMatrix.Scale(1.0f, 1.0f, 1.0f);

glUseProgram(ObjectColor.theProgram);
glUniformMatrix4fv(ObjectColor.modelToWorldMatrixUnif, 1,
    GL_FALSE, glm::value_ptr(modelMatrix.Top()));
g_pCubeColorMesh->Render();
glUseProgram(0);
glEnable(GL_DEPTH_TEST);

We do not get the neat effect of having the object always face the camera though. We still shut off the depth test, so that we can always see the object.
