Please read and seek to understand the material below. Questions and programming exercises in light yellow will be discussed in class. Please write down enough so that you will be able to participate in the discussion. If you do not understand an exercise, feel free to skip it.

Framebuffers, Offscreen Rendering, and Post-Processing

There are many instances in real-time graphics (and, thus, games) where you want some rendering operations to be able to read the pixel values written by previous rendering operations. For example:

- Shadow mapping renders the scene from a light's point of view, then reads the resulting depth values to decide which surfaces the light can see.
- Reflections (mirrors, shiny floors, water) render the scene from a reflected viewpoint, then use the result as a texture on the reflective surface.
- Post-processing effects (blurs, bloom, color grading) read the rendered scene and modify its pixels before they reach the screen.

What is another effect used in games where rendering results are not directly shown on the screen?

All of these techniques involve setting up the GPU to render to memory that isn't going to be (directly) shown on the screen ("offscreen rendering"). In OpenGL, the area the GPU renders to is called a framebuffer; offscreen rendering is accomplished by creating a new framebuffer object to describe where the GPU should send its results, and binding this framebuffer to the pipeline.

The remainder of this lesson will describe the mechanics of framebuffer setup, and give you a chance to play with one of the more straightforward (but, nonetheless, powerful) uses of offscreen rendering: post-processing (generally: effects that modify the output pixels of the whole scene in order to perform blurs, color correction, and other fancy effects).

Framebuffer Objects

In OpenGL, information about the current target for rendering operations is held in a framebuffer object. These objects are managed just like any other OpenGL object: they are named with GLuints (with 0 being a special name for the default framebuffer object); there are functions to allocate and delete names; and there are functions to set the currently bound framebuffer object.

//Allocate and bind a framebuffer:
GLuint hdr_fb = 0;
glGenFramebuffers(1, &hdr_fb);
glBindFramebuffer(GL_FRAMEBUFFER, hdr_fb);

A framebuffer object keeps track of attachments -- areas of memory where the GPU should read and write during rendering operations. Framebuffers have various attachment points with specific purposes:

- GL_COLOR_ATTACHMENT0 (and GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2, ...) -- color outputs written by the fragment shader.
- GL_DEPTH_ATTACHMENT -- the depth buffer used for depth testing.
- GL_STENCIL_ATTACHMENT -- the stencil buffer used for stencil testing.
- GL_DEPTH_STENCIL_ATTACHMENT -- a combined depth-and-stencil buffer.

Note that these attachment points are references to GPU-allocated memory -- your code needs to allocate memory before it can point a framebuffer object to it. Your code can allocate memory for a framebuffer object to render to in two ways: by allocating a texture or by allocating a renderbuffer.

Textures are, well, textures. You can render to them using an attached framebuffer, read from them with all sorts of different sampling and wrapping modes in a shader program, and so on.

//Allocate and bind texture to framebuffer's color attachments:
//allocate texture name:
GLuint hdr_color_tex = 0;
glGenTextures(1, &hdr_color_tex);
glBindTexture(GL_TEXTURE_2D, hdr_color_tex);
//allocate texture memory:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16F, size.x, size.y, 0, GL_RGB, GL_FLOAT, nullptr);
//set sampling parameters for texture:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glBindTexture(GL_TEXTURE_2D, 0);

//attach texture to framebuffer as the first color buffer:
glBindFramebuffer(GL_FRAMEBUFFER, hdr_fb);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, hdr_color_tex, 0);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

Renderbuffers are "as simple as possible" framebuffer-attachable memory that cannot be accessed outside of their function as part of a framebuffer. (This makes sense for things that have formats that don't clearly map to a texture -- like a multisample color buffer, or a combined depth-and-stencil buffer. This also makes sense when you just don't want to bother with texture state because you won't be reading the values from a shader.)

//Allocate and bind renderbuffers to framebuffer's depth attachment:
//allocate renderbuffer name:
GLuint hdr_depth_rb = 0;
glGenRenderbuffers(1, &hdr_depth_rb);
glBindRenderbuffer(GL_RENDERBUFFER, hdr_depth_rb);
//allocate renderbuffer memory:
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, size.x, size.y);
glBindRenderbuffer(GL_RENDERBUFFER, 0);

//attach renderbuffer to framebuffer as the depth buffer:
glBindFramebuffer(GL_FRAMEBUFFER, hdr_fb);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, hdr_depth_rb);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

Finally, a framebuffer won't work unless it is complete -- this is a specification-defined term that basically means the framebuffer has a set of attachments that match in dimension and are of formats appropriate to render to. It is very frustrating to discover that the reason you weren't seeing any output on-screen is that your framebuffers were not complete. The framebuffer example code has a convenient helper function (in gl_check_fb.hpp), which you should probably get in the habit of using.

#include "gl_check_fb.hpp"
//Check for framebuffer completeness:
glBindFramebuffer(GL_FRAMEBUFFER, hdr_fb);
gl_check_fb();
glBindFramebuffer(GL_FRAMEBUFFER, 0);

Though they may not require many lines of code, framebuffers (or, really, their associated memory) can take up surprising amounts of GPU memory.

Imagine a game whose post-processing pipeline requires access to the position (in a GL_RGB32F floating point texture), normal (in a GL_RGB10_A2 fixed-point texture), and color value (in a GL_RGB16F floating point texture for high dynamic range) associated with each pixel. How many bytes of GPU memory are required to store these textures given a 4k (3840 x 2160) output resolution?

If the game is running at 60Hz, how many bytes per second of memory write bandwidth are required to store these buffers to GPU memory each frame?

(This number should give you a hint as to why graphics cards generally include faster memory than that generally used as main system memory.)

Imagine that an enterprising programmer with a many-core processor decides to run post-process filters on the host CPU rather than on the GPU, by transferring these buffers from GPU memory to host memory each frame, running a CPU-based post-processing pass, and then transferring the result (say, a 32-bit-per-channel floating point RGB image) back to the GPU for display. Why might this cause problems if the programmer's GPU is connected via a PCIe 3.0 x16 link (15.754GB/sec of transfer bandwidth)?

As crazy as it seems, this is actually a technique folks used in (some) PlayStation 3 games, because its Cell processor architecture includes an array of very fast co-processors along with a dedicated GPU. (And because it wasn't pushing 4k frames around.)

Small Notes

Different framebuffer objects can reference the same attachments! This can be useful in order to share memory (e.g., for screen-sized temporary storage locations) between rendering steps.

Drawing into a framebuffer that is also being used as a source texture during that same drawing has undefined behavior! (To see why, notice that allowing this could -- e.g. -- make the order of fragment processing observable; something the specification takes pains to otherwise avoid.) This means that when your code is processing a texture, it needs to output the processing result into a different texture, "ping-pong-ing" between buffers on each processing step.

An Example: HDR Rendering

I have prepared an example of using off-screen rendering to do some simple HDR (high-dynamic-range) rendering with a "glow" or "bloom" effect. This code is available at https://github.com/15-466/15-466-f20-framebuffer.

I encourage you to explore the code and play with the shaders to generate different effects.

Experiment with the ToneMapProgram shader.

Make a version of the shader that does something really weird to the colors in the scene, and paste the relevant shader code below:

Make a version of the shader that highlights edges (pixels with large color differences with their neighbors). (HINT: you may need to add additional texelFetch calls.) Paste the relevant shader code below:

Final Remarks

The use of offscreen rendering is pervasive in modern real-time graphics because it allows the use of rasterization and shading hardware on GPUs to compute all sorts of rendering effects that aren't directly rendered as part of the main scene. These can be as simple as rendering separate viewpoints and as complex as running physics on particle systems. (And, importantly, using all this compute without requiring communication back to system memory.)

Recent GPUs go even further than offscreen rendering by offering "compute shaders", which can directly read and write memory buffers. These are convenient because -- among other capabilities -- they enable scatter-style memory access rather than just the gather-style access offered by fragment shaders.

The idea of framebuffer objects that store references to memory in textures and renderbuffers should remind you of a similar situation involving vertex buffers and vertex array objects.

How do vertex buffer objects and vertex array objects relate and what are they used for?