The next stage in the evolution of graphics hardware again came from NVIDIA. While 3Dfx released competing cards, they were behind the curve. The NVIDIA GeForce 256 (not to be confused with the GeForce GTS 250, a much more modern card), released in 1999, provided something truly new: a hardware vertex processing pipeline.
The OpenGL API has always defined a vertex processing pipeline (fixed-function in those days rather than shader-based), but NVIDIA's TNT-era drivers implemented it on the CPU. Only with the GeForce 256 did this move into hardware: NVIDIA essentially built the entire OpenGL fixed-function vertex processing pipeline directly into the chip.
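To give a sense of what that pipeline computes, here is a minimal CPU-side sketch of its central step, transforming a position by the concatenated modelview-projection matrix. The Mat4 and Vec4 types are illustrative helpers, not part of any actual API:

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major

// Core of the fixed-function vertex pipeline: transform each
// incoming position by the modelview-projection matrix.
// Lighting and texture coordinate generation followed the same
// fixed recipe.
Vec4 transformVertex(const Mat4& mvp, Vec4 p)
{
    Vec4 r;
    r.x = mvp.m[0][0]*p.x + mvp.m[0][1]*p.y + mvp.m[0][2]*p.z + mvp.m[0][3]*p.w;
    r.y = mvp.m[1][0]*p.x + mvp.m[1][1]*p.y + mvp.m[1][2]*p.z + mvp.m[1][3]*p.w;
    r.z = mvp.m[2][0]*p.x + mvp.m[2][1]*p.y + mvp.m[2][2]*p.z + mvp.m[2][3]*p.w;
    r.w = mvp.m[3][0]*p.x + mvp.m[3][1]*p.y + mvp.m[3][2]*p.z + mvp.m[3][3]*p.w;
    return r;
}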
Hardware vertex processing was primarily a performance win. Important as it was for the progress of hardware, though, a less well-known improvement in the early GeForce hardware mattered far more to its future.
In the fragment processing pipeline, the texture environment stages were removed. In their place was a more powerful mechanism that NVIDIA called “register combiners.”
The GeForce 256 provided two regular combiner stages. Each of these stages represented up to four independent opcodes that operated over the register set, and the opcodes could produce multiple outputs, which could be written to two temporary registers.
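A hedged sketch of the shape of one such stage, one channel shown, with illustrative names: each stage could form two products (or dot products) and their sum, any of which could be routed to an output register.

// One combiner stage, per channel: two products and a sum.
void combinerStage(float a, float b, float c, float d,
                   float& outAB, float& outCD, float& outSum)
{
    outAB  = a * b;           // first product (or dot product)
    outCD  = c * d;           // second product (or dot product)
    outSum = outAB + outCD;   // combined result (or a mux)
}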
What is interesting is that the register values are no longer limited to colors. Instead, they are signed values in the range [-1, 1], with roughly 9 bits of precision. While the initial color or texture values are in [0, 1], the opcodes themselves can perform operations that generate negative values. Opcodes can even scale and bias their inputs, which allows them to turn unsigned colors into signed values.
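For example, the scale/bias machinery can perform the familiar “expand” mapping, turning an unsigned [0, 1] color channel into a signed [-1, 1] value. A minimal sketch (the function name is illustrative):

// Scale/bias input mapping: expand an unsigned [0, 1] color
// channel into a signed [-1, 1] value.
float expandToSigned(float c)
{
    return 2.0f * c - 1.0f;
}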
Because of this signed math, the GeForce 256 was the first hardware that could do true normal mapping, without hacks or tricks. A single register combiner stage could do two 3-vector dot products at a time. Textures could store normals by compressing them to the [0, 1] range, and the light direction could either be a constant or be interpolated per-vertex in texture space.
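Put together, a single stage could evaluate a per-fragment diffuse term. A hedged CPU-side sketch of that computation, with illustrative Vec3 and dot helpers (the clamp to zero is the usual diffuse-lighting convention, not necessarily what every application chose):

#include <algorithm>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Decompress a normal stored in [0, 1] back to [-1, 1], then
// compute the N.L diffuse term, clamped to zero.
float diffuseTerm(Vec3 storedNormal, Vec3 lightDir)
{
    Vec3 n = { 2.0f * storedNormal.x - 1.0f,
               2.0f * storedNormal.y - 1.0f,
               2.0f * storedNormal.z - 1.0f };
    return std::max(0.0f, dot(n, lightDir));
}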
Now granted, this was still a primitive form of bump mapping. There was no way to correct for tangent-space values with bitangents and tangents. But it was at least something, and it really was the first step towards programmability: it showed that textures could truly represent values other than colors.
There was also a single final combiner stage, much more limited than the regular combiner stages. It could do a linear interpolation and an addition; this was designed specifically to implement OpenGL's fixed-function fog and specular computations.
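A hedged sketch of the shape of that computation, per channel (the function name is illustrative; for OpenGL's fog, a would be the fog factor, b the fragment color, c the fog color, and d the specular color):

// Final combiner, per channel: a linear interpolation plus an
// addition, i.e. lerp(c, b, a) + d.
float finalCombiner(float a, float b, float c, float d)
{
    return a * b + (1.0f - a) * c + d;
}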
The register file consisted of two temporary registers, two per-vertex colors, two texture colors, two uniform values, the zero register, and a few other values used for OpenGL fixed-function fog operations. The color and texture registers were even writeable, if you needed more temporaries.
There were a few other sundry additions to the hardware. Cube textures first came onto the scene; combined with the right texture coordinate computations (now done in hardware), you could have reflective surfaces much more easily. Anisotropic filtering and multisampling also appeared at this time, though with modest limits: anisotropic filtering was capped at 4x, and the maximum number of samples was two. Compressed texture formats also made their debut.
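The texture coordinate computation in question is the reflection direction used to address the cube map. A sketch of the standard formula, with illustrative Vec3 helpers; i is the incident view direction and n the surface normal, both normalized:

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Standard reflection formula: r = i - 2(n.i)n.
Vec3 reflectDir(Vec3 i, Vec3 n)
{
    float d = 2.0f * dot(n, i);
    return { i.x - d*n.x, i.y - d*n.y, i.z - d*n.z };
}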
What we see thus far, as hardware takes its first steps towards true programmability, is that increased complexity in fragment processing starts pushing for other capabilities. The addition of a dot product allows lighting computations to take place per-fragment. But you cannot have full texture-space bump mapping without a normal/binormal/tangent matrix to transform vectors into texture space. Cube maps allow you to do arbitrary reflections, but computing reflection directions per-vertex means interpolating the resulting directions across a polygon, which does not work very well over large polygons.
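For reference, the missing transform is simple once the hardware can supply it: a matrix whose rows are the tangent, bitangent, and normal moves a direction into texture space. A sketch with illustrative Vec3 helpers:

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Transform a direction into texture (tangent) space, using the
// tangent, bitangent, and normal as the rows of the matrix.
Vec3 toTangentSpace(Vec3 v, Vec3 tangent, Vec3 bitangent, Vec3 normal)
{
    return { dot(v, tangent), dot(v, bitangent), dot(v, normal) };
}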
This era also saw the introduction of something called a rectangle texture. This texture type is something of an odd duck that still remains to this day. It was a way of creating a texture of arbitrary size; until then, texture dimensions had to be powers of two (though the width and height did not have to match). The texture coordinates for rectangle textures are not normalized; they are texel-space values.
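The difference is easiest to see in how a coordinate maps to a texel. A small sketch (the function name is illustrative):

// For a normal 2D texture, coordinates are normalized: u = 0.5
// addresses the middle of the texture whatever its width.
float normalizedToTexel(float u, int width)
{
    return u * (float)width;
}

// For a rectangle texture, coordinates are already texel values:
// the center of a 640x480 rectangle texture is at (320, 240),
// not (0.5, 0.5).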