Graphics, Video, and Display (D2:F0)
9.1.4
Pixel Processing
After vertices are transformed and lit by the vertex processing pipeline, the pixel
processor takes the vertex information and generates the final rasterized pixels to be
displayed. The steps of this process include removing hidden surfaces, applying
textures and shading, and converting pixels to the final display format. The vertex/pixel
shader engine is described in Section 9.1.5.
The pixel processing operations also have their own data scheduling function that
controls image processor functions and the texture and shader routines.
9.1.4.1
Hidden Surface Removal
The image processor takes the floating-point results of the vertex processing and
converts them to polygons for rasterization and depth processing. During depth
processing, the positions of objects in a scene relative to the camera are
determined. The surfaces of objects hidden behind other objects are then removed
from the scene, so that unseen pixels are never processed. This improves the
efficiency of subsequent pixel processing.
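The effect of depth processing can be illustrated with a minimal depth-buffer sketch. This is not the hardware's actual mechanism, which this section does not describe. The function name, the fragment tuple layout, and the "smaller depth is closer" convention are all assumptions made for illustration.

```python
from math import inf

def resolve_visible(fragments, width, height):
    """Keep only the nearest fragment per pixel.

    fragments: iterable of (x, y, depth, surface_id) tuples (hypothetical
    layout); a smaller depth is assumed to mean closer to the camera.
    Returns a height x width grid of surface ids (None where nothing is drawn).
    """
    depth = [[inf] * width for _ in range(height)]
    owner = [[None] * width for _ in range(height)]
    for x, y, z, surface_id in fragments:
        if z < depth[y][x]:          # closer than anything seen so far
            depth[y][x] = z
            owner[y][x] = surface_id
    return owner

frags = [
    (0, 0, 5.0, "far_wall"),
    (0, 0, 2.0, "near_box"),   # hides far_wall at pixel (0, 0)
    (1, 0, 5.0, "far_wall"),
]
visible = resolve_visible(frags, width=2, height=1)
```

Only the surfaces left in `visible` would need texturing and shading, which is where the efficiency gain comes from.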
9.1.4.2
Applying Textures and Shading
After hidden surfaces are removed, textures and shading are applied. Texture maps are
fetched, mipmaps calculated, and either is applied to the polygons. Complex pixel-
shader functions are also applied at this stage.
9.1.4.3
Final Pixel Formatting
The pixel formatting module is the final stage of the pixel-processing pipeline and
controls the format of the final pixel data sent to memory. It supplies the unified
shader with an address into the output buffer, and the shader core returns the relevant
pixel data. The pixel formatting module also contains scaling functions, as well as
dithering and data-format packing functions.
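To illustrate what dithering and format packing involve, the sketch below reduces 8-bit color channels to a 16-bit RGB565 word using a 2x2 ordered-dither matrix. The output format, the dither matrix, and the function names are assumptions chosen for illustration; this section does not state which formats or dithering method the module supports.

```python
# 2x2 ordered-dither thresholds in the range 0..3 (assumed pattern).
BAYER_2X2 = ((0, 2), (3, 1))

def quantize(value8, bits, x, y):
    """Reduce an 8-bit channel to `bits` bits with ordered dithering."""
    drop = 8 - bits
    # Scale the threshold to the range of the low bits being discarded.
    threshold = (BAYER_2X2[y & 1][x & 1] << drop) >> 2
    return min(value8 + threshold, 255) >> drop

def pack_rgb565(r, g, b, x, y):
    """Pack 8-bit R, G, B at pixel (x, y) into one 16-bit RGB565 word."""
    r5 = quantize(r, 5, x, y)
    g6 = quantize(g, 6, x, y)
    b5 = quantize(b, 5, x, y)
    return (r5 << 11) | (g6 << 5) | b5

# A red value of 4 lies halfway between two 5-bit levels, so the dither
# rounds it up on two of the four pixels in each 2x2 block.
pattern = [quantize(4, 5, x, y) for y in (0, 1) for x in (0, 1)]
white = pack_rgb565(255, 255, 255, 0, 0)
```

Averaged over a 2x2 block, the dithered output approximates the intensity lost by truncating each channel.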
9.1.5
Unified Shader
The unified shader engine contains a specialized programmable microcontroller with
capabilities specifically suited for efficient processing of graphics geometries (vertex
shading), graphics pixels (pixel shading), and general-purpose video and image
processing programs. In addition to data processing operations, the unified shader
engine has a rich set of program-control functions permitting complex branches,
subroutine calls, tests, etc., for run-time program execution.
The unified shader core also has a task and thread manager, which tries to maximize
utilization by using a 16-deep task queue to keep all 16 threads
occupied.
The unified store contains 16 banks of 128 registers. These 32-bit registers contain all
temporary and output data, as well as attribute information. The store employs
features that reduce data collisions, such as data forwarding and pre-fetching of a source
argument from the subsequent instruction. It also contains a write-back queue.
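As a quick check of the capacity these figures imply, the arithmetic below reads "16 banks of 128 registers" as 128 registers in each of the 16 banks. That reading is an assumption, and the constant names are illustrative only.

```python
BANKS = 16            # from the text
REGS_PER_BANK = 128   # assumed to be per bank
REG_BITS = 32         # register width from the text

total_registers = BANKS * REGS_PER_BANK
total_bytes = total_registers * REG_BITS // 8
```

Under that reading, the store holds 2048 registers, or 8 KiB of data.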
Like the register store, the arithmetic logic unit (ALU) pipelines are 32 bits wide. For
floating-point instructions, these correspond to IEEE floating-point values. However, for
integer instructions, they can be considered as one 32-bit value, two 16-bit values, or
four 8-bit values. When considered as four 8-bit values, the integer unit effectively acts
like a four-way SIMD ALU, performing four operations per clock. It is expected that in
legacy applications pixel processing will be done on 8-bit integers, roughly quadrupling
the pixel throughput compared to processing on float formats.
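The four-way behavior can be sketched in software as a lane-wise add on four 8-bit values packed into one 32-bit word. This is only an illustration of the idea, not the ALU's instruction set. The helper names are hypothetical, and wrapping (modulo 256) arithmetic is assumed, whereas pixel hardware often saturates instead.

```python
MASK32 = 0xFFFFFFFF

def pack4(a, b, c, d):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)

def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))

def add4x8(x, y):
    """Lane-wise wrapping add of four 8-bit values held in 32-bit words."""
    # Add the low 7 bits of every lane; a carry can reach bit 7 of its own
    # lane but cannot spill into the next lane.
    low7 = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    # Fold in the top bit of each lane with XOR, discarding the carry out.
    return (low7 ^ ((x ^ y) & 0x80808080)) & MASK32

result = unpack4(add4x8(pack4(10, 20, 30, 250), pack4(1, 2, 3, 10)))
```

A single 32-bit operation updates all four lanes here, which is the source of the roughly fourfold throughput compared with one 32-bit float per operation.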