GPU-Friendly High-Quality Terrain Rendering
Dipl.-Inf. Christian Dick, [email protected]
computer graphics & visualization

Terrain Data
- Orthophoto
- Digital Elevation Model

Christian Dick, 04.12.2007, computer graphics & visualization
Luftbild/Geobasisdaten © Landesamt für Vermessung und Geoinformation Bayern

- Resolution: 0.25 m
- Texture: 46 MB / km² (R8G8B8)
- Height field: 31 MB / km² (16 Bit)
- This region: 30 GB (400 km²)
- Bavaria: 5.1 TB (70549 km²)

Overall PC Architecture (theoretical bandwidths)
- CPU to North Bridge (Memory Controller Hub): 8.3 GB/s (1066 MHz FSB)
- North Bridge to main memory: 12.8 GB/s (DDR2-800 Dual-Channel), up to 8 GB
- North Bridge to GPU via PCI Express (x16): 4 GB/s (each direction)
- North Bridge to South Bridge (I/O Controller Hub): 1 GB/s (each direction)
- PCI: 133 MB/s (total)
- GPU to graphics memory: 64 GB/s (8800 GTS), 86.4 GB/s (8800 GTX), 103.7 GB/s (8800 Ultra), up to 768 MB
- Hard disk: several 100 GB, ~50 MB/s (disk transfer rate)
- South Bridge peripherals: SATA ports, USB ports, LAN, …
- The GPU drives the display

The Graphics Pipeline
[Figure: User / Driver -> vertex stream -> Vertex Stage (Transform & Lighting) -> Rasterizer -> fragment stream -> Pixel Stage (Texturing, Texture 0 to Texture 3) -> Blending/Ops]

The Direct3D 10 Pipeline
- Input Assembler Stage
- Vertex Shader Stage*
- Geometry Shader Stage*
- Stream Output Stage
- Rasterizer Stage
- Pixel Shader Stage*
- Output-Merger Stage
- Memory Resources (Buffers, Textures)
*Programmable

Terrain Rendering
- Increase in resolution leads to extremely large data sets
  - Resolution: 0.25 m
  - Texture: 46 MB / km² (R8G8B8)
  - Height field: 31 MB / km² (16 Bit)
  - Bavaria: 5.1 TB (70549 km²)
- Challenges:
  - Limited rendering performance (triangles/s, …)
  - Limited memory capacities, read/write rates and bus bandwidths
- Brute force is not possible in general

- At any time only a portion of the data is visible
  - Limited field of view (view frustum)
  - Limited resolution of the display / constant number of pixels (level of detail)

- Only the visible data needs to be rendered and to be resident in graphics memory
- Employ a memory hierarchy consisting of background memory (hard disk, LAN), main memory and graphics memory
  - Dynamically load data from background memory, dependent on the movements of the viewer
  - Use pre-fetching to hide latency, caching
- Data compression reduces demands on memory capacities, read/write rates and bus bandwidths
  - Favor compression schemes that can be decoded directly on the GPU to reduce CPU load and CPU-GPU data transfer
  - Compression is done in a (time-consuming) pre-processing step

- To determine the visible portion of the data a hierarchical data structure is used
  - Level 0: 2x2 tiles, 512x512 samples
  - Level 1: 4x4 tiles, 1024x1024 samples
  - Level 2: 8x8 tiles, 2048x2048 samples
- Each tile consists of 256x256 samples
- The distance between two samples (world space error) is halved from level to level (top-down)
- The tile extent (256) is halved from level to level (top-down)

- The visible tiles are determined with view frustum culling and level of detail computation
[Figure: screen space error (0.7 pixel) versus world space error; the symbols were lost in extraction]

Texture Compression – S3TC
- S3 Texture Compression, here: DXT1 (no alpha)
- Lossy compression scheme
- Supported by the hardware (decoding is done by the GPU on-the-fly during rendering)
- The texture is divided into blocks of 4x4 texels
- Each block is encoded into 64 Bits (4 bpp), resulting in a compression ratio of 6:1 for R8G8B8 textures

- For each block, two reference colors c0 and c1 are stored in R5G6B5 format (2 x 16 Bit)
- Two other colors are determined by linear interpolation between c0 and c1:
    c2 = 2/3 c0 + 1/3 c1
    c3 = 1/3 c0 + 2/3 c1
- For each texel a 2 Bit index is stored, selecting one of the four colors for that texel (16 x 2 Bit)
[Figure: block layout with c0 (16 Bit), c1 (16 Bit) and 16 x 2 Bit indices; example index rows: 00 10 11 01 / 10 00 10 01 / 00 10 00 11 / 00 01 01 11]

Texture Compression – S3TC - Range Fit
    n = ComputePrincipleAxis()
    a, b = ComputeExtremePointsOnAxis( n )
    ComputeDxtPointsFromEndPoints( a, b )
    for each point:
        index[point] = GetNearestDxtPoint()
[Figure: texel colors in color space (axes R and G) with c0, c2, c3, c1 along the principal axis]
Source: http://www.sjbrown.co.uk/?code=squish

- Compute principal axis (by using least squares fitting)
- Project texel colors onto principal axis
- Determine extreme points on principal axis
- c0 and c1 are the texel colors which correspond to the extreme points (c2 and c3 are determined with linear interpolation between c0 and c1)
- The indices are computed by determining the color ci which is closest to the respective texel color (Euclidean distance)

Texture Compression – S3TC - Cluster Fit
    n = ComputePrincipleAxis()
    ordering = ComputeTotalOrderingFromAxis( n )
    best = null
    for each clustering that preserves ordering:
        indices = GetIndicesFromClustering( clustering )
        block = LeastSquaresFitDxtBlockUsingIndices( indices )
        if error( block ) < error( best ):
            best = block
[Figure: texel colors in color space (axes R and G) with c0, c2, c3, c1 along the principal axis]
Source: http://www.sjbrown.co.uk/?code=squish

- Compute principal axis (by using least squares fitting)
- Project texel colors onto principal axis
- This yields a total ordering of the texel colors along the principal axis
- Iterate over all clusterings that preserve the total ordering; note that a clustering already determines the indices of the texels
  - Compute c0 and c1 (and implicitly c2 and c3) by using least squares fitting with respect to the Euclidean distance between the texel colors and the colors ci
- Use the ci of the clustering that yields the smallest error
- Cluster fit is more accurate than range fit, but slower
- In iterative cluster fit, c0 and c1 define the principal axis for the next iteration

[Figure: Original, Range Fit, Cluster Fit. Source: http://www.sjbrown.co.uk/?code=squish]
[Figures: R8G8B8 (24 bpp) versus DXT1 (4 bpp, 6:1)]

Geometry Compression
- Lossless compression scheme for restricted quadtree meshes (no T-vertices)
- A restricted quadtree mesh is built by successively splitting triangles via diamond splits

- The compression method is based on a generalized triangle strip representation of the restricted quadtree mesh
  - Only one vertex per triangle needs to be stored
- To find a generalized triangle strip representation, a directed path is constructed that visits each triangle exactly once and enters/leaves triangles only across edges
[Figures: triangle strips with numbered vertices; path through numbered triangles 0 to 6]

- The triangles can be classified into six different cases depending on the edges through which the path enters and leaves the triangle
  - Type A: From cathetus to cathetus
  - Type B: From cathetus to hypotenuse
  - Type C: From hypotenuse to cathetus

- The construction of the path is directly incorporated into the construction of the restricted quadtree mesh by successively splitting triangles via diamond splits

- For each triangle only the type (A, B, C) of the triangle (the winding can be inferred) and the height value of the new vertex are stored (per triangle: 2 Bit for the triangle type + bits for the height value)
[Figure: new vertex and already known vertices]

- Decompression is done on the GPU using the geometry shader
[Figure: encoded stream CL, CR, AL, AL, …, BR, BL, AL + height values]

Thanks for your attention!
Please feel free to ask questions!
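
As a check on the data sizes quoted on the Terrain Data slide, the figures can be reproduced with a short calculation. This is only a sketch: it assumes that MB, GB and TB on the slides denote binary units (2^20, 2^30, 2^40 bytes), which the slides do not state, and the function name terrain_storage is hypothetical.

```python
# Assumption: MB/GB/TB on the slides are binary units; the slides do not say.
MIB, GIB, TIB = 2**20, 2**30, 2**40


def terrain_storage(resolution_m=0.25):
    """Return (texture_bytes, height_bytes) needed per square kilometre."""
    samples_per_km2 = (1000.0 / resolution_m) ** 2  # 4000 x 4000 samples at 0.25 m
    texture_bytes = samples_per_km2 * 3             # R8G8B8: 3 bytes per sample
    height_bytes = samples_per_km2 * 2              # 16 Bit height: 2 bytes per sample
    return texture_bytes, height_bytes


texture, height = terrain_storage()
per_km2 = texture + height

print("texture per km2 (MB):", round(texture / MIB))
print("height field per km2 (MB):", round(height / MIB))
print("400 km2 region (GB):", round(400 * per_km2 / GIB))
print("Bavaria, 70549 km2 (TB):", round(70549 * per_km2 / TIB, 1))
```

Under that assumption the rounded results agree with the slide values of 46 MB, 31 MB, 30 GB and 5.1 TB.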
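
The DXT1 block layout described on the S3TC slides (two R5G6B5 reference colors, two interpolated colors, sixteen 2 Bit indices) can be illustrated with a small software decoder. This is a minimal sketch, not the hardware implementation: the function names are hypothetical, it assumes the four-color opaque mode (c0 stored as the larger 16 Bit value, matching "no alpha" on the slides), and the integer rounding of the interpolated colors is an approximation that may differ from a given GPU.

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a packed R5G6B5 value to an (r, g, b) tuple of 8 Bit channels."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so that 31 maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into 4 rows of 4 (r, g, b) texels.

    Assumes the four-color mode; the three-color mode with transparency
    is deliberately not handled in this sketch.
    """
    c0_raw, c1_raw, bits = struct.unpack("<HHI", block)
    c0 = rgb565_to_rgb888(c0_raw)
    c1 = rgb565_to_rgb888(c1_raw)
    # c2 = 2/3 c0 + 1/3 c1 and c3 = 1/3 c0 + 2/3 c1, as on the slide.
    c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
    c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    palette = (c0, c1, c2, c3)
    rows = []
    for y in range(4):
        row = []
        for x in range(4):
            # Texel (x, y) uses 2 bits, starting from the least significant bits.
            index = (bits >> (2 * (4 * y + x))) & 0x3
            row.append(palette[index])
        rows.append(row)
    return rows


# Usage: white and black reference colors, first four texels use indices 0, 1, 2, 3.
example = struct.pack("<HHI", 0xFFFF, 0x0000, 0xE4)
texels = decode_dxt1_block(example)
print(texels[0])
```

Each block decodes independently of its neighbours, which is what allows the GPU to fetch and decode single blocks on the fly during texturing.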