GPU-Friendly High-Quality Terrain Rendering

Dipl.-Inf. Christian Dick
computer graphics & visualization
Terrain Data
Orthophoto
Digital Elevation Model
Christian Dick, 04.12.2007
Luftbild/Geobasisdaten © Landesamt für Vermessung und Geoinformation Bayern
Resolution 0.25 m
Texture: 46 MB / km² (R8G8B8)
Height field: 31 MB / km² (16 Bit)
This region: 30 GB (400 km²)
Bavaria: 5.1 TB (70549 km²)
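The storage figures above follow directly from the sample spacing. A minimal sketch of the arithmetic, assuming that MB, GB and TB on the slide are binary units (2^20, 2^30, 2^40 bytes); the function name is illustrative:

```python
MIB, GIB, TIB = 2 ** 20, 2 ** 30, 2 ** 40

def raw_size_bytes(area_km2, bytes_per_sample, resolution_m=0.25):
    """Uncompressed size of a regular grid covering area_km2."""
    samples_per_km = 1000.0 / resolution_m           # 4000 samples per km at 0.25 m
    return area_km2 * samples_per_km ** 2 * bytes_per_sample

texture_per_km2 = raw_size_bytes(1, 3)               # R8G8B8: 3 bytes per texel
height_per_km2 = raw_size_bytes(1, 2)                # 16 Bit: 2 bytes per sample
per_km2 = texture_per_km2 + height_per_km2

print(round(texture_per_km2 / MIB), "MB / km2 texture")
print(round(height_per_km2 / MIB), "MB / km2 height field")
print(round(400 * per_km2 / GIB), "GB for 400 km2")
print(round(70549 * per_km2 / TIB, 1), "TB for Bavaria")
```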
Overall PC Architecture
Theoretical bandwidths:
- CPU <-> North Bridge (Memory Controller Hub): 8.3 GB/s (1066 MHz FSB)
- North Bridge <-> Main Memory (up to 8 GB): 12.8 GB/s (DDR2-800 Dual-Channel)
- North Bridge <-> GPU over PCI Express (x16): 4 GB/s (each direction)
- GPU <-> Graphics Memory (up to 768 MB): 64 GB/s (8800 GTS), 86.4 GB/s (8800 GTX), 103.7 GB/s (8800 Ultra)
- GPU -> Display
- North Bridge <-> South Bridge (I/O Controller Hub): 1 GB/s (each direction)
- South Bridge <-> PCI: 133 MB/s (total)
- South Bridge <-> SATA Ports, USB Ports, LAN, …
- Hard disk (several 100 GBs): ~ 50 MB/s (disk transfer rate)
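To get a feeling for these numbers, the following sketch estimates how long it would take to move 768 MB (the largest graphics memory size listed) over some of the links. It uses the theoretical peak rates from the slide with decimal units (1 MB = 10^6 bytes), which is an assumption; sustained rates in practice are lower. The dictionary keys and the function name are illustrative.

```python
# Theoretical peak bandwidths from the slide, in bytes per second.
BANDWIDTH = {
    "hard disk": 50e6,
    "PCI Express x16 (one direction)": 4e9,
    "main memory (DDR2-800 dual-channel)": 12.8e9,
    "graphics memory (8800 GTX)": 86.4e9,
}

def transfer_seconds(num_bytes, link):
    """Lower bound on the transfer time over the given link."""
    return num_bytes / BANDWIDTH[link]

payload = 768e6  # 768 MB
for link in BANDWIDTH:
    print(f"{link}: {transfer_seconds(payload, link):.4f} s")
```

The hard disk is slower than the other links by several orders of magnitude, which is why the later slides rely on pre-fetching, caching and compression.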
The Graphics Pipeline
User / Driver
- Vertex Stream -> Vertex Stage (Transform & Lighting)
- Rasterizer
- Fragment Stream -> Pixel Stage (Texturing, reading Texture 0 to Texture 3)
- Blending/Ops
The Direct3D 10 Pipeline
- Input Assembler Stage
- Vertex Shader Stage*
- Geometry Shader Stage*
- Stream Output Stage
- Rasterizer Stage
- Pixel Shader Stage*
- Output-Merger Stage
Memory Resources (Buffers, Textures)
*Programmable
Terrain Rendering
- Increase in resolution leads to extremely large data sets
  Resolution 0.25 m
  - Texture: 46 MB / km² (R8G8B8)
  - Height field: 31 MB / km² (16 Bit)
  - Bavaria: 5.1 TB (70549 km²)
- Challenges:
  - Limited rendering performance (triangles/s, …)
  - Limited memory capacities, read/write rates and bus bandwidths
- Brute force not possible in general
Terrain Rendering
- At any time only a portion of the data is visible
- Limited field of view (View Frustum)
- Limited resolution of the display / Constant number of pixels (Level of detail)
Terrain Rendering
- Only the visible data needs to be rendered and to be resident in graphics memory
- Employ a memory hierarchy consisting of background memory (hard disk, LAN), main memory and graphics memory
- Dynamically load data from background memory, dependent on the movements of the viewer
- Use pre-fetching to hide latency, caching
- Data compression reduces demands on memory capacities, read/write rates and bus bandwidths
  - Favor compression schemes that can be decoded directly on the GPU to reduce CPU load and CPU-GPU data transfer
  - Compression is done in a (time-consuming) pre-processing step
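The caching and pre-fetching idea can be illustrated with a small least-recently-used tile cache. This is only a sketch under simplifying assumptions: the class and the `load_tile` callback are hypothetical, loading is synchronous (a real system would pre-fetch asynchronously to hide disk latency), and the replacement policy used in the actual system is not stated on the slides.

```python
from collections import OrderedDict

class TileCache:
    """Keeps the most recently used tiles of one memory level (illustrative)."""

    def __init__(self, capacity, load_tile):
        self.capacity = capacity      # maximum number of resident tiles
        self.load_tile = load_tile    # fetches a tile from the slower memory level
        self.tiles = OrderedDict()    # ordered from least to most recently used
        self.loads = 0

    def get(self, key):
        if key in self.tiles:
            self.tiles.move_to_end(key)          # mark as most recently used
            return self.tiles[key]
        self.loads += 1
        tile = self.load_tile(key)
        self.tiles[key] = tile
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)       # evict the least recently used tile
        return tile

    def prefetch(self, keys):
        """Load tiles that are expected to become visible soon."""
        for key in keys:
            self.get(key)

cache = TileCache(capacity=2, load_tile=lambda key: ("tile data", key))
cache.prefetch([(0, 0, 0), (0, 0, 1)])   # keys are (level, x, y) in this example
cache.get((0, 0, 0))                     # hit: no load from background memory
cache.get((0, 1, 1))                     # miss: evicts (0, 0, 1)
print(cache.loads, list(cache.tiles))
```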
Terrain Rendering
- To determine the visible portion of the data a hierarchical data structure is used
  Level 0: 2x2 Tiles, 512x512 Samples
  Level 1: 4x4 Tiles, 1024x1024 Samples
  Level 2: 8x8 Tiles, 2048x2048 Samples
Terrain Rendering
Level 0: 2x2 Tiles
Level 1: 4x4 Tiles
Level 2: 8x8 Tiles
- Each tile consists of 256x256 samples
- The distance δ between two samples (world space error) is halved from level to level (top-down)
- The tile extent (256 δ) is halved from level to level (top-down)
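The tile pyramid follows a simple pattern: level l has 2^(l+1) x 2^(l+1) tiles of 256x256 samples each, and the sample spacing halves with every level. A minimal sketch, assuming the finest level carries the 0.25 m source resolution; `finest_level` and the function name are illustrative parameters, not taken from the slides:

```python
TILE_SAMPLES = 256  # samples per tile edge, as stated on the slide

def level_info(level, finest_level, finest_spacing_m=0.25):
    """Return (tiles per side, samples per side, sample spacing, tile extent)."""
    tiles_per_side = 2 ** (level + 1)                 # 2x2, 4x4, 8x8, ...
    samples_per_side = tiles_per_side * TILE_SAMPLES  # 512, 1024, 2048, ...
    spacing_m = finest_spacing_m * 2 ** (finest_level - level)
    tile_extent_m = TILE_SAMPLES * spacing_m          # halves from level to level
    return tiles_per_side, samples_per_side, spacing_m, tile_extent_m

for level in range(3):
    print(level, level_info(level, finest_level=2))
```

The covered terrain extent (tiles per side times tile extent) is the same on every level; only the sampling density changes.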
Terrain Rendering
- The visible tiles are determined with view frustum culling and level of detail computation
  δ': Screen Space Error (0.7 Pixel)
  δ: World Space Error
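The slide does not give the formula that relates the two errors. A common choice, used here as an assumption, is to project the world space error with a standard perspective projection and to pick the coarsest level whose projected error stays within the 0.7 pixel tolerance. All function and parameter names are illustrative.

```python
import math

def screen_space_error(world_error_m, distance_m, fov_y_rad, viewport_height_px):
    """Project a world space error at the given viewing distance into pixels."""
    pixels_per_unit = viewport_height_px / (2.0 * math.tan(fov_y_rad / 2.0))
    return world_error_m * pixels_per_unit / distance_m

def select_level(distance_m, coarsest_spacing_m, finest_level,
                 fov_y_rad, viewport_height_px, tolerance_px=0.7):
    """Coarsest level whose sample spacing projects to at most tolerance_px."""
    for level in range(finest_level + 1):
        spacing = coarsest_spacing_m / 2 ** level   # spacing halves per level
        error = screen_space_error(spacing, distance_m, fov_y_rad, viewport_height_px)
        if error <= tolerance_px:
            return level
    return finest_level  # no finer data available

# 90 degree vertical field of view, 1000 pixel high viewport, 1 m spacing at level 0
for distance in (1000.0, 500.0, 100.0):
    print(distance, select_level(distance, 1.0, 2, math.pi / 2, 1000))
```

Tiles far from the viewer are taken from coarse levels, tiles close to the viewer from fine levels.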
Texture Compression – S3TC
- S3 Texture Compression, here: DXT1 (no alpha)
- Lossy compression scheme
- Supported by the hardware (Decoding is done by the GPU on-the-fly during rendering)
- The texture is divided into blocks of 4x4 texels
- Each block is encoded into 64 Bits (4 bpp), resulting in a compression ratio of 6:1 for R8G8B8 textures
Texture Compression – S3TC
- For each block, two reference colors c0 and c1 are stored in R5G6B5 format (2 x 16 Bit)
- Two other colors are determined by linear interpolation between c0 and c1:
  c2 = 2/3 c0 + 1/3 c1
  c3 = 1/3 c0 + 2/3 c1
- For each texel a 2 Bit index is stored, selecting one of the four colors for that texel (16 x 2 Bit)
Block layout: c0 (16 Bit), c1 (16 Bit), indices (16 x 2 Bit)
Example indices (00 = c0, 01 = c1, 10 = c2, 11 = c3):
  00 10 11 01
  10 00 10 01
  00 10 00 11
  00 01 01 11
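The block layout can be made concrete with a small software decoder. This is a sketch for illustration only (on the GPU the decoding is done in hardware). It covers just the four-color mode described on the slide, which in DXT1 applies when c0 is stored as the larger 16 Bit value; the function names are illustrative.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a 16 Bit R5G6B5 color to 8 bits per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """Decode one 8 byte DXT1 block into 4 rows of 4 RGB tuples."""
    # Little-endian: c0 (16 Bit), c1 (16 Bit), then 16 x 2 Bit indices.
    c0_raw, c1_raw, indices = struct.unpack("<HHI", block)
    c0, c1 = rgb565_to_rgb888(c0_raw), rgb565_to_rgb888(c1_raw)
    c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))   # 2/3 c0 + 1/3 c1
    c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))   # 1/3 c0 + 2/3 c1
    palette = (c0, c1, c2, c3)
    # Texel i uses bits 2i and 2i+1, starting at the least significant bits.
    texels = [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]
    return [texels[row * 4:row * 4 + 4] for row in range(4)]

# c0 = pure red, c1 = pure blue; the first four texels use indices 0, 1, 2, 3.
block = struct.pack("<HHI", 0xF800, 0x001F, 0b11100100)
rows = decode_dxt1_block(block)
print(rows[0])
print("compression ratio:", (16 * 24) // (len(block) * 8))
```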
Texture Compression – S3TC
- Range Fit
n = ComputePrincipleAxis()
a, b = ComputeExtremePointsOnAxis( n )
ComputeDxtPointsFromEndPoints( a, b )
for each point:
    index[point] = GetNearestDxtPoint()
[Figure: texel colors in the R/G plane of color space, with c0, c2, c3 and c1 along the principal axis]
Source: http://www.sjbrown.co.uk/?code=squish
Texture Compression – S3TC
- Range Fit
  - Compute principal axis (by using least squares fitting)
  - Project texel colors onto principal axis
  - Determine extreme points on principal axis
  - c0 and c1 are the texel colors which correspond to the extreme points (c2 and c3 are determined with linear interpolation between c0 and c1)
  - The indices are computed by determining the color ci which is closest to the respective texel color (Euclidean distance)
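The range fit steps can be sketched in a few lines. This is an illustration under stated assumptions, not the squish implementation: the principal axis is approximated with a few power iterations on the covariance matrix (which fails if the start vector (1, 1, 1) happens to be orthogonal to the true axis), and the end points are not quantized to R5G6B5. Function names are illustrative.

```python
def principal_axis(colors, iterations=8):
    """Approximate the dominant eigenvector of the color covariance matrix."""
    n = len(colors)
    mean = [sum(c[k] for c in colors) / n for k in range(3)]
    centered = [[c[k] - mean[k] for k in range(3)] for c in colors]
    cov = [[sum(p[i] * p[j] for p in centered) for j in range(3)] for i in range(3)]
    axis = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3)]
        norm = max(abs(a) for a in axis)
        if norm == 0.0:                 # uniform block or degenerate start vector
            return [1.0, 1.0, 1.0]
        axis = [a / norm for a in axis]
    return axis

def range_fit(colors):
    """Return (c0, c1, indices) for a block of 16 RGB colors."""
    axis = principal_axis(colors)
    proj = [sum(c[k] * axis[k] for k in range(3)) for c in colors]
    c0 = colors[proj.index(max(proj))]  # texel colors at the extreme points
    c1 = colors[proj.index(min(proj))]
    palette = [c0, c1,
               tuple((2 * a + b) / 3 for a, b in zip(c0, c1)),   # c2
               tuple((a + 2 * b) / 3 for a, b in zip(c0, c1))]   # c3
    indices = [min(range(4),
                   key=lambda i: sum((t[k] - palette[i][k]) ** 2 for k in range(3)))
               for t in colors]         # nearest palette color per texel
    return c0, c1, indices

gradient = [(17 * i, 8 * i, 0) for i in range(16)]   # black to orange
c0, c1, indices = range_fit(gradient)
print(c0, c1, indices)
```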
Texture Compression – S3TC
- Cluster Fit
n = ComputePrincipleAxis()
ordering = ComputeTotalOrderingFromAxis( n )
best = null
for each clustering that preserves ordering:
    indices = GetIndicesFromClustering( clustering )
    block = LeastSquaresFitDxtBlockUsingIndices( indices )
    if error( block ) < error( best ):
        best = block
[Figure: texel colors in the R/G plane of color space, clustered around c0, c2, c3 and c1]
Source: http://www.sjbrown.co.uk/?code=squish
Texture Compression – S3TC
- Cluster Fit
  - Compute principal axis (by using least squares fitting)
  - Project texel colors onto principal axis
  - This yields a total ordering of the texel colors along the principal axis
  - Iterate over all clusterings that preserve the total ordering; note that a clustering already determines the indices of the texels
    - Compute c0 and c1 (and implicitly c2 and c3) by using least squares fitting with respect to the Euclidean distance between the texel colors and the colors ci
  - Use the ci of the clustering that yields the smallest error
  - Cluster fit is more accurate than range fit, but slower
  - In iterative cluster fit, c0 and c1 define the principal axis for the next iteration
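A brute-force sketch of the cluster fit idea follows. It is a simplified illustration, not the squish code: the principal axis is passed in by the caller, every order-preserving split of the 16 texels into four consecutive clusters is tried (969 candidates), the end points are solved from the 2x2 normal equations, and no R5G6B5 quantization is applied. All names are illustrative.

```python
from itertools import combinations_with_replacement

WEIGHTS = (1.0, 2.0 / 3.0, 1.0 / 3.0, 0.0)  # share of c0 in c0, c2, c3, c1
DXT_INDEX = (0, 2, 3, 1)                     # DXT1 index of each cluster along the axis

def least_squares_endpoints(colors, weights):
    """Minimise sum |x_i - (w_i c0 + (1 - w_i) c1)|^2 over c0 and c1."""
    aa = sum(w * w for w in weights)
    bb = sum((1 - w) ** 2 for w in weights)
    ab = sum(w * (1 - w) for w in weights)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:                     # all texels in one cluster
        return None
    c0, c1 = [], []
    for k in range(3):
        ax = sum(w * c[k] for w, c in zip(weights, colors))
        bx = sum((1 - w) * c[k] for w, c in zip(weights, colors))
        c0.append((ax * bb - bx * ab) / det)
        c1.append((bx * aa - ax * ab) / det)
    return tuple(c0), tuple(c1)

def cluster_fit(colors, axis):
    """Return (error, c0, c1, indices) of the best order-preserving clustering."""
    # Total ordering along the axis, largest projection (c0 end) first.
    order = sorted(range(len(colors)),
                   key=lambda i: -sum(colors[i][k] * axis[k] for k in range(3)))
    ordered = [colors[i] for i in order]
    n = len(ordered)
    best = None
    # Cut points a <= b <= c split the ordering into four consecutive clusters.
    for a, b, c in combinations_with_replacement(range(n + 1), 3):
        weights = [WEIGHTS[(p >= a) + (p >= b) + (p >= c)] for p in range(n)]
        fit = least_squares_endpoints(ordered, weights)
        if fit is None:
            continue
        c0, c1 = fit
        error = sum((x[k] - (w * c0[k] + (1 - w) * c1[k])) ** 2
                    for x, w in zip(ordered, weights) for k in range(3))
        if best is None or error < best[0]:
            indices = [0] * n
            for p, i in enumerate(order):
                indices[i] = DXT_INDEX[(p >= a) + (p >= b) + (p >= c)]
            best = (error, c0, c1, indices)
    return best

gradient = [(17 * i, 8 * i, 0) for i in range(16)]
error, c0, c1, indices = cluster_fit(gradient, axis=(1.0, 0.5, 0.0))
print(error, c0, c1, indices)
```

For this evenly spaced gradient the best clustering puts four texels into each cluster, and the fitted end points lie inside the color range instead of at its extremes, which is where the accuracy gain over range fit comes from.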
Texture Compression – S3TC
[Figure: image comparison of Original, Range Fit and Cluster Fit]
Source: http://www.sjbrown.co.uk/?code=squish
Texture Compression – S3TC
[Figure: orthophoto in R8G8B8 (24 bpp) compared with DXT1 (4 bpp, 6:1)]
Geometry Compression
- Lossless compression scheme for restricted quadtree meshes (no T-vertices)
- A restricted quadtree mesh is built by successively splitting triangles via diamond splits
Geometry Compression
- The compression method is based on a generalized triangle strip representation of the restricted quadtree mesh
- Only one vertex per triangle needs to be stored
[Figure: triangle strips with numbered vertices]
- To find a generalized triangle strip representation, a directed path is constructed that visits each triangle exactly once and enters/leaves triangles only across edges
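Why one vertex per triangle is enough can be seen in a small decoder for a generic generalized triangle strip. Each new vertex forms a triangle with two already known vertices, and a per-triangle flag says across which edge the path leaves the triangle. The boolean flag used here is a hypothetical stand-in; the slides encode this information through the triangle types instead, and triangle winding is ignored in this sketch.

```python
def decode_generalized_strip(vertices, drop_oldest):
    """Expand a generalized triangle strip into a list of triangles.

    vertices:    vertex ids; the first two start the strip.
    drop_oldest: one flag per triangle. True continues across the edge formed
                 by the two newest vertices (ordinary strip step), False keeps
                 the oldest vertex and continues like a triangle fan.
    """
    a, b = vertices[0], vertices[1]
    triangles = []
    for v, oldest in zip(vertices[2:], drop_oldest):
        triangles.append((a, b, v))      # exactly one new vertex per triangle
        if oldest:
            a, b = b, v                  # leave across edge (b, v)
        else:
            b = v                        # leave across edge (a, v)
    return triangles

print(decode_generalized_strip([0, 1, 2, 3, 4], [True, True, True]))
print(decode_generalized_strip([0, 1, 2, 3, 4], [False, False, False]))
print(decode_generalized_strip([0, 1, 2, 3, 4], [True, False, True]))
```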
Geometry Compression
- The triangles can be classified into six different cases depending on the edges through which the path enters and leaves the triangle
  Type A: From cathetus to cathetus
  Type B: From cathetus to hypotenuse
  Type C: From hypotenuse to cathetus
Geometry Compression
- The construction of the path is directly incorporated into the construction of the restricted quadtree mesh by successively splitting triangles via diamond splits
Geometry Compression
- For each triangle only the type (A, B, C) of the triangle (the winding can be inferred) and the height value of the new vertex are stored (Per triangle: 2 Bit for the triangle type + Bits for the height value)
[Figure: triangle showing the new vertex and the already known vertices]
Geometry Compression
- Decompression is done on the GPU using the geometry shader
CL, CR, AL, AL, …, BR, BL, AL + height values
Thanks for your attention!
Please feel free to ask questions!