Caravela: A Distributed Stream-based Computing
Transcrição
Caravela: A Distributed Stream-based Computing
technology from seed Caravela: A Distributed Stream-based Computing Platform Leonel Sousa, Schinichi Yamagiwa Students involved: Diogo Antão, Gabriel Falcão and Tomás Brandão Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 1 IBM Haifa Labs 17-04-2007 Outline technology from seed 1. Stream-based computing 2. GPUs 3. Flow-model 4. Caravela platform 5. Research future direction Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 2 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Stream-based computing technology from seed • Stream-based computing – Gathers data into a stream – Processes – Scatters the output data stream to memory • Applications – Audio, Image and Video processing (AAC, JPEG2000, MPEG CoDec) – Numerical and Scientific Computations – Simulation of Fluid Flows • Merits – Streams are the only resources that have to be touched by programs – No stall in processing – Hardware available to speed up processing Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 3 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Graphics Processing Unit technology from seed Graphics Processing Unit Textures for objects Vertex processor Map vertices to a global world Display to screen Pixel processor Coloring objects with textures Rasterizer Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 4 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 GPGPU technology from seed General Purpose processing on Graphics Processing Unit • GPU performs stream-based computing – Input texture images are inputted to GPU which calculates and outputs each pixel color data • Performance of GPUs – Recently: GPU performance improves twice every 6 months CPU improves twice every 18 month (Moore’s law: 18-24 months) – GeForce7 achieves 300GFLOPS Core2Duo achieves 8GFLOPS • GPUs in nowadays: – Are programmable – Have very high performance floating point units – High data parallelism (color elements RGBA) Can it become an high performance computing platform? Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 5 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Organization of this research work technology from seed Stream-based computing Flow-model Local execution Remote execution Pipeline Parallel Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 6 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Flow-model technology from seed Input Data Stream(s) Constant Values Input6 Input5 Input4 "Kernel" "Processor" Input3 Op Constant0 Memory effect Output2 Output1 Output0 Output Data Stream(s) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 7 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Local machine execution technology from seed • Map the flow-model unit to a shader – Input data stream: Texture image input – Output data stream: Frame buffer – Program: pixel shader program • What the application needs to do – Initialize input data stream – Execute flow-model unit – Read back output data stream Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 8 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Caravela platform and tools technology from seed • Why this designation? Caravela is named after the speedy vessel commanded by Pedro Álvares Cabral, the Portuguese explorer who discovered Brazil in 1500 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 9 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Caravela technology from seed Stream-based computing on GPUs Flow-model unit Local execution Remote execution Meta-Pipeline Applying MPI Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 10 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Caravela platform technology from seed • Creation of the flow-model unit • Caravela library (for Windows and Linux) – Local execution on GPU Optimized for recursive computing – Remote execution GPUs WebServices via Simple Object Access Protocol (SOAP) – Meta-pipelining and as MPI extension Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 11 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Creating of a flow-model unit technology from seed • Flow-model is an autonomous unit – Pixel Shader program: DirectX (Assembly, HLSL) OpenGL (GLSL) • FlowModelCreator – GUI to create flow-model unit – Saves flow-model to XML file • Benefits – Reproducible – Fit to distributed environment Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 12 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Caravela library • from seed Resource concept in Caravela library – – – • technology Machine: a machine that includes adapters Adapter: an video adapter that includes shaders Shader: a pixel processor Programming steps 1. 2. 3. 4. 5. 6. Acquiring shader(s) in machine Creating flow-model units Mapping flow-model units to shader GetInputData Executing (firing) flow-model units GetOutputData Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 13 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Experimental setup technology from seed Machine1 Machine2 Chipset Nvidia nForce Ultra Intel 945GM Express Main Memory AMD Opteron 170 @ 2GHz Intel CoreDuo T2300 @ 1.66GHz Graphics Board MSI NX 7300GS 256MB DDR nVidia GeForce Go 7400 128MB DDR2 Display Size 1280x1024 1280x800 OS Windows XP Pro Windows XP Home Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 14 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example 1D FIR filter with 16 taps OpenGL (GLSL) void main(){ int i,j; float inv = 1.0/Const4.x; vec4 res = vec4(0.0,0.0,0.0,0.0); vec2 coord = gl_TexCoord[0].xy; vec4 data0 = texture2D(CaravelaTex0, coord); coord.x+=inv; vec4 data1 = texture2D(CaravelaTex0, coord); coord.x+=inv; vec4 data2 = texture2D(CaravelaTex0, coord); technology from seed yn 15 bi * xn i 0 x b // for x value for( j=0; j<4; j++ ){ res.x += data0[j] * Const0[j]; res.x += data1[j] * Const1[j]; } // for y value for(j=1;j<4;j++) res.y += data0[j] * Const0[j-1]; res.y += data1[0] * Const0[3]; for( j=1; j<4; j++ ) res.y += data1[j] * Const1[j-1]; res.y += data2[0] * Const1[3]; y ... gl_FragData[0] = res; } DirectX (HLSL) i void main( in float2 t0: TEXCOORD0, out float4 oC0: COLOR0){ int j; float inv = 1.0/Const4.x; float4 res = 0; float2 coord = t0; float4 data0 = tex2D(CaravelaTex0, coord); coord.x += inv; float4 data1 = tex2D(CaravelaTex0, coord); coord.x += inv; float4 data2 = tex2D(CaravelaTex0, coord); // for x value for( j=0; j<4; j++ ) res.x += data0[j] * taps[j][0]; for( j=0; j<4; j++ ) res.x += data1[j] * taps[j][1]; // for y value for(j=1;j<4;j++) res.y += data0[j] * taps[j-1][0]; res.y += data1[0] * taps[3][0]; for( j=1; j<4; j++ ) res.y += data1[j] * taps[j-1][1]; res.y += data2[0] * taps[3][1]; oC0 = res; } Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 15 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Result of FIR filter technology from seed Caravela (DX9) Caravela (OpenGL) CPU 12 • 1Mega samples for input matrix with 30 iterations • 4-10 times faster than CPU-based execution 10 8 6 4 2 0 Machine1 Machine2 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 16 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example 1D IIR filter with 16 taps OpenGL GLSL yn void main(){ int i,j; float inv = 1.0/Const4.x; vec4 res = vec4(0.0,0.0,0.0,0.0); vec2 coord = gl_TexCoord[0].xy; vec4 data0 = texture2D(CaravelaTex0, coord); coord.x+=inv; vec4 data1 = texture2D(CaravelaTex0, coord); coord.x+=inv; vec4 data2 = texture2D(CaravelaTex0, coord); technology from seed 7 bi * xn 8 i i 0 bk * yn k 1 b x Recursive y // for x value for( j=0; j<4; j++ ){ res.x += data0[j] * Const0[j]; res.x += data1[j] * Const1[j]; } // for y value for(j=1;j<4;j++) res.y += data0[j] * Const0[j-1]; res.y += data1[0] * Const0[3]; for( j=1; j<4; j++ ) res.y += data1[j] * Const1[j-1]; res.y += data2[0] * Const1[3]; ... gl_FragData[0] = res; } y k DirectX HLSL void main( in float2 t0: TEXCOORD0, out float4 oC0: COLOR0){ int j; float inv = 1.0/Const4.x; float4 res = 0; float2 coord = t0; float4 data0 = tex2D(CaravelaTex0, coord); coord.x += inv; float4 data1 = tex2D(CaravelaTex0, coord); coord.x += inv; float4 data2 = tex2D(CaravelaTex0, coord); // for x value for( j=0; j<4; j++ ) res.x += data0[j] * taps[j][0]; for( j=0; j<4; j++ ) res.x += data1[j] * taps[j][1]; // for y value for(j=1;j<4;j++) res.y += data0[j] * taps[j-1][0]; res.y += data1[0] * taps[3][0]; for( j=1; j<4; j++ ) res.y += data1[j] * taps[j-1][1]; res.y += data2[0] * taps[3][1]; … oC0 = res; } Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 17 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Memory effect for recursive computing copies VRAM data output to CPU memory and back to VRAM technology from seed Exchanges pointers of input and output data in GPU side Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 18 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 technology Result of IIR Filter from seed Caravela (DX9) • • • 1Mega samples for input matrix with 30 iterations 4-10 times faster than CPUbased execution The swap function on OpenGL reduces 50% regarding the DirectX implementation Caravela (OpenGL) CPU 12 10 8 6 4 2 0 Machine1 Machine2 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 19 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example 2D-DWT technology from seed • Example of a 2-level 2D-DWT decomposition Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 20 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example 2D-DWT technology from seed • For each iteration: – LL, HL, LH and HH are computed – LL is sent back to the input (swap) • Stops at the desired decomposition level – New problem: we are continuously decreasing data size • Shader program equations: LLn (i, j ) K M LLn 1 (2i k )(2 j m)l (k )l (m) k 0m 0 LH n (i, j ) K M LLn 1 (2i k )(2 j m)l (k )h( m) k 0m 0 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 21 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example LDPC decoder technology from seed • Low-Density Parity-Check codes are amoung the most powerful error correcting codes known (BER < 10-8) • LDPC decoders demand huge computational Power • Belief bropagation:Tanner graph and Sum Product recursive Algorithm (SPA) (i-1) q 20 (i) r00 (i-1) q10 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 22 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example LDPC technology from seed • (Example: Kernel 1) Computes the message sent from CNm to BNn, that indicates the probability of BNn (0 or 1) (i ) rmn ( 0) 1 2 1 2 n' (1 2qn(i'm1) (1)) (i ) (i ) rmn (1) 1 rmn ( 0) N (m)\n • SDF: kernels from stream-based LDPC decoding HBN, HCN: linear data structures representing H for streaming computing Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 23 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Application example LDPC technology from seed • Folded SDF for LDPC decoding on GPUs GPU • Significant speedup! (Paper submitted with results for GPU and also to MMX) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 24 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Remote execution technology from seed • Caravela Snoop Server – Creates a virtual network using WebServices • Worker : flow-model execution • Broker : maintains routing information to Workers – Caravela Library • Creates a remote machine • Queries shader processors via a broker machine. Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 25 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Caravela network technology from seed Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 26 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 technology Distributed computing: Meta-pipeline from seed – To create a processing pipeline with multiple flow-model units – Flow-model units are virtually connected and mapped to distributed resources. Pipeline-model Map to worker machine network Caravela worker machine network Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 27 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Roadmap for the Caravela project 2006 technology from seed Caravela runtime for local execution Remote execution evaluation Meta-pipeline 2007 GPU benchmark over Caravela Input tool for meta-pipeline MPI-caravela Port to the CELL Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 28 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Publications technology from seed 1. Shinichi Yamagiwa, Leonel Sousa, "Caravela: A Novel Environment for streambased distributed computing", IEEE Computer Magazine, May 2007, pp.76-83 2. Shinichi Yamagiwa, Leonel Sousa, “Design and implementation of a stream-based distributed computing platform using graphics processing units”, ACM International Conference on Computing Frontier, May 2007 3. Shinichi Yamagiwa, Leonel Sousa, Diogo Antão, “Data buffering optimization methods toward a uniform programming interface for GPU-based applications”, ACM International conference of Computing Frontier, May 2007 4. Shinichi Yamagiwa, Leonel Sousa, Tomas Brandao, "Meta-Pipeline: A new execution mechanism for distributed pipeline processing", 6th International Symposium on Parallel and Distributed Computing (ISPDC 2007), August 2007 Shinichi Yamagiwa, Leonel Sousa, “Identifying execution deadlocks on pipeline processing”, submitted to the Europar Conference 2007 • Patent (pending) • “Program execution method applied to data streaming in distributed heterogeneous computing environment”, Portuguese national patent Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 29 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 Access the Caravela webpage now! technology from seed http://www.caravela-gpu.org Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 30 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007 technology from seed technology from seed Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 31 Caravela: A Distributed Stream-based Computing Platform IBM Haifa Labs 17-04-2007
Documentos relacionados
Locality Sensitive Hashing (1)
– Family of hashes H is (R,cR,P1,P2)-sensitive for any p,q ∈ P: • B(q,r) = {p: d(q,p) ≤ R}, ball of radius R • If p ∈ B(q,R) then Pr[h(q) = h(p)] ≥ P1 • If p ∉ B(q,cR) then Pr[h(q) = h(p)] ≤ P2 • H...
Leia maisGarbage Collection Curriculum Contents
divides the heap in two disjoint semi-spaces called from-space and to-space during normal mutator execution objects are allocated linearly in from-space once the collection starts the collector mov...
Leia mais