IBM GPFS Expert Workshop
GPFS Storage Server
First practical experience from the field
IBM GPFS Expert Workshop
16/17 April 2013
IBM Executive Briefing Center
Mainz
© 2013 IBM Corporation
Agenda
What are GPFS Native RAID and GPFS Storage Server?
GSS – from Design and Beta testing to Limited Availability
Experience with GSS
GPFS Storage Server – replacing Storage Controllers
[Diagram: two configurations side by side. Left: a compute cluster attached over FDR IB / 10 GigE to NSD file servers (x3650) backed by dedicated disk controllers and disk enclosures. Right: the same compute cluster and NSD file servers, but the RAID function has migrated into GPFS Native RAID on the commodity file servers, which attach directly to the disk enclosures.]
Introducing IBM GPFS Storage Server
What’s New:
• Replaces external hardware controller with software-based RAID
• Modular upgrades improve TCO
• Non-intrusive disk diagnostics
Client Business Value:
• Integrated and ready to go for Big Data applications
• 3 years maintenance and support
• Improved storage affordability
• Delivers data integrity, end-to-end
• Faster rebuild and recovery times
• Reduces rebuild overhead by 3.5x
Key Features:
• Declustered RAID (8+2p, 8+3p)
• 2- and 3-fault-tolerant erasure codes
• End-to-end checksum
• Protection against lost writes
• Off-the-shelf JBODs
• Standardized in-band SES management
• SSD acceleration built in
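To get an intuition for the rebuild claim, here is a small illustrative calculation (a sketch with assumed numbers, not measured GSS values): in a conventional 10-disk 8+2p array, rebuilding a failed disk reads the nine survivors essentially in full, while declustered RAID spreads the same stripes, and therefore the same rebuild reads, across all disks of a much larger declustered array.

  # Illustrative sketch (assumed numbers, not GSS measurements): per-disk
  # read load when rebuilding one failed disk.
  disk_tb = 2.0        # assumed disk capacity in TB
  width = 10           # 8 data + 2 parity strips per stripe
  da_disks = 58        # assumed number of disks in one declustered array

  # Conventional RAID: every stripe lives on the same 10 disks, so each
  # of the 9 surviving disks is read roughly in full.
  conventional = disk_tb

  # Declustered RAID: the same total read volume is spread over all
  # surviving disks of the declustered array.
  total_read = disk_tb * (width - 1)
  declustered = total_read / (da_disks - 1)

  print(f"conventional: ~{conventional:.2f} TB read per surviving disk")
  print(f"declustered:  ~{declustered:.2f} TB read per surviving disk")
  print(f"per-disk load ratio: ~{conventional / declustered:.1f}x lower")

The exact factor depends on the size of the declustered array and on how aggressively the rebuild is scheduled; the 3.5x figure on the slide presumably refers to the rebuild overhead observed by client I/O.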
A Scalable Building Block Approach to Storage
Complete storage solution: data servers, disk (SSD and NL-SAS), software, InfiniBand and Ethernet.
Building block: a pair of x3650 M4 servers attached to “twin-tailed” JBOD disk enclosures.
Model 24 (light and fast): 4 enclosures, 20U, 232 NL-SAS + 6 SSD
Model 26 (HPC workhorse!): 6 enclosures, 28U, 348 NL-SAS + 6 SSD
High-density HPC options: 18 enclosures in 2 standard 42U racks, 1044 NL-SAS + 18 SSD
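A back-of-the-envelope capacity estimate for the two models (a sketch; the drive capacity, spare space, and 8+2p code choice below are assumptions, and real usable capacity additionally depends on metadata and configuration):

  # Rough usable-capacity estimate per GSS building block (assumptions:
  # 3 TB NL-SAS drives, 8+2p erasure code, ~4 disks' worth of spare space).
  def usable_tb(n_disks, disk_tb=3.0, data=8, parity=2, spare_disks=4):
      after_spares = (n_disks - spare_disks) * disk_tb
      return after_spares * data / (data + parity)

  for model, disks in (("GSS24", 232), ("GSS26", 348)):
      raw = disks * 3.0
      print(f"{model}: {raw:.0f} TB raw, ~{usable_tb(disks):.0f} TB usable with 8+2p")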
GPFS Storage Server (GSS) documentation
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/doc_updates/bl1du13a.pdf
GSS Limited Availability
Available since end of March 2013
Components
– GSS hardware (either GSS24 or GSS26)
– GPFS software
– GPFS Native RAID RPQ
Considerations for GSS LA
– GPFS development is involved in customer projects
– Expect more frequent software updates (and maintenance windows) than with standard GPFS
– No “out of the box” graphical user interface for monitoring the storage (CLI only)
Agenda
What are GPFS Native RAID and GPFS Storage Server?
GSS – from Design and Beta testing to Limited Availability
Experience with GSS
GPFS Storage Server history
2011/11 – GPFS Native RAID is released on the IBM Power 775 (GA 11/11/11)
GPFS Native RAID (GNR) on Power 775 – GA on 11/11/11
See Chapter 10 of GPFS 3.5 Advanced Administration Guide (SC23-5182-05)
– or GPFS 3.4 Native RAID Administration and Programming Reference (SA23-1354-00)
First customers:
– Weather agencies, government agencies, universities.
GPFS Storage Server history (continued)
2012/01 – Suggestion of a potential GNR solution based on IBM System x servers
GPFS Storage Server history (continued)
2012/02 – Running initial base bandwidth tests on similar hardware (without GNR)
Hardware used to create a “proof of concept” building block
GPFS Storage Server history (continued)
2012/03-04 – Running tests with GNR code on similar hardware
Initial results with a single recovery group (RG) on test hardware
GPFS Storage Server history (continued)
2012/05-12 – Finalizing building block design and ongoing tests
2012/11 – Announcement of the IBM GPFS Storage Server hardware
GPFS Storage Server hardware announcement 11/2012
Source: http://www-01.ibm.com/common/ssi/rep_ca/8/897/ENUS112-218/ENUS112-218.PDF
GPFS Storage Server history (continued)
2013/01 – Hardware installation of 20 GPFS Storage Server (GSS24) building blocks
20 GSS building blocks in 10 racks
GPFS Storage Server infrastructure
[Diagram: 20 GSS building blocks, gss01a/gss01b through gss20a/gss20b, serving the clients. Each server bonds its 10 GbE ports 0/2/4 into bondc1 on fabric c1 and ports 1/3/5 into bondc2 on fabric c2. A separate 1 GbE management network connects all servers to the xCAT management node and the Icinga monitoring node.]
GPFS Storage Server history (continued)
2013/01-02 – Installation of GNR beta software and testing
2013/03-04 – Implementation of Icinga monitoring and move to the GSS LA software level
Agenda
What are GPFS Native RAID and GPFS Storage Server?
GSS – from Design and Beta testing to Limited Availability
Experience with GSS
Experience with GSS
What worked well
– GNR core functionality (redundancy codes, failed disk recovery, ...)
– Support by the GPFS development team (including onsite support)
Challenges
– How to effectively set up and manage a GSS cluster?
– How to achieve aggregate bandwidth over 10 GbE using a limited number of clients?
– How to demonstrate the “data integrity” feature?
– How to monitor GSS?
xCAT as GSS provisioning and management tool
A dedicated xCAT server makes sense but is not required.
The GPFS development team provides xCAT install support:
– new xCAT provisioning method “gssServer”
• groups xCAT postscripts to set up the GSS servers
• configures xCAT resources
– new subtree /install/gss
• includes all software needed to provision a GSS server
The provisioning method can be implemented on an existing xCAT server:
– it will not damage the existing installation
– it requires an adequate xCAT level (>= 2.7.6?) and RHEL 6.3 on the xCAT server
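A hypothetical provisioning sequence, sketched with standard xCAT commands (node names are examples; exactly how the gssServer method is invoked is an assumption here, not taken from the slide):

  # Hypothetical sketch of driving the xCAT "gssServer" provisioning
  # method from Python. chdef/nodeset/rpower/lsdef are standard xCAT
  # commands, but the gssServer invocation details are assumed.
  import subprocess

  def xcat(*args):
      """Run one xCAT command and return its stdout."""
      return subprocess.run(args, check=True, capture_output=True,
                            text=True).stdout

  nodes = "gss01a,gss01b"                       # example node names
  xcat("chdef", "-t", "node", nodes,
       "provmethod=gssServer")                  # assumed attribute value
  xcat("nodeset", nodes, "gssServer")           # stage the provisioning method
  xcat("rpower", nodes, "boot")                 # network-boot both servers
  print(xcat("lsdef", nodes, "-i", "status"))   # watch the install status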
Challenge: avoiding congestion of server ports with a limited number of clients
[Diagram: a small number of clients attached through the two 10 GbE fabrics (c1, c2) to one building block (justgss02a/justgss02b); each server bonds ports 0/2/4 into bondc1 and ports 1/3/5 into bondc2. Two client connections (red) hash onto the same server port while another port (yellow) stays unused.]
Port selection is based on hashing algorithms:
– for writes, the switch decides which link to use towards the server
– for reads, the server decides which link to use towards the switch
Challenge for the test environment: if the same server port is used by two connections (red), another port remains unused (yellow)
– this cuts the available bandwidth for the two clients in half and leaves one server port idle
– it is not an issue for large numbers of clients, where port usage levels out and saturates all ports (see the sketch below)
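The effect can be reproduced with a toy model of such a hash (a sketch; real switches and the Linux bonding xmit_hash_policy use different functions, but the principle of statically mapping each TCP connection to one link is the same):

  # Toy model: statically hash each TCP connection onto one of the two
  # links of a bond, the way a layer3+4-style policy might.
  import zlib

  def link_for_flow(src_ip, dst_ip, src_port, dst_port, num_links=2):
      key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
      return zlib.crc32(key) % num_links

  # Two clients talking to the same server (1191 is the GPFS daemon port).
  flows = [("10.1.0.11", "10.1.0.1", 40001, 1191),
           ("10.1.0.12", "10.1.0.1", 40002, 1191)]

  links = [link_for_flow(*flow) for flow in flows]
  print("links chosen:", links)
  if len(set(links)) == 1:
      print("collision: both clients share one server link, one link idles,")
      print("and each client sees about half the expected bandwidth")

  # With many clients the per-link counts level out:
  many = [link_for_flow(f"10.1.0.{i}", "10.1.0.1", 40000 + i, 1191)
          for i in range(11, 111)]
  print("flows per link with 100 clients:", [many.count(l) for l in range(2)])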
GSS single building block – gpfsperf results (preliminary)
Two aggregations of the same runs are shown:
– total MB transferred divided by total runtime (long-running processes reduce this number)
– sum of the per-process gpfsperf results
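The gap between the two numbers comes from stragglers, as a small example shows (made-up per-process values): dividing total MB by the wall-clock runtime charges every process for the slowest one, while summing per-process rates does not.

  # Two aggregations of per-process results: (MB written, runtime in s).
  results = [(8192, 10.0), (8192, 10.2), (8192, 10.1), (8192, 14.0)]  # straggler

  total_mb = sum(mb for mb, _ in results)
  wall_clock = max(t for _, t in results)   # the run ends with the slowest process

  print(f"total MB / total runtime: {total_mb / wall_clock:7.1f} MB/s")
  print(f"sum of per-process rates: {sum(mb / t for mb, t in results):7.1f} MB/s")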
How to demonstrate the “data integrity” feature?
This is easy to do for a GPFS developer.
It is not easy for a customer or an IBMer outside of GPFS development.
Ongoing discussion with GPFS development...
Monitoring GPFS Native RAID
GNR commands that show configuration as well as current state
– mmlsrecoverygroup
– mmlspdisk
– mmpmon
GPFS Native RAID event log
– mmlsrecoverygroupevents
GPFS Native RAID user exits (caution!) with mmaddcallback
– preRGTakeover
– postRGTakeover
– preRGRelinquish
– postRGRelinquish
– rgOpenFailed
– rgPanic
– pdFailed
– pdRecovered
– pdReplacePdisk
– pdPathDown
– daRebuildFailed
– nsdCksumMismatch
gnrhealthcheck [--topology] [--enclosure] [--rg] [--pdisk] [--perf-dd] [--local]
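One way to feed this state into Icinga/check_mk is a local check wrapped around these commands; a minimal sketch (assuming GPFS commands under /usr/lpp/mmfs/bin and an mmlspdisk level that supports --not-ok; the stanza parsing is an assumption about the output format):

  # Sketch of a check_mk "local check" reporting GNR pdisks that are not
  # in an ok state. Local checks print: <status> <item> <perfdata> <text>.
  import subprocess

  MMLSPDISK = "/usr/lpp/mmfs/bin/mmlspdisk"

  try:
      out = subprocess.run([MMLSPDISK, "all", "--not-ok"],
                           check=True, capture_output=True, text=True).stdout
  except (OSError, subprocess.CalledProcessError) as err:
      print(f"3 GNR_pdisks - cannot run mmlspdisk: {err}")   # UNKNOWN
  else:
      # assumption: one stanza per pdisk with a line like: name = "e1d1s01"
      bad = [line.split('"')[1] for line in out.splitlines()
             if line.strip().startswith("name ")]
      if bad:
          print(f'2 GNR_pdisks bad={len(bad)} not ok: {", ".join(bad)}')  # CRIT
      else:
          print("0 GNR_pdisks bad=0 all pdisks ok")                       # OK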
Monitoring of GSS using Icinga with check_mk scripting
GSS Monitoring – Icinga Tactical Overview
GSS Monitoring – GSS building block overview
GSS Monitoring – check_mk services
GSS Monitoring – check_mk services in detail
Acknowledgements
Client team
– Ulrike Schmidt
– Lothar Wollschläger
– Stefan Graf
GPFS research and development
– Sven Oehme
– Puneet Chaudhary
– Marc Roskow
IBM Global Technology Services
– Michael Lauffs
– Steffen Waitz
IBM Technical Computing
– Martin Hiegl
– Michael Hennecke
© 2013 IBM Corporation
