HEROKU: THE PATH TO HIGH SCALE AND AVAILABILITY
[email protected]
QCONSP 2014
FABIO KUNG, Tech Lead, Runtime Systems at Heroku
heroku scale web=3 worker=2
alta-escala-disponibilidade.herokuapp.com
millions of (web) applications
one of the largest Linux container (LXC) deployments in the world
>60k requests per second
>5G (5 billion) requests per day
12FACTOR.NET: portable; modern (cloud) platforms; elasticity
two regions in production: us-east and eu-west
multiple Availability Zones
2008, 2009, 2010
GROWTH: facebook/heroku
2011, 2013
VIBE: starving-samurai-42.herokuapp.com
https://www.flickr.com/photos/timriley/9361949580
hacker culture (flickr/dominicotine)
TEAMS AND COMPONENTS
TOTAL OWNERSHIP: Dependencies? Autonomy, polyglot, full stack
CORE -> MICROSERVICES: no free lunch
IMPLICIT INTERFACES: poor, informal documentation; manifest-driven APIs; evolution, updates, coordinated releases
DISTRIBUTED SYSTEMS: retry, circuit breaker, rate limiting, rollback (distributed transactions), state replication, cache ...
HEROKU SCALE WEB=3 WORKER=5
TROUBLESHOOTING: asynchronicity, distributed tracing, visibility!
TESTS, DEPLOYS, DUPLICATION!
EPHEMERALIZATION: Do more with less.
DOGFOODING
TOOLS TEAM
DEVCLOUDS: boot your own Heroku
@merman boot my cloud
KERNEL, PLATFORM, DIREWOLF, POSTGRESQL
counterexample: RabbitMQ
ORG ACCOUNTS
MULTIPLE TECHNOLOGIES: guidelines, service toolkits, polyglot product
#OPSLIFE: weekly on-call shifts
ESCALATION PATH: the whole team in the rotation, team manager, Incident Commander
TRANSPARENCY: status.heroku.com, csquared's Heroku Outage Lights System
OPS TEAM: Total ownership?
SRE: SITE RELIABILITY ENGINEERS
global reliability, capacity planning, reviews, incident retrospectives, tools, dashboards, the burden of on-call
CHANGES: update existing instances vs. replace them with new instances
RISK AVERSION: simple one-line changes -> catastrophe
RIGOR: "Hackers write Too Much Software. Need to change Process. Heroes mask Too Many Problems.
Need to change Teamwork." -- Noah, Engineering Manager
CODE REVIEW: async, remote team members
DOCUMENTATION: DIAGRAMS, DESIGN
BLOG-DRIVEN DEVELOPMENT: CFP; big, difficult decisions
CHECKLISTS
Example: production checklist
✓ Has ops docs with executable instructions
✓ Has a high-fidelity staging setup with production parity
✓ Requested audit from the security team
✓ Alerts a human if it is down
✓ Simulated failures
✓ Uses structured logging
✓ Enforces SSL access
✓ Creds and rotation procedures are documented
✓ Send a launch email to engineering@
✓ Move to Production on the Engineering Lifecycle board
SUPPORT: embedded
BUS FACTOR: Total ownership?
BENEVOLENT DICTATORSHIP (BDFL)
COOPETITION: COOPERATIVE COMPETITION
LXC, e.g.: DotCloud, container-rfc
"We lost the standards game for virtual machine images, but it feels like this community is tight-knit enough we might be able to do something for Linux Containers." -- Alex Polvi (coreos.com)
GIT
$ git push heroku master
Counting objects: 1, done.
Writing objects: 100% (1/1), 181 bytes | 0 bytes/s, done.
Total 1 (delta 0), reused 0 (delta 0)
-----> Ruby app detected
-----> Compiling Ruby
...
To [email protected]:myapp.git
91dfe0b..f251ba7 master -> master
e.g.: GitHub
2012
CLOUD, e.g.: AWS, AppEngine
PEOPLE: a "no jerks" policy
CORE -> INDEPENDENT TEAMS
TOTAL OWNERSHIP: FOCUS?
SRE
product
-heroes +coordination
HEROKU IN EUROPE
Hurricane Sandy (2012) -> us-east -> us-west
MANAGEMENT: mdz's Scaling Human Systems
SLACK: always too busy
WHAT CHANGED? values (Adam Wiggins)
EPHEMERALIZATION: Do more with less.
MAKE IT REAL: Ideas are cheap.
SHIP IT: Nothing is real until it's being used by a real user.
DO IT WITH STYLE: Aesthetic matters.
INTUITION-DRIVEN | DATA-DRIVEN: Users don't really know what they want. ...
PROVE IT WITH DATA
[email protected]
DIVIDE AND CONQUER: If it's hard, cut scope.
TIMING MATTERS: Maybe now isn't the right time.
THROW THINGS AWAY: Never be afraid to throw something away and do it again.
https://www.flickr.com/photos/teich/9427507382/
SMALL SHARP TOOLS: Composability. The Art of Unix Programming. Also teams. Several small, autonomous, focused teams.
PUT IT IN THE CLOUD: Services, not software.
RESULTS, NOT POLITICS: "Get ahead" in your career by delivering real value. Not by impressing your boss or with big talk.
DECISION-MAKING VIA OWNERSHIP, NOT CONSENSUS OR AUTHORITY: Ownership can't be given, only taken. Ownership can't be declared, only demonstrated.
DO-OCRACY / INTRAPRENEURSHIP: Ask forgiveness, not permission.
EVERYTHING IS AN EXPERIMENT: Everything is always subject to change. Ending an experiment isn't a failure.
OWN UP TO FAILURE: Admit your mistake, say you're sorry, and feel the failure to make sure you learned from it. Then, get back to work.
GRADUAL ROLLOUTS: Incremental. Adjust.
DESIGN EVERYTHING: Be intentional.
QUESTION EVERYTHING: The status quo is never good enough.
MANIACAL FOCUS ON SIMPLICITY: There is no step 1. ...
TOO-CLEVER CODE: load averages, ELB -> unicorns
CLI 4 LIFE: Command-line interfaces are the heart of developer workflows.
IGNORE THE COMPETITION: Except to borrow good ideas.
WRITE WELL: Clear writing is clear thinking.
STRONG OPINIONS, WEAKLY HELD: Be willing to change your mind.
THANK YOU!
[email protected]