Big Data Analytics - Between Wish and Reality (Zwischen Wunsch und Realität)
6/30/2014
Big Data Analytics: Between Wish and Reality
© 2014 IBM Corporation
Dr. Wolfgang Rother, IBM Deutschland GmbH, Nahmitzer Damm 12, 12277 Berlin
Email: [email protected]

Agenda
• About data
• Paradigm shift
• Apache Hadoop
• A simple text analytics example
• IBM Watson
• Big data is not just Hadoop
• Further big data analytics examples
• Why Infrastructure Matters
• Between wish and reality

How Big is the Internet of Things?

A major gas and electric utility has 10 million meters. They read the meters once a month. Now, they're installing smart meters. The smart meters are read once an hour.
10 million meters read every 15 minutes = 350 billion transactions a year.

The Big Data Conundrum
• The percentage of available data an enterprise can analyze is decreasing
• This means enterprises are getting “more naive” over time
[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS]

The Four V’s
• Volume: use greater amounts of data
• Variety: use more types of data
• Velocity: use data more quickly
• Veracity: use uncertain data

Big Data is All Data and All Paradigms
• Transactional & Application Data: Volume, Structured, Throughput
• Machine Data: Velocity, Structured, Ingestion
• Social Data: Variety, Unstructured, Veracity
• Enterprise Content: Variety, Unstructured, Volume

PARADIGM SHIFT

How is Big Data transforming the way organizations analyze information and generate actionable insights?
Paradigm shifts enabled by big data: Leverage more of the data being captured
• TRADITIONAL APPROACH: Analyze small subsets of information
• BIG DATA APPROACH: Analyze all information

Paradigm shifts enabled by big data: Reduce effort required to leverage data
• TRADITIONAL APPROACH: Small amount of carefully organized information. Carefully cleanse information before any analysis
• BIG DATA APPROACH: Large amount of messy information. Analyze information as is, cleanse as needed

Paradigm shifts enabled by big data: Data leads the way, and sometimes correlations are good enough
• TRADITIONAL APPROACH: Start with hypothesis and test against selected data (diagram: hypothesis, question, data, answer)
• BIG DATA APPROACH: Explore all data and identify correlations (diagram: data, exploration, correlation, insight)

Paradigm shifts enabled by big data: Leverage data as it is captured
• TRADITIONAL APPROACH: Analyze data after it’s been processed and landed in a warehouse or mart (data → repository → analysis → insight)
• BIG DATA APPROACH: Analyze data in motion as it’s generated, in real time (data → analysis → insight)

APACHE HADOOP

It’s easy to forget just how “big” the data really is!
• Datasets are vast
  - Facebook daily logs ~ 60 TB
  - 1,000 genomes project ~ 200 TB
  - Google web index ~ 10+ PB
• Storage is cheap
  - Cost of a commodity 1 TB drive ~ $50
• A terabyte is still a lot of data!
  - Time to read 1 TB from a single disk: ~ 6 hours @ 50 MB/second
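The read-time figure above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the slide does not state, and the 100-disk case is a hypothetical, idealized illustration of perfectly even parallel reads rather than a measured result.

```python
# Back-of-the-envelope check of the slide's number.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TERABYTE_BYTES = 10**12
THROUGHPUT_BYTES_PER_SEC = 50 * 10**6  # 50 MB/second, as on the slide


def read_time_hours(size_bytes, bytes_per_sec, disks=1):
    """Hours to scan size_bytes when the data is split evenly across
    `disks` drives that are all read in parallel (idealized)."""
    return size_bytes / (bytes_per_sec * disks) / 3600


single = read_time_hours(TERABYTE_BYTES, THROUGHPUT_BYTES_PER_SEC)
# Hypothetical cluster of 100 disks, ignoring all coordination overhead.
parallel = read_time_hours(TERABYTE_BYTES, THROUGHPUT_BYTES_PER_SEC, disks=100)

print(f"1 disk: {single:.2f} h, 100 disks: {parallel * 60:.1f} min")
```

Under these assumptions a single disk needs about 5.6 hours, which the slide rounds to roughly 6, while spreading the same scan over many disks brings it down to minutes.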
As data gets big, traditional approaches no longer work. Distributed systems are the only way to scale.

What is Hadoop?
• Apache Hadoop = free, open source framework for data-intensive applications
  - Inspired by Google technologies (MapReduce, GFS)
  - Well-suited to batch-oriented, read-intensive applications
  - Originally built to address scalability problems of Nutch, an open source Web search technology
• Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
  - CPU + disks of commodity box = Hadoop “node”
  - Boxes can be combined into clusters
  - New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written

How files are stored: HDFS
• Key ideas:
  - Divide big files into blocks and store blocks randomly across the cluster
  - Provide an API to ask: where are the pieces of this file?
  - => Programs can be shipped to nodes for parallel distributed processing
[Figure: a logical file split into blocks 1-4, with the blocks distributed across the cluster]

HDFS stores data across multiple nodes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

How Files are Processed: MapReduce
• Common pattern in data processing: apply a function, then aggregate
  grep "World Cup" *.txt | wc -l
• User simply writes two pieces of code: “mapper” and “reducer”
• Mapper code executes on every split of every file
• Reducer consumes/aggregates mapper outputs
• The Hadoop MR framework takes care of the rest
(resource allocation, scheduling, coordination, temporary storage of intermediate results, storage of final result on HDFS)
[Figure: a logical file is divided into splits 1-3; each split is processed by a Map task, and a Reduce task combines the map outputs into the result]

Logical MapReduce Example: Word Count

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

Content of input documents:
  Hello World Bye World
  Hello IBM
Map 1 emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1>
Map 2 emits: <Hello, 1> <IBM, 1>
Reduce (final output): <Bye, 1> <IBM, 1> <Hello, 2> <World, 2>

WordCount

So What Does This Result In?
• Easy To Scale
• Fault Tolerant and Self-Healing
• Data Agnostic
• Extremely Flexible
• BUT you need programming skills

A SIMPLE TEXT ANALYTICS EXAMPLE
From a bachelor's thesis in business informatics (Wirtschaftsinformatik), FH Brandenburg

Use Case: IBM quarterly reports
• Goal: solve a big data text analytics problem without expert help or special training
• Environment:
  - IBM POWER 7R2 Server
  - RHEL 6.2
  - IBM InfoSphere BigInsights 2.0
• Workflow:
  - Load press releases with a web crawler
  - Initial processing in BigSheets
  - Develop text analytics scripts
  - Apply the scripts

BigInsights Enterprise Edition
Diagram labels: Optional IBM and partner offerings; Analytics and discovery; Text processing engine and library; Accelerator for social data analysis; BigSheets; Accelerator for machine data analysis; Infrastructure; Integrated installer; Text compression; Enhanced security; Open Source; “Apps”; Web Crawler; Boardreader; Distrib file copy; ...
Flume; Data processing; Pig; HBase; Hive; GPFS (EAP); Adaptive MapReduce; MapReduce; HCatalog; Connectivity and Integration; Data Explorer; Machine learning; Jaql; Lucene; Administrative and development tools; Ad hoc query; Oozie; Indexing; Sqoop; DB import; ZooKeeper; Flexible scheduler; JDBC; DB export; HDFS; Streams; DB2; Netezza; R; Guardium; DataStage; Cognos BI; IBM
Web console:
• Monitor cluster health, jobs, etc.
• Add / remove nodes
• Start / stop services
• Inspect job status
• Inspect workflow status
• Deploy applications
• Launch apps / jobs
• Work with distrib file system
• Work with spreadsheet interface
• Support REST-based API
• ...
Eclipse tools:
• Text analytics
• MapReduce programming
• Jaql, Hive, Pig development
• BigSheets plug-in development
• Oozie workflow generation

BigInsights and Text Analytics
• Distills structured info from unstructured text
  - Sentiment analysis
  - Consumer behavior
  - Illegal or suspicious activities
  - …
• Parses text and detects meaning with annotators
• Understands the context in which the text is analyzed
• Features pre-built extractors for names, addresses, phone numbers, etc.
  - Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese
Unstructured text (document, email, etc.):
Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win.
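To make the idea of "distilling structured info from unstructured text" concrete, here is a minimal sketch applied to the sample passage. It uses plain Python regular expressions as a stand-in for an annotator; it is not AQL and not the BigInsights text analytics engine. The role dictionary (striker, keeper, winger) and the names `PLAYER`, `SCORE`, `players` and `score` are assumptions made only for this illustration.

```python
import re

# Sample passage from the slide (ASCII apostrophe used for simplicity).
TEXT = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. "
    "Early in the second half, Netherlands' striker, Arjen Robben, had a breakaway, "
    "but the keeper for Spain, Iker Casillas made the save. "
    "Winger Andres Iniesta scored for Spain for the win."
)

# Hypothetical mini-dictionary of role words, followed by a person name.
PLAYER = re.compile(
    r"\b(?i:(striker|keeper|winger))\b"   # role word, matched case-insensitively
    r"(?: for [A-Z][a-z]+)?,? "           # optional "for <Team>" and optional comma
    r"([A-Z][a-z]+ [A-Z][a-z]+)"          # two capitalized words taken as the name
)
# A score such as "1-0".
SCORE = re.compile(r"\b(\d+)-(\d+)\b")

# Structured records extracted from the free text.
players = [{"role": role.lower(), "name": name} for role, name in PLAYER.findall(TEXT)]
score = tuple(int(n) for n in SCORE.search(TEXT).groups())

for p in players:
    print(p)
print("score:", score)
```

A hand-written pattern like this only covers the phrasings in this one passage; the pre-built extractors mentioned above exist precisely because real text varies far more.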
[Figure: the unstructured text is turned into classification and insight]

Web Crawler
• The web crawler is intuitive to use
• Time-consuming, depending on the broadband connection
  - Run time of more than 3 days

Use Case: Initial processing in BigSheets
• The web crawler delivered more than 17,000 press releases
• After filtering, only 65 quarterly reports remained
  - Within the workbook that was created, all HTML pages containing the terms “quarter” and “results” were extracted first.

Text Analytics Tooling
• AQL Editor
• Result Viewer
• Runtime Explain

Use Case: Developing an AQL text analytics script

create view content as
  extract regex /Start Whitespace .* End Whitespace/
    on D.text as text
  from Document D;

Developing the AQL text analytics script

Use Case: Developing the text analytics script
• A further 8 views were needed to extract revenue by region, year and quarter.

Applying the text analytics scripts
• America? Q4?
• Note: information is not always complete!
• Investigating causes vs. effects?

IBM WATSON

IBM Watson answers a grand challenge
Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?

2011: Taking on Jeopardy!
Chess
  - A finite, mathematically well-defined search space
  - Large but limited number of moves and states
  - Everything explicit, unambiguous mathematical rules
Human Language
  - Ambiguous, contextual and implicit
  - Grounded only in human cognition
  - Seemingly infinite number of ways to express the same meaning

Keyword search
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Craig arrived in India after he celebrated his anniversary in Portugal.
[Figure: keyword matching between the clue and the passage (In May 1898 / In May; 400th anniversary / anniversary; Portugal / in Portugal; celebrated / celebrated; arrival in / arrived in; India / India; explorer / Craig)]

Finding Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach
• Search far and wide
• Explore many hypotheses
• Find & judge evidence
• Many inference algorithms
[Figure: evidence matching via temporal reasoning (May 1898, 400th anniversary, 27th May 1498), statistical paraphrasing (arrival in, landed in) and geospatial reasoning (India, Kappad Beach); explorer matched to Vasco da Gama]

Watson won Jeopardy, but …
• THE AMERICAN DREAM: Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them
  Answers shown on the slide: "the People" and "No One"
• MILESTONES: In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean”
  Answers shown on the slide: "Apollo 11 moon landing" and "the Big Bang"
• FATHERLY NICKNAMES: This Frenchman was "The Father of Bacteriology"
  Answers shown on the slide: "Louis Pasteur" and "How Tasty Was My Little Frenchman"

Watson Workload Optimized System in 2011
• 90 x
IBM Power 750 servers [1]
• 2880 POWER7 cores
• POWER7 3.55 GHz chip
• 500 GB per sec on-chip bandwidth
• 10 Gb Ethernet network
• 16 Terabytes of memory
• 20 Terabytes of disk storage
• Can operate at 80 Teraflops
• Runs IBM DeepQA software
• Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components
• SUSE Linux performance-optimized to exploit POWER7 systems
• 10 racks include servers, networking, shared disk system, cluster controllers
[1] Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010

What’s for Watson?
• Healthcare and life sciences: diagnostic assistance, evidence-based collaborative medicine
• Technical support: help-desk, call centers
• Enterprise knowledge management and business intelligence
• Government citizen services
“In healthcare, we talk about turning data into knowledge. That’s really what Watson does.”
Joe Jasinski, Program Director, IBM Healthcare and Life Sciences Research

BIG DATA IS NOT JUST HADOOP

Without analytics, big data is just a sack full of data
• MYTH: Big data is only about MORE data
• MYTH: Big data = Hadoop ... done
• MYTH: Big data replaces everything that exists: death to the RDBMS, and no governance at all
• MYTH: NoSQL = no SQL ... ever
• MYTH: Big data is unstructured data and only good for sentiment analysis

How are leading companies transforming their data and analytics environment?
Big Data ≠ Hadoop
“There’s a belief that if you want big data, you need to go out and buy Hadoop and then you’re pretty much set.
People shouldn’t get ideas about turning off their relational systems and replacing them with Hadoop… As we start thinking about big data from the perspective of business needs, we’re realizing that Hadoop isn’t always the best tool for everything we need to do, and that using the wrong tool can sometimes be painful.”
Ken Rudin, Head of Analytics at Facebook

Big Data is about more than just Hadoop …
Data may be structured, unstructured, static, in-flight (or all of the above)
• Data at rest
  - Huge volumes of data on disk
  - Structured or semi-structured
  - May or may not have schemas
  - Too large for traditional tools
  - Need to process in place
• Data in motion
  - In-flight, frequently not stored
  - Tremendous velocity, high bandwidth
  - Diverse data sources
  - Frequently unstructured, semi-structured
  - Ultra low-latency processing required

InfoSphere Streams delivers analytics for data in motion
• Scale-out architecture for massive linear scalability
• Sophisticated analytics with pre-built toolkits & accelerators
• Comprehensive development tools to build applications with minimal learning
• Millions of events per second, microsecond latency, powerful analytics, real-time delivery
• Traditional / non-traditional data sources: video, audio, networks, social media, etc.
• Example applications: ICU monitoring, algorithmic trading, cyber security, environment monitoring, government / law enforcement, telco churn prediction, smart grid

New Architecture to Leverage All Data and Analytics
Diagram labels: Real-time Analytics; Streams; Data in Motion; Intelligence Analysis; Video/Audio; Network/Sensor; Entity Analytics; Predictive; Information Ingestion and Operational Information; Exploration, Integrated Warehouse, and Mart Zones; Discovery; Deep Reflection; Operational; Predictive; Landing Area, Analytics Zone and Archive; Stream Processing; Data Integration; Master Data; Data at Rest; Decision Management; BI and Predictive Analytics; Raw Data; Structured Data; Text Analytics;
Data Mining; Entity Analytics; Machine Learning; Data in Many Forms; Navigation and Discovery; Information Governance, Security and Business Continuity

How are leading companies transforming their data and analytics environment?
Watson Foundations
Diagram labels: Big Data Landing zone eco-system; Data Types (machine and sensor data, image and video, enterprise content, transaction and application data, social data, third-party data); Operational systems; Real-time processing & analytics; Exploration, landing and archive; Trusted data; Deep analytics & modeling; Reporting, analysis, content analytics; Discovery and exploration; Information Integration & Governance; Actionable Insight (decision management, predictive analytics and modeling, reporting & interactive analysis)
1. More than Hadoop
  - Greater resiliency and recoverability
  - Advanced workload management & multi-tenancy
  - Enhanced, flexible storage management (GPFS)
  - Enhanced data access (BigSQL, Search)
  - Analytics accelerators & visualization
  - Enterprise-ready security framework
2. Data in Motion
  - Enterprise class stream processing & analytics
3. Analytics Everywhere
  - Richest set of analytics capabilities
  - Ability to analyze data in place
4. Governance Everywhere
  - Complete integration & governance capabilities
  - Ability to govern all data wherever it is
5. Complete Portfolio
  - End-to-end capabilities to address all needs
  - Ability to grow and address future needs
  - Remains open to work with existing investments

Why SQL on Hadoop?
• Hadoop stores large volumes and varieties of data
• SQL gets information and insight out of Hadoop
• SQL leverages existing IT skills, resulting in quicker time to value and lower cost

SQL on Hadoop and Hive
• Hadoop can process data of any kind (as long as it's splittable, etc.)
• A very common scenario:
  - Tabular data
  - Programs that “query” the data
• Java Hadoop APIs are the wrong tool for this
  - Too low level, steep learning curve
  - Require strong programming expertise
• Universally accepted solution: SQL
• Enter Hive ...
  1. Impose relational structure on plain files
  2. Translate SELECT statements to MapReduce jobs
  3. Hide all the low level details

Big SQL 3.0
• Comprehensive SQL functionality
  - IBM SQL/PL support, including stored procedures (SQL bodied and external) and functions (SQL bodied and external)
  - IBM Data Server JDBC and ODBC drivers
• Leverages advanced IBM SQL compiler/runtime
  - High performance native (C++) runtime
• Replaces Map/Reduce
  - Advanced message passing runtime
  - Data flows between nodes without requiring persisting intermediate results
  - Continuous running daemons
  - Advanced workload management allows resources to remain constrained
  - Low latency, high throughput…
[Figure: a SQL-based application connects through the IBM data server client to the Big SQL Engine (SQL MPP run-time) in InfoSphere BigInsights; data sources: CSV, Seq, Parquet, RC, Avro, ORC, JSON, Custom]

Big R
“End-to-end integration of R into IBM BigInsights”
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
  • Partitioning of large data (“divide”)
  • Parallel cluster execution of pushed down R code (“conquer”)
  • All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
  • Almost any R package can run in this environment
[Figure: R clients with R packages; pull data (summaries) to the R client; data sources]
3.
Scalable machine learning
  • A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
[Figure: Scalable Statistics Engine; or, push R functions right on the data (R packages, embedded R execution)]

Why are names difficult?
• There are no consistent standards for names. Some countries mandate certain standards but they differ from country to country, and most countries have no standards.
• Names can contain a variety of OPTIONAL information that can make the same name appear very differently.
  - Ben Al Haden (Anglo)
  - Bin Al-Hadin (son of somebody who came from the city of Hadin)
  - Bin Al Hadin (son of Hadin)
  - Bint Ali Hadin
  - Renato Loffreda Mancinelli = Renato Mancinelli <> Renato Loffreda
  - Using the Anglo rules

IBM InfoSphere Identity Insight Solutions
Commercially available Identity Analytics and Relationship Detection software
3 Key Functionalities:
  - Who is who? No matter how hard they try to hide
  - Who knows who? The infamous hiding behind the innocuous
  - Who does what? Alerts you when bad guys do bad things
Entity Analytics is a methodical process of detecting like and related entities across large, sparse, and disparate collections of data, both new and old, internal and external, using advanced techniques to establish connections that are not obvious.

BIG DATA ANALYTICS EXAMPLES

Predictive Maintenance at Union Pacific
Predictive analytics help Union Pacific to predict certain derailments days or even weeks before they are likely to occur. Using thermometers and acoustic and visual sensors on the underside of each of its rail carriages, the company can detect and analyse imminent problems with tracks and wheels.
So that all the data can be transmitted, the company has deployed a fibre optic communications network throughout its vast rail system. Although a train derailment does not have to be a large accident, small errors can result in vast delays, and with 3,350 trains operational on any given day this can become very expensive.

Smarter Farming
Claas agricultural machinery: Agricultural machinery manufacturers are meanwhile working on networking machines and data and on data mining strategies. Soil data, yield data, consumption data, weather data: all of it becomes the raw material of a comprehensive expert system. Experts call this Agriculture 4.0 (Landwirtschaft 4.0), a parallel to Industry 4.0, in which machines and workpieces communicate with each other. Claas calls it 365FarmNet and is successfully bringing its competitors onto this first universal management platform as well.

Retail
Luxottica applies statistical methods to a behavioral model in order to segment and score customers across identities.
• 10% improvement in marketing effectiveness
• 100 million customers can be down-selected to the highest value individuals
• Target individual customers based on unique preferences and histories
Solution Components
• Customer Intelligence Appliance Software
• Twin Fin 12 PDA
• IBM Campaign
• IBM Enterprise Marketing Operations
Business Challenge: Luxottica, the eyewear giant with nearly 100 million customers in eight house brands on the company’s numerous websites and in retail stores, generates massive amounts of data, the majority of which was housed and managed by outside data and marketing vendors. Lacking a holistic understanding and view of the customers, marketers struggled to nurture customer relationships, seize cross-sell and up-sell opportunities, personalize campaigns and acquire new customers during the shopping process.
The Smarter Solution: After a successful proof of concept, the company is deploying an advanced Customer Intelligence analytics appliance, built on a high-performance platform that integrates online and physical customer data from multiple sources. The resulting 360-degree omni-channel customer view will not only help the retailer identify its most profitable sales channels, but also segment, track and score customers down to the persona level based on thousands of behavioral attributes, and refine and personalize marketing campaigns.
“The results of the POC were eye-opening, revealing unprecedented and actionable insight into omni-channel customers we had never seen or analyzed before.” (Chief Digital Officer)

Optimizing capital investments based on double-digit petabyte analysis
• Model the weather to optimize placement of turbines, maximizing power generation for their client and longevity (warranty optimization)
• Needed more data in richer models (adding hundreds of variables)
• Perspective: If you were to replay the Vestas Wind library, you would be sitting down to watch 70 years of TV in HD
http://www.youtube.com/watch?v=Z4xkA4Qye5I

Neonatal Care
http://www.youtube.com/watch?v=cc8UV3Tcsfg
InfoSphere Streams: low-latency analytics for streaming data
• Multiple devices are attached to the baby or humidicrib
• Medical devices output via serial port in a range of formats
• Indicative readings are recorded on paper every 30 or 60 minutes
• Cost of care per baby is approx. $100-150K, not including morbidity-related care

We eat more sweets when it rains
Weather-dependent sales forecasts for a large bakery
Diagram labels: self-learning control loop; improved product and service availability; -30%; purchasing behavior; data mining; returns; highly precise sales forecast models; saves 2-3 working hours per week per branch; more precise production planning; pinpoint weather forecast for each
branch; waste avoidance; environmental protection

Optimizing deployment planning
Sixt car rental
Diagram labels: Location A; Location B; Location C; Location D; Modeler; customer behavior; overbooking for better utilization; vehicle availability; predicting no-shows; vehicle bookings
• “No-show” customers make deployment planning harder
• Optimized deployment planning
• Avoid idle vehicles
• No changes to processes or infrastructure

Prevention for repeat and high-intensity offenders
Criminalistic-Criminological Research Unit of the Hessian State Criminal Police Office (Hessisches Landeskriminalamt)
• Full survey of the biographies of repeat and high-intensity offenders
• Cluster analysis
• Derivation of suitable measures
• Actionable knowledge
• Prevention

The 5 Key Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency

We can take the same use cases further with big data solutions
• Financial Services: fraud detection; risk management; 360° View of the Customer
• Transportation: weather and traffic impact on logistics and fuel consumption
• Health & Life Sciences: epidemic early warning system; ICU monitoring; remote healthcare monitoring
• Telecommunications: CDR processing; churn prediction; geomapping / marketing; network monitoring
• Utilities: weather impact analysis on power generation; transmission monitoring; smart grid management
• IT: transition log analysis for multiple transactional systems; cybersecurity
• Retail: 360° View of the Customer; click-stream analysis; real-time promotions
• Law
Enforcement: real-time multimodal surveillance; situational awareness; cyber security detection

WHY INFRASTRUCTURE MATTERS

• Access Matters: To get new levels of visibility into customers and operations. Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
• Speed Matters: To accelerate insights in real-time at the point of impact. Infrastructure must build intelligence into operational events and transactions.
• Availability Matters: To consistently deliver insights to the people and processes that need them. Infrastructure must maximize the availability of information and insights at the point of impact.

Challenges for big data analytics projects
BETWEEN WISH AND REALITY

QUESTIONS?