Big Data Analytics - Between Wish and Reality

6/30/2014
© 2014 IBM Corporation
Dr. Wolfgang Rother
IBM Deutschland GmbH
Nahmitzer Damm 12
12277 Berlin
Email: [email protected]
Agenda
• About data
• Paradigm shift
• Apache Hadoop
• A simple example of text analytics
• IBM Watson
• Big data is not just Hadoop
• Further big data analytics examples
• Why Infrastructure Matters
• Between Wish and Reality
How Big is the Internet of Things?
A major gas and electric utility has 10 million meters.
They read the meters once a month.
Now, they're installing 10 million smart meters.
The smart meters are read every 15 minutes = 350 billion transactions a year.
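The 350 billion figure can be checked with simple arithmetic. The sketch below assumes one reading per meter every 15 minutes and a 365-day year; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the smart meter numbers on the slide.
METERS = 10_000_000          # meters operated by the utility (from the slide)
READS_PER_HOUR = 4           # assumption: one reading every 15 minutes
HOURS_PER_YEAR = 24 * 365    # assumption: non-leap year

# Old process: each meter is read once a month.
monthly_reads_per_year = METERS * 12

# Smart meters: each meter is read every 15 minutes, all year.
smart_reads_per_year = METERS * READS_PER_HOUR * HOURS_PER_YEAR

print(f"monthly readings: {monthly_reads_per_year:,} per year")
print(f"smart readings:   {smart_reads_per_year:,} per year")
print(f"growth factor:    {smart_reads_per_year // monthly_reads_per_year}x")
```

Under these assumptions the yearly volume grows from 120 million readings to roughly 350 billion, a factor of 2,920.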
The Big Data Conundrum
The percentage of available data an enterprise can analyze is
decreasing
This means enterprises are getting “more naive” over time
Data AVAILABLE to
an organization
Data an organization
can PROCESS
The Four V’s
Volume
Use greater amounts of data
Variety
Use more types of data
Velocity
Use data more quickly
Veracity
Use uncertain data
Big Data is All Data and All Paradigms
Transactional & Application Data
• Volume
• Structured
• Throughput
Machine Data
• Velocity
• Structured
• Ingestion
Social Data
• Variety
• Unstructured
• Veracity
Enterprise Content
• Variety
• Unstructured
• Volume
PARADIGM SHIFT
How is Big Data transforming the way organizations analyze information and generate actionable insights?
Paradigm shifts enabled by big data
Leverage more of the data being captured
TRADITIONAL APPROACH
Analyze small subsets of all available information
BIG DATA APPROACH
Analyze all available information
Reduce effort required to leverage data
TRADITIONAL APPROACH
Small amount of carefully organized information
Carefully cleanse information before any analysis
BIG DATA APPROACH
Large amount of messy information
Analyze information as is, cleanse as needed
Data leads the way – and sometimes correlations are good enough
TRADITIONAL APPROACH
Hypothesis, Question, Data, Answer
Start with hypothesis and test against selected data
BIG DATA APPROACH
Data, Exploration, Correlation, Insight
Explore all data and identify correlations
Leverage data as it is captured
TRADITIONAL APPROACH
Data, Repository, Analysis, Insight
Analyze data after it’s been processed and landed in a warehouse or mart
BIG DATA APPROACH
Data, Analysis, Insight
Analyze data in motion as it’s generated, in real-time
APACHE HADOOP
It’s easy to forget just how “big” the data really is!
Datasets are vast
Facebook daily logs ~ 60 TB
1,000 genomes project ~ 200 TB
Google web index ~ 10+ PB
Storage is cheap
Cost of a commodity 1TB drive ~ $50
A terabyte is still a lot of data!
Time to read 1TB from a single disk:
~ 6 hours @ 50 MB/second !!
As data gets big, traditional approaches no longer work
Distributed systems are the only way to scale
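The "~ 6 hours" estimate, and the reason distribution helps, can be reproduced with a short calculation. This is a sketch under simplifying assumptions: decimal units (1 TB = 10^12 bytes), a sustained 50 MB/second per disk, and a hypothetical cluster of 100 disks that read their share of the data perfectly in parallel.

```python
TB = 10**12  # bytes, decimal terabyte (assumption)
MB = 10**6   # bytes, decimal megabyte (assumption)

def read_time_seconds(size_bytes, throughput_bytes_per_s, disks=1):
    """Time to scan the data when it is spread evenly over `disks` drives."""
    return size_bytes / (throughput_bytes_per_s * disks)

single = read_time_seconds(1 * TB, 50 * MB)               # one disk
parallel = read_time_seconds(1 * TB, 50 * MB, disks=100)  # hypothetical 100 disks

print(f"single disk: {single / 3600:.1f} hours")
print(f"100 disks:   {parallel / 60:.1f} minutes")
```

A single disk needs 20,000 seconds (about 5.6 hours), while the idealized 100-disk case needs 200 seconds. Real clusters fall short of this ideal because of coordination and network overhead.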
What is Hadoop?
Apache Hadoop = free, open source framework for data-intensive
applications
– Inspired by Google technologies (MapReduce, GFS)
– Well-suited to batch-oriented, read-intensive applications
– Originally built to address scalability problems of Nutch, an open source Web search
technology
Enables applications to work with thousands of nodes and petabytes of
data in a highly parallel, cost-effective manner
– CPU + disks of commodity box = Hadoop “node”
– Boxes can be combined into clusters
– New nodes can be added as needed without changing
• Data formats
• How data is loaded
• How jobs are written
How files are stored: HDFS
• Key ideas:
• Divide big files into blocks and store the blocks randomly across the cluster
• Provide API to ask: where are the pieces of this file?
• => Programs can be shipped to nodes for parallel distributed processing
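The key ideas above can be illustrated with a toy model. This is not the HDFS API: the function names, the tiny block size, and the uniformly random placement are assumptions made for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
import random

def split_into_blocks(data, block_size):
    """Divide a logical file (bytes) into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes chosen at random."""
    rng = random.Random(seed)
    return {block_id: rng.sample(nodes, replication) for block_id in range(num_blocks)}

def where_is(placement, block_id):
    """The 'where are the pieces of this file?' question from the slide."""
    return placement[block_id]

data = b"x" * 1000                          # toy logical file
blocks = split_into_blocks(data, 256)       # 4 blocks: 256 + 256 + 256 + 232 bytes
nodes = [f"node{i}" for i in range(1, 6)]   # hypothetical 5-node cluster
placement = place_blocks(len(blocks), nodes)

for block_id in range(len(blocks)):
    print(f"block {block_id} -> {where_is(placement, block_id)}")
```

Because the placement map tells a scheduler which nodes hold each block, a program can be sent to one of those nodes instead of moving the block over the network.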
[Figure: a logical file is divided into blocks 1 to 4; each block is stored on several nodes of the cluster]
HDFS stores data across multiple nodes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
HDFS assumes nodes will fail, so it achieves reliability by replicating data
across multiple nodes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
How Files are Processed: MapReduce
• Common pattern in data processing: apply a function, then aggregate
grep "World Cup" *.txt | wc -l
• User simply writes two pieces of code: “mapper” and “reducer”
• Mapper code executes on every split of every file
• Reducer consumes/aggregates mapper outputs
• The Hadoop MR framework takes care of the rest (resource
allocation, scheduling, coordination, temporary storage of intermediate
results, storage of final result on HDFS)
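The grep and wc pipeline maps onto the two roles as follows. This is a single-process sketch, not Hadoop code: the function names and the sample splits are made up for illustration, and the framework's scheduling and storage steps are left out.

```python
def mapper(split_lines, pattern="World Cup"):
    """Plays the role of grep on one split: count the lines containing the pattern."""
    return sum(1 for line in split_lines if pattern in line)

def reducer(partial_counts):
    """Plays the role of wc -l: aggregate the per-split counts."""
    return sum(partial_counts)

# Two hypothetical splits of the input files.
splits = [
    ["World Cup 2010 final", "unrelated line"],
    ["another World Cup story", "World Cup again", "nothing here"],
]

total = reducer(mapper(split) for split in splits)
print(total)
```

Each mapper call depends only on its own split, which is what allows the framework to run the mappers on different nodes.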
[Figure: a logical file is divided into splits 1 to 3 across the cluster; one Map task runs per split, and a Reduce task combines the map outputs into the result]
Logical MapReduce Example: Word Count
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

Content of Input Documents
Document 1: Hello World Bye World
Document 2: Hello IBM

Map 1 emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
Map 2 emits:
< Hello, 1>
< IBM, 1>
Reduce (final output):
< Bye, 1>
< IBM, 1>
< Hello, 2>
< World, 2>
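The word count example can be run end to end in a few lines. The sketch below imitates the map, shuffle, and reduce phases in one process; the function names are illustrative, and the shuffle step stands in for the grouping that the Hadoop framework performs between the two phases.

```python
from collections import defaultdict

def map_words(doc_name, contents):
    """Map: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in contents.split()]

def shuffle(pairs):
    """Group the intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(word, counts):
    """Reduce: sum the counts emitted for one word."""
    return word, sum(counts)

documents = {"doc1": "Hello World Bye World", "doc2": "Hello IBM"}

intermediate = [pair for name, text in documents.items()
                for pair in map_words(name, text)]
result = dict(reduce_counts(word, counts)
              for word, counts in shuffle(intermediate).items())
print(result)
```

The result matches the final output on the slide: Hello and World are counted twice, Bye and IBM once.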
WordCount
So What Does This Result In?
Easy To Scale
Fault Tolerant and Self-Healing
Data Agnostic
Extremely Flexible
BUT you need programming skills
A SIMPLE EXAMPLE OF
TEXT ANALYSIS
From a bachelor's thesis in Business Informatics, FH Brandenburg
Use Case: IBM Quarterly Reports
Goal: solve a big data text analysis problem without
expert help or special training
Environment:
–IBM POWER 7R2 Server
–RHEL 6.2
–IBM InfoSphere BigInsights 2.0
Procedure
–Load press releases with a web crawler
–Initial processing in BigSheets
–Develop text analysis scripts
–Apply the scripts
BigInsights Enterprise Edition
[Figure: BigInsights Enterprise Edition architecture; each component is marked as either open source or IBM]
Analytics and discovery
• Text processing engine and library
• BigSheets
• Accelerator for social data analysis
• Accelerator for machine data analysis
• “Apps”: Web Crawler, Boardreader, Distrib file copy, DB import, DB export, Ad hoc query, Machine learning, Data processing, ...
Infrastructure
• Integrated installer, Text compression, Enhanced security, Flexible scheduler, Indexing
• Adaptive MapReduce, MapReduce, HDFS, GPFS (EAP)
• Jaql, Pig, Hive, HBase, HCatalog, Lucene, Oozie, ZooKeeper
Connectivity and Integration, with optional IBM and partner offerings
• Flume, Sqoop, JDBC
• Streams, DB2, Netezza, R, Guardium, DataStage, Cognos BI, Data Explorer
Administrative and development tools
Web console
• Monitor cluster health, jobs, etc.
• Add / remove nodes
• Start / stop services
• Inspect job status
• Inspect workflow status
• Deploy applications
• Launch apps / jobs
• Work with distrib file system
• Work with spreadsheet interface
• Support REST-based API
• ...
Eclipse tools
• Text analytics
• MapReduce programming
• Jaql, Hive, Pig development
• BigSheets plug-in development
• Oozie workflow generation
BigInsights and Text Analytics
Distills structured info from
unstructured text
– Sentiment analysis
– Consumer behavior
– Illegal or suspicious activities
–…
Parses text and detects
meaning with annotators
Understands the context in
which the text is analyzed
Features pre-built extractors
for names, addresses, phone
numbers, etc.
– Built-in support for English, Spanish,
French, German, Portuguese, Dutch,
Japanese, Chinese
Unstructured text (document, email, etc)
Football World Cup 2010, one team
distinguished themselves well, losing to the
eventual champions 1-0 in the Final. Early in
the second half, Netherlands’ striker, Arjen
Robben, had a breakaway, but the keeper for
Spain, Iker Casillas made the save. Winger
Andres Iniesta scored for Spain for the win.
Classification and Insight
Web Crawler
Web crawler is intuitive to use
Time-consuming, depending on the broadband connection
–Runtime of more than 3 days
Use Case: Initial Processing in BigSheets
The web crawler returned more than 17,000 press releases
After filtering, only 65 quarterly reports remained
–Within the workbook created, the first step was to extract all HTML pages
that contain the terms "quarter" and "results".
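The filter itself is simple, and what it does can be sketched outside BigSheets. The thesis used a BigSheets workbook, not Python; the function name, the sample pages, and the case-insensitive matching below are assumptions made for illustration.

```python
def mentions_quarter_and_results(page_text):
    """Keep a page only if it contains both terms (case-insensitive: an assumption)."""
    text = page_text.lower()
    return "quarter" in text and "results" in text

# Hypothetical crawled pages, keyed by a made-up file name.
pages = {
    "pr-001.html": "IBM Reports 2013 Fourth-Quarter and Full-Year Results",
    "pr-002.html": "IBM Opens New Research Lab",
    "pr-003.html": "Quarterly dividend declared",
}

candidates = [name for name, text in pages.items()
              if mentions_quarter_and_results(text)]
print(candidates)
```

A keyword filter like this is coarse, which is consistent with the large drop from the crawled press releases to the quarterly reports that remained.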
Text Analytics Tooling
AQL Editor
Result Viewer
Runtime Explain
Use Case: Developing an AQL Text Analysis Script
create view content as extract
regex /Start Whitespace .* End Whitespace/
on D.text as text
from Document D;
Developing the AQL Text Analysis Script
Use Case: Developing the Text Analysis Script
A further 8 views were needed to extract revenue by region, year, and quarter.
Applying the Text Analysis Scripts
America?
Q4?
Note: the information is not always complete!
Research into causes vs. effects?
IBM WATSON
IBM Watson answers a grand challenge
Can we design a computing system that rivals a human’s ability to answer
questions posed in natural language, interpreting meaning and context and
retrieving, analyzing and understanding vast amounts of information in real-time?
2011: Taking on Jeopardy!
Chess
– A finite, mathematically well-defined search space
– Large but limited number of moves and states
– Everything explicit, unambiguous mathematical rules
Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways
to express the same meaning
Keyword search
In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
In May, Craig arrived in India after he celebrated his anniversary in Portugal.
Keyword matching between the two sentences:
• In May / In May
• celebrated / celebrated
• anniversary / anniversary
• Portugal / in Portugal
• arrival in / arrived in
• India / India
• explorer / Craig
• 1898, 400th (no counterpart)
Finding Deeper Evidence
In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.
• Search Far and Wide
• Explore many hypotheses
• Find & judge evidence
• Many inference algorithms
Evidence linking the two sentences:
• Temporal Reasoning: May 1898, 400th anniversary / 27th May 1498
• Statistical Paraphrasing: arrival in / landed in
• GeoSpatial Reasoning: India / Kappad Beach
• explorer / Vasco da Gama
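The temporal reasoning step can be shown in miniature: a 400th anniversary in 1898 points to an event in 1498, which is the year the evidence sentence mentions. The sketch below is a toy check with regular expressions, not Watson's DeepQA pipeline; the function names and patterns are assumptions made for illustration.

```python
import re

def anniversary_origin_year(question):
    """Subtract the anniversary number from the year mentioned in the question."""
    year = int(re.search(r"\b(1\d{3}|20\d{2})\b", question).group(1))
    nth = int(re.search(r"\b(\d+)(?:st|nd|rd|th) anniversary", question).group(1))
    return year - nth

def years_in(passage):
    """All four-digit numbers in the passage, read as years."""
    return [int(y) for y in re.findall(r"\b\d{4}\b", passage)]

question = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
passage = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

origin = anniversary_origin_year(question)
supported = origin in years_in(passage)
print(origin, supported)
```

Keyword matching alone cannot make this connection, because 1898 and 1498 share no surface form. The evidence only lines up after the subtraction.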
Watson won Jeopardy, but …
THE AMERICAN DREAM
Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them
Correct answer: the People. Watson's answer: No One
MILESTONES
In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean"
Correct answer: Apollo 11 moon landing. Watson's answer: the Big Bang
FATHERLY NICKNAMES
This Frenchman was "The Father of Bacteriology"
Correct answer: Louis Pasteur. Watson's answer: How Tasty Was My Little Frenchman
Watson Workload Optimized System in 2011
• 90 x IBM Power 750¹ servers
• 2880 POWER7 cores
• POWER7 3.55 GHz chip
• 500 GB per sec on-chip bandwidth
• 10 Gb Ethernet network
• 16 Terabytes of memory
• 20 Terabytes of disk storage
• Can operate at 80 Teraflops
• Runs IBM DeepQA software
• Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components
• SUSE Linux performance-optimized to exploit POWER7 systems
• 10 racks include servers, networking, shared disk system, cluster controllers
¹ Note that the Power 750 featuring POWER7 is a commercially available
server that runs AIX, IBM i and Linux and has been in market since Feb 2010
What’s for Watson?
Healthcare and life sciences
• Diagnostic Assistance
• Evidence-based, Collaborative Medicine
Technical support: help-desk, call centers
Enterprise knowledge management and business intelligence
Government citizen services
“In healthcare, we talk about turning data into knowledge. That’s really what Watson does.”
Joe Jasinski, Program Director, IBM Healthcare and Life Sciences Research
BIG DATA IS NOT JUST HADOOP
Without analytics, big data is just a sack full of data
MYTH: Big data is only about MORE data
MYTH: Big Data = Hadoop ... done
MYTH: Big data replaces everything that exists, death to the RDBMS, and no governance at all
MYTH: NoSQL = no SQL ... never
MYTH: Big data is unstructured data and only good for sentiment analysis
How are leading companies transforming their data and analytics environment?
Big Data ≠ Hadoop
“There’s a belief that if you want big data, you need to go out and buy Hadoop
and then you’re pretty much set. People shouldn’t get ideas about turning off
their relational systems and replacing them with Hadoop…
As we start thinking about big data from the perspective of business needs,
we’re realizing that Hadoop isn’t always the best tool for everything we need to
do, and that using the wrong tool can sometimes be painful.”
Ken Rudin
Head of Analytics at Facebook
Big Data is about more than just Hadoop …
Data may be structured, un-structured, static, in-flight (or all of above)
Data at rest
Huge volumes of data on disk
Structured or semi-structured
May or may not have schemas
Too large for traditional tools
Need to process in place
Data in Motion
In-flight, frequently not stored
Tremendous velocity, high bandwidth
Diverse data sources
Frequently unstructured, semi-structured
Ultra low-latency processing required
InfoSphere Streams delivers analytics for data in-motion
• Scale-out architecture for massive linear scalability
• Sophisticated analytics with pre-built toolkits & accelerators
• Comprehensive development tools to build applications with minimal learning
Characteristics: millions of events per second, microsecond latency, powerful analytics, real time delivery
Use cases: ICU monitoring, algorithmic trading, cyber security, environment monitoring, government / law enforcement, telco churn prediction, smart grid
Traditional / non-traditional data sources: video, audio, networks, social media, etc.
New Architecture to Leverage All Data and Analytics
[Figure: zone architecture, summarized below]
Data sources: Data in Motion, Data at Rest, Data in Many Forms
Information Ingestion and Operational Information: Stream Processing, Data Integration, Master Data
Real-time Analytics (Streams): Video/Audio, Network/Sensor, Entity Analytics, Predictive
Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
Consumers: Intelligence Analysis, Decision Management, BI and Predictive Analytics, Navigation and Discovery
Information Governance, Security and Business Continuity
Big Data Landing zone eco-system: Watson Foundations
Data Types: Machine and sensor data, Image and video, Enterprise content, Transaction and application data, Social data, Third-party data
Zones: Real-time processing & analytics; Operational systems; Exploration, landing and archive; Trusted data; Deep analytics & modeling; Reporting & interactive analysis; Information Integration & Governance
Actionable Insight: Decision management; Predictive analytics and modeling; Reporting, analysis, content analytics; Discovery and exploration
1 More than Hadoop
• Greater resiliency and recoverability
• Advanced workload management & multi-tenancy
• Enhanced, flexible storage management (GPFS)
• Enhanced data access (BigSQL, Search)
• Analytics accelerators & visualization
• Enterprise-ready security framework
2 Data in Motion
• Enterprise class stream processing & analytics
3 Analytics Everywhere
• Richest set of analytics capabilities
• Ability to analyze data in place
4 Governance Everywhere
• Complete integration & governance capabilities
• Ability to govern all data wherever it is
5 Complete Portfolio
• End-to-end capabilities to address all needs
• Ability to grow and address future needs
• Remains open to work with existing investments
Why SQL on Hadoop?
Hadoop stores large volumes and varieties of data
SQL gets information and insight out of Hadoop
SQL leverages existing IT skills, resulting in quicker time to value and lower cost
SQL on Hadoop and Hive
• Hadoop can process data of any kind (as long as it's splittable, etc)
• A very common scenario:
• Tabular data
• Programs that “query” the data
• Java Hadoop APIs are the wrong tool for this
• Too low level, steep learning curve
• Require strong programming expertise
• Universally accepted solution: SQL
• Enter Hive ...
1. Impose relational structure on plain files
2. Translate SELECT statements to MapReduce jobs
3. Hide all the low level details
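The three steps can be illustrated with a toy aggregation. The sketch shows how a query such as SELECT region, SUM(revenue) FROM sales GROUP BY region could be expressed as a map phase and a reduce phase over plain text lines. It is not Hive's actual query planner; the table, the schema, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Step 1: impose relational structure on plain lines (hypothetical schema).
SCHEMA = ("region", "quarter", "revenue")

def parse(line):
    """Turn one comma-separated line into a row with named, typed columns."""
    row = dict(zip(SCHEMA, line.split(",")))
    row["revenue"] = float(row["revenue"])
    return row

# Step 2: the SELECT ... GROUP BY becomes a map phase and a reduce phase.
def map_phase(lines):
    """Emit (group key, value) pairs: one (region, revenue) per input row."""
    for line in lines:
        row = parse(line)
        yield row["region"], row["revenue"]

def reduce_phase(pairs):
    """Sum the values per key, which implements SUM(revenue) per region."""
    totals = defaultdict(float)
    for region, revenue in pairs:
        totals[region] += revenue
    return dict(totals)

lines = ["Americas,Q1,10.5", "EMEA,Q1,8.0", "Americas,Q2,11.5"]
totals = reduce_phase(map_phase(lines))
print(totals)
```

Step 3, hiding the low-level details, corresponds to the user writing only the SQL statement while code like the above is generated and scheduled behind the scenes.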
Big SQL 3.0
Comprehensive SQL functionality
– IBM SQL/PL support, including…
– Stored procedures (SQL bodied and external)
– Functions (SQL bodied and external)
– IBM Data Server JDBC and ODBC drivers
Leverages advanced IBM SQL compiler/runtime
– High performance native (C++) runtime
Replaces Map/Reduce
– Advanced message passing runtime
– Data flows between nodes without requiring persisting intermediate results
– Continuous running daemons
– Advanced workload management allows resources to remain constrained
– Low latency, high throughput…
[Figure: an SQL-based application connects through the IBM data server client to the Big SQL engine (SQL MPP run-time) in InfoSphere BigInsights, which reads the data sources: CSV, Seq, Parquet, RC, Avro, ORC, JSON, Custom]
Big R
“End-to-end integration of R into IBM BigInsights”
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
• Partitioning of large data (“divide”)
• Parallel cluster execution of pushed down R code (“conquer”)
• All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
• Almost any R package can run in this environment
3. Scalable machine learning
• A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
[Figure: R clients with R packages either pull data (summaries) to the R client or push R functions right on the data; on the cluster, embedded R execution with R packages and the scalable statistics engine work on the data sources]
Why are names difficult?
There are no consistent standards for names. Some countries mandate certain standards but they
differ from country to country, and most countries have no standards.
Names can contain a variety of OPTIONAL information that can make the same name appear very
differently.
Ben Al Haden (Anglo)
Bin Al-Hadin (son of somebody who came from
the city of Hadin)
Bin Al Hadin (son of Hadin)
Bint Ali Hadin
Renato Loffreda Mancinelli = Renato Mancinelli <> Renato Loffreda
Using the anglo rules
IBM InfoSphere Identity Insight Solutions
Commercially available Identity Analytics and
Relationship Detection software
Identity Insight 3 Key Functionalities:
– Who is who? No matter how hard they try to hide
– Who knows who? The infamous hiding behind the innocuous
– Who does what? Alerts you when bad guys do bad things
Entity Analytics is a methodical process of
detecting like and related entities across large,
sparse, and disparate collections of data, that is
both new and old, internal and external, using
advanced techniques to establish connections that
are not obvious.
BIG DATA ANALYTICS EXAMPLES
Predictive Maintenance at Union Pacific
Predictive analytics help Union Pacific predict certain derailments days or even weeks
before they are likely to occur. Using thermometers and acoustic and visual sensors on the
underside of each of its rail carriages, the company can detect and analyse imminent problems
with tracks and wheels. To transmit all the data, it has deployed a fibre optic
communications network throughout its rail system. Although a train derailment does not
have to be a large accident, small errors can result in vast delays, and with 3,350 trains
operational on any given day this can become very expensive.
Smarter Farming
Claas agricultural machinery:
Agricultural machinery manufacturers are meanwhile working on networking machines and data
and on data mining strategies. Soil data, yield data, consumption data, weather data:
all of it becomes the raw material of a comprehensive expert system. Experts call this
Agriculture 4.0 (Landwirtschaft 4.0), a parallel to Industry 4.0, in which machines and
workpieces communicate with each other. Claas calls it 365FarmNet and is successfully
bringing its competitors onto this first universal management platform as well.
Retail
Luxottica applies statistical methods to a behavioral model
to segment and score customers across identities.
• 10% improvement in marketing effectiveness
• 100 million customers can be down-selected to the highest value individuals
• Target individual customers based on unique preferences and histories
Solution Components
• Customer Intelligence Appliance Software
• Twin Fin 12 PDA
• IBM Campaign
• IBM Enterprise Marketing Operations
Business Challenge: Luxottica, the eyewear giant with nearly 100 million customers in
eight house brands on the company’s numerous websites and in retail stores,
generates massive amounts of data, the majority of which was housed and managed
by outside data and marketing vendors. Lacking a holistic understanding and view of
the customers, marketers struggled to nurture customer relationships, seize cross-sell
and up-sell opportunities, personalize campaigns and acquire new customers during
the shopping process.
The Smarter Solution: After a successful proof of concept, the company is deploying
an advanced Customer Intelligence analytics appliance, built on a high-performance
platform that integrates online and physical customer data from multiple sources. The
resulting 360-degree omni-channel customer view will not only help the retailer identify
its most profitable sales channels, but also segment, track and score customers down
to the persona level based on thousands of behavioral attributes, and refine and
personalize marketing campaigns.
“The results of the POC were eye-opening, revealing unprecedented and actionable
insight into omni-channel customers we had never seen or analyzed before.”
—Chief Digital Officer
Optimizing capital investments based on double digit Petabyte analysis
Model the weather to optimize placement of turbines, maximizing power generation
for their client and longevity (warranty optimization)
Needed more data in richer models (adding hundreds of variables)
Perspective: If you were to replay the Vestas Wind library, you would be sitting
down to watch 70 years of TV in HD
http://www.youtube.com/watch?v=Z4xkA4Qye5I
Neonatal Care
http://www.youtube.com/watch?v=cc8UV3Tcsfg
InfoSphere Streams
Low Latency Analytics
for streaming data
• Multiple devices are attached to the baby or humidicrib
• Medical devices output via serial port in a range of formats
• Indicative readings are recorded on paper every 30 or 60 minutes
• Cost of care per baby is approx $100-150K not including morbidity related care
We eat more sweets when it rains
Weather-dependent sales forecasts for a large bakery
• Pinpoint weather forecast for each branch
• Buying behavior
• Data mining
• Highly precise sales forecasting models
• Self-learning control loop
• More precise production planning
• Improved product and service availability
• Returns: -30%
• Saves 2-3 working hours per week and branch
• Waste avoidance
• Environmental protection
Optimizing Deployment Planning
Sixt car rental
• Locations A, B, C, D
• "No-show" customers make deployment planning harder
• Inputs: vehicle bookings, vehicle availability, customer behavior
• Modeler: no-show prediction
• Overbooking for better utilization
• Optimized deployment planning
• Avoid idle time
• No changes to processes or infrastructure
Prevention for Repeat and Intensive Offenders
Criminalistic-Criminological Research Unit of the Hesse State Criminal Police Office (Hessisches Landeskriminalamt)
• Full survey of the biographies of repeat and intensive offenders
• Cluster analysis
• Derivation of suitable measures
• Actionable knowledge
• Prevention
The 5 Key Use Cases
Big Data Exploration
Find, visualize, understand all big data to improve decision making
Enhanced 360° View of the Customer
Extend existing customer views by incorporating additional internal and external information sources
Security/Intelligence Extension
Lower risk, detect fraud and monitor cyber security in real-time
Operations Analysis
Analyze a variety of machine data for improved business results
Data Warehouse Augmentation
Integrate big data and data warehouse capabilities to increase operational efficiency
We can take the same use cases further with big data solutions
Financial Services
Fraud detection
Risk management
360° View of the Customer
Transportation
Weather and traffic
impact on logistics and
fuel consumption
Health & Life Sciences
Epidemic early warning
system
ICU monitoring
Remote healthcare monitoring
Telecommunications
CDR processing
Churn prediction
Geomapping / marketing
Network monitoring
Utilities
Weather impact analysis on
power generation
Transmission monitoring
Smart grid management
IT
Transition log analysis
for multiple
transactional systems
Cybersecurity
Retail
360° View of the Customer
Click-stream analysis
Real-time promotions
Law Enforcement
Real-time multimodal surveillance
Situational awareness
Cyber security detection
WHY INFRASTRUCTURE MATTERS
Access Matters
To get new levels of visibility into customers and operations
Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
Speed Matters
To accelerate insights in real-time at the point of impact
Infrastructure must build intelligence into operational events and transactions.
Availability Matters
To consistently deliver insights to the people and processes that need them
Infrastructure must maximize the availability of information and insights at the point of impact.
Challenges for Big Data Analytics Projects
BETWEEN WISH AND REALITY
QUESTIONS?