Solutions That Work - The Comprehensive Data Dictionary

Transcrição

Solutions That Work - The Comprehensive Data Dictionary
Solutions that Work
The Comprehensive Data Dictionary
Antonio Amorin
Agenda
•
•
•
•
•
•
•
Introductions
Case Study Overview
Assessment
Solution
Implementation
Deliverables
Utilization
© 2012
Biography
• Antonio Amorin
President, Data Innovations, Inc.
– Graduated from Illinois State University with a BA in
Computer Science and a minor in Economics
– Twenty years of data modeling experience
– Twelve years of data profiling experience
– 2012 ISU CAST Academy of Achievement Inductee
– Delivered data modeling and data profiling solutions
to numerous clients in the Midwest and East Coast
– Founded Data Innovations, Inc. in 2002
– Presented at CA World, Enterprise Data World,
DAMA Chapter Meetings, Modeling User Groups,
webcasts, and at client sites
© 2012
Data Innovations, Inc.
•
•
Established in 2002
Professional Services:
–
–
–
–
–
–
•
•
•
Data Modeling
Data Profiling
Data Architecture
Metadata
Database Administration
ETL
CA Service Partner in 2004
CA Commercial Reseller in
2006
CA Enterprise Solution
Provider in 2007
© 2012
Data Innovations, Inc.
•
Projects
– Data Warehousing
– Metadata Repository
– Customer Master Data
Management
– Product Master Data Management
– Outsourcing
– Data Governance
– Data Quality
– Application Development
– EDI Conversions
– Data Integration
– Software Integration
© 2012
Case Study Overview
•
•
•
•
•
•
•
•
•
Brokerage Firm
Outsourcing Mortgage
Business
Legacy Mortgage Application
Mortgage Components
Rewritten
Legacy Data Retained
Legacy Data Redeployed
Remote Development Teams
Local Maintenance Team
Limited Timeframe
© 2012
Case Study Overview
•
Legacy Mortgage Application
– Developed in the mid 90’s
– Oracle Database
•
•
•
•
•
•
•
Over 1,100 tables with more than
24,000 attributes
No primary keys
No referential integrity (foreign keys)
Naming standard identifies parent-child
relationships
Over fifteen years worth of data
Undocumented
UNIX platform
– User Interface
– Hyperion Reports
– Support
•
•
No original developers
Maintenance team
© 2012
Case Study Overview
•
Requirements
– Legacy Oracle database will be
stored in original format for audit
purposes
– Specific data will be extracted
from the legacy Oracle database
and loaded into a different
RDBMS
– The legacy data will be utilized for
data warehousing, master data
management, and other business
intelligence purposes
– Specific screens and functionality
will be developed to display the
legacy data for active mortgages
– A data model documenting all of
the tables and columns, as well as
the relationships between the
tables
© 2012
Assessment
•
Created Data Model
–
–
–
•
Profiled Data
–
–
–
•
Profiled production data from the legacy
database
Performed a data quality assessment
Validated parent-child relationships between
tables
Reviewed Interface
–
–
•
Reverse engineered the database into a
combined logical-physical data model
Created reports to identify all of the tables and
columns
Analyzed the data model and reports to
determine the quality of the metadata
Reviewed the legacy screens for the mission
critical data
Discussed the use of the mission critical data
with the business analysts
Development Team
–
Met with the development team to understand
their data requirements
© 2012
Solution
•
The Comprehensive Data
Dictionary
–
–
–
–
–
Reverse engineer the legacy Oracle
database into a combined logicalphysical data model
Utilize the descriptions and comments
at the column level to capture the
detailed metadata identified during the
analysis
Derive a logical data model from the
combined logical-physical data model
Create the parent-child relationships
between the tables and use the
relationship description to capture the
analysis identifying the columns
involved in the relationship
Generate the custom data dictionary
report in PDF format
© 2012
Solution
•
Process Overview
–
–
–
–
–
–
–
–
–
–
Identify the tables to be analyzed based on
the development teams’ needs
Create waves of 200-250 tables in each wave
Profile the tables identified for the current
wave
Generate spreadsheets from the profiling
results to capture the inferred metadata and
the analysis results
Perform a data quality assessment against
each column to identify data anomalies and
other data content issues
Perform basic analysis to identify the
descriptions, data rules, and business rules
Perform detailed analysis for the data quality
problems identified
Perform cross-table analysis to identify the
parent tables for each table in the wave and
update the logical data model
Using the .csv file from the Bulk Editor,
capture the analysis results in the combined
logical-physical data model
Generate custom data dictionary report in
PDF format
© 2012
Solution
•
Data Profiling
– Perform column level analysis
against all of the tables identified
for the current wave
– Generate spreadsheets containing
the inferred metadata
•
•
•
•
•
•
•
•
Cardinality
Range
Mode
Sparse
Null Count
Value Frequencies
Pattern Frequencies
Length Frequencies
– Leverage the profiling results for
analysis as necessary
– Archive the profiling results for
later reference
© 2012
Solution
•
Data Quality Assessment
– Analyze the profiling results to
identify anomalies in the data
content
– Validate the metadata reflects the
data content properly
– Update the analysis summary to
reflect the results of the
assessment
•
Basic Analysis
– Document the definition for the
column
– Identify the business rules
– Identify the data rules
– Indicate if the content is a code,
indicator, flag, or type
© 2012
Solution
• Data Profiling
– Cross-Table Analysis
• Create data sources for
each table and include all
of the potential parent
tables
• Execute the primaryforeign key profiler
• Validate each relationship
at a column level by
creating the relationship
manually
• Identify any orphaned
rows and capture them in
the analysis spreadsheet
© 2012
Solution
© 2012
Solution
•
Update the Logical Only Data
Model
– Create the parent-child
relationships for each table
– Update the relationship
descriptions with the analysis
results
– Create a subject area for each
table
– Add all of the parent tables to the
subject area
– Generate data model picture to
include in the data dictionary
report
© 2012
Solution
• Upload Analysis Results
– Utilize the Bulk Editor to
create a .csv file from the
logical-physical data model
– Cut and paste the analysis
results into the .csv file
– Import the .csv file back
into the logical-physical
data model
© 2012
Solution
• Report Deliverables
– Create a subject area for each
table in the wave in the logical
data model
– Add the table and the parent
tables to the subject area
– Generate a picture of the subject
area’s diagram
– Add the generated picture to the
customized data dictionary report
– Generate the customized data
dictionary report for the specific
table in the logical-physical data
model
– Save the report in PDF format and
move to the SharePoint site
© 2012
Implementation
• Data Modeling
– Install CA ERwin Data
Modeler Workgroup Edition
– Create the workgroup
repository
– Initialize repository
– Define users to repository
• Data Profiling
– Install profiling software
– Create all connections to
the data sources
– Create projects and related
objects for profiling effort
© 2012
Implementation
•
Project plan
– Create detailed project plan to
track each task for each wave of
tables for each resource
– Assign specific tables to
resources
– Integrate the project tracking into
the analysis workflow
•
Training
– Software training for the data
modelers, database
administrators, and other technical
resources
– Analysis training for business and
technical resources
© 2012
Deliverables
© 2012
Deliverables
© 2012
Deliverables
© 2012
Deliverables
© 2012
Utilization
• Application Development
– Develop screens and
programs to capture and
display the data content
• Data Warehousing
– Identify data to be sourced
using the data dictionary
– Determine transformation
rules to consolidate data
content and correct data
quality problems
• Reporting
© 2012
Questions
© 2012
Thank You!
• Contact us
– www.dataprofilers.com
– 1-888-GET-ER1S (438-3717)
– [email protected]
• Antonio Amorin
– [email protected]
© 2012

Documentos relacionados