Solutions That Work - The Comprehensive Data Dictionary
Transcrição
Solutions That Work - The Comprehensive Data Dictionary
Solutions that Work The Comprehensive Data Dictionary Antonio Amorin Agenda • • • • • • • Introductions Case Study Overview Assessment Solution Implementation Deliverables Utilization © 2012 Biography • Antonio Amorin President, Data Innovations, Inc. – Graduated from Illinois State University with a BA in Computer Science and a minor in Economics – Twenty years of data modeling experience – Twelve years of data profiling experience – 2012 ISU CAST Academy of Achievement Inductee – Delivered data modeling and data profiling solutions to numerous clients in the Midwest and East Coast – Founded Data Innovations, Inc. in 2002 – Presented at CA World, Enterprise Data World, DAMA Chapter Meetings, Modeling User Groups, webcasts, and at client sites © 2012 Data Innovations, Inc. • • Established in 2002 Professional Services: – – – – – – • • • Data Modeling Data Profiling Data Architecture Metadata Database Administration ETL CA Service Partner in 2004 CA Commercial Reseller in 2006 CA Enterprise Solution Provider in 2007 © 2012 Data Innovations, Inc. • Projects – Data Warehousing – Metadata Repository – Customer Master Data Management – Product Master Data Management – Outsourcing – Data Governance – Data Quality – Application Development – EDI Conversions – Data Integration – Software Integration © 2012 Case Study Overview • • • • • • • • • Brokerage Firm Outsourcing Mortgage Business Legacy Mortgage Application Mortgage Components Rewritten Legacy Data Retained Legacy Data Redeployed Remote Development Teams Local Maintenance Team Limited Timeframe © 2012 Case Study Overview • Legacy Mortgage Application – Developed in the mid 90’s – Oracle Database • • • • • • • Over 1,100 tables with more than 24,000 attributes No primary keys No referential integrity (foreign keys) Naming standard identifies parent-child relationships Over fifteen years worth of data Undocumented UNIX platform – User Interface – Hyperion Reports – Support • • No original developers Maintenance team © 2012 Case Study Overview • Requirements – Legacy Oracle database will be stored in original format for audit purposes – Specific data will be extracted from the legacy Oracle database and loaded into a different RDBMS – The legacy data will be utilized for data warehousing, master data management, and other business intelligence purposes – Specific screens and functionality will be developed to display the legacy data for active mortgages – A data model documenting all of the tables and columns, as well as the relationships between the tables © 2012 Assessment • Created Data Model – – – • Profiled Data – – – • Profiled production data from the legacy database Performed a data quality assessment Validated parent-child relationships between tables Reviewed Interface – – • Reverse engineered the database into a combined logical-physical data model Created reports to identify all of the tables and columns Analyzed the data model and reports to determine the quality of the metadata Reviewed the legacy screens for the mission critical data Discussed the use of the mission critical data with the business analysts Development Team – Met with the development team to understand their data requirements © 2012 Solution • The Comprehensive Data Dictionary – – – – – Reverse engineer the legacy Oracle database into a combined logicalphysical data model Utilize the descriptions and comments at the column level to capture the detailed metadata identified during the analysis Derive a logical data model from the combined logical-physical data model Create the parent-child relationships between the tables and use the relationship description to capture the analysis identifying the columns involved in the relationship Generate the custom data dictionary report in PDF format © 2012 Solution • Process Overview – – – – – – – – – – Identify the tables to be analyzed based on the development teams’ needs Create waves of 200-250 tables in each wave Profile the tables identified for the current wave Generate spreadsheets from the profiling results to capture the inferred metadata and the analysis results Perform a data quality assessment against each column to identify data anomalies and other data content issues Perform basic analysis to identify the descriptions, data rules, and business rules Perform detailed analysis for the data quality problems identified Perform cross-table analysis to identify the parent tables for each table in the wave and update the logical data model Using the .csv file from the Bulk Editor, capture the analysis results in the combined logical-physical data model Generate custom data dictionary report in PDF format © 2012 Solution • Data Profiling – Perform column level analysis against all of the tables identified for the current wave – Generate spreadsheets containing the inferred metadata • • • • • • • • Cardinality Range Mode Sparse Null Count Value Frequencies Pattern Frequencies Length Frequencies – Leverage the profiling results for analysis as necessary – Archive the profiling results for later reference © 2012 Solution • Data Quality Assessment – Analyze the profiling results to identify anomalies in the data content – Validate the metadata reflects the data content properly – Update the analysis summary to reflect the results of the assessment • Basic Analysis – Document the definition for the column – Identify the business rules – Identify the data rules – Indicate if the content is a code, indicator, flag, or type © 2012 Solution • Data Profiling – Cross-Table Analysis • Create data sources for each table and include all of the potential parent tables • Execute the primaryforeign key profiler • Validate each relationship at a column level by creating the relationship manually • Identify any orphaned rows and capture them in the analysis spreadsheet © 2012 Solution © 2012 Solution • Update the Logical Only Data Model – Create the parent-child relationships for each table – Update the relationship descriptions with the analysis results – Create a subject area for each table – Add all of the parent tables to the subject area – Generate data model picture to include in the data dictionary report © 2012 Solution • Upload Analysis Results – Utilize the Bulk Editor to create a .csv file from the logical-physical data model – Cut and paste the analysis results into the .csv file – Import the .csv file back into the logical-physical data model © 2012 Solution • Report Deliverables – Create a subject area for each table in the wave in the logical data model – Add the table and the parent tables to the subject area – Generate a picture of the subject area’s diagram – Add the generated picture to the customized data dictionary report – Generate the customized data dictionary report for the specific table in the logical-physical data model – Save the report in PDF format and move to the SharePoint site © 2012 Implementation • Data Modeling – Install CA ERwin Data Modeler Workgroup Edition – Create the workgroup repository – Initialize repository – Define users to repository • Data Profiling – Install profiling software – Create all connections to the data sources – Create projects and related objects for profiling effort © 2012 Implementation • Project plan – Create detailed project plan to track each task for each wave of tables for each resource – Assign specific tables to resources – Integrate the project tracking into the analysis workflow • Training – Software training for the data modelers, database administrators, and other technical resources – Analysis training for business and technical resources © 2012 Deliverables © 2012 Deliverables © 2012 Deliverables © 2012 Deliverables © 2012 Utilization • Application Development – Develop screens and programs to capture and display the data content • Data Warehousing – Identify data to be sourced using the data dictionary – Determine transformation rules to consolidate data content and correct data quality problems • Reporting © 2012 Questions © 2012 Thank You! • Contact us – www.dataprofilers.com – 1-888-GET-ER1S (438-3717) – [email protected] • Antonio Amorin – [email protected] © 2012