Transcription

TEAMFLY

Building theData WarehouseThird EditionW. H. InmonWiley Computer PublishingJohn Wiley & Sons, Inc.N EW YOR K CH ICH ESTER WEI N H EI M B R ISBAN E SI NGAPOR E TORONTO

Building theData WarehouseThird Edition

Building theData WarehouseThird EditionW. H. InmonWiley Computer PublishingJohn Wiley & Sons, Inc.N EW YOR K CH ICH ESTER WEI N H EI M B R ISBAN E SI NGAPOR E TORONTO

Publisher: Robert IpsenEditor: Robert ElliottDevelopmental Editor: Emilie HermanManaging Editor: John AtkinsText Design & Composition: MacAllister Publishing Services, LLCDesignations used by companies to distinguish their products are often claimed as trademarks. In allinstances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.This book is printed on acid-free paper.Copyright 2002 by W.H. Inmon. All rights reserved.Published by John Wiley & Sons, Inc.Published simultaneously in Canada.No part of this publication may be reproduced, stored in a retrieval system or transmitted in any formor by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except aspermitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the priorwritten permission of the Publisher, or authorization through payment of the appropriate per-copy feeto the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978)750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212)850-6008, E-Mail: PERMREQ @ WILEY.COM.This publication is designed to provide accurate and authoritative information in regard to the subjectmatter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.Library of Congress Cataloging-in-Publication Data:ISBN: 0-471-08130-2Printed in the United States of America.10 9 8 7 6 5 4 3 2 1

To Jeanne Friedman—a friend for all times

C O N T E N TSPreface for the Second EditionxiiiPreface for the Third EditionxivAcknowledgmentsxixAbout the AuthorxxChapter 1Evolution of Decision Support Systems1The Evolution24456The Advent of DASDPC/4GL TechnologyEnter the Extract ProgramThe Spider WebProblems with the Naturally Evolving ArchitectureLack of Data CredibilityProblems with ProductivityFrom Data to InformationA Change in ApproachThe Architected EnvironmentData Integration in the Architected EnvironmentChapter 266912151619Who Is the User?19The Development Life Cycle21Patterns of Hardware Utilization22Setting the Stage for Reengineering23Monitoring the Data Warehouse Environment25Summary28The Data Warehouse Environment31The Structure of the Data Warehouse35Subject Orientation36Day 1-Day n Phenomenon41Granularity43454649The Benefits of GranularityAn Example of GranularityDual Levels of Granularityvii

C O N T E N TSExploration and Data Mining53Living Sample Database53Partitioning as a Design Approach5556Partitioning of Data59Data Warehouse: The Standards Manual64Auditing and the Data Warehouse64Cost JustificationJustifying Your Data Warehouse6566Data Homogeneity/Heterogeneity69Purging Warehouse Data72Reporting and the Architected Environment73The Operational Window of Opportunity74Incorrect Data in the Data Warehouse76Summary77AMFLYChapter 3Structuring Data in the Data WarehouseThe Data Warehouse and Design81Beginning with Operational Data82Data/Process Models and the Architected Environment87The Data Warehouse and Data Models89929498The Data Warehouse Data ModelThe Midlevel Data ModelThe Physical Data ModelTEviiiThe Data Model and Iterative napshots in the Data WarehouseMeta DataManaging Reference Tables in a Data Warehouse113113Cyclicity of Data-The Wrinkle of Time115Complexity of Transformation and Integration118Triggering the Data Warehouse Record122122123123EventsComponents of the SnapshotSome ExamplesProfile Records124Managing Volume126Creating Multiple Profile Records127

CO NTE NTSGoing from the Data Warehouse to the OperationalEnvironment128Direct Access of Data Warehouse Data129Indirect Access of Data Warehouse Data130130132133An Airline Commission Calculation SystemA Retail Personalization SystemCredit ScoringChapter 4Indirect Use of Data Warehouse Data136Star Joins137Supporting the ODS143Summary145Granularity in the Data Warehouse147Raw Estimates148Input to the Planning Process149Data in Overflow?149151Overflow StorageChapter 5ixWhat the Levels of Granularity Will Be155Some Feedback Loop Techniques156Levels of Granularity-Banking Environment158Summary165The Data Warehouse and Technology167Managing Large Amounts of Data167Managing Multiple Media169Index/Monitor Data169Interfaces to Many Technologies170Programmer/Designer Control of Data Placement171Parallel Storage/Management of Data171171Meta Data ManagementLanguage Interface173Efficient Loading of Data173Efficient Index Utilization175Compaction of Data175Compound Keys176Variable-Length Data176Lock Management176

xC O N T E N TSIndex-Only Processing178Fast Restore178Other Technological Features178DBMS Types and the Data Warehouse179Changing DBMS Technology181Multidimensional DBMS and the Data Warehouse182Data Warehousing across Multiple Storage Media188Meta Data in the Data Warehouse Environment189Context and Content192193Three Types of Contextual InformationCapturing and Managing Contextual InformationLooking at the PastChapter 6Refreshing the Data Warehouse195Testing198Summary198The Distributed Data Warehouse201Types of Distributed Data Warehouses202202220221Local and Global Data WarehousesThe Technologically Distributed Data WarehouseThe Independently Evolving Distributed Data WarehouseThe Nature of the Development EffortsCompletely Unrelated WarehousesDistributed Data Warehouse DevelopmentCoordinating Development across Distributed LocationsThe Corporate Data Model-DistributedMeta Data in the Distributed WarehouseChapter 7194195222224226227228232Building the Warehouse on Multiple Levels232Multiple Groups Building the Current Level of DetailDifferent Requirements at Different LevelsOther Types of Detailed DataMeta Data235238239244Multiple Platforms for Common Detail Data244Summary245Executive Information Systems and the Data Warehouse247EIS-The Promise248A Simple Example248Drill-Down Analysis251

CO NTE NTSSupporting the Drill-Down Process253The Data Warehouse as a Basis for EIS254Where to Turn256Event Mapping258Detailed Data and EIS261262Keeping Only Summary Data in the EISChapter 8Chapter 9xiSummary263External/Unstructured Data and the Data Warehouse265External/Unstructured Data in the Data Warehouse268Meta Data and External Data269Storing External/Unstructured Data271Different Components of External/Unstructured Data272Modeling and External/Unstructured Data273Secondary Reports274Archiving External Data275Comparing Internal Data to External Data275Summary276Migration to the Architected Environment277A Migration Plan278The Feedback Loop286Strategic Considerations287Methodology and Migration289A Data-Driven Development Methodology291Data-Driven Methodology293System Development Life Cycles294A Philosophical Observation294Operational Development/DSS Development294Summary295Chapter 10 The Data Warehouse and the Web297Supporting the Ebusiness Environment307Moving Data from the Web to the Data Warehouse307Moving Data from the Data Warehouse to the Web308Web Support309Summary310

XIIC O N T E N TSChapter 11 ERP and the Data Warehouse311ERP Applications Outside the Data Warehouse312Building the Data Warehouse inside the ERP Environment314Feeding the Data Warehouse through ERP and Non-ERPSystems314The ERP-Oriented Corporate Data Warehouse318Summary320Chapter 12 Data Warehouse Design Review ChecklistWhen to Do Design ReviewWho Should Be in the Design Review?What Should the Agenda Be?The ResultsAdministering the ReviewA Typical Data Warehouse Design ossary385Reference397Index407

xiiiPREFACE FOR TH E SECON DIntroductionEDITIONDatabases and database theory have been around for a long time. Early renditions of databases centered around a single database serving every purposeknown to the information processing community—from transaction to batchprocessing to analytical processing. In most cases, the primary focus of theearly database systems was operational—usually transactional—processing. Inrecent years, a more sophisticated notion of the database has emerged—onethat serves operational needs and another that serves informational or analytical needs. To some extent, this more enlightened notion of the database is dueto the advent of PCs, 4GL technology, and the empowerment of the end user.The split of operational and informational databases occurs for many reasons: The data serving operational needs is physically different data from thatserving informational or analytic needs. The supporting technology for operational processing is fundamentally different from the technology used to support informational or analyticalneeds. The user community for operational data is different from the one servedby informational or analytical data. The processing characteristics for the operational environment and theinformational environment are fundamentally different.Because of these reasons (and many more), the modern way to build systems isto separate the operational from the informational or analytical processing anddata.This book is about the analytical [or the decision support systems (DSS)] environment and the structuring of data in that environment. The focus of the bookis on what is termed the “data warehouse” (or “information warehouse”), whichis at the heart of informational, DSS processing.The discussions in this book are geared to the manager and the developer.Where appropriate, some level of discussion will be at the technical level. But,for the most part, the book is about issues and techniques. This book is meantto serve as a guideline for the designer and the developer.xiii

PREFACE FOR TH E TH I RD EDITIONWhen the first edition of Building the Data Warehouse was printed, the database theorists scoffed at the notion of the data warehouse. One theoreticianstated that data warehousing set back the information technology industry 20years. Another stated that the founder of data warehousing should not beallowed to speak in public. And yet another academic proclaimed that datawarehousing was nothing new and that the world of academia had knownabout data warehousing all along although there were no books, no articles, noclasses, no seminars, no conferences, no presentations, no references, nopapers, and no use of the terms or concepts in existence in academia at thattime.When the second edition of the book appeared, the world was mad for anythingof the Internet. In order to be successful it had to be “e” something—e-business,e-commerce, e-tailing, and so forth. One venture capitalist was known to say,“Why do we need a data warehouse when we have the Internet?”But data warehousing has surpassed the database theoreticians who wanted toput all data in a single database. Data warehousing survived the dot.com disaster brought on by the short-sighted venture capitalists. In an age when technology in general is spurned by Wall Street and Main Street, data warehousing hasnever been more alive or stronger. There are conferences, seminars, books,articles, consulting, and the like. But mostly there are companies doing datawarehousing, and making the discovery that, unlike the overhyped New Economy, the data warehouse actually delivers, even though Silicon Valley is still ina state of denial.The third edition of this book heralds a newer and even stronger day for datawarehousing. Today data warehousing is not a theory but a fact of life. Newtechnology is right around the corner to support some of the more exotic needsof a data warehouse. Corporations are running major pieces of their businesson data warehouses. The cost of information has dropped dramatically becauseof data warehouses. Managers at long last have a viable solution to the uglinessof the legacy systems environment. For the first time, a corporate “memory” ofhistorical information is available. Integration of data across the corporation isa real possibility, in most cases for the first time. Corporations are learning howxiv

Preface for the Third Editionxvto go from data to information to competitive advantage. In short, data warehousing has unlocked a world of possibility.One confusing aspect of data warehousing is that it is an architecture, not atechnology. This frustrates the technician and the venture capitalist alikebecause these people want to buy something in a nice clean box. But data warehousing simply does not lend itself to being “boxed up.” The difference betweenan architecture and a technology is like the difference between Santa Fe, NewMexico, and adobe bricks. If you drive the streets of Santa Fe you know you arethere and nowhere else. Each home, each office building, each restaurant has adistinctive look that says “This is Santa Fe.” The look and style that make SantaFe distinctive are the architecture. Now, that architecture is made up of suchthings as adobe bricks and exposed beams. There is a whole art to the makingof adobe bricks and exposed beams. And it is certainly true that you could nothave Santa Fe architecture without having adobe bricks and exposed beams.But adobe bricks and exposed beams by themselves do not make an architecture. They are independent technologies. For example, you