Table of Contents
Title Page
Copyright
About the Authors
Credits
Acknowledgments
Introduction
How this Book is Organized
Who Should Read this Book
How this Book Differs from the First Edition
Chapter 1: Introducing the Kimball Lifecycle
Lifecycle History Lesson
Lifecycle Milestones
Using the Lifecycle Roadmap
Lifecycle Navigation Aids
Lifecycle Vocabulary Primer
Conclusion
Chapter 2: Launching and Managing the Project/Program
Define the Project
Plan the Project
Manage the Project
Manage the Program
Conclusion
Chapter 3: Collecting the Requirements
Overall Approach to Requirements Definition
Prepare for the Interview
Conduct the Interview
Wrap Up the Interview
Review the Interview Results
Prepare and Publish Requirements Deliverables
Prioritize and Agree on Next Steps
Adjustments for Project Level Requirements
Deal with Challenging Interviewees
Conclusion
Chapter 4: Introducing the Technical Architecture
The Value of Architecture
Technical Architecture Overview
Back Room Architecture
Presentation Server Architecture
Front Room Architecture
Infrastructure
Metadata
Security
Conclusion
Chapter 5: Creating the Architecture Plan and Selecting Products
Create the Architecture
Select Products
Manage the Metadata
Secure the System
Create the Infrastructure Map
Install the Hardware and Software
Conclusion
Chapter 6: Introducing Dimensional Modeling
Making the Case for Dimensional Modeling
Dimensional Modeling Primer
Enterprise Data Warehouse Bus Architecture
More on Dimensions
More on Facts
Fables and Falsehoods About Dimensional Modeling
Conclusion
Chapter 7: Designing the Dimensional Model
Modeling Process Overview
Get Organized
Recall the Four-Step Modeling Process
Design the Dimensional Model
Embrace Data Stewardship
Conclusion
Chapter 8: Designing the Physical Database and Planning for Performance
Develop Standards
Develop the Physical Data Model
Build the Development Database
Design Processing Data Stores
Develop the Initial Index Plan
Design the OLAP Database
Build the Test Database
Design Aggregations
Design and Build the Database Instance
Develop the Physical Storage Structure
Conclusion
Chapter 9: Introducing Extract, Transformation, and Load
Round Up the Requirements
The 34 Subsystems of ETL
Extracting Data
Cleaning and Conforming Data
Delivering Data for Presentation
Managing the ETL Environment
Real Time Implications
Conclusion
Chapter 10: Designing and Developing the ETL System
ETL Process Overview
Getting Started
Develop the ETL Plan
Develop One-Time Historic Load Processing
Develop Incremental ETL Processing
Conclusion
Chapter 11: Introducing Business Intelligence Applications
Importance of Business Intelligence Applications
Analytic Cycle for Business Intelligence
Types of Business Intelligence Applications
Navigating Applications via the BI Portal
Conclusion
Chapter 12: Designing and Developing Business Intelligence Applications
Business Intelligence Application Resource Planning
Business Intelligence Application Specification
Business Intelligence Application Development
Business Intelligence Application Maintenance
Conclusion
Chapter 13: Deploying and Supporting the DW/BI System
System Deployment
Documentation and Training
Maintenance and Support
Conclusion
Chapter 14: Expanding the DW/BI System
Manage the Existing Environment
Prepare for Growth and Evolution
Conclusion
Glossary
Index
Advertisement
The Data Warehouse Lifecycle Toolkit, Second Edition
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright ©2008 by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker
Published byWiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-14977-5
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that InternetWebsites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (800) 762-2974, outside the U.S. at (317) 572-3993 or fax (317) 572-4002.
Library of Congress Cataloging-in-Publication Data:
The data warehouse lifecycle toolkit / Ralph Kimball… [et al.]. -- 2nd ed.
p. cm.
Includes index.
ISBN 978-0-470-14977-5 (paper/website)
1. Data warehousing. I. Kimball, Ralph.
QA76.9.D37D38 2007
005.74--dc22
2007040691
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
About the Authors
The authors' professional careers have followed remarkably similar paths. Each author has focused on data warehousing and business intelligence (DW/BI) consulting and education for more than fifteen years. Most worked together at Metaphor Computer Systems, a pioneering decision support vendor, in the 1980s. All the authors are members of the Kimball Group and teach for Kimball University. They contribute regularly to Intelligent Enterprise magazine and other industry publications; most have previously written books in the Toolkit series.
Ralph Kimball founded the Kimball Group. Since the mid 1980s, he has been the DW/BI industry's thought leader on the dimensional approach and trained more than 10,000 IT professionals. Ralph has his Ph.D. in Electrical Engineering from Stanford University.
Margy Ross is President of the Kimball Group. She has focused exclusively on DW/BI since 1982 with an emphasis on business requirements analysis and dimensional modeling. Margy graduated with a BS in Industrial Engineering from Northwestern University.
Warren Thornthwaite began his DW/BI career in 1980. After managing Metaphor's consulting organization, he worked for Stanford University and WebTV. Warren holds a BA in Communications Studies from the University of Michigan and an MBA from the University of Pennsylvania's Wharton School.
Joy Mundy has focused on DW/BI systems since 1992 with stints at Stanford, Web TV, and Microsoft's SQL Server product development organization. Joy graduated from Tufts University with a BA in Economics, and from Stanford University with an MS in Engineering Economic Systems.
Bob Becker has helped clients across a variety of industries with their DW/BI challenges and solutions since 1989, including extensive work with health care organizations. Bob has a BSB in Marketing from the University of Minnesota's School of Business.
Credits
Executive Editor
Robert Elliott
Development Editor
Sara Shlaer
Production Editor
Debra Banninger
Copy Editor
Kim Cofer
Editorial Manager
Mary Beth Wakefield
Production Manager
Tim Tate
Vice President and Executive Group Publisher
Richard Swadley
Vice President and Executive Publisher
Joseph B. Wikert
Project Coordinator, Cover
Lynsey Osborn
Proofreader
Nancy Carrasco
Indexer
Melanie Belkin
Anniversary Logo Design
Richard Pacifico
Cover Image
© Steve Allen/Getty Images
Acknowledgments
First, thanks to our students, clients, readers, and former colleagues for supporting, teaching, and influencing us. One of the authors recently received a fortune cookie that read, “You learn most when teaching others.” We couldn't agree more. Our Kimball University students have pushed us to provide precise, specific guidance and kept us on our toes with their questions. Similarly, the challenges faced by our Kimball Group consulting clients have become our challenges, and have kept us grounded in reality. Finally, ex-colleagues have contributed to our thinking about the concepts in this book, including Laura Reeves who participated as a co-author of the first edition of the Lifecycle Toolkit. Beginning with our associates from the early days at Metaphor, through Red Brick, Stanford University, DecisionWorks Consulting, InfoDynamics, and Microsoft, we've learned lots from each of you.
Thanks to the Wiley team for making this book a reality. Bob Elliott's subtle, yet persistent prodding got the project off the ground. Sara Shlaer did a wonderful job editing our text with an incredible amount of patience, tenacity, and attention to detail. Deb Banninger and the behind-the-scenes folks worked tirelessly to deliver a quality product. We've enjoyed working with all of you.
Finally, thanks to our spouses, partners, and children for putting up with the demands of our careers, while supporting us unconditionally. You've suffered through late nights and missed vacations alongside us. Thanks to Julie Kimball, Sara Kimball Smith, and Brian Kimball, Scott and Katie Ross, Elizabeth Wright, Tony Navarrete, and Pam, Elisa, and Jenna Becker. We couldn't have done it without you!
Introduction
Remarkable transformations have occurred in the nine years since the first edition of The Data Warehouse Lifecycle Toolkit was published. The data warehouse industry has reached full maturity and acceptance across the business world. Hardware and software have made mind boggling advances in these nine years. We have replaced “gigabytes” with “terabytes” in all our conversations. Yet somehow the data warehousing task has remained fundamentally unchanged.
Many of you have thousands of data warehouse users in your organizations. You have welcomed operational decision makers to the ranks of data warehouse users to accompany the original cadres of marketing and finance users. In fact, operational urgencies are the hottest aspects of data warehousing, with everybody insisting that they need the data in “real time.” As our data warehouses have become more important and more visible, we have been hammered by privacy, security, and compliance requirements that are non-negotiable. Business users are waking up to the value of high quality data in much the same way that conventional manufacturing has embraced the management of quality. Finally, and perhaps most important, we have a new name for what we do that reflects our true purpose. It is business intelligence. To emphasize that point, in most places in this book we refer to the overall system you are building as the DW/BI system The complete end-to-end data warehouse and business intelligence system. Although some would argue that you can theoretically deliver business intelligence without a data warehouse, and vice versa, that is ill-advised from our perspective. Linking the two together in the DW/BI acronym reinforces their dependency. Independently, we refer to the queryable data in your DW/BI system as the enterprise data warehouse, and value-add analytics as BI applications. We disagree with others who insist that the data warehouse is a highly normalized data store whose primary purpose is to serve as a source for the transformation and loading of data into summarized dimensional structures..
The shift to business intelligence puts initiative in the hands of business users, not IT. But at the same time this shift puts into perfect focus the mission of the data warehouse: It is the necessary platform for business intelligence. The data warehouse does the hard work of wrangling the data out of the source systems, cleaning it, and organizing it so that normal business users can understand it. Of course we strive for world class business intelligence, but world class business intelligence is only possible if you have a world class data warehouse. And conversely, a data warehouse without business intelligence will fail spectacularly.
This book is a relentlessly practical field guide for designers, managers, and owners of the DW/BI system. We have tried to distinguish this book from other DW/BI books by making the content very concrete and actionable. It's okay to be dazzled by the landscape but we want you to make it all the way to the finish line. This book describes a coherent framework that goes all the way from the original scoping of an overall enterprise DW/BI system, through the detailed steps of developing and deploying, to the final steps of planning the next phases.
There are tens of thousands of functioning data warehouse installations across the world. Many DW/BI owners have developed a complete lifecycle perspective. Probably the biggest insight that comes from this perspective is that each DW/BI system is continuously evolving and dynamic. It cannot be static. It never stops transforming. New business requirements arise. New managers and executives place unexpected demands on the system. New data sources become available. At the very least, the DW/BI system needs to evolve as fast as the surrounding organization evolves. Stable organizations will place modest demands on the system to evolve. Dynamic, turbulent organizations will make the task more challenging.
Given this churning, evolving nature of the DW/BI system, we need design techniques that are flexible and adaptable. We need to be half DBA and half MBA. We need to opportunistically hook together little pieces from individual business processes into larger pieces, making enterprise data warehouses. And we need our changes to the system always to be graceful. A graceful change is one that doesn't invalidate previous data or previous applications.
How this Book is Organized
This book has two deep underlying themes. The first is the Kimball Lifecycle approach. You might ask “What makes the Kimball Lifecycle different from any other methodology?” The shortest answer is that we build DW/BI systems by starting with the business users and figuring out what they need to do their jobs. Then, with those results in mind, we systematically work backward through the reports, applications, databases, and software, finally arriving at the most physical layers of the implementation. This contrasts strongly with technology driven approaches, which proceed in the opposite direction. In the early days of the 1990s, some IT shops didn't know what to make of our business and user oriented approach. But as we publish this book in 2008, the very name “business intelligence” says it all. The user and the business drive the data warehouse.
The second theme is the “bus architecture.” We will show you how to build a succession of individual business process iterations that will, in time, create an enterprise DW/BI system. In this book, you will see a heavy reliance on dimensional modeling as a way to present data to business users. We recommend this approach for only one reason: It is demonstrably the best organization of data to meet the business user's desires for simplicity and high query performance. We thank you in advance for following the dimensional approach that is developed in this book. In the end, you are free to present data to users in any way you think appropriate. But we urge you to constantly revisit the fundamental goal of user satisfaction. We have learned to be humble in the presence of business users. It's not our opinion that matters; it's theirs.
This book captures these perspectives. We will give you actionable skills and actionable tools for getting your job done. Along the way, we hope to give you the perspective and judgment we have accumulated in building DW/BI systems since 1982.
Who Should Read this Book
The primary reader of this book should be a designer or a manager who really needs to get about the business of building and managing a “data warehouse that is a platform for business intelligence applications.” Because that is quite a mouthful, we have consistently referred to this overall system with the name “DW/BI” to drive home the point that you are responsible for getting the data all the way from the original source systems to the business users' screens.
Although the book contains some introductory material, we think the book will be of most use to an IT professional who has already had some exposure to data warehousing. An appropriate next book, which would concentrate more deeply on dimensional modeling, would be The Data Warehouse Toolkit, Second Edition, by Ralph Kimball and Margy Ross, published in 2002.
You may have developed your experience and formed your opinions by designing and delivering a real data warehouse. That is the best background of all! There is no substitute for having had the responsibility of delivering an effective DW/BI system. We the authors have all had the humbling experience of presenting our “baby” to a crowd of demanding business users. It is sometimes hard to accept the reality that most users have real jobs that don't involve technology. They may not even like technology particularly. But business users will use our technology if it is easy to use and provides obvious value.
This book is rather technical. The discussion of design techniques and architectures will undoubtedly introduce terminology that you have not encountered. We have combed this book carefully to make sure that the more technical topics are ones we think you must understand. We have tried not to get bogged down in detail for its own sake. There is a glossary of DW/BI terms at the back of the book that will briefly explain the most insidious terms that we all have to live with.
Although we hope you read this book in its entirety to understand the complete Kimball Lifecycle, we highlight the target audience at the start of each chapter, so you can best judge what to read carefully, and what to skim. Hopefully, your experiences and opinions will give you your own personal framework on which to hang all these ideas. After reading Chapter 1, you will see that there are three parallel threads that must be pursued in building a DW/BI system: the technology, the data, and the business intelligence applications. We even show these three threads in the “You Are Here” diagrams at the beginning of each chapter. Although these threads clearly affect each other, they should be developed in parallel and asynchronously.
However, because a book is necessarily a linear thing, we have had to present the steps in the Kimball Lifecycle as if they occur in just one fixed order. Hopefully as you work through the book, you will visualize the more realistic and complex real world relationships among the various steps. After reading this book, please return eventually to each individual chapter and re-read it very carefully when your project gets to that particular phase. That is why we called it the Lifecycle Toolkit.
How this Book Differs from the First Edition
This second edition of the Lifecycle Toolkit is significantly updated and reorganized compared to the first edition. The first three chapters set you up for understanding the complete Kimball Lifecycle process and for making sure your effort has satisfied the requirements for moving forward. We then worked very hard to make the complex discussion of architectures more actionable and more obviously tied to the sequence of the Kimball Lifecycle. In Chapter 4 we carefully describe the complete technical architecture of the DW/BI system, from original data extraction to the final painting of results on the business users' screens. In Chapter 5 we show you how to create specific plans for this technical architecture and select products. Then in Chapters 6 through 12 we systematically expand the three main deliverables (database designs, ETL system, and BI applications) by first describing each one conceptually and then physically. Finally, in the last two chapters we show you how to deploy this amazing edifice into real operational environments and how to think about expanding and growing your DW/BI system beyond the first implementation.
We hope our enthusiasm for data warehousing and business intelligence shows through in this book. The DW/BI challenge is a fascinating and worthy one. Undoubtedly, the labels will change over the years as vendors position their products to be new things that will remove all the old objections. But our mission has remained constant: Bring the data and analyses to the business users so they can make better business decisions.