CONTENTS
CONTENTS
Professional Microsoft® SQL Server® 2012 Integration Services
Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
Copyright © 2012 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-118-10112-4
ISBN: 978-1-118-22380-2 (ebk)
ISBN: 978-1-118-23709-0 (ebk)
ISBN: 978-1-118-26211-5 (ebk)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at .
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at . For more information about Wiley products, visit .
Library of Congress Control Number: 2011945018
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, and Wrox Programmer to Programmer are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Microsoft and SQL Server are registered trademarks of Microsoft Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
To my great team and work family at Pragmatic Works
—Brian Knight
To my parents, Paul and Darian Veerman
—Erik Veerman
For the SQL Community: without you, this would not be possible.
—Jessica M. Moss
To my wife Jessy, she makes me a better person every day.
—Mike Davis
To the three girls in my life that mean more than anything to me, Tammy, Calista, and Callie.
—Chris Rock
ABOUT THE AUTHORS
BRIAN KNIGHT, SQL Server MVP, MCITP, MCSE, MCDBA, is the owner and founder of Pragmatic Works. He is also the cofounder of , , and . He runs the local SQL Server users group in Jacksonville (JSSUG). Brian is a contributing columnist at several technical magazines. He is the author of a dozen SQL Server books. Brian has spoken at conferences like PASS, SQL Connections, and TechEd, SQL Saturdays, Code Camps, and many pyramid scheme motivational sessions. His blog can be found at . Brian lives in Jacksonville, Florida, where he enjoys his kids and running marathons.
ERIK VEERMAN is a mentor with SolidQ focusing on training, mentoring, and architecting solutions on the SQL Server BI platform. Erik has designed dozens of BI solutions across a broad business spectrum — telecommunications, marketing, retail, commercial real estate, finance, supply chain, and information technology. His industry recognition includes Microsoft’s Worldwide BI Solution of the Year and SQL Server Magazine’s Innovator Cup winner. As an expert in OLAP design, ETL processing, and dimensional modeling, Erik is a presenter, author, and instructor. He has helped drive the industry ETL standards and best practices for SSIS through his Wrox books on SSIS and is the lead author of Microsoft Press SQL Server BI training kits. Erik lives in Atlanta, Georgia, with his wife and four energetic children.
JESSICA M. MOSS is a well-known practitioner, author, and speaker of Microsoft SQL Server business intelligence. She has created numerous data warehouse and business intelligence solutions for companies in different industries and has delivered training courses on Integration Services, Reporting Services, and Analysis Services. While working for a major clothing retailer, Jessica participated in the SQL Server 2005 TAP program where she developed best implementation practices for Integration Services. Jessica has authored technical content for multiple magazines, websites, and books, including the Wrox book Microsoft SQL Server 2008 Integration Services: Problem-Design-Solution, and has spoken internationally at conferences such as the PASS Community Summit, SharePoint Connections, and the SQLTeach International Conference. As a strong proponent of developing user-to-user community relations, Jessica actively participates in local user groups and code camps in central Virginia. In addition, Jessica volunteers her time to help educate people through the PASS organization.
MIKE DAVIS, MCTS, MCITP, is the Managing Project Lead at Pragmatic Works. This book is his third on the subject of business intelligence and specifically Integration Services. He has worked with SQL Server for almost a decade and has led many successful business intelligence projects with his clients. Mike is an experienced speaker and has presented at many events such as SQL Server user groups, code camps, SQL Saturday events, and the PASS Summit. Mike is an active member at his local user group (JSSUG) in Jacksonville, Florida. In his spare time he likes to play darts and guitar. You can also find him on twitter @MikeDavisSQL and his blog on and .
CHRIS ROCK is a software developer and program manager for Pragmatic Works. He started developing software using VB6 and SQL Server 6.5 in 1998 and has been using SSIS since its inception. Chris has spoken at many local SQL Saturday and Code Camp events in Florida. When he’s not writing code, Chris enjoys training cats to speak English. He blogs at .
ABOUT THE TECHNICAL EDITORS
DR. WEE-HYONG TOK is a Program Manager from the SQL Server product group. Wee-Hyong spent the last few years working on Integration Services. Prior to joining Microsoft, Wee-Hyong was a Microsoft Most Valuable Professional (MVP), and he is a frequent speaker at various conferences in Asia-Pacific. He is deeply passionate about how organizations can make use of business intelligence tools to gain insights into the nuggets of data within the organization and make better, more informed decisions.
MATTHEW ROCHE has nearly 20 years’ experience as a developer, trainer, and mentor, with a focus on the Microsoft SQL Server platform. He has designed, developed, and deployed small- and large-scale applications based on SQL Server starting with version 6.5, and with the release of SQL Server 2005, he has increasingly focused on BI and ETL solutions with SQL Server Integration Services. Matthew has presented on SQL Server topics across the United States and Europe at numerous user groups, code camps, and technical conferences, including TechEd and SQL Server Connections, and has contributed to multiple SQL Server books as author and technical reviewer. Matthew has been a Microsoft Certified Trainer (MCT) since 1996 and prior to joining Microsoft in 2008 was a Microsoft Most Valuable Professional (MVP) for SQL Server. He is currently a Senior Program Manager on the SQL Server Integration Services team.
CREDITS
EXECUTIVE EDITOR
Bob Elliott
SENIOR PROJECT EDITOR
Kevin Kent
TECHNICAL EDITORS
Wee-Hyong Tok
Matthew Roche
SENIOR PRODUCTION EDITOR
Debra Banninger
COPY EDITOR
Luann Rouff
EDITORIAL MANAGER
Mary Beth Wakefield
FREELANCER EDITORIAL MANAGER
Rosemarie Graham
ASSOCIATE DIRECTOR OF MARKETING
David Mayhew
MARKETING MANAGER
Ashley Zurcher
BUSINESS MANAGER
Amy Knies
PRODUCTION MANAGER
Tim Tate
VICE PRESIDENT AND EXECUTIVE GROUP PUBLISHER
Richard Swadley
VICE PRESIDENT AND EXECUTIVE PUBLISHER
Neil Edde
ASSOCIATE PUBLISHER
Jim Minatel
PROJECT COORDINATOR, COVER
Katie Crocker
PROOFREADER
Nicole Hirschman
INDEXER
J&J Indexing
COVER IMAGE
© Mark Evans / iStockPhoto
COVER DESIGNER
Ryan Sneed
ACKNOWLEDGMENTS
THANKS TO EVERYONE who made this book possible. As always, I owe a huge debt to my wife Jenn, for putting up with my late nights, and to my children, Colton, Liam, Camille, and my newest son John, for being so patient with their tired dad who has always overextended. Thanks to Kevin Kent and my tech editors Matthew Roche and Wee-Hyong Tok for keeping me in my place. Thanks also to the makers of Guinness for providing my special juice that helped me power through the book. Thanks for all the user group leaders out there who work so hard to help others become proficient in technology. You make a huge difference! Finally, thanks to my acting coach Shawn Harrison for getting me ready for my debut this fall.
—Brian Knight
MANY PEOPLE to thank . . . First thanks again to Brian and the Wrox team for another opportunity to participate in a grand book adventure. You’ll notice that for this book, I dedicated it to my wonderful parents. Thanks for buying me that Commodore 64 way back, and of course, thanks for the opportunity to be your perpetual help desk support. (What technologist isn’t the family’s help desk support?) I’ll never forget the call about the virus where your desktop icons would move away from the mouse pointer, or the “issue” last year with the printer not working (plugging it in does wonders).
I would like to commend Discovery Channel’s documentary How Beer Saved the World. This book is an indirect proof of its theory. As always, thanks to my elegant wife Amy and to our kids for their patience and tolerance. I would also like to recognize John Calvin (my philosophical and theological hero) and the reformers —Soli Deo Gloria.
—Erik Veerman
I AM TRULY THANKFUL to the many people who have helped me throughout my career. Most recently, I would like to thank my coauthors and editors on this book: Brian Knight, Erik Veerman, Mike Davis, Chris Rock, Matthew Roche, Wee-Hyong Tok, Kevin Kent, Bob Elliott, and the rest of the Wrox team. It definitely takes a village to raise a child, and our “baby” was no exception. Thank you also to my wonderful family and friends who stood by me through late nights and caffeine overloads. I would not be here without every one of you. Thank you.
—Jessica M. Moss
THANKS TO my Pragmatic Works Team for its support in this book. Thank you to Brian Knight for giving me the opportunity of a lifetime. Thank you to Adam Jorgensen for pushing me. Thank you to the Wiley team, especially Kevin and Bob. Thank you to the technical editors for their help in making this book great. Thank you to my mother for raising me to be the man I am today. Thank you to my wife and kids for being by my side. And finally, thank you to the Flying Spaghetti Monster for showing me the way of logic and reasoning.
—Mike Davis
THANKS TO everyone who made this book possible. I want to thank my wife, Tammy, for putting up with “I’m writing the book” for the last few months. Thank you to my two girls, Calista and Callie. Without you two I wouldn’t have tried so hard to achieve the goals I’ve reached in my life. Thanks to Matt Masson for answering any questions I had during this process. Thanks to Brian Knight for giving me the opportunity to contribute to this book. Finally, thanks to Andy Warren for setting me on the right professional development path all those years ago.
—Chris Rock
INTRODUCTION
THE MOST IMPORTANT BUSINESS intelligence tool in the Microsoft Swiss Army knife of tools is SQL Server Integration Services (SSIS). This is because the other tools would be nothing without the cleansing and movement of data into a presentable format. The product can extract, transform, and load (ETL) data astonishingly fast. A 2010 benchmark showed movement of more than a terabyte an hour with SSIS! If you’re new to SSIS, you’ve picked a fantastic field to become involved in. The one consistent skill needed in today’s technical job market is ETL. If a company wants to establish a partnership with another company, it’ll need to communicate data back and forth between the two companies. If your company wants to launch new products, it’ll need a way to integrate those products into its website and catalog. All of these types of tasks are going to require the skill set you are developing and will learn in this book.
Companies that had never used SQL Server before are now allowing it in their environment because SSIS is such an easy-to-use and cost-effective way to move data. SSIS competes with the largest ETL tools on the market like Informatica, DataStage, and Ab Initio at a tiny fraction of the price. SQL Server 2012 now offers more components that you use to make your life even easier and the performance scales to a level never seen on the SQL Server platform.
The best thing about SSIS is its price tag: free with your SQL Server purchase. Many ETL vendors charge hundreds of thousands of dollars, if not millions, for what you will see in this book. SSIS is also a great platform for you to expand and integrate into, which many ETL vendors do not offer. Once you get past the initial learning curve, you’ll be amazed with the power of the tool, and it can take weeks off your time to market. This author team has trained hundreds of people over the years, and you’ll find that the learning curve of SSIS is shallow relative to competing platforms. In SQL Server 2012, the product has matured to its third major envisioning. In this release the focus was on scalability, management, and more advanced data cleansing.
WHO THIS BOOK IS FOR
Because we have used SSIS since the beta stages of SQL Server 2005 and through its evolution into its current form, the idea of writing this book was quite compelling. If you’ve never used SSIS before, we spend the first chapters focusing on lowering your learning curve on this product. If you’ve used SSIS in the past, we’ve added quite a bit of new content that is specific to SQL Server 2012 and to take your skills to the next level. If you’re a SSIS 2005 or 2008 user, luckily the interface has not drastically changed, so you’ll want to focus on advanced data cleansing and administration, which has gone through a drastic overhaul.
This book is intended for developers, DBAs, and casual users who hope to use SSIS for transforming data, creating a workflow, or maintaining their SQL Server. This book is a professional book, meaning that the authors assume that you know the basics of how to query a SQL Server and have some rudimentary programming skills. Not many programming skills will be needed or assumed, but it will help with your advancement. No skills in the prior release of SSIS are required, but we do reference it throughout the book when we call attention to feature enhancements.
WHAT THIS BOOK COVERS
Whether you’re new to SSIS or an experienced SSIS developer, there’s something for you in this book. This book takes you from the architecture and basics of SSIS all the way through to developing hard-core SSIS solutions to solve many of the industry’s common business scenarios. The book is tutorial based, meaning that it teaches you through simple examples.
By the time you complete this book, you’ll know how to load and synchronize database systems using SSIS by using some of the new SQL Server 2012 features. You’ll also know how to load data warehouses, which is a very hot and specialized skill. Even in warehousing, you’ll find features in the new 2012 release that you’ll wonder how you lived without like the Data Quality Services integration and CDC integration!
HOW THIS BOOK IS STRUCTURED
After discussing the architecture of SSIS, we’ll start with the basics by introducing the fundamental concepts of SSIS: the Data Flow and Control Flow. We’ll then build through the various other features, including the warehousing and scripting, and proceed to advanced topics like programming and extending the engine. We’ll conclude with a case study that helps to tie everything together. SSIS is a very feature-rich product, and it took a lot to cover the product:
Chapter 1, “Welcome to SQL Server Integration Services,” introduces the concepts that we’re going to discuss throughout the remainder of this book. We talk about the SSIS architecture and give a brief overview of what you can do with SSIS.
Chapter 2, “The SSIS Tools,” shows you how to quickly learn how to import and export data by using the Import and Export Wizard and then takes you on a tour of the SQL Server Data Tools (SSDT).
Chapter 3, “SSIS Tasks,” goes into each of the tasks that are available to you in SSIS. These tasks are the building blocks for your SSIS workflow and are much like LEGO block programming.
Chapter 4, “Containers,” covers how to use containers to do looping in SSIS and describes how to configure each of the basic transforms.
Chapter 5, “The Data Flow,” dives into the Data Flow components in SSIS. These components are where typical ETL developers will spend 75 percent of their time when loading a database.
Chapter 6, “Using Variables, Parameters, and Expressions,” instructs you how to use the obscure expression language in SSIS by showing you many example use cases and how to solve them through the language. We also cover in the chapter a new concept of parameters and parameterization of a package.
Chapter 7, “Joining Data,” focuses on how to join systems together, whether those systems are two flat files or database platforms. Much of the chapter is spent showing the Lookup Transform.
Now that you know how to configure most of the tasks and transforms, Chapter 8, “Creating an End-to-End Package,” puts it all together with a large example that lets you try out your SSIS experience.
Chapter 9, “Scripting in SSIS,” shows you some of the ways you can use the Script task in SSIS.
Chapter 10, “Loading a Data Warehouse,” covers how to load a data warehouse from the ground up through example. Even smaller companies now are finding that to compete they need to make their data work for them by employing a data warehouse. In this chapter we show how to load dimension and fact tables and some of the common issues.
Chapter 11, “Advanced Data Cleansing in SSIS,” walks through common patterns in data cleansing and how to accomplish them in SSIS. The chapter also covers Data Quality Services (DQS) and how to integrate it into SSIS.
Chapter 12, “Using the Relational Engine,” discusses other features in the SQL Server arsenal that can help you build robust and high-performance ETL solutions. The SQL Server relational database engine has many features that were designed with data loading in mind, and as such the engine and SSIS form a perfect marriage to extract, load, and transform your data.
Sometimes you connect to systems other than SQL Server. Chapter 13, “Accessing Heterogenous Data,” shows you how to connect to systems other than SQL Server like Excel, XML, and web services. Much work has gone into new ODBC integration in this release of SQL Server 2012.
Chapter 14, “Reliability and Scalability,” demonstrates how to scale SSIS and make it more reliable. You can use the features in this chapter to show you how to make the package restartable if a problem occurs.
Chapter 15, “Understanding and Tuning the Data Flow Engine,” explains the architecture of the SSIS Data Flow engine in detail and how to tune your SSIS packages for maximum efficiency.
Chapter 16, “SSIS Software Development Life Cycle,” introduces a software development life cycle methodology to you. It speaks to how SSIS can integrate with Visual Studio Team System.
Chapter 17, “Error and Event Handling,” discusses how to handle problems with SSIS with error and event handling.
Chapter 18, “Programming and Extending SSIS,” shows the SSIS object model and how to use it to extend SSIS. The chapter goes through creating your own task, and then Chapter 19, “Adding a User Interface to Your Component,” adds a user interface to the discussion.
Chapter 20, “External Management and WMI Task Implementation,” walks through creating an application that interfaces with the SSIS to manage the environment. It also discusses the WMI set of tasks.
Chapter 21, “Using SSIS with External Applications,” teaches you how to expose the SSIS Data Flow to other programs like InfoPath and your own .NET applications.
Chapter 22, “Administering SSIS,” shows you how to deploy and administer the packages that you’ve worked so hard to develop. Much of the work in SSIS in SQL Server 2012 has gone into modifying the way you deploy and administer SSIS packages.
Chapter 23, “Case Study: A Programmatic Example,” is a programmatic case study that creates an end-to-end project.
WHAT YOU NEED TO USE THIS BOOK
To follow this book, you will only need to have SQL Server 2012 and the Integration Services component installed. You’ll need a machine that can support the minimum hardware requirements to run SQL Server 2012. You’ll also want to have the AdventureWorks and AdventureWorksDW databases installed. You can find the versions of these databases we used for this book on the Wrox website ().
CONVENTIONS
To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.
Boxes with a warning icon like this one hold important, not-to-be-forgotten information that is directly relevant to the surrounding text.
The pencil icon indicates notes, tips, hints, tricks, and asides to the current discussion.
As for styles in the text:
We use a monofont type with no highlighting for most code examples.
We use bold to emphasize code that's particularly important in the present context.
SOURCE CODE
As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at . You will find the code snippets from the source code are accompanied by a download icon and note indicating the name of the program so you know it’s available for download and can easily locate it in the download file. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book.
Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-1-118-10112-4.
Once you download the code, just decompress it with your favorite compression tool. Alternately, you can go to the main Wrox code download page at to see the code available for this book and all other Wrox books.
Don’t forget. You will also find and be able to download the versions of the AdventureWorks and AdventureWorksDW databases that we used for this book at .
ERRATA
We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.
To find the errata page for this book, go to and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that has been submitted for this book and posted by Wrox editors. A complete book list including links to each book’s errata is also available at .
If you don’t spot “your” error on the Book Errata page, go to and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.
P2P.WROX.COM
For author and peer discussion, join the P2P forums at . The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.
At you will find a number of different forums that will help you not only as you read this book but also as you develop your own applications. To join the forums, just follow these steps:
1. Go to and click the Register link.
2. Read the terms of use and click Agree.
3. To join, complete the required information, as well as any optional information you wish to provide, and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.
You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.
Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.