The Wiley & SAS Business Series presents books that help senior-level managers with their critical management decisions.
Titles in the Wiley & SAS Business Series include:
Agile by Design: An Implementation Guide to Analytic Lifecycle Management by Rachel Alt-Simmons
Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications by Bart Baesens
Bank Fraud: Using Technology to Combat Losses by Revathi Subramanian
Big Data, Big Innovation: Enabling Competitive Differentiation through Business Analytics by Evan Stubbs
Business Forecasting: Practical Problems and Solutions edited by Michael Gilliland, Len Tashman, and Udo Sglavo
Business Intelligence Applied: Implementing an Effective Information and Communications Technology Infrastructure by Michael Gendron
Business Intelligence and the Cloud: Strategic Implementation Guide by Michael S. Gendron
Business Transformation: A Roadmap for Maximizing Organizational Insights by Aiman Zeid
Data-Driven Healthcare: How Analytics and BI Are Transforming the Industry by Laura Madsen
Delivering Business Analytics: Practical Guidelines for Best Practice by Evan Stubbs
Demand-Driven Forecasting: A Structured Approach to Forecasting, Second Edition by Charles Chase
Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain by Robert A. Davis
Developing Human Capital: Using Analytics to Plan and Optimize Your Learning and Development Investments by Gene Pease, Barbara Beresford, and Lew Walker
Economic and Business Forecasting: Analyzing and Interpreting Econometric Results by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah Watt, and Sam Bullard
Financial Institution Advantage and the Optimization of Information Processing by Sean C. Keenan
Financial Risk Management: Applications in Market, Credit, Asset, and Liability Management and Firmwide Risk by Jimmy Skoglund and Wei Chen
Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection by Bart Baesens, Veronique Van Vlasselaer, and Wouter Verbeke
Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data Driven Models by Keith Holdaway
Health Analytics: Gaining the Insights to Transform Health Care by Jason Burke
Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World by Carlos Andre Reis Pinheiro and Fiona McNeill
Hotel Pricing in a Social World: Driving Value in the Digital Economy by Kelly McGuire
Implement, Improve and Expand Your Statewide Longitudinal Data System: Creating a Culture of Data in Education by Jamie McQuiggan and Armistead Sapp
Killer Analytics: Top 20 Metrics Missing from Your Balance Sheet by Mark Brown
Mobile Learning: A Handbook for Developers, Educators, and Learners by Scott McQuiggan, Lucy Kosturko, Jamie McQuiggan, and Jennifer Sabourin
The Patient Revolution: How Big Data and Analytics Are Transforming the Healthcare Experience by Krisa Tailor
Predictive Analytics for Human Resources by Jac Fitz-enz and John Mattox II
Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance by Lawrence Maisel and Gary Cokins
Statistical Thinking: Improving Business Performance, Second Edition by Roger W. Hoerl and Ronald D. Snee
Too Big to Ignore: The Business Case for Big Data by Phil Simon
Trade-Based Money Laundering: The Next Frontier in International Money Laundering Enforcement by John Cassara
The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions by Phil Simon
Understanding the Predictive Analytics Lifecycle by Al Cordoba
Unleashing Your Inner Leader: An Executive Coach Tells All by Vickie Bevenour
Using Big Data Analytics: Turning Big Data into Big Money by Jared Dean
Visual Six Sigma, Second Edition by Ian Cox, Marie Gaudard, and Mia Stephens
For more information on any of the above titles, please visit www.wiley.com.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Names: Hughes, Troy Martin, 1976– author.
Title: SAS data analytic development : dimensions of software quality / Troy Martin Hughes.
Description: Hoboken, New Jersey : John Wiley & Sons, 2016. | Includes index.
Identifiers: LCCN 2016021300 | ISBN 9781119240761 (cloth) | ISBN 9781119255918 (epub) | ISBN 9781119255703 (ePDF)
Subjects: LCSH : SAS (Computer file) | Quantitative research—Data processing.
To Mom, who dreamed of being a writer and, through unceasing love, raised one, and Dad, who taught me to program before I could even reach the keys.
Preface
Because SAS practitioners are software developers, too!
Within the body of SAS literature, an overwhelming focus on data quality eclipses software quality. Whether discussed in books, white papers, technical documentation, or even posted job descriptions, nearly all references to quality in relation to SAS describe the quality of data or data products.
The focus on data quality, and the corresponding diversion from traditional software development priorities, is not without reason. Data analytic development is software development, but ultimate business value is delivered not through software products but rather through subsequent, derivative data products. In aligning quality only with data, however, data analytic development environments can place an overwhelming focus on software functional requirements to the detriment or exclusion of software performance requirements. When SAS literature does describe performance best practices, it typically demonstrates only how to make SAS software faster or more efficient while omitting other dimensions of software quality.
However, what about software reliability, scalability, security, maintainability, or modularity—or the host of other software quality characteristics? For all the SAS practitioners of the world—including developers, biostatisticians, econometricians, researchers, students, project managers, market analysts, data scientists, and others—this text demonstrates a model for software quality promulgated by the International Organization for Standardization (ISO) to facilitate the evaluation and pursuit of software quality.
Through hundreds of Base SAS software examples and more than 4,000 lines of code, SAS practitioners will learn how to define, prioritize, implement, and measure 15 dimensions of software quality. Moreover, nontechnical stakeholders, including project managers, functional managers, customers, sponsors, and business analysts, will learn to recognize the value of quality inclusion and the commensurate risk of quality exclusion. With this more comprehensive view of quality, SAS software quality is finally placed on par with SAS data quality.
Why this text and the relentless pursuit of SAS software quality? Because SAS practitioners, regardless of job title, are inherently software developers, too, and should benefit from industry standards and best practices. Software quality can and should be allowed to flourish in any environment.
OBJECTIVES
The primary goal is to describe and demonstrate SAS software development within the framework of the ISO software product quality model. The model defines characteristics of software quality codified within the Systems and software Quality Requirements and Evaluation (SQuaRE) series (ISO/IEC 25000:2014). Through the 15 intertwined dimensions of software quality presented in this text, readers will be equipped to understand, implement, evaluate, and, most importantly, value software quality.
A secondary goal is to demonstrate the role and importance of the software development life cycle (SDLC) in facilitating software quality. Thus, the dimensions of quality are presented as enduring principles that influence software planning, design, development, testing, validation, acceptance, deployment, operation, and maintenance. The SDLC is demonstrated in a requirements-based framework in which ultimate business need spawns technical requirements that drive the inclusion (or exclusion) of quality in software. Requirements initially provide the backbone of software design and ultimately the basis against which the quality of completed software is evaluated.
A tertiary goal is to demonstrate SAS software development within a risk management framework that identifies the threats that poor-quality software poses to business value. Poor data quality is habitually highlighted in SAS literature as a threat to business value, but poor code quality can contribute equally to project failure. This text does not suggest that all dimensions of software quality should be incorporated in all software, but rather aims to formalize a structure through which threats and vulnerabilities can be identified and their ultimate risk to software calculated. Thus, performance requirements are most appropriately implemented when the benefits of their inclusion as well as the risks of their exclusion are understood.
AUDIENCE
Savvy SAS practitioners are the intended audience and represent the professionals who utilize the SAS application to write software in the Base SAS language. An advanced knowledge of Base SAS, including the SAS macro language, is recommended but not required.
Other stakeholders who will benefit from this text include project sponsors, customers, managers, Agile facilitators, ScrumMasters, software testers, and anyone with a desire to understand or improve software performance. Nontechnical stakeholders may have limited knowledge of the SAS language, or software development in general, yet nevertheless generate requirements that drive software projects. These stakeholders will benefit through the introduction of quality characteristics that should be used to define software requirements and evaluate software performance.
APPLICATION OF CONTENT
The ISO software product quality model is agnostic to industry, team size, organizational structure (e.g., functional, projectized, matrix), development methodology (e.g., Agile, Scrum, Lean, Extreme Programming, Waterfall), and developer role (e.g., developer, end-user developer). The student researcher working on a SAS client machine will gain as much insight from this text as a team of developers working in a highly structured environment with separate development, test, and production servers.
While the majority of Base SAS code demonstrated is portable between SAS interfaces and environments, some input/output (I/O) and other system functions, options, and parameters are OS- or interface-specific. Code examples in this text have been tested in the SAS Display Manager for Windows, SAS Enterprise Guide for Windows, and the SAS University Edition. Functional differences among these applications are highlighted throughout the text and are discussed in Chapter 10, “Portability.”
While this text includes hundreds of examples of SAS code that demonstrate the successful implementation and evaluation of quality characteristics, it differs from other SAS literature in that it does not represent a compendium of SAS software best practices but rather the application of SAS code to support the software product quality model within the SDLC. Therefore, code examples demonstrate software performance rather than functionality.
ORGANIZATION
Most software texts are organized around functionality, taking either a top-down approach, in which a functional objective is stated and various methods to achieve it are demonstrated, or a bottom-up approach, in which the uses and caveats of a specific SAS function, procedure, or statement are explored. Because this text follows the ISO software product quality model and focuses on performance rather than functionality, it eschews the conventional organization of functionality-driven SAS literature. Instead, each of the 15 performance chapters highlights a single dynamic or static performance characteristic, that is, one dimension of software quality. Code examples often build incrementally throughout each chapter as quality objectives are identified and achieved, and related quality characteristics are highlighted for future reference and reading.
The text comprises 18 chapters, organized into an overview followed by two parts:
Overview
Three chapters introduce the concept of quality, the ISO software product quality model, the SDLC, risk management, Agile and Waterfall development methodologies, exception handling, and other information and terms central to the text. Even readers who are anxious to reach the more technically substantive performance chapters should skim Chapters 1, “Introduction,” and 2, “Quality,” to glean the context of software quality within data analytic development environments.
Part I. Dynamic Performance
These nine chapters introduce dynamic performance requirements: software quality attributes that are demonstrated, measured, and validated through software execution. For example, software efficiency can be demonstrated by running code and measuring run time and system resources such as CPU and memory usage. Chapters include “Reliability,” “Recoverability,” “Robustness,” “Execution Efficiency,” “Efficiency,” “Scalability,” “Portability,” “Security,” and “Automation.”
Part II. Static Performance
These six chapters introduce static performance requirements: software quality attributes that are assessed through code inspection rather than execution. For example, the extent to which software is modularized cannot be determined until the code is opened and inspected, either through manual review or automated test software. Chapters include “Maintainability,” “Modularity,” “Readability,” “Testability,” “Stability,” and “Reusability.”
Text formatting constructs are standardized to facilitate SAS code readability. Formatting is not intended to demonstrate best practices but rather standardization. All code samples are presented in lowercase, but the following conventions are used where code is referenced within the text:
SAS libraries appear in uppercase, for example, the WORK library, or the PERM.Burrito data set within the PERM library.
SAS data sets appear in sentence case, for example, the Chimichanga data set or the WORK.Tacos_are_forever data set.
SAS reserved words, including statements, functions, and procedure names, appear in uppercase, for example, the UPCASE function or the MEANS procedure.
The DATA step always appears in uppercase, as in “the DATA step can be deleted if the SQL procedure is implemented.”
Variables used within the DATA step or SAS procedures appear in uppercase, as in “the variable CHAR1 is missing.”
SAS user-defined formats appear in uppercase, for example, the MONTHS format.
SAS macros appear in uppercase and are preceded by a percent sign, as in “the %LOCKITDOWN macro prevents file access collisions.”
SAS macro variables appear in uppercase, as in “the &DSN macro variable is commonly defined to represent the data set name.”
SAS parameters that are passed to macros appear in uppercase, for example, the DSN parameter in the %GOBIG macro invocation.
Acknowledgments
So many people have made this project possible, both directly and indirectly, through their contributions to my life as well as their endurance and encouragement throughout this journey.
To the family and friends I ignored for four months while road-tripping through 24 states to write this, thank you for your love, patience, understanding, and couches.
To my teachers who instilled a love of writing, thank you for years of red ink and encouragement: Sister Mary Katherine Gallagher, Estelle McCarthy, Lorinne McKnight, Dolores Cummings, Millie Bizzini, Patty Ely, Jo Berry, Liana Hachiya, Audrey Musson, Dana Trevethan, Cheri Rowton, Annette Simmons, and Dr. Robyn Bell.
To the mentors whose words continue to guide me, thank you for your leadership and friendship: Dr. Cathy Schuman, Dr. Barton Palmer, Dr. Kiko Gladsjo, Dr. Mina Chang, Dean Kauffman, Rich Nagy, Jim Martin, and Jeff Stillman.
To my SAS spirit guides, thank you not only for challenging the limits of the semicolon but also for sharing your successes and failures with the world: Dr. Gerhard Svolba, Art Carpenter, Kirk Paul Lafler, Susan Slaughter, Lora Delwiche, Peter Eberhardt, Ron Cody, Charlie Shipp, and Thomas Billings.
To SAS, thank you for distributing the SAS University Edition and for providing additional software free of charge, without which this project would have been impossible.
Finally, thank you to John Wiley & Sons, Inc. for support and patience throughout this endeavor.
About the Author
Troy Martin Hughes has been a SAS practitioner for more than 15 years, has managed SAS projects in support of federal, state, and local government initiatives, and is a SAS Certified Advanced Programmer, SAS Certified Base Programmer, SAS Certified Clinical Trials Programmer, and SAS Professional V8. He has an MBA in information systems management and additional credentials, including PMP, PMI-ACP, PMI-PBA, PMI-RMP, CISSP, CSSLP, CSM, CSD, CSPO, CSP, and ITIL v3 Foundation. He has been a frequent presenter and invited speaker at SAS user conferences, including SAS Global Forum, WUSS, MWSUG, SCSUG, SESUG, and PharmaSUG. Troy is a U.S. Navy veteran with two tours of duty in Afghanistan and, in his spare time, a volunteer firefighter and EMT.