
Praise for Predictive Analytics

“Littered with lively examples…”

The Financial Times

“Readers will find this a mesmerizing and fascinating study. I know I did!…I was entranced by the book.”

The Seattle Post-Intelligencer

“Siegel is a capable and passionate spokesman with a compelling vision.”

Analytics Magazine

“A must-read for the normal layperson.”

Journal of Marketing Analytics

“This book is an operating manual for twenty-first-century life. Drawing predictions from big data is at the heart of nearly everything, whether it's in science, business, finance, sports, or politics. And Eric Siegel is the ideal guide.”

Stephen Baker, author, The Numerati and Final Jeopardy: The Story of Watson, the Computer That Will Transform Our World

“Simultaneously entertaining, informative, and nuanced. Siegel goes behind the hype and makes the science exciting.”

Rayid Ghani, Chief Data Scientist, Obama for America 2012 Campaign

“The most readable (for we laymen) ‘big data’ book I've come across. By far. Great vignettes/stories.”

Tom Peters, coauthor, In Search of Excellence

“The future is right now—you're living in it. Read this book to gain understanding of where we are and where we're headed.”

Roger Craig, record-breaking analytical Jeopardy! champion; Data Scientist, Digital Reasoning

“A clear and compelling explanation of the power of predictive analytics and how it can transform companies and even industries.”

Anthony Goldbloom, founder and CEO, Kaggle.com

“The definitive book of this industry has arrived. Dr. Siegel has achieved what few have even attempted: an accessible, captivating tome on predictive analytics that is a must-read for all interested in its potential—and peril.”

Mark Berry, VP, People Insights, ConAgra Foods

“I've always been a passionate data geek, but I never thought it might be possible to convey the excitement of data mining to a lay audience. That is what Eric Siegel does in this book. The stories range from inspiring to downright scary—read them and find out what we've been up to while you weren't paying attention.”

Michael J. A. Berry, author of Data Mining Techniques, Third Edition

“Eric Siegel is the Kevin Bacon of the predictive analytics world, organizing conferences where insiders trade knowledge and share recipes. Now, he has thrown the doors open for you. Step in and explore how data scientists are rewriting the rules of business.”

Kaiser Fung, VP, Vimeo; author of Numbers Rule Your World

“Written in a lively language, full of great quotes, real-world examples, and case studies, it is a pleasure to read. The more technical audience will enjoy chapters on The Ensemble Effect and uplift modeling—both very hot trends. I highly recommend this book!”

Gregory Piatetsky-Shapiro, Editor, KDnuggets; founder, KDD Conferences

“Exciting and engaging—reads like a thriller! Predictive analytics has its roots in people's daily activities and, if successful, affects people's actions. By way of examples, Siegel describes both the opportunities and the threats predictive analytics brings to the real world.”

Marianna Dizik, Statistician, Google

“A fascinating page-turner about the most important new form of information technology.”

Emiliano Pasqualetti, CEO, DomainsBot Inc.

“Succeeds where others have failed—by demystifying big data and providing real-world examples of how organizations are leveraging the power of predictive analytics to drive measurable change.”

Jon Francis, Senior Data Scientist, Nike

“In a fascinating series of examples, Siegel shows how companies have made money predicting what customers will do. Once you start reading, you will not be able to put it down.”

Arthur Middleton Hughes, VP, Database Marketing Institute; author of Strategic Database Marketing, Fourth Edition

“Excellent. Each chapter makes the complex comprehensible, making heavy use of graphics to give depth and clarity. It gets you thinking about what else might be done with predictive analytics.”

Edward Nazarko, Client Technical Advisor, IBM

“What is predictive analytics? This book gives a practical and up-to-date answer, adding new dimension to the topic and serving as an excellent reference.”

Ramendra K. Sahoo, Senior VP, Risk Management and Analytics, Citibank

“Competing on information is no longer a luxury—it's a matter of survival. Despite its successes, predictive analytics has penetrated only so far, relative to its potential. As a result, lessons and case studies such as those provided in Siegel's book are in great demand.”

Boris Evelson, VP and Principal Analyst, Forrester Research

“Fascinating and beautifully conveyed. Siegel is a leading thought leader in the space—a must-have for your bookshelf!”

Sameer Chopra, Chief Analytics Officer, Orbitz Worldwide

“A brilliant overview—strongly recommended to everyone curious about the analytics field and its impact on our modern lives.”

Kerem Tomak, VP of Marketing Analytics, Macys.com

“Eric explains the science behind predictive analytics, covering both the advantages and the limitations of prediction. A must-read for everyone!”

Azhar Iqbal, VP and Econometrician, Wells Fargo Securities, LLC

“Predictive Analytics delivers a ton of great examples across business sectors of how companies extract actionable, impactful insights from data. Both the novice and the expert will find interest and learn something new.”

Chris Pouliot, Director, Algorithms and Analytics, Netflix

“In this new world of big data, machine learning, and data scientists, Eric Siegel brings deep understanding to deep analytics.”

Marc Parrish, VP, Membership, Barnes & Noble

“A detailed outline for how we might tame the world's unpredictability. Eric advocates quite clearly how some choices are predictably more profitable than others—and I agree!”

Dennis R. Mortensen, CEO of Visual Revenue, former Director of Data Insights at Yahoo!

“This book is an invaluable contribution to predictive analytics. Eric's explanation of how to anticipate future events is thought provoking and a great read for everyone.”

Jean Paul Isson, Global VP Business Intelligence and Predictive Analytics, Monster Worldwide; coauthor, Win with Advanced Business Analytics: Creating Business Value from Your Data

“Predictive analytics is the key to unlocking new value at a previously unimaginable economic scale. In this book, Siegel explains how, doing an excellent job to bridge theory and practice.”

Sergo Grigalashvili, VP of Information Technology, Crawford & Company

“Predictive analytics has been steeped in fear of the unknown. Eric Siegel distinctively clarifies, removing the mystery and exposing its many benefits.”

Jane Kuberski, Engineering and Analytics, Nationwide Insurance

“As predictive analytics moves from fashionable to mainstream, Siegel removes the complexity and shows its power.”

Rajeeve Kaul, Senior VP, OfficeMax

“Dr. Siegel humanizes predictive analytics. He blends analytical rigor with real-life examples with an ease that is remarkable in his field. The book is informative, fun, and easy to understand. I finished reading it in one sitting. A must-read…not just for data scientists!”

Madhu Iyer, Marketing Statistician, Intuit

“An engaging encyclopedia filled with real-world applications that should motivate anyone still sitting on the sidelines to jump into predictive analytics with both feet.”

Jared Waxman, Web Marketer at LegalZoom, previously at Adobe, Amazon, and Intuit

“Siegel covers predictive analytics from start to finish, bringing it to life and leaving you wanting more.”

Brian Seeley, Manager, Risk Analytics, Paychex, Inc.

“A wonderful look into the world of predictive analytics from the perspective of a true practitioner.”

Shawn Hushman, VP, Analytic Insights, Kelley Blue Book

“A must—Predictive Analytics provides an amazing view of the analytical models that predict and influence our lives on a daily basis. Siegel makes it a breeze to understand, for all readers.”

Zhou Yu, Online-to-Store Analyst, Google

“As our ability to collect and analyze information improves, experts like Eric Siegel are our guides to the mysteries unlocked and the moral questions that arise.”

Jules Polonetsky, Co-Chair and Director, Future of Privacy Forum; former Chief Privacy Officer, AOL and DoubleClick

“Highly recommended. As Siegel shows in his very readable new book, the results achieved by those adopting predictive analytics to improve decision making are game changing.”

James Taylor, CEO, Decision Management Solutions

“An engaging, humorous introduction to the world of the data scientist. Dr. Siegel demonstrates with many real-life examples how predictive analytics makes big data valuable.”

David McMichael, VP, Advanced Business Analytics

“An excellent exposition on the next generation of business intelligence—it's really mankind's latest quest for artificial intelligence.”

Christopher Hornick, President and CEO, HBSC Strategic Services

Predictive Analytics


The Power to Predict Who Will Click, Buy, Lie, or Die

Eric Siegel

Wiley

Dedication

This book is dedicated with all my heart to my mother,
Lisa Schamberg, and my father, Andrew Siegel.

Foreword

This book deals with quantitative efforts to predict human behavior. One of the earliest efforts to do that was in World War II. Norbert Wiener, the father of “cybernetics,” began trying to predict the behavior of German airplane pilots in 1940—with the goal of shooting them from the sky. His method was to take as input the trajectory of the plane from its observed motion, consider the pilot's most likely evasive maneuvers, and predict where the plane would be in the near future so that a fired shell could hit it. Unfortunately, Wiener could predict only one second ahead of a plane's motion, but 20 seconds of future trajectory were necessary to shoot down a plane.

In Eric Siegel's book, however, you will learn about a large number of prediction efforts that are much more successful. Computers have gotten a lot faster since Wiener's day, and we have a lot more data. As a result, banks, retailers, political campaigns, doctors and hospitals, and many more organizations have been quite successful of late at predicting the behavior of particular humans. Their efforts have been helpful at winning customers, elections, and battles with disease.

My view—and Siegel's, I would guess—is that this predictive activity has generally been good for humankind. In the context of healthcare, crime, and terrorism, it can save lives. In the context of advertising, using predictions is more efficient and could conceivably save both trees (for direct mail and catalogs) and the time and attention of the recipient. In politics, it seems to reward those candidates who respect the scientific method (some might disagree, but I see that as a positive).

However, as Siegel points out—early in the book, which is admirable—these approaches can also be used in somewhat harmful ways. “With great power comes great responsibility,” he notes in quoting Spider-Man. The implication is that we must be careful as a society about how we use predictive models, or we may be restricted from using and benefiting from them. Like other powerful technologies or disruptive human innovations, predictive analytics is essentially amoral and can be used for good or evil. To avoid the evil applications, however, it is certainly important to understand what is possible with predictive analytics, and you will certainly learn that if you keep reading.

This book is focused on predictive analytics, which is not the only type of analytics, but the most interesting and important type. I don't think we need more books anyway on purely descriptive analytics, which only describe the past and don't provide any insight as to why it happened. I also often refer in my own writing to a third type of analytics—“prescriptive”—that tells its users what to do through controlled experiments or optimization. Those quantitative methods are much less popular, however, than predictive analytics.

This book and the ideas behind it are a good counterpoint to the work of Nassim Nicholas Taleb. His books, including The Black Swan, suggest that many efforts at prediction are doomed to fail because of randomness and the inherent unpredictability of complex events. Taleb is no doubt correct that some events are black swans that are beyond prediction, but the fact is that most human behavior is quite regular and predictable. The many examples that Siegel provides of successful prediction remind us that most swans are white.

Siegel also resists the blandishments of the “big data” movement. Certainly some of the examples he mentions fall into this category—data that is too large or unstructured to be easily managed by conventional relational databases. But the point of predictive analytics is not the relative size or unruliness of your data, but what you do with it. I have found that “big data often equals small math,” and many big data practitioners are content just to use their data to create some appealing visual analytics. That's not nearly as valuable as creating a predictive model.

Siegel has fashioned a book that is both sophisticated and fully accessible to the non-quantitative reader. It's got great stories, great illustrations, and an entertaining tone. Such non-quants should definitely read this book, because there is little doubt that their behavior will be analyzed and predicted throughout their lives. It's also quite likely that most non-quants will increasingly have to consider, evaluate, and act on predictive models at work.

In short, we live in a predictive society. The best way to prosper in it is to understand the objectives, techniques, and limits of predictive models. And the best way to do that is simply to keep reading this book.

Thomas H. Davenport

Thomas H. Davenport is the President's
Distinguished Professor at Babson College,
a fellow of the MIT Center for Digital Business,
Senior Advisor to Deloitte Analytics,
and cofounder of the International Institute for Analytics.
He is the coauthor of Competing on Analytics,
Big Data @ Work, and several other books on analytics.

Preface to the Revised and Updated Edition
What's New and Who's This Book for—The Predictive Analytics FAQ

Data Scientist: The Sexiest Job of the Twenty-first Century

—Title of a Harvard Business Review article by Thomas Davenport and DJ Patil, who in 2015 became the first U.S. Chief Data Scientist

Prediction is booming. It reinvents industries and runs the world.

More and more, predictive analytics (PA) drives commerce, manufacturing, healthcare, government, and law enforcement. In these spheres, organizations operate more effectively by way of predicting behavior—i.e., the outcome for each individual customer, employee, patient, voter, and suspect.

Everyone's doing it. Accenture and Forrester both report that PA's adoption has more than doubled in recent years. Transparency Market Research projects the PA market will reach $6.5 billion within a few years. A Gartner survey ranked business intelligence and analytics as the current number one investment priority of chief information officers. And in a Salesforce.com study, PA showed the highest growth rate of all sales tech trends, with its adoption projected to more than double over the next 18 months. High-performance sales teams are four times more likely to already be using PA than underperformers.

I am a witness to PA's expanding deployment across industries. Predictive Analytics World (PAW), the conference series I founded, has hosted over 10,000 attendees since its launch in 2009 and is expanding well beyond its original PAW Business events. With the expert assistance of industry partners, we've launched the industry-focused events PAW Government, PAW Healthcare, PAW Financial, PAW Workforce, and PAW Manufacturing, events for senior executives, and the news site The Predictive Analytics Times.

Since the publication of this book's first edition in 2013, I have been commissioned to deliver keynote addresses in each of these industries: marketing, market research, e-commerce, financial services, insurance, news media, healthcare, pharmaceuticals, government, human resources, travel, real estate, construction, and law, plus executive summits and university conferences.

Want a future career in futurology? The demand is blowing up. McKinsey forecasts a near-term U.S. shortage of 140,000 analytics experts and 1.5 million managers “with the skills to understand and make decisions based on analysis of big data.” LinkedIn's number one “Hottest Skills That Got People Hired” is “statistical analysis and data mining.”

PA is like Moneyball for…money.

Frequently Asked Questions about Predictive Analytics

Who Is this Book for?

Everyone. It's easily understood by all readers. Rather than a how-to for hands-on techies, the book serves lay readers, technology enthusiasts, executives, and analytics experts alike by covering new case studies and the latest state-of-the-art techniques.

Is the Idea of Predictive Analytics Hard to Understand?

Not at all. The heady, sophisticated notion of learning from data to predict may sound beyond reach, but breeze through the short Introduction chapter and you'll see: The basic idea is clear, accessible, and undeniably far-reaching.

Is this Book a How-To?

No, it is a conceptually complete, substantive introduction and industry overview.

Not a How-To? Then Why Should Techies Read it?

Although this mathless introduction is understandable by any reader—including those with no technical background—here's why it also affords value for would-be and established hands-on practitioners:

  • A great place to start—provides prerequisite conceptual knowledge for those who will go on to learn the hands-on practice or will serve in an executive or management role in the deployment of PA.
  • Detailed case studies—explores the real-world deployment of PA by Chase, IBM, HP, Netflix, the NSA, Target, U.S. Bank, and more.
  • A compendium of 182 mini-case studies—the Central Tables, divided into nine industry groups, include examples from BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, Match.com, MTV, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more.
  • Advanced, cutting-edge topics—the last three chapters introduce subfields new even to many senior experts: Ensemble models, IBM Watson's question answering, and uplift modeling. No matter how experienced you are, starting with a conceptually rich albeit non-technical overview may benefit you more than you'd expect—especially for uplift modeling. The Notes for these three chapters then provide comprehensive references to technically deep sources (available at www.PredictiveNotes.com).
  • Privacy and civil liberties—the second chapter tackles the particular ethical concerns that arise when harnessing PA's power.
  • Holistic industry overview—the book extends more broadly than a standard technology introduction—all of the above adds up to a survey of the field that sheds light on its societal, commercial, and ethical context.

That said, burgeoning practitioners who wish to jump directly to a more traditional, technically in-depth or hands-on treatment of this topic should consider themselves warned: This is not the book you are seeking (but it makes a good gift; any of your relatives would be able to understand it and learn about your field of interest).

As with introductions to other fields of science and engineering, if you are pursuing a career in the field, this book will set the foundation, yet only whet your appetite for more. At the end of this book, the Hands-On Guide points you to where to go next for the technical how-to and the advanced underlying theory and math.

What Is the Purpose of this Book?

I wrote this book to demonstrate why PA is intuitive, powerful, and awe-inspiring. It's a book about the most influential and valuable achievements of computerized prediction and the two things that make it possible: the people behind it and the fascinating science that powers it.

While there are a number of books that approach the how-to side of PA, this book serves a different purpose (which turned out to be a rewarding challenge for its author): sharing with a wider audience a complete picture of the field, from the way in which it empowers organizations, down to the inner workings of predictive modeling.

With its impact on the world growing so quickly, it's high time the predictive power of data—and how to scientifically tap it—be demystified. Learning from data to predict human behavior is no longer arcane.

How Technical Does this Book Get?

While accessible and friendly to newcomers of any background, this book explores “under the hood” far enough to reveal the inner workings of decision trees (Chapter 4), an exemplary form of predictive model that serves well as a place to start learning about PA, and often as a strong first option when executing a PA project.

I strove to go as deep as possible—substantive across the gamut of fascinating topics related to PA—while still sustaining interest and accessibility not only for neophyte users, but even for those interested in the field avocationally, curious about science and how it is changing the world.

Is this a University Textbook?

This book has served as a textbook at more than 30 colleges and universities. A former computer science professor, I wrote this introduction to be conceptually complete. In the table of contents, the words in parentheses beside each chapter's “catchy” title reveal an outline that covers the fundamentals: (1) model deployment, (2) ethics, (3) data, (4) predictive modeling, (5) ensemble models, (6) question answering, and (7) uplift modeling. To guide reading assignments, see the diagram under the next question below.

However, this is not written in the formal style of a textbook; rather, I sought to deliver an entertaining, engaging, relevant work that illustrates the concepts largely via anecdotes.

For instructors considering this book for course material, additional resources and information may be found at www.teachPA.com.

How Should I Read this Book?

The chapters of this book build upon one another. Some depend only on first reading the Introduction, but others build cumulatively. The figure below depicts these dependencies—read a chapter only after first reading the one it points up to. For example, Chapter 3 assumes you've already read Chapter 1, which assumes you've read the Introduction.

A flowchart of the dependencies between chapters. Chapter 6 points up to Chapter 5. Chapters 5 and 7 point up to Chapter 4, which points up to Chapter 3, then Chapter 1, then the Introduction. Chapter 2 and the Central Tables also point up to the Introduction.

Dependencies between chapters. An arrow pointing up means, “Read the chapter above first.”
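For readers who think in code, the dependencies described above can be sketched as a small lookup table. This is only an illustrative sketch: the names PREREQ and reading_path are hypothetical, and the table assumes each part of the book points up to exactly one other part, as the figure description suggests.

```python
# Hypothetical table: each part of the book maps to the part it points up to.
# The Introduction has no prerequisite.
PREREQ = {
    "Introduction": None,
    "Chapter 1": "Introduction",
    "Chapter 2": "Introduction",
    "Central Tables": "Introduction",
    "Chapter 3": "Chapter 1",
    "Chapter 4": "Chapter 3",
    "Chapter 5": "Chapter 4",
    "Chapter 7": "Chapter 4",
    "Chapter 6": "Chapter 5",
}


def reading_path(target):
    """Return the parts to read, in order, ending with the target."""
    path = []
    node = target
    # Follow the arrows upward until reaching the Introduction.
    while node is not None:
        path.append(node)
        node = PREREQ[node]
    # The walk collected parts from target back to the start, so reverse it.
    return list(reversed(path))


if __name__ == "__main__":
    print(" -> ".join(reading_path("Chapter 7")))
```

Calling reading_path("Chapter 7") lists the Introduction, then Chapters 1, 3, and 4, before Chapter 7 itself. Chapters 5 and 6 are not on that path and can be skipped.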

Note: If you are reading the e-book version, be sure not to miss the Central Tables (a compendium of 182 mini-case studies), the link for which may be less visibly located toward the end of the table of contents.

What's New in the “Revised and Updated” Edition of Predictive Analytics?

  • The Real Reason the NSA Wants Your Data: Automatic Suspect Discovery. A special sidebar in Chapter 2 (on ethics in PA) presumes—with much evidence—that the National Security Agency considers PA a strategic priority. Can the organization use PA without endangering civil liberties?
  • Dozens of new examples from Facebook, Hopper, Shell, Uber, UPS, the U.S. government, and more. The Central Tables' compendium of mini-case studies has grown to 182 entries, including breaking examples.
  • A much-needed warning regarding bad science. Chapter 3, “The Data Effect,” includes an in-depth section about an all-too-common pitfall and how we avoid it, i.e., how to successfully tap data's potential without being fooled by random noise, ensuring sound discoveries are made.
  • Even more extensive Notes, updated and expanded to 120 pages, now moved online. Now located at www.PredictiveNotes.com, the Notes include citations and comments that pertain to the above new content, as well as updated citations throughout chapters.

Where Can I Learn More After this Book, Such as a How-To for Hands-On Practice?

  • The Hands-On Guide at the end of this book—reading and training options that guide getting started
  • This book's website—videos, articles, and more resources: www.thepredictionbook.com
  • Predictive Analytics World—the leading cross-vendor conference series in North America and Europe, which includes advanced training workshop days and the industry-specific events PAW Business, PAW Government, PAW Healthcare, PAW Financial, PAW Workforce, and PAW Manufacturing: www.pawcon.com
  • The Predictive Analytics Guide—articles, industry portals, and other resources: www.pawcon.com/guide
  • Predictive Analytics Applied—the author's online training workshop, which, unlike this book, is a how-to. Access immediately, on-demand at any time: www.businessprediction.com
  • The Predictive Analytics Times—the premier resource: industry news, technical articles, videos, events, and community: www.predictiveanalyticstimes.com

Preface to the Original Edition

Yesterday is history, tomorrow is a mystery, but today is a gift. That's why we call it the present.

—Attributed to A. A. Milne, Bil Keane, and Oogway, the wise turtle in Kung Fu Panda

People look at me funny when I tell them what I do. It's an occupational hazard.

The Information Age suffers from a glaring omission. This claim may surprise many, considering we are actively recording Everything That Happens in the World. Moving beyond history books that document important events, we've progressed to systems that log every click, payment, call, crash, crime, and illness. With this in place, you would expect lovers of data to be satisfied, if not spoiled rotten.

But this apparent infinity of information excludes the very events that would be most valuable to know of: things that haven't happened yet.

Everyone craves the power to see the future; we are collectively obsessed with prediction. We bow to prognostic deities. We empty our pockets for palm readers. We hearken to horoscopes, adore astrology, and feast upon fortune cookies.

But many people who salivate for psychics also spurn science. Their innate response says “yuck”—it's either too hard to understand or too boring. Or perhaps many believe prediction by its nature is just impossible without supernatural support.

There's a lighthearted TV show I like premised on this very theme, Psych, in which a sharp-eyed detective—a modern-day, data-driven Sherlock Holmesian hipster—has perfected the art of observation so masterfully, the cops believe his spot-on deductions must be an admission of guilt. The hero gets out of this pickle by conforming to the norm: He simply informs the police he is psychic, thereby managing to stay out of prison and continuing to fight crime. Comedy ensues.

I've experienced the same impulse, for example, when receiving the occasional friendly inquiry as to my astrological sign. But, instead of posing as a believer, I turn to humor: “I'm a Scorpio, and Scorpios don't believe in astrology.”

The more common cocktail party interview asks what I do for a living. I brace myself for eyes glazing over as I carefully enunciate: predictive analytics. Most people have the luxury of describing their job in a single word: doctor, lawyer, waiter, accountant, or actor. But, for me, describing this largely unknown field hijacks the conversation every time. Any attempt to be succinct falls flat:

  1. I'm a business consultant in technology. They aren't satisfied and ask, “What kind of technology?”
  2. I make computers predict what people will do. Bewilderment results, accompanied by complete disbelief and a little fear.
  3. I make computers learn from data to predict individual human behavior. Bewilderment, plus nobody wants to talk about data at a party.
  4. I analyze data to find patterns. Eyes glaze over even more; awkward pauses sink amid a sea of abstraction.
  5. I help marketers target which customers will buy or cancel. They sort of get it, but this wildly undersells and pigeonholes the field.
  6. I predict customer behavior, like when Target famously predicted whether you are pregnant. Moonwalking ensues.

So I wrote this book to demonstrate for you why predictive analytics is intuitive, powerful, and awe-inspiring.

I have good news: A little prediction goes a long way. I call this The Prediction Effect, a theme that runs throughout the book. The potency of prediction is pronounced—as long as the predictions are better than guessing. This effect renders predictive analytics believable. We don't have to do the impossible and attain true clairvoyance. The story is exciting yet credible: Putting odds on the future to lift the fog just a bit off our hazy view of tomorrow means pay dirt. In this way, predictive analytics combats risk, boosts sales, cuts costs, fortifies healthcare, streamlines manufacturing, conquers spam, toughens crime fighting, optimizes social networks, and wins elections.

Do you have the heart of a scientist or a businessperson? Do you feel more excited by the very idea of prediction, or by the value it holds for the world?

I was struck by the notion of knowing the unknowable. Prediction seems to defy a law of nature: You cannot see the future because it isn't here yet. We find a workaround by building machines that learn from experience. It's the regimented discipline of using what we do know—in the form of data—to place increasingly accurate odds on what's coming next. We blend the best of math and technology, systematically tweaking until our scientific hearts are content to derive a system that peers right through the previously impenetrable barrier between today and tomorrow.

Talk about boldly going where no one has gone before!

Some people are in sales; others are in politics. I'm in prediction, and it's awesome.