The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliographie: http://dnb.d-nb.de
Stephan Salinger, Lutz Prechelt:
Understanding Pair Programming: The Base Layer
Typeset with LaTeX in Palatino font
Published and printed by:
BoD — Books on Demand, Norderstedt, Germany
www.bod.de
ISBN 978-3-7322-0270-6
© Copyright 2013 by Stephan Salinger and Lutz Prechelt
This work is licensed under a Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)
http://creativecommons.org/licenses/by-nc-nd/4.0/
Note
The PDF version of this book contains very many cross-reference hyperlinks. It may be convenient to use the paper version for learning but then the PDF version for actually working with the base layer.
Acknowledgments
Sincere thanks to Laura Plonka for collecting a large part of our session recordings and for working closely with Stephan in the early stage of our analysis, to Franz Zieris for the first serious third-party use of the base layer, to Franz Zieris, David Socha, and Helen Sharp for their feedback on the book draft, to Gesine Milde for proofreading, and to all the pairs that agreed to be recorded and scrutinized.
… in which we explain what this book is all about, how to best use it, and what notation we will use for the examples.
This book is a handbook for researchers attempting to make sense of what is going on in pair programming sessions; it is based on Stephan Salinger’s Ph.D. dissertation [11]. The present chapter will introduce pair programming (in Section 1.1), summarize what research has so far found out about it (Section 1.2), explain the raw data we have used (Section 1.3) and the research approach we propose (Section 1.4), propose how to make use of the book (Section 1.5), and introduce a few key terms and notations (Section 1.6).
Assume you have a Ph.D. in dancing science and are the only non-programmer at a party full of programmers. According to the stereotype, it is hard to talk to these people. Your best bet would be to grab two or three of them at once and ask
“Is pair programming a good engineering practice?”
The ensuing discussion will be lively, and despite talking to techies, you can play your part in it!
Pair programming is a subtle matter and so any good answer to the question ought to begin with “Well… ”, but (and that is what makes the discussion so lively) many people appear to have a simplified notion of it and a correspondingly clear opinion.
Why is that so? And what, exactly, is pair programming anyway?
Pair programming is an old technique. Fred Brooks (of Mythical Man-Month fame) reports: “Fellow graduate student Bill Wright and I first tried pair programming when I was a grad student (1953–56). We produced 1500 lines of defect-free code; it ran correctly first try.” [15, p.8]. Its modern popularity is largely due to Kent Beck’s 1999 book on eXtreme Programming (XP) [2], a holistic method for small-team software development consisting of twelve practices, a core one of which is pair programming. In the section on pair programming, Beck states “Pair programming really deserves its own book. It’s a subtle skill” [2, p.100], and indeed such a book appeared in 2002: “Pair Programming Illuminated”. It offers the following characterization:
“Pair programming is a style of programming in which two programmers work side by side at one computer, continually collaborating on the same design, algorithm, code, or test. One of the pair, called the driver, is typing at the computer or writing down a design. The other partner, called the navigator, has many jobs, one of which is to observe the work of the driver, looking for tactical and strategic defects.” [15, p.3]
Note that more than half of this definition is concerned with describing the roles of driver and navigator (the latter is now more (Google-)commonly called “observer”). But once you have read this book (or any substantial part of it), you will know that while the first part of the definition is alright, the second part is misleading: The description of both roles is wrong in many respects and the whole driver/observer distinction does not go far in characterizing the pair programming process anyway.1
Kent Beck’s description is shorter: “Pair programming—All production code is written with two programmers at one machine.” [2, p.54]. There is elaboration later, but this is arguably his definition of this all-important practice. The 2004 second edition of the book is more explicit:
“Write all production programs with two people sitting at one machine. Set up the machine so the partners can sit comfortably side-by-side. Move the keyboard and mouse back and forth so you are comfortable while you are typing. Pair programming is a dialog between two people simultaneously programming (and analyzing and designing and testing) and trying to program better.” [3, p.26]
“Sitting comfortably” sounds like trivial information compared to the presumably illuminating driver/observer characterization, but it is relevant. And once you have read the present book, you will appreciate that the above definition captures, very inconspicuously, a key property of pair programming: “Pair programming is a dialog”. Yes!
At this point, we have nothing to add to that.
This leaves the other question: Why do some people have such a simplified (and then strong) notion of whether pair programming is a good engineering practice? The strongest opinions tend to come from the strict opponents: their attitude usually rests on the belief that the obvious cost of pair programming (occupying two precious software developers rather than just one) is so large that no corresponding benefits can possibly outweigh it.
More thoughtful discussants will not readily agree because the list of potential benefits is impressive. Here is (in paraphrased form) the one presented in “Pair Programming Illuminated” [15, p.4]:
How much do we know about which of these are true and to what degree? Not much.
Since Nosek’s pioneering 1998 study [10] (which appeared even before Kent Beck’s book), there have been many empirical studies on pair programming, in particular controlled experiments comparing it to solo programming, but the amount of knowledge produced by these studies is not large; an overview of research until 2007 is provided by Hannay et al. [8]. We do not aim at a detailed overview here. Roughly speaking, there is good evidence that pairs tend to be faster than solo programmers, some evidence that their work tends to have fewer defects, and beginning evidence that the designs produced are better. The size of each of these effects, however, is hardly understood: The results of individual studies differ so much (and those differences remain unexplained) that, taken together, the results are inconclusive.
What is worse, their validity is highly questionable, as the conditions under which most of them were created are highly unrealistic: mostly nonprofessional programmers, normally non-gelled pairings, usually either development from scratch or work on fairly small programs, generally little or no relevance of domain knowledge. Even the most ambitious of the controlled experiments, which hired 295 professionals for one day, concluded: “It is possible that the benefits of pair programming will exceed the results obtained in this experiment for larger, more complex tasks and if the pair programmers have a chance to work together over a longer period of time.” [1]. This statement is also one of the few exceptions to the disturbing tendency that most studies tacitly assume there is no such thing as a specific pair programming skill distinct from general software development skill. We believe that this assumption is wrong and that successful pair programming research needs to reflect that. This implies that a lot of qualitative pair programming research will be required before meaningful designs for quantitative pair programming studies can even be formulated.
For such qualitative types of research questions, the amount of work done so far is much smaller,3 although the number of questions is larger: There is evidence that different capability levels of the pair members play a role [5] and some evidence that personality characteristics of the pair members may play a modest role, too [9]. Only a few studies discuss high-level behaviors or mechanisms, and those do not do much decomposition or analysis yet, e.g. [6], or are even based on anecdotal evidence only, e.g. [16].
In our view, the most conclusive of the qualitative studies showed that the description of the driver and navigator roles from the above definition does not represent reality: Rather than working on different levels of abstraction (low and high for the navigator versus medium for the driver) as the definition assumes, the partners in fact strongly tend to move through these abstraction levels together [4, 6]. Work towards a more meaningful roles model is still in its infancy [13].
The results and all examples presented in this book are based on complete recordings of individual pair programming sessions. The recordings consist of audio, a pixel-precise recording of all screen activity, and a webcam recording of the pair (usually recorded from atop the monitor). We use Techsmith Camtasia Studio4 for recording and place the webcam video into the lower-right corner of the screen video. See [12] for a few more details.
We possess a substantial collection of such recordings, typically one to three hours in length. 55 recordings stem from pairs of 48 different volunteer industrial software developers (called A1 to K4, see session descriptions below) doing their normal work in their usual environment (domain, code, task, tools, hardware, office, etc.) in one of 11 different companies (called A to K). The reality distortion of these videos is presumably negligible; the pairs do not show (nor report when interviewed afterwards) any acute awareness of being recorded beyond a minute into their work. The videos reflect a variety of domains, developer constellations, and task types; most tasks can be subsumed under extension programming. They reflect only small cultural variety, though: All sessions are from German companies and involve German-speaking developers. See the note on translation in Section 1.6.
A further 28 recordings stem from pairs of 56 different volunteer graduate students (called Z1 to Z56) working in one of 5 different controlled laboratory settings (called ZA to ZE). The advantage of these recordings is that the researcher has a good understanding of the code base, the task, and correct solutions for the task, which often makes it much easier to understand what is really going on in the session.
Only 7 of these recordings (6 professional/industrial and 1 student/laboratory) were used for the research reflected here. For the concepts reported here, we reached theoretical saturation (see Section 1.4.4) with only this many.5 For the examples presented in this book, we even confine ourselves to only three of these sessions, so that over time you can get better acquainted with their respective topics; some of the examples are even related to one another, which aids understanding. These three sessions are the following:
An industrial session (with a duration of 1:47 hours) of two professional programmers B1 and B2 who worked for a large community portal operator B and had paired several times before. They built an extension to the community portal, which is implemented in PHP. The task difficulty had several aspects including understanding the design and design rationale of the pre-existing code, which had been written by nearshore programmers.
An industrial session (duration 1:16 hours) of two professional programmers C2 and C5 who worked for a software product company C. The product they worked on is a geographic information system (GIS) desktop GUI application written in Java. The design of this software uses abstraction elaborately; the task involved a small functional extension, and its main difficulty lay in understanding and properly applying the existing design abstractions.
A laboratory session (duration 2:58 hours) of two graduate students Z19 and Z20 who had worked together as a pair several times before. They built a small extension to a cleanly designed Java EE web shop system with which they were modestly familiar. The main task difficulty lay in the need to apply certain Java EE technologies (JMS, JNDI, JBoss application server) that the developers had learned about in a recent graduate course but had not applied often beforehand.
The purpose of this book is to lay the groundwork for a stream of research aiming at thoroughly understanding pair programming. We will now explain why we believe this is relevant from the perspective of basic software engineering research (Section 1.4.1) as well as from a practitioner perspective (1.4.2), what the overall architecture of this research will look like (1.4.3), which specific research method we suggest using primarily (1.4.4), and what the benefits are with respect to science’s principle of knowledge accumulation (“standing on the shoulders of giants”, Section 1.4.5).
Several decades after research began that attempted to understand what is going on in the activity we call “programming”, this understanding is still very much in its infancy. Pair programming provides a wonderful opportunity for making a lot of progress there, because rather than having to rely on artificial think-aloud data gathering techniques, pair programmers verbalize naturally much of the time.
Pair programming will surely be different from solo programming in many respects, but probably also fundamentally similar. And while think-aloud studies may occasionally be possible even in industrial work contexts, they tend to be difficult to arrange. In comparison, pair programming data can be gathered more easily and almost noninvasively in industrial work contexts on real work tasks; see Section 1.3. This whole basic research aspect, however, is more of a fringe benefit than the core reason why we started this line of work.
Our overall research goal is to understand the mechanisms of pair programming sufficiently well to provide practitioners with detailed advice regarding (a) in which situations to use pair programming and (b) how pair members might behave to make pair programming effective, smooth, and efficient.
The basic idea for achieving this is to understand many sub-behaviors at work within pair programming and formulate this understanding into one or more patterns or antipatterns of behavior for each. This research will be almost purely qualitative; better quantitative research can then be started based on this differentiated and advanced understanding.
The goals described in Sections 1.4.1 and 1.4.2 are far too ambitious for a single research project; the work needs to be modularized somehow. This, however, will not be easy: Initially, many fundamentals need to be understood before even the first few useful patterns will emerge. Later on, of the various topics studied, many will be interdependent or at least layered on top of each other.
Our overall approach is therefore to first lay a foundation of elementary concepts useful for analyzing and understanding pair programming sessions. This is what the current book is about. We call this foundation the base layer. It consists of a set of base concepts (surprisingly called the base concept set and introduced in Chapters 3 to 20) and rules for its use (Chapter 21) and extension (Chapter 22).
On top of this foundation, a subsequent study of some pair programming topic X (such as “decision-making”) can then build an X-layer of concepts that together characterize X. While working on the X-layer, the study can make use of the base layer and of the concepts found in studies performed earlier on other topics A, B, C (say, “pair programming roles” and others). If, for understanding X, some other topic Y (say, “knowledge transfer”) is relevant, the study on X will obtain the minimal understanding of Y it requires internally but need not work Y out fully.
Once the study of Y has been performed later (which may also use the X-layer fully), the X-layer can be consolidated into also using the Y-layer. This will break the layering for the overall results (pair programming is a holistic activity after all!), but still keeps a convenient mostly-layered work style for the individual sub-studies.
Each such study may provide a number of behavioral patterns and antipatterns. The role of the base layer is special because it provides common terminology that not only jumpstarts but also connects the other studies such as to form a whole rather than a set of separate pieces. The number of concepts in the base layer is sufficiently small to allow the various researchers to stay on top of them, so there are good chances of actual (near-)consistency between studies even of different researchers rather than only formal pseudo-consistency.
When we started with this work, we felt that many of the common statements made about pair programming were likely misleading or at least naive, but we had no expectation of what a better characterization would be like. We shared Kent Beck’s view that pair programming is “a subtle skill”. So once we had made the decision to analyze session recordings such as those described in Section 1.3, we had no idea which aspects of them would be relevant: The dialog content? Its wording? Phrasing? Intonation? Screen content? Changes of screen content? Human activity on the computer? Gestures? Facial expressions? The list went on and on. We quickly decided it would be important to pick a research method that was as empty of assumptions as possible.
Ethnographic approaches are rather far away from software engineering thinking, so we settled on Grounded Theory Methodology (GTM) [14] as our basic research approach. We selected the Straussian variety because we expect its higher degree of structure to be more appealing to software engineers than the Glaser style, and we believe that both varieties, if understood correctly, lead to similarly valuable results.
We will not give a primer on Grounded Theory Methodology here. If you have not used GTM before, you might want to get a textbook about it and work through it; there are a number of such books. The Strauss/Corbin book (or its second edition, but preferably not the third) is a possibility, although other books may be easier to work with. In a nutshell, GTM suggests working as follows:
• GTM work aims at a conceptual explanation (theory) of some phenomenon of interest for which each element of the explanation (called a concept or category; we will only use the former term) is directly connected to one or more raw observations (grounding).
• Formulate your research interest. In our case this was “Define the elementary behaviors which constitute pair programming.”6 The research question is allowed to drift freely during GTM work.
• Obtain some observation data. In our case this was the first handful of session recordings. GTM work requires neither pre-planning the data collection nor achieving any kind of representativeness. Additional data will be collected once the researcher has found out for which sub-phenomena more data are needed (theoretical sampling). For instance, a study of knowledge transfer in pair programming might find that the general knowledge level difference within the pair appears to be highly relevant. If no recording of an expert working with a true novice is yet available, the researcher would look for such a context and make a recording there. Representativeness is not required because GTM results focus on explaining things that exist, not on making claims about their frequency.
• Work through the observation data and attach labels to phenomena that appear “interesting” with respect to the research focus. You need theoretical sensitivity to select relevant phenomena and appropriate labels. The phenomenon can be anything and of any granularity. Each label is the name of a (preliminary) concept; see Chapters 4 to 20 for examples. It is meant to be reused in several places in the data. Each concept is chosen so as to help explain some aspect of the phenomenon (theoretical coding). This process is called open coding.
• When assigning the same label again, make sure the phenomena are similar so that you will obtain a consistent concept. To do so, compare to all previous annotations of this concept (constant comparison), determine the commonalities, and record them in a memo. Make sure your concept assignment is fully grounded, that is, is based only on phenomena actually present in your data, not on any prior knowledge you might have (or rather: assume). The ungrounded use of any prior assumption when assigning a concept is called forcing.
• If the differences between phenomena annotated with the same concept appear relevant, represent them by auxiliary concepts: attributes (properties) and attribute values (also properties); apply constant comparison and memoing to them as well. This process is called dimensionalization. Avoid forcing.
• If you have accumulated enough isolated concepts, start discovering relevant relationships between concepts and validate them for specific phenomena. The relationships may pertain to context factors, constraints, causes, effects, the actor’s strategies, etc. This process is called axial coding and should also involve constant comparison as well as a lot of memoing. Avoid forcing. Meanwhile, open coding continues as well.
• If you have accumulated enough relationships, determine the core of the subject matter and extract those concepts around it that allow you to formulate a narrative (grounded theory) explaining what is going on around this core concept. This process is called selective coding. Beware of forcing! Selective coding can start as soon as you have the first idea for it and should start no later than when you find you are only detecting known concepts, not creating new ones (theoretical saturation). Selective coding will often point out gaps in your conceptualization and hence trigger theoretical sampling, in particular if you start it early.
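To make the bookkeeping behind open coding, memoing, and constant comparison concrete, the data involved can be sketched as a small data model. This is purely an illustrative sketch, not the tooling we actually use (we annotate directly in ATLAS.ti, see below); all class, field, and concept names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One label attached to a phenomenon in the raw data (open coding)."""
    concept: str      # name of the (preliminary) concept
    session: str      # which recording the phenomenon occurs in
    start_s: float    # start of the phenomenon in the recording, in seconds
    end_s: float      # end of the phenomenon, in seconds
    note: str = ""    # what exactly was observed

@dataclass
class Concept:
    """A concept with its memo and its dimensionalization (attributes)."""
    name: str
    memo: str = ""    # recorded commonalities of the annotated phenomena
    attributes: dict[str, set[str]] = field(default_factory=dict)

class Codebook:
    """Holds all concepts and annotations of one analysis."""
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self.annotations: list[Annotation] = []

    def annotate(self, ann: Annotation) -> list[Annotation]:
        """Record an annotation and return all previous annotations of the
        same concept, so the analyst can compare them (constant comparison)
        and update the concept's memo."""
        previous = [a for a in self.annotations if a.concept == ann.concept]
        self.concepts.setdefault(ann.concept, Concept(ann.concept))
        self.annotations.append(ann)
        return previous
```

In such a model, each call to `annotate` hands back the earlier phenomena labeled with the same concept, which is exactly the material constant comparison operates on.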
Working in this manner (with mostly open coding, some dimensionalization, a little axial coding, and no selective coding) and considering all of the abovementioned aspects of the data, we were initially totally overwhelmed by the amount of information residing in our recordings. To cope with this, we developed several additions to plain GTM (see [12] for details), in particular:
• A perspective on the data: GTM suggests initially conceptualizing “everything” that may be of relevance and only starting to focus on fewer concepts during selective coding. This approach does not work for data as rich as ours with a research question as open as ours. We decided early on that we would need to constrain ourselves to behavioristic concepts as much as possible (see Section 2.3.4 for details) and soon thereafter to conceptualize verbal interaction in far more detail than other behaviors (see Section 2.3.1).
• Structured concept names to further constrain and structure the applicable concept universe in order to make it manageable. See Section 2.1.1 and Section 3.1 for details.
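For illustration, a naming discipline of this kind can even be checked mechanically. The scheme sketched below (a name consisting of two parts drawn from small fixed vocabularies) is merely an assumed example of such structuring; both the pattern and the vocabularies are hypothetical, not the book’s actual scheme (which Section 3.1 introduces).

```python
import re

# Hypothetical vocabularies; any real verb and object lists would come
# from the base concept set itself.
VERBS = {"propose", "agree", "ask", "explain", "decide", "disagree"}
OBJECTS = {"step", "design", "knowledge", "finding", "completion"}

NAME_PATTERN = re.compile(r"^([a-z]+)_([a-z]+)$")

def is_valid_concept_name(name: str) -> bool:
    """Accept only names of the form <verb>_<object> whose parts are
    taken from the fixed vocabularies above."""
    m = NAME_PATTERN.match(name)
    return bool(m) and m.group(1) in VERBS and m.group(2) in OBJECTS
```

The point of such a constraint is not the validation itself but that it shrinks the universe of admissible concept names to a manageable, systematically searchable set.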
• Pair conceptualizing: Doing GTM in pairs (which we originally called pair coding) helps to quickly weed out or improve inadequate conceptualizations, in particular early in a study when the concept set is still small and hence open to a multitude of possible additions, including additions that lead astray. This practice can save inordinate amounts of time and frustration.
• Furthermore, most GTM books recommend transcribing the data, but adequate transcription of hour-long audio/video data as fine-grained and feature-rich as ours is hardly practical. So we annotate these data directly (without transcription) in the ATLAS.ti7 data analysis software.
When doing GTM, knowing a lot about your phenomenon in advance is a mixed blessing: On the one hand, such prior knowledge can greatly enhance your theoretical sensitivity and hence speed up the research process a lot. On the other hand, it can lead to forcing and thus ruin the validity of your results if you are not careful.
all is data.