This edition first published 2014
© 2014 John Wiley & Sons Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Dictionary of computer vision and image processing / R. B. Fisher, T. P. Breckon,
K. Dawson-Howe, A. Fitzgibbon, C. Robertson, E. Trucco, C. K. I. Williams. – 2nd edition.
pages cm
Includes bibliographical references.
ISBN 978-1-119-94186-6 (pbk.)
1. Computer vision–Dictionaries. 2. Image processing–Dictionaries. I. Fisher, R. B.
TA1634.I45 2014
006.3′703–dc23
2013022869
A catalogue record for this book is available from the British Library.
ISBN: 9781119941866
From Bob to Rosemary,
Mies, Hannah, Phoebe
and Lars
From Toby to Alison, my
parents and Amy
From Ken to Jane,
William and Susie
From AWF to Liz, to my
parents, and again to D
From Craig to Karen,
Aidan and Caitlin
From Manuel to Emily,
Francesca, and Alistair
Preface
This dictionary arose out of a continuing interest in the resources needed by students and researchers in the fields of image processing, computer vision and machine vision (however you choose to define these overlapping fields). As instructors and mentors, we often found that beginners were confused about what various terms and concepts mean. To support these learners, we have tried to define the key concepts that a competent generalist should know about these fields.
This second edition adds approximately 1000 new terms to the more than 2500 terms in the original dictionary. We have chosen new terms that have entered reasonably common usage (e.g., those which have appeared in the index of influential books) and terms that were not included originally. We are pleased to welcome Toby Breckon and Chris Williams into the authorial team and to thank Andrew Fitzgibbon and Manuel Trucco for all their help with the first edition.
One innovation in the second edition is the addition of reference links for a majority of the old and new terms. Unlike more traditional dictionaries, which provide references to establish the origin or meaning of the word, our goal here was instead to provide further information about the term.
Another innovation is to include a few videos for the electronic version of the dictionary.
This is a dictionary, not an encyclopedia, so the definitions are necessarily brief and are not intended to replace a proper textbook explanation of the term. We have tried to capture the essentials of the terms, with short examples or mathematical precision where feasible or necessary for clarity.
Further information about many of the terms can be found in the references. Many of the references are to general textbooks, each providing a broad view of a portion of the field. Some of the concepts are quite recent; although commonly used in research publications, they may not yet have appeared in mainstream textbooks. Consequently, this book is also a useful source for recent terminology and concepts. Some concepts are still missing from the dictionary, but we have scanned textbooks and the research literature to find the central and commonly used terms.
The dictionary was intended for beginning and intermediate students and researchers, but as we developed the dictionary it became clear that we, too, had some confusion and vague understanding of the concepts. It surprised us that some terms had multiple usages. To improve quality and coverage, each definition was reviewed during development by at least two people besides its author. We hope that this has caught any errors and vagueness, as well as provided alternative meanings. Each of the co-authors is quite experienced in the topics covered here, but it was still educational to learn more about our field in the process of compiling the dictionary. We hope that you find using the dictionary equally valuable.
To help the reader, terms appearing elsewhere in the dictionary are underlined in the definitions. We have tried to be reasonably thorough about this, but some terms, such as 2D, 3D, light, camera, image, pixel, and color were so commonly used that we decided not to cross-reference all of them.
We have tried to be consistent with the mathematical notation: italics for scalars (s), arrowed italics for points and vectors (v⃗), and bold for matrices (M).
The authors would like to thank Xiang (Lily) Li, Georgios Papadimitriou, and Aris Valtazanos for their help with finding citations for the content from the first edition. We also greatly appreciate all the support from the John Wiley & Sons editorial and production team!
Numbers
1D: One dimensional, usually in reference to some structure. Examples include: a signal x(t) that is a function of time t; the dimensionality of a single property value; and one degree of freedom in shape variation or motion. [Hec87:2.1]
1D projection: The projection of data from a higher dimensional space onto a one-dimensional representation (a line).
1-norm: A specific case of the p-norm: the sum of the absolute values of the entries of a given vector x⃗ of length n, ‖x⃗‖₁ = |x₁| + ⋯ + |xₙ|. Also known as the taxicab (Manhattan) norm or the L1 norm. [Sho07]
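A minimal sketch of the computation in Python (the example vector is arbitrary):

```python
# Computing the 1-norm (taxicab/Manhattan norm) of a vector with NumPy.
import numpy as np

v = np.array([3.0, -4.0, 1.5])
l1 = np.sum(np.abs(v))                 # |3| + |-4| + |1.5| = 8.5
assert l1 == np.linalg.norm(v, ord=1)  # NumPy's built-in L1 norm agrees
```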
2D: Two dimensional. A space describable using any pair of orthogonal basis vectors consisting of two elements. [WP: Two-dimensional_space]
2D coordinate system: A system uniquely associating two real numbers to any point of a plane. First, two intersecting lines (axes) are chosen on the plane, usually perpendicular to each other. The point of intersection is the origin of the system. Second, metric units are established on each axis (often the same for both axes) to associate numbers to points. The coordinates Px and Py of a point, P, are obtained by projecting P onto each axis in a direction parallel to the other axis and reading the numbers at the intersections. [JKS95:1.4]
2D Fourier transform: A special case of the general Fourier transform often used to find structures in images. [FP03:7.3.1]
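As an illustration, the transform is available directly in NumPy; the random image below is only a placeholder:

```python
# A minimal sketch: 2D discrete Fourier transform of an image with NumPy.
import numpy as np

img = np.random.rand(64, 64)      # placeholder for a grayscale image
F = np.fft.fft2(img)              # 2D discrete Fourier transform
mag = np.abs(np.fft.fftshift(F))  # magnitude with zero frequency centered
```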
2D image: A matrix of data representing samples taken at discrete intervals. The data may be from a variety of sources and sampled in a variety of ways. In computer vision applications, the image values are often encoded color or monochrome intensity samples (e.g., 8-bit values in the range 0–255) taken by digital cameras but may also be range data. [SQ04:4.1.1]
2D input device: A device for sampling light intensity from the real world into a 2D matrix of measurements. The most popular two-dimensional imaging device is the charge-coupled device (CCD) camera. Other common devices are flatbed scanners and X-ray scanners. [SQ04:4.2.1]
2D point: A point in a 2D space, i.e., characterized by two coordinates; most often, a point on a plane, e.g., an image point in pixel coordinates. Notice, however, that two coordinates do not necessarily imply a plane: a point on a 3D surface can be expressed either in 3D coordinates or by two coordinates given a surface parameterization (see surface patch). [JKS95:1.4]
2D point feature: Localized structures in a 2D image, such as interest points, corners and line meeting points (e.g., X, Y and T shaped). One detector for these features is the SUSAN corner finder. [TV98:4.1]
2D pose estimation: A special case of 3D pose estimation. A fundamental open problem in computer vision where the correspondence between two sets of 2D points is found. The problem is defined as follows: given two sets of points {x⃗ⱼ} and {y⃗ₖ}, find the Euclidean transformation (the pose) and the match matrix {Mⱼₖ} (the correspondences) that best relate them. A large number of techniques have been used to address this problem, e.g., tree-pruning methods, the Hough transform and geometric hashing. [HJL+89]
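If the correspondences are assumed known (the hard part of the problem above), the optimal Euclidean transformation has a closed-form least-squares solution; a sketch using the SVD-based Procrustes/Kabsch method (the function name is ours):

```python
# Recover the 2D Euclidean transformation (rotation R, translation t)
# mapping corresponding points X onto Y in the least-squares sense.
# Assumes correspondences are known; a sketch, not a full solver.
import numpy as np

def estimate_pose_2d(X, Y):
    """X, Y: (n, 2) arrays of corresponding 2D points."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)               # 2x2 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t                             # so that y ≈ R @ x + t
```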
2D projection: A transformation mapping a higher dimensional space onto two-dimensional space. The simplest method is to discard the higher dimensional coordinates, although generally a viewing position is chosen and points are projected relative to it.
For example, the main steps for a computer graphics projection are as follows: apply normalizing transform to 3D point world coordinates; clip against canonical view volume; project onto projection plane; transform into viewport in 2D device coordinates for display. Commonly used projection functions are parallel projection and perspective projection. [JKS95:1.4]
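A minimal sketch of the two projection functions named above, assuming a camera at the origin looking along the +z axis with focal length f:

```python
# Perspective and parallel (orthographic) projection of 3D points.
import numpy as np

def perspective_projection(P, f=1.0):
    """P: (n, 3) array of 3D points; returns (n, 2) image coordinates."""
    return f * P[:, :2] / P[:, 2:3]   # (f X / Z, f Y / Z)

def parallel_projection(P):
    """The simplest projection: discard the z coordinate."""
    return P[:, :2]
```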
2D shape descriptor (local): A compact summary representation of object shape over a localized region of an image. See shape descriptor. [Blu67]
2D shape representation (global): A compact summary representation of image shape features over the entire image. See shape representation. [FP03:28.3]
2D view: Planar aspect view or planar projected view (such as an image under perspective projection) such that positions within its spatial representation can be indexed in two dimensions. [SB11:2.3.1]
2.1D sketch: A lesser variant of the established 2.5D sketch, which captures the relative depth ordering of (possibly self-occluding) scene regions in terms of their front-to-back relationship within the scene. By contrast, the 2.5D sketch captures the relative scene depth of regions, rather than merely depth ordering. [NM90]
2.5D image: A range image obtained by scanning from a single viewpoint. It allows the data to be represented in a single image array, where each pixel value encodes the distance to the observed scene. The reason this is not called a 3D image is to make explicit the fact that the back sides of the scene objects are not represented. [SQ04:4.1.1]
2.5D model: A geometric model representation corresponding to the 2.5D image representation used in the model to (image) data matching problem of model-based recognition. [Mar82]
2.5D sketch: Central structure of Marr’s Theory of vision. An intermediate description of a scene indicating the visible surfaces and their arrangement with respect to the viewer. It is built from several different elements: the contour, texture and shading information coming from the primal sketch, stereo information and motion. The description is theorized to be a kind of buffer where partial resolution of the objects takes place. The name 2.5D sketch stems from the fact that, although local changes in depth and discontinuities are well resolved, the absolute distance to all scene points may remain unknown. [FP03:11.3.2]
3D: Three dimensional. A space describable using any triple of mutually orthogonal basis vectors consisting of three elements. [WP: Three-dimensional_space]
3D coordinate system: Same as 2D coordinate system but in three dimensions. [JKS95:1.4]
3D data: Data described in all three spatial dimensions. See also range data, CAT and NMR. [WP: 3D_data_acquisition_and_object_reconstruction]
3D data acquisition: Sampling data in all three spatial dimensions. There is a variety of ways to perform this sampling, e.g., using structured light triangulation. [FP03:21.1]
3D image: See range image.
3D imaging: Any of a class of techniques that obtain three-dimensional information using imaging equipment. Active vision techniques generally include a source of structured light (or other electromagnetic or sonar radiation) and a sensor, such as a camera or a microphone. Triangulation and time-of-flight computations allow the distance from the sensor system to be computed. Common technologies include laser scanning, texture projection systems and moiré fringe methods. Passive sensing in 3D depends only on external (and hence unstructured) illumination sources. Examples of such systems are stereo reconstruction and shape from focus techniques. See also 3D surface imaging and 3D volumetric imaging. [FMN+91]
3D interpretation: A 3D model, e.g., a solid object that explains an image or a set of image data. For instance, a certain configuration of image lines can be explained as the perspective projection of a polyhedron; in simpler words, the image lines are the images of some of the polyhedron’s lines. See also image interpretation. [BB82:9.1]
3D model: A description of a 3D object that primarily describes its shape. Models of this sort are regularly used as exemplars in model-based recognition and 3D computer graphics. [TV98:10.6]
3D model-based tracking: An extension of model-based tracking using a 3D model of the tracked object. [GX11:5.1.4]
3D moments: A special case of moment where the data comes from a set of 3D points. [GC93]
3D motion estimation: An extension of motion estimation whereby the motion is estimated as a displacement vector in ℝ³. [LRF93]
3D motion segmentation: An extension to motion segmentation whereby motion is segmented within an ℝ³ dataset. [TV07]
3D object: A subset of ℝ³. In computer vision, often taken to mean a volume in ℝ³ that is bounded by a surface. Any solid object around you is an example: table, chairs, books, cups; even yourself. [BB82:9.1]
3D point: An infinitesimal volume of 3D space. [JKS95:1.4]
3D point feature: A point feature on a 3D object or in a 3D environment. For instance, a corner in 3D space. [RBB09]
3D pose estimation: The process of determining the transformation (translation and rotation) of an object in one coordinate frame with respect to another coordinate frame. Generally, only rigid objects are considered; models of those objects exist a priori and we wish to determine the position of the object in an image on the basis of matched features. This is a fundamental open problem in computer vision where the correspondence between two sets of 3D points is found. The problem is defined as follows: given two sets of points {x⃗ⱼ} and {y⃗ₖ}, find the parameters of a Euclidean transformation (the pose) and the match matrix {Mⱼₖ} (the correspondences) that best relate them. Assuming the points correspond, they should match exactly under this transformation. [TV98:11.2]
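A common practical variant recovers an object's pose from matched 3D model points and their 2D image projections; a sketch using OpenCV's solvePnP, in which the model points, ground-truth pose and intrinsics K are invented for illustration:

```python
# Recover object rotation and translation from 3D-2D correspondences.
import numpy as np
import cv2

model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],    # six non-coplanar
                  [0, 0, 1], [1, 1, 0], [1, 0, 1]], dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# Synthesize image points from a known pose, then recover that pose:
rvec_true = np.array([0.1, -0.2, 0.05])   # axis-angle rotation
tvec_true = np.array([0.2, -0.1, 5.0])    # translation
image_pts, _ = cv2.projectPoints(model, rvec_true, tvec_true, K, None)

ok, rvec, tvec = cv2.solvePnP(model, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix; tvec is the translation
```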
3D reconstruction: The recovery of 3D scene information and organization into a 3D shape via, e.g., multi-view geometry. [HZ00:Ch. 10]
3D shape descriptor: An extension to regular shape descriptor approaches to consider object shape in ℝ³. [Pri12:Ch. 17]
3D shape representation: A compact summary representation of shape extending shape representation to consider object shape in ℝ³. [Pri12:Ch. 17]
3D SIFT: A 3D extension of the SIFT operator defined for use over voxel data. [FBM10]
3D skeleton: A 3D extension of an image skeleton defining a tree-like structure of the medial axes of a 3D object (akin to the form of a human stick figure in the case of considering a person as a 3D object). See also medial axis skeletonization. [Sze10:12.6]
3D stratigraphy: A modeling and visualization tool used to display different underground layers. Often used for visualizations of archaeological sites or for detecting rock and soil structures in geological surveying. [PKVG00]
3D structure recovery: See 3D reconstruction.
3D SURF: A 3D extension to the SURF descriptor that considers the characterization of local image regions in ℝ³ via either a volumetric voxel-based or a surface-based representation. [KPW+10]
3D surface imaging: Obtaining surface information embedded in a 3D space. See also 3D imaging and 3D volumetric imaging.
3D texture: The appearance of texture on a 3D surface when imaged, e.g., the fact that the density of texels varies with distance because of perspective effects. 3D surface properties (e.g., shape, distances and orientation) can be estimated from such effects. See also shape from texture and texture orientation. [DN99]
3D vision: A branch of computer vision dealing with characterizing data composed of 3D measurements. This may involve segmentation of the data into individual surfaces that are then used to identify the data as one of several models. Reverse engineering is a specialism in 3D vision. [Dav90:16.2]
3D volumetric imaging: Obtaining measurements of scene properties at all points in a 3D space, including the insides of objects. This is used for inspection but more commonly for medical imaging. Techniques include nuclear magnetic resonance, computerized tomography, positron emission tomography and single photon emission computed tomography. See also 3D imaging and 3D surface imaging.
4 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to the four neighboring pixels that share a common crack edge, i.e., the pixels directly above, below, left and right of it. See also 8 connectedness; a labeling sketch contrasting the two follows that entry. [SQ04:4.5]
4D approach: An approach or solution to a given problem that utilizes both 3D-spatial and temporal information. See 4D representation (3D-spatial + time).
4D representation (3D-spatial + time): A 3D time-series data representation whereby 3D scene information is available over a temporal sequence. An example would be a video sequence obtained from stereo vision or some other form of depth sensing. [RG08:Ch. 2]
8 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to all eight neighboring pixels, i.e., the four 4-connected neighbors plus the four diagonal neighbors. See also 4 connectedness. [SQ04:4.5]
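A minimal sketch contrasting the two definitions: the same binary image splits into three 4-connected regions but forms a single 8-connected one (scipy.ndimage is assumed available):

```python
# Label connected regions under 4- and then 8-connectedness.
import numpy as np
from scipy import ndimage

img = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]])                            # a diagonal of pixels

four = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])     # shared crack edges
_, n4 = ndimage.label(img, structure=four)             # n4 == 3 regions
_, n8 = ndimage.label(img, structure=np.ones((3, 3)))  # n8 == 1 region
```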
8-point algorithm: An approach for the recovery of the fundamental matrix using a set of eight feature point correspondences for stereo camera calibration. [HZ00:11.2]
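A sketch via OpenCV's implementation (the eight matches below are random placeholders so the snippet runs; real use requires genuine correspondences between the two views):

```python
# Estimate the fundamental matrix from eight point correspondences.
import numpy as np
import cv2

pts1 = (np.random.rand(8, 2) * 640).astype(np.float32)  # placeholder matches
pts2 = pts1 + np.random.rand(8, 2).astype(np.float32)   # placeholder matches
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)
# F is the 3x3 matrix satisfying x2^T F x1 = 0 for corresponding points
```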