Contents

Cover

Title Page

Copyright Page

Dedication

Preface

Numbers

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

References

Supplemental Images

Dedication

From Bob to Rosemary,
Mies, Hannah, Phoebe
and Lars


From Toby to Alison, my
parents and Amy


From Ken to Jane,
William and Susie


From AWF to Liz, to my
parents, and again to D


From Craig to Karen,
Aidan and Caitlin


From Manuel to Emily,
Francesca, and Alistair

Preface

This dictionary arose out of a continuing interest in the resources needed by students and researchers in the fields of image processing, computer vision and machine vision (however you choose to define these overlapping fields). As instructors and mentors, we often encountered confusion among beginners about what various terms and concepts mean. To support these learners, we have tried to define the key concepts that a competent generalist should know about these fields.

This second edition adds approximately 1000 new terms to the more than 2500 terms in the original dictionary. We have chosen new terms that have entered reasonably common usage (e.g., those which have appeared in the index of influential books) and terms that were not included originally. We are pleased to welcome Toby Breckon and Chris Williams into the authorial team and to thank Andrew Fitzgibbon and Manuel Trucco for all their help with the first edition.

One innovation in the second edition is the addition of reference links for a majority of the old and new terms. More traditional dictionaries provide references to establish the origin or meaning of a word; our goal here is instead to point the reader to further information about the term.

Another innovation is the inclusion of a few videos in the electronic version of the dictionary.

This is a dictionary, not an encyclopedia, so the definitions are necessarily brief and are not intended to replace a proper textbook explanation of the term. We have tried to capture the essentials of the terms, with short examples or mathematical precision where feasible or necessary for clarity.

Further information about many of the terms can be found in the references. Many of the references are to general textbooks, each providing a broad view of a portion of the field. Some of the concepts are quite recent; although commonly used in research publications, they may not yet have appeared in mainstream textbooks. Consequently, this book is also a useful source for recent terminology and concepts. Some concepts are still missing from the dictionary, but we have scanned textbooks and the research literature to find the central and commonly used terms.

The dictionary was intended for beginning and intermediate students and researchers, but as we developed it, it became clear that we, too, had some confusion and vague understanding of several concepts. It surprised us that some terms had multiple usages. To improve quality and coverage, each definition was reviewed during development by at least two people besides its author. We hope that this has caught the errors and vagueness, as well as recorded alternative meanings. Each of the co-authors is quite experienced in the topics covered here, but it was still educational to learn more about our field in the process of compiling the dictionary. We hope that you find using the dictionary equally valuable.

To help the reader, terms appearing elsewhere in the dictionary are underlined in the definitions. We have tried to be reasonably thorough about this, but some terms, such as 2D, 3D, light, camera, image, pixel, and color were so commonly used that we decided not to cross-reference all of them.

We have tried to be consistent with the mathematical notation: italics for scalars ($s$), arrowed italics for points and vectors ($\vec{v}$), and bold for matrices ($\mathbf{M}$).

The authors would like to thank Xiang (Lily) Li, Georgios Papadimitriou, and Aris Valtazanos for their help with finding citations for the content from the first edition. We also greatly appreciate all the support from the John Wiley & Sons editorial and production team!

Numbers

1D: One dimensional, usually in reference to some structure. Examples include: a signal x(t) that is a function of time t; the dimensionality of a single property value; and one degree of freedom in shape variation or motion. [Hec87:2.1]
1D projection: The projection of data from a higher dimensional space onto a one-dimensional representation (a line).
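
For illustration, a minimal NumPy sketch (variable names are ours) that projects 2D points onto a line through the origin, yielding one coordinate per point:

import numpy as np

# Project 2D points onto a line through the origin with unit direction u;
# the result is a single (signed) coordinate along the line per point.
points = np.array([[2.0, 1.0], [0.0, 3.0], [-1.0, 1.0]])
u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # assumed direction of the line
coords_1d = points @ u                    # 1D representation of the data
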
1-norm: A specific case of the p-norm: the sum of the absolute values of the entries of a given vector $\vec{x}$ of length $n$, i.e., $\|\vec{x}\|_1 = \sum_{i=1}^{n} |x_i|$. Also known as the taxicab (Manhattan) norm or the L1 norm. [Sho07]
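
A minimal NumPy sketch of this definition (function name ours):

import numpy as np

def one_norm(x):
    # Sum of the absolute values of the entries (taxicab / L1 norm).
    return np.sum(np.abs(x))

v = np.array([3.0, -4.0, 1.5])
print(one_norm(v))               # 8.5
print(np.linalg.norm(v, ord=1))  # the same norm via NumPy
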
2D: Two dimensional. A space describable using any pair of orthogonal basis vectors consisting of two elements. [WP: Two-dimensional_space]
2D coordinate system: A system uniquely associating two real numbers to any point of a plane. First, two intersecting lines (axes) are chosen on the plane, usually perpendicular to each other. The point of intersection is the origin of the system. Second, metric units are established on each axis (often the same for both axes) to associate numbers to points. The coordinates Px and Py of a point, P, are obtained by projecting P onto each axis in a direction parallel to the other axis and reading the numbers at the intersections: [JKS95:1.4]
[Figure: a 2D coordinate system, with the coordinates of a point P read off by projecting P onto each axis]
2D Fourier transform: A special case of the general Fourier transform often used to find structures in images. [FP03:7.3.1]
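
As an illustrative sketch (the striped test image is our own assumption), the 2D discrete Fourier transform of an image can be computed and inspected via its magnitude spectrum:

import numpy as np

image = np.zeros((64, 64))
image[::8, :] = 1.0                      # horizontal stripes every 8 rows

F = np.fft.fft2(image)                   # 2D discrete Fourier transform
magnitude = np.abs(np.fft.fftshift(F))   # zero frequency moved to center
# Peaks in the magnitude spectrum reveal the periodic stripe structure.
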
2D image: A matrix of data representing samples taken at discrete intervals. The data may be from a variety of sources and sampled in a variety of ways. In computer vision applications, the image values are often encoded color or monochrome intensity samples taken by digital cameras but may also be range data. Some typical intensity values are: [SQ04:4.1.1]
[Figure: an image as a 2D array of sampled intensity values]
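
A minimal sketch of this view of an image as a matrix of samples, here an assumed 4 x 4 8-bit monochrome example:

import numpy as np

# A 2D image as a matrix of discrete samples: 0 = black, 255 = white.
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [  0,   0, 255, 255],
                  [128, 128, 128, 128]], dtype=np.uint8)
print(image.shape, image.dtype)   # (4, 4) uint8
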
2D input device: A device for sampling light intensity from the real world into a 2D matrix of measurements. The most popular two-dimensional imaging device is the charge-coupled device (CCD) camera. Other common devices are flatbed scanners and X-ray scanners. [SQ04:4.2.1]
2D point: A point in a 2D space, i.e., characterized by two coordinates; most often, a point on a plane, e.g., an image point in pixel coordinates. Notice, however, that two coordinates do not necessarily imply a plane: a point on a 3D surface can be expressed either in 3D coordinates or by two coordinates given a surface parameterization (see surface patch). [JKS95:1.4]
2D point feature: Localized structures in a 2D image, such as interest points, corners and line meeting points (e.g., X, Y and T shaped). One detector for these features is the SUSAN corner finder. [TV98:4.1]
2D pose estimation: A special case of 3D pose estimation: a fundamental open problem in computer vision, in which the correspondence between two sets of 2D points is found. The problem is defined as follows: given two sets of points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the Euclidean transformation $\{\mathbf{R}, \vec{t}\}$ (the pose) and the match matrix $\{M_{jk}\}$ (the correspondences) that best relates them. A large number of techniques have been used to address this problem, e.g., tree-pruning methods, the Hough transform and geometric hashing. [HJL+89]
2D projection: A transformation mapping a higher dimensional space onto a two-dimensional space. The simplest method is to discard the higher dimensional coordinates, although more generally a viewing position is chosen and the projection is performed relative to it.
[Figure: projection of a 3D object onto a 2D image plane]
For example, the main steps for a computer graphics projection are as follows: apply normalizing transform to 3D point world coordinates; clip against canonical view volume; project onto projection plane; transform into viewport in 2D device coordinates for display. Commonly used projection functions are parallel projection and perspective projection. [JKS95:1.4]
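
A minimal sketch of both projection functions, assuming a camera at the origin looking along +z and an illustrative focal length f:

import numpy as np

def parallel_projection(points):
    # Orthographic case: simply discard the z coordinate.
    return points[:, :2]

def perspective_projection(points, f=1.0):
    # Pinhole model: (x, y, z) -> (f*x/z, f*y/z).
    return f * points[:, :2] / points[:, 2:3]

P = np.array([[1.0, 2.0, 4.0], [0.5, -1.0, 2.0]])  # assumed 3D points
print(parallel_projection(P))
print(perspective_projection(P, f=2.0))
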
2D shape descriptor (local): A compact summary representation of object shape over a localized region of an image. See shape descriptor. [Blu67]
2D shape representation (global): A compact summary representation of image shape features over the entire image. See shape representation. [FP03:28.3]
2D view: A planar aspect view or planar projected view (such as an image under perspective projection) whose positions can be indexed in two spatial dimensions. [SB11:2.3.1]
2.1D sketch: A lesser variant of the established 2.5D sketch, which captures the relative depth ordering of (possibly self-occluding) scene regions in terms of their front-to-back relationship within the scene. By contrast, the 2.5D sketch captures the relative scene depth of regions, rather than merely depth ordering: [NM90]
[Figure: the front-to-back depth ordering of scene regions captured by a 2.1D sketch]
2.5D image: A range image obtained by scanning from a single viewpoint. It allows the data to be represented in a single image array, where each pixel value encodes the distance to the observed scene. The reason this is not called a 3D image is to make explicit the fact that the back sides of the scene objects are not represented. [SQ04:4.1.1]
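
As an illustration, a hedged sketch that back-projects a 2.5D range image into 3D points under an assumed pinhole camera model (fx, fy, cx, cy are assumed intrinsics; the function name is ours):

import numpy as np

def range_image_to_points(depth, fx, fy, cx, cy):
    # Each pixel (u, v) with depth z maps to the 3D point
    # ((u - cx) * z / fx, (v - cy) * z / fy, z).
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)   # (h, w, 3) points

depth = np.full((4, 4), 2.0)   # a flat surface 2 units from the camera
pts = range_image_to_points(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
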
2.5D model: A geometric model representation corresponding to the 2.5D image representation, used in the model-to-data matching problem of model-based recognition. [Mar82] An example model is:
[Figure: an example 2.5D model]
2.5D sketch: The central structure of Marr's theory of vision: an intermediate description of a scene indicating the visible surfaces and their arrangement with respect to the viewer. It is built from several different elements: the contour, texture and shading information coming from the primal sketch, stereo information and motion. The description is theorized to be a kind of buffer where partial resolution of the objects takes place. The name 2.5D sketch stems from the fact that, although local changes in depth and discontinuities are well resolved, the absolute distance to all scene points may remain unknown. [FP03:11.3.2]
3D: Three dimensional. A space describable using any triple of mutually orthogonal basis vectors consisting of three elements. [WP: Three-dimensional_space]
3D coordinate system: Same as 2D coordinate system but in three dimensions: [JKS95:1.4]
[Figure: a 3D coordinate system]
3D data: Data described in all three spatial dimensions. See also range data, CAT and NMR. [WP: 3D_data_acquisition_and_object_reconstruction] An example of a 3D data set is:
[Figure: an example 3D data set]
3D data acquisition: Sampling data in all three spatial dimensions. There is a variety of ways to perform this sampling, e.g., using structured light triangulation. [FP03:21.1]
3D image: See range image.
3D imaging: Any of a class of techniques that obtain three-dimensional information using imaging equipment. Active vision techniques generally include a source of structured light (or other electromagnetic or sonar radiation) and a sensor, such as a camera or a microphone. Triangulation and time-of-flight computations allow the distance from the sensor system to be computed. Common technologies include laser scanning, texture projection systems and moiré fringe methods. Passive sensing in 3D depends only on external (and hence unstructured) illumination sources. Examples of such systems are stereo reconstruction and shape from focus techniques. See also 3D surface imaging and 3D volumetric imaging. [FMN+91]
3D interpretation: A 3D model, e.g., a solid object that explains an image or a set of image data. For instance, a certain configuration of image lines can be explained as the perspective projection of a polyhedron; in simpler words, the image lines are the images of some of the polyhedron’s lines. See also image interpretation. [BB82:9.1]
3D model: A description of a 3D object that primarily describes its shape. Models of this sort are regularly used as exemplars in model-based recognition and 3D computer graphics. [TV98:10.6]
3D model-based tracking: An extension of model-based tracking using a 3D model of the tracked object. [GX11:5.1.4]
3D moments: A special case of moment where the data comes from a set of 3D points. [GC93]
3D motion estimation: An extension of motion estimation whereby the motion is estimated as a displacement vector $\vec{d} \in \mathbb{R}^3$. [LRF93]
3D motion segmentation: An extension to motion segmentation whereby motion is segmented within an $\mathbb{R}^3$ dataset. [TV07]
3D object: A subset of $\mathbb{R}^3$. In computer vision, often taken to mean a volume in $\mathbb{R}^3$ that is bounded by a surface. Any solid object around you is an example: table, chairs, books, cups; even yourself. [BB82:9.1]
3D point: An infinitesimal volume of 3D space. [JKS95:1.4]
3D point feature: A point feature on a 3D object or in a 3D environment. For instance, a corner in 3D space. [RBB09]
3D pose estimation: The process of determining the transformation (translation and rotation) of an object in one coordinate frame with respect to another coordinate frame. Generally, only rigid objects are considered; models of those objects exist a priori, and we wish to determine the position of the object in an image on the basis of matched features. This is a fundamental open problem in computer vision in which the correspondence between two sets of 3D points is found. The problem is defined as follows: given two sets of points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the parameters of a Euclidean transformation $\{\mathbf{R}, \vec{t}\}$ (the pose) and the match matrix $\{M_{jk}\}$ (the correspondences) that best relates them. Assuming the points correspond, they should match exactly under this transformation. [TV98:11.2]
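
When the match matrix is already known (i.e., $\vec{x}_j$ corresponds to $\vec{y}_j$), the least-squares rigid transformation can be recovered in closed form. A minimal SVD-based sketch (function name ours; this is the classic SVD alignment method, not necessarily that of the cited reference):

import numpy as np

def rigid_pose(X, Y):
    # Find R, t minimizing sum_j || R @ X[j] + t - Y[j] ||^2
    # for n x 3 arrays of corresponding 3D points X and Y.
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)               # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                      # guard against reflections
    t = cy - R @ cx
    return R, t
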
3D reconstruction: The recovery of 3D scene information, and its organization into a 3D shape, via e.g., multi-view geometry: [HZ00:Ch. 10]
[Figure: a 3D reconstruction recovered from multiple views]
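
One elementary building block, sketched here under assumed known cameras, is linear (DLT) triangulation of a single point from two views; P1 and P2 are 3 x 4 projection matrices and x1, x2 the corresponding image points:

import numpy as np

def triangulate(P1, P2, x1, x2):
    # Stack the four linear constraints x ~ P X into A @ X = 0 and
    # take the null vector of A as the homogeneous 3D point.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]     # de-homogenize to a 3D point
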
3D shape descriptor: An extension to regular shape descriptor approaches to consider object shape in $\mathbb{R}^3$. [Pri12:Ch. 17]
3D shape representation: A compact summary representation of shape, extending shape representation to consider object shape in $\mathbb{R}^3$. [Pri12:Ch. 17]
3D SIFT: A 3D extension of the SIFT operator defined for use over voxel data. [FBM10]
3D skeleton: A 3D extension of an image skeleton, defining a tree-like structure of the medial axes of a 3D object (akin to a human stick figure when the 3D object is a person). See also medial axis skeletonization: [Sze10:12.6] An example:
[Figure: the 3D skeleton of a human figure]
3D stratigraphy: A modeling and visualization tool used to display different underground layers. Often used for visualizations of archaeological sites or for detecting rock and soil structures in geological surveying. [PKVG00]
3D structure recovery: See 3D reconstruction.
3D SURF: A 3D extension to the SURF descriptor that considers the characterization of local image regions in $\mathbb{R}^3$ via either a volumetric voxel-based or a surface-based representation. [KPW+10]
3D surface imaging: Obtaining surface information embedded in a 3D space. See also 3D imaging and 3D volumetric imaging.
3D texture: The appearance of texture on a 3D surface when imaged, e.g., the fact that the density of texels varies with distance because of perspective effects. 3D surface properties (e.g., shape, distances and orientation) can be estimated from such effects. See also shape from texture and texture orientation. [DN99]
3D vision: A branch of computer vision dealing with characterizing data composed of 3D measurements. This may involve segmentation of the data into individual surfaces that are then used to identify the data as one of several models. Reverse engineering is a specialism in 3D vision. [Dav90:16.2]
3D volumetric imaging: Obtaining measurements of scene properties at all points in a 3D space, including the insides of objects. This is used for inspection but more commonly for medical imaging. Techniques include nuclear magnetic resonance, computerized tomography, positron emission tomography and single photon emission computed tomography. See also 3D imaging and 3D surface imaging.
4 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to the four neighboring pixels that share a common crack edge. See also 8 connectedness: [SQ04:4.5] Four pixels connected to a central pixel (*):
. X .
X * X
. X .
(X marks the four 4-connected neighbors of *)
[Figure: four groups of pixels joined by 4 connectedness]
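
A minimal plain-Python sketch (names ours) of the 4-neighborhood, used here by a flood fill that grows a region under 4 connectedness:

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def flood_fill4(img, seed):
    # Return the set of 1-valued pixels 4-connected to the seed pixel.
    h, w = len(img), len(img[0])
    region, stack = set(), [seed]
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < h and 0 <= c < w):
            continue
        if img[r][c] != 1:
            continue
        region.add((r, c))
        stack.extend((r + dr, c + dc) for dr, dc in N4)
    return region

Replacing N4 with the eight offsets including the diagonals yields the 8-connected variant (see 8 connectedness).
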
4D approach: An approach or solution to a given problem that utilizes both 3D-spatial and temporal information. See 4D representation (3D-spatial + time).
4D representation (3D-spatial + time): A 3D time-series data representation whereby 3D scene information is available over a temporal sequence. An example would be a video sequence obtained from stereo vision or some other form of depth sensing: [RG08:Ch. 2]
[Figure: 3D scene information captured over a temporal sequence]
8 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to all eight neighboring pixels. See also 4 connectedness: [SQ04:4.5] Eight pixels connected to a central pixel (*):
X X X
X * X
X X X
(X marks the eight 8-connected neighbors of *)
[Figure: two groups of pixels joined by 8 connectedness]
8-point algorithm: An approach for the recovery of the fundamental matrix using a set of eight feature point correspondences for stereo camera calibration. [HZ00:11.2]
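
A hedged NumPy sketch of the normalized variant (with Hartley point normalization; function names ours), estimating $\mathbf{F}$ such that $\vec{x}_2^\top \mathbf{F} \vec{x}_1 = 0$ for homogeneous correspondences:

import numpy as np

def normalize(pts):
    # Translate points to their centroid; scale mean length to sqrt(2).
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (T @ np.column_stack([pts, np.ones(len(pts))]).T).T, T

def eight_point(x1, x2):
    # x1, x2: n x 2 arrays (n >= 8) of corresponding image points.
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # One row of the linear system A f = 0 per correspondence.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)                # enforce rank 2
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1                       # undo the normalization
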