As we have already stated, we can exploit the image intensity distribution in the neighbourhood of a candidate landmark in order to recognise a previously observed prototype. To this end, we represent the appearance of landmarks (both candidates and prototypes) using a technique known as principal components analysis (PCA) [60, 44, 48]. Image recognition using PCA operates by projecting the image to be classified into a subspace which ``best'' distinguishes the classes (or prototypes) to be identified. The optimality of this representation rests on the assumption that each image can be reconstructed as a linear combination of a small set of descriptive vectors. While variants of the method employ a wide variety of classification schemes, we match the target to the class whose projection lies at the smallest Euclidean distance in the subspace.
PCA operates by first constructing a linear subspace from a set of exemplars. In the domain of face or object recognition, the exemplars might be a set of canonical views of the faces or objects to be distinguished. Each exemplar is expressed as a vector, and the set of these vectors is assembled into a matrix. The principal directions of the exemplar set are computed using the singular value decomposition, producing an orthonormal basis set. Since each vector in this basis set is of the same dimensionality as the input exemplars and, as such, can be represented as an image, these vectors are sometimes referred to in the literature as eigenpictures or eigenfaces [60].
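To make this concrete, the following is a minimal sketch using NumPy (the array sizes and random exemplars are purely illustrative): the singular value decomposition of the exemplar matrix directly yields a column-orthonormal basis for the subspace spanned by the exemplars.

```python
import numpy as np

# Hypothetical example: 4 exemplar images of 8x8 pixels each,
# flattened into the columns of a 64x4 matrix A.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
A /= np.linalg.norm(A, axis=0)  # normalise each exemplar to unit magnitude

# Thin SVD: the columns of U form an orthonormal basis
# ("eigenpictures") for the subspace spanned by the exemplars.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U is column-orthogonal: U^T U is the 4x4 identity.
assert np.allclose(U.T @ U, np.eye(4))
```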
More formally, and expressed in the context of landmark recognition,
consider a set $T$ of $m$ landmark prototypes $\tau_1, \tau_2, \ldots, \tau_m$.
Each of these prototypes is an instance of a landmark candidate - that
is, each prototype has been detected using the attention operator
outlined in Chapter 3, and therefore each prototype has
an associated local intensity map; typically, we select the local
intensity map to be of the same scale as the attention operator that
was used to detect the landmark. For each prototype $\tau_i$, we build a
column vector $\mathbf{t}_i$ by scanning the local intensity
distribution in row-wise order and normalising the magnitude of
$\mathbf{t}_i$ to one. Note that if the local intensity image
consists of $s$ by $t$ pixels, then it follows that $\mathbf{t}_i$ is
of dimensionality $n = st$. Our goal is to construct a discriminator
using the set of vectors defined by $T$. This is accomplished by
constructing an $n \times m$ matrix $A$ whose columns consist
of the vectors $\mathbf{t}_i$, and expressing $A$ in terms of
its singular value decomposition,
\[ A = U \Sigma V^T, \]
where $U$ is an $n \times m$ column-orthogonal matrix whose
columns represent the principal directions of the range defined by
$A$ (that is, $U$ gives the eigenvectors of
$AA^T$), $\Sigma$ is an $m \times m$ diagonal matrix whose
elements correspond to the singular values of $A$,
and $V$ is an $m \times m$ column-orthogonal
matrix whose rows represent the projections of the columns of
$A$ into the subspace defined by $U$ (weighted
appropriately by the inverses of the singular values). Note that the
columns of $U$ define a linear subspace of dimensionality
$m$, which can be much
smaller than $n$. In addition, the principal axes of the subspace are
arranged so as to maximise the Euclidean distance between the
projections of the prototypes into the subspace, which optimises
the discriminability of the prototypes. As we have already mentioned,
the columns of $U$ are of dimensionality $n$, and hence can
be represented as images. Figure 4.2 shows a set of
landmark prototypes on the left, and the corresponding eigenvectors,
or eigenlandmarks, constructed from the prototypes on the right.
Figure 4.2: (a) Landmark Prototypes and (b) Eigenlandmarks.
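The construction above can be sketched as follows (a NumPy illustration under assumed parameters: hypothetical $16 \times 16$ intensity maps and $m = 5$ prototypes; the function and variable names are illustrative, not from the text):

```python
import numpy as np

def to_unit_vector(intensity_map):
    """Scan an s-by-t local intensity map in row-wise order and
    normalise the resulting vector to unit magnitude."""
    v = intensity_map.reshape(-1).astype(float)  # row-wise scan: n = s*t
    return v / np.linalg.norm(v)

# Hypothetical prototypes: m = 5 local intensity maps of 16x16 pixels.
rng = np.random.default_rng(1)
prototypes = [rng.random((16, 16)) for _ in range(5)]

# Assemble the n-by-m matrix A whose columns are the prototype vectors t_i.
A = np.column_stack([to_unit_vector(p) for p in prototypes])

# Singular value decomposition A = U S V^T (thin form).
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Each column of U has dimensionality n = 256, so it can be reshaped
# back into a 16x16 image -- an "eigenlandmark".
eigenlandmarks = [U[:, j].reshape(16, 16) for j in range(U.shape[1])]
```

Note that the thin SVD keeps only $m$ basis vectors, matching the observation that the subspace has dimensionality $m \ll n$.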
Once the subspace is constructed, it can be used for classifying
landmark candidates. Given a landmark candidate $c$, we construct a
vector $\mathbf{c}$ from the local intensity distribution of $c$,
normalised to unit magnitude. The subspace projection $\mathbf{c}'$ of $\mathbf{c}$
is obtained using
\[ \mathbf{c}' = U^T \mathbf{c}, \]
and then $c$ can be matched to the prototype $\tau_i$ whose
subspace projection $\mathbf{t}'_i$ is closest (in the
Euclidean sense) to $\mathbf{c}'$ in the subspace. If the subspace
projection of prototype $\tau_i$
is defined as $\mathbf{t}'_i = U^T \mathbf{t}_i$, and the distance $d_i$
is defined using the Euclidean metric,
\[ d_i = \left\| \mathbf{c}' - \mathbf{t}'_i \right\|, \]
where $\mathbf{t}_i$ is obtained from the prototype image in the same
fashion as was used to obtain $\mathbf{c}$, then the optimal match $\tau^*$
is defined as
\[ \tau^* = \arg\min_{\tau_i \in T} d_i. \]
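This matching rule can be sketched in NumPy as follows (a self-contained toy setup with assumed sizes $n = 256$ and $m = 5$; the candidate is deliberately built as a slightly perturbed copy of one prototype so that the nearest-projection rule has a known correct answer):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: m = 5 unit-magnitude prototype vectors of
# dimensionality n = 256, stacked as the columns of A.
A = rng.random((256, 5))
A /= np.linalg.norm(A, axis=0)
U, _, _ = np.linalg.svd(A, full_matrices=False)

# Subspace projections of the prototypes: t'_i = U^T t_i.
T_proj = U.T @ A                      # shape (5, 5): one column per prototype

# A candidate vector c of unit magnitude -- here a noisy copy of
# prototype 3, so the match should recover index 3.
c = A[:, 3] + 0.01 * rng.standard_normal(256)
c /= np.linalg.norm(c)
c_proj = U.T @ c                      # c' = U^T c

# Match: the prototype whose projection is nearest in the Euclidean sense.
d = np.linalg.norm(T_proj - c_proj[:, None], axis=0)
best = int(np.argmin(d))
```

One practical consequence of projecting first is that each comparison costs $O(m)$ rather than $O(n)$, since the distances are computed between $m$-dimensional projections.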
The following section will demonstrate how this classification mechanism can be used to track landmarks over a set of viewpoints.