
Landmark Recognition

As we have already stated, we can exploit the image intensity distribution in the neighbourhood of a candidate landmark in order to recognise a previously observed prototype. To this end, we represent the appearance of landmarks (both candidates and prototypes) using a technique known as principal components analysis (PCA) [60, 44, 48]. Image recognition using PCA operates by projecting the image to be classified into a subspace which "best" distinguishes the classes (or prototypes) to be identified. The optimality of this representation rests on the assumption that an image can be reconstructed as a linear combination of a set of descriptive vectors. While variants of the method employ a wide variety of classification schemes, we select as a match the class whose subspace projection has the smallest Euclidean distance to that of the target.

PCA operates by first constructing a linear subspace from a set of exemplars. In the domain of face or object recognition, the exemplars might be a set of canonical views of the faces or objects to be distinguished. Each exemplar is expressed as a vector, $t_i$, and the set of these vectors is assembled into a matrix, $A$. The eigenvectors of $A$ are computed using singular value decomposition, producing an orthonormal basis set. Since each vector in this basis set is of the same dimensionality as the input prototypes, and as such can be represented as an image, the basis vectors are sometimes referred to in the literature as eigenpictures or eigenfaces [60].

More formally, and expressed in the context of landmark recognition, consider a set $T$ of $m$ landmark prototypes $t_1, t_2, \ldots, t_m$. Each of these prototypes is an instance of a landmark candidate -- that is, each prototype has been detected using the attention operator outlined in Chapter 3, and therefore each prototype has an associated local intensity map; typically, we select the local intensity map to be of the same scale as the attention operator that was used to detect the landmark. For each prototype $t_i$, we build a column vector $\bar{t}_i$ by scanning the local intensity distribution in row-wise order and normalising the magnitude of $\bar{t}_i$ to one. Note that if the local intensity image consists of $s$ by $t$ pixels, then $\bar{t}_i$ is of dimensionality $n = st$. Our goal is to construct a discriminator using the set of vectors defined by $T$. This is accomplished by constructing an $n \times m$ matrix $A$ whose columns consist of the vectors $\bar{t}_i$, and expressing $A$ in terms of its singular value decomposition,
$A = U S V^{T},$
where $U$ is an $n \times m$ column-orthogonal matrix whose columns represent the principal directions of the range of $A$ (that is, $U$ gives the eigenvectors of $A$), $S$ is an $m \times m$ diagonal matrix whose elements correspond to the singular values (or eigenvalues) of $A$, and $V$ is an $m \times m$ column-orthogonal matrix whose rows represent the projections of the columns of $A$ into the subspace defined by $U$ (weighted appropriately by the inverses of the eigenvalues). Note that the columns of $U$ define a linear subspace of dimensionality $m$, which can be much smaller than $n$. In addition, the principal axes of the subspace are arranged so as to maximise the Euclidean distance between the projections of the prototypes $t_i$ into the subspace, which optimises the discriminability of the prototypes. As we have already mentioned, the columns of $U$ are of dimensionality $n$, and hence can be represented as images. Figure 4.2 shows a set of landmark prototypes on the left, and the corresponding eigenvectors, or eigenlandmarks, constructed from the prototypes on the right.

Figure 4.2: (a) Landmark Prototypes and (b) Eigenlandmarks.
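As a concrete illustration, the subspace construction described above can be sketched in a few lines of NumPy. This is only a sketch: the function name is illustrative, and the use of a thin SVD is an implementation choice, not part of the original formulation.

```python
import numpy as np

def build_eigenlandmarks(prototypes):
    """Construct the eigenlandmark basis U from a list of s-by-t
    local intensity maps (illustrative sketch of the construction
    described in the text)."""
    cols = []
    for p in prototypes:
        # Scan the intensity map in row-wise order into a column
        # vector of dimensionality n = s*t, normalised to unit magnitude.
        v = np.asarray(p, dtype=float).ravel()
        cols.append(v / np.linalg.norm(v))
    A = np.stack(cols, axis=1)  # the n-by-m matrix A
    # Thin SVD, A = U S V^T: the m columns of U span the subspace.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U
```

Each column of $U$ has dimensionality $n$ and can be reshaped back to $s \times t$ and displayed as an image, which is how eigenlandmarks such as those in Figure 4.2 can be visualised.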

Once the subspace is constructed, it can be used for classifying landmark candidates. Given a landmark candidate $c$, we construct a vector $\bar{c}$ from the local intensity distribution of $c$, normalised to unit magnitude. The subspace projection $c'$ of $\bar{c}$ is obtained using
$c' = U^{T} \bar{c},$
and then $c$ can be matched to the prototype $t^{*}$ whose subspace projection is closest (in the Euclidean sense) to $c'$. If the subspace projection of prototype $t_i$ is defined as

$t'_i = U^{T} \bar{t}_i,$

where $\bar{t}_i$ is obtained from the prototype image in the same fashion as was used to obtain $\bar{c}$, then under the Euclidean metric the optimal match $t^{*}$ is defined as

$t^{*} = \arg\min_{t_i \in T} \left\| t'_i - c' \right\|.$
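The matching step above can likewise be sketched in NumPy, assuming a basis $U$ constructed from the prototype set as described earlier; the function and helper names are illustrative, not part of the original method.

```python
import numpy as np

def match_candidate(candidate, prototypes, U):
    """Return the index of the prototype whose subspace projection
    is nearest (in the Euclidean sense) to the candidate's projection
    (illustrative sketch of the classification rule in the text)."""
    def unit_vector(img):
        # Row-wise scan into a vector, normalised to unit magnitude.
        v = np.asarray(img, dtype=float).ravel()
        return v / np.linalg.norm(v)
    c_proj = U.T @ unit_vector(candidate)            # c' = U^T c-bar
    dists = [np.linalg.norm(U.T @ unit_vector(p) - c_proj)
             for p in prototypes]                    # ||t'_i - c'||
    return int(np.argmin(dists))
```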

The following section will demonstrate how this classification mechanism can be used to track landmarks over a set of viewpoints.



Robert Sim
Tue Jul 21 10:30:54 EDT 1998