source: trunk/documents/theses/dstn/review.tex @ 12309

Revision 12309, 71.0 KB checked in by dstn, 14 months ago (diff)

still wrestling with stupid paragraph numbering

Line 
1\section{Preface}
2
3This thesis describes some of the research carried out by me and my
4colleagues in the \an group.  \an started as a collaboration between
5Sam Roweis, a University of Toronto computer scientist, and David
6W.~Hogg, a New York University astronomer.  An idea that began as a
7crazy scrawl on the back of a napkin---probably in a bar---eventually
8grew into a real project and began attracting collaborators and
9students.  Fast-forwarding a few years, we have achieved a rare feat:
10a computer vision software system that \emph{works}, with little human
11intervention, and is being used by hundreds of real astronomers doing
12real research.  We have also branched out and explored a number of
13other promising areas where ideas in computer vision and machine
14learning can be applied to astronomical data to great benefit.
15
16
17\begin{figure}
18\begin{center}
19test
20\end{center}
21\caption{test caption}
22\end{figure}
23
24
25This thesis is quite blatantly a collection of manuscripts.  I have
26done a bit more than staple them together to produce the chapters of
27this thesis, but the original tone and style of each manuscript still
28shines through.  I hope my readers will find the chapter-to-chapter
29variations a refreshing, rather than jarring, change of pace.  One
30advantage of collecting manuscripts into a thesis is that each chapter
31should largely stand alone.  Another advantage is that each manuscript
32was prepared with a list of authors, so it is simpler to identify the
33collaborators who have contributed to each chapter.
34
35
36Test para
37
38\section{Introduction}
39
40
41The general idea of the \an collaboration is to apply ideas in
42computer vision and machine learning to problems and data in
43astronomy.  Astronomical images are a good domain in which to explore
44ideas in computer vision, because the problems to be solved are often
45much more constrained than in general imaging, the telescopes and
46cameras are often well-understood and calibrated, large volumes of
47data and ``ground truth'' information exist, and with new telescopes
48constantly coming online and producing greater and greater volumes of
49imaging, there is a need for sophisticated computation.  I briefly
50outline astronomical imaging for computer vision researchers in
51\secref{sec:astrointro}.
52
53
54The specific idea addressed in this thesis is that pattern recognition
55approaches can be applied to astronomical images.  The goal is to
56produce a system that is able to recognize automatically the stars in
57any image of the night sky, using the information in the image pixels
58alone.  Recognizing the stars in an image is equivalent to finding the
59pointing, scale, and rotation of the image on the sky in a standard
60coordinate system.  The branch of astronomy concerned with measuring
61the positions and motions of celestial bodies is called
62\emph{astrometry}, and the task of placing a new image in a standard
63reference frame is known as ``solving the astrometry'' or
64``calibrating the astrometry'' of the image.  The \emph{blind
65astrometric calibration} task is to calibrate an image using only the
66image itself.  The broad goal of the \an project was to build a system
67that would allow us to create correct, standards-compliant astrometric
68\metadata for every useful astronomical image ever taken, past and
69future, in any state of archival disarray.  This is part of a larger
70effort to organize, annotate and make searchable all the world's
71astronomical information.
72
73
74After a brief introduction to astronomical imaging for computer
75scientists, the remainder of this chapter reviews the area of pattern
76recognition, and in particular the framework of \emph{geometric
77hashing} for object recognition in images.  One of the contributions
78of this thesis is to replace the simple hash table used in traditional
79geometric hashing with a \kdtree, so I review related work in hashing
80and other approaches for fast feature matching.  Finally, I present
81the blind astrometric calibration task as an instance of object
82recognition, and review some previous approaches to the problem.
83
84%% Our approach?
85
86
87The remaining chapters present our approach to the blind astrometric
88calibration problem.  \Chapref{chap:techreport} explains our approach
89and presents the results of large-scale tests of the system on
90real-world data.  Chapters \ref{chap:verify} and \ref{chap:kdtree}
91delve into details of the approach: \Chapref{chap:verify} presents the
92Bayesian decision theory problem that lies at the heart of our
93approach, while \chapref{chap:kdtree} explains the technical details
94of the \kdtree data structure implementation that is key to making our
95system performant enough to be a practical tool.
96\Chapref{chap:conclusion} summarizes our results.
97
98
99\section{Astronomical imaging for computer scientists}
100\label{sec:astrointro}
101
102Astronomical images are very different from images produced by typical
103consumer-grade cameras.  This section is intended to give
104non-astronomers a sense of the typical parameters of contemporary
105astronomical imaging setups, and introduce some of the terminology and
106key ideas.
107
108
109The Sloan Digital Sky Survey (SDSS) \cite{sdsstechnical, sdsscamera}
110is one of the most important projects in contemporary astronomy.  The
111primary imaging survey was carried out from 2000 through 2008, so the
112hardware is no longer cutting-edge, but it is still very impressive.
113The telescope's primary mirror is $2.5$ meters in diameter.  Incoming
114light first strikes the primary mirror, then a one-meter secondary
115mirror, then passes through two corrective lenses in front of the
116camera.  The nearly distortion-free field of view is about $3$ degrees
117wide.  The camera is composed of $30$ charge-coupled devices (CCDs),
118each $2048\times2048$ pixels, arranged in six columns.  The five CCDs
119in each column have different bandpass filters, ranging from the
120near-infrared ($913~\nanometers$) through the optical to the
121near-ultraviolet ($354~\nanometers$).  The CCDs and associated
122electronics are cooled with liquid nitrogen to $-80^{\circ} \unit{C}$
123and held under vacuum.  Each CCD is about $5~\unit{cm}$ square, and
124the whole focal plane of the camera is about $65~\unit{cm}$ in
125diameter.  The camera assembly has a mass over $300~\unit{kg}$.
126
127
128The survey operates in ``drift-scanning'' mode: the telescope is
129pointed in a nearly fixed direction, and as the Earth rotates the sky
130drifts past the camera.  The camera electronics shift the electrons
131along the CCD columns at the same rate.  As an object drifts through
132the field of view, the electrons it produces in the CCD pixels march
133along with it, and are read out as it disappears from view.  In the
134SDSS camera, the object drifts past each of the five bandpass-filtered
135CCDs of the camera in turn, producing a five-color image.  Over the
136course of a night of observing, the camera sweeps out a great circle,
137creating six columns of five-band images that are $2048$ pixels wide
138and over a million pixels long: over $120$ gigabytes of pixels per
139night.  This clever drift-scanning scheme has many advantages, one
140being that the readout rate of the CCDs can be relatively slow,
141resulting in low readout noise.  Calibration of the camera is simpler
142because charge is integrated over columns of the CCD: it is only
143necessary to calibrate whole columns rather than individual pixels.
144Note that for this scheme to work, the CCDs must be very carefully
145aligned in the focal plane, and the telescope control system must be
146capable of keeping the system pointed stably at the level of a pixel
147or better.  The moving mass of the telescope is over
148$15,000~\unit{kg}$, so this is no mean feat!
149
150% 24 um pixels
151
152Astronomical imaging systems are designed so that the point-spread
153function (PSF)---the image on the CCD of a distant point source of
154light---is spread over several pixels.  This allows the position of a
155point source to be measured to sub-pixel accuracy by fitting a model
156of the PSF to the observed pixel values.  If the system focused a
157point source onto a single pixel, this would be impossible.  The goal
158is typically for the full-width at half-maximum (FWHM) of the PSF to
159be a few pixels, so that the PSF is well-sampled but the signal is not
160spread out over too many pixels.
161
162For ground-based telescopes, the moving atmosphere causes the image of
163a distant point source of light to dance around on the CCD; in typical
164exposures of many seconds this ``speckling'' is integrated into a
165point-spread function.  The size of this PSF is called the ``seeing''
166and a value of one $\unit{arcsecond}$ FWHM is considered quite good.
167The SDSS camera has pixels of size $0.396~\unit{arcseconds}$, and the
168point-spread function is dominated by the seeing: the optics of the
169system are good enough that the images are as clear as they can be
170given the atmosphere.
171
172
173The positions of astronomical objects are usually measured in an
174\emph{equatorial coordinate system}: essentially the projection of the
175Earth's latitudes and longitudes onto the celestial sphere at a
176specific instant.  The latitudinal direction is called ``declination''
177and abbreviated ``Dec'', and the longitudinal direction is called
178``right ascension'' and abbreviated ``RA''.  Declination is typically
179measured in degrees from $-90$ to $+90$, and right ascension is
180typically measured in degrees from $0$ to $360$, or hours from $0$ to
181$24$.  There are $60~\unit{arcminutes}$ per degree and
182$60~\unit{arcseconds}$ per $\unit{arcminute}$.  For reference, the
183Moon as seen from Earth is about half a degree in diameter.  The
184$0.396~\unit{arcsecond}$ pixels of the SDSS camera view an area the
185size of a penny at a distance of ten kilometers; each $2048\times2048$
186pixel CCD views an area of about one-millionth of the sky.
187
188
189The SDSS is very impressive, but the next generation of telescopes
190currently in development dwarf its specifications.  The Large Synoptic
191Survey Telescope (LSST) \cite{lsst}, for example, has an
192$8.4~\unit{meter}$ primary mirror, with a $3.5~\unit{degree}$ field of
193view almost completely covered by an array of nearly $200$ CCDs, each
194with $4096\times4096$ pixels: a total of about $3.2$ gigapixels.  The
195camera is the size of a small car.  It will take exposures of about
196$15$ seconds, yielding over $10$ terabytes of data per night.
197
198% -- basics of data reduction; catalogs
199% -- stars, galaxies, and other stuff
200
201
202
203
204\section{Pattern recognition}
205
206\emph{Pattern recognition} encompasses a huge class of problems and
207techniques in computer vision and machine learning.  Many pattern
208recognition problems are of the type ``identify this object'' or
209``find things like this'': the system is first told about a large set
210of objects, then is presented with a novel object and is asked to
211describe it or find similar objects from the set of known objects.
212
213Pattern recognition systems often take a \emph{feature matching}
214approach.  A \emph{feature} is a compact description of
215(problem-specific) relevant, invariant properties of an object, such
216that objects that are similar in the problem domain have similar
217features.  Defining how features are extracted from objects and
218defining the similarity between features are key factors in the
219success of such systems.  Often the representation of a feature (the
220\emph{feature descriptor}) is a point in a real coordinate space and a
221distance metric defines the similarity between features.
222
223Feature-based approaches typically involve two phases: an
224\emph{indexing} phase, in which the set of known objects is shown to
225the system, and a \emph{test} phase, in which novel objects are
226presented.  During indexing, features are extracted from the set of
227known objects.  At test time, features are extracted from the novel
228object and compared with the set of features from the known objects.
229In many domains, the set of known objects is very large, so it is
230critical that features be stored in a manner that allows the system to
231rapidly find known features that match a given feature.
232
233Often feature-based approaches suffer \emph{feature aliasing}
234problems: two objects that are dissimilar in the problem domain may
235yield similar features.  In this case, systems may use \emph{voting}
236or \emph{verification} schemes to ensure that aliased (false) feature
237matches are eliminated.  Voting schemes typically assume that false
238feature matches are independently randomly distributed, so the
239probability of matching several features to the same incorrect object
240is the product of the individual probabilities; if we demand a
241sufficient number of matches, the probability of a false match becomes
242small.  Verification or \emph{hypothesize-test} schemes treat a
243feature match as a hypothesis that the novel object is similar to a
244known object, and then some test to decide if the hypothesis is
245correct.  Such schemes can be seen as using feature matching as a
246search heuristic.
247
248\section{Visual pattern recognition}
249
250Pattern recognition in images is a particularly challenging and
251complex problem domain which has been studied extensively.  Objects to
252recognize include faces, fingerprints, specific physical objects, or
253even general classes of physical objects such as mugs, chairs, or
254elephants.
255
256
257Visual pattern recognition of general physical (3-dimensional) objects
258is difficult for many reasons: objects may be deformable, can appear
259different when viewed from different angles or under different
260lighting conditions, and can be occluded by other objects.  Features
261used in visual pattern recognition include edges (lines and curves),
262corners, or the color or texture of image patches.
263
264
265An important consideration when designing features for visual pattern
266recognition is the size (\ie, spatial extent in the image) of the
267feature.  \emph{Local features} gather information from a small region
268of the image, while \emph{global features} incorporate information
269from the whole image.  Local features can provide robustness to
270occlusion and deformation: if part of the object is hidden, features
271belonging to that part will be lost, but the features belonging to the
272remaining parts will still be found.  Since local features aggregate
273information from only a small portion of the image, they may be more
274noisy or less distinctive than global features.
275
276
277Since a single local feature may be insufficient to guarantee that an
278object has been correctly recognized, some systems build high-level
279features by combining multiple low-level features.  When the objects
280to be recognized are rigid or articulated (composed of multiple rigid
281parts connected by simple joints), a common approach is to build
282features that describe the relative geometry of low-level features.
283This use of \emph{geometric features} is often called \emph{geometric
284hashing}, since the result is a feature descriptor or \emph{hash code}
285which describes the geometric arrangement of the low-level features.
286
287
288
289\section{The geometric hashing framework}
290
291\begin{figure}
292\begin{center}
293% dot lamdan.dot -Tps2 -o lamdan.ps
294% ps2pdf -sPAPERSIZE=a4 lamdan.ps lamdan.pdf
295\includegraphics[height=0.8\textheight]{lamdan}
296\end{center}
297\caption{Outline of the Geometric Hashing scheme.  This diagram is a
298  reproduction of \fig 2 in Lamdan \etal \cite{lamdan1990}.
299  \label{lamdan}}
300\end{figure}
301
302
303Lamdan \etal \cite{lamdan1990} and Wolfson and Rigoutsos
304\cite{wolfson1997} give an excellent overview of the geometric hashing
305approach.  See Figure \ref{lamdan} for a block-diagram overview.
306Geometric hashing is based on the fact that when rigid objects undergo
307similarity transformations (translation, rotation, and scaling),
308angles between points on the object and relative distances between
309points do not change.  Therefore, if we describe the relative
310arrangement of points on an object in terms of these invariants, our
311description will be invariant under these transformations of the
312object.  Invariant geometric properties can be found for other types
313of transformations such as affine or perspective projection.
314
315
316Geometric hashing builds \emph{geometric features} out of lower-level
317features.  We will call the low-level features \emph{interest points}
318for clarity.
319
320
321Given the types of transformations the system must handle, there is
322some sufficient number of interest points required to define an
323invariant local coordinate system; these are called the \emph{basis
324  set}.  For example, for 2-D objects that can undergo similarity
325transformations, two interest points can be used to define a
326coordinate system.  For 3-D objects with similarity transformations,
327three (non-collinear) interest points are required.
328
329Geometric hashing follows an indexing approach.  During the indexing
330phase, the system preprocesses a database of objects which are to be
331recognized.  For each object, many interest points are extracted which
332are invariant under the transformations that are expected, and can be
333well localized.  The system then enumerates all basis sets by looking
334at all combinations of interest points; each set of basis features
335defines a relative coordinate system.  For each basis set, all
336remaining interest points and enumerated and their positions described
337in this coordinate system.  Each relative coordinate vector forms a
338\emph{geometric feature descriptor}.  In this way, the relative
339geometric arrangement of a set of interest points is converted into a
340point in a vector space.
341
342% --- consider adding a variant of the quad fig here.
343
344In the standard geometric hashing approach, the geometric feature
345descriptor is discretized and encoded as a hash table key. A grid is
346place over the feature descriptor space, and the hash table key of a
347geometric feature is the identifier of the grid cell containing it.
348Each entry in the hash table maps back to the basis set plus the
349feature whose position is encoded.
350
351% --- picture?
352
353Given a new image, a geometric hashing system extracts all interest
354points from the image and, as at indexing time, proceeds to enumerate
355all basis sets and computes the positions of the remaining interest
356points in each basis set.  This generates many geometric features
357whose descriptors are discretized and converted to hash table keys, as
358before.  Similar arrangements of interest points will lead to similar
359geometric feature descriptors, which--with luck---will be mapped to
360the same hash table key.  The hash table values associated with each
361key are retrieved, and each value that is found is considered a match.
362Each match votes for the correspondence between the interest points
363from the database and the interest points from the image that were
364used to define the geometric feature.
365
366
367After all features from the image have been processed, many votes will
368have accumulated.  False matches (due to aliasing or hash table
369collisions) should be uniformly distributed over the set of known
370objects, so should not generate very many votes for any particular
371object.  Objects that are actually present in the image should
372generate many votes.  Objects that receive an insufficient number of
373votes are rejected, and the remainder are subjected to a verification
374process.
375
376
377The verification process requires computing a transformation from the
378object to image coordinates, using the corresponding interest points
379from the known object and the image.  Using this transformation, all
380the interest points belonging to the known object are projected into
381image coordinates.  If the match is correct, we expect the image to
382contain features at these locations.  Some researchers add an extra
383verification stage by projecting the known object into the image and
384checking that the predicted edges of the object are found in the image
385(eg, \cite{lamdan1990, huttenlocher1990}).
386
387
388The geometric hashing approach is flexible and applicable to many
389problems, but there are some issues with the basic version described
390here.  The hash function described above simply places a grid over the
391relative coordinate system and returns the identifier of the grid cell
392in which the interest point lands.  This is prone to edge effects: if
393an interest point is near the edge of a grid cell, a small change in
394its position due to noise can move it across the edge into a different
395grid cell, which causes the hash code to change and a different set of
396feature matches to be found.  This problem can be overcome by
397detecting interest points that are near the edges of their cells and
398also searching the hash table using the hash codes of neighbouring
399cells.  This results in more false feature matches, and can become
400expensive as the dimensionality of the geometric feature space
401increases (there are more edges, and the proportion of the hypervolume
402of feature space that is near an edge increases).  Using a data
403structure other than the hash table may be beneficial.
404
405
406Another issue with uniform grid-based hashing is that it assigns equal
407importance (and storage space) to each grid cell in the feature space.
408Consider an interest point that is far from the basis set that defines
409its local coordinate system.  Small positional errors in the basis set
410can cause the coordinate system to be rotated or scaled, which results
411in large changes in the interest point's relative coordinates.  This
412means that edge effects become more pronounced in such regions of the
413feature space.  One response to this concern is to increases the size
414of the grid cells far from the basis set, but then if the interest
415points are uniformly distributed, these larger hash table bins will
416contain more features, so every match to these bins will be
417accompanied by more false positives.
418
419
420Finally, a hash table with equal-sized cells will only be uniformly
421utilized if interest points are uniformly distributed in feature
422space, but some domains may yield interest points that are far from
423uniformly distributed.  In other domains, it may be beneficial to
424place constraints on the geometric features that are used (for
425example, to avoid regions that are prone to large errors, or that are
426relatively indistinctive).
427
428
429There is nothing in the geometric hashing framework that requires a
430standard hash table to be used for feature matching.  Other methods of
431feature matching are described in the next section.
432
433
434\section{Related work in fast feature matching}
435
436
437For feature matching in a geometric hashing framework, the database of
438known objects is converted into a set of points in a feature
439descriptor space, which is a vector space.  To find matches to a new
440feature (extracted from an image in which we want to recognize
441objects), we need to find all points in the vector space that are
442within some tolerance.  Usually the Euclidean distance metric is
443applied, so the tolerance is a radius in the vector space.
444
445
446\subsection{Bloom filters}
447
448A Bloom filter \cite{bloom1970} is a data structure that can answer
449queries of the form ``is feature $q$ a member of the set of known
450features?''  A Bloom filter is a set of $n$ bits, initially zero.  A
451set of $k$ hash functions must be defined which take a feature as
452input and produce an integer in $[0, n)$ as output.  In the indexing
453phase, we examine each of the known features, and apply each of the
454hash functions, producing $k$ hash codes.  Each code is used to
455reference a bit and \emph{set} it (\ie, turn it on).  Each known
456feature results in $k$ bits being set; several features may request
457that a particular bit be set.
458
459
460Given a new feature to match, we run the $k$ hash functions on it and
461test whether each of the resulting bits are set.  If the feature is
462equal to a feature seen at indexing time (according to the hash
463functions), then all $k$ of the bits will have been set; thus the
464Bloom filter does not produce false negatives.  If the query feature
465is not equal to a known feature, then some of the bits may have been
466set due to \emph{collision} of the hash functions, but the feature
467will only be accepted if all $k$ hash functions result in collision;
468this results in a false positive.  The rate of false positives
469increases as the number of known features increases relative to the
470number of bits in the filter.  The Bloom filter is ideally suited to
471cases where the number of known features is small, and a verification
472step can be applied to eliminate any false positives that are
473produced.
474
475
476For feature matching, we want to know not only whether the query
477feature $q$ matches a known feature, we want to know \emph{which}
478known feature matches.  One way to answer such queries is to use a
479\emph{Bloomier filter} \cite{chazelle2004}.  The simplest Bloomier
480filter uses multiple Bloom filters to categorize a given feature into
481two different categories, and uses slightly more space than two Bloom
482filters.  A set of $n$ such Bloomier filters can be used to categorize
483a feature into $2^n$ different categories by assigning one Bloomier
484filter to each bit of the result.
485
486
487The standard Bloom filter only finds exact matches, but we want to
488find \emph{nearby} matches.  An extension of Bloom filters to allow
489proximity searches is given by Kirsch and Mitzenmacher
490\cite{kirsch2006b}.  This variant uses \emph{locality sensitive
491hashing} (LSH, discussed below) as its hashing function, so it
492inherits some limitations from LSH.  Most importantly, LSH only
493guarantees with some probability that nearby features will hash to the
494same bin.  As a result, the locality-sensitive Bloom filter can
495produce false negatives as well as false positives.
496
497
498In order to answer feature matching queries (find all known features
499near a given query feature), one could combine these Bloom filter
500extensions to build a locality-sensitive Bloomier filter.  However,
501the false negative rate of the locality-sensitive filter would be
502magnified because the Bloomier filter requires \mbox{$2
503\ceil{\log(n)}$} separate filters to represent $n$ known features, and
504all of these filters must produce the correct answer.  It seems that
505Bloom filters are not particularly well suited to this task.
506
507
508\subsection{Locality Sensitive Hashing}
509
510Locality Sensitive Hashing (LSH) \cite{datar2004, gionis1999,
511andoni2006}, is a technique for answering approximate nearest
512neighbour queries: it returns a feature whose distance is within a
513constant factor of the distance to the nearest neighbour.  The LSH
514scheme can be extended to handle exact nearest-neighbour queries
515efficiently if the set of known features satisfies some \emph{bounded
516growth} criteria.  It is intended for use in high-dimensional spaces
517(tens to thousands of dimensions), and $\ell_p$ distance metrics for
518$p \in (0, 2]$.
519
520
521The core idea of LSH is to use several hash functions with the
522property that the probability of collision---features being placed in
523the same bin---is much higher for nearby features than for distant
524ones.  In the indexing stage, we populate the hash table by running
525each hash function on each known feature.  To perform a query, we run
526the hash functions on the query feature and retrieve the features in
527the same hash table bin as the query.  One of these features is likely
528to be an approximate nearest neighbour of the query.
529
530
531In practice, in \cite{datar2004} each hash function is composed of
532several (typically $10$) simpler hash functions.  Each simple hash
533function performs a random projection in the feature space: the hash
534value is the discretized value of the dot product of the feature
535vector with the hash function's random vector.  Combining several of
536these simple hash functions is equivalent to projecting the feature
537vectors onto a grid in a $10$-dimensional, non-orthogonal, subspace.
538The hash code is simply the identifier of the grid cell in which the
539feature falls.
540
541
542This scheme suffers from edge effects: a query feature may land in a
543bin other than the bin containing its nearest neighbour due to small
544positional noise in any of the dimensions.  Attempting to reduce this
545effect by making the bin size (discretization level) larger fails
546because the number of hash functions must be increased to maintain the
547same total number of hash bins.
548
549
550In order to reduce the number of false negatives, we can choose many
551(typically $30$) different hash functions and place a reference to
552each feature in the hash table bin chosen by each function.  This
553increases the probability that at least one of the hash functions will
554select a $10$-dimensional subspace in which the query is close to its
555nearest neighbour.
556
557
558The standard LSH scheme does not seem particularly well suited for the
559purpose of feature matching in a geometric hashing system, since the
560dimensionality of the feature vectors is usually moderate ($2$ to $6$
561in the \an system). Since some feature aliasing is expected in
562geometric hashing systems, the nearest neighbour may not be the
563correct match: we would prefer to get all neighbours within a given
564search radius.
565
566
567\subsection{\Kdtrees}
568
569\begin{figure}
570  \begin{center}
571        % tree-figs.py : plot_bboxes
572    \begin{tabular}{c@{\hspace{3pt}}c@{\hspace{3pt}}c}
573      \includegraphics[width=0.31\textwidth]{kdtree-0} &
574      \includegraphics[width=0.31\textwidth]{kdtree-1} &
575      \includegraphics[width=0.31\textwidth]{kdtree-2} \\
576      \includegraphics[width=0.31\textwidth]{kdtree-3} &
577      \includegraphics[width=0.31\textwidth]{kdtree-4} &
578      \includegraphics[width=0.31\textwidth]{kdtree-5}
579    \end{tabular}
580  \end{center}
581  \caption[A \kdtree partitioning of space.]{A \kdtree partitioning of
582        space.  Each panel shows the nodes at one level of the tree.  The
583        points are 200 samples from a ring with uniformly-distributed
584        radius in $\left(0.6, 1\right)$ and uniformly-distributed angle in
585        $\left(0, \displaystyle{\frac{\pi}{2}}\right)$.  Nodes are split
586        along the dimension with the largest range, and the splitting
587        position is chosen so that the two child nodes will contain an
588        equal number of points.  Note how the tree adapts to the
589        distribution of the data: each node tightly bounds the data points
590        it owns.  This is a slightly different version of the \kdtree than
591        Bentley's original description.\label{kdtree}}
592\end{figure}
593
594The $k$-dimensional tree (\kdtree) was introduced by Bentley in 1975
595\cite{bentley1975}, and several extensions have been described
596\cite{friedman1977, sproull1991}.  The idea is to build a binary tree
597over the feature space, where each node ``owns'' a region of the space
598which is disjoint from the region owned by its sibling node.  Each
599non-leaf node has an axis-aligned splitting hyper-plane; its left
600child owns the subspace to the left of the splitting plane, and the
601right child the subspace to the right.  In this way, the \kdtree
602defines a hierarchical subdivision of the feature space into
603axis-aligned hyper-rectangles.  See \figref{kdtree}.
604
605
606For the purposes of indexing the features in a geometric hashing
607system, the set of features is known beforehand and is static, so we
608do not need to insert or delete features from the tree, and Bentley's
609\emph{optimized} \kdtree construction method can be used.  That is,
610the tree can be built to adapt to the particular set of features we
611have.
612
613
614Tree construction proceeds by first assigning all features to the root
615node, then recursively splitting nodes until each leaf node owns at
616most $L$ features, with $L$ perhaps $10$ or $20$.  Splitting is done
617by selecting a splitting plane and using it to partition the features.
618Sproull \cite{sproull1991} enumerates several strategies for choosing
619the splitting hyperplane.  Traditionally, \kdtrees use axis-aligned
620splitting hyperplanes, and the splitting dimension is chosen by simply
621iterating through the dimensions (as in Bentley's original paper), or
622by selecting the dimension with the largest \emph{range} or
623\emph{variance}.  One can also use an arbitrary splitting hyperplane,
624for example by finding the principal eigenvector of the covariance
625matrix of the feature vectors.  Typically the splitting hyperplane is
626positioned so that the two children own an equal number of data
627points.  This leads to a balanced (and therefore short) tree, but in
628some applications it can be beneficial to bias the splitting location
629\cite{macdonald1990}.
630
631
632Depending on the application, it can be beneficial to store at each
633node the tight axis-aligned bounding box (and other summary statistics
634\cite{deng1995}) of the points owned by that node (as shown in
635\figref{kdtree}).  This is particularly useful where there is
636significant correlation between the dimensions of the data, because
637splitting the data points along one dimension implies that the
638bounding box will shrink in other dimensions as well, and this can
639result in faster searches.  If the bounding boxes are stored, then it
640is no longer strictly necessary to store the splitting hyperplane,
641though it may be useful to speed up some computations.
642
643
644\comment{
645\begin{figure}
646  \begin{center}
647    \includegraphics[width=\textwidth]{rangesearch-bb-fig}
648  \end{center}
649  \caption[\Kdtree range search with bounding boxes.]{\Kdtree range
650search for a query point $q$ in \kdtree node $N$, for a \kdtree that
651stores the bounding box of each node.  We check whether the query
652hypersphere overlaps the child nodes $L$ and $R$.  In both cases $q$
653is contained in the left child node $L$ so we must recurse on $L$.  In
654the top panel, the distance \emph{in the splitting dimension} from $q$
655to the bounding box of node $R$ is less than the search radius $r_1$
656so we do not need to recurse on $R$.  In the lower panel, the distance
657in the search dimension to the query is larger than the search radius
658$r_2$ so we must recurse on $R$.}
659  \label{rangesearch-bb}
660\end{figure}
661
662
663\begin{figure}
664  \begin{center}
665    \includegraphics[width=\textwidth]{bohm2001fig7}
666  \end{center}
667  \caption[Lower and Upper Distance Bounds in a \kdtree.]{Lower and upper
668        bounds (min- and max-dist, respectively) on the distance from a query point
669        to a \kdtree bounding box.
670        This figure is copied from \cite{bohm2001} Figure 7.}
671  \label{upperlower}
672\end{figure}
673 
674
675\begin{figure}
676  \begin{center}
677    \includegraphics[width=\textwidth]{rangesearch-split-fig}
678  \end{center}
679  \caption[\kdtree range search with splitting planes.]{\kdtree range
680        search for a query point $q$ in \kdtree node $N$,
681        for a \kdtree that only stores the splitting plane (not the bounding boxes).
682        Top: since the query is to the left of $N$'s splitting plane (dashed line),
683        we must recurse on the left child $L$.  Since the distance in the splitting
684        dimension from the query to the splitting plane, $D_R$ is less than $r$,
685        we must recurse on $R$.
686
687        Bottom: For the left node ($L$): the query is above the splitting plane, so we must recurse
688        on child $T$.  The distance $D_B$ in the splitting dimension to the splitting plane
689        is less than $r$, so we must recurse on $B$ also.
690       
691        For the right node ($R$): the query is above the splitting plane, so we must recurse
692        on child $t$.  The distance $D_b$ to the splitting plane is greater than $r$ so we
693        need not recurse on node $b$.  (Note, we could also keep track of the \emph{total}
694        distance to the query point, $\sqrt{D_R^2 + D_b^2}$, but this requires more book-keeping.}
695  \label{rangesearch-sp}
696\end{figure}
697}
698
699For feature matching, we typically want to do \emph{range search}:
700find all features within radius $r$ of a given query feature $q$.
701This is easily achieved with a \kdtree using a recursive algorithm.
702Starting at the root, we must decide whether we need to recurse on
703each of the node's children.  (How we make this decision is explained
704below.)  Once we reach a leaf, we enumerate the features owned by the
705leaf and compute the distance to the query feature; any feature with
706distance less than $r$ is accepted.
707
708
709We decide whether we must recurse on a node's children based on their
710bounding boxes.  We compute the \emph{mindist}---the lower-bound of
711distance---between the query point and each child's bounding box.  The
712mindist is simply the distance to the corner or edge of the bounding
713box, and can be computed quickly.  Any node whose mindist to the query
714point is less than the query radius $r$ must be visited.  If the
715\kdtree does not explicitly store the tight bounding-boxes of the
716nodes, loose bounding-boxes can be constructed during the recursion
717based on the splitting planes of the ancestors.
718
719
720\comment{
721Nearest-neighbour search in \kdtrees is essentially the same as range
722search, except that $r$ is considered the \emph{pruning distance}; any
723nodes whose lower-bound distance is larger than the pruning distance
724need not be visited.  The pruning distance decreases monotonically as
725the search proceeds and is simply the (upper bound of) the distance to
726the nearest neighbour found so far.  Since reducing the pruning
727distance is critically important to achieving fast search, the order
728in which nodes are explored is important; one typically keeps a
729priority queue of nodes to visit, sorted by the (lower-bound) distance
730of the node to the query feature.  If the query feature falls in the
731region owned by one of the leaf nodes, then all nodes on the path from
732the root to that leaf will have lower-bound distance of zero, so this
733path will be explored first and the ``greedy'' nearest neighbour
734distance will become the first pruning distance.
735
736
737The \kdtree can also be used for approximate nearest-neighbour or
738range searches.  Beis and Lowe \cite{beis1997} describe the Best Bin
739First algorithm, which is essentially the same as the full
740nearest-neighbour search except that the search terminates after
741exploring a fixed maximum number of leaves or after a fixed period of
742time has elapsed.  The priority queue is sorted by the lower-bound
743distance of the node's bounding box to the query point, so the `most
744promising' nodes are explored first.
745}
746
747
748Chapter \ref{chap:kdtree} presents the \kdtree in more detail.
749
750
751\subsection{Other approaches}
752
753% Voronoi:
754% -de Berg 1997
755%   (Mark de Berg , Marc van Kreveld , Mark Overmars , Otfried Schwarzkopf, Computational geometry: algorithms and applications, Springer-Verlag New York, Inc., Secaucus, NJ, 1997)
756% -Edelsbrunner 1987
757%   (Herbert Edelsbrunner, Algorithms in combinatorial geometry, Springer-Verlag New York, Inc., New York, NY, 1987)
758% -Preparata and Shamon 1985
759%   (Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985)
760
761In contrast to the \kdtree's rectangular decomposition of space, there
762is a family of space decompositions that use hyperspheres.  An example
763is the \emph{ball-tree} \cite{uhlmann1991b}, in which each non-leaf
764node is associated with one of the known features $f$ and a radius
765$r$; the left child owns all features within radius $r$ of feature
766$f$, and the right child owns the remaining features.  Similarly, in
767the \emph{anchors hierarchy} \cite{moore2000}, each non-leaf node is
768represented by a feature (called its \emph{anchor}), and a node owns
769all the features that are closer to its anchor than the anchor of its
770sibling.  These sphere-based trees are supposed to better capture the
771structure of high-dimensional data sets, though the results seem to
772depend quite strongly on the particular data distributions and search
773parameters.
774
775
776For two-dimensional feature spaces, algorithms based on planar point
777location (eg, \cite{kirkpatrick1983}) and the Voronoi tesselation of
778space yield $\mathcal{O}(n \log n)$ preprocessing time, a data
779structure of size $\mathcal{O}(n)$, and query time of
780$\mathcal{O}(\log{n})$.  Unfortunately, in higher dimensions the space
781required to store the Voronoi tesselation is
782$\mathcal{O}(n^{\floor{d/2}})$, which quickly becomes untenable
783\cite{arya1998}.
784
785% planar point location:
786% -Dobkin and Lipton in 1976 O((log n)^2) time, O(n^2) space
787% -Sarnak and Tarjan O(n) space, O(log n) time
788%    * Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees". Communications of the ACM 29 (7): 669-679.
789% -Edelsbrunner, Guibas, and Stolfi, same
790%    * Herbert Edelsbrunner, Leonidas J. Guibas, Jorge Stolfi (1986). "Optimal point location in a monotone subdivision". SIAM Journal on Computing 15 (2): 317-340.
791% -Kirkpatrick, same
792%    * David G. Kirkpatrick (1983). "Optimal Search in Planar Subdivisions". SIAM J. Comput. 12: 28-35.
793% Voronoi tesselation:
794% -B. Delaunay, Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934
795
796The design of spatial indexing data structures and search algorithms
797is itself a large research area.  Dozens of different tree structures
798have been proposed: in two surveys B\"ohm \cite{bohm2001} and
799Hjaltason \cite{hjaltason2003} identify B-, $\textrm{B}^{+}$-, ball-,
800bisector-, BSP-, DABS-, fq-, gh-, \mbox{GNA-,} hB-, $\textrm{hB}^{\pi}$-,
801hybrid-, IQ-, kd-, kd-B-, $\textrm{LSD}^h$-, M-, mb-,
802$\textrm{mb}^{\ast}$-, mvp-, oct-, \mbox{post-office-,} pyramid-, quad-, R-,
803$\textrm{R}^{\ast}$-, $\textrm{R}^{+}$-, sa-, slim-, \mbox{sphere-,}
804SR-, SS-, TV-, vp-, $\textrm{vp}^{\ast}$-, and X-trees.  There is an
805equally mind-boggling variety of exact and approximate search
806algorithms.  Each data structure and algorithm may work well in some
807region of parameter space, but there is no clear winner for
808general-purpose use.
809
810
811\section{Astrometric calibration as a pattern recognition task}
812
813
814\comment{
815wget "http://casjobs.sdss.org/ImgCutoutDR7/getjpeg.aspx?ra=166.45&dec=-0.03&scale=1&opt=&width=2000&height=2000" -O ngc3521-orig.jpg
816jpegtopnm ngc3521-orig.jpg | pnmrotate -45 | pnmcut 600 900 1400 1000 | pnmscale -reduce 2 | pnmtojpeg > ngc3521.jpg
817#---> http://live.astrometry.net/status.php?job=alpha-200906-68444159
818jpegtopnm ngc3521.jpg | ppmtopgm | pnminvert | pnmtojpeg > ngc3521-bw.jpg
819wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=field.xy.fits" -O ngc3521.xy
820wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=index.xy.fits" -O ngc3521-index.xy
821jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white > ngc3521-sources.png
822jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white -P | plotxy -I - -i ngc3521-index.xy -x 1 -y 1 -C black -b white -s crosshair -P | plot-constellations -f 18 -w ngc3521.wcs -i - -N -o ngc3521-index.png
823%%% wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=wcs.fits" -O ngc3521.wcs
824%%% scp gmaps:/data2/test-merc/tycho.mkdt.fits .
825wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-0.5&bb=0,-85,360,85&dashbox=0.1&w=500&h=500&lw=3" -O ngc3521-zoom0.png
826wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-1&bb=175.533,-17.7621663832,211.533,17.6598478619&dashbox=0.01&w=500&h=500&lw=3" -O ngc3521-zoom1.png
827wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=0.5&bb=191.733,-1.85338140354,195.333,1.74602498613&w=500&h=500&lw=3" -O ngc3521-zoom2.png
828for x in 0 1 2; do
829 pngtopnm ngc3521-zoom${x}.png | ppmtopgm | pnminvert | pnmtopng > ngc3521-zoom${x}-bw.png;
830done
831}
832
833
834% ~/an-2/usnob-map/execs/tilerender -x 0.000000 -y -85.000000 -X 360.000000 -Y 85.000000 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.1 -d > tile1.png
835% ~/an-2/usnob-map/execs/tilerender -x 151.724000 -y -4.784223 -X 187.724000 -Y 29.772221 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.25 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.01 -d > tile2.png
836% ~/an-2/usnob-map/execs/tilerender -x 167.924000 -y 11.335527 -X 171.524000 -Y 14.841399 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g 0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 > tile3.png
837%
838% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/index.xy.fits -S 1 -W 720 -H 503 -x 1 -y 1 -w 2 -r 6 -s s > ixy.pgm
839% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/field.xy.fits -S 1 -W 720 -H 503 -r 5 -x 1 -y 1 -w 2 -s c > fxy.pgm
840% cp /home/gmaps/ontheweb-data/tor/200706/51145570/image.pnm undim.ppm
841% pgmtoppm green ixy.pgm > ixy.ppm
842% pnmcomp -alpha=ixy.pgm ixy.ppm undim.ppm > sum.ppm
843% pnmcomp -alpha=fxy.pgm fxy.ppm sum.ppm > sum2.ppm
844% pnmtopng sum2.ppm > sum.png
845% pnmcomp -alpha=fxy.pgm fxy.ppm undim.ppm | pnmtopng > sources.png
846% http://oven.cosmo.fas.nyu.edu/test/status.php?job=tor-200706-51145570
847
848
849\begin{figure}
850\begin{center}
851\framebox{\includegraphics[width=0.99\figunit]{ngc3521-bw}} \\
852\framebox{\includegraphics[width=0.99\figunit]{ngc3521-sources}} \\
853\framebox{\includegraphics[width=0.99\figunit]{ngc3521-index}}
854\end{center}
855\caption{\captionpart{Top:} Input image (credit: Sloan Digital Sky
856Survey).  \captionpart{Middle:} The brightest 100 sources extracted
857from the image.  \captionpart{Bottom:} Reference sources, transformed
858into the image coordinate system (crosshairs).  Many of the image and
859reference sources are aligned, but there are many image sources
860without reference sources.  Our system knows about the positions of
861many objects of interest on the sky, and has labelled the galaxy NGC
8623521.\label{fig:redgreen}}
863\end{figure}
864
865
866\begin{figure}
867\begin{center}
868\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
869\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom0-bw}} &
870\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom1-bw}} &
871\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom2-bw}}
872\end{tabular}
873\end{center}
874\caption{The location of the input image on the sky.
875  \captionpart{Left:} The whole sky, in Mercator projection.  The
876  sinusoid-shaped feature is the Milky Way.  The dashed box shows the
877  zoomed-in region.  \captionpart{Middle:} Zoomed in by a factor of
878  $10$.  \captionpart{Right:} Zoomed in by a factor of $100$.  The box
879  shows the outline of the input image.\label{fig:onthesky}}
880\end{figure}
881
882
883For modern astronomers, astrometric calibration is often one of the
884first steps toward getting useful information out of an image of the
885sky.  Aligning a new image with an \emph{astrometric reference
886catalog} allows the astronomer to place the image within a standard
887coordinate frame.  This allows stars, galaxies, and other objects
888(\emph{sources}) in the new image to be identified with known sources,
889which in turn allows astronomers to calibrate other properties of the
890new image, and allows the positions of new sources to be described in
891a standard reference frame.
892
893
894The task of blind astrometric calibration---automatically finding the
895astrometric calibration of an image, using only the information in the
896image pixels---can be seen as a pattern recognition problem.  As
897Bertin \cite{bertin2005} notes, ``astrometric and photometric
898calibrations have remained the most tiresome step in the reduction of
899large imaging surveys,'' so this is not only an interesting problem to
900solve, but one with practical implications for astronomers.
901
902
903For the purposes of astrometric calibration, we can think of the sky
904as a large two-dimensional surface: the stars are very distant, so our
905viewpoint is effectively fixed.  We are moving, as are the stars, but
906these motions are small relative to the precision at which we
907typically work.  The sky contains many stars, galaxies, and other
908astronomical sources.  The stars and distant galaxies are effectively
909point sources, while closer galaxies can be resolved.  Astrometric
910reference catalogs list the positions, motions, and brightnesses of
911these sources and serve as the ``ground truth'' or database of known
912(reference) objects.  The USNO-B1 catalog \cite{usnob, nomad}, for
913example, lists over one billion objects.  As many as a few percent of
914these are false detections or other artifacts \cite{barroncleaning},
915and some objects that should be visible are missing.
916
917
918Extra sources can be due to planets, comets, satellites, or aircraft.
919Missing or poorly localized source can be due to imperfections in the
920imaging sensor, saturation, cosmic ray interference, or (rarely)
921occlusion.  Errors in the image processing that detects sources can
922lead to sources being gained or lost.  We call sources that appear in
923only the input image or reference catalog \emph{distractors} and
924\emph{dropouts}, respectively.  The existence of distractors and
925dropouts means that we can never assume that all the objects in the
926reference catalog will be contained in an image to be recognized, or
927vice versa.
928
929
930The images to be recognized are subregions of the sky.  Image sizes
931range from nearly half the celestial sphere down to $10^{-7}$ of the
932area and smaller.  The input images measure unknown bands of the
933electromagnetic spectrum, and various nonlinear functions may have
934been applied to the pixel values.  We cannot rely on absolute
935brightness or color to recognize individual stars or galaxies.  At
936best we can hope that there is some positive correlation in the
937relative brightness ordering of objects in the image and the
938corresponding objects in our catalog.
939
940
941Blind astrometric calibration is an ideal task for exploring geometric
942ideas in pattern recognition.  Most celestial objects are effectively
943point sources, and can be found and localized to sub-pixel accuracy
944using relatively simple image-processing procedures.  But since the
945individual features are characterized only by their positions and
946brightnesses, we must examine collections of features in order to
947build distinctive patterns.  In \chapref{chap:techreport} we present
948\an, which applies the geometric hashing framework to the task of
949blind astrometric calibration.  An example of our results in shown in
950\figs \ref{fig:redgreen} and \ref{fig:onthesky}.
951
952
953\comment{
954There are several useful applications for a blind astrometry solver.
955Most telescopes used by professional astronomers are
956computer-controlled and automatically record the telescope pointing in
957the image header, but these systems occasionally fail.  Feeding the
958images to a blind astrometry solver would allow an independent
959verification that the \metadata written by the telescope control
960software are correct.  Second, some observatories have large archives
961of photographic plates taken by astronomers over many decades, and a
962growing fraction of these have been digitized.  These images could be
963useful to researchers, since they provide a longer baseline for
964measuring changes over time.  However, information about where the
965telescope was pointing has often been lost, or is only available as a
966hand-written note on the edge of the photographic plate, or as an
967entry in a logbook.  A blind astrometry solver would be an invaluable
968tool for easily providing accurate \metadata about these images.
969Finally, these is a large and enthusiastic community of amateur
970astronomers who produce many images that would be useful to
971professional astronomers, but since amateurs are seldom interested in
972recording exact \metadata, these images are not readily available to
973professional astronomers.  A blind astrometry solver would unlock this
974trove of pixels.
975
976Other interesting applications include creating a customized
977star-finding chart overlayed on the input image, and controlling a
978telescope by using the output of a blind solver as feedback to a
979control system.
980
981\begin{figure}
982  \begin{center}
983        %\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
984        \includegraphics[width=\figunit]{moon} \\
985        \includegraphics[width=\figunit]{saturated}
986        %\end{tabular}
987  \end{center}
988  \caption{Astronomical images with occlusion.  \captionpart{Top:} The
989moon, buildings, and mountains occlude the stars behind them (image
990copyright Johannes Schedler, \texttt{http://panther-observatory.com}).
991\captionpart{Bottom:} Saturation and diffraction spikes due to bright
992objects can obscure nearby objects.}
993\label{moon}
994\end{figure}
995
996Since most celestial objects are effectively point sources, the
997obvious image feature to use is an individual object.  Feature
998detection is relatively straightforward, and the positions of features
999can be measured with subpixel accuracy, but the feature descriptor
1000essentially contains only brightness and position; individual features
1001are not distinctive.
1002
1003Although occlusion is typically not a problem in astronomical images,
1004it does occasionally occur (see Figure \ref{moon}).  In addition,
1005bright objects in the image can cause diffraction spikes or saturation
1006over a large region of the image, and this can cause nearby objects to
1007be hidden.
1008}
1009
1010
1011
1012\section{Related work in astrometric calibration}
1013
1014There seem to be two distinct groups of researchers who have worked on
1015astrometric calibration.  The first are professional astronomers,
1016whose images are typically of small angular extent, long exposure
1017time, and high quality.  They typically assume that there is a good
1018initial guess of the astrometric calibration and the goal is to
1019produce a very accurate calibration, including image distortion.  The
1020second group of researchers are spacecraft engineers who want to use
1021the stars to estimate the attitude of a camera attached to a
1022spacecraft.  Here the images are of wide angular extent, have short
1023exposure time, and are very noisy.  Primary concerns include weight,
1024power consumption, and robustness (especially in avoiding false
1025positives), while a high degree of accuracy is neither required nor
1026possible given the hardware.
1027
1028\subsection{Non-blind astrometric calibration}
1029
1030Non-blind astrometric calibration requires that an initial estimate of
1031the calibration to be provided.  Groth \cite{groth1986} presents an
1032algorithm for matching two lists of coordinates (eg, image coordinates
1033in pixels and reference star coordinates on the celestial sphere),
1034assuming that the lists contain a significant proportion of objects in
1035common.  With the limited computing resources available at the time,
1036he suggests taking the brightest 20 or 30 objects in each list.
1037
1038
1039Once the two lists of objects have been compiled and the brightest
1040objects selected, all sets of three objects are enumerated and the
1041triangles they form are described by a scale-, rotation-, and
1042translation-invariant descriptor, composed of the length ratio of the
1043longest to shortest edges, plus the cosine between these edges and the
1044\emph{sense} or \emph{parity} of the triangle.  The tolerances
1045associated with these features are computed (by propagating their
1046positional errors through the feature descriptor process) and stored,
1047along with the logarithm of the triangle's perimeter.  Triangles with
1048large length ratio are rejected since they are relatively
1049indistinctive.
1050
1051
1052After triangle features are extracted from both point lists, feature
1053matching is performed by checking whether the distance between each
1054pair of features is less than their corresponding tolerances.  This
1055process is accelerated by sorting the features on one of the feature
1056dimensions.  If multiple matches are found for a particular feature,
1057only the closest is considered.  After all the features have been
1058compared, many correct matches should be found, along with some false
1059matches.  For each match, the difference of the log-perimeters of the
1060two triangles is computed.  This gives the relative scales of the two
1061triangles and hence their coordinate frames, assuming the match is
1062correct.  False matches are rejected by iterative outlier detection:
1063the mean difference of log-perimeters is computed and matches far from
1064the mean are rejected.
1065
1066
1067This approach is essentially an application of the geometric hashing
1068method, though instead of using hashing to accelerate feature
1069matching, the features are simply kept in lists that are sorted on one
1070dimension.  Much of the subsequent work in this area follows
1071essentially the same path (eg, \cite{valdes1995}, \cite{dong2006}).
1072
1073
1074P\'al and Bakos \cite{pal2006} adapt the triangle-matching approach to
1075images containing many more objects (of order $10^4$).  Since it would
1076become prohibitively expensive to enumerate all triangles in such an
1077image, they reduce the number of triangles created by using only the
1078triangles created by a Delauney triangulation.  This vastly reduces
1079the number of triangles created, but makes the algorithm more
1080sensitive to dropout and distractor stars: a single extra star causes
1081a completely disjoint set of triangles to be created in its
1082neighbourhood.  To compensate for this shortcoming, they define an
1083\emph{extended Delauney triangulation}: for each point, they select
1084all points at distance $\ell$ in the Delauney triangulation, and
1085triangulate this set.  This process is repeated for each point.  The
1086first extended triangulation ($\ell = 2$) skips over the nearest set
1087of stars and builds larger triangles by using the next-further set of
1088stars.  The intent is that if some of the nearby stars are distractors
1089that they will be ignored when the extended triangulations are used.
1090
1091\begin{figure}
1092\begin{center}
1093\includegraphics[width=\figunit]{pal-fig2}
1094\end{center}
1095\caption{The triangle parameterization used by P\'al and Bakos.  This
1096figure is a reproduction of \fig 2 in \cite{pal2006}.\label{pal}}
1097\end{figure}
1098
1099P\'al and Bakos introduce a new triangle parameterization which is
1100continuous and sensitive to chirality (parity); see \figref{pal}.
1101This two-dimensional parameter space is used as the geometric feature
1102space.  When matching triangles between the two images, they demand a
1103\emph{symmetric point match}: each triangle must be the other
1104triangle's nearest neighbour in feature space.  Matching two images is
1105done by creating the lists of triangles and attempting to find
1106symmetric point matches.  This process is accelerated by sorting each
1107list by one of the coordinates.  Each triangle match is considered to
1108be a vote for the correspondence of the three pairs of points
1109composing the triangles.  These votes are accumulated in a sparse
1110matrix where element $(i, j)$ contains the number of votes for a
1111correspondence between object $i$ in the first image and object $j$ in
1112the second image.  After all matches are considered, the top 40\% of
1113the nonzero matrix elements are assumed to contain the true
1114correspondences.  A transformation based on these correspondence is
1115computed and the unitarity of the transformation matrix is used to
1116judge whether the match is true or false.
1117
1118
1119A different approach is taken by Kaiser \etal \cite{kaiser1999}.  They
1120assume they are given two lists of source positions that differ by a
1121rigid transformation involving scaling, rotation, and translation.  In
1122each image, they iterate over each pair of points and compute the
1123vector difference of their positions.  For each list, the log-length
1124and angle of the pairwise difference vectors are accumulated in a
1125two-dimensional histogram.  Observe that if the whole list of points
1126is scaled up by a constant factor, then the log-distance between each
1127pair of points increases by a constant amount.  Similarly, if the
1128whole list is rotated then the angles shift by a constant amount.
1129Once the two histograms have been computed, their cross-correlation is
1130computed.  If the two lists contain a significant number of
1131corresponding points, the cross-correlation signal will be strongest
1132at a shift corresponding to the difference in log-scale and rotation
1133between the lists.  This process is similar to a Hough transform
1134\cite{duda1972,ballard1981}, except that instead of finding the peak
1135of a single parameter-space histogram, we are searching for a peak in
1136the similarity of two histograms as we shift them with respect to each
1137other in parameter space.
1138
1139
1140Once the scaling and rotation between the lists has been found, the
1141translation can be found by scaling and rotating one of the lists into
1142the frame of the other list, then histogramming the vector difference
1143between points across the two lists.  This is a standard (generalized)
1144Hough transform.
1145
1146
1147\subsection{Blind astrometric calibration}
1148
1149
1150The majority of previous work on blind astrometric calibration is
1151motivated by the problem of spacecraft attitude estimation.  Sometimes
1152called the ``lost in space'' or ``stellar gyroscope'' problem, the
1153task is to estimate the pose of a spacecraft by using an image of the
1154sky captured by an onboard camera.  Although similar in general
1155spirit, the requirements and limitations of this application are quite
1156different than astronomical applications.  Mass, power consumption,
1157and robustness of the system are primary concerns, and as a result the
1158optical designs are very different from science-grade astronomical
1159instruments, and the available processor and memory resources are very
1160restricted.  Typical fields of view are tens of degrees across, and
1161the exposure time is kept short to allow the system to function while
1162the spacecraft is rotating.  As a result, image quality is typically
1163quite poor: often only a handful of the brightest stars are visible.
1164Since the field of view is large, a reference catalog of a few
1165thousand stars is sufficient to ensure that any view of the sky
1166contains many reference stars.  Since the system will only process
1167images from a single camera, the whole system can be customized and
1168calibrated to that camera.  For example, the nonlinear distortions of
1169the optical system can be measured, and the scale and bandpass of the
1170imaging system are known, so the reference catalog can be tailored to
1171match.
1172
1173
1174For example, Liebe \etal \cite{liebe2004} describe a system design
1175with a $56~\deg$ field of view and exposure time of $50~\unit{ms}$.
1176The resulting images contain tens of stars if the spacecraft is not
1177rotating, but on average only three stars will be detectable when the
1178rotation rate is $50~\deg/\unit{sec}$.  The paper does not describe a
1179particular algorithm for star identification, with the implication
1180that it is not a particularly difficult problem since absolute
1181brightness information will be available, and the total number of
1182stars that are visible to the camera is only a few hundred.
1183
1184
1185In earlier work \cite{liebe1993}, Liebe describes a star
1186identification system.  The reference catalog is composed of the 1539
1187brightest stars, with brightness calibrated to the camera used in the
1188system.  The field of view is $30~\degrees$, and the system uses a
1189feature-matching approach, using the brightest star in the field and
1190its two nearest neighbours to define a geometric feature.  The feature
1191descriptor is composed of the distance from the brightest star to its
1192nearest and second-nearest neighbours, along with the angle between
1193these neighbours.  Note that this feature is not scale-invariant,
1194because the system is only meant to solve images taken by one camera,
1195so the scale is known.  An index is constructed by enumerating all
1196such features that could possibly be detected, given the detection
1197limits of the camera.  These features are coarsely quantized and
1198stored in a table.  To account for noise in the feature descriptors,
1199all neighbouring cells in the quantized feature space are also stored
1200in the table.  This generates $185,000$ features.  At test time, the
1201features in the image are enumerated and the table of features is
1202scanned; an exact feature match in the quantized space is assumed to
1203be correct.
1204
1205
1206This approach is a fairly straightforward geometric hashing technique,
1207except that it does not use hashing as such, and there is no voting or
1208verification scheme because feature aliasing is assumed not to happen.
1209
1210\begin{figure}
1211\begin{center}
1212\includegraphics[width=\figunit]{grid}
1213\end{center}
1214\caption{A grid-based feature: two sources are used to define a local
1215  coordinate system which is discretized into a grid of cells.  Each
1216  cell becomes a bit in the feature descriptor; if a cell is occupied
1217  its bit is set.  This figure is copied from MacKay
1218  \cite{mackay2005}.}
1219\label{gridbased}
1220\end{figure}
1221
1222In contrast to systems that build features from the precise locations
1223of a small number of stars, Padgett and Kreutz-Delgado
1224\cite{padgett1997} present an approach that incorporates information
1225from a large portion of the image.  Their grid-based feature is
1226defined by first choosing one star as the reference star to define the
1227center of the grid, then selecting its nearest neighbour (outside an
1228exclusion radius) to define the orientation of the grid.  Once the
1229center and orientation are determined, a grid is defined, and each
1230remaining stars in the image is assigned to a grid cell.  The feature
1231is a bit vector, one bit per grid cell, where the bit is set only if
1232the cell contains a star.  See \figref{gridbased}.
1233
1234
1235The index is constructed by scanning the sky and selecting, for each
1236field of view, the $n$ brightest stars.  For each of these stars a
1237feature is computed and stored.  At test time, they examine the
1238brightest $c n$ stars (for some safety factor $c$), computing the
1239feature for each one and searching the index for a match.  A pair of
1240features is defined to match if their dot product is above a
1241threshold.  This is equivalent to taking the bitwise \textsc{AND} of
1242the bit vectors and counting the number of bits that are set.  This
1243search can be implemented efficiently by using lookup tables: for each
1244bit, they maintain a list of the features for which that bit is set.
1245Note that this is equivalent to using a grid-based geometric hashing
1246approach: each grid cell is equivalent to a discretized relative
1247coordinate vector, which becomes a hash key, and the ``lookup table''
1248is a hash table.
1249
1250
1251After all the features in the test image have been extracted and
1252matched to the index, the algorithm proceeds to a voting (or perhaps
1253``consensus'') phase where it rejects matches that propose a field of
1254view that does not overlap that of the majority.
1255
1256
1257The system is designed to operate on images of diameter $8~\degrees$,
1258and performs well on simulated data.  They use a grid size of $40
1259\times 40$, and find that on average $25$ grid cells are filled.
1260Their index contains $13,000$ patterns, which means that false
1261positives are quite rare.  However, misidentification of the nearest
1262neighbour, or edge effects (assigning a star to the wrong grid cell
1263due to positional noise) mean that failures to find a match are not
1264uncommon, and are more likely in regions of the sky with high stellar
1265density.
1266
1267In later work, Clouse and Padgett \cite{clouse2000} extend this
1268approach by using Bayesian decision theory instead of a simple
1269threshold on the dot product to define how well features match.  This
1270extension, along with smaller noise levels in the (simulated) imaging
1271system, allows them to extend the approach down to fields of view
1272$2~\degrees$ in diameter.
1273
1274\comment{
1275  Given two features, they compute the dot product between the bit
1276  vectors, then proceed to estimate the probabilities that the match is
1277  a true positive and false positive.  These probability distributions
1278  are quite complex, so several simplifying approximations are made, and
1279  the remaining parameters are estimated by running simulations.
1280  }
1281
1282Harvey \cite{harvey2004} presents two different approaches, one
1283grid-based and the other shape-based.  The grid-based approach is
1284similar to the Padgett--Kreutz-Delgado and Clouse--Padgett approaches
1285\cite{padgett1997, clouse2000}.  A coarser grid is used, so more stars
1286are likely to appear in each bin.  To compensate, a grid cell is only
1287considered ``occupied'' if it contains more than some threshold number
1288of stars.  The other major change is that he expects a test image to
1289be an ``overexposure'' or ``underexposure'' relative to the reference
1290catalog.  This implies that the test image should contain either a
1291subset or a superset of the stars in the index, and therefore the
1292image feature vector must be either greater than or less than an index
1293feature at each bit.  This allows simple bit operations to be used to
1294find feature matches, and feature matching is performed by a linear
1295scan through the index.
1296
1297Harvey's shape-based approach uses constrained $n$-star constellations
1298in order to aggregate information from a large number of stars without
1299allowing the number of potential features to grow combinatorially.
1300Specifically, Harvey uses a pair of stars to define a narrow wedge in
1301the image, then describes the relative positions and angles of a fixed
1302number of nearby stars within that wedge.  Unfortunately, this makes
1303the feature highly sensitive to distractor and dropout stars, since
1304the feature depends on the stars in the feature being enumerated in a
1305particular order.  As a result, all potential features (allowing any
1306combination of stars to drop out) must be checked; the number of
1307features grows exponentially as the density of stars increases.
1308
1309
1310Harvey makes the useful observation that a cascade of indices can be
1311built, where each index is designed to solve images with a particular
1312range of scales.
1313
1314
1315MacKay and Roweis \cite{mackay2005} point out that a grid-based
1316feature such as that used by Harvey leads naturally to a hashing-based
1317strategy.  Each grid cell is associated with a bit that is turned on
1318if the cell is occupied.  This value is placed in a hash table with a
1319mapping back to its position on the sky.  Since false positives can
1320occur as a result of feature aliasing or hash collision, a voting
1321scheme is employed: several feature matches must accumulate before the
1322match is accepted.  Since false negatives can occur as a result of
1323dropouts and distractors (\emph{any} missing or extra star causes the
1324hash code to change completely), many features must be extracted for
1325each region of the sky.
1326
1327
1328\subsection{Fine-tuning astrometric calibrations}
1329
1330In order to ``co-add'' or ``stack'' astronomical images (combine
1331pixels from different images to produce a higher signal-to-noise
1332image), or to do proper-motion studies (measure the movement of stars
1333over time), it is necessary to fine-tune the astrometric calibration
1334of the images.  This is similar to the \emph{bundle adjustment}
1335problem in computer vision \cite{triggs2000}, in that it involves the
1336simultaneous optimization of the various camera and telescope
1337parameters and the estimated positions of objects in the world.
1338Fine-tuning astrometric calibrations is easier because it is
1339essentially two-dimensional, but more difficult because the camera
1340parameters include polynomial distortion terms to model the image
1341distortion introduced by telescope optics.
1342
1343
1344The software package \scamp by Bertin \cite{bertin2005} is a popular
1345tool used by astronomers to fine-tune simultaneously the astrometric
1346calibrations of a large collection of images.  For each image, it is
1347assumed that the center of the image is known to about 25\% of the
1348size of the image, and the scale is known to within a factor of two.
1349The histogram-alignment method of \cite{kaiser1999} is used to find
1350the translation, scaling, and rotation between the image and a
1351reference catalog, which allows the correspondences between image and
1352reference catalog stars to be determined.
1353
1354The core of the \scamp system is a chi-squared minimization of the
1355total weighted distance between the projected positions of all star
1356correspondences among the set of images and the reference catalog.
1357The parameters to be adjusted are the center, scale, rotation, and
1358polynomial distortion coefficients of each image.  A key feature is
1359the ability to share image distortion parameters among subsets of the
1360images, since images taken with the same telescope are expected to
1361share some distortion terms due to the telescope optics.  This reduces
1362the degrees of freedom of the fitting process, resulting in more
1363robust fits.  \scamp can fine-tune thousands of images to sub-pixel
1364accuracy.
1365
1366
1367\section{Summary}
1368
1369
1370The task of automatically finding the astrometric calibration of an
1371astronomical image---this is, recognizing the stars in the image, or
1372locating the image on the sky---is ideal for a geometric hashing-based
1373approach.  Geometric hashing (or more generally, geometric feature
1374matching) is a widely-applicable technique for problems in which it is
1375desirable to build distinctive geometric features out of simple
1376interest-point features.  The approach is ideally suited to the blind
1377astrometric calibration problem, since images of celestial objects can
1378be localized to sub-pixel accuracy but have few other distinguishing
1379features.  Indeed, much of the previous work in blind and non-blind
1380astrometric calibration can be placed within a geometric feature
1381matching framework.
1382
1383
1384\comment{
1385  Although one might suppose that hashing is an intrinsic part of
1386  geometric hashing, we have found that a tree structure is more
1387  effective for the large-scale feature matching we want to perform; we
1388  feel that deciding to use approximate feature matching should be a
1389  design decision, not an intrinsic part of the method: in cases where
1390  each image contains only a few features that could potentially be
1391  found in the index, we do not want to risk missing them due to ``bad
1392  luck'' in a hash function.
1393}
1394
1395
Note: See TracBrowser for help on using the repository browser.