source: trunk/documents/theses/dstn/review.tex @ 12308

Revision 12308, 71.1 KB checked in by dstn, 14 months ago (diff)

screwing around with numbering paragraphs

Line 
1\section{Preface}
2
3\tracingmacros=2
4\numberparagraphs
5
6This thesis describes some of the research carried out by me and my
7colleagues in the \an group.  \an started as a collaboration between
8Sam Roweis, a University of Toronto computer scientist, and David
9W.~Hogg, a New York University astronomer.  An idea that began as a
10crazy scrawl on the back of a napkin---probably in a bar---eventually
11grew into a real project and began attracting collaborators and
12students.  Fast-forwarding a few years, we have achieved a rare feat:
13a computer vision software system that \emph{works}, with little human
14intervention, and is being used by hundreds of real astronomers doing
15real research.  We have also branched out and explored a number of
16other promising areas where ideas in computer vision and machine
17learning can be applied to astronomical data to great benefit.
18
19
20This thesis is quite blatantly a collection of manuscripts.  I have
21done a bit more than staple them together to produce the chapters of
22this thesis, but the original tone and style of each manuscript still
23shines through.  I hope my readers will find the chapter-to-chapter
24variations a refreshing, rather than jarring, change of pace.  One
25advantage of collecting manuscripts into a thesis is that each chapter
26should largely stand alone.  Another advantage is that each manuscript
27was prepared with a list of authors, so it is simpler to identify the
28collaborators who have contributed to each chapter.  %Each chapter has
29%a note detailing the authors of the original manuscript and explaining
30%my personal contribution to the work described.
31
32
33\section{Introduction}
34
35
36The general idea of the \an collaboration is to apply ideas in
37computer vision and machine learning to problems and data in
38astronomy.  Astronomical images are a good domain in which to explore
39ideas in computer vision, because the problems to be solved are often
40much more constrained than in general imaging, the telescopes and
41cameras are often well-understood and calibrated, large volumes of
42data and ``ground truth'' information exist, and with new telescopes
43constantly coming online and producing greater and greater volumes of
44imaging, there is a need for sophisticated computation.  I briefly
45outline astronomical imaging for computer vision researchers in
46\secref{sec:astrointro}.
47
48
49The specific idea addressed in this thesis is that pattern recognition
50approaches can be applied to astronomical images.  The goal is to
51produce a system that is able to recognize automatically the stars in
52any image of the night sky, using the information in the image pixels
53alone.  Recognizing the stars in an image is equivalent to finding the
54pointing, scale, and rotation of the image on the sky in a standard
55coordinate system.  The branch of astronomy concerned with measuring
56the positions and motions of celestial bodies is called
57\emph{astrometry}, and the task of placing a new image in a standard
58reference frame is known as ``solving the astrometry'' or
59``calibrating the astrometry'' of the image.  The \emph{blind
60astrometric calibration} task is to calibrate an image using only the
61image itself.  The broad goal of the \an project was to build a system
62that would allow us to create correct, standards-compliant astrometric
63\metadata for every useful astronomical image ever taken, past and
64future, in any state of archival disarray.  This is part of a larger
65effort to organize, annotate and make searchable all the world's
66astronomical information.
67
68
69After a brief introduction to astronomical imaging for computer
70scientists, the remainder of this chapter reviews the area of pattern
71recognition, and in particular the framework of \emph{geometric
72hashing} for object recognition in images.  One of the contributions
73of this thesis is to replace the simple hash table used in traditional
74geometric hashing with a \kdtree, so I review related work in hashing
75and other approaches for fast feature matching.  Finally, I present
76the blind astrometric calibration task as an instance of object
77recognition, and review some previous approaches to the problem.
78
79%% Our approach?
80
81
82The remaining chapters present our approach to the blind astrometric
83calibration problem.  \Chapref{chap:techreport} explains our approach
84and presents the results of large-scale tests of the system on
85real-world data.  Chapters \ref{chap:verify} and \ref{chap:kdtree}
86delve into details of the approach: \Chapref{chap:verify} presents the
87Bayesian decision theory problem that lies at the heart of our
88approach, while \chapref{chap:kdtree} explains the technical details
89of the \kdtree data structure implementation that is key to making our
90system performant enough to be a practical tool.
91\Chapref{chap:conclusion} summarizes our results.
92
93
94\section{Astronomical imaging for computer scientists}
95\label{sec:astrointro}
96
97Astronomical images are very different from images produced by typical
98consumer-grade cameras.  This section is intended to give
99non-astronomers a sense of the typical parameters of contemporary
100astronomical imaging setups, and introduce some of the terminology and
101key ideas.
102
103
104The Sloan Digital Sky Survey (SDSS) \cite{sdsstechnical, sdsscamera}
105is one of the most important projects in contemporary astronomy.  The
106primary imaging survey was carried out from 2000 through 2008, so the
107hardware is no longer cutting-edge, but it is still very impressive.
108The telescope's primary mirror is $2.5$ meters in diameter.  Incoming
109light first strikes the primary mirror, then a one-meter secondary
110mirror, then passes through two corrective lenses in front of the
111camera.  The nearly distortion-free field of view is about $3$ degrees
112wide.  The camera is composed of $30$ charge-coupled devices (CCDs),
113each $2048\times2048$ pixels, arranged in six columns.  The five CCDs
114in each column have different bandpass filters, ranging from the
115near-infrared ($913~\nanometers$) through the optical to the
116near-ultraviolet ($354~\nanometers$).  The CCDs and associated
117electronics are cooled with liquid nitrogen to $-80^{\circ} \unit{C}$
118and held under vacuum.  Each CCD is about $5~\unit{cm}$ square, and
119the whole focal plane of the camera is about $65~\unit{cm}$ in
120diameter.  The camera assembly has a mass over $300~\unit{kg}$.
121
122
123The survey operates in ``drift-scanning'' mode: the telescope is
124pointed in a nearly fixed direction, and as the Earth rotates the sky
125drifts past the camera.  The camera electronics shift the electrons
126along the CCD columns at the same rate.  As an object drifts through
127the field of view, the electrons it produces in the CCD pixels march
128along with it, and are read out as it disappears from view.  In the
129SDSS camera, the object drifts past each of the five bandpass-filtered
130CCDs of the camera in turn, producing a five-color image.  Over the
131course of a night of observing, the camera sweeps out a great circle,
132creating six columns of five-band images that are $2048$ pixels wide
133and over a million pixels long: over $120$ gigabytes of pixels per
134night.  This clever drift-scanning scheme has many advantages, one
135being that the readout rate of the CCDs can be relatively slow,
136resulting in low readout noise.  Calibration of the camera is simpler
137because charge is integrated over columns of the CCD: it is only
138necessary to calibrate whole columns rather than individual pixels.
139Note that for this scheme to work, the CCDs must be very carefully
140aligned in the focal plane, and the telescope control system must be
141capable of keeping the system pointed stably at the level of a pixel
142or better.  The moving mass of the telescope is over
143$15,000~\unit{kg}$, so this is no mean feat!
144
145% 24 um pixels
146
147Astronomical imaging systems are designed so that the point-spread
148function (PSF)---the image on the CCD of a distant point source of
149light---is spread over several pixels.  This allows the position of a
150point source to be measured to sub-pixel accuracy by fitting a model
151of the PSF to the observed pixel values.  If the system focused a
152point source onto a single pixel, this would be impossible.  The goal
153is typically for the full-width at half-maximum (FWHM) of the PSF to
154be a few pixels, so that the PSF is well-sampled but the signal is not
155spread out over too many pixels.
156
157For ground-based telescopes, the moving atmosphere causes the image of
158a distant point source of light to dance around on the CCD; in typical
159exposures of many seconds this ``speckling'' is integrated into a
160point-spread function.  The size of this PSF is called the ``seeing''
161and a value of one $\unit{arcsecond}$ FWHM is considered quite good.
162The SDSS camera has pixels of size $0.396~\unit{arcseconds}$, and the
163point-spread function is dominated by the seeing: the optics of the
164system are good enough that the images are as clear as they can be
165given the atmosphere.
166
167
168The positions of astronomical objects are usually measured in an
169\emph{equatorial coordinate system}: essentially the projection of the
170Earth's latitudes and longitudes onto the celestial sphere at a
171specific instant.  The latitudinal direction is called ``declination''
172and abbreviated ``Dec'', and the longitudinal direction is called
173``right ascension'' and abbreviated ``RA''.  Declination is typically
174measured in degrees from $-90$ to $+90$, and right ascension is
175typically measured in degrees from $0$ to $360$, or hours from $0$ to
176$24$.  There are $60~\unit{arcminutes}$ per degree and
177$60~\unit{arcseconds}$ per $\unit{arcminute}$.  For reference, the
178Moon as seen from Earth is about half a degree in diameter.  The
179$0.396~\unit{arcsecond}$ pixels of the SDSS camera view an area the
180size of a penny at a distance of ten kilometers; each $2048\times2048$
181pixel CCD views an area of about one-millionth of the sky.
182
183
184The SDSS is very impressive, but the next generation of telescopes
185currently in development dwarf its specifications.  The Large Synoptic
186Survey Telescope (LSST) \cite{lsst}, for example, has an
187$8.4~\unit{meter}$ primary mirror, with a $3.5~\unit{degree}$ field of
188view almost completely covered by an array of nearly $200$ CCDs, each
189with $4096\times4096$ pixels: a total of about $3.2$ gigapixels.  The
190camera is the size of a small car.  It will take exposures of about
191$15$ seconds, yielding over $10$ terabytes of data per night.
192
193% -- basics of data reduction; catalogs
194% -- stars, galaxies, and other stuff
195
196
197
198
199\section{Pattern recognition}
200
201\emph{Pattern recognition} encompasses a huge class of problems and
202techniques in computer vision and machine learning.  Many pattern
203recognition problems are of the type ``identify this object'' or
204``find things like this'': the system is first told about a large set
205of objects, then is presented with a novel object and is asked to
206describe it or find similar objects from the set of known objects.
207
208Pattern recognition systems often take a \emph{feature matching}
209approach.  A \emph{feature} is a compact description of
210(problem-specific) relevant, invariant properties of an object, such
211that objects that are similar in the problem domain have similar
212features.  Defining how features are extracted from objects and
213defining the similarity between features are key factors in the
214success of such systems.  Often the representation of a feature (the
215\emph{feature descriptor}) is a point in a real coordinate space and a
216distance metric defines the similarity between features.
217
218Feature-based approaches typically involve two phases: an
219\emph{indexing} phase, in which the set of known objects is shown to
220the system, and a \emph{test} phase, in which novel objects are
221presented.  During indexing, features are extracted from the set of
222known objects.  At test time, features are extracted from the novel
223object and compared with the set of features from the known objects.
224In many domains, the set of known objects is very large, so it is
225critical that features be stored in a manner that allows the system to
226rapidly find known features that match a given feature.
227
228Often feature-based approaches suffer \emph{feature aliasing}
229problems: two objects that are dissimilar in the problem domain may
230yield similar features.  In this case, systems may use \emph{voting}
231or \emph{verification} schemes to ensure that aliased (false) feature
232matches are eliminated.  Voting schemes typically assume that false
233feature matches are independently randomly distributed, so the
234probability of matching several features to the same incorrect object
235is the product of the individual probabilities; if we demand a
236sufficient number of matches, the probability of a false match becomes
237small.  Verification or \emph{hypothesize-test} schemes treat a
238feature match as a hypothesis that the novel object is similar to a
239known object, and then some test to decide if the hypothesis is
240correct.  Such schemes can be seen as using feature matching as a
241search heuristic.
242
243\section{Visual pattern recognition}
244
245Pattern recognition in images is a particularly challenging and
246complex problem domain which has been studied extensively.  Objects to
247recognize include faces, fingerprints, specific physical objects, or
248even general classes of physical objects such as mugs, chairs, or
249elephants.
250
251
252Visual pattern recognition of general physical (3-dimensional) objects
253is difficult for many reasons: objects may be deformable, can appear
254different when viewed from different angles or under different
255lighting conditions, and can be occluded by other objects.  Features
256used in visual pattern recognition include edges (lines and curves),
257corners, or the color or texture of image patches.
258
259
260An important consideration when designing features for visual pattern
261recognition is the size (\ie, spatial extent in the image) of the
262feature.  \emph{Local features} gather information from a small region
263of the image, while \emph{global features} incorporate information
264from the whole image.  Local features can provide robustness to
265occlusion and deformation: if part of the object is hidden, features
266belonging to that part will be lost, but the features belonging to the
267remaining parts will still be found.  Since local features aggregate
268information from only a small portion of the image, they may be more
269noisy or less distinctive than global features.
270
271
272Since a single local feature may be insufficient to guarantee that an
273object has been correctly recognized, some systems build high-level
274features by combining multiple low-level features.  When the objects
275to be recognized are rigid or articulated (composed of multiple rigid
276parts connected by simple joints), a common approach is to build
277features that describe the relative geometry of low-level features.
278This use of \emph{geometric features} is often called \emph{geometric
279hashing}, since the result is a feature descriptor or \emph{hash code}
280which describes the geometric arrangement of the low-level features.
281
282
283
284\section{The geometric hashing framework}
285
286\begin{figure}
287\begin{center}
288% dot lamdan.dot -Tps2 -o lamdan.ps
289% ps2pdf -sPAPERSIZE=a4 lamdan.ps lamdan.pdf
290\includegraphics[height=0.8\textheight]{lamdan}
291\end{center}
292\caption{Outline of the Geometric Hashing scheme.  This diagram is a
293  reproduction of \fig 2 in Lamdan \etal \cite{lamdan1990}.
294  \label{lamdan}}
295\end{figure}
296
297
298Lamdan \etal \cite{lamdan1990} and Wolfson and Rigoutsos
299\cite{wolfson1997} give an excellent overview of the geometric hashing
300approach.  See Figure \ref{lamdan} for a block-diagram overview.
301Geometric hashing is based on the fact that when rigid objects undergo
302similarity transformations (translation, rotation, and scaling),
303angles between points on the object and relative distances between
304points do not change.  Therefore, if we describe the relative
305arrangement of points on an object in terms of these invariants, our
306description will be invariant under these transformations of the
307object.  Invariant geometric properties can be found for other types
308of transformations such as affine or perspective projection.
309
310% \cite{affineinvariant}
311
312
313Geometric hashing builds \emph{geometric features} out of lower-level
314features.  We will call the low-level features \emph{interest points}
315for clarity.
316
317Given the types of transformations the system must handle, there is
318some sufficient number of interest points required to define an
319invariant local coordinate system; these are called the \emph{basis
320  set}.  For example, for 2-D objects that can undergo similarity
321transformations, two interest points can be used to define a
322coordinate system.  For 3-D objects with similarity transformations,
323three (non-collinear) interest points are required.
324
325Geometric hashing follows an indexing approach.  During the indexing
326phase, the system preprocesses a database of objects which are to be
327recognized.  For each object, many interest points are extracted which
328are invariant under the transformations that are expected, and can be
329well localized.  The system then enumerates all basis sets by looking
330at all combinations of interest points; each set of basis features
331defines a relative coordinate system.  For each basis set, all
332remaining interest points and enumerated and their positions described
333in this coordinate system.  Each relative coordinate vector forms a
334\emph{geometric feature descriptor}.  In this way, the relative
335geometric arrangement of a set of interest points is converted into a
336point in a vector space.
337
338% --- consider adding a variant of the quad fig here.
339
340In the standard geometric hashing approach, the geometric feature
341descriptor is discretized and encoded as a hash table key. A grid is
342place over the feature descriptor space, and the hash table key of a
343geometric feature is the identifier of the grid cell containing it.
344Each entry in the hash table maps back to the basis set plus the
345feature whose position is encoded.
346
347% --- picture?
348
349Given a new image, a geometric hashing system extracts all interest
350points from the image and, as at indexing time, proceeds to enumerate
351all basis sets and computes the positions of the remaining interest
352points in each basis set.  This generates many geometric features
353whose descriptors are discretized and converted to hash table keys, as
354before.  Similar arrangements of interest points will lead to similar
355geometric feature descriptors, which--with luck---will be mapped to
356the same hash table key.  The hash table values associated with each
357key are retrieved, and each value that is found is considered a match.
358Each match votes for the correspondence between the interest points
359from the database and the interest points from the image that were
360used to define the geometric feature.
361
362
363After all features from the image have been processed, many votes will
364have accumulated.  False matches (due to aliasing or hash table
365collisions) should be uniformly distributed over the set of known
366objects, so should not generate very many votes for any particular
367object.  Objects that are actually present in the image should
368generate many votes.  Objects that receive an insufficient number of
369votes are rejected, and the remainder are subjected to a verification
370process.
371
372
373The verification process requires computing a transformation from the
374object to image coordinates, using the corresponding interest points
375from the known object and the image.  Using this transformation, all
376the interest points belonging to the known object are projected into
377image coordinates.  If the match is correct, we expect the image to
378contain features at these locations.  Some researchers add an extra
379verification stage by projecting the known object into the image and
380checking that the predicted edges of the object are found in the image
381(eg, \cite{lamdan1990, huttenlocher1990}).
382
383
384The geometric hashing approach is flexible and applicable to many
385problems, but there are some issues with the basic version described
386here.  The hash function described above simply places a grid over the
387relative coordinate system and returns the identifier of the grid cell
388in which the interest point lands.  This is prone to edge effects: if
389an interest point is near the edge of a grid cell, a small change in
390its position due to noise can move it across the edge into a different
391grid cell, which causes the hash code to change and a different set of
392feature matches to be found.  This problem can be overcome by
393detecting interest points that are near the edges of their cells and
394also searching the hash table using the hash codes of neighbouring
395cells.  This results in more false feature matches, and can become
396expensive as the dimensionality of the geometric feature space
397increases (there are more edges, and the proportion of the hypervolume
398of feature space that is near an edge increases).  Using a data
399structure other than the hash table may be beneficial.
400
401
402Another issue with uniform grid-based hashing is that it assigns equal
403importance (and storage space) to each grid cell in the feature space.
404Consider an interest point that is far from the basis set that defines
405its local coordinate system.  Small positional errors in the basis set
406can cause the coordinate system to be rotated or scaled, which results
407in large changes in the interest point's relative coordinates.  This
408means that edge effects become more pronounced in such regions of the
409feature space.  One response to this concern is to increases the size
410of the grid cells far from the basis set, but then if the interest
411points are uniformly distributed, these larger hash table bins will
412contain more features, so every match to these bins will be
413accompanied by more false positives.
414
415
416Finally, a hash table with equal-sized cells will only be uniformly
417utilized if interest points are uniformly distributed in feature
418space, but some domains may yield interest points that are far from
419uniformly distributed.  In other domains, it may be beneficial to
420place constraints on the geometric features that are used (for
421example, to avoid regions that are prone to large errors, or that are
422relatively indistinctive).
423
424
425There is nothing in the geometric hashing framework that requires a
426standard hash table to be used for feature matching.  Other methods of
427feature matching are described in the next section.
428
429
430\section{Related work in fast feature matching}
431
432
433For feature matching in a geometric hashing framework, the database of
434known objects is converted into a set of points in a feature
435descriptor space, which is a vector space.  To find matches to a new
436feature (extracted from an image in which we want to recognize
437objects), we need to find all points in the vector space that are
438within some tolerance.  Usually the Euclidean distance metric is
439applied, so the tolerance is a radius in the vector space.
440
441
442\subsection{Bloom filters}
443
444A Bloom filter \cite{bloom1970} is a data structure that can answer
445queries of the form ``is feature $q$ a member of the set of known
446features?''  A Bloom filter is a set of $n$ bits, initially zero.  A
447set of $k$ hash functions must be defined which take a feature as
448input and produce an integer in $[0, n)$ as output.  In the indexing
449phase, we examine each of the known features, and apply each of the
450hash functions, producing $k$ hash codes.  Each code is used to
451reference a bit and \emph{set} it (\ie, turn it on).  Each known
452feature results in $k$ bits being set; several features may request
453that a particular bit be set.
454
455
456Given a new feature to match, we run the $k$ hash functions on it and
457test whether each of the resulting bits are set.  If the feature is
458equal to a feature seen at indexing time (according to the hash
459functions), then all $k$ of the bits will have been set; thus the
460Bloom filter does not produce false negatives.  If the query feature
461is not equal to a known feature, then some of the bits may have been
462set due to \emph{collision} of the hash functions, but the feature
463will only be accepted if all $k$ hash functions result in collision;
464this results in a false positive.  The rate of false positives
465increases as the number of known features increases relative to the
466number of bits in the filter.  The Bloom filter is ideally suited to
467cases where the number of known features is small, and a verification
468step can be applied to eliminate any false positives that are
469produced.
470
471
472For feature matching, we want to know not only whether the query
473feature $q$ matches a known feature, we want to know \emph{which}
474known feature matches.  One way to answer such queries is to use a
475\emph{Bloomier filter} \cite{chazelle2004}.  The simplest Bloomier
476filter uses multiple Bloom filters to categorize a given feature into
477two different categories, and uses slightly more space than two Bloom
478filters.  A set of $n$ such Bloomier filters can be used to categorize
479a feature into $2^n$ different categories by assigning one Bloomier
480filter to each bit of the result.
481
482
483The standard Bloom filter only finds exact matches, but we want to
484find \emph{nearby} matches.  An extension of Bloom filters to allow
485proximity searches is given by Kirsch and Mitzenmacher
486\cite{kirsch2006b}.  This variant uses \emph{locality sensitive
487hashing} (LSH, discussed below) as its hashing function, so it
488inherits some limitations from LSH.  Most importantly, LSH only
489guarantees with some probability that nearby features will hash to the
490same bin.  As a result, the locality-sensitive Bloom filter can
491produce false negatives as well as false positives.
492
493
494In order to answer feature matching queries (find all known features
495near a given query feature), one could combine these Bloom filter
496extensions to build a locality-sensitive Bloomier filter.  However,
497the false negative rate of the locality-sensitive filter would be
498magnified because the Bloomier filter requires \mbox{$2
499\ceil{\log(n)}$} separate filters to represent $n$ known features, and
500all of these filters must produce the correct answer.  It seems that
501Bloom filters are not particularly well suited to this task.
502
503
504\subsection{Locality Sensitive Hashing}
505
506Locality Sensitive Hashing (LSH) \cite{datar2004, gionis1999,
507andoni2006}, is a technique for answering approximate nearest
508neighbour queries: it returns a feature whose distance is within a
509constant factor of the distance to the nearest neighbour.  The LSH
510scheme can be extended to handle exact nearest-neighbour queries
511efficiently if the set of known features satisfies some \emph{bounded
512growth} criteria.  It is intended for use in high-dimensional spaces
513(tens to thousands of dimensions), and $\ell_p$ distance metrics for
514$p \in (0, 2]$.
515
516
517The core idea of LSH is to use several hash functions with the
518property that the probability of collision---features being placed in
519the same bin---is much higher for nearby features than for distant
520ones.  In the indexing stage, we populate the hash table by running
521each hash function on each known feature.  To perform a query, we run
522the hash functions on the query feature and retrieve the features in
523the same hash table bin as the query.  One of these features is likely
524to be an approximate nearest neighbour of the query.
525
526
527In practice, in \cite{datar2004} each hash function is composed of
528several (typically $10$) simpler hash functions.  Each simple hash
529function performs a random projection in the feature space: the hash
530value is the discretized value of the dot product of the feature
531vector with the hash function's random vector.  Combining several of
532these simple hash functions is equivalent to projecting the feature
533vectors onto a grid in a $10$-dimensional, non-orthogonal, subspace.
534The hash code is simply the identifier of the grid cell in which the
535feature falls.
536
537
538This scheme suffers from edge effects: a query feature may land in a
539bin other than the bin containing its nearest neighbour due to small
540positional noise in any of the dimensions.  Attempting to reduce this
541effect by making the bin size (discretization level) larger fails
542because the number of hash functions must be increased to maintain the
543same total number of hash bins.
544
545
546In order to reduce the number of false negatives, we can choose many
547(typically $30$) different hash functions and place a reference to
548each feature in the hash table bin chosen by each function.  This
549increases the probability that at least one of the hash functions will
550select a $10$-dimensional subspace in which the query is close to its
551nearest neighbour.
552
553
554The standard LSH scheme does not seem particularly well suited for the
555purpose of feature matching in a geometric hashing system, since the
556dimensionality of the feature vectors is usually moderate ($2$ to $6$
557in the \an system). Since some feature aliasing is expected in
558geometric hashing systems, the nearest neighbour may not be the
559correct match: we would prefer to get all neighbours within a given
560search radius.
561
562
563\subsection{\Kdtrees}
564
565\begin{figure}
566  \begin{center}
567        % tree-figs.py : plot_bboxes
568    \begin{tabular}{c@{\hspace{3pt}}c@{\hspace{3pt}}c}
569      \includegraphics[width=0.31\textwidth]{kdtree-0} &
570      \includegraphics[width=0.31\textwidth]{kdtree-1} &
571      \includegraphics[width=0.31\textwidth]{kdtree-2} \\
572      \includegraphics[width=0.31\textwidth]{kdtree-3} &
573      \includegraphics[width=0.31\textwidth]{kdtree-4} &
574      \includegraphics[width=0.31\textwidth]{kdtree-5}
575    \end{tabular}
576  \end{center}
577  \caption[A \kdtree partitioning of space.]{A \kdtree partitioning of
578        space.  Each panel shows the nodes at one level of the tree.  The
579        points are 200 samples from a ring with uniformly-distributed
580        radius in $\left(0.6, 1\right)$ and uniformly-distributed angle in
581        $\left(0, \displaystyle{\frac{\pi}{2}}\right)$.  Nodes are split
582        along the dimension with the largest range, and the splitting
583        position is chosen so that the two child nodes will contain an
584        equal number of points.  Note how the tree adapts to the
585        distribution of the data: each node tightly bounds the data points
586        it owns.  This is a slightly different version of the \kdtree than
587        Bentley's original description.\label{kdtree}}
588\end{figure}
589
590The $k$-dimensional tree (\kdtree) was introduced by Bentley in 1975
591\cite{bentley1975}, and several extensions have been described
592\cite{friedman1977, sproull1991}.  The idea is to build a binary tree
593over the feature space, where each node ``owns'' a region of the space
594which is disjoint from the region owned by its sibling node.  Each
595non-leaf node has an axis-aligned splitting hyper-plane; its left
596child owns the subspace to the left of the splitting plane, and the
597right child the subspace to the right.  In this way, the \kdtree
598defines a hierarchical subdivision of the feature space into
599axis-aligned hyper-rectangles.  See \figref{kdtree}.
600
601
602For the purposes of indexing the features in a geometric hashing
603system, the set of features is known beforehand and is static, so we
604do not need to insert or delete features from the tree, and Bentley's
605\emph{optimized} \kdtree construction method can be used.  That is,
606the tree can be built to adapt to the particular set of features we
607have.
608
609
610Tree construction proceeds by first assigning all features to the root
611node, then recursively splitting nodes until each leaf node owns at
612most $L$ features, with $L$ perhaps $10$ or $20$.  Splitting is done
613by selecting a splitting plane and using it to partition the features.
614Sproull \cite{sproull1991} enumerates several strategies for choosing
615the splitting hyperplane.  Traditionally, \kdtrees use axis-aligned
616splitting hyperplanes, and the splitting dimension is chosen by simply
617iterating through the dimensions (as in Bentley's original paper), or
618by selecting the dimension with the largest \emph{range} or
619\emph{variance}.  One can also use an arbitrary splitting hyperplane,
620for example by finding the principal eigenvector of the covariance
621matrix of the feature vectors.  Typically the splitting hyperplane is
622positioned so that the two children own an equal number of data
623points.  This leads to a balanced (and therefore short) tree, but in
624some applications it can be beneficial to bias the splitting location
625\cite{macdonald1990}.
626
627
628Depending on the application, it can be beneficial to store at each
629node the tight axis-aligned bounding box (and other summary statistics
630\cite{deng1995}) of the points owned by that node (as shown in
631\figref{kdtree}).  This is particularly useful where there is
632significant correlation between the dimensions of the data, because
633splitting the data points along one dimension implies that the
634bounding box will shrink in other dimensions as well, and this can
635result in faster searches.  If the bounding boxes are stored, then it
636is no longer strictly necessary to store the splitting hyperplane,
637though it may be useful to speed up some computations.
638
639
640\comment{
641\begin{figure}
642  \begin{center}
643    \includegraphics[width=\textwidth]{rangesearch-bb-fig}
644  \end{center}
645  \caption[\Kdtree range search with bounding boxes.]{\Kdtree range
646search for a query point $q$ in \kdtree node $N$, for a \kdtree that
647stores the bounding box of each node.  We check whether the query
648hypersphere overlaps the child nodes $L$ and $R$.  In both cases $q$
649is contained in the left child node $L$ so we must recurse on $L$.  In
650the top panel, the distance \emph{in the splitting dimension} from $q$
651to the bounding box of node $R$ is less than the search radius $r_1$
652so we do not need to recurse on $R$.  In the lower panel, the distance
653in the search dimension to the query is larger than the search radius
654$r_2$ so we must recurse on $R$.}
655  \label{rangesearch-bb}
656\end{figure}
657
658
659\begin{figure}
660  \begin{center}
661    \includegraphics[width=\textwidth]{bohm2001fig7}
662  \end{center}
663  \caption[Lower and Upper Distance Bounds in a \kdtree.]{Lower and upper
664        bounds (min- and max-dist, respectively) on the distance from a query point
665        to a \kdtree bounding box.
666        This figure is copied from \cite{bohm2001} Figure 7.}
667  \label{upperlower}
668\end{figure}
669 
670
671\begin{figure}
672  \begin{center}
673    \includegraphics[width=\textwidth]{rangesearch-split-fig}
674  \end{center}
675  \caption[\kdtree range search with splitting planes.]{\kdtree range
676        search for a query point $q$ in \kdtree node $N$,
677        for a \kdtree that only stores the splitting plane (not the bounding boxes).
678        Top: since the query is to the left of $N$'s splitting plane (dashed line),
679        we must recurse on the left child $L$.  Since the distance in the splitting
680        dimension from the query to the splitting plane, $D_R$ is less than $r$,
681        we must recurse on $R$.
682
683        Bottom: For the left node ($L$): the query is above the splitting plane, so we must recurse
684        on child $T$.  The distance $D_B$ in the splitting dimension to the splitting plane
685        is less than $r$, so we must recurse on $B$ also.
686       
687        For the right node ($R$): the query is above the splitting plane, so we must recurse
688        on child $t$.  The distance $D_b$ to the splitting plane is greater than $r$ so we
689        need not recurse on node $b$.  (Note, we could also keep track of the \emph{total}
690        distance to the query point, $\sqrt{D_R^2 + D_b^2}$, but this requires more book-keeping.}
691  \label{rangesearch-sp}
692\end{figure}
693}
694
695For feature matching, we typically want to do \emph{range search}:
696find all features within radius $r$ of a given query feature $q$.
697This is easily achieved with a \kdtree using a recursive algorithm.
698Starting at the root, we must decide whether we need to recurse on
699each of the node's children.  (How we make this decision is explained
700below.)  Once we reach a leaf, we enumerate the features owned by the
701leaf and compute the distance to the query feature; any feature with
702distance less than $r$ is accepted.
703
704
705We decide whether we must recurse on a node's children based on their
706bounding boxes.  We compute the \emph{mindist}---the lower-bound of
707distance---between the query point and each child's bounding box.  The
708mindist is simply the distance to the corner or edge of the bounding
709box, and can be computed quickly.  Any node whose mindist to the query
710point is less than the query radius $r$ must be visited.  If the
711\kdtree does not explicitly store the tight bounding-boxes of the
712nodes, loose bounding-boxes can be constructed during the recursion
713based on the splitting planes of the ancestors.
714
715
716\comment{
717Nearest-neighbour search in \kdtrees is essentially the same as range
718search, except that $r$ is considered the \emph{pruning distance}; any
719nodes whose lower-bound distance is larger than the pruning distance
720need not be visited.  The pruning distance decreases monotonically as
721the search proceeds and is simply the (upper bound of) the distance to
722the nearest neighbour found so far.  Since reducing the pruning
723distance is critically important to achieving fast search, the order
724in which nodes are explored is important; one typically keeps a
725priority queue of nodes to visit, sorted by the (lower-bound) distance
726of the node to the query feature.  If the query feature falls in the
727region owned by one of the leaf nodes, then all nodes on the path from
728the root to that leaf will have lower-bound distance of zero, so this
729path will be explored first and the ``greedy'' nearest neighbour
730distance will become the first pruning distance.
731
732
733The \kdtree can also be used for approximate nearest-neighbour or
734range searches.  Beis and Lowe \cite{beis1997} describe the Best Bin
735First algorithm, which is essentially the same as the full
736nearest-neighbour search except that the search terminates after
737exploring a fixed maximum number of leaves or after a fixed period of
738time has elapsed.  The priority queue is sorted by the lower-bound
739distance of the node's bounding box to the query point, so the `most
740promising' nodes are explored first.
741}
742
743
744Chapter \ref{chap:kdtree} presents the \kdtree in more detail.
745
746
747\subsection{Other approaches}
748
749% Voronoi:
750% -de Berg 1997
751%   (Mark de Berg , Marc van Kreveld , Mark Overmars , Otfried Schwarzkopf, Computational geometry: algorithms and applications, Springer-Verlag New York, Inc., Secaucus, NJ, 1997)
752% -Edelsbrunner 1987
753%   (Herbert Edelsbrunner, Algorithms in combinatorial geometry, Springer-Verlag New York, Inc., New York, NY, 1987)
754% -Preparata and Shamon 1985
755%   (Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985)
756
757In contrast to the \kdtree's rectangular decomposition of space, there
758is a family of space decompositions that use hyperspheres.  An example
759is the \emph{ball-tree} \cite{uhlmann1991b}, in which each non-leaf
760node is associated with one of the known features $f$ and a radius
761$r$; the left child owns all features within radius $r$ of feature
762$f$, and the right child owns the remaining features.  Similarly, in
763the \emph{anchors hierarchy} \cite{moore2000}, each non-leaf node is
764represented by a feature (called its \emph{anchor}), and a node owns
765all the features that are closer to its anchor than the anchor of its
766sibling.  These sphere-based trees are supposed to better capture the
767structure of high-dimensional data sets, though the results seem to
768depend quite strongly on the particular data distributions and search
769parameters.
770
771
772For two-dimensional feature spaces, algorithms based on planar point
773location (eg, \cite{kirkpatrick1983}) and the Voronoi tesselation of
774space yield $\mathcal{O}(n \log n)$ preprocessing time, a data
775structure of size $\mathcal{O}(n)$, and query time of
776$\mathcal{O}(\log{n})$.  Unfortunately, in higher dimensions the space
777required to store the Voronoi tesselation is
778$\mathcal{O}(n^{\floor{d/2}})$, which quickly becomes untenable
779\cite{arya1998}.
780
781% planar point location:
782% -Dobkin and Lipton in 1976 O((log n)^2) time, O(n^2) space
783% -Sarnak and Tarjan O(n) space, O(log n) time
784%    * Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees". Communications of the ACM 29 (7): 669-679.
785% -Edelsbrunner, Guibas, and Stolfi, same
786%    * Herbert Edelsbrunner, Leonidas J. Guibas, Jorge Stolfi (1986). "Optimal point location in a monotone subdivision". SIAM Journal on Computing 15 (2): 317-340.
787% -Kirkpatrick, same
788%    * David G. Kirkpatrick (1983). "Optimal Search in Planar Subdivisions". SIAM J. Comput. 12: 28-35.
789% Voronoi tesselation:
790% -B. Delaunay, Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934
791
792The design of spatial indexing data structures and search algorithms
793is itself a large research area.  Dozens of different tree structures
794have been proposed: in two surveys B\"ohm \cite{bohm2001} and
795Hjaltason \cite{hjaltason2003} identify B-, $\textrm{B}^{+}$-, ball-,
796bisector-, BSP-, DABS-, fq-, gh-, \mbox{GNA-,} hB-, $\textrm{hB}^{\pi}$-,
797hybrid-, IQ-, kd-, kd-B-, $\textrm{LSD}^h$-, M-, mb-,
798$\textrm{mb}^{\ast}$-, mvp-, oct-, \mbox{post-office-,} pyramid-, quad-, R-,
799$\textrm{R}^{\ast}$-, $\textrm{R}^{+}$-, sa-, slim-, \mbox{sphere-,}
800SR-, SS-, TV-, vp-, $\textrm{vp}^{\ast}$-, and X-trees.  There is an
801equally mind-boggling variety of exact and approximate search
802algorithms.  Each data structure and algorithm may work well in some
803region of parameter space, but there is no clear winner for
804general-purpose use.
805
806
807\section{Astrometric calibration as a pattern recognition task}
808
809
810\comment{
811wget "http://casjobs.sdss.org/ImgCutoutDR7/getjpeg.aspx?ra=166.45&dec=-0.03&scale=1&opt=&width=2000&height=2000" -O ngc3521-orig.jpg
812jpegtopnm ngc3521-orig.jpg | pnmrotate -45 | pnmcut 600 900 1400 1000 | pnmscale -reduce 2 | pnmtojpeg > ngc3521.jpg
813#---> http://live.astrometry.net/status.php?job=alpha-200906-68444159
814jpegtopnm ngc3521.jpg | ppmtopgm | pnminvert | pnmtojpeg > ngc3521-bw.jpg
815wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=field.xy.fits" -O ngc3521.xy
816wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=index.xy.fits" -O ngc3521-index.xy
817jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white > ngc3521-sources.png
818jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white -P | plotxy -I - -i ngc3521-index.xy -x 1 -y 1 -C black -b white -s crosshair -P | plot-constellations -f 18 -w ngc3521.wcs -i - -N -o ngc3521-index.png
819%%% wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=wcs.fits" -O ngc3521.wcs
820%%% scp gmaps:/data2/test-merc/tycho.mkdt.fits .
821wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-0.5&bb=0,-85,360,85&dashbox=0.1&w=500&h=500&lw=3" -O ngc3521-zoom0.png
822wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-1&bb=175.533,-17.7621663832,211.533,17.6598478619&dashbox=0.01&w=500&h=500&lw=3" -O ngc3521-zoom1.png
823wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=0.5&bb=191.733,-1.85338140354,195.333,1.74602498613&w=500&h=500&lw=3" -O ngc3521-zoom2.png
824for x in 0 1 2; do
825 pngtopnm ngc3521-zoom${x}.png | ppmtopgm | pnminvert | pnmtopng > ngc3521-zoom${x}-bw.png;
826done
827}
828
829
830% ~/an-2/usnob-map/execs/tilerender -x 0.000000 -y -85.000000 -X 360.000000 -Y 85.000000 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.1 -d > tile1.png
831% ~/an-2/usnob-map/execs/tilerender -x 151.724000 -y -4.784223 -X 187.724000 -Y 29.772221 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.25 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.01 -d > tile2.png
832% ~/an-2/usnob-map/execs/tilerender -x 167.924000 -y 11.335527 -X 171.524000 -Y 14.841399 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g 0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 > tile3.png
833%
834% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/index.xy.fits -S 1 -W 720 -H 503 -x 1 -y 1 -w 2 -r 6 -s s > ixy.pgm
835% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/field.xy.fits -S 1 -W 720 -H 503 -r 5 -x 1 -y 1 -w 2 -s c > fxy.pgm
836% cp /home/gmaps/ontheweb-data/tor/200706/51145570/image.pnm undim.ppm
837% pgmtoppm green ixy.pgm > ixy.ppm
838% pnmcomp -alpha=ixy.pgm ixy.ppm undim.ppm > sum.ppm
839% pnmcomp -alpha=fxy.pgm fxy.ppm sum.ppm > sum2.ppm
840% pnmtopng sum2.ppm > sum.png
841% pnmcomp -alpha=fxy.pgm fxy.ppm undim.ppm | pnmtopng > sources.png
842% http://oven.cosmo.fas.nyu.edu/test/status.php?job=tor-200706-51145570
843
844
845\begin{figure}
846\begin{center}
847\framebox{\includegraphics[width=0.99\figunit]{ngc3521-bw}} \\
848\framebox{\includegraphics[width=0.99\figunit]{ngc3521-sources}} \\
849\framebox{\includegraphics[width=0.99\figunit]{ngc3521-index}}
850\end{center}
851\caption{\captionpart{Top:} Input image (credit: Sloan Digital Sky
852Survey).  \captionpart{Middle:} The brightest 100 sources extracted
853from the image.  \captionpart{Bottom:} Reference sources, transformed
854into the image coordinate system (crosshairs).  Many of the image and
855reference sources are aligned, but there are many image sources
856without reference sources.  Our system knows about the positions of
857many objects of interest on the sky, and has labelled the galaxy NGC
8583521.\label{fig:redgreen}}
859\end{figure}
860
861
862\begin{figure}
863\begin{center}
864\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
865\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom0-bw}} &
866\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom1-bw}} &
867\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom2-bw}}
868\end{tabular}
869\end{center}
870\caption{The location of the input image on the sky.
871  \captionpart{Left:} The whole sky, in Mercator projection.  The
872  sinusoid-shaped feature is the Milky Way.  The dashed box shows the
873  zoomed-in region.  \captionpart{Middle:} Zoomed in by a factor of
874  $10$.  \captionpart{Right:} Zoomed in by a factor of $100$.  The box
875  shows the outline of the input image.\label{fig:onthesky}}
876\end{figure}
877
878
879For modern astronomers, astrometric calibration is often one of the
880first steps toward getting useful information out of an image of the
881sky.  Aligning a new image with an \emph{astrometric reference
882catalog} allows the astronomer to place the image within a standard
883coordinate frame.  This allows stars, galaxies, and other objects
884(\emph{sources}) in the new image to be identified with known sources,
885which in turn allows astronomers to calibrate other properties of the
886new image, and allows the positions of new sources to be described in
887a standard reference frame.
888
889
890The task of blind astrometric calibration---automatically finding the
891astrometric calibration of an image, using only the information in the
892image pixels---can be seen as a pattern recognition problem.  As
893Bertin \cite{bertin2005} notes, ``astrometric and photometric
894calibrations have remained the most tiresome step in the reduction of
895large imaging surveys,'' so this is not only an interesting problem to
896solve, but one with practical implications for astronomers.
897
898
899For the purposes of astrometric calibration, we can think of the sky
900as a large two-dimensional surface: the stars are very distant, so our
901viewpoint is effectively fixed.  We are moving, as are the stars, but
902these motions are small relative to the precision at which we
903typically work.  The sky contains many stars, galaxies, and other
904astronomical sources.  The stars and distant galaxies are effectively
905point sources, while closer galaxies can be resolved.  Astrometric
906reference catalogs list the positions, motions, and brightnesses of
907these sources and serve as the ``ground truth'' or database of known
908(reference) objects.  The USNO-B1 catalog \cite{usnob, nomad}, for
909example, lists over one billion objects.  As many as a few percent of
910these are false detections or other artifacts \cite{barroncleaning},
911and some objects that should be visible are missing.
912
913
914Extra sources can be due to planets, comets, satellites, or aircraft.
915Missing or poorly localized source can be due to imperfections in the
916imaging sensor, saturation, cosmic ray interference, or (rarely)
917occlusion.  Errors in the image processing that detects sources can
918lead to sources being gained or lost.  We call sources that appear in
919only the input image or reference catalog \emph{distractors} and
920\emph{dropouts}, respectively.  The existence of distractors and
921dropouts means that we can never assume that all the objects in the
922reference catalog will be contained in an image to be recognized, or
923vice versa.
924
925
926The images to be recognized are subregions of the sky.  Image sizes
927range from nearly half the celestial sphere down to $10^{-7}$ of the
928area and smaller.  The input images measure unknown bands of the
929electromagnetic spectrum, and various nonlinear functions may have
930been applied to the pixel values.  We cannot rely on absolute
931brightness or color to recognize individual stars or galaxies.  At
932best we can hope that there is some positive correlation in the
933relative brightness ordering of objects in the image and the
934corresponding objects in our catalog.
935
936
937Blind astrometric calibration is an ideal task for exploring geometric
938ideas in pattern recognition.  Most celestial objects are effectively
939point sources, and can be found and localized to sub-pixel accuracy
940using relatively simple image-processing procedures.  But since the
941individual features are characterized only by their positions and
942brightnesses, we must examine collections of features in order to
943build distinctive patterns.  In \chapref{chap:techreport} we present
944\an, which applies the geometric hashing framework to the task of
945blind astrometric calibration.  An example of our results in shown in
946\figs \ref{fig:redgreen} and \ref{fig:onthesky}.
947
948
949\comment{
950There are several useful applications for a blind astrometry solver.
951Most telescopes used by professional astronomers are
952computer-controlled and automatically record the telescope pointing in
953the image header, but these systems occasionally fail.  Feeding the
954images to a blind astrometry solver would allow an independent
955verification that the \metadata written by the telescope control
956software are correct.  Second, some observatories have large archives
957of photographic plates taken by astronomers over many decades, and a
958growing fraction of these have been digitized.  These images could be
959useful to researchers, since they provide a longer baseline for
960measuring changes over time.  However, information about where the
961telescope was pointing has often been lost, or is only available as a
962hand-written note on the edge of the photographic plate, or as an
963entry in a logbook.  A blind astrometry solver would be an invaluable
964tool for easily providing accurate \metadata about these images.
965Finally, these is a large and enthusiastic community of amateur
966astronomers who produce many images that would be useful to
967professional astronomers, but since amateurs are seldom interested in
968recording exact \metadata, these images are not readily available to
969professional astronomers.  A blind astrometry solver would unlock this
970trove of pixels.
971
972Other interesting applications include creating a customized
973star-finding chart overlayed on the input image, and controlling a
974telescope by using the output of a blind solver as feedback to a
975control system.
976
977\begin{figure}
978  \begin{center}
979        %\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
980        \includegraphics[width=\figunit]{moon} \\
981        \includegraphics[width=\figunit]{saturated}
982        %\end{tabular}
983  \end{center}
984  \caption{Astronomical images with occlusion.  \captionpart{Top:} The
985moon, buildings, and mountains occlude the stars behind them (image
986copyright Johannes Schedler, \texttt{http://panther-observatory.com}).
987\captionpart{Bottom:} Saturation and diffraction spikes due to bright
988objects can obscure nearby objects.}
989\label{moon}
990\end{figure}
991
992Since most celestial objects are effectively point sources, the
993obvious image feature to use is an individual object.  Feature
994detection is relatively straightforward, and the positions of features
995can be measured with subpixel accuracy, but the feature descriptor
996essentially contains only brightness and position; individual features
997are not distinctive.
998
999Although occlusion is typically not a problem in astronomical images,
1000it does occasionally occur (see Figure \ref{moon}).  In addition,
1001bright objects in the image can cause diffraction spikes or saturation
1002over a large region of the image, and this can cause nearby objects to
1003be hidden.
1004}
1005
1006
1007
1008\section{Related work in astrometric calibration}
1009
1010There seem to be two distinct groups of researchers who have worked on
1011astrometric calibration.  The first are professional astronomers,
1012whose images are typically of small angular extent, long exposure
1013time, and high quality.  They typically assume that there is a good
1014initial guess of the astrometric calibration and the goal is to
1015produce a very accurate calibration, including image distortion.  The
1016second group of researchers are spacecraft engineers who want to use
1017the stars to estimate the attitude of a camera attached to a
1018spacecraft.  Here the images are of wide angular extent, have short
1019exposure time, and are very noisy.  Primary concerns include weight,
1020power consumption, and robustness (especially in avoiding false
1021positives), while a high degree of accuracy is neither required nor
1022possible given the hardware.
1023
1024\subsection{Non-blind astrometric calibration}
1025
1026Non-blind astrometric calibration requires that an initial estimate of
1027the calibration to be provided.  Groth \cite{groth1986} presents an
1028algorithm for matching two lists of coordinates (eg, image coordinates
1029in pixels and reference star coordinates on the celestial sphere),
1030assuming that the lists contain a significant proportion of objects in
1031common.  With the limited computing resources available at the time,
1032he suggests taking the brightest 20 or 30 objects in each list.
1033
1034
1035Once the two lists of objects have been compiled and the brightest
1036objects selected, all sets of three objects are enumerated and the
1037triangles they form are described by a scale-, rotation-, and
1038translation-invariant descriptor, composed of the length ratio of the
1039longest to shortest edges, plus the cosine between these edges and the
1040\emph{sense} or \emph{parity} of the triangle.  The tolerances
1041associated with these features are computed (by propagating their
1042positional errors through the feature descriptor process) and stored,
1043along with the logarithm of the triangle's perimeter.  Triangles with
1044large length ratio are rejected since they are relatively
1045indistinctive.
1046
1047
1048After triangle features are extracted from both point lists, feature
1049matching is performed by checking whether the distance between each
1050pair of features is less than their corresponding tolerances.  This
1051process is accelerated by sorting the features on one of the feature
1052dimensions.  If multiple matches are found for a particular feature,
1053only the closest is considered.  After all the features have been
1054compared, many correct matches should be found, along with some false
1055matches.  For each match, the difference of the log-perimeters of the
1056two triangles is computed.  This gives the relative scales of the two
1057triangles and hence their coordinate frames, assuming the match is
1058correct.  False matches are rejected by iterative outlier detection:
1059the mean difference of log-perimeters is computed and matches far from
1060the mean are rejected.
1061
1062
1063This approach is essentially an application of the geometric hashing
1064method, though instead of using hashing to accelerate feature
1065matching, the features are simply kept in lists that are sorted on one
1066dimension.  Much of the subsequent work in this area follows
1067essentially the same path (eg, \cite{valdes1995}, \cite{dong2006}).
1068
1069
1070P\'al and Bakos \cite{pal2006} adapt the triangle-matching approach to
1071images containing many more objects (of order $10^4$).  Since it would
1072become prohibitively expensive to enumerate all triangles in such an
1073image, they reduce the number of triangles created by using only the
1074triangles created by a Delauney triangulation.  This vastly reduces
1075the number of triangles created, but makes the algorithm more
1076sensitive to dropout and distractor stars: a single extra star causes
1077a completely disjoint set of triangles to be created in its
1078neighbourhood.  To compensate for this shortcoming, they define an
1079\emph{extended Delauney triangulation}: for each point, they select
1080all points at distance $\ell$ in the Delauney triangulation, and
1081triangulate this set.  This process is repeated for each point.  The
1082first extended triangulation ($\ell = 2$) skips over the nearest set
1083of stars and builds larger triangles by using the next-further set of
1084stars.  The intent is that if some of the nearby stars are distractors
1085that they will be ignored when the extended triangulations are used.
1086
1087\begin{figure}
1088\begin{center}
1089\includegraphics[width=\figunit]{pal-fig2}
1090\end{center}
1091\caption{The triangle parameterization used by P\'al and Bakos.  This
1092figure is a reproduction of \fig 2 in \cite{pal2006}.\label{pal}}
1093\end{figure}
1094
1095P\'al and Bakos introduce a new triangle parameterization which is
1096continuous and sensitive to chirality (parity); see \figref{pal}.
1097This two-dimensional parameter space is used as the geometric feature
1098space.  When matching triangles between the two images, they demand a
1099\emph{symmetric point match}: each triangle must be the other
1100triangle's nearest neighbour in feature space.  Matching two images is
1101done by creating the lists of triangles and attempting to find
1102symmetric point matches.  This process is accelerated by sorting each
1103list by one of the coordinates.  Each triangle match is considered to
1104be a vote for the correspondence of the three pairs of points
1105composing the triangles.  These votes are accumulated in a sparse
1106matrix where element $(i, j)$ contains the number of votes for a
1107correspondence between object $i$ in the first image and object $j$ in
1108the second image.  After all matches are considered, the top 40\% of
1109the nonzero matrix elements are assumed to contain the true
1110correspondences.  A transformation based on these correspondence is
1111computed and the unitarity of the transformation matrix is used to
1112judge whether the match is true or false.
1113
1114
1115A different approach is taken by Kaiser \etal \cite{kaiser1999}.  They
1116assume they are given two lists of source positions that differ by a
1117rigid transformation involving scaling, rotation, and translation.  In
1118each image, they iterate over each pair of points and compute the
1119vector difference of their positions.  For each list, the log-length
1120and angle of the pairwise difference vectors are accumulated in a
1121two-dimensional histogram.  Observe that if the whole list of points
1122is scaled up by a constant factor, then the log-distance between each
1123pair of points increases by a constant amount.  Similarly, if the
1124whole list is rotated then the angles shift by a constant amount.
1125Once the two histograms have been computed, their cross-correlation is
1126computed.  If the two lists contain a significant number of
1127corresponding points, the cross-correlation signal will be strongest
1128at a shift corresponding to the difference in log-scale and rotation
1129between the lists.  This process is similar to a Hough transform
1130\cite{duda1972,ballard1981}, except that instead of finding the peak
1131of a single parameter-space histogram, we are searching for a peak in
1132the similarity of two histograms as we shift them with respect to each
1133other in parameter space.
1134
1135
1136Once the scaling and rotation between the lists has been found, the
1137translation can be found by scaling and rotating one of the lists into
1138the frame of the other list, then histogramming the vector difference
1139between points across the two lists.  This is a standard (generalized)
1140Hough transform.
1141
1142
1143\subsection{Blind astrometric calibration}
1144
1145
1146The majority of previous work on blind astrometric calibration is
1147motivated by the problem of spacecraft attitude estimation.  Sometimes
1148called the ``lost in space'' or ``stellar gyroscope'' problem, the
1149task is to estimate the pose of a spacecraft by using an image of the
1150sky captured by an onboard camera.  Although similar in general
1151spirit, the requirements and limitations of this application are quite
1152different than astronomical applications.  Mass, power consumption,
1153and robustness of the system are primary concerns, and as a result the
1154optical designs are very different from science-grade astronomical
1155instruments, and the available processor and memory resources are very
1156restricted.  Typical fields of view are tens of degrees across, and
1157the exposure time is kept short to allow the system to function while
1158the spacecraft is rotating.  As a result, image quality is typically
1159quite poor: often only a handful of the brightest stars are visible.
1160Since the field of view is large, a reference catalog of a few
1161thousand stars is sufficient to ensure that any view of the sky
1162contains many reference stars.  Since the system will only process
1163images from a single camera, the whole system can be customized and
1164calibrated to that camera.  For example, the nonlinear distortions of
1165the optical system can be measured, and the scale and bandpass of the
1166imaging system are known, so the reference catalog can be tailored to
1167match.
1168
1169
1170For example, Liebe \etal \cite{liebe2004} describe a system design
1171with a $56~\deg$ field of view and exposure time of $50~\unit{ms}$.
1172The resulting images contain tens of stars if the spacecraft is not
1173rotating, but on average only three stars will be detectable when the
1174rotation rate is $50~\deg/\unit{sec}$.  The paper does not describe a
1175particular algorithm for star identification, with the implication
1176that it is not a particularly difficult problem since absolute
1177brightness information will be available, and the total number of
1178stars that are visible to the camera is only a few hundred.
1179
1180
1181In earlier work \cite{liebe1993}, Liebe describes a star
1182identification system.  The reference catalog is composed of the 1539
1183brightest stars, with brightness calibrated to the camera used in the
1184system.  The field of view is $30~\degrees$, and the system uses a
1185feature-matching approach, using the brightest star in the field and
1186its two nearest neighbours to define a geometric feature.  The feature
1187descriptor is composed of the distance from the brightest star to its
1188nearest and second-nearest neighbours, along with the angle between
1189these neighbours.  Note that this feature is not scale-invariant,
1190because the system is only meant to solve images taken by one camera,
1191so the scale is known.  An index is constructed by enumerating all
1192such features that could possibly be detected, given the detection
1193limits of the camera.  These features are coarsely quantized and
1194stored in a table.  To account for noise in the feature descriptors,
1195all neighbouring cells in the quantized feature space are also stored
1196in the table.  This generates $185,000$ features.  At test time, the
1197features in the image are enumerated and the table of features is
1198scanned; an exact feature match in the quantized space is assumed to
1199be correct.
1200
1201
1202This approach is a fairly straightforward geometric hashing technique,
1203except that it does not use hashing as such, and there is no voting or
1204verification scheme because feature aliasing is assumed not to happen.
1205
1206\begin{figure}
1207\begin{center}
1208\includegraphics[width=\figunit]{grid}
1209\end{center}
1210\caption{A grid-based feature: two sources are used to define a local
1211  coordinate system which is discretized into a grid of cells.  Each
1212  cell becomes a bit in the feature descriptor; if a cell is occupied
1213  its bit is set.  This figure is copied from MacKay
1214  \cite{mackay2005}.}
1215\label{gridbased}
1216\end{figure}
1217
1218In contrast to systems that build features from the precise locations
1219of a small number of stars, Padgett and Kreutz-Delgado
1220\cite{padgett1997} present an approach that incorporates information
1221from a large portion of the image.  Their grid-based feature is
1222defined by first choosing one star as the reference star to define the
1223center of the grid, then selecting its nearest neighbour (outside an
1224exclusion radius) to define the orientation of the grid.  Once the
1225center and orientation are determined, a grid is defined, and each
1226remaining stars in the image is assigned to a grid cell.  The feature
1227is a bit vector, one bit per grid cell, where the bit is set only if
1228the cell contains a star.  See \figref{gridbased}.
1229
1230
1231The index is constructed by scanning the sky and selecting, for each
1232field of view, the $n$ brightest stars.  For each of these stars a
1233feature is computed and stored.  At test time, they examine the
1234brightest $c n$ stars (for some safety factor $c$), computing the
1235feature for each one and searching the index for a match.  A pair of
1236features is defined to match if their dot product is above a
1237threshold.  This is equivalent to taking the bitwise \textsc{AND} of
1238the bit vectors and counting the number of bits that are set.  This
1239search can be implemented efficiently by using lookup tables: for each
1240bit, they maintain a list of the features for which that bit is set.
1241Note that this is equivalent to using a grid-based geometric hashing
1242approach: each grid cell is equivalent to a discretized relative
1243coordinate vector, which becomes a hash key, and the ``lookup table''
1244is a hash table.
1245
1246
1247After all the features in the test image have been extracted and
1248matched to the index, the algorithm proceeds to a voting (or perhaps
1249``consensus'') phase where it rejects matches that propose a field of
1250view that does not overlap that of the majority.
1251
1252
1253The system is designed to operate on images of diameter $8~\degrees$,
1254and performs well on simulated data.  They use a grid size of $40
1255\times 40$, and find that on average $25$ grid cells are filled.
1256Their index contains $13,000$ patterns, which means that false
1257positives are quite rare.  However, misidentification of the nearest
1258neighbour, or edge effects (assigning a star to the wrong grid cell
1259due to positional noise) mean that failures to find a match are not
1260uncommon, and are more likely in regions of the sky with high stellar
1261density.
1262
1263In later work, Clouse and Padgett \cite{clouse2000} extend this
1264approach by using Bayesian decision theory instead of a simple
1265threshold on the dot product to define how well features match.  This
1266extension, along with smaller noise levels in the (simulated) imaging
1267system, allows them to extend the approach down to fields of view
1268$2~\degrees$ in diameter.
1269
1270\comment{
1271  Given two features, they compute the dot product between the bit
1272  vectors, then proceed to estimate the probabilities that the match is
1273  a true positive and false positive.  These probability distributions
1274  are quite complex, so several simplifying approximations are made, and
1275  the remaining parameters are estimated by running simulations.
1276  }
1277
1278Harvey \cite{harvey2004} presents two different approaches, one
1279grid-based and the other shape-based.  The grid-based approach is
1280similar to the Padgett--Kreutz-Delgado and Clouse--Padgett approaches
1281\cite{padgett1997, clouse2000}.  A coarser grid is used, so more stars
1282are likely to appear in each bin.  To compensate, a grid cell is only
1283considered ``occupied'' if it contains more than some threshold number
1284of stars.  The other major change is that he expects a test image to
1285be an ``overexposure'' or ``underexposure'' relative to the reference
1286catalog.  This implies that the test image should contain either a
1287subset or a superset of the stars in the index, and therefore the
1288image feature vector must be either greater than or less than an index
1289feature at each bit.  This allows simple bit operations to be used to
1290find feature matches, and feature matching is performed by a linear
1291scan through the index.
1292
1293Harvey's shape-based approach uses constrained $n$-star constellations
1294in order to aggregate information from a large number of stars without
1295allowing the number of potential features to grow combinatorially.
1296Specifically, Harvey uses a pair of stars to define a narrow wedge in
1297the image, then describes the relative positions and angles of a fixed
1298number of nearby stars within that wedge.  Unfortunately, this makes
1299the feature highly sensitive to distractor and dropout stars, since
1300the feature depends on the stars in the feature being enumerated in a
1301particular order.  As a result, all potential features (allowing any
1302combination of stars to drop out) must be checked; the number of
1303features grows exponentially as the density of stars increases.
1304
1305
1306Harvey makes the useful observation that a cascade of indices can be
1307built, where each index is designed to solve images with a particular
1308range of scales.
1309
1310
1311MacKay and Roweis \cite{mackay2005} point out that a grid-based
1312feature such as that used by Harvey leads naturally to a hashing-based
1313strategy.  Each grid cell is associated with a bit that is turned on
1314if the cell is occupied.  This value is placed in a hash table with a
1315mapping back to its position on the sky.  Since false positives can
1316occur as a result of feature aliasing or hash collision, a voting
1317scheme is employed: several feature matches must accumulate before the
1318match is accepted.  Since false negatives can occur as a result of
1319dropouts and distractors (\emph{any} missing or extra star causes the
1320hash code to change completely), many features must be extracted for
1321each region of the sky.
1322
1323
1324\subsection{Fine-tuning astrometric calibrations}
1325
1326In order to ``co-add'' or ``stack'' astronomical images (combine
1327pixels from different images to produce a higher signal-to-noise
1328image), or to do proper-motion studies (measure the movement of stars
1329over time), it is necessary to fine-tune the astrometric calibration
1330of the images.  This is similar to the \emph{bundle adjustment}
1331problem in computer vision \cite{triggs2000}, in that it involves the
1332simultaneous optimization of the various camera and telescope
1333parameters and the estimated positions of objects in the world.
1334Fine-tuning astrometric calibrations is easier because it is
1335essentially two-dimensional, but more difficult because the camera
1336parameters include polynomial distortion terms to model the image
1337distortion introduced by telescope optics.
1338
1339
1340The software package \scamp by Bertin \cite{bertin2005} is a popular
1341tool used by astronomers to fine-tune simultaneously the astrometric
1342calibrations of a large collection of images.  For each image, it is
1343assumed that the center of the image is known to about 25\% of the
1344size of the image, and the scale is known to within a factor of two.
1345The histogram-alignment method of \cite{kaiser1999} is used to find
1346the translation, scaling, and rotation between the image and a
1347reference catalog, which allows the correspondences between image and
1348reference catalog stars to be determined.
1349
1350The core of the \scamp system is a chi-squared minimization of the
1351total weighted distance between the projected positions of all star
1352correspondences among the set of images and the reference catalog.
1353The parameters to be adjusted are the center, scale, rotation, and
1354polynomial distortion coefficients of each image.  A key feature is
1355the ability to share image distortion parameters among subsets of the
1356images, since images taken with the same telescope are expected to
1357share some distortion terms due to the telescope optics.  This reduces
1358the degrees of freedom of the fitting process, resulting in more
1359robust fits.  \scamp can fine-tune thousands of images to sub-pixel
1360accuracy.
1361
1362
1363\section{Summary}
1364
1365
1366The task of automatically finding the astrometric calibration of an
1367astronomical image---this is, recognizing the stars in the image, or
1368locating the image on the sky---is ideal for a geometric hashing-based
1369approach.  Geometric hashing (or more generally, geometric feature
1370matching) is a widely-applicable technique for problems in which it is
1371desirable to build distinctive geometric features out of simple
1372interest-point features.  The approach is ideally suited to the blind
1373astrometric calibration problem, since images of celestial objects can
1374be localized to sub-pixel accuracy but have few other distinguishing
1375features.  Indeed, much of the previous work in blind and non-blind
1376astrometric calibration can be placed within a geometric feature
1377matching framework.
1378
1379
1380\comment{
1381  Although one might suppose that hashing is an intrinsic part of
1382  geometric hashing, we have found that a tree structure is more
1383  effective for the large-scale feature matching we want to perform; we
1384  feel that deciding to use approximate feature matching should be a
1385  design decision, not an intrinsic part of the method: in cases where
1386  each image contains only a few features that could potentially be
1387  found in the index, we do not want to risk missing them due to ``bad
1388  luck'' in a hash function.
1389}
1390
1391
Note: See TracBrowser for help on using the repository browser.