source: trunk/documents/theses/dstn/review.tex @ 12304

Revision 12304, 71.1 KB checked in by dstn, 14 months ago (diff)

some of hogg's kdtree comments

Line 
1\section{Preface}
2
3This thesis describes some of the research carried out by me and my
4colleagues in the \an group.  \an started as a collaboration between
5Sam Roweis, a University of Toronto computer scientist, and David
6W.~Hogg, a New York University astronomer.  An idea that began as a
7crazy scrawl on the back of a napkin---probably in a bar---eventually
8grew into a real project and began attracting collaborators and
9students.  Fast-forwarding a few years, we have achieved a rare feat:
10a computer vision software system that \emph{works}, with little human
11intervention, and is being used by hundreds of real astronomers doing
12real research.  We have also branched out and explored a number of
13other promising areas where ideas in computer vision and machine
14learning can be applied to astronomical data to great benefit.
15
16
17This thesis is quite blatantly a collection of manuscripts.  I have
18done a bit more than staple them together to produce the chapters of
19this thesis, but the original tone and style of each manuscript still
20shines through.  I hope my readers will find the chapter-to-chapter
21variations a refreshing, rather than jarring, change of pace.  One
22advantage of collecting manuscripts into a thesis is that each chapter
23should largely stand alone.  Another advantage is that each manuscript
24was prepared with a list of authors, so it is simpler to identify the
25collaborators who have contributed to each chapter.  %Each chapter has
26%a note detailing the authors of the original manuscript and explaining
27%my personal contribution to the work described.
28
29
30\section{Introduction}
31
32
33The general idea of the \an collaboration is to apply ideas in
34computer vision and machine learning to problems and data in
35astronomy.  Astronomical images are a good domain in which to explore
36ideas in computer vision, because the problems to be solved are often
37much more constrained than in general imaging, the telescopes and
38cameras are often well-understood and calibrated, large volumes of
39data and ``ground truth'' information exist, and with new telescopes
40constantly coming online and producing greater and greater volumes of
41imaging, there is a need for sophisticated computation.  I briefly
42outline astronomical imaging for computer vision researchers in
43\secref{sec:astrointro}.
44
45
46The specific idea addressed in this thesis is that pattern recognition
47approaches can be applied to astronomical images.  The goal is to
48produce a system that is able to recognize automatically the stars in
49any image of the night sky, using the information in the image pixels
50alone.  Recognizing the stars in an image is equivalent to finding the
51pointing, scale, and rotation of the image on the sky in a standard
52coordinate system.  The branch of astronomy concerned with measuring
53the positions and motions of celestial bodies is called
54\emph{astrometry}, and the task of placing a new image in a standard
55reference frame is known as ``solving the astrometry'' or
56``calibrating the astrometry'' of the image.  The \emph{blind
57astrometric calibration} task is to calibrate an image using only the
58image itself.  The broad goal of the \an project was to build a system
59that would allow us to create correct, standards-compliant astrometric
60\metadata for every useful astronomical image ever taken, past and
61future, in any state of archival disarray.  This is part of a larger
62effort to organize, annotate and make searchable all the world's
63astronomical information.
64
65
66After a brief introduction to astronomical imaging for computer
67scientists, the remainder of this chapter reviews the area of pattern
68recognition, and in particular the framework of \emph{geometric
69hashing} for object recognition in images.  One of the contributions
70of this thesis is to replace the simple hash table used in traditional
71geometric hashing with a \kdtree, so I review related work in hashing
72and other approaches for fast feature matching.  Finally, I present
73the blind astrometric calibration task as an instance of object
74recognition, and review some previous approaches to the problem.
75
76%% Our approach?
77
78
79The remaining chapters present our approach to the blind astrometric
80calibration problem.  \Chapref{chap:techreport} explains our approach
81and presents the results of large-scale tests of the system on
82real-world data.  Chapters \ref{chap:verify} and \ref{chap:kdtree}
83delve into details of the approach: \Chapref{chap:verify} presents the
84Bayesian decision theory problem that lies at the heart of our
85approach, while \chapref{chap:kdtree} explains the technical details
86of the \kdtree data structure implementation that is key to making our
87system performant enough to be a practical tool.
88\Chapref{chap:conclusion} summarizes our results.
89
90
91\section{Astronomical imaging for computer scientists}
92\label{sec:astrointro}
93
94Astronomical images are very different from images produced by typical
95consumer-grade cameras.  This section is intended to give
96non-astronomers a sense of the typical parameters of contemporary
97astronomical imaging setups, and introduce some of the terminology and
98key ideas.
99
100
101The Sloan Digital Sky Survey (SDSS) \cite{sdsstechnical, sdsscamera}
102is one of the most important projects in contemporary astronomy.  The
103primary imaging survey was carried out from 2000 through 2008, so the
104hardware is no longer cutting-edge, but it is still very impressive.
105The telescope's primary mirror is $2.5$ meters in diameter.  Incoming
106light first strikes the primary mirror, then a one-meter secondary
107mirror, then passes through two corrective lenses in front of the
108camera.  The nearly distortion-free field of view is about $3$ degrees
109wide.  The camera is composed of $30$ charge-coupled devices (CCDs),
110each $2048\times2048$ pixels, arranged in six columns.  The five CCDs
111in each column have different bandpass filters, ranging from the
112near-infrared ($913~\nanometers$) through the optical to the
113near-ultraviolet ($354~\nanometers$).  The CCDs and associated
114electronics are cooled with liquid nitrogen to $-80^{\circ} \unit{C}$
115and held under vacuum.  Each CCD is about $5~\unit{cm}$ square, and
116the whole focal plane of the camera is about $65~\unit{cm}$ in
117diameter.  The camera assembly has a mass over $300~\unit{kg}$.
118
119
120The survey operates in ``drift-scanning'' mode: the telescope is
121pointed in a nearly fixed direction, and as the Earth rotates the sky
122drifts past the camera.  The camera electronics shift the electrons
123along the CCD columns at the same rate.  As an object drifts through
124the field of view, the electrons it produces in the CCD pixels march
125along with it, and are read out as it disappears from view.  In the
126SDSS camera, the object drifts past each of the five bandpass-filtered
127CCDs of the camera in turn, producing a five-color image.  Over the
128course of a night of observing, the camera sweeps out a great circle,
129creating six columns of five-band images that are $2048$ pixels wide
130and over a million pixels long: over $120$ gigabytes of pixels per
131night.  This clever drift-scanning scheme has many advantages, one
132being that the readout rate of the CCDs can be relatively slow,
133resulting in low readout noise.  Calibration of the camera is simpler
134because charge is integrated over columns of the CCD: it is only
135necessary to calibrate whole columns rather than individual pixels.
136Note that for this scheme to work, the CCDs must be very carefully
137aligned in the focal plane, and the telescope control system must be
138capable of keeping the system pointed stably at the level of a pixel
139or better.  The moving mass of the telescope is over
140$15,000~\unit{kg}$, so this is no mean feat!
141
142% 24 um pixels
143
144Astronomical imaging systems are designed so that the point-spread
145function (PSF)---the image on the CCD of a distant point source of
146light---is spread over several pixels.  This allows the position of a
147point source to be measured to sub-pixel accuracy by fitting a model
148of the PSF to the observed pixel values.  If the system focused a
149point source onto a single pixel, this would be impossible.  The goal
150is typically for the full-width at half-maximum (FWHM) of the PSF to
151be a few pixels, so that the PSF is well-sampled but the signal is not
152spread out over too many pixels.
153
154For ground-based telescopes, the moving atmosphere causes the image of
155a distant point source of light to dance around on the CCD; in typical
156exposures of many seconds this ``speckling'' is integrated into a
157point-spread function.  The size of this PSF is called the ``seeing''
158and a value of one $\unit{arcsecond}$ FWHM is considered quite good.
159The SDSS camera has pixels of size $0.396~\unit{arcseconds}$, and the
160point-spread function is dominated by the seeing: the optics of the
161system are good enough that the images are as clear as they can be
162given the atmosphere.
163
164
165The positions of astronomical objects are usually measured in an
166\emph{equatorial coordinate system}: essentially the projection of the
167Earth's latitudes and longitudes onto the celestial sphere at a
168specific instant.  The latitudinal direction is called ``declination''
169and abbreviated ``Dec'', and the longitudinal direction is called
170``right ascension'' and abbreviated ``RA''.  Declination is typically
171measured in degrees from $-90$ to $+90$, and right ascension is
172typically measured in degrees from $0$ to $360$, or hours from $0$ to
173$24$.  There are $60~\unit{arcminutes}$ per degree and
174$60~\unit{arcseconds}$ per $\unit{arcminute}$.  For reference, the
175Moon as seen from Earth is about half a degree in diameter.  The
176$0.396~\unit{arcsecond}$ pixels of the SDSS camera view an area the
177size of a penny at a distance of ten kilometers; each $2048\times2048$
178pixel CCD views an area of about one-millionth of the sky.
179
180
181The SDSS is very impressive, but the next generation of telescopes
182currently in development dwarf its specifications.  The Large Synoptic
183Survey Telescope (LSST) \cite{lsst}, for example, has an
184$8.4~\unit{meter}$ primary mirror, with a $3.5~\unit{degree}$ field of
185view almost completely covered by an array of nearly $200$ CCDs, each
186with $4096\times4096$ pixels: a total of about $3.2$ gigapixels.  The
187camera is the size of a small car.  It will take exposures of about
188$15$ seconds, yielding over $10$ terabytes of data per night.
189
190% -- basics of data reduction; catalogs
191% -- stars, galaxies, and other stuff
192
193
194
195
196\section{Pattern recognition}
197
198\emph{Pattern recognition} encompasses a huge class of problems and
199techniques in computer vision and machine learning.  Many pattern
200recognition problems are of the type ``identify this object'' or
201``find things like this'': the system is first told about a large set
202of objects, then is presented with a novel object and is asked to
203describe it or find similar objects from the set of known objects.
204
205Pattern recognition systems often take a \emph{feature matching}
206approach.  A \emph{feature} is a compact description of
207(problem-specific) relevant, invariant properties of an object, such
208that objects that are similar in the problem domain have similar
209features.  Defining how features are extracted from objects and
210defining the similarity between features are key factors in the
211success of such systems.  Often the representation of a feature (the
212\emph{feature descriptor}) is a point in a real coordinate space and a
213distance metric defines the similarity between features.
214
215Feature-based approaches typically involve two phases: an
216\emph{indexing} phase, in which the set of known objects is shown to
217the system, and a \emph{test} phase, in which novel objects are
218presented.  During indexing, features are extracted from the set of
219known objects.  At test time, features are extracted from the novel
220object and compared with the set of features from the known objects.
221In many domains, the set of known objects is very large, so it is
222critical that features be stored in a manner that allows the system to
223rapidly find known features that match a given feature.
224
225Often feature-based approaches suffer \emph{feature aliasing}
226problems: two objects that are dissimilar in the problem domain may
227yield similar features.  In this case, systems may use \emph{voting}
228or \emph{verification} schemes to ensure that aliased (false) feature
229matches are eliminated.  Voting schemes typically assume that false
230feature matches are independently randomly distributed, so the
231probability of matching several features to the same incorrect object
232is the product of the individual probabilities; if we demand a
233sufficient number of matches, the probability of a false match becomes
234small.  Verification or \emph{hypothesize-test} schemes treat a
235feature match as a hypothesis that the novel object is similar to a
236known object, and then some test to decide if the hypothesis is
237correct.  Such schemes can be seen as using feature matching as a
238search heuristic.
239
240\section{Visual pattern recognition}
241
242Pattern recognition in images is a particularly challenging and
243complex problem domain which has been studied extensively.  Objects to
244recognize include faces, fingerprints, specific physical objects, or
245even general classes of physical objects such as mugs, chairs, or
246elephants.
247
248
249Visual pattern recognition of general physical (3-dimensional) objects
250is difficult for many reasons: objects may be deformable, can appear
251different when viewed from different angles or under different
252lighting conditions, and can be occluded by other objects.  Features
253used in visual pattern recognition include edges (lines and curves),
254corners, or the color or texture of image patches.
255
256
257An important consideration when designing features for visual pattern
258recognition is the size (\ie, spatial extent in the image) of the
259feature.  \emph{Local features} gather information from a small region
260of the image, while \emph{global features} incorporate information
261from the whole image.  Local features can provide robustness to
262occlusion and deformation: if part of the object is hidden, features
263belonging to that part will be lost, but the features belonging to the
264remaining parts will still be found.  Since local features aggregate
265information from only a small portion of the image, they may be more
266noisy or less distinctive than global features.
267
268
269Since a single local feature may be insufficient to guarantee that an
270object has been correctly recognized, some systems build high-level
271features by combining multiple low-level features.  When the objects
272to be recognized are rigid or articulated (composed of multiple rigid
273parts connected by simple joints), a common approach is to build
274features that describe the relative geometry of low-level features.
275This use of \emph{geometric features} is often called \emph{geometric
276hashing}, since the result is a feature descriptor or \emph{hash code}
277which describes the geometric arrangement of the low-level features.
278
279
280
281\section{The geometric hashing framework}
282
283\begin{figure}
284\begin{center}
285% dot lamdan.dot -Tps2 -o lamdan.ps
286% ps2pdf -sPAPERSIZE=a4 lamdan.ps lamdan.pdf
287\includegraphics[height=0.8\textheight]{lamdan}
288\end{center}
289\caption{Outline of the Geometric Hashing scheme.  This diagram is a
290  reproduction of \fig 2 in Lamdan \etal \cite{lamdan1990}.
291  \label{lamdan}}
292\end{figure}
293
294
295Lamdan \etal \cite{lamdan1990} and Wolfson and Rigoutsos
296\cite{wolfson1997} give an excellent overview of the geometric hashing
297approach.  See Figure \ref{lamdan} for a block-diagram overview.
298Geometric hashing is based on the fact that when rigid objects undergo
299similarity transformations (translation, rotation, and scaling),
300angles between points on the object and relative distances between
301points do not change.  Therefore, if we describe the relative
302arrangement of points on an object in terms of these invariants, our
303description will be invariant under these transformations of the
304object.  Invariant geometric properties can be found for other types
305of transformations such as affine or perspective projection.
306
307% \cite{affineinvariant}
308
309
310Geometric hashing builds \emph{geometric features} out of lower-level
311features.  We will call the low-level features \emph{interest points}
312for clarity.
313
314Given the types of transformations the system must handle, there is
315some sufficient number of interest points required to define an
316invariant local coordinate system; these are called the \emph{basis
317  set}.  For example, for 2-D objects that can undergo similarity
318transformations, two interest points can be used to define a
319coordinate system.  For 3-D objects with similarity transformations,
320three (non-collinear) interest points are required.
321
322Geometric hashing follows an indexing approach.  During the indexing
323phase, the system preprocesses a database of objects which are to be
324recognized.  For each object, many interest points are extracted which
325are invariant under the transformations that are expected, and can be
326well localized.  The system then enumerates all basis sets by looking
327at all combinations of interest points; each set of basis features
328defines a relative coordinate system.  For each basis set, all
329remaining interest points and enumerated and their positions described
330in this coordinate system.  Each relative coordinate vector forms a
331\emph{geometric feature descriptor}.  In this way, the relative
332geometric arrangement of a set of interest points is converted into a
333point in a vector space.
334
335% --- consider adding a variant of the quad fig here.
336
337In the standard geometric hashing approach, the geometric feature
338descriptor is discretized and encoded as a hash table key. A grid is
339place over the feature descriptor space, and the hash table key of a
340geometric feature is the identifier of the grid cell containing it.
341Each entry in the hash table maps back to the basis set plus the
342feature whose position is encoded.
343
344% --- picture?
345
346Given a new image, a geometric hashing system extracts all interest
347points from the image and, as at indexing time, proceeds to enumerate
348all basis sets and computes the positions of the remaining interest
349points in each basis set.  This generates many geometric features
350whose descriptors are discretized and converted to hash table keys, as
351before.  Similar arrangements of interest points will lead to similar
352geometric feature descriptors, which--with luck---will be mapped to
353the same hash table key.  The hash table values associated with each
354key are retrieved, and each value that is found is considered a match.
355Each match votes for the correspondence between the interest points
356from the database and the interest points from the image that were
357used to define the geometric feature.
358
359
360After all features from the image have been processed, many votes will
361have accumulated.  False matches (due to aliasing or hash table
362collisions) should be uniformly distributed over the set of known
363objects, so should not generate very many votes for any particular
364object.  Objects that are actually present in the image should
365generate many votes.  Objects that receive an insufficient number of
366votes are rejected, and the remainder are subjected to a verification
367process.
368
369
370The verification process requires computing a transformation from the
371object to image coordinates, using the corresponding interest points
372from the known object and the image.  Using this transformation, all
373the interest points belonging to the known object are projected into
374image coordinates.  If the match is correct, we expect the image to
375contain features at these locations.  Some researchers add an extra
376verification stage by projecting the known object into the image and
377checking that the predicted edges of the object are found in the image
378(eg, \cite{lamdan1990, huttenlocher1990}).
379
380
381The geometric hashing approach is flexible and applicable to many
382problems, but there are some issues with the basic version described
383here.  The hash function described above simply places a grid over the
384relative coordinate system and returns the identifier of the grid cell
385in which the interest point lands.  This is prone to edge effects: if
386an interest point is near the edge of a grid cell, a small change in
387its position due to noise can move it across the edge into a different
388grid cell, which causes the hash code to change and a different set of
389feature matches to be found.  This problem can be overcome by
390detecting interest points that are near the edges of their cells and
391also searching the hash table using the hash codes of neighbouring
392cells.  This results in more false feature matches, and can become
393expensive as the dimensionality of the geometric feature space
394increases (there are more edges, and the proportion of the hypervolume
395of feature space that is near an edge increases).  Using a data
396structure other than the hash table may be beneficial.
397
398
399Another issue with uniform grid-based hashing is that it assigns equal
400importance (and storage space) to each grid cell in the feature space.
401Consider an interest point that is far from the basis set that defines
402its local coordinate system.  Small positional errors in the basis set
403can cause the coordinate system to be rotated or scaled, which results
404in large changes in the interest point's relative coordinates.  This
405means that edge effects become more pronounced in such regions of the
406feature space.  One response to this concern is to increases the size
407of the grid cells far from the basis set, but then if the interest
408points are uniformly distributed, these larger hash table bins will
409contain more features, so every match to these bins will be
410accompanied by more false positives.
411
412
413Finally, a hash table with equal-sized cells will only be uniformly
414utilized if interest points are uniformly distributed in feature
415space, but some domains may yield interest points that are far from
416uniformly distributed.  In other domains, it may be beneficial to
417place constraints on the geometric features that are used (for
418example, to avoid regions that are prone to large errors, or that are
419relatively indistinctive).
420
421
422There is nothing in the geometric hashing framework that requires a
423standard hash table to be used for feature matching.  Other methods of
424feature matching are described in the next section.
425
426
427\section{Related work in fast feature matching}
428
429
430For feature matching in a geometric hashing framework, the database of
431known objects is converted into a set of points in a feature
432descriptor space, which is a vector space.  To find matches to a new
433feature (extracted from an image in which we want to recognize
434objects), we need to find all points in the vector space that are
435within some tolerance.  Usually the Euclidean distance metric is
436applied, so the tolerance is a radius in the vector space.
437
438
439\subsection{Bloom filters}
440
441A Bloom filter \cite{bloom1970} is a data structure that can answer
442queries of the form ``is feature $q$ a member of the set of known
443features?''  A Bloom filter is a set of $n$ bits, initially zero.  A
444set of $k$ hash functions must be defined which take a feature as
445input and produce an integer in $[0, n)$ as output.  In the indexing
446phase, we examine each of the known features, and apply each of the
447hash functions, producing $k$ hash codes.  Each code is used to
448reference a bit and \emph{set} it (\ie, turn it on).  Each known
449feature results in $k$ bits being set; several features may request
450that a particular bit be set.
451
452
453Given a new feature to match, we run the $k$ hash functions on it and
454test whether each of the resulting bits are set.  If the feature is
455equal to a feature seen at indexing time (according to the hash
456functions), then all $k$ of the bits will have been set; thus the
457Bloom filter does not produce false negatives.  If the query feature
458is not equal to a known feature, then some of the bits may have been
459set due to \emph{collision} of the hash functions, but the feature
460will only be accepted if all $k$ hash functions result in collision;
461this results in a false positive.  The rate of false positives
462increases as the number of known features increases relative to the
463number of bits in the filter.  The Bloom filter is ideally suited to
464cases where the number of known features is small, and a verification
465step can be applied to eliminate any false positives that are
466produced.
467
468
469For feature matching, we want to know not only whether the query
470feature $q$ matches a known feature, we want to know \emph{which}
471known feature matches.  One way to answer such queries is to use a
472\emph{Bloomier filter} \cite{chazelle2004}.  The simplest Bloomier
473filter uses multiple Bloom filters to categorize a given feature into
474two different categories, and uses slightly more space than two Bloom
475filters.  A set of $n$ such Bloomier filters can be used to categorize
476a feature into $2^n$ different categories by assigning one Bloomier
477filter to each bit of the result.
478
479
480The standard Bloom filter only finds exact matches, but we want to
481find \emph{nearby} matches.  An extension of Bloom filters to allow
482proximity searches is given by Kirsch and Mitzenmacher
483\cite{kirsch2006b}.  This variant uses \emph{locality sensitive
484hashing} (LSH, discussed below) as its hashing function, so it
485inherits some limitations from LSH.  Most importantly, LSH only
486guarantees with some probability that nearby features will hash to the
487same bin.  As a result, the locality-sensitive Bloom filter can
488produce false negatives as well as false positives.
489
490
491In order to answer feature matching queries (find all known features
492near a given query feature), one could combine these Bloom filter
493extensions to build a locality-sensitive Bloomier filter.  However,
494the false negative rate of the locality-sensitive filter would be
495magnified because the Bloomier filter requires \mbox{$2
496\ceil{\log(n)}$} separate filters to represent $n$ known features, and
497all of these filters must produce the correct answer.  It seems that
498Bloom filters are not particularly well suited to this task.
499
500
501\subsection{Locality Sensitive Hashing}
502
503Locality Sensitive Hashing (LSH) \cite{datar2004, gionis1999,
504andoni2006}, is a technique for answering approximate nearest
505neighbour queries: it returns a feature whose distance is within a
506constant factor of the distance to the nearest neighbour.  The LSH
507scheme can be extended to handle exact nearest-neighbour queries
508efficiently if the set of known features satisfies some \emph{bounded
509growth} criteria.  It is intended for use in high-dimensional spaces
510(tens to thousands of dimensions), and $\ell_p$ distance metrics for
511$p \in (0, 2]$.
512
513
514The core idea of LSH is to use several hash functions with the
515property that the probability of collision---features being placed in
516the same bin---is much higher for nearby features than for distant
517ones.  In the indexing stage, we populate the hash table by running
518each hash function on each known feature.  To perform a query, we run
519the hash functions on the query feature and retrieve the features in
520the same hash table bin as the query.  One of these features is likely
521to be an approximate nearest neighbour of the query.
522
523
524In practice, in \cite{datar2004} each hash function is composed of
525several (typically $10$) simpler hash functions.  Each simple hash
526function performs a random projection in the feature space: the hash
527value is the discretized value of the dot product of the feature
528vector with the hash function's random vector.  Combining several of
529these simple hash functions is equivalent to projecting the feature
530vectors onto a grid in a $10$-dimensional, non-orthogonal, subspace.
531The hash code is simply the identifier of the grid cell in which the
532feature falls.
533
534
535This scheme suffers from edge effects: a query feature may land in a
536bin other than the bin containing its nearest neighbour due to small
537positional noise in any of the dimensions.  Attempting to reduce this
538effect by making the bin size (discretization level) larger fails
539because the number of hash functions must be increased to maintain the
540same total number of hash bins.
541
542
543In order to reduce the number of false negatives, we can choose many
544(typically $30$) different hash functions and place a reference to
545each feature in the hash table bin chosen by each function.  This
546increases the probability that at least one of the hash functions will
547select a $10$-dimensional subspace in which the query is close to its
548nearest neighbour.
549
550
551The standard LSH scheme does not seem particularly well suited for the
552purpose of feature matching in a geometric hashing system, since the
553dimensionality of the feature vectors is usually moderate ($2$ to $6$
554in the \an system). Since some feature aliasing is expected in
555geometric hashing systems, the nearest neighbour may not be the
556correct match: we would prefer to get all neighbours within a given
557search radius.
558
559
560\subsection{\Kdtrees}
561
562\begin{figure}
563  \begin{center}
564        % tree-figs.py : plot_bboxes
565    \begin{tabular}{c@{\hspace{3pt}}c@{\hspace{3pt}}c}
566      \includegraphics[width=0.31\textwidth]{kdtree-0} &
567      \includegraphics[width=0.31\textwidth]{kdtree-1} &
568      \includegraphics[width=0.31\textwidth]{kdtree-2} \\
569      \includegraphics[width=0.31\textwidth]{kdtree-3} &
570      \includegraphics[width=0.31\textwidth]{kdtree-4} &
571      \includegraphics[width=0.31\textwidth]{kdtree-5}
572    \end{tabular}
573  \end{center}
574  \caption[A \kdtree partitioning of space.]{A \kdtree partitioning of
575        space.  Each panel shows the nodes at one level of the tree.  The
576        points are 200 samples from a ring with uniformly-distributed
577        radius in $\left(0.6, 1\right)$ and uniformly-distributed angle in
578        $\left(0, \displaystyle{\frac{\pi}{2}}\right)$.  Nodes are split
579        along the dimension with the largest range, and the splitting
580        position is chosen so that the two child nodes will contain an
581        equal number of points.  Note how the tree adapts to the
582        distribution of the data: each node tightly bounds the data points
583        it owns.  This is a slightly different version of the \kdtree than
584        Bentley's original description.\label{kdtree}}
585\end{figure}
586
587The $k$-dimensional tree (\kdtree) was introduced by Bentley in 1975
588\cite{bentley1975}, and several extensions have been described
589\cite{friedman1977, sproull1991}.  The idea is to build a binary tree
590over the feature space, where each node ``owns'' a region of the space
591which is disjoint from the region owned by its sibling node.  Each
592non-leaf node has an axis-aligned splitting hyper-plane; its left
593child owns the subspace to the left of the splitting plane, and the
594right child the subspace to the right.  In this way, the \kdtree
595defines a hierarchical subdivision of the feature space into
596axis-aligned hyper-rectangles.  See \figref{kdtree}.
597
598
599For the purposes of indexing the features in a geometric hashing
600system, the set of features is known beforehand and is static, so we
601do not need to insert or delete features from the tree, and Bentley's
602\emph{optimized} \kdtree construction method can be used.  That is,
603the tree can be built to adapt to the particular set of features we
604have.
605
606
607Tree construction proceeds by first assigning all features to the root
608node, then recursively splitting nodes until each leaf node owns at
609most $L$ features, with $L$ perhaps $10$ or $20$.  Splitting is done
610by selecting a splitting plane and using it to partition the features.
611Sproull \cite{sproull1991} enumerates several strategies for choosing
612the splitting hyperplane.  Traditionally, \kdtrees use axis-aligned
613splitting hyperplanes, and the splitting dimension is chosen by simply
614iterating through the dimensions (as in Bentley's original paper), or
615by selecting the dimension with the largest \emph{range} or
616\emph{variance}.  One can also use an arbitrary splitting hyperplane,
617for example by finding the principal eigenvector of the covariance
618matrix of the feature vectors.  Typically the splitting hyperplane is
619positioned so that the two children own an equal number of data
620points.  This leads to a balanced (and therefore short) tree, but in
621some applications it can be beneficial to bias the splitting location
622\cite{macdonald1990}.
623
624
625Depending on the application, it can be beneficial to store at each
626node the tight axis-aligned bounding box (and other summary statistics
627\cite{deng1995}) of the points owned by that node (as shown in
628\figref{kdtree}).  This is particularly useful where there is
629significant correlation between the dimensions of the data, because
630splitting the data points along one dimension implies that the
631bounding box will shrink in other dimensions as well, and this can
632result in faster searches.  If the bounding boxes are stored, then it
633is no longer strictly necessary to store the splitting hyperplane,
634though it may be useful to speed up some computations.
635
636
637\comment{
638\begin{figure}
639  \begin{center}
640    \includegraphics[width=\textwidth]{rangesearch-bb-fig}
641  \end{center}
642  \caption[\Kdtree range search with bounding boxes.]{\Kdtree range
643search for a query point $q$ in \kdtree node $N$, for a \kdtree that
644stores the bounding box of each node.  We check whether the query
645hypersphere overlaps the child nodes $L$ and $R$.  In both cases $q$
646is contained in the left child node $L$ so we must recurse on $L$.  In
647the top panel, the distance \emph{in the splitting dimension} from $q$
648to the bounding box of node $R$ is less than the search radius $r_1$
649so we do not need to recurse on $R$.  In the lower panel, the distance
650in the search dimension to the query is larger than the search radius
651$r_2$ so we must recurse on $R$.}
652  \label{rangesearch-bb}
653\end{figure}
654
655
656\begin{figure}
657  \begin{center}
658    \includegraphics[width=\textwidth]{bohm2001fig7}
659  \end{center}
660  \caption[Lower and Upper Distance Bounds in a \kdtree.]{Lower and upper
661        bounds (min- and max-dist, respectively) on the distance from a query point
662        to a \kdtree bounding box.
663        This figure is copied from \cite{bohm2001} Figure 7.}
664  \label{upperlower}
665\end{figure}
666 
667
668\begin{figure}
669  \begin{center}
670    \includegraphics[width=\textwidth]{rangesearch-split-fig}
671  \end{center}
672  \caption[\kdtree range search with splitting planes.]{\kdtree range
673        search for a query point $q$ in \kdtree node $N$,
674        for a \kdtree that only stores the splitting plane (not the bounding boxes).
675        Top: since the query is to the left of $N$'s splitting plane (dashed line),
676        we must recurse on the left child $L$.  Since the distance in the splitting
677        dimension from the query to the splitting plane, $D_R$ is less than $r$,
678        we must recurse on $R$.
679
680        Bottom: For the left node ($L$): the query is above the splitting plane, so we must recurse
681        on child $T$.  The distance $D_B$ in the splitting dimension to the splitting plane
682        is less than $r$, so we must recurse on $B$ also.
683       
684        For the right node ($R$): the query is above the splitting plane, so we must recurse
685        on child $t$.  The distance $D_b$ to the splitting plane is greater than $r$ so we
686        need not recurse on node $b$.  (Note, we could also keep track of the \emph{total}
687        distance to the query point, $\sqrt{D_R^2 + D_b^2}$, but this requires more book-keeping.}
688  \label{rangesearch-sp}
689\end{figure}
690}
691
692For feature matching, we typically want to do \emph{range search}:
693find all features within radius $r$ of a given query feature $q$.
694This is easily achieved with a \kdtree using a recursive algorithm.
695Starting at the root, we must decide whether we need to recurse on
696each of the node's children.  (How we make this decision is explained
697below.)  Once we reach a leaf, we enumerate the features owned by the
698leaf and compute the distance to the query feature; any feature with
699distance less than $r$ is accepted.
700
701
702We decide whether we must recurse on a node's children based on their
703bounding boxes.  We compute the \emph{mindist}---the lower-bound of
704distance---between the query point and each child's bounding box.  The
705mindist is simply the distance to the corner or edge of the bounding
706box, and can be computed quickly.  Any node whose mindist to the query
707point is less than the query radius $r$ must be visited.  If the
708\kdtree does not explicitly store the tight bounding-boxes of the
709nodes, loose bounding-boxes can be constructed during the recursion
710based on the splitting planes of the ancestors.
711
712
713\comment{
714Nearest-neighbour search in \kdtrees is essentially the same as range
715search, except that $r$ is considered the \emph{pruning distance}; any
716nodes whose lower-bound distance is larger than the pruning distance
717need not be visited.  The pruning distance decreases monotonically as
718the search proceeds and is simply the (upper bound of) the distance to
719the nearest neighbour found so far.  Since reducing the pruning
720distance is critically important to achieving fast search, the order
721in which nodes are explored is important; one typically keeps a
722priority queue of nodes to visit, sorted by the (lower-bound) distance
723of the node to the query feature.  If the query feature falls in the
724region owned by one of the leaf nodes, then all nodes on the path from
725the root to that leaf will have lower-bound distance of zero, so this
726path will be explored first and the ``greedy'' nearest neighbour
727distance will become the first pruning distance.
728
729
730The \kdtree can also be used for approximate nearest-neighbour or
731range searches.  Beis and Lowe \cite{beis1997} describe the Best Bin
732First algorithm, which is essentially the same as the full
733nearest-neighbour search except that the search terminates after
734exploring a fixed maximum number of leaves or after a fixed period of
735time has elapsed.  The priority queue is sorted by the lower-bound
736distance of the node's bounding box to the query point, so the `most
737promising' nodes are explored first.
738}
739
740
741Chapter \ref{chap:kdtree} presents the \kdtree in more detail.
742
743
744\subsection{Other approaches}
745
746% Voronoi:
747% -de Berg 1997
748%   (Mark de Berg , Marc van Kreveld , Mark Overmars , Otfried Schwarzkopf, Computational geometry: algorithms and applications, Springer-Verlag New York, Inc., Secaucus, NJ, 1997)
749% -Edelsbrunner 1987
750%   (Herbert Edelsbrunner, Algorithms in combinatorial geometry, Springer-Verlag New York, Inc., New York, NY, 1987)
751% -Preparata and Shamon 1985
752%   (Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985)
753
754In contrast to the \kdtree's rectangular decomposition of space, there
755is a family of space decompositions that use hyperspheres.  An example
756is the \emph{ball-tree} \cite{uhlmann1991b}, in which each non-leaf
757node is associated with one of the known features $f$ and a radius
758$r$; the left child owns all features within radius $r$ of feature
759$f$, and the right child owns the remaining features.  Similarly, in
760the \emph{anchors hierarchy} \cite{moore2000}, each non-leaf node is
761represented by a feature (called its \emph{anchor}), and a node owns
762all the features that are closer to its anchor than the anchor of its
763sibling.  These sphere-based trees are supposed to better capture the
764structure of high-dimensional data sets, though the results seem to
765depend quite strongly on the particular data distributions and search
766parameters.
767
768
769For two-dimensional feature spaces, algorithms based on planar point
770location (eg, \cite{kirkpatrick1983}) and the Voronoi tesselation of
771space yield $\mathcal{O}(n \log n)$ preprocessing time, a data
772structure of size $\mathcal{O}(n)$, and query time of
773$\mathcal{O}(\log{n})$.  Unfortunately, in higher dimensions the space
774required to store the Voronoi tesselation is
775$\mathcal{O}(n^{\floor{d/2}})$, which quickly becomes untenable
776\cite{arya1998}.
777
778% planar point location:
779% -Dobkin and Lipton in 1976 O((log n)^2) time, O(n^2) space
780% -Sarnak and Tarjan O(n) space, O(log n) time
781%    * Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees". Communications of the ACM 29 (7): 669-679.
782% -Edelsbrunner, Guibas, and Stolfi, same
783%    * Herbert Edelsbrunner, Leonidas J. Guibas, Jorge Stolfi (1986). "Optimal point location in a monotone subdivision". SIAM Journal on Computing 15 (2): 317-340.
784% -Kirkpatrick, same
785%    * David G. Kirkpatrick (1983). "Optimal Search in Planar Subdivisions". SIAM J. Comput. 12: 28-35.
786% Voronoi tesselation:
787% -B. Delaunay, Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934
788
789The design of spatial indexing data structures and search algorithms
790is itself a large research area.  Dozens of different tree structures
791have been proposed: in two surveys B\"ohm \cite{bohm2001} and
792Hjaltason \cite{hjaltason2003} identify B-, $\textrm{B}^{+}$-, ball-,
793bisector-, BSP-, DABS-, fq-, gh-, \mbox{GNA-,} hB-, $\textrm{hB}^{\pi}$-,
794hybrid-, IQ-, kd-, kd-B-, $\textrm{LSD}^h$-, M-, mb-,
795$\textrm{mb}^{\ast}$-, mvp-, oct-, \mbox{post-office-,} pyramid-, quad-, R-,
796$\textrm{R}^{\ast}$-, $\textrm{R}^{+}$-, sa-, slim-, \mbox{sphere-,}
797SR-, SS-, TV-, vp-, $\textrm{vp}^{\ast}$-, and X-trees.  There is an
798equally mind-boggling variety of exact and approximate search
799algorithms.  Each data structure and algorithm may work well in some
800region of parameter space, but there is no clear winner for
801general-purpose use.
802
803
804\section{Astrometric calibration as a pattern recognition task}
805
806
807\comment{
808wget "http://casjobs.sdss.org/ImgCutoutDR7/getjpeg.aspx?ra=166.45&dec=-0.03&scale=1&opt=&width=2000&height=2000" -O ngc3521-orig.jpg
809jpegtopnm ngc3521-orig.jpg | pnmrotate -45 | pnmcut 600 900 1400 1000 | pnmscale -reduce 2 | pnmtojpeg > ngc3521.jpg
810#---> http://live.astrometry.net/status.php?job=alpha-200906-68444159
811jpegtopnm ngc3521.jpg | ppmtopgm | pnminvert | pnmtojpeg > ngc3521-bw.jpg
812wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=field.xy.fits" -O ngc3521.xy
813wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=index.xy.fits" -O ngc3521-index.xy
814jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white > ngc3521-sources.png
815jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white -P | plotxy -I - -i ngc3521-index.xy -x 1 -y 1 -C black -b white -s crosshair -P | plot-constellations -f 18 -w ngc3521.wcs -i - -N -o ngc3521-index.png
816%%% wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=wcs.fits" -O ngc3521.wcs
817%%% scp gmaps:/data2/test-merc/tycho.mkdt.fits .
818wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-0.5&bb=0,-85,360,85&dashbox=0.1&w=500&h=500&lw=3" -O ngc3521-zoom0.png
819wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-1&bb=175.533,-17.7621663832,211.533,17.6598478619&dashbox=0.01&w=500&h=500&lw=3" -O ngc3521-zoom1.png
820wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=0.5&bb=191.733,-1.85338140354,195.333,1.74602498613&w=500&h=500&lw=3" -O ngc3521-zoom2.png
821for x in 0 1 2; do
822 pngtopnm ngc3521-zoom${x}.png | ppmtopgm | pnminvert | pnmtopng > ngc3521-zoom${x}-bw.png;
823done
824}
825
826
827% ~/an-2/usnob-map/execs/tilerender -x 0.000000 -y -85.000000 -X 360.000000 -Y 85.000000 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.1 -d > tile1.png
828% ~/an-2/usnob-map/execs/tilerender -x 151.724000 -y -4.784223 -X 187.724000 -Y 29.772221 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.25 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.01 -d > tile2.png
829% ~/an-2/usnob-map/execs/tilerender -x 167.924000 -y 11.335527 -X 171.524000 -Y 14.841399 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g 0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 > tile3.png
830%
831% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/index.xy.fits -S 1 -W 720 -H 503 -x 1 -y 1 -w 2 -r 6 -s s > ixy.pgm
832% /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/field.xy.fits -S 1 -W 720 -H 503 -r 5 -x 1 -y 1 -w 2 -s c > fxy.pgm
833% cp /home/gmaps/ontheweb-data/tor/200706/51145570/image.pnm undim.ppm
834% pgmtoppm green ixy.pgm > ixy.ppm
835% pnmcomp -alpha=ixy.pgm ixy.ppm undim.ppm > sum.ppm
836% pnmcomp -alpha=fxy.pgm fxy.ppm sum.ppm > sum2.ppm
837% pnmtopng sum2.ppm > sum.png
838% pnmcomp -alpha=fxy.pgm fxy.ppm undim.ppm | pnmtopng > sources.png
839% http://oven.cosmo.fas.nyu.edu/test/status.php?job=tor-200706-51145570
840
841
842\begin{figure}
843\begin{center}
844\framebox{\includegraphics[width=0.99\figunit]{ngc3521-bw}} \\
845\framebox{\includegraphics[width=0.99\figunit]{ngc3521-sources}} \\
846\framebox{\includegraphics[width=0.99\figunit]{ngc3521-index}}
847\end{center}
848\caption{\captionpart{Top:} Input image (credit: Sloan Digital Sky
849Survey).  \captionpart{Middle:} The brightest 100 sources extracted
850from the image.  \captionpart{Bottom:} Reference sources, transformed
851into the image coordinate system (crosshairs).  Many of the image and
852reference sources are aligned, but there are many image sources
853without reference sources.  Our system knows about the positions of
854many objects of interest on the sky, and has labelled the galaxy NGC
8553521.\label{fig:redgreen}}
856\end{figure}
857
858
859\begin{figure}
860\begin{center}
861\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
862\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom0-bw}} &
863\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom1-bw}} &
864\framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom2-bw}}
865\end{tabular}
866\end{center}
867\caption{The location of the input image on the sky.
868  \captionpart{Left:} The whole sky, in Mercator projection.  The
869  sinusoid-shaped feature is the Milky Way.  The dashed box shows the
870  zoomed-in region.  \captionpart{Middle:} Zoomed in by a factor of
871  $10$.  \captionpart{Right:} Zoomed in by a factor of $100$.  The box
872  shows the outline of the input image.\label{fig:onthesky}}
873\end{figure}
874
875
876For modern astronomers, astrometric calibration is often one of the
877first steps toward getting useful information out of an image of the
878sky.  Aligning a new image with an \emph{astrometric reference
879catalog} allows the astronomer to place the image within a standard
880coordinate frame.  This allows stars, galaxies, and other objects
881(\emph{sources}) in the new image to be identified with known sources,
882which in turn allows astronomers to calibrate other properties of the
883new image, and allows the positions of new sources to be described in
884a standard reference frame.
885
886
887The task of blind astrometric calibration---automatically finding the
888astrometric calibration of an image, using only the information in the
889image pixels---can be seen as a pattern recognition problem.  As
890Bertin \cite{bertin2005} notes, ``astrometric and photometric
891calibrations have remained the most tiresome step in the reduction of
892large imaging surveys,'' so this is not only an interesting problem to
893solve, but one with practical implications for astronomers.
894
895
896For the purposes of astrometric calibration, we can think of the sky
897as a large two-dimensional surface: the stars are very distant, so our
898viewpoint is effectively fixed.  We are moving, as are the stars, but
899these motions are small relative to the precision at which we
900typically work.  The sky contains many stars, galaxies, and other
901astronomical sources.  The stars and distant galaxies are effectively
902point sources, while closer galaxies can be resolved.  Astrometric
903reference catalogs list the positions, motions, and brightnesses of
904these sources and serve as the ``ground truth'' or database of known
905(reference) objects.  The USNO-B1 catalog \cite{usnob, nomad}, for
906example, lists over one billion objects.  As many as a few percent of
907these are false detections or other artifacts \cite{barroncleaning},
908and some objects that should be visible are missing.
909
910
911Extra sources can be due to planets, comets, satellites, or aircraft.
912Missing or poorly localized source can be due to imperfections in the
913imaging sensor, saturation, cosmic ray interference, or (rarely)
914occlusion.  Errors in the image processing that detects sources can
915lead to sources being gained or lost.  We call sources that appear in
916only the input image or reference catalog \emph{distractors} and
917\emph{dropouts}, respectively.  The existence of distractors and
918dropouts means that we can never assume that all the objects in the
919reference catalog will be contained in an image to be recognized, or
920vice versa.
921
922
923The images to be recognized are subregions of the sky.  Image sizes
924range from nearly half the celestial sphere down to $10^{-7}$ of the
925area and smaller.  The input images measure unknown bands of the
926electromagnetic spectrum, and various nonlinear functions may have
927been applied to the pixel values.  We cannot rely on absolute
928brightness or color to recognize individual stars or galaxies.  At
929best we can hope that there is some positive correlation in the
930relative brightness ordering of objects in the image and the
931corresponding objects in our catalog.
932
933
934Blind astrometric calibration is an ideal task for exploring geometric
935ideas in pattern recognition.  Most celestial objects are effectively
936point sources, and can be found and localized to sub-pixel accuracy
937using relatively simple image-processing procedures.  But since the
938individual features are characterized only by their positions and
939brightnesses, we must examine collections of features in order to
940build distinctive patterns.  In \chapref{chap:techreport} we present
941\an, which applies the geometric hashing framework to the task of
942blind astrometric calibration.  An example of our results in shown in
943\figs \ref{fig:redgreen} and \ref{fig:onthesky}.
944
945
946\comment{
947There are several useful applications for a blind astrometry solver.
948Most telescopes used by professional astronomers are
949computer-controlled and automatically record the telescope pointing in
950the image header, but these systems occasionally fail.  Feeding the
951images to a blind astrometry solver would allow an independent
952verification that the \metadata written by the telescope control
953software are correct.  Second, some observatories have large archives
954of photographic plates taken by astronomers over many decades, and a
955growing fraction of these have been digitized.  These images could be
956useful to researchers, since they provide a longer baseline for
957measuring changes over time.  However, information about where the
958telescope was pointing has often been lost, or is only available as a
959hand-written note on the edge of the photographic plate, or as an
960entry in a logbook.  A blind astrometry solver would be an invaluable
961tool for easily providing accurate \metadata about these images.
962Finally, these is a large and enthusiastic community of amateur
963astronomers who produce many images that would be useful to
964professional astronomers, but since amateurs are seldom interested in
965recording exact \metadata, these images are not readily available to
966professional astronomers.  A blind astrometry solver would unlock this
967trove of pixels.
968
969Other interesting applications include creating a customized
970star-finding chart overlayed on the input image, and controlling a
971telescope by using the output of a blind solver as feedback to a
972control system.
973
974\begin{figure}
975  \begin{center}
976        %\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c}
977        \includegraphics[width=\figunit]{moon} \\
978        \includegraphics[width=\figunit]{saturated}
979        %\end{tabular}
980  \end{center}
981  \caption{Astronomical images with occlusion.  \captionpart{Top:} The
982moon, buildings, and mountains occlude the stars behind them (image
983copyright Johannes Schedler, \texttt{http://panther-observatory.com}).
984\captionpart{Bottom:} Saturation and diffraction spikes due to bright
985objects can obscure nearby objects.}
986\label{moon}
987\end{figure}
988
989Since most celestial objects are effectively point sources, the
990obvious image feature to use is an individual object.  Feature
991detection is relatively straightforward, and the positions of features
992can be measured with subpixel accuracy, but the feature descriptor
993essentially contains only brightness and position; individual features
994are not distinctive.
995
996Although occlusion is typically not a problem in astronomical images,
997it does occasionally occur (see Figure \ref{moon}).  In addition,
998bright objects in the image can cause diffraction spikes or saturation
999over a large region of the image, and this can cause nearby objects to
1000be hidden.
1001}
1002
1003
1004
1005\section{Related work in astrometric calibration}
1006
1007There seem to be two distinct groups of researchers who have worked on
1008astrometric calibration.  The first are professional astronomers,
1009whose images are typically of small angular extent, long exposure
1010time, and high quality.  They typically assume that there is a good
1011initial guess of the astrometric calibration and the goal is to
1012produce a very accurate calibration, including image distortion.  The
1013second group of researchers are spacecraft engineers who want to use
1014the stars to estimate the attitude of a camera attached to a
1015spacecraft.  Here the images are of wide angular extent, have short
1016exposure time, and are very noisy.  Primary concerns include weight,
1017power consumption, and robustness (especially in avoiding false
1018positives), while a high degree of accuracy is neither required nor
1019possible given the hardware.
1020
1021\subsection{Non-blind astrometric calibration}
1022
1023Non-blind astrometric calibration requires that an initial estimate of
1024the calibration to be provided.  Groth \cite{groth1986} presents an
1025algorithm for matching two lists of coordinates (eg, image coordinates
1026in pixels and reference star coordinates on the celestial sphere),
1027assuming that the lists contain a significant proportion of objects in
1028common.  With the limited computing resources available at the time,
1029he suggests taking the brightest 20 or 30 objects in each list.
1030
1031
1032Once the two lists of objects have been compiled and the brightest
1033objects selected, all sets of three objects are enumerated and the
1034triangles they form are described by a scale-, rotation-, and
1035translation-invariant descriptor, composed of the length ratio of the
1036longest to shortest edges, plus the cosine between these edges and the
1037\emph{sense} or \emph{parity} of the triangle.  The tolerances
1038associated with these features are computed (by propagating their
1039positional errors through the feature descriptor process) and stored,
1040along with the logarithm of the triangle's perimeter.  Triangles with
1041large length ratio are rejected since they are relatively
1042indistinctive.
1043
1044
1045After triangle features are extracted from both point lists, feature
1046matching is performed by checking whether the distance between each
1047pair of features is less than their corresponding tolerances.  This
1048process is accelerated by sorting the features on one of the feature
1049dimensions.  If multiple matches are found for a particular feature,
1050only the closest is considered.  After all the features have been
1051compared, many correct matches should be found, along with some false
1052matches.  For each match, the difference of the log-perimeters of the
1053two triangles is computed.  This gives the relative scales of the two
1054triangles and hence their coordinate frames, assuming the match is
1055correct.  False matches are rejected by iterative outlier detection:
1056the mean difference of log-perimeters is computed and matches far from
1057the mean are rejected.
1058
1059
1060This approach is essentially an application of the geometric hashing
1061method, though instead of using hashing to accelerate feature
1062matching, the features are simply kept in lists that are sorted on one
1063dimension.  Much of the subsequent work in this area follows
1064essentially the same path (eg, \cite{valdes1995}, \cite{dong2006}).
1065
1066
1067P\'al and Bakos \cite{pal2006} adapt the triangle-matching approach to
1068images containing many more objects (of order $10^4$).  Since it would
1069become prohibitively expensive to enumerate all triangles in such an
1070image, they reduce the number of triangles created by using only the
1071triangles created by a Delauney triangulation.  This vastly reduces
1072the number of triangles created, but makes the algorithm more
1073sensitive to dropout and distractor stars: a single extra star causes
1074a completely disjoint set of triangles to be created in its
1075neighbourhood.  To compensate for this shortcoming, they define an
1076\emph{extended Delauney triangulation}: for each point, they select
1077all points at distance $\ell$ in the Delauney triangulation, and
1078triangulate this set.  This process is repeated for each point.  The
1079first extended triangulation ($\ell = 2$) skips over the nearest set
1080of stars and builds larger triangles by using the next-further set of
1081stars.  The intent is that if some of the nearby stars are distractors
1082that they will be ignored when the extended triangulations are used.
1083
1084\begin{figure}
1085\begin{center}
1086\includegraphics[width=\figunit]{pal-fig2}
1087\end{center}
1088\caption{The triangle parameterization used by P\'al and Bakos.  This
1089figure is a reproduction of \fig 2 in \cite{pal2006}.\label{pal}}
1090\end{figure}
1091
1092P\'al and Bakos introduce a new triangle parameterization which is
1093continuous and sensitive to chirality (parity); see \figref{pal}.
1094This two-dimensional parameter space is used as the geometric feature
1095space.  When matching triangles between the two images, they demand a
1096\emph{symmetric point match}: each triangle must be the other
1097triangle's nearest neighbour in feature space.  Matching two images is
1098done by creating the lists of triangles and attempting to find
1099symmetric point matches.  This process is accelerated by sorting each
1100list by one of the coordinates.  Each triangle match is considered to
1101be a vote for the correspondence of the three pairs of points
1102composing the triangles.  These votes are accumulated in a sparse
1103matrix where element $(i, j)$ contains the number of votes for a
1104correspondence between object $i$ in the first image and object $j$ in
1105the second image.  After all matches are considered, the top 40\% of
1106the nonzero matrix elements are assumed to contain the true
1107correspondences.  A transformation based on these correspondence is
1108computed and the unitarity of the transformation matrix is used to
1109judge whether the match is true or false.
1110
1111
1112A different approach is taken by Kaiser \etal \cite{kaiser1999}.  They
1113assume they are given two lists of source positions that differ by a
1114rigid transformation involving scaling, rotation, and translation.  In
1115each image, they iterate over each pair of points and compute the
1116vector difference of their positions.  For each list, the log-length
1117and angle of the pairwise difference vectors are accumulated in a
1118two-dimensional histogram.  Observe that if the whole list of points
1119is scaled up by a constant factor, then the log-distance between each
1120pair of points increases by a constant amount.  Similarly, if the
1121whole list is rotated then the angles shift by a constant amount.
1122Once the two histograms have been computed, their cross-correlation is
1123computed.  If the two lists contain a significant number of
1124corresponding points, the cross-correlation signal will be strongest
1125at a shift corresponding to the difference in log-scale and rotation
1126between the lists.  This process is similar to a Hough transform
1127\cite{duda1972,ballard1981}, except that instead of finding the peak
1128of a single parameter-space histogram, we are searching for a peak in
1129the similarity of two histograms as we shift them with respect to each
1130other in parameter space.
1131
1132
1133Once the scaling and rotation between the lists has been found, the
1134translation can be found by scaling and rotating one of the lists into
1135the frame of the other list, then histogramming the vector difference
1136between points across the two lists.  This is a standard (generalized)
1137Hough transform.
1138
1139
1140\subsection{Blind astrometric calibration}
1141
1142
1143The majority of previous work on blind astrometric calibration is
1144motivated by the problem of spacecraft attitude estimation.  Sometimes
1145called the ``lost in space'' or ``stellar gyroscope'' problem, the
1146task is to estimate the pose of a spacecraft by using an image of the
1147sky captured by an onboard camera.  Although similar in general
1148spirit, the requirements and limitations of this application are quite
1149different than astronomical applications.  Mass, power consumption,
1150and robustness of the system are primary concerns, and as a result the
1151optical designs are very different from science-grade astronomical
1152instruments, and the available processor and memory resources are very
1153restricted.  Typical fields of view are tens of degrees across, and
1154the exposure time is kept short to allow the system to function while
1155the spacecraft is rotating.  As a result, image quality is typically
1156quite poor: often only a handful of the brightest stars are visible.
1157Since the field of view is large, a reference catalog of a few
1158thousand stars is sufficient to ensure that any view of the sky
1159contains many reference stars.  Since the system will only process
1160images from a single camera, the whole system can be customized and
1161calibrated to that camera.  For example, the nonlinear distortions of
1162the optical system can be measured, and the scale and bandpass of the
1163imaging system are known, so the reference catalog can be tailored to
1164match.
1165
1166
1167For example, Liebe \etal \cite{liebe2004} describe a system design
1168with a $56~\deg$ field of view and exposure time of $50~\unit{ms}$.
1169The resulting images contain tens of stars if the spacecraft is not
1170rotating, but on average only three stars will be detectable when the
1171rotation rate is $50~\deg/\unit{sec}$.  The paper does not describe a
1172particular algorithm for star identification, with the implication
1173that it is not a particularly difficult problem since absolute
1174brightness information will be available, and the total number of
1175stars that are visible to the camera is only a few hundred.
1176
1177
1178In earlier work \cite{liebe1993}, Liebe describes a star
1179identification system.  The reference catalog is composed of the 1539
1180brightest stars, with brightness calibrated to the camera used in the
1181system.  The field of view is $30~\degrees$, and the system uses a
1182feature-matching approach, using the brightest star in the field and
1183its two nearest neighbours to define a geometric feature.  The feature
1184descriptor is composed of the distance from the brightest star to its
1185nearest and second-nearest neighbours, along with the angle between
1186these neighbours.  Note that this feature is not scale-invariant,
1187because the system is only meant to solve images taken by one camera,
1188so the scale is known.  An index is constructed by enumerating all
1189such features that could possibly be detected, given the detection
1190limits of the camera.  These features are coarsely quantized and
1191stored in a table.  To account for noise in the feature descriptors,
1192all neighbouring cells in the quantized feature space are also stored
1193in the table.  This generates $185,000$ features.  At test time, the
1194features in the image are enumerated and the table of features is
1195scanned; an exact feature match in the quantized space is assumed to
1196be correct.
1197
1198
1199This approach is a fairly straightforward geometric hashing technique,
1200except that it does not use hashing as such, and there is no voting or
1201verification scheme because feature aliasing is assumed not to happen.
1202
1203\begin{figure}
1204\begin{center}
1205\includegraphics[width=\figunit]{grid}
1206\end{center}
1207\caption{A grid-based feature: two sources are used to define a local
1208  coordinate system which is discretized into a grid of cells.  Each
1209  cell becomes a bit in the feature descriptor; if a cell is occupied
1210  its bit is set.  This figure is copied from MacKay
1211  \cite{mackay2005}.}
1212\label{gridbased}
1213\end{figure}
1214
1215In contrast to systems that build features from the precise locations
1216of a small number of stars, Padgett and Kreutz-Delgado
1217\cite{padgett1997} present an approach that incorporates information
1218from a large portion of the image.  Their grid-based feature is
1219defined by first choosing one star as the reference star to define the
1220center of the grid, then selecting its nearest neighbour (outside an
1221exclusion radius) to define the orientation of the grid.  Once the
1222center and orientation are determined, a grid is defined, and each
1223remaining stars in the image is assigned to a grid cell.  The feature
1224is a bit vector, one bit per grid cell, where the bit is set only if
1225the cell contains a star.  See \figref{gridbased}.
1226
1227
1228The index is constructed by scanning the sky and selecting, for each
1229field of view, the $n$ brightest stars.  For each of these stars a
1230feature is computed and stored.  At test time, they examine the
1231brightest $c n$ stars (for some safety factor $c$), computing the
1232feature for each one and searching the index for a match.  A pair of
1233features is defined to match if their dot product is above a
1234threshold.  This is equivalent to taking the bitwise \textsc{AND} of
1235the bit vectors and counting the number of bits that are set.  This
1236search can be implemented efficiently by using lookup tables: for each
1237bit, they maintain a list of the features for which that bit is set.
1238Note that this is equivalent to using a grid-based geometric hashing
1239approach: each grid cell is equivalent to a discretized relative
1240coordinate vector, which becomes a hash key, and the ``lookup table''
1241is a hash table.
1242
1243
1244After all the features in the test image have been extracted and
1245matched to the index, the algorithm proceeds to a voting (or perhaps
1246``consensus'') phase where it rejects matches that propose a field of
1247view that does not overlap that of the majority.
1248
1249
1250The system is designed to operate on images of diameter $8~\degrees$,
1251and performs well on simulated data.  They use a grid size of $40
1252\times 40$, and find that on average $25$ grid cells are filled.
1253Their index contains $13,000$ patterns, which means that false
1254positives are quite rare.  However, misidentification of the nearest
1255neighbour, or edge effects (assigning a star to the wrong grid cell
1256due to positional noise) mean that failures to find a match are not
1257uncommon, and are more likely in regions of the sky with high stellar
1258density.
1259
1260In later work, Clouse and Padgett \cite{clouse2000} extend this
1261approach by using Bayesian decision theory instead of a simple
1262threshold on the dot product to define how well features match.  This
1263extension, along with smaller noise levels in the (simulated) imaging
1264system, allows them to extend the approach down to fields of view
1265$2~\degrees$ in diameter.
1266
1267\comment{
1268  Given two features, they compute the dot product between the bit
1269  vectors, then proceed to estimate the probabilities that the match is
1270  a true positive and false positive.  These probability distributions
1271  are quite complex, so several simplifying approximations are made, and
1272  the remaining parameters are estimated by running simulations.
1273  }
1274
1275Harvey \cite{harvey2004} presents two different approaches, one
1276grid-based and the other shape-based.  The grid-based approach is
1277similar to the Padgett--Kreutz-Delgado and Clouse--Padgett approaches
1278\cite{padgett1997, clouse2000}.  A coarser grid is used, so more stars
1279are likely to appear in each bin.  To compensate, a grid cell is only
1280considered ``occupied'' if it contains more than some threshold number
1281of stars.  The other major change is that he expects a test image to
1282be an ``overexposure'' or ``underexposure'' relative to the reference
1283catalog.  This implies that the test image should contain either a
1284subset or a superset of the stars in the index, and therefore the
1285image feature vector must be either greater than or less than an index
1286feature at each bit.  This allows simple bit operations to be used to
1287find feature matches, and feature matching is performed by a linear
1288scan through the index.
1289
1290Harvey's shape-based approach uses constrained $n$-star constellations
1291in order to aggregate information from a large number of stars without
1292allowing the number of potential features to grow combinatorially.
1293Specifically, Harvey uses a pair of stars to define a narrow wedge in
1294the image, then describes the relative positions and angles of a fixed
1295number of nearby stars within that wedge.  Unfortunately, this makes
1296the feature highly sensitive to distractor and dropout stars, since
1297the feature depends on the stars in the feature being enumerated in a
1298particular order.  As a result, all potential features (allowing any
1299combination of stars to drop out) must be checked; the number of
1300features grows exponentially as the density of stars increases.
1301
1302
1303Harvey makes the useful observation that a cascade of indices can be
1304built, where each index is designed to solve images with a particular
1305range of scales.
1306
1307
1308MacKay and Roweis \cite{mackay2005} point out that a grid-based
1309feature such as that used by Harvey leads naturally to a hashing-based
1310strategy.  Each grid cell is associated with a bit that is turned on
1311if the cell is occupied.  This value is placed in a hash table with a
1312mapping back to its position on the sky.  Since false positives can
1313occur as a result of feature aliasing or hash collision, a voting
1314scheme is employed: several feature matches must accumulate before the
1315match is accepted.  Since false negatives can occur as a result of
1316dropouts and distractors (\emph{any} missing or extra star causes the
1317hash code to change completely), many features must be extracted for
1318each region of the sky.
1319
1320
1321\subsection{Fine-tuning astrometric calibrations}
1322
1323In order to ``co-add'' or ``stack'' astronomical images (combine
1324pixels from different images to produce a higher signal-to-noise
1325image), or to do proper-motion studies (measure the movement of stars
1326over time), it is necessary to fine-tune the astrometric calibration
1327of the images.  This is similar to the \emph{bundle adjustment}
1328problem in computer vision \cite{triggs2000}, in that it involves the
1329simultaneous optimization of the various camera and telescope
1330parameters and the estimated positions of objects in the world.
1331Fine-tuning astrometric calibrations is easier because it is
1332essentially two-dimensional, but more difficult because the camera
1333parameters include polynomial distortion terms to model the image
1334distortion introduced by telescope optics.
1335
1336
1337The software package \scamp by Bertin \cite{bertin2005} is a popular
1338tool used by astronomers to fine-tune simultaneously the astrometric
1339calibrations of a large collection of images.  For each image, it is
1340assumed that the center of the image is known to about 25\% of the
1341size of the image, and the scale is known to within a factor of two.
1342The histogram-alignment method of \cite{kaiser1999} is used to find
1343the translation, scaling, and rotation between the image and a
1344reference catalog, which allows the correspondences between image and
1345reference catalog stars to be determined.
1346
1347The core of the \scamp system is a chi-squared minimization of the
1348total weighted distance between the projected positions of all star
1349correspondences among the set of images and the reference catalog.
1350The parameters to be adjusted are the center, scale, rotation, and
1351polynomial distortion coefficients of each image.  A key feature is
1352the ability to share image distortion parameters among subsets of the
1353images, since images taken with the same telescope are expected to
1354share some distortion terms due to the telescope optics.  This reduces
1355the degrees of freedom of the fitting process, resulting in more
1356robust fits.  \scamp can fine-tune thousands of images to sub-pixel
1357accuracy.
1358
1359
1360\section{Summary}
1361
1362
1363The task of automatically finding the astrometric calibration of an
1364astronomical image---this is, recognizing the stars in the image, or
1365locating the image on the sky---is ideal for a geometric hashing-based
1366approach.  Geometric hashing (or more generally, geometric feature
1367matching) is a widely-applicable technique for problems in which it is
1368desirable to build distinctive geometric features out of simple
1369interest-point features.  The approach is ideally suited to the blind
1370astrometric calibration problem, since images of celestial objects can
1371be localized to sub-pixel accuracy but have few other distinguishing
1372features.  Indeed, much of the previous work in blind and non-blind
1373astrometric calibration can be placed within a geometric feature
1374matching framework.
1375
1376
1377\comment{
1378  Although one might suppose that hashing is an intrinsic part of
1379  geometric hashing, we have found that a tree structure is more
1380  effective for the large-scale feature matching we want to perform; we
1381  feel that deciding to use approximate feature matching should be a
1382  design decision, not an intrinsic part of the method: in cases where
1383  each image contains only a few features that could potentially be
1384  found in the index, we do not want to risk missing them due to ``bad
1385  luck'' in a hash function.
1386}
1387
1388
Note: See TracBrowser for help on using the repository browser.