| 1 | |
|---|
| 2 | \section{Preface} |
|---|
| 3 | |
|---|
| 4 | This thesis describes some of the research carried out by me and my |
|---|
| 5 | colleagues in the \an group. \an started as a collaboration between |
|---|
| 6 | Sam Roweis, a Toronto computer scientist, and David W.~Hogg, a New |
|---|
| 7 | York University astronomer. An idea that began as a crazy scrawl on |
|---|
| 8 | the back of a napkin---probably in a bar---eventually grew into a real |
|---|
| 9 | project and began attracting collaborators and students. |
|---|
| 10 | Fast-forwarding a few years, we have achieved a rare feat: a computer |
|---|
| 11 | vision software system that \emph{works}, with little human |
|---|
| 12 | intervention, and is being used by hundreds of real astronomers doing |
|---|
| 13 | real research. We have also branched out and explored a number of |
|---|
| 14 | other promising areas where ideas in computer vision and machine |
|---|
| 15 | learning can be applied to astronomical data to great benefit. |
|---|
| 16 | |
|---|
| 17 | |
|---|
| 18 | This thesis is quite blatantly a collection of manuscripts. I have |
|---|
| 19 | done a bit more than staple them together to produce the chapters of |
|---|
| 20 | this thesis, but the original tone and style of each manuscript still |
|---|
| 21 | shines through. I hope my readers will find the chapter-to-chapter |
|---|
| 22 | variations a refreshing, rather than jarring, change of pace. One |
|---|
| 23 | advantage of collecting manuscripts into a thesis is that each chapter |
|---|
| 24 | should largely stand alone. Another advantage is that each manuscript |
|---|
| 25 | was prepared with a list of authors, so it is simpler to identify the |
|---|
| 26 | collaborators who have contributed to each chapter. Each chapter has |
|---|
| 27 | a note detailing the authors of the original manuscript and explaining |
|---|
| 28 | my personal contribution to the work described. |
|---|
| 29 | |
|---|
| 30 | |
|---|
| 31 | \section{Introduction} |
|---|
| 32 | |
|---|
| 33 | |
|---|
| 34 | The general idea of the \an collaboration is to apply ideas in |
|---|
| 35 | computer vision and machine learning to problems and data in |
|---|
| 36 | astronomy. Astronomical images are a good domain in which to explore |
|---|
| 37 | ideas in computer vision, because the problems to be solved are often |
|---|
| 38 | much more constrained than in general imaging, the telescopes and |
|---|
| 39 | cameras are often well-understood and calibrated, large volumes of |
|---|
| 40 | data and ``ground truth'' information exist, and with new telescopes |
|---|
| 41 | constantly coming online and producing greater and greater volumes of |
|---|
| 42 | imaging, there is a need for sophisticated computation. I briefly |
|---|
| 43 | outline astronomical imaging for computer vision researchers in |
|---|
| 44 | \secref{sec:astrointro}. |
|---|
| 45 | |
|---|
| 46 | |
|---|
| 47 | The specific idea addressed in this thesis is that pattern recognition |
|---|
| 48 | approaches can be applied to astronomical images. The goal is to |
|---|
| 49 | produce a system that is able to recognize automatically the stars in |
|---|
| 50 | any image of the night sky, using the information in the image pixels |
|---|
| 51 | alone. Recognizing the stars in an image is equivalent to finding the |
|---|
| 52 | pointing, scale, and rotation of the image on the sky in a standard |
|---|
| 53 | coordinate system. The branch of astronomy concerned with measuring |
|---|
| 54 | the positions and motions of celestial bodies is called |
|---|
| 55 | \emph{astrometry}, and the task of placing a new image in a standard |
|---|
| 56 | reference frame is known as ``solving the astrometry'' or |
|---|
| 57 | ``calibrating the astrometry'' of the image. The \emph{blind |
|---|
| 58 | astrometric calibration} task is to calibrate an image using only the |
|---|
| 59 | image itself. The broad goal of the \an project was to build a system |
|---|
| 60 | that would allow us to create correct, standards-compliant astrometric |
|---|
| 61 | \metadata for every useful astronomical image ever taken, past and |
|---|
| 62 | future, in any state of archival disarray. This is part of a larger |
|---|
| 63 | effort to organize, annotate and make searchable all the world's |
|---|
| 64 | astronomical information. |
|---|
| 65 | |
|---|
| 66 | |
|---|
| 67 | After a brief introduction to astronomical imaging for computer |
|---|
| 68 | scientists, the remainder of this chapter reviews the area of pattern |
|---|
| 69 | recognition, and in particular the framework of \emph{geometric |
|---|
| 70 | hashing} for object recognition in images. One of the contributions |
|---|
| 71 | of this thesis is to replace the simple hash table used in traditional |
|---|
| 72 | geometric hashing with a \kdtree, so I review related work in hashing |
|---|
| 73 | and other approaches for fast feature matching. Finally, I present |
|---|
| 74 | the blind astrometric calibration task as an instance of object |
|---|
| 75 | recognition, and review some previous approaches to the problem. |
|---|
| 76 | |
|---|
| 77 | %% Our approach? |
|---|
| 78 | |
|---|
| 79 | |
|---|
| 80 | The remaining chapters present our approach to the blind astrometric |
|---|
| 81 | calibration. \Chapref{chap:techreport} explains our approach and |
|---|
| 82 | presents the results of large-scale tests of the system on real-world |
|---|
| 83 | data. Chapters \ref{chap:verify} and \ref{chap:kdtree} delve into |
|---|
| 84 | details of the approach: \Chapref{chap:verify} presents the Bayesian |
|---|
| 85 | decision theory problem that lies at the heart of our approach, while |
|---|
| 86 | \chapref{chap:kdtree} explains the technical details of the \kdtree |
|---|
| 87 | data structure implementation that is key to making our system |
|---|
| 88 | performant enough to be a practical tool. \Chapref{chap:conclusion} |
|---|
| 89 | summarizes our results. |
|---|
| 90 | |
|---|
| 91 | |
|---|
| 92 | \section{Astronomical imaging for computer scientists} |
|---|
| 93 | \label{sec:astrointro} |
|---|
| 94 | |
|---|
| 95 | Astronomical images are very different from images produced by typical |
|---|
| 96 | consumer-grade cameras. This section is intended to give |
|---|
| 97 | non-astronomers a sense of the typical parameters of contemporary |
|---|
| 98 | astronomical imaging setups, and introduce some of the terminology and |
|---|
| 99 | key ideas. |
|---|
| 100 | |
|---|
| 101 | |
|---|
| 102 | The Sloan Digital Sky Survey (SDSS) \cite{sdsstechnical, sdsscamera} |
|---|
| 103 | is one of the most important projects in contemporary astronomy. The |
|---|
| 104 | primary imaging survey was carried out from 2000 through 2008, so the |
|---|
| 105 | hardware is no longer cutting-edge, but it is still very impressive. |
|---|
| 106 | The telescope's primary mirror is $2.5$ meters in diameter. Incoming |
|---|
| 107 | light first strikes the primary mirror, then a one-meter secondary |
|---|
| 108 | mirror, then passes through two corrective lenses in front of the |
|---|
| 109 | camera. The nearly distortion-free field of view is about $3$ degrees |
|---|
| 110 | wide. The camera is composed of $30$ charge-coupled devices (CCDs), |
|---|
| 111 | each $2048\times2048$ pixels, arranged in six columns. The five CCDs |
|---|
| 112 | in each column have different bandpass filters, ranging from the |
|---|
| 113 | near-infrared ($913~\nanometers$) through the optical to the |
|---|
| 114 | near-ultraviolet ($354~\nanometers$). The CCDs and associated |
|---|
| 115 | electronics are cooled with liquid nitrogen to $-80^{\circ} \unit{C}$ |
|---|
| 116 | and held under vacuum. Each CCD is about $5~\unit{cm}$ square, and |
|---|
| 117 | the whole focal plane of the camera is about $65~\unit{cm}$ in |
|---|
| 118 | diameter. The camera assembly has a mass over $300~\unit{kg}$. |
|---|
| 119 | |
|---|
| 120 | |
|---|
| 121 | The survey operates in ``drift-scanning'' mode: the telescope is |
|---|
| 122 | pointed in a nearly fixed direction, and as the Earth rotates the sky |
|---|
| 123 | drifts past the camera. The camera electronics shift the electrons |
|---|
| 124 | along the CCD columns at the same rate. As an object drifts through |
|---|
| 125 | the field of view, the electrons it produces in the CCD pixels march |
|---|
| 126 | along with it, and are read out as it disappears from view. In the |
|---|
| 127 | SDSS camera, the object drifts past each of the five bandpass-filtered |
|---|
| 128 | CCDs of the camera in turn, producing a five-color image. Over the |
|---|
| 129 | course of a night of observing, the camera sweeps out a great circle, |
|---|
| 130 | creating six columns of five-band images that are $2048$ pixels wide |
|---|
| 131 | and over a million pixels long: over $120$ gigabytes of pixels per |
|---|
| 132 | night. This clever drift-scanning scheme has many advantages, one |
|---|
| 133 | being that the readout rate of the CCDs can be relatively slow, |
|---|
| 134 | resulting in low readout noise. Calibration of the camera is simpler |
|---|
| 135 | because charge is integrated over columns of the CCD: it is only |
|---|
| 136 | necessary to calibrate whole columns rather than individual pixels. |
|---|
| 137 | Note that for this scheme to work, the CCDs must be very carefully |
|---|
| 138 | aligned in the focal plane, and the telescope control system must be |
|---|
| 139 | capable of keeping the system pointed stably at the level of a pixel |
|---|
| 140 | or better. The moving mass of the telescope is over |
|---|
| 141 | $15,000~\unit{kg}$, so this is no mean feat! |
|---|
| 142 | |
|---|
| 143 | % 24 um pixels |
|---|
| 144 | |
|---|
| 145 | Astronomical imaging systems are designed so that the point-spread |
|---|
| 146 | function (PSF)---the image on the CCD of a distant point source of |
|---|
| 147 | light---is spread over several pixels. This allows the position of a |
|---|
| 148 | point source to be measured to sub-pixel accuracy by fitting a model |
|---|
| 149 | of the PSF to the observed pixel values. If the system focused a |
|---|
| 150 | point source onto a single pixel, this would be impossible. The goal |
|---|
| 151 | is typically for the full-width at half-maximum (FWHM) of the PSF to |
|---|
| 152 | be a few pixels, so that the PSF is well-sampled but the signal is not |
|---|
| 153 | spread out over too many pixels. |
|---|
| 154 | |
|---|
| 155 | For ground-based telescopes, the moving atmosphere causes the image of |
|---|
| 156 | a distant point source of light to dance around on the CCD; in typical |
|---|
| 157 | exposures of many seconds this ``speckling'' is integrated into a |
|---|
| 158 | point-spread function. The size of this PSF is called the ``seeing'' |
|---|
| 159 | and a value of one $\unit{arcsecond}$ FWHM is considered quite good. |
|---|
| 160 | The SDSS camera has pixels of size $0.396~\unit{arcseconds}$, and the |
|---|
| 161 | point-spread function is dominated by the seeing: the optics of the |
|---|
| 162 | system are good enough that the images are as clear as they can be |
|---|
| 163 | given the atmosphere. |
|---|
| 164 | |
|---|
| 165 | |
|---|
| 166 | The positions of astronomical objects are usually measured in an |
|---|
| 167 | \emph{equatorial coordinate system}: essentially the projection of the |
|---|
| 168 | Earth's latitudes and longitudes onto the celestial sphere at a |
|---|
| 169 | specific instant. The latitudinal direction is called ``declination'' |
|---|
| 170 | and abbreviated ``Dec'', and the longitudinal direction is called |
|---|
| 171 | ``right ascension'' and abbreviated ``RA''. Declination is typically |
|---|
| 172 | measured in degrees from $-90$ to $+90$, and right ascension is |
|---|
| 173 | typically measured in degrees from $0$ to $360$, or hours from $0$ to |
|---|
| 174 | $24$. There are $60~\unit{arcminutes}$ per degree and |
|---|
| 175 | $60~\unit{arcseconds}$ per $\unit{arcminute}$. For reference, the |
|---|
| 176 | Moon as seen from Earth is about half a degree in diameter. The |
|---|
| 177 | $0.396~\unit{arcsecond}$ pixels of the SDSS camera view an area the |
|---|
| 178 | size of a penny at a distance of ten kilometers; each $2048\times2048$ |
|---|
| 179 | pixel CCD views an area of about one-millionth of the sky. |
|---|
| 180 | |
|---|
| 181 | |
|---|
| 182 | The SDSS is very impressive, but the next generation of telescopes |
|---|
| 183 | currently in development dwarf its specifications. The Large Synoptic |
|---|
| 184 | Survey Telescope (LSST) \cite{lsst}, for example, has an |
|---|
| 185 | $8.4~\unit{meter}$ primary mirror, with a $3.5~\unit{degree}$ field of |
|---|
| 186 | view almost completely covered by an array of nearly $200$ CCDs, each |
|---|
| 187 | with $4096\times4096$ pixels: a total of about $3.2$ gigapixels. The |
|---|
| 188 | camera is the size of a small car. It will take exposures of about |
|---|
| 189 | $15$ seconds, yielding over $10$ terabytes of data per night. |
|---|
| 190 | |
|---|
| 191 | % -- basics of data reduction; catalogs |
|---|
| 192 | % -- stars, galaxies, and other stuff |
|---|
| 193 | |
|---|
| 194 | |
|---|
| 195 | |
|---|
| 196 | |
|---|
| 197 | \section{Pattern recognition} |
|---|
| 198 | |
|---|
| 199 | \emph{Pattern recognition} encompasses a huge class of problems and |
|---|
| 200 | techniques in computer vision and machine learning. Many pattern |
|---|
| 201 | recognition problems are of the type ``identify this object'' or |
|---|
| 202 | ``find things like this'': the system is first told about a large set |
|---|
| 203 | of objects, then is presented with a novel object and is asked to |
|---|
| 204 | describe it or find similar objects from the set of known objects. |
|---|
| 205 | |
|---|
| 206 | Pattern recognition systems often take a \emph{feature matching} |
|---|
| 207 | approach. A \emph{feature} is a compact description of |
|---|
| 208 | (problem-specific) relevant, invariant properties of an object, such |
|---|
| 209 | that objects that are similar in the problem domain have similar |
|---|
| 210 | features. Defining how features are extracted from objects and |
|---|
| 211 | defining the similarity between features are key factors in the |
|---|
| 212 | success of such systems. Often the representation of a feature (the |
|---|
| 213 | \emph{feature descriptor}) is a point in a real coordinate space and a |
|---|
| 214 | distance metric defines the similarity between features. |
|---|
| 215 | |
|---|
| 216 | Feature-based approaches typically involve two phases: an |
|---|
| 217 | \emph{indexing} phase, in which the set of known objects is shown to |
|---|
| 218 | the system, and a \emph{test} phase, in which novel objects are |
|---|
| 219 | presented. During indexing, features are extracted from the set of |
|---|
| 220 | known objects. At test time, features are extracted from the novel |
|---|
| 221 | object and compared with the set of features from the known objects. |
|---|
| 222 | In many domains, the set of known objects is very large, so it is |
|---|
| 223 | critical that features be stored in a manner that allows the system to |
|---|
| 224 | rapidly find known features that match a given feature. |
|---|
| 225 | |
|---|
| 226 | Often feature-based approaches suffer \emph{feature aliasing} |
|---|
| 227 | problems: two objects that are dissimilar in the problem domain may |
|---|
| 228 | yield similar features. In this case, systems may use \emph{voting} |
|---|
| 229 | or \emph{verification} schemes to ensure that aliased (false) feature |
|---|
| 230 | matches are eliminated. Voting schemes typically assume that false |
|---|
| 231 | feature matches are independently randomly distributed, so the |
|---|
| 232 | probability of matching several features to the same incorrect object |
|---|
| 233 | is the product of the individual probabilities; if we demand a |
|---|
| 234 | sufficient number of matches, the probability of a false match becomes |
|---|
| 235 | small. Verification or \emph{hypothesize-test} schemes treat a |
|---|
| 236 | feature match as a hypothesis that the novel object is similar to a |
|---|
| 237 | known object, and then some test to decide if the hypothesis is |
|---|
| 238 | correct. Such schemes can be seen as using feature matching as a |
|---|
| 239 | search heuristic. |
|---|
| 240 | |
|---|
| 241 | \section{Visual pattern recognition} |
|---|
| 242 | |
|---|
| 243 | Pattern recognition in images is a particularly challenging and |
|---|
| 244 | complex problem domain which has been studied extensively. Objects to |
|---|
| 245 | recognize include faces, fingerprints, specific physical objects, or |
|---|
| 246 | even general classes of physical objects such as mugs, chairs, or |
|---|
| 247 | elephants. |
|---|
| 248 | |
|---|
| 249 | |
|---|
| 250 | Visual pattern recognition of general physical (3-dimensional) objects |
|---|
| 251 | is difficult for many reasons: objects may be deformable, can appear |
|---|
| 252 | different when viewed from different angles or under different |
|---|
| 253 | lighting conditions, and can be occluded by other objects. Features |
|---|
| 254 | used in visual pattern recognition include edges (lines and curves), |
|---|
| 255 | corners, or the color or texture of image patches. |
|---|
| 256 | |
|---|
| 257 | |
|---|
| 258 | An important consideration when designing features for visual pattern |
|---|
| 259 | recognition is the size (\ie, spatial extent in the image) of the |
|---|
| 260 | feature. \emph{Local features} gather information from a small region |
|---|
| 261 | of the image, while \emph{global features} incorporate information |
|---|
| 262 | from the whole image. Local features can provide robustness to |
|---|
| 263 | occlusion and deformation: if part of the object is hidden, features |
|---|
| 264 | belonging to that part will be lost, but the features belonging to the |
|---|
| 265 | remaining parts will still be found. Since local features aggregate |
|---|
| 266 | information from only a small portion of the image, they may be more |
|---|
| 267 | noisy or less distinctive than global features. |
|---|
| 268 | |
|---|
| 269 | |
|---|
| 270 | Since a single local feature may be insufficient to guarantee that an |
|---|
| 271 | object has been correctly recognized, some systems build high-level |
|---|
| 272 | features by combining multiple low-level features. When the objects |
|---|
| 273 | to be recognized are rigid or articulated (composed of multiple rigid |
|---|
| 274 | parts connected by simple joints), a common approach is to build |
|---|
| 275 | features that describe the relative geometry of low-level features. |
|---|
| 276 | This use of \emph{geometric features} is often called \emph{geometric |
|---|
| 277 | hashing}, since the result is a feature descriptor or \emph{hash code} |
|---|
| 278 | which describes the geometric arrangement of the low-level features. |
|---|
| 279 | |
|---|
| 280 | |
|---|
| 281 | |
|---|
| 282 | \section{The geometric hashing framework} |
|---|
| 283 | |
|---|
| 284 | \begin{figure} |
|---|
| 285 | \begin{center} |
|---|
| 286 | % dot lamdan.dot -Tps2 -o lamdan.ps |
|---|
| 287 | % ps2pdf -sPAPERSIZE=a4 lamdan.ps lamdan.pdf |
|---|
| 288 | \includegraphics[height=0.8\textheight]{lamdan} |
|---|
| 289 | \end{center} |
|---|
| 290 | \caption{Outline of the Geometric Hashing scheme. This diagram is a |
|---|
| 291 | reproduction of \fig 2 in Lamdan \etal \cite{lamdan1990}. |
|---|
| 292 | \label{lamdan}} |
|---|
| 293 | \end{figure} |
|---|
| 294 | |
|---|
| 295 | |
|---|
| 296 | Lamdan \etal \cite{lamdan1990} and Wolfson and Rigoutsos |
|---|
| 297 | \cite{wolfson1997} give an excellent overview of the geometric hashing |
|---|
| 298 | approach. See Figure \ref{lamdan} for a block-diagram overview. |
|---|
| 299 | Geometric hashing is based on the fact that when rigid objects undergo |
|---|
| 300 | similarity transformations (translation, rotation, and scaling), |
|---|
| 301 | angles between points on the object and relative distances between |
|---|
| 302 | points do not change. Therefore, if we describe the relative |
|---|
| 303 | arrangement of points on an object in terms of these invariants, our |
|---|
| 304 | description will be invariant under these transformations of the |
|---|
| 305 | object. Invariant geometric properties can be found for other types |
|---|
| 306 | of transformations such as affine or perspective projection. |
|---|
| 307 | |
|---|
| 308 | % \cite{affineinvariant} |
|---|
| 309 | |
|---|
| 310 | |
|---|
| 311 | Geometric hashing builds \emph{geometric features} out of lower-level |
|---|
| 312 | features. We will call the low-level features \emph{interest points} |
|---|
| 313 | for clarity. |
|---|
| 314 | |
|---|
| 315 | Given the types of transformations the system must handle, there is |
|---|
| 316 | some sufficient number of interest points required to define an |
|---|
| 317 | invariant local coordinate system; these are called the \emph{basis |
|---|
| 318 | set}. For example, for 2-D objects that can undergo similarity |
|---|
| 319 | transformations, two interest points can be used to define a |
|---|
| 320 | coordinate system. For 3-D objects with similarity transformations, |
|---|
| 321 | three (non-collinear) interest points are required. |
|---|
| 322 | |
|---|
| 323 | Geometric hashing follows an indexing approach. During the indexing |
|---|
| 324 | phase, the system preprocesses a database of objects which are to be |
|---|
| 325 | recognized. For each object, many interest points are extracted which |
|---|
| 326 | are invariant under the transformations that are expected, and can be |
|---|
| 327 | well localized. The system then enumerates all basis sets by looking |
|---|
| 328 | at all combinations of interest points; each set of basis features |
|---|
| 329 | defines a relative coordinate system. For each basis set, all |
|---|
| 330 | remaining interest points and enumerated and their positions described |
|---|
| 331 | in this coordinate system. Each relative coordinate vector forms a |
|---|
| 332 | \emph{geometric feature descriptor}. In this way, the relative |
|---|
| 333 | geometric arrangement of a set of interest points is converted into a |
|---|
| 334 | point in a vector space. |
|---|
| 335 | |
|---|
| 336 | % --- consider adding a variant of the quad fig here. |
|---|
| 337 | |
|---|
| 338 | In the standard geometric hashing approach, the geometric feature |
|---|
| 339 | descriptor is discretized and encoded as a hash table key. A grid is |
|---|
| 340 | place over the feature descriptor space, and the hash table key of a |
|---|
| 341 | geometric feature is the identifier of the grid cell containing it. |
|---|
| 342 | Each entry in the hash table maps back to the basis set plus the |
|---|
| 343 | feature whose position is encoded. |
|---|
| 344 | |
|---|
| 345 | % --- picture? |
|---|
| 346 | |
|---|
| 347 | Given a new image, a geometric hashing system extracts all interest |
|---|
| 348 | points from the image and, as at indexing time, proceeds to enumerate |
|---|
| 349 | all basis sets and computes the positions of the remaining interest |
|---|
| 350 | points in each basis set. This generates many geometric features |
|---|
| 351 | whose descriptors are discretized and converted to hash table keys, as |
|---|
| 352 | before. Similar arrangements of interest points will lead to similar |
|---|
| 353 | geometric feature descriptors, which--with luck---will be mapped to |
|---|
| 354 | the same hash table key. The hash table values associated with each |
|---|
| 355 | key are retrieved, and each value that is found is considered a match. |
|---|
| 356 | Each match votes for the correspondence between the interest points |
|---|
| 357 | from the database and the interest points from the image that were |
|---|
| 358 | used to define the geometric feature. |
|---|
| 359 | |
|---|
| 360 | |
|---|
| 361 | After all features from the image have been processed, many votes will |
|---|
| 362 | have accumulated. False matches (due to aliasing or hash table |
|---|
| 363 | collisions) should be uniformly distributed over the set of known |
|---|
| 364 | objects, so should not generate very many votes for any particular |
|---|
| 365 | object. Objects that are actually present in the image should |
|---|
| 366 | generate many votes. Objects that receive an insufficient number of |
|---|
| 367 | votes are rejected, and the remainder are subjected to a verification |
|---|
| 368 | process. |
|---|
| 369 | |
|---|
| 370 | |
|---|
| 371 | The verification process requires computing a transformation from the |
|---|
| 372 | object to image coordinates, using the corresponding interest points |
|---|
| 373 | from the known object and the image. Using this transformation, all |
|---|
| 374 | the interest points belonging to the known object are projected into |
|---|
| 375 | image coordinates. If the match is correct, we expect the image to |
|---|
| 376 | contain features at these locations. Some reasearchers add an extra |
|---|
| 377 | verification stage by projecting the known object into the image and |
|---|
| 378 | checking that the predicted edges of the object are found in the image |
|---|
| 379 | (eg, \cite{lamdan1990, huttenlocher1990}). |
|---|
| 380 | |
|---|
| 381 | |
|---|
| 382 | The geometric hashing approach is flexible and applicable to many |
|---|
| 383 | problems, but there are some issues with the basic version described |
|---|
| 384 | here. The hash function described above simply places a grid over the |
|---|
| 385 | relative coordinate system and returns the identifier of the grid cell |
|---|
| 386 | in which the interest point lands. This is prone to edge effects: if |
|---|
| 387 | an interest point is near the edge of a grid cell, a small change in |
|---|
| 388 | its position due to noise can move it across the edge into a different |
|---|
| 389 | grid cell, which causes the hash code to change and a different set of |
|---|
| 390 | feature matches to be found. This problem can be overcome by |
|---|
| 391 | detecting interest points that are near the edges of their cells and |
|---|
| 392 | also searching the hash table using the hash codes of neighbouring |
|---|
| 393 | cells. This results in more false feature matches, and can become |
|---|
| 394 | expensive as the dimensionality of the geometric feature space |
|---|
| 395 | increases (there are more edges, and the proportion of the hypervolume |
|---|
| 396 | of feature space that is near an edge increases). Using a data |
|---|
| 397 | structure other than the hash table may be beneficial. |
|---|
| 398 | |
|---|
| 399 | |
|---|
| 400 | Another issue with uniform grid-based hashing is that it assigns equal |
|---|
| 401 | importance (and storage space) to each grid cell in the feature space. |
|---|
| 402 | Consider an interest point that is far from the basis set that defines |
|---|
| 403 | its local coordinate system. Small positional errors in the basis set |
|---|
| 404 | can cause the coordinate system to be rotated or scaled, which results |
|---|
| 405 | in large changes in the interest point's relative coordinates. This |
|---|
| 406 | means that edge effects become more pronounced in such regions of the |
|---|
| 407 | feature space. One response to this concern is to increases the size |
|---|
| 408 | of the grid cells far from the basis set, but then if the interest |
|---|
| 409 | points are uniformly distributed, these larger hash table bins will |
|---|
| 410 | contain more features, so every match to these bins will be |
|---|
| 411 | accompanied by more false positives. |
|---|
| 412 | |
|---|
| 413 | |
|---|
| 414 | Finally, a hash table with equal-sized cells will only be uniformly |
|---|
| 415 | utilized if interest points are uniformly distributed in feature |
|---|
| 416 | space, but some domains may yield interest points that are far from |
|---|
| 417 | uniformly distributed. In other domains, it may be beneficial to |
|---|
| 418 | place constraints on the geometric features that are used (for |
|---|
| 419 | example, to avoid regions that are prone to large errors, or that are |
|---|
| 420 | relatively undistinctive). |
|---|
| 421 | |
|---|
| 422 | |
|---|
| 423 | There is nothing in the geometric hashing framework that requires a |
|---|
| 424 | standard hash table to be used for feature matching. Other methods of |
|---|
| 425 | feature matching are described in the next section. |
|---|
| 426 | |
|---|
| 427 | |
|---|
| 428 | \section{Related work in fast feature matching} |
|---|
| 429 | |
|---|
| 430 | |
|---|
| 431 | For feature matching in a geometric hashing framework, the database of |
|---|
| 432 | known objects is converted into a set of points in a feature |
|---|
| 433 | descriptor space, which is a vector space. To find matches to a new |
|---|
| 434 | feature (extracted from an image in which we want to recognize |
|---|
| 435 | objects), we need to find all points in the vector space that are |
|---|
| 436 | within some tolerance. Usually the Euclidean distance metric is |
|---|
| 437 | applied, so the tolerance is a radius in the vector space. |
|---|
| 438 | |
|---|
| 439 | |
|---|
| 440 | \subsection{Bloom filters} |
|---|
| 441 | |
|---|
| 442 | A Bloom filter \cite{bloom1970} is a data structure that can answer |
|---|
| 443 | queries of the form ``is feature $q$ a member of the set of known |
|---|
| 444 | features?'' A Bloom filter is a set of $n$ bits, initially zero. A |
|---|
| 445 | set of $k$ hash functions must be defined which take a feature as |
|---|
| 446 | input and produce an integer in $[0, n)$ as output. In the indexing |
|---|
| 447 | phase, we examine each of the known features, and apply each of the |
|---|
| 448 | hash functions, producing $k$ hash codes. Each code is used to |
|---|
| 449 | reference a bit and \emph{set} it (\ie, turn it on). Each known |
|---|
| 450 | feature results in $k$ bits being set; several features may request |
|---|
| 451 | that a particular bit be set. |
|---|
| 452 | |
|---|
| 453 | |
|---|
| 454 | Given a new feature to match, we run the $k$ hash functions on it and |
|---|
| 455 | test whether each of the resulting bits are set. If the feature is |
|---|
| 456 | equal to a feature seen at indexing time (according to the hash |
|---|
| 457 | functions), then all $k$ of the bits will have been set; thus the |
|---|
| 458 | Bloom filter does not produce false negatives. If the query feature |
|---|
| 459 | is not equal to a known feature, then some of the bits may have been |
|---|
| 460 | set due to \emph{collision} of the hash functions, but the feature |
|---|
| 461 | will only be accepted if all $k$ hash functions result in collision; |
|---|
| 462 | this results in a false positive. The rate of false positives |
|---|
| 463 | increases as the number of known features increases relative to the |
|---|
| 464 | number of bits in the filter. The Bloom filter is ideally suited to |
|---|
| 465 | cases where the number of known features is small, and a verification |
|---|
| 466 | step can be applied to eliminate any false positives that are |
|---|
| 467 | produced. |
|---|
| 468 | |
|---|
| 469 | |
|---|
| 470 | For feature matching, we want to know not only whether the query |
|---|
| 471 | feature $q$ matches a known feature, we want to know \emph{which} |
|---|
| 472 | known feature matches. One way to answer such queries is to use a |
|---|
| 473 | \emph{Bloomier filter} \cite{chazelle2004}. The simplest Bloomier |
|---|
| 474 | filter uses multiple Bloom filters to categorize a given feature into |
|---|
| 475 | two different categories, and uses slightly more space than two Bloom |
|---|
| 476 | filters. A set of $n$ such Bloomier filters can be used to categorize |
|---|
| 477 | a feature into $2^n$ different categories by assigning one Bloomier |
|---|
| 478 | filter to each bit of the result. |
|---|
| 479 | |
|---|
| 480 | |
|---|
| 481 | The standard Bloom filter only finds exact matches, but we want to |
|---|
| 482 | find \emph{nearby} matches. An extension of Bloom filters to allow |
|---|
| 483 | proximity searches is given by Kirsch and Mitzenmacher |
|---|
| 484 | \cite{kirsch2006b}. This variant uses \emph{locality sensitive |
|---|
| 485 | hashing} (LSH, discussed below) as its hashing function, so it |
|---|
| 486 | inherits some limitations from LSH. Most importantly, LSH only |
|---|
| 487 | guarantees with some probability that nearby features will hash to the |
|---|
| 488 | same bin. As a result, the locality-sensitive Bloom filter can |
|---|
| 489 | produce false negatives as well as false positives. |
|---|
| 490 | |
|---|
| 491 | |
|---|
| 492 | In order to answer feature matching queries (find all known features |
|---|
| 493 | near a given query feature), one could combine these Bloom filter |
|---|
| 494 | extensions to build a locality-sensitive Bloomier filter. However, |
|---|
| 495 | the false negative rate of the locality-sensitive filter would be |
|---|
| 496 | magnified because the Bloomier filter requires \mbox{$2 |
|---|
| 497 | \ceil{\log(n)}$} separate filters to represent $n$ known features, and |
|---|
| 498 | all of these filters must produce the correct answer. It seems that |
|---|
| 499 | Bloom filters are not particularly well suited to this task. |
|---|
| 500 | |
|---|
| 501 | |
|---|
| 502 | \subsection{Locality Sensitive Hashing} |
|---|
| 503 | |
|---|
| 504 | Locality Sensitive Hashing (LSH) \cite{datar2004, gionis1999, |
|---|
| 505 | andoni2006}, is a technique for answering approximate nearest |
|---|
| 506 | neighbour queries: it returns a feature whose distance is within a |
|---|
| 507 | constant factor of the distance to the nearest neighbour. The LSH |
|---|
| 508 | scheme can be extended to handle exact nearest-neighbour queries |
|---|
| 509 | efficiently if the set of known features satisfies some \emph{bounded |
|---|
| 510 | growth} criteria. It is intended for use in high-dimensional spaces |
|---|
| 511 | (tens to thousands of dimensions), and $\ell_p$ distance metrics for |
|---|
| 512 | $p \in (0, 2]$. |
|---|
| 513 | |
|---|
| 514 | |
|---|
| 515 | The core idea of LSH is to use several hash functions with the |
|---|
| 516 | property that the probability of collision---features being placed in |
|---|
| 517 | the same bin---is much higher for nearby features than for distant |
|---|
| 518 | ones. In the indexing stage, we populate the hash table by running |
|---|
| 519 | each hash function on each known feature. To perform a query, we run |
|---|
| 520 | the hash functions on the query feature and retrieve the features in |
|---|
| 521 | the same hash table bin as the query. One of these features is likely |
|---|
| 522 | to be an approximate nearest neighbour of the query. |
|---|
| 523 | |
|---|
| 524 | |
|---|
| 525 | In practice, in \cite{datar2004} each hash function is composed of |
|---|
| 526 | several (typically $10$) simpler hash functions. Each simple hash |
|---|
| 527 | function performs a random projection in the feature space: the hash |
|---|
| 528 | value is the discretized value of the dot product of the feature |
|---|
| 529 | vector with the hash function's random vector. Combining several of |
|---|
| 530 | these simple hash functions is equivalent to projecting the feature |
|---|
| 531 | vectors onto a grid in a $10$-dimensional, non-orthogonal, subspace. |
|---|
| 532 | The hash code is simply the identifier of the grid cell in which the |
|---|
| 533 | feature falls. |
|---|
| 534 | |
|---|
| 535 | |
|---|
| 536 | This scheme suffers from edge effects: a query feature may land in a |
|---|
| 537 | bin other than the bin containing its nearest neighbour due to small |
|---|
| 538 | positional noise in any of the dimensions. Attempting to reduce this |
|---|
| 539 | effect by making the bin size (discretization level) larger fails |
|---|
| 540 | because the number of hash functions must be increased to maintain the |
|---|
| 541 | same total number of hash bins. |
|---|
| 542 | |
|---|
| 543 | |
|---|
| 544 | In order to reduce the number of false negatives, we can choose many |
|---|
| 545 | (typically $30$) different hash functions and place a reference to |
|---|
| 546 | each feature in the hash table bin chosen by each function. This |
|---|
| 547 | increases the probability that at least one of the hash functions will |
|---|
| 548 | select a $10$-dimensional subspace in which the query is close to its |
|---|
| 549 | nearest neighbour. |
|---|
| 550 | |
|---|
| 551 | |
|---|
| 552 | The standard LSH scheme does not seem particularly well suited for the |
|---|
| 553 | purpose of feature matching in a geometric hashing system, since the |
|---|
| 554 | dimensionality of the feature vectors is usually moderate ($2$ to $6$ |
|---|
| 555 | in the \an system). Since some feature aliasing is expected in |
|---|
| 556 | geometric hashing systems, the nearest neighbour may not be the |
|---|
| 557 | correct match: we would prefer to get all neighbours within a given |
|---|
| 558 | search radius. |
|---|
| 559 | |
|---|
| 560 | |
|---|
| 561 | \subsection{\Kdtrees} |
|---|
| 562 | |
|---|
| 563 | \begin{figure} |
|---|
| 564 | \begin{center} |
|---|
| 565 | % tree-figs.py : plot_bboxes |
|---|
| 566 | \begin{tabular}{c@{\hspace{3pt}}c@{\hspace{3pt}}c} |
|---|
| 567 | \includegraphics[width=0.31\textwidth]{kdtree-0} & |
|---|
| 568 | \includegraphics[width=0.31\textwidth]{kdtree-1} & |
|---|
| 569 | \includegraphics[width=0.31\textwidth]{kdtree-2} \\ |
|---|
| 570 | \includegraphics[width=0.31\textwidth]{kdtree-3} & |
|---|
| 571 | \includegraphics[width=0.31\textwidth]{kdtree-4} & |
|---|
| 572 | \includegraphics[width=0.31\textwidth]{kdtree-5} |
|---|
| 573 | \end{tabular} |
|---|
| 574 | \end{center} |
|---|
| 575 | \caption[A \kdtree partitioning of space.]{A \kdtree partitioning of |
|---|
| 576 | space. Each panel shows the nodes at one level of the tree. The |
|---|
| 577 | points are 200 samples from a ring with uniformly-distributed |
|---|
| 578 | radius in $\left(0.6, 1\right)$ and uniformly-distributed angle in |
|---|
| 579 | $\left(0, \displaystyle{\frac{\pi}{2}}\right)$. Nodes are split |
|---|
| 580 | along the dimension with the largest range, and the splitting |
|---|
| 581 | position is chosen so that the two child nodes will contain an |
|---|
| 582 | equal number of points. Note how the tree adapts to the |
|---|
| 583 | distribution of the data: each node tightly bounds the data points |
|---|
| 584 | it owns. This is a slightly different version of the \kdtree than |
|---|
| 585 | Bentley's original description.\label{kdtree}} |
|---|
| 586 | \end{figure} |
|---|
| 587 | |
|---|
| 588 | The $k$-dimensional tree (\kdtree) was introduced by Bentley in 1975 |
|---|
| 589 | \cite{bentley1975}, and several extensions have been described |
|---|
| 590 | \cite{friedman1977, sproull1991}. The idea is to build a binary tree |
|---|
| 591 | over the feature space, where each node ``owns'' a region of the space |
|---|
| 592 | which is disjoint from the region owned by its sibling node. Each |
|---|
| 593 | non-leaf node has an axis-aligned splitting hyper-plane; its left |
|---|
| 594 | child owns the subspace to the left of the splitting plane, and the |
|---|
| 595 | right child the subspace to the right. In this way, the \kdtree |
|---|
| 596 | defines a hierarchical subdivision of the feature space into |
|---|
| 597 | axis-aligned hyper-rectangles. See \figref{kdtree}. |
|---|
| 598 | |
|---|
| 599 | |
|---|
| 600 | For the purposes of indexing the features in a geometric hashing |
|---|
| 601 | system, the set of features is known beforehand and is static, so we |
|---|
| 602 | do not need to insert or delete features from the tree, and Bentley's |
|---|
| 603 | \emph{optimized} \kdtree construction method can be used. That is, |
|---|
| 604 | the tree can be built to adapt to the particular set of features we |
|---|
| 605 | have. |
|---|
| 606 | |
|---|
| 607 | |
|---|
| 608 | Tree construction proceeds by first assigning all features to the root |
|---|
| 609 | node, then recursively splitting nodes until each leaf node owns at |
|---|
| 610 | most $L$ features, with $L$ perhaps $10$ or $20$. Splitting is done |
|---|
| 611 | by selecting a splitting plane and using it to partition the features. |
|---|
| 612 | Sproull \cite{sproull1991} enumerates several strategies for choosing |
|---|
| 613 | the splitting hyperplane. Traditionally, \kdtrees use axis-aligned |
|---|
| 614 | splitting hyperplanes, and the splitting dimension is chosen by simply |
|---|
| 615 | iterating through the dimensions (as in Bentley's original paper), or |
|---|
| 616 | by selecting the dimension with the largest \emph{range} or |
|---|
| 617 | \emph{variance}. One can also use an arbitrary splitting hyperplane, |
|---|
| 618 | for example by finding the principal eigenvector of the covariance |
|---|
| 619 | matrix of the feature vectors. Typically the splitting hyperplane is |
|---|
| 620 | positioned so that the two children own an equal number of data |
|---|
| 621 | points. This leads to a balanced (and therefore short) tree, but in |
|---|
| 622 | some applications it can be beneficial to bias the splitting location |
|---|
| 623 | \cite{macdonald1990}. |
|---|
| 624 | |
|---|
| 625 | |
|---|
| 626 | Depending on the application, it can be beneficial to store at each |
|---|
| 627 | node the tight axis-aligned bounding box (and other summary statistics |
|---|
| 628 | \cite{deng1995}) of the points owned by that node (as shown in |
|---|
| 629 | \figref{kdtree}). This is particularly useful where there is |
|---|
| 630 | significant correlation between the dimensions of the data, because |
|---|
| 631 | splitting the data points along one dimension implies that the |
|---|
| 632 | bounding box will shrink in other dimensions as well, and this can |
|---|
| 633 | result in faster searches. If the bounding boxes are stored, then it |
|---|
| 634 | is no longer strictly necessary to store the splitting hyperplane, |
|---|
| 635 | though it may be useful to speed up some computations. |
|---|
| 636 | |
|---|
| 637 | |
|---|
| 638 | \comment{ |
|---|
| 639 | \begin{figure} |
|---|
| 640 | \begin{center} |
|---|
| 641 | \includegraphics[width=\textwidth]{rangesearch-bb-fig} |
|---|
| 642 | \end{center} |
|---|
| 643 | \caption[\Kdtree range search with bounding boxes.]{\Kdtree range |
|---|
| 644 | search for a query point $q$ in \kdtree node $N$, for a \kdtree that |
|---|
| 645 | stores the bounding box of each node. We check whether the query |
|---|
| 646 | hypersphere overlaps the child nodes $L$ and $R$. In both cases $q$ |
|---|
| 647 | is contained in the left child node $L$ so we must recurse on $L$. In |
|---|
| 648 | the top panel, the distance \emph{in the splitting dimension} from $q$ |
|---|
| 649 | to the bounding box of node $R$ is less than the search radius $r_1$ |
|---|
| 650 | so we do not need to recurse on $R$. In the lower panel, the distance |
|---|
| 651 | in the search dimension to the query is larger than the search radius |
|---|
| 652 | $r_2$ so we must recurse on $R$.} |
|---|
| 653 | \label{rangesearch-bb} |
|---|
| 654 | \end{figure} |
|---|
| 655 | |
|---|
| 656 | |
|---|
| 657 | \begin{figure} |
|---|
| 658 | \begin{center} |
|---|
| 659 | \includegraphics[width=\textwidth]{bohm2001fig7} |
|---|
| 660 | \end{center} |
|---|
| 661 | \caption[Lower and Upper Distance Bounds in a \kdtree.]{Lower and upper |
|---|
| 662 | bounds (min- and max-dist, respectively) on the distance from a query point |
|---|
| 663 | to a \kdtree bounding box. |
|---|
| 664 | This figure is copied from \cite{bohm2001} Figure 7.} |
|---|
| 665 | \label{upperlower} |
|---|
| 666 | \end{figure} |
|---|
| 667 | |
|---|
| 668 | |
|---|
| 669 | \begin{figure} |
|---|
| 670 | \begin{center} |
|---|
| 671 | \includegraphics[width=\textwidth]{rangesearch-split-fig} |
|---|
| 672 | \end{center} |
|---|
| 673 | \caption[\kdtree range search with splitting planes.]{\kdtree range |
|---|
| 674 | search for a query point $q$ in \kdtree node $N$, |
|---|
| 675 | for a \kdtree that only stores the splitting plane (not the bounding boxes). |
|---|
| 676 | Top: since the query is to the left of $N$'s splitting plane (dashed line), |
|---|
| 677 | we must recurse on the left child $L$. Since the distance in the splitting |
|---|
| 678 | dimension from the query to the splitting plane, $D_R$ is less than $r$, |
|---|
| 679 | we must recurse on $R$. |
|---|
| 680 | |
|---|
| 681 | Bottom: For the left node ($L$): the query is above the splitting plane, so we must recurse |
|---|
| 682 | on child $T$. The distance $D_B$ in the splitting dimension to the splitting plane |
|---|
| 683 | is less than $r$, so we must recurse on $B$ also. |
|---|
| 684 | |
|---|
| 685 | For the right node ($R$): the query is above the splitting plane, so we must recurse |
|---|
| 686 | on child $t$. The distance $D_b$ to the splitting plane is greater than $r$ so we |
|---|
| 687 | need not recurse on node $b$. (Note, we could also keep track of the \emph{total} |
|---|
| 688 | distance to the query point, $\sqrt{D_R^2 + D_b^2}$, but this requires more book-keeping.} |
|---|
| 689 | \label{rangesearch-sp} |
|---|
| 690 | \end{figure} |
|---|
| 691 | } |
|---|
| 692 | |
|---|
| 693 | For feature matching, we typically want to do \emph{range search}: |
|---|
| 694 | find all features within radius $r$ of a given query feature $q$. |
|---|
| 695 | This is easily achieved with a \kdtree using a recursive algorithm. |
|---|
| 696 | Starting at the root, we must decide whether we need to recurse on |
|---|
| 697 | each of the node's children. (How we make this decision is explained |
|---|
| 698 | below.) Once we reach a leaf, we enumerate the features owned by the |
|---|
| 699 | leaf and compute the distance to the query feature; any feature with |
|---|
| 700 | distance less than $r$ is accepted. |
|---|
| 701 | |
|---|
| 702 | |
|---|
| 703 | We decide whether we must recurse on a node's children based on their |
|---|
| 704 | bounding boxes. We compute the \emph{mindist}---the lower-bound of |
|---|
| 705 | distance---between the query point and each child's bounding box. The |
|---|
| 706 | mindist is simply the distance to the corner or edge of the bounding |
|---|
| 707 | box, and can be computed quickly. Any node whose mindist to the query |
|---|
| 708 | point is less than the query radius $r$ must be visited. If the |
|---|
| 709 | \kdtree does not explicitly store the tight bounding-boxes of the |
|---|
| 710 | nodes, loose bounding-boxes can be constructed during the recursion |
|---|
| 711 | based on the splitting planes of the ancestors. |
|---|
| 712 | |
|---|
| 713 | |
|---|
| 714 | \comment{ |
|---|
| 715 | Nearest-neighbour search in \kdtrees is essentially the same as range |
|---|
| 716 | search, except that $r$ is considered the \emph{pruning distance}; any |
|---|
| 717 | nodes whose lower-bound distance is larger than the pruning distance |
|---|
| 718 | need not be visited. The pruning distance decreases monotonically as |
|---|
| 719 | the search proceeds and is simply the (upper bound of) the distance to |
|---|
| 720 | the nearest neighbour found so far. Since reducing the pruning |
|---|
| 721 | distance is critically important to achieving fast search, the order |
|---|
| 722 | in which nodes are explored is important; one typically keeps a |
|---|
| 723 | priority queue of nodes to visit, sorted by the (lower-bound) distance |
|---|
| 724 | of the node to the query feature. If the query feature falls in the |
|---|
| 725 | region owned by one of the leaf nodes, then all nodes on the path from |
|---|
| 726 | the root to that leaf will have lower-bound distance of zero, so this |
|---|
| 727 | path will be explored first and the ``greedy'' nearest neighbour |
|---|
| 728 | distance will become the first pruning distance. |
|---|
| 729 | |
|---|
| 730 | |
|---|
| 731 | The \kdtree can also be used for approximate nearest-neighbour or |
|---|
| 732 | range searches. Beis and Lowe \cite{beis1997} describe the Best Bin |
|---|
| 733 | First algorithm, which is essentially the same as the full |
|---|
| 734 | nearest-neighbour search except that the search terminates after |
|---|
| 735 | exploring a fixed maximum number of leaves or after a fixed period of |
|---|
| 736 | time has elapsed. The priority queue is sorted by the lower-bound |
|---|
| 737 | distance of the node's bounding box to the query point, so the `most |
|---|
| 738 | promising' nodes are explored first. |
|---|
| 739 | } |
|---|
| 740 | |
|---|
| 741 | |
|---|
| 742 | Chapter \ref{chap:kdtree} presents the \kdtree in more detail. |
|---|
| 743 | |
|---|
| 744 | |
|---|
| 745 | \subsection{Other approaches} |
|---|
| 746 | |
|---|
| 747 | % Voronoi: |
|---|
| 748 | % -de Berg 1997 |
|---|
| 749 | % (Mark de Berg , Marc van Kreveld , Mark Overmars , Otfried Schwarzkopf, Computational geometry: algorithms and applications, Springer-Verlag New York, Inc., Secaucus, NJ, 1997) |
|---|
| 750 | % -Edelsbrunner 1987 |
|---|
| 751 | % (Herbert Edelsbrunner, Algorithms in combinatorial geometry, Springer-Verlag New York, Inc., New York, NY, 1987) |
|---|
| 752 | % -Preparata and Shamon 1985 |
|---|
| 753 | % (Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985) |
|---|
| 754 | |
|---|
| 755 | In contrast to the \kdtree's rectangular decomposition of space, there |
|---|
| 756 | is a family of space decompositions that use hyperspheres. An example |
|---|
| 757 | is the \emph{ball-tree} \cite{uhlmann1991b}, in which each non-leaf |
|---|
| 758 | node is associated with one of the known features $f$ and a radius |
|---|
| 759 | $r$; the left child owns all features within radius $r$ of feature |
|---|
| 760 | $f$, and the right child owns the remaining features. Similarly, in |
|---|
| 761 | the \emph{anchors hierarchy} \cite{moore2000}, each non-leaf node is |
|---|
| 762 | represented by a feature (called its \emph{anchor}), and a node owns |
|---|
| 763 | all the features that are closer to its anchor than the anchor of its |
|---|
| 764 | sibling. These sphere-based trees are supposed to better capture the |
|---|
| 765 | structure of high-dimensional data sets, though the results seem to |
|---|
| 766 | depend quite strongly on the particular data distributions and search |
|---|
| 767 | parameters. |
|---|
| 768 | |
|---|
| 769 | |
|---|
| 770 | For two-dimensional feature spaces, algorithms based on planar point |
|---|
| 771 | location (eg, \cite{kirkpatrick1983}) and the Voronoi tesselation of |
|---|
| 772 | space yield $\mathcal{O}(n \log n)$ preprocessing time, a data |
|---|
| 773 | structure of size $\mathcal{O}(n)$, and query time of |
|---|
| 774 | $\mathcal{O}(\log{n})$. Unfortunately, in higher dimensions the space |
|---|
| 775 | required to store the Voronoi tesselation is |
|---|
| 776 | $\mathcal{O}(n^{\floor{d/2}})$, which quickly becomes untenable |
|---|
| 777 | \cite{arya1998}. |
|---|
| 778 | |
|---|
| 779 | % planar point location: |
|---|
| 780 | % -Dobkin and Lipton in 1976 O((log n)^2) time, O(n^2) space |
|---|
| 781 | % -Sarnak and Tarjan O(n) space, O(log n) time |
|---|
| 782 | % * Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees". Communications of the ACM 29 (7): 669-679. |
|---|
| 783 | % -Edelsbrunner, Guibas, and Stolfi, same |
|---|
| 784 | % * Herbert Edelsbrunner, Leonidas J. Guibas, Jorge Stolfi (1986). "Optimal point location in a monotone subdivision". SIAM Journal on Computing 15 (2): 317-340. |
|---|
| 785 | % -Kirkpatrick, same |
|---|
| 786 | % * David G. Kirkpatrick (1983). "Optimal Search in Planar Subdivisions". SIAM J. Comput. 12: 28-35. |
|---|
| 787 | % Voronoi tesselation: |
|---|
| 788 | % -B. Delaunay, Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934 |
|---|
| 789 | |
|---|
| 790 | The design of spatial indexing data structures and search algorithms |
|---|
| 791 | is itself a large research area. Dozens of different tree structures |
|---|
| 792 | have been proposed: in two surveys B\"ohm \cite{bohm2001} and |
|---|
| 793 | Hjaltason \cite{hjaltason2003} identify B-, $\textrm{B}^{+}$-, ball-, |
|---|
| 794 | bisector-, BSP-, DABS-, fq-, gh-, \mbox{GNA-,} hB-, $\textrm{hB}^{\pi}$-, |
|---|
| 795 | hybrid-, IQ-, kd-, kd-B-, $\textrm{LSD}^h$-, M-, mb-, |
|---|
| 796 | $\textrm{mb}^{\ast}$-, mvp-, oct-, \mbox{post-office-,} pyramid-, quad-, R-, |
|---|
| 797 | $\textrm{R}^{\ast}$-, $\textrm{R}^{+}$-, sa-, slim-, \mbox{sphere-,} |
|---|
| 798 | SR-, SS-, TV-, vp-, $\textrm{vp}^{\ast}$-, and X-trees. There is an |
|---|
| 799 | equally mind-boggling variety of exact and approximate search |
|---|
| 800 | algorithms. Each data structure and algorithm may work well in some |
|---|
| 801 | region of parameter space, but there is no clear winner for |
|---|
| 802 | general-purpose use. |
|---|
| 803 | |
|---|
| 804 | |
|---|
| 805 | \section{Astrometric calibration as a pattern recognition task} |
|---|
| 806 | |
|---|
| 807 | |
|---|
| 808 | \comment{ |
|---|
| 809 | wget "http://casjobs.sdss.org/ImgCutoutDR7/getjpeg.aspx?ra=166.45&dec=-0.03&scale=1&opt=&width=2000&height=2000" -O ngc3521-orig.jpg |
|---|
| 810 | jpegtopnm ngc3521-orig.jpg | pnmrotate -45 | pnmcut 600 900 1400 1000 | pnmscale -reduce 2 | pnmtojpeg > ngc3521.jpg |
|---|
| 811 | #---> http://live.astrometry.net/status.php?job=alpha-200906-68444159 |
|---|
| 812 | jpegtopnm ngc3521.jpg | ppmtopgm | pnminvert | pnmtojpeg > ngc3521-bw.jpg |
|---|
| 813 | wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=field.xy.fits" -O ngc3521.xy |
|---|
| 814 | wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=index.xy.fits" -O ngc3521-index.xy |
|---|
| 815 | jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white > ngc3521-sources.png |
|---|
| 816 | jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white -P | plotxy -I - -i ngc3521-index.xy -x 1 -y 1 -C black -b white -s crosshair -P | plot-constellations -f 18 -w ngc3521.wcs -i - -N -o ngc3521-index.png |
|---|
| 817 | %%% wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=wcs.fits" -O ngc3521.wcs |
|---|
| 818 | %%% scp gmaps:/data2/test-merc/tycho.mkdt.fits . |
|---|
| 819 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-0.5&bb=0,-85,360,85&dashbox=0.1&w=500&h=500&lw=3" -O ngc3521-zoom0.png |
|---|
| 820 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-1&bb=175.533,-17.7621663832,211.533,17.6598478619&dashbox=0.01&w=500&h=500&lw=3" -O ngc3521-zoom1.png |
|---|
| 821 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=0.5&bb=191.733,-1.85338140354,195.333,1.74602498613&w=500&h=500&lw=3" -O ngc3521-zoom2.png |
|---|
| 822 | for x in 0 1 2; do |
|---|
| 823 | pngtopnm ngc3521-zoom${x}.png | ppmtopgm | pnminvert | pnmtopng > ngc3521-zoom${x}-bw.png; |
|---|
| 824 | done |
|---|
| 825 | } |
|---|
| 826 | |
|---|
| 827 | |
|---|
| 828 | % ~/an-2/usnob-map/execs/tilerender -x 0.000000 -y -85.000000 -X 360.000000 -Y 85.000000 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.1 -d > tile1.png |
|---|
| 829 | % ~/an-2/usnob-map/execs/tilerender -x 151.724000 -y -4.784223 -X 187.724000 -Y 29.772221 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.25 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.01 -d > tile2.png |
|---|
| 830 | % ~/an-2/usnob-map/execs/tilerender -x 167.924000 -y 11.335527 -X 171.524000 -Y 14.841399 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g 0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 > tile3.png |
|---|
| 831 | % |
|---|
| 832 | % /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/index.xy.fits -S 1 -W 720 -H 503 -x 1 -y 1 -w 2 -r 6 -s s > ixy.pgm |
|---|
| 833 | % /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/field.xy.fits -S 1 -W 720 -H 503 -r 5 -x 1 -y 1 -w 2 -s c > fxy.pgm |
|---|
| 834 | % cp /home/gmaps/ontheweb-data/tor/200706/51145570/image.pnm undim.ppm |
|---|
| 835 | % pgmtoppm green ixy.pgm > ixy.ppm |
|---|
| 836 | % pnmcomp -alpha=ixy.pgm ixy.ppm undim.ppm > sum.ppm |
|---|
| 837 | % pnmcomp -alpha=fxy.pgm fxy.ppm sum.ppm > sum2.ppm |
|---|
| 838 | % pnmtopng sum2.ppm > sum.png |
|---|
| 839 | % pnmcomp -alpha=fxy.pgm fxy.ppm undim.ppm | pnmtopng > sources.png |
|---|
| 840 | % http://oven.cosmo.fas.nyu.edu/test/status.php?job=tor-200706-51145570 |
|---|
| 841 | |
|---|
| 842 | |
|---|
| 843 | \begin{figure} |
|---|
| 844 | \begin{center} |
|---|
| 845 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-bw}} \\ |
|---|
| 846 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-sources}} \\ |
|---|
| 847 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-index}} |
|---|
| 848 | \end{center} |
|---|
| 849 | \caption{\captionpart{Top:} Input image (credit: Sloan Digital Sky |
|---|
| 850 | Survey). \captionpart{Middle:} The brightest 100 sources extracted |
|---|
| 851 | from the image. \captionpart{Bottom:} Reference sources, transformed |
|---|
| 852 | into the image coordinate system (crosshairs). Many of the image and |
|---|
| 853 | reference sources are aligned, but there are many image sources |
|---|
| 854 | without reference sources. Our system knows about the positions of |
|---|
| 855 | many objects of interest on the sky, and has labelled the galaxy NGC |
|---|
| 856 | 3521.\label{fig:redgreen}} |
|---|
| 857 | \end{figure} |
|---|
| 858 | |
|---|
| 859 | |
|---|
| 860 | \begin{figure} |
|---|
| 861 | \begin{center} |
|---|
| 862 | \begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c} |
|---|
| 863 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom0-bw}} & |
|---|
| 864 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom1-bw}} & |
|---|
| 865 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom2-bw}} |
|---|
| 866 | \end{tabular} |
|---|
| 867 | \end{center} |
|---|
| 868 | \caption{The location of the input image on the sky. |
|---|
| 869 | \captionpart{Left:} The whole sky, in Mercator projection. The |
|---|
| 870 | sinusoid-shaped feature is the Milky Way. The dashed box shows the |
|---|
| 871 | zoomed-in region. \captionpart{Middle:} Zoomed in by a factor of |
|---|
| 872 | $10$. \captionpart{Right:} Zoomed in by a factor of $100$. The box |
|---|
| 873 | shows the outline of the input image.\label{fig:onthesky}} |
|---|
| 874 | \end{figure} |
|---|
| 875 | |
|---|
| 876 | |
|---|
| 877 | For modern astronomers, astrometric calibration is often one of the |
|---|
| 878 | first steps toward getting useful information out of an image of the |
|---|
| 879 | sky. Aligning a new image with an \emph{astrometric reference |
|---|
| 880 | catalog} allows the astronomer to place the image within a standard |
|---|
| 881 | coordinate frame. This allows stars, galaxies, and other objects |
|---|
| 882 | (\emph{sources}) in the new image to be identified with known sources, |
|---|
| 883 | which in turn allows astronomers to calibrate other properties of the |
|---|
| 884 | new image, and allows the positions of new sources to be described in |
|---|
| 885 | a standard reference frame. |
|---|
| 886 | |
|---|
| 887 | |
|---|
| 888 | The task of blind astrometric calibration---automatically finding the |
|---|
| 889 | astrometric calibration of an image, using only the information in the |
|---|
| 890 | image pixels---can be seen as a pattern recognition problem. As |
|---|
| 891 | Bertin \cite{bertin2005} notes, ``astrometric and photometric |
|---|
| 892 | calibrations have remained the most tiresome step in the reduction of |
|---|
| 893 | large imaging surveys,'' so this is not only an interesting problem to |
|---|
| 894 | solve, but one with practical implications for astronomers. |
|---|
| 895 | |
|---|
| 896 | |
|---|
| 897 | For the purposes of astrometric calibration, we can think of the sky |
|---|
| 898 | as a large two-dimensional surface: the stars are very distant, so our |
|---|
| 899 | viewpoint is effectively fixed. We are moving, as are the stars, but |
|---|
| 900 | these motions are small relative to the precision at which we |
|---|
| 901 | typically work. The sky contains many stars, galaxies, and other |
|---|
| 902 | astronomical sources. The stars and distant galaxies are effectively |
|---|
| 903 | point sources, while closer galaxies can be resolved. Astrometric |
|---|
| 904 | reference catalogs list the positions, motions, and brightnesses of |
|---|
| 905 | these sources and serve as the ``ground truth'' or database of known |
|---|
| 906 | (reference) objects. The USNO-B1 catalog \cite{usnob, nomad}, for |
|---|
| 907 | example, lists over one billion objects. As many as a few percent of |
|---|
| 908 | these are false detections or other artifacts \cite{barroncleaning}, |
|---|
| 909 | and some objects that should be visible are missing. |
|---|
| 910 | |
|---|
| 911 | |
|---|
| 912 | Extra sources can be due to planets, comets, satellites, or aircraft. |
|---|
| 913 | Missing or poorly localized source can be due to imperfections in the |
|---|
| 914 | imaging sensor, saturation, cosmic ray interference, or (rarely) |
|---|
| 915 | occlusion. Errors in the image processing that detects sources can |
|---|
| 916 | lead to sources being gained or lost. We call sources that appear in |
|---|
| 917 | only the input image or reference catalog \emph{distractors} and |
|---|
| 918 | \emph{dropouts}, respectively. The existence of distractors and |
|---|
| 919 | dropouts means that we can never assume that all the objects in the |
|---|
| 920 | reference catalog will be contained in an image to be recognized, or |
|---|
| 921 | vice versa. |
|---|
| 922 | |
|---|
| 923 | |
|---|
| 924 | The images to be recognized are subregions of the sky. Image sizes |
|---|
| 925 | range from nearly half the celestial sphere down to $10^{-7}$ of the |
|---|
| 926 | area and smaller. The input images measure unknown bands of the |
|---|
| 927 | electromagnetic spectrum, and various nonlinear functions may have |
|---|
| 928 | been applied to the pixel values. We cannot rely on absolute |
|---|
| 929 | brightness or color to recognize individual stars or galaxies. At |
|---|
| 930 | best we can hope that there is some positive correlation in the |
|---|
| 931 | relative brightness ordering of objects in the image and the |
|---|
| 932 | corresponding objects in our catalog. |
|---|
| 933 | |
|---|
| 934 | |
|---|
| 935 | Blind astrometric calibration is an ideal task for exploring geometric |
|---|
| 936 | ideas in pattern recognition. Most celestial objects are effectively |
|---|
| 937 | point sources, and can be found and localized to sub-pixel accuracy |
|---|
| 938 | using relatively simple image-processing procedures. But since the |
|---|
| 939 | individual features are characterized only by their positions and |
|---|
| 940 | brightnesses, we must examine collections of features in order to |
|---|
| 941 | build distinctive patterns. In \chapref{chap:techreport} we present |
|---|
| 942 | \an, which applies the geometric hashing framework to the task of |
|---|
| 943 | blind astrometric calibration. An example of our results in shown in |
|---|
| 944 | \figs \ref{fig:redgreen} and \ref{fig:onthesky}. |
|---|
| 945 | |
|---|
| 946 | |
|---|
| 947 | \comment{ |
|---|
| 948 | There are several useful applications for a blind astrometry solver. |
|---|
| 949 | Most telescopes used by professional astronomers are |
|---|
| 950 | computer-controlled and automatically record the telescope pointing in |
|---|
| 951 | the image header, but these systems occasionally fail. Feeding the |
|---|
| 952 | images to a blind astrometry solver would allow an independent |
|---|
| 953 | verification that the \metadata written by the telescope control |
|---|
| 954 | software are correct. Second, some observatories have large archives |
|---|
| 955 | of photographic plates taken by astronomers over many decades, and a |
|---|
| 956 | growing fraction of these have been digitized. These images could be |
|---|
| 957 | useful to researchers, since they provide a longer baseline for |
|---|
| 958 | measuring changes over time. However, information about where the |
|---|
| 959 | telescope was pointing has often been lost, or is only available as a |
|---|
| 960 | hand-written note on the edge of the photographic plate, or as an |
|---|
| 961 | entry in a logbook. A blind astrometry solver would be an invaluable |
|---|
| 962 | tool for easily providing accurate \metadata about these images. |
|---|
| 963 | Finally, these is a large and enthusiastic community of amateur |
|---|
| 964 | astronomers who produce many images that would be useful to |
|---|
| 965 | professional astronomers, but since amateurs are seldom interested in |
|---|
| 966 | recording exact \metadata, these images are not readily available to |
|---|
| 967 | professional astronomers. A blind astrometry solver would unlock this |
|---|
| 968 | trove of pixels. |
|---|
| 969 | |
|---|
| 970 | Other interesting applications include creating a customized |
|---|
| 971 | star-finding chart overlayed on the input image, and controlling a |
|---|
| 972 | telescope by using the output of a blind solver as feedback to a |
|---|
| 973 | control system. |
|---|
| 974 | |
|---|
| 975 | \begin{figure} |
|---|
| 976 | \begin{center} |
|---|
| 977 | %\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c} |
|---|
| 978 | \includegraphics[width=\figunit]{moon} \\ |
|---|
| 979 | \includegraphics[width=\figunit]{saturated} |
|---|
| 980 | %\end{tabular} |
|---|
| 981 | \end{center} |
|---|
| 982 | \caption{Astronomical images with occlusion. \captionpart{Top:} The |
|---|
| 983 | moon, buildings, and mountains occlude the stars behind them (image |
|---|
| 984 | copyright Johannes Schedler, \texttt{http://panther-observatory.com}). |
|---|
| 985 | \captionpart{Bottom:} Saturation and diffraction spikes due to bright |
|---|
| 986 | objects can obscure nearby objects.} |
|---|
| 987 | \label{moon} |
|---|
| 988 | \end{figure} |
|---|
| 989 | |
|---|
| 990 | Since most celestial objects are effectively point sources, the |
|---|
| 991 | obvious image feature to use is an individual object. Feature |
|---|
| 992 | detection is relatively straightforward, and the positions of features |
|---|
| 993 | can be measured with subpixel accuracy, but the feature descriptor |
|---|
| 994 | essentially contains only brightness and position; individual features |
|---|
| 995 | are not distinctive. |
|---|
| 996 | |
|---|
| 997 | Although occlusion is typically not a problem in astronomical images, |
|---|
| 998 | it does occasionally occur (see Figure \ref{moon}). In addition, |
|---|
| 999 | bright objects in the image can cause diffraction spikes or saturation |
|---|
| 1000 | over a large region of the image, and this can cause nearby objects to |
|---|
| 1001 | be hidden. |
|---|
| 1002 | } |
|---|
| 1003 | |
|---|
| 1004 | |
|---|
| 1005 | |
|---|
| 1006 | \section{Related work in astrometry} |
|---|
| 1007 | |
|---|
| 1008 | There seem to be two distinct groups of researchers who have worked on |
|---|
| 1009 | astrometric calibration. The first are professional astronomers, |
|---|
| 1010 | whose images are typically of small angular extent, long exposure |
|---|
| 1011 | time, and high quality. They typically assume that there is a good |
|---|
| 1012 | initial guess of the astrometry and the goal is to produce a very |
|---|
| 1013 | accurate astrometric calibration, including image distortion. The |
|---|
| 1014 | second group of researchers are spacecraft engineers who want to use |
|---|
| 1015 | the stars to estimate the attitude of a camera attached to a |
|---|
| 1016 | spacecraft. Here the images are of wide angular extent, have short |
|---|
| 1017 | exposure time, and are very noisy. Primary concerns include weight, |
|---|
| 1018 | power consumption, and robustness (especially in avoiding false |
|---|
| 1019 | positives), while a high degree of accuracy is neither required nor |
|---|
| 1020 | possible given the hardware. |
|---|
| 1021 | |
|---|
| 1022 | \subsection{Non-blind astrometric calibration} |
|---|
| 1023 | |
|---|
| 1024 | Non-blind astrometry requires that an initial estimate of the image |
|---|
| 1025 | astrometry be provided. Groth \cite{groth1986} presents an algorithm |
|---|
| 1026 | for matching two lists of coordinates (eg, image coordinates in pixels |
|---|
| 1027 | and reference star coordinates on the celestial sphere), assuming that |
|---|
| 1028 | the lists contain a significant proportion of objects in common. With |
|---|
| 1029 | the limited computing resources available at the time, he suggests |
|---|
| 1030 | taking the brightest 20 or 30 objects in each list. |
|---|
| 1031 | |
|---|
| 1032 | |
|---|
| 1033 | Once the two lists of objects have been compiled and the brightest |
|---|
| 1034 | objects selected, all sets of three objects are enumerated and the |
|---|
| 1035 | triangles they form are described by a scale-, rotation-, and |
|---|
| 1036 | translation-invariant descriptor, composed of the length ratio of the |
|---|
| 1037 | longest to shortest edges, plus the cosine between these edges and the |
|---|
| 1038 | \emph{sense} or \emph{parity} of the triangle. The tolerances |
|---|
| 1039 | associated with these features are computed (by propagating their |
|---|
| 1040 | positional errors through the feature descriptor process) and stored, |
|---|
| 1041 | along with the logarithm of the triangle's perimeter. Triangles with |
|---|
| 1042 | large length ratio are rejected since they are relatively |
|---|
| 1043 | undistinctive. |
|---|
| 1044 | |
|---|
| 1045 | |
|---|
| 1046 | After triangle features are extracted from both point lists, feature |
|---|
| 1047 | matching is performed by checking whether the distance between each |
|---|
| 1048 | pair of features is less than their corresponding tolerances. This |
|---|
| 1049 | process is accelerated by sorting the features on one of the feature |
|---|
| 1050 | dimensions. If multiple matches are found for a particular feature, |
|---|
| 1051 | only the closest is considered. After all the features have been |
|---|
| 1052 | compared, many correct matches should be found, along with some false |
|---|
| 1053 | matches. For each match, the difference of the log-perimeters of the |
|---|
| 1054 | two triangles is computed. This gives the relative scales of the two |
|---|
| 1055 | triangles and hence their coordinate frames, assuming the match is |
|---|
| 1056 | correct. False matches are rejected by iterative outlier detection: |
|---|
| 1057 | the mean difference of log-perimeters is computed and matches far from |
|---|
| 1058 | the mean are rejected. |
|---|
| 1059 | |
|---|
| 1060 | |
|---|
| 1061 | This approach is essentially an application of the geometric hashing |
|---|
| 1062 | method, though instead of using hashing to accelerate feature |
|---|
| 1063 | matching, the features are simply kept in lists that are sorted on one |
|---|
| 1064 | dimension. Much of the subsequent work in this area follows |
|---|
| 1065 | essentially the same path (eg, \cite{valdes1995}, \cite{dong2006}). |
|---|
| 1066 | |
|---|
| 1067 | |
|---|
| 1068 | P\'al and Bakos \cite{pal2006} adapt the triangle-matching approach to |
|---|
| 1069 | images containing many more objects (of order $10^4$). Since it would |
|---|
| 1070 | become prohibitively expensive to enumerate all triangles in such an |
|---|
| 1071 | image, they reduce the number of triangles created by using only the |
|---|
| 1072 | triangles created by a Delauney triangulation. This vastly reduces |
|---|
| 1073 | the number of triangles created, but makes the algorithm more |
|---|
| 1074 | sensitive to dropout and distractor stars: a single extra star causes |
|---|
| 1075 | a completely disjoint set of triangles to be created in its |
|---|
| 1076 | neighbourhood. To compensate for this shortcoming, they define an |
|---|
| 1077 | \emph{extended Delauney triangulation}: for each point, they select |
|---|
| 1078 | all points at distance $\ell$ in the Delauney triangulation, and |
|---|
| 1079 | triangulate this set. This process is repeated for each point. The |
|---|
| 1080 | first extended triangulation ($\ell = 2$) skips over the nearest set |
|---|
| 1081 | of stars and builds larger triangles by using the next-further set of |
|---|
| 1082 | stars. The intent is that if some of the nearby stars are distractors |
|---|
| 1083 | that they will be ignored when the extended triangulations are used. |
|---|
| 1084 | |
|---|
| 1085 | \begin{figure} |
|---|
| 1086 | \begin{center} |
|---|
| 1087 | \includegraphics[width=\figunit]{pal-fig2} |
|---|
| 1088 | \end{center} |
|---|
| 1089 | \caption{The triangle parameterization used by P\'al and Bakos. This |
|---|
| 1090 | figure is a reproduction of \fig 2 in \cite{pal2006}.\label{pal}} |
|---|
| 1091 | \end{figure} |
|---|
| 1092 | |
|---|
| 1093 | P\'al and Bakos introduce a new triangle parameterization which is |
|---|
| 1094 | continuous and sensitive to chirality (parity); see \figref{pal}. |
|---|
| 1095 | This two-dimensional parameter space is used as the geometric feature |
|---|
| 1096 | space. When matching triangles between the two images, they demand a |
|---|
| 1097 | \emph{symmetric point match}: each triangle must be the other |
|---|
| 1098 | triangle's nearest neighbour in feature space. Matching two images is |
|---|
| 1099 | done by creating the lists of triangles and attempting to find |
|---|
| 1100 | symmetric point matches. This process is accelerated by sorting each |
|---|
| 1101 | list by one of the coordinates. Each triangle match is considered to |
|---|
| 1102 | be a vote for the correspondence of the three pairs of points |
|---|
| 1103 | composing the triangles. These votes are accumulated in a sparse |
|---|
| 1104 | matrix where element $(i, j)$ contains the number of votes for a |
|---|
| 1105 | correspondence between object $i$ in the first image and object $j$ in |
|---|
| 1106 | the second image. After all matches are considered, the top 40\% of |
|---|
| 1107 | the nonzero matrix elements are assumed to contain the true |
|---|
| 1108 | correspondences. A transformation based on these correspondence is |
|---|
| 1109 | computed and the unitarity of the transformation matrix is used to |
|---|
| 1110 | judge whether the match is true or false. |
|---|
| 1111 | |
|---|
| 1112 | |
|---|
| 1113 | A different approach is taken by Kaiser \etal \cite{kaiser1999}. They |
|---|
| 1114 | assume they are given two lists of source positions that differ by a |
|---|
| 1115 | rigid transformation involving scaling, rotation, and translation. In |
|---|
| 1116 | each image, they iterate over each pair of points and compute the |
|---|
| 1117 | vector difference of their positions. For each list, the log-length |
|---|
| 1118 | and angle of the pairwise difference vectors are accumulated in a |
|---|
| 1119 | two-dimensional histogram. Observe that if the whole list of points |
|---|
| 1120 | is scaled up by a constant factor, then the log-distance between each |
|---|
| 1121 | pair of points increases by a constant amount. Similarly, if the |
|---|
| 1122 | whole list is rotated then the angles shift by a constant amount. |
|---|
| 1123 | Once the two histograms have been computed, their cross-correlation is |
|---|
| 1124 | computed. If the two lists contain a significant number of |
|---|
| 1125 | corresponding points, the cross-correlation signal will be strongest |
|---|
| 1126 | at a shift corresponding to the difference in log-scale and rotation |
|---|
| 1127 | between the lists. This process is similar to a Hough transform |
|---|
| 1128 | \cite{duda1972,ballard1981}, except that instead of finding the peak |
|---|
| 1129 | of a single parameter-space histogram, we are searching for a peak in |
|---|
| 1130 | the similarity of two histograms as we shift them with respect to each |
|---|
| 1131 | other in parameter space. |
|---|
| 1132 | |
|---|
| 1133 | |
|---|
| 1134 | Once the scaling and rotation between the lists has been found, the |
|---|
| 1135 | translation can be found by scaling and rotating one of the lists into |
|---|
| 1136 | the frame of the other list, then histogramming the vector difference |
|---|
| 1137 | between points across the two lists. This is a standard (generalized) |
|---|
| 1138 | Hough transform. |
|---|
| 1139 | |
|---|
| 1140 | |
|---|
| 1141 | \subsection{Blind astrometric calibration} |
|---|
| 1142 | |
|---|
| 1143 | |
|---|
| 1144 | The majority of previous work on blind astrometric calibration is |
|---|
| 1145 | motivated by the problem of spacecraft attitude estimation. Sometimes |
|---|
| 1146 | called the ``lost in space'' or ``stellar gyroscope'' problem, the |
|---|
| 1147 | task is to estimate the pose of a spacecraft by using an image of the |
|---|
| 1148 | sky captured by an onboard camera. Although similar in general |
|---|
| 1149 | spirit, the requirements and limitations of this application are quite |
|---|
| 1150 | different than astronomical applications. Mass, power consumption, |
|---|
| 1151 | and robustness of the system are primary concerns, and as a result the |
|---|
| 1152 | optical designs are very different from science-grade astronomical |
|---|
| 1153 | instruments, and the available processor and memory resources are very |
|---|
| 1154 | restricted. Typical fields of view are tens of degrees across, and |
|---|
| 1155 | the exposure time is kept short to allow the system to function while |
|---|
| 1156 | the spacecraft is rotating. As a result, image quality is typically |
|---|
| 1157 | quite poor: often only a handfull of the brightest stars are visible. |
|---|
| 1158 | Since the field of view is large, a reference catalog of a few |
|---|
| 1159 | thousand stars is sufficient to ensure that any view of the sky |
|---|
| 1160 | contains many reference stars. Since the system will only process |
|---|
| 1161 | images from a single camera, the whole system can be customized and |
|---|
| 1162 | calibrated to that camera. For example, the nonlinear distortions of |
|---|
| 1163 | the optical system can be measured, and the scale and bandpass of the |
|---|
| 1164 | imaging system are known, so the reference catalog can be tailored to |
|---|
| 1165 | match. |
|---|
| 1166 | |
|---|
| 1167 | |
|---|
| 1168 | For example, Liebe \etal \cite{liebe2004} describe a system design |
|---|
| 1169 | with a $56~\deg$ field of view and exposure time of $50~\unit{ms}$. |
|---|
| 1170 | The resulting images contain tens of stars if the spacecraft is not |
|---|
| 1171 | rotating, but on average only three stars will be detectable when the |
|---|
| 1172 | rotation rate is $50~\deg/\unit{sec}$. The paper does not describe a |
|---|
| 1173 | particular algorithm for star identification, with the implication |
|---|
| 1174 | that it is not a particularly difficult problem since absolute |
|---|
| 1175 | brightness information will be available, and the total number of |
|---|
| 1176 | stars that are visible to the camera is only a few hundred. |
|---|
| 1177 | |
|---|
| 1178 | |
|---|
| 1179 | In earlier work \cite{liebe1993}, Liebe describes a star |
|---|
| 1180 | identification system. The reference catalog is composed of the 1539 |
|---|
| 1181 | brightest stars, with brightness calibrated to the camera used in the |
|---|
| 1182 | system. The field of view is $30~\degrees$, and the system uses a |
|---|
| 1183 | feature-matching approach, using the brightest star in the field and |
|---|
| 1184 | its two nearest neighbours to define a geometric feature. The feature |
|---|
| 1185 | descriptor is composed of the distance from the brightest star to its |
|---|
| 1186 | nearest and second-nearest neighbours, along with the angle between |
|---|
| 1187 | these neighbours. Note that this feature is not scale-invariant, |
|---|
| 1188 | because the system is only meant to solve images taken by one camera, |
|---|
| 1189 | so the scale is known. An index is constructed by enumerating all |
|---|
| 1190 | such features that could possibly be detected, given the detection |
|---|
| 1191 | limits of the camera. These features are coarsely quantized and |
|---|
| 1192 | stored in a table. To account for noise in the feature descriptors, |
|---|
| 1193 | all neighbouring cells in the quantized feature space are also stored |
|---|
| 1194 | in the table. This generates $185,000$ features. At test time, the |
|---|
| 1195 | features in the image are enumerated and the table of features is |
|---|
| 1196 | scanned; an exact feature match in the quantized space is assumed to |
|---|
| 1197 | be correct. |
|---|
| 1198 | |
|---|
| 1199 | |
|---|
| 1200 | This approach is a fairly straightforward geometric hashing technique, |
|---|
| 1201 | except that it does not use hashing as such, and there is no voting or |
|---|
| 1202 | verification scheme because feature aliasing is assumed not to happen. |
|---|
| 1203 | |
|---|
| 1204 | \begin{figure} |
|---|
| 1205 | \begin{center} |
|---|
| 1206 | \includegraphics[width=\figunit]{grid} |
|---|
| 1207 | \end{center} |
|---|
| 1208 | \caption{A grid-based feature: two sources are used to define a local |
|---|
| 1209 | coordinate system which is discretized into a grid of cells. Each |
|---|
| 1210 | cell becomes a bit in the feature descriptor; if a cell is occupied |
|---|
| 1211 | its bit is set. This figure is copied from MacKay |
|---|
| 1212 | \cite{mackay2005}.} |
|---|
| 1213 | \label{gridbased} |
|---|
| 1214 | \end{figure} |
|---|
| 1215 | |
|---|
| 1216 | In contrast to systems that build features from the precise locations |
|---|
| 1217 | of a small number of stars, Padgett and Kreutz-Delgado |
|---|
| 1218 | \cite{padgett1997} present an approach that incorporates information |
|---|
| 1219 | from a large portion of the image. Their grid-based feature is |
|---|
| 1220 | defined by first choosing one star as the reference star to define the |
|---|
| 1221 | center of the grid, then selecting its nearest neighbour (outside an |
|---|
| 1222 | exclusion radius) to define the orientation of the grid. Once the |
|---|
| 1223 | center and orientation are determined, a grid is defined, and each |
|---|
| 1224 | remaining stars in the image is assigned to a grid cell. The feature |
|---|
| 1225 | is a bit vector, one bit per grid cell, where the bit is set only if |
|---|
| 1226 | the cell contains a star. See \figref{gridbased}. |
|---|
| 1227 | |
|---|
| 1228 | |
|---|
| 1229 | The index is constructed by scanning the sky and selecting, for each |
|---|
| 1230 | field of view, the $n$ brightest stars. For each of these stars a |
|---|
| 1231 | feature is computed and stored. At test time, they examine the |
|---|
| 1232 | brightest $c n$ stars (for some safety factor $c$), computing the |
|---|
| 1233 | feature for each one and searching the index for a match. A pair of |
|---|
| 1234 | features is defined to match if their dot product is above a |
|---|
| 1235 | threshold. This is equivalent to taking the bitwise \textsc{AND} of |
|---|
| 1236 | the bit vectors and counting the number of bits that are set. This |
|---|
| 1237 | search can be implemented efficiently by using lookup tables: for each |
|---|
| 1238 | bit, they maintain a list of the features for which that bit is set. |
|---|
| 1239 | Note that this is equivalent to using a grid-based geometric hashing |
|---|
| 1240 | approach: each grid cell is equivalent to a discretized relative |
|---|
| 1241 | coordinate vector, which becomes a hash key, and the ``lookup table'' |
|---|
| 1242 | is a hash table. |
|---|
| 1243 | |
|---|
| 1244 | |
|---|
| 1245 | After all the features in the test image have been extracted and |
|---|
| 1246 | matched to the index, the algorithm proceeds to a voting (or perhaps |
|---|
| 1247 | ``consensus'') phase where it rejects matches that propose a field of |
|---|
| 1248 | view that does not overlap that of the majority. |
|---|
| 1249 | |
|---|
| 1250 | |
|---|
| 1251 | The system is designed to operate on images of diameter $8~\degrees$, |
|---|
| 1252 | and performs well on simulated data. They use a grid size of $40 |
|---|
| 1253 | \times 40$, and find that on average $25$ grid cells are filled. |
|---|
| 1254 | Their index contains $13,000$ patterns, which means that false |
|---|
| 1255 | positives are quite rare. However, misidentification of the nearest |
|---|
| 1256 | neighbour, or edge effects (assigning a star to the wrong grid cell |
|---|
| 1257 | due to positional noise) mean that failures to find a match are not |
|---|
| 1258 | uncommon, and are more likely in regions of the sky with high stellar |
|---|
| 1259 | density. |
|---|
| 1260 | |
|---|
| 1261 | In later work, Clouse and Padgett \cite{clouse2000} extend this |
|---|
| 1262 | approach by using Bayesian decision theory instead of a simple |
|---|
| 1263 | threshold on the dot product to define how well features match. This |
|---|
| 1264 | extension, along with smaller noise levels in the (simulated) imaging |
|---|
| 1265 | system, allows them to extend the approach down to fields of view |
|---|
| 1266 | $2~\degrees$ in diameter. |
|---|
| 1267 | |
|---|
| 1268 | \comment{ |
|---|
| 1269 | Given two features, they compute the dot product between the bit |
|---|
| 1270 | vectors, then proceed to estimate the probabilities that the match is |
|---|
| 1271 | a true positive and false positive. These probability distributions |
|---|
| 1272 | are quite complex, so several simplifying approximations are made, and |
|---|
| 1273 | the remaining parameters are estimated by running simulations. |
|---|
| 1274 | } |
|---|
| 1275 | |
|---|
| 1276 | Harvey \cite{harvey2004} presents two different approaches, one |
|---|
| 1277 | grid-based and the other shape-based. The grid-based approach is |
|---|
| 1278 | similar to the Padgett--Kreutz-Delgado and Clouse--Padgett approaches |
|---|
| 1279 | \cite{padgett1997, clouse2000}. A coarser grid is used, so more stars |
|---|
| 1280 | are likely to appear in each bin. To compensate, a grid cell is only |
|---|
| 1281 | considered ``occupied'' if it contains more than some threshold number |
|---|
| 1282 | of stars. The other major change is that he expects a test image to |
|---|
| 1283 | be an ``overexposure'' or ``underexposure'' relative to the reference |
|---|
| 1284 | catalog. This implies that the test image should contain either a |
|---|
| 1285 | subset or a superset of the stars in the index, and therefore the |
|---|
| 1286 | image feature vector must be either greater than or less than an index |
|---|
| 1287 | feature at each bit. This allows simple bit operations to be used to |
|---|
| 1288 | find feature matches, and feature matching is performed by a linear |
|---|
| 1289 | scan through the index. |
|---|
| 1290 | |
|---|
| 1291 | Harvey's shape-based approach uses constrained $n$-star constellations |
|---|
| 1292 | in order to aggregate information from a large number of stars without |
|---|
| 1293 | allowing the number of potential features to grow combinatorially. |
|---|
| 1294 | Specifically, Harvey uses a pair of stars to define a narrow wedge in |
|---|
| 1295 | the image, then describes the relative positions and angles of a fixed |
|---|
| 1296 | number of nearby stars within that wedge. Unfortunately, this makes |
|---|
| 1297 | the feature highly sensitive to distractor and dropout stars, since |
|---|
| 1298 | the feature depends on the stars in the feature being enumerated in a |
|---|
| 1299 | particular order. As a result, all potential features (allowing any |
|---|
| 1300 | combination of stars to drop out) must be checked; the number of |
|---|
| 1301 | features grows exponentially as the density of stars increases. |
|---|
| 1302 | |
|---|
| 1303 | |
|---|
| 1304 | Harvey makes the useful observation that a cascade of indices can be |
|---|
| 1305 | built, where each index is designed to solve images with a particular |
|---|
| 1306 | range of scales. |
|---|
| 1307 | |
|---|
| 1308 | |
|---|
| 1309 | MacKay and Roweis \cite{mackay2005} point out that a grid-based |
|---|
| 1310 | feature such as that used by Harvey leads naturally to a hashing-based |
|---|
| 1311 | strategy. Each grid cell is associated with a bit that is turned on |
|---|
| 1312 | if the cell is occupied. This value is placed in a hash table with a |
|---|
| 1313 | mapping back to its position on the sky. Since false positives can |
|---|
| 1314 | occur as a result of feature aliasing or hash collision, a voting |
|---|
| 1315 | scheme is employed: several feature matches must accumulate before the |
|---|
| 1316 | match is accepted. Since false negatives can occur as a result of |
|---|
| 1317 | dropouts and distractors (\emph{any} missing or extra star causes the |
|---|
| 1318 | hash code to change completely), many features must be extracted for |
|---|
| 1319 | each region of the sky. |
|---|
| 1320 | |
|---|
| 1321 | |
|---|
| 1322 | \subsection{Fine-tuning astrometry} |
|---|
| 1323 | |
|---|
| 1324 | In order to ``co-add'' or ``stack'' astronomical images (combine |
|---|
| 1325 | pixels from different images of higher signal-to-noise), or to do |
|---|
| 1326 | proper-motion studies (measure the movement of stars over time), it is |
|---|
| 1327 | necessary to fine-tune the astrometric calibration of their images. |
|---|
| 1328 | This is similar to the \emph{bundle adjustment} problem in computer |
|---|
| 1329 | vision \cite{triggs2000}, in that it involves the simultaneous |
|---|
| 1330 | optimization of the various camera and telescope parameters and the |
|---|
| 1331 | estimated positions of objects in the world. Fine-tuning astrometric |
|---|
| 1332 | calibrations is easier because it is essentially two-dimensional, but |
|---|
| 1333 | more difficult because the camera parameters include polynomial |
|---|
| 1334 | distortion terms to model the image distortion introduced by telescope |
|---|
| 1335 | optics. |
|---|
| 1336 | |
|---|
| 1337 | |
|---|
| 1338 | The software package \scamp by Bertin \cite{bertin2005} is a popular |
|---|
| 1339 | tool used by astronomers to fine-tune simultaneously the astrometric |
|---|
| 1340 | calibrations of a large collection of images. For each image, it is |
|---|
| 1341 | assumed that the center of the image is known to about 25\% of the |
|---|
| 1342 | size of the image, and the scale is known to within a factor of two. |
|---|
| 1343 | The histogram-alignment method of \cite{kaiser1999} is used to find |
|---|
| 1344 | the translation, scaling, and rotation between the image and a |
|---|
| 1345 | reference catalog, which allows the correspondences between image and |
|---|
| 1346 | reference catalog stars to be determined. |
|---|
| 1347 | |
|---|
| 1348 | The core of the \scamp system is a chi-squared minimization of the |
|---|
| 1349 | total weighted distance between the projected positions of all star |
|---|
| 1350 | correspondences among the set of images and the reference catalog. |
|---|
| 1351 | The parameters to be adjusted are the center, scale, rotation, and |
|---|
| 1352 | polynomial distortion coefficients of each image. A key feature is |
|---|
| 1353 | the ability to share image distortion parameters among subsets of the |
|---|
| 1354 | images, since images taken with the same telescope are expected to |
|---|
| 1355 | share some distortion terms due to the telescope optics. This reduces |
|---|
| 1356 | the degrees of freedom of the fitting process, resulting in more |
|---|
| 1357 | robust fits. \scamp can fine-tune thousands of images to sub-pixel |
|---|
| 1358 | accuracy. |
|---|
| 1359 | |
|---|
| 1360 | |
|---|
| 1361 | \section{Our approach} |
|---|
| 1362 | \label{ourapproach} |
|---|
| 1363 | |
|---|
| 1364 | \subsection{Preprocessing} |
|---|
| 1365 | |
|---|
| 1366 | Astronomers have developed computational methods for detecting and |
|---|
| 1367 | localizing stars, galaxies, and other objects in astronomical images. |
|---|
| 1368 | In astronomy this process is known as \emph{source extraction}. |
|---|
| 1369 | Telescopes are designed to produce a predictable point-spread function |
|---|
| 1370 | which covers several pixels, and this allows stars, which are |
|---|
| 1371 | effectively point sources, to be localized with subpixel accuracy. |
|---|
| 1372 | Sources that have image profiles larger than the point spread function |
|---|
| 1373 | must be extended objects, and this allows stars to be differentiated |
|---|
| 1374 | from galaxies and other objects. Cosmic rays (single, energetic |
|---|
| 1375 | particles) deposit energy in single pixels; since no distant object |
|---|
| 1376 | can produce such an image profile, this source of noise can easily be |
|---|
| 1377 | filtered from the image. |
|---|
| 1378 | |
|---|
| 1379 | |
|---|
| 1380 | The result of source extraction is a list of object positions (with |
|---|
| 1381 | subpixel resolution), along with the brightness of each object. If |
|---|
| 1382 | the telescope has been properly calibrated, this brightness is |
|---|
| 1383 | proportional to the electromagnetic \emph{flux} from the object, but |
|---|
| 1384 | in general we simply treat the brightness as a means of ordering the |
|---|
| 1385 | objects. It is also possible to classify each source as a star or |
|---|
| 1386 | galaxy, and even classify the galaxy type, but we currently do not use |
|---|
| 1387 | this information. |
|---|
| 1388 | |
|---|
| 1389 | |
|---|
| 1390 | For the purposes of this paper, we will assume that source extraction |
|---|
| 1391 | has been performed as a preprocessing step, so we are given a list of |
|---|
| 1392 | source positions sorted by brightness. |
|---|
| 1393 | |
|---|
| 1394 | |
|---|
| 1395 | |
|---|
| 1396 | \subsection{Quad features} |
|---|
| 1397 | |
|---|
| 1398 | \begin{figure} |
|---|
| 1399 | \begin{center} |
|---|
| 1400 | \includegraphics[width=0.5\textwidth]{quad-fig} |
|---|
| 1401 | \end{center} |
|---|
| 1402 | \caption{The local coordinate frame of a quad. The most distant pair of sources, $A$ and $B$, |
|---|
| 1403 | define the origin and $(1,1)$ of the coordinate frame, respectively. |
|---|
| 1404 | The remaining two sources $C$ and $D$ are required to lie inside the circle with diameter $AB$. |
|---|
| 1405 | To resolve ambiguities, the points must be labelled such that $C_x \le \smallhalf$ and |
|---|
| 1406 | $C_x + D_x \le 1$. The geometric feature descriptor is $(C_x, C_y, D_x, D_y)$.} |
|---|
| 1407 | \label{quad} |
|---|
| 1408 | \end{figure} |
|---|
| 1409 | |
|---|
| 1410 | We have taken a geometric feature-matching approach to this problem. |
|---|
| 1411 | Since individual sources can be well localized but have no distinctive |
|---|
| 1412 | properties, we build distinctive features by describing the relative |
|---|
| 1413 | geometry of small sets of objects. |
|---|
| 1414 | |
|---|
| 1415 | |
|---|
| 1416 | To solve the blind astrometry problem, we need to index a large number |
|---|
| 1417 | of features: we want to solve images that cover $10^{-6}$ or less of |
|---|
| 1418 | the area of the sky, and we want to have tens of features per image in |
|---|
| 1419 | order to increase our robustness to distractors, dropouts, and |
|---|
| 1420 | occlusion. |
|---|
| 1421 | |
|---|
| 1422 | |
|---|
| 1423 | We are matching two-dimensional objects, and we assume that |
|---|
| 1424 | astronomical images can be well described by similarity |
|---|
| 1425 | transformations: scaling, rotation and translation. Therefore, we |
|---|
| 1426 | require two sources to define a local coordinate system. |
|---|
| 1427 | |
|---|
| 1428 | |
|---|
| 1429 | The standard geometric hashing recipe would have us use triangle |
|---|
| 1430 | features. However, the positional noise in astronomical image is |
|---|
| 1431 | sufficiently large that triangles are not distinctive: with the number |
|---|
| 1432 | of features we require, \emph{any} triangle will match an excessively |
|---|
| 1433 | large number of triangles in our index due to feature aliasing. In |
|---|
| 1434 | order to building a more distinctive feature, we consider sets of |
|---|
| 1435 | \emph{four} sources-- \emph{quads}. Two of the sources are the basis |
|---|
| 1436 | set that defines the local coordinate system, and our geometric |
|---|
| 1437 | feature descriptor is composed of the positions of the remaining two |
|---|
| 1438 | sources in this coordinate system. |
|---|
| 1439 | |
|---|
| 1440 | Using a quad feature can be seen as a way of incorporating a voting |
|---|
| 1441 | scheme directly into the feature-matching phase: a quad is essentially |
|---|
| 1442 | the conjunction of two triangles. We are effectively squaring the |
|---|
| 1443 | distinctiveness of our geometric features by demanding that \emph{two} |
|---|
| 1444 | triangles be matched simultaneously. |
|---|
| 1445 | |
|---|
| 1446 | One disadvantage of using quads rather than triangles is that we |
|---|
| 1447 | become more sensitive to distractors and dropouts: if the proportion |
|---|
| 1448 | of distractors and dropouts is $d$, the proportion of overlapping |
|---|
| 1449 | sources is $(1-d)$, and we expect about $(1-d)^3$ triangles but |
|---|
| 1450 | $(1-d)^4$ quads to overlap. |
|---|
| 1451 | |
|---|
| 1452 | |
|---|
| 1453 | In standard geometric feature-matching, one enumerates all basis sets, |
|---|
| 1454 | then lists the positions of all the remaining interest points within |
|---|
| 1455 | that basis. Since we have a two-dimensional basis and we list the |
|---|
| 1456 | positions of pairs of sources, this would yield $\mathcal{O}(n^4)$ |
|---|
| 1457 | features. Astronomical images typically contain hundreds to thousands |
|---|
| 1458 | of sources, and our reference catalog contains $10^9$ sources, so this |
|---|
| 1459 | would be prohibitively expensive. We therefore impose two constraints |
|---|
| 1460 | on the properties of the quads we create. See Figure \ref{quad}. |
|---|
| 1461 | |
|---|
| 1462 | Given a set of four sources, we first find the two sources most |
|---|
| 1463 | distant from each other and label them $A$ and $B$. If the distance |
|---|
| 1464 | from $A$ to $B$ is outside a specified range, we reject the quad. |
|---|
| 1465 | Next, we demand that the two remaining sources lie inside the circle |
|---|
| 1466 | that has $AB$ as a diameter. We place $A$ at the origin of the local |
|---|
| 1467 | coordinate system, and place $B$ at position $(x, y) = (1, 1)$. We |
|---|
| 1468 | then compute the $(x,y)$ coordinates of the two remaining sources. |
|---|
| 1469 | The source with the smaller $x$ coordinate is labelled $C$, and the |
|---|
| 1470 | last source is labelled $D$, so we have $C_x \le D_x$. The geometric |
|---|
| 1471 | feature descriptor is $(C_x, C_y, D_x, D_y)$. |
|---|
| 1472 | |
|---|
| 1473 | Observe that this feature has one redundancy: we labelled $A$ and $B$ |
|---|
| 1474 | arbitrarily, and if we swap their labels then the coordinate system is |
|---|
| 1475 | rotated $180$ degrees about $(\half, \half)$. The feature descriptor |
|---|
| 1476 | becomes $(1 - C_x, 1 - C_y, 1 - D_x, 1 - D_y)$. We can deal with this |
|---|
| 1477 | redundancy by demanding that $A$ and $B$ be labelled such that $C_x + |
|---|
| 1478 | D_x \le 1$. |
|---|
| 1479 | |
|---|
| 1480 | |
|---|
| 1481 | Recall the idea mentioned earlier in this paper of solving images of unknown |
|---|
| 1482 | scale by building a cascade of indices, each covering a range of scales. |
|---|
| 1483 | We use this approach with a scale factor of $2$ or $\sqrt{2}$. |
|---|
| 1484 | This allows us to assume that the image scale is known to |
|---|
| 1485 | within a small factor. When deciding what range of quad sizes ($AB$ distances) we wish to |
|---|
| 1486 | allow, we are faced with a tradeoff: if we use quads with large |
|---|
| 1487 | diameter, then the probability that all four stars will be within the bounds of |
|---|
| 1488 | the image decreases. Meanwhile, the number of quads of diameter $d$ that |
|---|
| 1489 | can be created is $\mathcal{O}(d^5)$, assuming a uniform distribution |
|---|
| 1490 | of sources and ignoring edge effects.\footnote{% |
|---|
| 1491 | For a given star $A$, the number of |
|---|
| 1492 | stars that can become star $B$ goes as $d$, while the number of stars |
|---|
| 1493 | $C$ and $D$ goes as $d^2$, so the number of $BCD$ combinations goes as $d^5$.} |
|---|
| 1494 | On the other hand, if we build small quads, then the relative positional error |
|---|
| 1495 | will be larger, which means that when we search for matching quads in the index |
|---|
| 1496 | we must use a larger search radius, which means that many more false matches |
|---|
| 1497 | will be found. We have not yet found an analytic solution to optimize this |
|---|
| 1498 | tradeoff; we currently set the desired quad scale to about |
|---|
| 1499 | a third of the size of the images we wish to solve. |
|---|
| 1500 | |
|---|
| 1501 | |
|---|
| 1502 | Our quad features have good noise properties: a small change of |
|---|
| 1503 | position of any one of sources $ABCD$ leads to a small change in the |
|---|
| 1504 | feature descriptor. The noise is nearly isotropic: movement of a |
|---|
| 1505 | source in any direction yields an almost equal change in the feature |
|---|
| 1506 | descriptor. The feature has some edge effects: $A$ and $B$ are |
|---|
| 1507 | defined to be the most distant pair, but if $C$ and $D$ are nearly |
|---|
| 1508 | diametrically opposite each other then a small amount of positional |
|---|
| 1509 | noise can push them further apart so that they become the most distant |
|---|
| 1510 | pair. As a result, $C$ and $D$ will be labelled $A$ and $B$, and vice |
|---|
| 1511 | versa, and a completely different geometric feature will be produced. |
|---|
| 1512 | This is a minor effect since it only occurs when both $C$ and $D$ have |
|---|
| 1513 | extreme values; we ignore it. Another edge effect is that $C$ and $D$ |
|---|
| 1514 | are required to lie inside the circle with diameter $AB$; at test time |
|---|
| 1515 | we simply expand the allowed diameter of the quad enough to compensate |
|---|
| 1516 | for a given level of positional noise. Finally, if the distance $AB$ |
|---|
| 1517 | is near the limit of the allowed range of scales, then the quad can |
|---|
| 1518 | become invalid due to noise. This effect is equally easy to correct |
|---|
| 1519 | by expanding the range slightly at test time. |
|---|
| 1520 | |
|---|
| 1521 | |
|---|
| 1522 | Notice that our constraints on the quads we accept has the effect of |
|---|
| 1523 | restricting the regions of feature space in which valid quads live. |
|---|
| 1524 | Since $C$ and $D$ are required to fall inside a circle with $AB$ on |
|---|
| 1525 | the diameter, we have: |
|---|
| 1526 | \begin{eqnarray*} |
|---|
| 1527 | \left(C_x - \smallhalf\right)^2 + \left(C_y - \smallhalf\right)^2 & \le & \smallhalf \\ |
|---|
| 1528 | \left(D_x - \smallhalf\right)^2 + \left(D_y - \smallhalf\right)^2 & \le & \smallhalf |
|---|
| 1529 | \end{eqnarray*} |
|---|
| 1530 | We also have the constraints $C_x + D_x \le 1$ and $C_x \le D_x$ to |
|---|
| 1531 | eliminate the redundancy of the labelling of the source pairs $AB$ and |
|---|
| 1532 | $CD$. As a result, our feature space is rather strangely shaped, so a |
|---|
| 1533 | grid-based hashing scheme would result in poor utilization of the hash |
|---|
| 1534 | table storage. Also note that since our noise is nearly isotropic and |
|---|
| 1535 | our quads are restricted to a known range of scales, we can use the |
|---|
| 1536 | Euclidean distance metric with a fixed search radius $r$ to do quad |
|---|
| 1537 | feature matching. |
|---|
| 1538 | |
|---|
| 1539 | |
|---|
| 1540 | |
|---|
| 1541 | |
|---|
| 1542 | \subsection{Indexing} |
|---|
| 1543 | |
|---|
| 1544 | |
|---|
| 1545 | We want our index to satisfy the following constraints. It should |
|---|
| 1546 | have uniform source and quad density over the whole sky, because we |
|---|
| 1547 | want to be equally likely to solve images over the whole sky. We want |
|---|
| 1548 | to build quads from bright stars, because we expect these to be more |
|---|
| 1549 | likely to be visible in test images, yet because of distractors, |
|---|
| 1550 | dropouts, and occlusions, we don't want to depend too heavily on any |
|---|
| 1551 | one star; we want to trade off using bright stars and using a variety |
|---|
| 1552 | of stars. |
|---|
| 1553 | |
|---|
| 1554 | |
|---|
| 1555 | We achieve this by first selecting a spatially uniform subset of the |
|---|
| 1556 | brightest sources in the reference catalog. We place a grid over the |
|---|
| 1557 | sky whose cells are sized so that a typical image is a few cells wide, |
|---|
| 1558 | then we take the $n$ brightest sources in each cell. |
|---|
| 1559 | |
|---|
| 1560 | |
|---|
| 1561 | Next, we place another similarly-sized grid over the sky and in each |
|---|
| 1562 | cell we try to build a quad whose center is within the cell. We try |
|---|
| 1563 | building quads in order of source brightness, but we also keep track |
|---|
| 1564 | of how many times each source has been used. In our first pass |
|---|
| 1565 | through the cells we only allow each star to be used once, and in each |
|---|
| 1566 | subsequent pass we increase the number of times a source can be used. |
|---|
| 1567 | |
|---|
| 1568 | |
|---|
| 1569 | Once we have built all the quads, we build a \kdtree in feature space. |
|---|
| 1570 | This allows us to rapidly find all the quads in our index whose |
|---|
| 1571 | geometric feature descriptor is close to a query feature. |
|---|
| 1572 | |
|---|
| 1573 | We also build a \kdtree over the spatially uniform subset of sources; |
|---|
| 1574 | this is used for our verification scheme. |
|---|
| 1575 | |
|---|
| 1576 | |
|---|
| 1577 | \subsection{Solving} |
|---|
| 1578 | |
|---|
| 1579 | At test time, we begin enumerating all valid quads in the image, |
|---|
| 1580 | starting with those that can be built from the brightest sources. For |
|---|
| 1581 | each quad we perform a range search in the \kdtree in order to find |
|---|
| 1582 | matching quads. Typically each image quad results in a few matches. |
|---|
| 1583 | We run a verification process on each match. |
|---|
| 1584 | |
|---|
| 1585 | The verification process first computes a least-squares transformation |
|---|
| 1586 | between image coordinates and celestial coordinates. We then search |
|---|
| 1587 | in the \kdtree of sources for all sources that should be visible in |
|---|
| 1588 | the image. We apply the transformation to bring them into image |
|---|
| 1589 | coordinates, and for each one find the nearest source in the image. |
|---|
| 1590 | We use a probabilistic model of the positional noise to compute the |
|---|
| 1591 | odds ratio between the match being a true positive and a false |
|---|
| 1592 | positive. If this ratio exceeds a threshold we declare the image |
|---|
| 1593 | solved. By using an odds ratio threshold, we allow the user to trade |
|---|
| 1594 | off the probability of false negatives and false positives. |
|---|
| 1595 | |
|---|
| 1596 | |
|---|
| 1597 | |
|---|
| 1598 | \comment{ |
|---|
| 1599 | We build an \emph{index} which maps these features |
|---|
| 1600 | back to their locations on the sky.\footnote{% |
|---|
| 1601 | In the database community (and some early computer science literature) |
|---|
| 1602 | this would be called an \emph{inverted} index, but we feel this |
|---|
| 1603 | is contrary to common usage: the index of a book maps from concepts |
|---|
| 1604 | (features) to pages (locations); the inverse mapping is called a |
|---|
| 1605 | \emph{table of contents}.} |
|---|
| 1606 | |
|---|
| 1607 | At test time, we extract features from the image, then search our |
|---|
| 1608 | index for matching features. Each matched feature is a hypothesized |
|---|
| 1609 | astrometric solution. We use a separate probabilistic verification |
|---|
| 1610 | stage to test these hypotheses. |
|---|
| 1611 | |
|---|
| 1612 | |
|---|
| 1613 | Since our input images have unknown scale, rotation, and |
|---|
| 1614 | translation, we want our features to be invariant under these |
|---|
| 1615 | transformations. One option is to consider triangles of stars and |
|---|
| 1616 | use the interior angles (or relative edge lengths) as a |
|---|
| 1617 | two-dimensional feature descriptor, as in \cite{pal2006}. Since |
|---|
| 1618 | both the image and reference catalog are noisy, we must allow each |
|---|
| 1619 | feature in the image to match a region in feature space. When using |
|---|
| 1620 | geometric features, we are faced with a tradeoff: if we consider |
|---|
| 1621 | sets of stars that span a large portion of the image, then the |
|---|
| 1622 | relative positional error is smaller (and hence each feature matches |
|---|
| 1623 | a smaller region of feature space - features are more distinctive), |
|---|
| 1624 | but there are \emph{many} features to test. If we choose smaller |
|---|
| 1625 | features, there are fewer features to test, but the relative error |
|---|
| 1626 | is larger so each feature matches a larger region in feature space; |
|---|
| 1627 | features can be confused more easily. |
|---|
| 1628 | |
|---|
| 1629 | |
|---|
| 1630 | Although triangles are appealing features from a theoretical |
|---|
| 1631 | perspective, in practice they are problematic due to the magnitude |
|---|
| 1632 | of positional noise in typical images we wish to solve. Any |
|---|
| 1633 | triangle we find in an image will generate a large number of false |
|---|
| 1634 | matches to triangles in our index because triangles aren't |
|---|
| 1635 | distinctive enough. We therefore use sets of four stars -- |
|---|
| 1636 | \emph{quads} -- as our features. The two most widely separated |
|---|
| 1637 | stars (A and B) define the corners of a coordinate system, and we |
|---|
| 1638 | list the positions of the remaining two stars (C and D) in that |
|---|
| 1639 | coordinate system. We demand that stars C and D lie inside the |
|---|
| 1640 | circle centered at the midpoint of stars A and B and diameter AB. |
|---|
| 1641 | We impose an arbitrary order to decide which star is labelled C and |
|---|
| 1642 | which is labelled D, based on their positions in the local |
|---|
| 1643 | coordinate system. This yields a four-dimensional feature |
|---|
| 1644 | descriptor. If we exchange the labels of A and B, then the local |
|---|
| 1645 | coordinate system is flipped and a different feature descriptor is |
|---|
| 1646 | produced. We store one of these descriptors in our index, and at |
|---|
| 1647 | test time we search for matches to both descriptors. |
|---|
| 1648 | |
|---|
| 1649 | |
|---|
| 1650 | This choice of feature and feature descriptor (\emph{code}) has |
|---|
| 1651 | several desirable properties. If stars are distributed uniformly in |
|---|
| 1652 | space, then codes are distributed uniformly in \emph{code space}. |
|---|
| 1653 | The error propagation properties are good: small positional errors |
|---|
| 1654 | in any of the four stars yield small errors in the code value, and |
|---|
| 1655 | isotropic errors in the positions of the stars produce nearly |
|---|
| 1656 | isotropic errors in the code value. This allows us to find matching |
|---|
| 1657 | codes based on Euclidean distance in code space. |
|---|
| 1658 | |
|---|
| 1659 | Our index is composed of a large number of code values and mappings |
|---|
| 1660 | back to locations on the sky. When given a new image to solve, we |
|---|
| 1661 | extract many quads, compute their codes, and for each one search for |
|---|
| 1662 | matching codes in the index. Due to the isotropic error properties |
|---|
| 1663 | of codes, searching for matching codes is simply \emph{range search} |
|---|
| 1664 | in four-dimensional code space. We use \kdtrees to accelerate this |
|---|
| 1665 | process. |
|---|
| 1666 | |
|---|
| 1667 | |
|---|
| 1668 | Each time we match a feature in an image to a feature in our index, |
|---|
| 1669 | we generate a hypothesis about the location (pointing, scale, and |
|---|
| 1670 | rotation) of the image on the sky. We are typically operating in |
|---|
| 1671 | the regime where \emph{any} feature will produce several matches, so |
|---|
| 1672 | we use a probabilistic model to determine whether a given match is a |
|---|
| 1673 | false positive or a true solution. We do this by computing a |
|---|
| 1674 | transformation between image coordinates and the sky based on the |
|---|
| 1675 | matching quad. We then search for stars in our reference catalog |
|---|
| 1676 | that should be visible in the image if the hypothesis is correct. |
|---|
| 1677 | By combining our knowledge of the positional errors in the reference |
|---|
| 1678 | catalog and the image with an estimate of the proportion of stars |
|---|
| 1679 | that should be visible in both, we can assign a probability to the |
|---|
| 1680 | hypothesis. If it is above some threshold, we accept the match. |
|---|
| 1681 | |
|---|
| 1682 | |
|---|
| 1683 | \emph{ |
|---|
| 1684 | \begin{itemize} |
|---|
| 1685 | \item Brightness ordering to speed search only. |
|---|
| 1686 | \item More about the constraints and goals of our system to differentiate it |
|---|
| 1687 | from the spacecraft attitude estimation problem. |
|---|
| 1688 | \item A quad is like the conjunction of two triangles: we're doing implicit agreement of two triangle features! |
|---|
| 1689 | \item Applications of our system? |
|---|
| 1690 | \end{itemize} |
|---|
| 1691 | } |
|---|
| 1692 | |
|---|
| 1693 | |
|---|
| 1694 | To summarize, the key elements of our approach are: |
|---|
| 1695 | \begin{itemize} |
|---|
| 1696 | \item \textbf{geometric features} composed of sets of point features; |
|---|
| 1697 | \item \textbf{continuous feature descriptors} that describe the relative positions of the point features; |
|---|
| 1698 | \item \textbf{fast range-search} to match features descriptors; |
|---|
| 1699 | \item \textbf{probabilistic validation} to eliminate false positives. |
|---|
| 1700 | \end{itemize} |
|---|
| 1701 | |
|---|
| 1702 | |
|---|
| 1703 | |
|---|
| 1704 | %rummage through the toolbox of computer vision and object |
|---|
| 1705 | %recognition techniques that are related to our approach or would seem |
|---|
| 1706 | %to be applicable to our problem. |
|---|
| 1707 | } |
|---|
| 1708 | |
|---|
| 1709 | |
|---|
| 1710 | |
|---|
| 1711 | \section{Conclusion} |
|---|
| 1712 | |
|---|
| 1713 | This paper presents the geometric feature matching approach we are |
|---|
| 1714 | using to solve the blind astrometry problem, and presents prior work |
|---|
| 1715 | relating to the techniques we are using and the problem we are trying |
|---|
| 1716 | to solve, as well as some different techniques that might seem to be |
|---|
| 1717 | applicable. |
|---|
| 1718 | |
|---|
| 1719 | Geometric hashing (or more generally, geometric feature matching) is a |
|---|
| 1720 | widely-applicable technique for problems where it is desirable to |
|---|
| 1721 | build distinctive geometric features out of simple interest-point |
|---|
| 1722 | features. The approach is ideally suited to the blind astrometry |
|---|
| 1723 | problem, since images of celestial objects can be localized to |
|---|
| 1724 | sub-pixel accuracy but have few other distinguishing features. |
|---|
| 1725 | Indeed, much of the previous work in blind and non-blind astrometry |
|---|
| 1726 | can be placed within a geometric feature matching framework. |
|---|
| 1727 | |
|---|
| 1728 | |
|---|
| 1729 | Although one might suppose that hashing is an intrinsic part of |
|---|
| 1730 | geometric hashing, we have found that a tree structure is more |
|---|
| 1731 | effective for the large-scale feature matching we want to perform; we |
|---|
| 1732 | feel that deciding to use approximate feature matching should be a |
|---|
| 1733 | design decision, not an intrinsic part of the method: in cases where |
|---|
| 1734 | each image contains only a few features that could potentially be |
|---|
| 1735 | found in the index, we do not want to risk missing them due to ``bad |
|---|
| 1736 | luck'' in a hash function. |
|---|
| 1737 | |
|---|
| 1738 | |
|---|