| 1 | \section{Preface} |
|---|
| 2 | |
|---|
| 3 | This thesis describes some of the research carried out by me and my |
|---|
| 4 | colleagues in the \an group. \an started as a collaboration between |
|---|
| 5 | Sam Roweis, a University of Toronto computer scientist, and David |
|---|
| 6 | W.~Hogg, a New York University astronomer. An idea that began as a |
|---|
| 7 | crazy scrawl on the back of a napkin---probably in a bar---eventually |
|---|
| 8 | grew into a real project and began attracting collaborators and |
|---|
| 9 | students. Fast-forwarding a few years, we have achieved a rare feat: |
|---|
| 10 | a computer vision software system that \emph{works}, with little human |
|---|
| 11 | intervention, and is being used by hundreds of real astronomers doing |
|---|
| 12 | real research. We have also branched out and explored a number of |
|---|
| 13 | other promising areas where ideas in computer vision and machine |
|---|
| 14 | learning can be applied to astronomical data to great benefit. |
|---|
| 15 | |
|---|
| 16 | |
|---|
| 17 | \begin{figure} |
|---|
| 18 | \begin{center} |
|---|
| 19 | test |
|---|
| 20 | \end{center} |
|---|
| 21 | \caption{test caption} |
|---|
| 22 | \end{figure} |
|---|
| 23 | |
|---|
| 24 | |
|---|
| 25 | This thesis is quite blatantly a collection of manuscripts. I have |
|---|
| 26 | done a bit more than staple them together to produce the chapters of |
|---|
| 27 | this thesis, but the original tone and style of each manuscript still |
|---|
| 28 | shines through. I hope my readers will find the chapter-to-chapter |
|---|
| 29 | variations a refreshing, rather than jarring, change of pace. One |
|---|
| 30 | advantage of collecting manuscripts into a thesis is that each chapter |
|---|
| 31 | should largely stand alone. Another advantage is that each manuscript |
|---|
| 32 | was prepared with a list of authors, so it is simpler to identify the |
|---|
| 33 | collaborators who have contributed to each chapter. |
|---|
| 34 | |
|---|
| 35 | |
|---|
| 36 | Test para |
|---|
| 37 | |
|---|
| 38 | \section{Introduction} |
|---|
| 39 | |
|---|
| 40 | |
|---|
| 41 | The general idea of the \an collaboration is to apply ideas in |
|---|
| 42 | computer vision and machine learning to problems and data in |
|---|
| 43 | astronomy. Astronomical images are a good domain in which to explore |
|---|
| 44 | ideas in computer vision, because the problems to be solved are often |
|---|
| 45 | much more constrained than in general imaging, the telescopes and |
|---|
| 46 | cameras are often well-understood and calibrated, large volumes of |
|---|
| 47 | data and ``ground truth'' information exist, and with new telescopes |
|---|
| 48 | constantly coming online and producing greater and greater volumes of |
|---|
| 49 | imaging, there is a need for sophisticated computation. I briefly |
|---|
| 50 | outline astronomical imaging for computer vision researchers in |
|---|
| 51 | \secref{sec:astrointro}. |
|---|
| 52 | |
|---|
| 53 | |
|---|
| 54 | The specific idea addressed in this thesis is that pattern recognition |
|---|
| 55 | approaches can be applied to astronomical images. The goal is to |
|---|
| 56 | produce a system that is able to recognize automatically the stars in |
|---|
| 57 | any image of the night sky, using the information in the image pixels |
|---|
| 58 | alone. Recognizing the stars in an image is equivalent to finding the |
|---|
| 59 | pointing, scale, and rotation of the image on the sky in a standard |
|---|
| 60 | coordinate system. The branch of astronomy concerned with measuring |
|---|
| 61 | the positions and motions of celestial bodies is called |
|---|
| 62 | \emph{astrometry}, and the task of placing a new image in a standard |
|---|
| 63 | reference frame is known as ``solving the astrometry'' or |
|---|
| 64 | ``calibrating the astrometry'' of the image. The \emph{blind |
|---|
| 65 | astrometric calibration} task is to calibrate an image using only the |
|---|
| 66 | image itself. The broad goal of the \an project was to build a system |
|---|
| 67 | that would allow us to create correct, standards-compliant astrometric |
|---|
| 68 | \metadata for every useful astronomical image ever taken, past and |
|---|
| 69 | future, in any state of archival disarray. This is part of a larger |
|---|
| 70 | effort to organize, annotate and make searchable all the world's |
|---|
| 71 | astronomical information. |
|---|
| 72 | |
|---|
| 73 | |
|---|
| 74 | After a brief introduction to astronomical imaging for computer |
|---|
| 75 | scientists, the remainder of this chapter reviews the area of pattern |
|---|
| 76 | recognition, and in particular the framework of \emph{geometric |
|---|
| 77 | hashing} for object recognition in images. One of the contributions |
|---|
| 78 | of this thesis is to replace the simple hash table used in traditional |
|---|
| 79 | geometric hashing with a \kdtree, so I review related work in hashing |
|---|
| 80 | and other approaches for fast feature matching. Finally, I present |
|---|
| 81 | the blind astrometric calibration task as an instance of object |
|---|
| 82 | recognition, and review some previous approaches to the problem. |
|---|
| 83 | |
|---|
| 84 | %% Our approach? |
|---|
| 85 | |
|---|
| 86 | |
|---|
| 87 | The remaining chapters present our approach to the blind astrometric |
|---|
| 88 | calibration problem. \Chapref{chap:techreport} explains our approach |
|---|
| 89 | and presents the results of large-scale tests of the system on |
|---|
| 90 | real-world data. Chapters \ref{chap:verify} and \ref{chap:kdtree} |
|---|
| 91 | delve into details of the approach: \Chapref{chap:verify} presents the |
|---|
| 92 | Bayesian decision theory problem that lies at the heart of our |
|---|
| 93 | approach, while \chapref{chap:kdtree} explains the technical details |
|---|
| 94 | of the \kdtree data structure implementation that is key to making our |
|---|
| 95 | system performant enough to be a practical tool. |
|---|
| 96 | \Chapref{chap:conclusion} summarizes our results. |
|---|
| 97 | |
|---|
| 98 | |
|---|
| 99 | \section{Astronomical imaging for computer scientists} |
|---|
| 100 | \label{sec:astrointro} |
|---|
| 101 | |
|---|
| 102 | Astronomical images are very different from images produced by typical |
|---|
| 103 | consumer-grade cameras. This section is intended to give |
|---|
| 104 | non-astronomers a sense of the typical parameters of contemporary |
|---|
| 105 | astronomical imaging setups, and introduce some of the terminology and |
|---|
| 106 | key ideas. |
|---|
| 107 | |
|---|
| 108 | |
|---|
| 109 | The Sloan Digital Sky Survey (SDSS) \cite{sdsstechnical, sdsscamera} |
|---|
| 110 | is one of the most important projects in contemporary astronomy. The |
|---|
| 111 | primary imaging survey was carried out from 2000 through 2008, so the |
|---|
| 112 | hardware is no longer cutting-edge, but it is still very impressive. |
|---|
| 113 | The telescope's primary mirror is $2.5$ meters in diameter. Incoming |
|---|
| 114 | light first strikes the primary mirror, then a one-meter secondary |
|---|
| 115 | mirror, then passes through two corrective lenses in front of the |
|---|
| 116 | camera. The nearly distortion-free field of view is about $3$ degrees |
|---|
| 117 | wide. The camera is composed of $30$ charge-coupled devices (CCDs), |
|---|
| 118 | each $2048\times2048$ pixels, arranged in six columns. The five CCDs |
|---|
| 119 | in each column have different bandpass filters, ranging from the |
|---|
| 120 | near-infrared ($913~\nanometers$) through the optical to the |
|---|
| 121 | near-ultraviolet ($354~\nanometers$). The CCDs and associated |
|---|
| 122 | electronics are cooled with liquid nitrogen to $-80^{\circ} \unit{C}$ |
|---|
| 123 | and held under vacuum. Each CCD is about $5~\unit{cm}$ square, and |
|---|
| 124 | the whole focal plane of the camera is about $65~\unit{cm}$ in |
|---|
| 125 | diameter. The camera assembly has a mass over $300~\unit{kg}$. |
|---|
| 126 | |
|---|
| 127 | |
|---|
| 128 | The survey operates in ``drift-scanning'' mode: the telescope is |
|---|
| 129 | pointed in a nearly fixed direction, and as the Earth rotates the sky |
|---|
| 130 | drifts past the camera. The camera electronics shift the electrons |
|---|
| 131 | along the CCD columns at the same rate. As an object drifts through |
|---|
| 132 | the field of view, the electrons it produces in the CCD pixels march |
|---|
| 133 | along with it, and are read out as it disappears from view. In the |
|---|
| 134 | SDSS camera, the object drifts past each of the five bandpass-filtered |
|---|
| 135 | CCDs of the camera in turn, producing a five-color image. Over the |
|---|
| 136 | course of a night of observing, the camera sweeps out a great circle, |
|---|
| 137 | creating six columns of five-band images that are $2048$ pixels wide |
|---|
| 138 | and over a million pixels long: over $120$ gigabytes of pixels per |
|---|
| 139 | night. This clever drift-scanning scheme has many advantages, one |
|---|
| 140 | being that the readout rate of the CCDs can be relatively slow, |
|---|
| 141 | resulting in low readout noise. Calibration of the camera is simpler |
|---|
| 142 | because charge is integrated over columns of the CCD: it is only |
|---|
| 143 | necessary to calibrate whole columns rather than individual pixels. |
|---|
| 144 | Note that for this scheme to work, the CCDs must be very carefully |
|---|
| 145 | aligned in the focal plane, and the telescope control system must be |
|---|
| 146 | capable of keeping the system pointed stably at the level of a pixel |
|---|
| 147 | or better. The moving mass of the telescope is over |
|---|
| 148 | $15,000~\unit{kg}$, so this is no mean feat! |
|---|
| 149 | |
|---|
| 150 | % 24 um pixels |
|---|
| 151 | |
|---|
| 152 | Astronomical imaging systems are designed so that the point-spread |
|---|
| 153 | function (PSF)---the image on the CCD of a distant point source of |
|---|
| 154 | light---is spread over several pixels. This allows the position of a |
|---|
| 155 | point source to be measured to sub-pixel accuracy by fitting a model |
|---|
| 156 | of the PSF to the observed pixel values. If the system focused a |
|---|
| 157 | point source onto a single pixel, this would be impossible. The goal |
|---|
| 158 | is typically for the full-width at half-maximum (FWHM) of the PSF to |
|---|
| 159 | be a few pixels, so that the PSF is well-sampled but the signal is not |
|---|
| 160 | spread out over too many pixels. |
|---|
| 161 | |
|---|
| 162 | For ground-based telescopes, the moving atmosphere causes the image of |
|---|
| 163 | a distant point source of light to dance around on the CCD; in typical |
|---|
| 164 | exposures of many seconds this ``speckling'' is integrated into a |
|---|
| 165 | point-spread function. The size of this PSF is called the ``seeing'' |
|---|
| 166 | and a value of one $\unit{arcsecond}$ FWHM is considered quite good. |
|---|
| 167 | The SDSS camera has pixels of size $0.396~\unit{arcseconds}$, and the |
|---|
| 168 | point-spread function is dominated by the seeing: the optics of the |
|---|
| 169 | system are good enough that the images are as clear as they can be |
|---|
| 170 | given the atmosphere. |
|---|
| 171 | |
|---|
| 172 | |
|---|
| 173 | The positions of astronomical objects are usually measured in an |
|---|
| 174 | \emph{equatorial coordinate system}: essentially the projection of the |
|---|
| 175 | Earth's latitudes and longitudes onto the celestial sphere at a |
|---|
| 176 | specific instant. The latitudinal direction is called ``declination'' |
|---|
| 177 | and abbreviated ``Dec'', and the longitudinal direction is called |
|---|
| 178 | ``right ascension'' and abbreviated ``RA''. Declination is typically |
|---|
| 179 | measured in degrees from $-90$ to $+90$, and right ascension is |
|---|
| 180 | typically measured in degrees from $0$ to $360$, or hours from $0$ to |
|---|
| 181 | $24$. There are $60~\unit{arcminutes}$ per degree and |
|---|
| 182 | $60~\unit{arcseconds}$ per $\unit{arcminute}$. For reference, the |
|---|
| 183 | Moon as seen from Earth is about half a degree in diameter. The |
|---|
| 184 | $0.396~\unit{arcsecond}$ pixels of the SDSS camera view an area the |
|---|
| 185 | size of a penny at a distance of ten kilometers; each $2048\times2048$ |
|---|
| 186 | pixel CCD views an area of about one-millionth of the sky. |
|---|
| 187 | |
|---|
| 188 | |
|---|
| 189 | The SDSS is very impressive, but the next generation of telescopes |
|---|
| 190 | currently in development dwarf its specifications. The Large Synoptic |
|---|
| 191 | Survey Telescope (LSST) \cite{lsst}, for example, has an |
|---|
| 192 | $8.4~\unit{meter}$ primary mirror, with a $3.5~\unit{degree}$ field of |
|---|
| 193 | view almost completely covered by an array of nearly $200$ CCDs, each |
|---|
| 194 | with $4096\times4096$ pixels: a total of about $3.2$ gigapixels. The |
|---|
| 195 | camera is the size of a small car. It will take exposures of about |
|---|
| 196 | $15$ seconds, yielding over $10$ terabytes of data per night. |
|---|
| 197 | |
|---|
| 198 | % -- basics of data reduction; catalogs |
|---|
| 199 | % -- stars, galaxies, and other stuff |
|---|
| 200 | |
|---|
| 201 | |
|---|
| 202 | |
|---|
| 203 | |
|---|
| 204 | \section{Pattern recognition} |
|---|
| 205 | |
|---|
| 206 | \emph{Pattern recognition} encompasses a huge class of problems and |
|---|
| 207 | techniques in computer vision and machine learning. Many pattern |
|---|
| 208 | recognition problems are of the type ``identify this object'' or |
|---|
| 209 | ``find things like this'': the system is first told about a large set |
|---|
| 210 | of objects, then is presented with a novel object and is asked to |
|---|
| 211 | describe it or find similar objects from the set of known objects. |
|---|
| 212 | |
|---|
| 213 | Pattern recognition systems often take a \emph{feature matching} |
|---|
| 214 | approach. A \emph{feature} is a compact description of |
|---|
| 215 | (problem-specific) relevant, invariant properties of an object, such |
|---|
| 216 | that objects that are similar in the problem domain have similar |
|---|
| 217 | features. Defining how features are extracted from objects and |
|---|
| 218 | defining the similarity between features are key factors in the |
|---|
| 219 | success of such systems. Often the representation of a feature (the |
|---|
| 220 | \emph{feature descriptor}) is a point in a real coordinate space and a |
|---|
| 221 | distance metric defines the similarity between features. |
|---|
| 222 | |
|---|
| 223 | Feature-based approaches typically involve two phases: an |
|---|
| 224 | \emph{indexing} phase, in which the set of known objects is shown to |
|---|
| 225 | the system, and a \emph{test} phase, in which novel objects are |
|---|
| 226 | presented. During indexing, features are extracted from the set of |
|---|
| 227 | known objects. At test time, features are extracted from the novel |
|---|
| 228 | object and compared with the set of features from the known objects. |
|---|
| 229 | In many domains, the set of known objects is very large, so it is |
|---|
| 230 | critical that features be stored in a manner that allows the system to |
|---|
| 231 | rapidly find known features that match a given feature. |
|---|
| 232 | |
|---|
| 233 | Often feature-based approaches suffer \emph{feature aliasing} |
|---|
| 234 | problems: two objects that are dissimilar in the problem domain may |
|---|
| 235 | yield similar features. In this case, systems may use \emph{voting} |
|---|
| 236 | or \emph{verification} schemes to ensure that aliased (false) feature |
|---|
| 237 | matches are eliminated. Voting schemes typically assume that false |
|---|
| 238 | feature matches are independently randomly distributed, so the |
|---|
| 239 | probability of matching several features to the same incorrect object |
|---|
| 240 | is the product of the individual probabilities; if we demand a |
|---|
| 241 | sufficient number of matches, the probability of a false match becomes |
|---|
| 242 | small. Verification or \emph{hypothesize-test} schemes treat a |
|---|
| 243 | feature match as a hypothesis that the novel object is similar to a |
|---|
| 244 | known object, and then some test to decide if the hypothesis is |
|---|
| 245 | correct. Such schemes can be seen as using feature matching as a |
|---|
| 246 | search heuristic. |
|---|
| 247 | |
|---|
| 248 | \section{Visual pattern recognition} |
|---|
| 249 | |
|---|
| 250 | Pattern recognition in images is a particularly challenging and |
|---|
| 251 | complex problem domain which has been studied extensively. Objects to |
|---|
| 252 | recognize include faces, fingerprints, specific physical objects, or |
|---|
| 253 | even general classes of physical objects such as mugs, chairs, or |
|---|
| 254 | elephants. |
|---|
| 255 | |
|---|
| 256 | |
|---|
| 257 | Visual pattern recognition of general physical (3-dimensional) objects |
|---|
| 258 | is difficult for many reasons: objects may be deformable, can appear |
|---|
| 259 | different when viewed from different angles or under different |
|---|
| 260 | lighting conditions, and can be occluded by other objects. Features |
|---|
| 261 | used in visual pattern recognition include edges (lines and curves), |
|---|
| 262 | corners, or the color or texture of image patches. |
|---|
| 263 | |
|---|
| 264 | |
|---|
| 265 | An important consideration when designing features for visual pattern |
|---|
| 266 | recognition is the size (\ie, spatial extent in the image) of the |
|---|
| 267 | feature. \emph{Local features} gather information from a small region |
|---|
| 268 | of the image, while \emph{global features} incorporate information |
|---|
| 269 | from the whole image. Local features can provide robustness to |
|---|
| 270 | occlusion and deformation: if part of the object is hidden, features |
|---|
| 271 | belonging to that part will be lost, but the features belonging to the |
|---|
| 272 | remaining parts will still be found. Since local features aggregate |
|---|
| 273 | information from only a small portion of the image, they may be more |
|---|
| 274 | noisy or less distinctive than global features. |
|---|
| 275 | |
|---|
| 276 | |
|---|
| 277 | Since a single local feature may be insufficient to guarantee that an |
|---|
| 278 | object has been correctly recognized, some systems build high-level |
|---|
| 279 | features by combining multiple low-level features. When the objects |
|---|
| 280 | to be recognized are rigid or articulated (composed of multiple rigid |
|---|
| 281 | parts connected by simple joints), a common approach is to build |
|---|
| 282 | features that describe the relative geometry of low-level features. |
|---|
| 283 | This use of \emph{geometric features} is often called \emph{geometric |
|---|
| 284 | hashing}, since the result is a feature descriptor or \emph{hash code} |
|---|
| 285 | which describes the geometric arrangement of the low-level features. |
|---|
| 286 | |
|---|
| 287 | |
|---|
| 288 | |
|---|
| 289 | \section{The geometric hashing framework} |
|---|
| 290 | |
|---|
| 291 | \begin{figure} |
|---|
| 292 | \begin{center} |
|---|
| 293 | % dot lamdan.dot -Tps2 -o lamdan.ps |
|---|
| 294 | % ps2pdf -sPAPERSIZE=a4 lamdan.ps lamdan.pdf |
|---|
| 295 | \includegraphics[height=0.8\textheight]{lamdan} |
|---|
| 296 | \end{center} |
|---|
| 297 | \caption{Outline of the Geometric Hashing scheme. This diagram is a |
|---|
| 298 | reproduction of \fig 2 in Lamdan \etal \cite{lamdan1990}. |
|---|
| 299 | \label{lamdan}} |
|---|
| 300 | \end{figure} |
|---|
| 301 | |
|---|
| 302 | |
|---|
| 303 | Lamdan \etal \cite{lamdan1990} and Wolfson and Rigoutsos |
|---|
| 304 | \cite{wolfson1997} give an excellent overview of the geometric hashing |
|---|
| 305 | approach. See Figure \ref{lamdan} for a block-diagram overview. |
|---|
| 306 | Geometric hashing is based on the fact that when rigid objects undergo |
|---|
| 307 | similarity transformations (translation, rotation, and scaling), |
|---|
| 308 | angles between points on the object and relative distances between |
|---|
| 309 | points do not change. Therefore, if we describe the relative |
|---|
| 310 | arrangement of points on an object in terms of these invariants, our |
|---|
| 311 | description will be invariant under these transformations of the |
|---|
| 312 | object. Invariant geometric properties can be found for other types |
|---|
| 313 | of transformations such as affine or perspective projection. |
|---|
| 314 | |
|---|
| 315 | |
|---|
| 316 | Geometric hashing builds \emph{geometric features} out of lower-level |
|---|
| 317 | features. We will call the low-level features \emph{interest points} |
|---|
| 318 | for clarity. |
|---|
| 319 | |
|---|
| 320 | |
|---|
| 321 | Given the types of transformations the system must handle, there is |
|---|
| 322 | some sufficient number of interest points required to define an |
|---|
| 323 | invariant local coordinate system; these are called the \emph{basis |
|---|
| 324 | set}. For example, for 2-D objects that can undergo similarity |
|---|
| 325 | transformations, two interest points can be used to define a |
|---|
| 326 | coordinate system. For 3-D objects with similarity transformations, |
|---|
| 327 | three (non-collinear) interest points are required. |
|---|
| 328 | |
|---|
| 329 | Geometric hashing follows an indexing approach. During the indexing |
|---|
| 330 | phase, the system preprocesses a database of objects which are to be |
|---|
| 331 | recognized. For each object, many interest points are extracted which |
|---|
| 332 | are invariant under the transformations that are expected, and can be |
|---|
| 333 | well localized. The system then enumerates all basis sets by looking |
|---|
| 334 | at all combinations of interest points; each set of basis features |
|---|
| 335 | defines a relative coordinate system. For each basis set, all |
|---|
| 336 | remaining interest points and enumerated and their positions described |
|---|
| 337 | in this coordinate system. Each relative coordinate vector forms a |
|---|
| 338 | \emph{geometric feature descriptor}. In this way, the relative |
|---|
| 339 | geometric arrangement of a set of interest points is converted into a |
|---|
| 340 | point in a vector space. |
|---|
| 341 | |
|---|
| 342 | % --- consider adding a variant of the quad fig here. |
|---|
| 343 | |
|---|
| 344 | In the standard geometric hashing approach, the geometric feature |
|---|
| 345 | descriptor is discretized and encoded as a hash table key. A grid is |
|---|
| 346 | place over the feature descriptor space, and the hash table key of a |
|---|
| 347 | geometric feature is the identifier of the grid cell containing it. |
|---|
| 348 | Each entry in the hash table maps back to the basis set plus the |
|---|
| 349 | feature whose position is encoded. |
|---|
| 350 | |
|---|
| 351 | % --- picture? |
|---|
| 352 | |
|---|
| 353 | Given a new image, a geometric hashing system extracts all interest |
|---|
| 354 | points from the image and, as at indexing time, proceeds to enumerate |
|---|
| 355 | all basis sets and computes the positions of the remaining interest |
|---|
| 356 | points in each basis set. This generates many geometric features |
|---|
| 357 | whose descriptors are discretized and converted to hash table keys, as |
|---|
| 358 | before. Similar arrangements of interest points will lead to similar |
|---|
| 359 | geometric feature descriptors, which--with luck---will be mapped to |
|---|
| 360 | the same hash table key. The hash table values associated with each |
|---|
| 361 | key are retrieved, and each value that is found is considered a match. |
|---|
| 362 | Each match votes for the correspondence between the interest points |
|---|
| 363 | from the database and the interest points from the image that were |
|---|
| 364 | used to define the geometric feature. |
|---|
| 365 | |
|---|
| 366 | |
|---|
| 367 | After all features from the image have been processed, many votes will |
|---|
| 368 | have accumulated. False matches (due to aliasing or hash table |
|---|
| 369 | collisions) should be uniformly distributed over the set of known |
|---|
| 370 | objects, so should not generate very many votes for any particular |
|---|
| 371 | object. Objects that are actually present in the image should |
|---|
| 372 | generate many votes. Objects that receive an insufficient number of |
|---|
| 373 | votes are rejected, and the remainder are subjected to a verification |
|---|
| 374 | process. |
|---|
| 375 | |
|---|
| 376 | |
|---|
| 377 | The verification process requires computing a transformation from the |
|---|
| 378 | object to image coordinates, using the corresponding interest points |
|---|
| 379 | from the known object and the image. Using this transformation, all |
|---|
| 380 | the interest points belonging to the known object are projected into |
|---|
| 381 | image coordinates. If the match is correct, we expect the image to |
|---|
| 382 | contain features at these locations. Some researchers add an extra |
|---|
| 383 | verification stage by projecting the known object into the image and |
|---|
| 384 | checking that the predicted edges of the object are found in the image |
|---|
| 385 | (eg, \cite{lamdan1990, huttenlocher1990}). |
|---|
| 386 | |
|---|
| 387 | |
|---|
| 388 | The geometric hashing approach is flexible and applicable to many |
|---|
| 389 | problems, but there are some issues with the basic version described |
|---|
| 390 | here. The hash function described above simply places a grid over the |
|---|
| 391 | relative coordinate system and returns the identifier of the grid cell |
|---|
| 392 | in which the interest point lands. This is prone to edge effects: if |
|---|
| 393 | an interest point is near the edge of a grid cell, a small change in |
|---|
| 394 | its position due to noise can move it across the edge into a different |
|---|
| 395 | grid cell, which causes the hash code to change and a different set of |
|---|
| 396 | feature matches to be found. This problem can be overcome by |
|---|
| 397 | detecting interest points that are near the edges of their cells and |
|---|
| 398 | also searching the hash table using the hash codes of neighbouring |
|---|
| 399 | cells. This results in more false feature matches, and can become |
|---|
| 400 | expensive as the dimensionality of the geometric feature space |
|---|
| 401 | increases (there are more edges, and the proportion of the hypervolume |
|---|
| 402 | of feature space that is near an edge increases). Using a data |
|---|
| 403 | structure other than the hash table may be beneficial. |
|---|
| 404 | |
|---|
| 405 | |
|---|
| 406 | Another issue with uniform grid-based hashing is that it assigns equal |
|---|
| 407 | importance (and storage space) to each grid cell in the feature space. |
|---|
| 408 | Consider an interest point that is far from the basis set that defines |
|---|
| 409 | its local coordinate system. Small positional errors in the basis set |
|---|
| 410 | can cause the coordinate system to be rotated or scaled, which results |
|---|
| 411 | in large changes in the interest point's relative coordinates. This |
|---|
| 412 | means that edge effects become more pronounced in such regions of the |
|---|
| 413 | feature space. One response to this concern is to increases the size |
|---|
| 414 | of the grid cells far from the basis set, but then if the interest |
|---|
| 415 | points are uniformly distributed, these larger hash table bins will |
|---|
| 416 | contain more features, so every match to these bins will be |
|---|
| 417 | accompanied by more false positives. |
|---|
| 418 | |
|---|
| 419 | |
|---|
| 420 | Finally, a hash table with equal-sized cells will only be uniformly |
|---|
| 421 | utilized if interest points are uniformly distributed in feature |
|---|
| 422 | space, but some domains may yield interest points that are far from |
|---|
| 423 | uniformly distributed. In other domains, it may be beneficial to |
|---|
| 424 | place constraints on the geometric features that are used (for |
|---|
| 425 | example, to avoid regions that are prone to large errors, or that are |
|---|
| 426 | relatively indistinctive). |
|---|
| 427 | |
|---|
| 428 | |
|---|
| 429 | There is nothing in the geometric hashing framework that requires a |
|---|
| 430 | standard hash table to be used for feature matching. Other methods of |
|---|
| 431 | feature matching are described in the next section. |
|---|
| 432 | |
|---|
| 433 | |
|---|
| 434 | \section{Related work in fast feature matching} |
|---|
| 435 | |
|---|
| 436 | |
|---|
| 437 | For feature matching in a geometric hashing framework, the database of |
|---|
| 438 | known objects is converted into a set of points in a feature |
|---|
| 439 | descriptor space, which is a vector space. To find matches to a new |
|---|
| 440 | feature (extracted from an image in which we want to recognize |
|---|
| 441 | objects), we need to find all points in the vector space that are |
|---|
| 442 | within some tolerance. Usually the Euclidean distance metric is |
|---|
| 443 | applied, so the tolerance is a radius in the vector space. |
|---|
| 444 | |
|---|
| 445 | |
|---|
| 446 | \subsection{Bloom filters} |
|---|
| 447 | |
|---|
| 448 | A Bloom filter \cite{bloom1970} is a data structure that can answer |
|---|
| 449 | queries of the form ``is feature $q$ a member of the set of known |
|---|
| 450 | features?'' A Bloom filter is a set of $n$ bits, initially zero. A |
|---|
| 451 | set of $k$ hash functions must be defined which take a feature as |
|---|
| 452 | input and produce an integer in $[0, n)$ as output. In the indexing |
|---|
| 453 | phase, we examine each of the known features, and apply each of the |
|---|
| 454 | hash functions, producing $k$ hash codes. Each code is used to |
|---|
| 455 | reference a bit and \emph{set} it (\ie, turn it on). Each known |
|---|
| 456 | feature results in $k$ bits being set; several features may request |
|---|
| 457 | that a particular bit be set. |
|---|
| 458 | |
|---|
| 459 | |
|---|
| 460 | Given a new feature to match, we run the $k$ hash functions on it and |
|---|
| 461 | test whether each of the resulting bits are set. If the feature is |
|---|
| 462 | equal to a feature seen at indexing time (according to the hash |
|---|
| 463 | functions), then all $k$ of the bits will have been set; thus the |
|---|
| 464 | Bloom filter does not produce false negatives. If the query feature |
|---|
| 465 | is not equal to a known feature, then some of the bits may have been |
|---|
| 466 | set due to \emph{collision} of the hash functions, but the feature |
|---|
| 467 | will only be accepted if all $k$ hash functions result in collision; |
|---|
| 468 | this results in a false positive. The rate of false positives |
|---|
| 469 | increases as the number of known features increases relative to the |
|---|
| 470 | number of bits in the filter. The Bloom filter is ideally suited to |
|---|
| 471 | cases where the number of known features is small, and a verification |
|---|
| 472 | step can be applied to eliminate any false positives that are |
|---|
| 473 | produced. |
|---|
| 474 | |
|---|
| 475 | |
|---|
| 476 | For feature matching, we want to know not only whether the query |
|---|
| 477 | feature $q$ matches a known feature, we want to know \emph{which} |
|---|
| 478 | known feature matches. One way to answer such queries is to use a |
|---|
| 479 | \emph{Bloomier filter} \cite{chazelle2004}. The simplest Bloomier |
|---|
| 480 | filter uses multiple Bloom filters to categorize a given feature into |
|---|
| 481 | two different categories, and uses slightly more space than two Bloom |
|---|
| 482 | filters. A set of $n$ such Bloomier filters can be used to categorize |
|---|
| 483 | a feature into $2^n$ different categories by assigning one Bloomier |
|---|
| 484 | filter to each bit of the result. |
|---|
| 485 | |
|---|
| 486 | |
|---|
| 487 | The standard Bloom filter only finds exact matches, but we want to |
|---|
| 488 | find \emph{nearby} matches. An extension of Bloom filters to allow |
|---|
| 489 | proximity searches is given by Kirsch and Mitzenmacher |
|---|
| 490 | \cite{kirsch2006b}. This variant uses \emph{locality sensitive |
|---|
| 491 | hashing} (LSH, discussed below) as its hashing function, so it |
|---|
| 492 | inherits some limitations from LSH. Most importantly, LSH only |
|---|
| 493 | guarantees with some probability that nearby features will hash to the |
|---|
| 494 | same bin. As a result, the locality-sensitive Bloom filter can |
|---|
| 495 | produce false negatives as well as false positives. |
|---|
| 496 | |
|---|
| 497 | |
|---|
| 498 | In order to answer feature matching queries (find all known features |
|---|
| 499 | near a given query feature), one could combine these Bloom filter |
|---|
| 500 | extensions to build a locality-sensitive Bloomier filter. However, |
|---|
| 501 | the false negative rate of the locality-sensitive filter would be |
|---|
| 502 | magnified because the Bloomier filter requires \mbox{$2 |
|---|
| 503 | \ceil{\log(n)}$} separate filters to represent $n$ known features, and |
|---|
| 504 | all of these filters must produce the correct answer. It seems that |
|---|
| 505 | Bloom filters are not particularly well suited to this task. |
|---|
| 506 | |
|---|
| 507 | |
|---|
| 508 | \subsection{Locality Sensitive Hashing} |
|---|
| 509 | |
|---|
| 510 | Locality Sensitive Hashing (LSH) \cite{datar2004, gionis1999, |
|---|
| 511 | andoni2006}, is a technique for answering approximate nearest |
|---|
| 512 | neighbour queries: it returns a feature whose distance is within a |
|---|
| 513 | constant factor of the distance to the nearest neighbour. The LSH |
|---|
| 514 | scheme can be extended to handle exact nearest-neighbour queries |
|---|
| 515 | efficiently if the set of known features satisfies some \emph{bounded |
|---|
| 516 | growth} criteria. It is intended for use in high-dimensional spaces |
|---|
| 517 | (tens to thousands of dimensions), and $\ell_p$ distance metrics for |
|---|
| 518 | $p \in (0, 2]$. |
|---|
| 519 | |
|---|
| 520 | |
|---|
| 521 | The core idea of LSH is to use several hash functions with the |
|---|
| 522 | property that the probability of collision---features being placed in |
|---|
| 523 | the same bin---is much higher for nearby features than for distant |
|---|
| 524 | ones. In the indexing stage, we populate the hash table by running |
|---|
| 525 | each hash function on each known feature. To perform a query, we run |
|---|
| 526 | the hash functions on the query feature and retrieve the features in |
|---|
| 527 | the same hash table bin as the query. One of these features is likely |
|---|
| 528 | to be an approximate nearest neighbour of the query. |
|---|
| 529 | |
|---|
| 530 | |
|---|
| 531 | In practice, in \cite{datar2004} each hash function is composed of |
|---|
| 532 | several (typically $10$) simpler hash functions. Each simple hash |
|---|
| 533 | function performs a random projection in the feature space: the hash |
|---|
| 534 | value is the discretized value of the dot product of the feature |
|---|
| 535 | vector with the hash function's random vector. Combining several of |
|---|
| 536 | these simple hash functions is equivalent to projecting the feature |
|---|
| 537 | vectors onto a grid in a $10$-dimensional, non-orthogonal, subspace. |
|---|
| 538 | The hash code is simply the identifier of the grid cell in which the |
|---|
| 539 | feature falls. |
|---|
| 540 | |
|---|
| 541 | |
|---|
| 542 | This scheme suffers from edge effects: a query feature may land in a |
|---|
| 543 | bin other than the bin containing its nearest neighbour due to small |
|---|
| 544 | positional noise in any of the dimensions. Attempting to reduce this |
|---|
| 545 | effect by making the bin size (discretization level) larger fails |
|---|
| 546 | because the number of hash functions must be increased to maintain the |
|---|
| 547 | same total number of hash bins. |
|---|
| 548 | |
|---|
| 549 | |
|---|
| 550 | In order to reduce the number of false negatives, we can choose many |
|---|
| 551 | (typically $30$) different hash functions and place a reference to |
|---|
| 552 | each feature in the hash table bin chosen by each function. This |
|---|
| 553 | increases the probability that at least one of the hash functions will |
|---|
| 554 | select a $10$-dimensional subspace in which the query is close to its |
|---|
| 555 | nearest neighbour. |
|---|
| 556 | |
|---|
| 557 | |
|---|
| 558 | The standard LSH scheme does not seem particularly well suited for the |
|---|
| 559 | purpose of feature matching in a geometric hashing system, since the |
|---|
| 560 | dimensionality of the feature vectors is usually moderate ($2$ to $6$ |
|---|
| 561 | in the \an system). Since some feature aliasing is expected in |
|---|
| 562 | geometric hashing systems, the nearest neighbour may not be the |
|---|
| 563 | correct match: we would prefer to get all neighbours within a given |
|---|
| 564 | search radius. |
|---|
| 565 | |
|---|
| 566 | |
|---|
| 567 | \subsection{\Kdtrees} |
|---|
| 568 | |
|---|
| 569 | \begin{figure} |
|---|
| 570 | \begin{center} |
|---|
| 571 | % tree-figs.py : plot_bboxes |
|---|
| 572 | \begin{tabular}{c@{\hspace{3pt}}c@{\hspace{3pt}}c} |
|---|
| 573 | \includegraphics[width=0.31\textwidth]{kdtree-0} & |
|---|
| 574 | \includegraphics[width=0.31\textwidth]{kdtree-1} & |
|---|
| 575 | \includegraphics[width=0.31\textwidth]{kdtree-2} \\ |
|---|
| 576 | \includegraphics[width=0.31\textwidth]{kdtree-3} & |
|---|
| 577 | \includegraphics[width=0.31\textwidth]{kdtree-4} & |
|---|
| 578 | \includegraphics[width=0.31\textwidth]{kdtree-5} |
|---|
| 579 | \end{tabular} |
|---|
| 580 | \end{center} |
|---|
| 581 | \caption[A \kdtree partitioning of space.]{A \kdtree partitioning of |
|---|
| 582 | space. Each panel shows the nodes at one level of the tree. The |
|---|
| 583 | points are 200 samples from a ring with uniformly-distributed |
|---|
| 584 | radius in $\left(0.6, 1\right)$ and uniformly-distributed angle in |
|---|
| 585 | $\left(0, \displaystyle{\frac{\pi}{2}}\right)$. Nodes are split |
|---|
| 586 | along the dimension with the largest range, and the splitting |
|---|
| 587 | position is chosen so that the two child nodes will contain an |
|---|
| 588 | equal number of points. Note how the tree adapts to the |
|---|
| 589 | distribution of the data: each node tightly bounds the data points |
|---|
| 590 | it owns. This is a slightly different version of the \kdtree than |
|---|
| 591 | Bentley's original description.\label{kdtree}} |
|---|
| 592 | \end{figure} |
|---|
| 593 | |
|---|
| 594 | The $k$-dimensional tree (\kdtree) was introduced by Bentley in 1975 |
|---|
| 595 | \cite{bentley1975}, and several extensions have been described |
|---|
| 596 | \cite{friedman1977, sproull1991}. The idea is to build a binary tree |
|---|
| 597 | over the feature space, where each node ``owns'' a region of the space |
|---|
| 598 | which is disjoint from the region owned by its sibling node. Each |
|---|
| 599 | non-leaf node has an axis-aligned splitting hyper-plane; its left |
|---|
| 600 | child owns the subspace to the left of the splitting plane, and the |
|---|
| 601 | right child the subspace to the right. In this way, the \kdtree |
|---|
| 602 | defines a hierarchical subdivision of the feature space into |
|---|
| 603 | axis-aligned hyper-rectangles. See \figref{kdtree}. |
|---|
| 604 | |
|---|
| 605 | |
|---|
| 606 | For the purposes of indexing the features in a geometric hashing |
|---|
| 607 | system, the set of features is known beforehand and is static, so we |
|---|
| 608 | do not need to insert or delete features from the tree, and Bentley's |
|---|
| 609 | \emph{optimized} \kdtree construction method can be used. That is, |
|---|
| 610 | the tree can be built to adapt to the particular set of features we |
|---|
| 611 | have. |
|---|
| 612 | |
|---|
| 613 | |
|---|
| 614 | Tree construction proceeds by first assigning all features to the root |
|---|
| 615 | node, then recursively splitting nodes until each leaf node owns at |
|---|
| 616 | most $L$ features, with $L$ perhaps $10$ or $20$. Splitting is done |
|---|
| 617 | by selecting a splitting plane and using it to partition the features. |
|---|
| 618 | Sproull \cite{sproull1991} enumerates several strategies for choosing |
|---|
| 619 | the splitting hyperplane. Traditionally, \kdtrees use axis-aligned |
|---|
| 620 | splitting hyperplanes, and the splitting dimension is chosen by simply |
|---|
| 621 | iterating through the dimensions (as in Bentley's original paper), or |
|---|
| 622 | by selecting the dimension with the largest \emph{range} or |
|---|
| 623 | \emph{variance}. One can also use an arbitrary splitting hyperplane, |
|---|
| 624 | for example by finding the principal eigenvector of the covariance |
|---|
| 625 | matrix of the feature vectors. Typically the splitting hyperplane is |
|---|
| 626 | positioned so that the two children own an equal number of data |
|---|
| 627 | points. This leads to a balanced (and therefore short) tree, but in |
|---|
| 628 | some applications it can be beneficial to bias the splitting location |
|---|
| 629 | \cite{macdonald1990}. |
|---|
| 630 | |
|---|
| 631 | |
|---|
| 632 | Depending on the application, it can be beneficial to store at each |
|---|
| 633 | node the tight axis-aligned bounding box (and other summary statistics |
|---|
| 634 | \cite{deng1995}) of the points owned by that node (as shown in |
|---|
| 635 | \figref{kdtree}). This is particularly useful where there is |
|---|
| 636 | significant correlation between the dimensions of the data, because |
|---|
| 637 | splitting the data points along one dimension implies that the |
|---|
| 638 | bounding box will shrink in other dimensions as well, and this can |
|---|
| 639 | result in faster searches. If the bounding boxes are stored, then it |
|---|
| 640 | is no longer strictly necessary to store the splitting hyperplane, |
|---|
| 641 | though it may be useful to speed up some computations. |
|---|
| 642 | |
|---|
| 643 | |
|---|
| 644 | \comment{ |
|---|
| 645 | \begin{figure} |
|---|
| 646 | \begin{center} |
|---|
| 647 | \includegraphics[width=\textwidth]{rangesearch-bb-fig} |
|---|
| 648 | \end{center} |
|---|
| 649 | \caption[\Kdtree range search with bounding boxes.]{\Kdtree range |
|---|
| 650 | search for a query point $q$ in \kdtree node $N$, for a \kdtree that |
|---|
| 651 | stores the bounding box of each node. We check whether the query |
|---|
| 652 | hypersphere overlaps the child nodes $L$ and $R$. In both cases $q$ |
|---|
| 653 | is contained in the left child node $L$ so we must recurse on $L$. In |
|---|
| 654 | the top panel, the distance \emph{in the splitting dimension} from $q$ |
|---|
| 655 | to the bounding box of node $R$ is less than the search radius $r_1$ |
|---|
| 656 | so we do not need to recurse on $R$. In the lower panel, the distance |
|---|
| 657 | in the search dimension to the query is larger than the search radius |
|---|
| 658 | $r_2$ so we must recurse on $R$.} |
|---|
| 659 | \label{rangesearch-bb} |
|---|
| 660 | \end{figure} |
|---|
| 661 | |
|---|
| 662 | |
|---|
| 663 | \begin{figure} |
|---|
| 664 | \begin{center} |
|---|
| 665 | \includegraphics[width=\textwidth]{bohm2001fig7} |
|---|
| 666 | \end{center} |
|---|
| 667 | \caption[Lower and Upper Distance Bounds in a \kdtree.]{Lower and upper |
|---|
| 668 | bounds (min- and max-dist, respectively) on the distance from a query point |
|---|
| 669 | to a \kdtree bounding box. |
|---|
| 670 | This figure is copied from \cite{bohm2001} Figure 7.} |
|---|
| 671 | \label{upperlower} |
|---|
| 672 | \end{figure} |
|---|
| 673 | |
|---|
| 674 | |
|---|
| 675 | \begin{figure} |
|---|
| 676 | \begin{center} |
|---|
| 677 | \includegraphics[width=\textwidth]{rangesearch-split-fig} |
|---|
| 678 | \end{center} |
|---|
| 679 | \caption[\kdtree range search with splitting planes.]{\kdtree range |
|---|
| 680 | search for a query point $q$ in \kdtree node $N$, |
|---|
| 681 | for a \kdtree that only stores the splitting plane (not the bounding boxes). |
|---|
| 682 | Top: since the query is to the left of $N$'s splitting plane (dashed line), |
|---|
| 683 | we must recurse on the left child $L$. Since the distance in the splitting |
|---|
| 684 | dimension from the query to the splitting plane, $D_R$ is less than $r$, |
|---|
| 685 | we must recurse on $R$. |
|---|
| 686 | |
|---|
| 687 | Bottom: For the left node ($L$): the query is above the splitting plane, so we must recurse |
|---|
| 688 | on child $T$. The distance $D_B$ in the splitting dimension to the splitting plane |
|---|
| 689 | is less than $r$, so we must recurse on $B$ also. |
|---|
| 690 | |
|---|
| 691 | For the right node ($R$): the query is above the splitting plane, so we must recurse |
|---|
| 692 | on child $t$. The distance $D_b$ to the splitting plane is greater than $r$ so we |
|---|
| 693 | need not recurse on node $b$. (Note, we could also keep track of the \emph{total} |
|---|
| 694 | distance to the query point, $\sqrt{D_R^2 + D_b^2}$, but this requires more book-keeping.} |
|---|
| 695 | \label{rangesearch-sp} |
|---|
| 696 | \end{figure} |
|---|
| 697 | } |
|---|
| 698 | |
|---|
| 699 | For feature matching, we typically want to do \emph{range search}: |
|---|
| 700 | find all features within radius $r$ of a given query feature $q$. |
|---|
| 701 | This is easily achieved with a \kdtree using a recursive algorithm. |
|---|
| 702 | Starting at the root, we must decide whether we need to recurse on |
|---|
| 703 | each of the node's children. (How we make this decision is explained |
|---|
| 704 | below.) Once we reach a leaf, we enumerate the features owned by the |
|---|
| 705 | leaf and compute the distance to the query feature; any feature with |
|---|
| 706 | distance less than $r$ is accepted. |
|---|
| 707 | |
|---|
| 708 | |
|---|
| 709 | We decide whether we must recurse on a node's children based on their |
|---|
| 710 | bounding boxes. We compute the \emph{mindist}---the lower-bound of |
|---|
| 711 | distance---between the query point and each child's bounding box. The |
|---|
| 712 | mindist is simply the distance to the corner or edge of the bounding |
|---|
| 713 | box, and can be computed quickly. Any node whose mindist to the query |
|---|
| 714 | point is less than the query radius $r$ must be visited. If the |
|---|
| 715 | \kdtree does not explicitly store the tight bounding-boxes of the |
|---|
| 716 | nodes, loose bounding-boxes can be constructed during the recursion |
|---|
| 717 | based on the splitting planes of the ancestors. |
|---|
| 718 | |
|---|
| 719 | |
|---|
| 720 | \comment{ |
|---|
| 721 | Nearest-neighbour search in \kdtrees is essentially the same as range |
|---|
| 722 | search, except that $r$ is considered the \emph{pruning distance}; any |
|---|
| 723 | nodes whose lower-bound distance is larger than the pruning distance |
|---|
| 724 | need not be visited. The pruning distance decreases monotonically as |
|---|
| 725 | the search proceeds and is simply the (upper bound of) the distance to |
|---|
| 726 | the nearest neighbour found so far. Since reducing the pruning |
|---|
| 727 | distance is critically important to achieving fast search, the order |
|---|
| 728 | in which nodes are explored is important; one typically keeps a |
|---|
| 729 | priority queue of nodes to visit, sorted by the (lower-bound) distance |
|---|
| 730 | of the node to the query feature. If the query feature falls in the |
|---|
| 731 | region owned by one of the leaf nodes, then all nodes on the path from |
|---|
| 732 | the root to that leaf will have lower-bound distance of zero, so this |
|---|
| 733 | path will be explored first and the ``greedy'' nearest neighbour |
|---|
| 734 | distance will become the first pruning distance. |
|---|
| 735 | |
|---|
| 736 | |
|---|
| 737 | The \kdtree can also be used for approximate nearest-neighbour or |
|---|
| 738 | range searches. Beis and Lowe \cite{beis1997} describe the Best Bin |
|---|
| 739 | First algorithm, which is essentially the same as the full |
|---|
| 740 | nearest-neighbour search except that the search terminates after |
|---|
| 741 | exploring a fixed maximum number of leaves or after a fixed period of |
|---|
| 742 | time has elapsed. The priority queue is sorted by the lower-bound |
|---|
| 743 | distance of the node's bounding box to the query point, so the `most |
|---|
| 744 | promising' nodes are explored first. |
|---|
| 745 | } |
|---|
| 746 | |
|---|
| 747 | |
|---|
| 748 | Chapter \ref{chap:kdtree} presents the \kdtree in more detail. |
|---|
| 749 | |
|---|
| 750 | |
|---|
| 751 | \subsection{Other approaches} |
|---|
| 752 | |
|---|
| 753 | % Voronoi: |
|---|
| 754 | % -de Berg 1997 |
|---|
| 755 | % (Mark de Berg , Marc van Kreveld , Mark Overmars , Otfried Schwarzkopf, Computational geometry: algorithms and applications, Springer-Verlag New York, Inc., Secaucus, NJ, 1997) |
|---|
| 756 | % -Edelsbrunner 1987 |
|---|
| 757 | % (Herbert Edelsbrunner, Algorithms in combinatorial geometry, Springer-Verlag New York, Inc., New York, NY, 1987) |
|---|
| 758 | % -Preparata and Shamon 1985 |
|---|
| 759 | % (Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985) |
|---|
| 760 | |
|---|
| 761 | In contrast to the \kdtree's rectangular decomposition of space, there |
|---|
| 762 | is a family of space decompositions that use hyperspheres. An example |
|---|
| 763 | is the \emph{ball-tree} \cite{uhlmann1991b}, in which each non-leaf |
|---|
| 764 | node is associated with one of the known features $f$ and a radius |
|---|
| 765 | $r$; the left child owns all features within radius $r$ of feature |
|---|
| 766 | $f$, and the right child owns the remaining features. Similarly, in |
|---|
| 767 | the \emph{anchors hierarchy} \cite{moore2000}, each non-leaf node is |
|---|
| 768 | represented by a feature (called its \emph{anchor}), and a node owns |
|---|
| 769 | all the features that are closer to its anchor than the anchor of its |
|---|
| 770 | sibling. These sphere-based trees are supposed to better capture the |
|---|
| 771 | structure of high-dimensional data sets, though the results seem to |
|---|
| 772 | depend quite strongly on the particular data distributions and search |
|---|
| 773 | parameters. |
|---|
| 774 | |
|---|
| 775 | |
|---|
| 776 | For two-dimensional feature spaces, algorithms based on planar point |
|---|
| 777 | location (eg, \cite{kirkpatrick1983}) and the Voronoi tesselation of |
|---|
| 778 | space yield $\mathcal{O}(n \log n)$ preprocessing time, a data |
|---|
| 779 | structure of size $\mathcal{O}(n)$, and query time of |
|---|
| 780 | $\mathcal{O}(\log{n})$. Unfortunately, in higher dimensions the space |
|---|
| 781 | required to store the Voronoi tesselation is |
|---|
| 782 | $\mathcal{O}(n^{\floor{d/2}})$, which quickly becomes untenable |
|---|
| 783 | \cite{arya1998}. |
|---|
| 784 | |
|---|
| 785 | % planar point location: |
|---|
| 786 | % -Dobkin and Lipton in 1976 O((log n)^2) time, O(n^2) space |
|---|
| 787 | % -Sarnak and Tarjan O(n) space, O(log n) time |
|---|
| 788 | % * Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees". Communications of the ACM 29 (7): 669-679. |
|---|
| 789 | % -Edelsbrunner, Guibas, and Stolfi, same |
|---|
| 790 | % * Herbert Edelsbrunner, Leonidas J. Guibas, Jorge Stolfi (1986). "Optimal point location in a monotone subdivision". SIAM Journal on Computing 15 (2): 317-340. |
|---|
| 791 | % -Kirkpatrick, same |
|---|
| 792 | % * David G. Kirkpatrick (1983). "Optimal Search in Planar Subdivisions". SIAM J. Comput. 12: 28-35. |
|---|
| 793 | % Voronoi tesselation: |
|---|
| 794 | % -B. Delaunay, Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934 |
|---|
| 795 | |
|---|
| 796 | The design of spatial indexing data structures and search algorithms |
|---|
| 797 | is itself a large research area. Dozens of different tree structures |
|---|
| 798 | have been proposed: in two surveys B\"ohm \cite{bohm2001} and |
|---|
| 799 | Hjaltason \cite{hjaltason2003} identify B-, $\textrm{B}^{+}$-, ball-, |
|---|
| 800 | bisector-, BSP-, DABS-, fq-, gh-, \mbox{GNA-,} hB-, $\textrm{hB}^{\pi}$-, |
|---|
| 801 | hybrid-, IQ-, kd-, kd-B-, $\textrm{LSD}^h$-, M-, mb-, |
|---|
| 802 | $\textrm{mb}^{\ast}$-, mvp-, oct-, \mbox{post-office-,} pyramid-, quad-, R-, |
|---|
| 803 | $\textrm{R}^{\ast}$-, $\textrm{R}^{+}$-, sa-, slim-, \mbox{sphere-,} |
|---|
| 804 | SR-, SS-, TV-, vp-, $\textrm{vp}^{\ast}$-, and X-trees. There is an |
|---|
| 805 | equally mind-boggling variety of exact and approximate search |
|---|
| 806 | algorithms. Each data structure and algorithm may work well in some |
|---|
| 807 | region of parameter space, but there is no clear winner for |
|---|
| 808 | general-purpose use. |
|---|
| 809 | |
|---|
| 810 | |
|---|
| 811 | \section{Astrometric calibration as a pattern recognition task} |
|---|
| 812 | |
|---|
| 813 | |
|---|
| 814 | \comment{ |
|---|
| 815 | wget "http://casjobs.sdss.org/ImgCutoutDR7/getjpeg.aspx?ra=166.45&dec=-0.03&scale=1&opt=&width=2000&height=2000" -O ngc3521-orig.jpg |
|---|
| 816 | jpegtopnm ngc3521-orig.jpg | pnmrotate -45 | pnmcut 600 900 1400 1000 | pnmscale -reduce 2 | pnmtojpeg > ngc3521.jpg |
|---|
| 817 | #---> http://live.astrometry.net/status.php?job=alpha-200906-68444159 |
|---|
| 818 | jpegtopnm ngc3521.jpg | ppmtopgm | pnminvert | pnmtojpeg > ngc3521-bw.jpg |
|---|
| 819 | wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=field.xy.fits" -O ngc3521.xy |
|---|
| 820 | wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=index.xy.fits" -O ngc3521-index.xy |
|---|
| 821 | jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white > ngc3521-sources.png |
|---|
| 822 | jpegtopnm ngc3521-bw.jpg | plotxy -N 100 -i ngc3521.xy -I - -x 1 -y 1 -C black -b white -P | plotxy -I - -i ngc3521-index.xy -x 1 -y 1 -C black -b white -s crosshair -P | plot-constellations -f 18 -w ngc3521.wcs -i - -N -o ngc3521-index.png |
|---|
| 823 | %%% wget "http://live.astrometry.net/status.php?job=alpha-200906-36181848&get=wcs.fits" -O ngc3521.wcs |
|---|
| 824 | %%% scp gmaps:/data2/test-merc/tycho.mkdt.fits . |
|---|
| 825 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-0.5&bb=0,-85,360,85&dashbox=0.1&w=500&h=500&lw=3" -O ngc3521-zoom0.png |
|---|
| 826 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=-1&bb=175.533,-17.7621663832,211.533,17.6598478619&dashbox=0.01&w=500&h=500&lw=3" -O ngc3521-zoom1.png |
|---|
| 827 | wget "http://explore.astrometry.net/tile/get/?layers=tycho,grid,userboundary&arcsinh&wcsfn=alpha/200906/36181848/wcs.fits&gain=0.5&bb=191.733,-1.85338140354,195.333,1.74602498613&w=500&h=500&lw=3" -O ngc3521-zoom2.png |
|---|
| 828 | for x in 0 1 2; do |
|---|
| 829 | pngtopnm ngc3521-zoom${x}.png | ppmtopgm | pnminvert | pnmtopng > ngc3521-zoom${x}-bw.png; |
|---|
| 830 | done |
|---|
| 831 | } |
|---|
| 832 | |
|---|
| 833 | |
|---|
| 834 | % ~/an-2/usnob-map/execs/tilerender -x 0.000000 -y -85.000000 -X 360.000000 -Y 85.000000 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.1 -d > tile1.png |
|---|
| 835 | % ~/an-2/usnob-map/execs/tilerender -x 151.724000 -y -4.784223 -X 187.724000 -Y 29.772221 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g -0.25 -W 'tor/200706/51145570/wcs.fits' -L 5 -B 0.01 -d > tile2.png |
|---|
| 836 | % ~/an-2/usnob-map/execs/tilerender -x 167.924000 -y 11.335527 -X 171.524000 -Y 14.841399 -w 1024 -h 1024 -l 'tycho' -l 'grid' -l 'boundary' -s -g 0.5 -W 'tor/200706/51145570/wcs.fits' -L 5 > tile3.png |
|---|
| 837 | % |
|---|
| 838 | % /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/index.xy.fits -S 1 -W 720 -H 503 -x 1 -y 1 -w 2 -r 6 -s s > ixy.pgm |
|---|
| 839 | % /home/gmaps/an-2/quads/plotxy2 -i /home/gmaps/ontheweb-data/tor/200706/51145570/field.xy.fits -S 1 -W 720 -H 503 -r 5 -x 1 -y 1 -w 2 -s c > fxy.pgm |
|---|
| 840 | % cp /home/gmaps/ontheweb-data/tor/200706/51145570/image.pnm undim.ppm |
|---|
| 841 | % pgmtoppm green ixy.pgm > ixy.ppm |
|---|
| 842 | % pnmcomp -alpha=ixy.pgm ixy.ppm undim.ppm > sum.ppm |
|---|
| 843 | % pnmcomp -alpha=fxy.pgm fxy.ppm sum.ppm > sum2.ppm |
|---|
| 844 | % pnmtopng sum2.ppm > sum.png |
|---|
| 845 | % pnmcomp -alpha=fxy.pgm fxy.ppm undim.ppm | pnmtopng > sources.png |
|---|
| 846 | % http://oven.cosmo.fas.nyu.edu/test/status.php?job=tor-200706-51145570 |
|---|
| 847 | |
|---|
| 848 | |
|---|
| 849 | \begin{figure} |
|---|
| 850 | \begin{center} |
|---|
| 851 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-bw}} \\ |
|---|
| 852 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-sources}} \\ |
|---|
| 853 | \framebox{\includegraphics[width=0.99\figunit]{ngc3521-index}} |
|---|
| 854 | \end{center} |
|---|
| 855 | \caption{\captionpart{Top:} Input image (credit: Sloan Digital Sky |
|---|
| 856 | Survey). \captionpart{Middle:} The brightest 100 sources extracted |
|---|
| 857 | from the image. \captionpart{Bottom:} Reference sources, transformed |
|---|
| 858 | into the image coordinate system (crosshairs). Many of the image and |
|---|
| 859 | reference sources are aligned, but there are many image sources |
|---|
| 860 | without reference sources. Our system knows about the positions of |
|---|
| 861 | many objects of interest on the sky, and has labelled the galaxy NGC |
|---|
| 862 | 3521.\label{fig:redgreen}} |
|---|
| 863 | \end{figure} |
|---|
| 864 | |
|---|
| 865 | |
|---|
| 866 | \begin{figure} |
|---|
| 867 | \begin{center} |
|---|
| 868 | \begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c} |
|---|
| 869 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom0-bw}} & |
|---|
| 870 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom1-bw}} & |
|---|
| 871 | \framebox{\includegraphics[width=0.31\textwidth]{ngc3521-zoom2-bw}} |
|---|
| 872 | \end{tabular} |
|---|
| 873 | \end{center} |
|---|
| 874 | \caption{The location of the input image on the sky. |
|---|
| 875 | \captionpart{Left:} The whole sky, in Mercator projection. The |
|---|
| 876 | sinusoid-shaped feature is the Milky Way. The dashed box shows the |
|---|
| 877 | zoomed-in region. \captionpart{Middle:} Zoomed in by a factor of |
|---|
| 878 | $10$. \captionpart{Right:} Zoomed in by a factor of $100$. The box |
|---|
| 879 | shows the outline of the input image.\label{fig:onthesky}} |
|---|
| 880 | \end{figure} |
|---|
| 881 | |
|---|
| 882 | |
|---|
| 883 | For modern astronomers, astrometric calibration is often one of the |
|---|
| 884 | first steps toward getting useful information out of an image of the |
|---|
| 885 | sky. Aligning a new image with an \emph{astrometric reference |
|---|
| 886 | catalog} allows the astronomer to place the image within a standard |
|---|
| 887 | coordinate frame. This allows stars, galaxies, and other objects |
|---|
| 888 | (\emph{sources}) in the new image to be identified with known sources, |
|---|
| 889 | which in turn allows astronomers to calibrate other properties of the |
|---|
| 890 | new image, and allows the positions of new sources to be described in |
|---|
| 891 | a standard reference frame. |
|---|
| 892 | |
|---|
| 893 | |
|---|
| 894 | The task of blind astrometric calibration---automatically finding the |
|---|
| 895 | astrometric calibration of an image, using only the information in the |
|---|
| 896 | image pixels---can be seen as a pattern recognition problem. As |
|---|
| 897 | Bertin \cite{bertin2005} notes, ``astrometric and photometric |
|---|
| 898 | calibrations have remained the most tiresome step in the reduction of |
|---|
| 899 | large imaging surveys,'' so this is not only an interesting problem to |
|---|
| 900 | solve, but one with practical implications for astronomers. |
|---|
| 901 | |
|---|
| 902 | |
|---|
| 903 | For the purposes of astrometric calibration, we can think of the sky |
|---|
| 904 | as a large two-dimensional surface: the stars are very distant, so our |
|---|
| 905 | viewpoint is effectively fixed. We are moving, as are the stars, but |
|---|
| 906 | these motions are small relative to the precision at which we |
|---|
| 907 | typically work. The sky contains many stars, galaxies, and other |
|---|
| 908 | astronomical sources. The stars and distant galaxies are effectively |
|---|
| 909 | point sources, while closer galaxies can be resolved. Astrometric |
|---|
| 910 | reference catalogs list the positions, motions, and brightnesses of |
|---|
| 911 | these sources and serve as the ``ground truth'' or database of known |
|---|
| 912 | (reference) objects. The USNO-B1 catalog \cite{usnob, nomad}, for |
|---|
| 913 | example, lists over one billion objects. As many as a few percent of |
|---|
| 914 | these are false detections or other artifacts \cite{barroncleaning}, |
|---|
| 915 | and some objects that should be visible are missing. |
|---|
| 916 | |
|---|
| 917 | |
|---|
| 918 | Extra sources can be due to planets, comets, satellites, or aircraft. |
|---|
| 919 | Missing or poorly localized source can be due to imperfections in the |
|---|
| 920 | imaging sensor, saturation, cosmic ray interference, or (rarely) |
|---|
| 921 | occlusion. Errors in the image processing that detects sources can |
|---|
| 922 | lead to sources being gained or lost. We call sources that appear in |
|---|
| 923 | only the input image or reference catalog \emph{distractors} and |
|---|
| 924 | \emph{dropouts}, respectively. The existence of distractors and |
|---|
| 925 | dropouts means that we can never assume that all the objects in the |
|---|
| 926 | reference catalog will be contained in an image to be recognized, or |
|---|
| 927 | vice versa. |
|---|
| 928 | |
|---|
| 929 | |
|---|
| 930 | The images to be recognized are subregions of the sky. Image sizes |
|---|
| 931 | range from nearly half the celestial sphere down to $10^{-7}$ of the |
|---|
| 932 | area and smaller. The input images measure unknown bands of the |
|---|
| 933 | electromagnetic spectrum, and various nonlinear functions may have |
|---|
| 934 | been applied to the pixel values. We cannot rely on absolute |
|---|
| 935 | brightness or color to recognize individual stars or galaxies. At |
|---|
| 936 | best we can hope that there is some positive correlation in the |
|---|
| 937 | relative brightness ordering of objects in the image and the |
|---|
| 938 | corresponding objects in our catalog. |
|---|
| 939 | |
|---|
| 940 | |
|---|
| 941 | Blind astrometric calibration is an ideal task for exploring geometric |
|---|
| 942 | ideas in pattern recognition. Most celestial objects are effectively |
|---|
| 943 | point sources, and can be found and localized to sub-pixel accuracy |
|---|
| 944 | using relatively simple image-processing procedures. But since the |
|---|
| 945 | individual features are characterized only by their positions and |
|---|
| 946 | brightnesses, we must examine collections of features in order to |
|---|
| 947 | build distinctive patterns. In \chapref{chap:techreport} we present |
|---|
| 948 | \an, which applies the geometric hashing framework to the task of |
|---|
| 949 | blind astrometric calibration. An example of our results in shown in |
|---|
| 950 | \figs \ref{fig:redgreen} and \ref{fig:onthesky}. |
|---|
| 951 | |
|---|
| 952 | |
|---|
| 953 | \comment{ |
|---|
| 954 | There are several useful applications for a blind astrometry solver. |
|---|
| 955 | Most telescopes used by professional astronomers are |
|---|
| 956 | computer-controlled and automatically record the telescope pointing in |
|---|
| 957 | the image header, but these systems occasionally fail. Feeding the |
|---|
| 958 | images to a blind astrometry solver would allow an independent |
|---|
| 959 | verification that the \metadata written by the telescope control |
|---|
| 960 | software are correct. Second, some observatories have large archives |
|---|
| 961 | of photographic plates taken by astronomers over many decades, and a |
|---|
| 962 | growing fraction of these have been digitized. These images could be |
|---|
| 963 | useful to researchers, since they provide a longer baseline for |
|---|
| 964 | measuring changes over time. However, information about where the |
|---|
| 965 | telescope was pointing has often been lost, or is only available as a |
|---|
| 966 | hand-written note on the edge of the photographic plate, or as an |
|---|
| 967 | entry in a logbook. A blind astrometry solver would be an invaluable |
|---|
| 968 | tool for easily providing accurate \metadata about these images. |
|---|
| 969 | Finally, these is a large and enthusiastic community of amateur |
|---|
| 970 | astronomers who produce many images that would be useful to |
|---|
| 971 | professional astronomers, but since amateurs are seldom interested in |
|---|
| 972 | recording exact \metadata, these images are not readily available to |
|---|
| 973 | professional astronomers. A blind astrometry solver would unlock this |
|---|
| 974 | trove of pixels. |
|---|
| 975 | |
|---|
| 976 | Other interesting applications include creating a customized |
|---|
| 977 | star-finding chart overlayed on the input image, and controlling a |
|---|
| 978 | telescope by using the output of a blind solver as feedback to a |
|---|
| 979 | control system. |
|---|
| 980 | |
|---|
| 981 | \begin{figure} |
|---|
| 982 | \begin{center} |
|---|
| 983 | %\begin{tabular}{c@{\hspace{1pt}}c@{\hspace{1pt}}c} |
|---|
| 984 | \includegraphics[width=\figunit]{moon} \\ |
|---|
| 985 | \includegraphics[width=\figunit]{saturated} |
|---|
| 986 | %\end{tabular} |
|---|
| 987 | \end{center} |
|---|
| 988 | \caption{Astronomical images with occlusion. \captionpart{Top:} The |
|---|
| 989 | moon, buildings, and mountains occlude the stars behind them (image |
|---|
| 990 | copyright Johannes Schedler, \texttt{http://panther-observatory.com}). |
|---|
| 991 | \captionpart{Bottom:} Saturation and diffraction spikes due to bright |
|---|
| 992 | objects can obscure nearby objects.} |
|---|
| 993 | \label{moon} |
|---|
| 994 | \end{figure} |
|---|
| 995 | |
|---|
| 996 | Since most celestial objects are effectively point sources, the |
|---|
| 997 | obvious image feature to use is an individual object. Feature |
|---|
| 998 | detection is relatively straightforward, and the positions of features |
|---|
| 999 | can be measured with subpixel accuracy, but the feature descriptor |
|---|
| 1000 | essentially contains only brightness and position; individual features |
|---|
| 1001 | are not distinctive. |
|---|
| 1002 | |
|---|
| 1003 | Although occlusion is typically not a problem in astronomical images, |
|---|
| 1004 | it does occasionally occur (see Figure \ref{moon}). In addition, |
|---|
| 1005 | bright objects in the image can cause diffraction spikes or saturation |
|---|
| 1006 | over a large region of the image, and this can cause nearby objects to |
|---|
| 1007 | be hidden. |
|---|
| 1008 | } |
|---|
| 1009 | |
|---|
| 1010 | |
|---|
| 1011 | |
|---|
| 1012 | \section{Related work in astrometric calibration} |
|---|
| 1013 | |
|---|
| 1014 | There seem to be two distinct groups of researchers who have worked on |
|---|
| 1015 | astrometric calibration. The first are professional astronomers, |
|---|
| 1016 | whose images are typically of small angular extent, long exposure |
|---|
| 1017 | time, and high quality. They typically assume that there is a good |
|---|
| 1018 | initial guess of the astrometric calibration and the goal is to |
|---|
| 1019 | produce a very accurate calibration, including image distortion. The |
|---|
| 1020 | second group of researchers are spacecraft engineers who want to use |
|---|
| 1021 | the stars to estimate the attitude of a camera attached to a |
|---|
| 1022 | spacecraft. Here the images are of wide angular extent, have short |
|---|
| 1023 | exposure time, and are very noisy. Primary concerns include weight, |
|---|
| 1024 | power consumption, and robustness (especially in avoiding false |
|---|
| 1025 | positives), while a high degree of accuracy is neither required nor |
|---|
| 1026 | possible given the hardware. |
|---|
| 1027 | |
|---|
| 1028 | \subsection{Non-blind astrometric calibration} |
|---|
| 1029 | |
|---|
| 1030 | Non-blind astrometric calibration requires that an initial estimate of |
|---|
| 1031 | the calibration to be provided. Groth \cite{groth1986} presents an |
|---|
| 1032 | algorithm for matching two lists of coordinates (eg, image coordinates |
|---|
| 1033 | in pixels and reference star coordinates on the celestial sphere), |
|---|
| 1034 | assuming that the lists contain a significant proportion of objects in |
|---|
| 1035 | common. With the limited computing resources available at the time, |
|---|
| 1036 | he suggests taking the brightest 20 or 30 objects in each list. |
|---|
| 1037 | |
|---|
| 1038 | |
|---|
| 1039 | Once the two lists of objects have been compiled and the brightest |
|---|
| 1040 | objects selected, all sets of three objects are enumerated and the |
|---|
| 1041 | triangles they form are described by a scale-, rotation-, and |
|---|
| 1042 | translation-invariant descriptor, composed of the length ratio of the |
|---|
| 1043 | longest to shortest edges, plus the cosine between these edges and the |
|---|
| 1044 | \emph{sense} or \emph{parity} of the triangle. The tolerances |
|---|
| 1045 | associated with these features are computed (by propagating their |
|---|
| 1046 | positional errors through the feature descriptor process) and stored, |
|---|
| 1047 | along with the logarithm of the triangle's perimeter. Triangles with |
|---|
| 1048 | large length ratio are rejected since they are relatively |
|---|
| 1049 | indistinctive. |
|---|
| 1050 | |
|---|
| 1051 | |
|---|
| 1052 | After triangle features are extracted from both point lists, feature |
|---|
| 1053 | matching is performed by checking whether the distance between each |
|---|
| 1054 | pair of features is less than their corresponding tolerances. This |
|---|
| 1055 | process is accelerated by sorting the features on one of the feature |
|---|
| 1056 | dimensions. If multiple matches are found for a particular feature, |
|---|
| 1057 | only the closest is considered. After all the features have been |
|---|
| 1058 | compared, many correct matches should be found, along with some false |
|---|
| 1059 | matches. For each match, the difference of the log-perimeters of the |
|---|
| 1060 | two triangles is computed. This gives the relative scales of the two |
|---|
| 1061 | triangles and hence their coordinate frames, assuming the match is |
|---|
| 1062 | correct. False matches are rejected by iterative outlier detection: |
|---|
| 1063 | the mean difference of log-perimeters is computed and matches far from |
|---|
| 1064 | the mean are rejected. |
|---|
| 1065 | |
|---|
| 1066 | |
|---|
| 1067 | This approach is essentially an application of the geometric hashing |
|---|
| 1068 | method, though instead of using hashing to accelerate feature |
|---|
| 1069 | matching, the features are simply kept in lists that are sorted on one |
|---|
| 1070 | dimension. Much of the subsequent work in this area follows |
|---|
| 1071 | essentially the same path (eg, \cite{valdes1995}, \cite{dong2006}). |
|---|
| 1072 | |
|---|
| 1073 | |
|---|
| 1074 | P\'al and Bakos \cite{pal2006} adapt the triangle-matching approach to |
|---|
| 1075 | images containing many more objects (of order $10^4$). Since it would |
|---|
| 1076 | become prohibitively expensive to enumerate all triangles in such an |
|---|
| 1077 | image, they reduce the number of triangles created by using only the |
|---|
| 1078 | triangles created by a Delauney triangulation. This vastly reduces |
|---|
| 1079 | the number of triangles created, but makes the algorithm more |
|---|
| 1080 | sensitive to dropout and distractor stars: a single extra star causes |
|---|
| 1081 | a completely disjoint set of triangles to be created in its |
|---|
| 1082 | neighbourhood. To compensate for this shortcoming, they define an |
|---|
| 1083 | \emph{extended Delauney triangulation}: for each point, they select |
|---|
| 1084 | all points at distance $\ell$ in the Delauney triangulation, and |
|---|
| 1085 | triangulate this set. This process is repeated for each point. The |
|---|
| 1086 | first extended triangulation ($\ell = 2$) skips over the nearest set |
|---|
| 1087 | of stars and builds larger triangles by using the next-further set of |
|---|
| 1088 | stars. The intent is that if some of the nearby stars are distractors |
|---|
| 1089 | that they will be ignored when the extended triangulations are used. |
|---|
| 1090 | |
|---|
| 1091 | \begin{figure} |
|---|
| 1092 | \begin{center} |
|---|
| 1093 | \includegraphics[width=\figunit]{pal-fig2} |
|---|
| 1094 | \end{center} |
|---|
| 1095 | \caption{The triangle parameterization used by P\'al and Bakos. This |
|---|
| 1096 | figure is a reproduction of \fig 2 in \cite{pal2006}.\label{pal}} |
|---|
| 1097 | \end{figure} |
|---|
| 1098 | |
|---|
| 1099 | P\'al and Bakos introduce a new triangle parameterization which is |
|---|
| 1100 | continuous and sensitive to chirality (parity); see \figref{pal}. |
|---|
| 1101 | This two-dimensional parameter space is used as the geometric feature |
|---|
| 1102 | space. When matching triangles between the two images, they demand a |
|---|
| 1103 | \emph{symmetric point match}: each triangle must be the other |
|---|
| 1104 | triangle's nearest neighbour in feature space. Matching two images is |
|---|
| 1105 | done by creating the lists of triangles and attempting to find |
|---|
| 1106 | symmetric point matches. This process is accelerated by sorting each |
|---|
| 1107 | list by one of the coordinates. Each triangle match is considered to |
|---|
| 1108 | be a vote for the correspondence of the three pairs of points |
|---|
| 1109 | composing the triangles. These votes are accumulated in a sparse |
|---|
| 1110 | matrix where element $(i, j)$ contains the number of votes for a |
|---|
| 1111 | correspondence between object $i$ in the first image and object $j$ in |
|---|
| 1112 | the second image. After all matches are considered, the top 40\% of |
|---|
| 1113 | the nonzero matrix elements are assumed to contain the true |
|---|
| 1114 | correspondences. A transformation based on these correspondence is |
|---|
| 1115 | computed and the unitarity of the transformation matrix is used to |
|---|
| 1116 | judge whether the match is true or false. |
|---|
| 1117 | |
|---|
| 1118 | |
|---|
| 1119 | A different approach is taken by Kaiser \etal \cite{kaiser1999}. They |
|---|
| 1120 | assume they are given two lists of source positions that differ by a |
|---|
| 1121 | rigid transformation involving scaling, rotation, and translation. In |
|---|
| 1122 | each image, they iterate over each pair of points and compute the |
|---|
| 1123 | vector difference of their positions. For each list, the log-length |
|---|
| 1124 | and angle of the pairwise difference vectors are accumulated in a |
|---|
| 1125 | two-dimensional histogram. Observe that if the whole list of points |
|---|
| 1126 | is scaled up by a constant factor, then the log-distance between each |
|---|
| 1127 | pair of points increases by a constant amount. Similarly, if the |
|---|
| 1128 | whole list is rotated then the angles shift by a constant amount. |
|---|
| 1129 | Once the two histograms have been computed, their cross-correlation is |
|---|
| 1130 | computed. If the two lists contain a significant number of |
|---|
| 1131 | corresponding points, the cross-correlation signal will be strongest |
|---|
| 1132 | at a shift corresponding to the difference in log-scale and rotation |
|---|
| 1133 | between the lists. This process is similar to a Hough transform |
|---|
| 1134 | \cite{duda1972,ballard1981}, except that instead of finding the peak |
|---|
| 1135 | of a single parameter-space histogram, we are searching for a peak in |
|---|
| 1136 | the similarity of two histograms as we shift them with respect to each |
|---|
| 1137 | other in parameter space. |
|---|
| 1138 | |
|---|
| 1139 | |
|---|
| 1140 | Once the scaling and rotation between the lists has been found, the |
|---|
| 1141 | translation can be found by scaling and rotating one of the lists into |
|---|
| 1142 | the frame of the other list, then histogramming the vector difference |
|---|
| 1143 | between points across the two lists. This is a standard (generalized) |
|---|
| 1144 | Hough transform. |
|---|
| 1145 | |
|---|
| 1146 | |
|---|
| 1147 | \subsection{Blind astrometric calibration} |
|---|
| 1148 | |
|---|
| 1149 | |
|---|
| 1150 | The majority of previous work on blind astrometric calibration is |
|---|
| 1151 | motivated by the problem of spacecraft attitude estimation. Sometimes |
|---|
| 1152 | called the ``lost in space'' or ``stellar gyroscope'' problem, the |
|---|
| 1153 | task is to estimate the pose of a spacecraft by using an image of the |
|---|
| 1154 | sky captured by an onboard camera. Although similar in general |
|---|
| 1155 | spirit, the requirements and limitations of this application are quite |
|---|
| 1156 | different than astronomical applications. Mass, power consumption, |
|---|
| 1157 | and robustness of the system are primary concerns, and as a result the |
|---|
| 1158 | optical designs are very different from science-grade astronomical |
|---|
| 1159 | instruments, and the available processor and memory resources are very |
|---|
| 1160 | restricted. Typical fields of view are tens of degrees across, and |
|---|
| 1161 | the exposure time is kept short to allow the system to function while |
|---|
| 1162 | the spacecraft is rotating. As a result, image quality is typically |
|---|
| 1163 | quite poor: often only a handful of the brightest stars are visible. |
|---|
| 1164 | Since the field of view is large, a reference catalog of a few |
|---|
| 1165 | thousand stars is sufficient to ensure that any view of the sky |
|---|
| 1166 | contains many reference stars. Since the system will only process |
|---|
| 1167 | images from a single camera, the whole system can be customized and |
|---|
| 1168 | calibrated to that camera. For example, the nonlinear distortions of |
|---|
| 1169 | the optical system can be measured, and the scale and bandpass of the |
|---|
| 1170 | imaging system are known, so the reference catalog can be tailored to |
|---|
| 1171 | match. |
|---|
| 1172 | |
|---|
| 1173 | |
|---|
| 1174 | For example, Liebe \etal \cite{liebe2004} describe a system design |
|---|
| 1175 | with a $56~\deg$ field of view and exposure time of $50~\unit{ms}$. |
|---|
| 1176 | The resulting images contain tens of stars if the spacecraft is not |
|---|
| 1177 | rotating, but on average only three stars will be detectable when the |
|---|
| 1178 | rotation rate is $50~\deg/\unit{sec}$. The paper does not describe a |
|---|
| 1179 | particular algorithm for star identification, with the implication |
|---|
| 1180 | that it is not a particularly difficult problem since absolute |
|---|
| 1181 | brightness information will be available, and the total number of |
|---|
| 1182 | stars that are visible to the camera is only a few hundred. |
|---|
| 1183 | |
|---|
| 1184 | |
|---|
| 1185 | In earlier work \cite{liebe1993}, Liebe describes a star |
|---|
| 1186 | identification system. The reference catalog is composed of the 1539 |
|---|
| 1187 | brightest stars, with brightness calibrated to the camera used in the |
|---|
| 1188 | system. The field of view is $30~\degrees$, and the system uses a |
|---|
| 1189 | feature-matching approach, using the brightest star in the field and |
|---|
| 1190 | its two nearest neighbours to define a geometric feature. The feature |
|---|
| 1191 | descriptor is composed of the distance from the brightest star to its |
|---|
| 1192 | nearest and second-nearest neighbours, along with the angle between |
|---|
| 1193 | these neighbours. Note that this feature is not scale-invariant, |
|---|
| 1194 | because the system is only meant to solve images taken by one camera, |
|---|
| 1195 | so the scale is known. An index is constructed by enumerating all |
|---|
| 1196 | such features that could possibly be detected, given the detection |
|---|
| 1197 | limits of the camera. These features are coarsely quantized and |
|---|
| 1198 | stored in a table. To account for noise in the feature descriptors, |
|---|
| 1199 | all neighbouring cells in the quantized feature space are also stored |
|---|
| 1200 | in the table. This generates $185,000$ features. At test time, the |
|---|
| 1201 | features in the image are enumerated and the table of features is |
|---|
| 1202 | scanned; an exact feature match in the quantized space is assumed to |
|---|
| 1203 | be correct. |
|---|
| 1204 | |
|---|
| 1205 | |
|---|
| 1206 | This approach is a fairly straightforward geometric hashing technique, |
|---|
| 1207 | except that it does not use hashing as such, and there is no voting or |
|---|
| 1208 | verification scheme because feature aliasing is assumed not to happen. |
|---|
| 1209 | |
|---|
| 1210 | \begin{figure} |
|---|
| 1211 | \begin{center} |
|---|
| 1212 | \includegraphics[width=\figunit]{grid} |
|---|
| 1213 | \end{center} |
|---|
| 1214 | \caption{A grid-based feature: two sources are used to define a local |
|---|
| 1215 | coordinate system which is discretized into a grid of cells. Each |
|---|
| 1216 | cell becomes a bit in the feature descriptor; if a cell is occupied |
|---|
| 1217 | its bit is set. This figure is copied from MacKay |
|---|
| 1218 | \cite{mackay2005}.} |
|---|
| 1219 | \label{gridbased} |
|---|
| 1220 | \end{figure} |
|---|
| 1221 | |
|---|
| 1222 | In contrast to systems that build features from the precise locations |
|---|
| 1223 | of a small number of stars, Padgett and Kreutz-Delgado |
|---|
| 1224 | \cite{padgett1997} present an approach that incorporates information |
|---|
| 1225 | from a large portion of the image. Their grid-based feature is |
|---|
| 1226 | defined by first choosing one star as the reference star to define the |
|---|
| 1227 | center of the grid, then selecting its nearest neighbour (outside an |
|---|
| 1228 | exclusion radius) to define the orientation of the grid. Once the |
|---|
| 1229 | center and orientation are determined, a grid is defined, and each |
|---|
| 1230 | remaining stars in the image is assigned to a grid cell. The feature |
|---|
| 1231 | is a bit vector, one bit per grid cell, where the bit is set only if |
|---|
| 1232 | the cell contains a star. See \figref{gridbased}. |
|---|
| 1233 | |
|---|
| 1234 | |
|---|
| 1235 | The index is constructed by scanning the sky and selecting, for each |
|---|
| 1236 | field of view, the $n$ brightest stars. For each of these stars a |
|---|
| 1237 | feature is computed and stored. At test time, they examine the |
|---|
| 1238 | brightest $c n$ stars (for some safety factor $c$), computing the |
|---|
| 1239 | feature for each one and searching the index for a match. A pair of |
|---|
| 1240 | features is defined to match if their dot product is above a |
|---|
| 1241 | threshold. This is equivalent to taking the bitwise \textsc{AND} of |
|---|
| 1242 | the bit vectors and counting the number of bits that are set. This |
|---|
| 1243 | search can be implemented efficiently by using lookup tables: for each |
|---|
| 1244 | bit, they maintain a list of the features for which that bit is set. |
|---|
| 1245 | Note that this is equivalent to using a grid-based geometric hashing |
|---|
| 1246 | approach: each grid cell is equivalent to a discretized relative |
|---|
| 1247 | coordinate vector, which becomes a hash key, and the ``lookup table'' |
|---|
| 1248 | is a hash table. |
|---|
| 1249 | |
|---|
| 1250 | |
|---|
| 1251 | After all the features in the test image have been extracted and |
|---|
| 1252 | matched to the index, the algorithm proceeds to a voting (or perhaps |
|---|
| 1253 | ``consensus'') phase where it rejects matches that propose a field of |
|---|
| 1254 | view that does not overlap that of the majority. |
|---|
| 1255 | |
|---|
| 1256 | |
|---|
| 1257 | The system is designed to operate on images of diameter $8~\degrees$, |
|---|
| 1258 | and performs well on simulated data. They use a grid size of $40 |
|---|
| 1259 | \times 40$, and find that on average $25$ grid cells are filled. |
|---|
| 1260 | Their index contains $13,000$ patterns, which means that false |
|---|
| 1261 | positives are quite rare. However, misidentification of the nearest |
|---|
| 1262 | neighbour, or edge effects (assigning a star to the wrong grid cell |
|---|
| 1263 | due to positional noise) mean that failures to find a match are not |
|---|
| 1264 | uncommon, and are more likely in regions of the sky with high stellar |
|---|
| 1265 | density. |
|---|
| 1266 | |
|---|
| 1267 | In later work, Clouse and Padgett \cite{clouse2000} extend this |
|---|
| 1268 | approach by using Bayesian decision theory instead of a simple |
|---|
| 1269 | threshold on the dot product to define how well features match. This |
|---|
| 1270 | extension, along with smaller noise levels in the (simulated) imaging |
|---|
| 1271 | system, allows them to extend the approach down to fields of view |
|---|
| 1272 | $2~\degrees$ in diameter. |
|---|
| 1273 | |
|---|
| 1274 | \comment{ |
|---|
| 1275 | Given two features, they compute the dot product between the bit |
|---|
| 1276 | vectors, then proceed to estimate the probabilities that the match is |
|---|
| 1277 | a true positive and false positive. These probability distributions |
|---|
| 1278 | are quite complex, so several simplifying approximations are made, and |
|---|
| 1279 | the remaining parameters are estimated by running simulations. |
|---|
| 1280 | } |
|---|
| 1281 | |
|---|
| 1282 | Harvey \cite{harvey2004} presents two different approaches, one |
|---|
| 1283 | grid-based and the other shape-based. The grid-based approach is |
|---|
| 1284 | similar to the Padgett--Kreutz-Delgado and Clouse--Padgett approaches |
|---|
| 1285 | \cite{padgett1997, clouse2000}. A coarser grid is used, so more stars |
|---|
| 1286 | are likely to appear in each bin. To compensate, a grid cell is only |
|---|
| 1287 | considered ``occupied'' if it contains more than some threshold number |
|---|
| 1288 | of stars. The other major change is that he expects a test image to |
|---|
| 1289 | be an ``overexposure'' or ``underexposure'' relative to the reference |
|---|
| 1290 | catalog. This implies that the test image should contain either a |
|---|
| 1291 | subset or a superset of the stars in the index, and therefore the |
|---|
| 1292 | image feature vector must be either greater than or less than an index |
|---|
| 1293 | feature at each bit. This allows simple bit operations to be used to |
|---|
| 1294 | find feature matches, and feature matching is performed by a linear |
|---|
| 1295 | scan through the index. |
|---|
| 1296 | |
|---|
| 1297 | Harvey's shape-based approach uses constrained $n$-star constellations |
|---|
| 1298 | in order to aggregate information from a large number of stars without |
|---|
| 1299 | allowing the number of potential features to grow combinatorially. |
|---|
| 1300 | Specifically, Harvey uses a pair of stars to define a narrow wedge in |
|---|
| 1301 | the image, then describes the relative positions and angles of a fixed |
|---|
| 1302 | number of nearby stars within that wedge. Unfortunately, this makes |
|---|
| 1303 | the feature highly sensitive to distractor and dropout stars, since |
|---|
| 1304 | the feature depends on the stars in the feature being enumerated in a |
|---|
| 1305 | particular order. As a result, all potential features (allowing any |
|---|
| 1306 | combination of stars to drop out) must be checked; the number of |
|---|
| 1307 | features grows exponentially as the density of stars increases. |
|---|
| 1308 | |
|---|
| 1309 | |
|---|
| 1310 | Harvey makes the useful observation that a cascade of indices can be |
|---|
| 1311 | built, where each index is designed to solve images with a particular |
|---|
| 1312 | range of scales. |
|---|
| 1313 | |
|---|
| 1314 | |
|---|
| 1315 | MacKay and Roweis \cite{mackay2005} point out that a grid-based |
|---|
| 1316 | feature such as that used by Harvey leads naturally to a hashing-based |
|---|
| 1317 | strategy. Each grid cell is associated with a bit that is turned on |
|---|
| 1318 | if the cell is occupied. This value is placed in a hash table with a |
|---|
| 1319 | mapping back to its position on the sky. Since false positives can |
|---|
| 1320 | occur as a result of feature aliasing or hash collision, a voting |
|---|
| 1321 | scheme is employed: several feature matches must accumulate before the |
|---|
| 1322 | match is accepted. Since false negatives can occur as a result of |
|---|
| 1323 | dropouts and distractors (\emph{any} missing or extra star causes the |
|---|
| 1324 | hash code to change completely), many features must be extracted for |
|---|
| 1325 | each region of the sky. |
|---|
| 1326 | |
|---|
| 1327 | |
|---|
| 1328 | \subsection{Fine-tuning astrometric calibrations} |
|---|
| 1329 | |
|---|
| 1330 | In order to ``co-add'' or ``stack'' astronomical images (combine |
|---|
| 1331 | pixels from different images to produce a higher signal-to-noise |
|---|
| 1332 | image), or to do proper-motion studies (measure the movement of stars |
|---|
| 1333 | over time), it is necessary to fine-tune the astrometric calibration |
|---|
| 1334 | of the images. This is similar to the \emph{bundle adjustment} |
|---|
| 1335 | problem in computer vision \cite{triggs2000}, in that it involves the |
|---|
| 1336 | simultaneous optimization of the various camera and telescope |
|---|
| 1337 | parameters and the estimated positions of objects in the world. |
|---|
| 1338 | Fine-tuning astrometric calibrations is easier because it is |
|---|
| 1339 | essentially two-dimensional, but more difficult because the camera |
|---|
| 1340 | parameters include polynomial distortion terms to model the image |
|---|
| 1341 | distortion introduced by telescope optics. |
|---|
| 1342 | |
|---|
| 1343 | |
|---|
| 1344 | The software package \scamp by Bertin \cite{bertin2005} is a popular |
|---|
| 1345 | tool used by astronomers to fine-tune simultaneously the astrometric |
|---|
| 1346 | calibrations of a large collection of images. For each image, it is |
|---|
| 1347 | assumed that the center of the image is known to about 25\% of the |
|---|
| 1348 | size of the image, and the scale is known to within a factor of two. |
|---|
| 1349 | The histogram-alignment method of \cite{kaiser1999} is used to find |
|---|
| 1350 | the translation, scaling, and rotation between the image and a |
|---|
| 1351 | reference catalog, which allows the correspondences between image and |
|---|
| 1352 | reference catalog stars to be determined. |
|---|
| 1353 | |
|---|
| 1354 | The core of the \scamp system is a chi-squared minimization of the |
|---|
| 1355 | total weighted distance between the projected positions of all star |
|---|
| 1356 | correspondences among the set of images and the reference catalog. |
|---|
| 1357 | The parameters to be adjusted are the center, scale, rotation, and |
|---|
| 1358 | polynomial distortion coefficients of each image. A key feature is |
|---|
| 1359 | the ability to share image distortion parameters among subsets of the |
|---|
| 1360 | images, since images taken with the same telescope are expected to |
|---|
| 1361 | share some distortion terms due to the telescope optics. This reduces |
|---|
| 1362 | the degrees of freedom of the fitting process, resulting in more |
|---|
| 1363 | robust fits. \scamp can fine-tune thousands of images to sub-pixel |
|---|
| 1364 | accuracy. |
|---|
| 1365 | |
|---|
| 1366 | |
|---|
| 1367 | \section{Summary} |
|---|
| 1368 | |
|---|
| 1369 | |
|---|
| 1370 | The task of automatically finding the astrometric calibration of an |
|---|
| 1371 | astronomical image---this is, recognizing the stars in the image, or |
|---|
| 1372 | locating the image on the sky---is ideal for a geometric hashing-based |
|---|
| 1373 | approach. Geometric hashing (or more generally, geometric feature |
|---|
| 1374 | matching) is a widely-applicable technique for problems in which it is |
|---|
| 1375 | desirable to build distinctive geometric features out of simple |
|---|
| 1376 | interest-point features. The approach is ideally suited to the blind |
|---|
| 1377 | astrometric calibration problem, since images of celestial objects can |
|---|
| 1378 | be localized to sub-pixel accuracy but have few other distinguishing |
|---|
| 1379 | features. Indeed, much of the previous work in blind and non-blind |
|---|
| 1380 | astrometric calibration can be placed within a geometric feature |
|---|
| 1381 | matching framework. |
|---|
| 1382 | |
|---|
| 1383 | |
|---|
| 1384 | \comment{ |
|---|
| 1385 | Although one might suppose that hashing is an intrinsic part of |
|---|
| 1386 | geometric hashing, we have found that a tree structure is more |
|---|
| 1387 | effective for the large-scale feature matching we want to perform; we |
|---|
| 1388 | feel that deciding to use approximate feature matching should be a |
|---|
| 1389 | design decision, not an intrinsic part of the method: in cases where |
|---|
| 1390 | each image contains only a few features that could potentially be |
|---|
| 1391 | found in the index, we do not want to risk missing them due to ``bad |
|---|
| 1392 | luck'' in a hash function. |
|---|
| 1393 | } |
|---|
| 1394 | |
|---|
| 1395 | |
|---|