The Whole Blind Pipeline, In Words

How do we go from raw star catalogs to blind solving? Let me tell you a story...

Raw USNO-B and Tycho-2 to FITS

We start with raw USNO-B and Tycho-2 files. (The Tycho-2 catalog is only a few hundred megs, so it can be found on the web. USNO-B is 80 gigs, so people are strangely leery about making it publicly available.) Since these catalogs are in insane formats, we convert them to FITS. This is done using the programs usnobtofits and tycho2tofits. Yes, dstn is sometimes shockingly unoriginal in naming programs.

http://trac.astrometry.net/browser/trunk/doc/pipeline-1.png?format=raw

For Tycho-2, we just produce one FITS file for each of the three input files; the biggest is about 350 MB, and the others are relatively tiny.

For USNO-B, we split the sky into 972 healpixes (Nside=9) so that the file sizes are manageable (they range from 50 to 1500 MB).
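Where the 972 comes from: a healpix grid with parameter Nside always has 12 * Nside^2 equal-area pixels. A tiny sketch (the function name is just for illustration, it is not part of usnobtofits):

```python
def healpix_count(nside: int) -> int:
    """Number of equal-area pixels in a healpix grid with the given Nside."""
    return 12 * nside * nside


# Nside=9 gives the 972 files used for USNO-B.
print(healpix_count(9))
```

With Nside=1 the same formula gives the twelve base healpixes.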

Creating the Astrometry.net catalog

Next, we merge the USNO-B and Tycho-2 catalogs (now in FITS format). The USNO-B catalog includes all the stars from Tycho-2, since Tycho-2 stars are bright enough to completely saturate the photographic plates. However, they are included in USNO-B in an undocumented binary format, so I had to go to the source and get the original Tycho-2 catalog. Since these two catalogs are, by design, non-overlapping, they can be merged easily; we simply reject stars in USNO-B that are marked as being Tycho-2 stars. We also reject stars that have the "may be a diffraction spike" flag set. This process is done by the program build_an_catalog (again, not likely to win the Most Originally Named Program award). We again healpixify the sky into 972 healpixes so that the files are manageable (30 to 1050 MB).
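The merge rule above is simple enough to sketch in a few lines. This is only an illustration, not the build_an_catalog source: stars are plain dicts, and the flag names is_tycho2 and diffraction_spike are made up for the example.

```python
def merge_catalogs(usnob_stars, tycho2_stars):
    """Drop flagged USNO-B entries, then append every Tycho-2 star.

    Each star is a dict; the flag keys are hypothetical names chosen
    for this sketch, not the real catalog columns.
    """
    kept = [
        star for star in usnob_stars
        if not star.get("is_tycho2", False)
        and not star.get("diffraction_spike", False)
    ]
    return kept + list(tycho2_stars)


usnob = [
    {"id": "u1"},
    {"id": "u2", "is_tycho2": True},          # rejected: Tycho-2 entry
    {"id": "u3", "diffraction_spike": True},  # rejected: possible spike
]
tycho2 = [{"id": "t1"}]
merged = merge_catalogs(usnob, tycho2)
print([star["id"] for star in merged])
```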

http://trac.astrometry.net/browser/trunk/doc/pipeline-1b.png?format=raw

Cutting the Astrometry.net catalog

The next stage is to sample objects from the catalog that we expect to be bright in the fields we are interested in solving. We also want spatial uniformity, and we remove duplicate stars. This is performed by the cut-an program. We create one catalog per coarse healpix (twelve for the whole sky).

We lay a fine healpix grid over the coarse grid, and add a margin of one fine healpix so that the catalogs overlap a bit. We sweep through the input catalog files, placing stars in the appropriate healpixes. In each healpix we keep a list of the N brightest stars we have encountered so far.

After all the input catalogs have been read, we make N passes through the healpixes. During pass p, we gather the pth brightest star in each healpix, if it exists. We then sort the stars by absolute brightness and write them out.
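The two phases described above (keep the N brightest per healpix, then sweep pass by pass) can be sketched as follows. This is a toy version under assumptions, not the cut-an code: stars are (magnitude, id) pairs where a smaller magnitude means brighter, and cell_of is a hypothetical callback standing in for the real healpix lookup.

```python
import heapq
from collections import defaultdict


def cut_catalog(stars, cell_of, n_keep):
    """Return star ids in the order a cut like the one above would write them.

    stars:   iterable of (magnitude, star_id); smaller magnitude = brighter.
    cell_of: hypothetical function mapping star_id -> fine healpix index.
    n_keep:  the N in "N brightest stars per healpix".
    """
    # Per cell, a min-heap of (-magnitude, star_id): the root is the
    # faintest star currently kept, so it is the one to evict.
    heaps = defaultdict(list)
    for mag, star_id in stars:
        heap = heaps[cell_of(star_id)]
        item = (-mag, star_id)
        if len(heap) < n_keep:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)

    # Brightest first within each cell.
    ranked = {
        cell: sorted((-neg_mag, sid) for neg_mag, sid in heap)
        for cell, heap in heaps.items()
    }

    # Pass p gathers the p-th brightest star of every cell, then sorts
    # that sweep by brightness before writing it out.
    out = []
    for p in range(n_keep):
        sweep = sorted(r[p] for r in ranked.values() if len(r) > p)
        out.extend(sid for _, sid in sweep)
    return out


cells = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
stars = [(5.0, "a"), (3.0, "b"), (4.0, "c"), (1.0, "d"), (2.0, "e")]
order = cut_catalog(stars, cells.get, n_keep=2)
print(order)
```

Writing the stars out sweep by sweep is what gives spatial uniformity: any prefix of the output covers every healpix about equally.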

To estimate brightness in the SDSS R-band, we simply take the average of the magnitude measurements in the input catalog, including the USNO "red" surveys (E and F bands) and the Tycho-2 "visual" and "Hipparcos" (V or H) bands.
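As a sketch, the averaging might look like this. How missing measurements are represented in the real catalog is not stated here, so using None for "no measurement in this band" is purely an assumption of the example.

```python
def estimate_magnitude(measurements):
    """Average the available magnitudes.

    None marks a missing measurement (an assumption of this sketch).
    Returns None when no band has a measurement.
    """
    present = [m for m in measurements if m is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Two red-band measurements present, one missing.
print(estimate_magnitude([14.0, None, 15.0]))
```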

After the catalogs have been created, we build a star kdtree for each.

http://trac.astrometry.net/browser/trunk/doc/pipeline0.png?format=raw

Quad creation

The next step in the pipeline is to build quads. We want to build quads such that they are spatially uniform, have some limited range of scales, use bright stars, and are evenly distributed in code space. This is achieved by the hpquads program.

The program proceeds by laying a fine healpix grid over the sky. In each healpix, we find all stars that could potentially be a member of a quad whose center lies inside the bounds of the healpix. This search can be done efficiently using the star kdtree. We sort this set of stars by their order in the cut catalog, which is nearly the same as sorting by brightness. Finally, we try to build valid quads, of the desired scale, whose centers lie within the healpix. We do this by sequentially adding the next-brightest star, and considering all potential quads which have the new star as an AB star, and all potential quads in which the new star is a CD star. This process stops once we have found one valid quad.

We then shift the fine healpix grid and repeat. We keep track of the quads that have been created so far to avoid creating the same quad more than once.

There is also a parameter that limits the maximum number of times a star can be used in a quad.
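The bookkeeping in the last two paragraphs (no repeated quads, a cap on how often a star is reused) could be sketched like this. The class and its names are hypothetical, and treating a quad as the unordered set of its four stars is an assumption of the sketch rather than something taken from hpquads.

```python
from collections import Counter


class QuadBook:
    """Tracks which quads exist and how often each star has been used."""

    def __init__(self, max_uses):
        self.max_uses = max_uses  # cap on quads per star
        self.seen = set()         # quads created so far
        self.uses = Counter()     # star id -> number of quads using it

    def try_add(self, stars):
        """Record the quad and return True, or return False if rejected."""
        key = frozenset(stars)
        if len(key) != 4 or key in self.seen:
            return False
        if any(self.uses[s] >= self.max_uses for s in key):
            return False
        self.seen.add(key)
        self.uses.update(key)
        return True


book = QuadBook(max_uses=2)
results = [
    book.try_add((1, 2, 3, 4)),   # new quad
    book.try_add((4, 3, 2, 1)),   # same four stars again
    book.try_add((1, 5, 6, 7)),   # second use of star 1
    book.try_add((1, 8, 9, 10)),  # star 1 has hit the cap
]
print(results)
```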

The result of this process is a list of quads (i.e., a list of the four stars composing each quad) and a corresponding list of codes which describe the relative shape of each quad.
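This page does not spell out how a code is computed, so here is one possible shape descriptor, offered as an assumption: put star A at (0, 0) and star B at (1, 1), and use the positions of C and D in that frame as a four-number code. The sketch works on a flat plane with made-up coordinates; what matters is that the code does not change when the quad is shifted, rotated or scaled.

```python
def quad_code(a, b, c, d):
    """Four-number shape code for a quad, in the frame where A=(0,0), B=(1,1).

    Points are (x, y) tuples on a flat plane (an assumption of this sketch).
    """
    za, zb = complex(*a), complex(*b)

    def to_frame(point):
        # Translate A to the origin, then rotate and scale so B lands on 1+1j.
        return (complex(*point) - za) / (zb - za) * (1 + 1j)

    zc, zd = to_frame(c), to_frame(d)
    return (zc.real, zc.imag, zd.real, zd.imag)


code = quad_code((0, 0), (2, 2), (1, 0), (0, 1))
print(code)
```

Because only relative positions enter the code, the same four stars seen at a different pointing, orientation and pixel scale should land close to the same point in code space, which is what makes a kdtree lookup on codes useful.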

Finally, we build a kdtree out of the codes.

http://trac.astrometry.net/browser/trunk/doc/pipeline1.png?format=raw

Unpermuting

This stage is not required, but is desirable for technical reasons. Recall that we build kdtrees from both stars and codes. In the kdtree implementation we are using, the stars and codes are shuffled into an order that allows efficient access by the kdtree algorithms. In order to get back to the previous order, the kdtree maintains a permutation vector.

Note, however, that once we have built the quads using hpquads, we no longer care about the original ordering of the stars. Likewise, we never care about the original ordering of the codes. We can therefore apply the permutation vector in all the corresponding places such that the kdtree ordering becomes the new ordering. We then no longer need to store the permutation vector nor use it to access data in the kdtrees. This results in faster operation and smaller memory usage in subsequent stages.
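A minimal sketch of the idea, assuming the convention that perm[i] is the original index of the item the kdtree stores in slot i (the real kdtree may use the opposite convention). The data is reordered once, and anything that refers to stars by index, such as the quad list, is remapped to match.

```python
def unpermute(data, perm):
    """Reorder data into kdtree order: result[i] = data[perm[i]]."""
    return [data[old_index] for old_index in perm]


def remap_quads(quads, perm):
    """Rewrite star indices in each quad so they point into the new ordering."""
    inverse = [0] * len(perm)
    for new_index, old_index in enumerate(perm):
        inverse[old_index] = new_index
    return [tuple(inverse[star] for star in quad) for quad in quads]


stars = ["s0", "s1", "s2", "s3"]
perm = [2, 0, 3, 1]          # kdtree slot -> original index (assumed convention)
quads = [(0, 1, 2, 3)]

new_stars = unpermute(stars, perm)
new_quads = remap_quads(quads, perm)
print(new_stars, new_quads)
```

After this step new_stars[i] is exactly what the kdtree holds in slot i, so the permutation vector can be thrown away.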

http://trac.astrometry.net/browser/trunk/doc/pipeline1b.png?format=raw

The Whole Thing

http://trac.astrometry.net/browser/trunk/doc/pipelineall2.png?format=raw

Homering

In the SDSS engineering-grade data, many fields contain "donuts". When the telescope is out of focus, the CCDs see a ring, rather than a point, for each star. The SDSS object detector erroneously converts the ring into many objects. This causes problems since none of the detected objects are particularly good estimates of the object's true position. We have implemented a simple preprocessor which attempts to seek out and devour donuts. It is named homer in tribute to one of our heroes.

Solving

Solving fields requires the star kdtree, code kdtree, and quad list created in the indexing stage, in addition to the homered fields to solve.

The solving process proceeds as follows. As during the indexing phase, we sequentially add field objects, starting with the brightest, and build all valid quads. If we have bounds on the pixel scale of the field, we can drastically reduce the number of quads that we need to consider, since we know that the index contains only quads of a limited range of scales. For each field quad that we create, we compute its code (shape descriptor) and search for nearby codes in the code kdtree. Using the quad list, we map each matching code back to its constituent four stars. Once we find a small number of matches (typically 2) that agree on the location of the field, we perform verification. A match between field objects and indexed stars implies a transformation between field and star coordinates. We apply this transformation to each object in the field and see how well the field objects align with indexed stars in the region. This can be done efficiently using the star kdtree. A large overlap between field and index objects is very unlikely to occur by chance, so we virtually eliminate false positives in this way.
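The verification step can be illustrated with a toy version. Everything here is a simplification for the sake of the example: coordinates live on a flat plane instead of the sphere, the transformation is fitted from just two corresponding points, and the neighbor search is brute force where the real thing would use the star kdtree. The function names are hypothetical.

```python
def fit_similarity(field_a, field_b, sky_a, sky_b):
    """Fit sky = m * field + t (complex numbers) from two point pairs."""
    fa, fb = complex(*field_a), complex(*field_b)
    sa, sb = complex(*sky_a), complex(*sky_b)
    m = (sb - sa) / (fb - fa)   # rotation and scale
    return m, sa - m * fa       # and the offset


def count_overlap(field_objs, index_stars, transform, tol):
    """Count field objects that land within tol of some index star."""
    m, t = transform
    hits = 0
    for x, y in field_objs:
        z = m * complex(x, y) + t
        # Brute-force search; a kdtree would do this lookup efficiently.
        if any(abs(z - complex(u, v)) <= tol for u, v in index_stars):
            hits += 1
    return hits


field = [(0, 0), (1, 0), (0, 1), (5, 5)]
index = [(10, 10), (10, 12), (8, 10), (100, 100)]
transform = fit_similarity((0, 0), (1, 0), (10, 10), (10, 12))
hits = count_overlap(field, index, transform, tol=0.01)
print(hits, "of", len(field), "field objects align")
```

A candidate match would be accepted only when the overlap is far larger than chance alignment would produce; the threshold for that decision is not specified on this page.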

http://trac.astrometry.net/browser/trunk/doc/pipeline2.png?format=raw