This page documents what I (dstn) found when investigating the distribution of quads in code space.

Vanilla

After finding some oddities in the distribution of quads in code space, I eventually looked at the distribution of stars in space. For each star in our cut (typically 75,000 stars for GALEX), I found that star's nearest neighbour. The histogram of nearest-neighbour distances shows an unexpected, huge spike at zero.

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nn00c.png?format=raw
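
A minimal sketch of the nearest-neighbour measurement described above, assuming the stars are available as (RA, Dec) pairs in degrees. The function names are hypothetical, and the brute-force search is only practical for small samples; a cut of 75,000 stars would call for a spatial index such as a kd-tree.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

def radec_to_xyz(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a unit vector on the sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def nearest_neighbour_arcsec(radec):
    """For each star, return the angular distance (arcsec) to its nearest neighbour."""
    xyz = [radec_to_xyz(ra, dec) for ra, dec in radec]
    result = []
    for i, p in enumerate(xyz):
        # Shortest chord length to any other star (brute force, O(n^2)).
        chord = min(math.dist(p, q) for j, q in enumerate(xyz) if j != i)
        # A chord of length c on the unit sphere subtends an angle 2*asin(c/2).
        angle = 2.0 * math.asin(min(1.0, chord / 2.0))
        result.append(angle * ARCSEC_PER_RADIAN)
    return result

def histogram(values, bin_width, n_bins):
    """Count values into n_bins bins of equal width starting at zero."""
    counts = [0] * n_bins
    for v in values:
        b = int(v // bin_width)
        if 0 <= b < n_bins:
            counts[b] += 1
    return counts

# Two stars 1 arcsec apart in Dec, plus one isolated star.
stars = [(10.0, 0.0), (10.0, 1.0 / 3600.0), (50.0, 20.0)]
nn = nearest_neighbour_arcsec(stars)
print(histogram(nn, bin_width=5.0, n_bins=10))
```

A spike in the first bin of such a histogram is the signature discussed on this page: many stars whose nearest neighbour is only a few arcseconds away.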

Here's a zoom-in of the distribution of stars whose nearest neighbours are within 50 arcseconds. Note the huge spike toward zero.

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nn00.png?format=raw

Zooming in a little further, we can see that it actually drops off sharply at 1.0 arcseconds. I conjecture that this is the duplicate-star detection radius that someone upstream has set. It seems to be letting a few duplicates get through. There are about 2000 stars with a nearest neighbour within 5 arcsec, out of 75,000 stars in this cut (roughly 2.7%).

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nn00b.png?format=raw

Plotting the positions of stars that are very close to each other gives the plots below. They all show healpix 0; each set of three plots shows the nearby stars for three definitions of "nearby".

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyxyz2.png?format=raw http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyxyz4.png?format=raw http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyxyz8.png?format=raw

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyradec2.png?format=raw http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyradec4.png?format=raw http://trac.astrometry.net/attachment/wiki/CodeAnalysis/nearbyradec8.png?format=raw

After de-duplicating stars

Hogg told me, "The plates from which these data were constructed were scanned with 1.7 arcsec pixels so it is audacious to include pairs closer than about 3 arcsec." Looking at the above histograms, the heavy "tail" toward zero seems to drop down to the background rate at about 8 arcseconds.

I removed stars within 8 arcsec of each other using the startree -d 8 command-line option. I kept 75,000 stars (counted before deduplication). With this deduplication radius, 1305 of the 75,000 stars (about 1.7%) were removed.
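
The idea of deduplicating at a fixed radius can be sketched as follows. This is an illustration only, not a description of how startree implements -d: it assumes the stars arrive as (RA, Dec) pairs in degrees, in priority order, and that the first star of each close group is the one kept. The function names are hypothetical.

```python
import math

def angular_sep_arcsec(a, b):
    """Angular separation in arcsec between two (RA, Dec) positions in degrees."""
    ra1, dec1 = math.radians(a[0]), math.radians(a[1])
    ra2, dec2 = math.radians(b[0]), math.radians(b[1])
    # Haversine formula, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def deduplicate(stars, radius_arcsec):
    """Keep a star only if no already-kept star lies within radius_arcsec of it."""
    kept = []
    for star in stars:
        # Brute-force comparison against every kept star (O(n^2) overall).
        if all(angular_sep_arcsec(star, k) > radius_arcsec for k in kept):
            kept.append(star)
    return kept

stars = [(10.0, 0.0), (10.0, 5.0 / 3600.0), (10.0, 20.0 / 3600.0)]
kept = deduplicate(stars, radius_arcsec=8.0)
print(len(stars) - len(kept), "star(s) removed")
```

With a rule of this kind, one member of each close pair survives, which would be consistent with fewer stars being removed (1305) than have a close neighbour.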

With the index generated in this way, the distribution of stars looks like this:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/starhist00.png?format=raw

And the distribution of quads in code space looks like this:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/codehist00.png?format=raw

The histograms of the codes projected onto each pair of axes look like this:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/hists_00_zoomout.png?format=raw
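
A minimal sketch of how projections onto each pair of axes can be histogrammed, assuming each code is a tuple of floats in the range [0, 1). The four-component codes in the example, the bin count, and the function name are assumptions made for illustration.

```python
import math
from itertools import combinations

def pairwise_projection_histograms(codes, n_bins=10, lo=0.0, hi=1.0):
    """Return a dict mapping each axis pair (i, j) to a 2-D grid of counts."""
    n_dims = len(codes[0])
    width = (hi - lo) / n_bins
    hists = {}
    for i, j in combinations(range(n_dims), 2):
        grid = [[0] * n_bins for _ in range(n_bins)]
        for code in codes:
            # Project the code onto axes i and j, then find its bin on each.
            bi = math.floor((code[i] - lo) / width)
            bj = math.floor((code[j] - lo) / width)
            # Codes outside [lo, hi) on either axis are ignored.
            if 0 <= bi < n_bins and 0 <= bj < n_bins:
                grid[bi][bj] += 1
        hists[(i, j)] = grid
    return hists

codes = [(0.05, 0.15, 0.25, 0.35), (0.05, 0.15, 0.95, 0.35)]
hists = pairwise_projection_histograms(codes)
print(sorted(hists))  # the axis pairs that were projected
```

A d-dimensional code gives d(d-1)/2 such projections, so four-component codes give six panels.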

Zooming in, we see a bit of structure:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/codeprojs00.png?format=raw

The same plot, but looking down from above:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/codeprojs00b.png?format=raw

Here are the one-dimensional projections:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/oned00.png?format=raw

Random stars

I generated a catalogue of random stars using the randcat program. I generated 1,000,000 stars and kept 450,000 during startree. I did not de-duplicate the stars (there were only 41 pairs of stars within 8 arcsec of each other). This produced about 13M quads. Here are the projections:

http://trac.astrometry.net/attachment/wiki/CodeAnalysis/codeprojs_rand.png?format=raw
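
As a sanity check on the random catalogue, the number of close pairs expected by chance can be estimated. The sketch below assumes the stars are uniform over the whole sphere and that the count of 41 pairs refers to the 450,000 kept stars. It is not a description of what randcat does, and the function names are hypothetical.

```python
import math
import random

def random_star(rng):
    """Draw one (RA, Dec) in degrees, uniform over the sphere."""
    ra = rng.uniform(0.0, 360.0)
    # Uniform in sin(Dec) gives equal probability per unit area on the sky.
    dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    return ra, dec

def random_catalogue(n, seed=0):
    """Generate n uniform random stars with a reproducible seed."""
    rng = random.Random(seed)
    return [random_star(rng) for _ in range(n)]

def expected_close_pairs(n, radius_arcsec):
    """Expected number of pairs closer than radius_arcsec among n uniform stars."""
    theta = math.radians(radius_arcsec / 3600.0)
    # Fraction of the sphere inside a cap of angular radius theta.
    cap_fraction = math.sin(theta / 2.0) ** 2
    return n * (n - 1) / 2.0 * cap_fraction

print(expected_close_pairs(450_000, 8.0))  # roughly 38
```

Under those assumptions, about 38 pairs are expected by chance. That is close to the 41 pairs observed, so the random catalogue shows no excess of close neighbours of the kind seen in the real data.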
