The backend/frontend split
General Dataflow:
- The frontend interacts with the user to obtain a list of extracted sources with some extra information about the file (estimated scale, etc).
- The backend takes the FITS file (called a job file) and solves the field.
Underneath, the backend pushes some files around, locates some indexes (which the frontend doesn't need to know about), generates some blind input files, maybe launches some blind instances locally or remotely, and collects results after the solve is finished (successfully or not).
Architecture sketch
RPC implementation: the components are:
- web-facing front-end: user interface. Contacts the master node to submit jobs and retrieve results.
- master node: accepts jobs, distributes units of work to service nodes and aggregates results.
- service nodes: perform units of work.
When the user submits a job:
- The web front-end makes an RPC call to the master node, and gets back the job id.
- The master node cuts the job into units of work and distributes them to the service nodes; the job id identifies the job.
- One of the service nodes solves the job; it sends the results (identified by the job id) back to the master.
- The master tells the rest of the service nodes to cancel any work units they have queued that belong to the given job id.
- Meanwhile, the web front-end is polling the master node to get the job status (and results) via RPC calls, using the job id as the key.
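The job flow above can be sketched as a small in-memory model. This is only an illustration of the bookkeeping, not the actual RPC code: the class name `Master`, its method names, the `job-N` id format, and the round-robin distribution are all assumptions made for the example, and no networking is involved.

```python
import itertools


class Master:
    """In-memory stand-in for the master node (hypothetical; no real RPC)."""

    def __init__(self, nodes):
        # node name -> list of queued (job_id, unit) pairs
        self.nodes = {name: [] for name in nodes}
        # job id -> result reported by the first node that solved it
        self.results = {}
        self._counter = itertools.count(1)

    def submit(self, units):
        """Accept a job, hand out its work units round-robin, return the job id."""
        job_id = "job-%d" % next(self._counter)
        names = list(self.nodes)
        for i, unit in enumerate(units):
            self.nodes[names[i % len(names)]].append((job_id, unit))
        return job_id

    def report(self, job_id, result):
        """A service node solved the job: keep the first result and cancel
        every work unit still queued for that job id."""
        self.results.setdefault(job_id, result)
        for name in list(self.nodes):
            self.nodes[name] = [(j, u) for (j, u) in self.nodes[name] if j != job_id]

    def status(self, job_id):
        """What the front-end sees when it polls, keyed by job id."""
        if job_id in self.results:
            return ("solved", self.results[job_id])
        return ("pending", None)
```

The point of the sketch is that the job id is the only key shared between the front-end, the master, and the service nodes.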
PHP implementation: the components are:
- web-facing PHP scripts: user interface. Submits jobs to enqueuer.
- enqueuer: Puts jobs into a queue.
- watcher (dequeuer): Pulls jobs out of the queue and runs them (possibly on a remote compute server) using launch-blind.
- launch-blind: Sets up the environment to run blind.
When the user submits a job:
- The PHP script calls augment-xylist to bundle up the user's information with the xylist to create an augmented xylist (axylist).
- The PHP script calls enqueuer to enqueue the job; it gets the job id back as return value.
- (The enqueuer does this by finding a directory (in the right site/epoch/ subdirectory) that doesn't already exist, creating it, and moving the axylist into it. It also adds the ANJOBID header to the axylist.)
- The watcher notices that a new axylist has appeared. It runs launch-blind with the axylist path as argument.
- launch-blind cd's into the axylist's directory, and either runs blind locally, or runs a script to run it remotely.
- Meanwhile, the PHP script retrieves job status and results by calling enqueuer --get <filename>.
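The enqueuer step (find a directory that doesn't already exist, create it, move the axylist into it) might look roughly like the following. This is a sketch under assumptions: the function name `enqueue`, the eight-digit directory numbering, the `job.axy` filename, and the `site-epoch-number` job id layout (modeled on the example id alpha-200707-12345678) are not taken from the real enqueuer. Writing the ANJOBID header is left out because it needs a FITS library.

```python
import os
import shutil


def enqueue(queue_root, site, epoch, axy_path):
    """Claim the first unused job directory under queue_root/site/epoch,
    move the axylist into it, and return the job id (hypothetical layout)."""
    parent = os.path.join(queue_root, site, epoch)
    os.makedirs(parent, exist_ok=True)
    number = 0
    while True:
        job_dir = os.path.join(parent, "%08d" % number)
        try:
            # mkdir fails if the directory already exists, so two
            # enqueuers running at once can never claim the same one.
            os.mkdir(job_dir)
            break
        except FileExistsError:
            number += 1
    shutil.move(axy_path, os.path.join(job_dir, "job.axy"))
    return "%s-%s-%08d" % (site, epoch, number)
```

Relying on directory creation to fail when the name is taken avoids a separate check-then-create race between concurrent submissions.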
The front-end interface
We want to provide a command-line interface that provides a similar user experience to the web site.
This command-line interface should also allow solving a multi-extension Sloan xylist on the local machine.
Scenario 0: solve a local file on the local machine
solve-a-field --image mypic.fits --tweak-order 4 \
--scale-low 2 --scale-high 4 --scale-units degwide --dir mypic-results
solve-a-field must run:
augment-xylist --guess-scale --image mypic.fits --scale-low 2 \
--scale-high 4 --scale-units degwide --tweak-order 4 --out mypic.axy \
--match mypic-results/match.fits --cancel mypic-results/cancel \
--solved mypic-results/solved --rdls mypic-results/rdls.fits --wcs mypic-results/wcs.fits
backend mypic.axy
render-job --dir mypic-results
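As an illustration of how solve-a-field could expand its own arguments into the three commands above, here is a minimal sketch. The function name `scenario0_commands` and its parameter names are hypothetical; the program names and flags are the ones listed in this scenario, and deriving the .axy name from the image name is an assumption based on the mypic.fits / mypic.axy example.

```python
def scenario0_commands(image, outdir, scale_low, scale_high, scale_units, tweak_order):
    """Build the argv lists for Scenario 0 (hypothetical helper)."""
    # mypic.fits -> mypic.axy, as in the example above.
    axy = image.rsplit(".", 1)[0] + ".axy"
    augment = [
        "augment-xylist", "--guess-scale", "--image", image,
        "--scale-low", str(scale_low), "--scale-high", str(scale_high),
        "--scale-units", scale_units, "--tweak-order", str(tweak_order),
        "--out", axy,
        "--match", outdir + "/match.fits",
        "--cancel", outdir + "/cancel",
        "--solved", outdir + "/solved",
        "--rdls", outdir + "/rdls.fits",
        "--wcs", outdir + "/wcs.fits",
    ]
    return [augment, ["backend", axy], ["render-job", "--dir", outdir]]
```

Each returned list could then be run in order, for example with `subprocess.run(cmd, check=True)`, stopping at the first failure.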
Scenario 0b: solve an image URL on the local machine
solve-a-field --imageurl http://google.com/mypic.fits --tweak-order 4 \
--scale-low 2 --scale-high 4 --scale-units degwide --dir mypic-results
solve-a-field must run:
wget http://google.com/mypic.fits -O mypic.fits
augment-xylist --guess-scale --image mypic.fits --scale-low 2 \
--scale-high 4 --scale-units degwide --tweak-order 4 --out mypic.axy \
--match mypic-results/match.fits --cancel mypic-results/cancel \
--solved mypic-results/solved --rdls mypic-results/rdls.fits --wcs mypic-results/wcs.fits
backend mypic.axy
render-job --dir mypic-results
Scenario 1: like the web site, solve an image given its URL
solve-a-field --imageurl http://google.com/mypic.fits --tweak-order 4 \
--scale-low 2 --scale-high 4 --scale-units degwide
solve-a-field must run:
wget http://google.com/mypic.fits -O mypic.fits
augment-xylist --guess-scale --image mypic.fits --scale-low 2 \
--scale-high 4 --scale-units degwide --tweak-order 4 --out mypic.axy
submit-job --job mypic.axy --server http://live.astrometry.net/submit --jobfile mypic.job --output-dir .
monitor-job --jobfile mypic.job
render-job --jobfile mypic.job
augment-xylist uses image2xy to produce an xylist, fits-guess-scale to find scale estimates, and get-wcs to find existing WCS headers. It combines this information with the user's parameters to produce an augmented xylist.
The "jobfile" mypic.job would contain something like:
server http://live.astrometry.net/submit jobid alpha-200707-12345678 output-dir .
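Since the jobfile format is only described as "something like" the line above, the following reader is a guess at one workable convention: whitespace-separated key/value pairs, with no whitespace inside values. The function name `parse_jobfile` is hypothetical.

```python
def parse_jobfile(text):
    """Parse 'key value key value ...' into a dict (assumed jobfile layout)."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("jobfile has a key without a value")
    # Even positions are keys, odd positions are their values.
    return dict(zip(tokens[0::2], tokens[1::2]))
```

With this convention, monitor-job and render-job would only need the `server` and `jobid` entries to locate the job.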
The submit-job program would return right away.
The monitor-job program would let you watch the log file (I love grass!). It will wait for the job to finish, then retrieve results.
(OR maybe submit-job just has a --monitor option.)
There will also be a cancel-job program.
The render-job program produces plots and stats, writes an HTML file with the results, etc.
Scenario 2: batch solve SDSS using the local machine
solve-a-field --xylist sdss.fits --tweak-order 4 \
--scale-low 0.38 --scale-high 0.41 --scale-units arcsecperpix \
--fields 1-9999 --local \
--noplots --nordls
The --fields argument says try to solve fields in FITS extensions 1-9999.
The --local argument says call the backend program directly, rather than submitting the job to a remote solver.
The --noplots and --nordls arguments tell it to not produce overlays or RDLS files. (These are too big and take too much time to render when dealing with many Sloan fields.)
The solve-a-field program must run:
augment-xylist --xylist sdss.fits --scale-low 0.38 --scale-high 0.41 \
--scale-units arcsecperpix --fields 1-9999 --noplots --nordls \
--out sdss.axy
backend --run sdss.axy
The backend process will look for a config file that tells it where indices can be found on the local filesystem, and any other site-specific config items (none that I can think of at the moment, but basically policy specifications). The --run flag tells it to run in the foreground.
How the network solver service will work
http://trac.astrometry.net/attachment/wiki/BackEndInterface/flow-submit.png?format=raw
The network solver needs to accept requests on a URL such as http://live.astrometry.net/submit. When it receives a job, it has to:
- receive the uploaded augmented xylist (newjob.axy) file
- run backend-q --submit newjob.axy --job newjob.job
- grab the job id from the newjob.job file and return it to the client
It also needs to accept requests from monitor-job and cancel-job, to retrieve the results or to cancel the job, respectively. I guess it has to use backend-q --cancel and backend-q --retrieve to do these things. backend-q --cancel will just call backend-remote --cancel. backend-q --retrieve will map the jobid to the directory in which the results are placed, and cat the requested file.
We want to place jobs in a queue instead of running them directly. Note that backend-q has the same interface as backend. backend-q just:
- chooses a job id
- creates a directory for the job
- moves the newjob.axy file there
- writes the newjob.job file (or maybe just writes it on stdout)
And then the watcher program will notice that a new file was placed in the queue. Once that job gets to the front of the queue, it will run backend-remote --submit newjob.axy --jobid alpha-200707-12345678.
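One simple way for the watcher to notice new files is to scan the queue directory periodically. The sketch below assumes that approach; the function name `new_axylists`, the use of file modification time as queue order, and the `.axy` suffix test are illustrative choices, not a description of the existing watcher.

```python
import os


def new_axylists(queue_root, seen):
    """Return paths of .axy files under queue_root that are not yet in
    `seen`, oldest first, and record them in `seen` (hypothetical helper)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(queue_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".axy") and path not in seen:
                found.append((os.path.getmtime(path), path))
    found.sort()  # oldest modification time first
    paths = [path for _mtime, path in found]
    seen.update(paths)
    return paths
```

The watcher would call this in a loop and launch backend-remote for each returned path in turn.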
backend-remote will open an ssh connection to the compute server, send the job id and newjob.axy over the ssh pipe, and run the script backend-remote-server. That script prints the log file to stdout and, when the job finishes, tars up the results and sends them back over the ssh pipe.
The backend-remote-server script will:
- read the jobid
- create a directory for the job
- read the newjob.axy file and place it in that directory
- run backend --run newjob.axy
The --run flag to backend tells it to run in the foreground.
It sounds complex, but it's basically the same as our current setup.
How the web edition will work
It could either use the backend-q tool directly, or use augment-xylist, backend --submit (to submit to the network solver service), and monitor-job.
I suspect we'll want to use the backend-q level of commands, because it avoids the roundabout route of talking to another network service.
The back-end interface
Backend job FITS file
The backend is defined as everything that happens after source extraction, up to and including generation of WCS and RDLS files. The input to the backend is a single FITS file containing:
- Parameters relevant to the file at hand to assist in solving
- The extracted (x,y) source positions
- Possibly many instances of the two previous items in extra HDU's
The backend produces the following information to be consumed by the frontend:
- WCS file
- RDLS file
- Match file
- Solved file
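Of these outputs, the solved file is simple enough for the frontend to read directly. A minimal sketch, based on the format given under ANSOLVED in the keyword list (one byte per field, 0x1 if solved, and no file at all when nothing solved). It assumes that the first byte corresponds to field 1; the function name `solved_fields` is hypothetical.

```python
def solved_fields(path):
    """Return the numbers of solved fields recorded in a solved file.
    Assumes byte 0 describes field 1, byte 1 describes field 2, and so on."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        # The file is only created if at least one field solves.
        return []
    return [i + 1 for i, byte in enumerate(data) if byte == 1]
```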
FITS keywords
Required keywords:
- IMAGEW - Image width. Positive real
- IMAGEH - Image height. Positive real
- ANRUN - the "Go" button.
Optional keywords:
- ANPOSERR - Field positional error, in pixels. Positive real. Default 1.
- ANSOLVED - Output filename for solved file. One byte per field, with value 0x1 if the field solved, 0x0 otherwise. The file is only created if at least one field solves.
- ANSOLVIN - Input filename for solved file. Lists fields that have already been solved.
- ANMATCH - Output filename for match file. FITS table describing the quad match that solved the field.
- ANRDLS - Output filename for index RDLS file. The (RA,Dec) of index stars that overlap the field.
- ANWCS - Output filename for WCS file. A FITS header containing the header cards required to specify the solved WCS.
- ANCANCEL - Input filename - if this file exists, the job is cancelled.
- ANTLIM - Wall-clock time limit, in seconds (default inf)
- ANCLIM - CPU time limit, in seconds (default inf)
- ANPARITY - "BOTH", "POS", "NEG" (default "BOTH")
- ANTWEAK - FITS Boolean, i.e. 'T' or 'F' (default 'T')
- ANTWEAKO - Tweak order. Integer in [1, 10], default=3. Why 3? Radial distortion requires 4.
- ANAPPL# - Lower bound on scale (in arcseconds per pixel) for estimate#, # = 1, 2, ... (default determined by the backend.)
- ANAPPU# - Upper bound on scale (in arcseconds per pixel) for estimate#, # = 1, 2, ... (default determined by the backend.)
- ANAPPDEF - (boolean) Also include the default range of scales (the scale bounds given are uncertain).
- ANDPL# / ANDPU# - The field objects to examine in round #. L and U stand for Lower and Upper. ANDPL1=0, ANDPU1=20 means look at field objects 0 through 20 in the first round. Default: ANDPL1 = 0, ANDPU1 = infinity
- ANFDL#/ANFDU# - Add fields to be solved, specified as inclusive ranges of table HDUs, numbered from 1. L and U stand for Lower and Upper. Defaults to all HDUs, but if even one ANFD[LU]# is specified, then only the fields explicitly listed are used. Multiple ANFD[LU]# pairs can be present, and the fields solved are the union of the specified ranges. Only valid in the primary header.
- ANFD#### - Add single fields to be solved. Only valid in the primary header.
- ANODDSPR - Odds ratio required before printing a solution. Default 1e3.
- ANODDSKP - Odds ratio required to keep a solution. Default 1e9.
- ANODDSSL - Odds ratio required to consider a field solved. Default 1e9.
- ANIMFRAC - Fraction of the rectangular image that the field actually occupies (e.g. for round fields). Real in (0, 1], default 1.
- ANCTOL - Code tolerance. Units of distance in 4-D code space. Real > 0, default 0.01.
- ANDISTR - Distractor fraction. Real in (0, 1). Default is 0.25.
- ANINTRLV - The interleaving strategy to use. Will be a string.
- ANXCOL, ANYCOL - Name of the FITS column containing the X and Y coordinates of sources.
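The field-selection rules for ANFDL#/ANFDU# and ANFD#### can be made concrete with a short sketch. It treats the primary header as a plain dict and makes several assumptions that the text does not settle: the field number of an ANFD#### card is taken from its value (not from the digits in the keyword), an ANFDL# without a matching ANFDU# is ignored, and fields outside 1..n_fields are dropped. The function name `fields_to_solve` is hypothetical.

```python
import re


def fields_to_solve(header, n_fields):
    """Return the sorted field numbers selected by ANFDL#/ANFDU# ranges and
    ANFD#### single-field cards; all of 1..n_fields if none are present."""
    selected = set()
    explicit = False
    for key, value in header.items():
        m = re.fullmatch(r"ANFDL(\d+)", key)
        if m and ("ANFDU" + m.group(1)) in header:
            # Inclusive range from the paired lower and upper cards.
            explicit = True
            upper = int(header["ANFDU" + m.group(1)])
            selected.update(range(int(value), upper + 1))
        elif re.fullmatch(r"ANFD\d+", key):
            # Single field; assumed to be given by the card's value.
            explicit = True
            selected.add(int(value))
    if not explicit:
        return list(range(1, n_fields + 1))
    return sorted(f for f in selected if 1 <= f <= n_fields)
```

For example, ranges 1-3 and 7-8 plus a single field 5 select fields 1, 2, 3, 5, 7, and 8, while a header with no ANFD cards selects every field.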
WCS to verify: if the image already has a WCS that you want to verify, convert it to TAN format and specify it using these headers. If there are multiple WCS instances to try, list them all, using increasing # values for each one. The first one should have #=1, the second one #=2, etc.
- ANW#PIX1 CRPIX1
- ANW#PIX2 CRPIX2
- ANW#VAL1 CRVAL1
- ANW#VAL2 CRVAL2
- ANW#CD11 CD1_1
- ANW#CD12 CD1_2
- ANW#CD21 CD2_1
- ANW#CD22 CD2_2
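The mapping above is mechanical, so generating the verification cards from existing TAN headers takes only a few lines. In this sketch each WCS is a plain dict keyed by the standard FITS names; the function name `verify_headers` and the dict representation are assumptions. The length check reflects the FITS limit of 8 characters per keyword, which these names reach once # is a single digit.

```python
# Standard TAN keyword -> suffix used in the ANW# verification keywords.
TAN_KEYS = {
    "CRPIX1": "PIX1", "CRPIX2": "PIX2",
    "CRVAL1": "VAL1", "CRVAL2": "VAL2",
    "CD1_1": "CD11", "CD1_2": "CD12",
    "CD2_1": "CD21", "CD2_2": "CD22",
}


def verify_headers(wcs_list):
    """Turn a list of TAN WCS dicts into ANW#... cards, numbering from 1."""
    cards = {}
    for n, wcs in enumerate(wcs_list, start=1):
        for standard, suffix in TAN_KEYS.items():
            key = "ANW%d%s" % (n, suffix)
            if len(key) > 8:
                raise ValueError("FITS keyword too long: " + key)
            cards[key] = wcs[standard]
    return cards
```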
SIP extensions:
- ANW#SAO SIP A/B order
- ANW#SAPO SIP AP/BP order
- ANW#A## SIP polynomial A_#_#
- ANW#B## SIP polynomial B_#_#
- ANW#AP## SIP polynomial AP_#_#
- ANW#BP## SIP polynomial BP_#_#
Extensions
The FITS job file must have the required keywords defined in the primary HDU. However, individual extensions can specify any of the above solver keywords which override the default and file-wide keywords.
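The override order described here (extension header first, then the primary header, then the built-in default) maps naturally onto a chained lookup. A minimal sketch, with headers represented as plain dicts: the function name `effective_settings` is hypothetical, and `DEFAULTS` lists only some of the defaults quoted in the keyword list.

```python
from collections import ChainMap

# A subset of the defaults quoted in the keyword list above.
DEFAULTS = {
    "ANPOSERR": 1.0, "ANPARITY": "BOTH", "ANTWEAK": True, "ANTWEAKO": 3,
    "ANODDSPR": 1e3, "ANODDSKP": 1e9, "ANODDSSL": 1e9,
    "ANIMFRAC": 1.0, "ANCTOL": 0.01, "ANDISTR": 0.25,
}


def effective_settings(primary, extension):
    """Look keywords up in the extension header first, then the primary
    header, then the defaults."""
    return ChainMap(extension, primary, DEFAULTS)
```

A field whose extension sets ANTWEAKO=2 would be solved with tweak order 2 even if the primary header says 4, while fields without their own card inherit the primary value.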
