Optimizing performance for single-image reprojection#

Disabling coordinate transformation round-tripping#

For the interpolation and adaptive algorithms, an optional roundtrip_coords argument is accepted. By default, after coordinates are transformed from the output plane to the input plane, the input-plane coordinates are transformed back to the output plane to ensure that the transformation is defined in both directions. This doubles the amount of coordinate-transformation work to be done. In speed-critical situations, where it is known that the coordinate transformation is defined everywhere, this extra work can be disabled by setting roundtrip_coords=False. (Note that this is not a question of whether each output pixel maps to an existing pixel in the input image and vice versa, but whether it maps to a valid coordinate in the coordinate system of the input image, regardless of whether that coordinate falls within the bounds of the input image.)
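For example (reusing the input_array, wcs_in, and wcs_out names from the examples further down):

>>> array, footprint = reproject_interp((input_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048),
...                                     roundtrip_coords=False)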

Disabling returning the footprint#

If you don’t need the output footprint after reprojection, you can set return_footprint=False to return only the reprojected array. This can save memory and in some cases computing time:

>>> array = reproject_interp(..., return_footprint=False)

Using memory-mapped output arrays#

If you are dealing with a large dataset to reproject, you may want to write the reprojected array (and optionally the footprint) to an array of your choice, such as a memory-mapped array. For example:

>>> import numpy as np
>>> from astropy.io import fits
>>> from reproject import reproject_interp
>>> header_out = fits.Header.fromtextfile('cube_header_gal.hdr')
>>> shape = (header_out['NAXIS3'], header_out['NAXIS2'], header_out['NAXIS1'])
>>> # Memory-mapped array on disk that will receive the reprojected values
>>> array_out = np.memmap(filename='output.np', mode='w+',
...                       shape=shape, dtype='float32')
>>> hdu = fits.open('cube_file.fits')
>>> reproject_interp(hdu, header_out, output_array=array_out,
...                  return_footprint=False)

After the call to reproject_interp(), array_out will contain the reprojected values. If you also set up a memory-mapped array for the footprint, you can pass it in as well, for example:

>>> footprint_out = np.memmap(filename='footprint.np', mode='w+',
...                           shape=shape, dtype='float32')
>>> reproject_interp(hdu, header_out, output_array=array_out,
...                  output_footprint=footprint_out)

If you are dealing with FITS files, you can skip the numpy memmap step and use FITS large file creation:

>>> header_out = fits.Header.fromtextfile('cube_header_gal.hdr')
>>> header_out.tofile('new_cube.fits')
>>> shape = tuple(header_out[f'NAXIS{ii}'] for ii in range(1, header_out['NAXIS'] + 1))
>>> # Extend the file to its final size by seeking to the last byte and writing a null byte
>>> with open('new_cube.fits', 'rb+') as fobj:
...     data_size = np.prod(shape) * np.abs(header_out['BITPIX'] // 8)
...     fobj.seek(len(header_out.tostring()) + data_size - 1)
...     fobj.write(b'\0')
>>> hdu_out = fits.open('new_cube.fits', mode='update')
>>> rslt = reproject_interp(hdu, header_out, output_array=hdu_out[0].data,
...                         return_footprint=False)
>>> hdu_out.flush()

Multiple images with the same coordinates#

If you have multiple images with the exact same coordinate system (e.g. a raw image and a corresponding processed image) and want to reproject both to the same output frame, it is faster to compute the coordinate mapping between input and output pixels only once and reuse this mapping for each reprojection. This is supported by passing multiple input images as an additional dimension in the input data array. When the input array contains more dimensions than the input WCS describes, the extra leading dimensions are taken to represent separate images with the same coordinates, and the reprojection loops over those dimensions after computing the pixel mapping. For example:

>>> raw_image, header_in = fits.getdata('raw_image.fits', header=True)
>>> bg_subtracted_image = fits.getdata('background_subtracted_image.fits')
>>> header_out = ...  # Prepare your desired output projection here
>>> # Combine the two images into one array
>>> image_stack = np.stack((raw_image, bg_subtracted_image))
>>> # We provide a header that describes 2 WCS dimensions, but our input
>>> # array shape is (2, ny, nx)---the 'extra' first dimension represents
>>> # separate images sharing the same coordinates.
>>> reprojected, footprint = reproject.reproject_adaptive(
...         (image_stack, header_in), header_out)
>>> # The shape of `reprojected` is (2, ny', nx')
>>> reprojected_raw, reprojected_bg_subtracted = reprojected[0], reprojected[1]

For reproject_interp() and reproject_adaptive(), this is approximately twice as fast as reprojecting the two images separately. For reproject_exact() the savings are much less significant, as producing the coordinate mapping is a much smaller portion of the total runtime for this algorithm.

When the output coordinates are provided as a WCS and a shape_out tuple, shape_out may describe the output shape of a single image, in which case the extra leading dimensions are prepended automatically, or it may include the extra dimensions, in which case the size of the extra dimensions must match those of the input data exactly.
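For instance, continuing the example above but providing the output as a WCS (here wcs_in is the WCS corresponding to header_in, and wcs_out is a hypothetical output WCS with a (1024, 1024) celestial grid), either form of shape_out is accepted:

>>> # shape_out for a single image; the leading dimension is added automatically
>>> reprojected, footprint = reproject.reproject_adaptive(
...         (image_stack, wcs_in), wcs_out, shape_out=(1024, 1024))
>>> # shape_out may instead include the extra dimension explicitly
>>> reprojected, footprint = reproject.reproject_adaptive(
...         (image_stack, wcs_in), wcs_out, shape_out=(2, 1024, 1024))
>>> reprojected.shape
(2, 1024, 1024)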

While the reproject functions can accept the name of a FITS file as input, from which the input data and coordinates are loaded automatically, this multi-image reprojection feature does not support loading multiple images automatically from multiple HDUs within one FITS file, as it would be difficult to verify automatically that the HDUs share exactly the same coordinates. If multiple HDUs do share coordinates and are to be reprojected together, they must be loaded separately and combined into a single input array (e.g. using np.stack as in the example above).
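For example (a sketch, assuming a hypothetical file multi_extension.fits whose extensions 1 and 2 contain images sharing the same WCS):

>>> with fits.open('multi_extension.fits') as hdul:
...     header_in = hdul[1].header
...     image_stack = np.stack((hdul[1].data, hdul[2].data))
>>> reprojected, footprint = reproject.reproject_adaptive(
...         (image_stack, header_in), header_out)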

Chunk by chunk reprojection#

When calling one of the reprojection functions, you can specify a block size to use for the reprojection; this is used to iterate over the output array in chunks. For instance, if you pass in a (1024, 1024) array, specify an output shape of (2048, 2048) (e.g. via shape_out), and set block_size=(256, 256):

>>> from reproject import reproject_interp
>>> input_array.shape
(1024, 1024)
>>> array, footprint = reproject_interp((input_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048), block_size=(256, 256))

the reprojection will be done in 64 separate output chunks. Note however that this does not break up the input array into chunks since in the general case any input pixel may contribute to any output pixel.

Multi-threaded reprojection#

By default, the iteration over the output chunks is done in a single process/thread, but you can set parallel=True to process the chunks in parallel using multiple threads. In that case, block_size is automatically set to a sensible default, but you can still set block_size manually for more control. You can also set parallel to an integer to indicate the number of threads to use.
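For example, to reproject using four threads (reusing input_array, wcs_in, and wcs_out from the chunked example above):

>>> array, footprint = reproject_interp((input_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048),
...                                     block_size=(256, 256), parallel=4)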

By default, in parallel mode, the entire input array is written to a temporary file that is then memory-mapped, to avoid loading the whole input array into memory in each process. If you are specifying a WCS with fewer dimensions than the data to be reprojected, as described in Multiple images with the same coordinates, you can instead choose a block size that covers the whole image along the dimensions being reprojected, while remaining smaller along the other dimensions. For example, if you are reprojecting a spectral cube with dimensions (500, 2048, 2048), where 500 is the number of spectral channels and (2048, 2048) is the celestial plane, and you are reprojecting just the celestial part of the WCS, you can specify a block size of (N, 2048, 2048). This enables a separate reprojection mode in which the input array is not written to disk and the reprojection is done in truly independent chunks of size (N, 2048, 2048).
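As a sketch of this mode, assume cube is a (500, 2048, 2048) array, celestial_wcs_in is a two-dimensional WCS describing its last two dimensions, and celestial_wcs_out is the desired output celestial WCS, also with a 2048 by 2048 grid (all hypothetical names):

>>> array = reproject_interp((cube, celestial_wcs_in), celestial_wcs_out,
...                          shape_out=(500, 2048, 2048),
...                          block_size=(100, 2048, 2048),
...                          parallel=8, return_footprint=False)

Here each (100, 2048, 2048) chunk of spectral channels is reprojected independently, and the input array does not need to be written to a temporary file.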

Input dask arrays#

The three main reprojection functions can accept dask arrays as inputs, e.g. assuming you have already constructed a dask array named dask_array:

>>> dask_array
dask.array<array, shape=(1024, 1024), dtype=float64, chunksize=(128, 128), chunktype=numpy.ndarray>

you can pass this in as part of the first argument to one of the reprojection functions:

>>> array, footprint = reproject_interp((dask_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048))

In general, however, we cannot benefit much from the chunking of the input arrays, because any input pixel might in principle contribute to any output pixel. Therefore, for now, when a dask array is passed as input, it is computed using the current default scheduler and converted to a Numpy memory-mapped array. This is done efficiently in terms of memory and never results in the whole dataset being loaded into memory at any given time. However, it does require sufficient space on disk to store the array. If your default system temporary directory does not have sufficient space, you can set the TMPDIR environment variable to point at another directory:

>>> import os
>>> os.environ['TMPDIR'] = '/home/lancelot/tmp'

Output dask arrays#

By default, the reprojection functions will do the computation immediately and return Numpy arrays for the reprojected array and optionally the footprint (this applies regardless of whether dask or Numpy arrays were passed in, and regardless of the parallelization options above). However, by setting return_type='dask', you can make the functions delay any computation and return dask arrays:

>>> array, footprint = reproject_interp((input_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048), block_size=(256, 256),
...                                     return_type='dask')
>>> array
dask.array<getitem, shape=(2048, 2048), dtype=float64, chunksize=(256, 256), ...>

You can then compute the array or a section of the array yourself whenever you need, or use the result in further dask expressions.
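For example, to evaluate only one section of the result when it is needed (a minimal usage sketch):

>>> corner = array[:256, :256].compute()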

Using dask.distributed#

The dask.distributed package makes it possible to use distributed schedulers for dask. In order to compute reprojections with dask.distributed, set up the client and then call the reprojection functions with parallel='current-scheduler'. Alternatively, you can make use of the return_type='dask' option mentioned above so that you can compute the dask array once the distributed scheduler has been set up.
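For example (a sketch starting a local cluster, and reusing input_array, wcs_in, and wcs_out from the examples above):

>>> from dask.distributed import Client
>>> client = Client()  # start a local cluster and register it as the current scheduler
>>> array, footprint = reproject_interp((input_array, wcs_in), wcs_out,
...                                     shape_out=(2048, 2048),
...                                     parallel='current-scheduler')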