Optimizing performance for mosaicking#
Using memory-mapped output arrays#
If you are producing a large mosaic, you may want to write the mosaic and footprint to arrays of your choice, such as memory-mapped arrays. For example:
>>> output_array = np.memmap(filename='array.np', mode='w+',
... shape=shape_out, dtype='float32')
>>> output_footprint = np.memmap(filename='footprint.np', mode='w+',
... shape=shape_out, dtype='float32')
>>> reproject_and_coadd(...,
... output_array=output_array,
... output_footprint=output_footprint)
Using memory-mapped intermediate arrays#
During the mosaicking process, each cube is reprojected to the minimal subset of
the final header that it covers. In some cases, this can result in arrays that
do not fit in memory. To avoid this, you can use the intermediate_memmap
option to indicate that all intermediate arrays in the mosaicking process should
be memory-mapped rather than held in memory:
>>> reproject_and_coadd(...,
... intermediate_memmap=True)
Combined with the above option to specify the output array and footprint for the
final mosaic, this makes it possible to ensure that no large arrays are ever loaded
into memory. Note, however, that you will need sufficient disk space in your
temporary directory. If your default system temporary directory does not have
enough space, you can set the TMPDIR environment variable to point at another
directory:
>>> import os
>>> os.environ['TMPDIR'] = '/home/lancelot/tmp'
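If you want to check that the new TMPDIR actually takes effect, keep in mind that Python's tempfile module caches the temporary directory on first use, so TMPDIR should be set before any temporary files are created. The sketch below (using a scratch directory as a stand-in for a real path) resets that cache to pick up the change:

```python
import os
import tempfile

# Point TMPDIR at a directory with enough free space
# (a scratch directory here, standing in for e.g. a large local disk).
scratch = tempfile.mkdtemp()
os.environ['TMPDIR'] = scratch

# tempfile caches the temporary directory after first use, so clear the
# cache to make gettempdir() re-read TMPDIR (only needed if tempfile
# has already been used in this process).
tempfile.tempdir = None
print(tempfile.gettempdir() == scratch)  # True
```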
Multi-threading#
As with single-image reprojection (see Multi-threaded reprojection), you can
make use of multi-threading during the mosaicking process by setting the
parallel= option to True, or to an integer to indicate the number of
threads to use.
Using dask.distributed#
The dask.distributed package makes
it possible to use distributed schedulers for dask. In order to do mosaicking
with dask.distributed, set up the client and then call
reproject_and_coadd() with the
parallel='current-scheduler' option.
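For example, assuming a local cluster is sufficient (calling Client() with no arguments starts one), the setup might look like:

>>> from dask.distributed import Client
>>> client = Client()  # connect to (or start) a distributed scheduler
>>> reproject_and_coadd(...,
...                     parallel='current-scheduler')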