API
RangeExtractor.AbstractTileOperation
RangeExtractor.AsyncSingleThreaded
RangeExtractor.Dagger
RangeExtractor.FixedGridTiling
RangeExtractor.Multithreaded
RangeExtractor.RecombiningTileOperation
RangeExtractor.Serial
RangeExtractor.SumTileOperation
RangeExtractor.TileOperation
RangeExtractor.TileState
RangeExtractor.TilingStrategy
RangeExtractor._nothing_or_view
RangeExtractor.allocate_result
RangeExtractor.crop_ranges_to_array
RangeExtractor.extract
RangeExtractor.extract!
RangeExtractor.similar_blank
RangeExtractor.AbstractTileOperation (Type)
AbstractTileOperation
Abstract type for tile operations, which are callable structs that can operate on a TileState.
Interface
All subtypes of AbstractTileOperation MUST implement the following interface:
- (op::AbstractTileOperation)(state::TileState): Apply the operation to the given tile state. Return a tuple of (contained_results, shared_results). The order of the results MUST be the same as the order of the indices in state.contained_ranges and state.shared_ranges.
- combine(op::AbstractTileOperation, range, metadata, results, tile_idxs): Combine the outputs from the portions of a shared range, and return the final result.
Optionally, subtypes can implement the following methods:
- (op::AbstractTileOperation)(state::TileState, contained_channel, shared_channel): Apply the operation to the given tile state, and write output to the specified channels. Output to both channels must be in the form (tile_idx, result_idx) => result, where tile_idx is the index of the tile in the TileState, and result_idx is the index of the result in the contained_ranges or shared_ranges of the TileState. This interface is somewhat more efficient, since it avoids allocating intermediate arrays, which can cut down on inference and GC overhead, especially when dealing with many tiles.
- contained_result_type(::Type{<: AbstractTileOperation}, DataArrayType): Return the type of the results from the contained ranges. Defaults to Any.
- shared_result_type(::Type{<: AbstractTileOperation}, DataArrayType): Return the type of the results from the shared ranges. Defaults to Any.
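The required interface can be sketched without loading the package. In this minimal, hypothetical example, MockTileState, MaxTileOperation, local_view and the standalone combine function are illustrative stand-ins, not RangeExtractor API. It also assumes that contained_ranges and shared_ranges hold tuples of parent-array ranges, which must be clipped and shifted into tile-local indices using tile_ranges.

```julia
# Hypothetical stand-in for TileState; field names follow the TileState docstring,
# but only the fields this sketch needs are included.
struct MockTileState{T<:AbstractArray,R}
    tile::T                      # in-memory data of the tile
    tile_ranges::R               # ranges the tile covers in the parent array
    contained_ranges::Vector{R}  # ranges fully inside this tile
    shared_ranges::Vector{R}     # ranges that also touch other tiles
end

# Hypothetical operation: the maximum over each range.
struct MaxTileOperation end

# Assumption: ranges are given in parent-array coordinates, so they are clipped to
# the tile and shifted into tile-local indices before viewing the tile data.
function local_view(state::MockTileState, ranges)
    local_ranges = map(ranges, state.tile_ranges) do r, t
        (max(first(r), first(t)) - first(t) + 1):(min(last(r), last(t)) - first(t) + 1)
    end
    return view(state.tile, local_ranges...)
end

# Required form: return (contained_results, shared_results) in range order.
function (op::MaxTileOperation)(state::MockTileState)
    contained = [maximum(local_view(state, r)) for r in state.contained_ranges]
    shared = [maximum(local_view(state, r)) for r in state.shared_ranges]
    return (contained, shared)
end

# Combine step with the documented argument order: the per-tile maxima of a
# shared range reduce to the maximum of the whole range.
combine(::MaxTileOperation, range, metadata, results, tile_idxs) = maximum(results)

# Usage: a 4x4 array split into two row tiles (rows 1:2 and rows 3:4).
const R2 = Tuple{UnitRange{Int},UnitRange{Int}}
A = reshape(collect(1.0:16.0), 4, 4)
tile1 = MockTileState(A[1:2, :], (1:2, 1:4), R2[(1:2, 1:2)], R2[(1:3, 3:4)])
tile2 = MockTileState(A[3:4, :], (3:4, 1:4), R2[], R2[(1:3, 3:4)])

c1, s1 = MaxTileOperation()(tile1)
c2, s2 = MaxTileOperation()(tile2)
total = combine(MaxTileOperation(), (1:3, 3:4), nothing, [s1[1], s2[1]], [1, 2])
```

With the real package, the operation would presumably subtype AbstractTileOperation and extend the package's combine method instead; that wiring is omitted here.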
RangeExtractor.AsyncSingleThreaded (Type)
AsyncSingleThreaded(; ntasks = 0)
Asynchronous execution, but only using a single thread.
RangeExtractor.Dagger (Type)
Dagger()
Dagger execution, using Dagger.jl's distributed execution. Runs asynchronously.
RangeExtractor.FixedGridTiling (Type)
FixedGridTiling(chunk_sizes...)
Tiles a domain into a fixed grid of chunks.
Geometries that are fully encompassed by a tile are processed in bulk whenever a tile is read.
Geometries that lie on chunk boundaries are handled through a separate queue: whenever a tile is read, the view of the tile covering the part of each such geometry that lies within that tile is added to the queue. These views are combined by a separate worker task, and once all the information on a geometry has been read, it is processed by a second worker task.
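To make the contained/shared split concrete, here is a sketch of how tile membership could be computed on a fixed grid. The helpers tile_of, grid_tile_indices, grid_tile_ranges and is_contained are hypothetical; the sketch assumes 1-based arrays, a grid anchored at index 1, and CartesianIndex tile indices (the index type that TilingStrategy documents for FixedGridTiling).

```julia
# Index (1-based) of the tile containing array index i, for chunk size c.
tile_of(i::Int, c::Int) = div(i - 1, c) + 1

# All tiles that an N-dimensional range touches.
function grid_tile_indices(chunk_sizes::NTuple{N,Int}, ranges::NTuple{N,AbstractUnitRange{Int}}) where {N}
    per_dim = map((r, c) -> tile_of(first(r), c):tile_of(last(r), c), ranges, chunk_sizes)
    return CartesianIndices(per_dim)
end

# Parent-array ranges covered by one tile (not cropped to the array's size).
grid_tile_ranges(chunk_sizes::NTuple{N,Int}, idx::CartesianIndex{N}) where {N} =
    map((i, c) -> ((i - 1) * c + 1):(i * c), Tuple(idx), chunk_sizes)

# A range is contained if it touches exactly one tile; otherwise it is shared.
is_contained(chunk_sizes, ranges) = length(grid_tile_indices(chunk_sizes, ranges)) == 1

is_contained((10, 10), (3:7, 12:18))              # true: only tile (1, 2)
is_contained((10, 10), (8:15, 1:5))               # false: tiles (1, 1) and (2, 1)
grid_tile_ranges((10, 10), CartesianIndex(2, 1))  # (11:20, 1:10)
```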
However, this approach could change. If we know the statistic is separable and associative (like a histogram with fixed bins, or mean, min, max, etc.), we could simply process the partial geometries in the same task that reads the tile, and recombine at the end. This would avoid the (cognitive) overhead of managing separate channels and worker tasks, but it would be somewhat less efficient and less general over the space of zonal functions. It would, however, allow massive zonals, so e.g.
RangeExtractor.Multithreaded (Type)
Multithreaded()
Multithreaded execution, using all available threads. Runs asynchronously.
RangeExtractor.RecombiningTileOperation (Type)
RecombiningTileOperation(f)
A tile operation that always passes f a fully materialized array over the requested range.
Contained ranges are materialized and evaluated as soon as they are encountered; the relevant sections of shared ranges are put in the shared channel and set aside.
When all components of a shared range have been read, f is called with the fully materialized array.
Here, the combining function is simply an array mosaic, so there is no difference between shared and contained operations: f always sees the whole range.
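The mosaic step can be sketched as below. mosaic and recombine_and_apply are hypothetical helpers, and a plain fill with an explicit blank value stands in for similar_blank; the package's actual channel handling is not modeled.

```julia
# Hypothetical mosaic step: copy each tile's piece of a shared range into one array.
# `pieces` holds (parent_ranges, data) pairs; `blank` stands in for similar_blank.
function mosaic(full_ranges::NTuple{N,AbstractUnitRange{Int}}, pieces, blank) where {N}
    dest = fill(blank, map(length, full_ranges))
    for (piece_ranges, data) in pieces
        # Translate parent-array indices into indices local to `dest`.
        local_ranges = map((p, r) -> (first(p) - first(r) + 1):(last(p) - first(r) + 1),
                           piece_ranges, full_ranges)
        dest[local_ranges...] .= data
    end
    return dest
end

# Once every piece has arrived, `f` sees the fully materialized array.
recombine_and_apply(f, full_ranges, pieces, blank) = f(mosaic(full_ranges, pieces, blank))

A = reshape(collect(1.0:16.0), 4, 4)
full = (2:4, 2:3)  # a range split across row tiles 1:2 and 3:4
pieces = [((2:2, 2:3), A[2:2, 2:3]), ((3:4, 2:3), A[3:4, 2:3])]
recombine_and_apply(sum, full, pieces, 0.0) == sum(A[2:4, 2:3])  # true
```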
RangeExtractor.Serial (Type)
Serial()
Serial execution, no asynchronicity at all.
RangeExtractor.SumTileOperation (Type)
SumTileOperation()
An operator that sums the values in each range.
The order of summation may change (for example, when partial sums from several tiles are combined), so results are not guaranteed to be bit-for-bit identical to a sequential sum, but they will be approximately equal.
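A small illustration of why reordering matters: summing a range in one pass and combining per-tile partial sums can differ in the last bits. The two-tile split here is hypothetical, and only Base Julia is used.

```julia
x = [0.1, 0.2, 0.3]

# Summing the whole range in order: (0.1 + 0.2) + 0.3
whole = foldl(+, x)

# Summing per tile (tile 1 holds x[1], tile 2 holds x[2:3]), then combining: 0.1 + (0.2 + 0.3)
tile_sums = [foldl(+, view(x, 1:1)), foldl(+, view(x, 2:3))]
combined = foldl(+, tile_sums)

println(whole)     # 0.6000000000000001
println(combined)  # 0.6
```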
RangeExtractor.TileOperation (Type)
TileOperation(; contained, shared, combine)
TileOperation(operation)
Create a tile operation that can operate on a TileState and return a tuple of results from the contained and shared ranges.
It calls the contained function on each contained range, and the shared function on each shared range.
contained and shared are called with the signature func(data, metadata), where data is a view of the tile data on the current range, and metadata is the metadata for that range.
combine is called with the signature combine(shared_results, shared_tile_idxs, metadata), where shared_results is an array of results from the shared function, shared_tile_idxs is an array of the tile indices that the shared results came from, and metadata is the metadata for that range.
If constructed with a single function, that function is used for both contained and shared operations.
Arguments
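- contained: Function to apply to contained regions (ranges that lie entirely within one tile)
- shared: Function to apply to the part of a shared region (a range that spans more than one tile) that lies within the current tile
- combine: Function that combines the per-tile shared results into the final result for a shared range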
Examples
using RangeExtractor: TileOperation
using Statistics: mean

# Separate functions for contained regions, shared regions, and combining
op = TileOperation(
    contained = (data, meta) -> sum(data),
    shared = (data, meta) -> sum(data),
    combine = (x, _u, _w) -> sum(x)
)
# A single function for both contained and shared regions
op = TileOperation((data, meta) -> mean(data))
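Separate shared and combine functions matter when a statistic cannot be rebuilt from per-tile results of the same kind. This sketch calls hypothetical functions with the documented signatures directly, without the package, for a mean: shared returns partial (sum, count) pairs and combine merges them.

```julia
using Statistics: mean

# Hypothetical functions following the documented TileOperation signatures; `meta` is unused.
contained_mean(data, meta) = mean(data)
shared_partial(data, meta) = (sum(data), length(data))  # partial statistics from one tile
function combine_mean(shared_results, shared_tile_idxs, meta)
    total = sum(first, shared_results)
    n = sum(last, shared_results)
    return total / n
end

data = [1.0, 2.0, 3.0, 4.0, 5.0]
# Pretend this range is split across two tiles covering 1:2 and 3:5.
partials = [shared_partial(view(data, 1:2), nothing), shared_partial(view(data, 3:5), nothing)]
combined = combine_mean(partials, [1, 2], nothing)  # 3.0, equal to mean(data)

# Averaging per-tile means instead weights the tiles equally and gives 2.75.
naive = mean([contained_mean(view(data, 1:2), nothing), contained_mean(view(data, 3:5), nothing)])
```

With the package loaded, these three functions could presumably be passed as TileOperation(contained = contained_mean, shared = shared_partial, combine = combine_mean).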
RangeExtractor.TileState (Type)
TileState{N, TileType, RowVecType}
TileState(tile::TileType, tile_offset::CartesianIndex{N}, contained_rows::AbstractVector, shared_rows::AbstractVector)
A struct that holds all the state that is local to a single tile.
Fields
- tile: The in-memory data of the tile.
- tile_ranges: The ranges that the tile covers in the parent array.
- contained_ranges: The ranges of the rows that are fully contained in the tile.
- shared_ranges: The ranges of the rows that are only partially contained in the tile, i.e. shared with other tiles.
- contained_metadata: The rows that are fully contained in the tile.
- shared_metadata: The rows that are only partially contained in the tile, i.e. shared with other tiles.
RangeExtractor.TilingStrategy (Type)
abstract type TilingStrategy
Abstract type for tiling strategies. A subtype must hold all the information necessary to create its tiling.
All tiling strategies MUST implement the following methods:
- indextype(::Type{<: TilingStrategy}): Return the type of the index used by the tiling strategy. For example, FixedGridTiling returns CartesianIndex{N}. RTreeTiling might return a single integer corresponding to the R-tree node id.
- get_tile_indices(tiling, range): Given a range, return the indices of the tiles that the range intersects.
- tile_to_ranges(tiling, index): Given a tile index, return the ranges that the tile covers.
- split_ranges_into_tiles(tiling, ranges): Given a set of ranges, return three dictionaries:
  - A dictionary mapping tile indices to the indices of the ranges that the tile fully contains (Ints).
  - A dictionary mapping tile indices to the indices of the ranges that the tile shares with one or more other tiles (Ints).
  - A dictionary mapping the indices of the shared ranges (Ints) to the indices of the tiles that each shared range intersects.
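A sketch of the three dictionaries returned by split_ranges_into_tiles, for a hypothetical one-dimensional fixed grid with plain Int tile indices. tiles_for and split_ranges_sketch are illustrative names, not package functions.

```julia
# Tiles (1-based, fixed chunk size) that a 1-D range touches.
tiles_for(r::AbstractUnitRange{Int}, chunk::Int) = (div(first(r) - 1, chunk) + 1):(div(last(r) - 1, chunk) + 1)

function split_ranges_sketch(chunk::Int, ranges::Vector{<:AbstractUnitRange{Int}})
    contained = Dict{Int,Vector{Int}}()        # tile => indices of ranges it fully contains
    shared = Dict{Int,Vector{Int}}()           # tile => indices of ranges shared with other tiles
    shared_to_tiles = Dict{Int,Vector{Int}}()  # shared range index => tiles it intersects
    for (i, r) in enumerate(ranges)
        tiles = tiles_for(r, chunk)
        if length(tiles) == 1
            push!(get!(Vector{Int}, contained, only(tiles)), i)
        else
            for t in tiles
                push!(get!(Vector{Int}, shared, t), i)
            end
            shared_to_tiles[i] = collect(tiles)
        end
    end
    return contained, shared, shared_to_tiles
end

# With chunk size 10: 1:3 sits in tile 1, 15:18 in tile 2, and 4:12 spans tiles 1 and 2.
contained, shared, shared_to_tiles = split_ranges_sketch(10, [1:3, 4:12, 15:18])
```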
RangeExtractor._nothing_or_view (Method)
_nothing_or_view(x, idx)
Return view(x, idx) if x is not nothing, otherwise return nothing.
This lets metadata = nothing still work with broadcast.
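A one-line sketch consistent with this description (not necessarily the package's exact definition), plus the broadcast behavior it relies on. Julia treats nothing as a scalar when broadcasting; wrapping a metadata vector in Ref so it is passed whole is an assumption about how the call site looks.

```julia
_nothing_or_view(x, idx) = isnothing(x) ? nothing : view(x, idx)

metadata = ["a", "b", "c", "d"]
idxs = [1:2, 3:4]

# With metadata present: one view per index set (Ref stops broadcasting over `metadata`).
views = _nothing_or_view.(Ref(metadata), idxs)

# With metadata = nothing: `nothing` broadcasts as a scalar, so every element is `nothing`.
nothings = _nothing_or_view.(nothing, idxs)
```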
RangeExtractor.allocate_result (Method)
allocate_result(operator, data, ranges, metadata, strategy)
Allocate a result array for the given extraction operation.
Returns a tuple of (allocated, nskip), where nskip is the number of ranges that should be skipped.
RangeExtractor.crop_ranges_to_array (Method)
crop_ranges_to_array(array, ranges)
Crop the ranges (a Tuple of AbstractUnitRange) to the axes of array.
This uses intersect internally to crop the ranges.
Returns a Tuple of AbstractUnitRange that have been cropped to the axes of array.
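A sketch of the cropping described above, assuming it amounts to intersecting each range with the matching axis; crop_ranges_sketch is a hypothetical name. Ranges that fall entirely outside the array come back empty.

```julia
# Intersect each range with the corresponding axis of the array.
crop_ranges_sketch(array::AbstractArray{<:Any,N}, ranges::NTuple{N,AbstractUnitRange}) where {N} =
    map(intersect, ranges, axes(array))

A = zeros(4, 5)
crop_ranges_sketch(A, (0:3, 4:8))  # (1:3, 4:5)
crop_ranges_sketch(A, (6:9, 1:2))  # first range becomes empty
```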
RangeExtractor.extract! (Method)
extract!([f], dest, data, ranges, [metadata]; strategy, threaded)
RangeExtractor.extract (Method)
extract([f], data, ranges, [metadata]; strategy, threaded)
Passing a function f is more memory-efficient and faster, since each range is processed as soon as its data is extracted, and the result is usually much smaller than the extracted array itself.
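A serial reference sketch of the memory argument, ignoring tiling, threading, and metadata. naive_extract is hypothetical, and it assumes f receives only a view of the data for each range.

```julia
# Without f: every range is copied out, so memory grows with the total extracted size.
naive_extract(data, ranges) = [data[r...] for r in ranges]
# With f: each range is reduced as soon as it is viewed; only the results are kept.
naive_extract(f, data, ranges) = [f(view(data, r...)) for r in ranges]

data = reshape(collect(1.0:100.0), 10, 10)
ranges = [(1:5, 1:5), (6:10, 3:8)]
chunks = naive_extract(data, ranges)     # two matrices (25 and 30 elements)
sums = naive_extract(sum, data, ranges)  # two Float64 values
```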
RangeExtractor.similar_blank (Method)
similar_blank(array::AbstractArray)
Return a new array of the same type as array, but with all values set to some "blank" value, depending on the array type.
In general this just returns zero(array), but for some array types (e.g. Rasters), this needs to be a different value (missingval(array)).
This is used in the RecombiningTileOperation to create a blank array to store the recombined tiles.
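A sketch of the fallback plus a hypothetical specialization showing how dispatch can choose a different blank value. Rasters' missingval is not used here; the second method, which fills arrays whose element type admits missing with missing, is purely illustrative.

```julia
# Generic fallback, as described: a zeroed copy of the array.
similar_blank_sketch(a::AbstractArray) = zero(a)

# Hypothetical specialization: if the element type admits `missing`, use it as the blank.
similar_blank_sketch(a::AbstractArray{Union{Missing,T}}) where {T} = fill!(similar(a), missing)

similar_blank_sketch([1.0 2.0; 3.0 4.0])  # 2x2 matrix of zeros
similar_blank_sketch([1.0, missing])      # [missing, missing]
```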