API

RangeExtractor.AbstractTileOperation (Type)
AbstractTileOperation

Abstract type for tile operations, which are callable structs that can operate on a TileState.

Interface

All subtypes of AbstractTileOperation MUST implement the following interface:

  • (op::AbstractTileOperation)(state::TileState): Apply the operation to the given tile state. Return a tuple (contained_results, shared_results). The order of the results MUST match the order of the ranges in state.contained_ranges and state.shared_ranges.
  • combine(op::AbstractTileOperation, range, metadata, results, tile_idxs): Combine the outputs from the portions of the shared ranges, and return the final result.

Optionally, subtypes can implement the following methods:

  • (op::AbstractTileOperation)(state::TileState, contained_channel, shared_channel): Apply the operation to the given tile state, and write output to the specified channels. Output to both of the channels must be in the form (tile_idx, result_idx) => result, where tile_idx is the index of the tile in the TileState, and result_idx is the index of the result in the contained_ranges or shared_ranges of the TileState.

    This interface is somewhat more efficient because it avoids allocating intermediate arrays, which can reduce inference and garbage-collection overhead, especially when processing many tiles.

  • contained_result_type(::Type{<: AbstractTileOperation}, DataArrayType): Return the type of the results from the contained ranges. Defaults to Any.

  • shared_result_type(::Type{<: AbstractTileOperation}, DataArrayType): Return the type of the results from the shared ranges. Defaults to Any.
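The required interface can be illustrated with a minimal sketch. It assumes a hypothetical ToyTileState that carries only the tile data and tile-local contained and shared ranges (the real TileState holds more fields), and a hypothetical MaxOp operation; the local combine function only mirrors the documented argument order and is not the package's method.

```julia
# Hypothetical, simplified stand-in for TileState: tile data plus tile-local ranges.
struct ToyTileState{A<:AbstractArray}
    tile::A
    contained_ranges::Vector{Tuple{UnitRange{Int},UnitRange{Int}}}
    shared_ranges::Vector{Tuple{UnitRange{Int},UnitRange{Int}}}
end

# Hypothetical callable struct: computes the maximum over each range.
struct MaxOp end

# Apply the operation to one tile; results keep the order of the input ranges.
function (op::MaxOp)(state::ToyTileState)
    contained_results = [maximum(view(state.tile, r...)) for r in state.contained_ranges]
    shared_results = [maximum(view(state.tile, r...)) for r in state.shared_ranges]
    return (contained_results, shared_results)
end

# Merge the partial maxima of one shared range (max is associative, so this is exact).
combine(op::MaxOp, range, metadata, results, tile_idxs) = maximum(results)

tile = [1 5; 7 2]
state = ToyTileState(tile, [(1:2, 1:1)], [(1:1, 2:2)])
contained, shared = MaxOp()(state)  # contained == [7], shared == [5]
```

Because max is associative, combining the per-tile partial maxima of a shared range gives the same answer as taking the maximum over the whole range at once.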

RangeExtractor.FixedGridTiling (Type)
FixedGridTiling(chunk_sizes...)

Tiles a domain into a fixed grid of chunks.

Geometries that are fully encompassed by a tile are processed in bulk whenever a tile is read.

Geometries that lie on chunk boundaries are added to a separate queue. Whenever a tile is read, a view of the part of each such geometry that lies within the tile is added to a queue. A separate worker task combines these views, and once all the pieces of a geometry have been read, a second worker task processes it.
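The two-worker pipeline described above can be sketched with Julia Channels and tasks. Everything here is an assumption for illustration: the function name process_boundary_pieces, the (geometry_id, values) message format, and the use of vcat to assemble pieces (a real implementation would reassemble the pieces spatially rather than concatenating them).

```julia
# Hypothetical sketch: boundary pieces arrive as (geometry_id, values) messages.
function process_boundary_pieces(tile_pieces, pieces_needed::Dict{Int,Int}, f)
    piece_ch = Channel{Tuple{Int,Vector{Float64}}}(64)
    ready_ch = Channel{Tuple{Int,Vector{Float64}}}(64)
    results = Dict{Int,Float64}()

    # Worker 1: buffer pieces per geometry until every expected piece has arrived.
    combiner = @async begin
        buffers = Dict{Int,Vector{Vector{Float64}}}()
        for (id, piece) in piece_ch
            buf = push!(get!(buffers, id, Vector{Float64}[]), piece)
            if length(buf) == pieces_needed[id]
                put!(ready_ch, (id, reduce(vcat, buf)))
                delete!(buffers, id)
            end
        end
        close(ready_ch)
    end

    # Worker 2: process each fully assembled geometry.
    processor = @async for (id, values) in ready_ch
        results[id] = f(values)
    end

    # "Reading" each tile pushes the pieces of the boundary geometries it holds.
    for pieces in tile_pieces, p in pieces
        put!(piece_ch, p)
    end
    close(piece_ch)
    wait(combiner)
    wait(processor)
    return results
end

tile_pieces = [
    [(1, [1.0, 2.0]), (2, [10.0])],  # tile 1
    [(1, [3.0]), (2, [20.0])],       # tile 2
    [(2, [30.0])],                   # tile 3
]
res = process_boundary_pieces(tile_pieces, Dict(1 => 2, 2 => 3), sum)
```

Geometry 1 is split across two tiles and geometry 2 across three; f only runs once all pieces of a geometry have been collected.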

This approach may change in the future. If the statistic is separable and associative (for example a histogram with fixed bins, or the mean, minimum, or maximum), the partial geometries could simply be processed in the same task that reads the tile, and the partial results recombined at the end. That would avoid the (cognitive) overhead of managing separate channels and worker tasks, at the cost of being somewhat less efficient and less general over the space of zonal functions. It would, however, allow massive zonals, e.g.
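The recombination idea works when each tile can emit a mergeable partial result. A minimal sketch for the mean, assuming partial results of the form (sum, count); the function names are hypothetical and this is plain Julia, not package code.

```julia
# Partial state for the mean: (sum, count) pairs can be merged associatively.
partial_mean_state(x) = (sum(x), length(x))
merge_states(a, b) = (a[1] + b[1], a[2] + b[2])
finish_mean(s) = s[1] / s[2]

vals = [1.0, 2.0, 3.0, 4.0, 10.0]
pieces = [vals[1:2], vals[3:5]]  # the same geometry split across two tiles

# Merging partial states reproduces the exact mean of the whole geometry.
combined = finish_mean(reduce(merge_states, map(partial_mean_state, pieces)))

# Averaging per-tile means does not (the pieces have different sizes).
naive = sum(p -> sum(p) / length(p), pieces) / length(pieces)
```

Here combined equals the mean of vals, while naive does not, which is why the partial result must carry enough information (here, the count) to be merged correctly.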

RangeExtractor.RecombiningTileOperation (Type)
RecombiningTileOperation(f)

A tile operation that always passes f a fully materialized array over the requested range.

Contained ranges are materialized and evaluated immediately when encountered; the relevant sections of shared ranges are put in the shared channel and set aside.

Once all components of a shared range have been read, f is called with the fully materialized array.

Here, the combining step simply mosaics the pieces back into one array, so from the point of view of f there is no difference between shared and contained ranges.
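The mosaic step can be sketched as follows, assuming each piece of a shared range arrives as (parent_ranges, data) and that zeros serve as the blank fill value (the package uses similar_blank for this). The name mosaic_and_apply is hypothetical.

```julia
# Hypothetical sketch: write each piece into a blank array covering the full range,
# then call f once on the fully materialized result.
function mosaic_and_apply(f, full_ranges, pieces)
    out = zeros(Float64, length.(full_ranges))  # blank array (zeros assumed here)
    for (ranges, data) in pieces
        # Translate parent-array ranges into indices local to `out`.
        local_ranges = map((r, full) -> (first(r) - first(full) + 1):(last(r) - first(full) + 1),
                           ranges, full_ranges)
        out[local_ranges...] .= data
    end
    return f(out)
end

full = (3:6, 1:2)
pieces = [((3:4, 1:2), [1.0 2.0; 3.0 4.0]),   # part read from one tile
          ((5:6, 1:2), [5.0 6.0; 7.0 8.0])]   # part read from another tile
total = mosaic_and_apply(sum, full, pieces)
```

Since f only ever sees the assembled array, any function of the full range works, at the cost of materializing it in memory.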

RangeExtractor.SumTileOperation (Type)
SumTileOperation()

A tile operation that sums the values in each range.

This operation may reorder the summation freely, so floating-point results are not guaranteed to be bitwise identical from run to run, although they agree up to rounding error.
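Floating-point addition is not associative, which is why reordering the summation can change the last bits of the result. A small illustration in plain Julia (no package code involved):

```julia
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # one summation order
right = a + (b + c)  # another order

# Summing per-"tile" partial sums versus summing everything at once.
x = [i / 10 for i in 1:10_000]
whole = sum(x)
by_tiles = sum(sum(@view x[i:i+99]) for i in 1:100:length(x))
```

left and right differ in the last bit, but both whole/by_tiles and left/right agree to within floating-point tolerance.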

RangeExtractor.TileOperation (Type)
TileOperation(; contained, shared, combine)
TileOperation(operation)

Create a tile operation that can operate on a TileState and return a tuple of results from the contained and shared ranges.

It calls the contained function on each contained range, and the shared function on each shared range.

contained and shared are called with the signature func(data, metadata), where data is a view of the tile data on the current range, and metadata is the metadata for that range.

combine is called with the signature combine(shared_results, shared_tile_idxs, metadata), where shared_results is an array of results from the shared function, shared_tile_idxs is an array of the tile indices that the shared results came from, and metadata is the metadata for that range.

If constructed with a single function, that function is used for both contained and shared operations.

Arguments

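  • contained: Function to apply to contained regions (ranges that lie entirely within one tile)
  • shared: Function to apply to each piece of a shared region (ranges that span more than one tile)
  • combine: Function that merges the results of shared on the pieces of a shared region into the final result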

Examples

using RangeExtractor
using Statistics  # for mean

# Explicit functions for contained regions, shared regions, and the combine step
op = TileOperation(
    contained = (data, meta) -> sum(data),
    shared = (data, meta) -> sum(data),
    combine = (x, _u, _w) -> sum(x)
)

# One function for both contained and shared regions
op = TileOperation((data, meta) -> mean(data))
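The calling convention described above can be mimicked without the package. MiniTileOperation, run_contained, and run_shared below are hypothetical mocks that only follow the documented signatures contained(data, metadata), shared(data, metadata), and combine(shared_results, shared_tile_idxs, metadata); they are not the package's implementation.

```julia
# Hypothetical mock of the TileOperation calling convention.
struct MiniTileOperation{C,S,K}
    contained::C
    shared::S
    combine::K
end

# A contained range is evaluated in one call on a single view.
run_contained(op::MiniTileOperation, data, meta) = op.contained(data, meta)

# A shared range is evaluated piece by piece (one view per tile), then combined.
function run_shared(op::MiniTileOperation, pieces, tile_idxs, meta)
    shared_results = [op.shared(piece, meta) for piece in pieces]
    return op.combine(shared_results, tile_idxs, meta)
end

op = MiniTileOperation(
    (data, meta) -> sum(data),
    (data, meta) -> sum(data),
    (x, _u, _w) -> sum(x),
)

data = reshape(1.0:16.0, 4, 4)
whole = run_contained(op, view(data, 1:4, 1:2), nothing)
# The same range split across two hypothetical tiles along the first dimension.
via_shared = run_shared(op, [view(data, 1:2, 1:2), view(data, 3:4, 1:2)],
                        [CartesianIndex(1, 1), CartesianIndex(2, 1)], nothing)
```

For a sum, evaluating the range whole or as combined pieces gives the same result.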
RangeExtractor.TileState (Type)
TileState{N, TileType, RowVecType}
TileState(tile::TileType, tile_offset::CartesianIndex{N}, contained_rows::AbstractVector, shared_rows::AbstractVector)

A struct that holds all the state that is local to a single tile.

Fields

  • tile: The in-memory data of the tile.

  • tile_ranges: The ranges that the tile covers in the parent array

  • contained_ranges: The ranges of the rows that are fully contained in the tile

  • shared_ranges: The ranges of the rows that are only partially contained in the tile, i.e. shared with other tiles

  • contained_metadata: The metadata of the rows that are fully contained in the tile

  • shared_metadata: The metadata of the rows that are only partially contained in the tile, i.e. shared with other tiles
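Since tile holds only part of the parent array while tile_ranges records where that part sits, a range given in parent-array coordinates has to be shifted before it can index the tile. A minimal sketch, assuming 1-based parent ranges; local_view is a hypothetical helper, and this page does not say whether TileState stores its ranges in parent or tile-local coordinates.

```julia
# Hypothetical helper: view into the in-memory tile for ranges given in
# parent-array coordinates, using the ranges the tile covers in the parent.
function local_view(tile::AbstractArray, tile_ranges, parent_ranges)
    local_ranges = map(parent_ranges, tile_ranges) do r, t
        (first(r) - first(t) + 1):(last(r) - first(t) + 1)
    end
    return view(tile, local_ranges...)
end

parent_array = reshape(1:64, 8, 8)
tile_ranges = (5:8, 1:4)               # this tile covers rows 5:8, columns 1:4
tile = parent_array[tile_ranges...]    # the tile as read into memory
v = local_view(tile, tile_ranges, (6:7, 2:3))
```

The view v contains the same values as parent_array[6:7, 2:3], without touching the parent array.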

RangeExtractor.TilingStrategy (Type)
abstract type TilingStrategy

Abstract type for tiling strategies. Must hold all necessary information to create a tiling strategy.

All tiling strategies MUST implement the following methods:

  • indextype(::Type{<: TilingStrategy}): Return the type of the index used by the tiling strategy. For example, FixedGridTiling returns CartesianIndex{N}. RTreeTiling might return a single integer corresponding to the R-tree node ID.
  • get_tile_indices(tiling, range): Given a range, return the indices of the tiles that the range intersects.
  • tile_to_ranges(tiling, index): Given a tile index, return the ranges that the tile covers.
  • split_ranges_into_tiles(tiling, ranges): Given a set of ranges, return three dictionaries:
    • A dictionary mapping tile indices to the indices of the ranges that the tile fully contains (Ints).
    • A dictionary mapping tile indices to the indices of the ranges that the tile shares with one or more other tiles (Ints).
    • A dictionary mapping the indices of the shared ranges (Ints) to the tile indices that contain them.
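A minimal sketch of these methods for a 2-D fixed grid. GridSketch is hypothetical, and the functions are free-standing definitions that only mirror the documented method names (indextype is omitted); they are not the package's FixedGridTiling implementation.

```julia
# Hypothetical fixed-grid tiling for 2-D arrays (1-based indices).
struct GridSketch
    chunk_sizes::NTuple{2,Int}
end

# Tile index -> ranges covered in the parent array.
tile_to_ranges(t::GridSketch, idx::CartesianIndex{2}) =
    ntuple(d -> ((idx[d] - 1) * t.chunk_sizes[d] + 1):(idx[d] * t.chunk_sizes[d]), 2)

# Ranges -> indices of every tile they intersect.
function get_tile_indices(t::GridSketch, ranges)
    lo = ntuple(d -> cld(first(ranges[d]), t.chunk_sizes[d]), 2)
    hi = ntuple(d -> cld(last(ranges[d]), t.chunk_sizes[d]), 2)
    return CartesianIndices(ntuple(d -> lo[d]:hi[d], 2))
end

# Returns (contained, shared, shared_range -> tiles), as described above.
function split_ranges_into_tiles(t::GridSketch, ranges_list)
    contained = Dict{CartesianIndex{2},Vector{Int}}()
    shared = Dict{CartesianIndex{2},Vector{Int}}()
    shared_tiles = Dict{Int,Vector{CartesianIndex{2}}}()
    for (i, ranges) in enumerate(ranges_list)
        tiles = get_tile_indices(t, ranges)
        if length(tiles) == 1
            push!(get!(contained, first(tiles), Int[]), i)
        else
            for tile in tiles
                push!(get!(shared, tile, Int[]), i)
            end
            shared_tiles[i] = vec(collect(tiles))
        end
    end
    return contained, shared, shared_tiles
end

g = GridSketch((4, 4))
ranges_list = [(1:2, 1:3), (3:6, 2:2), (5:8, 5:8)]
contained, shared, shared_tiles = split_ranges_into_tiles(g, ranges_list)
```

Ranges 1 and 3 each fall inside a single tile, while range 2 straddles tiles (1, 1) and (2, 1) and therefore appears in the shared dictionaries.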
RangeExtractor._nothing_or_view (Method)
_nothing_or_view(x, idx)

Return view(x, idx) if x is not nothing, otherwise return nothing.

This allows metadata to be nothing while still working with broadcasting.
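A local re-definition that follows the documented behaviour, shown only to illustrate how it behaves under broadcasting (it is not copied from the package). Wrapping the first argument in Ref makes broadcasting treat it as a single value rather than iterating over it.

```julia
# Mirrors the documented behaviour: view if x is present, otherwise nothing.
_nothing_or_view(x, idx) = isnothing(x) ? nothing : view(x, idx)

idxs = [1:2, 3:4]
metadata = [10, 20, 30, 40]

with_meta = _nothing_or_view.(Ref(metadata), idxs)    # one view per range
without_meta = _nothing_or_view.(Ref(nothing), idxs)  # view(nothing, idx) alone would throw
```

Both calls have the same shape, so downstream code can broadcast over the metadata without special-casing nothing.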

RangeExtractor.allocate_result (Method)
allocate_result(operator, data, ranges, metadata, strategy)

Allocate a result array for the given extraction operation.

Returns a tuple (allocated, nskip), where nskip is the number of ranges that should be skipped.

RangeExtractor.crop_ranges_to_array (Method)
crop_ranges_to_array(array, ranges)

Crop the ranges (a Tuple of AbstractUnitRange) to the axes of array.

This uses intersect internally to crop the ranges.

Returns a Tuple of AbstractUnitRanges cropped to the axes of array.
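A minimal sketch of the described behaviour, intersecting each range with the matching axis; crop_ranges_sketch is an illustrative local name, not the package function.

```julia
# Intersect each range with the corresponding axis of the array.
crop_ranges_sketch(array, ranges) = map(intersect, axes(array), ranges)

A = zeros(10, 10)
cropped = crop_ranges_sketch(A, (8:12, -2:3))   # (8:10, 1:3)
outside = crop_ranges_sketch(A, (12:15, 1:3))   # first range becomes empty
```

A range that lies entirely outside the array comes back empty rather than raising an error, so callers may want to check isempty before extracting.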

RangeExtractor.extract (Method)
extract([f], data, ranges, [metadata]; strategy, threaded)

Passing a function f is faster and more memory-efficient, since each range is processed as soon as its data is extracted, and the result is usually much smaller than the extracted data.
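The memory argument can be illustrated with a hypothetical extract_sketch (plain Julia, not the package's extract): with f, each range is reduced as soon as its view is taken; without f, every extracted block has to be materialized and returned.

```julia
# With f: reduce each range immediately, keeping one value per range.
extract_sketch(f, data, ranges) = [f(view(data, r...)) for r in ranges]
# Without f: copy and return every extracted block.
extract_sketch(data, ranges) = [data[r...] for r in ranges]

data = reshape(collect(1.0:10_000.0), 100, 100)
ranges = [(1:50, 1:50), (51:100, 51:100)]

sums = extract_sketch(sum, data, ranges)  # two Float64 values
blocks = extract_sketch(data, ranges)     # two 50x50 matrices
```

Here the reduced result holds 2 numbers, while the unreduced one holds 5000.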

RangeExtractor.similar_blank (Method)
similar_blank(array::AbstractArray)

Return a new array of the same type as array, but with all values set to some "blank" value, depending on the array type.

In general this just returns zero(array), but for some array types (e.g. Rasters), this needs to be a different value (missingval(array)).

This is used in RecombiningTileOperation to create a blank array to store the recombined tiles.
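The dispatch pattern behind this can be sketched with a hypothetical FilledArray type standing in for an array with a designated missing value (such as a raster's missingval). Both FilledArray and blank_sketch are illustrative names, not part of the package or of Rasters.

```julia
# Hypothetical array type carrying a designated "missing value".
struct FilledArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    missingval::T
end
Base.size(a::FilledArray) = size(a.data)
Base.getindex(a::FilledArray, I::Int...) = a.data[I...]
Base.setindex!(a::FilledArray, v, I::Int...) = (a.data[I...] = v)

# Generic fallback: a zero-filled array of the same type.
blank_sketch(a::AbstractArray) = zero(a)
# Specialized method: fill with the array's own missing value instead.
blank_sketch(a::FilledArray) = FilledArray(fill(a.missingval, size(a.data)), a.missingval)

plain = blank_sketch([1.0 2.0; 3.0 4.0])
filled = blank_sketch(FilledArray([1.0 2.0], -9999.0))
```

Dispatching on the array type keeps the generic zero-based fallback while letting array types with a missing-value convention supply their own blank.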
