transform

geobricks.transform(data, *, src_crs=None, base_template=None, template_crs=None, bounds=None, bounds_crs=None, x0=None, y0=None, mask=None, mask_crs=None, drop=False, to_file=False, export_extension=None, rasterize=False, main_var_list=None, rasterize_mode=['mean', 'coverage', 'and'], force_polygon=False, force_point=False, **rio_kwargs)

Reproject, clip, rasterize or convert space-time data. transform(), reproject() and convert() are three aliases of the same function.

Parameters

data : str, pathlib.Path, xarray.Dataset, xarray.DataArray, geopandas.GeoDataFrame or pandas.DataFrame

Data to transform. Supported file formats are .tif, .asc, .nc, vector formats supported by geopandas (.shp, .json, …), and .csv.

src_crs : int or str or rasterio.crs.CRS, optional

Coordinate reference system of the source (data). When passed as an integer, src_crs refers to the EPSG code. When passed as a string, src_crs can be an OGC WKT string or a Proj.4 string.

base_template : str, Path, xarray.DataArray or geopandas.GeoDataFrame, optional

Filepath or object used as a template for the spatial profile. Supported file formats are .tif, .nc and vector formats supported by geopandas (.shp, .json, …).

template_crs : int or str or rasterio.crs.CRS, optional

Coordinate reference system of the base_template. When passed as an integer, template_crs refers to the EPSG code. When passed as a string, template_crs can be an OGC WKT string or a Proj.4 string.

bounds : iterable or None, optional, default None

Boundaries of the target domain as a tuple (x_min, y_min, x_max, y_max). The values are expected in bounds_crs if it is not None. Otherwise, they are expected in the destination CRS dst_crs if it is not None. If dst_crs is also None, they are expected in the source CRS (src_crs, or the CRS of data).
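The fallback order for interpreting bounds can be summarized as a small lookup. The helper below, resolve_bounds_crs, is hypothetical and not part of geobricks; it only mirrors the order described above (bounds_crs, then dst_crs, then src_crs, then the CRS carried by data).

```python
def resolve_bounds_crs(bounds_crs=None, dst_crs=None, src_crs=None, data_crs=None):
    """Return the CRS in which `bounds` should be read.

    Hypothetical helper (not a geobricks function): it only mirrors the
    documented fallback order bounds_crs -> dst_crs -> src_crs -> data's CRS.
    """
    for crs in (bounds_crs, dst_crs, src_crs, data_crs):
        if crs is not None:
            return crs
    return None  # no CRS information is available at all


# bounds given without bounds_crs, while reprojecting to EPSG:4326
print(resolve_bounds_crs(dst_crs=4326, data_crs=2154))  # 4326
```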

bounds_crs : int or str or rasterio.crs.CRS, optional

Coordinate reference system of the bounds (if bounds is not None). When passed as an integer, bounds_crs refers to the EPSG code. When passed as a string, bounds_crs can be an OGC WKT string or a Proj.4 string.

x0 : number, optional, default None

Origin of the X-axis, used to align the reprojection grid.

y0 : number, optional, default None

Origin of the Y-axis, used to align the reprojection grid.
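One common way to use an origin for grid alignment is to snap the target extent so that cell edges fall on x0 + k * resolution and y0 + k * resolution for integer k. The sketch below assumes this interpretation and square cells; align_bounds is a hypothetical helper, not the geobricks implementation.

```python
import math


def align_bounds(bounds, resolution, x0=0.0, y0=0.0):
    """Expand (x_min, y_min, x_max, y_max) outward onto a grid whose cell
    edges are at x0 + k * resolution and y0 + k * resolution.

    Hypothetical helper; assumes square cells of size `resolution`.
    """
    x_min, y_min, x_max, y_max = bounds

    def snap_down(value, origin):
        # largest grid edge that is <= value
        return origin + math.floor((value - origin) / resolution) * resolution

    def snap_up(value, origin):
        # smallest grid edge that is >= value
        return origin + math.ceil((value - origin) / resolution) * resolution

    return (snap_down(x_min, x0), snap_down(y_min, y0),
            snap_up(x_max, x0), snap_up(y_max, y0))


print(align_bounds((12.3, 7.9, 47.2, 31.0), 10, x0=5, y0=0))  # (5, 0, 55, 40)
```

Expanding outward (rather than rounding to the nearest edge) is a design choice here: it guarantees the aligned grid still covers the requested extent.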

mask : str, Path, shapely.geometry, xarray.DataArray or geopandas.GeoDataFrame, optional

Mask used to clip the data, given either as a filepath or as a geometry or data object.

mask_crs : int or str or rasterio.crs.CRS, optional

Coordinate reference system of the mask. When passed as an integer, mask_crs refers to the EPSG code. When passed as a string, mask_crs can be an OGC WKT string or a Proj.4 string.

drop : bool, default False

Only applicable to raster/xarray.Dataset types. If True, coordinate labels that only correspond to NaN values are dropped from the result.

to_file : bool or path (str or pathlib.Path), default False

If True and data is a path (str or pathlib.Path), the resulting dataset is exported to a file with the same pathname and the suffix ‘_geop4th’. If to_file is a path, the resulting dataset is exported to this filepath.

export_extension : str, optional

Extension to which the data will be converted and exported. Only used when data is a filepath: if data is an in-memory variable rather than a file, it is not exported.

If rasterize=True and export_extension is not specified, it will be set to ‘.tif’ by default.
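The export rules for to_file and export_extension can be sketched as a path computation. output_path below is a hypothetical helper; it assumes that an explicit to_file path is used as-is, that export_extension includes the leading dot, and that the ‘_geop4th’ suffix is appended to the file stem (for example dem.tif becoming dem_geop4th.tif). The actual naming in geobricks may differ.

```python
from pathlib import Path


def output_path(data, to_file=False, export_extension=None, rasterize=False):
    """Hypothetical helper: where the transformed result would be written.

    Returns None when nothing is exported.
    """
    if not to_file:
        return None
    if to_file is not True:
        # an explicit destination path is used as-is (assumption)
        return Path(to_file)
    if not isinstance(data, (str, Path)):
        # in-memory data (xarray, geopandas, pandas objects) is not exported
        return None
    src = Path(data)
    if export_extension is None:
        # rasterized vectors default to GeoTIFF, otherwise keep the extension
        export_extension = ".tif" if rasterize else src.suffix
    return src.with_name(f"{src.stem}_geop4th{export_extension}")


print(output_path("roads.shp", to_file=True, rasterize=True))  # roads_geop4th.tif
```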

rasterize : bool, default False

If True, rasterize data (only applicable when data is vector data).

main_var_list : iterable, default None

Data variables to rasterize. Only used if rasterize is True. If None, all variables in data are rasterized.

rasterize_mode : str or list of str, or dict, default ['mean', 'coverage', 'and']

Defines the mode to rasterize data:

  • for numeric variables: 'count', 'sum' or 'mean' (default)

    • 'mean' refers to:

      • the sum of polygon values weighted by their relative coverage on each cell, when the vector data contains Polygons (appropriate for intensive quantities)

      • the average value of points on each cell, when the vector data contains Points

    • 'sum' refers to:

      • the sum of polygon values downscaled to each cell, when the vector data contains Polygons (appropriate for extensive quantities)

      • the sum of point values on each cell, when the vector data contains Points

    • 'count' refers to:

      • the number of points or polygons intersecting each cell

  • for categorical variables: 'fraction' or 'dominant' or 'coverage' (default)

    • 'coverage' refers to:

      • the area covered by each level on each cell, when the vector data contains Polygons

      • the count of points for each level on each cell, when the vector data contains Points

    • 'dominant' returns the most frequent level for each cell

    • 'fraction' creates a new variable per level, which stores the fraction (from 0 to 1) of the coverage of this level compared to all levels, for each cell.

  • for boolean variables: 'or' or 'and' (default)
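The numeric, categorical and boolean modes above amount to simple aggregations once the features intersecting a given cell and their overlap areas are known. The functions below are a hypothetical per-cell sketch for Polygon data, not the geobricks implementation. They assume that 'mean' is an average weighted by overlap area and normalized by the covered area (identical to weighting by each polygon's fraction of the cell when the polygons tile the cell), and that 'sum' distributes each polygon's value in proportion to the share of its area falling in the cell. For Points, the overlap areas would be replaced by counts.

```python
from collections import Counter


def numeric_cell(features, mode="mean"):
    """features: (value, overlap_area, polygon_area) for each polygon
    intersecting one cell. Hypothetical helper."""
    if mode == "count":
        return len(features)
    if mode == "sum":
        # extensive quantity: each polygon contributes the fraction of its
        # value matching the fraction of its area that lies in the cell
        return sum(v * a / total for v, a, total in features)
    if mode == "mean":
        # intensive quantity: average of values weighted by overlap area
        covered = sum(a for _, a, _ in features)
        if covered == 0:
            return float("nan")
        return sum(v * a for v, a, _ in features) / covered
    raise ValueError(f"unknown numeric mode: {mode!r}")


def categorical_cell(features, mode="coverage"):
    """features: (level, overlap_area) for each polygon intersecting one cell."""
    coverage = Counter()
    for level, area in features:
        coverage[level] += area
    if mode == "coverage":
        return dict(coverage)
    if mode == "dominant":
        return max(coverage, key=coverage.get)
    if mode == "fraction":
        total = sum(coverage.values())
        return {level: area / total for level, area in coverage.items()}
    raise ValueError(f"unknown categorical mode: {mode!r}")


def boolean_cell(values, mode="and"):
    """values: booleans of the features intersecting one cell."""
    if mode == "and":
        return all(values)
    if mode == "or":
        return any(values)
    raise ValueError(f"unknown boolean mode: {mode!r}")


# one 100 m² cell: polygon A (value 10) covers 30 m² of it and has a total
# area of 100 m²; polygon B (value 20) covers 70 m² and lies fully inside
cell = [(10.0, 30.0, 100.0), (20.0, 70.0, 70.0)]
print(numeric_cell(cell, "mean"), numeric_cell(cell, "sum"))  # 17.0 23.0
```

The example shows why the two numeric modes suit different quantities: 'mean' keeps a density-like value within the range of the inputs, while 'sum' conserves totals (polygon B's full value of 20 lands in this cell, but only 30 % of polygon A's value does).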

The modes can be specified for each variable by passing rasterize_mode as a dict: {'<var1>': 'mean', '<var2>': 'fraction', ...}. This argument specification makes it possible to force a numeric variable to be rasterized as a categorical variable. Unspecified variables will be rasterized with the default mode. If data contains no variable other than ‘geometry’, the arbitrary name ‘data’ can be used to specify a mode for the whole data.
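Per-variable selection with a dict can be pictured as a lookup that falls back on a default per variable type. resolve_modes is a hypothetical helper; it assumes each variable has already been classified as numeric, categorical or boolean, and uses the documented defaults ('mean', 'coverage', 'and'). The geometry-only case would use the key 'data' in the same way.

```python
DEFAULT_MODES = {"numeric": "mean", "categorical": "coverage", "boolean": "and"}


def resolve_modes(var_kinds, overrides=None):
    """Pick a rasterization mode for each variable.

    Hypothetical helper. var_kinds maps variable names to 'numeric',
    'categorical' or 'boolean'; overrides is the dict form of rasterize_mode.
    """
    overrides = overrides or {}
    # an explicit per-variable mode wins, otherwise use the type default
    return {name: overrides.get(name, DEFAULT_MODES[kind])
            for name, kind in var_kinds.items()}


kinds = {"population": "numeric", "soil_code": "numeric", "flooded": "boolean"}
# force the numeric 'soil_code' to be rasterized as a categorical variable
print(resolve_modes(kinds, {"soil_code": "dominant"}))
# {'population': 'mean', 'soil_code': 'dominant', 'flooded': 'and'}
```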

force_polygon : bool, default False

If True, only Polygon geometry types are kept when rasterizing.

force_point : bool, default False

If True, only Point geometry types are kept when rasterizing.

**rio_kwargs : keyword args, optional

Arguments passed to the xarray.Dataset.rio.reproject() function call.

Note: these arguments take priority over base_template attributes.

May contain:

  • dst_crs : str

  • resolution : float or tuple

  • shape : tuple (int, int) of (height, width)

  • transform : Affine

  • nodata : float or None

  • resampling :

    • see help(rasterio.enums.Resampling)

    • most common are: 5 (average), 13 (sum), 0 (nearest), 9 (min), 8 (max), 1 (bilinear), 2 (cubic)…

    • the functionality 'std' (standard deviation) is also available

  • see help(xarray.Dataset.rio.reproject)
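For intuition about the aggregating resampling methods (average, sum, min, max, and the additional 'std'), the NumPy sketch below coarsens a grid by an integer factor and aggregates each block of source pixels. It is an illustration under simplifying assumptions (the target grid is an exact coarsening of the source grid, no reprojection, population standard deviation with ddof=0), not how rioxarray or geobricks compute these values internally; block_aggregate is a hypothetical name.

```python
import numpy as np


def block_aggregate(arr, factor, how="std"):
    """Aggregate a 2-D array into coarser cells of factor x factor pixels.

    Hypothetical helper; NaN pixels are ignored by the nan* reducers.
    """
    h, w = arr.shape
    if h % factor or w % factor:
        raise ValueError("array shape must be divisible by factor")
    # axes 1 and 3 index the pixels inside each coarse cell
    blocks = arr.reshape(h // factor, factor, w // factor, factor)
    reducers = {"average": np.nanmean, "sum": np.nansum, "min": np.nanmin,
                "max": np.nanmax, "std": np.nanstd}
    return reducers[how](blocks, axis=(1, 3))


grid = np.arange(16, dtype=float).reshape(4, 4)
print(block_aggregate(grid, 2, "average").tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```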

Returns

Transformed data : xarray.Dataset or geopandas.GeoDataFrame

The type of the result depends on the type of the input data and on the conversion operations applied (such as rasterize):

  • vector data is output as a geopandas.GeoDataFrame (unless it is rasterized)

  • raster data and netCDF data are output as an xarray.Dataset

If data is a file and to_file=True, the resulting dataset is also exported to a file (with the suffix ‘_geop4th’). If to_file is a path, the result is exported to that path. With the default to_file=False, nothing is exported.
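The mapping from input to output type can be summarized as a lookup on the file extension. output_type is a hypothetical helper covering only the extensions named on this page; treating .csv input as vector data and rasterized vector data as an xarray.Dataset are assumptions based on the descriptions above.

```python
from pathlib import Path

# extensions named on this page (not an exhaustive list)
RASTER_EXTENSIONS = {".tif", ".asc", ".nc"}
VECTOR_EXTENSIONS = {".shp", ".json", ".csv"}


def output_type(path, rasterize=False):
    """Hypothetical helper: name of the type returned for a given input file."""
    ext = Path(path).suffix.lower()
    if ext in RASTER_EXTENSIONS:
        return "xarray.Dataset"
    if ext in VECTOR_EXTENSIONS:
        # rasterizing turns vector data into gridded (raster) data
        return "xarray.Dataset" if rasterize else "geopandas.GeoDataFrame"
    raise ValueError(f"extension not covered by this sketch: {ext!r}")


print(output_type("catchments.shp"))                  # geopandas.GeoDataFrame
print(output_type("catchments.shp", rasterize=True))  # xarray.Dataset
```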