Code standards#

Work in progress…

This section is yet to be completed.

GEOP4TH’s philosophy#

GEOP4TH relies on generic and format-agnostic elementary functions, called geobricks and defined in the geobricks.py script.

The philosophy of geobricks is to aim for a high level of abstraction: the choices a user is required to make are kept to a minimum, while advanced users can still access advanced options. To combine these two goals, we use smart default values and smart procedures that infer parameter values when the user does not pass them (for some examples, have a look at geobricks.transform(), geobricks.load(), geobricks.compare()…).
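The "smart default" idea can be illustrated with a minimal sketch. The function name infer_data_type, its data_type parameter and the extension sets below are hypothetical and chosen only for illustration; they are not taken from geobricks.py. The pattern is simply: default the optional argument to None, and infer it from the other inputs only when the user did not provide it.

```python
from pathlib import Path

# Hypothetical lookup tables, for illustration only (not GEOP4TH's actual lists).
_RASTER_EXTENSIONS = {".tif", ".asc"}
_VECTOR_EXTENSIONS = {".shp", ".gpkg"}


def infer_data_type(filepath, data_type=None):
    """Return the user's data_type if given, otherwise infer it from the extension."""
    # Advanced users keep full control: an explicit value is never overridden.
    if data_type is not None:
        return data_type

    # Otherwise fall back on a "smart" default inferred from the file name.
    extension = Path(filepath).suffix.lower()
    if extension in _RASTER_EXTENSIONS:
        return "raster"
    if extension in _VECTOR_EXTENSIONS:
        return "vector"
    raise ValueError(
        f"Cannot infer the data type from extension {extension!r}; "
        "please pass data_type explicitly."
    )
```

With this pattern, infer_data_type("dem.tif") needs no extra argument, while infer_data_type("dem.tif", data_type="vector") lets the caller override the inference.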

Geobricks form the foundation of GEOP4TH. Therefore, special attention is given to them, and they are expected to remain as stable as possible.

On the other hand, workflows form the cooperative branch of GEOP4TH. Therefore, they benefit from a high degree of freedom and flexibility. Workflows can be developed with procedural programming (functions) or object-oriented programming (classes and methods). For inspiration, you can have a look at standardize_fr.bdalti() or standardize_fr.sim2() (standardize) or download_fr.bnpe() (download) for the procedural approach, or at standardize_wl.ERA5StandardizerWL (standardize) or download_wl.ERA5LandDownloader (download) for the object-oriented approach.
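To make the two styles concrete, here is a deliberately tiny sketch of the same step (converting values from metres to millimetres) written once as a function and once as a class. The names metres_to_millimetres and ExampleStandardizer are hypothetical and do not exist in GEOP4TH; the real workflows listed above are far richer.

```python
# Procedural style: one function, all settings passed as arguments.
def metres_to_millimetres(values, factor=1000.0):
    """Convert an iterable of values in metres to a list of values in millimetres."""
    return [value * factor for value in values]


# Object-oriented style: settings are stored once on the instance,
# then reused by each method call.
class ExampleStandardizer:
    """Hypothetical standardizer holding its configuration as attributes."""

    def __init__(self, factor=1000.0):
        self.factor = factor

    def standardize(self, values):
        """Apply the stored conversion factor to an iterable of values."""
        return [value * self.factor for value in values]
```

Both forms give the same result; the class form tends to pay off when several steps share the same configuration.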

Work in progress…

More info to come on script organization…

Artificial Intelligence statement#

GEOP4TH advocates for a fairer world. This calls for a more equal distribution of wealth, more collective decisions and more sustainable societies. Generative artificial intelligence blatantly contravenes these goals in multiple ways (here is a manifesto with more detailed views (in French)). The use of AI in GEOP4TH development is not prohibited, but contributors are strongly advised to take the time to step back and evaluate the ins and outs of AI before using it. If AI is used, for the sake of transparency and contributor accountability, it must be explicitly stated in the docstrings and in the documentation of the modules where it has been used.

File formats and extensions#

In a nutshell

  • when referring to the file extension (for instance ‘.json’), extension is used, and would ideally include the ‘.’ character

  • when referring to the file format (for instance ‘GeoJSON’), file_format is used, and would ideally not include the ‘.’ character

When developing new pre-processing steps, it is very likely that you will have to deal with file formats and extensions! In GEOP4TH code, we distinguish between the two. An extension refers to the suffix of the file name (‘.asc’, ‘.tif’, ‘.json’…). We usually need this variable to differentiate between file types such as raster files, vector files and so on. In that case we use the variable name extension, as in geobricks.get_filelist() or geobricks.load(). We also consider that extension values should include the ‘.’ character. Of course, this is the theory; in practice we add a safeguard to deal with values missing the ‘.’:

if isinstance(extension, str):
    if extension[0] != '.': extension = '.' + extension
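The same safeguard could be wrapped in a small reusable helper. This is only a sketch: the name normalize_extension is hypothetical and not part of geobricks, and unlike the snippet above it also tolerates an empty string, for which indexing with [0] would raise an IndexError.

```python
def normalize_extension(extension):
    """Return the extension with a leading '.', leaving other values untouched."""
    # Only non-empty strings are modified; None, lists, etc. pass through as-is.
    if isinstance(extension, str) and extension and not extension.startswith("."):
        return "." + extension
    return extension
```

For instance, both normalize_extension("tif") and normalize_extension(".tif") return ".tif".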

The tricky part to keep in mind is that a single extension can refer to different file formats: for example, .json can refer to ‘JSON’, ‘GeoJSON’ or ‘TopoJSON’ files, and .tif can refer to either ‘TIFF’ or ‘GeoTIFF’… (see the Wikipedia page on file formats).

In the situations mentioned above (geobricks.get_filelist(), geobricks.load(), …), we do not mind whether the extension refers to one file format or another, because we try our best to handle these potential ambiguities in the code itself, so that users do not have to worry about them. For instance, in geobricks.load() we differentiate between ‘JSON’ and ‘GeoJSON’ during the loading step:

try:
    data_ds = gpd.read_file(data, **kwargs)
except: # DataSourceError
    try:
        data_ds = pd.read_json(data, **kwargs)
    except:
        data_ds = json.load(open(data, "r"))
        print("   Warning: The JSON file could not be loaded as a pandas.DataFrame and was loaded as a dict")
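As an aside, the format behind a .json file can also be guessed from its content rather than by trying successive readers. The sketch below is not what geobricks.load() does; it is an alternative technique shown for illustration, with a hypothetical function name (guess_json_flavor). It relies on the fact that GeoJSON objects carry a top-level "type" member with one of a fixed set of values, and that TopoJSON objects use "type": "Topology".

```python
import json

# Values of the top-level "type" member allowed for GeoJSON objects.
_GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
}


def guess_json_flavor(obj):
    """Guess whether an already-parsed JSON object is GeoJSON, TopoJSON or plain JSON."""
    if isinstance(obj, dict):
        type_ = obj.get("type")
        # Guard against non-string values (e.g. a list) before the set lookup.
        if isinstance(type_, str):
            if type_ == "Topology":
                return "TopoJSON"
            if type_ in _GEOJSON_TYPES:
                return "GeoJSON"
    return "JSON"


def guess_json_file_flavor(filepath):
    """Parse a file with the standard json module and guess its flavor."""
    with open(filepath, "r", encoding="utf-8") as file:
        return guess_json_flavor(json.load(file))
```

This is a heuristic only: an ordinary JSON document that happens to contain "type": "Point" at its top level would be misclassified.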

However, in some other cases, we want to differentiate between file formats that share the same extension. This is typically the case in download_fr.bnpe(). In the BNPE water withdrawal API that this function queries, the user can explicitly choose between the ‘JSON’ and ‘GeoJSON’ formats. Of course, we wanted to keep this choice available. In that type of situation, we use the variable name file_format and we consider that its value should not include the ‘.’. Once again, we still add a safeguard:

if file_formats[i][0] == '.': file_formats[i] = file_formats[i][1:]
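The line above operates on one element of a list inside a loop. A self-contained sketch of the whole normalization could look as follows; the helper name normalize_file_formats is hypothetical, and accepting a single string as well as a list is an assumption made for convenience, not a documented behaviour of download_fr.bnpe().

```python
def normalize_file_formats(file_formats):
    """Return a list of file formats without any leading '.' character."""
    # Accept a single string for convenience (assumption, see lead-in).
    if isinstance(file_formats, str):
        file_formats = [file_formats]
    # Drop one leading '.' if present; startswith() is safe on empty strings.
    return [
        file_format[1:] if file_format.startswith(".") else file_format
        for file_format in file_formats
    ]
```

For example, normalize_file_formats([".json", "geojson"]) returns ["json", "geojson"].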