Code standards

In progress…

Section to be completed

GEOP4TH’s philosophy

GEOP4TH relies on generic and format-agnostic elementary functions, called geobricks and defined in the geobricks.py script.

The philosophy of geobricks is to aim for a high level of abstraction: the choices that a user is required to make are kept to a minimum, while advanced users can still access advanced options. To combine these two goals, we rely on smart default values and on smart procedures that infer parameter values when the user does not pass them.
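As an illustration of this principle, here is a minimal sketch of a function that infers a parameter when it is omitted but still accepts an explicit value. The function describe_input() and its behaviour are hypothetical and are not part of geobricks; they only show the pattern.

```python
from pathlib import Path


# Hypothetical helper, NOT part of geobricks: it only illustrates how a
# parameter can be inferred when the user does not pass it.
def describe_input(path, extension=None):
    """Return (stem, extension), inferring the extension when omitted."""
    path = Path(path)
    if extension is None:
        # Smart default: fall back on the suffix of the file name.
        extension = path.suffix
    elif not extension.startswith("."):
        # Explicit value from an advanced user: tolerate a missing '.'.
        extension = "." + extension
    return path.stem, extension.lower()


# Basic use: nothing to specify, the extension is read from the file name.
print(describe_input("data/dem.tif"))
# Advanced use: the extension is passed explicitly.
print(describe_input("data/dem", extension="asc"))
```

The basic call needs a single argument, while the optional keyword keeps the advanced behaviour within reach.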

In progress…

More info to come on script organization…

File formats and extensions

In a nutshell:

  • when referring to the file extension (for instance ‘.json’), the variable name extension is used, and its value should ideally include the ‘.’ character

  • when referring to the file format (for instance ‘GeoJSON’), the variable name file_format is used, and its value should ideally not include the ‘.’ character

When developing new pre-processing steps, you will very likely have to deal with file formats and extensions. In the GEOP4TH code, we distinguish between the two. An extension is the suffix of the file name (‘.asc’, ‘.tif’, ‘.json’…). We usually need this variable to tell apart broad file types such as raster files, vector files and so on. In that case we use the variable name extension, as in geobricks.get_filelist() or geobricks.load_any(). We also consider that extension values should include the ‘.’ character. Of course this is the theory; in practice we add a safeguard to handle values that lack the ‘.’:

if isinstance(extension, str):
    if extension[0] != '.': extension = '.' + extension

The tricky part to keep in mind is that a single extension can refer to several file formats: for example, .json can refer to ‘JSON’, ‘GeoJSON’ or ‘TopoJSON’ files, and .tif can refer to ‘TIFF’ or ‘GeoTIFF’ files (see wikipedia file formats).
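To make this ambiguity concrete, the sketch below guesses which of these formats a .json file holds by looking at its top-level "type" member. The function guess_json_flavour() is hypothetical and is not part of GEOP4TH; the type names are those defined by the GeoJSON specification (RFC 7946), and "Topology" is the top-level type used by TopoJSON.

```python
import json

# Values of the top-level "type" member defined by the GeoJSON
# specification (RFC 7946).
GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
}


# Hypothetical helper, NOT part of GEOP4TH.
def guess_json_flavour(text):
    """Guess whether a JSON document is 'GeoJSON', 'TopoJSON' or plain 'JSON'."""
    content = json.loads(text)
    kind = content.get("type") if isinstance(content, dict) else None
    if not isinstance(kind, str):
        # No usable "type" member: treat the document as plain JSON.
        return "JSON"
    if kind in GEOJSON_TYPES:
        return "GeoJSON"
    if kind == "Topology":
        return "TopoJSON"
    return "JSON"


print(guess_json_flavour('{"type": "FeatureCollection", "features": []}'))
print(guess_json_flavour('{"type": "Topology", "objects": {}}'))
print(guess_json_flavour('{"name": "station", "value": 3}'))
```

This is only a heuristic on the content: a plain JSON file that happens to carry a matching "type" member would be misclassified.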

In the situations mentioned above (geobricks.get_filelist(), geobricks.load_any(), …), it does not matter whether the extension refers to one file format or another, because we try our best to handle these ambiguities in the code itself, so that users do not have to worry about them. For instance, geobricks.load_any() differentiates between ‘JSON’ and ‘GeoJSON’ during the loading step:

try:
    data_ds = gpd.read_file(data, **kwargs)
except Exception:  # e.g. DataSourceError
    try:
        data_ds = pd.read_json(data, **kwargs)
    except Exception:
        # Last resort: load the raw JSON content as a dict
        with open(data, "r") as f:
            data_ds = json.load(f)
        print("   Warning: The JSON file could not be loaded as a pandas.DataFrame and was loaded as a dict")
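The same try-the-next-loader idea can be written as a generic fallback chain. The sketch below is an assumption about how this pattern could be factored out, not the way geobricks.load_any() is written; load_with_fallbacks() and strict_table_loader() are hypothetical, and standard-library stand-ins replace the geopandas and pandas readers so that the example runs on its own.

```python
import json


# Hypothetical helper, NOT part of geobricks.
def load_with_fallbacks(source, loaders):
    """Try each (name, loader) pair in order and return (name, data)
    for the first loader that succeeds."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(source)
        except Exception as err:  # real code should catch narrower exceptions
            errors.append(f"{name}: {err!r}")
    raise ValueError("no loader succeeded: " + "; ".join(errors))


def strict_table_loader(text):
    """Stand-in for a picky reader: accepts only a JSON list of records."""
    content = json.loads(text)
    if not (isinstance(content, list)
            and all(isinstance(row, dict) for row in content)):
        raise TypeError("not a list of records")
    return content


# The most specific loader comes first, the most permissive one last.
loaders = [("table", strict_table_loader), ("dict", json.loads)]
print(load_with_fallbacks('[{"a": 1}]', loaders))
print(load_with_fallbacks('{"a": 1}', loaders))
```

Returning the name of the loader that succeeded lets the caller know which representation it received, which plays the role of the warning printed in the excerpt above.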

However, in some other cases, we do want to differentiate between file formats that share the same extension. This is typically the case in download_fr.bnpe(). In the BNPE water withdrawal API that this function queries, the user can explicitly choose between the ‘JSON’ and ‘GeoJSON’ formats, and we wanted to keep this choice available. In that type of situation, we use the variable name file_format and we consider that its value should not include the ‘.’. Once again, we still add a safeguard:

if file_formats[i][0] == '.': file_formats[i] = file_formats[i][1:]
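The two conventions can be summarised side by side with a pair of small normalisation helpers. These functions are hypothetical and do not exist in GEOP4TH; note that, unlike the safeguards quoted in this section, they strip every leading ‘.’ and do not fail on an empty string.

```python
# Hypothetical helpers, NOT part of GEOP4TH.
def normalize_extension(extension):
    """Return the extension with exactly one leading '.' ('json' -> '.json')."""
    return "." + extension.lstrip(".")


def normalize_file_format(file_format):
    """Return the file format without a leading '.' ('.geojson' -> 'geojson')."""
    return file_format.lstrip(".")


# Extensions always end up with the '.', file formats never do.
print([normalize_extension(e) for e in ["tif", ".asc", "json"]])
print([normalize_file_format(f) for f in [".geojson", "json"]])
```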