Integrated geospatial datasets to inform marine spatial planning and impact assessment in waters surrounding the United Kingdom

Of the 337 layers in this integrated geodataset, 33 are interpolation layers (15 without time series, 9 with time series and a limit zone, and 9 with time series and no limit zone) and 26 are kernel density estimation layers (10 without time series and 16 with time series). These layers are not simply extracted or resampled from the observation or model data in the source datasets, so they require technical validation. We calculated the mean relative standard error (RSE) to quantify the uncertainty of the interpolation layers. We also calculated the root mean square error (RMSE) and the coefficient of determination (R²) to measure the deviation of both the interpolation and the kernel density estimation layers.

Here we describe the equation for each measure and present the results graphically. Mean RSE values are shown in Fig. 4[A] for layers without time series and in Fig. 4[B] for layers with time series. For the interpolation layers, the ratio of RMSE to the exact gridding value is shown in Fig. 5[A] (no time series) and Fig. 5[B] (time series), and R² is shown in Fig. 5[C] (no time series) and Fig. 5[D] (time series). For the kernel density layers, the RMSE ratio and R² are given in Fig. 6[A] and [C], respectively, for layers without time series, and in Fig. 6[B] and [D], respectively, for layers with time series.

Mean Relative Standard Error (RSE)

We include an estimate of uncertainty for all layers generated using empirical Bayesian kriging (EBK) spatial interpolation. Uncertainty is expressed as the ratio of the kriging standard error, derived from the semivariograms of the kriging interpolation, to the predicted value. This ratio is averaged over all interpolated grid cells, and we term the result the mean relative standard error (RSE) (Eq. 1).

$$\text{mean RSE}\,(\%)=\frac{1}{n}\sum_{i=1}^{n}\frac{SE_{i}}{z_{i}}\times 100$$

(1)

Where $n$ is the number of grid cells with an interpolated value. $SE_{i}$ is the kriging standard error (the square root of the kriging variance from the semivariograms) for grid cell $i$, and $z_{i}$ is the interpolated value for that cell. A higher mean RSE indicates greater uncertainty.
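As an illustration of Eq. 1, the Python sketch below computes the mean RSE from two co-registered grids: the kriging standard error and the EBK prediction. It assumes both grids are NumPy arrays with NaN in cells that were not interpolated. It also skips cells with a zero prediction, where the ratio is undefined. The function name `mean_rse` is hypothetical, and this is not the authors' processing code.

```python
import numpy as np


def mean_rse(se, z):
    """Mean relative standard error (%) following Eq. 1.

    se : kriging standard error per grid cell (square root of the kriging variance)
    z  : interpolated (predicted) value per grid cell
    Cells without an interpolation value are expected to be NaN and are skipped.
    """
    se = np.asarray(se, dtype=float)
    z = np.asarray(z, dtype=float)
    # Keep cells holding both a prediction and a standard error; skipping zero
    # predictions (ratio undefined) is an assumption of this sketch.
    valid = np.isfinite(se) & np.isfinite(z) & (z != 0)
    n = valid.sum()
    if n == 0:
        raise ValueError("no valid interpolated cells")
    return float(np.sum(se[valid] / z[valid]) / n * 100.0)


if __name__ == "__main__":
    se = np.array([[0.5, 1.0], [2.0, np.nan]])
    z = np.array([[1.0, 2.0], [1.0, np.nan]])
    # Per-cell ratios are 0.5, 0.5 and 2.0; the NaN cell is skipped.
    print(mean_rse(se, z))  # 100.0
```

A mean RSE of exactly 100% marks the point at which the standard error equals the prediction on average. This is the threshold used in the results that follow.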

For most of the interpolated layers, the standard error (based on the kriging semivariogram variance) is on average smaller than the estimated value, i.e. mean RSE < 100%. This applies to 12 of the 15 layers without time series (e.g. Shannon diversity index (SDI), species evenness, and percentage carbonate in sand; Fig. 4[A]). It also applies to 10 of the 18 layers with time series, based on the mean value for all years (e.g. SDI, mean mobility mode (Mi), and mean reworking mode (Ri); Fig. 4[B]). For the remaining layers, the standard error exceeds the estimated value (mean RSE > 100%). These are 3 layers without time series (total abundance, percentage carbonate in gravel, and compressive strength; Fig. 4[A]) and 8 layers with time series (e.g. mean bioturbation potential index (BPc), total abundance, and biomass; Fig. 4[B]). The mean RSE also shows that interpolation layers with a constrained interpolation zone [L-E17 to L-E54] have a smaller mean RSE than those without [L-E106 to L-E114]. Layers with a narrower range of values (e.g. species evenness, mean Mi, and log₁₀-transformed BPc) also have a lower mean RSE than layers with a wider range (e.g. untransformed BPc, mean body mass, biomass, or total abundance). See Fig. 4 for the mean RSE of each layer.

Root Mean Square Error (RMSE)

We measured the deviation between the interpolation prediction or kernel density estimate and the mean observed value per grid cell using the root mean square error (RMSE) (Eq. 2). Some interpolated layers were generated twice, with and without a zone limit (i.e. the benthic ecological parameters). For these, RMSE was calculated for only one layer of each pair, since both derive from the same interpolation result.

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_{i}-ze_{i}\right)^{2}}$$

(2)

Where $n$ is the number of grid cells with an interpolated value. For grid cell $i$, $z_{i}$ is the interpolated value and $ze_{i}$ is the exact gridding value extracted from the original data. A higher RMSE indicates a greater deviation.
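To make Eq. 2 concrete, the sketch below first derives an exact gridding (the mean of the observations falling in each cell) with `numpy.histogram2d`. It then computes the RMSE over cells that have both an estimate and an observation. The function names are hypothetical. Normalising the RMSE by the mean exact gridding value to obtain the ratio plotted in Fig. 5 is also an assumption, because the text does not state the exact normaliser.

```python
import numpy as np


def exact_gridding(x, y, values, x_edges, y_edges):
    """Mean of the observed values falling in each grid cell (NaN where empty)."""
    sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=values)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


def rmse(z, ze):
    """RMSE (Eq. 2) over cells holding both an estimate z and an exact value ze."""
    z = np.asarray(z, dtype=float)
    ze = np.asarray(ze, dtype=float)
    valid = np.isfinite(z) & np.isfinite(ze)
    return float(np.sqrt(np.mean((z[valid] - ze[valid]) ** 2)))


def rmse_ratio(z, ze):
    """RMSE divided by the mean exact gridding value (normaliser is an assumption)."""
    ze = np.asarray(ze, dtype=float)
    valid = np.isfinite(np.asarray(z, dtype=float)) & np.isfinite(ze)
    return rmse(z, ze) / float(np.mean(ze[valid]))


if __name__ == "__main__":
    # Three toy observations on a 2 x 2 grid with 1-unit cells.
    ze = exact_gridding([0.5, 0.5, 1.5], [0.5, 0.5, 0.5], [2.0, 4.0, 6.0], [0, 1, 2], [0, 1, 2])
    z = np.array([[4.0, 1.0], [6.0, 1.0]])  # hypothetical interpolated grid
    print(rmse(z, ze), rmse_ratio(z, ze))
```

Only cells that contain observations take part in the comparison. Interpolated values in empty cells therefore do not affect the RMSE.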

For some interpolated layers, the ratio of RMSE to the measured value (the exact mean extracted per grid cell) is < 1. These are 8 of the 15 layers without time series (e.g. SDI, Mi, Ri, and seabed compressive strength; Fig. 5[A]) and 5 of the 9 layers with time series, based on the mean value for all years (e.g. Mi, Ri, and species evenness; Fig. 5[B]). The remaining layers are 7 of the 15 without time series (e.g. BPc, total abundance, and mean body mass; Fig. 5[A]) and 4 of the 9 with time series, based on the mean value for all years (e.g. BPc, total abundance, and mean body mass; Fig. 5[B]). For these, the RMSE can reach 10 times the measured value.

Most kernel density estimation layers have an RMSE ratio to the exact mean extracted per grid cell of < 1. These are 8 of the 10 layers without time series (e.g. potential shipwrecks, dangerous shipwrecks, and subsea power and telecommunications cables; Fig. 6[A]) and 11 of the 16 layers with time series, based on the mean value for all years (e.g. echosounder noise and satellite observation of offshore infrastructure; Fig. 6[B]). The remainder have an RMSE ratio > 1, averaged over the years of data for the time-series layers. These are 2 of the 10 layers without time series (i.e. subsea point infrastructure and sub-glacial bedforms; Fig. 6[A]) and 5 of the 16 layers with time series (i.e. AIS vessel tracks, acoustic deterrent device noise, and satellite observation of offshore infrastructure/vessels). Two caveats apply. (a) The RMSE depends on the bandwidth used in the kernel density estimation¹⁷. We used Silverman's rule-of-thumb bandwidth estimation¹⁷, and a smaller bandwidth would yield a smaller RMSE. (b) Even where the RMSE is low (i.e. layers with an RMSE ratio < 1), estimates outside the area where observations exist are less reliable¹⁸. We provide these estimations to complement, rather than replace, the exact density gridding, offering a smoother spatial visualisation.
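The bandwidth dependence noted in (a) can be explored with the sketch below. It builds a Gaussian kernel density surface on a regular grid, scales it to expected counts per cell, and compares it with the exact count gridding through the RMSE. The sketch assumes a Gaussian product kernel. It applies the multivariate form of Silverman's rule to each axis, using the factor `(n (d + 2) / 4) ** (-1 / (d + 4))` that SciPy's `gaussian_kde` uses for its Silverman option. GIS kernel density tools may use a different kernel and a spatial variant of the rule, so this illustrates the technique rather than reproducing the published layers. The `scale` parameter and the function names are hypothetical.

```python
import numpy as np


def silverman_bandwidth(x, y):
    """Per-axis bandwidths from the multivariate Silverman factor with d = 2."""
    n, d = len(x), 2
    factor = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))  # equals n ** (-1/6) when d = 2
    return factor * np.std(x, ddof=1), factor * np.std(y, ddof=1)


def kde_counts(x, y, x_edges, y_edges, scale=1.0):
    """Expected number of points per grid cell from a Gaussian product-kernel KDE.

    `scale` multiplies both Silverman bandwidths (scale < 1 narrows the kernel).
    The density is evaluated at cell centres and multiplied by n * cell area.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_edges, y_edges = np.asarray(x_edges, dtype=float), np.asarray(y_edges, dtype=float)
    hx, hy = silverman_bandwidth(x, y)
    hx, hy = hx * scale, hy * scale
    xc = 0.5 * (x_edges[:-1] + x_edges[1:])
    yc = 0.5 * (y_edges[:-1] + y_edges[1:])
    gx, gy = np.meshgrid(xc, yc, indexing="ij")  # (nx, ny), same layout as histogram2d
    # Standardised distances from every cell centre to every observation.
    u = (gx[..., None] - x) / hx
    v = (gy[..., None] - y) / hy
    density = np.exp(-0.5 * (u**2 + v**2)).sum(axis=-1) / (2.0 * np.pi * hx * hy * len(x))
    cell_area = np.outer(np.diff(x_edges), np.diff(y_edges))
    return density * len(x) * cell_area


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=500), rng.normal(size=500)  # synthetic point observations
    edges = np.linspace(-4.0, 4.0, 33)
    exact, _, _ = np.histogram2d(x, y, bins=[edges, edges])  # exact count gridding
    for scale in (1.0, 0.5):
        est = kde_counts(x, y, edges, edges, scale=scale)
        rmse = np.sqrt(np.mean((est - exact) ** 2))
        print(f"bandwidth x {scale}: RMSE = {rmse:.3f}")
```

Here the RMSE is taken over all cells, including empty ones. For density layers, a zero count is a genuine exact gridding value.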

Coefficient of Determination (R²)

An alternative measure of the deviation between the interpolation prediction or kernel density estimate and the mean observed value per grid cell is the coefficient of determination (R²) (Eq. 3).

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}$$

(3)

Where $y_{i}$ is the exact gridding value extracted from the original data for grid cell $i$ and $\bar{y}$ is the mean of the exact gridding values over all $n$ grid cells. $\hat{y}_{i}$ is the interpolated or kernel density estimated value. A lower R² indicates a greater deviation.
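Eq. 3 can be sketched in a few lines of Python. The example assumes the exact and estimated grids are NumPy arrays with NaN in cells lacking either value. It uses the mean of the exact values over the valid cells as the reference mean. The function name `r_squared` is hypothetical. The function raises an error when the exact values have no variance, which is also why R² cannot be computed for a single grid cell.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination (Eq. 3) over cells holding both values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    valid = np.isfinite(y) & np.isfinite(y_hat)
    y, y_hat = y[valid], y_hat[valid]
    ss_res = np.sum((y - y_hat) ** 2)  # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean exact value
    if ss_tot == 0:
        # One cell (or constant exact values) leaves no variation to explain.
        raise ValueError("R² is undefined when the exact values have zero variance")
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    exact = np.array([[1.0, 2.0], [3.0, np.nan]])
    estimate = np.array([[1.0, 2.0], [4.0, 5.0]])
    print(r_squared(exact, estimate))  # 0.5
```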

Some of the interpolated layers have R² > 0.6. These are 8 of the 15 layers without time series (e.g. SDI, species richness, and shear strength; Fig. 5[C]) and 2 of the 9 with time series, based on the mean value for all years (i.e. SDI and species richness; Fig. 5[D]). Layers with lower R² include 7 of the 15 without time series (e.g. BPc, mean body mass, and biomass; Fig. 5[C]). They also include 7 of the 9 with time series, based on the mean value for all years (e.g. Mi, mean body mass, and biomass; Fig. 5[D]). Only a few of the kernel density layers have R² > 0.6: none of the layers without time series (Fig. 6[C]) and 3 of the 16 with time series (i.e. noise from seismic airguns, explosions, and sub-bottom profilers; Fig. 6[D]). The other kernel density layers have lower R². Among the layers without time series, these include heritage assets and offshore infrastructure (Fig. 6[C]). Among the layers with time series, they include AIS vessel tracks, other noise sources, and satellite observation of offshore infrastructure/vessels (Fig. 6[D]).

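The open GIS dashboard provides the per-grid-cell spatial distribution of the RSE (interpolated layers) and the RMSE (interpolated and kernel density layers) for each generated layer. R² cannot be calculated for individual grid cells, because it compares the residual variation with the variation of the exact values across a set of cells.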

Because the reliability of the interpolated layers depends on the number of samples within the area, Fig. 7 maps the number of samples across the UK-EEZ for each input dataset. This summary covers the sampling distributions of the benthic ecological, geotechnical, and seabed datasets that underwent spatial interpolation.