Integrated geospatial datasets to inform marine spatial planning and impact assessment in waters surrounding the United Kingdom
Of the 337 layers in this integrated geodataset, 33 are interpolation layers (15 with no time series, 9 with time series and a limit zone, and 9 with time series and no limit zone) and 26 are kernel density estimation (KDE) layers (10 with no time series and 16 with time series). These layers are not simply extracted or resampled from the observation or model data in the source datasets, so they require technical validation. We calculated the mean relative standard error (RSE) to quantify the uncertainty of the interpolation layers. We also calculated the root mean square error (RMSE) and the coefficient of determination (R2) to measure how far both the interpolated and the KDE layers deviate from the exact gridding values, i.e. the mean observed value per grid cell.
Below we give the equation for each measure and summarise the results graphically. For the interpolated layers, the mean RSE is shown in Fig. 4[A] for layers with no time series and in Fig. 4[B] for layers with time series. The ratio of RMSE to the exact gridding value is shown in Fig. 5[A] (no time series) and Fig. 5[B] (time series), and R2 in Fig. 5[C] (no time series) and Fig. 5[D] (time series). For the kernel density layers, the RMSE ratio and R2 are shown in Fig. 6[A] and Fig. 6[C], respectively, for layers with no time series, and in Fig. 6[B] and Fig. 6[D], respectively, for layers with time series.

Mean relative standard error (RSE) for [A] interpolated layers without time series and [B] interpolated layers with time series. Layer codes in [A], [L-E01]: mean bioturbation potential index (BPc), [L-E02]: Shannon diversity index, [L-E03]: mean mobility mode (Mi), [L-E04]: mean reworking mode (Ri), [L-E05]: species evenness, [L-E06]: species richness, [L-E07]: total abundance per square metre, [L-E08]: mean body mass, [L-E54]: biomass per square metre, [L-G01]: compressive strength, [L-G02]: shear strength, [L-G07]: percentage carbonate in sand, [L-G08]: percentage carbonate in mud, [L-G09]: percentage carbonate in gravel, [L-G10]: percentage carbonate in total sediment. Layer codes in [B], [L-E17, L-E106]: mean bioturbation potential index (BPc), [L-E18, L-E107]: Shannon diversity index, [L-E19, L-E108]: mean mobility mode (Mi), [L-E20, L-E109]: mean reworking mode (Ri), [L-E21, L-E110]: species evenness, [L-E22, L-E111]: species richness, [L-E23, L-E112]: total abundance per square metre, [L-E24, L-E113]: mean body mass, [L-E55, L-E114]: biomass per square metre.

Ratio of the root mean square error (RMSE) of the interpolated layers to the exact gridding mean value per grid cell, and R2 between the interpolated layers and the exact gridding mean value per grid cell, for [A,C] interpolated layers without time series and [B,D] interpolated layers with time series. Layer codes in [A,C], [L-E01]: mean bioturbation potential index (BPc), [L-E02]: Shannon diversity index, [L-E03]: mean mobility mode (Mi), [L-E04]: mean reworking mode (Ri), [L-E05]: species evenness, [L-E06]: species richness, [L-E07]: total abundance per square metre, [L-E08]: mean body mass, [L-E54]: biomass per square metre, [L-G01]: compressive strength, [L-G02]: shear strength, [L-G07]: percentage carbonate in sand, [L-G08]: percentage carbonate in mud, [L-G09]: percentage carbonate in gravel, [L-G10]: percentage carbonate in total sediment. Layer codes in [B,D], [L-E17]: mean bioturbation potential index (BPc), [L-E18]: Shannon diversity index, [L-E19]: mean mobility mode (Mi), [L-E20]: mean reworking mode (Ri), [L-E21]: species evenness, [L-E22]: species richness, [L-E23]: total abundance per square metre, [L-E24]: mean body mass, [L-E55]: biomass per square metre.

Ratio of the root mean square error (RMSE) of the kernel density estimation (KDE) layers to the exact gridding mean value per grid cell, and R2 between the KDE layers and the exact gridding mean value per grid cell, for [A,C] KDE layers without time series and [B,D] KDE layers with time series. Layer codes in [A,C], [L-A37]: heritage assets – potential shipwrecks, [L-A38]: heritage assets – dangerous shipwrecks, [L-A39]: heritage assets – floating and fixed heritage assets, [L-A40]: heritage assets – obstructions, [L-A45]: subsea power and telecommunications cables, [L-A47]: subsea points, [L-A48]: subsea linear, [L-A49]: pipeline freespans, [L-A50]: pipeline, [L-G19]: sub-glacial bedform features. Layer codes in [B,D], [L-A01]: AIS track vessels per day per grid cell for each year, [L-A05]: noises – seismic survey airguns, [L-A07]: noises – explosion, [L-A09]: noises – sub-bottom profiler, [L-A10]: noises – acoustic deterrent device, [L-A11]: noises – piling, [L-A77]: satellite observation – infrastructure wind, [L-A78]: satellite observation – infrastructure oil, [L-A79]: satellite observation – infrastructure unknown, [L-A80]: satellite observation – AIS fishing, [L-A81]: satellite observation – vessel – AIS non fishing, [L-A82]: satellite observation – vessel – dark fishing, [L-A83]: satellite observation – vessel – dark non fishing.
Mean Relative Standard Error (RSE)
We include an estimate of uncertainty for all layers generated by empirical Bayesian kriging (EBK) spatial interpolation. Uncertainty is expressed as the ratio of the standard error of the kriging prediction (derived from the semivariogram-based kriging variance) to the predicted value, averaged over all interpolated grid cells. We term this the mean relative standard error (RSE) (Eq. 1).
$$\text{mean RSE}\,(\%)=\frac{1}{n}\sum_{i=1}^{n}\frac{SE_{i}}{z_{i}}\times 100 \tag{1}$$
Where \(n\) is the number of grid cells with an interpolated value, and \({SE}_{i}\) and \({z}_{i}\) are, respectively, the standard error of the kriging prediction (from the semivariogram-based kriging variance) and the interpolated value for grid cell \(i\). A higher mean RSE indicates greater uncertainty.
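Eq. 1 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the kriging prediction standard error and the predicted value are available as aligned arrays with one entry per grid cell and NaN for cells without a prediction. The function name and the treatment of empty or zero-valued cells are our assumptions, not the authors' implementation.

```python
import numpy as np


def mean_rse_percent(se, z):
    """Mean relative standard error (Eq. 1), in percent.

    se: standard error of the kriging prediction, one value per grid cell
    z:  interpolated (predicted) value, one value per grid cell
    """
    se = np.asarray(se, dtype=float)
    z = np.asarray(z, dtype=float)
    # Keep only cells that have both a prediction and a standard error;
    # cells with z == 0 are also dropped to avoid division by zero
    # (an assumption: the source does not say how such cells were treated).
    valid = np.isfinite(se) & np.isfinite(z) & (z != 0)
    # Per-cell SE/z ratio, averaged over cells and expressed as a percentage
    return float(np.mean(se[valid] / z[valid]) * 100.0)


# Three cells with per-cell RSE of 50%, 50% and 150%
print(mean_rse_percent([0.5, 1.0, 3.0], [1.0, 2.0, 2.0]))  # ≈ 83.33
```

In this example the layer falls below the 100% threshold used in the results, even though one cell on its own exceeds it. This is a reminder that the mean RSE summarises a layer and can hide local hotspots of uncertainty.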
The uncertainty results show that, for most interpolated layers, the standard error range (based on the kriging semivariogram variance) is 0.1 to 2 times the estimated value (mean RSE < 100%). This applies to 12 of the 15 layers with no time series (Fig. 4[A], e.g. Shannon diversity index (SDI), species evenness, and percentage carbonate in sand) and to 10 of the 18 layers with time series, based on the mean over all years (Fig. 4[B], e.g. SDI, mean mobility mode (Mi), and mean reworking mode (Ri)). For the remaining layers, the standard error range exceeds twice the estimated value (mean RSE > 100%). These are 3 layers with no time series (Fig. 4[A], i.e. total abundance, percentage carbonate in gravel, and compressive strength) and 8 layers with time series (Fig. 4[B], e.g. mean bioturbation potential index (BPc), total abundance, and biomass). The mean RSE also shows that interpolation layers constrained to the interpolation zone [L-E17 to L-E24, L-E55] have a smaller mean RSE than their unconstrained counterparts [L-E106 to L-E114]. Layers with a narrower range of values (e.g. species evenness, mean Mi, and log10-transformed BPc) also have a lower mean RSE than layers with a wider range of values (e.g. untransformed BPc, mean body mass, biomass, and total abundance). Fig. 4 gives the mean RSE for each layer.
Root Mean Square Error (RMSE)
We measured the deviation between each interpolated or kernel density estimate and the mean observed value per grid cell using the root mean square error (RMSE) (Eq. 2). Some interpolated layers were generated twice, once with and once without a zone limit (i.e. the benthic ecological parameters). For these, RMSE was calculated for only one version, because both versions derive from the same interpolation result.
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_{i}-{ze}_{i}\right)^{2}} \tag{2}$$
Where \(n\) is the number of grid cells with an interpolated value, and \({z}_{i}\) and \({ze}_{i}\) are, respectively, the interpolated (or kernel density estimated) value and the exact gridding value extracted from the original data for grid cell \(i\). A higher RMSE indicates greater deviation.
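The sketch below illustrates Eq. 2 with NumPy. It makes two assumptions. First, the exact gridding value is taken as the mean of the raw observations that fall in each grid cell, computed here with `numpy.histogram2d`. Second, the RMSE ratio reported in Fig. 5 is taken to divide the RMSE by the mean exact gridding value across cells; the source does not state the normalisation. The helper names `exact_gridding_mean`, `rmse` and `rmse_ratio` are hypothetical.

```python
import numpy as np


def exact_gridding_mean(x, y, values, x_edges, y_edges):
    """Mean observed value per grid cell; NaN where a cell has no observations.

    Cells are defined by the bin edges passed in (hypothetical grid).
    Result is indexed [x_bin, y_bin], as returned by numpy.histogram2d.
    """
    sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=values)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0/0 gives NaN for empty cells


def rmse(z, ze):
    """Eq. 2 over cells that have both an estimate z and an exact value ze."""
    z = np.asarray(z, dtype=float)
    ze = np.asarray(ze, dtype=float)
    valid = np.isfinite(z) & np.isfinite(ze)
    return float(np.sqrt(np.mean((z[valid] - ze[valid]) ** 2)))


def rmse_ratio(z, ze):
    """RMSE divided by the mean exact gridding value (assumed normalisation)."""
    z = np.asarray(z, dtype=float)
    ze = np.asarray(ze, dtype=float)
    valid = np.isfinite(z) & np.isfinite(ze)
    return rmse(z, ze) / float(np.mean(ze[valid]))
```

For example, `rmse([1, 2, 3], [1, 2, 5])` is about 1.155. Dividing by the mean exact value of 8/3 gives a ratio of about 0.43, which would place such a layer in the "ratio < 1" group described below.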
Some interpolated layers have a ratio of RMSE to the measured value (the exact mean extracted per grid cell) below 1. These are 8 of the 15 layers with no time series (Fig. 5[A], e.g. SDI, Mi, Ri, and seabed compressive strength) and 5 of the 9 layers with time series, based on the mean over all years (Fig. 5[B], e.g. Mi, Ri, and species evenness). For the remaining layers, the RMSE reaches up to 10 times the measured value. These are 7 layers with no time series (Fig. 5[A], e.g. BPc, total abundance, and mean body mass) and 4 layers with time series, based on the mean over all years (Fig. 5[B], e.g. BPc, total abundance, and mean body mass).
Most kernel density estimation layers have an RMSE ratio to the exact mean extracted per grid cell below 1. These are 8 of the 10 layers with no time series (Fig. 6[A], e.g. potential shipwrecks, dangerous shipwrecks, and subsea power and telecommunications cables) and 11 of the 16 layers with time series, based on the mean over all years (Fig. 6[B], e.g. echosounder noise and satellite observation of offshore infrastructure). A few layers have a mean yearly RMSE ratio above 1: 2 layers with no time series (Fig. 6[A], i.e. subsea point infrastructure and sub-glacial bedforms) and 5 layers with time series (Fig. 6[B], i.e. AIS track vessels, acoustic deterrent device noise, and satellite observation of offshore infrastructure/vessels). Two caveats apply. (a) The RMSE is a function of the bandwidth used in the kernel density estimation method17. We used Silverman's rule-of-thumb bandwidth estimation17, and a smaller bandwidth would result in a smaller RMSE. (b) Even where the RMSE is low (i.e. layers with an RMSE ratio below 1), estimates outside the area where observations exist are less reliable18. We provide these estimates to complement, rather than replace, the exact density gridding, as a smoother spatial visualisation.
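Silverman's rule of thumb is usually stated for one dimension. For point data in a plane, one common spatial variant, the form given in the ArcGIS Kernel Density documentation, sets the bandwidth from the standard distance of the points about their mean centre and their median distance from it. The sketch below assumes that variant, unweighted points, and projected coordinates (e.g. metres). The source does not say which form of the rule was applied, and `silverman_spatial_bandwidth` is a hypothetical name.

```python
import numpy as np


def silverman_spatial_bandwidth(x, y):
    """Assumed spatial variant of Silverman's rule-of-thumb bandwidth:

        h = 0.9 * min(SD, sqrt(1 / ln 2) * Dm) * n ** -0.2

    SD: standard distance of the points about their mean centre
    Dm: median distance of the points from their mean centre
    n:  number of (unweighted) points
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    dy = y - y.mean()
    sd = np.sqrt(np.mean(dx ** 2) + np.mean(dy ** 2))  # standard distance
    dm = np.median(np.hypot(dx, dy))                   # median distance to mean centre
    return float(0.9 * min(sd, np.sqrt(1.0 / np.log(2.0)) * dm) * n ** -0.2)


# Four points at the corners of a 2 x 2 square
h = silverman_spatial_bandwidth([0, 2, 0, 2], [0, 0, 2, 2])
```

Because the bandwidth shrinks with the number of points (n^-0.2) and with their spread, larger or more clustered inputs receive a narrower kernel. Per caveat (a), a narrower kernel tends to track the exact gridding more closely and so lowers the RMSE.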
Coefficient of Determination (R2)
An alternative measure of the deviation between each interpolated or kernel density estimate and the mean observed value per grid cell is the coefficient of determination (R2) (Eq. 3).
$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}} \tag{3}$$
Where \({y}_{i}\) is the exact gridding value extracted from the original data for grid cell \(i\), \(\bar{y}\) is the mean of the exact gridding values over all \(n\) grid cells, and \({\hat{y}}_{i}\) is the interpolated (or kernel density estimated) value. A lower R2 indicates greater deviation.
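Eq. 3 can be sketched with NumPy as follows. This assumes the exact gridding values and the interpolated (or KDE) values are aligned arrays with NaN marking cells without data; `r_squared` is a hypothetical name, not the authors' code.

```python
import numpy as np


def r_squared(y, y_hat):
    """Eq. 3: coefficient of determination over cells that have both values.

    y:     exact gridding value per grid cell
    y_hat: interpolated or kernel density value per grid cell
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    valid = np.isfinite(y) & np.isfinite(y_hat)
    y, y_hat = y[valid], y_hat[valid]
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)
```

The denominator needs at least two cells with different exact values. For a single cell it is zero and R2 is undefined, which is why R2, unlike RSE and RMSE, is a whole-layer statistic. R2 can also be negative when an estimate fits worse than the layer mean.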
Some interpolated layers have R2 > 0.6. These are 8 of the 15 layers with no time series (Fig. 5[C], e.g. SDI, species richness, and shear strength) and 2 of the 9 layers with time series, based on the mean over all years (Fig. 5[D], i.e. SDI and species richness). The remainder have lower R2: 7 layers with no time series (Fig. 5[C], e.g. BPc, mean body mass, and biomass) and 7 layers with time series, based on the mean over all years (Fig. 5[D], e.g. Mi, mean body mass, and biomass).
Only a few kernel density layers have R2 > 0.6. None of the layers with no time series reach this threshold (Fig. 6[C]), and only 3 of the 16 layers with time series do (Fig. 6[D], i.e. noise from seismic airguns, explosions, and sub-bottom profilers). The other layers have lower R2. These include heritage assets and offshore infrastructure among the layers with no time series (Fig. 6[C]), and AIS track vessels, other noise sources, and satellite observation of offshore infrastructure/vessels among the layers with time series (Fig. 6[D]).
We provide maps of the RSE and RMSE for each grid cell of every interpolated and kernel density layer in the open GIS dashboard. R2 cannot be calculated for individual grid cells, because it is defined from the variation across all grid cells in a layer.
Because the reliability of the interpolated layers depends on the number of samples within an area, Fig. 7 maps the number of samples across the UK-EEZ for each input dataset. This summary covers the benthic ecological, geotechnical, and seabed sampling data that were used as input to the spatial interpolation.
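Sample counts of the kind mapped in Fig. 7 can be sketched by binning sample coordinates onto a grid with `numpy.histogram2d`; coverage is then the share of cells holding at least one sample. This is a minimal illustration, not the authors' workflow. The grid edges and the optional boolean mask (e.g. the cells of one sea region and depth class) are hypothetical inputs, and projected coordinates are assumed.

```python
import numpy as np


def sample_counts(x, y, x_edges, y_edges):
    """Number of samples falling in each grid cell, indexed [x_bin, y_bin]."""
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts


def coverage_percent(counts, mask=None):
    """Percentage of grid cells (optionally only those in mask) with >= 1 sample."""
    cells = counts if mask is None else counts[mask]
    return float(np.count_nonzero(cells) / cells.size * 100.0)


# Three samples on a hypothetical 2 x 2 grid of 1000 m cells
counts = sample_counts([500, 1500, 1600], [500, 500, 500], [0, 1000, 2000], [0, 1000, 2000])
```

Here two of the four cells contain samples, so coverage is 50%. Restricting the mask to the cells of a single region or depth class gives per-class figures comparable in spirit to panels [E,H,K].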

Distribution of input data for interpolated layers (data Type I) in UK waters. [A] Classification of water depth in UK waters; depths up to 227 m are highlighted because they represent the 90th percentile of areas with high human activity60. [B] Total ocean space per depth class across sea regions. [C,F,I] Maps of the distribution of benthic [D-E1], geotechnical [D-G1], and seabed [D-G2] sampling data, respectively, with the colour bar showing sample counts classed by natural breaks. [D,G,J] Mean total number of benthic, geotechnical, and seabed samples per km² per sea region, by depth class. [E,H,K] Mean percentage of 10 km² grid cells covered by benthic, geotechnical, and seabed data per sea region, by depth class.
