Wave overtopping prediction is the process of estimating the quantity of water that may spill over a coastal structure during a storm or high-wave event. It is essential for ensuring the safety and functionality of coastal structures such as seawalls, breakwaters, and other maritime infrastructure. Traditional deterministic models based on physical principles and mathematical formulations have limitations in handling the uncertainties and complexities associated with wave overtopping prediction, which has led to the application of soft computing techniques in this area. These techniques have also reduced the computational time associated with the prediction process. Consequently, the use of machine learning techniques for wave overtopping prediction has become increasingly widespread.

Artificial neural networks (ANNs) have become increasingly popular in the field of wave overtopping prediction due to their ability to learn complex relationships and patterns in data. Several studies have employed neural networks for this purpose, including Pullen et al.

Furthermore, den Bieman et al.

Elbisy

The goal of this work is to provide coastal designers with a robust and accurate machine learning model able to represent wave overtopping discharges for a wide range of coastal structure types with composite slopes under a variety of wave conditions. This study evaluates the accuracy of machine learning models utilizing the support vector machine and gradient-boosted tree approaches for wave overtopping discharge prediction of coastal structures with composite slopes "without a berm". This study is expected to provide valuable, up-to-date information for estimating wave overtopping risk, forecasting wave overtopping for warning and emergency evacuation of people in the event of extreme waves, risk minimization, and economic assessment of coastal protection projects.

In this study, the support vector machine and gradient-boosted tree approaches were used to predict wave-overtopping discharge at coastal structures featuring composite slopes "without a berm". A flowchart of the research steps is shown in

The new, expanded database currently contains 17,942 tests, approximately 13,500 of which concern wave overtopping alone. About 10,000 schematized experiments on wave overtopping discharge q were collected from around the world for the original CLASH database. The data used in this study is the subset designated for training the soft computing methods, comprising 4737 tests.

Each test underwent a thorough screening process during which a reliability factor (RF) and a complexity factor (CF) were assigned, based on the volume and accuracy of the data. The complexity factor reflects how easily the structure geometry can be schematized using the various geometrical parameters. Reliable information or simple geometries received a score of 1, while less reliable information or complex geometries received a score of 3. A score of 4 indicates that the geometry was too complex to be schematized or that the data were not trustworthy enough to be used.

The database was first expanded by adding already-existing databases on wave transmission and wave reflection. By keeping the same geometrical characteristics and pertinent climate parameters originally determined inside the CLASH project, the data assemblage was accomplished. Panizzo and Briganti

The methods used in this study are Gradient Boosted Trees (GBT) and Support Vector Machine (SVM). In contrast to bagging, the GBT boosting method creates base models consecutively: multiple models are developed sequentially to increase prediction accuracy by focusing on the training examples that are hard to estimate. During the boosting process, examples that are challenging to estimate with the prior base models appear in the training data more frequently than examples that can be accurately estimated, so every new base model aims to correct the errors made by the previous base models. The boosting strategy was initially devised in response to Kearns' question (8, 9): is a group of weak learners equivalent to one strong learner? A weak learner is an algorithm that only marginally outperforms random guessing; a strong learner is a prediction or classification method that substantially outperforms random guessing. The answer to this question matters because a weak model is frequently easier to estimate than a strong one. Schapire showed that the answer is yes by integrating multiple weak models into a single, highly accurate model using boosting methods. The main distinction between boosting and bagging is that boosting carefully resamples the training data to supply the most pertinent data to each succeeding model.

The modified distribution at each training step depends on the errors made by the previous models. Unlike the bagging approach, which selects each sample uniformly to create a training dataset, the boosting algorithm does not give every sample an equal probability of being chosen: misclassified or poorly estimated samples receive greater weight and are more likely to be selected. As a result, each newly developed model emphasizes the samples that prior models handled incorrectly. Boosting fits additional models that minimize a specific loss function, such as the squared error or the absolute error, averaged over the training data; the loss function quantifies how much the predicted value differs from the true value. A forward stage-wise modeling approach is one approximation to this problem.
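The residual-fitting loop described above can be sketched in a few lines. This is an illustrative, self-contained example and not the implementation used in this study: the weak learners are one-dimensional regression stumps, the loss is the squared error, and each new stump is fitted to the residuals (the negative gradient of the squared-error loss) of the current ensemble.

```python
def fit_stump(x, r):
    """Fit a 1-D regression stump to residuals r: choose the split that
    minimises the squared error, predicting the mean on each side."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def boost(x, y, n_stages=200, lr=0.1):
    """Forward stage-wise boosting: each new stump is fitted to the
    residuals of the current ensemble and added with a small weight lr."""
    f0 = sum(y) / len(y)          # initial model: the mean of the targets
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_stages):
        r = [yi - pi for yi, pi in zip(y, pred)]   # current residuals
        s = fit_stump(x, r)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(s(xi) for s in stumps)

# Toy monotone data (made up for illustration).
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
y = [0.0, 0.1, 0.1, 0.4, 0.5, 0.9, 0.9, 1.0]
model = boost(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Shrinking each stump's contribution by the learning rate lr is the standard regularization device: smaller values require more stages but usually generalize better.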

With the forward stage-wise technique, new base models are added sequentially without altering the parameters or coefficients of previously added models. For regression problems, the boosting approach can be viewed as "functional gradient descent": an optimization strategy that minimizes a chosen loss function by introducing, at each step, the base model that best reduces it. The SVM technique was developed by Vapnik (1995) and was initially created to solve pattern recognition problems. With the advent of the ε-insensitive loss function, SVM has since been extended to non-linear regression estimation and time series prediction. SVMs are efficient machine-learning techniques based on the structural risk minimization principle, a strategy for minimizing an upper bound on the risk functional associated with generalization performance. An SVM is essentially a mathematical object: a technique (or recipe) for optimizing a specific mathematical function with respect to a given set of data. The fundamental concepts behind the SVM algorithm can, however, be described without ever writing an equation.

In fact, all that is required to comprehend the essence of SVM classification are four fundamental ideas: the separating hyperplane, the maximum-margin hyperplane, the soft margin, and the kernel function.
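These ideas can be illustrated with a toy linear support vector regression trained by subgradient descent on the ε-insensitive loss plus an L2 penalty (the structural risk minimization term). This is only a sketch with made-up variable names and data; practical SVMs use kernels and a quadratic programming solver rather than gradient steps.

```python
def svr_fit(x, y, epsilon=0.05, C=10.0, lr=0.01, epochs=2000):
    """Linear epsilon-insensitive regression: errors inside the epsilon
    tube are ignored; the L2 term ||w||^2/(2C) limits model complexity."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        gw, gb = w / C, 0.0            # gradient of the regulariser
        for xi, yi in zip(x, y):
            err = (w * xi + b) - yi
            if err > epsilon:          # above the tube: push prediction down
                gw += xi / n
                gb += 1.0 / n
            elif err < -epsilon:       # below the tube: push prediction up
                gw -= xi / n
                gb -= 1.0 / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Roughly linear toy data, y ≈ x (illustrative only).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.0, 2.1, 2.9, 4.0]
w, b = svr_fit(x, y)
```

Points that end up on or outside the ε tube are the support vectors: they alone determine the fitted line, which is what makes the representation sparse.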

Statistical summary of the database parameters (4737 tests):

Parameter    N     Unit  Min     Max     Mean   Std. dev.
H_{m0 toe}   4737  [m]   0.017   1.480   0.127  0.077
β            4737  [°]   0.000   80.000  3.720  11.499
h            4737  [m]   0.029   5.010   0.462  0.399
h_t          4737  [m]   0.029   5.010   0.440  0.403
B_t          4737  [m]   0.000   2.031   0.053  0.133
R_c          4737  [m]   0.000   2.500   0.170  0.152
A_c          4737  [m]   -0.030  2.500   0.161  0.153
G_c          4737  [m]   0.000   1.000   0.119  0.159
cotα_d       4737  [-]   0.000   7.000   2.307  1.193
cotα_incl    4737  [-]   0.000   7.000   2.340  1.203
D            4737  [m]   0.000   0.109   0.024  0.026
γ_f          4737  [-]   0.380   1.000   0.722  0.276
Spread s     4737  [-]   0.000   10.000  0.346  1.400

Given that some of the data come from small-scale models and others from full-scale prototypes, the new EurOtop database advises against using raw (dimensional) parameters as model inputs. To avoid the wide range in raw parameter values, the basic data should therefore be made dimensionless; dimensionless parameters improve the accuracy and reliability of the models.

The wavelength L_{m-1,0t} can be calculated from the spectral wave period T_{m-1,0t} using the deep-water relation:

L_{m-1,0t} = g T_{m-1,0t}² / (2π)
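Assuming the standard deep-water relation L = g T² / (2π) applied to the spectral period, the computation is a one-liner; the function name is illustrative:

```python
import math

def spectral_wavelength(t_m10, g=9.81):
    """Deep-water wavelength from the spectral wave period T_{m-1,0}:
    L = g * T^2 / (2 * pi)."""
    return g * t_m10 ** 2 / (2 * math.pi)

L = spectral_wavelength(2.0)  # e.g. a 2 s spectral period, in metres
```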

The non-dimensional wave overtopping rate S_q is given by:

S_q = q / √(g H_{m0 toe}³)
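Assuming the usual scaling of the discharge q by √(g H_{m0 toe}³), a hypothetical helper would look like:

```python
import math

def dimensionless_discharge(q, h_m0, g=9.81):
    """Non-dimensional overtopping rate S_q = q / sqrt(g * H_{m0}^3),
    with q in m^3/s per metre of structure and H_{m0} in metres."""
    return q / math.sqrt(g * h_m0 ** 3)

# Example with a small discharge and the database's mean wave height.
sq = dimensionless_discharge(0.001, 0.127)
```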

cotα_d, cotα_incl, spread s, β, h/L_{m-1,0t}, H_{m0 toe}/L_{m-1,0t}, h_t/H_{m0 toe}, B_t/L_{m-1,0t}, γ_f, D/H_{m0 toe}, R_c/H_{m0 toe}, A_c/H_{m0 toe}, and G_c/L_{m-1,0t}

The mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), correlation coefficient (R), coefficient of performance (COP), average absolute error (AAE), scatter index (SI), root mean square percentage error (RMSPE), and Willmott index (WI) were used to evaluate the performance of the models. The equations for these statistical indicators are as follows:
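For illustration, a few of these indicators can be implemented directly from their standard definitions (variable names are ours, not the paper's; y = observed, p = predicted):

```python
import math

def rmse(y, p):
    """Root mean square error."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y))

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / len(y)

def mape(y, p):
    """Mean absolute percentage error (observed values must be non-zero)."""
    return sum(abs((yi - pi) / yi) for yi, pi in zip(y, p)) / len(y)

def willmott_index(y, p):
    """Willmott index of agreement: 1 is perfect agreement."""
    ybar = sum(y) / len(y)
    num = sum((pi - yi) ** 2 for yi, pi in zip(y, p))
    den = sum((abs(pi - ybar) + abs(yi - ybar)) ** 2 for yi, pi in zip(y, p))
    return 1.0 - num / den

# Small worked example.
y_obs = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
```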

Where

The SVM configuration parameters were used for training. The statistical indicators showed that the predicted points were concentrated and unevenly distributed, suggesting that the SVM outputs were insufficient to predict the overtopping rates, as depicted in

In this section, the predictive capabilities of the SVM and GBT algorithms were compared. The models' accuracy metrics are reported in

The GBT model performed significantly better than the SVM model: with an RMSE of 0.003 and a MAPE of 0.125, its error levels were lower, and its correlation coefficient was nearly one.

As seen in

Model  MSE       RMSE    MAE      MAPE   R     COP    AAE    SI    RMSPE   WI
SVM    0.000024  0.0049  0.0018   0.18   0.92  0.996  11.8   0.94  3889.6  0.96
GBT    0.000009  0.003   0.00125  0.125  0.97  1.03   5.06   0.57  6313.4  0.985

Since the SVM model deviated slightly from the line of perfect agreement between actual and predicted values in

To assess the validity of the GBT model, we compared its performance with that of the ANN model proposed by van Gent et al. (2007)

Coastal structures are primarily designed to prevent flooding and limit wave overtopping, but the ongoing effects of climate change, such as sea level rise and increased storm intensity and frequency, pose new challenges for the risk-based design of these structures. Accurately estimating overtopping discharges and understanding the characteristics of the overtopping flow over structures is crucial for ensuring the safety of people, activities, and goods in coastal areas or, at the very least, reducing their exposure to risk.

To address this challenge, the study utilized advanced machine learning techniques, specifically the Support Vector Machine and Gradient Boosted Trees techniques, to predict wave overtopping discharge for coastal structures with composite slopes "without a berm". The predictive performance of each model was evaluated using ten statistical indicators. The analysis of the EurOtop database (4737 data points) found that the gradient-boosted trees technique produced exceptionally precise results in predicting wave overtopping discharge.

The analysis showed that the gradient-boosted trees model outperformed the ANN developed by van Gent et al. (2007) in terms of reducing prediction errors. This indicates that the GBT model is more accurate and precise compared to other models.

However, further research is necessary to accurately represent more complex geometries, such as coastal structures with a berm, in the gradient-boosted trees model. Additionally, we recommend comparing the performance of the gradient-boosted trees model with other available prediction methods for wave overtopping discharge to gain a better understanding of its effectiveness and applicability in different scenarios.

The authors would like to convey their sincere gratitude to the referees for their thorough evaluation and insightful criticism, both of which helped shape the manuscript into what it is today.