Designing robust predictive ensembles of data-driven models using multi-objective formulations: Application to home energy management systems

In order to obtain a “good” model from a set of acquired or existing data, three sub-problems must be solved:

  • selection of the data to be used for model design;

  • search for the model features and topology;

  • estimation of the model parameters.

These sub-problems are addressed by applying a model design framework composed of two existing tools. The first, denoted ApproxHull, performs data selection from the data available for design. Feature and topology search are solved by the evolutionary part of MOGA (Multi-Objective Genetic Algorithm), while parameter estimation is performed by its gradient part.

2.1.1. Data selection

In order to design a data-driven model such as an RBF (Radial Basis Function) network, the training set must contain samples covering the entire input-output range over which the underlying process is expected to operate. To identify such samples, called convex hull (CH) points, in the whole data set, a convex hull algorithm can be applied.

Standard convex hull algorithms suffer from excessive time and space complexity in high dimensions. To address these challenges, ApproxHull was proposed in [29] as a randomized approximation convex hull algorithm. To identify convex hull points, ApproxHull employs two main computational geometry concepts: the hyperplane distance and the convex hull distance.
Given a point $\mathbf{x} = [x_1, \dots, x_d]^T$ in a $d$-dimensional Euclidean space and a hyperplane $H$, the hyperplane distance of $\mathbf{x}$ to $H$ is obtained by:

$$d_s(\mathbf{x}, H) = \frac{|a_1 x_1 + \dots + a_d x_d + b|}{\sqrt{a_1^2 + \dots + a_d^2}} \quad (1)$$

where $\mathbf{n} = [a_1, \dots, a_d]^T$ and $b$ are the normal vector and the offset of $H$, respectively.

Given a set $X = \{\mathbf{x}_i\}_{i=1}^{n} \subset \mathbb{R}^d$ and a point $\mathbf{x} \in \mathbb{R}^d$, the Euclidean distance between $\mathbf{x}$ and the convex hull of $X$, denoted by $\mathrm{conv}(X)$, can be computed by solving the following quadratic optimization problem:

$$\min_{\mathbf{a}} \; \mathbf{a}^T Q \mathbf{a} - 2\mathbf{c}^T \mathbf{a} \quad \text{s.t.} \; \mathbf{e}^T \mathbf{a} = 1, \; \mathbf{a} \geq 0 \quad (2)$$

where $\mathbf{e} = [1, \dots, 1]^T$, $Q = X^T X$, and $\mathbf{c} = X^T \mathbf{x}$. Assuming that the optimal solution of Equation (2) is $\mathbf{a}^*$, the distance of point $\mathbf{x}$ to $\mathrm{conv}(X)$ is given by:

$$d_c(\mathbf{x}, \mathrm{conv}(X)) = \sqrt{\mathbf{x}^T \mathbf{x} - 2\mathbf{c}^T \mathbf{a}^* + \mathbf{a}^{*T} Q \mathbf{a}^*} \quad (3)$$
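As an illustration, the two distances can be computed in a few lines of Python. This is a minimal sketch assuming NumPy and SciPy, not the actual ApproxHull implementation of [29]; the helper names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def hyperplane_distance(x, n, b):
    """Eq. (1): distance of point x to the hyperplane n^T z + b = 0."""
    return abs(n @ x + b) / np.linalg.norm(n)

def convex_hull_distance(x, X):
    """Eqs. (2)-(3): distance of x to conv(X); X is (d, n), columns are points."""
    n_pts = X.shape[1]
    Q = X.T @ X
    c = X.T @ x
    # Eq. (2): min a^T Q a - 2 c^T a  s.t.  e^T a = 1, a >= 0  (SLSQP solves this QP)
    res = minimize(lambda a: a @ Q @ a - 2 * c @ a,
                   np.full(n_pts, 1.0 / n_pts),
                   bounds=[(0.0, None)] * n_pts,
                   constraints=({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},))
    a_star = res.x
    # Eq. (3): distance evaluated at the optimal convex combination a*
    return np.sqrt(max(x @ x - 2 * c @ a_star + a_star @ Q @ a_star, 0.0))
```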

ApproxHull consists of five main steps. In Step 1, each dimension of the input data set is scaled to the range [−1, 1]. In Step 2, the largest and smallest samples in each dimension are identified and taken as the vertices of the initial convex hull. In Step 3, a population of k faces is generated from the current vertex set of the convex hull. In Step 4, Equation (1) is used to identify the point furthest from each face in the current population; points that have not been detected before are treated as new vertices of the convex hull. Finally, in Step 5, the current convex hull is updated by adding the newly found vertices to the current vertex set. Steps 3 to 5 are executed iteratively until either no new vertex is found in Step 4, or the newly found vertices are so close to the current convex hull that they carry no useful information. The convex hull distance (3) is used to identify points lying closer to the current convex hull than a user-defined threshold.
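The iterative loop (Steps 3 to 5) might then look as follows. This is a sketch of our own, reusing the distance helpers above; the hypothetical fit_hyperplane helper and the random face generation stand in for the face-population mechanism actually used in [29].

```python
def fit_hyperplane(pts):
    """Hyperplane through d points in R^d: returns normal n and offset b."""
    A = pts[1:] - pts[0]                      # directions spanning the face
    n = np.linalg.svd(A)[2][-1]               # vector orthogonal to all rows of A
    return n, -n @ pts[0]

def approxhull_sketch(data, k_faces=50, eps=1e-6, seed=0):
    """Illustrative ApproxHull-style iteration (Steps 1-5); details differ in [29]."""
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    scaled = 2 * (data - lo) / (hi - lo) - 1                 # Step 1: scale to [-1, 1]
    d = scaled.shape[1]
    vertices = set(scaled.argmin(axis=0)) | set(scaled.argmax(axis=0))  # Step 2
    while True:
        candidates = set()
        for _ in range(k_faces):                             # Step 3: population of faces
            face = rng.choice(sorted(vertices), size=d, replace=False)
            n, b = fit_hyperplane(scaled[face])
            furthest = max(range(len(scaled)),
                           key=lambda i: hyperplane_distance(scaled[i], n, b))
            if furthest not in vertices:                     # Step 4: furthest point per face
                candidates.add(furthest)
        hull_pts = scaled[sorted(vertices)].T
        new = {i for i in candidates                         # drop near-hull, uninformative points
               if convex_hull_distance(scaled[i], hull_pts) > eps}
        if not new:                                          # Step 5 / stopping condition
            break
        vertices |= new
    return sorted(vertices)
```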

Before determining the CH points, ApproxHull first eliminates duplicates and linear combinations of samples/features. After determining the CH points, it generates the training, test, and validation sets for MOGA according to user specifications, merging the CH points into the training set.
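A possible data-splitting step with the CH points forced into the training set is sketched below; the split fractions and the helper name are our assumptions, not ApproxHull's actual interface.

```python
import numpy as np

def split_with_hull(data, hull_idx, frac=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/test/validation, merging CH points into training."""
    rng = np.random.default_rng(seed)
    hull_idx = np.asarray(hull_idx)
    rest = np.setdiff1d(np.arange(len(data)), hull_idx)      # non-hull samples
    rest = rng.permutation(rest)
    n_tr = max(int(frac[0] * len(data)) - len(hull_idx), 0)  # fill training up to its share
    n_te = int(frac[1] * len(data))
    train = np.concatenate([hull_idx, rest[:n_tr]])
    test, valid = rest[n_tr:n_tr + n_te], rest[n_tr + n_te:]
    return train, test, valid
```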

2.1.2. Parameter separability

We will use models whose parameters are linearly and nonlinearly separable [30,31]. The output of such a model at time step $k$ is given as:

$$\hat{y}(\mathbf{x}_k, \mathbf{w}) = u_0 + \sum_{i=1}^{n} u_i \varphi_i(\mathbf{x}_k, \mathbf{v}_i) = \boldsymbol{\varphi}(\mathbf{x}_k, \mathbf{v})\,\mathbf{u} \quad (4)$$

In (4), $\mathbf{x}_k$ is the ANN input at step $k$, $\boldsymbol{\varphi}$ is the basis function vector, $\mathbf{u}$ is the (linear) output weight vector, and $\mathbf{v}$ represents the nonlinear parameters. For simplicity, we assume that there is only one hidden layer, and that $\mathbf{v}$ consists of $n$ parameter vectors, one for each neuron: $\mathbf{v} = [\mathbf{v}_1, \dots, \mathbf{v}_n]^T$. This type of model comprises Multilayer Perceptrons, Radial Basis Function (RBF) networks, B-Spline and ASMOD models, Wavelet networks, and Mamdani, Takagi, and Takagi-Sugeno fuzzy models (satisfying certain assumptions) [32].

This means that the model parameters can be separated into linear and nonlinear ones,

$$\mathbf{w} = [\mathbf{u}^T \; \mathbf{v}^T]^T \quad (5)$$

and this separability can be exploited in the training algorithms. For a set of input patterns $X$, training the model means finding the value of $\mathbf{w}$ that minimizes the following criterion:

$$\Omega(X, \mathbf{w}) = \frac{\|\mathbf{y} - \hat{\mathbf{y}}(X, \mathbf{w})\|_2^2}{2} \quad (6)$$

where $\|\cdot\|_2$ denotes the Euclidean norm. Replacing (4) in (6), we obtain:

$$\Omega(X, \mathbf{w}) = \frac{\|\mathbf{y} - \Gamma(X, \mathbf{v})\,\mathbf{u}\|_2^2}{2} \quad (7)$$

where $\Gamma(X, \mathbf{v}) = [\boldsymbol{\varphi}(\mathbf{x}_1, \mathbf{v}), \dots, \boldsymbol{\varphi}(\mathbf{x}_m, \mathbf{v})]^T$, $m$ being the number of patterns in the training set. As (7) is a linear problem in $\mathbf{u}$, its optimal solution is given as:

$$\hat{\mathbf{u}} = \Gamma^+(X, \mathbf{v})\,\mathbf{y} \quad (8)$$

where the symbol ‘+’ denotes a pseudo-inverse operation. Replacing (8) in (7), we obtain a new criterion, which depends only on the nonlinear parameters:

$$\Psi(X, \mathbf{v}) = \frac{\|\mathbf{y} - \Gamma(X, \mathbf{v})\,\Gamma^+(X, \mathbf{v})\,\mathbf{y}\|_2^2}{2} \quad (9)$$

The advantages of using (9) instead of (7) are threefold:

  • It lowers the problem dimensionality, as the number of model parameters to determine is reduced;

  • The initial value of Ψ is much smaller than that of Ω;

  • Typically, the rate of convergence of gradient algorithms using (9) is faster than using Equation (7).
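To make the separability concrete, the following Python sketch evaluates criterion (9) for a Gaussian RBF model: the linear weights come for free from the pseudo-inverse (8), so only the centers and spread remain as free (nonlinear) parameters. The function names and the Gaussian basis are our assumptions.

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    """Gamma(X, v): Gaussian RBF outputs for each input pattern (rows of X)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def separable_criterion(X, y, centers, sigma):
    """Psi(X, v) of Eq. (9): the criterion depends only on the nonlinear parameters."""
    G = rbf_design_matrix(X, centers, sigma)
    G = np.hstack([np.ones((len(X), 1)), G])   # bias column accounts for u_0 in Eq. (4)
    u = np.linalg.pinv(G) @ y                  # Eq. (8): optimal linear weights
    r = y - G @ u                              # residual at the linear optimum
    return 0.5 * r @ r, u
```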

2.1.5. MOGA

This framework is described in detail in [37] and is only briefly discussed here. MOGA evolves ANN structures with separable parameters (RBFs in this case), each structure being trained by minimizing criterion (9). Since we are designing a predictive model, intended to predict the evolution of a specific variable within a predefined prediction horizon (PH), the model should provide multi-step-ahead predictions. This can be achieved in a direct mode, by using multiple one-step-ahead prediction models, each providing the prediction for one step within the PH. Another approach, the one taken in this work, is to use a recursive version: only one model is used, but its inputs change over time. For simplicity, consider a nonlinear autoregressive model with exogenous inputs (NARX), with only one exogenous input:

$$\hat{y}[k+1|k] = F(\mathbf{z}[k]) = F\big(y[k-d_{0_1}], \dots, y[k-d_{0_n}],\; x[k-d_{i_1}], \dots, x[k-d_{i_m}]\big) \quad (17)$$

where $\hat{y}[k+1|k]$ denotes the prediction of time step $k+1$ given the measured data at time $k$, and $d_{i_j}$ is the $j$-th delay of variable $i$. This is a one-step-ahead prediction within the prediction horizon. When (17) is iterated over the PH, some or all of the indices on the right-hand side become greater than $k$, which means that the corresponding predictions must be used in place of measured values. What was said for the NARX model also applies to the NAR model (no exogenous inputs).
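A recursive multi-step prediction over the PH might be implemented as below. The model callable, array layout, and helper name are assumptions for illustration; the key point is that measured values are replaced by earlier predictions once the required indices exceed $k$.

```python
import numpy as np

def narx_predict_horizon(model, y_hist, x_hist, x_future, dy, dx, PH):
    """Recursive multi-step prediction with a one-step NARX model, Eq. (17).

    model   : callable mapping a regressor vector z[k] to y_hat[k+1|k]
    y_hist  : measured outputs up to time k (most recent last)
    x_hist  : measured exogenous inputs up to time k
    x_future: known/assumed exogenous inputs for k+1 .. k+PH
    dy, dx  : output delays d_{0_j} and input delays d_{i_j} (enough history assumed)
    """
    y = list(y_hist)
    x = list(x_hist) + list(x_future)
    k = len(y_hist) - 1                       # current time index
    preds = []
    for step in range(PH):
        # delayed terms: measured values, or earlier predictions once k+step-d > k
        z = [y[k + step - d] for d in dy] + [x[k + step - d] for d in dx]
        y_next = model(np.asarray(z))
        preds.append(y_next)
        y.append(y_next)                      # feed the prediction back
    return np.asarray(preds)
```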

The evolutionary part of MOGA evolves a population of ANN structures. Each topology consists of the number of neurons in the single hidden layer (for RBF models) and of the model inputs, or features. MOGA assumes that the number of neurons must lie within a user-specified range, $n \in [n_m, n_M]$. Additionally, the features to be used by a specific model need to be selected, i.e., input selection must be performed. In MOGA we assume that, from a total number $q$ of available features, denoted as $F$, each model must select the $d$ most representative features, with $d$ within a user-specified interval, $d \in [d_m, d_M]$, $d_M \leq q$. For this reason, each ANN structure is codified as shown in Figure 1:

The first component corresponds to the number of neurons. The next $d_m$ positions represent the minimum number of features, while the last (white) positions hold a variable number of inputs, up to the predefined maximum. The $\lambda_j$ values correspond to the indices of the features $f_j$ in the columns of $F$.
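As an illustration of this encoding, an individual might be represented as follows; this is a sketch under our assumptions, and the actual MOGA chromosome layout is described in [37].

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AnnChromosome:
    """MOGA individual: number of neurons plus a variable-length feature list."""
    n_neurons: int            # n in [n_m, n_M]
    feature_idx: list[int]    # lambda_j: column indices into F, with d in [d_m, d_M]

def random_chromosome(rng, n_m, n_M, d_m, d_M, q):
    """Sample a random structure, assuming the encoding of Figure 1."""
    d = int(rng.integers(d_m, d_M + 1))
    return AnnChromosome(
        n_neurons=int(rng.integers(n_m, n_M + 1)),
        feature_idx=sorted(rng.choice(q, size=d, replace=False).tolist()),
    )

# Example: random_chromosome(np.random.default_rng(0), n_m=2, n_M=20, d_m=3, d_M=8, q=50)
```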

The operation of MOGA follows a typical evolutionary procedure. We refer the reader to [37] for details on the genetic operators.
The model design cycle is shown in Figure 2. First, the search space should be defined. This includes the input variables to be considered, the lags to be considered for each variable, and the allowed ranges for the number of neurons and inputs. The total input data, denoted as $F$, together with the target data, must then be divided into three different sets: the training set, used to estimate the model parameters; the test set, used for early stopping; and the validation set, used to assess MOGA performance.
Secondly, the optimization objectives and goals need to be clearly defined. Typical objectives are the root mean square error (RMSE) evaluated on the training set ($r_{tr}$) or on the test set ($r_{te}$), and measures of model complexity, such as $\#(\mathbf{v})$, the number of nonlinear parameters, or the norm of the linear parameters ($\|\mathbf{u}\|_2$). For predictive applications, as is the case here, a criterion is also used to evaluate predictive performance. Assume a time series sim, a subset of the design data with $p$ data points. For each point, model (14) is used to make predictions up to PH steps ahead. Then, the following error matrix is built:

$$E(\mathrm{sim}, PH) = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,PH} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,PH} \\ \vdots & \vdots & \ddots & \vdots \\ e_{p-PH,1} & e_{p-PH,2} & \cdots & e_{p-PH,PH} \end{bmatrix},$$

where $e_{i,j}$ is the model prediction error at instant $i$ of sim, for step $j$ within the prediction horizon. Denoting the RMS function applied to the $i$-th column of matrix $E$ by $r(E(\mathrm{sim}, PH)_{\cdot,i})$, the predictive performance criterion is the sum of the RMS values of the columns of $E$:

$$r_{\mathrm{sim}}(PH) = \sum_{i=1}^{PH} r\big(E(\mathrm{sim}, PH)_{\cdot,i}\big)$$
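In code, this criterion might be computed as below, given some routine that returns the PH predictions made from instant $i$; the predict_ph callable and the array shapes are our assumptions.

```python
import numpy as np

def error_matrix(predict_ph, y, PH):
    """E[i, j] = y[i+j+1] - y_hat[i+j+1 | i], for i = 0 .. p-PH-1."""
    p = len(y)
    E = np.empty((p - PH, PH))
    for i in range(p - PH):
        E[i, :] = y[i + 1 : i + PH + 1] - predict_ph(i)   # PH-step-ahead errors from i
    return E

def r_sim(E):
    """r_sim(PH): sum over the PH columns of the column-wise RMS of E."""
    return np.sqrt((np.asarray(E) ** 2).mean(axis=0)).sum()
```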

Note that, in the MOGA formulation, each performance criterion can either be minimized as an objective or set as a goal level.

After formulating the optimization problem and setting the other hyperparameters, such as the number of elements in the population (nPop), the total number of iterations (nIter), and the genetic algorithm parameters (random immigrant proportion, selective pressure, crossover rate, and survival rate), the hybrid evolutionary-gradient method is executed.

Each element in the population corresponds to a specific RBF structure. Since the model is nonlinear, gradient algorithms such as the Levenberg-Marquardt (LM) algorithm minimizing (6) are only guaranteed to converge to a local minimum. Therefore, each RBF model is trained a user-specified number of times, starting from different initial values of the nonlinear parameters. MOGA allows the initial centers to be chosen using the heuristics mentioned in Section 2.1.4, or using an adaptive clustering algorithm [38].
Since the problem is multi-objective, there are several ways to determine which training trial is best. One strategy is to select the training trial whose objective vector has the smallest Euclidean distance to the origin; the green arrow in Figure 3 illustrates this situation for $d = 2$ objectives. In a second strategy, the average of the objective values over all training trials is computed, and the trial closest to this average is selected as the best (the red arrow in Figure 3).
Another $d$ strategies are to choose the training trial that minimizes the $i$-th objective ($i = 1, 2, \dots, d$) better than the other trials. For example, the yellow and blue arrows in Figure 3 mark the best training trials minimizing objective 1 and objective 2, respectively.
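These selection strategies can be stated compactly. Given an (n_trials × d) array of objective values, a sketch of our own (all objectives assumed to be minimized):

```python
import numpy as np

def pick_best_trials(obj):
    """Best-trial indices per the three strategies illustrated in Figure 3."""
    obj = np.asarray(obj)
    closest_origin = int(np.linalg.norm(obj, axis=1).argmin())              # green arrow
    closest_mean = int(np.linalg.norm(obj - obj.mean(0), axis=1).argmin())  # red arrow
    per_objective = obj.argmin(axis=0).tolist()   # yellow/blue arrows, one per objective
    return closest_origin, closest_mean, per_objective
```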
After executing the specified number of iterations, performance values are available for nPop × nIter different models. Since the problem is multi-objective, a subset of these models corresponds to the non-dominated (ND), or Pareto, solutions. If one or more objectives are set as goals, a subset of ND, denoted as the preferred set (Pref), corresponds to the non-dominated solutions that satisfy the goals. Figure 4 shows an example.
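For illustration, a straightforward (quadratic-time) Pareto filter and the goal-based preferred set might be computed as follows; this is a generic sketch, not MOGA's internal routine.

```python
import numpy as np

def non_dominated(obj):
    """Indices of non-dominated models; all objectives are minimized."""
    obj = np.asarray(obj)
    keep = []
    for i, f in enumerate(obj):
        # f is dominated if some other point is <= in all objectives and < in one
        dominated = ((obj <= f).all(axis=1) & (obj < f).any(axis=1)).any()
        if not dominated:
            keep.append(i)
    return keep

def preferred(obj, goals):
    """Preferred set: non-dominated solutions meeting every goal level."""
    obj, goals = np.asarray(obj), np.asarray(goals)
    return [i for i in non_dominated(obj) if (obj[i] <= goals).all()]
```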

The performance of the MOGA models is evaluated on either the non-dominated set or the preferred set. If a single solution is sought, it is selected based on the objective values of these model sets, on the performance criteria applied to the validation set, and possibly on other criteria.

The problem definition steps should be revised when the analysis of the solutions provided by MOGA requires repeating the process. In this case, two main operations can be performed: redefining the input space by removing or adding one or more features (variables and lagged inputs in the modeling problem), and improving the trade-off surface coverage by changing the objectives or redefining the goals. This process can be advantageous because, typically, the output of a single run allows the number of input terms (and possibly the variables used in the modeling problem) to be reduced, by eliminating terms that are not present in the resulting population. Furthermore, based on the results of a single run, it is often possible to narrow the admissible range of the number of neurons. This results in a smaller search space in subsequent runs of MOGA, potentially achieving faster convergence and a better Pareto front approximation.

Typically, for a specific problem, an initial MOGA execution is performed minimizing all objectives. A second execution is then run, typically setting some objectives as goals.
