A high percentage of energy consumption in India is attributable to commercial and domestic buildings. The country has seen a steady rise in energy consumption over the years, and demand is projected to grow by at least 15 times between 2010 and 2050.
There are two major approaches to energy consumption prediction: the physical modeling approach and the data-driven approach. The former deals with physical and environmental factors such as the weather, the structure of the building, the choice of construction materials, the surrounding environment, and insulation.
Existing data-driven models for power prediction include a wide variety of machine learning and deep learning models, such as AutoRegressive Integrated Moving Average (ARIMA), Support Vector Regression (SVR), Random Forests (RF), and Artificial Neural Networks (ANN). ARIMA is a commonly used univariate time series model that makes future predictions from historical data.
Convolutional Neural Network (CNN) and LSTM models address the problem of identifying explanatory variables that arises when using linear regression.
An Improved Sine Cosine Optimization Algorithm based LSTM (ISCOA-LSTM) has also been introduced.
Building energy prediction values differ across the various profiles within a building, and a single model developed for this purpose consumes a lot of computational resources. A framework called Multiple Electric Energy Consumption forecasting in a smart building using Transfer Learning and LSTM, denoted MECTLL, has been used to address this.
Three different base models, Support Vector Machine (SVM), Back Propagation Neural Network (BPNN), and Extreme Learning Machine (ELM), have been combined into an integrated structure to predict energy utilization values with minimal forecasting error.
Studies comparing the performance of single prediction models with ensemble models show that ensembles deliver more reliable predictions along with greater flexibility and stability.
Despite the extensive study that has been done, little importance is usually given to the lag variable that needs to be considered to produce more relevant outputs. Moreover, existing models tend to use the entire dataset for training every time the model is required to forecast values. This leads to an unnecessarily high cost in time and effort, along with a lack of flexibility in terms of the data being collected and used for analysis.
The objective of this study is hence to create an enhanced version of an ensemble consisting of LSTM and VAR as the base models in order to overcome the challenges mentioned above. This is done by making use of historical data generated from the energy meter to identify common trends. The system then analyzes the data using visualizations and makes predictions for the future, which can serve as a helpful tool for people to understand their energy utilization patterns. It also helps a person closely monitor and single out anomalies for early fault detection, thereby allowing individuals to track and manage their daily energy consumption.
This paper proposes a novel data-driven solution by building a time series prediction ensemble model that integrates the advantageous properties of LSTM and VAR using enhanced ensemble bagging techniques. A lag value of the past 48 hours is taken into account to predict values for the next three days, to increase accuracy. Further, Principal Component Analysis (PCA) techniques have been used to clean the data and improve its quality. The processed data is passed through an ensemble of LSTM and VAR using a weighted-average bagging technique to produce the final forecast. The study considers the results produced by both models individually along with the results of the ensemble model, and compares their performance using the Mean Squared Error (MSE) value. The proposed model has proven to be more beneficial, offering increased precision as well as reduced computational resource utilization.
The structure of the paper is organized as follows. Section 2 discusses the data collection and preprocessing techniques used, along with the architecture of the proposed LSTM-VAR ensemble model. Section 3 presents the empirical results, and Section 4 concludes the paper.
The data used for this study is collected from a single-phase Minion Energy Monitor, developed by Minion Labs India Private Limited. It is installed in a commercial building to collect data from households. The acquired data is not publicly available and was collected for the purposes of this study. The dataset contains 7 attributes, namely Device ID, Current, Active Power, Voltage, Power Factor, Frequency, and Timestamp. The dataset was collected over a period of one week at a time interval of two seconds, recording the energy consumption activity of multiple devices for 7 days. Active power is chosen as the target variable to be predicted for power consumption.
Missing and inconsistent values are found by calculating the difference between two consecutive rows of data and comparing the resultant difference against a specified threshold. These values are then superseded by the result obtained from an Iterative Imputer that uses Bayesian Ridge Regression. Once a consistent dataset is formed, it is reshaped so that the timestamp column of power consumption becomes the index of the data frame. The data is also checked for stationarity, to make sure that statistical values like the mean, variance, and autocorrelation do not change over time. A 0.8 fraction of the entire data is extracted to form the training subset, with the remainder falling under the test category. Each of these subsets is then normalized using equation (1):

N = (x − train[mean]) / train[std]        (1)
where N is the normalized score, x represents the values in the dataset, and train[mean] and train[std] are the mean and standard deviation of each column of the training subset, respectively. This is done to bring all features onto a comparable scale.
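The imputation and normalization steps described above can be sketched as follows; the tiny DataFrame, its values, and the column subset are illustrative stand-ins for the meter data, while the Iterative Imputer with Bayesian Ridge Regression, the 0.8 split, and the equation (1) normalization follow the text.

```python
# Sketch of the preprocessing pipeline; the sample data is illustrative.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

df = pd.DataFrame({
    "Active Power": [116.0, np.nan, 132.0, 143.0, 134.0, 142.0, 143.0, 133.0],
    "Voltage":      [230.1, 229.8, np.nan, 231.0, 230.4, 230.2, 229.9, 230.0],
})

# Supersede missing/inconsistent values with an Iterative Imputer
# that uses Bayesian Ridge Regression as its estimator.
imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0)
df[:] = imputer.fit_transform(df)

# 80/20 chronological split, then normalize both subsets with the
# training mean and standard deviation, as in equation (1).
split = int(0.8 * len(df))
train, test = df.iloc[:split], df.iloc[split:]
mean, std = train.mean(), train.std()
train_n = (train - mean) / std
test_n = (test - mean) / std
```

Normalizing the test subset with the training statistics (rather than its own) prevents information from the test period leaking into the model.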
Since the data collected is fairly large, Principal Component Analysis (PCA) is performed to reduce the size of the dataset without compromising the necessary information. This in turn reduces the computation time as well as preventing overfitting. PCA has been performed using the decomposition submodule of the scikit-learn library in Python.
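A minimal sketch of the PCA step using scikit-learn's decomposition submodule, as named in the text; the random input matrix and the 95% variance threshold are illustrative assumptions, since the text does not state a component count.

```python
# Dimensionality reduction with PCA; data and threshold are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # stand-in for the normalized sensor features

# Keep the smallest number of components that retains 95% of the
# variance, reducing dataset size without losing much information.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
```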
Ensembling is a technique used in machine learning where multiple base models are integrated to create a single model that performs better. This is because an integrated model tends to overcome the limitations of each individual model and strengthens overall performance by combining their independent capabilities. Ensembles are known to reduce the need for a heavily tuned set of parameters for optimization. Moreover, they are known to be robust, predicting more accurate results as well as maintaining stability across multiple datasets. Most ensemble techniques are a two-step process:
Create and identify the connections between the base models. The models chosen can be connected either in parallel or in series.
Pick one valid scheme to evaluate the results produced. Techniques that are often used include Max voting, Weighted average, Stacking, Boosting, Bagging, and Blending.
The proposed ensemble has been depicted in
A stacked LSTM of two layers has been used as the base model to store information about the attributes and their contribution to power consumption patterns. LSTM networks were developed to solve the problem of long-term dependencies. The LSTM is one of the most commonly used models for time series analysis due to its advantageous ability to retain memory over longer durations than a standard Recurrent Neural Network (RNN). This is achieved by using four neural network structures within the single repeating module present in a simple RNN. The network comprises multiple memory blocks called cells, which are responsible for retaining the information learned. Manipulations of these cells are done through three different gates: the forget gate, the input gate, and the output gate. Each gate contains a Sigmoid activation function that helps the structure decide which parts of the data should be remembered and which can be ignored. The gate outputs lie between 0 and 1, where values close to 1 allow data to pass through the gate and values close to 0 block it. The equations for the three gates are shown in equations (3), (4), and (5):

f_t = σ(W_f · [h_(t−1), x_t] + b_f)        (3)
i_t = σ(W_i · [h_(t−1), x_t] + b_i)        (4)
o_t = σ(W_o · [h_(t−1), x_t] + b_o)        (5)
where f_t, i_t, and o_t are the forget, input, and output gate activations at time t, σ is the Sigmoid function, W_f, W_i, W_o and b_f, b_i, b_o are the weights and biases of the respective gates, h_(t−1) is the previous hidden state, and x_t is the current input.
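The gate computations above can be illustrated numerically: each gate applies a sigmoid to a linear function of the previous hidden state and the current input. The weights, bias of zero, and state sizes below are random stand-ins, not values from the trained model.

```python
# Numeric illustration of the three LSTM gates in equations (3)-(5).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, features = 4, 3
h_prev = rng.normal(size=hidden)        # previous hidden state h_(t-1)
x_t = rng.normal(size=features)         # current input x_t
concat = np.concatenate([h_prev, x_t])  # [h_(t-1), x_t]

gates = {}
for name in ("forget", "input", "output"):
    W = rng.normal(size=(hidden, hidden + features))  # gate weights
    b = np.zeros(hidden)                              # gate bias
    gates[name] = sigmoid(W @ concat + b)

# Sigmoid activations lie strictly in (0, 1): values near 1 let
# information pass through the gate, values near 0 block it.
```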
The VAR model is used as a multivariate time series forecasting model where a group of time-dependent variables is modeled as a linear combination of their previous values. Since it makes use of multiple independent variables, it is created as a system of equations, one for each variable being predicted. Each of these equations makes use of an endogenous lag variable to identify a deterministic trend. Suppose y_t is a vector of k time-dependent variables observed at time t; a VAR model of order p expresses it as:

y_t = c + A_1 · y_(t−1) + A_2 · y_(t−2) + … + A_p · y_(t−p) + e_t
where c is a vector of intercepts, A_1 through A_p are the coefficient matrices for each lag, and e_t is a vector of error terms.
The results from the standalone LSTM and VAR models have been taken to iteratively create an ensemble using bagging techniques. This newly created ensemble produces the final prediction by taking the weighted mean of the base models. This can be expressed through equation (8):

P_t = (w_1 · L_t + w_2 · V_t) / (w_1 + w_2)        (8)
where P_t is the ensemble prediction at time t, L_t and V_t are the predictions of the LSTM and VAR base models, and w_1 and w_2 are the weights assigned to each model.
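The weighted-mean combination of equation (8) can be sketched directly; the weight values 0.6 and 0.4 are illustrative, since the text does not state the weights used, and the sample predictions are taken from the results table below.

```python
# Weighted-average combination of two base-model forecasts.
import numpy as np

def weighted_ensemble(pred_lstm, pred_var, w_lstm=0.6, w_var=0.4):
    """Combine two base-model predictions by their weighted mean."""
    pred_lstm = np.asarray(pred_lstm, dtype=float)
    pred_var = np.asarray(pred_var, dtype=float)
    return (w_lstm * pred_lstm + w_var * pred_var) / (w_lstm + w_var)

combined = weighted_ensemble([122.01, 131.37], [120.80, 129.07])
```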
The ensemble has been tuned in the following ways to provide the most effective results. The chosen activation function is the Rectified Linear activation function (ReLU), owing to its suitability for stochastic gradient descent and backpropagation for error correction; it is also known to avoid easy saturation. The return-sequences parameter of the LSTM has been set to 'True' so that the output of each hidden layer is exposed, converting the result into a three-dimensional input for the next layer. To avoid overfitting, a callback method is used while fitting the model to the data, which automatically terminates the training process once the specified threshold has been reached. The VAR model in the ensemble applies the Akaike Information Criterion (AIC), identifying 31 as the best lag order with an AIC score of 3.644. Once these parameters are set, the data is fit to the VAR model and predictions are made. The preprocessed data is passed through both models, and probabilities are calculated for each prediction. These trigger each neuron in the LSTM network based on a threshold value, thereby giving more weight to certain neurons in order to predict accurate values.
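The LSTM-side tuning choices above (ReLU activation, return sequences set to 'True' on the first layer, and an early-stopping callback) can be sketched as follows. The text does not name a deep learning framework, so Keras is assumed here; the layer sizes, feature count, and patience value are illustrative, while the 48-step lookback follows the 48-hour lag stated earlier.

```python
# Stacked two-layer LSTM sketch with the tuning described in the text.
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Sequential

lookback, n_features = 48, 5  # 48-hour lag from the text; feature count assumed

model = Sequential([
    Input(shape=(lookback, n_features)),
    # return_sequences=True exposes every hidden step, giving the
    # second LSTM layer the three-dimensional input it expects.
    LSTM(64, activation="relu", return_sequences=True),
    LSTM(32, activation="relu"),
    Dense(1),  # next-step active-power prediction
])
model.compile(optimizer="adam", loss="mse")

# Callback that terminates training once validation loss stops
# improving, to avoid overfitting.
stopper = EarlyStopping(monitor="val_loss", patience=5,
                        restore_best_weights=True)
```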



Sample prediction values:

116    122.01    120.80
126    131.37    129.07
132    139.52    136.14
143    160.40    156.82
134    139.52    137.14
142    154.73    151.20
143    160.40    158.82
133    139.52    137.14
To evaluate the performance of the proposed ensemble, the forecasting error has been calculated. This is done using the MSE value, which is the average of the squares of all the errors present. It is seen that the forecasting error is low for this structure in contrast to the plot shown in
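The MSE described above can be computed as follows; the actual and predicted values are taken from the sample table, purely for illustration.

```python
# MSE: the average of the squared prediction errors.
import numpy as np

def mse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

error = mse([116, 126, 132], [120.80, 129.07, 136.14])
```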
Analysis has also been done to compare the proposed model with existing studies that have used different models. SVM and Autoregression are two commonly used models that have previously been implemented for this application, and they have proven dependable for predictions. However, as seen in
The R2 score, also known as the coefficient of determination, describes the percentage of variance in the dependent attribute that the model explains with respect to the target attribute; a higher R2 score indicates higher accuracy. Similarly, the Explained Variance metric captures how much information is lost when approximating the model: the lower the value, the more information is lost and the less accurate the model.
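Both metrics are available in scikit-learn; the sketch below uses the sample actual and predicted values from the results table as illustrative inputs.

```python
# R2 and explained-variance scores for a set of predictions.
from sklearn.metrics import explained_variance_score, r2_score

actual = [116, 126, 132, 143, 134, 142, 143, 133]
predicted = [120.80, 129.07, 136.14, 156.82, 137.14,
             151.20, 158.82, 137.14]

r2 = r2_score(actual, predicted)                  # closer to 1 is better
ev = explained_variance_score(actual, predicted)  # variance captured
```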
An observed limitation is the lack of variation in the data with respect to the parameters selected for training. This study can be extended in future work by adding several other parameters corresponding to energy consumption, across multiple datasets, to check its consistency.
This research study has proposed a data-driven, deep learning ensemble of LSTM and VAR to overcome the gaps identified in existing approaches for energy consumption prediction. The time series data is analyzed and normalized to fit the needs of the model, and then passed through the ensemble to learn and extract useful information. A lag variable is introduced to increase efficiency and decrease the computation time of predicting energy consumption values. The predicted values are plotted to compute the MSE and evaluate the model's performance. The paper concludes with a comparison of the difference in error between the standalone models and the created ensemble. The ensemble performed more efficiently and stably than standard models such as LSTM, SVM, and VAR, thereby making it more reliable for future applications. The proposed enhanced model gave an improved R2 score of 98.99% when compared to the LSTM (74.85%), SVM (62.41%), ARIMA (93.15%), and VAR (82.914%) standalone models.

LSTMs are sensitive to random weight initialization and are in this respect comparable to a Feed-Forward Neural Network. LSTMs are also susceptible to overfitting, and the dropout technique is hard to apply to this problem. Dropout is a strategy for regularizing inputs and recurrent LSTM connections by excluding them from weight updates and activations while training the network. In addition, an LSTM requires four linear layers per cell at each time step of the sequence; these linear layers demand large amounts of memory bandwidth, and since the system does not have enough memory bandwidth to feed the compute units, the LSTM underperformed for all these reasons. The SVM took a long training time and produced a large error percentage, which is why it underperformed. VAR imposes a one-way relationship: the forecast variables are influenced by the predictor variables, but not vice versa. For this reason, VAR underperformed as well.
ARIMA models can be extremely precise and dependable under the right conditions and with enough data. It was observed during experimentation that the model's parameters must be defined manually, making obtaining the best fit a lengthy trial-and-error process. The model is also strongly reliant on the consistency and differencing of past data; to ensure that it produces reliable results and forecasts, it is critical that data is collected accurately and over a lengthy period of time. For these reasons, the ARIMA model underperformed compared to the proposed enhanced model.
The future scope for this study includes testing the model for various datasets and taking into consideration a number of other parameters that contribute to an energy profile. Further, an empirical analysis can be done to analyse the time complexity of the model with respect to other models that have been used for similar problem statements.