With the increasing adoption of renewable energy sources (RES), battery electric vehicles (BEVs), and energy storage systems (ESS) in recent years, residential consumers are changing from conventional consumers into prosumers, who both consume and produce electricity. This transformation has made power systems increasingly dynamic and bidirectional in terms of power flow.
To plan effectively for the future of residential electricity consumption, grid operators, policymakers, utilities, and other stakeholders must have a clear understanding of how prosumers will behave. However, limited or missing data is a major obstacle, for several reasons. Firstly, the slow adoption of new technologies (particularly BEVs) and automation in this field contributes to the scarcity of real data from prosumers. Secondly, individual electricity consumption data for thousands of prosumers is not available to practitioners and researchers due to consumers' privacy concerns. Thirdly, even in countries with widespread smart meter rollouts, where interval consumption data of prosumers is available, metered data only shows the energy imported from and exported to the grid. It therefore cannot be used to determine the type of prosumer in terms of behind-the-meter (BTM) equipment, such as BEVs, stationary batteries, or solar PV systems. Lastly, the dynamic nature of prosumers' behaviour and the frequent changes in household electrical appliances further reduce the availability of reliable data.
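The third obstacle can be illustrated with a minimal sketch. It assumes a simple net-metering relation (household load plus BEV charging minus PV generation), which is our simplification and not a model taken from the dataset; the function name `meter_readings` and all numeric values are hypothetical.

```python
def meter_readings(load_kwh, pv_kwh, ev_kwh):
    """Return (imported, exported) hourly energy as seen at the grid meter.

    Assumption: the meter only observes the net of household load,
    BEV charging, and PV generation for each hour.
    """
    imported, exported = [], []
    for load, pv, ev in zip(load_kwh, pv_kwh, ev_kwh):
        net = load + ev - pv                 # positive: drawn from the grid
        imported.append(max(net, 0.0))
        exported.append(max(-net, 0.0))
    return imported, exported


# Two hypothetical households over three hours (kWh, illustrative values).
# House A has PV only; house B has a larger PV system and a BEV.
house_a = meter_readings(load_kwh=[1.0, 1.0, 2.0],
                         pv_kwh=[0.0, 2.0, 0.0],
                         ev_kwh=[0.0, 0.0, 0.0])
house_b = meter_readings(load_kwh=[0.5, 0.5, 1.0],
                         pv_kwh=[0.0, 2.5, 0.0],
                         ev_kwh=[0.5, 1.0, 1.0])

# Different BTM equipment, identical meter readings.
print(house_a == house_b)
```

Under this assumed relation, both households produce the same imported and exported series, so the meter data alone does not distinguish their BTM equipment.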
Under these circumstances, we synthesized a benchmark dataset based on real-world consumers' data collected from Denmark, incorporating three different RES interval datasets: automated energy storage systems, rooftop solar PV systems, and BEVs. By reformatting the data and applying a conditional tabular generative adversarial network (CTGAN)-based data synthesizer, we sidestep the privacy concerns associated with using real-world consumers' data. In this way, we created a synthetic dataset of 600,000 days of energy imported from and exported to the grid. This dataset includes hourly-resolution profiles labelled by BTM equipment, type of day, season, and daily temperature, providing a comprehensive representation of residential prosumers' consumption patterns.
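One possible way to picture a single day in such a dataset is sketched below. This is an assumed layout for illustration, not the published schema: the class `DailyProfile`, the helper `to_daily_profiles`, the field names, and the label values are all hypothetical. Only the ingredients come from the description above (hourly imported and exported energy, plus labels for BTM equipment, type of day, season, and daily temperature).

```python
from dataclasses import dataclass
from typing import Dict, List

HOURS_PER_DAY = 24


@dataclass
class DailyProfile:
    """One labelled day of hourly grid exchange (hypothetical layout)."""
    imported_kwh: List[float]   # 24 hourly values imported from the grid
    exported_kwh: List[float]   # 24 hourly values exported to the grid
    btm_equipment: str          # e.g. "PV", "BEV", "PV+ESS" (illustrative)
    day_type: str               # e.g. "weekday" or "weekend" (illustrative)
    season: str
    daily_temperature: float    # unit not assumed here


def to_daily_profiles(imported: List[float],
                      exported: List[float],
                      labels: List[Dict]) -> List[DailyProfile]:
    """Cut continuous hourly series into one labelled record per day."""
    if len(imported) != len(exported):
        raise ValueError("imported and exported series differ in length")
    if len(imported) != HOURS_PER_DAY * len(labels):
        raise ValueError("expected exactly 24 hourly values per label")
    profiles = []
    for day, label in enumerate(labels):
        start = day * HOURS_PER_DAY
        end = start + HOURS_PER_DAY
        profiles.append(DailyProfile(imported[start:end],
                                     exported[start:end],
                                     **label))
    return profiles
```

A fixed-width row per day (48 numeric columns plus a few categorical labels) is the kind of tabular shape that a tabular synthesizer such as CTGAN operates on, which is why the sketch reshapes the time series into daily records.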
To verify the authenticity of the dataset, we applied a set of analysis methods, including qualitative inspection, empirical statistics, machine learning (ML)-based evaluation metrics, and information theory. We demonstrated that the synthetic dataset shows statistical features similar to those of our benchmark dataset and of other research using real-world electricity users' data. An ML model trained on the synthetic dataset performs reasonably well when tested on the real dataset. This means the synthetic dataset can provide insights both for humans and for ML models. Across all the key performance indicators (KPIs) explored, one limitation we identified is that the synthetic dataset has a higher complexity than the real dataset. This indicates that the CTGAN generally overestimates the complexity of the real dataset when the stochastic nature of residential users and their varying consumption patterns are taken into account. On the other hand, the models successfully capture the features and relative complexity of each type of user.
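To make the notion of complexity concrete, the sketch below uses Shannon entropy of a binned hourly profile as a stand-in. This is an assumption: the text above does not state which information-theoretic measure was used, and the function name `shannon_entropy`, the bin width, and the example profiles are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=0.5):
    """Shannon entropy (in bits) of values after fixed-width binning.

    Assumed proxy for profile complexity: a flat profile falls into a
    single bin (zero entropy), while a varied profile spreads across
    several bins (higher entropy).
    """
    bins = Counter(math.floor(v / bin_width) for v in values)
    total = sum(bins.values())
    return -sum((count / total) * math.log2(count / total)
                for count in bins.values())


# Hypothetical 24-hour profiles (kWh per hour, illustrative values).
flat_profile = [1.0] * 24
varied_profile = [0.0, 0.5, 1.0, 1.5] * 6

print(shannon_entropy(flat_profile))
print(shannon_entropy(varied_profile))
```

With a measure of this kind, comparing the entropy distribution of synthetic days against real days for each prosumer type would show whether the synthesizer inflates complexity overall while preserving the ordering between types.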
We believe this synthetic dataset offers several advantages. It allows users to gain insight into possible future scenarios based on different levels of technology adoption. This is made possible by the publicly shared code, which allows researchers and other stakeholders to apply different plausible future scenarios for planning, operation, investment, developing new business models, etc. Additionally, users can examine how external factors such as temperature and season affect prosumers' electricity usage. Lastly, while our synthetic dataset focuses on Danish residential prosumers, our methodology can be applied to other datasets. As a result, it can be used with high-resolution data that could provide more information on future system requirements.