This story begins in Philadelphia in 2022. We were attending the IEEE Photovoltaic
Specialists Conference (PVSC), where we gave a talk about one of our papers. At
this event we got to know a few people, including a researcher at the University of
Cyprus. We stayed in touch after the event and eventually had the idea of co-authoring
a paper, which we planned to present at the 2023 PVSC. The idea was to take sample
measurement data from solar power plants and train a model to generate
realistic synthetic data from it. We got a good chunk of the work done and
submitted an abstract, which was accepted, but due to scheduling conflicts we were
ultimately unable to attend. It's been some time since then, and
we now feel like blogging about our approach and the preliminary results we
got.
| Photo of Philadelphia city center, close to the convention center where the 2022 PVSC was held. This is where the authors of the work first met. |
First, the process calculated a large set (several thousand) of features that are
known to be relevant in time-dependent data. In a second step, the
features were weighted by their correlation with the data, which resulted in
a new list of several hundred features that are correlated with the data. These
two steps represent the automated feature selection. Instead of synthesizing
irradiance directly, we computed the clear-sky index (the ratio of the actual
global horizontal irradiance (GHI) to the theoretical GHI at
ground level under cloudless conditions) using the clear-sky model by Ineichen
[2] as implemented in the Python package pvlib [3]. In the third step, a Markov chain process was
used to create a set of synthetic features for a given observation based on a
Markov probability matrix. Finally, a new series of clear-sky indices was
generated using a random forest regression model. These artificial clear-sky
indices, which have the same value and frequency range as the real
measurements, can be converted back into irradiance by multiplying them with the clear-sky GHI. The random forest
hyperparameters and the number of features used in the Markov probability
matrices were fine-tuned to minimize the difference between the features of the
synthetic and measured data.
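
To make the clear-sky index and the Markov chain idea a bit more concrete, here is a minimal Python sketch. It is not the code behind the abstract. As a simplified stand-in, it runs a first-order Markov chain directly on binned clear-sky indices, whereas the approach described above applies the Markov chain to extracted features and then uses a random forest regression to generate the indices. All function names, the bin settings (`n_states`, `kt_max`), the `min_clear` threshold and the sample values are assumptions made for illustration only. In practice the clear-sky GHI would come from a clear-sky model such as the one in pvlib.

```python
import random


def clearsky_index(ghi, ghi_clear, min_clear=10.0):
    """Measured GHI divided by clear-sky GHI (0 where clear-sky GHI is below min_clear)."""
    return [g / c if c >= min_clear else 0.0 for g, c in zip(ghi, ghi_clear)]


def to_states(kt, n_states=5, kt_max=1.25):
    """Assign each index to one of n_states equal-width bins on [0, kt_max]."""
    width = kt_max / n_states
    return [max(0, min(int(k / width), n_states - 1)) for k in kt]


def transition_matrix(states, n_states=5):
    """Estimate P[i][j], the probability of moving from state i to state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left in the data get a uniform row,
        # so that sampling still works for them.
        matrix.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return matrix


def sample_states(matrix, start, length, rng):
    """Draw a state sequence from the Markov chain, beginning at `start`."""
    states = [start]
    while len(states) < length:
        weights = matrix[states[-1]]
        states.append(rng.choices(range(len(matrix)), weights=weights)[0])
    return states


def to_ghi(states, ghi_clear, n_states=5, kt_max=1.25):
    """Convert states back to GHI, using each bin's midpoint as the clear-sky index."""
    width = kt_max / n_states
    return [(s + 0.5) * width * c for s, c in zip(states, ghi_clear)]


# Made-up example values in W/m2, purely for illustration.
ghi_clear = [100.0, 300.0, 500.0, 600.0, 500.0, 300.0, 100.0]
ghi_meas = [90.0, 150.0, 480.0, 300.0, 450.0, 120.0, 95.0]

kt = clearsky_index(ghi_meas, ghi_clear)
states = to_states(kt)
P = transition_matrix(states)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic_states = sample_states(P, states[0], len(ghi_clear), rng)
synthetic_ghi = to_ghi(synthetic_states, ghi_clear)
print(synthetic_ghi)
```

Working with the clear-sky index rather than with GHI itself removes most of the daily and seasonal solar geometry from the signal, so the Markov chain only has to capture cloud-driven variability. The geometry comes back in the last step, when the index is multiplied by the clear-sky GHI.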
| Overview of the pipeline of the proposed methodology to obtain complete synthetic GHI time series with correlated independent features from the original data. The figure is taken from the abstract. |
Our preliminary results looked promising, which is why we went ahead with
submitting the abstract. The following figures show examples of synthetically
generated data for clear-sky, mixed and overcast situations, which look quite
realistic. Recreating the training data with the generation method
gives reasonable agreement, with an RMSE of 7.5% relative to the mean
measurement.
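
For readers who want to reproduce this kind of metric, here is a minimal sketch of a relative RMSE. It assumes that "relative to the mean measurement" means the RMSE divided by the mean of the measured series; the abstract may use a slightly different normalization. The function name and the sample values are made up for illustration.

```python
import math


def rmse_relative_to_mean(measured, synthetic):
    """RMSE between two series, divided by the mean of the measured series."""
    if not measured or len(measured) != len(synthetic):
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(m - s) ** 2 for m, s in zip(measured, synthetic)]
    rmse = math.sqrt(sum(squared_errors) / len(measured))
    return rmse / (sum(measured) / len(measured))


# Made-up values in W/m2, purely for illustration.
measured = [100.0, 200.0, 300.0]
synthetic = [110.0, 190.0, 300.0]
print(f"{rmse_relative_to_mean(measured, synthetic):.1%}")  # prints 4.1%
```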
Comparison of synthetically predicted and original time series for a selected day: (a) clear-sky index; and (b) GHI.

Sample of the synthetically generated daily GHI profiles at 15-minute sampling for (a) a mostly clear-sky day, (b) a mixed-skies day, and (c) an overcast day.
Even though we ended up not presenting at the 2023 PVSC, we still think the idea is a good one, and maybe there will be another chance to pursue it in the future.
[1] Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr, A. W., "Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package)," Neurocomputing, vol. 307, pp. 72–77, 2018, doi: 10.1016/j.neucom.2018.03.067.
[2] Ineichen, P., "A broadband simplified version of the Solis clear sky model," Solar Energy, vol. 82, no. 8, pp. 758–762, 2008.
[3] Holmgren, W., Hansen, C., and Mikofski, M., "pvlib python: a python package for modeling solar energy systems," Journal of Open Source Software, vol. 3, no. 29, 884, 2018. https://doi.org/10.21105/joss.00884
[4] Polo, J., Zarzalejo, L. F., Marchante, R., and Navarro, A. A., "A simple approach to the synthetic generation of solar irradiance time series with high temporal resolution," Solar Energy, vol. 85, no. 5, pp. 1164–1170, 2011, doi: 10.1016/j.solener.2011.03.011.
[5] Rayati, M., De Falco, P., Proto, D., Bozorg, M., and Carpita, M., "Generation Data of Synthetic High Frequency Solar Irradiance for Data-Driven Decision-Making in Electrical Distribution Grids," Energies, vol. 14, no. 16, 4734, 2021.
[6] Carreño, I. L., et al., "SoDa: An Irradiance-Based Synthetic Solar Data Generation Tool," 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Tempe, AZ, USA, 2020, pp. 1–6, doi: 10.1109/SmartGridComm47815.2020.9302941.
