When we work with forecasts, we often store our data in netCDF files, since they are well suited for multi-dimensional data and have great software support through packages like xarray and dask. However, if your application writes, deletes, and changes data frequently, netCDF may not be the storage container you're looking for. Instead, we turned to a classic and one of the most widely used database engines in the world: SQLite.
Like all SQL databases, SQLite stores data in tables. For our example, let's assume we have forecast data that we store using a schema with the columns basetime, step, validtime, and value. Here is how our data could look:
| | basetime | step | validtime | value |
|---|---|---|---|---|
| 0 | 2018-08-20 10:00:00 | 00:15:00 | 2018-08-20 10:15:00 | 0.000000 |
| 1 | 2018-08-20 10:00:00 | 00:30:00 | 2018-08-20 10:30:00 | 3.826834 |
| 2 | 2018-08-20 10:00:00 | 00:45:00 | 2018-08-20 10:45:00 | 7.071068 |
| 3 | 2018-08-20 10:00:00 | 01:00:00 | 2018-08-20 11:00:00 | 9.238795 |
| 4 | 2018-08-20 10:00:00 | 01:15:00 | 2018-08-20 11:15:00 | 10.000000 |
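A table like the one above is easy to build in pandas before it ever touches the database. Here is a minimal sketch that reconstructs the sample data, assuming the values are synthetic sine samples (an assumption for illustration; real forecast values would come from your model):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample table: one forecast run
# (basetime), 15-minute steps, and synthetic sine values.
basetime = pd.Timestamp('2018-08-20 10:00:00')
steps = pd.timedelta_range('15min', periods=5, freq='15min')
df = pd.DataFrame({
    'basetime': basetime,
    'step': steps,
    'validtime': basetime + steps,   # the time each forecast is valid for
    'value': 10 * np.sin(np.pi / 8 * np.arange(5)),
})
print(df)
```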
We use Python's sqlite3 package to interact with our database; we write the SQL queries as strings and execute them through sqlite3's DB-API. The queries can then be wrapped in Python functions for easy use. Here is how we would connect to our database and wrap basic insert and update queries in Python functions:
```python
from sqlite3 import dbapi2 as sqlite
import numpy as np

con = sqlite.connect('forecast_data.db')
con.execute("create table if not exists forecast "
            "(basetime, step, validtime, value)")
con.commit()


def insert_into_db(basetime, step, validtime, value):
    """Writes a value into the SQLite db. NaNs are written as NULLs."""
    value = value if not np.isnan(value) else 'NULL'
    arg = ("insert into forecast "
           "(basetime, step, validtime, value) "
           "values ('%s', '%s', '%s', %s)"
           % (basetime, step, validtime, value))
    con.execute(arg)
    con.commit()


def update_value(basetime, step, validtime, value):
    """Executes an update query."""
    value = value if not np.isnan(value) else 'NULL'
    # basetime and step identify the row, so only validtime and value change.
    arg = ("update forecast set validtime='%s', value=%s "
           "where basetime='%s' and step='%s'"
           % (validtime, value, basetime, step))
    con.execute(arg)
    con.commit()
```
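To see the NULL handling in action, here is a quick, self-contained check of the same pattern, using an in-memory database so the example leaves no file behind (a sketch mirroring the helpers above):

```python
import numpy as np
from sqlite3 import dbapi2 as sqlite

# In-memory database: same schema as above, but nothing written to disk.
con = sqlite.connect(':memory:')
con.execute("create table forecast (basetime, step, validtime, value)")


def insert_into_db(basetime, step, validtime, value):
    """Writes a value into the SQLite db. NaNs are written as NULLs."""
    value = value if not np.isnan(value) else 'NULL'
    con.execute("insert into forecast (basetime, step, validtime, value) "
                "values ('%s', '%s', '%s', %s)"
                % (basetime, step, validtime, value))
    con.commit()


insert_into_db('2018-08-20 10:00:00', '00:15:00', '2018-08-20 10:15:00', 0.0)
insert_into_db('2018-08-20 10:00:00', '00:30:00', '2018-08-20 10:30:00', np.nan)

rows = con.execute("select * from forecast").fetchall()
print(rows)  # the NaN arrives back as NULL, i.e. None in Python
```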
When retrieving forecasts from the database, we use "select ... from ... where" queries that allow us to select different forecasts or group forecast data by step, as described before. We also usually transform time series data into xarray.DataArrays or pandas.Series for further analysis. Here is how those queries can be wrapped:
```python
import pandas as pd


def select_from_where(arg):
    """Executes a select query and returns a pd.Series with validtime as index."""
    retval = con.execute(arg).fetchall()
    val = [row[3] for row in retval]                # value column
    time = [pd.Timestamp(row[2]) for row in retval]  # validtime column
    return pd.Series(val, index=time)


def retrieve_a_timeseries_by_basetime(basetime):
    """Retrieves all forecasts of a single run.

    Parameters
    ----------
    basetime : string or pd.Timestamp
    """
    arg = "select * from forecast where basetime='%s'" % basetime
    return select_from_where(arg)


def retrieve_a_timeseries_by_step(step):
    """Retrieves forecasts with the same lead time across runs.

    Parameters
    ----------
    step : string or pd.Timedelta
    """
    arg = "select * from forecast where step='%s'" % step
    return select_from_where(arg)
```
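The difference between the two retrieval modes is easiest to see with a small self-contained example, again using an in-memory database with two forecast runs (the table contents here are made up for illustration):

```python
import pandas as pd
from sqlite3 import dbapi2 as sqlite

con = sqlite.connect(':memory:')
con.execute("create table forecast (basetime, step, validtime, value)")
con.executemany("insert into forecast values (?, ?, ?, ?)", [
    # two steps of a 10:00 run, plus the first step of an 11:00 run
    ('2018-08-20 10:00:00', '00:15:00', '2018-08-20 10:15:00', 0.0),
    ('2018-08-20 10:00:00', '00:30:00', '2018-08-20 10:30:00', 3.826834),
    ('2018-08-20 11:00:00', '00:15:00', '2018-08-20 11:15:00', 7.071068),
])
con.commit()


def select_from_where(arg):
    """Executes a select query and returns a pd.Series with validtime as index."""
    retval = con.execute(arg).fetchall()
    val = [row[3] for row in retval]
    time = [pd.Timestamp(row[2]) for row in retval]
    return pd.Series(val, index=time)


# One full forecast run ...
run = select_from_where(
    "select * from forecast where basetime='2018-08-20 10:00:00'")
# ... versus all forecasts with the same lead time, across runs.
lead = select_from_where("select * from forecast where step='00:15:00'")
print(run)
print(lead)
```

Grouping by basetime gives you a single run's trajectory, while grouping by step lets you compare, say, all 15-minute-ahead forecasts against observations.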
In conclusion, SQLite can be great for storing forecast data when you need an easy and dynamic storage container as a back-end for an application such as a monitoring site. Since everything gets stored in a single file, it's portable and easy to set up. The developers have a great when-to-use page, where they lay out general use cases for SQLite.
Also I found this post about storing financial forecasts using RSQLite:
https://www.r-bloggers.com/storing-forecasts-in-a-database/
It's a bit dated but it's somewhat similar to what we do with our weather forecasts.