Keeping data up to date

The Opendatasoft platform makes it possible, in the very same data catalog, to handle completely static datasets (which need to be published only once) and live datasets (which need to be regularly updated). Two mechanisms are available to refresh datasets.

The first one, called scheduling, automatically republishes a dataset at fixed intervals. This mode is most useful for datasets with a remote resource that is regularly updated.

The second one consists of pushing data to the Opendatasoft platform through a dedicated API endpoint. This mode is most useful when the data can be sent directly by the system that produces the data points, such as a computer program sending event metrics or a set of sensors sending their readings.

Using scheduling to keep a dataset up to date


The availability of this feature depends on the license of the Opendatasoft domain.

This solution is the easiest to implement. It does not require any development, only a remote source and some settings in the dataset configuration.

Specifying a resource


To be able to schedule a dataset, its underlying resource must be a remote one, specified as a URL (HTTP or FTP work well), and not an uploaded file. To add such a resource, simply paste a URL in the URL input.

Specifying a scheduling interval


Once a dataset is saved with a remote resource, the scheduling tab is activated. The minimum interval is one minute, but minute-level scheduling is not activated by default. Please contact Opendatasoft's support if you need it on your domain. You can add as many schedules as you want. For instance, if it fits your needs, you could schedule a dataset to be reprocessed every Monday morning and every Wednesday afternoon.


Schedules are defined in the time zone of Paris, France. In standard time, schedules run on GMT+1 (Central European Time). In the summer months, schedules run on GMT+2 (Central European Summer Time).

Pushing real time data


The availability of this feature depends on the license of the Opendatasoft domain.

For some types of data, it can be useful to push data to the platform instead of following the more traditional model in which the platform pulls the data from a resource. To address this need, the Opendatasoft platform offers a realtime push API. It is not to be confused with the ability to schedule dataset processing. With scheduling, the platform periodically pulls the resource and processes the data inside it, whereas with the push API, the dataset is fed by an application, and records are processed one by one as soon as they are received.

Configuring the dataset schema


To create a realtime dataset, start by navigating to the dataset creation interface. Here, select "add a realtime source".

[Image: realtime resource pane]

You will be prompted to enter some bootstrap data and to optionally fill in additional options. The bootstrap data should have all the fields that will be sent through the API. Please note that the bootstrap data is not used in the dataset: its sole purpose is to allow setting up the dataset.

Using the push URL


Once your dataset is saved with the correct realtime resource settings, a URL path containing a push API key will appear. This path, appended to your domain base URL, is where the platform will expect data to be sent after publication. As is the case with the bootstrap data, the data is expected to be sent in JSON format, either as a single JSON object for a single record or as an array of JSON objects to push multiple records at once.

[Image: table view with a single record with value "Hello World!" in the "message" field]

A minimal example of the API usage for a dataset with a single field named "message", using curl, would be:

curl -X POST '<DOMAIN_URL>/api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/push/?pushkey=<PUSH_API_KEY>' -d '{"message":"Hello World!"}'

A minimal example with the same dataset, using the array form to send multiple records at once, would be:

curl -X POST '<DOMAIN_URL>/api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/push/?pushkey=<PUSH_API_KEY>' -d '[{"message":"¡Hola Mundo!"},{"message":"Hallo Welt!"}]'

If the records have been received correctly, the server responds with the following message:

    {
        "status": "OK"
    }

If an error occurs while trying to push a record, the response specifies the error.
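For systems that push from a program rather than from the command line, the same request can be sketched in Python with the standard library only. The helper name `build_push_request`, the domain `https://mydomain.example`, and the dataset, resource, and key values are hypothetical placeholders, not part of the platform. The sketch mirrors the curl commands above and sets no extra headers, since none are documented here.

```python
import json
import urllib.request


def build_push_request(domain_url, push_path, records):
    """Build (but do not send) a POST request for the push URL.

    `records` is a dict for a single record or a list of dicts
    to push several records at once.
    """
    url = domain_url.rstrip("/") + push_path
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical values: use your own domain and the path shown
# in the realtime source settings.
DOMAIN_URL = "https://mydomain.example"
PUSH_PATH = "/api/push/1.0/my-dataset/my-resource/push/?pushkey=my-push-key"

request = build_push_request(DOMAIN_URL, PUSH_PATH, {"message": "Hello World!"})
print(request.get_method(), request.full_url)

# To actually send it (requires a real domain and push key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any data reaches the platform.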


Real time push requests are limited to a 5 MB payload. A larger payload will trigger an error and should be split into several smaller requests instead.
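One way to stay under the limit is to group records into batches by their encoded size before sending. The sketch below is an illustration only: `split_into_batches` is a hypothetical helper, and it assumes that 5 MB means 5,000,000 bytes (the more conservative reading) and that each batch is sent as a JSON array produced by `json.dumps` with default settings.

```python
import json

# Assumption: "5 MB" read as 5,000,000 bytes, the more conservative reading.
MAX_PAYLOAD_BYTES = 5 * 1000 * 1000


def split_into_batches(records, max_bytes=MAX_PAYLOAD_BYTES):
    """Group records into lists whose JSON encoding fits in max_bytes."""
    batches = []
    current = []
    current_size = 2  # the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        if encoded + 2 > max_bytes:
            raise ValueError("a single record exceeds the payload limit")
        # json.dumps separates list items with ", " (2 bytes)
        extra = encoded + (2 if current else 0)
        if current and current_size + extra > max_bytes:
            batches.append(current)
            current, current_size, extra = [], 2, encoded
        current.append(record)
        current_size += extra
    if current:
        batches.append(current)
    return batches


# Small limit used here only to make the splitting visible.
records = [{"message": "x" * 10} for _ in range(5)]
for batch in split_into_batches(records, max_bytes=60):
    print(len(batch), len(json.dumps(batch)))
```

Each resulting batch can then be pushed as its own request using the array form.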

Pushing a field of type file

To push a field of type file, such as an image, send a JSON object containing the base64-encoded content and the MIME type of the file, as follows.

    {
        "image_field": {
            "content": "BASE64 data",
            "content-type": "image/jpg"
        }
    }
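As an illustration, the object above could be built in Python as follows. The helper `file_field` and the sample bytes are hypothetical. The field name `image_field` and the content type are taken from the example above and must match your own dataset and file.

```python
import base64
import json


def file_field(content, content_type):
    """Return the JSON-ready object for a field of type file."""
    return {
        "content": base64.b64encode(content).decode("ascii"),
        "content-type": content_type,
    }


# Stand-in for the real file bytes, e.g. open("photo.jpg", "rb").read()
raw_bytes = b"\xff\xd8\xff\xe0"

# Same field name and content type as in the example above.
record = {"image_field": file_field(raw_bytes, "image/jpg")}
payload = json.dumps(record)
print(payload)
```

Note that base64 encoding grows the content by about one third, which counts toward the request payload size.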

Update data by defining a unique key

[Image: table view with 2 records containing respectively 978-0060589462 and 978-2862744506 as isbn and 3 and 5 as number_of_copies]

Sometimes it is useful to update existing records instead of just pushing new ones. An example of this would be a dataset that tracks the number of copies available for each book in a public library. Suppose that we have such a dataset with two fields: isbn, representing the ISBN of the book, and number_of_copies, tracking the current number of copies available in the library. It would not make a lot of sense to add one record for each new value of number_of_copies. Instead, it would be better to set the new number_of_copies value on the record corresponding to the book's isbn.


To set up such a system with the Opendatasoft platform, the fields that will be used as a unique key must be marked as such. In our example, the unique key would be isbn because the rest of the data is linked to individual books, and these books are identified by their ISBN. This can be done in the processing view, in the menu that pops up when the configuration button is pressed. It is possible to set multiple fields as unique keys. Then, after saving and publishing, if a new record whose key value is equal to that of an existing record is pushed, the new record will overwrite the old record. In our library example, suppose your dataset has isbn as the unique key and contains these two records:

    [{
        "isbn": "978-0060589462",
        "number_of_copies": 3
    }, {
        "isbn": "978-2862744506",
        "number_of_copies": 5
    }]

If somebody borrows a copy of Zen and the Art of Motorcycle Maintenance, and you push the following record, you will still have two records, the first one being updated with the new value.

    {
        "isbn": "978-0060589462",
        "number_of_copies": 2
    }

[Image: table view with 2 records containing respectively 978-0060589462 and 978-2862744506 as isbn and 2 and 5 as number_of_copies]
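The overwrite behavior described above can be modeled locally to check what a sequence of pushes should produce. This is only a sketch of the semantics, not platform code: `apply_push` is a hypothetical function, and it assumes that a pushed record replaces the whole existing record with the same key.

```python
def apply_push(stored, pushed, key_fields):
    """Return the records that result from pushing `pushed` onto `stored`.

    Records sharing the same values for `key_fields` are overwritten;
    all other pushed records are appended.
    """
    by_key = {tuple(r[f] for f in key_fields): r for r in stored}
    for record in pushed:
        by_key[tuple(record[f] for f in key_fields)] = record
    return list(by_key.values())


library = [
    {"isbn": "978-0060589462", "number_of_copies": 3},
    {"isbn": "978-2862744506", "number_of_copies": 5},
]

# Somebody borrows a copy of the first book.
library = apply_push(
    library,
    [{"isbn": "978-0060589462", "number_of_copies": 2}],
    key_fields=["isbn"],
)
for record in library:
    print(record)
```

Passing several names in `key_fields` models a unique key made of multiple fields.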

Delete data

There are two entry points for deleting pushed records: one that uses the record values and one that uses the record ID.

Using the record values

To delete a record when you know its field values, POST the record as if you were adding it for the first time, but replace the final /push/ with /delete/ in the push URL. If your push URL path is /api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/push/?pushkey=<PUSH_API_KEY>, then use /api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/delete/?pushkey=<PUSH_API_KEY> instead. A minimal example to delete the record we pushed earlier follows.

curl -X POST '<DOMAIN_URL>/api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/delete/?pushkey=<PUSH_API_KEY>' -d '{"message":"Hello World!"}'

Using the record ID

If you know the record ID of the record you want to delete, make a GET request to the URL you get by replacing the final /push/ with /<RECORD_ID>/delete/ in the push URL. A minimal example of this follows.

curl -X GET '<DOMAIN_URL>/api/push/1.0/<DATASET_ID>/<RESSOURCE_ID>/<RECORD_ID>/delete/?pushkey=<PUSH_API_KEY>'
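When deriving the two delete URLs in code, note that the push path contains /push/ twice (once in /api/push/ and once at the end), so a plain string replacement of the first match would produce the wrong URL. The sketch below only rewrites the trailing segment. The function names and the sample path, key, and record ID are hypothetical.

```python
from urllib.parse import quote


def _swap_trailing_push(push_path, new_suffix):
    """Replace the trailing /push/ of the path, keeping the query string."""
    path, sep, query = push_path.partition("?")
    if not path.endswith("/push/"):
        raise ValueError("expected a push path ending in /push/")
    return path[: -len("/push/")] + new_suffix + sep + query


def delete_by_values_path(push_path):
    """Path to POST a record to in order to delete it by its values."""
    return _swap_trailing_push(push_path, "/delete/")


def delete_by_id_path(push_path, record_id):
    """Path to GET in order to delete the record with this ID."""
    return _swap_trailing_push(push_path, "/" + quote(record_id, safe="") + "/delete/")


# Hypothetical push path, as it would appear in the source settings.
push_path = "/api/push/1.0/books/res1/push/?pushkey=abc"
print(delete_by_values_path(push_path))
print(delete_by_id_path(push_path, "rec42"))
```

Both functions return paths that, like the push path, are appended to the domain base URL.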

Get notified in case of inactivity


If you expect a system to push data to the platform often, you may want to be notified if no record has been received by the platform in a while. To get notified, enable the "Alerting" option in the source configuration and set a time threshold in minutes. If no record is received for longer than the threshold, you will receive an email.

Unpublishing and disabling the API


Be careful when unpublishing your dataset: existing records are not kept for the next time the dataset is published. If you only want to stop receiving new data, click the "disable push" button in the resource settings instead. This will prevent the usage of the push API but will have no effect on existing data. If data is pushed while push is disabled on the resource, no data will be added, and an error will be returned.



In the event of data loss, for example when the dataset has been unpublished or when a processor has been misconfigured, it may be possible to recover the lost records. To do so, the recovery option must have been activated before the records were pushed to the platform.


Once recovery is activated, every subsequent record received will be backed up and eligible for recovery. You can use the "recover data" button on the source configuration page to recover eligible records.