File Delivery Options
This document outlines the options available for file-based deliveries.
Defaults
- File Format: CSV
- File Compression: GZIP
- Folder Structure: see the Folder Structure section below
- Files are automatically partitioned
- Success File: provided by default
File Formats
CSV
We deliver .csv files with a header row and a comma (,) as the delimiter.
(ND)JSON
We deliver New-Line Delimited JSON with the .json extension.
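A minimal sketch of how a consumer might read such a file, assuming each non-empty line holds one complete JSON value. The function name parse_ndjson and the sample field names are illustrative and not part of the delivery.

```python
import json
from typing import Any, Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


# Illustrative sample; the field names are made up for this sketch.
sample = '{"id": 1, "value": "a"}\n{"id": 2, "value": "b"}\n'
records = list(parse_ndjson(sample.splitlines()))
```

Because the parser works on any iterable of lines, it can be fed an open file handle directly, so a large file does not need to be loaded into memory at once.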
File Compression
We offer to compress each file delivered with GZIP.
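As a rough illustration, a GZIP-compressed CSV file can be read with Python's standard library alone. The sketch below assumes UTF-8 encoding, which this document does not state, and builds a small file in memory to stand in for a delivered one. The function name read_csv_gz is hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, List


def read_csv_gz(payload: bytes) -> List[Dict[str, str]]:
    """Decompress GZIP bytes and parse them as comma-delimited CSV with a header row."""
    # Assumption: the text is UTF-8 encoded.
    with gzip.open(io.BytesIO(payload), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=","))


# Build a tiny compressed file in memory to stand in for a delivered file.
sample = gzip.compress(b"id,value\n1,a\n2,b\n")
rows = read_csv_gz(sample)
```

For a file on disk, passing its path to gzip.open in place of the in-memory buffer works the same way.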
Folder Structure
This describes the folder structure used to separate consecutive deliveries of data. Files are partitioned by date and time of the shipment.
[/PREFIX]/[dataset_version_string][is_backfill]/[DELIVERY_DATE_AND_HOUR]/[PERIOD_START_DATE]/
- PREFIX is optional and customisable.
- dataset_version_string is a unique name for the version of the dataset received.
- is_backfill: if the delivery is a backfill, _backfill will be appended to the dataset_version_string.
- DELIVERY_DATE_AND_HOUR is the date and hour of the delivery (UTC), e.g. 2023/01/01/13/ would be data whose delivery started on Jan 1st, 2023 at hour 13 UTC.
- PERIOD_START_DATE is the time partition of the data and indicates the start date of each observation period, e.g. 2022/12/01 would be data describing December 2022.
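The folder layout above can be split back into its parts. The sketch below is one possible approach and not an official parser. It assumes the delivery date uses four path segments (year, month, day, hour) and the period start date three, as in the examples. It also assumes a dataset version string never ends in _backfill on its own. The names DeliveryFolder and parse_delivery_folder and the sample path are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class DeliveryFolder:
    prefix: str
    dataset_version: str
    is_backfill: bool
    delivered_at: datetime
    period_start: date


def parse_delivery_folder(path: str) -> DeliveryFolder:
    """Split a delivery folder path into its components."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 8:
        raise ValueError(f"unexpected folder path: {path!r}")
    # Read from the right, because the optional prefix may itself contain slashes.
    period_start = date(*map(int, parts[-3:]))
    year, month, day, hour = map(int, parts[-7:-3])
    delivered_at = datetime(year, month, day, hour, tzinfo=timezone.utc)
    name = parts[-8]
    # Assumption: a trailing "_backfill" always marks a backfill delivery.
    is_backfill = name.endswith("_backfill")
    if is_backfill:
        name = name[: -len("_backfill")]
    return DeliveryFolder("/".join(parts[:-8]), name, is_backfill, delivered_at, period_start)


# Hypothetical path used only for illustration.
folder = parse_delivery_folder("exports/my_dataset_v1_backfill/2023/02/24/14/2022/11/01/")
```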
File Partitioning
Files are automatically partitioned into several chunks/files. These files are numbered.
There is no guaranteed sort order between the chunks/files in a single delivery.
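Since no ordering across chunks is guaranteed, a consumer that needs sorted data has to impose the order itself after combining the chunks. A minimal sketch, assuming each chunk has already been parsed into a list of rows. The function name combine_chunks and the id column are made up for illustration.

```python
from typing import Dict, Iterable, List


def combine_chunks(chunks: Iterable[List[Dict[str, str]]], sort_key: str) -> List[Dict[str, str]]:
    """Merge rows from all chunks, then impose an explicit order."""
    merged = [row for chunk in chunks for row in chunk]
    # Values are strings here, so this is a lexicographic sort;
    # convert to int or date first if a numeric or chronological order is needed.
    return sorted(merged, key=lambda row: row[sort_key])


# Chunks arrive in arbitrary order and are not sorted internally.
chunk_a = [{"id": "3"}, {"id": "1"}]
chunk_b = [{"id": "2"}]
ordered = combine_chunks([chunk_b, chunk_a], sort_key="id")
```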
Success File
If enabled, a success file can be provided.
The success file is simply a file that is written after all files within one delivery have been successfully written to the target.
For a backfill, when multiple time periods of data are delivered at once, one success file will be provided per period, in its respective folder.
In the backfill example below, there is one success-file for each period of data. Data is delivered for November and December 2022.
[<PREFIX>/]<DATASET_VERSION>_backfill/2023/02/24/14/2022/11/01/_SUCCESS
[<PREFIX>/]<DATASET_VERSION>_backfill/2023/02/24/14/2022/11/01/<DATASET_VERSION>000000000000.csv.gz
[<PREFIX>/]<DATASET_VERSION>_backfill/2023/02/24/14/2022/12/01/_SUCCESS
[<PREFIX>/]<DATASET_VERSION>_backfill/2023/02/24/14/2022/12/01/<DATASET_VERSION>000000000000.csv.gz
The success file is named _SUCCESS and contains no particular information.
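A consumer can use the marker to avoid reading a folder that is still being written. The sketch below assumes the consumer already has a flat listing of file paths from the target storage. How that listing is obtained depends on the storage system and is not shown. The function names and sample paths are hypothetical.

```python
from typing import Iterable, List, Set

SUCCESS_MARKER = "_SUCCESS"


def completed_folders(keys: Iterable[str]) -> Set[str]:
    """Return every folder that contains a _SUCCESS marker."""
    done = set()
    for key in keys:
        folder, _, name = key.rpartition("/")
        if name == SUCCESS_MARKER:
            done.add(folder)
    return done


def ready_data_files(keys: Iterable[str]) -> List[str]:
    """Return data files located in folders whose delivery has completed."""
    keys = list(keys)
    done = completed_folders(keys)
    ready = []
    for key in keys:
        folder, _, name = key.rpartition("/")
        if folder in done and name != SUCCESS_MARKER:
            ready.append(key)
    return sorted(ready)


# Hypothetical listing: November is complete, December has no marker yet.
listing = [
    "ds_backfill/2023/02/24/14/2022/11/01/_SUCCESS",
    "ds_backfill/2023/02/24/14/2022/11/01/ds000000000000.csv.gz",
    "ds_backfill/2023/02/24/14/2022/12/01/ds000000000000.csv.gz",
]
ready = ready_data_files(listing)
```

Only the November file is returned here, because the December folder does not yet contain its marker.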