S3 destination for batch exports


With batch exports, data can be exported to an S3 bucket.

Creating the batch export

  1. Subscribe to the data pipelines add-on in your billing settings if you haven't already.
  2. Click Data pipelines in the navigation and go to the Destinations tab in your PostHog instance.
  3. Search for S3.
  4. Click the + Create button.
  5. Fill in the necessary configuration details.
  6. Finalize the creation by clicking on "Create".
  7. Done! The batch export will schedule its first run at the start of the next period.

S3 configuration

Configuring a batch export targeting S3 requires the following S3-specific configuration values:

  • Bucket name: The name of the S3 bucket where the data is to be exported.
  • Region: The AWS region where the bucket is located.
  • Key prefix: A key prefix to use for each S3 object created. The prefix can include template variables.
  • Format: Select a file format to use in the export. See the S3 file formats section for details on which file formats are supported.
  • Max file size (MiB): If the size of the exported data exceeds this value, the data is split into multiple files. Note that this is approximate and the actual file size may be slightly larger. If this value is not set, or is set to 0, the data is exported as a single file.
  • Compression: Select a compression method (like gzip) to use for exported files, or select no compression.
  • Encryption: Select a server-side encryption method (AES256 or aws:kms) for AWS to encrypt data at rest.
  • AWS Access Key ID: An AWS access key ID with access to the S3 bucket.
  • AWS Secret Access Key: An AWS secret access key with access to the S3 bucket.
  • AWS KMS Key ID: The AWS KMS Key ID to use for server-side encryption. Only required when selecting aws:kms encryption.
  • Events to exclude: A list of events to omit from the exported data.
  • Endpoint URL: Required if exporting to an S3-compatible blob storage.
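
To illustrate the Max file size setting described above, here is a minimal Python sketch of one way size-based splitting could behave. It is not PostHog's implementation: the function name split_into_files and its parameters are hypothetical, and the rule of closing a file only after the limit has been reached is an assumption chosen to match the note that files may be slightly larger than the configured value.

```python
from typing import Iterable, Iterator, List, Optional

MIB = 1024 * 1024


def split_into_files(
    records: Iterable[bytes], max_file_size_mib: Optional[int] = None
) -> Iterator[List[bytes]]:
    """Group serialized records into files of roughly max_file_size_mib each.

    Hypothetical helper: a file is closed only after the record that takes it
    to the limit has been added, so a file can end up slightly larger than
    the limit. A limit of None or 0 disables splitting.
    """
    limit = (max_file_size_mib or 0) * MIB
    current: List[bytes] = []
    current_size = 0
    for record in records:
        current.append(record)
        current_size += len(record)
        # Only split when a limit is set and the current file has reached it.
        if limit and current_size >= limit:
            yield current
            current, current_size = [], 0
    # Emit whatever is left over as the final (or only) file.
    if current:
        yield current
```

With a 1 MiB limit and records of about 0.6 MiB each, every file in this sketch holds two records (about 1.2 MiB), which is the "slightly larger" case. With the limit unset or 0, all records land in one file.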

S3 key prefix template variables

The key prefix provided for data exporting can include template variables, which are formatted at runtime. All template variables are defined between curly brackets (for example {day}). This allows you to partition files in your S3 bucket, such as by date.

Template variables include:

  • Date and time variables:
    • year.
    • month.
    • day.
    • hour.
    • minute.
    • second.
  • Name of the table exported (for now, only "events"):
    • table.
  • Batch export data bounds:
    • data_interval_start.
    • data_interval_end.

For example, setting {year}-{month}-{day}_{table}/ as a key prefix will produce files prefixed with keys like 2023-07-28_events/.
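
The substitution can be sketched in Python with str.format, which uses the same curly-bracket syntax. This is an illustration only: the function render_key_prefix is hypothetical, and two details are assumptions rather than documented behavior, namely that the date and time variables are taken from the end of the data interval and that the interval bounds are rendered as ISO 8601 strings.

```python
from datetime import datetime


def render_key_prefix(
    template: str,
    data_interval_start: datetime,
    data_interval_end: datetime,
    table: str = "events",
) -> str:
    """Fill the template variables listed above into a key prefix."""
    # Assumption: date/time variables come from the interval end.
    when = data_interval_end
    variables = {
        "year": f"{when:%Y}",
        "month": f"{when:%m}",  # zero-padded, e.g. "07"
        "day": f"{when:%d}",
        "hour": f"{when:%H}",
        "minute": f"{when:%M}",
        "second": f"{when:%S}",
        "table": table,
        # Assumption: interval bounds are rendered as ISO 8601 strings.
        "data_interval_start": data_interval_start.isoformat(),
        "data_interval_end": data_interval_end.isoformat(),
    }
    # An unknown variable in the template raises KeyError.
    return template.format(**variables)


prefix = render_key_prefix(
    "{year}-{month}-{day}_{table}/",
    datetime(2023, 7, 27),
    datetime(2023, 7, 28),
)
print(prefix)  # 2023-07-28_events/
```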

S3 file formats

PostHog S3 batch exports support two file formats for exporting data. The format is selected from a drop-down menu when creating or editing an export.

We intend to add support for other common formats, and format-specific configuration options. You can follow the roadmap to track progress.

S3-compatible blob storage

PostHog S3 batch exports can also export data to S3-compatible blob storage such as MinIO. Set the Endpoint URL to your blob storage's host and port, for example: https://my-minio-storage:9000.
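
Before saving the export, it can help to confirm that the Endpoint URL has the expected shape. The following Python sketch uses only the standard library. The function check_endpoint_url is hypothetical and not part of PostHog, and the rule that the scheme must be http or https is an assumption.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_endpoint_url(url: str) -> Tuple[str, str, Optional[int]]:
    """Split an endpoint URL into (scheme, host, port), or raise ValueError."""
    parts = urlsplit(url)
    # Assumption: only http and https endpoints are accepted.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in: {url!r}")
    # parts.port is None when the URL does not specify a port.
    return parts.scheme, parts.hostname, parts.port


print(check_endpoint_url("https://my-minio-storage:9000"))
# ('https', 'my-minio-storage', 9000)
```

A value such as my-minio-storage:9000, with no scheme, is rejected by this check, which catches a common copy-and-paste mistake.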
