Amazon S3

Amazon AWS Long Term Storage

Synopsis

Creates a target that writes log messages to Amazon S3 buckets with support for various file formats, authentication methods, and multipart uploads. The target handles large file uploads efficiently with configurable rotation based on size or event count.

Schema

- name: <string>
  description: <string>
  type: awss3
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    key: <string>
    secret: <string>
    session: <string>
    region: <string>
    endpoint: <string>
    part_size: <numeric>
    bucket: <string>
    buckets:
      - bucket: <string>
        name: <string>
        format: <string>
        compression: <string>
        extension: <string>
        schema: <string>
    name: <string>
    format: <string>
    compression: <string>
    extension: <string>
    schema: <string>
    max_size: <numeric>
    batch_size: <numeric>
    timeout: <numeric>
    field_format: <string>
    interval: <string|numeric>
    cron: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>

Configuration

The following fields are used to define the target:

Field	Required	Default	Description
`name`	Y	-	Target name
`description`	N	-	Optional description
`type`	Y	-	Must be `awss3`
`pipelines`	N	-	Optional post-processor pipelines
`status`	N	`true`	Enable/disable the target

AWS Credentials

Field	Required	Default	Description
`key`	N*	-	AWS access key ID for authentication
`secret`	N*	-	AWS secret access key for authentication
`session`	N	-	Optional session token for temporary credentials
`region`	Y	-	AWS region (e.g., `us-east-1`, `eu-west-1`)
`endpoint`	N	-	Custom S3-compatible endpoint URL (for non-Amazon S3 services)

* = Conditionally required. AWS credentials (key and secret) are required unless using IAM role-based authentication on AWS infrastructure.
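
As an illustration of the `endpoint` field, the sketch below points the target at an S3-compatible service instead of Amazon S3. The endpoint URL, bucket name, and credential placeholders are hypothetical, and the `region` value is an assumption, since S3-compatible services differ in what they expect there.

```yaml
targets:
  - name: s3_compatible
    type: awss3
    properties:
      key: "<access key ID>"
      secret: "<secret access key>"
      region: "us-east-1"                         # listed as required in the table above
      endpoint: "https://s3.storage.example.com"  # hypothetical S3-compatible endpoint
      bucket: "datastream-logs"
```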

Connection

Field	Required	Default	Description
`part_size`	N	`5`	Multipart upload part size in megabytes (minimum 5MB)
`timeout`	N	`30`	Connection timeout in seconds
`field_format`	N	-	Data normalization format. See applicable Normalization section

Files

Field	Required	Default	Description
`bucket`	N*	-	Default S3 bucket name (acts as catch-all when `buckets` is also specified)
`buckets`	N*	-	Array of bucket configurations for file distribution
`buckets.bucket`	Y	-	S3 bucket name
`buckets.name`	Y	-	File name template
`buckets.format`	N	`"json"`	Output format: `json`, `multijson`, `avro`, `parquet`
`buckets.compression`	N	-	Compression algorithm. See Compression below
`buckets.extension`	N	Matches `format`	File extension override
`buckets.schema`	N*	-	Schema definition file path (required for Avro and Parquet formats)
`name`	N	`"vmetric.{{.Timestamp}}.{{.Extension}}"`	Default file name template (used with `bucket` for catch-all)
`format`	N	`"json"`	Default output format (used with `bucket` for catch-all). See Format below
`compression`	N	-	Default compression (used with `bucket` for catch-all)
`extension`	N	Matches `format`	Default file extension (used with `bucket` for catch-all)
`schema`	N	-	Default schema path (used with `bucket` for catch-all)
`max_size`	N	`0`	Maximum file size in bytes before rotation
`batch_size`	N	`100000`	Maximum number of messages per file

* = Either bucket or buckets must be specified. When using buckets, schema is conditionally required for Avro and Parquet formats.

note

When `max_size` is reached, the current file is uploaded to S3 and a new file is started. Setting the field to `0` (the default) removes the size limit, leaving `batch_size` as the remaining per-file limit.
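
A minimal sketch of a target that uses both rotation limits is shown here. The target name and the specific values are illustrative, and it is an assumption that a file is rotated as soon as either limit is reached.

```yaml
targets:
  - name: rotating_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "datastream-logs"
      max_size: 104857600   # 100 * 1024 * 1024 bytes = 100 MiB per file
      batch_size: 50000     # at most 50,000 messages per file
```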

Scheduler

Field	Required	Default	Description
`interval`	N	realtime	Execution frequency. See Interval for details
`cron`	N	-	Cron expression for scheduled execution. See Cron for details
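
For illustration, a scheduled target might look like the sketch below. It assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week); the Cron reference remains the authority on the syntax the target actually accepts. The target and bucket names are hypothetical.

```yaml
targets:
  - name: hourly_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "datastream-logs"
      cron: "0 * * * *"   # assumed five-field syntax: at minute 0 of every hour
```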

Debug Options

Field	Required	Default	Description
`debug.status`	N	`false`	Enable debug logging
`debug.dont_send_logs`	N	`false`	Process logs but don't send to target (testing)

Details

The Amazon S3 target can write to one or more buckets, each with its own file format, compression, and schema. Supported output formats are `json`, `multijson`, `avro`, and `parquet`.

Authentication Methods

The target supports static credentials (an access key ID and a secret access key), optionally combined with a session token for temporary credentials. When deployed on AWS infrastructure, it can instead use IAM role-based authentication without explicit credentials.

Whichever method is used, the target calls `sts:GetCallerIdentity` during initialization to validate the credentials before proceeding.
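
The two less common authentication setups can be sketched as follows. The target names and placeholder values are hypothetical; the first target assumes the host already has an IAM role attached, which is why `key` and `secret` are left out.

```yaml
targets:
  # Role-based authentication: key and secret are omitted, so the target
  # relies on the IAM role available on the AWS host it runs on.
  - name: role_based_s3
    type: awss3
    properties:
      region: "us-east-1"
      bucket: "datastream-logs"

  # Temporary credentials: key, secret, and session are supplied together.
  - name: temporary_credentials_s3
    type: awss3
    properties:
      key: "<temporary access key ID>"
      secret: "<temporary secret access key>"
      session: "<session token>"
      region: "us-east-1"
      bucket: "datastream-logs"
```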

IAM Permissions

The IAM identity the target authenticates as (a role, or the user that owns the access key) needs the following permissions:

IAM Action	Purpose
`sts:GetCallerIdentity`	Validate credentials at initialization
`s3:PutObject`	Upload log files to the bucket (also covers initiating a multipart upload, uploading its parts, and completing it)
`s3:AbortMultipartUpload`	Abort failed multipart uploads

Minimum IAM policy (replace `BUCKET_NAME` with the name of the destination bucket):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "STSIdentity",
      "Effect": "Allow",
      "Action": "sts:GetCallerIdentity",
      "Resource": "*"
    },
    {
      "Sid": "S3Upload",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/*"
    }
  ]
}

note

The S3 upload manager automatically switches between a single-part `PutObject` request and a multipart upload based on the `part_size` configuration.

Templates

The following template variables can be used in file names:

Variable	Description	Example
`{{.Year}}`	Current year	`2024`
`{{.Month}}`	Current month	`01`
`{{.Day}}`	Current day	`15`
`{{.Timestamp}}`	Current timestamp in nanoseconds	`1703688533123456789`
`{{.Format}}`	File format	`json`
`{{.Extension}}`	File extension	`json`
`{{.Compression}}`	Compression type	`zstd`
`{{.TargetName}}`	Target name	`my_logs`
`{{.TargetType}}`	Target type	`awss3`
`{{.Table}}`	Bucket name	`logs`
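
As a sketch of how these variables combine, the file name template below builds a date-partitioned object key. The target and bucket names are taken from the Example column above; the rendered key in the comment assumes the variables hold exactly those example values.

```yaml
targets:
  - name: my_logs
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "logs"
      format: "json"
      # With the example values from the table, this template would render as:
      #   my_logs/2024/01/15/1703688533123456789.json
      name: "{{.TargetName}}/{{.Year}}/{{.Month}}/{{.Day}}/{{.Timestamp}}.{{.Extension}}"
```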

Multipart Upload

Large files are uploaded automatically with the S3 multipart upload protocol, using the part size set by `part_size` (in megabytes, minimum 5). The default of 5 MB balances upload efficiency against memory usage.
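
When choosing a part size, it helps to remember that S3 allows at most 10,000 parts per multipart upload, so the part size caps the largest object that can be uploaded. The sketch below pairs a larger part size with a rotation limit; the values are illustrative, and it is an assumption that the target does not adjust the part size on its own for very large files.

```yaml
targets:
  - name: large_files_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "datastream-logs"
      part_size: 16          # 16 MB parts; 10,000 parts allow objects up to about 160 GB
      max_size: 1073741824   # rotate at 1 GiB, which needs only a few dozen 16 MB parts
```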

Multiple Buckets

A single target can write to multiple S3 buckets, each with its own configuration, which enables data distribution strategies (e.g., raw data to one bucket, processed data to another).

Schema Requirements

Avro and Parquet formats require schema definition files. Each schema file must be accessible at the path given in the `schema` parameter when the target initializes.
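
A minimal sketch of an Avro target follows. The schema path and the target and bucket names are hypothetical; the only requirement taken from this page is that the file exists at that path when the target starts.

```yaml
targets:
  - name: avro_archive
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "avro-archive"
      name: "events-{{.Timestamp}}.{{.Extension}}"
      format: "avro"
      schema: "/path/to/schemas/events.avsc"   # hypothetical schema file path
```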

Examples

Basic Configuration

The minimum configuration for a JSON S3 target:

targets:
  - name: basic_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "datastream-logs"

Pipeline-Based Routing

Dynamic bucket routing, where a pipeline inspects each event and sets `_vmetric.bucket` to select the destination bucket; events that match no rule fall through to the catch-all `bucket`:

targets:
  - name: smart_routing_s3
    type: awss3
    pipelines:
      - dynamic_routing
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      buckets:
        - bucket: "security-events"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-events"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "system-events"
          name: "system-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "other-events"
      name: "other-{{.Timestamp}}.json"
      format: "json"

pipelines:
  - name: dynamic_routing
    processors:
      - set:
          field: "_vmetric.bucket"
          value: "security-events"
          if: "ctx.event_type == 'security'"
      - set:
          field: "_vmetric.bucket"
          value: "application-events"
          if: "ctx.event_type == 'application'"
      - set:
          field: "_vmetric.bucket"
          value: "system-events"
          if: "ctx.event_type == 'system'"

Multiple Buckets with Catch-All

Configuration for routing different log types to specific buckets with a catch-all for unmatched logs:

targets:
  - name: multi_bucket_routing
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      buckets:
        - bucket: "security-logs"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-logs"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "general-logs"
      name: "general-{{.Timestamp}}.json"
      format: "json"

Multiple Buckets with Different Formats

Configuration for distributing data across multiple S3 buckets with different formats:

targets:
  - name: multi_bucket_export
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "eu-west-1"
      buckets:
        - bucket: "raw-data-archive"
          name: "raw-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "multijson"
          compression: "gzip"
        - bucket: "analytics-data"
          name: "analytics-{{.Year}}/{{.Month}}/{{.Day}}/data_{{.Timestamp}}.parquet"
          format: "parquet"
          schema: "<schema definition>"
          compression: "snappy"

Parquet Format

Configuration for daily partitioned Parquet files:

targets:
  - name: parquet_analytics
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-west-2"
      bucket: "analytics-lake"
      name: "events/year={{.Year}}/month={{.Month}}/day={{.Day}}/part-{{.Timestamp}}.parquet"
      format: "parquet"
      schema: "<schema definition>"
      compression: "snappy"
      max_size: 536870912

High Reliability

Configuration with a longer connection timeout (60 seconds), a larger multipart part size (10 MB), and a `checkpoint` pipeline:

targets:
  - name: reliable_s3
    type: awss3
    pipelines:
      - checkpoint
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "critical-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      timeout: 60
      part_size: 10

With Field Normalization

Configuration that applies the `cim` field normalization format before writing:

targets:
  - name: normalized_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "normalized-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      field_format: "cim"

Debug Configuration

Configuration with debug logging enabled and `dont_send_logs` set, so events are processed but nothing is uploaded:

targets:
  - name: debug_s3
    type: awss3
    properties:
      key: "AKIAIOSFODNN7EXAMPLE"
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-1"
      bucket: "test-logs"
      name: "test-{{.Timestamp}}.json"
      format: "json"
      debug:
        status: true
        dont_send_logs: true
