Apache Iceberg

Use the Apache Iceberg Connector to send data to an Apache Iceberg table. This connector uses Iceberg format version 2.

Overview

Connector name: iceberg

Type: sink

Delivery guarantee: exactly once

Compatibility

This connector currently supports the following catalog and data warehouse options:

- Catalog: AWS Glue

- Data Warehouse: AWS S3

Prerequisites

To send data to an Apache Iceberg table, the following must be true:

  • You must have an AWS IAM role with a permissions policy that grants read and write access to AWS S3 and AWS Glue, such as the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase",
                    "glue:CreateTable",
                    "glue:GetTable"
                ],
                "Resource": [
                    "arn:aws:glue:<region>:<AWS account id>:catalog",
                    "arn:aws:glue:<region>:<AWS account id>:database/<catalog-database>",
                    "arn:aws:glue:<region>:<AWS account id>:table/<catalog-database>/<catalog-table>"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:ListBucket"
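                ],
                "Resource": [
                    "arn:aws:s3:::<s3 warehouse path>/*",
                    "arn:aws:s3:::<s3 bucket>"
                ]
            }
        ]
    }

    • Replace <region> with the AWS region (e.g., us-east-1) of the Glue catalog.

    • Replace <AWS account id> with your AWS account ID.

    • Replace <s3 warehouse path>, <catalog-database>, and <catalog-table> with appropriate values for your Iceberg tables. Write <s3 warehouse path> without the s3:// prefix (for example, bucket/folder).

    • Replace <s3 bucket> with the name of the bucket that contains the warehouse path. The s3:ListBucket action applies to the bucket itself, not to the objects in it, so the bucket ARN must be listed separately.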

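  • If you’re using the managed cloud version of Decodable, the role must also have the following trust policy so that Decodable’s AWS account can assume it. Replace <Decodable account name> with the name of your Decodable account, which is used as the external ID: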

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::671293015970:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": "<Decodable account name>"
                    }
                }
            }
        ]
    }
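If you manage IAM policies in code, the placeholder substitution described above can be scripted. The following minimal Python sketch (standard library only) fills in both templates. The function names build_permissions_policy and build_trust_policy are hypothetical and are not part of Decodable or AWS tooling. The sketch assumes the warehouse path is given as an s3://bucket/folder URI, as in the connection's Warehouse path field, and it lists the bucket ARN next to the object ARN because s3:ListBucket is evaluated against the bucket rather than against individual objects.

```python
import json
from urllib.parse import urlparse

# Decodable's AWS account principal, copied from the trust policy above.
DECODABLE_PRINCIPAL = "arn:aws:iam::671293015970:root"


def build_permissions_policy(region: str, account_id: str, warehouse: str,
                             database: str, table: str) -> dict:
    """Fill in the permissions-policy template (hypothetical helper)."""
    parsed = urlparse(warehouse)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/folder URI, got {warehouse!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")  # folder inside the bucket; may be empty
    warehouse_path = f"{bucket}/{prefix}" if prefix else bucket
    glue = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:CreateDatabase", "glue:CreateTable", "glue:GetTable"],
                "Resource": [
                    f"{glue}:catalog",
                    f"{glue}:database/{database}",
                    f"{glue}:table/{database}/{table}",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject",
                           "s3:DeleteObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{warehouse_path}/*",  # objects under the warehouse path
                    f"arn:aws:s3:::{bucket}",  # s3:ListBucket is checked against the bucket
                ],
            },
        ],
    }


def build_trust_policy(decodable_account_name: str) -> dict:
    """Fill in the trust-policy template (hypothetical helper)."""
    if not decodable_account_name:
        raise ValueError("the Decodable account name is used as the external ID")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": DECODABLE_PRINCIPAL},
                "Action": "sts:AssumeRole",
                # AssumeRole is allowed only when the caller supplies this external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": decodable_account_name}},
            }
        ],
    }


permissions = build_permissions_policy("us-east-1", "111222333444",
                                       "s3://bucket/folder", "my_db", "my_table")
trust = build_trust_policy("my-decodable-account")
print(json.dumps(permissions, indent=4))
print(json.dumps(trust, indent=4))
```

Deriving the S3 ARNs from the same URI you enter as the warehouse path helps keep the policy and the connection settings in sync.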

Steps

Follow these steps to send data to an Iceberg table using the Iceberg Connector.

  1. From the Connections page, select the Iceberg Connector and complete the following fields.

    The value in parentheses after each field name is the corresponding property name in the Decodable CLI.

    • Connection Type (no CLI property): Select Sink to use this connector to send data to the Iceberg table.

    • Warehouse path (warehouse): The path to the S3 bucket or folder that you want to send data to. For example: s3://bucket/folder.

    • Database name (catalog-database): The name of the database in your Iceberg catalog. This is the database that you granted permissions for in the prerequisites. If a database with this name doesn’t exist, Decodable creates it.

    • Table name (catalog-table): The name of the table in your Iceberg catalog. This is the table that you granted permissions for in the prerequisites. If a table with this name doesn’t exist, Decodable creates it.

    • Catalog type (catalog-type): The catalog that manages the metadata for your Iceberg tables. Currently, only AWS Glue is supported. If you are using the Decodable CLI to create this connection, enter glue for this value.

    • IAM Role ARN (role-arn): The AWS ARN of the IAM role. For example: arn:aws:iam::111222333444:role/decodable-s3-access.

    • AWS Region (region): The AWS region of the AWS Glue catalog.

    • Format (format): The file format of the data written to AWS S3. Supported values are parquet, avro, and orc. Defaults to parquet.

  2. Select which stream contains the records that you’d like to send to an Iceberg table. Then, select Next.

  3. Give the newly created connection a Name and Description and select Save.
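Before you fill in the fields in step 1, it can help to assemble and sanity-check the values in one place. The following minimal Python sketch collects them under the Decodable CLI property names listed for each field and applies a few local checks drawn from this page: an s3:// warehouse path, glue as the catalog type, a role ARN shaped like the example, and one of the supported formats, defaulting to parquet. The helper name build_sink_properties and the key=value output are illustrative assumptions; this is not Decodable's own validation, and the output is not CLI syntax.

```python
import re
from urllib.parse import urlparse

SUPPORTED_FORMATS = ("parquet", "avro", "orc")
# Loose shape check for IAM role ARNs in the standard "aws" partition,
# e.g. arn:aws:iam::111222333444:role/decodable-s3-access.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/.+$")


def build_sink_properties(warehouse: str, database: str, table: str,
                          role_arn: str, region: str, fmt: str = "parquet") -> dict:
    """Collect and sanity-check Iceberg sink settings (hypothetical helper)."""
    if urlparse(warehouse).scheme != "s3":
        raise ValueError("warehouse must be an s3:// path, e.g. s3://bucket/folder")
    if not ROLE_ARN_RE.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"format must be one of {SUPPORTED_FORMATS}, got {fmt!r}")
    return {
        "warehouse": warehouse,
        "catalog-database": database,
        "catalog-table": table,
        "catalog-type": "glue",  # AWS Glue is currently the only supported catalog
        "role-arn": role_arn,
        "region": region,
        "format": fmt,
    }


props = build_sink_properties(
    "s3://bucket/folder", "my_db", "my_table",
    "arn:aws:iam::111222333444:role/decodable-s3-access", "us-east-1")
for key, value in props.items():
    print(f"{key}={value}")
```

Checking these values locally catches typos such as a missing s3:// prefix or an unsupported format before the connection is created.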

After you start this connection, it sends records from the selected stream to your Iceberg table. If the source is a change stream, the connector writes to the table in upsert mode; otherwise, it uses append mode.
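The difference between the two write modes can be illustrated with a small in-memory model. The following Python sketch is conceptual only: the stream-type strings "change" and "append", the (operation, key, value) record shape, and the function names are assumptions for illustration, and the sketch does not model how the connector stores row-level changes in Iceberg.

```python
from typing import Dict, List, Tuple

# Illustrative change record: (operation, primary key, row value).
Change = Tuple[str, str, str]


def write_mode(stream_type: str) -> str:
    """Mirror the rule above: change streams use upsert, all others use append."""
    return "upsert" if stream_type == "change" else "append"


def append_rows(table: List[str], rows: List[str]) -> List[str]:
    """Append mode: every incoming row is added; nothing is overwritten."""
    table.extend(rows)
    return table


def upsert_changes(table: Dict[str, str], changes: List[Change]) -> Dict[str, str]:
    """Upsert mode: the latest row per key wins, and deletes remove the key."""
    for op, key, value in changes:
        if op == "delete":
            table.pop(key, None)
        else:  # "insert" or "update"
            table[key] = value
    return table


print(write_mode("append"), append_rows([], ["click-1", "click-2", "click-1"]))
print(write_mode("change"), upsert_changes({}, [
    ("insert", "1", "alice@old.example"),
    ("update", "1", "alice@new.example"),
    ("insert", "2", "bob@example.com"),
    ("delete", "2", "bob@example.com"),
]))
```

In append mode, repeated rows are kept as separate rows; in upsert mode, the table converges to one current row per key.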