Event-Driven Data Ingestion: Architecting S3 to Snowflake with Snowpipe
Stop scheduling batch jobs. Learn how to build a real-time, event-driven ingestion layer using AWS S3, SQS, and Snowflake Snowpipe.
The era of the “nightly batch” is over. Modern businesses demand data as it happens. Yet many data teams are still stuck writing cron jobs to poll API endpoints or check S3 buckets.
There is a better way. By leveraging AWS S3 events, SQS, and Snowflake Snowpipe, we can build a pipeline that ingests data moments after it lands.
The Architecture
- Source: A file (JSON/Parquet/CSV) lands in an AWS S3 bucket.
- Trigger: S3 publishes an ObjectCreated event.
- Queue: An Amazon SQS queue captures this event.
- Ingest: Snowpipe polls the queue, sees the new file, and loads it into a Snowflake raw table.
This architecture is serverless, scalable (it handles one file or one million), and cheap (you pay only for the compute Snowpipe actually uses).
Setting it Up
1. The Storage Integration
First, Snowflake needs permission to read your S3 bucket. We create a STORAGE INTEGRATION.
CREATE STORAGE INTEGRATION s3_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-raw-data-bucket/');
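The integration alone is not enough: the IAM role also has to trust the IAM user Snowflake generates for it. A quick way to pull the values that trust policy needs, using the integration created above:
-- The output includes STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID;
-- add both to the trust relationship of my_snowflake_role in AWS IAM.
DESC INTEGRATION s3_int;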
2. The Pipe
Instead of a COPY INTO command run by a scheduler, we wrap it in a PIPE.
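The pipe below reads from a named external stage. If you don't already have one, here is a minimal sketch that ties the stage to the integration from step 1 (the stage name and bucket path simply reuse the placeholders from this post):
-- A named external stage over the raw bucket, authenticated via the storage integration.
CREATE STAGE my_db.raw.my_s3_stage
  URL = 's3://my-raw-data-bucket/'
  STORAGE_INTEGRATION = s3_int;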
CREATE PIPE my_db.raw.daily_sales_pipe
AUTO_INGEST = TRUE
AS
COPY INTO my_db.raw.sales_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'JSON');
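With AUTO_INGEST = TRUE, Snowflake manages the SQS queue for you; the one remaining step is pointing the bucket's ObjectCreated event notifications at that queue. Its ARN is exposed on the pipe itself:
-- The notification_channel column is the ARN of the Snowflake-managed SQS queue.
-- Use it as the destination for the bucket's s3:ObjectCreated:* event notification in AWS.
SHOW PIPES LIKE 'daily_sales_pipe' IN SCHEMA my_db.raw;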
Why this changes everything
Once this pipe is active, engineering effort shifts. You no longer debug “why the 2 AM job failed.” You assume data is always arriving.
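When you do want to confirm that files are flowing, you ask Snowflake rather than a scheduler. For example, using the pipe and table created above:
-- Current state of the pipe, including any pending files.
SELECT SYSTEM$PIPE_STATUS('my_db.raw.daily_sales_pipe');

-- Files loaded into the target table over the last hour.
SELECT *
FROM TABLE(information_schema.copy_history(
  TABLE_NAME => 'my_db.raw.sales_table',
  START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())
));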
Your focus moves downstream: transforming that raw data into insights.
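What that looks like depends on your model, but with JSON landing in a VARIANT column it can start as something as small as a view over the raw table. The column, field, and schema names below are purely illustrative:
-- Assumes sales_table stores each JSON record in a single VARIANT column (here called raw_record).
CREATE OR REPLACE VIEW my_db.analytics.daily_sales AS
SELECT
  raw_record:order_id::STRING        AS order_id,
  raw_record:amount::NUMBER(10, 2)   AS amount,
  raw_record:sold_at::TIMESTAMP_NTZ  AS sold_at
FROM my_db.raw.sales_table;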
Need help moving to real-time? Contact our engineering team to audit your current pipelines.
