AWS: Asynchronous Event Ingestion and Processing Architecture
This document describes the Asynchronous Event Ingestion and Processing Architecture, designed for high-scale webhook integration from clients or third-party providers.
- Overview
- Architecture
- High-Level Architecture Diagram
- Component Specifications
- Request Flow Sequence
- Architectural Justification: Why Asynchronous Validation?
- Future enhancement
- Code
- References
1. Overview
The architecture follows a decoupled Producer-Consumer pattern. Its primary objective is to provide a highly available entry point that captures external events with minimal latency and makes them durable in a queue before any business logic or deep validation runs.
2. Architecture
| Component | AWS Service |
|---|---|
| Entry Point | API Gateway |
| Authentication | Lambda |
| Ingestion | Lambda |
| Messaging/Queue | SQS |
| Worker/Processor | Lambda |
| Scaling | Automatic (Scale-to-Zero) |
3. High-Level Architecture Diagram
4. Component Specifications
1. API Gateway
The entry point for all incoming webhook requests.
- Role: Acts as the managed interface for the system.
- Key Responsibilities: Terminating TLS, request routing, and basic protocol validation.
- Design Choice: Using API Gateway offloads authentication (through the Lambda authorizer) and throttling to a managed layer, so the underlying compute is invoked only for legitimate traffic.
2. Auth Lambda (Authorizer)
A dedicated function for request validation.
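- Role: Performs security checks (e.g., verifying API keys or shared-secret tokens). Note that API Gateway Lambda authorizers do not receive the request body, so signatures computed over the raw payload, such as Shopify's HMAC, have to be verified in the Ingestion Lambda instead.
- Interaction: If validation succeeds, it returns an IAM policy allowing API Gateway to invoke the next stage. If it fails, API Gateway rejects the request before it reaches any downstream function, with 401 Unauthorized (or 403 Forbidden when the authorizer returns an explicit Deny policy).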
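A minimal sketch of a REQUEST-type authorizer, assuming a shared-secret token in a hypothetical `x-webhook-token` header (the header name, `authorize` function, and principal IDs are illustrative, not part of the original design). It returns the IAM policy shape API Gateway expects; the expected token is passed in so the logic can be tested without environment variables or Secrets Manager.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Minimal shape of an API Gateway REST "REQUEST" authorizer event (only the fields used here).
export interface AuthorizerEvent {
  methodArn: string;
  headers?: Record<string, string | undefined>;
}

export interface AuthorizerResult {
  principalId: string;
  policyDocument: {
    Version: "2012-10-17";
    Statement: { Action: "execute-api:Invoke"; Effect: "Allow" | "Deny"; Resource: string }[];
  };
}

// Hypothetical header name; use whatever header your provider actually sends.
const TOKEN_HEADER = "x-webhook-token";

// Constant-time comparison; hashing first guarantees equal-length buffers for timingSafeEqual.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

export function authorize(event: AuthorizerEvent, expectedToken: string): AuthorizerResult {
  // Header names can arrive in any casing, so match them case-insensitively.
  const headers = event.headers ?? {};
  const key = Object.keys(headers).find((h) => h.toLowerCase() === TOKEN_HEADER);
  const presented = key === undefined ? undefined : headers[key];

  const allowed = presented !== undefined && safeEqual(presented, expectedToken);
  return {
    principalId: allowed ? "webhook-client" : "anonymous",
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        { Action: "execute-api:Invoke", Effect: allowed ? "Allow" : "Deny", Resource: event.methodArn },
      ],
    },
  };
}
```

An explicit Deny makes API Gateway answer 403; if a 401 is preferred, the Lambda handler can throw `new Error("Unauthorized")` instead of returning the Deny policy.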
3. Ingestion / Producer Lambda
The ingestion layer is designed for speed and reliability.
- Role: Receives the raw payload from API Gateway and sends it to the SQS queue.
- Validation Strategy: This layer uses Shallow Validation: it checks that the payload is valid JSON but does not enforce a strict schema (DTO). If the provider unexpectedly adds new fields, the event is still captured.
- Outcome: Once SQS has accepted the message, it returns 202 Accepted to the client.
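The shallow-validate-then-enqueue step can be sketched as below. This is a minimal sketch under stated assumptions: `ingest` and the injected `Enqueue` function are hypothetical names, and the 400 response for non-JSON bodies is an assumption (the text only says the check happens). In a real handler, `Enqueue` would wrap `SQSClient.send(new SendMessageCommand({ QueueUrl, MessageBody, MessageGroupId }))` from `@aws-sdk/client-sqs`; injecting it keeps the sketch runnable without AWS.

```typescript
// Minimal API Gateway proxy event/result shapes (only the fields used here).
export interface ProxyEvent {
  body: string | null;
}
export interface ProxyResult {
  statusCode: number;
  body: string;
}

// Hypothetical enqueue dependency; in production it would call SQS SendMessage.
export type Enqueue = (rawBody: string) => Promise<void>;

// Shallow validation: the body must parse as JSON; no schema (DTO) is enforced,
// so fields the provider adds later pass through untouched.
export function isValidJson(raw: string): boolean {
  try {
    JSON.parse(raw);
    return true;
  } catch {
    return false;
  }
}

export async function ingest(event: ProxyEvent, enqueue: Enqueue): Promise<ProxyResult> {
  if (event.body === null || !isValidJson(event.body)) {
    return { statusCode: 400, body: JSON.stringify({ error: "body must be valid JSON" }) };
  }
  // Enqueue the original text rather than a re-serialized object, so the exact
  // payload the provider sent is preserved for later inspection or replay.
  await enqueue(event.body);
  // Acknowledge only after the enqueue call resolved. If it throws, the invocation
  // fails, API Gateway returns a 5xx, and the provider can retry.
  return { statusCode: 202, body: JSON.stringify({ status: "accepted" }) };
}
```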
4. SQS FIFO (Simple Queue Service)
The durability and ordering layer.
- Role: Buffers events and delivers them in the order they were received (First-In-First-Out). Ordering is guaranteed within a message group (MessageGroupId), not across the whole queue.
- Benefit: Decouples ingestion speed from processing speed, protecting downstream services from traffic spikes.
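Because FIFO ordering only holds within a message group, the choice of MessageGroupId decides what "in order" means. The sketch below groups by shop so each shop's events stay ordered while different shops are processed in parallel. The header names follow Shopify's webhook conventions but are an assumption here, as is the premise that the webhook ID stays the same across the provider's retries; `fifoAttributes` is a hypothetical helper.

```typescript
// Subset of SendMessage parameters that matter for FIFO queues.
export interface FifoAttributes {
  MessageGroupId: string;
  MessageDeduplicationId?: string;
}

type HeaderMap = Record<string, string | undefined>;

// Case-insensitive header lookup.
function header(headers: HeaderMap, name: string): string | undefined {
  const key = Object.keys(headers).find((h) => h.toLowerCase() === name.toLowerCase());
  return key === undefined ? undefined : headers[key];
}

export function fifoAttributes(headers: HeaderMap): FifoAttributes {
  // One group per shop; a single shared group would give strict global order but no parallelism.
  const shop = header(headers, "x-shopify-shop-domain") ?? "default";
  const webhookId = header(headers, "x-shopify-webhook-id");
  // The deduplication ID makes SQS drop duplicates (e.g. provider retries) sent within
  // its 5-minute deduplication interval. Without it, the queue needs ContentBasedDeduplication
  // enabled, in which case SQS derives the ID from a SHA-256 hash of the body.
  return webhookId
    ? { MessageGroupId: shop, MessageDeduplicationId: webhookId }
    : { MessageGroupId: shop };
}
```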
5. Consumer / Worker Lambda
The core business logic and validation layer.
- Role: Triggered by messages in SQS to perform the heavy lifting.
- Validation Strategy: This layer performs Deep Validation (Schema/DTO checks), mapping the incoming data to the internal system's requirements.
- Processing: If validation passes, it executes the business logic, such as updating the database or triggering downstream business workflows.
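A sketch of the Worker's deep validation combined with Lambda's partial batch response (`batchItemFailures`, enabled by `ReportBatchItemFailures` on the event source mapping). The `OrderEvent` DTO, its fields, and the injected `apply` callback are hypothetical stand-ins for the real schema and business logic. For FIFO sources the loop stops at the first failure and reports that message plus every unprocessed one, so nothing in the batch is handled ahead of a failed predecessor.

```typescript
// Minimal SQS event shapes (only the fields used here).
export interface SqsRecord {
  messageId: string;
  body: string;
}
export interface SqsEvent {
  Records: SqsRecord[];
}
export interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Hypothetical internal DTO; only the fields the business logic needs are required.
export interface OrderEvent {
  id: number;
  currency: string;
  total_price: string;
}

// Deep validation: required fields must exist with the right types; extra fields are ignored.
export function toOrderEvent(body: string): OrderEvent {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) throw new Error("payload is not an object");
  const { id, currency, total_price } = data as Record<string, unknown>;
  if (typeof id !== "number") throw new Error("id must be a number");
  if (typeof currency !== "string") throw new Error("currency must be a string");
  if (typeof total_price !== "string") throw new Error("total_price must be a string");
  return { id, currency, total_price };
}

export async function handleBatch(
  event: SqsEvent,
  apply: (order: OrderEvent) => Promise<void>, // hypothetical DB update / workflow trigger
): Promise<SqsBatchResponse> {
  const failures: { itemIdentifier: string }[] = [];
  for (let i = 0; i < event.Records.length; i++) {
    try {
      await apply(toOrderEvent(event.Records[i].body));
    } catch {
      // FIFO: report the failed record and all records after it, then stop.
      for (const rest of event.Records.slice(i)) failures.push({ itemIdentifier: rest.messageId });
      break;
    }
  }
  // Reported messages become visible again and are retried; the redrive policy
  // moves them to the DLQ after maxReceiveCount receives.
  return { batchItemFailures: failures };
}
```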
6. DLQ (Dead Letter Queue) & Fix-and-Replay
The resilience and recovery mechanism.
- Role: Captures events that the Worker Lambda repeatedly fails to process (e.g., schema mismatches or database errors that outlast the retries). SQS moves a message to the DLQ once its receive count exceeds the maxReceiveCount set in the source queue's redrive policy.
- Fix-and-Replay Path: Developers can inspect failed events in the DLQ, fix the underlying Worker code or schema, and then move the messages back to the source queue so the Worker reprocesses them without data loss.
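The replay step can be sketched as a loop that moves messages from the DLQ back to the source queue, deleting each one from the DLQ only after the send succeeded, so a crash mid-replay causes at most a duplicate, never a loss. `QueueClient` and `replayDlq` are hypothetical; the three methods stand in for SQS ReceiveMessage, SendMessage, and DeleteMessage. The sketch simplifies two things: an empty receive is treated as "drained" (a real loop should use long polling and tolerate in-flight messages), and for a FIFO queue the original MessageGroupId would have to be carried over on send. SQS also offers a built-in DLQ redrive (console or the StartMessageMoveTask API), which may make a custom loop unnecessary; check that it supports your queue type.

```typescript
// Hypothetical minimal queue client mirroring the SQS calls used for a replay.
export interface QueueMessage {
  body: string;
  receiptHandle: string;
}
export interface QueueClient {
  receive(queueUrl: string, max: number): Promise<QueueMessage[]>;
  send(queueUrl: string, body: string): Promise<void>;
  delete(queueUrl: string, receiptHandle: string): Promise<void>;
}

// Move every message from the DLQ back to the source queue; returns how many were moved.
export async function replayDlq(
  client: QueueClient,
  dlqUrl: string,
  sourceUrl: string,
  batchSize = 10,
): Promise<number> {
  let moved = 0;
  while (true) {
    const batch = await client.receive(dlqUrl, batchSize);
    if (batch.length === 0) return moved; // simplification: empty receive means drained
    for (const msg of batch) {
      await client.send(sourceUrl, msg.body); // re-inject first...
      await client.delete(dlqUrl, msg.receiptHandle); // ...then remove from the DLQ
      moved++;
    }
  }
}
```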
5. Request Flow Sequence
- Ingestion: The client sends a webhook; API Gateway invokes the Auth Lambda.
- Verification: Upon successful authentication, API Gateway passes the request to the Ingestion Lambda.
- Persistence: The Ingestion Lambda performs a structural check and sends the payload to SQS FIFO.
- Acknowledgement: The system immediately returns 202 Accepted to the client.
- Processing: SQS triggers the Worker Lambda.
- Deep Validation: The Worker validates the schema.
  - If valid: The event is processed.
  - If invalid: The event is retried and, once its receive count exceeds maxReceiveCount, moved to the DLQ.
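The valid/invalid branch in the sequence above involves retries before a message reaches the DLQ. The following in-memory model (a hypothetical `drain` function, not the SQS implementation) illustrates the redrive behaviour: each failed attempt increments the message's receive count, the same message is retried before anything queued behind it, and after maxReceiveCount failed attempts it moves to the DLQ.

```typescript
// In-memory stand-in for a queued message and its SQS receive count.
export interface Msg {
  body: string;
  receiveCount: number;
}

export function drain(
  queue: Msg[],
  worker: (body: string) => void, // throws on deep-validation failure
  maxReceiveCount: number,
): { processed: string[]; dlq: string[] } {
  const processed: string[] = [];
  const dlq: string[] = [];
  while (queue.length > 0) {
    const msg = queue.shift()!;
    msg.receiveCount++;
    try {
      worker(msg.body);
      processed.push(msg.body); // success: the message is deleted from the queue
    } catch {
      if (msg.receiveCount >= maxReceiveCount) dlq.push(msg.body); // retries exhausted
      else queue.unshift(msg); // FIFO: retry this message before anything behind it
    }
  }
  return { processed, dlq };
}
```

With maxReceiveCount set to 3, a payload that always fails schema validation is attempted three times and then parked in the DLQ, while the valid messages around it are still processed.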
6. Architectural Justification: Why Asynchronous Validation?
This design prioritizes durability over immediate rejection.
- Resilience to External Changes: Third-party webhooks (such as Shopify's) are subject to change. If strict schema validation were enforced at API Gateway (as proposed during peer review), a new, unmapped field from Shopify would trigger a 400 Bad Request, and the event could be lost permanently.
- Reliability: By accepting the data first, we ensure we have a “copy of record.” If the validation fails in the Worker, we have the ability to fix our code and replay the event from the DLQ.
- Client Experience: Webhook providers expect fast responses and treat slow ones as failures, which triggers retries and back-off. This architecture minimizes synchronous work so that responses stay within these time limits.
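The resilience argument can be made concrete by comparing a strict edge check with the shallow one. In this sketch, `strictCheck` mimics a schema with "no additional properties" over a hypothetical set of mapped fields, and `shallowCheck` mirrors the Ingestion Lambda's JSON-only test; both names and the field list are illustrative assumptions.

```typescript
// Hypothetical set of fields the internal DTO currently maps.
const KNOWN_FIELDS = new Set(["id", "currency", "total_price"]);

// Strict: reject any payload carrying a field the schema does not know about.
export function strictCheck(raw: string): boolean {
  try {
    const data = JSON.parse(raw) as Record<string, unknown>;
    return Object.keys(data).every((k) => KNOWN_FIELDS.has(k));
  } catch {
    return false; // invalid JSON, or a non-object such as null
  }
}

// Shallow: accept anything that parses as JSON.
export function shallowCheck(raw: string): boolean {
  try {
    JSON.parse(raw);
    return true;
  } catch {
    return false;
  }
}
```

A provider adding a single new field, for example `"loyalty_tier":"gold"`, flips `strictCheck` to false (a 400 at the edge) while `shallowCheck` still accepts the event, leaving the schema decision to the Worker, where a failure lands in the DLQ instead of being dropped.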
7. Future enhancement
Entry Point: AWS Global Accelerator in front of an Application Load Balancer (the AWS counterpart of a global external HTTP(S) load balancer).
- Static IP: Provides static Anycast IP addresses that third parties can allowlist.
8. Code
A fully serverless, asynchronous message-processing system built with AWS Lambda, API Gateway, and SQS; the infrastructure is deployed with Terraform and the Lambda handlers are written in TypeScript.