Serverless ETL Pipeline for Weather Data
This solution builds a serverless data pipeline that collects, processes, and analyzes global weather data using:
- OpenWeatherMap API as the data source
- Lambda function for data ingestion
- Kinesis Data Firehose for streaming
- S3 for data lake storage with time-based partitioning
- AWS Glue Crawler & Data Catalog for automated schema discovery
- Amazon Athena for SQL query capabilities against raw data
- VPC Endpoints for secure data access
- AWS Secrets Manager for credential management
- AWS Glue ETL for data transformation
- Aurora Serverless v2 for structured data storage and analysis
- EventBridge Rules for workflow orchestration
The pipeline runs with minimal operational overhead and keeps both the raw data in S3 and the structured data in Aurora available for analysis.
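To make the ingestion stage concrete, here is a minimal sketch of what the Lambda function's core logic might look like. It is not the code from the video: the secret layout (a JSON secret with an `api_key` field), the selected output fields, and the helper names `get_api_key`, `build_url`, `to_record`, and `send` are all assumptions made for illustration. The AWS clients are passed in as arguments so the logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: OpenWeatherMap's current-weather API.
OWM_URL = "https://api.openweathermap.org/data/2.5/weather"


def get_api_key(secrets_client, secret_id):
    """Read the API key from Secrets Manager.

    Assumes the secret is stored as JSON with an 'api_key' field.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["api_key"]


def build_url(city, api_key):
    """Build the request URL for one city, asking for metric units."""
    query = urlencode({"q": city, "appid": api_key, "units": "metric"})
    return OWM_URL + "?" + query


def to_record(payload):
    """Flatten one API response into a newline-terminated JSON line.

    The field selection here is an assumption; keep whatever the
    downstream Glue job and Athena queries need.
    """
    row = {
        "city": payload["name"],
        "country": payload["sys"]["country"],
        "observed_at": payload["dt"],
        "temp_c": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
    }
    return (json.dumps(row) + "\n").encode("utf-8")


def send(firehose_client, stream_name, payload):
    """Push one flattened record to the Firehose delivery stream."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": to_record(payload)},
    )
```

In a real handler, the clients would typically come from `boto3.client("secretsmanager")` and `boto3.client("firehose")`, and the payload from fetching `build_url(...)` with `urllib.request.urlopen`. Ending each record with a newline keeps the objects Firehose writes to S3 as line-delimited JSON, which is generally easier for a crawler and Athena to read.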
Intended Audience
This video is designed for beginners in AWS data engineering who want hands-on experience with ETL pipelines. It’s suitable for those preparing for the AWS Certified Data Engineer - Associate exam, or for anyone who wants to build practical cloud data skills through a real-world project.
Learning Objectives
- Create a Lambda function to process weather data from the OpenWeatherMap API
- Set up Kinesis Data Firehose to deliver data to S3 with dynamic partitioning
- Implement AWS Secrets Manager for secure credential management
- Configure a Glue Crawler to catalog the S3 data
- Set up Amazon Athena for querying the raw data via the Glue Data Catalog
- Deploy an Amazon Aurora Serverless v2 database to store transformed data
- Establish VPC endpoints for S3 and Secrets Manager for enhanced security
- Build a Glue ETL pipeline using visual ETL and script mode for data transformation
- Automate the workflow with an EventBridge rule that invokes the Lambda function and Glue triggers that run the crawler and the ETL job
- Run SQL queries via Aurora’s built-in Query Editor
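The partitioning and querying objectives above fit together: a time-based S3 layout lets Athena read only the partitions a query asks for. The sketch below illustrates that idea under stated assumptions. The Hive-style `year=/month=/day=/hour=` layout, the `weather` root prefix, the `temp_c` column, and the helper names `partition_prefix` and `daily_average_query` are hypothetical and not taken from the video. In the actual pipeline Firehose builds the prefix itself; the first function only shows what the resulting keys would look like.

```python
from datetime import datetime, timezone


def partition_prefix(epoch_seconds, root="weather"):
    """Return a Hive-style S3 prefix for a UTC observation time.

    Mirrors the kind of layout a time-partitioned Firehose delivery
    stream could produce (assumed, not confirmed by the source).
    """
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{root}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


def daily_average_query(table, year, month, day):
    """Build an Athena query that scans a single day's partitions.

    Partition columns are compared as strings on the assumption that
    the crawler registered them with a string type. The table name is
    interpolated directly, so it must come from trusted configuration.
    """
    return (
        f"SELECT city, AVG(temp_c) AS avg_temp_c "
        f"FROM {table} "
        f"WHERE year = '{year:04d}' AND month = '{month:02d}' AND day = '{day:02d}' "
        f"GROUP BY city ORDER BY avg_temp_c DESC"
    )
```

Filtering on the partition columns in the `WHERE` clause is what lets Athena skip the rest of the data lake, which reduces both query time and the amount of data scanned.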
This hands-on demonstration walks through building each stage of the pipeline, from ingesting weather data with Lambda to querying the transformed data in Aurora.
This post is licensed under CC BY 4.0 by the author.