Step Functions
Problems with Lambda

👉 In simple terms: Step Functions help you connect multiple serverless functions/services into a workflow, step by step, with logic and rules.
🧩 How it works
You define a workflow (a state machine).
Each step can be a task (like running a Lambda function), a decision (if/else), or a wait; failed steps can be retried automatically.
AWS Step Functions manage the order, handle errors/retries, and keep track of the state.
Imagine you run an online shoe shop 👟. When a customer places an order:
Step 1 → Check payment (Lambda).
Step 2 → If payment succeeds → reserve inventory (DynamoDB).
Step 3 → If inventory is available → arrange shipping (another Lambda).
Step 4 → Send confirmation email (SNS).
If something fails → Step Functions can retry, send alerts, or stop.
⚡ Instead of you writing all this logic manually, Step Functions handle the flow.
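To make the shoe-shop flow concrete, here is a minimal sketch of how it might look in Amazon States Language (ASL), built as a Python dict and printed as JSON. All ARNs, the `Inventory` table, and the input fields (`sku`, `orderId`, `payment.status`) are hypothetical placeholders. The DynamoDB and SNS steps use Step Functions' built-in service integrations rather than extra Lambda functions.

```python
import json

# Hypothetical ARNs: replace with your own Lambda functions and SNS topic.
CHECK_PAYMENT_FN = "arn:aws:lambda:us-east-1:123456789012:function:CheckPayment"
ARRANGE_SHIPPING_FN = "arn:aws:lambda:us-east-1:123456789012:function:ArrangeShipping"
CONFIRMATION_TOPIC = "arn:aws:sns:us-east-1:123456789012:OrderConfirmations"

order_workflow = {
    "Comment": "Shoe shop order flow (illustrative sketch)",
    "StartAt": "CheckPayment",
    "States": {
        # Step 1: a Lambda function checks the payment; its result is stored under $.payment.
        "CheckPayment": {
            "Type": "Task",
            "Resource": CHECK_PAYMENT_FN,
            "ResultPath": "$.payment",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "Next": "IsPaymentApproved",
        },
        # Continue only when the (assumed) status field says "approved".
        "IsPaymentApproved": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.payment.status", "StringEquals": "approved", "Next": "ReserveInventory"}
            ],
            "Default": "PaymentDenied",
        },
        # Step 2: decrement stock in DynamoDB, but only if at least one unit is left.
        "ReserveInventory": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:updateItem",
            "Parameters": {
                "TableName": "Inventory",
                "Key": {"sku": {"S.$": "$.sku"}},
                "UpdateExpression": "SET stock = stock - :one",
                "ConditionExpression": "stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            },
            "ResultPath": None,  # serialized as null: drop the DynamoDB response, keep the order
            "Catch": [
                {"ErrorEquals": ["DynamoDB.ConditionalCheckFailedException"], "Next": "OutOfStock"}
            ],
            "Next": "ArrangeShipping",
        },
        # Step 3: another Lambda function books the shipment.
        "ArrangeShipping": {
            "Type": "Task",
            "Resource": ARRANGE_SHIPPING_FN,
            "ResultPath": "$.shipping",
            "Next": "SendConfirmation",
        },
        # Step 4: publish a confirmation to an SNS topic (email subscribers receive it).
        "SendConfirmation": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": CONFIRMATION_TOPIC,
                "Message.$": "States.Format('Order {} confirmed', $.orderId)",
            },
            "End": True,
        },
        "PaymentDenied": {"Type": "Fail", "Error": "PaymentDenied", "Cause": "Payment was not approved"},
        "OutOfStock": {"Type": "Fail", "Error": "OutOfStock", "Cause": "Item is out of stock"},
    },
}

# The JSON text you would paste into the Step Functions console.
print(json.dumps(order_workflow, indent=2))
```

Every "if something fails" branch is just another state (`PaymentDenied`, `OutOfStock`), so the failure paths are visible in the same definition as the happy path.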
📖 Analogy
Think of Step Functions like a project manager:
Assigns tasks in order.
Makes decisions if something goes wrong.
Keeps track of progress.
Ensures nothing is missed.
🛠️ Two types of workflow
Standard Workflows – Long-running (can last days).
Designed for long-running, reliable processes.
Duration: Can run for up to 1 year.
Executions: Over 2,000 execution starts per second.
Execution history: Stored for up to 90 days (you can see every step).
Pricing: Pay per state transition.
Best for:
Order processing
Payment workflows
Fraud checks
Any process that must be durable and auditable.
👉 Example: A loan application process that takes days/weeks — Step Functions Standard keeps track of the workflow until it finishes.
Express Workflows – Fast, high-volume (short runs, very high execution rates).
Designed for high-volume, short-duration processes.
Duration: Up to 5 minutes.
Executions: Over 100,000 execution starts per second (very high throughput).
Execution history: Not stored by Step Functions; enable logging to CloudWatch Logs (and tracing with X-Ray) to inspect runs.
Pricing: Pay for execution duration + number of requests (cheaper at scale).
Best for:
Real-time data processing
IoT event handling
Streaming data (video/audio)
High-volume API requests
👉 Example: A real-time image recognition app where thousands of users upload pictures every second — Express workflows process each quickly and cheaply.
Comparison Table
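| Feature | Standard | Express |
| --- | --- | --- |
| Max duration | 1 year | 5 minutes |
| Execution start rate | Over 2,000/sec | Over 100,000/sec |
| Execution history | Stored for 90 days | Not stored; sent to CloudWatch Logs/X-Ray if enabled |
| Cost model | Per state transition | Per execution + duration |
| Use case | Long-running, auditable processes | High-volume, real-time processing |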
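The comparison can be turned into a rough rule of thumb. The sketch below is an assumption, not official AWS guidance: it picks a type from the expected run time and the need for stored history, then builds the keyword arguments that boto3's `create_state_machine` call accepts (`name`, `definition`, `roleArn`, `type`, `loggingConfiguration`). It does not call AWS itself; the role and log group ARNs are hypothetical.

```python
import json
from typing import Optional

# Limits from the comparison table above.
EXPRESS_MAX_SECONDS = 5 * 60
STANDARD_MAX_SECONDS = 365 * 24 * 60 * 60


def pick_workflow_type(expected_seconds: float, needs_audit_history: bool) -> str:
    """Rule-of-thumb choice between STANDARD and EXPRESS (an assumption, not an AWS rule)."""
    if expected_seconds > STANDARD_MAX_SECONDS:
        raise ValueError("Longer than 1 year: split the process into several executions")
    if expected_seconds > EXPRESS_MAX_SECONDS or needs_audit_history:
        return "STANDARD"
    return "EXPRESS"


def create_request(name: str, definition: dict, role_arn: str,
                   workflow_type: str, log_group_arn: Optional[str] = None) -> dict:
    """Build keyword arguments for boto3's stepfunctions create_state_machine()."""
    request = {
        "name": name,
        "definition": json.dumps(definition),  # the API expects the ASL as a JSON string
        "roleArn": role_arn,
        "type": workflow_type,  # chosen once, when the state machine is created
    }
    if workflow_type == "EXPRESS" and log_group_arn:
        # Step Functions keeps no execution history for Express workflows,
        # so send execution events to CloudWatch Logs instead.
        request["loggingConfiguration"] = {
            "level": "ALL",
            "includeExecutionData": True,
            "destinations": [{"cloudWatchLogsLogGroup": {"logGroupArn": log_group_arn}}],
        }
    return request


# Usage with a trivial definition and hypothetical ARNs.
hello = {"StartAt": "Done", "States": {"Done": {"Type": "Succeed"}}}
kind = pick_workflow_type(expected_seconds=0.3, needs_audit_history=False)
request = create_request(
    "ImageLabeler",
    hello,
    "arn:aws:iam::123456789012:role/StepFunctionsRole",
    kind,
    "arn:aws:logs:us-east-1:123456789012:log-group:/aws/vendedlogs/states/image-labeler:*",
)
# With AWS credentials: boto3.client("stepfunctions").create_state_machine(**request)
print(request["type"])
```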
Components of Step Functions
1. State Machine 🏗️
The overall workflow definition.
It’s like a map showing each step, order, and rules.
Written in the Amazon States Language (ASL), a JSON-based language.
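A minimal sketch of what a definition looks like: the top-level `StartAt` and `States` fields name the first step and list every step. The `check_machine` helper is hypothetical and much less strict than the validation AWS runs when you save a state machine; it only checks that transitions point at states that exist.

```python
import json

# A minimal state machine: "StartAt" names the first state, "States" lists them all.
minimal_machine = {
    "Comment": "Smallest useful example",
    "StartAt": "SayHello",
    "States": {
        "SayHello": {"Type": "Pass", "Result": {"greeting": "hello"}, "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}

TERMINAL_TYPES = {"Succeed", "Fail"}


def check_machine(machine: dict) -> list:
    """Tiny structural check (hypothetical helper, far less strict than AWS's validator)."""
    problems = []
    states = machine.get("States", {})
    if machine.get("StartAt") not in states:
        problems.append("StartAt does not name a state")
    for name, state in states.items():
        kind = state.get("Type")
        if kind in TERMINAL_TYPES or kind == "Choice":
            continue  # these end the run, or pick the next state via Choices/Default
        if state.get("End") is not True and state.get("Next") not in states:
            problems.append(f"{name}: needs a valid Next or End: true")
    return problems


print(json.dumps(minimal_machine, indent=2))
print(check_machine(minimal_machine))  # an empty list means no problems were found
```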
2. States 🔄
Each step in your workflow is called a state. There are different types:
Task → Runs work (like a Lambda function, an ECS task, or a Glue job).
Choice → If/Else branching (decision-making).
Parallel → Runs multiple branches at the same time.
Map → Runs the same step for multiple items (looping).
Wait → Pauses for a set number of seconds or until a specific time.
Pass → Passes input to output without doing anything.
Fail / Succeed → Ends the workflow with failure/success.
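To see the state types side by side, here is an illustrative definition that uses each one. The Lambda ARN and the `items`/`mode` fields are hypothetical, and the Map state uses the `ItemProcessor` form with inline processing.

```python
import json

# Hypothetical Lambda function that processes one item.
WORKER_FN = "arn:aws:lambda:us-east-1:123456789012:function:ProcessItem"

all_types_demo = {
    "StartAt": "Prepare",
    "States": {
        "Prepare": {                       # Pass: injects a fixed result, does no work
            "Type": "Pass",
            "Result": {"items": [1, 2, 3], "mode": "fast"},
            "Next": "PickMode",
        },
        "PickMode": {                      # Choice: if/else branching
            "Type": "Choice",
            "Choices": [{"Variable": "$.mode", "StringEquals": "fast", "Next": "ProcessEach"}],
            "Default": "PauseFirst",
        },
        "PauseFirst": {                    # Wait: pause for 10 seconds
            "Type": "Wait",
            "Seconds": 10,
            "Next": "ProcessEach",
        },
        "ProcessEach": {                   # Map: run the same steps for every item
            "Type": "Map",
            "ItemsPath": "$.items",
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "Work",
                # Task: does the actual work (here, a Lambda call per item)
                "States": {"Work": {"Type": "Task", "Resource": WORKER_FN, "End": True}},
            },
            "ResultPath": "$.results",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "GaveUp"}],
            "Next": "FanOut",
        },
        "FanOut": {                        # Parallel: branches run at the same time
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "BranchA", "States": {"BranchA": {"Type": "Pass", "End": True}}},
                {"StartAt": "BranchB", "States": {"BranchB": {"Type": "Pass", "End": True}}},
            ],
            "Next": "Finished",
        },
        "Finished": {"Type": "Succeed"},   # Succeed: ends the run successfully
        "GaveUp": {"Type": "Fail", "Error": "Demo.Failed", "Cause": "An item could not be processed"},
    },
}

print(json.dumps(all_types_demo, indent=2))
```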
3. Transitions 🔀
Define how the workflow moves from one state to the next.
Example: “After Task 1 → go to Task 2” OR “If error → retry or go to Fail state.”
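Transitions are just the `Next`, `Default`, and `End` fields. The toy walker below is a local sketch for intuition only, not how AWS runs executions: it supports Pass, Choice (with `StringEquals` and `NumericGreaterThan` only), Succeed, and Fail, and it does not change the data. The state names and the `amount` field are made up.

```python
# Toy, local walk-through of transitions. NOT how AWS executes state machines.


def get_path(data, path):
    """Resolve a simple path like "$.a.b" (no arrays or filters)."""
    value = data
    for key in path[2:].split("."):
        value = value[key]
    return value


def run_toy(machine, data):
    """Follow Next/Default/End transitions and return (final status, visited states)."""
    current = machine["StartAt"]
    visited = []
    while True:
        visited.append(current)
        state = machine["States"][current]
        kind = state["Type"]
        if kind == "Succeed":
            return "SUCCEEDED", visited
        if kind == "Fail":
            return "FAILED", visited
        if kind == "Choice":
            nxt = state.get("Default")
            for rule in state["Choices"]:
                value = get_path(data, rule["Variable"])
                if ("StringEquals" in rule and value == rule["StringEquals"]) or (
                    "NumericGreaterThan" in rule and value > rule["NumericGreaterThan"]
                ):
                    nxt = rule["Next"]
                    break
            if nxt is None:
                return "FAILED", visited  # real Step Functions raises States.NoChoiceMatched
            current = nxt
            continue
        # Pass: stop at End: true, otherwise follow Next
        if state.get("End"):
            return "SUCCEEDED", visited
        current = state["Next"]


machine = {
    "StartAt": "CheckAmount",
    "States": {
        "CheckAmount": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.amount", "NumericGreaterThan": 1000, "Next": "RejectLargeOrder"}],
            "Default": "AutoApprove",
        },
        "RejectLargeOrder": {"Type": "Fail", "Error": "TooLarge"},
        "AutoApprove": {"Type": "Pass", "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}

print(run_toy(machine, {"amount": 50}))
print(run_toy(machine, {"amount": 5000}))
```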
4. Input & Output (State Data) 📦
Each state can receive input and produce output.
Step Functions pass this data along the workflow.
Example: Payment check → returns “approved/denied” → next step uses that info.
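By default, a state's result replaces its input, so the order details would be lost after the payment check. ASL's `ResultPath` field controls where the result goes. The function below is a simplified local model of that behaviour (only `"$"`, `null`, and dotted paths such as `"$.payment"` are handled); the order and payment result shapes are hypothetical examples.

```python
import copy


def apply_result_path(state_input, result, result_path="$"):
    """Combine a state's input with its result, modelled on ASL's ResultPath.

    Simplified: only "$", None (null in JSON), and dotted paths like "$.payment".
    """
    if result_path is None:      # null: discard the result, pass the input through
        return copy.deepcopy(state_input)
    if result_path == "$":       # default: the result replaces the input
        return copy.deepcopy(result)
    output = copy.deepcopy(state_input)
    keys = result_path[2:].split(".")  # "$.a.b" -> ["a", "b"]
    target = output
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = copy.deepcopy(result)
    return output


order = {"orderId": "A-1", "amount": 80}
payment_result = {"status": "approved"}  # what a hypothetical payment Lambda might return

print(apply_result_path(order, payment_result))               # input replaced by the result
print(apply_result_path(order, payment_result, "$.payment"))  # result added under "payment"
print(apply_result_path(order, payment_result, None))         # result dropped
```

With `"$.payment"`, the next step (for example a Choice on `$.payment.status`) sees both the original order and the payment decision.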
5. Error Handling & Retries ⚠️
Built-in Retry and Catch settings let a step retry when it fails, catch errors, and move to a backup step.
Example: If payment system fails → retry 3 times → if still fails → send alert.
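The "retry 3 times, then send an alert" example maps onto the `Retry` and `Catch` fields of a Task state. In this sketch the ARNs are hypothetical, and `retry_delays` only computes the backoff schedule (`IntervalSeconds * BackoffRate^(n-1)` before retry n) to show how the waits grow.

```python
import json

# Hypothetical ARNs for the payment function and the alert topic.
PAYMENT_FN = "arn:aws:lambda:us-east-1:123456789012:function:CheckPayment"
ALERT_TOPIC = "arn:aws:sns:us-east-1:123456789012:PaymentAlerts"

payment_with_retries = {
    "StartAt": "CheckPayment",
    "States": {
        "CheckPayment": {
            "Type": "Task",
            "Resource": PAYMENT_FN,
            # Retry up to 3 times after the first attempt: wait 2 s, then 4 s, then 8 s.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # When retries are used up (or any other error occurs), go to the alert step.
            "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "SendAlert"}],
            "Next": "PaymentOk",
        },
        "SendAlert": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": ALERT_TOPIC, "Message.$": "$.error.Cause"},
            "Next": "PaymentFailed",
        },
        "PaymentOk": {"Type": "Succeed"},
        "PaymentFailed": {"Type": "Fail", "Error": "PaymentUnavailable"},
    },
}


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list:
    """Seconds waited before retry 1, 2, ...: interval * backoff_rate ** (retry - 1)."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(json.dumps(payment_with_retries["States"]["CheckPayment"], indent=2))
print(retry_delays(2, 3, 2.0))
```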
6. Execution ▶️
Each run of the workflow is called an execution.
You can track each execution's status (running, succeeded, failed, timed out, or aborted).
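Executions can be started and tracked from code as well as from the console. This is a sketch assuming boto3's Step Functions client, whose `start_execution` and `describe_execution` calls are used here; a small fake client (hypothetical, for offline runs) stands in for AWS. Polling like this suits Standard workflows, since `DescribeExecution` does not support Express state machines.

```python
import json
import time


def run_and_wait(sfn, state_machine_arn: str, payload: dict, poll_seconds: float = 2.0) -> dict:
    """Start one execution and poll until it is no longer RUNNING.

    `sfn` is expected to be boto3.client("stepfunctions"); any object with the same
    start_execution / describe_execution methods also works (handy for tests).
    """
    started = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # execution input is passed as a JSON string
    )
    while True:
        info = sfn.describe_execution(executionArn=started["executionArn"])
        if info["status"] != "RUNNING":
            return info  # e.g. SUCCEEDED, FAILED, TIMED_OUT or ABORTED
        time.sleep(poll_seconds)


class FakeStepFunctions:
    """Offline stand-in for boto3.client("stepfunctions") (hypothetical test helper)."""

    def __init__(self):
        self.calls = 0
        self.input = None

    def start_execution(self, stateMachineArn, input):
        self.input = json.loads(input)
        return {"executionArn": stateMachineArn.replace(":stateMachine:", ":execution:") + ":run-1"}

    def describe_execution(self, executionArn):
        self.calls += 1
        status = "RUNNING" if self.calls < 3 else "SUCCEEDED"
        return {"executionArn": executionArn, "status": status}


fake = FakeStepFunctions()
result = run_and_wait(
    fake,
    "arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow",
    {"orderId": "A-1"},
    poll_seconds=0,
)
print(result["status"])
```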
📖 Analogy
Think of Step Functions like a recipe book:
State Machine = the whole recipe.
States = individual steps (chop, cook, mix).
Transitions = what comes after chopping (cook).
Error Handling = if food burns, try again or order pizza 🍕.
Execution = each time you cook following the recipe.