Step Functions


👉 In simple terms: Step Functions help you connect multiple serverless functions/services into a workflow, step by step, with logic and rules.

🧩 How it works

  • You define a workflow (a state machine).

  • Each step can be a task (like running a Lambda), a decision (if/else), or a wait/retry.

  • AWS Step Functions manages the order, handles errors/retries, and keeps track of the state.

Imagine you run an online shoe shop 👟. When a customer places an order:

  1. Step 1 → Check payment (Lambda).

  2. Step 2 → If payment succeeds → reserve inventory (DynamoDB).

  3. Step 3 → If inventory is available → arrange shipping (another Lambda).

  4. Step 4 → Send confirmation email (SNS).

  5. If something fails → Step Functions can retry, send alerts, or stop.

⚡ Instead of you writing all this logic manually, Step Functions handles the flow.
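To make the shoe shop flow concrete, here is a rough sketch of how it could look as an Amazon States Language definition, built as a Python dict and dumped to JSON. The account ID, region, function names, table name, topic, and the `paymentStatus`, `sku`, and `pairsInStock` fields are all made up for illustration; a real workflow would use your own resources and payload shape.

```python
import json

# Hypothetical account ID, region, function names, table, and topic.
LAMBDA = "arn:aws:lambda:us-east-1:123456789012:function:"
TOPIC = "arn:aws:sns:us-east-1:123456789012:order-confirmations"

order_workflow = {
    "Comment": "Shoe shop order flow (illustrative sketch)",
    "StartAt": "CheckPayment",
    "States": {
        # Step 1: a Lambda checks the payment.
        "CheckPayment": {
            "Type": "Task",
            "Resource": LAMBDA + "check-payment",
            "Next": "PaymentApproved",
        },
        # Step 2: branch on the (assumed) paymentStatus field.
        "PaymentApproved": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.paymentStatus",
                "StringEquals": "approved",
                "Next": "ReserveInventory",
            }],
            "Default": "OrderFailed",
        },
        # Reserve one pair in DynamoDB; the condition fails when stock is 0.
        "ReserveInventory": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:updateItem",
            "Parameters": {
                "TableName": "ShoeInventory",
                "Key": {"sku": {"S.$": "$.sku"}},
                "UpdateExpression": "SET pairsInStock = pairsInStock - :one",
                "ConditionExpression": "pairsInStock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            },
            "ResultPath": "$.inventory",  # keep the original order data
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "ArrangeShipping",
        },
        # Step 3: another Lambda arranges shipping.
        "ArrangeShipping": {
            "Type": "Task",
            "Resource": LAMBDA + "arrange-shipping",
            "Next": "SendConfirmation",
        },
        # Step 4: publish the confirmation through SNS.
        "SendConfirmation": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": TOPIC, "Message": "Your order is confirmed."},
            "End": True,
        },
        # Step 5: stop the workflow with a failure.
        "OrderFailed": {
            "Type": "Fail",
            "Error": "OrderFailed",
            "Cause": "Payment declined or item out of stock",
        },
    },
}

definition_json = json.dumps(order_workflow, indent=2)
```

The resulting `definition_json` string is the kind of document you would paste into the Step Functions console or pass as the state machine definition when creating it.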

📖 Analogy

Think of Step Functions like a project manager:

  • Assigns tasks in order.

  • Makes decisions if something goes wrong.

  • Keeps track of progress.

  • Ensures nothing is missed.

🛠️ Two types of workflow

  1. Standard Workflows – Long-running (up to 1 year).

  • Designed for long-running, reliable processes.

  • Duration: Can run for up to 1 year.

  • Executions: Thousands of execution starts per second (subject to account quotas).

  • Execution history: Stored for up to 90 days (you can see every step).

  • Pricing: Pay per state transition.

  • Best for:

    • Order processing

    • Payment workflows

    • Fraud checks

    • Any process that must be durable and auditable.

👉 Example: A loan application process that takes days/weeks — Step Functions Standard keeps track of the workflow until it finishes.

  2. Express Workflows – Fast, high-volume (short runs at very high throughput).

  • Designed for high-volume, short-duration processes.

  • Duration: Up to 5 minutes.

  • Executions: Over 100,000 execution starts per second (very high throughput).

  • Execution history: Not stored by Step Functions; sent to CloudWatch Logs when logging is enabled (X-Ray for tracing).

  • Pricing: Pay for number of executions + duration and memory used (cheaper at scale).

  • Best for:

    • Real-time data processing

    • IoT event handling

    • Streaming data (video/audio)

    • High-volume API requests

👉 Example: A real-time image recognition app where thousands of users upload pictures every second — Express workflows process each quickly and cheaply.

Comparison Table

| Feature | Standard Workflow 🏛️ | Express Workflow ⚡ |
| --- | --- | --- |
| Max duration | 1 year | 5 minutes |
| Execution rate | Thousands/sec | Over 100,000/sec |
| Execution history | 90 days | Sent to CloudWatch Logs (X-Ray for tracing) |
| Cost model | Per state transition | Per execution + duration and memory |
| Use case | Long-running, auditable processes | High-volume, real-time processing |
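As a rough rule of thumb, the comparison can be turned into a tiny helper. This is only a sketch: `pick_workflow_type` is a hypothetical function, and it looks at just two of the factors above (duration and whether you need stored history), not cost or throughput. The returned strings match the `STANDARD` and `EXPRESS` values used for the state machine type.

```python
EXPRESS_MAX_SECONDS = 5 * 60            # Express limit: 5 minutes
STANDARD_MAX_SECONDS = 365 * 24 * 3600  # Standard limit: 1 year


def pick_workflow_type(max_duration_seconds: float, needs_stored_history: bool) -> str:
    """Suggest a workflow type from the expected duration and audit needs."""
    if max_duration_seconds > STANDARD_MAX_SECONDS:
        raise ValueError("longer than the 1-year Standard limit")
    # Anything that outlives 5 minutes, or needs the 90-day history, is Standard.
    if needs_stored_history or max_duration_seconds > EXPRESS_MAX_SECONDS:
        return "STANDARD"
    return "EXPRESS"


# A loan application that can take two weeks and must be auditable.
loan = pick_workflow_type(14 * 24 * 3600, needs_stored_history=True)
# An image-recognition request that finishes in a few seconds.
image = pick_workflow_type(3, needs_stored_history=False)
```

Here `loan` comes out as `"STANDARD"` and `image` as `"EXPRESS"`, which lines up with the two examples given earlier.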

Components of Step Functions

1. State Machine 🏗️

  • The overall workflow definition.

  • It’s like a map showing each step, order, and rules.

  • Written in Amazon States Language (ASL), a JSON-based format.

2. States 🔄

Each step in your workflow is called a state. There are different types:

  • Task → Runs work (like a Lambda, ECS job, Glue job).

  • Choice → If/Else branching (decision-making).

  • Parallel → Runs multiple branches at the same time.

  • Map → Runs the same step for multiple items (looping).

  • Wait → Pause for a certain time.

  • Pass → Passes input to output without doing anything.

  • Fail / Succeed → Ends the workflow with failure/success.
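For reference, here is a minimal sketch of what each state type can look like in ASL, written as Python dicts. These are standalone fragments, not one working workflow: the ARN, field names (`$.status`, `$.items`), and target state names are placeholders. The Map example uses the `ItemProcessor` field; older definitions use `Iterator` for the same purpose.

```python
# A tiny sub-workflow reused inside the Parallel and Map examples.
noop_branch = {"StartAt": "Noop", "States": {"Noop": {"Type": "Pass", "End": True}}}

state_examples = {
    # Runs work, here a (hypothetical) Lambda function.
    "Task": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:resize-image",
        "End": True,
    },
    # If/else branching on a field of the input.
    "Choice": {
        "Type": "Choice",
        "Choices": [{"Variable": "$.status", "StringEquals": "ok", "Next": "Continue"}],
        "Default": "GiveUp",
    },
    # Runs every branch at the same time.
    "Parallel": {"Type": "Parallel", "Branches": [noop_branch, noop_branch], "End": True},
    # Runs the same sub-workflow once per item in $.items.
    "Map": {
        "Type": "Map",
        "ItemsPath": "$.items",
        "ItemProcessor": noop_branch,
        "End": True,
    },
    # Pauses for 30 seconds.
    "Wait": {"Type": "Wait", "Seconds": 30, "End": True},
    # Does no work; can optionally inject a fixed result.
    "Pass": {"Type": "Pass", "Result": {"note": "placeholder"}, "End": True},
    # Terminal states.
    "Succeed": {"Type": "Succeed"},
    "Fail": {"Type": "Fail", "Error": "SomethingBroke", "Cause": "Example failure"},
}
```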

3. Transitions 🔀

  • Define how the workflow moves from one state to the next.

  • Example: “After Task 1 → go to Task 2” OR “If error → retry or go to Fail state.”
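To see transitions in action without touching AWS, here is a toy walker that follows `Next`, `End`, and `Choice` rules locally. It is a teaching sketch, not how Step Functions is implemented: it only understands Pass, Choice (with a single `StringEquals` rule on a top-level field), Succeed, and Fail, and the `flow` definition is invented for the example. It needs Python 3.9+.

```python
def walk(definition: dict, data: dict) -> list[str]:
    """Follow transitions locally and return the names of the states visited."""
    visited = []
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        visited.append(name)
        kind = state["Type"]
        if kind in ("Succeed", "Fail"):
            return visited  # terminal states end the run
        if kind == "Choice":
            name = state["Default"]  # used when no rule matches
            for rule in state["Choices"]:
                field = rule["Variable"].removeprefix("$.")
                if data.get(field) == rule["StringEquals"]:
                    name = rule["Next"]
                    break
        elif state.get("End"):
            return visited
        else:
            name = state["Next"]  # plain "after this, go there"


flow = {
    "StartAt": "Task1",
    "States": {
        "Task1": {"Type": "Pass", "Next": "CheckResult"},
        "CheckResult": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.status", "StringEquals": "ok", "Next": "Task2"}],
            "Default": "Failed",
        },
        "Task2": {"Type": "Pass", "End": True},
        "Failed": {"Type": "Fail"},
    },
}

happy_path = walk(flow, {"status": "ok"})     # ['Task1', 'CheckResult', 'Task2']
error_path = walk(flow, {"status": "error"})  # ['Task1', 'CheckResult', 'Failed']
```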

4. Input & Output (State Data) 📦

  • Each state can receive input and produce output.

  • Step Functions passes this data along the workflow.

  • Example: Payment check → returns “approved/denied” → next step uses that info.
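One detail worth seeing in code is how a state's result is combined with its input. In ASL this is controlled by `ResultPath`. The helper below is a simplified imitation that only handles three cases (`"$"`, a single top-level key like `"$.payment"`, and `None`); `apply_result_path` and the order fields are hypothetical names for this sketch, and real ASL paths are more flexible.

```python
import copy


def apply_result_path(state_input: dict, result, result_path="$"):
    """Imitate ResultPath for '$', None, or a single top-level '$.key'."""
    if result_path is None:
        return copy.deepcopy(state_input)  # discard the result, keep the input
    if result_path == "$":
        return result                      # the result replaces the input
    if not result_path.startswith("$.") or "." in result_path[2:]:
        raise ValueError("this sketch only handles '$', None, or '$.key'")
    output = copy.deepcopy(state_input)
    output[result_path[2:]] = result       # attach the result under one key
    return output


order = {"orderId": "A-100", "amount": 89.0}
payment_result = {"status": "approved"}

# The next step sees the original order plus the payment decision.
merged = apply_result_path(order, payment_result, "$.payment")
# With the default "$", the next step sees only the payment result.
replaced = apply_result_path(order, payment_result)
```

With `"$.payment"`, the next state receives `{"orderId": "A-100", "amount": 89.0, "payment": {"status": "approved"}}`, so it can branch on the payment status while still knowing which order it belongs to.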

5. Error Handling & Retries ⚠️

  • Built-in system to retry failed steps, catch errors, and move to backup steps.

  • Example: If payment system fails → retry 3 times → if still fails → send alert.
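The "retry 3 times, then send alert" example maps onto the `Retry` and `Catch` fields of a Task state. Below is a sketch of such a state plus a small helper that works out how long the waits between retries would be. The ARN and the `SendAlert` and `ReserveInventory` state names are placeholders, and `retry_delays` is a hypothetical helper that ignores optional settings such as `MaxDelaySeconds` and jitter.

```python
payment_task = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:check-payment",
    # Retry up to 3 times, waiting longer each time.
    "Retry": [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }],
    # If it still fails, keep the error details and go to the alert step.
    "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "SendAlert",
    }],
    "Next": "ReserveInventory",
}


def retry_delays(retrier: dict) -> list[float]:
    """Seconds waited before each retry: interval * backoff_rate ** n."""
    interval = retrier.get("IntervalSeconds", 1)  # ASL default
    attempts = retrier.get("MaxAttempts", 3)      # ASL default
    rate = retrier.get("BackoffRate", 2.0)        # ASL default
    return [interval * rate ** n for n in range(attempts)]


delays = retry_delays(payment_task["Retry"][0])  # [2.0, 4.0, 8.0]
```

So in this sketch the payment check is retried after 2, 4, and 8 seconds, and only then does the `Catch` rule route the workflow to the alert step.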

6. Execution ▶️

  • Each run of the workflow is called an execution.

  • You can track the status (running, succeeded, failed).
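Here is a hedged sketch of starting an execution and polling its status. The function takes the client as a parameter, so in real use you would pass `boto3.client("stepfunctions")`; `run_and_wait`, the ARN, and the payload are illustrative names. Polling with `describe_execution` like this is aimed at Standard workflows.

```python
import json
import time

# Statuses after which an execution will not change any more.
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn, state_machine_arn: str, payload: dict,
                 poll_seconds: float = 2.0) -> str:
    """Start one execution and poll until it reaches a terminal status."""
    started = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # execution input must be a JSON string
    )
    while True:
        status = sfn.describe_execution(
            executionArn=started["executionArn"]
        )["status"]
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)  # still RUNNING, check again shortly


# Example usage (requires boto3 and AWS credentials; ARN is a placeholder):
#   import boto3
#   sfn = boto3.client("stepfunctions")
#   arn = "arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow"
#   print(run_and_wait(sfn, arn, {"orderId": "A-100"}))
```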

📖 Analogy

Think of Step Functions like a recipe book:

  • State Machine = the whole recipe.

  • States = individual steps (chop, cook, mix).

  • Transitions = what comes after chopping (cook).

  • Error Handling = if food burns, try again or order pizza 🍕.

  • Execution = each time you cook following the recipe.
