Database
What is a Database?
A database is just a place to store, organize, and retrieve data (like customer info, sales records, or inventory).
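In AWS, you don’t have to run the servers yourself: managed services such as RDS and DynamoDB handle provisioning, patching, and backups for you (you can still self-manage a database on EC2 if you want).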
Two Main Types of Databases
Relational Databases (SQL-based)
Think Excel tables: rows + columns.
Best when data has structure (like customer names, emails, transactions).
Examples in AWS:
Amazon RDS (Relational Database Service) → managed SQL databases.
Aurora → AWS’s high-performance relational database (compatible with MySQL & PostgreSQL).
Non-Relational Databases (NoSQL-based)
Think JSON documents or key-value pairs (like a giant dictionary).
Best when data is flexible, unstructured, or changes a lot.
Examples in AWS:
DynamoDB → super-fast key-value & document database.
DocumentDB → MongoDB-compatible.
Neptune → graph database (used for social networks, fraud detection).
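To make the difference concrete, here is a minimal local sketch. It assumes Python's built-in sqlite3 as a stand-in for a relational engine (RDS/Aurora) and a plain dict as a stand-in for a key-value store (DynamoDB). The table, keys, and field names are made up for illustration.

```python
import json
import sqlite3

# Relational: fixed columns enforced by a schema (sqlite3 stands in for RDS/Aurora).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))
row = conn.execute("SELECT name, email FROM customers WHERE id = 1").fetchone()

# Non-relational: a key maps to a free-form document (a dict stands in for DynamoDB).
carts = {}
carts["user#1"] = {"items": [{"sku": "boot-42", "qty": 1}]}
carts["user#2"] = {"items": [], "coupon": "WELCOME10"}  # extra field, no schema change

print(row)
print(json.dumps(carts["user#2"]))
```

Adding the coupon field needed no migration on the key-value side, while the relational table would need an ALTER TABLE to store it.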
AWS Database Services:
Here are the main AWS databases you’ll deal with:
Amazon RDS → Relational (SQL)
Supports MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Db2.
AWS handles backups, patching, scaling.
Amazon Aurora → Next-gen relational (SQL)
AWS positions it as faster than stock MySQL/PostgreSQL and cheaper than commercial databases (vendor claims, so benchmark your own workload).
Auto-healing + replication across multiple AZs.
Amazon DynamoDB → NoSQL (key-value & document)
Fully serverless (no servers to manage).
Single-digit millisecond response time.
Great for IoT, gaming, real-time apps.
Amazon Redshift → Data warehouse (analytics)
Stores massive data (terabytes/petabytes).
Used for business intelligence, dashboards, reports.
Amazon ElastiCache → In-memory (Redis & Memcached)
Super-fast caching to reduce database load.
Amazon DocumentDB → MongoDB-compatible (NoSQL docs).
Amazon Neptune → Graph database (relationships).
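Amazon QLDB → Ledger database (immutable, cryptographically verifiable audit log). AWS has announced end of support for QLDB, so check its status before choosing it for new projects.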
Amazon Timestream → Time-series database (IoT, metrics, logs).
How Data Flows in AWS (Simple Example)
Let’s imagine an online shoe shop (like yours 🥾):
Customer signs up → Stored in RDS (SQL).
Shopping cart data → Stored in DynamoDB (NoSQL) for speed.
Trending products dashboard → Uses Redshift for analytics.
Hot products cache → Stored in ElastiCache for fast delivery.
Quick Mapping
Transactions, Orders → RDS / Aurora
User Profiles, Cart → DynamoDB
Analytics & Reports → Redshift
Caching Fast Results → ElastiCache
Time-Series Logs/IoT → Timestream
Fraud/Social Graph → Neptune
Audit Trail → QLDB
What is a Cluster
A cluster is a group of computers (servers) that work together as if they were one.
Instead of putting all the load on a single machine, you use multiple machines that share the work.
👉 Think of it like a team of chefs in a kitchen:
If one chef cooks everything, it’s slow and risky (if they get sick, the restaurant shuts).
A team (cluster) of chefs can handle more customers, and if one chef is out, others still keep the kitchen running.
Why Clusters Are Used
High Availability (HA) → If one server fails, another in the cluster takes over.
Scalability → Can add more servers to handle more load.
Performance → Work is distributed among servers.
Clusters in AWS Databases
When AWS talks about clusters, it usually means a set of database instances working together:
Amazon Aurora Cluster (part of the RDS family)
Aurora doesn’t just run on one server.
It creates a cluster with:
Writer Node → Handles all the writes (inserts/updates).
Reader Nodes (Aurora Replicas) → Handle read queries.
The shared storage volume automatically keeps six copies of the data across three Availability Zones.
Amazon Redshift Cluster
A Redshift cluster is made of multiple nodes:
Leader Node → Handles queries & coordinates.
Compute Nodes → Store & process the data.
Amazon ElastiCache Cluster
Multiple cache nodes working together for high speed.
Amazon Neptune Cluster
Graph DB cluster with a writer + read replicas.
Real-World Example
Let’s say your retail shoe shop has an app:
If one database server runs everything:
Too many customers = slow.
If it crashes = your app is down.
If you use an Aurora Cluster:
1 Writer Node → saves new orders.
3 Reader Nodes → customers browse products without slowing down checkout.
If the writer crashes → AWS promotes a reader to become the new writer.
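The failover step can be pictured with a toy model. This is only a sketch: the Instance and Cluster classes are invented for illustration, and it promotes the first healthy reader, whereas Aurora chooses based on promotion priority tiers.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    role: str  # "writer", "reader", or "failed"
    healthy: bool = True


@dataclass
class Cluster:
    instances: list = field(default_factory=list)

    def writer(self):
        """Return the current healthy writer."""
        return next(i for i in self.instances if i.role == "writer" and i.healthy)

    def readers(self):
        """Return all healthy readers."""
        return [i for i in self.instances if i.role == "reader" and i.healthy]

    def fail_writer(self):
        """Mark the writer as failed and promote the first healthy reader."""
        old = self.writer()
        old.healthy = False
        old.role = "failed"
        new = self.readers()[0]
        new.role = "writer"
        return new


cluster = Cluster(
    [Instance("db-1", "writer"), Instance("db-2", "reader"),
     Instance("db-3", "reader"), Instance("db-4", "reader")]
)
promoted = cluster.fail_writer()
print(promoted.name, len(cluster.readers()))
```

After the failover the cluster still has one writer, just with one fewer reader until the failed instance is replaced.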
IOPS (Input/Output Operations Per Second)
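What it means → How many read/write operations your database storage can perform each second.
Think of it like: how many times per second you can fetch or return a book in a library.
Higher IOPS = more storage operations per second, which usually means faster queries when the disk is the bottleneck.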
In AWS RDS:
General Purpose (gp3/gp2) → Balanced performance (good for most apps).
Provisioned IOPS (io1/io2) → High-performance (critical apps like finance, gaming).
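A little arithmetic helps when sizing storage. The sketch below assumes the published gp2 rule of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000; treat those figures as assumptions to verify against current AWS documentation. The function names are made up for this example.

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to the 100..16,000 range."""
    return min(max(3 * size_gib, 100), 16_000)


def throughput_mib_per_s(iops, io_size_kib):
    """Throughput implied by an IOPS figure at a given I/O size."""
    return iops * io_size_kib / 1024


for size in (20, 500, 10_000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")

# 3,000 IOPS at 16 KiB per operation moves about 46.9 MiB every second.
print(throughput_mib_per_s(3000, 16))
```

This is why a small gp2 volume can feel slow: its baseline IOPS depends on its size, while gp3 and Provisioned IOPS let you set performance separately from capacity.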
Primary / Secondary
Primary (Writer) → The main database instance that accepts writes + reads.
Secondary (Replica/Standby) → Copies of the database:
Used for read-only traffic (read replicas).
Or failover (standby in Multi-AZ).
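👉 If the Primary goes down → AWS automatically promotes a Multi-AZ standby (or an Aurora reader). A standard RDS read replica is not promoted automatically; you have to promote it yourself.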
Endpoints
Endpoints = connection addresses for your app.
Cluster Endpoint (Writer Endpoint) → Always points to the primary/writer.
Reader Endpoint → Load-balances connections across the read replicas.
Instance Endpoint → Direct access to a specific DB instance.
👉 Why?
Your app doesn’t need to know which DB is writer/reader.
AWS handles rerouting during failover.
Read vs Write
Write = Insert/Update/Delete (changing data).
Read = SELECT queries (fetching data).
In clusters, AWS often splits read & write:
Writer handles writes.
Readers handle reads → faster performance.
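In application code, the read/write split usually comes down to choosing which endpoint to connect to. A minimal sketch, assuming two hypothetical endpoint hostnames and a deliberately naive rule (statements starting with SELECT go to the reader):

```python
# Hypothetical hostnames; real ones come from your cluster's details page.
WRITER_ENDPOINT = "shoeshop.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "shoeshop.cluster-ro-example.us-east-1.rds.amazonaws.com"


def endpoint_for(sql):
    """Pick an endpoint from the first keyword of the statement."""
    words = sql.split()
    first_word = words[0].upper() if words else ""
    # Anything that is not clearly a read goes to the writer, which is always safe.
    return READER_ENDPOINT if first_word == "SELECT" else WRITER_ENDPOINT


print(endpoint_for("SELECT * FROM shoes WHERE size = 42"))
print(endpoint_for("INSERT INTO orders (shoe_id) VALUES (7)"))
```

Real applications need more care than this: a SELECT ... FOR UPDATE, or a read that must see a write made a moment ago, should still go to the writer because readers can lag slightly.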
Replication / Synchronization
AWS keeps copies of your data across multiple DB instances or AZs.
Synchronous Replication → Data is written to both primary + standby at the same time (used in Multi-AZ for high availability).
Asynchronous Replication → Data is copied later (used in Read Replicas, can have slight lag).
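The difference between the two modes is easiest to see in a toy model. This sketch is not how RDS is implemented; the Primary class and its method names are invented to show why a read replica can briefly return stale data while a standby cannot.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.standby = {}        # synchronous copy (like a Multi-AZ standby)
        self.replica = {}        # asynchronous copy (like a read replica)
        self._pending = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        self.standby[key] = value           # applied before the write is acknowledged
        self._pending.append((key, value))  # shipped to the replica later

    def replicate_async(self):
        """Apply queued changes to the replica (this delay is the replica lag)."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


db = Primary()
db.write("order-1", "paid")
print(db.standby.get("order-1"), db.replica.get("order-1"))  # standby has it, replica not yet
db.replicate_async()
print(db.replica.get("order-1"))  # replica has caught up
```

The synchronous path costs a little write latency in exchange for never losing an acknowledged write on failover.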
Snapshot
A backup copy of your database at a point in time.
You can:
Restore to a new DB.
Automate backups with RDS.
Snapshots can be:
Automated (daily by AWS).
Manual (you trigger).
Encrypted snapshots → Keep your data secure.
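The "restore to a new DB" idea can be tried locally. This sketch assumes sqlite3's backup API as an analogy for an RDS snapshot: the copy is frozen at the moment it was taken, and restoring gives you a separate database rather than rewinding the live one.

```python
import sqlite3

# The "live" database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
live.execute("INSERT INTO orders (status) VALUES ('paid')")
live.commit()

# Take a point-in-time copy into a separate database (the "snapshot").
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# Later, something goes wrong on the live database.
live.execute("DELETE FROM orders")
live.commit()

restored_rows = snapshot.execute("SELECT id, status FROM orders").fetchall()
live_rows = live.execute("SELECT id, status FROM orders").fetchall()
print(restored_rows, live_rows)
```

The snapshot still holds the order even though the live table is now empty.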
RDS Custom
Special RDS flavor (available for Oracle and SQL Server) where you get access to the OS and database software, which standard RDS locks down.
Useful if you need custom configurations, agents, or security tools.
Example → Oracle DB with custom monitoring tools.
Authentication Methods
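IAM Authentication → Connect with a short-lived IAM authentication token instead of a stored password (supported for the MySQL, MariaDB, and PostgreSQL engines).
Kerberos → Integrates with Microsoft Active Directory (typically via AWS Directory Service).
SAML (Security Assertion Markup Language) → Single Sign-On (SSO) federation: users sign in through an identity provider and assume an IAM role, rather than authenticating to the database directly.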
Database Native Authentication → Traditional username/password.
Storage Types
Magnetic (Standard) → Previous-generation storage kept for backward compatibility; not recommended for new workloads.
gp2/gp3 (General Purpose SSD) → Balanced performance.
io1/io2 (Provisioned IOPS) → High-performance workloads.
Performance Insights (PI)
AWS tool to monitor DB performance.
Shows which queries are slow, who is consuming resources, etc.
Security Features
Encryption at rest → Data is encrypted on disk (KMS).
Encryption in transit → Data encrypted via SSL/TLS.
Security Groups & VPC → Network access controls.
SAML / IAM / AD → Identity + Access management.
Putting It All Together (Example)
Imagine your shoe shop database running on Amazon Aurora:
Aurora Cluster → 1 Primary (Writer) + 3 Readers.
Cluster Endpoint → App writes orders to the primary.
Reader Endpoint → Customers browsing shoes hit read replicas.
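Replication → Aurora storage keeps six copies of the data across 3 AZs; readers share that storage, so their lag behind the writer is usually very small.
Backups → Daily automated snapshots plus continuous backups allow point-in-time recovery within the retention period.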
Performance Insights → Monitors slow queries.
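IOPS → Aurora manages its own storage, so there is no gp3/io1 choice; for I/O-heavy checkout traffic, consider the Aurora I/O-Optimized configuration (Provisioned IOPS applies to non-Aurora RDS engines).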
Security → IAM auth + KMS encryption + private VPC.
Quick Recap
IOPS = read/write operations per second.
Primary/Secondary = writer vs replicas.
Cluster = group of DB instances.
Endpoints = connection points (writer, reader, instance).
Read/Write split = performance optimization.
Replication/Synchronization = HA + scaling.
Snapshots = backups.
RDS Custom = advanced OS/DB control.
SAML/IAM/Kerberos = authentication.
ACID and BASE

ACID (Atomicity, Consistency, Isolation, Durability) is mostly used by relational databases.

BASE (Basically Available, Soft state, Eventually consistent) is mostly used by non-relational (NoSQL) databases.
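The "A" in ACID (atomicity) is the easiest property to demonstrate. A minimal sketch, assuming sqlite3 as a stand-in for a relational engine; the accounts table and transfer function are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500), balances(conn))  # rejected, nothing changed
print(transfer(conn, "alice", "bob", 30), balances(conn))   # applied as one unit
```

In the failed transfer, bob's credit was already applied when alice's debit broke the constraint, and the rollback undid it. A BASE-style store would typically accept the writes and converge later instead.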

Running a DBMS Directly on EC2: Why and Why Not
Why you might do it

Why you shouldn’t do it

RDS
The Relational Database Service (RDS) is a database-server-as-a-service product from AWS which lets you create managed database instances.

RDS-Architecture

Relational Database Service (RDS) Multi-AZ: Instance and Cluster
Multi-AZ is a feature of RDS which provisions a highly available set of instances.
Backups, software updates, and restarts can take advantage of Multi-AZ to reduce user disruption.


In a Multi-AZ instance deployment, a standby replica is kept in sync synchronously with the primary instance. The standby replica cannot be used for performance scaling, only for availability.
RDS also provides a Multi-AZ cluster mode, where one writer and two reader instances are kept in sync (AWS describes this replication as semisynchronous: a write is acknowledged once at least one reader confirms it). The reader instances can be used for read operations, allowing limited read scaling.

Aurora Architecture
Aurora is an AWS-designed database engine that is officially part of RDS.
Aurora implements a number of radical design changes which offer significant performance and feature improvements over other RDS database engines.
Architecture


Aurora Serverless

Aurora global databases
Aurora global databases are a feature of Aurora Provisioned clusters which allow data to be replicated globally, providing significant RPO and RTO improvements for business continuity (BC) and disaster recovery (DR) planning. Global databases can also improve performance for customers, because a read-only copy of the data sits closer to them.
Replication occurs at the storage layer and typically takes about 1 second or less between AWS Regions.
Multi-master writes
Multi-master write is a mode of Aurora Provisioned clusters which allows multiple instances to perform reads and writes at the same time, rather than only one primary instance having write capability as in a single-master cluster. Because several instances can write, the cluster needs a way to resolve conflicting writes. Note that this mode was only offered for older Aurora MySQL versions, so check current availability before planning around it.
Relational Database Service (RDS) - RDS Proxy
Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure.
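Much of the scalability benefit comes from connection pooling: many short-lived application requests share a small set of long-lived database connections. The sketch below is a generic pool, not RDS Proxy's implementation; the ConnectionPool class is invented here and sqlite3 stands in for the real database.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hands out a fixed set of reusable connections, like a proxy in front of the database."""

    def __init__(self, connect, size):
        self.opened = 0
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # open every connection once, up front
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next caller


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
for _ in range(100):  # 100 "requests" reuse the same 2 connections
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
print(pool.opened)
```

This matters most for workloads like Lambda functions, where each invocation would otherwise open its own database connection.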
RDS Proxy Architecture

Database Migration Service (DMS)
The Database Migration Service (DMS) is a managed service which allows zero-data-loss, low- or zero-downtime migrations between two database endpoints. The service can move databases INTO or OUT of AWS.
