
9 AWS Backup Best Practices: Fix Coverage Gaps, Cut Costs, and Recover Faster

Learn AWS Backup best practices from real-world environments that help you fix coverage gaps, cut costs, and recover faster at scale.

Written by Andrew Coleman
Last updated: Apr 23, 2026

Quick Summary

  • Tier workloads by RPO and RTO before writing any policy, so you don't overprotect cheap data or underprotect critical systems.
  • Audit actual coverage across accounts and automate policy assignment by data type, because manual tagging breaks at scale.
  • Store backups in a separate account with immutable, encrypted vaults so compromised credentials can't reach your recovery points.
  • Test granular recovery monthly and continuously monitor for policy drift instead of just when an audit or incident forces the question.

Most AWS Backup best practices break down at scale. I've seen teams assume they're covered, until a restore fails or a critical resource turns out to have no protection at all.

AWS Backup best practices that hold up at scale

These AWS Backup best practices address the weak spots I’ve seen repeatedly in multi-account environments.

1. Tier your workloads by RPO and RTO before writing any policy

Define RPO (how much data you can lose) and RTO (how long you can be down) per workload tier before configuring a single backup plan.

Most teams define RPO and RTO. The problem is that they apply the same targets to everything. A production Aurora database and a dev EC2 instance don't carry the same risk, and treating them identically means you're overprotecting cheap data and leaving critical systems exposed.

Here's how to break workloads into tiers before writing a single policy:

| Tier | Example Workloads | RPO | RTO | Backup Approach |
|---|---|---|---|---|
| Critical | Prod databases (RDS, Aurora, DynamoDB), payment systems | Minutes | Under 1 hour | Continuous or hourly backups |
| Important | App servers (EC2), core EFS shares | 1 to 4 hours | 2 to 4 hours | Every 4 to 6 hours |
| Standard | Dev/test, logs, non-critical S3 | 24 hours | 24 hours | Daily snapshots |

AWS Backup supports continuous backup for RDS (point-in-time recovery down to the second). For EC2 and EBS, you’re limited to scheduled snapshots. 

Testing your RTOs proves that the tiering works. If your plan says 1-hour recovery and your last drill took 3 hours, fix it before something breaks. Run restore drills against your stated RTOs at least quarterly and include granular recovery scenarios, not just full-instance restores. 

Most teams drill full-instance recovery and pass, then discover during an actual incident that no one has a runbook for restoring a single table or file.
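The tier table above is most useful when it lives in one place that both backup plans and audit scripts read from. A minimal sketch — the tier names, RPO/RTO values, and schedule labels are illustrative, not prescriptive:

```python
# Sketch: encode the tier table as one lookup so every backup plan and
# audit script reads the same targets. Values are illustrative.
TIERS = {
    "critical":  {"rpo_min": 60,   "rto_min": 60,   "schedule": "hourly"},
    "important": {"rpo_min": 240,  "rto_min": 240,  "schedule": "every-4-hours"},
    "standard":  {"rpo_min": 1440, "rto_min": 1440, "schedule": "daily"},
}

def backup_schedule(tier: str) -> str:
    """Return the backup frequency for a tier; unknown tiers fall back to
    the strictest schedule rather than silently going unprotected."""
    return TIERS.get(tier, TIERS["critical"])["schedule"]
```

The fallback choice matters: defaulting an unclassified workload to the critical schedule costs a little storage, while defaulting it to "standard" quietly recreates the underprotection problem this section is about.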

2. Audit what’s actually protected (Most teams don’t know)

Most AWS environments have unprotected resources nobody knows about. Audit your actual coverage before assuming your backup strategy is complete.

AWS Backup doesn’t auto-discover all newly created resources without configuration. If someone on your team spins up a new RDS instance, creates an S3 bucket, or launches EC2 instances without the right tags, those resources have zero backup coverage. 

The problem grows fast at scale. At companies running dozens or hundreds of AWS accounts, anyone with account access can create resources. Eon's State of Cloud Data Backup report found that 51% of organizations still rely on manual or semi-automated backup processes, and that's exactly where coverage gaps appear.

What AWS gives you natively:

  • Backup Audit Manager flags resources not assigned to backup plans
  • AWS Config rules detect untagged resources (requires custom setup)

Both help, but both require manual configuration and only work within the AWS Backup scope. Neither auto-classifies data by type or sensitivity.

The gap: you can check what's protected today. You have no native way to catch what becomes unprotected tomorrow. If you can't prove coverage continuously, you don't really have coverage.
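The audit itself reduces to a set difference once you have two inventories: everything that exists (from each service's describe/list calls or AWS Config) and everything AWS Backup reports as protected (its ListProtectedResources API). A minimal sketch with made-up ARNs:

```python
def unprotected(inventory: set, protected: set) -> set:
    """Resources that exist but have no recovery points: the coverage gap.
    Run this on a schedule; a one-time audit goes stale immediately."""
    return inventory - protected

# Illustrative ARNs only -- in practice these sets come from API calls.
inventory = {
    "arn:aws:rds:us-east-1:111111111111:db:orders",
    "arn:aws:ec2:us-east-1:111111111111:instance/i-0abc",
    "arn:aws:s3:::customer-exports",
}
protected = {"arn:aws:rds:us-east-1:111111111111:db:orders"}
gaps = unprotected(inventory, protected)  # the two uncovered resources
```

The hard part isn't the diff; it's keeping the `inventory` side complete across every account, region, and service where someone can create resources.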

3. Automate policy assignment and drift detection by data type

Manual tagging doesn’t hold up when resources are created across multiple accounts and teams. Automate policy assignment based on what the data is, and monitor it continuously so coverage doesn’t drift over time.

The standard AWS approach is to tag resources as "gold," "silver," or "bronze" and map those tags to backup plans. The approach only works if tagging is consistent, and it rarely is.

The problem goes deeper than missed tags at resource creation. Tags don't follow data. A resource correctly tagged as 'bronze' when it contains dev data can remain tagged that way months later, even when it's running a production workload with customer records. 

If auto-classification isn't an option yet, at a minimum standardize backup tags across teams (for example, backup-tier:gold) and enforce them with AWS Organizations tag policies. This helps, but it doesn’t eliminate the problem.
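One way to enforce that standard is an AWS Organizations tag policy. A sketch — the backup-tier key, the gold/silver/bronze values, and the enforced resource types are all examples to adapt:

```json
{
  "tags": {
    "backup-tier": {
      "tag_key": { "@@assign": "backup-tier" },
      "tag_value": { "@@assign": ["gold", "silver", "bronze"] },
      "enforced_for": { "@@assign": ["ec2:instance", "rds:db"] }
    }
  }
}
```

A policy like this standardizes the key's casing and restricts the allowed values, but it can't detect a resource whose tag is valid and wrong, which is exactly the drift problem described above.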

The more durable fix is to classify resources based on the data they contain: PII, financial records, and production workloads. Unlike tags, the data type doesn’t drift.

The category answer to this is cloud backup posture management (CBPM): continuous discovery, classification, and policy enforcement that doesn't depend on tagging staying accurate.

Platforms like Eon deploy agentlessly, autonomously discover resources across accounts and regions, classify them by data type, and enforce backup policies without manual tagging. 

That shift matters because it changes what "protected" means. Instead of trusting that the right tags exist on the right resources, you get a continuous answer to three questions that define backup confidence: what's protected, what's drifting, and how cleanly you can recover. 

Critical data gets high-frequency, cross-region protection. Lower-risk workloads receive lighter coverage. Coverage gaps surface in real time, not during an incident.

4. Store backups in a separate account (Region alone isn’t enough)

Cross-region backups protect against regional outages. Cross-account backups protect against compromised credentials, and that’s the threat most teams underestimate.

The AWS Well-Architected Framework treats accounts as security boundaries. If your production data and backup data share the same account, a compromised credential gives an attacker access to both. 

Ransomware attacks target backups directly because they’re the recovery path.

To reduce that risk, set up the following:

  • Dedicated backup account: Isolate it from all production accounts with separate IAM roles, MFA, and minimal access
  • Cross-region copies for critical tiers: Replicate high-priority data to a second region; keep lower tiers in-region to control costs
  • AWS Backup Vault Lock: Enforce WORM (Write Once Read Many) on vaults in the backup account. Once locked, nobody can delete or modify the data.

Cross-region replication adds data transfer costs, so reserve it for critical tiers. Primary backups in-region carry no transfer fee; cross-region copies only need to cover the data you genuinely can't afford to lose in a regional failure.
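Wired together, a critical-tier plan with an offsite copy looks roughly like the backup plan document below. The vault names, account ID, region, and retention values are placeholders; the copy's destination vault lives in the dedicated backup account:

```json
{
  "BackupPlanName": "critical-tier",
  "Rules": [
    {
      "RuleName": "hourly-with-offsite-copy",
      "TargetBackupVaultName": "prod-vault",
      "ScheduleExpression": "cron(0 * * * ? *)",
      "Lifecycle": { "DeleteAfterDays": 35 },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:us-west-2:222222222222:backup-vault:backup-account-vault",
          "Lifecycle": { "DeleteAfterDays": 35 }
        }
      ]
    }
  ]
}
```

The copy action is what crosses the account boundary: the destination vault's access policy in the backup account has to allow the copy, which is the same isolation that keeps compromised production credentials out.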

5. Make backups immutable and encrypted

Immutability stops backup data from being deleted or modified. Encryption prevents it from being read if it's exfiltrated. You need both. Immutability alone won't help if an attacker copies your data before you notice, and encryption alone won't help if they delete the backups entirely.

Immutability

AWS Backup Vault Lock has two modes, and picking the wrong one has real consequences:

  • Governance mode: Admins with specific IAM permissions can still modify or remove the vault lock settings. Use this for non-critical tiers where you might need to adjust retention policies.
  • Compliance mode: Once applied, nobody can remove or modify the lock, not even admins or the root account. It's irreversible. Use this for critical data, but confirm retention settings first.
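Applying a lock is a single configuration call. A hedged sketch (the vault name and retention windows are placeholders): omitting --changeable-for-days keeps the lock in governance mode; including it starts a grace period, after which the lock becomes compliance mode.

```shell
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name backup-account-vault \
  --min-retention-days 35 \
  --max-retention-days 365 \
  --changeable-for-days 3   # after 3 days, the lock is irreversible
```

The grace period is your last chance to catch a wrong retention value, so treat setting --changeable-for-days as a change that needs review, not a routine deploy.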

AWS also introduced logically air-gapped vaults (GA in August 2024). These store backup data in a separate, AWS-managed account that’s completely isolated from your environment, and restores can require multi-party approval (MPA) when it's enabled.

Encryption

Backed-up data must be encrypted at rest and in transit. AWS Backup encrypts data at rest in vaults, but how you manage the keys determines how much control you have:

  • AWS-managed keys (default): AWS creates, manages, and rotates the key for you. Simple, but limited control.
  • Customer-managed KMS keys: You create and control the key through AWS KMS. You set the rotation policy, manage grants, and can audit usage through CloudTrail, which is the right choice for regulated environments.

Use customer-managed KMS keys for critical and production-tier backup vaults and rotate them regularly. Verify that transit encryption (TLS) is active on cross-region and cross-account copies.

If the irreversibility of compliance mode poses a risk to your team, Eon's logically air-gapped vaults let you maintain isolation while keeping the vault within your own environment. 

Production credentials can't reach it, but your team isn't locked into settings you can't revisit as requirements change.

6. Cut backup costs without cutting retention

Lifecycle rules help, but real cost reduction comes from deduplication, compression, and platform-level incremental backups.

EBS snapshots cost about $0.05/GB per month. S3 Standard drops to about $0.023/GB per month, while S3 Glacier Deep Archive falls below $0.001/GB per month. (US-East-1 pricing; varies by region.)

The difference between these tiers is significant, yet many teams keep data in higher-cost storage longer than necessary.

Consider an environment with 500 EC2 instances, each with 200 GB volumes and 30 days of daily snapshots. With a typical 3-5% daily change rate, storage costs can exceed $100,000 per year.
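The arithmetic behind that estimate is worth running against your own numbers. A back-of-envelope sketch, assuming approximate us-east-1 snapshot pricing and a 4% daily change rate:

```python
# Illustrative inputs from the scenario above; swap in your own.
instances = 500
volume_gb = 200
daily_change = 0.04        # 3-5% typical; 4% assumed here
retention_days = 30
price_gb_month = 0.05      # approximate us-east-1 EBS snapshot price

base_gb = instances * volume_gb                        # first full snapshot
daily_incremental_gb = base_gb * daily_change          # changed blocks per day
stored_gb = base_gb + daily_incremental_gb * (retention_days - 1)
annual_cost = stored_gb * price_gb_month * 12          # about $129,600
```

Note that the incrementals, not the base, dominate over time: at this change rate the 29 days of deltas exceed the full snapshot itself, which is why deduplication and tiering move the bill more than trimming the base volume size.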

Storage tier pricing is only part of the cost problem. The harder part is that AWS Backup offers almost no visibility into why costs are increasing. 

Teams running major database migrations have watched backup costs spike, with no breakdown from AWS on what drove the increase. You see the bill go up, but not which resources, policies, or data types are responsible for it.

AWS Backup helps with:

  • Lifecycle policies that transition backups from warm to cold storage after a set number of days
  • Incremental EBS snapshots (only changed blocks after the first full snapshot)

Where native tooling stops short:

  • Cross-resource deduplication (if 100 instances share 80% of the same data, you’re storing 100 near-identical copies)
  • Advanced compression at the platform level
  • Intelligent tiering based on access patterns or data type

Solutions like Eon handle deduplication and compression at the platform level across all workloads. That's the mechanism behind the 40-50% storage cost reduction customers see with Eon. In the meantime, setting lifecycle rules to transition non-critical backups to cold storage after 30 days is the fastest native win. 

7. Go beyond full snapshots: Test granular recovery

Most teams focus on full-instance restores. In reality, most recovery requests involve a single file, table, or record, and native AWS snapshots make that harder than it should be.

Day-to-day failures are rarely full outages. A config file gets deleted. A deployment corrupts a table. You don’t need to restore an entire EC2 instance; you need the one thing that broke.

Here’s what that looks like with native snapshots:

Restoring a single file from an EBS snapshot means spinning up a new EC2 instance, attaching the volume, mounting it, finding the file, extracting it, and cleaning up. That’s 10+ manual steps for one file. RDS point-in-time recovery creates an entirely new database instance.
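For scale, here is the rough shape of that manual path — every ID, availability zone, device name, and file path below is a placeholder, not a runbook:

```shell
aws ec2 create-volume --snapshot-id snap-0123abcd --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0456efgh --instance-id i-0789ijkl --device /dev/sdf
# then, on the instance:
sudo mkdir -p /mnt/restore && sudo mount /dev/xvdf1 /mnt/restore
cp /mnt/restore/etc/app/config.yml /tmp/recovered-config.yml
sudo umount /mnt/restore
# and back on your workstation, the cleanup:
aws ec2 detach-volume --volume-id vol-0456efgh
aws ec2 delete-volume --volume-id vol-0456efgh
```

Each of those steps can fail independently (wrong AZ, device naming differences between instance types, a forgotten detach that keeps billing), which is why this belongs in a tested runbook rather than someone's memory.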

What your restore testing should cover:

  • Full instance recovery for actual disasters (quarterly)
  • Single file or folder recovery for the most common cases (monthly)
  • Database record or table-level recovery for application-level issues (monthly)
  • Last clean point-in-time recovery for ransomware scenarios

Document runbooks for each scenario, including who can initiate restores, where backups live, and expected recovery times. Keep them accessible outside AWS in case your environment is unavailable.

8. Build for multi-cloud from day one

If your organization runs workloads across multiple clouds, your backup strategy needs to cover them all. AWS Backup covers AWS only.

Even if you’re AWS-only today, that rarely stays true. If a team adopts BigQuery on Google Cloud or spins up Azure SQL for a specific workload next year, AWS Backup won’t cover it. Now you’re managing multiple tools and policies, and losing a unified view of what’s protected.

Even within AWS, multi-account setups create fragmentation. Multiply that across clouds, and consistency becomes harder to maintain.

The goal is consistency across environments: pick tools and design policies that can extend beyond AWS, even if your current footprint is single-cloud. This avoids the fragmentation that comes with organic multi-cloud growth.

9. Turn backup data into something useful

Backup data shouldn't be trapped in cold storage waiting for a disaster. It should be searchable, queryable, governed, usable for compliance, investigations, and analytics without needing to restore anything first.

Your backup data contains years of historical records, compliance evidence, transactional data, and operational context. If the only way to access any of it is to restore a full environment, that data is effectively locked away.

The problem shows up in common situations:

  • GDPR or compliance audit: A request comes in for all records tied to a specific customer. With native tools, you’d restore each relevant backup, search for the records, export them, and tear down the restored environment. That’s hours of work for a single request.
  • Internal investigation or legal hold: You need records from 18 months ago. The production data has long since been overwritten.
  • Analytics on historical data: Training ML models on historical transaction patterns or building trend reports from data that’s aged out of production.

Native AWS Backup wasn't designed for this. Accessing backed-up data requires restoring it first.

Eon was built for it. Backup data is instantly searchable and queryable, so compliance requests, investigations, and analytics don't require restoring an environment first. It turns backup from an insurance policy into infrastructure your team can actually use.

5 AWS Backup mistakes that get expensive

These are the mistakes that show up when those best practices aren’t followed.

Snapshot-only strategies that fail at recovery time

A team runs daily EBS snapshots and assumes they're covered. Then someone deletes a single config file, and the only recovery option is to restore the entire volume, attach it to a new instance, and manually extract the file. What should take 5 minutes takes half a day, and the application stays down the whole time.

Dev-tier policies on production databases

A production RDS instance inherits a daily backup policy that was designed for dev environments. The database corrupts mid-afternoon. The last backup was 18 hours ago. Eighteen hours of transaction data, gone. The fix costs nothing; the policy needed to be hourly. The data loss costs a lot more.

Untagged resources that nobody knows are exposed

A contractor spins up three EC2 instances for a migration project and doesn't apply backup tags. Six months later, one of those instances is running a production workload with customer data. It fails. There's no backup. The team finds out during the incident, not before it.

Ransomware that reaches the backup account

An attacker compromises an admin account that has access to both production and the backup vault in the same account. They encrypt production data and delete the recovery points in the same session. The team has nothing to restore from. A separate backup account with its own IAM roles and immutable vaults would have kept the recovery points untouched.

A restore test that never tested what actually breaks

The team runs quarterly full-instance restore drills and passes every time. Then a bad deployment corrupts a single DynamoDB table. Nobody has ever tested table-level recovery. The runbook doesn't exist. The team spends 4 hours figuring out the process while the application serves stale data to customers.

When native AWS Backup stops being enough

AWS Backup handles the fundamentals well: centralized policies, cross-region copies, vault encryption, and lifecycle rules. For single-account, single-region setups, it does what it's supposed to do.

The strain shows up as environments grow. More accounts, more regions, eventually multiple clouds, and coverage becomes harder to track, recovery gets more complex, and costs climb in ways lifecycle rules alone can't fix.

At scale, cloud backup becomes a posture problem. You need visibility into what's protected, enforcement that holds as environments change, and confidence that recovery will actually work when something breaks.

Eon is built to that standard. It's a cloud-native, agentless platform that pairs autonomous backup with Cloud Backup Posture Management (CBPM):

  • Visibility and enforcement without manual tagging. Eon autonomously discovers and classifies resources across accounts and clouds, then enforces policies continuously. AlphaSense used this approach to protect petabytes of AWS data, finished the initial backup in three days, and reached production in 25 days, a timeline that isn't possible when every resource needs manual tagging.
  • Recovery that actually meets your RTO. Granular recovery at the file, record, or table level means you don't have to rebuild an entire environment to restore a single item. SoFi ran a five-region AWS environment where manual updates and native snapshots created gaps and recovery delays. With Eon, recovery went from a day to minutes, and retention changes moved from hours or days to seconds.
  • Real-time cost and spend visibility. Deduplication and compression cut storage costs, but the bigger shift is knowing what's driving the bill. NETGEAR had poor visibility into backup spend and 10TB SQL restores that could take up to 24 hours. With Eon, they cut backup storage costs by 35%, gained real-time spend visibility, and reduced restore time to under three hours.

Can you confidently answer what’s protected across your environment right now? Book a demo with Eon to get real-time visibility into coverage, recovery, and cost across all your accounts and clouds.

Frequently asked questions

What is AWS Backup?

AWS Backup is a fully managed service that centralizes and automates data protection across AWS services, including EC2, RDS, DynamoDB, EFS, S3, Aurora, and FSx. It lets you create backup plans with scheduled frequencies and retention policies, store backups in encrypted vaults, and copy data across regions and accounts.

What are the most important AWS Backup best practices?

The most important AWS Backup best practices are tiering workloads by RPO and RTO, auditing coverage to identify unprotected resources, enforcing immutable backups in isolated accounts, testing granular recovery, and monitoring for policy drift continuously. 

Does AWS Backup support file-level or record-level recovery?

No, AWS Backup does not support native file-level or record-level recovery. Restoring a single file from an EBS snapshot requires spinning up a new instance, mounting the volume, and manually extracting the file. RDS point-in-time recovery creates an entirely new database instance.

How much does AWS Backup cost per GB?

AWS Backup storage costs vary by storage type. EBS snapshots cost about $0.05/GB/month, S3 Standard about $0.023/GB, and Glacier Deep Archive under $0.001/GB/month. Additional costs include cross-region data transfer fees and API call charges.

How do I find unprotected resources in my AWS environment?

AWS Config rules and Backup Audit Manager can detect resources not covered by backup plans, but both require manual setup. For continuous, automated coverage audits across multiple accounts and regions, cloud backup posture management (CBPM) tools discover and classify resources without relying on manual tagging.

Is AWS Backup enough for multi-cloud environments?

No, AWS Backup only covers AWS services. Organizations running workloads on Google Cloud, Azure, or hybrid infrastructure need a platform like Eon that provides unified policy management, cross-cloud visibility, and cross-cloud backup and recovery, with CBPM enforcing coverage continuously.

Andrew Coleman
Head of CS & Support at Eon


Turn your backups into usable data

Eon turns your backups into instantly searchable, usable data so you can recover exactly what you need without delays.

  • Instantly search backup data
  • Recover at any level
  • No full restores or downtime
See Eon in action