RMAN Backup Failure Triage Runbook: ORA-19566, ORA-27030 | TuneVault
💾 Backup

RMAN Backup Failure Triage Runbook

โฑ 30โ€“90 min Oracle 11g / 12c / 19c Updated 2026-05-15
๐Ÿ”ด Not sure if you have a valid backup?
Jump to Step 6: Validate Recoverability first. Knowing whether you have a usable backup is more important than understanding why the last job failed.
In this runbook
  1. Read the RMAN error correctly
  2. Common errors and their fixes
  3. Channel and device errors
  4. Catalog and crosscheck issues
  5. Space-related failures
  6. Validate recoverability
  7. Prevention and monitoring

1. Read the RMAN Error Correctly

RMAN error messages are stacked. RMAN prints them in the order it raises them, so the most informative message is almost always the last one in the stack, not the first. When you see a wall of RMAN- and ORA- errors, scroll to the bottom of the RMAN output and read upward.

-- Find the status of the five most recent RMAN jobs
-- (the ROWNUM form also works on 11g; FETCH FIRST needs 12c or later)
SELECT *
FROM (
  SELECT
    j.session_key,
    j.start_time,
    j.end_time,
    j.status,
    j.input_bytes_display,
    j.output_bytes_display,
    j.output_device_type
  FROM v$rman_backup_job_details j
  ORDER BY j.start_time DESC
)
WHERE ROWNUM <= 5;

-- Read the RMAN output of the most recent failed job
-- (v$rman_output is held in memory only and is cleared at instance restart)
SELECT output
FROM v$rman_output
WHERE session_key = (
  SELECT MAX(session_key) FROM v$rman_backup_job_details
  WHERE status = 'FAILED'
)
ORDER BY recid;
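
To go straight to the bottom of the error stack, the same output can be filtered down to its error lines and read in reverse order. This is a minimal sketch that assumes the failed job's output is still held in v$rman_output; the LIKE patterns are an assumption about how the messages are formatted in your release.

```sql
-- Sketch: show only RMAN-/ORA- lines of the latest failed job, newest line first
SELECT o.recid, o.output
FROM v$rman_output o
WHERE o.session_key = (
  SELECT MAX(j.session_key)
  FROM v$rman_backup_job_details j
  WHERE j.status = 'FAILED'
)
  AND (o.output LIKE 'RMAN-%' OR o.output LIKE 'ORA-%')
  -- drop the banner lines that frame the error stack
  AND o.output NOT LIKE '%=====%'
  AND o.output NOT LIKE '%ERROR MESSAGE STACK%'
ORDER BY o.recid DESC;
```

The first rows returned are the bottom of the stack, which is where to start reading.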

2. Common Errors and Their Fixes

ORA-19809 / ORA-19804: FRA full

The Fast Recovery Area is full. See the Archive Log Destination Full runbook for the full fix.

-- Quick check: FRA usage
SELECT ROUND(space_used/space_limit*100,1) AS pct_full
FROM v$recovery_file_dest;

-- Quick fix: delete archive logs already backed up at least once to disk
-- RMAN> DELETE NOPROMPT ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE TYPE DISK;
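
Before deleting anything, it can help to see which file type is actually filling the FRA. The sketch below assumes 11gR2 or later, where the view is named v$recovery_area_usage (earlier releases expose the same data as v$flash_recovery_area_usage).

```sql
-- Sketch: FRA consumption broken down by file type
SELECT
  file_type,
  percent_space_used,
  percent_space_reclaimable,
  number_of_files
FROM v$recovery_area_usage
ORDER BY percent_space_used DESC;
```

If archived logs dominate, the archive log cleanup above is the right lever; if backup pieces or flashback logs dominate, review the retention policy or flashback retention instead.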

ORA-19566: exceeded limit of corrupt blocks

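-- RMAN hit more corrupt blocks than MAXCORRUPT allows
-- Default MAXCORRUPT = 0 (zero tolerance)

-- Find which datafiles have corrupt blocks
SELECT file#, block#, blocks, corruption_change#, corruption_type
FROM v$database_block_corruption
ORDER BY file#, block#;

-- Try to repair the listed blocks (needs a usable backup plus archived logs)
-- RMAN> RECOVER CORRUPTION LIST;
-- (BLOCKRECOVER CORRUPTION LIST is the older, deprecated form)

-- If only the datafile header is corrupt, try media recovery of that file
-- (7 is an example file#; the datafile must be offline, or the database mounted)
-- RMAN> RECOVER DATAFILE 7;

-- As a temporary workaround to complete the backup (NOT a fix):
-- MAXCORRUPT is set per datafile inside a RUN block, not as a BACKUP option
-- RMAN> RUN {
-- RMAN>   SET MAXCORRUPT FOR DATAFILE 7 TO 1000;
-- RMAN>   BACKUP DATABASE;
-- RMAN> }
-- Warning: the backup then contains corrupt blocks and may not support a full recovery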
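
Knowing which segment owns the corrupt blocks helps decide between block recovery and simply rebuilding an index. The following is a sketch of the usual join against dba_extents; it assumes you have access to the DBA views, and it can be slow on databases with many extents.

```sql
-- Sketch: map each corrupt block range to the segment that contains it
SELECT
  e.owner,
  e.segment_name,
  e.segment_type,
  c.file#,
  c.block#,
  c.blocks,
  c.corruption_type
FROM v$database_block_corruption c
JOIN dba_extents e
  ON e.file_id = c.file#
 -- overlap test between the corrupt range and the extent range
 AND c.block# <= e.block_id + e.blocks - 1
 AND c.block# + c.blocks - 1 >= e.block_id
ORDER BY c.file#, c.block#;
```

Corrupt ranges that return no row here are not inside any allocated extent, which usually means they sit in free space or in file header blocks.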

ORA-27030: skgfwrt: sbtwrite2 returned error

-- Media management layer (tape/NFS/cloud) write error
-- Check: is the target backup device reachable?
-- For NFS: df -h /backup/rman_dest
-- For tape: check MML (NetBackup, DataDomain, etc.) connectivity

-- Retry with backup to disk if tape is unavailable
-- RMAN> BACKUP DATABASE FORMAT '/u02/rman_backup/%d_%T_%U.bkp';
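
When a media manager write fails, it is worth confirming which device and channel settings RMAN is actually using. As a rough sketch, the persistent configuration can be read from v$rman_configuration; note that this view lists only settings changed from their defaults, and the name patterns below are an assumption.

```sql
-- Sketch: show non-default channel and device settings stored in the control file
SELECT conf#, name, value
FROM v$rman_configuration
WHERE name LIKE '%CHANNEL%'
   OR name LIKE '%DEVICE TYPE%'
ORDER BY conf#;
```

An empty result means the backup runs with default device settings, so any SBT parameters must be coming from the backup script itself.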

RMAN-06059: expected archived log not found

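-- Archive log is recorded in the RMAN repository but missing on disk (deleted manually?)
CROSSCHECK ARCHIVELOG ALL;
DELETE NOPROMPT EXPIRED ARCHIVELOG ALL;

-- Then retry the backup. RMAN no longer looks for the missing logs, so the backup
-- completes, but recovery across the missing sequences is not possible unless
-- those logs were already backed up. Take a fresh full backup afterwards.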
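
Between the CROSSCHECK and the DELETE, it may be useful to record exactly which sequences are missing and whether they were ever backed up. A minimal sketch, assuming the crosscheck has already marked the missing logs as expired (status 'X'):

```sql
-- Sketch: list archived logs marked expired by CROSSCHECK
SELECT
  thread#,
  sequence#,
  name,
  backup_count,       -- 0 means the log was never backed up
  completion_time
FROM v$archived_log
WHERE status = 'X'
ORDER BY thread#, sequence#;
```

Rows with backup_count = 0 mark a real gap in recoverability; rows with a higher count can still be restored from backup.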

3. Channel and Device Errors

Channel timeout or connection drop

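-- MAXOPENFILES limits how many input files a channel reads at once (default 8);
-- it is not a timeout setting
-- RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK MAXOPENFILES 8;

-- Allocate the channel explicitly and tag it with a command ID so the session
-- can be identified in V$SESSION.CLIENT_INFO while the backup runs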
-- RUN {
--   ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/u02/rman/%U';
--   SET COMMAND ID TO 'backup_db';
--   BACKUP DATABASE PLUS ARCHIVELOG;
--   RELEASE CHANNEL c1;
-- }
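
While the script above is running, the command ID makes the channel session easy to find, and v$session_longops shows whether it is still making progress. This is a sketch that assumes the command ID 'backup_db' from the RUN block above.

```sql
-- Sketch: find the RMAN channel session tagged by SET COMMAND ID
SELECT s.sid, s.serial#, p.spid, s.client_info, s.event
FROM v$session s
JOIN v$process p ON p.addr = s.paddr
WHERE s.client_info LIKE '%id=backup_db%';

-- Sketch: check progress of running RMAN work
SELECT
  sid,
  serial#,
  opname,
  sofar,
  totalwork,
  ROUND(sofar / totalwork * 100, 1) AS pct_done
FROM v$session_longops
WHERE opname LIKE 'RMAN%'
  AND opname NOT LIKE '%aggregate%'
  AND totalwork <> 0
  AND sofar <> totalwork;
```

If pct_done does not change between two runs a few minutes apart, look at the event column of the first query to see what the channel is waiting on.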

ORA-15041: diskgroup space exhausted (ASM)

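-- The ASM diskgroup that holds the FRA is full
SELECT
  name,
  ROUND(total_mb/1024, 1) AS total_gb,
  ROUND(free_mb/1024, 1) AS free_gb,
  -- NULLIF avoids a divide-by-zero error for dismounted diskgroups (total_mb = 0)
  ROUND((1 - free_mb/NULLIF(total_mb, 0))*100, 1) AS pct_used
FROM v$asm_diskgroup;

-- Delete backups older than 7 days from the ASM FRA using RMAN
-- (this ignores the retention policy; confirm a newer valid backup exists first)
-- RMAN> DELETE NOPROMPT BACKUP COMPLETED BEFORE 'SYSDATE-7';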
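
To preview what a seven-day cutoff would remove from ASM, the backup pieces can be listed from the control file first. A sketch, assuming ASM handles start with '+' and that the seven-day cutoff matches the DELETE command above:

```sql
-- Sketch: available backup pieces in ASM that completed more than 7 days ago
SELECT
  bp.handle,
  bp.tag,
  bp.completion_time,
  ROUND(bp.bytes / 1048576) AS size_mb
FROM v$backup_piece bp
WHERE bp.status = 'A'
  AND bp.handle LIKE '+%'
  AND bp.completion_time < SYSDATE - 7
ORDER BY bp.completion_time;
```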

4. Catalog and Crosscheck Issues

-- Full crosscheck and cleanup of expired objects
CROSSCHECK BACKUP;
CROSSCHECK ARCHIVELOG ALL;
CROSSCHECK COPY;

DELETE NOPROMPT EXPIRED BACKUP;
DELETE NOPROMPT EXPIRED COPY;
DELETE NOPROMPT EXPIRED ARCHIVELOG ALL;

-- List backups RMAN knows about
LIST BACKUP SUMMARY;

-- List backups completed in the last 7 days
LIST BACKUP COMPLETED AFTER 'SYSDATE-7';

-- Resync the recovery catalog from the control file
-- (only applies when RMAN is connected to a recovery catalog)
-- RMAN> RESYNC CATALOG;
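
After a crosscheck, a quick count by status shows how much of the repository points at pieces that no longer exist. This sketch reads the control file records; status 'A' is available, 'X' is expired, and 'D' is deleted.

```sql
-- Sketch: backup piece counts by device and status after CROSSCHECK
SELECT device_type, status, COUNT(*) AS pieces
FROM v$backup_piece
GROUP BY device_type, status
ORDER BY device_type, status;
```

A large number of 'X' rows suggests files are being removed outside RMAN, which is worth tracing before the next backup window.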

5. Space-Related Failures

-- Check FRA space: total, used, and reclaimable
SELECT
  'FRA' AS location,
  ROUND(space_limit/1073741824, 1) AS total_gb,
  ROUND(space_used/1073741824, 1) AS used_gb,
  ROUND(space_reclaimable/1073741824, 1) AS reclaimable_gb,
  ROUND(space_used/space_limit*100, 1) AS pct_used
FROM v$recovery_file_dest;

-- Check backup destination filesystem (OS-level check)
-- df -h /u02/rman_backup

-- Delete backups no longer needed under the configured retention policy
-- (preview the list first with REPORT OBSOLETE;)
DELETE NOPROMPT OBSOLETE;
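
Space failures often follow a gradual growth in backup size rather than a single event. As a rough sketch, daily backup output can be trended from the job history; the 14-day window is an arbitrary assumption.

```sql
-- Sketch: backup output per day and backup type over the last 14 days
SELECT
  TRUNC(start_time) AS backup_day,
  input_type,
  ROUND(SUM(output_bytes) / 1073741824, 1) AS output_gb
FROM v$rman_backup_job_details
WHERE start_time > SYSDATE - 14
  AND status LIKE 'COMPLETED%'
GROUP BY TRUNC(start_time), input_type
ORDER BY backup_day DESC, input_type;
```

Comparing output_gb against the free space reported above gives a simple estimate of how many more backups the destination can hold.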

6. Validate Recoverability

โš ๏ธ This is the most important step
A backup that exists but cannot be used for recovery is useless. Run RESTORE VALIDATE monthly to confirm your backups are intact.
-- Validate without actually restoring (reads the backup pieces RMAN would use for a restore)
RESTORE DATABASE VALIDATE;

-- Validate just one backup by tag (faster)
-- Replace BACKUP_TAG with a real tag taken from LIST BACKUP SUMMARY
RESTORE DATABASE FROM TAG 'BACKUP_TAG' VALIDATE;

-- Validate specific tablespace backup
RESTORE TABLESPACE SYSTEM VALIDATE;
RESTORE TABLESPACE SYSAUX VALIDATE;

-- Check for corrupt blocks in the live datafiles
-- (populated by VALIDATE DATABASE and BACKUP VALIDATE)
SELECT file#, block#, blocks, corruption_change#, corruption_type
FROM v$database_block_corruption;

-- Check the status of restore and restore-validate operations in the last day
SELECT operation, object_type, status, start_time, end_time
FROM v$rman_status
WHERE operation LIKE 'RESTORE%'
  AND start_time > SYSDATE - 1
ORDER BY start_time DESC;
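
Corruption that was copied into a backup is recorded separately from corruption in the live datafiles. The sketch below reads v$backup_corruption, which covers backup sets; image copies are tracked in v$copy_corruption.

```sql
-- Sketch: corrupt block ranges recorded inside datafile backup sets
SELECT
  bc.set_stamp,
  bc.set_count,
  bc.piece#,
  bc.file#,
  bc.block#,
  bc.blocks,
  bc.corruption_type,
  bc.marked_corrupt
FROM v$backup_corruption bc
ORDER BY bc.file#, bc.block#;
```

Any row here means at least one backup contains damaged blocks, so restoring that datafile from it would bring the corruption back.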

7. Prevention and Monitoring

✅ RMAN health checklist
-- Monitor: check when last successful backup ran
SELECT
  MAX(end_time) AS last_successful_backup,
  ROUND(SYSDATE - MAX(end_time), 1) AS days_since_backup
FROM v$rman_backup_job_details
WHERE status = 'COMPLETED'
  AND input_type = 'DB FULL';
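
The same check can be extended to every backup type and turned into a simple alert flag. This is a sketch only: the 24-hour threshold is an assumption, and archived log backups usually need a tighter one.

```sql
-- Sketch: age of the last successful backup per type, flagged after 24 hours
SELECT
  input_type,
  MAX(end_time) AS last_success,
  ROUND((SYSDATE - MAX(end_time)) * 24, 1) AS hours_since,
  CASE
    WHEN SYSDATE - MAX(end_time) > 1 THEN 'ALERT'
    ELSE 'OK'
  END AS backup_age_status
FROM v$rman_backup_job_details
WHERE status = 'COMPLETED'
GROUP BY input_type
ORDER BY input_type;
```

A backup type that never completed successfully returns no row at all, so a monitoring job should treat a missing 'DB FULL' row as an alert too.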

Know your backup status before 3am

TuneVault monitors RMAN job status, last successful backup age, and FRA utilisation. Get alerted when a backup fails or hasn't run in 24 hours.
