Software RAID (mdadm) on Linux protects your data by keeping identical copies of partitions across multiple disks in a RAID1 setup. When one of these disks fails, the array becomes degraded: it still works but no longer has redundancy. Repair a degraded array promptly, because any further disk failure before the rebuild completes can cause data loss.
Introduction
Linux software RAID provides mirrored storage using tools like mdadm, which assembles and manages RAID arrays. A RAID1 array mirrors data across two or more disks so that if one disk fails, the system continues to operate from the remaining disk without data loss. Monitoring tools report RAID health, and in degraded conditions, administrators must take action to restore redundancy.
Step-by-Step: Repairing a Degraded RAID
1. Check RAID Status
To begin, inspect the current RAID health:
# cat /proc/mdstat
The output lists each array and its members. A status such as [2/1] [_U] means that only one of two members is active; the underscore marks the missing one.
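The status check above can be scripted. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which the member map (for example [_U]) follows the array's name line; the function name degraded_arrays is hypothetical.

```shell
# Print the name of every array whose member map contains an underscore.
# $1: optional path to an mdstat-formatted file (defaults to /proc/mdstat).
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 ..."
        /^md[0-9a-zA-Z_]* :/ { name = $1 }
        # Member map, e.g. "[_U]"; an underscore marks a missing member.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments it reads the live /proc/mdstat, so a cron job could, for instance, send an alert whenever the output is non-empty.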
2. Identify the Failed Drive
Use mdadm --detail to find which device is marked as failed:
# mdadm --detail /dev/md0
Look for devices in the faulty or removed state (in /proc/mdstat, failed members carry an (F) flag). These are candidates for rebuild or replacement.
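To pick the affected members out of a long listing, the device table can be filtered. This is a rough sketch that assumes the table format printed by current mdadm versions (state words followed by the device path); the function names faulty_members and removed_slots are hypothetical.

```shell
# Both functions read `mdadm --detail` output on stdin.

# Print the device path of every member whose state includes "faulty".
faulty_members() {
    awk '/faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Print the RaidDevice slot number of every slot listed as "removed".
removed_slots() {
    awk 'NF == 5 && $NF == "removed" { print $4 }'
}
```

Typical use would be `mdadm --detail /dev/md0 | faulty_members`.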
3. Remove Failed Drives
If a disk has actually failed and is still in the RAID metadata, mark it as failed and remove it:
# mdadm --manage /dev/md0 --fail /dev/nvme0n1p3
# mdadm --manage /dev/md0 --remove /dev/nvme0n1p3
This cleans up the array and prepares it for the replacement device.
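Because failing the wrong member of a degraded mirror is destructive, it can help to preview the two commands first. The wrapper below is a small sketch; drop_member and the DRY_RUN variable are hypothetical names, and only the mdadm options already shown above are used.

```shell
# Fail and then remove a member from an array.
# $1: array (e.g. /dev/md0), $2: member partition.
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
drop_member() {
    if [ "$#" -ne 2 ]; then
        echo "usage: drop_member ARRAY MEMBER" >&2
        return 2
    fi
    run=${DRY_RUN:+echo}
    # The remove step runs only if the fail step succeeded.
    $run mdadm --manage "$1" --fail "$2" &&
        $run mdadm --manage "$1" --remove "$2"
}
```

For example, `DRY_RUN=1 drop_member /dev/md0 /dev/nvme0n1p3` prints both commands for review; unsetting DRY_RUN executes them.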
4. Re-Add a Disk to the Array
If a disk was removed but is actually intact and was part of the array recently, you can use mdadm --re-add. This tells mdadm to add back a device that was previously part of the RAID, using its existing metadata and event count to re-synchronize.
Example:
# mdadm --manage /dev/md0 --re-add /dev/nvme1n1p1
This instructs the driver to reinsert the disk into the same RAID slot it previously occupied, triggering recovery if needed.
Use cases for --re-add:
- The device was part of the array but was temporarily missing or disconnected.
- You stopped and re-assembled the array and want to reinsert known good members.
- The metadata on the disk still matches the array and can be reused.
Note: --re-add will only work if the disk is healthy and the metadata is intact. It will not work for hardware-failed disks.
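Before trying a re-add, it can be useful to see how far the returning member has fallen behind. The sketch below compares the Events field that mdadm --examine prints for each member. It assumes the value is a plain integer, as printed for 1.x metadata, and the function names events_of and events_gap are hypothetical.

```shell
# Read `mdadm --examine` output on stdin and print the Events value.
events_of() {
    awk '$1 == "Events" && $2 == ":" { print $3; exit }'
}

# Print the absolute difference between the event counts of two members.
# $1, $2: files holding saved `mdadm --examine` output.
events_gap() {
    a=$(events_of < "$1")
    b=$(events_of < "$2")
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "Events field not found" >&2
        return 1
    fi
    echo $(( a > b ? a - b : b - a ))
}
```

A gap of 0 suggests the member is current, while a larger gap means the member is stale and will need recovery after it is added back.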
5. Add a Replacement Disk
If the original disk has physically failed, replace it with a new one, create a partition layout on it that matches the surviving disk, and add the new partition to the array:
# mdadm --manage /dev/md0 --add /dev/sdb1
This will automatically initiate a rebuild of the RAID array using the good disk(s).
Monitor progress with:
# watch cat /proc/mdstat
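For unattended monitoring, the progress figure can be extracted instead of watched. This is a sketch that assumes the kernel's usual "recovery = 12.6% ..." progress line in /proc/mdstat; the function name rebuild_progress is hypothetical.

```shell
# Print "<array> <percent>" for every array that is rebuilding or resyncing.
# $1: optional path to an mdstat-formatted file (defaults to /proc/mdstat).
rebuild_progress() {
    awk '
        # Remember the array the following lines belong to.
        /^md[0-9a-zA-Z_]* :/ { name = $1 }
        # Progress line; lines without a percentage (e.g. a delayed resync) are skipped.
        /recovery|resync/ && match($0, /[0-9]+\.[0-9]+%/) {
            print name, substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/mdstat}"
}
```

No output means no array is currently rebuilding, which after an --add usually indicates that the recovery has finished.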
When the Issue is a Hardware Failure
If the system reports a disk with a size of 0B, cannot write metadata to it, or otherwise cannot access or re-add it at the OS level, this indicates a hardware failure. No software RAID command will bring such a disk back into functioning condition.
In such cases, you should:
- Seek assistance from your data center or hosting provider.
- Request replacement of the failed physical disk.
Once replaced, partition it to match the existing RAID and add it back to the array for a rebuild.
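One common way to match the partition layout on GPT disks is sgdisk from the gdisk package, which is an assumption here since the article does not name a partitioning tool. The sketch below only prints the commands for review rather than running them; clone_layout_cmds is a hypothetical name, and /dev/sda and /dev/sdb in the usage note are placeholder devices.

```shell
# Print the sgdisk commands that copy a GPT partition table.
# $1: healthy source disk, $2: new, empty target disk.
# For MBR disks a different tool is needed, e.g. sfdisk -d SRC | sfdisk DST.
clone_layout_cmds() {
    if [ "$#" -ne 2 ] || [ "$1" = "$2" ]; then
        echo "usage: clone_layout_cmds SOURCE_DISK TARGET_DISK (must differ)" >&2
        return 1
    fi
    # --replicate writes the table of the last argument onto the named device.
    printf 'sgdisk --replicate=%s %s\n' "$2" "$1"
    # Give the copy its own disk and partition GUIDs.
    printf 'sgdisk --randomize-guids %s\n' "$2"
}
```

Running `clone_layout_cmds /dev/sda /dev/sdb` prints the two commands. Check the source and target order carefully before executing them, because replicating in the wrong direction overwrites the healthy disk's partition table.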
RAID redundancy cannot be restored until the failed hardware is replaced, and continued operation in degraded mode increases the risk of data loss.
Conclusion
Managing a degraded mdadm RAID requires understanding both the software and hardware aspects of the array. Use cat /proc/mdstat and mdadm --detail to assess array health, and mdadm --fail, --remove, --re-add, or --add to fix issues when they are purely software-related. However, when the disk itself shows physical failure or is unreadable, escalate to the data center for hardware replacement. Once a healthy device is available, adding it to the array and allowing it to rebuild will restore RAID redundancy.