How to Restore a Failed Disk in a Linux mdadm RAID1 Array

Software RAID (mdadm) on Linux protects your data by keeping identical copies of partitions across multiple disks in a RAID1 setup. When one of these disks fails, the array becomes degraded, meaning it still works but no longer has redundancy. It’s important to fix such failures quickly and restore the failed disk promptly to regain full redundancy and data protection.

Introduction

Linux software RAID provides mirrored storage using tools like mdadm, which assembles and manages RAID arrays. A RAID1 array mirrors data across two or more disks so that if one disk fails, the system continues to operate from the remaining disk without data loss. Monitoring tools report RAID health, and in degraded conditions, administrators must take action to restore redundancy.

Step-by-Step: Repairing a Degraded RAID

1. Check RAID Status

To begin, inspect the current RAID health:

# cat /proc/mdstat

This will show arrays and whether a member is missing (e.g., showing [2/1] [_U]), meaning one of two disks is not active.
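For scripted monitoring, a degraded state can also be detected programmatically. A minimal sketch, run against an illustrative sample of the mdstat text (the device name and block count are made up):

```shell
# Illustrative sample of /proc/mdstat for a degraded two-disk RAID1;
# on a real system, read the file itself: mdstat=$(cat /proc/mdstat)
mdstat='md0 : active raid1 nvme1n1p3[1]
      487731200 blocks super 1.2 [2/1] [_U]'

# An underscore inside the status brackets (e.g. [_U]) marks a missing member.
if printf '%s\n' "$mdstat" | grep -q '\[[U_]*_[U_]*\]'; then
    echo "md0 is degraded"
fi
```

A healthy two-disk mirror shows `[2/2] [UU]` instead, which the pattern above does not match.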

2. Identify the Failed Drive

Use mdadm --detail to find which device is marked as failed:

# mdadm --detail /dev/md0

Look for devices with (F) or removed status. These are candidates for rebuild or replacement.
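The faulty member can also be extracted programmatically. A minimal sketch against an illustrative excerpt of the --detail output (device names and numbers are made up):

```shell
# Illustrative excerpt of `mdadm --detail /dev/md0`; on a real system,
# capture it with: detail=$(mdadm --detail /dev/md0)
detail='    Number   Major   Minor   RaidDevice State
       1     259        3        1      active sync   /dev/nvme1n1p3
       0     259        7        -      faulty   /dev/nvme0n1p3'

# Print the device path of every member marked faulty.
printf '%s\n' "$detail" | awk '/faulty/ {print $NF}'
```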

3. Remove Failed Drives

If a disk has actually failed and is still in the RAID metadata, mark it as failed and remove it:

# mdadm --manage /dev/md0 --fail /dev/nvme0n1p3
# mdadm --manage /dev/md0 --remove /dev/nvme0n1p3

This cleans up the array and prepares it for the replacement device.

4. Re-Add a Disk to the Array

If a disk was removed but is actually intact and was recently part of the array, you can use mdadm --re-add. This tells mdadm to add back a device that was previously part of the RAID, using its existing metadata and event count to re-synchronize.

Example:

# mdadm --manage /dev/md0 --re-add /dev/nvme1n1p1

This instructs the driver to reinsert the disk into the same RAID slot it previously occupied, triggering recovery if needed.

Use cases for --re-add:

  • The device was part of the array but was temporarily missing or disconnected.
  • You stopped and re-assembled the array and want to reinsert known good members.
  • The metadata on the disk still matches the array and can be reused.

Note: --re-add will only work if the disk is healthy and the metadata is intact. It will not work for hardware-failed disks.
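Whether --re-add can succeed depends on the on-disk metadata. One way to sanity-check it is to compare event counts from mdadm --examine against the array's; the sketch below parses an illustrative excerpt (the values are made up):

```shell
# Illustrative excerpt of `mdadm --examine /dev/nvme1n1p1`; on a real
# system: examine=$(mdadm --examine /dev/nvme1n1p1)
examine='          State : clean
         Events : 18452'

# Extract the member's event count. If it is close to the count that
# `mdadm --detail` reports for the array, a --re-add usually resyncs
# quickly (with a write-intent bitmap, only changed blocks are copied).
events=$(printf '%s\n' "$examine" | awk '/Events/ {print $3}')
echo "$events"
```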

5. Add a Replacement Disk

If the original disk is truly failed (physically broken), you must physically replace it with a new one, then partition it similarly and add it to the array:

# mdadm --manage /dev/md0 --add /dev/sdb1

This will automatically initiate a rebuild of the RAID array using the good disk(s).
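Cloning the partition layout and adding the new member can be scripted. A dry-run sketch, assuming the surviving disk is /dev/sda and the replacement is /dev/sdb (both placeholders); change the run() helper as commented to actually execute:

```shell
#!/bin/sh
# Placeholders for this illustration; adjust to your system.
SRC=/dev/sda     # surviving disk
DST=/dev/sdb     # replacement disk
ARRAY=/dev/md0

run() { echo "+ $*"; }   # dry run; change to run() { "$@"; } to execute

# Copy the partition table (for GPT disks, sgdisk --replicate is an option).
run "sfdisk -d $SRC | sfdisk $DST"
# Add the new partition; mdadm starts the rebuild automatically.
run mdadm --manage "$ARRAY" --add "${DST}1"
```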

Monitor progress with:

# watch cat /proc/mdstat
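For unattended monitoring, the rebuild percentage can be pulled out of the recovery line. A minimal sketch against an illustrative sample line (the numbers are made up):

```shell
# Illustrative recovery line from /proc/mdstat during a rebuild; on a
# real system, take it from: grep recovery /proc/mdstat
line='  [=>...........]  recovery =  7.6% (37555392/487731200) finish=41.2min speed=182112K/sec'

# Extract the completion percentage.
pct=$(printf '%s\n' "$line" | grep -o '[0-9][0-9.]*%')
echo "$pct"
```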

When the Issue is a Hardware Failure

If the system reports a disk with 0B size, cannot write metadata, or otherwise cannot be re-added or accessed at the OS level, this indicates a hardware failure. No software RAID command will bring such a disk back into functioning condition.

In such cases, you should:

Seek assistance from your data center or hosting provider and request replacement of the failed physical disk.

Once replaced, partition it to match the existing RAID and add it back to the array for a rebuild.

RAID redundancy cannot be restored until the failed hardware is replaced, and continued operation in degraded mode increases the risk of data loss.

Conclusion

Managing a degraded mdadm RAID requires understanding both the software and hardware aspects of the array. Use cat /proc/mdstat and mdadm --detail to assess array health, and mdadm --fail, --remove, --re-add, or --add to fix issues when they are purely software-related. However, when the disk itself shows physical failure or is unreadable, escalate to the data center for hardware replacement. Once a healthy device is available, re-adding it to the array and allowing it to rebuild will restore RAID redundancy.

Restoring a RAID array after a disk failure requires precision, experience, and zero room for error. If you’re unsure about the recovery process or want to safely restore a failed disk in a Linux mdadm RAID1 array without risking data loss, it’s always best to rely on professionals. Our expert Linux Server Management Services team can handle RAID rebuilds, disk replacements, monitoring, and full server recovery with minimal downtime. Contact us today for fast, secure, and reliable server support.
