Active Directory Disaster Recovery (Best Practices)
Active Directory (AD) plays a critical role in the infrastructure of many organizations, serving as the central repository for user accounts, permissions, and network configurations. However, unforeseen disasters such as fires, floods, malware attacks, or even misconfiguration can disrupt the operation of AD, leading to data loss and downtime. To mitigate these risks, it’s essential to establish a robust disaster recovery plan for Active Directory. In this article, we explore best practices for AD disaster recovery, emphasizing the importance of proper documentation, testing, and planning for various disaster scenarios.
General recommendations for Disaster Recovery
When preparing the recovery plan, consider the following general recommendations:
- The recovery strategy depends heavily on backups, so a proper backup plan is required. Follow this article for backup best practices.
- The recovery process must be well documented. A clear guide helps ensure that all steps are performed correctly and in the right order.
- Draw a dependency map that shows all services that depend on Active Directory, so that you know which services an outage affects and in what order they must be brought back.
- Test the disaster recovery plan regularly. Review the test results carefully and add the necessary adjustments to the plan to keep it in line with the changing environment.
- The plan must include recovery steps for several possible scenarios. Typical scenarios are the unavailability of a physical site (due to fire, flood, social unrest, etc.) and the failure of Active Directory itself (due to misconfiguration, malware, schema issues, etc.).
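A dependency map can also be kept in a machine-readable form so that a recovery order can be derived from it. The following Python sketch is illustrative only: the service names in DEPENDENCIES are hypothetical placeholders, and it assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it depends on.
# Replace these placeholder names with the services from your own environment.
DEPENDENCIES: dict[str, set[str]] = {
    "Active Directory": set(),
    "File server": {"Active Directory"},
    "Exchange Server": {"Active Directory"},
    "VPN": {"Active Directory"},
    "Intranet portal": {"Exchange Server", "File server"},
}


def recovery_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the services ordered so that every dependency comes first.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


def affected_by(service: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every service that depends on `service`, directly or indirectly."""
    affected: set[str] = set()
    changed = True
    while changed:  # keep propagating until no new service is added
        changed = False
        for name, needs in deps.items():
            if name not in affected and (service in needs or needs & affected):
                affected.add(name)
                changed = True
    return affected


print(recovery_order(DEPENDENCIES))
print(sorted(affected_by("Exchange Server", DEPENDENCIES)))  # ['Intranet portal']
```

Because graphlib rejects cycles, the sketch also surfaces circular dependencies, which deserve an explicit note in the recovery plan.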
Let’s have a look at possible restoration plans for these disaster scenarios.
Active Directory Restoration Plans
Unavailability of the physical site (single-site infrastructure)
In a scenario where all domain controllers are in a single data center (or server room), the following must be taken into consideration:
- Ensure that the backup is not affected by the disaster. For example, if you store your backup data in the same server room as the production servers, a single fire can destroy both. Therefore, follow the best practice of storing one copy of the backup data separately, at least 8 km (about 5 miles) away. For a small company with a single office, consider uploading backups to cloud storage.
- Be ready to restore the entire Active Directory infrastructure from backup in case of a disaster. This involves a major outage, so you need a detailed plan that covers restoring the entire forest (described later in this article).
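If the coordinates of each backup location are recorded, the 8 km rule of thumb can be checked automatically. This minimal Python sketch uses the haversine great-circle formula; the coordinates in the example are hypothetical, and for cloud storage you would use the approximate location of the provider's region.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius
MIN_OFFSITE_DISTANCE_KM = 8.0   # minimum separation recommended above

Location = tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance in kilometres between two locations."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def has_offsite_copy(production: Location, backups: list[Location],
                     min_km: float = MIN_OFFSITE_DISTANCE_KM) -> bool:
    """True if at least one backup copy is stored at least `min_km` away."""
    return any(haversine_km(production, copy) >= min_km for copy in backups)


server_room = (50.000, 8.000)  # hypothetical production site
local_copy = (50.000, 8.000)   # backup kept in the same room
remote_copy = (50.100, 8.000)  # hypothetical second office, roughly 11 km away
print(has_offsite_copy(server_room, [local_copy]))               # False
print(has_offsite_copy(server_room, [local_copy, remote_copy]))  # True
```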
Whenever possible, avoid placing all domain controllers in the same data center. This design significantly increases recovery time and prevents you from using the built-in high availability of Active Directory.
Unavailability of the physical site (multi-site infrastructure)
Let’s imagine the following scenario: a company has two physical locations, each with its own domain controllers. The first step in disaster recovery planning is to ensure that, if the servers in the first location fail, services and user computers seamlessly reconnect to the second site. This is achieved through proper DNS configuration:
- The first DNS server in the network interface card (NIC) configuration should be a domain controller from the local site.
- The second DNS server should be another domain controller from the local site, so that a local server is still used if the first domain controller fails.
- The third DNS server should be a domain controller from the remote site. If a disaster makes both local domain controllers unavailable, the remote domain controller is used.
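The ordering rule above can be expressed as a small function that derives each client's DNS server list from an inventory of domain controllers per site. This Python sketch assumes a hypothetical inventory (the site names and documentation-range IP addresses are placeholders) and simply picks the first remote domain controller it finds.

```python
# Hypothetical inventory: domain controller IP addresses per AD site.
DCS_BY_SITE: dict[str, list[str]] = {
    "SiteA": ["192.0.2.10", "192.0.2.11"],
    "SiteB": ["198.51.100.10", "198.51.100.11"],
}


def dns_server_order(local_site: str, dcs_by_site: dict[str, list[str]]) -> list[str]:
    """Return [local DC 1, local DC 2, remote DC] for a client in `local_site`."""
    local = dcs_by_site.get(local_site, [])
    if len(local) < 2:
        raise ValueError(f"{local_site} needs at least two domain controllers")
    # Collect domain controllers from every other site as remote fallbacks.
    remote = [ip for site, ips in dcs_by_site.items() if site != local_site for ip in ips]
    if not remote:
        raise ValueError("no domain controller found in any other site")
    return [local[0], local[1], remote[0]]


print(dns_server_order("SiteA", DCS_BY_SITE))
# ['192.0.2.10', '192.0.2.11', '198.51.100.10']
```

On a Windows client, a list like this could then be applied, for example, with the Set-DnsClientServerAddress cmdlet and its -ServerAddresses parameter, or distributed through DHCP.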
Additionally, configure firewalls to allow cross-site connections on the ports and protocols used by Active Directory services, such as DNS, LDAP, and Kerberos, as described in this article. If your forest contains several domains, ensure that domain controllers from each domain are present in both sites.
If some of your services are hardcoded to use specific domain controllers (for example, Exchange Server can be set to use specific domain controllers with the StaticDomainControllers parameter of the Set-ExchangeServer cmdlet), manual intervention is needed to change their configuration. Assess all such services and include the required intervention in the disaster recovery plan documentation.
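Once the firewall rules are in place, a quick test from one site against a domain controller in the other site can confirm that the main TCP ports are reachable. This Python sketch is a rough check under stated assumptions: it only attempts TCP connections to a commonly cited subset of Active Directory ports, so it does not test UDP traffic (for example DNS, Kerberos, or NTP over UDP) or the dynamic RPC port range (49152-65535 by default). The host name in the example comment is hypothetical.

```python
import socket

# Commonly used TCP ports for Active Directory services (not an exhaustive list).
AD_TCP_PORTS: dict[str, int] = {
    "DNS": 53,
    "Kerberos": 88,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "SMB": 445,
    "Kerberos password change": 464,
    "LDAPS": 636,
    "Global Catalog": 3268,
    "Global Catalog (SSL)": 3269,
}


def check_tcp_ports(host: str, ports: dict[str, int], timeout: float = 2.0) -> dict[str, bool]:
    """Try a TCP connection to each port; True means the connection succeeded."""
    results: dict[str, bool] = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, unreachable, or name not resolved
            results[name] = False
    return results


# Example (hypothetical host name of a domain controller in the remote site):
# print(check_tcp_ports("dc01.siteb.example.com", AD_TCP_PORTS))
```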
The next step after ensuring high availability is to plan the restoration of the service itself. If the servers are not affected (for example, the disaster only caused a network outage in one of the data centers), no specific restoration is needed: the domain controllers become available again once the site is brought back online, and they only need some time to catch up through replication.
If the servers have failed and you need to restore them, the high-level process is as follows:
- Check whether any Flexible Single Master Operations (FSMO) roles were hosted in the failed site. If so, move the roles to an available domain controller using the ntdsutil tool or PowerShell cmdlets, as shown in this article. Because the current role holders are offline, the roles must be seized rather than transferred.
- Remove the failed site's servers from the domain and clean up their metadata (detailed instructions are available here).
- When the site is restored, deploy new servers with the same names and IP addresses as the failed domain controllers (at this step, one server per domain is enough).
- Promote the new servers to the domain controller role. Use the Active Directory Sites and Services snap-in to configure proper replication routes.
- After replication completes, redeploy the remaining domain controllers.
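The first step of this process can be partly scripted: given an inventory of FSMO role holders and the site of each domain controller, list the roles that were hosted in the failed site. This Python sketch assumes a hypothetical single-domain inventory and only prints a PowerShell command for review rather than running it. Move-ADDirectoryServerOperationMasterRole with -Force is the ActiveDirectory module's way to seize roles; check the command against your environment before running it.

```python
# The five FSMO roles, using the names accepted by the
# Move-ADDirectoryServerOperationMasterRole cmdlet.
FSMO_ROLES = ("SchemaMaster", "DomainNamingMaster", "PDCEmulator",
              "RIDMaster", "InfrastructureMaster")


def roles_in_failed_site(role_holders: dict[str, str], dc_site: dict[str, str],
                         failed_site: str) -> list[str]:
    """Return the FSMO roles whose current holder is located in `failed_site`."""
    return [role for role in FSMO_ROLES if dc_site[role_holders[role]] == failed_site]


def seize_command(target_dc: str, roles: list[str]) -> str:
    """Build a PowerShell command that seizes `roles` on `target_dc` (-Force seizes)."""
    if not roles:
        return ""  # nothing was hosted in the failed site
    return ("Move-ADDirectoryServerOperationMasterRole "
            f"-Identity {target_dc} -OperationMasterRole {','.join(roles)} -Force")


# Hypothetical single-domain inventory.
holders = {"SchemaMaster": "DC01", "DomainNamingMaster": "DC01",
           "PDCEmulator": "DC01", "RIDMaster": "DC02", "InfrastructureMaster": "DC03"}
sites = {"DC01": "SiteA", "DC02": "SiteA", "DC03": "SiteB"}
print(seize_command("DC03", roles_in_failed_site(holders, sites, "SiteA")))
```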
Failure of the Active Directory Forest
If Active Directory itself fails, you must perform a forest recovery. The complexity of the restoration process depends on the tools used to back up the system and on the Active Directory topology. Recovery is only possible if you have a valid backup and the necessary passwords (the domain admin password for each domain and the Directory Services Restore Mode (DSRM) password for at least one domain controller in each domain). The main steps of the process are:
1. Identify the proper backup to restore: use the most recent backup taken before the problem appeared, to minimize data loss.
2. Shut down all domain controllers, if they are still running.
3. Restore one domain controller from the forest root domain. This is important because the root domain contains the Schema Admins and Enterprise Admins groups, which are required for step 4. During the restore, perform a nonauthoritative restore of AD DS and an authoritative restore of SYSVOL, as described here. At this point, keep the server disconnected from the network.
4. Use the recovered domain controller to seize all FSMO roles.
5. Remove all other domain controllers of the root domain and clean up their metadata.
6. Raise the value of the available relative ID (RID) pool to avoid RID conflicts after the domain is recovered (more details are here), and then invalidate the current RID pool.
7. Reset the computer account password of the restored domain controller, as described in this article, and the password of the Kerberos Ticket Granting Ticket (KRBTGT) account. Reset the KRBTGT password twice: the account's password history keeps the last two passwords, so the second reset ensures that the old password can no longer be used.
8. Remove the global catalog from the restored domain controller to avoid conflicts with other domains in the forest (skip this step if you have only one domain).
9. Repeat steps 3-8 for one writable domain controller from each of the other domains, always recovering a parent domain before its child domains. For example, in a forest with the domains domain.local, child01.domain.local, child02.domain.local, and sub.child01.domain.local, first recover one domain controller from domain.local, then recover child01.domain.local and child02.domain.local simultaneously, and finally restore sub.child01.domain.local.
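The parent-before-child rule can be turned into recovery "waves", where the domains in the same wave can be restored in parallel. This Python sketch assumes a single-tree forest in which each domain's parent can be derived from its DNS name (the closest listed parent suffix); forests with several trees need the order of the additional tree roots decided by hand, so the sketch rejects them.

```python
from typing import Optional


def recovery_waves(domains: list[str]) -> list[list[str]]:
    """Group domains into waves; a parent is always in an earlier wave than its children."""
    names = {d.lower().rstrip(".") for d in domains}

    def parent(domain: str) -> Optional[str]:
        # The closest listed suffix of the DNS name is treated as the parent domain.
        labels = domain.split(".")
        for i in range(1, len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in names:
                return candidate
        return None

    roots = [d for d in names if parent(d) is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one forest root, found {sorted(roots)}")

    depth: dict[str, int] = {}

    def get_depth(domain: str) -> int:
        if domain not in depth:
            p = parent(domain)
            depth[domain] = 0 if p is None else get_depth(p) + 1
        return depth[domain]

    waves: dict[int, list[str]] = {}
    for domain in names:
        waves.setdefault(get_depth(domain), []).append(domain)
    return [sorted(waves[level]) for level in sorted(waves)]


forest = ["domain.local", "child01.domain.local", "child02.domain.local",
          "sub.child01.domain.local"]
for number, wave in enumerate(recovery_waves(forest), start=1):
    print(number, wave)
```

For the example forest above, the waves are domain.local first, then child01.domain.local together with child02.domain.local, and finally sub.child01.domain.local.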
Active Directory Disaster Recovery (Best Practices) Conclusion
In today’s interconnected world, the uninterrupted functioning of Active Directory is vital for organizations to maintain their daily operations. Implementing effective disaster recovery practices for Active Directory ensures that, when the unexpected occurs, an organization can quickly and efficiently restore its critical services. By following best practices such as comprehensive backups, thorough documentation, regular testing, and planning for specific disaster scenarios, businesses can significantly reduce the impact of disasters and maintain business continuity. Investing in disaster recovery planning for Active Directory is crucial to safeguard the heart of your organization’s IT infrastructure and preserve data integrity.