2/25 Update: Added better support for updating Send Connectors. The script now supports an array of Send Connectors, each with a normal cost and a DR cost, to be used when the normal route may not work. The code below has NOT been updated. Get the latest version from: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1
I created this script several months ago for a client; it took about 18 hours to create and fully test initially. I recently made a few modifications to it and decided to post it today.
This script is designed to do either a partial failover, where DAG membership is not changed, or a full failover that removes DAG members. It supports both planned failovers and unplanned failovers, where the servers are off-line. It checks the health of database replication and of the replication infrastructure, using the Test-ReplicationHealth cmdlet. If a database copy has too many logs in its CopyQueueLength (5 by default), the script waits five seconds and checks the queue length again. If the queue has not decreased, it asks the user whether to wait and check again, continue, or exit. If any CopyQueueLength reaches 50 (by default), the script aborts so the user can figure out why. All of this is in addition to moving databases, updating DNS and AD, etc.
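The queue thresholds described above can be sketched as a small classifier. This is only an illustration, not code from Failover.ps1: the function name `Get-CopyQueueState` and the sample queue lengths are hypothetical. The defaults of 5 and 50 mirror the script's `$CopyQueueLengthWarn` and `$CopyQueueLengthMax` settings, and the comparisons use "greater than or equal", as the script's own `-ge` tests do.

```powershell
# Hypothetical helper (not part of Failover.ps1): maps a CopyQueueLength
# to the action the script takes for that value.
function Get-CopyQueueState {
    param(
        [int]$CopyQueueLength,
        [int]$Warn = 5,     # mirrors $CopyQueueLengthWarn
        [int]$Max  = 50     # mirrors $CopyQueueLengthMax
    )
    if ($CopyQueueLength -ge $Max)  { return 'Abort' }  # too far behind: stop and investigate
    if ($CopyQueueLength -ge $Warn) { return 'Wait' }   # pause, re-check, then prompt the user
    return 'OK'
}

# Made-up queue lengths, for illustration only
foreach ($length in 0, 12, 75) {
    '{0,3} logs queued -> {1}' -f $length, (Get-CopyQueueState $length)
}
```

Keeping the decision in one function makes it easy to apply the same limits to ReplayQueueLength later, which the script lists as a to-do item.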
Key things done by the script
- Check health of DBs
- Check CopyQueueLength
- Check replication infrastructure health
- Planned and unplanned (-ConfigurationOnly switch used) failovers
- Temporary (DAG left intact) and extended failovers (DAG members removed)
- Move databases to DR\secondary site
- Update Public Folder database value on Mailbox databases
- Update DNS for records that need to be changed to point to DR site
- Update Send Connector costs
- Forcing AD replication
- Calls RedistributeActiveDatabases.ps1 when failing back; alternative code is included for environments where this script won’t work
- Uses the PowerShell Transcript function to log all actions to a file
- Log only mode ($MakeChanges=$False/$True)
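The log-only mode in the last item boils down to one pattern: always print what would be done, and run the change only when `$MakeChanges` is `$True`. The script does this inline with `If ($MakeChanges) {...}` around each cmdlet. A minimal sketch of the same idea as a wrapper follows; `Invoke-Change` and its parameters are hypothetical names introduced here, not part of Failover.ps1.

```powershell
# Hypothetical wrapper (not part of Failover.ps1): logs every action and
# executes it only when changes are enabled.
function Invoke-Change {
    param(
        [string]$Description,        # what is about to happen, always written out
        [scriptblock]$Action,        # the change itself
        [bool]$MakeChanges = $false  # log-only unless explicitly enabled
    )
    Write-Host "`t$Description"
    if ($MakeChanges) {
        & $Action
    }
    else {
        Write-Host "`t(log-only mode, nothing changed)" -ForegroundColor Yellow
    }
}

# Example: the scriptblock is skipped because changes are disabled
Invoke-Change -Description "Clearing the DNS cache" -Action { Write-Host "running change" } -MakeChanges $false
```

A wrapper like this makes it harder to forget the guard on a new step, at the cost of wrapping every cmdlet call in a scriptblock.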
Due to its complexity and the many factors involved, this script will probably be updated many times in the future and will need to be customized for your environment. Currently it doesn’t support dynamic discovery of servers and Public Folder databases, so at a minimum you will need to update the AD site names, server names, IP addresses, Public Folder databases, and other settings\names at the top of the script.
Screenshot of it running in a small environment:
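As a rough idea of what dynamic server discovery could look like, the sketch below splits a list of server objects into primary and DR name lists by AD site. Everything here is an assumption for illustration: `Split-ServersBySite` is a hypothetical helper, the input objects are assumed to expose `Name` and `Site` (a plain site name) properties, and the commented-out Exchange call at the end is untested.

```powershell
# Hypothetical helper (not part of Failover.ps1): groups server names by AD site.
function Split-ServersBySite {
    param(
        [object[]]$Servers,      # objects with Name and Site properties (assumed shape)
        [string]$PrimarySite,
        [string]$DRSite
    )
    @{
        Primary = @($Servers | Where-Object { $_.Site -eq $PrimarySite } | ForEach-Object { $_.Name })
        DR      = @($Servers | Where-Object { $_.Site -eq $DRSite }      | ForEach-Object { $_.Name })
    }
}

# Stand-in objects using the sample names from the script's settings
$servers = @(
    (New-Object PSObject -Property @{ Name = 'COLMBX01'; Site = 'COL' }),
    (New-Object PSObject -Property @{ Name = 'COLMBX02'; Site = 'COL' }),
    (New-Object PSObject -Property @{ Name = 'DRMBX01';  Site = 'DR'  })
)
$bySite = Split-ServersBySite -Servers $servers -PrimarySite 'COL' -DRSite 'DR'
$bySite.Primary   # COLMBX01, COLMBX02
$bySite.DR        # DRMBX01

# Untested assumption: in the Exchange Management Shell the input might be built with
#   Get-ExchangeServer | Select-Object Name, @{n='Site';e={$_.Site.Name}}
```

Note that the script treats the first server in each list as the one to receive the PAM, so a discovered list may need sorting or an explicit preferred server.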
Source: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1
# Exchange 2010 site failover script w/ health checks
# Created by Jason Sherry | izzy@izzy.org | http://jasonsherry.org
# Created 9/21/2012, Last Updated 2/21/2013
# 2/20: Added $MakeChanges switch, commented out code required for Journal mailbox databases (where Index state always = Failed)
# 2/21: Minor clean-up
# Source: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1

#TO DOs
# - Add option to not exit script, when doing post failover checks when an issue is found
# - Checks for ReplayQueueLength & ContentIndexState
# - Add logging to file (In addition to what has been added below that will show up in transcript file)
# - Get server names, instead of having a fixed number when doing cluster calls
# - Add index repair option
# - Get PF DB names, instead of hard coding
# - Add error handling, Try\Trap

$ErrorActionPreference = "SilentlyContinue"
$MakeChanges = $False

#Replication Limits
$CopyQueueLengthWarn = 5
$CopyQueueLengthMax = 50
$ReplayQueueLengthWarn = 5
$ReplayQueueLengthMax = 50

#AD & DAG Info
$PrimaryADSite = "COL"
$DRADSite = "DR"
$DAGName = "DAG01"
## TO DO: Replace with code to get dynamic list of servers
$PrimaryServers = @("COLMBX01","COLMBX02","COLMBX03") # Array of server names, 1st server listed will be the PAM
$DRServers = @("DRMBX01","DRMBX02")
$PrimaryPFDB = "Public Folders COL02"
$DRPFDB = "Public Folders DR02"
$PrimaryDC = "colcorpdc01.corp.company.com"

#File Witness Information
$PrimaryWS = "COLHYPERV01"
$PrimaryWD = "E:\DAG-FSW\DAG01"
$AltWS = "DRHYPERV01"
$AltWD = "E:\DAG-FSW\DAG01"

#DNS Values to update
$DNSZone = "company.com"
$DNSServers = @("10.10.24.4","10.10.16.41","10.10.34.3","10.10.32.8") # Multiples can be listed to help reduce latency due to AD replication
$PrimaryDNSRecords = @(("mail","10.10.24.33"),("COLRPCmail","10.10.24.33"),("um","10.10.24.30"))
$DRDNSRecords = @(("mail","10.10.12.30"),("COLRPCmail","10.10.12.30"),("um","10.10.12.30"))
$SendConnector = ""

Start-Transcript -Path "Failover.log" -Append -NoClobber
clear
If (!$MakeChanges) {Write-Host "`nNote: Script in logging mode only, `$MakeChanges=`$False`n" -ForegroundColor Yellow}

Function Failover {
    Write-Host "`n`nWARNING: You are about to fail or move Exchange to the [$DRADSite] site!" -ForegroundColor Red
    $Caption = "Preparing to failover Exchange, please choose:"
    $Message = "-> Is this a planned or unplanned failover?"
    Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Planned", "&Unplanned","&Abort failover" -Default 0) {
        0 {
            $PlannedFailover = $True
            $extendedFailover = $True
        }
        1 { $PlannedFailover = $False }
        2 { CloseScript }
    }
    CheckDBHealth $PlannedFailover
    ReplicationHealth $PlannedFailover
    If ($PlannedFailover) {
        $Caption = "Will this be an extended failover, please choose:"
        $Message = "-> If you continue with an extended failover, the servers in the [$PrimaryADSite] site will be removed from the DAG and not kept up to date."
        Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Yes","&No","&Abort" -Default 1) {
            0 { $extendedFailover = $True }
            1 { $extendedFailover = $False }
            2 { CloseScript }
        }
    }
    $DAG = Get-DatabaseAvailabilityGroup
    If (!$extendedFailover) {
        $Caption = "Change File Share Witness to [$AltWS], please choose:"
        $Message = "-> If the current FSW server [" + $dag.WitnessServer.HostName + "] will be off-line it should be changed."
        Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Yes","&No","&Abort" -Default 0) {
            0 {
                Write-Host "`n`tChanging WitnessServer to [$AltWS] and WitnessDirectory to [$AltWD]"
                If ($MakeChanges) {Set-DatabaseAvailabilityGroup -Identity $DAGName -WitnessServer $AltWS -WitnessDirectory $AltWD}
            }
            1 {}
            2 { CloseScript }
        }
    }
    MoveResources $True $extendedFailover $PlannedFailover
}

Function Failback {
    Write-Host "`n`nYou are about to fail Exchange back to [$PrimaryADSite]. This will move resources back to this site and recover the DAG" -ForegroundColor Green
    $Caption = "Preparing to fail back Exchange, please choose:"
    $Message = "-> Was this a planned or unplanned failover?"
    Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Planned", "&Unplanned" -Default 0) {
        0 {
            $PlannedFailover = $True
            $Caption = "Extended failover, please choose:"
            $Message = "-> Was this an extended failover, was the [$PrimaryADSite] site removed from the DAG?"
            Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Removed for extended failover", "&Not removed" -Default 1) {
                0 { $extendedFailover = $True }
                1 { $extendedFailover = $False }
            }
        }
        1 {
            $PlannedFailover = $False
            $extendedFailover = $True
        }
    }
    CheckDBHealth $PlannedFailover
    ReplicationHealth $PlannedFailover
    $DAG = Get-DatabaseAvailabilityGroup
    $CurrentFSW = $dag.WitnessServer
    If ($CurrentFSW -ne $PrimaryWS -and $CurrentFSW -NotContains $PrimaryWS) {
        Write-Host "`nChanging witness server settings to : [$PrimaryWS] & : [$PrimaryWD], current server: [$CurrentFSW]" -ForegroundColor Cyan
        If ($MakeChanges) {Set-DatabaseAvailabilityGroup -Identity $DAGName -WitnessServer $PrimaryWS -WitnessDirectory $PrimaryWD}
    }
    MoveResources $False $extendedFailover $PlannedFailover
}

Function InputPrompt {
    #From: http://blogs.technet.com/b/jamesone/archive/2009/06/24/how-to-get-user-input-more-nicely-in-powershell.aspx
    Param(
        [String[]]$choiceList,
        [String]$Caption="Please make a selection",
        [String]$Message="Choices are presented below",
        [int]$default=0
    )
    $choicedesc = New-Object System.Collections.ObjectModel.Collection[System.Management.Automation.Host.ChoiceDescription]
    $choiceList | foreach { $choicedesc.Add((New-Object "System.Management.Automation.Host.ChoiceDescription" -ArgumentList $_))}
    $Host.ui.PromptForChoice($caption, $message, $choicedesc, $default)
}

Function CheckQueues {
    Param ([array]$Databases, [Bool]$ExitOnIssue=$False)
    ForEach ($Database in $Databases) {
        $CopyQueueLength1 = $Database.CopyQueueLength
        $DatabaseName = $Database.Name
        write-host "`t`t`nRefreshing in 5 seconds`n" -Foregroundcolor Yellow
        start-sleep -s 5
        $SecondStatus = Get-MailboxDatabaseCopyStatus $Database.Name
        $CopyQueueLength2 = $SecondStatus.CopyQueueLength
        If ($CopyQueueLength2 -ge $CopyQueueLength1 -or $CopyQueueLength2 -gt $CopyQueueLengthWarn ) {
            $Caption = "Database [$DatabaseName] CopyQueueLength is not decreasing or still > $CopyQueueLengthWarn : `n`tPrevious value: [$CopyQueueLength1], current value [$CopyQueueLength2]"
            $Message = ""
            Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Wait another 5 seconds","&Continue failover","&Abort failover" -Default 0) {
                0 {CheckQueues $Database $ExitOnIssue}
                1 {Return}
                2 {If ($ExitOnIssue) {CloseScript}}
            }
        }
    }
}

Function CheckDBHealth {
    Param ( [Bool]$ExitOnIssue=$False)
    Write-Host "`n`tChecking health of databases..." -Foregroundcolor Green
    $WarningDBs = @()
    $BadDBs = @()
    $DBsCopyStatus = Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus
    ## Bug\issue with formatting that sometimes throws an error when piping to ft on the line below
    Try {
        $DBsCopyStatus | Sort-Object Status,Name | ft name, status, @{label="Copy Q";expression={$_.CopyQueueLength}}, @{label="Replay Q";expression={$_.ReplayQueueLength}}, @{label="Index";expression={$_.contentIndexState}}, LastInspectedLogTime -Auto
    }
    Catch {
        Write-Host "Hit a stupid PowerShell formatting bug, unable to display status, run: `n`t[Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus]" -Foregroundcolor Magenta
    }
    # Get-MailboxDatabase -ea Continue | Get-MailboxDatabaseCopyStatus | Sort-Object Status,Name | ft name, status, @{label="Copy Q";expression={$_.CopyQueueLength}}, @{label="Replay Q";expression={$_.ReplayQueueLength}}, @{label="Index";expression={$_.contentIndexState}}, LastInspectedLogTime -Auto
    ForEach ($DBCopyStatus in $DBsCopyStatus) {
        If ($DBCopyStatus.Status -ne "Healthy" -And $DBCopyStatus.Status -ne "Mounted") {
            $BadDBs += $DBCopyStatus
        }
    }
    If ($BadDBs.Count -gt 0) {
        Write-Host "The following database(s) are not in a healthy state, script will exit:" -Foregroundcolor Yellow
        ForEach ($DB in $BadDBs) { $DB | select Name, status, CopyQueueLength, ReplayQueueLength }
        # Exit whenever any database copy is unhealthy
        CloseScript
    }
    $BadDBs = @()
    ForEach ($DBCopyStatus in $DBsCopyStatus) {
        If ($DBCopyStatus.CopyQueueLength -ge $CopyQueueLengthWarn -and $DBCopyStatus.CopyQueueLength -lt $CopyQueueLengthMax) {
            $WarningDBs += $DBCopyStatus
        }
        If ($DBCopyStatus.CopyQueueLength -ge $CopyQueueLengthMax) {
            $BadDBs += $DBCopyStatus
        }
    }
    If ($BadDBs.Count -gt 0) {
        Write-Host "`n*** FAILED *** The following database(s) have too large of a CopyQueueLength ( >= $CopyQueueLengthMax ), the script will now exit:" -Foregroundcolor Red
        ForEach ($DB in $BadDBs) { $DB | select Name, CopyQueueLength, ReplayQueueLength }
        CloseScript
    }
    If ($WarningDBs.Count -gt 0) {
        Write-Host "The following database(s) have a CopyQueueLength >= $CopyQueueLengthWarn " -Foregroundcolor Yellow
        ForEach ($DB in $WarningDBs) { $DB | select Name, CopyQueueLength, ReplayQueueLength }
        CheckQueues $WarningDBs $ExitOnIssue
    }
    If ($BadDBs.Count -eq 0 -and $WarningDBs.Count -eq 0) {
        Write-Host "`tAll databases are healthy, failover script will continue" -Foregroundcolor Green
    }
    Else {
        Write-Host "`nSome databases were found to have logs in their queue, but user has chosen to continue with failover" -Foregroundcolor Yellow
    }
}

Function ReplicationHealth {
    Param ( [Bool]$ExitOnIssue=$False)
    Write-Host "`n`tChecking health of replication infrastructure..." -Foregroundcolor Green
    $FailedChecks = @()
    $DAG = Get-DatabaseAvailabilityGroup $DAGName
    # Write-Host "Server Count: " $DAG.Servers.Count
    ForEach ($Server in $DAG.Servers) {
        Write-Host "`tChecking: $Server " -Foregroundcolor Cyan
        $FailedChecks += Test-ReplicationHealth $Server | ? {$_.Result -NotLike "Passed"}
    }
    If ($FailedChecks.Count -gt $DAG.Servers.Count) {
        Write-Host "`n*** FAILED *** The following replication health checks have failed:" -Foregroundcolor Red
        $FailedChecks | ft -wrap
        If ($ExitOnIssue) {CloseScript}
    }
    Else {
        Write-Host "`n`tReplication infrastructure is healthy" -Foregroundcolor Green
    }
}

Function MoveResources {
    Param ($Failover,$extendedFailover,$PlannedFailover )
    Write-Host "`nFailover: $Failover | Extended failover: $extendedFailover | Planned failover: $PlannedFailover" -Foregroundcolor Blue
    If ($Failover) {
        $TargetServers = $DRServers
        $TargetSite = $DRADSite
        $TargetPFDB = $DRPFDB
        $DNSRecords = $DRDNSRecords
    }
    Else {
        $TargetServers = $PrimaryServers
        $TargetSite = $PrimaryADSite
        $TargetPFDB = $PrimaryPFDB
        $DNSRecords = $PrimaryDNSRecords
    }
    $PrimaryServer = $TargetServers[0]
    Write-Host "`nMoving the PAM to [$PrimaryServer] " -ForegroundColor Green
    If ($MakeChanges) {cluster.exe group "Cluster Group" /moveto:$PrimaryServer}
    write-host "`nActivating all mailbox databases in the [$TargetSite] AD Site" -ForegroundColor Green
    # Do not attempt to move databases already active on the DR server
    If ($Failover -And !$extendedFailover) {
        If ($MakeChanges) {Get-MailboxDatabase | ? {$_.Server -ne $PrimaryServer} | Move-ActiveMailboxDatabase -ActivateOnServer $PrimaryServer -Confirm:$false}
    }
    ElseIf ($Failover -And ($extendedFailover -Or !$PlannedFailover)) {
        If (!$PlannedFailover) {
            $Caption = "Status of [$PrimaryADSite] site"
            $Message = "Are the Exchange servers in [$PrimaryADSite] on-line AND available from [$DRADSite]? If they are unavailable the script will not attempt to contact them."
            Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&On-line","&Unavailable\off-line" -Default 0) {
                0 { $Unavailable = $False }
                1 { $Unavailable = $True }
            }
        }
        If (!$Unavailable -Or $extendedFailover) {
            Write-Host "Stopping and removing the servers from the DAG in [$PrimaryADSite]." -ForegroundColor Green
            If ($MakeChanges) {Stop-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false}
        }
        Else {
            Write-Host "Removing servers from the DAG in [$PrimaryADSite] in the AD only." -ForegroundColor Green
            If ($MakeChanges) {Stop-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false -ConfigurationOnly}
        }
    }
    ElseIf ($extendedFailover -And !$Failover) {
        Write-Host "Starting DAG in [$PrimaryADSite]." -ForegroundColor Green
        If ($MakeChanges) {Start-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false}
    }
    If ($extendedFailover) {
        ForEach ($Server in $TargetServers) {
            Write-Host "`nStopping the cluster service on $Server." -ForegroundColor Green
            If ($MakeChanges) {
                (new-Object System.ServiceProcess.ServiceController('ClusSvc',$Server)).Stop()
                (new-Object System.ServiceProcess.ServiceController('ClusSvc',$Server)).WaitForStatus('Stopped',(new-timespan -seconds 3))
            }
        }
        Write-Host "`nRecovering the DAG in [$TargetSite] site." -ForegroundColor Green
        If ($MakeChanges) {Restore-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $TargetSite -confirm:$false}
    }
    If (!$Failover) {
        Write-Host "`nForcing AD replication, calling RepAdmin" -ForegroundColor Green
        repadmin /syncall /APe $PrimaryDC > $Null
        Write-Host "Preparing to activate databases in $TargetSite." -ForegroundColor Green
        # Won't work in an environment with a Journal copy with indexing disabled
        If ($MakeChanges) {&(Join-Path $EXscripts "RedistributeActiveDatabases.ps1") -DagName $DAGName -BalanceDbsByActivationPreference -Confirm:$false -LogEvents}
        <# This code may be needed in some environments
        $DBs = Get-MailboxDatabase
        ForEach ($DB in $DBs) {
            $ActivationPreference = $DB.ActivationPreference | ?{$_.Value -eq 1}
            $TargetServer = $ActivationPreference.Key
            If ($DB.Server -ne $TargetServer) {
                Write-Host "`t Activating [$DB] on [$TargetServer]..."
                If ($MakeChanges) {Move-ActiveMailboxDatabase $DB -ActivateOnServer $TargetServer -confirm:$false}
            }
            Else {
                Write-Host "`tDatabase [$DB] is already activated on [$TargetServer], skipping" -ForegroundColor Green
            }
        }
        #>
    }
    Write-host "`nChanging Public Folders database to $TargetPFDB on mailboxes" -ForegroundColor Green
    If ($MakeChanges) {Get-MailboxDatabase | Set-MailboxDatabase -PublicFolderDatabase $TargetPFDB }
    If ($MakeChanges) {Get-PublicFolderDatabase | Mount-Database } # Should already be mounted, but just in case. Doesn't return any results if already mounted
    Write-host "Pausing for 5 seconds..."
    Sleep 5
    Write-Host "`nChecking health post failover..." -ForegroundColor Green
    CheckDBHealth $False
    ReplicationHealth $False
    Write-Host "`nUpdating IP Addresses..." -ForegroundColor Green
    UpdateDNS $DNSRecords
    Write-Host "`nForcing AD replication, calling RepAdmin" -ForegroundColor Green
    repadmin /syncall /APe $PrimaryDC > $Null
    If ($SendConnector -ne "") {
        If ($Failover) {$ConnectorCost = 5} Else {$ConnectorCost = 1}
        Write-Host "`nUpdating Send Connector [$SendConnector] cost to [$ConnectorCost]..." -ForegroundColor Green
        If ($MakeChanges) {Set-SendConnector -AddressSpaces "SMTP:*;$ConnectorCost" -Identity $SendConnector}
    }
    write-host "`n`n`nFailover/back is complete`n" -ForegroundColor Blue
}

Function UpdateDNS {
    Param($HostNames)
    ForEach ($DNSServer in $DNSServers) {
        Write-Host "`nUpdating DNS Server [$DNSServer]" -ForegroundColor Cyan
        $iHost = 0
        Do {
            $HostName = $HostNames[$iHost][0]
            $IPAddress = $HostNames[$iHost][1]
            Write-Host "`tChanging host entry [$HostName] to [$IPAddress]" -ForegroundColor Cyan
            If ($MakeChanges) {
                dnscmd.exe $DNSServer /recorddelete $DNSZone $HostName A /f > $Null
                dnscmd.exe $DNSServer /recordadd $DNSZone $HostName 300 A $IPAddress > $Null
            }
            $iHost = $iHost + 1
        } While ($iHost -lt $HostNames.Count )
        If ($MakeChanges) {dnscmd.exe $DNSServer /clearcache > $Null}
    }
}

Function CloseScript {
    stop-transcript
    Write-Host "Script has finished"
    Exit
}

CheckDBHealth $True
$Caption = "Exchange 2010 failover script, please choose:"
$Message = "-> Do you want to failover to DR [XO] or failback to Primary [COL]?"
Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&DR/XO","&Primary/COL","&Abort" -Default 2) {
    0 {
        Failover
        $Mode = "Failover"
    }
    1 {
        Failback
        $Mode = "Failback"
    }
    2 { CloseScript }
}
CloseScript
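One thing to watch when customizing the DNS settings: the script stores records as nested arrays of (host name, IP address) pairs and UpdateDNS reads them by index. That works with two or more pairs, but if you trim the list to a single record, PowerShell unrolls the lone pair into a flat array of two strings and the indexing breaks. The sketch below shows the pitfall and one way around it (a leading comma). The variable names are hypothetical; the sample values are taken from the script's settings.

```powershell
# Two or more pairs: each element is itself a (name, IP) array
$several = @(("mail","10.10.12.30"),("um","10.10.12.30"))

# A single pair inside @() is unrolled into two plain strings
$flat = @(("mail","10.10.12.30"))

# A leading comma keeps the single pair as one element
$single = @(,("mail","10.10.12.30"))

"several: {0} record(s)" -f $several.Count   # 2
"flat:    {0} record(s)" -f $flat.Count      # 2, although only one record was intended
"single:  {0} record(s)" -f $single.Count    # 1

# Index-based access, as used by UpdateDNS, only works on the nested form
foreach ($record in $single) {
    "host [{0}] -> [{1}]" -f $record[0], $record[1]
}
```

With the flat form, `$flat[0][0]` indexes into the string "mail" and yields the single character `m` rather than the host name, so a one-record configuration should use the leading comma.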