Script: Site-Failover.ps1 – Exchange 2010 failover script with health checks


2/25 Update: Added better support for updating Send Connectors. The script now supports an array of Send Connectors, each with a normal cost and a DR cost, to be used when the normal route may not work. The code below has NOT been updated. Get the latest version from: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1
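
Since the listing in this post predates that change, here is a minimal sketch of how an array of Send Connectors with a normal and a DR cost might be represented. The connector names, the keys (Name, NormalCost, DRCost) and the helper Get-ConnectorAddressSpace are illustrative assumptions, not taken from the updated script; only the "SMTP:*;cost" address space format comes from the Set-SendConnector call in this post.

```powershell
# Hypothetical connector list: names and costs are placeholders
$SendConnectors = @(
    @{ Name = "Internet via COL"; NormalCost = 1;  DRCost = 10 },
    @{ Name = "Internet via DR";  NormalCost = 10; DRCost = 1  }
)

# Illustrative helper: build the -AddressSpaces value for one connector
Function Get-ConnectorAddressSpace {
    Param ([hashtable]$Connector, [bool]$Failover)
    If ($Failover) { $Cost = $Connector.DRCost } Else { $Cost = $Connector.NormalCost }
    "SMTP:*;$Cost"
}

ForEach ($Connector in $SendConnectors) {
    $AddressSpace = Get-ConnectorAddressSpace -Connector $Connector -Failover $True
    Write-Host "Would set [$($Connector.Name)] to [$AddressSpace]"
    # In the Exchange Management Shell, with changes enabled:
    # Set-SendConnector -Identity $Connector.Name -AddressSpaces $AddressSpace
}
```

Keeping both costs on each connector means a failback only has to flip the -Failover value, rather than remembering what the original costs were.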

I created this script several months ago for a client; it took about 18 hours to create and fully test. I recently made a few modifications to it and decided to post it today.

This script is designed to do either a partial failover, where DAG membership is not changed, or a full failover that removes DAG members. It supports both planned failovers and unplanned failovers, where the servers are off-line. It checks the health of database replication and of the replication infrastructure, using the Test-ReplicationHealth cmdlet. If a database's CopyQueueLength reaches the warning level (5 logs by default), the script waits five seconds and checks the queue length again. If the queue has not decreased, it asks the user whether to wait and check again, continue, or exit. If any CopyQueueLength reaches the maximum (50 by default), the script aborts so the user can figure out why. This is in addition to moving databases, updating DNS, AD, etc.
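
The queue checks described above boil down to two small comparisons. The sketch below isolates them; the helper names Get-QueueState and Test-QueueStillHigh are made up for illustration and do not exist in the script, while the thresholds and comparison operators mirror the ones the script uses.

```powershell
$CopyQueueLengthWarn = 5    # Script defaults
$CopyQueueLengthMax  = 50

# Illustrative helper: classify a single CopyQueueLength value
Function Get-QueueState {
    Param ([int]$CopyQueueLength)
    If ($CopyQueueLength -ge $CopyQueueLengthMax)  { Return "Abort" }
    If ($CopyQueueLength -ge $CopyQueueLengthWarn) { Return "Warn" }
    Return "OK"
}

# Illustrative helper: after the five second wait, should the user be prompted?
Function Test-QueueStillHigh {
    Param ([int]$Before, [int]$After)
    ($After -ge $Before) -or ($After -gt $CopyQueueLengthWarn)
}

Get-QueueState 3                          # OK
Get-QueueState 12                         # Warn
Get-QueueState 50                         # Abort
Test-QueueStillHigh -Before 12 -After 2   # False, the queue drained
Test-QueueStillHigh -Before 12 -After 9   # True, still above the warning level
```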

Key things done by the script

  1. Check health of DBs
  2. Check CopyQueueLength
  3. Check replication infrastructure health
  4. Planned and unplanned (-ConfigurationOnly switch used) failovers
  5. Temporary (DAG left intact) and extended failovers (DAG members removed)
  6. Move databases to DR\secondary site
  7. Update Public Folder database value on Mailbox databases
  8. Update DNS for records that need to be changed to point to DR site
  9. Update Send Connector costs
  10. Force AD replication
  11. Call RedistributeActiveDatabases.ps1 when failing back; code is included for environments where this script won’t work
  12. Use the PowerShell Transcript function to log all actions to a file
  13. Log only mode ($MakeChanges=$False/$True)
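
The log only mode in item 13 is implemented in the script by wrapping every change in an If ($MakeChanges) {...} test. As a sketch of the same idea in one place, the wrapper below takes a description and a script block; Invoke-Change is a hypothetical name, not something the script defines.

```powershell
$MakeChanges = $False   # Log only mode, as in the script

# Hypothetical wrapper: always log the action, only run it when $MakeChanges is $True
Function Invoke-Change {
    Param ([string]$Description, [scriptblock]$Action)
    Write-Host "`t$Description"
    If ($MakeChanges) { & $Action }
    Else { Write-Host "`t(log only, no change made)" -ForegroundColor Yellow }
}

# The script block here is a stand-in for a real cmdlet call
$Result = Invoke-Change -Description "Moving the PAM to [DRMBX01]" -Action { "changed" }
```

With $MakeChanges left at $False the script block never runs and $Result stays empty, so a dry run still produces the full transcript of what would have happened.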

Due to its complexity and the many factors involved, this script will probably be updated many times in the future and will need to be customized for your environment. Currently it doesn’t support dynamic discovery of servers and Public Folder databases, so at a minimum you will need to update the AD Site, server names, IP addresses, Public Folder databases, and other settings\names at the top of the script.
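
As a rough idea of what dynamic discovery could look like, the sketch below splits a list of server objects into primary and DR lists by AD site name. Split-ServersBySite is a hypothetical helper and the sample objects stand in for real data; in the Exchange Management Shell the input would have to be built from something like Get-ExchangeServer, and how the site name is exposed there should be verified before relying on it.

```powershell
# Illustrative helper: group server names by AD site name
Function Split-ServersBySite {
    Param ([object[]]$Servers, [string]$PrimarySite, [string]$DRSite)
    @{
        Primary = @($Servers | Where-Object { $_.Site -eq $PrimarySite } | ForEach-Object { $_.Name })
        DR      = @($Servers | Where-Object { $_.Site -eq $DRSite }      | ForEach-Object { $_.Name })
    }
}

# Sample data standing in for real server objects (assumed shape: Name and Site)
$Sample = @(
    New-Object PSObject -Property @{ Name = "COLMBX01"; Site = "COL" }
    New-Object PSObject -Property @{ Name = "COLMBX02"; Site = "COL" }
    New-Object PSObject -Property @{ Name = "DRMBX01";  Site = "DR"  }
)

$BySite = Split-ServersBySite -Servers $Sample -PrimarySite "COL" -DRSite "DR"
$PrimaryServers = $BySite.Primary
$DRServers      = $BySite.DR
```

Note that the script treats the first entry of each list as the PAM target, so a discovered list may still need a deliberate ordering.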

Screen shot of it running, in a small environment: [image: Failover-Script]
Source: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1

# Exchange 2010 site failover script w/ health checks
# Created by Jason Sherry | izzy@izzy.org | http://jasonsherry.org
# Created 9/21/2012, Last Updated 2/21/2013
#		2/20: Added $MakeChanges switch, commented out code required for Journal mailbox databases (where Index state always = Failed)
#		2/21: Minor clean-up

# Source: http://izzy.org/Scripts/Exchange/Admin/Failover.ps1

#TO DOs
# - Add option to not exit script, when doing post failover checks when an issue is found
# - Checks for ReplayQueueLength & ContentIndexState
# - Add logging to file (In addition to what has been added below that will show up in transcript file)
# - Get server names, instead of having a fixed number when doing cluster calls
# - Add index repair option
# - Get PF DB names, instead of hard coding
# - Add error handling, Try\Trap

$ErrorActionPreference = "SilentlyContinue"
$MakeChanges = $False

#Replication Limits
$CopyQueueLengthWarn = 5
$CopyQueueLengthMax = 50
$ReplayQueueLengthWarn = 5
$ReplayQueueLengthMax = 50

#AD & DAG Info
$PrimaryADSite = "COL"
$DRADSite = "DR"
$DAGName = "DAG01"
## TO DO: Replace with code to get dynamic list of servers
$PrimaryServers = @("COLMBX01","COLMBX02","COLMBX03") # Array of server names, 1st server listed will be the PAM
$DRServers = @("DRMBX01","DRMBX02")
$PrimaryPFDB = "Public Folders COL02"
$DRPFDB = "Public Folders DR02"
$PrimaryDC = "colcorpdc01.corp.company.com"

#File Witness Information
$PrimaryWS = "COLHYPERV01"
$PrimaryWD = "E:\DAG-FSW\DAG01"
$AltWS = "DRHYPERV01"
$AltWD = "E:\DAG-FSW\DAG01"

#DNS Values to update
$DNSZone = "company.com"
$DNSServers = @("10.10.24.4","10.10.16.41","10.10.34.3","10.10.32.8") # Multiple servers can be listed to help reduce latency due to AD replication
$PrimaryDNSRecords = @(("mail","10.10.24.33"),("COLRPCmail","10.10.24.33"),("um","10.10.24.30"))
$DRDNSRecords = @(("mail","10.10.12.30"),("COLRPCmail","10.10.12.30"),("um","10.10.12.30"))

$SendConnector = ""

Start-Transcript -Path "Failover.log" -Append -NoClobber
clear

If (!$MakeChanges) {Write-Host `n'Note: Script in logging mode only, $MakeChanges=$False'`n -ForegroundColor Yellow}

Function Failover {
	Write-Host "`n`nWARNING: You are about to fail or move Exchange to the [$DRADSite] site!" -ForegroundColor Red

	$Caption = "Preparing to failover Exchange, please choose:"
	$Message = "-> Is this a planned or unplanned failover?"
	Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Planned", "&Unplanned","&Abort failover" -Default 0) {
		0 { $PlannedFailover = $True }
		1 { $PlannedFailover = $False
			$extendedFailover = $True } # Unplanned failovers are treated as extended, as in Failback
		2 { CloseScript }
	 }

	CheckDBHealth $PlannedFailover
	ReplicationHealth $PlannedFailover

	If ($PlannedFailover) {
		$Caption = "Will this be an extended failover, please choose:"
		$Message = "-> If you continue with an extended failover, the servers in the [$PrimaryADSite] site will be removed from the DAG and not kept up to date."
		Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Yes","&No","&Abort" -Default 1) {
			0 { $extendedFailover = $True }
			1 { $extendedFailover = $False }
			2 { CloseScript }
		}
	}

	$DAG = Get-DatabaseAvailabilityGroup $DAGName

	If (!$extendedFailover) {
		$Caption = "Change File Share Witness to [$AltWS], please choose:"
		$Message = "-> If the current FSW server [" + $dag.WitnessServer.HostName + "] will be off-line it should be changed."
		Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Yes","&No","&Abort" -Default 0) {
			0 {
				Write-Host "`n`tChanging WitnessServer to [$AltWS] and WitnessDirectory to [$AltWD]"
				If ($MakeChanges) {Set-DatabaseAvailabilityGroup -Identity $DAGName -WitnessServer $AltWS -WitnessDirectory $AltWD}
			}
			1 {}
			2 { CloseScript }
		 }
	}
	MoveResources $True $extendedFailover $PlannedFailover
}

Function Failback {
	Write-Host "`n`nYou are about to fail Exchange back to [$PrimaryADSite]. This will move resources back to this site and recover the DAG" -ForegroundColor Green

	$Caption = "Preparing to fail back Exchange, please choose:"
	$Message = "-> Was this a planned or unplanned failover?"
	Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Planned", "&Unplanned" -Default 0) {
		0 { $PlannedFailover = $True
			$Caption = "Extended failover, please choose:"
			$Message = "-> Was this an extended failover, was the [$PrimaryADSite] site removed from the DAG?"
			Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Removed for extended failover", "&Not removed" -Default 1) {
				0 { $extendedFailover = $True }
				1 { $extendedFailover = $False }
			 }
		}
		1 { $PlannedFailover = $False
			$extendedFailover = $True }
	 }

	CheckDBHealth $PlannedFailover
	ReplicationHealth $PlannedFailover

	$DAG = Get-DatabaseAvailabilityGroup $DAGName
	$CurrentFSW = $dag.WitnessServer
	If ($CurrentFSW -ne $PrimaryWS -and $CurrentFSW -NotContains $PrimaryWS) {
		Write-Host "`nChanging witness server to [$PrimaryWS] and witness directory to [$PrimaryWD], current server: [$CurrentFSW]" -ForegroundColor Cyan
		If ($MakeChanges) {Set-DatabaseAvailabilityGroup -Identity $DAGName -WitnessServer $PrimaryWS -WitnessDirectory $PrimaryWD}
	}
	MoveResources $False $extendedFailover $PlannedFailover
}

Function InputPrompt {
#From: http://blogs.technet.com/b/jamesone/archive/2009/06/24/how-to-get-user-input-more-nicely-in-powershell.aspx
Param(   [String[]]$choiceList,
         [String]$Caption="Please make a selection",
         [String]$Message="Choices are presented below",
         [int]$default=0  )
   $choicedesc = New-Object System.Collections.ObjectModel.Collection[System.Management.Automation.Host.ChoiceDescription]
   $choiceList | foreach  { $choicedesc.Add((New-Object "System.Management.Automation.Host.ChoiceDescription" -ArgumentList $_))}
   $Host.ui.PromptForChoice($caption, $message, $choicedesc, $default)
}

Function CheckQueues {
Param ([array]$Databases, [Bool]$ExitOnIssue=$False)
	ForEach ($Database in $Databases) {
		$CopyQueueLength1 = $Database.CopyQueueLength
		$DatabaseName = $Database.Name
		write-host "`t`t`nRefreshing in 5 seconds`n" -Foregroundcolor Yellow
		start-sleep -s 5
		$SecondStatus = Get-MailboxDatabaseCopyStatus $Database.Name
		$CopyQueueLength2 = $SecondStatus.CopyQueueLength
		If ($CopyQueueLength2 -ge $CopyQueueLength1 -or $CopyQueueLength2 -gt $CopyQueueLengthWarn ) {
			$Caption = "Database [$DatabaseName] CopyQueueLength is not decreasing or still > $CopyQueueLengthWarn : `n`tPrevious value: [$CopyQueueLength1], current value [$CopyQueueLength2]"
			$Message = ""
			Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&Wait another 5 seconds","&Continue failover","&Abort failover" -Default 0) {
				0 {CheckQueues $Database $ExitOnIssue}
				1 {Return}
				2 {If ($ExitOnIssue) {CloseScript}}  }
		}
	}
}

Function CheckDBHealth {
	Param ( [Bool]$ExitOnIssue=$False)
	Write-Host "`n`tChecking health of databases..." -Foregroundcolor Green
	$WarningDBs = @()
	$BadDBs = @()

	$DBsCopyStatus = Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus
	## Bug\issue with formatting that sometimes throws an error when piping to ft on the line below
	Try {
		$DBsCopyStatus | Sort-Object Status,Name | ft name, status, @{label="Copy Q";expression={$_.CopyQueueLength}}, @{label="Replay Q";expression={$_.ReplayQueueLength}}, @{label="Index";expression={$_.contentIndexState}}, LastInspectedLogTime -Auto
	}
	Catch {
		Write-Host "Hit a stupid PowerShell formatting bug, unable to display status, run: `n`t[Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus]" -Foregroundcolor Magenta
	}
#	Get-MailboxDatabase -ea Continue | Get-MailboxDatabaseCopyStatus | Sort-Object Status,Name | ft name, status, @{label="Copy Q";expression={$_.CopyQueueLength}}, @{label="Reply Q";expression={$_.ReplayQueueLength}}, @{label="Index";expression={$_.contentIndexState}}, LastInspectedLogTime -Auto

	ForEach ($DBCopyStatus in $DBsCopyStatus) {
		If ($DBCopyStatus.Status -ne "Healthy" -And $DBCopyStatus.Status -ne "Mounted") {
			$BadDBs += $DBCopyStatus }
	}
	If 	($BadDBs.Count -gt 0) {
		Write-Host "The following database(s) are not in a healthy state, script will exit:" -Foregroundcolor Yellow
		ForEach ($DB in $BadDBs) {
			$DB | select  Name, status, CopyQueueLength, ReplayQueueLength
		}
		CloseScript # Unhealthy copies always end the run
	}
	$BadDBs = @()
	ForEach ($DBCopyStatus in $DBsCopyStatus) {
		If ($DBCopyStatus.CopyQueueLength -ge $CopyQueueLengthWarn -and $DBCopyStatus.CopyQueueLength -lt $CopyQueueLengthMax) {
			$WarningDBs += $DBCopyStatus }
		If ($DBCopyStatus.CopyQueueLength -ge $CopyQueueLengthMax) {
			$BadDBs += $DBCopyStatus }
	}
	If 	($BadDBs.Count -gt 0) {
		Write-Host "`n*** FAILED *** The following database(s) have too large of a CopyQueueLength ( > $CopyQueueLengthMax ), the script will now exit:" -Foregroundcolor Red
		ForEach ($DB in $BadDBs) {
			$DB | select  Name, CopyQueueLength, ReplayQueueLength
		}
		CloseScript
	}
	If 	($WarningDBs.Count -gt 0) {
		Write-Host "The following database(s) have a CopyQueueLength > $CopyQueueLengthWarn " -Foregroundcolor Yellow
		ForEach ($DB in $WarningDBs) {
			$DB | select  Name, CopyQueueLength, ReplayQueueLength
		}
		CheckQueues $WarningDBs $ExitOnIssue
	}
	If ($BadDBs.Count -eq 0 -and $WarningDBs.Count -eq 0) {
		Write-Host "`tAll databases are healthy, failover script will continue" -Foregroundcolor Green }
	Else {
		Write-Host "`nSome databases were found to have logs in their queue, but the user has chosen to continue with the failover" -Foregroundcolor Yellow	}
}

Function ReplicationHealth {
	Param ( [Bool]$ExitOnIssue=$False)
	Write-Host "`n`tChecking health of replication infrastructure..." -Foregroundcolor Green
	$FailedChecks = @()
	$DAG = Get-DatabaseAvailabilityGroup $DAGName
#	Write-Host "Server Count: "  $DAG.Servers.Count
	ForEach ($Server in $DAG.Servers) {
		Write-Host "`tChecking: $Server " -Foregroundcolor Cyan
		$FailedChecks += Test-ReplicationHealth $Server | ? {$_.Result -NotLike "Passed"}
	}
	If ($FailedChecks.Count -gt $DAG.Servers.Count) {
		Write-Host "`n*** FAILED *** The following replication health checks have failed:" -Foregroundcolor Red
		$FailedChecks | ft -wrap
		If ($ExitOnIssue) {CloseScript}
	}
	Else {
		Write-Host "`n`tReplication infrastructure is healthy" -Foregroundcolor Green
	}
}

Function MoveResources {
Param ($Failover,$extendedFailover,$PlannedFailover )
	Write-Host "`nFailover: $Failover | Extended failover: $extendedFailover | Planned failover: $PlannedFailover" -Foregroundcolor Blue

	If ($Failover) {
		$TargetServers = $DRServers
		$TargetSite = $DRADSite
		$TargetPFDB = $DRPFDB
		$DNSRecords = $DRDNSRecords
	}
	Else {
		$TargetServers = $PrimaryServers
		$TargetSite = $PrimaryADSite
		$TargetPFDB = $PrimaryPFDB
		$DNSRecords = $PrimaryDNSRecords
	}
	$PrimaryServer = $TargetServers[0]
	Write-Host "`nMoving the PAM to [$PrimaryServer] " -ForegroundColor Green
	If ($MakeChanges) {cluster.exe group "Cluster Group" /moveto:$PrimaryServer}
	write-host "`nActivating all mailbox databases in the [$TargetSite] AD Site" -ForegroundColor Green

	# Do not attempt to move databases already active on the DR server
	If ($Failover -And !$extendedFailover) {
		If ($MakeChanges) {Get-MailboxDatabase |  ? {$_.Server -ne $PrimaryServer} | Move-ActiveMailboxDatabase -ActivateOnServer $PrimaryServer -Confirm:$false}
	}
	ElseIf ($Failover -And ($extendedFailover -Or !$PlannedFailover)) {
		If (!$PlannedFailover) {
			$Caption = "Status of [$PrimaryADSite] site"
			$Message = "Are the Exchange servers in [$PrimaryADSite] on-line AND available from [$DRADSite]? If they are unavailable the script will not attempt to contact them."
			Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&On-line","&Unavailable\off-line" -Default 0) {
				0 { $Unavailable = $False }
				1 { $Unavailable = $True }
			}
		}
		If (!$Unavailable) {
			Write-Host "Stopping and removing the servers from the DAG in [$PrimaryADSite]." -ForegroundColor Green
			If ($MakeChanges) {Stop-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false}
		}
		Else {
			Write-Host "Removing servers from the DAG in [$PrimaryADSite] in the AD only." -ForegroundColor Green
			If ($MakeChanges) {Stop-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false -ConfigurationOnly}
		}

	}
	ElseIf ($extendedFailover -And !$Failover) {
		Write-Host "Starting DAG in [$PrimaryADSite]." -ForegroundColor Green
		If ($MakeChanges) {Start-DatabaseAvailabilityGroup $DAGName -ActiveDirectorySite $PrimaryADSite -confirm:$false}
	}

	If ($extendedFailover) {
		ForEach ($Server in $TargetServers) {
			Write-Host "`nStopping the cluster service on $Server." -ForegroundColor Green
			If ($MakeChanges) {
				(new-Object System.ServiceProcess.ServiceController('ClusSvc',$Server)).Stop()
				(new-Object System.ServiceProcess.ServiceController('ClusSvc',$Server)).WaitForStatus('Stopped',(new-timespan -seconds 3))
			}
		}
		Write-Host "`nRecovering the DAG in [$TargetSite] site." -ForegroundColor Green
		If ($MakeChanges) {Restore-DatabaseAvailabilityGroup $DAGName  -ActiveDirectorySite $TargetSite  -confirm:$false}
	}

	If (!$Failover) {
		Write-Host "`nForcing AD replication, calling RepAdmin" -ForegroundColor Green
		repadmin /syncall /APe $PrimaryDC > $Null
		Write-Host "Preparing to activate databases in $TargetSite." -ForegroundColor Green

		# Won't work in environments with a Journal copy that has indexing disabled
		& (Join-Path $exscripts "RedistributeActiveDatabases.ps1") -DagName $DAGName -BalanceDbsByActivationPreference -Confirm:$false -LogEvents
<# This code may be needed in some environments
		$DBs = Get-MailboxDatabase
		ForEach ($DB in $DBs) {
			$ActivationPreference = $DB.ActivationPreference | ?{$_.Value -eq 1}
			$TargetServer = $ActivationPreference.Key
			If ($DB.Server -ne $TargetServer) {
				Write-Host "`t Activating [$DB] on [$TargetServer]..."
				If ($MakeChanges) {Move-ActiveMailboxDatabase $DB -ActivateOnServer $TargetServer -confirm:$false}
			}
			Else {
				Write-Host "`tDatabase [$DB] is already activated on [$TargetServer], skipping" -ForegroundColor Green
			}
		}
#>
	}

	Write-host "`nChanging Public Folders database to $TargetPFDB on mailboxes" -ForegroundColor Green
	If ($MakeChanges) {Get-MailboxDatabase | Set-MailboxDatabase -PublicFolderDatabase $TargetPFDB	}
	If ($MakeChanges) {Get-PublicFolderDatabase | Mount-Database } # Should already be mounted, but just in case. Doesn't return any results if already mounted

	Write-host "Pausing for 5 seconds..."
	Sleep 5
	Write-Host "`nChecking health post failover..."  -ForegroundColor Green
	CheckDBHealth $False
	ReplicationHealth $False

	Write-Host "`nUpdating IP Addresses..." -ForegroundColor Green
	UpdateDNS $DNSRecords
	Write-Host "`nForcing AD replication, calling RepAdmin" -ForegroundColor Green
	repadmin /syncall /APe $PrimaryDC > $Null

	If ($SendConnector -ne "") {
		If ($Failover) {$ConnectorCost = 5} Else {$ConnectorCost = 1}
		Write-Host "`nUpdating Send Connector [$SendConnector] cost to [$ConnectorCost]..." -ForegroundColor Green
		If ($MakeChanges) {Set-SendConnector -AddressSpaces "SMTP:*;$ConnectorCost" -Identity $SendConnector}
	}

	write-host "`n`n`nFailover/back is complete`n" -ForegroundColor Blue
}

Function UpdateDNS {
    Param($HostNames)

	ForEach ($DNSServer in $DNSServers) {
		Write-Host "`nUpdating DNS Server [$DNSServer]" -ForegroundColor Cyan
		$iHost = 0
		Do {
			$HostName = $HostNames[$iHost][0]
			$IPAddress = $HostNames[$iHost][1]
			Write-Host "`tChanging host entry [$HostName] to [$IPAddress]" -ForegroundColor Cyan
			If ($MakeChanges) {
				dnscmd.exe $DNSServer /recorddelete $DNSZone $HostName A /f > $Null
				dnscmd.exe $DNSServer /recordadd $DNSZone $HostName 300 A $IPAddress > $Null
			}
			$iHost = $iHost + 1 }
		While ($iHost -lt $HostNames.Count )
		If ($MakeChanges) {dnscmd.exe $DNSServer /clearcache > $Null}
	}
}

Function CloseScript {
	stop-transcript
	Write-Host "Script has finished"
	Exit
}
CheckDBHealth $True
$Caption = "Exchange 2010 failover script, please choose:"
$Message = "-> Do you want to failover to DR [$DRADSite] or failback to Primary [$PrimaryADSite]?"
Switch (InputPrompt -Caption $Caption -Message $Message -Choice "&DR/XO","&Primary/COL","&Abort" -Default 2) {
	0 {
		Failover
		$Mode = "Failover"
	}
	1 {
		Failback
		$Mode = "Failback"
	}
	2 { CloseScript }
}
CloseScript

About Jason Sherry

I am a ~30 year Exchange consultant and expert. I currently work for Commvault as a Solutions Specialist for Microsoft Infrastructure. For more info see my resume at: http://resume.jasonsherry.org