Subscription Server Recovery

A subscription server is the easiest type of server to recover. At the beginning of each scheduled distribution event, the distribution server connects to the subscription server and finds the value of the highest transaction that has been applied. This value is stored in the job_id column of the MSlast_job_info table.

If a subscription server is unavailable for an extended period of time, all replicated transactions are held in the distribution server's distribution database. When the subscription server is again available on the network, distribution of replicated transactions continues at the point where it last stopped. This allows a subscription server to automatically recover and receive all replicated transactions after situations such as a loss of power, loss of network connection, or minor hardware failure. In many cases, a subscription server can also recover from a major failure.

If a major subscription server failure occurs and the subscribing database must be recovered from a backup tape, it is still possible for the subscription server to automatically recover and receive all replicated transactions. Not only are all nondistributed transactions retained within the distribution server's distribution database, but distributed transactions are also retained for a configurable period of time after they have been sent to the subscription server. After a previous image of the subscription server is restored from backup, if the job_id column in the MSlast_job_info table points to a valid transaction that is still available on the distribution server, all replicated transactions with a job_id greater than this entry will be automatically distributed to the subscription server.
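The resume rule above can be illustrated with a minimal Python sketch. This is a simplified model, not the distribution server's actual implementation: the function name jobs_to_redistribute and the list of retained job IDs are hypothetical, and a job_id is treated as "valid" only if it is still present among the retained jobs.

```python
def jobs_to_redistribute(last_job_id, retained_job_ids):
    """Return the retained job IDs that follow last_job_id, in order.

    Simplified model (assumption): last_job_id is "valid" only if it is
    still retained in the distribution database. Returns None when it is
    not, meaning a complete automatic resume cannot be guaranteed.
    """
    retained = sorted(set(retained_job_ids))
    if last_job_id not in retained:
        return None
    # Everything with a higher job_id is redistributed to the subscriber.
    return [job_id for job_id in retained if job_id > last_job_id]


# Hypothetical example: the restored backup recorded job 104.
print(jobs_to_redistribute(104, [101, 104, 107, 110]))  # [107, 110]
print(jobs_to_redistribute(90, [101, 104, 107, 110]))   # None
```

In this model, a longer retention period keeps more job IDs in the retained list, which makes it more likely that the job_id restored from backup is still found there.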

To give automatic recovery the best chance of succeeding, disable the scheduled cleanup task associated with that subscription server immediately after the subscription server fails, and leave it disabled until the restoration is complete. This keeps the system from aging retained transactions out of the distribution database before they can be reapplied.

There is a potential problem to be aware of when a subscription server is recovered from an earlier backup tape. Because a subscription server need not receive all replicated transactions from a publication server, it is a perfectly valid condition for the subscription server to contain a job_id in the MSlast_job_info table that is much smaller than the next job_id that needs to be passed to this server. For this reason, the distribution server will not declare an error condition if the subscription server contains an old job_id.

In this situation, the distribution server will send all stored transactions to the subscription server, but a logical gap can exist. The gap consists of transactions that had previously been sent to the subscriber and have since been removed from the distribution database, but that were not preserved on the backup used to recover the subscription server. Should this condition exist, an informational message will be produced by the distribution process. All replicated changes can be recovered by unsubscribing and then resubscribing to the affected publications.
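To make the definition of the logical gap concrete, the following Python sketch computes it for a hypothetical history. It is an illustration only: the function name find_gap and the list of previously sent job IDs are assumptions, and nothing here implies that the distribution server tracks or reports the gap in this way.

```python
def find_gap(restored_job_id, sent_job_ids, retained_job_ids):
    """Return job IDs that fall into the logical gap, in order.

    A job is in the gap if it was sent to the subscriber after the
    backup was taken (job_id > restored_job_id) and has since been
    removed from the distribution database (not retained).
    All three inputs are hypothetical illustration data.
    """
    retained = set(retained_job_ids)
    return sorted(
        job_id
        for job_id in set(sent_job_ids)
        if job_id > restored_job_id and job_id not in retained
    )


# Hypothetical example: the backup recorded job 104, and job 107 was
# sent afterward but has already been cleaned up on the distributor.
print(find_gap(104, [101, 104, 107, 110, 113], [110, 113, 116]))  # [107]
```

An empty result corresponds to the case in which every transaction sent since the backup is still retained, so no resubscription is needed in this model.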

When the distribution task runs and encounters a missing job, by default it raises an informational message. Optionally, you can run the distribution task so that a missing job raises a fatal error instead. This is set by running the distribution task with the -m missingjobsfailure option, where missingjobsfailure is 0 or 1. If 1, a missing job raises an error with a severity of 20 (fatal). If 0, a missing job raises an error with a severity of 10 (informational). The default is 0. For more information, see Chapter 16, Scheduling Tasks.
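The mapping from the missingjobsfailure value to the severity raised can be summarized in a short Python sketch. The function and constant names are hypothetical; only the values 0 and 1, the severities 10 and 20, and the default of 0 come from the description above.

```python
SEVERITY_INFORMATIONAL = 10  # raised when missingjobsfailure is 0
SEVERITY_FATAL = 20          # raised when missingjobsfailure is 1


def missing_job_severity(missingjobsfailure=0):
    """Return the severity raised for a missing job.

    Models the -m option described above; the default of 0 matches
    the documented default. Any other value is rejected.
    """
    if missingjobsfailure not in (0, 1):
        raise ValueError("missingjobsfailure must be 0 or 1")
    if missingjobsfailure == 1:
        return SEVERITY_FATAL
    return SEVERITY_INFORMATIONAL


print(missing_job_severity())   # 10
print(missing_job_severity(1))  # 20
```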