Physical Design and Replication Topologies

In general, replication can improve the availability of data. Replication is the vehicle for data dissemination when discrete sites in the enterprise require local access to data. It can also provide roll-up services to feed growing data aggregation and warehousing needs where data is spread throughout the corporate business domain.

Data Dissemination

Replication can improve the availability of data by locating all elements of a data set that are likely to be accessed by each site in the enterprise on a local system.

The Microsoft SQL Server approach to transactional replication protects data integrity by enforcing a strict ordering of transactions. At the same time, it provides maximum site autonomy by discouraging multi-site updates of individual rows and by allowing latent distribution of transactions using a store-and-forward approach.

Departmental databases sometimes need rapid access to data from other departmental databases. Replication allows each database to partition its services according to function while still providing rapid access to the reference data needed for processing. For example, in a manufacturing environment, summary tables of current production levels and tolerances by day can be replicated to an executive information system. Heavy reporting against the summary data and associated operations data can then run without the manufacturing department's production server incurring the load.
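The manufacturing example above can be sketched with the replication stored procedures. All server, database, publication, and table names here are hypothetical, and exact parameter names vary across SQL Server versions:

```sql
-- Run on the manufacturing (publishing) server.
-- Publish the daily production summary table transactionally.
EXEC sp_addpublication
    @publication = N'DailyProduction',
    @repl_freq   = N'continuous'           -- transactional, not scheduled refresh

EXEC sp_addarticle
    @publication   = N'DailyProduction',
    @article       = N'ProductionSummary',
    @source_object = N'ProductionSummary'  -- summary of levels and tolerances by day

-- Subscribe the executive information system server to the publication.
EXEC sp_addsubscription
    @publication    = N'DailyProduction',
    @subscriber     = N'EIS_SERVER',
    @destination_db = N'ExecReports'
```

Reporting queries then run against the `ExecReports` copy, so the production server carries only the replication load, not the reporting load.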

Data Roll-up

Replication can also be used to aggregate or "roll up" data from remote locations or different databases. Replication is often a key enabling technology in data warehousing, developing decision support databases, and other On-Line Analytical Processing (OLAP) functions.

Warehousing

Data warehousing is a key consumer of replication. A central site is used to aggregate data from a variety of remote sites that may or may not share precisely matching schemas for each table to be aggregated.

Where the schema is the same throughout the enterprise, each feeder site can be established as a publisher, and the central warehouse can subscribe to the data from each site. In such an environment, transactional replication must be used; scheduled table refresh is not supported. Synchronization jobs cannot be used because they drop existing tables at the subscriber and refresh them with the current schema and data set from a single publisher, which would destroy the data already aggregated from the other sites. As a result, you must rely on a more manual approach to initial synchronization in your enterprise.
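A roll-up subscription of this kind might be scripted as follows. The names are hypothetical, and parameter names vary across SQL Server versions; the key idea is declining automatic synchronization so the warehouse table is loaded manually (for example, with bcp) before replication begins:

```sql
-- Run at each feeder (publishing) site.
EXEC sp_addpublication
    @publication = N'SalesRollup',
    @repl_freq   = N'continuous'

EXEC sp_addarticle
    @publication   = N'SalesRollup',
    @article       = N'Sales',
    @source_object = N'Sales'

-- Subscribe the central warehouse without automatic synchronization,
-- so the existing aggregated table is never dropped and refreshed.
EXEC sp_addsubscription
    @publication    = N'SalesRollup',
    @subscriber     = N'WAREHOUSE',
    @destination_db = N'Warehouse',
    @sync_type      = N'none'    -- subscriber is assumed to be synchronized manually
```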

Enterprise Synchronization and Aggregating Partial Replicas

The standard replication synchronization option does not always work when published articles have differing schemas. Therefore, it can be necessary to create views as custom objects for use in synchronization. The result set of such a view, rather than the base table data, is then sent to the subscriber.

You must include custom filter or synchronization objects in the definition of your publications before you can add subscriptions to them. Once subscriptions exist on a publication, you cannot change the method of synchronization without first dropping all subscriptions.

When you create a view to use as a custom synchronization object, you must select the object as the synchronization object in the Advanced Options dialog box of the Manage Articles window. Horizontal and vertical partitioning automatically generate custom views to be used as the synchronization object for articles published using partitioning. As a result, it is rarely necessary to create custom synchronization objects.
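When a custom synchronization object is needed, the scripted equivalent of the Advanced Options dialog box selection looks roughly like this. The view, table, and publication names are hypothetical, and parameter names vary by SQL Server version:

```sql
-- Hypothetical view that reshapes the base table to match the subscriber's schema.
CREATE VIEW SyncOrders AS
SELECT OrderID, CustomerID, OrderDate   -- only the columns the subscriber expects
FROM Orders
GO

-- Register the view as the article's synchronization object.
EXEC sp_addarticle
    @publication   = N'OrderPub',
    @article       = N'Orders',
    @source_object = N'Orders',
    @sync_object   = N'SyncOrders'      -- sent to subscribers during synchronization
```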

Locating the Distributor

Roll-up scenarios usually involve more publishers of data than subscribers. Publishers can be created with their own local distribution database or with a shared distribution database. A publisher whose distribution database is located on the same SQL Server as the publishing database is called a distribution publisher, or distpublisher.

The log reader reads commands from the publishing database log and writes them to the distribution database without generating network traffic: all reads and writes during the log reading process are performed locally at the publisher. A local distributor is sufficient for most applications.

By contrast, a server can be established exclusively as the distributor. When a remote distributor is used, the log reader connects to the local server to read the published database log and then establishes a remote connection to the distributor to write items marked for replication to the distribution database. The publisher server is not taxed with distribution to many subscribers because the final distribution processing is off-loaded to a dedicated server. This can be useful when the publisher is already burdened by transactions from its local application and large numbers of subscribers require distribution.
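Designating a remote distributor can be sketched as follows. Server names and the password are placeholders, this omits steps such as registering each publisher at the distributor, and parameter names vary across SQL Server versions:

```sql
-- Run at the dedicated distribution server to create the distribution database:
EXEC sp_adddistributiondb @database = N'distribution'

-- Run at the publisher to point it at the remote distributor:
EXEC sp_adddistributor
    @distributor = N'DIST_SERVER',
    @password    = N'<strong password>'
```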

Replicating on a Network

When all servers in an enterprise are on the same high-speed network, replication flow can occur continuously. When a subscriber is on the same network as the publisher, the two computers are well-connected for the purposes of replicating data. Continuous distribution of commands marked for replication is desirable so the degree of latency (delay between the time a transaction is committed on the publishing database and applied to the subscribing database) is minimized.

The geographic distribution of business processes and regional sites can influence the decision to implement replication as part of the business systems solution. When distance and cost prohibit maintaining all subscribers and publishers on the same network, the enterprise is not well-connected. Replication can work over a slow-speed, dial-up connection, but delaying distribution of commands incurs latency in updating all sites and imposes a corresponding limit on the transaction rates the application can support.

If your application is constantly updating data and producing more transactions than can be applied during a limited dial-up connection, data at subscriber locations will fall increasingly out of date. Where timely synchronization is key and overall data volumes are reasonably small, a complete, periodic synchronization of the subscriber with the publisher is often sufficient. This is particularly useful in a data dissemination environment where not all subscribers need all published data, and custom filters and synchronization objects can be used to reduce the amount of data transferred to a subscriber.
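A custom filter that trims the data sent over a slow link can be sketched as a horizontally partitioned article. The publication, table, and region values are hypothetical, and parameter names vary by SQL Server version:

```sql
-- Each regional subscriber receives only its own rows, reducing the
-- volume of data that must cross the dial-up connection.
EXEC sp_addarticle
    @publication   = N'CustomerPub',
    @article       = N'Customers_West',
    @source_object = N'Customers',
    @filter_clause = N'Region = ''West'''   -- horizontal partition of the table
```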