How the MTA Routes Messages

Each recipient of a message is routed individually. The MTA obtains as much address information for the recipient as possible. If the recipient is local to the current MTA's site, local routing is accomplished by using the recipient object's Home-MTA and Home-MDB attributes.

For non-local recipients, the following routing process determines which connector a message is sent through.

Route Selection

The Microsoft Exchange MTA compares the recipient address to the gateway address routing table (GWART) to determine the group of connectors through which the message can be delivered. The GWART is searched for a match in the following order:

Distinguished name (DN): The native Microsoft Exchange address format. This type is searched only if a DN for the recipient has been found. An exact match on the enterprise and site of the DN is required.

Domain-defined attribute (DDA): The format in which custom recipient addresses, such as Microsoft Mail and Simple Mail Transfer Protocol (SMTP) addresses, are stored. An exact, wildcard, or partial match on the domain-defined attribute value (DDAV) is required, together with an exact or partial match on the domain-defined attribute type (DDAT). Wildcard matches are ranked by exactness: an exact match is used first, followed by the wildcard match with the most matching characters, and so on.

O/R address: The native X.400 address type. An exact or wildcard match on the address space is required. Each field is compared hierarchically with the contents of the GWART for that field until either a match is found or it is determined that no match is present, at which point this recipient is marked for a non-delivery report (NDR). The hierarchy is:

The final result of this search provides a group of connectors that can process the message.
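The following Python sketch illustrates the search order and wildcard ranking under simplifying assumptions. The `GwartEntry` type, its fields, and the connector names are hypothetical, not the real GWART format. DDA type matching is reduced to an exact type match, and the hierarchical field-by-field O/R comparison is reduced to a single wildcard match on the whole address string.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical types and names; a sketch, not the MTA's implementation.


@dataclass(frozen=True)
class GwartEntry:
    addr_type: str   # "DN", "X400", or a custom (DDA) type such as "SMTP"
    space: str       # address space; "*" matches any run of characters
    connector: str   # hypothetical connector name


def rank(space: str) -> tuple:
    # Exact spaces outrank wildcards; among wildcards, more literal characters win.
    return ("*" not in space, len(space.replace("*", "")))


def matches(entry: GwartEntry, address: str) -> bool:
    if entry.addr_type == "DN":
        # Exact match on enterprise and site: the DN must lie under /o=<org>/ou=<site>.
        return address.lower().startswith(entry.space.lower() + "/")
    # Custom and X.400 spaces: case-insensitive wildcard match (simplified).
    return fnmatch.fnmatchcase(address.lower(), entry.space.lower())


def route(gwart: List[GwartEntry], addresses: Dict[str, str]) -> List[str]:
    """Return the connector group for the best match, or [] (mark for NDR)."""
    custom = [t for t in addresses if t not in ("DN", "X400")]
    for addr_type in ["DN", *custom, "X400"]:
        if addr_type not in addresses:
            continue  # e.g. the DN is searched only if the recipient has one
        hits = [e for e in gwart
                if e.addr_type == addr_type and matches(e, addresses[addr_type])]
        if hits:
            best = max(rank(e.space) for e in hits)
            return sorted(e.connector for e in hits if rank(e.space) == best)
    return []


gwart = [
    GwartEntry("SMTP", "*", "IMS-Default"),
    GwartEntry("SMTP", "*@example.com", "IMS-Example"),
    GwartEntry("DN", "/o=Contoso/ou=Paris", "Site-Paris"),
]
print(route(gwart, {"SMTP": "ann@example.com"}))  # ['IMS-Example']
```

Returning every connector that ties for the best match mirrors the text: the search yields a group of connectors, and choosing among them is left to connector selection.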

The routing process is called from the following areas:

Connector Selection

After a list of suitable connectors has been determined, the MTA needs to determine which connector is the best one to use. This determination has two stages: connector grouping, which narrows the list to a final group of connectors, and load balancing, which picks one connector from that group.

Connector Grouping

Each connector has a Max Open Retry count and an Open Retry timer configured in the DSA; Site Connectors use a site-wide configurable value. If the initial attempt to open an association fails, a timer of length Open Retry is started for the association control block (ASB) that handled the original association. When the timer expires, another attempt to open an association is made for that ASB. This process can be repeated up to Max Open Retry times for each connector. After the maximum retry count has been used on all available connectors, the message will NDR for the set of recipients that were routed to those connectors. The retry count for each connector is stored on the message and updated during rerouting.

If the message has not been rerouted in this MTA, exclude the connector that this message came in on, to avoid immediate loop detection.

Select the group of connectors where the message retry count is less than the Max Open Retry count for the connector.
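A minimal sketch of these two filtering steps, assuming hypothetical `Connector` and `Message` types in which the message carries a per-connector retry count. The default of 144 comes from the Site Connector and Dynamic RAS Connector defaults described later in this section; real values are read from the DSA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical types and names; a sketch, not the MTA's implementation.


@dataclass
class Connector:
    name: str
    max_open_retry: int = 144


@dataclass
class Message:
    rerouted_here: bool = False              # has this MTA already rerouted it?
    inbound_connector: Optional[str] = None  # connector the message arrived on
    retry_counts: Dict[str, int] = field(default_factory=dict)  # kept on the message


def eligible(connectors: List[Connector], msg: Message) -> List[Connector]:
    group = []
    for c in connectors:
        # Skip the arrival connector unless the message was rerouted here.
        if not msg.rerouted_here and c.name == msg.inbound_connector:
            continue
        # Keep connectors whose retry count on this message is below the maximum.
        if msg.retry_counts.get(c.name, 0) < c.max_open_retry:
            group.append(c)
    return group  # an empty group means the recipient will NDR


conns = [Connector("X400-B"), Connector("RAS-D", max_open_retry=3), Connector("Site-C")]
msg = Message(inbound_connector="X400-B", retry_counts={"RAS-D": 3})
print([c.name for c in eligible(conns, msg)])  # ['Site-C']
```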

Activation schedule: X.400 Connectors and Dynamic RAS Connectors follow an activation schedule that determines how connections are scheduled to the remote system.

There are four possible settings for the activation schedule:

Site Connectors, the Internet Mail Service, Microsoft Mail Connectors, and other XAPI gateways are always active.

In addition to the activation schedule, routing takes four activation states into account. These states are processed in the following order of preference:

Choose a subgroup of connectors that are active now.

If there is no match, and some connectors will become active later, choose the subgroup of connectors that will become active soonest (several connectors may become active at the same time, for example, in five minutes).

If there is no match, then if any connectors are remote initiated, choose this subgroup of connectors.

If there is no match, then use the connectors that are never active.
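The preference order above can be sketched as follows. The `Activation` enumeration and the minutes-until-active field are illustrative assumptions; how the MTA actually derives the next activation time from a schedule is not shown.

```python
from enum import IntEnum
from typing import List, Tuple

# Hypothetical types and names; a sketch, not the MTA's implementation.


class Activation(IntEnum):
    # Lower value means more preferred, following the order in the text.
    ACTIVE_NOW = 0
    WILL_BECOME_ACTIVE = 1
    REMOTE_INITIATED = 2
    NEVER = 3


def pick_by_activation(connectors: List[Tuple[str, Activation, int]]) -> List[str]:
    """connectors holds (name, state, minutes_until_active) tuples."""
    if not connectors:
        return []
    best = min(state for _, state, _ in connectors)
    group = [c for c in connectors if c[1] == best]
    if best == Activation.WILL_BECOME_ACTIVE:
        # Keep only the connectors that become active soonest; several may tie.
        soonest = min(minutes for _, _, minutes in group)
        group = [c for c in group if c[2] == soonest]
    return [name for name, _, _ in group]


conns = [
    ("X400-A", Activation.WILL_BECOME_ACTIVE, 5),
    ("X400-B", Activation.WILL_BECOME_ACTIVE, 5),
    ("RAS-C", Activation.WILL_BECOME_ACTIVE, 60),
    ("X400-D", Activation.REMOTE_INITIATED, 0),
]
print(pick_by_activation(conns))  # ['X400-A', 'X400-B']
```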

Choosing the Subgroup of Connectors with the Lowest Open Retry Count

Within this subgroup, connectors that are not currently retrying (that is, not waiting on an Open Retry timer) are chosen. This approach, known as preemptive rerouting, keeps the MTA from routing a message to a connector that is known to have failed to connect on its last attempt.

Suppose that you have a Site Connector and a Dynamic RAS Connector with a higher cost, and the LAN is down. A message is first routed to the Site Connector because it has a lower cost. When the connection fails because the LAN is down, the message is rerouted through the Dynamic RAS Connector. Although the cost is higher, this preemptive rerouting is more likely to achieve timely delivery of the message.

Any new message that comes in before the Open Retry timer has expired on the Site Connector will skip the Site Connector and be routed to the Dynamic RAS Connector first.

Within the group that meets the previous criteria, choose a subgroup of connectors with the lowest cost.

Within the group that meets the previous criteria, choose a subgroup of local connectors over remote connectors. Site Connectors are counted as local if they are not homed on any particular server; otherwise, a Site Connector is local or remote depending on the server it is homed on.

If a connector is remote (as indicated by its Home-MTA attribute), the connector exists on another MTA within this site, and the message must first be routed to that Microsoft Exchange MTA. The MTA selects local connectors first to avoid this extra hop within the site. If no connectors remain after these steps, no connector services this address space, and the message is marked for NDR for this recipient.
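The three narrowing steps in this subsection (preemptive rerouting, lowest cost, local before remote) might look like this. `Candidate` is a hypothetical type, and falling back to the retrying connectors when every connector is retrying is an assumption, not something stated above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types and names; a sketch, not the MTA's implementation.


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: int
    retrying: bool       # waiting on an Open Retry timer
    home_mta: str = ""   # "" means not homed on a server (counted as local)


def narrow(group: List[Candidate], local_mta: str) -> List[Candidate]:
    if not group:
        return []  # no connector services this address space: NDR
    # 1. Preemptive rerouting: prefer connectors that are not retrying.
    idle = [c for c in group if not c.retrying]
    group = idle or group  # assumption: fall back if all are retrying
    # 2. Keep only the lowest-cost connectors.
    low = min(c.cost for c in group)
    group = [c for c in group if c.cost == low]
    # 3. Prefer local connectors to avoid an extra hop within the site.
    local = [c for c in group if c.home_mta in ("", local_mta)]
    return local or group


# LAN down: the cheaper Site Connector is retrying, so the RAS connector wins.
site = Candidate("Site", cost=1, retrying=True)
ras = Candidate("RAS", cost=10, retrying=False, home_mta="MTA1")
print([c.name for c in narrow([site, ras], "MTA1")])  # ['RAS']
```

Once the Site Connector's timer expires, it is no longer retrying and its lower cost makes it the choice again.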

If a non-optimal route (that is, a route that does not have the lowest cost) is chosen because the lowest-cost connectors are down, and the message has not yet been rerouted in the MTA, the MTA performs a virtual reroute from the lowest-cost connector to the currently selected connector. This is necessary to prevent loop detection due to circular routes. (When non-optimal routes are chosen, it is possible, and sometimes desirable, for the message to travel circular routes before reaching the final destination.) The example below illustrates why this is necessary.

A<-----Cost 1------->B<-----Cost 1----->C<-----Cost 1----->D

A<-----Cost 10----->D

A message destined for MTA D will be routed in the following order: A->B->C. However, if the C->D connector is waiting on an Open Retry or is unavailable when the message arrives at C, the MTA on C will route the message to the higher-cost alternate route: C->B->A->D.

Without the virtual reroute at MTA C, the message would be loop-detected when it reentered MTA B. In this situation, it is assumed that the incoming connector will be filtered out of the initial routing attempt for a message (otherwise MTA B would route the message directly back to C due to cost considerations).
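To see why the virtual reroute matters, here is a toy model of internal trace loop detection for the A-B-C-D example. The rule used (a visit to an MTA counts only if no reroute has been recorded since) is a simplification of the X.400 trace logic, and the action names are illustrative.

```python
from typing import List, Tuple

# Hypothetical names; a simplified model, not the X.400 trace algorithm.
Trace = List[Tuple[str, str]]  # (MTA name, action): "relayed" or "rerouted"


def is_loop(trace: Trace, mta: str) -> bool:
    # A visit counts toward loop detection only if no reroute was recorded after it.
    last_reroute = max((i for i, (_, a) in enumerate(trace) if a == "rerouted"),
                       default=-1)
    return any(m == mta for m, _ in trace[last_reroute + 1:])


def walk(path: List[str], reroute_at: str, virtual_reroute: bool) -> str:
    trace: Trace = []
    for mta in path:
        if is_loop(trace, mta):
            return f"NDR: loop detected at {mta}"
        rerouted = virtual_reroute and mta == reroute_at
        trace.append((mta, "rerouted" if rerouted else "relayed"))
    return "delivered"


path = ["A", "B", "C", "B", "A", "D"]  # C->D is down, so C takes C->B->A->D
print(walk(path, "C", virtual_reroute=False))  # NDR: loop detected at B
print(walk(path, "C", virtual_reroute=True))   # delivered
```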

Load Balancing

After the final group of connectors has been chosen, the MTA load balances among them. Because all connectors in the final group have equal cost, load balancing is done by choosing one of them at random rather than by calculating current queue size, message size, and so on.

After a connector has been chosen for one recipient of a message, it is preferred again whenever it appears in a later recipient's final connector group. This keeps a message whose recipients share a connector group from being split into several messages, one per connector. The first connector chosen from a group is used for later recipients.
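A sketch of random load balancing combined with the rule of reusing the first chosen connector, assuming each recipient's final group is a non-empty list of equally ranked connector names. The function name is hypothetical; passing a seeded `random.Random` makes runs repeatable for testing.

```python
import random
from typing import Dict, List, Optional

# Hypothetical names; a sketch, not the MTA's implementation.


def assign_connectors(groups: Dict[str, List[str]],
                      rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Map each recipient to one connector from its final, equal-cost group."""
    rng = rng if rng is not None else random.Random()
    chosen: List[str] = []       # connectors picked so far, in order of choice
    result: Dict[str, str] = {}
    for recipient, group in groups.items():
        # Reuse the first already-chosen connector that also serves this recipient.
        pick = next((c for c in chosen if c in group), None)
        if pick is None:
            pick = rng.choice(group)  # random pick; no queue-size or size check
            chosen.append(pick)
        result[recipient] = pick
    return result


groups = {"ann": ["X1", "X2"], "bob": ["X1", "X2"], "cy": ["X2", "X3"]}
print(assign_connectors(groups, random.Random(0)))
```

Whatever connector "ann" gets, "bob" gets the same one, so both stay in one fan-out copy.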

If a Site Connector is selected, the Microsoft Exchange MTA load balances between target MTAs through cost-weighted randomization. However, cost 0 target MTAs are always tried first, while cost 100 target MTAs are always tried last. Cost-weighted randomization occurs only if routing to all the cost 0 target MTAs has been attempted. All administrator-designated target MTAs are tried on that Site Connector before the message is rerouted to a different connector.
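The target MTA ordering can be sketched as below. Placing cost 0 targets first and cost 100 targets last follows the text; keeping them in configured order and using a weight of `100 - cost` for the randomized middle band are assumptions, because the exact weighting formula is not documented here.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical names; the weighting formula is an assumption.


def target_mta_order(targets: List[Tuple[str, int]],
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return the order in which a Site Connector's target MTAs are tried."""
    rng = rng if rng is not None else random.Random()
    first = [m for m, c in targets if c == 0]    # always tried first
    last = [m for m, c in targets if c == 100]   # always tried last
    middle = [(m, c) for m, c in targets if 0 < c < 100]
    weighted: List[str] = []
    while middle:
        # Draw one target at a time; cheaper targets are more likely to come early.
        weights = [100 - c for _, c in middle]
        i = rng.choices(range(len(middle)), weights=weights, k=1)[0]
        weighted.append(middle.pop(i)[0])
    return first + weighted + last


targets = [("MTA-A", 0), ("MTA-B", 10), ("MTA-C", 50), ("MTA-D", 100)]
print(target_mta_order(targets, random.Random(1)))
```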

Fan-out

When all recipients have been routed either locally or to a remote connector, the original message is fanned out (that is, one copy is created for each distinct message destination). Each copy has the responsibility attribute set for all recipients that have been routed to its destination.

When only some of the recipients for a given destination allow message database encoding format (MDBEF) messages, the message copy for that destination is split into two copies. However, this situation is not typical, because conversion on the sending MTA is usually decided based on the properties of the message and the destination MTA or connector, rather than on per-recipient properties.
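A sketch of fan-out, assuming routing has already produced a destination and an MDBEF flag for each recipient. Recipients are grouped by destination and content type, so a destination gets a second (P2) copy only when some of its recipients cannot accept MDBEF. The `Routed` type and the content-type labels are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical types and names; a sketch, not the MTA's implementation.


@dataclass(frozen=True)
class Routed:
    recipient: str
    destination: str     # local MTA, remote MTA, or connector
    accepts_mdbef: bool  # per-recipient result from routing


def fan_out(routed: List[Routed], content: str) -> Dict[Tuple[str, str], List[str]]:
    """Return {(destination, content type): responsible recipients}."""
    copies: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in routed:
        # MDBEF stays MDBEF only for recipients that accept it; others get P2.
        kind = content if (content != "MDBEF" or r.accepts_mdbef) else "P2"
        copies[(r.destination, kind)].append(r.recipient)
    return dict(copies)


routed = [Routed("ann", "MTA-B", True), Routed("bob", "MTA-B", False),
          Routed("cy", "IMS", False)]
print(fan_out(routed, "MDBEF"))
# {('MTA-B', 'MDBEF'): ['ann'], ('MTA-B', 'P2'): ['bob'], ('IMS', 'P2'): ['cy']}
```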

Conversion Decision

When the MDBEF content type is used, only one address can be included for each message originator or recipient. Although this can be either the DN address or the O/R address, it is usually the DN address. For this reason, conversion requires DN access for the originator and all recipients, so that the O/R addresses required by P2 can be obtained. The decision to convert from MDBEF to P2 can therefore be delayed only until the message reaches the last replicated Microsoft Exchange site in which the MTA can still access the objects referenced by DNs in the message. The decision to convert must be made as early as possible, to ensure that all information required for the conversion is available.

The main conversion decision is based on the content type of the message and on its destination (the local MTA for a local recipient, or the remote MTA or connector), so it is made per message. However, even if a message is of the MDBEF content type and the chosen connector allows this content type, an individual recipient may not allow MDBEF.

Per-recipient checking is completed during routing to verify whether a particular recipient can support MDBEF. If a recipient is a Microsoft Exchange recipient, it is assumed to support MDBEF. A recipient is assumed to be a Microsoft Exchange recipient if it supports MDBEF and if:

If a recipient is not known to support MDBEF, the message is converted by the conversion decision routine during fan-out. However, the following exceptions override the per-recipient MDBEF support determined during routing:

Result Processing

The result thread receives results for a message indicating:

If transfer-out failed, the message is rerouted. If delivery failed, an attempt is made to reassign the message (that is, alternate recipient handling will occur).

If the message is not rerouted or reassigned, the MTA checks whether a report is required for the recipients. A report is sent only after the message has been delivered or transferred successfully for all responsible recipients, or after the MTA has determined that it cannot deliver to some of them. The one exception is that a report is sent immediately for any completely invalid recipients that cannot be routed.

Rerouting and Retries

If a message fails to be sent through its chosen connector, the MTA attempts to reroute the message.

However, there are exceptions:

During rerouting, only the O/R address of the first responsible recipient is used to reroute the message. This is not an optimal solution, because some message recipients may end up traveling a longer path than if they were rerouted individually. However, this reduces the code complexity and processing time required. There are no loop-detect issues, because the rerouting action resets the X.400 loop-detect algorithms.

Every connector has a retry count and a retry interval that determine how many times, and at what intervals, the MTA tries to send a message through that connector. The Site Connector and the Dynamic RAS Connector default to a 10-minute retry interval and a maximum of 144 retries. Each fan-out copy of the message stores a retry count for each connector and target MTA that has been tried.

When a message is routed to a connector that is inaccessible, the retry count for that connector is incremented on the message, and the message is immediately rerouted using the routing and selection process outlined earlier. After all connectors have been tried and each has reached its maximum retry count, the message will NDR.

For each connector that is tried, if the attempt to open a connection fails, the MTA starts an Open Retry timer for that connector, independently of the message, to retry the open.
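The interplay between per-message retry counts and per-connector Open Retry timers can be simulated with a simple minute-based clock. This is a sketch under stated assumptions: routing is reduced to "cheapest connector whose timer is not running", and `can_open` is a hypothetical stand-in for the network. With the defaults of 144 retries 10 minutes apart, a connector that never opens is abandoned after roughly a day.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types and names; a sketch, not the MTA's implementation.


@dataclass
class Conn:
    name: str
    cost: int
    max_open_retry: int = 144      # Site/Dynamic RAS default per the text
    open_retry_minutes: int = 10   # Site/Dynamic RAS default per the text
    retry_until: int = 0           # connector-level Open Retry timer, in minutes


def send_with_retries(conns: List[Conn], can_open: Callable[[str, int], bool]) -> str:
    retries: Dict[str, int] = {}   # per-connector retry counts kept on the message
    now = 0
    while True:
        live = [c for c in conns if retries.get(c.name, 0) < c.max_open_retry]
        if not live:
            return f"NDR at t={now} min"
        idle = [c for c in live if c.retry_until <= now]
        if not idle:
            # Every usable connector is waiting on its timer: jump to the first expiry.
            now = min(c.retry_until for c in live)
            continue
        conn = min(idle, key=lambda c: c.cost)  # simplified routing decision
        if can_open(conn.name, now):
            return f"sent via {conn.name} at t={now} min"
        retries[conn.name] = retries.get(conn.name, 0) + 1  # counted on the message
        conn.retry_until = now + conn.open_retry_minutes     # timer on the connector


# LAN down: the Site Connector fails, so the message is rerouted at once to RAS.
print(send_with_retries([Conn("Site", 1), Conn("RAS", 10)], lambda n, t: n == "RAS"))
# sent via RAS at t=0 min
```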

After a message is successfully rerouted, the external and internal trace information for the message is updated to indicate that a reroute has been performed.

Recipient Reassignment

Microsoft Exchange Server does not expose some X.400 alternate-recipient functions, although the Microsoft Exchange MTA supports all these functions. These functions are:

The redirect-and-deliver function is specific to Microsoft Exchange. If this flag is set, a copy of the message is delivered to the recipient and to the specified alternate recipient. To maintain X.400 conformance, a delivery receipt is sent only for the copy delivered to the alternate recipient.

Report Handling

Reports are internally generated in three areas:

Reports can also be received from remote systems. Regardless of its origin, a report is routed the same way as messages and probes, except when the report is destined for a distribution list (DL).

When the report destination is a DL, the MTA backtracks through the Originator_and_DL_Expansion_History to find the originator of the message. Each O/R name found by this recursive process in turn becomes the next report destination and is then passed back through the routing process. If the previous entry in this list was a distribution list, the process is repeated until the actual originator is found.
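An iterative sketch of the backtracking step, assuming the expansion history is a list of O/R names with the originator first and each expanded DL after it. `is_dl` is a hypothetical directory lookup, and the `dl-` naming in the example is purely illustrative.

```python
from typing import Callable, List

# Hypothetical names; a sketch, not the MTA's implementation.


def report_destinations(history: List[str], is_dl: Callable[[str], bool]) -> List[str]:
    """Return successive report destinations, ending at the actual originator."""
    hops: List[str] = []
    i = len(history) - 1          # the report is addressed to the last DL
    while i > 0 and is_dl(history[i]):
        i -= 1                    # back up one entry in the history
        hops.append(history[i])   # this O/R name is routed as the next destination
    return hops


print(report_destinations(["alice", "dl-sales", "dl-emea"],
                          lambda name: name.startswith("dl-")))
# ['dl-sales', 'alice']
```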

Loop Detection

Loop detection in Microsoft Exchange uses standard X.400 internal and external trace information. It also uses Microsoft Exchange-specific information contained in the per-domain bilateral information of a message or the additional information of a report. This extra information allows multinational enterprises to be supported without undue configuration complexity and without falsely triggering X.400 loop detection.

External trace information documents the actions taken on a message, probe, or report by each management domain (MD) that it passes through. Each MD the message enters records whether the message was relayed or rerouted, plus any Other_Actions (such as redirection or DL expansion) performed by that MD. If the message enters the same MD twice without a reroute, redirection, or DL expansion, the message will be loop-detected and will NDR.

Internal trace information is maintained for messages that are routed within an MD. Each MTA the message enters indicates whether the message was relayed or rerouted, plus any Other_Actions (such as redirection or DL expansion) performed by that MTA. If the message enters the same MTA twice without a reroute, redirection, or DL expansion, the message will be loop-detected and will NDR. Note that the internal trace information is removed from a message when it is transferred out of an MD.

An X.400 MD is uniquely defined by the c, a, and p components of the O/R address space. These fields are collectively termed the MD's global domain identifier (GDI). Because Microsoft Exchange uniquely identifies sites by using the c, a, p, and o components of the O/R address space, it is possible for a message to traverse a site that has the same c, a, and p values as a different site traversed earlier. This would provoke loop detection on the message in a normal X.400 system. To prevent this, each Microsoft Exchange site adds Microsoft Exchange-specific per-domain bilateral information to messages and additional information to reports. This information contains a site DN for the site being traversed.

If a loop is detected in the external trace information, the current site DN is searched for in the Microsoft Exchange-specific information. If the current site DN is not found, it is not a real loop, and therefore X.400 loop detection is suppressed.
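The suppression rule can be modeled as follows. This is a simplified sketch: the GDI is the (c, a, p) triple, the X.400 check treats any reroute, redirection, or DL expansion as resetting the trace, and the Exchange-specific bilateral information is reduced to a list of site DNs. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical types and names; a simplified model, not the MTA's implementation.
GDI = Tuple[str, str, str]  # (c, a, p)
RESETS = {"rerouted", "redirected", "dl-expanded"}


@dataclass
class Msg:
    external_trace: List[Tuple[GDI, str]] = field(default_factory=list)  # (GDI, action)
    site_dns: List[str] = field(default_factory=list)  # Exchange bilateral information


def x400_loop(trace: List[Tuple[GDI, str]], gdi: GDI) -> bool:
    # Same MD seen since the last reroute, redirection, or DL expansion.
    last = max((i for i, (_, a) in enumerate(trace) if a in RESETS), default=-1)
    return any(g == gdi for g, _ in trace[last + 1:])


def enter_site(msg: Msg, gdi: GDI, site_dn: str) -> bool:
    """Record a relay through a site; return False if the message must NDR."""
    if x400_loop(msg.external_trace, gdi) and site_dn in msg.site_dns:
        return False  # the same Exchange site was seen again: a real loop
    # No X.400 loop, or same c/a/p but a different site DN: detection suppressed.
    msg.external_trace.append((gdi, "relayed"))
    msg.site_dns.append(site_dn)
    return True


gdi = ("US", " ", "Contoso")
m = Msg()
print(enter_site(m, gdi, "/o=Contoso/ou=Paris"))   # True
print(enter_site(m, gdi, "/o=Contoso/ou=Berlin"))  # True (suppressed)
print(enter_site(m, gdi, "/o=Contoso/ou=Paris"))   # False (real loop)
```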