Disconnected Endpoints - Race Conditions - States, Failover and Race Conditions

4. States, Failover and Race Conditions

4.4 Race Conditions

4.4.7 Disconnected Endpoints

In addition to the restart procedure, gateways also have a

"disconnected" procedure, which MUST be initiated when an endpoint becomes "disconnected" as described in Section 4.3. It should here be noted, that endpoints can only become disconnected when they

attempt to communicate with the Call Agent. The following steps MUST be followed by an endpoint that becomes "disconnected":

1. A "disconnected" timer is initialized to a random value, uniformly distributed between 1 and a provisionable "disconnected" initial waiting delay (Tdinit), e.g., 15 seconds. Care MUST be taken to avoid synchronicity of the random number generation between

multiple gateways and endpoints that would use the same algorithm.

2. The gateway then waits for either the end of this timer, the reception of a command for the endpoint from the Call Agent, or the detection of a local user activity for the endpoint, such as for example an off-hook transition.

3. When the "disconnected" timer elapses for the endpoint, when a command is received for the endpoint, or when local user activity is detected for the endpoint, the gateway initiates the

"disconnected" procedure for the endpoint - if a disconnected procedure was already in progress for the endpoint, it is simply replaced by the new one. Furthermore, in the case of local user activity, a provisionable "disconnected" minimum waiting delay (Tdmin) MUST have elapsed since the endpoint became disconnected or the last time it ended the "disconnected" procedure in order to limit the rate at which the procedure is performed. If Tdmin has not passed, the endpoint simply proceeds to step 2 again, without affecting any disconnected procedure already in progress.

4. If the "disconnected" procedure still left the endpoint

disconnected, the "disconnected" timer is then doubled, subject to a provisionable "disconnected" maximum waiting delay (Tdmax), e.g., 600 seconds, and the gateway proceeds with step 2 again (using a new transaction-id).

The "disconnected" procedure is similar to the restart procedure in that it simply states that the endpoint MUST send a RestartInProgress command to the Call Agent informing it that the endpoint was

disconnected. Furthermore, the endpoint MUST guarantee that the first non-audit message (non-audit command or response to non-audit command) that the Call Agent sees from this endpoint MUST inform the Call Agent that the endpoint is disconnected (unless the endpoint goes out-of-service). When a command (C) is received, this is achieved by sending a piggy-backed datagram with a "disconnected"

RestartInProgress command and the response to command C to the source address of command C as opposed to the current "notified entity".

This piggy-backed RestartInProgress is not automatically

retransmitted by the endpoint but simply relies on fate-sharing with the piggy-backed response to guarantee the in-order delivery

requirement. The Call Agent still sends a response to the backed RestartInProgress, however, as usual, the response may be lost. In addition to the piggy-backed RestartInProgress command, a new "disconnected" procedure is triggered by the command received.

This will lead to a non piggy-backed copy (i.e., same transaction) of the "disconnected" RestartInProgress command being sent reliably to the current "notified entity".

When the Call Agent learns that the endpoint is disconnected, the Call Agent may then for instance decide to audit the endpoint, or simply clear all connections for the endpoint. Note that each such "disconnected" procedure will result in a new RestartInProgress command, which will be subject to the normal retransmission

procedures specified in Section 4.3. At the end of the procedure, the endpoint may thus still be "disconnected". Should the endpoint go out-of-service while being disconnected, it SHOULD send a "forced"

RestartInProgress message as described in Section 2.3.12.

The disconnected procedure is complete once a success response has been received. Error responses are handled similarly to the restart procedure (Section 4.4.6). If the "disconnected" procedure is to be initiated again following an error response, the rate-limiting timer considerations specified above still apply.

Note, that if the RestartInProgress is piggybacked with the response (R) to a command received while being disconnected, then

retransmission of this particular RestartInProgress does not require piggybacking of the response R. However, while the endpoint is disconnected, resending the response R does require the

RestartInProgress to be piggybacked with the response to ensure the in-order delivery of the two.

If a set of disconnected endpoints have the same "notified entity", and the set of endpoints can be named with a wildcard, the gateway MAY replace the individual disconnected procedures with a suitably wildcarded disconnected procedure instead. In that case, the Restart Delay for the wildcarded "disconnected" RestartInProgress command SHALL be the Restart Delay corresponding to the oldest disconnected procedure replaced. Note that if only a subset of these endpoints subsequently have their "notified entity" changed and/or are no longer disconnected, then that wildcarded disconnected procedure can no longer be used. The remaining individual disconnected procedures MUST then be resumed again.

A disconnected endpoint may wish to send a command (besides RestartInProgress) while it is disconnected. Doing so will only succeed once the Call Agent is reachable again, which raises the question of what to do with such a command meanwhile. At one

extreme, the endpoint could drop the command right away, however that would not work very well when the Call Agent was in fact available, but the endpoint had not yet completed the "disconnected" procedure (consider for example the case where a NotificationRequest was just received which immediately resulted in a Notify being generated). To prevent such scenarios, disconnected endpoints SHALL NOT blindly drop new commands to be sent for a period of T-MAX seconds after they receive a non-audit command.

One way of satisfying this requirement is to employ a temporary buffering of commands to be sent, however in doing so, the endpoint MUST ensure, that it:

* does not build up a long queue of commands to be sent,

* does not swamp the Call Agent by rapidly sending too many commands once it is connected again.

Buffering commands for T-MAX seconds and, once the endpoint is connected again, limiting the rate at which buffered commands are sent to one outstanding command per endpoint is considered acceptable (see also Section 4.4.8, especially if using wildcards). If the endpoint is not connected within T-MAX seconds, but a "disconnected"

procedure is initiated within T-MAX seconds, the endpoint MAY

piggyback the buffered command(s) with that RestartInProgress. Note, that once a command has been sent, regardless of whether it was

buffered initially, or piggybacked earlier, retransmission of that command MUST cease T-MAX seconds after the initial send as described in Section 4.3.

This specification purposely does not specify any additional behavior for a disconnected endpoint. Vendors MAY for instance choose to provide silence, play reorder tone, or even enable a downloaded wav file to be played.

The default value for Tdinit is 15 seconds, the default value for Tdmin, is 15 seconds, and the default value for Tdmax is 600 seconds.

Dans le document IESG Note This document is being published for the information of the community (Page 143-146)