
C.7. Timeouts during LLC Negotiation

C.7.1. Recovery Actions for LLC Timeouts and Failures

The following list describes recovery actions for LLC timeouts. A write completion failure or other indication of send failure for an LLC command is treated the same as a timeout.
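
The rule that a send failure counts as a timeout can be captured in one helper, so every timer in the list below leads into a single recovery path. This is a minimal asyncio sketch. `send` and `wait_reply` are hypothetical caller-supplied coroutine factories, not part of any real SMC-R API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

async def send_and_wait(
    send: Callable[[], Awaitable[None]],
    wait_reply: Callable[[], Awaitable[Any]],
    timeout: float,
) -> Optional[Any]:
    """Send an LLC request and wait for its reply.

    Returns the reply, or None when the caller must run the timeout
    recovery action. `send` and `wait_reply` are hypothetical hooks.
    """
    try:
        await send()
    except OSError:
        # A write completion failure (or any other send failure) is
        # handled exactly like an expired timer.
        return None
    try:
        return await asyncio.wait_for(wait_reply(), timeout)
    except asyncio.TimeoutError:
        return None
```

Returning None for both failure modes keeps each recovery action below written once, regardless of why the reply never arrived.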

LLC message: CONFIRM LINK from server (first contact, first link in the link group)

Timer waits for: CONFIRM LINK reply from client.

Recovery action: Break the TCP connection by sending a RST, and clean up the link. The server should have received an SMC Decline from the client by now if the client had an LLC send failure.

LLC message: CONFIRM LINK from server (first contact, second link in the link group)

Timer waits for: CONFIRM LINK reply from client.

Recovery action: The second link was not successfully set up.

Send a DELETE LINK to the client. Connection data must not flow on the first link in the link group until the reply to this DELETE LINK is received; this prevents the peers from getting out of sync on the state of the link group.
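
A minimal sketch of this server-side rule, assuming a hypothetical `LinkGroup` object and `send_llc` callback (the names and message dictionaries are illustrative only). It also assumes the DELETE LINK is sent over the first link, since the second link never came up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class LinkGroup:
    """Hypothetical server-side view of a link group (illustrative only)."""
    send_llc: Callable[[int, Dict[str, object]], None]  # (link_id, message)
    links: List[int] = field(default_factory=list)
    data_allowed: bool = True
    pending_delete: Set[int] = field(default_factory=set)

    def on_confirm_link_timeout(self, first_link: int, new_link: int) -> None:
        # The second link was not set up: ask the client to delete it and
        # hold connection data on the first link until the client replies.
        self.pending_delete.add(new_link)
        self.data_allowed = False
        self.send_llc(first_link, {"type": "DELETE LINK", "link": new_link})

    def on_delete_link_reply(self, link: int) -> None:
        self.pending_delete.discard(link)
        if link in self.links:
            self.links.remove(link)
        if not self.pending_delete:
            # Peers agree on the link group state again; data may flow.
            self.data_allowed = True
```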

LLC message: CONFIRM LINK from server (not first contact)

Timer waits for: CONFIRM LINK reply from client.

Recovery action: Clean up the new link, and set a timer to retry.

Also send a DELETE LINK to the client so that, if the client uses a longer timer interval, it can stop waiting.

LLC message: CONFIRM LINK reply from client (first contact)

Timer waits for: ADD LINK from server.

Recovery action: Clean up the SMC-R link, and break the TCP connection by sending a RST over the IP fabric. There is a problem with the server. If the server had a send failure, it should have sent an SMC Decline by now.

LLC message: ADD LINK from server (first contact)

Timer waits for: ADD LINK reply from client.

Recovery action: Break the TCP connection with a RST, and clean up RoCE resources. The connection is past the point where the server can fall back to IP, and if the client had a send problem it should have sent an SMC Decline by now.

LLC message: ADD LINK from server (not first contact)

Timer waits for: ADD LINK reply from client.

Recovery action: Clean up resources (QP, RKeys, etc.) for the new link, and treat the link over which the ADD LINK was sent as if it had failed. If there is another link available to resend the ADD LINK and the link group still needs another link, retry the ADD LINK over another link in the link group.
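
The not-first-contact ADD LINK timeout handling can be sketched as a single function. The hooks `cleanup_new_link` and `fail_link` and the integer link identifiers are hypothetical, and choosing the first surviving link for the retry is an arbitrary choice made for the sketch.

```python
from typing import Callable, List, Optional

def on_add_link_timeout(
    active_links: List[int],
    carrier_link: int,
    needs_another_link: bool,
    cleanup_new_link: Callable[[], None],
    fail_link: Callable[[int], None],
) -> Optional[int]:
    """Return the link to retry ADD LINK over, or None if no retry is made.

    All parameters are hypothetical hooks into an SMC-R implementation.
    """
    cleanup_new_link()        # release QP, RKeys, etc. for the new link
    fail_link(carrier_link)   # the link that carried ADD LINK is treated as failed
    survivors = [link for link in active_links if link != carrier_link]
    if needs_another_link and survivors:
        return survivors[0]   # resend ADD LINK over another link in the group
    return None
```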

LLC message: ADD LINK reply from client (and there are more RKeys to be communicated)

Timer waits for: ADD LINK CONTINUATION from server.

Recovery action: Treat the same as an ADD LINK timer failure.

LLC message: ADD LINK reply or ADD LINK CONTINUATION reply from client (and there are no more RKeys to be communicated, for the second link in a first contact scenario)

Timer waits for: CONFIRM LINK from the server, over the new link.

Recovery action: The setup of the new link failed. Send a DELETE LINK to the server. Do not consider the socket open to the client application until the server confirms with a DELETE LINK request for this link and the client has sent the reply; this prevents the partners from getting out of sync on the state of the link group.

Set a timer to send another ADD LINK to the server if there is still an unused RNIC on the client side.

LLC message: ADD LINK reply or ADD LINK CONTINUATION reply from client (and there are no more RKeys to be communicated)

Timer waits for: CONFIRM LINK from the server, over the new link.

Recovery action: Send a DELETE LINK to the server for the new link, then clean up any resources allocated for the new link and set a timer to send an ADD LINK to the server if there is still an unused RNIC on the client side. The setup of the new link failed, but the link over which the ADD LINK exchange occurred is unaffected.

LLC message: ADD LINK CONTINUATION from server

Timer waits for: ADD LINK CONTINUATION reply from client.

Recovery action: Treat the same as an ADD LINK timer failure.

LLC message: ADD LINK CONTINUATION reply from client (first contact, and RMB count fields indicate that the server owes more ADD LINK CONTINUATION messages)

Timer waits for: ADD LINK CONTINUATION from server.

Recovery action: Clean up the SMC-R link, and break the TCP connection by sending a RST. There is a problem with the server. If the server had a send failure, it should have sent an SMC Decline by now.

LLC message: ADD LINK CONTINUATION reply from client (not first contact, and RMB count fields indicate that the server owes more ADD LINK CONTINUATION messages)

Timer waits for: ADD LINK CONTINUATION from server.

Recovery action: Treat this as if the client had detected a failure of the link that the ADD LINK exchange is using. Send a DELETE LINK to the server over another active link if one exists; otherwise, clean up the link group.
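
The two client-side ADD LINK CONTINUATION timeout cases differ only in whether this is a first contact. A sketch that returns the recovery steps as descriptive strings, using hypothetical integer link identifiers:

```python
from typing import List

def add_link_cont_timeout_actions(first_contact: bool, carrier_link: int,
                                  active_links: List[int]) -> List[str]:
    """Recovery steps for a client whose timer for an ADD LINK
    CONTINUATION from the server has expired (illustrative only)."""
    if first_contact:
        # No fallback to IP is possible any more: drop the connection.
        return ["clean up SMC-R link", "send RST on TCP connection"]
    # Not first contact: the carrier link is treated as failed.
    others = [link for link in active_links if link != carrier_link]
    if others:
        return [f"send DELETE LINK for link {carrier_link} over link {others[0]}"]
    return ["clean up link group"]
```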

LLC message: DELETE LINK from client

Timer waits for: DELETE LINK request from server.

Recovery action: If the scope of the request is to delete a single link, the surviving link over which the client sent the DELETE LINK is no longer usable either. If this is the last link in the link group, end TCP connections over the link group by sending RST packets. If there are other surviving links in the link group, resend over a surviving link. Also send a DELETE LINK over a surviving link for the link over which the client attempted to send the initial DELETE LINK message. If the scope of the request is to delete the entire link group, try resending on other links in the link group until success is achieved. If all sends fail, tear down the link group and any TCP connections that exist on it.
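
The client-side resend logic can be sketched as follows, assuming a hypothetical `send(link, msg)` hook that returns True on success. Passing `target_link=None` stands for a DELETE LINK whose scope is the entire link group. For brevity, the single-link branch does not retry if the resend itself fails.

```python
from typing import Callable, Dict, List, Optional

Send = Callable[[int, Dict[str, object]], bool]  # hypothetical: (link, msg) -> success

def on_delete_link_timeout(sent_over: int, links: List[int], send: Send,
                           target_link: Optional[int] = None) -> str:
    """Client recovery when the server's DELETE LINK request never arrives."""
    if target_link is None:
        # Scope: entire link group. Try the other links until one send succeeds.
        for link in links:
            if link != sent_over and send(link, {"type": "DELETE LINK", "scope": "link group"}):
                return "resent"
        return "tear down link group and its TCP connections"
    # Scope: single link. The link the request went over is now unusable too.
    survivors = [link for link in links if link not in (sent_over, target_link)]
    if not survivors:
        return "send RST on all TCP connections in the link group"
    via = survivors[0]
    send(via, {"type": "DELETE LINK", "link": target_link})  # resend the original request
    send(via, {"type": "DELETE LINK", "link": sent_over})    # delete the link it was sent on
    return "resent"
```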

LLC message: DELETE LINK from server (scope: entire link group)

Timer waits for: Confirmation from the adapter that the message was delivered.

Recovery action: Tear down the link group and any TCP connections that exist on it.

LLC message: DELETE LINK from server (scope: single link)

Timer waits for: DELETE LINK reply from client.

Recovery action: The link over which the server sent the DELETE LINK is no longer usable either. If this is the last link in the link group, end TCP connections over the link group by sending RST packets. If there are other surviving links in the link group, resend over a surviving link. Also send a DELETE LINK over a surviving link for the link over which the server attempted to send the initial DELETE LINK message. If the scope of the request is to delete the entire link group, try resending on other links in the link group until success is achieved. If all sends fail, tear down the link group and any TCP connections that exist on it.

LLC message: CONFIRM RKEY from client

Timer waits for: CONFIRM RKEY reply from server.

Recovery action: Perform normal client procedures for detection of a failed link. The link over which the message was sent has failed.

LLC message: CONFIRM RKEY from server

Timer waits for: CONFIRM RKEY reply from client.

Recovery action: Perform normal server procedures for detection of a failed link. The link over which the message was sent has failed.

LLC message: TEST LINK from client

Timer waits for: TEST LINK reply from server.

Recovery action: Perform normal client procedures for detection of a failed link. The link over which the message was sent has failed.

LLC message: TEST LINK from server

Timer waits for: TEST LINK reply from client.

Recovery action: Perform normal server procedures for detection of a failed link. The link over which the message was sent has failed.

The following list describes recovery actions for invalid LLC messages. These could be misformatted or contain out-of-sync data.

LLC message received: CONFIRM LINK from server

What it indicates: Incorrect link information.

Recovery action: Protocol error. The link must be brought down by sending a DELETE LINK for the link over another link in the link group if one exists. If this is a first contact, fall back to IP by sending an SMC Decline to the server.

LLC message received: ADD LINK

What it indicates: Undefined enumerated MTU value.

Recovery action: Send a negative ADD LINK reply with reason code x'2'.
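
A sketch of the MTU check. The enumeration table is an assumption (an InfiniBand-style encoding of 256 to 4096 bytes); consult the ADD LINK message format for the authoritative values. Only the reason code x'2' comes from the text above, and the reply dictionary is a hypothetical stand-in for the real message.

```python
from typing import Dict, Optional

# ASSUMED MTU enumeration (InfiniBand-style); not taken from this text.
MTU_ENUM: Dict[int, int] = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

REASON_INVALID_MTU = 0x2  # reason code x'2' from the recovery action above

def check_add_link_mtu(mtu_code: int) -> Optional[Dict[str, object]]:
    """Return a negative ADD LINK reply if mtu_code is undefined, else None."""
    if mtu_code not in MTU_ENUM:
        return {"type": "ADD LINK reply", "negative": True, "reason": REASON_INVALID_MTU}
    return None  # MTU value is defined; continue normal ADD LINK processing
```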

LLC message received: ADD LINK reply from client

What it indicates: Client-side link information that would result in a parallel link being set up.

Recovery action: Parallel links are not permitted. Delete the link by sending a DELETE LINK to the client over another link in the link group.
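
One way to detect a parallel link before accepting an ADD LINK reply, assuming that a link is parallel when it would connect the same pair of RNICs as a link already in the group. The RNIC identifiers are hypothetical strings; a real implementation would compare adapter addressing information.

```python
from typing import Iterable, Tuple

RnicPair = Tuple[str, str]  # (server RNIC id, client RNIC id): hypothetical identifiers

def is_parallel_link(existing: Iterable[RnicPair], proposed: RnicPair) -> bool:
    """True if `proposed` joins the same RNIC pair as an existing link.

    Assumes 'parallel' means an identical (server RNIC, client RNIC) pair;
    reusing an RNIC on only one side is not flagged.
    """
    return proposed in set(existing)
```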

LLC message received: Any link group command from the server, except DELETE LINK for the entire link group

What it indicates: Client has sent a DELETE LINK for the link on which the message was received.

Recovery action: Ignore the LLC message. Worst case: the server will time out. Best case: the DELETE LINK crosses with the command from the server, and the server realizes that the link has failed.

LLC message received: ADD LINK CONTINUATION from server or ADD LINK CONTINUATION reply from client

What it indicates: Number of RMBs provided doesn’t match count given on initial ADD LINK or ADD LINK reply message.

Recovery action: Protocol error. Treat this as a detected link outage.
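
A sketch of the RMB count check, assuming a hypothetical tracker that is initialized with the count announced in the initial ADD LINK or ADD LINK reply and is fed the RKeys carried by each continuation message:

```python
from typing import Sequence

class RmbCountTracker:
    """Hypothetical helper comparing the announced RMB count with the
    RMBs actually delivered in ADD LINK CONTINUATION messages or replies."""

    def __init__(self, announced: int) -> None:
        self.remaining = announced

    def on_continuation(self, rkeys: Sequence[int]) -> bool:
        """Record one continuation message; False signals a protocol error."""
        if len(rkeys) > self.remaining:
            return False  # more RMBs than announced: treat as a link outage
        self.remaining -= len(rkeys)
        return True

    def finish(self) -> bool:
        """Call after the last continuation; False if RMBs are still missing."""
        return self.remaining == 0
```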

LLC message received: DELETE LINK from client

What it indicates: Link indicated doesn’t exist.

Recovery action: If the link is in the process of being cleaned up, assume a timing window and ignore the message. Otherwise, send a DELETE LINK reply with reason code 1.

LLC message received: DELETE LINK from server

What it indicates: Link indicated doesn’t exist.

Recovery action: Send a DELETE LINK reply with reason code 1.
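
The two DELETE LINK cases above can be folded into one decision function. This sketch assumes hypothetical sets of known and in-cleanup link identifiers; only the ignore-during-cleanup behavior and reason code 1 come from the text.

```python
from typing import Set

REASON_UNKNOWN_LINK = 1  # "reason code 1" from the recovery actions above

def handle_delete_link_for(link_id: int, known_links: Set[int],
                           cleaning_up: Set[int], from_client: bool) -> str:
    """Decide how to answer a DELETE LINK request (hypothetical helper).

    from_client=True means this node is the server receiving the client's
    request; False means the client received it from the server.
    """
    if link_id in known_links:
        return "process"  # normal DELETE LINK handling, not shown here
    if from_client and link_id in cleaning_up:
        # Server side only: the link is mid-cleanup, so assume a timing window.
        return "ignore"
    return f"reply with reason code {REASON_UNKNOWN_LINK}"
```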

LLC message received: CONFIRM RKEY from either client or server

What it indicates: No RKey provided for one or more of the links in the link group.

Recovery action: Treat this as a detected failure of the link(s) for which no RKey was provided.
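
Finding the link(s) to fail is a set difference between the link group and the links that received an RKey. A sketch with hypothetical integer link identifiers and a link-to-RKey map:

```python
from typing import Dict, Iterable, Set

def links_missing_rkey(link_group: Iterable[int], rkeys_by_link: Dict[int, int]) -> Set[int]:
    """Links to treat as failed because CONFIRM RKEY supplied no RKey for them."""
    return {link for link in link_group if link not in rkeys_by_link}
```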

LLC message received: DELETE RKEY

What it indicates: Specified RKey doesn’t exist.

Recovery action: Send a negative DELETE RKEY response.

LLC message received: TEST LINK reply

What it indicates: User data doesn’t match what was sent in the TEST LINK request.

Recovery action: This is a protocol error. Treat it as if the link had been detected as down.
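
A sketch of how a sender might remember TEST LINK user data and validate the echo. The class is hypothetical, and the 16-byte default size is an assumption rather than a value taken from this text.

```python
import os
from typing import Optional

class LinkProbe:
    """Hypothetical keepalive helper: remembers the user data sent in a
    TEST LINK request and checks the echo in the TEST LINK reply."""

    def __init__(self, size: int = 16) -> None:  # size is an ASSUMPTION
        self.size = size
        self.outstanding: Optional[bytes] = None

    def make_request(self) -> bytes:
        # Random user data makes a stale or mismatched echo easy to spot.
        self.outstanding = os.urandom(self.size)
        return self.outstanding

    def check_reply(self, user_data: bytes) -> bool:
        """True if the echo matches; False means treat the link as down."""
        ok = self.outstanding is not None and user_data == self.outstanding
        self.outstanding = None
        return ok
```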

LLC message received: Unknown LLC type with high-order bits of opcode equal to b'10'

What it indicates: This is an optional LLC message that the receiver does not support.

Recovery action: Ignore (silently discard) the message.

LLC message received: Any unambiguously incorrect or out-of-sync LLC message

What it indicates: Link is out of sync.

Recovery action: Treat this as if the link had been detected as down.

Note that an unsupported or unknown LLC opcode whose two high-order bits are b'10' is not an error and must be silently discarded. Any other unknown or unsupported LLC opcode is an error.
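
The opcode rule can be expressed as a three-way classification. This sketch assumes a one-byte opcode field and a hypothetical `supported` set listing the opcodes the receiver implements; the b'10' test checks the top two bits, which covers opcodes 0x80 through 0xBF.

```python
from enum import Enum
from typing import AbstractSet

class OpcodeAction(Enum):
    PROCESS = "process"
    DISCARD = "silently discard"
    ERROR = "protocol error"

def classify_llc_opcode(opcode: int, supported: AbstractSet[int]) -> OpcodeAction:
    """`supported` is the set of opcodes this receiver handles (hypothetical)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("assumed one-byte LLC opcode")
    if opcode in supported:
        return OpcodeAction.PROCESS
    if opcode >> 6 == 0b10:
        # Optional message type that this receiver does not implement.
        return OpcodeAction.DISCARD
    return OpcodeAction.ERROR
```

Checking `supported` first means an optional opcode the receiver does implement is processed normally rather than discarded.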