Thursday, 20 November 2014

Idempotence

More revision, a few musings and another aide-mémoire.

Idempotence is a subject of much debate and seems to mean different things to different people when used in different contexts. This post is just a quick mile-high overview by way of a bit of revision.

Idempotence defined

Lets pop over to Wikipedia and get a definition of idempotence:

“Idempotence (/ˌaɪdɨmˈpoʊtəns/ EYE-dəm-POH-təns) is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.” [1]

“In computer science, the term idempotent is used more comprehensively to describe an operation that will produce the same results if executed once or multiple times. This may have a different meaning depending on the context in which it is applied. In the case of methods or subroutine calls with side effects, for instance, it means that the modified state remains the same after the first call…

This is a very useful property in many situations, as it means that an operation can be repeated or retried as often as necessary without causing unintended effects. With non-idempotent operations, the algorithm may have to keep track of whether the operation was already performed or not.” [2]

So for our purposes idempotence is the property of an operation that means if it is executed once or multiple times the result will always be the same. Now it’s the result that seems to be cause of the debate.

There are often side effects – sometimes quite subtle - that could be considered part of the result. For example, if you were to create an audit record every time an apparently idempotent operation is executed is that still idempotent? The answer is ‘probably not’ but will depend on the expected behaviour of the application. Logging, auditing and monitoring may be considered side effects of message handling:

“These side effects are not relevant to the semantics of the application behavior, so the processing of an idempotent request is still considered idempotent even if side effects exist.” [6]

Idempotence in HTTP

In HTTP there are some methods that are considered idempotent.

“Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE should not have side effects, and so are inherently idempotent.” [3]

Actually in practice there are some potential pitfalls with methods like DELETE. Although the result of a delete operation on the server may be idempotent – deleting the same resource multiple times has the same effect - it is possible for repeated DELETE operations on the same resource to return different HTTP status codes. An initial call to DELETE may return a status code of 200 but subsequent calls could return 404. From the client side the operation may not appear to be idempotent.  

Idempotency is a concept that features in REST quite a bit and is regarded as an important characteristic of fault-tolerant APIs. In REST the result may well be the resource representation so issues about audit records etc. may not be a concern. To illustrate here’s a quote from http://restcookbook.com:

“An idempotent HTTP method is a HTTP method that can be called many times without different outcomes. It would not matter if the method is called only once, or ten times over. The result should be the same. Again, this only applies to the result, not the resource itself. This still can be manipulated (like an update-timestamp, provided this information is not shared in the (current) resource representation.” [4]

So the author is drawing a distinction here between the actual resource and it’s representation. As long as the representation doesn’t change then the operation is regarded as idempotent.

 

Idempotence in messaging

Idempotence in message-based systems is very useful because it helps avoid the necessity of using strategies such as two-phase commit (2PC) to manage distributed transactions. Essentially 2PC in a message-based system means that a message must be processed exactly once by the cohorts participating in the distributed transaction. As we know there are potential drawbacks to using 2PC because it is a blocking protocol that potentially ties up resources.

With a 2PC strategy in place the coordinating process has to determine if a failure condition has occurred and abort the distributed transaction. This causes all participating cohorts to abort their part of the distributed transaction. In a message-based system this effectively means that the messages sent to the cohorts are regarded as not having been processed. Recovering from the failure requires the coordinator to replay the whole distributed transaction by republishing the messages.

An alternative approach is to use a strategy where a message is processed at least once. With this strategy if a failure occurs the original message may be published again by the coordinating process but recipients that have already processed the message are expected to ignore it. This can be achieved 2 ways: have a mechanism in place to remove duplicate messages from being processed by participating cohorts, or messages have to be idempotent. [5]

A typical pattern for handling messages is for a message to be read from a queue, processed, and then removed from the queue.[6] In this scenario if a failure occurs after the message has been processed but before the message has been removed from the queue (e.g. there’s a power failure) the message will be processed twice. If the messages are idempotent this is no longer a concern.

Strategies for implementing idempotent messages include:

  • Natural idempotency – some messages are naturally idempotent (processing them multiple times has the same effect). In much the same way as an HTTP DELETE can be idempotent so could a message to delete a resource.
  • Use a correlation identifier – add a unique identifier to each message and use it to see if the message has been processed before.

 

Idempotence and SOA

Just a quick note here that idempotence has a place to play in a service oriented architecture because it facilities fault-tolerance:

“Idempotency guarantees that repeated invocations of a service capability are safe and will have no negative effect.

Idempotent capabilities are generally limited to read-only data retrieval and queries. For capabilities that do request changes to service state, their logic is generally based on "set", "put" or "delete" actions that have a post-condition that does not depend on the original state of the service.

The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests (with the same identifier value) that have already been processed will be discarded or ignored by the service capability, rather than being processed again.” [7]

 

References

[1] Idempotence (Wikipedia)

[2] Idempotence – Computer science meaning (Wikipedia)

[3] RFC 2616 Part 9

[4] The RESTful CookBook – Idempotency

[5] (Un) Reliability in messaging: idempotency and de-duplication (Jimmy Bogard, Los Techies)

[6] Idempotence Is Not a Medical Condition (Pat Helland, acm.org)

[7] Idempotent Capability

Wednesday, 19 November 2014

Two-phase Commit (2PC)

Time for a bit of revision. What is Two-phase Commit (2PC)?

Firstly, there’s lots of in formation out there on 2PC including Wkikpedia and MSDN. Those articles will go into much more detail about 2PC than I will here. This post is really just an aide-mémoire.

In a nutshell:

“It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction (it is a specialized type of consensus protocol). The protocol achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely utilized.” [1]

Essentially 2PC provides a mechanism for tasks to be executed across separate systems as a single atomic distributed transaction. For example you might want to make updates to separate databases on different servers - with each update running in its own transaction - and have the whole process run as a single distributed transaction. If an error occurs during any of the component transactions then all of the transactions should be aborted (rolled back).

Note that 2PC does not have to apply to database transactions. A step in the process could mean executing a program.

“The term transaction (or any of its derivatives, such as transactional), might be misleading. In many cases, the term transaction describes a single program executing on a mainframe computer that does not use the 2PC protocol. In other cases, however, it is used to denote an operation that is carried out by multiple programs on multiple computers that are using the 2PC protocol.” [2]

There will be 2 basic actors to 2PC: a coordinating process that manages the distributed transaction, and participating processes (participants, cohorts, or workers).

The 2PC protocol calls for 2 phases (see reference [1] for full details):

  • Commit-request phase (or Voting phase)
    • The coordinator sends an instruction to all cohorts to undertake their part of the distributed transaction and waits until it has received a reply from all cohorts.
    • The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
    • Each cohort replies with an agreement message (cohort votes Yes to commit), if the cohort's actions succeeded, or an abort message (cohort votes No, not to commit), if the cohort experiences a failure that will make it impossible to commit.
  • Commit phase (or Completion phase)
    • Success
      • If the coordinator received an agreement message from all cohorts during the commit-request phase:
        • The coordinator sends a commit message to all the cohorts.
        • Each cohort completes the operation, and releases all the locks and resources held during the transaction.
        • Each cohort sends an acknowledgment to the coordinator.
        • The coordinator completes the transaction when all acknowledgments have been received.
    • Failure
      • If any cohort votes No during the commit-request phase (or the coordinator's timeout expires):
        • The coordinator sends a rollback message to all the cohorts.
        • Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
        • Each cohort sends an acknowledgement to the coordinator.
        • The coordinator undoes the transaction when all acknowledgements have been received.

The key point is that the cohorts do their work up to the point that they need to commit their transactions. They then vote on whether or not to commit. If all cohorts vote “Yes” then the coordinator tells all cohorts to commit. If any cohort votes “No” then the coordinator tells all cohorts to abort (rollback).

An interesting scenario to be considered is what happens if a cohort crashes having already voted but before it receives or processes the coordinator’s instruction to commit. The trick here is that the distributed transaction is not committed until all cohorts have acknowledged that they have committed. The coordinator will instruct the crashed cohort to commit when it becomes available again. To make this kind of scenario work the cohorts need to use logging to keep track of what steps they have taken (e.g. the database transaction log):

“To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures.” [1]

 

Disadvantages

The two-phase commit protocol is a blocking protocol. So,

  • Resources may be locked by cohorts while waiting for an instruction from the coordinator to commit or abort
  • If the coordinator fails permanently, some cohorts will never resolve their transactions

 

References

[1] Two-phase commit protocol (Wikipedia)

[2] Two-Phase Commit (MSDN)