
Sunday, July 15, 2007

Message replay is a very real attack vector against web services. The description of the defense against it is pretty straightforward. To quote the "Message Replay Detection Pattern":

Message replay detection requires that individual messages can be uniquely identified. [...] Cache an identifier for incoming messages, [...] identify and reject messages that match an entry in the replay detection cache.

You have a couple of choices in how you can leverage the relevant web services standards to implement this.

  1. Given that a WS-Security XML Signature must include a <ds:SignatureValue> element, you can use that as a unique identifier for the message. The <ds:SignatureValue> is computed from the hash values of the message parts being signed (and you can make sure that those parts include both the message body and the WS-Security timestamp, i.e. <wsu:Timestamp>).
  2. You can use another unique identifier, e.g. the WS-Addressing <wsa:MessageID>, which is a URI value that uniquely identifies the message that carries it. Combine it with the WS-Security timestamp, i.e. <wsu:Timestamp>, and you are good to go here as well (see the sketch after this list).
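
To make option (2) a bit more concrete, here is a minimal sketch in Java, using the standard SAAJ API, of pulling the <wsa:MessageID> and the <wsu:Created> value out of an incoming message's SOAP header and combining them into a single cache key. The class and method names are mine, purely for illustration; the handler plumbing around it is omitted.

    // Builds a replay-cache key from the WS-Addressing <wsa:MessageID>
    // plus the <wsu:Created> child of the WS-Security <wsu:Timestamp>.
    // (Illustrative only; not taken from any particular SOAP stack.)
    import javax.xml.soap.SOAPHeader;
    import org.w3c.dom.NodeList;

    public final class ReplayKeyBuilder {

        private static final String WSA_NS =
            "http://www.w3.org/2005/08/addressing";
        private static final String WSU_NS =
            "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd";

        /** Returns "messageId|created", or null if either header is missing. */
        public static String buildKey(SOAPHeader header) {
            String messageId = textOf(header, WSA_NS, "MessageID");
            String created   = textOf(header, WSU_NS, "Created");
            if (messageId == null || created == null) {
                return null; // caller should reject the message outright
            }
            return messageId + "|" + created;
        }

        private static String textOf(SOAPHeader header, String ns, String local) {
            NodeList nodes = header.getElementsByTagNameNS(ns, local);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
        }
    }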

In both cases, what you need for this to work is a configurable replay cache, and that is also relatively straightforward to implement, especially if you are doing this in software.
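
By "configurable replay cache" I mean something along these lines: a cache that remembers message identifiers for as long as your timestamp freshness window and rejects anything it has already seen inside that window. A minimal in-memory sketch in Java (the names are mine and the eviction strategy is deliberately naive):

    import java.util.Iterator;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public final class ReplayCache {

        private final long windowMillis; // configurable freshness window
        private final Map<String, Long> seen = new ConcurrentHashMap<String, Long>();

        public ReplayCache(long windowMillis) {
            this.windowMillis = windowMillis;
        }

        /** Returns true the first time a key is seen inside the window; false on a replay. */
        public boolean checkAndRemember(String key) {
            long now = System.currentTimeMillis();
            evictExpired(now);
            return seen.putIfAbsent(key, now) == null; // atomic insert-if-new
        }

        private void evictExpired(long now) {
            for (Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator(); it.hasNext();) {
                if (now - it.next().getValue() > windowMillis) {
                    it.remove();
                }
            }
        }
    }

A request whose key (e.g. the one built above) fails checkAndRemember() gets rejected, and anything with a <wsu:Timestamp> older than the freshness window gets rejected before the cache is even consulted, which is what keeps the cache bounded.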

The proverbial devil in the details comes into play when you want to abstract this type of capability into the infrastructure. I for one am very interested in NOT doing this in software (performance, consistent implementation across SOAP stacks, etc.), so a natural place to implement this type of capability would be in my perimeter PEP, which typically, at least in my environment, is implemented using an XML Security Gateway.

When it comes to deployment, you do not want a single point of failure in your infrastructure, so what you typically do for deploying perimeter PEPs (XML Security Gateways) is to deploy a cluster (at least two) with a (clustered) load balancer in front of the Gateways. Now, if you are going completely stateless, what typically happens in this configuration is that requests are dynamically spread across your cluster of XML Gateways. And that is where the details come into play. If each Gateway implements its own replay cache, and requests are spread out across multiple Gateways, how can you truly be assured that you can detect a replay attack?

The way you would handle this, if you were implementing it in software, would be to have a shared replay cache (typically a shared database) across the multiple load-balanced app servers (note that there are performance implications with this); a sketch of what that shared store might look like follows the list below. But this really does not work with XML Gateways. These are hardened, FIPS-compliant devices that typically cannot be tied into a database back-end. So, what possible solutions could you have for this:

  1. Configure the load balancer for an Active-Passive rather than an Active-Active configuration, i.e. instead of distributing traffic across the Gateways equally, send all traffic to one Gateway, and only if that one fails should traffic be sent to another cluster member.
  2. Actually modify the Gateway configuration to be able to connect to a database back-end.
  3. Others?
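
For what it is worth, the shared-database variant mentioned above can be as simple as a table whose primary key is the message identifier: an INSERT that trips the key constraint means the message has been seen before. A rough sketch in Java/JDBC, with a made-up table name (replay_cache) and the purging of expired rows left to a separate scheduled job:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public final class SharedReplayCache {

        /** Returns true if the message is new; false if this identifier was already recorded. */
        public static boolean recordMessage(Connection conn, String messageKey) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO replay_cache (message_key, received_at) VALUES (?, ?)");
            try {
                ps.setString(1, messageKey);
                ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
                ps.executeUpdate();
                return true; // first time this key has been recorded
            } catch (SQLException e) {
                // SQLState class 23 = integrity constraint violation, i.e. the
                // primary key already exists: treat it as a detected replay.
                if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                    return false;
                }
                throw e;
            } finally {
                ps.close();
            }
        }
    }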

At present, I am thinking that option (1) is probably the most realistic one. I am not aware of any XML Security Gateway vendors who have implemented anything like option (2). But I am most certainly curious as to how folks are doing this in their environments, so if you are, I would appreciate any info you can share on your implementation details and the trade-offs you have found.

Tags:: Security
7/15/2007 9:47 PM Eastern Daylight Time  |  Comments [2]  |  Disclaimer  |  Permalink   
Monday, July 16, 2007 10:10:18 AM (Eastern Daylight Time, UTC-04:00)
Good post Anil. One other option, which is probably a variation on the shared cache, is to enable some form of communication across the cluster, effectively a shared cache without relying on some physical external store. The same problem exists when trying to employ a rate-throttling approach. If you only look at requests flowing in through one gateway, you may not have the full picture of the request load from one particular consumer, since it would be spread across the cluster. The vendors I've spoken with on this have all enabled some form of inter-gateway communication for sharing this information so that an individual gateway can make throttling decisions based on the collective usage. I would think that the same approach should work for this problem; however, as you point out, there are performance considerations. I've seen app servers that have this type of communication, and the saturation problems that can ensue when there is too much communication trying to keep the cluster in sync.

-tb
Monday, July 16, 2007 5:31:53 PM (Eastern Daylight Time, UTC-04:00)
Good intro to the replay attack and possible solutions. I agree that this attack is best handled via some network-level solution, like the XML gateway.

I don't think the Active-Passive option is viable for larger-volume implementations that require multiple gateways to accommodate throughput. There are in-memory cache synchronization implementations in the app server world that may be applicable to the gateways, for example JBoss's JGroups implementation, which synchronizes cached data across a cluster.

- JC
Jonathan Chaitt