Mark’s blog item is worth a read: he is working towards an ideal of time-independent validation. I take his essential point to be that different consumers treat different constraints as showstoppers, and that it is inefficient, frustrating and wasteful for your input to barf on constraints that don’t affect you in particular. For example, if you are just storing a numeric field in a database now and writing it out later, you don’t need to care whether it is in a particular range, just that it is a number at all.
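To make the numeric-field example concrete, here is a minimal sketch in Python. The function names and the 0 to 100 range are invented for illustration; they are not from Mark's post.

```python
def is_number(value: str) -> bool:
    """Check for a store-and-forward consumer: any parseable number will do."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_in_range(value: str, low: float = 0.0, high: float = 100.0) -> bool:
    """Check for a consumer that computes with the value (hypothetical range)."""
    return is_number(value) and low <= float(value) <= high
```

A value such as "250" passes the first check and fails the second, so only the consumer that actually depends on the range needs to reject it.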
I think there are four kinds of validation strategy, with a natural underwear analogy:
* schema used for validating incoming data reflects the public interface (tighty whiteys)
* schema used for validating incoming data only reflects the capabilities of the consuming system (speedos)
* schema used for validating incoming data is a looser family schema that gives some kind of version independence (boxers)
* schema used for validating incoming data only validates traceable business requirements (G-string)
Err, OK, let’s forget the analogy…
The first case has the advantage of catching problems early, at the cost of reporting things that may not cause a problem in effect. Sometimes you do want software to reject out-of-bound information. And while an out-of-bound error for some data item may have been caused by a schema enhancement, it also may have been caused by bogus programming: you don’t want your database corrupted.
The second case has the advantage of preserving flexibility: if you have written your system to be robust in the face of extensibility and change, whacking a tight schema in front of it would throw away the flexibility you bought with that robustness.
The third case looks like what Mark is suggesting.
The fourth case is where I think we need to be heading more. Actually, it provides the mechanism for knowing when to adopt the first three cases too. When we validate, we may have a business requirement to reject any old schemas.
By the way, Schematron’s mechanism for this is called “phases”. You can group patterns into named phases and invoke them together. So you could have phases called
“Schema-2006-12-12”
“Marks_input_constraints_only”
“Time_independent_constraints”
“Business_rules_only”
which correspond to each of the cases above.
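The idea of phases can be sketched outside Schematron too. The Python below is only an analogy, not Schematron syntax: the three patterns, the record fields, and the choice of which patterns belong to which phase are all assumptions made up for illustration. Only the phase names come from the list above.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Pattern = Callable[[Record], List[str]]


def _as_float(text: Optional[str]) -> Optional[float]:
    """Return the parsed number, or None if the text is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def quantity_is_number(record: Record) -> List[str]:
    ok = _as_float(record.get("quantity")) is not None
    return [] if ok else ["quantity is not a number"]


def quantity_in_range(record: Record) -> List[str]:
    value = _as_float(record.get("quantity"))
    if value is None:
        return []  # not this pattern's concern; quantity_is_number reports it
    return [] if 0 <= value <= 100 else ["quantity is outside 0..100"]


def has_customer_id(record: Record) -> List[str]:
    return [] if record.get("customer_id") else ["customer_id is missing"]


# Hypothetical patterns, looked up by name.
PATTERNS: Dict[str, Pattern] = {
    "quantity_is_number": quantity_is_number,
    "quantity_in_range": quantity_in_range,
    "has_customer_id": has_customer_id,
}

# Each phase activates a named subset of the patterns (assignments are assumed).
PHASES: Dict[str, List[str]] = {
    "Schema-2006-12-12": ["quantity_is_number", "quantity_in_range", "has_customer_id"],
    "Marks_input_constraints_only": ["quantity_is_number"],
    "Time_independent_constraints": ["quantity_is_number", "has_customer_id"],
    "Business_rules_only": ["has_customer_id"],
}


def validate(record: Record, phase: str) -> List[str]:
    """Run only the patterns that the chosen phase activates."""
    errors: List[str] = []
    for name in PHASES[phase]:
        errors.extend(PATTERNS[name](record))
    return errors
```

With a record whose quantity is "250", the full "Schema-2006-12-12" phase reports a range error, while "Marks_input_constraints_only" accepts the same record untouched. The rules are written once and the consumer picks the phase.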


Smart sub-Saharan comments on Mark's post