The new draft of XProc is out and has fewer spangles. Here’s a post I sent to their suggestion box.
I think the new draft is an improvement; I especially prefer the simpler
names.
I think using the attribute name “port” is confusing: it adds an extra
concept where none is needed. Why not just call it “name”, i.e. the name
of the input? That inputs and outputs and parameters are ports is a
distinction without a function: everyone already knows what an input and
an output is, but people have to scratch their heads over ports; even Scheme
people cannot be sure it maps to the Scheme concept.
One possible suggestion: I wonder whether it would be better to allow inputs
to be either small arrays or streams, or ZIP files containing related
multiple XML documents, as well as raw XML documents. I think our current
way of doing things, where for example we make a catalog a parameter of a
process rather than a parameter of an input, is wrong.
In other words, instead of simple XML pipelines, have Compound XML Pipelines.
Now in one sense this is already unnecessary: if all the information is
passed as a URL then the URL could just as easily be inside a ZIP as on a
disk. However, XProc doesn’t pass by reference but by (anonymous) value,
AFAIKS.
As Norm would be painfully aware, a lot of the pain of processing XML
comes from the fact that XML documents are frequently parts of larger
collections: images, and so on. Not to mention the ODF and Open Packaging
Formats, which use ZIP.
Furthermore, most of the PSVI has no simple XML representation: either it
must be handled internally to a process (i.e. it is no use having a
separate process for validation) or it must be passed as a separate
document.
What do I mean by small arrays or streams? Well, define an input as a
stream like this:
<input name="instance">
<document name="schema" href="aaa.xsd"/>
<document name="catalog" />
<document name="instance" />
</input>
<input name="stylesheet" href="xxx.xsl" />
<output name="result">
<document name="schema" href="bbb.xsd" />
<document name="catalog" />
<document name="result" />
</output>
The availability of such arrays frees up the parameter inputs to be
used only for the parameters of the process. Also, it provides a natural
home for XBase and other contextual information: this can allow XML to
develop without requiring the XProc spec to be upgraded.
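As a rough illustration of that last point, contextual information could ride along on the individual documents of a compound input. This is only a sketch of the idea: putting xml:base on the document element is my assumption, not anything in the XProc draft, and the base URL is invented.

```xml
<!-- Hypothetical sketch: xml:base on <document> is an assumption,
     not XProc draft syntax; the example.org URL is invented. -->
<input name="instance">
  <document name="schema" href="aaa.xsd"/>
  <document name="catalog"/>
  <!-- The base URI travels with the document, not with the process. -->
  <document name="instance" xml:base="http://example.org/docs/"/>
</input>
```

Because the attribute belongs to the input rather than to the step, a new kind of context would not need a new process parameter.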


I agree with your point regarding "port". Too confusing.
With regard to streams, ZIP files, etc.: wouldn't it make more sense to use a format that has been built specifically around collections of data files, e.g. Atom? Being able to attach Atom-specific metadata to each entry allows nicely for things like versioning, comments (using the built-in 'summary' element rather than actual XML comments), and keeping track of when the collection was last processed (the 'updated' element) and when it was first made available (the 'published' element). The same data-processing pipelines could then easily be queried to determine how old the data is, or to compare versions of the same Atom file to determine which is the newest, and so on.
Adding to this a bit, it's simple enough to put a more compact version (such as the one you have outlined above) inside the 'content' element. In other words, this is the kind of thing that would serve nicely as an envelope for a pipeline input/output sequence.
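To make that concrete, an Atom envelope for the "instance" input might look roughly like the sketch below. The ids, titles, dates and summaries are all invented for illustration; only the Atom element names and the file name aaa.xsd from the post are real.

```xml
<!-- Hypothetical sketch: every id, title, date and summary here is invented. -->
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:pipeline:instance</id>
  <title>Pipeline input: instance</title>
  <updated>2007-04-10T12:00:00Z</updated>
  <author><name>Example pipeline</name></author>

  <!-- One entry per related document, passed by reference. -->
  <entry>
    <id>urn:example:pipeline:instance:schema</id>
    <title>schema</title>
    <published>2007-03-01T09:00:00Z</published>
    <updated>2007-04-10T12:00:00Z</updated>
    <summary>Schema used to validate the instance.</summary>
    <content type="application/xml" src="aaa.xsd"/>
  </entry>

  <!-- The compact notation from the post, carried inline as content. -->
  <entry>
    <id>urn:example:pipeline:instance:compact</id>
    <title>compact input description</title>
    <updated>2007-04-10T12:00:00Z</updated>
    <content type="application/xml">
      <input xmlns="" name="instance">
        <document name="catalog"/>
        <document name="instance"/>
      </input>
    </content>
  </entry>
</feed>
```

The 'updated' and 'published' values are what a pipeline could query to decide which version of a collection is the newest.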
Just food for thought...
I do agree with you about accepting different formats as inputs and outputs. We just have to define an XML notation for each.
I am not using XProc personally because I have already designed my own dialect to perform much the same operations, but with a tree approach. For example, I use to save a sub-tree of elements which could have been generated by a transformation of another sub-tree using ...
What we need is a simple way to describe ordinary operations in XML.