Kurt Cagle

O’Reilly Video

Building on its influential predecessor chicagocrime.org, EveryBlock takes the local-data mashup to new levels. Founder and hacker Adrian Holovaty talks about the philosophy and technology behind EveryBlock, the untapped potential of address-specific news, open data, and life after Google Maps.



Kurt Cagle

O’Reilly Video

Geoff Zeiss (Autodesk, Inc.)–Convergence is about breaking down the islands of information created by traditional disciplines and professional categories, or by the traditional organization of the architecture, engineering, construction, transportation, utility, and telecommunications industries. The convergence of architectural and engineering design, location, and 3D visualization and simulation technologies is producing a framework for interoperability across the whole lifecycle of buildings and infrastructure, including design, construction, and operation and maintenance.

The business drivers for this transformative technological advance are productivity and efficiency in the construction and facilities management industries, and better performance of facilities over their full life cycle. The goal is seamless access to architectural, engineering design, and geospatial data inside, outside, and under a facility.



Kurt Cagle

O’Reilly Video

Paul Torrens (Arizona State University)–Ambient crowds are the new distributed computing platform. Smart mobs are fashioning new architectures for social networking. Armed with cell phones and mobile gaming devices, they are the new business model for location-based services. Seditious crowds are creating havoc in urban theaters of war and at global economic forums. Crowds of shoppers, carrying smart-chip credit cards and RFID-tagged merchandise, are trailed by long-lasting data shadows that follow them everywhere.

Embedded in urban infrastructure and in the very products we consume, new technologies are emerging to enable cities to think about—and process—the people that pulse through them, with a burgeoning code-space being developed to capture the actions and interactions of individuals within large dynamic crowds. This presentation will focus on our recent research work in developing models of crowd behavior and their application to theory-building and scenario evaluation in the contexts just described.

We have developed a reusable modeling platform for constructing large simulations of individual and collective behavior in dense urban environments. The simulations are developed with individual agents, equipped with geospatial AI that allows them to perceive and react to their evolving surroundings with an incredible level of behavioral realism. These agents are also capable of social and antisocial interactions. The simulation architecture is coupled to Geographic Information Systems, allowing for a suite of geospatial analytics and data-mining to be performed, across a wide array of scenarios. Moreover, the models have been developed as realistic 4D immersive environments with unprecedented levels of graphical realism.

From O’Reilly Where 2.0, San Jose, CA, Tuesday, May 29th, 2007.



Kurt Cagle

O’Reilly Video
James Greiner, Senior Vice President and General Manager, MapQuest, Inc. In preparation for Where 2.0, MapQuest conducted an ethnographic study. The massive survey polled users on what they want from location-based services, mapping sites, and mobile services. It should be a very informative look at what the (many) people our apps are made for actually want. From O’Reilly Where 2.0, San Jose, CA, Tuesday, May 29th, 2007.

Kurt Cagle

O’Reilly Video
Since Google first presented a snapshot of the geoweb at last year’s Where 2.0, the geoweb has evolved considerably: more geo data is published on the web, and KML has been accepted as an OGC standard and adopted by a growing number of tools. Join John Hanke, Director of Google Earth & Maps, to hear the latest on the evolution of the geoweb and Google’s effort to organize it and make it universally accessible and useful. In this video from the O’Reilly 2008 Where 2.0 conference, John Hanke demonstrates the latest in Google geo development with Jack Dangermond of ESRI.

Eric Larson

Today I took some time to quickly scan through a backlog in my feed reader. There were a good number of anti-XML articles cropping up, which got me thinking. What do you think of when I say “XML”? I personally think of XML as a baseline technology in a large set of tools used for describing data. For example, I think of Atom and XHTML within the scope of RESTful web services. Next up would be document formats such as DITA and DocBook. That starts me thinking about linking data and technologies such as XInclude and XPointer. As I reflect on where my mind wanders when thinking about XML, themes of linked data and document resources quickly rise to the top. What does not come to mind is WSDL, XML Schema, object serialization, configuration files, or SOAP.

What do you think of when I say XML? In what contexts does XML succeed, and where does it fail?

M. David Peterson

Update: Len trumps his own QOTD with these two gems. I’ll let you decide which you feel is funnier/more accurate, cuz’ I can’t decide:

I try to tell him that no element is *really* non-terminating but he gets wrapped up in the language abstraction and forgets to breathe.

.. or ..

We made this a lot harder than it has to be in the name of “just in case.”

[Original Post]
Is it really that taxing… - O’Reilly XML Blog

Most of the time when I find a programmer struggling with XML, they are a relational database programmer or an object-oriented programmer, or both. We should have lined these guys up against the wall at the beginning of the revolution, really.

NOTE: I met Jeff Atwood for the first time a month or two back. Nice guy. Obviously an OO-trained programmer. But a nice guy, nonetheless. ;-)

M. David Peterson

Brain.Save() - We are pleased to bring you new features in .NET 3.5 SP1

Syndication OM for the Atom Publishing Protocol. We added strongly-typed OM for all of the constructs defined in the Atom Publishing Protocol specification (like ServiceDocument and Workspaces) and put them in the System.ServiceModel.Syndication namespace.
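The .NET types are C#, but the constructs they model are plain XML. As a language-neutral illustration (not the .NET API), here is a minimal Python sketch that lists the workspaces and collections of an AtomPub service document with the standard library's ElementTree. The sample document is modeled on the example in RFC 5023 and the `list_collections` helper is hypothetical; the namespace URIs and element names (`service`, `workspace`, `collection`, `accept`) are the ones defined by the Atom and AtomPub specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces from the AtomPub (RFC 5023) and Atom (RFC 4287) specifications.
NS = {"app": "http://www.w3.org/2007/app", "atom": "http://www.w3.org/2005/Atom"}

SERVICE_XML = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main Site</atom:title>
    <collection href="http://example.org/blog/main">
      <atom:title>My Blog Entries</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
    <collection href="http://example.org/blog/pic">
      <atom:title>Pictures</atom:title>
      <accept>image/png</accept>
      <accept>image/jpeg</accept>
    </collection>
  </workspace>
</service>"""


def list_collections(service_xml):
    """Return (workspace title, collection title, href, accepted media types)."""
    root = ET.fromstring(service_xml)
    result = []
    for ws in root.findall("app:workspace", NS):
        ws_title = ws.findtext("atom:title", default="", namespaces=NS)
        for coll in ws.findall("app:collection", NS):
            title = coll.findtext("atom:title", default="", namespaces=NS)
            accepts = [a.text for a in coll.findall("app:accept", NS)]
            result.append((ws_title, title, coll.get("href"), accepts))
    return result


for row in list_collections(SERVICE_XML):
    print(row)
```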

M. David Peterson

So as Jeff Barr recently pointed out over on the Amazon Web Services blog,

Amazon Web Services Blog: Redundant Disk Storage Across Multiple EC2

XML hacker M. David Peterson has put together a really interesting article.

As part of his work at 3rd and Urban, he has implemented redundant, fault-tolerant, read-write disk storage on Amazon EC2 using a number of open source tools and applications including LVM, DRBD, NFS, Heartbeat, and VTUN.

Mark notes that "the primary focus of this paper is to present both a detailed overview as well as a working code base that will enable you to begin designing, building, testing, and deploying your EC2-based applications using a generalized persistent storage foundation, doing so today in both lieu of and in preparation for release of Amazon Web Services offering in this same space."

The article provides complete implementation details and links to source code for the scripts that Mark developed.

You can read the article, and you can also follow progress via the discussion group.

– Jeff;

Firstly, and most importantly, as pointed out in the first portion of this article,

Eric Larson

Jeff Atwood mentions the Angle Bracket Tax and not surprisingly, I don’t agree. XML can be difficult and painful at times, but I think the reasons are not entirely technical. Recently, I had the opportunity to work with XML in Java and it was definitely “taxing”. Even though the process was frustrating, it really had little to do with XML. The biggest pain was actually Java.

After working in Python/Ruby for a good portion of time, declaring types, long CamelCase variable/class names and overly complex Object Oriented patterns feel painful. I’m not much of a Java hacker, so most problems could be chalked up to my lack of experience in the language. While a better understanding of Java would have been helpful, in learning XML and C# I had a very similar (and frustrating) experience, which makes me believe it is not necessarily the XML. After working with XML in Python (and Ruby to a lesser extent), it is clear that the real problem is not an “angle bracket” tax. In fact, I would argue that I got a “tax return” by understanding concepts like DOM, which I became exposed to through JavaScript, rather than Python.
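As a small illustration of that transfer, the W3C DOM calls familiar from browser JavaScript work almost unchanged in Python's standard `xml.dom.minidom` module. A minimal sketch; the feed document is made up for the example.

```python
from xml.dom.minidom import parseString

DOC = """<feed>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

dom = parseString(DOC)
# getElementsByTagName and firstChild.nodeValue behave as they do in browser JavaScript.
titles = [t.firstChild.nodeValue for t in dom.getElementsByTagName("title")]
print(titles)  # ['First', 'Second']
```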

It is fine to think XML is “hard” and, as I said before, it can be frustrating. But treating XML as the source of the frustration probably ignores some of the factors. Jeff’s blog generally focuses on programming and human factors, and this is a great example of a human factor. If you expect static typing and wasteful OO patterns, and require IDE support, then you have accepted those struggles as normal. When you are then forced to deal with XML, the familiar OO ideals and patterns don’t match, leading you to the conclusion that XML is hard. The people who breeze through XML and enjoy the technology are simply those who have invested a little time in learning the basic tools. Of course, these people have also set their own expectations regarding XML, but therein lies the secret.

Programmers are supposed to be logical people who make decisions using reason. The reality is that programmers are people with irrational feelings and emotions that affect their decision making. There is a good chance your criticisms of XML are rooted in the thousands of blog posts saying XML sucks. There is an even better chance you promote alternatives such as YAML without ever having used them heavily. Opinions obviously have a place in software, but so do logic and reason. The next time you deal with XML, take a minute and try to learn something new before complaining. When I recently worked with Java, I put aside my frustrations for a bit to get a good handle on Ant. Lo and behold, it was interesting, and I learned something new that helped improve my perspective on Java. If you take the time to learn XML basics with an objective mindset, you might still say it sucks. On the other hand, you might realize it is not so bad.

Rick Jelliffe

I’ve just caught up with this document from W3C, which fills a big gap in English-language technical material. Japanese typesetting technology has been very influential in the other ideographic countries, and they share many commonalities (e.g. Japanese ruby text and Taiwanese bopomofo). There is a Japanese standard, JIS X 4051, but no translation of it is available, though parts of it, usually called the kinsoku rules, float around in material from vendors, particularly from Adobe’s Ken Lunde and in some MS material.

By and large, Chinese and Korean have different details (e.g. different characters) but the same analysis applies.

One term that the W3C draft uses but does not define is kihonhanmen; readers held up by it can substitute underlying grid (or text block, or even constant-width frame).
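To give a feel for how kinsoku rules interact with a fixed character grid, here is a minimal Python sketch of greedy line breaking at a fixed number of cells per line. When the next line would begin with a character that may not start a line (closing brackets, the ideographic full stop, small kana), or the current line would end with an opening bracket, the break moves back one character at a time (oidashi, pushing characters out to the next line). The two character sets are a small illustrative subset assumed for this example, not the JIS X 4051 or W3C classes, and real layout engines also use push-in (oikomi), hanging punctuation and spacing adjustment to keep lines justified.

```python
# Small illustrative subsets only (an assumption for this sketch);
# JIS X 4051 and the W3C document define the full character classes.
NO_LINE_START = set("、。，．）」』】〕！？ー々ぁぃぅぇぉっゃゅょ")
NO_LINE_END = set("（「『【〔")


def break_lines(text, width):
    """Greedy breaking on a fixed grid of `width` cells, with simple kinsoku.

    Each character is assumed to occupy one cell. Lines that had to be
    shortened are left ragged rather than re-justified.
    """
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Pull the break back (oidashi) while the next line would start with a
        # forbidden character or this line would end with one.
        while end - i > 1 and end < len(text) and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


print(break_lines("これはテ「例」です。", 5))
```

With a width of five cells this yields `["これはテ", "「例」で", "す。"]`: the opening bracket is moved off the end of the first line, and す is pushed down so that 。 does not start the last line.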

Hello all.

My name is Griffin Caprio and I’m a new blogger here on xml.com. Apologies are in order to Kurt, my fearless editor, since this is my first post and I actually came into the O’Reilly fold back in March. I am very excited to blog here, as the O’Reilly XML community is at the forefront of XML evangelism.

I’ll be blogging mostly about semantic technologies. However, this will not be just another Semantic Web blog. Most of the talk surrounding semantics is centered on the Semantic Web and its potential to usher in a new era of interactive / integrated web applications. Personally, I’m more interested in using semantic technologies to tackle the oceans of data companies are amassing internally. Some people call that Business Intelligence (BI) and some call it Data Mining (DM). I’ll try and stay away from those types of product categorizations and concentrate on the pragmatic application of semantics.

As is standard practice here at xml.com, please feel free to leave me comments and / or suggestions if there is something you think I should know. We welcome all feedback!

Michael C. Daconta

A few months ago I wrote an article for Government Computer News on the battle over Rich Internet Applications. At that time, I thought it was odd that the other major contenders, Silverlight and Flex, use XML and JavaFX does not. I wonder if, in the rush to push something out the door, Sun forgot about separation of concerns and the benefits of skill specialization to quality production. I see the trend towards declarative user interfaces as a good thing (the proper domain of graphic designers), so why did Sun seemingly take a step backwards? If you are a JavaFX guru, I am interested in understanding this. I found that a good, simple way to compare these techniques was the bubblemark site, which programs a simple animation in all three (and many more) variants. What are your thoughts on XAML versus MXML versus JavaFX? In looking at the bubblemark application, I still feel that MXML is the cleanest. This will definitely require some more looking into… in the meantime, see you in the trenches. - Mike

Rick Jelliffe

A couple of years back I had a very surprising experience with a junior programmer, who had just joined our team. I had asked him to work on some code until there were no more JUnit errors. A few hours later he proudly showed there were no errors, and explained it was easier than he expected because he just commented out the tests! Then he paused, regarded my startled expression for a few seconds and quickly blushed deeply. Doh!

Poor old Alex Brown has been in and out of favour with the extreme anti-OOXML-ists (perhaps I should use a new acronym, such as EAOOXMLista, to say for the hundred-thousandth time that not every anti-OOXML person is extreme) over the last few weeks. First, he didn’t somehow stop the DIS29500 BRM (exactly how?) from doing its job. So he is bad. Then he works with SC34 to organize getting more improvements made to OOXML and ODF. Again, bad. Then he says “The question behind the question, for a lot of the current OOXML debate, seems to be: can Microsoft really be trusted to behave? We shall see”, which earned him the quote of the day on ConsortiumInfo. So presumably he is good.

Then he does a smoke test of validation conformance of Office and the various OOXMLs, and reports the validation errors he found. So he is deemed good. Now he has validated various versions of Open Office and ODF and reported the validation errors he found. And that makes him the devil again.

Unless there is some tussle between evil twins going on, I’d like to suggest that Alex is just trying to faithfully fulfill his normal committee responsibilities, which include checking through standards. Alex has long been involved professionally in data quality issues for publishing, and has been very involved in the development of ISO DSDL at SC34 (which includes RELAX NG and Schematron).

So what is it that Alex found about ODF that has caused the fuss? It is quite technical, but the gist is this, as I understand it: if a schema is not itself valid, no documents can be formally valid against it.

(When the invalid part of the schema is only detected at run time, when exercised by a particular instance document structure, and the document does not contain such a triggering structure, the implementation may report that the document is valid, but that is a false positive. And you may look at the schema and say “I know what was intended, and the false positive is in fact correct against the intent of the schema”, but this is a lucky accident, i.e. hacking, not formal validity.)

The particular issue is quite interesting because it relates to an area of the W3C XML Schema standard where the user requirements for XSD could not be supported by the facet model used, and where XSD fudges it. OASIS RELAX NG also, to an extent, inherited this problem.

The problem is with attributes of type ID in the ODF schema. Alex Brown has provided a very simple fix, which I hope gets adopted into ODF 1.2.

The problem with IDs is this. XML inherits ID type attributes from SGML. They have various constraints, which include that they are XML names (tokens), that their values are unique within the document, and that an element can only have one ID attribute.

When XSD came to make its datatyping, the XSD WG made a nice theoretical distinction between lexical space and value space: these are entirely context-free distinctions, which relate only to the atomic values of the individual pieces of text. XSD also provided another mechanism to declare that certain data values should be unique. But the constraints that an ID attribute value must be document-unique and that an element may only have a single ID attribute are left out in the cold by this model, and are not directly in the XSD specs. Blink and you’ll miss them; there is a little handwaving going on, but it is a good pragmatic workaround: the spec references the XML specification, and that these non-facet constraints on IDs are intended is made explicit in the (non-normative) Primer, which forms Part 0 of the spec:

the scope of an ID is fixed to be the whole document.

and, more importantly, the XSD Structures Spec Part 1 specifies the ID/IDREF table as part of the PSVI.
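To make the distinction concrete, here is a minimal Python sketch: `lexically_ok` is the kind of context-free check a facet model can express, looking at one value in isolation, while `id_errors` needs the whole document to enforce uniqueness and the one-ID-per-element rule. The function names, the simplified ASCII-only NCName pattern, and passing the ID-typed attribute names in as a set are all assumptions for illustration; a real validator gets that information from the schema.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the NCName production (an assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def lexically_ok(value):
    """Context-free check: looks at one attribute value in isolation."""
    return NCNAME.fullmatch(value) is not None


def id_errors(xml_text, id_attrs):
    """Document-level ID checks that a per-value facet model cannot express.

    id_attrs is the set of attribute names assumed to be declared as type ID.
    """
    errors = []
    seen = set()
    for elem in ET.fromstring(xml_text).iter():
        ids = [(k, v) for k, v in elem.attrib.items() if k in id_attrs]
        # XML: an element may carry at most one ID attribute.
        if len(ids) > 1:
            errors.append(f"<{elem.tag}> has {len(ids)} ID attributes")
        for name, value in ids:
            if not lexically_ok(value):
                errors.append(f"{name}={value!r} is not an NCName")
            # XML: ID values must be unique across the whole document.
            elif value in seen:
                errors.append(f"duplicate ID {value!r}")
            else:
                seen.add(value)
    return errors


doc = '<doc><p id="a"/><p id="a"/><q id="b" key="d"/></doc>'
print([lexically_ok(v) for v in ("a", "a", "b", "d")])  # every value passes alone
print(id_errors(doc, {"id", "key"}))
```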

ODF uses RELAX NG, and ISO RELAX NG specifically allows (s. 9.3.8 data and value pattern) datatyping to validate using more than just the atomic string:

services may make use of the context of a string. For example, a datatype representing a QName would use the namespace map.

(This seems to be a difference from the original OASIS RELAX NG, which AFAICS started with a more atomic view of datatypes.)

So when an ODF schema says an attribute is an ID type, we expect for full validation that it will have all the XSD/XML semantics, and that full validation of the schema would point out conflicts. If you don’t want these semantics, you just use the base type xs:NCName, which has the same lexical and value space but adds none of the other constraints.

So we come to the concrete problem: a couple of content models allow wildcarded attributes in any namespace, and several of the namespaces in question declare attributes of type ID. So the argument (which you can follow on Alex Brown’s and Rob Weir’s blogs) is over what class of error this should be. All the implementations of RELAX NG, and Alex, say this makes the schema invalid (in ISO Schematron I specifically included definitions for a “good schema” and a “correct schema” as well as a “valid schema” in order to make these nuances clearer). Rob thinks it shouldn’t be an error (“thinks” is too weak a term) and seems to think it should only be an error if an element actually has two ID attributes. I think this is also a legitimate approach that the standards could take (but they don’t).

Alex has found the fix for ODF, but I think RELAX NG and XSD could well use some extra (non-normative) clarification text to stop basic mistakes: if a schema, whether DTD, XSD or RELAX NG, says something is an ID, it has all the semantics of an XML ID.

So what was the point about the programmer turning off tests to make some code fault-free? That is Rob Weir’s suggestion on how to make the ODF documents valid: turn off ID testing! Brilliant! So what is the point of ODF 1.0 making these things IDs in the first place if that was not the intended semantics?

I suspect this is actually another example of where it would have been more satisfactory all around to have these constraints in Schematron. For example, do not use the ID type but xs:NCName, with a rule like this (this is not real code, just enough to give the idea: you’d use a regex, and this assumes a consistent attribute-naming convention in ODF and its sub-vocabularies):

<sch:rule context="whatever">
   <sch:report role="duplicate-ids" test="count(@*[ends-with(name(), 'id')]) &gt; 1">
    There should not be more than one attribute whose name ends in id.
   </sch:report>
</sch:rule>

This seems to give the intended constraint against duplication, but makes it a run-time instance-driven problem, not a static schema error. Another assertion would handle uniqueness.
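For comparison outside Schematron, here is a minimal Python sketch of the same instance-driven approach: nothing is reported unless a particular document actually contains an element with more than one id-like attribute, and a second check plays the role of the separate uniqueness assertion. The suffix convention, the report labels and `check_instance` are hypothetical; like the Schematron rule, the suffix test would also catch an unrelated attribute such as `grid`, which is why a real rule would use a tighter pattern.

```python
import xml.etree.ElementTree as ET


def local_name(qname):
    # ElementTree spells namespaced names as "{uri}local".
    return qname.rsplit("}", 1)[-1]


def check_instance(xml_text, suffix="id"):
    """Instance-driven checks in the spirit of the Schematron rule.

    Nothing is reported unless the document actually contains a problem.
    """
    reports = []
    seen = set()
    for elem in ET.fromstring(xml_text).iter():
        id_like = [v for k, v in elem.attrib.items() if local_name(k).endswith(suffix)]
        # Equivalent of: count(@*[ends-with(name(), 'id')]) > 1
        if len(id_like) > 1:
            reports.append(("duplicate-ids", local_name(elem.tag)))
        # The separate uniqueness assertion.
        for value in id_like:
            if value in seen:
                reports.append(("non-unique-id", value))
            seen.add(value)
    return reports


sample = '<office xmlns:x="urn:x"><a id="n1"/><b id="n2" x:id="n3"/><c id="n1"/></office>'
print(check_instance(sample))
```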

So my take: Alex is right that the schema has a flaw, and right to point it out and offer a fix; Rob is right that it is unnecessary for this to be a static error (which is the positive point I would infer from his over-reacting blog), but wrong that the way to fix it is to turn off validating that constraint.

Michael C. Daconta

The IBM Information Server has a business glossary manager that I am implementing for several clients. Some of those clients have existing data dictionaries and glossaries that will need to be imported into the product. The IBM Information Server has an XML format that allows you to import and export business glossaries.

There is a lot to talk about in examining this format. There is the good, the bad and the ugly in this format. Before we begin our dissection there are two contextual topics in need of some discussion. First is examining the goals of the format and second is determining whether those goals could have been achieved using existing formats.

At a high level, the format has three main goals, which correspond to its three main elements: represent terms and their definitions (via the term element), categorize terms (via the category element) and add custom attributes to categories or terms (via the attribute element). Except for the metadata extension mechanism (custom attributes), this is a simple way to create and organize a dictionary in XML. When examining the schema or the example of the format, it is clear that it is far from a complete standard. For example, the only data type available for custom attributes is String. So it is clear that this format will evolve. A bigger question is: should it? And should it even have been created in the first place?

There are quite a few formats for capturing glossaries, dictionaries and thesauri in XML. A colleague of mine, Ken Sall, examined this for the government a few years back. The W3C has SKOS, IBM has subject classification in DITA (though DITA is much broader than glossaries), and XML topic maps can also serve this purpose.
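To show how much of such a glossary SKOS can already carry, here is a rough Python sketch, assuming each term has an identifier, a label, a definition and at most one category. It writes terms as skos:Concept resources in RDF/XML and maps a term's category to skos:broader. That mapping is one possible choice, not an established crosswalk from the IBM format (categories could equally become skos:Collection members), and `glossary_to_skos`, the tuple layout and the example.org base URI are hypothetical; categories are only referenced by URI here, not emitted as concepts of their own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def glossary_to_skos(base, terms):
    """terms: list of (term id, label, definition, category id or None)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for term_id, label, definition, category in terms:
        concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": base + term_id})
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
        ET.SubElement(concept, f"{{{SKOS}}}definition").text = definition
        # One possible mapping: the glossary category becomes a broader concept.
        if category:
            ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": base + category})
    return ET.tostring(root, encoding="unicode")


print(glossary_to_skos("http://example.org/glossary#", [
    ("customer", "Customer", "A party that buys goods or services.", "party"),
    ("party", "Party", "A person or organization.", None),
]))
```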

So, although we will continue to explore the details of this format and even conversion of some of the others mentioned into this format, what are your thoughts on it?

Until next time, see you in the trenches… - Mike