PHP DevCenter


Using PHP 5's SimpleXML

XML Namespaces

SimpleXML even makes processing RSS 1.0 feeds easy. RSS 1.0 uses XML namespaces, which can present a bit of a headache during parsing. With XML namespaces, each element lives under a URL, which acts as a package name. This allows you to distinguish between, say, the HTML <title> element and the RSS <title> element.



Namespaces make things more complex. You can no longer refer to plain title, since an unadorned title doesn't tell the processor which <title> you mean. You could be thinking of the RSS item <title>, but the same document might also contain an HTML <title>.

As a result, there's now {http://www.w3.org/1999/xhtml}title and also {http://purl.org/rss/1.0/}title instead. In technical language, this pairing of a namespace URL and a plain tag name is called the expanded name.

Since URLs are long, you can map a short word to the URL. So, you frequently end up referring to these elements as <xhtml:title> and <rss:title>. These short words are known as namespace prefixes, and XML uses the colon (:) to separate the prefix from the plain tag name. A name written this way, such as rss:title, is called the qualified name, or the qname for short. (Really!) However, it's the URL that's important, so prefixes like xhtml and rss are conventions, not actual namespaces. (It's important to mention that the URL doesn't have to resolve to a web page; it's just an easy way for people to create non-conflicting namespaces.)
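To see that the URL, not the prefix, identifies an element, here's a small sketch using PHP 5's DOM extension (not SimpleXML, which hides namespaces). The two one-line documents are made up for illustration: they bind different prefixes to the RSS 1.0 URL, and DOM reports the same namespace URL and local name for both.

```php
<?php
// Two made-up documents that bind different prefixes to the same URL.
$docs = array(
    '<rss:title xmlns:rss="http://purl.org/rss/1.0/">A</rss:title>',
    '<r:title xmlns:r="http://purl.org/rss/1.0/">B</r:title>',
);

$expanded = array();
foreach ($docs as $xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $el = $dom->documentElement;

    // The prefix differs per document, but namespaceURI + localName do not.
    $name = '{' . $el->namespaceURI . '}' . $el->localName;
    $expanded[] = $name;
    print "prefix: {$el->prefix}, expanded name: $name\n";
}
```

Both lines report the expanded name {http://purl.org/rss/1.0/}title, even though one element is written rss:title and the other r:title.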

SimpleXML likes the world to be simple, so it pretends the namespaces don't exist. (I know a whole crowd of readers feel this cure is worse than the disease. Remember, however, this is SimpleXML. If you're worried about namespace clashes, use DOM.)

Here's the same data as before, encoded as RSS 1.0 and saved as rss-1.0.xml:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://purl.org/rss/1.0/"
>
<channel rdf:about="http://www.php.net/">
    <title>PHP: Hypertext Preprocessor</title>
    <link>http://www.php.net/</link>
    <description>The PHP scripting language web site</description>
</channel>

<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
    <link>http://www.php.net/downloads.php</link>
    <description>
    PHP 5.0 Beta 3 has been released. The third beta of PHP is 
    also scheduled to be the last one (barring unexpected surprises).
    </description>
    <dc:date>2004-01-02</dc:date>
</item>

<item rdf:about="http://shiflett.org/archive/19">
    <title>PHP Community Site Project Announced</title>
    <link>http://shiflett.org/archive/19</link>
    <description>
    Members of the PHP community are seeking volunteers to help 
    develop the first web site that is created both by the community and for 
    the community.
    </description>
    <dc:date>2003-12-18</dc:date>
</item>

</rdf:RDF>

This XML document has three different namespaces. Looking at the top of the file, two namespaces have explicit namespace prefix mappings. That's what xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" and the following line do: they associate those URLs with the prefixes rdf and dc. You can see the rdf:RDF element, the rdf:about attribute, and the dc:date element within the document.

RDF is "Yet Another XML Spec" (YAXMLS). I won't go into it here, but you can learn more on the W3C RDF site and in Tim Bray's article, "What is RDF?", on XML.com. O'Reilly also has a book on RDF titled Practical RDF.

There's also one namespace declaration without a prefix, xmlns="http://purl.org/rss/1.0/". That's the default namespace, since there's no colon after xmlns. Elements without a prefix, like item and title, live in the default namespace. This is different from RSS 0.91, where elements do not live in any namespace.
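If you want to check which namespaces a document declares, SimpleXMLElement::getDocNamespaces() (available since PHP 5.1.2) returns them as a prefix-to-URL array, with the default namespace under the empty-string key. This sketch loads a trimmed copy of rss-1.0.xml from a string (a nowdoc, which needs PHP 5.3 or later) so it runs on its own.

```php
<?php
// Trimmed copy of rss-1.0.xml: same namespace declarations, one item.
$xml = <<<'XML'
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://purl.org/rss/1.0/"
>
<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
    <dc:date>2004-01-02</dc:date>
</item>
</rdf:RDF>
XML;

$s = simplexml_load_string($xml);
$namespaces = $s->getDocNamespaces();

foreach ($namespaces as $prefix => $url) {
    // The default namespace has no prefix, so it shows up under ''.
    $label = ($prefix === '') ? '(default)' : $prefix;
    print "$label => $url\n";
}
```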

To search for elements in a namespace under DOM, you need to switch to a different set of methods, such as getElementsByTagNameNS(), where you pass in both the namespace URL and the tag name. As I said earlier, SimpleXML just barges forward with its head down. You can use the exact same syntax with RSS 1.0 as earlier:

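$s = simplexml_load_file('rss-1.0.xml');

// Unprefixed names match elements in the default namespace
foreach ($s->item as $item) {
    print $item->title . "\n";
}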

PHP 5.0.0 Beta 3 Released
PHP Community Site Project Announced

This is not a problem because, despite all the namespace vigilance, there are no name clashes in the document.
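For comparison, here's roughly what that loop looks like with DOM's namespace-aware methods, where every lookup names the namespace URL explicitly. This is a sketch, not code from the original article, and it uses a trimmed copy of the feed loaded from a string.

```php
<?php
// Trimmed copy of rss-1.0.xml (channel and dc:date elements omitted).
$xml = <<<'XML'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
</item>
<item rdf:about="http://shiflett.org/archive/19">
    <title>PHP Community Site Project Announced</title>
</item>
</rdf:RDF>
XML;

$rssNs = 'http://purl.org/rss/1.0/';
$dom = new DOMDocument();
$dom->loadXML($xml);

$titles = array();
// getElementsByTagNameNS() takes the namespace URL and the local name.
foreach ($dom->getElementsByTagNameNS($rssNs, 'item') as $item) {
    foreach ($item->getElementsByTagNameNS($rssNs, 'title') as $title) {
        $titles[] = $title->textContent;
        print $title->textContent . "\n";
    }
}
```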

XML Namespaces and XPath

However, SimpleXML is not completely naive. It recognizes the potential for problems with this attitude. Therefore, you can distinguish between two namespaced elements with XPath, but you need to use namespace prefixes.

SimpleXML automatically registers all the non-default namespace prefixes, but you need to handle the default namespace yourself. (This lack of a default namespace mapping is a shortcoming of XPath 1.0, not SimpleXML.)
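Here's a sketch of the default-namespace problem in action, using a trimmed copy of the feed loaded from a string and the method names from released PHP versions, xpath() and registerXPathNamespace() (PHP 5.1 and later). In XPath 1.0, an unprefixed name always means "no namespace," so the plain query finds nothing until you bind a prefix to the default namespace URL.

```php
<?php
// Trimmed copy of rss-1.0.xml.
$xml = <<<'XML'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
</item>
<item rdf:about="http://shiflett.org/archive/19">
    <title>PHP Community Site Project Announced</title>
</item>
</rdf:RDF>
XML;

$s = simplexml_load_string($xml);

// Unprefixed names in XPath 1.0 match only elements in no namespace,
// so this finds nothing. (?: guards against xpath() returning false.)
$plain = $s->xpath('//item/title') ?: array();

// Bind a prefix of our choosing to the default namespace URL.
$s->registerXPathNamespace('rss', 'http://purl.org/rss/1.0/');
$prefixed = $s->xpath('//rss:item/rss:title') ?: array();

print 'unprefixed matches: ' . count($plain) . "\n";
print 'prefixed matches: ' . count($prefixed) . "\n";
```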

To find and print all rss:title entries:

$s = simplexml_load_file('rss-1.0.xml');
// Bind a prefix to the default namespace so XPath can name it
$s->registerXPathNamespace('rss', 'http://purl.org/rss/1.0/');
$titles = $s->xpath('//rss:item/rss:title');

foreach ($titles as $title) {
    print "$title\n";
}

PHP 5.0.0 Beta 3 Released
PHP Community Site Project Announced

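After loading the file, manually register a namespace prefix to go with http://purl.org/rss/1.0/. You're free to select any prefix you want, but rss is a natural choice. (An early PHP 5 beta called these methods register_ns() and xsearch(); released versions use xpath() and, from PHP 5.1 on, registerXPathNamespace().)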

The new XPath query now looks for //rss:item/rss:title instead of plain old //item/title, since it needs namespace prefixes. It's a little funny that there's no way to define a default namespace prefix for an XPath search, but that's how it is. Even though these elements don't have explicit prefixes in the document, they need prefixes in the XPath query.

You can use XPath to take advantage of the additional data in the RSS feed. For instance, to find and print all the entries from January 2004:

$s = simplexml_load_file('rss-1.0.xml');
$s->registerXPathNamespace('rss', 'http://purl.org/rss/1.0/');
// dc is declared in the document, so it needs no registration
$titles = $s->xpath('//rss:item[
               starts-with(dc:date, "2004-01-")]/rss:title');

foreach ($titles as $title) {
    print "$title\n";
}

PHP 5.0.0 Beta 3 Released

The first two lines are the same, but I've modified the XPath query to filter the results. In XPath, you can narrow the elements selected at a step by requiring them to match a test, called a predicate, inside square brackets ([]). This test requires the dc:date element under the current rss:item to begin with the string 2004-01-. If so, starts-with() returns true, and XPath includes that item in the results. (These dates are part of the Dublin Core Metadata specification, hence the dc prefix.)

This prints only one title because the Community Site item was posted in December, while Beta 3 came out in January. (Actually, it came out at the end of December, but it makes the example easier to explain.)
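The same January filter can also be written in plain PHP without XPath. Because dc:date carries a non-default prefix, SimpleXML's property syntax won't reach it directly; calling children() with the Dublin Core URL switches into that namespace. This is a sketch assuming a released version of PHP 5 or later, with a trimmed copy of the feed loaded from a string.

```php
<?php
// Trimmed copy of rss-1.0.xml.
$xml = <<<'XML'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
    <dc:date>2004-01-02</dc:date>
</item>
<item rdf:about="http://shiflett.org/archive/19">
    <title>PHP Community Site Project Announced</title>
    <dc:date>2003-12-18</dc:date>
</item>
</rdf:RDF>
XML;

$dcNs = 'http://purl.org/dc/elements/1.1/';
$s = simplexml_load_string($xml);

$january = array();
foreach ($s->item as $item) {
    // $item->date would miss dc:date; children($dcNs) selects the
    // Dublin Core namespace first.
    $date = (string) $item->children($dcNs)->date;

    // Same test as the XPath starts-with(dc:date, "2004-01-").
    if (strpos($date, '2004-01-') === 0) {
        $january[] = (string) $item->title;
        print $item->title . "\n";
    }
}
```

The XPath version keeps the filter in one declarative expression; this version trades that for ordinary PHP control flow, which can be easier to extend with extra conditions.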

Other Features

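SimpleXML has a few more features: you can edit elements and attributes in place by assigning them a new value. Then, with asXML(), you can save the modified XML document to a file or return it as a string to store in a PHP variable. Additionally, you can validate XML documents using XML Schema; in released versions of PHP 5, this goes through DOM, for example by converting the element with dom_import_simplexml() and calling schemaValidate() on its owner document.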
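As a sketch of editing in place, the following assigns a new title to the first item, adds an attribute, and serializes the result with asXML(). The new title text, the lang attribute, and the output filename are made up for illustration; the feed is a trimmed copy loaded from a string.

```php
<?php
// Trimmed copy of rss-1.0.xml.
$xml = <<<'XML'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
<item rdf:about="http://www.php.net/downloads.php">
    <title>PHP 5.0.0 Beta 3 Released</title>
</item>
</rdf:RDF>
XML;

$s = simplexml_load_string($xml);
$first = $s->item[0];

// Assigning to an element replaces its text content.
$first->title = 'PHP 5.0.0 Beta 3 Released (edited)';

// Assigning to a missing attribute creates it (hypothetical attribute).
$first['lang'] = 'en';

// With no argument, asXML() returns the document as a string...
$updated = $s->asXML();
print $updated;

// ...and with a filename it writes the file instead (hypothetical path):
// $s->asXML('rss-1.0-edited.xml');
```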

Besides RSS, SimpleXML is also perfect for parsing configuration files and consuming web services with REST. Additionally, I'm sure that as PHP 5 evolves, SimpleXML will gain even more functionality. Keep an eye peeled for the announcements and enjoy playing with SimpleXML.
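For a configuration file, the same property and array syntax is usually all you need. The format below is invented for this sketch; note that SimpleXML hands back elements and attributes as objects, so you cast them to the PHP type you want.

```php
<?php
// A hypothetical configuration format, invented for this example.
$xml = <<<'XML'
<config>
    <database host="localhost" port="3306">
        <name>blog</name>
    </database>
    <debug>true</debug>
</config>
XML;

$config = simplexml_load_string($xml);

// Attributes use array syntax; child elements use property syntax.
$host = (string) $config->database['host'];
$port = (int) $config->database['port'];
$name = (string) $config->database->name;

// XML has no booleans, so compare the text explicitly.
$debug = ((string) $config->debug === 'true');

print "Connecting to $name on $host:$port\n";
```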

Adam Trachtenberg is the manager of technical evangelism for eBay and is the author of two O'Reilly books, "Upgrading to PHP 5" and "PHP Cookbook." In February he will be speaking at Web Services Edge 2005 on "Developing E-Commerce Applications with Web Services" and at the O'Reilly booth at LinuxWorld on "Writing eBay Web Services Applications with PHP 5."






  • Great tutorial!
    2009-05-12 15:45:26  mattcass

    Nicely done, and nearly 5 years ago. I was looking for some simple examples of using PHP and XML and this got me going on the right track. Very handy and thanks a lot!
  • Any tips for working with HTML and simplexml?
    2005-06-16 08:16:25  kjwebguy

    Hi,
    Thanks for the article. Any tips for working with HTML and simplexml? Any pitfalls to look out for? Thanks.
  • xsearch -> xpath
    2005-05-09 04:34:22  KvdnBerg

    I'm not sure about this, but I think you need to use $s->xpath instead of $s->xsearch. At least in my setup (PHP5 on Apache2 on Linux), xsearch gives an undefined method error, while xpath gives the correct results.
    • xsearch -> xpath
      2005-12-27 13:53:41  GaryKG

      Thanks for the tip. I got nowhere with xsearch, but with xpath, it started working.
  • Supported tagnames
    2004-02-15 10:54:05  vladimir-shapiro

    What happens if I have tag names with a '-' symbol? For example: <first-name> or <last-name>?

    Or when I use Russian or German tag names? Is that covered by SimpleXML?

    wbr, Vladimir
    • Supported tagnames
      2004-03-12 02:27:31  riffraff

      It seems to me that SimpleXML will just work like JavaScript hashes, where you can safely write myHash.my_key as long as my_key is a valid identifier.

      I suppose that anyone who doesn't use 7-bit ASCII alphanumerics will be forced to use something else (I hope I'm wrong).

      I'd suggest the writers try out Ruby's REXML library, because it's a kick-ass, powerful and free library. And SimpleXML's behaviour can be reproduced in REXML with 5 lines of code ;)
  • SimpleXML and XSLT in PHP5
    2004-01-16 10:32:47  anonymous2

    I am getting ready to move a codebase over to XSLT for template rendering. Will SimpleXML make that process easier for PHP developers or should I look into something like the Apache Project's XSLT parser?
    • SimpleXML and XSLT in PHP5
      2004-01-16 12:27:44  Adam Trachtenberg | O'Reilly Author

      There are many new XML features in PHP5, including rewritten DOM, XSLT, and XPath classes. Unfortunately, I could not cover them all here.

      PHP 5's XSLT class uses libxslt, the sister library to libxml2. You pass the class DOM objects and it transforms them for you. This interface is cleaner than the PHP 4 XSLT extension, which used the Sablotron XSLT parser and didn't integrate with other PHP XML extensions.

      The current issue of PHP Magazine's Digital Edition (http://www.php-mag.net/) contains an article by me that gives a few examples of how to use XSLT with PHP 5. (Unfortunately, this article is not available for free.)

      So, to answer your question, if you're using PHP 4 and want to do more XSLT, then PHP 5 is definitely a step up. I hope that helps. Let me know if you have other questions.
  • Neat, but...
    2004-01-16 03:54:16  anonymous2

    I imagine the overhead of parsing the whole document and loading it into memory can be tremendous for sufficiently-large XML documents.

    In spite of that, it's a very cool extension. Makes me look forward to the release of PHP 5 that much more.
    • Neat, but...
      2004-01-16 09:54:22  Adam Trachtenberg | O'Reilly Author

      SimpleXML uses the same parsing functions libxml2 uses to create DOM documents. Yes, you do have the overhead of needing to contain the entire document in memory; however, this is all handled in C, so the process is fast and efficient. (As compared to doing this in PHP.)

      The only alternative from a memory perspective is SAX, but SAX can be quite painful to program when your schema is complex. (Try parsing the Apple pList format, for example.)
      • Neat, but...
        2004-03-10 12:31:04  riffraff

        What about pull parsing? Isn't that both memory-efficient and at least easier to handle than SAX?


©2009, O'Reilly Media, Inc.