Thursday, November 02, 2006

The Role of XML

My understanding of XML
My understanding of the role of Extensible Markup Language (XML) is that it is the next-generation markup language, following (but not replacing) Hypertext Markup Language (HTML), on which the web as we know it is based. Unlike HTML, however, XML allows data to be described. XML builds on the web's existing building blocks: protocols for reliable global communication and file delivery, a language specifying how data should be displayed, and a graphical interface for displaying HTML data on the web. XML aims to improve on these building blocks by being “well formed,” which provides consistency in how data elements are assigned; by providing enhanced tags that separate content from presentation; and by being verifiable (Rhyno, p72-73).

Problems with HTML
The internet is based on TCP/IP, an innovation that made the internet phenomenon possible by allowing global communication and easy sharing of files and information. HTML, the language used to display that information on the web, has some drawbacks, however. Chiefly, HTML is concerned with the “content presentation and arrangement” (Rhyno, p72) of data on a webpage, not with describing that data. An example that author Art Rhyno gives is wanting to

“extract subject information, or even if you just want consistency in subject assignment, it becomes difficult without a commonly used tag like <subject> to mark, or delimit, where this information is contained…HTML is limited to a fixed set of tags. In this case, <subject> is not considered a valid HTML tag….” (p 73).
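
To see what Rhyno means, here is a minimal sketch of how subject information could be pulled out of a document once a <subject> tag exists. The record and its element names (record, title, subject) are made up for illustration and do not come from any particular library standard; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names are invented for this example.
record = """
<record>
  <title>Introduction to XML</title>
  <subject>Markup languages</subject>
  <subject>Library science</subject>
</record>
"""

# Parse the text into a tree of elements.
root = ET.fromstring(record.strip())

# Because the subjects are delimited by their own tag, extracting them is one call.
subjects = [element.text for element in root.findall("subject")]

print(subjects)  # ['Markup languages', 'Library science']
```

With plain HTML, the same subjects would be buried in presentation tags such as paragraphs or bold text, and a program would have to guess where they begin and end.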


Besides the description issue, HTML has some additional problems: its “graphics rich content” is difficult to display on the screens of most small wireless devices, and even if the graphics are removed, the HTML text is overwhelming and not very readable for the user (Coyle, p135).

XML: What it is
Coyle describes XML as a “meta-language” because it describes “how others may define their own data languages” (p 137); or, more simply, it is a language for describing languages. XML is therefore not a language in the way HTML is, because it only sets out a framework to allow “users and industry groups to define their own domain-specific data definition languages” (Coyle, p139). According to Wikipedia, “languages based on XML are defined in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their particular form.”

Image: Relationship between XML and HTML (source: http://www.idealliance.org/papers/xml2001/papers/html/images/03-03-05/xml-html-venn.jpg)


A Brief History of XML and the W3C Influence
XML is derived from Standard Generalized Markup Language (SGML), which was itself based on Generalized Markup Language (GML), the language that introduced the idea of a “formally defined document type” that can be “described with a set of rules” (Rhyno, p72). Some twenty years later, in 1996, the World Wide Web Consortium (W3C) became involved in developing XML (Rhyno, p72) because it was a less costly, more user-friendly version of SGML that could provide more meaningful data than HTML without replacing it entirely (Desmarais, pp1-2).

Image: timbl.jpg (source: http://www.xml.com/2000/12/xml2000/timbl.jpg)


XML and Libraries
XML is very useful for libraries for the following reasons:

XML is well formed (Rhyno, p73): This means that the structure is sound because elements are properly nested and each has both an opening and a closing tag (Desmarais, p7). This is crucial for libraries because they must have quality control over potentially “tens of thousands” of documents (Rhyno, p73).

XML can be validated and can ensure consistency (Rhyno, p73): This requires an XML parser or “validation mechanism…of a document type definition (DTD)” (Rhyno, p73) to “check incoming data against the rules defined in the DTD to verify that the data were structured correctly” (Desmarais, p3). Rhyno calls this validation step “one of the most important steps in managing a library’s digital collection” because it maintains consistency that is needed for “sharing the content with others” and for any future “migrations” of this information to a new system (Rhyno, p73).

XML separates content from presentation (Rhyno, p73): This goes back to the initial problem with HTML: namely that it doesn’t separate content from presentation, a task “which is fundamental in managing large collections of documents...” (Rhyno, p73).
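
The first two factors above, well-formedness and validation, can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's standard xml.etree.ElementTree checks well-formedness but does not validate against a DTD, so the REQUIRED_CHILDREN rule below is a hand-written stand-in for what a DTD and a validating parser would do, and the record element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML (tags properly nested and closed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical rule standing in for a DTD: every record needs these child elements.
REQUIRED_CHILDREN = ("title", "subject")


def missing_elements(text):
    """Return the required child elements that a well-formed record lacks."""
    root = ET.fromstring(text)
    return [name for name in REQUIRED_CHILDREN if root.find(name) is None]


good = "<record><title>XML basics</title><subject>Markup</subject></record>"
unclosed = "<record><title>XML basics</record>"              # <title> never closed
incomplete = "<record><title>XML basics</title></record>"    # no <subject>

print(is_well_formed(good))          # True
print(is_well_formed(unclosed))      # False
print(missing_elements(incomplete))  # ['subject']
```

The point of the sketch is the order of the checks: a document has to be well formed before it can be validated at all, and a well-formed document can still fail validation, as the incomplete record shows.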


Image: Separating content from format (source: http://ils.unc.edu/~viles/xml/slides/img020.gif)


Obviously the issue of separating content from presentation is important to libraries because their job is to provide and organize content from numerous sources. XML also allows meaning to be embedded into data that can be presented “in a format that is independent of device, programming language, operating system, or network platform” (Coyle, p138).
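
To make the separation of content from presentation concrete for myself, here is a minimal sketch in which one XML record is rendered two different ways. The element names and the two rendering functions (as_html, as_plain_text) are my own assumptions for illustration, not part of any XML standard.

```python
import xml.etree.ElementTree as ET
from html import escape

# The content: a hypothetical record that says nothing about how it should look.
record = "<record><title>Wireless Web</title><author>Frank P. Coyle</author></record>"

root = ET.fromstring(record)
title = root.findtext("title")
author = root.findtext("author")


def as_html(title, author):
    """One presentation: markup for a desktop browser."""
    return f"<p><b>{escape(title)}</b> by {escape(author)}</p>"


def as_plain_text(title, author):
    """Another presentation: a short line for a small-screen device."""
    return f"{title} / {author}"


print(as_html(title, author))        # <p><b>Wireless Web</b> by Frank P. Coyle</p>
print(as_plain_text(title, author))  # Wireless Web / Frank P. Coyle
```

The record itself never changes; only the function that presents it does, which is what lets the same data reach different devices.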

As noted, XML is useful for libraries and was considered as a replacement for the “aging” MARC cataloging format (Desmarais, p3). The Library of Congress first explored this shift in a 1995 feasibility study, and in 1998 it released a MARC document type definition (DTD) and software to convert MARC to XML (Desmarais, pp3-4). “The objective is to make machine-readable bibliographic data more open and interchangeable in the Internet environment” (Desmarais, p4).

XML Applications
Besides solving data content issues for libraries, XML is useful in other industries for integrating technologies and delivering critical information. For example, I found an article in Wired News titled “XML Zooms onto Gov’t Tech Agenda.” This article is about how “declining sales among U.S. automakers have clinched government support for XML standards.” It refers to the Enterprise Integration Act of 2002, which became law amid the declining profits of U.S. carmakers (Ford, GM, and DaimlerChrysler) and after a National Institute of Standards and Technology report found that data-quality errors caused by interoperability problems resulted in financial losses of $1 billion per year (Steakly, 2002). The idea here is that XML can help U.S. industries save billions of dollars by streamlining and integrating their manufacturing or business processes so that they can compete in the global marketplace.

Another article, which I located in the New York Times Technology section, is entitled “Software Out There” and also addresses the interoperability issue. The idea here is that “blocks of interchangeable software components are proliferating on the Web and developers are joining them together to create a potentially infinite array of useful new programs” (Markoff, 2006). According to this article, the main reasons for the shift from proprietary systems to interoperable ones are open source software and XML, which make it “simple and efficient to exchange digital data over the Internet” (Markoff, 2006). Markoff quotes a Microsoft Chief Technical Officer, Ray Ozzie, as saying: "I'm pretty pumped up with the potential for R.S.S. to be the DNA for wiring the Web." Since RSS is an XML-based format, it is a great example of XML extending the enterprise of the web.
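
Because RSS is XML, any general-purpose XML parser can read a feed without knowing anything special about RSS. Here is a minimal sketch, assuming a tiny made-up RSS 2.0 feed (the channel name, headlines, and example.org links are invented for illustration):

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed: the titles and links are placeholders.
feed = """
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <item>
      <title>New XML workshop</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Catalog migration update</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(feed.strip())

# Collect (headline, link) pairs from every <item> in the feed.
items = [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

for headline, link in items:
    print(headline, "->", link)
```

Any program that can do this can combine headlines from many feeds, which is the kind of joining together of components that Markoff describes.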

Image: Today's vs. tomorrow's web (source: http://www.tiresias.org/cost219ter/florence/images/dardailler_fig01.jpg)


My Thoughts on the Impact of XML on the Digital Divide Issue
The Markoff article got me thinking about the impact XML has on the web and, more importantly, how this in turn affects the digital divide. Last night I read an article in The Economist, “Splitting the Digital Difference,” about new ideas for narrowing this divide, such as giving children laptops, hard-wiring multiple users to a central computer they all share, or using cell phones that can carry data and link shared PCs to the internet. This made me realize how important the XML standards are for truly “sharing” information. These standards will enable data to be shared and understood regardless of the device used or the needs of the individual or entity using it to access, provide, or manage information.

References:
Coyle, Frank P. Wireless Web: A Manager’s Guide. NJ: Addison-Wesley, 2001.

Desmarais, Norman. The ABC’s of XML: The Librarian’s Guide to the eXtensible Markup Language. TX: New Technology Press, 2000.

Markoff, John. “Software Out There.” New York Times 5 April 2006. Accessed 26 Sept 2006, online: http://www.nytimes.com/2006/04/05/technology/techspecial4/05lego.html?ex=1159416000&en=ef2d38c1eebed9bf&ei=5070

Rhyno, Art. “Introduction to XML.” In Technology for the Rest of Us, ed. Nancy Courtney. CT: Libraries Unlimited, 2005.

Steakly, Lia. “XML Zooms onto Gov’t Tech Agenda.” Wired News 11 Nov 2002. Accessed 26 Sept 2006, online: http://www.wired.com/news/politics/0,56287-0.html
