<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2873716402863254637</id><updated>2011-12-15T22:15:28.690-08:00</updated><category term='The Disadvantages of XML Searching'/><category term='THE XML RULES'/><category term='Specifying XML Structures Using Schema OVERVIEW'/><category term='The Advantages of XML Searching'/><category term='STRUCTURE XML'/><category term='PARSING XML FILES'/><title type='text'>learn xml script</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>18</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-7252519706493516935</id><published>2009-06-20T00:55:00.000-07:00</published><updated>2009-06-20T00:56:03.266-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Specifying XML Structures Using Schema OVERVIEW'/><title type='text'>Specifying XML Structures Using Schema OVERVIEW</title><content type='html'>Specifying XML Structures Using Schema&lt;br /&gt;OVERVIEW&lt;br /&gt;We've already seen that XML documents can be described using Document Type Definitions, DTDs. DTDs originated with SGML and show those origins all too visibly. XML documents are far more complex and varied than their SGML cousins because XML is used in far more ways than SGML. This creates a problem. While DTDs are perfectly suitable for SGML, where they have been used successfully for many years, they are inappropriate for the newer technology of XML. DTDs cannot be processed by XML-only applications. Developers need to learn two relatively complex languages to use DTDs and they cannot be validated using XML validators. XML has more data types than can be expressed in DTDs, and is generally far richer. Basically DTDs cannot be used to express XML documents.&lt;br /&gt;&lt;br /&gt;To remedy this situation, W3C has created a language called XML Schema which can be used to define XML structures. A number of different schema languages exist. In this chapter I will be writing specifically about XML Schema because it is a Recommendation of W3C. I'll be using the terms XML Schema and schema interchangeably – my choice being based purely upon which reads better in a given context. If I wanted to be precise all of the time I would use XML Schema when referring to the language and Recommendation, and schema when referring to a particular document that uses the language.&lt;br /&gt;&lt;br /&gt;As I write this, far more tools exist to handle DTDs than XML Schema. This situation is changing rapidly since everyone sees the advantages of using schemas. DTDs are really a technical dead-end, although understanding them will remain important since so many exist. 
It's likely that older documents will continue to be described using DTDs. New documents should always be described using XML Schema.[1] &lt;br /&gt;&lt;br /&gt;The most important omission in the DTD is the idea of a data type. SGML documents tend to contain mostly plain text. Almost all data in an SGML application can be treated as strings of characters in definitions and applications. XML documents require a far richer set of data types, including strings of characters, whole and decimal numbers, and complex types such as dates and times. XML Schema introduces data types which, in turn, lead to more tightly defined XML structures that can be used with current database technologies or in conventional applications written in general-purpose programming languages. Other new and useful features in the XML Schema Recommendation include:&lt;br /&gt;&lt;br /&gt;a simple pattern matching grammar which might be used, for example, to define the structure of an order code,&lt;br /&gt;&lt;br /&gt;defined ordering of subelements so that document structure can be tightly controlled,&lt;br /&gt;&lt;br /&gt;selection between different elements so that documents can share a schema without having identical structure.&lt;br /&gt;&lt;br /&gt;DTDs are described using their own, unique syntax. Using them means having to learn, and apply, two sets of syntactic rules in one application. While DTDs are not the most complex documents imaginable, it is vital that developers define them correctly. Equally important, parsing and manipulating DTDs within applications requires special libraries. XML Schema documents can be handled much more easily because they are fully compliant XML documents in their own right. What does this mean in practice? The tools that you use to develop, parse and manipulate your XML can also be used for your schemas. 
Developers need learn only one set of rules for schema and document, and both could be created using the same pieces of editing software.&lt;br /&gt;&lt;br /&gt;Using XML Schema requires an understanding of namespaces. Schema definitions always use namespaces, so much so that namespaces are one of the cornerstones of schema technology. I've mentioned namespaces before; now is the time to examine them in detail and learn how to use them.&lt;br /&gt;&lt;br /&gt;[1]Although pragmatic realities such as organizational politics, historical preferences or the tools you have available may force you to use DTDs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-7252519706493516935?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/7252519706493516935/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/specifying-xml-structures-using-schema.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/7252519706493516935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/7252519706493516935'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/specifying-xml-structures-using-schema.html' title='Specifying XML Structures Using Schema OVERVIEW'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-5629123036072115882</id><published>2009-06-20T00:42:00.001-07:00</published><updated>2009-06-20T00:44:46.026-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='STRUCTURE XML'/><title type='text'>STRUCTURE XML</title><content type='html'>STRUCTURE&lt;br /&gt;The DTD is a series of declarations. Each declaration takes the form:&lt;br /&gt;&lt;br /&gt;   '&lt;!   &gt;'&lt;br /&gt;&lt;br /&gt;and contains one of four keywords. These are:&lt;br /&gt;&lt;br /&gt;ELEMENT, which declares an element,&lt;br /&gt;&lt;br /&gt;ATTLIST, which declares the attributes of an element,&lt;br /&gt;&lt;br /&gt;ENTITY, which declares an entity,&lt;br /&gt;&lt;br /&gt;NOTATION, which names the format of unparsed (non-XML) data.&lt;br /&gt;&lt;br /&gt;The easiest way to understand the structure of a DTD is to look at a simplified one. Rather than create a novel structure, I'm going to use part of the DTD for the Business Letter. This is shown in Listing 3.1.
&lt;br /&gt;&lt;br /&gt;Listing 3.1: Partial DTD for the Business Letter &lt;br /&gt; &lt;br /&gt;&lt;br /&gt;'&lt;!DOCTYPE letter ["&lt;br /&gt;   '&lt;!ELEMENT letter (address)&gt;'&lt;br /&gt;   '&lt;!ELEMENT address (line1, line2?, line3*, city,(county|state)'&lt;br /&gt;     ' ?, country?, code?)&gt;'&lt;br /&gt;   '&lt;!ELEMENT line1 (#PCDATA)&gt;'&lt;br /&gt;   '&lt;!ELEMENT line2 (#PCDATA)&gt;'&lt;br /&gt;  '&lt;!ELEMENT line3 (#PCDATA)&gt;'&lt;br /&gt;   '&lt;!ELEMENT city (#PCDATA)&gt;'&lt;br /&gt;   '&lt;!ELEMENT county (#PCDATA)&gt;'&lt;br /&gt;   '&lt;!ELEMENT state (#PCDATA)&gt;'&lt;br /&gt;  ' &lt;!ELEMENT country (#PCDATA)&gt;'&lt;br /&gt;   '&lt;!ELEMENT code (#PCDATA)&gt;'&lt;br /&gt;]&gt;'&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-5629123036072115882?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/5629123036072115882/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/structure-xml.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5629123036072115882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5629123036072115882'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/structure-xml.html' title='STRUCTURE XML'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-3404423239713633208</id><published>2009-06-20T00:40:00.001-07:00</published><updated>2009-06-20T00:40:38.512-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='PARSING XML FILES'/><title type='text'>PARSING XML FILES</title><content type='html'>Applications that manipulate XML need to be able to move through the data structure, finding elements, tags and content. Processing data to extract meaning from it is called parsing in computing. The same term is used to describe the processing of sentences in human languages to extract their meaning. The idea, in both cases, is the same. Few developers choose to write their own XML parsers. Although the rules of the grammar are relatively simple, writing fast and accurate parsers is a difficult task. Most people use a parser written by someone else. Many XML parsers are freely available; the choice of which you use tends to depend upon your system and the language that you are developing in. Two popular choices are MSXML from Microsoft, which can be programmed using C++ or Visual Basic, and Xerces from the Apache Foundation. Xerces comes in Java, C++ and Perl versions and can be used on many different operating systems. Both these parsers can be used directly from the command line or called from within applications.&lt;br /&gt;&lt;br /&gt;Once you have installed MSXML on your system, it is automatically available within Internet Explorer. This means that you can, for instance, open XML files in Explorer and view them as tree structures.&lt;br /&gt;&lt;br /&gt;As you read through this book, you'll find that XML parsers can do lots of interesting things with your XML. 
One of the most useful is to check whether the XML you have written is correct, and whether it adheres to the rules set out in the DTD or schema for that particular document.&lt;br /&gt;&lt;br /&gt;2.4.1 Valid or well-formed?&lt;br /&gt;XML documents may be either valid or well-formed. The two terms relate to differing levels of conformance with the XML Recommendation, the DTD, or schema, and the basic structure of the XML. All XML documents must be well-formed: tags should be paired, elements should be properly nested, the document should have an XML declaration, and entities should be properly formed. Any application which can handle XML will be able to cope with a well-formed document. A valid document takes conformance rather further. To be valid, a document must identify a DTD or schema, and its data must meet the rules set out in that DTD or schema.&lt;br /&gt;&lt;br /&gt;All XML parsers are able to check that a document is well-formed. For some such as MSXML, this is where their capabilities end. Other parsers such as Xerces are able to validate an XML document against a DTD. At the time of writing, Schema support in Xerces is in the alpha stage of development. That means it's far from ready for the big time – but it is being implemented. XML is a new technology; it's evolving rapidly and tool support does tend to lag slightly behind. In the near future, though, the tools will be available to use XML Schema as well as DTDs. It's at that stage that we'll start to see DTDs becoming less popular with developers.&lt;br /&gt;&lt;br /&gt;Unparsed Character Data&lt;br /&gt;Most of the content in an XML file will be handled by the parser. Generally elements and entities contain text that has some meaning. The content will rarely include characters such as &lt; which have special meaning to the parser, and when it does contain them, those characters are usually entered as character entities. Sometimes a document will include large numbers of these characters. 
In such cases using entities may be impractical. The XML standard allows for this. Your document can include sections of CDATA, unparsed character data. All characters inside a CDATA section are assumed to be content, rather than markup. A section of CDATA is started with the string &lt;![CDATA[ and ended with ]]&gt; as shown in Listing 2.6. You'll meet CDATA again in the discussion of DTDs in Chapter 3.&lt;br /&gt;&lt;br /&gt;Listing 2.6: CDATA Sections &lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;?xml version="1.0"?&gt;&lt;br /&gt;&lt;br /&gt;&lt;greeting style="informal"&gt;&lt;br /&gt;  &lt;from&gt;Chris Bates&lt;/from&gt;&lt;br /&gt;  &lt;to&gt;Mr. M. Mouse&lt;/to&gt;&lt;br /&gt;  &lt;message&gt;Hi, how're ya doin'?&lt;/message&gt;&lt;br /&gt;  &lt;![CDATA[ The text in &lt;&lt; here can contain &amp; markup &gt;&lt;br /&gt;  characters until the end of the section is reached&lt;br /&gt;  ]]&gt;&lt;br /&gt;&lt;/greeting&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-3404423239713633208?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/3404423239713633208/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/parsing-xml-files.html#comment-form' title='1 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3404423239713633208'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3404423239713633208'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/parsing-xml-files.html' title='PARSING XML FILES'/><author><name>saeful 
uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-3444779233341084045</id><published>2009-06-20T00:29:00.001-07:00</published><updated>2009-06-20T00:29:43.034-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='THE XML RULES'/><title type='text'>THE XML RULES</title><content type='html'>THE XML RULES&lt;br /&gt;Computer languages need to be formally defined in some way. Developers need to know what facilities are available in a language and that those facilities will work in the same way in all implementations. Languages are usually standardized by an international body such as the International Organization for Standardization (ISO) or the Institute of Electrical and Electronics Engineers (IEEE). For those languages that have defined standards, all compilers or interpreters must adhere to the standard: if a C++ compiler doesn't work according to the ANSI/ISO C++ standard then it really isn't a C++ compiler. Often these standards are minimum requirements which will be available in all products and on all platforms. Manufacturers of compilers are free to extend the language by adding their own proprietary features, although this does mean that the extended version will no longer be standard. Often large or powerful companies try to force their extensions into the standards. This can be extremely beneficial when it leads to improvements – too often standardized languages are developed by committees and become lowest common denominator languages. New extensions may only be available on one platform. 
If developers wish to write code on a Linux box but later compile and execute it on an Apple Macintosh, they can only do this if no extensions have been used. Problems like this tend to force people either to adhere rigidly to the standard or to work exclusively for a subset of all available platforms. When developing for heterogeneous systems such as the Web, adherence to the standard is clearly the preferred option.&lt;br /&gt;&lt;br /&gt;XML requires a common set of rules. In fact, since any Web technology must work on every platform in a plethora of software applications, standardization is even more important than for programming languages. Perhaps surprisingly, XML, like HTML, isn't actually an international standard. It's a Recommendation of the World Wide Web Consortium (W3C). W3C Recommendations have much of the force of international standards but the process of creating them is far more flexible and far faster than standardization.&lt;br /&gt;&lt;br /&gt;The current XML Recommendation is Version 1.0 (second edition). It can be viewed online at http://www.w3.org/TR/2000/REC-xml-20001006 or downloaded in a variety of formats. The second edition makes no major changes to the first edition of the Recommendation but does incorporate all of its errata. Most standards documents are necessarily complex. They don't make for an easy read, and the XML Recommendation is no exception. If you want to know just how much thought went into the design of XML, download a copy of the Recommendation and spend a few minutes leafing through it.&lt;br /&gt;&lt;br /&gt;2.3.1 XML Tags&lt;br /&gt;XML documents are composed of elements. An element has three parts: a start tag, an end tag and, usually, some content. Elements are arranged in a hierarchical structure, similar to a tree, which reflects both the logical structure of the data and its storage structure. 
A tag is a pair of angled brackets, &lt;… &gt;, containing the name of the element, and pairs of attributes and values. An end tag is denoted by a slash, /, placed before the element name. Here are some XML elements:&lt;br /&gt;&lt;br /&gt;&lt;book&gt;The Lord Of The Rings&lt;/book&gt;&lt;br /&gt;&lt;chapter&gt;Helm's Deep&lt;/chapter&gt;&lt;br /&gt;&lt;name&gt;Professor J. R. R. Tolkien&lt;/name&gt;&lt;br /&gt;&lt;br /&gt;XML elements must obey some simple rules:&lt;br /&gt;&lt;br /&gt;An element must have both a start tag and an end tag unless it is an empty element.&lt;br /&gt;&lt;br /&gt;Start tags and end tags must form a matched pair.&lt;br /&gt;&lt;br /&gt;XML is case-sensitive so that name does not match nAme. You can, though, use both upper and lower-case letters inside your XML markup.&lt;br /&gt;&lt;br /&gt;Tag names cannot include whitespace.&lt;br /&gt;&lt;br /&gt;Here are those same elements with introduced errors:&lt;br /&gt;&lt;br /&gt;&lt;book&gt;The Lord Of The Rings&lt;/Book&gt;&lt;br /&gt;&lt;cha pter&gt;Helm's Deep&lt;/chapter&gt;&lt;br /&gt;&lt;name&gt;Professor J. R. R. Tolkien&lt;/n&gt;&lt;br /&gt;&lt;br /&gt;2.3.1.1 Nesting Tags&lt;br /&gt;Even very simple documents have some elements nested inside others. In fact, if your document is going to be XML it has to have a root element which contains the rest of the document. Tags must pair up inside XML so that they are closed in the reverse order to that in which they were opened.&lt;br /&gt;&lt;br /&gt;The first listing in Table 2.2 is not well-formed XML since the ordering of the start and end tags has become confused. The correct version is shown in the second listing of the same table.&lt;br /&gt;&lt;br /&gt;Table 2.2: Nesting Elements&lt;br /&gt;&lt;br /&gt;Incorrect:&lt;br /&gt;&lt;br /&gt;&lt;?xml version="1.0"?&gt;&lt;br /&gt;&lt;br /&gt;&lt;greeting style="informal"&gt;&lt;br /&gt;  &lt;from&gt;Chris Bates&lt;br /&gt;  &lt;to&gt;Mr. M. 
Mouse&lt;/to&gt;&lt;br /&gt;  &lt;/from&gt;&lt;br /&gt;  &lt;message&gt;&lt;br /&gt;    Hi, how're ya doin'?&lt;br /&gt;  &lt;/greeting&gt;&lt;br /&gt;&lt;/message&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;?xml version="1.0"?&gt;&lt;br /&gt;&lt;br /&gt;&lt;greeting style="informal"&gt;&lt;br /&gt;  &lt;from&gt;Chris Bates&lt;/from&gt;&lt;br /&gt;  &lt;to&gt;Mr. M. Mouse&lt;/to&gt;&lt;br /&gt;  &lt;message&gt;&lt;br /&gt;    Hi, how're ya doin'?&lt;br /&gt;  &lt;/message&gt;&lt;br /&gt;&lt;/greeting&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;2.3.1.2 Empty Tags&lt;br /&gt;Sometimes an element that could contain text happens not to. There may be many reasons for this – the attributes of the element may contain all the necessary information, or the element may be required if the document is to be valid. These empty elements can be represented in two ways:&lt;br /&gt;&lt;br /&gt;&lt;book&gt;The Lord Of The Rings&lt;/book&gt;&lt;br /&gt;&lt;book&gt;&lt;/book&gt;&lt;br /&gt;&lt;book /&gt;&lt;br /&gt;&lt;br /&gt;The empty element can be included by placing an end tag immediately after the start tag. More simply, a tag containing the name of the element followed by a slash can be used.&lt;br /&gt;&lt;br /&gt;2.3.1.3 Characters in XML&lt;br /&gt;When the XML Recommendation talks about characters, it means characters from the Unicode and ISO 10646 character sets. Until relatively recently most computing applications used a relatively small set of characters, typically the 128 letters of the ASCII character set which could be represented using seven bits. The ASCII character set, defined in ISO/IEC 646, only allowed users to enter those letters typically found in the English language.&lt;br /&gt;&lt;br /&gt;In a multilingual world this is clearly an impractical limitation which led to the development of many alternative character sets. 
Web applications typically use ISO 8859 which uses 8 bits for each character and which defines a number of alphabets. These include the standard Latin alphabet used as default by most Web browsers. Unicode goes further and uses two bytes to represent each character. This means that Unicode includes 65,536 different characters, insufficient for Chinese but suitable for most uses. ISO 10646 extends the Unicode idea by using four bytes for each character, giving approximately 2 billion possible characters. Unicode is implemented as the default encoding in Microsoft Windows and the Java programming language, among others. But it clearly needs extending to access those extra characters, and has been. Version 2.1 of Unicode includes some facilities that give access to the ISO 10646 character set.&lt;br /&gt;&lt;br /&gt;Using ISO 10646 to represent ASCII data is highly inefficient – effectively three bytes of memory are wasted for each character. Even though computer memory and storage are extremely cheap today, such inefficiency is expensive if an application is handling gigabytes of data. Therefore applications use encoding schemes to store data more efficiently. Applications that process XML must support two of these: UTF-8 and UTF-16. UTF-8, for instance, uses a single byte for ASCII data and two or more bytes for other characters (up to six in the original design, later restricted to four).&lt;br /&gt;&lt;br /&gt; Note  XML applications support extended character sets. These allow up to 2 billion different characters. When you develop using XML you can use any language and character set that you need to in your applications. You are not restricted to the English language or to the set of languages supported on a particular operating system.&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;It's worth noting that everything in an XML document that is not markup is considered to be character data. 
Markup[4] consists of:&lt;br /&gt;&lt;br /&gt;start tag,&lt;br /&gt;&lt;br /&gt;end tag,&lt;br /&gt;&lt;br /&gt;empty tag,&lt;br /&gt;&lt;br /&gt;entity reference,&lt;br /&gt;&lt;br /&gt;character reference,&lt;br /&gt;&lt;br /&gt;comments,&lt;br /&gt;&lt;br /&gt;delimiters for CDATA sections,&lt;br /&gt;&lt;br /&gt;document type declarations,&lt;br /&gt;&lt;br /&gt;processing instructions,&lt;br /&gt;&lt;br /&gt;XML declarations,&lt;br /&gt;&lt;br /&gt;text declarations.&lt;br /&gt;&lt;br /&gt;The final, important thing about characters is that some of them have special meaning or cannot be easily represented in your source text using a conventional keyboard. Most of the characters in ISO 10646 clearly fall into this category. Some mechanism is therefore required to permit the full range of characters to be included in documents. This is done through character references. To demonstrate the use of character references, I'll look at those characters that can have special meaning inside markup. Characters such as &lt;, &gt;, ', " are used as part of the markup of the document. If they're encountered by the parser inside an XML file, it assumes that they are control characters which have special meaning to it, and it then acts accordingly. The obvious example of this behavior is found in handling attributes. The following two examples would be illegal in XML:&lt;br /&gt;&lt;br /&gt;&lt;message src="here is the "source" of the message" /&gt;&lt;br /&gt;&lt;message src='here is the 'source' of the message' /&gt;&lt;br /&gt;&lt;br /&gt;In each case, the parser will assume that the content of the src attribute starts at the first apostrophe or set of quotation marks, and stops at the second. Attribute content following this point cannot be parsed since it is not valid XML.&lt;br /&gt;&lt;br /&gt;What happens when the file should legitimately contain &lt; as part of its character data? 
The appropriate character reference is entered instead.[5] Table 2.3 shows the references which must be entered in an XML document if you want a particular character. Here's the previous example reworked to be valid XML:&lt;br /&gt;&lt;br /&gt;&lt;message src="here is the &amp;quot;source&amp;quot; of the message" /&gt;&lt;br /&gt;&lt;message src='here is the &amp;apos;source&amp;apos; of the message' /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-3444779233341084045?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/3444779233341084045/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-rules.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3444779233341084045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3444779233341084045'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-rules.html' title='THE XML RULES'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-5346978833736055351</id><published>2009-06-17T01:07:00.001-07:00</published><updated>2009-06-17T01:07:22.797-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='The Disadvantages of XML Searching'/><title 
type='text'>The Disadvantages of XML Searching</title><content type='html'>The Disadvantages of XML Searching&lt;br /&gt;Despite the optimistic view of many people in the XML community, the XML searching problem is complicated, from both technical and business perspectives. In some situations, XML-based contextual searching can be a major advantage; in others, it can be an unnecessary cost; in yet others, it can make the search engine's results worse. This section introduces some of the problems with the very idea of XML-based searching.&lt;br /&gt;&lt;br /&gt;XML searching may be too complex for most users.&lt;br /&gt;&lt;br /&gt;Documents on the Web can use deceptive markup to raise their ranking in a search.&lt;br /&gt;&lt;br /&gt;XML documents are generally not interoperable in the same search environment, because of all the different, incompatible vocabularies.&lt;br /&gt;&lt;br /&gt;6.2.1. Usability&lt;br /&gt;Tim Bray, cofounder of Open Text, which ran an early Web search engine, and coauthor of the original XML specification [XML], wrote the following passage in a Web log (http://www.tbray.org/ongoing/When/200x/2003/06/17/SearchUsers):&lt;br /&gt;&lt;br /&gt;Nobody Uses Advanced Search...&lt;br /&gt;&lt;br /&gt;Every search engine has an "advanced search" screen, and nobody (quantitatively, less than 0.5% of users) ever goes there. This drove us nuts back at Open Text, because our engine was very structurally savvy and could do compound/boolean queries that look like what today we'd call XPath. But nobody used it.&lt;br /&gt;&lt;br /&gt;What most people want is to have a nice simple field into which they will type on average 1.3 words and hit Enter, and have the result come back to them. So anyone who's building search needs to focus almost all their energy on doing an as-good-as-possible job given those 1.3 words and no other inputs.&lt;br /&gt;&lt;br /&gt;This observation does not bode well for XML searching. 
If users are unwilling to use even relatively simple full-text techniques, such as Boolean or proximity searches, how much hope is there that they will be willing to formulate the complex queries that can take advantage of XML markup? Fortunately for the future of XML searching, Bray does go on to qualify that observation:&lt;br /&gt;&lt;br /&gt;...Except the People who Do&lt;br /&gt;&lt;br /&gt;Of course, the people who do use Advanced Search are your most fanatical users, the professional librarians, spooks, and private investigators. And the ones who will do what it takes to find out everything about research on the rare disease their child just got diagnosed with. These people tend to be loud-mouthed and aggressive and will get in your face if you don't have advanced search or it's not real good.&lt;br /&gt;&lt;br /&gt;Presumably, these same kinds of people would be the ones using XML context in their searches. Others of Bray's "fanatical users" might be academics preparing papers, journalists researching news stories, and software agents collecting and amalgamating information for politicians and managers. This last example, in fact, may point to the real potential users of XML searching: not people but software. People other than governors and CEOs need to make decisions in their own lives, from changing jobs to buying new clothes, and software agents that find information for people (say, for price comparison) could benefit greatly from the extra information provided by XML markup, assuming, of course, that vendors were willing to encode their pricing information in a standard format and accept the transparency that comes with that.&lt;br /&gt;&lt;br /&gt;And that leads to another usability problem: XML searching requires people or software to know a lot about the structure of the documents they're searching. 
If all XML documents shared a single, global vocabulary, searching would be relatively straightforward, at least for power users: Every price would appear inside a price element, every bar code would appear inside a upc element, every person's name would appear inside a person element, and so on. This is unlikely ever to happen, for two reasons:&lt;br /&gt;&lt;br /&gt;No single, accepted authority could impose a common vocabulary on all users.&lt;br /&gt;&lt;br /&gt;XML documents can encode a potentially infinite variety of information, so a common vocabulary would always be incomplete.&lt;br /&gt;&lt;br /&gt;Some XML-related specifications are designed to work around these problems, at least partly (see Section 6.3.3 for more information), but in reality, if a large amount of XML markup did appear on the Web today, generalized XML searching would be almost useless, given the hundreds of incompatible XML-based vocabularies. The best people can hope for is specialized searching inside repositories or across Web collections in which all XML documents share a common type: Conceptually, this is the equivalent of a site search engine rather than a Web search engine, and it falls far short of a revolution in Web searching. Even then, searching will be more complex than the most difficult "advanced search" page currently available on full-text Web search engines. Either users will have to become experts in XML structure, or they will have to limit themselves to a few precooked searches, such as "Search for a person" or "Search for a part number," through Web sites that can construct an XML query for them.&lt;br /&gt;&lt;br /&gt;In the end, XML searching may be useful for specific project applications. 
But usability issues alone make it seem unlikely that XML will ever cause the social revolution in Web searching that some supporters hoped for when the specification first appeared.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-5346978833736055351?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/5346978833736055351/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/disadvantages-of-xml-searching.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5346978833736055351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5346978833736055351'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/disadvantages-of-xml-searching.html' title='The Disadvantages of XML Searching'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-3821889490634708848</id><published>2009-06-17T01:03:00.000-07:00</published><updated>2009-06-17T01:06:34.615-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='The Advantages of XML Searching'/><title type='text'>The Advantages of XML Searching</title><content type='html'>The Advantages of XML Searching&lt;br /&gt;XML markup makes searching smarter by adding contextual information and makes it 
possible to correlate information from more than one document. Any serious Internet user is familiar with searches that do not work: The words are too common or have too many different meanings for any search engine to return useful results, or perhaps the information is spread among several pages. Solving these problems was one of the initial goals of XML's creators.&lt;br /&gt;&lt;br /&gt;6.1.1. Context&lt;br /&gt;Much of the time, full-text search engines do a good job, but they sometimes fall flat. Consider, for example, the difference between Bush the U.S. president and bush the shrub, or Washington the U.S. state and Washington the U.S. city. If you were trying to find information on bush pilots flying out of Washington State, you might try the search "bush pilot washington." In late 2003, Google's first ten results were as follows:&lt;br /&gt;&lt;br /&gt;A site selling a book about a Canadian bush pilot (no connection to Washington)&lt;br /&gt;&lt;br /&gt;Two newspaper stories about the U.S. Navy naming an aircraft carrier after President Bush Sr.&lt;br /&gt;&lt;br /&gt;A 2002 USA Today story about a small plane violating airspace near President Bush Jr.&lt;br /&gt;&lt;br /&gt;A 2000 Washington Post story about President Bush Jr.'s service in the Texas Air National Guard&lt;br /&gt;&lt;br /&gt;A 2001 Pravda story critical of President Bush after the midair collision between a Chinese fighter jet and a U.S. surveillance plane&lt;br /&gt;&lt;br /&gt;The Amazon.com page for a biography of an Alaskan bush pilot, published by University of Washington Press&lt;br /&gt;&lt;br /&gt;Two pages from FlyRod &amp; Reel magazine: one stating that it has no listings for Washington and another listing angling retailers in the state&lt;br /&gt;&lt;br /&gt;A 2003 news story from the Washington Times about a U.S. 
Navy pilot being held by the Iraqi government&lt;br /&gt;&lt;br /&gt;I chose Google for this example precisely because it is a very good full-text search engine: It infers the relevance of information on a Web page not only from the text on the page but also from the other pages that link to it, the pages that link to those pages, and so on. With this difficult example, the first slightly relevant result is the twenty-third, which mentions a bush pilot who did fly once in Washington State; after that, the matches revert mainly to politics.&lt;br /&gt;&lt;br /&gt;An experienced search-engine user could work around the problem by adding more words to the query. For example, bush pilots in Washington State have to deal with a lot of mountains, and the query string "bush pilot washington mountains" returns fewer political hits. Even better, a search string that contains specific aircraft types used in bush plane flying, such as "bush pilot washington cessna 180," returns almost all relevant matches. Most of the population is not that adept with search-engine query strings, however, and would likely give up on the whole thing; furthermore, these more specific queries would miss pages that do not happen to mention mountains or Cessna 180 airplanes.&lt;br /&gt;&lt;br /&gt;Although homographs, such as Bush and bush, can make full-text searches difficult, an even trickier problem comes when the search results depend more on context than on the individual words. For example, consider trying to find Web pages that discuss the history of the word sex, without hitting thousands of pornography sites. 
Unless the decades-old dream of full machine artificial intelligence finally shakes off its dust and comes true, these are searches that will continue to flummox traditional full-text search engines.&lt;br /&gt;&lt;br /&gt;As long as artificial intelligence remains a distant dream, we need to concentrate on getting plain old human intelligence into our XML documents, and that is precisely what XML-based markup languages do. The following News Industry Text Format [NITF] fragment shows how news providers can tag articles to avoid confusing search engines.&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Today in &lt;location&gt;&lt;city&gt;Seattle&lt;/city&gt;,&lt;br /&gt;&lt;state&gt;Washington&lt;/state&gt;&lt;/location&gt;, &lt;person&gt;President&lt;br /&gt;Bush&lt;/person&gt; opened a new museum.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The markup represents added human intelligence from the news reporter or editor: Washington represents a state, not a city, and Bush represents a person, not a shrub. The XML document contains not only the news story itself but also what the author knew about the news story. As this information survives all the way to the final document, search engines require no special artificial intelligence to use it.&lt;br /&gt;&lt;br /&gt;Similarly, several markup languages, including DocBook [DOCBOOK] and the Text Encoding Initiative [TEI], define markup for talking about the history of words. 
Following is a DocBook example:&lt;br /&gt;&lt;br /&gt;&lt;para&gt;The word &lt;wordasword&gt;sex&lt;/wordasword&gt; has gradually come&lt;br /&gt;to mean not only gender, but the physical act of procreation, and, &lt;br /&gt;eventually, all physically-intimate acts.&lt;/para&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The wordasword element makes it clear that this paragraph discusses the word sex rather than the act: A search engine could easily pick out this contextual information to return exactly the pages the user wanted.&lt;br /&gt;&lt;br /&gt;XML markup is information that a document's creator knew but could not put in the main text. Because this information is available, search tools could potentially use it to return far more accurate results.&lt;br /&gt;&lt;br /&gt;6.1.2. Correlation&lt;br /&gt;In addition to basic context, XML markup also makes it possible to correlate information, matching instances of the same thing described in different ways and converting among different representations. To start with a simple example, consider only a few of the many ways documents might refer to British Prime Minister Tony Blair:&lt;br /&gt;&lt;br /&gt;Tony Blair&lt;br /&gt;&lt;br /&gt;Prime Minister Blair&lt;br /&gt;&lt;br /&gt;the British Prime Minister&lt;br /&gt;&lt;br /&gt;the prime minister&lt;br /&gt;&lt;br /&gt;the P.M.&lt;br /&gt;&lt;br /&gt;Mr. Blair&lt;br /&gt;&lt;br /&gt;Blair&lt;br /&gt;&lt;br /&gt;Given this variety, searching for information about Prime Minister Blair is difficult, and the work is made even worse by the fact that many of these phrases can apply to other people: for example, in a different context, "the prime minister" could refer to the prime minister of Australia, Canada, India, or many other countries.&lt;br /&gt;&lt;br /&gt;How is it possible to define a Web in which people can easily search for information about Prime Minister Blair no matter how he is described? 
One possibility is always to normalize the name when it appears; unfortunately, if people are forced always to write "British Prime Minister Tony Blair," text will become awkward and unnatural, and even that might not be sufficient if another person named "Tony Blair" became British prime minister in the future.[1]&lt;br /&gt;&lt;br /&gt;[1] This risk is not far-fetched: Consider the phrase "U.S. President George Bush."&lt;br /&gt;&lt;br /&gt;Using XML markup, however, it is a simple matter to attach a unique identifier to every location that mentions Prime Minister Blair. As long as the identifier is well known, search engines can look for it rather than for the text it contains, as shown by the following markup fragments:&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;Tony Blair&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;Prime Minister Blair&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;the British Prime Minister&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;the Prime Minister&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;the P.M.&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;Mr. Blair&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;person ident="ps10563blair"&gt;Blair&lt;/person&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If, in the future, another person shared the same name and title, that person would have a different identifier, so search engines would not return false hits.&lt;br /&gt;&lt;br /&gt;Most things in the world (people, concepts, historical periods, and so on) do not yet have standard, universally accepted identifiers, so this is more than a markup problem. However, many identification schemes do exist, such as stock market symbols; publication identifiers, such as ISBNs; social security numbers, phone numbers, postal codes, and country, language, and currency codes. 
The following example shows the use of a stock ticker symbol for identifying a company:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Today, &lt;company symbol="NASDAQ.MSFT"&gt;Microsoft&lt;/company&gt; &lt;br /&gt;announced a new software strategy.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Even limiting searching to current widely accepted identifiers, XML markup can make it significantly easier to correlate information described in various ways.&lt;br /&gt;&lt;br /&gt;Now, consider a more difficult problem than simple identification: a search for world government spending programs that cost more than USD 1 billion. This kind of a search is far beyond the capabilities of current full-text search engines, but XML markup can add hints to help future search engines do the work. The following example uses News Industry Text Format [NITF] markup once again:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Today, Congress approved an additional &lt;money unit="USD"&gt;3&lt;br /&gt;billion&lt;/money&gt; in education spending.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The money element makes it clear that this article is referring to "3 billion" in currency, and the unit attribute indicates that the currency is U.S. dollars, using a code from the ISO 4217 standard for identifying world currencies [ISO-4217]. A search engine would still use full-text searching algorithms to determine that the article dealt with government spending, but then the tagging would help it determine the amount. 
Even more interestingly, the money element adds enough intelligence that the search engine could return correct results for pages using entirely different currencies; at the time of writing, the money in the next example is less than USD 1 billion, so it should not return a hit:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The Canadian federal government committed an additional &lt;money &lt;br /&gt;unit="CAD"&gt;1.1 billion&lt;/money&gt; in health-care spending.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Automatic currency conversion during searching, based on intelligent tagging like this, could be especially useful for financial institutions and others mining large international document repositories for information. Many other types of conversion and substitution are also possible with markup, including dates and times, language conversion, and recognition of synonyms, subsets, and supersets. (For example, a search for information about New England should return pages that mention Vermont.)&lt;br /&gt;&lt;br /&gt;Obviously, a lot of infrastructure is required before search engines can work like this. 
But it does provide an intriguing view of a future that markup might help to enable, where people and programs can search for and find the precise information they need, relying on intelligence encoded in XML markup.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-3821889490634708848?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/3821889490634708848/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-searching.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3821889490634708848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3821889490634708848'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-searching.html' title='The Advantages of XML Searching'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-1173555275447406299</id><published>2009-06-16T18:24:00.001-07:00</published><updated>2009-06-16T18:24:31.301-07:00</updated><title type='text'>Network Resources</title><content type='html'>Network Resources&lt;br /&gt;People have concerns about how XML networking will perform once it is in widespread use, but XML networking brings big performance advantages in one area: Because it can 
contain arbitrarily complex structure, an XML document can batch up information and reduce the number of network transactions required. For example, a hypothetical accounting server with an XML networking interface might allow a client to request information about multiple accounts with a single XML document sent over HTTP, as in Listing 5-1.&lt;br /&gt;&lt;br /&gt;Listing 5-1. XML Batch Request&lt;br /&gt;&lt;balance-request&gt;&lt;br /&gt;  &lt;account ref="assets.current.accounts-receivable"/&gt;&lt;br /&gt;  &lt;account ref="assets.current.petty-cash"/&gt;&lt;br /&gt;  &lt;account ref="liabilities.current.accounts-payable"/&gt;&lt;br /&gt;  &lt;account ref="income.professional-services"/&gt;&lt;br /&gt;&lt;/balance-request&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The server could respond with all the information also in a single XML document, as in Listing 5-2.&lt;br /&gt;&lt;br /&gt;Listing 5-2. XML Batch Response&lt;br /&gt;&lt;balance-info&gt;&lt;br /&gt;  &lt;balance&lt;br /&gt;    account="assets.current.accounts-receivable"&gt;144298.00&lt;/balance&gt;&lt;br /&gt;  &lt;balance&lt;br /&gt;    account="assets.current.petty-cash"&gt;2119.16&lt;/balance&gt;&lt;br /&gt;  &lt;balance&lt;br /&gt;    account="liabilities.current.accounts-payable"&gt;89376.78&lt;/balance&gt;&lt;br /&gt;  &lt;balance&lt;br /&gt;    account="income.professional-services"&gt;2033945.39&lt;/balance&gt;&lt;br /&gt;&lt;/balance-info&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In a non-XML system, the same information could require many request/response exchanges, and the extra latency would create major slowdowns for an application. More advanced distributed-computing protocols have mechanisms for batching information on the fly (called marshaling), but they are complex to implement and have proved less than impressive in the field. 
Perhaps XML's simple approach will turn out to be more robust and effective.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-1173555275447406299?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/1173555275447406299/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/network-resources.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/1173555275447406299'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/1173555275447406299'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/network-resources.html' title='Network Resources'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-2954753572385983587</id><published>2009-06-16T18:23:00.000-07:00</published><updated>2009-06-16T18:24:12.257-07:00</updated><title type='text'>Advantages of XML Networking</title><content type='html'>Advantages of XML Networking&lt;br /&gt;XML and HTTP are a great combination: XML brings full internationalization, transparency, and extensibility; HTTP brings compatibility with the Web infrastructure and a high level of hardware and software support. 
Combining XML and HTTP makes it possible to have stateless, lightweight, decentralized, and distributed computing, the same way that HTML and HTTP made it possible to build a stateless, lightweight, decentralized, and distributed document base.&lt;br /&gt;&lt;br /&gt;5.1.1. Internationalization&lt;br /&gt;Many of the basic Internet protocols date back to the days when the Internet and its predecessor were a phenomenon primarily of the English-speaking parts of North America. International support for non-English characters and different writing directions is often either unavailable or tacked on.&lt;br /&gt;&lt;br /&gt;XML, on the other hand, was designed from the ground up for full internationalization (I18N). Because XML not only allows but also requires full Unicode support [UNICODE], an XML document can appear in any combination of languages. For example, a single document might have an Urdu element name, a Japanese attribute name, and an Arabic attribute value.&lt;br /&gt;&lt;br /&gt;5.1.2. Transparency&lt;br /&gt;The advantage of transparency applies as much to XML networking as to other areas of XML use. Because XML is plaintext, it is easy to test and debug applications that use it. This advantage becomes important for networking because bits on a wire can be particularly difficult to analyze. For a detailed discussion, see Section 7.1.2.&lt;br /&gt;&lt;br /&gt;5.1.3. Extensibility&lt;br /&gt;Extensibility has long been an important part of Internet and pre-Internet networking protocols and formats. 
For example, implementers often add new features by using a distinctive reserved prefix, such as "x-".[3] Typically, applications that do not understand the extensions simply ignore them.&lt;br /&gt;&lt;br /&gt;[3] This usage will be familiar to many readers from the MIME (Multipurpose Internet Mail Extensions) types, such as "video/x-msdownload," and e-mail headers [RFC 822], such as "X-Mailer."&lt;br /&gt;&lt;br /&gt;This kind of extensibility is much simpler with high-level text-based protocols, such as HyperText Transfer Protocol [HTTP], than with low-level binary protocols using fixed-length fields, such as the Transmission Control Protocol [TCP] or the Internet Protocol [IP]. Even there, though, the protocols are designed to allow a certain amount of extensibility: Both IP and TCP have room for new option types to be defined in the future.&lt;br /&gt;&lt;br /&gt;The Internet culture of easy extensibility can be traced back in large part to Jon Postel's 1981 Robustness Principle, included in RFC 793 [TCP], Section 2.10: "TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others." People quickly extended this principle to other areas of Internet work and often emphasized the second part, "be liberal in what you accept from others," over the first. In the 1990s, the idea of extensibility jumped from Internet protocols to Web formats: The Hypertext Markup Language (HTML) required Web browsers to ignore any markup they did not understand rather than only specially flagged markup. 
This approach led to major headaches for Web designers and became ammunition for both sides during the browser wars of the 1990s, but it also allowed experimentation and innovation, moving HTML from a simple specification for sharing research papers in 1990 to the rich visual medium that caught the world's attention and enabled global online communication by 1995.&lt;br /&gt;&lt;br /&gt;Ironically, despite the fact that the X in XML stands for extensible, XML itself (Section 1.2) deliberately violates Postel's robustness principle in its definition of the term fatal error:&lt;br /&gt;&lt;br /&gt;After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document's logical structure to the application in the normal way).&lt;br /&gt;&lt;br /&gt;In other words, an XML parser is not allowed to try to recover from, say, an omitted end tag and continue normally: It has to be conservative rather than liberal in what it accepts. In fact, XML is extensible, but its extensibility comes at a higher level.&lt;br /&gt;&lt;br /&gt;The design of XML makes it easy to add new element and attribute types to any format, as in HTML, but the XML Namespaces specification [NAMESPACES] goes further by making it possible to avoid naming collisions between extensions from different sources. The ability to add new fields and complex structures to protocols without breaking existing software is one of XML networking's greatest strengths, even if it does come to Postel's principle by an indirect route.&lt;br /&gt;&lt;br /&gt;5.1.4. 
Compatibility&lt;br /&gt;XML networking provides compatibility in two ways.&lt;br /&gt;&lt;br /&gt;XML itself ensures that information is compatible with any application, operating system, or hardware: There are no byte-order issues, line-end problems, or any of the other small snags that can create big bugs and portability headaches.&lt;br /&gt;&lt;br /&gt;HTTP, the most popular transport in XML networking, ensures that XML-based network protocols are compatible with existing Web hardware and software, including firewalls, which typically allow HTTP traffic through with no extra configuration required.&lt;br /&gt;&lt;br /&gt;Only the end points (the computers sending and receiving XML networking messages) require any special software. Everything else (firewalls, routers, switches, caches, multiplexers, and all the other bells and whistles of the Internet) will simply work. (Note that many people consider the firewall compatibility to be a security flaw, as discussed in Section 5.2.2.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-2954753572385983587?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/2954753572385983587/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-networking.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/2954753572385983587'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/2954753572385983587'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-networking.html' title='Advantages of XML Networking'/><author><name>saeful 
uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-4421122270765476469</id><published>2009-06-15T23:36:00.000-07:00</published><updated>2009-06-15T23:39:17.600-07:00</updated><title type='text'>Common Data Styles</title><content type='html'>Common Data Styles&lt;br /&gt;What does XML data look like? Three popular ways of modeling machine-readable information are&lt;br /&gt;&lt;br /&gt;The tabular style, familiar from spreadsheets&lt;br /&gt;&lt;br /&gt;The graph style, familiar from relational databases and the World Wide Web&lt;br /&gt;&lt;br /&gt;The hierarchical style, familiar from computer file systems&lt;br /&gt;&lt;br /&gt;XML can use any of the three styles, but it is optimized for the hierarchical (or tree) style. Database specialists are particularly fond of the graph style, as it simplifies database import and export and is most suited for fully normalized data. Developers of desktop applications often prefer the tabular style, especially for initialization files, as it is easy to set up; some use simple lists, which are equivalent to a single-column table. This section looks at the strengths and weaknesses of all three styles.&lt;br /&gt;&lt;br /&gt;4.3.1. The Tabular Style&lt;br /&gt;At their best, tables are a space-efficient way of representing structured information for machines or for humans. A column in a table represents a labeled field (one piece of information about a thing), whereas each row represents all the fields for the same thing, as in Table 4-1.&lt;br /&gt;&lt;br /&gt;Table 4-1. 
Simple Data Table&lt;br /&gt;Employee | ID | Title | Unit | Specialization | Years&lt;br /&gt;Janet Mulville | e000234 | Senior Consultant | Database | Data modeling | 9&lt;br /&gt;Ahmed Said | e000345 | Project Manager | Systems | System integration | 11&lt;br /&gt;Julie Fujikawa | e009122 | Intern | Systems | System integration | 1&lt;br /&gt;&lt;br /&gt;The arrival of the consumer spreadsheet with VisiCalc in 1979 gave end users their first chance to work with structured information directly, and its tabular format proved easy to understand and work with. Twenty-five years later, the spreadsheet is still at the center of many business applications; in fact, much business software is nothing more than customizations built on top of Microsoft's Excel spreadsheet.&lt;br /&gt;&lt;br /&gt;The tabular style is especially effective when every data object has roughly the same kinds and quantities of information. But this style quickly becomes awkward when different objects have different kinds of information or when information can repeat itself. In Table 4-1, it could be that the more senior employees have more than one specialization; for example, Janet Mulville might also have 6 years of experience with C++ programming and 3 years of experience with application server design. How would that fit into this table? The initial solution is to start repeating the Specialization and Years columns, as in Table 4-2.&lt;br /&gt;&lt;br /&gt;Table 4-2. 
Messy Data Table&lt;br /&gt;Employee | ID | Title | Unit | Spec-1 | Years-1 | Spec-2 | Years-2 | Spec-3 | Years-3&lt;br /&gt;Janet Mulville | e000234 | Senior Consultant | Database | Data modeling | 9 | C++ | 6 | Appl. servers | 3&lt;br /&gt;Ahmed Said | e000345 | Project Manager | Systems | System integration | 11 | | | |&lt;br /&gt;Julie Fujikawa | e009122 | Intern | Systems | System integration | 1 | | | |&lt;br /&gt;&lt;br /&gt;Because the structure is a table, Ahmed Said and Julie Fujikawa have blank fields for additional specializations that they do not have. If a new employee arrives with five specializations, the table will add columns for all the rows, so again, all other employees will have more unnecessary blank fields. When Julie is temporarily assigned to the Database unit for 50 percent of her time, an additional Unit column will be needed as well, and so on, until it becomes extremely difficult for a person or a machine to make much sense of the information.&lt;br /&gt;&lt;br /&gt;XML has techniques to make tabular information a little more readable and efficient, however. As a starting point, consider Listing 4-1, which is a direct rendition into XML of the information in Table 4-2.&lt;br /&gt;&lt;br /&gt;Listing 4-1. 
Raw Table in XML&lt;br /&gt;&lt;table&gt;&lt;br /&gt;&lt;title&gt;Employee Information&lt;/title&gt;&lt;br /&gt;&lt;tgroup cols="10"&gt;&lt;br /&gt;&lt;br /&gt;&lt;thead&gt;&lt;br /&gt;&lt;br /&gt;&lt;row&gt;&lt;br /&gt;&lt;entry&gt;Employee&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;ID&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Title&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Unit&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Spec-1&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Years-1&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Spec-2&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Years-2&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Spec-3&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Years-3&lt;/entry&gt;&lt;br /&gt;&lt;/row&gt;&lt;br /&gt;&lt;br /&gt;&lt;/thead&gt;&lt;br /&gt;&lt;br /&gt;&lt;tbody&gt;&lt;br /&gt;&lt;br /&gt;&lt;row&gt;&lt;br /&gt;&lt;entry&gt;Janet Mulville&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;e000234&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Senior Consultant&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Database&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Data modeling&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;9&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;C++&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;6&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Appl. 
servers&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;3&lt;/entry&gt;&lt;br /&gt;&lt;/row&gt;&lt;br /&gt;&lt;row&gt;&lt;br /&gt;&lt;entry&gt;Ahmed Said&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;e000345&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Project Manager&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Systems&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;System integration&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;11&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;/row&gt;&lt;br /&gt;&lt;br /&gt;&lt;row&gt;&lt;br /&gt;&lt;entry&gt;Julie Fujikawa&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;e009122&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Intern&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;Systems&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;System integration&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;1&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;entry&gt;&lt;/entry&gt;&lt;br /&gt;&lt;/row&gt;&lt;br /&gt;&lt;br /&gt;&lt;/tbody&gt;&lt;br /&gt;&lt;/tgroup&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;XML can improve on that representation in several ways while still staying in the spirit of the tabular approach. First, because XML already labels every element, a header row labeling the columns is not needed; instead, the column labels can appear as element names. Then, once each entry is labeled, the blank ones need not be included.[2] The result is the much more readable and space-efficient XML table in Listing 4-2, although it is still far from ideal XML markup.&lt;br /&gt;&lt;br /&gt;[2] In fact, the blank entries in Listing 4-1 could have been left out anyway, as they are all trailing; however, any nonblank entries at the ends of the rows would have had to be included.&lt;br /&gt;&lt;br /&gt;Listing 4-2. 
Readable Table in XML&lt;br /&gt;&lt;employees&gt;&lt;br /&gt;&lt;br /&gt;  &lt;employee&gt;&lt;br /&gt;    &lt;name&gt;Janet Mulville&lt;/name&gt;&lt;br /&gt;    &lt;id&gt;e000234&lt;/id&gt;&lt;br /&gt;    &lt;title&gt;Senior Consultant&lt;/title&gt;&lt;br /&gt;    &lt;unit&gt;Database&lt;/unit&gt;&lt;br /&gt;    &lt;spec-1&gt;Data modeling&lt;/spec-1&gt;&lt;br /&gt;    &lt;years-1&gt;9&lt;/years-1&gt;&lt;br /&gt;    &lt;spec-2&gt;C++&lt;/spec-2&gt;&lt;br /&gt;    &lt;years-2&gt;6&lt;/years-2&gt;&lt;br /&gt;    &lt;spec-3&gt;Appl. servers&lt;/spec-3&gt;&lt;br /&gt;    &lt;years-3&gt;3&lt;/years-3&gt;&lt;br /&gt;  &lt;/employee&gt;&lt;br /&gt;&lt;br /&gt;  &lt;employee&gt;&lt;br /&gt;    &lt;name&gt;Ahmed Said&lt;/name&gt;&lt;br /&gt;    &lt;id&gt;e000345&lt;/id&gt;&lt;br /&gt;    &lt;title&gt;Project Manager&lt;/title&gt;&lt;br /&gt;    &lt;unit&gt;Systems&lt;/unit&gt;&lt;br /&gt;    &lt;spec-1&gt;System integration&lt;/spec-1&gt;&lt;br /&gt;    &lt;years-1&gt;11&lt;/years-1&gt;&lt;br /&gt;  &lt;/employee&gt;&lt;br /&gt;&lt;br /&gt;  &lt;employee&gt;&lt;br /&gt;    &lt;name&gt;Julie Fujikawa&lt;/name&gt;&lt;br /&gt;    &lt;id&gt;e009122&lt;/id&gt;&lt;br /&gt;    &lt;title&gt;Intern&lt;/title&gt;&lt;br /&gt;    &lt;unit&gt;Systems&lt;/unit&gt;&lt;br /&gt;    &lt;spec-1&gt;System integration&lt;/spec-1&gt;&lt;br /&gt;    &lt;years-1&gt;1&lt;/years-1&gt;&lt;br /&gt;  &lt;/employee&gt;&lt;br /&gt;&lt;br /&gt;&lt;/employees&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/4421122270765476469/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' 
href='http://learn-xml-script.blogspot.com/2009/06/common-data-styles.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4421122270765476469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4421122270765476469'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/common-data-styles.html' title='Common Data Styles'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-2416234885650514390</id><published>2009-06-15T23:34:00.002-07:00</published><updated>2009-06-15T23:35:05.303-07:00</updated><title type='text'>The Disadvantages of XML Data</title><content type='html'>The Disadvantages of XML Data&lt;br /&gt;Besides XML's advantages for data interchange, there are nonetheless some disadvantages. The first thing new users notice is that you cannot simply load arbitrary XML into an application or database the way that you can load a spreadsheet. That level of interoperability requires higher-level specifications, and those have been slow to gain acceptance. Without those specifications, XML processing still, unfortunately, requires a lot of custom coding and a not insignificant amount of processing time.&lt;br /&gt;&lt;br /&gt;4.2.1. Interoperability&lt;br /&gt;XML alone is not sufficient for sharing data: higher-level specifications are required to define what information should appear in the XML and how it should be structured. 
It is difficult to write a general-purpose, off-the-shelf tool to import XML into a database, but people can provide off-the-shelf tools to import NewsML [NEWSML], Extensible Business Reporting Language [XBRL], or many other such formats.&lt;br /&gt;&lt;br /&gt;Higher-level XML data formats, however, are difficult and expensive to create, and there are too many of them. These problems might seem to contradict each other. In fact, people propose many higher-level XML formats but implement few of them, as discussed in Section 1.4.4: The proliferation of proposals causes confusion without providing solutions that people can build around.&lt;br /&gt;&lt;br /&gt;Because there are rarely middle layers between low-level XML and high-level data formats, most XML data specifications are stovepipes, built from low-level XML on up. Although most of them share common concepts, such as identifiers, references, entities, attributes, and relationships, they all implement them differently, destroying the chance to use shared code, and shared user knowledge, to lower costs. As a result, there are few useful off-the-shelf components for dealing with data and no significant economies of scale or network effects.&lt;br /&gt;&lt;br /&gt;XML networking specifications (see Chapter 5) are easing this problem somewhat by providing general-purpose data formats for the networking payloads. These data formats may not be optimal for exporting large amounts of information from databases or publishing data on the Web, but they will at least provide a starting point.&lt;br /&gt;&lt;br /&gt;4.2.2. Abstraction&lt;br /&gt;Abstraction can also be a problem. If one organization exports its data to XML, a second organization probably will be unable to read that data into its system immediately but will have to write transformation and import scripts to change the XML to a format that its system can use. 
That is the major problem with XML for data sharing: Its abstraction gives it many advantages, but it also means more work. You cannot simply save or load any arbitrary XML data the way that you can save or load a spreadsheet file.&lt;br /&gt;&lt;br /&gt;4.2.3. Resources&lt;br /&gt;XML has a reputation, partly deserved and partly undeserved, as a resource hog. XML is plaintext, and XML data is usually in an abstract, logical format, so reading XML into an application often requires two steps: parsing the XML from plaintext and transforming the parsed XML so that the application, such as a database, can use it.&lt;br /&gt;&lt;br /&gt;Parsing and transformation take time, as does generating XML for output. Chapter 8 discusses how to avoid some of the worst inefficiencies in XML processing, but in an extremely high-speed environment, the time and processor use of XML might still be unacceptable.</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/2416234885650514390/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/disadvantages-of-xml-data.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/2416234885650514390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/2416234885650514390'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/disadvantages-of-xml-data.html' title='The Disadvantages of XML Data'/><author><name>saeful 
uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-6509984928073911636</id><published>2009-06-15T23:34:00.001-07:00</published><updated>2009-06-15T23:34:39.092-07:00</updated><title type='text'>The Advantages of XML Data</title><content type='html'>The Advantages of XML Data&lt;br /&gt;In many of the other chapters in this book, the lists of the advantages and disadvantages of XML are close to even; for sharing data, however, XML is a much more obvious choice. Of course, there are disadvantages that will rule out XML for certain projects, but in most cases, XML's advantages far outweigh the disadvantages, because there really is no general, portable alternative to XML for exchanging data with complex structure.&lt;br /&gt;&lt;br /&gt;Behind the scenes, many companies are already using XML for internal data sharing, but despite its value, XML data itself is no longer a hot topic. Although the vendors continue to add support for XML in databases and other applications, the spotlight has moved on to XML networking (Chapter 5) and, especially, Web Services. However, even XML networking is still, at heart, using XML for data: Nearly all the networking specifications are simply combinations of XML data with a network transport layer, such as HyperText Transfer Protocol [HTTP].&lt;br /&gt;&lt;br /&gt;XML brings clear and persuasive advantages for data exchange in five areas:&lt;br /&gt;&lt;br /&gt;Platform and storage independence&lt;br /&gt;&lt;br /&gt;Self-documentation&lt;br /&gt;&lt;br /&gt;Reusability&lt;br /&gt;&lt;br /&gt;Verification&lt;br /&gt;&lt;br /&gt;Archiving and auditing&lt;br /&gt;&lt;br /&gt;4.1.1. 
Platform and Storage Independence&lt;br /&gt;In the past, save files for PC-based databases, such as FoxPro, sometimes became a de facto interchange format, just as Microsoft Excel spreadsheet files are a de facto standard for exchanging simple tabular data. Spreadsheets and other data applications also support some more portable formats, such as comma-delimited text or the sequels to the old Data Interchange Format (DIF). None of the portable alternatives, however, is particularly good at exchanging complex structured data.&lt;br /&gt;&lt;br /&gt;XML can model arbitrarily complex data structures and is not tied to a specific vendor or product. With XML and Unicode, there are no byte-order, line-end, or character-encoding problems moving from machine to machine or even from country to country or culture to culture. All major and many minor programming and scripting languages have good support for XML, as do most databases.&lt;br /&gt;&lt;br /&gt;XML's independence, however, goes far beyond simply avoiding platform incompatibilities. As discussed in Section 4.3, XML can represent data in its logical format rather than its physical storage format. Whereas a relational database might divide information into a series of separate, cross-linked tables, using attributes for order, XML can pull everything into a single, ordered, logical document.&lt;br /&gt;&lt;br /&gt;XML's abstraction encourages loosely coupled data interchange. It does not matter whether the data provider and the data consumer arrange their databases differently, with different tables linked in different ways, because that physical level is not fossilized in XML as it would be in a series of comma-delimited table dumps. This fact, perhaps, matters more than all the other portability issues.&lt;br /&gt;&lt;br /&gt;4.1.2. Self-Documentation&lt;br /&gt;For humans, although not for computers, XML documents are at least partly self-describing, as discussed at length in Section 7.1.2. 
It might seem inconsistent, at first, to suggest that human readability is an advantage, when this chapter has been emphasizing that XML data is intended for machines. However, in practice, people still have to implement the programs or scripts to work with XML. Because XML documents normally have human-readable names for elements and attributes, developers will find it easier to produce, process, transform, and, most important, debug XML input and output. Human-readable names also make it possible to automate the creation of forms and other human interfaces for data entry.&lt;br /&gt;&lt;br /&gt;4.1.3. Reusability&lt;br /&gt;This advantage is related to storage independence (see Section 4.1.1). Well-designed XML data is almost always loosely coupled; it does not slavishly follow the structure of a spreadsheet, a set of database tables, or any other source but rather reorders the information into an abstract, logical presentation. Therefore, XML data is not limited to a single purpose. Data exported from a database can be read into another database, of course, but can also be edited, displayed, analyzed for statistical patterns, transformed, searched and queried, or even published to a Web page or print. The original data provider does not have to be able to anticipate all the uses people might have for XML data. Once the information is in XML format, XML tools and utilities will simply be able to work with it any way the recipient desires.&lt;br /&gt;&lt;br /&gt;4.1.4. Verification&lt;br /&gt;Developers often spend a lot of their time developing and testing code for data validation. With XML data, it is possible to do basic structural validation and, in some cases, data-type validation (see Section 4.5.3), using schema languages with off-the-shelf software tools and libraries. 
Writing an XML schema or DTD is a specialized skill, but it can sometimes be much easier and more robust than writing custom code to perform the same structural checks.&lt;br /&gt;&lt;br /&gt;XML data allows verification in other ways as well. Because XML is plaintext, people can easily examine it for problems that automated tests might have missed, the same way that people can examine text-based Internet protocols, such as HTTP, SMTP, or Post Office Protocol v. 3 (POP3), manually for verification and debugging. And, like any text file, XML is relatively simple to sign digitally.&lt;br /&gt;&lt;br /&gt;4.1.5. Archiving and Auditing&lt;br /&gt;XML data consists of files, not bits on the wire (like a networking protocol) or application-specific binary data (like a spreadsheet file). Because XML is not tied to an application that may no longer exist in a few years, XML data will still be useful in an archive 50 or 100 years from now; because it can be saved to disk, every transaction can be archived and audited, something that is difficult with protocols.&lt;br /&gt;&lt;br /&gt;These points are especially important for companies or other organizations that face complex legal and reporting requirements. A saved XML transaction, like a saved e-mail, provides an ongoing record of what was happening in a company at a specific time. 
In case of a security breach, people can go back and review not only logs but also past transactions to track down the problem.</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/6509984928073911636/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-data.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/6509984928073911636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/6509984928073911636'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/advantages-of-xml-data.html' title='The Advantages of XML Data'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-3520196848165302847</id><published>2009-06-15T23:26:00.003-07:00</published><updated>2009-06-15T23:26:58.310-07:00</updated><title type='text'>XML Content Management</title><content type='html'>Content Management&lt;br /&gt;This chapter has already emphasized that the biggest real benefits for XML publishing come with large, highly structured technical publications, such as regulatory documents, dictionaries, legislation, or maintenance manuals for complex equipment, such as aircraft or 
weapons systems. Such documents typically have many authors and a formal and often complex editorial review process. These living documents are under continual revision for their entire lives. With a dictionary, for example, lexicographers write individual entries, which will then pass through several stages of editing and approval before being added to the dictionary proper. As corrections and new slips (usage examples) come in, lexicographers will revise an entry and start the entire review process again. On some major projects, the lifespan of the documents is longer than that of the authors: For the Oxford English Dictionary, for example, this process has continued nonstop for almost a century and a half. There's never any concept of a finished book; each edition is only a snapshot of a never-ending work in process. Exactly the same process applies to the aircraft maintenance manual for a large airliner, for example, except that the snapshots are published every few months rather than every few decades.&lt;br /&gt;&lt;br /&gt;Obviously, this kind of publishing was possible before computers, much less XML, but it was labor- and paper-file-intensive: precisely the kinds of operations that can benefit from some kind of content-management system. Many computer programmers are already familiar with source-code-management systems, such as CVS or Visual SourceSafe. Content-management systems for documentation are very similar: They allow authors to check objects, such as documents or pictures, into or out of a central repository, which tracks revisions and often also allows searching, indexing, and even final document assembly. 
A documentation content-management system may also have a workflow component attached to it, so that it can both track the status of each object through the editorial process and manage the process by sending files to the people who need to approve them.&lt;br /&gt;&lt;br /&gt;None of these features is unique to XML, but they are especially likely to be required in a large, multiauthor XML documentation project. Some systems go further: Instead of managing each XML document as a single object, the same as a picture or word-processing file, they allow users to check out part of an XML document (say, the third task in the second chapter) while other users work on other parts of the same XML document. Typically, the system parses the XML document and converts it into a series of entries in a specialized database, then reconstructs it as XML when needed.&lt;br /&gt;&lt;br /&gt;For a large project with multiple authors, a content-management system is often a requirement: The only question is whether to use a traditional system or a special XML-aware one. It is best to approach this problem backward and start with the non-XML solution. Assume that a company is creating a large amount of technical documentation in XML, using a team of authors and editors. At any time, authors will have lists of items assigned to them for writing or revision, and editors will have lists of items assigned to them for editing and approval. The technical documentation itself consists mostly of independent tasks. Listing 3-9 shows a fragment of a simplified DTD.&lt;br /&gt;&lt;br /&gt;Listing 3-9. 
Top Level of a DTD&lt;br /&gt;&lt;!ELEMENT doc (intro, chapter+)&gt;&lt;br /&gt;&lt;!ELEMENT chapter (title, chapterintro, task+)&gt;&lt;br /&gt;&lt;!ELEMENT task (title, taskintro, partlist, toollist, step+)&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Most of the authoring items moving through the workflow chain are tasks with their associated graphics; the editors will assign tasks to individual authors, who will then pass the tasks back up to the editors for approval. When a task requires revision, the cycle repeats itself.&lt;br /&gt;&lt;br /&gt;Would an XML-aware CMS bring much benefit to a project like this? No. Authors do not need to check out a single step or title or an entire chapter, only tasks. The CMS can handle each task as a separate XML document without needing to know anything about the task's internal markup structure.&lt;br /&gt;&lt;br /&gt;To require an XML-aware content-management system, there would have to be no standard unit for authoring, editing, or workflow; the author would need to be able to check out and lock anything from a single list item to the entire document. The author of this book has not yet seen such a requirement in the real world. Every big multiauthor project has standard units of work, whether they are tasks, dictionary entries, or newspaper articles.&lt;br /&gt;&lt;br /&gt;Content-management systems also typically offer searching, indexing, and packaging. A full-text search is sometimes sufficient, but because XML markup can provide much more detailed content, a good case can sometimes be made for making the search XML-aware. 
For more on this point, see Chapter 6.</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/3520196848165302847/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-content-management.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3520196848165302847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/3520196848165302847'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-content-management.html' title='XML Content Management'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-4194192709963837961</id><published>2009-06-15T23:26:00.001-07:00</published><updated>2009-06-15T23:26:32.344-07:00</updated><title type='text'>Script Client-Side XML</title><content type='html'>Client-Side XML&lt;br /&gt;Single-source publishing allows people to write documents in XML and then convert them to multiple formats, such as PDF or PostScript for print and HTML for the Web. However, the World Wide Web Consortium [W3C] did not design XML simply to be a source for other formats; many people intended XML itself to be a delivery format for the Web, replacing HTML. 
Beyond XML is a family of W3C specifications, such as XML Linking Language [XLINK], XML Pointer Language [XPOINTER], cascading style sheets [CSS], and XSL Transformations [XSLT], all designed to help browsers display XML documents directly.&lt;br /&gt;&lt;br /&gt;In many ways, this work has succeeded. Both Microsoft's Internet Explorer and the variants of Mozilla contain extensive XML support: They can display a raw XML document without a style sheet, or they can use style sheets to make an XML document in a browser window indistinguishable from HTML for the casual user. Behind the scenes, however, client-side scripts can perform sophisticated tricks based on the XML source, providing a richer browsing experience for users. Although the XML support in the two browsers has incompatibilities, and some specifications, such as XLink, are barely supported, if at all, there is also a surprising amount of compatibility, especially in comparison with the bitter browser wars of the late 1990s.&lt;br /&gt;&lt;br /&gt;Both the standards and the software are in place for delivering XML directly to users without going through a middle format, such as HTML or PDF, and most users not only have the software installed on their system but use it daily for viewing HTML pages. Nevertheless, XML on the Web is almost nonexistent, as HTML is good enough for almost everything that anyone wants to do on the Web, and the extra benefits of delivering XML directly do not make up for the costs of new training, new authoring tools, and incompatibility with the minority of users who still have old, pre-XML browsers installed.&lt;br /&gt;&lt;br /&gt;Although client-side XML has failed to take off the way that HTML did in the mid-1990s, the tool availability is still a benefit for XML documentation projects. Instead of purchasing and installing special XML viewers, authors can preview formatted versions of XML documents directly in their familiar Web browsers. 
On intranets and other areas where browser versions are more uniform, client-side XML viewing is a useful ability, even if it has failed to become a social phenomenon on the Web.&lt;br /&gt;&lt;br /&gt;3.4.2. Reuse&lt;br /&gt;Documentation always fits awkwardly into technology projects. Technical writers complain about programmers who make changes at the last minute, forcing the writers to redo most of the documentation; programmers, in turn, complain about writers who seem unable to write most of their documentation until just before a release deadline.&lt;br /&gt;&lt;br /&gt;In fact, programmers have long created and used systems to help them write and maintain documentation aimed at other programmers. In the 1980s, Donald Knuth promoted literate programming, whereby the source code for a program, such as TeX, was embedded inside its own documentation and extracted automatically for compilation; anyone editing the documentation would edit the code at the same time, and vice versa, ensuring that the documentation and the code remained synchronized. In the mid-1990s, as the Java programming language increased in popularity, the opposite approach became common: Programmers embedded documentation in the source code as specially formatted comments and extracted it automatically for publication. (Earlier programming languages, such as Emacs LISP, had already used this approach on a smaller scale.) The JavaDoc system proved extremely effective for generating programmer's API documentation and has been much imitated for other programming languages.&lt;br /&gt;&lt;br /&gt;JavaDoc and literate programming work for programmer's documentation because the documentation nearly always follows the structure of the source code. 
When the programmer deletes a class or a method, the documentation disappears with it; when adding or modifying a class, the programmer simply needs to modify the documentation that is right there on the screen with it.&lt;br /&gt;&lt;br /&gt;Unfortunately, things are not so easy for most technical writers. Normally, their documentation is designed for users rather than for programmers, so it is based on tasks or concepts rather than source-code structure. As a result, there is no natural connection between the changes a programmer makes to the source and the changes a technical writer has to make to the documentation. A single user task, such as creating a new account, might touch code from dozens of source code modules managed by different programmers; a single source code module might affect dozens of different task descriptions. Even a trivially small change to the source code can have a disproportionately large impact on the documentation.&lt;br /&gt;&lt;br /&gt;Consider a simple code module that displays a dialog box containing a message and two buttons labeled Accept and Cancel. The quality-assurance specialist sends a note to the programmer, saying that, for consistency, the first button should be labeled OK rather than Accept; the programmer takes 5 minutes to change one line of code, test, and commit, and the documentation specialist then announces that it will take 2 weeks to revise the tutorial and manual. What happened?&lt;br /&gt;&lt;br /&gt;First, dozens of different parts of the code might invoke that dialog box, and each may be used by dozens of different tasks. Suppose that a manual has text like the following:&lt;br /&gt;&lt;br /&gt;Select crop from the File drop-down menu.&lt;br /&gt;&lt;br /&gt;A confirmation dialog will appear. 
Select Accept to continue.&lt;br /&gt;&lt;br /&gt;For each instance, the writer will have to change Accept to OK, and then the writer (or editor or quality-assurance specialist) will have to recheck all the documentation against the software. Even worse, the manual may contain screenshots of the dialog in different contexts, all of which will have to be recaptured and recropped. If a small change like this can cause so much trouble, it is not difficult to understand how a more fundamental change could throw technical documentation into chaos.&lt;br /&gt;&lt;br /&gt;This kind of problem was common in computer programming as well until the structured programming movement, beginning in the 1970s, and the object-oriented programming movement, beginning in the 1980s, helped programmers get better at writing reusable code. Programmers have learned to encapsulate reusable code in a single place, such as a function or an object or even a library, rather than duplicating the same code over and over again in their programs; database designers do the same thing when they normalize their database. In fact, document writers were able to do this for centuries before computers existed, simply by embedding a reference in a text, such as "(see Job 8:8-10)."&lt;br /&gt;&lt;br /&gt;Modern technical documents could use the same include-by-reference approach as modern computer programs, in which case their documents might look like this:&lt;br /&gt;&lt;br /&gt;To create a new document, take the following steps:&lt;br /&gt;&lt;br /&gt;(See p.145)&lt;br /&gt;&lt;br /&gt;(See p.251)&lt;br /&gt;&lt;br /&gt;(See p.18)&lt;br /&gt;&lt;br /&gt;(See p.44)&lt;br /&gt;&lt;br /&gt;(See p.182)&lt;br /&gt;&lt;br /&gt;It would be easy to write documents this way, especially with word processors that can track references and automatically fill in page numbers, but it would not be easy to read such a document. 
Following cross-references and keeping track of previous locations are a lot more difficult for humans than for a computer, and people reading documentation like this will quickly get frustrated or simply lost. Documents, at least in the final form that readers see, are necessarily highly redundant, or what a database specialist would call denormalized: They have to repeat information.&lt;br /&gt;&lt;br /&gt;This is where another of XML's big promises comes in. XML, like its predecessor, SGML, is designed to allow writers to reuse text the same way that programmers reuse code: A single change in the XML source document can automatically propagate itself throughout the output formatted document at the other end of the XML publishing system.&lt;br /&gt;&lt;br /&gt;XML has several mechanisms for allowing reusable text, among which the simplest is the internal text entity. In the internal DTD subset, an author includes a declaration like the following:&lt;br /&gt;&lt;br /&gt;&lt;!ENTITY accept-button-name "Accept"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Then, in the main text, the author enters a reference to that entity:&lt;br /&gt;&lt;br /&gt;&lt;step&gt;A confirmation dialog will appear. Select&lt;br /&gt;&amp;accept-button-name; to continue.&lt;/step&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Although this approach is dead simple for a single author creating XML in a text editor, it can cause problems in a large, multiuser environment, in which regular authors should not be able to modify the DTD, where the entity declarations appear. In those cases, system designers come up with more elaborate methods for reusable text. For example, in a maintenance manual, the following caution might appear many times:&lt;br /&gt;&lt;br /&gt;Caution: Use calibrated torque wrench. 
Overtorquing may cause the bolt to shear.&lt;br /&gt;&lt;br /&gt;In a big project, an author might create this caution once, possibly as an independent XML document like the following:&lt;br /&gt;&lt;br /&gt;&lt;caution&gt;Use calibrated torque wrench. Overtorquing may&lt;br /&gt;cause the bolt to shear.&lt;/caution&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;An author who needs to include this caution will include it by reference, often through a custom-designed dialog box added to the editing system, as follows:&lt;br /&gt;&lt;br /&gt;&lt;caution-inclusion ident="cautions/overtorque001"/&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The same technique should work for shared steps in tasks, boilerplate legal text, and anything else that gets repeated throughout a document.&lt;br /&gt;&lt;br /&gt;This approach looks like the ideal bridge across the discontinuity between the way coders code and the way writers write, but experience in field use has been disappointing. First, in technical writing, repeated text tends to be similar but not identical: A part name in the middle of a caution or the transition text at the beginning of a paragraph will change, depending on its context. Even when the text does not change, managing and locating small, reused chunks of text is mind-numbing work. Most authors would probably prefer simply to retype when necessary, rather than spend several minutes each time searching through a repository of reusable steps to see whether one is appropriate. Even if authors were willing to use such a system, the savings would not be as great as forecast. 
For example, if the same step were used in 200 places and the step were changed, authors or editors would still have to check all 200 places in the text to make sure that the change was appropriate in context, and checking the 200 places will usually take as long as retyping them.&lt;br /&gt;&lt;br /&gt;It may be that new tools and new ways of running projects make reusable information more common in the future; for now, however, XML is not a universal solution to the documentation discontinuity.&lt;br /&gt;&lt;br /&gt;3.4.3. Idioms&lt;br /&gt;Although reusable text is a bit of a chimera, single-source publishing is a real benefit that can come from an XML publishing system. However, single-source publishing also has its limitations, and it is important to understand them before starting on a major XML publishing project.&lt;br /&gt;&lt;br /&gt;Single-source publishing is an exciting idea that also happens to be easy to explain to nonspecialists. You create a single XML source document, then use scripts or templates to transform it automatically into different publication media, such as print, Braille, an automated voice telephone system, or the Web.&lt;br /&gt;&lt;br /&gt;Much of the time, people want to publish from XML source documents only to print and the Web. For publishing to print, the typical data formats are TeX, PDF, RTF, PostScript, and MIF; for publishing to the Web, the typical data formats are HTML and XHTML but sometimes also PDF or Flash. XML specialists learn quickly that they need to write separate transformation style sheets for print and the Web, even if the core content is identical. The obvious problem is that they are transforming to different primitives: HTML deals with abstractions, such as paragraphs and lists, whereas print formats tend to deal with concrete layout elements, such as blocks, fonts, and spacing. This difference is not simply a design problem that could be fixed. 
Print formats are fundamentally page-based, whereas HTML is fundamentally screen-based. Each has pros and cons.&lt;br /&gt;&lt;br /&gt;Page-based formats allow a designer to take advantage of all the available space, by including multiple texts and graphics in different parts of a single page, with fine control over the placement and size of each item. However, page-based formats are also brittle: A document needs to be optimized for a specific size and aspect and will not move easily across different display devices. (Try viewing a U.S. letter- or A4-sized PDF document on a handheld computer.)&lt;br /&gt;&lt;br /&gt;Screen-based formats are inherently more flexible and, when properly used, will work for many different display sizes. However, that flexibility comes at the cost of surrendering control over the finer points of layout. (Try placing sidebars and graphics precisely in HTML.)&lt;br /&gt;&lt;br /&gt;The real problem, then, is that a Web page is not simply a print document online but rather a fundamentally different kind of thing. That is why Web pages need their own style sheets. That's not such a big problem, however: Writing two, or even ten, style sheets accounts for very little overhead when you will be using them hundreds or thousands of times to transform XML for publication.&lt;br /&gt;&lt;br /&gt;Single-source publishing works well for both print and the Web, as long as you are publishing the right kind of thing. A technical manual for a software program, or a novel, can easily pass from a single XML source document through a couple of different style sheets to print and Web versions, all without human intervention. You do end up, usually, with Web pages that scroll a lot (say, one page for each chapter). That will not be a problem if the user has decided to read a book or manual online, but it is not what you normally expect to find in a Web page. 
Web pages are typically short, dynamic, and interactive, not long and static.&lt;br /&gt;&lt;br /&gt;There is no reason that a person cannot design an XML document type that takes dynamic content into account, so that the HTML rendition can contain animations, applets, forms, and so on, but doing so requires that you place new constraints on your XML document type in advance: You cannot publish just any document and have it look good both online and in print. This is not a medium problem but an idiom problem. You can print a Web page on paper or put a novel online, but neither fits naturally there, because Web pages and novels are fundamentally different kinds of things. Many other idioms cause trouble for single-source publishing. Consider, for example, slides, online help, and Unix man pages: Each of these follows a fundamentally different set of constraints and carries a different set of reader expectations, and it gets more and more difficult to generate all of them from a single XML source document. In the end, XML cannot deliver on all its promises for single-source publishing; it can allow you to publish to multiple formats from a single master document, but publishing to multiple idioms is much more difficult.&lt;br /&gt;&lt;br /&gt;Is universal single-source publishing a hopeless case, then? Not quite. It turns out that, although all these idioms have drastically different top-level structures, they share much in common at the lower levels. Many of them, for example, have paragraphs of text with special-purpose phrases contained in them, and many use lists and tables. 
For the lower-level content, XML truly can deliver on at least some of the promise of single-source publishing: the same table might appear on a slide, a man page, a Web page, and in a printed book, all in significantly different contexts, appropriate to each idiom.&lt;br /&gt;&lt;br /&gt;Furthermore, people are often willing to accept deficiencies for the sake of saving money or having easier access to information. For example, a state legislature might accept simple, unimaginative, automatically generated Web pages for its online legislation in exchange for a cost savings of $100,000 a year. The Web site for a popular magazine, on the other hand, will not likely be willing to make the same trade. Designing and implementing an XML-based publishing system is largely a matter of managing expectations. Raising false hopes about single-source publishing will lead to disappointment and hostility later on, but if both the implementers and the users understand and agree to the tradeoffs in advance, XML publishing can work and save money.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-4194192709963837961?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/4194192709963837961/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/script-client-side-xml.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4194192709963837961'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4194192709963837961'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/script-client-side-xml.html' title='Script Client-Side 
XML'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-4848719887552105543</id><published>2009-06-15T23:24:00.000-07:00</published><updated>2009-06-15T23:25:13.542-07:00</updated><title type='text'>Formatting  XML</title><content type='html'>Formatting and Production&lt;br /&gt;Once the higher-level issues are resolved and the authoring system is installed, it is time to turn to the nitty-gritty details of formatting. In many cases, there will be no problems at all; XML, together with transformation and formatting software, does a good job of handling the typical, routine tasks of formatting, particularly if the XML master document contains a single text flow continuing over several pages, such as a technical manual.&lt;br /&gt;&lt;br /&gt;Unfortunately, things are sometimes not so simple. This section examines some of the physical aspects of printed documentation that can cause problems for XML publishing. Sometimes, these problems will not surface until late in a project, when there is not enough time or money left to fix them properly; learning to anticipate them can make a big difference to an XML publishing project's chance of success.&lt;br /&gt;&lt;br /&gt;3.3.1. Change Markup&lt;br /&gt;Technical publications often include various kinds of change information to make it easy for users to find differences between versions, and encoding this kind of information in XML markup probably represents the single biggest difficulty in XML publishing. 
The final change information in a printed text can take many forms:&lt;br /&gt;&lt;br /&gt;Vertical bars in the margin beside changed text&lt;br /&gt;&lt;br /&gt;Separate textual descriptions of changes made and the reasons for them&lt;br /&gt;&lt;br /&gt;Different font combinations to show text removed and added&lt;br /&gt;&lt;br /&gt;Differently formatted section headings to show sections that contain changes&lt;br /&gt;&lt;br /&gt;Even finding the differences between two versions of the same XML document in the first place can be a problem, although more open-source and commercial software is becoming available. Some XML differencing algorithms scale badly with large documents, so it is worth load testing your intended differencing software early in any project. Assuming that you do have some mechanism (even manual identification by the author) in place for locating changes in an XML document, this section examines some of the problems with inserting the change markup into XML documents for publication.&lt;br /&gt;&lt;br /&gt;3.3.1.1 Markup Issues&lt;br /&gt;Change markup in XML documents causes publishing difficulties on several levels. Most basically, changes do not tend to fit neatly into XML markup trees. Consider the following:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are 203 authorized service depots in Southeast&lt;br /&gt;Asia.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The authorized service depots all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;A change in the company's technical-support structure could cause the content to change, as follows:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are 55 service partners in the Asia-Pacific &lt;br /&gt;region.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The service partners all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What kind of markup should a system add to this document to show where the changes are? 
The change begins in one paragraph and ends in the next one, but XML does not allow an element to start and end inside different parent elements, so it is not possible to tag the entire changed sequence as a normal XML element.&lt;br /&gt;&lt;br /&gt;The first option is to put empty tags at the start and end of the changed text:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are &lt;change-start/&gt;55 service partners in&lt;br /&gt;the Asia-Pacific region.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The service partners&lt;change-end/&gt; all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;A variation on the same theme is the use of processing instructions, to avoid contaminating the main element tree:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are &lt;?change-start?&gt;55 service partners in&lt;br /&gt;the Asia-Pacific region.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The service partners&lt;?change-end?&gt; all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Unfortunately, XML publishing tools normally apply formatting based on element boundaries, and many of those tools are not capable of recognizing a span from one empty element to another. Custom-written Perl or Python scripts or very clever and complicated XSLT templates can handle this kind of markup in many cases, but developing them will use up a disproportionately large amount of time on any project.&lt;br /&gt;&lt;br /&gt;A second option is to split up the change so that it falls into element boundaries:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] 
There are &lt;change&gt;55 service partners in&lt;br /&gt;the Asia-Pacific region&lt;/change&gt;.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;change&gt;The service partners&lt;/change&gt; all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This approach is much more practical for working with formatting tools, as they can apply normal formatting based on element context, but can cause awkward problems when additional information is attached to the change markup. Consider the following:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are &lt;change desc="Change to new service system."&gt;55&lt;br /&gt;service partners in the Asia-Pacific region&lt;/change&gt;.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;change desc="Change to new service system."&gt;The service&lt;br /&gt;partners&lt;/change&gt; all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If the publishing system is also generating a list of changes or is adding marginal notes or footnotes describing the changes, the change will show up twice. If authors add change markup by hand, splitting a long change (say, over several paragraphs or steps) will be tedious and could lead to errors.&lt;br /&gt;&lt;br /&gt;A third option is to use a single change element placed higher up in the document tree:&lt;br /&gt;&lt;br /&gt;&lt;change desc="Change to new service system"&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;[...] There are 55 service partners in the Asia-Pacific &lt;br /&gt;region.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The service partners all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;/change&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This approach has the advantage of avoiding duplicate change elements, but it can end up tagging far more text than has changed. An even coarser variation on this approach is to mark changes only on the element level, using attributes:&lt;br /&gt;&lt;br /&gt;&lt;p changed="y" desc="Change to new service system"&gt;[...] 
There are &lt;br /&gt;55 service partners in the Asia-Pacific region.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p changed="y" desc="Change to new service system"&gt;The service&lt;br /&gt;partners all provide [...]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This approach can be useful for specialized applications, such as legal texts, with individually numbered paragraphs or subparagraphs. In the general case, however, it has the disadvantages of both including too much and duplicating change information.&lt;br /&gt;&lt;br /&gt;The last solution is both the most elegant and the most brittle: Track changes outside of the document by using, for example, XPointer expressions to describe the start and end of each change:&lt;br /&gt;&lt;br /&gt;&lt;change type="update"&gt;&lt;br /&gt;  &lt;description&gt;Change to new service system&lt;/description&gt;&lt;br /&gt;  &lt;span&gt;&lt;br /&gt;    &lt;start&gt;//step[@id="foo"]/p[2]/text()/point()[position()=247]&lt;br /&gt;     &lt;/start&gt;&lt;br /&gt;    &lt;end&gt;//step[@id="foo"]/p[3]/text()/point()[position()=20]&lt;br /&gt;     &lt;/end&gt;&lt;br /&gt;  &lt;/span&gt;&lt;br /&gt;&lt;/change&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Although this approach allows tracking the change precisely, without duplication, it also requires an enormous amount of coordination between the out-of-line index and the authoring system; if they are not kept perfectly synchronized, the whole thing will fall apart.&lt;br /&gt;&lt;br /&gt;So far, this section has not mentioned the problem of marking changes in attribute values. Because attribute values cannot contain tags or processing instructions, marking changed attributes is always awkward; therefore, tracking changes externally might be the best option in this case. 
XML projects sometimes ensure that all information that needs to be marked as changed appears within elements.&lt;br /&gt;&lt;br /&gt;3.3.1.2 Custom Publishing Issues&lt;br /&gt;Although the tagging issues for change markup can be tricky, the more serious problems come with custom publishing. The change information in the final published document has to represent changes visible to the reader, not necessarily changes visible to the author.&lt;br /&gt;&lt;br /&gt;In custom publishing, documents are typically assembled from text objects that have rules governing when they should or should not appear. For example, a warning may apply only to aircraft that use a certain engine or to reactors that use a certain cooling process. If the rule for the text object changes, it may suddenly appear in one customer's document or disappear from another's. Consider this warning:&lt;br /&gt;&lt;br /&gt;&lt;warning applicability="0050-0200"&gt;Using the wrong grade of &lt;br /&gt;lubricant can cause engine failure.&lt;/warning&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The warning is applicable for serial numbers 0050-0200 of a product; a custom publishing system will include it in publications for customers owning products with those serial numbers and omit it for all other customers. 
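&lt;br /&gt;&lt;br /&gt;The include-or-omit rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular publishing system: the function names are hypothetical, and the sketch assumes that an applicability value is always a single inclusive range of serial numbers written as two numbers joined by a hyphen. The last function compares a rule before and after a revision to show whether a given customer keeps, gains, or loses the text object.&lt;br /&gt;&lt;br /&gt;

```python
# Hypothetical helpers; none of these names come from a real publishing system.
# Assumption: an applicability value is one inclusive range such as "0050-0200".


def parse_applicability(spec):
    """Split a range such as "0050-0200" into inclusive integer bounds."""
    low, high = spec.split("-")
    return int(low), int(high)


def applies_to(spec, serial):
    """Return True when the serial number falls inside the inclusive range."""
    low, high = parse_applicability(spec)
    return int(serial) in range(low, high + 1)


def classify_change(old_spec, new_spec, serial):
    """Describe how a revised applicability rule affects one customer."""
    before = applies_to(old_spec, serial)
    after = applies_to(new_spec, serial)
    if before and after:
        return "kept"      # still present; only the wording may have changed
    if after:
        return "added"     # newly visible, so the whole object is a change
    if before:
        return "deleted"   # no longer visible, so the change is a deletion
    return "absent"        # never shown to this customer


if __name__ == "__main__":
    print(applies_to("0050-0200", "0150"))
    print(applies_to("0050-0200", "0300"))
```

&lt;br /&gt;&lt;br /&gt;With the warning above, a customer with serial number 0150 falls inside the range and receives the warning, whereas a customer with serial number 0300 does not. A real system would likely need richer rules (several ranges, model codes, dates), which is part of why change reporting in custom publishing becomes expensive.&lt;br /&gt;&lt;br /&gt;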
Now, an author makes a couple of small changes to the warning:&lt;br /&gt;&lt;br /&gt;&lt;warning applicability="0100-0250"&gt;Using the wrong grade of &lt;br /&gt;lubricant can cause valve damage.&lt;/warning&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The customer with product serial number 0150 should see essentially the same warning, with the phrase engine failure changed to valve damage:&lt;br /&gt;&lt;br /&gt;&lt;warning&gt;Using the wrong grade of lubricant can cause &lt;change&gt;valve damage&lt;/change&gt;.&lt;/warning&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The customer with product serial number 0225 previously did not see the warning at all, so the whole thing requires change markup:&lt;br /&gt;&lt;br /&gt;&lt;change&gt;&lt;warning&gt;Using the wrong grade of lubricant can cause valve damage.&lt;/warning&gt;&lt;/change&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The customer with product serial number 0075 previously had the warning, but the change in effective serial numbers means that it will no longer appear in that version of the manual, so the change in this case is a deletion:&lt;br /&gt;&lt;br /&gt;&lt;change&gt;&lt;warning&gt;[Deleted]&lt;/warning&gt;&lt;/change&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Any descriptions of the changes also need to make sense from the reader's perspective. Many projects do not need to report changes with this level of accuracy, but when they do, it can end up being a major project in itself.&lt;br /&gt;&lt;br /&gt;3.3.2. Looseleaf Publishing&lt;br /&gt;Another challenging problem for any automated formatting system is page-based updates, otherwise known as looseleaf publishing. Some kinds of technical documents, such as maintenance manuals and regulatory documents, need to be updated frequently. 
A standard practice in the paper-based publishing world is to distribute the entire document once, in a binder, and then to send new or updated pages at regular intervals, perhaps every month or a few times a year, with instructions on where to add, remove, or replace pages in the current manual. The instructions, called change pages, might look like this:&lt;br /&gt;&lt;br /&gt;Remove pages 1-3 to 1-5, 1-7, 1-18, 1-26 to 1-44&lt;br /&gt;&lt;br /&gt;Add pages 1-3 to 1-5, 1-7, 1-18, 1-26 to 1-48&lt;br /&gt;&lt;br /&gt;To ensure that the publications do not fall out of sync, the publishers will periodically issue a list of effective pages (LEP) showing what pages should be in the binder. Normally, page numbering starts fresh in each section or chapter, so that a page inserted in one part of the publication will not force renumbering of all pages.&lt;br /&gt;&lt;br /&gt;The advantage of any automated publishing system is its ability to free authors from worrying about formatting details, such as pagination, but in this case, pagination matters quite a bit. For page-based updates, a publishing system has to be able to manage the following tasks:&lt;br /&gt;&lt;br /&gt;Preserve page numbers from the last revision, whenever possible&lt;br /&gt;&lt;br /&gt;Preserve page breaks from the last revision, whenever possible&lt;br /&gt;&lt;br /&gt;Identify and print changed pages, with instructions for adding, removing, or replacing, as necessary&lt;br /&gt;&lt;br /&gt;This process is not easy to automate, as a lot of judgment is involved: How much whitespace should the system allow at the bottom of a page before changing a page break, for example? Another problem is that formatting information, such as page breaks and numbering, has to be preserved somehow and kept in sync with the XML markup. 
One option is to design a system that will insert the information back into the XML document after each formatting run:&lt;br /&gt;&lt;br /&gt;&lt;para&gt;Airspace above FL180 is Class A, and is restricted to &lt;br /&gt;aircraft flying IFR.&lt;/para&gt;&lt;br /&gt;&lt;br /&gt;&lt;pagebreak n="F.12"/&gt;&lt;br /&gt;&lt;br /&gt;&lt;para&gt;Some class E airspace may require a mode C &lt;br /&gt;transponder.&lt;/para&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As an alternative, a system could store pagination information externally as a set of pointers into the XML document. In that case, however, the authoring system will need to be able to update the pointers as authors make changes to the XML document, so a fairly elaborate technical infrastructure will be required.&lt;br /&gt;&lt;br /&gt;Unfortunately, this problem has no simple, technically elegant solution. It blurs the boundary between content and presentation, a boundary that is usually very important for XML work. The best anyone can do is identify the requirement early and, once again, allow a lot of time and money for meeting it. In time, page-based updates will disappear as publishers distribute more and more information electronically; it is generally easier to redistribute an entire electronic document rather than only changed pages. Also, if sections are small, it is sometimes easier to redistribute an entire changed section rather than individual pages. (XML publishing systems will manage that task much more easily.) Note that documents that use page-based updates generally also require change markup, described in Section 3.3.1.&lt;br /&gt;&lt;br /&gt;3.3.3. Multiple Text Flows&lt;br /&gt;A text flow is a single sequence of text meant to be read from start to end. In a simple publication, all the text flows occur in sequence: for example, an introduction, followed by several chapters, followed by several appendixes. 
Multiple text flows in sequence are not much more difficult to work with than a single text flow.&lt;br /&gt;&lt;br /&gt;Some types of publications, however, take advantage of both dimensions of the page to present text flows in parallel. One obvious example is a newspaper: Several stories can appear together on the same page, and some stories can continue on other pages, wherever space is available around paid advertisements.&lt;br /&gt;&lt;br /&gt;Automating the layout of a newspaper from an XML master document would be a difficult task. Fitting stories together on a newspaper page is a bit like a jigsaw puzzle, except that it involves answering hundreds of subjective questions as well: Editors have to decide what stories are important, and marketable, enough to appear on the front page and, in a broadsheet, above the fold. A certain amount of variety in the story selection is needed: Unless something important had occurred, a newspaper editor would not want too many stories about the same person or event to appear together at the front, even if they would otherwise be the highest-priority stories. During an election, the newspaper editor may want to be careful not to appear to be biased by giving one candidate a disproportionately large amount of front-page coverage. (Or, on the other hand, the editor may indeed want to favor one candidate that way.)&lt;br /&gt;&lt;br /&gt;Off the front page, the paper is, of course, divided into sections, and related stories tend to be grouped together. Advertising pays a big part of the newspaper's expenses, so ad placement is critical, and the editor also has to watch for conflicts; for example, the lawn-care company ad must not appear too close to the story on the danger of pesticides. Visual appearance is also important; some stories have pictures attached, and the stories must be arranged so that the pictures are spread out evenly among the pages. 
Without a lot of care, the newspaper could end up with five pictures on one page and solid text on the next.&lt;br /&gt;&lt;br /&gt;Can all of this decision making and design be automated, with or without XML? Fortunately for the job security of newspaper editors, it does not appear so. At best, an XML-based publishing system can chip around the edges by adding metadata to each story, including its priority and subject codes, so that the computer system can help the editor find and organize the stories more easily.&lt;br /&gt;&lt;br /&gt;The newspaper is an extreme example, but the same problems arise in other, more routine kinds of publications. Footnotes, for example, are a separate kind of text flow, but one that many automated formatting systems, such as TeX, handle fairly well. Tables and illustrations are a little more difficult, as they need to be placed close to the text that references them without creating too many widow and orphan lines on the page, and sidebars make the problem a little more difficult yet. Although automated systems are not yet ready to handle newspaper or glossy magazine layout at all, they can handle footnotes, sidebars, tables, and illustrations, but many publishers find the result a little sloppy, and they still employ human layout artists working with interactive programs, such as Adobe FrameMaker, rather than fully automated publishing systems.&lt;br /&gt;&lt;br /&gt;So right now, automated formatting systems can handle some kinds of multiple text flows well enough for technical publications but not well enough for, say, glossy magazines or advertising material; those still require the services of human layout designers. In those cases, XML is most useful for the content of individual text flows (marking paragraphs, special text, and so on) rather than for the document as a whole.&lt;br /&gt;&lt;br /&gt;In the future, this problem will solve itself as more and more publishing moves online. 
Most attempts to do newspaper- or magazine-style layout online look horrible and are awkward to use on current computers. Instead, a typical magazine or newspaper Web site consists of a list of headlines and, possibly, summaries, with links to individual stories on their own pages. Computers will still probably not be smart enough to lay out advertising material or glossy magazines on their own, but they will be able to handle the bulk of&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-4848719887552105543?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/4848719887552105543/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/formatting-xml.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4848719887552105543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/4848719887552105543'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/formatting-xml.html' title='Formatting  XML'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-8919585163635496219</id><published>2009-06-15T23:23:00.000-07:00</published><updated>2009-06-15T23:24:46.275-07:00</updated><title type='text'>Listing  XML Markup</title><content type='html'>XML Documents
&lt;br /&gt;Generic markup was originally designed for documents, such as technical manuals, books, and articles. XML is a direct descendant of the Standard Generalized Markup Language [SGML], which was released in 1986; SGML, in its turn, was a descendant of the Generalized Markup Language (GML), developed by Charles Goldfarb, Ed Mosher, and Ray Lorie at IBM in the early 1970s, initially for tagging documents in the legal department.[1] This kind of in-line markup traces its way back further to formatting codes for typesetting machines and on to editorial marks on paper copy.&lt;br /&gt;&lt;br /&gt;[1] Many people believe that GML stands not for Generalized Markup Language but for Goldfarb, Mosher, and Lorie.&lt;br /&gt;&lt;br /&gt;Over time, document markup has become increasingly generic: Codes for type styles, such as "italics," have given way to more general codes, such as "title," that say what text represents rather than how it should be formatted. This pattern occurred not only with XML but also with other document languages, such as TeX and TROFF, which implemented high-level macro packages, such as LaTeX and MS, to hide low-level formatting codes. For example, a LaTeX document often contains no low-level formatting code at all, as in Listing 3-1.&lt;br /&gt;&lt;br /&gt;Listing 3-1. LaTeX Markup&lt;br /&gt;\documentclass{article}&lt;br /&gt;\title{Sample document}&lt;br /&gt;\author{David Megginson}&lt;br /&gt;&lt;br /&gt;\begin{document}&lt;br /&gt;\maketitle&lt;br /&gt;&lt;br /&gt;This is a simple LaTeX document.&lt;br /&gt;&lt;br /&gt;\end{document}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;That example is not, functionally, much different from a similar document in XML, as in Listing 3-2.&lt;br /&gt;&lt;br /&gt;Listing 3-2. 
XML Markup&lt;br /&gt;&lt;article&gt;&lt;br /&gt;  &lt;title&gt;Sample document&lt;/title&gt;&lt;br /&gt;  &lt;author&gt;David Megginson&lt;/author&gt;&lt;br /&gt;&lt;br /&gt;&lt;para&gt;This is a simple XML document.&lt;/para&gt;&lt;br /&gt;&lt;/article&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Behind the scenes, however, the LaTeX example hides the formatting code inside macro definitions, whereas the XML example has no direct link to formatting at all. Even so, the ideas behind XML and SGML are familiar to people in computer technology, math, and science, who have been working with formats like LaTeX for many years. That fact that HTML was inspired by, but not initially based on, SGML lexical conventions also smoothed the introduction of XML into the documentation world.&lt;br /&gt;&lt;br /&gt;It is XML's document origin that explains specialized syntactic features, such as mixed content and CDATA sections, that seem to make computer processing of XML more difficult than it should be, especially in terms of whitespace handling. Although these features cause technical problems, they exist to allow XML to work with human-readable, publishable information, such as books and articles. For machine-readable data (see Chapter 4), simple lists and tables are usually sufficient, as in Listing 3-3.&lt;br /&gt;&lt;br /&gt;Listing 3-3. Sample XML Data, Without Mixed Content&lt;br /&gt;&lt;parts&gt;&lt;br /&gt;  &lt;part&gt;&lt;br /&gt;    &lt;number&gt;16687&lt;/number&gt;&lt;br /&gt;    &lt;name&gt;locknut&lt;/name&gt;&lt;br /&gt;  &lt;/part&gt;&lt;br /&gt;  &lt;part number="16687"&gt;&lt;br /&gt;    &lt;number&gt;35581&lt;/number&gt;&lt;br /&gt;    &lt;name&gt;washer&lt;/name&gt;&lt;br /&gt;  &lt;/part&gt;&lt;br /&gt;&lt;/parts&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This example has a clear distinction between markup and content: Every element contains either text or other XML elements but never both. 
Document-oriented XML, on the other hand, tends to be messier, as in Listing 3-4.&lt;br /&gt;&lt;br /&gt;Listing 3-4. Sample XML Document with Mixed Content&lt;br /&gt;&lt;para&gt;The film &lt;title&gt;Gone with the Wind&lt;/title&gt; appeared in&lt;br /&gt;&lt;date&gt;1939&lt;/date&gt;, looking back to the U.S. Civil War while much of&lt;br /&gt;the rest of the world was already preparing for &lt;event&gt;World War&lt;br /&gt;II&lt;/event&gt;.&lt;/para&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This second example has no clear distinction: The content of the para element consists of both text and other elements mixed together. The presence of this kind of mixed content is a strong indication that an XML file is intended as a document rather than as a data collection.&lt;br /&gt;&lt;br /&gt;These days, XML-encoded documentation is about as common as SGML or LaTeX documentation was before it (people use XML mainly in large, complex technical documentation systems or small, private research projects), but documents are no longer the main use for generic markup. Interest in using XML to exchange data and to set up distributed computing (Chapter 5, XML networking) now far exceeds any interest in XML for documentation. Many of the initial XML document-oriented specifications (XLink [XLINK], XPointer [XPOINTER], and XSL-FO [XSL-FO]) now either languish with few users or have been co-opted for use with data or networking (XSLT [XSLT], XPath [XPath]), whereas data- or networking-oriented specifications keep on appearing. The world appears to be satisfied with the Hypertext Markup Language [HTML] for online documentation and Microsoft Word for print and is not eager to embrace XML with all its extra complexity.
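&lt;br /&gt;&lt;br /&gt;The difference between data-oriented and document-oriented XML described above can be checked mechanically. What follows is only a minimal sketch in Python, using the standard xml.etree.ElementTree module; the helper name has_mixed_content is an assumption made for illustration, and the two sample trees are built in code to mirror the shape of Listings 3-3 and 3-4 rather than parsed from them.&lt;br /&gt;&lt;pre&gt;
```python
import xml.etree.ElementTree as ET


def has_mixed_content(root):
    """Return True if any element under root holds both text and child elements.

    has_mixed_content is a hypothetical helper name chosen for this sketch.
    """
    for node in root.iter():
        children = list(node)
        if not children:
            continue  # leaf elements hold text only, so they cannot be mixed
        # Text directly inside an element can sit before the first child
        # (node.text) or after any child (child.tail); ignore pure whitespace.
        pieces = [node.text] + [child.tail for child in children]
        if any(piece and piece.strip() for piece in pieces):
            return True
    return False


# Data-oriented sample, built in code to mirror the shape of Listing 3-3.
parts = ET.Element("parts")
part = ET.SubElement(parts, "part")
ET.SubElement(part, "number").text = "16687"
ET.SubElement(part, "name").text = "locknut"

# Document-oriented sample, mirroring the start of Listing 3-4.
para = ET.Element("para")
para.text = "The film "
title = ET.SubElement(para, "title")
title.text = "Gone with the Wind"
title.tail = " appeared in "
date = ET.SubElement(para, "date")
date.text = "1939"
date.tail = "."

print(has_mixed_content(parts))  # False
print(has_mixed_content(para))   # True
```
&lt;/pre&gt;Whitespace used only for indentation is ignored here, which is one reasonable reading of "mixed content" for this purpose; a stricter check might count it.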
&lt;br /&gt;&lt;br /&gt;Obviously, because of that extra complexity, XML is not a general-purpose solution for all documentation projects, but in some situations, using XML for documents makes a lot of sense, particularly when you need to publish and republish large amounts of technical information in multiple formats, combine human-written material with information from databases, or customize publications for individual recipients. This chapter examines both the advantages and the disadvantages of XML documents and introduces some of the special issues involved with XML publishing.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-8919585163635496219?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/8919585163635496219/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/listing-xml-markup.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/8919585163635496219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/8919585163635496219'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/listing-xml-markup.html' title='Listing  XML Markup'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-983898958180507587</id><published>2009-06-15T23:22:00.000-07:00</published><updated>2009-06-15T23:23:01.415-07:00</updated><title type='text'>XML Project Pitfalls</title><content type='html'>XML Project Pitfalls&lt;br /&gt;The components listed in Section 2.1 deal with the technical side of an XML project; this section looks at pitfalls that come mainly from the human side, including unspoken expectations, unrealistic expectations, and resistance to change. A larger XML project in a government or corporate setting will often encounter threats from several groups of people and may face more risk from overenthusiastic supporters than from opponents.&lt;br /&gt;&lt;br /&gt;2.2.1. Unspoken Expectations&lt;br /&gt;XML still gets a lot of media attention. Sometimes, managers approve XML projects simply so that their customers and shareholders will see that they are using the latest technologies. Nothing is wrong, in principle, with using XML as a marketing technique; the problem is that this goal is almost always unspoken. Nobody tells the project team that it is participating in a marketing exercise, and even if the team realizes that fact, it is still forced to act as if it were implementing a real project. In fact, the situation is worse, because often the team is the designated scapegoat, starting with a set of fictitious written goals that it has little hope of reaching, thereby setting the team up to take the blame for the project's technical failure. 
To make things even worse, management will often announce an excessively large XML project for maximum publicity but then spend as little as possible on development once the headlines have quieted down, starving the project of money and resources.&lt;br /&gt;&lt;br /&gt;The best way to work around this problem is for management to be honest about the project's goals and requirements, in writing. If market visibility is one of the project's main goals, write it down, along the lines of "the major goal of this project is to raise BigCorp's visibility in the market by showing our commitment to new technologies like XML." Two years later, new managers and new team members will be able to measure their progress more fairly, and it may turn out that the project was a marketing success even though it was a technological flop.&lt;br /&gt;&lt;br /&gt;2.2.2. Unrealistic Expectations&lt;br /&gt;Difficult or impossible requirements are not always the result of devious maneuvering, however; sometimes, they come about honestly and sincerely not only from management but also from the developers. Managers and developers attend conferences and listen to zealots promoting the latest XML specifications, then rush out and make support for those specifications into requirements before evaluating their value and the level of available support, as discussed in Section 1.6. If the specifications chosen are not widely supported, the project's developers will not be able to use off-the-shelf software and will end up doing a lot of custom development to support a specification that brings little or no value to the organization. In many cases, no one has yet proved that the specifications even can be supported in a production environment.&lt;br /&gt;&lt;br /&gt;This problem is especially common when a specification has endorsements from large companies and organizations. 
Those endorsements can give the impression that the companies are planning to use the specification or even to produce off-the-shelf software to support it, but that's rarely the case: Large companies wait for proven demand before making major investments in technologies. The name IBM, Microsoft, or Oracle on an XML-related specification simply means that the company authorized one or more of its employees to serve on the standards committee, not that it is about to release shrink-wrapped software to support the specification.&lt;br /&gt;&lt;br /&gt;Starting out with unrealistic expectations can quickly leave developers and managers frustrated. The so-called standards mean more work rather than less, and there are no extra rewards for following them. The expected software and tool support never appears, and customers or suppliers that were talking about exchanging information using the new specification never get around to doing it. In the end, all the XML has to be converted back to an older, legacy information format anyway, and the XML ends up as an expensive and unnecessary extra step in the information pipeline.&lt;br /&gt;&lt;br /&gt;The best way to work around this problem is to plan for the present, not for the future. How well is a specification supported today? How many partners, customers, or suppliers want to exchange XML-based information today? How many products and components are available off the shelf today? If the answer to each of these questions is close to zero, postponing support for the specification is probably the wisest choice.&lt;br /&gt;&lt;br /&gt;2.2.3. Resistance to Change&lt;br /&gt;Incremental technical innovations often have mild and benign social effects. 
For example, the change from rotary to touch-tone dialing did not initially have a major impact on people's lives, although eventually it did allow for automated telephone systems; likewise, the change from roof antennas to cable television simply built on what people were already doing (watching TV) but expanded their choice.&lt;br /&gt;&lt;br /&gt;Disruptive technical innovations, on the other hand, have immediate and unavoidable social effects, both positive and negative, with clear winners and losers. Consider, for example, the effect of peer-to-peer file sharing on the music industry (losers) or the effect of cellular phone technology on real estate agents (winners). As happened with the music industry and peer-to-peer file sharing, the people who fear that they might be losers will fight long and hard against the change, believing that they are better off with the status quo.&lt;br /&gt;&lt;br /&gt;XML typically falls into the disruptive group, so XML projects can face serious resistance. Although big companies, even the ones with the most to lose from open file formats, have embraced XML, XML still poses the same kind of apparent threat to individual users that file sharing poses to the music industry. The best place to start is the separation of content and formatting, one of the central assumptions of XML.&lt;br /&gt;&lt;br /&gt;It is common in the XML world to be dismissive about WYSIWYG word processors, such as OpenOffice Writer or Microsoft Word, but authors using such WYSIWYG systems have an enormous amount of control over their work. Although they may be required to use a specific template and to follow a standard style guide, they can still add formatting directly and see more or less what the published version of their work will look like. They can add page breaks, rearrange paragraphs, add tabs and indentation, and fiddle in many other ways until their text not only reads well but looks good. 
Taking away control over formatting and presentation wipes out much of what gives document authors pride in their work.&lt;br /&gt;&lt;br /&gt;It is not simply a matter of control, however, but of prestige. To balance the authors' freedom in a writing team using word processors or desktop publishing software, there is often a set of complex rules, both written and unwritten, enforced by editors and senior writers. Many of these rules are related to formatting and software; mastering these rules, especially the unwritten ones, gives the senior people a position of power over the junior ones. Switching to XML immediately weakens or eliminates that power: XML-driven editing software can enforce many structural rules that used to have to be enforced by editors or peers, and formatting rules become mostly irrelevant. The new XML-based system may be just as complex, but the senior people no longer have an advantage: They have to learn it from scratch, just like the junior people do, and probably will not be able to learn it as quickly.&lt;br /&gt;&lt;br /&gt;A third, related problem is simple overwork and frustration, even from people who are not otherwise opposed to the project. An XML-based system often requires people to learn a software product that may be buggy and incomplete. At the same time, unless the group using the XML project is brand new, such as a start-up or a recently created division of a company or organization, the users likely have to keep up with their regular work during the transition. 
Structured markup requires a new way of thinking, and a new way of thinking takes time to sink in; if, as is typical, users are not given any extra time, or even if they suspect that they won't be, they will be enormously hostile to any new system, XML or otherwise.&lt;br /&gt;&lt;br /&gt;Even the people who will not be using the system directly will likely be skeptical, as they are with any big technical change; they'll be concerned about, or jealous of, the resources being devoted to the XML project and will be eager to jump on the first weakness that turns up.&lt;br /&gt;&lt;br /&gt;So, to summarize, following are four major reasons that people will be secretly or openly hostile to a new XML project in any company or organization.&lt;br /&gt;&lt;br /&gt;Authors do not want to give up control over the physical appearance of their work.&lt;br /&gt;&lt;br /&gt;Senior people do not want to lose their advantage of experience with the current system over their coworkers.&lt;br /&gt;&lt;br /&gt;Users do not want to devote the time to learn a new system, risking falling behind in their existing work.&lt;br /&gt;&lt;br /&gt;All members of the company or organization may be skeptical that the project's benefits will justify the cost.&lt;br /&gt;&lt;br /&gt;How can an XML project deal with these obstacles? The best place to start is an admission that, sometimes, the naysayers are right. An XML project may fail or underperform, especially if it involves desktop authoring tools. Employees may initially find their work less pleasant once they've lost some control over it. Senior people are in real danger of being left behind by any technological change. Employees will find that their managers expect them to learn and to adopt complicated new technologies without any temporary decrease in productivity. 
All together, like most other workplace innovations, a big change like XML can make for a bad situation and, eventually, a poisoned working environment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-983898958180507587?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/983898958180507587/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-project-pitfalls.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/983898958180507587'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/983898958180507587'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/xml-project-pitfalls.html' title='XML Project Pitfalls'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-6758212349845050872</id><published>2009-06-15T23:21:00.000-07:00</published><updated>2009-06-15T23:22:39.035-07:00</updated><title type='text'>Components of an XML</title><content type='html'>Components of an XML Project&lt;br /&gt;XML is nothing more than a way of adding structure to information, so you can use XML for almost any purpose; in that sense, there is no such thing as a typical XML project. 
XML can show up in technical publishing, networked games, spreadsheets, air traffic control, news publishing, blogging, or just about anything you can imagine that involves passing information from one system to another.&lt;br /&gt;&lt;br /&gt;Still, many XML projects involve performing similar operations on XML information, even if the final result is different. The operations described in this section and illustrated in Figure 2.1 are not low-level libraries and tools, such as parsers, as important as those are, but high-level stages in the life cycle of an XML document:&lt;br /&gt;&lt;br /&gt;Creation&lt;br /&gt;&lt;br /&gt;Storage&lt;br /&gt;&lt;br /&gt;Search&lt;br /&gt;&lt;br /&gt;Archiving&lt;br /&gt;&lt;br /&gt;Transformation&lt;br /&gt;&lt;br /&gt;Rendering&lt;br /&gt;&lt;br /&gt;Transport&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Figure 2.1. Components of an XML project&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;To illustrate these stages, this section describes a hypothetical production system for a retail catalog. The designers chose XML because the company needs to publish the catalog in print for mailing to telephone customers and online for use by Web customers. (For more information on single-source publishing, see Section 3.1.1.)&lt;br /&gt;&lt;br /&gt;For creation, the developers build a Web application with forms that authors can use to enter information directly into a database. In this system, the authors deal with only a tiny amount of XML. Photographers upload digital photographs of products, filling in metadata fields with basic information about each picture: date, product number, and so on. Writers write product information in various fields of a Web form, including product number, colors, and styles, and a short description, which allows a few simple types of in-line markup. 
When it is time to generate the complete XML master catalog, a script issues SQL database queries to collect information and then assembles it into an XML document, matching photos with descriptions and extracting current pricing and shipping information from other data tables.&lt;br /&gt;&lt;br /&gt;The storage is the relational database, which holds the product photographs as binary large objects (BLOBs), and puts textual information directly into relational tables. The database also contains other product information, such as price, size, and weight, all keyed on the product bar code.&lt;br /&gt;&lt;br /&gt;Search takes place through the standard database query language [SQL]. For information already in database tables, a specialized XML search engine is not needed. A separate Web application translates user search criteria into SQL database queries, runs them against the database, and then formats the results as HTML pages with links into the catalog.&lt;br /&gt;&lt;br /&gt;Because the catalog is updated and published relatively infrequently, XSLT is adequate for rendering, despite its performance problems in high-speed environments (see Section 8.3 for more information). A series of XSLT templates generates HTML and print renditions from the XML master file exported from the database during the creation stage.&lt;br /&gt;&lt;br /&gt;Transport takes place in various ways, depending on the catalog media. For the printed version, the transport is nothing more than regular mail; for the Web version, the transport is a Web server using HTTP. The catalog company also sends the raw XML version of the catalog to sales partners through a secure FTP server so that they can customize it and then generate their own formatted output, using their own XML systems.&lt;br /&gt;&lt;br /&gt;As XML systems go, this one is fairly straightforward. 
Authors can use simple forms-based interfaces rather than unfamiliar XML editing tools, and searching and storage use standard relational database facilities. This kind of approach does not always work, however, particularly for less structured information, such as reports or news stories. The following subsections discuss the various approaches people can take for each stage.&lt;br /&gt;&lt;br /&gt;2.1.1. Creation&lt;br /&gt;Normally, an XML system starts with an XML document, which has to come from somewhere. Two common ways of creating the starting XML document are to&lt;br /&gt;&lt;br /&gt;Have authors create it directly, using a text editor or a custom XML authoring tool&lt;br /&gt;&lt;br /&gt;Have software assemble it automatically from other sources, such as database tables, non-XML data files, or even other XML documents (see Section 2.1.4).&lt;br /&gt;&lt;br /&gt;The second approach will not always work, but where it does, as in the retail catalog example earlier, it will be significantly cheaper and easier than using XML authoring tools. Automatic software assembly generally works for data-oriented XML (see Chapter 4) but not for document-oriented XML (see Chapter 3). Direct XML authoring allows for richer information and works well with document-oriented XML, but it comes with higher ongoing costs for training, technical support, and staff time, as well as a higher probability of resistance from users.&lt;br /&gt;&lt;br /&gt;Larger XML projects sometimes combine the two approaches. Authors write basic in-line content and possibly skeleton structures in XML; then automated processes flesh out the document with automatically generated boilerplate text, tables, figures, and other data. A project producing maintenance manuals for large machinery might follow this approach, using the database to hold part numbers, standard procedures, warnings and cautions, diagrams, and other reusable information. 
Changes to the database will automatically appear in the XML document without requiring human editing.&lt;br /&gt;&lt;br /&gt;2.1.2. Storage and Archiving&lt;br /&gt;Now that you have created an XML document, either by hand or through scripts, you might need somewhere to keep it. That is not always the case, though; in XML networking (see Chapter 5), your system might simply generate the XML, blast it out over the network, and then forget about it. Even if you need to keep the XML around, simply saving it to the hard drive or LAN, the same way you would with a spreadsheet or word-processing file, might be sufficient. You can get a little fancier by keeping the XML in a revision-control system, such as Concurrent Versions System (CVS) or Microsoft's Visual SourceSafe, without having to buy or build any specialized XML software.&lt;br /&gt;&lt;br /&gt;You cannot always get away with the easy solutions, however. You might need to allow several authors to work on different parts of the same XML document simultaneously, be able to maintain snapshots of hundreds of documents in a consistent state, or automate workflow through the authoring and editorial processes. For the first requirement, vendors sell custom XML databases that can manage each element in a document as if it were a separate file, but these databases have not had good results in the field. More typically, people will store XML documents in relational databases, either by decomposing them into data tables or by storing them as BLOBs or character large objects (CLOBs).&lt;br /&gt;&lt;br /&gt;The major database vendors, such as Oracle and IBM, provide special support for working with XML in their products. Normally, even large projects can avoid the need for simultaneous authoring by dividing documents into small files. 
For example, a system could store an XML manual as 500 separate files, one for each task, rather than as a single, large file; that way, it is easy for different authors to work on different tasks without conflict. Larger repositories will almost always require some search ability; see Section 2.1.3 and Chapter 6 for more information.&lt;br /&gt;&lt;br /&gt;Archiving is a special case of storage. One of the major selling points of XML is future proofing: In 50 or 100 years, it may be difficult to read proprietary binary formats, but XML is designed to be easily accessible. Archiving may have special requirements, such as optical rather than magnetic media, and may also impose additional requirements on XML information, such as encryption, digital signatures, and metadata about when, how, why, and by whom each document was created. Archives typically also require an ability to search.&lt;br /&gt;&lt;br /&gt;2.1.3. Search and Retrieval&lt;br /&gt;Chapter 6 deals with the complex topic of XML searching. When an XML project contains dozens, hundreds, or even thousands of individual XML documents, authors and others working on the project will require some form of search and retrieval to find information. Following are several common approaches, from least to most complex, for searching XML documents:&lt;br /&gt;&lt;br /&gt;Batch searching&lt;br /&gt;&lt;br /&gt;Full-text indexing&lt;br /&gt;&lt;br /&gt;Database metadata&lt;br /&gt;&lt;br /&gt;Structural indexing&lt;br /&gt;&lt;br /&gt;With batch searching, a program reads all the documents for every search, similar to the way the Unix grep command searches plaintext files. Batch searches can be relatively slow, taking anywhere from a few seconds to a few hours or more; however, because there is no preindexing, there are no built-in limitations about the kinds of searches people can make. 
Batch searching is most appropriate when searches are rare but possibly complex and delays are not a problem.&lt;br /&gt;&lt;br /&gt;Full-text searching uses pregenerated indexes to speed up searching but simply treats XML documents like any other text documents, filtering out the markup and indexing the content and, possibly, attribute values. Although full-text searching is a blunt tool, it can be surprisingly effective, and many well-tested free and commercial indexing and retrieval tools are available off the shelf. Some full-text search engines allow labeled fields, so it is possible to add the name of the element containing text to the index, providing some simple structural search ability. Full-text indexing is most appropriate when content consists mainly of prose, such as novels, Web logs, or newspaper stories.&lt;br /&gt;&lt;br /&gt;Database metadata is a useful approach for finding XML documents based on preselected criteria. When a user checks an XML document into the system, the system scans it once, extracting predetermined information, such as names, organizations, country codes, dates, headlines, and so on, and stores that information in regular relational database tables. The system is then able to find XML documents using normal SQL database queries. This approach is most appropriate for documents that consist mainly of highly structured information, such as lists, tables, or fields, or for documents that include explicit metadata, such as news stories.&lt;br /&gt;&lt;br /&gt;Like full-text searching, structural searching uses indexes to speed up operations. However, instead of indexing only the text, the software also indexes the XML structure that goes with it. As a result, it is possible to formulate complex queries combining XML structure with text content. Both the XML Path Language [XPath] and the forthcoming XML Query Language [XQuery] can take advantage of structural search engines when they are available.&lt;br /&gt;&lt;br /&gt;2.1.4. 
Transformation&lt;br /&gt;Many XML systems include a transformation pipeline. A preliminary, raw XML document starts out at one end of the system and moves down the pipeline like a virtual assembly line, going through various stages of transformation until a finished XML document emerges from the other end. Transformations may involve rearranging or removing information that is already in the XML document, adding information from external sources, merging several smaller XML documents into one larger one (such as assembling chapters into a book), or splitting a large XML document into several smaller ones (such as breaking a book up into smaller Web pages).&lt;br /&gt;&lt;br /&gt;Transformation components typically go through at least two iterations. First, developers prototype the transformations by using simple, template-based tools, such as XSLT processors; then, to improve efficiency and reduce memory requirements, developers rewrite the transformations in custom source code. In some cases, if speed is not essential and memory restrictions are not a problem, a system will continue to use XSLT right into production. One advantage of custom coding, however, is that it is easier to include information from non-XML sources, such as relational databases.&lt;br /&gt;&lt;br /&gt;Typically, transformation tools require more custom coding than storage or searching, but they are not overly complex or expensive. See Section 8.3.4 for more information.&lt;br /&gt;&lt;br /&gt;2.1.5. Rendering&lt;br /&gt;Rendering is a specialized form of transformation (Section 2.1.4) intended for human consumption rather than machine use. Rendering is also a complex topic, and Chapter 3 examines it in more detail.&lt;br /&gt;&lt;br /&gt;In practice, rendering components nearly always convert XML documents to HTML for online display and PDF or PostScript for printing. 
Normally, it is necessary to write separate code for rendering print and HTML, as the primitives are entirely different: HTML documents have tables, paragraphs, and links, whereas printed documents are usually formatted as a series of nested boxes on the page.&lt;br /&gt;&lt;br /&gt;Online rendering has some special possibilities. The simplest, most portable approach is to convert the XML to HTML in advance, but some Web sites store only the XML and generate HTML on the fly when requested. Modern browsers can also handle XML directly without a conversion step, a just-in-time rendering approach that allows the user to set preferences and customize the appearance or content of the rendered document. Both of the major browsers, Mozilla and Microsoft's Internet Explorer, support client-side XML rendering using XSLT or CSS, but very few sites take advantage of this capability.&lt;br /&gt;&lt;br /&gt;2.1.6. Transport&lt;br /&gt;The final major component is transport: once the information is ready, it needs to get to the end user. Chapter 5 is devoted to XML networking and deals extensively with transport issues.&lt;br /&gt;&lt;br /&gt;Very simple forms of transport include burning information onto a CD-ROM and mailing it, sending it as an e-mail attachment, or making it available through an FTP or HTTP server. More sophisticated projects may require scheduled, guaranteed delivery, publish-subscribe, and other features supported by advanced XML-related networking specifications.&lt;br /&gt;&lt;br /&gt;For some projects, transport is the most important part. For example, financial information services, such as Bloomberg and Reuters, make their money from getting information to a customer as quickly as possible, and wire services add extensive metadata to their news stories to help customers process them automatically. The Web log movement is built almost entirely around the ability of RSS to make transport simple. 
Such specifications as NewsML [NEWSML], RSS [RSS], and Internet Content Exchange [ICE] deal with transport in great detail.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-6758212349845050872?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learn-xml-script.blogspot.com/feeds/6758212349845050872/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/components-of-xml.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/6758212349845050872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/6758212349845050872'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/components-of-xml.html' title='Components of an XML'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2873716402863254637.post-5046116104296298522</id><published>2009-06-15T23:20:00.001-07:00</published><updated>2009-06-15T23:20:31.854-07:00</updated><title type='text'>learn xml script</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2873716402863254637-5046116104296298522?l=learn-xml-script.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://learn-xml-script.blogspot.com/feeds/5046116104296298522/comments/default' title='Poskan Komentar'/><link rel='replies' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/learn-xml-script.html#comment-form' title='0 Komentar'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5046116104296298522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2873716402863254637/posts/default/5046116104296298522'/><link rel='alternate' type='text/html' href='http://learn-xml-script.blogspot.com/2009/06/learn-xml-script.html' title='learn xml script'/><author><name>saeful uyun</name><uri>https://profiles.google.com/100073614588157994904</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-pUbI6Jk_NE8/AAAAAAAAAAI/AAAAAAAADB8/wPQGiJJchOs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry></feed>
