hot info

Script Client-Side XML

on Monday, 15 June 2009

Client-Side XML
Single-source publishing allows people to write documents in XML and then convert them to multiple formats, such as PDF or PostScript for print and HTML for the Web. However, the World Wide Web Consortium [W3C] did not design XML simply to be a source for other formats; many people intended XML itself to be a delivery format for the Web, replacing HTML. Beyond XML is a family of W3C specifications, such as XML Linking Language [XLINK], XML Pointer Language [XPOINTER], cascading style sheets [CSS], and XSL Transformations [XSLT], all designed to help browsers display XML documents directly.

In many ways, this work has succeeded. Both Microsoft's Internet Explorer and the variants of Mozilla contain extensive XML support: They can display a raw XML document without a style sheet, or they can use style sheets to make an XML document in a browser window indistinguishable from HTML for the casual user. Behind the scenes, however, client-side scripts can perform sophisticated tricks based on the XML source, providing a richer browsing experience for users. Although the XML support in the two browsers has incompatibilities, and some specifications, such as XLink, are barely supported, if at all, there is also a surprising amount of compatibility, especially in comparison with the bitter browser wars of the late 1990s.

Both the standards and the software are in place for delivering XML directly to users without going through a middle format, such as HTML or PDF, and most users not only have the software installed on their system but use it daily for viewing HTML pages. Nevertheless, XML on the Web is almost nonexistent, as HTML is good enough for almost everything that anyone wants to do on the Web, and the extra benefits of delivering XML directly do not make up for the costs of new training, new authoring tools, and incompatibility with the minority of users who still have old, pre-XML browsers installed.

Although client-side XML has failed to take off the way that HTML did in the mid-1990s, the tool availability is still a benefit for XML documentation projects. Instead of purchasing and installing special XML viewers, authors can preview formatted versions of XML documents directly in their familiar Web browsers. On intranets and other areas where browser versions are more uniform, client-side XML viewing is a useful ability, even if it has failed to become a social phenomenon on the Web.

3.4.2. Reuse
Documentation always fits awkwardly into technology projects. Technical writers complain about programmers who make changes at the last minute, forcing the writers to redo most of the documentation; programmers, in turn, complain about writers who seem unable to write most of their documentation until just before a release deadline.

In fact, programmers have long created and used systems to help them write and maintain documentation aimed at other programmers. In the 1980s, Donald Knuth promoted literate programming, whereby the source code for a program, such as TeX, was embedded inside its own documentation and extracted automatically for compilation; anyone editing the documentation would edit the code at the same time, and vice versa, ensuring that the documentation and the code remained synchronized. In the mid-1990s, as the Java programming language increased in popularity, the opposite approach became common: Programmers embedded documentation in the source code as specially formatted comments and extracted it automatically for publication. (Earlier programming languages, such as Emacs LISP, had already used this approach on a smaller scale.) The JavaDoc system proved extremely effective for generating programmer's API documentation and has been much imitated for other programming languages.

JavaDoc and literate programming work for programmer's documentation because the documentation nearly always follows the structure of the source code. When the programmer deletes a class or a method, the documentation disappears with it; when adding or modifying a class, the programmer simply needs to modify the documentation that is right there on the screen with it.

Unfortunately, things are not so easy for most technical writers. Normally, their documentation is designed for users rather than for programmers, so it is based on tasks or concepts rather than source-code structure. As a result, there is no natural connection between the changes a programmer makes to the source and the changes a technical writer has to make to the documentation. A single user task, such as creating a new account, might touch code from dozens of source code modules managed by different programmers; a single source code module might affect dozens of different task descriptions. Even a trivially small change to the source code can have a disproportionately large impact on the documentation.

Consider a simple code module that displays a dialog box containing a message and two buttons labeled Accept and Cancel. The quality-assurance specialist sends a note to the programmer, saying that, for consistency, the first button should be labeled OK rather than Accept. The programmer takes 5 minutes to change one line of code, test, and commit; the documentation specialist then announces that it will take 2 weeks to revise the tutorial and manual. What happened?

First, dozens of different parts of the code might invoke that dialog box, and each may be used by dozens of different tasks. Suppose that a manual has text like the following:

Select crop from the File drop-down menu.

A confirmation dialog will appear. Select Accept to continue.

For each instance, the writer will have to change Accept to OK, and then the writer, or an editor or quality-assurance specialist, will have to recheck all the documentation against the software. Even worse, the manual may contain screenshots of the dialog in different contexts, all of which will have to be recaptured and recropped. If a small change like this can cause so much trouble, it is not difficult to understand how a more fundamental change could throw technical documentation into chaos.

This kind of problem was common in computer programming as well until the structured programming movement, beginning in the 1970s, and the object-oriented programming movement, beginning in the 1980s, helped programmers get better at writing reusable code. Programmers have learned to encapsulate reusable code in a single place, such as a function or an object or even a library, rather than duplicating the same code over and over again in their programs; database designers do the same thing when they normalize their databases. In fact, writers were able to do this for centuries before computers existed, simply by embedding a reference in a text, such as "(see Job 8:8-10)."

Modern technical documents could use the same include-by-reference approach as modern computer programs, in which case their documents might look like this:

To create a new document, take the following steps:

(See p.145)

(See p.251)

(See p.18)

(See p.44)

(See p.182)

It would be easy to write documents this way, especially with word processors that can track references and automatically fill in page numbers, but it would not be easy to read such a document. Following cross-references and keeping track of previous locations are a lot more difficult for humans than for a computer, and people reading documentation like this will quickly get frustrated or simply lost. Documents, at least in the final form that readers see, are necessarily highly redundant, or what a database specialist would call denormalized: They have to repeat information.

This is where another of XML's big promises comes in. XML, like its predecessor, SGML, is designed to allow writers to reuse text the same way that programmers reuse code: A single change in the XML source document can automatically propagate itself throughout the output formatted document at the other end of the XML publishing system.

XML has several mechanisms for allowing reusable text, among which the simplest is the internal text entity. In the internal DTD subset, an author includes a declaration like the following:

<!ENTITY accept-button-name "Accept">

Then, in the main text, the author enters a reference to that entity:

A confirmation dialog will appear. Select
&accept-button-name; to continue.
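
As a minimal sketch of how such an entity behaves, the following Python snippet parses a small document whose internal DTD subset declares accept-button-name, and shows that changing the single declaration changes every reference to it. Only the entity name comes from the text above; the element names (manual, step), the second step, and the render_steps helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document template; element names are illustrative only.
# The {label} placeholder stands for the one place an author would edit.
TEMPLATE = """<!DOCTYPE manual [
  <!ENTITY accept-button-name "{label}">
]>
<manual>
  <step>A confirmation dialog will appear. Select &accept-button-name; to continue.</step>
  <step>Select &accept-button-name; again to save.</step>
</manual>"""


def render_steps(label):
    """Parse the document with the given entity value and return the step texts."""
    # The parser expands internal entities declared in the DTD subset.
    root = ET.fromstring(TEMPLATE.format(label=label))
    return [step.text for step in root.iter("step")]


if __name__ == "__main__":
    # Changing the declaration from "Accept" to "OK" updates both steps.
    for line in render_steps("OK"):
        print(line)
```

Calling render_steps("Accept") and render_steps("OK") on the same template yields the two versions of the text without touching either step.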




Although this approach is dead simple for a single author creating XML in a text editor, it can cause problems in a large, multiuser environment, in which regular authors should not be able to modify the DTD, where the entity declarations appear. In those cases, system designers come up with more elaborate methods for reusable text. For example, in a maintenance manual, the following caution might appear many times:

Caution: Use calibrated torque wrench. Overtorquing may cause the bolt to shear.

In a big project, an author might create this caution once, possibly as an independent XML document like the following:

Use calibrated torque wrench. Overtorquing may
cause the bolt to shear.




An author who needs to include this caution will include it by reference, often through a custom-designed dialog box added to the editing system, as follows:





The same technique should work for shared steps in tasks, boilerplate legal text, and anything else that gets repeated throughout a document.
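
The include-by-reference idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the include element, its ref attribute, the torque-caution identifier, and the in-memory SHARED repository are all hypothetical, not the markup of any particular editing system.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical repository of shared fragments, keyed by an id.
SHARED = {
    "torque-caution": ET.fromstring(
        "<caution>Use calibrated torque wrench. "
        "Overtorquing may cause the bolt to shear.</caution>"
    ),
}

# Hypothetical manual that pulls the same caution in by reference, twice.
MANUAL = """<manual>
  <task><include ref="torque-caution"/><step>Tighten bolt A.</step></task>
  <task><include ref="torque-caution"/><step>Tighten bolt B.</step></task>
</manual>"""


def resolve_includes(root, shared):
    """Replace every <include ref="..."/> element with a copy of the shared fragment."""
    # Snapshot the element list first so the tree can be modified safely.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                fragment = copy.deepcopy(shared[child.get("ref")])
                fragment.tail = child.tail  # preserve surrounding whitespace
                parent[index] = fragment
    return root


if __name__ == "__main__":
    resolved = resolve_includes(ET.fromstring(MANUAL), SHARED)
    print(ET.tostring(resolved, encoding="unicode"))
```

Editing the one caution in SHARED and resolving again changes both tasks, which is the propagation this approach is meant to provide.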

This approach looks like the ideal bridge across the discontinuity between the way coders code and the way writers write, but experience in field use has been disappointing. First, in technical writing, repeated text tends to be similar but not identical: A part name in the middle of a caution or the transition text at the beginning of a paragraph will change, depending on its context. Even when the text does not change, managing and locating small, reused chunks of text is mind-numbing work. Most authors would probably prefer simply to retype when necessary, rather than spend several minutes each time searching through a repository of reusable steps to see whether one is appropriate. Even if authors were willing to use such a system, the savings would not be as great as forecast. For example, if the same step were used in 200 places and the step were changed, authors or editors would still have to check all 200 places in the text to make sure that the change was appropriate in context, and checking the 200 places will usually take as long as retyping them.

It may be that new tools and new ways of running projects make reusable information more common in the future; for now, however, XML is not a universal solution to the documentation discontinuity.

3.4.3. Idioms
Although reusable text is a bit of a chimera, single-source publishing is a real benefit that can come from an XML publishing system. However, single-source publishing also has its limitations, and it is important to understand them before starting on a major XML publishing project.

Single-source publishing is an exciting idea that also happens to be easy to explain to nonspecialists. You create a single XML source document, then use scripts or templates to transform it automatically into different publication media, such as print, Braille, an automated voice telephone system, or the Web.

Much of the time, people want to publish from XML source documents only to print and the Web. For publishing to print, the typical data formats are TeX, PDF, RTF, PostScript, and MIF; for publishing to the Web, the typical data formats are HTML and XHTML but sometimes also PDF or Flash. XML specialists learn quickly that they need to write separate transformation style sheets for print and the Web, even if the core content is identical. The obvious problem is that they are transforming to different primitives: HTML deals with abstractions, such as paragraphs and lists, whereas print formats tend to deal with concrete layout elements, such as blocks, fonts, and spacing. This difference is not simply a design problem that could be fixed. Print formats are fundamentally page-based, whereas HTML is fundamentally screen-based. Each has pros and cons.
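
The need for separate transformations can be illustrated with a rough Python sketch. It is not a real style sheet: the doc, title, and para element names are hypothetical, and a fixed-width text rendition stands in for a page-based print format. The Web function emits abstract structure and leaves layout to the browser, whereas the page function has to make concrete layout decisions such as line width and spacing.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source document; element names are illustrative.
SOURCE = """<doc>
  <title>Cropping an image</title>
  <para>Select crop from the File drop-down menu.</para>
  <para>A confirmation dialog will appear. Select Accept to continue.</para>
</doc>"""


def to_html(root):
    """Web rendition: headings and paragraphs, with layout left to the browser."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text) for p in root.findall("para")]
    return "\n".join(parts)


def to_page_text(root, width=40):
    """Page-like rendition: fixed line width, centered title, explicit spacing."""
    lines = [root.findtext("title").upper().center(width).rstrip(), ""]
    for p in root.findall("para"):
        lines += textwrap.wrap(p.text, width=width) + [""]
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    print(to_html(doc))
    print()
    print(to_page_text(doc))
```

Both functions read the same source, but neither could be derived mechanically from the other, because each targets different primitives.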

Page-based formats allow a designer to take advantage of all the available space, by including multiple texts and graphics in different parts of a single page, with fine control over the placement and size of each item. However, page-based formats are also brittle: A document needs to be optimized for a specific size and aspect and will not move easily across different display devices. (Try viewing a U.S. letter- or A4-sized PDF document on a handheld computer.)

Screen-based formats are inherently more flexible and, when properly used, will work for many different display sizes. However, that flexibility comes at the cost of surrendering control over the finer points of layout. (Try placing sidebars and graphics precisely in HTML.)

The real problem, then, is that a Web page is not simply a print document online but rather a fundamentally different kind of thing. That is why Web pages need their own style sheets. That's not such a big problem, however: Writing two, or even ten, style sheets accounts for very little overhead when you will be using them hundreds or thousands of times to transform XML for publication.

Single-source publishing works well for both print and the Web, as long as you are publishing the right kind of thing. A technical manual for a software program, or a novel, can easily pass from a single XML source document through a couple of different style sheets to print and Web versions, all without human intervention. You do end up, usually, with Web pages that scroll a lot (say, one page for each chapter). That will not be a problem if the user has decided to read a book or manual online, but it is not what you normally expect to find in a Web page. Web pages are typically short, dynamic, and interactive, not long and static.

There is no reason that a person cannot design an XML document type that takes dynamic content into account, so that the HTML rendition can contain animations, applets, forms, and so on, but doing so requires that you place new constraints on your XML document type in advance: You cannot publish just any document and have it look good both online and in print. This is not a medium problem but an idiom problem. You can print a Web page on paper or put a novel online, but neither fits naturally there, because Web pages and novels are fundamentally different kinds of things. Many other idioms cause trouble for single-source publishing. Consider, for example, slides, online help, and Unix man pages: Each of these follows a fundamentally different set of constraints and carries a different set of reader expectations, and it gets more and more difficult to generate all of them from a single XML source document. In the end, XML cannot deliver on all its promises for single-source publishing; it can allow you to publish to multiple formats from a single master document, but publishing to multiple idioms is much more difficult.

Is universal single-source publishing a hopeless case, then? Not quite. It turns out that, although all these idioms have drastically different top-level structures, they share much in common at the lower levels. Many of them, for example, have paragraphs of text with special-purpose phrases contained in them, and many use lists and tables. For the lower-level content, XML truly can deliver on at least some of the promise of single-source publishing: the same table might appear on a slide, a man page, a Web page, and in a printed book, all in significantly different contexts, appropriate to each idiom.
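
To make the point about lower-level content concrete, here is a small Python sketch in which one table is rendered for two idioms. The table data and both function names are assumptions chosen for illustration; the plain-text form merely approximates what a terminal-oriented idiom such as a man page might show.

```python
from html import escape

# Hypothetical table content shared by every rendition.
HEADER = ["Button", "Action"]
ROWS = [["OK", "Confirm the crop"], ["Cancel", "Discard the crop"]]


def table_as_html(header, rows):
    """Web idiom: semantic table markup."""
    def row(cells, tag):
        return "<tr>%s</tr>" % "".join(
            "<%s>%s</%s>" % (tag, escape(cell), tag) for cell in cells)
    body = [row(header, "th")] + [row(r, "td") for r in rows]
    return "<table>\n%s\n</table>" % "\n".join(body)


def table_as_text(header, rows):
    """Terminal idiom: columns aligned with spaces."""
    all_rows = [header] + rows
    widths = [max(len(r[i]) for r in all_rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in all_rows)


if __name__ == "__main__":
    print(table_as_html(HEADER, ROWS))
    print(table_as_text(HEADER, ROWS))
```

The surrounding structure of each idiom still has to be designed separately; only the table itself is shared.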

Furthermore, people are often willing to accept deficiencies for the sake of saving money or having easier access to information. For example, a state legislature might accept simple, unimaginative, automatically generated Web pages for its online legislation in exchange for a cost savings of $100,000 a year. The Web site for a popular magazine, on the other hand, will not likely be willing to make the same trade. Designing and implementing an XML-based publishing system is largely a matter of managing expectations. Raising false hopes about single-source publishing will lead to disappointment and hostility later on, but if both the implementers and the users understand and agree to the tradeoffs in advance, XML publishing can work and save money.
