
The Disadvantages of XML Searching

on Wednesday, 17 June 2009

Despite the optimistic view of many people in the XML community, the XML searching problem is complicated, from both technical and business perspectives. In some situations, XML-based contextual searching can be a major advantage; in others, it can be an unnecessary cost; in yet others, it can make the search engine's results worse. This section introduces some of the problems with the very idea of XML-based searching.

XML searching may be too complex for most users.

Documents on the Web can use deceptive markup to raise their ranking in a search.

XML documents are generally not interoperable in the same search environment, because of all the different, incompatible vocabularies.

6.2.1. Usability
Tim Bray, cofounder of Open Text, which ran an early Web search engine, and coauthor of the original XML specification [XML], wrote the following passage in a Web log (http://www.tbray.org/ongoing/When/200x/2003/06/17/SearchUsers):

Nobody Uses Advanced Search...

Every search engine has an "advanced search" screen, and nobody (quantitatively, less than 0.5% of users) ever goes there. This drove us nuts back at Open Text, because our engine was very structurally savvy and could do compound/boolean queries that look like what today we'd call XPath. But nobody used it.

What most people want is to have a nice simple field into which they will type on average 1.3 words and hit Enter, and have the result come back to them. So anyone who's building search needs to focus almost all their energy on doing an as-good-as-possible job given those 1.3 words and no other inputs.

This observation does not bode well for XML searching. If users are unwilling to use even relatively simple full-text techniques, such as Boolean or proximity searches, how much hope is there that they will be willing to formulate the complex queries that can take advantage of XML markup? Fortunately for the future of XML searching, Bray does go on to qualify that observation:

...Except the People who Do

Of course, the people who do use Advanced Search are your most fanatical users, the professional librarians, spooks, and private investigators. And the ones who will do what it takes to find out everything about research on the rare disease their child just got diagnosed with. These people tend to be loud-mouthed and aggressive and will get in your face if you don't have advanced search or it's not real good.

Presumably, these same kinds of people would be the ones using XML context in their searches. Others of Bray's "fanatical users" might be academics preparing papers, journalists researching news stories, and software agents collecting and amalgamating information for politicians and managers. This last example, in fact, may point to the real potential users of XML searching: not people but software. People other than governors and CEOs need to make decisions in their own lives, from changing jobs to buying new clothes, and software agents that find information for people (say, for price comparison) could benefit greatly from the extra information provided by XML markup, assuming, of course, that vendors were willing to encode their pricing information in a standard format and accept the transparency that comes with that.

And that leads to another usability problem: XML searching requires people or software to know a lot about the structure of the documents they're searching. If all XML documents shared a single, global vocabulary, searching would be relatively straightforward, at least for power users: Every price would appear inside a price element, every bar code would appear inside a upc element, every person's name would appear inside a person element, and so on. This is unlikely ever to happen, for two reasons:

No single, accepted authority could impose a common vocabulary on all users.

XML documents can encode a potentially infinite variety of information, so a common vocabulary would always be incomplete.
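To make the vocabulary problem concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The two catalog fragments are invented for illustration: the first uses the price and upc element names mentioned above, and the second is a hypothetical vendor that encodes the same facts with different names. A search written against the first vocabulary finds nothing in the second.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor A: uses the <price> and <upc> names from the text.
VENDOR_A = """
<catalog>
  <item>
    <upc>012345678905</upc>
    <price>19.99</price>
  </item>
</catalog>
"""

# Hypothetical vendor B: same kind of information, different vocabulary.
VENDOR_B = """
<products>
  <product barcode="012345678905">
    <cost currency="USD">18.50</cost>
  </product>
</products>
"""

def find_prices(xml_text):
    """Return the text of every <price> element, at any depth."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("price")]

prices_a = find_prices(VENDOR_A)  # vendor A matches the expected vocabulary
prices_b = find_prices(VENDOR_B)  # vendor B has no <price> element at all
print(prices_a)
print(prices_b)
```

A price-comparison agent would need a separate rule for every vocabulary it encounters, which is exactly the knowledge of document structure that the paragraph above says searchers cannot avoid.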

Some XML-related specifications are designed to work around these problems, at least partly (see Section 6.3.3 for more information), but in reality, if a large amount of XML markup did appear on the Web today, generalized XML searching would be almost useless, given the hundreds of incompatible XML-based vocabularies. The best people can hope for is specialized searching inside repositories or across Web collections in which all XML documents share a common type. Conceptually, this is the equivalent of a site search engine rather than a Web search engine, and it falls far short of a revolution in Web searching. Even then, searching will be more complex than the most difficult "advanced search" page currently available on full-text Web search engines. Either users will have to become experts in XML structure, or they will have to limit themselves to a few precooked searches, such as "Search for a person" or "Search for a part number," through Web sites that can construct an XML query for them.
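A precooked search of this kind might look roughly like the following sketch, again using Python's xml.etree.ElementTree. The inventory document, its element names (part, number, description), and the helper functions are all assumptions made for illustration. The point is only that the site, not the user, knows the document structure and builds the path expression from a single form field.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository in which every document shares one known structure.
PARTS = """
<inventory>
  <part><number>A-100</number><description>Left bracket</description></part>
  <part><number>B-200</number><description>Right bracket</description></part>
</inventory>
"""

def part_number_query(part_number):
    """Build the path expression behind a canned 'Search for a part number' form."""
    # Reject quotes so the user's input cannot break out of the string literal.
    if "'" in part_number or '"' in part_number:
        raise ValueError("quotes are not allowed in a part number")
    return f".//part[number='{part_number}']"

def search_part(xml_text, part_number):
    """Run the canned query and return the descriptions of matching parts."""
    root = ET.fromstring(xml_text)
    matches = root.findall(part_number_query(part_number))
    return [part.findtext("description") for part in matches]

query = part_number_query("B-200")
results = search_part(PARTS, "B-200")
print(query)
print(results)
```

The user types only "B-200"; everything structural is fixed in advance, which is why such a form works only within a collection whose documents share a common type.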

In the end, XML searching may be useful for specific project applications. But usability issues alone make it seem unlikely that XML will ever cause the social revolution in Web searching that some supporters hoped for when the specification first appeared.
