Today’s search engines share three design decisions. First, they do not replicate the data found on the Web. Second, they rely on full-text indexes instead. Third, they do not support querying document structure. The main reason for the third decision is that HTML’s ability to express semantics through syntactic structure is very limited. XML is different because it allows for self-describing data: authors can invent arbitrary new element and attribute names, so semantics can be encoded in the syntax. Consequently, search engines for XML should support querying structure. In our current work on search engines for XML data on the Web, we keep the first two design decisions of traditional search engines but revise the third to meet the requirements that structural queries impose. Because our search engine accepts queries with structural information, a full-text index no longer suffices. What is needed is a scalable index structure that can answer queries over the structure of XML documents. We introduce one such index structure, the eXtended Access Support Relation (XASR). We also report on Mumpits, a search engine for XML data. Because Mumpits is a prototype, we intentionally kept its design and implementation very simple: its design is centered on a single XASR, and its implementation builds heavily on a commercial relational database management system.
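
To make the idea of a relational index over document structure concrete, the following is a minimal sketch, not the XASR as actually defined or as implemented in Mumpits. The abstract does not spell out the relation's layout, so the sketch assumes a common interval-numbering scheme: each element gets a `dmin` number when a depth-first traversal enters it and a `dmax` number when the traversal leaves it, and an element is a descendant of another exactly when its interval is nested inside the other's. The table name `xasr`, its columns, and the helpers `build_xasr` and `descendants` are hypothetical, and SQLite stands in for the commercial database system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_xasr(xml_text):
    """Load one XML document into an in-memory table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xasr ("
        "dmin INTEGER PRIMARY KEY, dmax INTEGER, "
        "tag TEXT, parent_dmin INTEGER)"
    )
    counter = 0

    def visit(elem, parent_dmin):
        nonlocal counter
        counter += 1
        dmin = counter              # number assigned on entering the element
        for child in elem:
            visit(child, dmin)
        counter += 1                # number assigned on leaving the element
        conn.execute(
            "INSERT INTO xasr VALUES (?, ?, ?, ?)",
            (dmin, counter, elem.tag, parent_dmin),
        )

    visit(ET.fromstring(xml_text), None)
    return conn


def descendants(conn, ancestor_tag, descendant_tag):
    """Return (ancestor dmin, descendant dmin) pairs via a self-join."""
    rows = conn.execute(
        "SELECT a.dmin, d.dmin FROM xasr a JOIN xasr d "
        "ON a.dmin < d.dmin AND d.dmax < a.dmax "   # interval containment
        "WHERE a.tag = ? AND d.tag = ? "
        "ORDER BY a.dmin, d.dmin",
        (ancestor_tag, descendant_tag),
    )
    return rows.fetchall()


if __name__ == "__main__":
    doc = (
        "<lib><book><title/><author><name/></author></book>"
        "<book><title/></book></lib>"
    )
    conn = build_xasr(doc)
    # Which <name> elements occur anywhere below a <book>?
    print(descendants(conn, "book", "name"))
```

Under these assumptions a structural condition such as "a `name` somewhere below a `book`" becomes a single self-join on one table, which is the kind of query an ordinary relational engine can evaluate without traversing the documents themselves.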