An Overview of Semistructured Data (1998) [69 citations — 1 self]
Abstract:
Semistructured data is motivated primarily by data exchange. Partners rarely agree on both a common data model and a common schema; hence, the reasoning goes, it is more convenient to exchange the data in a loose format. Semistructured data is tagged, i.e. every data item has a unique tag (also called label), and is nested, i.e. we allow one data item to contain other data items; formally, a semistructured data instance is a tree whose nodes are labeled with tags. Data can be structured loosely, and new tags may be invented at will. XML [Con98] is a standard syntax for describing such trees; Fig. 1 shows a tree representing a semistructured data instance and its XML syntax. We will refer interchangeably to semistructured data instances as trees or XML trees. But some sort of agreement is necessary nevertheless, to allow one application to understand the data produced by another. At minimum one should agree on the taxonomy, i.e. the meaning of the different tags. For example, an application must be able to understand the
[Fig. 1: tree of a semistructured data instance. Node labels recovered from the figure: catalog, product (three nodes), name, mfr-price, sales-price; leaf values: "Widget", 55, "Red".]
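The tagged, nested tree described in the abstract can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The tag names catalog, product, name and mfr-price and the values "Widget", 55 and "Red" come from the Fig. 1 labels; their arrangement in the document, the second product "Gadget", and the color tag are assumptions made only for illustration, as are the helper names to_tree and tags_used.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: tag names are taken from the Fig. 1 labels, but the
# nesting, the "Gadget" product, and the "color" tag are assumptions.
XML_DOC = """
<catalog>
  <product>
    <name>Widget</name>
    <mfr-price>55</mfr-price>
  </product>
  <product>
    <name>Gadget</name>
    <color>Red</color>
  </product>
</catalog>
""".strip()


def to_tree(elem):
    """Return (tag, children) for inner nodes and (tag, text) for leaves."""
    children = [to_tree(child) for child in elem]
    if children:
        return (elem.tag, children)
    return (elem.tag, (elem.text or "").strip())


def tags_used(elem):
    """Collect the set of tags that occur anywhere in the tree."""
    return {node.tag for node in elem.iter()}


root = ET.fromstring(XML_DOC)
tree = to_tree(root)
vocabulary = tags_used(root)
```

The two product nodes carry different children, which is the loose structure the abstract refers to: no schema forces every product to have the same fields. The set returned by tags_used is the vocabulary on which the exchanging applications would still need to agree.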

