XML query library

Author(s): José Manuel Gómez Pérez.

This package provides a language for querying XML documents from a Prolog program. Constraint programming expressions can be included so that search is pruned as soon as a constraint becomes unsatisfiable, which improves efficiency. Facilities are also offered to speed up search by transforming XML documents into Prolog programs, which reduces search to running the program and takes advantage of Prolog's indexing capabilities.

Queries on an XML document have a recursive tree structure that allows the search to specify the XML element sought, its attributes, and its children. A constraint programming expression can be added as a suffix. A successful query returns values for the free variables it contains, and it also checks that the structure of the XML document matches the one described by the query.

The operators introduced are described below:

  • @ Delimits a subquery on an element's attribute, such as product@val(product_name, "car"), where the first argument is the attribute name and the second its value. Either of them can be a free variable, so it is possible to write queries like product@val(Name, "car"), which finds the name Name of each attribute of element product whose value is the string "car".

  • :: The right-hand side of the subexpression delimited by this operator is a query on the children elements of the element described on its left-hand side.

  • with Declares the constraints the items sought must satisfy.

Some examples of this query language (more can be found in the examples directory):

  • Example A:

product@val(product_name,"car")::(quantity(X), 
                                  'time-left'(Y), 
                                  negotiation::preference::price(Z)) 
       with X * Z .>. Y

  • Example B:

nitf::head::docdata::'doc-id'@val('id-string',"020918050")::(Y), 
                     body::'body.head'::abstract::p(X)
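
The nesting of the three operators in Example A can be made visible with a small, self-contained sketch. The operator priorities and the helper predicates query_parts/3, root_element/2 and example_a/1 are assumptions introduced only for illustration; they are not part of the library, and the priorities the library actually declares may differ.

```prolog
% Hypothetical operator declarations, chosen only so that the example
% parses as described in the text. The library's real ones may differ.
:- op(900, xfx, 'with').
:- op(700, xfx, '.>.').
:- op(200, xfy, '::').
:- op(150, xfx, '@').

% query_parts(+Query, -Path, -Constraint)
% Splits off the optional "with" suffix; Constraint is true if absent.
query_parts(Path with Constraint, Path, Constraint) :- !.
query_parts(Path, Path, true).

% root_element(+Path, -Tag)
% Tag is the name of the element at the root of the query path.
root_element(Parent :: _, Tag) :- !, root_element(Parent, Tag).
root_element(Elem @ _, Tag)    :- !, Tag = Elem.
root_element(Elem, Tag)        :- functor(Elem, Tag, _).

% Example A from the text, stored as a term.
example_a(product@val(product_name, "car")::(quantity(X),
                                             'time-left'(Y),
                                             negotiation::preference::price(Z))
          with X * Z .>. Y).
```

Under these assumed priorities, the goal example_a(Q), query_parts(Q, P, C), root_element(P, Tag) binds Tag to product and C to the constraint X * Z .>. Y, while P keeps the element, attribute and children subqueries.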


Usage and interface

Documentation on exports

PREDICATE

Usage:xml_search(Query,Source,Doc)

Checks a high-level query Query against an XML document Source. If the query succeeds, it returns in Doc the whole XML element(s) of the document that matched it.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Query is currently a term which is not a free variable.
    (term_typing:nonvar/1)Source is currently a term which is not a free variable.
    (term_typing:var/1)Doc is a free variable.
    (xml_path_types:canonic_xml_query/1)Query is a primitive XML query.
    (xml_path_types:canonic_xml_item/1)Source is either an XML attribute, an XML element or a line break.
    (xml_path_types:canonic_xml_item/1)Doc is either an XML attribute, an XML element or a line break.

PREDICATE

Usage:xml_parse(Query,Source,Doc)

Checks a high-level query Query against an XML document Source. If the query succeeds, it returns in Doc the whole XML element(s) of the document that matched it. Unlike xml_search/3, the query can start at any level of the XML document, not necessarily at the root node.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Query is currently a term which is not a free variable.
    (term_typing:nonvar/1)Source is currently a term which is not a free variable.
    (term_typing:var/1)Doc is a free variable.
    (xml_path_types:canonic_xml_query/1)Query is a primitive XML query.
    (xml_path_types:canonic_xml_item/1)Source is either an XML attribute, an XML element or a line break.
    (xml_path_types:canonic_xml_item/1)Doc is either an XML attribute, an XML element or a line break.
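
The difference between a query anchored at the root node and one that can start at any level can be sketched as follows. This is an illustration of the idea only, not the library's implementation: the env(Tag, Attributes, Children) document shape and the predicates element_at_any_level/2, rooted_match/3, anywhere_match/3 and sample_doc/1 are assumptions made for the example.

```prolog
:- use_module(library(lists)).   % member/2

% Assumed document shape: env(Tag, Attributes, Children). Any child that
% is not an env/3 term is treated as text. This may not coincide with
% the library's canonical XML terms.

% element_at_any_level(+Doc, -Elem)
% Elem is Doc itself or any element nested inside it, at any depth.
element_at_any_level(env(T, A, Cs), env(T, A, Cs)).
element_at_any_level(env(_, _, Cs), Elem) :-
    member(Child, Cs),
    element_at_any_level(Child, Elem).

% rooted_match(+Tag, +Doc, -Elem)
% Succeeds only if the root element of Doc has tag Tag.
rooted_match(Tag, env(Tag, A, Cs), env(Tag, A, Cs)).

% anywhere_match(+Tag, +Doc, -Elem)
% Succeeds for every element with tag Tag, whatever its depth.
anywhere_match(Tag, Doc, Elem) :-
    element_at_any_level(Doc, Elem),
    Elem = env(Tag, _, _).

% A small hypothetical document.
sample_doc(env(catalog, [],
               [env(product, [product_name=car],
                    [env(quantity, [], [10])])])).
```

With this sketch, sample_doc(D), rooted_match(product, D, E) fails because the root tag is catalog, whereas sample_doc(D), anywhere_match(product, D, E) succeeds with E bound to the nested product element.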

PREDICATE

Usage:xml_parse_match(Query,Source,Match)

Checks a high-level query Query against an XML document Source. If the query succeeds, it returns in Match the exact subtree of the XML document that matched it. Unlike xml_search_match/3, the query can start at any level of the XML document, not necessarily at the root node.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Query is currently a term which is not a free variable.
    (term_typing:nonvar/1)Source is currently a term which is not a free variable.
    (term_typing:var/1)Match is a free variable.
    (xml_path_types:canonic_xml_query/1)Query is a primitive XML query.
    (xml_path_types:canonic_xml_item/1)Source is either an XML attribute, an XML element or a line break.
    (xml_path_types:canonic_xml_item/1)Match is either an XML attribute, an XML element or a line break.

PREDICATE

Usage:xml_search_match(BasicQuery,SourceDoc,Match)

Checks query BasicQuery against an XML document SourceDoc. If the query succeeds, it returns in Match the exact subtree of the XML document that matched it.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)BasicQuery is currently a term which is not a free variable.
    (term_typing:nonvar/1)SourceDoc is currently a term which is not a free variable.
    (term_typing:var/1)Match is a free variable.
    (xml_path_types:canonic_xml_query/1)BasicQuery is a primitive XML query.
    (xml_path_types:canonic_xml_item/1)SourceDoc is either an XML attribute, an XML element or a line break.
    (xml_path_types:canonic_xml_item/1)Match is either an XML attribute, an XML element or a line break.

PREDICATE

Usage:xml_index_query(Query,Id,Match)

Matches a high-level query Query against an XML document previously transformed into a Prolog program. Id identifies the resulting document Match, which is the exact match of the query against the XML document.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Query is currently a term which is not a free variable.
    (term_typing:var/1)Id is a free variable.
    (term_typing:var/1)Match is a free variable.
    (xml_path_types:canonic_xml_query/1)Query is a primitive XML query.
    (basic_props:atm/1)Id is an atom.
    (xml_path_types:canonic_xml_item/1)Match is either an XML attribute, an XML element or a line break.

PREDICATE

Usage:xml_index_to_file(SourceDoc,File)

Transforms the XML document SourceDoc into a Prolog program, which is written to file File.

  • The following properties should hold at call time:
    (xml_path_types:canonic_xml_item/1)SourceDoc is either an XML attribute, an XML element or a line break.
    (basic_props:atm/1)File is an atom.

PREDICATE

Usage:xml_index(SourceDoc)

Transforms the XML document SourceDoc into a Prolog program, generating the associated clauses, which are stored dynamically in the memory space of the current process.

  • The following properties should hold at call time:
    (xml_path_types:canonic_xml_item/1)SourceDoc is either an XML attribute, an XML element or a line break.
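
The idea of turning a document into a Prolog program can be pictured with a minimal sketch. The element/4 fact layout and the child_text/3 predicate below are hypothetical and chosen for illustration; the clauses that xml_index/1 and xml_index_to_file/2 actually generate are not shown in this documentation and may look quite different.

```prolog
% element(Tag, Id, ParentId, Text): one fact per XML element.
% Hypothetical layout; not the clauses generated by the library.
element(product,  e1, root, '').
element(quantity, e2, e1,   '10').
element(price,    e3, e1,   '250').
element(product,  e4, root, '').
element(quantity, e5, e4,   '3').

% child_text(+ParentTag, +ChildTag, -Text)
% Text is the content of a ChildTag element directly below a
% ParentTag element. Both calls to element/4 have the tag bound,
% so a system with first-argument indexing can skip clauses for
% other tags instead of scanning the whole document.
child_text(ParentTag, ChildTag, Text) :-
    element(ParentTag, Pid, _, _),
    element(ChildTag, _, Pid, Text).
```

In this sketch the goal child_text(product, quantity, T) yields T = '10' and, on backtracking, T = '3'. Searching the document becomes ordinary clause resolution, which is the effect the indexing predicates aim for.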

PREDICATE

Usage:xml_query(Query,Doc,Match)

Checks that XML document Doc complies with the query Query, expressed in the low-level query language. The exact mapping of the query onto the document is returned in Match.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Query is currently a term which is not a free variable.
    (term_typing:nonvar/1)Doc is currently a term which is not a free variable.
    (term_typing:var/1)Match is a free variable.
    (xml_path_types:canonic_xml_query/1)Query is a primitive XML query.
    (xml_path_types:canonic_xml_item/1)Doc is either an XML attribute, an XML element or a line break.
    (xml_path_types:canonic_xml_item/1)Match is either an XML attribute, an XML element or a line break.

Documentation on internals

REGTYPE

(True) Usage:canonic_xml_term(XMLTerm)

XMLTerm is a term representing XML code in canonical form.

REGTYPE

(True) Usage:canonic_xml_item(XMLItem)

XMLItem is either an XML attribute, an XML element or a line break.

REGTYPE

(True) Usage:tag_attrib(Att)

Att is an XML attribute.

REGTYPE

(True) Usage:canonic_xml_query(Query)

Query is a primitive XML query.

REGTYPE

(True) Usage:canonic_xml_subquery(SQuery)

SQuery defines an XML subquery.