Omegahat Statistical Computing

Ideas for statistical computing

Archive for the ‘XML’ Category

Topics related to XML, generally

RXQuery

Posted by omegahat on March 24, 2010

I have posted a new version of the RXQuery package, which interfaces to the Zorba XQuery engine. This version makes the package's support for external functions compatible with the 1.0.0 release of Zorba.

The package allows one to use XQuery from within R and to use R functions within XQuery scripts.

Posted in R, Uncategorized, XML

Posted by omegahat on March 17, 2010

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. They use namespace prefixes (prefix:nodeName) that have no corresponding namespace declarations (xmlns:prefix="uri"). How do we fix these? The XML parser can read such a document, but it raises errors as it does so. We can catch those errors, work out from the messages which prefixes are undefined, add the matching namespace declarations to the root of the document, and then re-parse the result. Here is the code. It will make it into the XML package.

fixXMLNamespaces =
  #
  #  call as
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
  #  or
  #   dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org", 'data-type' = "http://www.morecrap.org")
  #
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...)) 
{
    # collect the error messages
  e = xmlErrorCumulator(, FALSE)
  doc = xmlParse(doc, error = e)

     # e is a function, so length(e) is always 1; check the collected messages instead
  msgs = unique(environment(e)$messages)
  if(length(msgs) == 0)
     return(doc)

     # find the ones that refer to prefixes that are not defined
  ns = grep("^Namespace prefix .* not defined", msgs, value = TRUE)
  ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", ns))

    # now set those name spaces on the root of the document
  if(is(.namespaces, "list"))
    .namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))

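     # only use prefixes the caller supplied a URI for; indexing by an unknown name would give NA
  uris = .namespaces[intersect(ns, names(.namespaces))]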
  if(length(uris)) {
     mapply(function(id, uri)
              newXMLNamespace(xmlRoot(doc), uri, id),
            names(uris), uris)
     xmlParse(saveXML(doc), asText = TRUE)
  } else
     doc
}

(I’ve made some minor changes thanks to Hin-Tak’s suggestions, but haven’t tested them.)
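The message-handling step in the function above can be tried on its own, without parsing a document. The following is only a sketch in base R: the messages are made up to resemble the wording that the grep() pattern expects, the element names and the URI are placeholders, and the variable names (messages, prefixes, namespaces, unresolved) are hypothetical rather than part of the XML package.

```r
# Hypothetical parser messages, written to resemble the wording that the
# grep() pattern in fixXMLNamespaces() looks for. These are not real output.
messages = c("Namespace prefix gating on Population is not defined",
             "Namespace prefix data-type on Parameter is not defined",
             "Namespace prefix gating on Gate is not defined",
             "Opening and ending tag mismatch: a line 3 and b")

# Keep only the messages that complain about an undeclared prefix.
ns = grep("^Namespace prefix .* not defined", unique(messages), value = TRUE)

# Extract the prefix itself: the first word after "Namespace prefix ".
prefixes = unique(sub("^Namespace prefix ([^ ]+) .*", "\\1", ns))

# Caller-supplied map from prefix to URI (the URI here is a placeholder).
namespaces = c(gating = "http://example.org/gating")

# Look up URIs only for prefixes we know about, and note the rest.
uris = namespaces[intersect(prefixes, names(namespaces))]
unresolved = setdiff(prefixes, names(namespaces))

print(prefixes)    # "gating" "data-type"
print(uris)        # named vector with one entry, for "gating"
print(unresolved)  # "data-type"
```

Any prefix left in unresolved has no URI to declare, so the re-parsed document would presumably still produce an error for it. A caller may want to check for that before relying on the result.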

Posted in R, Uncategorized, XML