Omegahat Statistical Computing

Ideas for statistical computing

Archive for the ‘R’ Category

Topics related to R rather than more general topics. This is going to be a hard one, as I tend to think of things in more general terms.

Rllvm

Posted by omegahat on September 1, 2010

Over the past 10 years, I have been torn between building a new stat. computing environment
or trying to overhaul R. There are many issues on both sides. But the key thing is to
enable doing new and better things in stat. computing rather than just making the existing things
easier and more user-friendly.

If we are to continue with R for the next few years, it is essential that it get faster.
There are many aspects to this. One is compiling interpreted R code into something faster.
LLVM is a compiler toolkit that makes it much easier to generate machine code. So in the past few days
I have looked into this and developed an R package that provides R bindings to some of
the LLVM functionality.

The package is available from http://www.omegahat.org/Rllvm, as are several examples
of its use.
I used the package to implement a compiled version of one of Luke Tierney’s compilation examples,
which uses a loop in R to add 1 to each element of a vector. The compiled version is about
100 times faster than the interpreted R code. It is still slower than x + 1
in R, which is implemented in C (and does more), but it is also faster than bytecode interpreter approaches. So this is a promising start.
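
For reference, here is a sketch of the kind of interpreted loop being compiled, next to the vectorized x + 1. It is plain R, not Rllvm code (the Rllvm API is not shown here), and the function name add1_loop is made up for illustration. The timings depend on your machine and R version.

```r
# Hypothetical illustration: add 1 to each element with an explicit R loop.
add1_loop <- function(x) {
  for (i in seq_along(x))
    x[i] <- x[i] + 1
  x
}

x <- as.numeric(seq_len(1e6))

# The loop and the vectorized (C-level) arithmetic give the same answer.
stopifnot(all.equal(add1_loop(x), x + 1))

# Compare the interpreted loop with the built-in vectorized version.
print(system.time(add1_loop(x)))
print(system.time(x + 1))
```

Note that since R 3.4.0 functions are byte-compiled by the JIT by default, so the ratio you see on a current R will not match the 2010 figures.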

Of course, it would be nicer to leverage an existing compiler! (Think SBCL and building on top of LISP).


Posted in Language, R, Uncategorized

Rffi

Posted by omegahat on September 1, 2010

A few weeks ago, I posted the Rffi package on the Omegahat repository.
It is an interface to libffi, a portable library for invoking native routines
without having to write and compile any wrapper routines in the native language.
In other words, we can use it to call C routines from R using only R code.
This lets us call arbitrary routines and get back arbitrary values, including structures,
arrays, unions, etc.

One could use the RGCCTranslationUnit package to obtain descriptions of routines and data
structures and then generate the interfaces to those routines via functions in Rffi.

Writing or generating C/C++ wrapper code (see RGCCTranslationUnit) is still the way to
go in many ways, but Rffi is very convenient for dynamic invocations, since there is no
wrapper code to write and compile first.

As usual, you can install this from source from the Omegahat repository:

install.packages("Rffi", repos = "http://www.omegahat.org/R", type = "source")

but you will first need to have libffi installed.

Posted in Language, R, Uncategorized

RXQuery

Posted by omegahat on March 24, 2010

I have put up a new version of the RXQuery package, which interfaces to the Zorba XQuery engine. This version is compatible with the external-function interface in the 1.0.0 release of Zorba.

The package allows one to use XQuery from within R and to use R functions within XQuery scripts.

Posted in R, Uncategorized, XML

Package Releases

Posted by omegahat on March 20, 2010

I just put a new version of the XML package on the Omegahat repository.

There is a new version of the RKML package which handles large datasets much more rapidly.

Also, I put up a new package named RJSCanvasDevice, which implements an R graphics device that creates JavaScript code that can subsequently be displayed on a JavaScript canvas in an HTML document.

Posted in R

Posted by omegahat on March 17, 2010

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. They use namespace prefixes (prefix:nodeName) without the corresponding namespace declarations (xmlns:prefix="uri"). How do we fix these? The XML parser can read such documents, but it raises an error for each undefined prefix. We can collect these error messages as the parser reports them, extract the undefined prefixes from them, add the corresponding namespace declarations to the root of the document, and then re-parse the result. Here is the code. It will make it into the XML package.

library(XML)

fixXMLNamespaces =
  #
  #  call as
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
  #  or
  #   dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org", 'data-type' = "http://www.morecrap.org")
  #
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...))
{
    # collect the error messages (without printing each one immediately)
  e = xmlErrorCumulator(, FALSE)
  doc = xmlParse(doc, error = e)

    # e is a function; the messages live in its environment
  msgs = unique(environment(e)$messages)
  if(length(msgs) == 0)
     return(doc)

     # find the ones that refer to prefixes that are not defined
     # and extract the prefix from each message
  ns = grep("^Namespace prefix .* not defined", msgs, value = TRUE)
  ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", trimws(ns)))

     # turn a list into a named character vector: prefix -> URI
  if(is.list(.namespaces))
    .namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))

     # only use the undefined prefixes for which we were given a URI
  uris = .namespaces[intersect(ns, names(.namespaces))]
  if(length(uris)) {
       # set those namespaces on the root node, then re-parse the serialized document
     for(id in names(uris))
        newXMLNamespace(xmlRoot(doc), uris[[id]], id)
     xmlParse(saveXML(doc), asText = TRUE)
  } else
     doc
}

(I’ve made some minor changes thanks to Hin-Tak’s suggestions, but haven’t tested them.)
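
The message-parsing step can be tried on its own, without a malformed document. This is a sketch: the messages below are made-up samples written in the form libxml2 uses for an undefined prefix ("Namespace prefix <prefix> on <node> is not defined"), and the prefix-to-URI mapping reuses the example URIs from the comments in the function.

```r
# Hypothetical sample parser messages, including duplicates and an unrelated error.
msgs <- c("Namespace prefix gating on Gate is not defined\n",
          "Namespace prefix data-type on Keyword is not defined\n",
          "Namespace prefix gating on Gate is not defined\n",
          "Opening and ending tag mismatch: a line 1 and b\n")

# Keep only the undefined-prefix messages, then pull out the prefix itself.
ns <- grep("^Namespace prefix .* not defined", unique(msgs), value = TRUE)
ns <- unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", trimws(ns)))

# Map prefixes to URIs; any prefix without a supplied URI is dropped.
known <- c(gating = "http://www.crap.org", "data-type" = "http://www.morecrap.org")
uris <- known[intersect(ns, names(known))]
print(uris)
```

Using intersect() rather than indexing with ns directly avoids NA entries when a prefix appears in the messages but no URI was supplied for it.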

Posted in R, Uncategorized, XML