Omegahat Statistical Computing

Ideas for statistical computing

Archive for the ‘Uncategorized’ Category

Twitter API with OAuth2 using R

Posted by omegahat on October 13, 2014

I just put together some code to collect tweets from Twitter’s search API for some students at Davis.  A brief document describing the approach and the code itself is available at https://github.com/duncantl/TwitterOAuth2.git. It is not completely robust, but it does illustrate how to

  • use OAuth2 for the application-only authentication,
  • deal with rate-limiting, and
  • cursor through the result set of a single query.
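
The three pieces listed above can be sketched in R roughly as follows. This is not the code in the repository. It is a minimal sketch that assumes the httr and jsonlite packages for the network calls, and it uses the v1.1 endpoints (oauth2/token and search/tweets.json) and the x-rate-limit-* response headers as Twitter documented them at the time. All function names are hypothetical. Only the helpers that do no network I/O (bearerCredentials, secondsToWait, nextMaxId) can be tried without credentials.

```r
## Hypothetical helper names; network calls assume the httr and jsonlite
## packages and the v1.1 endpoints as documented at the time.

# Application-only auth: the Basic credentials are the URL-encoded key and
# secret joined by ":"; getBearerToken() base64-encodes them.
bearerCredentials = function(key, secret)
  paste(URLencode(key, reserved = TRUE), URLencode(secret, reserved = TRUE),
        sep = ":")

getBearerToken = function(key, secret) {
  creds = jsonlite::base64_enc(charToRaw(bearerCredentials(key, secret)))
  r = httr::POST("https://api.twitter.com/oauth2/token",
                 httr::add_headers(Authorization = paste("Basic", gsub("\n", "", creds))),
                 body = list(grant_type = "client_credentials"), encode = "form")
  httr::stop_for_status(r)
  httr::content(r)$access_token
}

# One page of search results, plus the rate-limit headers from the response.
searchPage = function(q, token, maxId = NULL, count = 100) {
  query = list(q = q, count = count)
  if (!is.null(maxId)) query$max_id = maxId
  r = httr::GET("https://api.twitter.com/1.1/search/tweets.json",
                httr::add_headers(Authorization = paste("Bearer", token)),
                query = query)
  httr::stop_for_status(r)
  h = httr::headers(r)
  list(tweets = httr::content(r)$statuses,
       remaining = as.numeric(h[["x-rate-limit-remaining"]]),
       reset = as.numeric(h[["x-rate-limit-reset"]]))
}

# Rate limiting: if no requests remain in the current window, wait until
# the reset time (seconds since the epoch) that the server reported.
secondsToWait = function(remaining, reset, now = as.numeric(Sys.time()))
  if (remaining > 0) 0 else max(0, reset - now)

# Cursoring: the next request asks for ids strictly below the smallest id
# seen so far. Ids exceed double precision, so work on the id_str strings.
decrementId = function(id) {       # assumes a positive decimal string
  d = as.integer(strsplit(id, "")[[1]])
  i = length(d)
  while (d[i] == 0) { d[i] = 9L; i = i - 1 }   # borrow from the left
  d[i] = d[i] - 1L
  sub("^0+(?=.)", "", paste(d, collapse = ""), perl = TRUE)
}
nextMaxId = function(ids)   # shorter strings are smaller; ties compare lexically
  decrementId(ids[order(nchar(ids), ids, method = "radix")][1])

collectTweets = function(q, token, pages = 5) {
  out = list()
  maxId = NULL
  for (p in seq_len(pages)) {
    page = searchPage(q, token, maxId)
    if (length(page$tweets) == 0) break
    out = c(out, page$tweets)
    maxId = nextMaxId(vapply(page$tweets, function(tw) tw$id_str, character(1)))
    Sys.sleep(secondsToWait(page$remaining, page$reset))
  }
  out
}
```

With an application's consumer key and secret, a call such as collectTweets("#rstats", getBearerToken(key, secret)) would then page backwards through the results of one query. The ids are kept as strings because tweet ids are larger than the 53-bit integers that R's doubles can represent exactly.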

The OAuth2 approach gives us a higher rate limit. There is also code to use the OAuth 1.0a mechanism by directly signing the request using the ROAuth package. This is quite simple using the ROAuth:::signRequest() function.

I am not the first to do this, and other people have posted aspects of it in various places. This tries to show all the pieces together.

Posted in Uncategorized

Data Science & Data Engineering

Posted by omegahat on April 15, 2014

I was very happy to participate in an NRC workshop on “Training Students to Extract Value from Big Data”. The discussion was very interesting, and there was a terrific mix of very impressive participants. Near the end, there was a slight point of contention. One dimension of it suggests a distinction that may be useful to make when talking about Data Science generally. Many of us talk about databases in Data Science; however, some people think of them in very ambitious, technically advanced and interesting ways. On the other hand, practitioners (and people thinking from the data analysis perspective) typically use databases in a very simple-minded manner, often one dictated by the provider of the data. As a result, many topics important in database design and implementation are not as relevant to consumers of databases. So perhaps it is important to think of two types of data scientist: the consumers of data, who focus on data analysis, and a different group of “data engineers”, who design data products and implementations and who can architect important frameworks for the analysts to consume.

So data science may be better described as having sub-categories of data analysts and data engineers.

Posted in Uncategorized | 1 Comment »

Rllvm

Posted by omegahat on September 1, 2010

Over the past 10 years, I have been torn between building a new statistical computing environment and trying to overhaul R. There are many issues on both sides. But the key thing is to enable new and better things in statistical computing, rather than just making the existing things easier and more user-friendly.

If we are to continue with R for the next few years, it is essential that it get faster. There are many aspects to this. One is compiling interpreted R code into something faster. LLVM is a compiler toolkit that makes it much easier to generate machine code. So in the past few days I have looked into this and developed an R package that provides R bindings to some of the LLVM functionality.

The package is available from http://www.omegahat.org/Rllvm, as are several examples of its use. I used the package to implement a compiled version of one of Luke Tierney’s compilation examples, which uses a loop in R to add 1 to each element of a vector. The compiled version is about 100 times faster than the interpreted R code. It is still slower than x + 1 in R, which is implemented in C and does more. The compiled version is also faster than bytecode interpreter approaches. So this is a reasonably promising start.

Of course, it would be nicer to leverage an existing compiler! (Think SBCL and building on top of LISP).

Posted in Language, R, Uncategorized | 3 Comments »

Rffi

Posted by omegahat on September 1, 2010

A few weeks ago, I posted the Rffi package on the Omegahat repository. It is an interface to libffi, a portable library for invoking native routines without having to write and compile any wrapper routines in the native language. In other words, we can use this in R to call C routines using only R code. This enables us to call arbitrary routines and get back arbitrary values, including structures, arrays, unions, etc.

One could use the RGCCTranslationUnit package to obtain descriptions of routines and data
structures and then generate the interfaces to those routines via functions in Rffi.

Writing or generating C/C++ code for wrappers (see RGCCTranslationUnit) is still the way to go in many ways, but Rffi is very convenient for dynamic invocations, since there is no setup cost of writing and compiling wrapper code.

As usual, you can install this from source from the Omegahat repository:

install.packages("Rffi", repos = "http://www.omegahat.org/R", type = "source")

but you will need to have libffi installed.

Posted in Language, R, Uncategorized

RXQuery

Posted by omegahat on March 24, 2010

I have released a new version of the RXQuery package, which interfaces to the Zorba XQuery engine. This version makes the package's external functions compatible with the 1.0.0 release of Zorba.

The package allows one to use XQuery from within R and to use R functions within XQuery scripts.

Posted in R, Uncategorized, XML

Posted by omegahat on March 17, 2010

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. They use namespace prefixes (prfx:nodeName) with no corresponding namespace declarations (xmlns:prfx="uri"). How do we fix these? The XML parser can read such documents, but it raises errors. We can catch these errors and post-process them: find the prefixes reported as undefined, add the corresponding namespace declarations to the root of the document, and then re-parse the resulting document. Here is the code. It will make it into the XML package.

library(XML)

fixXMLNamespaces =
  #
  #  call as
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
  #  or
  #   dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org", 'data-type' = "http://www.morecrap.org")
  #  where the names are the undefined prefixes and the values are the URIs to bind them to.
  #
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...)) 
{
    # collect the error messages rather than stopping at the first one
  e = xmlErrorCumulator(, FALSE)
  doc = xmlParse(doc, error = e)

    # e is a function, so check the messages it accumulated, not length(e)
  msgs = unique(environment(e)$messages)
  if(length(msgs) == 0)
     return(doc)

     # find the ones that refer to prefixes that are not defined
  ns = grep("^Namespace prefix .* not defined", msgs, value = TRUE)
  ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", ns))

     # turn a list of URIs into a named character vector
  if(is(.namespaces, "list"))
    .namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))

     # only use the undefined prefixes for which we were given a URI
  uris = .namespaces[intersect(ns, names(.namespaces))]
  if(length(uris)) {
       # now set those namespaces on the root of the document and re-parse
     mapply(function(id, uri)
              newXMLNamespace(xmlRoot(doc), uri, id),
            names(uris), uris)
     xmlParse(saveXML(doc), asText = TRUE)
  } else
     doc
}

(I’ve made some minor changes thanks to Hin-Tak’s suggestions, but haven’t tested them.)

Posted in R, Uncategorized, XML | 4 Comments »

Posting blog entries directly from R

Posted by omegahat on March 14, 2010

While looking more at how others were preparing blog content about R, I saw that at least one person was uploading content via a Python script. I like programmatic solutions, and since I am writing a book on XML and Web Technologies, including Web services, I looked into this. The mechanism used is XML-RPC. I have an XMLRPC package for R, so we can quickly use it to provide functionality in R that allows each of us to

  • query information from our blog
  • post blog items, append to a post, create new pages, add categories, etc.
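
The XML-RPC mechanism behind these operations is simple: each call is an HTTP POST whose body is a small XML document naming the method and listing its parameters. The following base-R sketch is not the code in the XMLRPC or RWordPress packages, and its function names are hypothetical. It builds such a body for metaWeblog.newPost, one of the methods WordPress exposes through its xmlrpc.php endpoint, which takes a blog id, a user name, a password, a struct describing the post, and a publish flag. Sending the request is not shown.

```r
# Escape the characters that are special in XML text.
xmlEscape = function(x) {
  x = gsub("&", "&amp;", x, fixed = TRUE)
  x = gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Encode one scalar R value as an XML-RPC <value>: named lists become
# <struct>, logicals <boolean>, integers <int>, everything else <string>.
# (Arrays and doubles are not handled in this sketch.)
rpcValue = function(x) {
  if (is.list(x)) {
    members = mapply(function(name, v)
                       sprintf("<member><name>%s</name>%s</member>",
                               xmlEscape(name), rpcValue(v)),
                     names(x), x)
    sprintf("<value><struct>%s</struct></value>", paste(members, collapse = ""))
  } else if (is.logical(x)) {
    sprintf("<value><boolean>%d</boolean></value>", as.integer(x))
  } else if (is.integer(x)) {
    sprintf("<value><int>%d</int></value>", x)
  } else {
    sprintf("<value><string>%s</string></value>", xmlEscape(as.character(x)))
  }
}

# Build the body of a <methodCall> from positional parameters.
rpcCall = function(method, ...) {
  params = vapply(list(...), function(p) sprintf("<param>%s</param>", rpcValue(p)),
                  character(1))
  paste0('<?xml version="1.0"?><methodCall><methodName>', method,
         "</methodName><params>", paste(params, collapse = ""),
         "</params></methodCall>")
}

body = rpcCall("metaWeblog.newPost", "1", "myLogin", "myPassword",
               list(title = "Blogging directly from R", description = "x < y"),
               TRUE)
```

POSTing body with a text/xml content type to the blog's xmlrpc.php URL would submit the post; the reply is another XML document, a methodResponse, that an R client then has to decode.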

So the RWordpress package is the result.

Correction: WordPress is taking the URL and capitalizing the p in RWordpress. This seems to happen only for words containing “wordpress”. So I have renamed the R package to RWordPress. The link on this page (even though the HTML has a lower-case p) now corresponds to the new package name. Thanks, Tal.

Posted in Uncategorized | 2 Comments »

Blogging directly from R – example

Posted by omegahat on March 14, 2010

This post is submitted directly from R using the RWordpress package.

newPost(list(description = 'This post is submitted directly from R using the RWordpress package.', title = 'Blogging directly from R - example'))

Posted in Uncategorized

Forgetting email attachments

Posted by omegahat on March 12, 2010

When I write email and say that I am attaching a document, I frequently send the mail and forget to attach the document. We’re all working too fast, and off it goes. So I thought I’d write an extension for Thunderbird to check for the string “attach”. Just as I was about to do this, I decided to check whether such an extension already existed. Sure enough, there was one: Attachment Reminder by Philipp Kewisch and Daniel Folkinshteyn. And it works nicely with my version of Thunderbird.

Posted in Uncategorized

R plots and Google Earth in the browser

Posted by omegahat on March 12, 2010

In addition to playing with KML as a graphics device, I wrote up some notes that introduce how to put Google Earth into a Web browser and then use elements of the JavaScript API for Google Earth. This progresses to show how to add HTML form elements such as buttons and checkboxes to control the Google Earth display. And it ends by showing how to put an SVG plot created in R beside the Google Earth plugin and have that SVG plot be interactive. As the viewer moves the mouse over the time series in the R plot, the Google Earth display is rotated to show the corresponding US city.  The example is not necessarily very compelling, but the mechanism should allow us to do a lot more interesting things.

Posted in Graphics, Uncategorized