Posted by omegahat on April 15, 2014

I was very happy to participate in an NRC workshop on “TRAINING STUDENTS TO EXTRACT VALUE FROM BIG DATA”. The discussion was very interesting and there was a terrific mix of impressive participants. Near the end, there was some slight contention. One dimension of it suggests a distinction that may be useful when we talk about Data Science generally. Many of us talk about databases in Data Science. However, some people think of them in very ambitious, technically advanced and interesting ways. Practitioners, and people thinking from the data analysis perspective, on the other hand, typically think of using databases in a very simple-minded manner, and perhaps as something dictated by the provider of the data. As a result, many topics that are important in database design and implementation are less relevant to consumers of databases. So perhaps it is important to think of two types of data scientist: the consumer of data, who focuses on data analysis, and the “data engineer”, who designs data products and implementations and who can architect important frameworks for the analysts to consume.

So data science may be better described as having sub-categories of data analysts and data engineers.



  1. karenyng said

    Yep, there are articles talking about different types of data scientists and
    this mosaic plot summarizes what skills each type of data scientist would need: from

    The breakdown of the different types of data scientists might not be the most accurate.
    But according to my friend (math PhD) who just got a job as a data engineer,
    interview questions for data scientist positions do have quite different emphases:
    either they ask a lot of data structure / algorithm questions, or
    they give you a scatter plot of data and ask you how you would approach it (stat. analysis questions).
    It seems like your typical stat. student(s) will fall into the data researcher category.

