May 28, 2015

Data Science 2015

Filed under: Data Science

It appears that the situation has not changed significantly since I talked about “Data Science” here a year ago; if anything, the field seems to be growing. Anecdotally, I hear that there is an increasing demand for “data scientists”*, and my job searches on sites such as LinkedIn seem to bear this out.

*I use the term “data science/data scientists” in quotes because nobody has really been able to define it properly. It seems to be a mish-mash of statistics and computer science.

The growing demand for “data scientists” is reflected in the proliferation of “data science” bootcamps across the US. NYC Data Science, NewMet Data, Metis, and General Assembly all conduct short (1-3 month) bootcamps to train “data scientists” for the “sexiest” job of the 21st century.

Previously, “data science” bootcamps were using a PhD/PhD candidacy as a heuristic for the “analytical” skills required in “data science”, which confused me greatly. Fortunately, it looks like they have collectively come to their senses and realized that all that is necessary is some programming experience and a rudimentary knowledge of statistics. A degree, even an advanced degree in “STEM”, does not serve as an indicator of those skills. In fact, one could game the system by simply taking the necessary classes in college, dropping out, and then applying to these bootcamps. A better way still would be to take the courses online at a lower (or no (!)) cost.

Johns Hopkins University now offers a series of courses on Coursera as part of a “Data Science” Specialization track. I think this is their attempt to cash in on the “data science” trend, and from what I can tell, course reviews are rather mixed. The first one in the sequence, “The Data Scientist’s Toolbox”, is extremely basic and can probably be completed in two hours or less (watching the video lectures is not necessary), as it seems to be geared to people who are complete novices even with Windows/macOS. Based on that experience, you would think that the following course, “R Programming”, would also be aimed at a similarly green audience. WRONG. The course is poorly taught, with the video lectures rushing through key concepts. The online quizzes are fairly straightforward, if a little confusing (I remember getting tripped up on a question related to lexical scoping in R). But the programming assignments are the tough part. For those without prior programming experience, let alone R experience, these are extremely difficult. The leap in difficulty from one assignment to the next as the class progresses is just too much; I eventually had to drop the course for this reason (fortunately, Coursera gave me a full refund). I plan to take it again and complete it once I get more experience in computer science or R.



