musings on music and life

May 7, 2017

Course Updates

Filed under: Coding, Data Science — sankirnam @ 2:22 pm

As they say, the path to self-improvement never ends…

I just finished the final exam for the course MITx: 6.00.2x Introduction to Computational Thinking and Data Science on EdX, and this motivated another summary post, similar to what I wrote last year on its prequel course, MITx : 6.00.1x. These two courses make up an introductory sequence to computer science, primarily geared at non-majors; there is a similar corresponding course taught on the MIT campus. While 6.00.1x is focused on getting students up to speed with Python and using it to write simple programs, 6.00.2x then looks at more fundamental CS concepts (e.g. greedy algorithms, search trees, etc.).

The presentation of the course is excellent – all the movies are in HD, and the text is clearly visible on the screen. Code snippets presented in the video lectures can also be downloaded later so that you can play with them. While Prof. Guttag’s lecturing style may not be quite as engaging as Prof. Malan’s (Harvard CS50), the MIT rigor is definitely there in every slide.

When it comes to the material and choice of topics in the course, the instructors decided to go for breadth rather than depth, and this led to a very rushed coverage of a lot of topics. At the same time, in an introductory course like this, you will have a lot of non-majors taking the course, and you want to give them a flavor of everything the subject has to offer. I have the same issues with the standard introductory general chemistry curriculum that is used today at most universities – in those, the topic coverage doesn’t necessarily translate to knowledge that may be very relevant even for future chemistry courses. In any case, after taking this course, I have the confidence to take future courses in computer science/programming, and am especially interested in trying out some basic algorithms courses. While I may not have the chops yet to crack open Knuth and study that on my own, I think a guided approach in another class would be valuable.

The problem sets, as always, were appropriately challenging. I made it through to the end of the course, which means that I probably fared better than other students who may have dropped out, but among those who stuck till the end, I think I am one of the weaker students. The course has a corresponding Slack channel, and most of the students who took the final said on the Slack channel that they were able to finish the final far faster than I did (of course, this may also be subject to reporting bias). The course lays an emphasis on OOP (Object-Oriented Programming), and so this teaches you how classes, objects, and their instances are implemented in Python.

I did try taking 6.00.2x last year immediately after completing 6.00.1x, but I got hopelessly stuck on the first problem set, which involved implementing a greedy algorithm. This time around, I powered through it, and was also able to finish the rest of the course. I’ve become pretty good at debugging my code using print() statements, and from what I hear, this is an extremely important skill.
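For anyone wondering what a "greedy algorithm" looks like in practice, here is a minimal sketch. To be clear, this is not the actual problem set from the course; the function name greedy_pack and the item data are made up for illustration. The idea is simply to keep grabbing whatever looks best right now (here, the best value per unit of weight) until nothing else fits.

```python
def greedy_pack(items, capacity):
    """Pick items greedily by value-to-weight ratio.

    items is a list of (name, value, weight) tuples with positive weights;
    capacity is the maximum total weight allowed. Both the function and the
    data below are hypothetical, not taken from the course.
    """
    chosen = []
    remaining = capacity
    # Consider the items with the best value per unit of weight first.
    by_ratio = sorted(items, key=lambda item: item[1] / item[2], reverse=True)
    for name, value, weight in by_ratio:
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return chosen


items = [("clock", 175, 10), ("painting", 90, 9), ("radio", 20, 4), ("vase", 50, 2)]
picked = greedy_pack(items, capacity=20)
total_value = sum(value for name, value, weight in items if name in picked)
print(picked, total_value)
```

With this made-up data the greedy choice is the vase, the clock, and the radio, worth 245 in total, even though the clock plus the painting would fit and be worth 265. A greedy algorithm is fast and easy to write, but it is not guaranteed to find the best answer.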

I also took the course HarvardX: PH526x Using Python for Research (Edx) last year, and I figured that I would put my thoughts on that course in this post as well. This is a basic-to-intermediate level course that introduces the various Python libraries that are useful in scientific computing. Some of the elements of the Numpy stack are included (Numpy, pandas, matplotlib), as well as some other packages (Bokeh, cartopy, and others). As with any course, there is no way you can cover everything there is in any one of these packages, and so there is always a tradeoff for breadth vs. depth. 

All the coding assignments and homework problems for this course were done through DataCamp, which has its own quirks. I remember having issues getting a question involving PCA correct due to rounding errors (caused by implementing pca.fit_transform() vs. a sequential pca.fit() followed by a pca.transform()) which were not being accounted for by the grader.

PH526x also covers some vanilla python topics, including an introduction to list comprehensions, which is one of my favorite aspects of Python; once you understand the simple concept (e.g. initialize an empty list, iterate through something, and append to the list), you’ll begin to want to use it everywhere, and there’s nothing quite as satisfying as being able to write list comprehensions that compress several lines of code into one line. Prof. Onnela (the instructor) also covers the itertools module briefly, which is handy for generating things like “power sets”, which are used for coming up with brute-force algorithmic solutions.
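To make the list comprehension idea concrete, here is a small sketch using only the standard library. The variable names and the power_set helper are my own illustration rather than code from the course; the power set part follows the well-known recipe of chaining together combinations of every size.

```python
from itertools import chain, combinations

# The long way: initialize an empty list, iterate through something, and append.
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)

# The same thing compressed into a one-line list comprehension.
squares = [n * n for n in range(5)]


def power_set(items):
    """Return every subset of items as a list of tuples, smallest subsets first."""
    items = list(items)
    return list(chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    ))


subsets = power_set(["a", "b", "c"])
print(squares)
print(subsets)
```

A set of n items has 2**n subsets, so the brute-force approach of trying every subset only works for small inputs.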

In retrospect, I wish I had taken both of these courses before the “Data Science” bootcamp last year, but what’s done is done – actually, I wouldn’t have been able to, since PH526x was only released for the first time last November.

Another thing I’m curious about is the attrition rate for these courses; how many people actually finish? Knowing this might help to give me a better idea about whether I actually accomplished something significant or not.

As always, for those who are curious, I’m uploading everything to my github.

December 16, 2016

My experiences with learning “Data Science” in 2016

Filed under: Coding, Data Science — sankirnam @ 11:21 pm

Well, 2016 is drawing to a close…

This has been a weird year globally, with the death of a lot of influential people in history (including, among others, Muhammad Ali and lately J. Jayalalitha, the Chief Minister of Tamil Nadu, India), and some other strange political occurrences (Brexit and Trump getting elected). I haven’t posted here much because I have a million thoughts swirling around my mind all the time now, and finding a couple of hours of focused time in order to distill them down into an article on a single topic is a bit challenging. Nonetheless, there is something that I want to discuss today.

Firstly, I had the sobering realization a few days ago that it has been 2 years since I finished my PhD and I have nothing concrete to show for it; I’ve been unemployed for the past two years. Well, I’ve learned some valuable things about life and other topics which I wouldn’t have been able to learn otherwise, but it has been at a rather expensive cost: progress in my career.

In any case, one of the major themes of this year (for me) was that I made major progress in learning programming! I want to share what I learned so that others who are thinking of venturing down this path can learn from my experiences.

Firstly, my motivation in learning to code resulted in me being a little unfocused; I was unemployed and seeing a lot of people around me getting hired for cushy tech jobs with great salaries. Desperation shouldn’t be your only motivation for trying something. I was also unaware of the vast variety of “coding” jobs out there, and they can be quite different; CSS is considered “coding”, but it is vastly different from doing software development in C++, for instance.

I’m all for teaching computer science principles in the grade school level; the basics of control flow are not terribly complicated – it’s just logic, after all. Understanding looping, recursion, iterations, and conditionals does not require a very advanced background in any other subject, and knowing these will take you very far later on in life. I’m a strong believer that everyone should learn to code, given the increasing automation that is threatening all industries today. Those who can code will be the last people to have their jobs automated out of existence, pure and simple.

All this being said, I started my journey down this rabbit hole with Codecademy. I highly recommend this for others who also have no formal background in programming/CS, as it eases you into the relevant concepts of the language of your choice. It’s a great place to learn the higher-level languages (such as JavaScript, Python, and Ruby), but keep in mind that the courses are introductory, and very short (they can be completed in a few hours). They’re designed to give you just enough knowledge so that you can go out and keep learning on your own or from other sources.

After Codecademy, my next stop was FreeCodeCamp. FreeCodeCamp is amazing, and I hope it grows from strength to strength over time. It is the brainchild of Quincy Larson, and it attempts to create a fairly rigorous curriculum in Full-Stack Web Development starting from scratch; as with Codecademy, no prior knowledge is required. The first lesson is literally “Hello World!”. It starts off with a comprehensive coverage of the front-end (website building with HTML and CSS), and also covers responsive design using Twitter’s Bootstrap framework. It then progresses into JQuery and vanilla JavaScript, and it also has you work through some pretty tough algorithm challenges, which reinforce your understanding of all the methods and properties in JavaScript. The bonus with FreeCodeCamp is that it also has you working on projects, which can be incorporated into a portfolio so that you have something to show to prospective employers.

Web Development has the lowest barrier to entry among all the different types of programming, and that’s why places like FreeCodeCamp thrive. It was after doing it for a while that I realized webdev wasn’t for me, however; I don’t have the patience to mess around with DOM elements and get that alignment juuuuust right; if I really had to choose, I would be more comfortable doing back-end stuff.

I continued working on JavaScript and FreeCodeCamp while applying to programming bootcamps in April-May 2016, and eventually ended up taking a “Data Science” bootcamp by Logit in Hollywood. I wrote about it earlier, so there’s no need to reiterate what’s already been said. I felt like “Data Science” would be the best fit, given what I had experienced with programming thus far, and also (naively) thought it would give me the best ability to leverage my PhD.

I used the word “naively” in the previous paragraph; here’s what I learned:

  1. Getting a job after a bootcamp is all about how strong your resume is prior to the bootcamp. Now, that may not seem fair, as people want to go to bootcamps to “reset” their careers and get a fresh start, but the reality is that you really can’t learn much in just 12 weeks. And now that bootcamps are getting more popular, employers are looking for other ways to distinguish you from the hundreds or thousands of other people who are also taking bootcamps. Sure, you took a JavaScript bootcamp, but what else stands out? Do you have an advanced degree (MS/PhD) or did you go to a top university (Harvard/Stanford/MIT/Caltech/CMU etc.)? Do you have relevant prior work experience?
  2. In “Data Science”, degrees in CS, math, statistics, computational fields (e.g. computational biology), biostatistics, or physics are extremely sexy. If you have one, flaunt it as much as you can! Any other degree (including my PhD in Organic Chemistry, as I discovered) is worthless in this context. That’s because “Data Science” is a poorly defined field and a lot of employers still don’t know what they want. If you look at job descriptions, most will require knowledge of a scripting language (R or Python), Java, a lower-level language (C or C++), thorough understanding of SQL, and Bash scripting (on Linux). These are not things you can pick up in a few weeks at a bootcamp.
  3. The “Data Science” market is cooling off right now. A few years ago, there was a massive hype surrounding “Data Science”, and there were numerous articles talking about how there was a critical shortage of “Data Scientists” in the country. My experiences have shown the opposite, however – it took one of my friends in my cohort (who has a PhD in physics, one of the “sexy” subjects I mentioned above), about 4 months to land a job after the bootcamp.

So – what useful, actionable advice can I give after all this? What I can say is that if you want to learn “Data Science”, all the material is available online for free. The advantage with a bootcamp is that it gives you a roadmap of what to study, as well as connections – to your classmates, instructors, and other people who the organization is affiliated with. Out of all the courses I’ve seen and taken online regarding “Data Science”, this progression is probably the best, and most logical (feel free to leave comments if you have other suggestions):

  1. Start with Codecademy if you have 0 programming experience. If you want to get into Web Development, complete the JavaScript, HTML, CSS and related tracks, and then dive right into FreeCodeCamp. Otherwise, if you think you may want to do Data Science or want to have a broader understanding of CS fundamentals, stick with Python.
    N. B. Something to keep in mind: if you have no prior experience with programming, don’t worry about R. R is a specialized language for statistics; it is written by statisticians for statisticians, and the syntax is very challenging even for experienced programmers.
  2. Once you’ve completed Codecademy, the next course I would take is MIT’s 6.00.1x Intro CS course on EdX. I have taken this course myself and I have written about it. This course gives you a fantastic intro to the fundamentals of computer science at a fairly rigorous academic level, and it uses Python as well, so that should give you more practice with programming in vanilla Python. The follow-up course 6.00.2x is also good and covers more advanced topics including algorithms, random walks, and other topics, which should put you in a good position to learn more about “Data Science”.
  3. HarvardX’s PH526x course on EdX is a good follow up to this sequence, since it introduces a lot of the popular Python packages for “Data Science” including numpy, matplotlib, Pandas, and others. I also just finished the course earlier this week and will put my thoughts on it in a separate post here.
  4. Microsoft DAT210x on EdX is also highly recommended, and I also wrote about it after completing the course. This course gives plenty of practice with machine learning, and will put you in a good position to learn more about any of the algorithms in the course (K-Means, KNN, SVM, Random Forest, and others).

So – after taking all of these courses, THEN you can think about joining a bootcamp to further your knowledge. I wish I had done all the above courses before I did the “Data Science” bootcamp this summer; I would have been in a better position to learn, absorb, and better assimilate the material. But what’s done is done, and I’m continuing to learn Python, Machine Learning, and “Data Science” concepts at my own pace. I’m continuing to practice vanilla Python on Hackerrank, and you can follow my progress on my github – I’m trying to make github commits on a regular basis so that it makes a favorable impression on whoever happens to stumble across it! Interestingly, some of my repositories are getting a fair bit of traffic….so, you never know!

I sincerely hope that this rather “stream-of-consciousness” post helps you, if you do decide to venture down this path!

 

September 6, 2016

Aaand it’s done!

Filed under: Data Science — sankirnam @ 10:16 am

I just finished the immersive Data Science bootcamp by Logit on Friday and am still slowly recuperating from the experience. The course was intense – it was a firehose of material, and a rapid survey of 2 years worth of material at a Master’s level compressed into 12 weeks.

The course started with a tour of vanilla Python and the Data Science related packages (Numpy, Scipy, Pandas, Matplotlib, Seaborn, Scikit-Learn, statstools, and many more), and then covered basic statistics and probability, before going into Machine Learning, which was the main part of the course. Both unsupervised and supervised models were covered, as well as the major methods of regression, classification, and clustering (e.g. K-Means, K-Nearest Neighbors, SVM, Naive Bayes, Decision Trees and Random Forest). Regularization, resampling, and feature selection were also covered, as well as transformation (e.g. PCA). After making a simple midterm project to do an analysis of a publicly available data set (I chose to work with the Boston Housing Dataset), we moved on to Neural Networks, Time-series analysis, Natural Language Processing, and “Big Data”. As usual, if you are curious about the course materials, I’ll be uploading some of the assignments on my github.

As I mentioned above, this course was really tough. The fact that I was also commuting 2 hours each way did not make it any easier, either. It was my first time ever taking a formal class in any kind of programming or computer science – my only regret now is that I wish I had started studying this sooner! Even if I do not end up in a Data Science-related job, these skills are nonetheless enormously useful.

Now that the course is done, I’m back to where I was 3 months ago – unemployed and back on the job hunt. I’m scheduled to meet with a recruiter today, so hopefully something good pans out! Let’s hope. My goal is to hopefully get a job in the intersection of this and chemistry – ideally in cheminformatics, or using Machine Learning models in drug discovery. Even Analytical chemistry positions would not be too bad – these programmatic data analysis methods can be readily applied in that area too. If that does not work out, then I’m considering applying to Master’s programs in CS. This is a really fascinating field, and I would like to get a better foundation in this area.

But yea, now that the course is done, there will be more posts here! Watch this space…

August 14, 2016

Microsoft DAT210x

Filed under: Data Science, education — sankirnam @ 12:15 pm

I recently completed the course Microsoft DAT210x: Programming with Python for Data Science on Edx, so I’ll just take a moment to review it here. I’m still taking the Data Science bootcamp by Logit, and took this as a supplement to get additional exercises with Python, Pandas, Scikit-Learn, and other Data Science-related packages.

The course was just introduced by Microsoft last month as part of their “Online Data Science Degree Program”. As such, I took the course from July-August, and this was the first iteration (the course just ended last Friday). That being said, 6 weeks is not much time to teach something as broad as “Data Science”. The course starts with an introduction to the subjects of Data Science and Machine Learning, and then progresses into an introduction to Pandas, which is a Python package for the manipulation of DataFrames, similar to what you do in R. After also covering a brief survey of 2-D and 3-D visualizations with Pandas and matplotlib, the course then covers data transformations and dimensionality reduction, namely PCA and Isomap (for non-linear dimensionality reduction). After that, the course covers several important algorithms used in supervised and unsupervised Machine Learning, including K-means clustering, K-Nearest Neighbors classification, Ordinary Linear Regression and Multiple Linear Regression, Support Vector Machines, Decision Trees, Random Forest Classifiers, and a final rush through confusion matrices, cross-validation, pipelining, and tuning parameters with GridSearchCV.

Given that this was the first iteration of the course, my experience was pretty good. The course could use a little more polish in the presentation of the online materials, the quizzes, and the programming assignments. There were minor typos all over the place (though they didn’t really impede understanding of the material), and the quiz questions were rather ambiguous from time to time, much to my consternation. The explanations of the concepts themselves were a little wishy-washy, but when you’re trying to address a general audience there’s little else you can do. Links to the literature, textbooks, and further explanations are included, and should be read as well in order to gain a complete understanding of the subject matter.

The programming assignments were great, however. They were quite challenging, and I think I spent more than the recommended 4-8 hrs/week on them. They were all really interesting, and hard enough to stretch me without being so hard that I got completely frustrated and gave up. We used PCA and Isomap to project 3-D images into 2-D space, K-Means to identify people’s residences based on anonymized geolocation data from their cellphones (!), linear regression to reconstruct audio samples (sort of like what they do on TV shows!), SVC to analyze whether or not someone has Parkinson’s based on collected speech quality, and other interesting examples.

I would recommend people taking this course do as I did, and use it as a supplement for other courses. There’s no way you can learn everything there is to learn about a subject as broad as “Data Science” from one course, and it is good to take multiple courses because some of them explain certain concepts better than others. For example, this course covers Isomap, which is something that most other courses do not.

If you’re curious about the programming assignments, I’m posting them in my github.

June 17, 2016

Python Pandas

Filed under: Data Science — sankirnam @ 3:56 pm

Now that I’m learning how to use the Pandas package in Python, all I can think of is this:

[Image: Futurama panda dump truck]

May 17, 2016

On learning to code

Filed under: Coding, Data Science, education — sankirnam @ 11:05 am

Last week, the following article was published in TechCrunch: Please don’t learn to Code. This was swiftly followed by Quincy Larson’s reply, Please do learn to code.

For those who don’t know, Quincy Larson is the founder and director of FreeCodeCamp, an online programming education website that is disrupting the traditional paradigm of teaching programming/ CS. I’m going through it myself, and highly recommend it for anyone who wants to learn programming – the front-end web development curriculum is very well done, and it walks you through HTML, CSS (including responsive design with Bootstrap), JQuery, and JavaScript. Even if you do not necessarily want to go into webdev, this is a good place to start; it has you make projects to really cement your knowledge. Until I did this program, I had no idea how to make a website from scratch with HTML and CSS!

In any case, with regard to the articles I linked at the beginning, I am siding with Quincy Larson on the issue. Computers and digital devices are ubiquitous in our lives nowadays, and we spend at least 5 hours a day (a very conservative estimate) interacting with computers, whether it is in the form of desktop computers, servers, laptops, tablets, or mobile smartphones. Knowing how to use these devices is one thing, but that is the bare minimum; if you want to be truly productive in today’s society, you need to be able to get these devices to work for you, and that is where a knowledge of programming comes into the picture. In addition, with the rise of machine learning and increased automation, we’re beginning to see an increased number of jobs that were traditionally done by humans now being done by computers. This automation is beginning to seep into areas that are considered “high-skill”, such as organic synthesis. Thus, it’s like I say nowadays:

You don’t want to lose your job because someone else automates your position, right? You would rather be in a position where you automate someone else’s job. The only way to ensure that you are in the latter position is to learn programming/computer science.

The beauty of the field of programming/computer science is that it is extremely egalitarian, compared to other fields. In the programming arena, people care only about what you’ve done, what you’ve accomplished, and whether you know your stuff or not; educational pedigree is largely irrelevant. Contrast this to a field like organic chemistry, where if you do not have a degree from MIT/Caltech/Harvard/Stanford/Berkeley your resume will be swiftly thrown in the trash. This is why, in CS, it is now accepted that a GitHub profile is the new resume.

In other news, I have been applying to bootcamps for the last few weeks, in order to have something to do this summer given that the job situation in organic chemistry continues to remain abysmal. I know I have been scornful of bootcamps and “data science” in the past, but my reason for applying to these places is simple. I could learn the material on my own for free (or a significantly reduced cost), but it would take a long time – at least a year or two. If I can accelerate the process and learn everything in 12 weeks, then it is worth the extra cash, and after all, time is the most valuable asset we have in our lives. This video explains it pretty well:

After interviewing at several places, I was accepted to Codesmith, Logit Data Science, and Dev Bootcamp. I’ve decided to go with Logit Data Science simply because it makes more sense given my background; going into full-stack web development is orthogonal to my past education. There are pros and cons to all decisions; Logit is cheaper, but I’m going to be in the first cohort, so it remains to be seen how good the program is going to be. Also, given that my CS, math, and statistics backgrounds are very minimal, I’m anticipating that this is going to be extremely challenging. But sometimes, succeeding in life is all about risks and taking that first leap of faith! Codesmith is a little better established; they’ve been around for a year. I visited their campus/office a couple of weeks ago in Playa Vista, and was very impressed. The atmosphere is quite relaxed, but I did feel the “work hard, play hard” spirit there. The CEO, Will Sentance, is one of the main instructors there, and his teaching style is absolutely fantastic. He explains all the concepts thoroughly and clearly, and his enthusiasm for the subject is infectious. If you’re considering joining a full-stack bootcamp, I highly recommend Codesmith – do check them out! They are up there with Hack Reactor in terms of quality of instruction and overall experience.

April 20, 2016

ok now, this is getting a little ridiculous

Filed under: Coding, Data Science — sankirnam @ 11:26 pm

As part of my job search (which has been ongoing for the last year and a half now), I’m applying to several programming and “Data Science” bootcamps. I have posted my thoughts about “Data Science” before, but it seems the juggernaut is nigh unstoppable. During this process, I have experienced a multitude of things that I need to get down.

First off, I want to get a satisfactory answer to this question: If people with just 12 weeks of education can compete for the same jobs as computer science graduates from a university, does it mean that a CS degree is not really worth that much? On the flip side, the relative value of these skills is still pretty high – you can study chemistry for 10+ years, get a PhD, and end up unemployed (as in my case), or you can go through a bootcamp and code JavaScript and look forward to jobs with a minimum starting salary of $105,000 (so CS >>>>>>>>>>>>> chemistry, every time).

I have also heard that there are an astonishingly high number of CS graduates, even those with advanced degrees, who cannot do simple programming exercises like the “FizzBuzz” challenge or simple algorithms. So perhaps there are a large number of mediocre CS students who are getting through the university system and are unable to pass job interviews or fulfill job requirements. In chemistry, this would be like studying organic chemistry on paper but having trouble going into the lab and doing synthesis (or if you’re a theoretician, not being able to input and optimize a model system in a program like Gaussian or Spartan properly, and draw reasonable conclusions).
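For those who haven’t seen it, FizzBuzz asks you to count upwards, replacing multiples of 3 with “Fizz”, multiples of 5 with “Buzz”, and multiples of both with “FizzBuzz”. Here is a minimal sketch in Python; the function name fizzbuzz and the range of 1 to 15 are just my choices for illustration, and there are many other ways to write it.

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # divisible by both 3 and 5, so check this first
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


results = [fizzbuzz(n) for n in range(1, 16)]
print("\n".join(results))
```

The only real trap is the order of the checks: if you test for 3 or 5 before testing for both, 15 never comes out as “FizzBuzz”.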

The other thing that I have been told by a lot of people who studied computer science formally and are now practicing computer scientists (or programmers) is that “computer science ≠ programming”. While this may be obvious to those in the field, it is not obvious to those outside, such as myself; for a long time, I was laboring under the illusion that they were the same thing. Pure computer science is more akin to math or logic, and one spends a lot of time learning about abstract concepts such as Data Structures, and it is implied that students should be able to pick up programming skills along the way. The current rise of bootcamps and websites such as FreeCodeCamp and Codecademy has decoupled a “pure” CS education from that of programming; these programs get you coding first, usually with HTML, CSS, and JavaScript, without worrying about the underlying logic or science behind the code. Interestingly enough, when I asked interviewers at bootcamps about this (whether bootcamp graduates with a shallow theoretical CS education could compete with regular CS grads for programming jobs), they mentioned that bootcamp graduates were often competitive, simply because of their ability to code better and faster.

The analogous situation in chemistry would be decoupling experimental and theoretical chemistry – e.g. doing organic synthesis without knowing anything about the theory. Is this possible? We’ll never know, because I don’t think there will ever come a time where the demand for synthetic chemists will jump that high, to obscene levels beyond the ability of universities to produce sufficient graduates. At the same time, safety is the big consideration when comparing computer science and chemistry. If you screw up in CS, nobody will get hurt, but if you screw up in the chemistry lab, a range of things can happen, ranging from nothing (if you’re lucky), to killing yourself (if you’re not careful). But from an educational perspective, is it possible to teach “applied chemistry” in order to reach the masses, the same way websites like Codecademy, FreeCodeCamp, and Code School have revolutionized programming education to make it more egalitarian? Chemical concepts like equilibrium, reaction kinetics, etc. can be dry and theoretical; can you teach chemistry in a way to make it more understandable by the masses, but at the same time maintain the “tactility” required to really understand the subject that can only be achieved through lab work? This is a challenge for the next generation of instructors, and one that we as chemists all must face as we strive to prove to upcoming generations that our subject is relevant!

In any case, back to the subject of bootcamps. One of my friends mentioned earlier today:

“honestly you becoming a vanilla webdev is a waste of your talents and training
a lot of people can do that job
not many people can do research in organic chemistry”

Formatting is messed up because I copy-pasted this from a google chat. This friend does bring up a valid point though; why am I trying to go into CS? I have addressed this before, but I still have inner conflicts where I feel like I should keep trying for a job in chemistry (due to the sunk cost fallacy). In any case, this friend is forgiven for not having an accurate knowledge of the chemistry job market – that last statement is completely inaccurate, as there is a massive glut of people who can do research in organic chemistry.

But the sudden rise of bootcamps has got me thinking – is this indicative of another bubble? There are so many coding bootcamps now all over the US, and “Data Science” bootcamps are also springing up all over the place. BTW, the next person who tells me “with a PhD in science, you should think about going into ‘data science’!” is going to get a kick in a very sensitive place. Unfortunately, as I have learned, organic chemistry is not a “quantitative” discipline, and I have been rejected from The Data Incubator, Metis, and Insight for not having the correct background. Also, the programming background required for “data science” is rather steep, and it is not something that can be easily picked up if you don’t have prior training in CS or programming, which is why I’m looking into “vanilla webdev” bootcamps, as the entry requirements are easier for me to meet with my limited coding background.

As to the title of this post, today I came across this.

I have NO idea what to make of this – it’s a prep course to help you get into a bootcamp (o_O). This is like what goes on in India today – you have prep courses to help you get into prep courses for the IIT JEE entrance exam. This has me completely flummoxed, and is another indicator of how the demand for programmers is far exceeding the supply – App Academy (the company running the prep course) is simply cashing in on this trend. Is this indicative of another imminent bubble? One can’t predict the future, but it certainly does seem that way…

March 5, 2016

PhD Job Prospects

Filed under: Chemistry Jobs, Data Science, education — sankirnam @ 3:44 pm

A friend of mine sent these two articles and asked for my comments on them.

  1. A bridge to business
  2. Enterprising science

The first article talks about how valuable PhDs, postdocs, and PhD candidates are to management consulting firms. It goes into detail about the training that a lot of PhDs receive while working towards their degree, and argues that this training is just as valuable as what MBAs receive.

Now, it all sounds nice on paper, but my experience has been the polar opposite. I applied to several consulting firms last year and was either soundly rejected or received no response (which is quite common when applying for jobs online), and this is in spite of being one of those “[valuable] Science-PhD holders” the author talks about. So I really have no idea what management consulting firms are looking for.

The author also states:

“The broad set of valuable transferable skills that you developed while in graduate school go largely unrecognized and unarticulated within the academy. Most PhD graduates restrict their job searches to what they feel qualified to do, rather than exploring what they are capable of doing.”

Again, this trope sounds nice on paper, but my experience with applying for jobs has been quite the opposite. The whole idea of “transferable skills” only really holds in the tech industry, and that too for a small set of subjects (more on this in a moment).

The second article mentions that early-stage scientists (such as assistant professors, post-doctoral fellows, and PhD students) should also look into commercializing their successful ideas and forming start-up companies. This is solid advice. The article also mentions that professors are not the best people to be running start-up companies, due to the many demands on their time. That is better left to younger people. Of course, this comes with a caveat.

Applied sciences, engineering, and computer science are by their nature easier to commercialize, as opposed to theoretical or more “pure” fields. Problems that are academically interesting are not necessarily ones that will lend themselves to commercialization once investigated. Another issue is that startups are rarely founded off of PhD research because the interests of the advisor and the student are opposed at that point. The advisor will want the successful student to continue working, generating results and writing papers, while the student will want to leave to start the company. In any case, as the author mentions, it never hurts to allow PhD students opportunities to network with successful people in their field; this will help later when they apply for jobs! Sadly, most schools do a piss-poor job in this regard. In most universities, PhD career services are virtually nonexistent, as are networking events for graduate students.

In any case, back to the subject of transferable skills. From what I have seen, transferable skills are those secondary skills that you might pick up over the course of your degree that are not necessary for success in that field, but can be used somewhere else. For example, most PhD holders would have given talks at conferences at some point. Based on that, “making and giving presentations” can be listed as a skill, even though this is something that no self-respecting person would be caught putting on his or her resume. This skill is transferable to other fields where giving presentations is important, such as consulting. I’m not sure if this is a good example or not, but it is what I could think of.

Now, one transferable skill that is being thrown around a lot lately is “data analysis”. The author even refers to it in the first article I linked to above:

“If you have earned a PhD, you know, for example, how to analyse data. You also understand how to examine those results to gain insights.”

The term “data analysis” is beginning to seriously annoy me, because it is incredibly vague. A five-year-old putting his hand on a hot stove, screaming in pain, and then learning not to touch the stove again is doing data analysis! Yet would people call the five-year-old a “data scientist”? Even if others wouldn’t, I would – the kid has used evidence (even if it is a single datum) to draw a conclusion! So yes, in the broad sense, we are all “data scientists” and we all go about our day doing “data analysis” all the time, even if we do it unknowingly!

But the crux of the matter is that the type of data you will encounter varies from field to field, and the types of conclusions you can draw – the analysis, in other words – are domain-specific. Put simply, “data analysis” is not a transferable skill. This is a seemingly simple fact that unfortunately is being overlooked by recruiters, employers, and tech workers. For example, I can readily interpret NMR spectra, GC-MS data, and other types of spectra that are commonly encountered in a chemistry lab. However, I would be laughed at if I claimed to be doing “data analysis” in the sense that is used in the tech industry today! What the tech industry calls “data science” or “data analysis” is the statistical interpretation, most often using methods derived from computer science, of large sets of facts or figures that have been compiled. Case in point: Thanks to a friend, I got an interview a few days ago for a “data analytics” position. The HR recruiter who called me was thoroughly confused by my resume, and I had to clarify that even though I had a PhD in science, I had none of the skills that they were looking for. She told me “oh yea, we regularly hire people from a variety of backgrounds for this position…we have computer scientists, math majors, statisticians, and even physicists!”. Now, as far as transferable skills go, the people she listed probably have a very good command of computer science and programming, as well as a strong mathematics background. These skills are not generalizable to all scientists (just like how I would not expect a PhD computer scientist or statistician to be able to go into a chemistry lab and synthesize small molecules)!

As one of my friends told me, “…well, looks like you have a PhD in an inferior science”.

May 28, 2015

Data Science 2015

Filed under: Data Science — Tags: — sankirnam @ 2:18 pm

It appears that the situation has not changed significantly since I talked about “Data Science” here a year ago; in fact, the field seems to be growing. Anecdotally, I hear through word-of-mouth that there is an increasing demand for “data scientists”*, and my job searches on online forums (e.g. LinkedIn) seem to bear this out.

*I use the term “data science/data scientists” in quotes because nobody has really been able to define it properly. It seems to be a mish-mash of statistics and computer science.

The growing demand for “data scientists” is reflected in the proliferation of “data science” bootcamps across the US. NYC Data Science, NewMet Data, Metis, and General Assembly all conduct short (1-3 month) bootcamps to train “data scientists” for the “sexiest” job of the 21st century.

Previously, “data science” bootcamps were using a PhD/PhD candidacy as a heuristic for the “analytical” skills required in “data science”, which confused me greatly. Fortunately, it looks like they have collectively come to their senses and realized that all that is necessary is some programming experience and a rudimentary knowledge of statistics. A degree, even an advanced degree in “STEM”, does not serve as an indicator of the aforementioned skillsets. In fact, one could game the system by simply taking the necessary classes in college, dropping out, and then applying to these bootcamps. An even better way would be to take the courses online for an even lower (or no (!)) cost.

Johns Hopkins University now offers a series of courses on Coursera as part of a “Data Science” Specialization track. I think this is their attempt to cash in on the “data science” trend, and from what I can tell, course reviews are rather mixed. The first one in the sequence, “The Data Scientist’s Toolbox”, is extremely basic and can probably be completed in two hours or less (watching the video lectures is not necessary), as it seems to be geared to people who are complete novices even with Windows/MacOS. Based on that experience, you would think that the following course, “R Programming”, would also be aimed at a similarly green audience. WRONG. The course is poorly taught, with the video lectures rushing through key concepts. The online quizzes are fairly straightforward, if a little confusing (I remember getting tripped up on a question related to lexical scoping in R). But the programming assignments are the tough part. For those without prior programming experience, let alone R experience, these are extremely difficult. The leap in difficulty in the assignments as the class progresses is just too much; I eventually had to drop the course for this reason (fortunately Coursera gave me a full refund). I plan to eventually take it again and complete it once I get more experience in computer science or R.

August 5, 2014

“Data Science” bubble? Or not?

Filed under: Data Science — Tags: — sankirnam @ 10:15 am

Insight and The Data Incubator are offering bootcamps (similar to the one I mentioned by Zipfian Academy earlier) to train people as “Data Scientists” in the nascent field of “big data”. Supposedly, this is a burgeoning field with vast potential:

“The data science education market is far from overcrowded: Demand for data scientists continues to outstrip supply. The McKinsey Global Institute estimates that by 2018, the U.S. will face a shortage of 140,000 to 190,000 people equipped with the deep analytical skills necessary to make sense of big data.”

Does this sound familiar? It sounds like the usual political rhetoric of the “STEM shortage” in the US (which is patently false; there is a shortage of overqualified applicants willing to work for pennies on the dollar).

However, the bootcamps offered by Insight and The Data Incubator are different in that they are free:

“Beyond room and board expenses in New York City, the six-week program won’t cost participants a penny. Instead of student tuition, The Data Incubator will charge its employer partners if they decide to hire Data Incubator alumni as data scientists or quantitative analysts (quants).”

This sounds almost too good to be true, so naturally there must be some catch. One is that The Data Incubator’s program is more difficult to get into than Harvard (their acceptance rate is below 5%), but I can’t find anything else…

The other odd aspect of these programs is that applicants must be a PhD candidate or have a PhD. Being a PhD candidate myself, I don’t understand these people’s fixation with the title. A PhD is a very poor measure of intelligence or work ethic. I mean, sure, those with doctorates may be 1 or 2 standard deviations more intelligent than the median population, but if I were an employer, I wouldn’t put too much weight on that myself.

Success in research (especially in an experimental field like chemistry) depends much, much, much more on luck than on intelligence or hard work (I would say that it is 99.99% luck and 0.01% intelligence/hard work)! In fact, one of the professors in my building said he would prefer to look at prospective students’ horoscopes rather than their transcripts.

In theoretical fields (such as mathematics, philosophy, and finance to a degree), I suspect this may be different. The key factor there is not so much the discovery or observation of some natural phenomenon, as in experimental fields, but insight, which requires a deep understanding of the subject and a high degree of intelligence. Contrast this with an experimental field like organic chemistry, where one may have to try hundreds of different combinations of conditions by brute-force methods before finding one that works. One may get lucky and find it in the first few tries, or may run 500 reactions and still have no useful results (this has happened to me)! Due to this, success in research in a field dominated by stochastic results (i.e., an experimental field) is not a good measure of intelligence or work ethic.
