musings on music and life

May 7, 2017

Course Updates

Filed under: Coding, Data Science — sankirnam @ 2:22 pm

As they say, the path to self-improvement never ends…

I just finished the final exam for the course MITx: 6.00.2x Introduction to Computational Thinking and Data Science on edX, and this motivated another summary post, similar to what I wrote last year on its prequel, MITx: 6.00.1x. These two courses make up an introductory sequence in computer science, aimed primarily at non-majors; a corresponding course is taught on the MIT campus. While 6.00.1x focuses on getting students up to speed with Python and using it to write simple programs, 6.00.2x looks at more fundamental CS concepts (e.g. greedy algorithms, search trees, etc.).

The presentation of the course is excellent – all the videos are in HD, and the text is clearly visible on the screen. Code snippets presented in the video lectures can also be downloaded afterwards so that you can play with them. While Prof. Guttag’s lecturing style may not be quite as engaging as Prof. Malan’s (Harvard CS50), the MIT rigor is definitely there in every slide.

When it comes to the material and choice of topics, the instructors went for breadth rather than depth, and this led to very rushed coverage of a lot of topics. At the same time, an introductory course like this attracts a lot of non-majors, and you want to give them a flavor of everything the subject has to offer. I have the same issue with the standard introductory general chemistry curriculum used at most universities today – there, the topics covered don’t necessarily translate into knowledge that is relevant even for later chemistry courses. In any case, after taking this course, I have the confidence to take further courses in computer science/programming, and am especially interested in trying out some basic algorithms courses. While I may not have the chops yet to crack open Knuth and study it on my own, I think a guided approach in another class would be valuable.

The problem sets, as always, were appropriately challenging. I made it through to the end of the course, which means that I probably fared better than the students who dropped out, but among those who stuck it out to the end, I think I am one of the weaker students. The course has a corresponding Slack channel, and most of the students who took the final said there that they were able to finish it far faster than I did (of course, this may also be subject to reporting bias). The course emphasizes OOP (Object-Oriented Programming), so it teaches you how classes and their instances (objects) are defined and used in Python.
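For readers who have not seen Python classes before, here is a minimal sketch of the class/instance distinction. The Course class and its attributes are made up for this post and are not taken from the course materials.

```python
class Course:
    """A hypothetical class, used only to illustrate classes vs. instances."""

    def __init__(self, code):
        # Instance attributes: every object created from the class
        # gets its own copy of these values.
        self.code = code
        self.completed = False

    def finish(self):
        # A method acts on the particular instance it is called on (self).
        self.completed = True

    def __str__(self):
        status = "done" if self.completed else "in progress"
        return f"{self.code} ({status})"


# Two separate instances of the same class.
intro = Course("6.00.1x")
sequel = Course("6.00.2x")

# Changing one instance leaves the other untouched.
sequel.finish()
print(intro)
print(sequel)
```

The class is the blueprint; intro and sequel are two independent objects built from it, which is why calling finish() on one does not affect the other.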

I did try taking 6.00.2x last year immediately after completing 6.00.1x, but I got hopelessly stuck on the first problem set, which involved implementing a greedy algorithm. This time around, I powered through it, and was also able to finish the rest of the course. I’ve become pretty good at debugging my code using print() statements, and from what I hear, this is an extremely important skill.
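To give a flavor of what a greedy algorithm looks like, here is a rough sketch. It is not the actual problem set: the greedy_pack function, the items, and their values and weights are all invented for illustration. The idea is to repeatedly take the locally best option (here, the best value-to-weight ratio) that still fits, with a print() call of the kind I lean on for debugging.

```python
def greedy_pack(items, capacity):
    """Greedily choose items by descending value/weight ratio.

    items: list of (name, value, weight) tuples (hypothetical data).
    capacity: total weight that can be carried.
    """
    chosen = []
    remaining = capacity
    # Sort so that the item with the best ratio is considered first.
    by_ratio = sorted(items, key=lambda item: item[1] / item[2], reverse=True)
    for name, value, weight in by_ratio:
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
            # print()-style debugging: watch the state after every choice.
            print(f"took {name}, remaining capacity {remaining}")
    return chosen


items = [("laptop", 60, 10), ("camera", 100, 20), ("guitar", 120, 30)]
result = greedy_pack(items, 50)
print(result)  # ['laptop', 'camera']
```

Note that the greedy choice here is worth 160, while taking the camera and the guitar would fit too and be worth 220. Greedy algorithms are quick to write and run, but they do not always find the best answer.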

I also took the course HarvardX: PH526x Using Python for Research (edX) last year, and I figured that I would put my thoughts on that course in this post as well. This is a basic-to-intermediate level course that introduces the various Python libraries that are useful in scientific computing. Some elements of the NumPy stack are included (NumPy, pandas, matplotlib), as well as some other packages (Bokeh, Cartopy, and others). As with any course, there is no way to cover everything in any one of these packages, so there is always a tradeoff between breadth and depth.

All the coding assignments and homework problems for this course were done through DataCamp, which has its own quirks. I remember having trouble getting a question involving PCA marked correct because of rounding differences (caused by calling pca.fit_transform() vs. a sequential pca.fit() followed by pca.transform()) that the grader did not account for.
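I don’t know how the DataCamp grader actually compares answers, and the snippet below does not use PCA or scikit-learn at all. It is only a small standard-library sketch of the general problem: two calculations that should agree on paper can differ in the last few bits as floating-point numbers, so an exact equality check fails where a tolerance-based one passes.

```python
import math

# Two routes to what is mathematically the same number.
one_way = sum([0.1] * 10)   # add 0.1 ten times
other_way = 0.1 * 10        # one multiplication

# Exact comparison is fragile: the results differ in the last bits.
exact_match = one_way == other_way

# Comparing within a tolerance treats them as the same answer.
close_match = math.isclose(one_way, other_way, rel_tol=1e-9)

print(one_way, other_way)
print(exact_match, close_match)  # False True
```

A grader that used something like math.isclose (or rounded both values before comparing) would accept either route.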

PH526x also covers some vanilla Python topics, including an introduction to list comprehensions, which are one of my favorite aspects of Python. Once you understand the simple pattern they replace (initialize an empty list, iterate through something, and append to the list), you’ll begin to want to use them everywhere, and there’s nothing quite as satisfying as writing a list comprehension that compresses several lines of code into one. Prof. Onnela (the instructor) also briefly covers the itertools module, which is handy for generating things like “power sets”, which come up in brute-force algorithmic solutions.
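Here is a short sketch of both ideas. The variable names are my own, and the powerset helper follows the well-known recipe built from itertools.chain and itertools.combinations rather than anything specific to the course.

```python
from itertools import chain, combinations

# The loop pattern: start with an empty list, iterate, append.
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)

# The same thing as a list comprehension, in one line.
squares_comp = [n * n for n in range(5)]


def powerset(items):
    """Yield every subset of items as a tuple, from the empty set up."""
    items = list(items)
    # Chain together the combinations of each size 0, 1, ..., len(items).
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )


subsets = list(powerset(["a", "b", "c"]))
print(squares_comp)   # [0, 1, 4, 9, 16]
print(len(subsets))   # 8
```

A set with n elements has 2**n subsets, which is why brute-force solutions built on power sets only work for small inputs.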

In retrospect, I wish I had taken both of these courses before the “Data Science” bootcamp last year, but what’s done is done – actually, I wouldn’t have been able to, since PH526x was only released for the first time last November.

Another thing I’m curious about is the attrition rate for these courses; how many people actually finish? Knowing this might help to give me a better idea about whether I actually accomplished something significant or not.

As always, for those who are curious, I’m uploading everything to my GitHub.
