I recently completed the course Microsoft DAT210x: Programming with Python for Data Science on edX, so I’ll take a moment to review it here. I’m still taking the Data Science bootcamp by Logit, and took this course as a supplement to get additional practice with Python, Pandas, Scikit-Learn, and other Data Science-related packages.
The course was just introduced by Microsoft last month as part of their “Online Data Science Degree Program”. I took the course from July to August, and this was the first iteration (the course just ended last Friday). That being said, 6 weeks is not much time to teach something as broad as “Data Science”. The course starts with an introduction to the subjects of Data Science and Machine Learning, and then progresses into an introduction to Pandas, which is a Python package for manipulating DataFrames, similar to the data frames you work with in R. After a brief survey of 2-D and 3-D visualizations with Pandas and matplotlib, the course covers data transformations and dimensionality reduction, namely PCA and Isomap (for non-linear dimensionality reduction). After that, the course covers several important algorithms used in supervised and unsupervised Machine Learning, including K-means clustering, K-Nearest Neighbors classification, Ordinary Linear Regression and Multiple Linear Regression, Support Vector Machines, Decision Trees, and Random Forest Classifiers, followed by a final rush through confusion matrices, cross-validation, pipelining, and tuning parameters with GridSearchCV.
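To give a flavor of how those last topics fit together, here is a minimal sketch of my own (not code from the course). It assumes scikit-learn is installed and uses its bundled iris dataset as a stand-in; the step names and the parameter grid are arbitrary choices I made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): 150 iris samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pipeline: scale the features, reduce dimensionality with PCA, classify with an SVM.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC()),
])

# Illustrative grid; "<step name>__<parameter>" addresses a parameter of a pipeline step.
param_grid = {
    "pca__n_components": [2, 3],
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

# 5-fold cross-validated search over every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Confusion matrix on the held-out test set (rows: true class, columns: predicted class).
cm = confusion_matrix(y_test, search.predict(X_test))
print(search.best_params_)
print(cm)
```

Putting the scaler and PCA inside the pipeline means they are re-fit on each cross-validation fold, so the held-out fold never leaks into the preprocessing.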
Given that this was the first iteration of the course, my experience was pretty good. The course could use a little more polish in the presentation of the online materials, quizzes, and programming assignments. There were minor typos all over the place (though they didn’t really impede understanding of the material), and the quiz questions were rather ambiguous from time to time, much to my consternation. The explanations of the concepts themselves were a little wishy-washy, but when you’re trying to address a general audience there’s little else you can do. Links to the literature, textbooks, and further explanations are included, and you should read them as well in order to gain a complete understanding of the subject matter.
The programming assignments were great, however. I think I spent more than the recommended 4-8 hrs/week on them. They were all really interesting, and challenging without being so hard that I got completely frustrated and gave up. We used PCA and Isomap to project 3-D images into 2-D space, K-Means to identify people’s residences based on anonymized geolocation data from their cellphones (!), linear regression to reconstruct audio samples (sort of like what they do on TV shows!), SVC to analyze whether or not someone has Parkinson’s based on collected speech quality, and other interesting examples.
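As a rough idea of what the K-Means exercise looks like, here is a minimal sketch under my own assumptions rather than the assignment’s actual code or data. The coordinates are synthetic, the “home” and “work” locations are hypothetical, and treating the largest cluster as the likely residence is a simplification I chose for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical (latitude, longitude) pairs; not real data.
home = np.array([32.90, -96.80])
work = np.array([32.80, -96.70])

# Synthetic pings: more of them near "home" than near "work", with small GPS-like noise.
pings = np.vstack([
    rng.normal(loc=home, scale=0.002, size=(200, 2)),
    rng.normal(loc=work, scale=0.002, size=(80, 2)),
])

# Cluster the pings into two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pings)

# Simplifying assumption: the cluster with the most pings is where the person lives.
largest = np.bincount(kmeans.labels_).argmax()
home_guess = kmeans.cluster_centers_[largest]
print(home_guess)
```

With real data you would also want to filter by time of day (for example, keeping only nighttime pings) before clustering, since the busiest location is not always the residence.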
I would recommend that people taking this course do as I did and use it as a supplement to other courses. There’s no way you can learn everything there is to learn about a subject as broad as “Data Science” from one course, and it is good to take multiple courses because some of them explain certain concepts better than others. For example, this course covers Isomap, which most other courses do not.
If you’re curious about the programming assignments, I’m posting them on my GitHub.