How I learned modern Machine Learning
I recently received a short email asking how I learned modern ML in less than a year and ended up with a research internship.
I wrote back a detailed response which I was quite satisfied with, and then decided to share it here.
I added some notes about advice in general and why it sucks if you follow it blindly.
The Email
Hey {Redacted}, Thanks for your email!
Before going through the story and giving some advice, I want to emphasise that what worked for me (and the steps I took to get there) won’t transpose easily to your own life and situation. The unintuitive thing about advice in general is that you have to extract the overall macro strategy from the noisy details rather than follow them naively. A perfect example is all the startup kids trying to copy Elon Musk by working on a hip internet startup (X.com, Zip2) before going all in on their passion project (SpaceX, Tesla). It worked 20 years ago because the situation was different; now, not so much.
Another thing to mention is that a lot of people are ambitious about their careers and their ventures, and are therefore ready to take bets. They do things radically differently (eat ramen noodles every day, work in their garage, move to the US to pitch, ...) in order to get radically different outcomes (build a $1B+ company in 5 years).
But almost nobody takes that approach when it comes to intellectual goals. Startup kids are ready to bet on building a unicorn in a short amount of time against all the statistical evidence that they won’t. Yet nobody says “I am going to become an expert in a field in a year instead of ten”. That’s much more uncommon.
So if you want to become an expert in a year instead of ten, you will need to learn the subject in a radically different fashion (just like you don’t build a unicorn the same way you build an accounting firm in downtown France).
Hope this makes sense!
(I don’t claim to be an expert in ML. But I do claim it’s possible to become an expert in an order of magnitude less time than most people spend getting there.)
Now let’s get to the story:
My university lets us study abroad during our third year. I went to Singapore. While the city was really amazing, the university I went to was not (even though it is ranked among the top 20 universities in the world).
I had way more free time than I expected and decided to split it between travelling and learning one subject over the course of the academic year. I think I ended up with 20% travelling and 80% learning (as opposed to my friends, who spent 100% of their time budget on travelling).
And so I decided to learn modern machine learning and all the bells and whistles associated with it.
The first thing I did was distil ML into its most basic foundational elements: Linear Algebra, Continuous Probability, Information Theory, and some Calculus (back-propagation is just a fancy name for the chain rule applied a gazillion times in a row).
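To make the chain-rule remark concrete, here is a minimal sketch in plain Python. The tiny scalar "network" y = w2 * tanh(w1 * x) and every name in it are made up for illustration; they are not taken from any course or library. The backward pass multiplies local derivatives from the output back to the weights, and a finite-difference check confirms the result.

```python
import math

def forward_backward(x, w1, w2):
    """Hypothetical scalar network: y = w2 * tanh(w1 * x)."""
    # Forward pass, keeping the intermediate values.
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h
    # Backward pass: the chain rule, one local derivative at a time.
    dy_dh = w2
    dh_da = 1.0 - h * h              # derivative of tanh(a)
    da_dw1 = x
    grad_w1 = dy_dh * dh_da * da_dw1  # dy/dw1
    grad_w2 = h                       # dy/dw2
    return y, grad_w1, grad_w2

def numerical_grad(f, v, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic gradient."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

x, w1, w2 = 0.5, 1.5, -2.0
y, g1, g2 = forward_backward(x, w1, w2)
n1 = numerical_grad(lambda w: forward_backward(x, w, w2)[0], w1)
n2 = numerical_grad(lambda w: forward_backward(x, w1, w)[0], w2)
print("analytic:", g1, g2)
print("numeric: ", n1, n2)
```

A real network does exactly this, only with matrices instead of scalars and many more layers.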
I took online courses from MIT (using MIT OpenCourseWare) and made sure to do the homework! (Just watching the lectures won’t teach you much.)
These courses were:
- MIT 18.06 - Linear Algebra
- MIT 18.05 - Introduction to Probability and Statistics (this won’t cover all the things you need)
- For Information Theory I read a lot of Wikipedia pages and some snippets of textbooks
To build up your geometrical intuition for why Linear Algebra is the way it is, I recommend watching the 3Blue1Brown playlist on Linear Algebra: here
Then I went on to learn classical machine learning, which is the set of all the really good methods people used (and still use today) before there was enough compute to make neural networks practical (neural networks have been around since the 80s; they only started working because we digitised the world and GPUs can do linear algebra faster than CPUs). The best resource on classical ML is Andrew Ng’s course: Classical ML Course
Now onto the fun stuff!
I bought THE textbook (you always want a paper reference textbook, no matter how expensive it is) on Deep Learning: Deep Learning Textbook
I read through it once, took notes, and made sure to understand the theoretical aspects of it very well.
In parallel I took the best course in Deep Learning (if you want to learn it the hard way by deriving all the backprop math yourself): CS231N
This course is worth all the pain. Trust me. And make sure you do the assignments! It took me an entire weekend to do the first one while my friends were in Bali. Not gonna lie, I did reconsider my commitment at that time :)
At that point my tree of knowledge had very strong roots and a very strong trunk.
Then it was time to:
- Play with modern software. CS231N asks you to write your own linear algebra math in NumPy. While interesting, this approach is a lot of manual labour, and you don’t want to think about all the low-level details of backprop. So it meant learning to use either PyTorch or TensorFlow (I learned both) and doing my own projects. No tutorials or courses. I just read the documentation and went on to train my own sweet little networks on sweet little tasks.
- Understand the limitations of convolutional neural networks. I really don’t understand why everybody starts by training networks on real data. It would be like testing new propulsion research on a real rocket instead of in a lab?! I generated small test datasets (images with N lines at random positions and orientations) and tried to train a network to count the number of lines. It is surprisingly hard! CNNs recognise textures, but they are really bad at reasoning (counting, comparing, ...). Play and watch them break. Remove the idea that “Deep Learning is magic” from your brain. It is not true!
- Specialise and learn the things that are exciting. For me it was Reinforcement Learning. I took Sergey Levine’s course at Berkeley and David Silver’s course at UCL (Silver’s course first). I then played a lot with RL in Minecraft. It led me to buy an external GPU for my laptop, because using cloud instances adds a lot of friction (and when you learn something for the first time, it’s all about removing friction).
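For the line-counting experiment mentioned in the list above, a synthetic dataset could be generated roughly like this. It is only a sketch under my own assumptions: the image size, line length, maximum line count and the helper names `draw_line` and `make_dataset` are all hypothetical, and it is not the exact code I used back then.

```python
import numpy as np

def draw_line(img, rng, length=12):
    """Draw one straight segment with a random centre and orientation, in place."""
    h, w = img.shape
    cx, cy = rng.uniform(0, w - 1), rng.uniform(0, h - 1)
    theta = rng.uniform(0, np.pi)
    # Sample points densely along the segment and round them to pixel coordinates.
    t = np.linspace(-length / 2, length / 2, 4 * length)
    xs = np.round(cx + t * np.cos(theta)).astype(int)
    ys = np.round(cy + t * np.sin(theta)).astype(int)
    # Drop the points that fall outside the image.
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    img[ys[keep], xs[keep]] = 1.0

def make_dataset(num_images, max_lines=5, size=32, seed=0):
    """Return (images, labels) where labels[i] is the number of lines drawn in images[i]."""
    rng = np.random.default_rng(seed)
    images = np.zeros((num_images, size, size), dtype=np.float32)
    labels = rng.integers(1, max_lines + 1, size=num_images)
    for img, n in zip(images, labels):
        # img is a view into images, so drawing modifies the dataset directly.
        for _ in range(n):
            draw_line(img, rng)
    return images, labels

images, labels = make_dataset(num_images=8)
print(images.shape, labels)
```

Lines are allowed to overlap here, which makes some images ambiguous even for a human. That is part of the point: you control every source of difficulty and can dial it up or down before blaming the network.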
At the beginning of the second semester, I committed to doing undergrad research to get more credits (and avoid having to follow another boring course). I found a supervisor who gave me a huge EEG dataset. I decided to apply a fancy anomaly detection technique from a paper I found. Being able to read papers and reimplement them myself is the most important skill I learned!
During the second semester, I spent a ridiculous amount of time applying for internships (and getting rejected). I finally found the perfect company and the perfect internship. I gave myself all the luck I could find and ended up with an offer for a Research Scientist internship. But that’s a story for another day ;)
-Justin