Google Machine Learning Crash Course Review
The Google Machine Learning Crash Course (with TensorFlow APIs) is a free course on the Google Developers site.
As a Machine Learning Noob, I started at the beginning of the course, but it gives you a quiz up front to move you ahead if you know the foundation concepts.
While not without its flaws, I was impressed with how much I learned in just a few minutes.
I highly recommend this free Machine Learning course.
Overview of What You’ll Learn
Course Instructor and Class Rating
The introduction gives three quick reasons you should use Machine Learning.
Learn Machine Learning to:
- Reduce time programming
- Better customize products
- Solve problems that you don’t (as a human) know how to solve
To reduce programming time, you feed the AI model examples to have it create new code.
To customize products, you can, for example, feed an AI model examples in different languages so it can translate your product for you.
To solve problems outside your expertise, you can feed the model data from similar problems so it can solve new ones.
You probably already know that Machine Learning will change how you think about problems.
The focus, the instructor says, changes from logic to statistics because Machine Learning is a predictive science.
Mind Your Features and Labels
The Google Machine Learning Crash Course is about “Supervised Machine Learning,” which Google defines as:
Supervised Machine Learning: “Create models that combine inputs to produce useful predictions even on previously unseen data.”
First, we label example data so the model can learn patterns from it.
If you are creating a spam filter, you can label data as “SPAM” or as “NOT SPAM.”
These labeled examples become the patterns the model uses on data it’s never seen before.
The data points that describe each example are the Features: entities such as the email’s words and its email addresses.
Features can be any data in the dataset, and we label some of these examples to train the model.
We feed labeled data to the model so that it can recognize patterns in unlabeled data.
The “model” is the prediction algorithm. The model predicts which labels it should assign to unlabeled features.
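To make that concrete, here is a tiny sketch of my own (not from the course) showing how a couple of labeled spam examples and one unlabeled example might look in Python; the emails, addresses, and field names are all invented for illustration.

```python
# Hypothetical labeled examples for a spam filter.
# Each example's features describe the email; the label is the answer we want the model to learn.
labeled_examples = [
    {"features": {"subject_words": ["buy", "now"], "sender": "deals@example.com"}, "label": "SPAM"},
    {"features": {"subject_words": ["meeting", "notes"], "sender": "coworker@example.com"}, "label": "NOT SPAM"},
]

# Unlabeled example: the features are known; the label is what the trained model must predict.
unlabeled_example = {"subject_words": ["free", "prize"], "sender": "winner@example.com"}
```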
Examples to Drive Home Concepts
Next, the course gives you examples of everything you just learned.
It shows how the AI model might predict housing prices based on examples (labeled features).
The Features include the house’s median age, total rooms, and total bedrooms.
As model trainers, we assign “Median House Value” labels to this data.
In the example data, the rooms and bedrooms are given as total square footage.
We assign the label 66900 to represent a median house value of $66,900.
To continue the fantasy that $66,900 represents an accurate price for a 5,612 sf. home in California, we train the model on more data for houses 19, 17, 14, and 20 years old, each with square footage features.
We assign $80,100, $85,700, and other laughably low values to the median house value for each example to train the model on the relationship of house age and size to the median value.
We input the labeled examples into the model, and it can then use the patterns it finds to predict housing prices for unlabeled data.
We ask the model, “What is the median house value of a 42-year-old house with 1686 sf., 361 sf. of which is bedroom space?”
To put this in ML language, “Please predict the median house prices of these unlabeled examples based on the patterns found in the labeled examples.”
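Here is a minimal sketch of that whole train-then-predict loop. I’m using scikit-learn’s LinearRegression rather than the TensorFlow APIs the course teaches, and the training rows and prices are placeholder values loosely echoing the numbers above, not the course’s actual dataset.

```python
from sklearn.linear_model import LinearRegression

# Labeled examples: [house_median_age, total_rooms_sqft, total_bedrooms_sqft]
# The rows and prices below are made up for illustration.
X_train = [
    [15, 5612, 1283],
    [19, 7650, 1901],
    [17, 720, 174],
    [14, 1501, 337],
    [20, 1454, 326],
]
y_train = [66900, 80100, 85700, 73400, 65500]  # "Median House Value" labels, in dollars

# Training: the model learns the relationship between the features and the labels.
model = LinearRegression()
model.fit(X_train, y_train)

# Inference: predict the label for an unlabeled example,
# e.g. the 42-year-old house with 1686 sf., 361 sf. of bedroom space.
predicted_value = model.predict([[42, 1686, 361]])
print(f"Predicted median house value: ${predicted_value[0]:,.0f}")
```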
ML Carts Before ML Horses
Next, within the example, the course outlines the definitions of Training and Inference.
We’ve been doing training and inference throughout the lesson without the terms ever being defined.
Rather than teaching these terms and then showing examples, the course teaches these concepts for the first time within the example.
I am quibbling about this not because it’s a bad course; it’s a very good course.
I’m quibbling because it could be great; it just needs some work to introduce concepts, define them, explain them, and then use them in examples.
In case you were curious: Training is when we feed the labeled examples to the model. Inference is when we ask the model to extrapolate from those examples to unlabeled data and label it for us. Those labels are our predictions.
The course then teaches Regression and Classification in the example without first introducing, defining, and explaining these concepts.
Since I was curious, you might be too: Regression models predict “continuous values.” They answer the question: “Where does this data fall along a continuous range of values (house values, for example)?”
Classification models predict outcomes as discrete values, such as which category an example belongs to. Classification is a scoring system, such as creating a spam score based on individual bits of discrete data.
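To pin down the distinction, here is a small sketch of my own contrasting the two model types with scikit-learn; the square footages, prices, and the “suspicious word count” feature are all made up for illustration.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g. a house price) from features.
reg = LinearRegression().fit([[1500], [2000], [2500]], [200000, 260000, 320000])
print(reg.predict([[1800]]))      # a dollar amount somewhere along a continuum

# Classification: predict a discrete class (e.g. spam / not spam) from features.
# Here the single feature is an invented "suspicious word count".
clf = LogisticRegression().fit([[0], [1], [5], [7]], ["not spam", "not spam", "spam", "spam"])
print(clf.predict([[6]]))         # one of the discrete labels
print(clf.predict_proba([[6]]))   # the "spam score" style of output
```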
Google Course Quiz Experience
I’m sorry to be such a geek. I love quizzes. I think they’re one of the most powerful ways to quickly turn new information into knowledge.
Google once again provides a great learning experience with some frustrating ordering issues.
The quiz contains a question whose answer is not yet in the course!
Yet I still have to forgive this oversight because, in trying to figure out whether I had missed the concepts or they were never presented, I drove them home in my mind.
Let’s dissect the first quiz; it’s an excellent example of why I love this free Google Machine Learning course and why it might make me crazy.
Supervised Learning Quiz Example
The quiz looks like this:
Suppose you want to develop a supervised machine learning model to predict whether a given email is “spam” or “not spam.” Which of the following statements is true?
- Emails not marked as “spam” or “not spam” are unlabeled examples.
- Words in the subject header will make good labels.
- The labels applied to some examples might be unreliable.
- We’ll use unlabeled examples to train the model.
The correct answers are 1 and 3.
Number 1 is correct because “NOT MARKED” is the same as unlabeled.
Number 3 is correct because … do you know why?
I’ve given you everything I learned so far. Did we discuss unreliable labels? I’m not even sure whether #3 refers to the labels on the training examples or the labels the model predicts. So I’m pretty confused.
The correct answer is: “The labels applied to some examples might be unreliable.”
Answer #3 is correct because: “It’s important to check how reliable your data is. The labels for this dataset probably come from email users who mark particular email messages as spam. Since most users do not mark every suspicious email message as spam, we may have trouble knowing whether an email is spam. Furthermore, spammers could intentionally poison our model by providing faulty labels.”
Based on this explanation, answer #3 refers to input data that might be poorly labeled due to user error. Going into the quiz, I had no hint that I should watch for bad user data. Don’t quiz me on material you have not yet taught!
So yeah, that drives me crazy, but the upside is that figuring out whether I had missed a section or Google had left it out reinforced my understanding of faulty labeling.
I will never forget these concepts, so staying mad is hard.
The quiz explanations are truly helpful. Answer #2 is wrong because: “Words in the subject header might make excellent features, but they won’t make good labels.”
What does this mean?
Features are entities or pieces of data that we will label. If the email subject words are “Buy Viagra,” are those words spam? Or are they indicative of content that might be spam?
The keywords are features: data points that describe the email. When the user marks the email as spam, those subject words become part of a labeled example; they serve as evidence of possible spam. The keywords are the features, and “possible spam” is the label.
Answer #4 is incorrect (“We’ll use unlabeled examples to train the model”) because we label features to train the model.
As the answer’s explanation says: “We’ll use labeled examples to train the model. We can then run the trained model against unlabeled examples to infer whether the unlabeled email messages are spam or not spam.”
Labeled features train the model to make predictions on unlabeled data.
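As one last sketch of my own (not the course’s), here is how subject-line words can serve as features while the user’s spam / not-spam marks serve as labels, again using scikit-learn; every subject line below is invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Labeled examples: subject lines (features) with user-supplied labels.
subjects = ["buy viagra now", "cheap meds limited offer", "team meeting notes", "lunch on friday?"]
labels = ["spam", "spam", "not spam", "not spam"]

# Training: subject words become features; "spam" / "not spam" are the labels.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(subjects, labels)

# Inference: the trained model labels an unseen subject line for us.
print(model.predict(["exclusive offer, buy now"]))  # expected: ['spam']
```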
Crash Course Review Conclusion
Courses have structure to maintain flow and encourage learning reinforcement.
Had I not dug in to work around the course’s flaws, the experience would have been a net negative.
But because I wanted to know more, I dug into the structure issue, and that research drove home the difference between labels and features.
The rest of the course is easy to follow once you find a workaround for a minor flaw like this.
The explanations make sense. They’re offered in a logical order. The quizzes and projects reinforce the material.
For a free course, Google Machine Learning Crash Course is excellent. For a paid course, it would be excellent.
As with any Machine Learning deep dive, the course is also a huge commitment. Taking this course will help you on your Machine Learning journey, and for that, it is worth your time.
Read this Next
Learning machine learning is not easy, but it is rewarding. Read this next for salary details: