Machine learning is not complicated. No really, it isn’t! I bet you can do machine learning without even opening your calculator app. Simply study this line of numbers for no more than a few seconds:
Got it? Compare that line to these four new lines:
Which is the example most similar to?
If you said Line 1, then you understand the fundamentals of most machine learning algorithms. That’s all machine learning is: turning data into patterns and making predictions based upon those patterns. In this example, you can be fairly confident that your prediction is correct because 15 out of 16 digits (about 94%) match up, as long as no one digit is more important than any other in your prediction.
In this case, that was supervised learning: supervised because you matched your answer (also known as a prediction) to one of several existing, known patterns.
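If you would like to see that comparison spelled out, here is a minimal Python sketch of "count the matching digits and pick the closest line." The digits below are made up for illustration (they are not the lines from this post), and the names `matching_digits` and `most_similar` are hypothetical helpers, not part of any library. It also assumes every digit counts equally.

```python
# Hypothetical data: these 16-digit lines are invented for illustration only.
example = "0110100101001011"

candidates = {
    "Line 1": "0110100101001010",  # differs from the example in the last digit only
    "Line 2": "1001011010110100",
    "Line 3": "1100110000111100",
    "Line 4": "0000000100000000",
}


def matching_digits(a, b):
    """Count the positions where two equal-length digit strings agree."""
    return sum(x == y for x, y in zip(a, b))


def most_similar(example, candidates):
    """Return the name and score of the candidate sharing the most digits."""
    scores = {name: matching_digits(example, line)
              for name, line in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


best, score = most_similar(example, candidates)
print(best, score, score / len(example))  # Line 1 15 0.9375
```

Picking the known pattern with the most matching digits is roughly what a nearest-neighbor classifier does, just with four candidates instead of thousands.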
Now, consider the four lines again, WITHOUT the original example as context:
If you were asked to group these four lines into THREE distinct groups (Group X, Y, and Z), you might organize them like this:
Group X: Line 1
Group Y: Line 2 and Line 3
Group Z: Line 4
Lines 2 and 3 (Group Y), though not identical, have enough in common to be grouped together. Because you are not trying to fit each line into an existing system, and are instead trying to come up with a convention for them yourself, this is an example of unsupervised learning.
If I ask you to group them into 2 distinct groups, it gets a little bit tougher – will you group them by the sum of each line? If you do, Lines 1 (sum = 7), 2 (sum = 6), and 3 (sum = 8) would be a group, and Line 4 would be its own group. But if you look at the total number of 0s per line, you might group 1, 2, and 4 together (they are more than 50% 0s). Or you might group them in a completely different way that I hadn’t thought of. Without more information on what those 1s and 0s represent, it’s hard to decide which grouping makes the most sense.
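To make the "it depends on what you measure" point concrete, here is a small Python sketch that splits four lines into two groups in two different ways. The lines are hypothetical: they are invented so that the first three sums come out to 7, 6, and 8 as in the text, and Line 4's digits are purely an assumption. The helper names are made up for this example as well.

```python
# Hypothetical lines: invented so the sums match the text (7, 6, 8).
# Line 4's digits are an assumption, chosen only to be mostly zeros.
lines = {
    "Line 1": "0110100101001010",  # sum 7
    "Line 2": "0100100101001010",  # sum 6
    "Line 3": "0110100101001011",  # sum 8
    "Line 4": "0000000100000000",  # sum 1 (assumed)
}


def digit_sum(line):
    """Add up the digits in a line."""
    return sum(int(d) for d in line)


def split_at_largest_gap(values):
    """Split names into two groups at the widest gap between sorted values."""
    ordered = sorted(values, key=values.get)
    gaps = [values[b] - values[a] for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def split_by_majority_zeros(lines):
    """Separate lines that are more than 50% zeros from the rest."""
    mostly_zeros = [name for name, line in lines.items()
                    if line.count("0") > len(line) / 2]
    rest = [name for name in lines if name not in mostly_zeros]
    return mostly_zeros, rest


sums = {name: digit_sum(line) for name, line in lines.items()}
by_sum = split_at_largest_gap(sums)
by_zeros = split_by_majority_zeros(lines)

print("Grouped by sum:  ", by_sum)
print("Grouped by zeros:", by_zeros)
```

Same four lines, two reasonable rules, two different answers. Neither rule is "the" right one; which is more useful depends on what the digits actually represent.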
That’s what data scientists do: they use their knowledge of statistics, math, coding, and domain expertise (or work with other experts) to make these sorts of tough decisions.
It gets tougher still when the rows of numbers get longer and the entire collection of rows gets larger. And the complexity rises further when the numbers begin to represent ideas or categories (1 for blue, 2 for red, 3 for green…) or take on a range of values (instead of 1 or 0, we can use the total weight in pounds). And that’s where computers come into play. But to be honest, computers aren’t really necessary when it comes to machine learning; they just make things a whole lot faster.