Machine Learning: In Plain English
In this article, I will describe how analytics is related to Machine Learning. I'll try to demystify some of the nonsense around ML, and explain the process and types of machine learning. Finally, I'll share a couple of videos which describe the next level of Artificial Intelligence: Deep Learning.
Don’t worry if you're not an artificial intelligence expert — I won’t ever mention Linear Regression and K-Means Clustering again. This is an article in plain English.
Analytics and Machine Learning
You’d be forgiven for thinking Big Data is all about SQL queries and terabytes of data, but the real purpose is to extract value from data by gaining insight: to discover something useful from the data. For example, "if I lower prices by 5%, I’ll increase sales volumes by 10%."
Analytics is the main technique, and this includes:
- Descriptive Analytics: Identifies what happened. This typically involves reports that describe past events, for example, comparing this month's sales to the same time last year.
- Diagnostic Analytics: Attempts to explain why it’s happening, which typically involves using dashboards with OLAP capability to drill into and investigate the data, along with Data Mining techniques to find correlations.
- Predictive Analytics: Attempts to estimate what might happen. It’s likely predictive analytics was used to select you as a potential reader of this article based upon your job title, interests and links to others.
Machine Learning (ML) fits into the Predictive Analytics space.
What Is Machine Learning?
Machine Learning is a subset of Artificial Intelligence whereby a machine learns from past experience, i.e., data. Unlike traditional programming, where the developer needs to anticipate and code every potential condition, a Machine Learning solution adapts its output based upon the data.
A Machine Learning algorithm doesn’t literally write code, but it builds up a computer model of the world, which it then modifies based upon how it’s trained.
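To make the contrast with traditional programming concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `history` data, the fixed cut-off of 1000 in `hand_coded_rule`, and the `learn_threshold` helper are assumptions, not part of any real fraud system. The point is only that one rule is typed in by a developer while the other is derived from past data.

```python
# Past card transactions as (amount, was_fraud). Made-up data for illustration.
history = [(12, False), (30, False), (45, False), (60, False), (80, False),
           (250, True), (300, True), (90, False), (400, True), (520, True)]

def hand_coded_rule(amount):
    """Traditional programming: the developer guesses a fixed cut-off."""
    return amount > 1000

def learn_threshold(history):
    """Learning from data: pick the cut-off that misclassifies
    the fewest past transactions."""
    candidates = sorted(amount for amount, _ in history)

    def errors(threshold):
        # Count transactions where "amount > threshold" disagrees with the label.
        return sum((amount > threshold) != was_fraud
                   for amount, was_fraud in history)

    return min(candidates, key=errors)

threshold = learn_threshold(history)
print("Learned cut-off:", threshold)
print("Hand-coded rule flags a 400 purchase?", hand_coded_rule(400))
print("Learned rule flags a 400 purchase?", 400 > threshold)
```

If the spending patterns in `history` change, re-running `learn_threshold` moves the cut-off without anyone rewriting the rule, which is the sense in which the solution adapts to the data.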
“It may be a hundred years before a computer beats humans at Go” - The New York Times, 1997
How Does It Work?
Spam filtering software is a great example. It uses Machine Learning techniques to learn how to identify spam from millions of mail messages. It works by using statistical techniques to help identify the patterns.
For example, if 85 out of every 100 emails that include the words “cheap” and “Viagra” are found to be spam, we can estimate that a new message containing both words has roughly an 85% chance of being spam. Combining this with several other indicators (for example, whether the sender is someone you've never received mail from), and testing the algorithm against a billion other emails, we can improve the confidence and accuracy over time.
In fact, Google indicates it now stops around 99.99% of spam sent.
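The 85-in-100 arithmetic can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real spam filter is built: the `emails` list is made up to mirror the numbers in the text, and `spam_rate` is a hypothetical helper that only counts, without combining multiple indicators.

```python
# Toy labelled mailbox: (email text, is_spam). Made up to mirror the
# "85 out of 100" example; a real filter learns from millions of messages.
emails = (
    [("cheap viagra offer", True)] * 85
    + [("cheap viagra refill reminder from your pharmacy", False)] * 15
    + [("meeting agenda for monday", False)] * 300
    + [("you have won a prize", True)] * 20
)

def spam_rate(emails, words):
    """Fraction of the emails containing all `words` that were labelled spam."""
    matches = [is_spam for text, is_spam in emails
               if all(word in text.split() for word in words)]
    if not matches:
        return None  # the words never appear, so there is no evidence either way
    return sum(matches) / len(matches)

rate = spam_rate(emails, ["cheap", "viagra"])
print(f"Share of 'cheap' + 'viagra' emails that were spam: {rate:.0%}")
```

A real system would weigh many such signals together and keep re-checking them against new mail, which is where the gradual improvement in accuracy comes from.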
“Master of Go Board Game Is Walloped by Google Computer Program” — The New York Times, 2016
Machine Learning Examples
There are literally hundreds of applications already in place, including:
- Targeted Marketing: Used by Google and Facebook to target adverts based on individual interests, and by Netflix to recommend movies to watch, and Amazon to recommend products to buy.
- Credit Scoring: Banks use income data (estimated from where you live), your age and marital status to predict whether you’ll default on a loan.
- Card Fraud Detection: Used to stop fraudulent use of credit or debit cards online based upon your previous and likely spending habits.
- Basket Analysis: Used to predict which special offers you’re more likely to use based upon the buying habits of millions of similar customers.
In one controversial case, the US retailer Target used a basket analysis of 25 different health and cosmetic products to predict pregnancy, including the due date, with remarkable accuracy. This backfired when the father of a young girl complained that Target was encouraging teen-mums after she was bombarded with special offers related to pregnancy. He later apologized when he found the retailer knew more than he did.
What You'll Need
Effectively, you’re looking for correlations in the data, but you’ll need some domain expertise to verify the results. Computers really are dumb: they can find a pattern, but only an expert can verify whether it's relevant.
In summary, you need (in order of priority):
- A Goal. The problem you’re trying to solve. For example, is this credit card stolen? Will stock prices go up or down? Which movie will the customer enjoy most?
- Lots of Data. For example, to accurately predict home values you’ll need detailed historical prices along with extensive property details.
- An Expert. You’ll need a domain expert who understands the right answer to verify the results produced, and confirm when the model is accurate enough.
- A Pattern. You’re looking for a pattern in the data. If you can't find one, you may have wrong or incomplete data, or maybe there’s no pattern there at all.
Learning From Mistakes: Types of Machine Learning
Predictive analytics attempts to predict a future outcome based on historical data, and the most common method is referred to as Supervised Learning.
The types of Machine Learning are:
- Supervised Learning: Used when we know the correct answers from past data, but need to predict future outcomes. For example, using past house prices to predict the current and future value (e.g., US-based Zillow or UK-based Zoopla). Using a trial-and-error statistical improvement process, the machine gradually improves accuracy by testing its results against a set of values provided by a supervisor.
- Unsupervised Learning: Used where there is no distinct correct answer, but we want to discover something new from the data. Most often used to classify or group data, for example, to group music on Spotify to help recommend which albums you might listen to. It can then group the listeners, to see if they're more likely to listen to Radiohead or Justin Bieber. (Radiohead every time!)
- Reinforcement Learning: Doesn’t need a domain expert but involves constant improvements towards a predefined goal. It’s a technique that often deploys Neural Networks, for example, DeepMind's AlphaGo, which played a million games of Go against itself and eventually beat a world champion.
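To show what "no distinct correct answer" looks like in practice, here is a minimal Python sketch of the unsupervised case. The listening figures and the `group_listeners` helper are invented for this example. The grouping loop is a stripped-down, two-group form of k-means clustering, chosen only because it is short. It is not a description of how Spotify works.

```python
# Hours of rock music played per week by ten listeners (made-up numbers).
# Note there are no labels: nobody has told the program who is a "fan".
hours = [0.5, 1.0, 1.5, 2.0, 1.2, 9.0, 10.5, 8.5, 11.0, 9.5]

def group_listeners(values, rounds=10):
    """Split values into two groups around two centres, repeatedly moving
    each centre to the average of the values assigned to it."""
    low, high = min(values), max(values)  # starting guesses for the centres
    for _ in range(rounds):
        # Assign every value to whichever centre it is closer to.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

casual, keen = group_listeners(hours)
print("Casual listeners:", casual)
print("Keen listeners:", keen)
```

The program finds the two groups on its own. Deciding what the groups mean, and whether they are useful, is still a job for a person who knows the domain.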
The Machine Learning Process
Unlike the futuristic image of machines learning to play chess, most Machine Learning is (currently) quite laborious, as illustrated in the diagram below:
It’s likely that in the future machine learning will be applied to help speed up the process, especially in the area of data collection and cleaning, but the main steps remain:
- Define the Problem: As indicated in my other article, always start with a clearly defined problem and objective in mind.
- Collect the data: The greater the volume and variety of appropriate data, the more accurate the machine learning model will become. This can come from spreadsheets, text files, and databases in addition to commercially available data sources.
- Prepare the data: This involves analyzing, cleaning and understanding the data, including removing or correcting outliers (wildly wrong values). It often takes upwards of 60% of the overall time and effort. The data is then separated into two distinct parts, Training and Test data.
- Train the model: The model is run against the set of training data to identify the patterns or correlations in the data or to make predictions, while gradually improving accuracy using a repeating trial-and-error improvement method.
- Evaluate the model: By comparing the accuracy of the results against the set of test data. To ensure an unbiased and independent test, it’s important not to evaluate the model against the same data used to train it.
- Deploy and Improve: This can involve trying a completely different algorithm or gathering a greater variety or volume of data. You could, for example, improve house price prediction by estimating the value of subsequent home improvements using data provided by homeowners.
In summary, most Machine Learning processes are in fact circular and continuous: additional data is added and situations change, because the world never stands still and there's always room for improvement.
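The collect, prepare, train and evaluate steps can be sketched end to end in Python. This is a minimal illustration under loud assumptions: the house data is randomly generated rather than real, the pricing formula and noise level are arbitrary, and the model is an ordinary least-squares straight-line fit, picked because it needs no extra libraries. The helper names `fit_line` and `mean_abs_error` are made up for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Collect: made-up houses as (floor area in square metres, sale price).
# Assumed pricing: about 3,000 per square metre, plus a base and some noise.
houses = []
for _ in range(200):
    area = random.uniform(40, 200)
    price = 3000 * area + 50000 + random.gauss(0, 20000)
    houses.append((area, price))

# Prepare: shuffle, then hold back 20% of the data for testing.
random.shuffle(houses)
split = int(len(houses) * 0.8)
train, test = houses[:split], houses[split:]

def fit_line(data):
    """Least-squares straight line through (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return slope, mean_y - slope * mean_x

def mean_abs_error(data, slope, intercept):
    """Average size of the prediction error, in price units."""
    return sum(abs(y - (slope * x + intercept)) for x, y in data) / len(data)

# Train: use the training data only.
slope, intercept = fit_line(train)

# Evaluate: use the held-back test data the model has never seen.
test_error = mean_abs_error(test, slope, intercept)
print(f"Learned price per square metre: {slope:,.0f}")
print(f"Average error on test data: {test_error:,.0f}")
```

If the test error were too high, the "improve" step would mean going back to the top: gathering more or better data, or swapping `fit_line` for a different algorithm, and then evaluating again.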
Summary
The diagram below illustrates the key strategies used by Machine Learning systems.
In conclusion, the critical component of any machine learning system is the data. Given the choice between additional algorithms, clever programming, and greater quantities of more accurate data, Big Data wins every time.
Thank You for Reading
You may also be interested in this 14-minute video on Google’s DeepMind, which explains how Cambridge-based scientists developed an Artificial Intelligence system that uses reinforcement learning to teach itself to win at computer games including Space Invaders. Very reminiscent of the 1980s movie “WarGames.”
If you found this helpful, you can view more articles on Big Data, Cloud Computing, Database Architecture and the future of data warehousing on my web site www.Analytics.Today.
Published at DZone with permission of John Ryan, DZone MVB. See the original article here.