Machine learning
Revision as of 00:21, 17 August 2009
Machine learning methods automatically learn statistical regularities in a training data set in order to make accurate predictions about new data. For example, a machine learning algorithm for machine translation may be presented with several thousand examples of the same sentences in two different languages during a training phase, and then use the statistical regularities it has found to predict the most likely translation of new sentences. Such methods are often contrasted with rule-based methods, which give explicit instructions for selecting the best prediction (for example, the best translation). This is not a hard and fast division, however: rule-based approaches are often used in tandem with the statistical techniques of machine learning.
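To make the translation example concrete, here is a minimal Python sketch of learning from sentence pairs. It is a toy, not a real translation system: the sentence pairs, the function names `train` and `translate`, and the choice of raw co-occurrence counts as the "statistical regularity" are all assumptions made for illustration, and it translates word by word without modeling word order.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count how often each source word appears alongside each target word."""
    cooc = defaultdict(Counter)
    for source, target in pairs:
        for s in source.split():
            for t in target.split():
                cooc[s][t] += 1
    return cooc

def translate(cooc, sentence):
    """Replace each word with the target word it co-occurred with most often."""
    out = []
    for word in sentence.split():
        if word in cooc:
            out.append(cooc[word].most_common(1)[0][0])
        else:
            out.append(word)  # unknown words are passed through unchanged
    return " ".join(out)

# Hypothetical training data: the same sentences in English and French.
pairs = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a dog", "un chien"),
    ("a bird", "un oiseau"),
    ("my cat", "mon chat"),
]
model = train(pairs)

# "a cat" never appears in the training data.
prediction = translate(model, "a cat")
print(prediction)
```

No rule in the program says that "cat" means "chat"; the mapping comes entirely from counts over the training pairs. A rule-based system would instead start from a hand-written dictionary and grammar.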
Classifications of Machine Learning Methods
Machine learning methods are commonly divided into supervised and unsupervised methods, depending on what sort of training data they use, and into generative and discriminative methods, depending on how they arrive at their predictions.
Supervised and Unsupervised
In short, supervised methods get to see the right answer (or something like the right answer) while they are being trained, whereas unsupervised methods do not. The machine translation example in the introduction is a case of supervised training, because the algorithm sees the same sentence in both languages.
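The difference can be seen by running both kinds of method on the same data. The following Python sketch is illustrative only: the data points, the label names, and the two particular algorithms (a nearest-centroid classifier for the supervised case, a two-cluster k-means for the unsupervised case) are assumptions chosen for brevity.

```python
def fit_centroids(points, labels):
    """Supervised: use the given labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return the label whose class mean is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def two_means(points, iterations=10):
    """Unsupervised: group the points into two clusters without any labels."""
    centers = [min(points), max(points)]  # simple deterministic starting guess
    assign = []
    for _ in range(iterations):
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in points]
        for k in (0, 1):
            members = [x for x, a in zip(points, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign, centers

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]  # the "right answers"

centroids = fit_centroids(points, labels)  # sees points and labels
supervised_guess = predict(centroids, 4.5)

assignments, centers = two_means(points)   # sees points only
print(supervised_guess, assignments, centers)
```

The supervised model can answer with a meaningful label such as "high". The unsupervised model finds the same two groups but can only name them 0 and 1, since it was never told what the groups mean.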
Supervised methods are further divided into "learning with a teacher," where the algorithm is told the right answer explicitly, and "learning with a critic," where the algorithm is only told when it is wrong.
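The two kinds of feedback can be contrasted with a deliberately simple Python sketch. Everything here is hypothetical: the task (matching objects to colors), the names `learn_with_teacher`, `learn_with_critic` and `critic_says_wrong`, and the guessing strategy of trying the remaining candidates in order.

```python
ANSWERS = ["red", "green", "blue"]
TRUTH = {"apple": "red", "leaf": "green", "sky": "blue"}  # hidden from the learner

def critic_says_wrong(item, guess):
    """A critic only signals whether a guess was wrong, never what is right."""
    return guess != TRUTH[item]

def learn_with_teacher(items):
    """The teacher reveals the right answer, so one round per item suffices."""
    model, rounds = {}, 0
    for item in items:
        rounds += 1
        model[item] = TRUTH[item]
    return model, rounds

def learn_with_critic(items):
    """The learner must guess, and rules out each guess the critic rejects."""
    model, rounds = {}, 0
    for item in items:
        candidates = list(ANSWERS)
        while True:
            rounds += 1
            guess = candidates[0]
            if not critic_says_wrong(item, guess):
                model[item] = guess
                break
            candidates.remove(guess)
    return model, rounds

items = ["apple", "leaf", "sky"]
teacher_model, teacher_rounds = learn_with_teacher(items)
critic_model, critic_rounds = learn_with_critic(items)
print(teacher_rounds, critic_rounds)
```

Both learners end up with the same model, but in this sketch the learner with a critic needs more rounds of feedback, because each "wrong" signal rules out only one candidate.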
The choice between a supervised method and an unsupervised method usually depends on the sort of data available for training. Supervised methods are generally used when optimal performance is required and the programmer has access to a large amount of data that has been labeled with the right answer. Unsupervised methods are the usual choice when little or no labeled training data is available. Researchers and engineers also often pursue hybrid "semi-supervised" or "self-supervised" approaches, which combine a small amount of labeled data with a larger amount of unlabeled data or derive training signals from the data itself.
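One common semi-supervised idea is self-training: fit a model on the few labeled examples, let it label the unlabeled examples it is confident about, and refit. The Python sketch below assumes a one-dimensional nearest-centroid model, exactly two classes, and a hypothetical `margin` parameter as the confidence test; the data are invented for illustration.

```python
def centroids(examples):
    """Mean of the points in each class, from (point, label) pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labeled, unlabeled, margin=1.0):
    """Repeatedly pseudo-label the unlabeled points the model is confident about.

    A point counts as confident when it is closer to its nearest class mean
    than to the second nearest by at least `margin` (needs two or more classes).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        cents = centroids(labeled)
        confident = []
        for x in pool:
            dists = sorted((abs(x - c), y) for y, c in cents.items())
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
        if not confident:
            break  # nothing left that the model is sure about
        labeled.extend(confident)
        taken = {x for x, _ in confident}
        pool = [x for x in pool if x not in taken]
    return centroids(labeled), labeled, pool

labeled = [(1.0, "low"), (5.0, "high")]   # only two labeled examples
unlabeled = [0.5, 1.5, 2.9, 4.5, 5.5]     # plenty of unlabeled ones

model, grown, leftover = self_train(labeled, unlabeled)
print(model, leftover)
```

Here the labeled set grows from two examples to six, while the ambiguous point 2.9 is left unlabeled rather than being forced into a class.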
Unsupervised methods may also be used when a researcher is interested in modeling cognitive behavior and believes that a method that is not given the right answer more accurately represents how humans learn in a particular domain.