scikit-learn-intro
Credits: Forked from PyCon 2015 Scikit-learn Tutorial by Jake VanderPlas
Machine Learning Models Cheat Sheet
Estimators
Given a scikit-learn estimator object named `model`, the following methods are available:
Available in all Estimators
- `model.fit()`: fit training data. For supervised learning applications, this accepts two arguments: the data `X` and the labels `y` (e.g. `model.fit(X, y)`). For unsupervised learning applications, this accepts only a single argument, the data `X` (e.g. `model.fit(X)`).
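A minimal sketch of the two calling conventions; the particular estimators (`KNeighborsClassifier`, `KMeans`) are just stand-ins, and any others would work the same way:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# supervised: fit takes the data and the labels
clf = KNeighborsClassifier()
clf.fit(X, y)

# unsupervised: fit takes only the data
km = KMeans(n_clusters=3)
km.fit(X)
```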
Available in supervised estimators
- `model.predict()`: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data `X_new` (e.g. `model.predict(X_new)`), and returns the learned label for each object in the array.
- `model.predict_proba()`: for classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by `model.predict()`.
- `model.score()`: for classification or regression problems, most estimators implement a score method. Scores are between 0 and 1, with a larger score indicating a better fit.
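A short sketch tying these three together on the iris data (the choice of classifier here is illustrative, not prescriptive):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier().fit(X, y)

X_new = X[:3]                    # pretend these rows are unseen observations
print(clf.predict(X_new))        # predicted class label for each row
print(clf.predict_proba(X_new))  # one probability per class, per row
print(clf.score(X, y))           # mean accuracy on the given data
```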
Available in unsupervised estimators
- `model.predict()`: predict labels in clustering algorithms.
- `model.transform()`: given an unsupervised model, transform new data into the new basis. This also accepts one argument `X_new`, and returns the new representation of the data based on the unsupervised model.
- `model.fit_transform()`: some estimators implement this method, which more efficiently performs a fit and a transform on the same input data.
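A corresponding sketch for the unsupervised methods, using `PCA` for `transform`/`fit_transform` and `KMeans` for cluster-label prediction (again, these particular estimators are just examples):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# transform: project data into the learned basis
pca = PCA(n_components=2)
pca.fit(X)
X_2d = pca.transform(X)

# fit_transform: fit and transform in one (often more efficient) step
X_2d_alt = PCA(n_components=2).fit_transform(X)

# predict: assign each observation to a learned cluster
km = KMeans(n_clusters=3).fit(X)
labels = km.predict(X)
```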
Introduction: Iris Dataset
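The examples in this notebook use scikit-learn's built-in copy of the iris dataset: 150 flowers, four measured features each, and one of three species as the label. A minimal look at it:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```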
K-Nearest Neighbors Classifier
The K-Nearest Neighbors (KNN) algorithm is a method used for classification or for regression. In both cases, the input consists of the k closest training examples in the feature space. Given a new, unknown observation, look up which training points have the closest features and assign the predominant class.
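As a sketch, fitting a KNN classifier to the iris data and querying it on a made-up flower (the measurements below are arbitrary, chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# n_neighbors=1 classifies by the single closest training point
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# a hypothetical flower: 3cm x 5cm sepal and 4cm x 2cm petal
X_new = [[3, 5, 4, 2]]
print(iris.target_names[knn.predict(X_new)])

# evaluating on the training data itself gives a near-perfect
# score -- a symptom of overfitting, not of a good model
print(knn.score(X, y))
```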
Note that we see overfitting in the K-Nearest Neighbors model above. We'll address overfitting and model validation in a later notebook.