A Simple Scikit-Learn Classification Workflow
This notebook shows a brief workflow you might use with scikit-learn
to build a machine learning model that classifies whether or not a patient has heart disease.
It follows the diagram below:
Note: This workflow assumes your data is ready to be used with machine learning models (it is all numerical and has no missing values).
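A quick way to check that assumption before modelling is to confirm that every column has a numeric dtype and that no cell is missing. The helper name is_ml_ready and the two toy DataFrames below are hypothetical, made up purely for illustration; this is a minimal sketch, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def is_ml_ready(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column is numeric and nothing is missing."""
    all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
    no_missing = not df.isna().any().any()
    return all_numeric and no_missing


# Made-up toy frames, purely for illustration
ready = pd.DataFrame({"age": [63, 37], "chol": [233, 250], "target": [1, 0]})
not_ready = pd.DataFrame({"age": [63, np.nan], "sex": ["male", "female"]})

print(is_ml_ready(ready))      # True
print(is_ml_ready(not_ready))  # False (a missing value and a text column)
```

If the check fails, the data needs preprocessing (for example imputing missing values or encoding text columns as numbers) before this workflow applies.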
1. Get the data ready
In this example, we're going to use all of the columns except the target column to predict the target column.
In other words, using a patient's medical and demographic data to predict whether or not they have heart disease.
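The heart-disease data itself isn't reproduced here, so the sketch below builds a synthetic stand-in DataFrame with make_classification. This is an assumption: 13 numeric feature columns named feature_0 to feature_12 (hypothetical names) plus a binary "target" column. It then separates the features (X) from the labels (y) and holds out 20% of the rows with train_test_split so a model can later be evaluated on rows it hasn't seen.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease data (assumption: numeric
# feature columns plus a binary "target" column)
X_arr, y_arr = make_classification(n_samples=300, n_features=13, random_state=42)
heart_disease = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(13)])
heart_disease["target"] = y_arr

# Features: every column except "target"; labels: the "target" column
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (240, 13) (60, 13)
```

With a real CSV you would load it with pandas.read_csv instead of generating data; the drop/split steps stay the same.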
2. Choose the model/estimator
You can do this using the Scikit-Learn machine learning map.
In Scikit-Learn, machine learning models are referred to as estimators.
In this case, since we're working on a classification problem, we've chosen the RandomForestClassifier estimator, which is part of the sklearn.ensemble module.
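Choosing the estimator amounts to importing its class and instantiating it. The sketch below sets n_estimators (the number of trees in the forest) explicitly; 100 is already the default in recent scikit-learn versions, and random_state is fixed only so results are repeatable.

```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate the estimator; hyperparameters are passed to the constructor
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Every scikit-learn estimator exposes its hyperparameters via get_params()
params = clf.get_params()
print(params["n_estimators"])  # 100
```

Inspecting get_params() is a convenient way to see which hyperparameters are available to tune later.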
3. Fit the model to the data and use it to make a prediction
A model will (attempt to) learn the patterns in a dataset when you call its fit() method and pass it the data.
Once a model has learned patterns in data, you can use it to make predictions with its predict() method.
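A minimal sketch of fitting and predicting, using synthetic stand-in data from make_classification (an assumption, since the real dataset isn't shown). fit() receives the training features and labels; predict() receives only features and returns one predicted label per row.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split into train and test sets
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# fit() learns patterns from the training features and labels
clf.fit(X_train, y_train)

# predict() applies the learned patterns to rows the model hasn't seen
y_preds = clf.predict(X_test)
print(y_preds[:10])
```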
4. Evaluate the model
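A trained model/estimator can be evaluated by calling its score() method and passing it features along with their true labels, ideally data the model was not trained on. For classifiers such as RandomForestClassifier, score() returns the mean accuracy.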
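The sketch below evaluates a random forest on held-out synthetic data (a stand-in, as the real dataset isn't shown). For a classifier, score() computes the same mean accuracy as sklearn.metrics.accuracy_score on its predictions; classification_report adds per-class precision, recall and F1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# score() on a classifier: mean accuracy on the data passed in
test_acc = clf.score(X_test, y_test)

# The same number computed explicitly from the predictions
y_preds = clf.predict(X_test)
print(test_acc, accuracy_score(y_test, y_preds))

# Per-class precision, recall and F1
print(classification_report(y_test, y_preds))
```

Scoring on the training data instead would usually overstate performance, which is why the test set is kept separate.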
5. Experiment to improve (hyperparameter tuning)
A model's first evaluation metrics aren't always its last. One way to improve a model's predictions is with hyperparameter tuning.
Note: It's best practice to test different hyperparameters with a validation set or cross-validation.
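Following that note, here is a minimal sketch that compares a few values of one hyperparameter (n_estimators) by their mean 5-fold cross-validated accuracy on the training set only, leaving the test set untouched. The data is a synthetic stand-in and the candidate values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption); the test set is held back
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Compare candidate values by mean 5-fold CV accuracy on the training set
results = {}
for n in [10, 50, 100]:
    clf = RandomForestClassifier(n_estimators=n, random_state=42)
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    results[n] = scores.mean()
    print(f"n_estimators={n}: mean CV accuracy {results[n]:.3f}")

best_n = max(results, key=results.get)
print("Best n_estimators:", best_n)
```

For searching several hyperparameters at once, sklearn.model_selection.GridSearchCV or RandomizedSearchCV automate this loop.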
6. Save a model for later use
A trained model can be exported and saved so it can be imported and used later. One way to save a model is with Python's pickle module.
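A minimal sketch of saving and reloading a fitted model with pickle. The filename random_forest_model_1.pkl is hypothetical and the training data is a synthetic stand-in. The reloaded model should make exactly the same predictions as the original.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption) and a fitted model
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Save the trained model to disk (hypothetical filename)
with open("random_forest_model_1.pkl", "wb") as f:
    pickle.dump(clf, f)

# Later, e.g. in another session: load it back and use it as before
with open("random_forest_model_1.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print((loaded_model.predict(X) == clf.predict(X)).all())  # True
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. Pickled models are also best loaded with the same scikit-learn version that created them; joblib.dump and joblib.load are a common alternative for models holding large NumPy arrays.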