Saving Sklearn Model to Pickle
The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
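To make the definition concrete, here is a minimal sketch of a pickling round trip done entirely in memory. The dictionary contents are made up for illustration; only pickle.dumps and pickle.loads from the standard library are used.

```python
import pickle

# A small object hierarchy: a dict holding a list, a tuple, and a set.
# (The values are illustrative only.)
original = {"labels": ["tech", "sport"], "shape": (2, 3), "seen": {1, 2, 3}}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(original)

# Unpickling: byte stream -> a new object hierarchy with equal contents.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == original)    # True: same contents
print(restored is original)    # False: a separate copy
```

The restored object is an equal but independent copy, which is exactly what we want when a trained model is written out by one process and read back by another.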
Why do we need to save our algorithms to pickle?
Pickling the finalized model saves a lot of time because you don’t have to retrain it every time the application runs. Once the model is saved as a pickle file, you can load it later whenever you need to make predictions.
You can use either the “pickle” library or the “joblib” library in Python to serialize your trained models and save them to a file.
Let’s start with an example!
For the demonstration, I have used the BBC news dataset, which you can download from https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv, and a linear SVM model for training. After the training process, the model is saved as “model.pickle” and the vectorizer as “vectorizer.pickle”. You can also try using Google’s pre-trained model to get the vectors. For the demo, I have used TfidfVectorizer, which is found in the Sklearn package.
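The training and saving steps could look roughly like the sketch below. To keep it runnable without a download, it uses a four-sentence toy corpus in place of the BBC dataset; the sentences, the labels, and the save_pickle helper are my own assumptions for illustration, not the original training code. In a real run you would load the CSV and fit on its text and category columns instead.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-in for the BBC news data (assumption: replace with the real CSV).
texts = [
    "tv future in the hands of viewers",
    "new broadband and digital tv services launched",
    "striker scores twice in cup final win",
    "coach praises team after league match",
]
labels = ["tech", "tech", "sport", "sport"]

# Fit the vectorizer on the raw text, then train the linear SVM
# on the resulting TF-IDF features.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)
model = LinearSVC()
model.fit(features, labels)


def save_pickle(obj, path):
    """Serialize obj to path (hypothetical helper). Note the binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# Both objects are needed at prediction time, so both are saved.
save_pickle(model, "model.pickle")
save_pickle(vectorizer, "vectorizer.pickle")
```

The vectorizer has to be saved along with the model because new text must be transformed with the same vocabulary and IDF weights that were learned during training.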
Once you run the code, the model and the vectorizer are saved as model.pickle and vectorizer.pickle in your local working directory.
Let’s load the model!
Now that the model and the vectorizer are saved, let’s load them using pickle and predict the class for new, unseen data.
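Loading and predicting might be sketched as follows. The load_pickle and predict helpers are hypothetical names I introduce here, and the short setup section only exists so the example runs on its own: it fits a toy model and writes it to a temporary directory, where in practice you would already have model.pickle and vectorizer.pickle on disk.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def load_pickle(path):
    """Read a pickled object back from path (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, vectorizer, text):
    """Vectorize one raw string and return the predicted label (hypothetical helper)."""
    features = vectorizer.transform([text])
    return model.predict(features)[0]


# Setup only: a toy model and vectorizer standing in for the saved ones.
texts = [
    "tv future in the hands of viewers",
    "new broadband and digital tv services launched",
    "striker scores twice in cup final win",
    "coach praises team after league match",
]
labels = ["tech", "tech", "sport", "sport"]
fitted_vectorizer = TfidfVectorizer().fit(texts)
fitted_model = LinearSVC().fit(fitted_vectorizer.transform(texts), labels)

with tempfile.TemporaryDirectory() as model_dir:
    model_path = os.path.join(model_dir, "model.pickle")
    vectorizer_path = os.path.join(model_dir, "vectorizer.pickle")
    for obj, path in ((fitted_model, model_path), (fitted_vectorizer, vectorizer_path)):
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    # The actual loading step: no retraining happens here.
    model = load_pickle(model_path)
    vectorizer = load_pickle(vectorizer_path)

text = "tv future in the hands of viewers with home"
label = predict(model, vectorizer, text)
print(label)
```

Two things are worth keeping in mind: only unpickle files you trust, since unpickling can execute arbitrary code, and load the model with the same scikit-learn version that was used to save it.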
Voilà! You have successfully learned how to save and load a pickled model and make predictions with it!
You can also try Joblib to save the models!
import joblib
Train your model and save it using:
joblib.dump(model, model_path+"model.sav")
joblib.dump(vectorizer_model, vectorizer_path+"vectorizer.sav")
To load and test the model
# Loading the saved model
# (assumes the vectorizer was saved in the same directory as the model)
def loading_joblibPickle(model_path):
    vectorizer = joblib.load(model_path + "vectorizer.sav")
    model = joblib.load(model_path + "model.sav")
    return vectorizer, model

text = "tv future in the hands of viewers with home"

# Load and predict
# (Project_path and predict() are assumed to be defined elsewhere in the full code)
model_path = Project_path + "/08. Multi-class_text_classification/models/"
vectorizer1, model1 = loading_joblibPickle(model_path)
predict(model1, vectorizer1, text)
You can find the full code for both approaches on GitHub.