Saving Sklearn Model to Pickle

Saving the model and vectorizer as pickle

Sep 5, 2019

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
— source: https://docs.python.org/3/library/pickle.html
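
To make the definition concrete, here is a minimal sketch of one pickling and unpickling round trip using only the standard library. The `record` dictionary is made-up example data, not something from the dataset used later.

```python
import pickle

# A small nested object hierarchy (hypothetical example data)
record = {
    "labels": ["tech", "sport"],
    "weights": (0.25, 0.75),
    "meta": {"version": 1},
}

# Pickling: object hierarchy -> byte stream
payload = pickle.dumps(record)

# Unpickling: byte stream -> an equal but separate object hierarchy
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
print(restored is record)      # False
```

The restored object compares equal to the original but is a new object, which is what lets a trained model be rebuilt in a different process later.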

Why do we need to save our algorithms to pickle?

Saving the finalized model to a pickle file saves a lot of time, because you don’t have to retrain the model every time you run the application. Once the model is pickled, you can load it later whenever you need to make predictions.

You can use either the “pickle” library or the “joblib” library in Python to serialize your model and save it to a file.

Let’s start with an example!

For the demonstration, I have used the BBC news dataset (download: https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv) and a linear SVM model for training. After training, the model is saved as “model.pickle” and the vectorizer as “vectorizer.pickle”. You can also try using Google’s pre-trained model to get the vectors. For the demo, I have used TfidfVectorizer, which is found in the Sklearn package.

Once you run the code, the model is saved as model.pickle and the vectorizer as vectorizer.pickle in your current working directory.
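
The training and saving step could look roughly like the sketch below. It is not the original code: the four-sentence corpus is a made-up stand-in for bbc-text.csv so the example runs without a download, `LinearSVC` is assumed as the linear SVM, and `train_and_save` is a hypothetical helper name.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny stand-in corpus (hypothetical); the article trains on bbc-text.csv instead
texts = [
    "new smartphone chip boosts battery life",
    "software update fixes browser security bug",
    "striker scores twice in cup final",
    "coach praises team after league win",
]
labels = ["tech", "tech", "sport", "sport"]


def train_and_save(texts, labels, out_dir="."):
    """Fit a TF-IDF vectorizer and a linear SVM, then pickle both into out_dir."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)

    model = LinearSVC()  # assumption: LinearSVC as the "linear SVM"
    model.fit(features, labels)

    model_file = os.path.join(out_dir, "model.pickle")
    vectorizer_file = os.path.join(out_dir, "vectorizer.pickle")
    with open(model_file, "wb") as f:  # pickle files must be opened in binary mode
        pickle.dump(model, f)
    with open(vectorizer_file, "wb") as f:
        pickle.dump(vectorizer, f)
    return model_file, vectorizer_file


# A temporary directory keeps the sketch from cluttering the working directory;
# call train_and_save(texts, labels) to write to the current directory instead.
out_dir = tempfile.mkdtemp()
model_file, vectorizer_file = train_and_save(texts, labels, out_dir)
print(os.path.exists(model_file), os.path.exists(vectorizer_file))
```

The vectorizer is saved alongside the model because new text has to be transformed with the same fitted vocabulary and IDF weights before the model can classify it.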

Let’s load the model!

Now that we have our model and vectorizer saved, let’s load them using pickle and predict the class for new, unseen data.
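
Here is a minimal sketch of the loading and prediction step. To run on its own it first fits and pickles a toy model on made-up sentences; `load_pickled` is a hypothetical helper name, and `predict` is only a guess at what a prediction helper with the signature `predict(model, vectorizer, text)` might do.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def load_pickled(model_dir):
    """Unpickle the vectorizer and the model stored in model_dir."""
    with open(os.path.join(model_dir, "vectorizer.pickle"), "rb") as f:
        vectorizer = pickle.load(f)
    with open(os.path.join(model_dir, "model.pickle"), "rb") as f:
        model = pickle.load(f)
    return vectorizer, model


def predict(model, vectorizer, text):
    """Vectorize one raw string with the fitted vectorizer and return its label."""
    features = vectorizer.transform([text])  # transform only; never refit here
    return model.predict(features)[0]


# Setup so the sketch is self-contained: fit and pickle a toy model
# (hypothetical data, standing in for the BBC news training run)
texts = [
    "new smartphone chip boosts battery life",
    "software update fixes browser security bug",
    "striker scores twice in cup final",
    "coach praises team after league win",
]
labels = ["tech", "tech", "sport", "sport"]
model_dir = tempfile.mkdtemp()
vec = TfidfVectorizer()
clf = LinearSVC().fit(vec.fit_transform(texts), labels)
for name, obj in (("vectorizer.pickle", vec), ("model.pickle", clf)):
    with open(os.path.join(model_dir, name), "wb") as f:
        pickle.dump(obj, f)

# Load the pickled objects back and classify a new sentence
vectorizer1, model1 = load_pickled(model_dir)
new_text = "team wins the cup final"
print(predict(model1, vectorizer1, new_text))
```

The loaded model should give the same prediction as the in-memory one it was pickled from, since unpickling restores the fitted parameters rather than retraining.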

Voilà! You have successfully learned how to save and load a pickled model and make predictions with it!

You can also try Joblib to save the models!

import joblib

Train your model and save using

# Save both files in the same directory, since the loader reads both from model_path
joblib.dump(model, model_path + "model.sav")
joblib.dump(vectorizer_model, model_path + "vectorizer.sav")

To load and test the model

# Load the saved vectorizer and model from the same directory
def loading_joblibPickle(model_path):
    vectorizer = joblib.load(model_path + "vectorizer.sav")
    model = joblib.load(model_path + "model.sav")
    return vectorizer, model

text = "tv future in the hands of viewers with home"

# Load and predict
# Project_path is the project root, assumed to be defined earlier in the full code
model_path = Project_path + "/08. Multi-class_text_classification/models/"
vectorizer1, model1 = loading_joblibPickle(model_path)
# predict() is the prediction helper from the full code (not shown here)
predict(model1, vectorizer1, text)

You can find the full code for both approaches on GitHub.

Written by Pema Gurung
