Text Classification Model – Classify Text SMS As Spam Or Ham
Text Classification is a technique for organizing or grouping text under labels or categories. It is also known as Text Tagging. For example, Sentiment Analysis, one form of Text Classification, would label the sentence “I am feeling very happy and energetic today” as ‘Positive’. Machine Learning algorithms and Deep Learning techniques are used together with Natural Language Processing (NLP) to organize text data. This article discusses how a pre-trained Machine Learning model can classify a given text SMS as Spam or Ham (not spam). The Text Classification Model is based on the Multinomial Naive Bayes algorithm, and the complete code is written in Python.
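To give a feel for what a Multinomial Naive Bayes classifier does, here is a minimal pure-Python sketch. It is not the pre-trained model used in this article: the four training messages, the function names, and the whitespace tokenizer are all made up for illustration, and add-one (Laplace) smoothing is assumed.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(messages, labels, alpha=1.0):
    """Count documents per class and words per class (illustrative helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_proba(model, text):
    """Return a {label: probability} dict for one message."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior of the class
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training are ignored
            count = word_counts[label][token]
            # smoothed log likelihood of the word given the class
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        log_scores[label] = score
    # turn log scores into probabilities that sum to 1
    max_score = max(log_scores.values())
    exp_scores = {k: math.exp(v - max_score) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Toy training data, invented for this sketch
messages = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting today",
    "see you at lunch today",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_multinomial_nb(messages, labels)
probs = predict_proba(model, "claim your free prize")
print(probs)  # spam gets the higher probability for this message
```

The class with the highest probability becomes the predicted label, and the probabilities themselves play the role of the confidence values discussed later in the article.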
Examples Where We Can Use Text Classification
- Sentiment Analysis of a given text: Positive or Negative
- Categorization of News Articles or Topic Detection
- Detection of Language
Steps Involved In SMS Text Classification (Python-Based)
- Load Pre-Trained Model
- Provide Text Data
- Get Prediction
We have created a Flask-based REST API for the model, and we will use its endpoint to get the prediction. The model labels the text as Spam or Ham and reports its level of confidence in each label.
Open Python Editor & Load Pre-Trained Model
import pickle

# load the fitted Label Encoder and Count-vectorizer
with open('parameters.pickle', "rb") as f:
    Le, cv = pickle.load(f)

# load the trained classifier
with open('classifier.pickle', "rb") as m:
    clf = pickle.load(m)
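The article does not show how the two pickle files were created. As a rough sketch, assuming the objects are scikit-learn's `LabelEncoder`, `CountVectorizer`, and `MultinomialNB` (the method names used in this article match that API), files with the same layout could be produced as follows. The toy messages and the temporary output directory are invented for illustration; a real model would be trained on a full SMS dataset.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Toy data, made up for this sketch
messages = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting today",
    "see you at lunch today",
]
labels = ["spam", "spam", "ham", "ham"]

# LabelEncoder sorts the class names, so ham -> 0 and spam -> 1
le = LabelEncoder()
y = le.fit_transform(labels)

# Turn each message into a vector of word counts
cv = CountVectorizer()
X = cv.fit_transform(messages)

clf = MultinomialNB()
clf.fit(X, y)

# Save the encoder and vectorizer together as a tuple, the classifier separately
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "parameters.pickle"), "wb") as f:
    pickle.dump((le, cv), f)
with open(os.path.join(out_dir, "classifier.pickle"), "wb") as m:
    pickle.dump(clf, m)
```

Because the encoder and vectorizer are stored as one tuple, loading `parameters.pickle` unpacks into two names. The sorted class order is also why column 0 of `predict_proba` would correspond to ham and column 1 to spam under these assumptions.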
Complete Implementation with Python-Flask Module
import pickle
from flask import Flask, jsonify

# function to predict result
def model_prediction(usr_txt):
    # load the fitted Label Encoder and Count-vectorizer
    with open('parameters.pickle', "rb") as f:
        Le, cv = pickle.load(f)
    # load the trained classifier
    with open('classifier.pickle', "rb") as m:
        clf = pickle.load(m)
    # vectorize the input, then predict the label and the class probabilities
    cv_text = cv.transform([usr_txt]).toarray()
    pred_res = clf.predict(cv_text)
    proba = clf.predict_proba(cv_text)[0]
    # assumes the encoder maps ham to class 0 and spam to class 1
    return jsonify(pred_label=Le.inverse_transform(pred_res)[0],
                   confidence={'ham': proba[0], 'spam': proba[1]},
                   input_text=usr_txt)
# flask implementation
app = Flask(__name__)

@app.route("/")
def home():
    return '''Created & Distributed by Pykit: https://pykit.org/'''

@app.route("/smsPredict/<string:txt>")
def smsPredict(txt):
    userText = txt
    return model_prediction(userText)

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)
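The script above reads both pickle files from disk on every request. One possible refinement, shown here only as a sketch, is to cache the loaded objects so the files are read once per process. The helper name `load_artifacts` is hypothetical and not part of the original code.

```python
import pickle
from functools import lru_cache


@lru_cache(maxsize=1)
def load_artifacts(params_path="parameters.pickle", clf_path="classifier.pickle"):
    """Load the encoder, vectorizer and classifier once, then reuse them."""
    with open(params_path, "rb") as f:
        le, cv = pickle.load(f)
    with open(clf_path, "rb") as m:
        clf = pickle.load(m)
    return le, cv, clf
```

Inside the prediction function, `Le, cv, clf = load_artifacts()` could then replace the two `with open(...)` blocks. The trade-off is that a retrained model is only picked up after the server restarts.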
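Execute the above Python script.
Open the link shown in the console in a new browser tab (Flask's development server defaults to http://127.0.0.1:5000/).
Append a text to the /smsPredict/ path in the address bar and press Enter to get the result.
The result is returned as JSON.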
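The same request can also be sent from Python instead of the browser. The sketch below uses only the standard library and assumes the server is running at Flask's default development address; the helper names `build_predict_url` and `get_prediction` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Flask's default development address (an assumption; adjust if yours differs)
BASE_URL = "http://127.0.0.1:5000"


def build_predict_url(text, base_url=BASE_URL):
    """Percent-encode the message so spaces and '?' survive inside the URL path."""
    return f"{base_url}/smsPredict/{quote(text, safe='')}"


def get_prediction(text, base_url=BASE_URL):
    """Call the running API and decode its JSON response into a dict."""
    with urlopen(build_predict_url(text, base_url)) as response:
        return json.load(response)


# Building the URL needs no running server
print(build_predict_url("Are we meeting today?"))
```

With the Flask app running, `get_prediction("Are we meeting today?")` would return a dictionary with the keys explained in the next section. Encoding matters because a raw `?` in the message would otherwise be read as the start of a query string and cut the text short.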
Explanation of the JSON Result
- confidence: The level of confidence with which the Text Classification Model predicts a message as Spam or Ham. For example, the sample result gives the message around a 99.2% chance of being Ham and a 0.8% chance of being Spam.
- input_text: The text provided by the user.
- pred_label: The label predicted by the Text Classification Model.
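The fields above can be read programmatically as well. Here is a small sketch that parses a response and flags low-confidence predictions. The response string is a hand-written stand-in built from the figures quoted above; the input text, the lowercase label spelling, the `summarize` helper, and the 0.9 threshold are all assumptions for illustration.

```python
import json

# Hand-written stand-in for an API response (not real model output)
sample_response = (
    '{"confidence": {"ham": 0.992, "spam": 0.008}, '
    '"input_text": "See you at lunch today", '
    '"pred_label": "ham"}'
)


def summarize(response_text, threshold=0.9):
    """Return the label, its confidence, and whether it clears the threshold."""
    data = json.loads(response_text)
    label = data["pred_label"]
    # assumes pred_label is spelled exactly like a key of "confidence"
    confidence = data["confidence"][label]
    return label, confidence, confidence >= threshold


label, confidence, is_confident = summarize(sample_response)
print(f"{label} ({confidence:.1%}), confident: {is_confident}")
```

A threshold like this can help route borderline messages to a manual check rather than trusting the predicted label outright.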
The complete implementation of the code is available at my GitHub repository.
If you want to learn the complete steps of building an SMS Spam Classification Model from scratch, you can check out my recent article “Build Email Spam Classification Model (Naive Bayes Classifier)”.
Summary
In this article, we discussed how to use a pre-trained model to classify a given text SMS as Spam or Ham (not spam). For text classification, we used a Multinomial Naive Bayes model written in Python to predict the result. We also implemented the complete setup as a Flask-based REST API.