Create an SMS Spam Classifier

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An example would be assigning a given email to the “spam” or “non-spam” class, or assigning a diagnosis to a given patient based on observed characteristics of the patient (gender, blood pressure, presence or absence of certain symptoms, etc.).


In simple terms, classification uses data that has already been labeled to assign a category to new data.


For example, suppose you want to train a model to find out whether an email is spam or not. To do this, you would make a CSV file of all the emails you have already marked as spam and all the emails you have marked as not spam.

Download this sample file: http://examples.hopdata.com/sms-spam.csv

Each row holds a label followed by the message text, for example:


ham,As per your request ‘Melle Melle (Oru Minnaminunginte Nurungu Vettam)’ has been set as your callertune for all Callers. Press *9 to copy your friends Callertune
spam,WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.

Here “ham” means not spam and “spam” means spam.
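
If you want to inspect a file of this shape before uploading it, a minimal Python sketch such as the following could load the rows. It assumes every line is "label,message" with no header row, and that only the first comma separates the label from the text. The function name parse_rows is hypothetical and not part of HopData, and the two sample lines are shortened versions of the rows shown above.

```python
def parse_rows(lines):
    """Split 'label,message' lines into (label, message) pairs."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only, since the message may contain commas.
        label, _, text = line.partition(",")
        rows.append((label.strip().lower(), text.strip()))
    return rows


# Shortened sample rows (assumption: same layout as the downloaded file).
sample = [
    "ham,Press *9 to copy your friends Callertune",
    "spam,WINNER!! To claim call 09061701461. Claim code KL341.",
]
rows = parse_rows(sample)
for label, text in rows:
    print(label, "->", text)
```

Splitting on the first comma only keeps messages that contain commas in one piece. A file that wraps messages in quotes would need Python's csv module instead.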

To train the model, go to http://app.hopdata.com and upload the file.


HopData’s machine learning engine will scan the data and learn which word combinations indicate spam and which do not.
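
To give a feel for how a model can relate words to labels, here is a small sketch of a Naive Bayes text classifier with add-one smoothing. This is only an illustration of one common technique; how HopData’s engine works internally is not described here, and it may use a different method. The function names and the four training messages are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(rows):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of messages
    for label, text in rows:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_msgs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the share of training messages with this label.
        score = math.log(label_counts[label] / total_msgs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
rows = [
    ("ham", "are we still meeting for lunch today"),
    ("ham", "call me when you get home"),
    ("spam", "winner you have been selected for a prize claim now"),
    ("spam", "claim your free prize call now"),
]
word_counts, label_counts = train(rows)
print(classify("you won a free prize", word_counts, label_counts))
print(classify("lunch at home today", word_counts, label_counts))
```

With this toy data the first message is labeled spam and the second ham, because words such as “prize” and “free” appear only in the spam examples while “lunch” and “home” appear only in the ham examples. A real classifier needs far more training data than four messages.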

Once the model is ready, you can test it by submitting some example text using the form at the bottom of the page.

You can create more classifiers the same way.

