Posts

Showing posts from July, 2020

Smart website category classifier

Presented by Madhan Kumar Selvaraj

According to Worldometer statistics, around 5,361,900 new blogs and websites are created each day. Of these, most are adult websites, and half of the adult websites are spam sites that deliver malware when you download content from them or that try to steal your credentials.

Real-world issue

A website URL shared on a social media account is not classified as spam or not spam. I ran into a scenario that required picking out job posting websites from among billions of websites. But there is no perfect classifier, and the classifiers that do exist are proprietary. Sentiment analysis is applied to the comment section of a website or blog, but there is no option for classifying a website URL that a user posts in a comment, and such a URL may point to malware or an adult website.

To create the website classifier, we are going to use the spaCy library, which is an alternative to the Natural Language Toolkit (NLTK) library, machine learning algorithms, FastAPI w