Saturday, January 17, 2015

text classification


Many of these methods, including support vector machines (SVMs), the main topic of this chapter, have been applied with success to information retrieval problems, particularly text classification.

While several machine learning methods have been applied to this task, use of SVMs has been prominent. Support vector machines are not necessarily better than other machine learning methods (except perhaps in situations with little training data), but they perform at the state-of-the-art level and have much current theoretical and empirical appeal.
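To make concrete what a linear SVM optimizes on text, here is a minimal sketch that minimizes hinge loss plus an L2 penalty by stochastic subgradient descent over bag-of-words counts. The toy documents, the labels, and the hyperparameters (`epochs`, `lam`, `lr`) are all assumptions made up for illustration, and this is not the solver used by standard SVM packages.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())

def score(w, b, x):
    """Linear decision value w.x + b for a sparse feature dict x."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b

def train_linear_svm(examples, epochs=50, lam=0.01, lr=0.1):
    """examples: list of (text, label) pairs with label in {-1, +1}."""
    data = [(featurize(text), y) for text, y in examples]
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * score(w, b, x)
            # Subgradient of the L2 penalty: shrink every weight a little.
            for t in w:
                w[t] *= 1.0 - lr * lam
            # Subgradient of the hinge loss: only active when margin < 1.
            if margin < 1:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b

def predict(w, b, text):
    return 1 if score(w, b, featurize(text)) >= 0 else -1

# Hypothetical toy training set: +1 = spam, -1 = not spam.
TRAIN = [
    ("cheap pills buy now", 1),
    ("buy cheap watches now", 1),
    ("meeting agenda for monday", -1),
    ("project meeting notes attached", -1),
]

weights, bias = train_linear_svm(TRAIN)
print(predict(weights, bias, "buy cheap pills"))
print(predict(weights, bias, "monday project meeting"))
```

In practice one would use an established SVM implementation; the sketch is only meant to show that the model is a weighted sum over term features with a margin-based loss.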

It is frequently the case that greater performance gains can be achieved from exploiting domain-specific text features than from changing from one machine learning method to another.
Understanding the data is one of the keys to successful categorization.
This process is generally referred to as feature engineering. At present, feature engineering remains a human craft, rather than something done by machine learning. Good feature engineering can often markedly improve the performance of a text classifier. [http://nlp.stanford.edu/IR-book/html/htmledition/features-for-text-1.html]
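As one small illustration of what a hand-crafted text feature can look like, the sketch below collapses URLs and numbers into placeholder tokens before counting terms, so that a classifier sees "contains a number" rather than thousands of distinct numerals. The regular expressions and the placeholder names `#URL#` and `#NUM#` are assumptions chosen for this example, not something prescribed by the source.

```python
import re
from collections import Counter

# Hypothetical patterns, kept deliberately simple.
URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
TOKEN_RE = re.compile(r"#[A-Z]+#|[a-z]+")

def extract_features(text):
    """Return term counts with URLs and numbers mapped to placeholders."""
    text = text.lower()
    # Replace URLs first so digits inside a URL are not treated as numbers.
    text = URL_RE.sub(" #URL# ", text)
    text = NUMBER_RE.sub(" #NUM# ", text)
    return Counter(TOKEN_RE.findall(text))

features = extract_features(
    "Visit http://example.com/deal NOW, only 19.99 or 2,500 points!"
)
print(features)
```

Which normalizations actually help depends on the collection, which is why this step is usually driven by looking at the data.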


Semi-supervised training methods. This includes methods such as bootstrapping or the EM algorithm, which we will introduce in Section 16.5.
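A minimal sketch of self-training, one simple form of bootstrapping: train on the labeled documents, label the unlabeled documents the model is confident about, add them to the training set, and repeat. The per-class token-count "classifier", the confidence measure, the `threshold` value, and the toy documents are all assumptions made for illustration; a real system would plug in a proper classifier.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_counts(labeled):
    """Per-class token counts; a deliberately simple stand-in classifier."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def predict(counts, text):
    """Return (label, confidence); (None, 0.0) if no token is known."""
    tokens = tokenize(text)
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confidently self-labeled documents."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        counts = train_counts(labeled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(counts, text)
            if label is not None and conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pool = remaining
    return train_counts(labeled), pool

# Hypothetical toy data.
LABELED = [("cheap pills online", "spam"), ("team meeting monday", "ham")]
UNLABELED = ["cheap watches online", "meeting agenda monday",
             "watches sale", "agenda draft"]

model, leftover = self_train(LABELED, UNLABELED)
print(predict(model, "sale"), leftover)
```

The risk with this scheme is that early mistakes get reinforced in later rounds, so the confidence threshold matters.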


How to adjust the weights of an SVM without destroying the overall classification accuracy.


It may be best to choose a classifier based on the scalability of training or even runtime efficiency.




