ML+NLP: sentiment analysis

sentiment words

In Feldman et al. (2011), expert investors in microblogs were identified and sentiment analysis of stocks was performed.

The Stock Sonar - Sentiment Analysis of Stocks Based on a Hybrid Approach.
Feldman et al. (2011) introduced a hybrid approach for stock sentiment analysis based on companies’ news articles.
a nonparametric approach: the Dirichlet Process Mixture (DPM) model
First, we employ a DPM to estimate the number of topics in the streaming snapshot of tweets in each day.

Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach.

In Wang et al. (2011), the authors proposed a graph-based hashtag approach to classifying Twitter post sentiments, and in Kouloumpis et al. (2011), linguistic features and features that capture information about the informal and creative language used in microblogs were also utilized.

PMI:
Turney (2002) Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews.

The PMI-IR algorithm is employed to estimate the semantic orientation of a phrase (Turney, 2001). PMI-IR uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words or phrases. The semantic orientation of a given phrase is calculated by comparing its similarity to a positive reference word (“excellent”) with its similarity to a negative reference word (“poor”).

The PMI between two words is defined as:

PMI(word1, word2) = log2 [ p(word1 & word2) / (p(word1) p(word2)) ]

Here, p(word1 & word2) is the probability that word1 and word2 co-occur. If the words are statistically independent, then the probability that they co-occur is given by the product p(word1) p(word2). The ratio between p(word1 & word2) and p(word1) p(word2) is thus a measure of the degree of statistical dependence between the words. The log of this ratio is the amount of information that we acquire about the presence of one of the words when we observe the other.
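
A minimal sketch of the PMI and semantic-orientation calculation described above, assuming co-occurrence counts are already available (for example from search-engine hit counts). The function names, the argument layout, and the small smoothing constant `eps` are illustrative assumptions, not taken from the cited papers.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits: log2( p(x & y) / (p(x) * p(y)) ), probabilities from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

def semantic_orientation(near_excellent, near_poor,
                         n_phrase, n_excellent, n_poor, total, eps=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    eps is an assumed smoothing constant that keeps the logarithm
    defined when a co-occurrence count is zero.
    """
    pos = pmi(near_excellent + eps, n_phrase, n_excellent, total)
    neg = pmi(near_poor + eps, n_phrase, n_poor, total)
    return pos - neg

# Hypothetical counts: the phrase co-occurs with "excellent" more often than with "poor".
so = semantic_orientation(near_excellent=30, near_poor=5,
                          n_phrase=100, n_excellent=200, n_poor=200,
                          total=10_000)
print(so > 0)  # a positive score suggests a positive orientation
```

A phrase is read as positive when the score is above zero and negative when it is below zero.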

Turney (2001)
Thumbs up? Sentiment Classification using Machine Learning Techniques
http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf

LDA (2013)

The unsupervised approach was also used by Xianghua and Guo [50] to automatically discover the aspects discussed in Chinese social reviews and the sentiments expressed on different aspects. They used an LDA model to discover multi-aspect global topics of social reviews, then extracted the local topic and associated sentiment based on a sliding-window context over the review text. They worked on social reviews extracted from a blog data set (2000-SINA) and a lexicon (300-SINA Hownet). They showed that their approach obtained good topic partitioning results and helped to improve SA accuracy. It also helped to discover multi-aspect fine-grained topics and their associated sentiment.
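
The sliding-window step can be illustrated with a rough sketch. It assumes the aspect terms have already been found (the LDA step is not shown), and it uses a toy polarity lexicon and an arbitrary window size. The function name `window_sentiment` and all data below are hypothetical and not the method of [50].

```python
def window_sentiment(tokens, aspect_terms, lexicon, window=2):
    """Sum lexicon polarities within `window` tokens on each side of every aspect term."""
    scores = {}
    for i, tok in enumerate(tokens):
        if tok not in aspect_terms:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Words missing from the lexicon contribute 0.
        local = sum(lexicon.get(t, 0) for t in tokens[lo:hi])
        scores[tok] = scores.get(tok, 0) + local
    return scores

tokens = "the screen is great but the battery is terrible".split()
aspects = {"screen", "battery"}          # assumed output of a topic model
lexicon = {"great": 1, "terrible": -1}   # toy polarity lexicon

print(window_sentiment(tokens, aspects, lexicon))
# {'screen': 1, 'battery': -1}
```

The window size controls how much context is attributed to each aspect; too wide a window lets sentiment words meant for one aspect leak into another.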

There are other unsupervised approaches that depend on semantic orientation using PMI [82] or lexical association using PMI, semantic spaces, and distributional similarity to measure the similarity between words and polarity prototypes [83].

[82] Turney P. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the annual meeting of the Association for Computational Linguistics (ACL’02); 2002.
[83] Read J, Carroll J. Weakly supervised techniques for domain-independent sentiment classification. In: Proceedings of the 1st international CIKM workshop on topic-sentiment analysis for mass opinion; 2009. p. 45–52.

tweets

Sentiment classification aims to identify the sentiment (or polarity) of retrieved opinions. There are two categories of approaches for this task.
One approach is to develop linguistic resources for sentiment orientation and the structures of sentiment expression, and then classify the text based on these resources [16]. Linguistic resource development aims to construct resources that provide the subjectivity, orientation, and strength of terms, and that make further opinion mining tasks possible. WordNet expansion and statistical estimation [18], such as the point-wise mutual information method, are the two major methods.
The second approach is to train and deploy a sentiment classifier, which can be built with several methodologies, such as support vector machine (SVM), maximum entropy, and naïve Bayes [46].
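
As an illustration of the second approach, here is a minimal multinomial naïve Bayes classifier with add-one smoothing, written with the standard library only. The training sentences and the function names `train_nb` and `predict_nb` are toy assumptions; a real system would use a much larger labeled corpus and a tested library implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns (label counts, per-label word counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior; unseen words are ignored."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for t in tokens:
            if t in vocab:
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    ("good great fun".split(), "pos"),
    ("love this great film".split(), "pos"),
    ("bad boring awful".split(), "neg"),
    ("hate this awful film".split(), "neg"),
]
model = train_nb(train)
print(predict_nb(model, "great fun film".split()))  # pos
print(predict_nb(model, "awful boring".split()))    # neg
```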

Several recent works have addressed the sentiment analysis of microblog opinions. In [8], the authors use a predefined lexicon of positive and negative words to classify Twitter posts and relate the resulting sentiment fluctuations to poll results, such as the consumer confidence survey and the job approval rating of President Obama in the US. The authors argue that time-intensive and expensive polls could be supplemented or supplanted by simply analyzing the text on microblogs. In [9], the authors develop an analytical methodology and visual representations that could help a journalist or public affairs manager better understand the temporal dynamics of sentiment in reaction to the debate video. The authors demonstrate visuals and metrics to detect the sentiment pulse, anomalies in that pulse, and indications of controversial topics, which can be used to inform the design of visual analytic systems for social media events.
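
A lexicon-based tracker of the kind described for [8] might look roughly like the sketch below. The word lists, the whitespace tokenization, and the choice of a daily positive-to-negative ratio as the score are all assumptions made for illustration; the notes above do not specify how [8] computes its score.

```python
from collections import defaultdict

POSITIVE = {"good", "great", "hope"}   # toy lexicon, not from [8]
NEGATIVE = {"bad", "worse", "fear"}

def daily_sentiment_ratio(tweets):
    """tweets: iterable of (day, text).

    Returns {day: positive tweet count / negative tweet count},
    skipping days that have no negative tweets.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, text in tweets:
        words = set(text.lower().split())  # naive tokenization, punctuation not handled
        if words & POSITIVE:
            pos[day] += 1
        if words & NEGATIVE:
            neg[day] += 1
    days = set(pos) | set(neg)
    return {day: pos[day] / neg[day] for day in days if neg[day] > 0}

tweets = [
    ("d1", "good jobs news"),
    ("d1", "bad economy"),
    ("d1", "great hope"),
    ("d2", "fear and worse"),
]
print(daily_sentiment_ratio(tweets))  # d1 -> 2.0, d2 -> 0.0
```

The resulting daily series is what would then be compared against a poll time series.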

To classify sentiments on microblogs, machine learning should be well suited because many new sentiment words are invented and widely used on microblogs. It is difficult to determine the sentiment polarity of many exclamations and emoticons, such as “arrrg” and “>__b”, with the common approach of constructing sentiment linguistic resources. With large and up-to-date training data, machine learning methods are better able to deal with those words. Our framework uses an SVM classifier, and we apply several heuristic preprocessing steps and test different features to provide a more accurate classification.

ASPECT EXTRACTION

An opinion always has a target.

Extraction based on frequent nouns and noun phrases
Extraction by exploiting opinion and target relations
Extraction using supervised learning
Extraction using topic modeling

Zhu et al. (2009) proposed a method based on the C-value measure from Frantzi et al. (2000) for extracting multi-word aspects. The resulting set of candidates is then refined using a bootstrapping technique with a set of given seed aspects. The idea of refinement is based on each candidate’s co-occurrence with the seeds.
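
For reference, a small sketch of the C-value score in its commonly cited form: log2 of the term length times the term frequency, with the average frequency of longer candidates that contain the term subtracted when it is nested. The notes above give no formula, so treat this formulation, the helper names, and the example frequencies as assumptions. The bootstrapping refinement with seed aspects is not shown.

```python
import math

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside the strictly longer term."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_value(freq):
    """freq: {tuple_of_words: frequency}. Returns {term: C-value}."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidate terms that contain `a`.
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        if nesting:
            f_a = f_a - sum(nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores

freq = {
    ("battery", "life"): 10,         # hypothetical counts
    ("long", "battery", "life"): 4,
}
for term, score in c_value(freq).items():
    print(" ".join(term), round(score, 2))
# battery life 6.0
# long battery life 6.34
```

Because log2(1) = 0, single-word terms always score zero under this form, which is why the measure is used for multi-word aspects.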


Monday, January 19, 2015
