Population Statistic: Read. React. Repeat.
Wednesday, August 26, 2009

As far as I can tell, sentiment analysis is just the latest spin on a continuing attempt to discern deeper insights from a fairly shallow expressive vein:

The simplest algorithms work by scanning keywords to categorize a statement as positive or negative, based on a simple binary analysis (“love” is good, “hate” is bad). But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions. Reliable sentiment analysis requires parsing many linguistic shades of gray.
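The keyword-scanning approach described above can be sketched in a few lines. This is an illustrative toy, not any vendor's actual implementation; the word lists and scoring are assumptions, and the second example shows exactly the sarcasm failure the article points out.

```python
import re

# Tiny illustrative polarity lexicons (assumed, not from any real product).
POSITIVE = {"love", "great", "excellent", "good", "wonderful"}
NEGATIVE = {"hate", "terrible", "awful", "bad", "horrible"}

def keyword_polarity(text: str) -> str:
    """Classify a statement by counting positive vs. negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_polarity("I love this phone"))         # "positive" — correct
print(keyword_polarity("Oh great, it broke again"))  # "positive" — sarcasm fools it
```

The second call is the whole problem in miniature: "great" scores as praise even though any human reader hears the irony.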

“We are dealing with sentiment that can be expressed in subtle ways,” said Bo Pang, a researcher at Yahoo who co-wrote “Opinion Mining and Sentiment Analysis,” one of the first academic books on sentiment analysis.

To get at the true intent of a statement, Ms. Pang developed software that looks at several different filters, including polarity (is the statement positive or negative?), intensity (what is the degree of emotion being expressed?) and subjectivity (how partial or impartial is the source?).

For example, a preponderance of adjectives often signals a high degree of subjectivity, while noun- and verb-heavy statements tend toward a more neutral point of view.
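The adjective-ratio heuristic can be sketched roughly as follows. The hardcoded adjective set here is a stand-in for a real part-of-speech tagger, and the threshold is an arbitrary assumption; the point is only the shape of the idea, not Pang's actual method.

```python
import re

# Stand-in for a real POS tagger: a tiny assumed adjective lexicon.
ADJECTIVES = {"amazing", "gorgeous", "stunning", "terrible", "awful", "cheap"}

def subjectivity(text: str) -> float:
    """Fraction of words that are adjectives — a crude subjectivity proxy."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in ADJECTIVES for w in words) / len(words)

opinion = "An amazing, gorgeous phone with a stunning screen"
fact = "The phone has a 4-inch screen and weighs 137 grams"
print(subjectivity(opinion))  # 0.375 — adjective-heavy, reads as subjective
print(subjectivity(fact))     # 0.0   — noun/verb-heavy, reads as neutral
```

Even this toy version shows why the heuristic is fragile: it lives or dies on how complete the adjective lexicon (or tagger) is, which circles back to the keyword problem.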

As with most such semantic Web exercises, sentiment analysis relies far too much on keywords to interpret Web media and users. Tags, metadata, and just plain old descriptive text are all, by their nature, subject to ambiguity. Because people often choose their words imperfectly, wholesale analysis of those keyword markings will never be wholly reliable. The 70-80 percent accuracy that Scout Labs, Jodange and other firms claim is probably the upper limit, and even those results aren't useful beyond surface public opinion (and only a sub-sub-segment of that, since a minority of the total population actively engages in blogs and other social media).

Naturally, such research focuses on Web keywords and written communication because that’s all there is by which to navigate Web data. But that avoids the inherent problem: Keywords reveal only so much. Because the Internet is still primarily text-based, there’s a distinct limit to how, and how much, people will populate it. Tinkering with grammatical syntax and such builds great algorithms, but it’s never going to uncover particularly deep sentiments from the online hive-mind.

by Costa Tsiokos, Wed 08/26/2009 02:09:04 PM
Category: Business, Social Media Online, Society

2 Feedbacks »
  1. Hi Costa, I think many people mix semantic web research with machine learning research. I agree a keyword representation of content seems limiting (and backwards!). Most machine learning techniques look to address specific questions by making use of large sets of textual data. Perhaps it is short-sighted to say that these evolving techniques will never uncover deep sentiments.

    Comment by david — 08/27/2009 @ 07:47:57 AM

  2. Thanks david. I think the key is those large sets of textual data — sheer scale is the only thing giving these analyses any validity. But the raw material is still all that text, and as long as the source is imperfect, the large-scale results are going to be imperfect.

    Comment by CT — 08/27/2009 @ 10:07:13 AM
