ADVERTISEMENT

Identifying Extreme Vs. Moderate Language In Online Ratings

Those who take moderate and nuanced positions often lament the rise in extreme rhetoric and polarized debates. While social media is often identified as the culprit behind this trend, polarizing language is also felt to be increasing in mainstream and online media.

In our most recently published paper (Al-Mosaiwi & Johnstone, 2018), we use automated text-analysis to identify linguistic markers for extreme and moderate language. We then employ machine learning to train a classifier (computer algorithm) to recognize extreme and moderate language. The predictive accuracy of the classifier is then tested by running it on new extracts of moderate and extreme natural language texts. Based on the linguistic patterns it has been trained to recognize, it will label each text extract as either “extreme” or “moderate.”

ADVERTISEMENT

How do we determine what is extreme and what is moderate language? Here we use the same methodology employed in sentiment analysis. In traditional sentiment analysis, an objective rating scale is used to determine if a review is positive or negative. For example, on the website IMDB, if a film review awards a film more than 5 stars (>5/10), it is deemed a positive review; if it awards fewer than 5 stars (<5/10), then it is a negative review. In sentiment analysis, machines learn the linguistic patterns associated with positive and negative reviews, as determined by this objective rating scale.

The use of objective rating scales to determine extreme and moderate responses is well established in psychology. On questionnaires with 7-point Likert type scales, a response that selects the absolute end-point of the scale (i.e. 1 or 7) is considered an extreme response. Conversely, a response that selects between 2-6 is considered a moderate response.

Extreme responding on these Likert scales have been linked to psychopathology (mental health disorders), personality disorders, lower IQ, lower life success outcomes, and certain cultures. On that last point, there have been studies that have linked extreme responding on Likert scales to Black (e.g. Bachman, O’Malley, & Freedman-Doan, 2010) and Latino (e.g. Marin, Gamba, & Marin, 1992) cultures, while moderate responding has been linked to Asian cultures (Chen, Lee, & Stevenson, 1995).

In our study, we use responses on an objective rating scale to determine extreme or moderate classifications and study the linguistic patterns associated with them. Our data sources are IMDB, TripAdvisor, and Amazon reviews, as they all have natural language written reviews accompanied by objective ratings.

ADVERTISEMENT

We wanted to identify features that would generalize beyond these three websites. Therefore, we restricted our feature selection to “functional” words. Nouns, verbs, and adjectives are “content” words which are highly dependent on the subject of the text. Conversely, articles, prepositions, and determiners are examples of functional words, which relate to the grammatical structure of a text and have little connection to the content. By exclusively focusing on functional words, we can generalize our findings beyond the websites in the study.

We identified 11 unigrams (words or symbols) which are markers specific to extreme natural language and 20 markers which are specific to moderate natural language (see fig. 1). Of the markers for extreme reviews, “ever,” “never,” and “anyone” all denote absolutes; “my,” “you,” and “your” are determiners/possessive pronouns; there are two negations — “can’t” and “doesn’t” — which are used as categorical imperatives; and exclamation marks are intensifiers which signal extreme language. For the moderate review linguistic markers,”but,” “though,” “despite,” “other,” and “however” all introduce nuance; “much” and “more” both refer to large amounts; “rather,” “somewhat,” “sometimes,” and “some” all specify a moderate extent; “seem,” “maybe,” and “probable” have a vague and noncommittal property, while “overall” seeks to combine separate components; finally, the word “certain” relates to specifying subcomponents.

Our classification algorithm had a prediction accuracy of over 90% when tested on new data and trained to recognize extreme and moderate patterns in natural language, based on the above markers.

Measuring extremism and moderation using natural language is more informative, flexible, and ecologically valid than the previous method, which was reliant on Likert-type scales. Moreover, at a time when there is an ever-growing proliferation of online natural language data, the classifier could be continually improved with greater training and can be employed to identify extreme and moderate language in a variety of online sources.

ADVERTISEMENT

These findings are described in the article entitled Linguistic markers of moderate and absolute natural language, recently published in the journal Personality and Individual DifferencesThis work was conducted by Mohammed Al-Mosaiwi and Tom Johnstone from the University of Reading.

Comments

READ THIS NEXT

What Is Cytosol?

The interior of eukaryotic cells can be divided into two main parts: the nucleus and the cytoplasm. The nucleus contains […]

Bias In Brazil’s Conservation Efforts

Setting aside land for nature conservation is a strategy that dates back centuries ago. During the 20th and 21st centuries, […]

Pangea: The Supercontinent

Pangea is a supercontinent that existed from 335 to 175 million years ago, it was the most recent supercontinent as […]

Examining Increased Costs Associated With Joint Government Programs

U.S. federal agencies exist to accomplish specific missions; for example, the Department of Defense (DoD) ensures national security, and the […]

Entomological Time Travel: Invasion History of the Florida Buckeye Butterflies

On Halloween, 1981, Dave Baggett and 25 other butterfly and moth enthusiasts, members of the Southern Lepidopterists’ Society, gathered for […]

The World’s Biggest Killer: Pollution Kills Over 9 Million People Every Year

As more and more research is done, the harm done by pollution becomes ever clearer. A new report recently published […]

How To Write A Cursive Lowercase b

Are you curious about how to write a cursive lowercase “b”? This article will cover some general facts about cursive […]

Science Trends is a popular source of science news and education around the world. We cover everything from solar power cell technology to climate change to cancer research. We help hundreds of thousands of people every month learn about the world we live in and the latest scientific breakthroughs. Want to know more?