Rethinking online product review usefulness

Checking review usefulness against review visibility. The distribution of ratings. Product type segmentation analysis. Determining the price category. Improving buyers' purchasing decisions by correcting erroneous rankings produced by the current sorting algorithm.


The analysis of the reviews' contents yielded especially disappointing results: only the presence of an analysis of the product description's truthfulness and mentions of the product's packaging proved significant. The presence of an analysis of the product description's truthfulness indicated an increase in review helpfulness of 1.73% on average. Mentions of the packaging and/or delivery, surprisingly, also indicated an increase of 2.82%.

As for stylistic choices, the only significant non-informative textual elements were the use of exclamation marks ("!") and the average length of runs of consecutive parentheses, which denote a series of truncated, stacked "smiley faces" (in Russia, "(((((" signals negative emotions and ")))))" signals positive ones). Each run of exclamation marks within a line (so both "!" and "!!!" count as a single use) indicated a decrease of 0.46% in review helpfulness on average. An increase of 1 in the average length of these parenthesis "smiley rows" indicated a decrease in review helpfulness of 1.16% on average.
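For reference, a minimal sketch in R of how these two stylistic features can be extracted (review_text, the function names and the run-of-two threshold for "smiley rows" are our illustrative assumptions, not the exact extraction code):

count_exclamation_runs <- function(text) {
  m <- gregexpr("!+", text)[[1]]       # positions of maximal runs of "!"
  if (m[1] == -1) 0 else length(m)     # -1 means no match; "!" and "!!!" each count once
}
average_smile_conga <- function(text) {
  m <- gregexpr("[()]{2,}", text)[[1]] # runs of 2+ parentheses, e.g. "(((((" or ")))))"
  if (m[1] == -1) return(0)
  mean(attr(m, "match.length"))        # average run ("conga") length
}
count_exclamation_runs("Отлично!!! Рекомендую!")  # 2
average_smile_conga("Супер)))) но дорого((")      # 3, the mean of run lengths 4 and 2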

Review star ratings are the biggest helpfulness indicators in our model. Compared with 1-star reviews, 2-star reviews showed helpfulness ratios 15.51% lower on average, 3-star reviews 20.9% (!) lower, 4-star reviews 10.67% lower and 5-star reviews 5.43% lower. The squared star-rating divergence of a review from the community's status quo indicated an additional decrease of 3.8% for every squared star of divergence.

All of the factors described above have coefficients significant at at least the 95% confidence level. Other results were left out because of their insignificance.

Call:

lm(helpfulness ~ posting_order * oversaturation_point_passed + longer_comm + not_segmented + every_section + verified + anonymous + default_avatar + images + verified + user_experience + word_count * word_threshold_passed + pros_average_words_per_BP + cons_average_words_per_BP + any_digit_BP + any_math_BP + any_emoji_BP + twoSided_argumentation + expert_claim + product_comparisons + price_quality_analysis + product_description_truthfulness + summarised_opinion + conclusive_statement + measurement_indicators + price_mention + quality_mention + design_mention + images : Price_category + verified : Price_category + price_quality_analysis : Price_category + product_comparisons : Price_category + stars : user_experience + word_count : user_experience + P.S. + excessive_smiles * excessive_exclamations + delivery_and_packaging + call_to_action + I(status_quo_divergence^2) + factor(stars),

data = training, na.action = na.exclude)

Residuals:

Min 1Q Median 3Q Max

-0.73755 -0.10396 0.02344 0.12418 0.79685

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.594662682 0.026040306 22.836 < 0.0000000000000002 ***

word_count 0.000524737 0.000033808 15.521 < 0.0000000000000002 ***

word_threshold_passedTRUE 0.104956185 0.040777543 2.574 0.010073 *

every_section -0.022835527 0.025179064 -0.907 0.364471

not_segmented 0.012196822 0.015909870 0.767 0.443328

any_digit_BPTRUE 0.001574877 0.008584310 0.183 0.854441

any_math_BPTRUE 0.004675227 0.008864525 0.527 0.597923

any_emoji_BPTRUE 0.068428315 0.083538469 0.819 0.412738

pros_average_words_per_BP 0.000502091 0.000548764 0.915 0.360243

cons_average_words_per_BP -0.000254107 0.000410052 -0.620 0.535475

posting_order 0.000405381 0.000127912 3.169 0.001534 **

oversaturation_point_passedTRUE 0.001990071 0.009519549 0.209 0.834413

verifiedTRUE -0.015457340 0.018062242 -0.856 0.392141

anonymousTRUE -0.013099696 0.005410692 -2.421 0.015494 *

default_avatar 0.006349236 0.004413771 1.439 0.150326

imagesTRUE 0.014513859 0.014425748 1.006 0.314391

twoSided_argumentationTRUE -0.055126195 0.030267594 -1.821 0.068596

expert_claimTRUE 0.015228498 0.008401743 1.813 0.069936 .

product_comparisonsTRUE 0.000398665 0.008174937 0.049 0.961106

price_quality_analysisTRUE 0.010552110 0.008474792 1.245 0.213121

product_description_truthfulnessTRUE 0.017255366 0.004861943 3.549 0.000389 ***

summarised_opinionTRUE 0.005649765 0.005407088 1.045 0.296106

conclusive_statementTRUE -0.000233288 0.007111155 -0.033 0.973830

measurement_indicatorsTRUE 0.002629720 0.004418198 0.595 0.551724

price_mentionTRUE 0.000635700 0.004390553 0.145 0.884881

quality_mentionTRUE 0.007414021 0.004412303 1.680 0.092933 .

design_mentionTRUE -0.002669675 0.005141660 -0.519 0.603617

call_to_actionTRUE -0.004631161 0.005051093 -0.917 0.359239

delivery_and_packagingTRUE 0.028227235 0.006119663 4.613 0.0000040328870799 ***

exclamations -0.004550444 0.001174646 -3.874 0.000108 ***

excessive_exclamations -0.009255555 0.005293387 -1.749 0.080410 .

excessive_smiles -0.004634982 0.003719281 -1.246 0.212722

P.S.TRUE -0.004918008 0.016011287 -0.307 0.758730

average_exclamation_conga_length 0.004810905 0.002967193 1.621 0.104974

average_smile_conga_length -0.011643310 0.003260570 -3.571 0.000358 ***

I(status_quo_divergence^2) -0.038020206 0.001705228 -22.296 < 0.0000000000000002 ***

factor(stars)2 -0.155117870 0.012174398 -12.741 < 0.0000000000000002 ***

factor(stars)3 -0.209008470 0.017026014 -12.276 < 0.0000000000000002 ***

factor(stars)4 -0.106693882 0.019399203 -5.500 0.00000003904210189 ***

factor(stars)5 -0.054311872 0.018562987 -2.926 0.003444 **

word_count:word_threshold_passedTRUE -0.000396139 0.000078545 -5.043 0.0000004662145242 ***

word_count:user_experienceMore than a year -0.000121712 0.000051212 -2.377 0.017493 *

word_count:user_experienceSeveral months -0.000005037 0.000032725 -0.154 0.877682

word_count:ComplexitySimple 0.000301137 0.000029291 10.281 < 0.0000000000000002 ***

longer_commFALSE:comment_section 0.043398292 0.026026656 1.667 0.095459 .

longer_commTRUE:comment_section 0.007409575 0.017704885 0.419 0.675588

longer_commTRUE:every_section 0.047665444 0.029889196 1.595 0.110806

posting_order:oversaturation_point_passedTRUE -0.000423672 0.000133469 -3.174 0.001507 **

ComplexitySimple:posting_order -0.000281855 0.000052567 -5.362 0.0000000844458058 ***

verifiedFALSE:Price_categoryLow 0.093762773 0.021331477 4.396 0.0000111786839184 ***

verifiedTRUE:Price_categoryLow 0.058207938 0.009839589 5.916 0.0000000034262399 ***

verifiedFALSE:Price_categoryMedium 0.050545572 0.020394147 2.478 0.013214 *

verifiedTRUE:Price_categoryMedium 0.064340687 0.008581307 7.498 0.0000000000000711 ***

Price_categoryLow:imagesTRUE 0.009914739 0.023002915 0.431 0.666463

Price_categoryMedium:imagesTRUE 0.013579528 0.018181231 0.747 0.455145

user_experienceMore than a year:verifiedTRUE 0.084337872 0.011068598 7.620 0.0000000000000280 ***

user_experienceSeveral months:verifiedTRUE 0.030177623 0.007452272 4.049 0.0000517705940798 ***

user_experienceMore than a year:oversaturation_point_passedTRUE -0.012225653 0.013306692 -0.919 0.358246

user_experienceSeveral months:oversaturation_point_passedTRUE 0.008629380 0.010207564 0.845 0.397915

ComplexitySimple:expert_claimTRUE 0.014559815 0.022053996 0.660 0.509149

Price_categoryLow:product_comparisonsTRUE 0.010370754 0.012393231 0.837 0.402723

Price_categoryMedium:product_comparisonsTRUE -0.004594342 0.010024257 -0.458 0.646732

Price_categoryLow:price_quality_analysisTRUE 0.010129952 0.011525655 0.879 0.379477

Price_categoryMedium:price_quality_analysisTRUE -0.001618477 0.010083329 -0.161 0.872483

exclamations:excessive_exclamations 0.000299829 0.000310335 0.966 0.333997

Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1854 on 8949 degrees of freedom

Multiple R-squared: 0.479, Adjusted R-squared: 0.4738

F-statistic: 93.48 on 88 and 8949 DF, p-value: < 0.00000000000000022

RMSE on the test sample: 0.1882908

The latter model for predicting helpfulness did not include any of the variables whose measurement requires case-by-case expert assessment: the number of claims in a review, their density and the inclusion of technical jargon. Given the limitations of current measurement methods, assessing them was a long and tedious process, so we could not analyze as many reviews as we would have liked. We anticipated these difficulties and decided to focus only on the most representative reviews: those with visibility scores of 200+. Overall, we counted the number of claims and detected the use of specialist jargon in 618 reviews. A randomly selected 518 of them form our training set and the remaining 100 our test set.

However, with a sample size this limited, we have to discard a significant portion of our variables so as not to overfit the model. We chose to omit predictors that were clearly ineffective at forecasting review helpfulness in our third model (such as the impact of the default avatar), while keeping some with a very strong theoretical background (e.g., images are still theoretically able to provide more information about the product and add credibility, despite failing in the model so far). We also dropped predictors that no longer have enough diversity of observations across their levels (for example, the variable "twoSided_argumentation", which indicates the de facto absence of any arguments in either the "pros" or "cons" section, is "0" for all reviews in this 618-review subset), as well as most interaction effects and control variables. We do not expect to lose much predictive power by omitting these variables. We also changed the way we represent the extremity and star rating of reviews in the model, which allowed us to proxy-control for the product rating while removing all factor-combination variables, once again greatly decreasing their number.
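A minimal sketch of the split and of the out-of-sample check reported under each model output (expert_reviews, the seed and the shortened formula are illustrative stand-ins for our actual objects):

set.seed(2020)                                  # illustrative seed; the exact one is not recorded here
train_idx <- sample(nrow(expert_reviews), 518)  # 518 of the 618 expert-coded reviews
training  <- expert_reviews[train_idx, ]
test      <- expert_reviews[-train_idx, ]
fit  <- lm(helpfulness ~ number_of_claims + claims_density + jargon, data = training)  # placeholder formula
pred <- predict(fit, newdata = test)
sqrt(mean((pred - test$helpfulness)^2))         # RMSE on the test sample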

Our final regression can be seen below [Model output 4]:

Call:

lm(helpfulness ~ I(status_quo_divergence^2) + stars + I(extremity^2) + word_count * word_threshold_passed + longer_comm + not_segmented + every_section + any_BP + pros_average_words_per_BP + cons_average_words_per_BP + posting_order * oversaturation_point_passed + verified + anonymous + images + any_BP + user_experience + number_of_claims + jargon + claims_density + I(claims_density^2) + expert_claim + product_comparisons + price_quality_analysis + product_description_truthfulness + summarised_opinion + conclusive_statement + measurement_indicators + price_mention + quality_mention + design_mention + call_to_action + delivery_and_packaging + expert_claim:jargon, data = training, na.action = na.exclude)

Residuals:

Min 1Q Median 3Q Max

-0.62444 -0.07729 0.00394 0.07991 0.55444

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.59390710 0.10137601 5.858 0.00000000870506 ***

I(status_quo_divergence^2) -0.06136451 0.00834186 -7.356 0.00000000000083 ***

stars -0.06014582 0.02418603 -2.487 0.013230 *

I(extremity^2) 0.06524152 0.00944106 6.910 0.00000000001551 ***

word_count 0.00014437 0.00020096 0.718 0.472849

word_threshold_passedTRUE 0.01557544 0.08624610 0.181 0.856764

longer_commTRUE 0.05173262 0.01555914 3.325 0.000953 ***

not_segmented 0.01982518 0.06815303 0.291 0.771260

every_section -0.04836198 0.03328839 -1.453 0.146930

any_BPTRUE 0.02834456 0.02395394 1.183 0.237281

pros_average_words_per_BP -0.00062645 0.00146820 -0.427 0.669802

cons_average_words_per_BP 0.00044399 0.00111721 0.397 0.691241

posting_order 0.00085656 0.00048473 1.767 0.077851 .

oversaturation_point_passedTRUE 0.07507880 0.03887363 1.931 0.054030 .

verifiedTRUE -0.01475162 0.02953392 -0.499 0.617671

anonymousTRUE 0.00650353 0.01791289 0.363 0.716718

imagesTRUE 0.03020453 0.02776143 1.088 0.277142

user_experienceMore than a year 0.01776306 0.03159997 0.562 0.574296

user_experienceSeveral months 0.01674515 0.01641542 1.020 0.308203

number_of_claims 0.01202372 0.00369145 3.257 0.001205 **

jargon -0.02390340 0.01762408 -1.356 0.175647

claims_density 0.00532939 0.00161542 3.299 0.001042 **

I(claims_density^2) -0.00001993 0.00000938 -2.125 0.034119 *

expert_claimTRUE -0.11834380 0.04550690 -2.601 0.009595 **

product_comparisonsTRUE 0.00534824 0.01631768 0.328 0.743239

price_quality_analysisTRUE 0.03869016 0.01824632 2.120 0.034484 *

product_description_truthfulnessTRUE 0.00220528 0.01623633 0.136 0.892018

summarised_opinionTRUE -0.02372710 0.02184870 -1.086 0.278038

conclusive_statementTRUE -0.01255914 0.02249942 -0.558 0.576970

measurement_indicatorsTRUE 0.03529070 0.01778097 1.985 0.047744 *

price_mentionTRUE -0.00921921 0.01531358 -0.602 0.547441

quality_mentionTRUE -0.01178566 0.01584146 -0.744 0.457257

design_mentionTRUE -0.01096119 0.01866090 -0.587 0.557220

call_to_actionTRUE -0.01703586 0.01699243 -1.003 0.316582

delivery_and_packagingTRUE 0.02252603 0.02163049 1.041 0.298216

word_count:word_threshold_passedTRUE -0.00019600 0.00019783 -0.991 0.322310

posting_order:oversaturation_point_passedTRUE -0.00102222 0.00053500 -1.911 0.056641 .

jargon:expert_claimTRUE 0.14665478 0.05445894 2.693 0.007331 **

Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1547 on 478 degrees of freedom

(2 observations deleted due to missingness)

Multiple R-squared: 0.7187, Adjusted R-squared: 0.697

F-statistic: 33.01 on 37 and 478 DF, p-value: < 0.00000000000000022

RMSE on the test sample: 0.1391266

Now, this is much better: the model explains 69.7% of the variation in review helpfulness around its mean. The RMSE in the training and test sets is low and the two values are close together (15.47% for the training sample and 13.91% for the test sample), so we can say we avoided sampling biases. This model can definitely be used to make decent predictions! Let's examine what makes it successful and see which variables we had been missing in the model's earlier iterations. Note that we slightly rearranged the order in which the variables are shown so that they follow the structure of our hypotheses list for RQ #2 (had we not done so, it would have been difficult to navigate among them).

The coefficients of the quadratic variables are a little tricky to interpret, but it is obvious that a review's star rating plays a very important role in determining its helpfulness. In our model, three factors relate to the star rating: the divergence of a review's rating from the aggregate public opinion on the underlying product, the star rating of the review itself, and the extremity of the rating. The divergence from the status quo and the star-rating extremity enter squared, because we previously identified a roughly quadratic relationship between them and helpfulness. The squared variable "status_quo_divergence^2" is a proxy that accounts for the product rating; the other two relate to the review's star rating alone. Divergence from the public opinion decreases the helpfulness of a review quadratically: with a coefficient of -6.14%, a 1-star review written for a 4-star rated product will be 55.23% less helpful than a 4-star review for the same product. That review is also very extreme (as a 5-star review would be), so it will be 26.1% more helpful than 3-star reviews and 19.58% more helpful than 2- and 4-star reviews. Finally, the single star in this 1-star review means it carries the smallest linear penalty of only 6.01%; the "stars" variable is linear, so a review with any other number of stars is less helpful than a 1-star review in proportion to its number of stars less one. This is a little convoluted, but once again, we morphed 33 variables into just 3 while keeping the logic behind them. "status_quo_divergence^2" and "extremity^2" are significant at the 99.9% CL; "stars" is significant at the 95% CL.
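To make this arithmetic concrete, here is the star-related part of the prediction for the example above, written out with the Model 4 estimates (a sketch; encoding extremity as the distance from the 3-star midpoint is our reading of the variable):

stars      <- 1
divergence <- 4 - stars        # product rated 4 stars, review gives 1
extremity  <- abs(stars - 3)   # distance from the neutral 3-star midpoint
-0.0614 * divergence^2         # -0.552: the 55.23% status quo divergence penalty
+0.0652 * extremity^2          # +0.261: the 26.1% extremity bonus relative to a 3-star review
-0.0601 * stars                # -0.060: the linear 6.01% per-star penalty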

Since we mostly analyzed positively rated products, we can say that negative reviews are generally less helpful than positive ones. To reduce confusion, we plotted how the helpfulness ratios of reviews with different star ratings relate to each other for products with different aggregate ratings [Chart 28]. To create this chart, we used a sample that includes all reviews with 30+ votes - the same one we used to build Model 3.
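A sketch of how Chart 28 can be reproduced from that sample (visible_reviews and the column names are illustrative assumptions):

library(ggplot2)
ggplot(visible_reviews, aes(x = factor(stars), y = helpfulness)) +
  geom_boxplot() +                  # average and spread of helpfulness per review star rating
  facet_wrap(~ product_rating) +    # one panel per aggregate product rating (3, 4.5, 5, ...)
  labs(x = "Review star rating", y = "Helpfulness ratio")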

Chart 28. Helpfulness distributions of review star ratings for products of different aggregate ratings

The patterns for the averages and the spreads of the distributions are very clear:

1-star reviews consistently perform the worst and are almost always more downvoted than upvoted for 4.5- and 5-star rated products. Their usefulness increases as they converge to the status quo (the underlying product's rating).

2-star reviews are less downvoted than 1-star reviews, but follow a similar trend.

3-star reviews are upvoted even more commonly than the previous two groups.

4-star reviews have mean helpfulness of ~80% for highly praised products, dropping to ~70% for poorly rated products.

5-star reviews lead most categories (3-, 4.5- and 5-star rated products) and consistently have average helpfulness around 80-85%, no matter the public opinion.

Positive reviews also have the lowest variances, which means that their helpfulness is universally agreed on.

This is our first model where the standalone number of words did not prove significant. One explanation is that our newly included variables serve the same function of indicating the comprehensiveness or informativeness of a review: the number of claims and their density, both linear and squared, do this job better than the raw word count. Note also that the variable "claims_density" is calculated with the number of words in a review in the numerator. Each unique statement about the product or the author's experience with it indicates an increase in review helpfulness of 1.2%, and for every extra 10 words that the claims contain on average, helpfulness grows by ~5.33%. Both factors are significant at the 99% CL.
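The positive linear and negative squared density terms imply an inverted-U shape; its turning point follows directly from the two estimates (a quick check on our part, not part of the model output):

b1 <- 0.00532939   # claims_density estimate
b2 <- -0.00001993  # I(claims_density^2) estimate
-b1 / (2 * b2)     # ~133.7 words per claim, where the marginal benefit of length reaches zero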

It seems that reviews with a comment larger than both the "pros" and "cons" sections are more helpful after all - by 5.17% (significant at the 99.9% CL). This can be interpreted as a general preference of Russian customers for a certain type of information within a review: the "comment" section often includes stories, interesting observations and other content that cannot be strictly divided into "good" and "bad" experiences and written into the corresponding section. In this sense, a bigger "comment" section can indicate that a review is simply more interesting to read, or that it goes more in-depth on each point. If the latter explains the identified relationship, it may also explain why neither the inclusion of a complete "pros - cons - comment" structure nor any bullet-point lists signify any change in helpfulness: there is no point in segmentation or bullet-point lists if you are going to write a wall of text on each point anyway.

Recency of a review (approximated by the posting order) does not play a role in determining review helpfulness, and there exists no point of oversaturation beyond which reviews become less helpful.

If a review's author declares themselves a specialist in a related area, or claims to be very familiar with products similar to the reviewed one, their review will generally be 11.83% less helpful than if they had not done so. However, if they use technical jargon or specialist terminology along with this claim, their review will instead be 2.83% more helpful than if they had not claimed expert authority. The use of technical jargon or specialist terminology alone does not affect the helpfulness of a review. One explanation may be that people generally hate the spread of misinformation, and when they are certain that the claims are fraudulent or simply poor (in our case this means legitimate specialists and/or experienced users are reading the review), they fight to silence or discredit it by heavily downvoting. The reasons why most reviews in our sample might include fraudulent or poor arguments are unclear, so this speculation may not be flawless. In any case, both "expert_claim" and "jargon:expert_claim" are significant at the 99% CL, so it is unlikely that this relationship is due to chance.
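The net effects quoted above are simply sums of the relevant estimates, with jargon treated as a 0/1 indicator (our reading of the model):

-0.11834            # expert claim without jargon: 11.83% less helpful
-0.11834 + 0.14665  # expert claim plus jargon:expert_claimTRUE: +0.0283, i.e. 2.83% more helpful
# the jargon main effect (-0.0239) is statistically indistinguishable from zero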

Among the contents of a review, only price/quality analysis and measurement indicators proved to predict review helpfulness with decent certainty. The inclusion of a price/quality analysis indicates an increase in helpfulness of 3.87%, and the inclusion of measurement indicators (a staple of any form of analysis) an increase of 3.53%.

RQ #3

So far we have done well, but one question remains unanswered: can our model improve the currently employed sorting algorithm? To "improve" here means to (1) identify faulty review rankings that placed much less helpful reviews above much more helpful ones and (2) fairly rearrange them. We first use our model on 0-vote reviews grouped by their underlying products and evaluate their helpfulness. Then we rank them ourselves by their predicted helpfulness ratios and compare our rankings with those of Yandex Market. If there is a mismatch in positioning between two reviews with more than a 25% difference in predicted helpfulness, we combine them into a pair and encode the less helpful review as "the worst" and the more helpful one as "the best" (a sketch of this step follows below).
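A sketch of the pairing step (zero_vote_reviews, market_rank and the other names are illustrative; the 25% threshold is applied to the difference in predicted helpfulness):

zero_vote_reviews$pred <- predict(final_model, newdata = zero_vote_reviews)
pairs <- list()
for (p in split(zero_vote_reviews, zero_vote_reviews$product_id)) {
  p <- p[order(p$market_rank), ]          # Yandex Market's current order, best first
  for (i in seq_len(nrow(p) - 1)) {
    for (j in (i + 1):nrow(p)) {
      if (p$pred[j] - p$pred[i] > 0.25) { # ranked lower by the platform, yet much more helpful
        pairs[[length(pairs) + 1]] <- list(worst = p$review_id[i], best = p$review_id[j])
      }
    }
  }
}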

This operation is repeated until we have 100 pairs of misranked reviews. We then give them to our mock "customers" - a group of 13 respondents who label each of the two reviews in a pair either "the worst" or "the best" based on their own perception. Their answers decide whether our model performed well: if our model makes significantly more correct identifications than the average random ranking would produce, we say that it can improve current sorting algorithms, at least for the 0-vote section. And since we have no reason to believe that a review's helpfulness increases as it gets more votes, we can say that the model could potentially improve the ranking of any severely misranked reviews. To compare the distributions' means, we use a t-test once again [Model Output 5].
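The call producing Model Output 5 is a one-liner; correct_matches is the 0/1 vector of our reranking outcomes and the second argument is the 50/50 random baseline:

t.test(correct_matches, c(rep(0, 50), rep(1, 50)))  # Welch two-sample t-test by default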

Welch Two Sample t-test

data: correct_matches (our model's reranking outcomes: 0 - wrong, 1 - correct) and c(rep(0, 50), rep(1, 50))

t = 2.3113, df = 197.43, p-value = 0.02185

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

0.02348723 0.29651277

sample estimates:

mean of x mean of y

0.66 0.50

Looks like our model successfully and fairly rearranged the helpfulness positions of reviews in ~66% of cases. The difference between our model's estimates and random selection is statistically significant at the 95% CL.

The model managed to improve the current sorting algorithm for unrated reviews.

Revising hypotheses

Finally, we return to our list of hypotheses. In the map below [Image 2], we summarized how the outcomes of our analysis relate to each of the hypotheses. For convenience, accepted hypotheses are highlighted in green and rejected ones in red.

We accept the 1st, 4th, 5th, 7th, 10th, 16th, 19th, 33rd, 36th, 40th, 53rd and 54th hypotheses. All other hypotheses we reject.

Image 2: The outcomes of hypotheses testing

Conclusion

Online markets all around the world share one requisite feature: a review section where customers who have purchased a product can leave feedback about their experience and satisfaction with it. Other customers can later use these accounts to form a vision of what the product is like in reality, which is very useful when deciding between several similar items on the market.

Some online markets and marketplaces go even further and offer their users an opportunity to give feedback on the reviews themselves, usually in the form of "thumbs up" and "thumbs down". The expectation is that potential customers who are interested in the experiences of others will upvote reviews that helped them make up their minds about the product and downvote those that wasted their time or tried to mislead the unwise reader and were called out by more knowledgeable users. The cumulative votes on these reviews are shown to all platform users, who can and do use them to choose which reviews to spend time reading and processing. This filtering option becomes especially useful when the number of reviews reaches hundreds, if not thousands, as each person has a limited attention span and limited time they are willing to spend on analyzing product experiences. For this very reason, marketplaces that feature a review voting option uniformly employ a sorting algorithm that places the most helpful reviews at the top and the least helpful or detrimental reviews at the bottom. This way, customers faced with a plethora of reviews encounter the cream of the crop straight away, which reduces the time investment necessary to select reviews to read and to build a comprehensive image of the underlying product's performance, reduces frustration and potentially improves purchase decisions, as the best-rated reviews are expected to provide more accurate and beneficial information about the products.

But this system has inherent flaws, the biggest one being that it relies heavily on its user base to vote on the reviews. If the number of reviews constantly increases and only a selected few continue to be shown, what happens to the rest? They settle so low in the "by helpfulness" sorted review lists that they are very rarely seen by anyone, even though they might provide invaluable insights about the product that the reader would not find anywhere else. A clear loss for the customers. The authors of these reviews often spend a significant amount of time and effort to write long, comprehensive reviews and may be dissuaded from writing them after receiving no recognition for their thought-out contributions. A clear loss for the platform, because without this motivation many good authors may stop leaving reviews altogether. We initially also thought that this might motivate people (through their negative emotions and thirst for justice) to leave mostly negative reviews, so that the proportion of negative reviews in the "top section" - the first several pages of supposedly most helpful reviews - would not fairly represent their overall share among all reviews for the product, but our later analysis proved us wrong: the difference is no more than 1-2%, which is less than 1.5 reviews given the length of the "top section".

Another issue with current sorting algorithms is that a major part of the evaluation is the absolute number of votes a review receives. But it makes no sense that a review should be considered more helpful just because more people noticed it and decided to actually read and evaluate it. We argue that the proportion of readers who benefit from reading a review should be the only target measurement determining the position of a review in the rankings. This value is intrinsic to a review, unlike visibility scores, which are heavily susceptible to sampling biases (of the evaluating readers). We believed that there exists a set of content- and meta-specific factors that can evaluate review helpfulness without relying on a huge user base to estimate each review individually each time a new one is posted.

To do so, we first needed to establish that Russian online marketplaces suffer from the same issue, because the research that inspired this paper analyzed Western and South Asian marketplaces exclusively and did not provide any analysis of Russian online communities. Since we sampled our review data from Yandex Market - a very popular Russian online marketplace - this is where we started.

We found that Russian online marketplaces have very similar issues: approximately 29% of all reviews for products with over 50 reviews had 0 votes, and reviews with 0, 1, 2 and 3 votes make up more than half of all reviews on the platform. For an algorithm that heavily weights the number of votes in its evaluation of helpfulness, those reviews are invisible and are placed at the bottom of the helpfulness rankings. Always. We want to accentuate that more than half of the reviews on the platform are rated nearly or completely unfairly by the algorithm. The situation is slightly better only for the most expensive (>20,000 RUB) products on the market, for which people try to read (and sometimes vote on) more reviews. But even the reviews of the most expensive products do not get that much attention: reviews with more than 8 votes make up less than half of their overall number. These findings justify the necessity of building a better ranking algorithm.

If we assume (and we have no reason to believe otherwise) that the proportion of customers who vote on the reviews they read is approximately the same among the readers of all review pages, we can take the average number of votes left at each helpfulness rank as a proxy for the number of people who saw and actually read the review placed at that rank. And while this yields no meaningful absolute numbers, we can use these values to approximate the ratio of customers who continue reading reviews as the helpfulness rank decreases. This way we found that ~75% of people are satisfied with reading just the first review page, ~80% stop reading after the second, and so on. After the 7th page, barely anyone reads reviews. This point of oversaturation denotes the rank after which every review can be considered almost fully obscured. The 70th rank (the last review on the 7th page) is supported by the analysis of both the "by helpfulness" and "by date" sorting methods.
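A sketch of that approximation, assuming votes_by_rank holds the average number of votes at each helpfulness rank and pages hold 10 reviews each (both are our illustrative assumptions):

page_id    <- ceiling(seq_along(votes_by_rank) / 10)  # ranks 1-10 -> page 1, 11-20 -> page 2, ...
page_votes <- tapply(votes_by_rank, page_id, mean)    # average votes per review page
page_votes[-1] / page_votes[-length(page_votes)]      # share of readers continuing to the next page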

Along with low-ranked reviews, other severely obscured reviews include positive reviews, especially those written for well-received products; reviews written for "experience"-type products; reviews written for simple, technically uncomplicated products; and reviews written for relatively cheap (<5,000 RUB) products. The rankings for everyday goods, food products and beverages, household chemicals and other conventional items are expected to be completely unfair, because reviews left for them receive no votes, on which the current algorithm depends.

With this newfound demand for improvement, we set up a series of predictive models built around features that can be present in any review. We wanted our model to be applicable to any product, so we did not include the presence of product-specific content patterns in our variable set. The features we chose are partially based on other authors' research on review visibility and partially on features that could theoretically signal higher or lower review quality. Our best-performing model showed that (1) even though people tend to specifically select negative reviews to read, they almost always disagree with their opinion, while positive reviews are well received no matter what other people think about the product; (2) the more that is described about the product, and the more comprehensive those descriptions are, the more helpful a review becomes; (3) the date a review is posted has no effect on its inherent helpfulness; (4) product experts need to both claim their expertise and use specialist terminology for their reviews to be better received by the public, because if they do not, their reviews lose perceived helpfulness; and (5) customers like it when authors evaluate the fairness of the underlying product's price in view of its quality (price/quality analysis) and when authors evaluate aspects of the product or its competitors relative to something else.

Our final model explained almost 70% of the variation around the mean, and we successfully applied it to rearrange unfairly ranked reviews. Still, we speculate that a large portion of the remaining 30% is attributable to variables relating to product-specific content, which we deliberately left out of the model.

Our findings are far from exhaustive, and we expect that our model can be improved greatly by future researchers and professionals, but we have set up a theoretical foundation and empirically identified review features that can predict review helpfulness with high certainty without any contributions from platform users. One improvement could be made by revising the content variables. We evaluated them by detecting the presence of certain strings and their combinations in the text of a review. However, the lists of these strings were far from complete, as we were able to identify only their most common variations and synonyms. A more inclusive and more precisely filtering set of keyword dictionaries might lead to more accurate results.
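A sketch of how the dictionaries listed in the Appendix below are turned into content flags (has_indicator and the dictionary object are illustrative names; perl = TRUE enables the lookaheads some patterns use):

has_indicator <- function(text, patterns) {
  any(vapply(patterns, function(p) grepl(p, tolower(text), perl = TRUE), logical(1)))
}
# e.g., flag every review that contains a call to action:
# reviews$call_to_action <- vapply(reviews$text, has_indicator, logical(1),
#                                  patterns = call_to_action_dict)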

Appendix

Keyword content indicator dictionaries

Table 13. {DICTIONARY == twoSided_argumentation}

(\bнет\s*(?![\w|\W])|\bнету\s*(?![\w|\W])|\bникаких\s*(?![\w|\W])|\bничего\s*(?![\w|\W]))

не (отыскал\w*| уловил\w*| наш\w*| увидел\w*| отметил\w*| заметил\w*| выявил\w*| обнаружил\w*)\s*(?![\w|\W])

нет (недостатков|достоинств|плюсов|минусов)\s*(?![\w|\W])

все(плохо|идеально|ужастно|хорошо|прекрасно|отвратительно)\s*(?![\w|\W])

^\s*(\s*\-\s*)(?![\w|\W])

Table 14.{DICTIONARY == Summarized_opinion}

приличн

очень хорош

очень плох

идеальн

замечательн

отвратительн

божественн

приемлем

хорош

неплох

удобн

на славу

в восторге

как нельзя лучше

на совесть

на должном уровне

в лучшем виде

изумит

хлам

великолепн

отличн

красота

в ажуре

полный улет

хоть куда

грех жаловаться

блеск

все пучком

замечательн

крут(о|е)

благодат

достойн

достоин\b

дай бо(г|же)

щедр[\w]+

идеальн

основательн

знатн

годн

одобрительн

славн

превосходн

первоклассн

похвальн

чудн

изумительн

прикольн

благотворн

отменн

зачетн

козырн

\bлюбо\b

любо-дорого

добротн

гуд

норм

бэд

клев[\w]+

прекрасн

тип-топ

славн

справн

чудн

чудесн

комильфо

окейн

м образом

никуда не год

ну\,* такое

не фонтан

с грехом пополам

ужасн(о|ы|а)

печальн

кое-как

через одно место

оставляет желать лучшего

врагу не пожела

поган

кошмарн

дурн

скверн

добросовестн

хренов

отвратительн

негодн

возмутительн

дрянн

грешн

по кайфу

мда

не ахти

утешит

(хрень|хрено)

отвратн

безотрад

бездарн

плох

халтурн

как нельзя кстати

беспонтов

(ожидал|ждал)\w* (худшего|лучшего)

зарекомендов[\w]*

Table 15. {DICTIONARY == Product_description_truthfulness}

обман

все правд[\w]* (говорят)*

развод

неправд[\w]*

вранье

лапш(у|а) на уши

мошеннич[\w]*|мошенник[\w]*

соответств[\w]* действительности

дизинформац

не\s*соответств[\w]*

надувательство

ожидани(е|я)(\-|\/)реальность

ожидал/получил

производитель (заявляет|заявил)

(ожидани(я|е)|надежд(а|ы)) оправдал

оправд[\w]* (ожидани(я|е)|надежд(а|ы))

описани(е|ю)

обеща(ет|ют|л)

характеристик(и|а)*

(даже)* лучше\,* чем ожидал

(ожидал|ждал)\w* (худшего|лучшего)

рекомендую

превз[\w]* все ожидания

оказал.*с\w*

реальност

действительн

не обманул

все правда

правду сказали

правду говорят

приукраш

Table 16. {DICTIONARY == Product_comparisons}

(я|жен[\w]*|дру[\w]*) занимал

(несколько|много) лет

сравн

сопостав

соотн(е|о|ё)с

сверил

сход(ств|н|и)

различ

анализ

аналог

привести в пример

да.. фору

предпоч

отлича

ровня

у [\w]+ уже был

между [\w]+ и [\w]+

китай

\bали

лучше чем

хуже чем

ценне

(взвесив|взвешивая|взвесить)

Table 17. {DICTIONARY == Price_quality_analysis}

комплект

за свои деньги

своих денег

(взвесив|взвешивая|взвесить)

за глаза

бюджетн

дорог

дешев

ширпотреб

дешево и сердито

копейки

за гроши

даром

по [\w]* цене

на совесть

первоклассн

цен[\w]* (диапазон|категор|сегмент)

выгодн

подходящ

качественн

удобн

комфортн

практичн

выигрышн

посредственн

фуфло

(ожидал|ждал)\w* (худшего|лучшего)

цена[\/\-\s]качество

качество[\/\-\s]цена

в топку

реальн

Table 18. {DICTIONARY == call_to_action}

рекомендую

покупай(те)*

бери(те)*

остерегайтесь

советую(?!щий|т)

отвечаю

хватайте

не пожалеете

приобретайте

скупайте

сметайте

расхватывайте

руча(юсь|ется|емся)

мо(гу|жет|жем) поручиться

Table 19. {DICTIONARY == Conclusive_statement }

в заключение

в итоге

как итог

подводя итоги

все вышесказанное

суммируя все

\bвывод

(взвесив|взвешивая|взвесить)

Table 20. {DICTIONARY == measurement_indicators }

самый

очень

слишком

чересчур

мало

излишне

перебор

достаточн

Table 21. {DICTIONARY == delivery_and_packaging}

достав[\w]*\s(?!неудобств)

\b(от)*брак

\bбрак

поврежден

гарант

упаковк

Table 22. {DICTIONARY == Expert_claim}

я разбир

я работ

у меня есть опыт

у меня имелся опыт

я опыт(ен|на)

я знаком

уже [\w]*работ(ал|ю)

работ(ал|ю) уже

\bя\b \w* со стажем

\bкак\b \w* со стажем

\bя\b \w* с опытом

име[\w]* опыт

пользовал[\w]* уже

уже знаком

знаком[\w]* уже

уже [\w]*пользовал

шарю

у меня уже был

Table 23. {DICTIONARY == P.S.}

P\.S\.*

\nP\.S\.*

\bПС\b

\bПы\s*Сы\b

\bЗЫ\b

Table 24. Mentions of product's design, price and quality.

дизайн

внешн[\w]* вид

цен(е|а|у|ный)\s?(?!диапазон|категор|сегмент)

качеств


