rvelices wrote:
plg wrote:
I totally agree, this is partly why I didn't want to activate ratings. Maybe we could add a simple threshold of 10 rates before being displayed in the best rated page.
The best option I could find that is computationally feasible in PHP/MySQL is to replace the simple average rate with a Bayesian average (http://en.wikipedia.org/wiki/Bayesian_average). In this formula we could choose C = average number of votes (for photos with at least one vote).
Maybe we could implement it in the Piwigo core.
What this formula does not solve is
a. time/age factor (a photo one year old had more chances to get votes than one added one week ago)
b. audience factor (public/private), number of links to get to this photo (if already in best rated, many tags/albums, google - more likely users will see it and score it), etc ...
I would like to find a solution for a., which I think is important, but I don't have one yet...
Well, that seems interesting but it must remain understandable for someone who didn't follow any PhD in Statistics :-) This is why I think that a simple threshold minimum number of rates to appear on the best rated page sounds more appropriate to me.
Of course, this improvement must be done in the Piwigo core (rating is completely part of the Piwigo core, even if I would like it to be moved into a plugin in the future).
Concerning bullet a., what about something like only taking into account rates younger than 30 (or N) days for the Best Rated page?
Offline
plg wrote:
Well, that seems interesting but it must remain understandable for someone who didn't follow any PhD in Statistics :-)
Come on ... it's just a fancy name for an average with an additional sum, a multiplication and one division... (if you want the PhD version, check out the lower bound of the Wilson score confidence interval for a Bernoulli parameter on http://www.evanmiller.org/how-not-to-so … ating.html and the full explanation here: http://en.wikipedia.org/wiki/Binomial_p … e_interval)
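For reference, here is a minimal sketch of that "PhD version", the lower bound of the Wilson score interval. It assumes each vote is reduced to positive/negative (for example, 4 or 5 stars counted as positive), which is not how Piwigo rates are stored; the function name and the default z = 1.96 (roughly 95% confidence) are illustrative choices, not anything from Piwigo.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    positive: number of positive votes (assumption: votes are binarized)
    total:    total number of votes
    z:        normal quantile, 1.96 is roughly a 95% confidence level
    """
    if total == 0:
        return 0.0  # no votes: nothing can be claimed about the item
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# A single perfect vote ranks below nine positive votes out of ten.
one_vote = wilson_lower_bound(1, 1)
many_votes = wilson_lower_bound(9, 10)
```

The point of the lower bound is the same as with the Bayesian average: an item with very few votes cannot jump to the top on the strength of one enthusiastic visitor.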
plg wrote:
This is why I think that a simple threshold minimum number of rates to appear on the best rated page sounds more appropriate to me.
Yes, but you have to maintain the threshold correctly, and when people vote for the first time, they will not see anything ...
plg wrote:
Concerning bullet a, what about something like only take into account rates younger than 30 (or N) days for Best Rated page
But then why should I go and vote if in N days it will be ignored ?
Offline
I think we should create a popularity score for each pic, and a "most popular" page.
The score will be a combination of views, ratings, comments and age.
For example, I'm looking at Popularity Contest, a WordPress plugin, which is very good:
it calculates a score from weight values:
$akpc_settings['show_pop'] = 1;
$akpc_settings['show_help'] = 1;
$akpc_settings['ignore_authors'] = 1;
$akpc_settings['feed_value'] = 1; // clickthrough from feed
$akpc_settings['home_value'] = 2; // clickthrough from home
$akpc_settings['archive_value'] = 4; // clickthrough from archive page
$akpc_settings['category_value'] = 6; // clickthrough from category page
$akpc_settings['single_value'] = 10; // full article page view
$akpc_settings['comment_value'] = 20; // comment on article
$akpc_settings['pingback_value'] = 50; // pingback on article
$akpc_settings['trackback_value'] = 80; // trackback on article
and then it seems to simply run:
$result = $wpdb->query("
UPDATE $wpdb->ak_popularity
SET total = (home_views * $this->home_value)
+ (feed_views * $this->feed_value)
+ (archive_views * $this->archive_value)
+ (category_views * $this->category_value)
+ (tag_views * $this->tag_value)
+ (single_views * $this->single_value)
+ (searcher_views * $this->searcher_value)
+ (comments * $this->comment_value)
+ (pingbacks * $this->pingback_value)
+ (trackbacks * $this->trackback_value)
");
Finally, the score displayed (for the admin) is a percentage relative to the top-ranked post.
This method is easy to parameterize and to understand, but it doesn't take into account how old the element is: we could add a penalty for age.
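A minimal sketch of such a weighted score, with the weights copied from the settings quoted above. The age penalty is purely an assumption for illustration (an exponential decay with a hypothetical `half_life_days` parameter); neither the plugin nor Piwigo is known to do this.

```python
from typing import Dict, Optional

# Weights taken from the plugin settings quoted in the post.
WEIGHTS: Dict[str, int] = {
    "feed_views": 1,
    "home_views": 2,
    "archive_views": 4,
    "category_views": 6,
    "single_views": 10,
    "comments": 20,
    "pingbacks": 50,
    "trackbacks": 80,
}


def popularity(counters: Dict[str, int],
               age_days: float = 0.0,
               half_life_days: Optional[float] = None) -> float:
    """Weighted sum of counters, optionally decayed by age.

    half_life_days is a hypothetical parameter: when given, the score is
    halved every half_life_days days. When None, no age penalty applies.
    """
    total = sum(weight * counters.get(name, 0) for name, weight in WEIGHTS.items())
    if half_life_days is None:
        return float(total)
    return total * 0.5 ** (age_days / half_life_days)


fresh = popularity({"single_views": 3, "comments": 1})
aged = popularity({"single_views": 3, "comments": 1}, age_days=30, half_life_days=30)
```

With a decay like this, an old element has to keep collecting views and comments to hold its position, which is one possible answer to the age problem.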
Last edited by flop25 (2011-07-21 17:30:39)
Offline
rvelices wrote:
plg wrote:
Well, that seems interesting but it must remain understandable for someone who didn't follow any PhD in Statistics :-)
Come on ... it's just a fancy name for an average with an additional sum, a multiplication and one division...
OK, let's have a little example then :-)
We have 3 photos, photo1, photo2 and photo3 with the following ratings:
* photo1 : 5, 4, 4
* photo2 : 5
* photo3 : 4, 3, 4
The basic average rate is:
* photo1 : 4.33
* photo2 : 5.00
* photo3 : 3.66
And of course it makes photo2 rank in position #1 with only one rating. I would much prefer to see photo1 in #1 because it has received several good ratings (even if the average is slightly lower than photo2).
Now, I take the Bayesian average formula, with m=4.33 (the average rate of a rated photo) and C=2.33 (the average number of rates for each rated photo).
* photo1 : 4.33
* photo2 : 4.53
* photo3 : 3.95
(we keep the same "best rated" order with such a small dataset)
Is my computation correct?
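A quick way to check the figures above, assuming the usual Bayesian average formula (C * m + sum of rates) / (C + number of rates), with m taken as the mean of the per-photo averages and C as the mean number of rates per rated photo. The function and variable names are illustrative only.

```python
def bayesian_average(ratings, m, c):
    """(c * m + sum of ratings) / (c + number of ratings)."""
    return (c * m + sum(ratings)) / (c + len(ratings))


photos = {
    "photo1": [5, 4, 4],
    "photo2": [5],
    "photo3": [4, 3, 4],
}

# Only photos with at least one rate contribute to the prior.
rated = [r for r in photos.values() if r]
m = sum(sum(r) / len(r) for r in rated) / len(rated)  # mean of per-photo averages
c = sum(len(r) for r in rated) / len(rated)           # mean number of rates per photo

scores = {name: bayesian_average(r, m, c) for name, r in photos.items()}
for name, score in scores.items():
    print(name, score)
```

The results agree with the list above up to rounding in the last digit.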
Now take a bigger set (more ratings):
* photo1 : 5, 4, 4, 5, 5, 5
* photo2 : 5
* photo3 : 4, 3, 4, 4
Bayesian average with m=4.36 and C=3.66 :
* photo1 : 4.55
* photo2 : 4.50
* photo3 : 4.04
Which is starting to look much better :-)
Technical detail: we have to keep the values of m and C cached in the database and not recompute them every time we want to display an average rate for a photo.
What seems to be a problem for me is that when a visitor gives a 5-star rating to photo2, the average rate is not 5 but 4.50.
Offline
flop25 wrote:
I think we should create a popularity score for each pic, and a "most popular" page.
The score will be a combination of views, ratings, comments and age.
I like this idea, but I don't think it's the same as ratings. Ratings imply a specific action from the visitor. He may display a photo several times and never rate it, on purpose.
Offline
plg wrote:
flop25 wrote:
I think we should create a popularity score for each pic, and a "most popular" page.
The score will be a combination of views, ratings, comments and age.
I like this idea, but I don't think it's the same as ratings. Ratings imply a specific action from the visitor. He may display a photo several times and never rate it, on purpose.
When you (both of you) were talking about:
What this formula does not solve is
a. time/age factor (a photo one year old had more chances to get votes than one added one week ago)
b. audience factor
it's more about popularity than rating
Those are two different subjects: if you want to add more parameters to the rating, it's popularity; if it's about how to compute the mean, that's rating.
As for the new computation you have just tried, it sounds good.
Offline
plg wrote:
Is my computation correct?
yes
plg wrote:
Technical detail: we have to keep the values of m and C cached in the database and not recompute them every time we want to display an average rate for a photo.
Good point.... When we rate a picture, technically the "rating score" (note that I don't use "average" anymore) of all pictures will change. However, in practice we don't need to recompute C/m each time (except when the number of rates is very small)... So, two solutions:
- we recompute the global average / count each time
- we save the global average / count in the db and recompute them only if the global count has grown by more than 1% ... computing count(*) on a table does not require a table scan
In both cases we update only the newly rated item (from time to time we need to update the rates of all items with some algorithm ...)
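The second solution could be sketched roughly as follows. This is only an illustration: the class, its attribute names and the in-memory dictionary standing in for the rate table are all hypothetical, and m is computed here as the average of all rates, which is one possible choice.

```python
class RatingPrior:
    """Caches the global prior (m, C); refreshes only after enough new rates."""

    def __init__(self, growth_threshold=0.01):
        self.growth_threshold = growth_threshold  # 1% by default
        self.m = 0.0
        self.c = 0.0
        self.count_at_last_compute = 0

    def needs_refresh(self, global_count):
        """True when the rate count grew by more than the threshold."""
        if self.count_at_last_compute == 0:
            return global_count > 0
        growth = (global_count - self.count_at_last_compute) / self.count_at_last_compute
        return growth > self.growth_threshold

    def refresh(self, ratings_by_photo):
        """Recompute m and C from all rates (stand-in for two SQL aggregates)."""
        rated = [r for r in ratings_by_photo.values() if r]
        total = sum(len(r) for r in rated)
        if total == 0:
            self.m, self.c, self.count_at_last_compute = 0.0, 0.0, 0
            return
        self.m = sum(sum(r) for r in rated) / total  # average of all rates
        self.c = total / len(rated)                  # rates per rated photo
        self.count_at_last_compute = total


prior = RatingPrior()
ratings = {"photo1": [5, 4, 4], "photo2": [5], "photo3": [4, 3, 4]}
if prior.needs_refresh(sum(len(r) for r in ratings.values())):
    prior.refresh(ratings)
```

On a tiny gallery the prior is refreshed on almost every vote (one vote out of seven is far more than 1%), while on a large one most votes only touch the newly rated item.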
plg wrote:
What seems to be a problem for me is that when a visitor gives a 5-star rating to photo2, the average rate is not 5 but 4.50.
You are right we should not call it "average rate" or "bayesian average rate" but maybe "rating score" ? or an equivalent
Offline
plg wrote:
flop25 wrote:
I think we should create a popularity score for each pic, and a "most popular" page.
The score will be a combination of views, ratings, comments and age.
I like this idea, but I don't think it's the same as ratings. Ratings imply a specific action from the visitor. He may display a photo several times and never rate it, on purpose.
The popularity score might be difficult to compute and maintain in PHP/MySQL (the rating score alone already appears more complicated ...). But in the long run, why not have "Best rated" and "Most popular" if we come up with a solution that scales to large databases?
Offline
rvelices wrote:
plg wrote:
Is my computation correct?
yes
In fact we could simplify the computation of m by taking it as the average of all rates ... for the first example m would be 4.14 instead of 4.33
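The difference between the two definitions of m on the first example can be checked in a few lines (variable names are illustrative):

```python
photos = {
    "photo1": [5, 4, 4],
    "photo2": [5],
    "photo3": [4, 3, 4],
}
rated = [r for r in photos.values() if r]

# m as the mean of the per-photo averages (each photo counts once).
m_per_photo = sum(sum(r) / len(r) for r in rated) / len(rated)

# m as the average of all rates (each vote counts once).
all_rates = [rate for r in rated for rate in r]
m_all_rates = sum(all_rates) / len(all_rates)

print(round(m_per_photo, 2), round(m_all_rates, 2))  # prints: 4.33 4.14
```

The second form is a single AVG over the rate table, so it is the cheaper one to maintain.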
Offline
rvelices wrote:
So two solutions
- we recompute the global average / count each time
- we save the global average / count in the db and recompute them only if the global count has grown by more than 1% ... computing count(*) on a table does not require a table scan
Computing C and m does not cost a lot if we only do it each time a rating is added/modified/deleted. But C and m is not the main problem, and as you say:
rvelices wrote:
In both cases we update only the newly rated item (from time to time we need to update the rates of all items with some algorithm ...)
To make things simple, I would say to update all images.average_rate each time a rating is added/modified/deleted (for any photo in the gallery). With mass_updates, it's not a big deal (unless we have a gallery with 50k rated photos :-/). If it's not efficient enough, then yes, we can keep a cached value "average_rate_last_compute_on_rates_global_count", and once "rates_global_count" is more than 1% bigger than it, we recompute all images.average_rate and reset "average_rate_last_compute_on_rates_global_count".
Other question: do C and m apply on the whole gallery or do we take permissions into account? => I would say "no" because it would mean that the rating score (images.average_rate) is different for each user :-/
rvelices wrote:
You are right we should not call it "average rate" or "bayesian average rate" but maybe "rating score" ? or an equivalent
rating score sounds nice.
Offline
rvelices wrote:
In fact we could simplify the computation of m by taking it as the average of all rates ... for the first example m would be 4.14 instead of 4.33
OK. It doesn't change a lot:
* photo1: 4.25
* photo2: 4.40
* photo3: 3.87
Offline
plg wrote:
...To make things simple, I would say to update all images.average_rate each time a rating is...
Agreed
plg wrote:
Other question: do C and m apply on the whole gallery or do we take permissions into account? => I would say "no" because it would mean that the rating score (images.average_rate) is different for each user :-/
Let's start with no time effect, no permission effect... We'll see later
plg wrote:
rating score sounds nice.
Would you agree to change database "average_rate" to "rate_score" ? It would make code more readable I believe ...
Offline
rvelices wrote:
Would you agree to change database "average_rate" to "rate_score" ? It would make code more readable I believe ...
No problem.
Offline
plg wrote:
rvelices wrote:
Would you agree to change database "average_rate" to "rate_score" ? It would make code more readable I believe ...
No problem.
I just realized SQLite does not allow dropping a column or changing a column name (no idea how we will handle db updates on this engine) ... Still OK?
Offline
rvelices wrote:
I just realized SQLite does not allow dropping a column or changing a column name (no idea how we will handle db updates on this engine) ... Still OK?
SQLite doesn't allow dropping a column? Bad news. So it means that 101-database.php will fail on SQLite (I'll notify mistic100).
The only solution I've found is to rename the table, create the new table with the appropriate columns and perform an "insert into ... select ..."; see the answers to "How do I rename a column in a SQLite database table?".
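A minimal sketch of that workaround, run against an in-memory database. The three-column `images` table is a simplified, hypothetical stand-in for the real Piwigo schema, and the helper name is made up. Note that newer SQLite releases (3.25 and later for RENAME COLUMN, 3.35 and later for DROP COLUMN) support these operations directly, so the copy is only needed on older versions.

```python
import sqlite3


def rename_average_rate_by_copy(conn):
    """Rename images.average_rate to rate_score by rebuilding the table."""
    conn.execute("ALTER TABLE images RENAME TO images_old")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, file TEXT, rate_score REAL)"
    )
    conn.execute(
        "INSERT INTO images (id, file, rate_score) "
        "SELECT id, file, average_rate FROM images_old"
    )
    conn.execute("DROP TABLE images_old")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema used only for this demonstration.
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, file TEXT, average_rate REAL)"
)
conn.executemany(
    "INSERT INTO images (id, file, average_rate) VALUES (?, ?, ?)",
    [(1, "photo1.jpg", 4.25), (2, "photo2.jpg", 4.40)],
)
conn.commit()

rename_average_rate_by_copy(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(images)")]
```

Indexes and triggers defined on the old table are not carried over by this copy, so a real upgrade script would have to recreate them as well.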
Offline