
  •  » Engine
  •  » improving average rate and sort order on best rated page

#1 2011-07-21 15:33:02

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

improving average rate and sort order on best rated page

rvelices wrote:

plg wrote:

I totally agree, this is partly why I didn't want to activate ratings. Maybe we could add a simple threshold of 10 rates before a photo is displayed on the best rated page.

The best solution I could find that is computationally feasible in PHP/MySQL is to replace the simple average rate with a Bayesian average (http://en.wikipedia.org/wiki/Bayesian_average). In this formula we could choose C = average number of votes (for photos with at least one vote).
Maybe we could implement it in the Piwigo core.
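For reference, the Bayesian average from the Wikipedia article linked above can be written as follows, with assumed notation: n_i is the number of rates of photo i, r_{i,j} are its individual rates, C is the average number of rates per rated photo and m is a prior mean rate (how to define m is discussed further down the thread):

```latex
\text{score}_i = \frac{C\,m + \sum_{j=1}^{n_i} r_{i,j}}{C + n_i}
```

With no rates (n_i = 0) the score equals m, and as n_i grows the score tends towards the plain average of the photo's own rates, so a single 5-star rate no longer automatically beats several good ones.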
What this formula does not solve is:
a. time/age factor (a photo one year old has had more chances to get votes than one added a week ago)
b. audience factor (public/private), number of paths leading to this photo (if it is already in best rated, has many tags/albums, or is found via Google, users are more likely to see it and rate it), etc...

I would like to find a solution for a., which I think is important, but I don't have one yet...

Well, that seems interesting, but it must remain understandable for someone who never did a PhD in statistics :-) This is why I think that a simple threshold (a minimum number of rates to appear on the best rated page) sounds more appropriate to me.
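A minimal sketch of the threshold idea, in Python for illustration. It assumes the rates are available as a dict mapping a photo id to its list of rates (not Piwigo's actual schema); the default of 10 is the value suggested above:

```python
def best_rated(ratings, min_rates=10):
    """Rank photos by plain average rate, hiding photos with fewer than min_rates rates.

    ratings: dict mapping a photo id to the list of its rates (assumed layout).
    """
    eligible = {
        photo: sum(rates) / len(rates)
        for photo, rates in ratings.items()
        if len(rates) >= min_rates
    }
    # Highest average first; sorted() is stable, so ties keep their original order.
    return sorted(eligible, key=eligible.get, reverse=True)


ratings = {"photo1": [5, 4, 4], "photo2": [5], "photo3": [4, 3, 4]}
print(best_rated(ratings, min_rates=2))  # ['photo1', 'photo3']
print(best_rated(ratings))               # [] with the default threshold of 10
```

The second call returns an empty list: as long as few votes exist, nothing reaches the threshold, which is the drawback rvelices raises in his reply.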

Of course, this improvement must be done in the Piwigo core (rating is entirely part of the Piwigo core, even if I would like to move it into a plugin in the future).

Concerning bullet a, what about only taking into account rates younger than 30 (or N) days for the Best Rated page?
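A sketch of the N-day window, assuming each rate is stored together with the datetime it was given (a hypothetical list of (rate, rated_at) tuples); rates older than max_age_days are simply ignored when averaging:

```python
from datetime import datetime, timedelta


def recent_average(rates, now, max_age_days=30):
    """Average only the rates younger than max_age_days; None if none qualify.

    rates: list of (rate, rated_at) tuples, rated_at being a datetime (assumed layout).
    """
    cutoff = now - timedelta(days=max_age_days)
    recent = [rate for rate, rated_at in rates if rated_at >= cutoff]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2011, 7, 21)
rates = [
    (5, datetime(2011, 1, 10)),  # older than 30 days: ignored
    (3, datetime(2011, 7, 1)),
    (4, datetime(2011, 7, 20)),
]
print(recent_average(rates, now))  # 3.5
```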


Latest blog post (May 3rd 2018) New subscription form

#2 2011-07-21 17:02:01

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

plg wrote:

Well, that seems interesting, but it must remain understandable for someone who never did a PhD in statistics :-)

Come on... it's just a fancy name for an average with an additional sum, one multiplication and one division... (if you want the PhD version, check out the lower bound of the Wilson score confidence interval for a Bernoulli parameter on http://www.evanmiller.org/how-not-to-so … ating.html and the full explanation here: http://en.wikipedia.org/wiki/Binomial_p … e_interval)
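For the curious, a sketch of the Wilson score lower bound described in Evan Miller's article. It works on binary ratings (positive votes out of total votes), so applying it to star rates would first require mapping stars to positive/negative, for instance counting 4 and 5 stars as positive; that mapping is an assumption, not something decided in this thread:

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for the share of positive votes.

    z=1.96 corresponds to a 95% confidence level.
    """
    if total == 0:
        return 0.0
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# One positive vote out of one ranks below ten positive votes out of ten.
print(round(wilson_lower_bound(1, 1), 3))    # 0.207
print(round(wilson_lower_bound(10, 10), 3))  # 0.722
```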

plg wrote:

This is why I think that a simple threshold (a minimum number of rates to appear on the best rated page) sounds more appropriate to me.

Yes, but you have to maintain the threshold correctly, and when people vote for the first time, they will not see anything...

plg wrote:

Concerning bullet a, what about only taking into account rates younger than 30 (or N) days for the Best Rated page?

But then why should I bother voting if my vote will be ignored after N days?

#3 2011-07-21 17:30:02

flop25
Piwigo Team
Registered: 2006-07-06
Posts: 6811

Re: improving average rate and sort order on best rated page

I think we should create a popularity score for each picture, and a "most popular" page.
The score would be a combination of views, ratings, comments and age.

For example, I'm looking at Popularity Contest, a very good WordPress plugin.
It calculates a score with weighted values:
$akpc_settings['show_pop'] = 1;        // option flag (1 = on), not a score weight
$akpc_settings['show_help'] = 1;        // option flag (1 = on), not a score weight
$akpc_settings['ignore_authors'] = 1;        // option flag (1 = on), not a score weight
$akpc_settings['feed_value'] = 1;        // clickthrough from feed
$akpc_settings['home_value'] = 2;        // clickthrough from home
$akpc_settings['archive_value'] = 4;    // clickthrough from archive page
$akpc_settings['category_value'] = 6;    // clickthrough from category page
$akpc_settings['single_value'] = 10;    // full article page view
$akpc_settings['comment_value'] = 20;    // comment on article
$akpc_settings['pingback_value'] = 50;    // pingback on article
$akpc_settings['trackback_value'] = 80;    // trackback on article

and then it seems to simply do:
        $result = $wpdb->query("
            UPDATE $wpdb->ak_popularity
            SET total = (home_views * $this->home_value)
                + (feed_views * $this->feed_value)
                + (archive_views * $this->archive_value)
                + (category_views * $this->category_value)
                + (tag_views * $this->tag_value)
                + (single_views * $this->single_value)
                + (searcher_views * $this->searcher_value)
                + (comments * $this->comment_value)
                + (pingbacks * $this->pingback_value)
                + (trackbacks * $this->trackback_value)
        ");
Finally, the score displayed (to the admin) is a percentage of the top-ranked post's score.

This method is easy to configure and to understand, but it doesn't take into account how old the element is: we could add an age penalty.
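A sketch of such a score in Python, reusing the weights from the Popularity Contest settings quoted above (tag_value and searcher_value appear in the query but their values were not listed, so they are left out). The age penalty, which halves the score every half_life_days, and its 90-day default are hypothetical choices, not part of the plugin:

```python
# Weights from the Popularity Contest settings quoted above.
WEIGHTS = {
    "feed_views": 1,
    "home_views": 2,
    "archive_views": 4,
    "category_views": 6,
    "single_views": 10,
    "comments": 20,
    "pingbacks": 50,
    "trackbacks": 80,
}


def popularity(counters, age_days, half_life_days=90.0):
    """Weighted sum of counters, halved every half_life_days (hypothetical age penalty)."""
    total = sum(weight * counters.get(name, 0) for name, weight in WEIGHTS.items())
    return total * 0.5 ** (age_days / half_life_days)


def as_percentages(scores):
    """Express each score as a percentage of the top score, as the plugin's admin view does."""
    top = max(scores.values())
    if top == 0:
        return {key: 0.0 for key in scores}
    return {key: 100.0 * value / top for key, value in scores.items()}


scores = {
    "old_photo": popularity({"single_views": 100, "comments": 5}, age_days=180),
    "new_photo": popularity({"single_views": 40, "comments": 2}, age_days=0),
}
print(as_percentages(scores))  # {'old_photo': 62.5, 'new_photo': 100.0}
```

Without the age penalty, old_photo (1100 points) would beat new_photo (440 points); after two half-lives it drops to 275.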

Last edited by flop25 (2011-07-21 17:30:39)


To get a better help : Politeness like Hello-A link-Your past actions precisely described
Check my extensions : more than 30 available
who I am and what I do : http://fr.gravatar.com/flop25
My gallery : an illustration of how to integrate Piwigo in your website

#4 2011-07-22 10:50:54

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

rvelices wrote:

plg wrote:

Well, that seems interesting, but it must remain understandable for someone who never did a PhD in statistics :-)

Come on... it's just a fancy name for an average with an additional sum, one multiplication and one division...

OK, let's have a little example then :-)

We have 3 photos, photo1, photo2 and photo3, with the following ratings:

* photo1 : 5, 4, 4
* photo2 : 5
* photo3 : 4, 3, 4

The basic average rate is:

* photo1 : 4.33
* photo2 : 5.00
* photo3 : 3.66

And of course it makes photo2 rank in position #1 with only one rating. I would much prefer to see photo1 in #1 because it has received several good ratings (even if the average is slightly lower than photo2).

Now I apply the Bayesian average formula, with m=4.33 (the mean of the per-photo average rates) and C=2.33 (the average number of rates per rated photo).

* photo1 : 4.33
* photo2 : 4.53
* photo3 : 3.95

(we keep the same "best rated" order with such a small dataset)

Is my computation correct?
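The figures above can be reproduced with a few lines of Python (a sketch; the dict layout is an assumption, not Piwigo's data model). Note that the post truncates rather than rounds: photo3 comes out at 3.958, shown above as 3.95, and its plain average 3.667 is shown as 3.66.

```python
def bayesian_average(rates, C, m):
    """(C * m + sum of rates) / (C + number of rates)."""
    return (C * m + sum(rates)) / (C + len(rates))


ratings = {"photo1": [5, 4, 4], "photo2": [5], "photo3": [4, 3, 4]}

# C: average number of rates per rated photo; m: mean of the per-photo averages.
C = sum(len(r) for r in ratings.values()) / len(ratings)
m = sum(sum(r) / len(r) for r in ratings.values()) / len(ratings)

for photo, rates in ratings.items():
    print(photo, f"{bayesian_average(rates, C, m):.3f}")
# photo1 4.333
# photo2 4.533
# photo3 3.958
```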

Now take a bigger set (more ratings):

* photo1 : 5, 4, 4, 5, 5, 5
* photo2 : 5
* photo3 : 4, 3, 4, 4

Bayesian average with m=4.36 and C=3.66:

* photo1 : 4.55
* photo2 : 4.50
* photo3 : 4.04

Which is starting to look much better :-)

Technical detail: we have to keep m and C cached in the database rather than recompute them every time we want to display an average rate for a photo.

What seems to be a problem to me is that when a visitor gives photo2 a 5-star rating, its "average rate" is not 5 but 4.50.


#5 2011-07-22 10:56:03

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

flop25 wrote:

I think we should create a popularity score for each picture, and a "most popular" page.
The score would be a combination of views, ratings, comments and age.

I like this idea, but I don't think it's the same as ratings. Rating implies a specific action from the visitor: they may view a photo several times and deliberately never rate it.


#6 2011-07-22 11:22:19

flop25
Piwigo Team
Registered: 2006-07-06
Posts: 6811

Re: improving average rate and sort order on best rated page

plg wrote:

flop25 wrote:

I think we should create a popularity score for each picture, and a "most popular" page.
The score would be a combination of views, ratings, comments and age.

I like this idea, but I don't think it's the same as ratings. Rating implies a specific action from the visitor: they may view a photo several times and deliberately never rate it.

When both of you were talking about:

What this formula does not solve is
a. time/age factor (a photo one year old has had more chances to get votes than one added a week ago)
b. audience factor

it was more about popularity than rating.
These are two different subjects: if you want to add more parameters to the rating, that's popularity; if it's about how to compute the mean, that's rating.
As for the new computation you just tried, it sounds good.


#7 2011-07-22 11:54:59

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

plg wrote:

Is my computation correct?

yes

plg wrote:

Technical detail: we have to keep m and C cached in the database rather than recompute them every time we want to display an average rate for a photo.

Good point... When we rate a picture, technically the "rating score" (note that I no longer say "average") of all pictures changes. However, in practice we don't need to recompute C/m each time (except when there are very few rates)... So, two solutions:
- we recompute the global average/count each time
- we store the global average/count in the db and recompute them only when the global count has grown by more than 1%... computing count(*) on a table does not require a table scan

In both cases we update only the newly rated item (from time to time we need to update the rating scores of all items with some algorithm...)
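A minimal in-memory sketch of the second solution, in Python for illustration (a real implementation would keep the cached values in the database). The class and attribute names are hypothetical, m is taken as the average of all rates, and only new rates are handled:

```python
class RatingScores:
    """Cache C and m; refresh them (plus every score) only when the global
    number of rates has grown by more than `tolerance` since the last refresh.

    Assumptions: m is the average of all rates, and only new rates are
    handled (modified or deleted rates would need the same bookkeeping).
    """

    def __init__(self, tolerance=0.01):
        self.tolerance = tolerance
        self.rates = {}            # photo id -> list of its rates
        self.scores = {}           # photo id -> cached rating score
        self.total_count = 0       # current global number of rates
        self.total_sum = 0         # current global sum of rates
        self.count_at_refresh = 0  # global count when C and m were last computed
        self.C = 0.0
        self.m = 0.0

    def _score(self, photo):
        rates = self.rates[photo]
        return (self.C * self.m + sum(rates)) / (self.C + len(rates))

    def _refresh_all(self):
        self.C = self.total_count / len(self.rates)
        self.m = self.total_sum / self.total_count
        self.count_at_refresh = self.total_count
        self.scores = {photo: self._score(photo) for photo in self.rates}

    def add_rate(self, photo, rate):
        self.rates.setdefault(photo, []).append(rate)
        self.total_count += 1
        self.total_sum += rate
        if self.total_count > self.count_at_refresh * (1 + self.tolerance):
            self._refresh_all()  # C and m drifted too much: update every photo
        else:
            self.scores[photo] = self._score(photo)  # only the rated photo


stats = RatingScores()
for photo, rate in [("photo1", 5), ("photo1", 4), ("photo2", 5)]:
    stats.add_rate(photo, rate)
print(stats.C, stats.m, stats.scores)
```

With a 1% tolerance, every new rate still triggers a full refresh while the gallery has fewer than 100 rates, which covers the "very small number of rates" case.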

plg wrote:

What seems to be a problem to me is that when a visitor gives photo2 a 5-star rating, its "average rate" is not 5 but 4.50.

You are right, we should not call it "average rate" or "Bayesian average rate", but maybe "rating score" or an equivalent?

#8 2011-07-22 11:58:00

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

plg wrote:

flop25 wrote:

I think we should create a popularity score for each picture, and a "most popular" page.
The score would be a combination of views, ratings, comments and age.

I like this idea, but I don't think it's the same as ratings. Rating implies a specific action from the visitor: they may view a photo several times and deliberately never rate it.

The popularity score might be difficult to compute and maintain in PHP/MySQL (the rating score alone already seems complicated...). But in the long run, why not have both "Best rated" and "Most popular" if we come up with a solution that scales to large databases?

#9 2011-07-22 12:05:14

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

rvelices wrote:

plg wrote:

Is my computation correct?

yes

In fact, we could simplify the computation of m by taking it as the average of all rates... for the first example m would be 4.14 instead of 4.33.

#10 2011-07-22 12:22:24

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

rvelices wrote:

So, two solutions:
- we recompute the global average/count each time
- we store the global average/count in the db and recompute them only when the global count has grown by more than 1%... computing count(*) on a table does not require a table scan

Computing C and m is cheap if we only do it each time a rating is added/modified/deleted. But C and m are not the main problem, and as you say:

rvelices wrote:

In both cases we update only the newly rated item (from time to time we need to update the rating scores of all items with some algorithm...)

To make things simple, I would update all images.average_rate each time a rating is added/modified/deleted (for any photo in the gallery). With mass_updates, it's not a big deal (unless we have a gallery with 50k rated photos :-/). If it's not efficient enough, then yes, we can keep a cached value "average_rate_last_compute_on_rates_global_count", and when "rates_global_count" becomes more than 1% bigger than it, we recompute all images.average_rate and reset "average_rate_last_compute_on_rates_global_count".
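A sketch of the full recomputation using Python's sqlite3 module and a single UPDATE with a correlated subquery, used here in place of Piwigo's mass_updates helper. The simplified images/rates tables are hypothetical (the real Piwigo tables have more columns and a table prefix), and m is taken as the average of all rates:

```python
import sqlite3

# Hypothetical schema: images(id, average_rate) and rates(element_id, user_id, rate).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE images (id INTEGER PRIMARY KEY, average_rate REAL);
    CREATE TABLE rates (element_id INTEGER, user_id INTEGER, rate INTEGER);
    INSERT INTO images (id) VALUES (1), (2), (3), (4);
    INSERT INTO rates VALUES (1, 1, 5), (1, 2, 4), (1, 3, 4),
                             (2, 1, 5),
                             (3, 1, 4), (3, 2, 3), (3, 3, 4);
""")


def update_all_scores(db):
    """Recompute C, m and every images.average_rate in one statement."""
    count, total, rated = db.execute(
        "SELECT COUNT(*), SUM(rate), COUNT(DISTINCT element_id) FROM rates"
    ).fetchone()
    if count == 0:
        db.execute("UPDATE images SET average_rate = NULL")
        db.commit()
        return
    C, m = count / rated, total / count
    # For an unrated photo SUM(rate) is NULL, so its score becomes NULL too.
    db.execute(
        """
        UPDATE images SET average_rate = (
            SELECT (? * ? + SUM(rate)) / (? + COUNT(*))
            FROM rates WHERE rates.element_id = images.id
        )
        """,
        (C, m, C),
    )
    db.commit()


update_all_scores(db)
for row in db.execute("SELECT id, average_rate FROM images ORDER BY id"):
    print(row)
```

Photo 4 has no rate and keeps a NULL score, so it would simply not appear on the best rated page.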

Other question: do C and m apply to the whole gallery, or do we take permissions into account? => I would say we don't take permissions into account, because otherwise the rating score (images.average_rate) would be different for each user :-/

rvelices wrote:

You are right, we should not call it "average rate" or "Bayesian average rate", but maybe "rating score" or an equivalent?

Rating score sounds nice.


#11 2011-07-22 12:26:31

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

rvelices wrote:

In fact, we could simplify the computation of m by taking it as the average of all rates... for the first example m would be 4.14 instead of 4.33.

OK. It doesn't change a lot:

* photo1: 4.25
* photo2: 4.40
* photo3: 3.87
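These numbers can be checked with the simplified definitions (m = average of all rates, C = average number of rates per rated photo); a Python sketch with an assumed dict layout. Incidentally, the m=4.36 used for the bigger set in post #4 is also the average of all 11 rates (48/11 ≈ 4.36), so that example already follows this simplification:

```python
def rating_scores(ratings):
    """Bayesian rating score with m = average of all rates and
    C = average number of rates per rated photo."""
    all_rates = [r for rates in ratings.values() for r in rates]
    C = len(all_rates) / len(ratings)
    m = sum(all_rates) / len(all_rates)
    return {photo: (C * m + sum(rates)) / (C + len(rates))
            for photo, rates in ratings.items()}


small = {"photo1": [5, 4, 4], "photo2": [5], "photo3": [4, 3, 4]}
bigger = {"photo1": [5, 4, 4, 5, 5, 5], "photo2": [5], "photo3": [4, 3, 4, 4]}
print({p: round(s, 3) for p, s in rating_scores(small).items()})
# {'photo1': 4.25, 'photo2': 4.4, 'photo3': 3.875}
print({p: round(s, 3) for p, s in rating_scores(bigger).items()})
# {'photo1': 4.552, 'photo2': 4.5, 'photo3': 4.043}
```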


#12 2011-07-22 13:01:54

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

plg wrote:

...To make things simple, I would update all images.average_rate each time a rating is...

Agreed

plg wrote:

Other question: do C and m apply to the whole gallery, or do we take permissions into account? => I would say we don't take permissions into account, because otherwise the rating score (images.average_rate) would be different for each user :-/

Let's start with no time effect and no permission effect... We'll see later.

plg wrote:

Rating score sounds nice.

Would you agree to rename the database column "average_rate" to "rate_score"? It would make the code more readable, I believe...

#13 2011-07-22 13:09:43

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

rvelices wrote:

Would you agree to rename the database column "average_rate" to "rate_score"? It would make the code more readable, I believe...

No problem.


#14 2011-07-22 14:58:50

rvelices
Piwigo Team
Registered: 2005-12-29
Posts: 1957

Re: improving average rate and sort order on best rated page

plg wrote:

rvelices wrote:

Would you agree to rename the database column "average_rate" to "rate_score"? It would make the code more readable, I believe...

No problem.

I just realized SQLite does not allow dropping or renaming a column (no idea how we will handle db upgrades on this engine)... Still OK?

#15 2011-07-22 15:22:05

plg
Piwigo Team
From: Nantes, France, Europe
Registered: 2002-04-05
Posts: 13140

Re: improving average rate and sort order on best rated page

rvelices wrote:

I just realized SQLite does not allow dropping or renaming a column (no idea how we will handle db upgrades on this engine)... Still OK?

SQLite doesn't allow dropping a column? Bad news. It means that 101-database.php will fail on SQLite (I'll notify mistic100).

The only solution I've found is to rename the table, create a new table with the appropriate columns and perform an "INSERT INTO ... SELECT ...": see the answers to "How do I rename a column in a SQLite database table?".
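A sketch of that workaround with Python's sqlite3 module, renaming a hypothetical average_rate column to rate_score inside one transaction (the images table here is simplified, not Piwigo's real schema). Indexes and triggers attached to the old table are dropped along with it and would have to be recreated. Newer SQLite releases can do this directly: ALTER TABLE ... RENAME COLUMN exists since SQLite 3.25.0 and ALTER TABLE ... DROP COLUMN since 3.35.0.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE images (id INTEGER PRIMARY KEY, file TEXT, average_rate REAL);
    INSERT INTO images VALUES (1, 'a.jpg', 4.25), (2, 'b.jpg', NULL);
""")

# Rebuild the table with the new column name, all inside one transaction.
db.executescript("""
    BEGIN;
    ALTER TABLE images RENAME TO images_old;
    CREATE TABLE images (id INTEGER PRIMARY KEY, file TEXT, rate_score REAL);
    INSERT INTO images (id, file, rate_score)
        SELECT id, file, average_rate FROM images_old;
    DROP TABLE images_old;
    COMMIT;
""")

print([row[1] for row in db.execute("PRAGMA table_info(images)")])
# ['id', 'file', 'rate_score']
print(db.execute("SELECT * FROM images ORDER BY id").fetchall())
# [(1, 'a.jpg', 4.25), (2, 'b.jpg', None)]
```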


Powered by FluxBB

Piwigo.org © 2002-2018