#1 2017-10-28 17:17:07

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Performance issue with high volume

Preamble : to solve performance issues, we have implemented a "permission cache" in Piwigo.

More and more frequently I encounter a major performance issue with Piwigo when combining these conditions:

* many photos/albums, like 100k photos and 1k albums
* private albums or photos
* photos are being uploaded with web upload or API
* each photo is uploaded in a few seconds (small photos or big bandwidth)
* visitors are active on gallery side

The root of the problem here is that each time a photo is added, this cache is reset. Each time a visitor opens a page, the cache is rebuilt if not available. In the described situation, the cache is rebuilt each time a visitor opens a page. Rebuilding the cache may take 5 to 10 seconds.

Even "locking" the gallery does not really solve the problem, because cache is rebuilt even in this situation (but I can push a change to avoid rebuilding cache if the gallery is locked, I have implemented that quickly on Piwigo.com)

Improving cache generation time sounds very complicated to me. The solution I see for now is to avoid resetting the cache on each photo added. Unfortunately, that is going to lead to inconsistencies in the database. In addition, we would need to put the added photo in a "pending state" (with images.level=64 for instance) and, after X minutes (or any other event), reset the cache and remove the "pending" state.

By the way, when I write about cache reset, I describe what function invalidate_user_cache does.

Any better idea?


#2 2017-10-28 17:30:02

mistic100
Former Piwigo Team
Lyon (FR)
2008-09-27
3277

Re: Performance issue with high volume

I think this is somehow linked to the still not implemented feature of "upload session" : know when a batch of photos has been loaded.

Smart Albums suffers from the same problem: every rule is re-evaluated after each upload.

I spend a lot of time on https://alpha.wallhaven.cc/ these days. When you upload files they enter a pending state where you can edit them before publishing (they did this on purpose to force users to add tags but the same principle can be applied to Piwigo).

So I think your idea is the right one : enforce a pending state for every uploaded file. The user will be able to prepare their photos (useful when you don't have correct metadata) and click a nice "Publish" button, and there you can reset the cache (among other things). This could be integrated in the batch manager. This will replace the "upload done" page we currently have.

If the upload was aborted (browser closed, etc.) make it clear on the admin homepage that there are pending files.


#3 2017-10-29 09:10:04

rvelices
Former Piwigo Team
2005-12-29
1960

Re: Performance issue with high volume

Not an easy situation... If there are many logged-in users with the same rights, we could possibly cache permissions only once for all the users with the same permissions (but it's a breaking change... and it won't work in all cases)
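A minimal sketch of what sharing a cache between users with identical permissions could look like. This is not Piwigo code: the function name, the idea of keying on forbidden albums plus privacy level, and the sample users are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of Piwigo: builds a cache key from the
// set of albums a user cannot see and the user's privacy level, so that
// users with identical permissions end up with the same key.
function permission_cache_key(array $forbidden_album_ids, int $level): string
{
  $ids = array_unique($forbidden_album_ids);
  sort($ids, SORT_NUMERIC); // order must not influence the key
  return md5($level.':'.implode(',', $ids));
}

// made-up users for the example
$users = array(
  'u1' => array('forbidden' => array(12, 7), 'level' => 0),
  'u2' => array('forbidden' => array(7, 12), 'level' => 0),
  'u3' => array('forbidden' => array(7), 'level' => 4),
);

$keys = array();
foreach ($users as $name => $user)
{
  $keys[$name] = permission_cache_key($user['forbidden'], $user['level']);
}

// u1 and u2 share a key, u3 gets its own: 2 caches to build instead of 3
$nb_caches = count(array_unique($keys));
echo $nb_caches." caches for ".count($users)." users\n";
```

The gain depends entirely on how many users really share the same permissions; with fully individual permissions there is one key per user and nothing is saved.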


#4 2017-10-30 14:10:00

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

rvelices wrote:

Not an easy situation... If there are many logged-in users with the same rights, we could possibly cache permissions only once for all the users with the same permissions (but it's a breaking change... and it won't work in all cases)

I think it would only make the problem less severe, but the problem would still be here:

* 07h45m12s: new photo added, invalidate_user_cache
* 07h45m13s: user u1 opens a page, cache rebuilt for u1
* 07h45m14s: user u1 opens a page, cache is fine
* 07h45m14s: user u2 opens a page, cache rebuilt for u2
* 07h45m15s: new photo added, invalidate_user_cache
* 07h45m16s: user u1 opens a page, cache rebuilt for u1
* ...

Now imagine you have 100 visitors instead of 2. Every time each user opens a page, the cache is rebuilt :-/
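The timeline can be replayed with a toy model. This is a sketch only, not Piwigo code: the function name and the event format are invented for the example, and an "upload" event simply stands in for what invalidate_user_cache does.

```php
<?php
// Toy model of the permission cache: not Piwigo code.
// $events is an ordered list of array('upload') or array('view', <user>).
function count_cache_rebuilds(array $events): int
{
  $cached = array(); // users whose cache is currently valid
  $rebuilds = 0;

  foreach ($events as $event)
  {
    if ($event[0] === 'upload')
    {
      // same effect as invalidate_user_cache: every user's cache is dropped
      $cached = array();
    }
    elseif (!isset($cached[$event[1]]))
    {
      // page view without a valid cache: expensive rebuild
      $cached[$event[1]] = true;
      $rebuilds++;
    }
  }

  return $rebuilds;
}

// the timeline from this post
$timeline = array(
  array('upload'),
  array('view', 'u1'),
  array('view', 'u1'),
  array('view', 'u2'),
  array('upload'),
  array('view', 'u1'),
);

$rebuilds = count_cache_rebuilds($timeline);
echo $rebuilds." rebuilds for 4 page views\n"; // prints "3 rebuilds for 4 page views"
```

With one upload every few seconds, almost every page view falls right after an invalidation, so the ratio only gets worse as the number of visitors grows.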


#5 2017-10-30 14:20:58

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

mistic100 wrote:

I think this is somehow linked to the still not implemented feature of "upload session" : know when a batch of photos has been loaded.

I have implemented something in Community : [Github] Piwigo-community commit d7188274 API method community.images.uploadCompleted

I could implement an equivalent for Piwigo core. The problem is you're not 100% sure the API method will ever get fired (any problem can occur on browser side, as you say later in your message).

mistic100 wrote:

So I think your idea is the right one : enforce a pending state for every uploaded file. The user will be able to prepare their photos (useful when you don't have correct metadata) and click a nice "Publish" button, and there you can reset the cache (among other things). This could be integrated in the batch manager. This will replace the "upload done" page we currently have.

If the upload was aborted (browser closed, etc.) make it clear on the admin homepage that there are pending files.

OK, so you're "validating" the basic idea of a "staging" state and you're introducing a "manual" publish action, while I was more thinking about an "automatic" action. For example, you keep in database a $conf['oldest_staging_image'] = <date>. In include/common.inc.php, if there is such a config setting and if its value is older than X minutes, perform the "publish action" automatically. The idea behind auto-publish is to keep the current workflow valid. But the manual-publish is interesting too and it could be a configuration option : publish => auto/manual
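A rough sketch of the automatic variant, under assumptions: the function name is invented, $conf['oldest_staging_image'] is stored as a Unix timestamp here (the post only says "a date"), and the publish action is passed in as a callback because it does not exist yet.

```php
<?php
// Sketch only: $conf stands in for Piwigo's configuration array and
// $publish for whatever the "publish action" ends up being.
function maybe_auto_publish(array &$conf, int $now, int $max_age, callable $publish): bool
{
  if (!isset($conf['oldest_staging_image']))
  {
    return false; // nothing is staged
  }

  if ($now - $conf['oldest_staging_image'] < $max_age)
  {
    return false; // staged photos are still too recent
  }

  $publish();
  unset($conf['oldest_staging_image']); // staging area is empty again
  return true;
}

$published = 0;
$publish = function () use (&$published) { $published++; };

$conf = array('oldest_staging_image' => 1000); // assumed Unix timestamp
$too_early = maybe_auto_publish($conf, 1000 + 60, 300, $publish);
$on_time = maybe_auto_publish($conf, 1000 + 301, 300, $publish);
$again = maybe_auto_publish($conf, 1000 + 400, 300, $publish);
```

A check like this is cheap enough to run on every page load, which is why include/common.inc.php is a plausible place for it.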


#6 2017-10-30 15:30:45

flop25
Piwigo Team
2006-07-06
7037

Re: Performance issue with high volume

I agree with the idea of a publishing step which has to be done manually by default or could be also done automatically.

Besides solving the technical issue, the idea is that people might upload their pictures incorrectly, and a manual publishing step for verification is an additional, quite comforting safety measure (the user has full control, a full overview and time). When you have private and public stuff, or different privacy policies, it can be a bit stressful.

The disadvantage is that it more or less removes the great advantage of never going to your admin panel when uploading through the API, so that case should be handled specifically.

The workflow I was thinking of involves an email sent after X hours, with a link to check the pending pictures and a link to directly publish them (no thumbnails, just "x files sent" and when). For files sent through the API, we could add an API method to trigger the email, so people sending from a third-party app could publish directly by clicking the link (smartphone, long upload, email, click and shazam).

That email solution could come on top of a config variable to automatically publish after X time; what I mean is that it is technically one step further, so it could be done afterwards. The config var is necessary, but the email is just an enhancement of the feature.

BUT instead of one $conf['oldest_pending_files'], I would really prefer two: one for pictures from the admin panel and one for the "real" API uploads, because I would set a very short time for API uploads and a longer one for web uploads. Usually people prepare their uploads much more carefully when they are done from third-party software (Lightroom for instance).
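A sketch of the two-delay idea, with loud assumptions: the setting names, the delay values and the 'source' field are hypothetical, and how Piwigo would tell a web upload from a third-party upload is not addressed here.

```php
<?php
// Hypothetical setting names and values; the post only asks for two delays.
$conf = array(
  'pending_max_age_web' => 30 * 60, // web upload form: 30 minutes
  'pending_max_age_api' => 2 * 60,  // third-party apps: 2 minutes
);

// $pending: list of array('id' => ..., 'source' => 'web'|'api', 'added_on' => timestamp)
// Returns the ids of pending photos that have waited long enough.
function photos_to_publish(array $pending, array $conf, int $now): array
{
  $ids = array();
  foreach ($pending as $photo)
  {
    $max_age = $conf['pending_max_age_'.$photo['source']];
    if ($now - $photo['added_on'] >= $max_age)
    {
      $ids[] = $photo['id'];
    }
  }
  return $ids;
}

// made-up pending photos, timestamps in seconds
$pending = array(
  array('id' => 1, 'source' => 'web', 'added_on' => 0),
  array('id' => 2, 'source' => 'api', 'added_on' => 0),
  array('id' => 3, 'source' => 'api', 'added_on' => 250),
);

// at t=300 only photo 2 (API upload, 300 seconds old) is due
$due = photos_to_publish($pending, $conf, 300);
```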

my 2 cents


To get a better help : Politeness like Hello-A link-Your past actions precisely described
Check my extensions : more than 30 available
who I am and what I do : http://fr.gravatar.com/flop25
My gallery : an illustration of how to integrate Piwigo in your website


#7 2017-10-30 15:39:01

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

Good point: staging photos is problematic for remote app (Lightroom, mobile apps, digiKam...) because it would require admins to go in the Piwigo admin :-/ In that case, automatic publish sounds mandatory to me.

I would also like to point out that web upload and API upload are pretty much the same: since Piwigo 2.7 and HTML5 upload, the Piwigo web upload form uses the API method pwg.images.upload, just like the iOS app.


#8 2017-10-30 15:50:52

flop25
Piwigo Team
2006-07-06
7037

Re: Performance issue with high volume

plg wrote:

I would also like to point out that web upload and API upload are pretty much the same: since Piwigo 2.7 and HTML5 upload, the Piwigo web upload form uses the API method pwg.images.upload, just like the iOS app.

Yeah, I know, but for the user that's an entirely different workflow (at least as I understand how our users work).



#9 2017-10-30 16:09:18

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

Of course, I just wanted to point out that "technically speaking" web upload ~= API upload :-) I fully agree with you: from a user perspective, upload through Lightroom is totally different from web upload!


#10 2018-03-05 11:24:28

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

Short discussion with Barnie, a user impacted by this performance issue, asking if locking the destination album would avoid performance issues. The answer is no with current Piwigo, but I think it could be an interesting way to solve this.


#11 2018-07-19 17:27:10

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

Small note while I'm thinking about this issue : if the photos were added as "orphans" (linked to no album), we would not need to reset the user cache; at the end of the upload, all uploaded photos would be linked to the destination album. Just an idea.


#12 2021-03-16 18:06:27

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

This problem is still not fixed. See [Github] Piwigo issue #1367


#13 2021-03-19 20:05:38

homdax
Member
Sweden
2015-02-02
310

Re: Performance issue with high volume

Not performance related but you mention reset of user cache. That would be ok, but then we have the image thumbnails cache. Generated thumbnails or images created in different sizes. I have hotlinks based on the following URL composition that are important to keep.

Example:

Code:

https://www.lotroshots.net/_data/i/upload/2017/07/15/20170715001512-bf351c1d-xl.jpg
https://www.lotroshots.net/_data/i/upload/2017/07/15/20170715154712-1ea61e8f-xl.jpg
https://www.lotroshots.net/_data/i/upload/2017/07/20/20170720163629-07a38a3a-xl.jpg

etc... all the same.

I have asked before about this so if your potential changes may be included in an update it would be a small disaster for me and a few of my "gallerists" if that functionality was changed, reset or removed.

Maybe we are not even talking about the same thing...

Last edited by homdax (2021-03-19 20:06:24)


#14 2021-05-20 12:24:16

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

homdax wrote:

Maybe we are not even talking about the same thing...

That's not the same thing ;-)


#15 2021-05-20 12:25:40

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13820

Re: Performance issue with high volume

I'm working on this subject and I want to push a solution in Piwigo 12. I investigated the user cache generation time on Piwigo.com accounts, and it really depends on the number of photos, especially because of this SQL query:

Code:

SELECT c.id AS cat_id, id_uppercat, global_rank,
  MAX(date_available) AS date_last, COUNT(date_available) AS nb_images
FROM categories as c
  LEFT JOIN image_category AS ic ON ic.category_id = c.id
  LEFT JOIN images AS i
    ON ic.image_id = i.id
      AND i.level<=8
  GROUP BY c.id
(this query time : 8.876 s)

I don't see how to make it faster, or how to simplify it and do the computation in PHP instead. So the solution still relies on the "reset user cache less often" principle.

I've been thinking about a "lounge" where photos are placed when they are added with function add_uploaded_photo:

Code:

  $use_lounge = true; // TODO make it smarter

  if (isset($categories) and count($categories) > 0)
  {
    if ($use_lounge)
    {
      fill_lounge(array($image_id), $categories);
    }
    else
    {
      associate_images_to_categories(array($image_id), $categories);
    }
  }

...

  if (!$use_lounge)
  {
    invalidate_user_cache();
  }

Instead of adding the photos to the albums (categories), we put them in the lounge. This way, it does not (should not) disturb the gallery. Photos are like "orphans".
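To make the benefit concrete, here is an in-memory model of the lounge. It is a sketch only: the ToyGallery class and its members are invented for the example and do not touch a database or any Piwigo function.

```php
<?php
// In-memory stand-in for the gallery; none of this is Piwigo code.
class ToyGallery
{
  public $albums = array();  // album id => list of image ids
  public $lounge = array();  // list of array(image id, album id)
  public $invalidations = 0; // how many times the user cache was reset

  public function add_photo(int $image_id, int $album_id, bool $use_lounge): void
  {
    if ($use_lounge)
    {
      // the gallery is untouched, so no cache reset is needed
      $this->lounge[] = array($image_id, $album_id);
    }
    else
    {
      $this->albums[$album_id][] = $image_id;
      $this->invalidations++;
    }
  }

  public function empty_lounge(): void
  {
    foreach ($this->lounge as list($image_id, $album_id))
    {
      $this->albums[$album_id][] = $image_id;
    }
    $this->lounge = array();
    $this->invalidations++; // a single reset for the whole batch
  }
}

$direct = new ToyGallery();
$lounged = new ToyGallery();
for ($image_id = 1; $image_id <= 100; $image_id++)
{
  $direct->add_photo($image_id, 5, false);
  $lounged->add_photo($image_id, 5, true);
}
$lounged->empty_lounge();

// prints "100 resets without lounge, 1 with"
echo $direct->invalidations." resets without lounge, ".$lounged->invalidations." with\n";
```

Both galleries end up with the same album content; only the number of cache resets differs.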

Then we have a function to move photos from the lounge to the albums they were intended for.

Code:

function empty_lounge()
{
  $max_image_id = 0;

  // read the whole lounge, sorted so that rows for the same album are contiguous
  $query = '
SELECT
    image_id,
    category_id
  FROM '.LOUNGE_TABLE.'
  ORDER BY category_id ASC, image_id ASC
;';

  $rows = query2array($query);

  $images = array();
  foreach ($rows as $idx => $row)
  {
    // remember the highest image id read from the lounge
    if ($row['image_id'] > $max_image_id)
    {
      $max_image_id = $row['image_id'];
    }

    $images[] = $row['image_id'];

    if (!isset($rows[$idx+1]) or $rows[$idx+1]['category_id'] != $row['category_id'])
    {
      // if we're at the end of the loop OR if category changes:
      // link the accumulated images to the album in a single call
      associate_images_to_categories($images, array($row['category_id']));
      $images = array();
    }
  }

  // only remove rows up to the highest image id we have read
  $query = '
DELETE
  FROM '.LOUNGE_TABLE.'
  WHERE image_id <= '.$max_image_id.'
;';
  pwg_query($query);

  // one cache reset for the whole batch
  invalidate_user_cache();
}

I still have to think about when exactly to call empty_lounge function. I see 3 ways of doing it:

1) automatically at the end of upload by calling an API method pwg.images.emptyLounge
2) automatically if there are photos in the lounge for more than X seconds (let's say 300 seconds for example, but can be configured)
3) manually on the maintenance page

I plan to have 2 configuration settings:

Code:

// number of photos beyond which individual photos are added in the
// lounge, a temporary zone where photos wait before being "launched".
// 50k photos by default.
$conf['activate_lounge_threshold'] = 50000;

// Lounge is automatically emptied (photos are being pushed to their
// albums) when the oldest one reaches this duration. In seconds.
// 5 minutes by default.
$conf['lounge_max_duration'] = 5*60;
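A sketch of how these two settings could be read. The setting names and defaults are the ones above; the two helper functions are hypothetical, "beyond" is read as strictly greater than, and knowing the age of the oldest lounge row assumes a timestamp that the lounge table shown earlier does not have.

```php
<?php
// Settings as proposed above; the helper functions are hypothetical.
$conf = array(
  'activate_lounge_threshold' => 50000,
  'lounge_max_duration' => 5*60,
);

// Could replace the hard-coded "$use_lounge = true" in add_uploaded_photo.
// "Beyond" is read here as strictly more photos than the threshold.
function use_lounge(int $nb_photos, array $conf): bool
{
  return $nb_photos > $conf['activate_lounge_threshold'];
}

// $oldest_lounge_ts: Unix timestamp of the oldest lounge row, or null if
// the lounge is empty (assumes such a timestamp is stored somewhere).
function lounge_is_due(?int $oldest_lounge_ts, int $now, array $conf): bool
{
  if ($oldest_lounge_ts === null)
  {
    return false;
  }
  return $now - $oldest_lounge_ts >= $conf['lounge_max_duration'];
}

$small_gallery = use_lounge(1000, $conf);  // false: photos go straight to albums
$big_gallery = use_lounge(100000, $conf);  // true: photos wait in the lounge
$due = lounge_is_due(1000, 1000 + 300, $conf); // true: oldest row is 5 minutes old
```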
