
#1 2008-07-01 14:32:30

dustin
Member
2008-07-01
3

Synchronizing with large storage area - works now, a few hints?

Hello,

I am new here, please excuse me if I have overlooked something.

First I would like to thank all developers for their wonderful work. I was looking for a solution for indexing and presenting thousands of images scattered throughout our company file storage area. It was rather simple to modify the well-structured code of your project. I would like to discuss the hacks I had to implement to make phpwebgallery achieve this goal (which it does marvelously now).

Some specifications: a tree of 5k+ folders with spaces and diacritics in their names, located on a Linux server; 10k+ images scattered randomly through it; most of the folders contain no images. No need for the upload function, only synchronization. Stable release 1.7, not from SVN.

1. Folders with spaces + diacritics
--------------------------------------------------
Implementing support properly would require more changes. I needed only the synchronization, thumbnailing and listing functions. Removing the checks for allowed characters in directory and file names in those code paths, plus adding mysql_real_escape_string($insert[$dbfield]) in mass_inserts, fixed the problem. Adding some rawurlencode calls to the URL formatting made the browser download images correctly. Ugly hacks, and certainly not all cases are fixed, but it works OK now.

2. Huge number of folders with no images
------------------------------------------------------------
I split this into two parts:

First, avoiding "picture-empty" folders in the categories list built by get_html_menu_category in functions_html.inc.php, by adding a simple check at the top of the loop:
  foreach ($categories as $category)
  {
    // skip categories that contain no images
    if ($category['count_images'] == 0) continue;
       ......


Secondly, skipping the representative selection code for empty categories in category_cats.inc.php. For empty categories the existing caching code does not work, so the representative image id is regenerated every time (with an empty result, as the category is empty). A simple check fixed that:

while ($row = mysql_fetch_assoc($result))
{
  if ($row['count_images'] == 0)
    continue;
  ....

Perhaps these two hacks (if actually correct - they seem to work for me) could be considered for submitting to SVN.

I just thought I should give some feedback; perhaps some of these issues will find their way into the trunk code.

Again, thanks a lot for the great work. Of course, I would be happy to discuss this use scenario more.

Pavel.


#2 2008-07-01 15:11:45

VDigital
Former Piwigo Team
Paris (FR)
2005-05-04
17680

Re: Synchronizing with large storage area - works now, a few hints?


8-)


#3 2008-07-01 15:26:49

rvelices
Former Piwigo Team
2005-12-29
1960

Re: Synchronizing with large storage area - works now, a few hints?

I answered on the bug [Bugtracker] ticket 832 ...


#4 2008-07-01 15:33:27

rvelices
Former Piwigo Team
2005-12-29
1960

Re: Synchronizing with large storage area - works now, a few hints?

dustin wrote:

Implementing support properly would require more changes. I needed only the synchronization, thumbnailing and listing functions. Removing the checks for allowed characters in directory and file names in those code paths, plus adding mysql_real_escape_string($insert[$dbfield]) in mass_inserts, fixed the problem. Adding some rawurlencode calls to the URL formatting made the browser download images correctly. Ugly hacks, and certainly not all cases are fixed, but it works OK now.

you might have some side effects with the escape in mass_inserts, because it is widely used and all other places already escape data when necessary. so you will end up with double escaping...
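
To illustrate the double-escaping concern, here is a minimal sketch. It assumes a value that was already escaped once reaches mass_inserts and is escaped a second time. addslashes() is only a stand-in for mysql_real_escape_string() (which needs an open connection and no longer exists in PHP 7+), stripslashes() approximates the single level of escaping the SQL parser undoes, and the sample value is made up.

```php
<?php
// Illustration only: addslashes() stands in for mysql_real_escape_string().
$name = "O'Brien";                 // hypothetical sample value

$once  = addslashes($name);        // escaped once, as the caller already does
$twice = addslashes($once);        // escaped again inside mass_inserts

// stripslashes() mimics what the SQL parser undoes: exactly one level.
$stored_once  = stripslashes($once);
$stored_twice = stripslashes($twice);

echo $stored_once, "\n";   // O'Brien
echo $stored_twice, "\n";  // O\'Brien (a stray backslash ends up in the table)
```

If the synchronization path really passes unescaped data, as reported for this setup, the second call is the only one and nothing is doubled; the risk applies to the other callers of mass_inserts.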

anyway in the next version we will go fully to utf-8 and you might have to do some additional work to change directory names as returned by the php functions from iso8859 to utf8

Where did you add urlencode? I think get_thumbnail_url, get_element_url, get_image_url, get_high_url should be enough ...

Last edited by rvelices (2008-07-01 15:35:05)


#5 2008-07-01 22:01:56

dustin
Member
2008-07-01
3

Re: Synchronizing with large storage area - works now, a few hints?

Thanks a lot for considering this usage scenario. The "patches" are more like a proof of concept, not really suitable for SVN commit.

rvelices wrote:

you might have some side effects with the escape in mass_inserts, because it is widely used and all other places already escape data when necessary. so you will end up with double escaping...

Well, in version 1.7 I did not find any mysql_xxxx_escape_string in the synchronization code path. Nevertheless, it is just an ugly hack, I know :)

rvelices wrote:

anyway in the next version we will go fully to utf-8 and you might have to do some additional work to change directory names as returned by the php functions from iso8859 to utf8

I have no problem with converting the data in DB. The question is - will you support various encodings on filesystem? E.g. our filesystem has iso-8859-2, I assume this will have to be converted to/from utf-8 in PHP. It is rather difficult for an outsider to find all appropriate places. Are you planning some kind of conversion layer from/to UTF-8 to allow for various filesystem encodings?
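
A minimal sketch of what such a conversion layer could look like, assuming a single configured filesystem charset and PHP's iconv extension. FS_CHARSET, fs_to_utf8() and utf8_to_fs() are hypothetical names invented for illustration; nothing like them exists in phpwebgallery 1.7.

```php
<?php
// Hypothetical helpers, not part of phpwebgallery. FS_CHARSET is an assumed
// configuration value naming the encoding used for file names on disk.
const FS_CHARSET = 'ISO-8859-2';

// File name as read from disk -> UTF-8 for the database and HTML output.
function fs_to_utf8(string $name): string
{
    $converted = iconv(FS_CHARSET, 'UTF-8', $name);
    if ($converted === false) {
        throw new RuntimeException('Cannot convert file name to UTF-8');
    }
    return $converted;
}

// UTF-8 from the database -> bytes to hand to opendir(), fopen(), etc.
function utf8_to_fs(string $name): string
{
    $converted = iconv('UTF-8', FS_CHARSET, $name);
    if ($converted === false) {
        throw new RuntimeException('Name is not representable on the filesystem');
    }
    return $converted;
}

$raw  = "\xE8esko.jpg";        // "česko.jpg" as ISO-8859-2 bytes
$utf8 = fs_to_utf8($raw);      // what would be stored in the database
var_dump(utf8_to_fs($utf8) === $raw); // the round trip should preserve the bytes
```

The hard part is not the two functions but finding every place where a path crosses the boundary between disk and database.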

rvelices wrote:

Where did you add urlencode? I think get_thumbnail_url, get_element_url, get_image_url, get_high_url should be enough ...

Just get_element_url and get_thumbnail_url, and it seems OK. I used some ugly code that just works:

$url = str_replace("%2F", '/', rawurlencode(get_element_location($element_info)));
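
The same result can be written so that the intent is more visible: encode each path segment on its own, so the "/" separators are never encoded in the first place. This is only a sketch; encode_path_segments() is a made-up helper name and the sample path is invented.

```php
<?php
// Hypothetical helper, not part of phpwebgallery: percent-encode every
// segment of a relative path, leaving the '/' separators untouched.
function encode_path_segments(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$path = './galleries/my photos/a b.jpg';   // invented sample path

echo encode_path_segments($path), "\n";
// ./galleries/my%20photos/a%20b.jpg

// Same output as the one-liner above applied to the same string.
var_dump(encode_path_segments($path) === str_replace('%2F', '/', rawurlencode($path)));
```

Both forms produce identical URLs, because a literal "%2F" in a file name is itself encoded to "%252F" and is therefore never touched by the str_replace.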


I'm not sure that the first hack should go under svn. It can be easily achieved by a plugin on the event get_categories_menu_sql_where and adding the test count_images>0. Personally I would like to see all categories in the menu ....

It is definitely a candidate for a configuration option or even a plugin.

Last edited by dustin (2008-07-01 22:16:27)


#6 2008-07-02 21:56:34

rvelices
Former Piwigo Team
2005-12-29
1960

Re: Synchronizing with large storage area - works now, a few hints?

dustin wrote:

Are you planning some kind of conversion layer from/to UTF-8 to allow for various filesystem encodings?

Unfortunately this is not easily feasible for all combinations of linux/windows / php version / mysql version / web server version / pwg character encoding ...
There are some cases where php is not able to read some characters with diacritics from the filesystem, which will result in things not working correctly. In addition moving a gallery to a new provider will result in things being broken.
That's the reason why we allow only those standard characters...


#7 2008-07-03 00:16:00

mathiasm
Former Piwigo Team
2006-02-06
2650

Re: Synchronizing with large storage area - works now, a few hints?

rvelices wrote:

dustin wrote:

Are you planning some kind of conversion layer from/to UTF-8 to allow for various filesystem encodings?

Unfortunately this is not easily feasible for all combinations of linux/windows / php version / mysql version / web server version / pwg character encoding ...
There are some cases where php is not able to read some characters with diacritics from the filesystem, which will result in things not working correctly. In addition moving a gallery to a new provider will result in things being broken.
That's the reason why we allow only those standard characters...

Maybe we can have a joker character (e.g. +) which will replace all non-translatable chars in the name
Example:
déjà : imagine é can be "imported" to utf-8 from the filesystem but à cannot. Then we may have déj+ as the name. This allows changing the name in the database after the import by searching for the joker.
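
A rough sketch of the joker idea, under one simplifying assumption: the name read from disk is expected to be UTF-8, and any byte that is not part of a well-formed UTF-8 character counts as "non-translatable". name_with_joker() is a made-up function for illustration only.

```php
<?php
// Hypothetical helper: keep well-formed UTF-8 characters, replace every
// other byte with the joker character.
function name_with_joker(string $raw, string $joker = '+'): string
{
    $out = '';
    $len = strlen($raw);
    $i = 0;
    while ($i < $len) {
        $taken = 0;
        // A UTF-8 character is 1 to 4 bytes long; try the shortest first.
        for ($n = 1; $n <= 4 && $i + $n <= $len; $n++) {
            $chunk = substr($raw, $i, $n);
            // With /u, preg_match() returns false on malformed UTF-8.
            if (preg_match('/\A.\z/us', $chunk) === 1) {
                $out .= $chunk;
                $taken = $n;
                break;
            }
        }
        if ($taken === 0) {
            $out .= $joker;   // this byte cannot be translated
            $taken = 1;
        }
        $i += $taken;
    }
    return $out;
}

// "déj" followed by a lone 0xE0 byte (à in a single-byte charset).
echo name_with_joker("d\xC3\xA9j\xE0"), "\n";   // déj+
```

The stored name could then be found again with a search for the joker, but the original bytes would still have to be kept somewhere to reach the file on disk.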

Just an idea.

Mathias


#8 2008-07-03 02:52:30

rvelices
Former Piwigo Team
2005-12-29
1960

Re: Synchronizing with large storage area - works now, a few hints?

mathiasm wrote:

Maybe we can have a joker character (e.g. +) which will replace all non-translatable chars in the name
Example:
déjà : imagine é can be "imported" to utf-8 from the filesystem but à cannot. Then we may have déj+ as the name. This allows changing the name in the database after the import by searching for the joker.

Just an idea.

Mathias

And when I sync a second time ? Better change the filename ...


#9 2008-07-03 21:37:36

mathiasm
Former Piwigo Team
2006-02-06
2650

Re: Synchronizing with large storage area - works now, a few hints?

rvelices wrote:

mathiasm wrote:

Maybe we can have a joker character (e.g. +) which will replace all non-translatable chars in the name
Example:
déjà : imagine é can be "imported" to utf-8 from the filesystem but à cannot. Then we may have déj+ as the name. This allows changing the name in the database after the import by searching for the joker.

Just an idea.

Mathias

And when I sync a second time ? Better change the filename ...

Real names are kept with the joker, and display names are still there ( directory name is #categories.dir, display name is #categories.name ). Don't know if we reset name when synchronizing, though.


#10 2008-07-04 11:15:31

dustin
Member
2008-07-01
3

Re: Synchronizing with large storage area - works now, a few hints?

Gentlemen, thank you for your involvement. Please, could we make this char translation/rewriting configurable or at least optional? Our use scenario, which I think would be useful for many organizations, requires no char rewriting. As a matter of fact, changing file names on a live file server is not desirable.

I understand that special attention must be paid to the upload code path, as that is the way to produce weird file names unsupported by the filesystem.

Portability of the DB contents between providers is certainly an important aspect, nevertheless non-portability could be optional too.

Thanks a lot for considering the "non-intrusive tolerant" mode too.


#11 2015-02-26 14:06:11

bocman
Member
2015-02-18
42

Re: Synchronizing with large storage area - works now, a few hints?

Hello!
As I see in http://piwigo.org/bugs/view.php?id=2995, the request to exclude non-image/empty folders was resolved, but in Piwigo 2.7 I still see empty folders.

How can I exclude them from synchronization?


#12 2015-02-26 14:13:35

mistic100
Former Piwigo Team
Lyon (FR)
2008-09-27
3277

Re: Synchronizing with large storage area - works now, a few hints?

what ? this bug is absolutely not related

also you already have a topic about empty folders, please do not post everywhere
