Piwigo Bugtracker

The Piwigo bug tracker has moved to GitHub

This bugtracker is kept to provide history on old issues.


View Issue Details
ID: 0002801
Project: Piwigo
Category: other
View Status: public
Date Submitted: 2012.12.11 15:12
Last Update: 2015.02.11 12:59
Reporter: sakanaou
Assigned To: plg
Priority: normal
Severity: tweak
Reproducibility: always
Status: assigned
Resolution: open
Platform:
OS:
OS Version:
Product Version: 2.4.5
Target Version: 2.8.0beta1
Fixed in Version:
Summary: 0002801: [Batch Manager] Find duplicates based on md5sum
Description: I recently noticed that the Batch Manager finds duplicates based on the original file name. This is confusing, because duplicated files usually end up uploaded precisely because they have different file names. It is much safer to determine duplicates based on the file's md5sum, which is calculated after upload and stored in the database anyway.

I have created a small patch that searches for duplicates based on the md5sum instead of filename and will attach it here.
Tags: No tags attached.
Browser: any
Database engine and version: MySQL 5.x
PHP version: 5.3
Web server: Apache 2.x
Attached Files: piwigo_batchmanager_md5sum_duplicate.diff (704 bytes) 2012.12.11 15:12
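
The idea in the description can be illustrated outside the database with a minimal sketch of content-based duplicate detection. The function name `find_duplicates_by_md5` is hypothetical and not part of Piwigo; the only library call it relies on is PHP's standard `md5_file()`. It is meant as an illustration of the principle, not as the attached patch.

```php
<?php
// Hypothetical helper (not a Piwigo function): groups file paths by the MD5
// hash of their content and keeps only hashes shared by more than one file.
function find_duplicates_by_md5(array $paths)
{
    $groups = array();
    foreach ($paths as $path) {
        $hash = md5_file($path); // returns false if the file cannot be read
        if ($hash === false) {
            continue; // skip unreadable files instead of grouping them
        }
        $groups[$hash][] = $path;
    }

    // Same idea as "GROUP BY md5sum HAVING COUNT(*) > 1" in SQL.
    return array_filter($groups, function ($group) {
        return count($group) > 1;
    });
}
```

The result is keyed by hash, so two files with different names but identical content end up in the same group, which is exactly the case that a file name comparison misses.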

Notes
(0006809)
thimo (reporter)
2013.01.20 17:16

As mentioned in the forum, the "uniqueness_mode" configuration property should be taken into account:
http://piwigo.org/forum/viewtopic.php?id=20983

File /admin/batch_manager.php:

     // perform 2 queries instead. We hope there are not too many duplicates.
+    $field = ($conf['uniqueness_mode'] == 'md5sum') ? 'md5sum' : 'file';
     $query = '
-SELECT file
+SELECT '.$field.'
   FROM '.IMAGES_TABLE.'
-  GROUP BY file
+  GROUP BY '.$field.'
   HAVING COUNT(*) > 1
 ;';
-    $duplicate_files = array_from_query($query, 'file');
+    $duplicate_files = array_from_query($query, $field);
 
     $query = '
 SELECT id
   FROM '.IMAGES_TABLE.'
-  WHERE file IN (\''.implode("','", $duplicate_files).'\')
+  WHERE '.$field.' IN (\''.implode("','", $duplicate_files).'\')
 ;';
 
     array_push(

Also add a hash for files imported from the galleries folder (file /admin/site_update.php):

    $insert = array(
      'id' => $next_element_id++,
      'file' => $filename,
      'name' => get_name_from_file($filename),
      'date_available' => CURRENT_DATE,
      'path' => $path,
      'representative_ext' => $fs[$path]['representative_ext'],
      'storage_category_id' => $db_fulldirs[$dirname],
      'added_by' => $user['id'],
+     'md5sum' => md5_file($path),
      );
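
The field selection in the batch manager change above can be isolated into small helpers, which makes the fallback to the file name easy to check. This is only a sketch: `duplicate_field_for_mode` and `build_duplicates_query` are hypothetical names, not Piwigo functions, and the table name is passed in as a plain string rather than through the `IMAGES_TABLE` constant.

```php
<?php
// Hypothetical helper (not a Piwigo function): maps the uniqueness mode to the
// column used for duplicate detection. Anything other than 'md5sum' falls back
// to the file name, so only two fixed column names can reach the query.
function duplicate_field_for_mode($uniqueness_mode)
{
    return ($uniqueness_mode === 'md5sum') ? 'md5sum' : 'file';
}

// Hypothetical helper: builds the query that lists values of $field
// occurring more than once in the given images table.
function build_duplicates_query($images_table, $field)
{
    return 'SELECT '.$field
        .' FROM '.$images_table
        .' GROUP BY '.$field
        .' HAVING COUNT(*) > 1;';
}
```

Because the column name is chosen from two literals and never taken directly from input, concatenating it into the SQL string does not widen the query beyond those two columns.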
(0007749)
msakik (reporter)
2015.02.11 12:59

Hi, based on Piwigo 2.7, a checkbox could be added to find duplicates by md5sum. An option to uncheck the filename criterion could be added as well.

Issue History
Date Modified | Username | Field | Change
2012.12.11 15:12 | sakanaou | New Issue |
2012.12.11 15:12 | sakanaou | File Added: piwigo_batchmanager_md5sum_duplicate.diff |
2012.12.11 15:12 | sakanaou | browser | => any
2012.12.11 15:12 | sakanaou | Database engine and version | => MySQL 5.x
2012.12.11 15:12 | sakanaou | PHP version | => 5.3
2012.12.11 15:12 | sakanaou | Web server | => Apache 2.x
2013.01.20 17:16 | thimo | Note Added: 0006809 |
2014.09.01 20:39 | plg | Assigned To | => plg
2014.09.01 20:39 | plg | Status | new => assigned
2014.09.01 20:39 | plg | Target Version | => 2.8.0beta1
2015.02.11 12:59 | msakik | Note Added: 0007749 |

