#1 2009-07-18 13:32:20

weihuisen
Member
2009-07-18
3

How do I add support for Chinese tag?

How can I get Piwigo to support Chinese tags and the Chinese character encodings, namely UTF-8 and GBK?

Last edited by weihuisen (2009-07-18 15:16:43)


#2 2009-07-18 14:01:13

weihuisen
Member
2009-07-18
3

Re: How do I add support for Chinese tag?

Why can't I add Chinese tags?


#3 2009-07-18 14:57:01

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13196

Re: How do I add support for Chinese tag?

In [Administration>Images>Tags], I tried to create a Chinese tag and got this error:

[mysql error 1048] Column 'url_name' cannot be null

The problem is that, alongside the tag name (which can be a full Chinese string), we need a "url_name" containing ASCII characters only. An algorithm transforms the tag name into an ASCII equivalent, but a Chinese string has no such equivalent, so the resulting string is null, which the data model does not allow.

Looking at the URLs on Chinese Wikipedia, I think the ASCII-only constraint is unnecessary. I still have to run some tests, but the solution I currently have in mind is to use the Chinese string itself as "url_name".
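
A minimal sketch of why the conversion comes out empty, assuming the ASCII filter boils down to "lowercase, then keep only a-z, 0-9, underscore and space". The pattern is a simplification for illustration, not the exact Piwigo code:

```php
<?php
// "\xE4\xB8\xAD\xE6\x96\x87" is the UTF-8 byte sequence for 中文.
$tag = "\xE4\xB8\xAD\xE6\x96\x87";

// Simplified stand-in for the ASCII filter: every byte of a Chinese
// character is >= 0x80, so nothing survives the replacement.
$url_name = preg_replace('/[^a-z0-9_ ]/', '', strtolower($tag));

var_dump($url_name); // string(0) ""
```

An empty result like this is what ends up rejected by the NOT NULL constraint on the url_name column.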


#4 2009-07-18 15:17:55

weihuisen
Member
2009-07-18
3

Re: How do I add support for Chinese tag?

Thank you for answering my question.
So, what is the solution to this problem?
I have no idea...


#5 2009-07-19 00:12:05

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13196

Re: How do I add support for Chinese tag?

[Bugtracker, ticket 1063, fixed] error when adding a tag with chinese only characters

For testing purposes, I simplified the str2url function so that it only performs:

$res = str_replace(' ','_',$str);

(with no call to remove_accents) and it works very nicely.

weihuisen, if you want your Piwigo to work with Chinese tags, please do the following: in include/functions.inc.php, in the function str2url, replace:

Code:

   $str = remove_accents($str);
   $str = preg_replace('/[^a-z0-9_\s\'\:\/\[\],-]/','',strtolower($str));
   $str = preg_replace('/[\s\'\:\/\[\],-]+/',' ',trim($str));

with

Code:

//   $str = remove_accents($str);
//   $str = preg_replace('/[^a-z0-9_\s\'\:\/\[\],-]/','',strtolower($str));
//   $str = preg_replace('/[\s\'\:\/\[\],-]+/',' ',trim($str));

What follows is a development discussion; I've invited rvelices to come and give his opinion on this topic.

In [Subversion] r1119, plg (that's me) added str2url from the Dotclear project, which was pretty simple and could handle only ISO-8859-1 characters. In [Subversion] r2123, rvelices added the remove_accents function from WordPress and used it to handle UTF-8 characters.

But my question is: do we really need to replace UTF-8 characters? Looking at the URLs generated by Wikipedia, I think it may be OK to have UTF-8 characters in URLs (and my very simple test tends to confirm this).


#6 2009-07-19 07:46:56

rvelices
Piwigo Team
2005-12-29
1960

Re: How do I add support for Chinese tag?

There should not be a major issue with doing that; the url_name just needs to be URL-encoded somewhere (either in the database, as WordPress does, or before outputting it to the browser).

I think I would still prefer to keep the current behaviour for latin1 characters (lowercase, remove accents, etc.) and also replace one or more spaces/punctuation signs with a dash or underscore. I believe this is how Flickr works; otherwise we lose all interest in the url name.
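
A minimal sketch of the "encode before output" option, using PHP's standard rawurlencode/rawurldecode. The URL prefix is only illustrative, and storing the raw UTF-8 string in url_name is an assumption:

```php
<?php
// "\xE4\xB8\xAD\xE6\x96\x87" is the UTF-8 byte sequence for 中文,
// assumed here to be stored unencoded in the url_name column.
$url_name = "\xE4\xB8\xAD\xE6\x96\x87";

// Percent-encode just before writing the link into the page.
$encoded = rawurlencode($url_name);
echo 'index.php?/tags/' . $encoded . "\n"; // illustrative URL layout

// Decoding the received value gives back the stored url_name.
var_dump(rawurldecode($encoded) === $url_name); // bool(true)
```

The alternative mentioned above (storing the already-encoded form in the database) would move the rawurlencode call to the moment the tag is created.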


#7 2009-07-20 01:32:33

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13196

Re: How do I add support for Chinese tag?

rvelices wrote:

I think I would still prefer to keep the current behaviour for latin1 characters (lowercase, remove accents, etc.) [...] otherwise we lose all interest in the url name.

Can you give more details about that? What is the benefit of names in URLs? (I suppose there are several, and you seem to say that we would lose one of them with my proposal.)

rvelices wrote:

and also replace one or more spaces/punctuation signs with a dash or underscore

I agree; I see at least the "/" (when several tags are selected, "/" is used to separate them).


#8 2009-07-20 13:59:21

rvelices
Piwigo Team
2005-12-29
1960

Re: How do I add support for Chinese tag?

plg wrote:

rvelices wrote:

I think I would still prefer to keep the current behaviour for latin1 characters (lowercase, remove accents, etc.) [...] otherwise we lose all interest in the url name.

Can you give more details about that? What is the benefit of names in URLs? (I suppose there are several, and you seem to say that we would lose one of them with my proposal.)

I was talking about the url name in terms of the database column. So, to summarize:
- I would like the tags to be accessible by url name only (as is possible today through the config, instead of id-url_name by default)
- I would like the latin1 characters to work as they do today, for 2 reasons:
  - keep the URLs as they are today
  - avoid errors due to URL copy/paste or typing in the browser; for example, the é character is URL-encoded as %C3%A9 in UTF-8 and as %E9 in ISO-8859-1. If you type the character, the browser will choose one encoding, but you don't control which one
- Quite a lot of code changes are needed to make this work correctly (urlencode), and please note that QUERY_STRING and REQUEST_URI will not be URL-decoded, so an additional check is required here...
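
The two encodings of é, and the need to decode the raw query string, can be sketched as follows. The byte escapes avoid depending on the encoding of the source file, and the query string value is a made-up example:

```php
<?php
$utf8   = "\xC3\xA9"; // "é" encoded as UTF-8 (two bytes)
$latin1 = "\xE9";     // "é" encoded as ISO-8859-1 (one byte)

echo rawurlencode($utf8), "\n";   // %C3%A9
echo rawurlencode($latin1), "\n"; // %E9

// The raw query string keeps its percent-escapes, so it has to be
// decoded explicitly before being compared with a stored url_name.
$raw_query = '/tags/caf%C3%A9'; // hypothetical QUERY_STRING value
$decoded = rawurldecode($raw_query);
var_dump($decoded === "/tags/caf\xC3\xA9"); // bool(true)
```

The same visible character therefore yields two different URLs, which is why a typed or pasted URL may not match the stored value.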


#9 2009-09-01 11:48:57

Sinux
Guest

Re: How do I add support for Chinese tag?

Why not use a "pinyin" (romanized Chinese) class/function?
Like this one:
http://www.eb163.com/article.php?id=235
It transliterates Chinese characters into pinyin, which Chinese people can read, and the result is URL-friendly!

Code:

<?php

/***************************************************************************
* Pinyin.php
* ------------------------------
* Date : Nov 7, 2006
* Copyright : Adapted from code found online; copyright belongs to the original author
* Mail :
* Desc. : Pinyin conversion
* History :
* Date :
* Author :
* Modif. :
* Usage Example :
***************************************************************************/

function Pinyin($_String, $_Code='gb2312')
{
$_DataKey = "a|ai|an|ang|ao|ba|bai|ban|bang|bao|bei|ben|beng|bi|bian|biao|bie|bin|bing|bo|bu|ca|cai|can|cang|cao|ce|ceng|cha".
"|chai|chan|chang|chao|che|chen|cheng|chi|chong|chou|chu|chuai|chuan|chuang|chui|chun|chuo|ci|cong|cou|cu|".
"cuan|cui|cun|cuo|da|dai|dan|dang|dao|de|deng|di|dian|diao|die|ding|diu|dong|dou|du|duan|dui|dun|duo|e|en|er".
"|fa|fan|fang|fei|fen|feng|fo|fou|fu|ga|gai|gan|gang|gao|ge|gei|gen|geng|gong|gou|gu|gua|guai|guan|guang|gui".
"|gun|guo|ha|hai|han|hang|hao|he|hei|hen|heng|hong|hou|hu|hua|huai|huan|huang|hui|hun|huo|ji|jia|jian|jiang".
"|jiao|jie|jin|jing|jiong|jiu|ju|juan|jue|jun|ka|kai|kan|kang|kao|ke|ken|keng|kong|kou|ku|kua|kuai|kuan|kuang".
"|kui|kun|kuo|la|lai|lan|lang|lao|le|lei|leng|li|lia|lian|liang|liao|lie|lin|ling|liu|long|lou|lu|lv|luan|lue".
"|lun|luo|ma|mai|man|mang|mao|me|mei|men|meng|mi|mian|miao|mie|min|ming|miu|mo|mou|mu|na|nai|nan|nang|nao|ne".
"|nei|nen|neng|ni|nian|niang|niao|nie|nin|ning|niu|nong|nu|nv|nuan|nue|nuo|o|ou|pa|pai|pan|pang|pao|pei|pen".
"|peng|pi|pian|piao|pie|pin|ping|po|pu|qi|qia|qian|qiang|qiao|qie|qin|qing|qiong|qiu|qu|quan|que|qun|ran|rang".
"|rao|re|ren|reng|ri|rong|rou|ru|ruan|rui|run|ruo|sa|sai|san|sang|sao|se|sen|seng|sha|shai|shan|shang|shao|".
"she|shen|sheng|shi|shou|shu|shua|shuai|shuan|shuang|shui|shun|shuo|si|song|sou|su|suan|sui|sun|suo|ta|tai|".
"tan|tang|tao|te|teng|ti|tian|tiao|tie|ting|tong|tou|tu|tuan|tui|tun|tuo|wa|wai|wan|wang|wei|wen|weng|wo|wu".
"|xi|xia|xian|xiang|xiao|xie|xin|xing|xiong|xiu|xu|xuan|xue|xun|ya|yan|yang|yao|ye|yi|yin|ying|yo|yong|you".
"|yu|yuan|yue|yun|za|zai|zan|zang|zao|ze|zei|zen|zeng|zha|zhai|zhan|zhang|zhao|zhe|zhen|zheng|zhi|zhong|".
"zhou|zhu|zhua|zhuai|zhuan|zhuang|zhui|zhun|zhuo|zi|zong|zou|zu|zuan|zui|zun|zuo";

$_DataValue = "-20319|-20317|-20304|-20295|-20292|-20283|-20265|-20257|-20242|-20230|-20051|-20036|-20032|-20026|-20002|-19990".
"|-19986|-19982|-19976|-19805|-19784|-19775|-19774|-19763|-19756|-19751|-19746|-19741|-19739|-19728|-19725".
"|-19715|-19540|-19531|-19525|-19515|-19500|-19484|-19479|-19467|-19289|-19288|-19281|-19275|-19270|-19263".
"|-19261|-19249|-19243|-19242|-19238|-19235|-19227|-19224|-19218|-19212|-19038|-19023|-19018|-19006|-19003".
"|-18996|-18977|-18961|-18952|-18783|-18774|-18773|-18763|-18756|-18741|-18735|-18731|-18722|-18710|-18697".
"|-18696|-18526|-18518|-18501|-18490|-18478|-18463|-18448|-18447|-18446|-18239|-18237|-18231|-18220|-18211".
"|-18201|-18184|-18183|-18181|-18012|-17997|-17988|-17970|-17964|-17961|-17950|-17947|-17931|-17928|-17922".
"|-17759|-17752|-17733|-17730|-17721|-17703|-17701|-17697|-17692|-17683|-17676|-17496|-17487|-17482|-17468".
"|-17454|-17433|-17427|-17417|-17202|-17185|-16983|-16970|-16942|-16915|-16733|-16708|-16706|-16689|-16664".
"|-16657|-16647|-16474|-16470|-16465|-16459|-16452|-16448|-16433|-16429|-16427|-16423|-16419|-16412|-16407".
"|-16403|-16401|-16393|-16220|-16216|-16212|-16205|-16202|-16187|-16180|-16171|-16169|-16158|-16155|-15959".
"|-15958|-15944|-15933|-15920|-15915|-15903|-15889|-15878|-15707|-15701|-15681|-15667|-15661|-15659|-15652".
"|-15640|-15631|-15625|-15454|-15448|-15436|-15435|-15419|-15416|-15408|-15394|-15385|-15377|-15375|-15369".
"|-15363|-15362|-15183|-15180|-15165|-15158|-15153|-15150|-15149|-15144|-15143|-15141|-15140|-15139|-15128".
"|-15121|-15119|-15117|-15110|-15109|-14941|-14937|-14933|-14930|-14929|-14928|-14926|-14922|-14921|-14914".
"|-14908|-14902|-14894|-14889|-14882|-14873|-14871|-14857|-14678|-14674|-14670|-14668|-14663|-14654|-14645".
"|-14630|-14594|-14429|-14407|-14399|-14384|-14379|-14368|-14355|-14353|-14345|-14170|-14159|-14151|-14149".
"|-14145|-14140|-14137|-14135|-14125|-14123|-14122|-14112|-14109|-14099|-14097|-14094|-14092|-14090|-14087".
"|-14083|-13917|-13914|-13910|-13907|-13906|-13905|-13896|-13894|-13878|-13870|-13859|-13847|-13831|-13658".
"|-13611|-13601|-13406|-13404|-13400|-13398|-13395|-13391|-13387|-13383|-13367|-13359|-13356|-13343|-13340".
"|-13329|-13326|-13318|-13147|-13138|-13120|-13107|-13096|-13095|-13091|-13076|-13068|-13063|-13060|-12888".
"|-12875|-12871|-12860|-12858|-12852|-12849|-12838|-12831|-12829|-12812|-12802|-12607|-12597|-12594|-12585".
"|-12556|-12359|-12346|-12320|-12300|-12120|-12099|-12089|-12074|-12067|-12058|-12039|-11867|-11861|-11847".
"|-11831|-11798|-11781|-11604|-11589|-11536|-11358|-11340|-11339|-11324|-11303|-11097|-11077|-11067|-11055".
"|-11052|-11045|-11041|-11038|-11024|-11020|-11019|-11018|-11014|-10838|-10832|-10815|-10800|-10790|-10780".
"|-10764|-10587|-10544|-10533|-10519|-10331|-10329|-10328|-10322|-10315|-10309|-10307|-10296|-10281|-10274".
"|-10270|-10262|-10260|-10256|-10254";
$_TDataKey = explode('|', $_DataKey);
$_TDataValue = explode('|', $_DataValue);

$_Data = (PHP_VERSION>='5.0') ? array_combine($_TDataKey, $_TDataValue) : _Array_Combine($_TDataKey, $_TDataValue);
arsort($_Data);
reset($_Data);

if($_Code != 'gb2312') $_String = _U2_Utf8_Gb($_String);
$_Res = '';
for($i=0; $i<strlen($_String); $i++)
{
$_P = ord(substr($_String, $i, 1));
if($_P>160) { $_Q = ord(substr($_String, ++$i, 1)); $_P = $_P*256 + $_Q - 65536; }
$_Res .= _Pinyin($_P, $_Data);
}
return preg_replace("/[^a-z0-9]*/", '', $_Res);
}

function _Pinyin($_Num, $_Data)
{
if ($_Num>0 && $_Num<160 ) return chr($_Num);
elseif($_Num<-20319 || $_Num>-10247) return '';
else {
foreach($_Data as $k=>$v){ if($v<=$_Num) break; }
return $k;
}
}

function _U2_Utf8_Gb($_C)
{
// $_C is a UTF-8 string here, so it is converted directly. The code-point
// branches of the original only make sense for an integer argument.
return iconv('UTF-8', 'GB2312', $_C);
}

function _Array_Combine($_Arr1, $_Arr2)
{
for($i=0; $i<count($_Arr1); $i++) $_Res[$_Arr1[$i]] = $_Arr2[$i];
return $_Res;
}


echo Pinyin('这是小超的网站,欢迎访问http://www.eb163.com'); // input is treated as GB-encoded by default
echo Pinyin('这是WEB编程网',1); // any other second argument means the input is UTF-8



?>
 

#10 2009-09-02 00:44:18

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13196

Re: How do I add support for Chinese tag?

Thank you Sinux for the tip.


#11 2010-05-04 01:37:59

plg
Piwigo Team
Nantes, France, Europe
2002-04-05
13196

Re: How do I add support for Chinese tag?

OK, I really wanted to have this bug fixed before release 2.1.

The code change in [Subversion] r6060 may seem ridiculously small, but I took the time to run many tests because of rvelices' warnings.

str2url no longer returns an empty string: if the string replacements produce an empty string, str2url returns the incoming string instead.
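
A minimal sketch of that fallback, assuming a simplified ASCII filter. The function name str2url_sketch and its pattern are hypothetical; this is not the actual r6060 code:

```php
<?php
// Hypothetical sketch of the fallback described above.
function str2url_sketch($str)
{
    // Simplified ASCII filter: lowercase, keep a-z, 0-9, "_" and spaces.
    $safe = preg_replace('/[^a-z0-9_ ]/', '', strtolower($str));
    $safe = str_replace(' ', '_', trim($safe));

    // Nothing survived the filter: keep the incoming string as is.
    return ($safe === '') ? $str : $safe;
}

$chinese = "\xE4\xB8\xAD\xE6\x96\x87"; // UTF-8 bytes for 中文
var_dump(str2url_sketch('Summer 2009'));        // string(11) "summer_2009"
var_dump(str2url_sketch($chinese) === $chinese); // bool(true)
```

Latin names keep their existing ASCII form, so existing URLs stay unchanged, while a purely Chinese name is passed through untouched.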

With my available environments (Linux with Firefox 3.0/Google Chrome 5, MacOS 10.6 with Firefox/Safari, WindowsXP with IE6/Firefox), the reference URL is: http://192.168.0.12/piwigo/dev/trunk/in … стить
I performed the following tests:
* on a given operating system, copying and pasting the URL between the 2 web browsers
* from one operating system to another, sending the URL by email (with Thunderbird or Gmail)

I encountered no problems (I was even a bit surprised).
