Request for Administration

nice

ebraminio (talk), 18:24, 9 April 2011

Please explain your problem. I can provide a list of pages in a particular language.

Siebrand, 19:09, 9 April 2011

This is good, but what I really need is a tool for finding occurrences of a regex (and of individual characters) on those pages. I know that with a list of my language's translations I can do that with pywikipedia (I have used pywikipedia on my Wikipedia for many purposes), but it could be done better and more easily without that.

For example, we have some hidden characters in Persian (ZWNJ, ZWJ, and the directional control characters that are common in RTL languages). If I had a tool for finding misused occurrences of them in translations, I could fix them easily. For example, this regex is needed for my language (and other languages with Arabic script): s/([ازرذدژوة])\u200c/\1/g. Also, finding non-standard characters (for my language) with Special:Search is not possible, because Special:Search only yields occurrences of words, not characters.
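
For instance, here is the same substitution as a minimal Python sketch (the constant and function names are only illustrative):

import re

# ZWNJ (U+200C) has no visual effect after these letters, because they
# never join to the character that follows them.
NON_JOINERS = "ازرذدژوة"

# Python equivalent of the sed expression s/([ازرذدژوة])\u200c/\1/g
ZWNJ_AFTER_NON_JOINER = re.compile("([" + NON_JOINERS + "])\u200c")

def strip_misused_zwnj(text):
    """Remove a ZWNJ that directly follows a non-joining letter."""
    return ZWNJ_AFTER_NON_JOINER.sub(r"\1", text)

assert strip_misused_zwnj("از\u200cآن") == "ازآن"          # misused ZWNJ removed
assert strip_misused_zwnj("می\u200cشود") == "می\u200cشود"  # legitimate ZWNJ kept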

ebraminio (talk), 19:57, 9 April 2011

Running a regex over the million-plus pages on translatewiki.net would block our database for too long, so we really cannot make that functionality available. If you provide the regexes you are looking for, we will try to find a way to help you out.

Siebrand, 23:49, 9 April 2011

Okay, thanks for your great work on translatewiki.net. It seems I must use pywikipedia. If you have time, please make a list of the Persian translations for me. An SQL or XML dump of this wiki (separated per language), or a replica on the Toolserver, would be other possible solutions, I think.

Maybe it would be better to move this thread to Portal:Fa. Thanks again!

ebraminio (talk), 08:37, 11 April 2011

Feel free to move the thread. I will get back with a reply. As for the data dump, I have to ask Niklas.

Siebrand, 09:28, 11 April 2011

Okay, thanks :)

ebraminio (talk), 11:14, 11 April 2011

I'm running a backup script now. We'll have to inspect what this results in. If it's OK, we can schedule a weekly dump.

php ./dumpBackup.php --current --report=1000 --skip-header --skip-footer --conf=../LocalSettings.php --output=bzip2:$HOME/dump.bzip2

Siebrand, 12:44, 13 April 2011

Backup is ready. It took about 20 minutes to create. Now checking that there's nothing in there we wouldn't want to distribute (I think there are no issues, but better safe than sorry). I will now also prepare a list of '/fa' pages with their namespaces.

Siebrand, 13:29, 13 April 2011

See https://translatewiki.net/static/temp:

  • fa.txt contains all pages ending in "/fa"
  • translatewiki.net-dump-2011-04-13.xml.bz2 contains an XML dump of translatewiki.net
Siebrand, 11:31, 15 April 2011

Thank you. I don't need fa.txt, because I will use "LIKE" on the titles (and it is not really needed for me anyway). There is just one small problem: I cannot download huge files at regular intervals, because that would use up my Internet traffic allowance, which is very limited.

Could you split the dumps per language, if you end up with a program for publishing these dumps in the future too?
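
If it helps, here is a minimal Python sketch of the kind of per-language splitting I mean, assuming the standard MediaWiki XML export format (the file names and the schema version in the namespace are only illustrative):

import bz2
import xml.etree.ElementTree as ET

# Tag prefix of the MediaWiki XML export format; the schema version
# (0.5 here) may differ from dump to dump.
NS = "{http://www.mediawiki.org/xml/export-0.5/}"

def split_language(dump_path, lang, out_path):
    """Stream the dump and copy every <page> whose title ends in '/<lang>'.

    The output is a bare sequence of <page> elements, not a complete dump.
    The SQL analogue on the titles would be:
        SELECT page_title FROM page WHERE page_title LIKE '%/fa'
    """
    with bz2.open(dump_path, "rb") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for _, elem in ET.iterparse(src):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title", default="")
                if title.endswith("/" + lang):
                    out.write(ET.tostring(elem, encoding="unicode"))
                elem.clear()  # discard processed pages to keep memory low

split_language("translatewiki.net-dump-2011-04-13.xml.bz2", "fa", "fa-pages.xml")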

ebraminio (talk), 19:30, 15 April 2011

Split what/how? It's "only" a 77MB download.

Siebrand, 21:17, 15 April 2011

Splitting the dump for each language. Such a tool would be useful, if you want to provide one.

Sadly, yes, it is "only" 77 MB, but that is big for me :(

ebraminio (talk), 22:22, 15 April 2011

Download it patiently then!

Huji, 15:04, 17 April 2011

Okay, I will solve my problem myself :) Thanks for all your great help.

ebraminio (talk), 17:27, 17 April 2011