Request for Administration

nice

Please explain your problem. I can provide a list of pages in a particular language.

This is good, but what I really need is a tool for finding occurrences of a regex (and of specific characters) on those pages. I know that, with a list of my language's translations, I can do that with pywikipedia (I have used pywikipedia on my Wikipedia for many purposes). But this could be done better and more easily without it.

For example, Persian has some hidden characters (ZWNJ, ZWJ, and the direction control characters that are common in RTL languages). If I had a tool for finding misused occurrences of them in translations, I could easily fix them. For example, this regex is needed for my language (and for some other languages written in Arabic script): s/([ازرذدژوة])\u200c/\1/g. Also, finding non-standard characters (for my language) with Special:Search is not possible, because Special:Search only finds occurrences of words, not of individual characters.
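A minimal Python sketch of that check, assuming the translation text is already available as a string. The character class is copied from the regex above; the function names are made up for illustration and are not part of pywikipedia or MediaWiki.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# Same pattern as the s/([ازرذدژوة])\u200c/\1/g regex above:
# one of the listed letters immediately followed by a ZWNJ.
REDUNDANT_ZWNJ = re.compile("([ازرذدژوة])" + ZWNJ)


def find_redundant_zwnj(text):
    """Return the offset of every ZWNJ that directly follows a listed letter."""
    return [match.end() - 1 for match in REDUNDANT_ZWNJ.finditer(text)]


def strip_redundant_zwnj(text):
    """Remove those ZWNJs and leave every other ZWNJ untouched."""
    return REDUNDANT_ZWNJ.sub(r"\1", text)
```

Calling find_redundant_zwnj first gives a report to review by hand; strip_redundant_zwnj applies the same substitution as the s///g form.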

ebraminio^talk‎

Running regexes over more than a million pages on translatewiki.net would block our database for too long. We really cannot make that functionality available. If you provide the regexes you are looking for, we will try to find a way to help you out.

Siebrand‎

Okay, thanks for your great work on translatewiki.net. It seems I must use Pywikipedia. If you have time, please make a list of Persian translations for me. An SQL or XML dump of this wiki (separated by language), or a replica on the Toolserver, would be other solutions, I think.

Maybe it would be better to move this thread to Portal:Fa. Thanks again!

ebraminio^talk‎

Feel free to move the thread. Will get back with a reply. Data dump: have to ask Niklas.

Siebrand‎

Okay, Thanks :)

ebraminio^talk‎

I'm running a backup script now. We'll have to inspect what this results in. If it's OK, we can schedule a weekly dump.

php ./dumpBackup.php --current --report=1000 --skip-header --skip-footer --conf=../LocalSettings.php --output=bzip2:$HOME/dump.bzip2

Siebrand‎

Backup is ready. Took about 20 minutes to create. Now checking that there's nothing in there we wouldn't want to distribute (I think that there are no issues, but better safe than sorry). Will now also prepare a list of '/fa' pages with namespace.

Siebrand‎

See https://translatewiki.net/static/temp:

fa.txt contains all pages ending "/fa"
translatewiki.net-dump-2011-04-13.xml.bz2 contains an XML dump of translatewiki.net
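A rough sketch of how such a dump could be searched offline with a regex, so the database is never touched. It assumes the file is a well-formed MediaWiki XML export (a single root element containing page elements, each with a title and one text element); the function names are hypothetical.

```python
import bz2
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream, suffix="/fa"):
    """Yield (title, text) for every page whose title ends with suffix."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None and title.endswith(suffix):
                yield title, text
            title, text = None, ""
            elem.clear()  # free the memory held by the finished page


def find_matches(stream, pattern, suffix="/fa"):
    """Yield (title, offset) for every regex match in the selected pages."""
    regex = re.compile(pattern)
    for title, text in iter_pages(stream, suffix):
        for match in regex.finditer(text):
            yield title, match.start()


if __name__ == "__main__":
    # File name taken from the listing above; adjust the path as needed.
    with bz2.open("translatewiki.net-dump-2011-04-13.xml.bz2", "rb") as dump:
        for title, offset in find_matches(dump, "([ازرذدژوة])\u200c"):
            print(title, offset)
```

Because the file is read as a stream, the 77MB archive never has to be unpacked to disk, and filtering on the "/fa" suffix makes fa.txt unnecessary for this purpose.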

Siebrand‎

Thank you. I don't need fa.txt, because I will use "LIKE" on titles (and I don't really need that either). There is just one small problem: I cannot download huge files regularly, because my Internet traffic allowance is very limited.

Can you split these dumps, if you plan to keep publishing them in the future?

ebraminio^talk‎

Split what/how? It's "only" a 77MB download.

Siebrand‎

Splitting the dump by language. This tool may be useful if you want to do that.

Sadly, yes, it is "only" 77 MB, but that is big for me :(
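As an illustration of what splitting by language could mean here, a small Python sketch that groups pages by the code after the last slash in the title (as in the "/fa" pages above). It assumes the pages have already been read as (title, text) pairs; both function names are hypothetical, and titles that contain a slash for other reasons would need extra handling.

```python
from collections import defaultdict


def language_code(title):
    """Return the part after the last '/', or None if the title has no slash."""
    _base, sep, code = title.rpartition("/")
    return code if sep and code else None


def split_by_language(pages):
    """Group (title, text) pairs into one list per language code."""
    buckets = defaultdict(list)
    for title, text in pages:
        buckets[language_code(title)].append((title, text))
    return dict(buckets)
```

Each bucket could then be written to its own compressed file, so a translator only downloads the part for one language.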

ebraminio^talk‎

Download it patiently then!

Huji‎

Okay, I will solve my problem myself :) Thanks for all your great help.

ebraminio^talk‎