Thread:Support/MediaWiki:Captcha-sendemail/reply

It is true that spam bots can now easily solve simple problems like sums (even when the answer must be spelled out in words, since they also detect the language of the question). On heavily visited sites, these simple questions repeat so often that most of them can be answered with a small dictionary: there simply are not enough possible answers to these questions, generally no more than a couple of dozen.
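To illustrate how small that dictionary can be, here is a quick sketch; the question format and the 1–20 number range are assumptions, not a description of any particular captcha:

```python
# Enumerate every possible answer to captcha questions of the form
# "what is a + b?" with a, b drawn from 1..20 (a hypothetical range).
answers = {a + b for a in range(1, 21) for b in range(1, 21)}

# Only 39 distinct answers (2 through 40): a trivially small dictionary
# for a bot to try, even before it parses the question at all.
print(len(answers))  # → 39
```

A bot that blindly replies with the most frequent sums in that range succeeds far too often for such questions to be a real barrier.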

Captchas that require reading a graphic and typing a string about 6 characters long (much like passwords, except that captcha characters are more random than typical passwords, so they are harder to guess) force bots to give many false answers. Those bots (notably the ones running on infected home PCs behind open anonymizing proxies) then become much easier to detect and block before they cause significant damage (except for DoS attacks, which no captcha can prevent).
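For a sense of scale, the keyspace of such a graphic captcha dwarfs any dictionary of question answers. A small sketch, where the 32-symbol alphabet is an assumption (roughly letters and digits with ambiguous glyphs like 0/O removed):

```python
# A 6-character captcha over a 32-symbol alphabet (an assumed size).
alphabet_size = 32
length = 6

keyspace = alphabet_size ** length
print(keyspace)  # → 1073741824, i.e. about a billion possible strings

# So a single blind guess succeeds with probability ~1 in a billion,
# versus ~1 in a few dozen for a simple arithmetic question.
print(1 / keyspace)
```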

So let's suppose the captcha tolerates at most 3 errors from an IP before locking it out for about 2 minutes, then escalates that delay exponentially up to a limit of 24 hours: it will allow only about 200 errors per 24 hours from a single IP. If there are about 100,000 possible responses, the chance of getting a correct response at least once in 24 hours by replying randomly is about one in 500, which means you will have detected and eliminated more than 99.8% of the bots.
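The lockout policy described above could be sketched like this (hypothetical parameter and function names; the 3-error allowance, 2-minute base delay, and 24-hour cap follow the figures in the text):

```python
# Per-IP lockout: 3 free errors, then an exponentially growing delay
# starting at 2 minutes and capped at 24 hours (1440 minutes).
FREE_ERRORS = 3
BASE_DELAY_MIN = 2
MAX_DELAY_MIN = 24 * 60

def lockout_minutes(error_count: int) -> int:
    """Delay imposed after `error_count` wrong captcha answers from one IP."""
    if error_count <= FREE_ERRORS:
        return 0
    exceeded = error_count - FREE_ERRORS
    return min(BASE_DELAY_MIN * 2 ** (exceeded - 1), MAX_DELAY_MIN)

# With roughly 200 guesses per day against 100,000 possible answers,
# the chance a bot answers correctly at least once in 24 hours is:
p_hit = 1 - (1 - 1 / 100_000) ** 200
print(p_hit)  # ≈ 0.002, i.e. about one bot in 500 slips through
```

The exponential escalation is what keeps the daily attempt budget small: after only a dozen lockouts the delay already hits the 24-hour cap, so an IP cannot usefully brute-force the answer space.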

Now consider how many source IPs spambots use: most spambot networks currently cannot scale beyond about 10,000 PCs worldwide. And if one ever does, there is a very high chance it will hit a honeypot owned by a security-suite vendor, which can then alert large websites (like Wikipedia) that subscribe to its alerts, advise them on how best to detect and filter the bots, or publish the list of IPs the bots control. (Note: I did not compute the actual statistics; this is just a generic exposition to explain the principles.)

So this is a question of time and of the scalability of spambots, versus the time we currently have on Wikimedia sites to react to such a situation. A 24-hour reaction time seems quite effective because we have enough admins working around the clock all around the world, collaborating in a complex private mesh through heterogeneous (and constantly changing) communication channels that spambot authors cannot all monitor.

In summary, a captcha system works, but NOT ALONE: it works with the help of system admins and their subscriptions to various Internet security monitoring groups and vendors. It will not work magically, however, for a wiki left for long periods without any admin, or relying only on automated security updates. A good captcha system in fact helps fight zero-day attacks, before the botnets grow large enough to be detectable; and if botnets are kept very small, it will take them considerable time to gain access and post spam to our wikis, and their volume will be small enough to be easily reverted without much manual cleanup work for admins.