Server move workbook

From translatewiki.net

This is a cleaned-up version of the steps carried out during the server move on 2019-12-20.

Pre switch over action plan

  • Decide on a switch-over date and communicate it to the other sysadmins
  • Announce the move on Support
  • Add a site notice a few days to a week in advance
  • Email hexmode the new IP addresses for the email relay
  • ccp: Reduce the DNS TTL to 900 seconds
  • web1->web2: Copy over the data
 sudo -E rsync -ahAX --info=progress2 /etc/webauth root@web2.translatewiki.net:/etc
 sudo -E rsync -ahAX --info=progress2 --del /etc/letsencrypt root@web2.translatewiki.net:/etc
 sudo -E rsync -ahAX --info=progress2 --del /home /www /srv root@web2.translatewiki.net:/ # 2h20m
 sudo -E rsync -ahAX --info=progress2 --del /resources/caches/translatewiki.net root@web2.translatewiki.net:/resources/caches # 20m
 sudo -E rsync -ahAX --info=progress2 --del /resources/{abi,amir,nike,raymond,siebrand} root@web2.translatewiki.net:/scratch # ~24h
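The TTL is lowered ahead of time because a resolver that fetched a record under the old TTL may keep serving that answer until the old TTL runs out, so a lower TTL only takes full effect once that much time has passed since the change. A minimal sketch of that arithmetic; the function names are hypothetical, and the 7200-second figure is the normal TTL restored in the post-switch steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the epoch time (seconds) at which a TTL was
# lowered and the TTL that was in force before the change, print the epoch
# time after which no resolver should still hold an answer cached under the
# old TTL.
ttl_safe_after() {
    local changed_at="$1" old_ttl="$2"
    echo $(( changed_at + old_ttl ))
}

# Hypothetical helper: print how many seconds remain until that moment
# (0 if it has already passed). The optional third argument overrides "now".
ttl_wait_remaining() {
    local changed_at="$1" old_ttl="$2" now="${3:-$(date +%s)}"
    local remaining=$(( changed_at + old_ttl - now ))
    (( remaining > 0 )) || remaining=0
    echo "$remaining"
}

# With a normal TTL of 7200 s, the reduction to 900 has to happen at least
# 2 h before the switch, and the reduction 900 -> 300 at least 15 min before.
# Earliest safe switch time after lowering it now (GNU date):
#   date -u -d "@$(ttl_safe_after "$(date +%s)" 7200)"
```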
  • web2: Preparation
 cd /resources; for i in /scratch/*; do ln -s "$i" "$(basename "$i")"; done # TODO: could probably be puppetized (or turned into an Ansible playbook) so it is set up automatically for each user
 cd /resources; mkdir projects; chown betawiki:users projects; ln -s /home/betawiki/config/repoconfig.yaml /resources/projects/; sudo -u betawiki /home/betawiki/config/bin/repomulti update # 20m
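If the symlink loop above is run a second time, it does not fail cleanly: ln treats an existing symlink to a directory as the destination directory, so it creates a stray link inside each target (for example /scratch/abi/abi). That matters if the step is puppetized as the comment suggests. A minimal sketch of a re-runnable version, assuming the same /scratch and /resources layout; the function name link_scratch_dirs is hypothetical, and the directories are parameters so it can be tried on a test tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every entry in SRC (default /scratch), create a
# symlink with the same basename in DEST (default /resources). Names that
# already exist in DEST (links, dangling links or real directories) are
# skipped, so the function can safely be run repeatedly.
link_scratch_dirs() {
    local src="${1:-/scratch}" dest="${2:-/resources}" path name
    for path in "$src"/*; do
        [ -e "$path" ] || continue            # glob matched nothing
        name="$(basename "$path")"
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            continue                          # already present, leave it alone
        fi
        ln -s "$path" "$dest/$name"
    done
}

# Usage (as root on web2):
#   link_scratch_dirs /scratch /resources
```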

Pre switch over tests

  • web2: Arm and test keyholder
  • web2: Verify MySQL access
  • web2: Verify that the TTMServer and search index update scripts do not fail
  • ccp: Ensure reverse DNS (rDNS) entries exist for both IPv4 and IPv6 (the IPv6 entry turned out to be missing)
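The rDNS check comes down to a PTR lookup under a special name built from the address: the IPv4 octets reversed under in-addr.arpa, or every hex digit (nibble) of the fully expanded IPv6 address reversed under ip6.arpa. dig -x builds that name itself; the sketch below only makes it explicit, for instance to compare with what is configured at the provider. The function names are hypothetical, and the addresses in the comments are documentation placeholders, not the servers' real ones.

```shell
#!/usr/bin/env bash
# Print the PTR query name for an IPv4 address,
# e.g. 192.0.2.10 -> 10.2.0.192.in-addr.arpa
ptr_name_v4() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$d" "$c" "$b" "$a"
}

# Print the PTR query name for an IPv6 address in nibble format.
# Handles "::" compression; does not handle embedded IPv4 notation or zone IDs.
ptr_name_v6() {
    local addr="$1" nibbles="" out="" group i
    local -a head=() tail=() groups=()
    if [[ "$addr" == *::* ]]; then
        # Split around "::" and fill the gap with zero groups up to 8 in total.
        [ -n "${addr%%::*}" ] && IFS=: read -r -a head <<< "${addr%%::*}"
        [ -n "${addr#*::}" ] && IFS=: read -r -a tail <<< "${addr#*::}"
        groups=("${head[@]}")
        for (( i = ${#head[@]} + ${#tail[@]}; i < 8; i++ )); do
            groups+=(0)
        done
        groups+=("${tail[@]}")
    else
        IFS=: read -r -a groups <<< "$addr"
    fi
    # Pad each group to four hex digits: 2001:db8:... -> 20010db8...
    for group in "${groups[@]}"; do
        nibbles+="$(printf '%04x' "0x$group")"
    done
    # Emit the 32 nibbles in reverse order, dot-separated.
    for (( i = ${#nibbles} - 1; i >= 0; i-- )); do
        out+="${nibbles:i:1}."
    done
    printf '%sip6.arpa\n' "$out"
}

# Usage: the lookup should print the server's host name; empty output means
# the PTR record is missing.
#   dig +short PTR "$(ptr_name_v6 2001:db8::7)"
```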

Switch over action plan

  • ccp: Reduce the DNS TTL to 300 seconds
  • web1->web2: Final sync of the user directories
 sudo -E rsync -ahAX --info=progress2 --del /resources/{abi,amir,nike,raymond,siebrand} root@web2.translatewiki.net:/scratch # slow
  • web1: Disable cron jobs
 sudo nano /etc/cron.d/{awstats,backup,certbot,wikimaintenance,wikistats}
 sudo crontab -u root -e
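Editing each cron file with nano works, but the same change can be scripted so that re-enabling the jobs later is an exact, mechanical reversal. A minimal sketch, assuming the jobs are plain lines in the cron files and GNU sed is available; the function names and the #DISABLED# marker are hypothetical.

```shell
#!/usr/bin/env bash
# Marker prepended to each line that disable_cron comments out, so that
# enable_cron can restore exactly those lines and leave other comments alone.
CRON_MARK='#DISABLED# '

# Comment out every non-blank, non-comment line in the given cron files.
# Environment lines such as MAILTO= are commented out too, which is harmless.
disable_cron() {
    local f
    for f in "$@"; do
        sed -i -e '/^[[:space:]]*$/b' -e '/^[[:space:]]*#/b' \
               -e "s/^/${CRON_MARK}/" "$f"
    done
}

# Undo disable_cron by stripping the marker again.
enable_cron() {
    local f
    for f in "$@"; do
        sed -i -e "s/^${CRON_MARK}//" "$f"
    done
}

# Usage (as root):
#   disable_cron /etc/cron.d/{awstats,backup,certbot,wikimaintenance,wikistats}
# The root crontab can be handled through an exported copy:
#   crontab -u root -l > /root/root.cron
#   disable_cron /root/root.cron
#   crontab -u root /root/root.cron
```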
  • web1: Drain the job queue
 sudo systemctl stop mw-jobrunner
 php /srv/mediawiki/workdir/maintenance/runJobs.php --wait
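With --wait, runJobs.php keeps waiting for new jobs instead of exiting, so it has to be stopped by hand once the queue is empty. One way to see when that point is reached is to poll MediaWiki's showJobs.php maintenance script, which without options prints the total number of queued jobs. A minimal sketch, assuming that output format; the function name and the POLL_SECONDS variable are hypothetical, and the count command is passed in as arguments so the loop can be tried without MediaWiki.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the given command (which must print a job count)
# repeatedly until it prints 0. Returns 1 if the command fails or prints
# something that is not a number. POLL_SECONDS sets the delay (default 30).
wait_for_empty_queue() {
    local count
    while :; do
        count="$("$@")" || return 1
        count="${count//[[:space:]]/}"        # strip spaces and newlines
        case "$count" in
            ''|*[!0-9]*) return 1 ;;          # not a plain number
        esac
        echo "jobs remaining: ${count}"
        [ "$count" = 0 ] && return 0
        sleep "${POLL_SECONDS:-30}"
    done
}

# Usage, in a second terminal while runJobs.php --wait is running:
#   wait_for_empty_queue php /srv/mediawiki/workdir/maintenance/showJobs.php
```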
  • web1: Update the site notice to say the wiki is now read-only
  • web1: Set the old MediaWiki to read-only mode
 b nano /home/betawiki/config/TranslatewikiSettings.php # no need to deploy
  • web1: Export an SQL dump
 mydumper -B translatewiki_net -u root -o dump -e -c # 10m
  • web1->web2: rsync the SQL dump
 sudo -E rsync -ahAX --info=progress2 dump root@web2.translatewiki.net:/root # 1m
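rsync already checks each file it transfers, but a checksum manifest written on web1 once mydumper has finished gives an independent check that the directory myloader reads on web2 holds exactly those files, for example if an earlier, partial copy was left behind. A minimal sketch using GNU coreutils' sha256sum; the function names and the manifest file name are hypothetical, and the manifest is kept outside the dump directory so myloader never sees it.

```shell
#!/usr/bin/env bash
# Print a SHA-256 manifest of all files under a directory, with paths
# relative to it and sorted, so the output is stable between hosts.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Check a directory against a manifest made by make_manifest. Returns
# non-zero if any listed file is missing or differs (extra files are not
# detected).
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}

# Usage:
#   web1: make_manifest dump > dump.sha256   (rsync it along with the dump)
#   web2: verify_manifest /root/dump /root/dump.sha256 && echo "dump intact"
```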
  • web2: Import the SQL dump
 myloader -B translatewiki_net -d dump -u root # 17m
  • web2: Add the GRANTs (the dump only covers the translatewiki_net database, so user privileges are not part of it)
  • web1->web2: rsync logs/stats
 sudo -E rsync -ahAX --info=progress2 --del /home /www /srv root@web2.translatewiki.net:/
 sudo -E rsync -ahAX --info=progress2 --del /resources/caches/translatewiki.net root@web2.translatewiki.net:/resources/caches
 sudo /usr/lib/cgi-bin/awstats.pl -update -config=translatewiki.net
 sudo -E rsync -ahAX --info=progress2 --del /var/lib/awstats root@web2.translatewiki.net:/var/lib
  • web2: restart memcached
 sudo systemctl restart memcached
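While DNS still points at web1, MediaWiki on web2 can be tested by sending requests to web2's address while keeping the real host name, so that virtual-host matching and the certificate copied over from /etc/letsencrypt still apply. curl's --resolve option does this without editing /etc/hosts. A minimal sketch; the function names are hypothetical and the address in the usage comment is a documentation placeholder for web2's IP.

```shell
#!/usr/bin/env bash
# Build a curl --resolve value HOST:PORT:ADDR, putting IPv6 addresses in
# brackets (accepted by reasonably recent curl versions).
resolve_spec() {
    local host="$1" port="$2" addr="$3"
    case "$addr" in
        *:*) addr="[$addr]" ;;
    esac
    printf '%s:%s:%s\n' "$host" "$port" "$addr"
}

# Fetch URL (an https://translatewiki.net/... address) from the server at IP,
# bypassing DNS for translatewiki.net, and print the final HTTP status code
# after following redirects.
check_on_server() {
    local ip="$1" url="$2"
    curl -sS -L -o /dev/null -w '%{http_code}\n' \
        --resolve "$(resolve_spec translatewiki.net 443 "$ip")" "$url"
}

# Usage (should print 200):
#   check_on_server 198.51.100.7 https://translatewiki.net/
```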
  • web2: Test that MediaWiki works
  • ccp: Switch DNS over to web2
  • web2: Disable read-only mode
 b nano /home/betawiki/config/TranslatewikiSettings.php # no need to deploy
  • web2: Update the site notice
  • web2: Update the Elasticsearch configuration to point to localhost instead of es.twn.net
  • web2: Start the search index bootstrap script
 screen
 cd /www/translatewiki.net/docroot/w/extensions/CirrusSearch/maintenance
 php updateSearchIndexConfig.php --startOver 
 php forceSearchIndex.php --skipLinks --indexOnSkip --buildChunks 1000 --batch-size=100 | nice parallel --eta --joblog ~/reindex-1.log --no-notice 
 php forceSearchIndex.php --skipParse --buildChunks 1000 --batch-size=100 | nice parallel --eta --joblog ~/reindex-2.log --no-notice
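Both reindexing commands write a GNU parallel job log (~/reindex-1.log and ~/reindex-2.log). After the header line, each line records one chunk, with the exit status in the seventh tab-separated column, the terminating signal in the eighth and the command in the ninth, so failed chunks can be listed once the run is over. A minimal sketch; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the command of every job in a GNU parallel --joblog file that exited
# non-zero or was killed by a signal. Columns (tab-separated, header on line
# 1): Seq Host Starttime JobRuntime Send Receive Exitval Signal Command.
failed_jobs() {
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}

# Usage: no output means every chunk succeeded.
#   failed_jobs ~/reindex-1.log
# Newer GNU parallel releases can re-run the failed entries from the same
# joblog with --retry-failed; check that the installed version supports it.
```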
  • web2: Start translation memory bootstrap script
 screen nice php /www/translatewiki.net/docroot/w/extensions/Translate/scripts/ttmserver-export.php --threads 4 --reindex
  • web1: Stop IRC bots
  • web2: Start IRC bots
  • web2: Enable cron jobs

Post switch over action plan

  • Update News
  • web2: Re-enable Let's Encrypt (certbot)
  • ccp: Raise the DNS TTL back to normal (7200 seconds)
  • UptimeRobot: Remove the monitor for es.translatewiki.net
  • Clean up the puppet repo (remove es/web1 leftovers): https://gerrit.wikimedia.org/r/#/c/translatewiki/+/559781 (Goodbye old servers)
  • web1: Take backups of everything
  • Shut down the es server
  • Shut down the web1 server
  • Archive this work plan on a wiki page
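To confirm that the TTL is really back at 7200 seconds, the answer section printed by dig shows each record's TTL in the second column. Querying one of the domain's authoritative name servers shows the configured value, while a caching resolver shows the time remaining instead. A minimal sketch; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the TTL (second field) of each record in "dig +noall +answer" output
# read from standard input, skipping comment lines.
answer_ttls() {
    awk '!/^;/ && NF >= 5 { print $2 }'
}

# Succeed only if at least one TTL was read from standard input and every
# TTL equals $1.
all_ttls_equal() {
    local want="$1" ttl seen=0
    while read -r ttl; do
        [ "$ttl" = "$want" ] || return 1
        seen=1
    done
    [ "$seen" -eq 1 ]
}

# Usage:
#   ns="$(dig +short NS translatewiki.net | head -n 1)"
#   dig +noall +answer @"$ns" translatewiki.net A | answer_ttls | all_ttls_equal 7200 && echo "TTL restored"
```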