GitLab and PostgreSQL 11.7 fail to start after a power outage

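The server at home suddenly lost power. Afterwards GitLab would not start and PostgreSQL had crashed, which took down Metabase, Confluence, and the other services that depend on it. On top of that, the Docker containers are set to restart automatically, so they kept crash-looping and drove the CPU temperature up.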

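At first I blamed the recent upgrade to VMware 16 or the Docker 19.03.13 update, so I rolled back to the backup taken before the upgrade. The problem persisted, so I had blamed them unfairly.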

Detailed logs

GitLab fails to start with an error about "/var/opt/gitlab/postgresql/.s.PGSQL.5432":

---- Begin output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:53:in `block (3 levels) in '
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR:
---- End output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
Ran "bash" "/tmp/chef-script20200921-27-19aio3b" returned 1

Chef Infra Client failed. 9 resources updated in 23 seconds
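The rake error only says that nothing is listening on the Unix socket, i.e. the embedded PostgreSQL never came up; the real cause has to be looked up in the PostgreSQL log. A minimal sketch for checking the socket, assuming Python 3 is available wherever you run it (the path is the one from the message above):

```python
import os
import stat

# Socket path taken from the GitLab error message above
GITLAB_PG_SOCKET = "/var/opt/gitlab/postgresql/.s.PGSQL.5432"


def socket_is_present(path: str = GITLAB_PG_SOCKET) -> bool:
    """Return True only if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    if socket_is_present():
        print("socket found: PostgreSQL appears to be running")
    else:
        print("no socket: PostgreSQL is not running, read its log next")
```

A missing socket points at PostgreSQL itself rather than at the Rails side of GitLab.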

Then check the PostgreSQL log.

Solution

Log into the container as the gitlab-psql user (root will not work: pg_resetwal refuses to run as root).

Run: pg_resetwal -f /var/opt/gitlab/postgresql/data
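pg_resetwal is a last resort: -f throws away the damaged WAL, so recently committed transactions can be lost, and the PostgreSQL documentation advises copying the data directory beforehand and dumping and restoring the data right after the server comes back. The sketch below wraps the command above in docker exec. The container name gitlab is a hypothetical placeholder, and it assumes pg_resetwal is on the container's PATH (otherwise use its full path inside the container). By default it builds a -n (dry-run) invocation, which only prints the values pg_resetwal would write, so you can review it before the real -f run.

```python
from __future__ import annotations

import subprocess

# Hypothetical container name: replace with the name shown by `docker ps`
CONTAINER = "gitlab"
# Data directory used in the post
DATA_DIR = "/var/opt/gitlab/postgresql/data"


def pg_resetwal_argv(container: str = CONTAINER, data_dir: str = DATA_DIR,
                     dry_run: bool = True) -> list[str]:
    """Build a `docker exec` command running pg_resetwal as gitlab-psql.

    dry_run=True uses -n, which only prints what would be reset;
    dry_run=False uses -f, the forced reset from the post.
    """
    mode = "-n" if dry_run else "-f"
    # -u gitlab-psql: pg_resetwal refuses to run as root
    return ["docker", "exec", "-u", "gitlab-psql", container,
            "pg_resetwal", mode, data_dir]


def run(argv: list[str]) -> int:
    """Execute the command (needs Docker) and return its exit status."""
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Print both commands for review instead of running them blindly
    print(" ".join(pg_resetwal_argv(dry_run=True)))
    print(" ".join(pg_resetwal_argv(dry_run=False)))
```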

GitLab then started successfully.


PostgreSQL 12.3

After the power outage, it reported a different error:

replication checkpoint has wrong magic 0 instead of 307747550
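The "instead of" number in this message is PostgreSQL's replication-origin magic constant, 307747550 (0x1257DADE), which is stored as the first 4 bytes of pg_logical/replorigin_checkpoint; "wrong magic 0" most likely means the start of the file was zero-filled because the write never reached disk. A sketch to confirm that before touching the file, assuming it is run on the same CPU architecture that wrote it (PostgreSQL writes the value in native byte order):

```python
from __future__ import annotations

import struct
import sys

# PostgreSQL's REPLICATION_STATE_MAGIC: 0x1257DADE == 307747550,
# the "instead of" value in the error message above
REPLICATION_STATE_MAGIC = 0x1257DADE


def read_magic(path: str) -> int | None:
    """Return the file's first 4 bytes as a native-endian uint32,
    or None if the file is shorter than 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        return None
    return struct.unpack("=I", head)[0]


def checkpoint_is_valid(path: str) -> bool:
    """True if the file starts with the expected magic number."""
    return read_magic(path) == REPLICATION_STATE_MAGIC


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_magic.py <data dir>/pg_logical/replorigin_checkpoint
    magic = read_magic(sys.argv[1])
    print(f"magic={magic} expected={REPLICATION_STATE_MAGIC}")
```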

Solution:

Log in as the database user and remove $PGSQL/pg_logical/replorigin_checkpoint, where $PGSQL is the PostgreSQL data directory.
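As I read PostgreSQL's startup code, a missing replorigin_checkpoint is treated as "no replication origins", so removing it should only lose logical-replication origin progress. The sketch below is a cautious variant of this step: it moves the file out of the data directory instead of deleting it. data_dir and backup_dir are parameters you supply, and the server must be stopped while it runs.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def move_replorigin_checkpoint(data_dir: str, backup_dir: str) -> Path | None:
    """Move <data_dir>/pg_logical/replorigin_checkpoint into backup_dir.

    Run as the database OS user while PostgreSQL is stopped.
    Returns the new path, or None if the file does not exist.
    """
    src = Path(data_dir) / "pg_logical" / "replorigin_checkpoint"
    if not src.exists():
        return None
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # shutil.move also works when backup_dir is on another filesystem
    shutil.move(str(src), str(dest))
    return dest
```

Keeping the old file around costs nothing and lets you inspect it later.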

PostgreSQL index problem

2020-09-22 02:52:40.632 UTC [347] ERROR: index "job_id_idx" contains unexpected zero page at block 37779

Confluence reported this error.
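A zero page most likely means an index block was never fully written before the power cut. When the log is long, a small parser helps list every index that hit this error. This is a sketch matching the message format shown above; it accepts straight or curly quotes, since logs copied into a blog or chat often get "smart" quotes.

```python
from __future__ import annotations

import re
import sys

# Matches: index "job_id_idx" contains unexpected zero page at block 37779
ZERO_PAGE_RE = re.compile(
    r'index ["“](?P<index>[^"”]+)["”] contains unexpected zero page '
    r'at block (?P<block>\d+)'
)


def find_zero_page_indexes(log_text: str) -> dict[str, set[int]]:
    """Map each index reported with a zero page to the block numbers seen."""
    found: dict[str, set[int]] = {}
    for m in ZERO_PAGE_RE.finditer(log_text):
        found.setdefault(m.group("index"), set()).add(int(m.group("block")))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_zero_pages.py postgresql.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for index, blocks in sorted(find_zero_page_indexes(f.read()).items()):
            print(index, sorted(blocks))
```

If only a few indexes show up, REINDEX INDEX on each of them is a narrower alternative to rebuilding the whole database.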

Connect to the database with \c confluencedb, then run:

REINDEX DATABASE confluencedb;

After the indexes were rebuilt, Confluence ran normally again.
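The same fix can be run non-interactively with psql. REINDEX DATABASE can only rebuild the database you are currently connected to, which is why the database is switched with \c first; the sketch passes the same name to -d. The postgres user is an assumption (the role must own the database or be a superuser), and reindex_index_argv is a hypothetical helper for the narrower per-index rebuild, taking an unqualified index name.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap in double quotes, double any inside."""
    return '"' + name.replace('"', '""') + '"'


def reindex_database_argv(dbname: str, user: str = "postgres") -> list[str]:
    """psql equivalent of `\\c dbname` followed by REINDEX DATABASE.

    REINDEX DATABASE can only rebuild the current database, so the
    -d target and the statement must name the same database.
    """
    return ["psql", "-U", user, "-d", dbname,
            "-c", f"REINDEX DATABASE {quote_ident(dbname)};"]


def reindex_index_argv(dbname: str, index: str,
                       user: str = "postgres") -> list[str]:
    """Hypothetical helper: rebuild one index, e.g. the one named in the log."""
    return ["psql", "-U", user, "-d", dbname,
            "-c", f"REINDEX INDEX {quote_ident(index)};"]
```

For example, subprocess.run(reindex_database_argv("confluencedb"), check=True) runs the full rebuild. REINDEX takes locks on the indexes it rebuilds, so it is simplest to run it while Confluence is stopped.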