Syncing PostgreSQL Data to Elasticsearch in Real Time

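PostgreSQL is a relational database similar to MySQL. It supports full-text search, but handling Chinese requires an additional extension. Since both my development and deployment environments run through Docker Compose, I prefer Elasticsearch, which is friendlier to containers.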

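Elasticsearch is also a database, but unlike PostgreSQL or MongoDB it is built specifically for searching text: it tokenizes content automatically and lets you weight fields. Elasticsearch also exposes its data through a REST interface, so for a client, working with ES feels like calling a microservice API, which is very convenient.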
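
To make the "like calling a microservice API" point concrete, here is a minimal sketch that builds a weighted full-text search request using only the Python standard library. The host `http://localhost:9200` is an assumption, and the index `post` and the fields `title` and `markdown` are borrowed from the schema.json example later in this post; `build_search_request` is a hypothetical helper name.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_search_request(base_url: str, index: str, text: str) -> Request:
    """Build (but do not send) a full-text search request for Elasticsearch."""
    query = {
        "query": {
            "multi_match": {
                "query": text,
                # "^2" weights matches in the title above matches in the body.
                "fields": ["title^2", "markdown"],
            }
        }
    }
    return Request(
        url=f"{base_url}/{quote(index)}/_search",
        data=json.dumps(query, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed local address; adjust to wherever Elasticsearch is reachable.
    req = build_search_request("http://localhost:9200", "post", "docker")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; the response is plain JSON, so no dedicated client library is strictly needed.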

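In my use case, to keep maintenance costs down, I decided to sync PostgreSQL data to Elasticsearch in real time. This turns Elasticsearch into a read-only database, so its requirements for durable, reliable persistence can be relaxed somewhat, which makes it easier to containerize. It also means that when the system grows later, I do not have to worry about the two stores drifting out of sync because of transactions or distributed-system issues.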

There are two ways to sync PostgreSQL to Elasticsearch:

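Write the sync logic yourself in the web backend. This is a dependable approach that works even without deep knowledge of either database. The drawbacks are the extra programming effort and code that is hard to reuse.
Use PGSync, a middleware layer built specifically for syncing data to Elasticsearch.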
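
For comparison, the first option could look roughly like the sketch below: after the backend writes rows to PostgreSQL, it renders them as a body for the Elasticsearch `_bulk` endpoint and posts that. This is only an illustration of the hand-written approach, not what I ended up using; `to_bulk_body` is a hypothetical helper, and it assumes every row carries an `id` column.

```python
import json
from typing import Iterable, Mapping


def to_bulk_body(index: str, rows: Iterable[Mapping[str, object]]) -> str:
    """Render rows as an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines = []
    for row in rows:
        # Action line: index (create or overwrite) the document under the row's id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(dict(row), ensure_ascii=False))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [{"id": 1, "title": "Hello", "markdown": "# Hello"}]
    print(to_bulk_body("post", rows), end="")
```

The resulting string would be sent as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header. Updates, deletes, and failure handling would all need their own code, which is the reuse cost mentioned above.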

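The description in the PGSync repository (take care not to confuse it with another open-source project of the same name) shows that it exists to solve exactly this problem, so I added PGSync to my container setup. The configuration follows the official documentation and requires PostgreSQL, Elasticsearch, and Redis. It consists of three files, shown in order below: the Dockerfile for the PGSync image, the service definition in the Compose file, and the run.sh script the container runs: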

FROM python:3.7

WORKDIR /app

RUN pip install pgsync

  pgsync:
    build:
      context: ./pgsync
    command: ./run.sh
    restart: on-failure:3
    sysctls:
      - net.ipv4.tcp_keepalive_time=200
      - net.ipv4.tcp_keepalive_intvl=200
      - net.ipv4.tcp_keepalive_probes=5
    labels:
      org.label-schema.name: "pgsync"
      org.label-schema.description: "Postgres to elasticsearch sync"
      com.label-schema.service-type: "daemon"

    depends_on:
      - redis
      - elasticsearch
    environment:
      - PG_USER=xxxx
      - PG_HOST=postgres
      - PG_PASSWORD=xxxx
      - ELASTICSEARCH_HOST=elasticsearch
      - REDIS_HOST=redis
      - LOG_LEVEL=INFO
      - POLL_TIMEOUT=10 #100s
      - REDIS_POLL_INTERVAL=1 #10s
    volumes:
      - ./pgsync:/app

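#!/usr/bin/env bash

# Wait (up to 60 seconds each) for the dependencies to accept connections.
./wait-for-it.sh "$PG_HOST:5432" -t 60
./wait-for-it.sh "$ELASTICSEARCH_HOST:9200" -t 60
./wait-for-it.sh "$REDIS_HOST:6379" -t 60

# One-time setup for the schema, then keep syncing in the background.
bootstrap --config schema.json
pgsync --config schema.json --daemon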

Writing a schema.json is all it takes to map data from PG to ES. Here is an example schema.json:

[
  {
    "database": "example",
    "index": "post",
    "nodes": {
      "table": "post",
      "columns": ["id", "title", "markdown"],
      "transform": {
        "mapping": {
          "markdown": {
            "type": "text",
            "analyzer": "ik_max_word"
          },
          "id": {
            "type": "long"
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    }
  }
]
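
One easy mistake in a file like this is mapping a field that is not listed in "columns". The sketch below is a small sanity check for that; it is a hypothetical helper of my own, not part of PGSync, and it only inspects the root node of each entry under the assumption that the file has the same shape as the example above.

```python
import json


def find_unknown_mapped_fields(schema_text: str) -> dict:
    """Return {index: [fields]} for mapped fields that are missing from "columns"."""
    problems = {}
    for entry in json.loads(schema_text):
        node = entry["nodes"]
        columns = set(node.get("columns", []))
        mapped = set(node.get("transform", {}).get("mapping", {}))
        unknown = sorted(mapped - columns)
        if unknown:
            problems[entry["index"]] = unknown
    return problems


if __name__ == "__main__":
    with open("schema.json", encoding="utf-8") as f:
        # An empty dict means every mapped field is also a selected column.
        print(find_unknown_mapped_fields(f.read()))
```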

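If you need search with Chinese word segmentation, install the IK plugin for Elasticsearch and add "analyzer": "ik_max_word" to each text field in the mapping, as the example above does for title and markdown.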

This is the Dockerfile for an Elasticsearch container with the IK plugin installed:

FROM elasticsearch:7.16.2

RUN bin/elasticsearch-plugin install --batch https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.16.2/elasticsearch-analysis-ik-7.16.2.zip
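
Once that image is running, a quick way to confirm the plugin loaded is to ask Elasticsearch's `_analyze` endpoint to tokenize a Chinese string with `ik_max_word`. The sketch below builds such a request and extracts the tokens from a response; the host `http://localhost:9200` and the sample text are assumptions, and both function names are hypothetical.

```python
import json
from urllib.request import Request


def build_analyze_request(base_url: str, text: str, analyzer: str = "ik_max_word") -> Request:
    """Build (but do not send) a request asking Elasticsearch to tokenize `text`."""
    payload = {"analyzer": analyzer, "text": text}
    return Request(
        url=f"{base_url}/_analyze",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response_json: dict) -> list:
    """Pull the token strings out of an _analyze response body."""
    return [item["token"] for item in response_json.get("tokens", [])]


if __name__ == "__main__":
    req = build_analyze_request("http://localhost:9200", "中华人民共和国")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

If the plugin is missing, Elasticsearch should reject the request because it does not know the analyzer, which makes this a cheap smoke test before running PGSync.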

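If the database never changes, PGSync only needs to run once. To keep the data in sync in real time, PGSync has to stay online, which is why run.sh starts it with --daemon.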

