Migrate data from Elasticsearch to OpenSearch 2

In this How-to you will learn how to migrate data from your existing Elasticsearch service instance to a new OpenSearch 2 service instance.

The recommended strategy to migrate the full dataset from an Elasticsearch service instance to an OpenSearch 2 service instance is to use the reindex Data Operation.

To follow this How-to, you need the following tools installed:

  • curl – Command line tool and library for transferring data with URL syntax
  • jq – Command-line JSON processor
  • cf CLI – Cloud Foundry command line interface

Before starting the migration, ensure the following steps are completed:

  • Isolate the source service instance so no new data is written.
  • Trigger a manual backup of the source service instance via the Service Dashboard and wait until it finishes.
  • Generate service credentials for the source service instance if not already available.
  • Check database consistency of the source instance.
  • Order a new OpenSearch 2 destination instance and create a credentials key.

The migration is performed using the reindex operation. Reindexing must be done one index at a time.

For each index, the following steps must be executed on the destination OpenSearch 2 service instance.

To check database consistency, run the following commands against the service instance:

Terminal window
# refresh the index so that recently written documents are included in the counts
curl -k -u <service-instance-username>:<service-instance-password> "https://<service-instance-host>:<service-instance-port>/<target_index>/_refresh"
# list all indices together with their document counts
curl -k -u <service-instance-username>:<service-instance-password> "https://<service-instance-host>:<service-instance-port>/_cat/indices?v"

When comparing the source and destination instances, confirm that:

  • Both service instances contain the same indices created by your application
  • Each index has the same docs.count
  • Sample documents are equivalent (for example using a specific timestamp)
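
The docs.count comparison can also be scripted. The following is a minimal sketch, not a definitive implementation: the function names and the variables SRC_URL, SRC_AUTH, DST_URL and DST_AUTH are hypothetical, and it assumes that the `h` and `s` query parameters of the cat indices API are available on both instances.

```shell
#!/bin/sh
# Sketch only: SRC_URL, DST_URL, SRC_AUTH and DST_AUTH are hypothetical
# variables, e.g. SRC_URL="https://<source-host>:<source-port>" and
# SRC_AUTH="<source-username>:<source-password>".

# Keep "index docs.count" pairs of user indices; drop system indices,
# whose names start with a dot.
filter_user_indices() {
  awk '$1 !~ /^\./ { print $1, $2 }'
}

# Fetch index names and document counts, sorted by index name.
# $1 = base URL, $2 = user:password
list_counts() {
  curl -sk -u "$2" "$1/_cat/indices?h=index,docs.count&s=index" \
    | filter_user_indices
}

# Compare two listings; prints a diff and returns non-zero if they differ.
compare_counts() {
  src_file=$(mktemp) || return 2
  dst_file=$(mktemp) || return 2
  printf '%s\n' "$1" > "$src_file"
  printf '%s\n' "$2" > "$dst_file"
  diff "$src_file" "$dst_file"
  status=$?
  rm -f "$src_file" "$dst_file"
  return "$status"
}

# Example (run after setting the variables above):
# compare_counts "$(list_counts "$SRC_URL" "$SRC_AUTH")" \
#                "$(list_counts "$DST_URL" "$DST_AUTH")" && echo "counts match"
```

Matching counts do not prove that documents are identical, so the sample-document check above remains worthwhile.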

On the source Elasticsearch instance, collect the following information about your indices, settings, mappings, templates, and aliases:

Terminal window
# list all indices
curl -k -u <source-username>:<source-password> \
"https://<source-host>:<source-port>/_cat/indices?v"
Terminal window
# show the settings of an index
curl -k -u <source-username>:<source-password> \
-X GET "https://<source-host>:<source-port>/<index_name>/_settings?pretty"
Terminal window
# show the mapping of an index
curl -k -u <source-username>:<source-password> \
-X GET "https://<source-host>:<source-port>/<index_name>/_mapping?pretty"
Terminal window
# show all index templates
curl -k -u <source-username>:<source-password> \
-X GET "https://<source-host>:<source-port>/_template?pretty"
Terminal window
# show all aliases
curl -k -u <source-username>:<source-password> \
-X GET "https://<source-host>:<source-port>/_alias?pretty"
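
If the source instance holds many indices, the settings and mappings can be saved to files in one pass. This is a minimal sketch under stated assumptions: `export_index_metadata` and the output file naming are hypothetical, indices whose names start with a dot are treated as system indices and skipped, and index names are assumed not to need URL encoding.

```shell
#!/bin/sh
# Sketch only: saves settings and mappings of every user index to files.
# $1 = source base URL, e.g. "https://<source-host>:<source-port>"
# $2 = credentials as "<source-username>:<source-password>"
# $3 = output directory (hypothetical layout: <index>.settings.json etc.)
export_index_metadata() {
  url="$1"; auth="$2"; out_dir="$3"
  mkdir -p "$out_dir" || return 1
  # h=index limits the cat output to the index name column
  curl -sk -u "$auth" "$url/_cat/indices?h=index" \
    | grep -v '^\.' \
    | while read -r index; do
        curl -sk -u "$auth" "$url/$index/_settings" \
          > "$out_dir/$index.settings.json"
        curl -sk -u "$auth" "$url/$index/_mapping" \
          > "$out_dir/$index.mapping.json"
      done
}

# Example:
# export_index_metadata "https://<source-host>:<source-port>" \
#   "<source-username>:<source-password>" ./source-metadata
```

The saved files can then serve as a reference when you create the indices on the destination instance.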

Destination instance actions (OpenSearch 2)

Terminal window
# create the target index; the settings and mappings shown here are examples
curl -k -u <destination-username>:<destination-password> \
-X PUT https://<destination-host>:<destination-port>/<target_index> \
-H 'Content-Type: application/json' \
-d '{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      }
    }
  }
}'

OpenSearch documentation: Create Index
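
Instead of typing the request body by hand, it can be derived from the settings and mapping responses of the source instance with jq. The sketch below is an assumption-laden illustration: `build_create_body` is a hypothetical helper, it expects the response shape `{"<index>": {"settings": ...}}` and `{"<index>": {"mappings": ...}}`, and it removes only the index-private settings uuid, version, creation_date and provided_name. Other settings may still need manual adjustment, and mappings that contain mapping types (older Elasticsearch versions) must be converted first.

```shell
#!/bin/sh
# Sketch only: builds a create-index body from two saved responses.
# $1 = file with the output of GET /<index_name>/_settings
# $2 = file with the output of GET /<index_name>/_mapping
build_create_body() {
  jq -s '
    (.[0] | to_entries[0].value.settings
          | del(.index.uuid, .index.version,
                .index.creation_date, .index.provided_name)) as $settings
    | (.[1] | to_entries[0].value.mappings) as $mappings
    | { settings: $settings, mappings: $mappings }
  ' "$1" "$2"
}

# Example (file names are hypothetical):
# build_create_body logs.settings.json logs.mapping.json > logs.create.json
# curl -k -u <destination-username>:<destination-password> \
#   -X PUT https://<destination-host>:<destination-port>/<target_index> \
#   -H 'Content-Type: application/json' -d @logs.create.json
```

Review the generated body before sending it, since it is created mechanically from the source instance.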

Terminal window
# create an index template; the pattern, alias, settings and mappings shown here are examples
curl -k -u <destination-username>:<destination-password> \
-X PUT https://<destination-host>:<destination-port>/_index_template/<template_name> \
-H 'Content-Type: application/json' \
-d '{
  "index_patterns": ["logs-2020-01-*"],
  "template": {
    "aliases": {
      "my_logs": {}
    },
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "value": {
          "type": "double"
        }
      }
    }
  }
}'

OpenSearch documentation: Create a template
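
The source templates are read from the legacy `_template` endpoint, while the example above writes to `_index_template`, which nests settings, mappings and aliases under a `template` key. The following jq sketch shows one possible conversion. `convert_legacy_template` is a hypothetical helper, and mapping the legacy `order` field to `priority` is an assumption that you should verify, because legacy templates are merged whereas only the highest-priority index template is applied.

```shell
#!/bin/sh
# Sketch only: converts one legacy template into the _index_template format.
# stdin = output of GET /_template (or GET /_template/<template_name>)
# $1    = name of the template to convert
convert_legacy_template() {
  jq --arg name "$1" '
    .[$name]
    | { index_patterns: .index_patterns,
        priority: (.order // 0),
        template: { settings: (.settings // {}),
                    mappings: (.mappings // {}),
                    aliases:  (.aliases  // {}) } }'
}

# Example:
# curl -sk -u <source-username>:<source-password> \
#   "https://<source-host>:<source-port>/_template/<template_name>" \
#   | convert_legacy_template <template_name> > template.json
```

The resulting file can be sent as the body of the PUT request shown above.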

Terminal window
# disable refresh and replicas on the target index while reindexing
curl -k -u <destination-username>:<destination-password> \
-X PUT https://<destination-host>:<destination-port>/<target_index>/_settings \
-H 'Content-Type: application/json' \
-d '{
  "index": {
    "refresh_interval": -1,
    "number_of_replicas": 0
  }
}'

OpenSearch documentation: Update settings

The reindex operation requires the internal hostname and port of your source Elasticsearch service instance.

This information can be retrieved using the CF CLI (Cloud Foundry command line interface).

  1. Log in via the CF CLI
    Cloud Foundry Runtime is not required in this case, as STACKIT Data Services are located in technical organizations.

  2. Choose your technical organization
    Technical organization names always start with the prefix stackit_portal and contain your STACKIT project name.

    Terminal window
    # syntax
    cf target -o <technical_organization_name>
    # example
    cf target -o stackit_portal_prod_my_project_h4GU6Tew
  3. List your Data Services
    Identify the Elasticsearch source service instance.

    Terminal window
    cf services
  4. Identify the internal Elasticsearch service instance
    You should find two service instances with the same name:

    • one regular instance
    • one instance with the suffix -internal

    The -internal instance is required for the reindex operation.

  5. Create a service key for the internal instance

    Terminal window
    # syntax
    cf create-service-key <my_internal_instance_name> <my_key_name>
    # example
    cf create-service-key my_elasticsearch-internal mykey
  6. Retrieve internal credentials
    This command returns the internal hostname, port, username, and password.

    Terminal window
    # syntax
    cf service-key <my_internal_instance_name> <my_key_name>
    # example
    cf service-key my_elasticsearch-internal mykey
  7. Start the reindex operation

    Terminal window
    curl -k -u <destination-opensearch-service-instance-username>:<destination-opensearch-service-instance-password> \
    -X POST "https://<destination-opensearch-service-instance-host>:<destination-opensearch-service-instance-port>/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d '{
      "source": {
        "remote": {
          "host": "https://<internal-source-elasticsearch-service-instance-host>:<internal-source-elasticsearch-service-instance-port>",
          "username": "<internal-source-elasticsearch-service-instance-username>",
          "password": "<internal-source-elasticsearch-service-instance-password>",
          "socket_timeout": "<socket-timeout>"
        },
        "index": "<target_index>",
        "size": <number-of-documents-to-reindex-by-batch>
      },
      "dest": {
        "index": "<target_index>"
      }
    }'

OpenSearch documentation: reindex data
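
Because reindexing runs one index at a time, it can help to generate the request body and capture the returned task ID in a script. The sketch below is one possible approach, not the only one: `build_reindex_body` and `start_reindex` are hypothetical helpers, and it assumes that a request with wait_for_completion=false responds with a JSON object containing a `task` field. Building the body with jq keeps passwords with special characters correctly escaped.

```shell
#!/bin/sh
# Sketch only: builds the reindex body and starts the task.
# $1 = internal source URL, $2 = internal username, $3 = internal password,
# $4 = index name, $5 = batch size, $6 = socket timeout (e.g. "1m")
build_reindex_body() {
  jq -n --arg host "$1" --arg user "$2" --arg pass "$3" \
        --arg index "$4" --argjson size "$5" --arg timeout "$6" '
    { source: { remote: { host: $host, username: $user,
                          password: $pass, socket_timeout: $timeout },
                index: $index, size: $size },
      dest: { index: $index } }'
}

# $1 = destination URL, $2 = destination "user:password", $3 = request body
# Prints the task ID returned by the destination instance.
start_reindex() {
  curl -sk -u "$2" -X POST "$1/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' -d "$3" \
    | jq -r '.task'
}

# Example:
# body=$(build_reindex_body "https://<internal-host>:<internal-port>" \
#          "<internal-username>" "<internal-password>" "<target_index>" 1000 "1m")
# task_id=$(start_reindex "https://<destination-host>:<destination-port>" \
#          "<destination-username>:<destination-password>" "$body")
```

Keep the printed task ID, since it is needed to check the status of the reindex task.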

Terminal window
# check the status of the reindex task, using the task ID returned by the reindex request
curl -k -u <destination-username>:<destination-password> \
-X GET "https://<destination-host>:<destination-port>/_tasks/<task-id>?pretty"

OpenSearch documentation: Tasks
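
For long-running tasks, the status request can be repeated until the task reports completion. This is a minimal polling sketch under stated assumptions: `task_summary` and `wait_for_task` are hypothetical helpers, and the response is assumed to contain a top-level `completed` flag and the counters `task.status.created` and `task.status.total`.

```shell
#!/bin/sh
# Sketch only: polls a reindex task until it has completed.

# stdin = Tasks API response; prints "<completed> <created>/<total>"
task_summary() {
  jq -r '"\(.completed) \(.task.status.created // 0)/\(.task.status.total // 0)"'
}

# $1 = destination URL, $2 = "user:password", $3 = task ID,
# $4 = seconds between polls (optional, defaults to 10)
wait_for_task() {
  url="$1"; auth="$2"; task_id="$3"; interval="${4:-10}"
  while :; do
    summary=$(curl -sk -u "$auth" "$url/_tasks/$task_id" | task_summary) \
      || return 1
    echo "$summary"
    case "$summary" in
      "true "*) return 0 ;;
      "null "*) echo "unexpected response for task $task_id" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Example:
# wait_for_task "https://<destination-host>:<destination-port>" \
#   "<destination-username>:<destination-password>" "<task-id>" 30
```

A completed task can still contain failures, so inspect the full task response once the loop ends.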

  • Repeat database consistency checks on the destination instance
  • Optionally verify settings and mappings

After reindexing is complete, restore your desired settings:

Terminal window
curl -k -u <destination-username>:<destination-password> \
-X PUT https://<destination-host>:<destination-port>/<target_index>/_settings \
-H 'Content-Type: application/json' \
-d '{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}'