Active-Active Deployment

Enterprise Feature

The features or components discussed in this document are available in TDengine Enterprise only. TDengine OSS does not include these features or components.

You can deploy TDengine in active-active mode to achieve high availability and reliability with limited resources. Active-active mode is also used in disaster recovery strategies to maintain offsite replicas of the database.

In active-active mode, you create two separate TDengine deployments, one acting as the primary node and the other as the secondary node. Data is replicated in real time between the primary and secondary nodes via TDengine's built-in data subscription component. Note that each node in an active-active deployment can be a single TDengine instance or a cluster.

In the event that the primary node cannot provide service, the client driver fails over to the secondary node. This failover is automatic and transparent to the business layer.

Replicated data is specially marked to avoid infinite loops. The architecture of an active-active deployment is described in the following figure.

Figure 1. TDengine in active-active mode

Limitations

The following limitations apply to active-active deployments:

You cannot use the data subscription APIs when active-active mode is enabled.
You cannot use the parameter binding interface while active-active mode is enabled.
The primary and secondary nodes must be identical. Database names, all configuration parameters, usernames, passwords, and permission settings must be exactly the same.
You can connect to an active-active deployment only through the Java client library in WebSocket mode.
Do not use the USE <database> statement to set a context. Instead, specify the database in the connection parameters.
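
As a minimal sketch of the last limitation, the database can be named in the JDBC URL path so that no USE statement is needed. The class name `ActiveActiveUrl`, the host `td1`, and the database `opc` are illustrative assumptions; the URL shape follows the WebSocket URL used in the client configuration example in this document.

```java
/** Hypothetical helper: builds a WebSocket JDBC URL with the database in the path. */
final class ActiveActiveUrl {
    static String build(String host, int port, String database, String user, String password) {
        // The database goes between the port and the query string,
        // so the connection starts in that database without a USE statement.
        return "jdbc:TAOS-RS://" + host + ":" + port + "/" + database
                + "?user=" + user + "&password=" + password;
    }
}
```

For example, `ActiveActiveUrl.build("td1", 6041, "opc", "root", "taosdata")` yields `jdbc:TAOS-RS://td1:6041/opc?user=root&password=taosdata`.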

Cluster Configuration

It is not necessary to configure your cluster specifically for active-active mode. However, the WAL retention period affects the fault tolerance of an active-active deployment: if the secondary node is unreachable for longer than the configured WAL retention period, data loss occurs. Data lost in this manner can only be recovered manually.
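
The rule above amounts to a comparison between the length of an outage and the WAL retention period. The following sketch only illustrates that comparison; the class and method names are hypothetical, and the durations you pass in are your own values, not settings read from TDengine.

```java
import java.time.Duration;

/** Hypothetical helper illustrating the WAL retention rule described above. */
final class WalRetentionCheck {
    /**
     * Returns true when the outage does not exceed the WAL retention period,
     * in which case replication should be able to catch up without manual recovery.
     */
    static boolean recoverableAutomatically(Duration outage, Duration walRetention) {
        return outage.compareTo(walRetention) <= 0;
    }
}
```

With a 24-hour retention period, a 2-hour outage passes this check and a 30-hour outage does not.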

Enable Active-Active Mode

Create two identical TDengine deployments. For more information, see Get Started.
Ensure that the taosd and taosx services are running on both deployments.
On the deployment that you have designated as the primary node, run the following command to start the replication service:
```
taosx replica start -f <source-endpoint> -t <sink-endpoint> [database]
```
- The source endpoint is the FQDN of TDengine on the primary node.
- The sink endpoint is the FQDN of TDengine on the secondary node.
- You can use the native connection (port 6030) or WebSocket connection (port 6041).
- You can specify one or more databases to replicate only the data contained in those databases. If you do not specify a database, all databases on the node are replicated except for information_schema, performance_schema, log, and audit.
- New databases on both sides are detected periodically and added to replication. You can set the detection interval with the optional --new-database-checking-interval <SECONDS> argument.
- To disable detection of new databases, include the --no-new-databases argument.
When the command is successful, the replica ID is displayed. You can use this ID to add other databases to the replication task if necessary.
Run the same command on the secondary node, specifying the FQDN of TDengine on the secondary node as the source endpoint and the FQDN of TDengine on the primary node as the sink endpoint.
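
The two commands in this procedure are mirror images of each other. The sketch below assembles both argument lists from the documented syntax. The class name is hypothetical, and the `host:port` endpoint form (`td1:6030`, `td2:6030`) is an assumption borrowed from the sample status output, not a confirmed endpoint format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the argument list for "taosx replica start". */
final class ReplicaStartCommands {
    /** Produces: taosx replica start -f <source> -t <sink> [database] */
    static List<String> start(String source, String sink, String database) {
        List<String> cmd = new ArrayList<>(
                List.of("taosx", "replica", "start", "-f", source, "-t", sink));
        // The database is optional; when omitted, all user databases are replicated.
        if (database != null && !database.isEmpty()) {
            cmd.add(database);
        }
        return cmd;
    }
}
```

Calling `start("td1:6030", "td2:6030", "opc")` gives the command for the primary node, and swapping the first two arguments gives the command for the secondary node. The resulting list can be passed to `ProcessBuilder` on the corresponding machine.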

Client Configuration

Active-active mode is supported in the Java client library in WebSocket connection mode. The following is an example configuration:

String url = "jdbc:TAOS-RS://" + host + ":6041/?user=root&password=taosdata";
Properties properties = new Properties();
properties.setProperty(TSDBDriver.PROPERTY_KEY_BATCH_LOAD, "true");
properties.setProperty(TSDBDriver.PROPERTY_KEY_SLAVE_CLUSTER_HOST, "192.168.1.11");
properties.setProperty(TSDBDriver.PROPERTY_KEY_SLAVE_CLUSTER_PORT, "6041");
properties.setProperty(TSDBDriver.PROPERTY_KEY_ENABLE_AUTO_RECONNECT, "true");
properties.setProperty(TSDBDriver.PROPERTY_KEY_RECONNECT_INTERVAL_MS, "2000");
properties.setProperty(TSDBDriver.PROPERTY_KEY_RECONNECT_RETRY_COUNT, "3");
Connection connection = DriverManager.getConnection(url, properties);

These parameters are described as follows:

Property Name	Meaning
PROPERTY_KEY_SLAVE_CLUSTER_HOST	Enter the hostname or IP address of the secondary node.
PROPERTY_KEY_SLAVE_CLUSTER_PORT	Enter the port number of the secondary node.
PROPERTY_KEY_ENABLE_AUTO_RECONNECT	Specify whether to enable automatic reconnection. For active-active mode, set the value of this parameter to true.
PROPERTY_KEY_RECONNECT_INTERVAL_MS	Enter the interval in milliseconds at which reconnection is attempted. The default value is 2000. You can enter 0 to attempt to reconnect immediately. There is no maximum limit.
PROPERTY_KEY_RECONNECT_RETRY_COUNT	Enter the maximum number of retries per node. The default value is 3. There is no maximum limit.
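
To make the interaction between the retry count and the reconnection interval concrete, the following is a simplified model of a per-node retry loop. It is not the client library's implementation: the class name, the `connect` predicate, and the choice to count every attempt against the retry limit are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of "retry each node N times, waiting between attempts". */
final class FailoverSketch {
    /**
     * Tries each node in order, up to retryCount attempts per node, sleeping
     * intervalMs between failed attempts. Returns the first node for which
     * connect succeeds, or null if every attempt fails or the thread is interrupted.
     */
    static String firstReachable(List<String> nodes, int retryCount, long intervalMs,
                                 Predicate<String> connect) {
        for (String node : nodes) {
            for (int attempt = 1; attempt <= retryCount; attempt++) {
                if (connect.test(node)) {
                    return node;
                }
                if (intervalMs > 0) {
                    try {
                        Thread.sleep(intervalMs);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        return null;
                    }
                }
            }
        }
        return null;
    }
}
```

In this model, with the primary node listed first, a retry count of 3, and an unreachable primary, the secondary node is selected on the fourth connection attempt overall.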

Command Reference

You can manage your active-active deployment with the following commands:

Use an existing replica ID to add databases to an existing replication task:
```
taosx replica start -i <id> [database...]
```
Note:
- This command cannot create duplicate tasks. It only adds the specified databases to the specified task.
- The replica ID is globally unique within a taosX instance and is independent of the source/sink combination.

Check the status of a task:

taosx replica status [id...]

This command returns the list and status of active-active synchronization tasks created on the current machine. You can specify one or more replica IDs to obtain their task lists and status. An example output is as follows:

+---------+------+----------+----------+----------+-------------+--------------+
| replica | task | source   | sink     | database | status      | note         |
+---------+------+----------+----------+----------+-------------+--------------+
| a       | 2    | td1:6030 | td2:6030 | opc      | running     |              |
| a       | 3    | td2:6030 | td2:6030 | test     | interrupted | Error reason |
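
If you capture this output in a monitoring script, a small parser can flag tasks that are not running. The sketch below assumes the seven-column, pipe-delimited layout of the sample above and treats any status other than `running` as needing attention; the class name is hypothetical, and other status values or layouts are not covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the sample "taosx replica status" table layout. */
final class ReplicaStatusParser {
    /** Returns "replica/database" for every data row whose status is not "running". */
    static List<String> notRunning(String output) {
        List<String> result = new ArrayList<>();
        for (String line : output.split("\n")) {
            // A data row splits into: empty cell, 7 columns, empty cell.
            String[] cells = line.split("\\|", -1);
            if (cells.length != 9) {
                continue; // border lines and unrelated text
            }
            String replica = cells[1].trim();
            if (replica.equals("replica")) {
                continue; // header row
            }
            String status = cells[6].trim();
            if (!status.equals("running")) {
                result.add(replica + "/" + cells[5].trim());
            }
        }
        return result;
    }
}
```

Applied to the sample output, this returns a single entry, `a/test`, for the interrupted task.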

Stop a replication task:
```
taosx replica stop [id [db...]]
```
If you specify a database, replication for that database is stopped. If you do not specify a database, all replication tasks on the ID are stopped. If you do not specify an ID, all replication tasks on the instance are stopped.

Include the --no-new-databases argument if you do not want to stop the check for new databases.
Restart a replication task:
```
taosx replica restart [id [db...]]
```
If you specify a database, replication for that database is restarted. If you do not specify a database, all replication tasks on the ID are restarted. If you do not specify an ID, all replication tasks on the instance are restarted.
Update the interval at which new databases are checked:
```
taosx replica update <id> --new-database-checking-interval <SECONDS>
```
This command updates only the checking interval for new databases.
Check the progress of a replication task:
```
taosx replica diff [id [db...]]
```

This command outputs the difference between the subscribed offset in the current active-active replication task and the latest WAL (not representing row counts), for example:

+---------+----------+----------+----------+-----------+---------+---------+------+
| replica | database | source   | sink     | vgroup_id | current | latest  | diff |
+---------+----------+----------+----------+-----------+---------+---------+------+
| a       | opc      | td1:6030 | td2:6030 | 2         | 17600   | 17600   | 0    |
| ad      | opc      | td2:6030 | td2:6030 | 3         | 17600   | 17600   | 0    |
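
A script can use this output to decide whether replication has caught up. The sketch below assumes the eight-column layout of the sample above and recomputes the lag as `latest - current` for each row instead of relying on the `diff` column; the class name is hypothetical.

```java
/** Hypothetical check over the sample "taosx replica diff" table layout. */
final class ReplicaDiffCheck {
    /** Returns true when every data row has current equal to latest. */
    static boolean caughtUp(String output) {
        for (String line : output.split("\n")) {
            // A data row splits into: empty cell, 8 columns, empty cell.
            String[] cells = line.split("\\|", -1);
            if (cells.length != 10 || cells[1].trim().equals("replica")) {
                continue; // border lines, header row, unrelated text
            }
            long current = Long.parseLong(cells[6].trim());
            long latest = Long.parseLong(cells[7].trim());
            if (latest - current != 0) {
                return false;
            }
        }
        return true;
    }
}
```

For the sample output, both rows have `current` equal to `latest`, so the check reports that replication has caught up. Because the offsets do not represent row counts, a nonzero lag indicates only that replication is behind, not by how many rows.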

Delete a replication task:
```
taosx replica remove [id] [--force]
```
This command deletes all stopped replication tasks on the specified ID. If you do not specify an ID, all stopped replication tasks on the instance are deleted. You can include the --force argument to delete all tasks without stopping them first.
