Caching
TDengine includes caching as a built-in component. This includes write caching, read caching, metadata caching, and file system caching.
Write Cache
TDengine uses a time-driven cache management strategy that prioritizes caching the most recently ingested data. When the size of the data stored in cache reaches a preset threshold, the earliest data in cache is written to disk.
You can optimize database performance for your use case by specifying the number of vgroups in the database and the size of the write cache allocated to each vnode.
For example, the following SQL statement creates a database with 10 vgroups, with each vnode having a 256 MB write cache.
CREATE DATABASE power VGROUPS 10 BUFFER 256 CACHEMODEL 'none' PAGES 128 PAGESIZE 16;
Generally, a larger cache size results in improved performance. However, there exists a certain point at which further increasing the cache size has no significant effect on performance.
Read Cache
You can configure TDengine databases to cache the most recent data of each subtable, allowing for faster queries. To do so, you specify a cache model for your database by setting the the CACHEMODEL
parameter to one of the following values:
none
: The read cache is disabled.last_row
: The most recent row of data from each subtable is cached. TheLAST_ROW()
function will then retrieve this data from cache.last_value
: The most recent non-null value for each column of each subtable is cached. TheLAST()
function will then retrieve this data from cache.both
: The most recent row of each subtable and the most recent non-null value of each column of each subtable are cached. This simultaneously activates the behavior of both thelast_row
andlast_value
cache models.
You can also configure the memory size for each vnode by specifying a value for the CACHESIZE
parameter. This parameter can be set from 1 MB to 65536 MB. The default value is 1 MB.
Metadata Cache
Each vnode caches metadata that it has previously accessed. The size of this metadata cache is determined by the PAGES
and PAGESIZE
parameters of the database. For example, the following SQL statement creates a database whose vnodes have a metadata cache of 128 pages with each page being 16 KB:
CREATE DATABASE power PAGES 128 PAGESIZE 16;
File System Cache
For reliability purposes, TDengine records changes in a write-ahead log (WAL) file before any data is written to the data storage layer. The fsync
function is then called to write the data from the WAL to disk. You can control when the fsync
function is called for a database by specifying the WAL_LEVEL
and WAL_FSYNC_PERIOD
parameters.
-
WAL_LEVEL
:- Specify
1
to wait for the operating system to callfsync
. In this configuration, TDengine does not callfsync
itself. - Specify
2
for TDengine to callfsync
at a certain interval, specified by the WAL_FSYNC_PERIOD parameter.
The default value is
1
. - Specify
-
WAL_FSYNC_PERIOD
:- Specify
0
to callfsync
every time data is written to the WAL. - Specify a value between
1
and180000
milliseconds to callfsync
each time this interval has elapsed.
Note that this parameter takes effect only when
WAL_LEVEL
is set to2
. - Specify
The following SQL statement creates a database in which data in the WAL is written to disk every 3000 milliseconds:
CREATE DATABASE power WAL_LEVEL 2 WAL_FSYNC_PERIOD 3000;
The default configuration of WAL_VALUE 1
delivers the highest performance. In use cases where data reliability is a higher priority than performance, you can set WAL_LEVEL
to 2
.
Example: Enhancing Query Performance with Read Caching
This example demonstrates the performance improvements delivered by read caching. The sample data from Data Querying is used in this section. This data is generated by the following command:
taosBenchmark -d power -Q --start-timestamp=1600000000000 --tables=10000 --records=10000 --time-step=10000 -y
Note that read caching is disabled by default on the sample database generated by taosBenchmark.
-
To establish a performance baseline, run the following SQL statements:
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.353815s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.344070s)These return the most recent non-null value and the most recent row from any subtable in the
meters
supertable. It can be seen that these queries return in 353 and 344 milliseconds, respectively. -
Enable read caching on the database:
taos> ALTER DATABASE power CACHEMODEL 'both';
Query OK, 0 row(s) affected (0.046092s)
taos> SHOW CREATE DATABASE power\G;
*************************** 1.row ***************************
Database: power
Create Database: CREATE DATABASE `power` BUFFER 256 CACHESIZE 1 CACHEMODEL 'both' COMP 2 DURATION 14400m WAL_FSYNC_PERIOD 3000 MAXROWS 4096 MINROWS 100 STT_TRIGGER 2 KEEP 5256000m,5256000m,5256000m PAGES 256 PAGESIZE 4 PRECISION 'ms' REPLICA 1 WAL_LEVEL 1 VGROUPS 10 SINGLE_STABLE 0 TABLE_PREFIX 0 TABLE_SUFFIX 0 TSDB_PAGESIZE 4 WAL_RETENTION_PERIOD 3600 WAL_RETENTION_SIZE 0 KEEP_TIME_OFFSET 0
Query OK, 1 row(s) in set (0.000282s) -
Run the two queries from Step 1 again:
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.044021s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.046682s)
It can be seen that these queries now return in 44 and 47 milliseconds, respectively. This indicates that read caching on this system produces an approximately 8-fold improvement in query performance.