Installing ClockworkDB
On Linux, you can install ClockworkDB using the provided RPM or DEB packages. These packages install the binaries, libraries, and configuration files needed to get you up and running with ClockworkDB quickly.
On Red Hat-based distributions (e.g., CentOS, RHEL, Fedora):
- ctt-clockworkdb-support-<version>.rpm
- ctt-clockworkdb-<version>.rpm
- ctt-clockworkdb-utilities-<version>.rpm
On Debian-based distributions (e.g., Ubuntu, Debian):
- ctt-clockworkdb-support-<version>.deb
- ctt-clockworkdb-<version>.deb
- ctt-clockworkdb-utilities-<version>.deb
Configuring ClockworkDB
On Linux, when installing from the RPM or DEB packages, the configuration files are placed in the /opt/ctt/etc/clockworkdb/ directory. They come preconfigured with basic settings for a WarpDrive+ repository named “warp1” that should work for most users. Follow the steps below to configure your environment and run a quick test that lists the available repositories.
Quick Start
Source the cdb-env.sh file to set up environment variables for ClockworkDB configuration. You should add this to your shell profile or rc file (.zshrc, .bashrc, .bash_profile, etc.):

source /opt/ctt/etc/clockworkdb/cdb-env.sh

Test that the configuration is set up correctly by running a ClockworkDB tool, such as the command line interface (CLI) tool cdb-repositories, which lists the repositories defined in the configuration files.
Output:
#> cdb-repositories

[ClockworkDB ASCII-art banner] v1.0.48

[11:32:52] [cdb-repositories] cdb-repositories Version: 0.3.3
[11:32:52] [cdb-repositories] Repository           Description                                        Module Provider
[11:32:52] [cdb-repositories] -------------------- -------------------------------------------------- ---------------
[11:32:52] [cdb-repositories] warp1                WarpDrive+ Repository(2k/16k)                      mod_warpdrive
[11:32:52] [cdb-repositories] -------------------- -------------------------------------------------- ---------------
Creating and Managing Configuration Files
You can manage the configuration files for ClockworkDB using any text editor. The configuration files are XML files that define settings and options for the ClockworkDB environment, repositories, and the module providers that supply persistence logic for the various backends.
To generate a new backend module provider configuration file, use the cdb-provider-config command line tool:
Usage: cdb-provider-config <module_provider_name>
#> cdb-provider-config mod_warpdrive
There are a number of configuration files used by the various tools, utilities, and programs that interact with ClockworkDB. These are XML files, plus one DTD file that defines ENTITYs, which are essentially variables used in the XML configuration files.
- cdb-env.sh file.
This is a shell script that sets environment variables to point to the ClockworkDB configuration directory, TOM_HOME, and adjusts the PATH and LD_LIBRARY_PATH environment variables to include the directories for tools and libraries. You should source this file before running any of the tools, utilities, or programs which interact with ClockworkDB.
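Because every ClockworkDB tool depends on these variables, a quick pre-flight check can save some head-scratching. The sketch below is illustrative, not part of the distribution; it assumes only the variable names mentioned above (TOM_HOME, PATH, LD_LIBRARY_PATH):

```python
import os

# Variables cdb-env.sh is expected to set or extend (per the description above).
REQUIRED_VARS = ("TOM_HOME", "PATH", "LD_LIBRARY_PATH")

def missing_cdb_vars(env=None):
    """Return the names of required environment variables that are not set."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_cdb_vars()
    if missing:
        print("Did you source cdb-env.sh? Missing:", ", ".join(missing))
    else:
        print("ClockworkDB environment looks sane.")
```

Running this before the tools in the Quick Start can confirm that cdb-env.sh was sourced in the current shell.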
- tom-environment.dtd file.
This file defines ENTITYs, which are essentially variables that can be used in the XML configuration files. This allows for easier management of configuration values and reduces redundancy across the XML files. Entities defined in this file can be referenced in the XML configuration files using the syntax &entity_name;.
A sample tom-environment.dtd file:
<!ENTITY support_email "support@curatedtimetech.com">
<!ENTITY install_dir "/opt/ctt/usr/">
<!ENTITY config_dir "/opt/ctt/etc/clockworkdb/">
<!ENTITY default_repo "warp1">
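If you script against these configuration files, you may want to read the entity definitions programmatically. The sketch below is a minimal illustration, not a full DTD parser: it handles only simple one-line <!ENTITY name "value"> declarations like those in the sample above:

```python
import re

# Matches simple one-line entity declarations: <!ENTITY name "value">
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')

def read_entities(dtd_text):
    """Map entity names to their replacement text."""
    return dict(ENTITY_RE.findall(dtd_text))

sample = '''
<!ENTITY support_email "support@curatedtimetech.com">
<!ENTITY install_dir "/opt/ctt/usr/">
<!ENTITY config_dir "/opt/ctt/etc/clockworkdb/">
<!ENTITY default_repo "warp1">
'''

entities = read_entities(sample)
print(entities["default_repo"])  # -> warp1
```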
- tom-environment.xml file.
This file defines elements for logging, license, and clockworkdb environment configuration. When creating a configuration file for a new repository, you can include it via the <xi:include> directive. See the <xi:include href="&config_dir;/cdb1.xml"/> in the tom-environment.xml file for an example.
A sample tom-environment.xml file:
<?xml version="1.0"?>
<!--
  [ClockworkDB ASCII-art banner] v1.0.48

  ClockworkDB is a high performance agnostic API for time series and vector data.
  Copyright (C) 2024 Curated Time Tech, Inc.
-->
<!DOCTYPE tom:environment SYSTEM "tom-environment.dtd">
<tom:environment xmlns:tom="http://www.tomsolutions.com/tom/environment/1.0"
                 xmlns:xi="http://www.w3.org/2001/XInclude">
  <!--
    level="trace|info|warn|error"
    type="daily|rolling"
    file="</path/to/log>" or <filename> created in the current working directory
    format=[see fmt c++ library documentation]
  -->
  <logger level="info"
          format="[%Y-%m-%d %H:%M:%S.%F%z] [%E] [%n] [Pid: %P] [Tid: %t] [%l] [%v]"
          file="/tmp/clockworkdb.log"
          logtype="daily"/>

  <!-- license file checked by ClockworkDB tools, utilities, and programs -->
  <license file="clockworkdb.lic" path="&config_dir;/"/>

  <tsdb default-repository="&default_repo;" module-directory="&install_dir;/lib/tom-tsdb">
    <!-- Include the back end modules we are using -->
    <xi:include href="&config_dir;/cdb1.xml"/>
  </tsdb>
</tom:environment>
- cdb1.xml file.
This file is included in the tom-environment.xml file and defines the configuration for a specific repository, in this case, a WarpDrive+ powered repository named “warp1”. This file includes options for cache size, partitioning, transactions, environment settings, locking, datastore configuration, database home directory, directories for data and logs, and replication settings.
You can find more information about the various configuration options in the Module Provider Options section of the documentation.
A sample cdb1.xml file:
<?xml version="1.0"?>
<!--
  Template generated from mod_warpdrive [version: 1.0.10]
  Generated on: Tue Feb 17 12:43:10 2026
  Module Path: /opt/ctt/usr/lib/tom-tsdb
-->
<!--
  [WarpDrive+ ASCII-art banner]

  WarpDrive+ is a high performance embedded database for time series and vector data.
  Copyright (C) 2024 Curated Time Tech, Inc.
-->
<!--
  Definition of a WarpDrive+ repository
    name=[unique:string]    value used to identify a discrete repo in engine.get_session('name')
    module="mod_warpdrive"  The module(.so|.dll) that manages the WarpDrive+ environment.
    description="<any>"     Anything that describes this repository. Market data, Fed Data...
-->
<repository name="warp1" module="mod_warpdrive" description="WarpDrive+ Data (2k/16k)">
  <options>
    <!--
      This specifies the amount of memory for caching in the db environment. This is shared
      by all databases within the environment, but not across separate db environments.
      NOTE: This should be a power of 2! Sizes < 500M are rounded up 25% for overhead.
        max-mmap-size: This is the maximum part of the cache to mem-map into the process space
        [CONTENT]: The total size of cache memory for the environment subsystems
                   (lock/write-ahead-log/transactions)
    -->
    <cache-size max-mmap-size-gigs="1" cache-segments="1" max-cache-gigs="1" init-cache-gigs="1"/>

    <!--
      Use datastore partitioning
        use=[0|1]                       1 turns on partitioning
        num-partitions=[1-64]           Number of partitions to use
        hash-strategy=[substr,0,4]      The hashing strategy to use. substr,0,4 means take
                                        the first 4 characters of the key
        partition-dirs=[dir1,dir2,...]  The directories to use for the partitions
    -->
    <partitioning use="0" num-partitions="6" hash-strategy="substr,0,4"
                  partition-dirs="data1,data2,data3,data4,data5,data6"/>

    <!--
      For threaded performance, it is often beneficial to break environments into
      sub-environments so the locking subsystems have less contention.
        enable=[0|1]                 turn on(1) or off(0)
        slice-count="<N>"            the number of cpu cores is a good choice
        slice-on-dimension=[0|1]     split data using the entire object name or the 1st
                                     dimension (part before the first '.')
        cache-size="<n>Gig.<bytes>"  the size of the cache for each slice of environment
        cache-regions="<n>"          break the cache into N contiguous regions
    -->
    <slices enable="0" slice-count="10" slice-on-dimension="1" cache-size="0.536870912" cache-regions="1"/>

    <!--
      Will the environment be transactionally protected
        use: Turn on transactions
        write-ahead-logging: Use write-ahead logging to guarantee durability w/in ACID semantics
        timeout: The amount of time to wait for a transaction to complete or fail, in microseconds
    -->
    <transactions use="1" write-ahead-logging="1" timeout="10000"/>

    <!--
      Some options to control the workings of the environment and dbs
        direct-db: turn off double buffering in the file-system
        multiversion: enable multiversion support in db
        no-mmap: don't memory map db pages into user/process space w/ mmap
        db-region-init: initialize data structures, preloading into memory for env
        auto-commit: automatically wrap all db operations in a transaction
        txn-nosync: don't write or sync txn log entries on transaction commits
        txn-write-nosync: write, but don't sync log entries on transaction commits
        auto-recover=[0|1] Clean up environment support files. This will be ignored if
                           replication is in use and this is not the master
        mlock-files: mlock mmap'd databases and environment files into memory
        use-sysv-ipc: instead of memory mapping files for cache, use System V Interprocess Comms
        sysv-ipc-key: unique id for System V IPC memory
    -->
    <environment direct-db="0" multiversion="0" no-mmap="0" db-region-init="1"
                 auto-commit="0" txn-nosync="1" txn-write-nosync="1" auto-recover="0"
                 mlock-files="1" use-sysv-ipc="0" sysv-ipc-key="55"/>

    <!--
      Configure locking. This is important for both heavily threaded applications and
      many and/or massive multi-process shared environments. If you find frequent lock
      failures, try increasing timeout (in microseconds) or increase the
      locks/lockers/objects across the board.
    -->
    <locking max-lockers="1200" max-locks="1200" max-objects="1200" timeout="10000"/>

    <!--
      Timeseries are stored in chunks on datastore pages
        chunk-size=[n]  n should be a power of 2. A datastore, once created, cannot
                        change chunk-size.
        page-size=[n]   n should be a power of 2. A datastore locks pages, so pages should
                        be large enough to hold several chunks, but not so big that lock
                        contention becomes an issue for multi-threaded programs.
      NOTE: On most modern day hardware the hardware page size most widely used is
      4096 (4K) bytes. It would be wise to pick chunk size and page size with that in mind.
      If this baffles you, go find someone who understands hardware ;)
    -->
    <datastore chunk-size="2048" page-size="16384" compress="0" compress-algo="lz4" compress-level="12"/>

    <!--
      This specifies the root directory for database files. dbs should be opened relative
      to this path.
        mode: POSIX file permission flag when creating new databases
        paths-relative-to: any database created, deleted, or opened will live under the
                           specified directory in content
        [CONTENT]: The path where the database environment files live
    -->
    <db-home mode="666" paths-relative-to="1">/opt/ctt/usr/data/warp1</db-home>

    <!--
      Where the various environment files live
        data-dir: the sub-directory of the db-home where database files live
        shared-memory-region-dir: locking/transaction/cache files live here
        log-dir: write-ahead log files live here to support Durability
      NOTE: ACID semantics supported. This is a standard requirement of enterprise database
      systems. ACID: Atomicity, Consistency, Isolation, Durability. Google it if you need.
      WarpDrive+ supports both ACI and full ACID semantics.
      WarpDrive+ also supports multi-version databases that maintain multiple views of the
      same data depending on the isolation of the applicable transaction unit.
    -->
    <directories data-dir="data" shared-memory-region-dir="regions" log-dir="logs"/>

    <!--
      Configure replication:
        use=[0|1]             1 turns on replication
        master=[0|1]          Should almost always be 0. 1 is reserved for the master in
                              the replication group
        priority=[<a number>] Higher numbers get priority to be master if the master falls over
        port=[0-64k]          Port to listen on
        verbose=[0|1]         Turns on replication debugging
        auto-init=[0|1]       Whether or not to re-init out of date replicas
        bulk-transfers=[0|1]  Accumulate changes in a buffer before doing network transfers
        ack-policy:           Strategy for considering how many/what type of ACKs come in
                              for a given update
    -->
    <replication use="0" master="1" priority="150" port="3500" verbose="1"
                 auto-init="1" bulk-transfers="1" ack-policy="quorum">
      <master host="localhost" port="3500"/>
      <peers>
        <peer host="localhost" port="3501"/>
        <peer host="localhost" port="3502"/>
        <peer host="localhost" port="3503"/>
      </peers>
    </replication>
  </options>
</repository>
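The datastore comments above insist that chunk-size and page-size be powers of two and that a page hold several chunks. Before deploying a hand-edited file, a small sanity check can enforce those rules. This sketch uses only Python's standard library and encodes just the constraints quoted above; it is illustrative, not a ClockworkDB tool:

```python
import xml.etree.ElementTree as ET

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def check_datastore(repo_xml):
    """Validate the chunk-size/page-size constraints described in the config comments."""
    ds = ET.fromstring(repo_xml).find(".//datastore")
    chunk = int(ds.get("chunk-size"))
    page = int(ds.get("page-size"))
    problems = []
    if not is_power_of_two(chunk):
        problems.append("chunk-size is not a power of 2")
    if not is_power_of_two(page):
        problems.append("page-size is not a power of 2")
    if page < 2 * chunk:
        problems.append("page-size should hold several chunks")
    return problems

sample = '''<repository name="warp1" module="mod_warpdrive">
  <options>
    <datastore chunk-size="2048" page-size="16384" compress="0"/>
  </options>
</repository>'''

print(check_datastore(sample))  # -> [] (both are powers of two; 8 chunks fit per page)
```

Note that entity references such as &config_dir; would need the DTD resolved first; this sketch assumes a fragment with literal values.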
Sample Python Programs
The Python API for ClockworkDB allows you to interact with the engine, repositories, datastores, and data using Python code. Below are some sample Python programs that demonstrate how to use the API to interact with the configured repositories and datastores in your environment. These programs can all be found in the examples/ directory of the source code, and you can run them after installing the package and setting up your environment.
List configured Repositories
You can use the cdb-repositories command line tool to list the repositories available in your environment, and then use the repository names in your Python programs to create sessions and interact with the data. The code below mimics the behavior of the cdb-repositories tool, but uses the Python API to interact with the engine and print out the repository metadata.
Code:
#!/usr/bin/env python3

# The core ClockworkDB API is in the clockworkdb.tsdb module
from clockworkdb.tsdb import *

if __name__ == "__main__":
    # The engine class is a singleton responsible for managing sessions and repositories.
    # You ALWAYS get an instance of the engine using Engine.Instance()
    e = Engine.Instance()

    # This asks the engine for metadata about the repositories configured in the environment,
    # and prints out the name, description, and module provider for each repository.
    repos = e.get_repositories_meta_data()
    for i in range(repos.size()):
        repo = repos.get(i)
        print(f"Repository Name: {repo.name()}")
        print(f"    Description: {repo.description()}")
        print(f"         Module: {repo.module()}")
        print()

Output:
Repository Name: warp1
    Description: WarpDrive+ Data (2k/16k)
         Module: mod_warpdrive
List Repository Datastores
You can use the cdb-datastores command line tool to list the datastores available in a given repository, and then use the datastore names in your Python programs to interact with the data. The code below mimics the behavior of the cdb-datastores tool, but uses the Python API to interact with the engine and print out the datastore metadata for a given repository.
Code:
#!/usr/bin/env python3

from clockworkdb.tsdb import *

if __name__ == "__main__":
    e = Engine.Instance()

    # Get a session for the "warp1" repository, which is configured in the environment.
    # You can have multiple repositories configured, and you can get sessions for any of them by name.
    session = e.get_session("warp1")

    # If you pass nothing to get_session(), it will use the default repository configured in your
    # environment, which is often what you want.
    # session = e.get_session()

    # WarpDrive+ is embedded in the same process as your Python program, so you can get a direct
    # connection to the data store
    connection = session.get_connection()

    # This asks the connection for metadata about the datastores configured in the repository,
    # and prints out the name of each datastore.
    ds_names = connection.get_datastores_meta_data()
    for i in range(ds_names.size()):
        ds = ds_names.get(i)
        print(f"Datastore Name: {ds.path()}")

Output:
Datastore Name: co-insider-transactions.bdb
Datastore Name: co-logo.bdb
Datastore Name: co-news.bdb
Datastore Name: co-peers.bdb
Datastore Name: co-profile.bdb
Datastore Name: fxdata.bdb
Datastore Name: market-news.bdb
Datastore Name: mktdata.bdb
Datastore Name: normal.ann.bdb
Datastore Name: normal.bdb
Datastore Name: sec-master.bdb
Datastore Name: stac-test.bdb
Catalog a Datastore
You can use the cdb-catalog command line tool to print out the catalog of a given datastore, which includes the timeseries and vectors stored in the datastore. The code below mimics the behavior of the cdb-catalog tool, but uses the Python API.
Code:
#!/usr/bin/env python3

from clockworkdb.tsdb import *

if __name__ == "__main__":
    # This should look familiar by now: get an engine instance, get a session for the "warp1"
    # repository, get a connection, and then a datastore for the "mktdata" datastore in that
    # repository. NOTE: we open the datastore in read-only mode, which is all we need to do
    # to list the objects in it.
    e = Engine.Instance()
    session = e.get_session("warp1")
    connection = session.get_connection()
    datastore = connection.get_datastore("mktdata", AccessMode.READ_ONLY())

    # Then we can do a regex search for all objects in that datastore and print their names.
    matches = datastore.regex_name_search(".*")
    while matches.next():
        match = matches.name()
        print(f"Object Name: {match}")

Output:
Object Name: A.ADJUSTED
Object Name: A.CLOSE
Object Name: A.HIGH
Object Name: A.LOW
Object Name: A.OPEN
Object Name: A.VOLUME
Object Name: AA.ADJUSTED
Object Name: AA.CLOSE
Object Name: AA.HIGH
Object Name: AA.LOW
Object Name: AA.OPEN
Object Name: AA.VOLUME
Object Name: AACG.ADJUSTED
Object Name: AACG.CLOSE
Object Name: AACG.HIGH
...
Object Name: ZWS.CLOSE
Object Name: ZWS.HIGH
Object Name: ZWS.LOW
Object Name: ZWS.OPEN
Object Name: ZWS.VOLUME
Object Name: ZYME.ADJUSTED
Object Name: ZYME.CLOSE
Object Name: ZYME.HIGH
Object Name: ZYME.LOW
Object Name: ZYME.OPEN
Object Name: ZYME.VOLUME
Object Name: ZYXI.ADJUSTED
Object Name: ZYXI.CLOSE
Object Name: ZYXI.HIGH
Object Name: ZYXI.LOW
Object Name: ZYXI.OPEN
Object Name: ZYXI.VOLUME
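The object names above follow a SYMBOL.FIELD convention; the part before the first '.' is the first dimension mentioned in the slices configuration. If you want to work with the catalog per symbol, a pure-Python sketch (independent of the ClockworkDB API) groups a list of names by that prefix:

```python
from collections import defaultdict

def group_by_symbol(names):
    """Group SYMBOL.FIELD object names by the part before the first '.'."""
    groups = defaultdict(list)
    for name in names:
        symbol, _, field = name.partition(".")
        groups[symbol].append(field)
    return dict(groups)

# A few names taken from the sample catalog output above.
names = ["A.CLOSE", "A.HIGH", "A.LOW", "AA.CLOSE", "ZYXI.VOLUME"]
print(group_by_symbol(names))
# -> {'A': ['CLOSE', 'HIGH', 'LOW'], 'AA': ['CLOSE'], 'ZYXI': ['VOLUME']}
```

In a real program you would feed this the names yielded by regex_name_search, as in the catalog example above.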
Display timeseries/vector metadata
You can use the cdb-ts-meta command line tool to print out the metadata of a given timeseries or vector, which includes information about the object such as its name, type, number of records, and other relevant metadata. The code below mimics the behavior of the cdb-ts-meta tool, but uses the Python API.
Code:
#!/usr/bin/env python3

# used to get at argv for passing in the name of the timeseries to get metadata for.
# If no name is passed, it defaults to "nvda.close"
import sys

# The core ClockworkDB API is in the clockworkdb.tsdb module, and we also need to import
# the calendars and scalar modules to get at the relevant types for timeseries metadata.
from clockworkdb.tsdb import *
from clockworkdb.calendars import *
from clockworkdb.scalar import *

if __name__ == "__main__":
    # Normal setup to get at the datastore. We get an engine instance, then a session for
    # the "warp1" repository, then a connection, and then a datastore for the "mktdata"
    # datastore in that repository. NOTE: we open the datastore in read-only mode, which is
    # all we need to do to get metadata about the timeseries stored in it.
    e = Engine.Instance()
    session = e.get_session("warp1")
    connection = session.get_connection()
    datastore = connection.get_datastore("mktdata", AccessMode.READ_ONLY())

    ts_name = sys.argv[1] if len(sys.argv) > 1 else "nvda.close"

    # Check that the timeseries/vector exists in the datastore; if not, print an error
    # message and exit.
    if not datastore.has_time_series(ts_name):
        print(f"Time series {ts_name} not found in datastore.")
        datastore.close()
        sys.exit(1)

    # If we have the timeseries/vector, we can get it from the datastore and print out its
    # metadata, including its name, type, calendar, creation and modification dates,
    # first and last dates, and count of records.
    ts = datastore.get_time_series(ts_name)
    print(f"Time Series Name: {ts.name()}")
    print(f"        Type: {ts.get_data_type().name()}")
    print(f"    Calendar: {ts.get_calendar().name()}")
    print(f"     Created: {ts.get_create_date()}")
    print(f"    Modified: {ts.get_modify_date()}")
    print(f"  First Date: {ts.get_first_date()}")
    print(f"   Last Date: {ts.get_last_date()}")
    print(f"       Count: {ts.get_last_date_int() - ts.get_first_date_int() + 1:,d}")

Output (passing xom.close as argv[1]):
Time Series Name: XOM.CLOSE
        Type: Float
    Calendar: Business
     Created: 2024-Mar-02 21:26:58
    Modified: 2026-Feb-15 13:37:18
  First Date: 1970-Jan-02
   Last Date: 2026-Feb-13
       Count: 14,641
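The Count line is simply the inclusive span of calendar date integers, last minus first plus one. Using the XOM.CLOSE date integers visible in the data listing of the next section (31309 through 45949), a quick pure-Python check reproduces the reported count:

```python
def record_count(first_date_int, last_date_int):
    """Inclusive count of calendar slots between two date integers."""
    return last_date_int - first_date_int + 1

# XOM.CLOSE spans date ints 31309 (1970-Jan-02) through 45949 (2026-Feb-13).
print(f"{record_count(31309, 45949):,d}")  # -> 14,641
```

Note this counts calendar slots, not stored values; non-normal observations still occupy a slot.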
Display timeseries/vector data
You can use the cdb-ts command line tool to print out the data of a given timeseries or vector, which includes the date and value of each record in the timeseries or vector. The code below mimics the behavior of the cdb-ts tool, but uses the Python API.
Code:
#!/usr/bin/env python3

# used to get at argv for passing in the name of the timeseries to print.
# If no name is passed, it defaults to "nvda.close"
import sys

# The core ClockworkDB API is in the clockworkdb.tsdb module, and we also need to import
# the calendars and scalar modules to get at the relevant types for timeseries data.
from clockworkdb.tsdb import *
from clockworkdb.calendars import *
from clockworkdb.scalar import *

if __name__ == "__main__":
    # Normal setup to get at the datastore. We get an engine instance, then a session for
    # the "warp1" repository, then a connection, and then a datastore for the "mktdata"
    # datastore in that repository. NOTE: we open the datastore in read-only mode, which is
    # all we need to do to read the timeseries stored in it.
    e = Engine.Instance()
    session = e.get_session("warp1")
    connection = session.get_connection()
    datastore = connection.get_datastore("mktdata", AccessMode.READ_ONLY())

    ts_name = sys.argv[1] if len(sys.argv) > 1 else "nvda.close"

    # Check that the timeseries/vector exists in the datastore; if not, print an error
    # message and exit.
    if not datastore.has_time_series(ts_name):
        print(f"Time series {ts_name} not found in datastore.")
        datastore.close()
        sys.exit(1)

    # If we have the timeseries/vector, we can get it from the datastore and print out its
    # name and data.
    ts = datastore.get_time_series(ts_name)

    # Internally, timeseries and vectors just store observations. Their index in the vector
    # identifies the date, time, or ordinal location of the observation. To get the date for
    # a given observation, you can use the calendar associated with the timeseries/vector,
    # and to get the value of the observation, you can use the appropriate scalar type, in
    # this case, Float. The code below iterates through all the observations in the
    # timeseries/vector.
    calendar = ts.get_calendar()
    print(f"Time Series Name: {ts.name()}")
    for date_int in range(ts.get_first_date_int(), ts.get_last_date_int() + 1):
        # We check to see if the observation is normal, which means it has a valid value.
        # If it does, we get the value and print it out along with the date.
        if ts.get_observation(date_int).is_normal():
            value = Float(ts.get_observation(date_int)).value()
            d = calendar.get_date(date_int)
            print(f"  {d}[{date_int}]: {value:.2f}")

Output (passing xom.close as argv[1]):
Time Series Name: XOM.CLOSE
  1970-Jan-02[31309]: 1.94
  1970-Jan-05[31310]: 1.97
  1970-Jan-06[31311]: 1.96
  1970-Jan-07[31312]: 1.95
  1970-Jan-08[31313]: 1.96
  1970-Jan-09[31314]: 1.96
  1970-Jan-12[31315]: 1.95
  1970-Jan-13[31316]: 1.94
  1970-Jan-14[31317]: 1.95
  1970-Jan-15[31318]: 1.93
  1970-Jan-16[31319]: 1.93
  1970-Jan-19[31320]: 1.93
  1970-Jan-20[31321]: 1.93
  ...
  2026-Jan-23[45934]: 134.97
  2026-Jan-26[45935]: 134.84
  2026-Jan-27[45936]: 136.83
  2026-Jan-28[45937]: 137.58
  2026-Jan-29[45938]: 140.51
  2026-Jan-30[45939]: 141.40
  2026-Feb-02[45940]: 138.40
  2026-Feb-03[45941]: 143.73
  2026-Feb-04[45942]: 147.59
  2026-Feb-05[45943]: 146.08
  2026-Feb-06[45944]: 149.05
  2026-Feb-09[45945]: 151.21
  2026-Feb-10[45946]: 151.59
  2026-Feb-11[45947]: 155.56
  2026-Feb-12[45948]: 149.93
  2026-Feb-13[45949]: 148.45
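Once observations have been collected as (date, value) pairs, as in the loop above, downstream analysis is ordinary Python. As an illustration (not part of the ClockworkDB API), this sketch computes simple day-over-day returns from a few pairs taken from the sample output:

```python
def simple_returns(observations):
    """Compute day-over-day simple returns from a list of (date, value) pairs."""
    returns = []
    for (d0, v0), (d1, v1) in zip(observations, observations[1:]):
        returns.append((d1, (v1 - v0) / v0))
    return returns

# Closing prices taken from the sample XOM.CLOSE output above.
obs = [("2026-Feb-11", 155.56), ("2026-Feb-12", 149.93), ("2026-Feb-13", 148.45)]
for date, r in simple_returns(obs):
    print(f"{date}: {r:+.2%}")
```

To feed it real data, append (calendar.get_date(date_int), value) inside the observation loop shown above instead of printing.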