Axibase Time-Series Database
Non-relational database for storing and analyzing large volumes of metrics collected at high frequency
What is a Time-Series Database?
A time series database (TSDB) is a software system optimized for handling time series data: arrays of numbers indexed by time (source: Wikipedia).
Key optimization areas are:
- High compression: lossless and lossy, de-duplication, swinging door
- Fast range retrievals: indexes include time, fast forwards
- Temporal functions: data processing at source
- Read/write throughput
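
De-duplication, one of the compression techniques listed on this slide, can be illustrated with a minimal Python sketch. It assumes a series is a list of (timestamp, value) tuples and drops every sample whose value repeats the last stored one. The function name is hypothetical, and this is an illustration of the idea, not ATSD's storage code.

```python
def deduplicate(samples):
    """Keep a sample only when its value differs from the last kept value."""
    kept = []
    for timestamp, value in samples:
        if not kept or kept[-1][1] != value:
            kept.append((timestamp, value))
    return kept


# Samples collected every 15 seconds; most of them repeat the previous value.
raw = [(0, 5.0), (15, 5.0), (30, 5.0), (45, 7.5), (60, 7.5), (75, 5.0)]
compact = deduplicate(raw)
print(compact)
```

A reader of the compacted series would carry the last stored value forward to restore the dropped samples, which only works when the collection interval is known.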
Temporal Functions
Name | Description
union | Merge multiple time series into a single multivariate time series. Retain timestamps with incomplete values.
intersect | Merge multiple time series into a single multivariate time series. Discard timestamps with incomplete values.
left_join | Merge two time series into a single time series containing two variables, retain timestamps of the first series.
right_join | Merge two time series into a single time series containing two variables, retain timestamps of the second series.
except | Remove one time series from another: remove timestamps from the first series that exist in the second series.
regularize | Modify timestamps and values so that the frequency (interval between consecutive samples) is constant.
filter | Retain timestamps that match a specified condition, such as a calendar, day of week or month, or value filter.
interpolate | Add missing timestamps and values based on an interpolation function.
lag | Modify timestamps by shifting them k steps right (k > 0) or left (k < 0) and drop k values without timestamps.
smooth | Replace values with statistical functions applied to a sliding window (count or duration based).
difference | Replace each value with the difference or ratio between the current value and the value some steps back or forward.
aggregate | Convert a time series to a specified periodicity by applying statistical functions to values within each period.
round | Truncate (round) time to the nearest second, minute, or hour.
range | Select a sub-series based on start and end time.
math | Apply a mathematical function to each value, e.g. log(v) or sqrt(v).
split | Split a given time series along time periods and create a list of shorter time series.
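
A few of these functions can be sketched in plain Python to make the semantics concrete. The sketch below assumes a series is a dict that maps an integer timestamp (seconds) to a value. The function bodies are illustrative and are not taken from ATSD.

```python
def union(a, b):
    """Merge two series, keeping every timestamp; missing values become None."""
    return {t: (a.get(t), b.get(t)) for t in sorted(a.keys() | b.keys())}


def intersect(a, b):
    """Merge two series, keeping only timestamps present in both."""
    return {t: (a[t], b[t]) for t in sorted(a.keys() & b.keys())}


def aggregate(series, period, func):
    """Group values into fixed periods and apply func to each group."""
    buckets = {}
    for t in sorted(series):
        # The bucket key is the start of the period that contains t.
        buckets.setdefault(t - t % period, []).append(series[t])
    return {start: func(values) for start, values in buckets.items()}


cpu = {0: 10.0, 15: 20.0, 30: 30.0, 60: 50.0}
mem = {0: 1.0, 30: 3.0, 45: 4.0}

print(union(cpu, mem))          # every timestamp, gaps filled with None
print(intersect(cpu, mem))      # only timestamps 0 and 30
print(aggregate(cpu, 60, max))  # one maximum per 60-second period
```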
Use Cases in IT Monitoring
- Retain detailed data for many years.
- Collect statistics at high frequency, for example every 15 seconds.
- Consolidate performance statistics from all systems in one place: facilities, network, storage, servers, applications, databases, transactions, user activity, etc.
- Monitor infrastructure based on abnormal deviations instead of manual thresholds.
- Apply statistics to predict outages.
TSDB Examples
- IBM Informix TimeSeries
- OSIsoft PI System
- RRDtool
- TDW+SPA+WPA
Challenges
- No horizontal scalability: cannot add new nodes
- Pre-defined schema: store only what's defined
- Storing more data slows down read requests
- No support for ML and analytical functions
Axibase Time Series Database
Axibase Time-Series Database (ATSD) is a non-relational database built on the Hadoop Distributed File System (HDFS). As a time-series database, it provides specialized libraries for querying, aggregating, transforming, and forecasting time series. As a clustered system with a specialized schema, it offers linear scalability and more than 70% space savings compared to relational databases.
Architecture
Supported Data Types
- Two types of data ingestion: push and pull.
- ATSD supports numeric values, messages, and properties.
- API libraries are available for Java, PHP, and R.
- Supported protocols and formats: Telnet, ICMP, CSV/TSV, FILE, JMX, HTTP, and JSON.
Forecasting
- Predict problems before they occur.
- The accuracy of predictions depends on the frequency of data collection, the retention interval, and the algorithms used.
- Built-in forecasting algorithms (Holt-Winters, ARIMA, etc.) in ATSD can predict system failures at an early stage.
- The forecasting process is most effective in a clustered system with data locality, such as ATSD.
- Dynamic predictions eliminate the need to set manual thresholds.
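
To give a feel for how an exponential-smoothing forecast works, here is a minimal Python sketch of Holt's linear (double exponential) smoothing. This is a simpler relative of Holt-Winters that tracks level and trend but has no seasonal component. It is not ATSD's implementation; the function name, the initialization, and the default smoothing factors are assumptions chosen for illustration.

```python
def holt_forecast(values, alpha=0.5, beta=0.5, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear smoothing.

    alpha smooths the level, beta smooths the trend (both in 0..1).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a trend")
    # Simple initialization (an assumption): first value and first difference.
    level = values[0]
    trend = values[1] - values[0]
    for value in values[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# A metric that grows by 2 units per sample.
history = [1.0, 3.0, 5.0, 7.0, 9.0]
forecast = holt_forecast(history, horizon=3)
print(forecast)
```

On a perfectly linear history the method reproduces the line, so the forecast continues the same slope. Real metrics with daily or weekly cycles would need the seasonal term that Holt-Winters adds.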
Forecast Automation
- ATSD uses a ranking algorithm to select the most accurate forecasting algorithm for each time series separately.
- The highest-ranked algorithm is used to compute the forecast for the next day, week, or month.
- Pre-computed forecasts can be used in the rule engine.
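
The slide does not say how the ranking is computed. One common approach, sketched below in Python as an assumption, is to hold out the most recent samples, let each candidate algorithm forecast them, and rank the candidates by mean absolute error. The three candidate forecasters are deliberately trivial stand-ins and all names are hypothetical.

```python
def naive_last(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def naive_mean(history, horizon):
    """Repeat the mean of the history."""
    mean = sum(history) / len(history)
    return [mean] * horizon


def naive_drift(history, horizon):
    """Extend the straight line through the first and last observation."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * step for step in range(1, horizon + 1)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_algorithms(values, candidates, holdout):
    """Return (error, name) pairs sorted from most to least accurate."""
    train, test = values[:-holdout], values[-holdout:]
    scores = [
        (mean_absolute_error(test, algorithm(train, holdout)), name)
        for name, algorithm in candidates.items()
    ]
    return sorted(scores)


candidates = {"last": naive_last, "mean": naive_mean, "drift": naive_drift}
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
ranking = rank_algorithms(series, candidates, holdout=2)
best_name = ranking[0][1]
print(ranking, best_name)
```

Running the ranking per series, rather than once globally, is what lets a steadily growing metric and a flat one end up with different algorithms.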
Forecasting Example
Analytical Rule Engine
Rule Examples
Type | Window | Example | Description
threshold | none | value > 75 | Raise alert if the last metric value exceeds the threshold
statistical-time | time('15 min') | wavg(value) > 75 | Raise alert if the weighted average for the last 15 minutes exceeds the threshold
cpu forecast deviation | time('5 min') | abs(forecast_deviation(avg())) > 2 | Raise alert if the 5-minute average deviates from the forecast by more than 2 standard deviations
cpu forecast diff | time('10 min') | abs(avg() - forecast()) > 25 | Raise alert if the 10-minute average differs from the forecast by more than 25 (percentage points, for a metric measured in percent)
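
The way a time-based window and a rule expression interact can be sketched in Python. The class and function names below are hypothetical, the weighted average is assumed to use linearly increasing weights (newest sample weighs most), and the forecast is passed in as a plain number. None of this is ATSD's rule engine code.

```python
from collections import deque


class TimeWindow:
    """Keep only the samples that fall inside the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.samples = deque()

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Evict samples that are older than the window length.
        while self.samples[0][0] <= timestamp - self.seconds:
            self.samples.popleft()

    def values(self):
        return [value for _, value in self.samples]


def avg(values):
    return sum(values) / len(values)


def wavg(values):
    """Weighted average with weights 1..n, oldest to newest (an assumption)."""
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def statistical_time_rule(window, limit=75):
    return wavg(window.values()) > limit


def forecast_diff_rule(window, forecast, limit=25):
    return abs(avg(window.values()) - forecast) > limit


window = TimeWindow(seconds=15 * 60)
for timestamp, value in [(0, 60.0), (300, 70.0), (600, 80.0), (900, 90.0)]:
    window.add(timestamp, value)

print(window.values())
print(statistical_time_rule(window))
print(forecast_diff_rule(window, forecast=50.0))
```

In this run the sample at time 0 has aged out of the 15-minute window by the time the sample at 900 seconds arrives, so the rules are evaluated on the last three values only.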
Forecast Settings
Visualization
ITM History Extension
- ITM can be instrumented to write streaming data into CSV files.
- The CSV files can be uploaded into ATSD as soon as they appear, using the inotify utility and wget.
- Example: private history streaming in ITM
  KHD_CSV_OUTPUT_ACTIVATE = Y
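
The upload step can be sketched in Python for readers who prefer a script over shell tools. This sketch swaps in directory polling for inotify and urllib for wget. The upload URL is a placeholder, not a documented ATSD endpoint, and the function names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Placeholder URL (an assumption); replace with the real upload address.
UPLOAD_URL = "http://atsd.example.org/csv-upload"


def find_new_csv(directory, already_sent):
    """Return CSV files in `directory` that have not been uploaded yet."""
    return sorted(
        path for path in Path(directory).glob("*.csv")
        if path.name not in already_sent
    )


def build_upload_request(path, url=UPLOAD_URL):
    """Wrap the file content in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=Path(path).read_bytes(),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def http_send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


def upload_new_files(directory, already_sent, send=http_send):
    """Upload each new CSV file once and remember its name."""
    for path in find_new_csv(directory, already_sent):
        send(build_upload_request(path))
        already_sent.add(path.name)
```

Calling `upload_new_files(directory, sent_names)` in a loop with a short sleep approximates the inotify behaviour; the `send` parameter exists so the transport can be replaced, for example in a dry run.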
nmon Reporting
- Consolidate trusted statistics from AIX and Linux systems in one database
- Analyze nmon data with forecasting algorithms
Custom Metrics
- API libraries for Java, PHP, and R
- RESTful API and network commands
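
As an illustration of sending a custom metric with a network command, the Python sketch below formats a single plain-text `series` command and writes it to a TCP socket. The command layout (`e:` entity, `m:` metric=value, `ms:` time in milliseconds) is an assumption to be checked against the ATSD network API documentation. The host, port, entity, and metric names are placeholders.

```python
import socket


def series_command(entity, metric, value, time_ms):
    """Format one sample as a text command (layout is an assumption)."""
    return f"series e:{entity} m:{metric}={value} ms:{time_ms}"


def send_command(command, host, port, timeout=5.0):
    """Open a TCP connection and send one newline-terminated command."""
    with socket.create_connection((host, port), timeout=timeout) as connection:
        connection.sendall((command + "\n").encode("utf-8"))


command = series_command("server-01", "queue_length", 42, 1700000000000)
print(command)
# send_command(command, "atsd.example.org", 8081)  # placeholder host and port
```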
ATSD Benefits
- Extract additional value from data that already exists in your IT infrastructure.
- Surprise and amaze your end users with real-time metrics that they were not able to collect before.
- Put your engineers into innovation mode with a NoSQL, big data solution.
THANK YOU!