As far as I can tell, here are the main differences:
- OpenTSDB does not degrade data over time, unlike Graphite, where the size of the database is predetermined.
- OpenTSDB can store metrics per second, as opposed to Graphite, which has minute intervals. (I'm not sure of this: the Graphite docs show retention policies that store metrics every minute, but I don't know whether that is the minimum unit of time we can work with.)
I want to make an informed decision about which tool to use to store metrics. Have I missed any other differences between these two systems? How performant and scalable are they?
Bonus Question: Is there any other time series system I should look at?
Disclaimer: I wrote OpenTSDB.
I would say that the biggest advantage of Graphite seems to be superior graphing capabilities. It offers more graph types and features. Deployment complexity is also probably a bit lower with Graphite, as it's not a distributed system and thus has fewer moving parts.
OpenTSDB, on the other hand, is capable of storing a significantly larger amount of fine-grained data points. This comes at the cost of deploying HBase, which isn't that big of a deal to be honest. If you want to get real-time data down to the second with >>10k new data points/s, then OpenTSDB will suit you well.
Some info about our current scale at StumbleUpon (these numbers generally double every 2-3 months):
User interface
Graphite has some superb graphing tools available. The default web interface is ugly (although functional), but you then have a wealth of great graphing and dashboard options.
A few examples:
Look here or here to find many more.
OpenTSDB, on the other hand, is still at the gnuplot stage:
Setup
In practice, Graphite is actually much more of a pain to set up than HBase + OpenTSDB. OpenTSDB has comprehensive documentation and a few straightforward steps. These are the commands to install Graphite; things get even trickier if you build from source.
Performance
True. Graphite also uses a file format similar to RRD. In practice, this means a single data point takes as much disk space as the full time series, since the space is pre-allocated. It also means plotting an empty time interval takes as much time as if there were data in it. (An alternative storage engine, Ceres, is in the works, but I haven't tried it yet.)
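To make the pre-allocation point concrete, here is a rough sketch of how big a single Graphite metric file ends up, regardless of how many points it actually holds. The byte sizes are my assumption based on the Whisper on-disk layout (16-byte header, 12 bytes per archive entry, 12 bytes per point), and the retention policy is purely hypothetical:

```python
# Byte sizes assumed from the Whisper file layout; treat them as an
# approximation rather than a guarantee for every Graphite version.
METADATA_SIZE = 16      # file-level header
ARCHIVE_INFO_SIZE = 12  # one header entry per archive
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value


def whisper_file_size(archives):
    """Return the pre-allocated file size in bytes.

    `archives` is a list of (seconds_per_point, retention_seconds) pairs,
    one per retention level.
    """
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    points = sum(retention // step for step, retention in archives)
    return header + points * POINT_SIZE


# Hypothetical policy: 10 s for 6 hours, 1 min for 7 days, 10 min for 5 years.
example_policy = [
    (10, 6 * 3600),
    (60, 7 * 86400),
    (600, 5 * 365 * 86400),
]
size = whisper_file_size(example_policy)
print(size)  # 3300532 bytes, i.e. a bit over 3 MiB per metric
```

With this policy every metric costs a bit over 3 MiB on disk from the moment it is created, even if only one point is ever written to it.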
As tsuna said, OpenTSDB will let you store significantly more data points, leveraging the power of Hadoop's HDFS. Graphite, on the other hand, whose architecture is detailed in this AOSA chapter, is a more ad hoc solution.
Nope, both can log down to the second.
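As an illustration, both systems accept a Unix timestamp in seconds through a simple line-based protocol. The sketch below formats one data point for each. It assumes Graphite's plaintext protocol (`path value timestamp`, port 2003 by default) and OpenTSDB's telnet-style `put` command (port 4242 by default); the metric names, tags and helper functions are made up for the example:

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one point for Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def opentsdb_line(metric, value, tags, timestamp=None):
    """Format one point for OpenTSDB's telnet-style `put` command."""
    if not tags:
        # OpenTSDB expects at least one tag per data point.
        raise ValueError("at least one tag is required")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts} {value} {tag_str}\n"


def send_line(host, port, line):
    """Send a single line over TCP (host and port depend on your setup)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric with an explicit second-resolution timestamp.
print(graphite_line("web.requests", 42, 1356998400), end="")
print(opentsdb_line("web.requests", 42, {"host": "web01"}, 1356998400), end="")
```

Whether Graphite actually keeps per-second resolution depends on the retention configured for the metric; the protocol itself does not limit you to minutes.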