What is Carbon?
Graphite is composed of two components: a webapp frontend and a backend storage application (Carbon). Data collection agents connect to carbon and send their data; carbon's job is to make that data available for real-time graphing immediately and to get it stored on disk as quickly as possible.
How does Carbon work?
Carbon is made up of three processes: carbon-agent.py, carbon-cache.py, and carbon-persister.py. The primary process is carbon-agent.py, which starts the other two processes in a pipeline. Carbon-agent accepts connections and receives time series data in the appropriate format. This data is sent through the pipeline to carbon-cache, which stores it in a cache where data points are grouped by their associated metric. Carbon-cache constantly attempts to write the largest such group of data points down the pipeline to carbon-persister. Carbon-persister reads these data points and writes them to disk using whisper.
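The caching behaviour described above can be illustrated with a minimal sketch. This is not carbon's actual code: the class name MetricCache and its methods store and pop_largest are hypothetical, and data points are assumed to be (timestamp, value) pairs.

```python
from collections import defaultdict


class MetricCache:
    """Toy in-memory cache that groups data points by metric name.

    Hypothetical illustration only; not carbon-cache's real implementation.
    """

    def __init__(self):
        # metric name -> list of (timestamp, value) pairs (assumed layout)
        self._points = defaultdict(list)

    def store(self, metric, timestamp, value):
        """Add one data point to the group for its metric."""
        self._points[metric].append((timestamp, value))

    def pop_largest(self):
        """Remove and return (metric, points) for the largest group.

        Returns None when the cache is empty.
        """
        if not self._points:
            return None
        metric = max(self._points, key=lambda m: len(self._points[m]))
        return metric, self._points.pop(metric)


cache = MetricCache()
cache.store("servers.web1.load", 100, 0.5)
cache.store("servers.web2.load", 100, 0.7)
cache.store("servers.web2.load", 160, 0.9)

# The metric with the most cached points is handed off first.
print(cache.pop_largest())
```

Draining the largest group first means each hand-off to the persister carries as many points as possible for a single metric, which is presumably why the cache is organised per metric rather than as a simple queue.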
Carbon is split into three processes because of Python's threading limitations. Originally carbon was a single application in which these distinct functions were performed by threads, but Python's GIL prevents multiple threads from executing Python code in parallel. Since the initial deployment of Graphite was on a machine with many rather slow CPUs, we needed true concurrency for performance reasons. Thus it was split into three processes connected via pipes.
How are Graphite graphs always real-time, even when carbon hasn't had time to write its cached data to disk yet?
Upon receiving a rendering request, the Graphite webapp simultaneously retrieves data for the requested metrics from the disk and from carbon's cache via a simple cache query socket that carbon-cache provides. Graphite then combines these two sources of data points into a single series, which is then rendered. This ensures that graphs are always real-time, even when the data hasn't been written to disk yet.
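As a rough sketch of the combining step, the function below merges two lists of (timestamp, value) points into one series. The name merge_series is hypothetical, and the rule that a cached point overrides a disk point with the same timestamp is an assumption made for illustration; the webapp's actual merge logic may differ.

```python
def merge_series(disk_points, cached_points):
    """Merge two lists of (timestamp, value) pairs into one sorted series.

    Assumption: when both sources hold a point for the same timestamp,
    the cached (more recent) value is kept.
    """
    merged = dict(disk_points)    # timestamp -> value read from disk
    merged.update(cached_points)  # cached points override on conflict
    return sorted(merged.items())


disk = [(100, 0.5), (160, 0.6)]    # already written to disk
cached = [(160, 0.65), (220, 0.7)]  # still waiting in carbon's cache

print(merge_series(disk, cached))
# [(100, 0.5), (160, 0.65), (220, 0.7)]
```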