What Is Federated Storage?
The primary bottleneck in storing lots of data in Graphite is disk I/O. Previously the only way to continue scaling Graphite indefinitely was to use a disk array that you could keep adding disks to to make it faster. If you needed to add capacity to the webapp then you had to add servers that could access this data by sharing it somehow. This is not a good solution for a lot of reasons, and this is also where federated storage enters the picture.
With federated storage you can have as many independent Graphite servers as you wish, each with their own independent local storage (still ideally with fast RAID). This allows you to add capacity to Graphite (both the frontend and the backend) by simply adding more servers instead expanding a large shared disk array. Each instance of the webapp has the ability to retrieve data from the other Graphite servers to give the appearance that it is one big system instead of many independent servers.
How does it work?
Each webapp needs to be configured with a list of all the Graphite servers in your cluster. When a user browses the metric hierarchy or renders a graph, the webapp asks the other webapps in the cluster if they have any data relevant to that particular request and consolidates the results for the user. Thus you could have a setup where you store application statistics on one Graphite server and network statistics on another. By visiting either Graphite instance you would be able to see the data from both, as if it was all stored on one server. If one of the servers goes down, then its portion of the data will no longer be visible, but everything else will continue to work and be responsive.
For the backend the only thing that is different is that you need to decide which server you want to put which data on. This is facilitated by a new process called carbon-relay. This application's role is to receive data sent by clients and to forward the data on to whichever Graphite servers your configuration specifies. It can also duplicate the data to more than one server to ensure high availability. If data is stored on more than one server, it will continue to be available as long as at least one of those servers is still available.
This feature was released in the 0.9.5 release and has been used in a large production environment for about 5 months now.