bcache

 

From Wikipedia, the free encyclopedia
bcache is a Linux kernel block layer cache (hence the name, block cache), developed by Kent Overstreet. It allows one or more fast storage devices such as flash-based solid-state drives (SSDs) to act as a cache for one or more slower hard disk drives, effectively creating hybrid drives.
It is designed around the performance characteristics of SSDs, minimizing write amplification by never performing random writes and instead turning them into sequential writes – for both the cache and the primary storage. This helps extend the lifetime of flash-based devices used as caches, and also improves the performance of write-sensitive primary storage, such as RAID 5.


Overview

Using bcache makes it possible to utilize SSDs as another level of indirection within the data storage access paths, allowing improved speeds by utilizing fast SSDs as caches for slower hard drives (HDDs). That way, the gap between SSDs and HDDs can be bridged – the costly speed of SSDs gets combined with the cheap storage capacity of traditional HDDs.[1]
Caching is implemented by using SSDs to store the data involved in random reads and random writes, exploiting the near-zero seek times that are the most prominent feature of SSDs. Sequential I/O is not cached: HDDs already handle sequential operations well, and bypassing them avoids rapidly invalidating the SSD cache, which also helps extend the lifetime of the SSDs used as caches. Sending large sequential writes around the cache is known as the write-around policy.
Write amplification is avoided by never performing random writes to the SSDs. Instead, random writes are combined into sequential block writes, so that only complete erase blocks are rewritten on the SSDs while data is written into the cache.[2][3]
Both write-back and write-through policies are supported for caching write operations. With the write-back policy, written data is stored on the SSD cache first and propagated to the hard drives later, in batches and using seek-friendly operations – making bcache also act as an I/O scheduler. With the write-through policy, the performance improvement is smaller, since writes complete only after reaching the primary storage and the written data is merely added to the cache.[2][3]
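The caching policy can be switched at runtime through the sysfs interface. A minimal sketch, assuming a bcache device already registered as /dev/bcache0 (the device name is a placeholder; root privileges are required):

```shell
# Show the available cache modes; the active one is shown in brackets
cat /sys/block/bcache0/bcache/cache_mode
# writethrough [writeback] writearound none

# Switch to write-back caching (writes are batched to the backing HDD later)
echo writeback > /sys/block/bcache0/bcache/cache_mode

# Switch to write-through (writes complete only after reaching the
# backing device, trading write performance for safety)
echo writethrough > /sys/block/bcache0/bcache/cache_mode
```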
The write-back policy, with its batched writes to HDDs, also benefits RAID levels that use a read-modify-write approach, including RAID 5 and RAID 6. By grouping random writes into sequential ones, the associated performance penalties[4] are avoided for those RAID levels.[2][3]
Caching performed by bcache operates at the block level, making it filesystem-agnostic as long as the filesystem has an embedded UUID. Caching extents can be as small as a single HDD sector.[5]
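In practice, a backing device and a cache set are created with the user-space bcache-tools and joined through sysfs. A sketch of a typical setup, assuming /dev/sdb is the HDD and /dev/sdc the SSD (both are placeholders, and all data on them is destroyed):

```shell
# Format the backing device (HDD) and the cache device (SSD)
make-bcache -B /dev/sdb
make-bcache -C /dev/sdc

# If udev did not register the devices automatically, register them manually
echo /dev/sdb > /sys/fs/bcache/register
echo /dev/sdc > /sys/fs/bcache/register

# Attach the backing device to the cache set by the set's UUID
# (printed by make-bcache -C, also visible under /sys/fs/bcache/)
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# The resulting /dev/bcache0 can hold any filesystem, since caching
# happens at the block level
mkfs.ext4 /dev/bcache0
```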

History

bcache was first announced by Kent Overstreet in July 2010 as a fully working Linux kernel module, though still at an early beta stage.[6] Development continued for almost two years, until May 2012, when bcache reached a production-ready state.[3]
It was merged into the Linux kernel mainline in kernel version 3.10, released on 30 June 2013.[7][8]

Features

The following features are currently (as of the version in Linux kernel 3.10) provided by bcache:[2]
  • the same cache device can be used for caching an arbitrary number of primary storage devices
  • runtime attaching and detaching of primary storage devices from their caches, while mounted and in use (running in passthrough mode when not cached)
  • automated recovery from unclean shutdowns – writes are not completed until the cache is consistent with respect to the primary storage device; internally, bcache makes no distinction between clean and unclean shutdowns
  • write barriers / cache flushes are properly handled
  • write-through, write-back and write-around policies
  • sequential I/O is detected and bypassed – with configurable thresholds, and bypassing can also be disabled
  • throttling of the I/O to the SSD if it becomes congested – as detected by measured latency of the SSD's I/O operations exceeding a configurable threshold; useful for configurations having one SSD providing caching for many HDDs
  • readahead on a cache miss – disabled by default
  • highly efficient write-back implementation – dirty data is always written out in sorted order, and background write-back can optionally be throttled smoothly to keep a configured percentage of the cache dirty
  • high-performance B+ trees are used internally – bcache is capable of around 1,000,000 IOPS on random reads, if the hardware is fast enough.
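Several of the features above are tunable at runtime via sysfs. A sketch, assuming a running device /dev/bcache0 and a cache set UUID <cset-uuid> (both placeholders); the values shown are the documented defaults:

```shell
# Sequential I/O detection: I/O larger than this threshold bypasses
# the cache; setting it to 0 disables the bypass entirely
echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

# Background write-back is throttled to keep roughly this percentage
# of the cache dirty
echo 10 > /sys/block/bcache0/bcache/writeback_percent

# Congestion thresholds (microseconds): I/O bypasses the SSD when its
# measured latency exceeds these values
echo 2000  > /sys/fs/bcache/<cset-uuid>/congested_read_threshold_us
echo 20000 > /sys/fs/bcache/<cset-uuid>/congested_write_threshold_us
```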

Improvements

New features are planned for future releases of bcache:[8]
  • RAID 5/6 stripe awareness – making the write-back policy aware of the stripe layout, so that caching decisions give preference to already "dirty" stripes, and background flushes write out complete stripes first
  • handling cache misses on already full B+ tree nodes – splits of the internally used B+ tree nodes currently (as of the version in Linux kernel 3.10) happen only on writes, making initial cache warm-up hard to achieve
  • multiple SSDs in a cache set – only dirty data (with the write-back policy) and metadata would be mirrored, without wasting SSD space on clean data and read caches
  • data checksumming.
