B+ tree
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)

B+ tree  

Type  Tree (data structure)  
Time complexity in big O notation  

A B+ tree is an mary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.^{[1]} The root may be either a leaf or a node with two or more children.
A B+ tree can be viewed as a Btree in which each node contains only keys (not key–value pairs), and to which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in a blockoriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,^{[1]} typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.
History[edit]
There is no single paper introducing the B+ tree concept. Instead, the notion of maintaining all data in leaf nodes is repeatedly brought up as an interesting variant. Douglas Comer notes in an early survey of Btrees (which also covers B+ trees) that the B+ tree was used in IBM's VSAM data access software, and refers to an IBM published article from 1973.^{[2]}
Structure[edit]
As with other trees, B+ trees can be represented as a collection of three types of nodes: root, internal, and leaf. These node types have the following properties:^{[3]}
 Individual nodes will have either record or children, but never both at the same time: this is the main distinction from a Btree.
 The order or branching factor b of a B+ tree measures the capacity of internal nodes, i.e. their maximum allowed number of direct child nodes. This value is constant over the entire tree.
 Internal nodes have no records, but will always have nonzero children. The actual number of children m for a given internal node is constrained such that . Each child is then referred to as for , where represents the child node at natural number index .
 Leaf nodes have no children, and instead contain the elements of the B+ tree as records. The number of records n contained in a given leaf node must satisfy the dual inequality .
 The root is typically considered to be a special type of internal node which may have as few as 2 children.^{[1]} This translates to . For example, if the order b of a B+ tree is 7, each internal node may have between and 7 children, while the root may have between 2 and 7.
 In the situation where a B+ tree is empty or contains exactly 1 node, the root instead becomes the single leaf. In this case, the number of keys n must satisfy .
A concrete example of these bounds is depicted in the table below:
Node Type  Children Type  Min Number of Children  Max Number of Children  Example  Example 

Root Node (when it is the only node in the tree)  Records  0  0–6  0–99  
Root Node  Internal Nodes or Leaf Nodes  2  b  2–7  2–100 
Internal Node  Internal Nodes or Leaf Nodes  b  4–7  50–100  
Leaf Node  Records  4–7  50–100 
Intervals in internal nodes[edit]
By definition, each value contained within the B+ tree is a key contained in exactly one leaf node. Each key is required to be directly comparable with every other key, which forms a total order.^{[4]} This enables each leaf node to keep all of its keys sorted at all times, which then enables each internal node to construct an ordered collection of intervals representing the contiguous extent of values contained in a given leaf. Internal nodes higher in the tree can then construct their own intervals, which recursively aggregate the intervals contained in their own child internal nodes. Eventually, the root of a B+ Tree represents the whole range of values in the tree, where every internal node represents a subinterval.
For this recursive interval information to be retained, internal nodes must additionally contain copies of keys for representing the least element within the interval covered by the child with index i (which may itself be an internal node, or a leaf).
Characteristics[edit]
For a border B+ tree with h levels of index:^{[citation needed]}
 The maximum number of records stored is
 The minimum number of records stored is
 The minimum number of keys is
 The maximum number of keys is
 The space required to store the tree is
 Inserting a record requires operations
 Finding a record requires operations
 Removing a (previously located) record requires operations
 Performing a range query with k elements occurring within the range requires operations
 The B+ tree structure expands/contracts as the number of records increases/decreases. There are no restrictions on the size of B+ trees. Thus, increasing usability of a database system.
 Any change in structure does not affect performance due to balanced tree properties.^{[5]}
 The data is stored in the leaf nodes and more branching of internal nodes helps to reduce the tree's height, thus, reduce search time. As a result, it works well in secondary storage devices.^{[6]}
 Searching becomes extremely simple because all records are stored only in the leaf node and are sorted sequentially in the linked list.
 We can retrieve range retrieval or partial retrieval using B+ tree. This is made easier and faster by traversing the tree structure. This feature makes B+ tree structure applied in many search methods.^{[5]}
Algorithms[edit]
Search[edit]
We are looking for a value k in the B+ Tree. This means that starting from the root, we are looking for the leaf which may contain the value k. At each node, we figure out which internal node we should follow. An internal B+ Tree node has at most children, where every one of them represents a different subinterval. We select the corresponding child via a linear search of the m entries, then when we finally get to a leaf, we do a linear search of its n elements for the desired key. Because we only traverse one branch of all the children at each rung of the tree, we achieve runtime, where N is the total number of keys stored in the leaves of the B+ tree.^{[3]}
function search(k, root) is let leaf = leaf_search(k, root) for leaf_key in leaf.keys(): if k = leaf_key: return true return false
function leaf_search(k, node) is if node is a leaf: return node let p = node.children() let l = node.left_sided_intervals() assert let m = p.len() for i from 1 to m  1: if : return leaf_search(k, p[i]) return leaf_search(k, p[m])
Note that this pseudocode uses 1based array indexing.
Insertion[edit]
 Perform a search to determine what bucket the new record should go into.
 If the bucket is not full (at most entries after the insertion), add the record.
 Otherwise, before inserting the new record
 split the bucket.
 original node has ⎡(L+1)/2⎤ items
 new node has ⎣(L+1)/2⎦ items
 Copy ⎡(L+1)/2⎤th key to the parent, and insert the new node to the parent.
 Repeat until a parent is found that need not split.
 Insert the new record into the new node.
 split the bucket.
 If the root splits, treat it as if it has an empty parent and split as outline above.
Btrees grow at the root and not at the leaves.^{[1]}
Bulkloading[edit]
Given a collection of data records, we want to create a B+ tree index on some key field. One approach is to insert each record into an empty tree. However, it is quite expensive, because each entry requires us to start from the root and go down to the appropriate leaf page. An efficient alternative is to use bulkloading.
 The first step is to sort the data entries according to a search key in ascending order.
 We allocate an empty page to serve as the root, and insert a pointer to the first page of entries into it.
 When the root is full, we split the root, and create a new root page.
 Keep inserting entries to the right most index page just above the leaf level, until all entries are indexed.
Note :
 when the rightmost index page above the leaf level fills up, it is split;
 this action may, in turn, cause a split of the rightmost index page one step closer to the root;
 splits only occur on the rightmost path from the root to the leaf level.^{[7]}
Deletion[edit]
The purpose of the delete algorithm is to remove the desired entry node from the tree structure. We recursively call the delete algorithm on the appropriate node until no node is found. For each function call, we traverse along, using the index to navigate until we find the node, remove it, and then work back up to the root.
At entry L that we wish to remove:
 If L is at least halffull, done
 If L has only d1 entries, try to redistribute, borrowing from sibling (adjacent node with same parent as L).
After the redistribution of two sibling nodes happens, the parent node must be updated to reflect this change. The index key that points to the second sibling must take the smallest value of that node to be the index key.
 If redistribute fails, merge L and sibling. After merging, the parent node is updated by deleting the index key that point to the deleted entry. In other words, if merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Note: merge could propagate to root, which means decreasing height.^{[8]}
Implementation[edit]
The leaves (the bottommost index blocks) of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient (though the aforementioned upper bound can be achieved even without this addition). This does not substantially increase space consumption or maintenance on the tree. This illustrates one of the significant advantages of a B+tree over a Btree; in a Btree, since not all keys are present in the leaves, such an ordered linked list cannot be constructed. A B+tree is thus particularly useful as a database system index, where the data typically resides on disk, as it allows the B+tree to actually provide an efficient structure for housing the data itself (this is described in^{[9]}^{: 238 } as index structure "Alternative 1").
If a storage system has a block size of B bytes, and the keys to be stored have a size of k, arguably the most efficient B+ tree is one where . Although theoretically the oneoff is unnecessary, in practice there is often a little extra space taken up by the index blocks (for example, the linked list references in the leaf blocks). Having an index block which is slightly larger than the storage system's actual block represents a significant performance decrease; therefore erring on the side of caution is preferable.
If nodes of the B+ tree are organized as arrays of elements, then it may take a considerable time to insert or delete an element as half of the array will need to be shifted on average. To overcome this problem, elements inside a node can be organized in a binary tree or a B+ tree instead of an array.
B+ trees can also be used for data stored in RAM. In this case a reasonable choice for block size would be the size of processor's cache line.
Space efficiency of B+ trees can be improved by using some compression techniques. One possibility is to use delta encoding to compress keys stored into each block. For internal blocks, space saving can be achieved by either compressing keys or pointers. For string keys, space can be saved by using the following technique: Normally the ith entry of an internal block contains the first key of block . Instead of storing the full key, we could store the shortest prefix of the first key of block that is strictly greater (in lexicographic order) than last key of block i. There is also a simple way to compress pointers: if we suppose that some consecutive blocks are stored contiguously, then it will suffice to store only a pointer to the first block and the count of consecutive blocks.
All the above compression techniques have some drawbacks. First, a full block must be decompressed to extract a single element. One technique to overcome this problem is to divide each block into subblocks and compress them separately. In this case searching or inserting an element will only need to decompress or compress a subblock instead of a full block. Another drawback of compression techniques is that the number of stored elements may vary considerably from a block to another depending on how well the elements are compressed inside each block.
Applications[edit]
Filesystems[edit]
The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories. NTFS uses B+ trees for directory and securityrelated metadata indexing. EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.^{[10]} APFS uses B+ trees to store mappings from filesystem object IDs to their locations on disk, and to store filesystem records (including directories), though these trees' leaf nodes lack sibling pointers.^{[11]}
Database Systems[edit]
Relational database management systems such as IBM Db2,^{[9]} Informix,^{[9]} Microsoft SQL Server,^{[9]} Oracle 8,^{[9]} Sybase ASE,^{[9]} and SQLite^{[12]} support this type of tree for table indices, though each such system implements the basic B+ tree structure with variations and extensions. Many NoSQL database management systems such as CouchDB^{[13]} and Tokyo Cabinet^{[14]} also support this type of tree for data access and storage.
Finding objects in a highdimensional database that are comparable to a particular query object is one of the most often utilized and yet expensive procedures in such systems.^{[citation needed]} In such situations, finding the closest neighbor using a B+ tree is productive.^{[15]}
iDistance[edit]
B+ tree is efficiently used to construct an indexed search method called iDistance. iDistance searches for k nearest neighbors (kNN) in highdimension metric spaces. The data in those highdimension spaces is divided based on space or partition strategies, and each partition has an index value that is close with the respect to the partition. From here, those points can be efficiently implemented using B+ tree, thus, the queries are mapped to single dimensions ranged search. In other words, the iDistance technique can be viewed as a way of accelerating the sequential scan. Instead of scanning records from the beginning to the end of the data file, the iDistance starts the scan from spots where the nearest neighbors can be obtained early with a very high probability.^{[16]}
NVRAM[edit]
Nonvolatile randomaccess memory (NVRAM) has been using B+ tree structure as the main memory access technique for the Internet Of Things (IoT) system because of its non static power consumption and high solidity of cell memory. B+ can regulate the trafficking of data to memory efficiently. Moreover, with advanced strategies on frequencies of some highly used leaf or reference point, the B+ tree shows significant results in increasing the endurance of database systems.^{[17]}
See also[edit]
References[edit]
 ^ Comer, Douglas (1979). "Ubiquitous BTree". ACM Computing Surveys. 11 (2): 121–137. doi:10.1145/356770.356776. S2CID 101673.
 ^ ^{a} ^{b} PollariMalmi, Kerttu. ""B+ trees"" (PDF). Computer Science, Faculty of Science, University of Helsinki. p. 3. Archived from the original (PDF) on 20210414.
 ^ Grust, Torsten (Summer 2013). ""TreeStructured Indexing: ISAM and B+trees"" (PDF). Logo der Universität Tübingen Department of Computer Science: Database Systems. p. 84. Archived from the original (PDF) on 20211128.
 ^ ^{a} ^{b} "Database Systems for Advanced Applications". Scalable Splitting of Massive Data Streams.
 ^ "[Database Systems for Advanced Applications "Update Migration: An Efficient B+ Tree for Flash Storage"]". Lecture Notes in Computer Science, Vol 5982. Springer, Berlin, Heidelberg.
 ^ "ECS 165B: Database System Implementation Lecture 6" (PDF). UC Davis CS department. April 9, 2010. pp. 21–23.
 ^ Ramakrishnan, Raghu; Johannes Gehrke (2003). Database management systems (3rd ed.). Boston: McGrawHill. ISBN 0072465638. OCLC 49977005.
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} ^{f} Ramakrishnan Raghu, Gehrke Johannes – Database Management Systems, McGrawHill Higher Education (2000), 2nd edition (en) page 267
 ^ Giampaolo, Dominic (1999). Practical File System Design with the Be File System (PDF). Morgan Kaufmann. ISBN 1558604979. Archived from the original (PDF) on 20170213. Retrieved 20140729.
 ^ "BTrees". Apple File System Reference (PDF). Apple Inc. 20200622. p. 122. Retrieved 20210310.
 ^ SQLite Version 3 Overview
 ^ CouchDB Guide (see note after 3rd paragraph)
 ^ Tokyo Cabinet reference Archived September 12, 2009, at the Wayback Machine
 ^ Database Systems for Advanced Applications. Japan. 2010.
 ^ Jagadish, H. V.; Ooi, Beng Chin; Tan, KianLee; Yu, Cui; Zhang, Rui (June 2005). "iDistance: An adaptive B + tree based indexing method for nearest neighbor search". ACM Transactions on Database Systems. 30 (2): 364–397. doi:10.1145/1071610.1071612. ISSN 03625915. S2CID 967678.
 ^ Dharamjeet; Chen, TsengYi; Chang, YuanHao; Wu, ChunFeng; Lee, ChiHeng; Shih, WeiKuan (December 2021). "Beyond WriteReduction Consideration: A WearLevelingEnabled B⁺Tree Indexing Scheme Over an NVRAMBased Architecture". IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems. 40 (12): 2455–2466. doi:10.1109/TCAD.2021.3049677. ISSN 02780070. S2CID 234157183.
External links[edit]
This article's use of external links may not follow Wikipedia's policies or guidelines. (January 2019) 
 B+ tree in Python, used to implement a list
 Dr. Monge's B+ Tree index notes
 Evaluating the performance of CSB+trees on Mutithreaded Architectures
 Effect of node size on the performance of cache conscious B+trees
 Fractal Prefetching B+trees
 Towards pB+trees in the field: implementations Choices and performance
 CacheConscious Index Structures for MainMemory Databases
 Cache Oblivious B(+)trees
 The Power of BTrees: CouchDB B+ Tree Implementation
 B+ Tree Visualization
 B +trees by Kerttu PollariMalmi
 Data Structures BTrees and B+ Trees
Implementations[edit]
 Interactive B+ Tree Implementation in C
 Interactive B+ Tree Implementation in C++
 Memory based B+ tree implementation as C++ template library
 2019 improvement of previous
 Stream based B+ tree implementation as C++ template library
 Open Source JavaScript B+ Tree Implementation
 Perl implementation of B+ trees
 Java/C#/Python implementations of B+ trees
 C# B+ tree and related "AList" data structures
 File based B+Tree in C# with threading and MVCC support
 Fast semipersistent inmemory B+ Tree in TypeScript/JavaScript, MIT License
 JavaScript B+ Tree, MIT License
 JavaScript B+ Tree, Interactive and Open Source