Table of Contents
The HeapBackend stores data in a single database
file. New records are added by appending them to the end of the file, and records are deleted
by marking them as deleted in the file. After database records have been deleted,
the file contains holes where the deleted records were stored. The holes
are purged permanently if the backend is defragmented by calling the
Database's compact()
method.
When the client wants to search for a record in the heap backend, the backend iterates through the entire database file from the beginning until the requested record is found. The makes searches slow for large databases. Searches in large databases can be speeded up considerably by adding an index. See below.
See the preceding chapter for examples of how a HeapBackend can be created.
The ConstantRecordSizeHeapBackend is a variant of the HeapBackend for data records that have a constant size. Record insertion and searches works just like for the heap database – records are appended to the end of the file and searches involve scanning through the database file until the requested record is found. The difference between the two backends is in record deletion. This backend deletes a record by copying the last record in the database file to the place of the record to delete and then truncating the database file with one record. Hence, the database file never becomes defragmented.
Example 5.1. Creating a database using a constant record size heap backend
// Create a SimpleDatabase that uses a ConstantRecordSizeHeapBackend on the // database File f. The database contains Integer keys and Long values. // A LogAdapter that logs to stdout and stderr. LogAdapterHolder lah = new LogAdapterHolder( new StdOutLogAdapter()); Database<Integer, Long> db = new SimpleDatabase<Integer, Long, Long>( // Use a ConstantRecordSizeHeapBackendBuilder object to build the backend. new ConstantRecordSizeHeapBackendBuilder<Integer, Long>(). setKeySerializer(IntegerSerializer.INSTANCE). setValueSerializer(LongSerializer.INSTANCE). setLogAdapterHolder(lah). // Adapt the File to a ReadWritableFile. create(new ReadWritableFileAdapter(f)), lah); // Insert a few values db.insert(1, 1L); db.insert(2, 1L); db.insert(3, 2L); db.insert(4, 3L); db.insert(5, 5L); db.insert(6, 8L); db.insert(7, 13L);
The ConstantRecordSizeBPlusTreeBackend stores its records, which must be of a constant size, in a B+ Tree (using the BPlusTree class). B+ Trees provide fast inserts, updates and searches for a small overhead in metadata. Read more about B+ Tree properties in the Wikipedia article on B+ Trees.
Example 5.2. Creating a database using a constant record size B+ Tree backend
// Create a SimpleDatabase that uses a ConstantRecordSizeBPlusTreeBackend on the // database File f. The database contains Integer keys and String values. The // strings may be up to nine characters long. Longer strings are truncated to // nine characters. // A LogAdapter that logs to stdout and stderr. LogAdapterHolder lah = new LogAdapterHolder( new StdOutLogAdapter()); // A SimpleDatabase that have Integer keys and String values. Database<Integer, String> db = new SimpleDatabase<Integer, String, KeyAndValue<Integer, String>>( new ConstantRecordSizeBPlusTreeBackend<Integer, String>( // The NodeRepository is used by the BPlusTree // Use a caching node repository. new LruCacheNodeRepository<Integer, String>( // Use a FileBackedNodeRepositoryBuilder to build the file-backed // node repository. // The Integer keys are Comparable, so we don't have to use a Comparator. new FileBackedNodeRepositoryBuilder<Integer, String>(). // Use a node size of 4096 bytes for the B+ Tree. For optimal // performance, this should be the same as the allocation size of the // underlying file system. setNodeSizeStrategy(new FixedSizeNodeSizeStrategy(4096)). // Have to use a null-aware serializer for the B+ Tree keys. setKeySerializer(IntegerNullSerializer.INSTANCE). // This value serializer will truncate text longer than 9 characters. setValueSerializer( new ConstantSizeStringSerializer( Charset.forName("ISO-8859-1"), 9, true)). // This pointer size sets the maximum size of the B+ Tree to // 2^(2*8) = 65536 bytes. setInternalPointerSize(2). setLogAdapterHolder(lah). // Adapt the File to a ReadWritableFile. create(new ReadWritableFileAdapter(f), false), // Set the maximum LRU cache size to 16 nodes. 16), false, lah), lah); // Insert a few values db.insert(1, "apple"); db.insert(2, "banana"); // This value will be truncated to grapefrui when written to disk. As long as // its node remains in the LRU cache, it will be grapefruit, though. Be careful. db.insert(3, "grapefruit"); db.insert(4, "orange"); db.insert(5, "tangerine"); db.insert(6, "pear");
The MapBackend stores its data in a Map. It is perhaps most useful for testing.
The BPlusTreeIndexBackend creates a B+ Tree index for the positions of keys in another, proxied DatabaseBackend object. It can be used to speed up searches in backends that are slow to search through, such as the heap backend. The index is kept in a separate file.
The BPlusTree object requires that B+ Tree values should be of a constant size. To be able to accommodate proxied backend with both constant and variable size keys, the BPlusTreeIndexBackend uses a Hasher to hash the keys into a constant size value. If the proxied backend uses constant-size keys, a value-preserving hasher such as the LongToLongHasher or the ConstantSizeValueHasher can be used.
Example 5.3. Creating a B+ Tree index for a heap backend
// Create a new SimpleDatabase that uses a HeapBackend // with a BPlusTreeIndexBackend to speed up searches in the database. // The database will store String keys and values. // - f is a File where the database data will be stored. // - indf is a File where the B+ Tree index data will be stored. // A LogAdapter that logs to stdout and stderr. LogAdapterHolder lah = new LogAdapterHolder( new StdOutLogAdapter()); Database<String, String> db = new SimpleDatabase<String, String, Long>( // Use a BPlusTreeIndexBackendBuilder object to build the // BPlusTreeIndexBackend. new BPlusTreeIndexBackendBuilder<String, Long>(). // Hash the keys to a long value (eight bytes). // Beware of the birthday paradox! Use sufficiently large hash values. setKeyHasher(StringToLongHasher.INSTANCE). create( // Use a HeapBackendBuilder. new HeapBackendBuilder<String, String>(). setKeySerializer(StringSerializer.INSTANCE). setValueSerializer(StringSerializer.INSTANCE). // Adapt f to a ReadWritableFile. create(new ReadWritableFileAdapter(f)), // The BPlusTree is <Long, Long> since it has Long keys (the key // hashes) and Long values (the positions in the HeapBackend). new BPlusTree<Long,Long>( // Use a LRU caching NodeRepository. new LruCacheNodeRepository<Long, Long>( // Use a FileBackedNodeRepositoryBuilder to build the // node repository. new FileBackedNodeRepositoryBuilder<Long, Long>(). // Use a fixed node size of 4096 bytes. This is a common file // system allocation size. setNodeSizeStrategy(new FixedSizeNodeSizeStrategy(4096)). // Have to use a null-aware serializer for the keys. // This particular serializer uses Long.MIN_VALUE to represent // null, which means that that key value cannot be used at all in // the database. setKeySerializer(LongNullSerializer.INSTANCE). setValueSerializer(LongSerializer.INSTANCE). // An internal pointer size of 3 bytes sets the maximum size of // the B+ Tree to 2^(3*8) = 16777216 bytes. setInternalPointerSize(3). setLogAdapterHolder(lah). create(new ReadWritableFileAdapter(indf), false), // Use a cache size of 16 tree nodes. 16), // We don't have to use a key Comparator (null) since the tree's keys // are Comparable themselves (Long:s). null, lah)), lah); // Insert a record. db.insert("apples", "oranges");
The LruCacheBackend stores results from searches and updates to a proxied DatabaseBackend object in a LRU (Least Recently Used) cache with a configurable maximum size. If the cache is full when the backend tries to add a new object to it, the least recently object is purged from the cache to make room for the new object.
The backend also keeps a cache for negative search results; searches that did not find the key that the client was looking for.
Example 5.4. Creating a LRU cache and a B+ Tree index for a heap backend
// Create a new SimpleDatabase that uses a HeapBackend // with a BPlusTreeIndexBackend and a LruCacheBackend // to speed up searches in the database. // The database will store String keys and values. // - f is a File where the database data will be stored. // - indf is a File where the B+ Tree index data will be stored. // A LogAdapter that logs to stdout and stderr. LogAdapterHolder lah = new LogAdapterHolder( new StdOutLogAdapter()); Database<String, String> db = new SimpleDatabase<String, String, Long>( new LruCacheBackend<String, String, Long>( // Use a BPlusTreeIndexBackendBuilder object to build the // BPlusTreeIndexBackend. new BPlusTreeIndexBackendBuilder<String, BigInteger>(). // Hash the strings to a twelve-byte value, with the most significant // bit in the most significant byte removed. The most significant bit is // used to represent a null value by the FixedSizeBigIntegerNullSerializer // used below instead. setKeyHasher(new StringToBigIntegerHasher(12, 7)). create( // Use a HeapBackendBuilder. new HeapBackendBuilder<String, String>(). setKeySerializer(StringSerializer.INSTANCE). setValueSerializer(StringSerializer.INSTANCE). // Adapt f to a ReadWritableFile. create(new ReadWritableFileAdapter(f)), // The BPlusTree is <Long, BigInteger> since it has BigInteger keys // (the key hashes) and Long values (the records' positions in the // HeapBackend). new BPlusTree<BigInteger, Long>( // Use a LRU caching NodeRepository. new LruCacheNodeRepository<BigInteger, Long>( // Use a FileBackedNodeRepositoryBuilder to build the // node repository. new FileBackedNodeRepositoryBuilder<BigInteger, Long>(). // Use a fixed node size of 4096 bytes. setNodeSizeStrategy(new FixedSizeNodeSizeStrategy(4096)). // Have to use a null-aware serializer for the keys. // // This serializer uses a null value that has the most // significant bit in the most significant byte set. We chose to // not use that particular bit for our hashes when we created // the StringToBigIntegerHasher above. By doing that we // have made sure that null won't collide with any real value. // // Set the total data size of the serializer to 13 bytes. The // first byte of the serialized data is the length of the // BigInteger data. setKeySerializer(new FixedSizeBigIntegerNullSerializer(13)). setValueSerializer(LongSerializer.INSTANCE). // An internal pointer size of 3 bytes sets the maximum size of // the B+ Tree to 2^(3*8) = 16777216 bytes. setInternalPointerSize(3). setLogAdapterHolder(lah). create(new ReadWritableFileAdapter(indf), false), // Use a cache size of 16 tree nodes. 16), // We don't have to use a key Comparator (null) since the tree's keys // are Comparable themselves (BigInteger:s). null, lah)), // The cache size and the negative cache size false, 256, 32), lah); // Insert a record. db.insert("Fiat", "Ritmo");