Master's project by Jason Winnebeck - Current Status | Download | Using RealDB | Progress Reports
RealDB (RDB) will be a real-time embeddable database system for data streams. It is a non-relational database that is based on a changing value of a stream of sequential, timestamped data. To support this environment, the system will have a low overhead and deliver high performance; the trade-off is a narrower approach and fewer guarantees than a traditional relational database management system (RDBMS) would deliver.
To summarize the main topics on what this work is to accomplish:
For more details, please see the latest proposal.
The RealDB project proposal is finished. The final proposal can be found here (SVN revision 81, 7-Aug-2008).
The project schedule, from the final proposal (status as of 1-Sept-2008):
| Target | Planned | Completed | Percent Complete |
|---|---|---|---|
| Preproposal | 2006-10-17 | 2006-10-17 | 100% |
| Preproposal Presentation | 2006-10-17 | 2006-10-17 | 100% |
| Proposal Approved | 2008-07-31 | 2008-08-11 | 100% |
| Milestone 1 | 2008-08-31 | 2008-08-28 | 100% |
| Milestone 2 | 2008-08-31 | 2008-09-29 | 100% (r123) |
| Milestone 3 | 2008-09-08 | 2009-01-29 | 100% (r199) |
| Milestone 4 | 2008-09-22 | 2010-02-28 | 100% |
| Milestone 5 | 2008-10-20 | 0% | |
| Milestone 6 | 2008-11-10 | ||
| Milestone 7 | 2008-11-24 | ||
| Report | 2008-12-15 | ||
| Defense | 2009-01-15 |
Latest Release: RealDB-2010-02-28-r318 (milestone 4), released on February 28, 2010.
Javadocs and be browsed online here. You may want to read on how to use RealDB.
RealDB Project - Copyright (c) 2008-2010 by Jason Winnebeck
All Rights Reserved.
My eventual goal is to release the project as open-source, but currently I have not settled on which license. Therefore for now the license is "all rights reserved" -- no copying, distribution, or derivative works are permitted with this release.
For compiling and running unit tests, the following libraries are used:
For running RealDB, the only dependency is the Antlr runtime jar.
This section is very minimalistic and will be expanded in the future.
Download the full release of RealDB; no compilation is needed. There are currently three ways to use RealDB:
To compile RealDB yourself, the most compatible method is to use the realdb.xml build file, which should work with even very old Ant versions. This method was used to generate the full distribution.
ant -file realdb.xml
Currently this method can not run unit tests. You can use the NetBeans-generated build files if you have Ant 1.7.1 or later under the build.xml file. You can use NetBeans IDE or IntelliJ IDE with the IPR project file (Community Edition is OK) to also compile and run all unit tests. The IntelliJ project files are the most maintained and up-to-date versions.
The RealDB Definition Language (RDL) is used to define RealDB databases. Here is an example RDL file:
SET blockSize = 2048
SET fileSize = 204800
SET maxStreams = 3
SET dataBlockSize = 2
CREATE STREAM Test WITH ID 1 {
value float NULL //will use SampledAlgorithm by default
}
CREATE STREAM CarSnapshots WITH ID 2 {
rpm float WITH CODEC DeadbandAlgorithm PARAMS (deadband=50.0),
speed float WITH CODEC DeadbandAlgorithm PARAMS (deadband=5),
passengers uint8 WITH CODEC StepAlgorithm,
driving boolean WITH CODEC StepAlgorithm
}
This section will be expanded more later. Brief descriptions of the elements in this file:
A DataStream is a time-ordered sequence of Records with a fixed format -- an ordered list of elements.
A single entry in a DataStream, has a timestamp and can be a "discontinuity" marker.
A specific field in the Record with a known name and type.
A reconstructed continuous interval of a DataStream that comprises of 1 or more original (possibly dropped) records.
A stream can be in a "discontinuity" state. This is different from elements individually being null and even all elements being null at the same time. The actual meaning can be defined by the user. For example a null could be "not applicable," or "collected but not present", whereas a discontinuity could signifiy ranges where the system was turned off and not collecting data at all. When a discontinuity is written, the stream is said to remain in that state until the next non-discontinuity record is written.
All records in RealDB are timestamped, and written in ascending order of time. The timestamp is a 63 bit unsigned integer. RealDB does not assume any particular unit or meaning for time other than it being linear (i.e. distance between 5 and 10 is the same as between 50 and 55), leaving the user to define both the unit (i.e. seconds versus milliseconds) and the epoch (the meaning of time 0).
The current command line tool supports the following commands. Use the help command to get more information:
| bulkLoad | Bulk loads data into a stream from a tab-separated values file. |
| create | Creates a new RealDB Database |
| describe | Shows information about a database. |
| help | Prints out a list of all commands or detailed help on a single command |
| intervals | Reads intervals in a given time range on an element within a stream. |
| read | Reads all raw records or a range of raw records from a single stream and outputs a tab-delimited output with the first row as the column headers. |
| summary | Reads intervals in a given time range on an element within a stream, and summarizes them into a single report. |
Milestone 4 release. I took a little extra time from my previous report to make the import/export file formats more robust to handle discontinuities, and I realized that I missed some API support for discontinuities in StreamIntervals.
Refer to the instructions above for how to download and run RealDB. For this release, you can run these commands:
create examples/test.rdl test.rdb describe test.rdb bulkLoad test.rdb 1 examples/test.txt bulkLoad test.rdb 2 examples/CarSnapshots.txt read test.rdb 1 10 20 read test.rdb 2 intervals test.rdb 2 rpm 50 70 summary test.rdb 2 passengers
To get the output below:
% java -jar RealDB-cli.jar
RealDB Tool Interactive mode
This version is extremely experimental and the commands will change in the next release.
You can also invoke this program with arguments to run a single command non-interactively.
Use 'help' command to get a list of commands, 'exit' to end
> create examples/test.rdl test.rdb
Created database at test.rdb
> describe test.rdb
Database /home/stu4/s30/jpw9607/realdb/realdb-2010-02-28-r318/test.rdb:
Block size : 2048
Database size : 100 blocks (204800 bytes)
Data block size: 2
Database has 2 streams:
Test (1), with 1 elements:
0 value (Float, optional) with codec SampledAlgorithm
Stream contains records from times -1 to -1
CarSnapshots (2), with 4 elements:
0 rpm (Float) with codec DeadbandAlgorithm
1 speed (Float) with codec DeadbandAlgorithm
2 passengers (UInt8) with codec StepAlgorithm
3 driving (Boolean) with codec StepAlgorithm
Stream contains records from times -1 to -1
> bulkLoad test.rdb 1 examples/test.txt
Wrote 35 records into stream Test (ID 1)
> bulkLoad test.rdb 2 examples/CarSnapshots.txt
Wrote 30 records into stream CarSnapshots (ID 2)
> read test.rdb 1 10 20
stream time value
1 10 null
1 11 11.0
1 12 12.0
1 13 13.0
1 14 14.0
1 15 15.0
1 -- --
1 17 null
1 18 18.0
1 19 19.0
1 20 20.0
Read 11 records.
> read test.rdb 2
stream time rpm speed passengers driving
2 5 0.0 0.0 0 false
2 20 0.0 0.0 1 false
2 35 0.0 0.0 2 false
2 45 0.0 0.0 3 false
2 50 500.0 0.0 3 true
2 65 600.0 0.0 3 true
2 70 950.0 5.0 3 true
2 75 1250.0 10.0 3 true
2 80 1500.0 20.0 3 true
2 85 700.0 25.0 2 true
2 90 850.0 30.0 2 true
2 95 1000.0 35.0 2 true
2 100 1200.0 40.0 2 true
2 105 1400.0 45.0 2 true
2 120 0.0 0.0 2 false
2 135 0.0 0.0 0 false
Read 16 records.
> intervals test.rdb 2 rpm 50 70
stream element start end min avg max integral
2 rpm 50 65 500.0 500.0 500.0 7500.0
2 rpm 65 70 600.0 600.0 600.0 3000.0
2 rpm 70 75 950.0 950.0 950.0 4750.0
Read 3 intervals.
> summary test.rdb 2 passengers
stream element count start end min avg max integral
2 passengers 16 5 135 0.0 0.25384615384615383 3.0 255.0
I am almost completely done with milestone 4, and a release will be coming this week. Since the last progress report, I have completed all of the "next steps":
I'm just polishing up a few things for the release:
Here is an example RDL that I will be using for this run:
SET blockSize = 2048
SET fileSize = 204800
SET maxStreams = 3
SET dataBlockSize = 2
CREATE STREAM Test WITH ID 1 {
value float //will use SampledAlgorithm by default
}
CREATE STREAM CarSnapshots WITH ID 2 {
rpm float WITH CODEC DeadbandAlgorithm PARAMS (deadband=50.0),
speed float WITH CODEC DeadbandAlgorithm PARAMS (deadband=5),
passengers uint8 WITH CODEC StepAlgorithm,
driving boolean WITH CODEC StepAlgorithm
}
There is one caveat to the completion of milestone 4: the current transaction handling, while functioning perfectly (as far as I know), does not meet the scalability requirements in my proposal as the current implementation is naive. It is linear time on the size of the database, while it should be linear time on the uncommitted transactions, up to the maximum user-specified transaction log size.
The command line tool (org.gillius.realdb.tools.cli.RealDBTool) will provide at least the following commands:
Currently I am working on completing milestone 4, which really has caused me to come up with a full solution to the issue of treating the database as a database of continuous signals/streams rather than a series of points. I did not encounter much challenges in defining a data model to represent a "stream interval" over a time range. But then, I faced a dilemma about what kind of classes of interpolation ("compression") methods to support -- what is the minimum that I need in my API to support everything that I want. There were two major problems with my original concept:
For awhile each adjustment to that design I tried would be unworkable in some way. I didn't want this to hold up my project any longer especially since I was trying to really focus on reliability and performance than complex interpolation methods. But then I finally had a breakthrough after reviewing some typical methods. Many reasonable methods didn't really require anything but the points, but they do sometimes require to know about points coming ahead in the stream. Those that do take more parameters than the control points themselves are either too complicated, I think (NURBS), can be set to, reasonable, "generic" values (Hermite). I realized that this is true only after some research on interpolation methods, the most concise and useful I found is at Paul Bourke's site. So I made two fundamental changes in my design, based on the above problems, and with the new knowledge of what the algorithms would really need:
The first change also allowed the separation of the serialization code from the compression code, which significantly simplified the design by separating the responsibilities better as well as allowing the compression code to be potentially used outside of the RealDB context. The new model for both compression and reconstruction is a stream transformation function. The "look ahead" and "look behind" features are enabled by the design by allowing the stream transformation function to be delayed. For example see the table of interactions below for a theoretical transformation of a stream with 5 samples to a stream with 3:
| Input | Output |
|---|---|
| 2.0 | null |
| 3.0 | null |
| 4.0 | 2.0 |
| 4.0 | null |
| 4.5 | 4.0 |
| null | null |
| null | 4.5 |
The compression algorithm delays the output by 2 samples (because it wants to see 2 samples ahead to determine if a sample is needed), and preserves the first and last samples.
It is expected that 0 order interpolation (deadbands, always sample) need 0 look-ahead, first order needs 1 look-ahead, and higher order like cubic or hermite need 2 or more look-ahead points. Until I thought of the better design, I was ready to say that RealDB would only support nothing more complicated than linear interpolation under the theory that anything else would be too slow. However, looking at the implementation, cubic and/or Hermite don't involve a high number of complex operations.
The other thought I had about methods like Cubic and Hermite that work on 4 control points is that the complexity to find the best points could be something like NP-complete, and I almost ruled it out on that fact, until I thought that it is conceivable to have O(n) methods that have a chance of being quite good even though not perfect. I don't intend on researching these techniques in this project -- I only want to design a framework that could support such a technique were one to design it. The example technique I used to prove to myself it could be possible to be practically useful in terms of performance is this simple shell of an heuristic point-selection algorithm for a cubic/Hermite interpolation function:
There would be some handling of edge cases in the above that I left out for simplicity, to express the point simply that such an algorithm is feasible and could be successful in eliminating a significant portion of points.
Both steps above have finished tested (via unit testing).
In the last 4 months, there was development and fixing work leading up to a graphical demonstration program that works on physical disks; demonstrated on hard drive and removable flash. The demo produced at the middle of May would occasionally fail to create the database, which was surprising given the automated testing I had done.
I sat out to create an automated test based on the demo and was able to reproduce the problem in some cases. There were two bugs:
I wondered, why didn't my tests where I test failures at every possible I/O operation detect these bugs? Because I chose parameters such that a single index block was sufficient to hold all of the indices of a stream; a supported but degenerate case where head == tail and never moves.
I fixed the two bugs and I have yet to get the demo to fail again. I updated the original tests and ensured that they failed with the old code. However, with the new code they get much farther in the test, but still fail. So, I am still finding some bugs. These tests are needed not just because it's virtually impossible to duplicate all of the write faults in a real situation, but will come in critical use when I start "optimizing" some of the naive code just to get things to "work," in order to meet the complexity goals for the project (i.e. turn O(n) recovery into O(1), for example).
Updated the milestone 3 acceptance test framework to extend to n streams, where each stream skips every x records. This allows for streams to write between 50% to 100% of the full dataset. I haven't figured out how to "test" the performance aspects -- the test looks for the literal guarantees made by RealDB. Because RealDB is allowed to reclaim space (deleting records) technically as long as there are some records and they are in order and the same records that I've written, the test can pass. I've written some statements to print that give me a "heuristic" way to check that the test is reasonable. Here is an example of the results of the full (non-corrupting) run of an 8 stream test with skips = 0, 5, 2, 3, 0, 6, 50, 90. What this means is that stream 1 skips no records (100%), stream 3 skips 1 out of every 2 records (50%) and stream 7 skips 1 out of every 50 records (98%).
Total operations performed: 19623
Stream Test0 (ID 0) has 12999 records from 37001 to 50000
Stream Test1 (ID 1) has 10399 records from 37001 to 49999
Stream Test2 (ID 2) has 6499 records from 37001 to 49999
Stream Test3 (ID 3) has 8667 records from 37000 to 50000
Stream Test4 (ID 4) has 13000 records from 37000 to 50000
Stream Test5 (ID 5) has 10833 records from 37000 to 50000
Stream Test6 (ID 6) has 12739 records from 37001 to 49999
Stream Test7 (ID 7) has 12856 records from 37000 to 50000
Note that the time ranges are fairly close together -- the deletion algorithm always picks the oldest block, which is supposed to try to keep the tail of each stream together. The fact that the streams each contain proportionally different records is also a good sign. However, it's hard to write a JUnit assertion for this since I haven't defined the precise tolerance -- because of flushing and block boundaries, the stream that outputs half of the records of another will likely not have exactly half (but close -- within a block's worth).
Every 250 iterations I output the statistics again. Below is an example of the statistics on iteration 10250 (meaning that after the 10,250th I/O operation it throws an exception and, if it was a write, changes the target block or blocks to random data).
Beginning iteration 10250 Stream Test0 (ID 0) has 12999 records from 13501 to 26500 Stream Test1 (ID 1) has 10460 records from 13424 to 26499 Stream Test2 (ID 2) has 6499 records from 13501 to 26499 Stream Test3 (ID 3) has 8666 records from 13501 to 26500 Stream Test4 (ID 4) has 12999 records from 13500 to 26499 Stream Test5 (ID 5) has 10909 records from 13408 to 26499 Stream Test6 (ID 6) has 12890 records from 13346 to 26499 Stream Test7 (ID 7) has 12854 records from 13501 to 26499
We can see from this example that the tails are pretty close together and so are the ratios, which gives a good impression that the transaction and recovery handling is not just passing the strict requirements but also is performing reasonably. An additional piece of evidence is that the total number of records is close to the non-failing example.
In my last status update I forgot to mention that not everything is clean -- there are some limitations:
Summary: Although 2 months have passed without a progress report, a lot of work has been done. The version on 25-Nov was not capable of space management (reclaiming the oldest block to write new data), nor was it capable of handling corruption. 39 revisions of code later, RealDB now has space management and is capable of handling file storage failures at any point. Tentatively I am calling milestone 3 completed, but I want to clean up a few loose ends.
As an acceptance test to confirm completion of milestone 3, created a JUnit-based automated test similar to the one in milestone 2. The key component in this test is the usage of the FailingBlockFile, which allows throwing an IOException after x calls into the API and, if the call was a write, overwrites the entire block (or set of blocks if a multi-block call) with random data. The steps in this test are as follows:
The test runs in 461 seconds on my machine to loop over failures in each of the 7969 operations. It is important to note that the 7969 operations even includes the operations to actually create the database from an empty file, so the test also tests for failures when creating the database. However, RealDB does not guarantee that you can reload such a database. If the database throws IOException before it is built, the test does not attempt to read any data and passes.
Unfortunately I had to end up implementing a lightweight transaction system in RealDB. I wanted initially to stay away from transactions and rely more on "checkpoints", which scale with the number of flush operations. However, to adhere to RealDB's guarantees, I only have to do transactions on the two data index operations: Moving a block from the free pool to an index and moving a block from one index to another (or to itself).
The key feature in the transaction system to improve the performance is to not only let multiple transactions be pending at the same time, but also allow the operations to be scheduled for a different time. Each transaction is a list of operations represented as object instances implementing java.io.Flushable. The first operation of a transaction always is to open the transaction by writing it to disk. The add operation returns a Flushable object itself of a task that must be performed if the given operation is performed. This ensures that operations are performed always in order, which is essential for allowing transaction recovery.
An example might be a transaction that requires the steps A, B, C, D, E. A coordinator gets a request to perform an operation, which is to be done as a single transaction. It schedules A, B, C, D, and E to occur in order, although none of the operations actually occur yet, through calls like "addBlock( int block, Transaction tx );" If commit is called directly on the transaction, all 5 operations will occur at that time, in order. But of course there is more than one piece of data in every block in RealDB. That necessarily means that if a block is committed on an index, for example, a whole number of transactions that may be affected.
Let's say that D is the operation to add the item to the index. The coordinator calls index.addBlock( block, tx ). The index calls tx.addAction( D ) and that method returns back an object that performs {A,B,C} in order. If index.flush() is called, say because the client wants to commit some other operation "F", this might mean that D is about to be written to disk. Therefore, before the flush occurs, the index calls its object, which performs {A,B,C}, then it performs D directly. A Transaction remebers what is already done, so if later a piece of code asks for C to happen, it looks to run {A,B,C} and sees that those are already completed, and therefore skips those steps.
I am still currently concerned somewhat about circular dependencies, so I will have to make sure that what can be a transaction is managed properly and if a circular dependency could happen, it forces an early commit of part of a transaction. In fact, this is what happens on a smaller scale when a transaction occurs to transfer a block from the end of a stream to the front of that same stream. To properly support transaction recovery, I have to make sure that the removal happens before the add -- and because the head and tail of the queue could rest in the same physical block, it is possible that the block cannot be removed before it is added. It would be possible to perform this operation entirely without a Transaction, but that would require the coordinator to have a too deep of an understanding of the DataIndex implementation.
I would like to try to solidify that I've met milestone 3 by testing a multiple stream database and testing on real flash card hardware and creating faults by actually pulling the drive out of the socket as it is writing. I have a high confidence that both situations should work because:
I would also like to extend the test a bit to check to make sure that no free blocks are unaccounted for. The database could "leak blocks" but still pass in the scenarios above.
Next Steps to complete milestone 3: