SirixDB
Fiscal Host: Open Source Collective
Create an evolutionary, immutable, accumulate only database system
Contribute
Become a financial contributor.
Financial Contributions
Top financial contributors
Organizations
GitHub Sponsors
$20 USD since Nov 2022
Individuals
Johannes Lichtenberger
$20 USD since Dec 2021
SirixDB is all of us
Our contributors 3
Thank you for supporting SirixDB.
Budget
Transparent and open finances.
Credit from GitHub Sponsors to SirixDB •
+$20.00USD
Completed
Added funds #594314
Credit from Johannes Lichtenberger to SirixDB •
+$20.00USD
Completed
Contribution #503541
sirix.io domain
from Johannes Lichtenberger to SirixDB •
-$111.00 USD
Approved
Reimbursement #57630
$
Today’s balance$34.51 USD
Total raised
$34.51 USD
Total disbursed
--.-- USD
Estimated annual budget
--.-- USD
Connect
Let’s get the ball rolling!
Conversations
Let’s get the discussion going! This is a space for the community to converse, ask questions, say thank you, and get things done together.
Let us know what you think
Published on December 6, 2021 by Johannes Lichtenberger
Do you intend to use SirixDB? Do you have feature requests and/or having a hard time setting SirixDB up or embedding the library in a JVM based project?
About
SirixDB was started as a fork of a university project (University of Konstanz), called Treetank. Dr. Marc Kramis began to build the first version in 2006 with a few students for his Ph.D. thesis under the supervision of Dr. Marcel Waldvogel at the Distributed Systems group at the University of Konstanz.
As the project was nearly reaching its end, I (Johannes Lichtenberger, who began working on Treetank early on) have forked the project and kept on working on SirixDB in my spare time. At first, I introduced diffing capabilities between revisions. Furthermore, I've created interactive visualizations of the differences between revisions for my master thesis. Next, I began to work on a binding for Brackit(.org), a query engine based on XQuery and JSONiq for different storage engines with common query optimizations. You can add custom optimizations for a storage backend quickly through AST rewrites. Next, I've built several JSON layers to store binary JSON data and used some ideas from the Adaptive Radix Trie (ART) and Hash Array Mapped Tries (HAMT) to reduce the storage cost of pages with a lot of null references.
Furthermore, I've built a path summary and several custom index structures (name/field indexes, path indexes, content-and-structure indexes, and value indexes). SirixDB stores these indexes in the leaf nodes of subtrees of the RevisionRootPage. Database pages, which do not change between revisions, are referenced.
SirixDB allows the creation of several resources in a database (a collection of resources). It currently supports the import of XML and JSON files and stores these in a huge persistent, durable tree. An UberPage is the root of the resource tree and, thus, the main entry point. Underneath, SirixDB indexes revisions in a trie. A RevisionRootPage denotes the entry point into a specific resource. A second offset-file stores all revision-timestamps and the offsets into the main log-file for a particular revision to support binary search on the timestamps in-memory to fetch the specific RevisionRootPage from the log-file. SirixDB only ever appends data. A commit issues a postorder traversal of the in-memory transaction log and writes page fragments to durable storage. Each trie's last inner node level in the tree stores a predefined number of page fragments at most to the leaf data page fragments. A page fragment consists of currently changed nodes and nodes, which fall off a sliding window. Thus, SirixDB does not simply do the COW of a whole page, but stores batched fine-granular writes. Modern, byte-addressable, durable memory may be the best fit in the future to support small random reads of the page-fragments in parallel.
Moshe Uminer, as of the Hacktoberfest 2019, kept on working mainly on REST-API clients and a GUI web frontend for SirixDB and quickly became a core team member :-)
As the project was nearly reaching its end, I (Johannes Lichtenberger, who began working on Treetank early on) have forked the project and kept on working on SirixDB in my spare time. At first, I introduced diffing capabilities between revisions. Furthermore, I've created interactive visualizations of the differences between revisions for my master thesis. Next, I began to work on a binding for Brackit(.org), a query engine based on XQuery and JSONiq for different storage engines with common query optimizations. You can add custom optimizations for a storage backend quickly through AST rewrites. Next, I've built several JSON layers to store binary JSON data and used some ideas from the Adaptive Radix Trie (ART) and Hash Array Mapped Tries (HAMT) to reduce the storage cost of pages with a lot of null references.
Furthermore, I've built a path summary and several custom index structures (name/field indexes, path indexes, content-and-structure indexes, and value indexes). SirixDB stores these indexes in the leaf nodes of subtrees of the RevisionRootPage. Database pages, which do not change between revisions, are referenced.
SirixDB allows the creation of several resources in a database (a collection of resources). It currently supports the import of XML and JSON files and stores these in a huge persistent, durable tree. An UberPage is the root of the resource tree and, thus, the main entry point. Underneath, SirixDB indexes revisions in a trie. A RevisionRootPage denotes the entry point into a specific resource. A second offset-file stores all revision-timestamps and the offsets into the main log-file for a particular revision to support binary search on the timestamps in-memory to fetch the specific RevisionRootPage from the log-file. SirixDB only ever appends data. A commit issues a postorder traversal of the in-memory transaction log and writes page fragments to durable storage. Each trie's last inner node level in the tree stores a predefined number of page fragments at most to the leaf data page fragments. A page fragment consists of currently changed nodes and nodes, which fall off a sliding window. Thus, SirixDB does not simply do the COW of a whole page, but stores batched fine-granular writes. Modern, byte-addressable, durable memory may be the best fit in the future to support small random reads of the page-fragments in parallel.
Moshe Uminer, as of the Hacktoberfest 2019, kept on working mainly on REST-API clients and a GUI web frontend for SirixDB and quickly became a core team member :-)
Our team
Johannes Lich...
Admin
Moshe Uminer
Admin