Flash‑Based Storage + In‑Memory Performance
An ACID-compliant database built for in-memory analytics speed that falls back gracefully to flash-based storage for out-of-core processing. It features fast code generation, low-latency query execution, and drop-in PostgreSQL compatibility.
Low-Overhead Buffer Manager
Umbra's LeanStore-based buffer manager provides in-memory performance for cached data and degrades gracefully when the data does not fit into RAM. It uses variable-size pages to effectively bridge the performance gap between background storage and main memory.
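To make the underlying idea concrete, here is a minimal C++ sketch of pointer swizzling with power-of-two page size classes. The names and the trivial "load" path are assumptions for the example, not Umbra's actual buffer manager:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// A swizzled pointer ("swip"): the lowest bit tags whether the value holds a
// raw pointer to a resident page (0) or a cold on-disk page id (1).
struct Swip {
    uint64_t raw = 0;
    bool isCold() const { return raw & 1; }
    void* ptr() const { return reinterpret_cast<void*>(raw); }
    uint64_t pageId() const { return raw >> 1; }
    void pointTo(void* p) { raw = reinterpret_cast<uint64_t>(p); }
    void evictTo(uint64_t pid) { raw = (pid << 1) | 1; }
};

// Pages come in power-of-two size classes (64 KiB, 128 KiB, ...), so a large
// tuple or index node can live on one page instead of an overflow chain.
constexpr std::size_t basePageSize = 64 * 1024;

void* allocatePage(unsigned sizeClass) {
    // Alignment to the base page size keeps the tag bit of real pointers zero.
    return std::aligned_alloc(basePageSize, basePageSize << sizeClass);
}

// Fixing a page takes the hot path (a pointer dereference) when the swip is
// swizzled, and falls back to an I/O read when it is cold.
void* fix(Swip& s) {
    if (!s.isCold()) return s.ptr();   // hot: just chase the pointer
    void* page = allocatePage(0);      // cold: allocate a frame (leaks in this sketch)
    // ... read the page for s.pageId() from flash storage into 'page' ...
    s.pointTo(page);                   // swizzle for future accesses
    return page;
}

int main() {
    Swip s;
    s.pointTo(allocatePage(1));        // a hot 128 KiB page
    std::printf("resident? %s\n", s.isCold() ? "no" : "yes");
}
```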
Low-Latency Query Compilation
Umbra's query compiler adaptively switches between two modes: direct machine code generation gets query execution started instantly, while longer-running queries automatically switch to a higher gear with the help of the LLVM optimizing compiler.
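The mode switch can be pictured as hot-swapping the execution backend mid-query. The following sketch uses stand-in functions instead of real bytecode and LLVM backends, and is only meant to show the gist:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

using Step = long (*)(long);                  // one morsel of query work

long interpretStep(long x) { return x + 1; }  // stand-in for cheap direct code
long compiledStep(long x) { return x + 1; }   // stand-in for LLVM-optimized code

std::atomic<Step> activeBackend{interpretStep};

int main() {
    // Pretend an LLVM compile of the query finishes in the background.
    std::thread compiler([] {
        // ... run the optimizing compiler on the query IR here ...
        activeBackend.store(compiledStep, std::memory_order_release);
    });

    long sum = 0;
    for (long morsel = 0; morsel < 1'000'000; ++morsel)
        sum = activeBackend.load(std::memory_order_acquire)(sum); // whichever backend is current
    compiler.join();
    std::printf("%ld\n", sum);
}
```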
Compact Intermediate Representation
Our custom intermediate representation uses a finely tuned data layout tailored to fast program generation and basic optimizations. It lays the foundation for our multi-level debugger.
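A compact IR of this kind can be approximated by fixed-size instruction records in one flat array, with operands referring to earlier instructions by index so that code generation reduces to appends. The toy dialect below is purely illustrative:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Op : uint32_t { Const, Add, Mul };

// One fixed-size record per instruction; operands name earlier instructions
// by index, so emitting code is just bump-appending to a vector.
struct Inst {
    Op op;
    uint32_t a, b;            // value-ids, or for Const the literal in 'a'
};

uint32_t emit(std::vector<Inst>& ir, Op op, uint32_t a, uint32_t b = 0) {
    ir.push_back({op, a, b});
    return uint32_t(ir.size() - 1);
}

// A trivial interpreter over the flat stream, evaluating in one forward pass.
int64_t eval(const std::vector<Inst>& ir) {
    std::vector<int64_t> val(ir.size());
    for (size_t i = 0; i < ir.size(); ++i) {
        const Inst& in = ir[i];
        switch (in.op) {
            case Op::Const: val[i] = in.a; break;
            case Op::Add:   val[i] = val[in.a] + val[in.b]; break;
            case Op::Mul:   val[i] = val[in.a] * val[in.b]; break;
        }
    }
    return val.back();
}

int main() {
    std::vector<Inst> ir;
    uint32_t two = emit(ir, Op::Const, 2), three = emit(ir, Op::Const, 3);
    uint32_t sum = emit(ir, Op::Add, two, three);
    emit(ir, Op::Mul, sum, two);
    std::printf("%lld\n", (long long)eval(ir));   // prints 10
}
```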
Umbra features ACID-compliant transaction execution. Queries are specified in SQL, including extensions such as window functions, and Umbra works as a drop-in replacement for PostgreSQL.
Computational Database beyond SQL
Umbra provides an efficient approach to user-defined functions. The system automatically parallelizes user functions, integrates them deeply into the generated code, and executes them in a morsel-driven fashion. This makes it easy to extend Umbra with advanced operations, e.g., gradient-descent and k-means algorithms.
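A minimal sketch of morsel-driven execution of a user function, using assumed names rather than Umbra's UDF API: workers repeatedly claim small ranges ("morsels") of the input via an atomic cursor and apply the function tuple by tuple.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

void parallelForEach(std::vector<double>& data, const std::function<void(double&)>& udf) {
    std::atomic<size_t> cursor{0};
    const size_t morsel = 1024;                      // tuples per morsel
    auto worker = [&] {
        for (;;) {
            size_t begin = cursor.fetch_add(morsel); // claim the next morsel
            if (begin >= data.size()) return;
            size_t end = std::min(begin + morsel, data.size());
            for (size_t i = begin; i < end; ++i) udf(data[i]);
        }
    };
    std::vector<std::thread> pool;
    unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}

int main() {
    std::vector<double> column(1'000'000, 1.0);
    parallelForEach(column, [](double& v) { v = v * 2 + 1; }); // the "user function"
    std::printf("%f\n", column[0]);
}
```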
From the Creators of HyPer
Prof. Dr. Thomas Neumann
Head of the Chair for Data Science and Engineering
Prof. Dr. Viktor Leis
Head of the Databases and Information Systems Group at Friedrich Schiller University Jena
Umbra: A Disk-Based System with In-Memory Performance
CIDR 2020 | Thomas Neumann and Michael Freitag | January 13, 2020
The increases in main-memory sizes over the last decade have made pure in-memory database systems feasible, and in-memory systems offer unprecedented performance. However, DRAM is still relatively expensive, and the growth of main-memory sizes has slowed down. In contrast, the prices for SSDs have fallen substantially in the last years, and their read bandwidth has increased to gigabytes per second. This makes it attractive to combine a large in-memory buffer with fast SSDs as storage devices, combining the excellent performance for the in-memory working set with the scalability of a disk-based system. In this paper we present the Umbra system, an evolution of the pure in-memory HyPer system towards a disk-based, or rather SSD-based, system. We show that by introducing a novel low-overhead buffer manager with variable-size pages we can achieve comparable performance to an in-memory database system for the cached working set, while handling accesses to uncached data gracefully. We discuss the changes and techniques that were necessary to handle the out-of-memory case gracefully and with low overhead, offering insights into the design of a memory-optimized disk-based system.
JSON Tiles: Fast Analytics on Semi-Structured Data
SIGMOD 2021 | Dominik Durner, Viktor Leis, and Thomas Neumann | June 25, 2021
Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently - often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.
Building Advanced SQL Analytics From Low-Level Plan Operators
SIGMOD 2021 | André Kohn, Viktor Leis, and Thomas Neumann | June 25, 2021
Analytical queries virtually always involve aggregation and statistics. SQL offers a wide range of functionalities to summarize data such as associative aggregates, distinct aggregates, ordered-set aggregates, grouping sets, and window functions. In this work, we propose a unified framework for advanced statistics that composes all flavors of complex SQL aggregates from low-level plan operators. These operators can reuse materialized intermediate results, which decouples monolithic aggregation logic and speeds up complex multi-expression queries. The contribution is therefore twofold: our framework modularizes aggregate implementations, and outperforms traditional systems whenever multiple aggregates are combined. We integrated our approach into the high-performance database system Umbra and experimentally show that we compute complex aggregates faster than the state-of-the-art HyPer system.
To Partition, or Not to Partition, That is the Join Question in a Real System
SIGMOD 2021 | Maximilian Bandle, Jana Giceva, and Thomas Neumann | June 25, 2021
An efficient implementation of a hash join has been a highly researched problem for decades. Recently, the radix join has been shown to have superior performance over the alternatives (e.g., the non-partitioned hash join), albeit on synthetic microbenchmarks. Therefore, it is unclear whether one can simply replace the hash join in an RDBMS or use the radix join as a performance booster for selected queries. If the latter, it is still unknown when one should rely on the radix join to improve performance. In this paper, we address these questions, show how to integrate the radix join in Umbra, a code-generating DBMS, and make it competitive for selective queries by introducing a Bloom-filter based semi-join reducer. We have evaluated how well it runs when used in queries from more representative workloads like TPC-H. Surprisingly, the radix join brings a noticeable improvement in only one out of all 59 joins in TPC-H. Thus, with an extensive range of microbenchmarks, we have isolated the effects of the most important workload factors and synthesized the range of values where partitioning the data for the radix join pays off. Our analysis shows that the benefit of data partitioning quickly diminishes as soon as we deviate from the optimal parameters, and even late materialization rarely helps in real workloads. We thus conclude that integrating the radix join within a code-generating database rarely justifies the increase in code and optimizer complexity and advise against it for processing real-world workloads.
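For readers unfamiliar with semi-join reducers, the toy Bloom filter below shows how probe-side tuples can be discarded before the expensive join; the two-hash-function design and sizing are assumptions for the example, not the paper's configuration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct BloomFilter {
    std::vector<uint64_t> bits;
    explicit BloomFilter(std::size_t n) : bits((n * 8 + 63) / 64) {}  // ~8 bits per key
    static uint64_t mix(uint64_t k, uint64_t seed) {
        k ^= seed; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33; return k;
    }
    void insert(uint64_t key) {
        for (uint64_t s : {1ULL, 2ULL}) {             // two hash functions
            uint64_t h = mix(key, s) % (bits.size() * 64);
            bits[h / 64] |= 1ULL << (h % 64);
        }
    }
    bool mayContain(uint64_t key) const {             // false positives possible, no false negatives
        for (uint64_t s : {1ULL, 2ULL}) {
            uint64_t h = mix(key, s) % (bits.size() * 64);
            if (!(bits[h / 64] & (1ULL << (h % 64)))) return false;
        }
        return true;
    }
};

int main() {
    std::vector<uint64_t> build = {3, 7, 42};         // selective build side
    BloomFilter bf(build.size());
    for (uint64_t k : build) bf.insert(k);
    std::size_t survivors = 0;
    for (uint64_t probe = 0; probe < 1000; ++probe)   // large probe side
        if (bf.mayContain(probe)) ++survivors;        // only these reach the join
    std::printf("%zu of 1000 probe tuples survive\n", survivors);
}
```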
Self-Tuning Query Scheduling for Analytical Workloads
SIGMOD 2021 | Benjamin Wagner, André Kohn, and Thomas Neumann | June 25, 2021
Most database systems delegate scheduling decisions to the operating system. While such an approach simplifies the overall database design, it also entails problems. Adaptive resource allocation becomes hard in the face of concurrent queries. Furthermore, incorporating domain knowledge to improve query scheduling is difficult. To mitigate these problems, many modern systems employ forms of task-based parallelism. The execution of a single query is broken up into small, independent chunks of work (tasks). Now, fine-grained scheduling decisions based on these tasks are the responsibility of the database system. Despite being commonplace, little work has focused on the opportunities arising from this execution model. In this paper, we show how task-based scheduling in database systems opens up new areas for optimization. We present a novel lock-free, self-tuning stride scheduler that optimizes query latencies for analytical workloads. By adaptively managing query priorities and task granularity, we provide high scheduling elasticity. By incorporating domain knowledge into the scheduling decisions, our system is able to cope with workloads that other systems struggle with. Even at high load, we retain near optimal latencies for short running queries. Compared to traditional database systems, our design often improves tail latencies by more than 10x.
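The core of stride scheduling is easy to state in code. This sketch omits the paper's lock-free, self-tuning machinery and shows only the proportional-share mechanism: the query with the smallest pass value runs next and is charged its stride per task, so CPU time converges to the ticket proportions.

```cpp
#include <cstdio>
#include <queue>
#include <vector>

struct Query {
    const char* name;
    long stride;      // bigNumber / tickets: fewer tickets => larger stride
    long pass = 0;
};

int main() {
    std::vector<Query> queries = {{"short-query", 10}, {"long-scan", 40}};
    auto cmp = [](const Query* a, const Query* b) { return a->pass > b->pass; };
    std::priority_queue<Query*, std::vector<Query*>, decltype(cmp)> ready(cmp);
    for (auto& q : queries) ready.push(&q);

    for (int task = 0; task < 10; ++task) {       // dispatch ten tasks
        Query* next = ready.top(); ready.pop();   // smallest pass goes first
        std::printf("run one task of %s (pass=%ld)\n", next->name, next->pass);
        next->pass += next->stride;               // charge the query for the task
        ready.push(next);
    }
}
```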
Tidy Tuples and Flying Start: Fast Compilation and Fast Execution of Relational Queries in Umbra
VLDBJ | Timo Kersten, Viktor Leis, and Thomas Neumann | January 1, 2021
Although compiling queries to efficient machine code has become a common approach for query execution, a number of newly-created database system projects still refrain from using compilation. It is sometimes claimed that the intricacies of code generation make compilation-based engines too complex. Also, a major barrier for adoption, especially for interactive ad-hoc queries, is long compilation time. In this paper, we examine all stages of compiling query execution engines and show how to reduce compilation overhead. We incorporate the lessons learned from a decade of generating code in HyPer into a design that manages complexity and yields high speed. First, we introduce a code generation framework that establishes abstractions to manage complexity, yet generates code in a single fast pass. Second, we present a program representation whose data structures are tuned to support fast code generation and compilation. Third, we introduce a new compiler backend that is optimized for minimal compile time, and simultaneously, yields superior execution performance to competing approaches, e.g., Volcano-style or bytecode interpretation. We implemented these optimizations in our database system Umbra to show that it is possible to unite fast compilation and fast execution. Indeed, Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB and PostgreSQL. At the same time, on large data sets, its throughput is on par with the state-of-the-art compiling system HyPer.
Mosaic: A Budget-Conscious Storage Engine for Relational Database Systems
VLDB 2020 | Lukas Vogel, Alexander van Renen, Satoshi Imamura, Viktor Leis, Thomas Neumann, and Alfons Kemper | August 30, 2020
Relational database systems are purpose-built for a specific storage device class (e.g., HDD, SSD, or DRAM). They do not cope well with the multitude of storage devices that are competitive at their price 'sweet spots'. To make use of different storage device classes, users have to resort to workarounds, such as storing data in different tablespaces. A lot of research has been done on heterogeneous storage frameworks for distributed big data query engines. These engines scale well for big data sets but are often CPU- or network-bound. Both approaches only maximize performance for previously purchased storage devices. We present Mosaic, a storage engine for scan-heavy workloads on RDBMS that manages devices in a tierless pool and provides device purchase recommendations for a specified workload and budget. In contrast to existing systems, Mosaic generates a performance/budget curve that is Pareto-optimal, along which the user can choose. Our approach uses device models and linear optimization to find a data placement solution that maximizes I/O throughput for the workload. Our evaluation shows that Mosaic provides a higher throughput at the same budget or a similar throughput at a lower budget than the state-of-the-art approaches of big data query engines and RDBMS.
Meet Me Halfway: Split Maintenance of Continuous Views
VLDB 2020 | Christian Winter, Tobias Schmidt, Thomas Neumann, and Alfons Kemper | August 30, 2020
From Industry 4.0-driven factories to real-time trading algorithms, businesses depend on analytics on high-velocity real-time data. Often these analytics are performed not in dedicated stream processing engines but on views within a general-purpose database to combine current with historical data. However, traditional view maintenance algorithms are not designed with both the volume and velocity of data streams in mind.
In this paper, we propose a new type of view specialized for queries involving high-velocity inputs, called continuous view. The key component of continuous views is a novel maintenance strategy, splitting the work between inserts and queries. By performing initial parts of the view's query for each insert and the remainder at query time, we achieve both high input rates and low query latency. Further, we keep the memory overhead of our views small, independent of input velocity. To demonstrate the practicality of this strategy, we integrate continuous views into our Umbra database system. We show that split maintenance can outperform even dedicated stream processing engines on analytical workloads, all while still offering similar insert rates. Compared to modern materialized view maintenance approaches, such as deferred and incremental view maintenance, that often need to materialize expensive deltas, we achieve up to an order of magnitude higher insert throughput.
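The split between insert-time and query-time work can be illustrated for a simple SUM view; the class and method names below are assumptions for the example, not the paper's interface.

```cpp
#include <cstdio>
#include <unordered_map>

struct ContinuousView {
    std::unordered_map<int, long> merged;  // view state as of the last query
    std::unordered_map<int, long> delta;   // partial aggregates since then

    void onInsert(int key, long value) {   // fast path, per streamed tuple
        delta[key] += value;               // partial SUM only; no full refresh
    }
    long query(int key) {                  // finish the maintenance lazily
        for (auto& [k, v] : delta) merged[k] += v;
        delta.clear();
        return merged[key];
    }
};

int main() {
    ContinuousView view;
    for (int i = 0; i < 5; ++i) view.onInsert(/*key=*/1, /*value=*/10);
    std::printf("SUM for key 1 = %ld\n", view.query(1));  // prints 50
}
```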
Adopting Worst-Case Optimal Joins in Relational Database Systems
VLDB 2020 | Michael Freitag, Maximilian Bandle, Tobias Schmidt, Alfons Kemper, and Thomas Neumann | August 30, 2020
Worst-case optimal join algorithms are attractive from a theoretical point of view, as they offer asymptotically better runtime than binary joins on certain types of queries. In particular, they avoid enumerating large intermediate results by processing multiple input relations in a single multiway join. However, existing implementations incur a sizable overhead in practice, primarily since they rely on suitable ordered index structures on their input. Systems that support worst-case optimal joins often focus on a specific problem domain, such as read-only graph analytic queries, where extensive precomputation allows them to mask these costs.
In this paper, we present a comprehensive implementation approach for worst-case optimal joins that is practical within general-purpose relational database management systems supporting both hybrid transactional and analytical workloads. The key component of our approach is a novel hash-based worst-case optimal join algorithm that relies only on data structures that can be built efficiently during query execution. Furthermore, we implement a hybrid query optimizer that intelligently and transparently combines both binary and multi-way joins within the same query plan. We demonstrate that our approach far outperforms existing systems when worst-case optimal joins are beneficial while sacrificing no performance when they are not.
On another level: how to debug compiling query engines
DBTest 2020 | Timo Kersten and Thomas Neumann | June 19, 2020
Compilation-based query engines generate and compile code at runtime, which is then run to get the query result. In this process there are two levels of source code involved: The code of the code generator itself and the code that is generated at runtime. This can make debugging quite indirect, as a fault in the generated code was caused by an error in the generator. To find the error, we have to look at both, the generated code and the code that generated it.
Current debugging technology is not equipped to handle this situation. For example, GNU’s gdb only offers facilities to inspect one source level, but not multiple source levels. Also, current debuggers are not able to reconstruct additional program state for further source levels, so context is missing during debugging.
In this paper, we show how to build a multi-level debugger for generated queries that solves these issues. We propose to use a time-travelling debugger to provide context information for compile-time and runtime, thus providing full interactive debugging capabilities for every source level. We also present how to build such a debugger with low engineering effort by combining existing tool chains.
Concurrent online sampling for all, for free
DaMoN 2020 | Altan Birler, Bernhard Radke, and Thomas Neumann | June 15, 2020
Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load.
We introduce a novel highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.
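Classic reservoir sampling (Algorithm R) conveys what "online" means here: the sample is maintained on every insert and stays uniform without ever rescanning the table. The paper's contribution, making this scalable under high concurrent load, is beyond this single-threaded sketch.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

struct Reservoir {
    std::vector<int64_t> sample;
    size_t capacity, seen = 0;
    std::mt19937_64 rng{42};

    explicit Reservoir(size_t k) : capacity(k) {}
    void insert(int64_t tuple) {               // called on every table insert
        ++seen;
        if (sample.size() < capacity) { sample.push_back(tuple); return; }
        std::uniform_int_distribution<size_t> d(0, seen - 1);
        size_t j = d(rng);
        if (j < capacity) sample[j] = tuple;   // replace with probability k/seen
    }
};

int main() {
    Reservoir r(100);
    for (int64_t t = 0; t < 1'000'000; ++t) r.insert(t);
    std::printf("fresh %zu-tuple sample after %zu inserts\n", r.sample.size(), r.seen);
}
```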
Scalable and Robust Latches for Database Systems
DaMoN 2020 | Jan Böttcher, Viktor Leis, Jana Giceva, Thomas Neumann, and Alfons Kemper | June 15, 2020
Multi-core scalability is one of the most important features for database systems running on today’s hardware. Not surprisingly, the implementation of locks is paramount to achieving efficient and scalable synchronization. In this work, we identify the key database-specific requirements for lock implementations and evaluate them using both micro-benchmarks and full-fledged database workloads. The results indicate that optimistic locking has superior performance in most workloads due to its minimal overhead and latency. By complementing optimistic locking with a pessimistic shared mode lock we demonstrate that we can also process HTAP workloads efficiently. Finally, we show how lock contention can be handled gracefully without slowing down the uncontended fast path or increasing space requirements by using a lightweight parking lot infrastructure.
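A version-based optimistic latch captures the key property that readers never write to the lock word. This simplified sketch omits the pessimistic shared mode and the parking-lot infrastructure described in the paper.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

struct OptimisticLatch {
    std::atomic<uint64_t> version{0};       // even = unlocked, odd = writer active

    uint64_t readLock() const {             // wait until no writer is active
        uint64_t v;
        while ((v = version.load(std::memory_order_acquire)) & 1) {}
        return v;
    }
    bool validate(uint64_t v) const {       // true iff no writer intervened
        return version.load(std::memory_order_acquire) == v;
    }
    void writeLock() {
        uint64_t v;
        do { v = version.load() & ~1ULL; }  // expect an unlocked (even) value
        while (!version.compare_exchange_weak(v, v + 1));
    }
    void writeUnlock() { version.fetch_add(1, std::memory_order_release); }
};

int main() {
    OptimisticLatch latch;
    int data = 1;
    uint64_t v = latch.readLock();
    int copy = data;                        // optimistic read of protected data
    if (latch.validate(v))                  // retry would be needed on failure
        std::printf("read %d without writing to the latch\n", copy);
}
```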
Experimental Study of Memory Allocation for High-Performance Query Processing
ADMS 2019 | Dominik Durner, Viktor Leis, and Thomas Neumann | August 26, 2019
Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators highly influence performance, scalability, memory efficiency and memory fairness to other processes. In this work, we provide the first comprehensive experimental study that analyzes and explains the impact of memory allocation for high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7x on a 4-socket Intel Xeon server.
On the Impact of Memory Allocation on High-Performance Query Processing
DaMoN 2019 | Dominik Durner, Viktor Leis, and Thomas Neumann | July 1, 2019
[Short Paper - Long Version at ADMS 2019] Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators highly influence performance, scalability, memory efficiency and memory fairness to other processes. In this work, we provide the first comprehensive experimental analysis on the impact of memory allocation for high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7x on a 4-socket Intel Xeon server.
Scalable Reservoir Sampling on Many-Core CPUs
ACM SIGMOD 2019 | Altan Birler | June 30, 2019
Database systems need to be able to convert queries to efficient execution plans. As recent research has shown, correctly estimating cardinalities of subqueries is an important factor in the efficiency of the resulting plans. Many algorithms have been proposed in the literature that utilize a random sample to estimate cardinalities. Thus, some modern database systems choose to store a materialized uniformly random sample for their relations. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples from the relation on disk using random I/O.
No False Negatives: Accepting All Useful Schedules in a Fast Serializable Many-Core System
ICDE 2019 | Dominik Durner and Thomas Neumann | April 9, 2019
Concurrency control is one of the most performance critical steps in modern many-core database systems. Achieving higher throughput on multi-socket servers is difficult and many concurrency control algorithms reduce the amount of accepted schedules in favor of transaction throughput or relax the isolation level which introduces unwanted anomalies. Both approaches lead to unexpected transaction behavior that is hard to understand by the database users. We introduce a novel multi-version concurrency protocol that achieves high performance while reducing the number of aborted schedules to a minimum and providing the best isolation level. Our approach leverages the idea of a graph-based scheduler that uses the concept of conflict graphs. As conflict serializable histories can be represented by acyclic conflict graphs, our scheduler maintains the conflict graph and allows all transactions that keep the graph acyclic. All conflict serializable schedules can be accepted by such a graph-based algorithm due to the conflict graph theorem. Hence, only transaction schedules that really violate the serializability constraints need to abort. Our developed approach is able to accept the entire useful intersection of order preserving conflict serializable (OCSR) and recoverable (RC) schedules which are the two most desirable classes in terms of correctness and user experience. We show experimentally that our graph-based scheduler has very competitive throughput in pure transactional workloads while providing fewer aborts and improved user experience. Our multi-version extension helps to efficiently perform long-running read transactions on the same up-to-date database. Moreover, our graph-based scheduler can outperform the competitors on mixed workloads.
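The acyclicity test at the heart of such a scheduler can be sketched as follows; a real implementation maintains the graph incrementally and is far more efficient than this naive reachability check.

```cpp
#include <cstdio>
#include <vector>

struct ConflictGraph {
    std::vector<std::vector<int>> out;      // out[t] = transactions t conflicts into
    explicit ConflictGraph(int n) : out(n) {}

    bool reaches(int from, int to, std::vector<char>& seen) const {
        if (from == to) return true;
        seen[from] = 1;
        for (int next : out[from])
            if (!seen[next] && reaches(next, to, seen)) return true;
        return false;
    }
    // Add conflict edge a -> b unless b already reaches a, which would close
    // a cycle and make the schedule non-serializable.
    bool tryAddConflict(int a, int b) {
        std::vector<char> seen(out.size(), 0);
        if (reaches(b, a, seen)) return false;   // would violate serializability
        out[a].push_back(b);
        return true;
    }
};

int main() {
    ConflictGraph g(3);                     // transactions T0, T1, T2
    std::printf("T0->T1 accepted: %d\n", g.tryAddConflict(0, 1));  // 1
    std::printf("T1->T2 accepted: %d\n", g.tryAddConflict(1, 2));  // 1
    std::printf("T2->T0 accepted: %d\n", g.tryAddConflict(2, 0));  // 0: cycle
}
```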
Making Compiling Query Engines Practical
TKDE | André Kohn, Viktor Leis, and Thomas Neumann | March 15, 2019
Compiling queries to machine code is a very efficient way for executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries often takes hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically switches from interpretation to compilation. We also propose a fast bytecode interpreter for LLVM, which can execute queries without costly translation to machine code and dramatically reduces the query latency. Adaptive execution is fine-grained, and can execute code paths of the same query using different execution modes. Our evaluation shows that this approach achieves optimal performance in a wide variety of settings---low latency for small data sets and maximum throughput for large data sizes. Besides compilation time, we also focus on debugging, which is another important challenge of compilation-based query engines. To address this problem, we present a novel, database-specific debugger for compiling query engines.
Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates
CIDR 2019 | Michael Freitag and Thomas Neumann | January 13, 2019
Database systems heavily rely upon cardinality estimates for finding efficient execution plans, and estimation errors can easily affect query execution times by large factors. One particularly difficult problem is estimating the result size of a group-by operator, or, in general, the number of distinct combinations of a set of attributes. In contrast to, e.g., estimating the selectivity of simple filter predicates, the resulting number of groups cannot be predicted reliably without examining the complete input. As a consequence, most existing systems have poor estimates for the number of distinct groups. However, scanning entire relations at optimization time is not feasible in practice. Also, precise group counts cannot be precomputed for every possible combination of attributes. For practical purposes, a cheap mechanism is thus required which can handle arbitrary attribute combinations efficiently and with high accuracy. In this work, we present a novel estimation framework that combines sketched full information over individual columns with random sampling to correct for correlation bias between attributes. This combination can estimate group counts for individual columns nearly perfectly, and for arbitrary column combinations with high accuracy. Extensive experiments show that these excellent results hold for both synthetic and real-world data sets. We demonstrate how this mechanism can be integrated into existing systems with low overhead, and how estimation time can be kept negligible by means of an efficient algorithm for sample scans.
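As a toy illustration of the combination idea only (not the paper's actual estimator): per-column distinct counts come from full-data sketches, while a small row sample supplies a simple correlation correction for the multi-column estimate.

```cpp
#include <algorithm>
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

double estimateGroups(const std::vector<std::pair<int, int>>& sample,
                      double sketchedDistinctA, double sketchedDistinctB) {
    std::set<int> sa;
    std::set<std::pair<int, int>> sab;
    for (auto& t : sample) { sa.insert(t.first); sab.insert(t); }
    // Sample signal: how many distinct (A,B) combinations accompany one
    // distinct A value? (1 when B is fully determined by A.)
    double combosPerA = double(sab.size()) / double(sa.size());
    // Scale the sketched full-data count for A, capped by the independence bound.
    return std::min(sketchedDistinctA * combosPerA,
                    sketchedDistinctA * sketchedDistinctB);
}

int main() {
    // Perfectly correlated columns (B == A): the sample reveals one combination
    // per A value, so the estimate collapses from 10^8 to the correct 10^4.
    std::vector<std::pair<int, int>> sample;
    for (int i = 0; i < 100; ++i) sample.push_back({i, i});
    std::printf("estimated groups: %.0f\n", estimateGroups(sample, 1e4, 1e4));
}
```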
Everything You Always Wanted To Know About Compiled and Vectorized Queries But Were Afraid To Ask
VLDB 2018 | Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, and Peter Boncz | September 13, 2018
The query engines of most modern database systems are either based on vectorization or data-centric code generation. These two state-of-the-art query processing paradigms are fundamentally different in terms of system structure and query execution code. Both paradigms were used to build fast systems. However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of architectures. In this paper, we experimentally compare the two models by implementing both within the same test system. This allows us to use for both models the same query processing algorithms, the same data structures, and the same parallelization framework to ultimately create an apples-to-apples comparison. We find that both are efficient, but have different strengths and weaknesses. Vectorization is better at hiding cache miss latency, whereas data-centric compilation requires fewer CPU instructions, which benefits cache-resident workloads. Besides raw, single-threaded performance, we also investigate SIMD as well as multi-core parallelization and different hardware architectures. Finally, we analyze qualitative differences as a guide for system architects.
Adaptive Optimization of Very Large Join Queries
ACM SIGMOD 2018 | Thomas Neumann and Bernhard Radke | June 10, 2018
The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are not that exotic anymore, and the distribution of query sizes has an incredibly long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component there is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellently over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.
LeanStore: In-Memory Data Management beyond Main Memory
ICDE 2018 | Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann | April 17, 2018
Disk-based database systems use buffer managers in order to transparently manage data sets larger than main memory. This traditional approach is effective at minimizing the number of I/O operations, but is also the major source of overhead in comparison with in-memory systems. To avoid this overhead, in-memory database systems therefore abandon buffer management altogether, which makes handling data sets larger than main memory very difficult. In this work, we revisit this fundamental dichotomy and design a novel storage manager that is optimized for modern hardware. Our evaluation, which is based on TPC-C and micro benchmarks, shows that our approach has little overhead in comparison with a pure in-memory system when all data resides in main memory. At the same time, like a traditional buffer manager, it is fully transparent and can manage very large data sets effectively. Furthermore, due to low-overhead synchronization, our implementation is also highly scalable on multi-core CPUs.
Adaptive Execution of Compiled Queries
ICDE 2018 | André Kohn, Viktor Leis, and Thomas Neumann | April 16, 2018
Compiling queries to machine code is a very efficient way for executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries often takes hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically switches from interpretation to compilation. We also propose a fast bytecode interpreter for LLVM, which can execute queries without costly translation to machine code and dramatically reduces the query latency. Adaptive execution is fine-grained, and can execute code paths of the same query using different execution modes. Our evaluation shows that this approach achieves optimal performance in a wide variety of settings—low latency for small data sets and maximum throughput for large data sizes.
Let's Work Together. Get in Touch!
Contact us for Ph.D. positions, student theses, collaborations, and research opportunities.
Stay Up To Date With Our Blog
A blog written by and for database architects, with roughly quarterly articles. Read about the latest research in database systems and stay at the forefront of our field.