Flash‑Based Storage + In‑Memory Performance

A ACID-compliant database built for in-memory analytics speed. For out-of-core processing it falls back gracefully to flash-based storage. Featuring fast code generation, low-latency query execution, and drop-in PostgreSQL compatibility.

Try Demo Now Publications

Features

Low-Overhead Buffer Manager

Umbra's LeanStore-based buffer manager provides in-memory performance for cached data and degrades gracefully in case the data does not fit into RAM. It uses variable-size pages to bridge the performance gap between background storage and main-memory effectively.

Low-Latency Query Compilation

Umbra's query compiler adaptively switches between two modes: Direct machine code generation instantly gets the query execution started. Longer Running Queries automatically switch to a higher gear with the help of the LLVM optimizing compiler.

Compact Intermediate Representation

Our custom intermediate representation uses a finely tuned data layout tailored to fast program generation and basic optimizations. It lays the foundation for our multi-level debugger.

No Compromises

Umbra features ACID-compliant transaction execution. Queries are specified in SQL, including extensions like window functions. It is a drop-in replacement for PostgreSQL.

Computational Database beyond SQL

Umbra provides an efficient approach to user-defined functions. The system automatically parallelizes user functions, creates deep integration into generated code, and ensures morsel-driven execution. This makes it easy to extend Umbra with advanced operations, e.g., gradient-descent and k-means algorithms.

Publications

Understand Umbra

Foundational publications


2020

Umbra: A Disk-Based System with In-Memory Performance

CIDR 2020 | Thomas Neumann and Michael Freitag | January 13, 2020

Abstract

The increases in main-memory sizes over the last decade have made pure in-memory database systems feasible, and in-memory systems offer unprecedented performance. However, DRAM is still relatively expensive, and the growth of main-memory sizes has slowed down. In contrast, the prices for SSDs have fallen substantially in the last years, and their read bandwidth has increased to gigabytes per second. This makes it attractive to combine a large in-memory buffer with fast SSDs as storage devices, combining the excellent performance for the in-memory working set with the scalability of a disk-based system. In this paper we present the Umbra system, an evolution of the pure in-memory HyPer system towards a disk-based, or rather SSD-based, system. We show that by introducing a novel low- overhead buffer manager with variable-size pages we can achieve comparable performance to an in-memory database system for the cached working set, while handling accesses to uncached data gracefully. We discuss the changes and techniques that were nec- essary to handle the out-of-memory case gracefully and with low overhead, offering insights into the design of a memory optimized disk-based system.

All publications


2020

Mosaic: A Budget-Conscious Storage Engine for Relational Database Systems

VLDB 2020 | Lukas Vogel, Alexander van Renen, Satoshi Imamura, Viktor Leis, Thomas Neumann, and Alfons Kemper | August 30, 2020

Abstract

Relational database systems are purpose-built for a specific storage device class (e.g., HDD, SSD, or DRAM). They do not cope well with the multitude of storage devices that are competitive at their price 'sweet spots'. To make use of different storage device classes, users have to resort to workarounds, such as storing data in different tablespaces. A lot of research has been done on heterogeneous storage frameworks for distributed big data query engines. These engines scale well for big data sets but are often CPU- or network-bound. Both approaches only maximize performance for previously purchased storage devices. We present Mosaic, a storage engine for scan-heavy workloads on RDBMS that manages devices in a tierless pool and provides device purchase recommendations for a specified workload and budget. In contrast to existing systems, Mosaic generates a performance/budget curve that is Pareto-optimal, along which the user can choose. Our approach uses device models and linear optimization to find a data placement solution that maximizes I/O throughput for the workload. Our evaluation shows that Mosaic provides a higher throughput at the same budget or a similar throughput at a lower budget than the state-of-the-art approaches of big data query engines and RDBMS.

Meet Me Halfway: Split Maintenance of Continuous Views

VLDB 2020 | Christian Winter, Tobias Schmidt, Thomas Neumann, and Alfons Kemper | August 30, 2020

Abstract

From Industry 4.0-driven factories to real-time trading algorithms, businesses depend on analytics on high-velocity real-time data. Often these analytics are performed not in dedicated stream processing engines but on views within a general-purpose database to combine current with historical data. However, traditional view maintenance algorithms are not designed with both the volume and velocity of data streams in mind.
In this paper, we propose a new type of view specialized for queries involving high-velocity inputs, called continuous view. The key component of continuous views is a novel maintenance strategy, splitting the work between inserts and queries. By performing initial parts of the view's query for each insert and the remainder at query time, we achieve both high input rates and low query latency. Further, we keep the memory overhead of our views small, independent of input velocity. To demonstrate the practicality of this strategy, we integrate continuous views into our Umbra database system. We show that split maintenance can outperform even dedicated stream processing engines on analytical workloads, all while still offering similar insert rates. Compared to modern materialized view maintenance approaches, such as deferred and incremental view maintenance, that often need to materialize expensive deltas, we achieve up to an order of magnitude higher insert throughput.

Adopting Worst-Case Optimal Joins in Relational Database Systems

VLDB 2020 | Michael Freitag, Maximilian Bandle, Tobias Schmidt, Viktor Leis, and Alfons Kemper | August 30, 2020

Abstract

Worst-case optimal join algorithms are attractive from a theoretical point of view, as they offer asymptotically better runtime than binary joins on certain types of queries. In particular, they avoid enumerating large intermediate results by processing multiple input relations in a single multiway join. However, existing implementations incur a sizable overhead in practice, primarily since they rely on suitable ordered index structures on their input. Systems that support worst-case optimal joins often focus on a specific problem domain, such as read-only graph analytic queries, where extensive precomputation allows them to mask these costs.
In this paper, we present a comprehensive implementation approach for worst-case optimal joins that is practical within general-purpose relational database management systems supporting both hybrid transactional and analytical workloads. The key component of our approach is a novel hash-based worst-case optimal join algorithm that relies only on data structures that can be built efficiently during query execution. Furthermore, we implement a hybrid query optimizer that intelligently and transparently combines both binary and multi-way joins within the same query plan. We demonstrate that our approach far outperforms existing systems when worst-case optimal joins are beneficial while sacrificing no performance when they are not.

On another level: how to debug compiling query engines

DBTest 2020 | Timo Kersten and Thomas Neumann | June 19, 2020

Abstract

Compilation-based query engines generate and compile code at runtime, which is then run to get the query result. In this process there are two levels of source code involved: The code of the code generator itself and the code that is generated at runtime. This can make debugging quite indirect, as a fault in the generated code was caused by an error in the generator. To find the error, we have to look at both, the generated code and the code that generated it.
Current debugging technology is not equipped to handle this situation. For example, GNU’s gdb only offers facilities to inspect one source line, but not multiple source levels. Also, current debuggers are not able to reconstruct additional program state for further source levels, thus, context is missing during debugging.
In this paper, we show how to build a multi-level debugger for generated queries that solves these issues. We propose to use a time-travelling debugger to provide context information for compile-time and runtime, thus providing full interactive debugging capabilities for every source level. We also present how to build such a debugger with low engineering effort by combining existing tool chains.

Concurrent online sampling for all, for free

DaMoN 2020 | Altan Birler, Bernhard Radke, and Thomas Neumann | June 15, 2020

Abstract

Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load.
We introduce a novel highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.

Scalable and Robust Latches for Database Systems

DaMoN 2020 | Jan Böttcher, Viktor Leis, Jana Giceva, Thomas Neumann, and Alfons Kemper | June 15, 2020

Abstract

Multi-core scalability is one of the most important features for database systems running on today’s hardware. Not surprisingly, the implementation of locks is paramount to achieving efficient and scalable synchronization. In this work, we identify the key database-specific requirements for lock implementations and evaluate them using both micro-benchmarks and full-fledged database workloads. The results indicate that optimistic locking has superior performance in most workloads due to its minimal overhead and latency. By complementing optimistic locking with a pessimistic shared mode lock we demonstrate that we can also process HTAP workloads efficiently. Finally, we show how lock contention can be handled gracefully without slowing down the uncontended fast path or increasing space requirements by using a lightweight parking lot infrastructure.

2019

Experimental Study of Memory Allocation for High-Performance Query Processing

ADMS 2019 | Dominik Durner, Viktor Leis, and Thomas Neumann | August 26, 2019

Abstract

Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators highly influence performance, scalability, memory efficiency and memory fairness to other processes. In this work, we provide the first comprehensive experimental study that analyzes and explains the impact of memory allocation for high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7x on a 4-socket Intel Xeon server.

On the Impact of Memory Allocation on High-Performance Query Processing

DaMoN 2019 | Dominik Durner, Viktor Leis, and Thomas Neumann | July 1, 2019

Abstract

[Short Paper - Long Version at ADMS 2019] Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators highly influence performance, scalability, memory efficiency and memory fairness to other processes. In this work, we provide the first comprehensive experimental analysis on the impact of memory allocation for high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7xon a 4-socket Intel Xeon server.

Scalable Reservoir Sampling on Many-Core CPUs

ACM SIGMOD 2019 | Altan Birler | June 30, 2019

Abstract

Database systems need to be able to convert queries to efficient execution plans. As recent research has shown, correctly estimating cardinalities of subqueries is an important factor in the efficiency of the resulting plans. Many algorithms have been proposed in literature that utilize a random sample to estimate cardinalities. Thus, some modern database systems choose to store a materialized uniformly random sample for their relations. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples from the relation in disk using random IO.

No False Negatives: Accepting All Useful Schedules in a Fast Serializable Many-Core System

ICDE 2019 | Dominik Durner and Thomas Neumann | April 9, 2019

Abstract

Concurrency control is one of the most performance critical steps in modern many-core database systems. Achieving higher throughput on multi-socket servers is difficult and many concurrency control algorithms reduce the amount of accepted schedules in favor of transaction throughput or relax the isolation level which introduces unwanted anomalies. Both approaches lead to unexpected transaction behavior that is hard to under-stand by the database users. We introduce a novel multi-version concurrency protocol that achieves high performance while reducing the number of aborted schedules to a minimum and providing the best isolation level. Our approach leverages the idea of a graph-based scheduler that uses the concept of conflict graphs. As conflict serializable histories can be represented by acyclic conflict graphs, our scheduler maintains the conflict graph and allows all transactions that keep the graph acyclic. All conflict serializable schedules can be accepted by such a graph-based algorithm due to the conflict graph theorem. Hence, only transaction schedules that really violate the serializability constraints need to abort. Our developed approach is able to accept the entire useful intersection of order preserving conflict serializable (OCSR) and recoverable (RC) schedules which are the two most desirable classes in terms of correctness and user experience. We show experimentally that our graph-based scheduler has very competitive throughput in pure transactional workloads while providing fewer aborts and improved user experience. Our multi-version extension helps to efficiently perform long-running read transactions on the same up-to-date database. Moreover, our graph-based scheduler can outperform the competitors on mixed workloads.

Making Compiling Query Engines Practical

TKDE | André Kohn, Viktor Leis, and Thomas Neumann | March 15, 2019

Abstract

Compiling queries to machine code is a very efficient way for executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries often takes hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically switches from interpretation to compilation. We also propose a fast bytecode interpreter for LLVM, which can execute queries without costly translation to machine code and dramatically reduces the query latency. Adaptive execution is fine-grained, and can execute code paths of the same query using different execution modes. Our evaluation shows that this approach achieves optimal performance in a wide variety of settings---low latency for small data sets and maximum throughput for large data sizes. Besides compilation time, we also focus on debugging, which is another important challenge of compilation-based query engines. To address this problem, we present a novel, database-specific debugger for compiling query engines.

Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates

CIDR 2019 | Michael Freitag and Thomas Neumann | January 13, 2019

Abstract

Database systems heavily rely upon cardinality estimates for finding efficient execution plans, and estimation errors can easily affect query execution times by large factors. One particularly difficult problem is estimating the result size of a group-by operator, or, in general, the number of distinct combinations of a set of attributes. In contrast to, e. g., estimating the selectivity of simple filter predicates, the resulting number of groups cannot be predicted reliably without examining the complete input. As a consequence, most existing systems have poor estimates for the number of distinct groups. However, scanning entire relations at optimization time is not feasible in practice. Also, precise group counts cannot be precomputed for every possible combination of attributes. For practical purposes, a cheap mechanism is thus required which can handle arbitrary attribute combinations efficiently and with high accuracy. In this work, we present a novel estimation framework that combines sketched full information over individual columns with random sampling to correct for correlation bias between attributes. This combination can estimate group counts for individual columns nearly perfectly, and for arbitrary column combinations with high accuracy. Extensive experiments show that these excellent results hold for both synthetic and real-world data sets. We demonstrate how this mechanism can be integrated into existing systems with low overhead, and how estimation time can be kept negligible by means of an efficient algorithm for sample scans.

2018

Everything You Always Wanted To Known About Compiled and Vectorized Queries But Were Afraid To Ask

VLDB 2018 | Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, and Peter Boncz | September 13, 2018

Abstract

The query engines of most modern database systems are either based on vectorization or data-centric code generation. These two state-of-the-art query processing paradigms are fundamentally dif- ferent in terms of system structure and query execution code. Both paradigms were used to build fast systems. However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of ar- chitectures. In this paper, we experimentally compare the two mod- els by implementing both within the same test system. This allows us to use for both models the same query processing algorithms, the same data structures, and the same parallelization framework to ul- timately create an apples-to-apples comparison. We find that both are efficient, but have different strengths and weaknesses. Vector- ization is better at hiding cache miss latency, whereas data-centric compilation requires fewer CPU instructions, which benefits cache- resident workloads. Besides raw, single-threaded performance, we also investigate SIMD as well as multi-core parallelization and dif- ferent hardware architectures. Finally, we analyze qualitative dif- ferences as a guide for system architects.

Adaptive Optimization of Very Large Join Queries

ACM SIGMOD 2018 | Thomas Neumann and Bernhard Radke | June 10, 2018

Abstract

The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are not that exotic anymore, and the distribution of query sizes has an incredible long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component there is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellent over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.

LeanStore: In-Memory Data Management beyond Main Memory

ICDE 2018 | Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann | April 17, 2018

Abstract

Disk-based database systems use buffer managers in order to transparently manage data sets larger than main memory. This traditional approach is effective at minimizing the number of I/O operations, but is also the major source of overhead in comparison with in-memory systems. To avoid this overhead, in-memory database systems therefore abandon buffer management altogether, which makes handling data sets larger than main memory very difficult. In this work, we revisit this fundamental dichotomy and design a novel storage manager that is optimized for modern hardware. Our evaluation, which is based on TPC-C and micro benchmarks, shows that our approach has little overhead in comparison with a pure in-memory system when all data resides in main memory. At the same time, like a traditional buffer manager, it is fully transparent and can manage very large data sets effectively. Furthermore, due to low-overhead synchronization, our implementation is also highly scalable on multi-core CPUs.

Adaptive Execution of Compiled Queries

ICDE 2018 | André Kohn, Viktor Leis, and Thomas Neumann | April 16, 2018

Abstract

Compiling queries to machine code is a very efficient way for executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries often takes hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically switches from interpretation to compilation. We also propose a fast bytecode interpreter for LLVM, which can execute queries without costly translation to machine code and dramatically reduces the query latency. Adaptive execution is fine-grained, and can execute code paths of the same query using different execution modes. Our evaluation shows that this approach achieves optimal performance in a wide variety of settings—low latency for small data sets and maximum throughput for large data sizes.

Contact

Let's Work Together. Get in Touch!

Address
Technische Universität München
Institut für Informatik
Lehrstuhl III: Datenbanksysteme (I3)
Boltzmannstraße 3
85748 Garching bei München
Germany
Research Opportunities

Contact us for Ph.D. positions, student theses, collaborations, and research opportunities.

Email Us
Stay Up To Date With Our Blog

A blog written by and for database architects with approx. quarterly articles. Read about the latest research in database systems, and be at the forefront in our field.

Go To Blog