From Eigenpedia

Jump to: navigation, search

At the operating system level, memory management in LucidDB is reasonably straightforward because it runs as a single multi-threaded process (rather than as multiple processes with shared memory). However, within that process, memory management is a bit complicated due to the server's hybrid Java+native architecture.


Virtual memory usage overview

The table below shows the overall breakdown for the virtual memory address space of a LucidDB process.

Memory Type Category Subcategory Notes
Java heap
Catalog repository object cache Contains descriptors for catalog objects such as schemas, tables, and columns. This cache uses Java soft references, so it can grow and shrink automatically according to garbage collector memory pressure. Hence, no knob is provided for controlling it.
Session state Typically small, this includes information about a session such as the associated user and the list of open cursors. Memory consumption here is based on number of sessions and number of cursors open per session.
Statement plan cache As SQL statements are prepared and executed, they are encoded as "plans". This cache contains the sharable top level of each plan, which in turn references other sub-components of plans such as generated Java classes and native-code ExecStream graphs. Cache size is controlled by the "codeCacheMaxBytes" parameter. See FarragoCodeCache for more information.
Statements currently being prepared As SQL statements are prepared, they undergo operations such as parsing, validation, and optimization. Each preparation requires a spike of memory usage for these operations, after which most of the memory used by preparation can be garbage collected, with only the final plan remaining in memory. Therefore, the memory usage in this area is very bursty.
Statements currently being executed As statements are executed, they acquire and release Java memory dynamically for activities such as SQL expression evaluation, UDF invocation, and tuple buffering. The amount of memory used depends on the complexity of the statements currently being executed.
Singleton objects This includes global resources such as lock managers and catalog repository connections. Memory usage here generally achieves steady state as the first few statements are executed after startup.
Network connection pool Sockets, threads, RMI objects, etc. Size depends on number of clients active; there may be some RMI and VJDBC parameters which can control memory usage here.
Java classloader
JVM runtime libraries Should hit steady state soon after startup.
LucidDB Java codebase Includes third-party libraries; should hit steady state soon after startup.
Loaded plugin jar cache LucidDB allows plugin jars for SQL/MED and UDF's to be loaded dynamically. Caching of these jars is controlled by the "codeCacheMaxBytes" parameter. Jars are pinned while the code they contain is in use by at least one executing SQL statement.
Generated Java code cache LucidDB generates Java code dynamically as part of SQL statement optimization. Caching of generated classes is controlled by the "codeCacheMaxBytes" parameter. Classes can be shared by multiple cursors executing the same plan; classes are pinned while they are in use by at least one executing SQL statement.
Uncollected garbage Whatever the garbage collector hasn't gotten around to yet.
Unused space The Java heap retains memory allocations for reuse after garbage collection. There may also be significant fragmentation within Java heap blocks.
Buffer pool
Cached disk pages LucidDB caches disk pages after they are read. Disk pages are evicted from the cache automatically when no free pages remain and the memory is needed for some other buffer pool activity such as reading a different page or recording scratch information while executing a query. Disk pages may be pinned by an ExecStream while a statement is accessing them, temporarily preventing them from being evicted. The overall size of the buffer pool is controlled by the "cachePagesInit" parameter. Pages can be dedicated to global cache via the "cacheReservePercentage" parameter.
Scratch pages allocated to currently open ExecStreams ExecStreams are native-code dataflow algorithms used to execute SQL statements. Many ExecStreams such as sort and join are memory-hungry; the memory they use for computation is referred to as "scratch" memory because it is never written to any disk page (note that this memory is different from "temp" disk pages used for activities such as external sort and hash partitioning). Scratch pages are pinned exclusively by the ExecStreams which are using them and released into the free page pool once the ExecStream no longer needs their contents. Scratch page quotas are controlled via the "expectedConcurrentStatements" parameter. For more information, see the Fennel resource governor design docs.
Free page pool All buffer pool pages start out here and then get allocated for either disk cache or scratch usage. Pages are returned to the free pool once their contents are no longer needed (e.g. when a table is dropped, any of its cached disk pages are unmapped).
Native heap
ExecStream plan cache ExecStream instances are somewhat expensive to construct, so they are cached and reused as a separate level of plan caching. An ExecStream can only be in use by one cursor at a time, so multiple copies of the same ExecStream are added to the cache as needed if multiple cursors are executing the same plan. This cache is controlled by the "codeCacheMaxBytes" parameter.
ExecStreams currently being executed As ExecStreams do their work, they may dynamically acquire and release small amounts of random memory outside of the scratch pages they allocate from the buffer pool. In general, any large allocations are supposed to come out of the buffer pool where they can be accounted for accurately, so the small allocations should not be significant.
Singleton objects This includes allocations such as cache and device data structures; should reach steady-state soon after startup.
Fragmentation Without a garbage collector, there is a much greater potential for memory fragmentation since allocated objects cannot be moved. Fragmentation is minimized by taking all allocations of any significant size out of the buffer pool, which consists of uniformly sized pages, but some amount of fragmentation can be expected from ExecStream data structures.
Native code images
System runtime libraries Includes native portion of JVM
LucidDB native code libraries Includes third-party libraries such as Boost and STLport.
Just-in-time compiled Java code The JVM may compile frequently-used Java classes into native code; this includes statically compiled LucidDB libraries and plugins as well as Java classes generated dynamically by LucidDB's optimizer. Management of this memory is up to the JVM; there may be JVM-specific flags for controlling it.
Thread stacks Each thread executing in the server consumes some amount of memory for its stack. The number of threads executing depends roughly on the number of concurrent requests.
Unused virtual address ranges Large portions of the process virtual address space may be left unused.

Administrative goals

There are three main reasons an administrator may want to control how LucidDB allocates and uses memory:

  • To eliminate errors caused by LucidDB running out of memory
  • To limit the amount of physical memory LucidDB uses (e.g. to leave more for other processes running on the same machine)
  • To improve performance of LucidDB

These goals are often in competition; for example, out of memory errors can be eliminated by disabling code caching, but only at the expense of performance.

Available parameters

The following parameters are the knobs currently available:

  • JVM limits: the maximum size of the Java heap can be set via the -Xmx command-line parameter when the JVM is invoked to start LucidDB. Defaults for this value are JVM-dependent; for example, the Sun hotspot client JVM defaults to a 64MB limit, whereas the server JVM defaults to a quarter of available physical memory. For best performance on a server machine with dedicated resources, it's usually a good idea to set the minimum heap size (via -Xms) to the same value as the maximum heap size; this avoids unnecessary garbage collection.
  • Portion of buffer pool reserved for disk caching: this can be set via SQL with the "cacheReservePercentage" system parameter.
  • Buffer pool rationing for each statement: this can be set via SQL with the "expectedConcurrentStatements" system parameter.
  • Code cache size: this can be set via SQL with the "codeCacheMaxBytes" system parameter. The "code cache" is a single unified cache for statement plans, plugin jars, generated Java classes, and ExecStream plans. Setting this parameter to "min" disables caching for all of these objects. Setting this parameter to "max" prevents cached objects from ever being deallocated (this will eventually lead to out-of-memory errors). By default, this parameter is set to 2MB, which should work well in most configurations.

Running out of memory

LucidDB can run out of memory for the following reasons:

  • Failure to allocate buffer pool on startup. This can happen because the number of pages requested exceeds the available memory. LucidDB locks buffer pool pages (preventing them from being paged out), so the failure can be either from virtual memory limits (maximum address space size exceeded) or from physical memory limits (insufficient physical memory available to be locked). LucidDB currently only supports 32-bit address spaces, so the virtual memory limit may be as low as 2GB depending on the operating system.
  • Java heap limit exceeded. This is probably the most common failure, and can happen during just about any activity. It is usually caused by one of
    • setting that limit too low
    • setting the "codeCacheMaxBytes" parameter too high
    • overloading the server with too many concurrent requests given the available memory
  • Buffer pool exhausted. This can occur when a new statement is about to start executing if the resource governor detects that the statement's requirements cannot be satisified by the remaining buffers. This is usually caused by:
    • setting "cachePagesInit" too low
    • setting "expectedConcurrentStatements" too low (so that not enough of the buffer pool is kept in reserve for new statements)
    • setting "cacheReservePercentage" too high (so that not enough of the buffer pool is available for statement execution instead of disk caching)
    • overloading the server with too many concurrent requests given the available memory
  • Native heap allocation failure. This is rare; if encountered, it is most likely to be when a new statement is about to start executing rather than in the middle of statement execution. The cause is usually that some parameter (such as the Java heap size, buffer pool size, or code cache size) is set too high, meaning not enough virtual memory is left over for miscellaneous native heap allocations.

Limiting overall memory usage

This goal is achieved by limiting the settings for the JVM heap size, "cachePagesInit", and "codeCacheMaxBytes" parameters mentioned above. Roughly, the overall physical memory used by LucidDB can be calculated as the sum of

  • Java heap
  • Buffer pool
  • Native heap/stacks used by ExecStreams
  • Actively used portions of native code library images

Limits on the first two can be set directly. The last one should be fairly static at steady-state and can probably be estimated via operating-system working-set analysis utilities. The native heap is the hardest part; the only parameter which controls it is the code cache size, but that affects other objects as well.

Tuning for performance

TBD. Generally speaking, performance can be improved by increasing the sizes of the buffer pool and code cache, and by adjusting the "expectedConcurrentStatements" parameter to match the actual workload, but the details are very much application-dependent.

Note that for a database server, it is never a good idea to exceed available physical memory even if there is sufficient virtual memory available. In this instance, the goals of limiting usage and improving performance actually harmonize.

Personal tools