Appendix C. PyTables' parameter files.

PyTables issues warnings when certain limits are exceeded. These limits are not intrinsic limitations of the underlying software; rather, they are proactive measures to avoid excessive resource consumption. The default limits should be enough for most cases, and users should try to respect them. However, in some situations it can be convenient to increase (or decrease) them.

Also, in order to get maximum performance, PyTables implements a series of sophisticated features, like I/O buffers and different kinds of caches (for nodes, chunks and other internal metadata). These features come with a default set of parameters that ensures decent performance in most situations. But, as no single set of defaults suits every use case, it is handy to be able to fine-tune some of these parameters.

Because of this, PyTables implements a couple of ways to change these values. All of the tunable parameters live in the tables/parameters.py file (and in tables/_parameters_pro.py, for PyTables Pro users). The user can change them in the parameter files themselves for a global and persistent change. Moreover, for finer control, she can pass any of these parameters directly to the openFile() function (see description), and the new values will only take effect in the corresponding file (the defaults will continue to be those in the parameter files).
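The difference between the two mechanisms can be pictured with a small, self-contained model. This is only a sketch and does not import PyTables: DEFAULT_PARAMS and effective_params() are hypothetical names, and the numeric values are placeholders rather than the real defaults. In an actual script the override would be handed to openFile() itself, presumably as a keyword argument named after the parameter.

```python
# Hypothetical stand-in for the values defined in tables/parameters.py.
# The numbers are placeholders, not the real PyTables defaults.
DEFAULT_PARAMS = {
    "NODE_CACHE_SLOTS": 64,
    "METADATA_CACHE_SIZE": 1024 * 1024,
}


def effective_params(**overrides):
    """Return the parameters seen by one file: defaults plus overrides.

    The global defaults are copied, never modified, which mirrors the
    idea that a per-file override leaves the parameter files untouched.
    """
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return params


# One file opened with a larger node cache, another with the defaults.
tuned = effective_params(NODE_CACHE_SLOTS=1024)
plain = effective_params()
```

Here tuned carries the larger node cache while plain and DEFAULT_PARAMS keep the original value, which is the per-file scope described above.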

A description of all of the tunable parameters follows. Please check your parameter files for the actual default values.

Warning

Changing the following parameters may have a very bad effect on the resource consumption and performance of your PyTables scripts. Please be careful when touching these!

C.1. Tunable parameters in tables/parameters.py.

C.1.1. Recommended maximum values

MAX_COLUMNS

Maximum number of columns in Table objects before a PerformanceWarning is issued. This limit is somewhat arbitrary and can be increased.

MAX_NODE_ATTRS

Maximum allowed number of attributes in a node.

MAX_GROUP_WIDTH

Maximum allowed number of children hanging from a group.

MAX_TREE_DEPTH

Maximum depth in object tree allowed.

MAX_UNDO_PATH_LENGTH

Maximum length of paths allowed in undo/redo operations.
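As an illustration of how a recommended maximum behaves, the following sketch issues a warning, rather than an error, when a limit is exceeded. It is a model only: the PerformanceWarning class is a local stand-in defined here, check_column_count() is a hypothetical helper, and the value given to MAX_COLUMNS is a placeholder (see your parameters.py for the real one).

```python
import warnings


class PerformanceWarning(Warning):
    """Local stand-in for the warning category named in the text."""


MAX_COLUMNS = 512  # placeholder value, not necessarily the real default


def check_column_count(ncols, max_columns=MAX_COLUMNS):
    """Warn (but do not fail) when a table has more columns than advised.

    Returns True when the count is within the limit, False otherwise.
    """
    if ncols > max_columns:
        warnings.warn(
            "table has %d columns, more than the recommended maximum "
            "of %d; performance may suffer" % (ncols, max_columns),
            PerformanceWarning,
        )
        return False
    return True
```

Because the limit is only advisory, calling check_column_count(600, max_columns=1024) with a raised limit passes silently, while the same call with the placeholder default emits a warning and carries on.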

C.1.2. Cache limits

METADATA_CACHE_SIZE

Size (in bytes) of the HDF5 metadata cache. This only takes effect when using the HDF5 1.8.x series.

NODE_CACHE_SLOTS

Maximum number of unreferenced nodes to be kept in memory.

If positive, this is the number of unreferenced nodes to be kept in the metadata cache. Least recently used nodes are unloaded from memory when this number of loaded nodes is reached. To load a node again, simply access it as usual. Nodes referenced by user variables are neither counted nor unloaded.

A negative value means that all the touched nodes will be kept in an internal dictionary. This is the fastest way to load/retrieve nodes. However, in order to avoid large memory consumption, the user will be warned when the number of loaded nodes reaches the -NODE_CACHE_SLOTS value.

Finally, a value of zero means that any cache mechanism is disabled.
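The three regimes of NODE_CACHE_SLOTS can be modelled with a few lines of standard-library Python. This is a toy sketch of the behaviour described above, not the PyTables implementation: NodeCacheModel and its touch() method are hypothetical names, nodes are represented by their path strings, and eviction here happens once the slot count is exceeded.

```python
import warnings
from collections import OrderedDict


class NodeCacheModel:
    """Toy model of the positive, negative and zero NODE_CACHE_SLOTS cases."""

    def __init__(self, slots):
        self.slots = slots
        self.nodes = OrderedDict()  # path -> placeholder for a loaded node

    def touch(self, path):
        """Record an access to an unreferenced node."""
        if self.slots == 0:
            return  # zero: no caching at all
        is_new = path not in self.nodes
        if is_new:
            self.nodes[path] = object()
        else:
            self.nodes.move_to_end(path)  # mark as most recently used
        if self.slots > 0:
            # Positive: bounded cache, drop least recently used nodes.
            while len(self.nodes) > self.slots:
                self.nodes.popitem(last=False)
        elif is_new and len(self.nodes) >= -self.slots:
            # Negative: keep everything, but warn once the count
            # reaches -NODE_CACHE_SLOTS.
            warnings.warn(
                "%d nodes are loaded; memory use may be large"
                % len(self.nodes)
            )
```

With NodeCacheModel(2), touching "/a", "/b", "/a" and "/c" in turn leaves "/a" and "/c" cached, because "/b" is the least recently used entry when the third distinct node arrives.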

C.1.3. Parameters for the I/O buffer in Table objects.

CHUNKTIMES

The buffersize/chunksize ratio.

BUFFERTIMES

The maximum buffersize/rowsize ratio before issuing a PerformanceWarning.

C.1.4. Miscellaneous

EXPECTED_ROWS_EARRAY

Default expected number of rows for EArray objects.

EXPECTED_ROWS_TABLE

Default expected number of rows for Table objects.

PYTABLES_SYS_ATTRS

Set this to False if you don't want to create PyTables system attributes in datasets. Also, if set to False, any existing system attributes are not considered when guessing the class of a node while loading it from disk (this work is delegated to the PyTables class discoverer function for general HDF5 files).

C.2. Tunable parameters in tables/_parameters_pro.py.

Note

These parameters are only available in PyTables Pro.

C.2.1. Parameters for the different internal caches

BOUNDS_MAX_SIZE

The maximum size for bounds values cached during index lookups.

BOUNDS_MAX_SLOTS

The maximum number of slots for the BOUNDS cache.

ITERSEQ_MAX_ELEMENTS

The maximum number of iterator elements cached in data lookups.

ITERSEQ_MAX_SIZE

The maximum space that the ITERSEQ cache will take (in bytes).

ITERSEQ_MAX_SLOTS

The maximum number of slots in the ITERSEQ cache.

LIMBOUNDS_MAX_SIZE

The maximum size for the query limits (for example, (lim1, lim2) in conditions like lim1 ≤ col < lim2) cached during index lookups (in bytes).

LIMBOUNDS_MAX_SLOTS

The maximum number of slots for the LIMBOUNDS cache.

TABLE_MAX_SIZE

The maximum size for table chunks cached during index queries.

SORTED_MAX_SIZE

The maximum size for sorted values cached during index lookups.

SORTEDLR_MAX_SIZE

The maximum size for chunks in the last row cached in index lookups (in bytes).

SORTEDLR_MAX_SLOTS

The maximum number of chunks for the SORTEDLR cache.

C.2.2. Parameters for general cache behaviour

Warning

The following parameters have no effect if passed to the openFile() function (so they can only be changed globally). You can change them in the parameter file, but this is strongly discouraged unless you know well what you are doing.

DISABLE_EVERY_CYCLES

The number of cycles after which a cache will be forced to be disabled if the hit ratio is lower than LOWEST_HIT_RATIO (see below). This value should provide enough time to check whether the cache is being efficient or not.

ENABLE_EVERY_CYCLES

The number of cycles after which a cache will be forced to be (re-)enabled, regardless of the hit ratio. This provides a chance to check whether conditions have become more favourable for caching again.

LOWEST_HIT_RATIO

The minimum acceptable hit ratio for a cache to avoid disabling (and freeing) it.
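How the three parameters interact can be sketched with a small model. This is one plausible reading of the descriptions above, not the PyTables Pro code: AdaptiveCacheModel is a hypothetical class, a "cycle" is assumed to mean one lookup, and the hit ratio is assumed to be measured over the cycles since the last check.

```python
class AdaptiveCacheModel:
    """Toy cache that switches itself off when it is not paying for itself."""

    def __init__(self, disable_every, enable_every, lowest_hit_ratio):
        self.disable_every = disable_every        # DISABLE_EVERY_CYCLES
        self.enable_every = enable_every          # ENABLE_EVERY_CYCLES
        self.lowest_hit_ratio = lowest_hit_ratio  # LOWEST_HIT_RATIO
        self.enabled = True
        self.cycles = 0
        self.hits = 0
        self.store = {}

    def _reset_counters(self):
        self.cycles = 0
        self.hits = 0

    def lookup(self, key, compute):
        """Return compute(key), using the cache while it is enabled."""
        self.cycles += 1
        if not self.enabled:
            # Disabled: count cycles, then re-enable unconditionally.
            if self.cycles >= self.enable_every:
                self.enabled = True
                self._reset_counters()
            return compute(key)
        if key in self.store:
            self.hits += 1
            value = self.store[key]
        else:
            value = self.store[key] = compute(key)
        if self.cycles >= self.disable_every:
            # Enough cycles have passed to judge the hit ratio.
            if self.hits / self.cycles < self.lowest_hit_ratio:
                self.enabled = False
                self.store.clear()  # disabling also frees the cache
            self._reset_counters()
        return value
```

For example, with AdaptiveCacheModel(4, 2, 0.5), four lookups of four different keys give a hit ratio of zero, so the cache is disabled and emptied; two further lookups later it is enabled again and gets a new chance.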