Storage

Description

The kinds of storage (storage backends) ZODB offers and how to use them.

Introduction

This page explains in detail how ZODB stores data. Understanding this helps you reason about Plone database behavior and optimize your application.

Pickling

ZODB is an object-oriented database. All data in ZODB consists of pickled Python objects. Pickle is Python's object serialization module.

  • Each time an object is read and it is not in the cache, it is read from the ZODB data storage and unpickled.
  • Each time an object is written, it is pickled and the transaction machinery appends it to the ZODB data storage.
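The two rules above can be sketched with a toy append-only store. This is only an illustration built on the Python 3 standard library: ToyAppendOnlyStorage and its log, index and cache attributes are hypothetical names, not ZODB classes, and the real storage and cache are far more elaborate.

```python
import pickle


class ToyAppendOnlyStorage:
    """Hypothetical stand-in for a ZODB storage plus object cache."""

    def __init__(self):
        self.log = []            # append-only list of (oid, pickled state)
        self.index = {}          # oid -> position of the newest record
        self.cache = {}          # oid -> already unpickled object
        self.unpickle_count = 0  # how often we had to hit the "storage"

    def write(self, oid, obj):
        # A write never overwrites: the new pickle is appended to the log.
        self.log.append((oid, pickle.dumps(obj)))
        self.index[oid] = len(self.log) - 1
        self.cache.pop(oid, None)  # drop any stale cached copy

    def read(self, oid):
        if oid in self.cache:      # cached: no storage access, no unpickling
            return self.cache[oid]
        _, record = self.log[self.index[oid]]
        self.unpickle_count += 1
        obj = pickle.loads(record)
        self.cache[oid] = obj
        return obj


storage = ToyAppendOnlyStorage()
storage.write("doc", {"title": "First version"})
storage.write("doc", {"title": "Second version"})

first_read = storage.read("doc")   # not cached: read and unpickled
second_read = storage.read("doc")  # served from the cache
```

After these calls the log holds two records for the same object, because each write was appended, while the second read did not unpickle anything.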

The pickle format is a series of bytes. Here is an example of what it looks like:

>>> import pickle
>>> data = { "key" : "value" }
>>> pickled = pickle.dumps(data)
>>> print pickled
(dp0
S'key'
p1
S'value'
p2
s.

The format is not very human-readable.
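For completeness, here is a small round trip on Python 3, where pickle.dumps returns bytes and print is a function. It is a sketch using only the standard library; the sample dictionary is made up for illustration, and the exact bytes differ from the Python 2 output shown earlier.

```python
import pickle

# Made-up sample data; any picklable Python object would do.
data = {"key": "value", "tags": ["zodb", "plone"], "rating": 5}

# Protocol 0 is the oldest, text-like pickle protocol.
pickled = pickle.dumps(data, protocol=0)

# Unpickling builds a new, equal object from the bytes.
restored = pickle.loads(pickled)

print(type(pickled).__name__)  # bytes
print(restored == data)        # True
print(restored is data)        # False
```

The restored object is a copy: it compares equal to the original but is a different object in memory.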

Even if you use the SQL-based RelStorage backend for ZODB, objects are still pickled into the database: SQL does not support a different table schema for each row, and Python objects do not have a fixed schema.
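The idea can be sketched with the standard library's sqlite3 module. This is not RelStorage's actual schema or code: the toy_objects table and its columns are assumptions made for illustration. It only shows why an opaque pickle column copes with objects of different shapes.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per object, state kept as an opaque pickle.
conn.execute("CREATE TABLE toy_objects (oid INTEGER PRIMARY KEY, state BLOB)")

# Two objects with different "schemas" fit into the same table.
objects = {
    1: {"title": "Front page", "description": "Welcome"},
    2: {"title": "News item", "effective": "2010-01-01", "tags": ["a", "b"]},
}
for oid, state in objects.items():
    conn.execute(
        "INSERT INTO toy_objects (oid, state) VALUES (?, ?)",
        (oid, pickle.dumps(state)),
    )

row = conn.execute(
    "SELECT state FROM toy_objects WHERE oid = ?", (2,)
).fetchone()
loaded = pickle.loads(row[0])  # SQL sees only bytes; Python restores the object
conn.close()
```

The trade-off is that the database cannot query inside the pickled state with plain SQL.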

BTrees

Data is usually organized into BTrees. Despite the name, a BTree is not a binary tree: it is a balanced tree whose nodes can each hold many keys. Most often data is stored in an OOBTree, where "OO" means that both keys and values are Python objects. For content stored in a container, the key is the object id in the parent container as a string, and the value is any picklable Python object or primitive you store in your database.

See also:

  • ZODB data structure interfaces.
  • Using BTrees, an example from the Zope docs.

Buckets

A BTree stores its data in buckets (OOBucket in the case of an OOBTree).

A bucket is the smallest unit of data that is written to the database in one go. Buckets are loaded lazily: a BTree loads only the buckets that hold the keys being accessed.

A BTree tries to fit as much data as possible into one bucket. When one value in a bucket changes, the whole bucket must be rewritten to disk.

The default maximum bucket size of an OOBTree is 30 items.
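The bucket behavior described above can be imitated with a toy model. This is a minimal sketch on the Python 3 standard library, not the BTrees implementation: ToyBucketMap is a hypothetical class, it keeps a flat list of buckets instead of a tree, and MAX_BUCKET_SIZE simply mirrors the figure of 30 quoted above.

```python
import bisect
import pickle

MAX_BUCKET_SIZE = 30  # assumption: mirrors the default quoted in the text


class ToyBucketMap:
    """Toy model of bucketed storage; hypothetical, not the BTrees code."""

    def __init__(self):
        self.buckets = [[]]      # each bucket: sorted list of (key, value)
        self.bytes_written = []  # size of every simulated bucket write

    def _bucket_index(self, key):
        # Buckets after the first are never empty, so b[0][0] is safe.
        first_keys = [b[0][0] for b in self.buckets[1:]]
        return bisect.bisect_right(first_keys, key)

    def __setitem__(self, key, value):
        i = self._bucket_index(key)
        bucket = self.buckets[i]
        pos = bisect.bisect_left([k for k, _ in bucket], key)
        if pos < len(bucket) and bucket[pos][0] == key:
            bucket[pos] = (key, value)
        else:
            bucket.insert(pos, (key, value))
        if len(bucket) > MAX_BUCKET_SIZE:
            # A full bucket is split in two; both halves get written.
            half = len(bucket) // 2
            changed = [bucket[:half], bucket[half:]]
            self.buckets[i:i + 1] = changed
        else:
            changed = [bucket]
        for b in changed:
            # The whole bucket is pickled, not just the changed item.
            self.bytes_written.append(len(pickle.dumps(b)))

    def __getitem__(self, key):
        bucket = self.buckets[self._bucket_index(key)]
        pos = bisect.bisect_left([k for k, _ in bucket], key)
        if pos < len(bucket) and bucket[pos][0] == key:
            return bucket[pos][1]
        raise KeyError(key)


tree = ToyBucketMap()
for n in range(100):
    tree["item-%03d" % n] = "body of item %d" % n

tree.bytes_written.clear()
tree["item-042"] = "changed body"   # change one single value
rewritten = tree.bytes_written[0]   # bytes written for that change
single = len(pickle.dumps(("item-042", "changed body")))
```

Comparing rewritten with single shows the cost: changing one value writes the pickle of the entire bucket that contains it, which is several times larger than the changed item alone.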

Storing as attribute vs. storing in BTree

Plone has two fundamental ways to store data:

  • Attribute storage (stores values directly in the pickled objects).
  • Annotation storage (OOBTree based). Plone objects have an __annotations__ attribute, an OOBTree that stores objects in a way that avoids name conflicts.

When objects are stored in annotation storage, reading their values needs at least one extra database lookup to load the first bucket of the OOBTree.

If the value is going to be used frequently, and especially if it is read when viewing the content object, storing it in an attribute is more efficient than storing it in an annotation. This is because the __annotations__ BTree is a separate persistent object which has to be loaded into memory, and may push something else out of the ZODB cache.

If the attribute stores a large value, it will increase memory usage, as it will be loaded into memory each time the object is fetched from the ZODB.
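The extra lookup can be made visible with a toy record store that counts loads. This is a sketch under loud assumptions: ToyStorage, the record ids and the my.package.rating key are all hypothetical, and plain dictionaries stand in for persistent objects and for the __annotations__ OOBTree.

```python
import pickle


class ToyStorage:
    """Hypothetical oid -> pickle store that counts how often it is read."""

    def __init__(self):
        self.records = {}
        self.loads = 0

    def store(self, oid, state):
        self.records[oid] = pickle.dumps(state)

    def load(self, oid):
        self.loads += 1
        return pickle.loads(self.records[oid])


storage = ToyStorage()

# Attribute storage: the value lives in the content object's own record.
storage.store("doc-attr", {"title": "Front page", "rating": 5})

# Annotation storage: the content record only points at a second record.
storage.store("doc-ann", {"title": "Front page",
                          "__annotations__": "doc-ann-annotations"})
storage.store("doc-ann-annotations", {"my.package.rating": 5})


def rating_from_attribute(storage, oid):
    return storage.load(oid)["rating"]


def rating_from_annotation(storage, oid):
    state = storage.load(oid)                             # first lookup
    annotations = storage.load(state["__annotations__"])  # extra lookup
    return annotations["my.package.rating"]


storage.loads = 0
attr_value = rating_from_attribute(storage, "doc-attr")
attr_loads = storage.loads

storage.loads = 0
ann_value = rating_from_annotation(storage, "doc-ann")
ann_loads = storage.loads
```

Both paths return the same value, but the annotation path loads two records where the attribute path loads one. In exchange, the content record in the annotation case stays small no matter how large the annotated value grows.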

BLOBs

BLOBs are large binary objects like files or images.

BLOBs have been supported since ZODB 3.8.x. Plone 3.x still uses ZODB 3.7.x by default; ZODB 3.8.x works with it but is not officially supported.

When you use the BLOB interface to store and retrieve data, the data is physically stored as files on your file system. A file system, as the name says, is designed to handle files, and it performs far better on large binary data than ZODB does when the data is stored inside it.

BLOBs are streamable: you can start sending a file to the HTTP client from its beginning without first buffering the whole file in memory, which would be slow.
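Streaming can be sketched with the standard library alone. In the example below a temporary file stands in for the file that backs a BLOB; iter_chunks and CHUNK_SIZE are hypothetical names chosen for this illustration, and a real response would hand each chunk to the HTTP server instead of collecting sizes.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumption: 64 KiB per chunk


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield the file piece by piece instead of reading it all at once."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Create a 200 KiB stand-in for a file stored on the file system.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (200 * 1024))
    path = tmp.name

try:
    with open(path, "rb") as f:
        # Only one chunk is held in memory at any time.
        sizes = [len(chunk) for chunk in iter_chunks(f)]
finally:
    os.remove(path)
```

Memory use is bounded by the chunk size rather than by the size of the file, which is what makes serving large files cheap.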

SQL values

Plone's Archetypes subsystem supports storing individual Archetypes fields in an SQL database. This is mainly an integration feature. Read more about it in the Archetypes manual.