Serialisation

In addition to supporting the pickle protocol, Norman provides a framework for serialising and de-serialising databases to other formats through the norman.serialise module. Serialisation classes inherit from Reader, Writer or Serialiser; the last is a subclass of the first two, provided for convenience.

Serialisation Framework

In addition to the Reader, Writer and Serialiser classes, a convenience function is provided to generate uids.

norman.serialise.uid()

Create a new uid value. This is useful for files which do not natively provide a uid.
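
As a rough illustration of what such a helper might do, the sketch below builds a uid with Python's uuid module. The name make_uid and the choice of uuid4 are assumptions made for this example; the actual implementation of norman.serialise.uid may differ.

```python
import uuid

def make_uid():
    """Return a new globally unique identifier as a string (hypothetical helper)."""
    # uuid4 is an assumption; any UUID variant yields a 36-character string.
    return str(uuid.uuid4())

first, second = make_uid(), make_uid()
```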

Readers

class norman.serialise.Reader

An abstract base class providing a framework for readers.

Subclasses are required to implement iter_source and may re-implement any other methods to customise behaviour.

The entry point is the read method, which iterates over the records yielded by iter_source, identifies possible foreign keys with isuid and dereferences them. Loops between records are identified and processed with create_group, and create_record is called to actually create each record.

read(source, db)

Read data from a source into db.

This converts each value returned by iter_source into a record using create_record. It also attempts to re-map nested records by searching for matching uids.

Cycles in the data are detected, and all records involved in a cycle are created together in create_group.
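
How cycles are detected is not specified here. One common way to group mutually referencing records is to compute strongly connected components, sketched below with Tarjan's algorithm. This is a swapped-in technique for illustration, not necessarily what read does; the function name group_by_cycle and the refs mapping (uid to the set of uids it references) are hypothetical.

```python
def group_by_cycle(refs):
    """Group uids into strongly connected components (Tarjan's algorithm).

    refs maps each uid to the set of uids it references. Each group is
    returned after the groups it depends on.
    """
    index = {}        # discovery order of each uid
    low = {}          # lowest discovery index reachable from each uid
    stack = []        # uids of the components still being explored
    on_stack = set()
    groups = []

    def visit(uid):
        index[uid] = low[uid] = len(index)
        stack.append(uid)
        on_stack.add(uid)
        for other in refs.get(uid, ()):
            if other not in index:
                visit(other)
                low[uid] = min(low[uid], low[other])
            elif other in on_stack:
                low[uid] = min(low[uid], index[other])
        if low[uid] == index[uid]:
            # uid is the root of a component: pop its members off the stack.
            group = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.add(member)
                if member == uid:
                    break
            groups.append(group)

    for uid in refs:
        if uid not in index:
            visit(uid)
    return groups

# 'a' and 'b' reference each other; 'c' references 'a' but is not in the cycle.
groups = group_by_cycle({'a': {'b'}, 'b': {'a'}, 'c': {'a'}})
```

Each resulting group corresponds to one call to create_group; a record that is not part of any cycle forms a group of one.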

iter_source(source, db)

Iterate over records in the source file, yielding tuples of (table, data) or (table, uid, data). table is the Table containing the record, uid is a globally unique value identifying the record and data is a dict of field values for the record, possibly containing other uids. If uid is omitted, then one is automatically generated using uuid.

Parameters:
  • db – The Database being read into.
  • source – The data source, as specified in read.
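
A minimal sketch of an iter_source implementation follows, written as a plain function so that it runs without Norman installed. The source format (a list of dicts with 'table', 'fields' and an optional 'uid' key) is invented for the example, and a plain dict of classes stands in for the Database.

```python
def iter_source(source, db):
    """Yield (table, uid, data) or (table, data) tuples from a list of dicts."""
    for entry in source:
        table = db[entry['table']]      # look the table up by name (stand-in db)
        data = dict(entry['fields'])    # copy so the source is left untouched
        if 'uid' in entry:
            yield table, entry['uid'], data
        else:
            # No uid in the source: yield the two-item form.
            yield table, data

class Person:
    """Stand-in for a Table."""

db = {'people': Person}
source = [
    {'table': 'people', 'uid': 'u1', 'fields': {'name': 'Ann'}},
    {'table': 'people', 'fields': {'name': 'Bob'}},
]
records = list(iter_source(source, db))
```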

isuid(field, value)

Return True if value, for the specified field, could be a uid.

field is a Field object.

This only needs to check whether the value could possibly refer to another record. It is only actually considered a uid if there is another record which matches it.

By default, this returns True for all strings which match a UUID regular expression, e.g. 'a8098c1a-f86e-11da-bd1a-00112444be1e'.
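
The default check could look roughly like the sketch below. The exact regular expression used by Norman is not shown here, so the pattern and the name looks_like_uid are assumptions; the field argument is omitted because this sketch does not need it.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID layout (assumed pattern).
UUID_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$',
    re.IGNORECASE,
)

def looks_like_uid(value):
    """Return True if value is a string in UUID format."""
    return isinstance(value, str) and UUID_RE.match(value) is not None

is_uid = looks_like_uid('a8098c1a-f86e-11da-bd1a-00112444be1e')
```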

create_group(records)

Create a group of records. records is an iterable containing co-dependent records, i.e. records which cyclically reference each other. In many cases, records will contain only a single record.

Each item in records is a tuple of (table, uid, data, cycles). The first three values are the same as those returned by iter_source, except that foreign uids in data have been dereferenced. cycles is a set of field names which contain the cyclic references.

The default behaviour is to remove the cyclic fields from data for each record, create the records using create_record and assign the created records to the cyclic fields.

The return value is an iterator over (uid, record) pairs.
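
The default behaviour described above might be sketched as follows. Node is a stand-in for a table, and the sketch assumes that the cyclic fields still hold the uids of other records in the same group, since those records do not exist yet; both are assumptions made for illustration.

```python
class Node:
    """Stand-in for a table: keyword arguments become attributes."""
    def __init__(self, **fields):
        self.next = None
        for name, value in fields.items():
            setattr(self, name, value)

def create_group(records):
    """Return an iterator over (uid, record) pairs for co-dependent records."""
    created = {}
    deferred = []
    for table, uid, data, cycles in records:
        # Hold back the cyclic fields so each record can be created alone.
        plain = {k: v for k, v in data.items() if k not in cycles}
        created[uid] = table(**plain)
        deferred.append((uid, {k: data[k] for k in cycles if k in data}))
    for uid, fields in deferred:
        for name, target_uid in fields.items():
            # Point the cyclic field at the record that now exists.
            setattr(created[uid], name, created[target_uid])
    return iter(created.items())

group = [
    (Node, 'a', {'name': 'first', 'next': 'b'}, {'next'}),
    (Node, 'b', {'name': 'second', 'next': 'a'}, {'next'}),
]
result = dict(create_group(group))
```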

create_record(table, uid, data)

Create a single record in table, using uid and data, as given by iter_source. This is called by create_group, so any foreign uid in data should have been dereferenced. The record created should be returned, or, if it cannot be created, None should be returned.

The default implementation simply calls table(**data) and sets the uid.

Writers

class norman.serialise.Writer

An abstract base class providing a framework for writers.

Subclasses are required to implement context and write_record and may re-implement any other methods to customise behaviour.

The entry point is the write method, which opens the target file with context and iterates over records in the database with iterdb. Each record is converted to a simple python structure with simplify and written using write_record.
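
The flow of write can be pictured with the stand-in class below. It mirrors the four hooks named above but does not use Norman; SketchWriter, the in-memory buffer and the dict-of-lists database are all hypothetical.

```python
import contextlib
import io

class SketchWriter:
    """Hypothetical stand-in mirroring the steps of the write method."""

    def __init__(self):
        self.buffer = io.StringIO()

    @contextlib.contextmanager
    def context(self, targetname, db):
        # A real writer would open targetname; an in-memory buffer is used here.
        yield self.buffer

    def iterdb(self, db):
        # Visit every record of every table in turn.
        for table in db.values():
            for record in table:
                yield record

    def simplify(self, record):
        return dict(record)

    def write_record(self, record, target):
        target.write(repr(sorted(record.items())) + '\n')

    def write(self, targetname, db):
        with self.context(targetname, db) as target:
            for record in self.iterdb(db):
                self.write_record(self.simplify(record), target)

writer = SketchWriter()
writer.write('unused', {'people': [{'name': 'Ann'}, {'name': 'Bob'}]})
output = writer.buffer.getvalue()
```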

write(targetname, db)

Write the database to targetname.

targetname is used only to open the file using context, so, depending on the implementation, it could be anything (e.g. a URL) which context recognises. It could even be omitted entirely if, for example, the serialiser dumps the database as formatted text to stdout.

context(targetname, db)

Return a context manager which opens and closes the file, including any preparation and finalisation needed. A common implementation might be:

def context(self, targetname, db):
    # db is unused here, but is part of the signature called by write.
    return open(targetname, 'w')

This can also be implemented using contextlib.contextmanager, which is useful for more complicated examples:

import contextlib

@contextlib.contextmanager
def context(self, targetname, db):
    fh = open(targetname, 'w')
    try:
        fh.write('### Header line ###\n')
        yield fh
        fh.write('### Footer line ###\n')
    finally:
        # Close the file even if writing a record fails.
        fh.close()

iterdb(db)

Return an iterator over records in the database.

Records should be returned in the order they are to be written. The default implementation is a generator which iterates over records in each table.

simplify(record)

Convert a record to a simple python structure.

The default implementation converts record to a dict of field values, omitting NotSet values and replacing other records with their _uid properties. The return value is passed directly to write_record, so it can be anything recognised by it. This implementation returns a tuple of (tablename, record._uid, record_dict).
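
The conversion can be sketched as below. Record and the NotSet sentinel are stand-ins defined for the example, not Norman's own objects, and the attribute names tablename and fields are assumptions.

```python
NotSet = object()   # stand-in for Norman's NotSet sentinel

class Record:
    """Stand-in record: a table name, a _uid and a dict of field values."""
    def __init__(self, tablename, uid, **fields):
        self.tablename = tablename
        self._uid = uid
        self.fields = fields

def simplify(record):
    """Return (tablename, uid, data) with NotSet dropped and records as uids."""
    data = {}
    for name, value in record.fields.items():
        if value is NotSet:
            continue                  # unset fields are left out
        if isinstance(value, Record):
            value = value._uid        # replace nested records by their uid
        data[name] = value
    return record.tablename, record._uid, data

owner = Record('people', 'u1', name='Ann', age=NotSet)
pet = Record('pets', 'u2', name='Rex', owner=owner)
simple_pet = simplify(pet)
```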

write_record(record, target)

Write record to target.

This is called by write for every record yielded by iterdb. record is the value returned by simplify and target is the value returned by context.

Serialiser

class norman.serialise.Serialiser

This simply inherits from Reader and Writer to combine the functionality into one class for interfaces which support both reading and writing.

CSV

class norman.serialise.CSV(uidname='_uid_', **kwargs)

This is a Serialiser which reads and writes to a collection of csv files.

Each table in the database is written to a separate file, which is managed by csv.DictReader and csv.DictWriter. Any extra initialisation parameters are passed to these. If these include fieldnames, it should be a mapping of table to fieldnames; it is used only for writing and defaults to a sorted list of the table's fields.

An additional field specified by uidname is prepended which contains the record’s _uid. uidname may be empty or None, in which case uids are ignored and the field is omitted.

Since csv files can only contain text, all values are converted to strings when writing, and it is up to the database to convert them back into other objects when reading. The exception to this is uid keys, which are handled by the Reader. NotSet values are omitted when writing, and empty field values are converted to NotSet when reading.
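
The round trip described above can be sketched with the standard csv module alone. The helpers write_table and read_table and the NotSet stand-in are hypothetical and only illustrate the uid column, the string conversion and the handling of empty values; they are not the CSV class itself.

```python
import csv
import io

NotSet = object()   # stand-in for Norman's NotSet sentinel

def write_table(target, fieldnames, rows, uidname='_uid_'):
    """Write (uid, data) pairs with the uid column first."""
    writer = csv.DictWriter(target, fieldnames=[uidname] + sorted(fieldnames))
    writer.writeheader()
    for uid, data in rows:
        # Values become strings; NotSet values are left out of the row,
        # so DictWriter fills them with an empty string.
        row = {k: str(v) for k, v in data.items() if v is not NotSet}
        row[uidname] = uid
        writer.writerow(row)

def read_table(source, uidname='_uid_'):
    """Yield (uid, data) pairs, mapping empty strings back to NotSet."""
    for row in csv.DictReader(source):
        uid = row.pop(uidname)
        yield uid, {k: (NotSet if v == '' else v) for k, v in row.items()}

buffer = io.StringIO(newline='')
write_table(buffer, ['name', 'age'],
            [('u1', {'name': 'Ann', 'age': 30}),
             ('u2', {'name': 'Bob', 'age': NotSet})])
buffer.seek(0)
rows = list(read_table(buffer))
```

Note that the age of the first record comes back as the string '30'; converting it to an integer again is left to the database, as described above.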

The target and source specified in read and write should be a mapping of table name to file name, for example:

mapping = {Table1: '/path/table1.csv', Table2: '/path/table2.csv'}
CSV().read(mapping, db)

Any missing tables are omitted.

Sqlite

class norman.serialise.Sqlite(uidname='_uid_')

This is a Serialiser which reads and writes to a sqlite database.

Each table is dumped to a sqlite table with the same field names. An additional field specified by uidname is included which contains the record’s _uid. uidname may be empty or None, in which case uids are ignored and the field is omitted.

The sqlite database is created without any constraints. As described in the sqlite3 docs, under Python 2, text is always returned as unicode.
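
A table dumped in this way might resemble the result of the sketch below, which uses only the standard sqlite3 module. The helper dump_table is hypothetical and the exact SQL issued by the Sqlite class is not shown here; the point is that columns are declared without types or constraints and that the uid is stored in an extra column.

```python
import sqlite3

def dump_table(conn, tablename, fieldnames, rows, uidname='_uid_'):
    """Create an unconstrained table and insert (uid, data) pairs into it."""
    names = sorted(fieldnames)
    columns = ', '.join('"{}"'.format(c) for c in [uidname] + names)
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # No column types or constraints are declared.
    conn.execute('CREATE TABLE "{}" ({})'.format(tablename, columns))
    marks = ', '.join('?' for _ in range(len(names) + 1))
    insert = 'INSERT INTO "{}" VALUES ({})'.format(tablename, marks)
    for uid, data in rows:
        conn.execute(insert, [uid] + [data.get(name) for name in names])

conn = sqlite3.connect(':memory:')
dump_table(conn, 'people', ['name', 'age'],
           [('u1', {'name': 'Ann', 'age': 30})])
stored = conn.execute('SELECT "_uid_", "age", "name" FROM "people"').fetchall()
```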