Serialisation
In addition to supporting the pickle protocol, Norman provides a framework for serialising and de-serialising databases to other formats through the norman.serialise module. Serialisation classes inherit from Reader, Writer or Serialiser, which is a subclass of the first two provided for convenience.
Serialisation Framework
In addition to the Reader, Writer and Serialiser classes, a convenience function is provided to generate uids.
-
norman.serialise.uid()
Create a new uid value. This is useful for files which do not natively provide a uid.
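As a rough illustration of what such a helper might produce, the sketch below builds a uid from the standard uuid module. The name make_uid and the choice of uuid4 are assumptions made for this example; the actual implementation of norman.serialise.uid is not shown in this document.

```python
import uuid


def make_uid():
    """Hypothetical stand-in for norman.serialise.uid.

    Assumes a uid is the canonical string form of a random UUID.
    """
    return str(uuid.uuid4())


new_uid = make_uid()
# The result parses back into a UUID object, so it is a well-formed UUID string.
parsed = uuid.UUID(new_uid)
```

A string in this form matches the kind of value that isuid is described as recognising by default.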
Readers
-
class norman.serialise.Reader
An abstract base class providing a framework for readers.
Subclasses are required to implement iter_source and may re-implement any other methods to customise behaviour.
The entry point is the read method, which iterates over records yielded by iter_source, identifies possible foreign keys with isuid, and dereferences them by identifying loops and processing them with create_group, which in turn calls create_record to actually create each record.
-
read(source, db)
Read data from a source into db.
This converts each value returned by iter_source into a record using create_record. It also attempts to re-map nested records by searching for matching uids.
Cycles in the data are detected, and all records involved in a cycle are created in create_group.
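One way to find the records involved in a cycle is to treat uid references as a directed graph and compute its strongly connected components. The sketch below uses Tarjan's algorithm for this. It is an illustration only: the function name group_cycles and the refs mapping are hypothetical, and the document does not say which algorithm read actually uses.

```python
def group_cycles(refs):
    """Group uids into strongly connected components (Tarjan's algorithm).

    refs maps each uid to the set of uids it references. Each returned
    group is a list of uids that reference each other cyclically (or a
    single uid). Groups are returned with referenced groups first.
    """
    index = {}        # discovery order of each uid
    low = {}          # lowest discovery index reachable from each uid
    stack = []
    on_stack = set()
    groups = []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for target in refs.get(node, ()):
            if target not in refs:
                continue  # not a known uid, so it is treated as a plain value
            if target not in index:
                visit(target)
                low[node] = min(low[node], low[target])
            elif target in on_stack:
                low[node] = min(low[node], index[target])
        if low[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            groups.append(group)

    for node in refs:
        if node not in index:
            visit(node)
    return groups


# 'a' and 'b' reference each other, 'c' references 'a', 'd' stands alone.
refs = {'a': {'b'}, 'b': {'a'}, 'c': {'a'}, 'd': set()}
groups = group_cycles(refs)
```

Because referenced groups come first, each group could be handed to something like create_group in the order returned, with every non-cyclic reference already resolvable.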
-
iter_source(source, db)
Iterate over records in the source file, yielding tuples of (table, data) or (table, uid, data). table is the Table containing the record, uid is a globally unique value identifying the record and data is a dict of field values for the record, possibly containing other uids. If uid is omitted, then one is automatically generated using uuid.
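The two tuple shapes can be illustrated with a small generator. This is a sketch under stated assumptions: the source is a hypothetical dict mapping table names to lists of row dicts, db is a plain dict of stand-in tables rather than a Norman database, and the '_uid_' key is borrowed from the default uidname used by the CSV and Sqlite serialisers.

```python
def iter_source(source, db):
    """Yield (table, uid, data) when a row carries a uid, else (table, data).

    Written as a plain function for illustration; in a Reader subclass it
    would be a method taking self as well.
    """
    for table_name, rows in source.items():
        table = db[table_name]  # stand-in lookup, not the Norman API
        for row in rows:
            data = dict(row)    # copy so the source is left untouched
            uid = data.pop('_uid_', None)
            if uid is None:
                yield table, data
            else:
                yield table, uid, data


source = {'people': [{'_uid_': 'u1', 'name': 'Ann'}, {'name': 'Bob'}]}
db = {'people': 'PeopleTable'}  # a string standing in for a Table
records = list(iter_source(source, db))
```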
-
isuid(field, value)
Return True if value, for the specified field, could be a uid. field is a Field object.
This only needs to check whether the value could possibly represent another record. It is only actually considered a uid if there is another record which matches it.
By default, this returns True for all strings which match a UUID regular expression, e.g. 'a8098c1a-f86e-11da-bd1a-00112444be1e'.
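A check of this kind might look like the sketch below. The regular expression is an assumption (the exact pattern Norman uses is not given here), and the field argument is ignored because this example treats every field the same way.

```python
import re

# Assumed pattern: five groups of hexadecimal digits in the 8-4-4-4-12 layout.
UUID_PATTERN = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$',
    re.IGNORECASE,
)


def isuid(field, value):
    """Return True if value is a string that looks like a UUID."""
    return isinstance(value, str) and UUID_PATTERN.match(value) is not None


looks_like_uid = isuid(None, 'a8098c1a-f86e-11da-bd1a-00112444be1e')
plain_text = isuid(None, 'hello world')
```

A subclass could override this to restrict the check to particular fields, for example when a text field may legitimately contain UUID-like strings.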
-
create_group(records)
Create a group of records. records is an iterable containing co-dependent records, i.e. records which cyclically reference each other. In many cases, records will contain only a single record.
Each record returned by records is a tuple of (table, uid, data, cycles). The first three values are the same as those returned by iter_source, except that foreign uids in data have been dereferenced. cycles is a set of field names which contain the cyclic references.
The default behaviour is to remove the cyclic fields from data for each record, create the records using create_record and assign the created records to the cyclic fields.
The return value is an iterator over (uid, record) pairs.
-
create_record(table, uid, data)
Create a single record in table, using uid and data, as given by iter_source. This is called by create_group, so any foreign uid in data should have been dereferenced. The record created should be returned, or, if it cannot be created, None should be returned.
The default implementation simply calls table(**data) and sets the uid.
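The default behaviours described for create_group and create_record can be sketched with stand-in objects. Several things here are assumptions made for illustration: Person is a hypothetical class playing the role of a table, the uid is stored on a _uid attribute, and cyclic fields are assumed to still hold the uids of other group members until the second pass.

```python
class Person:
    """Stand-in for a table: calling it with field values creates a record."""

    def __init__(self, **data):
        self.partner = None
        self.__dict__.update(data)


def create_record(table, uid, data):
    # Mirrors the described default: call table(**data), then set the uid.
    record = table(**data)
    record._uid = uid
    return record


def create_group(records):
    created = {}
    pending = []
    # First pass: create every record without its cyclic fields.
    for table, uid, data, cycles in records:
        data = dict(data)
        held = {name: data.pop(name) for name in cycles if name in data}
        created[uid] = create_record(table, uid, data)
        pending.append((uid, held))
    # Second pass: all records exist now, so the cyclic fields can be assigned.
    for uid, held in pending:
        for name, target in held.items():
            setattr(created[uid], name, created.get(target, target))
    return iter(created.items())


group = [
    (Person, 'uid-a', {'name': 'Ann', 'partner': 'uid-b'}, {'partner'}),
    (Person, 'uid-b', {'name': 'Bob', 'partner': 'uid-a'}, {'partner'}),
]
result = dict(create_group(group))
```

Splitting the work into two passes is what makes the cycle harmless: no record needs its partner to exist at construction time.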
Writers
-
class norman.serialise.Writer
An abstract base class providing a framework for writers.
Subclasses are required to implement context and write_record and may re-implement any other methods to customise behaviour.
The entry point is the write method, which opens the target file with context and iterates over records in the database with iterdb. Each record is converted to a simple python structure with simplify and written using write_record.
-
write(targetname, db)
Write the database to targetname.
targetname is used only to open the file using open, so, depending on the implementation, it could be anything (e.g. a URL) which open recognises. It could even be omitted entirely if, for example, the serialiser dumps the database as formatted text to stdout.
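The flow described for write (open with context, iterate with iterdb, convert with simplify, output with write_record) can be sketched with a stand-in class. SketchWriter is hypothetical and does not inherit from norman.serialise.Writer. The signature of write_record is not documented in this section, so the (record, fh) form used here is an assumption, as is passing an already-open buffer as targetname.

```python
import contextlib
import io


class SketchWriter:
    """Hypothetical stand-in mirroring the described write flow."""

    def write(self, targetname, db):
        with self.context(targetname, db) as fh:
            for record in self.iterdb(db):
                self.write_record(self.simplify(record), fh)

    @contextlib.contextmanager
    def context(self, targetname, db):
        # Assumption for this sketch: targetname is already a text buffer.
        yield targetname

    def iterdb(self, db):
        # Iterate over the records in each table, one table after another.
        for table in db:
            for record in table:
                yield record

    def simplify(self, record):
        return str(record)

    def write_record(self, record, fh):
        fh.write(record + '\n')


buffer = io.StringIO()
# The "database" here is just a list of tables, each a list of records.
SketchWriter().write(buffer, [['ann', 'bob'], ['carol']])
output = buffer.getvalue()
```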
-
context(targetname, db)
Return a context manager which opens and closes the file, including any preparation and finalisation needed. A common implementation might be:

    def context(self, targetname, db):
        # Open the target for writing; the file object closes itself on exit.
        return open(targetname, 'w')

This can also be implemented using contextlib.contextmanager, which is useful for more complicated examples:

    @contextlib.contextmanager  # requires "import contextlib" at module level
    def context(self, targetname, db):
        fh = open(targetname, 'w')
        try:
            fh.write('### Header line ###')
            yield fh
            fh.write('### Footer line ###')
        finally:
            # Close the file even if writing a record raises.
            fh.close()
-
iterdb(db)
Return an iterator over records in the database.
Records should be returned in the order they are to be written. The default implementation is a generator which iterates over records in each table.
-
simplify(record)
Convert a record to a simple python structure.
The default implementation converts record to a dict of field values, omitting NotSet values and replacing other records with their _uid properties. The return value is passed directly to write_record, so it can be anything recognised by it. This implementation returns a tuple of (tablename, record._uid, record_dict).
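The described default can be sketched as follows. NOT_SET and Record are stand-ins invented for this example (they are not Norman's NotSet or a Norman table), and the class name is used as the table name for simplicity.

```python
NOT_SET = object()  # stand-in for Norman's NotSet sentinel


class Record:
    """Hypothetical record type with a fixed list of field names."""

    fields = ('name', 'age', 'friend')

    def __init__(self, uid, name=NOT_SET, age=NOT_SET, friend=NOT_SET):
        self._uid = uid
        self.name = name
        self.age = age
        self.friend = friend


def simplify(record):
    data = {}
    for field in record.fields:
        value = getattr(record, field)
        if value is NOT_SET:
            continue              # unset values are omitted
        if isinstance(value, Record):
            value = value._uid    # other records are replaced by their uid
        data[field] = value
    return type(record).__name__, record._uid, data


ann = Record('uid-a', name='Ann', age=30)
bob = Record('uid-b', name='Bob', friend=ann)
simple_ann = simplify(ann)
simple_bob = simplify(bob)
```

Replacing nested records with uids keeps the structure flat, which is what lets a reader later re-link records by matching uids.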
CSV
-
class norman.serialise.CSV(uidname='_uid_', **kwargs)
This is a Serialiser which reads and writes to a collection of csv files.
Each table in the database is written to a separate file, which is managed by csv.DictReader and csv.DictWriter. Any extra initialisation parameters are passed to these. If this includes fieldnames, it should be a mapping of table to fieldnames. This defaults to a sorted list of table fields. This is only used for writing.
An additional field specified by uidname is prepended which contains the record's _uid. uidname may be empty or None, in which case uids are ignored and the field is omitted.
Since csv files can only contain text, all values are converted to strings when writing, and it is up to the database to convert them back into other objects when reading. The exception to this is uid keys, which are handled by the Reader. NotSet values are omitted when writing, and empty field values are converted to NotSet when reading.
The target and source specified in read and write should be a mapping of table to file name, for example:

    mapping = {Table1: '/path/table1.csv', Table2: '/path/table2.csv'}
    CSV().read(mapping, db)

Any missing tables are omitted.
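The file layout described above (a uid column prepended to sorted field names, values written as strings, empty cells treated as unset on reading) can be sketched with the standard csv module alone. The helper names write_table and read_table are hypothetical, and unset values are represented here simply by leaving the key out of the dict.

```python
import csv
import io


def write_table(fh, rows, fields, uidname='_uid_'):
    """Write (uid, data) pairs with the uid column first, then sorted fields."""
    fieldnames = ([uidname] if uidname else []) + sorted(fields)
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    for uid, data in rows:
        row = {key: str(value) for key, value in data.items()}
        if uidname:
            row[uidname] = uid
        writer.writerow(row)  # fields missing from row become empty cells


def read_table(fh, uidname='_uid_'):
    """Yield (uid, data) pairs, dropping empty cells as unset values."""
    for row in csv.DictReader(fh):
        uid = row.pop(uidname, None) if uidname else None
        yield uid, {key: value for key, value in row.items() if value != ''}


buffer = io.StringIO(newline='')
rows = [('u1', {'name': 'Ann', 'age': 30}), ('u2', {'name': 'Bob'})]
write_table(buffer, rows, ['name', 'age'])
buffer.seek(0)
round_trip = list(read_table(buffer))
```

Note that age comes back as the string '30': converting text back into other types is left to the database, as described above.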
Sqlite
-
class norman.serialise.Sqlite(uidname='_uid_')
This is a Serialiser which reads and writes to a sqlite database.
Each table is dumped to a sqlite table with the same field names. An additional field specified by uidname is included which contains the record's _uid. uidname may be empty or None, in which case uids are ignored and the field is omitted.
The sqlite database is created without any constraints. As described in the sqlite3 docs, under Python 2, text is always returned as unicode.
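A table dumped in the way described (same field names, an extra uid column, no constraints) might be produced as in the sketch below, which uses only the standard sqlite3 module and an in-memory database. The helper dump_table and the placement of the uid column first are assumptions for this example, not the Sqlite serialiser's actual code.

```python
import sqlite3


def dump_table(conn, tablename, fields, rows, uidname='_uid_'):
    """Create an unconstrained table and insert (uid, data) pairs into it."""
    columns = ([uidname] if uidname else []) + list(fields)
    # Columns are declared without types or constraints.
    column_sql = ', '.join('"{}"'.format(name) for name in columns)
    conn.execute('CREATE TABLE "{}" ({})'.format(tablename, column_sql))
    marks = ', '.join('?' for _ in columns)
    insert_sql = 'INSERT INTO "{}" VALUES ({})'.format(tablename, marks)
    for uid, data in rows:
        values = ([uid] if uidname else []) + [data.get(name) for name in fields]
        conn.execute(insert_sql, values)


conn = sqlite3.connect(':memory:')
dump_table(conn, 'person', ['name', 'age'], [('u1', {'name': 'Ann', 'age': 30})])
stored = conn.execute('SELECT "_uid_", name, age FROM person').fetchall()
```

Values are passed as query parameters rather than formatted into the SQL string. Only the table and column names are interpolated, so they should come from trusted sources.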