.. module:: norman.serialise

.. testsetup::

    from norman import *


Serialisation
=============

In addition to supporting the `pickle` protocol, Norman provides a
framework for serialising and de-serialising databases to other formats
through the `norman.serialise` module.  Serialisation classes inherit
`Serialiser`, and should reimplement at least `~Serialiser.iterfile`
and `~Serialiser.write_record`.  `Serialiser` has the following methods,
grouped by functionality:

*   General

    *   `~Serialiser.open`
    *   `~Serialiser.close`

*   Loading (Reading)

    *   `~Serialiser.load` (Class method)
    *   `~Serialiser.create_records`
    *   `~Serialiser.finalise_read`
    *   `~Serialiser.initialise_read`
    *   `~Serialiser.isuid`
    *   `~Serialiser.iterfile`
    *   `~Serialiser.read`
    *   `~Serialiser.run_read`

*   Dumping (Writing)

    *   `~Serialiser.dump` (Class method)
    *   `~Serialiser.finalise_write`
    *   `~Serialiser.initialise_write`
    *   `~Serialiser.iterdb`
    *   `~Serialiser.run_write`
    *   `~Serialiser.simplify`
    *   `~Serialiser.write`
    *   `~Serialiser.write_record`


.. contents::


Serialiser framework
--------------------

.. class:: Serialiser(db)

    An abstract base class providing a framework for serialisers.

    Subclasses are instantiated with a `~norman.Database` object, and
    serialisation and de-serialisation are done through the `write` and
    `read` methods.  Class methods `dump` and `load` may also be used.

    Subclasses are required to implement `iterfile` and its counterpart,
    `write_record`, but may re-implement any other methods to customise
    behaviour.


    .. attribute:: db

        The database handled by the serialiser.


    .. attribute:: fh

        An open file (or database connection), or `None`.

        This is set to the result of `open`.  If a file is not currently
        open, then this is `None`.


    .. attribute:: mode

        Indicates the current operation.

        This is set to ``'w'`` during *dump* operations and ``'r'`` during
        *load*.  At other times it is `None`.


    .. classmethod:: dump(db, filename)

        This is a convenience method for calling `write`.

        This is equivalent to ``Serialiser(db).write(filename)`` and is
        provided for compatibility with the `pickle` API.


    .. classmethod:: load(db, filename)

        This is a convenience method for calling `read`.

        This is equivalent to ``Serialiser(db).read(filename)`` and is
        provided for compatibility with the `pickle` API.


    .. method:: close

        Close the currently opened file.

        The default behaviour is to call the file object's `!close` method.
        This method is always called once a file has been opened, even if an
        exception occurs during writing.


    .. method:: create_records(records)

        Create one or more new records.

        This is called for every group of cyclic records.  For example,
        if record *a* references record *b*, which references record *c*, and
        record *c* references record *a*, then records *a*, *b*, and *c*
        form a cycle.  If record *d* references record *e* but record *e*
        doesn't reference any other record, each of them is considered to
        be isolated.

        *records* is an iterator yielding tuples of
        ``(table, uid, data, cycles)`` for each record in the cycle, or only
        one record if there is no cycle.  The first three values are the same
        as those returned by `iterfile`, except that foreign uids in data
        have been dereferenced.  *cycles* is a set of field names which
        contain the cyclic data.

        The default behaviour is to remove the cyclic fields from *data*
        for each record, create the records using ``table(**data)``
        and assign the created records to the cyclic fields.  The *uid*
        of each record is also assigned to its *_uid* attribute.

        The return value is an iterator over ``(uid, record)`` pairs.
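
        The two-pass approach described above can be sketched without Norman
        itself.  In this sketch, ``Node`` and ``create_cycle`` are
        hypothetical stand-ins, and it is assumed that each cyclic field in
        *data* still holds the uid of its target record; neither is taken
        from Norman's source.

        .. code-block:: python

            class Node:
                """Stand-in for a table: keyword arguments become attributes."""

                def __init__(self, **fields):
                    self.next = None
                    self._uid = None
                    for name, value in fields.items():
                        setattr(self, name, value)


            def create_cycle(records):
                """Create records in two passes; return ``(uid, record)`` pairs."""
                created = {}
                deferred = []
                for table, uid, data, cycles in records:
                    # First pass: build each record without its cyclic fields.
                    plain = {k: v for k, v in data.items() if k not in cycles}
                    record = table(**plain)
                    record._uid = uid
                    created[uid] = record
                    deferred.append((record, {k: data[k] for k in cycles}))
                for record, fields in deferred:
                    # Second pass: every record now exists, so links can be set.
                    for name, target_uid in fields.items():
                        setattr(record, name, created[target_uid])
                return iter(created.items())


            cycle = [
                (Node, 'a', {'name': 'A', 'next': 'b'}, {'next'}),
                (Node, 'b', {'name': 'B', 'next': 'a'}, {'next'}),
            ]
            made = dict(create_cycle(cycle))

        After this runs, ``made['a'].next`` is the record created for ``'b'``
        and vice versa.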


    .. method:: finalise_read

        Finalise the file after reading data.

        This is called after `run_read` but before `close`, and can be
        re-implemented for implementation-specific finalisation.

        The default implementation does nothing.


    .. method:: finalise_write

        Finalise the file after writing data.

        This is called after `run_write` but before `close`, and can be
        re-implemented for implementation-specific finalisation.

        The default implementation does nothing.


    .. method:: initialise_read

        Prepare the file for reading data.

        This is called before `run_read` but after `open`, and can be
        re-implemented for implementation-specific setup.

        The default implementation does nothing.


    .. method:: initialise_write

        Prepare the file for writing data.

        This is called before `run_write` but after `open`, and can be
        re-implemented for implementation-specific setup.

        The default implementation does nothing.


    .. method:: isuid(field, value)

        Return `True` if *value*, for the specified *field*, could be a *uid*.

        *field* is a `~norman.Field` object.

        This only needs to check whether the value could possibly represent
        another record.  It is only actually considered a *uid* if there is
        another record which matches it.

        By default, this returns `True` for all strings which match a UUID
        regular expression, e.g. ``'a8098c1a-f86e-11da-bd1a-00112444be1e'``.
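
        As an illustration of this kind of check, the following stand-alone
        sketch tests a value against a UUID pattern.  The pattern and the
        helper name ``looks_like_uid`` are assumptions made for this example
        and are not taken from Norman's source.

        .. code-block:: python

            import re

            # Assumed pattern: five hyphen-separated groups of hex digits.
            UUID_RE = re.compile(
                r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$',
                re.IGNORECASE)


            def looks_like_uid(value):
                """Return True if *value* is a string shaped like a UUID."""
                return isinstance(value, str) and UUID_RE.match(value) is not None

        A re-implementation of `isuid` could apply a different test of the
        same kind, for example for integer keys.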


    .. method:: iterdb

        Return an iterator over records in the database.

        Records should be returned in the order they are to be written.  The
        default implementation is a generator which iterates over records in
        each table.


    .. method:: iterfile

        Return an iterator over records read from the file.

        Each item returned by the iterator should be a tuple of
        ``(table, uid, data)`` where  *table* is the `~norman.Table`
        containing the record, *uid* is a globally unique value identifying
        the record and *data* is a dict of field values for the record,
        possibly containing other uids.

        This is commonly implemented as a generator.
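
        A generator of this shape might look like the sketch below, which
        reads one JSON object per line.  The file layout, the *tables*
        mapping and the ``Person`` stand-in class are assumptions made for
        illustration; a real subclass would read from `fh` and look up
        tables in `db`.

        .. code-block:: python

            import io
            import json


            def iter_json_lines(fh, tables):
                """Yield ``(table, uid, data)`` for each JSON line in *fh*."""
                for line in fh:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    entry = json.loads(line)
                    yield tables[entry['table']], entry['uid'], entry['data']


            class Person:
                """Stand-in for a real table class."""


            fh = io.StringIO(
                '{"table": "Person", "uid": "u1", '
                '"data": {"name": "Ann", "friend": "u2"}}\n'
                '{"table": "Person", "uid": "u2", "data": {"name": "Bob"}}\n'
            )
            rows = list(iter_json_lines(fh, {'Person': Person}))

        Note that ``"friend": "u2"`` is left as a uid here; resolving it to
        a record is the job of `run_read`.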


    .. method:: read(filename)

        Load data into `db` from *filename*.

        *filename* is used only to open the file using `open`, so, depending
        on the implementation, it could be anything (e.g. a URL) which `open`
        recognises.  It could even be omitted entirely if, for example,
        the serialiser reads from stdin.


    .. method:: open(filename)

        Open *filename* for the current `mode`.

        The return value should be a handle to the open file.  The default
        behaviour is to open the file as binary using the builtin *open*
        function.


    .. method:: run_read

        Read data from the currently opened file.

        This is called between `initialise_read` and `finalise_read`, and
        converts each value returned by `iterfile` into a record using
        `create_records`.  It also attempts to re-map nested records by
        searching for matching uids.

        Cycles in the data are detected, and all records involved in
        a cycle are created in `create_records`.
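
        One simple way to group records into cycles is to treat two uids as
        belonging together when each can be reached from the other.  The
        sketch below does this for a plain ``uid -> referenced uids``
        mapping; it is an illustration of the idea only, and Norman's own
        detection may work differently.

        .. code-block:: python

            def reachable(graph, start):
                """Return the set of nodes reachable from *start*."""
                seen = set()
                stack = list(graph.get(start, ()))
                while stack:
                    node = stack.pop()
                    if node not in seen:
                        seen.add(node)
                        stack.extend(graph.get(node, ()))
                return seen


            def group_cycles(graph):
                """Group uids so that mutually reachable uids share a group."""
                reach = {node: reachable(graph, node) for node in graph}
                groups = []
                assigned = set()
                for node in graph:
                    if node in assigned:
                        continue
                    group = {node} | {other for other in reach[node]
                                      if node in reach.get(other, ())}
                    assigned |= group
                    groups.append(group)
                return groups


            # uid -> uids it references
            refs = {'a': ['b'], 'b': ['c'], 'c': ['a'], 'd': ['e'], 'e': []}
            groups = group_cycles(refs)

        With this input, *a*, *b* and *c* end up in one group, while *d* and
        *e* each form a group of their own.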


    .. method:: run_write

        Called by `dump` to write data.

        This is called after `initialise_write` and before `finalise_write`,
        and simply calls `write_record` for each value yielded by `iterdb`.


    .. method:: simplify(record)

        Convert a record to a simple python structure.

        The default implementation converts *record* to a `dict` of
        field values, omitting `~norman.NotSet` values and replacing other
        records with their *_uid* properties.  The return value of this
        implementation is a tuple of ``(tablename, record._uid, record_dict)``.
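
        The conversion described above can be sketched with stand-in
        objects.  ``NOTSET``, ``Record`` and this ``simplify`` function are
        hypothetical and only mimic the documented behaviour; they do not
        use Norman's classes.

        .. code-block:: python

            NOTSET = object()  # stand-in for norman.NotSet


            class Record:
                """Stand-in record with a ``_uid`` and fixed field names."""
                fields = ('name', 'age', 'friend')

                def __init__(self, uid, **values):
                    self._uid = uid
                    for field in self.fields:
                        setattr(self, field, values.get(field, NOTSET))


            def simplify(record):
                """Return ``(tablename, uid, dict)`` for a stand-in record."""
                data = {}
                for field in record.fields:
                    value = getattr(record, field)
                    if value is NOTSET:
                        continue  # unset fields are omitted
                    if isinstance(value, Record):
                        value = value._uid  # nested records become their uid
                    data[field] = value
                return type(record).__name__, record._uid, data


            bob = Record('u2', name='Bob')
            ann = Record('u1', name='Ann', friend=bob)
            result = simplify(ann)

        Here ``result`` holds the table name, the uid ``'u1'`` and a dict in
        which ``age`` is absent and ``friend`` is the uid ``'u2'``.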


    .. method:: write(filename)

        Write the database to *filename*.

        *filename* is used only to open the file using `open`, so, depending
        on the implementation, it could be anything (e.g. a URL) which `open`
        recognises.  It could even be omitted entirely if, for example,
        the serialiser dumps the database as formatted text to stdout.
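
        The order in which the write hooks run, as documented for
        `initialise_write`, `run_write`, `finalise_write` and `close`, can
        be sketched with a stand-in class that only records the calls.
        ``MiniSerialiser`` is hypothetical and is not Norman's
        implementation; in particular, resetting `fh` and `mode` at the end
        is an assumption based on the attribute descriptions.

        .. code-block:: python

            class MiniSerialiser:
                """Stand-in that records the order in which hooks run."""

                def __init__(self):
                    self.calls = []
                    self.fh = None
                    self.mode = None

                def open(self, filename):
                    self.calls.append('open')
                    return object()  # placeholder for a file handle

                def initialise_write(self):
                    self.calls.append('initialise_write')

                def run_write(self):
                    self.calls.append('run_write')

                def finalise_write(self):
                    self.calls.append('finalise_write')

                def close(self):
                    self.calls.append('close')

                def write(self, filename):
                    self.mode = 'w'
                    self.fh = self.open(filename)
                    try:
                        self.initialise_write()
                        self.run_write()
                        self.finalise_write()
                    finally:
                        # close runs even if one of the hooks raises
                        self.close()
                        self.fh = None
                        self.mode = None


            s = MiniSerialiser()
            s.write('ignored.txt')

        After the call, ``s.calls`` lists the hooks in the order they ran.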


    .. method:: write_record(record)

        Write *record* to the current file.

        This is called by `run_write` for every record yielded by `iterdb`.
        *record* is the value returned by `simplify`.


Sqlite
------

.. class:: Sqlite

    This is a `Serialiser` which reads and writes to a sqlite database.

    Each table in `~Serialiser.db` is dumped to a sqlite table with the
    same field names.  An additional field, *_uid_*, is included which
    contains the record's *_uid*.  The sqlite database does not have any
    constraints, not even primary key constraints, as it is intended to
    be used purely for storage.

    The following methods are re-implemented from `Serialiser`:

    *   `~Serialiser.finalise_write` commits changes to the database.
    *   `~Serialiser.initialise_write` starts a database transaction and
        creates tables.
    *   `~Serialiser.initialise_read` sets the `sqlite3` row factory.
    *   `~Serialiser.iterfile` yields records from each valid table in the
        file which matches a table in `~Serialiser.db`.
    *   `~Serialiser.open` returns an open database connection to *filename*.
    *   `~Serialiser.write_record` adds a record to the sqlite database.
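
    The storage layout described above can be tried directly with the
    standard `sqlite3` module.  This is a sketch of the layout only, not of
    this class's code; the table name ``people`` and its columns are
    made up for the example.

    .. code-block:: python

        import sqlite3

        conn = sqlite3.connect(':memory:')
        conn.row_factory = sqlite3.Row  # rows can be read by column name

        # No types or constraints; "_uid_" holds the record's uid.
        conn.execute('CREATE TABLE "people" ("name", "age", "_uid_")')
        conn.execute('INSERT INTO "people" VALUES (?, ?, ?)',
                     ('Ann', 30, 'a8098c1a-f86e-11da-bd1a-00112444be1e'))
        conn.commit()

        row = conn.execute('SELECT * FROM "people"').fetchone()
        data = {key: row[key] for key in row.keys() if key != '_uid_'}
        uid = row['_uid_']
        conn.close()

    Splitting each row into ``uid`` and ``data`` in this way gives the two
    values that `~Serialiser.iterfile` is expected to yield alongside the
    table.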


.. class:: Sqlite3

    .. deprecated:: 0.6.1

        Use `Sqlite` instead, which implements the `Serialiser` framework.


    .. method:: dump(db, filename)

        Dump the database to a sqlite database.

        Each table is dumped to a sqlite table, without any constraints.
        All values in the table are converted to strings and foreign objects
        are stored as an integer id (referring to another record). Each
        record has an additional field, '_oid_', which contains a unique
        integer.


    .. method:: load(db, filename)

        The database supplied is read as follows:

        1.  Tables are searched for by name; if they are missing, they
            are ignored.

        2.  If a table is found, but does not have an "oid" field, it is
            ignored.

        3.  Values in "oid" should be unique within the database, e.g.
            a record in "units" cannot have the same "oid" as a record
            in "cycles".

        4.  Records which cannot be added, for any reason, are ignored
            and a message logged.


.. testcleanup::

    import os
    try:
        os.unlink('file.sqlite')
    except OSError:
        pass