Data Structures¶
Norman data structures are built on four objects: Database, Table, Field
and Join. In overview, a Database is a collection of Table subclasses.
Table subclasses represent a tabular data structure where each column is
defined by a Field and each row is an instance of the subclass. A Join
is similar to a Field, but behaves as a collection of related records:
class Branch(Table):
    # Each branch knows its parent branch
    parent = Field(index=True)
    # Children are determined on the fly by searching for matching parents.
    children = Join(parent)
AutoTable is a special type of Table which creates fields dynamically.
This is used in conjunction with AutoDatabase, and is particularly
useful when de-serialising from a source without knowing details of
the data in the source.
Database¶
-
class
norman.Database¶ Database instances act as containers of Table objects, which are identified by name. Database supports the following operations.

Operation    Description
db[name]     Return a Table by name.
name in db   Return True if a Table named name is in the database.
table in db  Return True if a Table object is in the database.
iter(db)     Return an iterator over Table objects in the database.

Databases are mainly provided for convenience, as a way to group related tables. Tables may belong to multiple databases, or to no database at all.
-
add(table)¶ Add a Table class to the database.
This is the same as including the database argument in the class definition. The table is returned so this can be used as a class decorator.
>>> db = Database()
>>> @db.add
... class MyTable(Table):
...     name = Field()
-
tablenames()¶ Return a list of the names of all tables managed by the database.
-
reset()¶ Delete all records from all tables in the database.
-
delete(record)¶ Delete a record from the database. This is a convenience function which simply calls record.__class__.delete(record), but also checks that the record actually belongs to the database. If it does not, a NormanWarning is raised, and the record is still deleted.
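The container behaviour described for Database can be illustrated without Norman itself. The following is only a minimal sketch of the protocol (lookup by name, membership by name or class, iteration, and a decorator-friendly add). ToyDatabase and Branch are hypothetical names invented for this illustration and are not part of Norman.

```python
class ToyDatabase:
    """Toy container for illustration only; not Norman's Database."""

    def __init__(self):
        # Maps table name -> table class, in insertion order.
        self._tables = {}

    def add(self, table):
        self._tables[table.__name__] = table
        # Returning the class is what allows use as a class decorator.
        return table

    def __getitem__(self, name):
        return self._tables[name]

    def __contains__(self, item):
        # Accept either a table name or a table class.
        return item in self._tables or item in self._tables.values()

    def __iter__(self):
        return iter(self._tables.values())

    def tablenames(self):
        return list(self._tables)


db = ToyDatabase()


@db.add
class Branch:
    pass
```

Because add returns the class unchanged, the decorated name Branch still refers to the class itself rather than to None.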
-
-
class
norman.AutoDatabase¶ A subclass of Database which automatically creates AutoTable subclasses when a table is looked up by name. For example:
>>> adb = AutoDatabase()
>>> newtable = adb['NewTable']
>>> issubclass(newtable, AutoTable)
True
Apart from this, it behaves exactly the same as Database.
Tables¶
Tables are implemented as a class, with records as instances of the class.
Accordingly, there are many class-level operations which are only applicable
to a Table, and others which only apply to records. The class methods shown
in Table are not visible to instances.
-
class
norman.Table(**kwargs)¶ Records are created by instantiating a Table subclass. Tables are defined by subclassing Table and adding fields to it. For example:
>>> class MyTable(Table):
...     field1 = Field()
...     field2 = Field()
Field names should not start with _, as these names are generally reserved for internal use. Fields and Joins may also be added to a Table after the Table is created, but cannot be shared between tables. If a Field which already belongs to a table is assigned to another table, a copy of it is created. The same cannot be done with a Join, since the behaviour of this would be unclear.
Records are created by simply instantiating the table, optionally with field values as keyword arguments:
>>> record = MyTable(field1='value', field2='other value')
The following class methods are supported by Table objects, but not by instances. Tables also act as a collection of records, and support the following sequence operations:

Operation  Description
len(t)     Return the number of records in t.
iter(t)    Return an iterator over all records in t.
r in t     Return True if the record r is an instance of (i.e. contained by) table t. This should always return True unless the record has been deleted from the table, which usually means that it is a dangling reference which should be deleted.

Boolean operations on tables evaluate to True if the table contains any records.
-
_store¶ A Store instance used as a storage backend. This may be overridden when the class is created to use a custom Store object. Usually there is no need to use this.
-
hooks¶ A dict containing lists of callables to be run when an event occurs.
Two events are supported: validation on setting a field value and deletion, identified by keys 'validate' and 'delete' respectively. When a triggering event occurs, each hook in the list is called in order with the affected table instance as a single argument until an exception occurs. If the exception is an AssertionError it is converted to a ValueError. If no exception occurs, the event is considered to have passed, otherwise it fails and the table record rolls back to its previous state.
These hooks are called before Table.validate and Table.validate_delete, and behave in the same way. They may be set at any time, but do not affect records already created until the record is next validated.
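The calling convention for hooks can be sketched in plain Python. This is an illustration under stated assumptions, not Norman's implementation: run_hooks and must_have_name are hypothetical names, a dict stands in for a record, and the rollback of a failed record is not modelled.

```python
def run_hooks(hooks, event, record):
    """Call each hook registered for `event` in order, stopping at the first exception."""
    for hook in hooks.get(event, []):
        try:
            hook(record)
        except AssertionError as err:
            # An AssertionError signals invalid data and is converted to ValueError.
            raise ValueError(str(err)) from err


def must_have_name(record):
    # A hook receives the affected record as its single argument.
    assert record.get("name"), "name is required"


hooks = {"validate": [must_have_name], "delete": []}

# A valid record passes silently.
run_hooks(hooks, "validate", {"name": "trunk"})

# An invalid record surfaces as ValueError rather than AssertionError.
try:
    run_hooks(hooks, "validate", {"name": ""})
    failure = None
except ValueError as err:
    failure = str(err)
```

Converting AssertionError lets hooks be written as short assert statements while callers only need to catch one exception type.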
-
delete([records=None])¶ Delete all instances in records. If records is omitted then all records in the table are deleted.
-
fields()¶ Return an iterator over field names in the table.
-
-
class
norman.AutoTable(**kwargs)¶ This is a special type of Table which automatically creates a new field whenever a value is assigned to an attribute which does not yet exist. This only occurs for attributes which do not start with '_'. This should be subclassed in exactly the same way as Table. Attempting to instantiate AutoTable directly will result in a TypeError being raised.
>>> class MyTable(AutoTable): pass
>>> record = MyTable(a=1)
>>> record.a
1
>>> isinstance(MyTable.a, Field)
True
>>> record.b = 2
>>> isinstance(MyTable.b, Field)
True
However:
>>> record._c = 3
>>> MyTable._c
Traceback (most recent call last):
    ...
AttributeError: '_c'
As with other Table classes, it is also possible to manually add fields or joins:
>>> MyTable.d = Field()
Records¶
Table instances, or records, are created by specifying field values as
keyword arguments. Missing fields will use the default value (see Field).
In addition to the defined fields, records have the following properties and
methods.
-
Table._uid¶ This contains an id which is unique in the session.
Its primary use is as an identity key during serialisation. Valid values are any integer except 0, or a valid uuid. The default value is calculated using uuid.uuid4 upon its first call. It is not necessary that the value be unique outside the session, unless required by the serialiser.
-
Table.validate()¶ Raise an exception if the record contains invalid data.
This is usually re-implemented in subclasses, and checks that all data in the record is valid. If not, an exception should be raised. Internal validation (e.g. uniqueness checks) occurs before this method is called, and a failure will result in a ValidationError being raised. For convenience, any AssertionError which is raised here is considered to indicate invalid data, and is re-raised as a ValidationError. This allows all validation errors (both from this function and from internal checks) to be captured in a single except statement.
Values may also be changed in the method. The default implementation does nothing.
-
Table.validate_delete()¶ Raise an exception if the record cannot be deleted.
This is called just before a record is deleted and is usually re-implemented to check for other referring instances. This method can also be used to propagate deletions and can safely modify this or other tables.
Exceptions are handled in the same way as for validate.
Notes on Validation and Deletion¶
Data is validated whenever a record is added or removed, and there is the
opportunity to influence this process through validation hooks. When a
new record is created, there are three sets of validation criteria which
must pass in order for the record to actually be created. The first step
is to run the validators specified in Field.validators. These can change
or verify the value in each field independently of context. The second
validation check is applied whenever there are unique fields, and confirms
that the combination of values in unique fields is actually unique. The
final stage is to run all the validation hooks in Table.hooks. These affect
the entire record, and may be used to perform changes across multiple fields.
If at any stage an Exception is raised, the record will not be created.
The following example illustrates how the validation occurs. When a new
record is created, the value is first converted to a string by the field
validator, then checked for uniqueness, and finally the validate
method creates the extra parts value.
>>> class TextTable(Table):
...     'A Table of text values.'
...
...     # A text value stored in the table
...     value = Field(unique=True, validators=[str])
...     # A pre-populated, calculated value.
...     parts = Field()
...
...     def validate(self):
...         self.parts = self.value.split()
...
>>> r = TextTable(value='a string')
>>> r.value
'a string'
>>> r.parts
['a', 'string']
>>> r = TextTable(value=3)
>>> r.value
'3'
>>> r = TextTable(value='3')
Traceback (most recent call last):
...
norman._except.ValidationError: Not unique: TextTable(parts=['3'], value='3')
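The same three stages can be sketched without Norman, which may help when reasoning about the order of checks. Everything here is a stand-in invented for illustration: ValidationError is a local class, records are plain dicts, and create_record and add_parts are hypothetical helpers rather than Norman functions.

```python
class ValidationError(ValueError):
    """Local stand-in for norman.ValidationError."""


def create_record(existing, values, validators, unique_fields, hooks):
    # Stage 1: field validators convert or verify each value independently.
    record = {}
    for name, value in values.items():
        for validator in validators.get(name, []):
            value = validator(value)
        record[name] = value

    # Stage 2: the unique fields are compared together as a single tuple.
    key = tuple(record[name] for name in unique_fields)
    for other in existing:
        if tuple(other[name] for name in unique_fields) == key:
            raise ValidationError("Not unique: %r" % (record,))

    # Stage 3: record-level hooks may check or change several fields at once.
    for hook in hooks:
        try:
            hook(record)
        except AssertionError as err:
            raise ValidationError(str(err)) from err

    # Only a record that passed every stage is stored.
    existing.append(record)
    return record


def add_parts(record):
    record["parts"] = record["value"].split()


table = []
first = create_record(table, {"value": 3}, {"value": [str]}, ["value"], [add_parts])

try:
    create_record(table, {"value": "3"}, {"value": [str]}, ["value"], [add_parts])
    duplicate_rejected = False
except ValidationError:
    duplicate_rejected = True
```

Because the str validator runs first, the integer 3 and the string '3' collide at the uniqueness stage, mirroring the TextTable example.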
When deleting a record, Table.validate_delete is first called. This
should be used to ensure that any dependent records are dealt with. For
example, the following code ensures that all children are deleted when
a parent is deleted.
>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
...     def validate_delete(self):
...         for child in self.children:
...             Child.delete(child)
...
>>> parent = Parent()
>>> child = Child(parent=parent)
>>> Parent.delete(parent)
>>> len(Child)
0
Fields¶
Fields are defined inside a Table definition as class attributes, and
are used as record properties for instances of a Table. If the value of
a field has not been set, then the special object NotSet is used to
indicate this.
-
norman.NotSet¶ A sentinel object indicating that the field value has not yet been set. This evaluates to False in conditional statements.
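A falsy sentinel of this kind can be written in a few lines. The sketch below is not Norman's code: _NotSetType and describe are hypothetical names, and the only behaviour taken from the description above is that the sentinel evaluates to False.

```python
class _NotSetType:
    """Hypothetical sentinel type, for illustration only."""

    def __bool__(self):
        # Makes the sentinel evaluate to False in conditional statements.
        return False

    def __repr__(self):
        return "NotSet"


NotSet = _NotSetType()


def describe(value):
    # An identity test distinguishes "never set" from other falsy values such as 0.
    if value is NotSet:
        return "unset"
    return "set to %r" % (value,)
```

Since 0, '' and None are also falsy, an identity test (value is NotSet) is the safer check when those are legitimate field values.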
-
class
norman.Field(unique=False, default=NotSet, readonly=False, validators=None, key=None)¶ A Field is used in tables to define attributes.
>>> class MyTable(Table):
...     name = Field()
Fields may be created with a combination of properties as keyword arguments, including default, key, readonly, unique and validators.
Fields can be used with comparison operators to return a Query object containing matching records. For example:
>>> class MyTable(Table):
...     oid = Field(unique=True)
...     value = Field()
...
>>> t0 = MyTable(oid=0, value=1)
>>> t1 = MyTable(oid=1, value=2)
>>> t2 = MyTable(oid=2, value=1)
>>> MyTable.value == 1
Query(MyTable(oid=0, value=1), MyTable(oid=2, value=1))
The following comparisons are supported for a Field object, provided the data stored supports them: ==, <, >, <=, >=, !=. The & operator is used to test for containment, e.g. Table.field & mylist returns all records where the value of field is in mylist.
-
key¶ A key function used for indexing, similar to that used by sorted. All values returned by this function should be sortable in the same list. For example, if the field is known to contain a mixture of strings and integers, str would be a valid function, but lambda x: x would not, since a list of strings and integers cannot be sorted. key should raise TypeError for any value it cannot handle. Such values are indexed separately, so that equality lookups are still optimised, but comparisons will not be supported. As an illustrative example, consider the following case which orders values by length:
>>> class T(Table):
...     value = Field(key=len)
...
>>> t1 = T(value='abc')
>>> t2 = T(value='defg')
>>> t3 = T(value=42)
>>> (T.value > 'xxx').one()  # Find values longer than 3 characters
T(value='defg')
>>> (T.value == 42).one()  # Find the numerical value 42
T(value=42)
>>> (T.value > 42).one()  # len(42) raises TypeError
Traceback (most recent call last):
    ...
TypeError
The default implementation orders data by type first, then value, for the following types: numbers.Real, str, bytes. This might lead to unexpected results, since 42 < 'text' will evaluate True. NotSet values are handled slightly differently, and are never passed through this function. Comparison queries on NotSet will always fail.
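The type-first ordering can be observed with the built-in sorted, using the default key function as it is listed in the Index section of this page. The sample list mixed is made up for the illustration.

```python
import numbers


def key(value):
    # Default key as listed in the Index section: a (type tag, value) tuple.
    if isinstance(value, numbers.Real):
        return '0Real', value
    elif isinstance(value, str):
        return '1str', value
    elif isinstance(value, bytes):
        return '2bytes', value
    else:
        raise TypeError


mixed = ['text', 42, b'raw', 3.5, 'abc']

# The type tag is compared first, so numbers sort before strings before bytes.
ordered = sorted(mixed, key=key)

# Comparing keys is why 42 < 'text' holds for this ordering.
number_before_text = key(42) < key('text')
```

The leading digit in each tag ('0Real', '1str', '2bytes') fixes the order of the type groups, and values are only compared with each other inside a group.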
-
name¶ This is the assigned name of the field and is set when it is added to the Table. This attribute is read-only.
-
owner¶ This is the owning Table of the field and is set when it is added to the Table. This attribute is read-only.
-
readonly¶ If True, prohibits setting the variable, unless its value is NotSet (default: False). This can be used with default to simulate a constant. This can be toggled to effectively lock and unlock the field.
-
unique¶ True if records should be unique on this field (default: False). If more than one field in the table has this set then they are evaluated together as a tuple. If this is set after the field is created, all existing records in the table are evaluated and a ValidationError raised if there are duplicates.
-
validators¶ A list of functions which are used as validators for the field. Each function should accept and return a single value (i.e. the value to be set), and should raise an exception if the value is invalid. The validators are called sequentially in the order specified, i.e. newvalue = validator3(validator2(validator1(oldvalue))).
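The chaining of validators amounts to a left fold over the list. The following sketch shows that composition in plain Python. apply_validators and non_empty are hypothetical helpers for this illustration, not Norman functions.

```python
from functools import reduce


def apply_validators(validators, value):
    # Feed the value through each validator in list order:
    # validators[2](validators[1](validators[0](value)))
    return reduce(lambda current, validator: validator(current), validators, value)


def non_empty(text):
    # A validator either returns the (possibly converted) value or raises.
    if not text:
        raise ValueError("value must not be empty")
    return text


validators = [str, str.strip, non_empty]

cleaned = apply_validators(validators, 42)
trimmed = apply_validators(validators, "  oak  ")
```

Order matters: str.strip can only be applied once str has converted the value, and non_empty sees the stripped text.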
-
Joins¶
A Join dynamically creates Queries for a specific record. This is best
explained through an example:
>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
>>> p = Parent()
>>> c1 = Child(parent=p)
>>> c2 = Child(parent=p)
>>> set(p.children) == {c1, c2}
True
In this example, Parent.children returns a Query for all Child
records where child.parent == parent_instance for a specific
parent_instance. Joins have a query attribute which is a Query
factory function, returning a Query for a given instance of the owning table.
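The idea of a query factory evaluated per record can be modelled with a Python descriptor. This is a toy sketch under stated assumptions, not Norman's Join: the classes below are invented for the illustration, a plain list stands in for a Query, and Child.records is a hypothetical registry of instances.

```python
class Join:
    """Toy descriptor that calls a query factory with the owning record."""

    def __init__(self, query):
        self.query = query

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: return the join object itself.
            return self
        # Accessed on a record: build the result for that record.
        return self.query(instance)


class Child:
    records = []  # hypothetical registry standing in for a table

    def __init__(self, parent):
        self.parent = parent
        Child.records.append(self)


class Parent:
    # The factory runs on every access, so the result always reflects current data.
    children = Join(lambda record: [c for c in Child.records if c.parent is record])


p = Parent()
other = Parent()
c1 = Child(parent=p)
c2 = Child(parent=p)
c3 = Child(parent=other)
```

Nothing is stored on the parent; p.children is recomputed from the children's parent values each time it is read.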
-
class
norman.Join(*args, **kwargs)¶ Joins can be created in several ways:

Join(query=queryfactory)
    Explicitly set the query factory. queryfactory is a callable which accepts a single argument (i.e. the owning record) and returns a Query.
Join(table.field)
    This is the most common form, since most joins simply involve looking up a field value in another table. This is equivalent to specifying the following query factory:

    def queryfactory(value):
        return table.field == value

Join(db, 'table.field')
    This has the same effect as the previous example, but is used when the foreign field has not yet been created. In this case, the query factory first locates 'table.field' in the Database db.
Join(other.join[, jointable])
    It is possible to set the target of a join to another join, creating a many-to-many relationship. When used in this way, a join table is automatically created, and can be accessed from Join.jointable. If the optional keyword parameter jointable is used, it is the name of the new join table.
Joins have the following attributes, all of which are read-only.
-
jointable¶ The join table in a many-to-many join.
This is None if the join is not a many-to-many join, and is read-only. If a jointable does not yet exist then it is created, but not added to any database. If the two joins which define it have conflicting information, a ConsistencyError is raised.
Exceptions and Warnings¶
Exceptions¶
-
class
norman.NormanError¶ Base class for all Norman exceptions.
-
class
norman.ConsistencyError¶ Raised on a fatal inconsistency in the data structure.
-
class
norman.ValidationError¶ Raised when an operation results in table validation failing.
For now this inherits from NormanError, ValueError and TypeError to keep it backward compatible. This will change in version 0.7.0.
Advanced API¶
Two structures, Store and Index, manage the data internally. These are
documented for completeness, but should seldom need to be used directly.
-
class
norman.Store¶ Stores are designed to hide the implementation details and expose a consistent API, so that they can be switched out without any other changes to the table.
Tables are exposed as an array of cells, where each cell is identified by Table and Field instances. Cells are unordered, although implementations may order them internally.
The Store is tolerant of missing values. get will return defaults if the record requested does not exist. set will add a new record if the record does not exist.
-
add_field(field)¶ Called whenever a new field is added to the table.
-
add_record(record)¶ Called whenever a new record is created.
-
clear()¶ Delete all records in the store.
-
get(record, field)¶ Return the value in a cell specified by record and field. This should respect any field defaults. If this is called with a record that has not been added, it will be added.
-
has_record(record)¶ Return True if the record has an entry in the data store.
-
iter_field(field)¶ Iterate over pairs of (record, value) for the specified field. This should respect any field defaults. If this is called with a field that has not been added, the behaviour is unspecified.
-
iter_records()¶ Return an iterator over all records in the data store.
-
iter_unset(field)¶ Iterate over records which do not have a value set on field, that is, those for which store.get(record, field) will return field.default. This is used for managing indexes.
-
record_count()¶ Return the number of records in the table.
-
remove_field(field)¶ Remove a field.
-
remove_record(record)¶ Remove a record.
-
set(record, field, value)¶ Set the data in a record.
-
setdefault(field, value)¶ Called when the default value of a field is changed.
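A dictionary-backed store gives a feel for the tolerant get and set behaviour described above. This is a rough sketch, not Norman's Store: DictStore is a hypothetical class covering only part of the interface, fields and records are plain strings, and add_field takes an extra default argument here because these string fields carry no default of their own.

```python
class DictStore:
    """Hypothetical in-memory store keyed by record, then by field."""

    def __init__(self):
        self._defaults = {}  # field -> default value
        self._data = {}      # record -> {field: value}

    def add_field(self, field, default=None):
        # Assumption: the default is passed in, since fields here are just names.
        self._defaults[field] = default

    def add_record(self, record):
        self._data.setdefault(record, {})

    def has_record(self, record):
        return record in self._data

    def get(self, record, field):
        # Tolerant of missing records: add the record, then fall back to the default.
        self.add_record(record)
        return self._data[record].get(field, self._defaults[field])

    def set(self, record, field, value):
        # Setting a value on an unknown record adds that record first.
        self.add_record(record)
        self._data[record][field] = value

    def iter_unset(self, field):
        # Records with no explicit value, i.e. those that report the default.
        return (r for r, cells in self._data.items() if field not in cells)

    def record_count(self):
        return len(self._data)


store = DictStore()
store.add_field("name", default="unnamed")
store.set("r1", "name", "oak")
fallback = store.get("r2", "name")
```

Keeping defaults separate from stored cells is what makes iter_unset cheap: a record is unset exactly when it has no entry for the field.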
-
-
class
norman.Index(field)¶ An index stores records as sorted lists of (keyvalue, record) pairs, where keyvalue is a key based on the data cell value, determined by the return value of Field.key, which should always return the same, sortable type. If a return value cannot be sorted, then it is stored separately by its hash, and comparisons (except for equality checks) cannot be used with it. If it is not hashable, then it is stored by id, so equality checks will actually return identity matches. Note that NotSet is handled separately, and is never evaluated with Field.key. The default Field.key returns a tuple of (type, keyvalue) for recognised types. The implementation is:

def key(value):
    if isinstance(value, numbers.Real):
        return '0Real', value
    elif isinstance(value, str):
        return '1str', value
    elif isinstance(value, bytes):
        return '2bytes', value
    else:
        raise TypeError

The following example shows how this can be used:
>>> import re
>>> from norman import Table, Field
>>> class MyTable(Table):
...     numbers = Field(key=lambda x: re.findall(r'\d+', x))
...
>>> r1 = MyTable(numbers='number 1, numbers 2 and 3')
>>> r2 = MyTable(numbers='45 and 46')
>>> r3 = MyTable(numbers='a, b, c = 5, 6, 7')
>>> r4 = MyTable(numbers='no numbers here')
>>> set(MyTable.numbers > 'number 3') == set((r2, r3))
True
>>> set(MyTable.numbers < '1 or 2') == set((r4,))
True
-
clear()¶ Delete all items from the index.
-
insert(value, record)¶ Insert a new item. If equal keys are found, add to the right.
-
remove(value, record)¶ Remove first occurrence of
(value, record).
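The sorted-list idea behind the index can be sketched with the standard bisect module. This is a simplified illustration rather than Norman's Index: SortedIndex and its equal and greater methods are hypothetical, keys and records are kept in two parallel lists so that records never need to be comparable, and the separate storage for unsortable or unhashable keys is not modelled.

```python
import bisect


class SortedIndex:
    """Toy index: parallel sorted lists of key values and records."""

    def __init__(self, key):
        self.key = key
        self._keys = []
        self._records = []

    def insert(self, value, record):
        keyvalue = self.key(value)
        # bisect_right places equal keys to the right of existing entries.
        pos = bisect.bisect_right(self._keys, keyvalue)
        self._keys.insert(pos, keyvalue)
        self._records.insert(pos, record)

    def equal(self, value):
        keyvalue = self.key(value)
        lo = bisect.bisect_left(self._keys, keyvalue)
        hi = bisect.bisect_right(self._keys, keyvalue)
        return self._records[lo:hi]

    def greater(self, value):
        # Everything to the right of the last equal key is strictly greater.
        pos = bisect.bisect_right(self._keys, self.key(value))
        return self._records[pos:]


index = SortedIndex(key=len)
for word in ["abc", "defg", "xy", "pqr"]:
    index.insert(word, word)

same_length = index.equal("xxx")
longer = index.greater("xxx")
```

Because the key values stay sorted, both equality and range lookups reduce to binary searches followed by a slice.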
-