A short user's guide to mytools

The toolset

Two different, complementary tools come in this package:

  1. MyDatabaseCluster (in module mytools.DatabaseCluster): a class designed to run private instances of MySQL servers for Python applications.
  2. MyCatalog (in module mytools.Catalog): a searchable catalog for Python objects, plus helper classes (in module mytools.query) for building Python-style queries to the catalog.

How MyCatalog works

MyCatalogs are fairly straightforward to understand.  A MyCatalog uses a MySQL database as its data store.  Each MyCatalog keeps one or more infospaces: groups of related objects.  Your application opens a connection to an infospace and uses it to store objects keyed by string, which are automatically indexed in a form optimized for fast querying.

Preemptive explanation: MyCatalog is *not* a persistence engine.  It is designed to be a companion to a real data store, such as the shelve module or (more sophisticated) ZODB from Zope.  You cannot store an object in it and retrieve it later.  More importantly, a catalog can only store a limited set of data types, namely strings (also usable as paths), floats, ints, booleans and None.  In practice, these limitations are no more than a nuisance.  MyCatalog's primary purpose is to let you integrate very fast searching, with queries of arbitrary complexity, into your application.

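Objects you want catalogued need to comply with two simple rules, both visible in this example: they derive from Catalogable, and they name the attributes to be catalogued in a list attribute (here MyObject_catalogable):

    class MyObject(Catalogable):
        name = "my name"
        something = 0.8
        MyObject_catalogable = ["name", "something"]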
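
To make the rules concrete, here is a minimal sketch of how a catalog could collect the indexable values from such an object.  The Catalogable stand-in, the catalogable_items helper and the <ClassName>_catalogable lookup convention are assumptions made for illustration; they are not taken from the mytools source.

```python
# Illustrative stand-in; the real Catalogable class ships with mytools.
class Catalogable(object):
    pass

# Data types the guide says a catalog can store.
ALLOWED_TYPES = (str, float, int, bool, type(None))

def catalogable_items(obj):
    """Return {attribute name: value} for the attributes an object declares.

    Assumes the declaration lives in an attribute named
    <ClassName>_catalogable, as in the MyObject example.
    """
    names = getattr(obj, type(obj).__name__ + "_catalogable")
    items = {}
    for name in names:
        value = getattr(obj, name)
        if not isinstance(value, ALLOWED_TYPES):
            raise TypeError("%s has unsupported type %s"
                            % (name, type(value).__name__))
        items[name] = value
    return items

class MyObject(Catalogable):
    name = "my name"
    something = 0.8
    MyObject_catalogable = ["name", "something"]

items = catalogable_items(MyObject())
```

Checking types up front like this is one way to surface an unsupported attribute before it reaches the database.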

Using MyCatalog

Practical usage of the catalog is fairly straightforward.  Simply instantiate the MyCatalog class, passing standard MySQL database connection arguments (consult the MySQLdb documentation for further information on this topic).  The catalog will automatically connect, check that the chosen database exists and set the client encoding to Python's preferred encoding (which is equivalent to your application's preferred encoding).

Once you have a catalog instance, request an infospace from the catalog via the get_catalog_infospace(name) method.  The catalog will automatically create the infospace if necessary.  You can now open one or more connections to the infospace via the conn() method of the infospace.

Remember to close() the infospace connection once you're done using it.
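
One way to make sure close() is always called is contextlib.closing from the standard library, which calls close() when the with block exits, even if an exception is raised.  The sketch below uses a stand-in connection class (hypothetical, so that the example runs without a MySQL server); with mytools you would wrap the object returned by the infospace's conn() method instead.

```python
from contextlib import closing

class StandInConnection(object):
    """Hypothetical stand-in for an infospace connection."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

connection = StandInConnection()
with closing(connection) as conn:
    # Catalog, uncatalog and search calls would go here.
    pass

# close() has been called by the time the with block is left.
was_closed = connection.closed
```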

Cataloguing and uncataloguing objects

To catalog an object (which needs to comply with the rules described above), you can use the catalogObject method of a MyCatalogInfospace instance:

    obj = MyObject()
    key = "my_object_name"
    catalog_connection.catalogObject(key, obj)

A similar procedure is used to uncatalog an object:

    catalog_connection.uncatalogObject(key)

Searching the catalog for objects

The catalog is fully searchable.  To search the catalog for objects, use the search() method, passing an instance of Expression, available in the mytools.query module. The catalog's search method returns a Set of object keys:

    keys = catalog_connection.search(expression)

Expressions can be either criteria or aggregations.

Constructing expressions

You construct a criterion by instantiating the Expression class and setting the attribute name, operator (the query type) and value, like this:

    e = Expression(attribute="artist",
                   operator=Contains(), value="Beatles")

where value can be a string, None, an mx.DateTime instance, an integer or a float. The search() method honors the data type of the value you pass, so remember that "2" is not the same as 2.
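
The type distinction can be illustrated in plain Python.  This is only a toy model with made-up data, not the catalog's implementation: a value matches only when both its type and its content agree with the stored one.

```python
# Toy index: object key -> value of a "track" attribute (made-up data).
track_by_key = {"song_a": 2, "song_b": "2"}

def keys_equal_to(value):
    """Return the keys whose stored value matches in type and content."""
    return {key for key, stored in track_by_key.items()
            if type(stored) is type(value) and stored == value}

int_matches = keys_equal_to(2)
str_matches = keys_equal_to("2")
```

Searching for the integer finds only the object catalogued with an integer, and likewise for the string.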

The available operators are listed in the operators list of the query module. See each operator's documentation for information on what it does.

To aggregate expressions in an AND/OR fashion, instantiate an Or or And class as appropriate, passing a list of expressions to aggregate:

    aggregation = Or(exp1,exp2...)
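
Because search() returns a set of keys, a helpful mental model for aggregations is set arithmetic: Or behaves like a union of the keys matched by each expression, And like an intersection.  The sketch below uses made-up key sets and plain Python sets; it is not a description of how the catalog evaluates queries internally.

```python
# Made-up search results: the key sets two criteria might return.
beatles_keys = {"revolver_01", "help_03", "abbey_road_07"}
revolver_keys = {"revolver_01", "revolver_02"}

# Or: a key matches if either criterion matches (set union).
or_keys = beatles_keys | revolver_keys

# And: a key matches only if both criteria match (set intersection).
and_keys = beatles_keys & revolver_keys
```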

Constructing expressions with lists and dictionaries

Another way to generate expressions is available, perhaps more suitable for runtime generation:

    uexpression = {
        "aggregation": "or",
        "expressions": [
            {"attribute": "artist", "operator": "has", "value": "Beatles"},
            {"attribute": "album", "operator": "=", "value": "Revolver"},
        ],
    }
    expression = query.normalize_expression(uexpression)
    keys = catalog_connection.search(expression)

See each operator's documentation to find its proper textual representation to fill in for the "operator" key.
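
As an example of runtime generation, the dictionary form can be assembled from user input with a few lines of plain Python.  The build_uexpression helper below is hypothetical (it is not part of mytools) and only produces the structure shown above; the "has" and "=" operator strings are the ones used in that example.

```python
def build_uexpression(aggregation, criteria):
    """Build a dictionary expression from (attribute, operator, value) tuples.

    Hypothetical helper; it only assembles the structure shown above.
    """
    return {
        "aggregation": aggregation,
        "expressions": [
            {"attribute": attribute, "operator": operator, "value": value}
            for attribute, operator, value in criteria
        ],
    }

# Criteria as they might arrive from a search form.
user_criteria = [("artist", "has", "Beatles"), ("album", "=", "Revolver")]
uexpression = build_uexpression("or", user_criteria)
```

The resulting dictionary can then be handed to query.normalize_expression() exactly like a hand-written one.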

Searching for paths

The catalog can also be searched for paths, which are really just a special case of strings. For path searches to work, the attribute of your catalogued object that holds the path must contain a normalized path (e.g. one produced by the os.path.normpath() function), that is, one with no double slashes, no relative paths and no slashes at the end. Just instantiate a criterion, passing IsBelow() or IsOrIsBelow() as the operator.
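
A small sketch of normalizing a path before cataloguing it follows.  The normalize_catalog_path helper is hypothetical, and rejecting non-absolute paths is one interpretation of the "no relative paths" requirement.  It uses posixpath so that the result is the same on every platform; on UNIX, os.path.normpath() is the same function.

```python
import posixpath

def normalize_catalog_path(path):
    """Normalize a path and reject relative ones (hypothetical helper)."""
    normalized = posixpath.normpath(path)
    if not posixpath.isabs(normalized):
        raise ValueError("relative path: %r" % path)
    return normalized

# Double slashes, "." components and the trailing slash are removed.
clean = normalize_catalog_path("/music//beatles/./revolver/")
```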

Using MyDatabaseCluster

A MyDatabaseCluster lets you create and start a private MySQL server instance for your application.

To start a database server, instantiate a MyDatabaseCluster and start it:

    cluster = MyDatabaseCluster("/path/to/data/files")
    cluster.start()

The database cluster will be created in the chosen directory if necessary, and secured (accessible only to the current user).  Connections will be possible only through a local, random UNIX socket, and thus using it directly is not recommended (although possible).  Once the database server is started, you can use the cluster's get_database(database_name) method to obtain a dictionary of connection parameters, suitable for use with MySQLdb.connect() or MyCatalog.

To stop the database server, use the instance's stop() method:

    cluster.stop()

If you forget to stop the server (or stop() does not stop the database server for some reason), it will be killed upon application exit, leading to database recovery and possible data loss.
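
A try/finally block is a simple way to avoid forgetting stop(), since the finally clause runs even when the application code raises.  The sketch below uses a hypothetical stand-in class that exposes only start() and stop(), so that it runs without MySQL installed; in a real application the cluster would be a MyDatabaseCluster.

```python
class StandInCluster(object):
    """Hypothetical stand-in exposing only start() and stop()."""
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

cluster = StandInCluster()
cluster.start()
error_seen = False
try:
    # Application work goes here; the exception simulates a failure.
    raise RuntimeError("simulated application error")
except RuntimeError:
    error_seen = True
finally:
    # Runs whether or not the work above succeeded.
    cluster.stop()
```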

It is not possible to run two database servers out of the same cluster, and attempting to do so will raise an error.