Monday, January 26, 2009

ORM strengths and shortcomings

Object-Relational Mapper technology is used in object-oriented languages to try to reduce the amount of SQL contained inside application logic. ORM libraries exist for several dynamically-typed languages. Two popular Python ORM libraries are SQLObject and SQLAlchemy. The basic idea behind an ORM is that persistent objects within an application are mapped to a database table. The table schema is derived from the class declaration of the object to be stored.

For instance, here is an example of a BlogEntry class using SQLObject
#SQLObject declaration example.

from sqlobject import *

class BlogEntry(SQLObject):
"""An abstraction representing a BlogEntry."""
class sqlmeta:
table ='blog_entry'
idName ='id'

uuid = StringCol(length=36,\
title=StringCol(length=80,default="Title Placeholder")
user=ForeignKey('User', default=None)
Here, we have a blog entry abstraction. The BlogEntry class defines a meta class called sqlmeta. This meta class is used to specify table-specific information used by the database when the table is created. For instance, the underlying table in the example will be called blog_entry and will use the id column is the primary key. We have also defined several columns for our class. These columns will serve as attributes for any instances of this class. Once an instance of BlogEntry is created, the ORM will automatically create a table row in the database.

I consider this to be a real strength of ORM technology. It drastically simplifies the abstraction storage requirement. There is no need to write SQL CREATE statements. Or INSERT and UPDATE statements for that matter. There is nothing specifically wrong is SQL. SQL is extremely expressive and powerful. The problem arises when combining SQL with application logic in an interleaved manor. This leads to unmaintainable systems.

One approach to decoupling the SQL required for persistent objects from the behavior implemented by the objects in an object-oriented system is define SQL templates. For example, we might have an define_blog_entry.sql template file. This file could then be read by some database module that then executes the SQL. The developer would then write several other templates for UPDATE, INSERT, and various other database activities. ORM libraries do this very well. There is a very transparent layer that manages persistence.

OK, so how about querying? How do we get our objects back from the database? Well, the ORM also does this. From the example, our BlogEntry class inherits a select() class method from the SQLObject class. Using this method we can pass various criteria in order to retrieve BlogEntry instances.

I think this is the key weakness in the ORM. In a large percentage of cases, it serves well. All we want to retrieve are blog entries. What about when we need multiple types of objects? There is really no way to do this. At least not sensibly. In our example, all we can do is There is no method to retrieve BlogEntry instances and User instances in the same query. Multiple types means multiple queries in ORM land.

SQL along with relational databases are indispensable. Especially the SELECT statement. It is by fore the most effective way to retrieve complex data. ORM technology has done a great job exploiting most of the power SQL has to offer. I just don't thing the querying functionality is as flexible as it could be in most cases.

1 comment :

  1. retrieving multiple types of objects is not an issue for many ORMs including Hibernate (supports it with HQL) and SQLAlchemy:

    sess.query(BlogEntry, BlogComment).\

    Preserving most of the capability of SQL within the ORM is a key feature of SQLAlchemy, and is not a general ORM issue - the idea that such a problem exists is usually pushed by proponents of OODBMS.