Difference between revisions of "Data Access Objects (DAO)"

From PKP Wiki

Revision as of 06:52, 16 February 2011

The DAO pattern

Typical properties of implementations of the Data Access Object (DAO) pattern in the context of a Model-View-Controller (MVC) application such as ours are:

  • The DAO maps the relational database model to an OO model, thereby bridging the "impedance mismatch" between these two data modeling approaches. It also sometimes hides implementation details of specific storage technologies (e.g. different database vendors), although this is often delegated, at least partially, to a dedicated data access abstraction layer (ADOdb in our case, or maybe PDO in the future). This decouples the controller from the relational data model and from database implementation details.
  • The DAO usually implements a basic "Create-Read-Update-Delete" (CRUD) pattern to handle single instances of entity objects.
  • Most of the time, additional use-case-specific methods are required that batch-retrieve objects to improve data access performance or that populate objects differently, e.g. with or without dependent objects loaded.
  • The DAO will usually return domain objects (DO) that represent instances of data entities (e.g. articles, users, etc.).
  • DOs contain either only values or, in more recent design paradigms (domain-driven design, DDD), values and domain-specific behavior. In our case we mostly stick to the value-only approach, but exceptions confirm the rule and there's no problem with that.
  • DAOs usually instantiate DOs and not the other way round except when implementing lazy-load (see below). If you can choose then you should avoid dependencies from the DO to the DAO to improve encapsulation.
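
The properties listed above can be sketched roughly as follows. This is a minimal illustration, not code from our code base: the names Article and ArticleDAO, the method names, and the in-memory array that stands in for the database are all hypothetical. A real DAO would issue its queries through the data access abstraction layer.

```php
<?php
// Hypothetical value-only domain object (DO): it holds data, no behavior.
class Article {
    private $id;
    private $title;

    public function __construct($id, $title) {
        $this->id = $id;
        $this->title = $title;
    }
    public function getId() { return $this->id; }
    public function getTitle() { return $this->title; }
    public function setTitle($title) { $this->title = $title; }
}

// Hypothetical DAO: the only place that knows how articles are stored.
// A plain array of rows stands in for the database table (assumption).
class ArticleDAO {
    private $rows = array();
    private $nextId = 1;

    // Create: persist a new row and return the populated DO.
    public function insertArticle($title) {
        $id = $this->nextId++;
        $this->rows[$id] = array('article_id' => $id, 'title' => $title);
        return $this->fromRow($this->rows[$id]);
    }

    // Read: map a stored row to a DO, or return null if it does not exist.
    public function getById($id) {
        return isset($this->rows[$id]) ? $this->fromRow($this->rows[$id]) : null;
    }

    // Update: write the DO's current values back to its row.
    public function updateArticle(Article $article) {
        if (!isset($this->rows[$article->getId()])) {
            return false;
        }
        $this->rows[$article->getId()]['title'] = $article->getTitle();
        return true;
    }

    // Delete: remove the row that belongs to the given id.
    public function deleteById($id) {
        unset($this->rows[$id]);
    }

    // The DAO instantiates the DO, not the other way round.
    private function fromRow(array $row) {
        return new Article($row['article_id'], $row['title']);
    }
}
```

Note that the Article class has no reference to ArticleDAO, so the dependency runs in one direction only, from the DAO to the DO.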


DAO design problems and trade-offs

DAO design is not as straightforward as it may seem from the above description. The main challenges are:

  • the "impedance mismatch" between the relational and OO data models and
  • performance (runtime, memory) vs. code re-use trade-offs.

Polymorphism

The relational data model does not implement polymorphism out-of-the-box. This means that as soon as DOs are polymorphic we'll no longer have a clean one-to-one relationship between database entities (aka "tables") and domain objects. An object-relational (O/R) mapping will be required. The three standard O/R mappings for polymorphic objects are Single Table Inheritance, Class Table Inheritance and Concrete Table Inheritance. They are not described here because descriptions are easy to find on the web.

The standard pattern for an inheritance mapper is like this:

[Figure: Inheritance Mapper.png]

In the given example, the SubmissionFileDAO will often be the only DAO required to handle access to mixed lists of files. Whenever the implementation doesn't matter you'll use the methods of the domain object DAO to access objects from the whole inheritance hierarchy. The DAO will make sure that monograph files will be instantiated as MonographFile and artwork files as ArtworkFile classes.

To achieve this, the SubmissionFileDAO will delegate to one of the two concrete DAOs when one of its methods is being invoked. It identifies the right DAO either by inspecting the type of the object (on update/insert) or by querying the base table that contains some type indicator (on read access, e.g. genre in the case of the monograph/artwork file distinction).

To avoid unnecessary database queries the SubmissionFileDAO can also prepare data via outer joins over all class tables to be passed into the concrete DAOs. The methods of the delegates do not necessarily correspond 1:1 to the methods on the public DAO interface. See the separate inheritance hierarchies.

The inheritance mapper design does not prescribe a specific database design (see the three O/R mapping strategies above). We can use all common mapping techniques.

You can use one of the concrete DAOs in your handler but only if you really want to deal with that specific domain object type. If you implement a special artwork file editing form for example it may make sense to use the ArtworkFileDAO directly. Most of the time access through the domain object DAO should be preferred, though, as it will deal with most cases and is easier to extend and maintain.
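
The delegation described in this section could look roughly like the sketch below. Only the names SubmissionFileDAO, ArtworkFileDAO, MonographFile, ArtworkFile and the genre type indicator come from the text. The SubmissionFile base class, the MonographFileDAO name, all method names, the column names and the in-memory array standing in for the database are assumptions made for illustration.

```php
<?php
// Hypothetical class hierarchy; only the names MonographFile and
// ArtworkFile come from the text, everything else is assumed.
abstract class SubmissionFile {
    public $fileId;
    public $name;
}
class MonographFile extends SubmissionFile {}
class ArtworkFile extends SubmissionFile {
    public $caption;
}

// Concrete DAOs: each knows how to map rows to exactly one class.
class MonographFileDAO {
    public function fromRow(array $row) {
        $file = new MonographFile();
        $file->fileId = $row['file_id'];
        $file->name = $row['name'];
        return $file;
    }
    public function toRow(MonographFile $file) {
        return array('file_id' => $file->fileId, 'name' => $file->name,
            'genre' => 'monograph');
    }
}
class ArtworkFileDAO {
    public function fromRow(array $row) {
        $file = new ArtworkFile();
        $file->fileId = $row['file_id'];
        $file->name = $row['name'];
        $file->caption = $row['caption'];
        return $file;
    }
    public function toRow(ArtworkFile $file) {
        return array('file_id' => $file->fileId, 'name' => $file->name,
            'genre' => 'artwork', 'caption' => $file->caption);
    }
}

// Public DAO for the whole hierarchy; it delegates to the concrete DAOs.
class SubmissionFileDAO {
    private $rows = array(); // stands in for the database tables (assumption)
    private $delegates;

    public function __construct() {
        $this->delegates = array(
            'monograph' => new MonographFileDAO(),
            'artwork' => new ArtworkFileDAO(),
        );
    }

    // On insert the delegate is chosen by inspecting the object's type.
    public function insertObject(SubmissionFile $file) {
        $genre = ($file instanceof ArtworkFile) ? 'artwork' : 'monograph';
        $this->rows[$file->fileId] = $this->delegates[$genre]->toRow($file);
    }

    // On read the delegate is chosen by the type indicator in the row.
    public function getById($fileId) {
        if (!isset($this->rows[$fileId])) {
            return null;
        }
        $row = $this->rows[$fileId];
        return $this->delegates[$row['genre']]->fromRow($row);
    }
}
```

A handler that works with mixed lists of files would only talk to SubmissionFileDAO here and would still receive correctly typed objects.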


Performance Optimization vs. Maintainability

Performance problems are especially difficult to resolve whenever a domain object cannot be isolated from other domain objects but links to them (object composition and aggregation). Often the linked objects will link to other objects, and so on, so we need to make a judgment about where to cut off the dependency chains. This can only be done based on specific use cases, which means that we'll need to couple the DAO implementation to the controller to some extent, at least in our case, where a fully-fledged O/R mapper is not an option.

As a basic rule we should always populate as many compositions/aggregates by default as can reasonably be done without the user perceiving a slow-down (or the server memory/CPU load being unduly impacted). This means that we "avoid premature optimization" for the sake of code maintainability and better re-use. The more dependencies are populated by default, the more use cases can be served by a DAO method and the fewer DAO methods we'll have to write. It is wrong, though, to include data for which we have no use case just because there "might" be one in the future.

This is a very difficult trade-off to make and often we'll have to re-factor to accommodate new use cases. Such trade-offs should be made consciously, though.

If we think that populating all dependencies unduly hurts performance then we can choose one of the following design options without breaking the DAO pattern:

Lazy Load

We can initialize (parts of) an object at runtime by having it call another DAO to retrieve the linked object only when it is actually needed, e.g.:

class Chapter {
  function &getAuthors() {
    $chapterAuthorDao =& ...
    return $chapterAuthorDao->getAuthors(...);
  }
}
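
A more complete version of the same idea is sketched below under several assumptions. The ChapterAuthorDAO class name, the constructor that receives the DAO (the snippet above elides how the DAO is obtained), the query counter and the caching of the result are all hypothetical additions for illustration.

```php
<?php
// Hypothetical DAO; it counts its calls so the lazy behavior is visible.
class ChapterAuthorDAO {
    public $queryCount = 0;
    private $authorsByChapter;

    // An array keyed by chapter id stands in for the database (assumption).
    public function __construct(array $authorsByChapter) {
        $this->authorsByChapter = $authorsByChapter;
    }

    public function getAuthors($chapterId) {
        $this->queryCount++;
        return isset($this->authorsByChapter[$chapterId])
            ? $this->authorsByChapter[$chapterId] : array();
    }
}

class Chapter {
    private $id;
    private $chapterAuthorDao;
    private $authors = null; // null means "not loaded yet"

    // Passing the DAO in is an assumption; the original elides the lookup.
    public function __construct($id, ChapterAuthorDAO $chapterAuthorDao) {
        $this->id = $id;
        $this->chapterAuthorDao = $chapterAuthorDao;
    }

    // Authors are fetched on first access only and cached afterwards.
    public function getAuthors() {
        if ($this->authors === null) {
            $this->authors = $this->chapterAuthorDao->getAuthors($this->id);
        }
        return $this->authors;
    }
}
```

Creating a Chapter costs no query for its authors; the DAO is only hit on the first call to getAuthors().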

Polymorphism

We can introduce subclasses so that we can use the general concept of an entity in use cases where we require less data and only instantiate the more complex, specialized (and expensive) concept of an entity when really needed. See the inheritance mapper design pattern above for that case. One example would be the different versions of submissions (author submissions, series editor submissions, etc.) that we implement.

Please make sure, though, that the entities really implement a semantic concept that makes sense in "natural language", i.e. a concept that is not only introduced for performance reasons but makes sense to those who read the code.

Use-case specific DAO methods

We can create two different DAO-methods, one that retrieves a fully populated object and another that only retrieves an object with basic values set.

Example: Implement a DAO method that retrieves an Author object with its user group populated and another one that leaves the Author's user group null. The former calls the latter and only adds the user group. We can add an assertion to the DO, in this case to Author::getUserGroup(), that makes sure that the accessor is never called when the field has not been populated.
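
A rough sketch of this example follows. Only Author and Author::getUserGroup() are named in the text; AuthorDAO, its method names, the column names and the arrays standing in for the database are hypothetical. The user group is reduced to a plain string for brevity, and a LogicException stands in for the assertion so that the check stays active regardless of how PHP's assert() is configured.

```php
<?php
class Author {
    private $name;
    private $userGroup = null;
    private $userGroupLoaded = false;

    public function __construct($name) {
        $this->name = $name;
    }
    public function getName() { return $this->name; }

    public function setUserGroup($userGroup) {
        $this->userGroup = $userGroup;
        $this->userGroupLoaded = true;
    }

    // Guard: fail loudly if the accessor is used on a partially populated DO.
    public function getUserGroup() {
        if (!$this->userGroupLoaded) {
            throw new LogicException('User group has not been populated.');
        }
        return $this->userGroup;
    }
}

// Hypothetical DAO; two arrays stand in for the author and user group tables.
class AuthorDAO {
    private $authorRows;
    private $userGroupRows;

    public function __construct(array $authorRows, array $userGroupRows) {
        $this->authorRows = $authorRows;
        $this->userGroupRows = $userGroupRows;
    }

    // Basic variant: only the author's own values are set.
    public function getAuthor($authorId) {
        if (!isset($this->authorRows[$authorId])) {
            return null;
        }
        return new Author($this->authorRows[$authorId]['name']);
    }

    // Full variant: calls the basic one and only adds the user group.
    public function getAuthorWithUserGroup($authorId) {
        $author = $this->getAuthor($authorId);
        if ($author !== null) {
            $groupId = $this->authorRows[$authorId]['user_group_id'];
            $author->setUserGroup($this->userGroupRows[$groupId]);
        }
        return $author;
    }
}
```

Because the full variant builds on the basic one, the row-to-object mapping for the author's own values exists in one place only.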

What to use when?

Keeping maintenance cost low is our most important design goal. The basic rule therefore is "don't repeat yourself" (DRY) as long as it doesn't unduly impact performance or introduce too much complexity or abstraction. Whenever you see potential for code re-use you should try to realize it. Make your choice based on which design pattern will give you a minimum of repeated code (i.e. minimize long-term development cost). If several options are available, choose the one that makes your code more readable and is cheaper to implement.

Common DAO Design Errors

DOs that instantiate themselves via their corresponding DAO

We have had cases where a DO instantiated itself via its own DAO and this code was copied across the inheritance hierarchy. Don't confuse this with the lazy-load pattern above. While it is OK for a domain object to lazy-load other objects, it's usually not OK for a domain object to instantiate itself.

Compositions that are not part of the concept of an object

Sometimes you may be tempted to introduce a dependent class into an object just for a single use case. You shouldn't do that. Always put your design to the "natural language test": does it make sense to say "my object has-a xyz object" (composition) or "my object is-a xyz object" (inheritance)? If not, then you should think again and find out where the information should live from a domain object model perspective.

It is especially bad to partially copy properties of other objects into your object rather than adding the whole object as a dependency.

See this for an anti-example:

class Author {
  ...
  function getLocalizedUserGroupName() {
    $userGroup =& $userGroupDao->...
    return $userGroup->get...();
  }
  ...
}

This is wrong for two reasons:

  1. The user group name is a property of the user group and not of the author. It is wrong to say that "the author has-a user group name". It would only be correct to say "the user group has-a name".
  2. An author can be part of several (author) user groups. So copying over a single property won't work anyway.

If we allowed that then we'd also have to implement Author::getUserGroupRoleId(), Author::getUserGroupPath(), etc. all by re-instantiating the same DAO over and over again.

(By the way, this is real code that was part of our code base.)

What's really the case is that the author belongs to (aka "has") user groups and not a user group name. So the correct "lazy-load" implementation would be:

class Author {
  ...
  function &getUserGroups() {
    $userGroups =& $userGroupDao->...
    return $userGroups;
  }
  ...
}

See bug report 5231 (http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=5231) for further examples and discussion of this error.