Tuesday, January 20, 2015

Application vs. Reference Data

At my very first programming job, I learned a distinction between two kinds of data: Reference and Application.

Application data is produced, consumed, and modified by the application itself. It is customers, orders, events, inventory. It changes and grows all the time. Typically a new deployment of the app starts with no application data and accumulates it as the app runs.

Reference data is closer to configuration. Item definitions, tax rates, categories, drop-down list options. This is read by the application, but changed only by administrative interfaces. It's safe to cache reference data; perhaps it updates daily, or hourly, at the most. Often the application can't even run without reference data, so populating it is part of deployment.
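
Here is a minimal sketch of what "safe to cache" might look like in Python. The `ReferenceCache` class, the `loader` callback, and the one-hour default are all made up for illustration; they are not from any system I worked on. The idea is only that reference data can be reloaded on a timer instead of on every read.

```python
import time


class ReferenceCache:
    """Caches the result of a loader and refreshes it after ttl_seconds.

    Hypothetical helper: the names and the default TTL are assumptions.
    """

    def __init__(self, loader, ttl_seconds=3600.0, clock=time.monotonic):
        self._loader = loader      # callable that reads the reference data
        self._ttl = ttl_seconds    # how long a loaded value stays fresh
        self._clock = clock        # injectable so the cache can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        # Reload on first use, or once the cached copy is older than the TTL.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Application data gets no such treatment: a customer's order needs to be read fresh, while a tax rate that is an hour stale is usually fine.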


Back at Amdocs we separated these into different database schemas, so that the software had write access to application data and read-only access to reference data. Application data had foreign key relationships to reference data; inventory items referenced item definitions, customers referenced customer categories. Reference data could not refer to application data. This follows the Stable Dependencies principle: frequently-changing data depended on rarely-changing data, never the other way around.
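
The direction of that dependency can be sketched with Python's built-in sqlite3 module. This is not the Amdocs setup: SQLite has no separate schemas or per-schema permissions, so the `ref_` and `app_` table prefixes stand in for the two schemas, and every table, column, and function name here is hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one reference and one application table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Reference data: populated at deployment, rarely changes.
        CREATE TABLE ref_item_definition (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Application data: points at reference data, never the reverse.
        CREATE TABLE app_inventory_item (
            id                 INTEGER PRIMARY KEY,
            item_definition_id INTEGER NOT NULL
                REFERENCES ref_item_definition(id),
            quantity           INTEGER NOT NULL
        );
        INSERT INTO ref_item_definition (id, name) VALUES (1, 'widget');
    """)
    return conn


def add_inventory(conn, item_definition_id, quantity):
    """Insert an inventory row; return False if the item definition is unknown."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO app_inventory_item (item_definition_id, quantity)"
                " VALUES (?, ?)",
                (item_definition_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, `add_inventory(conn, 1, 5)` succeeds, while an insert that names an item definition that does not exist is rejected by the foreign key. Nothing in the `ref_` table points back at the `app_` table.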

These days I don't go to the same lengths to enforce the distinction. It may all go in the same database, there may be no foreign keys or set schemas, but in my head the classification remains. Which data is essential for application startup? Reference. Which data grows and changes frequently? Application. Thinking about this helps me avoid circular dependencies, and keep a clear separation between administration and operation.

My first job included some practices I shudder at now[1], but others stick with me. Consider the difference between Reference and Application the next time you design a storage scheme.


[1] Version control made of Perl on top of CVS, with file locking. Unit tests as custom drivers that we threw away. C headers full of #define. High-level design documents, low-level design documents, approvals, expensive features no one used. Debugging with println... oh wait, debugging with println is awesome again.
