Reading A Pattern Language by Christopher Alexander has given me some ideas about modeling data for enterprise IT systems. I’ll jump right into my stream of consciousness. If the organization is viewed as a space in which different types of entities interact to accomplish the goals of the organization, then we can define these entities in spatial terms, much as Alexander lays out a town with its neighborhoods, shopping centers, buildings, roads, and so on. This provides a structure for thinking about how the organization operates and what function IT applications perform within it.

We can begin at the superstructure and then work our way down into the nooks and crannies. At a high level within the organization there are people, processes, and knowledge that interact to accomplish enterprise goals:

People who work for the organization

Processes that the people follow to accomplish goals

Knowledge within the organization

Goals of the organization

We can say that the role of enterprise IT applications is to provide functionality and data to the organization. Functionality allows the people to carry out the processes and can, to some extent, automate them. Data, as long as it is accessible to the people or to automated processes, represents one form of enterprise knowledge. Therefore IT software applications provide:

Functionality provided by IT applications

Data stored in IT systems

IT applications in most cases do not operate in isolation in the enterprise space. They interact with other applications to perform larger functions, assisting in the execution of processes and providing enterprise knowledge, and thus accomplishing enterprise goals. Specifically, the functionality and data of an application interact with the functionality and data of other applications in ways that depend on the classes of functionality and data involved. Data, for its part, can interact with other data or with functionality in different ways:

Silo Data is not shared with other applications, and provides knowledge to only the users of the application.

Shared Application Data is managed by a single application, but is shared with other applications.

Enterprise Data lives outside the applications and belongs to the enterprise as a whole.

In terms of functionality a similar scheme applies:

Silo Functionality is not shared with other applications, and is available only to the users of the application.

Shared Application Functionality is part of a single application, but is available to other applications through an API.

Enterprise Functionality lives outside the applications and belongs to the enterprise as a whole. (An example of this is the shared services provided by a service-oriented architecture, or SOA.)
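To make the three sharing levels concrete, here is a minimal Python sketch that treats them as one scale applying to both data and functionality. The names Scope, Asset, and usable_by, and the example applications "crm" and "erp", are my own assumptions for illustration and not part of any real system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    SILO = "silo"                              # used only inside one application
    SHARED_APPLICATION = "shared application"  # owned by one application, exposed to others
    ENTERPRISE = "enterprise"                  # belongs to the enterprise as a whole


@dataclass(frozen=True)
class Asset:
    """A class of data or a piece of functionality (hypothetical model)."""
    name: str
    kind: str             # "data" or "functionality"
    scope: Scope
    owner: Optional[str]  # owning application; None for enterprise assets


def usable_by(asset: Asset, application: str) -> bool:
    """Return True if the given application can use the asset."""
    if asset.scope is Scope.SILO:
        return asset.owner == application
    # Shared application and enterprise assets are reachable by everyone.
    return True


sales_notes = Asset("sales notes", "data", Scope.SILO, owner="crm")
price_lookup = Asset("price lookup", "functionality", Scope.SHARED_APPLICATION, owner="erp")

print(usable_by(sales_notes, "erp"))   # False: silo data stays in its application
print(usable_by(price_lookup, "crm"))  # True: exposed by its owner to other applications
```

The only distinction the sketch enforces is silo versus everything else. What separates shared application assets from enterprise assets is ownership, which is why the owner field may be empty.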

And there is also a relationship between functionality and data such that functionality performs different operations on the data:

Present the data

Modify the data

Take action based on the data (e.g. assembling a product)

Transform the data into other classes of data

Transfer the data to other storage

And from some of these operations, relationships between data are formed. Additionally, data itself encodes relationships between data. Taken together, these form an enterprise data model describing the relationships between data:

Data lives in isolation

Data is related to other data

Data is a copy of data stored elsewhere

Data is a transformation of other data

For copied and transformed data, a master/slave or synchronization relationship is formed, whereby one set of data may be read-only and another may be both read and written.

Data is the master source for data in its class

Data is a slave destination for data in its class

Data is a synchronized source among different sources of data in its class
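These three roles can be sketched in a few lines of Python. The Role and Store names and the "erp" and "reporting" stores are hypothetical. I am also assuming that a synchronized source accepts writes like a master, which the list above does not spell out.

```python
from enum import Enum


class Role(Enum):
    MASTER = "master"              # the writable source for its class of data
    SLAVE = "slave"                # a read-only destination fed from a master
    SYNCHRONIZED = "synchronized"  # one of several writable peers (assumption)


class Store:
    """A place where one class of data is kept (hypothetical model)."""

    def __init__(self, name: str, role: Role) -> None:
        self.name = name
        self.role = role
        self.records: dict = {}

    def write(self, key: str, value: str) -> None:
        """Application-level write; refused on read-only copies."""
        if self.role is Role.SLAVE:
            raise PermissionError(f"{self.name} is a read-only copy")
        self.records[key] = value

    def copy_from(self, source: "Store") -> None:
        """Replication path: refresh this store from another one."""
        self.records = dict(source.records)


master = Store("erp", Role.MASTER)
replica = Store("reporting", Role.SLAVE)

master.write("c1", "Acme Corp")
replica.copy_from(master)
print(replica.records)  # {'c1': 'Acme Corp'}
```

Keeping write and copy_from separate reflects the idea that a slave is read-only for its users but still has to receive data from somewhere.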

For related data different relationships are possible:

Data refers to other data; the data being referred to can be used independently.

Data contains other data; the contained data is dependent on and intertwined with the containing data.

Data represents another view on other data

Referring and containing are well known within traditional data modeling, though not as two relationships (as described here) but as one. However, to understand data properly it is important to know where the scope of a particular class of data ends. Clearly, in the case of orders, the order lines are contained within the order, and the two can be treated as a unit. It would not make sense to delete an order without deleting its lines, hence the “contained”, dependent relationship. Conversely, referred data, such as the customer for whom an order is placed, is independent. If the order is deleted, the data for the customer that placed the order is not necessarily deleted. The customer data could be used to place other orders and is therefore independent. Once the order is invoiced, the third situation may arise. Order, customer and address data can change over time, but an invoice (for legal reasons) needs to represent that data at a single point in time. To accomplish this, many systems take a snapshot of the data and store that snapshot in special invoice tables. This creates data that represents another view of the original data. What views, then, are possible? Several new ideas emerge:

Data as the historical records of other data

Data as the understanding of other data from different points of view
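The order, customer, and invoice example can be sketched in Python to show how the three relationships differ in practice. All class and field names here are illustrative assumptions. A real system would do this with tables and foreign keys, but the shape is the same.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Customer:
    name: str
    address: str


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    customer: Customer                                    # refers to: independent data
    lines: List[OrderLine] = field(default_factory=list)  # contains: dependent data


@dataclass(frozen=True)
class Invoice:
    """A point-in-time view: values are copied, not referenced."""
    customer_name: str
    customer_address: str
    lines: Tuple[Tuple[str, int], ...]


def invoice_from(order: Order) -> Invoice:
    """Take a snapshot of the order and the customer as they are right now."""
    return Invoice(
        customer_name=order.customer.name,
        customer_address=order.customer.address,
        lines=tuple((line.product, line.quantity) for line in order.lines),
    )


customer = Customer("Ada", "1 Old Road")
order = Order(customer, [OrderLine("widget", 2)])
invoice = invoice_from(order)

customer.address = "2 New Street"  # the customer moves after invoicing
del order                          # the lines go with the order; the customer does not

print(invoice.customer_address)  # 1 Old Road
print(customer.address)          # 2 New Street
```

The invoice is frozen and holds copies, so later changes to the customer cannot leak into it. That is the whole point of the snapshot.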

Historical data is covered by the invoice example, but is also common in the form of versioning systems that allow users to undo changes to data and roll back to previous versions. Data from different points of view, however, is not (yet) as common in IT systems. Data is not truth or reality itself, but an attempt to represent truth and reality. This indirect nature of data leads to problems of point of view, the kind of problems on which most murder mystery novels are based. If the detectives in the novel understood information as most IT systems do, they would be led from one conclusion to the next with no hope of solving the crime. In IT, customer data management systems fall victim to this confusion. The customer data in the internal system that sales people use may have a different point of view from the customer address data that the customers enter themselves for online purchases. I have watched in horror as online users change the address of an order by wiping out (known as overlaying in the business) the first address in their online address book. Did they not see the add address button? I wonder how our new customer hub software will deal with that. Many customer hubs attempt to join all the matching data to one master record so that it is possible to look at the data from multiple points of view. In the case that the customer overlays the data, the hub will rejoin data to its master records dynamically. These hubs are attempting to tackle data from different points of view. Data about truth is not what it seems.
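As a rough illustration of keeping several points of view under one master record, here is a small Python sketch. It is not how any particular customer hub product works. The CustomerHub class, its methods, and the "sales" and "web" source names are all assumptions made for the example, and real hubs also have to match records, which this skips entirely.

```python
from typing import Dict


class CustomerHub:
    """Keeps each source system's view of a customer under one master id."""

    def __init__(self) -> None:
        # master id -> source system -> address as that source sees it
        self._views: Dict[str, Dict[str, str]] = {}

    def record(self, master_id: str, source: str, address: str) -> None:
        """Store or overlay the address reported by one source system."""
        self._views.setdefault(master_id, {})[source] = address

    def views(self, master_id: str) -> Dict[str, str]:
        """Return every point of view held for this master record."""
        return dict(self._views.get(master_id, {}))


hub = CustomerHub()
hub.record("c1", "sales", "1 Old Road")
hub.record("c1", "web", "1 Old Road")

# The customer overlays their online address with a one-off delivery address.
hub.record("c1", "web", "9 Gift Lane")

print(hub.views("c1"))  # {'sales': '1 Old Road', 'web': '9 Gift Lane'}
```

Because each source writes only to its own slot, the overlay changes the web view without destroying what the sales system believes.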

Just as Alexander has done in his book about “that other kind of architecture”, here we have laid out a language of software, functionality, and data, or at least we have begun to. Is this analysis valuable? Where do we go next? I suppose that is what blogs are good for. Explore on! (later.)