<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Information Architecture</title>
	<atom:link href="http://mtruchard.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://mtruchard.wordpress.com</link>
	<description>Modeling data for software applications.</description>
	<lastBuildDate>Wed, 30 Sep 2009 12:23:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='mtruchard.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Information Architecture</title>
		<link>http://mtruchard.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://mtruchard.wordpress.com/osd.xml" title="Information Architecture" />
	<atom:link rel='hub' href='http://mtruchard.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Data Modeling as a hobby</title>
		<link>http://mtruchard.wordpress.com/2009/09/30/data-modeling-as-a-hobby/</link>
		<comments>http://mtruchard.wordpress.com/2009/09/30/data-modeling-as-a-hobby/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 12:23:45 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=28</guid>
		<description><![CDATA[To me data modeling is an interesting subject and I suppose it takes a certain personality.  But can it really be a hobby?  Is it something people would do in their spare time?  I wonder.  Recently I read an article &#8220;Why computer modeling should become a popular hobby&#8221; that suggests two things: 1) there are [...]]]></description>
			<content:encoded><![CDATA[<p>To me, data modeling is an interesting subject, and I suppose it takes a certain personality.  But can it really be a hobby?  Is it something people would do in their spare time?  I wonder.  Recently I read an article, &#8220;<em><a href="http://www.dlib.org/dlib/october96/10forbus.html">Why computer modeling should become a popular hobby</a></em>&#8221;, that suggests two things: 1) there are people out there who get into modeling; and 2) maybe modeling just doesn&#8217;t quite have what it takes to keep a wider audience interested (yet?).</p>
<p>To be a hobby, modeling would need to be something you could show off.  Hey, check out this data model!  But modeling doesn&#8217;t work on its own.  Really there are three components:</p>
<p><em>Model, Information, Application</em></p>
<p>Without all three you don&#8217;t have a hobby.</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2009/09/30/data-modeling-as-a-hobby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
		<item>
		<title>Complexities of Categorization</title>
		<link>http://mtruchard.wordpress.com/2008/05/27/on-the-categorization-of-shampoo/</link>
		<comments>http://mtruchard.wordpress.com/2008/05/27/on-the-categorization-of-shampoo/#comments</comments>
		<pubDate>Tue, 27 May 2008 20:56:59 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Categorization]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=18</guid>
		<description><![CDATA[Using hair care products as an example I will go through an exercise that illustrates some of the tricky issues encountered with categorizing data.  Why do hair care products need categorization? Suppose we are creating a hair care shopping web site where consumers are guided to the right hair care products for their particular needs. Surprisingly, even simple [...]]]></description>
			<content:encoded><![CDATA[<p>Using hair care products as an example, I will go through an exercise that illustrates some of the tricky issues encountered when categorizing data.  Why do hair care products need categorization? Suppose we are creating a hair care shopping web site where consumers are guided to the right hair care products for their particular needs. Surprisingly, even simple shopping web sites need information architecture work (at some point in their lives), and even more surprisingly, if you try to design the categorization up front it can get messy very quickly. This leads to the question of whether it is even possible to get it right. What is so complex about categorization? Let&#8217;s explore&#8230;<br />
<span id="more-18"></span></p>
<p>Hundreds of hair products are out there. Ever since the &#8217;90s I have found there to be a confusing array of options surrounding hair care products. I go into the store looking for a shampoo and end up in an aisle stacked with hundreds of different bottles organized in a way that makes absolutely no sense (at least not to me).  Can there really be hundreds of different products? Is this really all shampoo? You know, the stuff you wash your hair with? It seems shampooing has transformed, thanks to the marketing genius of the last few decades, from basic hygiene into a brave new world of concepts of which I had never before heard.  What really matters is how a shampoo makes you feel. Take, for example, this one direct from the label: an aromatherapeutic, organic, paraben-free, moisture-balancing, 100% biodegradable, no artificial colors, no animal ingredients, pH-balanced shampoo for all hair types. I nostalgically remember from my childhood a simple world where there was no confusion. Where&#8217;s the shampoo? Oh, it&#8217;s right here, comes in one size, works for everyone. See, the label says &#8220;shampoo&#8221; and it is right next to the package labeled &#8220;toothpaste&#8221;. Well, that&#8217;s the way I remember it anyway.</p>
<p>The shampoo product selection can be confusing, but we are not without hope. In my work as an IT Architect, I&#8217;ve seen harder categorization problems solved. No, we don&#8217;t need no shampoo expert; all we need is a little bit of taxonomy, you know, like Carl Linnaeus did for plants and animals. We&#8217;ll lay out the kingdoms, classes, orders, genera, and species for shampoo, then plug it all into a product configurator, navigate through the choices, and presto-magico, a shampoo will be selected for us. And just for fun we&#8217;ll throw in the rest of the hair care family of products. It&#8217;s that easy.  Let&#8217;s jump in&#8230;</p>
<p>Here is a first attempt at the high level categories:</p>
<ul>
<li>Hair Care Product
<ul>
<li>Shampoo</li>
<li>Conditioner</li>
<li>Hair Gel</li>
<li>Hair Mousse</li>
<li>Hair Dye</li>
</ul>
</li>
</ul>
<p>We could say, though, that gel and mousse are kind of the same thing. So I&#8217;ll check it out with Wikipedia (usually I would look up ISO standards, but I don&#8217;t think they cover hair). Wikipedia has an article on gel which gives me a few more categories:</p>
<ul>
<li>Hair Care Product
<ul>
<li>Hair Spray</li>
<li>Hair Glue</li>
<li>Hair Wax</li>
<li>Ethnic Gel</li>
<li>Hair Coloring Gel</li>
</ul>
</li>
</ul>
<p>I&#8217;m not sure quite what the difference is, but level of hold seems to be a factor.  Also, Wikipedia suggested a type of gel that could also go in the hair dye category.  And I can think of another product that straddles two categories:</p>
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
</ul>
<blockquote><p><span style="color:#000000;"><strong>Issue: not all things fall neatly into one category.  This calls into question the use of single-parent hierarchies for all cases.</strong></span></p></blockquote>
<p>No problem, we&#8217;ll just whip up a poly-hierarchy and put these categories under two parents.</p>
<ul>
<li>Hair Care Product
<ul>
<li>Shampoo
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
</ul>
</li>
<li>Conditioner
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
</ul>
</li>
<li>Hair Styling Product
<ul>
<li>Hair Gel
<ul>
<li>Ethnic Gel</li>
</ul>
</li>
<li>Hair Spray</li>
<li>Hair Wax</li>
<li>Hair Coloring Gel</li>
</ul>
</li>
<li>Hair Dye
<ul>
<li>Hair Coloring Gel</li>
</ul>
</li>
</ul>
</li>
</ul>
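<p>As a rough illustration of what a poly-hierarchy means in code, here is a minimal Python sketch in which a category may have more than one parent. The class name <code>PolyHierarchy</code>, its methods, and the shortened label "All in One" are assumptions made for this example, not part of any real product configurator.</p>

```python
from collections import defaultdict


class PolyHierarchy:
    """Category graph in which a node may have more than one parent."""

    def __init__(self):
        self.children = defaultdict(list)
        self.parents = defaultdict(list)

    def add(self, parent, child):
        """Link child under parent; repeated links are ignored."""
        if child not in self.children[parent]:
            self.children[parent].append(child)
            self.parents[child].append(parent)

    def paths_to(self, node):
        """Return every root-to-node path, one per chain of parents."""
        if not self.parents.get(node):
            return [[node]]
        return [path + [node]
                for parent in self.parents[node]
                for path in self.paths_to(parent)]


def build_hair_care():
    """Build a hierarchy using category names from the lists above."""
    h = PolyHierarchy()
    for parent, child in [
        ("Hair Care Product", "Shampoo"),
        ("Hair Care Product", "Conditioner"),
        ("Hair Care Product", "Hair Styling Product"),
        ("Hair Care Product", "Hair Dye"),
        ("Shampoo", "All in One"),        # first parent
        ("Conditioner", "All in One"),    # second parent
        ("Hair Styling Product", "Hair Gel"),
        ("Hair Styling Product", "Hair Coloring Gel"),
        ("Hair Dye", "Hair Coloring Gel"),
    ]:
        h.add(parent, child)
    return h


if __name__ == "__main__":
    for path in build_hair_care().paths_to("All in One"):
        print(" / ".join(path))
```

<p>A category with two parents is reachable by two drill-down paths, which is exactly what the single-parent tree could not express.</p>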
<p>Notice that there are four categories that contain only one sub-category: &#8220;Shampoo&#8221;, &#8220;Conditioner&#8221;, &#8220;Hair Gel&#8221;, and &#8220;Hair Dye&#8221;.  In all four cases the sub-category by no means covers all of the products in the category.  When selecting a category, users will typically try to drill down the hierarchy and pick a leaf sub-category.  If, for example, I am trying to categorize &#8220;Redken Clear Moisture Shampoo&#8221; I would drill into &#8220;Shampoo&#8221; and look for an appropriate sub-category, but none exists.  For this reason we may want to create &#8220;other&#8221; categories to help users pick.  (I have seen this situation arise time and time again.)</p>
<blockquote><p><span style="color:#000000;"><strong>Issue: the &#8220;other&#8221; category.  Sometimes a situation occurs where items fall into a parent category, but none of the parent&#8217;s subcategories make sense.  Here you are tempted to create the &#8220;other&#8221; category to make the hierarchy more user friendly.  Sometimes a need for a &#8220;None&#8221; or &#8220;Not Selected&#8221; category also arises.</strong></span></p></blockquote>
<p>Here is the hierarchy with the &#8220;other&#8221; categories added.</p>
<ul>
<li>Hair Care Product
<ul>
<li>Shampoo
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
<li>Other Shampoos</li>
</ul>
</li>
<li>Conditioner
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
<li>Other Conditioners</li>
</ul>
</li>
<li>Hair Styling Product
<ul>
<li>Hair Gel
<ul>
<li>Ethnic Gel</li>
<li>Other Gel</li>
</ul>
</li>
<li>Hair Spray</li>
<li>Hair Wax</li>
<li>Hair Coloring Gel</li>
</ul>
</li>
<li>Hair Dye
<ul>
<li>Hair Coloring Gel</li>
<li>Other Hair Dyes</li>
</ul>
</li>
</ul>
</li>
</ul>
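<p>The "other" fix can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the hierarchy is a plain dictionary, the helper names are made up for the example, and the label is built as "Other " plus the parent name rather than the pluralized labels used in the list above. Having exactly one sub-category is used here as a rough signal; whether that sub-category really fails to cover the parent is still a human judgment.</p>

```python
# Parent category mapped to its sub-categories (names taken from the post).
HAIR_CARE = {
    "Hair Care Product": ["Shampoo", "Conditioner",
                          "Hair Styling Product", "Hair Dye"],
    "Shampoo": ["All in One"],
    "Conditioner": ["All in One"],
    "Hair Styling Product": ["Hair Gel", "Hair Spray",
                             "Hair Wax", "Hair Coloring Gel"],
    "Hair Gel": ["Ethnic Gel"],
    "Hair Dye": ["Hair Coloring Gel"],
}


def single_child_categories(children):
    """Categories with exactly one sub-category: candidates for 'Other'."""
    return [parent for parent, kids in children.items() if len(kids) == 1]


def add_other_categories(children, needs_other):
    """Return a copy with an 'Other ...' leaf under each listed category."""
    result = {parent: list(kids) for parent, kids in children.items()}
    for parent in needs_other:
        label = "Other " + parent
        if label not in result.setdefault(parent, []):
            result[parent].append(label)
    return result


if __name__ == "__main__":
    candidates = single_child_categories(HAIR_CARE)
    print(candidates)
    print(add_other_categories(HAIR_CARE, candidates)["Shampoo"])
```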
<p>Now it seems we have a good start on the root of the hierarchy, but we still don&#8217;t have enough sub-categories to easily find one product among hundreds.  Let&#8217;s take another stab at the lower levels of the hierarchy.  Looking around the web, I easily find an online drugstore with useful-seeming hierarchies of shampoo and conditioner.  Now that we have more sub-categories under &#8220;Shampoo&#8221; and &#8220;Conditioner&#8221; I can get rid of my &#8220;Other&#8221; categories:</p>
<ul>
<li>Hair Care Product
<ul>
<li>Shampoo
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
<li>Moisturizing Shampoo</li>
<li>Dandruff Shampoo</li>
<li>Natural Shampoo</li>
<li>Everyday Usage Shampoo</li>
<li>Children&#8217;s Shampoo</li>
</ul>
</li>
<li>Conditioner
<ul>
<li>All in One &#8211; Shampoo &amp; Conditioner</li>
<li>Dandruff Conditioner</li>
<li>Natural Conditioner</li>
<li>Color Treated Hair Conditioner</li>
<li>Detanglers</li>
<li>Leave in Conditioner</li>
<li>Children&#8217;s Conditioner</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Shampoos and conditioners have several overlapping concepts (such as &#8220;Children&#8217;s Conditioner&#8221; and &#8220;Children&#8217;s Shampoo&#8221; in the list above), but these cannot be solved with a single category node in the poly-hierarchy since they are different categories.  The same concept is applied to multiple category nodes.  Such a situation usually indicates that we need to pull these concepts out into another dimension of categorization.  Maybe we create a &#8220;used for&#8221; categorization taxonomy:</p>
<ul>
<li>Used For
<ul>
<li>Cleaning</li>
<li>Moisturizing</li>
<li>Dandruff</li>
<li>Everyday Usage</li>
<li>Children</li>
<li>Color Treated Hair</li>
<li>Detangling</li>
</ul>
</li>
</ul>
<blockquote><p>Rule: Separate out overlapping concepts into their own taxonomy.  As much as possible, a single taxonomy should represent a single, consistent viewpoint on the problem.</p></blockquote>
<p>Notice now that this new taxonomy is a little different in the way it gets applied. A single product can be tagged with several of these categories, whereas in our &#8220;Hair Care Product&#8221; taxonomy we attempted to create categories that would uniquely describe each product.  The multiple parent issue we had with &#8220;All-in-One&#8221; shampoos and conditioners could be solved this way also.  It might even work better that way since &#8220;all-in-one&#8221; could include other functions such as dyeing hair.  This thinking radically changes the hierarchy.  Here we have three taxonomies (I have added another to handle the &#8220;ingredients&#8221; dimension):</p>
<ul>
<li>Hair Care Product
<ul>
<li>Shampoo</li>
<li>Conditioner</li>
<li>Hair Styling Products
<ul>
<li>Hair Gel</li>
<li>Hair Spray</li>
<li>Hair Wax</li>
</ul>
</li>
<li>Hair Dye</li>
</ul>
</li>
<li>Used For
<ul>
<li>Cleaning</li>
<li>Moisturizing</li>
<li>Dandruff</li>
<li>Everyday Usage</li>
<li>Children</li>
<li>Color Treated Hair</li>
<li>Detangling</li>
<li>Ethnic Hair</li>
</ul>
</li>
<li>Made With
<ul>
<li>Natural Ingredients</li>
<li>Organic Ingredients</li>
<li>Edible Ingredients</li>
</ul>
</li>
</ul>
<p>This also changes the way we categorize each product.  Rather than picking the category as a single point in a single taxonomy, we pick multiple categories from multiple taxonomies.  Here is an example of the categorization for a fictional product:</p>
<ul>
<li>Studio X All-in-One Natural Shampoo and Conditioner
<ul>
<li>Shampoo</li>
<li>Conditioner</li>
<li>Cleaning</li>
<li>Moisturizing</li>
<li>Natural Ingredients</li>
</ul>
</li>
</ul>
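<p>To make the multi-taxonomy idea concrete, here is a small Python sketch of a product tagged along several independent taxonomies and a query that filters on any combination of them. The structure and the function names <code>validate</code> and <code>matches</code> are assumptions for illustration, and the "Hair Styling Products" level is flattened into its leaf categories to keep the example short.</p>

```python
# Each taxonomy is an independent dimension with its own allowed categories.
TAXONOMIES = {
    "Hair Care Product": {"Shampoo", "Conditioner", "Hair Gel",
                          "Hair Spray", "Hair Wax", "Hair Dye"},
    "Used For": {"Cleaning", "Moisturizing", "Dandruff", "Everyday Usage",
                 "Children", "Color Treated Hair", "Detangling",
                 "Ethnic Hair"},
    "Made With": {"Natural Ingredients", "Organic Ingredients",
                  "Edible Ingredients"},
}

# The fictional product from the post, tagged in all three taxonomies.
STUDIO_X = {
    "Hair Care Product": {"Shampoo", "Conditioner"},
    "Used For": {"Cleaning", "Moisturizing"},
    "Made With": {"Natural Ingredients"},
}


def validate(tags):
    """Raise ValueError if a tag is not a known category in its taxonomy."""
    for taxonomy, values in tags.items():
        unknown = set(values) - TAXONOMIES.get(taxonomy, set())
        if unknown:
            raise ValueError(f"{taxonomy}: unknown {sorted(unknown)}")


def matches(tags, query):
    """True when the product carries every category asked for in query."""
    return all(set(wanted).issubset(tags.get(taxonomy, ()))
               for taxonomy, wanted in query.items())


if __name__ == "__main__":
    validate(STUDIO_X)
    print(matches(STUDIO_X, {"Hair Care Product": {"Shampoo"},
                             "Made With": {"Natural Ingredients"}}))
    print(matches(STUDIO_X, {"Used For": {"Dandruff"}}))
```

<p>Because each taxonomy is checked on its own, a shopper can narrow by product type, by use, by ingredients, or by any mix of the three without the categories needing to know about each other.</p>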
<p>What began as a rigid hierarchical classification system has evolved into a more flexible system which is almost as intuitive and in some cases maybe more intuitive.  This is good progress for our design.  It&#8217;s not yet perfect, but it has come a long way.  In the end it is worthwhile to challenge the single hierarchy idea.  But this new approach does take a little practice to get right.  We have to get good at analyzing and picking abstract concepts such as &#8220;Used For&#8221; and &#8220;Made With&#8221;.  Imagine these abstract concepts as dimensions in a multidimensional space, and try to make them all orthogonal so that the concepts work independently, taking you to the correct point in your categorization space.</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2008/05/27/on-the-categorization-of-shampoo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
		<item>
		<title>Notes on Principles of Programming</title>
		<link>http://mtruchard.wordpress.com/2008/05/21/notes-on-principles-of-programming/</link>
		<comments>http://mtruchard.wordpress.com/2008/05/21/notes-on-principles-of-programming/#comments</comments>
		<pubDate>Wed, 21 May 2008 13:49:03 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Development Process]]></category>
		<category><![CDATA[Software Architecture]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=19</guid>
		<description><![CDATA[These are principles that I have picked up along the way which have been reinforced by experience.  These principles, which can be found in many programming books, I have found to be generally valid and useful.  I&#8217;ll add to this list as I remember them.   Verify As You Go Devise ways of testing the [...]]]></description>
			<content:encoded><![CDATA[<p>These are principles that I have picked up along the way which have been reinforced by experience.  These principles, which can be found in many programming books, I have found to be generally valid and useful.  I&#8217;ll add to this list as I remember them.<br />
<span id="more-19"></span><br />
 </p>
<h2>Verify As You Go</h2>
<p>Devise ways of testing the theory, design, and implementation of your project as you work.  Create ways of walking through the design, build unit tests as you code, and keep the application in a runnable state as you work so that it can be tested.</p>
<h2>Design Components In One Scale</h2>
<p>Design systems, data structures, classes and other components to work at a particular scale.  Components should interact with other components at different scales but should not perform actions at those scales.  For example, a class that is designed to hold data should not also attempt to perform UI operations or database reads and writes.  This rule creates a simple tiered structure that can be managed and understood as the application develops.  A good design should consider the thousands of angles and interactions within the program.  Designing components at their appropriate scales allows you to narrow the problem down to those angles and interactions working at one scale at a time.</p>
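<p>A minimal Python sketch of this principle, under stated assumptions: <code>Customer</code>, <code>CustomerStore</code>, and <code>render</code> are hypothetical names, and an in-memory dictionary of JSON strings stands in for a real database. The data class holds values only, while storage and presentation each live in their own component.</p>

```python
import json
from dataclasses import dataclass


@dataclass
class Customer:
    """Data scale: holds values only; no UI or storage calls."""
    name: str
    email: str


class CustomerStore:
    """Storage scale: reads and writes Customer records.

    A dict of JSON strings stands in for a database in this sketch.
    """

    def __init__(self):
        self._rows = {}

    def save(self, customer):
        row = {"name": customer.name, "email": customer.email}
        self._rows[customer.email] = json.dumps(row)

    def load(self, email):
        return Customer(**json.loads(self._rows[email]))


def render(customer):
    """Presentation scale: formats a Customer for display."""
    return f"{customer.name} ({customer.email})"


if __name__ == "__main__":
    store = CustomerStore()
    store.save(Customer("Ada", "ada@example.com"))
    print(render(store.load("ada@example.com")))
```

<p>Swapping the dictionary for a real database would touch only <code>CustomerStore</code>; the data class and the rendering function would not need to change.</p>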
<h2>Refactor in Bursts</h2>
<p>Be aware of the impact of your design choices as you code.  Refactor when it becomes consistently clear that the design can be significantly improved.  Refactor when the team has clarity about what design fix is needed, not when the project plan calls for refactoring.  Make the decision to refactor first, then plan the time to do it.  As the cost of refactoring builds with the amount of code, devise ways of testing the new design (Verify As You Go) before you refactor.</p>
<h2>Learn As You Go</h2>
<p>The best time to learn design skills or new technologies is when you are working on a project.  Research as you work.</p>
<h2>Don&#8217;t Copy, Steal</h2>
<p>When you copy, without modification, design or code from another application or from a book (such as a book on design patterns), you are not taking ownership.  The copy remains in your application as someone else&#8217;s idea, someone else&#8217;s code.  It makes for an incongruity in the application design where we shift from one paradigm to another.  It also may be an area of your own application that you don&#8217;t understand. Rather than copy, steal.  Take ownership of the stolen ideas, make them your own, understand them, incorporate them into your code.  When you bring in an outside library, don&#8217;t recode it, but do understand how to use it and devise your own way of using it that is unique to your application; take ownership of its use.  Try not to &#8220;wrap&#8221; outside libraries: the insulating buffer isolates you too much from the library&#8217;s intent and, unless very well designed, may not provide the interchangeability you strive for.  Rather, learn the library and understand its use.  If you need a wrapper, learn the intent of similar libraries and build an interface that really is interchangeable.</p>
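<p>One possible reading of "an interface that really is interchangeable" is sketched below in Python. Everything here is hypothetical: <code>KeyValueStore</code>, the two backends, and <code>remember_visit</code> are invented for the example and do not wrap any real library. The interface is stated in terms of what the application needs, and two backends with different internals both satisfy it.</p>

```python
from typing import Protocol


class KeyValueStore(Protocol):
    """What the application needs from storage, and nothing more."""

    def get(self, key): ...

    def put(self, key, value): ...


class DictStore:
    """Backend one: a plain dictionary."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class LogStore:
    """Backend two: an append-only list, newest entry wins."""

    def __init__(self):
        self._pairs = []

    def get(self, key):
        for stored_key, value in reversed(self._pairs):
            if stored_key == key:
                return value
        return None

    def put(self, key, value):
        self._pairs.append((key, value))


def remember_visit(store, user):
    """Application code written only against the KeyValueStore interface."""
    count = (store.get(user) or 0) + 1
    store.put(user, count)
    return count


if __name__ == "__main__":
    for backend in (DictStore(), LogStore()):
        remember_visit(backend, "ada")
        print(type(backend).__name__, remember_visit(backend, "ada"))
```

<p>The test of interchangeability is that <code>remember_visit</code> behaves the same with either backend without knowing which one it was given.</p>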
<h2>Create Reuse on the Third Implementation</h2>
<p>The first implementation of a function is unique to its particular context.  The second implementation is an experiment in reuse.  The third implementation creates the library.  It is good to strive to create reusable components in your code, but don&#8217;t force it.  Just because a reusable library can be written does not mean it is useful.  A reusable library that is never reused is a waste of effort.</p>
<h2>Evolve the Design</h2>
<p>Create a high level vision for the project.  Understand what it must be to accomplish its basic goals.  Understand what it has the potential to be if well designed. Understand what is important and what are the details.  Evolve this vision as you work.  Refactor in Bursts.  Jettison unnecessary baggage.  Remain true to the vision.  If the vision divides, create another project with a different vision.  Do this as you work, not in a preplanned &#8220;design&#8221; phase.</p>
<h2>Create Diagrams that Communicate</h2>
<p>The purpose of diagrams and documentation is to communicate.  Don&#8217;t clutter them by attempting to convey every detail of every system, class, operation, table column, variable, relationship, enumeration, etc.  Use economy and focus to highlight the details that present the broader picture while removing details that don&#8217;t.  Instead of one giant diagram that represents everything, create smaller, digestible diagrams that represent the system from different viewpoints: how the user sees the application, the high-level table structure, how the application is to be deployed on servers, etc.  Customize these views by being selective with details.  For the complete detailed picture, rely on the source code and installation scripts (try not to have many manual build/installation steps).  Use automated tools to generate documentation directly from the code.</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2008/05/21/notes-on-principles-of-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
		<item>
		<title>Designing the Enterprise IT Data Space</title>
		<link>http://mtruchard.wordpress.com/2008/04/22/designing-the-enterprise-data-space/</link>
		<comments>http://mtruchard.wordpress.com/2008/04/22/designing-the-enterprise-data-space/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 15:49:40 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Theory]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=15</guid>
		<description><![CDATA[Reading A Pattern Language by Christopher Alexander has given me some ideas about modeling data for enterprise IT systems. I&#8217;ll jump right in to my stream of consciousness. If the organization is viewed as a space in which different types of entities interact to accomplish the goals of the organization then we can define these [...]]]></description>
			<content:encoded><![CDATA[<p>Reading <em>A Pattern Language</em> by Christopher Alexander has given me some ideas about modeling data for enterprise IT systems.  I&#8217;ll jump right in to my stream of consciousness.  If the organization is viewed as a space in which different types of entities interact to accomplish the goals of the organization, then we can define these entities in spatial terms, much as Alexander lays out a town with its neighborhoods, shopping centers, buildings, roads, etc. This provides a structure for thinking about how the organization operates and what function IT applications perform within that organization.<span id="more-15"></span></p>
<p>We can begin at the superstructure and then work our way down into the nooks and crannies.  At a high level within the organization there are people, processes, and knowledge that interact to accomplish enterprise goals:</p>
<p style="padding-left:30px;"><strong>People</strong> who work for the organization</p>
<p style="padding-left:30px;"><strong>Processes</strong> that the people follow to accomplish goals</p>
<p style="padding-left:30px;"><strong>Knowledge</strong> within the organization</p>
<p style="padding-left:30px;"><strong>Goals</strong> of the organization</p>
<p>We can say that the role of enterprise IT applications is to provide functionality and data to the organization. Functionality allows the people to carry out the processes and of course can, to some extent, automate the processes. Data, as long as it is accessible by the people or automated processes, represents one form of enterprise knowledge. Therefore IT software applications provide:</p>
<p style="padding-left:30px;"><strong>Functionality</strong> provided by IT applications</p>
<p style="padding-left:30px;"><strong>Data</strong> stored in IT systems</p>
<p>In most cases IT applications do not operate in isolation in the enterprise space; they interact with other applications to perform larger functions, assisting in the execution of processes and providing enterprise knowledge, thus accomplishing enterprise goals. Specifically, the functionality and data of an application interact with the functionality and data of other applications in ways that are unique to the different classes of functionality and data involved. Data can interact with other data or with functionality in different ways:</p>
<p style="padding-left:30px;"><strong>Silo Data</strong> is not shared with other applications, and provides knowledge to only the users of the application.</p>
<p style="padding-left:30px;"><strong>Shared Application Data</strong> is managed by a single application, but is shared with other applications.</p>
<p style="padding-left:30px;"><strong>Enterprise Data</strong> lives outside the applications and belongs to the enterprise as a whole.</p>
<p>In terms of functionality a similar scheme applies:</p>
<p style="padding-left:30px;"><strong>Silo Functionality</strong> is not shared with other applications, and provides functionality only to the users of the application.</p>
<p style="padding-left:30px;"><strong>Shared Application Functionality</strong> is part of a single application, but is available to other applications through an API.</p>
<p style="padding-left:30px;"><strong>Enterprise Functionality</strong> lives outside the applications and belongs to the enterprise as a whole. (An example of this is the shared services provided by a service-oriented architecture, or SOA.)</p>
<p>And there is also a relationship between functionality and data such that functionality performs different operations on the data:</p>
<p style="padding-left:30px;"><strong>Present the data</strong></p>
<p style="padding-left:30px;"><strong>Modify the data</strong></p>
<p style="padding-left:30px;"><strong>Take action</strong> based on the data (e.g. assembling a product)</p>
<p style="padding-left:30px;"><strong>Transform</strong> the data into other classes of data</p>
<p style="padding-left:30px;"><strong>Transfer</strong> the data to other storage.</p>
<p>And from some of these operations, relationships between data are formed. Additionally, data itself encodes relationships between data. Taken together, these form an enterprise data model describing the relationships between data:</p>
<p style="padding-left:30px;"><strong>Data lives in isolation</strong></p>
<p style="padding-left:30px;"><strong>Data is related to other data</strong></p>
<p style="padding-left:30px;"><strong>Data is a copy</strong> of data stored elsewhere</p>
<p style="padding-left:30px;"><strong>Data is a transformation</strong> of other data</p>
<p>For copied and transformed data a master/slave or synchronization relationship is formed whereby one set of data may be read only and another may be both read and written to.</p>
<p style="padding-left:30px;"><strong>Data is the master source</strong> for data in its class</p>
<p style="padding-left:30px;"><strong>Data is a slave destination</strong> for data in its class</p>
<p style="padding-left:30px;"><strong>Data is a synchronized source</strong> among different sources of data in its class</p>
<p>For related data different relationships are possible:</p>
<p style="padding-left:30px;"><strong>Data refers to other data</strong>; the data being referred to can be used independently.</p>
<p style="padding-left:30px;"><strong>Data contains other data</strong>; the contained data is dependent on and intertwined with the data.</p>
<p style="padding-left:30px;"><strong>Data represents another view</strong> on other data</p>
<p>Data referring and data containing are well known within traditional data modeling, not as two relationships (as described here) but as one. However, to understand data properly it is important to know where the scope of a particular class of data ends. Clearly, in the case of orders, the order lines are contained within the order, and the two can be treated as a unit. It would not make sense to delete an order without deleting its lines; thus the &#8220;contained&#8221;, dependent relationship.  Conversely, referred data, such as the customer for whom an order is placed, is independent.  If the order is deleted, the data for the customer that placed the order is not necessarily deleted.  The customer data could be used to place other orders and is therefore independent.  Once the order is invoiced, the third situation may arise.  Order, customer and address data can change over time, but an invoice (for legal reasons) needs to represent that data at a single point in time.  To accomplish this, many systems take a snapshot of the data and store that snapshot in special invoice tables.  This creates data that represents another view of the original data.  Then, what views are possible?  Several new ideas emerge:</p>
<p style="padding-left:30px;"><strong>Data as the historical</strong> records of other data</p>
<p style="padding-left:30px;"><strong>Data as the understanding</strong> of other data <strong>from different points of view</strong></p>
<p>Historical data is covered by the invoice example, but is also common in the form of versioning systems that allow users to undo changes to data and roll back to previous versions.  Data from different points of view, however, is not (yet) as common in IT systems.  Data is not truth or reality itself, but attempts to represent truth and reality.  This indirect nature of data leads to problems of point of view, the kind of problems on which most murder mystery novels are based.  If the detectives in the novel understood information as most IT systems do, they would be led from one conclusion to the next with no hope of solving the crime.  In IT, customer data management systems fall victim to this confusion.  The customer data in the internal system that sales people use may have a different point of view from the customer address data that the customers enter themselves for online purchases.  I have watched in horror as online users change the address of an order by wiping out (known as overlaying in the business) the first address in their online address book.  Did they not see the add address button?  I wonder how our new customer hub software will deal with that.  Many customer hubs attempt to join all the matching data to one master record so that it is possible to look at the data from multiple points of view.  When a customer overlays the data, the hub rejoins the data to its master records dynamically.  These hubs are attempting to tackle data from different points of view.  Data about truth is not what it seems.</p>
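<p>The three relationships for related data (refers to, contains, and represents another view) can be sketched in a few lines of Java. This is only an illustration: the class and field names below are hypothetical and do not come from any particular system.</p>

```java
// Hypothetical classes, named here only to illustrate the three relationships.
class Customer {
    String name;
    String address;
    Customer(String name, String address) {
        this.name = name;
        this.address = address;
    }
}

class OrderLine {
    final String product;
    final int quantity;
    OrderLine(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

class Order {
    final Customer customer;   // "refers to": independent data, usable by other orders
    final OrderLine[] lines;   // "contains": dependent data, deleted with the order
    Order(Customer customer, OrderLine... lines) {
        this.customer = customer;
        this.lines = lines;
    }
}

class Invoice {
    // "another view": values are copied at invoicing time rather than referenced
    final String customerName;
    final String customerAddress;
    final OrderLine[] lines;
    Invoice(Order order) {
        this.customerName = order.customer.name;
        this.customerAddress = order.customer.address;
        this.lines = order.lines.clone();   // OrderLine is immutable, so a shallow copy is enough
    }
}

class RelationshipDemo {
    public static void main(String[] args) {
        Customer customer = new Customer("Acme", "1 Old Road");
        Order order = new Order(customer, new OrderLine("Printer", 1));
        Invoice invoice = new Invoice(order);
        customer.address = "2 New Street";             // the customer moves after invoicing
        System.out.println(invoice.customerAddress);   // the invoice keeps the address it was issued with
    }
}
```

<p>The snapshot in the Invoice constructor is what makes the invoice a historical view: later changes to the customer do not reach it.</p>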
<p>Just as Alexander has done in his book about &#8220;that other kind of architecture&#8221;, here we have laid out a language of software and functionality and data, or at least we have begun to.  Is this analysis valuable?  Where do we go next?  I suppose that is what blogs are good for.  Explore on!  (later.)</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2008/04/22/designing-the-enterprise-data-space/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
		<item>
		<title>Internationalization: Part I</title>
		<link>http://mtruchard.wordpress.com/2008/04/15/modeling-internationalization/</link>
		<comments>http://mtruchard.wordpress.com/2008/04/15/modeling-internationalization/#comments</comments>
		<pubDate>Tue, 15 Apr 2008 00:00:16 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Internationalization]]></category>
		<category><![CDATA[Localization]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=7</guid>
		<description><![CDATA[It should be simple. Our web site currently is translated into only a handful of languages, but we allow the visitor to pick from many different locales when viewing the site so that currencies, prices, dates and other such locale specific things can be displayed for the correct country and language. The idea is that [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mtruchard.wordpress.com&amp;blog=3479956&amp;post=7&amp;subd=mtruchard&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>It should be simple. Our web site currently is translated into only a handful of languages, but we allow the visitor to pick from many different locales when viewing the site so that currencies, prices, dates and other such locale specific things can be displayed for the correct country and language. The idea is that we localize for most countries, but only translate into the few languages. We have had the problem solved and have implemented many web applications using this scheme, but for some reason we keep hitting problems and coming back to the issue. What is so hard about localization? Is it that we have the theory wrong, that our developers just need better training on the standards, or that localization is inherently hard? Let&#8217;s explore.<span id="more-7"></span></p>
<p style="padding-left:30px;"><strong>Internationalization</strong> is the process of designing software applications so that they can be adapted to various different cultures, regions and languages without engineering changes.</p>
<p style="padding-left:30px;"><strong>Localization</strong> is the process of adapting software for a specific culture, region or language by adding locale-specific meta-data and translating text.</p>
<p style="padding-left:30px;"><strong>Globalization</strong> is the process of doing both Internationalization and Localization so that the application has as wide a potential audience as possible.</p>
<p style="padding-left:30px;"><strong>Translation</strong> is the process of translating application text into different languages.</p>
<p>As software designers we are interested in the process of internationalization, which is all about making the software application easy to localize. And this means much more than just saving and using some translated text. Here is an example:</p>
<p style="padding-left:30px;"><strong>American English: </strong>HURRY!!!! Your price of $10.00 expires on 1/18/2009 and we are quickly running out of your favorite color!</p>
<p style="padding-left:30px;"><strong>British English: </strong>Hurry! Your price of £6.00 expires on 18/1/2009 and we are quickly running out of your favourite colour.</p>
<p>Despite the fact that both texts are in English, there are several differences to consider:</p>
<ul>
<li>$10.00 vs £6.00 &#8211; There is a difference in both the currency sign and the amount. In some cases a conversion rate could be used; in other cases special pricing may be given by country or customer account. In other currencies the number of decimals may vary and the currency sign may be on the other side of the amount.</li>
<li>1/18/2009 vs 18/1/2009 &#8211; The American date format has the month first, followed by the day. Many European countries switch this order.</li>
<li>&#8220;favorite&#8221; vs &#8220;favourite&#8221; &#8211; Many British words are spelled differently from their American counterparts.</li>
<li>&#8220;HURRY!!!!&#8221; vs &#8220;Hurry!&#8221; &#8211; Americans require more excitement, let&#8217;s add a few more exclamation points. (What can I say!!!)</li>
</ul>
<p>Because of the dynamic nature of pricing and expiration dates we don&#8217;t want to translate the text for all possible prices and dates; instead we want some kind of templating mechanism. There are several ways to create templates for text (or markup such as HTML), which we may explore in a different post, but here we will use substitution tags embedded in the translated text, shown in braces {}.</p>
<p style="padding-left:30px;"><strong>American English: </strong>HURRY!!!! Your price of {price} expires on {date} and we are quickly running out of your favorite color!</p>
<p style="padding-left:30px;"><strong>British English: </strong>Hurry! Your price of {price} expires on {date} and we are quickly running out of your favourite colour.</p>
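<p>As a minimal sketch of how such named tags could be filled in, the following uses plain string replacement. The class name NamedTemplate and the method fill() are hypothetical, chosen only for this illustration; a production templating mechanism would also need escaping rules for literal braces.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class NamedTemplate {
    // Replaces each {key} tag in the template with the matching value.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("price", "$10.00");
        values.put("date", "1/18/2009");
        System.out.println(fill("Your price of {price} expires on {date}.", values));
    }
}
```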
<p>Now we can get the price and expiration date from a pricing system, perform the localization using standard APIs and then render the completed text. In Java there are several internationalization APIs to help us out. First, we must distinguish between American and British English using a Java Locale. Here we create a locale for American English and grab the country from the locale:</p>
<pre style="padding-left:30px;">Locale locale = new Locale("en","US"); // American English</pre>
<pre style="padding-left:30px;">String shipCountry = locale.getCountry();</pre>
<p>Next we get the pricing information from a pricing system. Pricing systems may take in various parameters, but here we assume price is based on customer, product and the country that the product will be shipped to. Let&#8217;s assume the pricing system does a lookup on these three parameters in a table and returns an object containing price, currency, and the date that the price expires. It is safest to let the pricing system return the currency rather than depending on another localization API, so we are sure that the amount matches the currency.</p>
<pre style="padding-left:30px;">// get price information from our custom pricing system</pre>
<pre style="padding-left:30px;">PriceData priceData = getPrice(customerId, productId, shipCountry);
Float price = priceData.price;
Currency currency = priceData.currency;
Date expiration = priceData.expiration;</pre>
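<p>Now we can use the DateFormat class to format the date to conform to American English. The SHORT format selects the numeric-only date format (e.g. 1/18/2009, though the exact pattern, including the number of year digits, depends on the locale data of the Java runtime).</p>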
<pre style="padding-left:30px;">DateFormat dateFormatter =</pre>
<pre style="padding-left:30px;">              DateFormat.getDateInstance(DateFormat.SHORT, locale);
String dateText = dateFormatter.format(expiration);</pre>
<p>And here we use the NumberFormat class to convert the price into a string. Note that the currency formatter for numbers in Java takes in a locale and chooses the currency based on that locale, but our pricing routine also chose the currency based on its own rules. The pricing routine must take precedence since it also returns the amount, so in this simple example we throw an exception if the two currencies are not equal.</p>
<pre style="padding-left:30px;">NumberFormat currencyFormatter = NumberFormat.getCurrencyInstance(locale);
// Verify that the currencies agree before formatting the amount
if ( !currencyFormatter.getCurrency().equals(priceData.currency) ) {
    // This shouldn't happen
    throw new RuntimeException("Currency mismatch.");
}
String priceText = currencyFormatter.format(price);</pre>
<p>Then finally we apply the localized template. Assume that the getTemplate() method returns the template text based on the locale. Java provides a nice template substitution and localization facility in the MessageFormat class, but it requires that we change the template text to use &#8220;{0}&#8221; and &#8220;{1}&#8221; instead of &#8220;{price}&#8221; and &#8220;{date}&#8221;.</p>
<pre style="padding-left:30px;">String template = getTemplate(locale);</pre>
<pre style="padding-left:30px;">// ex: Hurry!!!!  Your price of {0} expires on {1} and we are quickly running out of your favorite color!
String message = MessageFormat.format(template,priceText,dateText);
System.out.println(message);</pre>
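<p>Put together, the steps above might look like the following self-contained sketch. The PriceData class and the getPrice() stub are hypothetical stand-ins for the custom pricing system, and the fixed price and date are sample values; only the java.text and java.util calls are standard library APIs.</p>

```java
import java.text.DateFormat;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Currency;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class PriceMessage {
    // Hypothetical stand-in for the pricing system's result object.
    static class PriceData {
        final double price;
        final Currency currency;
        final Date expiration;
        PriceData(double price, Currency currency, Date expiration) {
            this.price = price;
            this.currency = currency;
            this.expiration = expiration;
        }
    }

    // Hypothetical stub: a real pricing system would look these values up.
    static PriceData getPrice(String customerId, String productId, String shipCountry) {
        Date expires = new GregorianCalendar(2009, Calendar.JANUARY, 18).getTime();
        return new PriceData(10.00, Currency.getInstance("USD"), expires);
    }

    // Formats price and date for the locale and substitutes them into the template.
    static String render(Locale locale, String template) {
        PriceData priceData = getPrice("C1", "P1", locale.getCountry());
        NumberFormat currencyFormatter = NumberFormat.getCurrencyInstance(locale);
        if (!currencyFormatter.getCurrency().equals(priceData.currency)) {
            throw new IllegalStateException("Currency mismatch.");
        }
        DateFormat dateFormatter = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        return MessageFormat.format(template,
                currencyFormatter.format(priceData.price),
                dateFormatter.format(priceData.expiration));
    }

    public static void main(String[] args) {
        String template = "HURRY!!!! Your price of {0} expires on {1} "
                + "and we are quickly running out of your favorite color!";
        System.out.println(render(Locale.US, template));
    }
}
```

<p>One thing to watch with MessageFormat: a single quote in the template is treated as an escape character, so translated text containing apostrophes needs them doubled.</p>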
<p>The result is that the proper localized text is printed. One final challenge is the possible mismatch between the locale and the customer&#8217;s actual shipping address. The customer may have chosen &#8220;en-US&#8221; as the locale so that the text would display in English, but they actually intend to ship the product to Mexico where it will be put into use. The reason this is an issue is that we have specified that pricing is to be based on shipping address rather than the choice of locale (this is often the case for many pricing systems).  The pricing functionality must match the localization or there is a danger of prices being quoted incorrectly.  We may not actually know the shipping address until the customer goes to checkout and purchase the product. At checkout the price would need to be recalculated and displayed for the correct country which in our example is Mexico. This means re-executing the getPrice() method with the new shipping address country and applying the currency formatter for English but in Mexican Pesos:</p>
<pre style="padding-left:30px;text-align:left;">// At checkout customer chooses a Mexican shipping address</pre>
<pre style="padding-left:30px;text-align:left;">shipCountry = "MX";
Locale shipLocale = new Locale("en", shipCountry);
priceData = getPrice(customerId, productId, shipCountry);
currencyFormatter = NumberFormat.getCurrencyInstance(shipLocale);
priceText = currencyFormatter.format(priceData.price);
System.out.println(priceText);</pre>
<p>Substituting this price text into the template would produce text such as the following:</p>
<pre style="padding-left:30px;">HURRY!!!! Your price of MXN99 expires on 1/18/2009 and we are quickly running out of your favorite color!</pre>
<p>The text is localized in English, and the amount, while displayed in an English format, is quoted in Mexican Pesos.  We need the currency formatter to follow these rules, so we have created a hybrid &#8220;en-MX&#8221; locale, which seems to work fine in this case.  The unfortunate issue is that the currency formatter in Java mixes the concept of locale with the concept of currency.  This code would need to be properly tested for all currencies used.</p>
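<p>An alternative worth testing is to keep the display locale as it is and override only the currency, using the setCurrency() method that NumberFormat provides. The sketch below assumes the pricing system hands back a java.util.Currency; the class name CurrencyOverride is hypothetical. The symbol printed for a foreign currency depends on the locale data shipped with the Java runtime, so no exact output is claimed here.</p>

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class CurrencyOverride {
    // Formats the amount with the number conventions of displayLocale,
    // but with the currency reported by the pricing system.
    static String format(double amount, Currency currency, Locale displayLocale) {
        NumberFormat formatter = NumberFormat.getCurrencyInstance(displayLocale);
        formatter.setCurrency(currency);
        return formatter.format(amount);
    }

    public static void main(String[] args) {
        // English (US) formatting rules, Mexican Pesos as the currency
        System.out.println(format(99.0, Currency.getInstance("MXN"), Locale.US));
    }
}
```

<p>With this approach the locale decides only how the number is written, and the pricing system stays the single source for which currency is quoted.</p>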
<p>One common technique for localization that we have seen in this example is the separation of the translation of text from the locale-based formatting of variable data. In practice this is a good idea since it cuts down on the amount of text that will have to be translated into different languages.  In the example we were able to display a price for Mexico in Pesos without having to translate the text into Spanish.  This means that we can serve Mexican customers, at least marginally, even before we have the Spanish translations in place, something which is quite important for a global company that has customers in many countries.  Also, in practice we may not want to incur the expense of translating into both American and British English since the following text is acceptable to British English speakers:</p>
<p style="padding-left:30px;"><strong>English: </strong>HURRY!!!! Your price of £6.00 expires on Jan 18, 2009 and we are quickly running out of your favorite color!</p>
<p>Notice one change here, however.  The date format has been changed to eliminate the possibility of ambiguity.  Generally speaking, an all-numeric date format such as dd/mm/yyyy (DateFormat.SHORT in Java) is a bad idea.  Most systems will use some format with the month spelled out, and of course nobody displays only 2 characters in the year any more.</p>
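<p>In Java, one way to get a date with the month spelled out is to ask for DateFormat.MEDIUM instead of DateFormat.SHORT. The sketch below is a minimal illustration; the class name UnambiguousDate is hypothetical, and the exact pattern produced for each locale comes from the runtime's locale data.</p>

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class UnambiguousDate {
    // MEDIUM style abbreviates the month name instead of using a number.
    static String format(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(date);
    }

    public static void main(String[] args) {
        Date expiration = new GregorianCalendar(2009, Calendar.JANUARY, 18).getTime();
        System.out.println(format(expiration, Locale.US));
        System.out.println(format(expiration, Locale.UK));
    }
}
```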
<p>In Part II we will cover more on how to manage the translated text.</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2008/04/15/modeling-internationalization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
		<item>
		<title>Versioning Data in a Relational Model</title>
		<link>http://mtruchard.wordpress.com/2008/04/14/versioning-data-in-a-relational-model/</link>
		<comments>http://mtruchard.wordpress.com/2008/04/14/versioning-data-in-a-relational-model/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 22:02:41 +0000</pubDate>
		<dc:creator>mtruchard</dc:creator>
				<category><![CDATA[Relational Modeling]]></category>
		<category><![CDATA[Versioning]]></category>

		<guid isPermaLink="false">http://mtruchard.wordpress.com/?p=5</guid>
		<description><![CDATA[If data never changed or needed to be updated our lives as software designers might be a little easier, we could just store the data once and be done with it. But data changes, it gets edited and deleted, and we designers cringe when we hear a user utter the words, &#8220;Can I undo that?&#8221; [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mtruchard.wordpress.com&amp;blog=3479956&amp;post=5&amp;subd=mtruchard&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>If data never changed or needed to be updated our lives as software designers might be a little easier; we could just store the data once and be done with it.  But data changes: it gets edited and deleted, and we designers cringe when we hear a user utter the words, &#8220;Can I undo that?&#8221;  It&#8217;s what Einstein did to physics: he messed with the time dimension and everything came unraveled.  Fortunately, most of us don&#8217;t have to work at the messy quantum level (yet).  So it is with web software designers and data versioning: you might design twenty applications without having to worry much about it and then one day it hits you.  &#8220;Can I undo?&#8221;  Desktop application software designers have had to deal with the &#8220;undo&#8221; problem for many years, and now it&#8217;s time that web and IT software designers jump in.  To make matters more complicated we have to figure out how to solve this problem for vast amounts of data typically stored in databases.  Let&#8217;s call it Web 2.0 with the undo feature!<span id="more-5"></span></p>
<p>Of course undo is only one feature related to data versioning.  There are many other reasons you might want to track changes.  Here is a list of some typical reasons:</p>
<ul>
<li>Ability to undo changes</li>
<li>Reporting and analysis on historical changes to data</li>
<li>Auditing who made changes to data (and what changes they made)</li>
<li>Taking a snapshot of data for contractual purposes. (e.g. taking a snapshot when an invoice is printed.)</li>
</ul>
<p>How versioning should be modeled in your application depends on what features you are trying to achieve.  While it may be academically rewarding trying to solve for all of these features, it may be unnecessarily expensive since most systems (such as relational databases) don&#8217;t have a good native way to deal with versioning.</p>
<p>Time itself is not absolute.  You don&#8217;t have to be traveling at the speed of light to see this.  Try hooking a desktop application in Finland to a server hosted in England. If the combined system depends on time as registered by both computers then the server may think your PC in Finland is operating two hours in the future (and the offset of each from UTC shifts with daylight saving time adjustments).  Even within the same time zone the slight amount by which the clocks of a server and a PC differ may make all the difference.  The moral of the story: choose a standard for where to get the time and keep time zones in mind.</p>
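<p>To make the Finland and England example concrete, here is a small sketch using the java.time classes (available since Java 8, so newer than the APIs used elsewhere on this blog). It assumes the common convention of recording one instant in UTC and converting to a zone only for display; the class and method names are hypothetical.</p>

```java
import java.time.Instant;
import java.time.ZoneId;

// Hypothetical helper, for illustration only.
class ClockDemo {
    // Seconds by which wall-clock time in zone a is ahead of zone b at the given instant.
    static int offsetDifferenceSeconds(Instant instant, ZoneId a, ZoneId b) {
        return a.getRules().getOffset(instant).getTotalSeconds()
             - b.getRules().getOffset(instant).getTotalSeconds();
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2008-04-14T22:02:41Z");   // one moment, recorded in UTC
        ZoneId helsinki = ZoneId.of("Europe/Helsinki");
        ZoneId london = ZoneId.of("Europe/London");
        System.out.println(instant.atZone(helsinki));   // the same moment on a PC in Finland
        System.out.println(instant.atZone(london));     // the same moment on a server in England
        System.out.println(offsetDifferenceSeconds(instant, helsinki, london) / 3600 + " hours apart");
    }
}
```

<p>Storing the instant and treating the zone as a display concern means both machines agree on when a change happened, whatever their local clocks show.</p>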
<p>What about dates?  They are fairly fixed, aren&#8217;t they?  Once you get around the time zone thing, and unless you are dealing with large ranges of dates, in practice it seems so.  Computers have been programmed to handle the fact that a year is actually a little under 365.25 days.  Don&#8217;t look up the term &#8220;year&#8221; on Wikipedia unless you want the ugly truth about how all this works.  Suffice it to say that humans haven&#8217;t always used our modern calendar, and so dates in the past have been recorded using many different schemes, including basing them on the cycles of the moon.  The Date class in Java represents a specific moment in time to the millisecond, but the methods for converting a Date to a month, day, or year have been deprecated.  Why?  Because there are different calendars to choose from. The GregorianCalendar class gives you a way to convert a Date into the days of our modern calendar.  How very flexible!  Even so, our modern handling of time is not exactly precise, but it is close enough for most applications.</p>
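<p>As a short sketch of that conversion, the following reads the year, month and day of a Date through a GregorianCalendar. The class name DateParts is hypothetical, and the time zone is passed in explicitly because the same moment can fall on different calendar days in different zones.</p>

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, for illustration only.
class DateParts {
    // Returns {year, month, day} of the given moment in the given time zone.
    static int[] yearMonthDay(Date date, TimeZone zone) {
        Calendar calendar = new GregorianCalendar(zone);
        calendar.setTime(date);
        return new int[] {
            calendar.get(Calendar.YEAR),
            calendar.get(Calendar.MONTH) + 1,    // Calendar months are zero-based
            calendar.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        int[] parts = yearMonthDay(new Date(), TimeZone.getTimeZone("UTC"));
        System.out.println(parts[0] + "-" + parts[1] + "-" + parts[2]);
    }
}
```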
<h2>Logging Changes</h2>
<p>One approach to tracking changes is to log them in a separate column or table in the database.  Let&#8217;s say that we want to analyze the changes that an online user makes to their shopping cart which is stored in relational tables:</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Table</strong>
NUMBER cart_id
NUMBER user_id
VARCHAR(1000) description
... other data ...
VARCHAR(32767) change_log</pre>
<pre style="padding-left:30px;"><strong>Shopping Cart Lines Table</strong>
NUMBER line_id
NUMBER cart_id
VARCHAR product_name
NUMBER quantity
NUMBER price
... other data ...</pre>
<p>The change_log column in the table could contain a comma delimited list of the changes made by the user that looks something like this:</p>
<p style="padding-left:30px;">01-JAN-2007 created, 01-JAN-2007 added part #8839 to cart, 02-JAN-2007 added part #6746 to cart, 03-JAN-2007 removed part #8839</p>
<p>Don&#8217;t be too horrified.  I have seen many systems designed like this and they work as a quick and dirty solution for limited troubleshooting purposes.  Just remember to truncate the log string to 32767 characters when you update the table, and at least your application won&#8217;t crash.  Extending this methodology gets a little trickier.  A change_log column could be added to the lines table, where you could have a little more room for quantity and price details, but that won&#8217;t work for deletes.  My opinion on this methodology is that if you choose the quick and dirty, keep it simple and abandon it altogether if your needs get more complex.</p>
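<p>The append-and-trim step for a change_log column could look something like this sketch. The class and method names are hypothetical, and dropping the oldest characters (rather than the newest) is a choice made here for illustration, on the assumption that recent changes matter most for troubleshooting.</p>

```java
// Hypothetical helper, for illustration only.
class ChangeLogColumn {
    static final int MAX_LENGTH = 32767;   // size of the change_log column

    // Appends an entry and, if needed, drops the oldest characters
    // so that the result still fits in the column.
    static String append(String log, String entry) {
        String combined = log.isEmpty() ? entry : log + ", " + entry;
        if (combined.length() > MAX_LENGTH) {
            combined = combined.substring(combined.length() - MAX_LENGTH);
        }
        return combined;
    }

    public static void main(String[] args) {
        String log = append("", "01-JAN-2007 created");
        log = append(log, "01-JAN-2007 added part #8839 to cart");
        System.out.println(log);
    }
}
```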
<p>The next step up in fancy in terms of design options is to pull the change log out into a separate table (or tables):</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Change Log Table</strong>
NUMBER cart_id
VARCHAR(2000) change
DATE change_date</pre>
<p>Here you are not limited in space for the change text that is logged.  Every change can have its own row in the change log table, and the date column makes changes easier to query.  This solution costs a little more since now you have an extra table to manage and all programmers writing inserts need to know to hit multiple tables.  The improved querying capabilities give you easier reporting and troubleshooting, but you are still reduced to parsing the text for more specific queries.  Now, what if we expand the change column into multiple columns that can keep the details of the change?  There are several approaches to this, but before we get to those we will need an extra change log table for the lines:</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Lines Change Log Table</strong>
NUMBER line_id
... change columns ...
DATE change_date</pre>
<p>Now we can attach information on changes directly to the individual lines in the shopping cart.  One more sophisticated way of capturing the change information is to copy the rows from the original tables into the change log tables so that we can capture the exact set of data that was in the original table before or after the change was made.  We also need a change_type column for tracking whether this was an update, add or delete.</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Change Log Table</strong>
NUMBER cart_id
VARCHAR(1000) description
... other cart data ...
VARCHAR(15) change_type  (add/delete/update)
DATE change_date

<strong>Shopping Cart Lines Change Log Table</strong>
NUMBER line_id
VARCHAR product_name
NUMBER quantity
NUMBER price
... other line data ...
VARCHAR(15) change_type  (add/delete/update)
DATE change_date</pre>
<p>This solution approaches another more elegant solution which we will cover later so we won&#8217;t analyze this too much.  The other approach is to capture changes on a column by column basis so that we insert a row into the change log table for each value that was changed.  For this approach the two change log tables would look like this:</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Change Log Table</strong>
NUMBER cart_id
column_name
before_value
after_value
DATE change_date</pre>
<pre style="padding-left:30px;"><strong>Shopping Cart Lines Change Log Table</strong>
NUMBER line_id
column_name
before_value
after_value
DATE change_date</pre>
<p>This approach captures updates well, but doesn&#8217;t work for inserts and deletes.  A full history of how data in each column changed is now available.  Another minor issue is that for a single update to a row in the shopping cart lines table there may be multiple rows in the log table for changes to product, quantity, and price.  These multiple rows make the log harder to query and force you to find a way to group them back into one change, most likely by grouping on the change_date column, which isn&#8217;t a perfect solution.  Another option for this solution is to combine the two tables by genericizing the foreign key id and adding another column that tells us which table to join back to.  This means you could reduce your whole change logging system down to a single table that tracks changes for all tables in the database, but at the cost of having a very large table to manage, dealing with the associated performance problems, and having to write complex and unintuitive queries.</p>
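<p>To illustrate the column-by-column approach, here is a rough in-memory sketch that compares the before and after values of a row and produces one change record per changed column. Everything named here is hypothetical. In particular, the changeId field is an assumption added for this sketch: a shared identifier is one possible way to group the rows of a single update without relying on change_date.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of column-level change logging, for illustration only.
class ColumnChangeLog {
    // One row of the change log table.
    static class Change {
        final long rowId;
        final long changeId;        // assumption: shared by all columns changed in one update
        final String columnName;
        final String beforeValue;
        final String afterValue;
        Change(long rowId, long changeId, String columnName, String beforeValue, String afterValue) {
            this.rowId = rowId;
            this.changeId = changeId;
            this.columnName = columnName;
            this.beforeValue = beforeValue;
            this.afterValue = afterValue;
        }
    }

    // Compares two versions of a row (column name to value) and returns
    // one Change per column whose value differs.
    static List<Change> diff(long rowId, long changeId,
                             Map<String, String> before, Map<String, String> after) {
        List<Change> changes = new ArrayList<>();
        for (String column : after.keySet()) {
            String oldValue = before.get(column);
            String newValue = after.get(column);
            if (!Objects.equals(oldValue, newValue)) {
                changes.add(new Change(rowId, changeId, column, oldValue, newValue));
            }
        }
        return changes;
    }
}
```

<p>A single update that touches quantity and price would yield two Change objects with the same changeId, which is exactly the grouping problem described above.</p>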
<h2>An Elegant (and expensive) way of Versioning</h2>
<p>The previous line of approaches led us further down into a hole where things seemed to be getting more and more complex while adding marginal value.  That is a sure sign that either: 1) you are stuck in a design rut; or 2) you&#8217;re about to make a big breakthrough in design.  Should we sell our stock or hold on for the big payoff?  Let&#8217;s sell.  Let&#8217;s go back to the original tables themselves and try another approach.  Here we will add three new columns for versioning: a from_date column, a to_date column, and a column to indicate whether the cart line is active or has been deleted.  For our example we will work solely with the lines table:</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Lines Table</strong>
NUMBER line_id
NUMBER cart_id
VARCHAR product_name
NUMBER quantity
NUMBER price
DATE from_date
DATE to_date
VARCHAR active_or_deleted</pre>
<p>Each row in the table now represents how the data looked between the two dates.  Here is what the data for a single line item would look like if we created the line with a quantity of 1 and then changed it 2 days later to a quantity of 2.  Note that there are two rows in the table to represent the historical changes:</p>
<pre style="padding-left:30px;">1,  1,  Ink Jet Printer X11, 1, $150, 01-JAN-2007, 03-JAN-2007, active
1,  1,  Ink Jet Printer X11, 2, $150, 03-JAN-2007, 31-DEC-9999, active
</pre>
<p>The first row shows what the data looked like between 01-JAN-2007 and 03-JAN-2007 and the second row shows what the data looked like from 03-JAN-2007 to 31-DEC-9999.  The to_date of 31-DEC-9999 is a placeholder that stands for &#8220;no end date yet&#8221;.  It does make querying tougher, even if it is just to get the latest data.  The reason we don&#8217;t use null is that it upsets the way the database handles date comparisons: a comparison against null is never true, so a plain date-range check would skip the current row.  If we did use null then getting the current row would be easier, but it would be harder to find the row effective for a certain date.  Another issue is that we must be careful about how we compare dates.  The to_date on the first row matches the from_date on the second row.  We must compare one exclusively (less-than) and one inclusively (greater-than-or-equal) to make this work.  Of course, to implement this approach we must also change the way that we update, insert and delete rows in the table so that we never update or delete.  Instead we only insert.  This rule alone suggests that we are keeping history, since we never go back and change historical data.</p>
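<p>To make the date comparison concrete, here is a minimal sketch in Python using the standard sqlite3 module.  It assumes dates are stored as ISO strings (2007-01-03 rather than 03-JAN-2007) so that comparing the text also compares the dates; the table name cart_lines and the helper names are made up for this example.</p>

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # ISO spelling of the 31-DEC-9999 placeholder

def build_demo():
    """Load the two example rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE cart_lines (
            line_id INTEGER, cart_id INTEGER, product_name TEXT,
            quantity INTEGER, price INTEGER,
            from_date TEXT, to_date TEXT, active_or_deleted TEXT
        )""")
    conn.executemany(
        "INSERT INTO cart_lines VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "Ink Jet Printer X11", 1, 150, "2007-01-01", "2007-01-03", "active"),
            (1, 1, "Ink Jet Printer X11", 2, 150, "2007-01-03", END_OF_TIME, "active"),
        ],
    )
    return conn

def quantity_as_of(conn, line_id, as_of):
    """Quantity in effect on as_of: from_date inclusive, to_date exclusive."""
    row = conn.execute(
        "SELECT quantity FROM cart_lines"
        " WHERE line_id = ? AND from_date <= ? AND ? < to_date",
        (line_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

def current_quantity(conn, line_id):
    """The latest row is the one still carrying the placeholder to_date."""
    row = conn.execute(
        "SELECT quantity FROM cart_lines WHERE line_id = ? AND to_date = ?",
        (line_id, END_OF_TIME),
    ).fetchone()
    return row[0] if row else None
```

<p>Because the from_date test is inclusive and the to_date test is exclusive, asking for 2007-01-03 matches only the second row even though that date appears on both rows.</p>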
<p>Here is how a deleted item would look:</p>
<pre style="padding-left:30px;">1,  1,  Ink Jet Printer X11, 1, $150, 01-JAN-2007, 03-JAN-2007, active
1,  1,  Ink Jet Printer X11, 2, $150, 03-JAN-2007, <span style="color:#888888;">05-JAN-2007</span>, active
<strong>1,  1,  Ink Jet Printer X11, 2, $150, 05-JAN-2007, 31-DEC-9999, deleted</strong></pre>
<p>And if we undelete the item we add another row:</p>
<pre style="padding-left:30px;">1,  1,  Ink Jet Printer X11, 1, $150, 01-JAN-2007, 03-JAN-2007, active
1,  1,  Ink Jet Printer X11, 2, $150, 03-JAN-2007, 05-JAN-2007, active
1,  1,  Ink Jet Printer X11, 2, $150, 05-JAN-2007, <strong>07-JAN-2007</strong>, deleted
<strong>1,  1,  Ink Jet Printer X11, 2, $150, 07-JAN-2007, 31-DEC-9999, active</strong></pre>
<p>Notice also that we are breaking our rule slightly about updating rows.  When we insert a row to make a change to it, we also go back and update the previous row to end-date the previous change.  This adds quite a bit of complexity to the code for updating the table and would typically warrant creating a wrapper API to encapsulate the code.</p>
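<p>A wrapper API along these lines might look like the following Python sketch, again using the standard sqlite3 module.  The function names, the cart_lines table name, and the ISO date strings are assumptions made for this example, not part of the design above.  The point is that end-dating the previous row and inserting the new one happen together in one transaction.</p>

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # ISO spelling of the 31-DEC-9999 placeholder

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cart_lines (line_id, cart_id, product_name, quantity,"
        " price, from_date, to_date, active_or_deleted)"
    )
    return conn

def add_line(conn, line_id, cart_id, product_name, quantity, price, on_date):
    """Insert the first version of a line, open-ended and active."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cart_lines VALUES (?, ?, ?, ?, ?, ?, ?, 'active')",
            (line_id, cart_id, product_name, quantity, price, on_date, END_OF_TIME),
        )

def new_version(conn, line_id, on_date, **changes):
    """End-date the current row and insert its successor in one transaction."""
    with conn:
        cur = conn.execute(
            "SELECT cart_id, product_name, quantity, price, active_or_deleted"
            " FROM cart_lines WHERE line_id = ? AND to_date = ?",
            (line_id, END_OF_TIME),
        )
        current = cur.fetchone()
        if current is None:
            raise LookupError(f"no current row for line {line_id}")
        row = dict(zip([d[0] for d in cur.description], current))
        unknown = set(changes) - set(row)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        row.update(changes)
        # The one permitted update: close off the previous version.
        conn.execute(
            "UPDATE cart_lines SET to_date = ? WHERE line_id = ? AND to_date = ?",
            (on_date, line_id, END_OF_TIME),
        )
        conn.execute(
            "INSERT INTO cart_lines VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (line_id, row["cart_id"], row["product_name"], row["quantity"],
             row["price"], on_date, END_OF_TIME, row["active_or_deleted"]),
        )
```

<p>With this in place a delete is just new_version(conn, 1, "2007-01-05", active_or_deleted="deleted"), and an undelete is the same call with "active".</p>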
<p>Let&#8217;s make one more modification before calling the design final.  Let&#8217;s change the dates to version numbers like this:</p>
<pre style="padding-left:30px;"><strong>Shopping Cart Lines Table</strong>
NUMBER line_id
NUMBER cart_id
VARCHAR product_name
NUMBER quantity
NUMBER price
NUMBER from_version
NUMBER to_version
VARCHAR active_or_deleted</pre>
<p>Here is some example data with versions instead of dates:</p>
<pre style="padding-left:30px;">1,  1,  Ink Jet Printer X11, 1, $150, 1, 2, active
1,  1,  Ink Jet Printer X11, 2, $150, 2, 9999999, active</pre>
<p>Then let&#8217;s have a version table that keeps track of the version numbers and can include all sorts of other meta-information:</p>
<pre style="padding-left:30px;"><strong>Version Table</strong>
NUMBER version_number
DATE update_date
VARCHAR(200) updated_by
VARCHAR(2000) reason_for_update</pre>
<p>Joining it all together we can write a query that shows information about a particular version of the data:</p>
<pre style="padding-left:30px;">SELECT v.*, i.*
FROM version_table v, line_table i
WHERE v.version_number = 1
AND v.version_number &gt;= i.from_version
AND v.version_number &lt; i.to_version
</pre>
<p>Which would return a row like this:</p>
<pre style="padding-left:30px;">1, 01-JAN-2007, mtruchard, adding a part, 1,  1,  Ink Jet Printer X11, 1, $150, 1, 2, active</pre>
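<p>For anyone who wants to try the version query, here is a small runnable sketch in Python with the standard sqlite3 module.  The second version row (its date and its reason text) is invented for the example, as are the helper names; the table and column names follow the ones used above.</p>

```python
import sqlite3

OPEN_ENDED = 9999999  # placeholder to_version from the example data

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE version_table (
            version_number INTEGER PRIMARY KEY, update_date TEXT,
            updated_by TEXT, reason_for_update TEXT
        );
        CREATE TABLE line_table (
            line_id INTEGER, cart_id INTEGER, product_name TEXT,
            quantity INTEGER, price INTEGER,
            from_version INTEGER, to_version INTEGER, active_or_deleted TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO version_table VALUES (?, ?, ?, ?)",
        [
            (1, "2007-01-01", "mtruchard", "adding a part"),
            # Invented for illustration: metadata for the quantity change.
            (2, "2007-01-03", "mtruchard", "changing the quantity"),
        ],
    )
    conn.executemany(
        "INSERT INTO line_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "Ink Jet Printer X11", 1, 150, 1, 2, "active"),
            (1, 1, "Ink Jet Printer X11", 2, 150, 2, OPEN_ENDED, "active"),
        ],
    )
    return conn

def lines_at_version(conn, version_number):
    """Same join as the query above, trimmed to two columns for readability."""
    return conn.execute(
        "SELECT v.reason_for_update, i.quantity"
        " FROM version_table v, line_table i"
        " WHERE v.version_number = ?"
        " AND v.version_number >= i.from_version"
        " AND v.version_number < i.to_version",
        (version_number,),
    ).fetchall()
```

<p>Asking for version 1 gives the row with a quantity of 1, and asking for version 2 gives the row with a quantity of 2, each paired with the reason recorded for that version.</p>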
<p>This is a fairly flexible design: it allows changes to data to be tracked, changes to be undone, and even undos to be redone, with a full history of the undos and redos tracked.  This approach is similar to the way source code revision software operates.  The major downside is the complexity.  What used to be a simple and intuitive query to get the most recent data has been complicated, and the updates are best left to an API built by a programmer who knows how the scheme works.  And then there is the pesky &#8220;9999999&#8221;: change that to null and historical queries get messy and slow, but don&#8217;t change it and we have nightmares of Y2K all over again.  Overall, however, this solution is far more elegant than any of the previous ones.</p>
<h2>Change Capture Built into the Database</h2>
<p>An even better solution would be to have the database handle all this for you.  I haven&#8217;t yet seen a scheme like this built into a relational database, but that doesn&#8217;t mean there isn&#8217;t one out there.  What I have seen are several mechanisms for logging or capturing changes on a database table.  Oracle databases have what are called redo logs, where all changes made to a set of tables can be automatically written out to a file on the system.  These logs can then be played back in the database to perform the operations over again.  Oracle also has a technology called Change Data Capture (CDC), where changes can be captured on a table as they are happening and written to a queue.  The queue can either be queried directly for changes or can be used to insert the data into other tables with a structure that allows for tracking of history, like the ones discussed previously.  This is one method for populating a data warehouse with historical change data while keeping the original tables simple.  Used alone, or in combination with the approaches described above, built-in database features are the best way to go if they can meet your needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://mtruchard.wordpress.com/2008/04/14/versioning-data-in-a-relational-model/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4afc00089f862a9c9b08092f830bd621?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mtruchard</media:title>
		</media:content>
	</item>
	</channel>
</rss>
