Computers rely on data to do meaningful work. Without data, computer programs would have nothing to do. Computer programs are built to receive, process, present, and store data. The process of designing computer programs (and computer systems) begins with an examination of the data – the inputs and outputs – and for a number of years, it was assumed that the data would not change. Slowly, it has become apparent that the data needs of a system do change during its lifetime, and sometimes change rapidly. This is referred to as the evolution of data. One very big change that affected many systems was the need to represent the year as a four-digit number. Remember Y2K? That was a data change. Another was the addition of four digits to the US postal code: ZIP+4. Even today many computer programs are unable to accept that data. These evolutionary changes can be caused by many things – corporate mergers, new functionality, and user creativity rank at the top.
So why is it such a problem? Consider an example based on the US postal code extension. If only the screen is changed to allow the new data, the processing part of the program won’t know what to do with it. (Computer programs are not smart – they only do what they are told to do, and unless they were built otherwise, they tend to fail when they encounter new data.) Assume the program doesn’t fail (there are many ways to arrange this) and tries to store the information in the database. The database knows that a zip code is only 5 digits, and in the best case just truncates the extra four digits. When the user asks for their data back, part of it isn’t there! That was the best-case scenario. The worst-case scenario would be failure when the program or database encountered the unexpected data. Neither is good. This is a simple example; consider what would happen when a data change splits Name into First Name, Middle Name and Last Name. Data from the new system will not be compatible with the old data without additional processing or conversion.
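The two outcomes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5-character field width and the function names store_zip_truncating and store_zip_strict are hypothetical stand-ins for whatever a real program or database would do, not a description of any particular system.

```python
# Hypothetical width of the zip code field, fixed when the system was designed.
ZIP_WIDTH = 5


def store_zip_truncating(zip_code: str) -> str:
    """Best case: silently keep only what fits in the field."""
    return zip_code[:ZIP_WIDTH]


def store_zip_strict(zip_code: str) -> str:
    """Worst case: fail outright when the value does not fit."""
    if len(zip_code) > ZIP_WIDTH:
        raise ValueError(f"zip code {zip_code!r} exceeds {ZIP_WIDTH} characters")
    return zip_code


entered = "12345-6789"  # what the user typed on the updated screen
stored = store_zip_truncating(entered)
print(f"entered {entered!r}, got back {stored!r}")

try:
    store_zip_strict(entered)
except ValueError as err:
    print(f"storage failed: {err}")
```

In the truncating case nothing reports an error, which is why the loss is usually noticed only when the user asks for the data back.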
Obviously, if there’s new data or a restructuring, everything needs to change in a coordinated fashion – the input and output screens, the processing and the storage. In large systems, this can be a considerable expense and as a result is avoided whenever possible. Many such changes are accumulated and dealt with all at once in the “next release”. This only works when the changes to the data are known and under the control of the same people that control the programs. In an environment where the data changes rapidly, or is not known in advance, this solution won’t work.
In many analytical fields the structure and content of the data being analyzed are not under the control of the analyst or the programmers. Changes made by a data provider often cause failures in the programs an analyst uses. Even if the programs don’t fail, the analyst is unable to evaluate or make use of the new data. In this situation, the analyst has little choice but to make the changes as rapidly as possible if they wish to continue using that data source.
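As a rough sketch of that situation, assume a provider starts sending an extra field the analyst’s program was never built for. The field names, the record layout and the two parser functions below are hypothetical, chosen only to show the two behaviours: the program fails, or it keeps running and the new data never reaches the analyst.

```python
# Hypothetical layout the analyst's program was originally built around.
EXPECTED_FIELDS = ("id", "name", "zip")


def parse_strict(record: dict) -> tuple:
    """Fail as soon as the provider sends a field we do not know about."""
    unexpected = set(record) - set(EXPECTED_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return tuple(record[field] for field in EXPECTED_FIELDS)


def parse_lenient(record: dict) -> tuple:
    """Keep running, but silently drop anything outside the known layout."""
    return tuple(record.get(field) for field in EXPECTED_FIELDS)


# The provider has added a "zip4" field without consulting the analyst.
new_record = {"id": 7, "name": "Pat Doe", "zip": "12345", "zip4": "6789"}

print(parse_lenient(new_record))  # the zip4 value is not in the result

try:
    parse_strict(new_record)
except ValueError as err:
    print(f"program failed: {err}")
```

Either way, the analyst cannot use the new field until the program itself is changed.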
Engineers (system, software and database) depend on a data model to describe the data a system uses. When a change in data is anticipated or detected, the model is changed to help document and communicate that change. The impact of changing a data model is magnified by the number of systems that use the model as their foundation. Unless every system using the model adopts the change, those systems will become unable to communicate with each other.
When working on a database that supported many analytical programs, I struggled to develop it in a way that would allow some flexibility should the data change. Well, really, when the data changed. It was the one thing we could count on – our data would change. It is impossible to figure out in advance everything an analyst might want to store, or every sort of thing that might be useful during an analysis. To complicate matters, we were never sure what new things might show up in the data sources that we’d need to store. I remember thinking there had to be a better way… and then I ran into TRIZ.