Open Component Architecture



 Overview

Within the context of database and data handling software, products based on a true Open Component Architecture are easy to distinguish.

They offer the flexibility of Open Systems standards, so that they are available on a wide range of platforms, compatible with a wide range of operating environments, and their data and code is easily accessible through a wide range of routes.

Re-usability based on simple, stable, self-contained components which go on working year in year out brings genuine productivity rather than continual re-working, re-skilling and re-building.

A consistent structure that provides a clear architecture to guide use, development and maintenance is essential if low deployment costs are to be matched by low support costs, making integration across the enterprise truly viable.



 Open Flexibility

Availability on a wide range of platforms, minimally including Unix and Microsoft operating systems, is an essential prerequisite for any database or data handling product with serious claims to an Open Systems heritage. However, it is not enough. Availability must be matched by compatibility and accessibility.

For databases themselves this means data compatibility. Movement of complete databases between platforms should be straightforward, with appropriate automatic or semi-automatic tools to cover complex issues, such as transformation between big-endian and little-endian binary representations, as well as the more trivial ones. Even more important for the typical user is compatibility between revisions of the same product or the same operating system family. It is here in particular that many products completely fail to live up to expectations. It is an unacceptable waste of precious resources for database administrators or other highly expert personnel to be engaged in unnecessary exercises in data conversion simply in order to benefit from a new release of a database product from the same vendor.
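To make the byte-order issue concrete, here is a minimal sketch using Python's standard struct module (the field value is invented for illustration): an integer field written by a big-endian platform is decoded and re-encoded for a little-endian target.

```python
import struct

# A 32-bit integer field as stored by a big-endian platform.
big_endian_bytes = struct.pack(">i", 123456)

# Decode with the source byte order, then re-encode for a
# little-endian target platform.
(value,) = struct.unpack(">i", big_endian_bytes)
little_endian_bytes = struct.pack("<i", value)

print(value)                       # 123456
print(little_endian_bytes.hex())   # 40e20100
```

A real conversion tool has to apply this field by field, which is why automatic or semi-automatic support from the database vendor matters.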

Code compatibility is the equivalent criterion for data handling tools, and once again users have the right to expect code to run unchanged, and wherever possible without recompilation, across platforms, operating systems and product revisions, with only minimal operating-system-dependent changes. Once again, many of the products which make extensive claims of openness fail miserably when it comes to proving it in a real situation.

Accessibility of data is absolutely critical in the world of complex enterprise-wide interchange of data that the mature database market faces. To meet the criterion of true openness in this regard, data must be accessible independently of the database vendor's own tools. This is the difference between genuinely open data and data that is, in reality, held in black-box proprietary formats (often directly on disk), with a limited range of accessibility tools available, often at additional cost.
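What vendor-independent access means in practice can be sketched in a few lines. In this hypothetical example (the data and column names are invented), data held in a plain open format such as CSV is read with nothing beyond a language's standard library; no vendor tool is involved at any point.

```python
import csv
import io

# Data in a plain, open, text format: any tool can read it.
open_data = "id,name\n1,alpha\n2,beta\n"

rows = list(csv.DictReader(io.StringIO(open_data)))
print(rows[0]["name"])   # alpha
print(len(rows))         # 2
```

Data locked in a proprietary on-disk format, by contrast, is readable only through whatever tools the vendor chooses to supply.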

The pay-offs for users who have a truly open database and data tools are very considerable. Skills, tools and code become immediately re-useable. Internal support costs are kept to the bare minimum, freeing up resources for new development. Integration with other, disparate forms of data and application becomes not only possible but simple, without recourse to costly and time-consuming additional products. We need look no further than the Internet to see the phenomenal changes to capabilities and costs that an open approach to data makes possible.



 Component Productivity

Reuse without redoing is the characteristic benefit of components. That benefit is not achieved simply by avoiding the complete redevelopment of a component if the resource saved has to be wasted on keeping a complex supporting environment up to date. Loading up CDs full of software every other week, continually learning new development environments and spending half of your life on training courses picking up soon-to-be-obsolete esoterica are not productive activities. Products that push users down this route can never deliver the real benefits of a component approach, which depend critically on containment, stability and standards.

A component must, by definition, be self-contained. Data or code which cannot operate independently of a proprietary supportive environment is not itself a component. It can only become a component if that whole environment is wrapped up with it. This is possible, for example, in the case of Java applets, which carry all their supporting classes with them. However, it is important to realise that what may seem at first sight like a simple, straightforward component may in fact be much more complex and clumsy.

Containment has nothing to do with the internal technology used to create the component, or whether it is based on this or that object framework; it has everything to do with how the component can be used. Internet URLs present a classic example of successfully componentised elements. They deliver their content, regardless of how it is formed, in a consistent, standard, open format that is immediately re-useable in an unlimited number of contexts. Contrast this with traditional fat-client applications, where every supposed "component" is in fact dependent on a complex environment of proprietary and third-party DLLs and set-up information. They may be easy to use if the environment is all exactly as expected, but that does not make them genuinely re-useable components.
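The point about URLs can be made concrete with a short sketch (the URL itself is invented for illustration): everything needed to locate the resource travels inside the reference, with no external set-up or supporting environment required.

```python
from urllib.parse import urlparse

url = "https://example.com/reports/2024/summary.html"
parts = urlparse(url)

# The URL is self-describing: protocol, host and resource path are
# all carried in the reference itself.
print(parts.scheme)  # https
print(parts.netloc)  # example.com
print(parts.path)    # /reports/2024/summary.html
```

A DLL reference, by contrast, names only a file and silently assumes an entire correctly-configured environment around it.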

The most common re-use of components is re-use over time. It is frankly bizarre that some of the vendors who make most noise about how reusable their components are have the very worst track records in maintaining the stability of components, whether data or code, which employed their technologies in the past. It cannot be emphasised too strongly that there is no point in reusing the same components across different application areas if they will not even go on working consistently in one area without further work tomorrow. The vendors in the database and data tools market who are truly committed to components and reuse have shown it in the reusability over time of the applications based on their technology in the past.

Conformance to standards is the key to effective delivery of components. Not ad-hoc vendor standards, but the standards that underpin the day-to-day use of computer systems. Within the context of databases and data tools, much has been made of SQL as an all-encompassing, all-important standard. However, this has much more to do with the major database vendors' failure to adhere to any other standards than with the intrinsic value of SQL, which was, after all, only intended as a "query language".

Standards in data storage, for example the use of operating system files and indexing systems, are equally important for true reusability. The almost complete failure to address this issue leads to the bizarre situation of almost every database vendor completely reinventing their own systems of disk storage, backup, security and administration, while disabling all the know-how and tools already supplied (and paid for) with the supporting operating system. This may be good for their business, but it is of no value to the user, who has to pay to be locked in.

When it comes to the reality of the actual data manipulation required to process data, the heart of all data applications, there is an equally appalling lack of standards. In practice, adding and updating of data is not carried out on the basis of the multi-row, multi-join relational sets with which SQL is concerned. In the great majority of real applications, it is carried out on a per-row, per-table basis using "cursors" that bear no relation to the original SQL set theory but bear every resemblance to the record-handling metaphors and techniques which have been employed for decades. The great loss for the end-user is that, with a few exceptions (of which Doric's products are the most notable), no standards are available to help with the real job of manipulating data in a simple, but powerful and consistent, form.
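The per-row, per-table style described above can be illustrated with Python's built-in sqlite3 module (the table and values are invented for the sketch): a cursor walks the records one at a time, in the traditional record-handling manner, rather than operating on a relational set in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)",
                 [("bolts", 40), ("nuts", 75)])

# Per-row processing via a cursor: each record is fetched and
# handled individually, record-at-a-time style.
updated = []
for item, qty in conn.execute("SELECT item, qty FROM stock ORDER BY item"):
    updated.append((item, qty * 2))

for item, qty in updated:
    conn.execute("UPDATE stock SET qty = ? WHERE item = ?", (qty, item))

total = conn.execute("SELECT SUM(qty) FROM stock").fetchone()[0]
print(total)  # 230
```

The set-based equivalent would be a single "UPDATE stock SET qty = qty * 2"; real applications overwhelmingly take the cursor route, which is precisely where standards are thinnest.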



 Architecture Robustness

This is one area where the major RDBMS and tool suppliers can hardly be accused of being found lacking. They tend to have huge, complex (albeit proprietary) architectures, within which even experienced specialist database administrators often get lost. What is more, they have the habit of changing them rapidly, often between each version of their software, creating immediate incompatibilities and time-wasting data reloads.

On the other hand, inadequate architecture and a lack of structure are weaknesses of many of the alternatives which have been developed to work around the lack of openness, and of a component approach, in most offerings in the market.

In particular, the wildfire growth of mini data applications, using powerful tools such as Perl and Tcl to serve the burgeoning Internet-driven demand for open component data handling, has created a major problem in maintaining, and making full use of, the data and applications available.

Many of these suffer badly (and unnecessarily) from the absence of any kind of architectural structure. The key requirements in architectural terms centre around independence of data definition, linking to external data, grouping of data files and code into projects/users, and the availability of a standard, powerful interface for data manipulation and check-out.

Independence of data definition from specific code was, and remains, an extremely powerful incentive for employing the services of a database product. As soon as data is required for more than one purpose, let alone in the context of data of enterprise value, a readily accessible data definition is essential: one independent of the scripts or programs that use it, and one which allows for easy extension and modification.
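A minimal sketch of what an independently accessible data definition looks like, again using sqlite3 (the table is invented for illustration): the schema can be recovered from the database itself, rather than being buried in any one script's source code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# The definition lives with the data, not in any one program:
# any script can recover column names and types directly.
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)  # [('id', 'INTEGER'), ('name', 'TEXT')]
```

Because every program reads the same definition from the same place, extending it in one step extends it for all of them.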

An ability to link to external data which does not form part of the "database in view" is an inevitable requirement in any successful data application. The link may well not have to be live; in many cases a recent copy of the data is perfectly adequate. However, if the ongoing support costs of keeping the linked data adequately up to date are to be kept low, a generic architecture for handling this, in terms of database structure and appropriate tools, is very valuable. If this includes direct access into a wide range of proprietary databases via ODBC and/or gateways, it becomes indispensable in an enterprise context.
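As one concrete illustration of linking to data outside the "database in view", SQLite's ATTACH mechanism joins a second database file into the current session so its tables can be queried directly (the file name and table are invented here); ODBC connections and gateways play the analogous role for proprietary enterprise databases.

```python
import os
import sqlite3
import tempfile

# An "external" database file, standing in for data outside
# the database in view.
external_path = os.path.join(tempfile.mkdtemp(), "external.db")
ext = sqlite3.connect(external_path)
ext.execute("CREATE TABLE rates (currency TEXT, rate REAL)")
ext.execute("INSERT INTO rates VALUES ('EUR', 1.1)")
ext.commit()
ext.close()

# The main session links the external data in and queries it as
# if it were local.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ? AS ext", (external_path,))
rate = conn.execute(
    "SELECT rate FROM ext.rates WHERE currency = 'EUR'").fetchone()[0]
print(rate)  # 1.1
```

The same query works whether the attached file is a live link or a recent copy, which is exactly the flexibility argued for above.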

While an over-emphasis on tight grouping of related data into proprietary databases can be very constricting, an architecture which enables easy grouping (and ungrouping) of data, and the related code, into a consistent directory framework is greatly liberating. It ensures that the focus remains on delivery of service rather than on the back-end housekeeping which all but the most trivial uses of data soon require.

The final benefit of a consistent, flexible, file-based architecture is the opportunity to employ powerful interfaces for data manipulation and check-out without reinventing the wheel. For "data debugging" alone these are well worthwhile. While a lot of attention is given to the ease of debugging in powerful scripting and development tools, the question of checking the successful processing of the data itself is all too frequently ignored. This leads inevitably either to time wasted on developing ad-hoc check-out tools or, more seriously, to a failure to fully test the correctness of data manipulation. The costs of insufficient direct access to the data increase over the life of any data application, as data problems become increasingly difficult and expensive to fix.
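The kind of simple check-out facility argued for above can be sketched in a few lines (the helper name and table are invented for illustration): a generic dump of any table's contents, so the result of a manipulation step can be verified directly instead of being taken on trust.

```python
import sqlite3

def check_out(conn, table):
    """Dump every row of a table so the outcome of a data
    manipulation step can be inspected directly."""
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    print(" | ".join(cols))
    for row in rows:
        print(" | ".join(str(v) for v in row))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.5)")
rows = check_out(conn, "orders")
```

Because the architecture is file-based and the schema is self-describing, one such tool serves every table in every application, rather than being rewritten ad hoc for each.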