Overview
Within the context of database and data handling software, products
based on a true Open Component Architecture are easy to distinguish.
They offer the flexibility of Open Systems standards: they are
available on a wide range of platforms, compatible with a wide
range of operating environments, and their data and code are easily
accessible through a wide range of routes.
Re-usability based on simple, stable, self-contained components
which go on working year in, year out brings genuine productivity rather
than continual re-working, re-skilling and re-building.
A consistent structure that provides a clear architecture to guide
use, development and maintenance is essential if low deployment costs
are to be reflected in low support costs, making integration across
the enterprise truly viable.
Open Flexibility
Availability on a wide range of platforms, minimally including Unix
and Microsoft operating systems, is an essential prerequisite for
any database or data handling product with serious claims to an
Open Systems heritage. However, it is not enough. Availability must
be matched by compatibility and accessibility.
For databases themselves this means data compatibility. Movement
of complete databases between platforms should be straightforward,
with appropriate automatic or semi-automatic tools to cover the complex
issues, such as byte-order (endian) transformation of binary data,
as well as the more trivial ones. Even more important for the typical
user is compatibility between revisions of the same product or the
same operating system family. It is here in particular that many products
fail to live up to expectations. It is an unacceptable
waste of precious resources for database administrators or other highly
expert personnel to be engaged on wholly unnecessary exercises
in data conversion simply in order to benefit from a new release
of a database product from the same vendor.
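The byte-order problem mentioned above is easily shown. The same binary field is laid out differently on big-endian and little-endian platforms, and a conversion tool must read with the source order and rewrite with the target's. A minimal sketch in Python; the function name is our own illustration, not any product's tool:

```python
import struct

# The same 32-bit integer serialized under the two byte orders a
# cross-platform database move must reconcile:
#   ">" = big-endian (most-significant byte first)
#   "<" = little-endian (least-significant byte first)
value = 0x12345678
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)

# Hypothetical conversion helper: unpack with the source platform's
# byte order, repack with the target platform's.
def convert_u32(raw: bytes, src_order: str, dst_order: str) -> bytes:
    (n,) = struct.unpack(src_order + "I", raw)
    return struct.pack(dst_order + "I", n)

converted = convert_u32(big_endian, ">", "<")
```

A real migration tool faces the same issue field by field, which is why the text argues it should be automated rather than left to expert manual effort.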
Code compatibility is the equivalent criterion for data handling
tools, and once again users have the right to expect code to run unchanged,
and wherever possible without recompilation, across platforms, operating
systems and product revisions, with at most minimal operating-system-dependent
changes. Once again, many of the products which
make extensive claims of openness fail miserably when it comes to
proving it in a real situation.
Accessibility of data is absolutely critical in the world of complex
enterprise-wide interchange of data that the mature database market
faces. To meet the criteria of true openness in this regard,
data must be accessible independently of the database vendor's own
tools. This is the difference between genuinely open data and data
that is, in reality, held in black-box proprietary formats, (often
directly on disk), with a limited range of accessibility tools available,
often at additional cost.
The pay-offs for users who have a truly open database and data tools
are very considerable. Skills, tools and code become immediately re-useable.
Internal support costs are kept to the bare minimum, freeing up resources
for new development. Integration with disparate other forms of data
and application becomes not only possible, but simple, without recourse
to costly and time-consuming additional products. We need look no
further than the Internet to see the phenomenal changes to capabilities
and costs that an open approach to data makes possible.
Component Productivity
Reuse without redoing is the characteristic benefit of components.
That benefit is not achieved simply by avoiding complete redevelopment
of a component if the resource saved has then to be wasted on keeping
a complex supporting environment up to date. Loading up CDs full of
software every other week, continually learning new development environments
and spending half of your life on training courses picking up soon-to-be-obsolete
esoterica are not productive activities. Products that push users
down this route can never deliver the real benefits of a component
approach, which depend critically on containment, stability and standards.
A component must, by definition, be self-contained. Data or code
which cannot operate independently of a proprietary supporting environment
is not itself a component. It can only become one if that
whole environment is wrapped up with it. This is possible, for example,
in the case of Java applets, which carry all their supporting classes
with them. However, it is important to realise that what may seem at
first sight like a simple, straightforward component may in fact
be much more complex and clumsy.
Containment has nothing to do with the internal technology used
to create the component, or whether it is based on this or that object
framework; it has everything to do with how the component can be used. Internet
URLs present a classic example of successfully component-ised elements.
They deliver their content, regardless of how it is formed, in a consistent,
standard, open format that is immediately re-useable in an unlimited
number of contexts. Contrast this with traditional fat client applications,
where every supposed "component" is in fact dependent on a
complex environment of proprietary and third party "DLL"s and set-up
information. They may be easy to use if the environment is all exactly
as expected, but that does not make them genuinely re-useable components.
The most common re-use of components is re-use over time. It is
frankly bizarre that some of the vendors who make most noise about
how reusable their components are, have the very worst track records
in maintaining stability of components, whether data or code, which
employed their technologies in the past. It cannot be emphasised too
strongly that there is no point in reusing the same components across
different application areas if they will not even go on working consistently
in one area without further work tomorrow. The vendors in the database
and data tools market who are truly committed to components and reuse
have shown it in the reusability over time of the applications based
on their technology in the past.
Conformance to standards is the key to effective delivery of components.
Not ad-hoc vendor standards but the standards that underpin the day-to-day
use of computer systems. Within the context of databases and data
tools, much has been made of SQL as an all-encompassing, all-important
standard. However, this has much more to do with the major database
vendors' failure to adhere to any other standards than with the intrinsic
value of SQL, which was, after all, only intended as a "query language".
Standards in data storage, for example the use of operating system
files and indexing systems, are equally important for true reusability.
The near-total failure to address this issue leads to the bizarre
situation of almost every database vendor reinventing its
own systems of disc storage, backup, security and administration while
disabling all the know-how and tools already supplied, (and paid for),
with the supporting operating system. This may be good for their business,
but it is of no value to the user, who has to pay to be locked in.
When it comes to the reality of the actual data manipulation required
to process data, the heart of all data applications, there is an equally
appalling lack of standards. In practice, adding and updating of data
is not carried out on the basis of the multi-row, multi-join relational
sets with which SQL is concerned. In the great majority of real applications,
it is carried out on a per-row, per-table basis using "cursors" that
bear no relation to the original SQL set theory but every resemblance
to the record-handling metaphors and techniques which have been
employed for decades. The great loss for the end-user is that, with
a few exceptions, (of which Doric's products are the most notable),
no standards are available to help with the real job of manipulating
data in a simple but powerful and consistent form.
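The contrast between the two styles can be made concrete. Below is a hedged illustration (table name and values invented) using Python's built-in sqlite3 module: first the set-based update SQL was designed for, then the per-row cursor handling that, as the text argues, most real applications actually use:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
con.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,)])

# Set-based style: one statement operates on the whole relational set.
con.execute("UPDATE orders SET total = total * 1.1")

# Cursor style: fetch and touch rows one at a time, the decades-old
# record-handling metaphor the text describes.
for row_id, total in con.execute("SELECT id, total FROM orders").fetchall():
    con.execute("UPDATE orders SET total = ? WHERE id = ?",
                (round(total, 2), row_id))

totals = [t for (t,) in con.execute("SELECT total FROM orders ORDER BY id")]
```

The set-based form is shorter here, but once per-row logic (validation, lookups, branching) is needed, applications fall back to the cursor form, for which, as the text notes, there is little standardisation.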
Architecture Robustness
This is one area where the major RDBMS and tool suppliers can hardly
be accused of being lacking. They tend to have huge, complex, (albeit
proprietary), architectures, within which even experienced specialist
database administrators often get lost. What is more, they have the
habit of changing them rapidly, often between each version
of their software, creating immediate incompatibilities and time-wasting
data reloads.
On the other hand, inadequate architecture and a lack of structure is a weakness
of many of the alternatives which have been developed to work around
the lack of openness and of a component approach in most offerings
in the market.
In particular, the wildfire growth of mini data applications, using
powerful tools such as Perl and Tcl, to serve burgeoning Internet-driven
demand for open component data handling, has created a major problem
in maintaining and making full use of the data and applications available.
Many of these suffer badly, (and unnecessarily), from the absence
of any kind of architectural structure. The key requirements in architectural
terms centre around independence of data definition, linking to external
data, grouping of data files and code into projects/users and availability
of a standard, powerful interface for data manipulation and check-out.
Independence of data definition from specific code was and remains
an extremely powerful incentive for employing the services of a database
product. As soon as data is required for more than one purpose, let
alone in the context of data of enterprise value, a readily accessible
data definition, independent of the scripts or programs that use it
and allowing easy extension and modification, is essential.
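What such independence means in practice can be sketched in a few lines. In this hypothetical example (table and field names invented, not any product's format), the data definition lives in its own document, and any number of scripts derive what they need from it, so the definition can be extended without touching code:

```python
import json

# The definition is held as data in its own right, outside any script.
DEFINITION = json.loads("""
{
  "table": "customer",
  "fields": [
    {"name": "id",   "type": "int"},
    {"name": "name", "type": "str"}
  ]
}
""")

def build_create_sql(defn: dict) -> str:
    # One of many possible consumers: derive DDL from the shared definition.
    sql_types = {"int": "INTEGER", "str": "TEXT"}
    cols = ", ".join(f"{f['name']} {sql_types[f['type']]}"
                     for f in defn["fields"])
    return f"CREATE TABLE {defn['table']} ({cols})"

statement = build_create_sql(DEFINITION)
```

Adding a field to the definition document immediately flows through to every script built this way, which is the extension and modification benefit the text describes.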
An ability to link to external data which does not form part of
the "database in view" is an inevitable requirement in any successful
data application. It may well be that the link does not have to be
live; in many cases a recent copy of the data is perfectly adequate. However,
if the ongoing support costs of keeping the linked data adequately up
to date are to be kept low, a generic architecture for handling this,
in terms of database structure and appropriate tools, is very valuable.
If this includes direct access into a wide range of proprietary databases
via ODBC and/or gateways, it becomes indispensable in an enterprise
context.
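The "recent copy" pattern above can be sketched generically. This is an illustrative sketch only, not a product API: the class and its names are our own invention. The external source is any callable, and the local copy is refreshed only when it has aged past an agreed limit, avoiding the support cost of a live link:

```python
import time

class ExternalLink:
    """A link to external data held as a recent local copy."""

    def __init__(self, fetch, max_age_seconds):
        self._fetch = fetch            # any callable returning fresh rows
        self._max_age = max_age_seconds
        self._copy = None
        self._fetched_at = 0.0

    def read(self):
        # Re-fetch only when the cached copy has gone stale.
        if self._copy is None or time.time() - self._fetched_at > self._max_age:
            self._copy = self._fetch()
            self._fetched_at = time.time()
        return self._copy

calls = []
link = ExternalLink(fetch=lambda: calls.append(1) or ["row1", "row2"],
                    max_age_seconds=3600)
first = link.read()
second = link.read()   # served from the recent copy; no second fetch
```

In an enterprise setting the `fetch` callable would typically wrap an ODBC or gateway query, which is where the generic architecture the text calls for earns its keep.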
While an over-emphasis on tight grouping of related data into proprietary
databases can be very constricting, an architecture which enables
easy grouping, (and ungrouping), of data and the related code into
a consistent directory framework is greatly liberating. It ensures
that the focus remains on delivery of service rather than on the backend
housekeeping which any but the most trivial uses of data soon require.
The final benefit of a consistent, flexible, file-based architecture
is the opportunity to employ powerful interfaces for data manipulation
and check-out without reinventing the wheel. For "data debugging"
alone these are well worthwhile. While a lot of attention is given
to the ease of debugging of powerful scripting and development tools,
the question of checking out the successful processing of the data
is all too frequently ignored. This leads inevitably either to time
wasted on developing ad-hoc check-out tools or, more seriously, to a
failure to fully test the correctness of data manipulation. The costs
of insufficient direct access to the data increase during the life
of any data application, as data problems become increasingly difficult
and expensive to fix.
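A minimal sketch of what "data check-out" means in practice: after a manipulation step, inspect the data itself rather than trusting that the code ran. The checks and names below are our own illustration, not a specific product interface:

```python
def check_out(rows, expected_count, key=lambda row: row[0]):
    """Direct checks on the data after a manipulation step."""
    problems = []
    # Check the row count actually produced against what was expected.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    # Check that the manipulation did not introduce duplicate keys.
    keys = [key(row) for row in rows]
    if len(set(keys)) != len(keys):
        problems.append("duplicate keys after manipulation")
    return problems   # an empty list means the data checked out cleanly

good = check_out([(1, "a"), (2, "b")], expected_count=2)
bad = check_out([(1, "a"), (1, "b")], expected_count=3)
```

A standard interface supplying checks like these avoids the ad-hoc check-out tooling, or worse the absence of checking, that the text warns against.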