Applying XML

By now your interest in XML may be piqued, but you may still not be clear on where and how XML fits in. Allow me to present some scenarios -- some general, some specific, some hypothetical, and some real -- along with outlines of their XML-based solutions. The hypothetical ones are potential seeds for business ideas.

General Uses of XML: Making Database Information Available on the Web

As an example, one of the first things you learn when you get into information systems is how important database systems are, and how many opportunities arise from using the Internet to interface with your company's database. This was the case at XOG Xerox when we wanted to extend some transactions in our current service CRM system to our field service organization, as well as to our customers, via the Web. In other words, we wanted to provide a mechanism for the field to communicate job status and for our customers to obtain printer contracts through a web browser.

However, what is not usually mentioned is how hard it is to translate information back and forth between the Web and the back-end databases. There is currently no standard way of binding the Internet to existing databases. Sure, there are plenty of custom solutions, such as ColdFusion, Tango, and ASP, but all involve many, many lines of gateway code, in the form of CGI (Common Gateway Interface) programs or similar server-side scripts such as ColdFusion's CFM code. There isn't a truly standardized way of making an SQL query and generating a report or document over the Internet.

Furthermore, once you've run the query, the results that come back are not necessarily in the most useful form. Sure, they may arrive in a pretty HTML page, and the page might even have the correct data, but then what? How do you export that data into Excel or some other package (possibly another database or data source) for further processing? Cutting and pasting is an option, but it is cumbersome and irritating and scales very poorly with larger data sets. The figure below illustrates this problem.

myProbIntergWithDB (Ceponkus A, Hoodbhoy F, 1999)

This situation has a lot to do with the way information is exchanged over the Web and with the vehicles we use to travel the Internet: our browsers. Usually, information is transferred using HTTP, a standardized protocol for moving content such as HTML pages over the Internet or an intranet. Typically, the information transferred is mostly marked-up text, and the markup itself contains pointers to binary files that also need to be transferred (for example, icons or sounds). As the client browser receives the markup, it makes further requests based on those pointers and renders everything together as an HTML page.

Thus when making database queries, a database server normally receives a request in SQL and returns a table that is used to generate a report or to transfer data to another database or data source. If the client has direct access to the database server, making requests and updates is a fairly mundane process. Ideally, though, you don't want the user to have direct access to the server engine, both for security reasons and because you normally want to isolate the database logic from the user. Instead, the user accesses a gateway (ASP, PHP, ColdFusion, Tango, etc.) to the database, which generates the query and then packages and transforms the results into an HTML page. This page is then sent to the end user's client machine.

The issue to consider here is processing overhead. It quickly becomes clear that many of the steps are implicit and consume significant resources, including processor time, network traffic, hard disk access, and throughput. Granted, processor speeds and counts, memory, and network speeds are constantly increasing, but overhead is overhead, and XML offers a better solution. The diagram below shows the typical database-to-web solution.

myClientDB-WebSol
(Ceponkus A, Hoodbhoy F, 1999)

The above solution is extremely involved, and there are many ways of doing it. The problem is that very few of those methods (ColdFusion, Tango, ASP, etc.) are economical (in both money and complexity), and all tax the web server intensively. The key is in the gateway. In the ColdFusion environment, we usually add more servers to the cluster, which can be expensive and requires configuration changes for each expansion. Whatever the product, the gateway has to be able to do the following:

  1. Convert the request from the client (usually posted in HTML or plain text) into a SQL query string.
  2. Send the query to the database and wait for the results.
  3. Receive the results and transform them into an HTML page.
  4. Send the HTML page to the requesting Client.
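
The four gateway steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ColdFusion, Tango, or ASP actually implement a gateway: the `contracts` table, its columns, the `customer` form field, and every function name here are hypothetical, and an in-memory SQLite database stands in for the real back end.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def build_query(form_body):
    """Step 1: turn the posted form data into a parameterized SQL query."""
    fields = parse_qs(form_body)
    customer = fields.get("customer", [""])[0]  # hypothetical form field
    sql = ("SELECT contract_id, printer, status FROM contracts "
           "WHERE customer = ? ORDER BY contract_id")
    return sql, (customer,)


def run_query(conn, sql, params):
    """Step 2: send the query to the database and wait for the results."""
    cursor = conn.execute(sql, params)
    headers = [col[0] for col in cursor.description]
    return headers, cursor.fetchall()


def render_html(headers, rows):
    """Step 3: transform the result set into an HTML page."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<html><body><table><tr>{head}</tr>{body}</table></body></html>"


def handle_request(conn, form_body):
    """Step 4: the returned page is what the web server sends to the client."""
    sql, params = build_query(form_body)
    headers, rows = run_query(conn, sql, params)
    return render_html(headers, rows)


def make_demo_db():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contracts "
                 "(contract_id INTEGER, customer TEXT, printer TEXT, status TEXT)")
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", [
        (1, "Acme", "Model A", "active"),
        (2, "Acme", "Model B", "pending"),
        (3, "Globex", "Model A", "active"),
    ])
    return conn


if __name__ == "__main__":
    print(handle_request(make_demo_db(), "customer=Acme"))
```

Note that every step, including all of the formatting work in `render_html`, runs on the server for every request, which is where the overhead discussed in this section comes from.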

The challenge is that the client's browser gets a result set from the web server that can be viewed, but it is very difficult to derive further information from it without taxing the web server again. The received data is in a sense static, because the user can't change it without running another query with different input parameters. Deriving additional information (as opposed to data) means operations such as sorting, regrouping, calculating, rendering, or transferring the results easily to another database schema.

This also means that the client's machine is largely wasted as a resource for computing and rendering data. The table below lists the limitations of serving plain HTML pages from a database connection via the Web.

Item Limitation
1 Server is heavily taxed, resulting in poor performance for many users.
2 Difficult to implement and program.
3 The information the client receives is WYSIWYG. The client cannot easily perform any further processing on the information other than view it.
4 For every request, the server has to spend a lot of resources formatting user information.
5 Every page sent to the web client invariably contains more formatting information than raw content. This results in large files and slow data transfers.

However, all said and done, the above solution does work, although inefficiently. As the clientele grows, the web server will increasingly bog down. Customer complaints about poor performance are the likely outcome, with competitors picking up some of your clients. As stated earlier, you can add another server to the cluster, but there is a better and less expensive way to attack this problem: XML.

In fact, XML and databases fit together nicely, and most applications that process data can use both forms to their advantage (Pitts N., 2001).

So why use XML? Why have vendors like Oracle, IBM, Microsoft, Apple, Sun, SAP, and many others moved so fast to support XML? After all, these companies have worked for many years to fine-tune the efficiency of their proprietary data formats and tools. The reason is simple: as a vendor-neutral, platform-neutral, language-neutral technology for web-based data exchange, the XML family of standards solves a key problem for these companies' customers. XML simplifies the task of connecting applications and services over the Web (Muench S., 2000).

Also, as stated above, the real bottlenecks are at the web server's end. The biggest reason for this is that the size of each page sent to the web client is inflated by formatting information. By using a system that incorporates XML, XSL stylesheets, and XML schemas, users perform a one-time download of the stylesheet. On every request made to the web server thereafter, users receive only raw content (contracts, articles, stock updates, database queries, sports scores, etc.), and the formatting is applied by the stylesheet(s) cached on the client side. The diagram below shows XML in a classic three-tier solution model. You still need the web server and several other classic modules such as ColdFusion. What XML does is add an abstraction layer that connects the web server with the client in a richer way than is currently possible. Using the middle tier, we are able to integrate information from many disparate information sources.

myXML_AppliedToThreeTier
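
To make the raw-content idea concrete, here is a minimal sketch, again in Python and again under assumptions: the element names (`contracts`, `contract`), the stylesheet name `contracts.xsl`, and all function names are hypothetical, and the stylesheet itself is not shown. The first function plays the role of the middle tier, packaging a result set as XML with a pointer to a stylesheet instead of embedding formatting. The other two stand in for what a client could then do locally, re-sorting the data or exporting it for a spreadsheet, without another trip to the server.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(headers, rows, stylesheet_href="contracts.xsl"):
    """Middle tier: package a result set as XML with no formatting markup."""
    root = ET.Element("contracts")
    for row in rows:
        item = ET.SubElement(root, "contract")
        for name, value in zip(headers, row):
            ET.SubElement(item, name).text = str(value)
    body = ET.tostring(root, encoding="unicode")
    # The processing instruction names a stylesheet (hypothetical here) that
    # the client can download once and reuse for later responses.
    pi = f'<?xml-stylesheet type="text/xsl" href="{stylesheet_href}"?>'
    return f'<?xml version="1.0"?>\n{pi}\n{body}'


def sort_contracts(xml_text, key):
    """Client side: re-sort the received data without a new server request."""
    root = ET.fromstring(xml_text)
    records = [
        {child.tag: child.text for child in item}
        for item in root.iter("contract")
    ]
    return sorted(records, key=lambda record: record[key])


def to_csv(records):
    """Client side: export the records as CSV text for a spreadsheet."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    headers = ["contract_id", "printer", "status"]
    rows = [(1, "Model B", "pending"), (2, "Model A", "active")]  # made-up data
    xml_text = rows_to_xml(headers, rows)
    print(xml_text)
    print(to_csv(sort_contracts(xml_text, "printer")))
```

Because the response carries only tagged data, the same document can feed a browser, a spreadsheet export, or another database loader, which is the export problem raised earlier in this section.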

At XOG Xerox we use XML as a package for shipping data between various disparate databases and data stores, and then for sending this data to the Web for customer data gathering and contract acceptance. The XML system essentially pulls contracts out of a Xerox Metrix CRM system, with Oracle as the data source that stores the different contract formats. XML is used to convert, store, and update this database data as a usable Web document that a customer can use to apply for a contract.

***At this point we can add the CASE DATA that I supplied earlier in semester***

References:

Oracle XML Applications, O'Reilly, Steve Muench, 2000, pgs 17 - 22

XML Black Book, 2nd Ed, Coriolis, Natanya Pitts, 2001, pgs 581 - 582

Applied XML: A Toolkit for Programmers, Wiley, Alex Ceponkus, Faraz Hoodbhoy, 1999, pgs 43 - 48

Thoughts, Ideas, Concerns...

Bob --