WebSTAR 4 Manual & Technical Reference


WebSTAR SSI-WebInclude

WebSTAR SSI-WebInclude is a WebSTAR Plug-In that adds functionality to the WebSTAR SSI Plug-In, allowing it to dynamically include parts of remote web pages in your web pages. WebInclude introduces two new HTML-like tags. The <WEBINCLUDE> tag retrieves HTML pages from a remote web site and merges them into the current SSI input stream. The <TAGEXTRACT> tag extracts specific images, tables, and other elements from HTML text (usually HTML returned via <WEBINCLUDE>). Most relative URLs in the text returned by <WEBINCLUDE> are automatically converted into the correct full URLs.

WARNING: By installing the WebSTAR SSI-WebInclude Plug-In you may compromise the security of your Web server. Make sure you read and understand the security implications described below before installing this software.

WebInclude Security

Even if you never use it, installing the WebSTAR SSI-WebInclude plug-in could compromise the security of your Web server, or of other servers at your location. The problem is that every HTTP request made by an HTML file containing SSI-WebInclude tags is issued from your server, effectively masking the identity (i.e., the host name or IP address) of the person who is making the request. Servers that protect files from unauthorized access with a host-based security mechanism, like WebSTAR's Allow/Deny, will therefore see <WEBINCLUDE> requests coming from your server instead of from the client's machine. If your server is allowed access but the client is not, the client can access confidential files via <WEBINCLUDE>.

Installation of WebInclude is safe whenever either of the following two conditions is met:

1 Only trusted people (such as the web site administrator) have the ability to add files to your web site.
2 Your Web server machine does not have unrestricted access to another Web server. If no other server is granting access to your server based on host name or IP address, then there is no way for someone else to abuse that access via the <WEBINCLUDE> command.

WebInclude is smart enough to ignore <WEBINCLUDE> commands that reference the same server (that is, when both client and server IP addresses are the same). The real danger is to other servers in the same domain, or to other web sites being serviced by the same server (via virtual hosts), which allow other local machines access to their data. Thus, WebInclude may pose the greatest danger to your web site when it is installed on another server in your own domain!

To protect your web site against unauthorized use of WebInclude, you may need to use WebSTAR's Allow/Deny security to deny access to any local hosts (or IP addresses) that have no business sending HTTP requests to your web site.

Legal Issues

Do not use SSI-WebInclude data from other sites without asking permission: assume that all data is protected by copyright. Even within an institution, there may be implications in republishing data of which you are not aware. Many informational and even commercial sites will happily give you permission to include and republish their data, but you should always ask first. If you include data without permission, the original publishers may just deny you access to the site, or they may sue you for copyright infringement.

Installation

The WebSTAR Installer does not install the WebSTAR SSI-WebInclude plug-in by default. After you have read the WebInclude Security section above, you can install it easily:

1 Open the WebSTAR Server Suite Installer and choose Custom Install / Server Suite / Plug-Ins / Extra Plug-In modules / WebSTAR SSI-WebInclude.
2 If the WebSTAR SSI plug-in is not already installed in your Web server's Plug-Ins folder, choose Plug-Ins / Core Plug-Ins / WebSTAR SSI as well.
3 Select your current WebSTAR Server Suite folder.
4 Click the Install button and proceed.
5 Quit and restart the WebSTAR application.

Using WebSTAR SSI-WebInclude

WebSTAR SSI-WebInclude defines two new HTML-like tags that conform to the conventional syntax.

<WEBINCLUDE> goes to a server and gets a web page, just like a browser or other HTTP client. For example, the following HTML page grabs and re-serves the WebSTAR home page:

<HTML>
	<HEAD>
		<TITLE>WebSTAR SSI-WebInclude Example</TITLE>
	</HEAD>
	<BODY>
		<WEBINCLUDE
		URL="http://www.starnine.com/webstar/webstar.html">
	</BODY>
</HTML>

The <TAGEXTRACT> tag can extract specific text or images from the retrieved HTML and display them as part of your page. You define which parts of the HTML you want to include by using the TAG and WHICH attributes of the tag.

The following example extracts the third graphic image found on the page http://remote.host.com/somepage.html and inserts it into the SSI input stream:

<TAGEXTRACT TAG="IMG" WHICH="3">
	<WEBINCLUDE URL="http://remote.host.com/somepage.html">
</TAGEXTRACT>

Relative URLs in <IMG SRC="...">, <A HREF="...">, and <BODY BACKGROUND="..."> tags in the downloaded page are automatically converted to the correct full URLs (using the provided or default baseurl value). If the text retrieved by webinclude contains other relative URLs, you must precede the webinclude command with an HTML <BASE HREF="..."> tag.
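The conversion of relative URLs can be pictured with a short Python sketch. This is only an illustration of standard URL resolution, not the plug-in's own code; the base URL and the relative references below are hypothetical values chosen for the example, and the plug-in's exact resolution rules may differ.

```python
from urllib.parse import urljoin

# Hypothetical values: the page named in a url argument and a few
# relative references that might appear in its IMG, A, or BODY tags.
base_url = "http://remote.host.com/docs/somepage.html"
relative_refs = ["logo.gif", "../index.html", "/images/bg.gif"]


def absolutize(base, ref):
    """Resolve ref against base using standard URL-resolution rules."""
    return urljoin(base, ref)


full_urls = [absolutize(base_url, ref) for ref in relative_refs]
for ref, full in zip(relative_refs, full_urls):
    print(ref, "->", full)
# logo.gif -> http://remote.host.com/docs/logo.gif
# ../index.html -> http://remote.host.com/index.html
# /images/bg.gif -> http://remote.host.com/images/bg.gif
```

A reference that is already a full URL passes through unchanged, which is why only relative references need a base.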

Syntax Notes

The WebSTAR SSI plug-in is very forgiving about the blank spaces in commands. In general, zero or more spaces may appear between any two symbols inside the comment markers; but at least one space must appear immediately before any tag name. All commands, tags, and values are case insensitive (i.e., there is no difference between upper and lower case). Although tags are limited to 32 characters, value sizes are bounded only by available memory.

If for any reason a WebSTAR SSI-WebInclude command generates an error, the appropriate error message is generated within an HTML comment and appears at the original position of the command. For example:

<!--#WEBINCLUDE ERROR: Remote url not found. -->

Usually, the values associated with tags are constant strings, and are thus surrounded by double-quotes ("). To embed double-quotes within a quoted value, simply double all internal quote marks (e.g., "This ("") is a double-quote"). Alternatively, you can URL encode the value (e.g., "This (%22) is a double-quote.").

It is occasionally useful to associate a dynamic value with a tag. Because WebSTAR SSI-WebInclude merely augments the functionality of WebSTAR SSI, this can be done by surrounding SSI commands with single-quotes ('). For example:

 

<WEBINCLUDE URL='<!--#echo var = "piRefererKeyword"-->'>

All single-quoted text is processed by SSI before being used as an argument to the specified command (in this case, webinclude). As with double-quotes, to embed a single-quote in a single-quoted value, either double the embedded single-quote marks (e.g., 'don''t') or URL encode the quoted value.
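The two quoting rules can be sketched in Python as follows. The helper names double_quotes and as_attribute are hypothetical and exist only for this illustration; the sketch shows how a value would be written in a page, not how the plug-in parses it.

```python
from urllib.parse import quote, unquote


def double_quotes(value, mark='"'):
    """Escape a value by doubling every embedded quote mark."""
    return value.replace(mark, mark * 2)


def as_attribute(value, mark='"'):
    """Wrap an escaped value in the chosen quote mark."""
    return mark + double_quotes(value, mark) + mark


# Doubling rule, for double-quoted and single-quoted values.
dq = as_attribute('This (") is a double-quote')
sq = as_attribute("don't", mark="'")

# URL-encoding alternative: only the quote mark is encoded here,
# because spaces, parentheses, and periods are declared safe.
encoded = quote('This (") is a double-quote.', safe=" ().")

print(dq)       # "This ("") is a double-quote"
print(sq)       # 'don''t'
print(encoded)  # This (%22) is a double-quote.
```

Decoding the URL-encoded form with unquote returns the original text, so either style carries the same value.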

SSI WebInclude Commands

webinclude

The webinclude command accepts four arguments, although only url is required.

url

The complete URL of the file to download. Relative URLs will result in an error.

baseurl

The initial portion of a URL to be inserted into all relative URLs. If this optional argument is omitted, it will be automatically derived from the url.

xtra_header

Extra header text to be inserted at the end of the HTTP request header. Remember to end each line with %0D%0A (i.e., \r\n). This argument is optional.
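As a rough illustration, a value for xtra_header could be assembled as in the Python sketch below. The header lines and the build_xtra_header helper are hypothetical; the only rule taken from this manual is that each line ends with %0D%0A.

```python
from urllib.parse import unquote


def build_xtra_header(lines):
    """Join header lines, ending each with the URL-encoded CRLF (%0D%0A)."""
    return "".join(line + "%0D%0A" for line in lines)


# Hypothetical header lines chosen only for illustration.
xtra_header = build_xtra_header(["Accept-Language: en", "X-Example: demo"])

# The tag as it might be written in a page (domain.com is a placeholder).
tag = '<WEBINCLUDE url="http://www.domain.com/" xtra_header="%s">' % xtra_header

print(tag)
# Decoding shows the carriage-return/line-feed pairs the request would carry.
print(repr(unquote(xtra_header)))
```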

strip_header

A value denoting whether or not WebInclude should strip the HTTP response header from the downloaded page. Allowed values for this argument are: true, false, yes, no, 0, 1.

This argument is optional. The default value is true.
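The effect of strip_header can be modeled with the Python sketch below. It assumes the usual HTTP layout, in which a blank line (two consecutive CRLF pairs) separates the response header from the body. The function names, the sample response, and the handling of unrecognized values are assumptions made for this illustration, not the plug-in's documented behavior.

```python
TRUE_VALUES = {"true", "yes", "1"}
FALSE_VALUES = {"false", "no", "0"}


def parse_flag(text, default=True):
    """Interpret a strip_header value; case is ignored, default is true."""
    if text is None:
        return default
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Assumption: this sketch rejects anything outside the allowed list.
    raise ValueError("unrecognized strip_header value: %r" % text)


def apply_strip_header(response, strip_header=None):
    """Return the response body only, or the whole response if not stripping."""
    if not parse_flag(strip_header):
        return response
    head, separator, body = response.partition("\r\n\r\n")
    return body if separator else response


# Hypothetical raw response used only for the demonstration.
raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<HTML></HTML>"
print(apply_strip_header(raw))              # <HTML></HTML>
print(apply_strip_header(raw, "no") == raw)  # True
```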

tagextract

The tagextract command extracts specific pieces of HTML out of a larger body of HTML text. For example:

 

<TAGEXTRACT tag="img" which="3">
	<WEBINCLUDE url="http://www.domain.com/">
</TAGEXTRACT>

would extract the third image tag (e.g., <IMG SRC="http://...">) from the domain.com home page.

The tagextract command accepts four arguments:

tag

The name of the HTML tag to extract from the enclosed HTML.

The tag argument is required. Specific extraction code is provided for the following HTML tags: IMG, BR, LI, META.

In addition, any HTML tag that has both an open tag (e.g., <A HREF="...">) and a close tag (e.g., </A>) may also be extracted: TABLE, A, TITLE, OL, UL, P, etc.

which

A number that specifies which tag of the specified type to extract. This argument is optional, and defaults to 1.

start

Used with a tag value of "search"; see below.

end

Used with a tag value of "search"; see below.

Both the start and end arguments are optional.

FLAY

This special tag strips the outermost open and close tags from the enclosed HTML text. For example:

 

<TAGEXTRACT tag="flay">
	<A HREF="http://www.domain.com/">This is an anchor.</A>
</TAGEXTRACT>

would strip the <A...> ... </A> tags, leaving only the text "This is an anchor." displayed in the client's browser.
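The flay behavior can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the plug-in's implementation: the function name flay is borrowed from the tag value, and returning the text unchanged when it is not wrapped in a tag pair is an assumption of the sketch.

```python
import re


def flay(html):
    """Strip the outermost open and close tags from html, if present."""
    match = re.fullmatch(r"\s*<[^>]*>(.*)</[^>]*>\s*", html, flags=re.DOTALL)
    # Assumption: text that is not wrapped in a tag pair is returned as-is.
    return match.group(1) if match else html


anchor = '<A HREF="http://www.domain.com/">This is an anchor.</A>'
print(flay(anchor))  # This is an anchor.
```

Because the match runs to the last closing tag in the text, inner tags survive: only the outermost pair is removed.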

SEARCH

This tag allows you to extract substrings from a body of text. Two special arguments called start and end are strings which should surround the text you want to extract. For example:

 

<TAGEXTRACT tag="search" start="the " end=" system">
	This is a test of the emergency broadcast system.
</TAGEXTRACT>

would extract the string "emergency broadcast", but not the start and end text enclosing it. If you want that text, just tack it on before and after, as follows:

 

the <TAGEXTRACT tag="search" start="the " end=" system">
	This is a test of the emergency broadcast system.
</TAGEXTRACT> system

Both the start and end arguments are optional.
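The search extraction can be modeled with the Python sketch below. The manual does not say what happens when start or end is omitted or not found, so the choices made here are assumptions of the sketch: an omitted start means the beginning of the text, an omitted end means the end of the text, a marker that cannot be found yields an empty result, and matching is case sensitive. The function name search_extract is hypothetical.

```python
def search_extract(text, start=None, end=None):
    """Return the text between the start and end markers, excluding both."""
    begin = 0
    if start:
        position = text.find(start)
        if position == -1:
            return ""  # assumption: a missing start marker yields nothing
        begin = position + len(start)
    stop = len(text)
    if end:
        position = text.find(end, begin)
        if position == -1:
            return ""  # assumption: a missing end marker yields nothing
        stop = position
    return text[begin:stop]


sentence = "This is a test of the emergency broadcast system."
print(search_extract(sentence, start="the ", end=" system"))
# emergency broadcast
```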

Nesting TAGEXTRACT Commands

Due to the vagaries of HTML parsing, you cannot nest a tagextract command inside another tagextract command of the same name. However, because nesting is a very useful thing to do, the WebInclude Plug-In gets around this limitation by registering 10 distinct tagextract commands with SSI, named "tagextract", "tagextract1", "tagextract2", ..., "tagextract9". Thus, you may nest tagextract commands within each other, as long as they all have distinct names. For example, to extract the third image from the second row of the first table from a web page at "http://www.starnine.com/", the following command structure could be used:

<TAGEXTRACT tag="img" which="3">
	<TAGEXTRACT1 tag="tr" which="2">
		<TAGEXTRACT2 tag="table">
			<WEBINCLUDE url="http://www.starnine.com/">
		</TAGEXTRACT2>
	</TAGEXTRACT1>
</TAGEXTRACT>

Extracting Nested HTML Tags

The tagextract command properly handles nested HTML tags. For example, suppose an HTML page contains two tables, and that each of those tables contains several nested tables. Then the following commands:

 

<TAGEXTRACT1 tag="table" which="2">
	<TAGEXTRACT2 tag="flay">
		<TAGEXTRACT3 tag="table" which="2">
			<WEBINCLUDE url="http://www.domain.com/">
		</TAGEXTRACT3>
	</TAGEXTRACT2>
</TAGEXTRACT1>

would first retrieve the page specified in the webinclude command. The innermost tagextract command (i.e., TAGEXTRACT3) would find the first table, skip over it and all of its contents (including any nested tables), and then extract the second major table on the page. The middle tagextract command (i.e., TAGEXTRACT2) would then remove the <table> and </table> tags from the extracted text. The outermost tagextract command (i.e., TAGEXTRACT1) would then find the second table in the extracted text.
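The three steps above can be sketched in Python with a depth counter that skips nested elements. This is a simplified model under stated assumptions, not the plug-in's code: extract_nth and flay are hypothetical helpers, the sample page is invented for the demonstration, and only tags with both an open and a close form are handled.

```python
import re


def extract_nth(html, tag, which=1):
    """Return the which-th top-level <tag>...</tag> element, or "" if absent."""
    pattern = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    count = 0
    start = None
    for match in pattern.finditer(html):
        if match.group(1) == "":      # an opening tag
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:               # a closing tag
            depth -= 1
            if depth == 0:            # a top-level element just ended
                count += 1
                if count == which:
                    return html[start:match.end()]
    return ""


def flay(html):
    """Strip the outermost open and close tags from html, if present."""
    match = re.fullmatch(r"\s*<[^>]*>(.*)</[^>]*>\s*", html, flags=re.DOTALL)
    return match.group(1) if match else html


# Invented page: two major tables, each containing two nested tables.
page = (
    "<table id=A><tr><td><table id=A1></table><table id=A2></table></td></tr></table>"
    "<table id=B><tr><td><table id=B1></table><table id=B2></table></td></tr></table>"
)

second_major = extract_nth(page, "table", 2)   # role of TAGEXTRACT3
inner = flay(second_major)                     # role of TAGEXTRACT2
result = extract_nth(inner, "table", 2)        # role of TAGEXTRACT1
print(result)  # <table id=B2></table>
```

Counting depth is what lets the first step skip the nested tables inside the first major table instead of mistaking one of them for the second table on the page.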