WebSTAR SSI-WebInclude is a WebSTAR Plug-In that adds functionality to the WebSTAR SSI Plug-In, allowing it to dynamically include parts of remote web pages into your web pages. WebInclude introduces two new HTML-like tags. The <WEBINCLUDE> tag retrieves HTML pages from a remote web site and merges them into the current SSI input stream. The <TAGEXTRACT> tag extracts specific images, tables, etc. from HTML text (usually HTML returned via <WEBINCLUDE>). Most relative URLs in the text returned by <WEBINCLUDE> are automatically converted into the correct full URLs.
WARNING: By installing the WebSTAR SSI-WebInclude Plug-In you may compromise the security of your Web server. Make sure you read and understand the security implications described below before installing this software.
Even if you never use it, installing the WebSTAR SSI-WebInclude plug-in could compromise the security of your Web server, or of other servers at your location. The problem is that every HTTP request made by an HTML file containing SSI-WebInclude tags is issued from your server, effectively masking the identity (i.e., the host name or IP address) of the person who is making the request. So servers that protect files from unauthorized access using a security mechanism, like WebSTAR's Allow/Deny, will see <WEBINCLUDE> requests coming from your server instead of the client's machine. If your server is allowed access, but the client is not, the client may be able to access confidential files via <WEBINCLUDE>.
Installation of WebInclude is safe whenever either of the following two conditions is met:
WebInclude is smart enough to ignore <WEBINCLUDE> commands that reference the same server (that is, when both client and server IP addresses are the same). The real danger is to other servers in the same domain, or to other web sites served by the same server (via virtual hosts), that allow other local machines access to their data. Thus, WebInclude may pose the greatest danger to your web site when it is installed on another server in your own domain!
To protect your web site against unauthorized use of WebInclude, you may need to use WebSTAR's Allow/Deny security to deny access to any local hosts (or IP addresses) that have no business sending HTTP requests to your web site.
Do not use SSI-WebInclude to republish data from other sites without asking permission: assume that all data is protected by copyright. Even within an institution, republishing data may have implications of which you are not aware. Many informational and even commercial sites will happily give you permission to include and republish their data, but you should always ask first. If you include data without permission, the original publishers may simply deny you access to their site, or they may sue you for copyright infringement.
The WebSTAR Installer does not install the WebSTAR SSI-WebInclude plug-in by default. After you have read the security section, "WebInclude Security", you can install it easily:
WebSTAR SSI-WebInclude defines two new HTML-like tags that conform to the conventional syntax.
<WEBINCLUDE> goes to a server and gets a web page, just like a browser or other HTTP client. For example, the following HTML page grabs and re-serves the WebSTAR home page:
<HTML>
<HEAD>
<TITLE>WebSTAR SSI-WebInclude Example</TITLE>
</HEAD>
<BODY>
<WEBINCLUDE
URL="http://www.starnine.com/webstar/webstar.html">
</BODY>
</HTML>
The <TAGEXTRACT> tag can extract specific text or images from the retrieved HTML and display them as part of your page. You define which parts of the HTML you want to include by using the TAG and WHICH attributes of the tag.
The following example extracts the third graphic image found on the page http://remote.host.com/somepage.html and inserts it into the SSI input stream:
<TAGEXTRACT TAG="IMG" WHICH="3">
<WEBINCLUDE URL="http://remote.host.com/somepage.html">
</TAGEXTRACT>
Relative URLs in <IMG SRC="..."> , <A HREF="..."> , and <BODY BACKGROUND="..."> tags in the downloaded page are automatically converted to the correct full URLs (using the provided or default baseurl value). If the text retrieved by webinclude contains other relative URLs, you must precede the webinclude command with an HTML <BASE HREF="..."> tag.
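As an illustration of the rewriting described above, the following Python sketch resolves relative URLs in IMG SRC, A HREF, and BODY BACKGROUND attributes. It is not the plug-in's implementation: it assumes double-quoted attribute values, uses a simple regular expression rather than a real HTML parser, and resolves URLs with Python's urljoin, whereas the plug-in describes baseurl as a prefix inserted into relative URLs, so results may differ for some inputs. The function name absolutize is hypothetical.

```python
import re
from urllib.parse import urljoin

# Only the three attributes named in the text are rewritten:
# IMG SRC, A HREF and BODY BACKGROUND (double-quoted values only).
_ATTR_PATTERN = re.compile(
    r'(<\s*(?:img\b[^>]*?\bsrc|a\b[^>]*?\bhref|body\b[^>]*?\bbackground)\s*=\s*")'
    r'([^"]*)(")',
    re.IGNORECASE,
)


def absolutize(html, baseurl):
    """Return html with relative URLs in the supported attributes made absolute."""
    def fix(match):
        # urljoin resolves relative references and leaves absolute URLs unchanged
        return match.group(1) + urljoin(baseurl, match.group(2)) + match.group(3)
    return _ATTR_PATTERN.sub(fix, html)
```

For example, with http://remote.host.com/dir/somepage.html as the base, SRC="logo.gif" becomes SRC="http://remote.host.com/dir/logo.gif", while an HREF that is already absolute is left unchanged.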
The WebSTAR SSI plug-in is very forgiving about blank spaces in commands. In general, zero or more spaces may appear between any two symbols inside the comment markers, but at least one space must appear immediately before any tag name. All commands, tags, and values are case insensitive (i.e., there is no difference between upper and lower case). Although tag names are limited to 32 characters, value sizes are bounded only by available memory.
If for any reason a WebSTAR SSI-WebInclude command generates an error, an appropriate error message is generated within an HTML comment and appears at the original position of the command. For example:
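To make the spacing and case rules concrete, here is a minimal Python sketch that splits an HTML-like command such as <WEBINCLUDE URL="..."> into its name and tag/value pairs. It only approximates the rules above and is not WebSTAR's parser: the function name parse_command is hypothetical, tag names are lower-cased, values are kept exactly as written (doubled quotes are not decoded here), and a tag that is not preceded by whitespace is simply not recognized.

```python
import re

MAX_TAG_LEN = 32  # tag names are limited to 32 characters

# At least one whitespace character before each tag name, optional
# whitespace around "=", and "" inside a value standing for one double-quote.
_PAIR = re.compile(r'\s+([A-Za-z0-9_]+)\s*=\s*"((?:[^"]|"")*)"')


def parse_command(text):
    """Split '<NAME tag="value" ...>' into (name, {tag: raw value})."""
    m = re.match(r'<\s*([A-Za-z0-9_]+)(.*?)\s*>$', text.strip(), re.DOTALL)
    if m is None:
        raise ValueError("not an HTML-like command")
    name, rest = m.group(1).lower(), m.group(2)
    args = {}
    for tag, value in _PAIR.findall(rest):
        if len(tag) > MAX_TAG_LEN:
            raise ValueError(f"tag name longer than {MAX_TAG_LEN} characters")
        # Names are case insensitive; values are kept as written.
        args[tag.lower()] = value
    return name, args
```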
<!--#WEBINCLUDE ERROR: Remote url not found. -->
Usually, the values associated with tags are constant strings, and are thus surrounded by double-quotes ("). To embed double-quotes within a quoted value, simply double all internal quote marks (e.g., "This ("") is a double-quote"). Alternatively, you can URL encode the value (e.g., "This (%22) is a double-quote.").
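A minimal Python sketch of decoding a double-quoted value according to these two rules follows. The name decode_value is hypothetical, and the sketch assumes that doubled quotes are collapsed first and that the whole value is then URL-decoded; the plug-in may apply these steps differently, for example to a value containing a literal percent sign.

```python
from urllib.parse import unquote


def decode_value(raw):
    """Decode a double-quoted tag value such as '"This ("") is a double-quote"'."""
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        raise ValueError("value must be surrounded by double-quotes")
    inner = raw[1:-1].replace('""', '"')  # a doubled quote stands for one quote
    return unquote(inner)                 # %22 and other escapes are URL-decoded
```

Both example values in the paragraph above decode to text containing a single double-quote character.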
It is occasionally useful to associate a dynamic value with a tag. Because WebSTAR SSI-WebInclude merely augments the functionality of WebSTAR SSI, this can be done by surrounding SSI commands with single-quotes ('). For example:
<WEBINCLUDE URL='<!--#echo var = "piRefererKeyword"-->'>
All single-quoted text is processed by SSI before being used as an argument to the specified command (in this case, webinclude). As with double-quotes, to embed a single-quote in a single-quoted value, either double the embedded single-quote marks (e.g., 'don''t') or URL encode the quoted value.
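The order of operations for a single-quoted value can be sketched in Python as follows. This is an illustration under stated assumptions, not the plug-in's code: real SSI processing is replaced by a hypothetical stand-in that only handles <!--#echo var="..."--> lookups against a dictionary of variables, and doubled single-quotes are assumed to be collapsed before the SSI step. The function name expand_single_quoted is hypothetical.

```python
import re

# Hypothetical stand-in for SSI: only <!--#echo var="NAME"--> is understood.
_ECHO = re.compile(r'<!--#echo\s+var\s*=\s*"([^"]*)"\s*-->', re.IGNORECASE)


def expand_single_quoted(raw, env):
    """Turn a single-quoted value into the argument the command receives."""
    if len(raw) < 2 or raw[0] != "'" or raw[-1] != "'":
        raise ValueError("value must be surrounded by single-quotes")
    # A doubled single-quote inside the value stands for one literal quote.
    inner = raw[1:-1].replace("''", "'")
    # SSI then processes the text; here only echo-var lookups are emulated,
    # and unknown variables expand to an empty string.
    return _ECHO.sub(lambda m: env.get(m.group(1), ""), inner)
```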
The webinclude command accepts four arguments, although only url is required.
url: The complete URL of the file to download. Relative URLs will result in an error.
baseurl: The initial portion of a URL to be inserted into all relative URLs. If this optional argument is omitted, it will be automatically derived from the url.
Extra header text to be inserted at the end of the HTTP request header. Remember to end each line with %0D%0A (i.e., \r\n). This argument is optional.
A value denoting whether or not WebInclude should strip the HTTP response header from the downloaded page. Allowed values for this argument are:
true, false, yes, no, 0, 1.
This argument is optional. The default value is true.
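Three of the behaviors described above for webinclude (relative URLs rejected, extra header text appended to the request header, response header stripped by default) can be sketched in Python as below. All function names are hypothetical, the request format (HTTP/1.0 with a Host header) is an assumption rather than what the plug-in actually sends, and the sketch does no networking: it only builds request bytes and post-processes a response you already have.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

_FLAG_VALUES = {"true": True, "yes": True, "1": True,
                "false": False, "no": False, "0": False}


def build_request(url, extra=""):
    """Build a GET request; URL-encoded extra header text goes at the end."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("url must be complete; relative URLs are an error")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    head = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n"
    head += unquote(extra)  # each %0D%0A becomes the \r\n ending a header line
    return (head + "\r\n").encode("latin-1")  # a blank line ends the header


def parse_flag(value: Optional[str], default: bool = True) -> bool:
    """Interpret the header-stripping argument; an omitted value means true."""
    if value is None:
        return default
    try:
        return _FLAG_VALUES[value.strip().lower()]
    except KeyError:
        raise ValueError(f"invalid value: {value!r}") from None


def strip_header(response, strip):
    """Optionally drop the HTTP response header (everything up to the blank line)."""
    if not strip:
        return response
    _head, sep, body = response.partition(b"\r\n\r\n")
    return body if sep else response
```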
The tagextract command extracts specific pieces of HTML out of a larger body of HTML text. For example:
<TAGEXTRACT tag="img" which="3">
<WEBINCLUDE url="http://www.domain.com/">
</TAGEXTRACT>
would extract the third image tag (e.g., <IMG SRC="http://...">) from the domain.com home page.
The tagextract command accepts four arguments:
tag: The name of the HTML tag to extract from the enclosed HTML.
The tag argument is required. Specific extraction code is provided for the following HTML tags: IMG, BR, LI, META.
In addition, any HTML tag that has both an open tag (e.g., <A HREF="...">) and a close tag (e.g., </A>) may also be extracted: TABLE, A, TITLE, OL, UL, P, etc.
which: A number that specifies which occurrence of the specified tag to extract. This argument is optional, and defaults to 1.
start: Used with a tag value of "search"; see below.
end: Used with a tag value of "search"; see below.
Both the start and end arguments are optional.
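The which argument counts occurrences of the requested tag. For tags without a close tag, such as IMG, a rough Python sketch of that counting might look like this. The function name extract_empty_tag is hypothetical, matching uses a case-insensitive regular expression, and returning an empty string when there is no such occurrence is an assumption (the plug-in may report an error instead).

```python
import re


def extract_empty_tag(html, tag, which=1):
    """Return the which-th occurrence (1-based) of a tag with no close tag."""
    pattern = re.compile(r"<\s*" + re.escape(tag) + r"\b[^>]*>", re.IGNORECASE)
    matches = pattern.findall(html)
    if which < 1 or which > len(matches):
        return ""  # assumption: no such occurrence yields empty text
    return matches[which - 1]
```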
The special tag value "flay" strips the outermost open and close tags from the enclosed HTML text. For example:
<TAGEXTRACT tag="flay">
<A HREF="http://www.domain.com/">This is an anchor.</A>
</TAGEXTRACT>
would strip the <A ...> and </A> tags, leaving only the text "This is an anchor." to be displayed in the client's browser.
The special tag value "search" allows you to extract substrings from a body of text. Two special arguments, start and end, are strings that should surround the text you want to extract. For example:
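A minimal Python sketch of what flay does follows, assuming the enclosed text starts with an open tag and ends with the matching close tag (surrounding whitespace aside). The function name is borrowed from the tag value, and leaving malformed input unchanged is an assumption, not documented behavior.

```python
import re


def flay(html):
    """Remove the outermost open tag and its matching close tag from html."""
    text = html.strip()
    m = re.match(r"<\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>", text)
    if m is None:
        return html  # no leading open tag: leave the text unchanged
    name = m.group(1)
    close = re.search(r"<\s*/\s*" + re.escape(name) + r"\s*>\s*$", text, re.IGNORECASE)
    if close is None:
        return html  # no matching close tag at the end: leave unchanged
    return text[m.end():close.start()]
```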
<TAGEXTRACT tag="search" start="the " end=" system">
This is a test of the emergency broadcast system.
</TAGEXTRACT>
would extract the string "emergency broadcast", but not the start and end text enclosing it. If you want that text, just tack it on before and after, as follows:
the <TAGEXTRACT tag="search" start="the " end=" system">
This is a test of the emergency broadcast system.
</TAGEXTRACT> system
Both the start and end arguments are optional.
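The search behavior can be approximated in Python as below. Assumptions rather than documented behavior: an omitted start means "from the beginning", an omitted end means "to the end", matching is case-sensitive, the first occurrence of start is used, and the result is empty when a delimiter is not found. The function name search_extract is hypothetical.

```python
def search_extract(text, start="", end=""):
    """Return the text between the first `start` and the following `end`."""
    begin = 0
    if start:
        i = text.find(start)
        if i < 0:
            return ""
        begin = i + len(start)  # the start string itself is not included
    stop = len(text)
    if end:
        j = text.find(end, begin)
        if j < 0:
            return ""
        stop = j  # the end string itself is not included
    return text[begin:stop]
```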
Due to the vagaries of HTML parsing, you cannot nest a tagextract command inside another tagextract command of the same name. However, because nesting is very useful, the WebInclude Plug-In gets around this limitation by registering 10 distinct tagextract commands with SSI, named: "tagextract", "tagextract1", "tagextract2", ..., "tagextract9". Thus, you may nest tagextract commands within each other, as long as each has a distinct name. For example, to extract the third image from the second row of the first table on the web page at "http://www.starnine.com/", you could use the following command structure:
<TAGEXTRACT tag="img" which="3">
<TAGEXTRACT1 tag="tr" which="2">
<TAGEXTRACT2 tag="table">
<WEBINCLUDE url="http://www.starnine.com/">
</TAGEXTRACT2>
</TAGEXTRACT1>
</TAGEXTRACT>
The tagextract command properly handles nested HTML tags. For example, suppose an HTML page contains two top-level tables, and that each of those tables contains several nested tables. Then the following commands:
<TAGEXTRACT1 tag="table" which="2">
<TAGEXTRACT2 tag="flay">
<TAGEXTRACT3 tag="table" which="2">
<WEBINCLUDE url="http://www.domain.com/">
</TAGEXTRACT3>
</TAGEXTRACT2>
</TAGEXTRACT1>
would first retrieve the page specified in the webinclude command. The innermost tagextract command (i.e., TAGEXTRACT3) would find the first table, skip over it and all of its contents (including any nested tables), and then extract the second major table on the page. The middle tagextract command (i.e., TAGEXTRACT2) would then remove that table's outermost <TABLE> and </TABLE> tags from the extracted text. The outermost tagextract command (i.e., TAGEXTRACT1) would then find the second table in the remaining text, that is, the second of the tables nested inside the second major table.
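The nesting-aware counting described above can be sketched in Python by tracking the depth of open and close tags and counting only top-level elements. This is an illustration, not the plug-in's parser: the function name extract_paired_tag is hypothetical, it uses a regular expression instead of a full HTML parser, and it assumes every open tag of the given name has a matching close tag.

```python
import re


def extract_paired_tag(html, tag, which=1):
    """Return the which-th top-level <tag>...</tag> element, skipping nested ones."""
    pattern = re.compile(r"<\s*(/?)\s*" + re.escape(tag) + r"\b[^>]*>", re.IGNORECASE)
    depth = 0   # how many open tags of this name are currently unclosed
    count = 0   # how many top-level elements have been seen so far
    start = 0
    for m in pattern.finditer(html):
        if not m.group(1):                  # an open tag
            if depth == 0:
                count += 1
                start = m.start()
            depth += 1
        elif depth > 0:                     # a close tag matching an open one
            depth -= 1
            if depth == 0 and count == which:
                return html[start:m.end()]
    return ""  # assumption: no such element yields empty text
```

Applied to a page with two top-level tables, extract_paired_tag(page, "table", 2) returns the whole second table, nested tables included; stripping that result's outer <TABLE> and </TABLE> tags and calling extract_paired_tag again with which=2 yields the second table nested inside it, mirroring the TAGEXTRACT3, TAGEXTRACT2, TAGEXTRACT1 sequence.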