Some Recommendations
for ver. PL18 and newer.
This document contains examples of the most widespread
configurations used in practice. Strictly speaking, it is not necessary, because all the details
are covered in the description of configuration directives
and in the description of the server's functioning principles.
However, when ready-made examples are provided, the number of e-mails asking for help
decreases significantly.
This document has been changed to make allowance for new configuration directives and new
behavior of the server that appeared in vers. PL18-PL24. The users of old versions
(PL12-PL16) may use the previous version of this document,
although it is somewhat hard to understand.
All the configuration examples given below have the same general idea:
- There is the "main" server (www.company.ru), which automatically recognizes the
client's encoding from the Accept-Charset: and User-Agent: headers. As a rule, it recognizes the encoding
correctly, but the resulting documents are not cacheable (because the same URL may correspond
to different contents in different situations), which makes work over slow channels more difficult.
- Each document of the main server corresponds to several (depending on the number of encodings
supported) "virtual URLs." The contents of these URLs are identical, and
the encoding is independent of the User-Agent: header (but depends on Accept-Charset:).
Such virtual URLs may be created in several ways:
virtual directories,
virtual servers with different hostnames, and
virtual servers corresponding to different port numbers.
Each way has its own shortcomings and advantages, and the choice of method
is entirely up to the administrator. In all three cases, a document retrieved via a URL
with an "explicitly specified encoding" may be cached
by both HTTP/1.0- and HTTP/1.1-compatible caches (that is, the Expires: header
will be absent, and the Vary: header will contain only Accept-Charset).
An example of the main server configuration is included in the Russian Apache distribution.
Hereafter, this document describes only the changes and additions to the configuration
that are appropriate in this or that case.
Selection of Encoding by Virtual Directory
When the encoding is selected in this way, there are several URLs corresponding to each document:
- http://www.domain/file.html - automatic selection of encoding;
- http://www.domain/charset-name/file.html
(for example, http://www.domain/koi8-r/file.html) - the encoding is specified by the directory prefix.
For the server to work in this mode, one should either create pseudo-directories as symlinks:
ln -s /www/root/htdocs /www/root/htdocs/koi8-r
# etc
or use the Alias directive:
Alias /koi8-r /your/www/root/htdocs
# etc
The principal advantage of this method is its simplicity. The principal shortcoming is that
all references between documents must be specified using relative paths.
That is, instead of <A HREF="/some">, we have to write something like
<A HREF="../../../some">. The reason is as follows: if we follow a link to /some
from (for example) the document /koi8-r/file.html, the server switches back to automatic recoding.
To preserve the explicit choice of encoding, the link must lead to /koi8-r/some; that is, it must be
written as the relative reference some (or as ../some in a document one level deeper, such as
/koi8-r/dir/file.html).
The required degree of matching between the name of the encoding and the directory name
is specified by the CharsetStrictURIMatch directive.
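The relative references required by this scheme can be computed mechanically. The following is a minimal Python sketch, not part of Russian Apache; the function name relative_href is hypothetical, and it assumes that both paths are absolute paths to files within the same document tree.

```python
import posixpath


def relative_href(current_doc: str, target: str) -> str:
    """Return a relative reference from current_doc to target.

    Both arguments are absolute paths inside the same document tree,
    for example "/a/b/c/file.html" and "/some". Because the result is
    relative, it stays valid under any encoding prefix such as /koi8-r.
    """
    base_dir = posixpath.dirname(current_doc)
    return posixpath.relpath(target, start=base_dir)


print(relative_href("/a/b/c/file.html", "/some"))  # ../../../some
print(relative_href("/docs/file.html", "/some"))   # ../some
```

Since the prefix never appears in the computed reference, the same document source works both under the automatic URL and under every virtual directory.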
Selection of Encoding by Virtual Server Name
If the encoding is explicitly specified in this way, the corresponding URLs will be as follows:
- http://www.domain/file.html - automatic selection of encoding
- http://another-host.domain/file.html
(for example, http://win-www.domain/file.html) - the encoding is specified by the
hostname prefix and/or by <VirtualHost> configuration.
There are at least two ways of configuring the server for working in this mode:
- If the first characters (or the host part - see the
CharsetStrictURIMatch directive)
of the hostname match the name or alias of some charset, this charset will be selected
(if the default settings of
CharsetSelectionOrder are used).
An example of such a configuration:
<VirtualHost win-www.domain.ru>
# generally speaking, no additional directives are required
</VirtualHost>
- In principle, the name of the virtual host may be arbitrary. In this case, the following
configuration may be used:
<VirtualHost any.domain.ru>
CharsetSelectionOrder
# for example
CharsetDefault windows-1251
</VirtualHost>
In this example, the
CharsetSelectionOrder directive is empty; that is, none of the methods of automatic
encoding detection (Portnumber, Hostname, Dirprefix, Useragent) is used,
and the document is sent in the default encoding. However, if the client program includes the
Accept-Charset header in the request and the server knows this charset,
the client's requirement will be satisfied. Any other behavior is incompatible with the HTTP standards.
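The hostname-prefix rule from the first configuration can be pictured with a toy model. This Python sketch is only an illustration of the "first characters of the hostname" match; it is not the module's actual code. The alias table is hypothetical (real names and aliases come from the server configuration), and the stricter matching selected by CharsetStrictURIMatch is not modeled.

```python
from typing import Optional

# Hypothetical alias table, for illustration only.
CHARSET_ALIASES = {
    "koi8-r": "koi8-r",
    "koi": "koi8-r",
    "windows-1251": "windows-1251",
    "win": "windows-1251",
}


def charset_from_hostname(hostname: str) -> Optional[str]:
    """Return the charset whose name or alias starts the hostname, if any."""
    host = hostname.lower()
    # Try longer names first so that "windows-1251" is preferred over "win".
    for alias in sorted(CHARSET_ALIASES, key=len, reverse=True):
        if host.startswith(alias):
            return CHARSET_ALIASES[alias]
    return None


print(charset_from_hostname("win-www.domain.ru"))  # windows-1251
print(charset_from_hostname("www.domain.ru"))      # None
```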
Virtual Servers and DNS
If you expect HTTP/1.0-compatible clients to visit your server, each virtual server should
have its own IP address. With HTTP/1.1, separate IP addresses are not necessary,
because each request always contains the Host: header, which tells the server
which virtual server the client is addressing. Some HTTP/1.0 clients also send
the Host: header, but they are not required to do so.
Therefore, to use the virtual server mechanism, the DNS zone for the domain must
contain records of the A type, and the reverse zone (address->hostname) must contain correct records
of the PTR type. A correct reverse zone is necessary for correct processing of HTTP/1.0
requests (without Host) by Apache-1.2.x. Versions 1.1.x were less sensitive to this requirement.
Also, Apache-1.2.x (both with and without patches) always queries the DNS during startup
(or at the moment of reconfiguration); therefore, the DNS records should be correct at the moment
of the server startup (I wouldn't write so much about this if these problems were not so widespread).
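Whether the A and PTR records agree can be checked before the server is started. The sketch below is a minimal Python illustration that uses only the standard socket module; the function name forward_reverse_match is hypothetical, and the resolver functions are parameters so that the check can be exercised without a live DNS.

```python
import socket
from typing import Callable, List, Tuple


def forward_reverse_match(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> bool:
    """Check that hostname -> address (A) and address -> hostname (PTR) agree."""
    try:
        address = resolve(hostname)
        primary, aliases, _ = reverse(address)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False
    names = {primary.lower().rstrip(".")}
    names.update(alias.lower().rstrip(".") for alias in aliases)
    return hostname.lower().rstrip(".") in names
```

Called as forward_reverse_match("win-www.domain.ru"), it queries the system resolver; a False result suggests that the reverse zone should be inspected before Apache is started.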
If you want to know more about the specific features of Apache's work with virtual servers,
you may go directly to the WWW server of the Apache project.
Advantages and Disadvantages of the Scheme with <VirtualHost>
The advantages are evident: all references (<A HREF=) in a document may be absolute. The
disadvantages are also evident: as long as HTTP/1.0-compatible clients that do not send the
Host: header still exist, each virtual server needs its own IP address and a correct
configuration of zones in the DNS (that is, one needs either access to the DNS or access to an
administrator who understands the problem). This is not difficult for an "official webmaster" but
may become a serious problem for administrators of personal servers.
Selection of Encoding According to Port Number
When the encoding is explicitly specified in this way, the corresponding URLs will be as follows:
- http://www.domain/file.html - automatic selection of encoding
- http://www.domain:port/file.html
(for example, http://www.domain:8001/file.html) - the encoding is specified by the port number
in the URL (the correspondence is established by the CharsetByPort directive
and/or by the <VirtualHost> configuration).
For the server to "listen" (listen(2)) to more than one TCP port,
use the Listen Portnumber directive, for example:
Listen 80
Listen 8100
Listen 8101
# and so on.
The Listen 80 directive is necessary, because the presence of the Listen directive
cancels the Port directive (to put it more precisely, if Listen is present, Port
is responsible only for the contents of the CGI variable SERVER_PORT).
There are at least two ways for associating some definite encoding with a port number:
- The use of the
CharsetByPort directive:
Listen 80
Listen 8100
Listen 8101
CharsetByPort koi8-r 8100
CharsetByPort windows-1251 8101
# default value:
CharsetSelectionOrder Portnumber Hostname Dirprefix Useragent
When the server is configured like this, the encoding will be chosen according to the
port number for all virtual servers.
- The encoding may be directly specified in the description of the virtual server:
Listen 80
Listen 8100
Listen 8101
<VirtualHost some.domain:8100>
CharsetDefault koi8-r
# again an empty directive, see above
CharsetSelectionOrder
</VirtualHost>
In both cases, one can redefine the behavior for each hostname:port pair. Moreover,
all the directives may also appear in .htaccess files. That is,
one has virtually unlimited possibilities for building a server with a complex structure.
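When many ports are involved, the Listen and CharsetByPort lines of the first variant can be generated from a single table, which keeps the two lists in step. This is a minimal Python sketch; the function name port_directives and its main_port parameter are hypothetical, and the output format simply follows the example lines shown above.

```python
from typing import Dict, List


def port_directives(charset_by_port: Dict[int, str], main_port: int = 80) -> List[str]:
    """Build Listen and CharsetByPort lines for a port-to-charset mapping."""
    ports = sorted(charset_by_port)
    # The main port must be listed explicitly once any Listen directive is used.
    lines = [f"Listen {main_port}"]
    lines += [f"Listen {port}" for port in ports]
    lines += [f"CharsetByPort {charset_by_port[port]} {port}" for port in ports]
    return lines


for line in port_directives({8100: "koi8-r", 8101: "windows-1251"}):
    print(line)
```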
Specifics of Recoding According to Port Number
- When a "directory" is requested without the trailing slash in the
URL (for example, http://www.server.ru/directory), mod_dir from
Apache-1.2.x redirects this request to http://www.server.ru/directory/.
This would be fine, but, if the server is not fully configured, the code in
mod_dir does not supply the correct port number. For example,
if the client requests http://www.server.ru:8000/directory, a redirect to
http://www.server.ru/directory/ takes place, which is unacceptable.
This behavior may be "cured" in several ways:
- Apache-1.2.4 RUS PL21.1 and newer contains patches to mod_dir;
as a result, the correct port number is always substituted.
- If virtual hosts are not used, all Port directives should be removed from
Apache's config files, and the server will work correctly.
- If virtual hosts and recoding according to port number are used, each
Host:port pair used in the process should be described by the
<VirtualHost> directive. Like this:
<VirtualHost host1:80>
# you need not write anything here, although at least the DocumentRoot
# directives will have to be duplicated for actual VirtualHosts
</VirtualHost>
<VirtualHost host1:8100>
...
</VirtualHost>
<VirtualHost host2:80>
...
</VirtualHost>
<VirtualHost host2:8100>
...
</VirtualHost>
The first of the above methods is probably the simplest one.
The Apache developers have promised to change all the logic of working with ports in Apache-1.3. We will see.
- Directives of the
Redirect
class (Redirect, RedirectTemp, RedirectPermanent) require
that their last parameter should be the full URL to which documents should be
redirected. Evidently, if recoding according to the port number takes place, the use of
Redirect* is inconvenient or impossible. Instead of these directives, you may use
the CharsetSoftRedirect directive.
Its last argument is the URL in relation to ServerRoot; that is, the server hostname
and port number are substituted at runtime and will be correct.
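Both items above come down to the same requirement: the redirect target has to carry the host and port that the client actually used. The following Python sketch illustrates the idea only; it is not the mod_dir patch or the CharsetSoftRedirect implementation. The function name directory_redirect is hypothetical, and it assumes that the request carried a Host: header, which some HTTP/1.0 clients do not send.

```python
def directory_redirect(host_header: str, path: str, scheme: str = "http") -> str:
    """Build the trailing-slash redirect target from the request's Host header."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    host = host_header.strip()
    # Drop an explicit default port so that the URL keeps its usual short form.
    if scheme == "http" and host.endswith(":80"):
        host = host[: -len(":80")]
    if not path.endswith("/"):
        path += "/"
    return f"{scheme}://{host}{path}"


print(directory_redirect("www.server.ru:8000", "/directory"))
# http://www.server.ru:8000/directory/
```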
Advantages of This Scheme:
- All references within the server may be absolute.
- IP addresses (which are scarce) are not wasted.
Disadvantages:
- When you work through a filtering firewall (packet filter), you have to open more ports than usual.
This may become a problem if the WWW administrator and the firewall administrator are different people.
- By tradition, WWW servers with explicit recoding according to port number
use ports with numbers greater than 1024 (as a rule, 8000-800x, 8080-808x, etc.). In UNIX,
even a nonprivileged user can open such a port, so some security problems are possible: 
a potential attacker can wait until the main server is down and start a fake server on the
same ports. This problem is not dramatic at all, but one should not forget about it (the workaround
is to select ports with numbers smaller than 1024).
Selection of Encoding by Port Number Used Together with the
<VirtualHost> Directive
It turned out that some administrators of WWW servers often encounter the same problem.
When they want to work according to the scheme where
- the client's encoding is specified by the port number and
- <VirtualHost> is used for its original purpose (placement of two virtual servers
on the same computer),
everything is normal if they address the default port, but, if they address any other port,
they see the contents of the main server (though in the desired encoding).
This behavior is not a characteristic property of Russian Apache; it is a feature of
Apache-1.2.x, which is described in the documentation to the original Apache. To
achieve the desired behavior, one should use the <VirtualHost> directive in the form
<VirtualHost www.some.domain:*>
(because, by default, the <VirtualHost> directive concerns only the port specified by the
Port directive). See also the above notes.
Attention! Apache 1.2.0 through 1.2.5 contain an error in the handling of
"Host:-based" virtual hosts (i.e., virtual hosts working at the same IP address).
As a result, the constructions <VirtualHost host.domain> and <VirtualHost host.domain:*>
are equivalent; that is, <VirtualHost> does not work with ports other than the one
specified by the Port directive. This is clearly an error; its description and the necessary patch
have already been sent to the Apache Team. Until it is fixed in the
original version, you may use Russian Apache PL20.5 (or newer), where this error has been fixed.
Which Way Is Better?
In my opinion, recoding according to the directory prefix
is the most convenient method for servers with a small amount of content, where the documents
are maintained by one or two people. In this case, the requirement of relative references
is easy to satisfy.
Recoding according to the name of the virtual server is convenient when there is
actually one server (as far as contents are concerned).
Recoding according to the port number is appropriate when several servers with
different contents coexist on one computer. In this case, the contents are chosen according to
the name of the virtual server, and the required recoding is selected by the port number.
Other Notes
- Language Negotiation
In the previous version of this document,
I recommended using the language negotiation mechanism, which is
present in Apache. This recommendation is still valid: if you use the default configuration,
the server provides the charset=... string in the Content-Type header only if
the document language (described by the AddLanguage directive) is the same
as the language described for this charset by the
CharsetDecl directive
(to put it more precisely, the configuration file in the distribution package
suggests the opposite default behavior: charset=... is provided for all documents).
To change this behavior, you may use the
CharsetMatchLanguage directive.
At the same time, the use of the language negotiation mechanism entails some problems,
which should be mentioned here.
- By default, Netscape Communicator 4.0x sends the
Accept-Language: en
header. If you address a document with the name /some/file.html, and this document
is absent, but the document /some/file.html.ru is present, the server will inform you
(in pure English) that the document with the requested language is absent but a
document with language=ru is present. This behavior is completely correct from the
standpoint of the standard, but one should not expect all users of
Netscape Communicator to have an urgent desire to reconfigure it to
Accept-Language: ru, although such a possibility is available in this program.
- The ScriptAlias directive in Apache-1.2 is incompatible with the
language negotiation mechanism, and Options MultiViews does not work with it.
There is a simple replacement for this directive. Instead of
ScriptAlias /cgi-bin/ /where/your/cgi-bin
<Directory /where/your/cgi-bin>
AllowOverride None
Options None MultiViews
</Directory>
(as in the original configuration of Apache-1.2), you should configure the server as follows:
Alias /cgi-bin/ /where/your/cgi-bin
<Directory /where/your/cgi-bin>
AllowOverride None
Options ExecCGI MultiViews
SetHandler cgi-script
</Directory>
- Documents that have only a language-specific extension (for example, README.ru)
are not the subject of language negotiation; that is, a request like GET /README
with the language 'ru' will not be satisfied. There is a similar problem with documents
for which Content-Type is not described (that is, their extensions are not mentioned in the
mime.types file or in the AddType directives).
- Content-Type and CGI Scripts
The server itself inserts the charset=CHARSET_NAME string when the document
is finally given to the client. Therefore, you need not worry about this.
- Russian Filenames
In general, the use of Russian filenames is a potential source of various strange phenomena.
There are several reasons (I will not explain them now) why handling of these filenames
causes difficulties for Russian Apache.
After some discussion, the community
decided that the simplest solution would be to skip recoding of filenames during
request processing. In earlier versions, the default behavior was different
(filenames were recoded, but too late); so, we had to introduce a special directive,
CharsetRecodeFilenames.
Thus, to create files with Russian names, you should act as follows:
- Cancel recoding of filenames by the directive
CharsetRecodeFilenames Off.
- In the filenames, encode all letters that do not belong to US-ASCII as %aa
(that is, you will replace файл.html with %c6%c1%ca%cc.html).
- If you are using a CGI with Russian names, the correct name of the script
will be specified by the $SCRIPT_NAME variable, and $REQUEST_URI will be totally recoded.
That is, to make a "self-reference", one should write
${SCRIPT_NAME}?${QUERY_STRING} rather than ${REQUEST_URI}.
And, in general, Russian filenames cause more problems than they resolve.
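The renaming and self-reference steps above can be sketched in Python. This is an illustration only: the function names encode_filename and self_reference are hypothetical, and the koi8-r default is an assumption inferred from the byte values in the %c6%c1%ca%cc example; substitute the charset your files actually use.

```python
def encode_filename(name: str, charset: str = "koi8-r") -> str:
    """Write every non-US-ASCII byte of the name as %aa (lowercase hex)."""
    raw = name.encode(charset)
    return "".join(chr(b) if b < 0x80 else "%{:02x}".format(b) for b in raw)


def self_reference(environ: dict) -> str:
    """Build a CGI self-reference from SCRIPT_NAME and QUERY_STRING.

    REQUEST_URI is deliberately not used, because it arrives recoded.
    """
    script = environ["SCRIPT_NAME"]
    query = environ.get("QUERY_STRING", "")
    return f"{script}?{query}"


print(encode_filename("файл.html"))  # %c6%c1%ca%cc.html
```

In a CGI script, self_reference(os.environ) would be called after importing os; the helper takes a plain dict so that it can be tried outside a server.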
If you have any notes, changes, or additions concerning this document, send them
directly to its author.