Some Recommendations

For versions PL18 and newer.
This document contains examples of the configurations most widely used in practice. Strictly speaking, it is not necessary, because all the details are covered in the descriptions of the configuration directives and of the server's operating principles. However, ready-made examples noticeably reduce the number of e-mails asking for help.

This document has been updated to account for the new configuration directives and the new server behavior introduced in versions PL18-PL24. Users of older versions (PL12-PL16) may use the previous version of this document, although it is somewhat harder to understand.

All the configuration examples given below have the same general idea:

  1. There is the "main" server (www.company.ru), which automatically recognizes the client's encoding from the Accept-Charset: and User-Agent: headers. As a rule, it recognizes the encoding correctly, but the resulting documents are not cacheable (because the same URL may correspond to different contents in different situations), and working over slow links becomes harder.
  2. Each document of the main server corresponds to several "virtual URLs" (one per supported encoding). The contents of these URLs are identical, and the encoding does not depend on the User-Agent: header (but does depend on Accept-Charset:).
    Such virtual URLs may be created in several ways: virtual directories, virtual servers with different hostnames, and virtual servers on different port numbers. Each way has its own advantages and drawbacks, and the choice is entirely up to the administrator. In all three cases, a document received via a URL with an "explicitly specified encoding" may be cached by both HTTP/1.0 and HTTP/1.1 caches (that is, the response carries no Expires: header, and its Vary: header lists only Accept-Charset).

An example configuration of the main server is included in the Russian Apache distribution. The rest of this document describes only the changes and additions to that configuration needed in each case.

Selection of Encoding by Virtual Directory

When the encoding is selected in this way, there are several URLs corresponding to each document:

  1. http://www.domain/file.html - automatic selection of encoding;
  2. http://www.domain/charset-name/file.html (for example, http://www.domain/koi8-r/file.html) - the encoding is specified by the directory prefix.
For the server to work in this mode, one should either create pseudo-directories as symlinks:
ln -s /www/root/htdocs /www/root/htdocs/koi8-r
# etc.
  
or use the Alias directive:
Alias /koi8-r /your/www/root/htdocs
# etc
  

The principal advantage of this method is its simplicity; the principal drawback is that all references between documents must use relative paths. That is, instead of <A HREF="/some">, we have to write something like <A HREF="../../../some">. The reason is as follows: if we follow a link to /some from (for example) the document /koi8-r/dir/file.html, the server switches to automatic recoding. To preserve the explicitly chosen encoding, the link must lead to /koi8-r/some, that is, to ../some. How closely the directory name must match the name of the encoding is controlled by the CharsetStrictURIMatch directive.
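The effect of the prefix on links can be checked with Python's standard urllib.parse.urljoin, which resolves a link against the URL of the current document the way a browser does. In this sketch, www.domain, /dir/file.html and /some are placeholders, and relative_href is a hypothetical helper: it computes the relative form of an absolute link once, from the unprefixed document tree, and the result then works under every charset prefix because all prefixes map onto the same tree.

```python
import posixpath
from urllib.parse import urljoin

def relative_href(doc_path: str, target_path: str) -> str:
    """Rewrite an absolute link (e.g. "/some") relative to doc_path.

    Both paths are given in the unprefixed document tree. Because every
    charset prefix maps onto the same tree, the result is valid under all of them.
    """
    return posixpath.relpath(target_path, start=posixpath.dirname(doc_path))

# Placeholder prefixes; "" is the automatic-selection tree.
PREFIXES = ["", "/koi8-r", "/windows-1251"]

href = relative_href("/dir/file.html", "/some")  # "../some"
for prefix in PREFIXES:
    base = f"http://www.domain{prefix}/dir/file.html"
    # The absolute link always loses the prefix; the relative one keeps it.
    print(base, urljoin(base, "/some"), urljoin(base, href))
```

For a document at /koi8-r/dir/file.html, the absolute link /some resolves to http://www.domain/some (automatic recoding), while ../some resolves to http://www.domain/koi8-r/some.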

Selection of Encoding by Virtual Server Name

If the encoding is explicitly specified in this way, the corresponding URLs will be as follows:

  1. http://www.domain/file.html - automatic selection of encoding
  2. http://another-host.domain/file.html (for example, http://win-www.domain/file.html) - the encoding is specified by the hostname prefix and/or by <VirtualHost> configuration.
There are at least two ways of configuring the server for working in this mode:
  1. If the first characters (or the host part - see the CharsetStrictURIMatch directive) of the hostname match the name or alias of some charset, this charset will be selected (if the default settings of CharsetSelectionOrder are used). An example of such a configuration:
    <VirtualHost win-www.domain.ru>
    # generally speaking, no additional directives are required
    </VirtualHost>
        
  2. In principle, the name of the virtual host may be arbitrary. In this case, the following configuration may be used:
    <VirtualHost any.domain.ru>
    CharsetSelectionOrder
    # windows-1251 is only an example
    CharsetDefault windows-1251
    </VirtualHost>
        
    In this example, the CharsetSelectionOrder directive is empty; that is, none of the automatic ways of determining the encoding (Portnumber, Hostname, Dirprefix, Useragent) is used, and the document is sent in the default encoding. However, if the client included an Accept-Charset header in the request and the server knows the requested charset, the client's request is honored. Any other behavior would be incompatible with the HTTP standard.
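The selection logic described in this section can be summarized as a small Python model. It is a sketch under stated assumptions, not Russian Apache's code: the charset, alias and port tables are hypothetical stand-ins for what CharsetDecl, CharsetByPort and related directives would configure; hostname matching uses the non-strict prefix form; the Useragent method is left out; and a known charset named in Accept-Charset is assumed to take precedence over all other methods, as the paragraph above describes for the empty-order case.

```python
from typing import Optional, Sequence

# Hypothetical tables. In Russian Apache this information comes from
# CharsetDecl, CharsetByPort and related directives, not from Python dicts.
CHARSETS = {"koi8-r", "windows-1251", "iso-8859-5"}
ALIASES = {"koi8": "koi8-r", "win": "windows-1251"}
BY_PORT = {8100: "koi8-r", 8101: "windows-1251"}

def lookup(name: str) -> Optional[str]:
    """Map a charset name or alias to its canonical name, or None."""
    name = name.strip().lower()
    return name if name in CHARSETS else ALIASES.get(name)

def from_hostname(host: str) -> Optional[str]:
    """Non-strict match: the first host label starts with a charset name or alias."""
    label = host.split(".")[0].lower()
    # Try longer names first so that "koi8-r" wins over "koi8".
    for name in sorted(list(CHARSETS) + list(ALIASES), key=len, reverse=True):
        if label.startswith(name):
            return lookup(name)
    return None

def select_charset(host: str, port: int, path: str, accept_charset: str,
                   order: Sequence[str], default: str) -> str:
    # Assumption: a known charset listed in Accept-Charset is honoured first.
    for item in accept_charset.split(","):
        cs = lookup(item.split(";")[0])
        if cs:
            return cs
    # Then the explicit methods, in CharsetSelectionOrder order.
    for method in order:
        if method == "Portnumber" and port in BY_PORT:
            return BY_PORT[port]
        if method == "Hostname" and (cs := from_hostname(host)):
            return cs
        if method == "Dirprefix" and (cs := lookup(path.lstrip("/").split("/")[0])):
            return cs
        # "Useragent" heuristics are omitted from this sketch.
    return default

DEFAULT_ORDER = ["Portnumber", "Hostname", "Dirprefix", "Useragent"]
```

With an empty order, as in the second configuration, select_charset("any.domain.ru", 80, "/file.html", "", [], "windows-1251") falls through to the default; with DEFAULT_ORDER, a request to win-www.domain.ru picks windows-1251 from the hostname.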

Virtual Servers and DNS

If you expect HTTP/1.0 clients to visit your server, each virtual server should have its own IP address. With HTTP/1.1, separate IP addresses are not necessary, because every request contains a Host: header that tells the server which virtual server the client is addressing. Some HTTP/1.0 clients also send the Host: header, but HTTP/1.0 does not require it.

Therefore, to use virtual servers, the DNS zone for the domain must contain A records, and the reverse zone (address->hostname) must contain correct PTR records. A correct reverse zone is needed for Apache-1.2.x to process HTTP/1.0 requests (without Host:) correctly; versions 1.1.x were less sensitive to this requirement. Also, Apache-1.2.x (with or without patches) always queries the DNS at startup (or on reconfiguration), so the DNS records must already be correct when the server starts (I wouldn't dwell on this if these problems weren't so widespread).
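Both DNS requirements can be checked before the server is (re)started. The following is a minimal Python sketch using the standard socket module; check_vhost_dns is a hypothetical helper, and the host names in the usage note are placeholders. The resolver functions are parameters so the check can be run against fake data; by default it uses the system resolver, which may also consult /etc/hosts, so a clean result does not prove the public DNS is correct.

```python
import socket
from typing import Callable, Dict, List, Tuple

def check_vhost_dns(
    hostnames: List[str],
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
    require_distinct: bool = True,
) -> List[str]:
    """Return human-readable problems with the A/PTR records of virtual hosts."""
    problems: List[str] = []
    seen: Dict[str, str] = {}  # address -> first hostname that resolved to it
    for name in hostnames:
        try:
            addr = forward(name)  # A record: hostname -> address
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{name}: no A record ({exc})")
            continue
        # Without Host:, an HTTP/1.0 request identifies the server only by address.
        if require_distinct and addr in seen:
            problems.append(f"{name}: shares {addr} with {seen[addr]}")
        seen.setdefault(addr, name)
        try:
            ptr_name, aliases, _ = reverse(addr)  # PTR record: address -> hostname
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append(f"{name}: {addr} has no PTR record ({exc})")
            continue
        names = {n.lower().rstrip(".") for n in [ptr_name, *aliases]}
        if name.lower() not in names:
            problems.append(f"{name}: PTR for {addr} points to {ptr_name}")
    return problems
```

For example, check_vhost_dns(["www.company.ru", "win-www.company.ru"]) should return an empty list when each name has its own address and a PTR record pointing back to it. Pass require_distinct=False if all clients are known to send Host:.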

For more on how Apache handles virtual servers, see the WWW server of the Apache project.

Advantages and Disadvantages of the Scheme with <VirtualHost>

The advantages are evident: all references (<A HREF=) in a document may be absolute. The disadvantages are equally evident: as long as HTTP/1.0 clients that do not send the Host: header are still around, each virtual server needs its own IP address and correctly configured DNS zones (that is, you need either access to the DNS or access to an administrator who understands the problem). This is not difficult for an "official webmaster" but may be a serious obstacle for administrators of personal servers.

Selection of Encoding According to Port Number

When the encoding is explicitly specified in this way, the corresponding URLs will be as follows:

  1. http://www.domain/file.html - automatic selection of encoding
  2. http://www.domain:port/file.html (for example, http://www.domain:8001/file.html) - the encoding is specified by the port number in the URL (the correspondence is established by the CharsetByPort directive and/or by the <VirtualHost> configuration).
For the server to listen (listen(2)) on more than one TCP port, use one Listen directive per port number, for example:
Listen 80
Listen 8100
Listen 8101
# and so on.
  
The Listen 80 directive is necessary because, once any Listen directive is present, the Port directive no longer determines which port the server listens on (more precisely, when Listen is present, Port only affects the value of the CGI variable SERVER_PORT).

There are at least two ways to associate an encoding with a port number:

  1. The use of the CharsetByPort directive:
    Listen 80
    Listen 8100
    Listen 8101
    CharsetByPort koi8-r 8100
    CharsetByPort windows-1251 8101
    # default value
    CharsetSelectionOrder Portnumber Hostname Dirprefix Useragent
        
    When the server is configured like this, the encoding is chosen according to the port number for all virtual servers.
  2. The encoding may be directly specified in the description of the virtual server:
    Listen 80
    Listen 8100
    Listen 8101
    <VirtualHost some.domain:8100>
    CharsetDefault koi8-r
    # again an empty directive, see above
    CharsetSelectionOrder
    </VirtualHost>
        
    In both cases, the behavior can be redefined for each hostname:port pair. Moreover, all these directives may also appear in .htaccess files, so there are virtually unlimited possibilities for building a server with a complex structure.

Specifics of Recoding by Port Number

  1. When a client requests a "directory" without the trailing slash (for example, http://www.server.ru/directory), mod_dir from Apache-1.2.x redirects the request to http://www.server.ru/directory/. That would be fine, but if the server is not fully configured, mod_dir does not put the correct port number into the redirect. For example, if the client requests http://www.server.ru:8000/directory, it is redirected to http://www.server.ru/directory/, which is unacceptable. This behavior may be "cured" by several methods: Probably the first of the above methods is the simplest one.
    The Apache developers have promised to rework the handling of ports in Apache-1.3. We will see.
  2. Directives of the Redirect family (Redirect, RedirectTemp, RedirectPermanent) require their last argument to be the full URL to which documents are redirected. Obviously, with recoding by port number, Redirect* is inconvenient or impossible to use. Instead, you may use the CharsetSoftRedirect directive. Its last argument is a URL relative to the server root; the server hostname and port number are substituted at runtime and will therefore be correct.
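Both problems come down to building an absolute URL from the host name and port the client actually used. The following minimal Python sketch takes them from the request's Host: header; HTTP/1.0 clients may omit that header, in which case a real server would have to fall back to its own name and the local port. It only illustrates the idea: absolute_location and trailing_slash_redirect are hypothetical helpers, not code from mod_dir or Russian Apache.

```python
from urllib.parse import urlunsplit

def absolute_location(host_header: str, path: str,
                      scheme: str = "http", default_port: int = 80) -> str:
    """Build an absolute URL for `path` on the host and port the client used.

    `host_header` is the value of the request's Host: header, for example
    "www.server.ru:8000". The port is kept unless it is the default one.
    Assumes a host name or IPv4 address (IPv6 literals are not handled).
    """
    host, sep, port = host_header.partition(":")
    if sep and port and port != str(default_port):
        netloc = f"{host}:{port}"
    else:
        netloc = host
    return urlunsplit((scheme, netloc, path, "", ""))

def trailing_slash_redirect(host_header: str, path: str) -> str:
    """Location for a directory URL requested without its trailing slash."""
    return absolute_location(host_header, path.rstrip("/") + "/")
```

trailing_slash_redirect("www.server.ru:8000", "/directory") gives http://www.server.ru:8000/directory/, which is what item 1 expects. A soft redirect works the same way: absolute_location("www.server.ru:8101", "/new/place") (a placeholder path) yields http://www.server.ru:8101/new/place, so the client stays on the port, and hence in the encoding, it started with.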

Advantages of This Scheme:

  1. All references within the server may be absolute.
  2. IP addresses (which are scarce) are not wasted.
Disadvantages:
  1. When you work behind a filtering firewall (packet filter), you have to open more ports than usual. This may become a problem if the WWW administrator and the firewall administrator are different people.
  2. Traditionally, WWW servers with explicit recoding by port number use port numbers of 1024 and above (typically 8000-800x, 8080-808x, etc.). On UNIX, even an unprivileged user can open such a port, which creates some security risk: an attacker can wait until the main server is down and start a fake server on the same ports. This problem is not dramatic at all, but one should keep it in mind (the workaround is to use port numbers below 1024).

Selection of Encoding by Port Number Used Together with the <VirtualHost> Directive

Many WWW server administrators run into the same problem when they combine recoding by port number with <VirtualHost>: everything is normal on the default port, but on any other port they see the contents of the main server (albeit in the desired encoding).

This behavior is not specific to Russian Apache; it is a feature of Apache-1.2.x that is described in the documentation of the original Apache. To get the desired behavior, use the <VirtualHost> directive in the form

<VirtualHost www.some.domain:*>
  
(because, by default, the <VirtualHost> directive concerns only the port specified by the Port directive). See also the above notes.

Attention! Apache-1.2.0 through 1.2.5 contain a bug in their handling of Host:-based virtual hosts (i.e., virtual hosts sharing one IP address). As a result, the constructions <VirtualHost host.domain> and <VirtualHost host.domain:*> are equivalent; that is, <VirtualHost> does not work with ports other than the one specified by the Port directive. This is clearly a bug; its description and a patch have already been sent to the Apache Team. Until it is fixed in the original version, you may use Russian Apache PL20.5 (or newer), where this bug is fixed.

Which Way Is Better?

In my opinion, recoding by directory prefix is the most convenient method for servers with little content, maintained by one or two people. In this case, the requirement of relative references is easy to satisfy.
Recoding by virtual server name is convenient when there is really only one server in terms of content.
Recoding by port number is appropriate when several servers with different contents coexist on one computer. In this case, the contents are chosen by the name of the virtual server, and the required recoding is selected by the port number.

Other Notes
Language Negotiation
In the previous version of this document, I recommended using Apache's language negotiation mechanism. This recommendation is still valid: with the default configuration, the server adds the charset=... parameter to the Content-Type header only if the document language (set by the AddLanguage directive) matches the language declared for that charset by the CharsetDecl directive (more precisely, the configuration file in the distribution suggests the opposite default behavior: charset=... is added for all documents). To change this behavior, use the CharsetMatchLanguage directive. At the same time, language negotiation brings some problems that should be mentioned here.
  1. By default, Netscape Communicator 4.0x sends the
    Accept-Language: en
          
    header. If you request /some/file.html and that document does not exist but /some/file.html.ru does, the server tells you (in plain English) that no document in the requested language is available but one with language=ru is. This behavior is perfectly correct from the standpoint of the standard, but one should not expect all Netscape Communicator users to rush to reconfigure it to send Accept-Language: ru, although the program allows this.
  2. The ScriptAlias directive in Apache-1.2 is incompatible with the language negotiation mechanism: Options MultiViews does not work with it. There is a simple substitute for this directive. Instead of
    ScriptAlias /cgi-bin/ /where/your/cgi-bin/
    <Directory /where/your/cgi-bin>
    AllowOverride None
    Options None MultiViews
    </Directory>
          
    (as in the original configuration of Apache-1.2), you should configure the server as follows:
    Alias /cgi-bin/ /where/your/cgi-bin/
    <Directory /where/your/cgi-bin>
    AllowOverride None
    Options ExecCGI MultiViews
    SetHandler cgi-script
    </Directory>
          
  3. Documents that have only a language-specific extension (for example, README.ru) are not subject to language negotiation; that is, a request such as GET /README with language 'ru' will not be satisfied. A similar problem affects documents whose Content-Type is not defined (that is, whose extensions are not listed in the mime.types file or in AddType directives).
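The first problem can be reproduced with a deliberately simplified model of language negotiation. This Python sketch is not Apache's mod_negotiation algorithm, which also weighs media types, charsets, encodings and LanguagePriority; it compares language tags exactly (so en-us does not match en) and returns None where the server would report that no acceptable variant exists (an HTTP 406 Not Acceptable response). parse_accept_language and choose_variant are hypothetical names.

```python
from typing import Dict, Optional

def parse_accept_language(header: str) -> Dict[str, float]:
    """Parse e.g. "ru, en;q=0.5" into {"ru": 1.0, "en": 0.5}."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        lang, _, params = item.partition(";")
        lang = lang.strip().lower()
        if not lang:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[lang] = q
    return prefs

def choose_variant(variants: Dict[str, str], accept_language: str) -> Optional[str]:
    """Return the variant file with the highest language q-value.

    `variants` maps a language tag to a file name, e.g. {"ru": "file.html.ru"}.
    None means that no variant is acceptable to the client.
    """
    prefs = parse_accept_language(accept_language)
    best, best_q = None, 0.0
    for lang, filename in variants.items():
        q = prefs.get(lang, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = filename, q
    return best
```

With Netscape Communicator's default header, choose_variant({"ru": "file.html.ru"}, "en") returns None, matching item 1; with "en, ru" the same call returns file.html.ru.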

Content-Type and CGI Scripts
The server itself adds the charset=CHARSET_NAME parameter to Content-Type when the document is finally sent to the client, so CGI scripts need not worry about it.

Russian Filenames
In general, Russian filenames are a potential source of various strange phenomena. For several reasons (which I will not go into here), handling such filenames is difficult for Russian Apache. After some discussion, the community decided that the simplest solution was not to recode filenames during request processing. In earlier versions the default behavior was different (filenames were recoded, but too late), so we had to introduce a special directive, CharsetRecodeFilenames. Thus, to create files with Russian names, you should act as follows: And, in general, Russian filenames cause more problems than they solve.
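One source of trouble is easy to demonstrate with Python's standard urllib.parse: the same Russian filename is a different byte sequence in each Cyrillic charset, so the percent-encoded URL a client sends depends on the charset it used, and decoding it with the wrong charset produces a different name. This sketch only illustrates that point; the filename is an arbitrary example, and the code does not model what CharsetRecodeFilenames does.

```python
from urllib.parse import quote, unquote

NAME = "файл.html"  # an arbitrary Russian filename ("file.html")

# The same name becomes a different percent-encoded URL in each charset.
for charset in ("koi8-r", "windows-1251", "ibm866"):
    print(f"{charset:>12}: {quote(NAME, encoding=charset)}")

# A KOI8-R URL decoded as windows-1251 names a different file.
url_path = quote(NAME, encoding="koi8-r")
wrong = unquote(url_path, encoding="windows-1251")
print("decoded as windows-1251 matches original:", wrong == NAME)
```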

If you have any notes, changes, or additions concerning this document, send them directly to its author.
