How It Works
This document describes the algorithm Apache-RUS uses to determine the encoding in which a document is sent to the client. To some extent it repeats the description of the configuration directives, but here the directives are presented in the order in which the server applies them, and their detailed explanation is omitted.

This document describes the latest versions (PL20 and PL21). The configuration of older versions (PL16 and earlier), including the directive names, is different and is described in a separate document. The points where the behavior of old and new versions differs substantially are noted explicitly.

Hereafter, the terms charset and encoding are used as synonyms.

Preliminary Notes

The main purpose of the recoding module is to convert documents correctly from the "on-disk charset" (the storage encoding) to the "client's charset" (the transfer encoding) when a document is sent to the client, and to perform the reverse conversion when data is received from clients (submitted forms, etc.). Every possible conversion must be described in the server configuration by the directives CharsetDecl (declares the existence of a code table to the server) and CharsetRecodeTable (describes the conversion from one encoding to another). All available encodings and recodings must be described only in the configuration of the server or virtual server. Using the CharsetDecl and CharsetRecodeTable directives in .htaccess/<Directory> is forbidden for an obvious reason: the server would have to reinitialize the recoding tables each time the directory is accessed, which means a lot of superfluous work. All other Charset... directives may be specified wherever desired.
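To make the idea of a recoding table concrete, here is a minimal Python sketch of a two-way conversion between koi8-r and windows-1251. It is only an illustration: the function names are hypothetical, the tables are derived from Python's built-in codecs rather than from the table files that CharsetRecodeTable refers to, and bytes with no counterpart in the target encoding are replaced with "?" by assumption.

```python
def build_recode_table(src: str, dst: str) -> bytes:
    """Build a 256-byte table mapping each byte of `src` to its byte in `dst`."""
    table = bytearray(256)
    for b in range(256):
        # Undefined source bytes become U+FFFD, which then encodes as "?".
        ch = bytes([b]).decode(src, errors="replace")
        out = ch.encode(dst, errors="replace")
        table[b] = out[0] if len(out) == 1 else ord("?")
    return bytes(table)


# One table per direction, built once (as a server would at startup).
KOI_TO_WIN = build_recode_table("koi8_r", "cp1251")   # disk -> client
WIN_TO_KOI = build_recode_table("cp1251", "koi8_r")   # client -> disk (forms)


def recode(data: bytes, table: bytes) -> bytes:
    """Apply a prebuilt recoding table to a byte string."""
    return data.translate(table)
```

Building the tables once and then applying them byte by byte is also why rebuilding them on every directory access would be wasteful.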

The storage encoding (the one used for storing files on disk) should be specified, separately for each directory if necessary, either by the CharsetSourceEnc directive, which applies to all files in a directory, or by the CharsetByExtension directive. The latter has higher priority.
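The precedence rule can be sketched as follows. The function and its arguments are hypothetical: `by_extension` stands for the mappings given by CharsetByExtension, and `source_enc` for the value of CharsetSourceEnc in effect for the directory.

```python
import os


def storage_charset(filename, by_extension, source_enc):
    """Return the on-disk charset of a file.

    A per-extension mapping, if one matches, overrides the
    directory-wide storage encoding.
    """
    ext = os.path.splitext(filename)[1].lower()
    return by_extension.get(ext, source_enc)
```

For example, with `by_extension = {".win": "windows-1251"}` and `source_enc = "koi8-r"`, the file `index.win` is treated as windows-1251, while `index.html` falls back to koi8-r.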

Determination of the Client's Encoding by Accept-Charset:/Accept

If the HTTP headers include the Accept-Charset: SomeCharset or Accept: text/x-cyrillic-SomeCharset header and at least one of the requested charsets is known to the server (that is, described in a CharsetDecl or CharsetAlias directive), the server will send the document in the requested charset. If the server knows several of the requested charsets, the one with the highest priority will be selected. If several charsets in the request share the same maximum priority, the one listed earliest in the CharsetPriority directive will be chosen. If this directive is absent, the choice among these highest-priority charsets is ambiguous.
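The selection rule above can be modeled roughly as follows. This is a sketch, not the module's code: it assumes that "priority" in the request means the q-value of each Accept-Charset entry, the function names are hypothetical, and where the text calls the result ambiguous the sketch simply takes the first candidate in the header.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into [(charset, q)] in header order."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q-value means "not acceptable"
        items.append((name.strip().lower(), q))
    return items


def select_by_accept(header, known, aliases, priority):
    """Pick a charset from Accept-Charset, or None if none is known.

    known    -- set of declared charset names (CharsetDecl)
    aliases  -- dict mapping alias -> declared name (CharsetAlias)
    priority -- list of names in CharsetPriority order (may be empty)
    """
    candidates = []
    for name, q in parse_accept_charset(header):
        canonical = aliases.get(name, name)
        if canonical in known and q > 0:
            candidates.append((canonical, q))
    if not candidates:
        return None
    best_q = max(q for _, q in candidates)
    best = [c for c, q in candidates if q == best_q]
    for name in priority:  # tie-break by CharsetPriority order
        if name in best:
            return name
    return best[0]  # the source leaves this case ambiguous; arbitrary choice here
```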

If the Accept-Charset (Accept) header specifies only charsets that are unknown to the server and does not include the wildcard (*), the server behavior depends on the CharsetErrReject flag. If this flag is set to On, the client will receive an error message; if it is set to Off, the server will try to determine the client's charset using other parameters.
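The three possible outcomes of Accept-Charset processing can be summarized in a small decision function. The names are hypothetical, and one detail is an assumption: the text does not say what happens when only the wildcard matches, so the sketch treats that case like the Off branch and continues with the other parameters.

```python
def accept_outcome(requested, known, err_reject):
    """Classify the result of Accept-Charset processing.

    requested  -- charset names taken from the header (may include "*")
    known      -- set of charsets known to the server
    err_reject -- value of the CharsetErrReject flag
    """
    if any(name in known for name in requested):
        return "use-accept"        # a known charset was requested
    if "*" in requested:
        return "other-parameters"  # assumption: wildcard never triggers an error
    return "error" if err_reject else "other-parameters"
```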

Determination of the client's encoding by the Accept-Charset header cannot be disabled completely, because that would violate the HTTP standard. However, there are particular cases in which Accept-Charset should be ignored. For example, Netscape Communicator 4.x in its default configuration sends the "Accept-Charset: iso-8859-1,*,utf-8" header; accordingly, if you have declared the iso-8859-1 charset, a user with NC 4.x will always receive iso-8859-1. To disable determination of the encoding by Accept-Charset in such specific cases, you may use the CharsetBrokenAccept directive.

Determination of the Client's Encoding by Other Parameters

If the Accept-Charset/Accept headers are absent from the request or the server cannot select an encoding from them, it will try to determine the user's encoding by three parameters: the prefix of the server hostname, the prefix of the directory name, and the client's User-Agent. The order in which these methods are applied is specified by the CharsetSelectionOrder directive (in versions prior to PL16, you could only "reverse" the order of DirPrefix/UserAgent and disable charset determination by the hostname prefix or directory prefix).

The required degree of matching between the server hostname/filename prefix and the name/alias of some encoding may be controlled by the CharsetStrictURIMatch directive. In the Off mode (the default one), the server selects the encoding by the hostname/directory if the beginning of the server/directory name matches the name/alias of some charset. In the On mode, the checking is more rigorous: the charset name should coincide with the full name of the server or its host part (for selection by the hostname) and, accordingly, with the full name of the directory (for selection by the directory name).
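The difference between the two modes can be illustrated with a short sketch. The function names are hypothetical, `names` stands for the declared charset names and aliases, and "host part" is assumed here to mean the first label of the hostname; likewise the "directory name" is assumed to be the first path segment.

```python
def match_hostname(hostname, names, strict):
    """Return the charset name/alias selected by the hostname, or None."""
    host = hostname.lower()
    host_part = host.split(".", 1)[0]  # assumption: first DNS label
    for name in names:
        if strict:
            if host == name or host_part == name:
                return name
        elif host.startswith(name):
            return name
    return None


def match_directory(path, names, strict):
    """Return the charset name/alias selected by the directory, or None."""
    directory = path.strip("/").split("/", 1)[0].lower()
    for name in names:
        if strict:
            if directory == name:
                return name
        elif directory.startswith(name):
            return name
    return None
```

With the alias `win`, a request to `winter.example.org` selects that charset in the Off mode but not in the On mode, which is the kind of accidental match the stricter check is meant to avoid.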

If the Server Failed

If the server failed to determine the client's encoding, the document will be given to the client in the encoding determined by the CharsetDefault directive. If CharsetDefault is not specified, the charset mentioned as the first one in the CharsetPriority directive will be used.
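Putting the stages together, the overall decision chain might look like the sketch below. All names are hypothetical: `by_accept` is the result of Accept-Charset processing (or None), and `detectors` is a list of zero-argument callables arranged in the order given by CharsetSelectionOrder.

```python
def choose_client_charset(by_accept, detectors, charset_default, priority):
    """Return the charset in which the document is sent to the client."""
    if by_accept:
        return by_accept
    for detect in detectors:  # hostname, directory, User-Agent, in configured order
        found = detect()
        if found:
            return found
    if charset_default:       # CharsetDefault
        return charset_default
    # Last resort: the first charset listed in CharsetPriority.
    return priority[0] if priority else None
```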

SSI

Since all directives (except for CharsetDecl and CharsetRecodeTable) may be specified wherever desired, documents may be stored in any mixture of encodings. Server-side includes (SSI) may cause some complications. The rule is simple: a file, even one included via SSI, follows the rules of the directory in which it is physically located.

The HTTP header Content-Type: text/html; charset=...

The server provides the substring "; charset=CharsetName" in the Content-Type: header depending on the CharsetMatchLanguage directive. If it is set to On, charset=... is provided if the following three conditions are simultaneously satisfied:
  1. The client's browser is not a Bad Agent
  2. The MultiViews option (support of multilingual representation) is set to On
  3. The document language described by the AddLanguage directive is the same as the language of the charset, which is described by the CharsetDecl directive.
If the CharsetMatchLanguage option is set to Off, then charset=... is provided for all documents.
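The rule for emitting the charset parameter reduces to a single boolean condition, sketched below. The function and its parameter names are hypothetical and simply mirror the three conditions listed above.

```python
def content_type_header(charset, match_language, bad_agent,
                        multiviews, doc_language, charset_language):
    """Build the Content-Type value for an HTML document."""
    if match_language:
        # CharsetMatchLanguage On: all three conditions must hold.
        add_charset = (not bad_agent
                       and multiviews
                       and doc_language == charset_language)
    else:
        # CharsetMatchLanguage Off: charset=... is always provided.
        add_charset = True
    base = "text/html"
    return f"{base}; charset={charset}" if add_charset else base
```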





"Russian Apache" includes software developed by the Apache Group for use in the Apache HTTP server project (http://www.apache.org/)
See Apache LICENSE.
Copyright (C) 1995-2001 The Apache Group. All rights reserved.
Copyright (C) 1996 Dm. Kryukov; Copyright (C) 1997-2009 Alex Tutubalin. Design (C) 1998 Max Smolev.