mod_uid.c version 1.0

A module that issues "correct" cookies for counting site visitors.

Contents

  1. Copyright
  2. Description
  3. Installation
  4. Configuration
  5. Cookie format
  6. What can be written to the log
  7. Why not mod_usertrack
  8. TODO

Copyright

Copyright (C) 2000-2002 Alex Tutubalin, lexa@lexa.ru

May be distributed and used in derivative products under conditions similar to those of the Apache License: the author's copyright notice and the link to http://www.lexa.ru/lexa must be preserved, and a derivative product should not be called mod_uid.

The author wrote a prototype of this module while working at Rambler Co.; the present version has been significantly reworked.

The author is grateful to Dmitry Khrustalev for valuable advice.

Description

The standard Apache distribution provides no adequate means of user tracking (for the problems with mod_usertrack, see below); this module provides them.

What it actually does:

Advantages:

Installation

When configuring Apache, add --add-module=/path/to/mod_uid.c to the ./configure parameters:
tar xzvf apache_1.3.xx.tar.gz
tar xzvf mod_uid-1.0.xx.tar.gz
cd apache_1.3.xx
./configure --prefix=/usr/local/apache \
... --add-module=../mod_uid_1.0.xx/mod_uid.c other-params
make
make install

Configuration Directives

All configuration directives can be used in any context: Server/VirtualHost/Location/... To use them in .htaccess files, AllowOverride FileInfo (or All) must be enabled.

UIDActive On/Off
Turns cookie issuing on or off.
When set to Off, cookies received from the client are still decoded and can be written to the log.
Default: On

UIDCookieName string
Cookie name (default: uid).
The name of the cookie issued to the client. It must not coincide with the name of any other cookie used on the site.

UIDService number
The "service number" is a unique, strictly positive (nonzero) number that identifies this server within a cluster, or a particular document or set of documents.
This number is used for two purposes:
  1. If several servers share one domain (the same domain= cookie parameter) or one hostname, giving them different UIDService numbers guarantees that the cookies issued by different servers are unique.
  2. Assigning different UIDService numbers to different parts of a server makes it possible to determine, by log analysis, which part the client visited first.
Default: server IP address.
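To recognize a default service number in logged cookies, it helps to see the IP address as a 32-bit number. The C sketch below assumes the address is read as a big-endian integer, so 192.168.1.10 becomes 0xC0A8010A; whether mod_uid derives its default in exactly this way is an assumption, and ipv4_to_service is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper: turn a dotted IPv4 address into a 32-bit number
 * in host order. ASSUMPTION: this is how a default UIDService derived
 * from the server IP would look. Returns 0 on a parse error, which is
 * never a valid service number. */
uint32_t ipv4_to_service(const char *dotted)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return 0;
    /* inet_pton() stores the address in network byte order. */
    return ntohl(addr.s_addr);
}
```

Under this assumption, ipv4_to_service("192.168.1.10") returns 0xC0A8010A, which would show up as C0A8010A in the service-number field of a logged cookie.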

UIDDomain .domain.name
Name of the domain for which the cookie is issued.
In multiserver configurations, this directive gives all the servers a common cookie namespace (for example, mail.rambler.ru, www.rambler.ru, and info.rambler.ru use the .rambler.ru domain).
To omit domain= for a particular set of documents while keeping it for the rest of the server, use UIDDomain none in the corresponding config section (Location/Directory/...).
Default: no domain; that is, the user's browser will return the cookie only to the originating server.
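Whether a cookie comes back to a given host is decided by the browser. The C sketch below shows a simplified form of the tail-matching rule from the original Netscape cookie specification: with domain=.rambler.ru the cookie is returned to every host whose name ends in .rambler.ru, while without domain= it goes back only to the issuing host. The function cookie_sent_to is hypothetical, and real browsers apply further checks (RFC 6265, for instance, also matches the bare domain).

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: would a browser send a cookie issued by `issuer`
 * (with the given domain=, or NULL for none) to `host`?
 * Simplified Netscape-style tail match; case-insensitive. */
bool cookie_sent_to(const char *host, const char *issuer, const char *domain)
{
    size_t hlen, dlen;

    /* No domain=: the cookie returns only to the originating server. */
    if (domain == NULL)
        return strcasecmp(host, issuer) == 0;

    hlen = strlen(host);
    dlen = strlen(domain);
    if (dlen == 0 || domain[0] != '.' || hlen < dlen)
        return false;

    /* Tail match: the last dlen characters of host must equal domain. */
    return strcasecmp(host + hlen - dlen, domain) == 0;
}
```

So a cookie set by www.rambler.ru with UIDDomain .rambler.ru is also sent to mail.rambler.ru, which is what makes a common cookie namespace possible.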

UIDPath string
The path for which the cookie is issued (the path= parameter of Set-Cookie).
Default: /

UIDExpires number
Sets the expiration date for the cookie.
UIDExpires number: the number of seconds to add to the current time.
UIDExpires plus 3 year 4 month 2 day 1 hour 15 minutes: the same, expressed in human-readable form.
Default: current date plus 10 years.
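Both UIDExpires forms reduce to a number of seconds added to the current time. The C sketch below parses them, assuming a year of 365 days and a month of 30 days; the unit table, the accepted plural forms, and the name parse_uid_expires are assumptions rather than mod_uid's actual parser.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Seconds per unit. ASSUMPTION: a year is 365 days and a month is
 * 30 days; mod_uid's own table may differ. */
static const struct {
    const char *name;
    long secs;
} units[] = {
    { "year",   365L * 24 * 60 * 60 },
    { "month",  30L * 24 * 60 * 60 },
    { "week",   7L * 24 * 60 * 60 },
    { "day",    24L * 60 * 60 },
    { "hour",   60L * 60 },
    { "minute", 60L },
    { "second", 1L },
};
#define NUNITS (sizeof units / sizeof units[0])

/* Hypothetical parser: accepts a bare number of seconds, or "plus"
 * followed by "<number> <unit>" pairs. Returns seconds, or -1 on error. */
long parse_uid_expires(const char *s)
{
    const char *p = s;
    char unit[16];
    long count, total = 0;
    int used = -1, pairs = 0;
    size_t i, len;

    /* Form 1: "UIDExpires 315360000". */
    if (sscanf(s, " %ld %n", &count, &used) == 1 && used >= 0 && s[used] == '\0')
        return count;

    /* Form 2: "UIDExpires plus 3 year 4 month ...". */
    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "plus", 4) != 0)
        return -1;
    p += 4;

    while (sscanf(p, " %ld %15s%n", &count, unit, &used) == 2) {
        for (i = 0; i < NUNITS; i++) {
            len = strlen(units[i].name);
            /* Accept both "year" and "years", "minute" and "minutes", ... */
            if (strncmp(unit, units[i].name, len) == 0 &&
                (unit[len] == '\0' || strcmp(unit + len, "s") == 0))
                break;
        }
        if (i == NUNITS)
            return -1;              /* unknown unit */
        total += count * units[i].secs;
        pairs++;
        p += used;
    }

    /* At least one pair is required; only trailing spaces may remain. */
    while (isspace((unsigned char)*p))
        p++;
    return (pairs > 0 && *p == '\0') ? total : -1;
}
```

With these assumptions, the example above evaluates to 105153300 seconds, and "plus 10 years" to 315360000.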

UIDP3P On/Off/Always
Controls whether the P3P header is sent together with the cookie.
Possible values: On, Off, Always. Default: Off.
This directive is needed to satisfy MS IE6+ in multiserver configurations, for example when a page includes "counter" code from another server. If the cookie is issued without domain=, or its domain covers the name of the server that serves the main document, MS IE6+ with default settings accepts the cookie anyway; however, cookies may still be blocked in compound documents assembled from different servers.
mod_uid only sends the P3P header (by default, with a compact policy only); serving /w3c/p3p.xml and similar files is up to the server owner.
The P3P header is sent only when mod_uid itself sends a Set-Cookie header. If you issue other cookies that also need P3P, you have to arrange the P3P header for them separately.

UIDP3PString string
Text of the P3P header sent to the client.
Default: CP="NOI PSA OUR BUS UNI"
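The cookie directives above all end up in a single Set-Cookie header. The C sketch below assembles such a header value from a cookie name, an encoded value, an optional domain, a path, and an absolute expiry time; the Netscape-style date format and the name build_set_cookie are assumptions, not necessarily what mod_uid emits byte for byte.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build the Set-Cookie value described by
 * UIDCookieName, UIDDomain, UIDPath and UIDExpires.
 * ASSUMPTION: expires= uses the Netscape date format
 * "Wdy, DD-Mon-YYYY HH:MM:SS GMT".
 * domain == NULL means no domain= attribute (the UIDDomain default).
 * Returns the snprintf() result, or -1 if the date cannot be formatted. */
int build_set_cookie(char *out, size_t outlen, const char *name,
                     const char *value, const char *domain,
                     const char *path, time_t expires)
{
    char date[64];
    const struct tm *tm = gmtime(&expires);

    /* %a and %b give English day/month names in the default "C" locale. */
    if (tm == NULL ||
        strftime(date, sizeof date, "%a, %d-%b-%Y %H:%M:%S GMT", tm) == 0)
        return -1;

    return snprintf(out, outlen, "%s=%s; expires=%s; path=%s%s%s",
                    name, value, date, path,
                    domain != NULL ? "; domain=" : "",
                    domain != NULL ? domain : "");
}
```

For instance, an expiry of 1010178126 seconds since the epoch yields expires=Fri, 04-Jan-2002 21:02:06 GMT. When UIDP3P is enabled, the same response also carries a header built from UIDP3PString, which with the default setting reads P3P: CP="NOI PSA OUR BUS UNI".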

Cookie Format

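In binary form, the cookie is unsigned int cookie[4], where:
  1. cookie[0] is the service number (UIDService);
  2. cookie[1] is the issue time (seconds since the Unix epoch);
  3. cookie[2] is the pid of the process that issued the cookie;
  4. cookie[3] holds the sequencer in its upper 24 bits.
These 128 bits are converted to network byte order, base64-encoded, and sent to the client. (In ver. 1, everything was sent in host byte order, which complicated support for server clusters with different architectures.)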
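The conversion can be sketched in C as follows: each of the four words is converted with htonl(), and the 16 resulting bytes are base64-encoded into 24 characters. The standard base64 alphabet with '=' padding and the name encode_uid_cookie are assumptions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Sketch: serialize cookie[4] in network byte order and base64-encode
 * the 16 bytes. ASSUMPTION: standard alphabet with '=' padding.
 * out must hold 25 bytes (24 characters plus the terminating NUL). */
void encode_uid_cookie(const uint32_t cookie[4], char out[25])
{
    unsigned char bytes[16];
    size_t i, o = 0;

    for (i = 0; i < 4; i++) {
        uint32_t be = htonl(cookie[i]);   /* host -> network order */
        memcpy(bytes + 4 * i, &be, 4);
    }

    /* 16 bytes = five full 3-byte groups (20 chars) plus one leftover byte. */
    for (i = 0; i + 3 <= 16; i += 3) {
        uint32_t v = (uint32_t)bytes[i] << 16 | (uint32_t)bytes[i + 1] << 8 | bytes[i + 2];
        out[o++] = b64[(v >> 18) & 63];
        out[o++] = b64[(v >> 12) & 63];
        out[o++] = b64[(v >> 6) & 63];
        out[o++] = b64[v & 63];
    }

    /* The last byte yields two characters followed by "==" padding. */
    out[o++] = b64[bytes[15] >> 2];
    out[o++] = b64[(bytes[15] & 3) << 4];
    out[o++] = '=';
    out[o++] = '=';
    out[o] = '\0';
}
```

Encoding the words 00000001, 3C36184E, 00009A21, 00002901 (the first sample cookie in the log section) gives AAAAATw2GE4AAJohAAApAQ== under these assumptions. Because htonl() fixes the byte order, the result is the same on little- and big-endian servers.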

Uniqueness

Only an insurance policy gives a 100% guarantee. ;) If more than 2^128 cookies are issued within a single domain, some of them are bound to be duplicates. However, the cookie format is designed so that cookies are guaranteed to be unique as long as their number stays reasonable.
  1. If the service number is unique within the domain (each server has its own), different servers are guaranteed to issue different cookies.
  2. Including the issue time and the pid in the cookie relies on pids of different processes not repeating within one second. This holds for all UNIX systems I know: pids increase monotonically up to some maximum (typically 2^15 to 2^16 or higher) and then wrap around. Thus cookie[1]/cookie[2] can repeat within one server only if the pid counter wraps around within one second, i.e. if tens of thousands of fork() calls are made per second, which is hardly possible at present.
  3. The sequencer (the upper 24 bits of cookie[3]) guarantees uniqueness within one process during one second. Its 24 bits give 2^24 (about 1.7E+07) distinct values, so one process can issue more than 1.0E+07 cookies per second without repeats.
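The arithmetic behind item 3 can be sketched in C: the sequencer is masked to 24 bits and shifted into the top of cookie[3]. The sketch assumes the low 8 bits hold a version byte; that, and all names below, are assumptions.

```c
#include <stdint.h>

/* The sequencer occupies the upper 24 bits of cookie[3].
 * ASSUMPTION: the low 8 bits carry a format-version byte. */
#define SEQ_BITS 24
#define SEQ_MASK ((UINT32_C(1) << SEQ_BITS) - 1)   /* 0x00FFFFFF */

/* Build cookie[3]. The sequencer wraps modulo 2^24 = 16777216, which is
 * how many cookies one process can issue in one second before a value
 * could repeat. */
uint32_t make_cookie3(uint32_t seq, uint8_t version)
{
    return ((seq & SEQ_MASK) << 8) | version;
}

/* Split cookie[3] back into its two parts. */
uint32_t cookie3_seq(uint32_t c3)     { return c3 >> 8; }
uint8_t  cookie3_version(uint32_t c3) { return (uint8_t)(c3 & 0xFF); }
```

For example, make_cookie3(0x29, 1) gives 0x2901, the last word of the first sample log entry, and a sequencer value of 2^24 wraps back to 0.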

What can be written to the log

mod_uid stores one of the following two values in the request notes:
  1. if a cookie was received from the client, it is stored in the note uid_got;
  2. if a cookie was sent to the client, it is stored in the note uid_set.
Cookies are logged as four 32-bit hexadecimal numbers in host byte order (in ver. 2, a network-to-host conversion is performed; in ver. 1, everything is saved "as is", on the assumption that the server architecture has not changed since the cookie was issued). In LogFormat, these notes are referenced as %{uid_got}n and %{uid_set}n, respectively.
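Producing such a note from a decoded cookie is a single formatting call. In the C sketch below, format_uid_note (a hypothetical name) prints the cookie name, '=', and the four host-order words as 8 uppercase hexadecimal digits each, matching the sample entries that follow.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Sketch: format a decoded cookie (four words already in host order)
 * the way uid_got / uid_set appear in the log: name, '=', then each
 * word as 8 uppercase hex digits. Returns the snprintf() result. */
int format_uid_note(char *out, size_t outlen, const char *name,
                    const uint32_t cookie[4])
{
    return snprintf(out, outlen,
                    "%s=%08" PRIX32 "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                    name, cookie[0], cookie[1], cookie[2], cookie[3]);
}
```

With the cookie name ruid and the words 1, 0x3C36184E, 0x9A21, 0x2901, this produces ruid=000000013C36184E00009A2100002901, the value in the first sample entry.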
With a LogFormat such as
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{uid_got}n\" \"%{uid_set}n\"" combined_cookie
you get log entries like these:
Cookie sent to the client:
62.104.212.93 - - [05/Jan/2002:00:02:06 +0300] "GET / HTTP/1.0" 200 13487 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "ruid=000000013C36184E00009A2100002901"

Cookie received from the client:
216.136.145.172 - - [05/Jan/2002:00:14:59 +0300] "GET /buttons/but-support-e.gif HTTP/1.0" 200 252 "http://apache.lexa.ru/english/meta-http-eng.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "ruid=000000013C361B5000009A0100009501" "-"
This format is easily understood by common log analyzers, including Webtrends, which counts visitors from such logs quite well.
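Going the other way, a logged value can be split back into its four words and the issue time read from cookie[1]. The C sketch below does this; parse_uid_note and uid_issue_time are hypothetical names.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Sketch: split a logged note such as
 * "ruid=000000013C36184E00009A2100002901" into cookie[0..3].
 * Returns 0 on success, -1 if the value is not 32 hex digits
 * (e.g. the "-" logged when the note is empty). */
int parse_uid_note(const char *note, uint32_t cookie[4])
{
    const char *hex = strchr(note, '=');

    hex = (hex != NULL) ? hex + 1 : note;   /* "name=" prefix is optional */
    if (strlen(hex) != 32 || strspn(hex, "0123456789abcdefABCDEF") != 32)
        return -1;
    return sscanf(hex, "%8" SCNx32 "%8" SCNx32 "%8" SCNx32 "%8" SCNx32,
                  &cookie[0], &cookie[1], &cookie[2], &cookie[3]) == 4 ? 0 : -1;
}

/* Format the issue time stored in cookie[1] as a UTC timestamp.
 * Returns the number of characters written (0 on failure). */
size_t uid_issue_time(const uint32_t cookie[4], char *out, size_t outlen)
{
    time_t t = (time_t)cookie[1];
    const struct tm *tm = gmtime(&t);

    return tm != NULL ? strftime(out, outlen, "%Y-%m-%d %H:%M:%S UTC", tm) : 0;
}
```

In the first sample entry, cookie[1] = 3C36184E is 1010178126, i.e. 2002-01-04 21:02:06 UTC or 00:02:06 +0300, the very second the request was logged: that cookie was issued by the logged request itself. In the second entry, 3C361B50 decodes to 00:14:56 +0300, three seconds before the image request that sent the cookie back.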

Why not mod_usertrack from the Apache distribution?

Because it has several drawbacks:

TODO

  1. Support for various cookie formats (Netscape/Cookie/Cookie2, as in mod_usertrack), but only if it becomes really necessary; so far I have not seen any such need.
  2. I vaguely suspect that the sequencer increment should be protected by a mutex in multithreaded Apache and on multiprocessor machines.
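For the second item, one mutex-free option in C11 is an atomic fetch-and-add, sketched below. It assumes a single per-process counter whose low 24 bits become the sequencer; next_sequence is a hypothetical name, and a pthread mutex around a plain increment would work equally well.

```c
#include <stdatomic.h>
#include <stdint.h>

/* ASSUMPTION: one sequencer per process, shared by all its threads.
 * A plain "seq++" from several threads is a data race and can hand
 * two threads the same value; atomic_fetch_add cannot. */
static atomic_uint_fast32_t uid_sequencer = 0;

uint32_t next_sequence(void)
{
    /* atomic_fetch_add returns the previous value; keep 24 bits. */
    return (uint32_t)(atomic_fetch_add(&uid_sequencer, 1) & 0xFFFFFF);
}
```

In a preforking server, each process has its own copy of the counter, and the pid in cookie[2] already separates processes; the atomic only matters when several threads of one process issue cookies.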