mod_uid.c version 1.0
a module issuing the "correct" cookies for counting the
site visitors
Contents
- Copyright
- Description
- Installation
- Configuration
- Cookie format
- What can be written to the log
- Why not mod_usertrack
- TODO
Copyright
Copyright (C) 2000-2002 Alex Tutubalin, lexa@lexa.ru
May be distributed and used in derived products under the conditions
analogous to the
Apache License:
the author's copyright and the reference to
http://www.lexa.ru/lexa
must be preserved, and
the derived product should not be called mod_uid.
A prototype of this module was written by the author when he was
working at Rambler Co.; the
present version has been significantly modified.
The author is grateful to Dmitry Khrustalev for valuable advice.
Description
The standard distribution of Apache does not provide adequate means for
user tracking (for problems associated with mod_usertrack, see
below), and this module provides them.
What it actually does:
- if the user has sent a Cookie header containing a cookie with the
configured name, the module stores this cookie in the request notes under
the name uid_got (so that it can then be written to the log);
- if the user has arrived without the required cookie, the module
issues a Set-Cookie header for him/her and stores the issued cookie
in the notes under the name uid_set (this can also be written
to the log);
- if built-in P3P support is enabled, a P3P header is issued
together with the Set-Cookie header.
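The behaviour listed above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the module's C code: the function name handle_request, the new_cookie helper, and the dictionary-based notes and headers are all hypothetical, and encoding details (base64 on the wire, hexadecimal in the log) are ignored here.

```python
def handle_request(request_cookies, cookie_name, new_cookie, active=True):
    """Return (notes, response_headers) for one request.

    request_cookies: dict of cookie name -> value sent by the client.
    new_cookie: callable returning a fresh cookie value (hypothetical helper).
    active: mirrors UIDActive; when False, no cookie is issued.
    """
    notes = {}
    headers = {}
    if cookie_name in request_cookies:
        # The client already carries our cookie: only record it for logging.
        notes["uid_got"] = f"{cookie_name}={request_cookies[cookie_name]}"
    elif active:
        # No cookie yet: issue one and record what was issued.
        value = new_cookie()
        headers["Set-Cookie"] = f"{cookie_name}={value}"
        notes["uid_set"] = f"{cookie_name}={value}"
    return notes, headers
```

At most one of the two notes is filled in for a request, which is why the received and the issued cookie never get mixed up in the log.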
Advantages:
- the cookie contains its issue date and the "service number"
(that is, the number specified in the configuration); thus, it helps one
understand when the user first arrived at our site and where exactly
he/she arrived;
- multiserver work is supported: with careful configuration
(or none at all ;), it is guaranteed that the cookie issued
to the user will be unique;
- the cookie issued to the user and the one received from him/her are
kept separate in the log file;
- the cookies are 128 bit long, and one may work with them in the
log analyzer (quick search etc.) using ready source code intended for
working with IPv6 (for example, libpatricia);
- support of P3P (minimal) is provided.
Installation
When configuring Apache, add --add-module=/path/to/mod_uid.c
to the ./configure parameters:
tar xzvf apache_1.3xxx
tar xzvf mod_uid-1.0.xx.tar.gz
cd apache_1.3xx
./configure --prefix=/usr/local/apache \
... --add-module=../mod_uid_1.0.xx/mod_uid.c other-params
make
make install
Configuration Directives
All the configuration directives may be specified in any context:
server config, VirtualHost, Location, and so on. To specify them in .htaccess,
one should set AllowOverride FileInfo (or All).
- UIDActive On/Off
- Turns cookie issuing on or off.
When set to Off, the
cookies received from the client are still decoded and may be
written to the log.
Default: On
- UIDCookieName string
- Cookie name (default - uid).
The name of the cookie issued to the client. Should not
match any other name(s) used at the site.
- UIDService number
- The "service number" is a strictly positive (nonzero)
unique number identifying the given server in the cluster or
the given document or document set.
This number is used for two purposes:
- If several servers are used within one domain
(with the same cookie parameter domain=)
or with one hostname, then the use of different
UIDService numbers guarantees that the cookies issued
by different servers will be unique.
- The use of different UIDService numbers for
different parts of the server makes it possible to reveal
(by log analysis) which of the parts was first visited by
the client.
Default: server IP address.
- UIDDomain .domain.name
- Name of the domain for which the cookie is issued.
In multiserver configurations, this directive makes it
possible to have a common cookie namespace for all the servers
(for example, mail.rambler.ru, www.rambler.ru, and info.rambler.ru
use the .rambler.ru domain).
If the domain= parameter must be omitted for a certain document
set but kept for the server as a whole, one should use
UIDDomain none
in the corresponding config
section (Location/Directory/...).
Default: no domain; that is, the user's browser will
return the cookie only to the originating server.
- UIDPath string
- The path for which the cookie is issued (parameter path= in
Set-Cookie:)
Default: /
- UIDExpires number
- Sets the expiration date for the cookie. Two forms are accepted:
UIDExpires number
- number of seconds to be added to the current time;
UIDExpires plus 3 year 4 month 2 day 1 hour 15 minutes
- the same expressed in normal human language.
Default: current date plus 10 years.
- UIDP3P On/Off/Always
- Controls if the P3P header is issued together with the
cookie.
Variants:
- Off - P3P header is not issued;
- On - issued only if the domain parameter
is issued for the cookie;
- Always - always issued (i.e. even without domain).
Default: Off.
This directive is needed to satisfy MS IE6+ in a
multiserver configuration and, for example, when the "counter"
code from another server is included in a page. If
the cookie is issued without domain=, or if domain
includes the name of the server that served the main document, MS IE6+
with default settings accepts the cookie anyway; however,
the cookies may be suppressed for compound documents assembled from
different servers.
mod_uid issues only the P3P header (by default,
only with compact policy); support of /w3c/p3p.xml and the like
is up to the owner of the server.
The P3P header is issued only if mod_uid issues the Set-Cookie
header; that is, if you have to issue other cookies as well and
also need P3P for them, the problem of P3P issuing should be solved
separately and independently.
- UIDP3PString string
- Text of the P3P header sent to the client.
Default: CP="NOI PSA OUR BUS UNI"
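To show how the directives above combine, here is a minimal Python sketch that assembles the response headers. It is not taken from the module source: the function build_headers, the order of the cookie attributes, the Netscape-style date format, and the reading of "10 years" as 10 * 365 days are all assumptions made for illustration.

```python
import time

TEN_YEARS = 10 * 365 * 24 * 3600  # assumed reading of "plus 10 years"

def build_headers(name, value, now, domain=None, path="/",
                  expires=TEN_YEARS, p3p="off",
                  p3p_string='CP="NOI PSA OUR BUS UNI"'):
    """Assemble Set-Cookie (and possibly P3P) from UID* style settings."""
    # "none" plays the role of UIDDomain none: no domain= parameter at all.
    has_domain = domain is not None and domain != "none"
    parts = [f"{name}={value}"]
    if has_domain:
        parts.append(f"domain={domain}")
    parts.append(f"path={path}")
    expiry = time.gmtime(now + expires)
    parts.append("expires=" + time.strftime("%a, %d-%b-%Y %H:%M:%S GMT", expiry))
    headers = {"Set-Cookie": "; ".join(parts)}
    # UIDP3P: Off = never, On = only together with domain=, Always = always.
    if p3p == "always" or (p3p == "on" and has_domain):
        headers["P3P"] = p3p_string
    return headers
```

For example, build_headers("uid", "X", time.time(), domain=".rambler.ru", p3p="on") yields both headers, while the same call without domain yields only Set-Cookie.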
Cookie Format
The cookie format in the binary form is
unsigned int cookie[4], where
cookie[0] is the "service number" (specified via UIDService);
cookie[1] is the issue time (unix time);
cookie[2] is the pid of the process that issued the cookie;
cookie[3] contains a unique sequencer within the limits of the process
(upper 24 bits, starting value 0x030303) and
the cookie version number (lower 8 bits, now equal to 2).
These 128 bits are converted to the network byte order,
base64-encoded, and sent to the client. (In ver. 1, everything was sent
in the host order, which complicated support of server clusters with
different architectures.)
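The layout described above can be reproduced with Python's struct and base64 modules. This is a sketch for log analysis tools, not the module's own code; the function names are hypothetical, and the use of the standard base64 alphabet with padding is an assumption.

```python
import base64
import struct

COOKIE_VERSION = 2  # current version number according to the text above

def pack_cookie(service, issued, pid, sequencer, version=COOKIE_VERSION):
    """Encode the four 32-bit words in network byte order, then base64."""
    last = ((sequencer & 0xFFFFFF) << 8) | (version & 0xFF)
    raw = struct.pack("!4I", service, issued, pid, last)
    return base64.b64encode(raw).decode("ascii")

def unpack_cookie(text):
    """Decode a base64 cookie value back into its fields."""
    service, issued, pid, last = struct.unpack("!4I", base64.b64decode(text))
    return {
        "service": service,
        "issued": issued,
        "pid": pid,
        "sequencer": last >> 8,   # upper 24 bits of cookie[3]
        "version": last & 0xFF,   # lower 8 bits of cookie[3]
    }
```

The "!" prefix in the format string selects the network (big-endian) byte order, so the result does not depend on the architecture of the machine running the analyzer.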
Uniqueness
Evidently, only insurance can fully guarantee anything. ;) And if
more than 2^128 cookies are issued within a single domain, some of them
will be duplicates. However, the cookie format was designed in such
a way that the cookies are unique as long as their number is reasonable.
- If the "service number" is unique (each server has its own)
within the given domain, different servers will surely issue
different cookies.
- Inclusion of the issue time and pid in the cookie relies on the
assumption that pids of different processes are not reused within
one second. This is true for all UNIX systems I know: pids
monotonically increase up to a certain maximum (2^16 or higher). That
is, the cookie[1]/cookie[2] pair may be duplicated within one server only
if more than 2^16 fork() calls are made per second, which is hardly
possible in the present state of matters.
- The sequencer (the upper 24 bits in cookie[3]) ensures
the uniqueness of the cookie within one process during one
second. The capacity of the sequencer makes it possible to issue
up to 2^24 (about 1.6E+07) cookies per second by one process.
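The uniqueness argument can be illustrated with a small per-process issuer in Python. The class below is a hypothetical model, not the module's code: in particular, incrementing the sequencer by one and wrapping it at 24 bits is an assumption about how the real sequencer advances.

```python
import os
import time

SEQUENCER_CAPACITY = 2 ** 24  # distinct values of the upper 24 bits of cookie[3]

class CookieIssuer:
    """Model of one server process issuing (service, time, pid, seq|version)."""

    def __init__(self, service, pid=None, clock=time.time):
        self.service = service
        self.pid = os.getpid() if pid is None else pid
        self.clock = clock
        self.sequencer = 0x030303  # starting value given in the format description

    def issue(self):
        fields = (self.service, int(self.clock()), self.pid,
                  (self.sequencer << 8) | 2)  # version 2 in the lower 8 bits
        # Assumed behaviour: advance by one, wrapping within 24 bits.
        self.sequencer = (self.sequencer + 1) % SEQUENCER_CAPACITY
        return fields
```

Even with the clock frozen, so that every cookie falls into the same second, one process produces distinct values until the sequencer wraps, and two issuers with different service numbers can never collide.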
What can be written to the log
mod_uid writes one of the following two values to "notes":
- if a cookie was received from the client, it is placed in note
uid_got;
- if a cookie was sent to the client, it is placed
in note uid_set.
Cookies are logged as four 32-bit hexadecimal numbers in the host order
(in ver. 2, a network-host conversion is performed; in ver. 1,
everything is saved "as is" under the assumption that the server
architecture did not change since the cookie had been issued).
In LogFormat, these notes may be used in the form of \"%{uid_got}n\"
and \"%{uid_set}n\", respectively.
With a LogFormat such as
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{uid_got}n\" \"%{uid_set}n\"" combined_cookie
log entries will look approximately like this:
Cookie sent to the client:
62.104.212.93 - - [05/Jan/2002:00:02:06 +0300] "GET / HTTP/1.0" 200
13487 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x
4.90)" "-" "ruid=000000013C36184E00009A2100002901"
Cookie received from the client:
216.136.145.172 - - [05/Jan/2002:00:14:59 +0300] "GET /buttons/but-support-e.gif
HTTP/1.0" 200 252 "http://apache.lexa.ru/english/meta-http-eng.html"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
"ruid=000000013C361B5000009A0100009501" "-"
Such a format is easily understood by widespread log analyzers, including
Webtrends, which nicely counts visitors according to such a log.
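For custom log analysis, the logged value can be split back into the four fields of the cookie format. The following Python sketch does this for the name=hex form seen in the sample entries above; the function name parse_logged_cookie and the returned dictionary are hypothetical.

```python
from datetime import datetime, timezone

def parse_logged_cookie(field):
    """Split a logged 'name=32 hex digits' value into the cookie fields."""
    name, _, hexval = field.partition("=")
    if len(hexval) != 32:
        raise ValueError("expected 32 hexadecimal digits")
    # Four 32-bit words, eight hex digits each.
    words = [int(hexval[i:i + 8], 16) for i in range(0, 32, 8)]
    return {
        "name": name,
        "service": words[0],
        "issued": datetime.fromtimestamp(words[1], tz=timezone.utc),
        "pid": words[2],
        "sequencer": words[3] >> 8,
        "version": words[3] & 0xFF,
    }
```

Applied to the first sample entry, ruid=000000013C36184E00009A2100002901, this gives service number 1 and an issue time of 2002-01-04 21:02:06 UTC, which is 00:02:06 +0300 on 5 January, the same moment as the timestamp of that log line.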
Why not mod_usertrack from the Apache distribution?
Because it has several drawbacks:
- it does not strictly guarantee that the same cookie will not
be issued to two users, although, of course, the probability of
such an event is minimized due to consideration of getpid(),
remote_ip, and time up to milliseconds;
- it does not support multiserver work, and the probability of
issuing identical cookies increases in this case;
- one might wish to see the cookie sent to the user in the log as well,
separately from the received one, whereas mod_usertrack mixes them together;
- one might wish to see the "service number" (see above) in order
to understand which of our services was visited by the user during
his/her first visit.
TODO
- Support of various formats (Netscape/Cookie/Cookie2, as in
mod_usertrack), but only if it becomes really necessary;
so far I haven't noticed any such necessity.
- There is a vague suspicion that the sequencer increment
should be protected by a mutex on multithreaded Apache and
multiprocessor machines.