GVU Technical Report Number:
GIT-GVU-97-13
Title:
In Search of: Reliable Usage Data on the World Wide Web
Authors:
James E. Pitkow
Abstract:
The WWW is currently the hottest testbed for future interactive
digital systems. While much is understood technically about how
the WWW functions, substantially less is known about how
this technology is used collectively and on an individual basis.
This disparity of knowledge exists largely as a direct
consequence of the decentralized nature of the Web. Because users
of the Web are not uniquely identifiable across the system
and the system employs various levels of caching, measurement
of actual usage is problematic. This paper establishes terminology
to frame the problem of reliably determining usage of WWW resources
while reviewing current practices and their shortcomings. A review
of the various metrics and analyses that can be performed to determine
usage is then presented. This is followed by a discussion of the
strengths and weaknesses of the hit-metering proposal [Mogul and
Leach 1997] currently under consideration by the HTTP working group.
Lastly, new proposals based upon server-side sampling are introduced
and assessed against the hit-metering proposal. It is argued that server-side
sampling provides more reliable and useful usage data while requiring
no change to the current HTTP protocol and enhancing user privacy.
Keywords:
World Wide Web, statistical analysis, clustering, path analysis, log
file analysis, sampling
You can access this technical report via:
PDF
Postscript
 