[squid-users] Calculate time spent on website (per ip address)

Amos Jeffries squid3 at treenet.co.nz
Wed Feb 11 02:09:09 UTC 2015


On 11/02/2015 1:37 p.m., Luis Miguel Silva wrote:
> I'm trying to export this information and create pretty reports detailing
> how much time each device spent online / on each site.

The graph of online time will shock you. Network access times are seriously tiny.

The Squid access.log column #2 is the count of milliseconds the client
spent online accessing the resource in the URL field. Add that up
per-site and you have the answer for online time. All the rest of the
time is OFFLINE - user reading, viewing, doing other stuff.
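
As a rough sketch of that sum (Python, assuming the default native
access.log format with whitespace-separated fields: timestamp, elapsed ms,
client IP, result/status, bytes, method, URL, ...). The function names and
the sample lines are made up for illustration, and grouping by host name
is only a first approximation of "site":

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(method, url):
    """Return the host part of the logged URL field."""
    if method == "CONNECT":
        # CONNECT lines log "host:port" rather than a full URL.
        return url.rsplit(":", 1)[0].lower()
    return urlsplit(url).hostname or "-"


def online_ms(lines):
    """Sum column #2 (elapsed milliseconds) per (client IP, host)."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[1].isdigit():
            continue  # skip short or malformed lines
        elapsed, client = int(fields[1]), fields[2]
        method, url = fields[5], fields[6]
        totals[(client, host_of(method, url))] += elapsed
    return dict(totals)


# Invented sample lines in the native log layout.
sample = [
    "1423620549.123    345 192.168.1.10 TCP_MISS/200 1234 GET "
    "http://example.com/ - HIER_DIRECT/192.0.2.1 text/html",
    "1423620550.456    120 192.168.1.10 TCP_MISS/200 5678 GET "
    "http://example.com/logo.png - HIER_DIRECT/192.0.2.1 image/png",
    "1423620551.789     80 192.168.1.11 TCP_MISS/200 910 GET "
    "http://example.org/ - HIER_DIRECT/192.0.2.2 text/html",
]

for (client, host), ms in sorted(online_ms(sample).items()):
    print(client, host, ms)
```

If you use a custom logformat the field positions will differ, so adjust
the indexes accordingly.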

You will need to adjust for CONNECT transactions, which skew the data by
staying active/counting across times when actually nothing is going on.
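
One possible way to make that adjustment, sketched in Python. This is an
assumption about how you might handle it, not something Squid does for
you: CONNECT entries are tallied separately from everything else, and an
optional ceiling (the hypothetical cap_ms parameter) limits how much a
single tunnel can contribute. The field layout assumed is the default
native access.log format, and the sample lines are invented:

```python
from collections import defaultdict


def split_connect(lines, cap_ms=None):
    """Tally elapsed ms per client, keeping CONNECT tunnels separate.

    cap_ms is a hypothetical per-entry ceiling for CONNECT lines;
    None means tunnel durations are taken as logged.
    """
    plain = defaultdict(int)
    tunnels = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[1].isdigit():
            continue  # skip short or malformed lines
        elapsed, client, method = int(fields[1]), fields[2], fields[5]
        if method == "CONNECT":
            if cap_ms is not None:
                elapsed = min(elapsed, cap_ms)
            tunnels[client] += elapsed
        else:
            plain[client] += elapsed
    return dict(plain), dict(tunnels)


# Invented sample: one ordinary request and one long-lived tunnel.
sample = [
    "1423620549.123    345 192.168.1.10 TCP_MISS/200 1234 GET "
    "http://example.com/ - HIER_DIRECT/192.0.2.1 text/html",
    "1423621149.000 600000 192.168.1.10 TCP_TUNNEL/200 4321 CONNECT "
    "example.com:443 - HIER_DIRECT/192.0.2.1 -",
]

plain, tunnels = split_connect(sample, cap_ms=30000)
print(plain, tunnels)
```

Whether to cap, discount, or drop the tunnel time entirely is a judgement
call; keeping it in a separate tally at least lets you see how much it
skews the totals.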

The other unsolved problem is figuring out what a "site" actually is.
Most people start off assuming domain name == website. They are wrong, so
very, very wrong.

When it comes to anything above small business sites or personal domains
a "site" turns into an unholy mashup of multiple domain names (yay
multiple sites ... or is it one? and how are they overlaid with
embedding? .... eek!) and objects from all over the place.
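
If you decide to define "site" by hand, one crude option is a lookup table
from domain names to site labels. The Python sketch below assumes such a
table exists; the table contents, the labels, and the site_for name are
all hypothetical, and it does nothing about embedded third-party objects:

```python
def site_for(host, site_map):
    """Map a host name to a site label via a hand-maintained table.

    Tries the full host first, then each parent domain in turn.
    Falls back to the host name itself when nothing matches.
    """
    labels = host.lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in site_map:
            return site_map[candidate]
    return host.lower()


# Hypothetical table: several domains that together make up one "site".
SITE_MAP = {
    "example.com": "Example",
    "example-cdn.net": "Example",
    "example.org": "Other",
}

print(site_for("static.example-cdn.net", SITE_MAP))
print(site_for("www.example.com", SITE_MAP))
print(site_for("unknown.test", SITE_MAP))
```

Keeping such a table accurate is the hard part, which is really the point
made above.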


> 
> I understand I'll probably need to create this myself, I'm just trying to
> figure out what the state of the art is so I don't waste time on problems
> that have already been solved by others! :o)

State of the art AFAIK is proprietary information. You will have to ask
the big "social network" crowd if they will let you in on it.

The Squid time quota helper uses a 5 minute default, but is configurable.
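
To borrow that idea for log analysis, you could treat the 5 minutes as an
idle threshold: gaps between consecutive requests that are no longer than
the threshold count as time still spent, longer gaps count as away. The
Python sketch below is only an illustration under that assumption, not
the helper's actual code; engaged_seconds and its pause parameter are
made-up names, and the input is a list of request timestamps (seconds)
for one client and one site:

```python
def engaged_seconds(timestamps, pause=300.0):
    """Sum the gaps between consecutive requests that are <= pause.

    pause is the idle threshold in seconds (300 = 5 minutes here).
    """
    ordered = sorted(timestamps)
    total = 0.0
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap <= pause:
            total += gap  # still counted as the same visit
    return total


# Invented request times: three close together, a long break, two more.
print(engaged_seconds([0, 60, 120, 1000, 1100]))
```

This measures "engaged" time including reading between clicks, which is a
different quantity from the network time in the access.log elapsed column.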

The session helpers use 1 hour defaults, but are not easily per-site (see
above).

The Squid built-in client DB uses a threshold with frequency of access
over the past T seconds. But it's only session-tracking to know whether
the resources consumed by state about the client can be discarded or are
better kept for some future traffic.

Amos


