[squid-users] Refresh pattern issue in squid 3.1.20

Eliezer Croitoru eliezer at ngtech.co.il
Mon Dec 28 04:50:48 UTC 2015


Hey,

The pattern you wrote is wrong and also doesn't describe what you need.

A domain name can contain only certain characters, so using a bare "."
(which matches any character) is wrong.
Also, URL and domain regular expressions should be as strict as possible
so that you do not get false positive matches.
Amos suggested using:
refresh_pattern -i ^http://[a-zA-Z]+\.wsj\.net/ 10 200% 10 \
     override-expire reload-into-ims

And you can tweak it a bit to something like:
refresh_pattern -i ^http://[a-z\-\_\.A-Z0-9]+\.wsj\.(net|com|edu)/ 10 200% 10 \
     override-expire reload-into-ims

This describes what you want more precisely and avoids false positive
matches.
I suggest testing your patterns with the following online tools:
https://regex101.com/
http://rubular.com/
http://www.regextester.com/

against a list of URLs such as the ones you mentioned:
http://www.wsj.net/wwww
http://www.wsj.donotexistdomain/wwww
http://test1.test-2.www.wsj.donotexistdomain/wwww
http://test1.test-2.www.wsj.edu/wwww
http://test1.test-2.www.wsj.text.com/wwww
http://test1.test-2.www.wsj.text.net/wwww
http://test1.test-2.www.wsj.text-4.ddd.net/wwww

You can also find a couple of real URLs in your logs to test against.

Once you have tested the different patterns you will be able to
understand the issue with your patterns a bit better.
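
If you prefer to test offline, the same check can be sketched in a few
lines of Python. This is only a rough sketch: it assumes Python's "re"
module behaves like Squid's regex engine for these simple patterns, which
is not guaranteed. The character class in the tweaked pattern is written
without the backslashes, and the names URLS, PATTERNS and matching_urls
are made up for this example.

```python
import re

# Test URLs copied from the list above.
URLS = [
    "http://www.wsj.net/wwww",
    "http://www.wsj.donotexistdomain/wwww",
    "http://test1.test-2.www.wsj.donotexistdomain/wwww",
    "http://test1.test-2.www.wsj.edu/wwww",
    "http://test1.test-2.www.wsj.text.com/wwww",
    "http://test1.test-2.www.wsj.text.net/wwww",
    "http://test1.test-2.www.wsj.text-4.ddd.net/wwww",
]

# Patterns discussed in this thread (the labels are hypothetical).
PATTERNS = {
    "original": r"^http://*\.wsj\.*/",
    "amos": r"^http://[a-zA-Z]+\.wsj\.net/",
    # Same character set as the tweaked pattern, written without backslashes.
    "tweaked": r"^http://[a-zA-Z0-9._-]+\.wsj\.(net|com|edu)/",
}


def matching_urls(pattern, urls=URLS):
    """Return the URLs matched by pattern, ignoring case like the -i option."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [url for url in urls if regex.search(url)]


for name, pattern in PATTERNS.items():
    print(f"{name}: {pattern}")
    for url in matching_urls(pattern):
        print(f"  matches {url}")
```

With these inputs the original pattern matches none of the URLs, Amos's
pattern matches only http://www.wsj.net/wwww, and the tweaked pattern also
matches http://test1.test-2.www.wsj.edu/wwww.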

Eliezer

On 28/12/2015 06:30, SaRaVanAn wrote:
> Thanks for prompt response.
>
> I want to match all the URL's which has a pattern of "wsj" (example: *.
> wsj.com, *.wsj.net, *.wsj.edu ) . Does wildcard makes sense in squid
> refresh pattern? Can we have something like this?
>
>   refresh_pattern -i ^http://*\.wsj\.*/ 10 200% 10 \
>      override-expire reload-into-ims
>
>
> - Saravanan N


