[squid-users] ACL matches when it shouldn't

Marcus Kool marcus.kool at urlfilterdb.com
Fri Oct 2 10:28:57 UTC 2020


Of course this script is sluggish: for every request it reads many category files and forks at least 3-6 processes per category (grep, head and cut for each of the two list files).

If you *really* want to implement this with a perl script, it should read all the files once at startup and do each lookup against in-memory Perl data structures (e.g. hashes); see the sketch below.
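
A minimal sketch of that approach (illustrative only; the topdir path, category names and sub name below are assumptions, not your actual configuration):

#!/usr/bin/perl
use strict;
use warnings;

# Assumed locations and names; adjust to the real layout.
my $topdir     = '/opt/custom/blacklists';
my @categories = qw(adv gamble porn warez);   # example subset of the 66 categories

# Read every domains/urls file once, at startup, into hashes.
my (%domain_cat, %url_cat);
for my $cat (@categories) {
    for my $kind ('domains', 'urls') {
        open(my $fh, '<', "$topdir/$cat/$kind") or next;
        while (my $line = <$fh>) {
            chomp $line;
            next unless length $line;
            if ($kind eq 'domains') { $domain_cat{$line} = $cat; }
            else                    { $url_cat{$line}    = $cat; }
        }
        close $fh;
    }
}

# Per-request lookup: one or two hash accesses instead of forking grep/head/cut.
sub matched_category {
    my ($dst, $path) = @_;
    return $url_cat{ $dst . $path } // $domain_cat{$dst};   # category name, or undef
}

With ~50MB of lists the hashes fit comfortably in memory, and the cost of reading the files is paid once per helper child at startup instead of on every request.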

But I suggest looking at ufdbGuard, which is a URL filter that is way faster and has all the functionality you need.

Marcus


On 2020-10-02 10:08, Vieri wrote:
> Regarding the use of an external ACL, I quickly implemented a perl script that "does the job", but it seems to be somewhat sluggish.
>
> This is how it's configured in squid.conf:
> external_acl_type bllookup ttl=86400 negative_ttl=86400 children-max=80 children-startup=10 children-idle=3 concurrency=8 %PROTO %DST %PORT %PATH /opt/custom/scripts/squid/ext_txt_blwl_acl.pl --categories=adv,aggressive,alcohol,anonvpn,automobile_bikes,automobile_boats,automobile_cars,automobile_planes,chat,costtraps,dating,drugs,dynamic,finance_insurance,finance_moneylending,finance_other,finance_realestate,finance_trading,fortunetelling,forum,gamble,hacking,hobby_cooking,hobby_games-misc,hobby_games-online,hobby_gardening,hobby_pets,homestyle,ibs,imagehosting,isp,jobsearch,military,models,movies,music,podcasts,politics,porn,radiotv,recreation_humor,recreation_martialarts,recreation_restaurants,recreation_sports,recreation_travel,recreation_wellness,redirector,religion,remotecontrol,ringtones,science_astronomy,science_chemistry,sex_education,sex_lingerie,shopping,socialnet,spyware,tracker,updatesites,urlshortener,violence,warez,weapons,webphone,webradio,webtv
>
> I'd like to avoid the use of a DB if possible, but maybe someone here has an idea to share on flat file text searches.
>
> Currently the dir structure of my blacklists is:
>
> topdir/
>   category1/ ... categoryN/
>     domains  urls
>
> So basically one example file to search in is topdir/category8/urls, etc.
>
> The helper perl script contains this code to decide whether to block access or not:
>
> foreach( @categories )
> {
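>          # exact whole-line match (grep -x) of host+path against this category's 'urls' list;
>          # the grep | head | cut pipeline forks three external processes per call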
>          chomp($s_urls = qx{grep -nwx '$uri_dst$uri_path' $cats_where/$_/urls | head -n 1 | cut -f1 -d:});
>
>          if (length($s_urls) > 0) {
>              if ($whitelist == 0) {
>                  $status = $cid." ERR message=\"URL ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
>              } else {
>                  $status = $cid." ERR message=\"URL ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
>              }
>              next;
>          }
>
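>          # same lookup against the category's 'domains' list, using the destination host only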
>          chomp($s_urls = qx{grep -nwx '$uri_dst' $cats_where/$_/domains | head -n 1 | cut -f1 -d:});
>
>          if (length($s_urls) > 0) {
>              if ($whitelist == 0) {
>                  $status = $cid." ERR message=\"Domain ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
>              } else {
>                  $status = $cid." ERR message=\"Domain ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
>              }
>              next;
>          }
> }
>
> There are currently 66 "categories" with around 50MB of text data in all.
> So that's a lot to go through each time there's an HTTP request.
> Apart from placing these blacklists on a ramdisk (they are currently on an M.2 SSD, so I'm not sure I'd notice any difference), what else can I try?
> Should I reindex the lists and group them all alphabetically?
> For instance, should I process the lists to generate a dir structure as follows?
>
> topdir/
>   a/ b/ c/ d/ e/ f/ ... x/ y/ z/ 0/ 1/ 2/ 3/ ... 7/ 8/ 9/
>     domains  urls
>
> An example for a client requesting https://www.google.com/ would lead to searching only 2 files:
> topdir/w/domains
> topdir/w/urls
>
> An example for a client requesting https://01.whatever.com/x would also lead to searching only 2 files:
> topdir/0/domains
> topdir/0/urls
>
> An example for a client requesting https://8.8.8.8/xyz would also lead to searching only 2 files:
> topdir/8/domains
> topdir/8/urls
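>
> Roughly, the preparation step could be something like this (a rough, untested sketch against the current topdir/category/{domains,urls} layout; the paths are made up):
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use File::Path qw(make_path);
>
> my $src = '/opt/custom/blacklists';          # current topdir (assumed path)
> my $dst = '/opt/custom/blacklists-indexed';  # new topdir with one bucket per leading character
>
> opendir(my $dh, $src) or die "cannot open $src: $!";
> for my $cat (grep { !/^\./ && -d "$src/$_" } readdir $dh) {
>     for my $kind ('domains', 'urls') {
>         open(my $in, '<', "$src/$cat/$kind") or next;
>         while (my $line = <$in>) {
>             chomp $line;
>             next unless length $line;
>             # bucket on the first character of the entry: www.google.com -> w, 8.8.8.8 -> 8
>             my $bucket = lc substr($line, 0, 1);
>             make_path("$dst/$bucket");
>             open(my $out, '>>', "$dst/$bucket/$kind") or die "cannot write $dst/$bucket/$kind: $!";
>             print $out "$line\n";
>             close $out;
>         }
>         close $in;
>     }
> }
> closedir $dh;
>
> (The merged bucket files would probably also want a sort -u pass afterwards to drop duplicates across categories.)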
>
> Any ideas or links to scripts that already prepare lists for this?
>
> Thanks,
>
> Vieri

