[squid-users] Regex optimization

Yuri Voinov yvoinov at gmail.com
Thu Jun 16 19:17:42 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
 
Heh. As usual.

The only solution is to use a redirector + DB + blocklists.

For example, ufdbGuard.

PS. What a stupid idea: using regexes for this task without a Cray...
:))
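
A minimal sketch of why a lookup table scales where a long list of regexes does not: an exact-match check against a hash set costs roughly the same whether the list holds 8 URLs or 8613. This is plain Python, not how ufdbGuard is implemented. The names normalize, load_blocklist and is_blocked and the example URLs are made up for illustration, and the wiring into Squid as a helper is left out.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment, keep path and query."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = f":{parts.port}" if parts.port else ""
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{port}{path}{query}"


def load_blocklist(lines):
    """Build a set of normalized URLs from an iterable of text lines."""
    return {normalize(line.strip()) for line in lines if line.strip()}


def is_blocked(url: str, blocklist) -> bool:
    """Exact-match lookup: one hash probe, independent of list size."""
    return normalize(url) in blocklist


# Hypothetical entries standing in for the real list.
blocklist = load_blocklist([
    "http://blocked.example/a/b.jpg",
    "http://Other.example/",
])

print(is_blocked("http://BLOCKED.example/a/b.jpg", blocklist))  # True
print(is_blocked("http://something/", blocklist))               # False
```

Note that this is exact matching only. Entries that were meant as prefixes (a whole site, say) would need a separate host or prefix lookup, which is what the redirector's database is for.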

On 17.06.2016 1:11, Alfredo Rezinovsky wrote:
> Well, I tried.
> I need to ban 8613 URLs because of a law.
>
> If I put one per line in a file and set the filename for a url_regex
> acl, it works. But when the traffic goes up, the CPU load goes to 100%
> (even using workers) and the proxy becomes unusable.
>
> I tested and saw that my Squid can't parse regexes longer than 8192
> characters.
> I managed to combine the 8000 URIs into 34 regexes using a Ruby gem, and
> the CPU load stays at almost the same level as without any acl (same
> traffic).
>
> The problem is that of the 34 regexes, 33 work fine and 1 matches
> everything.
>
> "http://something" matches.
>
> The regex is:
>
>
^http:\/\/(m(i(n(i(\.free\-teen\-pussy\.com\/scj\/thumbs\/(0\/(253_Mini_Mini\.jpg|306_Mini_Mini\.jpg|468_Pree_Pree\.jpg|519_Models_Models\.jpg|688_Models_nn\.jpg|7(17_Models_Models\.jpg|22_Models_Mini\.jpg)|880_nn_Models\.jpg)|1\/(170_little_models\.jpg|271_Young_Young\.jpg|330_models_little\.jpg|412_little_models\.jpg|6(28_little_models\.jpg|43_little_little\.jpg)|742_Young_Young\.jpg|877_littlemodels_littlemodels\.jpg|903_littlemodels_littlemodels\.jpg)|2\/(0(47_models_little\.jpg|92_models_models\.jpg)|1(44_little_little\.jpg|75_Young_Young\.jpg)|2(04_littlemodels_littlemodels\.jpg|29_russian_russian\.jpg|76_russian_little\.jpg)))|girls\.biz)|t(ladies\.com\/|teens\.com\/))|ragetube\.com\/content\/(35\/907_un_pequena\.jpg|75\/939_sex_Simpsons\.jpg|95\/473_Babe_039\.jpg))|mmgay\.com\/|o(dels(\.world\-collections\.com\/index\.html\?(4[69]|7(34|7)|88)|\-(hot\.net\/|me\.com\/index\.html\?(23|838)))|e(img(1\.moesearch\.net\/imgs1\/01058000\/1058747\.jpg|2\.moesearch\.net\/imgs2\/(0(1058000\/1058604\.jpg|2095000\/2095432\.jpg|4027000\/4027262\.jpg)|12509000\/12509478\.jpg))|search\.net\/img\.php\?mode=view&id=1058747)|ist\.lolajunior\.com\/images\/cache\/240x180\/(1(00\.125\.1231425553\.jpg|11\.85\.1921663855\.jpg|40\.76\.1735890287\.jpg|50\.115\.1167057586\.jpg|6(1\.133\.2028798891\.jpg|6\.130\.165837020\.jpg|7\.111\.1548016707\.jpg)|72\.107\.88564122\.jpg)|2(42\.142\.777367129\.jpg|76\.170\.1873449959\.jpg)|95\.80\.959860653\.jpg)|n(dayporn\.com\/f\/teen\/|stercams\.in\/)|ppetdollz\.com\/|r(asmovie\.com\/|e\-porn\.net\/)|ther\.taboo\.cc\/images\/cache\/300x250\/191\.170\.32154737\.jpg|v(\.(18\-21\-teens\.com\/|ftvcash\.com\/Free\-Movies\/galleries\/FTV\/1\/165\/)|ie(1820\.com\/trifuns\/sabine\.61sec\.shtml|galls(1(\.t(eensexreality\.com\/665\/\?nats=MjgwOjI6NA|inseks\.com\/video2\/070\/\?nats=MjI6Mjo2,0,0,0,5506)|2\.teenburg\.com\/262\-265\/265\/070\/\?nats=Mjc6Mjox)|3\.teen(burg\.com\/videos15\/093\/\?nats=MTc6Mjox,0,0,0,4396|sexreality\.com\/videos3\/042\/\?nats=MjU
6Mjo0))|pornshop\.com\/?|sroom\.net\/index2\.shtml)))|pxgirls\.com\/stream\/thumbs\/(196\/196614\.jpg|96\/96855\.jpg)|rpornsite\.com\/tube\/teen\/|snwhores\.com\/|u(lti(\.xnxx\.com\/gallery\/132750\/ad59\/|grab\.olimptraffic\.com\/thumbs\/(003a\/8e8675a8f2fb40dfedbde11abedb4685\.jpg|new(\/00(02\/1fe44498d20b8ca368b23b517f03af47\.jpg|14\/07df98c7a77ab7069dc236273948fe7d\.jpg|37\/a209f10ecc66fa95c4e5d0c56c6eb6b5\.jpg|52\/0a61ea933a3b7a5febcfc94f3e225c98\.jpg)|2\/0(0(07\/0bae6ad1630ca38eafe01383eeb0d860\.jpg|89\/(86365a52f500fd038170116e4026fe94\.jpg|a9bb7df30ede34bb568978495968ed7b\.jpg))|1(36\/e00a6e199a1655fb757448e857b63607\.jpg|59\/3434c873b5544bfef18c58518b89eb81\.jpg)))))|n\-da\.com\/\?ref=nuvilon\.com|s(etoons\.com\/|icnet\.sil\.at\/phpbb\/viewtopic\.php\?p=35246\#35246)|ychicas\.com\.ar\/imgoct2013\/17\.jpeg)|y(18teens\.com\/(movies\/(amazing\-teenage\-whore\-kacey\-gives\-head\/\?nats=NTM6Mjox,0,0,0,10461|brenda\-blowjob\/\?nats=NjY1OjI6MQ|gorgeous\-teen\-girl\-katya\-pornmovies\/\?nats=NTM6Mjox,0,0,0,10507|teenage\-model\-gina\-porn\-clips\/\?nats=NTM6Mjox,0,0,0,10492)|pictures\/(naked\-teen\-hoe\-sabrina\/\?nats=NTM6Mjox,0,0,0,10440|sweet\-teen\-babe\-sabrina\/\?nats=NTM6Mjox,0,0,0,10431|teen\-fuck\-movie\/\?nats=MTI6Mjox,0,0,0,1189))|3(dsex\.com\/|xxx\.com\/(dtr\/galls\/(03104b\/|6(0298a\/|64bda\/)))?)|boyssite\.com\/\?id=10gay\.com|crazyvids\.com\/mov\/best\/petite\-1\.html|e(mogirl\.com\/|xbaby\.com\/)|firstteens\.com\/|\-fruits\.info\/topics\/(HOT%20NEW%203D%20SITE%20!!%20Photos%20of%20young%20nonude%20models%20in%203D%20!%20%28Anaglyph%20and%20MPO%29\/|More%20young%20talents%20Vinka%20Model\/|NEW%20star%20Tiana%20Model\/|Preteen%20art%20from%20Talent%20Young%20%28NEW%29\/|S(ummer%20days%20\-%20best%20nonude%20preteen%20photos%20and%20videos%20from%20summer%20beaches\/|veta%20with%20piggy%20tails\/)|Talent%20Young%20very%20HOT%20preteens%20collection%20%28NEW%29\/)|idealgirl\.com\/archives\/1|juniorsister\.com\/videos\/(0(29\.jpg|61\.jpg)|1(3(7\.jpg|8\.
jpg)|54\.jpg|88\.jpg)|2(4(4\.jpg|6\.jpg)|89\.jpg|98\.jpg)|375\.jpg)|s(ilverteens\.com\/index\.php|luts\.net\/(\?x=7106\.8911\.3297)?)|teen(ie\.com\/|pics\.co\.tv\/tgp\/|tv\.com\/|video\.com\/pictures\/(amateur\-teenage\-girl\-nesti\-sex\/\?nats=bogk84q9;827:notrials50:myteenvideo,0,0,0,19752|sweet\-teen\-babe\-aletta\/\?nats=NTM6Mjoz,0,0,0,10608))|youngsex\.com\/))|n(a(dia(31\.firstmo\.com\/|kristina\.firstmo\.com\/thumbs\/(0(5\.jpg|6\.jpg|8\.jpg)|1(2\.jpg|6\.jpg|7\.jpg|8\.jpg)|2(0\.jpg|1\.jpg)))|ked(\-(girls\-vids\.com\/|nude\.com\/|photos\.org\/)|teenspictures\.com\/str\/thumbs\/7\/7(168\.jpg|485\.jpg))|na\.legalmodolls\.com\/promo\/240x460_02\.jpg|onlinedee\.qipim\.ru\/twocj\.html|rutoepisodeporno\.com\/|sty(\.preteennonude\.com\/(\?ref=preteennonude\.com&x=9622\.8898\.|pic\/pic\.php\?285)|teensdesire\.com\/)|turistteenphotos\.com\/th\/thumb034\.jpg|ughtyteenvids\.com\/\?id=kind\-girls\.net)|c\-mcadd\.org\/model\-girls\-preteen\/tn\/(20952\.jpg|4086\.jpg|7905\.jpg)|e(mo\-glamour\.com\/|odet\.com\/\?id=(nonude\-modelcom|topnonude\-girlsinfo|virginstubenet)|w(\.(18onlygirls\.com\/4234c2e9\/MTE3NjU6NToyNA\/|beataporn\.com\/a31412b5\/Njg1MDo1OjMx\/|nonudeplace\.com\/(aida\.holymodels\.jpg|bella\.holymodels\.jpg|evie\.holymodels\.jpg|g(irls(1\.holymodels\.jpg|3\.holymodels\.jpg)|lenda\.holymodels\.jpg)|kamilla\.holymodels\.jpg|lia\.holymodels\.jpg|scj\/tmp\/0\/(2(29_preteen_model\.jpg|62_nonude_glenda\.jpg)|351_lolita_nude\.jpg|4(02_non_preteen\.jpg|50_prelolitas_prelolitas\.jpg)|511_nonude_pics\.jpg|6(03_lolita_lolita\.jpg|85_prelolitas_prelolitas\.jpg)|7(27_nn_model\.jpg|63_nude_preteen\.jpg)))|younglegalporn\.com\/(3f68aa4a\/Njg1MDo1OjI1\/|ee0598ed\/MzcyODo1OjI1\/))|\-(art\-nude\.net\/streamrotator\/thumbs\/(0\/(234\.jpg|32\.jpg|493\.jpg)|59\/59594\.jpg|65\/65515\.jpg)|content\.info\/cj\/g\/index\.php\?ft=gaybase\.biz|models\.in\/scj\/thumbs\/0\/(576_0053_0053\.jpg|733_From_From\.jpg)|teensex\.com\/(\?facename=indexpdl&updatestat=off)?)|estpornlinks\.com\/|n(nmodel
s\.com\/scj\/thumbs\/0\/300_Star_Preten\.jpg|ubiles\.net\/)|pornsite(\.org\/(\?x=0058\.4384\.|vids\/x09\/05\/26\/A\-nice\-Blowjob\/index\.html)?|s\.net\/)|yes\.moy\.su\/sixcj\.html)|zemnoykaif\.com\/(\?d=p1)?)|fsx\.com\/(\?x=(0693\.6976\.0182\.9567\.9074\.4120\.5480\.1010\.9410\.5849|2504\.9354\.5283\.5928\.5849\.|4113\.5849\.|5849\.?|7119\.5849nfsx\.com\/\?x=7119\.5849\.))?|i(c(exvideostube\.com\/dtr\/thumbs\/(0(82339\.jpg|92429\.jpg)|3c2af8\.jpg|4c9c39\.jpg|6(2ba29\.jpg|61782\.jpg)|76803f\.jpg|97c126\.jpg|b(46397\.jpg|65702\.jpg)|c(0ab58\.jpg|3d805\.jpg|9065b\.jpg)|d(799f4\.jpg|8b70c\.jpg))|hegalz\.com\/moviesgeneral\.php\?d=etvtube\.com&acc=arkan&in2=)|ghttubes\.com\/|neclips\.com\/(watch\.php\?tag=0glreYaIS9)?|vonog\.zonalibre\.org\/archives\/015324\.html)|n(\-(1\.com\/(ban_468\.jpg)?|art\.info\/(image\/8\.jpg|portal\/main\/index_files\/tn_(1\.jpg|reyna25\.jpg))|b(bs\.info\/(100150\.gif|ban\.gif)|eauties\.com\/)|girls\.biz\/sitesx\/(2\/stella\.jpg|4\/(bella\.jpg|evie\.jpg)|5\/mashamodel\.jpg)|l(ist\.in\/scj\/thumbs\/0\/394_and_5\.jpg|ola\.net\/index\.html\?44)|mods\.com\/index\.html\?36|sites\.com\/(nnmodels(3\.html|4\.html))?|t(een\.thumblogger\.com\/|op\.com\/100x150\.jpg))|100\.in\/(\?id=simply\-models\.net|s(cj\/thumbs\/1\/(3(09_Elba_Pretty\.jpg|31_young_models\.jpg)|68(3_Beautiful_Mara\.jpg|9_Hot_Young\.jpg)|7(03_Model_0071\.jpg|29_Other_Girls\.jpg|57_Model_Girls\.jpg|91_From_Models\.jpg)|8(28_From_Vilma\.jpg|43_Models_Models\.jpg|51_0068_5\.jpg|76_Model_Mara\.jpg))|ites\/(lila\/Lila\-ind\-400x600\.jpg|natasha\/Natasha\-ind\-400x600\.jpg)))|desire\.com\/bannersmall\.gif|elis\.com\/|lopics\.com\/(PC\-tn(alina6b\.jpg|daphne3b\.jpg|roxana2\-2b\.jpg))?|neversleeps\.info\/|preteen\.com\/index\.html\?18|t(eens\.com\/tgp\/my18teens\-arielfuck\/bunny\.htm|op\.org\/(100150\.gif|miniban\/inna\.jpg))|ville\.com\/(\?(ref=n(onudemodel\.net|uvilon\.com)|x=5608\.9251\.9012\.6426\.)|GALL\/art\/trixie\/)?)|o(chesexo\.com\/|n(\-nude\.tv\/non\-nude_room\/151\-3_gb_private_col
ection_jb_bonus\.html|nudesitescatalog\.com\/banner\.gif|stop\-nn\.info\/(\?ref=newnnmodels\.net|100150\.jpg|galls\/mnrv\/More%20preteen%20nonude%20models%20Dany%20and%20Camy\/)))))
>
> Testing the regex in regex101 shows that it should not match.
>
> My Squid was compiled with GNU regex:
>
> Squid Cache: Version 3.5.17
> Service Name: squid
> configure options:  '--prefix=/opt/sepia/distro/squid'
> '--sysconfdir=/var/lib/sepia/' '--disable-auth' '--disable-auto-locale'
> '--disable-cache-digests' '--disable-cpu-profiling'
> '--disable-debug-cbdata' '--disable-delay-pools' '--disable-devpoll'
> '--disable-ecap' '--disable-esi' '--disable-eui'
> '--disable-external-acl-helpers' '--disable-follow-x-forwarded-for'
> '--disable-forw-via-db' '--enable-gnuregex' '--disable-htcp'
> '--enable-icap-client' '--disable-ident-lookups' '--enable-internal-dns'
> '--disable-ipf-transparent' '--disable-ipfw-transparent'
> '--disable-ipv6' '--disable-leakfinder' '--disable-pf-transparent'
> '--disable-poll' '--disable-select' '--disable-snmp' '--with-openssl'
> '--disable-stacktraces' '--disable-translation'
> '--disable-url-rewrite-helpers' '--disable-wccp' '--disable-wccpv2'
> '--disable-win32-service' '--disable-x-accelerator-vary'
> '--disable-icmp' '--disable-storeid-rewrite-helpers' '--enable-async-io'
> '--enable-disk-io' '--enable-epoll' '--enable-http-violations'
> '--enable-inline' '--enable-kill-parent-hack' '--enable-linux-netfilter'
> '--enable-log-daemon-helpers' '--enable-removal-policies'
> '--enable-storeio' '--enable-unlinkd' '--enable-x-accelerator-vary'
> '--enable-zph-qos' '--with-default-user=nobody'
> '--with-logdir=/var/log/sepia' '--with-pthreads' '--with-included-ltdl'
> '--with-pidfile=/var/lib/sepia/squid.pid' '--with-netfilter-conntrack'
> '--disable-arch-native' --enable-ltdl-convenience
>
> Testing with more lines of smaller regexes sometimes leads to the same
> problem: one regex matches everything.
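
A rough way to find the offending line before loading a generated pattern file into Squid is to check every pattern against a URL that must never match, and against the length limit mentioned above. The sketch below uses Python's re module, which is not GNU regex, so a pattern that passes here can still behave differently inside Squid. The function name check_patterns and the three sample patterns are hypothetical; the empty alternative in the second one is only one possible way a generated regex ends up matching everything, not a diagnosis of the regex quoted above.

```python
import re

MAX_LEN = 8192                  # limit reported in this thread for this build
SENTINEL = "http://something"   # URL from this thread that must not match


def check_patterns(patterns, sentinel=SENTINEL, max_len=MAX_LEN):
    """Return (line_number, problem) pairs for suspicious patterns."""
    problems = []
    for lineno, pattern in enumerate(patterns, start=1):
        if len(pattern) > max_len:
            problems.append((lineno, "too long"))
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            problems.append((lineno, f"does not compile: {exc}"))
            continue
        # Unanchored search, so a pattern can match anywhere in the URL.
        if compiled.search(sentinel):
            problems.append((lineno, "matches sentinel"))
    return problems


# Hypothetical patterns: the second has an empty alternative, so it
# matches any URL that starts with "http://".
sample = [
    r"^http://(a\.example/|b\.example/x)",
    r"^http://(a\.example/|)",
    "x" * 9000,
]

for lineno, problem in check_patterns(sample):
    print(lineno, problem)
```

In practice the patterns would be read from the same file that the acl points to, one per line.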
>
> 2016-04-27 8:32 GMT-03:00 Alfredo Rezinovsky <alfrenovsky at gmail.com>:
>
>     I saw in the debug log that when an ACL has many regexes, each one is
>     compared sequentially.
>
>     If I have
>
>     www.facebook.com
>     facebook.com
>     www.google.com
>     google.com
>
>     will it be faster to check just ONE optimized regex like
>     (www\.)?(facebook|google)\.com than the previous four?
>
>     I'm really talking about combining about 3000 URL regexes into one
>     huge regex, because comparing each and every URL to 3000 regexes is
>     too slow.
>
>     I know that using
>     (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com)
>     with PCRE will produce the same optimized result as
>     (www\.)?(facebook|google)\.com. Squid uses GNU regex. Does the GNU
>     regex library optimize this as well?
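
One common way to do this kind of combining by hand is to put the literal strings into a prefix tree (trie) and emit nested alternations from it, so that shared prefixes are written only once. The sketch below is an illustration in Python, not the Ruby gem mentioned above and not anything Squid or GNU regex does internally. The names build_trie and trie_to_regex are made up here. Plain ( ) groups are used rather than (?: ) because the latter is a Perl/PCRE extension.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the "" key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Turn a trie node into a regex fragment matching every suffix below it."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]          # straight run of characters, no group needed
    group = "(" + "|".join(branches) + ")"
    # If a word also ends at this node, everything below it is optional.
    return group + "?" if ends_here else group


hosts = ["www.facebook.com", "facebook.com", "www.google.com", "google.com"]
pattern = trie_to_regex(build_trie(hosts))
print(pattern)

combined = re.compile("^" + pattern + "$")
print(all(combined.match(h) for h in hosts))       # True
print(bool(combined.match("something.com")))       # False
```

A trie only merges common prefixes, so the result here is longer than the hand-written (www\.)?(facebook|google)\.com, which also merges the common suffix. The recursion depth equals the longest string, which is fine for URLs of ordinary length.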
>
>
>     --
>     Alfrenovsky
>
>
>
>
> --
> Alfrenovsky
>
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
 
iQEcBAEBCAAGBQJXYvtVAAoJENNXIZxhPexGMzQH/25lP8nVMYP7y5UDQdZbEAzg
MD5L/zwUK9XUzT4FlgpkJjYmV/mkADsIV7bgs3fjsK6mWTNwS+IaRoo/gnziFAE0
sS9k2Z5fm+LHkUxC6YaCC1kHSNC9WFKdl8DQLTlvnkjuootJkPvZFxmzDFsvPqtB
ZIqX/F+PfJtuS0vwAEO0j6YoI+XqsWk6GacAyAf55H+VUKf3yCqgNj502UQF8QYf
K9s/nXuydQ/7EBTMBHqmFqVyfuNc4lVtg/V4rVgB62M2sjeLrTvXU8zEQSUm0tLV
ea/CAv0runAzjfdKWOXn22aSkvzfoI13DV+wXEadshXWmU2QzdXQaQnyR8gkRG4=
=aHYr
-----END PGP SIGNATURE-----
