[squid-dev] Squid 3.5.23: crash in Comm::DoSelect

Tue Oct 18 14:48:26 UTC 2016

On 10/18/2016 03:44 AM, oleg gv wrote:

> nfds=284, so loop ends on 283 and pfds[283] is buggy

> The I/O module is src/comm/ModPoll.cc, method Comm::DoSelect(int msec).
> On the stack we see pfds[SQUID_MAXFD=256], which is smaller than nfds in the loop.
> Maybe malloc nfds?

If your maxfd is bigger than SQUID_MAXFD, then the bug is elsewhere and
dynamically allocating pfds is not the right fix (even though it will
"work").

I suspect your Squid is creating or accepting a descriptor that
exceeds SQUID_MAXFD-1. Biggest_FD+1 cannot be allowed to exceed the
misnamed SQUID_MAXFD limit.

This combination looks like a big red flag to me:

    struct pollfd pfds[SQUID_MAXFD];
    ...
    maxfd = Biggest_FD + 1;
    for (int i = 0; i < maxfd; ++i) {
        ...
        pfds[nfds].fd = i;

That code is missing assert(maxfd <= SQUID_MAXFD), which would fail in
your case.
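
To illustrate the missing check, here is a minimal standalone sketch, not
Squid code. The names kMaxFd and fillPollSet are hypothetical, kMaxFd uses
the value 256 reported in this thread, and the loop is simplified to add
every descriptor (the real loop only adds descriptors that have handlers).
It returns -1 where Squid would assert:

```cpp
#include <array>
#include <cassert>
#include <poll.h>

// Hypothetical stand-in for SQUID_MAXFD (256 is the value reported in this thread).
constexpr int kMaxFd = 256;

// Fills pfds with one entry per descriptor in [0, biggestFd] and returns the
// number of entries written, or -1 if they would not fit into the fixed array.
int fillPollSet(std::array<pollfd, kMaxFd> &pfds, int biggestFd)
{
    const int maxfd = biggestFd + 1;

    // This is the guard the original loop lacks: without it, the writes
    // below run past the end of pfds as soon as maxfd exceeds kMaxFd.
    if (maxfd > kMaxFd)
        return -1;

    int nfds = 0;
    for (int i = 0; i < maxfd; ++i) {
        pfds[nfds].fd = i;
        pfds[nfds].events = POLLIN;
        pfds[nfds].revents = 0;
        ++nfds;
    }
    return nfds;
}
```

With Biggest_FD = 292, as in the reported core dump, this sketch refuses to
fill the array instead of writing past its end.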

If you want a workaround, try building Squid with a reasonable number of
maximum descriptors (e.g., 16K, 32K, or 64K). If that number is never
reached in your environment, the code will appear to work.

If you want to try a quick fix, replace SQUID_MAXFD with (Biggest_FD +
1) when declaring pfds. You may need to ignore/disable compiler warnings
about C++ arrays with dynamic sizes. Alternatively, you can allocate
pfds dynamically (as you suggested).
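
A rough sketch of the dynamic-allocation variant, again standalone and not
Squid code. The function name makePollSet is hypothetical, and std::vector is
my choice here (the thread only says "malloc"). Like the quick fix itself,
this hides the mismatch rather than fixing it:

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <vector>

// Builds a poll set sized from the current largest descriptor rather than
// from a fixed compile-time limit. Returns an empty set if biggestFd < 0.
std::vector<pollfd> makePollSet(int biggestFd)
{
    std::vector<pollfd> pfds;
    if (biggestFd < 0)
        return pfds;

    pfds.reserve(static_cast<std::size_t>(biggestFd) + 1);
    for (int fd = 0; fd <= biggestFd; ++fd) {
        pollfd entry{};      // zero-initialize, so revents starts at 0
        entry.fd = fd;
        entry.events = POLLIN;
        pfds.push_back(entry);
    }
    return pfds;
}
```

The resulting buffer can be handed to poll() as pfds.data() with pfds.size()
entries. Any other table that is still sized by SQUID_MAXFD remains at risk.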

If you want to fix the bug, audit all Biggest_FD- and
SQUID_MAXFD-related code to make sure the two are always in sync.
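
As a sketch of the invariant such an audit would enforce, consider a small
tracker that refuses any descriptor at or above the limit before updating
its "biggest" value. FdTracker and its methods are hypothetical and do not
correspond to any existing Squid class:

```cpp
#include <cassert>

// Hypothetical tracker that keeps biggest() + 1 <= limit() at all times.
class FdTracker
{
public:
    explicit FdTracker(int limit): limit_(limit) {}

    // Records a newly opened descriptor. Returns false (and records nothing)
    // if fd is negative or would push biggest() + 1 past the limit.
    bool noteOpen(int fd)
    {
        if (fd < 0 || fd >= limit_)
            return false;
        if (fd > biggest_)
            biggest_ = fd;
        return true;
    }

    int biggest() const { return biggest_; }
    int limit() const { return limit_; }

private:
    int limit_;
    int biggest_ = -1; // -1 means no descriptor has been recorded yet
};
```

A caller that gets false back would have to close or reject the descriptor
instead of registering it.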

HTH,

Alex.

> 2016-10-18 8:29 GMT+03:00 Amos Jeffries:
> 
>     FYI: Squid-3.5.23 does not exist yet. What is the output of "squid -v" ?
> 
>     On 18/10/2016 5:01 a.m., oleg gv wrote:
>     > I have heavy traffic (at least 100 computers), and Squid often crashes in
>     > the Comm::DoSelect(int msec) function.
>     > I use interception mode and NAT redirect.
>     >
>     > In the core dump I saw that the bug is in the following fragment of code:
>     >
>     > 446│         for (size_t loopIndex = 0; loopIndex < nfds; ++loopIndex) {
>     > 447│             fde *F;
>     > 448│             int revents = pfds[loopIndex].revents;
>     > 449│             fd = pfds[loopIndex].fd;
>     > 450│
>     > 451│             if (fd == -1)
>     > 452│                 continue;
>     > 453│
>     > 454├>            if (fd_table[fd].flags.read_pending)
>     > 455│                 revents |= POLLIN;
>     >
>     > SIGSEGV occurred often (about once a minute) at line 454: fd=-66012128,
>     > loopIndex=283
>     >
>     > (gdb) p pfds[282]
>     > $17 = {fd = 291, events = 64, revents = 0}   -- looks ok
>     >
>     > (gdb) p pfds[283]
>     > $18 = {fd = -66012128, events = 32595, revents = 0}  -- looks strange and
>     > corrupted
>     >
>     > (gdb) p Biggest_FD
>     > $19 = 292
>     >
> 
>     What is the nfds value ?
> 
>     It looks to me like only 282 FDs have operations to perform in this I/O
>     cycle.
> 
>     What I/O module is being used?
> 
>      src/comm/ModDevPoll.cc:Comm::DoSelect(int msec)
>      src/comm/ModPoll.cc:Comm::DoSelect(int msec)
> 
> 
>     Amos
> 
>     _______________________________________________
>     squid-dev mailing list
>     squid-dev at lists.squid-cache.org
>     http://lists.squid-cache.org/listinfo/squid-dev