<div dir="ltr">Thanks Alex.<div><br></div><div>Unfortunately I don't have enough C/C++ skills to fix it.</div><div><br></div><div>I've created a bug report – <a href="https://bugs.squid-cache.org/show_bug.cgi?id=4788">https://bugs.squid-cache.org/show_bug.cgi?id=4788</a></div><div><br></div><div>We've also changed parent behavior so it will not silently close the connection but will return 502 in this exact situation and seems like it fixes unexpected squid re-forward.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 28, 2017 at 7:12 PM, Alex Rousskov <span dir="ltr"><<a href="mailto:rousskov@measurement-factory.com" target="_blank">rousskov@measurement-factory.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 11/28/2017 02:27 PM, Ivan Larionov wrote:<br>
<br>
> Another interesting fact is that I can't reproduce this issue if squid<br>
> has no other traffic except my testing requests. But it's easy to<br>
> reproduce when server has other traffic.<br>
<br>
</span>I did not check your logs carefully, but I believe that (when things do<br>
not work the way you expect) your Squid is retrying a failed persistent<br>
connection (rather than re-forwarding the request after receiving a bad<br>
response). See the "pconn race happened" line below:<br>
<br>
> 1_retry.log:2017/11/28 12:55:12.731| 17,3| FwdState.cc(416) fail: ERR_ZERO_SIZE_OBJECT "Bad Gateway"<br>
> 1_retry.log:2017/11/28 12:55:12.731| 17,5| FwdState.cc(430) fail: pconn race happened<br>
> 1_retry.log:2017/11/28 12:55:12.731| 93,5| AsyncJob.cc(84) mustStop: HttpStateData will stop, reason: HttpStateData::<wbr>continueAfterParsingHeader<br>
> 1_retry.log:2017/11/28 12:55:12.731| 17,3| FwdState.cc(618) retryOrBail: re-forwarding (1 tries, 40 secs)<br>
> 1_retry.log:2017/11/28 12:55:12.731| 17,4| FwdState.cc(621) retryOrBail: retrying the same destination<br>
<br>
<br>
Assuming you tested with forward_max_tries set to 1, the retryOrBail<br>
lines above confirm the off-by-one problem I was describing in my<br>
previous response.<br>
<br>
AFAICT, compensating by setting forward_max_tries to zero will _not_<br>
work (for reasons unrelated to the problem at hand).<br>
<br>
<br>
FWIW, your current options include those outlined at<br>
<a href="http://wiki.squid-cache.org/SquidFaq/AboutSquid#How_to_add_a_new_Squid_feature.2C_enhance.2C_of_fix_something.3F" rel="noreferrer" target="_blank">http://wiki.squid-cache.org/<wbr>SquidFaq/AboutSquid#How_to_<wbr>add_a_new_Squid_feature.2C_<wbr>enhance.2C_of_fix_something.3F</a><br>
<br>
<br>
Alex.<br>
<div><div class="h5"><br>
<br>
> On Tue, Nov 28, 2017 at 7:32 AM, Alex Rousskov wrote:<br>
><br>
>     On 11/27/2017 05:19 PM, Ivan Larionov wrote:<br>
><br>
>     > I see retries only when squid config has 2 parents. If I comment out<br>
>     > everything related to "newproxy" I can't reproduce this behavior anymore.<br>
><br>
>     The posted logs are not detailed enough to confirm or deny that IMO, but<br>
>     I suspect that you are dealing with at least one bug.<br>
><br>
><br>
>     > <a href="https://wiki.squid-cache.org/SquidFaq/InnerWorkings#When_does_Squid_re-forward_a_client_request.3F" rel="noreferrer" target="_blank">https://wiki.squid-cache.org/<wbr>SquidFaq/InnerWorkings#When_<wbr>does_Squid_re-forward_a_<wbr>client_request.3F</a><br>
>     <<a href="https://wiki.squid-cache.org/SquidFaq/InnerWorkings#When_does_Squid_re-forward_a_client_request.3F" rel="noreferrer" target="_blank">https://wiki.squid-cache.org/<wbr>SquidFaq/InnerWorkings#When_<wbr>does_Squid_re-forward_a_<wbr>client_request.3F</a>><br>
>     ><br>
>     >> Squid does not try to re-forward a request if at least one of the following conditions is true:<br>
>     >><br>
>     >> The number of forwarding attempts exceeded forward_max_tries. For<br>
>     >> example, if you set forward_max_tries to 1 (one), then no requests<br>
>     >> will be re-forwarded.<br>
><br>
><br>
>     AFAICT, there is an off-by-one bug in Squid that violates the above:<br>
><br>
>     >     if (n_tries > Config.forward_max_tries)<br>
>     >         return false;<br>
><br>
>     The n_tries counter is incremented before Squid makes a request<br>
>     forwarding attempt. With n_tries and Config.forward_max_tries both set<br>
>     to 1, the quoted FwdState::checkRetry() code will not prevent<br>
>     re-forwarding. There is a similar problem in FwdState::reforward(). This<br>
>     reasoning needs confirmation/testing.<br>
><br>
>     Please note that simply changing the ">" operator to ">=" may break<br>
>     other things in a difficult-to-detect-by-simple-<wbr>tests ways. The correct<br>
>     fix may be more complex than it looks and may involve making policy<br>
>     decisions regarding forward_max_tries meaning. The best fix would remove<br>
>     checkRetry() and reforward() duplication. This code is difficult to work<br>
>     with; many related code names are misleading.<br>
><br>
><br>
>     >> Squid has no alternative destinations to try. Please note that<br>
>     >> alternative destinations may include multiple next hop IP addresses<br>
>     >> and multiple peers.<br>
><br>
>     The fact that Squid sends two requests to the same peer with only one<br>
>     peer address selected suggests that Squid is retrying a failed<br>
>     persistent connection rather than re-forwarding after receiving a bad<br>
>     response. Again, the logs are not detailed enough to distinguish the two<br>
>     cases. I can only see that a single peer/destination address was<br>
>     selected (not two), which is correct/expected behavior. I cannot see<br>
>     what happened next with sufficient detail.<br>
><br>
>     Going forward, you have several options, including:<br>
><br>
>     A. Post a link to compressed ALL,7+ logs to confirm bug(s).<br>
>     B. Fix the broken condition(s) in FwdState. See above.<br>
><br>
>     HTH,<br>
><br>
>     Alex.<br>
><br>
><br>
>     > 2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(220) doAccept: New connection on FD 15<br>
>     > 2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(295) acceptNext: connection<br>
</div></div>>     > on local=<a href="http://0.0.0.0:3128" rel="noreferrer" target="_blank">0.0.0.0:3128</a> <<a href="http://0.0.0.0:3128" rel="noreferrer" target="_blank">http://0.0.0.0:3128</a>> <<a href="http://0.0.0.0:3128" rel="noreferrer" target="_blank">http://0.0.0.0:3128</a>><br>
<div class="HOEnZb"><div class="h5">>     remote=[::] FD 15 flags=9<br>
>     > 2017/11/27 15:53:40.543| 11,2| client_side.cc(2372) parseHttpRequest:<br>
>     > HTTP Client local=<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">127.0.0.1:3128</a> <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
>     <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
>     > remote=<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">127.0.0.1:53798</a> <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>><br>
>     <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>> FD 45 flags=1<br>
>     > 2017/11/27 15:53:40.543| 11,2| client_side.cc(2373) parseHttpRequest:<br>
>     > HTTP Client REQUEST:<br>
>     > ---------<br>
>     > GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> HTTP/1.1<br>
>     > Host: HOST:12345<br>
>     > User-Agent: curl/7.51.0<br>
>     > Accept: */*<br>
>     > Proxy-Connection: Keep-Alive<br>
>     ><br>
>     ><br>
>     > ----------<br>
>     > 2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745)<br>
>     > clientAccessCheckDone: The request GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> is ALLOWED;<br>
>     > last ACL checked: localhost<br>
>     > 2017/11/27 15:53:40.543| 85,2| client_side_request.cc(721)<br>
>     > clientAccessCheck2: No adapted_http_access configuration. default: ALLOW<br>
>     > 2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745)<br>
>     > clientAccessCheckDone: The request GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> is ALLOWED;<br>
>     > last ACL checked: localhost<br>
>     > 2017/11/27 15:53:40.543| 17,2| FwdState.cc(133) FwdState: Forwarding<br>
>     > client request local=<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">127.0.0.1:3128</a> <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
>     <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
>     > remote=<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">127.0.0.1:53798</a> <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>><br>
</div></div><div class="HOEnZb"><div class="h5">>     <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>> FD 45 flags=1,<br>
>     > url=<a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a><br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(258) peerSelectDnsPaths:<br>
>     > Find IP destination for: <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a>' via 127.0.0.1<br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(280) peerSelectDnsPaths:<br>
>     > Found sources for '<a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a>'<br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(281) peerSelectDnsPaths: <br>
>     >  always_direct = DENIED<br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(282) peerSelectDnsPaths: <br>
>     >   never_direct = ALLOWED<br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(292) peerSelectDnsPaths: <br>
>     >     cache_peer = local=127.0.0.3 remote=<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">127.0.0.1:18070</a> <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>><br>
>     > <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>> flags=1<br>
>     > 2017/11/27 15:53:40.543| 44,2| peer_select.cc(295) peerSelectDnsPaths: <br>
>     >       timedout = 0<br>
>     > 2017/11/27 15:53:40.543| 11,2| http.cc(2229) sendRequest: HTTP Server<br>
>     > local=<a href="http://127.0.0.3:57091" rel="noreferrer" target="_blank">127.0.0.3:57091</a> <<a href="http://127.0.0.3:57091" rel="noreferrer" target="_blank">http://127.0.0.3:57091</a>><br>
>     <<a href="http://127.0.0.3:57091" rel="noreferrer" target="_blank">http://127.0.0.3:57091</a>> remote=<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">127.0.0.1:18070</a> <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>><br>
>     > <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>> FD 40 flags=1<br>
>     > 2017/11/27 15:53:40.543| 11,2| http.cc(2230) sendRequest: HTTP Server<br>
>     > REQUEST:<br>
>     > ---------<br>
>     > GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> HTTP/1.1<br>
>     > User-Agent: curl/7.51.0<br>
>     > Accept: */*<br>
>     > Host: HOST:12345<br>
>     > Cache-Control: max-age=259200<br>
>     > Connection: keep-alive<br>
>     ><br>
>     ><br>
>     > ----------<br>
>     ><br>
>     > [SKIPPED 40 seconds until parent closes TCP connection with FIN,ACK]<br>
>     ><br>
>     > 2017/11/27 15:54:20.627| 11,2| http.cc(1299) continueAfterParsingHeader:<br>
>     > WARNING: HTTP: Invalid Response: No object data received for<br>
>     > <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> AKA HOST/<br>
>     > 2017/11/27 15:54:20.627| 17,2| FwdState.cc(655)<br>
>     > handleUnregisteredServerEnd: self=0x3e31838*2 err=0x409b338<br>
>     > <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a><br>
>     > 2017/11/27 15:54:20.627| 11,2| http.cc(2229) sendRequest: HTTP Server<br>
>     > local=<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">127.0.0.3:41355</a> <<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">http://127.0.0.3:41355</a>><br>
>     <<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">http://127.0.0.3:41355</a>> remote=<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">127.0.0.1:18070</a> <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>><br>
>     > <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>> FD 40 flags=1<br>
>     > 2017/11/27 15:54:20.627| 11,2| http.cc(2230) sendRequest: HTTP Server<br>
>     > REQUEST:<br>
>     > ---------<br>
>     > GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> HTTP/1.1<br>
>     > User-Agent: curl/7.51.0<br>
>     > Accept: */*<br>
>     > Host: HOST:12345<br>
>     > Cache-Control: max-age=259200<br>
>     > Connection: keep-alive<br>
>     ><br>
>     ><br>
>     > ----------<br>
>     ><br>
>     > [SKIPPED 40 seconds again until parent closes TCP connection with FIN,ACK]<br>
>     ><br>
>     > 2017/11/27 15:55:00.728| ctx: enter level  0: '<a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a>'<br>
>     > 2017/11/27 15:55:00.728| 11,2| http.cc(719) processReplyHeader: HTTP<br>
>     > Server local=<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">127.0.0.3:41355</a> <<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">http://127.0.0.3:41355</a>><br>
>     <<a href="http://127.0.0.3:41355" rel="noreferrer" target="_blank">http://127.0.0.3:41355</a>><br>
>     > remote=<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">127.0.0.1:18070</a> <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>><br>
</div></div><span class="im HOEnZb">>     <<a href="http://127.0.0.1:18070" rel="noreferrer" target="_blank">http://127.0.0.1:18070</a>> FD 40 flags=1<br>
>     > 2017/11/27 15:55:00.728| 11,2| http.cc(720) processReplyHeader: HTTP<br>
>     > Server REPLY:<br>
>     > ---------<br>
>     > HTTP/1.0 502 Bad Gateway<br>
>     > Cache-Control: no-cache<br>
>     > Connection: close<br>
>     > Content-Type: text/html<br>
>     ><br>
>     > <html><body><h1>502 Bad Gateway</h1><br>
>     > The server returned an invalid or incomplete response.<br>
>     > </body></html><br>
>     ><br>
>     > ----------<br>
>     > 2017/11/27 15:55:00.728| ctx: exit level  0<br>
>     > 2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable:<br>
>     > StoreEntry::checkCachable: NO: not cachable<br>
>     > 2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable:<br>
>     > StoreEntry::checkCachable: NO: not cachable<br>
>     > 2017/11/27 15:55:00.728| 88,2| client_side_reply.cc(2073)<br>
>     > processReplyAccessResult: The reply for GET <a href="http://HOST:12345/" rel="noreferrer" target="_blank">http://HOST:12345/</a> is<br>
>     > ALLOWED, because it matched (access_log stdio:/var/log/squid/access.<wbr>log<br>
>     > line)<br>
>     > 2017/11/27 15:55:00.728| 11,2| client_side.cc(1409) sendStartOfMessage:<br>
>     > HTTP Client local=<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">127.0.0.1:3128</a> <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
</span><span class="im HOEnZb">>     <<a href="http://127.0.0.1:3128" rel="noreferrer" target="_blank">http://127.0.0.1:3128</a>><br>
>     > remote=<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">127.0.0.1:53798</a> <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>><br>
</span><div class="HOEnZb"><div class="h5">>     <<a href="http://127.0.0.1:53798" rel="noreferrer" target="_blank">http://127.0.0.1:53798</a>> FD 45 flags=1<br>
>     > 2017/11/27 15:55:00.728| 11,2| client_side.cc(1410)<br>
>     sendStartOfMessage:<br>
>     > HTTP Client REPLY:<br>
>     > ---------<br>
>     > HTTP/1.1 502 Bad Gateway<br>
>     > Date: Mon, 27 Nov 2017 23:54:20 GMT<br>
>     > Cache-Control: no-cache<br>
>     > Content-Type: text/html<br>
>     > X-Cache: MISS from ip-172-23-18-130<br>
>     > X-Cache-Lookup: MISS from ip-172-23-18-130:3128<br>
>     > Transfer-Encoding: chunked<br>
>     > Connection: keep-alive<br>
>     ><br>
>     ><br>
>     > ----------<br>
><br>
><br>
><br>
><br>
> --<br>
> With best regards, Ivan Larionov.<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">With best regards, Ivan Larionov.</div>
</div>