[squid-dev] [PATCH] assertion failed: Write.cc:41: "!ccb->active()"
Eliezer Croitoru
eliezer at ngtech.co.il
Mon Mar 14 01:30:30 UTC 2016
I would like to respond on some of the things in the post related to V4.
First 3.5 is very stable for a very long time now.
It have couple bugs in it and it will be good to somehow make them gone
but I must highlight couple things which might not be clear.
Atomic patches are great and they solve things in a very good way.
But every bug has it's own "life" and we can understand that it would
not be possible to write a perfect product without couple bugs\errors\typos.
I like many believe in perfection but from the last test(RAM ONLY) I
ran, it seems that the hardware is not really ready to specific heat.
The DDR2 ECC cards started pooping out of their socket in the middle of
a simple test when it's pretty cold in the room...
+1 for trunk while I still have couple doubts about what I am
understanding about take1 and take2 steps.
I will try to see how the service works after the patches will be
applied to V4.
I have couple more build nodes and the automation learning keeps me busy.
Eliezer
On 10/03/2016 23:35, Alex Rousskov wrote:
> On 03/10/2016 12:14 PM, Christos Tsantilas wrote:
>
>> I am attaching two patches for this bug.
>
> I will re-summarize the problem we are dealing with using higher-level
> concepts so that it is easier to grok what Christos is talking about:
>
> 1. Ftp::Client cannot deal with more than one FTP command at a time.
>
> 2. Ftp::Server must _delay_ reading the next FTP command due to #1.
>
> 3. Delaying is complicated because Ftp::Server receives responses from
> "Squid" (ICAP, errors, caching, etc.), and not Ftp::Client itself:
> Ftp:Server may receive a response for the current FTP command
> long _before_ Ftp::Client is done dealing with that same command!
> Christos has highlighted a few specific cases where that happens.
>
> [ In HTTP, this problem is much smaller because the two Squid HTTP
> sides do not share much state: New Http::Client objects are created
> as needed for the same ConnStateData object while the old ones
> are finishing their Http::Client jobs. ]
>
> 4. The current Squid code (v3.5 and trunk) contains a mechanism for #2,
> but that mechanism does not handle a now-known race condition. The
> bug leads to the Write.cc assertion. Our take1 patch fixes that bug.
>
> 5. While working on #4, we realized that the existing mechanism for #2
> relies on Ftp::Client existence and other conditions that may not
> materialize. If things go wrong, Ftp::Server may get stuck waiting
> for Ftp::Client that either did not exist or did not end the wait.
> Our take2 patch revises the mechanism for #2 to be much more robust.
> We do not know of any specific way to hit the bugs fixed by take2,
> but that does not mean it is (and will remain) impossible.
>
> Take2 also relays FTP origin errors to the FTP user in more cases.
>
>
>> One simple for squid-3.5 (t1
>> patch) and one more complex (t2 patch). The simple patch solve the bug
>> for now, but may leave other similar bugs in squid.
>
> Amos, do you want us to port take2 to v3.5? The take1 patch for v3.5 is
> enough to fix the known assertion. Take2 fixes that assertion as well,
> but it is bigger because it also fixes design problems that may lead to
> other bugs in v3.5. Which one do you want in v3.5?
>
> For trunk/v4, there is no doubt in my mind that we should use take2 (or
> its polished variant).
>
>
> The rest is my take2 review.
>
>
>> + bool clientSideWaitingForUs; ///< whether the client is waiting for us
>
> Let's avoid confusing "client side" and "client" terms in new code,
> especially when it is Ftp::Server that is waiting:
>
> /// whether we are between Ftp::Server::startWaitingForOrigin() and
> /// Ftp::Server::stopWaitingForOrigin() calls
> bool originWaitInProgress;
>
>
>> +void
>> +Ftp::Relay::swanSong()
>> +{
>> + if (clientSideWaitingForUs) {
>> + CbcPointer<ConnStateData> &mgr = fwd->request->clientConnectionManager;
>> + if (mgr.valid()) {
>> + if (Ftp::Server *srv = dynamic_cast<Ftp::Server*>(mgr.get())) {
>> + typedef UnaryMemFunT<Ftp::Server, int> CbDialer;
>> + AsyncCall::Pointer call = asyncCall(11, 3, "Ftp::Server::stopWaitingForOrigin",
>> + CbDialer(srv, &Ftp::Server::stopWaitingForOrigin, 0));
>> + ScheduleCallHere(call);
>
> and
>
>> void
>> Ftp::Relay::serverComplete()
>> {
>> CbcPointer<ConnStateData> &mgr = fwd->request->clientConnectionManager;
>> if (mgr.valid()) {
>> + if (clientSideWaitingForUs) {
>> + if (Ftp::Server *srv = dynamic_cast<Ftp::Server*>(mgr.get())) {
>> + typedef UnaryMemFunT<Ftp::Server, int> CbDialer;
>> + AsyncCall::Pointer call = asyncCall(11, 3, "Ftp::Server::stopWaitingForOrigin",
>> + CbDialer(srv, &Ftp::Server::stopWaitingForOrigin, ctrl.replycode));
>> + ScheduleCallHere(call);
>> + clientSideWaitingForUs = false;
>> + }
>> + }
>
>
> Let's move this nearly duplicated code into a new
> Ftp::Relay::stopOriginWait(int code) method _and_ always clear
> clientSideWaitingForUs/originWaitInProgress there.
>
>
>> void
>> Ftp::Server::writeForwardedReply(const HttpReply *reply)
>> {
>> Must(reply);
>>
>> + if (master->waitingForOrigin) {
>> + delayedReply = reply;
>> + return;
>> + }
>> +
>
> If this method is called twice while waitingForOrigin is true, should we
> throw an exception, honor the first reply, or honor the last one? Please
> adjust the code as need. *If* honoring the last reply is the best
> strategy (which is what take2 does), add a comment:
>
> delayedReply = reply; // and forget any previous ones
>
>
>> + debugs(33, 5, "Transfering data in progress, waitting server side to finish");
>
> I suggest using "waiting for Ftp::Client data transfer to end" for
> brevity sake. This will also fix the two spelling errors in the above text.
>
>
>> + // Currently only data upload and data download requests are
>> + // waiting for the origin:
>> + // Must(master->serverState == fssHandleUploadRequest ||
>> + // master->serverState == fssHandleDataRequest);
>
> True, but our code does not _rely_ on that, right? I would remove that
> comment to avoid the implication that it does rely on that and we are
> just afraid to check.
>
> If you decide to keep this comment or enable these Must()s, please move
> all that _after_ waitingForOrigin is cleared.
>
>
>> if (master->serverState == fssHandleDataRequest) {
>> + if (!master->userDataDone) {
> ...
>> + originDataDownloadAbortedOnError = (originStatus > 400);
>> + return;
>> + }
>> +
>> + completeDataExchange();
>> + } else {
>> + if (delayedReply != NULL) {
>> + writeForwardedReply(delayedReply.getRaw());
>> + delayedReply = NULL;
>> + }
>> + }
>
>
> The above logic looks correct to me, but I feel like I am reading an
> inside-out code. Please consider this instead:
>
> // HttpReply in fssHandleDataRequest lacks status code
> if (master->serverState == fssHandleDataRequest)
> originDataDownloadAbortedOnError = (originStatus > 400);
>
> if (!master->userDataDone) {
> ... your big comment here ...
> // HttpReply in fssHandleDataRequest lacks status code
> if (master->serverState == fssHandleDataRequest)
> originDataDownloadAbortedOnError = (originStatus > 400);
>
> debugs(33, 5 "too early to write the response");
> return;
> }
>
> if (delayedReply) {
> writeForwardedReply(delayedReply.getRaw());
> delayedReply = nullptr;
> } else {
> completeDataExchange();
> }
>
>
> Please note that my "HttpReply in fssHandleDataRequest lacks status
> code" is based on your IRC comments, but it does not mesh well with your
> "If the server side aborted with an error before we are done, we need to
> record it" source code comment which seems to imply that we should use
> originStatus in all "server aborted first" cases and not just those
> where serverState is fssHandleDataRequest. Please adjust the final
> comment as needed.
>
>
>> + originDataDownloadAbortedOnError = (originStatus > 400);
>
> To minimize questions/doubts, please include 400 unless it is a
> successful status.
>
>
>> + //Adjust our reply if the server responded with an error:
>> + if (master->userDataDone == 226 && originDataDownloadAbortedOnError)
>> + master->userDataDone = 451;
>
> Please add a debugs() statement here so that we know why the code was
> changed during triage.
>
>
>> - MasterState(): serverState(fssBegin), clientReadGreeting(false), userDataDone(0), waitForOriginData(false) {}
>> +MasterState(): serverState(fssBegin), clientReadGreeting(false), userDataDone(false), waitingForOrigin(false) {}
>
> Unnecessary and wrong initial value change for userDataDone.
>
>
>> + /// whether we have to wait the transfer on the Squid-origin data connection
>> + /// to be finished
>
> Let's shorten that to "whether we wait for the origin data transfer to end".
>
>> + /// whether the transfer aborted with an error on server side
>> + bool originDataDownloadAbortedOnError;
>
> Avoid "server side" with "whether the origin data transfer aborted".
>
>
>> + /// Set when the HttpReply can not be forwarded right now to the client
>> + /// and must be delayed unless the server side is done.
>
> Simplify and describe the member meaning, not what the code does with
> it: "a response which writing was postponed until stopWaitingForOrigin()".
>
>
>> - /// whether the transfer on the Squid-origin data connection is not over yet
>> - bool waitForOriginData;
>> + /// whether we have to wait the transfer on the Squid-origin data connection
>> + /// to be finished
>> + bool waitingForOrigin;
>
> To reduce "global" or "shared" state, if you can now move
> waitingForOrigin to Ftp::Server, you should do that IMO.
>
>
> All of these are pretty much cosmetic changes that do not alter the
> proposed patch functionality AFAICT. IMO, if there are no objections,
> the polished patch should go into trunk (see above regarding v3.5).
>
>
> Thank you,
>
> Alex.
>
> _______________________________________________
> squid-dev mailing list
> squid-dev at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-dev
>
More information about the squid-dev
mailing list