<div dir="ltr">The sites I am talking about check the User-Agent header and make sure that the user agent belongs to a well-known browser, i.e., a browser that they support. Each such browser (Firefox, Chrome, Safari, and Edge, for example) sends its headers in a particular order, and that order differs from browser to browser. This applies to well-known headers such as Accept, Accept-Language, Accept-Encoding, Content-Length, Host, Connection, Referer, Cookie, etc. The sites then compare the header order of the received request with the standard header order of the browser named in the User-Agent.<div><br></div><div>This detects poorly written bots (i.e., ones that do not account for this header order) that send requests through the Python requests library or, for that matter, through a low-level HTTP library in any language.</div><div><br></div><div>So, keeping the header order sent by the client intact would prevent these sites from dropping requests proxied through Squid. I know for a fact that they do not intend to block proxies.</div><div><br></div><div>Could you point me to where I should look in the Squid source code, specifically the part that handles the header data sent by the client?</div><div><br></div><div>With regards,</div><div>Sonya Roy.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 23, 2017 at 12:02 AM, Alex Rousskov <span dir="ltr"><<a href="mailto:rousskov@measurement-factory.com" target="_blank">rousskov@measurement-factory.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 06/22/2017 11:49 AM, Sonya Roy wrote:<br>
<br>
> I noticed that Squid changes the header order received from the client<br>
> before sending it to the origin server.<br>
><br>
> I assume this is because Squid parses the header data, adds some<br>
> headers depending on the config file, and then recreates the header data.<br>
<br>
</span>IIRC, modern Squids change a header field position when the received<br>
field is deleted and then added back. This is typical for hop-by-hop<br>
headers such as Connection, but there are other reasons for Squid to<br>
delete and add a header field. When the value of the added field is the<br>
same as the value of the removed field, such pointless "editing" looks<br>
like mindless "reordering" to the outside observer.<br>
<br>
The two actions (field deletion and addition) may happen in a single<br>
piece of code or may be separated by lots of code and even time.<br>
Preventing pointless editing in the former cases is straightforward, but<br>
the latter cases are difficult to handle. Correct avoidance of pointless<br>
editing may improve performance and, if it does, can be considered a<br>
useful optimization on its own, regardless of your use case.<br>
<span class=""><br>
<br>
> Is there any way to prevent this?<br>
<br>
</span>Not without changing Squid code (or adding more proxies). However,<br>
before we even talk about code changes, we should clarify the problem we<br>
are dealing with. The questions below will guide you.<br>
<br>
It is probably much easier to ensure some fixed field send order<br>
(regardless of the received order) than to preserve the received order.<br>
Will a fixed order (e.g., always alphabetical) address your use case?<br>
This feature would hurt performance, but since it could be disabled by<br>
default, you might be able to convince others to accept it if you have a<br>
very compelling, specific, and detailed use case.<br>
<span class=""><br>
<br>
> I am asking because some sites detect bots using the header order, and<br>
> they drop any such connection. So they unintentionally block Squid<br>
> proxies even if it's not being used by a bot.<br>
<br>
</span>Are you implying that bots often change header field order between their<br>
requests? Or that bots often use a different (fixed) header field order<br>
than the (fixed) field order used by non-bots? Preserving received order<br>
may help in the former case but not in the latter case.<br>
<br>
Also, do those blocking sites pay attention to all headers or just<br>
end-to-end headers?<br>
<br>
Please note that there are many other ways to detect a proxy, so if a<br>
site wants to block proxies rather than bots, then it is probably<br>
pointless to fight it (or, at least, the Squid Project should not).<br>
<br>
<br>
HTH,<br>
<br>
Alex.<br>
</blockquote></div><br></div></div>