[squid-dev] ICAP REQMOD request and response structure expectations?

Amos Jeffries squid3 at treenet.co.nz
Tue Sep 29 10:16:08 UTC 2015


On 29/09/2015 12:57 p.m., Eliezer Croitoru wrote:
> I have been working on an ICAP service and eventually I found out that
> some of the issues I have been struggling to resolve are just there
> because the library implementer only partially read the RFC or did not
> fully consider the code he was writing.
> 
> So after reading the ICAP RFC (for who knows how many times) I found out
> that squid obeys it but seems to alter the requests a bit.
> So I wanted to verify, as much as possible, that I understand correctly
> what is possible.
> For HTTP requests squid seems to replace the PATH in the original
> request's first line with a full URI.

No, to the "effective Request URI" as defined in RFC 7230. In practice
that means the absolute-URI (or "full URL") on all requests except
OPTIONS and CONNECT.

<http://tools.ietf.org/html/rfc7230#section-5.5>

NP: Squid was doing this long before it was documented in the RFCs.
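
As a rough illustration of what "effective request URI" means, here is a
minimal Python sketch loosely following RFC 7230 section 5.5. It is not
Squid's code. The function name, the fixed "http" default scheme, and
taking the authority straight from the Host header are simplifying
assumptions made for the example.

```python
def effective_request_uri(method, request_target, host, scheme="http"):
    """Rebuild the effective request URI from the request line and Host.

    Hypothetical helper for illustration only; a real proxy also takes
    listening-port configuration and TLS state into account.
    """
    # asterisk-form (OPTIONS *): authority from Host, empty path.
    if request_target == "*":
        return f"{scheme}://{host}"
    # authority-form (CONNECT): the target itself is the authority.
    if method == "CONNECT":
        return f"{scheme}://{request_target}"
    # origin-form: authority from Host, path and query from the target.
    if request_target.startswith("/"):
        return f"{scheme}://{host}{request_target}"
    # absolute-form: the target already is the effective request URI.
    return request_target
```

With this, a request line of "GET /index.html HTTP/1.1" plus
"Host: example.com" yields http://example.com/index.html, which is the
kind of URL an ICAP service should expect to see from Squid.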


> I have tried a couple of times to verify this, and that is how it works.
> I remember that there might have been an RFC that allows the use of a
> full URL in the path of the request's first line, and I know that it is
> supported by many web servers.

What do you mean "in the path" ?

Like this?
  http://example.com/http://example.net/

 - has always been allowed. There is nothing special about
"/http://example.net/" to prevent it being a series of path segments, or
folders in a filesystem somewhere.


Or do you mean software sending the absolute-URL where you personally
think a relative-URL is supposed to go?
 If so, it is your expectation that was wrong. HTTP/1 has always defined
request-target as being optionally an absolute-URI, with a preference
for relative-URL on port 80 and 443 messages (except on CONNECT and
OPTIONS methods).

> 
> So would it be expected to be always like that in squid? since it
> applies to tproxy and forward proxy mode I assume it will be the same
> for everything else including reverse proxy mode.
> Another case is the CONNECT method, there I know that in a forward proxy
> mode a URI is not being used in the path but a domain\ip:port is there
> always.


The criteria for the _4_ request-target URL types ("forms") are defined at
<http://tools.ietf.org/html/rfc7230#section-5.3>
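
To make the four forms concrete, here is a small Python sketch that
labels a request-target by form. The function name and the simple
string tests are assumptions for illustration; a real parser validates
the full grammar from that RFC section.

```python
def request_target_form(method, target):
    """Return the RFC 7230 section 5.3 form name for a request-target.

    Hypothetical helper: it classifies by shape only and does not
    validate the target against the URI grammar.
    """
    if target == "*":
        return "asterisk-form"    # only meaningful with OPTIONS
    if method == "CONNECT":
        return "authority-form"   # host:port, no scheme and no path
    if target.startswith("/"):
        return "origin-form"      # path, optionally followed by ?query
    return "absolute-form"        # full URI, as sent to proxies
```

Note that "/http://example.net/" is origin-form under this test, since
it starts with a slash.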


> 
> Now to the main question:
> In case I modify a request to perform a URL-rewrite-like operation,
> that is, to replace a url_rewrite helper, I can do one of two
> things.
> - I can either modify the full request and transparently send the client
> to another page.
> - or use the same request and append a response such as 302 or 307
> redirection.

The *proper* way to do it is a 30x redirect. But that relies on the
redirected-to URL being accessible to the client.
  In that #2 case you just have to emit the 30x message properly, with a
Location header and a UI-suitable payload both agreeing with each other
(semantically, if not absolutely).
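
A minimal Python sketch of such a message follows, with the Location
header and the HTML payload pointing at the same URL. The function name,
the HTML wording, and the 302 default are assumptions for illustration.
Wrapping the result in an ICAP REQMOD response (Encapsulated header,
chunked body) is not shown.

```python
from html import escape

def build_redirect(location, status=302, reason="Found"):
    """Build a raw HTTP redirect whose header and body agree on the URL.

    Hypothetical helper; assumes `location` is an ASCII absolute URL.
    """
    # Refuse line breaks so the URL cannot inject extra header lines.
    if "\r" in location or "\n" in location:
        raise ValueError("line breaks are not allowed in Location")
    body = (
        "<html><body><p>The document has moved "
        f'<a href="{escape(location, quote=True)}">here</a>.</p></body></html>\n'
    ).encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Location: {location}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```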

Note that 300, 301, 302, 303, 307, 308, 426, 451 and 511 are all
redirect (or equivalent) responses that may be more (or less) suitable
depending on the specific situations.


> 
> The second case is simple to implement, but the first one (changing the
> original request) means that I am changing the URL. What does squid
> expect me to change in the request?

The message request-target field (what is commonly called the message
URL), and any headers which embed parts of the URL.

If you change the URL using an absolute-URL on the adapted request
message, Squid will deal with the Host header itself.

But there is also Origin, Authority, Forwarded-For, etc. which might
exist depending on what sub-protocol or extension is being transmitted
over HTTP.

Maybe uploaded content, though that is probably rare outside of PUT.


> or what response does it expect?

For case #1, to make it transparent you have to filter the response
headers: anything like Location, Content-Location, Forwarded-For which
contains the URL or portions of it.

Also the payload: any content object that might contain URLs. HTML, JS,
and JSON are the worst. But also PS, PDF, images, videos, SWF, etc.

If any of those get through on responses there is a chance the user or
some automated script might use it. What that chance is varies from
large to ignorable depending on the format type and what the client
software is.
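
For the header part only, a minimal Python sketch of the idea: map URLs
in URL-bearing response headers from the rewritten location back to the
one the client asked for. The function name, the header set, and the
prefix-based mapping are assumptions for illustration. In ICAP terms
this would run as a response-modification (RESPMOD) step, and it does
nothing about URLs inside the payload.

```python
# Response headers assumed here to carry a full URL as their value.
URL_HEADERS = {"location", "content-location"}

def restore_url_headers(headers, rewritten_prefix, original_prefix):
    """Swap the rewritten URL prefix back to the client-visible one.

    Hypothetical helper; `headers` is a list of (name, value) tuples.
    """
    restored = []
    for name, value in headers:
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() in URL_HEADERS and value.startswith(rewritten_prefix):
            value = original_prefix + value[len(rewritten_prefix):]
        restored.append((name, value))
    return restored
```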


Sorry. Content adaptation is complex if you plan to do it seriously.

> Does squid expect me to change the URI only in the request first line or
> also the Host header?

Yes, both.

> 
> It is not a "critical" question since for now I replace them both, but
> if squid expects only the URI in the first request line to be changed
> there is no point to change the Host header.

See above. But you may not want to make this behaviour depend on
Squid. ICAP is by design portable between proxies and other types of
ICAP clients.
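
A portable service can simply update both places. Here is a minimal
Python sketch that rewrites the request-target and the Host header
together. The function name and the input shape (a header block as one
CRLF-joined string without the trailing blank line) are assumptions for
illustration. Other URL-bearing headers such as Origin are left
untouched.

```python
from urllib.parse import urlsplit

def rewrite_request(head, new_url):
    """Replace the request-target and Host header with values from new_url.

    Hypothetical helper; `head` is the request line plus headers,
    joined with CRLF, and `new_url` is an absolute URL.
    """
    lines = head.split("\r\n")
    method, _old_target, version = lines[0].split(" ", 2)
    new_host = urlsplit(new_url).netloc
    out = [f"{method} {new_url} {version}"]
    for line in lines[1:]:
        name, sep, _value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Keep Host consistent with the authority of the new URL.
            out.append(f"Host: {new_host}")
        else:
            out.append(line)
    return "\r\n".join(out)
```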

> 
> Thanks In Advance,
> Eliezer
> 
> * I am thinking about documenting a couple of things about ICAP in the
> WIKI with an example ICAP service.

Our wiki is probably not the best place. Unless you were just
concentrating on the squid.conf parts, and AFAICS that is covered already.

You might want to get in contact with Christos about improving c-icap
documentation or the ICAP forum for adding to their docs for the service
internals or how-to documentation.

Amos

