[squid-users] Theoretically speaking about a proxy service

ngtech1ltd at gmail.com
Thu Jun 29 02:20:57 UTC 2023


Hey Everybody,

I have seen a couple of free proxy providers like:
Urban VPN
NordVPN
ClearVPN

And a couple of other proxy services.

A long time ago I wrote the article:
A Proxy for each Internet user! The future!

https://www1.ngtech.co.il/wpe/2016/05/02/proxy-per-internet-user-is-it-realistic/

And I was just wondering to myself about a thing or two regarding HTTP proxies.

Most of the VPN services use and support OpenVPN, WireGuard and other VPN protocols at the routing level.
These are simple, need some kind of "smart" CGNAT to operate, and are cheaper than an HTTP proxy since they work at a lower
level of the connection.
For example, you can give a static private IP to the client in your system, apply all the relevant routing and NAT rules, and the connection
will be initiated automatically with the relevant external IP.
Also, if you need an IP address you can just spin up an "exit" node on any public cloud and add it to the pool of routes.

But there is another option, the proxy way of doing things.
Either SOCKS or a plain HTTP proxy.

But let's start with a plain proxy to simplify things.

Let's say I want to spin up a couple of Squid "exit" nodes, and I would like to have a frontend that will route traffic based on authentication details.
I have seen an answer (unverified since 2013) at:
https://access.redhat.com/solutions/259903

To make it all work we first need to assume that

never_direct allow all

will force all CONNECT requests to a cache_peer (since there aren't many plain HTTP services left other than MS updates and a couple of others).
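
For example, a minimal frontend squid.conf sketch (the peer address and name here are placeholders):

  # send everything, including CONNECT tunnels, to a parent exit node
  cache_peer 10.200.0.11 parent 3128 0 no-query no-digest name=exit1
  # never go direct, so CONNECT is forced through the cache_peer
  never_direct allow all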

There is also another problem: how do we route clients, based on their credentials, from a frontend to the backend exit nodes / cache peers?
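
One way I can think of, assuming basic authentication on the frontend (the helper path and user names are just examples):

  # authenticate users on the frontend
  auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwd
  acl user_alice proxy_auth alice
  acl user_bob proxy_auth bob
  http_access allow user_alice
  http_access allow user_bob
  http_access deny all

  # two backend exit nodes
  cache_peer 10.200.0.11 parent 3128 0 no-query no-digest name=exit1
  cache_peer 10.200.0.12 parent 3128 0 no-query no-digest name=exit2

  # the username is the routing vector
  cache_peer_access exit1 allow user_alice
  cache_peer_access exit1 deny all
  cache_peer_access exit2 allow user_bob
  cache_peer_access exit2 deny all

  never_direct allow all

The backends can then simply allow the HUB's internal network, so the credentials never have to leave the frontend.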

There are a couple of issues with this kind of setup.
Since the client connects to the proxy service in plain text it can be intercepted, so we will assume that the user has some secure way to access the proxy,
i.e. WireGuard, OpenVPN, SSTP or another IPsec-based solution, or any other alternative method such as a trusted network...

The next step in this setup is securing the connections between the proxies.
For this we need some kind of network of connections between the HUB (or HUBs) and the exit nodes.
If both the HUB and the exit node have a public IP address (or sit behind a 1:1 NAT) and can communicate directly, they can use WireGuard or OpenVPN to secure their connections.
There are a couple of other things that need to be sorted out, namely the provisioning of the exit nodes, their registration, and a status check for each.
Any of the HUBs needs to be able to handle a couple of these tasks with a bit of automation and a couple of UUID generators.
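
A minimal WireGuard sketch for the HUB-to-exit-node links (keys, addresses and the endpoint are placeholders):

  # /etc/wireguard/wg0.conf on the HUB
  [Interface]
  Address = 10.200.0.1/24
  ListenPort = 51820
  PrivateKey = <hub-private-key>

  [Peer]
  # exit node 1
  PublicKey = <exit1-public-key>
  AllowedIPs = 10.200.0.11/32

  # /etc/wireguard/wg0.conf on exit node 1
  [Interface]
  Address = 10.200.0.11/24
  PrivateKey = <exit1-private-key>

  [Peer]
  PublicKey = <hub-public-key>
  Endpoint = hub.example.net:51820
  AllowedIPs = 10.200.0.0/24
  PersistentKeepalive = 25

With something like this in place, the 10.200.0.0/24 addresses can double as the dedicated proxies CIDR mentioned later.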

I wanted to build such a tiny setup but I lack a couple of things in the specs for such a system.
I have seen this nice post:
* https://www.blackhatworld.com/seo/developer-needed-to-build-scripts-to-create-proxies-using-haproxy-or-squid-advanced-expertise-required.1300167/

So I am trying to mimic a WWW net.
The first thing is to have two to three ifconfig.io nodes with a very tiny footprint that I will use to test the setup.
The next thing is the basic WWW net, i.e. a couple of sites with BGP, each with a /24(?) CIDR behind it and a central /24(?) for all of them.
Since it's a lab, it's preferable that all of these have a very small resource footprint.
We can use a simple container network and the following piece of software:
* https://github.com/georgyo/ifconfig.io
* https://hub.docker.com/r/elicro/ifconfig.io
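
For example, to spin up a couple of test targets on a container network (the port mapping is an assumption; adjust to whatever the image actually listens on):

  docker network create wwwlab
  docker run -d --name www1 --network wwwlab -p 8081:8080 elicro/ifconfig.io
  docker run -d --name www2 --network wwwlab -p 8082:8080 elicro/ifconfig.io

  # each node should echo back the client's address
  curl http://127.0.0.1:8081/ip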

For the tests we might need a root CA, but that's not really relevant since -k is good enough for most basic curl tests, because... we assume the connection is secured already.

Networks that we can use, private only(?):
192.168.0.0/16
10.0.0.0/8
172.16.0.0/12

We can also use the CGNAT CIDR:
100.64.0.0/10

* https://www.rfc-editor.org/rfc/rfc6598

And just for those who need it:
* https://www.ngtech.co.il/ipcalc/
* https://hub.docker.com/r/elicro/ipcalc


So first we will need one central hub for automation, registry and management.
It will use a couple of internal CIDRs and a couple of 1:1 NAT address spaces.

The end result should be a couple of tiny clients that will run a couple of curl tests with a username and password that will be the routing vector for the setup.
So we will have one main HUB, and this hub will have one port that will listen for all proxy requests with usernames and passwords.
So basically we need an office and an internet connection, an idea, and all the automation tools to implement it.
Currently AWS and many other providers have enough automation tools that can take some of the heavy lifting off the table.
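
The client test could then be as simple as (host, port and credentials are placeholders):

  # alice should egress via exit1, bob via exit2
  curl -k -x http://alice:secret1@hub.example.net:10000 https://www1.lab/ip
  curl -k -x http://bob:secret2@hub.example.net:10000 https://www1.lab/ip

If the two runs report different external IPs, the routing vector works.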
So now for the DB and registration system.
For each exit node we need a UUID and a couple of specific services (a sketch follows the list):
* health check
* external IP verification
* registration against the hub
* VPN to the central HUB? (complexity... but flexibility regarding the NAT connection-tracking limit of the OFFICE/Proxy IP)
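
A hypothetical heartbeat/registration script for each exit node (the hub API endpoint is my assumption, just to illustrate the flow):

  #!/bin/sh
  # read the UUID that was injected via cloud-init
  UUID="$(cat /etc/exit-node/uuid)"
  # external IP verification against one of the ifconfig.io nodes
  EXT_IP="$(curl -s http://www1.lab/ip)"
  # register / report status to the central HUB
  curl -s -X POST "http://10.200.0.1/api/register" \
       -d "uuid=${UUID}&ip=${EXT_IP}&status=up"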

In the central office we need, let's say, an HTTP proxy listening on port 10000, which will be port-forwarded to a single Squid proxy server with a floating IP and a redundant server.
If we had a secure channel between the proxies and the central office it would be much simpler to register new proxies
(assuming each proxy receives its UUID, registration and VPN details via cloud-init or any other initialization method).

So we would have a DB which will hold a UUID and configuration details, prepared in advance, for the registration, health checks and status.
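
A minimal sketch of such a registry, here with sqlite3 just for the lab (the schema is my assumption):

  sqlite3 /var/lib/hub/registry.db <<'EOF'
  CREATE TABLE exit_nodes (
      uuid        TEXT PRIMARY KEY,
      internal_ip TEXT NOT NULL,        -- address inside the proxies CIDR
      external_ip TEXT,                 -- verified egress address
      status      TEXT DEFAULT 'down',  -- up/down per health check
      last_seen   TIMESTAMP
  );
  CREATE TABLE user_routes (
      username  TEXT PRIMARY KEY,
      peer_name TEXT                    -- matches a cache_peer name= in squid.conf
  );
  EOF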

The squid.conf of the proxy should be created dynamically, since the network changes...
Unless we assume a specific capacity and an internal connection between the HUB and the proxy.
If we assume an internal connection between the HUB and the proxies, we can dedicate a CIDR to the proxies.
Then we can create a pretty "static" squid.conf (a big one...) and change the configuration only in the DB, so that
helpers decide which proxy is up or down and which of the static cache_peers a username and password will use.
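
A sketch of that helper-driven approach (the helper script and its DB lookup are assumptions):

  # squid.conf: one external ACL that asks the DB which peer a user belongs to
  external_acl_type peer_map ttl=60 children-max=5 %LOGIN /usr/local/bin/peer_lookup.sh
  acl to_exit1 external peer_map exit1
  acl to_exit2 external peer_map exit2
  cache_peer 10.200.0.11 parent 3128 0 no-query no-digest name=exit1
  cache_peer 10.200.0.12 parent 3128 0 no-query no-digest name=exit2
  cache_peer_access exit1 allow to_exit1
  cache_peer_access exit2 allow to_exit2
  never_direct allow all

  #!/bin/sh
  # peer_lookup.sh: for each "username peer_name" line squid sends,
  # answer OK if the DB currently routes that user to that peer.
  while read -r user peer; do
      assigned="$(sqlite3 /var/lib/hub/registry.db \
          "SELECT peer_name FROM user_routes WHERE username='$user';")"
      [ "$assigned" = "$peer" ] && echo OK || echo ERR
  done

Spinning a proxy up or down then becomes a DB update instead of a reconfigure.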

What do you think about this? How would it work?
Squid can handle this kind of load with a couple of workers and a couple of scripts, but building such a setup is a bit of a job.
Let's say I assume a network of 10 proxies which will spin up and down, how would it work?
How many resources are required to run and test such a setup?

I believe a demo can all be done with Linux network namespaces on a single-node setup, but it's not like the real world... For example:
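
A single-node netns sketch of one HUB-to-exit link (names and addresses are placeholders):

  ip netns add hub
  ip netns add exit1
  ip link add veth-hub type veth peer name veth-exit1
  ip link set veth-hub netns hub
  ip link set veth-exit1 netns exit1
  ip -n hub addr add 10.200.0.1/24 dev veth-hub
  ip -n exit1 addr add 10.200.0.11/24 dev veth-exit1
  ip -n hub link set veth-hub up
  ip -n exit1 link set veth-exit1 up
  ip -n exit1 ping -c1 10.200.0.1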
What OS would you use in such a setup?
These days any Linux OS requires at least 512 MB of RAM to run nicely, so I assume an Alpine-based setup would be nice, but...
It's not like RHEL systems; there are scripts that need to be written and supervised (compared to systemd), etc.

Let me know if what I wrote seems reasonable enough.

( 6.0.3 here I'm coming, here since 3.2 beta )

Eliezer


