[squid-users] Squid as an education tool

Sat Feb 10 21:53:03 UTC 2024

Hey Francesco and others,

First, thanks for the direction.

I was thinking about using generic, readily available tools as much as possible.
Also, in an education setting the point is that this is not an intercept proxy (with or without bump),
which simplifies some aspects of the setup.

I will try to write the general specs of the project from my point of view.
Since the goal is educating and not enforcing a policy, we can start by defining the age of the kids as low as 5-6 or even lower.
Due to the age of the kids there is a baseline policy that must be enforced, i.e. a couple of standard, well-known categories.
With this in mind we need a DB setup that will host these categories and will be performant enough for high load, i.e. schools.
Since the law in most if not all countries prohibits nudity to a degree and also prohibits the demonstration of reproductive activities
in both animals and humans, it's pretty clear that any exception to these should be possible only for professional staff who are allowed by the law
to open these doors for very special cases (which I know to exist).
There are also other activities and categories which are known to be harmful for specific ages and should be blocked by policy.
We can divide the filtering policy into levels: domains, URLs, and content inside a page or a dynamic app which is either embedded or sourced in another way.
Domains and URLs are the best known and most commonly filtered, and many tools are available to block these.
The difficulty is with systems and sites which are not based on static content, and with others whose content is
streamed inside a WebSocket or delivered in another way, such as content chunked over multiple URLs or customized requests and responses.

On this specific project I want to address only the basics, which are domains and maybe URLs.
Because of that, and because the Internet is far ahead of where it was in 1985 or 2000, the depth of the education session is restricted:
the proxy will be used only to demonstrate that there are bad actors on the Internet.
There are also other categories that many would probably like to add to the list, such as malware sites.

I believe that the right way is to use a forward proxy which uses usernames to authenticate and identify the user.
This makes the whole setup a bit simpler to build, and it relies on the kids or teenagers actively participating
in the setup and agreeing to the terms of use based on their trust in the teachers and parents.
We also need to show some trust in the kids to allow them to be open in the session.

From my point of view the architecture should be something like this:
* Proxy
* DB (SQL or another)
* Users Web portal (app)
* Admins Web Portal (app)
* Blockpages (static content with a touch of JS)
* A set of external helpers (auth, dstdomain matcher, time limit, dns rbl checker)
* Audit system

The assumption is that only authenticated users can use the proxy, i.e. no username, no Internet, even for Windows and AV updates.
We also assume that the admins of the proxy do not need to override the basic policies because they have access to unrestricted Internet.
Authentication can be done using the existing tools with a MySQL DB which can be integrated with the web portal (not AD or LDAP).
The DB for the dstdomain/URL blacklists should be fast enough to allow almost real-time updates, i.e. with a TTL on the order of 5 to 10 seconds.
Every domain which should be blocked by the policy is a "must bump" one, while a domain which is allowed by the policy should get a "no bump".
There are a couple of layers of blacklists and whitelists (first match wins, from left to right):
  Top-level (never allowed), campus wide customized blacklist (for testing), campus wide customized whitelist (for testing), user customized blacklist, user customized whitelist, campus wide blacklist, campus wide whitelist
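
To make the "first match from left to right" order concrete, here is a minimal sketch in Python. The layer names, the allow-by-default fallback and the parent-domain matching are my assumptions for illustration, not part of the spec above.

```python
# Layers in the order listed above; the first layer that contains the domain decides.
# The short layer names are hypothetical labels for this sketch.
LAYERS = [
    ("top-level", "deny"),
    ("campus-custom-blacklist", "deny"),
    ("campus-custom-whitelist", "allow"),
    ("user-blacklist", "deny"),
    ("user-whitelist", "allow"),
    ("campus-blacklist", "deny"),
    ("campus-whitelist", "allow"),
]


def candidates(domain):
    """Yield the domain and each parent domain: a.b.c -> a.b.c, b.c, c."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def decide(domain, lists, default="allow"):
    """Return (verdict, layer_name) for the first layer that matches the domain.

    `lists` maps a layer name to a set of domains. The default verdict for
    a domain found in no list is an assumption of this sketch.
    """
    for name, verdict in LAYERS:
        entries = lists.get(name, set())
        if any(c in entries for c in candidates(domain)):
            return verdict, name
    return default, None
```

With this ordering a user whitelist entry wins over the regular campus blacklist, but nothing can override the top-level list.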

The user can manage their own lists via the web portal, but not the top-level and campus lists.
There is also a section in the web portal which allows the user to contact the content administrators about any
list which is not user customized, such as the top-level and the campus wide lists.
The expectation from the content administrators is to really understand the user's interaction with them and not just to enforce the policy.
The content admin is also required to have above-average technical knowledge of how the Internet works.
This includes both the IP level and the application level, such as how TLS and firewall piercing work.
The expectation is that all changes in any of the lists will be logged in the audit log.
Also, any "action" in the web portal will be logged in the audit log.
The audit is required by law to prevent bad actions from being done in an unsupervised manner.
Due to this the proxy structure and config are fixed and cannot be changed by anyone, even the sys admins.
For the system to be effective, the only way to access the DB is through an audited web portal.
Since the structure of the DBs is pretty simple, we can simplify access to it via a very simple API.
The API should include both single-entry actions (add/modify) and bulk actions (for big lists).
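
As a rough illustration of such an API, the sketch below writes every list change and its audit record in the same transaction. SQLite stands in for whichever DB is chosen, and the table and function names are hypothetical.

```python
import json
import sqlite3
import time


def open_db(path=":memory:"):
    """Create the two tables used by this sketch (the schema is an assumption)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "list_name TEXT, domain TEXT, PRIMARY KEY (list_name, domain))"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "ts REAL, actor TEXT, action TEXT, detail TEXT)"
    )
    return db


def _audit(db, actor, action, detail):
    db.execute(
        "INSERT INTO audit VALUES (?, ?, ?, ?)",
        (time.time(), actor, action, json.dumps(detail)),
    )


def add_entry(db, actor, list_name, domain):
    """Single-entry action: add (or overwrite) one domain in one list."""
    with db:  # the change and its audit record are committed together
        db.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?)",
            (list_name, domain.lower()),
        )
        _audit(db, actor, "add", {"list": list_name, "domain": domain.lower()})


def bulk_add(db, actor, list_name, domains):
    """Bulk action: add many domains to one list, with one audit record."""
    rows = [(list_name, d.lower()) for d in domains]
    with db:
        db.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?)", rows)
        _audit(db, actor, "bulk_add", {"list": list_name, "count": len(rows)})
    return len(rows)
```

Keeping the audit insert inside the same transaction means a list change cannot be committed without its log record.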

I believe that such a setup can be implemented with containers and in an HA architecture.

The actual DBs for the lists which I have considered are:
* MySQL/MariaDB
* PostgreSQL
* MSSql
* SquidGuard
* ufdbguard
* SquidBlocker
* DNS Rbl

The limitation of SquidGuard, ufdbguard and DNS RBL services is that the lists need to be recompiled before they can be used.
For lists which should be recompiled every hour or so we can create a CI/CD pipeline which compiles
the lists DB on a dedicated system and publishes the precompiled files to public storage, i.e. S3-compatible storage or git.
A check for list changes can run every 5 minutes to catch emergency updates, while regular updates are published only once an hour.
The above idea can work with ufdbguard, SquidGuard or any DNS RBL system.
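
One cheap way to do the 5 minute change check is to compare a digest of the source list with the digest of the last published build. This is only a sketch and the function names are hypothetical.

```python
import hashlib


def list_digest(domains):
    """SHA-256 over the normalized list, independent of order, case and blank lines."""
    normalized = sorted({d.strip().lower() for d in domains if d.strip()})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def needs_rebuild(source_domains, published_digest):
    """True when the source list differs from the last published build."""
    return list_digest(source_domains) != published_digest
```

The pipeline would then recompile and publish only when `needs_rebuild` returns True, or on the hourly schedule.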

As for the user and campus dynamic lists, these should be stored and managed in a DB such as a key-value store or any SQL DB,
which doesn't require compilation to begin with.
If the dynamic lists DB is small enough per user or campus, it would be possible to use a TTL of 5-15 seconds on the dstdomain
external helper to reduce the number of times the "slow" queries against the DB happen.
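
A minimal sketch of such an external ACL helper in Python is below. It assumes Squid is configured to send "username domain" on each line (for example via an `external_acl_type` line with `ttl=10` and the login and destination format codes, whose exact spelling depends on the Squid version), that concurrency is disabled, and that `is_listed` is a hypothetical callback doing the DB lookup.

```python
def handle_line(line, is_listed):
    """Answer one helper request of the assumed form "<user> <domain>".

    OK  = the ACL matches (the domain is on the user's list)
    ERR = the ACL does not match
    BH  = the request could not be processed
    """
    parts = line.split()
    if len(parts) != 2:
        return "BH"
    user, domain = parts
    return "OK" if is_listed(user, domain.lower()) else "ERR"


def serve(stdin, stdout, is_listed):
    """Read requests line by line and write one answer per line, unbuffered."""
    for line in stdin:
        stdout.write(handle_line(line, is_listed) + "\n")
        stdout.flush()


# As a real helper this would be started with something like:
#   import sys; serve(sys.stdin, sys.stdout, my_db_lookup)
# where my_db_lookup is the (hypothetical) function that queries the lists DB.
```

Squid caches each answer for the configured TTL, so the helper and the DB behind it only see a request when the cached answer has expired.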
The other option is to use some kind of in-memory caching service such as Memcached or Redis and to cache the response per
domain per user for 300 seconds, i.e. the response for the user "eliezer" and "www.example.com" will be stored under the key "eliezer://www.example.com",
and if the lists are small enough it would probably be simple to even trigger a prefix cleanup for the whole "eliezer://" namespace.
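
The key scheme and the prefix cleanup could look roughly like the sketch below. A plain in-process dict stands in for Memcached or Redis here, and the class and method names are hypothetical.

```python
import time


class VerdictCache:
    """Per-user, per-domain verdict cache with a TTL (300 seconds by default)."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the expiry can be tested
        self._store = {}

    @staticmethod
    def key(user, domain):
        # Key layout from the text above: "<user>://<domain>"
        return f"{user}://{domain.lower()}"

    def get(self, user, domain):
        k = self.key(user, domain)
        item = self._store.get(k)
        if item is None:
            return None
        verdict, expires = item
        if self.clock() >= expires:
            del self._store[k]  # expired, force a fresh DB lookup
            return None
        return verdict

    def put(self, user, domain, verdict):
        self._store[self.key(user, domain)] = (verdict, self.clock() + self.ttl)

    def purge_user(self, user):
        """Drop every cached verdict in the "<user>://" namespace."""
        prefix = f"{user}://"
        stale = [k for k in self._store if k.startswith(prefix)]
        for k in stale:
            del self._store[k]
        return len(stale)
```

The web portal would call `purge_user` right after a user edits their lists, so a change takes effect without waiting for the TTL to run out.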
Currently none of the list DBs that I know of allows a query that tells whether a dstdomain is in a category or a set of categories.
With such a service we can divide the user lists into 2 separate searches/steps:
* customized dstdomain list match
* customized dstdomain to set of categories match
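
The two steps could be sketched as follows. `categories_of` is a hypothetical lookup that returns the set of categories for a domain, which is exactly the service that, as far as I know, does not exist yet in the current list DBs.

```python
def match_user_policy(domain, user_domains, user_categories, categories_of):
    """Return which step matched: "domain", "category" or None.

    Step 1: the domain is directly on the user's customized dstdomain list.
    Step 2: the domain belongs to a category the user has listed.
    """
    d = domain.lower()
    if d in user_domains:
        return "domain"
    if categories_of(d) & user_categories:
        return "category"
    return None
```

The same function can back either a whitelist or a blacklist; the caller decides what a match means.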

It can work for both white and black lists.

Working with pre-compiled lists or a fast enough service would allow the system to work fast enough and probably scale.

I have all the Squid knowledge required for such a system but I need some help with the other moving parts.

I am open to any comments and suggestions about the technical or other aspects of the setup.

Thanks,
Eliezer Croitoru
ngtech1ltd at gmail.com
+972-5-28704261

From: squid-users <squid-users-bounces at lists.squid-cache.org> On Behalf Of Francesco Chemolli
Sent: Friday, February 9, 2024 12:00 PM
To: Marcus Kool <marcus.kool at urlfilterdb.com>
Cc: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Squid as an education tool

Hi Eliezer, Marcus,
  what you describe seems very similar to a captive portal, just with a very dynamic allowlist policy.
I'm confident that it can be implemented with Squid, a few helpers, and a side webserver plus a small website.
In fact, it would probably be a nice project to release to the community if it were built to be generic enough.

On Fri, Feb 9, 2024 at 9:23 AM Marcus Kool <marcus.kool at urlfilterdb.com> wrote:
Hi Eliezer,

I am not aware of a tool that has all functionality that you seek so you probably have to make it yourself.
I know that you are already familiar with ufdbGuard for Squid to block access, but you can also use ufdbGuard for temporary access by including a time-restricted whitelist in the configuration file 
and doing a reload of the ufdbGuard configuration.  The reload does not interrupt the function of the web proxy or ufdbGuard itself.

Marcus

On 09/02/2024 03:41, ngtech1ltd at gmail.com wrote:
> Hey Everybody,
>
> I am just releasing the latest 6.7 RPMs and binaries while running a couple of tests, and I was wondering if this has been done before.
> When I look at proxies, in most cases they are used as a policy enforcer rather than an education tool.
> I believe in education as one of the top priorities compared to enforcing policies.
> The nature of policies depends on the environment and the risks, but eventually understanding the meaning of the policy
> contributes a lot to the cooperation of the user or an employee.
>
> I have yet to see a solution like the following:
> Each user has a profile which, when a policy block is received, prompts the user with an option to temporarily allow
> the specific site or domain.
> Also, I have not seen an implementation which allows the user to disable or lower the policy strictness for a short period of time.
>
> I am looking for such implementations, if they already exist, to run education sessions with teenagers.
>
> Thanks,
> Eliezer
>
> _______________________________________________
> squid-users mailing list
> mailto:squid-users at lists.squid-cache.org
> https://lists.squid-cache.org/listinfo/squid-users

-- 
    Francesco