We used to have a filtering/proxy setup configured so that about 500 users were routed through a Unix-based system running Squid with SquidGuard to block questionable content and log what users were doing. What we found was that some people were able to bypass the block by going to free online proxies that use HTTPS. I couldn't find a way to block HTTPS traffic without disabling it altogether, which wasn't going to work for purchasing and HR (or for regular users visiting banking sites).
Is there a way to block or filter HTTPS traffic with Squid/SquidGuard, or with some other open-source monitoring program? How do others deal with this without resorting to commercial appliances?
EDIT: We're not trying to watch the traffic itself; we're trying to monitor the URLs and block them as necessary. We don't want credit card numbers and such. We wanted to know when Johnny was accessing https://schoolssucksoweproxyforyou.com/playboy.com, add the domain to a blockfile, and prevent him from doing it again. See some added comments below...
EDIT2: (Re: purchasing, HR, banking, etc.) You may not think about it that way, but we have a person who does purchasing; our district covers 7 buildings scattered around the county, so we have someone in charge of distributing materials and inventory; we have the same people working with checking and banking records; we have people in charge of the staff union and another for the faculty union; and we have people handling HR information like insurance, injury claims, and liability. There's a lot involved in running schools that I don't think the general public ever thinks about. On top of that, it was considered overbearing to prevent staff/faculty from doing online banking, or to keep students from reaching certain sites for class use (such as web quests) that sometimes required SSL connections to work, so banning HTTPS outright at the router was not feasible.
As for your suggestion about blocking only certain people: I didn't see a way to integrate per-user authentication into the filter. If credentials could be passed along by Windows, so users didn't need to authenticate with yet another password, it would be a workable solution, but everything I found was kludgey and didn't work reliably. Otherwise it becomes a matter of getting people to remember yet another password (or of students and staff stealing or sharing passwords). I also liked having filtering applied evenly across staff and students, rather than a policy where teachers can do X while students aren't considered enough of a person to do the same thing.
In the end I couldn't find a way to get the server to reliably authenticate against Active Directory (to centralize user management and reduce the number of passwords to remember), and any other authentication scheme would mean yet another database to keep in sync - with, at last count, ~1,200 users, and high churn, since every year kids move in and out of the district, graduate, and new ones come in.
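For the record, the closest thing I found was Squid's NTLM helper that ships with Samba's winbind, which is what everyone points to for AD single sign-on. A rough sketch of the squid.conf bits (helper path is from memory and varies by distro, and again, this never worked reliably for us):

    # squid.conf - NTLM pass-through auth against AD via Samba winbind
    # (assumes the box is already joined to the domain via winbind)
    auth_param ntlm program /usr/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
    auth_param ntlm children 10
    acl ad_users proxy_auth REQUIRED
    http_access allow ad_users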
The entire point of HTTPS traffic is that it's encrypted between the server and the end-user so no one else can snoop on it - including your filters. You won't be able to do any content filtering on it. The only HTTPS filtering you'll be able to do is blocking the SSL port to specific IP addresses.
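With stock Squid, that amounts to denying the CONNECT method against a list you maintain by hand. A minimal squid.conf sketch (the ACL name and file path are illustrative):

    # Every HTTPS request through the proxy starts with CONNECT, so denying
    # CONNECT to known-bad destinations blocks the tunnel before it opens.
    acl CONNECT method CONNECT
    acl banned_ssl dstdomain "/etc/squid/blocked-proxies.txt"
    http_access deny CONNECT banned_ssl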
If you whitelist, you'll have loads of false positives - banks you didn't think of, useful sites that require HTTPS to login or access, etc. If you blacklist, you'll have loads of false negatives - new proxy sites pop up every second.
This is something that needs to be addressed at a policy level, not a technical one. If someone's goofing off on porn sites at work and using proxies to get around your filters, HR should be smacking them on the hand and threatening termination if it continues.
Here's a very ugly solution that's implemented by a commercial vendor:
Replace the browsers' trusted CA certificates with your own in-house CA
When a connection is requested to an unknown address, the proxy connects with its own client and fetches the site's cert
Then generate a fake cert for that site, signed by your own CA (sketched below)
The proxy then effectively acts as a MITM (man-in-the-middle)
You can't do that with stock Squid, but it would take me about a day of mod_perl hacking to implement that with Apache.
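To give a feel for the cert-forging step, here is a hedged Python sketch using the pyca/cryptography library; the file names are made up, and a real proxy would also have to cache these certs and serve them in the TLS handshake:

    import datetime

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    # The in-house CA that was pushed into every browser's trust store.
    with open("ca.pem", "rb") as f:
        ca_cert = x509.load_pem_x509_certificate(f.read())
    with open("ca.key", "rb") as f:
        ca_key = serialization.load_pem_private_key(f.read(), password=None)

    def forge_cert(hostname):
        """Return (key, cert) for hostname, signed by the in-house CA."""
        key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        now = datetime.datetime.utcnow()
        cert = (
            x509.CertificateBuilder()
            .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)]))
            .issuer_name(ca_cert.subject)
            .public_key(key.public_key())
            .serial_number(x509.random_serial_number())
            .not_valid_before(now)
            .not_valid_after(now + datetime.timedelta(days=365))
            .add_extension(x509.SubjectAlternativeName([x509.DNSName(hostname)]),
                           critical=False)
            .sign(ca_key, hashes.SHA256())
        )
        return key, cert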
One solution might be to do what IRC servers sometimes do: if they see a connection from XXX.XXX.XXX.XXX, they try to connect back to that IP to see if it's an open proxy server, and if it is, they block that IP.
That's the closest thing I can think of to a fully automatic solution, but it would require work on your end. That, combined with the other suggestions about whitelisting, or just manually looking at the logs and checking whether the remote IP of the HTTPS connection is hosting a proxy site, might be the only solution.
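As a rough sketch of that check in Python (the IP is a placeholder, and note this only catches CONNECT-style relays, not CGI-style web proxies that sit behind an ordinary HTTPS page):

    import socket

    def is_open_connect_proxy(ip, port=443, timeout=5.0):
        """True if ip:port relays an HTTP CONNECT tunnel without credentials."""
        try:
            with socket.create_connection((ip, port), timeout=timeout) as s:
                s.sendall(b"CONNECT www.example.com:443 HTTP/1.1\r\n"
                          b"Host: www.example.com:443\r\n\r\n")
                reply = s.recv(1024).decode("latin-1", "replace")
        except OSError:
            return False
        # An open proxy answers 200; 407 means it wants auth; a plain web
        # server (or a TLS endpoint expecting a handshake) answers otherwise.
        status = reply.split("\r\n", 1)[0]
        return status.startswith("HTTP/") and " 200" in status

    print(is_open_connect_proxy("203.0.113.5"))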
You must NOT monitor the content of HTTPS traffic even if you can, using man-in-the-middle techniques. It weakens the security of the whole system and defeats the point of using HTTPS in the first place. You can, however, monitor the HTTPS traffic externally.
For example, you can know what sites users are connecting to, because all HTTPS traffic through a web proxy starts with the CONNECT method, and that initial request is transmitted in the clear. This information can be used to decide whether or not to allow a connection to continue.
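Concretely, the only thing the proxy (and therefore your logs) sees before the tunnel goes opaque is a request like this, with the destination hostname in plain text (hostname illustrative):

    CONNECT www.example.com:443 HTTP/1.1
    Host: www.example.com:443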
You can also monitor the bandwidth usage of the HTTPS traffic. This can give you insight into whether it is being used to stream videos or for online transactions such as banking.
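A hedged sketch of both ideas against Squid's default (native) access.log format, where the fifth field is the byte count, the sixth the method, and the seventh the destination; the log path is an assumption:

    from collections import Counter

    # Tally bytes per CONNECT destination: big totals suggest streaming,
    # small ones look more like ordinary logins and transactions.
    bytes_per_host = Counter()
    with open("/var/log/squid/access.log") as log:
        for line in log:
            fields = line.split()
            if len(fields) >= 7 and fields[5] == "CONNECT":
                bytes_per_host[fields[6]] += int(fields[4])

    for host, total in bytes_per_host.most_common(20):
        print("%12d  %s" % (total, host))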
You may need an HTTP SSL proxy like DeleGate acting as a man-in-the-middle proxy.
EDIT: Adding another approach that will probably work:
Maybe you can pass all traffic on the SSL port through, and in the meantime start a process that checks whether the target server is an SSL proxy (detectable when no authentication is required), then block it.
Most content-based firewalls use this strategy: pass the traffic first, then check it and block it.
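A sketch of the "block it afterwards" half in Python, assuming the same kind of dstdomain blockfile mentioned elsewhere in this thread (paths are illustrative; "squid -k reconfigure" makes a running Squid re-read its configuration):

    import subprocess

    def block_host(host, blockfile="/etc/squid/blocked-proxies.txt"):
        # Append the offending destination to the blocklist...
        with open(blockfile, "a") as f:
            f.write(host + "\n")
        # ...then tell the running Squid to reload its config and ACL files.
        subprocess.run(["squid", "-k", "reconfigure"], check=True)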
I'm not going to get into ethics here - maybe you only want to do this for sites whose domain you don't "know"... anyway, ethical issues aside:
It is possible. You definitely need to add bits for Squid 2, but AFAIK Squid 3 will do this out of the box, as will some commercial web-filter vendors. A MITM-style attack is generally the only way.
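For what it's worth, on a Squid 3.x build compiled with SSL support the relevant knobs look roughly like this; directive names changed between 3.1, 3.2, and 3.5 (peek/splice), so treat this as a version-dependent sketch with illustrative paths:

    # squid.conf - dynamically generate per-site certs signed by your own CA
    http_port 3128 ssl-bump generate-host-certificates=on cert=/etc/squid/ca.pem key=/etc/squid/ca.key
    sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/squid/ssl_db -M 4MB
    ssl_bump server-first all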
I know this question is old, but I believe that you should use a DNS proxy. You can therefore blacklist all domains you don't want (be it SSL or not).
dnsmasq works well, for example; you can configure it as a transparent DNS proxy. There's also an open-source Linux distribution called "Endian Firewall" that you can use for this, and it's extremely easy to set up.
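A minimal dnsmasq.conf sketch of the blacklisting part (the upstream resolver and domain are placeholders, and you would also want to block outbound port 53 so clients can't just point at another DNS server):

    # Forward ordinary lookups upstream...
    server=192.0.2.53
    # ...but answer blacklisted domains (and all their subdomains) with an
    # unroutable address, which breaks HTTP and HTTPS to them alike.
    address=/schoolssucksoweproxyforyou.com/0.0.0.0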
What you certainly won't be able to do without MITM is produce reports on HTTPS sites (how much they were visited, how often, by whom, etc.).