The Problem
SSL certificate providers are moving from certificates signed with 1024-bit RSA keys to the new 2048-bit RSA key standard. One article explains the background and significance of the issues this can cause, and VeriSign has published its own article about the migration.
For our particular web application, hundreds of customer systems connect to us via SOAP API calls over HTTPS and via HTTPS POSTs. A secondary issue is end users who browse to the HTTPS URLs. For end users, the upgrade from a VeriSign 1024-bit cert to a new 2048-bit cert should not have a massive impact, as most browsers and operating systems will trust the new root CA. The legacy systems that connect to us are a different story: developed up to 10 years ago, they run on a variety of hardware and OS flavours and have varying certificate management strategies. If they don't trust the new root CA, the impact is catastrophic, as systems that have been running with no problems for years will suddenly stop working. The fix is simple, but it requires an administrator to apply it on their server.
Potential Solutions
- Delay the upgrade for as long as possible (tricky, as the certs will expire in the next year and no reputable providers issue 1024-bit certs).
- Contact each customer and walk them through the upgrade process (possible but difficult, as the technical contact who did the original implementation may no longer be around).
How are other organisations handling this problem?
One of the features of the Certificate Authority system is that things change. Roots expire, new roots come into existence, certificates are revoked, and CAs go bankrupt after compromises. An update mechanism really needs to be in place to handle these changes.
There is a third option available: use a different CA that offers 2048-bit certificates signed by a root that was already around 10 years ago. That isn't VeriSign, but it may be someone like Thawte.
If that proves unworkable, you'll need to go with a blended solution. Provide a test site with the new CA you picked for clients to validate against, and give plenty of notice that the certificate will change and will break up to 25% of clients. Build as much update documentation as you can think of, and build more as clients run into issues. Stress that updating CA information is something all systems need to do once in a while, so documenting this is a good thing in the long run; it's a best practice.
You need an option 2+1. You need to set up a test/alternate site with a new certificate for people to qualify against. They'll never know whether they have your particular root without a bunch of work, but it is likely that just about anyone in their data handling group can test a new link for issues.
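To make "test a new link" concrete, here is a minimal sketch of the kind of check a customer could run against the test site. It is an illustration, not a definitive tool: the function name `check_trust` and its return labels are made up for this example, and it only exercises the trust store that Python uses on that machine. A legacy client written in another stack (Java, .NET, and so on) should be tested with its own HTTP library, since it may keep its own CA list.

```python
import socket
import ssl


def check_trust(host, port=443, cafile=None, timeout=10.0):
    """Classify a TLS handshake with host:port.

    Returns one of 'trusted', 'untrusted', 'tls-error', 'unreachable'.
    (Hypothetical helper; the labels are illustrative only.)
    """
    # With cafile=None the platform's default trust store is used;
    # pass a PEM bundle path to test against a specific set of roots.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake, including chain and hostname verification,
            # happens inside wrap_socket.
            with context.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError:
        # The chain did not validate (unknown root, expired, wrong name, ...).
        return "untrusted"
    except ssl.SSLError:
        # Some other TLS failure, e.g. no protocol version in common.
        return "tls-error"
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return "unreachable"
```

A customer would call something like `check_trust("test.example.com")`, where the hostname is a placeholder for whatever test site you publish. An "untrusted" result is the signal that their CA list needs updating before the cut-over date.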
Give people a timeline until the new certificate has to be in place due to the hard renewal date and offer help as you suggested. Warn them that help will be limited in the last weeks since you don't have 1000 people just sitting around hoping they get to work this week. ;-)
Without a hard date people won't move on it, and without an easy test most won't bother testing to see if they are affected by the change.
If you can get the alternate site working as production, you can just transition people there permanently and know from your logs on the old server how many clients are still out of compliance.
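As a rough sketch of the log check, the snippet below counts requests per client address on the old server. It assumes an access log where the client IP is the first whitespace-separated field (as in the common Apache/nginx formats); the function name `count_clients` is hypothetical, and your log layout may differ.

```python
from collections import Counter


def count_clients(log_lines):
    """Count requests per client IP.

    Assumes the IP is the first whitespace-separated field of each line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def report(path):
    """Print the clients still using the old site, busiest first."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for ip, hits in count_clients(handle).most_common():
            print(f"{ip}\t{hits}")
```

Running `report` on the old server's log once a week gives a shrinking list of addresses to chase as the deadline approaches.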
As you said, there are 2 options.
I would prepare for option 2 and announce this change to your customers early, combined with your technical assistance...
If you know the IPs of your customers' legacy systems, you could do some tricks that would help you migrate on a customer-by-customer basis instead of making a hard change on day X for everybody.
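The answer does not say which tricks it has in mind. One possible reading is to route each connection by source address, so that customers who have confirmed the new root are sent to an endpoint serving the new certificate while everyone else stays on the old one. The sketch below shows only the lookup logic. The network list, the labels, and the function name `backend_for` are all assumptions for illustration, and the actual routing would live in your load balancer or firewall rather than in application code.

```python
import ipaddress

# Hypothetical list of customer networks that have confirmed the new root.
# These are documentation-only address ranges, not real customers.
MIGRATED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def backend_for(client_ip, migrated=MIGRATED_NETWORKS):
    """Return which endpoint a client address should be sent to."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in migrated):
        return "new-cert"
    # Unknown or not-yet-migrated clients keep the old certificate.
    return "old-cert"
```

Moving a customer then means adding their network to the list once they have passed the test, which keeps a failure limited to one customer at a time.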