I'm a moderately experienced web developer, and I haven't managed any high-traffic websites. Generally, I observe that it's the high-traffic websites that go down for maintenance; even stackoverflow.com goes down for maintenance from time to time.
I always wonder: what kind of maintenance do they do? I mean, the whole process is automated:
user request --> web server --> server-side programs --> database server
What is there to maintain?
Usually the highest traffic sites don't go down for maintenance. They're designed so they don't have to. (Depending on the site, that can be very tricky. It's not just a case of running multiple servers, although obviously that's the starting point.)
However, "site down for maintenance" usually means one (or more) of the things described below.
They may want to run updates (or fixes) on many of the different pieces of software running on the server, including (but not limited to) the operating system, the web server itself, the database server, and the application frameworks or language runtimes.
Beyond that, they could also be doing hardware maintenance, such as adding a new hard drive, upgrading a motherboard, putting in faster RAM, or swapping out network cards. There are plenty of things, both hardware and software, that can be upgraded or modified, really.
Now if they have a backup server (or a cluster or something of the sort), this can be transparent, but if it's literally one box serving the pages...well, it pretty much has to go down.
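To make the "transparent" case concrete, here's a minimal sketch of the idea: take servers out of rotation one at a time while the rest keep serving traffic. The helper functions (drain, do_maintenance, enable) are hypothetical placeholders for illustration, not any particular load balancer's API.

```python
import time

SERVERS = ["web1", "web2", "web3"]

def drain(server):           # hypothetical: stop sending new traffic to this server
    print(f"draining {server}")

def enable(server):          # hypothetical: put the server back into rotation
    print(f"re-enabling {server}")

def do_maintenance(server):  # hypothetical: patch, upgrade, reboot, etc.
    print(f"patching {server}")
    time.sleep(1)

for server in SERVERS:
    drain(server)            # the remaining servers keep handling user requests
    do_maintenance(server)
    enable(server)

# With literally one box, there is nothing left to take the traffic,
# so the site simply has to go down.
```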
Since you're coming from a coding background, I'll base my analogy there. Imagine that being a sysadmin is just like programming, except you'll be called on to code in a different language every couple of hours. And sometimes it's Pascal.
Truly, though, it could mean anything. Sometimes a mouse chews its way into a warm place. Or a single point of failure makes itself known. Eliminating downtime is what we pursue ... like writing code that works perfectly on the first compile.
Liken a single server to a running vehicle. If you turn off the vehicle, your 'server' is down.
There are some things you can do while the car is running - add fuel, oil, washer fluid, clean the windshield, change gears, etc.
However, you can't replace the fuel line in the car while it's running - liken fuel to data; you don't want to lose any, or you'll have unhappy customers.
These downtimes vary with the administrators' skill level and the complexity of the changes. On larger, high-traffic sites, the only way this can feasibly happen is if there's a major architecture change: something where, no matter how many servers and redundancies you have, the architecture needs to change all at once.
This is rare for very large systems. I liken it to replacing the fuel line on a running vehicle: for many, it's not feasible (or worth the effort and risk) at certain skill and resource levels. However, places with the skills and resources can pull off the equivalent of a fuel line replacement on a running vehicle. Liken that to an architecture migration; it's the same idea, just far more complex.
It could be any of:
- an upgrade of servers, frameworks, or databases
- moving to a new datacenter and shutting the old servers down so that nobody can connect
- patching of the operating systems or software that runs on those servers
Basically, anything that could make the site unavailable for a certain amount of time.
Regular maintenance items would be things like rebuilding caches, upgrading software and/or templates, doing some data trawling for statistics, and routine maintenance tasks like backups (which work better on quiet systems), along with a variety of other expensive, infrequent tasks.
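As a rough illustration only: for a small, single-box site, that kind of quiet-hours routine might boil down to a script along these lines. The maintenance-page and cache helpers are made-up placeholders, and pg_dump is used purely as an example backup tool, assuming a PostgreSQL database named mysite.

```python
import subprocess
from datetime import date

def enable_maintenance_page():   # hypothetical: start serving the "down for maintenance" page
    print("serving 'down for maintenance' page")

def disable_maintenance_page():  # hypothetical: put the real site back online
    print("site back online")

def rebuild_caches():            # hypothetical: regenerate caches / statistics tables
    print("rebuilding caches and statistics")

enable_maintenance_page()
try:
    # A full database dump runs much faster while nothing is writing to the DB.
    subprocess.run(
        ["pg_dump", "--file", f"backup-{date.today()}.sql", "mysite"],
        check=True,
    )
    rebuild_caches()
finally:
    disable_maintenance_page()
```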
Some tasks just require poring over a lot of data, and it's not really efficient to do that after each change. Recommendation databases are one thing that comes to mind: you don't need up-to-the-second data, and it's rather expensive to calculate common purchase patterns across many different users. This is an N^2-complexity problem with some algorithms, and it tends to take both a lot of data trawling and a lot of memory.
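As a hedged sketch of the kind of batch job being described, here is one naive way to compute common purchase patterns: count how often each pair of items appears in the same order. Comparing every item against every other item in an order is the roughly-N^2 part, and across millions of orders it's exactly the sort of thing you run offline rather than per request. The data shapes and names here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order.

    `orders` is an iterable of item-ID collections, one per order.
    """
    pair_counts = Counter()
    for items in orders:
        # Every pair of distinct items in the order: this inner loop is the N^2 part.
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Toy example: items often bought together become
# "customers who bought X also bought Y" recommendations.
orders = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"lamp", "desk"},
]
print(co_purchase_counts(orders).most_common(3))
```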
Financial institutions may use the downtime to calculate and make interest payments to accounts, or to close outstanding transactions and calculate reconciliation balances. This data in theory should never change after reconciliation, so it makes sense to write it to WORM storage at this point.
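For a flavour of the batch arithmetic involved, here is a minimal sketch of a daily interest accrual step, assuming a simple actual/365 day-count. Real institutions have their own day-count conventions, rounding rules, and posting schedules; the numbers and function below are purely illustrative.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def accrue_daily_interest(balance, annual_rate, days_in_year=365):
    """Accrue one day's interest on a balance (simple actual/365 accrual).

    Hypothetical batch step: shows why this is a nightly job posted during
    the maintenance window rather than something computed per request.
    """
    interest = balance * annual_rate / Decimal(days_in_year)
    return interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(accrue_daily_interest(Decimal("1500.00"), Decimal("0.031")))
# Decimal('0.13') -- one day's interest, posted to the account overnight
```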
Backups are a major item that's often done during downtime, because heavy disk I/O tends to bring even very powerful servers to their knees, and taking the site offline can help speed up the backup process. I remember one organization I was at where they had a very large customer RAID array, and the backup team kept complaining because their backup window for this one customer typically ran 22-24 hours, and at one point 26. A small amount of quiet time can decrease that window substantially.
Defragmenting the disk arrays. It's faster and safer to defrag servers when they are offline, allowing the CPU and disks to focus on that task rather than on running 1,000 websites. It's better to tell people to come back later than to give them a poor user experience.
If it's a Windows server, you can crash it by running defrag while memory usage is over 50%, because at that point Windows starts to rev up the page file. I learned this the hard way.