Let's say I'm in the process of planning the setup of a website. I study sites that offer similar services or might see a similar traffic pattern.
Is there a way to work out, at least roughly, what kind of setup they run: software and/or hardware?
Some things are obvious. If I see .php or .jsp then I already know a bit. But any ideas on how to decipher more?
Maybe where the site is hosted, hardware, platforms...
Phone their admins and ask them?
You can get hosting (or at least network) provider easily enough (traceroute), OS and server software (nmap/HTTP headers), but it really doesn't tell you squat. There's a lot more to making a useful, scalable site than picking the OS and network transit provider. In fact, you could be buying yourself a lot of problems by replicating someone else's architecture -- who knows if they have constant problems with it? You're far better off finding someone who knows how to do this sort of thing and having them work with you to understand your specific problems and how to work around them.
Examining DNS records for all related site hostnames will give you a hint at the topology of the site. You may see multiple IP addresses (which don't necessarily mean multiple physical machines, but often will) on the same or different networks, which may hint at how they distribute the load for redundancy or speed reasons.
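As a rough sketch of the DNS step, something like the following Python could collect the A records for a hostname and bucket them by network. The function names are made up for illustration, and the /24 prefix is only an assumption about where one network ends and another begins; real allocations vary.

```python
import ipaddress
import socket
from collections import defaultdict


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def group_by_network(addresses, prefix_len=24):
    """Group IPv4 address strings by their enclosing network.

    prefix_len=24 is an arbitrary guess, not something DNS tells you.
    """
    groups = defaultdict(list)
    for addr in addresses:
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(network)].append(addr)
    return dict(groups)


# Demonstration on documentation-range addresses (no network access needed).
sample = ["192.0.2.10", "192.0.2.20", "198.51.100.5"]
print(group_by_network(sample))
```

Against a live site you would call `group_by_network(resolve_ipv4(host))` for each hostname it uses (www, static, mail and so on). Several addresses in one network suggest a single location with load balancing; addresses spread over unrelated networks suggest multiple sites or a CDN, but that is inference, not proof.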
Examining the HTTP headers of a site's various services will give you a possible idea of the front-end. Are they using a reverse proxy, such as nginx or Varnish, or are you hitting the web servers directly? Are requests for PHP pages answered by a different server (Apache) than those for static HTML and image files (nginx, lighttpd, etc.)?
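For the header check, a minimal Python sketch might look like this. The helper names and the list of "interesting" headers are my own choices; `X-Varnish` and `X-Cache` are conventions some proxies follow, not standards, and any of these headers can be stripped or faked by the site.

```python
import urllib.request

# Headers that often reveal the server software or a proxy/cache in front of it.
# This list is illustrative, not exhaustive.
INTERESTING_HEADERS = ("server", "via", "x-varnish", "x-cache", "x-powered-by")


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def front_end_hints(headers):
    """Return only the headers that hint at the front-end stack (lower-cased names)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in INTERESTING_HEADERS if name in lowered}


# Demonstration on a hand-written header set (no network access needed).
example = {"Server": "nginx", "X-Varnish": "123 456", "Content-Type": "text/html"}
print(front_end_hints(example))
```

To compare dynamic and static content, run `front_end_hints(fetch_headers(url))` once for a page URL and once for an image or stylesheet URL on the same site and see whether the `Server` value differs.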
Examining SMTP headers from mails sent from a site will give you more hints.
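If you have a mail from the site saved as raw text, the `Received` chain and any mailer header can be pulled out with Python's standard `email` module. The sketch below uses a made-up sample message; the hostnames and the mailer name in it are placeholders, not taken from any real site.

```python
from email import message_from_string


def mail_header_hints(raw_message):
    """Extract the Received hops (newest first) and the X-Mailer value, if any."""
    msg = message_from_string(raw_message)
    # Collapse folded header lines into single-spaced strings.
    hops = [" ".join(value.split()) for value in msg.get_all("Received", [])]
    return {"received": hops, "mailer": msg.get("X-Mailer")}


# Fabricated sample for illustration only.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.25])\n"
    "\tby mx.example.net with ESMTP id abc123;\n"
    "\tMon, 1 Jan 2024 00:00:00 +0000\n"
    "Received: from web01.internal.example.com ([10.0.0.5])\n"
    "\tby mail.example.com with ESMTP;\n"
    "\tMon, 1 Jan 2024 00:00:00 +0000\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: Welcome\n"
    "\n"
    "Body\n"
)

for hop in mail_header_hints(SAMPLE)["received"]:
    print(hop)
```

The hops nearest the bottom of the chain are the ones added closest to the sender, so that is where internal hostnames and private addresses tend to leak, if they leak at all.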
Traceroutes and pings will yield a little more info.
Of course, much of the info gathered will be speculation and guessing on your part, because a well-configured site will not give out too much info about its internal architecture. What you'd be doing is, in essence, the reconnaissance a penetration tester would do. Just make sure not to cross the line and disrupt the site.
From a client-side perspective there's almost nothing useful you could find out, not unless the site has gone out of its way to make this easy for you, and they won't have.
You should be able to get info about the host from DNS, although that is unlikely to help you much in figuring out the platform unless the host is something like Google.
Work through the site and study the generated HTML and script files. By comparing those to the same from sites whose hosting provenance is known, you should be able to make a lot of inferences. This will work particularly well if the site is built on something like WordPress or Movable Type. On custom-written sites (such as the SO Trilogy), it will take a bunch of digging to find breadcrumbs leading to the framework (i.e. ASP.NET MVC), but unless the developers have made a concerted effort to scrub them, clues are likely there.
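A crude way to automate the first pass over the HTML is to look for a generator meta tag and a few well-known path or field names. The Python below is only a sketch: the marker table is a tiny illustrative sample, the function name is invented, and the regex assumes `name` appears before `content` in the tag, which is common but not guaranteed.

```python
import re

# Matches <meta name="generator" content="..."> with name before content.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

# Substrings that tend to appear in pages produced by a given platform.
# Illustrative only; a real table would be far longer.
MARKERS = {
    "/wp-content/": "WordPress",
    "/wp-includes/": "WordPress",
    "__VIEWSTATE": "ASP.NET Web Forms",
}


def fingerprint_html(html):
    """Return a sorted list of platform hints found in an HTML document."""
    hints = set()
    match = GENERATOR_RE.search(html)
    if match:
        hints.add("generator: " + match.group(1))
    for marker, platform in MARKERS.items():
        if marker in html:
            hints.add(platform)
    return sorted(hints)


# Hand-written sample page for demonstration.
page = (
    '<html><head><meta name="generator" content="WordPress 6.4">'
    '<link href="/wp-content/themes/x/style.css"></head></html>'
)
print(fingerprint_html(page))
```

An empty result does not mean the site is custom-built, only that none of these particular breadcrumbs were left in the page.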
I totally agree with Womble. Anyway, it seems like the beginning of a pen test; you may look at some of the tools and techniques mentioned here and use some Scapy magic.
You're going about this completely wrong. Any web site can be achieved in a number of ways. Forget about what others do or use. Start with a definition of what you're trying to achieve. Then determine the best way to get there. Smart IT is not about copying what someone else has done. It's working out the best and most efficient way to get the job done with as little requirement for maintenance as possible. A major factor should be what you (or whoever has to maintain the beast) are comfortable working with.