I know the difference between a router and a switch, but there are a few fuzzy spots in my understanding.
When you uplink one switch into another, do they share MAC address tables? Or is this a vendor-specific function? If they don't share, how do they handle frames addressed to MACs they don't directly control?
What is the largest IP address space that can be effectively handled using only a switched network, and at what point should you consider breaking the network into multiple segments joined by a router?
Which is more architecturally sound: one core router joining many subnets to the Internet or a hierarchy of routers (one per department uplinking to the core)? Or, is it best to give each department a router and then mesh them together into a mini-internet?
Regarding uplinking one switch into another: No, they don't share MAC address tables. Each switch maintains its own bridging table, which it builds by listening to the traffic it receives on a given port. Consider the following example (apologies for the terrible ASCII art):
Host A is connected to Switch 1, Port 1. Host B is connected to Switch 2, Port 1. The two switches are interconnected via Port 2 on both.
Assume at the start that the bridging tables of both switches are empty. Host A wants to send a frame to Host B. (To simplify things, we'll assume that Host A and Host B have static ARP entries for each other, so there is no need to ARP for MAC addresses.)
At this point, Host B receives the frame. When Host B sends a response, the following happens.
In terms of learning MAC addresses, this same process is followed regardless of the number of switches and the number of devices connected to them. As you add more complexity to your switched network (VLANs, Spanning Tree), more subtleties come into play, but the base algorithm remains the same.
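To make the learning process concrete, here is a minimal Python sketch of two switches learning MAC addresses for the Host A / Host B topology described above. The `Switch` class, its `receive` method, and the MAC addresses are made up for illustration; real switches also age entries out and deal with VLANs and Spanning Tree, which this ignores.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Switch:
    """Toy learning bridge (hypothetical class, not a real API)."""
    name: str
    table: dict = field(default_factory=dict)  # MAC address -> port number

    def receive(self, src, dst, in_port, ports):
        """Learn the source MAC, then return the ports to forward out of."""
        self.table[src] = in_port            # learn where the sender lives
        out_port = self.table.get(dst)
        if dst == BROADCAST or out_port is None:
            # Unknown destination (or broadcast): flood everywhere
            # except the port the frame arrived on.
            return [p for p in ports if p != in_port]
        if out_port == in_port:
            return []                        # destination is back the way it came
        return [out_port]                    # known destination: forward to one port


# Illustrative MAC addresses (assumptions, not from the original post).
HOST_A = "aa:aa:aa:aa:aa:aa"  # on Switch 1, Port 1
HOST_B = "bb:bb:bb:bb:bb:bb"  # on Switch 2, Port 1
PORTS = [1, 2]                # Port 2 is the inter-switch link on both

sw1, sw2 = Switch("Switch 1"), Switch("Switch 2")

# Host A -> Host B: both tables start empty, so both switches flood.
out1 = sw1.receive(HOST_A, HOST_B, in_port=1, ports=PORTS)
out2 = sw2.receive(HOST_A, HOST_B, in_port=2, ports=PORTS)

# Host B -> Host A: both switches already know where Host A is.
back2 = sw2.receive(HOST_B, HOST_A, in_port=1, ports=PORTS)
back1 = sw1.receive(HOST_B, HOST_A, in_port=2, ports=PORTS)

print(sw1.table)  # Host A on port 1, Host B on port 2 (the uplink)
print(sw2.table)  # Host A on port 2 (the uplink), Host B on port 1
```

Note that neither switch ever sees the other's table. Switch 1 simply concludes that Host B lives "somewhere behind Port 2", which is all it needs to know.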
Regarding your second and third questions:
2) My personal bias is to minimise switching wherever possible. Spanning Tree is the bane of many professional lives. Add to that the fact that Ethernet itself has no loop protection (frames carry no TTL), so a minor misconfiguration can lead to broadcast storms that only subside once you manually intervene and take links down. Even if your network is small, have at least one router off which all your layer 2 subnets hang; it's just easier in my opinion.
3) It depends very much on the scale of your network, and how much intranet vs. internet traffic you expect to see. If there will be a lot of communication between departments, it may make sense to have a hierarchy of routers so that pure internal traffic does not impact internet access for everyone else. If on the other hand, you expect everyone to access only a common set of services (AD, email) and the internet, then a single core router (or a pair, for redundancy) may be sufficient.
In terms of giving each department a router and meshing them, how is this network to be administered? If there is going to be one administrative IT authority, then just build a hierarchical network; having users served by shared routers won't be a problem. If each department is going to maintain its own IT staff, then a router per department and internal peering may be required, but it will most likely complicate your network design.
I'm going to try to answer all of your questions as clearly as possible.
For your question concerning MACs that they don't directly control, I think you mean the MACs of PCs that aren't directly connected to their ports. Let's take, for example, PC A on my switch and PC B on your switch. Both switches are connected via a standard uplink. Your PCs and your switch are all alone in the networking world when I come along and connect my switch. The PC on my switch needs your PC's MAC address, and to get it, it sends an ARP request as a layer 2 broadcast (the request names your PC's IP address as its target, since that is all my PC knows so far). My switch floods it out of every port except the one it arrived on. It then reaches your switch, which does the same. Your PC then answers my PC, and they both know each other's MAC address. In the process, both switches record the MAC addresses they didn't know.
Remember, switch-only networks operate at layer 2, so they are (theoretically) independent of layer 3. Let's just say that going beyond a /8 (255.0.0.0) is not very reasonable and will take you outside private IP space.
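A quick way to check the /8 remark, sketched with Python's standard `ipaddress` module. It assumes the private block in question is 10.0.0.0/8, which is the largest of the RFC 1918 ranges.

```python
import ipaddress

# Largest RFC 1918 private block (assumed to be the /8 referred to above).
private_block = ipaddress.ip_network("10.0.0.0/8")

print(private_block.netmask)        # 255.0.0.0
print(private_block.num_addresses)  # 16777216 addresses, i.e. 2**24

# One bit wider than /8 no longer fits inside the private block.
wider = private_block.supernet()    # 10.0.0.0/7, which also covers 11.0.0.0/8
still_private = wider.subnet_of(private_block)
print(wider, still_private)         # 10.0.0.0/7 False
```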
I would definitely say a hierarchy of routers, because it gives you clearer configuration and lets you enforce policies on a per-department basis. Cisco agrees with me in the CCNA :-)
Your question is long, I am going to address this part.
Most routers can implement some kind of firewalling. Since switches are layer 2 devices you usually don't do any kind of filtering there. If you need to separate departments for some kind of security, or if you need to implement a different firewall policy for that department it will be easier to enforce this if they are on a separate subnet.
In a hubbed network you are limited to 1024 devices in your collision domain. Even though this no longer technically applies in a switched network I generally try to stick to that as top limit of how many devices I put on a single subnet.
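If you want to turn that rule of thumb into a subnet size, here is a small sketch. The helper names and the 192.168.0.0 example network are just for illustration, and the 1024 ceiling is the personal guideline above, not a hard limit in a switched network.

```python
import ipaddress


def usable_hosts(prefix_len):
    """Usable IPv4 host addresses in a subnet (minus network and broadcast)."""
    return 2 ** (32 - prefix_len) - 2


def largest_subnet_within(limit):
    """Shortest prefix length whose usable host count stays at or below limit."""
    for prefix_len in range(1, 31):
        if usable_hosts(prefix_len) <= limit:
            return prefix_len
    return 30


prefix = largest_subnet_within(1024)
example = ipaddress.ip_network(f"192.168.0.0/{prefix}")  # example network only

print(prefix)                # 22
print(usable_hosts(prefix))  # 1022
print(example.netmask)       # 255.255.252.0
```

So under that guideline, a /22 is about as large as a single subnet gets; a /21 would already allow 2046 hosts.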
You may also want to break up your networks with routers if you are using a protocol that does lots of broadcasting. Broadcasts are normally not passed by a router.
Citation for 1024 device limitation.
Ethernet: The Definitive Guide 3.6 Collision Domain (google books link)
For your second question, see When/why to start subnetting a network?
Generally, stacks share a table between the members (e.g., Juniper's "Virtual Chassis"), as they often have redundant paths, and a standard table can't work.
Otherwise each switch needs its own table, although through protocols like CDP and LLDP they can get more information from their neighbour.
If security's not an issue, then it simply comes down to broadcasts. Desktops (and laptops) tend to be chatty, so there's noticeable broadcast traffic at only a few hundred machines. Well-managed servers (e.g., ARP expiry extended into hours) add almost no broadcast load, so your edge router's throughput might be the limiting factor instead. For low-traffic servers (or an exceptional network), many thousands of servers could easily be on the same broadcast domain.
Both are valid views, however in these days of Layer-3 switches that run OSPF I tend towards the second.