We have training rooms where Windows XP is normally installed via PXE. The "normal" DNS/DHCP infrastructure runs on Windows servers. The training room has its own VLAN (different from the one the Windows servers are in), so there is most probably an IP helper for DHCP requests active on the Cisco router that all PCs in that room are connected to.
Now we wanted to convert some of the PCs to Linux instead. The idea was to put our own laptop with a DHCP server into the room's VLAN and override the "normal" DHCP response. We expected this to work, since a DHCP server attached directly to that VLAN should respond faster than the "normal" DHCP server located some hops away.
It turned out that this did not work. We had to manually release the lease on the original DHCP server to get it working.
On the laptop we could see the client requesting its IP address, and "our" DHCP server sending NACKs in reply to the request for the Windows-assigned IP. Before that, our server had sent its own offer.
Old Question: Why did this not work out as expected? What is making the PC regain its old lease?
Update 2012-08-08:
The regain issue is covered by the DHCP RFC, which explains why the PC regains its old lease.
We now release the IP on the Windows DHCP server before giving it another try.
Again - the Windows DHCP server wins.
I suspect that the DHCP client uses some algorithm to determine the "best" DHCP answer. The new question is:
How does the client choose the "best" answer?
Assuming the router is still acting as a DHCP relay and forwarding the request to your original server, the client kept its old address simply because the Windows DHCP server told it to go ahead and use that IP. In this instance the DHCPNACK from the new server is irrelevant: a DHCP client will consider all responses, and since it got an offer from the Windows DHCP box, it's perfectly happy to use it.
How a client reacts to multiple DHCP answers is vendor-specific, even firmware-specific.
Variants I have seen over the years are:
1) Accept the first answer, regardless of whether it is an ACK or a NACK.
2) Take the first ACK, ignore NACKs completely.
3) Take the last ACK received within a set time-interval (usually 5-10 seconds).
Example: Some years ago we had issues with Ricoh MFPs.
We had 2 DHCP servers. One supplied the addresses, the other only additional DHCP options. The 2nd server always answered first.
The Ricohs used variant 1) even if the first offer contained only DHCP options. Ricoh changed it to variant 2) with a firmware update after we explained the problem to them.
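To make the difference between the three variants concrete, here is a minimal Python sketch. It is not how any particular client is implemented; the `Reply` class, the server names and the arrival times are all made up for illustration, loosely modelled on the situation in the question (a nearby laptop answering quickly with a NACK, a relayed Windows server answering a bit later with an ACK).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reply:
    server: str     # hypothetical label for the answering server
    kind: str       # "ACK" or "NAK"
    arrival: float  # seconds after the client sent its request (made-up values)


def first_any(replies: List[Reply]) -> Optional[Reply]:
    """Variant 1: accept whatever arrives first, ACK or NAK."""
    return min(replies, key=lambda r: r.arrival, default=None)


def first_ack(replies: List[Reply]) -> Optional[Reply]:
    """Variant 2: ignore NAKs, take the earliest ACK."""
    acks = [r for r in replies if r.kind == "ACK"]
    return min(acks, key=lambda r: r.arrival, default=None)


def last_ack_in_window(replies: List[Reply], window: float = 5.0) -> Optional[Reply]:
    """Variant 3: take the latest ACK that arrived within the time window."""
    acks = [r for r in replies if r.kind == "ACK" and r.arrival <= window]
    return max(acks, key=lambda r: r.arrival, default=None)


# Hypothetical scenario: the local laptop is fast but answers with a NAK,
# the relayed Windows server is slower but answers with an ACK.
replies = [
    Reply(server="laptop", kind="NAK", arrival=0.01),
    Reply(server="windows", kind="ACK", arrival=0.05),
]

for policy in (first_any, first_ack, last_ack_in_window):
    chosen = policy(replies)
    print(policy.__name__, "->", chosen.server, chosen.kind)
```

With these made-up inputs only a variant 1) client would be stopped by the laptop's NACK; variant 2) and 3) clients both end up with the Windows server's ACK, no matter how much faster the laptop answers.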
If nothing else helps - RTFM (read the fine manual). In this case the first one was the hit.
RFC 2131 outlines DHCP-operations.
Section 1.6 states that DHCP must:
Now the interesting question is how that design goal is being achieved on a client that has no knowledge of its past. Section 3.2 outlines:
So a DHCP server holding an active lease gets precedence by using a shortcut in the protocol.
From then on, the DHCP server on the laptop is ignored by the client.
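A related detail that helps explain why the client keeps getting the same address back: as I read section 4.3.1 of RFC 2131, a server answering a DHCPDISCOVER prefers the client's current binding, then its previous (expired or released) binding, then the address the client asked for, and only then a fresh address from the pool. The following Python sketch shows that ordering under simplifying assumptions; the function name, the dictionaries and the addresses are hypothetical and not taken from any real DHCP server.

```python
import ipaddress
from typing import Dict, Optional, Set


def choose_offer_address(
    client_id: str,
    current: Dict[str, str],    # active bindings: client id -> address
    previous: Dict[str, str],   # expired/released bindings: client id -> address
    free_pool: Set[str],        # addresses not currently allocated
    requested: Optional[str] = None,
) -> Optional[str]:
    """Pick the address to offer, following the preference order described above."""
    # 1. The client's current binding wins.
    if client_id in current:
        return current[client_id]
    # 2. Otherwise reuse the previous binding if that address is still free.
    prev = previous.get(client_id)
    if prev is not None and prev in free_pool:
        return prev
    # 3. Otherwise honour the requested address if it is free.
    if requested is not None and requested in free_pool:
        return requested
    # 4. Otherwise hand out any free address (lowest one here, an arbitrary choice).
    return min(free_pool, key=ipaddress.ip_address, default=None)


# Made-up example data.
pool = {"10.0.5.23", "10.0.5.40"}

# Active lease: the same address comes back.
print(choose_offer_address("pc1", {"pc1": "10.0.5.23"}, {}, pool))
# Lease released: the server still remembers the previous binding.
print(choose_offer_address("pc1", {}, {"pc1": "10.0.5.23"}, pool))
```

Under these assumptions, releasing the lease on the server does not by itself change which address that server offers; it only changes which rule produces it.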
So the solution in our case will probably be (I will update this when we actually test it):
The new question should probably be posted as a separate question - the title doesn't fit at all with most of the body.
In any case, with regard to how a client chooses which offer to go with, in the case where it has no current lease: it's up to the client, but in every DHCP client implementation that I'm aware of, it's a simple race.
RFC 2131 covers this:
There's an IETF draft out there that seems dead that would have added configurability to the selection process, and also mentions the lackluster client implementations (of over a decade ago, but not much has changed):
Having two DHCP servers providing service to the same network with different configuration just results in races, which is not desirable from a reliability or predictability perspective. There's really no reason you can't get your single DHCP server to provide what you need.