Websites that supply ISO files for download will often give the MD5 checksums of those files, which we can use to confirm that a file has downloaded correctly and has not been corrupted.
Why is this necessary? Surely the error correcting properties of TCP are sufficient. If a packet isn’t received correctly, it will be retransmitted. Doesn’t the very nature of a TCP/IP connection guarantee data integrity?
There are probably a zillion reasons why one should check the md5sum, but a few come to mind:
And it only takes a few seconds anyway.
As has been noted by others, there are many possibilities for data corruption where any checksum at the transport layer cannot help, such as corruption happening already before the checksum is calculated at the sending side, a MITM intercepting and modifying the stream (data as well as checksums), corruption happening after validating the checksum at the receiving end, etc.
If we disregard all these other possibilities and focus on the TCP checksum itself and what it actually does to validate data integrity, it turns out that this checksum is far from comprehensive at detecting errors. The choice of algorithm reflects the need for speed given the hardware of the time (the late 1970s).
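This is how the TCP checksum is calculated (per RFC 793): the checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header and data.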
This means that any corruption that balances out when the data is summed this way will go undetected. This allows several categories of corruption, but as a trivial example: changing the order of the 16-bit words will always go undetected.
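The reordering blind spot is easy to reproduce. The following is a minimal sketch of a one's complement checksum in the style of RFC 1071, not a full TCP implementation: it leaves out the pseudo-header and the TCP header fields, and the function names and sample bytes are made up for illustration.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Return the 16-bit one's complement sum of data, read as big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte on the right
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words."""
    return ~ones_complement_sum16(data) & 0xFFFF


# Illustrative payloads (assumed sample data, not taken from a real capture).
original = b"\x12\x34\xab\xcd\x00\x01\xff\xfe"
reordered = b"\xab\xcd\x12\x34\xff\xfe\x00\x01"    # same 16-bit words, shuffled
compensated = b"\x12\x35\xab\xcc\x00\x01\xff\xfe"  # one word +1, another word -1
flipped = b"\x12\x35\xab\xcd\x00\x01\xff\xfe"      # a single bit changed

print(internet_checksum(original) == internet_checksum(reordered))    # True
print(internet_checksum(original) == internet_checksum(compensated))  # True
print(internet_checksum(original) == internet_checksum(flipped))      # False
```

The single flipped bit is caught, but both the shuffled words and the pair of changes that cancel each other out produce the same checksum as the original data.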
In practice, it catches many typical errors but does not at all *guarantee* integrity. It also helps that layer 2 does its own integrity checks (e.g. the CRC32 of Ethernet frames), albeit only for transmission on the local link, so many cases of corrupted data never even get passed to the TCP stack.
Validating the data using a strong hash, or preferably a cryptographic signature, is on a whole different level in terms of ensuring data integrity. The two can barely even be compared.
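For reference, checking a download against a published hash takes only a few lines. This is a rough sketch using Python's standard `hashlib`; the helper names `file_digest` and `matches_published` are hypothetical, and SHA-256 is only assumed as the default (pass `"md5"` when the site publishes MD5 sums).

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that a multi-gigabyte ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with the hex string copied from the download page."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A call would look like `matches_published("image.iso", "<hash from the website>", "md5")`, where the file name is a placeholder. This only detects accidental corruption; it does not verify a cryptographic signature, which would need a separate tool.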
TCP/IP does guarantee data integrity*. But it does not guarantee that 100% of a file has been downloaded. There can be many reasons why this could happen. For example, it is possible to mount an ISO that is missing one or two bytes somewhere in the middle. You won't have a problem with it until you need the one or two particular files that are corrupt. Comparing checksums ensures that you really did download the whole file.
* see comment
The TCP checksum is only 16 bits. This means that, in the absence of other checksums, roughly one out of every 65536 randomly corrupted packets will be accepted as non-corrupted. If, for example, you were downloading an 8GB DVD image (about 5.3 million packets of 1500 bytes) across a noisy link with a 1% corruption rate, you'd expect about 53,000 corrupted packets, and on average nearly one of them (about 0.8) would slip through undetected.
MD5 is a much larger checksum, at 128 bits. The odds of an accidentally corrupted file producing the same checksum as the original are about 1 in 2^128, or roughly 1 in 3.4 × 10^38.
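The estimate can be reproduced with a few lines of arithmetic. The 1500-byte packet size is an assumption that the answer does not state, and the model treats corruption as uniformly random. Under those assumptions the expected count comes out at about 0.8 when 1% of packets are corrupted, and about 81 if every packet were corrupted.

```python
def expected_undetected(file_bytes: float, packet_bytes: float,
                        corruption_rate: float, checksum_bits: int = 16) -> float:
    """Expected number of corrupted packets that a checksum of this size would accept."""
    packets = file_bytes / packet_bytes
    corrupted = packets * corruption_rate
    return corrupted / 2 ** checksum_bits


# 8 GB image, 1500-byte packets (assumed), 1% of packets corrupted.
print(round(expected_undetected(8e9, 1500, 0.01), 2))  # 0.81
# Same download if every single packet were corrupted.
print(round(expected_undetected(8e9, 1500, 1.0), 1))   # 81.4
# Number of possible 128-bit MD5 values.
print(f"{2 ** 128:.2e}")                               # 3.40e+38
```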
There are several reasons to verify the checksum of a file downloaded via HTTP:
Daniel, it depends on the tool you are using for the ISO download. If it is, say, Firefox, it may show the download as finished even though you do not have the full ISO intact. If you burn it and then try to use it, information may be missing. This happens from time to time with different web servers hosting files.
It is good practice to at least compare the file size (total bytes or bits) and make sure they match. Windows will show the byte count differently than, say, Linux. An MD5 sum check will show the same value no matter which OS is used. Hope this helps a bit. Cheers...
I notice lots of interesting answers, but there is one last thing to consider: the Two Generals' Problem.
The Two Generals' Problem and the Byzantine Generals Problem deal specifically with the implications of transferring information reliably through unreliable channels.
Checksums are just another layer of "increasing reliability", and one with a very slim chance of failure. This is the reason why it is so popular.