Application data sent over TCP experiences multiple encapsulations:
- The application data is encapsulated within one or many TCP segments
- Each TCP segment is encapsulated within one or many IP datagrams
- The IP datagram is encapsulated within an Ethernet frame
It turns out Ethernet frames are sent most-significant byte first, and within each byte, most-significant bit first. What about the multiple encapsulations? Are they performed MSB first or LSB first?
First, one correction: IP datagrams are not sent within one or many Ethernet frames. One IP datagram is sent within exactly one Ethernet frame. The other stipulations in your description are true, although TCP tries hard to choose the segment size so that one TCP segment does not have to be fragmented into multiple IP datagrams.
All of the protocols in the TCP/IP suite use what's known as network byte order, which is the same thing as big endian: the most significant byte of each multi-byte field comes first.
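To make network byte order concrete, here is a minimal sketch using Python's standard `struct` module. The helper name `to_wire` and the sample port and address are just illustrations I picked, not anything from a real stack; the point is only where the most significant byte of each field ends up.

```python
import struct

def to_wire(port, ipv4):
    """Pack a 16-bit port and a 32-bit IPv4 address in network byte order.

    Hypothetical helper for illustration; real headers have many more fields.
    """
    # "!" selects network (big-endian) byte order with no padding.
    # "H" is an unsigned 16-bit field, "I" an unsigned 32-bit field.
    return struct.pack("!HI", port, ipv4)

# 8080 is 0x1F90; 0xC0A80001 is the address 192.168.0.1.
wire = to_wire(8080, 0xC0A80001)
print(wire.hex())    # 1f90c0a80001 (most significant byte of each field first)

# The same values packed little-endian, for contrast.
little = struct.pack("<HI", 8080, 0xC0A80001)
print(little.hex())  # 901f0100a8c0
```

The first output is what the header fields look like on the wire. The second is what you would get by copying the raw integers out of memory on a little-endian host, which is why conversions such as `htons` and `htonl` exist.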
TCP and IP do not really deal with things at the bit level, only at the byte level. So they are subject to whatever the physical layer (be it Ethernet or a serial link or something else) does with the bits.
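A rough sketch of why the bit order is invisible to TCP and IP: the same byte sequence can be serialized MSB first or LSB first by the link, and as long as the receiver uses the same convention it hands back identical bytes. The function names below are made up for this illustration and do not model any particular physical layer.

```python
def bits_msb_first(data):
    """Serialize each byte most-significant bit first."""
    return [(b >> i) & 1 for b in data for i in range(7, -1, -1)]

def bits_lsb_first(data):
    """Serialize each byte least-significant bit first."""
    return [(b >> i) & 1 for b in data for i in range(8)]

def bytes_from_bits(bits, lsb_first):
    """Reassemble bytes from a bit stream, given the link's bit order."""
    out = bytearray()
    for k in range(0, len(bits), 8):
        chunk = bits[k:k + 8]
        if lsb_first:
            chunk = chunk[::-1]  # restore MSB-first order within the byte
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

payload = bytes([0x1F, 0x90])  # a 16-bit field already in network byte order

# Two hypothetical links with opposite bit conventions.
via_msb_link = bytes_from_bits(bits_msb_first(payload), lsb_first=False)
via_lsb_link = bytes_from_bits(bits_lsb_first(payload), lsb_first=True)

# The bit streams differ on the wire, but the delivered bytes do not.
print(bits_msb_first(payload) != bits_lsb_first(payload))
print(via_msb_link == via_lsb_link == payload)
```

Either way the upper layers see `0x1F` followed by `0x90`, so the byte order chosen by IP and TCP is unaffected by the link's bit order.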
Virtually everything in IP and its related protocols is most significant byte first. In older documents, such as the RFCs you should be reading, you will see this referred to as "network byte order".