Problem: I need to copy a large block of data from a remote location into system memory as quickly as possible.
Scenario: I have a data processing system. The system is built on the fly via shell scripts, using multiple components that are pulled in from remote locations.
One of those components is a large block of data stored as groups of files.
The requirement I have is to retrieve that large block of data from a remote location and install it into system memory as quickly as possible, so that the system which relies on this data can start using it for processing as soon after boot time as possible.
Question: "What would be the most efficient container for my data?"
Solutions already tried/considered:
- ISO file: requires tools for creation and reading that are not typically native
- TAR file: extracting can take a lot of time
- Remote filesystem mounted as local: slow because contents need to be copied into memory
- LVM snapshot: geared more toward backups, not built for speed on restore
Notes:
- Data loss is not a primary concern.
- The remote file transfer procedure is not a primary concern as I already have an adequate tool.
- The system is currently using Ubuntu Linux.
"The remote file transfer procedure is not a primary concern as I already have an adequate tool."
If you already have the file transferred, I suggest using mmap(2).
You should consider an image file containing a file system that holds your data (put a loop device over the file with losetup and mount the loop device). The fastest option would probably be a compressed read-only file system such as squashfs.

This would even allow some tricks if not all of the data is needed simultaneously: instead of mounting the loop device directly, you could put a DM device on top of it, mount a network file system (or network block device) that exposes the image file, put a second loop device on top of the network version of the file, and combine both loop devices with the DM device.
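A minimal sketch of the basic setup, assuming the data lives under /srv/data, the image is named data.sqsh, and the mount point /mnt/data already exists (all of these names are assumptions):

```
# Build a compressed, read-only squashfs image from the data directory
mksquashfs /srv/data data.sqsh

# Attach the image to a free loop device and mount it; nothing is extracted
LOOPDEV=$(losetup --find --show data.sqsh)
mount -t squashfs -o ro "$LOOPDEV" /mnt/data
```

Because squashfs is compressed, the image is also smaller to transfer, and reads are decompressed on demand rather than requiring an up-front extraction step.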
Let's assume you have to copy 500 MiB of data. You start copying it. As soon as the first 100 MiB have been transferred, you create the loop devices and the DM device. The DM device points to the loop device of the local file for the first 100 MiB and to the other one for the rest. After each transferred 10 MiB block (for example), you suspend the DM device and reload it with the boundary shifted by another 10 MiB.
The risk: if accesses go to the network version, then that data is transferred twice. So if that happens often, the data transfer will take longer (although the whole process may still finish earlier, depending on its access characteristics).
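A rough sketch of how the DM side of that could look, assuming the local copy sits behind /dev/loop0, the network-backed copy behind /dev/loop1, a 500 MiB image (1024000 sectors of 512 bytes), and an initial boundary at 100 MiB (204800 sectors); the device name and all numbers are assumptions:

```
# Linear DM device: first 100 MiB from the local copy, the rest from the network copy
dmsetup create databridge <<'EOF'
0 204800 linear /dev/loop0 0
204800 819200 linear /dev/loop1 204800
EOF

# Once another 10 MiB has arrived locally, shift the boundary to 110 MiB (225280 sectors)
dmsetup suspend databridge
dmsetup load databridge <<'EOF'
0 225280 linear /dev/loop0 0
225280 798720 linear /dev/loop1 225280
EOF
dmsetup resume databridge
```

The file system is then mounted on top of the DM device instead of directly on the local loop device.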
Edit 1:
See this answer of mine to another question for an explanation of how to use DM devices this way (without the suspend/reload/resume, though).
My initial research into the ISO container was apparently incomplete. The ISO container seems to be the most efficient for the purpose of being able to quickly get to the contents. This is based on what my research has been able to uncover, and could of course change.
Packaged in an ISO, I am able to create the container easily, transfer it quickly, and mount it almost instantly. Using this container I have been able to get the entire process down to under 1 minute, which is an acceptable tolerance level for this project.
Creating this container is done easily in Ubuntu with a command similar to the following:
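A minimal sketch, assuming the data files live under a directory named data/ and the image is written to data.iso (both names are assumptions; -R and -J add Rock Ridge and Joliet extensions so file names survive intact):

```
# Build an ISO 9660 image containing the data directory
genisoimage -R -J -o data.iso data/
```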
**Note that this requires `genisoimage`, which is easily installed via `apt-get`.**

To store the file directly into memory, I created a ramdisk in the `/tmp` filesystem:
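A minimal sketch, assuming a 1 GiB tmpfs mounted at /tmp/ramdisk (the size and mount point are assumptions):

```
# Create a mount point and mount a RAM-backed tmpfs on it
mkdir -p /tmp/ramdisk
mount -t tmpfs -o size=1024m tmpfs /tmp/ramdisk
```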
Retrieving the container can be done relatively quickly with a multipart transfer utility. I used one called `axel` in this manner:
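A minimal sketch, assuming 10 parallel connections and a hypothetical URL for the image:

```
# Download the ISO straight into the ramdisk using multiple connections
axel -n 10 -o /tmp/ramdisk/data.iso http://example.com/data.iso
```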
Finally, we mount the file to a local filesystem:
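A minimal sketch, assuming the mount point /mnt/data (an assumption) already exists:

```
# Loop-mount the ISO read-only; nothing is extracted or copied
mount -o loop,ro /tmp/ramdisk/data.iso /mnt/data
```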
The mounting process is nearly instantaneous, which allows the system to quickly begin using the data for processing.