I want to build a small warehousing application for my company. We have a central warehouse that distributes to 8 sales points across the country, and management insists on an in-house solution. I am thinking of setting up a central MySQL database on a Linux server and having the branches connect to it to store sales.
Queries to the database from the branches will be minimal, maybe 10 per hour. However, I need every branch to be able to store each sale (product ID, customer ID) in the central database, at peak times at most once every five minutes.
My question is: can I get away with simple 24 Mbps/768 Kbps DSL lines? If not, what is the bandwidth requirement? Can I rely on a load-balancing router to combine additional lines if needed? Can you propose some server hardware specs?
OK, let me clarify this a bit. All I need is to store data (productID, itemsSold) when something is sold, and to retrieve availability in the other stores, e.g. to return the quantity of a certain product in the other branches in order to re-supply from the central warehouse or have another branch send something over. I am guessing one row (branchName, itemQuantity) per other branch (7 branches, 7 rows) whenever a branch is out of something. I think the data sent is minimal, but I don't know how much overhead there is. How can I estimate that?
Only you can answer this question, by measuring or estimating the data size per query or transaction and multiplying it by the number of queries or transactions over a given period of time.
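As a rough sketch of that arithmetic (all the numbers here are illustrative assumptions, not measurements), the estimate is just queries per period times bytes per query, padded by some allowance for protocol overhead:

```python
# Back-of-envelope bandwidth estimate: bytes per query times queries per
# period, padded by a rough per-packet overhead figure. The overhead
# value is a guess covering TCP/IP and MySQL protocol framing.

def estimate_bps(queries_per_hour, payload_bytes, overhead_bytes=100):
    """Average bits per second, assuming each query fits in one packet."""
    bytes_per_query = payload_bytes + overhead_bytes
    return queries_per_hour * bytes_per_query * 8 / 3600

# 10 queries/hour of ~80-byte payloads is a negligible trickle:
print(estimate_bps(10, 80))  # → 4.0 bits per second on average
```

Even if the overhead guess is off by an order of magnitude, the average load stays far below any DSL line's capacity; it is bursts and latency, not averages, that you would need to watch.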
As you are not going to transfer much data, you won't need much bandwidth. Additionally, compression, e.g. over an SSH tunnel, could help.
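To illustrate why tunnel compression (e.g. `ssh -C`) helps here: repetitive SQL text compresses very well. A small sketch using Python's zlib (the query text is made up for the example):

```python
import zlib

# A batch of near-identical INSERT statements, like branch sales data,
# is highly repetitive and therefore compresses heavily.
batch = b"INSERT INTO sales (productID, customerID) VALUES (1, 2);\n" * 100
compressed = zlib.compress(batch, level=6)

print(len(batch), len(compressed))
assert len(compressed) < len(batch) // 10  # well over 90% smaller
```

SSH's `-C` option applies the same zlib-style compression transparently to everything in the tunnel, so you get this saving without touching the application.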
In my experience, latency is a much greater issue for remote (DB) applications.
You could also set up local MySQL instances in a cluster-like arrangement; propagation of changes would then be handled asynchronously by the database itself.
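For instance, classic asynchronous MySQL replication from the central server to per-branch replicas needs little more than the following in the central server's my.cnf (the server ID and database name are placeholders):

```ini
# my.cnf on the central server: enable the binary log so changes can be
# replayed asynchronously on each branch replica.
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog_do_db = warehouse   # placeholder database name
```

Each branch replica then gets its own unique `server-id` and is pointed at the central server with `CHANGE MASTER TO ...` followed by `START SLAVE;`. Branches read availability locally and only writes cross the WAN.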
Given the narrow scope of your project, it actually should be fairly simple to figure out the kind of bandwidth you'll be consuming and the amount of time it will take.
Sending queries
This is pretty simple to calculate. Let's assume this is your query (a made-up example matching the schema above):

INSERT INTO sales (productID, customerID, branchID) VALUES (12345, 67890, 308);

That's 79 characters. At 8 bits per character, 79 characters = 632 bits. You have a 24 Mbps inbound connection, so that works out to
24*1024*1024/632
simultaneous queries (39,819) before becoming bandwidth limited. I can't tell you how long that will take with any certainty, for the reasons below, but it should be reasonably quick.
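As a quick sanity check of that figure in Python (assuming 8-bit characters and ignoring all protocol overhead):

```python
# How many 79-character queries fit through a 24 Mbps link per second,
# counting 8 bits per character and ignoring packet overhead entirely.
link_bps = 24 * 1024 * 1024   # 24 Mbps inbound link
query_bits = 79 * 8           # 632 bits per query
print(link_bps // query_bits) # → 39819
```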
Retrieving data
Let's assume branchName is a 20-character string and itemQuantity is a 4-byte integer.
That's a grand total of 20 + 4 bytes for each row. 7 rows =
7*(20+4)
= 168 bytes, or 1,344 bits. You have 768 Kbps of outbound bandwidth, so you can send roughly 585 responses of that size per second before they start getting squeezed at the bandwidth end.

Now forget everything I've just said
Because there's so much more to it than just that. As I've already alluded to, there are overheads in authentication and connection setup, and then you've got latency over DSL links and possible contention-ratio issues; and because it's not over a switched network, there's every possibility that a whole bunch of TCP re-assembly and re-transmission will be required for every single query, which can affect perceived speed dramatically.
The only way to really know is to try.
The bandwidth requirement depends not only on the frequency of queries, but also on the query output and table size(s). For example, one query may return a single row while another returns thousands of rows. So there is no specific answer unless you have an estimate of your data size and the types of queries.