I am working on a project, and my job is to build a database system to manage about 60,000,000,000 data entries.
The project background is that I have to store, in real time, a large number of messages that arrive every second from about 30,000 RFID readers. Assuming each reader generates 6,000 messages per day, I have to insert about 180,000,000 entries into the database every day.
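To put that in throughput terms: 30,000 readers × 6,000 messages/day = 180,000,000 rows/day, which averages out to roughly 180,000,000 / 86,400 ≈ 2,100 inserts per second (peaks will be higher), and about 65 billion rows per year, which is where the 60 billion figure above comes from.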
A typical data entry looks like "time_stamp, Reader_ID, Tag_ID, other_msg_content".
There will be queries (SELECT) based on time range, Reader_ID, and Tag_ID. The queries won't be very complicated.
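For reference, this is roughly the table I have in mind. The name rfid_readings and the column types are just placeholders I made up; the secondary indexes are meant to match the query patterns above:

```sql
-- Sketch only: table name and column types are placeholders, not a final design.
CREATE TABLE rfid_readings (
    time_stamp        DATETIME        NOT NULL,
    reader_id         INT UNSIGNED    NOT NULL,
    tag_id            BIGINT UNSIGNED NOT NULL,
    other_msg_content VARCHAR(255),
    KEY idx_reader_time (reader_id, time_stamp),  -- queries filtered by Reader_ID + time range
    KEY idx_tag_time    (tag_id, time_stamp),     -- queries filtered by Tag_ID + time range
    KEY idx_time        (time_stamp)              -- pure time-range scans
) ENGINE=InnoDB;
```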
Now I am designing the database system, and I plan to use MySQL. My (possibly dumb) questions are:
1. Is it wise to use MySQL, or should I resort to Oracle (which is expensive) or to HBase?
2. If I stick with MySQL, any idea how I should build the cluster?
3. If I insert all the messages into a single table, that table will soon become very long. I would like to use sharding to split the long table into many shorter tables (see the sketch after these questions for the kind of time-based split I have in mind).
3.a. What is a reasonable maximum size for a single MySQL InnoDB table, i.e., after how many rows have been inserted should I start sharding?
3.b. Is there a good sharding proxy out there? I know about Spock Proxy and some others, but I need recommendations.
4. Do I have to use MySQL Cluster, or can I just shard across ordinary MySQL master servers and use replication to slaves for high availability?
5. Assume I have to handle 20 TB of data in MySQL (for one year). I plan to use 20 nodes (cheap PC servers) and store 1 TB per node. Is that feasible? Any comments are welcome.
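To make question 3 concrete, here is the kind of time-based split I have in mind for a single node. This is only a sketch: the table name, column types, and partition dates are placeholders, and whether to do this with MySQL's built-in range partitioning (as below) or with application-level sharding across machines is exactly what I am asking about.

```sql
-- Sketch: the same placeholder table as above, range-partitioned by day,
-- so each partition stays short and old data can be removed with
-- ALTER TABLE ... DROP PARTITION instead of huge DELETEs.
CREATE TABLE rfid_readings (
    time_stamp        DATETIME        NOT NULL,
    reader_id         INT UNSIGNED    NOT NULL,
    tag_id            BIGINT UNSIGNED NOT NULL,
    other_msg_content VARCHAR(255),
    KEY idx_reader_time (reader_id, time_stamp),
    KEY idx_tag_time    (tag_id, time_stamp)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(time_stamp)) (
    PARTITION p20120101 VALUES LESS THAN (TO_DAYS('2012-01-02')),
    PARTITION p20120102 VALUES LESS THAN (TO_DAYS('2012-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);
```

In practice I would pre-create partitions for upcoming days rather than define them all up front.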
Many thanks.