I often use SQLite for creating simple programs in companies. The database is placed on a file server. This works fine as long as there are not more than about 50 users working towards the database concurrently (though depending on whether it is reads or writes). Once there are more than this, they will notice a slowdown if there are a lot of concurrent writing on the server as lots of time is spent on locks, and there is nothing like a cache as there is no database server.
The advantage of not needing a database server is that the time to set up something like a company Wiki or similar can be reduced from several months to just days. It often takes several months because some IT-department needs to order the server and it needs to conform with the company policies and security rules and it needs to be placed on the outsourced server hosting facility, which screws up and places it in the wrong localtion etc. etc.
Therefore, I thought of an idea to create a distributed database server. The process would be as follows: A user on a company computer edits something on a Wiki page (which uses this database as its backend), to do this he reads a file on the local harddisk stating the ip-address of the last desktop computer to be a database server. He then tries to contact this computer directly via TCP/IP. If it does not answer, then he will read a file on the file server stating the ip-address of the last desktop computer to be a database server. If this server does not answer either, his own desktop computer will become the database server and register its ip-address in the same file. The SQL update statement can then be executed, and other desktop computers can connect to his directly.
The point with this architecture is that, the higher load, the better it will function, as each desktop computer will always know the ip-address of the database server. Also, using this setup, I believe that a database placed on a fileserver could serve hundreds of desktop computers instead of the current 50 or so. I also do not believe that the load on the single desktop computer, which has become database server will ever be noticable, as there will be no hard disk operations on this desktop, only on the file server.
Is this idea feasible? Does it already exist? What kind of database could support such an architecture?
Edit: I should point out that this idea is not pretty, stabile, best practice, or something I would really be proud of. The reason why I am still interested in the feasibility is that some of my clients are banks, and the bureaucracy involved with gaining access to a database is enormous. Often the project sponsor on such projects needs to be above Vice President level, due to their extreme security concerns with gaining access to servers. Needless to say, this means that there is a lot of work for setting up a Wiki. Later if the Wiki proves to be successful, it should, of course, be migrated onto a proper database server.
Edit2: The reason for this idea is to reduce the risk of Writer Starvation when using SQLite when the database is placed on the file server. This problem is described in section 5.1 here. Utilizing a desktop computer to have a cache of the most accessed information (i.e. Wiki pages), would mean that the work load on the file server would be reduced dramatically. This again should improve the user experience. Do you really think that I am still way off with this idea?
No.
Not that I know of.
See above.
Honestly, this is a really bad idea on many levels. There's a reason companies keep critical data within the datacenter. You don't want business applications to be dependent on X number of desktop machines to be up and running. Another issue would be firewalls - in all but small environments, there would be no guarantees that Desktop X would be able to communicate with Desktop Y, and good luck getting that firewall change past your network team.
Is there any reason your company doesn't have a central well-maintained database server that this app can use? There's no reason a company wiki should need its own database server.
This question isn't related to system administration but when I read it so many warning alarms went off that I just have to answer.
I really have to tell you that your entire concept is so far off the mark that you won't find anyone else doing it. For starters, SQLite is unsuited to such jobs and the fact that you've had some success with it is more due to good luck that anything else.
Your plan has so many holes in it I really don't know where to start but I will tell you that it will be an overly complex system that will prove to be incredibly unreliable and poor performing.
Your comment
Tells me much. To set up a wiki normally takes only minutes and any decent wiki system will have aides to speed up the importing of data from other systems.
I suggest you abandon your current design ideas and have a look at how such things are being done by others. Use any one of the common wiki systems (I prefer MediaWiki) with a regular database system (MySQL being very popular) and you'll not only save massive amounts of time but you'll end up with a system that is both more usable and more robust, plus much cheaper to implement.
In short, stop trying to reinvent the wheel, because your current design is going to end up more like a square with a hole roughly in the middle.
You could actually build a good distributed database environment if you partition (or target) your reads and writes at different databases. We do such work, and the trick is very simple. You have the master database on a file server and target all writes to it. You have a local copy of the database on every user's computer and you target the reads to it. You now also need a synchronizing mechanism between the master database and the local databases. This can be done in multiple ways. One way is to have a "delta" table in the master database. This delta table will contain the transactions that have been applied in the master database. Whenever the user's application performs a read or write operation, the delta on the master is first checked and updated locally. Only the transactions in the delta not yet applied (which can be checked based on time stamp) need to be applied. You could even have a background process doing this continuously. This delta could be a daily delta (or a weekly delta) when it is flushed. If a user has not logged on for a week or so, you just simply copy over the whole database to the user's computer. The advantage of having a local copy is that users can query stuff even when they are offline and - believe it or not - this is pretty fast even when you are online updating stuff.
As mentioned, this question is outside the scope of systems administration. That said, distributed databases and distributed data stores are being used in some very recognizable places. While SQLite's strengths do not generally lend itself to this type if application, it is not unheard of. Look, for example, at the Fossil project. Even though this is a distributed source control system based on SQLite, it also provides a distributed wiki and a blogging application, and might actually do the trick for you. While you should probably look beyond SQLite, this does not mean you need to abandon open-source. Consider implementing your project in Apache CouchDB or a Hadoop-based data store An even more novel approach is to create applications in a distributed user-space virtual environment like Inferno.
Your description sounds a lot like what POS (Point of Sale) systems use. One master terminal is declared on startup that does the database processing. A copy of the database is synced between the master and all the slave terminals for backup.
If the master was to fail, all the other terminals pop up a message saying "Make me the new master?". You press yes and everything continues on. This could continue until there was one terminal standing.
It works, and is sort of idiot proof, but having a corrupted database at the end of the day is common. Luckily, the terminals only store that days sales, so your daily totals may be off a little as some order didn't get saved right. This is preferred over the system going down for a few hours and loosing sales.
In a big network/power outage, the end of day cleanup is what over-time is for as the current days sales can get spreads across several different terminals and you have to sort it all out. I am glad I no longer do that work.
Stick to one big database server with good backups.
It's not altogether clear from your question where the data ultimately resides? Does it live on the centralized fileserver? If so, moving the database engine around a multitude of desktops while using the centralized fileserver as a disk store will probably not buy you much performance. If anything, the remoteness of the disk from the engine will probably cause it to run worse, if at all.
If the data is not centralized, data consistency will be a problem if you've got multiple desktops all containing different bits of data.
Similar problems exist around database configuration and security, none of which is trivial. And finally, running a database server on a desktop machine serving 100+ active remote users will have a noticable affect on that desktop's performance.
Have you seen http://litereplica.io/ ? They have a sqlite3 driver for nodejs and it seems fairly well architected.
I have Recently completed developed of a distributed database layer for a SOA/ESB/RESTful middleware framework that needed to be proprietary without the reliance of database infrastructure, built in c# with a wrapper for SQLite.
My database layer operates as a cluster of nodes, comprising of witness nodes (master and failover), data write/commit nodes (again master and failover) and replication nodes which basically stores replicated data.
On write operations, the selected write node generates unique id's and foreign keys which the location of successful data writes to nodes are indexed. this ensures replicated data maintain the same Ids and foreign keys. There are threaded/parallel processes that maintains replication. Foreign keys don't get strictly enforced but do work.
I also wrote wrote a client wrapper for this data layer that provided client connection string failover between witnesses.
So far testing and Benchmarking seems to prove concept. I've tested with different sizes of data and it seems to handle well. Obviously as my database layer is designed to be Restful middleware it's speed is less important than high availability. Further more, the requirements for the structure of your data is the primary factor in whether this approach will work or not.
My next revision will be to see if I can distribute retrieval of large datasets across replicated nodes as the dataset is streamed to the client framework, kind of a data grid with json idea.