I am looking for advice and best practices on developing a strategy for scaling up my web application. I waffle on a bit here and show my limited knowledge, but I want to fill in the gaps. It is tempting to learn as much of this as possible myself, but I realise I need to look at getting some outside help, so as a general question it would be good to know which things are easiest to outsource.
My background: I am a developer who has primarily worked on user interfaces. I have been working in Flash and PHP to develop the functionality of the app, which lets users upload images and video to share online.
The system architecture is as follows -
- A single web server which also acts as the database server (MySQL). This server is on a managed hosting package with a trusted and reliable hosting company. The web server serves the PHP pages and Flash SWFs which are the main UI components.
- Amazon S3 buckets for storage of users' image, video and audio files.
- User interface components are either PHP pages or Flash SWFs. For example, images and video are viewed through Flash SWFs which query the database through an AMFPHP service to get the URLs of the image and video files to load. These files are then retrieved from the Amazon S3 buckets. Another Flash SWF handles uploads and POSTs files to a PHP script running on an Amazon EC2 instance.
- The upload server for managing image, video and audio uploads. This is an Amazon EC2 instance running behind an Elastic Load Balancer, set to add more instances when CPU usage reaches 80%.
- A third-party service, which also runs on Amazon EC2, for transcoding video files.
So for the most part I think things are set up OK to be able to scale. However, I have no experience of scaling up or managing a high-traffic web application, so I will be relying on our hosting company to set up a scalable configuration of web/app server and database server.
Hardware / architecture scaling -
As I understand it, the first steps here will be to separate the web server and the database so that the database runs on its own server, to place the web server behind a load balancer, and ultimately to have a master/slave configuration for the database servers. What should I ask my web hosting company to do? What are the issues around doing this, and what are the implications for things like my AMFPHP services and for the different types of queries (writes and reads)?
I have a separate script containing the database connection details, which gets included in the globals.php script so I can update the connection details in a single step. Am I right that in a master/slave configuration all writes usually go to the master server and reads come from a slave? Would this mean I need to look at all my database queries and make sure that any "INSERT", "UPDATE" or "DELETE" query goes to the master database server? In my PHP code the database queries are spread throughout the scripts and called from within functions as needed. I have read a little about database abstraction but do not fully understand the significance of this approach.
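To make the read/write split concrete, here is a minimal sketch of the routing idea: writes go to the master, reads are spread across the slaves. It is written in Python purely for illustration (the same logic would sit behind a single database access function in PHP), and the names `QueryRouter`, `is_read_query`, `db-master` and `db-slave-1` are hypothetical, not part of the setup described above.

```python
import itertools

# Statement keywords treated as reads; everything else is sent to the master.
READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def is_read_query(sql):
    """Return True if the statement's first keyword marks it as read-only."""
    stripped = sql.lstrip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in READ_KEYWORDS


class QueryRouter:
    """Chooses a server name for each query (hypothetical helper)."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def server_for(self, sql):
        if is_read_query(sql) and self._slaves is not None:
            return next(self._slaves)
        return self.master


# Hypothetical server names, for illustration only.
router = QueryRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.server_for("SELECT url FROM uploads WHERE user_id = 1"))    # db-slave-1
print(router.server_for("UPDATE uploads SET title = 'x' WHERE id = 1"))  # db-master
```

This only works cleanly if every query passes through one place, which is the practical argument for a database abstraction layer when queries are currently spread throughout the scripts. The keyword check is deliberately naive: a statement such as `SELECT ... FOR UPDATE`, or a read that must see a row written a moment earlier, would still need to go to the master, because a slave can lag behind it.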
Code optimisation for scaling -
What are the things I need to think about changing in my code to make things more scalable? What are the common things in PHP that are affected by scaling?
Security -
What are the common security issues that I need to be aware of when dealing with large volumes of traffic?
Database optimisation, backup and recovery procedures -
What are the best ways to implement an automated backup and recovery strategy for a large MySQL database? Should I think about splitting my database? I keep all data on a single server, though there are several databases: one for user membership information, one for data about user uploads, and one for data about the running of the site, such as administrative functions.
I also want to be able to analyse the databases to see how things are working and to get a picture of how the web app is being used and how it is developing. I assume it is better to take a snapshot of the database on a periodic basis and store these, so that I have a timeline of how things have developed, and then use some sort of database analysis tool to query the snapshots and track how things have changed. Is there any good advice about how to go about this, or about tools that allow both myself and non-technical people to analyse the data and produce reports?
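As a rough sketch of the periodic-snapshot idea, the snippet below builds one dated `mysqldump` command per database. It is Python for illustration only, it prints the commands rather than running them, and the database names, the backup directory and the example date are all hypothetical. Credentials are deliberately left out, and scheduling (for example via cron) is assumed to happen elsewhere.

```python
from datetime import date


def snapshot_command(database, day, backup_dir="/var/backups/mysql"):
    """Build a mysqldump command that writes one dated snapshot file."""
    outfile = f"{backup_dir}/{database}-{day.isoformat()}.sql"
    return [
        "mysqldump",
        "--single-transaction",  # consistent dump of InnoDB tables without locking them
        f"--result-file={outfile}",
        database,
    ]


# Hypothetical names standing in for the membership, uploads and admin databases.
for name in ["members", "uploads", "site_admin"]:
    print(" ".join(snapshot_command(name, date(2011, 1, 31))))
```

Keeping one file per database per day gives the timeline described above, and any single snapshot can be restored into a scratch server for analysis without touching the live site. Note that `--single-transaction` only gives a consistent dump for transactional tables such as InnoDB.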
This about covers my understanding of the areas I will need to address when developing a scale-up strategy. What do people think I need to do to manage the above issues successfully? Am I right in my assumptions? What have I missed? What other things should I consider? Are there any good resources out there for dealing with this? If I were to recruit someone to help with these issues, what skills and experience should I expect the ideal candidate to have?
All advice / experiences / suggestions gratefully received.
cheers!
Just some pointers, to be helpful after my vote to close. Consider reading:
"Building Scalable Web Sites" by Cal Henderson
"Scalable Internet Architectures" by Theo Schlossnagle
Both are good books on how to build scalable web applications, both covering the technologies used and the mindset / design process.
This question is too broad to answer here - I suggest reading highscalability.com