I'm in the process of building a data stack for a small company: the decision has been made to run a UNIX server as a "scheduler". The only job of this scheduler is to extract data from different applications and push it to an in-cloud data warehouse. Flows in the opposite direction are also expected in the near future.
In concrete terms, the server will just host Airflow and run Python and Bash scripts. It will likely also host a few Flask apps for internal use only (data, monitoring, etc.). The data flows in this case are really small: we're not talking about big data.
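To give a sense of scale, a typical flow would look something like the minimal DAG sketch below. The task names, schedule, and DAG id are just illustrative placeholders, not real jobs (this also assumes Airflow 2.4+, where the `schedule` argument replaced `schedule_interval`):

```python
# Minimal sketch of the kind of job the server would run: one hourly DAG
# that pulls rows from an application and pushes them to the warehouse.
# All names here (fetch_orders, load_to_warehouse, ...) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_orders(**context):
    # Placeholder: call the source application's API and stage rows locally.
    ...


def load_to_warehouse(**context):
    # Placeholder: send the staged rows to the in-cloud data warehouse.
    ...


with DAG(
    dag_id="orders_to_warehouse",
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",   # small, frequent batches, not big data
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="fetch_orders", python_callable=fetch_orders)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    # Extract first, then load: a plain two-step ELT pipeline.
    extract >> load
```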
Now that I have to choose a host and hardware specs, I'm a bit confused: what should I look for, and what criteria should I weigh? Are there any must-have options/capabilities, or any I should avoid at all costs?
Thank you!