Real-time monitoring of IT infrastructures is a critical topic.
Whatever the context, detecting system failures in near real time is key to SLA compliance, and deep knowledge of the most critical failure points is one of the best ways to provide a solid and reliable infrastructure.
For this reason, I always set up redundant monitoring and alerting systems for the infrastructures I manage.
My tool of choice for node and resource monitoring is Amon.