Logging
If you ask me where to get the most insight into infrastructure, my answer is: infrastructure issues. I spent a good first half of my career supporting and solving infra issues, be it network or server. In those days organisations depended heavily on IT support. They still do, though it has transformed into present-day tech support or cloud support.
Unfortunately, even today a reactive approach to infra issues is the norm. Proper analysis of incidents, followed by proactive measures, can prevent many of them, but this work gets ignored because it is considered non-functional or operational. The problems come to notice only when businesses start losing customers to competitors, review their annual performance and identify that the cause is lower QoS (quality of service). The faster you can respond to and solve an issue, the better your QoS and the more trust you establish with your customers.
How many hours did you spend last month on production issues? How many downtimes did you have, and how long did it take to bring a system back up? Before I continue, spend the next few seconds finding out how long those so-called non-productive hours were: rewind your memory and start noting the downtimes.
Why log?
Observability, so you can debug faster.
Root cause analysis and postmortems, to identify the root cause of an incident and take preventive measures.
In turn, this lets you commit to better SLAs and SLOs, with the maximum number of 9s in them.
In short, it helps you identify the fault lines in your infra and build a fault-tolerant service.
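To make the "number of 9s" concrete, here is a minimal sketch of how an availability target translates into a downtime budget. The function name `allowed_downtime_minutes` and the 30-day period are assumptions chosen for illustration, not part of any standard.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    availability_percent: the target, e.g. 99.9 for "three nines".
    period_days: length of the measurement window (assumed 30 days here).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is simply the fraction of the window that may be unavailable.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {budget:.2f} minutes of downtime per 30 days")
```

Each extra 9 shrinks the budget by a factor of ten, which is why the time you spend finding the right log line starts to matter so much.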
Some common challenges in logging
Missing logs
Format issues
Too many tools
Too many logs
Log corruption
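The format challenge in particular is often tackled by emitting every log line in one consistent, machine-parseable shape. Below is a minimal sketch using only Python's standard `logging` and `json` modules. The class `JsonFormatter`, the helper `build_logger` and the field names (`timestamp`, `level`, `logger`, `message`) are hypothetical choices for illustration, not a prescribed schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # UTC timestamps avoid ambiguity when logs from many hosts are merged.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep output out of the root logger's handlers
    return logger


if __name__ == "__main__":
    log = build_logger("payments")  # "payments" is a made-up service name
    log.info("disk usage at %d%%", 91)
```

With every service writing the same fields, a central log store can index and search them without a separate parsing rule for each tool.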
Summary
The faster you can debug and solve an issue, the stronger the SLA or SLO you can commit to for your customers. Customer sales tend to increase with the number of 9s in the SLA.
For faster debugging you need a one-stop station of human-readable, consistently formatted logs: in short, a centralised, robust logging system that is easy to operate, stable, scalable, secure and cost-effective.