Sun. Sep 25th, 2022

This post delves into application logging best practices that ensure quick resolution when things go wrong, and treats “designing” logs as a core skill.

Myth: Log a lot, log everything

Fact

  • Minimal effective logs are needed
  • Each log statement takes time, and uncontrolled logging has performance implications; a performance hit of up to 10% has been observed in performance tests

What not to log

  • Entry to and exit from functions; if you really need this, make it a debug log
  • Ensure production is reporting only Info, Warn, Error, Fatal level logs (Note: The log levels vary based on the library and language)
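
A minimal sketch of keeping entry/exit logs at debug level. It assumes java.util.logging from the JDK, where the debug level is called FINE; the OrderService class and its method are hypothetical and only serve as an illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical service, used only to illustrate debug-level entry/exit logs.
class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    static int priceWithTax(int netCents, int taxPercent) {
        // Supplier overload: the message string is only built when FINE is enabled.
        LOG.fine(() -> "Entering priceWithTax, netCents=" + netCents + ", taxPercent=" + taxPercent);
        int gross = netCents + (netCents * taxPercent) / 100;
        LOG.fine(() -> "Exiting priceWithTax, gross=" + gross);
        return gross;
    }

    // True when the entry/exit logs above would actually be emitted.
    static boolean debugEnabled() {
        return LOG.isLoggable(Level.FINE);
    }
}
```

With the logger at INFO, the two entry/exit statements cost little more than a level check, which is why they are acceptable as debug logs but not as info logs.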

Code for failure

That sounds very pessimistic, doesn’t it?

Things go wrong, and that is when a developer’s true merit comes forth.

The logs should speak and tell us exactly what went wrong.

Hardening of code

Yes, just as a server is hardened for production, a code base should be hardened as well.

What is it

  • Log level control and the right defaults (default to info and above)
  • Ensure every line of code is combed through for failure scenarios such as NPE (NullPointerException), index out of bounds, and so on
  • Ensure precise logging in exception blocks
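
One way to sketch “log level control and right defaults” is below. It assumes java.util.logging, and the environment variable name APP_LOG_LEVEL is made up for this example; the point is only that a missing or invalid setting falls back to info.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: resolves the log level from configuration, defaulting to INFO.
class LogLevelConfig {
    static Level resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Level.INFO; // nothing configured: use the production default
        }
        try {
            return Level.parse(configured.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // An unknown level name should not break startup; fall back to the default.
            return Level.INFO;
        }
    }

    static void apply(Logger logger) {
        // APP_LOG_LEVEL is an assumed variable name, not a standard one.
        logger.setLevel(resolve(System.getenv("APP_LOG_LEVEL")));
    }
}
```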

How and What to log

Exceptions

  1. Ensure we have a log.error or log.fatal or log.warn depending on application scenario
  2. Do take care to avoid duplicate logging, especially stack traces as it can get really cluttered when viewing logs

Important decisions / scenarios

  1. Ensure critical information or decision points in the code logic are logged to aid application debugging and understanding
  2. Examples: important counters, critical if else conditions

External API

  1. Ensure clear logs are maintained in case of any errors when invoking external APIs
  2. Status Code, Response Headers and Body and Exceptions (must have for non 2xx status codes) are some examples of data that must be logged
  3. Remember it is tough to debug dependencies and the more information available the easier it will be
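
As a rough sketch of the points above, the helper below builds one message that carries the request, status code, headers and body together. The class name, parameters and the 500-character body cap are assumptions for illustration, not part of any library.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds a single log message for a failed external call.
class ApiFailureMessage {
    private static final int MAX_BODY_CHARS = 500; // assumed cap to keep log lines readable

    static String describe(String apiName, String request, int statusCode,
                           Map<String, List<String>> headers, String body) {
        String safeBody = body == null ? "" : body;
        if (safeBody.length() > MAX_BODY_CHARS) {
            safeBody = safeBody.substring(0, MAX_BODY_CHARS) + "...(truncated)";
        }
        return "Failed to call " + apiName + " API"
                + ", Request: " + request
                + ", Response Status Code: " + statusCode
                + ", Response Headers: " + headers
                + ", Response Body: " + safeBody;
    }

    // Non-2xx responses are the ones that must always be logged.
    static boolean isFailure(int statusCode) {
        return statusCode < 200 || statusCode > 299;
    }
}
```

A caller would check isFailure(statusCode) and, if true, pass the result of describe(...) to a single log.error call.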

Database

  1. Ensure DB exceptions are well logged with as much context as possible
  2. Ensure the attributes of SQLException are available in the log; debugging DB errors is painful
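
A small sketch of what “the attributes of SQLException” can look like in a log message. getSQLState(), getErrorCode() and getNextException() are standard java.sql.SQLException methods; the SqlErrorDescriber class and the message layout are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical helper that flattens a SQLException chain into one log message.
class SqlErrorDescriber {
    static String describe(String operation, SQLException e) {
        StringBuilder sb = new StringBuilder("DB failure during ").append(operation);
        // SQLExceptions can be chained; walk the chain so no detail is lost.
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            sb.append(" | SQLState=").append(cur.getSQLState())
              .append(", errorCode=").append(cur.getErrorCode())
              .append(", message=").append(cur.getMessage());
        }
        return sb.toString();
    }
}
```

The returned string would typically be the message of a single log.error call, with the exception itself passed along so the stack trace is kept.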

Other aspects

  1. A contentious topic is whether to use logs for tracking aspects such as time taken, number of calls, retry count, etc. This overlaps with the responsibility of other tools and is best decided in the context of your application
  2. Ensuring application logs are consistent would definitely aid in log analysis

Log Management

Most applications now log only to the console, and log management tools such as ELK, EFK and LogDNA collect the logs and move them into their respective systems.

If an application uses file-based logging, the following are must-haves to keep logs from eating up disk space and to keep them processable, especially when something goes wrong:

  1. Retention by time and size
  2. Rotation
  3. Compression
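
For file-based logging, a minimal sketch using the JDK’s java.util.logging.FileHandler is shown below. It covers only rotation and size-based retention; time-based retention and compression are not provided by this handler and would need another logging library or an external tool such as logrotate. The file name pattern and the limits are assumed values.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.FileHandler;
import java.util.logging.SimpleFormatter;

// Hypothetical factory for a size-limited, rotating file handler.
class RotatingFileLog {
    static final int MAX_BYTES_PER_FILE = 10 * 1024 * 1024; // assumed: rotate at roughly 10 MB
    static final int FILE_COUNT = 5;                        // assumed: keep at most 5 files

    static FileHandler create(String directory) {
        try {
            // %g is the generation number that distinguishes the rotated files.
            FileHandler handler = new FileHandler(
                    directory + "/app-%g.log", MAX_BYTES_PER_FILE, FILE_COUNT, true);
            handler.setFormatter(new SimpleFormatter());
            return handler;
        } catch (IOException e) {
            // Keep the original exception as the cause so the reason is not lost.
            throw new UncheckedIOException("Cannot open log files in " + directory, e);
        }
    }
}
```

With these values the logs are capped at about 50 MB on disk, since the oldest file is discarded once the fifth one fills up.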

Examples

log.error(exception);

This doesn’t give any information about what data caused the exception.

log.error("Failure", exception);

A botched attempt to fix the previous log

log.error("Failed to fetch ID:" + id + " from database", exception);

Now with this log we know what data failed (id) and during which operation (database).

log.error("Failed to call API", exception);

This does not tell us which API was called, with what parameters, what the response was, etc.

log.error("Failed to call API", exception);

log.error("Request:", request);

log.error("Response Status Code:", statusCode);

log.error("Response:", response);

Now, we are logging all the information, however in separate logs.

These 4 logs can appear many lines apart, depending on the load on the API server and on the logs being printed across the application by concurrent requests. Please avoid such logging and follow the example below.

log.error("Failed to call xyz API, Request:" + request + ", Response Status Code:" + statusCode + ", Response Body: " + response);

Managing Exceptions

new Exception("Failed to fetch ID:" + id + " from database");

Pass the original exception along so that we do not lose the rich information it contains, e.g. SQLException carries a lot of information that helps us decipher DB issues.

new Exception("Failed to fetch ID:" + id + " from database", originalException);

Ensures we have the complete context when we decide to log the exception
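
Put together, passing the cause along and logging only once might look like the sketch below. It assumes java.util.logging and uses an unchecked IllegalStateException instead of Exception to keep the example short; UserRepository and RequestHandler are hypothetical, and the failing query is simulated.

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical data access class; the query failure is simulated.
class UserRepository {
    static String fetchName(int id) {
        try {
            throw new SQLException("Connection refused", "08001", 0); // stands in for a real query
        } catch (SQLException original) {
            // Add context and keep the cause; no logging here, to avoid duplicate stack traces.
            throw new IllegalStateException("Failed to fetch ID:" + id + " from database", original);
        }
    }
}

// Hypothetical outer layer: the one place where the exception is logged.
class RequestHandler {
    private static final Logger LOG = Logger.getLogger(RequestHandler.class.getName());

    static String handle(int id) {
        try {
            return UserRepository.fetchName(id);
        } catch (IllegalStateException e) {
            // One log entry carrying the message and the full cause chain.
            LOG.log(Level.SEVERE, e.getMessage(), e);
            return "error";
        }
    }
}
```

The lower layer only adds context, and the outer layer decides when to log, so the stack trace appears once and still includes the original SQLException.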

Conclusion

While this post has focused on logging for APIs and concurrency around it, the concepts are applicable to batch jobs or any program.
