Missing logs for GELF UDP input over AWS NLB
Does your cluster really log everything?
Are you using a Graylog cluster with a GELF UDP input behind a load balancer in your monitoring stack? If so, you're probably irretrievably losing logs (especially the longer ones) and don't even know it!
How it all started
Recently, my team encountered an issue with missing data being written to Graylog SIEM, which was visible in the application logs.
Initially, it seemed that the missing data was occurring at random intervals unrelated to higher load. We also did not notice any network issues.
Upon further analysis, it turned out that the missing data was most likely from the larger payloads and that it may be related to the MTU value.