Cloudflare has admitted that it broke its own logging-as-a-service offering with a bad software update, and that customer data was lost as a result.
The network-taming firm admitted in a Tuesday post that, for roughly 3.5 hours on November 14, its Cloudflare Logs service didn’t send data it collected to customers – and about 55 percent of the logs were lost.
Cloudflare Logs gathers logs generated by Cloudflare's services and sends them to customers who want to analyze them. Cloudflare suggests the logs may prove helpful “for debugging, identifying configuration adjustments, and creating analytics, especially when combined with logs from other sources, such as your application server.”
Cloudflare customers often want logs from multiple servers and, as logfiles can be verbose and voluminous, the provider worries that consuming them all could prove overwhelming.
“Imagine the postal service ringing your doorbell once for each letter instead of once for each packet of letters,” the post suggests. “With thousands or millions of letters each second, the number of separate transactions that would entail becomes prohibitive.”
Cloudflare therefore uses a tool called Logpush to batch logs into bundles of predictable size, then push them to customers at a sensible cadence.
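For readers who want to picture the batching idea, here is a minimal Python sketch. It is not Cloudflare's code: the function name `batch_by_size`, the byte limit, and the sample log lines are all hypothetical, and it shows only the general technique of grouping lines until a size cap would be exceeded.

```python
from typing import Iterable, Iterator, List


def batch_by_size(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group log lines into batches whose encoded size stays within max_bytes.

    A single line larger than max_bytes still gets a batch of its own.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Close the current batch if adding this line would exceed the cap.
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    # Flush whatever is left over.
    if batch:
        yield batch


# Hypothetical sample data: ten short log lines of 9 bytes each.
logs = [f"request {i}" for i in range(10)]
batches = list(batch_by_size(logs, max_bytes=30))
```

With these assumed inputs, the recipient receives a handful of deliveries rather than one per line, which is the point of the doorbell analogy.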
Logs that Cloudflare provides to customers are prepared by other tools called Logfwdr and Logreceiver.
On November 14, Cloudflare made a change to Logpush, designed to support an additional dataset.
It was a buggy change – it “essentially informed Logfwdr that no customers had logs configured to be pushed.”
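To illustrate why that kind of message is so damaging, consider a rough Python sketch of a forwarder that only passes along logs for customers it believes have a push job configured. Everything here is an assumption made for illustration: the `forward` function, the event layout, and the customer names are hypothetical and are not drawn from Logfwdr itself.

```python
from typing import Dict, List, Set


def forward(events: List[Dict[str, str]], configured: Set[str]) -> List[Dict[str, str]]:
    """Keep only events belonging to customers with a configured push job."""
    return [event for event in events if event["customer"] in configured]


# Hypothetical events from two customers.
events = [
    {"customer": "a", "line": "GET /"},
    {"customer": "b", "line": "POST /x"},
]

# Normal case: both customers are configured, so both events are forwarded.
healthy = forward(events, {"a", "b"})

# Faulty case: the configuration arrives empty, so every event is filtered out.
buggy = forward(events, set())
```

In this toy model an empty configuration is indistinguishable from "nobody wants logs", so the filter quietly discards everything without raising an error.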