Getting the most out of Optimizely Cloudflare Edge Logs

Optimizely is introducing the ability to access Cloudflare's edge logs, which gives access to some information previously unavailable except through support. Access and analysis can be a bit clunky, so Blend's Director of Development Bob Davidson provides some strategies for getting the most out of these logs.

Optimizely is introducing the ability to access Cloudflare Edge Logs. I was fortunate to be a part of the beta.

First, a bit of context: Cloudflare works as an intermediary between web traffic and your site. Requests come to Cloudflare and Cloudflare can either proxy the requests to the site or serve the expected contents from its own cache at the edge.

Caching at the edge is a great performance boost in addition to the other benefits Cloudflare provides, but it also means that not all traffic actually reaches the server. This means not all traffic is logged at the server (in Application Insights).

But it is logged at Cloudflare.

The Edge Logs feature works by pushing Cloudflare's logs into an Azure Storage account, from which you can then pull the logs for analysis.

How is this useful?

More information is always useful, especially when you are looking for tricky issues. Specifically:

  • If you're doing advanced caching at the edge, pulling the logs can give you information about cache hit rate, what requests are getting missed, and more.
  • If the site is experiencing unusual traffic, the edge logs can provide more detailed information about the traffic, including IPs. This can be helpful, for example, for targeting bad bot traffic.
  • Ultimately, Cloudflare logs can provide a different view into traffic trends that may not be as accurately represented in web server logs.

Of course, you can request all of this data from Optimizely support, but with Optimizely's Cloudflare Edge Logs, you now have the ability to get it yourself.

How does it work?

As mentioned, logs are stored in an Azure Storage container. You can get access to this container via a Shared Access Signature (SAS) link through the DXP PowerShell module, EpiCloud:

  1. Install the module if you haven't already.
  2. Log into the PaaS portal and generate an API key for your project, making sure the key has access to the General environment (note: this appears as Edge Logs in the PaaS portal for me, but it's called the General environment when making API requests).
  3. Use the EpiCloud PowerShell module to generate a SAS link. In PowerShell, the command looks something like this:
$result = Get-EpiStorageContainerSasLink `
  -ProjectId "XXXXXXX" `
  -ClientKey "XXXXXXX" `
  -ClientSecret "XXXXXX" `
  -Environment "General" `
  -StorageContainer "cloudflarelogpush"

Note: When entering API keys on the command line, you should take appropriate precautions and avoid logging the client secret in plaintext anywhere.

This command will put an object in the $result variable. This object includes a few properties, such as the expiration date, the container name, and, most importantly, the SAS link. I usually put the SAS link on the clipboard and paste it into Azure Storage Explorer.

$result.sasLink | clip

Using something like Azure Storage Explorer, you can now download the logs. The logs consist of gzipped text files, with each request represented as a single line of JSON in the text files.
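
Each line is one request. A fictitious, abbreviated line, using a handful of the fields we'll care about later, looks roughly like this (the exact field set and timestamp format depend on how the Logpush job is configured):

{"RayID":"8f1a2b3c4d5e6f70","ClientIP":"203.0.113.10","ClientRequestHost":"www.example.com","ClientRequestMethod":"GET","ClientRequestURI":"/","CacheCacheStatus":"hit","EdgeResponseStatus":200,"EdgeStartTimestamp":"2025-01-10T20:00:36Z"}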

Automation

All of this can be helpful, but it can be more helpful with a sprinkling of automation. Lines of JSON are fine for quick debugging or finding specific and timely information; this requires almost no setup or infrastructure. At the other extreme is observability software with log ingest, metrics, and dashboards; this is far more powerful, but requires significant investment. For me, what I need is most often somewhere in the middle: the ability to quickly import targeted data into an environment where I can run ad hoc queries in a language I know well. In other words, a SQL database.

So I've written up some quick and dirty C# code to automate generating the SAS link, downloading just the files I need, and importing them into a SQL database where I can work with the data.

Note: What follows are the pieces; my final build is part of an internal automation tool that we're not ready to share, but you could assemble these pieces into a complete solution of your own.

The DXP Client

The EpiCloud PowerShell module is mostly a wrapper around the Optimizely DXP Deployment API, but there's no reason we can't also build a wrapper around that API in C#.

Here's an example, written using the Flurl.Http library. This example is only capable of authenticating and requesting SAS links.
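
The sketch below shows one shape that wrapper could take. The type names (DxpClient, DxpApiResponse, SasLinkResult), the saslink endpoint path, the request body, and the exact HMAC signing recipe are assumptions based on how the EpiCloud module calls the Deployment API, so verify them against EpiCloud before relying on this.

// One possible Flurl.Http wrapper for requesting a SAS link from the DXP Deployment API.
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using Flurl;
using Flurl.Http;

// Property names other than sasLink are assumptions about the API's response shape.
public record SasLinkResult(
    [property: JsonPropertyName("containerName")] string ContainerName,
    [property: JsonPropertyName("sasLink")] string SasLink,
    [property: JsonPropertyName("expiresOn")] DateTimeOffset? ExpiresOn);

public record DxpApiResponse<T>(
    [property: JsonPropertyName("success")] bool Success,
    [property: JsonPropertyName("errors")] string[] Errors,
    [property: JsonPropertyName("result")] T Result);

public static class DxpClient
{
    // Base URL the EpiCloud module talks to (treat as an assumption).
    private const string BaseUrl = "https://paasportal.episerver.net/api/v1.0/";

    public static async Task<DxpApiResponse<SasLinkResult>> GetStorageSasLink(
        string projectId, string clientKey, string clientSecret,
        string environment, string container, bool writable = false)
    {
        var url = BaseUrl.AppendPathSegments(
            "projects", projectId, "environments", environment,
            "storagecontainers", container, "saslink");

        // Serialize the body ourselves so the signed MD5 hash matches the bytes we send.
        var body = JsonSerializer.Serialize(new { RetentionHours = 24, Writable = writable });

        return await url
            .WithHeader("Authorization", BuildHmacHeader(clientKey, clientSecret, "POST", url.Path, body))
            .WithHeader("Content-Type", "application/json")
            .PostStringAsync(body)
            .ReceiveJson<DxpApiResponse<SasLinkResult>>();
    }

    // "epi-hmac {key}:{timestamp}:{nonce}:{signature}", where the signature covers
    // key + method + path + timestamp + nonce + base64(MD5(body)).
    // ASSUMPTION: this mirrors my reading of how EpiCloud signs requests; double-check it there.
    private static string BuildHmacHeader(string key, string secret, string method, string path, string body)
    {
        var timestamp = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds().ToString();
        var nonce = Guid.NewGuid().ToString("N");

        using var md5 = MD5.Create();
        var bodyHash = Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(body)));

        var message = $"{key}{method}{path}{timestamp}{nonce}{bodyHash}";
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(message)));

        return $"epi-hmac {key}:{timestamp}:{nonce}:{signature}";
    }
}

The response envelope here is modeled to line up with the checks in the final example at the end of this post (sasResult.Success and sasResult.Result?.SasLink).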

Download Files

The next step is listing and downloading logs. Logs are broken up into chunks by time period, and the file names reflect that. You'll see filenames like: 20250110T200036Z_20250110T200136Z_3e5fc78d.log.gz. Decoding this is a relatively simple matter of spacing out the component parts of a full datetime (a small parsing sketch follows this list). In this example:

  • 20250110T200036Z is the start time
  • 20250110T200136Z is the end time
  • From 20250110T200036Z (the start time), we get:
    • 2025 is the year
    • 01 is the month (January)
    • 10 is the day
    • 20 is the hours (8 P.M. in 12 hour time)
    • 00 is the minutes
    • 36 is the seconds
    • All of this is in UTC
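
If you'd rather work with real DateTime values than raw strings, the two timestamps parse cleanly with an exact format string. A quick sketch (the helper name is mine):

// Parses the UTC start and end timestamps out of a log file name such as
// 20250110T200036Z_20250110T200136Z_3e5fc78d.log.gz.
using System;
using System.Globalization;

public static class LogFileNames
{
    public static (DateTime StartUtc, DateTime EndUtc) Parse(string fileName)
    {
        var parts = fileName.Split('_');
        return (ParseTimestamp(parts[0]), ParseTimestamp(parts[1]));
    }

    private static DateTime ParseTimestamp(string value) =>
        DateTime.ParseExact(
            value,
            "yyyyMMdd'T'HHmmss'Z'",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}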

Given that these logs add up quickly (particularly when doing heavy load testing), it can become necessary to filter out log files from time periods you're not interested in. Because the log filenames include a start and end date, it's a relatively simple matter of only downloading logs that appear in the window of time you want.

Below is an example of using the Azure.Storage.Blobs nuget package to download files behind the SAS link, roughly filtering by date (using string comparison because the date format allows that), and using a SemaphoreSlim to download multiple files concurrently.
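
Here's a sketch along those lines. The class name (SasLinkDownloader) and the digits-only yyyyMMddHHmm format for the from/to bounds are my own conventions, chosen to match the way it's called in the final example at the end of this post.

// Downloads gzipped log files behind the SAS link, filtered by the start timestamp in each file name.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class SasLinkDownloader
{
    // fromUtc/toUtc use the digits-only format "yyyyMMddHHmm", e.g. "202501240000".
    public static async Task<List<string>> DownloadFiles(
        string sasLink, string targetDirectory, string fromUtc, string toUtc)
    {
        Directory.CreateDirectory(targetDirectory);

        var container = new BlobContainerClient(new Uri(sasLink));
        var toDownload = new List<string>();

        // Blob names end with something like 20250110T200036Z_20250110T200136Z_3e5fc78d.log.gz.
        // Stripping the T and Z from the start timestamp lets us filter with a plain
        // ordinal string comparison, because the format sorts chronologically.
        await foreach (var blob in container.GetBlobsAsync())
        {
            var fileName = blob.Name.Split('/').Last();
            var start = fileName.Split('_')[0].Replace("T", "").Replace("Z", "");
            if (string.CompareOrdinal(start, fromUtc) >= 0 && string.CompareOrdinal(start, toUtc) < 0)
                toDownload.Add(blob.Name);
        }

        // Download up to four blobs at a time.
        var throttle = new SemaphoreSlim(4);
        var downloaded = new List<string>();

        var tasks = toDownload.Select(async blobName =>
        {
            await throttle.WaitAsync();
            try
            {
                var localPath = Path.Combine(targetDirectory, blobName.Split('/').Last());
                await container.GetBlobClient(blobName).DownloadToAsync(localPath);
                lock (downloaded) downloaded.Add(localPath);
            }
            finally
            {
                throttle.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
        return downloaded;
    }
}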

Inserting to SQL

Now that we have the files, we can read them, parse them, and dump them into a SQL database.

First, we need a table that holds the data we're most interested in. For example, this is the table I've put together:

CREATE TABLE CloudflareLog (
	ID INT NOT NULL IDENTITY(1,1),
	RayID varchar(32) NOT NULL,
	ClientIP varchar(64) NOT NULL,
	ClientLatitude varchar(32) NULL,
	ClientLongitude varchar(32) NULL,
	ClientRequestHost nvarchar(256) NOT NULL,
	ClientRequestMethod varchar(16) NOT NULL,
	ClientRequestPath nvarchar(2048) NOT NULL,
	ClientRequestReferer nvarchar(2048) NULL,
	ClientRequestURI nvarchar(2048) NOT NULL,
	ClientRequestUserAgent nvarchar(2048) NULL,
	CacheCacheStatus varchar(16) NOT NULL,
	CacheResponseBytes int NOT NULL,
	CacheResponseStatus int NOT NULL,
	EdgeResponseStatus int NOT NULL,
	OriginResponseStatus int NOT NULL,
	OriginResponseDurationMs int NULL,
	
	EdgeStartTimestamp varchar(128) NOT NULL,
	EdgeEndTimestamp varchar(128) NOT NULL,

	CONSTRAINT PK_CloudflareLog PRIMARY KEY (ID)
);

Next, we need to insert the data into this table. While we certainly could use a long series of INSERT statements, we would quickly run into performance issues when inserting millions of rows. This quick and dirty setup ends up only being dirty if it takes an hour to load data from an incident. Instead, we lean on a minimal implementation of IDataReader to feed our log entries to SqlBulkCopy. This combination lets us stream rows to the database without issuing an INSERT per log entry, speeding the process up significantly.
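
Below is a minimal sketch of that reader. The JSON property names are assumed to match the table's column names above; adjust the list to whatever fields your Logpush job actually includes. The class names (CloudflareLogReader, CloudflareBulkSqlUploader) are my own.

// Streams parsed log lines into SqlBulkCopy via a minimal IDataReader.
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Text.Json;
using Microsoft.Data.SqlClient;

public sealed class CloudflareLogReader : IDataReader
{
    // Every CloudflareLog column except the identity ID.
    public static readonly string[] Columns =
    {
        "RayID", "ClientIP", "ClientLatitude", "ClientLongitude", "ClientRequestHost",
        "ClientRequestMethod", "ClientRequestPath", "ClientRequestReferer", "ClientRequestURI",
        "ClientRequestUserAgent", "CacheCacheStatus", "CacheResponseBytes", "CacheResponseStatus",
        "EdgeResponseStatus", "OriginResponseStatus", "OriginResponseDurationMs",
        "EdgeStartTimestamp", "EdgeEndTimestamp"
    };

    private readonly IEnumerator<JsonElement> _rows;

    public CloudflareLogReader(IEnumerable<string> gzippedLogFiles) =>
        _rows = ReadLines(gzippedLogFiles).GetEnumerator();

    // Each .log.gz file is newline-delimited JSON: one request per line.
    private static IEnumerable<JsonElement> ReadLines(IEnumerable<string> files)
    {
        foreach (var file in files)
        {
            using var gzip = new GZipStream(File.OpenRead(file), CompressionMode.Decompress);
            using var reader = new StreamReader(gzip);
            string? line;
            while ((line = reader.ReadLine()) != null)
                yield return JsonDocument.Parse(line).RootElement;
        }
    }

    public int FieldCount => Columns.Length;
    public bool Read() => _rows.MoveNext();
    public string GetName(int i) => Columns[i];
    public int GetOrdinal(string name) => Array.IndexOf(Columns, name);
    public bool IsDBNull(int i) => GetValue(i) == DBNull.Value;
    public void Dispose() => _rows.Dispose();

    public object GetValue(int i)
    {
        if (!_rows.Current.TryGetProperty(Columns[i], out var value))
            return DBNull.Value;

        return value.ValueKind switch
        {
            JsonValueKind.Null => (object)DBNull.Value,
            JsonValueKind.Number when value.TryGetInt64(out var number) => number,
            JsonValueKind.Number => value.GetDouble(),
            _ => value.ToString()
        };
    }

    public int GetValues(object[] values)
    {
        var count = Math.Min(FieldCount, values.Length);
        for (var i = 0; i < count; i++) values[i] = GetValue(i);
        return count;
    }

    // Members SqlBulkCopy doesn't exercise in this scenario.
    public void Close() { }
    public bool IsClosed => false;
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));
    public Type GetFieldType(int i) => typeof(object);
    public string GetString(int i) => (string)GetValue(i);
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
}

Feeding that reader to SqlBulkCopy is then just a matter of mapping columns by name so the identity ID column stays out of the way (the batch size here is an arbitrary choice):

public static class CloudflareBulkSqlUploader
{
    public static void UploadLogFiles(string connectionString, IEnumerable<string> files)
    {
        using var reader = new CloudflareLogReader(files);
        using var bulkCopy = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "CloudflareLog",
            BatchSize = 10_000
        };

        // Map columns by name so SQL Server generates the identity ID values.
        foreach (var column in CloudflareLogReader.Columns)
            bulkCopy.ColumnMappings.Add(column, column);

        bulkCopy.WriteToServer(reader);
    }
}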

Reading the table

With the data in a SQL table, we can now query it like any other SQL data. For example, if you wanted to know roughly what your cache hit rate is at the edge, you could query the total number of hits for each Cloudflare cache status:

SELECT
    CacheCacheStatus,
    COUNT(*) AS Total
FROM
    [CloudflareLog]
GROUP BY 
    CacheCacheStatus
ORDER BY 
    COUNT(*) DESC

You might also want a time series representing total requests in five minute intervals. That might look something like:

SELECT 
    DATEADD(MINUTE, DATEDIFF(MINUTE, 0, EdgeStartTimestamp) / 5 * 5, 0) AS TimeBucket,
    COUNT(ID) AS RequestCount
FROM 
    [CloudflareLog]
GROUP BY 
    DATEADD(MINUTE, DATEDIFF(MINUTE, 0, EdgeStartTimestamp) / 5 * 5, 0)
ORDER BY 
    TimeBucket;

The only limitations are the data you have, the queries you write, and the compute needed to process the results.

Putting it all together

Above are all the individual components of some kind of Cloudflare Edge Log automation workflow. Ultimately, it's up to you to put them together in a way that works and is secure. Below is the absolute simplest way one could string these steps together:

// Generate the SAS link
var sasResult = await DxpClient.GetStorageSasLink(
    "[[PROJECT ID]]", 
    "[[CLIENT KEY]]",
    "[[CLIENT SECRET]]", 
    "General", 
    "cloudflarelogpush", 
    false
);

if (sasResult is null || !sasResult.Success || string.IsNullOrEmpty(sasResult.Result?.SasLink))
    throw new InvalidOperationException("Something went wrong. Put a better log message here.");

// Download the matching files from 2025-01-24 midnight UTC to 2025-01-25 midnight UTC
var downloadedFiles = await SasLinkDownloader.DownloadFiles(
    sasResult.Result.SasLink,
    @"D:\tmp\logs", 
    "202501240000",
    "202501250000"
);

// Now bulk upload them to SQL. 
CloudflareBulkSqlUploader.UploadLogFiles("Data Source=.;Initial Catalog=CloudflareExampleLogs;Trusted_Connection=True;Encrypt=False;", downloadedFiles);

On its own, the availability of Optimizely's Cloudflare Edge Logs is a great tool to have in your pocket to help monitor, analyze, and diagnose traffic on your site. And with a touch of automation, it can truly become a powerhouse of data.

Blend's Director of Development, Bob Davidson, provides tutorials on all things development.

His web series, Coding with Bob, can be found on YouTube. Check it out!