Migrating to use WAF for an existing project

Recently I've written a bunch of blog posts about using Front Door and WAF.

Calvin Mayo, a friend from a previous project, asked a really great question on LinkedIn. It was along the lines of "How can a customer migrate an existing application or API to use WAF and Front Door, and deal with all of the potential issues the WAF logs might identify?"

I wanted to write about this because it's such a good question, it's a big problem, and it's something I've come across a few times.

One example that springs to mind is an application that had been migrated from AWS to Azure. It had been using a WAF solution on AWS (I can't remember which one), but once the app was migrated, the Azure WAF identified and reported a lot of additional issues. I think the approach to dealing with this is important to understand.

What is our desired target state

The place most people will want to get to is where your WAF is connected to Front Door and running in Prevention mode. This means any request that violates the rules is automatically blocked and never gets near your code.

In this situation the WAF logs are very explicit about the attempts that were blocked by the rules. There is unlikely to be much noise in the logs, and few false-positive events get logged.

Ideal World Green Field Project

If you're on a greenfield project and still in development, you are in a great place and should add WAF and Front Door early in the project lifecycle. You will start identifying areas where the application is coded in a way the WAF won't like, and where it might block API calls or page requests. You can identify these early and change them to be compatible with the WAF rules. Your testing will make sure the app works well before go-live, and your WAF can be in Prevention mode from day 1.

The challenge of a migration project or an existing app

If you are adding WAF to an application that is already built and working, the problem is that the WAF rules can immediately start logging loads of issues and blocking some APIs and pages in ways you hadn't realised. Because the application is already built, the WAF rules might stop some of its functionality. The question in this situation is how to handle it so you can still get value from a WAF.

The most common response to this scenario is to put the WAF into Detection mode so that it logs the issues.

The problem here is that a lot of the time it's just left like this: loads of issues are being logged, but no one is really doing anything about them.

What can we do?

The first thing is to be honest about where you want to get to. It is possible to reach the desired state, where WAF is in Prevention mode and you are only blocking and logging genuine issues, but it's probably going to take some work and isn't likely to be quick.

If you just want to tick the box that says you have a WAF and have reports about activity, then leave it in Detection mode and you're already there. I wouldn't recommend this at all, as you're really just waiting for an incident to happen, but unfortunately all too often this seems to be what happens. You end up with loads of issues logged and no one really monitoring them.

The steps I'd take to get to the desired state are:

Step 1 – Add WAF in Detection mode and get a feel for what issues you have

To start with, when you add your WAF to Front Door, put it in Detection mode so you can start collecting information about how your app/API is behaving. Any issues the WAF rules identify are logged to Log Analytics. In Detection mode the issues get logged, but WAF won't block any calls.

You can now use Kusto queries against your logs to check what's happening. You can look for specific rules that are causing problems and specific areas of the application or API that may be problematic. A few weeks ago I wrote a post with some useful Kusto queries that will help you do this:

https://mikestephenson.me/2021/11/26/useful-kusto-queries-for-azure-frontdoor-waf-logs/

You want to be looking to identify the following:

Which rules have no issues at all
Which pages/api operations have no issues
Are there any common issues affecting many parts of the app

As you apply changes to the WAF, you need to redo this analysis, because the picture will change.
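The Kusto queries in the linked post are the real tool for this analysis. To show the logic in a language-neutral way, here is a minimal Python sketch that assumes you have exported WAF events as simple records with hypothetical `rule` and `path` fields (drawn from the rule name and request URI in the logs). The threshold of three distinct paths for a "common issue" is an arbitrary assumption.

```python
from collections import defaultdict


def summarise(events, all_rule_ids, all_paths, spread_threshold=3):
    """Answer the three Step 1 questions from a list of WAF events.

    Each event is a dict with hypothetical keys "rule" (rule ID) and
    "path" (page or API operation). all_rule_ids and all_paths are the
    full sets you expect, so rules/paths with no events can be found.
    """
    paths_by_rule = defaultdict(set)
    for event in events:
        paths_by_rule[event["rule"]].add(event["path"])
    flagged_paths = {event["path"] for event in events}

    return {
        # Rules that never fired during the analysis window.
        "quiet_rules": sorted(set(all_rule_ids) - set(paths_by_rule)),
        # Pages/operations that never triggered any rule.
        "clean_paths": sorted(set(all_paths) - flagged_paths),
        # Rules firing on many distinct paths hint at an app-wide issue.
        "common_issues": sorted(
            rule for rule, paths in paths_by_rule.items()
            if len(paths) >= spread_threshold
        ),
    }
```

Rerunning it after each change gives you the updated picture that this step calls for.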

Step 2 – Identify low hanging fruit

From the analysis of the logged events you might be able to identify some low-hanging fruit: things in the application that could easily be changed to fix issues. If you identify quick and easy fixes, it's best to just get them done ASAP. We had a couple of issues like this, where a very small code change got rid of a lot of the events being logged.
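One way to spot low-hanging fruit is to rank (rule, path) pairs by how many events they produce and work down from the biggest, since one small fix can remove a large share of the noise. A minimal sketch, assuming events with hypothetical `rule` and `path` fields; the 80% coverage target is an arbitrary choice.

```python
from collections import Counter


def quick_wins(events, coverage=0.8):
    """Fewest (rule, path) pairs that together cover `coverage` of all events.

    Taking the largest counts first minimises how many pairs are needed
    to reach the target, which makes them good candidates for quick fixes.
    """
    counts = Counter((event["rule"], event["path"]) for event in events)
    total = sum(counts.values())
    picked, covered = [], 0
    for pair, count in counts.most_common():
        if total == 0 or covered / total >= coverage:
            break
        picked.append((pair, count))
        covered += count
    return picked
```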

Step 3 – Identify sensible exclusions

From your analysis in step 1 you might find some places where it makes sense to add an exclusion to one or two rules. An example might be a particular URL in your application that flags up an event from WAF, but where the flagged input is actually needed for application functionality. If you decide it's better to add an exclusion to the rule than to change the application, the exclusion will remove a lot of events from the log, making the rest easier to analyse.

We had a situation where a form parameter on a particular URL was given an exclusion because it was needed for application functionality. Adding this fine-grained exclusion removed a lot of log events and allowed that rule to focus on the rest of the application.
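You can model the effect of a fine-grained exclusion locally, to see how much noise it would remove, before changing the policy. This is a simplified sketch, not the service's own evaluation logic: the `Exclusion` fields loosely mirror the match variable, operator and selector that Front Door exclusions are configured with, and the event fields (`rule`, `variable`, `name`) and the `htmlContent` parameter used below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exclusion:
    # Loosely mirrors Front Door exclusion settings (an assumption, simplified).
    match_variable: str  # e.g. "RequestBodyPostArgNames" for form fields
    operator: str        # "Equals", "StartsWith", "EndsWith", "Contains", "EqualsAny"
    selector: str        # field name to exclude from inspection

    def covers(self, variable, name):
        """True if this exclusion applies to the named request field."""
        if variable != self.match_variable:
            return False
        if self.operator == "EqualsAny":
            return True
        checks = {
            "Equals": name == self.selector,
            "StartsWith": name.startswith(self.selector),
            "EndsWith": name.endswith(self.selector),
            "Contains": self.selector in name,
        }
        return checks[self.operator]


def remaining_events(events, exclusions_by_rule):
    """Events still logged once per-rule exclusions are applied.

    An exclusion only suppresses matches for the rule it is attached to,
    so other rules keep inspecting the same field.
    """
    return [
        event for event in events
        if not any(
            exclusion.covers(event["variable"], event["name"])
            for exclusion in exclusions_by_rule.get(event["rule"], [])
        )
    ]
```

Attaching the exclusion to a single rule, rather than every rule, is what keeps it fine-grained: other rules still inspect the excluded field.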

Step 4 – Change rules from block to log

A common misconception is that WAF only has Prevention and Detection modes. That's true for the policy as a whole, but you can also configure the action for individual rules. At step 4 we go through all of our rules and change the action for each of them from Block to Log.
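Changing every rule by hand is tedious, so you may prefer to generate the overrides. A minimal sketch, assuming you already have the list of rule IDs per rule group: the output shape (`ruleGroupName`, `rules`, `ruleId`, `action`) is loosely modelled on the rule group overrides in a Front Door WAF policy, so treat the key names as an assumption and check them against the policy schema before deploying. The group names and rule IDs below are illustrative.

```python
import json


def log_only_overrides(rule_ids_by_group):
    """Build rule-group overrides that set every listed rule's action to Log.

    Key names are assumed, loosely following a Front Door WAF policy's
    managed rule group overrides; verify them before use.
    """
    return [
        {
            "ruleGroupName": group,
            "rules": [
                {"ruleId": rule_id, "action": "Log"}
                for rule_id in sorted(rule_ids)
            ],
        }
        for group, rule_ids in sorted(rule_ids_by_group.items())
    ]


if __name__ == "__main__":
    overrides = log_only_overrides({"SQLI": ["942110", "942100"], "XSS": ["941100"]})
    print(json.dumps(overrides, indent=2))
```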

Step 5 – Change WAF from Detection to Prevention

This means I can now switch the overall WAF mode from Detection to Prevention, but rather than rejecting every request the WAF doesn't like, it will log them all instead.
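The interplay between the policy mode and the per-rule action is easy to get muddled, so here is a small model of it. It is a simplified sketch covering only the Block and Log actions discussed in this post (plus Allow), and it ignores any other behaviour the service may have, so check the documentation for your rule set version.

```python
def outcome(policy_mode, rule_action):
    """Simplified result for a request that matches a rule."""
    if policy_mode == "Detection":
        # Detection mode never blocks: every match is only logged.
        return "logged"
    if policy_mode != "Prevention":
        raise ValueError(f"unknown policy mode: {policy_mode!r}")
    # In Prevention mode the rule's own action decides what happens.
    results = {"Block": "blocked", "Log": "logged", "Allow": "allowed"}
    if rule_action not in results:
        raise ValueError(f"unsupported rule action: {rule_action!r}")
    return results[rule_action]


if __name__ == "__main__":
    for mode in ("Detection", "Prevention"):
        for action in ("Block", "Log"):
            print(f"{mode:<10} + {action:<5} -> {outcome(mode, action)}")
```

Step 5 is the Prevention + Log combination: the mode changes, but nothing is blocked until individual rules are switched back to Block.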

Step 6 – Change some rules back to Block

I can now go back to my analysis, look for all of the rules that aren't flagging any issues, and change them to Block. Any request that triggers these rules will then be blocked. This shouldn't affect application functionality, because during our analysis these rules weren't triggered by the app. This protects us against future changes.

Next I go back to my analysis and look at the rules where I added exclusions. If they are reporting no issues because the exclusions are working, I can choose to block these too.

At this point the only rules not blocking should be the ones reporting issues in the log for parts of my app where I want to fix the app code.
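Choosing which rules to move back to Block is essentially a filter over your analysis. A minimal sketch, assuming a dict of each rule's current action and a dict of hit counts from your log queries, counted after exclusions so that rules whose exclusions are working also show zero. Zero hits only means something if the analysis window covered representative traffic, and how long that window should be is a judgment call. Rule IDs below are illustrative.

```python
def rules_safe_to_block(current_actions, hits_in_window):
    """Rules currently set to Log that recorded no events in the window.

    hits_in_window should count events *after* exclusions are applied,
    so rules whose exclusions fully cover the app's traffic also qualify.
    """
    return sorted(
        rule for rule, action in current_actions.items()
        if action == "Log" and hits_in_window.get(rule, 0) == 0
    )


def with_block(current_actions, rules):
    """Copy of current_actions with the given rules switched to Block."""
    updated = dict(current_actions)
    for rule in rules:
        updated[rule] = "Block"
    return updated
```

Whatever is still on Log afterwards is exactly the set of rules described above: the ones flagging parts of the app that need code changes.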

There is a decision to make here. Do you leave these rules logging, so it's clearly visible that you have an issue you need to deal with? Or do you add an exclusion to the rules so the logs are clean, and then add work items to get the code fixed and the exclusion removed? There are pros and cons to both options. I'd do whatever makes it least likely that the issue gets forgotten. The risk of the exclusion is that the issue in your app does get forgotten, and because the logs don't flag it up, you might end up with the misconception that everything is fine.

Step 7 – Add some tests

You now have a number of identified code changes to make, and you should look to develop a test for each one that reproduces the request causing the WAF event. This lets you recreate the issue and retest it once the code is fixed.

My friend Mikael Sand suggested a great idea on Twitter: add some Postman tests to a pipeline to automate some of this testing.
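Mikael's suggestion was Postman; the sketch below swaps in Python's standard library so the same idea can run in any pipeline step. It assumes a test environment where the relevant rules are on Block, and that the WAF answers blocked requests with HTTP 403 (the usual default, though a custom block response status can be configured). Each case is a request the app genuinely makes, or a known-bad probe; a genuine request that still gets blocked reproduces the issue, and the same case should pass once the code is fixed. The `WafCase` type, paths and form fields are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass
class WafCase:
    name: str
    path: str
    form: dict
    expect_blocked: bool  # False for genuine app traffic, True for attack probes


def http_send(base_url, case, timeout=10):
    """POST the case's form data and return the HTTP status code."""
    data = urllib.parse.urlencode(case.form).encode()
    request = urllib.request.Request(base_url + case.path, data=data, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (including a WAF block) arrive as HTTPError.
        return err.code


def run_cases(cases, send):
    """Return the names of cases whose blocked/allowed outcome was unexpected."""
    failures = []
    for case in cases:
        blocked = send(case) == 403  # assumes the default 403 block response
        if blocked != case.expect_blocked:
            failures.append(case.name)
    return failures
```

In a pipeline you would call something like `run_cases(cases, lambda c: http_send("https://test.example.net", c))`, where the URL is a placeholder for your own test environment, and fail the build if the returned list is not empty.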

Step 8 – Identify changes needed to fix the issues that are left

You would then implement the code changes so the app works in a way that doesn't raise issues in WAF. You can then test and deploy the changes and monitor the logs to make sure it's all fine.
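Monitoring after a release mostly comes down to comparing the events for the affected rule and path before and after the deployment. A minimal sketch, assuming events with hypothetical `rule`, `path` and `time` fields; zero events after the deployment only means something once enough real traffic has reached that path.

```python
from datetime import datetime


def fix_report(events, rule, path, deployed_at):
    """Count events for one rule/path pair before and after a deployment."""
    before = after = 0
    for event in events:
        if event["rule"] != rule or event["path"] != path:
            continue
        if event["time"] < deployed_at:
            before += 1
        else:
            after += 1
    return {"before": before, "after": after, "fixed": before > 0 and after == 0}


if __name__ == "__main__":
    deployed = datetime(2022, 1, 10, 12, 0)
    sample = [{"rule": "942100", "path": "/api/orders", "time": datetime(2022, 1, 9)}]
    print(fix_report(sample, "942100", "/api/orders", deployed))
```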

Step 9 – Turn Rules back to Block

By this point you should be able to turn all of your rules back to Block, with your WAF in Prevention mode and just a small number of accepted exclusions.
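Because the target state is simple to describe, it is also simple to check, for example as a pipeline step that flags drift. A minimal sketch, assuming you can read the policy mode and each rule's action into plain Python values; it only reports problems and changes nothing.

```python
def outstanding_items(policy_mode, rule_actions):
    """List what still stands between the policy and the target state."""
    problems = []
    if policy_mode != "Prevention":
        problems.append(f"policy mode is {policy_mode}, expected Prevention")
    for rule, action in sorted(rule_actions.items()):
        if action != "Block":
            problems.append(f"rule {rule} action is {action}, expected Block")
    return problems


if __name__ == "__main__":
    for item in outstanding_items("Prevention", {"942100": "Block", "941100": "Log"}):
        print(item)
```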

Summary

Hopefully this gives you some ideas on how to migrate to Azure Front Door and WAF and then take a structured approach to managing all of the events you might get. If you are using WAF in Detection mode and there is so much noise in the logs that they aren't monitored, you should ask what value you are really getting from it. OK, you have a log to look back at if you have an incident, but the real value of a WAF is in helping you build a more secure, higher-quality application. Hopefully this approach will help you work your way from a mountain of unmanageable noise to a WAF that is really helping you.
