
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Thu, 16 Apr 2026 00:42:29 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Get better visibility for the WAF with payload logging]]></title>
            <link>https://blog.cloudflare.com/waf-payload-logging/</link>
            <pubDate>Mon, 24 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ The WAF provides ways for our customers to gain insight into why it takes certain actions. The more granular and precise the insight, the more reproducible and understandable it is. Revamped payload logging is one such method.  ]]></description>
            <content:encoded><![CDATA[ <p>As the surface area for attacks on the web increases, Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/waf/"><u>Web Application Firewall (WAF)</u></a> provides a myriad of solutions to mitigate these attacks. This is great for our customers, but the high cardinality of the millions of requests we serve means that some false positives are inevitable, so the default configuration we provide has to be fine-tuned.</p><p>Fine-tuning isn’t an opaque process: customers need data points to decide what works for them. This post explains the technologies we offer that let customers see why the WAF takes certain actions, and the improvements we have made to reduce noise and increase signal.</p>
    <div>
      <h2>The Log action is great — can we do more?</h2>
      <a href="#the-log-action-is-great-can-we-do-more">
        
      </a>
    </div>
    <p>Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/waf/"><u>WAF</u></a> protects origin servers from different kinds of layer 7 attacks, which are attacks that <a href="https://www.cloudflare.com/learning/ddos/application-layer-ddos-attack/"><u>target the application layer</u></a>. Protection is provided with various tools like:</p><ul><li><p><a href="https://developers.cloudflare.com/waf/managed-rules/"><u>Managed rules</u></a>, which security analysts at Cloudflare write to address <a href="https://www.cve.org/"><u>common vulnerabilities and exposures (CVE)</u></a>, <a href="https://www.cloudflare.com/learning/security/threats/owasp-top-10/"><u>OWASP security risks</u></a>, and vulnerabilities like Log4Shell.</p></li><li><p><a href="https://developers.cloudflare.com/waf/custom-rules/"><u>Custom rules</u></a>, where customers can write rules with the expressive <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/"><u>Rules language</u></a>.</p></li><li><p><a href="https://developers.cloudflare.com/waf/rate-limiting-rules/"><u>Rate limiting rules</u></a>, <a href="https://developers.cloudflare.com/waf/detections/malicious-uploads/"><u>malicious uploads detection</u></a>, <a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/"><u>leaked credentials detection</u></a>, etc.</p></li></ul><p>These tools are built on the <a href="https://developers.cloudflare.com/ruleset-engine/"><u>Rulesets engine</u></a>. When there is a match on a <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/expressions/"><u>Rule expression</u></a>, the engine executes an <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/actions/"><u>action</u></a>.</p><p>The Log action is used to simulate the behaviour of rules. 
This action confirms that the engine matched a rule expression, and emits a log event that can be accessed via <a href="https://developers.cloudflare.com/waf/analytics/security-analytics/"><u>Security Analytics</u></a>, <a href="https://developers.cloudflare.com/waf/analytics/security-events/"><u>Security Events</u></a>, <a href="https://developers.cloudflare.com/logs/logpush/"><u>Logpush</u></a>, or <a href="https://developers.cloudflare.com/logs/logpush/logpush-job/edge-log-delivery/"><u>Edge Log Delivery</u></a>.</p><p>Logs are great for validating that a rule works as expected on the traffic it was meant to match, but showing that a rule matched isn’t sufficient, especially when a rule expression can take many code paths. In pseudocode, an expression can look like:</p><p><code>IF any of the HTTP request headers contains an “authorization” key OR the lowercased HTTP host header starts with “cloudflare” THEN log</code></p><p>In the Rules language syntax, this is written as:</p>
            <pre><code>any(http.request.headers[*] contains "authorization") or starts_with(lower(http.host), "cloudflare")</code></pre>
            <p>Debugging this expression poses a couple of problems. Is it the left-hand side (LHS) or the right-hand side (RHS) of the OR expression above that matches? Functions such as <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/functions/#decode_base64"><u>Base64 decoding</u></a>, <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/functions/#url_decode"><u>URL decoding</u></a>, and, in this case, <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/functions/#lower"><u>lowercasing</u></a> transform the original representation of these fields, which adds further ambiguity about which characteristics of the request led to a match.</p><p>To further complicate this, many <a href="https://developers.cloudflare.com/ruleset-engine/about/rules/"><u>rules</u></a> in a <a href="https://developers.cloudflare.com/ruleset-engine/about/rulesets/"><u>ruleset</u></a> can register matches. Rulesets like <a href="https://developers.cloudflare.com/waf/managed-rules/reference/owasp-core-ruleset/"><u>Cloudflare OWASP</u></a> use a cumulative score of different rules to trigger an action when the score crosses a <a href="https://developers.cloudflare.com/waf/managed-rules/reference/owasp-core-ruleset/concepts/#score-threshold"><u>set threshold</u></a>.</p><p>Additionally, the expressions of the Cloudflare Managed and OWASP rules are private. This strengthens our security posture, but it also means that customers can only guess what these rules do from their titles, tags, and descriptions. For instance, one might be labeled “SonicWall SMA - Remote Code Execution - CVE:CVE-2025-32819.”</p><p>This raises questions: What part of my request led to a match in the Rulesets engine? Are these false positives?</p><p>This is where payload logging shines. It helps us drill down to the specific fields in the rule that led to a match, along with their values after any transformations.</p>
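<p>The cumulative scoring mentioned above can be illustrated with a minimal Rust sketch. The rule IDs, patterns, the score of 5 per rule, and the threshold of 10 are all hypothetical values chosen for illustration, not the real Cloudflare OWASP configuration. Each matching rule adds its score, and the action only fires once the total reaches the threshold, which is why several rules can sit behind a single action.</p>

```rust
/// A hypothetical scored rule: an ID, the score it contributes, and a predicate.
struct ScoredRule {
    id: &'static str,
    score: u32,
    matches: fn(&str) -> bool,
}

/// Sum the scores of every rule that matches `input`, and report whether the
/// total reaches `threshold`, plus the IDs of the rules that contributed.
fn evaluate(rules: &[ScoredRule], input: &str, threshold: u32) -> (u32, bool, Vec<&'static str>) {
    let mut total = 0;
    let mut matched = Vec::new();
    for rule in rules {
        if (rule.matches)(input) {
            total += rule.score;
            matched.push(rule.id);
        }
    }
    (total, total >= threshold, matched)
}

fn main() {
    // Hypothetical rules and scores, for illustration only.
    let rules = [
        ScoredRule { id: "sqli-keyword", score: 5, matches: |s| s.to_lowercase().contains("union select") },
        ScoredRule { id: "sqli-comment", score: 5, matches: |s| s.contains("--") },
        ScoredRule { id: "xss-js-uri", score: 5, matches: |s| s.to_lowercase().contains("javascript:") },
    ];
    let (total, triggered, matched) = evaluate(&rules, "id=1 UNION SELECT pass --", 10);
    println!("score={} triggered={} rules={:?}", total, triggered, matched);
}
```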
    <div>
      <h2>Payload logging</h2>
      <a href="#payload-logging">
        
      </a>
    </div>
    <p>Payload logging is a feature that logs which fields in the request are associated with a rule that led the WAF to take an action. This reduces ambiguity and provides useful information that can help spot-check false positives, verify correctness, and fine-tune these rules for better performance.</p><p>For the example above, a payload log entry will contain either the LHS or the RHS of the expression, but not both.</p>
    <div>
      <h3>How does payload logging work?</h3>
      <a href="#how-does-payload-logging-work">
        
      </a>
    </div>
    <p>Both the payload logging and Rulesets engines are built on Wirefilter, which we have <a href="https://blog.cloudflare.com/building-fast-interpreters-in-rust/"><u>written about extensively</u></a>.</p><p>Fundamentally, these engines are Rust objects that implement a <a href="https://github.com/cloudflare/wirefilter/blob/72e3954622ff7f30c4171f45461c2274656ee1e3/engine/src/compiler.rs#L7"><u>compiler</u></a> trait. This trait drives the compilation of the abstract syntax trees (ASTs) derived from these expressions.</p>
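            <pre><code>use std::collections::HashMap;
use std::sync::Arc;

use regex::Regex;

struct PayloadLoggingCompiler {
    // Compiled regexes keyed by their source pattern, reused across rules.
    regex_cache: HashMap&lt;String, Arc&lt;Regex&gt;&gt;,
}

impl wirefilter::Compiler for PayloadLoggingCompiler {
    type U = PayloadLoggingUserData;

    fn compile_logical_expr(&amp;mut self, node: LogicalExpr) -&gt; CompiledExpr&lt;Self::U&gt; {
        // ...
        // Compile each pattern at most once; later lookups clone the cached Arc.
        let regex = Arc::clone(
            self.regex_cache
                .entry(regex_pattern.clone())
                .or_insert_with(|| Arc::new(Regex::new(&amp;regex_pattern).expect("valid regex"))),
        );
        // ...
    }
}</code></pre>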
            <p>The Rulesets Engine executes an expression, and if it evaluates to true, the expression and its <a href="https://github.com/cloudflare/wirefilter/blob/72e3954622ff7f30c4171f45461c2274656ee1e3/engine/src/execution_context.rs#L38"><u>execution context</u></a> are sent to the payload logging compiler for re-evaluation. The execution context provides all the runtime values needed to evaluate the expression.</p><p>After re-evaluation, the fields involved in the branches of the expression that evaluate to true are logged.</p><p>The log is a map of Wirefilter fields to their values, <code>Map&lt;Field, Value&gt;</code>, for example:</p>
            <pre><code>{
    "http.host": "cloudflare.com",
    "http.method": "get",
    "http.user_agent": "mozilla"
}</code></pre>
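<p>To make the re-evaluation step concrete, here is a minimal Rust sketch built on a toy expression tree. It is not Wirefilter’s API: the <code>Expr</code> enum, the <code>HashMap</code> standing in for the execution context, and the <code>reevaluate</code> function are all hypothetical. It re-runs the example expression from earlier and records a field, with its value after any transformation, only when the branch that reads it evaluates to true. It assumes short-circuit evaluation of OR, which is one way to get the “LHS or RHS, but not both” behavior described above.</p>

```rust
use std::collections::{BTreeMap, HashMap};

/// A toy expression tree covering only the example rule (hypothetical, not Wirefilter's AST).
enum Expr {
    Or(Box<Expr>, Box<Expr>),
    /// any(field[*] contains needle)
    AnyContains { field: &'static str, needle: &'static str },
    /// starts_with(lower(field), prefix)
    LowerStartsWith { field: &'static str, prefix: &'static str },
}

/// Runtime field values, standing in for the execution context.
type Context = HashMap<&'static str, Vec<String>>;

/// Re-evaluate `expr`, logging each field (post-transformation) whose branch is true.
fn reevaluate(expr: &Expr, ctx: &Context, log: &mut BTreeMap<String, String>) -> bool {
    match expr {
        // `||` short-circuits: if the LHS is true the RHS is never evaluated,
        // so the log holds one side of the OR, not both.
        Expr::Or(lhs, rhs) => reevaluate(lhs, ctx, log) || reevaluate(rhs, ctx, log),
        Expr::AnyContains { field, needle } => {
            let values = match ctx.get(field) {
                Some(values) => values,
                None => return false,
            };
            match values.iter().find(|v| v.contains(*needle)) {
                Some(v) => {
                    log.insert(field.to_string(), v.clone());
                    true
                }
                None => false,
            }
        }
        Expr::LowerStartsWith { field, prefix } => {
            // Apply the transformation first, then log the transformed value.
            let lowered = ctx.get(field).and_then(|v| v.first()).map(|v| v.to_lowercase());
            match lowered {
                Some(v) if v.starts_with(*prefix) => {
                    log.insert(field.to_string(), v);
                    true
                }
                _ => false,
            }
        }
    }
}

fn main() {
    // any(http.request.headers[*] contains "authorization") or starts_with(lower(http.host), "cloudflare")
    let expr = Expr::Or(
        Box::new(Expr::AnyContains { field: "http.request.headers", needle: "authorization" }),
        Box::new(Expr::LowerStartsWith { field: "http.host", prefix: "cloudflare" }),
    );
    let mut ctx: Context = HashMap::new();
    ctx.insert("http.request.headers", vec!["text/html".to_string()]);
    ctx.insert("http.host", vec!["Cloudflare.com".to_string()]);

    let mut log = BTreeMap::new();
    let matched = reevaluate(&expr, &ctx, &mut log);
    println!("matched={} log={:?}", matched, log);
}
```

<p>With a host of <code>Cloudflare.com</code> and no header containing “authorization”, only the RHS is true, so the log contains <code>http.host</code> with the lowercased value <code>cloudflare.com</code>.</p>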
            <p>Note: <a href="https://blog.cloudflare.com/encrypt-waf-payloads-hpke/"><u>These logs are encrypted with the public key provided by the customer.</u></a> </p><p>These logs go through our logging pipeline and can be read in different ways. Customers can configure a Logpush job to write to a custom Worker we built that uses the customer’s private key to automatically decrypt these logs. The Payload logging <a href="https://github.com/cloudflare/matched-data-cli"><u>CLI tool</u></a>, <a href="https://github.com/cloudflare/matched-data-worker"><u>Worker</u></a>, or the Cloudflare dashboard can also be used for decryption.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4jTk0nPsfA0yowEwHx5VUr/eed6cece439d238b1ec861a9e0760dd6/image5.png" />
          </figure>
    <div>
      <h3>What improvements have been shipped?</h3>
      <a href="#what-improvements-have-been-shipped">
        
      </a>
    </div>
    <p>In wirefilter, some fields are array types. The field <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/fields/reference/http.request.headers.names/"><u>http.request.headers.names</u></a> is an array of all the header names in a request. For example:</p>
            <pre><code>["content-type", "content-length", "authorization", "host"]</code></pre>
            <p>An expression that reads <code>any(http.request.headers.names[*] contains "c")</code> will evaluate to true because at least one of the headers contains the letter “c”. With the previous version of the payload logging compiler, all the headers in the <code>http.request.headers.names</code> field were logged, since the field is part of the expression that evaluates to true.</p><p><b>Payload log (previous)</b></p>
            <pre><code>http.request.headers.names[*] = ["content-type", "content-length", "authorization", "host"]</code></pre>
            <p>Now, we partially evaluate array fields and log only the indexes that match the expression’s constraint. In this case, that’s just the headers that contain a “c”!</p><p><b>Payload log (new)</b></p>
            <pre><code>http.request.headers.names[0,1] = ["content-type", "content-length"]</code></pre>
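<p>Partial evaluation of an array field can be sketched in a few lines of Rust. The <code>matching_elements</code> helper is hypothetical: it applies the expression’s constraint to each element and keeps only the indexes and values that satisfy it, rather than the whole array. <code>any()</code> is then simply true when the result is non-empty.</p>

```rust
/// Return the (index, value) pairs of the elements that satisfy `pred`,
/// instead of logging the whole array.
fn matching_elements<'a>(values: &[&'a str], pred: impl Fn(&str) -> bool) -> Vec<(usize, &'a str)> {
    let mut out = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        if pred(v) {
            out.push((i, v));
        }
    }
    out
}

fn main() {
    let names = ["content-type", "content-length", "authorization", "host"];
    // any(http.request.headers.names[*] contains "c")
    let hits = matching_elements(&names, |name| name.contains('c'));
    let any_matched = !hits.is_empty();
    let (indexes, values): (Vec<usize>, Vec<&str>) = hits.into_iter().unzip();
    println!("any = {}, http.request.headers.names{:?} = {:?}", any_matched, indexes, values);
}
```

<p>Running it prints the indexes 0 and 1 with their values, mirroring the new payload log above.</p>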
            
    <div>
      <h3>Operators</h3>
      <a href="#operators">
        
      </a>
    </div>
    <p>This brings us to operators in Wirefilter. Some operators, like “eq”, result in exact matches, e.g. <code>http.host eq "a.com"</code>. Other operators, like “in”, “contains”, and “matches” (which evaluates a regex), can result in “partial” matches.</p><p>The expression in this example, <code>any(http.request.headers.names[*] contains "c")</code>, uses the “contains” operator, which produces a partial match. It also uses the <code>any</code> function, which we can say produces a partial match too: if at least one of the headers contains a “c”, then we should log <i>that</i> header, not <i>all</i> the headers as we did in the previous version.</p><p>With the improvements to the payload logging compiler, we log just the partial matches when these expressions are evaluated. In this case, the new payload logging compiler handles the “contains” operator similarly to <a href="https://doc.rust-lang.org/std/string/struct.String.html#method.find"><u>the “find” method on strings in the Rust standard library</u></a>, which returns the byte index of the first match. This improves our payload log to:</p>
            <pre><code>http.request.headers.names[0,1] = ["c", "c"]</code></pre>
            <p>This makes things a lot clearer. It also saves our logging pipeline from processing millions of bytes. For example, a field that is analyzed a lot is the request body, <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/fields/reference/http.request.body.raw/"><u>http.request.body.raw</u></a>, which can be tens of kilobytes in size. Sometimes the expressions are checking for a regex pattern that matches just three characters. In this case, we’ll be logging 3 bytes instead of kilobytes!</p>
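<p>The “find”-style handling of <code>contains</code> can be sketched as follows. This is a minimal illustration that assumes only the first occurrence is needed; the <code>PartialMatch</code> type and <code>contains_partial</code> function are hypothetical. Instead of returning the whole field value, it returns the byte range of the match and the matched slice.</p>

```rust
use std::ops::Range;

/// A partial match: where the needle was found and the matched bytes.
#[derive(Debug, PartialEq)]
struct PartialMatch<'a> {
    range: Range<usize>,
    content: &'a str,
}

/// Evaluate `haystack contains needle`, returning only the first matched
/// slice (located with `str::find`) rather than the whole field value.
fn contains_partial<'a>(haystack: &'a str, needle: &str) -> Option<PartialMatch<'a>> {
    let start = haystack.find(needle)?;
    let end = start + needle.len();
    Some(PartialMatch { range: start..end, content: &haystack[start..end] })
}

fn main() {
    let names = ["content-type", "content-length", "authorization", "host"];
    for (i, name) in names.iter().enumerate() {
        if let Some(m) = contains_partial(name, "c") {
            println!("http.request.headers.names[{}] = {:?} at bytes {:?}", i, m.content, m.range);
        }
    }
}
```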
    <div>
      <h3>Context</h3>
      <a href="#context">
        
      </a>
    </div>
    <p>I know, I know, <code>["c", "c"]</code> doesn’t really mean much. Even though we’ve provided the exact reason for the match and are significantly reducing the volume of bytes written to our customers’ storage destinations, the key goal is to provide useful debugging information to the customer. As part of the payload logging improvements, the compiler now also logs the context “before” and “after” each partial match (where there is any). Each of these context buffers is currently capped at 15 bytes. This means our payload log now looks like:</p>
            <pre><code>http.request.headers.names[0,1] = [
    {
        before: null, // isn't included in the final log
        content: "c",
        after: "ontent-type"
    },
    {
        before: null, // isn't included in the final log
        content: "c",
        after: "ontent-length"
    }
]</code></pre>
            <p><b>Example of payload log (previous)</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4yhJWnG9FiDnQRAuV82t6C/0dbef6ded6f48a7f74d6a69aaf7d52a5/image7.png" />
          </figure><p><b>Example of payload log (new)</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6M1bQKaqNvalqtJo7nO6fM/8e87e1927ed404495273c082e258e6ee/image4.png" />
          </figure><p>In the previous log, we have all the header values. In the new log, we only have the header at index 8, which contains a malicious script. The match is on the “&lt;script&gt;” tag, and the rest, shown in gray, is the context.</p>
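<p>Extracting the “before” and “after” context can be sketched in Rust as below. The 15-byte window comes from this post; the <code>with_context</code> function and <code>MatchWithContext</code> type are hypothetical. Because Rust strings can only be sliced on UTF-8 character boundaries, this sketch shrinks a window rather than cutting a multi-byte character in half, so a window may hold slightly fewer than 15 bytes.</p>

```rust
/// Bytes of context kept on each side of a partial match.
const CONTEXT_BYTES: usize = 15;

#[derive(Debug, PartialEq)]
struct MatchWithContext<'a> {
    before: Option<&'a str>,
    content: &'a str,
    after: Option<&'a str>,
}

/// Empty context is reported as None (`null` in the log).
fn non_empty(s: &str) -> Option<&str> {
    if s.is_empty() { None } else { Some(s) }
}

/// Slice up to CONTEXT_BYTES before and after the match `start..end`,
/// shrinking each window so it never splits a multi-byte UTF-8 character.
fn with_context(value: &str, start: usize, end: usize) -> MatchWithContext<'_> {
    assert!(start <= end && value.is_char_boundary(start) && value.is_char_boundary(end));
    let mut lo = start.saturating_sub(CONTEXT_BYTES);
    while !value.is_char_boundary(lo) {
        lo += 1; // move right until we land on a character boundary
    }
    let mut hi = (end + CONTEXT_BYTES).min(value.len());
    while !value.is_char_boundary(hi) {
        hi -= 1; // move left until we land on a character boundary
    }
    MatchWithContext {
        before: non_empty(&value[lo..start]),
        content: &value[start..end],
        after: non_empty(&value[end..hi]),
    }
}

fn main() {
    println!("{:?}", with_context("content-type", 0, 1));

    let body = "comment=great post! alert(document.cookie) thanks";
    let start = body.find("alert(").unwrap();
    println!("{:?}", with_context(body, start, start + "alert(".len()));
}
```

<p>For the header <code>content-type</code> and a match at byte 0, <code>before</code> is <code>None</code> and <code>after</code> is <code>ontent-type</code>.</p>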
    <div>
      <h3>Optimizations</h3>
      <a href="#optimizations">
        
      </a>
    </div>
    <p>Managed rules rely heavily on regular expressions to fingerprint malicious requests. Parsing and compiling these expressions is CPU-intensive. Because managed rules are written once and deployed across millions of zones, we benefit from compiling these regexes once and caching them in memory. This saves CPU cycles, since we don’t have to recompile them until the process restarts.</p><p>The payload logging compiler uses many dynamically sized arrays (vectors) to store the intermediate state for these logs. Crates like <a href="https://docs.rs/smallvec/latest/smallvec/"><u>smallvec</u></a>, which store a small number of elements inline and only allocate on the heap once that inline capacity is exceeded, are used to reduce heap allocations.</p>
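<p>A process-wide regex cache like the one described above can be sketched with the standard library alone. To keep the example self-contained, <code>CompiledPattern</code> and <code>compile</code> are hypothetical stand-ins for a real regex type and its expensive compile step (such as <code>regex::Regex::new</code>); the counter only exists to show that compilation happens once per pattern for the lifetime of the process.</p>

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled regex.
#[derive(Debug)]
struct CompiledPattern {
    source: String,
}

/// Counts how many times the expensive compile step actually ran.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

fn compile(pattern: &str) -> CompiledPattern {
    COMPILATIONS.fetch_add(1, Ordering::Relaxed);
    CompiledPattern { source: pattern.to_string() }
}

/// Process-wide cache: each pattern is compiled at most once and then shared
/// (via `Arc`) by every rule that uses it, until the process restarts.
fn cached(pattern: &str) -> Arc<CompiledPattern> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<CompiledPattern>>>> = OnceLock::new();
    let mut cache = CACHE.get_or_init(|| Mutex::new(HashMap::new())).lock().unwrap();
    Arc::clone(
        cache
            .entry(pattern.to_string())
            .or_insert_with(|| Arc::new(compile(pattern))),
    )
}

fn main() {
    // Two rules (in any number of zones) sharing one pattern.
    let a = cached(r"(?i)union\s+select");
    let b = cached(r"(?i)union\s+select");
    println!(
        "pattern {:?}: shared = {}, compilations = {}",
        a.source,
        Arc::ptr_eq(&a, &b),
        COMPILATIONS.load(Ordering::Relaxed)
    );
}
```

<p>Holding the lock while compiling keeps the sketch simple and guarantees a pattern is never compiled twice, at the cost of briefly serializing compilations of different patterns.</p>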
    <div>
      <h3>The infamous “TRUNCATED” value</h3>
      <a href="#the-infamous-truncated-value">
        
      </a>
    </div>
    <p>Sometimes, customers see <a href="https://github.com/cloudflare/matched-data-cli/blob/master/src/main.rs#L124-L129"><u>“truncated”</u></a> in their payload logs. This is because every firewall event has a size limit in bytes. When this limit is exceeded, the payload log is truncated. </p><p><b>Payload log (previous)</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7234Ub1YB3xCMd6thJ5Gje/22b87bc0149f82a9315c258227764e58/image6.png" />
          </figure><p><b>Payload log (new)</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3mBw8ElDalEbpjKmGoVTT3/f877a232df29c6e01889cc1941d1b69a/image1.png" />
          </figure><p>We have seen the p50 (median) byte size of the payload logs shrink from 1.5 kilobytes to 500 bytes, a 67% reduction! That means way fewer truncated payload logs.</p>
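<p>The link between payload size and truncation can be illustrated with a minimal Rust sketch. Both the byte limit and the simple <code>field=value;</code> encoding are assumptions for illustration, not the real firewall event format or limit: entries are appended until the next one would exceed the budget, at which point the log is marked as truncated. Logging a few matched bytes instead of a whole field keeps far more logs under the limit.</p>

```rust
/// Outcome of fitting log entries into a fixed byte budget.
#[derive(Debug, PartialEq)]
enum PayloadLog {
    Complete(String),
    Truncated(String),
}

/// Append `field=value;` entries until the next one would exceed `limit` bytes.
/// The limit and the encoding are illustrative assumptions.
fn build_log(entries: &[(&str, &str)], limit: usize) -> PayloadLog {
    let mut out = String::new();
    for (field, value) in entries {
        let entry = format!("{}={};", field, value);
        if out.len() + entry.len() > limit {
            return PayloadLog::Truncated(out);
        }
        out.push_str(&entry);
    }
    PayloadLog::Complete(out)
}

fn main() {
    let limit = 1024; // hypothetical per-event budget in bytes
    let whole_body = "a".repeat(20_000); // logging an entire 20 KB request body
    let old = build_log(&[("http.request.body.raw", whole_body.as_str())], limit);
    let new = build_log(&[("http.request.body.raw", "abc")], limit); // only the 3 matched bytes
    println!("whole field truncated: {}", matches!(old, PayloadLog::Truncated(_)));
    println!("partial match: {:?}", new);
}
```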
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We’re currently using a <a href="https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy"><u>lossy representation of UTF-8 strings</u></a> to represent values. This means that bytes that are not valid UTF-8, such as binary multimedia content, are replaced with <a href="https://doc.rust-lang.org/std/char/constant.REPLACEMENT_CHARACTER.html"><u>U+FFFD Unicode replacement characters</u></a>. For rules that operate on binary data, the integrity of these values should be preserved with byte arrays or a different serialization format.</p><p>The storage format for payload logging is JSON. We’ll be benchmarking it against binary formats like <a href="https://cbor.io/"><u>CBOR</u></a>, <a href="https://capnproto.org/"><u>Cap'n Proto</u></a>, <a href="https://protobuf.dev/"><u>Protobuf</u></a>, etc., to see how much processing time they save our pipeline. This will help us deliver logs to our customers faster, with the added advantage that binary formats can help maintain a defined, backward-compatible schema.</p><p>Finally, payload logging currently only works with Managed rules. It will be rolled out to other Cloudflare WAF products like custom rules, WAF attack score, content scanning, <a href="https://developers.cloudflare.com/waf/detections/firewall-for-ai/"><u>Firewall for AI</u></a>, and more.</p><p><i>An example of payload logging showing prompts containing PII, detected by Firewall for AI:</i></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4nq4rYROygGRVW7mTZ3nqz/3c49b85c54eee6f98aa0397dfce32fa5/image2.png" />
          </figure>
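<p>The problem with the lossy UTF-8 representation described in this section can be seen in a few lines of Rust. <code>String::from_utf8_lossy</code> replaces the invalid <code>0x89</code> byte at the start of a PNG signature with U+FFFD, and the original byte cannot be recovered. Hex encoding, shown here as one possible lossless alternative (an illustration, not a format we have committed to), round-trips the bytes exactly; the <code>to_hex</code> and <code>from_hex</code> helpers are hypothetical.</p>

```rust
/// Render raw bytes as lowercase hex, which round-trips losslessly.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decode a hex string produced by `to_hex` back into bytes.
fn from_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
        .collect()
}

fn main() {
    // The first bytes of a PNG signature; 0x89 is not valid UTF-8 on its own.
    let bytes: &[u8] = &[0x89, b'P', b'N', b'G'];
    let lossy = String::from_utf8_lossy(bytes);
    println!("lossy: {:?}", lossy); // the 0x89 byte became U+FFFD
    let hex = to_hex(bytes);
    println!("hex: {} -> round-trips: {}", hex, from_hex(&hex).as_deref() == Some(bytes));
}
```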
    <div>
      <h2>Why should I be excited?</h2>
      <a href="#why-should-i-be-excited">
        
      </a>
    </div>
    <p>Visibility into the actions taken by the WAF will give customers assurance that their rules or configurations are doing exactly what they expect. Improving the specificity of payload logging is a step in this direction, and further improvements to reliability and latency, along with expansion to more WAF products, are in the pipeline.</p><p>As this was a breaking change to the JSON schema, we’ve rolled it out to customers gradually, accompanied by <a href="https://developers.cloudflare.com/changelog/2025-05-08-improved-payload-logging/"><u>adequate documentation</u></a>.</p><p>To get started and enable payload logging, <a href="https://developers.cloudflare.com/waf/managed-rules/payload-logging/#turn-on-payload-logging"><u>visit our developer documentation</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Firewall]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Logging]]></category>
            <guid isPermaLink="false">2FEaxrBcxhN1nwrSuT9Jpd</guid>
            <dc:creator>Paschal Obba</dc:creator>
        </item>
        <item>
            <title><![CDATA[General availability for WAF Content Scanning for file malware protection]]></title>
            <link>https://blog.cloudflare.com/waf-content-scanning-for-malware-detection/</link>
            <pubDate>Thu, 07 Mar 2024 14:00:14 GMT</pubDate>
            <description><![CDATA[ Announcing the General Availability of WAF Content Scanning, protecting your web applications and APIs from malware by scanning files in-transit ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Kt17OxA0rO7EXenzZGJXf/b2410be6e9de67c10ee488c6c23ac60d/TXqgLUZ0L11n0fBPaEzi9DYuGJ_QGmAxLbfS9xKb6c_N9nBhFQhsQ4TsJ7kU82ADmgLjkM6EWmZXDBO_5OX4urYAeca428kjsFf2MM8RWBUsNtkg0gEO75D-brm4.png" />
            
            </figure><p>File upload is a common feature in many web applications. Applications may allow users to upload files like images of flood damage to file an insurance claim, PDFs like resumes or cover letters to apply for a job, or other documents like receipts or income statements. However, beneath the convenience lies a potential threat, since allowing unrestricted file uploads can expose the web server and your enterprise network to significant risks related to <a href="https://www.cloudflare.com/network-services/solutions/enterprise-network-security/">security</a>, privacy, and compliance.</p><p>Cloudflare recently introduced <a href="/waf-content-scanning/">WAF Content Scanning</a>, our in-line <a href="https://www.cloudflare.com/application-services/solutions/">malware file detection and prevention solution</a> to stop malicious files from reaching the web server, offering our Enterprise WAF customers an additional line of defense against security threats.</p><p>Today, we're pleased to announce that the feature is now generally available. It will be automatically rolled out to existing WAF Content Scanning customers before the end of March 2024.</p><p>In this blog post, we will share more details about the new version of the feature and what we have improved, and reveal some of the technical challenges we faced while building it. This feature is available to Enterprise WAF customers as an add-on license; contact your account team to get it.</p>
    <div>
      <h2>What to expect from the new version?</h2>
      <a href="#what-to-expect-from-the-new-version">
        
      </a>
    </div>
    <p>The feedback from the early access version has resulted in additional improvements. The main one is expanding the maximum size of scanned files from 1 MB to 15 MB. This change required a complete redesign of the solution's architecture and implementation. Additionally, we are improving the dashboard visibility and the overall analytics experience.</p><p>Let's quickly review how malware scanning operates within our WAF.</p>
    <div>
      <h2>Behind the scenes</h2>
      <a href="#behind-the-scenes">
        
      </a>
    </div>
    <p>WAF Content Scanning operates in a few stages: users activate and configure it; the scanning engine detects which requests contain files; the files are sent to the scanner, which returns the scan result fields; and finally, users can build custom rules with these fields. We will dig deeper into each step in this section.</p>
    <div>
      <h3>Activate and configure</h3>
      <a href="#activate-and-configure">
        
      </a>
    </div>
    <p>Customers can enable the feature via the <a href="https://developers.cloudflare.com/waf/about/content-scanning/api-calls/#enable-waf-content-scanning">API</a>, or through the Settings page in the dashboard (Security → Settings), where a new section has been added to configure and enable <a href="https://developers.cloudflare.com/waf/about/#detection-versus-mitigation">incoming traffic detection</a>. As soon as the feature is enabled, the setting is distributed across the Cloudflare network, which begins scanning incoming traffic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33blcVizOJx9iZiSqAzX3Y/5bdd6e83500ebf66d842a92854515b4e/image1-27.png" />
            
            </figure><p>Customers can also add a <a href="https://developers.cloudflare.com/waf/about/content-scanning/#2-optional-configure-a-custom-scan-expression">custom configuration</a> depending on the file upload method, such as a base64 encoded file in a JSON string, which allows the specified file to be parsed and scanned automatically.</p><p>In the example below, the customer wants us to look at JSON bodies for the key “file” and scan them.</p><p>This rule is written using the <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/">wirefilter syntax</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2bZfebbWfiQfchZpKJv75q/6088e2458deff944c3965647be3249ef/Scanning-all-incoming-requests-for-file-malware.png" />
            
            </figure>
    <div>
      <h3>Engine runs on traffic and scans the content</h3>
      <a href="#engine-runs-on-traffic-and-scans-the-content">
        
      </a>
    </div>
    <p>As soon as the feature is activated and configured, the scanning engine runs the pre-scanning logic and identifies content automatically via heuristics. Notably, the engine logic does not rely on the <i>Content-Type</i> header, since it’s easy for attackers to manipulate. When relevant content or a file has been found, the engine connects to the <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/http-policies/antivirus-scanning/">antivirus (AV) scanner</a> in our <a href="https://www.cloudflare.com/zero-trust/solutions/">Zero Trust solution</a> to perform a thorough analysis and return the results of the scan. The engine uses the scan results to populate fields that customers can use.</p>
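<p>Identifying content from its bytes rather than its headers can be sketched with a few well-known file signatures (“magic bytes”). This is a minimal illustration only: the signature table is a small hand-picked subset, the MIME labels are our own choice, and the real engine’s heuristics are not described in this post. Crates such as <code>infer</code> implement the same idea for many more types.</p>

```rust
/// Guess a body's type from its leading bytes, ignoring any Content-Type
/// header the client sent (which an attacker fully controls).
fn sniff(body: &[u8]) -> Option<&'static str> {
    // A few well-known file signatures. Illustrative subset only.
    let signatures: [(&[u8], &'static str); 6] = [
        (b"%PDF-", "application/pdf"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"PK\x03\x04", "application/zip"),
        (b"MZ", "application/x-msdownload"),
        (b"\x1f\x8b", "application/gzip"),
        (b"\0asm", "application/wasm"),
    ];
    signatures
        .iter()
        .find(|(magic, _)| body.starts_with(magic))
        .map(|&(_, mime)| mime)
}

fn main() {
    // A client claiming "text/plain" while actually uploading a PDF.
    let claimed_content_type = "text/plain";
    let body = b"%PDF-1.7\n...";
    println!("claimed {}, sniffed {:?}", claimed_content_type, sniff(body));
}
```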
    <div>
      <h3>Integrate with WAF</h3>
      <a href="#integrate-with-waf">
        
      </a>
    </div>
    <p>For every request where a file is found, the scanning engine returns various <a href="https://developers.cloudflare.com/waf/about/content-scanning/#content-scanning-fields">fields</a>, including:</p>
            <pre><code>cf.waf.content_scan.has_malicious_obj,
cf.waf.content_scan.obj_sizes,
cf.waf.content_scan.obj_types, 
cf.waf.content_scan.obj_results</code></pre>
            <p>The scanning engine integrates with the WAF where customers can use those fields to <a href="https://developers.cloudflare.com/waf/custom-rules/">create custom WAF rules</a> to address various use cases. The basic use case is primarily blocking malicious files from reaching the web server. However, customers can construct more complex logic, such as enforcing constraints on parameters such as file sizes, file types, endpoints, or specific paths.</p>
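<p>To illustrate the kind of logic customers can express with these fields, here is a Rust sketch that models them as a struct and applies a hypothetical policy: block a request if any object is malicious, if any object is larger than a chosen size, if any object type is not on an allow list, or if any result is not <code>"clean"</code>. The struct, the policy, and its parameters are assumptions for illustration; in practice, this logic would be written as a custom WAF rule expression.</p>

```rust
/// Hypothetical Rust mirror of the content scanning fields for one request.
struct ContentScan {
    has_malicious_obj: bool,  // cf.waf.content_scan.has_malicious_obj
    obj_sizes: Vec<u64>,      // cf.waf.content_scan.obj_sizes
    obj_types: Vec<String>,   // cf.waf.content_scan.obj_types
    obj_results: Vec<String>, // cf.waf.content_scan.obj_results
}

#[derive(Debug, PartialEq)]
enum Action {
    Block,
    Allow,
}

/// Example policy: block malware, oversized objects, unexpected types,
/// or any object whose scan result is not "clean".
fn decide(scan: &ContentScan, max_size: u64, allowed_types: &[&str]) -> Action {
    let too_big = scan.obj_sizes.iter().any(|&s| s > max_size);
    let bad_type = scan.obj_types.iter().any(|t| !allowed_types.iter().any(|a| *a == t.as_str()));
    let all_clean = scan.obj_results.iter().all(|r| r.as_str() == "clean");
    if scan.has_malicious_obj || too_big || bad_type || !all_clean {
        Action::Block
    } else {
        Action::Allow
    }
}

fn main() {
    let scan = ContentScan {
        has_malicious_obj: false,
        obj_sizes: vec![2_000_000],
        obj_types: vec!["application/pdf".to_string()],
        obj_results: vec!["clean".to_string()],
    };
    println!("{:?}", decide(&scan, 5_000_000, &["application/pdf", "image/png"]));
}
```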
    <div>
      <h2>In-line scanning limitations and file types</h2>
      <a href="#in-line-scanning-limitations-and-file-types">
        
      </a>
    </div>
    <p>One question that often comes up is about the file types we detect and scan in WAF Content Scanning. Initially, answering this question posed a challenge, since HTTP requests have no definition of a “file”, and scanning all incoming HTTP requests does not make sense because it adds extra processing and latency. So we had to decide on a definition to spot HTTP requests that include files, or as we call it, “uploaded content”.</p><p>The WAF Content Scanning engine makes that decision by filtering out certain content types identified by heuristics. Any content type that is not on a predefined exclusion list, which includes <code>text/html</code>, <code>text/x-shellscript</code>, <code>application/json</code>, and <code>text/xml</code>, is considered uploaded content and sent to the scanner for examination. This allows us to scan a <a href="https://crates.io/crates/infer#supported-types">wide range</a> of content types and file types without adding extra processing to every request. The files we scan include:</p><ul><li><p>Executable (e.g., <code>.exe</code>, <code>.dll</code>, <code>.wasm</code>)</p></li><li><p>Documents (e.g., <code>.doc</code>, <code>.docx</code>, <code>.pdf</code>, <code>.ppt</code>, <code>.xls</code>)</p></li><li><p>Compressed (e.g., <code>.7z</code>, <code>.gz</code>, <code>.zip</code>, <code>.rar</code>)</p></li><li><p>Image (e.g., <code>.jpg</code>, <code>.png</code>, <code>.gif</code>, <code>.webp</code>, <code>.tif</code>)</p></li><li><p>Video and audio files within the 15 MB file size range.</p></li></ul><p>The 15 MB file size limit exists because in-line file scanning runs in real time: it protects the web server and gives instant access to clean files, but it also sits in the path of the whole request delivery process. Therefore, it’s crucial to scan the payload without causing significant delays or interruptions, namely increased CPU time and latency.</p>
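<p>The decision described above can be sketched in Rust. The excluded types are the four examples named in this post (the real list may contain more), MB is assumed to mean 1024 × 1024 bytes, and the <code>TooLarge</code> outcome for bodies over the limit is a hypothetical placeholder, since this post does not describe how oversized files are handled.</p>

```rust
/// Detected content types that are not treated as uploaded content.
/// Only the examples named in the post; the real list may be longer.
const NOT_UPLOADED_CONTENT: [&str; 4] = ["text/html", "text/x-shellscript", "application/json", "text/xml"];

/// In-line scanning limit: 15 MB (assuming 1 MB = 1024 * 1024 bytes).
const MAX_SCAN_BYTES: usize = 15 * 1024 * 1024;

#[derive(Debug, PartialEq)]
enum Decision {
    Skip,     // not uploaded content
    Scan,     // send to the malware scanner
    TooLarge, // hypothetical: over the in-line limit
}

/// Decide what to do with a body whose type was detected by heuristics.
fn scan_decision(detected_type: &str, body_len: usize) -> Decision {
    if NOT_UPLOADED_CONTENT.iter().any(|&t| t == detected_type) {
        Decision::Skip
    } else if body_len > MAX_SCAN_BYTES {
        Decision::TooLarge
    } else {
        Decision::Scan
    }
}

fn main() {
    let cases = [("application/json", 512), ("application/pdf", 2 * 1024 * 1024), ("image/png", 20 * 1024 * 1024)];
    for (ty, len) in cases {
        println!("{} ({} bytes): {:?}", ty, len, scan_decision(ty, len));
    }
}
```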
    <div>
      <h2>Scaling the scanning process to 15 MB</h2>
      <a href="#scaling-the-scanning-process-to-15-mb">
        
      </a>
    </div>
    <p>In the early design of the product, we built a system that could handle requests with a maximum body size of 1 MB, and increasing the limit to 15 MB had to happen without adding any extra latency. As mentioned, this latency is not added to all requests, only to requests that have uploaded content. However, increasing the size with the same design would have increased the latency by 15x for those requests.</p><p>In this section, we use files embedded in JSON request bodies as an example: we explain how the former architecture scanned them and why that design made it hard to raise the file size limit, then walk through the same example in the new release and show in detail how it avoids the extra latency.</p>
    <div>
      <h3>Old architecture used for the Early Access release</h3>
      <a href="#old-architecture-used-for-the-early-access-release">
        
      </a>
    </div>
    <p>In order for customers to use the content scanning functionality in scanning files embedded in JSON request bodies, they had to configure a rule like:</p>
            <pre><code>lookup_json_string(http.request.body.raw, "file")</code></pre>
            <p>This means we should look in the request body, but only for the “file” key, which in the image below contains a base64-encoded string for an image.</p><p>When the request hits our Front Line (FL) NGINX proxy, we buffer the request body, either in memory or, if the body exceeds the NGINX <a href="https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size">client_body_buffer_size</a> setting, in a temporary file. Then, our WAF engine executes the <code>lookup_json_string</code> function and returns the base64 string stored under the “file” key. The base64 string is sent over a Unix domain socket to our malware scanner, which performs MIME type detection and returns a verdict to the file upload scanning module.</p><p>This architecture had a bottleneck that made it hard to expand: latency. The request body is first buffered in NGINX and then copied into our WAF engine, where rules are executed. The malware scanner then receives the execution result, which in the worst case is the entire request body, over a Unix domain socket. This means that once NGINX buffers the request body, we send it to, and buffer it in, two other services.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2OZumKn4Bar00t8BxfPuz6/bf2ffb663817b3092dccc2acd2701374/Screenshot-2024-03-07-at-12.31.52.png" />
            
            </figure>
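            <p>To make the bottleneck concrete, here is a minimal Python sketch of the Early Access data path. It is a toy model built on our own assumptions, not Cloudflare's code. The function names are hypothetical, the small magic-byte table stands in for the scanner's MIME type detection, and an in-process copy stands in for the Unix domain socket hop. The sketch counts how many bytes end up buffered across the three stages described above. That total grows linearly with body size, so a 15x larger body means roughly 15x more data to copy.</p>
            <pre><code>import base64
import json

# Hypothetical magic-number table; a real scanner recognises far more types.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
}


def detect_mime(data):
    """Guess a MIME type from the object's leading bytes."""
    for signature, mime in MAGIC.items():
        if data.startswith(signature):
            return mime
    return "application/octet-stream"


def old_pipeline(raw_body, key="file"):
    """Toy model of the old path; returns (mime_type, total_bytes_buffered)."""
    nginx_buffer = bytearray(raw_body)     # stage 1: NGINX buffers the body
    waf_copy = bytearray(nginx_buffer)     # stage 2: body copied into the WAF engine
    extracted = json.loads(waf_copy)[key]  # WAF engine runs lookup_json_string
    scanner_input = extracted.encode()     # stage 3: sent to the scanner (a UDS in production)
    mime = detect_mime(base64.b64decode(scanner_input))
    return mime, len(nginx_buffer) + len(waf_copy) + len(scanner_input)


def make_body(payload_size):
    """Build a JSON body whose "file" key holds a base64-encoded fake PNG."""
    png = b"\x89PNG\r\n\x1a\n" + bytes(payload_size)
    return json.dumps({"file": base64.b64encode(png).decode()}).encode()


MB = 1024 * 1024
mime, small = old_pipeline(make_body(1 * MB))
_, large = old_pipeline(make_body(15 * MB))
print(mime, round(large / small, 2))</code></pre>
            <p>Every stage holds its own copy, so raising the limit from 1 MB to 15 MB multiplies the buffered bytes, and therefore the copying work, by about 15 in this model.</p>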
    <div>
      <h3>New architecture for the General Availability release</h3>
      <a href="#new-architecture-for-the-general-availability-release">
        
      </a>
    </div>
    <p>In the new design, the requirement was to scan files 15x larger without compromising performance. To achieve this, we decided to bypass our WAF engine, which was where most of the latency was introduced.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3e2jggqBgIMdZaAwQPkiq9/01f31f022d7ef735cdb536041a28f25b/Screenshot-2024-03-07-at-12.32.42.png" />
            
            </figure><p>In the new architecture, we made the malware scanner aware of what it needs to execute the rule, so the rule no longer passes through our WAF engine, the Ruleset Engine (RE). For example, the expression lookup_json_string(http.request.body.raw, "file") is represented roughly as:</p>
            <pre><code>{
   "Function": "lookup_json_string",
   "Args": ["file"]
}</code></pre>
            <p>We produce this struct by walking the <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> (AST) of the rule expression when the rule is configured, and then deploy it to our global network. The malware scanner reads the struct’s values, so rule execution and malware detection happen within the same service. As a result, we no longer need to read the request body, execute the rule in the Ruleset Engine (RE), and then send the results to the malware scanner.</p><p>The malware scanner now reads the request body directly from the temporary file, executes the rule, and returns the verdict to the file upload scanning module.</p><p>The file upload scanning module populates these <a href="https://developers.cloudflare.com/waf/about/content-scanning/#content-scanning-fields">fields</a>, which you can use in custom rules to take actions. For example:</p>
            <pre><code>all(cf.waf.content_scan.obj_results[*] == "clean")</code></pre>
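            <p>The following Python sketch illustrates both halves of this design under our own simplifying assumptions. At configuration time, it walks the rule's syntax tree once to produce the struct shown above. Because this particular rule happens to be valid Python syntax, Python's standard ast module stands in for the real Rules language parser. At request time, a hypothetical scanner reads the body straight from a temporary file, runs lookup_json_string itself, and scans the decoded object. The helper names, the scan_object stub, and its placeholder verdicts are all hypothetical. The final all(...) call mirrors the example rule above using Python semantics.</p>
            <pre><code>import ast
import base64
import json
import os
import tempfile


def compile_rule(expression):
    """Configuration time: walk the expression's AST once and keep only
    what the scanner needs, the function name and its literal arguments."""
    call = ast.parse(expression, mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("expected a single function call")
    # The body field (http.request.body.raw) parses as an attribute, not a
    # string literal, so only the JSON key survives into Args.
    args = [a.value for a in call.args
            if isinstance(a, ast.Constant) and isinstance(a.value, str)]
    return {"Function": call.func.id, "Args": args}


def lookup_json_string(body, key):
    """Return the string under `key` in a top-level JSON object, else None."""
    document = json.loads(body)
    value = document.get(key) if isinstance(document, dict) else None
    return value if isinstance(value, str) else None


FUNCTIONS = {"lookup_json_string": lookup_json_string}


def scan_object(data):
    """Hypothetical stand-in for malware detection; placeholder verdicts."""
    return "infected" if b"EICAR" in data else "clean"


def execute(spec, body_path):
    """Request time: the scanner reads the buffered body from disk itself
    and runs the rule, so no other service copies the body first."""
    with open(body_path, "rb") as f:
        body = f.read()
    extracted = FUNCTIONS[spec["Function"]](body, *spec["Args"])
    if extracted is None:
        return []
    return [scan_object(base64.b64decode(extracted))]


spec = compile_rule('lookup_json_string(http.request.body.raw, "file")')
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(json.dumps({"file": base64.b64encode(b"hello").decode()}).encode())
obj_results = execute(spec, tmp.name)
os.unlink(tmp.name)
print(spec, obj_results, all(r == "clean" for r in obj_results))</code></pre>
            <p>The design point the sketch tries to capture is that the expensive step, parsing the rule, happens once at configuration time. The per-request work then needs only one read of the buffered body.</p>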
            <p>The file upload scanning module also adds these fields to our logging pipelines, so they can be read in <a href="https://developers.cloudflare.com/logs/about/">Logpush</a>, <a href="https://developers.cloudflare.com/logs/edge-log-delivery/">Edge Log Delivery</a>, Security Analytics, and Firewall Events in the dashboard. For example, this is the security log in the Cloudflare dashboard (Security → Analytics) for a web request that triggered WAF Content Scanning:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XGS4YhMjcm2rLRg9znwVr/9f1e5dcd67808f3bba13b80296a1aa42/image5-15.png" />
            
            </figure>
    <div>
      <h2>WAF content scanning detection visibility</h2>
      <a href="#waf-content-scanning-detection-visibility">
        
      </a>
    </div>
    <p>Following the principle of detecting on incoming traffic first, WAF Content Scanning lets users identify hidden risks from the traffic signals shown in analytics before they block or otherwise mitigate matching requests. This reduces false positives and lets security teams make decisions based on well-informed data. We apply the same idea in several other products, such as WAF Attack Score and Bot Management.</p><p>To provide this visibility, we have surfaced the scan data in our security products, such as Security Analytics. The <b>Content Scanning</b> tab in the right sidebar displays traffic patterns even when no WAF rules are in place. The same data also appears in the sampled requests, and you can create new rules from the same view.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6M1QR4dFLFsv0sG8xQ7ezj/83ec1b533b3b2c490451d6624685a26b/image7-4.png" />
            
            </figure><p>If you want to fine-tune your security settings, Security Events gives you more focused visibility because it shows the requests that matched the specific rules you created in the WAF.</p><p>Finally, we have added the scan fields to our <a href="https://developers.cloudflare.com/logs/reference/log-fields/zone/http_requests/">Logpush</a> datastream, so you can select them and send them to any external log handler.</p>
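            <p>The exported fields can be useful before any blocking rule is turned on. As an illustration, the minimal Python sketch below reads Logpush records as newline-delimited JSON (NDJSON) and tallies the scan verdicts. It also counts the requests in which at least one scanned object was not reported as clean. The field name ContentScanObjResults is our assumption about the Logpush HTTP requests schema, so please check it against the field reference linked above. The sample records are made up for illustration.</p>
            <pre><code>import json
from collections import Counter

# Assumed Logpush field name; verify it in the HTTP requests field reference.
RESULTS_FIELD = "ContentScanObjResults"


def summarise(lines):
    """Tally scan verdicts across NDJSON log lines and count requests in
    which at least one scanned object was not reported as clean."""
    verdicts = Counter()
    not_all_clean = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        results = record.get(RESULTS_FIELD) or []  # treat a missing field as no results
        verdicts.update(results)
        if any(r != "clean" for r in results):
            not_all_clean += 1
    return verdicts, not_all_clean


# Made-up sample records, for illustration only.
sample = [
    '{"RayID": "a1", "ContentScanObjResults": ["clean"]}',
    '{"RayID": "b2", "ContentScanObjResults": ["clean", "infected"]}',
    '{"RayID": "c3"}',
]
verdicts, not_all_clean = summarise(sample)
print(dict(verdicts), not_all_clean)</code></pre>
            <p>Counting these requests before deploying a rule gives a rough sense of how much traffic a blocking rule based on the scan results would affect.</p>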
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Before the end of March 2024, all current and new customers who have enabled WAF Content Scanning will be able to scan uploaded files up to 15 MB. Next, we'll focus on improving how files are handled in rules, including adding dynamic header functionality. Quarantining files is another important feature we plan to add in the future. If you're an Enterprise customer, reach out to your account team for more information and to get access.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[WAF Rules]]></category>
            <category><![CDATA[Content Scanning]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[Anti Malware]]></category>
            <guid isPermaLink="false">018GZnJIhpYLWge6uKNfnd</guid>
            <dc:creator>Radwa Radwan</dc:creator>
            <dc:creator>Paschal Obba</dc:creator>
            <dc:creator>Shreya Shetty</dc:creator>
        </item>
    </channel>
</rss>