IP Lists
When blocking or allowing requests, you usually rely heavily on the visitor's IP address. Our web application firewall comes with an IP List feature that lets you define arbitrary lists of IP addresses and ranges. You can then reference these lists in rules, either to quickly allow or block matching requests or to weigh or hit them differently.
Configuration
IP lists are defined at the root level of the global configuration file, under the ip_lists key. Each list is identified by its name. A list is populated either inline or from a remote endpoint, which can also be refreshed automatically at runtime.
```yaml
ip_lists:
  tor-exit-nodes:
    source: "url"
    url: "https://datafeed.rescaled.com/tor?only=exit"
    refresh_interval: "30m"
  rfc1918:
    source: inline
    entries:
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16
```
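The following minimal Go sketch shows how an entry under ip_lists might be represented and validated, which makes the two source types concrete. The IPListConfig type, the validate function and its specific checks are hypothetical illustrations, not the firewall's actual implementation. In particular, requiring at least one inline entry and an absolute http(s) URL are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// IPListConfig is a hypothetical Go representation of one entry under
// ip_lists. The yaml tags assume a decoder that honours them (for example
// gopkg.in/yaml.v3); decoding is left out so only the standard library is needed.
type IPListConfig struct {
	Source          string   `yaml:"source"`
	URL             string   `yaml:"url"`
	RefreshInterval string   `yaml:"refresh_interval"`
	Entries         []string `yaml:"entries"`
}

// validate checks that a list uses one of the two documented sources and
// carries the field that source needs. The exact checks are assumptions.
func validate(name string, c IPListConfig) error {
	switch c.Source {
	case "inline":
		if len(c.Entries) == 0 {
			return fmt.Errorf("ip_lists.%s: inline source needs at least one entry", name)
		}
	case "url":
		u, err := url.Parse(c.URL)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return fmt.Errorf("ip_lists.%s: url source needs an absolute http(s) URL", name)
		}
	default:
		return fmt.Errorf("ip_lists.%s: unknown source %q", name, c.Source)
	}
	return nil
}

func main() {
	// The two lists from the configuration example above.
	lists := map[string]IPListConfig{
		"tor-exit-nodes": {Source: "url", URL: "https://datafeed.rescaled.com/tor?only=exit", RefreshInterval: "30m"},
		"rfc1918":        {Source: "inline", Entries: []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}},
	}
	for name, c := range lists {
		fmt.Println(name, validate(name, c))
	}
}
```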
Inline Definition
The easiest way to define an IP list is to set source to inline and list the IP addresses or CIDR ranges to match against in the entries field.
This kind of definition works well when you only need to match a handful of addresses or ranges that rarely change.
Remote Source Definition
For lists that change more often, use a remote source instead. Set source to url and provide a valid URL from which the list is fetched.
The endpoint is expected to return a plain text response with one IP address (or CIDR range) per line.
You can optionally set refresh_interval to re-fetch the list from the remote endpoint at a regular interval. The value is parsed with Go's time.ParseDuration (for example 30m or 1h) and must be a positive duration. Setting it to 0 or omitting the field disables automatic refresh.
Managed Datafeeds
Customers can also populate IP lists from our managed datafeeds. These are hosted by rescaled and cover common use cases, so you can assemble IP lists quickly. We keep them up to date and refresh them at regular intervals, ready for you to fetch.
As of today, rescaled offers the following managed datafeeds:
- Tor Exit Nodes
- Cloudflare IPs
- Google Search Bots
- Apple Search Bots
- Bing Search Bots
- DuckDuckGo Search Bots
- Commoncrawl Search Bots
- Perplexity AI User Bots
- OpenAI GPT Bots (AI-Training Data Ingesting)
- OpenAI Search Bots (Non-AI-Training Data Ingesting)
- Amazon Web Services (AWS) - All Services
- Amazon Web Services (AWS) - EC2
- Amazon Web Services (AWS) - CloudFront
- Amazon Web Services (AWS) - S3
Please ask your technical account manager for more information on how to configure these datafeeds.
Usage
To use an IP list in a rule, reference it by its name. IP lists can be used with any of the available actions as well as within CEL expressions.
The same logic can be expressed using CEL.
Because rules are evaluated in order, IP lists also let you quickly identify and block bots that, for example, impersonate known search engine crawlers. The following two rules are an optional part of our general-purpose ruleset and detect and block visitors who pretend to be Googlebot.