Add support for hostname filters in standard mode #181
Comments
@gorhill Correct me if I'm wrong, but uBlock Origin appears to literally translate […] Maybe it doesn't matter in practice, but then it makes you wonder why the extended anchor […]
Yes, the parsing makes it internally an equivalent of the […]
The way it was implemented in Adblock Plus was that the […] But then what's the use of the […] Filters could be interpreted such that any hostname characters at the beginning of the pattern would automatically make up the hostname.
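The last sentence above describes a prefix rule. Here is a minimal sketch of it. It assumes that "hostname characters" means ASCII letters, digits, `-` and `.`, and that the prefix must contain a dot to count as a hostname. The comment defines neither, and both helper names are hypothetical rather than part of any existing engine.

```python
import re

# Assumption: "hostname characters" are ASCII letters, digits, '-' and '.'.
_LEADING_HOST_RE = re.compile(r"[A-Za-z0-9.-]+")


def split_leading_hostname(pattern: str) -> tuple[str, str]:
    """Split a filter pattern into (leading hostname characters, remainder).

    Hypothetical helper illustrating the prefix rule only.
    """
    match = _LEADING_HOST_RE.match(pattern)
    if match is None:
        # The pattern does not start with a hostname character.
        return "", pattern
    return match.group(0), pattern[match.end():]


def to_anchored_filter(pattern: str) -> str:
    """Rewrite the pattern in Adblock Plus syntax if it starts with a hostname.

    Assumption: the prefix must contain a dot to count as a hostname, so a
    pattern such as "banner" is left alone.
    """
    host, rest = split_leading_hostname(pattern)
    if "." not in host:
        return pattern
    # With nothing after the hostname, close it with the ^ separator.
    return "||" + host + (rest or "^")


print(to_anchored_filter("example.com/ads/"))  # ||example.com/ads/
print(to_anchored_filter("example.com"))       # ||example.com^
print(to_anchored_filter("/banner.gif"))       # /banner.gif
```

Under this rule, a pattern such as `ad.png` would also be read as a hostname (`||ad.png^`). This is the kind of special-casing tradeoff debated in this thread.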
@mjethani I'm not convinced that is the right move; it's definitely a tradeoff either way. The majority of filter list maintenance and adblock engine development is done by a handful of people who are very familiar with ABP syntax. There's already a fair number of... unintuitive features in ABP syntax, and additional special-cased parsing logic causes more surprises for the experts, not fewer. That breakage appears to have been a tiny mistake introduced with easylist/easylist@1e83bda. @ryanbr is solidly on the "pro" end of the filter list maintainer spectrum, and he probably knows even more than I do about how ABP syntax works! I think it's more important that the mistakes are caught and fixed in a timely manner when they happen. If it weren't for the breakage you referenced, that […]
I implemented a commit to trim any filters shorter than 9 characters to avoid this in the future. It was a mistake on my part, but it was fixed within a couple of hours.
@antonok-edm @ryanbr Thanks for taking the time to comment on this. @antonok-edm wrote: […]
Perhaps it was not such a great idea to open with that incident as an example. Indeed, it has little to do with this enhancement request. It was only what got me thinking on this topic. The takeaway from the incident really is that different filter engines have different syntaxes and this itself could become a source of confusion among filter developers. The more general idea here is that Brave should make improvements to the current syntax. It's not uncommon for an implementation to do this. For example, Adblock Plus changed how it interprets non-ASCII characters and the trailing dot. These little improvements start to make sense over time as the web becomes more standardized.
This specific idea of interpreting […]
As it turns out, the Adblock Plus project seems to have decided all of a sudden that it wants to be compatible with uBlock Origin too. This is a total coincidence of course.
Here's what I had in mind. Hostnames: […]
With the above, uBlock Origin is already at level 2. Word boundaries: […]
These ideas are based on what I see in EasyList et al. at the moment. @antonok-edm I think I see your point that just the one change might be a hard sell. Maybe it's better to come up with a proposal for an entire set of changes of this nature.
There was an issue in EasyList recently that broke some ad blockers. Even though the report doesn't mention Brave, the issue should have affected Brave too. Meanwhile, uBlock Origin was barely affected because it interpreted the filter in question as a hostname.
This raises the question of how such filters should be interpreted in general.
The Brave implementation has two modes: standard and hosts. In hosts mode, a hostname on a line by itself is interpreted as a hostname. In standard mode, the same hostname must be specified using Adblock Plus syntax. For example, `example.com` must be encoded as `||example.com^`.
Should Brave adopt the uBlock Origin style in standard mode?
The crux of the issue is that Adblock Plus-style filters tend to overblock by default. A filter like `ad.png` would block not only the URL `https://example.com/ad.png` (intended) but also the URL `https://example.com/ipad.png` (not intended). Similarly, the filter `example.com` would block `https://example.com/ad.png` (intended) but would also block `https://example.computer/ad.png` (not intended).
While filter developers are aware of such nuances, and indeed there are no such filters in EasyList et al., the average user who types a filter into a box (brave/brave-browser#8838) is unlikely to look up the syntax, learn it, and then correctly specify the hostname `example.com` as `||example.com^`. In other words, the Adblock Plus syntax is not exactly user-friendly. Based on the recent incident, it also appears to be a bit of a footgun. We don't know how many end users are experiencing broken web pages because they were savvy enough to write their own filters by hand.
The principle of least surprise dictates that if the pattern in a URL filter looks like a hostname, it should be interpreted as a hostname.
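The overblocking examples above can be made concrete with a deliberately simplified model of the two interpretations. An unanchored pattern is treated as a plain substring test on the URL. `||hostname^` is treated as "the request host is this hostname or one of its subdomains". This is an assumption-laden sketch, not how Brave, Adblock Plus or uBlock Origin actually implement matching: wildcards, the full `^` separator semantics and case handling are not modelled. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def plain_pattern_matches(pattern: str, url: str) -> bool:
    """Unanchored pattern, modelled as a bare substring test on the whole URL."""
    return pattern in url


def hostname_filter_matches(hostname: str, url: str) -> bool:
    """Simplified reading of ||hostname^: the request host is the hostname
    itself or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == hostname or host.endswith("." + hostname)


# Plain substring patterns also match the unintended URLs.
print(plain_pattern_matches("ad.png", "https://example.com/ipad.png"))            # True
print(plain_pattern_matches("example.com", "https://example.computer/ad.png"))    # True
# Read as a hostname, example.com no longer matches example.computer.
print(hostname_filter_matches("example.com", "https://example.computer/ad.png"))  # False
print(hostname_filter_matches("example.com", "https://example.com/ad.png"))       # True
```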
This new "shorthand" syntax for hostnames would break 0 existing filters in EasyList and EasyPrivacy.
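Here is a minimal sketch of the proposed shorthand. It assumes a purely syntactic test for "looks like a hostname", because the issue does not spell out the exact rule. `looks_like_hostname` and `expand_shorthand` are hypothetical names.

```python
import re

# Assumption: a "hostname-looking" pattern is two or more dot-separated labels
# of ASCII letters, digits and '-', with no label starting or ending in '-'.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_HOSTNAME_RE = re.compile(rf"(?:{_LABEL}\.)+{_LABEL}", re.IGNORECASE)


def looks_like_hostname(pattern: str) -> bool:
    """Return True if the whole pattern is syntactically a hostname."""
    return _HOSTNAME_RE.fullmatch(pattern) is not None


def expand_shorthand(filter_line: str) -> str:
    """Rewrite a bare-hostname filter to its ||hostname^ equivalent.

    Any $options suffix is kept as-is; every other filter is returned unchanged.
    """
    pattern, sep, options = filter_line.partition("$")
    if not looks_like_hostname(pattern):
        return filter_line
    return f"||{pattern}^{sep}{options}"


print(expand_shorthand("example.com"))              # ||example.com^
print(expand_shorthand("example.com$third-party"))  # ||example.com^$third-party
print(expand_shorthand("||example.com^"))           # ||example.com^
```

A check this loose also accepts `ad.png` as a hostname. A real rule would need to be stricter, for example by checking the last label against the Public Suffix List, which is why the exact definition matters for the compatibility claim.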
Meanwhile, there are nearly 32,000 filters in EasyList and EasyPrivacy, out of a combined total of about 81,000 filters, that could be converted to use the new syntax at some point, reducing the size of the raw filter text by ~100 KiB.
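Here is a rough check of the ~100 KiB estimate. It assumes that each conversion removes exactly the three characters `||` and `^` and that each character takes one byte:

```math
32\,000 \times 3\ \text{bytes} = 96\,000\ \text{bytes} = \frac{96\,000}{1024}\ \text{KiB} = 93.75\ \text{KiB} \approx 100\ \text{KiB}
```

Since the count is "nearly" 32,000, the actual saving would be slightly below this figure.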
Not only would the shorthand syntax be more user-friendly, it might also have bandwidth, memory, and disk usage benefits over time.
WDYT?