Examining How the Great Firewall Discovers Hidden Circumvention Servers (IMC 2015) #208

Open
wkrp opened this issue Feb 11, 2023 · 7 comments
Labels
China reading group summaries and discussions of research papers and other publications

@wkrp
Member

wkrp commented Feb 11, 2023

The posts under the reading group label have so far simply been posted summaries of new research papers. I want to try something different. Let's read an old, significant paper, and then discuss it in a voice/video chat. By this, I hope to promote a better common understanding of censorship research.

The time:
Sunday, 2023-02-26 13:00–14:00 UTC
This is morning in the Americas, daytime in Europe and Africa, and evening in Asia and Oceania.

The paper:
"Examining How the Great Firewall Discovers Hidden Circumvention Servers", 2015
PDF, project web page
This was an early, detailed look at active probing by the Great Firewall using a variety of protocols. It followed "How the Great Firewall of China is Blocking Tor" (2012), which focused on Tor, and preceded "How China Detects and Blocks Shadowsocks" (2020), on Shadowsocks.

I don't yet know how the discussion will be set up. We'll use some kind of video conference system. I'll post connection information closer to the date. I will try to make sure there is a video afterward for anyone who cannot attend.

This is an experiment. If it goes well, I would like to schedule a series of readings and discussions.

@wkrp wkrp added reading group summaries and discussions of research papers and other publications China labels Feb 11, 2023
@wkrp
Member Author

wkrp commented Feb 24, 2023

Here's a summary of the paper, in advance of the reading group discussion.


Examining How the Great Firewall Discovers Hidden Circumvention Servers
Roya Ensafi, David Fifield, Philipp Winter, Nick Feamster, Nicholas Weaver, Vern Paxson
https://censorbib.nymity.ch/#Ensafi2015b
https://ensa.fi/active-probing/

This paper from 2015 takes a look at active probing done by the Great Firewall to discover obfuscated proxy servers. Active probing is when the censor observes which servers clients connect to, then makes its own connections to those same servers to see how they respond. If a server responds in a way characteristic of a proxy server, then the censor can block its IP address. The study set out to examine active probing of obfs2, obfs3, and plain Tor TLS without pluggable transports ("vanilla Tor"). Along the way, the authors also incidentally discovered and documented active probes for SoftEther VPN and App Engine–based domain fronting proxies.

The authors ran multiple experiments that uncover different aspects of active probing:

Shadow (3 months)
The authors set up private Tor bridges outside China (vanilla, obfs2, and obfs3), and made periodic connection attempts to the bridges from a Tor client in China. They then examined the Tor client log files to get a timestamped list of when connection attempts succeeded and failed.
Sybil (20 hours)
This was a short-term, high-rate experiment between one client in China and one vanilla Tor bridge outside China that was listening on 600 different ports. The authors connected to all 600 bridge ports sequentially, over the course of two hours, then continued monitoring for several more hours. The data from the Sybil experiment (181 MiB) is available.
Log (5 years)
This experiment came from passive observation of the application logs of a server that happened to have been receiving active probes since 2013. Unlike in the other experiments, the probes were naturally occurring, not triggered by the researchers. The analysis started by searching the logs for IP addresses that had sent obfs2 probes (obfs2 is passively detectable with near 100% accuracy, even retroactively), then looking at what other traffic those IP addresses had sent, and repeating with newly discovered probe types. For a short time, the authors ran a custom sink server on port 23, in order to unwrap and inspect the contents of obfs2, obfs3, and TLS-based probes. The data from the Log experiment (69 MiB) is available.
Counterprobe (1 week)
The authors port-scanned the source IP addresses of active probes, while active probing was in progress and every hour for 24 hours thereafter.
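The iterative search in the Log experiment (start from IPs that sent obfs2 probes, look at what else they sent, repeat with newly found probe types) is a fixed-point computation. Here is a minimal Python sketch of that loop, assuming log records have already been reduced to hypothetical `(source_ip, payload_kind)` pairs by some protocol classifier. The function name `snowball` and the record format are illustrative, not taken from the paper's code. A fully automatic loop like this one would also absorb any ordinary traffic that a prober IP happens to send, so newly found kinds still need manual vetting.

```python
from collections.abc import Iterable


def snowball(records: Iterable[tuple[str, str]],
             seed_kinds: set[str]) -> tuple[set[str], set[str]]:
    """Return (prober_ips, probe_kinds) reachable from seed_kinds.

    Each round marks every IP that sent a known probe kind as a prober, and
    every payload kind sent by a known prober as a probe kind. The loop stops
    when a round adds nothing new (a fixed point).
    """
    records = list(records)  # allow multiple passes over the input
    prober_ips: set[str] = set()
    probe_kinds = set(seed_kinds)
    while True:
        ips = {ip for ip, kind in records if kind in probe_kinds}
        kinds = probe_kinds | {kind for ip, kind in records if ip in ips}
        if ips == prober_ips and kinds == probe_kinds:
            return prober_ips, probe_kinds
        prober_ips, probe_kinds = ips, kinds


# Hypothetical records, using documentation IP ranges.
records = [
    ("192.0.2.1", "obfs2"),
    ("192.0.2.1", "tor-tls"),
    ("198.51.100.2", "tor-tls"),
    ("198.51.100.2", "softether"),
    ("203.0.113.3", "softether"),
    ("203.0.113.9", "http-get"),
]
ips, kinds = snowball(records, {"obfs2"})
print(sorted(ips), sorted(kinds))
```

With these sample records, the seed `obfs2` pulls in `tor-tls` and then `softether`, while the address that only sent `http-get` stays out.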

In the Shadow experiment, the vanilla bridges were consistently blocked (except for brief intervals of accessibility every 25 hours), but the obfs2 and obfs3 bridges were not blocked. The fact that obfs2 and obfs3 were not blocked is strange, since other experiments showed that the GFW had the ability, at the time, to send obfs2 and obfs3 probes. The Sybil experiment showed that bridges were usually probed within 1 second of the triggering connection, then probed again 12 hours later. A complete TCP handshake was required to trigger active probes, but the detection system did not robustly reassemble TCP streams. Some of the probes showed inconsistencies at the application layer: the Tor probes used an old version of the Tor protocol; obfs3 probes were distinguishable from mainline obfsproxy in the way they implemented random padding; and HTTP and TLS features in the AppSpot probes did not match their claimed Chromium User-Agent.
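Measuring delays like the ones above (a probe within about 1 second, then another about 12 hours later) amounts to pairing each triggering connection with the probes that later arrived at the same bridge port. Below is a minimal sketch, assuming one triggering connection per port, as in the Sybil design, and timestamps in seconds. The function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def probe_delays(triggers: dict[int, float],
                 probes: list[tuple[int, float]]) -> dict[int, list[float]]:
    """Map each bridge port to the sorted delays of probes that hit it.

    triggers maps port -> time of the researchers' triggering connection;
    probes lists (port, arrival_time). Probes to unknown ports, or that
    arrive before the port's trigger, are ignored.
    """
    delays: dict[int, list[float]] = defaultdict(list)
    for port, t in probes:
        t0 = triggers.get(port)
        if t0 is not None and t >= t0:
            delays[port].append(t - t0)
    return {port: sorted(ds) for port, ds in delays.items()}


# Hypothetical timestamps: a fast probe, then one about 12 hours later.
triggers = {9001: 100.0, 9002: 160.0}
probes = [(9001, 100.4), (9001, 100.4 + 12 * 3600), (9002, 160.9), (9003, 5.0)]
for port, ds in sorted(probe_delays(triggers, probes).items()):
    print(port, [round(d, 1) for d in ds])
```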

Collectively, the experiments found 16,083 unique source IP addresses for active probes. Most of them appeared only once. Virtually all prober IP addresses were in Chinese networks. Reverse port scans to the source IP addresses of active probes always showed the addresses as completely unresponsive while probes were being sent—but often, later, those addresses would begin responding to port scans, revealing no common pattern of open ports or TCP/IP characteristics. But the packets sent by probers did have similar TCP/IP characteristics, despite the diversity of source IP addresses, and other evidence, like consistent TCP initial sequence number and TCP timestamp sequences, strongly indicated that the probes had a centralized or common origin.
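One way to see how TCP timestamps can link probes from many different source IPs: a host's TSval counter advances at a roughly constant rate, so for a packet seen at wall-clock time t, the implied counter origin t - TSval/rate is the same for every packet stamped by the same clock. The sketch below groups probes by that implied origin. The 1000 Hz rate and 1-second tolerance are assumptions (the actual rate depends on the OS and kernel configuration), and the sketch ignores clock skew and 32-bit wraparound. It illustrates the idea only; it is not the analysis the paper performed.

```python
def group_by_clock(probes: list[tuple[str, float, int]],
                   rate_hz: float = 1000.0,
                   tolerance_s: float = 1.0) -> list[list[str]]:
    """Group probe source IPs whose TSvals fit one shared counter.

    probes holds (source_ip, wall_time_s, tsval). For each probe the implied
    counter origin is wall_time_s - tsval / rate_hz. Origins are sorted, and a
    new group starts whenever the gap to the previous origin exceeds
    tolerance_s (single-linkage grouping).
    """
    origins = sorted((t - tsval / rate_hz, ip) for ip, t, tsval in probes)
    groups: list[list[str]] = []
    prev = None
    for origin, ip in origins:
        if prev is None or origin - prev > tolerance_s:
            groups.append([])
        groups[-1].append(ip)
        prev = origin
    return groups


# Hypothetical probes from four source IPs; three share one clock.
probes = [
    ("192.0.2.10", 2000.0, 1_000_000),
    ("198.51.100.20", 5000.0, 4_000_000),
    ("203.0.113.30", 3000.0, 500_000),
    ("192.0.2.40", 9000.0, 8_000_200),
]
print(group_by_clock(probes))
```

Three different addresses land in one group because their timestamps are consistent with a single counter that started at about the same moment, which is the kind of evidence that points to a common origin.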

@wkrp
Member Author

wkrp commented Feb 25, 2023

We'll start the reading group tomorrow at 13:00 UTC here:

https://meet.jit.si/moderated/a47502eba43419adf342da21f02ef7c1aed6503295770f39f2dfcfd50df23e32

I will try to get the stream started up about 20 minutes early, to give time for participants to debug any technical issues. You may feel free to join with whatever pseudonym you like. I don't know whether the Jitsi link above will be accessible to everyone; this is an experiment, and if needed we can make adjustments for any future sessions. In case of a total technical catastrophe, in the worst case I'll record a video and post it later.

@cross-hello

cross-hello commented Feb 26, 2023 via email

@wkrp
Member Author

wkrp commented Feb 28, 2023

Link to video

Here is a video recording of the reading group. The audio is a little choppy due to a technical problem. There are captions if anything is difficult to understand.

These are the links that are referred to in the video:

Some questions came up that I didn't immediately have the answers to. If I get the answers, I'll post them in this issue.

  • How long does it take for an IP–port to be blocked, after probing identifies it?
  • Was there any reaction by the developers of SoftEther VPN / VPN Gate to the demonstration of active probing against their protocol?
  • Do the characteristic TCP fingerprints of the active probers correspond to any known OS or TCP implementation?

@wkrp
Member Author

wkrp commented Feb 28, 2023

How long does it take for an IP–port to be blocked, after probing identifies it?

It's strange, but that question is apparently not taken up by this piece of research. It does say, though (Section 5.1), that the vanilla Tor protocol was blocked, but obfs2 and obfs3 were not blocked.

"How the Great Firewall of China is Blocking Tor" from 2012 does mention delays for blocking and unblocking:

On October 4, 2011 a user reported to the Tor bug tracker that unpublished bridges stop working after only a few minutes when used from within China [17].

After approximately 12 hours, the Tor process behind port 23941 (which appeared to be unreachable to the GFC) became reachable again whereas connections to port 27418 still timed out and continued to do so. … This observation shows that once a Tor bridge has been blocked, it only remains blocked if Chinese scanners are able to continuously connect to the bridge. If they cannot, the block is removed.

"How China Detects and Blocks Shadowsocks" observes that only a few of their test Shadowsocks servers got blocked, despite all of them getting probed. In one case, a server was blocked within 15 minutes.

Few of the servers that received probes were blocked. One of the servers that was blocked had operated for only around 15 minutes, and had not received nearly as many probes as other servers that did not get blocked.

We have two hypotheses attempting to explain this phenomenon. One is that the blocking of Shadowsocks is controlled by human factors. … Another hypothesis is that active probing is ineffective against the particular Shadowsocks implementations and versions that we used in most of our experiments.

Was there any reaction by the developers of SoftEther VPN / VPN Gate to the demonstration of active probing against their protocol?

We, the researchers, did actually get in touch with VPN Gate developers, though not on this question specifically. They told us that there is no formal specification of the SoftEther VPN protocol apart from the source code. While SoftEther VPN is open-source, the additional components for VPN Gate are not.

Even at the time, the SoftEther VPN probe did not match what was produced by the official SoftEther client. The official client had an HTTP Host header while the active probes did not. There is some logic to check for the presence of the Host header, but it looks like it is (was) disabled by default. SoftEther commits tend to come in huge units with tens of thousands of changed lines and no list of changes, so it's hard to know exactly what's intended.

It looks like the current version of ClientUploadSignature is unchanged since then.

Do the characteristic TCP fingerprints of the active probers correspond to any known OS or TCP implementation?

Yes; p0f classifies virtually all the probes we looked at as "Linux 2.6.x" or "Linux 3.1-3.10". I had forgotten, but this analysis is present in the Log experiment data. See the files p0f.https_tcpdump.csv, p0f.http_tcpdump.csv, and p0f.telnet_tcpdump.csv, and the p0f figures in figures/index.html.

p0f classifications were consistent within a TCP timestamp sequence:

p0f uptime and OS by date (http+https+telnet tcpdump)

And p0f classification was correlated with TLS fingerprint, in the subset of probes that were TLS:

p0f tsval and TLS fp by date (http+https+telnet tcpdump)

(NB the first graph's y-axis is the p0f-reported "uptime", which is tsval/rate. The y-axis of the second graph is an attempt to recover tsval as uptime×rate; it's noisy because p0f does not record enough precision to recover it exactly.)
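The precision loss in that round trip can be made concrete. Assuming, for illustration, that uptime is kept only to whole minutes (truncated), converting it back as uptime×rate can be off by up to 60×rate ticks. The function below is a hypothetical sketch of that loss, not p0f code.

```python
import math


def recovered_tsval(tsval: int, rate_hz: float,
                    resolution_s: float = 60.0) -> float:
    """Round-trip a TSval through a coarse uptime value.

    uptime = tsval / rate_hz is truncated to a multiple of resolution_s
    (assumed here to be whole minutes), then converted back as
    uptime * rate_hz.
    """
    uptime_s = tsval / rate_hz
    kept_s = math.floor(uptime_s / resolution_s) * resolution_s
    return kept_s * rate_hz


tsval = 123_456_789
approx = recovered_tsval(tsval, 1000.0)
print(approx, tsval - approx)  # error lies in [0, resolution_s * rate_hz)
```

For example, a TSval of 123,456,789 at 1000 Hz comes back as 123,420,000, off by 36,789 ticks.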

@wkrp wkrp changed the title Live reading group 2023-02-26 – Examining How the Great Firewall Discovers Hidden Circumvention Servers (2015) Examining How the Great Firewall Discovers Hidden Circumvention Servers (IMC 2015) Mar 7, 2023
@bmixonba

bmixonba commented Mar 7, 2023

Are there plans for another paper?

@wkrp
Member Author

wkrp commented Mar 8, 2023

Are there plans for another paper?

In 2020 there was "How China Detects and Blocks Shadowsocks" that used similar methodology. An early report from that project is in thread #22.

Other than that, I'm not aware of any ongoing projects on active probing. If it's something you're interested in doing, I'd say the field is open. Personally I would like to see deeper counterprobing experiments with finer granularity: scanning back at least once per minute to find out how quickly active probing IPs become responsive after sending their probes, and an attempt to detect shared TCP ISN or TCP timestamp sequences across multiple probe targets. That said, the zeitgeist has changed somewhat since the wider adoption of probe-resistant protocols.
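As a rough illustration of the finer-grained counterprobing suggested above, here is a minimal Python sketch of a TCP connect() scan repeated at a fixed interval. The function names, port list, and default 60-second interval are assumptions for illustration; a real experiment would also need parallelism, rate limiting, and care not to burden the scanned networks. A connect scan only reports whether a port accepted a connection: refused, filtered, and unreachable all look the same here.

```python
import socket
import time


def open_ports(host: str, ports: list[int], timeout: float = 2.0) -> list[int]:
    """Return the ports on host that accept a TCP connection within timeout."""
    result = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                result.append(port)
        except OSError:
            pass  # refused, timed out (filtered/silent), or unreachable
    return result


def counterprobe(host: str, ports: list[int], interval_s: float = 60.0,
                 rounds: int = 3) -> list[tuple[float, list[int]]]:
    """Scan host every interval_s seconds; return (elapsed_s, open ports) per round."""
    start = time.monotonic()
    history = []
    for i in range(rounds):
        history.append((time.monotonic() - start, open_ports(host, ports)))
        if i < rounds - 1:
            time.sleep(interval_s)
    return history
```

For example, `counterprobe(prober_ip, [22, 80, 443], interval_s=60, rounds=60)` would watch one (hypothetical) prober address for an hour at one-minute granularity.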

A recent paper on obfs4 bridge detection, "Detecting Tor Bridge from Sampled Traffic in Backbone Networks", has what is perhaps a nod to active probing. Their obfs4 detection technique has high recall but low precision; i.e., few false negatives but many false positives: too many false positives to be useful directly as a blocking rule. The authors state the need for a "secondary detection mechanism" to make the detection practical; in other protocols this could be active probing, but obfs4 is deliberately probing resistant.

For network managers, what is needed is to detect as many correct Tor bridge addresses as possible. High recall rate means that the number of the real bridges which identified as bridge accounts for a high proportion of the number of the real bridge, which means that the detection results are relatively complete, and the real bridges are almost in the detection results. Therefore, when the detection is performed in the actual network, it is only necessary to perform a second identification based on the first detection result.

In future work, we need to follow the update of the obfuscation protocol promptly, re-screen the features, and design a secondary detection method to maintain the practicability of this method.
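The recall/precision trade-off described above is easy to see with numbers. The counts below are hypothetical, not from the paper: a first-stage detector that finds 980 of 1,000 real bridges while also flagging 9,020 non-bridges has 98% recall but under 10% precision. A second stage that (hypothetically) keeps 95% of true positives and 1% of false positives raises precision above 90% at a modest cost in recall. As noted above, for obfs4 that second stage could not simply be active probing.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def second_stage(tp: int, fp: int, fn: int,
                 keep_tp: float, keep_fp: float) -> tuple[int, int, int]:
    """Apply a hypothetical confirmation step to first-stage results.

    The step retains a fraction keep_tp of true positives and keep_fp of false
    positives; real bridges it rejects become additional false negatives.
    """
    tp2 = round(tp * keep_tp)
    fp2 = round(fp * keep_fp)
    return tp2, fp2, fn + (tp - tp2)


# Hypothetical counts: 1,000 real bridges among the observed endpoints.
stage1 = (980, 9020, 20)
stage2 = second_stage(*stage1, keep_tp=0.95, keep_fp=0.01)
for name, counts in (("stage 1", stage1), ("stage 2", stage2)):
    p, r = precision_recall(*counts)
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```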
