It’s that time of the year again, and Hiro, 4chan’s current owner, has decided to switch to a new ad loader after issues with the previous one (“argon”) being blocked. I didn’t make an article about Argon, since I didn’t figure anyone would give a shit beyond the few paranoids on /g/, but I did tear it apart and reverse-engineer it on my GitLab as an exercise. Similarly, I decided to also reverse-engineer this one.
As with most modern ad loaders, it makes use of some interesting techniques to dodge detection by ad-blockers. Unfortunately, its audience is more technically literate than not, so the morons on /g/ naturally assumed that it is loaded with canvas fingerprinting techniques and other invasive data-gathering based on its tendency to load a scary-looking PNG from 4chan.
So, let’s drill down and rip this thing apart to see what it really does.
tl;dr and How to Block
The scary image is actually just JSON encoded as an image, and carries the revcontent ad loader and backup ad delivery data. The loader, known as Yavli, does not perform fingerprinting, but it does act as a proxy to permit JavaScript and content to bypass CORS blocking.
To block it:
uBlock Origin:
! Add these to "My Filters".
! 9/19/2018, 10:41:33 AM https://boards.4chan.org/diy/
boards.4chan.org##body > .desktop.adg-rects
! 9/19/2018 - Yavli
boards.4chan.org##script:contains("adclix.png")
boards.4chan.org##script:contains("img[src*='wpengine.netdna-cdn.com']")
AdBlock Plus and uBlock Origin already have updated filters. Purge the ad blocker’s caches and update to receive the blocking rules.
Background
The first thing I wanted to do with this was gather some more information about who was serving these ads. Without any adblocking, you get the following at the top of the page:
So, as is plainly visible, the ads are served by a company named revcontent. For shits and giggles, I decided to actually click the “Ads by revcontent” link and see if I got a face-full of Russian trojans or not. I was pleasantly surprised to see that they actually had an about popup with some options:
Clicking on “Choose your own content” actually produced several different options, and even let you select a content filter. Haven’t seen that before.
I didn’t bother selecting any other options, because c’mon.
Here’s the interesting thing, though: That scrambled image up above has nothing to do with revcontent. It’s actually loaded by a backup ad loader that only kicks in if it cannot load a small image from a well-known ad server (orbitfour47.com/adclix.png). So, naturally, I went to that server to see what was up, and got a generic “Hi, I am your new HeroFW app!” placeholder page at its root.
After digging around a bit, I found that this particular adloader has been seen in other places and is known as Yavli to Adblock Plus and uBlock Origin’s developers.
What is Yavli?
Yavli turns out to be a company that designed an ad blocker dodging and detection system. In fact, they run an entire “content recommendation service” with one of the major selling points being the ability to “[m]onetize your entire audience, reach users accessing your site with ad blocking software installed.”
Their company appears to have been launched and operated in the west, and even advertises who its various employees and chief officers are. They are also a tad delusional.
The truth is that people don’t want to read about Hillary Clinton 2 years after she became politically irrelevant, and adblockers have become as integral to a new computer as antivirus, since some sketchier ad networks occasionally have drive-by malware attacks hosted on their services.
So, let’s see how this thing ticks.
Operation
First, I made a tab, disabled NoScript for that tab, and then navigated to https://boards.4chan.org/diy/. I then disabled uBlock and Tampermonkey, and was rewarded by a fully operational ad placement. 4chan uses one at the top of the page, followed by another at the bottom of the page, both served by revcontent. You’ll notice that these ads are created by a script embedded in core.min.1021.js, which only serves revcontent ads. So, let’s ignore that entire chain of stuff.
If we look further down the list of network activity, we see something odd: A script embedded in the page tries to download a 32×32 icon we don’t see anywhere on the page.
I found this script and prettified it here.
At the end of the oeo.start() function (which mostly sets up a jquery-like system for querying the DOM), we see the following:
// Try to load adclix.png via Image().src
return oeo.abd(function() {
    // Failed to load (onerror() called)
    // Inject things into page (doesn't on 4chan since oeo.fl=[])
    oeo.f(oeo.fl);
    // WIPE OUT ALL IMAGES. (sets them display: none !important)
    oeo.fc();
    // Unknown, function is empty.
    oeo.mf();
    // Inject shit into the page once more after 1.5s (again, it's empty)
    setTimeout(function() {
        oeo.f(oeo.fl);
    }, 1500);
    // Try loading ads via alternate method (the encoded image shit)
    oeo.now(oeo.ur(oeo.u), oeo.q("html")[0]);
    return true;
});
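The adclix.png check boils down to a bait request: try to load a tiny image from an ad host and treat a load error as a sign that something is blocking ads. The sketch below is a guess at the general shape of such a check, not Yavli's actual abd() code. probeBait, fakeImageFactory, and the createImage parameter are names invented here so the logic can run outside a browser.

```javascript
// Hypothetical sketch of a bait-image check; not taken from the Yavli loader.
// createImage is injected so this runs in Node; in a browser you would pass
// () => new Image() instead.
function probeBait(url, createImage, onBlocked, onLoaded) {
  const img = createImage();
  // A blocked or unreachable host surfaces as an error event.
  img.onerror = function () { onBlocked(url); };
  // A successful load means nothing interfered with the request.
  img.onload = function () { if (onLoaded) onLoaded(url); };
  // Assigning src is what kicks off the request.
  img.src = url;
  return img;
}

// Stand-in for Image that "fails" for any host listed in blockedHosts.
// Assumption: this mimics a hosts-file entry or a filter-list rule.
function fakeImageFactory(blockedHosts) {
  return function () {
    const img = { onload: null, onerror: null };
    Object.defineProperty(img, "src", {
      set(value) {
        const host = new URL(value).hostname;
        if (blockedHosts.includes(host)) img.onerror();
        else img.onload();
      },
    });
    return img;
  };
}

const events = [];
probeBait(
  "https://orbitfour47.com/adclix.png",
  fakeImageFactory(["orbitfour47.com"]),
  (u) => events.push("blocked:" + u),
  (u) => events.push("loaded:" + u)
);
console.log(events); // [ 'blocked:https://orbitfour47.com/adclix.png' ]
```

The fake image fires its callbacks synchronously to keep the example short; a real Image fires them asynchronously once the request settles.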
now() itself is where the scary image (as specified in oeo.u) is loaded. Now, here’s where things are a little wonky: The scary image is only loaded if there is partial blocking of ads, meaning uBlock or Adblock only blocked revcontent. This was what happened yesterday (September 19th, 2018), before I helped provide some rules that uBlock and Adblock adopted to block Yavli itself. To see Yavli in action, you need to block adclix.png. To do that with all adblockers disabled, I simply added the following to C:\Windows\System32\drivers\etc\hosts (same can be done to /etc/hosts):
0.0.0.0 orbitfour47.com
After refreshing the page, you get the image downloading properly. The image itself looks like this:
However, it’s not so scary once you actually look at what’s going on under the hood. In now(), it creates a canvas the size of the image and then reads each pixel of the image, translating it to a series of [R,G,B,A] bytes. Each alpha byte is skipped, and bytes greater than 0 are added to an array. The array is then translated into a string and parsed as JSON.
now: function() {
    return function(url, parent) {
        // Let's not do this twice. uc = loaded data
        if (oeo.uc != false) {
            // Process already collected ad data
            oeo.collection(oeo.uc, parent);
            // Not sure why they're doing this here but w/e.
            oeo.uc = false;
            return;
        }
        // Create an <img>
        var i = new Image();
        // Request the image via CORS without credentials, so the canvas can be read back
        i.crossOrigin = "Anonymous";
        // What we do once the image loads
        i.onload = function() {
            // How many fetches we've made.
            oeo.c++;
            // Create a <canvas>
            var c = document.createElement("canvas");
            // Create a 2D rendering context.
            var t = c.getContext("2d");
            // Get height and width from the image.
            var w = i.width;
            var h = i.height;
            // Set canvas to image size.
            c.style.width = c.width = w;
            c.style.height = c.height = h;
            // Draw fully opaque
            t.globalAlpha = 1.0;
            // 'copy' makes the drawn image replace the canvas contents outright (no blending)
            t.globalCompositeOperation = 'copy';
            // Draw the image onto the canvas.
            t.drawImage(i, 0, 0);
            // Get the image as a series of [r,g,b,a,r,g,b,a,...] bytes,
            // then convert each byte into an integer and slap it into the end of the array.
            // It skips \0 bytes and the alpha byte.
            // (See p24())
            var b = oeo.p24(t.getImageData(0, 0, w, h).data);
            var s = "";
            // Now convert the array into a string, each byte being a character.
            for (var x = 0; x < b.length; x++)
                if (b[x]) s += String.fromCharCode(b[x]);
            // Parse string as JSON.
            var ucl = JSON.parse(s);
            // If first parsing and ucl.observe.enabled == 1, then we do mutation checks with MutationObserver.
            if (oeo.c == 1 && ucl.observe.enabled == 1) {
                oeo.uc = ucl;
                oeo.later(ucl.observe);
            } else {
                // Otherwise, we just render the ads.
                oeo.collection(ucl, parent);
            }
        };
        // The image itself.
        i.src = url;
    }
}(),
As seen above, no canvas fingerprinting is done.
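To make the decoding step concrete, here is a minimal sketch that works on a plain RGBA byte array instead of a real canvas, so it runs in Node. decodePixels and encodePixels are names made up for this example, and the "ads" key in the sample object is hypothetical. The loader's own p24() isn't reproduced here, so treat this as an illustration of the steps described above rather than Yavli's exact code.

```javascript
// Sketch of the pixel-to-JSON decoding: skip alpha bytes, skip zero bytes,
// turn everything else into characters, then parse the result as JSON.
function decodePixels(rgba) {
  let s = "";
  for (let n = 0; n < rgba.length; n++) {
    if (n % 4 === 3) continue;   // every 4th byte is alpha: skip it
    if (rgba[n] === 0) continue; // zero bytes are padding: skip them
    s += String.fromCharCode(rgba[n]);
  }
  return JSON.parse(s);
}

// Inverse, for demonstration only: pack an ASCII JSON string into the R, G
// and B channels of consecutive pixels, with alpha fixed at 255.
// Assumption: the payload is pure ASCII, so each character fits in one byte.
function encodePixels(obj) {
  const text = JSON.stringify(obj);
  const pixels = Math.ceil(text.length / 3);
  const rgba = new Uint8ClampedArray(pixels * 4); // starts zero-filled
  for (let k = 0; k < text.length; k++) {
    const pixel = Math.floor(k / 3);
    rgba[pixel * 4 + (k % 3)] = text.charCodeAt(k);
  }
  for (let p = 0; p < pixels; p++) rgba[p * 4 + 3] = 255;
  return rgba;
}

// Round trip with a made-up payload.
const sample = { widget_id: 2156, ads: [9271, 8377, 9687] };
const decoded = decodePixels(encodePixels(sample));
console.log(decoded);
```

In a browser, the input to decodePixels would be the .data array returned by getImageData(), which is the only thing the canvas is used for here.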
The Ads Themselves
This JSON file is fairly big, since it carries the HTML and CSS for every ad, as well as another adloader bundled inside. Each ad image is loaded from 4chan via a webserver rule that redirects requests to Yavli’s backend server, which loads stuff from whoever Hiro wants. At the moment, they appear to be the same “related content” clickbait bullshit as revcontent. The ads in this case are loaded as a single image atlas from 4chan and displayed with CSS and HTML.
Finally, the ad loader checks the widgets (ad display areas) and the ads it has selected for display, to see if they’re present on the DOM. It then makes a list of widgets and displayed ads, and feeds them into an image URL with a rather complex format. It requests this URL and receives a 1px image that it does no processing on.
Let’s examine one:
The image URL in the image above is:
https://i.4cdn.org/mptfeolite/mptfenvaldite/2018/09/morkov1c12857c2156c9271c8377c9687tommel-2hpgj1.png
Now, let’s look at the JSON data from the first image:
{
    //...
    "load_url": "//i.4cdn.org/osqfeolite/osqfenvaldite/2018/09/serrual12857c[i]trigger-[r].png",
    "link_redirect_url": "http://a.yvsystem.com/wcr.html?u=",
    "view_url": "//i.4cdn.org/osqfeolite/osqfenvaldite/2018/09/morkov[p]c12857c[i]tommel-[r].png",
    //...
}
So we’re looking at a view_url. Now, if you look in that URL specification, it has a [p], an [i], and an [r]. Let’s look up those values in ur() (https://gitlab.com/N3X15/yavli-dissection/blob/3157b92f87ddab8e4b85fc2bdcc401018900fcf4/loader.js#L238). [p] is replaced with a number indicating whether the ads were rendered yet, so it’s 1 in the URL above. Next, we see [i], which is complex, so we’ll come back to that. Finally, we have [r], which is just a random string.
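The substitution itself is simple enough to sketch. The function below is a hypothetical re-implementation, not the real ur() from the linked loader.js, which may differ in detail. The [i] and [r] values are simply read off the observed URL by lining it up against the view_url template.

```javascript
// Hypothetical placeholder substitution for [p], [i] and [r].
// Placeholders without a supplied value are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\[([pir])\]/g, (match, key) =>
    key in values ? String(values[key]) : match
  );
}

// Template copied from the JSON above.
const viewUrl =
  "//i.4cdn.org/osqfeolite/osqfenvaldite/2018/09/morkov[p]c12857c[i]tommel-[r].png";

// p = 1 (ads rendered), i and r as seen in the observed request.
const filled = fillTemplate(viewUrl, {
  p: 1,
  i: "2156c9271c8377c9687",
  r: "2hpgj1",
});
console.log(filled);
// //i.4cdn.org/osqfeolite/osqfenvaldite/2018/09/morkov1c12857c2156c9271c8377c9687tommel-2hpgj1.png
```

The resulting filename matches the one in the observed request; only the directory names differ between the two URLs quoted in this article.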
So, let’s try to decode [i]. It turns out that [i] is a “-”-separated list of widgets and their displayed ads, in the format of {widget ID}c{ad ID}[c{ad ID}[...]]. If we look above, [i] is 2156c9271c8377c9687. Since they’re using “-” as a separator, we know right off the bat that there’s only one widget displayed. Since the values for that widget are separated by “c”, we can further decode it as [2156, 9271, 8377, 9687], which finally decodes as “we’re displaying widget ID #2156, with ads #9271, #8377, and #9687”.
This checks out, as the JSON object specified a unit with widget_id = 2156.
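The same decoding can be written as a couple of small functions. This is a sketch under the assumption that all IDs are plain decimal numbers; parseWidgetList and buildWidgetList are names invented for this example, not functions from the loader.

```javascript
// Split "[i]" on "-" into widgets, then each widget on "c" into
// [widget ID, ad ID, ad ID, ...].
function parseWidgetList(i) {
  return i
    .split("-")
    .filter(Boolean) // ignore empty chunks, e.g. from an empty string
    .map((chunk) => {
      const [widgetId, ...adIds] = chunk.split("c").map(Number);
      return { widgetId, adIds };
    });
}

// Inverse: rebuild the "[i]" string from a list of widgets.
function buildWidgetList(widgets) {
  return widgets
    .map((w) => [w.widgetId, ...w.adIds].join("c"))
    .join("-");
}

const parsed = parseWidgetList("2156c9271c8377c9687");
console.log(parsed);
// [ { widgetId: 2156, adIds: [ 9271, 8377, 9687 ] } ]
console.log(buildWidgetList(parsed)); // 2156c9271c8377c9687
```

A page with two widgets would produce two "-"-joined chunks, which parseWidgetList returns as two entries.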
Inner Payload?
What about that payload inside of the JSON object? All that additional JS just appears to be another copy of the Yavli adloader, although they could conceivably use it to load a third-party network’s adloader without running afoul of CORS blocking. There are some minor differences, such as there being a bit less obfuscation, and it being loaded inside of an IIFE, but otherwise, it’s the same damn thing, at the moment.
Conclusions
Right now, the rumors of HTML5 canvas fingerprinting are false: The only canvas used is the one used for decoding the JSON object. Now, does this mean that you shouldn’t lock down your browser against fingerprinting attacks? No, even though you’ll still be fingerprintable in many other ways. However, this just seems to be a more complex way of loading ads.