WebTech: Archive
Entry for 17th November 2003 at 18:20 GMT
Over the last couple of days I've noticed a new wave of referrer spam, and this batch is rather strange. In my referrer logs I get requests from a single IP address (141.85.3.130), always from what I assume is a robot, with the user-agent "MSIE 6.0" - nothing more and nothing less, that's the complete string. Each request is a GET and leaves behind the address of someone else's blog - always a blog, never any other type of site. The robot only visits a couple of times a day and goes no deeper than the site root (which is where most bloggers serve their blogs from).
If you look at the sites that are being promoted, they are all quite odd. Take www.mikesspot.com and www.bongohome.com, to name but two (NB: I've deliberately not made links to them, for a good reason). The sites look like normal blogs, but when you read them you realise that they aren't 'personal' and there's no human content. Each entry in their blogs is actually the first paragraph of a news story from a legitimate news site (the entry titles are links to the ripped-off content).
The sites are themed on some topic, which introduces a feeling of normality about them, but the oddness really starts when you try to 'use' the site. Some sites appear to have user comments on each of the entries, except that the links to those comments are empty and can't be navigated to. The archive links in the sidebar only ever take you to the front page of the site, never to an actual archive (the same goes for the calendar tool). The Links section contains random links, some to blogs, but many to admin areas of sites that need passwords or to pages that have no relation to the site's theme at all - really just a selection of random web pages. The site search, if there is one, takes you away to someone else's site altogether. Some sites claim to be made with Movable Type, others with Blog City.
Spammer & Sites are Linked
In actual fact the sites are all auto-generated and made to look like real sites.
Well, this is an interesting theory, but couldn't they be genuine folk's sites?
No, there's something that indicates that there's more to this than just an apparent casual connection and that comes when you start digging around. The robot that adds the spam referrer has a certain IP address that resolves to an institution:
"Politehnica" University of Bucharest
Communication Center
Splaiul Independentei 313
Bucharest 77206
If you trace the websites themselves to see who serves the sites, you get the same IP address (or an IP address in the range assigned to that institution). Bingo, there's definitely something going on here.
Scam
So why bother to go to this effort to auto-generate web sites and add referrer spam to folk's sites? Well, to answer that you need to go back and look at the HTML source of the sites again. At the bottom of the source are a couple of links that aren't seen in the displayed webpage. Each site has the same link, to /adult-webcam/. If you navigate to that page (and I suggest you don't if you're at work or on a network that won't take kindly to you viewing scantily clad ladies), you end up at an entry page to a p*rn site (always the same one). The offending site appears in an HTML frame, and the source of the top frame is full of search engine fodder - lots of text and links to pages within the /adult-webcam/ folder. The site inside the frame is the true p*rn site, served from a server in Amsterdam.
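If you want to check a page for this sort of thing without trawling through the source by hand, here is a rough Python sketch. It is only a guess at a useful check: I haven't said exactly how the links are hidden, so the sketch assumes the simplest case, an anchor with no visible text, and the names AnchorCollector and textless_links are made up for the example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, visible text) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def textless_links(html):
    """Return hrefs of anchors with no visible text (assumed hiding method)."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.anchors if not text]


page = '<p><a href="/archive/">Archive</a></p><a href="/adult-webcam/"></a>'
print(textless_links(page))
```

Image-only links will show up in the list as well, so treat the output as a list of candidates to eyeball rather than proof of anything.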
Why?
Well, this is still a bit tricky, but my tentative theory is that it's to do with search engines, particularly Google. Blogs have very high page rankings due to their popularity and dense inter-link relationships. P*rn sites also have dense inter-link relationships between similar sites, but these relationships (networks) are used by some search engines to actively filter those unsuitable sites out of mainstream web searches. The person or persons behind this scheme are attempting (whether they intend to or not) to create a densely linked network from legitimate blogs, across to these auto-generated blogs and then on to the p*rn sites. The next part is rather open to suggestion, but with this dense linking network from blogs to p*rn sites in place, search engines such as Google might find it difficult to make the distinction between 'good' and 'bad' sites. If this is the case, the filtering systems may hiccup and let the p*rn sites through into mainstream web searches, which will increase the number of 'customers' for the p*rn sites. I'm unsure of this last bit, because I can think of better ways to create dense inter-linking to the p*rn sites without the need for the in-between site, and all in all I doubt Google is that stupid. However, this p*rn link relationship would likely decrease the page ranking of legitimate blogs. Given this uncertainty, the real reason for the effort in creating these fake blogs and fake referrers may still be eluding me.
The "University of Bucharest", if a genuine institution, likely doesn't know that its web servers are being used in this way (students up to no good or crackers in action).
Feedback
If you have a more likely theory I'd be interested to hear it (use the comment link on this entry to write a reply). I'm currently blocking all spam referrer hits from IP address 141.85.3.130 to prevent them getting into my blog referrer list (and from thus initiating a relationship between my site and the p*rn sites).
If you have similar entries in your site referrers I'd like to know about them. Look for IP addresses between 141.85.0.0 and 141.85.255.255, for user-agents of exactly "MSIE 6.0", or for apparent referrer links from www.mikesspot.com and www.bongohome.com (to name but two). If there is something in this then I'll contact the "University" and warn them.
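To save some eyeballing, the three checks can be sketched in a few lines of Python. This is a minimal sketch, not what I run on this site: it assumes your server writes logs in the Apache "combined" format, and the function name suspicion_reasons is made up for the example. The same test could just as well be used to drop hits before they reach a referrer list.

```python
import ipaddress
import re

# Assumption: Apache "combined" log format; adjust the pattern for other servers.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# 141.85.0.0 to 141.85.255.255 is the same range as 141.85.0.0/16.
SUSPECT_NETWORK = ipaddress.ip_network("141.85.0.0/16")
SUSPECT_AGENT = "MSIE 6.0"
SUSPECT_REFERRERS = ("www.mikesspot.com", "www.bongohome.com")


def suspicion_reasons(line):
    """Return which of the three checks ("ip", "agent", "referrer") a log line trips."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # not a line in the assumed format
    reasons = []
    try:
        if ipaddress.ip_address(match.group("ip")) in SUSPECT_NETWORK:
            reasons.append("ip")
    except ValueError:
        pass  # the server logged a hostname rather than an IP address
    # The agent must be exactly "MSIE 6.0"; real browsers send a longer string.
    if match.group("agent") == SUSPECT_AGENT:
        reasons.append("agent")
    if any(host in match.group("referrer") for host in SUSPECT_REFERRERS):
        reasons.append("referrer")
    return reasons


sample = ('141.85.3.130 - - [17/Nov/2003:18:20:00 +0000] "GET / HTTP/1.0" '
          '200 1234 "http://www.mikesspot.com/" "MSIE 6.0"')
print(suspicion_reasons(sample))
```

The sample line is invented for illustration (only the IP address, user-agent and referrer host come from my logs). A line that trips all three checks is almost certainly this robot; a single match on its own is weaker evidence.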
Update (9:23pm GMT)
I'm not the only person to have spotted this: netobjects.fusion7.gen-discuss: OT: Re: Hits from a Weird URL.