WebTech: Archive

Entries For May 2005

RSS Privacy

Fri 27th May 2005 10:56 GMT by Andrew from Darlington, U.K.

For a couple of weeks now I've been noticing that certain RSS feeds are including adverts and web-bugs that can be used to track readership ("Boing Boing", "Slashdot", and more). Reading FeedBurner Plants a Bug this morning reminds me to mention one way to block them: re-route the image requests to a different webserver. This can be done simply by editing your "hosts" file.

On Win2K the hosts file resides at C:\WINNT\system32\drivers\etc\hosts (on WinXP it's usually C:\WINDOWS\system32\drivers\etc\hosts) and can be edited with Notepad; it's probably a similar story for previous versions of Windows too. On Linux/Unix systems it resides at /etc/hosts.

Add the following at the end of the file:

#Block certain RSS web-bugs
127.0.0.1 feeds.feedburner.com
127.0.0.1 images.feedstermedia.com

Save the file. The change may not take effect until you next log in or reboot. Once it does you'll see broken images at the end of certain posts in your RSS aggregator: not elegant, but more private. Mind you, this feeds into the ongoing debate about whether blocking ads breaks an implicit agreement with sites that rely on them to pay their running costs.

Note: in some situations 127.0.0.1 may not be the right IP address for your system. If your hosts file has an entry for localhost with an IP address other than 127.0.0.1, use that address in the web-bugs snippet above instead of 127.0.0.1.
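If the change doesn't seem to take effect, flushing the Windows DNS resolver cache from a command prompt should avoid the need to reboot (this assumes the DNS Client service is doing the caching):

ipconfig /flushdns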

Calendar Year

01:11 GMT by Andrew from Darlington, U.K.

Yearly Calendar: just a simple calendar display, a year at a time.

More for personal use really as I never seem to have a calendar to hand and end up trying to use the fiddly Windows clock utility instead.

Bloglines Innovation

Thu 19th May 2005 01:48 GMT by Andrew from Darlington, U.K.

I can't remember how long I've been using Bloglines as my personal RSS aggregator, although it must be at least a year by now. I'm now monitoring 76 feeds for news and comment pretty much continuously throughout the day. A couple of new (or newish) features have really impressed me: email groups in particular, and weather reports.

The ability to subscribe to email newsletters using a temporary Bloglines email address and view their contents using the same GUI that I use to read almost all of my other news these days is quite innovative and mighty convenient. I only started using it recently when I wanted to keep up-to-date with jobs being posted to a university message board that didn't publish an RSS feed, but which did have a jobs-to-email facility.

It's nice to be able to forget about the job emails completely until you want to go and look at them, whereas with ordinary email my mailbox watcher application pesters me to read whatever is waiting and breaks my concentration. For important email I want it to do just that, but not for newsletters and similar stuff, and of course the mailbox watcher can't tell the difference. So newsletters now go to Bloglines instead: important email interrupts me like a telephone call, and unimportant email gets treated like a newspaper, picked up when I feel like it. Another bonus of this service is that I don't have to worry about the email address getting spammed, since it's not mine, and once I'm finished with it Bloglines bounce any mail sent to it.

Topicblogs

Tue 17th May 2005 00:52 GMT by Andrew from Darlington, U.K.

I notice that there's a new blog crawler doing the rounds: topicblogs. No word yet on what services the system will offer, but in the meantime it'd be nice if their robot made conditional GET requests (sending If-Modified-Since and honouring HTTP 304 Not Modified responses) to make its crawling a bit more intelligent and a bit less wasteful. Checking robots.txt wouldn't hurt either.
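For what it's worth, the server's half of that handshake is cheap too. A minimal JScript ASP sketch (the function name and the simplified date handling are mine, not JASPER's actual code):

function Send304IfUnchanged(lastModified) {
    // lastModified: a Date holding the page's last change time.
    var ims = Request.ServerVariables("HTTP_IF_MODIFIED_SINCE");
    if (ims.Count() && Date.parse(ims.Item(1)) >= lastModified.getTime()) {
        // The crawler's copy is still fresh: say so and send no body.
        Response.Clear();
        Response.Status = "304 Not Modified";
        Response.End();
    }
    // Otherwise serve the page, advertising Last-Modified so that the
    // next request can be conditional. toUTCString() is close enough to
    // the HTTP-date format for most clients.
    Response.AddHeader("Last-Modified", lastModified.toUTCString());
}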

"Making Wrong Code Look Wrong"

Wed 11th May 2005 16:10 GMT by Andrew from Darlington, U.K.

There's a lot that I like about this article: Making Wrong Code Look Wrong over at "Joel on Software". Web programmers everywhere should read it and re-read it. For small projects with a single programmer it's possible to keep track of the flow of raw and HTML-encoded strings, but for anything larger the scope for mistakes is huge. The "Apps Hungarian" method that Joel applies in the article looks to be a really good idea for development teams in this context; anything to help with the appalling security record of web applications.
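A minimal JScript ASP illustration of the convention, using the article's "us"/"s" prefixes for unsafe (raw) versus safe (HTML-encoded) strings; the variable names are mine:

// A mismatch such as Response.Write(usComment) now looks wrong at a glance,
// which is the whole point of the convention.
var usComment = Request.Form("comment").Item(1); // raw user input: unsafe
var sComment = Server.HTMLEncode(usComment);     // encoded: safe to emit
Response.Write(sComment);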

The article is really three articles in one. The first two should be essential reading for web programmers, but the last, on exceptions, is a little dry and really belongs in a separate piece. When it comes to defensive programming via conditions or exceptions, my tuppence is to do both rather than one or the other: plenty of conditional state checking for anticipated errors, with a lightweight exception handler wrapping it all up as a means to bail out on unanticipated foul-ups. I take the point, though, that bailing out with exceptions isn't something to take lightly in many mission-critical situations.

BlueCoat Pre-Caching Proxy Problem

15:09 GMT by Andrew from Darlington, U.K.

It seems to me that commercial black-box proxies which pre-cache pages en masse are not particularly scalable. Page-caching proxies have been around nearly as long as the WWW itself, mainly because of the cost savings from reducing WWW traffic. I notice, though, that there are popular commercial proxying products today that, when given a page request, also perform a bulk grab of other pages on the same website. The idea is that if a request is later made for another page on the same site, the proxy may already have a copy, so web-browsing appears faster for users behind the proxy.

The thing is, these bulk pre-caching servers seem to be getting more and more popular. BlueCoat make several all-in-one proxy servers that you can install on the edge of your network, and they appear to ship with bulk pre-caching. BlueCoat's products came to my attention because of a pages-per-second throttle that I run on this website: BlueCoat users are continually being blocked by it. The block exists to put the brakes on resource hogs such as uncivilised web crawlers, referrer spammers and folks ripping sites to steal their content.

A web surfer behind a BlueCoat proxy requests a webpage, and the proxy fetches the page in question and caches it for later use; this is normal caching-proxy behaviour. However, the BlueCoat server then also makes a burst request for multiple pages that it thinks the user might wish to browse to next. From looking at the pages that BlueCoat requests during its pre-cache bursts on my site, I strongly doubt that the user benefits from the effort: the pages it fetches en masse more often than not have little contextual connection to the originally requested page.

The BlueCoat user gets little if any benefit from the burst requests performed on his or her behalf. The web server on the other end, however, momentarily gets a spike of page requests to process. I don't know how many requests the BlueCoat server makes during its burst, as after 5 requests in 5 seconds it gets blocked, and it seems to give up after receiving 5 HTTP 403s. I should experiment by taking the block off for BlueCoat users and checking. It'd also be interesting to do some statistical analysis of the pages requested in relation to the context of the originally requested page.

To me the BlueCoat pre-cache grab strategy looks dumb and unscalable. It has some merit in theory, but pragmatically it's a resource hog with no benefit to the user it's there to serve. BlueCoat servers appear to be quite popular, and I get a handful of BlueCoat bursts per day even on this small website. The more companies use them, the more pre-cache bursts there are, and that translates into a lot of unnecessary traffic for popular sites. The effect I see is that traffic goes up tenfold: a single page request generates 9 other pre-cache requests, although I don't know whether that figure is artificially low because of my blocking routines.

I'll likely take the page-request throttle off for BlueCoat users, as they aren't the folks the routines were designed to restrict, but I'm dismayed at BlueCoat's developers and fail to see any benefit for users behind such proxies compared to a normal web proxy that caches only the page actually requested. I still need the throttling routines in place, though, as I continue to get hammered with a few thousand requests per day from referrer spammers and anonymous crawlers.

Note: you can tell if a user is behind a BlueCoat proxy because the server adds an HTTP_X_BLUECOAT_VIA header to the page request; you'll then see a chunk of page requests from the same IP address in your server logs, all typically made within 1 second of the original page request.
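A quick JScript ASP test along those lines, which the throttling routines could use to exempt BlueCoat traffic (the function name is mine):

function IsBlueCoatRequest() {
    // BlueCoat proxies tag requests with an X-BlueCoat-Via header,
    // which classic ASP exposes as the HTTP_X_BLUECOAT_VIA server variable.
    return Request.ServerVariables("HTTP_X_BLUECOAT_VIA").Count() > 0;
}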

[ modified 22nd May 2005 11:54 GMT ]

TextPad Syntax File For Classic ASP / JScript / XHTML / ADO

Tue 10th May 2005 23:01 GMT by Andrew from Darlington, U.K.

I'm making available the TextPad syntax-highlighting file that I use. It was originally "Contributed by Jess Kim, sysinfra.com" according to the comment in the file, except that I've since modified it almost beyond recognition to include lots of missing ASP and JScript commands, plus methods common to the default COM objects that ship with classic ASP, and the ADO constants:

Download the TextPad syntax file for ASP / JScript / Javascript / XHTML / ADO

Format String As Uri Component

01:03 GMT by Andrew from Darlington, U.K.

Another new script for the library: "Format String As Uri Component", a javascript function that takes a string parameter and returns a human-readable version of it that is safe for use in URIs. It converts certain symbols into words, white-space into underscores, and a wide range of accented characters into plain ASCII equivalents. I use a very similar function to create the querystrings of WebTech entries.
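A much-simplified sketch of the idea (not the library function itself, which covers far more symbols and accents):

function formatStringAsUriComponent(s) {
    return s
        .replace(/&/g, " and ")             // certain symbols become words
        .replace(/[àáâãäå]/g, "a")          // a small sample of the accent folding
        .replace(/[èéêë]/g, "e")
        .replace(/[^A-Za-z0-9 _-]/g, "")    // drop anything else that is URI-unsafe
        .replace(/\s+/g, "_");              // white-space becomes underscores
}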

[ modified 10th May 2005 11:10 GMT ]

Create Directory Path Function

Mon 9th May 2005 23:07 GMT by Andrew from Darlington, U.K.

A new script for the library: "Create Directory Path", a simple function that takes an absolute filepath string as input and ensures that all of the directories in the path exist.
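The core of the idea, sketched in JScript using FileSystemObject (this assumes a drive-letter directory path; the library version is more thorough):

function createDirectoryPath(absolutePath) {
    var fso = new ActiveXObject("Scripting.FileSystemObject");
    var parts = absolutePath.split("\\");
    var path = parts[0];                    // the drive, e.g. "C:"
    for (var i = 1; i < parts.length; i++) {
        path += "\\" + parts[i];
        if (!fso.FolderExists(path)) {
            fso.CreateFolder(path);         // create each missing level in turn
        }
    }
}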

Jumping Out of the Frying Pan and Into the Fire

12:31 GMT by Andrew from Darlington, U.K.

Regarding the latest critical Firefox exploits currently being publicised (The Register: Firefox exploit targets zero day vulns): if you're a Firefox user pondering switching back to Microsoft IE, bear in mind that you'd be "jumping out of the frying pan and into the fire" as far as internet security goes. Might I suggest instead that you try out Opera 8 whilst the Firefox developers write new security patches.

TrackBacks Implemented

Sun 8th May 2005 13:53 GMT by Andrew from Darlington, U.K.

I've implemented TrackBacks for JASPER. At the moment the implementation is missing out-bound auto-discovery, since in order to build that part I feel I should build a "robots.txt" parser first; that would stop the auto-discovery feature making a nuisance of itself were it to scan a site that someone didn't want scanned. A competent robots.txt parser would take quite a bit of effort, so I'll leave that for another day.

Apart from that, in-bound auto-discovery for third-party tools should work, as should out-bound trackback pinging from JASPER. Although, since I don't have access to any trackback-enabled third-party tools, I don't know for certain that in-bound auto-discovery works; it'd be great if someone could ping me to confirm it.

[ modified 8th May 2005 20:20 GMT ]

Recent Comments RSS Feed

Sat 7th May 2005 13:17 GMT by Andrew from Darlington, U.K.

There's now a recent comments RSS webfeed. It's a full-content feed (i.e. not a list of summaries) of the last few comments from both public commenters and blog editors.

Saying "No" to Google's Cache & Grab

Fri 6th May 2005 15:46 GMT by Andrew from Darlington, U.K.

Boing Boing: "Google Accelerator is bad news for Web apps", pointing to "How to show Google's Web Accelerator the door in Rails". OK, so porting the blocking routine to JScript ASP would yield:

function DisableLinkPrefetching() {
    var HTTP_X_MOZ = Request.ServerVariables("HTTP_X_MOZ");
    if (HTTP_X_MOZ.Count() && HTTP_X_MOZ.Item(1) == "prefetch") {
        Response.AppendToLog("prefetch detected: sending 403 Forbidden");
        Response.Clear();
        Response.Status = "403 Forbidden"; // Response.Status takes a string, not a number
        Response.End();
    }
}

Although it'd be better to have the check invoke a separate function that sends the 403 along with a descriptive message explaining the reason for the block.
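Usage would then be a one-liner near the top of each page; something like the following, where the include path is hypothetical:

<!-- #include virtual="/includes/prefetch.asp" -->
<% DisableLinkPrefetching(); %>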

NTL Offers Free Anti-Virus & Privacy Tools

Thu 5th May 2005 14:16 GMT by Andrew from Darlington, U.K.

Vnunet: "NTL offers free antivirus service". OK, great for the masses, but what if you already have your own firewall (albeit in software: Kerio) configured for your home network and multiple VPN needs, with separate anti-virus (AVG) and multiple customised anti-spyware applications? I hope you can opt out in the same way as you can opt out of NTL's connection software, i.e. by not installing it in the first place. My concern is that I end up with a dumbed-down service, or lack thereof, like AOL.

A Phishing Solution For Internet Banking

Tue 3rd May 2005 13:28 GMT by Andrew from Darlington, U.K.

The Register: Brits fall prey to phishing. Time now, I think, for UK banks to ship read-only e-banking applications on USB dongles to their customers. I don't mean an executable on a memory stick that can be copied, but an application in hardware. Provided, of course, that the application contained therein is fully accessible and usable (unlike the majority of e-banking websites, it has to be said), it seems like a reasonable solution to phishing.

With a physical black-box application, higher standards of encryption can be employed, making it safe to use e-banking even on public machines. In one step phishing as we know it disappears, and the cost of the programme pays for itself after the first year of not having to pay out phishing compensation. The physical dongle should obviously not be self-authenticating, though, in case it is lost or stolen; it should still require account numbers and pass-phrases in order to use it. This also has "economy of scale" advantages, in being able to mass-produce a single unit.

De-Referencing Classic ASP Collection Objects

13:07 GMT by Andrew from Darlington, U.K.

Recently I was surprised to read that an ASP developer at a popular developer website didn't understand why the following JScript ASP statement failed:

var myVal = Request.QueryString("myKey").split(",")

I'm slightly shocked because the reason comes down to a basic misunderstanding of Classic ASP.

Collections
Request.QueryString("myKey") returns a Collection object, not a String. I'm positive that most Classic ASP developers assume it returns a String because of the context in which they usually use it and because "it just works" most of the time, i.e. they do something like Response.Write("myKey = " + Request.QueryString("myKey")) (there's also another problem with that statement; see the last bullet point below). They're saying "write the value of QueryString key 'myKey' to the output". Yet Request.QueryString("myKey") returns an Object, not a String. What they don't realise is that the underlying C++ Object behaviour is showing through and performing an implicit Object-to-String conversion.

Implicit Object-to-String Conversions
The Request.QueryString Collection object has a couple of overridden toString() methods. These fire automatically when the Collection object is used where a String is expected, such as in a string concatenation or assignment. When it isn't, the original Object is retained, so invoking the String-only method split() on a Collection produces a "Microsoft JScript runtime error '800a01b6': Object doesn't support this property or method" error.
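A small demonstration of the distinction, and of forcing the conversion explicitly (this assumes the querystring actually contains myKey; the Item() approach covered below is the better fix):

var col = Request.QueryString("myKey"); // still a Collection object
// col.split(",") fails here: split() is a String method, not a Collection one.
var str = "" + col;                     // string concatenation forces the implicit conversion
var parts = str.split(",");             // now works, on a genuine String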

If the developer has taken the best-practice step of wrapping the Collection output in a Server.HTMLEncode() statement, they are also prone to the cryptic errors that arise from doing Response.Write(Server.HTMLEncode(Request.QueryString("myKey"))) when the querystring does not contain the key "myKey" that they assume exists. The reason is that the Collection object is null rather than an empty string "", and Server.HTMLEncode() errors on a null argument, since nulls are not implicitly castable to the String argument type that it needs. A more careful approach to using Collection objects solves all of these apparent problems.

Collection Object Methods
A Collection object has Count(), Key() and Item() methods that are used to examine its contents, in particular the latter, which de-references the Collection values and returns a String.

The curious thing is that in the article where the "problem" was being discussed, none of the commenters knew about this either. It's a Classic ASP fundamental, since Classic ASP is a framework of Objects and most of them are Collections, e.g. Request.QueryString, Request.Form, Request.ServerVariables, etc.

Defensive Programming
The defensive programming technique for dealing with native Collections is as follows (a consolidated sketch follows the list):

  1. Before de-referencing the values in a Collection, test whether the key exists by using the Count() child method. For example:
    if (Request.QueryString("myKey").Count()) { /* de-referencing statement block */ }
  2. If the key exists, be aware that a given key can have any number of values. So if you are expecting just 1 value for a given key in your application, make sure you de-reference only 1 value, e.g. use the Item() method with an index argument. For example:
    var myVal = Request.QueryString("myKey").Item(1);
    If you don't do this you get back a comma-separated String of all of the key's values, and it's then the easiest thing in the world to disrupt your application by duplicating a key in its querystrings. If you've also been careless in how you treat input values, your application will quite likely fail, and if you let your failure messages be seen by the public you've just invited crackers to have a go at it; information leaking from programming failures is gold dust to a cracker ("Security 101").
  3. Lastly, always Server.HTMLEncode() data being sent to the client.
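Putting the three steps together (the key name "myKey" and the empty-string default are mine, for illustration):

var myKeyValues = Request.QueryString("myKey");
var myVal = "";                               // safe default if the key is absent
if (myKeyValues.Count()) {                    // 1. does the key exist at all?
    myVal = myKeyValues.Item(1);              // 2. de-reference exactly one value
}
Response.Write(Server.HTMLEncode(myVal));     // 3. always encode data sent to the client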

Bookmarklets: WWW Sub-domains, Fit Content to Window Width

Sun 1st May 2005 11:16 GMT by Andrew from Darlington, U.K.

A couple of bookmarklets that I wrote for personal use many months back may actually be of interest to someone else. The first one helps when you type a domain into your browser's address bar and then discover that "argos.co.uk" (for example) doesn't resolve without its www sub-domain, so you end up with a browser error message. Firing the following bookmarklet inserts a "www." and re-submits the request:

javascript:this.location = this.location.href.replace(/\/\//, "//www.");

The other bookmarklet attempts to destroy defined widths in a page. Say you're viewing an online article in The Guardian or Observer and you're a tad unhappy with the tiny column width of the article body; firing the following bookmarklet attempts to reset all widths to the default size for the content they contain, so the content typically expands sideways to fill your entire browser window:

javascript:var d=document.getElementsByTagName("html")[0];d.innerHTML=d.innerHTML.replace(/\swidth(:|=)\s*("|')?\d+(%|px)?("|')?/ig, " ");

NB: these bookmarklets have only been tested in Opera.
