Another great tool I’ve discovered is PubSub. In simple terms it’s Google for dynamic web sites (those whose content is updated frequently) such as news sites or blogs. As these dynamic sites are updated with content that matches your search, you are notified either through your aggregator or through XMPP, a protocol invented for instant messaging.
Aggregators operate on what’s known as a "pull" model: it’s the responsibility of the client to periodically request updates from the server. In contrast, XMPP uses a "push" model: when there are updates, the server immediately notifies all clients. There are pros and cons to both techniques, but I have no real need for "push" and just subscribe to searches via my aggregator (Bloglines).
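For anyone who wants to see the pull/push difference in code, here’s a minimal Python sketch. Everything in it (the `Feed`, `PullClient` and `PushClient` classes and their methods) is a name I made up for illustration. It is an in-memory toy, not how PubSub, Bloglines or XMPP are actually implemented.

```python
class Feed:
    """Hypothetical in-memory feed that supports both delivery styles."""

    def __init__(self):
        self.items = []
        self.subscribers = []  # callbacks, used only for "push"

    def publish(self, item):
        self.items.append(item)
        # Push: the server notifies every subscriber right away.
        for callback in self.subscribers:
            callback(item)

    def items_since(self, index):
        # Pull: the server only answers when a client asks.
        return self.items[index:]


class PullClient:
    """Sees nothing new until it calls poll(), like an aggregator."""

    def __init__(self, feed):
        self.feed = feed
        self.seen = 0
        self.inbox = []

    def poll(self):
        new = self.feed.items_since(self.seen)
        self.seen += len(new)
        self.inbox.extend(new)
        return new


class PushClient:
    """Registers a callback once and is notified on every publish."""

    def __init__(self, feed):
        self.inbox = []
        feed.subscribers.append(self.inbox.append)


feed = Feed()
puller = PullClient(feed)
pusher = PushClient(feed)
feed.publish("post 1")
feed.publish("post 2")
```

After the two `publish` calls the push client’s inbox already holds both posts, while the pull client’s inbox stays empty until it calls `poll()`. That delay is the trade-off: pull is simpler for the client but only as fresh as its polling interval.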
So, why is it useful to subscribe to searches? Well, I want to read about certain topics, such as .NET and XUL programming (don’t worry if you don’t know what those topics are, it’s not that relevant to this post). In the past, I did Google searches to find blogs that discussed these topics and subscribed to them. If they referenced other blogs I’d often subscribe to the new ones as well. This works, but there are a few problems. First, many blogs are not focused on a single topic. A blogger might discuss XUL often enough to be worth subscribing to his blog, but still 50% or more of his posts are on subjects I have no interest in. Second, there are lots of good articles I miss out on because I’ve yet to discover the blog they appear on. PubSub solves both of these issues. Instead of subscribing to several sites to get information on a topic, I create a PubSub search on the topic and subscribe to that. Only posts relevant to the topic I’m interested in show up in the subscription (assuming I can create a really good search query for PubSub) and I miss fewer relevant posts.
My only complaint with PubSub so far is one shared with most search engines: the query language is not friendly, at least for the types of searches I make. But the nature of subscribing compounds this issue. Let me expand on this. The first problem is that the default "operator" is OR. Users not familiar with Boolean algebra think of a search as simply a request for sites that contain all the terms they type in. This is an AND operation, not OR. I created a few bad queries to begin with by making this assumption (and I’m a developer who certainly does understand Boolean algebra and query languages). For instance, out of vanity I thought I’d search for references to myself and put in the keywords William and Kempf. I got back sites about William Hung, etc. The OR operator didn’t give me what I wanted, and I had to modify the search to use the AND operator I had expected to be the default.
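The William and Kempf mix-up is easy to reproduce with a small Python sketch. The `matches` function and the sample post titles are invented for illustration, and the matching is plain whole-word comparison. I’m not claiming this is how PubSub evaluates its queries.

```python
def matches(text, terms, operator="OR"):
    """Return True if the text satisfies the terms under the given operator."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    # AND needs every term present; OR is satisfied by any single term.
    return all(hits) if operator == "AND" else any(hits)


# Made-up post titles, purely for illustration.
posts = [
    "William Hung releases a new album",
    "William Kempf writes about XUL",
    "Kempf on threading in C++",
]
terms = ["William", "Kempf"]

or_hits = [p for p in posts if matches(p, terms, "OR")]
and_hits = [p for p in posts if matches(p, terms, "AND")]
```

Here `or_hits` contains all three titles, including the William Hung one, while `and_hits` contains only the title that mentions both words.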
Even after familiarizing myself with the query syntax, there are still problems. However, they are problems shared with nearly every other online search engine I’ve used. The first, and by far the most annoying for me, though it may not affect everyone, is that the queries can’t handle many of the terms I frequently need. For example, .NET causes more false hits than correct ones, because the search engine ignores the period. Likewise C++ and C#, two programming languages I use a lot, are nearly impossible topics to search on. Then there are other terms I want to use, such as Mono, an open source implementation of the .NET runtime and C# language, that return too many false hits because they are words with multiple meanings. With static searches this is annoying, but you can either try to ignore the false hits or refine your search by adding more terms that narrow the subject down to what you’re really looking for at the moment. But with a search for dynamic content you can’t easily narrow the search, and the false hits keep coming, compounding the annoyance factor and increasing the time wasted in human filtering.
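My guess, and it is only a guess, is that the punctuation gets thrown away when the query and the posts are split into words. Here’s a small Python sketch of that assumption. Both tokenizer functions are hypothetical, and I have no idea how PubSub really tokenizes text.

```python
import re


def naive_tokens(text):
    """Keep only runs of letters and digits, lowercased; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def symbol_aware_tokens(text):
    """Also keep a leading '.' and any trailing '+' or '#' attached to a word."""
    return re.findall(r"\.?[a-z0-9]+[+#]*", text.lower())


query = ".NET and C++ and C#"
naive = naive_tokens(query)
aware = symbol_aware_tokens(query)
```

With the naive tokenizer, `.NET` becomes `net` (indistinguishable from a fishing net) and both `C++` and `C#` collapse to a bare `c`. The symbol-aware version keeps `.net`, `c++` and `c#` apart, though it would also treat the end of `example.net` as `.net`, so it’s a sketch of the idea and not a fix.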
If anyone knows of creative ways around these issues when using PubSub, I’d love to hear them. I’m hoping to learn a few tricks to work around the problem, as I did with static search engines, though I’m finding the dynamic nature of PubSub makes the tricks more difficult to discover… if they exist.