Thought I’d share a few notes on the things we test in Kaplak Labs these days. Kaplak Labs is simply a WordPress-based site in our WordPress MU-powered setup, on which we test themes and plugins before we deploy them on other sites. Right now I’m preoccupied with setting up a filtering process for Kaplak Stream. This filtering process aims to sanitize feed items and add some stuff to each item, which improves its chances for survival in the stream:
- Retrieve all tags/categories from posts and create new tags/categories if they don’t exist.
- Semi-automatically tag/categorize all feed items. Sometimes feed publishers don’t tag/categorize posts very well, and even a well-tagged/categorized item may have new meaning in a different context. We use the Calais Autotagging plugins for WordPress to do this, for the time being.
- Convert all categories and tags to categories only, to keep things clean and simple. We actually treat categories as tags, though. Because categories are the more widely used of the two WordPress features, we’ve decided to go with categories over tags.
- Add a link to the item source directly in the feed item content, to make sure (sort of) that it stays with the unaltered post when it is fetched and possibly re-published from the Kaplak Stream.
- Cache all images locally to improve performance and avoid traffic spikes on source sites, which would occur if subsequent sites fetched images all the way back from the source. Kaplak Stream hosts all images (for which we will probably be using Amazon S3) to ensure their availability for all sites which fetch items from the stream.
- Filter out spam and duplicate items. We still have to sort out, however, what happens if an improved version of a post gets fed back into the Stream. Ultimately, we’d like users to be able to tag and categorize items according to the contexts they use them in, and be able to retrieve these back into posts in the stream.
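To make a few of the steps above concrete, here is a minimal sketch in plain PHP of what merging tags into categories, appending the source link and dropping duplicates could look like. It is not our actual setup: the item shape (an array with 'link', 'content', 'tags' and 'categories') and the function names kaplak_filter_item and kaplak_filter_stream are made up for illustration and don’t come from WordPress or any plugin. Autotagging, image caching and spam filtering are left out.

```php
<?php
// Hypothetical item shape: array('link' => ..., 'content' => ..., 'tags' => array(), 'categories' => array()).
// These names are assumptions for this sketch, not a WordPress or FeedWordPress API.

function kaplak_filter_item(array $item) {
    // Fold tags into categories: one flat, lower-cased, de-duplicated list.
    $merged = array_merge($item['categories'], $item['tags']);
    $item['categories'] = array_values(array_unique(array_map('strtolower', $merged)));
    $item['tags'] = array();

    // Append the source link to the content so it travels with the item.
    $link = htmlspecialchars($item['link'], ENT_QUOTES);
    $item['content'] .= "\n<p>Source: <a href=\"" . $link . "\">" . $link . "</a></p>";

    return $item;
}

function kaplak_filter_stream(array $items) {
    $seen = array();
    $out = array();
    foreach ($items as $item) {
        // Crude duplicate check: the first item seen for a given source link wins.
        if (isset($seen[$item['link']])) {
            continue;
        }
        $seen[$item['link']] = true;
        $out[] = kaplak_filter_item($item);
    }
    return $out;
}
```

Keying duplicates on the source link is the simplest thing that could work. It also means an improved version of a post with the same link gets dropped, which is exactly the open question mentioned in the last point.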
In the process of setting this up I discovered Yahoo Pipes, which looks like a very useful tool for taking in an amount of data (in a feed format), manipulating it and spitting out a new feed. I experimented a bit with it, and found it a bit tricky to actually create something useful, but will no doubt give it some further attention. We may be able to use it for something.
Kaplak Stream is based on a WordPress MU install (currently v2.6.1), where a network of niche sites is fed one or more feeds on a particular subject in the ‘stream’ or from particular online services, using feed aggregation tools.
Building the setup for Kaplak Stream so far has revealed a path riddled with challenges (as one might expect). WordPress MU, which is a tremendously powerful package, is not as widely used as its popular little sister, and is therefore less well documented and supported. The same goes for the compatibility and effects of various plugins.
One initial thing which gave rise to some trouble was getting WordPress MU to stop worrying and love embedded stuff such as YouTube videos and widgets. WordPress MU was designed for large environments hosting thousands of blogs, with thousands of different users, and has a higher security threshold than regular WP: it filters the HTML tags used for embeds out of posts. And there’s no way to turn this filtering of tags off in the Admin interface.
Now, there’s a plugin called Unfiltered MU which will remove this filtering of posts and thus allow embedded stuff. Unfortunately this plugin works only with posts actually published using the Admin interface editor. It doesn’t work with imported posts (from your old single-WordPress setup), and apparently it doesn’t work with aggregated posts either. So if you set up MU and want it to import an old blog or aggregate items from a feed, you’ve still got trouble.
I found out one has to manually edit kses.php to enable the tags used by embedded stuff, at one’s own peril. For our purposes, however, security is less of a concern, since we are the only users of our system for the time being.
At your own peril (I underscore the fact that you may put your setup at risk by enabling these HTML tags, but hey, life is dangerous): add the tags object, embed, param and script, with something along the lines of the code below, to your “allowed” arrays in kses.php:
    'object' => array (
        'id' => array (),
        'classid' => array (),
        'data' => array (),
        'type' => array (),
        'width' => array (),
        'height' => array (),
        'allowfullscreen' => array ()),
    'param' => array (
        'name' => array (),
        'value' => array ()),
    'embed' => array (
        'id' => array (),
        'style' => array (),
        'src' => array (),
        'type' => array (),
        'height' => array (),
        'width' => array (),
        'quality' => array (),
        'name' => array (),
        'flashvars' => array (),
        'allowscriptaccess' => array (),
        'allowfullscreen' => array ()),
    'script' => array (
        'type' => array ()),
Pick the ones which you need for your videos or other embedded media to work. Allowing the ones listed will allow video embeds from most providers, including YouTube, Google Video, Viddler, Blip.tv and others, as well as widgets from a lot of sources. It works on posts aggregated by FeedWordPress, for instance, which was my problem with the “Unfiltered MU” plugin.
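If you’d rather not hand-edit a core file that gets overwritten on upgrade, the same additions can be kept in a small function that merges extra tags into an existing “allowed” array. The sketch below is plain PHP and only shows the merge. The function name kaplak_allow_embed_tags is made up, and the trimmed attribute list is just an example. I’ve left script out here on purpose, since it allows arbitrary JavaScript; add it back if your widgets need it.

```php
<?php
// Hypothetical helper: merges embed-related tags into an "allowed tags" array
// of the shape kses.php uses (tag => array(attribute => array())).
function kaplak_allow_embed_tags(array $allowed) {
    $extra = array(
        'object' => array('data' => array(), 'type' => array(), 'width' => array(), 'height' => array()),
        'param'  => array('name' => array(), 'value' => array()),
        'embed'  => array('src' => array(), 'type' => array(), 'width' => array(),
                          'height' => array(), 'allowfullscreen' => array()),
    );
    foreach ($extra as $tag => $attributes) {
        // Keep any attributes already allowed for this tag and add the new ones.
        $existing = isset($allowed[$tag]) ? $allowed[$tag] : array();
        $allowed[$tag] = array_merge($existing, $attributes);
    }
    return $allowed;
}

// Assumption: inside WordPress this would be applied from a plugin to the
// $allowedposttags global that kses.php defines, along the lines of:
//   global $allowedposttags;
//   $allowedposttags = kaplak_allow_embed_tags($allowedposttags);
```

I haven’t verified that changing the global from a plugin takes effect for aggregated posts on MU 2.6.1, so treat the commented-out lines as a starting point rather than a tested fix.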