Archive for the ‘Programming’ Category

I’ve never had to scale

Wednesday, September 12th, 2007

No, I’m not talking about my sex life, or anything like that. :) It’s just this: I’ve never had a site that had too much success for its own good and, therefore, scalability problems. Each one of my sites has either used some popular, usually well-optimized software (say, WordPress or MyBB), or was mostly a bunch of static HTML pages. Neither of those, I believe, has “scalability” problems; the Internet connection or the web server itself (due to the number of simultaneous requests, not really related to what the app does) will complain long before “scalability” enters into it. And, sorry to say, except for a few occasional Digg / Shoutwire / Stumbleupon / Reddit effects, none of my sites was ever truly “stressed”.

That, I hope, is about to change.

My next project (which is about 75% complete) is a site that may well have scalability problems. Which is good, because I’ll learn about them, and how to cope with them.

What’s that project about? It’s a surprise. :) Suffice to say that, as far as I know, there’s only one other like it out there, and, weeks ago, it had to shut down its “free” version because it couldn’t deal with its success. In its first days it was quick, then it soon changed into “don’t wait; we’ll email you when the report is ready” mode, and finally the free version went under.

Now, I’m not a full-blown company, I’m “just zis guy, you know”, and I don’t believe I’ll have as much success as that one has / had. But… there is a demand; what happened to it is proof of that. And it’s quite possible that my code won’t scale.

I kind of hope it doesn’t. :)

More additions to the PA top Technorati ranks table

Friday, May 25th, 2007

The top Technorati ranks table for Planet Atheism members has been improved again. :) In addition to showing the Technorati rank, number of incoming links (from Technorati as well), and Google PageRank, the table now shows Alexa ranks as well.

You can now also click on any of the above column titles to sort the table by that particular value/rank. Incoming links and PageRank are “the more, the merrier”, while Technorati rank and Alexa rank are “the lower, the better”, so sorting takes that into account: the first two sort in descending order, the last two in ascending order.
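
As a rough illustration of that direction-aware sorting, here is a minimal Python sketch. It is not the code behind the table; the column names and the sample rows are made up for the example.

```python
# Hypothetical column names. True means "the more, the merrier",
# False means "the lower, the better".
HIGHER_IS_BETTER = {
    "links": True,
    "pagerank": True,
    "technorati_rank": False,
    "alexa_rank": False,
}


def sort_blogs(rows, column):
    """Return rows ordered best-first for the given column."""
    return sorted(
        rows,
        key=lambda row: row[column],
        reverse=HIGHER_IS_BETTER[column],
    )


# Made-up sample data, only to show the two sort directions.
blogs = [
    {"name": "Blog A", "technorati_rank": 5000, "links": 120,
     "pagerank": 4, "alexa_rank": 200000},
    {"name": "Blog B", "technorati_rank": 1200, "links": 450,
     "pagerank": 5, "alexa_rank": 300000},
    {"name": "Blog C", "technorati_rank": 30000, "links": 15,
     "pagerank": 3, "alexa_rank": 2500000},
]

by_links = [b["name"] for b in sort_blogs(blogs, "links")]
by_alexa = [b["name"] for b in sort_blogs(blogs, "alexa_rank")]
```

Keeping the direction in one lookup table means a new column only needs one extra entry, rather than another special case in the sorting code.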

A note of warning: I’ve mentioned before that you shouldn’t really take any of these ranks too seriously, and this is especially true for the Alexa ranks. Alexa is a nice idea (it’s the only one that measures traffic instead of incoming links), but it has the following problems:

  • it only counts hits if the user has installed either the Alexa toolbar (for Internet Explorer) or the SearchStatus Firefox extension (I recommend the latter, since, as everyone knows, MSIE sucks), and
  • it often lumps all subdomains for a particular domain together (i.e. doesn’t distinguish between aaa.domain.com and bbb.domain.com, even though they may be totally unrelated). It apparently has some hard coded exceptions for some (not all) blogspot.com blogs… but the values aren’t really reliable. Still, you can use it to measure the changes in traffic for one site.

Incidentally, the application I’ve coded (and have been improving) to generate this table from a list of blogs is almost ready for public release. :)

Adventures with my Technorati ranks "toy"

Wednesday, May 23rd, 2007

As I mentioned here before, a couple of days ago I coded a program to take an OPML file and generate a table in which the sites listed in that file appear ordered by Technorati rank. It also shows the number of incoming links (again, from Technorati), and each site’s PageRank.

(By the way: no, this is not ready for release yet. But it will be. Soon.)

Initially, the data collecting part of my program started by clearing a table in a MySQL database, which would then be filled with the values it would get from Technorati and Google. However, this had two problems:

  1. Technorati allows only a limited number of accesses per day. I discovered this while running several tests: after about half a dozen or so, it stopped giving me data. The problem, then, was that the script had already cleared the table… so I ended up with an empty one.
  2. From time to time, Technorati gives me “wrong” ranks / links for a blog: values much lower (but not absurd / “bogus”, just wrong) than what they should be. It’s weird, and not reproducible, and usually, when I ask Technorati again, the correct value is returned.

To solve the first problem, obviously, some form of keeping the data from the previous run while getting the new values was in order, so that, if Technorati told me to get stuffed, I would still have the data from the day before.

The second problem was a little more complicated, though, in a way, the solution to the first helped me crack it.

My method was this: when running the script, start by copying the original table to another (let’s call it temp1) and clearing the original table. Then load the new data into yet another table (temp2). Afterwards, regenerate the original table with data from temp1 and temp2, in the following way:

  • if an entry (identified by the site’s URL) exists in only one of the tables, use it.
  • if an entry exists in both, use the common values (URL, site’s name), and for the 3 numeric values, choose the best value (from the two tables) for each. “Best” means the highest # of incoming links, the highest PageRank, and the lowest Technorati rank.

This way, if once in a while Technorati gives it a much worse value than it should (I’ve never seen it rate a blog better than the reality), it still has a more correct value to use instead.
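
The merge rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the real tool keeps its data in MySQL tables, while here temp1 and temp2 are plain dictionaries keyed by URL, and the field names are hypothetical.

```python
def merge_tables(previous, fresh):
    """Combine the previous run (temp1) with the new data (temp2).

    Both arguments map a site's URL to a dict with the hypothetical
    fields "name", "links", "pagerank" and "technorati_rank".
    """
    merged = {}
    for url in previous.keys() | fresh.keys():
        old = previous.get(url)
        new = fresh.get(url)
        if old is None or new is None:
            # The entry exists in only one of the tables: use it as it is.
            merged[url] = dict(new if old is None else old)
            continue
        merged[url] = {
            "name": new["name"],
            # "Best" means most links, highest PageRank, lowest rank.
            "links": max(old["links"], new["links"]),
            "pagerank": max(old["pagerank"], new["pagerank"]),
            "technorati_rank": min(old["technorati_rank"],
                                   new["technorati_rank"]),
        }
    return merged


# Made-up values: the fresh run returns a "wrong" (much worse) result for A.
temp1 = {
    "http://a.example/": {"name": "A", "links": 100, "pagerank": 4,
                          "technorati_rank": 5000},
    "http://b.example/": {"name": "B", "links": 40, "pagerank": 3,
                          "technorati_rank": 20000},
}
temp2 = {
    "http://a.example/": {"name": "A", "links": 20, "pagerank": 4,
                          "technorati_rank": 90000},
    "http://c.example/": {"name": "C", "links": 7, "pagerank": 2,
                          "technorati_rank": 150000},
}
result = merge_tables(temp1, temp2)
```

In this example, site A keeps its 100 links and its rank of 5000 from the previous run, while B and C are carried over unchanged because each appears in only one table.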

Sounds fine, doesn’t it? But there’s a problem with this method… which I solved later, but which I’ll discuss in the next post. Until then… any guesses as to what it was? :)

My Technorati ranks "toy"

Monday, May 21st, 2007

Inspired by Carlos Andrade’s own tool, I’ve just coded a couple of scripts to take an OPML file and show an ordered table of Technorati ranks. Naturally, I used it for my own Planet site, Planet Atheism.

Here it is: Technorati Ranks for Planet Atheism members

The implementation was ridiculously simple (and there’s a lot of room for improvement), but, other than Carlos’ tool, I didn’t find any scripts or utilities to do this. And, yes, I searched. Therefore I may release the code soon, as the 2nd project on software.dehumanizer.com, since this can be a fun “toy”. :)

[EDIT: added each blog's Google PageRank to the table. Why not? :) ]

Announcing DailyTasks 0.1

Friday, February 9th, 2007

A few minutes ago, I submitted my first piece of software to Freshmeat (it hasn’t been approved yet; it will probably take a few hours): DailyTasks. It’s a small utility, written in PHP, with both a command line mode and a web interface, which, surprisingly enough, reminds you of daily tasks. :)

The web page linked above tells the “story” in more detail. Basically, I’m much too chaotic to use traditional task management programs (every time I tried, I seemed to spend more time updating tasks than actually doing them), but I wanted something to remind me, every day, to do something, from “clean up GMail’s spam folder” through “update a blog” to “do the laundry, if necessary”. :) There was already a similar program (frequent-task-reminder), but it lacked some features that I wanted (such as non-accumulating tasks), and so I wrote my own.

It’s really basic stuff, with no bells and whistles, and the PHP code would probably scare you, so impressionable young people should avoid looking at it. :) But maybe — just maybe — you’ll find it useful.

Adventures with moonmoon and tidy

Tuesday, February 6th, 2007

As I’ve mentioned here before, for Planet Atheism I’m using moonmoon, mostly because 1) everyone else uses planetplanet, and 2) it’s in PHP instead of Python, and I know a little PHP. :)

moonmoon is still on version 0.2, however, and, while it removes “dangerous” tags from feeds automatically, it doesn’t (yet?) deal with unclosed tags. As most of PA’s members are as far from being geeks as possible, they tend to use WYSIWYG editors, and aren’t really worried about “validating HTML”. So, from time to time, a post would make every other post after it show in bold or italic. Annoying, to say the least.

Yesterday, it was even worse: some posts “spilled over” to the sidebar. And it wasn’t just one post causing it, but two, from different blogs, at the same time!

Well, enough was enough.

“Fixing” moonmoon (or, more precisely, SimplePie) was out of the question; I simply don’t know enough about PHP / XML parsing to do it. But I tried something else: I saved the generated HTML to a file in the site’s directory, and ran tidy on it. Surprise: this new version was perfect! So, I got the idea of running tidy on the generated HTML every time.

Now, PHP has a tidy module, but to get it in PHP5 I would have to compile PHP by hand. Ubuntu doesn’t have a package for that module, unfortunately, and I really didn’t want to make an exception to using only apt packages on that server. So, I had to find another way.

My solution was to dump the whole page into a buffer (using the ob_ functions in PHP), save it to a temporary file, use the system command to run tidy on it, load the altered file, and send it to the browser. It’s probably not very efficient, but it works… better than I expected, too. It may be a crude solution, but I’m proud of it anyway. ;)
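
For the curious, here is roughly what that pipeline looks like, sketched in Python rather than in the PHP actually used on the site. It assumes the page has already been captured as a string (the job of the ob_ functions) and that the tidy command line tool is installed; the function name and the stand-in command in the example are made up.

```python
import os
import subprocess
import sys
import tempfile


def clean_html(html, command=("tidy", "-quiet", "-modify")):
    """Save html to a temporary file, run command on it, return the result.

    The command receives the file's path as its last argument and is
    expected to rewrite the file in place, as "tidy -modify" does.
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        # tidy exits with 1 on warnings and 2 on errors, so a non-zero
        # exit status is not treated as a failure here.
        subprocess.run([*command, path], check=False)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


# Stand-in for tidy, so the example runs even where tidy is not installed:
# a tiny Python one-liner that appends the missing closing tag.
fake_tidy = (
    sys.executable, "-c",
    "import sys; p = sys.argv[1]; s = open(p).read(); "
    "open(p, 'w').write(s + '</b>')",
)
fixed = clean_html("<b>unclosed bold", fake_tidy)
```

With the default command, a call such as clean_html(page) hands the file to the real tidy. If tidy reports errors and leaves the file alone, the original markup comes back unchanged, which is at least no worse than not running it.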


This work by Pedro Timóteo is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 Portugal.