Hard drive cleanup

I’ve been doing some hard drive cleanup at home. Yeah, I know, what better way to spend a Saturday afternoon! Unfortunately, if I don’t clean things up from time to time, I end up with a mess. The worst of it is a desktop with millions of files on it, plus one folder called “Download” and another called “Backup” that together take up half my hard drive.

I’ve been using two programs for some time, and I thought I’d share. In fact, it’s really just one program with two flavors. WinDirStat and KDirStat offer a great way to find files and folders that are taking up the most space on my hard drive.

It lists the worst offenders first, and offers an option to delete without sending to the Recycle Bin. However, you have to be *very* careful when deleting files this way, especially in Windows. A good rule of thumb is to not delete anything inside the C:\Windows folder, at all. Sure, there are probably things in there you can delete safely, but it’s a risk I’m generally not willing to take.

If you are brave and want to jump into the folder, please watch out for two things. The winsxs folder is often a tempting target. It can be huge. Leave it alone. There are ways to reduce its size, but deleting it outright is asking for trouble.

The other is the Installer Cache folder. Microsoft used to provide a utility for cleaning up these files, but it caused issues, and they very quietly stopped supporting it. So, even Microsoft themselves can get it wrong.
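For the curious, the core idea behind a "worst offenders first" listing is simple enough to sketch. This is only an illustration in Ruby, not how WinDirStat or KDirStat are implemented; the helper name `largest_files` and its `limit` parameter are made up for this example.

```ruby
require 'find'

# Hypothetical helper: returns the `limit` largest regular files under `root`
# as [path, size_in_bytes] pairs, biggest first.
def largest_files(root, limit = 10)
  sizes = []
  Find.find(root) do |path|
    begin
      stat = File.lstat(path)
      # Only count regular files; directories and symlinks are skipped.
      sizes << [path, stat.size] if stat.file?
    rescue SystemCallError
      # Ignore anything we cannot stat (permissions, files that vanish mid-scan).
    end
  end
  sizes.max_by(limit) { |_path, size| size }
end

# Example usage (read-only, it never deletes anything):
#   largest_files(Dir.home, 20).each { |path, size| puts "#{size}\t#{path}" }
```

Note that the sketch only reports sizes. Deciding what is safe to delete is still up to you.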

Hard disk filling up? Check the MySQL binlog.

Last night our monitoring system started throwing errors about a hard drive filling up on one of our servers. Nothing out of the ordinary was going on, but I was doing some maintenance on one of the sites and removing some old spam comments from the system. We use WordPress, and it provides a handy empty spam option, so I figured I’d use that rather than good old, reliable SQL. That was my first mistake.

When emptying the spam comments, WordPress issues one DELETE statement for each comment. So, not only does it take forever, it also has a side effect I was not expecting: it quickly increases the size of the binary logs.

You see, MySQL has a feature called binary logging. It’s an essential feature that serves at least two purposes. One is data recovery. If you have a binary log, you can restore a database from a backup and then tell MySQL to run through the log and bring your db back up to date. The other is replication. The binlogs are used to create a master/slave relationship between two databases. The slave db pulls the binary logs from the master and replays them, so it can act as an exact copy to provide performance benefits and high availability.

Of course, when the alarms started going off, I didn’t suspect that they were in any way related to any of this. Deleting things is supposed to make a database smaller, right? What I eventually discovered was that the binlogs on the server had grown to 46GB. Considering we only had about 2 gigs worth of real data in the system, I had a configuration issue to track down.

It turned out to be simple. Along the way, someone had commented out a line in the my.cnf file.

#expire_logs_days = 14

So, our binlogs had been slowly piling up since that line was commented out… sometime in early 2010, according to the date on the oldest log. Ouch. Uncommenting it and restarting MySQL brought the size down considerably. Now we have room to spare again.
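If you want to check for the same problem before the alarms go off, here is a rough sketch that adds up the binlog files in a directory and finds the oldest one. It assumes the logs sit in a single directory and are named like "mysql-bin.000001"; the real base name and location come from the log_bin setting in your my.cnf, so treat both arguments as placeholders. The helper name `binlog_usage` is made up for this example.

```ruby
# Hypothetical helper: summarizes the binary log files found in `dir`.
# Assumption: files are named "<basename>.<digits>", e.g. mysql-bin.000001.
def binlog_usage(dir, basename = 'mysql-bin')
  # The [0-9] in the pattern skips the "<basename>.index" bookkeeping file.
  files = Dir.glob(File.join(dir, "#{basename}.[0-9]*"))
  {
    count: files.size,
    bytes: files.sum { |f| File.size(f) },
    oldest: files.min_by { |f| File.mtime(f) } # nil when there are no logs
  }
end

# Example usage (the path is an assumption, adjust to your data directory):
#   usage = binlog_usage('/var/lib/mysql')
#   puts "#{usage[:count]} binlogs, #{usage[:bytes] / 1024**3} GB, oldest: #{usage[:oldest]}"
```

A total far larger than the real data, or an oldest file far older than the expiry window you expect, is a hint that expiry is not happening.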

Another mystery solved. Now to seek out the next!

WordPress admin hang after upgrade

It’s 8am on a Tuesday morning, and one of the web sites we maintain has just stopped responding.  It’s not completely unexpected. We just upgraded to WordPress 3.1 last night.

The upgrade went smoothly in our development environment, and after patching a few plugins, we were off and running. The upgrade last night on our production system went off without a hitch as well. My main concern was to make sure the frontend pages still worked, and all the sites in our network looked good. At least they did on the public pages.

But now, in the light of day, something isn’t right. As it turns out, the frontend pages *are* all working. But on one of the sites, the admin interface is acting weird. The editor who is reporting the issue says the admin is locked up, and sure enough, I try to log in and…nothing. The page just loads and loads. I try again with the same result. And then again, but instead of giving up after 30 seconds, I resist the urge to hit the “reload” button over and over and just let it go. Strangely enough, after a few minutes, the admin interface loads as normal. Two minutes to be exact. I navigate to another page in the admin, wait two minutes, and it loads again. And another, and another, and all of them give the exact same response. The site is not slow, it’s just hanging for two minutes exactly, and then loading the page I requested without issue.

Ok, so my first thought is that it’s a plugin issue. I even find some evidence here and there, to back up that claim. I disable all the plugins on the site. Two minutes later, the plugins are disabled, and the admin still behaves in the same manner. Click, two minutes… click, two minutes.

So next, I call in some help from another developer on the team. We try turning on PHP debugging, but that doesn’t help, because, first, it takes two minutes between page loads to get anything useful out of the admin, and second, there *is no error.* The pages are not broken, just delayed. Next, he puts a dead simple block of debug code into the admin files of WordPress itself.

<?php die(); ?>

When placed at the top of the file, the admin stops right away. At the bottom or in the middle, it takes two minutes, the page renders (mostly), then dies. After a few minutes of moving the line from one place in the file to the next, and following down a few includes, he’s found the culprit. These simple, innocent-looking lines from wp-admin/admin.php in the core WordPress code.

/**
 * On really small MU installs run the upgrader every time,
 * else run it less often to reduce load.
 *
 * @since 2.8.4b
*/
$c = get_blog_count();
if ( $c <= 50 || ( $c > 50 && mt_rand( 0, (int)( $c / 50 ) ) == 1 ) ) {
  require_once( ABSPATH . WPINC . '/http.php' );
  $response = wp_remote_get( admin_url( 'upgrade.php?step=1' ),
    array( 'timeout' => 120, 'httpversion' => '1.1' ) );
  do_action( 'after_mu_upgrade', $response );
  unset($response);
}

The site having problems is brand new and has fewer than 50 posts. And for some reason, the upgrade db script failed to run last night when the rest of the sites were upgraded. That means, each and every time an admin page loads, it tries to run the database upgrade. And it runs the upgrade in a slightly odd way. Instead of calling a function to perform the upgrade, it does a wp_remote_get, which uses HTTP to call another page on the server. We have a quirk in our production architecture that doesn’t allow the servers sitting behind our load balancer to call themselves, so instead of just running the upgrade, it hangs. And it hangs for exactly 120 seconds before giving up. The other sites are not really affected: even if they do hang, the code above only runs at random for them, and less often the more posts they have. Luckily for us, they have a *ton* of posts.
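The random part of that condition is easy to misread, so here is a minimal sketch of the odds it implies. It is a Ruby translation for illustration only (the original is PHP); `upgrade_check_probability` is a made-up name, and `count` stands for whatever `get_blog_count()` returns in the snippet above.

```ruby
# Mirrors: $c <= 50 || ( $c > 50 && mt_rand( 0, (int)( $c / 50 ) ) == 1 )
# Returns the chance that a single admin page load triggers the upgrade request.
def upgrade_check_probability(count)
  # At 50 or below, the check runs on every page load.
  return 1.0 if count <= 50

  # mt_rand(0, n) picks uniformly from n + 1 integers (0..n),
  # and only the value 1 triggers the request.
  n = count / 50 # integer division, like the (int) cast in PHP
  1.0 / (n + 1)
end

# Example usage:
#   [10, 51, 100, 5000].each do |c|
#     puts "#{c}: #{(upgrade_check_probability(c) * 100).round(1)}%"
#   end
```

So a count at or under 50 pays the 120-second timeout on every single admin page, while a count in the thousands only pays it on roughly one page load in a hundred.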

So, is there a lesson here? Not really. It just happens to be a good story. I can try to spin this as a lesson about having your development environments match your production environments as much as possible, but you probably know that already. It’s a luxury that is seldom affordable to have an exact mirror between production and development. But when the bug you run into is *caused* by your production server architecture, how can you avoid it if you have those variations? No really, I’d love to know how. I’ll take any suggestions I can get.

I want… iOS vs. Android

UPDATE – This post is over a year old now. An updated version of this post can be found here.

I’ve been meaning to do this for a while now.  I own both an iPod Touch and a Droid 2.  I primarily use the Droid 2, but I still occasionally fall back to the Touch when I want to do specific things.  This is my short list of “I wants” for both platforms.

iOS

  • Swype – This is the main thing I miss when using iOS. Tap typing feels slow and clunky compared to dragging my fingers over the letters.
  • Adobe Flash – Yeah, yeah, I know… Flash can be annoying because so many ads use it, but when you are browsing news and there’s a video, or a nifty interactive graphic you just want to see it. You don’t care what technology it uses.
  • Google Reader App – Sure you can use the mobile web version of Reader, but the native version for Android is quick to load and quick to respond.
  • Sync over WiFi – I want to subscribe to a podcast and have it update over WiFi.

Android

  • Netflix – I love watching TV shows or movies on the iPod Touch. It’s supposedly not available on Android yet because of DRM issues.  Boo.
  • Experience – I sometimes experience lock ups or jerky scrolling, or just plain weirdness.  I’ve had icons go missing, and applications disappear until I went back to the market and reinstalled them.  With iOS, I haven’t had any of those issues.

Invalid Authenticity Token, IE and Underscores

I spent a couple of hours, which I will never get back, trying to figure this one out. My Rails application worked perfectly fine in Firefox, but IE was throwing an error. That’s not terribly unusual with IE, but this time something was fishy. The error is all over Google. There are several suggested fixes, none of which worked for me. The main one was adding a P3P header to the response, which is supposed to convince IE to allow cookies to be set for your domain. I finally stumbled upon this page that has a subtle reference to not using an underscore in your domain name. Of course, right there in the middle of the subdomain name I had chosen was an underscore. As it turns out, underscores are not allowed in hostnames. Some browsers are just more forgiving than others.
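A quick check like the one below would have saved me those hours. It is a minimal sketch of the usual hostname rule (each dot-separated label is letters, digits and hyphens only, at most 63 characters, with no hyphen at either end), not a full validator; `valid_hostname?` and `HOSTNAME_LABEL` are names I made up for this example, and the sample hostnames are placeholders.

```ruby
# One hostname label: letters, digits and hyphens only, 1 to 63 characters,
# and it may not start or end with a hyphen. Underscores never match.
HOSTNAME_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# Hypothetical helper: true when every dot-separated label looks legal.
def valid_hostname?(name)
  return false if name.empty? || name.length > 253

  # The -1 keeps empty trailing labels, so "example.com." and "a..b" are rejected.
  name.split('.', -1).all? { |label| HOSTNAME_LABEL.match?(label) }
end

# Example usage:
#   valid_hostname?('my-app.example.com')  # hyphen is fine
#   valid_hostname?('my_app.example.com')  # underscore is not
```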

One line of code

You’re always there. You’re that line of code. You are the one that isn’t doing what you’re supposed to. You sneak up and bite me when I least expect it. You make me spend hours and hours tracking you down. Sometimes you’re nested so deep within the system that you can only be triggered when 100 things fall into place. But when the stars align… you are the most important thing in the whole system. Sometimes, you are commented out, or missing altogether. Somebody forgot to add you, or forgot to check for side effects when removing you. But even though you are stupid, insignificant, barely-worth-mentioning, only-one-tiny-line-of-code, you are the most critical part of what I am working on at this very moment.

I have only one thing to say to you… You cannot run, you cannot hide. You will be found!

Why is udev renaming my network interfaces?

It was driving me nuts. I was setting up a few virtual servers and the network interfaces were not showing up. I looked in dmesg and there were some odd lines that looked like this.

udev: renamed network interface eth1 to eth2
udev: renamed network interface eth0_rename to eth1

It turns out that udev takes care of matching up the network interfaces to the physical hardware, so the same hardware will always get the same device names. To save myself time, instead of installing Ubuntu again on each VM, I had just copied the HD from one machine to the next. The copied disk still had the old VM’s MAC addresses saved in the udev config, so the new interfaces were treated as additional hardware and renamed. With the help of this post, I deleted /etc/udev/rules.d/70-persistent-net.rules and it stopped doing the rename.
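Before deleting the file, it can help to see which MAC address each name is pinned to. Here is a small sketch that pulls those pairs out of the rules text. It assumes the lines look like the ones Ubuntu generates (an `ATTR{address}=="..."` match and a `NAME="..."` assignment on the same line); `persistent_net_entries` is a made-up name.

```ruby
# Hypothetical helper: returns [mac_address, interface_name] pairs found in
# the text of a persistent-net rules file.
# Assumption: each rule sits on one line with ATTR{address}=="..." and NAME="...".
def persistent_net_entries(rules_text)
  entries = rules_text.each_line.map do |line|
    next if line.strip.start_with?('#') # skip comment lines

    mac  = line[/ATTR\{address\}=="([^"]+)"/, 1]
    name = line[/NAME="([^"]+)"/, 1]
    [mac, name] if mac && name
  end
  entries.compact
end

# Example usage:
#   rules = File.read('/etc/udev/rules.d/70-persistent-net.rules')
#   persistent_net_entries(rules).each { |mac, name| puts "#{name} -> #{mac}" }
```

If the MAC addresses listed there do not match the ones on the cloned VM, that is the rename I was seeing.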

Collection Select

I always fail to remember how to create a simple drop down for my models with a has_many / belongs_to relationship. It’s one of the most basic things you could possibly do with a form, and yet each time I want to hook one up I find myself opening up a tab and googling for it. Well, no more. From now on I’ll say to myself, “Self, you put that on your blog.” So without further ado… collection_select to the rescue!

Let’s say I have a model Nerd that has_many Computers. When I create or edit a computer, I want to select the nerd it belongs_to.


class Nerd < ActiveRecord::Base
  has_many :computers
end

class Computer < ActiveRecord::Base
  belongs_to :nerd
end

Then in my view for the computer I’d have this:

<% form_for :computer do |f| %>
  <p>
    <%= f.label :nerd %>
    <%= collection_select(:computer, :nerd_id, Nerd.all, :id, :name) %>
  </p>
<% end %>
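Since the argument order is the part I always forget, here is a plain-Ruby sketch of roughly what the helper does with those arguments. It is not the Rails implementation and the real markup may differ in details; `sketch_collection_select` and the `Nerd` struct are stand-ins invented for this example.

```ruby
require 'erb'

# Hypothetical stand-in for the ActiveRecord model, just for this sketch.
Nerd = Struct.new(:id, :name)

# Rough idea of collection_select(object, method, collection, value_method, text_method):
# call value_method and text_method on every item and emit one <option> each.
def sketch_collection_select(object, method, collection, value_method, text_method)
  options = collection.map do |item|
    value = ERB::Util.html_escape(item.public_send(value_method))
    text  = ERB::Util.html_escape(item.public_send(text_method))
    %(<option value="#{value}">#{text}</option>)
  end
  # The field name is what ends up in params, e.g. params[:computer][:nerd_id].
  %(<select name="#{object}[#{method}]" id="#{object}_#{method}">#{options.join}</select>)
end

# Example usage:
#   nerds = [Nerd.new(1, 'Ada'), Nerd.new(2, 'Linus')]
#   puts sketch_collection_select(:computer, :nerd_id, nerds, :id, :name)
```

In other words, the first two arguments name the form field, the third is the list to choose from, and the last two say which methods supply the option value and the visible text.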

Nokogiri and CSS Selectors with Namespaces

I’ve been using the css function from Nokogiri to navigate XML and grab elements, instead of using XPath. I love the fact that it’s an option, since CSS feels so much more natural. But today I ran into something I haven’t had to deal with before. I was parsing some XML from a service when I noticed it had some namespaced items, like so.

<xml>
<blah:item1>some text</blah:item1>
<blah:item2>more stuff</blah:item2>
</xml>

So, I tried this, thinking I could get the value back.

node.css("blah:item1")

But, no luck. However, as it turns out, there is a special selector made just for this. Here is the reference on w3.org. The correct code is below:

node.css("blah|item1")

Bloxy-two theme comment fix

People were getting 404 errors on charts for charity when trying to comment.  The comments were coming through just fine, but they would be redirected to the 404 page after the comment went through.  Turns out, there is a bug in the bloxy-two theme causing it.  First, I upgraded to WordPress 3.0 thinking that might fix it, but when that failed, a little googling solved the problem for me.  I should have known that with the active community at wordpress.org someone would have run into it before me.  If you stumble upon the same and end up here, continue on to the source of the fix.