Comment on “There’s a hungry digital tiger waiting for us all . . .”

Aside

http://rebeccasnotepad.wordpress.com/2012/05/01/theres-a-hungry-digital-tiger-waiting-for-us-all/

Interesting read. Being in technology, I see this problem too. I call it “code rot”. If a program we’ve written sits on the shelf unused, it slowly becomes incompatible with the inevitable upgrades to our systems.

Open source software seems to be an exception to this. As Linus Torvalds, the creator of Linux, once said, “Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)”

Unfortunately, solutions to the problem are hard to find. Popular things get saved because they are useful, entertaining or valuable to people; the rest tends to disappear. Copyright and DRM (digital rights management) also get in the way of archiving. Even when someone is willing to save something digitally, or convert it into another format, it is often illegal for them to do so.

Compile this into that

UPDATE: I made a mistake in thinking that RubyMotion compiled Ruby into Objective-C. That was incorrect and is corrected in the comments below.

There’s a new trend I find very interesting. It goes like this: take some code in one language, and compile it into another.

Sometimes it’s a completely new language, like CoffeeScript or Clojure. Other times it’s an existing language, like changing Ruby into Objective-C with RubyMotion, or Emscripten, which changes C++ into JavaScript.

My initial reaction was that programmers are inherently lazy. We want our syntactic sugar. Those are the things that make languages easy to work with. There is nothing bad about this. Making things easy on ourselves is a good thing. We get more done, and the world is better for it.

Also, I lied before when I said this was a new trend. It’s actually been around a long time. Hardly anyone writes assembly code anymore. C was a better way of expressing code, so we moved to that and let the compiler do the work of turning it into machine code. Compile-this-into-that; C into machine code.

But machine code is inflexible: it only runs on the kind of machine it was compiled for. Compile once, run in one place. So Java comes along and promises to fix all that with bytecode and a virtual machine. Compile-this-into-that; Java into bytecode. Interpret with a JVM.

There are also PHP and Ruby. They’re not strictly the same, but the concept is similar. There’s no separate compile step: you simply write the code and run it through an interpreter, much as the JVM runs Java bytecode. Those interpreters are written in C. So it’s not compile-this-into-that, but rather interpret-this-with-that.

So, if this has been going on for a long time, why am I even bothering to write about it? Well, the new trend keeps adding more steps, and I find that slightly disturbing: compile-this-into-that-and-interpret-with-another. As I said before, if it makes us more productive, I’m all for it.

However, I think we may be writing compilers because it’s easier than evolving the systems we use. Examine that statement for a minute. Writing a *compiler* is easier? Why would that be?

My opinion is that it’s because standards are controlled by committee. A compiler is something one person, or a single group of people acting in concert, can build. It’s kind of obvious, but I had to write it all down to get my head around it. We’re writing compilers because we can’t get our systems to evolve fast enough to keep up with our needs, and we can’t get them to evolve fast enough because they are bogged down by bureaucracy. Consider the two lists of companies in Ecma International, the body that standardizes JavaScript: its Ordinary Members and its Associate Members. Many of those companies have competing interests, so progress is slow.

But, for now, it’s what we have. It lets us write in our favorite and most comfortable environment and get the benefit of making that code available in many places. And as long as the compilers write better code in the destination language than we do, why not? Anyone have opinions that differ? Let me know in the comments below, or on Twitter or Google Plus.


How to load test your web site or application with siege

Trebuchet

Time to storm the castle!

Your mileage may vary, but this worked well for me. I wanted to test one of our sites under load. Our sites generally get a lot of traffic, so when we move from staging to production we have to keep in mind that we’re about to go from 1 or 2 developers hitting the pages to thousands of people.

I’ve previously used Apache Benchmark (ab) for load testing, but it’s fairly limited: it can only hit one URL per run. I did a bit of digging and found siege, which was perfect for what I wanted. I wanted to subject my site to a heavy load over a long period of time and see what happened. I also wanted semi-realistic traffic, and with a few well-typed commands I was able to build a file for siege containing *exactly* the URLs real visitors were requesting from our site.

I used this to pull the last 1000 hits from our Apache logs:

tail -n 1000 /var/log/apache2/access.log | awk '{print "http://mysite.com" $8}' > /tmp/siege-urls.txt

It’s pretty simple, but I’ll break it down. The following gives me the last 1000 lines from the log. Your log might be in a different location, or named something else. If you want fewer or more lines, change the 1000.

tail -n 1000 /var/log/apache2/access.log

Then, pipe that output to awk and print the URL. In our Apache logs, the URL was the eighth field. I also prepended the site’s domain to each URL, since the Apache log does not contain that information (at least not in the format I wanted).

| awk '{print "http://mysite.com" $8}'

Lastly, save it to a file for later use.

> /tmp/siege-urls.txt
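Field positions depend on your Apache LogFormat, so `$8` may not be right for your logs. A slightly more defensive sketch, assuming the request line (`GET /path HTTP/1.1`) is the first double-quoted field, as in Apache’s common and combined formats: it splits on quotes instead of counting spaces, and keeps only GET requests, since a bare URL in a siege file is requested with GET. `extract_urls` is a hypothetical helper, and `http://mysite.com` is the same placeholder as above.

```shell
#!/bin/sh
# Hypothetical helper: print full URLs for GET requests found in an Apache log.
# Assumes the request line ("GET /path HTTP/1.1") is the first double-quoted field.
extract_urls() {
  host="$1"   # e.g. http://mysite.com (placeholder)
  awk -F'"' -v host="$host" '{
    split($2, req, " ")          # req[1]=method, req[2]=path, req[3]=protocol
    if (req[1] == "GET" && req[2] != "")
      print host req[2]
  }'
}

# Usage:
# tail -n 1000 /var/log/apache2/access.log | extract_urls "http://mysite.com" > /tmp/siege-urls.txt
```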

Next, I used this file to put the site under load. In the command below, -i turns on “internet” mode, where siege picks lines from the file at random and requests them from the server, and -c 4 limits it to just 4 concurrent simulated users. If you ramp up the concurrency, you can really put a lot of strain on the server.

siege -i -c 4 -f /tmp/siege-urls.txt

When the siege is underway, you’ll get output that looks like this:

HTTP/1.1 200   0.53 secs:    6926 bytes ==> /some/url?some=params
HTTP/1.1 200   0.54 secs:    7132 bytes ==> /some/other/url?other=params
HTTP/1.1 500   0.13 secs:     521 bytes ==> /some/url?some=params
HTTP/1.1 200   0.64 secs:    7133 bytes ==> /some/other/url?other=params
HTTP/1.1 500   0.13 secs:     521 bytes ==> /some/other/url?other=params
HTTP/1.1 404   0.09 secs:     431 bytes ==> /some/url?some=params

I paid close attention to the 404s and the 500 errors, since they meant some requests were failing. Sometimes those errors are simply bots that grab hold of old URLs and keep requesting them. Sometimes they are cause for concern.
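Rather than eyeballing the errors as they scroll past, one option is to save the per-request lines and tally them. A minimal sketch, assuming the lines look like the sample above (status code in the second field, URL in the last) and that your siege build prints them to standard output, so something like `siege -i -c 4 -f /tmp/siege-urls.txt | tee /tmp/siege-out.txt` captures them. `tally_errors` and the output path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count 4xx/5xx responses per status code and URL
# from siege's per-request lines, e.g.
#   HTTP/1.1 500   0.13 secs:     521 bytes ==> /some/url?some=params
tally_errors() {
  awk '$1 ~ /^HTTP\// && $2 >= 400 { count[$2 " " $NF]++ }
       END { for (k in count) print count[k], k }' "$@" | sort -rn
}

# Usage: tally_errors /tmp/siege-out.txt   (or pipe lines into it)
```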

Hitting Control-C ends the siege and you then get output like below.

Lifting the server siege...      done.
Transactions:		         125 hits
Availability:		       88.03 %
Elapsed time:		       29.18 secs
Data transferred:	        1.18 MB
Response time:		        0.41 secs
Transaction rate:	        4.28 trans/sec
Throughput:		        0.04 MB/sec
Concurrency:		        1.77
Successful transactions:         101
Failed transactions:	          17
Longest transaction:	        3.13
Shortest transaction:	        0.08
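Several of the headline numbers are simple ratios of the raw counts. The sketch below recomputes three of them from this run, and the results match the output above: availability works out to transactions divided by transactions plus failed transactions (125 / 142), the transaction rate to transactions per elapsed second, and throughput to megabytes per elapsed second. These formulas are inferred from the numbers and siege’s documentation rather than its source, and `siege_summary` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: recompute siege's headline ratios from raw counts.
# Args: transactions, failed transactions, elapsed seconds, MB transferred
siege_summary() {
  awk -v hits="$1" -v failed="$2" -v secs="$3" -v mb="$4" 'BEGIN {
    printf "Availability: %.2f %%\n", 100 * hits / (hits + failed)
    printf "Transaction rate: %.2f trans/sec\n", hits / secs
    printf "Throughput: %.2f MB/sec\n", mb / secs
  }'
}

# Figures from the run above; prints 88.03 %, 4.28 trans/sec, 0.04 MB/sec
siege_summary 125 17 29.18 1.18
```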

I also found it useful to watch top and a few other tools on the server while the siege was underway. In our case, I was interested in Passenger’s memory consumption, so I used passenger-memory-status and passenger-status.

Photo Credit: Martin Addison, “Demonstrating the Trebuchet”

I want… iOS vs. Android revisited

I wrote this post about my wishlist for iOS and Android over a year ago, and it’s due for a refresh.

iOS

  • Swype – It’s still missing, and the iOS keyboard is much the same. The split keyboard showed up in iOS 5, but it’s iPad-only, so there’s nothing new for my iPod touch. There is also a hidden auto-complete suggestion feature, but it requires some crazy hackery to get it working.
  • Flash – Still missing, although I’ve noticed the problem less and less. Sites seem to be more aware of iOS, and that is driving them to use less Flash. And honestly, that is a good thing.
  • Google Reader App – While there still is no native Reader app for iOS, I don’t care anymore. The Google Reader mobile web experience has greatly improved over the last year.
  • Sync over WiFi – This one was a long time coming, but iOS 5 added it and I have no complaints. It works as advertised.

Android

  • Netflix – The Netflix app finally made it to my Droid 2 late last year. Again, it works as advertised and I have no complaints. One thing it did make me notice was the difference in screen quality: my iPod Touch’s screen is brighter and shows more vivid colors.
  • Experience – I received an upgrade to Gingerbread late last year, and nearly all the experience issues have disappeared. On rare occasions the UI will freeze, but at this point my iPod does so as well, so it’s a draw.


Developing on the fly

If you see any weirdness over the next few days, please excuse it. I’m trying out MTV and doing some theme development without testing. The best way to learn something new is to use it for something, so why not make my own personal blog the guinea pig? If my experiment fails, I’ll just revert to the theme I was using before.

UPDATE

I got it working. I had to make one modification to MTV’s core, because I’m on shared hosting with an open_basedir restriction. The code writes temporary files to a folder in /tmp, and on my host PHP has no access to it. I worked around it by placing the temporary files inside the plugin directory. Not ideal, but it worked.

There were a few problems with the example themes as well. The simple theme worked OK, but the twentyeleven theme was broken. I made some modifications and submitted them back through GitHub.

Anyway, not everything works, but I’m calling it close enough for now. There’s no commenting (sorry), and search doesn’t work either. I did make the category and tag pages work. Before my changes, they were giving 404 errors.

WordPress as a framework

Found on builtwith.com

I use WordPress for this blog. I also use it at work. WordPress is a wonderful solution when you need a website or a blog and you need it *right now*. That is reflected in the fact that, as of now (Oct 22, 2011), it runs half of the top 10,000 sites on the internet. That blew me away the first time I heard it.

In the last year or so, especially with the addition of custom post types, it has become quite flexible as well. It is fairly easy to pull together a quick plugin to add a new type of content to WordPress. At work, we’ve used custom post types to create things like candidate profiles and video galleries. However, WordPress starts to falter a bit when what you are trying to add doesn’t bear at least a small resemblance to a post.

I’ve heard it said that WordPress is a CMS that is trying to be a framework, so I’m always interested when I hear of projects that attempt to make it even more framework-like. MTV is one of those projects. Some developers at the Chicago Tribune, writing on their News Apps Blog, have recently released a plugin that makes WordPress act more like a true framework.

My main question at this point is the same one they themselves ask in their documentation. “Why?” I can understand it if you are limited to PHP hosting. I can also see it as a way to introduce framework-based programming to developers who are not familiar with frameworks like Ruby on Rails and Django. MVC frameworks can be intimidating, and this could be a way to introduce those concepts in a more familiar environment.

At any rate, it’s intriguing and worth a longer look, which I will try to give it in the coming weeks.

Hard drive cleanup

I’ve been doing some hard drive cleanup at home. Yeah, I know, what better way to spend a Saturday afternoon! Unfortunately, if I don’t clean things up from time to time, I end up with a mess: a desktop with millions of files on it, plus one folder called “Download” and another called “Backup” that together take up half my hard drive.

I’ve been using two programs for some time, and I thought I’d share. In fact, it’s really just one program with two flavors. WinDirStat and KDirStat offer a great way to find files and folders that are taking up the most space on my hard drive.

It lists the worst offenders first, and offers an option to delete without sending to the Recycle Bin. However, you have to be *very* careful when deleting files in this way. Especially in Windows. A good rule of thumb is to not delete anything inside the C:\Windows folder, at all. Sure, there are probably things in there you can delete safely, but it’s a risk I’m generally not willing to take.

If you are brave and want to dig into C:\Windows anyway, please watch out for two things. The first is the winsxs folder, which is often a tempting target because it can be huge. Leave it alone. There are ways to reduce its size, but deleting it outright is asking for trouble.

The other is the Installer Cache folder. Microsoft used to provide a utility for cleaning up these files, but it caused issues. So even Microsoft itself can’t always get this right, and they very quietly stopped supporting the utility.
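WinDirStat and KDirStat are graphical tools. On Linux or macOS, a rough command-line analogue of their “biggest first” view is `du` piped through `sort`. A minimal sketch; `biggest` is a hypothetical helper, and like the GUI tools it only lists things, it never deletes anything.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest files and folders under a directory,
# biggest first. Sizes are in kilobytes (du -k) so a plain numeric sort works.
# The first line is usually the directory's own total.
biggest() {
  dir="${1:-.}"
  n="${2:-20}"
  du -ak "$dir" 2>/dev/null | sort -rn | head -n "$n"
}

# Usage: biggest ~/Downloads 10
```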

Hard disk filling up? Check the MySQL binlog.

Last night our monitoring system started throwing errors about a hard drive filling up on one of our servers. Nothing out of the ordinary was going on, but I was doing some maintenance on one of the sites, removing old spam comments. We use WordPress, and it provides a handy “Empty Spam” button, so I figured I’d use that rather than good old, reliable SQL. That was my first mistake.

When emptying spam, WordPress issues a separate DELETE statement for each comment. Not only does that take forever, it also had a side effect I was not expecting: it quickly inflates the binary logs.

You see, MySQL has a feature called binary logging, which records the changes made to your data. It’s an essential feature that serves at least two purposes. One is data recovery: if you have the binary logs, you can restore a database from a backup and then tell MySQL to replay the logs to bring your db back up to date. The other is replication. The binlogs are used to create a master/slave relationship between two databases: the slave pulls the binary logs from the master and replays them, so it stays an exact copy that can provide performance benefits and high availability.

Of course, when the alarms started going off, I didn’t suspect that they were in any way related to any of this. Deleting things is supposed to make a database smaller, right? What I eventually discovered was that the binlogs on the server had grown to 46GB. Considering we only had about 2 gigs worth of real data in the system, I had a configuration issue to track down.

It turned out to be simple. Along the way, someone had commented out a line in the my.cnf file.

#expire_logs_days = 14

So our binlogs had been slowly piling up ever since that line was commented out, sometime in early 2010 according to the date on the oldest log. Ouch. Uncommenting it and restarting MySQL, which removes expired binlogs at startup, brought the size down considerably. Now we have room to spare again.
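Before changing anything, it helps to see how much space the binlogs take and how far back they go. A minimal, read-only sketch: it assumes the logs use `mysql-bin.NNNNNN` naming and takes their directory as an argument (both depend on the `log_bin` setting in your my.cnf), and `binlog_report` is a hypothetical name. It only reports; to actually remove old logs, let MySQL do it through `expire_logs_days` or `PURGE BINARY LOGS`, since the server tracks its logs in an index file that hand deletion would leave out of sync.

```shell
#!/bin/sh
# Hypothetical read-only helper: summarize MySQL binary logs in a directory.
# Args: directory, basename prefix (default mysql-bin), age in days (default 14)
binlog_report() {
  dir="$1"
  prefix="${2:-mysql-bin}"
  days="${3:-14}"
  # Numbered binlog files only; this skips the .index file
  set -- "$dir/$prefix".[0-9]*
  [ -e "$1" ] || { echo "no binlogs matching $dir/$prefix.*"; return 1; }
  echo "files: $#"
  echo "total KB: $(du -ck "$@" | tail -n 1 | cut -f 1)"
  # Glob results are sorted and sequence numbers are zero-padded,
  # so the first match is the oldest log
  echo "oldest: $1"
  echo "older than $days days:"
  find "$dir" -type f -name "$prefix.[0-9]*" -mtime +"$days" | sort
}

# Usage (directory is an assumption; check log_bin): binlog_report /var/log/mysql
```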

Another mystery solved. Now to seek out the next!

WordPress admin hang after upgrade

It’s 8am on a Tuesday morning, and one of the web sites we maintain has just stopped responding.  It’s not completely unexpected. We just upgraded to WordPress 3.1 last night.

The upgrade went smoothly in our development environment, and after patching a few plugins, we were off and running. The upgrade last night on our production system went off without a hitch as well. My main concern was to make sure the frontend pages still worked, and all the sites in our network looked good. At least they did on the public pages.

But now, in the light of day, something isn’t right. As it turns out, the frontend pages *are* all working. But on one of the sites, the admin interface is acting weird. The editor who is reporting the issue says the admin is locked up, and sure enough, I try to log in and…nothing. The page just loads and loads. I try again with the same result. And then again, but instead of giving up after 30 seconds, I resist the urge to hit the “reload” button over and over and just let it go. Strangely enough, after a few minutes, the admin interface loads as normal. Two minutes, to be exact. I navigate to another page in the admin, wait two minutes, and it loads again. And another, and another, and all of them give the exact same response. The site is not slow; it’s just hanging for exactly two minutes and then loading the page I requested without issue.

Ok, so my first thought is that it’s a plugin issue. I even find some evidence here and there, to back up that claim. I disable all the plugins on the site. Two minutes later, the plugins are disabled, and the admin still behaves in the same manner. Click, two minutes… click, two minutes.

So next, I call in some help from another developer on the team. We try turning on PHP debugging, but that doesn’t help: first, it takes two minutes between page loads to get anything useful out of the admin, and second, there *is no error.* The pages are not broken, just delayed. Next, he puts a dead simple block of debug code into the admin files of WordPress itself.

<?php die(); ?>

When placed at the top of the file, the admin stops right away. At the bottom, or anywhere past the culprit, it takes two minutes, the page renders (mostly), and then it dies. After a few minutes of moving the line from one place in the file to the next, and following a few includes, he’s found the culprit: these simple, innocent-looking lines from wp-admin/admin.php in the core WordPress code.

/**
 * On really small MU installs run the upgrader every time,
 * else run it less often to reduce load.
 *
 * @since 2.8.4b
*/
$c = get_blog_count();
if ( $c <= 50 || ( $c > 50 && mt_rand( 0, (int)( $c / 50 ) ) == 1 ) ) {
  require_once( ABSPATH . WPINC . '/http.php' );
  $response = wp_remote_get( admin_url( 'upgrade.php?step=1' ),
    array( 'timeout' => 120, 'httpversion' => '1.1' ) );
  do_action( 'after_mu_upgrade', $response );
  unset($response);
}

The site having problems is brand new, and for some reason its database upgrade failed to run last night when the rest of the sites were upgraded. Note that get_blog_count() counts the sites in the network, not posts, and since every admin page load hung, our network must be within that 50-site threshold. That means each and every time one of its admin pages loads, WordPress tries to run the database upgrade. And it runs the upgrade in a slightly odd way. Instead of calling a function to perform the upgrade, it does a wp_remote_get, which uses HTTP to request another page on the same server. We have a quirk in our production architecture that doesn’t allow the servers sitting behind our load balancer to call themselves, so instead of just running the upgrade, the request hangs. And it hangs for exactly 120 seconds, the timeout set in the code, before giving up. The other sites are not affected, because their upgrades completed, and WordPress only reaches this block for a site whose database version is out of date.
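The `if` condition in that excerpt is easier to read with numbers. A small sketch of the odds that a single admin page load fires the upgrade request, where `c` stands for the value `get_blog_count()` returns (the number of sites in the network). PHP’s `mt_rand( 0, n )` is inclusive at both ends, so `== 1` hits one outcome in `n + 1`. `upgrade_odds` is a hypothetical name, and awk’s `int()` stands in for PHP’s `(int)` cast.

```shell
#!/bin/sh
# Hypothetical helper: probability that one admin page load runs the
# wp_remote_get() upgrade call in the excerpt above, given c sites.
#   c <= 50 : always
#   c >  50 : mt_rand(0, n) == 1 with n = (int)(c / 50), i.e. 1 in (n + 1)
upgrade_odds() {
  awk -v c="$1" 'BEGIN {
    if (c <= 50) p = 1
    else p = 1 / (int(c / 50) + 1)
    printf "%.3f\n", p
  }'
}

upgrade_odds 40    # 1.000
upgrade_odds 75    # 0.500
upgrade_odds 500   # 0.091
```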

So, is there a lesson here? Not really. It just happens to be a good story. I could try to spin this as a lesson about having your development environments match your production environments as much as possible, but you probably know that already. An exact mirror of production in development is a luxury few teams can afford. But when the bug you run into is *caused* by your production server architecture, how can you avoid it if you have those variations? No really, I’d love to know how. I’ll take any suggestions I can get.

I want… iOS vs. Android

UPDATE – This post is over a year old now. An updated version, “I want… iOS vs. Android revisited”, is also on this blog.

I’ve been meaning to do this for a while now.  I own both an iPod Touch and a Droid 2.  I primarily use the Droid 2, but I still occasionally fall back to the Touch when I want to do specific things.  This is my short list of “I wants” for both platforms.

iOS

  • Swype – This is the main thing I miss when using iOS. Tap typing feels slow and clunky compared to dragging my fingers over the letters.
  • Adobe Flash – Yeah, yeah, I know… Flash can be annoying because so many ads use it, but when you are browsing news and there’s a video, or a nifty interactive graphic you just want to see it. You don’t care what technology it uses.
  • Google Reader App – Sure you can use the mobile web version of Reader, but the native version for Android is quick to load and quick to respond.
  • Sync over WiFi – I want to subscribe to a podcast and have it update over WiFi.

Android

  • Netflix – I love watching TV shows or movies on the iPod Touch. It’s supposedly not available on Android yet because of DRM issues.  Boo.
  • Experience – I sometimes experience lock ups or jerky scrolling, or just plain weirdness.  I’ve had icons go missing, and applications disappear until I went back to the market and reinstalled them.  With iOS, I haven’t had any of those issues.