April 29, 2004

A few years ago I wrote a pretty sophisticated Makefile for Java using GNU Make. It took care of all the usual stuff you need - the CLASSPATH, incremental compilation, using a decent compiler instead of the pokey default one, etc. It was a Makefile, which some people hate because Makefile syntax requires tabs and Make gets mad if your text editor messes them up. It also had lots of dependencies on exotic external command line utilities like "cp" and "tar", which apparently makes Make-haters really nervous because Cygwin still sucks. Still, it didn't take that long to get it working, it did everything that medium-sized software teams needed (probably a half dozen teams of 5-10 people told me they were using it and liked it), and it was hella fast.

Now that I'm working with second-edge technology (the edge just behind the bleeding edge - not quite so advanced and painful, nor nearly as bad as the penultimate-edge technology I'd been working with for the past couple of years), I'm using Ant. On my eight-month-old laptop, Ant needs about five seconds to decide not to do anything. It needs about twelve seconds to rebuild the whole project. So roughly 40% of a full rebuild is just Ant overhead. I haven't done a side-by-side comparison, but I seem to recall that back in 1998 or so, a 50MHz Sun SPARCstation 20 named Brio, running a really old version of my Makefile with jikes, would rebuild a project in seven seconds instead of the 20 that javac took. So it's odd that a current project should need that much time on a machine with a clock speed 16 times higher and a much more advanced CPU (and yes, compiling Java code is a CPU-bound activity, not an I/O-bound activity, which would be less directly affected by CPU speed). And lest you think this is just a general "Java takes a long time to start" problem: Ant spits out its first line of output ("Buildfile: build.xml") in less than half a second (measured with "ant foo 2>&1 | time head -n 1"). It takes Ant 4.6 seconds to get around to saying "compile:", which means "I'm going to start using javac now" (measured with "ant compile 2>&1 | grep 'compile:' | time head -n 1"). My only conclusion is that Ant is just slow. I wish I had time to stick Ant in a proper profiler and see what the heck is taking so long.
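For reference, here are those two measurements as shell one-liners, in one place (the target names come from my build.xml; substitute your own):

    # Time until Ant produces its first line of output ("foo" is a target
    # that doesn't exist, so nothing actually gets built):
    ant foo 2>&1 | time head -n 1

    # Time until Ant reaches the "compile:" target, i.e. until javac
    # finally starts doing real work:
    ant compile 2>&1 | grep 'compile:' | time head -n 1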

I don't have a lot of HTML editing and layout work to do these days, but if I did, I'd really want to do it with HyperEdit. I used it for some simplistic editing and it simply kicks ass. You see your changes in real-time; there's no edit-save-reload cycle. Just edit.

I like this idea for how to fix Expose in Mac OS X so that it's as good as Alt-Tab in Windows.

Recently a friend directed me to the web site of a band she sort of likes (she says she only likes one song). They have audio samples in RealAudio and Windows Media formats but not MP3, because apparently they smoke crack which leads them to think this will protect their marvelous music from being downloaded by potential fans. I chose the Windows Media version, in hopes of playing it in VideoLAN Client which is much nicer than Windows Media Player. Nope. It's a .wax file, whatever the hell that is. And guess what, it won't play in Windows Media Player for Mac OS X. Neat! Wait a minute. .wax? What the hell is .wax? I've worked with Windows Media Server 2000 (what a great joy that was) and at one point I thought I kinda had a handle on how all that stuff worked. How many frickin' file formats does one product need? Apparently, at least nine. I can see needing 2 or 3, but 9? Who cares if a metafile points to audio or video or a streaming scratch-n-sniff file? In fact, who cares if a proprietary container file has audio or video in it? And I might add that I think WMP also plays .avi, .mpg, .mp3, .mp4, .mp2, .pls, .m3u, and .mpeg, so it's not like they didn't already have lots of file formats to choose from. I might be mistaken but I'm pretty sure that QuickTime uses .mov, period, for QuickTime content. That seems a lot more sensible.

I'm continuing to work with my new Linux box, trying to get LVM on top of software RAID-1 working. It works when booted from another disk (which has neither RAID nor LVM in use on it, but does have those features available in the kernel for use with other drives), but I haven't gotten it to boot from the RAID+LVM disks yet. I think this message has the key: using an initrd, so that the kernel can assemble the RAID array and activate LVM before it needs the root filesystem, instead of trying to boot straight off the disk.
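I haven't actually done this yet, but based on that message the shape of the fix is roughly the following (the kernel version, module list, and device names are my guesses for this box, not something I've verified):

    # Make sure the RAID and LVM modules are listed in /etc/mkinitrd/modules
    # (the exact module names depend on your kernel), then build an initial
    # ramdisk that carries them, so the kernel can assemble the md array and
    # activate the volume group before it has to mount the root filesystem.
    mkinitrd -o /boot/initrd.img-2.4.25 2.4.25

    # Then tell lilo about it in /etc/lilo.conf:
    #   image=/boot/vmlinuz-2.4.25
    #     initrd=/boot/initrd.img-2.4.25
    #     root=/dev/vg0/root
    # ...and re-run lilo so the boot map gets rewritten.
    lilo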

I also got some new hardware for it. Apparently the folks who put it together (back before Sapient sold it to me in a post-layoff fire sale) saw fit to skip the whole cooling fan hassle. I guess that was OK with the hardware that used to be in it, but ever since I stuck a couple of extra hard disks in it, it's been freaking out and shutting itself off due to overtemperature alarms. I found that the CPU fan was faulty (which was making the computer beep cryptically at strange times, which turned out to be the BIOS trying to tell me that the CPU was getting too hot), and replaced it for $10 ($5 of which was UPS Ground shipping, oy). I noticed at one point that the hard disks were getting super hot too, so I bought a new case fan. As it turns out, these fans are not measured from the center of the mounting screws like I expected (since that's really the dimension that matters), but rather from the outside of the fan housing. So I had to return the 92mm one that I got, in favor of an 80mm one. If you're breaking out the ruler and calculator at this point to see how I could have made that mistake, don't bother. It doesn't quite match up but I figured that maybe I measured it wrong or something, so I guessed when I was in the store. I guessed incorrectly. Anyway, I also got a hard disk cooler, which is pretty nifty: it screws onto the bottom of a hard disk drive and has two small fans that blow directly onto the bottom. Overheat now, I dare you. But the best purchase was the pair of drive trays with built in dual fans. So not only are there 2 fans per drive (plus 2 fans in the power supply, plus a CPU fan, plus a case fan in the front of the case), but 2 of the 3 big drives can be removed by turning a key and sliding them out of the front of the case. This is actually sort of useful since at some point one of them will fail, and the server will be in a rack, and this will save me the trouble of pulling the whole thing out and unplugging everything just so I can work on it. I can power down the server, pop out the tray, swap drives, put it back, and power it on again. Result: only a couple of minutes' downtime caused by a hardware failure. Cool! And no, it doesn't sound like an aircraft carrier, because most of the fans are of the temperature-controlled variable speed variety.


April 25, 2004

Once upon a time, Apple was serious about using formal usability testing to drive their UI design. Spend some time at AskTog and you'll come to respect the work that they did. The resulting little bitty details are not obvious, and in some cases are the opposite of what one would expect to work, but they are the results of actual testing, so they're pretty hard to argue with. Combine a whole bunch of these and you get an overall user experience that makes users really happy, even if they can't say things like "I really like the hierarchical menu hysteresis" specifically. Some companies tried to clone individual details of the Mac UI but missed the finer points, and the results compared unfavorably.

Apple went even further, though, and took the bold stance of saying that there was One True Way To Do It, for many values of It. So we got the one-button mouse, the floppy with no eject button, the volume that is ejected by dragging it onto the icon that means "delete this and everything in it" for everything else, the application with no open documents that leaves no evidence on-screen that it's running unless it's in the foreground, the two-fork files that have to be treated specially before they can be transferred to any other OS or sent across the Internet, and many other decisions that users with experience on other operating systems found counterintuitive and annoying about the Mac. In a strange reversal (or twist) of their user-centric design tradition for the Mac, Apple dug in, said "we know better than the users", and refused to change.

Then one day, Apple released a major new OS, with a major new UI layer called Aqua. It was pretty clear that a large number of the decisions for this OS were carried over from an OS that didn't have the same user-centered design tradition as the Mac, and that others were designed for pure eye candy value even when that detracted from usability. The stoplight buttons in the upper left-hand corner of windows, the magnification and genie effects of the Dock, the strange return to the original control panel design that limits you to a single System Preferences pane at a time (instead of standalone icons in the Control Panels folder that could be opened independently), and the bizarre "everything in one window" ultra-modal design of several apps including iTunes are examples of this. Somebody decided that optimizing for lickable buttons and drool-worthy screen shots was more important than optimizing for usability.

In many cases this has improved as Mac OS X has matured. The new Finder (one of the worst parts of OS X) has gotten better, incorporating some dearly missed features from Classic MacOS, and adding some new things that were never there before.

Expose is interesting, but not a clear improvement. In the demos, all of the windows contain pictures that look drastically different, so the thumbnailed versions are very easy to choose from. However, many Mac users work with windows full of 9- to 12-point black-on-white text, which is very difficult to distinguish from other windows full of 9- to 12-point black-on-white text. It's an innovation that works well for some people and not very well at all for others. Fortunately, you can just choose not to use it.

Unfortunately, Apple has decided that users want to be application-centric instead of document-centric, which goes against the early Mac design of using the Finder to manage all documents and only making you launch an application to edit them, in favor of the early Windows MDI model that assumes that a user is going to want to use one application at a time, that users will open documents by first launching the application that edits them and then selecting File->Open, and that having a parent window that's maximized over the whole screen (blocking all other application windows) is a good idea. Microsoft dramatically reversed this design in Windows 95, in favor of a much better document-centric UI that still has a few problems (failure to use Fitts' law, allowing users to run multiple copies of the same application just by opening two documents of the same kind in Windows Explorer unless the application specifically checks for this and works around it, etc.). Apple clearly didn't fall prey to all of these problems, but at some point they decided that users really wanted to work with all the documents of the same type at once.

Mac loyalists for some reason like to emphasize this "application-centric" assumption as though simply identifying it as a deliberate design decision proves that it is correct. I'd like to see the usability testing that led to this design, because it definitely doesn't work for me, or most of the Mac users I know, most of whom work with many windows in many applications, and who rarely want to work only in one application at a time.

The simple fact is that for certain usage patterns, the Windows Alt-Tab design has superior usability to the Mac Command-Tab design as seen in Mac OS X 10.3. Bringing all of the windows for a certain application forward is not always a good idea, because it screws up window layering for things like drag-and-drop and plain old reading of one window while working in another.

Windows' Alt-Tab (and Alt-Shift-Tab) is very efficient for someone who has 10 browser windows, 10 editor windows, 10 terminal windows, and a bunch of other windows open, and who is working within the same 4 windows (which aren't all from the same application) again and again. The current Mac Cmd-Tab scheme buries the other applications' windows under the background windows of the application that is brought to the front, obscuring the window you were just working with under windows you aren't working with. Worse, the Cmd-~ keystroke to switch among the windows of a single application goes in only one direction, so if you go from window A to window B in an application, and you want to get back to window A, you may have to Cmd-~ many times to cycle past windows C, D, E, F, G, and so on until you come back around to window A. The workaround is to minimize any nonessential windows to the Dock, and to place your windows super carefully (since you can't really control their layering order without using the mouse every time you want to switch between windows), but this undermines Expose's usability since Expose ignores any windows that are currently minimized to the Dock.

Apple also considers X11 to be an important feature of Mac OS X, but as many heavy X11 users complain, the Mac OS X 10.3 implementation of Cmd-Tab treats all X11 windows as belonging to the same application. Windows' Alt-Tab design wouldn't have that problem.

It's very disappointing to me that, even in the face of user requests for a Cmd-Tab behavior identical to Windows' Alt-Tab behavior, Proteron LLC, makers of LiteSwitch, refuses to implement it. I emailed them myself, and the response was that the Apple way was The One True Way:

From: Adam Astley
Subject: Re: FEEDBACK: Suggestion

Jamie,

Thanks for writing to us.  Is there a conspiracy?  No.  I think it's
just that mac users don't find the feature particularly useful.
Although it is our most popular feature request from windows
switchovers [my emphasis], we just haven't found a compelling
reason to do it.  I think that there are two reasons we haven't done
it.  First, Windows (incorrectly IMHO) models the "window" as the unit
of analysis when doing switching.  Macintosh models the app as the
unit of analysis.  Once you have switched to an application, you can
use Apple-Tilde (~) to switch between windows within that application.
Second, Apple has built Expose into OS X 10.3 (Panther).  That
essentially provides all of the intra-app switching functionality that
you would ask for.  By default, pressing F10 will display all of the
open windows in the app you're running.

Cheers,

Adam Astley
Proteron LLC

I find it ironic that a company that makes an alternative UI utility for users who want a certain behavior that isn't the Mac OS X default, and that complains that Apple stole its UI design and didn't even get all of it, is still willing to stick up for Apple and to say that Apple's UI design is right, and that users should adapt themselves to that design instead of making their computer adapt to them.

The Proteron employee quoted above did raise one genuine difference on the Mac: if the Windows Alt-Tab design were implemented there, there would be no way to select a running application that had no open windows. I answered that a very simple design would be to just add an application icon for those applications, and perhaps highlight (or dim?) that icon in some way to show that it wasn't a document window. He didn't seem convinced, because he never responded to my second e-mail.

Based on my continued quest for this behavior on my Mac, and the many many debates I've seen in forums like this one, I have to believe that if someone actually made such an add-on product, it would sell like hotcakes. There are quite a few would-be Switchers who only have a few gripes about the Mac that prevent them from switching, and this is a big one. In my opinion, it's fine for Apple to offer One True Way of doing something (like the single-button mouse) if that suits the majority of users, as long as I have the ability to override that decision if it doesn't happen to suit me. One size doesn't fit all. But for some reason, the "Think Different" Mac community of users and ISVs seems to be stuck on the idea that conformity and unquestioning reverence to authority on this issue is the One True Way.


April 18, 2004

I've been doing some big bad backups this weekend. I'm trying to get things arranged in my home network so that data loss is really really unlikely to happen. Fortunately, I don't work with particularly enormous files, and hard disks and optical media are getting pretty cheap, so it's possible to have some pretty lofty goals. The details aren't that interesting (at least not until it sorta works), but part of the process of planning this out involves figuring out what the media actually costs. That (to me, anyway) is pretty interesting.

Well, I guess we all kinda know that CD-Rs are cheap, what with all the music downloading and burning that people are doing. But DVD-Rs are pretty cheap too. How cheap? Well, based on my most recent purchases, CD-Rs cost about $0.31, and DVD-Rs cost $1.72. (I didn't go crazy trying to save a buck on either of them but I have a feeling that those are pretty competitive prices for decent-quality media. I'm not interested in spending hours and hours trying to find the triple rebate that makes a $15 CD-R spindle cost $8, nor am I willing to trust my backups to generic media just because it's a tad cheaper.)

What's more interesting than the unit prices is the cost per GB. The problem of course is that disk drive manufacturers overstate drive capacity on an industry-wide level, kind of like the monitor manufacturers did a few years ago. There was a lawsuit, but interestingly enough, if you search for the Reuters story entitled "Computer Makers Sued Over Hard-Drive Size Claims" you'll find tons of blogs that link to various Reuters and Yahoo sites, none of which have the story anymore. What's the deal with that? Here's a Wired story about the lawsuit. Anyway, the hard disk vendors have been sued over this, because they reinterpreted the computer industry's well-known kilo=1024 convention as kilo=1000, for marketing reasons. When you buy a USB "thumb drive" with 128MB of capacity, you can store 128 * 1024 * 1024 bytes = 134,217,728 bytes. A hard disk manufacturer would call that 134MB; to them, 128MB means 128,000,000 bytes, or 122MB by the kilo=1024 standard. So, you have to be careful of that when calculating cost per GB, because GB might mean 1,000,000,000 bytes or 1,073,741,824 bytes depending on who the figures come from. (Did I mention they got sued for this?) Everything I say from here on out will use kilo=1024 (and mega = 1024 * kilo, etc.).

An 80-minute CD-R can store 737,280,000 bytes of actual data, or 703.125MB. They say "700MB" on the package, which is actually a bit conservative by the kilo=1024 standard (they could say "703MB" and still be correct). The same goes for 74-minute CD-Rs; they can store 681,984,000 bytes of actual data, or 650.391MB. I say "actual data" because there are lots of different formats you can use when making a compact disc, from the ordinary CD-ROM data format to Red Book audio (CD-DA) and the looser Mode 2 variants, and some of them have less error checking (and thus more capacity), but you'd never actually use those for general-purpose data. That can cloud the issue, because CD nerds seem to like to bicker about true unformatted capacity vs. the capacity you can use from a CD burning program on your desktop computer for putting regular old files on it. On the other hand, a DVD-R can store 4,707,319,808 bytes, and is marked as having a 4.7GB capacity. Hmm. Looks like they're using the phony GB=1,000,000,000 byte standard. Really, a DVD-R can store 4.38GB, or 4,489.25MB.

So, with my prices and those capacities, we can normalize cost to a common capacity: $/GB. I tried $/MB but it's less than one cent, so there's no point in using that scale. CD-Rs cost about $0.45/GB, and DVD-Rs cost about $0.39/GB.
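If you want to check the arithmetic, here it is with bc, using my prices and the kilo=1024 capacities from above:

    # Dollars per GB for each media type (capacities in kilo=1024 MB)
    echo "scale=4; 0.31 / (703.125 / 1024)" | bc    # prints .4514 -> about $0.45/GB for CD-R
    echo "scale=4; 1.72 / (4489.25 / 1024)" | bc    # prints .3923 -> about $0.39/GB for DVD-R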

What about speed? It does seem to take an awfully long time to burn a DVD-R compared to a CD-R. It certainly takes longer, but what about the absolute speed when burning a fixed amount of data to each media type? Well, the drives' speeds are rated in "x", which is a relative factor. A "2x" DVD-R drive writes at twice the DVD 1x rate, but "x" isn't a unit of data, and it isn't even the same between DVD drives and CD drives. A 1x DVD-R drive is much faster than a 1x CD-R drive, because the "x" is just a factor of how much faster they are than the original consumer players were. According to Plextor (a vendor who makes really nice optical drives), 1x for a DVD drive is 1,350KB/sec, and 1x for a CD drive is 150KB/sec. So a DVD drive's 1x is equal to a CD drive's 9x in terms of write speed. My laptop's DVD-RW/DVD-R/CD-RW/CD-R drive will write CD-Rs at 16x, or DVD-Rs at 2x. Let's normalize that to MB/sec: for CD-Rs it will write (16 * 150KB/sec / 1024KB/MB) = 2.34MB/sec; for DVD-Rs it will write (2 * 1350KB/sec / 1024KB/MB) = 2.64MB/sec. So actually the write speed is comparable. Let's change the units to GB/min: CD-Rs at 0.137GB/min; DVD-Rs at 0.154GB/min. At this point it's pretty clear that DVD-Rs are comparable in price per GB and (at least on my drive) minutes per GB, and they come with a major bonus: fewer discs = less swapping and reduced need for segmentation of large files = a major convenience boost.
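The same sanity check for the speed numbers, again with bc:

    # MB/sec for my drive's rated speeds (1x CD = 150KB/sec, 1x DVD = 1350KB/sec)
    echo "scale=3; 16 * 150 / 1024" | bc              # CD-R at 16x:  2.343 MB/sec
    echo "scale=3; 2 * 1350 / 1024" | bc              # DVD-R at 2x:  2.636 MB/sec

    # GB/min for the same speeds
    echo "scale=3; 16 * 150 * 60 / 1024 / 1024" | bc  # CD-R:  .137 GB/min
    echo "scale=3; 2 * 1350 * 60 / 1024 / 1024" | bc  # DVD-R: .154 GB/min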

At some point I'll look into prices for rewritable media, since that makes a bit more sense for backups. However, writing rewritable media is slower, so that may be a problem. Also, it's nice to have archival backups in case something gets corrupted or deleted and you don't notice until much later that there's a problem; with rewritable media it's harder to manage that - you have to get fancy with media rotation schemes, and I don't think that's worth the trouble given how cheap the write-once media is relative to my backup needs.

One last thought: I have read that 56x is about the limit for CD speed, because at higher speeds the disc is spinning so fast that it's prone to shatter and fly apart inside the drive. If this is true, and is not true for current DVD drive speeds (there is probably an RPM limit but I don't know what it is, or whether it's the same for both kinds of media), then the speed comparison will only continue to tip in favor of DVD drives. So far the fastest drive I've seen is Plextor's new PX-712A drive, which writes CD-Rs at 48x (7,200KB/sec) and DVD-Rs at "6x-8x" (which they say is 8,310KB/sec-11,080KB/sec, disagreeing with their other page's figures for what 1x means!). Regardless, it seems that DVD writers are outpacing CD writers in terms of speed and cost per GB, and will continue to do so in the future.


April 17, 2004

I spent a bit more time with my newer Linux server (the Debian-based one) getting it closer to where I want it to be. Once I figured out that the kernel that I wanted - the 2.4.25 version which is new enough to handle my snazzy ATA/133 hard disk controller - was not considered 100% "stable", I told apt-get to look for packages in the "testing" tree. My impression of Debian's definition of "stable" is that it's a very conservative one, and that "testing" isn't really that risky. So, I bit the bullet and updated my package list with the testing packages included, and ran dselect again. There it was. I chose to install it, expecting that there would be a bunch of packages that needed to be updated along with it. I didn't expect that there would be over 300. I had to go through a couple of screens' worth of optional packages; apparently a package can have not only mandatory dependencies on other packages, but also optional/suggested packages, such as documentation, or configuration editors. After that, dselect ran apt-get which told me exactly which packages it was going to install, which it was going to update, and which it was going to remove. I said "OK" and found out that my /var partition wasn't nearly big enough to hold all the package files (something like 350MB of them). I fixed that and re-ran it. It mostly went OK, except for one package that for some reason just kept failing to install. Fortunately it was postgresql-doc, which is just a documentation package for the PostgreSQL database, which I don't use now and just sorta wanted to install in case I ever got around to playing with it. (Free software + 80GB disk = little reason not to install stuff I might never use.) Fortunately, that package had no dependencies, so I was able to just un-select it as something I wanted, and everything was OK. After a good while (2 or 3 hours?) of mostly ignoring it while it installed all the updates, it was done, and (having been prompted by the installer to run lilo, which I did), I rebooted.
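For the record, the change amounted to something like this (the mirror hostname is only an example; use whichever mirror you normally pull from):

    # Add a "testing" line next to the existing "stable" one in
    # /etc/apt/sources.list:
    echo "deb http://ftp.us.debian.org/debian testing main contrib non-free" \
        >> /etc/apt/sources.list

    apt-get update   # refresh the package lists, now including testing
    dselect          # the 2.4.25 kernel package now shows up for selection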

Amazingly enough, it worked. Mostly. It forgot to load the kernel module that contains the driver for my ethernet card. I had to edit /etc/modules to fix that (a trivial tweak). On the positive side, it did recognize the ATA/133 controller, so the next thing I did was to use the second 80GB hard disk as backup space while I changed my /usr partition to a ReiserFS filesystem instead of ext3. I also moved the boot disk to that controller, and of course ran lilo again to make that work. Wow. This thing is pretty darn fast.

I still have some things to do before I start migrating stuff over from my older Linux server (the one that's doing everything I need a server to do right now). I want to get RAID 1 (mirroring) set up so that I don't have to worry about hard disk failure making me do a painful restore-from-backup. I guess that would be less painful if I just had to install a known list of packages and then restore config files and home directories (which is about how it would work), but downtime on your main server is bad, so I'd like to avoid that. Hard disks are so very much cheaper than the time it would take to reinstall a whole server and restore from backup, so this one's a no-brainer. I have a second ATA/133 card that I'll put one or more backup hard disks on as well, so data loss due to drive failure will be very unlikely indeed. Naturally I will continue to do offline backups with offsite storage; right now that means making a copy of my daily backup folder on an encrypted Mac disk image (see here for more about how to do that), burning that disk image file onto a DVD-R with my laptop, and sticking that somewhere safe. The encryption means that I don't need to worry too much about someone getting their hands on the DVD-R - yes, it's crackable eventually, and yes, I'd lose my backup, but it wouldn't be easy to do. More important is the fact that the data is being backed up; the risk of data loss due to hardware failure or human error or malicious intruders has to be weighed against the risk of somebody getting hold of that DVD-R and then being able to crack it. And you can put an encrypted disk image inside an encrypted disk image for an extra layer of protection, if that seems necessary to you.
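The Mac end of that routine is roughly the following (the folder and image names are made up for illustration, and hdiutil's -encryption option is the one I'd expect on a current Mac OS X; it prompts for a passphrase):

    # Wrap the daily backup folder in an encrypted disk image.
    hdiutil create -encryption -srcfolder ~/backup/daily daily-backup.dmg

    # Burning the resulting .dmg file onto a DVD-R is then just a job for
    # Disk Utility or the Finder's Burn Disc command on the laptop.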

On to non-geeky things: "A church trying to teach about the crucifixion of Jesus performed an Easter show with actors whipping the Easter bunny and breaking eggs, upsetting several parents and young children." Well, it's a pagan holiday anyway. Or maybe not. But it's still wrong to whip the Easter Bunny in front of children. Were these people still hypnotized by the violence in The Passion of the Christ, thinking it was a really groovy idea to expose children to the violent torture of a beloved childhood idol? I'm kinda curious about what else these weirdos are teaching the kids in Sunday school.


April 11, 2004

For a while now I've been meaning to do a major upgrade to my home Linux server. It's running Red Hat Linux 7.3 with lots and lots of updates done from RPM packages, and more recently, painful manual installations from source code. You see, back in the days when Red Hat Linux 7.3 was pretty new, you could go to Red Hat's FTP server and download software updates, in RPM format. It comes as one big file that you just tell the OS to install, and voila, that software is updated. Sometimes it will demand that you update a different package first, but generally that's pretty easy. And then one day Red Hat discontinued support for Red Hat Linux 7.3, and that's when it started to get harder to keep my system up to date.

Why keep my system up to date, you ask? Well, you may have noticed that this wild and wooly internet is full of not only a horde of malicious nerds who break into computers for fun and profit, but also a slew of worms that automatically attack random systems 24/7. If you have an old OS sitting on the internet with public-facing services (and I do), you must keep it updated, or it will be hacked. Maybe you won't notice it, but it will be. On several occasions I've been asked by a friend to help them with their computer for some reason or another, and I've noticed that their computer is running unreasonably slowly. They say "oh yeah, it's old and slow, I'm going to buy a new one soon" but after a bit of investigation it turns out to be viruses. They didn't know that and just blamed it on hardware, or maybe an outdated version of Windows. They're kind of right that the old machine is slow, but not because a newer OS would be any more efficient - if they'd just reinstall the same OS from scratch it'd be faster, until it got infected with a dozen viruses again. Firewalls are great, but if you use Outlook and/or Internet Explorer, your ordinary usage puts you at risk of attack or infection.

So, Red Hat decided they didn't care about Red Hat 7.3 anymore, and that the best way for me to update would be to reinstall everything from scratch (whee!) and start applying patches for Red Hat 8.0, which everybody I talked to said was completely broken. Instead, I found RPMFind, which is basically a search engine for RPM packages. It worked for a while, but after a while I found that even third party sources (usually individuals) weren't bothering to make updated packages anymore. So I finally fell back to the least convenient method: downloading the source code, configuring and compiling it manually, installing it, and hoping that it worked. That's a bad solution because the OS's package manager (RPM) has no way of knowing that I had updated a program without telling it (since I wasn't using packages to do the updates), so its database of what was installed was actually wrong. That's a problem waiting to happen - at some point, the package manager gets so far out of sync with what's really installed that you can't even use it anymore.

I did some research on this across several operating systems. It's an interesting problem. Commercial, non-free, closed source operating systems like Windows and Mac OS X have installers and uninstallers, but no central package manager. Installers carry along with them every file that they could possibly need to install the program on any supported operating system. Maybe some DLL is present on Windows XP, but not in Windows 2000 or 98, and you want to support Win2K and 98 in addition to XP. Your installer has to include that DLL, and you have to do some sort of licensing dance and print a copyright notice with your software (since it's Microsoft's code you're including with your application). Alternatively, you may just have to write the code yourself, since you can't depend on it being there in every system that you want to support, and it may not be available from the OS vendor as an add-on to the older versions of the OS. So the installer program has to be extremely sophisticated, because it has to see what OS you're running, and what else you might have installed that added functionality that the program being installed would want, so that it can decide if it has an older version of some optional thing that you've already installed. The uninstaller has to be clever about deleting things that only could belong to the program it came with, but not things that might possibly be shared. How could it know if they were being used? Typically an uninstaller will ask the user, who is possibly the least knowledgeable entity regarding what files are on the hard disk. (Shouldn't the computer know that itself?)

Package managers are an attempt to solve this problem. They require that software be installed in a standard format that includes a name for the package being installed, a description of what it does (so that a user can decide if they want it), a bunch of files and instructions for how to install them (and possibly how to compile them, if the package includes source code), and most importantly, a set of rules for packages that this package depends on. This means that the package manager can just keep a database of packages that have been installed already, and can check to make sure that the requisite packages are installed before trying to install a new one. If not, you get an error message that says "you need package XYZ before you can install package ABC!". This is nice because it tells you what the universally accepted name for the package is, and it tells you that you don't have all the stuff you need. It doesn't necessarily tell you where to get what you need, but at least it gives you a keyword to search for. In some cases, the dependency is expressed as a requirement for a single file, instead of a package, which is more ambiguous ("uh, OK, how exactly do I know what package provides the file that you need?"), but at least you have a clue that something is missing before you try and compile or run the program and have it fail mysteriously.
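To make that concrete, both of the package managers discussed here will tell you a package's dependency list before you commit to installing it; the package name below is just an example:

    # RPM: show what an installed package declares it requires
    # (use "rpm -qp --requires foo.rpm" for a package file you downloaded)
    rpm -q --requires apache

    # APT: the same question, answered from the package database
    apt-cache depends apache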

(Solaris has a package manager, and I imagine that there are some other commercial OS's that have one too, so there's not a 100% free / commercial split with respect to whether the OS has a package manager or not.)

RPM mostly worked for me, except for two problems: First, the source of the packages for a given Linux-based OS distributor who uses RPM (including Red Hat, TurboLinux, Mandrake, and others) is probably a commercial vendor who will eventually decide to stop supporting your OS. So, like it or not, you are beholden to the whims of a company and its profit motive even though you're using free software. At some point it will make business sense for them to make you buy an upgrade, or subscribe to their service, at which point you'll be back in "fight the vendor" mode, which is probably why you got started with open-source software in the first place. Second, the different Linux-based OS vendors who use RPM don't agree on where the files should be installed on disk, nor what the packages should be called, exactly. So if Red Hat doesn't have an RPM for the software you want but Mandrake does, that won't help you, even if you have all the right versions of the right software installed. The package from Mandrake depends on package names that Red Hat may not have used for the exact same software, so the package manager will claim that you don't have the required software. Also, the files go in different places in Red Hat Linux from where they go in Mandrake Linux, so even if you could get that foreign distro's package installed, it probably wouldn't work.

Meanwhile, back in January I started to use Fink to install some of the development tools I needed for my new job, including the excellent Subversion and more common things like Apache and MySQL. Fink is a project that ports open source packages to Mac OS X and makes them available for download using the APT package manager. I was impressed by this concept: take a package manager, which is powerful but has flaws, and combine it with a central, volunteer organization that isn't trying to find ways to make you cough up cash. It seems to solve the flaws I experienced with Red Hat and RPM. Somewhere is a big server with a master list of every package you can get, and because there is such a list, there's just one naming standard for these packages. Hopefully, that also means there is one filesystem layout scheme too. Nobody is trying to figure out when they should pull the rug out from under you in order to best twist your arm into giving them money. They're on your side, because they are just like you: they're admins trying to make things easier for themselves, and they decided to give away the packages they made. On one hand, it's a bit scary to think that no big powerful company is doing all the work to make all this work, but on the other hand, I never paid for Red Hat, so they were doing all that work for free from my point of view anyway. I had been looking for a setup like this ever since I experienced the magic of the FreeBSD Ports and Packages Collection, which is basically the same thing (a giant package database maintained by a single non-commercial organization) but for FreeBSD. I don't know which came first (FreeBSD ports or Debian apt-get), but I saw FreeBSD ports first.

So, when I was recently confronted with a couple of desirable hardware devices that my Red Hat Linux 7.3 system couldn't use (a Maxtor 160GB hard disk and an ATA/133 hard disk controller), I decided that I needed to finally give in and do a major OS upgrade. There was no particular rush until a recent security scare added some urgency, so I started looking at Debian GNU/Linux. I decided to start with the latest stable release: Debian GNU/Linux 3.0r2, a.k.a. "woody". After messing with Fink trying to get the Jigdo package for my Mac (I never did find the package that the Fink package database claims it has), I gave up and installed it on my current Linux box. Whoosh! It really did download and create CD images as fast as my DSL connection would allow. (I had to pick a decent download mirror first, of course.)

Well, the rumors are true. Debian is hard to install. I invited Kim to help me install it just so she could learn more about Linux, and she agreed. I think that was a good thing, because now she has firsthand experience of why Linux has not conquered the universe. (At one point she said "I just want to send Steve Jobs some money right now".) We got through it OK, but I can see why reviewers complain. The installer is nowhere near as friendly as the Red Hat 5.1 installer was, and that's pretty sad, because that was released in mid-1998.

It boots into an ancient (2.2 series) kernel by default even though "woody" is the latest stable release, and that ancient kernel can't handle my four(?)-year-old network card that I got with the computer in 2001 when it was being sold at a dot-bomb fire sale. Nor can it handle the hard disk controller that I had hoped an upgrade would allow me to use. Oh wait, there's this help screen that tells you a magic command to run to boot into a 2.4 kernel. That's a relief, but why not just boot into that by default, since this release was dated late 2003? The 2.4 (Linux) kernel was released in January 2001, and heck, the long-awaited 2.6 kernel just came out in January 2004. I think it's time to stop using a kernel from 1999.

The disk partitioning process was clumsy and made a major assumption that was incorrect. Lots of people, myself included, like to split their hard disk into a bunch of partitions for /var, /tmp, /boot, and /usr instead of making it all one giant partition. This is good in a server context because (a) "Quota is handled on a per user, per file system basis", so if you want fine-grained control over quotas, you need to do this, and (b) it's very very bad when a disk fills up completely and there is no free space anywhere. All sorts of stuff fails - you can't read manpages to dig yourself out of your mess, you can't move things away temporarily to free up space while you clean things up... it's bad. The Debian woody installer assumes that you just want one partition, of course. You get to create your partition table, format one partition, and suddenly you're off to picking and installing packages... on your / partition, which has nowhere near enough space for all that stuff. It filled up, the installer failed, and I had to drop to a root shell to clean up the mess. It wasn't pretty. The workaround appears to be that you first make the partition table and format the / partition within the installer, reboot (as the installer tells you to), and then, as soon as the installer starts up again, drop to a shell to format the other partitions manually, add them to /etc/fstab manually, mount them manually, and then go back to the installer. Or maybe it would be better to do this before rebooting if you want to use ReiserFS (see below for why).
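In concrete terms, that second-boot shell session looks something like this (the device name and mount point are made up for illustration; check your own partition table first):

    # For each extra partition: make a filesystem, copy over whatever the
    # base install already put there, and wire it into /etc/fstab.
    mke2fs -j /dev/hda6                 # ext3; use mkreiserfs instead once
                                        # reiserfsprogs is installed
    mount /dev/hda6 /mnt
    cp -a /var/. /mnt/                  # preserve the existing contents of /var
    umount /mnt
    echo "/dev/hda6  /var  ext3  defaults  0  2" >> /etc/fstab
    mount /var                          # repeat for /usr, /tmp, and so on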

Here are some entertaining screen shots that I took during the installation:

  • Configuring Xaw3dg (40K JPEG image) OK, so what the hell is libXaw, and why did I need a three dimensional appearance again?
  • Configuring Binutils (56K JPEG image) I hope I never ever have to care about this problem, ever, but they feel obligated to tell me all about it. My favorite part of this is that I'm not really given any way to save this potentially useful information in an "installer messages" file so that I can use it later. I guess I'm just supposed to have a notepad or something. I happen to have such a notepad, but I prefer to keep notes in electronic form. I'm sure lots of other geeky types do too.
  • Configuring Ftpd (37K JPEG image) Oh no, globbing attacks! Is that like being slimed? Actually I think I know what this means but I don't use FTP anyway. But if I were going to use FTP, I'd be sort of annoyed that out of the box there was a known security problem that "reasonable" limits can fix. If it's "reasonable", why not make that the default, and let people change those limits if they know they need to?
  • Configuring Locales (49K JPEG image) Which locales do I want to have generated? Why does a locale need generation? What does generating a locale mean? Who's generating it and for whose use? Fortunately I'm a programmer and know what they mean by a locale, as well as which locale I want to use, without seeing a list (en_US ISO-8859-1). I only know this because XML makes you think about this stuff, and I've been to Slovenia where en_US and ISO-8859-1 are not the default choices. But I wonder how many people just happen to know that en means English, and en_US means US English, and most of all that ISO-8859-1 refers to a character set that they want to use. I already specified my language of choice, my country, and my time zone to the installer, but it didn't use that information to guess which locale I wanted to use. In fact, it's displaying some text to me right there on the screen, and yet it has no idea which locale I could possibly want to use. There's not even a default. There's just a big list of codes and numbers. Awesome.

The tetex-bin package never installed correctly. I removed and reinstalled it several times, no go. This is just a regular application program - why the heck would it fail? I don't know. I don't really need it so this is OK, but that's kind of lame - a package that comes with the OS on the installer CDs won't install, period.

I had to manually migrate from ext3 to ReiserFS, because the initial set of packages installed by tasksel doesn't include reiserfsprogs, so even though my root partition was a ReiserFS filesystem, I couldn't make the other filesystems ReiserFS too. At the time I write this, /usr is still an ext3 filesystem because I don't have enough space to back it up while I recreate the /usr filesystem as a ReiserFS filesystem. I did goof when I changed /boot from ext3 to ReiserFS - I forgot to run lilo before rebooting. Yup. I was visited by the knights who say "LI". No boot, just "LI" and then it hung. Fortunately I could just reboot from the installer, manually mount the / and /boot filesystems, and run "lilo -r /target" to fix it. A word to the wise: write down your partition table details (the stuff you get from "fdisk -l /dev/hda" and the contents of /etc/fstab) on a piece of paper and keep it handy. If you get into a situation like this, you'll be glad you know where / and /boot are. Yes, this was my fault, but I was saved by my own cautiousness about writing stuff like that down.
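For anyone else visited by the knights who say "LI", the rescue sequence was roughly this (the partition device names here are examples - which is exactly why you want your real ones written down):

    # Boot from the installer CD, drop to a shell, then:
    mkdir -p /target
    mount /dev/hda2 /target          # the root filesystem
    mount /dev/hda1 /target/boot     # the /boot partition
    lilo -r /target                  # re-run lilo against the real system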

But I think all of this was worthwhile - my goal was not to have a pleasant installation experience, but to have a pleasant upgrade experience over the next few years. That's what I heard about Debian - it's a PITA to install, but once you get it installed, it's really easy to keep stuff updated. I've already had a surprisingly easy time using dselect to install a bunch of admin tools (including reiserfsprogs!). apt-spy is neat (it picks a fast mirror for you to download from). I'm hoping that I can do a binary kernel upgrade to a version that will see my fancy hard disk controller, at which time the whole system will be a hell of a lot faster. I'm also hoping that I'll be able to find packages for most of the stuff I have on my Red Hat 7.3 server, to minimize the pain of migration. It seems like the main hurdle is just finding out whether the package you want is considered "stable", "testing", or "unstable", and then finding a download mirror that will supply you with those packages. I succeeded at doing that for Fink already, so I don't imagine that will take very long for Debian. The impression that I get so far is that dselect, apt-get, and jigdo are all pretty sophisticated, and that mainly you just have to RTFM before doing something new, or you'll get it wrong. I'm spoiled by the ease of use of the Mac, I know.

See this review for more details of how the installation process works.


April 5, 2004

http://khaaan.com/


April 4, 2004

Google has some new stuff. Google Local rules.

This instruction manual for floor sweeping is clearly a sign of a bureaucracy gone mad, but to a software developer it looks like the kind of functional specification that is actually required in order to make offshore outsourcing work. Anything short of that results in major mismatches between what was originally imagined and what is actually delivered. The reality of being a programmer is that you are the last human being in a chain of human beings who are trying to get the computer to do something. Since you're the one typing in the instructions, you can't leave any details "to be decided"; you have to make thousands of little decisions about whether you expect a user's last name will ever be more than 40 characters long, or how many users will have a web browser that can support images in PNG format as opposed to plain old JPEG. Nobody wants to think about this stuff, because it's tedious and really doesn't add anything to the business value of the program you're writing. But somebody has to make that decision, and if it isn't the programmer, it's the people who wrote the tool the programmer is using. At some point before a computer looks at the code, those decisions were made, and were made by a person. If that person is a programmer who is in a different time zone from the people who actually know why the software is being written and what the real-world requirements are, and there isn't a really solid requirements document, that's really really bad. Those decisions are made incorrectly, without consulting the domain experts, and no one will figure that out until the software is thoroughly QA'd. Since QA is the first thing to go when cutting budgets (requirements documentation is the second, and salaries for developers are the third), typically you'll find that in "offshoring" projects, QA doesn't happen, the requirements are pretty bad, and so the faraway developers have to make guesses that nobody actually verifies later. If you're going to outsource development, then for your own sake, write requirements documents like that floor-sweeping manual, and pay someone (maybe another offshore person) to check that the code follows the spec.

After Richard Clarke's apology in the 9/11 hearings, the Bush administration has now apologized too.

A couple of weeks ago I bought a Griffin iTrip FM transmitter for my iPod. I had done some research on various web sites that have user ratings, such as Amazon, and I was hesitant to buy any of them because of the mixed reviews. Well, the iTrip is great. If you expected CD quality sound from an FM signal, or expected an iPod with the FM adapter to last all day without a recharge or a car power adapter (like some of the reviewers on Amazon apparently did), then you'll be disappointed. However, if you're just looking for a better alternative than a cassette adapter and/or a stack of CDs, this is it. The salesperson at the San Francisco Apple Store where I bought it said that it was hard to get a clear signal in SF since there were no available frequencies, but I haven't found that to be a problem. It does take some time to fiddle with the radio looking for a relatively quiet frequency that you can tune the iPod to, but after that it's no biggie. I'm looking forward to my first out-of-town trip, just so I can take this thing with me and have 6 days' worth of my own favorite music in the rental car.

I really like this article from eWeek. It agrees with what I've been saying for years about programmer productivity vs. machine speed. There was a time, before I started programming, when computing involved decks of punch cards that were painstakingly checked by hand so as not to waste the extremely expensive computer's precious time. Those days are gone. Now the computer is expected to do all sorts of things to help developers write good code, ranging from programming environments that help programmers visualize large programs more easily and even complete words for you if there's only one correct thing you could possibly be typing at that point, to runtime environments that will never do certain bad things that you'd never actually mean to do and that take care of major drudgery in an automatic fashion.

Some folks are still clinging to the bad old days of software development, when computers' time was tremendously more expensive than it is now, and developers were expected to use the equivalent of stone knives to get their work done. "Bloatware," they cry. "Modern software is bigger but it's just as slow as it was 20 years ago," they cry. Well, in most cases, it's faster than it was 20 years ago on 20-year-old hardware, and it does much more. I can clearly remember a time when programs that were expecting a 16-color display would either crash, or display horribly wrong color schemes on a 256-color display. Nobody thought that the program would ever need to run on different hardware, so they hard-coded the color values and memory locations into the program, and it did the utterly wrong thing on newer hardware. Now it's assumed that programs will have to adapt to all sorts of different environments, and so there's more code to do that. That's just a minimal example; let's not forget that the internet and multimedia and graphical user interfaces are all fairly new things that are now assumed to be among the basic requirements of a mainstream operating system. As such, it's reasonable for a computer to need a lot more resources to do things like play MP3s and download CD images and check email while you're writing a document in a WYSIWYG editor, as compared to writing a text file on a time-sharing machine via a monochrome terminal 20 years ago.

For the people writing this vastly more complex and useful software, is it reasonable to ask them to continue to use the crude tools of 20 years ago? Hardly. First of all, as a developer I can say that the performance of a given program is not held at some predefined limit by the programming language that's used. A program is as fast as it was worth someone's time to make it. In other words, if it's fast enough, people stop working on making it go faster, and get back to making it do more. The fact that this measure of "fast enough" seems to be staying put over a period of decades suggests that it's more about people's expectations and less about hardware and software. This is a good thing for developer productivity, though - it means that developers can keep the simple code in most places and write convoluted but ultra-efficient code only where it matters, and that leads to ease of maintenance, which means the software gets cheaper. In fact, a lot of great software is free, if ya hadn't noticed. The less we obsess about using outdated low-level programming tools and languages to write ultra-efficient software that takes an eternity to code and debug, the more time we can spend collectively making software that works properly. That may cost you a bit in hard disk storage, but that's a whole lot cheaper than programmer time.