Tuesday, 11 November 2008

Getting Data off an old computer

Since I've mentioned the dreaded problem of getting data off legacy computers, I thought I'd write a quick how-to. This isn't a definitive answer; it's more a description of how to go about it.

First of all, build a computer running Linux to do most of the conversion work on. There's a lot of open source code out there that you will find invaluable.

Make sure it has at least one serial port. If you can install a modem card that's even better.

Install the following software:

  • Network
    • ftpd
    • samba
  • Serial communications
  • Conversion
    • Open Office
    • Abiword plus plugins

It helps if you're happy with command line operations, and are old enough to remember the days of asynchronous communications. Depending on the sort of conversion you are looking at, you might also need a Windows PC to run any Windows-only conversion software. If your Linux machine is sufficiently powerful, you could run Windows in a virtual machine on it instead.

Now turn to the computer you want to get data off. Check if it will boot up. Examine it carefully to see if it has a serial port, or a network port or an inbuilt modem.

If it has a network port plus network drivers, you're home and dry. Configure the drivers and get the data off either by binary ftp or by copying it to a network drive (the latter is only really an option for older Microsoft operating systems).

If the old machine has an internal modem, connect that to the modem in your Linux machine. You will need a clever cable to do this. If you have a pair of serial ports instead, you will need a null modem cable, which is basically a crossover cable. You may be able to find one (old LapLink data transfer cables are good), or you may find that your old machine has an 'eccentric' connection. Google may be your friend here for finding the pinouts, but you will have to make up that special cable yourself. Dick Smith or Radio Shack should have all the bits required, but you may have to learn to solder.

On your old machine you need to look for some file transfer software. Often software like HyperTerminal (Windows) or ZTerm (Macintosh) includes xmodem-type capabilities, and quite often it was installed by default on older computers. If not, and if the computer has a working dialup connection, google for some suitable software, then download and install it. On old Windows machines, including 3.1, Kermit 3.15 for DOS is ideal and freely available.
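To give a feel for what "xmodem-type" transfer software is doing on the wire, here is a minimal sketch in Python of how a single classic (checksum-mode) XMODEM packet is laid out. The function name and the sample payload are made up for illustration; real terminal programs also handle acknowledgements, retries and the CRC variants, none of which is shown here.

```python
SOH = 0x01        # start-of-header byte that opens each 128-byte packet
PAD = 0x1A        # SUB (Ctrl-Z), conventionally used to pad a short last block
BLOCK_SIZE = 128  # classic XMODEM always sends 128 data bytes per packet


def xmodem_packet(block_number: int, payload: bytes) -> bytes:
    """Build one checksum-mode XMODEM packet (hypothetical helper, sketch only)."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload must be at most 128 bytes")
    data = payload.ljust(BLOCK_SIZE, bytes([PAD]))  # pad up to a full block
    seq = block_number % 256                        # block numbers wrap at 256
    checksum = sum(data) % 256                      # simple 8-bit additive checksum
    return bytes([SOH, seq, 255 - seq]) + data + bytes([checksum])


packet = xmodem_packet(1, b"hello")
print(len(packet))  # 3 header bytes + 128 data bytes + 1 checksum byte
```

One practical consequence of the fixed block size: a file received over plain XMODEM can come out slightly longer than the original, rounded up to a multiple of 128 bytes with padding on the end.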

Also, if you're using pure serial communications, you need to set up the serial port on the Linux machine with some sensible settings. As some old serial hardware isn't the fastest, something like 9600 baud, 1 stop bit and no parity is a conservative choice. If you're using modem-to-modem communication, the modems should autonegotiate sensibly.
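It's worth knowing roughly how long a transfer at these settings will take before you start. The sketch below is back-of-envelope arithmetic in Python; it assumes 8 data bits per character (not stated above) and ignores any protocol overhead, so treat the result as a lower bound. The function name is made up for illustration.

```python
def transfer_seconds(size_bytes: int, baud: int = 9600, data_bits: int = 8,
                     parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Rough time to push size_bytes down an asynchronous serial line."""
    # Every character on the wire is framed by one start bit,
    # then the data bits, an optional parity bit and the stop bit(s).
    bits_per_byte = 1 + data_bits + parity_bits + stop_bits
    return size_bytes * bits_per_byte / baud


floppy = 1_474_560  # one 1.44 MB floppy's worth of data, in bytes
print(transfer_seconds(floppy))  # seconds at 9600 baud, no parity, 1 stop bit
```

At 9600 baud with 10 bits on the wire per byte you get 960 bytes a second, so a floppy's worth of files takes something over 25 minutes. Slow, but for a one-off rescue job it's usually tolerable.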

Then, on your client (the old machine), configure the connection to the same settings, 9600,n,1, and hit return. Hopefully you should see the login banner of your Linux machine.

Log in, connect and transfer the files. Remember to transfer them in binary mode. If you don't, you will lose the top bit of every byte and the files may be utterly garbled and useless for any further work.
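To see why losing the top bit matters, here is a small Python sketch that does to a handful of bytes what a 7-bit transfer does. The sample bytes are made up for illustration; the point is simply that any byte with a value of 128 or above comes out as something else, while plain ASCII survives untouched.

```python
def strip_top_bit(data: bytes) -> bytes:
    """Simulate a 7-bit (non-binary) transfer by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


# A few arbitrary bytes: the first four are below 0x80, the last three are not.
original = bytes([0x50, 0x4B, 0x03, 0x04, 0xE9, 0x93, 0xFF])
damaged = strip_top_bit(original)

print(original.hex(" "))
print(damaged.hex(" "))

# Count how many bytes no longer match the original.
changed = sum(a != b for a, b in zip(original, damaged))
print(changed, "of", len(original), "bytes changed")
```

Nothing warns you when this happens. The file arrives with the right length and simply has the wrong contents, which is why it's worth checking the transfer mode before you move anything.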

Then comes the conversion stage.

The first thing to check is if the software on the old machine can export the files in a format that AbiWord or Open Office can read. If so, give thanks.

Export the files, transfer them, import them and save them in the desired, more modern format.

However, life is not always kind. Fortunately there are a lot of free, if old, third party converters out there; for example, there are scads for WordStar. Sometimes it will make sense to do the conversion on the old host, other times to run the converter on a more modern PC. If it's a Windows-only conversion application, export the disk space from the Linux machine via samba and mount it on a Windows PC.

Sometimes there simply isn't a converter. In that case you will often end up writing a simple parser in something like perl. If the file format is something with embedded control characters, it's fairly simple to write a 'convert to xml' routine. Alternatively, you can take the data, strip out all the control characters and recover the text. What you want to do depends very much on the requirements of the job and how important the formatting is. I've written various examples over the years, but this simple example to fix Microsoft smart quotes should give you a pointer. How easy it is to write a parser depends to a large extent on how well documented the file format is, and you will need to make some decisions as to what is an acceptable level of fidelity.
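As a rough illustration of the 'strip it back to text' approach, here is a minimal sketch in Python rather than perl. It is not the smart quotes example mentioned above, just the same idea: it assumes the file uses the Windows-1252 byte values for smart quotes (0x91 to 0x94), and the function names and sample data are made up for illustration.

```python
import re

# Windows-1252 smart quote bytes mapped to their plain ASCII equivalents
# (assumption: the source file really is in Windows-1252).
QUOTE_TABLE = bytes.maketrans(b"\x91\x92\x93\x94", b"''\"\"")

# Control characters to throw away; tab, line feed and carriage return are kept.
CONTROL_CHARS = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def fix_smart_quotes(raw: bytes) -> bytes:
    """Replace single and double smart quotes with ' and "."""
    return raw.translate(QUOTE_TABLE)


def strip_control_chars(raw: bytes) -> bytes:
    """Drop embedded formatting codes, leaving only the text."""
    return CONTROL_CHARS.sub(b"", raw)


# Made-up sample: smart quotes plus a control byte wrapped around one word.
sample = b"\x93Hello\x94 \x02bold\x02 it\x92s\r\n"
cleaned = strip_control_chars(fix_smart_quotes(sample))
print(cleaned)
```

Working on raw bytes rather than decoded text is deliberate here: with an undocumented format you often don't know the encoding yet, and a byte-level pass lets you recover the readable parts first and worry about the rest later.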

Sometimes it really can be quicker to take the text, strip out all the previous formatting and re-mark it up by hand!
