How to read data from magnetic tapes and playing the detective with octal dump - Part 3

This post is the third in a series of three posts.


Playing the detective with octal dump

Remember the sea of octals I mentioned before?

Octal dump, or od, is a very handy command that allows you to see through the entrails of your files. To illustrate this, let's create a file called file, containing the text Hey, I'm an lmd file., and use octal dump on it:


$ od file
0000000 062510 026171 044440 066447 060440 020156 066554 020144
0000020 064546 062554 005056
0000026

The numbers in the first column are offsets: they count, in octal, the bytes read before each line. The other numbers are the contents of our file in octal dump's default representation, which is octal shorts. A short is two bytes long, but the smallest variable we have in our data is a char. Does octal dump have a representation that is 1 byte long? Yes, our good ol' friend ASCII:


$ od -c file
0000000   H   e   y   ,       I   '   m       a   n       l   m   d
0000020   f   i   l   e   .  \n
0000026

The option -c tells octal dump to use ASCII characters. The \n is a newline (line feed) character. The first column (the byte offset of each line) is still printed in octal, the default for that column. The octal numbers in that column can be translated into decimal numbers using positional notation, multiplying each digit by the corresponding power of 8:

  • 20 (octal) = 2×8¹ + 0×8⁰ = 16, so 16 bytes were read in the first line, and
  • 26 (octal) = 2×8¹ + 6×8⁰ = 22, so 22 bytes were read in total once the second line ends.
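The same conversion can be sketched in a few lines of C++. This is only an illustration: octal_to_decimal is a made-up helper name, not something od or the original program provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert a string of octal digits, such as the ones in
// od's offset column, into its value using positional notation (base 8).
unsigned long octal_to_decimal(const std::string& digits) {
  unsigned long value = 0;
  for (std::string::size_type i = 0; i < digits.size(); ++i) {
    const char c = digits[i];
    assert(c >= '0' && c <= '7');  // anything else is not an octal digit
    // Shift what we have one octal position to the left and add the new digit
    value = value * 8 + static_cast<unsigned long>(c - '0');
  }
  return value;
}
```

Calling octal_to_decimal("0000020") gives 16 and octal_to_decimal("0000026") gives 22, the same numbers as in the list above.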

However, converting every offset by hand would be impractical. To make things more human friendly, we can tell octal dump to print the first column directly in decimal format by adding the option -A d:


$ od -c -A d file
0000000   H   e   y   ,       I   '   m       a   n       l   m   d
0000016   f   i   l   e   .  \n
0000022

But our files are in another castle! They were not saved as ASCII; they contain raw data. So how can we take a look at the data one byte at a time, but not in ASCII format? Remember: 1 byte is 8 bits, so its value (0 to 255) fits in three octal digits. And of course, octal dump has an octal-bytes representation:


$ od -b -A d file
0000000 110 145 171 054 040 111 047 155 040 141 156 040 154 155 144 040
0000016 146 151 154 145 056 012
0000022

The option -b prints each byte of our file in octal format: 110 is the octal representation of the letter H, 145 is the octal representation of the letter e, and so on. Now, we said we have to jump over headers or corrupted parts of a file. We can do that with the option -j <BYTES_TO_JUMP>. For example, to jump four bytes:


$ od -b -A d -j 4 file
0000004 040 111 047 155 040 141 156 040 154 155 144 040 146 151 154 145
0000020 056 012
0000022

See how the first column is updated too? :D

Finally, I had to handle files of hundreds of MB. How do I check only N consecutive bytes? Using -N <BYTES_TO_SHOW>:


$ od -b -A d -j 4 -N 2 file
0000004 040 111
0000006
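To make -b, -j and -N a bit less magical, here is a rough sketch of what they do, applied to a buffer that is already in memory. The helper octal_bytes is a name I made up for this example; it is not part of od, and it leaves out the offset column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper imitating `od -b -j skip -N count`:
// format `count` bytes of `data`, starting at offset `skip`, as 3-digit octal.
std::string octal_bytes(const std::vector<unsigned char>& data,
                        std::size_t skip, std::size_t count) {
  assert(skip <= data.size());  // skipping past the end is a caller error
  std::string out;
  for (std::size_t i = skip; i < data.size() && i < skip + count; ++i) {
    char buf[8];
    // %03o prints the byte as an octal number padded to three digits
    std::snprintf(buf, sizeof buf, "%03o", static_cast<unsigned>(data[i]));
    if (!out.empty()) out += ' ';
    out += buf;
  }
  return out;
}
```

With the 22 bytes of our example file in data, octal_bytes(data, 4, 2) returns "040 111", matching the od output above.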

Now, let's take a look at a real lmd file containing real raw data and not ASCII characters.

If we pick one file from the measurements made in 2000 (constant event length), all we have to do is search for 000 000 000 052, which is the number 42 (52 in octal) written across four bytes. Remember that 42 is the number of signals we have in that experiment (the event length) and that the event length is four bytes long. Then comes a short (2 bytes) with the subtype, 000 001; then a short with the type, 000 012, where octal 12 is decimal 10; then another short with the trigger, followed by a short that is zero, 000 000; and finally four bytes with the event counter. We can search for the next event's counter and check that it is the same number plus one. When we are finished, we can check a file from 1996, where the event length was variable.
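As a sketch of that layout, the 16 bytes of an event header can be decoded like this. The struct and function names are mine, not GOOSY's, and the code assumes the most significant byte of each field comes first, as in the dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field names are illustrative; they are not copied from the GOOSY headers.
struct EventHeader {
  std::uint32_t length;   // event length, 4 bytes
  std::uint16_t subtype;  // 2 bytes, expected to be 1
  std::uint16_t type;     // 2 bytes, expected to be 10
  std::uint16_t trigger;  // 2 bytes
  std::uint16_t dummy;    // 2 bytes, expected to be 0
  std::uint32_t counter;  // event counter, 4 bytes
};

// Decode big-endian (most significant byte first) integers from raw bytes.
static std::uint16_t be16(const unsigned char* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t be32(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
          static_cast<std::uint32_t>(p[3]);
}

// Parse the header fields in the order described in the text.
EventHeader parse_event_header(const unsigned char* bytes, std::size_t size) {
  assert(size >= 16);  // 4 + 2 + 2 + 2 + 2 + 4 bytes
  EventHeader h;
  h.length  = be32(bytes);
  h.subtype = be16(bytes + 4);
  h.type    = be16(bytes + 6);
  h.trigger = be16(bytes + 8);
  h.dummy   = be16(bytes + 10);
  h.counter = be32(bytes + 12);
  return h;
}
```

Feeding it the bytes 000 000 000 052 000 001 000 012 followed by a trigger, a zero short and a counter gives length 42, subtype 1 and type 10. Decoding byte by byte like this works the same on any machine, whatever its native byte order.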

In this image, you can see the contents of those files, with the help of octal dump. The first is from 1996 (variable length, CAMAC IDs and values) and the second from 2000 (fixed length, only CAMAC values). The header variables are broken down in the legends to the right. Compare those to the tables I showed you before or the ones appearing in the GOOSY manual:

A picture showing the dump of two lmd files

In these two examples, we jump over the 16384-byte file header. The dumps show the first 352 bytes of the files after this jump.

Translation program

Now that we have the structure of our files figured out, we can write that translation program. As mentioned before, the programming language I used was C++.

We can start reading the type and subtype, since their values are known, and test that we obtain the right values. To read raw blocks of data of a fixed size from a stream, we can use the fread() function. We use two shorts to store what we read and print the results on the terminal:


// Array of chars to save the filename
char lmdfile[86] = "mylmdfile.lmd";

// Pointer for reading
FILE *readlmd = NULL;

// Open lmd file for reading (binary mode, since it holds raw data)
if ( (readlmd = fopen(lmdfile, "rb")) == NULL ) {
  printf("\nThe file couldn't be opened, exiting program\n\n");
  exit(1);
}

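// Jump over the 16384-byte file header and the 4-byte length word
// that, in the layout described above, precedes the subtype and the type
int i = fseek(readlmd, 16384 + 4, SEEK_SET);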

// We want to read two shorts
short subtype = 0;
short type    = 0;

// Reading the subtype with fread()
fread (&subtype, sizeof(short), 1, readlmd);
printf("\n Subtype = %d", subtype);

// Reading the type with fread()
fread (&type, sizeof(short), 1, readlmd);
printf("\n Type = %d", type);

We obtain:


Subtype = 256
Type = 2560

WAT.
.
.
.
.

What's happening?

When we read raw data, we have to check whether the byte order of the file matches the byte order our architecture uses for variables bigger than 1 byte, as is the case for short and int. Check this article about big and little endian byte order to see why this happens. The file stores the most significant byte first (big endian), while my machine expects the least significant byte first (little endian). So we are not reading 000 001 and 000 012, we are reading 001 000 and 012 000, that is, 1×256 = 256 and 10×256 = 2560. How to fix this?
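Before fixing it, a quick way to confirm the diagnosis is to ask the machine which byte order it uses. The two helpers below are illustrative names of mine, not part of the translation program.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool host_is_little_endian() {
  const std::uint16_t probe = 0x0001;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);  // look at the byte at the lowest address
  return first_byte == 0x01;
}

// Interpret two raw bytes the way fread() into a 16-bit variable would:
// the bytes are copied as they are and the host decides what they mean.
std::uint16_t read_native_u16(const unsigned char* bytes) {
  assert(bytes != nullptr);
  std::uint16_t value = 0;
  std::memcpy(&value, bytes, sizeof value);
  return value;
}
```

On a little-endian host, read_native_u16 applied to the bytes 000 001 yields 256; on a big-endian host it yields 1 and no swapping would be needed.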

We need a swapping function, to swap the bytes after we have stored them in our variables. In our data file, we have variables of size char (1 byte), short (2 bytes) and int (4 bytes), so we will only need to swap for short and int:


// SWAPPING FUNCTION FOR 16-BIT VARIABLES ____________________
static inline unsigned short bswap_16(unsigned short x) {
  return (unsigned short)((x >> 8) | (x << 8));
}

// SWAPPING FUNCTION FOR 32-BIT VARIABLES ____________________
static inline unsigned int bswap_32(unsigned int x) {
  // Cast before shifting so the shift is done on an unsigned 32-bit value
  return ((unsigned int)bswap_16(x & 0xffff) << 16) | bswap_16(x >> 16);
}

// Reading the subtype with fread()
fread (&subtype, sizeof(short), 1, readlmd);
subtype = bswap_16(subtype);
printf("\n Subtype = %d", subtype);

// Reading the type with fread()
fread (&type, sizeof(short), 1, readlmd);
type = bswap_16(type);

printf("\n Type = %d", type);

And now we obtain:


Subtype = 1
Type = 10

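I'm not going to go into too much detail, but >> is the right shift operator and << is the left shift operator. They move the bits of a number a given number of positions to the right or to the left. For unsigned values, as in these functions, the vacated positions are filled with zeros (a logical shift); right-shifting a negative signed value typically copies the sign bit instead (an arithmetic shift).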
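To see the shifts at work, here is the 16-bit swap again with the intermediate steps spelled out. swap16_steps is just a teaching variant I wrote for this explanation; it assumes an int of at least 32 bits, which holds on the usual desktop platforms.

```cpp
#include <cstdint>

// Same idea as bswap_16, written step by step.
std::uint16_t swap16_steps(std::uint16_t x) {
  const unsigned wide = x;                 // work in a wider unsigned type
  const unsigned high_to_low = wide >> 8;  // 0x0100 >> 8 gives 0x0001
  const unsigned low_to_high = wide << 8;  // 0x0100 << 8 gives 0x10000
  // Combine both halves and keep only the lowest 16 bits
  return static_cast<std::uint16_t>((high_to_low | low_to_high) & 0xFFFFu);
}
```

For the subtype, 256 (0x0100) becomes 1; for the type, 2560 (0x0A00) becomes 10. Swapping twice returns the original value.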

The whole program is printed below, for the experiment of 2000 (fixed event length and only CAMAC values printed). Don't judge me too much, I wrote this in 2008 or 2009. I knew nothing about life, the universe or anything else ;-)


/******************************************
 How to run this program:
 Just compile
     $ gcc lmd2ascii_2000.cpp -o lmd2ascii_2000
 or
     $ g++ lmd2ascii_2000.cpp -o lmd2ascii_2000
 and run
     $ ./lmd2ascii_2000

 Tested on:
      gcc version egcs-2.91.66 (egcs-1.1.2 release)
      gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)
******************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

// SWAPPING FUNCTION FOR 16-BIT VARIABLES ________________________________________
static inline unsigned short bswap_16(unsigned short x) {
  return (unsigned short)((x >> 8) | (x << 8));
}

// SWAPPING FUNCTION FOR 32-BIT VARIABLES ________________________________________
static inline unsigned int bswap_32(unsigned int x) {
  // Cast before shifting so the shift is done on an unsigned 32-bit value
  return ((unsigned int)bswap_16(x & 0xffff) << 16) | bswap_16(x >> 16);
}

typedef union {
  short semi[2];
  int total;
} scalers;

int main() {
// INPUT AND OUTPUT FILES ________________________________________________________
  char lmdfile[86];           // 86 = GOOSY maximum number of chars
  char asciifile[86];         // ASCII formatted lmd file
  char outfile[86];           // Error file
  FILE *readlmd    = NULL;    // Pointer for reading
  FILE *writeascii = NULL;    // Pointer for writing
  FILE *writeout   = NULL;    // Pointer for writing

  do {
    printf("\n	Name of the lmd file to be read:			 ");
    scanf("%85s", lmdfile);     // Width limit keeps the input inside the buffer

    printf("	Type a name for the output ascii file: ");
    scanf("%85s", asciifile);

    printf("	Type a name for the output errors file: ");
    scanf("%85s", outfile);

  } while ( strcmp(lmdfile, asciifile) == 0);

  // Open lmd file for reading (binary mode, since it holds raw data)
  if ( (readlmd = fopen(lmdfile, "rb")) == NULL ) {
    printf("\nThe file couldn't be opened, exiting program\n\n");
    exit(1);
  }
  printf("\n	Lmd file opened for reading...");

  // Open ASCII file for writing
  if ( (writeascii = fopen(asciifile, "w")) == NULL)	{
    printf("\nThe file couldn't be opened, exiting program\n\n");
    exit(1);
  }
  printf("\n	ASCII file opened for writing...\n");

  // Open errors file for writing
  if ( (writeout = fopen(outfile, "w")) == NULL)	{
    printf("\nThe file couldn't be opened, exiting program\n\n");
    exit(1);
  }
  printf("\n\tErrors file opened for writing...\n");

  int wrote      = 0;
  int oldcounter = 0;
  short flag     = 0;

// GOOSY VARIABLES **************************************************************
  // Event Type 10, Subtype 1
  int dlen      = 0;
  short type    = 0;
  short subtype = 0;
  short dummy   = 0;
  short trigger = 0;
  int counter   = 0;

  // Subevent Type 10, Subtype 1
  int sdlen       = 0;
  short procid    = 0;
  char subcrate   = 0;
  char control    = 0;
  short value_aux = 0;

  // There were 6 scalers with int values
  scalers scaler;

  // Other stuff
  int signals   = 0;            // Signals counter
  int dlen_max  = 74;           // Maximum event length
  int sdlen_max = dlen_max - 8; // Maximum subevent length

// READING THE FILE ______________________________________________________________
  int desp    = 16384 + 48; // 16384 bytes is the size of the header 1287744
  int nEvents = 0;          // Events counter
  int nWrongs = 0;          // Bad events counter
  int size    = 0;          // Lmd file size
  int i = 0, j = 0, k = 0, N = 0, pointerPos = 0;

  printf("\n	Number of events to read (-1 for all): ");
  scanf("%d",  &N);

  i    = fseek(readlmd,0,SEEK_END); // Move pointer to the end of file
  size = ftell(readlmd);
  printf("\n	Lmd file size = %d bytes\n",  size);

  i    = fseek(readlmd, desp, SEEK_SET);  // Move pointer "desp" bytes from beginning
  pointerPos = ftell(readlmd);

// SEARCHING THE EVENT HEADER ***************************************************
  // Loop on the N events
  while ( (nEvents != N) && (size - pointerPos > 12) )	{
    // Reading the first 4 bytes word
    fread (&dlen, sizeof(int), 1, readlmd);
    dlen = bswap_32(dlen);

    if ( (dlen > dlen_max) || (dlen < 0) ) {
       pointerPos = ftell(readlmd);
       flag = 1;
       continue;
    }
    //printf("\n Event length = %d",  dlen);

    fread (&subtype, sizeof(short), 1, readlmd);
    subtype = bswap_16(subtype);
    //printf("\n Subtype = %d", subtype);

    fread (&type, sizeof(short), 1, readlmd);
    type = bswap_16(type);
    //printf("\n Type = %d", type);

    fread (&trigger, sizeof(short), 1, readlmd);
    trigger = bswap_16(trigger);
    //printf("\n Trigger = %d", trigger);

    fread (&dummy, sizeof(short), 1, readlmd);
    dummy = bswap_16(dummy);
    //printf("\n Dummy = %d", dummy);

    fread (&counter, sizeof(int), 1, readlmd);
    counter = bswap_32(counter);
    //printf("\n Counter %d\n",  counter);

    // Detecting event header
    if (subtype == 1 && type == 10 && trigger == 1 && dummy == 0) {

// READING THE SUBEVENT HEADER ***************************************************
      fread (&sdlen, sizeof(int), 1, readlmd);
      sdlen = bswap_32(sdlen);

      if ( (sdlen > sdlen_max) || (sdlen < 0) ) {
        pointerPos = ftell(readlmd);
        nWrongs ++;
        flag = 1;
        continue;
      }
      //printf("\n Subevent length = %d		",  sdlen);

      fread (&subtype, sizeof(short), 1, readlmd);
      subtype = bswap_16(subtype);
      //printf("\n Subtype = %d", subtype);

      fread (&type, sizeof(short), 1, readlmd);
      type = bswap_16(type);
      //printf("\n Type = %d", type);

      // Checking subevent header
      if ((dlen - sdlen != 8) || (type != 10) || (subtype !=1)) {
        pointerPos = ftell(readlmd);
        nWrongs ++;
        flag = 1;
        continue;
      }

      nEvents++;
      if (flag == 1) {
        flag=0;
      }
      //printf("\n	 ======== Event found =======	 \n");

      fread (&control, sizeof(char), 1, readlmd);
      //printf("\n Control = %d", control);

      fread (&subcrate, sizeof(char), 1, readlmd);
      //printf("\n Subcrate = %d", subcrate);

      fread (&procid, sizeof(short), 1, readlmd);
      procid = bswap_16(procid);
      //printf("\n Processor ID = %d", procid);

      signals = sdlen - 2;  // Total number of signals

      if (signals != 32)
        printf("\n signals = %d",  signals);
      signals = signals - 6;

      // Creating the array dynamically, once we have read the length of the event.
      // There are 6 scalers, each stored as 2 shorts
      int value[signals];

      //Reading the scalers
      for (i=0; i<6; i++) {
        scaler.total = 0;

        fread (&value_aux, sizeof(short), 1, readlmd); // LOW part
        value_aux      = bswap_16(value_aux);
        scaler.semi[0] = value_aux;

        fread (&value_aux, sizeof(short), 1, readlmd); // HIGH part
        value_aux      = bswap_16(value_aux);
        scaler.semi[1] = value_aux;

        value[i] = scaler.total;
      }

      //Reading the rest of the data
      for (i=6; i<signals; i++) {
        fread (&value_aux, sizeof(short), 1, readlmd);
        value[i] = bswap_16(value_aux);
        //printf("\n CAMAC value = %d", value[i]);
      }

// WRITING THE FILE ______________________________________________________________
      // Event header
      fprintf(writeascii,"99999\n");
      fprintf(writeascii, "%d\n", dlen);
      fprintf(writeascii, "%d\n", subtype);
      fprintf(writeascii, "%d\n", type);
      fprintf(writeascii, "%d\n", trigger);
      fprintf(writeascii, "%d\n", dummy);
      fprintf(writeascii, "%d\n", counter);

      // Subevent header
      fprintf(writeascii, "%d\n", sdlen);
      fprintf(writeascii, "%d\n", subtype);
      fprintf(writeascii, "%d\n", type);
      fprintf(writeascii, "%d\n", control);
      fprintf(writeascii, "%d\n", subcrate);
      fprintf(writeascii, "%d\n", procid);

      // signals was already reduced by 6 above, so it is the size of value[]
      for (j=0; j<signals; j++)
        fprintf(writeascii, "%d\n", value[j]);

      wrote ++;

      oldcounter = counter;
      fprintf(writeout, "\t%d", counter);
    }

    // if not subtype == 1 && type == 10 && trigger == 1 && dummy == 0
    // Move pointer 12 bytes backwards
    else
      i = fseek(readlmd,-12,SEEK_CUR);

    pointerPos = ftell(readlmd);
  }

  printf("\n Number of bytes read  = %d", pointerPos - desp );
  printf("\n Number of events read = %d", nEvents);

  if (nEvents != N)
    printf(" (Reached end of file)");

  printf("\n Number of bad events   = %d",   nWrongs);
  printf("\n Number of written events = %d\n\n", wrote);

  //Close files
  fclose(readlmd);
  fclose(writeascii);
  fclose(writeout);

  return 0;
}
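One detail worth pointing out: the program rebuilds each 32-bit scaler through the union, which relies on the machine keeping the low half of an int first in memory (true on little-endian hosts like mine). A sketch of an alternative that does not depend on the host byte order uses a shift instead. combine_scaler is a hypothetical helper, not part of the program above.

```cpp
#include <cstdint>

// low:  the first 16-bit word read for a scaler (LOW part)
// high: the second 16-bit word read for a scaler (HIGH part)
// Both are assumed to be already byte-swapped into host order.
std::uint32_t combine_scaler(std::uint16_t low, std::uint16_t high) {
  // Move the high word into the upper 16 bits and put the low word below it
  return (static_cast<std::uint32_t>(high) << 16) | low;
}
```

For example, combine_scaler(0x5678, 0x1234) gives 0x12345678.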
