# How to read data from magnetic tapes and playing the detective with octal dump - Part 3

This post is part of a series of three posts.

## Playing the detective with octal dump

Remember the sea of octals I mentioned before?

Octal dump, or `od`, is a very handy command that allows you to see through the entrails of your files. To illustrate this, let's create a file called `file`, containing the text `Hey, I'm an lmd file.`, and use octal dump on it:

``````
$ od file
0000000 062510 026171 044440 066447 060440 020156 066554 020144
0000020 064546 062554 005056
0000026
``````

The numbers in the first column are offsets: the number of bytes read before that line starts, with the last line holding the total. The other numbers are the contents of our file in the default representation of octal dump, which is octal shorts (two-byte words printed in octal). A `short` is two bytes long, but the smallest variable we have in our data is a `char`. Does octal dump have a representation that is 1 byte long? Yes, our good ol' friend ASCII:

``````
$ od -c file
0000000   H   e   y   ,       I   '   m       a   n       l   m   d
0000020   f   i   l   e   .  \n
0000026
``````

The option `-c` tells octal dump to print each byte as an ASCII character. The `\n` is a newline (line feed) character. The first column (the number of bytes read) is still in octal, the default offset format of octal dump. The octal numbers in that column can be translated into decimal numbers by expanding them in powers of 8:

• 20 = 2×8¹ + 0×8⁰ = 16 bytes read by the end of the first line, and
• 26 = 2×8¹ + 6×8⁰ = 22 bytes read by the end of the second line.
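
The conversion above is just a base-8 expansion, so it can be checked mechanically. As a minimal sketch (the helper name `octal_offset_to_decimal` is hypothetical, it is not part of `od`), the standard `strtol` function can parse the offset column in base 8:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: converts an octal offset, as printed in the first
// column of od's default output, into a decimal byte count.
long octal_offset_to_decimal(const char *octal_text) {
    assert(octal_text != nullptr);
    // Base 8 makes strtol weight each digit by a power of 8.
    return std::strtol(octal_text, nullptr, 8);
}
```

Calling `octal_offset_to_decimal("0000020")` gives 16 and `octal_offset_to_decimal("0000026")` gives 22, the same values as the expansion by hand.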

However, doing it this way would be impractical. To make things more human-friendly, we can tell octal dump to print the first column directly in decimal format by adding the option `-A d`:

``````
$ od -c -A d file
0000000   H   e   y   ,       I   '   m       a   n       l   m   d
0000016   f   i   l   e   .  \n
0000022
``````

But our files are in another castle! They were not saved in ASCII, they are raw data. So how can we take a look at the data one byte at a time but not in ASCII format? Remember: 1 byte is 8 bits, so its value fits in three octal digits. And of course, octal dump has an octal-bytes representation:

``````
$ od -b -A d file
0000000 110 145 171 054 040 111 047 155 040 141 156 040 154 155 144 040
0000016 146 151 154 145 056 012
0000022
``````

The option `-b` prints each byte of our file in octal format; `110` is the octal representation of the letter `H`, `145` is the octal representation of the letter `e`, and so on. Now, we said we have to jump over headers or corrupted parts of a file. We can do that with the option `-j <BYTES_TO_JUMP>`. For example, to jump four bytes:

``````
$ od -b -A d -j 4 file
0000004 040 111 047 155 040 141 156 040 154 155 144 040 146 151 154 145
0000020 056 012
0000022
``````

See how the first column is updated too? :D
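
To make it clearer what octal dump is doing here, this is a rough sketch of the same idea in C++. It is not how `od` is implemented; the function name `dump_octal_bytes` is hypothetical and it works on a buffer in memory instead of a file. It prints decimal offsets, one three-digit octal number per byte and 16 bytes per line, starting after `skip` bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper mimicking `od -b -A d -j <skip>` on an in-memory buffer:
// decimal offset in the first column, each byte as three octal digits,
// 16 bytes per line, and a last line holding the total size.
std::string dump_octal_bytes(const std::string &data, std::size_t skip) {
    assert(skip <= data.size());
    std::string out;
    char field[32];
    for (std::size_t pos = skip; pos < data.size(); pos += 16) {
        std::snprintf(field, sizeof field, "%07lu", static_cast<unsigned long>(pos));
        out += field;
        for (std::size_t k = pos; k < pos + 16 && k < data.size(); ++k) {
            // Go through unsigned char so bytes above 127 do not turn negative.
            unsigned value = static_cast<unsigned char>(data[k]);
            std::snprintf(field, sizeof field, " %03o", value);
            out += field;
        }
        out += '\n';
    }
    std::snprintf(field, sizeof field, "%07lu", static_cast<unsigned long>(data.size()));
    out += field;
    out += '\n';
    return out;
}
```

With the text of our example file and `skip = 4`, the returned string has the same three lines as the `od -b -A d -j 4 file` output above. The real `od` does more, for example collapsing repeated lines into a `*`, which this sketch leaves out.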

Finally, I had to handle files of hundreds of MB. How do I check only a number `N` of consecutive bytes? Using `-N <BYTES_TO_SHOW>`:

``````
$ od -b -A d -j 4 -N 2 file
0000004 040 111
0000006
``````

Now, let's take a look at a real `lmd` file containing real raw data and not ASCII characters.

If we pick one file from the measurements made in 2000 (constant event length), all we have to do is search for the bytes `000 000 000 052`: octal 52 is decimal 42, written across four bytes. Remember that 42 is the number of signals we have in that experiment (the event length) and that the event length has a size of four bytes. Then comes a short (2 bytes) with the subtype, that is `000 001`; then a short with the type, or `000 012`, where octal 12 is decimal 10; then another short with the trigger, followed by a short that is zero, or `000 000`; and finally four bytes with the event counter. We can search for the next event's counter and check that it is the same number plus one. When we are finished, we can check a file from 1996, where the event length was variable.
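
The search described above can be sketched in a few lines. This is only an illustration, assuming the file contents are already loaded in memory; the helper names `read_be16`, `read_be32` and `find_event_header` are hypothetical and do not appear in the translation program. The bytes are combined explicitly with the most significant byte first, which is the order the dumps show:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a 16-bit value stored most significant byte first at data[pos].
static std::uint16_t read_be16(const std::vector<std::uint8_t> &data, std::size_t pos) {
    assert(pos + 2 <= data.size());
    return static_cast<std::uint16_t>((data[pos] << 8) | data[pos + 1]);
}

// Reads a 32-bit value stored most significant byte first at data[pos].
static std::uint32_t read_be32(const std::vector<std::uint8_t> &data, std::size_t pos) {
    assert(pos + 4 <= data.size());
    return (std::uint32_t(data[pos]) << 24) | (std::uint32_t(data[pos + 1]) << 16) |
           (std::uint32_t(data[pos + 2]) << 8) | std::uint32_t(data[pos + 3]);
}

// Hypothetical helper: returns the offset of the first position at or after
// `start` where the event length, subtype 1 and type 10 appear in a row,
// or -1 if there is no such position.
long find_event_header(const std::vector<std::uint8_t> &data, std::size_t start,
                       std::uint32_t event_length) {
    for (std::size_t pos = start; pos + 8 <= data.size(); ++pos) {
        if (read_be32(data, pos) == event_length &&
            read_be16(data, pos + 4) == 1 &&    // subtype
            read_be16(data, pos + 6) == 10) {   // type
            return static_cast<long>(pos);
        }
    }
    return -1;
}
```

Once a header is found at offset `p`, the event counter is `read_be32(data, p + 12)`, after the trigger and the zero short, so two consecutive events can be checked for counters that differ by one.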

In this image, you can see the contents of those files, with the help of octal dump. The first is from 1996 (variable length, CAMAC IDs and values) and the second from 2000 (fixed length, only CAMAC values). The header variables are broken down in the legends to the right; compare those to the tables I showed you before or the ones appearing in the GOOSY manual. In these two examples, we jump over the 16384-byte file header. The dumps show the first 352 bytes of the files after this jump.

## Translation program

Now that we have the structure of our files figured out, we can write that translation program. As mentioned before, the programming language I used was `C++`.

We can start by reading the type and subtype, since their values are known, and test that we obtain the right values. To read raw blocks of data of a fixed size from a stream, we can use the `fread()` function. We use two shorts to store what we read and print the results on the terminal:

``````
// Array of chars to save the filename
char lmdfile[] = "mylmdfile.lmd";

// Pointer for reading
FILE *readlmd = NULL;

// Open lmd file for reading (binary mode, since it holds raw data)
if ( (readlmd = fopen(lmdfile, "rb")) == NULL ) {
    printf("\nThe file couldn't be opened, exiting program\n\n");
    exit(1);
}

// Jump the 16384 bytes file header
int i = fseek(readlmd, 16384, SEEK_SET);

// We want to read two shorts
short subtype = 0;
short type    = 0;

// Copy 2 raw bytes from the file into each variable
fread(&subtype, sizeof(short), 1, readlmd);
printf("\n Subtype = %d", subtype);

fread(&type, sizeof(short), 1, readlmd);
printf("\n Type = %d", type);
``````

We obtain:

``````
Subtype = 256
Type = 2560
``````

WAT.
.
.
.
.

What's happening?

When we read raw data, we have to check whether the byte order in the file matches the byte order our architecture uses for variables that are bigger than 1 byte, as is the case for `short` and `int`. Check this article about big- and little-endian byte order to see why this happens. In this case, we are not reading `000 001` and `000 012`, we are reading `001 000` and `012 000`. How do we fix this?

We need a swapping function, to swap the bytes after we have stored them in our variables. In our data file, we have variables of size `char` (1 byte), `short` (2 bytes) and `int` (4 bytes), so we will only need to swap for `short` and `int`:

``````
// SWAPPING FUNCTION FOR 16 BITS VARIABLES ___________________
static inline unsigned short bswap_16(unsigned short x) {
    return (x>>8) | (x<<8);
}

// SWAPPING FUNCTION FOR 32 BITS VARIABLES ___________________
static inline unsigned int bswap_32(unsigned int x) {
    // Cast to unsigned int so the 16-bit left shift cannot overflow a signed int
    return ((unsigned int)bswap_16(x&0xffff)<<16) | (bswap_16(x>>16));
}

subtype = bswap_16(subtype);
printf("\n Subtype = %d", subtype);

type = bswap_16(type);
printf("\n Type = %d", type);
``````

And now we obtain:

``````
Subtype = 1
Type = 10
``````
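
Whether the swap is needed at all depends on the machine running the program. The following is a minimal sketch, not part of my original program, of how one could check the host byte order and swap only when necessary. The names `host_is_little_endian`, `load_raw_16` and `big_endian_to_host_16` are made up for this example, and `load_raw_16` stands in for what `fread()` does when it copies two bytes into a `short`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Copies two bytes verbatim into a 16-bit variable, as fread() would.
std::uint16_t load_raw_16(const unsigned char *bytes) {
    assert(bytes != nullptr);
    std::uint16_t raw;
    std::memcpy(&raw, bytes, sizeof raw);
    return raw;
}

// Hypothetical helper: interprets a raw 16-bit value that came from a
// big-endian file, swapping its bytes only on little-endian hosts.
std::uint16_t big_endian_to_host_16(std::uint16_t raw) {
    if (!host_is_little_endian()) {
        return raw;  // File and host agree, nothing to do
    }
    return static_cast<std::uint16_t>((raw >> 8) | (raw << 8));
}
```

For the two bytes `000 012` of the type field, `big_endian_to_host_16(load_raw_16(bytes))` gives 10 on either kind of machine, which is the point of checking before swapping.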

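I'm not going to go into too much detail, but `>>` is the right shift operator and `<<` is the left shift operator. They are used to shift the positions of the bits of a number. Our swapping functions work on unsigned values, so both shifts fill the vacated bits with zeros (logical shifts); for negative signed values, `>>` typically performs an arithmetic shift that copies the sign bit instead.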
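
To see the two shifts at work, here is the 16-bit swap split into its steps. This is only a sketch for illustration: the names `high_byte_down`, `low_byte_up` and `swap16` are hypothetical, and it uses `constexpr` and `static_assert` (C++11) so the compiler checks the worked example with the bytes `0x12` and `0x34`:

```cpp
#include <cstdint>

// x >> 8 moves the high byte into the low position; zeros enter from the
// left because the value is not negative.
constexpr std::uint16_t high_byte_down(std::uint16_t x) {
    return static_cast<std::uint16_t>(x >> 8);
}

// x << 8 moves the low byte into the high position; the cast drops the
// bits that were pushed beyond bit 15.
constexpr std::uint16_t low_byte_up(std::uint16_t x) {
    return static_cast<std::uint16_t>(x << 8);
}

// OR-ing both halves gives the value with its two bytes exchanged.
constexpr std::uint16_t swap16(std::uint16_t x) {
    return static_cast<std::uint16_t>(high_byte_down(x) | low_byte_up(x));
}

// Worked example, checked at compile time.
static_assert(high_byte_down(0x1234) == 0x0012, "high byte moved down");
static_assert(low_byte_up(0x1234) == 0x3400, "low byte moved up");
static_assert(swap16(0x1234) == 0x3412, "bytes exchanged");
```

The 32-bit version follows the same pattern: swap each 16-bit half and then exchange the halves.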

The whole program is printed below, for the experiment of 2000 (fixed event length and only CAMAC values printed). Don't judge me too much, I wrote this in 2008 or 2009. I knew nothing about life, the universe or anything else ;-)

``````
/******************************************
How to run this program:
Just compile
$ gcc lmd2ascii_2000.cpp -o lmd2ascii_2000
or
$ g++ lmd2ascii_2000.cpp -o lmd2ascii_2000
and run
$ ./lmd2ascii_2000

Tested on:
gcc version egcs-2.91.66 (egcs-1.1.2 release)
gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)
******************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

// SWAPPING FUNCTION FOR 16 BITS VARIABLES _______________________________________
static inline unsigned short bswap_16(unsigned short x) {
    return (x>>8) | (x<<8);
}

// SWAPPING FUNCTION FOR 32 BITS VARIABLES _______________________________________
static inline unsigned int bswap_32(unsigned int x) {
    // Cast to unsigned int so the 16-bit left shift cannot overflow a signed int
    return ((unsigned int)bswap_16(x&0xffff)<<16) | (bswap_16(x>>16));
}

typedef union {
    short semi;
    int total;
} scalers;

int main() {
    // INPUT AND OUTPUT FILES ________________________________________________________
    char lmdfile[86];           // 86 = GOOSY maximum number of char.
    char asciifile[86];         // ASCII formatted lmd file (same size assumed)
    char outfile[86];           // Error file (same size assumed)
    FILE *readlmd    = NULL;    // Pointer for reading
    FILE *writeascii = NULL;    // Pointer for writing
    FILE *writeout   = NULL;    // Pointer for writing

    do {
        // %85s leaves room for the terminating '\0' in the 86-char arrays
        printf("\n\tName of the lmd file to be read:\t\t\t ");
        scanf("%85s", lmdfile);

        printf("\tType a name for the output ascii file: ");
        scanf("%85s", asciifile);

        printf("\tType a name for the output errors file: ");
        scanf("%85s", outfile);

    } while ( strcmp(lmdfile, asciifile) == 0);

    // Open lmd file for reading (binary mode, since it holds raw data)
    if ( (readlmd = fopen(lmdfile, "rb")) == NULL ) {
        printf("\nThe file couldn't be opened, exiting program\n\n");
        exit(1);
    }
    printf("\n\tLmd file opened for reading...");

    // Open ASCII file for writing
    if ( (writeascii = fopen(asciifile, "w")) == NULL) {
        printf("\nThe file couldn't be opened, exiting program\n\n");
        exit(1);
    }
    printf("\n\tASCII file opened for writing...\n");

    // Open errors file for writing
    if ( (writeout = fopen(outfile, "w")) == NULL) {
        printf("\nThe file couldn't be opened, exiting program\n\n");
        exit(1);
    }
    printf("\n\tErrors file opened for writing...\n");

    int wrote      = 0;
    int oldcounter = 0;
    short flag     = 0;

    // GOOSY VARIABLES **************************************************************
    // Event Type 10, Subtype 1
    int dlen      = 0;
    short type    = 0;
    short subtype = 0;
    short dummy   = 0;
    short trigger = 0;
    int counter   = 0;

    // Subevent Type 10, Subtype 1
    int sdlen       = 0;
    short procid    = 0;
    char subcrate   = 0;
    char control    = 0;
    short value_aux = 0;

    // There were 5 scalers with int values
    scalers scaler;

    // Other stuff
    int signals   = 0;            // Signals counter
    int dlen_max  = 74;           // Maximum event length
    int sdlen_max = dlen_max - 8; // Maximum subevent length

    // READING THE FILE ______________________________________________________________
    int desp    = 16384 + 48; // 16384 bytes is the size of the header 1287744
    int nEvents = 0;          // Events counter
    int nWrongs = 0;          // Bad events counter
    int size    = 0;          // Lmd file size
    int i = 0, j = 0, k = 0, N = 0, pointerPos = 0;

    printf("\n\tNumber of events to read (-1 for all): ");
    scanf("%d",  &N);

    i    = fseek(readlmd,0,SEEK_END); // Move pointer to the end of file
    size = ftell(readlmd);            // The position at the end is the file size
    printf("\n\tLmd file size = %d bytes\n",  size);

    i    = fseek(readlmd, desp, SEEK_SET);	// Move pointer "desp" bytes from beginning

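    // SEARCHING THE EVENT HEADER ***************************************************
    // Loop on the N events
    while ( (nEvents != N) && (size - pointerPos > 12) ) {
        // Reading the first 4 bytes word
        fread(&dlen, sizeof(int), 1, readlmd);
        dlen = bswap_32(dlen);
        pointerPos = ftell(readlmd);  // Track the position so the loop stops at end of file

        if ( (dlen > dlen_max) || (dlen < 0) ) {
            flag = 1;
            continue;
        }
        //printf("\n Event length = %d",  dlen);

        fread(&subtype, sizeof(short), 1, readlmd);
        subtype = bswap_16(subtype);
        //printf("\n Subtype = %d", subtype);

        fread(&type, sizeof(short), 1, readlmd);
        type = bswap_16(type);
        //printf("\n Type = %d", type);

        fread(&trigger, sizeof(short), 1, readlmd);
        trigger = bswap_16(trigger);
        //printf("\n Trigger = %d", trigger);

        fread(&dummy, sizeof(short), 1, readlmd);
        dummy = bswap_16(dummy);
        //printf("\n Dummy = %d", dummy);

        fread(&counter, sizeof(int), 1, readlmd);
        counter = bswap_32(counter);
        //printf("\n Counter %d\n",  counter);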

        // Detecting event header
        if (subtype == 1 && type == 10 && trigger == 1 && dummy == 0) {

            fread(&sdlen, sizeof(int), 1, readlmd);
            sdlen = bswap_32(sdlen);

            if ( (sdlen > sdlen_max) || (sdlen < 0) ) {
                nWrongs ++;
                flag = 1;
                continue;
            }
            //printf("\n Subevent length = %d		",  sdlen);

            fread(&subtype, sizeof(short), 1, readlmd);
            subtype = bswap_16(subtype);
            //printf("\n Subtype = %d", subtype);

            fread(&type, sizeof(short), 1, readlmd);
            type = bswap_16(type);
            //printf("\n Type = %d", type);

            // Checking subevent header
            if ((dlen - sdlen != 8) || (type != 10) || (subtype !=1)) {
                nWrongs ++;
                flag = 1;
                continue;
            }

            nEvents++;
            if (flag == 1) {
                flag=0;
            }
            //printf("\n	 ======== Event found =======	 \n");

            // Single bytes need no swapping
            fread(&control, sizeof(char), 1, readlmd);
            //printf("\n Control = %d", control);

            fread(&subcrate, sizeof(char), 1, readlmd);
            //printf("\n Subcrate = %d", subcrate);

            fread(&procid, sizeof(short), 1, readlmd);
            procid = bswap_16(procid);
            //printf("\n Processor ID = %d", procid);

            signals = sdlen - 2;  // Total number of signals

            if (signals != 32)
                printf("\n signals = %d",  signals);
            signals = signals - 6;

            // Creating array dynamically, once one has read the length of the event.
            // There are 6 scalers of size = 2 shorts
            int value[signals];

            for (i=0; i<6; i++) {
                scaler.total = 0;

                fread (&value_aux, sizeof(short), 1, readlmd); // LOW part
                scaler.total = bswap_16(value_aux);            // Fills the low 16 bits

                fread (&value_aux, sizeof(short), 1, readlmd); // HIGH part
                scaler.total |= (int)((unsigned int)bswap_16(value_aux) << 16); // Fills the high 16 bits

                value[i] = scaler.total;
            }

            //Reading the rest of the data
            for (i=6; i<signals; i++) {
                fread (&value_aux, sizeof(short), 1, readlmd);
                value[i] = bswap_16(value_aux);
                //printf("\n CAMAC value = %d", value[i]);
            }

            // WRITING THE FILE ______________________________________________________________
            fprintf(writeascii,"99999\n");
            fprintf(writeascii, "%d\n", dlen);
            fprintf(writeascii, "%d\n", subtype);
            fprintf(writeascii, "%d\n", type);
            fprintf(writeascii, "%d\n", trigger);
            fprintf(writeascii, "%d\n", dummy);
            fprintf(writeascii, "%d\n", counter);

            fprintf(writeascii, "%d\n", sdlen);
            fprintf(writeascii, "%d\n", subtype);
            fprintf(writeascii, "%d\n", type);
            fprintf(writeascii, "%d\n", control);
            fprintf(writeascii, "%d\n", subcrate);
            fprintf(writeascii, "%d\n", procid);

            for (j=0; j<signals-6; j++)
                fprintf(writeascii, "%d\n", value[j]);

            wrote ++;

            oldcounter = counter;
            fprintf(writeout, "\t%d", counter);
        }

        // if not subtype == 1 && type == 10 && trigger == 1 && dummy == 0
        // Move pointer 12 bytes backwards
        else
            fseek(readlmd, -12, SEEK_CUR);

    }

    printf("\n Number of bytes read  = %d", pointerPos - desp );
    printf("\n Number of events read = %d", nEvents);

    if (nEvents != N)
        printf(" (Reached end of file)");

    printf("\n Number of bad events     = %d",   nWrongs);
    printf("\n Number of written events = %d\n\n", wrote);

    //Close files