How to read data from magnetic tapes and playing the detective with octal dump - Part 2

Caution! This article is 10 years old. It may be obsolete or show old techniques. It may also still be relevant, and you may find it useful! So it has been marked as deprecated, just in case.

This post is part of a series of three posts.

About the files

The files on the tapes were in lmd format, or "list mode", which is the format in which the data acquisition system wrote the data. This is raw data, i.e., words of different sizes (in bytes) containing information for each event received from the detectors, in the order in which the events arrived at the acquisition system.

This means two things. First, the files follow a certain structure: a general one for the metadata included in the header of each event, and a specific one for the actual data, which depends on what you had connected, the signals you chose to store, how many detectors you had, etc. Second, there is the way that information is written to the tape: what exactly gets written so that you know that detector X gave signal Y.

You can read about this in the chapter "Buffer and data structures" of the GOOSY manual. This chapter is crucial for translating these files into something that makes sense to a human. There are several types of events and they all have different header structures. Our data was of type 10, subtype 1, which you can find on page 28, and it took me a good while to figure it out. I wish somebody had explained this to me when I first started working with my lmd files!

So, let's DO THIS. Let's learn about the general and specific structure of our headers:

The general structure of an event

First of all, the header and EOF files contain no relevant information, so we can get rid of them. We will work with just the data files (the meat of the sandwich). Second, at the beginning of every data file there is a buffer header with the structure shown on page 17; we can get rid of it as well. It occupies an entire block, so we need to know the size of a block. We already know this from the previous section: it is 16 KB, or 16384 bytes. So I had to skip 16384 bytes before reading anything.
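Skipping that first block can be sketched as follows. This is only a minimal illustration, not the original program: the name skipBufferHeader is made up here, and it assumes the buffer header fills exactly one 16384-byte block at the start of the stream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // only needed for the in-memory usage example below
#include <string>

// Block size taken from the text: 16 KB.
constexpr std::streamoff kBlockSize = 16384;

// Hypothetical helper: moves the read position past the buffer header,
// assumed to occupy the whole first block.
// Returns false if the stream is shorter than one block or the seek fails.
bool skipBufferHeader(std::istream& in) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < kBlockSize) return false;
    in.seekg(kBlockSize, std::ios::beg);
    return static_cast<bool>(in);
}

// Usage with an in-memory stream standing in for a data file:
//   std::istringstream data(std::string(16384 + 4, 'x'));
//   skipBufferHeader(data);   // read position is now 16384
```

The same function works on a std::ifstream opened in binary mode, since it only relies on the std::istream interface.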

In Figure 1.13 of the GOOSY manual we have the event structure. To read through this structure, I wrote a program in C++. What the GOOSY manual calls "word" is one byte (8 bits), and the events are written from right to left. So, when you are positioned at the beginning of a new event in your file, you will have to check for a header with the following structure:

  • First, you will read the length (in bytes) of the event. A space of 4 bytes (32 bits) is reserved to store this number. This means that you should create a variable in your program of type int at least. If you wanted to jump to the next event, you would have to read this number and jump length minus four bytes in your file from your current position.
  • Next, there is the subtype, followed by the type (remember, it is written from right to left). Each one has a reserved space of two bytes (16 bits) to store its value, which is, as said before, 1 and 10, respectively. These values will be useful to detect events in a sea of hexadecimals and octals, as we will see later. For two bytes, use a variable of type short at least.
  • Next comes the trigger. It's two bytes long, so, a short at least. There was no relevant information to store that fit into two bytes, so the next two bytes you will read after the trigger are a bunch of zeros (it appears in the figure as "not used"), also the size of a short.
  • After that, you will find the event counter, four bytes. That adds another int variable to your translation program.

This is the list of all your events, in the order in which they arrived at the acquisition system.

Header structure of an Event type 10, subtype 1

Offset 0     | Event length (4 bytes or 32 bits)
Offset 4     | Event subtype = 1 (2 bytes or 16 bits) | Event type = 10 (2 bytes or 16 bits)
Offset 8     | Trigger (2 bytes or 16 bits)           | Not used (2 bytes or 16 bits)
Offset 12    | Event counter (4 bytes or 32 bits)
Offset 16    | Subevent 1
Offset 20    | Subevent 2
Offset ...   | ...
Offset 16+n  | Subevent n
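The event header described above could be decoded along these lines. This is a sketch under stated assumptions, not the program I wrote back then: the struct and function names are invented for illustration, the bytes are assumed to be stored least significant byte first (little-endian), and the data is assumed to be already loaded into a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative names; field sizes and order follow the header table.
struct EventHeader {
    std::uint32_t length;   // event length in bytes (offset 0)
    std::uint16_t subtype;  // expected to be 1 (offset 4)
    std::uint16_t type;     // expected to be 10
    std::uint16_t trigger;  // offset 8
    std::uint16_t unused;   // "not used", zeros
    std::uint32_t counter;  // event counter (offset 12)
};

// Assumption: values are stored least significant byte first.
std::uint32_t readLE(const std::vector<unsigned char>& buf,
                     std::size_t pos, std::size_t nbytes) {
    assert(nbytes <= 4 && pos + nbytes <= buf.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < nbytes; ++i)
        value |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    return value;
}

EventHeader parseEventHeader(const std::vector<unsigned char>& buf,
                             std::size_t pos) {
    EventHeader h;
    h.length  = readLE(buf, pos, 4);
    h.subtype = static_cast<std::uint16_t>(readLE(buf, pos + 4, 2));
    h.type    = static_cast<std::uint16_t>(readLE(buf, pos + 6, 2));
    h.trigger = static_cast<std::uint16_t>(readLE(buf, pos + 8, 2));
    h.unused  = static_cast<std::uint16_t>(readLE(buf, pos + 10, 2));
    h.counter = readLE(buf, pos + 12, 4);
    return h;
}

// "Read the length, then jump length minus four bytes": starting at pos,
// the 4 bytes of the length field plus (length - 4) more bytes.
std::size_t nextEventPos(std::size_t pos, const EventHeader& h) {
    return pos + 4 + (h.length - 4);
}
```

Decoding byte by byte, instead of copying the bytes straight into the struct, keeps the result independent of the byte order and struct padding of the machine running the program.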

The "subevents" (your actual events, really) also come with their own header, so, after sorting out the information in the big header, you'll have to sort out the information in every subevent's header. It is pretty much the same, except that here we have two variables that are 1 byte long, the control and the subcrate. So they can be stored in a... you guessed it, a char variable.

Header structure of a Subevent type 10, subtype 1

Subevent length (4 bytes or 32 bits)
Subevent subtype = 1 (2 bytes or 16 bits)    | Subevent type = 10 (2 bytes or 16 bits)
Control and subcrate (1 byte or 8 bits each) | Processor ID (2 bytes or 16 bits)
CAMAC value[*] (2 bytes or 16 bits)          | CAMAC module ID (2 bytes or 16 bits)
...                                          | ...

* CAMAC (Computer Automated Measurement and Control) refers to the electronic modules we were using to receive the analog signals coming from the detectors and turn them into digital data.
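A subevent header can be read in much the same way. The sketch below is hedged on several points: the names are made up, the byte order is assumed to be little-endian, and the order of the two 1-byte fields (control first, then subcrate) is a guess that should be checked against the GOOSY manual and a real dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative names; sizes follow the subevent header description.
struct SubeventHeader {
    std::uint32_t length;       // 4 bytes
    std::uint16_t subtype;      // 2 bytes, expected to be 1
    std::uint16_t type;         // 2 bytes, expected to be 10
    unsigned char control;      // 1 byte (position assumed)
    unsigned char subcrate;     // 1 byte (position assumed)
    std::uint16_t processorId;  // 2 bytes
};

SubeventHeader parseSubeventHeader(const std::vector<unsigned char>& buf,
                                   std::size_t pos) {
    assert(pos + 12 <= buf.size());  // the header is 12 bytes long
    const unsigned char* p = buf.data() + pos;
    // Assumption: 16-bit values are stored least significant byte first.
    auto u16 = [](const unsigned char* q) {
        return static_cast<std::uint16_t>(q[0] | (q[1] << 8));
    };
    SubeventHeader h;
    h.length = static_cast<std::uint32_t>(u16(p)) |
               (static_cast<std::uint32_t>(u16(p + 2)) << 16);
    h.subtype = u16(p + 4);
    h.type = u16(p + 6);
    h.control = p[8];
    h.subcrate = p[9];
    h.processorId = u16(p + 10);
    return h;
}
```

After these 12 bytes come the CAMAC words, whose layout depends on the experiment.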

Understanding this is a bit like understanding how files are stored on your computer, at a very basic level. When you write a text file in your editor, each letter is stored using 1 byte (8 bits) of space on your disk, which is the equivalent of a variable of type char. In this case, the ASCII code is used to translate a number that can be written with 8 bits (i.e., a number between 0 and 255) into a letter that a human can understand.
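As a tiny illustration of that last point, the sketch below turns a list of byte values into text by treating each byte as a character code. The function name bytesToText is made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Interprets each byte as a character code and builds the corresponding text.
std::string bytesToText(const std::vector<unsigned char>& bytes) {
    std::string text;
    for (unsigned char b : bytes) {
        text += static_cast<char>(b);  // e.g. 72 is 'H' in ASCII
    }
    return text;
}
```

For instance, the bytes 72 and 105 come out as "Hi". The lmd files work the other way around: the bytes are numbers to be decoded field by field, not letters, which is why a text editor shows garbage.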

The specific structure of an event

As I said before, I had to analyze the data of two experiments, carried out in 1996 and in 2000. They had a different data structure:

  • 1996: The CAMAC module ID and its value were printed ONLY if the CAMAC had given a signal for that event. So the event length was variable. The purpose of doing it this way was to save space.
  • 2000: Only the CAMAC value was printed to the file, but all the CAMAC signals were printed (always in the same order), even if they had given no signal, in which case a zero would be recorded. In this case the event length was constant and had a value of 42 bytes, or 52 in octal, that is, we had 42 signals in total.
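The difference between the two layouts can be sketched like this. It is only an illustration under assumptions: the payload is taken to be already split into 16-bit words, the 1996 pairs are assumed to come as value first and module ID second (as in the subevent table), and all names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// 16-bit words already extracted from a subevent payload (assumption).
using Words = std::vector<std::uint16_t>;

// 1996 layout: (value, module ID) pairs, present only for modules that fired,
// so the number of words changes from event to event.
std::map<std::uint16_t, std::uint16_t> decode1996(const Words& payload) {
    assert(payload.size() % 2 == 0);  // words must come in pairs
    std::map<std::uint16_t, std::uint16_t> valueById;
    for (std::size_t i = 0; i + 1 < payload.size(); i += 2) {
        valueById[payload[i + 1]] = payload[i];
    }
    return valueById;
}

// 2000 layout: one value per signal, always in the same order; a zero means
// "no signal". The position in the list identifies the signal.
std::map<std::uint16_t, std::uint16_t> decode2000(const Words& payload) {
    std::map<std::uint16_t, std::uint16_t> valueByIndex;
    for (std::size_t i = 0; i < payload.size(); ++i) {
        if (payload[i] != 0) {
            valueByIndex[static_cast<std::uint16_t>(i)] = payload[i];
        }
    }
    return valueByIndex;
}
```

Both functions return the same kind of map, so the rest of an analysis program would not need to care which experiment a file came from.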

The CAMACs used in each experiment were different too. They were used to collect time and energy information from different detectors (Germanium, Silicon, NaI and BaF scintillators, etc.).

So the easiest files to start looking at are the second ones. We just have to print enough bytes and search for a 52, which is a 42 written across four bytes, the size of the event length field. But you cannot open these files in a text editor, remember? We need to write a translation program for it. How do we do that? And how do we check that we are reading what we have to read?
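Before turning to octal dump, here is what that search could look like in code. It is a sketch, not the method used in this series: it assumes little-endian byte order and a buffer already loaded in memory, and it looks for a 4-byte 42 followed by subtype 1 and type 10. The name findEventStarts is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A leading 0 makes a C++ literal octal: 52 in octal is 42 in decimal.
static_assert(052 == 42, "octal 52 is decimal 42");

// Returns every offset where the 8-byte pattern
// length = 42 (4 bytes), subtype = 1 (2 bytes), type = 10 (2 bytes) appears,
// assuming each value is stored least significant byte first.
std::vector<std::size_t> findEventStarts(const std::vector<unsigned char>& buf) {
    const unsigned char pattern[8] = {42, 0, 0, 0, 1, 0, 10, 0};
    std::vector<std::size_t> hits;
    for (std::size_t pos = 0; pos + 8 <= buf.size(); ++pos) {
        if (std::equal(pattern, pattern + 8, buf.data() + pos)) {
            hits.push_back(pos);
        }
    }
    return hits;
}
```

Checking the subtype and type together with the length cuts down on false positives, since a lone 42 can easily show up inside the data itself.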

Octal dump comes to the rescue, in Part 3.