Processing WAV files

by Tom van Stiphout, 25 February 2012

Introduction

WAV files are just one example of a large class of files called structured files. We will use WAV files throughout, but the techniques discussed here apply to all structured files. In my office, the WAV files are created by Avaya's IP Office, a business-class phone system that can save recordings of phone conversations. No doubt you have heard the disclaimer "Phone calls may be recorded for quality assurance". That is exactly what we do.

Plain text files, such as the ones you edit in Notepad, are unstructured. Apart from the characters meaning something to the reader, the file contents are simply a collection of bytes carrying no other specific information.

The contents of a structured file are arranged in a specific, defined way. WAV files, image files, Word documents, Access databases, and executables are examples of structured files. Each has its main content: the sound, the pixels of the image, the text of a document, the database objects, or the machine instructions. Beyond that, these files contain a lot more information.


The structure of a WAV file

If you open a WAV file in Notepad, you will see some human-readable text among a lot of funny characters. The structure, however, is not immediately evident. This is an example of what you might see:

[Screenshot: a WAV file opened in Notepad]

If, however, the same file is opened in a hex editor such as 010 Editor, the structure becomes apparent:

This editor has built-in knowledge of many structured file types, so I did not have to look up the structure of a WAV file on the Internet (e.g. WaveFormat), which was nice. The image shows the individual bytes in hexadecimal notation. At the top right we see the same data as ASCII text. A period is displayed for bytes that have no printable ASCII equivalent.
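
The hex and ASCII panes are easy to reproduce. Below is a minimal sketch of a hex dump, written in Python rather than VBA only because it is compact and runs anywhere. Each row shows an offset, sixteen bytes in hexadecimal, and the same bytes as ASCII. Anything outside the printable range 20-7E hex is shown as a period. The function name hex_dump and the 16-byte row width are my own illustrative choices, not something 010 Editor defines.

```python
def hex_dump(data, width=16):
    """Format bytes like a hex editor: offset, hex bytes, then an ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (space through tilde) is shown as-is; anything else as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad short final rows so the ASCII column stays aligned.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return lines


if __name__ == "__main__":
    for row in hex_dump(b"RIFF\xa0\x0b\x00\x00WAVEfmt "):
        print(row)
```

To dump a real recording, pass it the file's bytes, e.g. hex_dump(open(path, "rb").read()).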

The bottom pane shows that the file is made up of five structures. The first, a WAVRIFFHEADER, is highlighted; it starts at 0 hex and has a size of C hex (12 decimal). After that comes a FORMATCHUNK starting at C hex with a length of 18 hex (24 decimal), then an UNKNOWNCHUNK (more about this later), a FACTCHUNK, and finally the actual sound bytes in the DATACHUNK.
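
These offsets and sizes follow directly from the field layout. The Python sketch below describes the first two structures with the standard struct module, reading them little-endian and without padding. groupID, chunkID, chunkSize and dwSamplesPerSec are names used in this article; the other field names are my assumption. The 24-byte FORMATCHUNK also assumes a 16-byte format body after the 8-byte name/size header, which is what its 18 hex length implies.

```python
import struct
from collections import namedtuple

# "<" = little-endian with no alignment padding, exactly as the bytes sit in the file.
RIFF_HEADER = struct.Struct("<4sI4s")       # groupID ("RIFF"), size, riffType ("WAVE")
FORMAT_CHUNK = struct.Struct("<4sIHHIIHH")  # chunkID ("fmt "), chunkSize, 16 bytes of format data

RiffHeader = namedtuple("RiffHeader", "groupID size riffType")
FormatChunk = namedtuple(
    "FormatChunk",
    "chunkID chunkSize wFormatTag wChannels dwSamplesPerSec "
    "dwAvgBytesPerSec wBlockAlign wBitsPerSample",
)


def parse_start(data):
    """Unpack the WAVRIFFHEADER at offset 0 and the FORMATCHUNK that follows it."""
    riff = RiffHeader._make(RIFF_HEADER.unpack_from(data, 0))
    fmt = FormatChunk._make(FORMAT_CHUNK.unpack_from(data, RIFF_HEADER.size))
    return riff, fmt


print(hex(RIFF_HEADER.size))                      # 0xc: FORMATCHUNK starts at C hex
print(hex(FORMAT_CHUNK.size))                     # 0x18: it is 24 bytes long
print(hex(RIFF_HEADER.size + FORMAT_CHUNK.size))  # 0x24: where the next chunk starts
```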

If we expand these five structures by clicking the triangle to the left of each entry, we see:

Great! Now we can see the actual structure of the file, not just random bits. Each chunk starts with its name, followed by its length, followed by the chunk data. The name and size would come in handy if I wanted to process the file a byte at a time, because together they tell you what each chunk is and where it ends. Later we will see a more efficient way of processing structured files. We can also see that some fields have a fixed size and others a variable size. For example, the first field, groupID, is always 4 bytes and always set to RIFF (there is a finer point here, but it is out of scope for this article). The last field, samples, is an example of a variable-length field. Its size depends on the amount of sound in the file, and it is given by the chunkSize field that precedes it.
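
Because every chunk starts with a 4-byte name and a 4-byte size, a program can hop from chunk to chunk without understanding what is inside them. Here is a minimal Python sketch, assuming the whole file has been read into memory; list_chunks is a hypothetical helper name. It also applies a RIFF rule not discussed above: a chunk with an odd size is followed by one pad byte, so the next chunk always starts at an even offset.

```python
import struct


def list_chunks(data):
    """Return (chunk_id, offset, chunk_size) for every chunk after the RIFF header.

    chunk_size is the value of the size field, i.e. the number of data bytes
    that follow the chunk's 8-byte name/size header.
    """
    group_id, _riff_size, riff_type = struct.unpack_from("<4sI4s", data, 0)
    if group_id != b"RIFF" or riff_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    offset = 12  # the WAVRIFFHEADER is 12 bytes long
    while offset + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        chunks.append((chunk_id.decode("ascii", "replace"), offset, size))
        # Skip the header and the data; an odd-sized chunk is followed by one pad byte.
        offset += 8 + size + (size & 1)
    return chunks
```

On a real recording you would call list_chunks(open(path, "rb").read()) and get one entry per structure shown in the editor's bottom pane.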

In my application, information such as the number of channels or the sample rate is interesting but not essential. My objective in reading the WAV file is to find out who called whom. I can then associate each phone recording with the correct Company and Contact in our custom CRM application.

It turns out this information is available in the UNKNOWNCHUNK. This can already be seen, to some extent, in the first screenshot above. Unfortunately, as the name suggests, this was a part of the file that nothing was known about. It is a vendor-specific chunk that Avaya and others can use, or not, as they see fit.

Avaya does not publicly document this chunk, so now it's time for some detective work.

The data shows some numbers that we recognize: internal 3-digit extensions, external phone numbers (9 for an outside line, followed by a 10-digit phone number), and today's date and time. Most of the text seems to fall in 32-byte blocks. A few bytes, such as those at 008Ch, we had no idea about. We then compared several files and made some experimental recordings of our own (e.g. what happens if we forward an incoming call to another extension?). From these we were able to derive quite a bit about the structure of this unknown chunk.
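
Part of this detective work is easy to automate. The Python sketch below uses two hypothetical helper names. printable_runs lists every run of printable ASCII with its offset, much like the Unix strings tool, which makes the spacing of text fields easy to see. differing_offsets compares the chunk bytes from two recordings, so fields that change between calls (numbers, dates, times) show up as clusters of offsets. How you obtain the chunk bytes is left open; slicing them out of the file at the chunk's offset is one option.

```python
import re


def printable_runs(data, min_len=3):
    """(offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def differing_offsets(a, b):
    """Offsets where two byte strings differ (compared over the shorter length)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Runs that start 32 bytes apart point to fixed-size text fields. A run that starts at an odd offset hints at a small binary field in between.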

First is the chunk ID. 010 Editor calls this chunk UNKNOWNCHUNK, but the ID stored in the file is ALCH, probably because Avaya used to have a phone product named Alchemy.

Next is the chunk size, which shows that this chunk is roughly 3 KB. That is quite large: it means that even a file with only one millisecond of sound will be at least 3 KB in size.

Next is a 32-byte block with the name of the manufacturer. It is always set to "Avaya" for these phone system WAV files. Your WAV files may have a different manufacturer, or may not even include the UNKNOWNCHUNK.

Next is another 32-byte block with either the phone extension if it was an outgoing call, or the phone number of the incoming call.

Next is the name of the user for that extension, or the caller ID of the incoming call.

After that comes a 4-byte block whose meaning we were not able to determine. In our type definition we call it "unknown1".

We were able to identify several other fields. The details are in this VBA type definition:

Private Type ALCHCHUNK
    chunkID         As String * 4               'ALCH (Avaya used to have a product called Alchemy)
    chunkSize       As Long
    manufacturer    As String * 32              'Avaya
    caller          As String * 32              'Extension (out) or Phone number (in)
    callerDisplay   As String * 32              'User name (out) or caller id max 15 chars (in)
    unknown1        As String * 4               'NOTE: binary data
    unknown2        As String * 2               'always null?
    numberDialed    As String * 32
    numberForward   As String * 32
    unknown3        As String * 4               'always null?
    callerDisplay2  As String * 32
    forwardDisplay  As String * 32
    unknown4        As String * 32
    unknown5        As String * 32
    unknown6        As String * 24
    unknown7        As String * 32
    forwardDisplay2 As String * 32
    unknown8        As String * 4               'NOTE: binary data
    voice           As String * 32              'Voice
    Direction       As String * 32              'AutoRecordingIncomingUser or AutoRecordingOutgoingUser
    date            As String * 32
    time            As String * 32
    extension       As String * 32
    unknown9        As String * 232
    machineName     As String * 32
    unknown10       As String * 2176
End Type

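Note the use of fixed-length strings for the 32-byte text blocks, and also for the fields we could not decode. A Long in VBA is a four-byte integer, stored little-endian just like the RIFF size fields, so it can be used for the chunkSize.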
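
For readers working outside VBA, the same type can be written as a Python struct format string. This sketch is derived only from the type definition above, not from any Avaya documentation. The fixed-length strings become raw byte fields and the Long becomes a 4-byte little-endian integer. The field sizes add up to 2,998 bytes, the "roughly 3 KB" mentioned earlier. unknown1 sits 104 bytes into the chunk. If the chunk starts at 24 hex, directly after the 24-byte FORMATCHUNK, that puts unknown1 at 8C hex, the unexplained bytes noticed earlier. read_alch and field_text are hypothetical helper names.

```python
import struct
from collections import namedtuple

# (name, size in bytes) in file order, copied from the ALCHCHUNK type definition.
ALCH_FIELDS = [
    ("chunkID", 4), ("chunkSize", 4), ("manufacturer", 32), ("caller", 32),
    ("callerDisplay", 32), ("unknown1", 4), ("unknown2", 2), ("numberDialed", 32),
    ("numberForward", 32), ("unknown3", 4), ("callerDisplay2", 32),
    ("forwardDisplay", 32), ("unknown4", 32), ("unknown5", 32), ("unknown6", 24),
    ("unknown7", 32), ("forwardDisplay2", 32), ("unknown8", 4), ("voice", 32),
    ("Direction", 32), ("date", 32), ("time", 32), ("extension", 32),
    ("unknown9", 232), ("machineName", 32), ("unknown10", 2176),
]

# "<": little-endian, no padding. chunkSize is a 4-byte integer (a VBA Long);
# every other field is read as raw bytes, like a VBA fixed-length string.
ALCH_FORMAT = "<" + "".join(
    "I" if name == "chunkSize" else f"{size}s" for name, size in ALCH_FIELDS
)
AlchChunk = namedtuple("AlchChunk", [name for name, _ in ALCH_FIELDS])


def read_alch(data, offset=0):
    """Unpack one ALCH chunk starting at offset, like VBA's Get into an ALCHCHUNK."""
    return AlchChunk._make(struct.unpack_from(ALCH_FORMAT, data, offset))


def field_text(raw):
    """Text of a fixed-length field, cut at the first null byte."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")
```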

With enough of the structure defined to start reading data, we moved on to enhancing our CRM application.


Document Links Application

We have a database with companies and contacts. By decoding the WAV file structure and writing VBA to store each piece in the correct field, we can associate phone calls (documents) with the correct records. In this screenshot you can see the finished product. On the main form there is a new "Docs" column for associated documents, showing how many there are. When the user clicks the paperclip icon, the Recordings form opens. It displays basic information about each recording and lets the user play it.

[Screenshot: the main form with the new Docs column, and the Recordings form]

Let's look at how we created this new functionality.

First we need a table to store the Recordings information. Then we need a way to fill it with available recordings. That's what the "Process Files" button on the main form does:


Private Sub cmdProcessFiles_Click()
    Dim strFile         As String
    Dim objWav          As New clsWavFile
    Dim rsRecordings    As DAO.Recordset

    Set rsRecordings = CurrentDb.OpenRecordset("Recordings", dbOpenDynaset)
    
    strFile = Dir$(RECORDINGS_FOLDER & "*.WAV")
    While strFile <> ""
        'Check if new file (doubling any apostrophe keeps the criteria string valid).
        rsRecordings.FindFirst "FileName='" & Replace(strFile, "'", "''") & "'"
        If rsRecordings.NoMatch Then
            'It's a new file. Read it and save important fields to table.
            objWav.Read RECORDINGS_FOLDER & strFile
            
            rsRecordings.AddNew
            rsRecordings!CustomerID = GetCustomerID(IIf(objWav.Direction = "I", objWav.FromPhoneNumber, objWav.ToPhoneNumber))
            rsRecordings!FileName = strFile
            rsRecordings!Direction = objWav.Direction
            rsRecordings!FromPhone = objWav.FromPhoneNumber
            rsRecordings!ToPhone = objWav.ToPhoneNumber
            rsRecordings.Update
        End If

        'Prepare for next iteration
        strFile = Dir$
    Wend
    
    'Requery the form so the recordings will show.
    Me.Requery
    
    'Final cleanup
    rsRecordings.Close
    Set rsRecordings = Nothing
    Set objWav = Nothing
End Sub


After declaring some variables (we will discuss clsWavFile shortly), in line 6 (counting the Private Sub line as line 1) we open a recordset on the Recordings table.

In line 8 we set up a loop over all *.WAV files in the folder with recordings.

The Avaya phone system ensures filenames are unique, so in line 11 FindFirst is used to check if the current file has already been processed. If it's a new file, then in line 14 the WAV File object is used to read the file.

From line 16 onward a new record is added to the Recordings table, setting the field values equal to the properties of the WAV File object. Note that the code here knows nothing about the structure of a WAV file; it only knows how to use some methods and properties of the WAV File object.

In line 26 we move to the next file, if any.

After some cleanup we are done and the main form will display the latest recording information!
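
The same flow can be sketched outside Access. The Python version below illustrates the idea under stated assumptions; it is not a port of the article's code. SQLite stands in for the Access table, and a set of already-stored file names replaces the FindFirst lookup. read_wav and get_customer_id are hypothetical stand-ins for clsWavFile.Read and GetCustomerID, whose code is not shown here. read_wav is assumed to return a dict with direction, from_phone and to_phone keys. Because the values are passed as query parameters, a file name containing an apostrophe cannot break the SQL.

```python
import sqlite3
from pathlib import Path


def process_files(conn, folder, read_wav, get_customer_id):
    """Add a Recordings row for each .WAV file in folder not already in the table.

    Returns the number of rows added.
    """
    known = {name for (name,) in conn.execute("SELECT FileName FROM Recordings")}
    added = 0
    for path in sorted(Path(folder).iterdir()):
        # Case-insensitive extension test, like Dir$("*.WAV") on Windows.
        if not path.is_file() or path.suffix.upper() != ".WAV" or path.name in known:
            continue
        info = read_wav(path)
        # Incoming ("I"): the customer is the caller; otherwise the number called.
        phone = info["from_phone"] if info["direction"] == "I" else info["to_phone"]
        conn.execute(
            "INSERT INTO Recordings (CustomerID, FileName, Direction, FromPhone, ToPhone) "
            "VALUES (?, ?, ?, ?, ?)",
            (get_customer_id(phone), path.name, info["direction"],
             info["from_phone"], info["to_phone"]),
        )
        added += 1
    conn.commit()
    return added
```

Running it a second time adds nothing, because every file name is already in the table. This mirrors the NoMatch check in the VBA version.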


WAV File class

It is a good idea to encapsulate specific functionality, such as processing a WAV file, in its own class. When you download the sample and read this class, you will see that it contains very little code. That is the beauty of structured files: once you know the structure and have defined it, reading the many fields of information takes only one line of code, using VBA's "Get" statement. For example, to read the UNKNOWNCHUNK, we have defined its structure as the Private Type ALCHCHUNK listed above. We then define a variable of that type:

Private m_ac As ALCHCHUNK

Then we read the entire chunk (all 3 KB) with one line of code:

Get #intFile, , m_ac    'intFile: file number opened For Binary, positioned at the start of the chunk


The current WAV File class contains only the methods and properties I needed. However, it is easy to add your own. Say you want to show how many channels of sound there are, or what the sample rate is. You simply create additional Property Get procedures that return the values from the FORMATCHUNK:

Public Property Get SamplesPerSec() As Long
   SamplesPerSec = m_fc.dwSamplesPerSec
End Property
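
In Python, a read-only @property plays the same role as a VBA Property Get. The sketch below is a hypothetical miniature of such a class, not the downloadable clsWavFile. It walks the chunks once, keeps the six FORMATCHUNK fields that follow chunkID and chunkSize, and exposes two of them. The class, method and property names are illustrative.

```python
import struct


class WavFile:
    """Reads a WAV file's format chunk and exposes selected fields as properties."""

    def __init__(self):
        self._fmt = None

    def read(self, path):
        with open(path, "rb") as f:
            self.read_bytes(f.read())

    def read_bytes(self, data):
        self._fmt = None
        offset = 12  # skip the 12-byte RIFF header
        while offset + 8 <= len(data):
            chunk_id, size = struct.unpack_from("<4sI", data, offset)
            if chunk_id == b"fmt ":
                # wFormatTag, wChannels, dwSamplesPerSec, dwAvgBytesPerSec,
                # wBlockAlign, wBitsPerSample (16 bytes, little-endian)
                self._fmt = struct.unpack_from("<HHIIHH", data, offset + 8)
            offset += 8 + size + (size & 1)  # odd-sized chunks carry a pad byte
        if self._fmt is None:
            raise ValueError("no fmt chunk found")

    @property
    def channels(self):
        return self._fmt[1]

    @property
    def samples_per_sec(self):
        return self._fmt[2]
```

Adding another property, such as bits per sample, is one more three-line method returning self._fmt[5].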


Summary

Structured files usually contain lots of information in addition to the raw bytes for the file type at hand. We extracted caller information from WAV files for use in our CRM application. Now that you have seen how easy it is to look inside files with a hex editor, you should try it! It takes a little time to figure things out, but the rewards are great.

Through the link below you can download the sample program in Access 2007 format.

DocumentLinks.zip (509.34 kb)
