The purpose of this exercise is to compare different algorithmic approaches for adjusting a sequence of sound samples. We can then analyze how the different GCC optimization levels increase or decrease the performance of each approach on both x86_64 and AArch64 architectures.

## Creating audio sample

First, we will create an audio sample to test with. Since we would like to compare the runtimes of the algorithms, a substantial data set is required, so we will create a sequence of 500000000 sound samples. This will be coded in C, and we will store the sound samples in an array of signed 16-bit integers (`int16_t`, range -32768 to 32767).

```c
#include <stdint.h>
#include <stdlib.h>

#define SAMPLESNUM 500000000

void createAudioSample(int16_t* sample_) {
    for (int i = 0; i < SAMPLESNUM; i++) {
        sample_[i] = rand();
    }
}

int main() {
    int16_t* sample = malloc(SAMPLESNUM * sizeof(int16_t));
    createAudioSample(sample);
    free(sample);
}
```

## Increase Volume

### First algorithm – “Naive” approach

The first algorithm simply multiplies each sound sample by a volume scale factor and stores the result in a new array.

```c
const float scale = 0.5; // volume scaling factor

void naiveVolumeUp(int16_t* sample_, int16_t* newSample_) {
    for (int i = 0; i < SAMPLESNUM; i++) {
        newSample_[i] = sample_[i] * scale;
    }
}
```

### Second algorithm – Look up table approach

For this approach, we will create a lookup table holding every possible sample value from -32768 to 32767 multiplied by the volume scale factor (in our case 0.5). We can then index into the table to adjust each sample by the scale factor, replacing a multiplication per sample with an array lookup.

```c
#define MAXSIZE 65536 // number of distinct signed 16-bit integer values
#define HALF    32768 // half of that range; offsets [-32768, 32767] to [0, 65535]

...

void lookupTableVolumeUp(int16_t* sample_, int16_t* newSample_) {
    // Create lookup table: the scaled result for every possible sample value
    int16_t lookupTable[MAXSIZE];
    for (int counter = 0; counter < MAXSIZE; counter++) {
        lookupTable[counter] = ((counter - HALF) * scale);
    }

    // Increase using lookupTable
    for (int i = 0; i < SAMPLESNUM; i++) {
        newSample_[i] = lookupTable[sample_[i] + HALF];
    }
}
```

Here is a function to calculate our functions’ execution times:

```c
void printExecTime(struct timeval t1, struct timeval t2) {
    double elapsed = (t2.tv_sec - t1.tv_sec) + 1e-6 * (t2.tv_usec - t1.tv_usec);
    printf("elapsed: %.8lf seconds\n", elapsed);
}
```

And our main:

```c
int main() {
    struct timeval t1, t2;

    int16_t* sample = malloc(SAMPLESNUM * sizeof(int16_t));
    createAudioSample(sample);
    printf("\nAudio sample\n============\n");
    printSpecifiedRange(sample, 0, 7);

    int16_t* newSample = malloc(SAMPLESNUM * sizeof(int16_t));
    printf("\nNaive volume up\n===============\n");
    gettimeofday(&t1, NULL);                // starting time
    naiveVolumeUp(sample, newSample);       // run naive approach
    gettimeofday(&t2, NULL);                // end time
    printExecTime(t1, t2);
    printSpecifiedRange(newSample, 0, 7);

    free(newSample);
    newSample = malloc(SAMPLESNUM * sizeof(int16_t));
    printf("\nLookup volume up\n================\n");
    gettimeofday(&t1, NULL);                // starting time
    lookupTableVolumeUp(sample, newSample); // run lookup table approach
    gettimeofday(&t2, NULL);                // end time
    printExecTime(t1, t2);
    printSpecifiedRange(newSample, 0, 7);

    free(newSample);
    free(sample);
    return 0;
}
```

Compile our code:

`gcc -o lab5 lab5.c`

`time ./lab5`

```
Audio sample
============
sample[0]=17767 sample[1]=9158 sample[2]=-26519 sample[3]=18547
sample[4]=-9135 sample[5]=23807 sample[6]=-27574 sample[7]=22764

Naive volume up
===============
elapsed: 6.56288600 seconds
sample[0]=8883 sample[1]=4579 sample[2]=-13259 sample[3]=9273
sample[4]=-4567 sample[5]=11903 sample[6]=-13787 sample[7]=11382

Lookup volume up
================
elapsed: 0.11074700 seconds
sample[0]=8883 sample[1]=4579 sample[2]=-13259 sample[3]=9273
sample[4]=-4567 sample[5]=11903 sample[6]=-13787 sample[7]=11382

real    0m16.653s
user    0m11.863s
sys     0m1.145s
```

We can see the lookup approach is significantly faster. Now we'll test the different optimization levels on both Xerxes (**x86_64**) and Betty (**AArch64**).

## Runtime w/ optimization levels

I compiled the code with each optimization level from O0 to O3, i.e.:

`gcc -O0 -o lab5 lab5.c`

…

`gcc -O3 -o lab5 lab5.c`

`c99 -O0 -o lab5 lab5.c`

…

`c99 -O3 -o lab5 lab5.c`

And recorded the results into a spreadsheet:

The **Functions** section of the report shows both functions' execution times, displayed in *seconds*.

The first thing I noticed at *O0* was that the gap between the Naive and Lookup functions was significantly wider on Betty than on Xerxes. Going from *O0* to *O1*, the Naive function improves from 3.84 to 1.18 seconds on Xerxes, and from 8.94 to 1.79 seconds on Betty. From *O1* to *O3* the ratio between the two functions begins to even out and is relatively similar across the two architectures.

The **time** section of the report displays results from the `time ./lab5` command, which reports the runtime of the whole program execution as real, user, and sys time.

The **size** section displays results from the `size lab5` command.

We can see that the reported sizes are consistently larger on Betty than on Xerxes. One exception stands out: at the O3 optimization level, the size on Xerxes is 4634 bytes, while on Betty it is only 2934 bytes.