Analyzing C compiler options, ELF files & optimization on x86_64 architecture

This post compiles a simple hello world C program and explores how different compiler options affect the resulting ELF (Executable and Linkable Format) executable file. Everything is done on an x86-64 system.

hello.c

#include <stdio.h>

int main()
{
  printf("Hello world!\n");
}

gcc options:

-g           # enable debugging information
-O0          # do not optimize
-fno-builtin # do not use builtin function optimizations

objdump options:

-f       # display header information for the entire file
-s       # display the full contents of all sections
-d       # disassemble sections containing code
--source # (implies -d) show source code, if available, along with disassembly

To find out which sections of the executable contain our code, we will use the -d option:

objdump -d hello

The 3 disassembled sections we see are .init, .plt and .text.

To find out which section contains our string to be printed, we use:

objdump -s hello

If we search for the string “Hello”, we will see that it is inside of the .rodata section.

Contents of section .rodata:

400590 01000200 00000000 00000000 00000000 ................
4005a0 48656c6c 6f20776f 726c6421 0a00     Hello world!..

Now we will compare our original ELF file with new ones using different compiler options:

gcc -g -O0 -fno-builtin hello.c -o hello

ls -l hello
Size of the executable is 11000 bytes.

objdump --source hello


00000000004004f6 <main>:
#include <stdio.h>

int main()
{
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
printf("Hello world!\n");
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: b8 00 00 00 00 mov $0x0,%eax
400504: e8 e7 fe ff ff callq 4003f0 <printf@plt>
400509: b8 00 00 00 00 mov $0x0,%eax
}
40050e: 5d             pop %rbp
40050f: c3             retq

Adding/Removing compile options

1. Adding the -static option

gcc -g -O0 -fno-builtin -static hello.c -o hello-1

The ELF file has grown to 916088 bytes because the library code that would otherwise be loaded from shared libraries at runtime is now linked into the binary itself. This is good for portability, since the binary no longer depends on those shared libraries being present at runtime.

2. Removing -fno-builtin option

gcc -g -O0 hello.c -o hello-2

objdump --source hello-2

00000000004004f6 <main>:
#include <stdio.h>

int main()
{
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
printf("Hello world!\n");
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: e8 ec fe ff ff callq 4003f0 <puts@plt>
400504: b8 00 00 00 00 mov $0x0,%eax
}
400509: 5d             pop %rbp
40050a: c3             retq
40050b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

With -fno-builtin removed, built-in function optimization is enabled again, so GCC replaces the printf() call with the simpler puts(): the format string has no conversion specifiers and ends in a newline. One instruction has also been removed, the mov $0x0,%eax before the call. It was only needed for the variadic printf(), where %al tells the callee how many vector registers carry arguments. Size is now 8528 bytes.

3. Removing -g option

gcc -O0 hello.c -o hello-3

00000000004004f6 <main>:
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: e8 ec fe ff ff callq 4003f0 <puts@plt>
400504: b8 00 00 00 00 mov $0x0,%eax
400509: 5d             pop %rbp
40050a: c3             retq
40050b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Without -g, no debugging information is attached, so objdump can no longer interleave the source code with the disassembly. Size is now 917680 bytes.

4. Adding additional arguments to printf() function

To analyze the differences in registers used for every argument added, 10 sequential integer arguments were added to the printf() function:

hello-4.c

#include <stdio.h>

int main()
{
  printf("Hello world!, %d%d%d%d%d%d%d%d%d%d\n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}

gcc -O0 -g hello-4.c -o hello-4

objdump --source hello-4

00000000004004f6 <main>:
#include <stdio.h>

int main()
{
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
    printf("Hello world!, %d%d%d%d%d%d%d%d%d%d\n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  4004fa:       48 83 ec 08             sub    $0x8,%rsp
  4004fe:       6a 0a                   pushq  $0xa
  400500:       6a 09                   pushq  $0x9
  400502:       6a 08                   pushq  $0x8
  400504:       6a 07                   pushq  $0x7
  400506:       6a 06                   pushq  $0x6
  400508:       41 b9 05 00 00 00       mov    $0x5,%r9d
  40050e:       41 b8 04 00 00 00       mov    $0x4,%r8d
  400514:       b9 03 00 00 00          mov    $0x3,%ecx
  400519:       ba 02 00 00 00          mov    $0x2,%edx
  40051e:       be 01 00 00 00          mov    $0x1,%esi
  400523:       bf d0 05 40 00          mov    $0x4005d0,%edi
  400528:       b8 00 00 00 00          mov    $0x0,%eax
  40052d:       e8 be fe ff ff          callq  4003f0 <printf@plt>
  400532:       48 83 c4 30             add    $0x30,%rsp
  400536:       b8 00 00 00 00          mov    $0x0,%eax
}
  40053b:       c9                      leaveq 
  40053c:       c3                      retq   
  40053d:       0f 1f 00                nopl   (%rax)

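At addresses 400508 to 400523, the first six arguments (the format string and the integers 1 to 5) are loaded into the registers %edi, %esi, %edx, %ecx, %r8d and %r9d with mov instructions. The x86-64 calling convention provides only six registers for integer arguments, so the remaining values, 6 to 10, are pushed onto the stack with pushq in reverse order (addresses 4004fe to 400506).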

5. Move printf to new function and call function from main

hello-5.c

#include <stdio.h>

void output()
{
    printf("Hello world!\n");
}

int main()
{
    output();
 
    return 0;
}

g++ -g -fno-builtin -O0 hello-5.c -o hello5

In main, we can see the call to the new output function. Because the file was compiled with g++, the symbol appears under its C++ mangled name, _Z6outputv:

00000000004005cc <main>:
...
4005d0:       e8 e1 ff ff ff          callq  4005b6 <_Z6outputv>
...

And the output function:

00000000004005b6 <_Z6outputv>:
#include <stdio.h>

void output()
{
  4005b6:       55                      push   %rbp
  4005b7:       48 89 e5                mov    %rsp,%rbp
    printf("Hello world!\n");
  4005ba:       bf 70 06 40 00          mov    $0x400670,%edi
  4005bf:       b8 00 00 00 00          mov    $0x0,%eax
  4005c4:       e8 e7 fe ff ff          callq  4004b0 <printf@plt>
}
  4005c9:       90                      nop
  4005ca:       5d                      pop    %rbp
  4005cb:       c3                      retq

6. Add optimization O3

We previously compiled our original hello world program with -O0, which disables optimization.
From man gcc:
-O0 Reduce compilation time and make debugging produce the expected results. This is the default.
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

Set O3 optimization:
g++ -g -fno-builtin -O3 hello.c -o hello_O3

Our original binary file size was 11000 bytes, whereas hello_O3 is 11248 bytes.

Now we'll dump the ELF header of each file with readelf and compare the two with diff:

readelf -h hello > hello_readelfh.txt
readelf -h hello_O3 > hello_O3_readelfh.txt
diff hello_readelfh.txt hello_O3_readelfh.txt

< hello_readelfh.txt
> hello_O3_readelfh.txt

11c11
<   Entry point address:               0x4004c0
---
>   Entry point address:               0x4004e0
13c13
<   Start of section headers:          8784 (bytes into file)
---
>   Start of section headers:          8944 (bytes into file)
19,20c19,20
<   Number of section headers:         35
<   Section header string table index: 32
---
>   Number of section headers:         36
>   Section header string table index: 33

In the -O3 binary, the section header table starts 160 bytes later than in the -O0 binary (8944 vs. 8784), the entry point moves from 0x4004c0 to 0x4004e0, and there is one extra section header (36 vs. 35).
