glibc – finding functions optimized for x86_64 but not aarch64

One approach to finding a possible candidate function for optimization was suggested by our professor. This approach was to look for all functions optimized in x86_64 but not on aarch64, since these are the two architectures we have worked with throughout the course.

I wrote a one-liner bash script to achieve this (note: all commands below are run inside the src/glibc folder):

for f in `find ./sysdeps/x86_64/* -name "*.c" |
> sed -rn 's/\.\/sysdeps\/x86_64\/(.*\/)*(.*\.c)/\1\2/p'`; 
> do ls ./sysdeps/aarch64/$f 2>>ls_error.txt; 
> done

This command will look for all c files in the sysdeps/x86_64 directory (recursively) and search for each corresponding filename in sysdeps/aarch64 (with the same directory structure as x86_64). Files that are found in x86_64 but not in aarch64 will cause an error in the ls ./sysdeps/aarch64/$f command and we can redirect those errors to ls_error.txt (files that exist in both will simply redirect to standard output).

$ less ls_error.txt
ls: cannot access './sysdeps/aarch64/dl-procinfo.c': No such file or directory
ls: cannot access './sysdeps/aarch64/dl-runtime.c': No such file or directory
...
...

To get the filenames only from ls_error.txt, I run:

$ cat ls_error.txt | sed -rn "s/^ls.*'\.\/(.*\/)*(.*\.c)'.*$/\2/p" > functions-x86_64_not_aarch64.txt
$ less functions-x86_64_not_aarch64.txt
dl-procinfo.c
dl-runtime.c
...

Now I can go through this list and look for a function that can potentially be optimized for aarch64.

I had previously run pieces of the one-liner script above to write the functions found in x86_64 (to functions_x86_64.txt) and aarch64 (to functions-aarch64.txt) so I could count the number of functions in each architecture and get an idea of what I was dealing with. The functions-x86_64_aarch64.txt file includes functions that exist in both x86_64 and aarch64.

$ ls -l functions*
functions-aarch64.txt
functions-x86_64_aarch64.txt
functions-x86_64_not_aarch64.txt
functions-x86_64.txt
$ wc -l functions*
   54 functions-aarch64.txt
   24 functions-x86_64_aarch64.txt
  178 functions-x86_64_not_aarch64.txt
  199 functions-x86_64.txt

There are 178 files in x86_64 that are not in aarch64. Many of these however are test files, so I excluded the test and tst files:

$ cat functions-x86_64_not_aarch64.txt |
> grep -vP 'test|tst' > functions-x86_64_not_aarch64_notests.txt
$ wc -l functions*
93 functions-x86_64_not_aarch64_notests.txt

Now there are 93 possible functions I can choose from. I will follow up this post with some of my findings.

glibc difftime – no need for optimization

Upon further investigation, difftime can be left as is with no further optimization. Any optimization that can be done will have minimal effect in execution time. I will go over why that is.

double
__difftime (time_t time1, time_t time0)
{
  /* Convert to double and then subtract if no double-rounding error could
     result.  */

  if (TYPE_BITS (time_t) <= DBL_MANT_DIG
      || (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double)))
    return (double) time1 - (double) time0;

  /* Likewise for long double.  */

  if (TYPE_BITS (time_t) <= LDBL_MANT_DIG || TYPE_FLOATING (time_t))
    return (long double) time1 - (long double) time0;

  /* Subtract the smaller integer from the larger, convert the difference to
     double, and then negate if needed.  */

  return time1 < time0 ? - subtract (time0, time1) : subtract (time1, time0);
}

For the first if condition, TYPE_BITS (time_t) and DBL_MANT_DIG are both constants, so the pre-processor will compare them at compile time and strip them from the executable altogether if they evaluate to true. The same applies to the second if condition. TYPES_BITS <= LDBL_MANT_DIG will be evaluated at compile time.

We can further validate this by compiling the code and looking at the assembly file:

I wrote a tester file that uses time.h's difftime.c:

// len_difftime_test.c
#include <stdio.h>
#include <time.h>
#include <limits.h>
#include <stdint.h>

int main(){
    // test time_t to uint_max conversion
    time_t time1 = time(NULL);
    time_t time0 = time(NULL) + 10;
    uintmax_t dt = (uintmax_t) time1 - (uintmax_t) time0;
    double delta = dt;
    printf("time1 = %d\ntime0 = %d\n", time1, time0);
    printf("(uintmax_t) time1 = %d\n", time1);
    printf("(uintmax_t) time0 = %d\n", time0);

    // test difftime function
    double result;
    result = difftime(time1, time0);
    printf("difftime(time1, time0) = %f\n", result);
    result = difftime(time0, time1);
    printf("difftime(time0, time1) = %f\n", result);

    return 0;
}

Compile:
gcc -g -o len_difftime_test len_difftime_test.c

I use gdb debugger to get to line 18 which makes the first call to difftime.
gdb len_difftime_test

Set a breakpoint at line 18 and run:

(gdb) b 18
Breakpoint 1 at 0x400638: file len_difftime_test.c, line 18.
(gdb) r
Starting program: /home/lisac/SourceCode/Seneca/spo600/project/src/glibc/time/len_difftime_test 
time1 = 1490051018
time0 = 1490051028
(uintmax_t) time1 = 1490051018
(uintmax_t) time0 = 1490051028

Breakpoint 1, main () at len_difftime_test.c:18
18      result = difftime(time1, time0);

Step into the difftime function:
__difftime (time1=1490051390, time0=1490051400) at difftime.c:103
103 {
(gdb) s
114     return (long double) time1 - (long double) time0;
(gdb) s
120 }

Short circuiting or test-reordering will not improve the executable since the pre-processor will rid of the comparison of constants when they evaluate to true. As we can see on line 17, there is no condition, only the returning subtract calculation.

Here is the pre-processor output:

cpp difftime.c

  if ((sizeof (time_t) * 8) <= 53 <-- removed
      || (((time_t) 0.5 == 0.5) && sizeof (time_t) < sizeof (long double))) <-- removed
    return (double) time1 - (double) time0;



  if ((sizeof (time_t) * 8) <= 64 || ((time_t) 0.5 == 0.5)) <-- removed
    return (long double) time1 - (long double) time0;

Now I will be looking into more functions that are better candidates for optimization.

Makefile rules & recipes

Recipes for the rules defined in your Makefiles require specific indentation. Each line in a recipe (i.e. “tests” below) must start with a tab character.

You can run:

cat -n -E -T Makefile
option description
n Show line numbers
e equivalent to -vE
t equivalent to -vT
E, –show-ends display $ at end of each line
T, –show-tabs display TAB characters as ^I
v, –show-nonprinting use ^ and M- notation, except for LFD and TAB

Which produces something like:

include ../Makeconfig$
$
headers := time.h sys/time.h sys/timeb.h bits/time.h^I^I^I\$
^I   bits/types/clockid_t.h bits/types/clock_t.h^I^I^I\$
^I   bits/types/struct_itimerspec.h^I^I^I^I\$
^I   bits/types/struct_timespec.h bits/types/struct_timeval.h^I\$
^I   bits/types/struct_tm.h bits/types/timer_t.h^I^I^I\$
^I   bits/types/time_t.h$
$
routines := offtime asctime clock ctime ctime_r difftime \$
^I    gmtime localtime mktime time^I^I \$
^I    gettimeofday settimeofday adjtime tzset^I \$
^I    tzfile getitimer setitimer^I^I^I \$
^I    stime dysize timegm ftime^I^I^I \$
^I    getdate strptime strptime_l^I^I^I \$
^I    strftime wcsftime strftime_l wcsftime_l^I \$
^I    timespec_get$
aux :=^I    era alt_digit lc-time-cleanup$
$
tests := test_time clocktest tst-posixtz tst-strptime tst_wcsftime \$
^I   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \$
^I   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \$
^I   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \$
^I   tst-tzname$
$

Where ^I represents a tab character and $ represents a newline character. You can use this to check for valid tab and newline indentation in your recipes in case you run into this error: *** missing separator. Stop.
If you’re using vi, make sure to use :set noet to disable replacement of tabs with a tabwidth set number of spaces.

Analyzing C compiler options, ELF files & optimization on x86_64 architecture

Compiling a simple hello world C program and exploring some of the different compiler options and how they affect the ELF (Executable and Linkable Format) executable file. This will be done on an x86-64 architecture system.

hello.c

#include <stdio.h>

int main()
{
  printf("Hello world!\n");
}

gcc options:

-g           # enable debugging information
-O0          # do not optimize
-fno-builtin # do not use builtin function optimizations

objdump options:

-f       # display header information for the entire file
-s       # display per-section summary information
-d       # disassemble sections containing code
--source # (implies -d) show source code, if available, along with disassembly

To find out which section headers in the executable contain our code, we will use the -d option:

objdump -d hello

The 3 disassembled sections we see are .init, .plt and .text.

To find out which section contains our string to be printed, we use:

objdump -s hello

If we search for the string “Hello”, we will see that it is inside of the .rodata section.

Contents of section .rodata:

400590 01000200 00000000 00000000 00000000 ................
4005a0 48656c6c 6f20776f 726c6421 0a00     Hello world!..

Now we will compare our original ELF file with new ones using different compiler options:

gcc -g -O0 -fno-builtin hello.c -o hello

ls -l hello
Size of the executable is 11000 bytes.

objdump --source hello


00000000004004f6 :
#include

int main()
{
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
printf("Hello world!\n");
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: b8 00 00 00 00 mov $0x0,%eax
4004ff: e8 ec fe ff ff callq 4003f0 &lt;puts@plt&gt; 400504: b8 00 00 00 00 mov $0x0,%eax
}
400509: 5d             pop %rbp
40050a: c3             retq
40050b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Adding/Removing compile options

1. Adding the -static option

gcc -g -O0 -fno-builtin -static hello.c -o hello-1

Size of ELF file has grown to 916088 bytes since the dynamic libraries are added to the binary file. This is good for portability since the binary does not require dependencies at runtime in order to run.

2. Removing -fno-builtin option

gcc -g -O0 hello.c -o hello-2

objdump --source hello-2

00000000004004f6 :
#include

int main()
{
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
printf("Hello world!\n");
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
<del datetime="2017-02-02T16:23:20+00:00">4004ff: b8 00 00 00 00 mov $0x0,%eax</del>
400504: e8 e7 fe ff ff callq 4003f0 &lt;printf@plt&gt; 400509: b8 00 00 00 00 mov $0x0,%eax
}
40050e: 5d             pop %rbp
40050f: c3             retq

Since built-in function optimization is not enabled, we can see one instruction has been removed from the ELF file – size is now 8528 bytes.

3. Removing -g option

gcc -O0 hello.c -o hello-3

00000000004004f6 :
4004f6: 55             push %rbp
4004f7: 48 89 e5       mov %rsp,%rbp
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: e8 ec fe ff ff callq 4003f0 &lt;puts@plt&gt; 400504: b8 00 00 00 00 mov $0x0,%eax
400509: 5d             pop %rbp
40050a: c3             retq
40050b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

This compiles without attaching debugging information, and we cannot see the inline source code – size is now 917680 bytes.

4. Adding additional arguments to printf() function

To analyze the differences in registers used for every argument added, 10 sequential integer arguments were added to the printf() function:

hello-4.c

#include

int main()
{
printf("Hello World!, %d%d%d%d%d%d%d%d%d%d\n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}

gcc -O0 -g hello.c -o hello-4

objdump --source lab4-4

00000000004004f6 <main>:
#include <stdio.h>

int main()
{
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
    printf("Hello world!, %d%d%d%d%d%d%d%d%d%d\n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  4004fa:       48 83 ec 08             sub    $0x8,%rsp
  4004fe:       6a 0a                   pushq  $0xa
  400500:       6a 09                   pushq  $0x9
  400502:       6a 08                   pushq  $0x8
  400504:       6a 07                   pushq  $0x7
  400506:       6a 06                   pushq  $0x6
  400508:       41 b9 05 00 00 00       mov    $0x5,%r9d
  40050e:       41 b8 04 00 00 00       mov    $0x4,%r8d
  400514:       b9 03 00 00 00          mov    $0x3,%ecx
  400519:       ba 02 00 00 00          mov    $0x2,%edx
  40051e:       be 01 00 00 00          mov    $0x1,%esi
  400523:       bf d0 05 40 00          mov    $0x4005d0,%edi
  400528:       b8 00 00 00 00          mov    $0x0,%eax
  40052d:       e8 be fe ff ff          callq  4003f0 <printf@plt>
  400532:       48 83 c4 30             add    $0x30,%rsp
  400536:       b8 00 00 00 00          mov    $0x0,%eax
}
  40053b:       c9                      leaveq 
  40053c:       c3                      retq   
  40053d:       0f 1f 00                nopl   (%rax)

From lines 21 to 15, you can see each argument being stored into registers with the mov) function up until it reaches the 6th argument where it uses pushq.

5. Move printf to new function and call function from main

hello-5.c

#include <stdio.h>

void output()
{
    printf("Hello world!\n");
}

int main()
{
    output();
 
    return 0;
}

g++ -g -fno-builtin -O0 hello-5.c -o hello5

We can see a new line for main, where we call the new output function:

00000000004005cc <main>:
...
4005d0:       e8 e1 ff ff ff          callq  4005b6 <_Z6outputv>
...

And the output function:

00000000004005b6 <_Z6outputv>:
#include <stdio.h>

void output()
{
  4005b6:       55                      push   %rbp
  4005b7:       48 89 e5                mov    %rsp,%rbp
    printf("Hello world!\n");
  4005ba:       bf 70 06 40 00          mov    $0x400670,%edi
  4005bf:       b8 00 00 00 00          mov    $0x0,%eax
  4005c4:       e8 e7 fe ff ff          callq  4004b0 <printf@plt>
}
  4005c9:       90                      nop
  4005ca:       5d                      pop    %rbp
  4005cb:       c3                      retq

6. Add optimization O3

We previously compiled our original hello world program with the -O0 optimization.
From mangcc:
-O0 Reduce compilation time and make debugging produce the expected results. This is the default.
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

Set O3 optimization:
g++ -g -fno-builtin -O3 hello.c -o hello_O3

Our original binary file size was 11000 bytes, whereas hello_O3 is 11248.

Now we’ll compare the two ELF files and run a diff comparison:


readelf -h hello > hello_readelfh.txt
readelf -h hello6 > hello6_readelfh.txt
diff hello_readelfh.txt hello6_readelfh.txt

< hello_readelfh.txt
> hello6_readelfh.txt

11c11
<   Entry point address:               0x4004c0
---
>   Entry point address:               0x4004e0
13c13
<   Start of section headers:          8784 (bytes into file)
---
>   Start of section headers:          8944 (bytes into file)
19,20c19,20
<   Number of section headers:         35
<   Section header string table index: 32
---
>   Number of section headers:         36
>   Section header string table index: 33

The start of the section headers starts 160 bytes later than the O0 optimized binary, and there is one extra section header for O3.

Contributing to open source project – npm mysql packages.json

I chose the npm mysql package to contribute to. In this post, I will go over the steps I took for contributing a minor fix to an open source project, using a Linux terminal, Vi editor and Git.

Review Process

Looking at the package.json file, I noticed that the repository field URL did not contain an absolute URL path.  So I ran the file through the package-json-validator:

package-json_result

The validator also recommends to include the “keywords” and “bugs” fields.  I read over the official npm documentation to see how to properly write these fields in package.json.

Repository:

"repository" :
  { "type" : "git",
    "url" : "https://github.com/npm/npm.git"
  }

Keywords: An array of strings to help developers find the package using ‘npm search’.

Bugs:

{
  "url" : "https://github.com/owner/project/issues",
  "email" : "project@hostname.com"
}

I can adjust the fields now to match these formats.

Adding my changes

  • Fork repository from https://github.com/mysqljs/mysql
  • Clone project to my local workspace: git clone git@github.com:lkisac/mysql.git
  • I used Vim editor to edit the package.json file: vi packages.json
  • Adjust repository URL to:
    "repository": {
      "type": "git",
      "url": "http://github.com/mysqljs/mysql"
    },
    
  • Add change to git staging: git add package.json
  • I thought it might be a good idea to keep this change separate from the “keywords + bugs” fields change for the sake of having separate commit messages for each, so I committed this change first: git commit -m "replaced repository url with valid repository url and type"

  • Then back to editing package.json in vi to add the “keywords” and “bugs” fields:
"keywords": [
  "mysql",
  "sql",
  "database",
  "query"
],
"bugs": "http://github.com/mysqljs/mysql/issues",

Since I only included the issues URL in the bugs field, I followed the suggestion in npm of having it as only a single string instead of an object.

  • Add change: git add package.json
  • Commit change: git commit -m "added keywords + bugs url"
  • View changes in commits before pushing to repo: git show

After running git show, I noticed the indentation was off (4 spaces instead of 2) on the “sql” keyword line.  I had already set my tabwidth setting to 2, but found out that this setting does not insert spaces in replacement of a tab character. To do this you have to set the shiftwidth and set expandtab. So I added these two lines to my ~/.vimrc file (as root user, or in Fedora /etc/vimrc):


:set shiftwidth=2
:set expandtab

I switched back to my regular user and opened the package.json file again in vi.  If I press enter now from the “mysql” line, vi now automatically inserts 2 spaces at the beginning of the line.  To double check this, after hitting enter you can backspace and notice the cursor now moves back one space instead of the tabwidth’s set number of spaces.

An alternate quick fix for this of course would be to use a regular expression substitution in vi on lines 13 to 23:

:13,23s/\t/  /g

This will replace all tab characters with two spaces from lines 13 to 23.

Continuing with my changes… now I save these changes and exit vi CTRL + ZZ

  • See if changes are correct: git diff package.json
  • Spaces look good now, so I can add my changes: git add package.json
  • Commit changes to git: git commit -m "fixed vi indent to 2 spaces for sql keyword"

Now looking at the last three commits I made, I figured since the latest commit was only a minor fix to the previous, I wanted to have the two as only one commit.

git log --max-count 3

git_last_3_commits_combine

You can do this with a “git rebase”:

git rebase --interactive HEAD~2

This opens the git rebase file:

git_rebase_start

Here I want to pick the “added keywords + bugs url” as the main commit, and squash the “fixed vi indent…” commit into the main commit.

git_rebase_squash

Now I can hit Shift + ZZ to save my changes and exit git rebase.

Git now displays the combination of commit messages:

git_rebase_commit_msg

Since I want to only use the “added keywords + bugs url” commit message, I can delete the 2nd commit message.

git_rebase_new_commit_msg

Hit Shift + ZZ again to save the changes and exit.

Now the “fixed vi indent…” commit has been squashed with the “added keywords + bugs url” commit as one commit.

git_final_squash_log

Now I can push to the remote repository:
git push

One thing to note, as mentioned in github regarding git rebase, if the commits you squashed locally were previously already pushed to your remote repository, you would have to run:

git push origin branch-name --force

Although you have to be careful when choosing to do a rebase on already pushed commits and make sure they have not been reviewed or used in any way.  Use –force only only for very recent commits.

Creating Pull Request

Create the pull request by clicking the ‘Create pull request’ button in GitHub, then write a description of the changes you’ve made and create the new pull request.

pull_request_checks

Pull Request Review

One of the collaborators were quick to respond within 3 minutes of the pull request.  It was noted that npm automatically populates these fields now so adjusting the packages.json file is not necessary.  The new npm also no longer uses the keywords array in the search and the validator URL is also valid as a shortcut URL for git.

So I ended up creating a new issue for the validator-tool and the maintainer confirmed that the validator is slightly outdated and said they would be looking into updating it when they had the chance.

Vi settings

Show Line Numbers:

:set nu

Set tab size:

:set tabstop=4

Turn off word wrap:

:set nowrap

To permanently use these settings:
Add the lines to ~/.exrc (or ~/.vimrc) file

vi ~/.exrc