glibc – nexttowardf candidate for optimization on aarch64

In my previous post, I went over the steps for finding functions that are optimized for x86_64 but not for aarch64.

After going through the list of functions, I came across nexttowardf.c:

[lisac@localhost glibc]$ find ./* -name "*nexttowardf*"

From the nexttowardf man page:

“The nextafter(), nextafterf(), and nextafterl() functions return the next representable floating-point value following x in the direction of y. If y is less than x, these functions will return the largest representable number less than x.

If x equals y, the functions return y.

The nexttoward(), nexttowardf(), and nexttowardl() functions do the same as the corresponding nextafter() functions, except that they have a long double second argument.”

Looking at the x86_64 optimization sysdeps/x86_64/fpu/s_nexttowardf.c:

#include <sysdeps/i386/fpu/s_nexttowardf.c>

This file contains only an include statement for the i386 optimization. So I had a look at sysdeps/i386/fpu/s_nexttowardf.c, and it is essentially identical to the generic math/s_nexttowardf.c version. One thing to note is the set of macros used inside __nexttowardf.



float __nexttowardf(float x, long double y)
    int32_t hx,hy,ix,iy;
    u_int32_t ly;

    ix = hx&0x7fffffff;     /* |x| */
    iy = hy&0x7fffffff;     /* |y| */


/* Get a 32 bit int from a float.  */
# define GET_FLOAT_WORD(i,d)                    \
do {                                \
  ieee_float_shape_type gf_u;                   \
  gf_u.value = (d);                     \
  (i) = gf_u.word;                      \
} while (0)
/* Set a float from a 32 bit int.  */
# define SET_FLOAT_WORD(d,i)                    \
do {                                \
  ieee_float_shape_type sf_u;                   \
  sf_u.word = (i);                      \
  (d) = sf_u.value;                     \
} while (0)
/* Get two 32 bit ints from a double.  */

#define EXTRACT_WORDS(ix0,ix1,d)                \
do {                                \
  ieee_double_shape_type ew_u;                  \
  ew_u.value = (d);                     \
  (ix0) =;                   \
  (ix1) =;                   \
} while (0)



float __nexttowardf(float x, long double y)
    int32_t hx,ix,iy;
    u_int32_t hy,ly,esy;

    ix = hx&0x7fffffff;     /* |x| */
    iy = esy&0x7fff;        /* |y| */


/* Direct movement of float into integer register.  */
#define GET_FLOAT_WORD(i, d) \
  do {                                        \
    int i_;                                   \
    asm (MOVD " %1, %0" : "=rm" (i_) : "x" ((float) (d)));            \
    (i) = i_;                                     \
  } while (0)
/* And the reverse.  */
#define SET_FLOAT_WORD(f, i) \
  do {                                        \
    int i_ = i;                                   \
    float f__;                                    \
    asm (MOVD " %1, %0" : "=x" (f__) : "rm" (i_));                \
    f = f__;                                      \
  } while (0)


/* Get three 32 bit ints from a double.  */

#define GET_LDOUBLE_WORDS(exp,ix0,ix1,d)            \
do {                                \
  ieee_long_double_shape_type ew_u;             \
  ew_u.value = (d);                     \
  (exp) =;             \
  (ix0) =;                   \
  (ix1) =;                   \
} while (0)

The x86_64 definitions of GET_FLOAT_WORD and SET_FLOAT_WORD use inline assembly. I will try a similar approach for aarch64.


glibc – finding functions optimized for x86_64 but not aarch64

One approach to finding a possible candidate function for optimization was suggested by our professor. This approach was to look for all functions optimized in x86_64 but not on aarch64, since these are the two architectures we have worked with throughout the course.

I wrote a one-liner bash script to achieve this (note: all commands below are run inside the src/glibc folder):

for f in `find ./sysdeps/x86_64/* -name "*.c" |
> sed -rn 's/\.\/sysdeps\/x86_64\/(.*\/)*(.*\.c)/\1\2/p'`; 
> do ls ./sysdeps/aarch64/$f 2>>ls_error.txt; 
> done

This command looks for all C files in the sysdeps/x86_64 directory (recursively) and searches for each corresponding filename in sysdeps/aarch64 (with the same directory structure as x86_64). Files that are found in x86_64 but not in aarch64 cause an error in the ls ./sysdeps/aarch64/$f command, and those errors are redirected to ls_error.txt (files that exist in both are simply printed to standard output).

$ less ls_error.txt
ls: cannot access './sysdeps/aarch64/dl-procinfo.c': No such file or directory
ls: cannot access './sysdeps/aarch64/dl-runtime.c': No such file or directory

To get the filenames only from ls_error.txt, I run:

$ cat ls_error.txt | sed -rn "s/^ls.*'\.\/(.*\/)*(.*\.c)'.*$/\2/p" > functions-x86_64_not_aarch64.txt
$ less functions-x86_64_not_aarch64.txt

Now I can go through this list and look for a function that can potentially be optimized for aarch64.

I had previously run pieces of the one-liner above to write the functions found in x86_64 (to functions-x86_64.txt) and aarch64 (to functions-aarch64.txt) so I could count the number of functions in each architecture and get an idea of what I was dealing with. The functions-x86_64_aarch64.txt file includes functions that exist in both x86_64 and aarch64.

$ ls -l functions*
$ wc -l functions*
   54 functions-aarch64.txt
   24 functions-x86_64_aarch64.txt
  178 functions-x86_64_not_aarch64.txt
  199 functions-x86_64.txt

There are 178 files in x86_64 that are not in aarch64. Many of these however are test files, so I excluded the test and tst files:

$ cat functions-x86_64_not_aarch64.txt |
> grep -vP 'test|tst' > functions-x86_64_not_aarch64_notests.txt
$ wc -l functions*
93 functions-x86_64_not_aarch64_notests.txt

Now there are 93 possible functions I can choose from. I will follow up this post with some of my findings.

glibc difftime – no need for optimization

Upon further investigation, difftime can be left as is with no further optimization. Any optimization that can be done will have a minimal effect on execution time. I will go over why that is.

double
__difftime (time_t time1, time_t time0)
{
  /* Convert to double and then subtract if no double-rounding error could
     result.  */

  if (TYPE_BITS (time_t) <= DBL_MANT_DIG
      || (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double)))
    return (double) time1 - (double) time0;

  /* Likewise for long double.  */

  if (TYPE_BITS (time_t) <= LDBL_MANT_DIG || TYPE_FLOATING (time_t))
    return (long double) time1 - (long double) time0;

  /* Subtract the smaller integer from the larger, convert the difference to
     double, and then negate if needed.  */

  return time1 < time0 ? - subtract (time0, time1) : subtract (time1, time0);
}

For the first if condition, TYPE_BITS (time_t) and DBL_MANT_DIG are both compile-time constants. The pre-processor expands the macros, and the compiler then evaluates the comparison at compile time, so the test never appears in the executable; only the branch that is actually taken survives. The same applies to the second if condition: TYPE_BITS (time_t) <= LDBL_MANT_DIG is also evaluated at compile time.

We can further validate this by compiling a small program and stepping through it in a debugger.

I wrote a tester file that calls difftime from time.h:

// len_difftime_test.c
#include <stdio.h>
#include <time.h>
#include <limits.h>
#include <stdint.h>

int main(){
    // test time_t to uint_max conversion
    time_t time1 = time(NULL);
    time_t time0 = time(NULL) + 10;
    uintmax_t dt = (uintmax_t) time1 - (uintmax_t) time0;
    double delta = dt;
    printf("time1 = %jd\ntime0 = %jd\n", (intmax_t) time1, (intmax_t) time0);
    printf("(uintmax_t) time1 = %ju\n", (uintmax_t) time1);
    printf("(uintmax_t) time0 = %ju\n", (uintmax_t) time0);

    // test difftime function
    double result;
    result = difftime(time1, time0);
    printf("difftime(time1, time0) = %f\n", result);
    result = difftime(time0, time1);
    printf("difftime(time0, time1) = %f\n", result);

    return 0;
}

gcc -g -o len_difftime_test len_difftime_test.c

I use the gdb debugger to get to line 18, which makes the first call to difftime.
gdb len_difftime_test

Set a breakpoint at line 18 and run:

(gdb) b 18
Breakpoint 1 at 0x400638: file len_difftime_test.c, line 18.
(gdb) r
Starting program: /home/lisac/SourceCode/Seneca/spo600/project/src/glibc/time/len_difftime_test 
time1 = 1490051018
time0 = 1490051028
(uintmax_t) time1 = 1490051018
(uintmax_t) time0 = 1490051028

Breakpoint 1, main () at len_difftime_test.c:18
18      result = difftime(time1, time0);

Step into the difftime function:
__difftime (time1=1490051390, time0=1490051400) at difftime.c:103
103 {
(gdb) s
114     return (long double) time1 - (long double) time0;
(gdb) s
120 }

Short-circuiting or test reordering will not improve the executable, since the compiler gets rid of the comparisons of constants when they evaluate to true. As the gdb session shows, execution goes straight to the return statement at difftime.c line 114: there is no condition, only the subtraction being returned.

Here is the pre-processor output:

cpp difftime.c

  if ((sizeof (time_t) * 8) <= 53 <-- removed
      || (((time_t) 0.5 == 0.5) && sizeof (time_t) < sizeof (long double))) <-- removed
    return (double) time1 - (double) time0;

  if ((sizeof (time_t) * 8) <= 64 || ((time_t) 0.5 == 0.5)) <-- removed
    return (long double) time1 - (long double) time0;

Now I will be looking into more functions that are better candidates for optimization.

Open Source Tooling and Automation

Here I will demonstrate an example of using various open source tooling and automation on a GitHub repository.

Create repo to test

Initial commit for new test repository includes:

  • README file
  • .gitignore for Node
  • MIT license

Initialize npm package.json file

Since I have nodejs installed on my machine, I can go ahead and pull the newly created repository to my local machine.

git pull

Initialize the package.json file:
npm init

{
  "name": "lab7",
  "version": "1.0.0",
  "description": "Open Source Tooling and Automation",
  "main": "seneca.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": "git+"
  },
  "author": "",
  "license": "MIT",
  "bugs": {
    "url": ""
  },
  "homepage": "",
  "bin": {
    "seneca": "./seneca.js"
  },
  "dependencies": {
    "commander": "^2.9.0"
  }
}

Implement JavaScript functions

/**
 * Given a string `email`, return `true` if the string is in the form
 * of a valid Seneca College email address, `false` otherwise.
 */
exports.isValidEmail = function(email) {
    // TODO: needs to be implemented
};

/**
 * Given a string `name`, return a formatted Seneca email address for
 * this person. NOTE: the email doesn't need to be real/valid/active.
 */
exports.formatSenecaEmail = function(name) {
    // TODO: needs to be implemented
};

This is my first attempt at implementing the stub functions (it will be improved later on using ESLint). The implementation also uses npm’s commander library so that command line options can be passed to the script.

You can use the bin field in the package.json file to map a command name to a script:

  "bin": {
    "seneca": "./seneca.js"
  }

Install the package globally so that the seneca command becomes available:

npm install -g

Run script from command line:

$ seneca -v

$ seneca -v

$ seneca -f lkisac
name: lkisac

$ seneca -v -f lkisac
name: lkisac

Code works as expected, although it needs some clean up. In the next section, I will show how ESLint can assist in the clean up process.

Clean code w/ ESLint

Install and configure ESLint to validate our coding style:

npm install eslint --save-dev

The --save-dev option adds ESLint as a development dependency (needed for developing the code vs. using the code).

For this example, ESLint is configured with the Airbnb style guide, without React, and with a config file in JSON format.
./node_modules/.bin/eslint --init

	Installing eslint-plugin-import, eslint-config-airbnb-base
	lab7@1.0.0 C:\github\OpenSourceToolingAutomation
	+-- eslint-config-airbnb-base@11.1.1
	`-- eslint-plugin-import@2.2.0
	  +-- builtin-modules@1.1.1
	  +-- contains-path@0.1.0
	  +-- doctrine@1.5.0
	  +-- eslint-import-resolver-node@0.2.3
	  +-- eslint-module-utils@2.0.0
	  | +-- debug@2.2.0
	  | | `-- ms@0.7.1
	  | `-- pkg-dir@1.0.0
	  +-- has@1.0.1
	  | `-- function-bind@1.1.0
	  +-- lodash.cond@4.5.2
	  `-- pkg-up@1.0.0
	    `-- find-up@1.1.2
	      `-- path-exists@2.1.0
Successfully created .eslintrc.json file in C:\github\OpenSourceToolingAutomation

Now I can run the newly configured ESLint on the JavaScript file seneca.js:

./node_modules/.bin/eslint seneca.js

Working with warnings/errors

First, there were many linebreak-style issues with the error message: “Expected linebreaks to be 'LF' but found 'CRLF'”. I fixed this by running dos2unix seneca.js to convert line endings to Unix format.

Other warnings/errors included:

  • Unexpected var, use let or const instead
  • Strings must use singlequote
  • Missing space before function parentheses

To organize these fixes properly, I grouped similar issues together.

For example, for the Unexpected var, use let or const instead error, I ran:

./node_modules/.bin/eslint seneca.js | grep 'Unexpected var, use let or const instead'

Once every line containing that issue was fixed, I committed the fix to GitHub. This makes each commit clear and specific instead of cramming all issues together into one commit.

The history of fixes can be seen in the commit log (each message is prefixed with “fixed ”).

ESLint is extremely useful to get your code to match a specific style. The config file is customizable, so any project can contain its own settings. This can help contributors follow a specific standard for a given project.

Add Travis CI to repository

Following the getting started guide, I set up my Travis account by syncing my existing GitHub account.
You can customize your .travis.yml file for a particular language. List of languages is provided here.

language: node_js
node_js:
  - "6"
install:
  - npm install
script:
  - npm test

You can also validate your yml file here by providing a link to your repository (containing the yml file), or by pasting your yml file into the textbox provided.


To keep track of your repository’s build status you can add a “build badge” to your repository.

Travis CI is used in many open source projects on GitHub. In those projects, a pull request typically has to pass one or more Travis builds before it can be merged.

Makefile rules & recipes

Recipes for the rules defined in your Makefiles require specific indentation. Each line in a recipe (i.e. “tests” below) must start with a tab character.

You can run:

cat -n -E -T Makefile

Option                   Description
-n                       show line numbers
-e                       equivalent to -vE
-t                       equivalent to -vT
-E, --show-ends          display $ at end of each line
-T, --show-tabs          display TAB characters as ^I
-v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB

Which produces something like:

include ../Makeconfig$
headers := time.h sys/time.h sys/timeb.h bits/time.h^I^I^I\$
^I   bits/types/clockid_t.h bits/types/clock_t.h^I^I^I\$
^I   bits/types/struct_itimerspec.h^I^I^I^I\$
^I   bits/types/struct_timespec.h bits/types/struct_timeval.h^I\$
^I   bits/types/struct_tm.h bits/types/timer_t.h^I^I^I\$
^I   bits/types/time_t.h$
routines := offtime asctime clock ctime ctime_r difftime \$
^I    gmtime localtime mktime time^I^I \$
^I    gettimeofday settimeofday adjtime tzset^I \$
^I    tzfile getitimer setitimer^I^I^I \$
^I    stime dysize timegm ftime^I^I^I \$
^I    getdate strptime strptime_l^I^I^I \$
^I    strftime wcsftime strftime_l wcsftime_l^I \$
^I    timespec_get$
aux :=^I    era alt_digit lc-time-cleanup$
tests := test_time clocktest tst-posixtz tst-strptime tst_wcsftime \$
^I   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \$
^I   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \$
^I   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \$
^I   tst-tzname$

Here ^I represents a tab character and $ marks the end of each line. You can use this to check that your recipes are indented with tabs in case you run into this error: *** missing separator. Stop.
If you’re using vi, use :set noet (noexpandtab) so that the tabs you type are not replaced with spaces.

glibc – proposed approach to optimize difftime/subtract

This is a continuation to my previous post on choosing a glibc function that could potentially be optimized. Now I’ll discuss my proposed approach for potential optimization.


difftime has a few handlers for calculating with doubles and long doubles, but in any other case it simply subtracts the smaller time value from the larger one and returns the result, negated if needed.

Let’s look at difftime first:

/* Return the difference between TIME1 and TIME0.  */
double
__difftime (time_t time1, time_t time0)
{
  /* Convert to double and then subtract if no double-rounding error could
     result.  */

  if (TYPE_BITS (time_t) <= DBL_MANT_DIG
      || (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double)))
    return (double) time1 - (double) time0;

  /* Likewise for long double.  */

  if (TYPE_BITS (time_t) <= LDBL_MANT_DIG || TYPE_FLOATING (time_t))
    return (long double) time1 - (long double) time0;

  /* Subtract the smaller integer from the larger, convert the difference to
     double, and then negate if needed.  */

  return time1 < time0 ? - subtract (time0, time1) : subtract (time1, time0);
}

The if condition for doubles does not contain any significantly expensive operations (e.g. multiply, divide), so it may not be necessary to change anything here. Still, if the first condition before the OR is met, the second condition does not need to be evaluated (C's || operator already stops at the first true operand), so this could also be written as:

if (TYPE_BITS (time_t) <= DBL_MANT_DIG) {return (double) time1 - (double) time0;}
if (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double)) {return (double) time1 - (double) time0;}

Since the first condition is the cheaper of the two, we test it first and immediately return our result if the condition is met. If not, we can then check the next, slightly more expensive condition.

We can apply a similar approach for the second condition:

if (TYPE_FLOATING (time_t)) {return (long double) time1 - (long double) time0;}
if (TYPE_BITS (time_t) <= LDBL_MANT_DIG) {return (long double) time1 - (long double) time0;}

Something else I noticed inside the __difftime function was that the checks for double and long double always return time1 minus time0, regardless of which is the larger value. On my particular machine (x86_64), the second if condition was true since TYPE_BITS (time_t) is no greater than LDBL_MANT_DIG, so line 11 was being executed.

double
__difftime (time_t time1, time_t time0)
{
  if (TYPE_BITS (time_t) <= DBL_MANT_DIG
      || (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double))) {
    return (double) time1 - (double) time0;
  }

  if (TYPE_BITS (time_t) <= LDBL_MANT_DIG || TYPE_FLOATING (time_t)) {
    //return time1 < time0 ? (long double) time0 - (long double) time1 : (long double) time1 - (long double) time0;
    return (long double) time1 - (long double) time0;
  }

  return time1 < time0 ? - subtract (time0, time1) : subtract (time1, time0);
}

I wrote a small tester for this:

int main() {

    // test difftime function
    time_t time1 = time(NULL);
    time_t time0 = time(NULL) + 10;
    // cast so the format specifier matches regardless of how time_t is defined
    printf("time1 = %ld\ntime0 = %ld\n", (long) time1, (long) time0);
    double result;
    result = __difftime(time1, time0);
    printf("difftime(time1, time0) = %f\n", result);
    result = __difftime(time0, time1);
    printf("difftime(time0, time1) = %f\n", result);

    return 0;
}

Which outputs:

time1 = 1489180977
time0 = 1489180987
difftime(time1, time0) = -10.000000
time1 = 1489180987
time0 = 1489180977
difftime(time0, time1) = 10.000000

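I expected both calls to return 10, but the double and long double branches have no time1 < time0 check, so the sign of the result depends on the argument order. (The C standard defines difftime as returning time1 - time0, so the negative value is the documented behaviour.) As an experiment, I included the ternary operator in both conditions: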

double
__difftime (time_t time1, time_t time0)
{
  if (TYPE_BITS (time_t) <= DBL_MANT_DIG
      || (TYPE_FLOATING (time_t) && sizeof (time_t) < sizeof (long double))) {
    return time1 < time0 ? (double) time0 - (double) time1 : (double) time1 - (double) time0;
  }

  if (TYPE_BITS (time_t) <= LDBL_MANT_DIG || TYPE_FLOATING (time_t)) {
    return time1 < time0 ? (long double) time0 - (long double) time1 : (long double) time1 - (long double) time0;
  }

  return time1 < time0 ? - subtract (time0, time1) : subtract (time1, time0);
}


New output:

time1 = 1489181645
time0 = 1489181655
difftime(time1, time0) = 10.000000
time1 = 1489181655
time0 = 1489181645
difftime(time0, time1) = 10.000000


The subtract function is only reached when neither the double nor the long double conversion is used. If time_t is not a signed type, the function simply returns the result of time1 - time0. If time_t is a signed type, more work is needed, and that is the path where any optimization would have to happen.

Front End Development w/ Visual Studio Code

VS Code

VS Code is a lightweight code editor for front-end development. It is cross-platform and supports many extensions. I will go over some of the extensions I found very useful as well as some of VS Code’s neat features.

Test Drive

Open a project

You can view your project's directory layout by simply opening the project folder in VS Code.

Here is an example opening the open source brackets project:


Change indent from tabs to spaces

VS Code by default tries to figure out the formatting based on the file you have open.
If you want to explicitly set this, you can open the settings.json file by going to File > Preferences > Settings, and set it to your preference:

"editor.tabSize": 4,
"editor.insertSpaces": true

Multi-line editing

If you have multiple lines of common code, you can hit Alt + Shift + Left click and highlight multiple lines to either delete or write new code.

You can also place the cursors arbitrarily and edit your text (while holding Alt + Shift):


Debugger for Chrome

You can download and install the Chrome Debugger extension here, or install it from within VS Code by hitting Ctrl + Shift + X and searching for ‘Debugger’.

The extension talks to Chrome over the Chrome Debugging Protocol, the same protocol that Chrome DevTools uses for debugging.

Here is a guide for getting started using Chrome Debugging with VS Code. It is also necessary to go through the README on GitHub.


Getting the VS Code Chrome debugger to work with Thimble has been a challenge. Once I have Brackets and Thimble up and running, I open localhost:3500 in my Chrome browser, and everything is OK.

Then I attach to Chrome using this launch.json configuration:

    {
        "name": "Attach",
        "type": "chrome",
        "request": "attach",
        "port": 9222,
        "url": "http://localhost:3500/en-US/",
        "webRoot": "${workspaceRoot}/client/en_US/",
        "sourceMaps": true,
        "diagnosticLogging": true,
        "sourceMapPathOverrides": {
            "scripts/*": "${webRoot}/scripts/*"
        }
    }

After trying various webRoot and sourceMapPathOverride settings, I would still see ‘sourceRoot undefined’ in my diagnostic log output:

Paths.scriptParsed: could not resolve to a file under webRoot: c:\github\ It may be external or served directly from the server's memory (and that's OK).
Target userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
SourceMap: creating for http://localhost:3500/node_modules/jquery/dist/jquery.min.js
SourceMap: sourceRoot: undefined
SourceMap: sources: ["jquery.js"]
SourceMap: webRoot: c:\github\
SourceMap: no sourceRoot specified, using webRoot + script path dirname: c:\github\\client\en_US\node_modules\jquery\dist
SourceMaps.scriptParsed: http://localhost:3500/node_modules/jquery/dist/jquery.min.js was just loaded and has mapped sources: ["c:\\github\\\\client\\en_US\\node_modules\\jquery\\dist\\jquery.js"]

Debugging example:

I will include some examples of using the remote debugger with the Thimble project, as well as the settings I used to get it working properly.


You can access a PowerShell terminal from VS Code which by default starts in your project directory.

Common Key Bindings

Open Project Folder: Ctrl + K, Ctrl + O
Go to Line: Ctrl + G
Move editor left: Ctrl + PgUp
Move editor right: Ctrl + PgDn
Split Editor: Ctrl + \
List Methods: Ctrl + Shift + O
Search for files: Ctrl + E
Search for text (in current file): Ctrl + F
Search for text (all files in project): Ctrl + Shift + F

More key bindings

Color Themes

Change the color theme by clicking File > Preferences > Color Theme, or by pressing Ctrl + K, Ctrl + T.