smirking teapot

“I’m not an expert, I’m just a dude.” -Scott Schurr, CppCon 2015.

Understanding Negative Integer Literals

Posted at — Jun 14, 2023

TL;DR: Write your negative numbers in base-10 and wrap them in parentheses.

An unusual behaviour

There are, generally, two ways to represent a negative integer literal:

  1. a number with a negative sign.
  2. two’s complement hex/oct/binary representation.

With the above knowledge, you’d expect that the functions f and g below would return the same value, -1.

Int64 f() { return -1; }
Int64 g() { return 0xffffffff; } // 32-bit hex repr of -1

And you’d be correct… if you’re writing your code in Java. In most other popular languages (like C++, Python, JavaScript, Rust, Go, Haskell, Swift, Kotlin & C#), something similar gives the following output:

f() : -1
g() : 4294967295

This surprised me. Until I read the references, I thought it was a bug.
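For a concrete, runnable version of the pseudocode above, here is a minimal Python sketch. Python has no Int64 type, so a plain int stands in for it; that substitution is an assumption of this sketch, not something the original snippet specifies.

```python
def f() -> int:
    # Unary minus applied to the literal 1.
    return -1


def g() -> int:
    # A hex literal is just another spelling of a non-negative integer.
    return 0xFFFFFFFF


print("f() :", f())  # f() : -1
print("g() :", g())  # g() : 4294967295
```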

It’s not a bug, it’s a feature

The large positive number returned earlier is the expected behavior because in those languages there are NO negative integer literals. Then what’s the -1 that’s returned in f()? Well, that’s a unary minus expression. It means that -1 is actually -(1) i.e. two tokens <minus> <integer_literal>. And this rule applies to hex/oct/bin representations as well.

This means that even though negative numbers are stored in two’s complement internally, these languages treat 0xffffffff as a positive integer literal.
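One way to check the “two tokens” claim, at least for Python, is to ask the standard ast module how it parses each spelling. This is only a sketch: the helper name shape is made up for illustration, and it assumes Python 3.8 or newer, where numeric literals parse to ast.Constant.

```python
import ast


def shape(source):
    """Describe the top-level node Python builds for a small expression."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        inner = node.operand
        if isinstance(inner, ast.Constant):
            # A minus operator wrapped around a non-negative literal.
            return ("unary minus", inner.value)
    if isinstance(node, ast.Constant):
        return ("literal", node.value)
    return ("other", None)


print(shape("1"))           # ('literal', 1)
print(shape("-1"))          # ('unary minus', 1)
print(shape("0xffffffff"))  # ('literal', 4294967295)
```

The hex spelling comes back as a plain literal with a positive value, while -1 comes back as an operator applied to the literal 1.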

The exact reason differs slightly between languages. In C++, a hex literal takes the first type in a fixed list (int, unsigned int, long, and so on) that can hold its value. The largest value a 32-bit int can hold is 0x7fffffff, so 0xffffffff does not fit in it; the next candidate type is used instead, and its value, 4294967295, is returned as the result. In Rust, an unsuffixed literal takes its type from the context (here the 64-bit return type), with the same outcome. Python and Haskell integers have arbitrary precision, so there is no fixed width that would tell you where the sign bit goes; JavaScript numbers are 64-bit floats, which raises the same question. Whatever the reason, the point stands: there are no negative integer literals, and 0xffffffff is treated as a positive integer.
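The “where does the sign bit go” problem is easy to see once the width becomes an explicit parameter. The helper below is a hypothetical sketch (the name from_twos_complement is not from any library): the same bit pattern is negative at one width and positive at another.

```python
def from_twos_complement(pattern, bits):
    """Interpret the low `bits` bits of `pattern` as a two's-complement signed integer."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    pattern &= (1 << bits) - 1  # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    # When the sign bit is set, the value lies in the negative half of the range.
    return pattern - (1 << bits) if pattern & sign_bit else pattern


print(from_twos_complement(0xFFFFFFFF, 32))  # -1
print(from_twos_complement(0xFFFFFFFF, 64))  # 4294967295
print(from_twos_complement(0x7FFFFFFF, 32))  # 2147483647
```

Without a width, a language with unbounded integers has no way to pick between the first two answers, so it takes the literal at face value.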

But, why?

A common supporting argument is that “it’s a convention to simplify the grammar”. If you treat negative integers as literals, you now need to make the parser smart enough to distinguish them from unary-minus and other arithmetic expressions.
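To make the grammar argument concrete, here is a toy comparison of two regex-based lexers. Both patterns and the helper name tokens are invented for this sketch and do not come from any real compiler; the only difference between them is whether a leading minus is folded into the number.

```python
import re

# Toy lexer A: the minus sign is always its own token (the common convention).
UNARY_MINUS_LEXER = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/()]")

# Toy lexer B: a leading minus is folded into the integer literal.
NEGATIVE_LITERAL_LEXER = re.compile(r"-?\d+|[A-Za-z_]\w*|[-+*/()]")


def tokens(lexer, source):
    """Split `source` into tokens; characters matching no rule (spaces) are skipped."""
    return lexer.findall(source)


print(tokens(UNARY_MINUS_LEXER, "a-1"))         # ['a', '-', '1']
print(tokens(NEGATIVE_LITERAL_LEXER, "a-1"))    # ['a', '-1']
print(tokens(NEGATIVE_LITERAL_LEXER, "a - 1"))  # ['a', '-', '1']
```

With lexer B, a-1 turns into two operands with no operator between them, and the result even depends on whitespace. A real implementation would need extra context to decide which minus is which, and that is the complexity the convention avoids.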

In fact, I found this behaviour even in FORTRAN, which was developed in the 1950s specifically for scientific and engineering applications. So another argument, I guess, could be to keep the behaviour consistent with mathematics, i.e. a negative number is the additive inverse of a positive number, denoted by the minus sign. But this gives rise to other challenges.

Possible challenges

First, languages have to deal with operator precedence. Some languages, like Python, JavaScript, Kotlin & Rust, allow method calls on integer literals. Since the member-access operator generally has higher precedence than the unary minus operator, reality and expectations won’t match if you don’t surround your unary minus expression with parentheses. For example:

In Kotlin:

println( -1.plus(2) ) // prints '-3'
println( (-1).plus(2) ) // prints '1'

In Python:

print(-1 .__add__(2))   # prints -3 
print((-1) .__add__(2))  # prints  1

In JavaScript:

// LHS evals to string "-1"
console.log( ( -1 ) .toString() === -1 );    // prints 'false'

// LHS evals to number -1
console.log( -1 .toString() === -1 );       // prints 'true'

In Rust:

println!( "{}", -9i32.abs( ) );   // prints -9
println!( "{}", (-9i32).abs( ) ); // prints  9

In Ada, mod and rem are evaluated before unary minus:

-7 mod 5   -- evaluates to -2
(-7) mod 5 -- evaluates to 3
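Which operators outrank unary minus differs from language to language. As a small sketch of that, Python reads the Ada example the other way around (its unary minus binds tighter than %), yet its power operator still binds tighter than unary minus. The dictionary name results is just for this illustration.

```python
# Each value is computed by Python from the expression spelled out in its key.
results = {
    "-7 % 5": -7 % 5,        # 3: the minus is applied before the modulo
    "-(7 % 5)": -(7 % 5),    # -2: the grouping Ada uses for -7 mod 5
    "-2 ** 2": -2 ** 2,      # -4: the power is applied before the minus
    "(-2) ** 2": (-2) ** 2,  # 4
}

for expression, value in results.items():
    print(expression, "=", value)
```

So even within one language, whether the parentheses matter depends on the operator sitting next to the minus.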

Second, treating negative integers as unary-minus expressions can conflict with other operators, like pre-decrement and subtraction. For example, a grammar that allows both decrement and unary minus would need to differentiate between --5 and --x, because decrement, being a read-modify-write operation, makes no semantic sense on a literal, which is constant by nature. That said, I’m not aware of any language that allows pre-decrement of both an identifier and a literal.
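Python sidesteps this conflict by not having a decrement operator at all, so -- is simply two unary minuses in a row. The sketch below uses the standard tokenize module to show this; the helper name significant_tokens is made up for the example.

```python
import io
import tokenize


def significant_tokens(source):
    """Return (type name, text) pairs for operator, number and name tokens."""
    readline = io.StringIO(source).readline
    wanted = {tokenize.OP, tokenize.NUMBER, tokenize.NAME}
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


print(significant_tokens("--5"))
# [('OP', '-'), ('OP', '-'), ('NUMBER', '5')]

x = 5
y = --x       # negates twice; nothing is written back to x
print(x, y)   # 5 5
```

A language that does have pre-decrement has to make the opposite choice for --x, which is where the literal case becomes awkward.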

Third, there’s operator overloading in languages like C++. An overloaded ‘operator-’ in C++ must have at least one parameter of a user-defined type, so it can’t be overloaded for built-in integers. Phew!
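Python offers a loose analogue of this rule, sketched below: unary minus on a user-defined class dispatches to its __neg__ method, while the built-in int type refuses to be patched. The Celsius class and the helper try_patch_int_neg are hypothetical names for this example, and the behaviour shown is CPython’s, not a statement about C++.

```python
class Celsius:
    """Hypothetical user-defined type that customises unary minus."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __neg__(self):
        # Called for the expression -some_celsius_value.
        return Celsius(-self.degrees)


def try_patch_int_neg():
    """Try to replace unary minus for built-in ints; report whether it worked."""
    try:
        int.__neg__ = lambda self: 0
    except TypeError:
        # CPython rejects attribute assignment on built-in types.
        return False
    return True


print((-Celsius(3)).degrees)  # -3
print(try_patch_int_neg())    # False
print(-1)                     # -1
```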

Some alternative solutions

Interestingly, some languages allow an alternative syntax for negative numbers.

For example, APL, an array-programming language, uses a high-minus sign (Unicode code point U+00AF), e.g. ¯3 as opposed to the regular minus in -3, to denote negative numbers. J, another array-programming language, uses an underscore prefix, e.g. _3, for the same purpose.

In Haskell, you can use the NegativeLiterals extension to treat negative numbers as a single token instead of a unary minus expression.

Then there’s Java. In Java, while the decimal integer -1 is a unary expression, it allows you to write two’s-complement representation for negative integers in hex, oct and binary. So, 0xffffffff is a literal, not an expression, which represents -1.

The bottom line: the handling of negative numbers varies between languages depending on their design choices, so tread with caution.

I’ve added all the test code in this GitHub repo. To add other languages, please create a pull request.

References