Represent 64-bit Unsigned Integer in PHP
Recently, while implementing a CRC-64 checksum for cloud file existence checks (to avoid redundant uploads), I encountered an interesting issue with large hexadecimal values in PHP. My usual approach involves a precomputed table for faster calculations, and the polynomial result is represented in hexadecimal. However, I noticed unexpected behavior when dealing with specific values.
For example, the hexadecimal value 0x995DC9BBDF1939FA should correspond to the decimal value 11051210869376104954. However, when dumped in PHP, it’s displayed as a floating-point number:
% php -r "var_dump(0x995DC9BBDF1939FA);"
float(1.1051210869376104E+19)
This is unexpected and undesirable for checksum calculations, since a float cannot represent every 64-bit value exactly. Upon investigation, I confirmed that the value is within the bounds of a 64-bit unsigned integer:
2^64 > 11051210869376104954
True
The issue arises because, despite running on a 64-bit system, PHP’s maximum signed integer value PHP_INT_MAX is smaller:
% php -r "var_dump(PHP_INT_MAX);"
int(9223372036854775807)
This limitation stems from PHP’s lack of an unsigned integer data type. It uses int (signed) and float for numerical representation.
To understand this better, let’s examine PHP_INT_MAX in binary:
01111111_11111111_11111111_11111111_11111111_11111111_11111111_11111111 (2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
63 bits
The most significant bit (MSB) represents the sign (0 for positive, 1 for negative). In PHP_INT_MAX, the MSB is 0 and the remaining 63 bits are set, giving 2^63 - 1. When an integer literal or the result of integer arithmetic exceeds PHP_INT_MAX, PHP automatically converts it to a float.
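The overflow behavior described above can be checked directly. The following is a minimal sketch, assuming a 64-bit PHP build (PHP_INT_SIZE is 8); the variable names are only for illustration:

```php
<?php
// Assumes a 64-bit PHP build, where PHP_INT_SIZE is 8.
$max        = PHP_INT_MAX;         // 2^63 - 1, still an int
$overflowed = PHP_INT_MAX + 1;     // exceeds the int range, becomes a float
$literal    = 0x995DC9BBDF1939FA;  // literal too large for int, parsed as float

var_dump(is_int($max));          // bool(true)
var_dump(is_float($overflowed)); // bool(true)
var_dump(is_float($literal));    // bool(true)
```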
So, how can we accurately represent this large number in PHP?
Solutions
Handling High and Low Bits Separately
We can split the 64-bit number into two 32-bit parts (high and low) and reconstruct it during runtime:
(hex) 0x995DC9BBDF1939FA
(bin) 10011001_01011101_11001001_10111011_11011111_00011001_00111001_11111010
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      high bits                           low bits
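Conceptually, the high bits are the number right-shifted by 32 bits, and the low bits are the number masked with 0xFFFFFFFF using the bitwise AND operator. Since PHP cannot hold the full literal as an integer, we write the two halves as separate literals, each of which fits comfortably in a signed 64-bit int: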
<?php
$high = 0x995DC9BB;
$low = 0xDF1939FA;
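We can then reconstruct the original value by left-shifting the high bits by 32 bits and combining them with the low bits using the bitwise OR operator. Because PHP integers are signed, the result prints as a negative number (-7395533204333446662 on a 64-bit build), but its 64 bits are exactly 0x995DC9BBDF1939FA, which is what matters for checksum arithmetic: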
<?php
$number = $high << 32 | $low; // 0x995DC9BBDF1939FA
Illustration
(H)                                     10011001_01011101_11001001_10111011 = 0x995DC9BB
    << 32 bits
  = -----------------------------------------------------------------------
    10011001_01011101_11001001_10111011_00000000_00000000_00000000_00000000
    OR
(L)                                     11011111_00011001_00111001_11111010 = 0xDF1939FA
  = -----------------------------------------------------------------------
    10011001_01011101_11001001_10111011_11011111_00011001_00111001_11111010 = 0x995DC9BBDF1939FA
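To confirm that the reconstructed value carries the intended bits, it helps to print it in a few forms. This is a minimal sketch, assuming a 64-bit PHP build; $highAgain and $lowAgain are illustrative names, not part of the original code:

```php
<?php
// Assumes a 64-bit PHP build (PHP_INT_SIZE === 8).
$high = 0x995DC9BB;
$low  = 0xDF1939FA;

$number = $high << 32 | $low;

// PHP has no unsigned type, so the value reads as negative...
var_dump($number);                     // int(-7395533204333446662)

// ...but the underlying 64 bits match the intended value.
echo dechex($number), PHP_EOL;         // 995dc9bbdf1939fa
echo sprintf('%u', $number), PHP_EOL;  // 11051210869376104954

// Splitting it back: >> is an arithmetic (sign-extending) shift in PHP,
// so mask after shifting to drop the copied sign bits.
$highAgain = ($number >> 32) & 0xFFFFFFFF;
$lowAgain  = $number & 0xFFFFFFFF;
var_dump($highAgain === $high); // bool(true)
var_dump($lowAgain === $low);   // bool(true)
```

The mask after the right shift matters for any checksum code that shifts such values to the right, since the sign bit would otherwise be smeared across the upper bits.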
Manual Hexadecimal to Decimal Conversion
Another approach involves converting the hexadecimal string to an integer ourselves, bypassing PHP’s implicit conversion of the literal to a float. We can achieve this using the pack and unpack functions:
<?php
// 0x995DC9BBDF1939FA
// unpack() returns an array indexed from 1, so take element 1.
$number = unpack(
    'J', pack('H*', '995DC9BBDF1939FA')
)[1];
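We use pack with the format H* to encode the hexadecimal string into a binary string. Then, unpack with the format J decodes it as a 64-bit unsigned big-endian integer. PHP still stores the result in a signed int, so it holds the same 64-bit pattern as the bitwise approach. Refer to the PHP documentation for a complete description of format codes.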
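As a quick cross-check, the two approaches can be compared for strict equality. This is a sketch under the assumption of a 64-bit PHP build (the J format is only available there); $unpacked, $fromPack, and $fromBits are illustrative names:

```php
<?php
// Assumes a 64-bit PHP build; the 'J' format requires 64-bit support.
$unpacked = unpack('J', pack('H*', '995DC9BBDF1939FA'));

// unpack() returns an array whose first element has key 1.
$fromPack = $unpacked[1];
$fromBits = 0x995DC9BB << 32 | 0xDF1939FA;

var_dump($fromPack === $fromBits);           // bool(true)

// Packing the int back to bytes recovers the original hex digits.
echo bin2hex(pack('J', $fromPack)), PHP_EOL; // 995dc9bbdf1939fa
```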
Performance Comparison
A quick benchmark comparing the two methods over 1 million iterations reveals a significant performance difference:
<?php
// helper
// ===================================================
function benchmark(callable $callback): int
{
    $start = (int) (microtime(true) * 1_000);
    $callback();
    $end = (int) (microtime(true) * 1_000);

    return $end - $start;
}

// pack && unpack
// ===================================================
$elapsed = benchmark(function () {
    for ($i = 0; $i < 1_000_000; $i++) {
        unpack(
            'J', pack('H*', '995DC9BBDF1939FA')
        );
    }
});
printf('pack & unpack: %dms'.PHP_EOL, $elapsed);

// bitwise
// ===================================================
$elapsed = benchmark(function () {
    for ($i = 0; $i < 1_000_000; $i++) {
        0x995DC9BB << 32 | 0xDF1939FA;
    }
});
printf('bitwise : %dms'.PHP_EOL, $elapsed);
On my 2019 Intel iMac, the bitwise approach took 5ms, while the pack/unpack method took 133ms, roughly a 26x difference. This is not surprising: pack and unpack carry function-call overhead, format checking, and parsing of the hexadecimal string, and ultimately they also rely on bitwise operations internally to build the integer. These extra steps contribute to the increased execution time.
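One caveat about the measurement: an expression built only from literals, such as 0x995DC9BB << 32 | 0xDF1939FA, can likely be evaluated by PHP at compile time, so that loop may mostly measure loop overhead. The variant below is a sketch, not the benchmark used for the figures above; it assumes PHP 7.4+ on a 64-bit build, keeps the operands in variables so the work happens at run time, and uses hrtime for a monotonic nanosecond clock. No timings are shown because they were not measured here:

```php
<?php
// Sketch only: timings vary by machine and PHP version, so none are shown.
$iterations = 1_000_000;

// Runtime variables, so the shift and OR are not literal-only expressions.
$high = 0x995DC9BB;
$low  = 0xDF1939FA;
$hex  = '995DC9BBDF1939FA';

// bitwise
$start = hrtime(true); // monotonic clock, nanoseconds
for ($i = 0; $i < $iterations; $i++) {
    $viaBits = $high << 32 | $low;
}
$bitwiseNs = hrtime(true) - $start;

// pack & unpack
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaPack = unpack('J', pack('H*', $hex))[1];
}
$packNs = hrtime(true) - $start;

printf('bitwise      : %.2fms' . PHP_EOL, $bitwiseNs / 1e6);
printf('pack & unpack: %.2fms' . PHP_EOL, $packNs / 1e6);
```

For a table of constants written as literals, compile-time evaluation is a benefit rather than a flaw, so the original conclusion still points in the same direction.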
Given that I’m dealing with some files around 50MiB, performance is crucial, making the bitwise approach the preferred choice.