NAME

Data::HashMap - Fast type-specialized hash maps implemented in C

SYNOPSIS

use Data::HashMap::II;

# Keyword API (fastest - bypasses method dispatch)
my $map = Data::HashMap::II->new();
hm_ii_put $map, 42, 100;
my $val = hm_ii_get $map, 42;    # 100
hm_ii_exists $map, 42;           # true
hm_ii_remove $map, 42;

# Method API (convenient - same operations)
$map->put(42, 100);
my $val = $map->get(42);         # 100
$map->exists(42);                # true
$map->remove(42);

# Counter operations (integer-value variants only)
my $count = hm_ii_incr $map, 1;  # 1
$count = hm_ii_incr $map, 1;     # 2
$count = hm_ii_decr $map, 1;     # 1

# Iteration
my @keys   = hm_ii_keys $map;
my @values = hm_ii_values $map;
my @pairs  = hm_ii_items $map;   # (k1, v1, k2, v2, ...)
while (my ($k, $v) = hm_ii_each $map) { ... }

# Bulk operations
my $href = hm_ii_to_hash $map;   # Perl hashref snapshot
hm_ii_clear $map;                # remove all entries

# LRU cache (max 1000 entries, evicts least-recently-used)
my $lru = Data::HashMap::II->new(1000);

# TTL cache (entries expire after 60 seconds)
my $ttl = Data::HashMap::II->new(0, 60);

# LRU + TTL combined
my $both = Data::HashMap::II->new(1000, 60);

# Per-key TTL
hm_ii_put_ttl $map, 42, 100, 30;   # expires in 30 seconds

# Get or set default
my $v = hm_ii_get_or_set $map, 99, 0;  # insert 0 if key 99 absent

DESCRIPTION

Data::HashMap provides fourteen type-specialized hash map implementations in C, each optimized for its specific key/value type combination. All data access uses keyword syntax, which bypasses Perl's method dispatch for maximum performance. A method-call API ($map->get($key)) is also available for convenience.

Keywords are automatically enabled when you use a variant module.

VARIANTS

Data::HashMap::I16 - int16 keys, int16 values (4 bytes/entry)
Data::HashMap::I16A - int16 keys, any Perl SV* values (refs, objects, etc.)
Data::HashMap::I16S - int16 keys, string values
Data::HashMap::I32 - int32 keys, int32 values (8 bytes/entry)
Data::HashMap::I32A - int32 keys, any Perl SV* values (refs, objects, etc.)
Data::HashMap::I32S - int32 keys, string values
Data::HashMap::II - int64 keys, int64 values (16 bytes/entry)
Data::HashMap::IA - int64 keys, any Perl SV* values (refs, objects, etc.)
Data::HashMap::IS - int64 keys, string values
Data::HashMap::SA - string keys, any Perl SV* values (refs, objects, etc.)
Data::HashMap::SI16 - string keys, int16 values
Data::HashMap::SI32 - string keys, int32 values
Data::HashMap::SI - string keys, int64 values
Data::HashMap::SS - string keys, string values

KEYWORDS

Each variant provides the following keywords (replace xx with the variant prefix: i16, i16a, i16s, i32, i32a, i32s, ia, ii, is, sa, si16, si32, si, ss):

hm_xx_put $map, $key, $value    # insert/update, returns bool
hm_xx_get $map, $key            # lookup, returns value or undef
hm_xx_exists $map, $key         # returns bool
hm_xx_remove $map, $key         # returns bool
hm_xx_size $map                 # returns entry count (includes TTL-expired not yet reaped)

# List-returning:
hm_xx_keys $map                 # returns list of keys
hm_xx_values $map               # returns list of values
hm_xx_items $map                # returns (k1,v1, k2,v2, ...)

Integer-value variants (I16, I32, II, SI16, SI32, SI) also provide:

hm_xx_incr $map, $key           # +1, returns new value (new keys init to 0)
hm_xx_decr $map, $key           # -1, returns new value (new keys init to 0)
hm_xx_incr_by $map, $key, $n    # +N, returns new value (new keys init to 0)

All variants also provide:

hm_xx_max_size $map             # returns max_size (0 = no LRU)
hm_xx_ttl $map                  # returns default TTL in seconds (0 = no TTL)
hm_xx_clear $map                # remove all entries
hm_xx_to_hash $map              # returns a Perl hashref snapshot
hm_xx_each $map                            # returns (key, value) or empty list
hm_xx_iter_reset $map                      # reset each() iterator to start
hm_xx_put_ttl $map, $key, $val, $seconds   # insert with per-key TTL ($seconds=0 uses map default)
hm_xx_get_or_set $map, $key, $default      # get existing or insert default

String-value variants (SS, IS, I32S, I16S) also provide:

hm_xx_get_direct $map, $key   # zero-copy get (read-only, see CAVEATS)

CONSTRUCTOR

my $map  = Data::HashMap::II->new();           # plain (no LRU, no TTL)
my $lru  = Data::HashMap::II->new(1000);       # LRU: max 1000 entries
my $ttl  = Data::HashMap::II->new(0, 60);      # TTL: 60-second expiry
my $both = Data::HashMap::II->new(1000, 60);   # LRU + TTL

LRU eviction

When max_size is set, the map acts as an LRU cache. Inserting beyond capacity evicts the least-recently-used entry. get, put (update), and counter operations promote the accessed entry to most-recently-used.

TTL expiry

When default_ttl is set (in seconds), entries expire lazily: expired entries are removed on access (get, exists, counter ops) and skipped during iteration (keys, values, items, each, to_hash). Note that hm_xx_size returns the count of all inserted entries including those past their TTL that have not yet been lazily removed. exists does not promote entries in LRU mode (read-only check). get_or_set inserts with the map's default TTL; per-key TTL is not supported via get_or_set (use put_ttl for that).

Individual entries can also be given a per-key TTL via hm_xx_put_ttl or $map->put_ttl($key, $value, $seconds), even on maps created without a default TTL. The expires array is lazily allocated on first use.

Both features use parallel arrays indexed by slot, keeping the core node struct unchanged. Maps created without LRU/TTL have zero overhead beyond a never-taken branch.

PERFORMANCE

Benchmarks with 100k entries on Linux x86_64 (higher is better):

INSERT (iterations/sec):
          Rate perl_ss perl_ii    SA    SS  I32A    IA  SI32   SI I32S   IS   II  I32 I16A I16S SI16  I16
perl_ss 16.1/s      --     -3%  -15%  -22%  -45%  -47%  -53% -54% -55% -58% -81% -82% -87% -87% -88% -96%
perl_ii 16.7/s      3%      --  -13%  -19%  -44%  -45%  -51% -53% -54% -57% -80% -81% -87% -87% -87% -96%
SA      19.1/s     18%     15%    --   -7%  -35%  -37%  -44% -46% -47% -50% -77% -78% -85% -85% -85% -95%
SS      20.6/s     28%     24%    8%    --  -30%  -33%  -40% -42% -43% -47% -75% -77% -84% -84% -84% -95%
I32A    29.6/s     83%     77%   55%   44%    --   -3%  -14% -16% -18% -23% -64% -67% -77% -77% -78% -92%
IA      30.5/s     89%     83%   60%   48%    3%    --  -11% -14% -15% -21% -63% -65% -76% -76% -77% -92%
SI32    34.3/s    113%    106%   80%   67%   16%   12%    --  -3%  -5% -11% -59% -61% -73% -73% -74% -91%
SI      35.3/s    119%    112%   85%   71%   19%   16%    3%   --  -2%  -8% -58% -60% -72% -73% -73% -91%
I32S    36.1/s    123%    116%   89%   75%   22%   18%    5%   2%   --  -6% -57% -59% -72% -72% -73% -91%
IS      38.5/s    138%    131%  102%   87%   30%   26%   12%   9%   7%   -- -54% -56% -70% -70% -71% -90%
II      83.1/s    414%    398%  335%  303%  181%  172%  142% 135% 130% 116%   --  -6% -35% -36% -37% -78%
I32     88.3/s    447%    430%  362%  329%  199%  189%  157% 150% 145% 129%   6%   -- -31% -32% -33% -77%
I16A     128/s    690%    666%  568%  519%  332%  318%  272% 262% 254% 231%  54%  45%   --  -1%  -3% -67%
I16S     129/s    700%    675%  577%  527%  337%  323%  276% 266% 258% 235%  55%  46%   1%   --  -2% -66%
SI16     131/s    714%    689%  589%  538%  345%  331%  283% 273% 264% 241%  58%  49%   3%   2%   -- -66%
I16      383/s   2270%   2197% 1905% 1758% 1195% 1154% 1015% 985% 961% 894% 361% 334% 200% 196% 191%   --

LOOKUP (iterations/sec):
            Rate    SS perl_ii SS_direct    SA perl_ss  SI32   SI   IS I32S IS_direct I32A   IA   II  I32 SI16 I16A I16S  I16
SS        29.6/s    --     -4%       -5%   -8%    -10%  -15% -22% -38% -44%      -47% -47% -49% -69% -73% -83% -87% -89% -93%
perl_ii   31.0/s    5%      --       -1%   -3%     -6%  -11% -18% -35% -42%      -44% -44% -46% -67% -72% -82% -86% -88% -92%
SS_direct 31.2/s    5%      1%        --   -3%     -6%  -10% -18% -34% -42%      -44% -44% -46% -67% -72% -82% -86% -88% -92%
SA        32.0/s    8%      3%        3%    --     -3%   -8% -15% -32% -40%      -42% -43% -44% -66% -71% -81% -85% -88% -92%
perl_ss   33.0/s   12%      7%        6%    3%      --   -5% -13% -30% -38%      -41% -41% -43% -65% -70% -81% -85% -87% -92%
SI32      34.7/s   17%     12%       11%    8%      5%    --  -8% -27% -35%      -37% -38% -40% -63% -69% -80% -84% -87% -92%
SI        37.8/s   28%     22%       21%   18%     15%    9%   -- -20% -29%      -32% -32% -34% -60% -66% -78% -83% -85% -91%
IS        47.4/s   60%     53%       52%   48%     44%   37%  25%   -- -11%      -15% -15% -18% -50% -57% -72% -79% -82% -88%
I32S      53.3/s   80%     72%       71%   66%     61%   54%  41%  12%   --       -4%  -4%  -8% -43% -52% -69% -76% -79% -87%
IS_direct 55.5/s   87%     79%       78%   73%     68%   60%  47%  17%   4%        --  -1%  -4% -41% -50% -67% -75% -79% -86%
I32A      55.8/s   88%     80%       79%   74%     69%   61%  47%  18%   5%        1%   --  -3% -41% -49% -67% -75% -78% -86%
IA        57.6/s   95%     86%       85%   80%     75%   66%  52%  22%   8%        4%   3%   -- -39% -48% -66% -74% -78% -86%
II        94.1/s  218%    204%      202%  194%    185%  171% 149%  99%  77%       70%  69%  63%   -- -15% -44% -57% -64% -77%
I32        110/s  273%    256%      254%  244%    234%  218% 192% 133% 107%       99%  98%  91%  17%   -- -35% -50% -57% -73%
SI16       170/s  473%    448%      444%  429%    414%  389% 348% 258% 218%      206% 204% 194%  80%  54%   -- -23% -34% -58%
I16A       221/s  645%    612%      608%  588%    568%  536% 483% 365% 314%      297% 295% 283% 134% 100%  30%   -- -15% -46%
I16S       259/s  774%    735%      730%  707%    684%  645% 584% 446% 385%      366% 364% 349% 175% 135%  52%  17%   -- -37%
I16        409/s 1281%   1219%     1211% 1175%   1138% 1078% 981% 762% 667%      636% 633% 609% 334% 271% 141%  85%  58%   --

ITERATE with each() (iterations/sec):
          Rate perl_ss perl_ii   SA   SS I32A   IA   IS   SI   II  I32 I16A  I16
perl_ss 37.7/s      --     -3% -12% -29% -36% -38% -47% -48% -62% -63% -88% -89%
perl_ii 38.9/s      3%      --  -9% -27% -34% -36% -45% -46% -61% -62% -88% -89%
SA      42.8/s     13%     10%   -- -19% -27% -30% -40% -41% -57% -58% -87% -88%
SS      52.9/s     40%     36%  24%   -- -10% -13% -25% -27% -47% -48% -84% -85%
I32A    59.0/s     56%     52%  38%  11%   --  -3% -17% -18% -41% -42% -82% -84%
IA      61.0/s     62%     57%  43%  15%   3%   -- -14% -15% -39% -40% -81% -83%
IS      71.0/s     88%     83%  66%  34%  20%  16%   --  -2% -29% -30% -78% -80%
SI      72.1/s     91%     85%  69%  36%  22%  18%   2%   -- -28% -29% -78% -80%
II       100/s    166%    158% 135%  90%  70%  65%  41%  39%   --  -2% -69% -72%
I32      102/s    170%    162% 138%  93%  73%  67%  43%  41%   2%   -- -68% -72%
I16A     323/s    757%    731% 655% 510% 448% 430% 355% 348% 222% 217%   -- -10%
I16      358/s    849%    820% 737% 576% 507% 487% 404% 396% 257% 251%  11%   --

ITERATE with keys() (iterations/sec):
          Rate perl_ss perl_ii      SS      SI      II     I32
perl_ss 94.6/s      --     -2%    -31%    -43%    -75%    -76%
perl_ii 96.9/s      2%      --    -29%    -41%    -75%    -76%
SS       136/s     44%     41%      --    -17%    -64%    -66%
SI       165/s     74%     70%     21%      --    -57%    -59%
II       383/s    305%    295%    181%    132%      --     -4%
I32      400/s    323%    313%    193%    143%      4%      --

ITERATE with items() vs each-in-loop (iterations/sec):
            Rate perl_each   perl_kv   SS_each  SS_items   II_each  II_items
perl_each 39.3/s        --       -7%      -24%      -49%      -59%      -82%
perl_kv   42.4/s        8%        --      -18%      -45%      -56%      -80%
SS_each   51.5/s       31%       22%        --      -33%      -47%      -76%
SS_items  76.9/s       96%       81%       49%        --      -21%      -64%
II_each   96.8/s      146%      128%       88%       26%        --      -55%
II_items   215/s      448%      408%      318%      180%      123%        --

INCREMENT (iterations/sec):
          Rate perl_ss perl_ii    SI32      SI      II     I32    SI16     I16
perl_ss 28.4/s      --    -10%    -21%    -22%    -67%    -70%    -78%    -92%
perl_ii 31.5/s     11%      --    -12%    -13%    -63%    -66%    -76%    -91%
SI32    35.9/s     26%     14%      --     -1%    -58%    -62%    -72%    -90%
SI      36.2/s     27%     15%      1%      --    -58%    -61%    -72%    -89%
II      85.9/s    202%    173%    139%    137%      --     -9%    -34%    -75%
I32     94.0/s    231%    198%    162%    160%      9%      --    -27%    -73%
SI16     130/s    356%    312%    261%    258%     51%     38%      --    -62%
I16      343/s   1109%    990%    857%    849%    300%    265%    165%      --

DELETE (iterations/sec):
          Rate    SS    SA SI32 perl_ss   SI perl_ii   IS I32S   IA I32A   II  I32 SI16 I16A I16S  I16
SS      11.9/s    --   -1% -24%    -28% -29%    -34% -37% -39% -44% -44% -68% -75% -85% -86% -87% -93%
SA      12.1/s    1%    -- -23%    -27% -28%    -33% -36% -38% -43% -43% -67% -75% -85% -85% -87% -93%
SI32    15.7/s   32%   30%   --     -5%  -6%    -13% -17% -20% -26% -26% -57% -67% -80% -81% -82% -91%
perl_ss 16.6/s   40%   38%   6%      --  -0%     -8% -12% -15% -22% -22% -55% -66% -79% -80% -81% -90%
SI      16.7/s   40%   38%   6%      0%   --     -8% -12% -15% -21% -22% -55% -66% -79% -80% -81% -90%
perl_ii 18.1/s   52%   50%  15%      9%   9%      --  -4%  -8% -15% -15% -51% -62% -77% -78% -80% -89%
IS      18.9/s   59%   57%  20%     14%  13%      4%   --  -4% -11% -11% -49% -61% -76% -77% -79% -89%
I32S    19.6/s   65%   63%  25%     18%  18%      8%   4%   --  -8%  -8% -47% -59% -75% -76% -78% -89%
IA      21.2/s   78%   76%  35%     28%  27%     17%  12%   8%   --  -1% -42% -56% -73% -74% -76% -88%
I32A    21.3/s   79%   77%  36%     28%  28%     18%  13%   9%   1%   -- -42% -56% -73% -74% -76% -88%
II      36.9/s  210%  206% 135%    122% 121%    104%  95%  88%  74%  73%   -- -24% -53% -55% -59% -79%
I32     48.3/s  306%  301% 208%    191% 190%    167% 156% 147% 128% 127%  31%   -- -38% -41% -46% -72%
SI16    78.0/s  555%  546% 396%    369% 368%    330% 313% 298% 267% 266% 111%  61%   --  -5% -13% -55%
I16A    82.1/s  590%  581% 423%    394% 393%    353% 335% 319% 287% 285% 123%  70%   5%   --  -8% -52%
I16S    89.5/s  652%  642% 470%    439% 437%    394% 374% 357% 322% 320% 143%  85%  15%   9%   -- -48%
I16      172/s 1348% 1328% 997%    937% 934%    850% 812% 779% 712% 708% 367% 256% 121% 110%  92%   --

LRU / TTL overhead

LRU and TTL add parallel arrays for linked-list pointers and expiry timestamps. Maps created without these features have zero overhead beyond a never-taken branch.

INSERT, II variant (iterations/sec):
             Rate     II_lru II_lru_ttl         II
II_lru     59.5/s         --        -2%       -13%
II_lru_ttl 60.7/s         2%         --       -11%
II         68.5/s        15%        13%         --

LOOKUP, II variant (iterations/sec):
             Rate     II_lru II_lru_ttl         II
II_lru     70.3/s         --        -2%       -19%
II_lru_ttl 71.6/s         2%         --       -18%
II         87.2/s        24%        22%         --

LRU EVICTION CHURN: insert 100k into capacity 50k (iterations/sec):
             Rate II_lru_ttl     II_lru
II_lru_ttl 71.9/s         --        -5%
II_lru     76.0/s         6%         --

Method vs keyword overhead

Keywords bypass Perl's method dispatch for maximum performance. Method calls ($map->get($key)) are convenient but slower:

II variant, 100k operations (iterations/sec):
                keyword    method    overhead
LOOKUP           101/s    81.4/s       -20%
INSERT          82.8/s    72.3/s       -13%

MEMORY

Memory usage with 1M entries (fork-isolated measurements):

Variant       Memory       Bytes/entry   vs Perl hash
-------       ------       -----------   ------------
I16*           0.5 MB        16            10x less
I32            28 MB         30            5.5x less
II             44 MB         46            3.5x less
I32S           72 MB         75            2.2x less
IS             72 MB         75            2.2x less
SI16           72 MB         75            2.2x less
SI32           72 MB         75            2.2x less
SI             72 MB         75            2.2x less
I16A           0.5 MB        16            10x less
I32A           90 MB         95            1.7x less
IA             90 MB         95            1.7x less
SS            118 MB        124            1.3x less
SA            137 MB        144            1.1x less
perl %h (int) 155 MB        163            (baseline)
perl %h (str) 162 MB        170            (baseline)

* I16/I16A measured at 30k entries (int16 key range limits max unique keys to ~65k)

LRU / TTL memory overhead

Overhead per entry for LRU (prev/next pointers) and TTL (expiry timestamp), measured with 1M entries in fork-isolated processes.

II variant (int64/int64):
Variant        Bytes/entry   LRU overhead   +TTL overhead
-------        -----------   ------------   -------------
II                    46.4       -              -
II_lru                67.3     +20.9 B          -
II_lru_ttl            79.9       -            +12.6 B

SS variant (string/string, various key+value sizes):
Variant        Bytes/entry   LRU overhead   +TTL overhead
-------        -----------   ------------   -------------
SS  8B keys          123.9       -              -
SS  8B lru           140.7     +16.8 B          -
SS  8B lru+ttl       152.2       -            +11.5 B
SS 16B keys          123.9       -              -
SS 16B lru           140.7     +16.8 B          -
SS 16B lru+ttl       152.2       -            +11.5 B
SS 32B keys          155.9       -              -
SS 32B lru           172.7     +16.8 B          -
SS 32B lru+ttl       181.1       -             +8.4 B
SS 64B keys          220.0       -              -
SS 64B lru           236.7     +16.8 B          -
SS 64B lru+ttl       245.1       -             +8.4 B

IMPLEMENTATION

  • Open addressing with linear probing

  • xxHash v0.8.3 (XXH3_64bits) hash functions for both integer and string keys

  • Automatic resize at 75% load factor

  • Tombstone deletion with automatic compaction

  • Raw C strings (no Perl SV overhead) for string storage

  • UTF-8 flag packed into high bit of length fields

  • Sentinel values for integer keys (INT_MIN, INT_MIN+1 are reserved and silently rejected)

  • each() iterator resets if put/remove/incr/get_or_set triggers a resize or compaction; do not mutate the map during each() iteration

  • Requires 64-bit Perl (use64bitint); II/IS/IA/SI variants use int64 keys/values via IV

  • get_direct returns a read-only SV pointing at the map's internal buffer (zero-copy, no malloc). The returned value must not be held past any map mutation (put, remove, clear, or any operation that may resize). Safe for immediate use: comparisons, printing, passing to functions. Use get (the default) when you need to store the value.

DEPENDENCIES

XS::Parse::Keyword (>= 0.40)

AUTHOR

vividsnow

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.