Some curiosities about speed differences for various allocation methods

OldFart
Some curiosities about speed differences for various allocation methods

Hi,

Just out of sheer curiosity I decided to build a little proggy to see what the differences in speed would be for the various available ways we can allocate memory. For this test I decided, per method, to build a list with 65536 nodes and subsequently deallocate every node again, 16 times in succession. The results were quite 'staggering', to say the least:

INFO : Started
INFO : -Iteration [ 0 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 1 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 2 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 3 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 4 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 5 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 6 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 7 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 8 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 9 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 10 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 11 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 12 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 13 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 14 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 15 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute

 : Method          Start            Build            Remove           Done             End              Total
 : malloc         [ 53184056 ] -- [ 53184109 ] -- [ 53365143 ] -- [ 53595826 ] -- [ 53595845 ] == [  411789 ]
 : AllocVec       [ 53623861 ] -- [ 53623898 ] -- [ 54376149 ] -- [ 54939848 ] -- [ 54939870 ] == [ 1316009 ]
 : MemPool nice   [ 54969963 ] -- [ 54970181 ] -- [ 56333899 ] -- [ 57580586 ] -- [ 57580748 ] == [ 2610785 ]
 : MemPool brute  [ 57614524 ] -- [ 57614703 ] -- [ 58976719 ] -- [ 58976737 ] -- [ 59514870 ] == [ 1900346 ]
 : ItemPool nice  [ 59547492 ] -- [ 59547733 ] -- [ 59742363 ] -- [ 59826589 ] -- [ 59869667 ] == [  322175 ]
 : ItemPool brute [ 60219450 ] -- [ 60400551 ] -- [ 60642973 ] -- [ 60642993 ] -- [ 60673755 ] == [  454305 ]
INFO : Finished

The results for both 'ItemPool' variants are surprising: deallocating all nodes on a one-by-one basis, the 'nice' way, seems to be faster than the 'brute' way, the latter taking around 40% MORE time to complete (454305 ticks for brute vs 322175 for nice, a factor of about 1.41).

The program is this one:

/*
**
** main.c - AllocationSpeed
**
**
** Jan 23, 2026: First incarnation
**************************************/
#include <stdlib.h>
#include <time.h>

#include <proto/dos.h>
#include <proto/exec.h>

int main(void)
{
    clock_t TimeArray[6][5] = { 0 };    /* 6 methods x 5 timestamps each */
    uint16 AlloCount;

    struct List *L;
    struct Node *N,
                *R;

    IDOS->Printf("INFO : Started\n");

    for (uint16 i = 0; i < 16; i++)
    {
        IDOS->Printf("INFO : -Iteration [ %ld ]\n", i);

        IDOS->Printf("INFO : -- malloc\n");
        {
            TimeArray[0][0] += clock();
            AlloCount = 0;

            if ((L = malloc(sizeof(struct List))))
            {
                IExec->NewList(L);

                TimeArray[0][1] += clock();

                do
                {
                    if ((N = malloc(sizeof(struct Node))))
                    {
                        IExec->AddTail(L, N);
                    }
                }
                while (++AlloCount);

                TimeArray[0][2] += clock();
                N = IExec->GetHead(L);

                while (N)
                {
                    R = N;
                    N = IExec->GetSucc(N);

                    IExec->Remove(R);
                    free(R);
                }

                TimeArray[0][3] += clock();
                free(L);
            }

            TimeArray[0][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        IDOS->Printf("INFO : -- AllocVec\n");
        {
            TimeArray[1][0] += clock();
            AlloCount = 0;

            if ((L = IExec->AllocVecTags(sizeof(struct List), TAG_END)))
            {
                IExec->NewList(L);

                TimeArray[1][1] += clock();

                do
                {
                    if ((N = IExec->AllocVecTags(sizeof(struct Node), TAG_END)))
                    {
                        IExec->AddTail(L, N);
                    }
                }
                while (++AlloCount);

                TimeArray[1][2] += clock();
                N = IExec->GetHead(L);

                while (N)
                {
                    R = N;
                    N = IExec->GetSucc(N);

                    IExec->Remove(R);
                    IExec->FreeVec(R);
                }

                TimeArray[1][3] += clock();
                IExec->FreeVec(L);
            }

            TimeArray[1][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- AllocVecPooled nice\n");
            TimeArray[2][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    TimeArray[2][1] += clock();

                    do
                    {
                        if ((N = IExec->AllocVecPooled(MemPool, sizeof(struct Node))))
                        {
                            IExec->AddTail(L, N);
                        }
                    }
                    while (++AlloCount);

                    TimeArray[2][2] += clock();
                    N = IExec->GetHead(L);

                    while (N)
                    {
                        R = N;
                        N = IExec->GetSucc(N);

                        IExec->Remove(R);
                        IExec->FreeVecPooled(MemPool, R);
                    }

                    TimeArray[2][3] += clock();
                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[2][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- AllocVecPooled brute\n");
            TimeArray[3][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    TimeArray[3][1] += clock();

                    do
                    {
                        if ((N = IExec->AllocVecPooled(MemPool, sizeof(struct Node))))
                        {
                            IExec->AddTail(L, N);
                        }
                    }
                    while (++AlloCount);

                    TimeArray[3][2] += clock();
                    TimeArray[3][3] += clock();
                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[3][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- ItemPoolAlloc nice\n");
            TimeArray[4][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    APTR ItemPool = IExec->AllocSysObjectTags(ASOT_ITEMPOOL, ASOITEM_MFlags, MEMF_CLEAR,
                                                              ASOITEM_ItemSize, sizeof(struct Node),
                                                              TAG_END);

                    if (ItemPool)
                    {
                        TimeArray[4][1] += clock();

                        do
                        {
                            if ((N = IExec->ItemPoolAlloc(ItemPool)))
                            {
                                IExec->AddTail(L, N);
                            }
                        }
                        while (++AlloCount);

                        TimeArray[4][2] += clock();
                        N = IExec->GetHead(L);

                        while (N)
                        {
                            R = N;
                            N = IExec->GetSucc(N);

                            IExec->Remove(R);
                            IExec->ItemPoolFree(ItemPool, R);
                        }

                        TimeArray[4][3] += clock();
                        IExec->FreeSysObject(ASOT_ITEMPOOL, ItemPool);
                    }

                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[4][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- ItemPoolAlloc brute\n");
            TimeArray[5][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    APTR ItemPool = IExec->AllocSysObjectTags(ASOT_ITEMPOOL, ASOITEM_MFlags, MEMF_CLEAR,
                                                              ASOITEM_ItemSize, sizeof(struct Node),
                                                              TAG_END);

                    if (ItemPool)
                    {
                        TimeArray[5][1] += clock();

                        do
                        {
                            if ((N = IExec->ItemPoolAlloc(ItemPool)))
                            {
                                IExec->AddTail(L, N);
                            }
                        }
                        while (++AlloCount);

                        TimeArray[5][2] += clock();
                        TimeArray[5][3] += clock();
                        IExec->FreeSysObject(ASOT_ITEMPOOL, ItemPool);
                    }

                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[5][4] += clock();
        }

    } /* for() */

    IDOS->Printf("\n : Method Start Build Remove Done End Total\n");
    IDOS->Printf(" : malloc [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[0][0], TimeArray[0][1], TimeArray[0][2], TimeArray[0][3], TimeArray[0][4], TimeArray[0][4] - TimeArray[0][0]);
    IDOS->Printf(" : AllocVec [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[1][0], TimeArray[1][1], TimeArray[1][2], TimeArray[1][3], TimeArray[1][4], TimeArray[1][4] - TimeArray[1][0]);
    IDOS->Printf(" : MemPool nice [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[2][0], TimeArray[2][1], TimeArray[2][2], TimeArray[2][3], TimeArray[2][4], TimeArray[2][4] - TimeArray[2][0]);
    IDOS->Printf(" : MemPool brute [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[3][0], TimeArray[3][1], TimeArray[3][2], TimeArray[3][3], TimeArray[3][4], TimeArray[3][4] - TimeArray[3][0]);
    IDOS->Printf(" : ItemPool nice [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[4][0], TimeArray[4][1], TimeArray[4][2], TimeArray[4][3], TimeArray[4][4], TimeArray[4][4] - TimeArray[4][0]);
    IDOS->Printf(" : ItemPool brute [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[5][0], TimeArray[5][1], TimeArray[5][2], TimeArray[5][3], TimeArray[5][4], TimeArray[5][4] - TimeArray[5][0]);

    IDOS->Printf("INFO : Finished\n");

    return 0;
}

And if you want to try it out for yourself, here's the applied makefile:

PROGNAME = AllocationSpeed

DEFINES += -DDEBUG \
           -DPROGNAME=$(PROGNAME)

#          -DUSE_ARGOPTS \
#          -DUSE_DATETIME

$(PROGNAME) : main.c
	gcc -o $(PROGNAME) main.c $(DEFINES)
	@Echo "==== Done compiling =========="

Surprising, no?

OldFart

walkero
Re: Some curiosities about speed differences for various...

Interesting tests. If I read the results correctly, malloc is the second fastest way, and AllocVec is more than three times slower, right?

OldFart
Re: Some curiosities about speed differences for various...

@walkero

Interesting tests.

And a bit disturbing at the same time!

 : Method          Total
 : malloc         == [  411789 ]  100,0%
 : AllocVec       == [ 1316009 ]  319,6%
 : MemPool nice   == [ 2610785 ]  634,0%
 : MemPool brute  == [ 1900346 ]  461,5%
 : ItemPool nice  == [  322175 ]   78,2%
 : ItemPool brute == [  454305 ]  110,3%

Like you stated: Interesting tests.

OldFart

msteed
Re: Some curiosities about speed differences for various...

Interesting results indeed, and not what one would expect. However, looking over the code I see some things that might affect the results:

o The pooled allocations use MEMF_CLEAR, while malloc() and AllocVec() do not. It takes time to clear the memory as it's allocated, so you're kind of comparing apples and oranges when comparing cleared against uncleared allocations.

o In the ItemPool tests a MemPool is created for a single List allocation. That's rather inefficient, and the overhead of setting up and freeing the MemPool might to some extent mask the speed of the ItemPool. What do you see if you use AllocVecTags() instead to allocate the List, which is more like the unpooled tests?

o You're depending on default puddle sizes for the pooled memory, which might not be optimal for this particular test, or at least might be different between the MemPool test and the ItemPool test.

o Standard libraries for C tend to do their own memory management, on top of whatever the OS provides. So the results for malloc() will likely depend on which C library was used. You seem to be using the default, which would be newlib. The results may be different for clib2/clib4.

o The presence of virtual memory can make tests like this difficult to do, as there might be a bunch of virtual memory operations going on underneath the allocations you're doing. Do you get similar results if you run the test program multiple times? What if you reverse the order of the individual tests (ItemPool brute first, malloc last)?

o Then there's the effect of the CPU's memory cache on the results, which might vary from one system to another. What Amiga are you running the test on?

Trying to look a little deeper at the results, I used the figures from your test and tried to filter out some of the overhead, looking only at the time required to perform the allocations (the difference between step 2 and step 1), the time required to perform the frees (the difference between step 3 and step 2), and the sum of those:

Test         Allocations    Frees       Total
               (2 - 1)     (3 - 2)  (Alloc + Free)
----         -----------   -------    --------
malloc         181034      230683      411717
AllocVec       752251      563699     1315950      
Pool (nice)   1363718     1246687     2610405     
Pool (brute)  1362016         18*     1362034
Item (nice)    194630       84226      278856
Item (brute)   242422         20*      242442

* The time required to free the pools was not recorded, so the time shown is just the time between two calls to clock(). It would be more informative to free the pools between the two calls to clock() for the brute tests.

As noted above, the pooled tests had to clear the memory after allocating it, so naturally that makes them take longer. It's also interesting to note that the allocation time for the "nice" and "brute" ItemPool tests is significantly different, even though the code is exactly the same.

Still, the results make a little more sense. malloc() seems to be quite fast compared to AllocVec(). AllocVecPooled() seems rather slow, but to be fair we'd need to see it without MEMF_CLEAR. ItemPools are quite fast, as expected, and would be even faster without MEMF_CLEAR.

OldFart
Re: Some curiosities about speed differences for various...

@msteed

Some valid points you bring to the table.

o The pooled allocations use MEMF_CLEAR, while malloc() and AllocVec() do not. It takes time to clear the memory as it's allocated, so you're kind of comparing apples and oranges when comparing cleared against uncleared allocations.

I tweaked the code for both the 'MemPool' and 'ItemPool' variants into 4 scenarios: 'nice & clean', 'nice, but dirty', 'brute, but clean' and 'brute & dirty', where clean and dirty indicate use of the MEMF_CLEAR flag.

o In the ItemPool tests a MemPool is created for a single List allocation. That's rather inefficient, and the overhead of setting up and freeing the MemPool might to some extent mask the speed of the ItemPool. What do you see if you use AllocVecTags() instead to allocate the List, which is more like the unpooled tests?

As this is only once in a loop, I don't see it as much of a hindrance.

o You're depending on default puddle sizes for the pooled memory, which might not be optimal for this particular test, or at least might be different between the MemPool test and the ItemPool test.

Doesn't the autodocs wriggle you into this?

o Standard libraries for C tend to do their own memory management, on top of whatever the OS provides. So the results for malloc() will likely depend on which C library was used. You seem to be using the default, which would be newlib. The results may be different for clib2/clib4.

Most coders will use newlib, no?

o The presence of virtual memory can make tests like this difficult to do, as there might be a bunch of virtual memory operations going on underneath the allocations you're doing. Do you get similar results if you run the test program multiple times? What if you reverse the order of the individual tests (ItemPool brute first, malloc last)?

During test runs results do not show extreme differences, just a few percentage points, so the results can be seen as very consistent. And for that very reason the outer loop runs 16 times, which may even out some external influences.

o Then there's the effect of the CPU's memory cache on the results, which might vary from one system to another. What Amiga are you running the test on?

System of choice, my only option, is an X5000/20.

In the first published list of results, the final line got mangled somehow and reflected wrong figures.

After tweaking, I got these results:

 : Method    Manners           Start             End               Total
 : malloc                     [  90604206 ] -- [  91018348 ] == [  414142 ]  (100,0%)
 : AllocVec                   [  91045739 ] -- [  92375586 ] == [ 1329847 ]  (321,1%)
 : MemPool    nice, but dirty [  92403133 ] -- [  94945903 ] == [ 2542770 ]  (614,0%)
 : MemPool    nice & clean    [  94973884 ] -- [  97580379 ] == [ 2606495 ]  (629,4%)
 : MemPool    brute & dirty   [  97608573 ] -- [  99426198 ] == [ 1817625 ]  (438,9%)
 : MemPool    brute but clean [  99456353 ] -- [ 101365479 ] == [ 1909126 ]  (461,0%)
 : ItemPool   nice, but dirty [ 101393885 ] -- [ 101686550 ] == [  292665 ]  ( 70,7%)
 : ItemPool   nice & clean    [ 101716075 ] -- [ 102043355 ] == [  327280 ]  ( 79,0%)
 : ItemPool   brute & dirty   [ 102071578 ] -- [ 102275613 ] == [  204035 ]  ( 49,3%)
 : ItemPool   brute but clean [ 102304844 ] -- [ 102539147 ] == [  234303 ]  ( 56,6%)

OldFart

walkero
Re: Some curiosities about speed differences for various...

Most coders will use newlib, no?

Not really. Most people porting software from other systems have already been using clib4 for some time now.

Do you have the latest code available somewhere? I am curious to test it on my systems.

msteed
Re: Some curiosities about speed differences for various...
In the ItemPool tests a MemPool is created for a single List allocation.

As this is only once in a loop, I don't see it as much of a hindrance.

I don't expect it to make much difference either, but since the results of the tests were not what we expected, I was casting about for anything that might not behave the way we assume.

You're depending on default puddle sizes for the pooled memory...

Doesn't the autodocs wriggle you into this?

Because you're allocating Nodes, which are rather small, I actually expect the default pool parameters (whatever they are) to work well. But again, I was looking for something that might not work the way we expect.

The 'PoolStat' command should show the parameters for the pool created by the test, but you'd need a special version of the test that pauses long enough before deleting the pool for you to run the command.

Most coders will use newlib, no?

For writing native OS4 programs newlib will likely be the choice. But as walkero pointed out, when porting code from other platforms clib4 is a better choice.

During test runs results do not show extreme differences, just a few percentage points, so the results can be seen as very consistent.

Good; that tends to rule out interference from other things.

Your revised test results mostly make sense, with the notable exception of MemoryPools:

o Given the additional memory management done by the C library, malloc() is in effect using memory pools (its own, not those of Exec), so it's not surprising that it's faster than AllocVec() (though the difference is more than I might have guessed).

o It takes longer to clear memory after allocating it, but not very much longer (it doesn't take much time to clear something as small as a Node; the difference would be greater with larger allocations). The faster the allocation itself is, the more significant the slowdown due to clearing the memory becomes.

o Freeing a pool in one operation is significantly faster than freeing all the individual allocations in the pool one at a time.

o ItemPools are the fastest, as you'd expect given their limitations.

The real mystery is why MemoryPools are so slow. Faster allocation is supposed to be one of the main reasons to use MemoryPools, but they actually seem to be twice as slow as just using AllocVec(). And as you can see if you run 'PoolStat' after the system has been active for a while, MemoryPools get used a lot.

Maybe the memory manager in OS4 has been improved to the point that it's now faster than MemoryPools? Though the OS4 Wiki background section on MemoryPools seems to suggest that they were enhanced in OS4 as well.

Do you have the latest code available somewhere? I am curious to test it on my systems.

I'd be curious to try it on my X1000 too.

msteed
Re: Some curiosities about speed differences for various...

Thanks to OldFart for sending me a copy of the program so I could run it on my X1000. Here's what I got (stripping out all the intermediate times to make it easier to read):

Method          Manners               Total
malloc                             [  606460 ] (100,0%)
AllocVec                           [ 1776791 ] (293,0%)
AllocVecPooled  nice, but dirty    [ 3733219 ] (615,6%)
AllocVecPooled  nice & clean       [ 3937894 ] (649,3%)
AllocVecPooled  brute & dirty      [ 2312661 ] (381,3%)
AllocVecPooled  brute, but clean   [ 2493997 ] (411,2%)
ItemPooled      nice, but dirty    [  365128 ] ( 60,2%)
ItemPooled      nice & clean       [  397714 ] ( 65,6%)
ItemPooled      brute & dirty      [  286676 ] ( 47,3%)
ItemPooled      brute, but clean   [  296692 ] ( 48,9%)

Since all the percentages are relative to malloc(), the fact that malloc() is somewhat slower relative to the other methods on the X1000 makes the percentages a little lower for most of the other methods than on the X5000. But the overall trend is the same: AllocVecPooled() is roughly six times slower than malloc() and twice as slow as AllocVec(), and ItemPoolAlloc() is roughly twice as fast as malloc(), six times as fast as AllocVec(), and a whopping 8 to 10 times as fast as AllocVecPooled(). So whatever is going on with MemoryPools that makes them so slow is consistent, happening on both the X5000 and the X1000.

I couldn't resist comparing the speed of the X1000 to that of the X5000, using OldFart's results for the X5000 figures. Here the percentage figure is how much longer the X1000 takes to run the test (assuming that clock() is measuring the same thing on both machines, which is supposed to be the case).

Method          Manners            X5000/20     X1000
malloc                           [  414142 ] [  606460 ] (146%)
AllocVec                         [ 1329847 ] [ 1776791 ] (134%)
AllocVecPooled  nice, but dirty  [ 2542770 ] [ 3733219 ] (147%)
AllocVecPooled  nice & clean     [ 2606495 ] [ 3937894 ] (151%)
AllocVecPooled  brute & dirty    [ 1817625 ] [ 2312661 ] (127%)
AllocVecPooled  brute, but clean [ 1909126 ] [ 2493997 ] (131%)
ItemPooled      nice, but dirty  [  292665 ] [  365128 ] (125%)
ItemPooled      nice & clean     [  327280 ] [  397714 ] (122%)
ItemPooled      brute & dirty    [  204035 ] [  286676 ] (141%)
ItemPooled      brute, but clean [  234303 ] [  296692 ] (127%)

As in past benchmarks I've seen, the difference -- the X1000 is 20% to 50% slower -- is greater than the difference in clock speed between the two machines.
