Some curiosities about speed differences for various allocation methods

OldFart
Some curiosities about speed differences for various allocation methods

Hi,

Just out of sheer curiosity I decided to build a little proggy to see what the differences in speed would be for the various available ways we can allocate memory. For this test I decided, per method, to build a list with 65536 nodes and subsequently deallocate every node again, 16 times in succession. The results were quite 'staggering', to say the least:

INFO : Started
INFO : -Iteration [ 0 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 1 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 2 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 3 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 4 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 5 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 6 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 7 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 8 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 9 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 10 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 11 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 12 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 13 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 14 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute
INFO : -Iteration [ 15 ]
INFO : -- malloc
INFO : -- AllocVec
INFO : -- AllocVecPooled nice
INFO : -- AllocVecPooled brute
INFO : -- ItemPoolAlloc nice
INFO : -- ItemPoolAlloc brute

 : Method          Start            Build            Remove           Done             End              Total
 : malloc         [ 53184056 ] -- [ 53184109 ] -- [ 53365143 ] -- [ 53595826 ] -- [ 53595845 ] == [  411789 ]
 : AllocVec       [ 53623861 ] -- [ 53623898 ] -- [ 54376149 ] -- [ 54939848 ] -- [ 54939870 ] == [ 1316009 ]
 : MemPool nice   [ 54969963 ] -- [ 54970181 ] -- [ 56333899 ] -- [ 57580586 ] -- [ 57580748 ] == [ 2610785 ]
 : MemPool brute  [ 57614524 ] -- [ 57614703 ] -- [ 58976719 ] -- [ 58976737 ] -- [ 59514870 ] == [ 1900346 ]
 : ItemPool nice  [ 59547492 ] -- [ 59547733 ] -- [ 59742363 ] -- [ 59826589 ] -- [ 59869667 ] == [  322175 ]
 : ItemPool brute [ 60219450 ] -- [ 60400551 ] -- [ 60642973 ] -- [ 60642993 ] -- [ 60673755 ] == [  454305 ]
INFO : Finished

The results for both 'ItemPool' variants are surprising: deallocating all nodes on a one-by-one basis, the 'nice' way, seems to be faster than the 'brute' way, the latter taking around 40% MORE time to complete (454305 ticks for brute vs 322175 for nice, a factor of about 1.41).

The program is this one:

/*
**
** main.c - AllocationSpeed
**
**
** Jan 23, 2026: First incarnation
**************************************/
#include <stdlib.h>
#include <time.h>

#include <proto/dos.h>
#include <proto/exec.h>

int main(void)
{
    clock_t TimeArray[6][5] = { 0 };    /* 6 methods x 5 timestamps each */
    uint16 AlloCount;

    struct List *L;
    struct Node *N,
                *R;

    IDOS->Printf("INFO : Started\n");

    for (uint16 i = 0; i < 16; i++)
    {
        IDOS->Printf("INFO : -Iteration [ %ld ]\n", i);

        IDOS->Printf("INFO : -- malloc\n");
        {
            TimeArray[0][0] += clock();
            AlloCount = 0;

            if ((L = malloc(sizeof(struct List))))
            {
                IExec->NewList(L);

                TimeArray[0][1] += clock();

                do
                {
                    if ((N = malloc(sizeof(struct Node))))
                    {
                        IExec->AddTail(L, N);
                    }
                }
                while (++AlloCount);

                TimeArray[0][2] += clock();
                N = IExec->GetHead(L);

                while (N)
                {
                    R = N;
                    N = IExec->GetSucc(N);

                    IExec->Remove(R);
                    free(R);
                }

                TimeArray[0][3] += clock();
                free(L);
            }

            TimeArray[0][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        IDOS->Printf("INFO : -- AllocVec\n");
        {
            TimeArray[1][0] += clock();
            AlloCount = 0;

            if ((L = IExec->AllocVecTags(sizeof(struct List), TAG_END)))
            {
                IExec->NewList(L);

                TimeArray[1][1] += clock();

                do
                {
                    if ((N = IExec->AllocVecTags(sizeof(struct Node), TAG_END)))
                    {
                        IExec->AddTail(L, N);
                    }
                }
                while (++AlloCount);

                TimeArray[1][2] += clock();
                N = IExec->GetHead(L);

                while (N)
                {
                    R = N;
                    N = IExec->GetSucc(N);

                    IExec->Remove(R);
                    IExec->FreeVec(R);
                }

                TimeArray[1][3] += clock();
                IExec->FreeVec(L);
            }

            TimeArray[1][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- AllocVecPooled nice\n");
            TimeArray[2][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    TimeArray[2][1] += clock();

                    do
                    {
                        if ((N = IExec->AllocVecPooled(MemPool, sizeof(struct Node))))
                        {
                            IExec->AddTail(L, N);
                        }
                    }
                    while (++AlloCount);

                    TimeArray[2][2] += clock();
                    N = IExec->GetHead(L);

                    while (N)
                    {
                        R = N;
                        N = IExec->GetSucc(N);

                        IExec->Remove(R);
                        IExec->FreeVecPooled(MemPool, R);
                    }

                    TimeArray[2][3] += clock();
                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[2][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- AllocVecPooled brute\n");
            TimeArray[3][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    TimeArray[3][1] += clock();

                    do
                    {
                        if ((N = IExec->AllocVecPooled(MemPool, sizeof(struct Node))))
                        {
                            IExec->AddTail(L, N);
                        }
                    }
                    while (++AlloCount);

                    TimeArray[3][2] += clock();
                    TimeArray[3][3] += clock();
                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[3][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- ItemPoolAlloc nice\n");
            TimeArray[4][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    APTR ItemPool = IExec->AllocSysObjectTags(ASOT_ITEMPOOL, ASOITEM_MFlags, MEMF_CLEAR,
                                                              ASOITEM_ItemSize, sizeof(struct Node),
                                                              TAG_END);

                    if (ItemPool)
                    {
                        TimeArray[4][1] += clock();

                        do
                        {
                            if ((N = IExec->ItemPoolAlloc(ItemPool)))
                            {
                                IExec->AddTail(L, N);
                            }
                        }
                        while (++AlloCount);

                        TimeArray[4][2] += clock();
                        N = IExec->GetHead(L);

                        while (N)
                        {
                            R = N;
                            N = IExec->GetSucc(N);

                            IExec->Remove(R);
                            IExec->ItemPoolFree(ItemPool, R);
                        }

                        TimeArray[4][3] += clock();
                        IExec->FreeSysObject(ASOT_ITEMPOOL, ItemPool);
                    }

                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[4][4] += clock();
        }

        /*
        ** ---------------------------------------------------------------
        */

        {
            IDOS->Printf("INFO : -- ItemPoolAlloc brute\n");
            TimeArray[5][0] += clock();
            AlloCount = 0;

            APTR MemPool = IExec->AllocSysObjectTags(ASOT_MEMPOOL, ASOPOOL_MFlags, MEMF_CLEAR, TAG_END);

            if (MemPool)
            {
                if ((L = IExec->AllocVecPooled(MemPool, sizeof(struct List))))
                {
                    IExec->NewList(L);

                    APTR ItemPool = IExec->AllocSysObjectTags(ASOT_ITEMPOOL, ASOITEM_MFlags, MEMF_CLEAR,
                                                              ASOITEM_ItemSize, sizeof(struct Node),
                                                              TAG_END);

                    if (ItemPool)
                    {
                        TimeArray[5][1] += clock();

                        do
                        {
                            if ((N = IExec->ItemPoolAlloc(ItemPool)))
                            {
                                IExec->AddTail(L, N);
                            }
                        }
                        while (++AlloCount);

                        TimeArray[5][2] += clock();
                        TimeArray[5][3] += clock();
                        IExec->FreeSysObject(ASOT_ITEMPOOL, ItemPool);
                    }

                    IExec->FreeVecPooled(MemPool, L);
                }

                IExec->FreeSysObject(ASOT_MEMPOOL, MemPool);
            }

            TimeArray[5][4] += clock();
        }

    } /* for() */

    IDOS->Printf("\n : Method Start Build Remove Done End Total\n");
    IDOS->Printf(" : malloc [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[0][0], TimeArray[0][1], TimeArray[0][2], TimeArray[0][3], TimeArray[0][4], TimeArray[0][4] - TimeArray[0][0]);
    IDOS->Printf(" : AllocVec [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[1][0], TimeArray[1][1], TimeArray[1][2], TimeArray[1][3], TimeArray[1][4], TimeArray[1][4] - TimeArray[1][0]);
    IDOS->Printf(" : MemPool nice [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[2][0], TimeArray[2][1], TimeArray[2][2], TimeArray[2][3], TimeArray[2][4], TimeArray[2][4] - TimeArray[2][0]);
    IDOS->Printf(" : MemPool brute [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[3][0], TimeArray[3][1], TimeArray[3][2], TimeArray[3][3], TimeArray[3][4], TimeArray[3][4] - TimeArray[3][0]);
    IDOS->Printf(" : ItemPool nice [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[4][0], TimeArray[4][1], TimeArray[4][2], TimeArray[4][3], TimeArray[4][4], TimeArray[4][4] - TimeArray[4][0]);
    IDOS->Printf(" : ItemPool brute [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] -- [ %lld ] == [ %8lld ]\n", TimeArray[5][0], TimeArray[5][1], TimeArray[5][2], TimeArray[5][3], TimeArray[5][4], TimeArray[5][4] - TimeArray[5][0]);

    IDOS->Printf("INFO : Finished\n");

    return 0;
}

And if you want to try it out for yourself, here's the applied makefile:

PROGNAME = AllocationSpeed

DEFINES += -DDEBUG \
           -DPROGNAME=$(PROGNAME)

#          -DUSE_ARGOPTS \
#          -DUSE_DATETIME

$(PROGNAME) : main.c
	gcc -o $(PROGNAME) main.c $(DEFINES)
	@Echo "==== Done compiling =========="

Surprising, no?

OldFart

walkero
Re: Some curiosities about speed differences for various...

Interesting tests. If I read the results correctly, malloc is the second fastest way, and AllocVec is more than three times slower, right?

OldFart
Re: Some curiosities about speed differences for various...

@walkero

Interesting tests.

And a bit disturbing at the same time!

 : Method          Total
 : malloc         == [  411789 ]  100,0%
 : AllocVec       == [ 1316009 ]  319,6%
 : MemPool nice   == [ 2610785 ]  634,0%
 : MemPool brute  == [ 1900346 ]  461,5%
 : ItemPool nice  == [  322175 ]   78,2%
 : ItemPool brute == [  454305 ]  110,3%

Like you stated: Interesting tests.

OldFart

msteed
Re: Some curiosities about speed differences for various...

Interesting results indeed, and not what one would expect. However, looking over the code I see some things that might affect the results:

o The pooled allocations use MEMF_CLEAR, while malloc() and AllocVec() do not. It takes time to clear the memory as it's allocated, so you're kind of comparing apples and oranges when comparing cleared against uncleared allocations.

o In the ItemPool tests a MemPool is created for a single List allocation. That's rather inefficient, and the overhead of setting up and freeing the MemPool might to some extent mask the speed of the ItemPool. What do you see if you use AllocVecTags() instead to allocate the List, which is more like the unpooled tests?

o You're depending on default puddle sizes for the pooled memory, which might not be optimal for this particular test, or at least might be different between the MemPool test and the ItemPool test.

o Standard libraries for C tend to do their own memory management, on top of whatever the OS provides. So the results for malloc() will likely depend on which C library was used. You seem to be using the default, which would be newlib. The results may be different for clib2/clib4.

o The presence of virtual memory can make tests like this difficult to do, as there might be a bunch of virtual memory operations going on underneath the allocations you're doing. Do you get similar results if you run the test program multiple times? What if you reverse the order of the individual tests (ItemPool brute first, malloc last)?

o Then there's the effect of the CPU's memory cache on the results, which might vary from one system to another. What Amiga are you running the test on?

Trying to look a little deeper at the results, I used the figures from your test and tried to filter out some of the overhead, looking only at the time required to perform the allocations (the difference between step 2 and step 1), the time required to perform the frees (the difference between step 3 and step 2), and the sum of those:

Test         Allocations    Frees       Total
               (2 - 1)     (3 - 2)  (Alloc + Free)
----         -----------   -------    --------
malloc         181034      230683      411717
AllocVec       752251      563699     1315950      
Pool (nice)   1363718     1246687     2610405     
Pool (brute)  1362016         18*     1362034
Item (nice)    194630       84226      278856
Item (brute)   242422         20*      242442

* The time required to free the pools was not recorded, so the time shown is just the time between two calls to clock(). It would be more informative to free the pools between the two calls to clock() for the brute tests.

As noted above, the pooled tests had to clear the memory after allocating it, so naturally that makes them take longer. It's also interesting to note that the allocation time for the "nice" and "brute" ItemPool tests is significantly different, even though the code is exactly the same.

Still, the results make a little more sense. malloc() seems to be quite fast compared to AllocVec(). AllocVecPooled() seems rather slow, but to be fair we'd need to see it without MEMF_CLEAR. ItemPools are quite fast, as expected, and would be even faster without MEMF_CLEAR.

OldFart
Re: Some curiosities about speed differences for various...

@msteed

Some valid points you bring to the table.

o The pooled allocations use MEMF_CLEAR, while malloc() and AllocVec() do not. It takes time to clear the memory as it's allocated, so you're kind of comparing apples and oranges when comparing cleared against uncleared allocations.

I tweaked the code for both the 'MemPool' and 'ItemPool' variants into 4 scenarios: 'nice & clean', 'nice, but dirty', 'brute, but clean' and 'brute & dirty', where clean and dirty indicate use of the MEMF_CLEAR flag.

o In the ItemPool tests a MemPool is created for a single List allocation. That's rather inefficient, and the overhead of setting up and freeing the MemPool might to some extent mask the speed of the ItemPool. What do you see if you use AllocVecTags() instead to allocate the List, which is more like the unpooled tests?

As this is only once in a loop, I don't see it as much of a hindrance.

o You're depending on default puddle sizes for the pooled memory, which might not be optimal for this particular test, or at least might be different between the MemPool test and the ItemPool test.

Doesn't the autodocs wriggle you into this?

o Standard libraries for C tend to do their own memory management, on top of whatever the OS provides. So the results for malloc() will likely depend on which C library was used. You seem to be using the default, which would be newlib. The results may be different for clib2/clib4.

Most coders will use newlib, no?

o The presence of virtual memory can make tests like this difficult to do, as there might be a bunch of virtual memory operations going on underneath the allocations you're doing. Do you get similar results if you run the test program multiple times? What if you reverse the order of the individual tests (ItemPool brute first, malloc last)?

During test runs results do not show extreme differences, just a few percentage points, so the results can be seen as very consistent. And for that very reason the outer loop runs 16 times, which may even out some external influences.

o Then there's the effect of the CPU's memory cache on the results, which might vary from one system to another. What Amiga are you running the test on?

System of choice, my only option, is an X5000/20.

In the first published list of results, the final line got mangled somehow and reflected wrong figures.

After tweaking, I got these results:

 : Method    Manners           Start             End               Total
 : malloc                     [  90604206 ] -- [  91018348 ] == [  414142 ]  (100,0%)
 : AllocVec                   [  91045739 ] -- [  92375586 ] == [ 1329847 ]  (321,1%)
 : MemPool    nice, but dirty [  92403133 ] -- [  94945903 ] == [ 2542770 ]  (614,0%)
 : MemPool    nice & clean    [  94973884 ] -- [  97580379 ] == [ 2606495 ]  (629,4%)
 : MemPool    brute & dirty   [  97608573 ] -- [  99426198 ] == [ 1817625 ]  (438,9%)
 : MemPool    brute but clean [  99456353 ] -- [ 101365479 ] == [ 1909126 ]  (461,0%)
 : ItemPool   nice, but dirty [ 101393885 ] -- [ 101686550 ] == [  292665 ]  ( 70,7%)
 : ItemPool   nice & clean    [ 101716075 ] -- [ 102043355 ] == [  327280 ]  ( 79,0%)
 : ItemPool   brute & dirty   [ 102071578 ] -- [ 102275613 ] == [  204035 ]  ( 49,3%)
 : ItemPool   brute but clean [ 102304844 ] -- [ 102539147 ] == [  234303 ]  ( 56,6%)

OldFart

walkero
Re: Some curiosities about speed differences for various...

Most coders will use newlib, no?

Not really. Most people porting software from other systems have already been using clib4 for some time now.

Do you have the latest code available somewhere? I am curious to test it on my systems.

msteed
Re: Some curiosities about speed differences for various...
In the ItemPool tests a MemPool is created for a single List allocation.

As this is only once in a loop, I don't see it as much of a hindrance.

I don't expect it to make much difference either, but since the results of the tests were not what we expected, I was casting about for anything that might not behave the way we assume.

You're depending on default puddle sizes for the pooled memory...

Doesn't the autodocs wriggle you into this?

Because you're allocating Nodes, which are rather small, I actually expect the default pool parameters (whatever they are) to work well. But again, I was looking for something that might not work the way we expect.

The 'PoolStat' command should show the parameters for the pool created by the test, but you'd need a special version of the test that pauses long enough before deleting the pool for you to run the command.

Most coders will use newlib, no?

For writing native OS4 programs newlib will likely be the choice. But as walkero pointed out, when porting code from other platforms clib4 is a better choice.

During test runs results do not show extreme differences, just a few percentage points, so the results can be seen as very consistent.

Good; that tends to rule out interference from other things.

Your revised test results mostly make sense, with the notable exception of MemoryPools:

o Given the additional memory management done by the C library, malloc() is in effect using memory pools (its own, not those of Exec), so it's not surprising that it's faster than AllocVec() (though the difference is more than I might have guessed).

o It takes longer to clear memory after allocating it, but not very much longer (it doesn't take much time to clear something as small as a Node; the difference would be greater with larger allocations). The faster the allocation itself is, the more significant the slowdown due to clearing the memory becomes.

o Freeing a pool in one operation is significantly faster than freeing all the individual allocations in the pool one at a time.

o ItemPools are the fastest, as you'd expect given their limitations.

The real mystery is why MemoryPools are so slow. Faster allocation is supposed to be one of the main reasons to use MemoryPools, but they actually seem to be twice as slow as just using AllocVec(). And as you can see if you run 'PoolStat' after the system has been active for a while, MemoryPools get used a lot.

Maybe the memory manager in OS4 has been improved to the point that it's now faster than MemoryPools? Though the OS4 Wiki background section on MemoryPools seems to suggest that they were enhanced in OS4 as well.

Do you have the latest code available somewhere? I am curious to test it on my systems.

I'd be curious to try it on my X1000 too.

msteed
Re: Some curiosities about speed differences for various...

Thanks to OldFart for sending me a copy of the program so I could run it on my X1000. Here's what I got (stripping out all the intermediate times to make it easier to read):

Method          Manners               Total
malloc                             [  606460 ] (100,0%)
AllocVec                           [ 1776791 ] (293,0%)
AllocVecPooled  nice, but dirty    [ 3733219 ] (615,6%)
AllocVecPooled  nice & clean       [ 3937894 ] (649,3%)
AllocVecPooled  brute & dirty      [ 2312661 ] (381,3%)
AllocVecPooled  brute, but clean   [ 2493997 ] (411,2%)
ItemPooled      nice, but dirty    [  365128 ] ( 60,2%)
ItemPooled      nice & clean       [  397714 ] ( 65,6%)
ItemPooled      brute & dirty      [  286676 ] ( 47,3%)
ItemPooled      brute, but clean   [  296692 ] ( 48,9%)

Since all the percentages are relative to malloc(), the fact that malloc() is somewhat slower relative to the other methods on the X1000 makes the percentages a little lower for most of the other methods than on the X5000. But the overall trend is the same: AllocVecPooled() is roughly six times slower than malloc() and twice as slow as AllocVec(), and ItemPoolAlloc() is roughly twice as fast as malloc(), six times as fast as AllocVec(), and a whopping 8 to 10 times as fast as AllocVecPooled(). So whatever is going on with MemoryPools that makes them so slow is consistent, happening on both the X5000 and the X1000.

I couldn't resist comparing the speed of the X1000 to that of the X5000, using OldFart's results for the X5000 figures. Here the percentage figure is how much longer the X1000 takes to run the test (assuming that clock() is measuring the same thing on both machines, which is supposed to be the case).

Method          Manners            X5000/20     X1000
malloc                           [  414142 ] [  606460 ] (146%)
AllocVec                         [ 1329847 ] [ 1776791 ] (134%)
AllocVecPooled  nice, but dirty  [ 2542770 ] [ 3733219 ] (147%)
AllocVecPooled  nice & clean     [ 2606495 ] [ 3937894 ] (151%)
AllocVecPooled  brute & dirty    [ 1817625 ] [ 2312661 ] (127%)
AllocVecPooled  brute, but clean [ 1909126 ] [ 2493997 ] (131%)
ItemPooled      nice, but dirty  [  292665 ] [  365128 ] (125%)
ItemPooled      nice & clean     [  327280 ] [  397714 ] (122%)
ItemPooled      brute & dirty    [  204035 ] [  286676 ] (141%)
ItemPooled      brute, but clean [  234303 ] [  296692 ] (127%)

As in past benchmarks I've seen, the difference -- the X1000 is 20% to 50% slower -- is greater than the difference in clock speed between the two machines.
