Multithreading – Delphi Seattle 10, multithreading / core performance
I have a 100% Delphi code application It is a 64 - bit windows console application with workload manager and a fixed number of work programs This is done by creating threads, and each thread is a worker The thread does not die, it gets work from the queue populated by the workload manager
It seems to work well
However, I found that on a 16 core system, I saw a processing time of about 90 minutes (it had 2000000 workloads; each had database work) When I added 16 to 32 cores, I saw performance degradation! No database contention In essence, the database is waiting for things to happen
Each thread has its own DB connection Each thread's query uses only that thread connection
I updated Delphi mm to use scalemm2; This has made great progress; But I still don't know why adding cores will reduce performance
When the application has 256 threads, the total CPU utilization is 80% on 32 cores When the application has 256 threads and 16 cores, the total CPU utilization is 100% (that's why I want to add cores) – it slows down:-(
I have applied as many suggestions as I can understand to the code base
Ie – the function does not return a string and uses const as a parameter to protect "shared" data with small critical parts (in fact, it uses multi read exclusive write) I currently have no processor affinity assigned; I'm reading conflicting suggestions about using it So I am not currently (it will be difficult to add, but not today)
Problem - I tend to "think" that this problem revolves around thread contention
How to find the problem of thread contention? Is there a tool specifically for such contention identification? How can I determine what is "heap" and what is not to further reduce contention there?
Understanding, guidance and guidance will be appreciated
Can provide relevant code areas... If I know what is relevant
Procedure TXETaskWorkloadExecuterThread.Enqueue(Const Workload: TXETaskWorkload); Begin // protect your own queue FWorkloadQueue.Enter; FWorkloads.Add(Workload); FWorkloadQueue.Leave; End; Procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload); Begin If FWorkloadCount >= FMaxQueueSize Then Begin WaitForEmptyQueue; FWorkloadCount := 0; End; FExecuters[FNextThread].Enqueue(Workload); // round-robin the queue Inc(FNextThread); Inc(FWorkloadCount); If FNextThread >= FWorkerThreads Then Begin FNextThread := 0; End; End; Function TXETaskWorkloadExecuterThread.Dequeue(Var Workload: TXETaskWorkload): Boolean; Begin Workload := Nil; Result := False; FWorkloadQueue.Enter; Try If FNextWorkload < FWorkloads.Count Then Begin Workload := FWorkloads[FNextWorkload]; Inc(FNextWorkload); If Workload Is TXETaskWorkLoadSynchronize Then Begin FreeAndNil(Workload); Exit; End; Result := True; End Else Begin FWorkloads.Clear; FNextWorkload := 0; FHaveWorkloadInQueue.ResetEvent; FEmptyAndFinishedQueue.SetEvent; End; Finally FWorkloadQueue.Leave; End; End;
Edit -
Thanks for all the comments Clarification
This system / VM has nothing else The problematic executable is the only thing that uses CPU Single threaded performance means linearity I simply say that this is a divide and rule If I have 5 million cars to park, I have 30 drivers and 30 different parking lots I can tell each driver to wait for other vehicles to stop, which is slower than telling 30 drivers to park at the same time
Analysis in a single thread shows that nothing is causing this I have seen that Delphi and multi-core performance "gotcha's" (mainly related to string processing and lock) are mentioned on this board
Database is essentially boring, waiting for things to do I checked an Intel VTune Generally speaking, it says... Lock But I can't find it anywhere The content I have is very simple, and the locked current area is necessary and small What I can't see is that locking may occur due to other factors... Such as the string that creates the lock, or thread 1 causes problems in the main process by accessing the data (even through key part protection)
Continue to study Thank you again for your feedback
Solution
Your workload manager is deciding which thread gets which work item If a given thread is blocked (such as long work, database latency, etc.), you will queue more items to that thread, even if they may not be processed temporarily, if any
Typically, work items should be stored in a single shared queue and pulled out by multiple threads When any given thread is ready, it pulls out the next available work item For example:
constructor TXETaskManager.Create; var I: Integer; begin FWorkloadQueue := TCriticalSection.Create; FWorkloads := TList<TXETaskWorkload>.Create; FEmptyQueue := TEvent.Create(nil,True,''); FHaveWorkloadInQueue := TEvent.Create(nil,False,''); FNotFullQueue := TEvent.Create(nil,''); FTermEvent := TEvent.Create(nil,''); ... FMaxQueueSize := ...; FWorkerThreads := ...; for I := 0 to FWorkerThreads-1 do FExecuters[I] := TXETaskWorkloadExecuterThread.Create(Self); end; destructor TXETaskManager.Destroy; begin for I := 0 to FWorkerThreads-1 do FExecuters[I].Terminate; FTermEvent.SetEvent; for I := 0 to FWorkerThreads-1 do begin FExecuters[I].WaitFor; FExecuters[I].Free; end; FWorkloadQueue.Free; FWorkloads.Free; FEmptyQueue.Free; FHaveWorkloadInQueue.Free; FNotFullQueue.Free; FTermEvent.Free; ... inherited; end; procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload); begin FWorkloadQueue.Enter; try while FWorkloads.Count >= FMaxQueueSize do begin FWorkloadQueue.Leave; FNotFullQueue.WaitFor(INFINITE); FWorkloadQueue.Enter; end; FWorkloads.Add(Workload); if FWorkloads.Count = 1 then begin FEmptyQueue.ResetEvent; FHaveWorkloadInQueue.SetEvent; end; if FWorkloads.Count >= FMaxQueueSize then FNotFullQueue.ResetEvent; finally FWorkloadQueue.Leave; end; end; function TXETaskManager.Dequeue(var Workload: TXETaskWorkload): Boolean; begin Result := False; Workload := nil; FWorkloadQueue.Enter; try if FWorkloads.Count > 0 then begin Workload := FWorkloads[0]; FWorkloads.Delete(0); Result := True; if FWorkloads.Count = (FMaxQueueSize-1) then FNotFullQueue.SetEvent; if FWorkloads.Count = 0 then begin FHaveWorkloadInQueue.ResetEvent; FEmptyQueue.SetEvent; end; end; finally FWorkloadQueue.Leave; end; end; constructor TXETaskWorkloadExecuterThread.Create(ATaskManager: TXETaskManager); begin inherited Create(False); FTaskManager := ATaskManager; end; procedure TXETaskWorkloadExecuterThread.Execute; var Arr: THandleObjectArray; Event: THandleObject; Workload: TXETaskWorkload; begin SetLength(Arr,2); Arr[0] := FTaskManager.FHaveWorkloadInQueue; Arr[1] := FTaskManager.FTermEvent; while not Terminated do begin case TEvent.WaitForMultiple(Arr,INFINITE,Event) of wrSignaled: begin if Event = FTaskManager.FHaveWorkloadInQueue then begin if FTaskManager.Dequeue(Workload) then try // process Workload as needed... finally Workload.Free; end; end; end; wrError: begin RaiseLastOSError; end; end; end; end;
If you find that threads are not getting enough work, you can adjust the number of threads as needed You should not normally use much more threads than the available CPU cores