Multithreading – Delphi Seattle 10, multithreading / core performance

I have a 100% Delphi code application It is a 64 - bit windows console application with workload manager and a fixed number of work programs This is done by creating threads, and each thread is a worker The thread does not die, it gets work from the queue populated by the workload manager

It seems to work well

However, I found that on a 16 core system, I saw a processing time of about 90 minutes (it had 2000000 workloads; each had database work) When I added 16 to 32 cores, I saw performance degradation! No database contention In essence, the database is waiting for things to happen

Each thread has its own DB connection Each thread's query uses only that thread connection

I updated Delphi mm to use scalemm2; This has made great progress; But I still don't know why adding cores will reduce performance

When the application has 256 threads, the total CPU utilization is 80% on 32 cores When the application has 256 threads and 16 cores, the total CPU utilization is 100% (that's why I want to add cores) – it slows down:-(

I have applied as many suggestions as I can understand to the code base

Ie – the function does not return a string and uses const as a parameter to protect "shared" data with small critical parts (in fact, it uses multi read exclusive write) I currently have no processor affinity assigned; I'm reading conflicting suggestions about using it So I am not currently (it will be difficult to add, but not today)

Problem - I tend to "think" that this problem revolves around thread contention

How to find the problem of thread contention? Is there a tool specifically for such contention identification? How can I determine what is "heap" and what is not to further reduce contention there?

Understanding, guidance and guidance will be appreciated

Can provide relevant code areas... If I know what is relevant

Procedure TXETaskWorkloadExecuterThread.Enqueue(Const Workload: TXETaskWorkload);
Begin
  // protect your own queue
  FWorkloadQueue.Enter;
  FWorkloads.Add(Workload);
  FWorkloadQueue.Leave;
End;

Procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload);
Begin
  If FWorkloadCount >= FMaxQueueSize Then Begin
    WaitForEmptyQueue;
    FWorkloadCount := 0;
  End;

  FExecuters[FNextThread].Enqueue(Workload);
  // round-robin the queue
  Inc(FNextThread);
  Inc(FWorkloadCount);
  If FNextThread >= FWorkerThreads Then Begin
    FNextThread := 0;
  End;
End;


Function TXETaskWorkloadExecuterThread.Dequeue(Var Workload: TXETaskWorkload): Boolean;
Begin
  Workload := Nil;
  Result := False;

  FWorkloadQueue.Enter;
  Try
    If FNextWorkload < FWorkloads.Count Then Begin
      Workload := FWorkloads[FNextWorkload];
      Inc(FNextWorkload);
      If Workload Is TXETaskWorkLoadSynchronize Then Begin
        FreeAndNil(Workload);
        Exit;
      End;
      Result := True;
    End Else Begin
      FWorkloads.Clear;
      FNextWorkload := 0;
      FHaveWorkloadInQueue.ResetEvent;
      FEmptyAndFinishedQueue.SetEvent;
    End;
  Finally
    FWorkloadQueue.Leave;
  End;
End;

Edit -

Thanks for all the comments Clarification

This system / VM has nothing else The problematic executable is the only thing that uses CPU Single threaded performance means linearity I simply say that this is a divide and rule If I have 5 million cars to park, I have 30 drivers and 30 different parking lots I can tell each driver to wait for other vehicles to stop, which is slower than telling 30 drivers to park at the same time

Analysis in a single thread shows that nothing is causing this I have seen that Delphi and multi-core performance "gotcha's" (mainly related to string processing and lock) are mentioned on this board

Database is essentially boring, waiting for things to do I checked an Intel VTune Generally speaking, it says... Lock But I can't find it anywhere The content I have is very simple, and the locked current area is necessary and small What I can't see is that locking may occur due to other factors... Such as the string that creates the lock, or thread 1 causes problems in the main process by accessing the data (even through key part protection)

Continue to study Thank you again for your feedback

Solution

Your workload manager is deciding which thread gets which work item If a given thread is blocked (such as long work, database latency, etc.), you will queue more items to that thread, even if they may not be processed temporarily, if any

Typically, work items should be stored in a single shared queue and pulled out by multiple threads When any given thread is ready, it pulls out the next available work item For example:

constructor TXETaskManager.Create;
var
  I: Integer;
begin
  FWorkloadQueue := TCriticalSection.Create;
  FWorkloads := TList<TXETaskWorkload>.Create;
  FEmptyQueue := TEvent.Create(nil,True,'');
  FHaveWorkloadInQueue := TEvent.Create(nil,False,'');
  FNotFullQueue := TEvent.Create(nil,'');
  FTermEvent := TEvent.Create(nil,'');
  ...
  FMaxQueueSize := ...;
  FWorkerThreads := ...;
  for I := 0 to FWorkerThreads-1 do
    FExecuters[I] := TXETaskWorkloadExecuterThread.Create(Self);
end;

destructor TXETaskManager.Destroy;
begin
  for I := 0 to FWorkerThreads-1 do
    FExecuters[I].Terminate;
  FTermEvent.SetEvent;
  for I := 0 to FWorkerThreads-1 do
  begin
    FExecuters[I].WaitFor;
    FExecuters[I].Free;
  end;
  FWorkloadQueue.Free;
  FWorkloads.Free;
  FEmptyQueue.Free;
  FHaveWorkloadInQueue.Free;
  FNotFullQueue.Free;
  FTermEvent.Free;
  ...

  inherited;
end;

procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload);
begin
  FWorkloadQueue.Enter;
  try
    while FWorkloads.Count >= FMaxQueueSize do
    begin
      FWorkloadQueue.Leave;
      FNotFullQueue.WaitFor(INFINITE);
      FWorkloadQueue.Enter;
    end;

    FWorkloads.Add(Workload);

    if FWorkloads.Count = 1 then
    begin
      FEmptyQueue.ResetEvent;
      FHaveWorkloadInQueue.SetEvent;
    end;

    if FWorkloads.Count >= FMaxQueueSize then
      FNotFullQueue.ResetEvent;
  finally
    FWorkloadQueue.Leave;
  end;
end;

function TXETaskManager.Dequeue(var Workload: TXETaskWorkload): Boolean;
begin
  Result := False;
  Workload := nil;

  FWorkloadQueue.Enter;
  try
    if FWorkloads.Count > 0 then
    begin
      Workload := FWorkloads[0];
      FWorkloads.Delete(0);
      Result := True;

      if FWorkloads.Count = (FMaxQueueSize-1) then
        FNotFullQueue.SetEvent;

      if FWorkloads.Count = 0 then
      begin
        FHaveWorkloadInQueue.ResetEvent;
        FEmptyQueue.SetEvent;
      end;
    end;
  finally
    FWorkloadQueue.Leave;
  end;
end;

constructor TXETaskWorkloadExecuterThread.Create(ATaskManager: TXETaskManager);
begin
  inherited Create(False);
  FTaskManager := ATaskManager;
end;

procedure TXETaskWorkloadExecuterThread.Execute;
var
  Arr: THandleObjectArray;
  Event: THandleObject;
  Workload: TXETaskWorkload;
begin
  SetLength(Arr,2);
  Arr[0] := FTaskManager.FHaveWorkloadInQueue;
  Arr[1] := FTaskManager.FTermEvent;

  while not Terminated do
  begin
    case TEvent.WaitForMultiple(Arr,INFINITE,Event) of
      wrSignaled:
      begin
        if Event = FTaskManager.FHaveWorkloadInQueue then
        begin
          if FTaskManager.Dequeue(Workload) then
          try
            // process Workload as needed...
          finally
            Workload.Free;
          end;
        end;
      end;
      wrError: begin
        RaiseLastOSError;
      end;
    end;
  end;
end;

If you find that threads are not getting enough work, you can adjust the number of threads as needed You should not normally use much more threads than the available CPU cores

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>