If I have a script that I need to run against multiple computers, or with multiple different arguments, how can I execute it in parallel without incurring the overhead of spawning a new PSJob with Start-Job?
As an example, I want to re-sync the time on all domain members, like so:
$computers = Get-ADComputer -filter * |Select-Object -ExpandProperty dnsHostName
$creds = Get-Credential domain\user
foreach($computer in $computers)
{
    $session = New-PSSession -ComputerName $computer -Credential $creds
    Invoke-Command -Session $session -ScriptBlock { w32tm /resync /nowait /rediscover }
}
But I don't want to wait for each PSSession to connect and invoke the command. How can this be done in parallel, without Jobs?
Update - While this answer explains the process and mechanics of PowerShell runspaces and how they can help you multi-thread non-sequential workloads, fellow PowerShell aficionado Warren 'Cookie Monster' F has gone the extra mile and incorporated these same concepts into a single tool called Invoke-Parallel. It does what I describe below, and he has since expanded it with optional switches for logging and prepared session state including imported modules. Really cool stuff, and I strongly recommend you check it out before building your own shiny solution!
With Parallel Runspace execution:
Reducing inescapable waiting time
In the original specific case, the executable invoked has a /nowait option, which prevents blocking the invoking thread while the job (in this case, time re-synchronization) finishes on its own.
This greatly reduces the overall execution time from the issuer's perspective, but connecting to each machine is still done in sequential order. Connecting to thousands of clients in sequence may take a long time, because every machine that is inaccessible for one reason or another adds a timeout wait, and those waits accumulate.
To get around having to queue up all subsequent connections in case of a single or a few consecutive timeouts, we can dispatch the job of connecting and invoking commands to separate PowerShell Runspaces, executing in parallel.
What is a Runspace?
A Runspace is the virtual container in which your PowerShell code executes, and represents/holds the Environment from the perspective of a PowerShell statement/command.
In broad terms, 1 Runspace = 1 thread of execution, so all we need to "multi-thread" our PowerShell script is a collection of Runspaces that can then in turn execute in parallel.
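Like the original problem, the job of invoking commands in multiple runspaces can be broken down into:
1. Creating a RunspacePool
2. Assigning a PowerShell instance with some executable code to the RunspacePool
3. Invoking the code asynchronously (that is, without waiting for it to return)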
RunspacePool template
PowerShell has a type accelerator called [RunspaceFactory] that will assist us in the creation of runspace components. Let's put it to work.
1. Create a RunspacePool and Open() it:
The two arguments passed to CreateRunspacePool(), 1 and 8, are the minimum and maximum number of runspaces allowed to execute at any given time, giving us an effective maximum degree of parallelism of 8.
2. Create an instance of PowerShell, attach some executable code to it and assign it to our RunspacePool:
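A minimal sketch of steps 1 and 2 taken together. The script block and its argument are placeholders chosen for illustration, not part of the original problem:

```powershell
# Step 1: create a pool that allows between 1 and 8 concurrent runspaces, then open it
$rsPool = [runspacefactory]::CreateRunspacePool(1, 8)
$rsPool.Open()

# Step 2: create a PowerShell instance, attach code and an argument to it,
# and assign it to the pool (placeholder script block, for illustration only)
$psInstance = [powershell]::Create()
[void]$psInstance.AddScript({ param($Name) "Hello from $Name" }).AddArgument('runspace')
$psInstance.RunspacePool = $rsPool
```

Nothing has executed at this point; the instance only holds the code and knows which pool it will run in.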
An instance of PowerShell is not the same as the powershell.exe process (which is really a Host application), but an internal runtime object representing the PowerShell code to execute. We can use the [powershell] type accelerator to create a new PowerShell instance within PowerShell.
3. Invoke the PowerShell instance asynchronously using APM:
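As a rough sketch of this step, assuming a placeholder script block that sleeps briefly: BeginInvoke() starts the instance without blocking the caller, and EndInvoke() later collects whatever it emitted.

```powershell
$rsPool = [runspacefactory]::CreateRunspacePool(1, 8)
$rsPool.Open()

# Placeholder workload, for illustration only
$psInstance = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 200; 'done' })
$psInstance.RunspacePool = $rsPool

# BeginInvoke() returns immediately with an IAsyncResult handle
$handle = $psInstance.BeginInvoke()

# ... the calling thread is free to do other work here ...

# EndInvoke() blocks until the code has finished, then returns its output
$result = $psInstance.EndInvoke($handle)

# Clean up once the output has been collected
$psInstance.Dispose()
$rsPool.Close()
```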
Using what is known in .NET development terminology as the Asynchronous Programming Model, we can split the invocation of a command into a Begin method, for giving a "green light" to execute the code, and an End method to collect the results. Since in this case we are not really interested in any feedback (we don't wait for the output from w32tm anyway), we can make do by simply calling the first method.
Wrapping it up in a RunspacePool
Using the above technique, we can wrap the sequential iterations of creating new connections and invoking the remote command in a parallel execution flow:
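One way this could look, as a sketch rather than a definitive implementation. The function name Start-ParallelResync and its parameters are hypothetical. It also swaps the explicit New-PSSession from the question for Invoke-Command -ComputerName, which creates and tears down a temporary session by itself:

```powershell
function Start-ParallelResync {   # hypothetical helper name
    param(
        [string[]]$ComputerName = @(),
        [pscredential]$Credential,
        [int]$MaxRunspaces = 8
    )

    # Code each runspace executes: connect to one machine and trigger the re-sync
    $code = {
        param($Credential, $Computer)
        Invoke-Command -ComputerName $Computer -Credential $Credential -ScriptBlock {
            w32tm /resync /nowait /rediscover
        }
    }

    $rsPool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspaces)
    $rsPool.Open()

    $jobs = @(foreach ($computer in $ComputerName) {
        $ps = [powershell]::Create().AddScript($code).AddArgument($Credential).AddArgument($computer)
        $ps.RunspacePool = $rsPool
        # BeginInvoke() queues the work and returns at once
        [pscustomobject]@{ ComputerName = $computer; Instance = $ps; Handle = $ps.BeginInvoke() }
    })

    # Hand back the pool and the job handles so the caller can clean up later
    [pscustomobject]@{ Pool = $rsPool; Jobs = $jobs }
}

# Usage with the variables from the question:
# $run = Start-ParallelResync -ComputerName $computers -Credential $creds
```

The loop itself finishes almost immediately, because each iteration only queues work; at most 8 connections are attempted at any one time.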
Assuming that the CPU has the capacity to execute all 8 runspaces at once, we should be able to see that the execution time is greatly reduced, but at the cost of readability of the script due to the rather "advanced" methods used.
Determining the optimum degree of parallelism:
We could easily create a RunspacePool that allows for the execution of 100 runspaces at the same time:
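For instance, only the second argument needs to change:

```powershell
# Same call as before, but with a maximum of 100 concurrent runspaces
$rsPool = [runspacefactory]::CreateRunspacePool(1, 100)
$rsPool.Open()
```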
But at the end of the day, it all comes down to how many units of execution our local CPU can handle. In other words, as long as your code is actually executing (rather than waiting on something), it does not make sense to allow more runspaces than you have logical processors to dispatch execution of code to.
Thanks to WMI, this threshold is fairly easy to determine:
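A short sketch, assuming a Windows machine. It uses Get-CimInstance, the newer counterpart of Get-WmiObject, against the same Win32_Processor class:

```powershell
# Win32_Processor returns one object per physical CPU (socket),
# so sum the logical processor counts across all of them
$logicalProcessors = (Get-CimInstance -ClassName Win32_Processor |
    Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum

# Use that number as the upper bound for the pool
$rsPool = [runspacefactory]::CreateRunspacePool(1, $logicalProcessors)
```

Where WMI is not available, [Environment]::ProcessorCount is a possible alternative source for the same kind of number.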
If, on the other hand, the code you are executing itself incurs a lot of wait time due to external factors like network latency, you can still benefit from running more simultaneous runspaces than you have logical processors, so you'd probably want to test a range of possible maximum runspace counts to find the break-even point:
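A rough way to run such a test, sketched with a placeholder workload (a 200 ms sleep standing in for a slow network call) and an arbitrary task count of 16. Substitute your real code and a representative sample of targets:

```powershell
# Placeholder workload: pure waiting, standing in for network latency
$code = { Start-Sleep -Milliseconds 200 }
$taskCount = 16
$cores = [Environment]::ProcessorCount

$timings = foreach ($max in $cores, ($cores * 2), ($cores * 4)) {
    $elapsed = Measure-Command {
        $pool = [runspacefactory]::CreateRunspacePool(1, $max)
        $pool.Open()

        # Dispatch every task, keeping the handles so we can wait for them
        $jobs = foreach ($i in 1..$taskCount) {
            $ps = [powershell]::Create().AddScript($code)
            $ps.RunspacePool = $pool
            [pscustomobject]@{ Instance = $ps; Handle = $ps.BeginInvoke() }
        }

        # EndInvoke() blocks until each task has finished
        foreach ($job in $jobs) {
            [void]$job.Instance.EndInvoke($job.Handle)
            $job.Instance.Dispose()
        }
        $pool.Close()
    }
    [pscustomobject]@{ MaxRunspaces = $max; Seconds = $elapsed.TotalSeconds }
}

$timings | Format-Table -AutoSize
```

The point at which adding runspaces stops reducing the elapsed time is the break-even for that particular workload.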
Adding to this discussion, what's missing is a collector to store the data that is created from the runspace, and a variable to check the status of the runspace, i.e. is it completed or not.
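A minimal sketch of such a collector, with a placeholder script block that squares its argument. Each entry keeps the PowerShell instance together with its handle, so the handle's IsCompleted property can serve as the status check and EndInvoke() can fetch the output afterwards:

```powershell
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Collector: one entry per dispatched runspace
$jobs = [System.Collections.Generic.List[object]]::new()
foreach ($n in 1..5) {
    # Placeholder workload, for illustration only
    $ps = [powershell]::Create().AddScript({
        param($n)
        Start-Sleep -Milliseconds 100
        $n * $n
    }).AddArgument($n)
    $ps.RunspacePool = $pool
    $jobs.Add([pscustomobject]@{ Argument = $n; Instance = $ps; Handle = $ps.BeginInvoke() })
}

# Status check: poll until every handle reports completion
while ($jobs.Handle.IsCompleted -contains $false) {
    Start-Sleep -Milliseconds 50
}

# Collect the output of each runspace, then release it
$results = foreach ($job in $jobs) {
    [pscustomobject]@{
        Argument = $job.Argument
        Output   = @($job.Instance.EndInvoke($job.Handle))
    }
    $job.Instance.Dispose()
}
$pool.Close()
```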
Check out PoshRSJob. It provides the same or similar functions as the native *-Job functions, but uses Runspaces, which tend to be much quicker and more responsive than the standard PowerShell jobs.
@mathias-r-jessen has a great answer though there are details I'd like to add.
Max Threads
In theory threads should be limited by the number of system processors. However, while testing AsyncTcpScan I achieved far better performance by choosing a much larger value for MaxThreads. That is why that module has a -MaxThreads input parameter. Keep in mind that allocating too many threads will hinder performance.
Returning Data
Getting data back from the ScriptBlock is tricky. I've updated the OP code and integrated it into what was used for AsyncTcpScan.