
3/2/2020 My-LabVIEW: For Loop “Iteration Parallelism”


Features of LabVIEW, Tools and Tips for LabVIEW Developers.

Tuesday, 27 August 2013

For Loop “Iteration Parallelism”
Let’s say we are developing LabVIEW code to monitor the number of LabVIEW instances
running at any given point of time over a LAN. Then our code snippet will look like this.

In this code snippet, the “TCP Open connection” primitive will time out after 1 second, and the
loop runs 256 iterations, so the worst-case execution time of the above snippet would be around
256 seconds. The actual measurement shows the execution time to be around 225 seconds on a
quad-core machine running Windows 7.
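
For readers who prefer text to a block diagram, the sequential scan can be sketched in Python. This is only an analogy to the LabVIEW snippet, not the author's code: the names `is_listening`, `count_instances` and `worst_case_seconds`, the host list and the port are all hypothetical, and the `probe` parameter exists only so the counting logic can be tried without touching a network.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def count_instances(hosts, port, probe=is_listening):
    """Probe the hosts one after another and count the ones that answer."""
    return sum(1 for host in hosts if probe(host, port))


def worst_case_seconds(n_hosts, timeout_s=1.0):
    """Worst case for a sequential scan: every single probe times out."""
    return n_hosts * timeout_s
```

With 256 addresses and a 1-second timeout, `worst_case_seconds(256)` gives 256.0, which is the bound quoted above; the measured 225 seconds sits a little below it.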

Common questions that arise in our minds are:

1. Is there a way to reduce the execution time?

2. Can we improve the performance by making use of all the available CPU-cores?

The answer to both these questions is yes. Let’s see how we can achieve this.

From LabVIEW 2009 onward, the For Loop has a feature called “Iteration Parallelism”.
We enabled this feature in our code and experimented with its settings to find the optimal
configuration. We were able to reduce the execution time of our code from 225 seconds to just 4 seconds.

Steps to enable “Iteration Parallelism” option:

1. Check whether the For Loop can be parallelized by using the tool under the menu
“Tools >> Profile >> Find Parallelizable Loops”.
2. If the loop is not parallelizable, remove any data dependencies between the iterations. Shift
registers and feedback nodes are the most common causes of data dependencies between
loop iterations, so try to avoid them.
3. In our case we modified the code to separate the “iteration independent code” from the
“iteration dependent code”. After the modification our code looks like this.

The first For Loop is parallelizable; the second is not, because it contains the
iteration-dependent code.

4. Right-click on the border of the first For Loop and select “Configure Iteration Parallelism...”.
The following dialog box will pop up.

5. In the above window we will refer to the “number of generated parallel loop instances” as
“T”. The maximum value that can be entered for T is 64.
6. Leave “Allow debugging” unchecked and “Automatically partition iterations” selected.
7. Check “Enable loop iteration parallelism” and click OK; the P terminal will appear on
the For Loop border.
8. Create a control for this terminal and label it “P”.
9. The following table shows the results of the execution time of our code with various
combinations of P and T.

Execution time for different values of “# of generated parallel loop instances” (T)

P terminal input           T=2        T=4        T=8        T=16       T=32       T=64
2                          113.019s   124.996s   122.009s   124.005s   119.005s   110.019s
Not wired (on quad core)   100.029s    61.005s    58.005s    61.017s    61.005s    53.781s
4                          115.020s    60.003s    60.002s    54.016s    49.004s    54.013s
8                          114.012s    61.993s    28.010s    28.019s    31.004s    28.071s
16                         102.030s    58.005s    26.011s    15.009s    14.008s    14.024s
32                         100.007s    57.008s    30.004s    15.011s     6.035s     7.009s
64                         123.003s    62.992s    29.013s    15.008s     7.009s     3.072s
128                        122.002s    58.003s    29.007s    15.003s     7.010s     4.020s
256                        118.008s    59.005s    30.007s    14.013s     7.034s     4.084s
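
The restructuring described in steps 2 and 3 (one loop with iteration-independent work, one with iteration-dependent work) can be sketched in Python as a loose analogy. This is a hedged illustration, not LabVIEW's mechanism: a thread pool stands in for the generated loop instances, the function names are hypothetical, and capping the worker count at `min(P, T)` is an assumption chosen to match the behaviour of P and T described in this post.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_workers(p=None, t=64):
    """Workers actually used: P capped by T. An unwired P (None) falls back to the core count."""
    requested = p if p is not None else (os.cpu_count() or 1)
    return max(1, min(requested, t))


def count_instances_parallel(hosts, port, probe, p=None, t=64):
    # Loop 1: iteration-independent work. Each probe depends only on its own
    # host, so the calls may run concurrently. pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=effective_workers(p, t)) as pool:
        reachable = list(pool.map(lambda host: probe(host, port), hosts))

    # Loop 2: iteration-dependent work. The running total plays the role of a
    # shift register, so this loop stays sequential.
    total = 0
    for ok in reachable:
        if ok:
            total += 1
    return total
```

Here `probe` is any callable taking `(host, port)` and returning a boolean, for example a function that tries to open a TCP connection with a 1-second timeout.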
If you observe carefully, for any value of P > T the execution time is almost the same as when P = T.
Looking at the 2nd and 3rd rows, the execution times are almost the same. This indicates that if you don’t
wire any input to P, LabVIEW uses 2 on a dual-core machine, 4 on a quad-core machine, and so on.
So we removed the second row when plotting the variation of execution time with respect to P and T.

[Figure: variation of execution time along each column]

[Figure: variation of execution time along each row]

It is clear that increasing the number of threads (both T and P) reduces the execution time. With
iteration parallelism (T = 64 and P >= 64) we reduced the execution time from a worst case of
256 seconds (225 seconds measured) to about 4 seconds.
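
A rough back-of-the-envelope model helps explain the numbers in the table. It is an assumption of this sketch, not something the measurements prove: suppose every connection attempt times out after 1 second and `min(P, T)` iterations run at once, so the 256 iterations need `ceil(256 / min(P, T))` rounds of one timeout each. The function name below is hypothetical.

```python
import math


def estimated_worst_case_s(n_iterations, p, t, timeout_s=1.0):
    """Estimate the run time when every iteration blocks for `timeout_s`
    and min(p, t) iterations execute concurrently."""
    workers = min(p, t)
    rounds = math.ceil(n_iterations / workers)
    return rounds * timeout_s
```

For 256 iterations this gives 128 s for `min(P, T) = 2`, 32 s for 8, and 4 s for 64. The measured times in the table land close to these estimates and mostly just under them, for example 28.010 s at P = T = 8 and 3.072 s at P = T = 64.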

1. The parameter “T”, that is “Number of generated parallel loop instances”, is the number of
threads to be generated at compile time.
2. The “P” terminal controls how many of the available threads are used at run time.
3. If you don’t wire any input to P, LabVIEW uses 2 on a dual-core machine, 4 on a quad-core machine,
and so on.
4. Wiring any value above 64 to P has no impact on the performance, since T cannot exceed 64.
5. Since our example code is very small, creating 64 copies of the code will not consume too much
memory, but if your code is big you may run out of memory. In that case,
try reducing T.
6. Our example code has a delay (as part of the Open TCP connection), so all 64 threads were
able to run in parallel and we saw a significant performance improvement. But most algorithms
will not have any delay and will require the CPU for each of the 64 threads. In such cases, you
will not observe a performance improvement greater than 2x or 4x, and performance may even
degrade because of the overhead of thread handling.
7. In our example, using all 64 available threads gave us the best performance (about 3 seconds),
but this may not always be the case. The main limiting factors are the available
memory, the absence of delays in the code, and the number of CPUs.
8. If there is no delay inside the For Loop, it is better to leave the P terminal unwired, so that
LabVIEW can decide how many threads to use based on the number of CPU cores in the
target machine.
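
Point 6 can be illustrated with a small Python timing sketch. It is only an analogy with hypothetical names: `time.sleep` stands in for the TCP timeout, and a thread pool stands in for the parallel loop instances. Tasks that merely wait overlap almost completely, which is why the scan above scaled so well; CPU-bound work would not scale this way beyond the number of cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_task(delay_s):
    """Simulate an iteration that spends its time waiting (e.g. on a timeout)."""
    time.sleep(delay_s)
    return delay_s


def timed_run(n_tasks, delay_s, workers):
    """Run n_tasks waiting tasks on `workers` threads and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(wait_task, [delay_s] * n_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("1 worker :", round(timed_run(8, 0.05, 1), 2), "s")
    print("8 workers:", round(timed_run(8, 0.05, 8), 2), "s")
```

With one worker the eight waits add up to roughly 0.4 s; with eight workers they overlap and the run takes about as long as a single wait.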

Posted by anishprabu243 at 8:31 AM

Developed by Anish T.
