Trivial Solution To CPU-Bound Thread Slowdown On Multicore Systems

According to David Beasley, the CPython GIL doesn't just prevent CPU-bound applications from taking advantage of multiple cores. He says it also slows them down on multicore systems.

As an example, he says, following CPU-bound function runs is slower on multicore systems when run in two threads vs when run twice sequentially on the same thread:
def counts(n=10000000):
while n > 0:
n -= 1

I tested the claim by running this little program (let's call it
from time import time
from threading import Thread

def counts(n=10000000):
while n > 0:
n -= 1

def measure(f, comment):
print time()-t, "seconds"

def sequ():

def para():
t1 = Thread(target=counts,args=())
t2 = Thread(target=counts,args=())
t1.join(); t2.join()

if __name__ == "__main__":
measure(sequ, "Sequential Execution")
measure(para, "Sequential Execution")

Here's the result from running the script on a Core Duo 2 laptop:
Sequential Execution: 4.81399989128 seconds
Parallel Execution: 11.2730000019 seconds

He's sort-of right in this case. So why has the GIL has survived? Partly because the problem is easily solved.

One way to do that is to use the sys.checkinterval function which allows CPU-bound threads to do a little more work before giving up control of the GIL. Since the overhead of GIL passing in the above is more than 100%, we need to increase the check interval by a factor of hundred or so. (The default is 100.)

Just add the following to after "if __name__ == '__main__'":
import sys

Let's run it again and see what happens:
Sequential Execution: 4.63999986649 seconds
Parallel Execution: 4.63199996948 seconds

Problem solved!

Avoiding Pre-emption Altogether

Personally, I like to include "sys.setcheckinterval(sys.maxint)" in all my scripts, so I don't have to lock shared data structures when manipulating them in Python code. If you do that, your threads will never be pre-empted as long as you don't call any blocking functions that release the GIL, and you can avoid the overhead of fine-grained locking, potential deadlocks, etc. What do you think?

11 comments :: Trivial Solution To CPU-Bound Thread Slowdown On Multicore Systems

  1. Forgive me if I'm wrong, but I'll mirror my reddit comment (since I don't want this to be google cached and lead some folks astray):

    Setting the check interval so high just means that they're effectively running sequentially... it's just running Thread1 until it's done and then Thread2 until it's done and voila... sequential time.

    With parallel execution you should get more than 0.008 seconds faster (which I would call a fluke).

  2. The above comment is correct. Also, time.time() measures wall clock time, so your benchmark times will be incorrect. You really want to measure CPU time, using time.clock()

  3. @TheMoken: 10000 ticks is as high as you think!

    In high performance mode, my core 2 duo runs 20 million iterations of the loop above in 2 seconds. That's 10 million iterations per second.

    Each iteration involves at least 3 bytecode instructions: decrement, compare, and jump. So that's at least 30 million python bytecode instructions per second.

    If the bytecode interpreter switches threads every 100 instructions, that means it's switching threads 300,000 times per second. That's ridiculous; no wonder it's more than twice as slow!

    Switching threads every 10000 instructions means switching threads 3000 times per second. I don't see how that's not enough to preserve the illusion of concurrent execution.

    I understand that for most real-world applications, each bytecode instruction is going to do much more than decrementing an integer counter, but then the slowdown on multicore systems will be correspondingly lower for those applications, too.

    If my deductions above are correct, then I think we can boldly say the GIL is not worse than what we thought it was, as long as the checkinterval is set appropriately.

    I think the default of switching every 100 bytecode instructions is extremely unrealistic in a world where systems have multiple cores and run millions of bytecode instructions per second.

  4. Correction:
    10,000 ticks is *NOT* as high as you think.

  5. It doesn't matter what you set it to, you're just closing in on sequential time by running them almost sequentially. Python's thread problems don't just go away because you start running larger block of the threads at a time. All that does is prevent the totally pointless threading overhead from choking your program. Basically, the threads are functioning in lockstep rather than in parallel, in addition to having the overhead of getting the GIL.

    If the system was properly threading, it would be a no-brainer to match sequential time but on your dual-core core2 the runtime should've dropped significantly because the native OS threads would've been able to take advantage of the multi-processor. That is... if Python's threading weren't fundamentally broken.

  6. Yes, CPython's threading is 'broken', but it's not as broken as the presentation suggested. It doesn't run more slowly on multi-core systems if configured appropriately.

  7. Seun, setting interval to 10,000 (3,000 per sec) will have the effect of limiting I/O per second, no?

  8. Peace be with you brother! i see your good works every where and i need your help in creating a forum similar to the one you've got.
    Agile Developers

  9. Very wonderful information Here bro, i really appreciate the concern in helping your fellow brothers out. Thank You and Peace. web host

  10. Hey Boss! I've heard and read a lot about you, and I'm impressed with it. I would love to join your crew as an apprentice programmer, or maybe as an article writer. My mail address is: Thank you.

  11. MR Henry On 01/07/2015 13:45:41
    #ATTENTION, WELCOME TO THE GREAT TEMPLE OF ILLUMINATI INITIATION HOW TO BECOME A MEMBER OF ILLUMINATI IN NIGERIA CALL DIRECT: +2348104497242 contact my mail box (THE ILLUMINATI_SECRET_SOCIETY WE OFFER YOU THE GREATEST OF GOOD FORTUNE) PLEASE MAKE UP YOUR MIND BEFORE YOU CALL BECAUSE ONCE YOU ARE IN THERE IS NO GOING BACK ONCE YOUR ARE SUCCESSFULLY INITIATED INTO THE ILLUMINATI. Now Belong to GREAT ILLUMINATI and get RICH without any human sacrifice,take away fear from your mind and become super rich on FAME,POWER,and RICHES. BRIEF ABOUT THE ILLUMINATI CONTACT US TODAY BY CALLING US Many members are people who WANT to achieve success, make more money and increase their own personal control, Love, Ministries, power, confidence, certainty and security. Many people join Illuminati Legacy because they have not yet achieved wealth, but WANT financial freedom and independence. They have goals and dreams and WANT them to come true. Maybe this is YOU right now. AIMS OF JOINING THE ILLUMINATI the Illuminati is a gathering of the elite and wise and those who believe in a new and better world,a one world government,that will keep joy in everyone face and make the wrongs right again. they will project a world of equal right and justice. uniting the world is what we stands for,helping those with dreams to rule and live a good and happy life achieving their dreams, the things you hear most of the time are not real please find out for yourself... WHO CAN JOIN THE ILLUMINATI... i believe you have heard different stories about our members all over the world. this is to tell you that everyone is welcome but you must be above 18 years all that matters to us is your precious soul and star,the brighter your star, the quicker for you to achieve your dreams,don't let anyone talk you down,just believe in yourself..and we shall live in you, you shall be reborn and see life in new dimension

Post a Comment