Python Forum
Joblib worker error
#1
I'm using Joblib for parallelization.

Here is my call to joblib:

backtestdf_list = Parallel(n_jobs = coremax)(delayed(do_backtest)(backtest_start_index) for backtest_start_index in backtest_start_index_list)
Here is the function do_backtest:

def do_backtest(backtest_start_index):
    try:
        day_start_time = time.time()
        #Calculate the best params for the last X days
        def summarize(sample_dur):
            sampledf = completedf[(completedf['date'] >= unique_dates[backtest_start_index - sample_dur]) & (completedf['date'] < unique_dates[backtest_start_index])]
            return get_summarydf(sampledf).nlargest(1, 'avgdaily_percent_ROR')
        summarized_list = list(map(summarize, sample_dur_list))
        summarized_df = pd.concat(summarized_list)
        #Add sample_dur and b_index to the dataframe
        summarized_df['sample_dur'] = sample_dur_list
        summarized_df['b_index'] = backtest_start_index
        #summarized_df = summarized_df.sort_values(by='avgdaily_percent_ROR', ascending=False) #Uncomment if you want to view the summarized_df sorted, logically it does not matter if it is sorted or not at this point
        print('Finished backtesting ' + unique_dates[backtest_start_index] + ' @ ' + datetime.now().strftime(progressformat) + ' - Execution time (HH:MM:SS.xx) : ' + timer(day_start_time))
        return summarized_df
    except Exception as e:
        print('oh nooooo')
        print(e)
        errorlist.append((unique_dates[backtest_start_index], e))
And this is the output I get:

Output:
Finished backtesting 2023-12-21 @ 05-03-2024 12:01:09 PM - Execution time (HH:MM:SS.xx) : 00:00:06.46
Finished backtesting 2023-12-22 @ 05-03-2024 12:01:10 PM - Execution time (HH:MM:SS.xx) : 00:00:06.51
Finished backtesting 2023-12-26 @ 05-03-2024 12:01:11 PM - Execution time (HH:MM:SS.xx) : 00:00:06.43
Finished backtesting 2023-12-27 @ 05-03-2024 12:01:12 PM - Execution time (HH:MM:SS.xx) : 00:00:06.43
Finished backtesting 2023-12-28 @ 05-03-2024 12:01:12 PM - Execution time (HH:MM:SS.xx) : 00:00:06.41
Finished backtesting 2023-12-29 @ 05-03-2024 12:01:13 PM - Execution time (HH:MM:SS.xx) : 00:00:06.35
Finished backtesting 2024-01-02 @ 05-03-2024 12:01:14 PM - Execution time (HH:MM:SS.xx) : 00:00:06.52
Finished backtesting 2024-01-03 @ 05-03-2024 12:01:15 PM - Execution time (HH:MM:SS.xx) : 00:00:06.55
Finished backtesting 2024-01-04 @ 05-03-2024 12:01:15 PM - Execution time (HH:MM:SS.xx) : 00:00:06.37
Finished backtesting 2024-01-05 @ 05-03-2024 12:01:16 PM - Execution time (HH:MM:SS.xx) : 00:00:06.43
Finished backtesting 2024-01-08 @ 05-03-2024 12:01:17 PM - Execution time (HH:MM:SS.xx) : 00:00:06.46
Finished backtesting 2024-01-09 @ 05-03-2024 12:01:18 PM - Execution time (HH:MM:SS.xx) : 00:00:06.69
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(
Finished backtesting 2024-01-10 @ 05-03-2024 12:01:19 PM - Execution time (HH:MM:SS.xx) : 00:00:06.59
Finished backtesting 2024-01-11 @ 05-03-2024 12:01:19 PM - Execution time (HH:MM:SS.xx) : 00:00:06.49
Finished backtesting 2024-01-12 @ 05-03-2024 12:01:20 PM - Execution time (HH:MM:SS.xx) : 00:00:06.27
Finished backtesting 2024-01-16 @ 05-03-2024 12:01:20 PM - Execution time (HH:MM:SS.xx) : 00:00:06.12
Finished backtesting 2024-01-17 @ 05-03-2024 12:01:21 PM - Execution time (HH:MM:SS.xx) : 00:00:06.48
Finished backtesting 2024-01-18 @ 05-03-2024 12:01:22 PM - Execution time (HH:MM:SS.xx) : 00:00:06.37
Finished backtesting 2024-01-19 @ 05-03-2024 12:01:23 PM - Execution time (HH:MM:SS.xx) : 00:00:06.32
Finished backtesting 2024-01-22 @ 05-03-2024 12:01:23 PM - Execution time (HH:MM:SS.xx) : 00:00:06.09
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(
Finished backtesting 2024-01-23 @ 05-03-2024 12:01:25 PM - Execution time (HH:MM:SS.xx) : 00:00:06.76
Finished backtesting 2024-01-24 @ 05-03-2024 12:01:25 PM - Execution time (HH:MM:SS.xx) : 00:00:06.62
Finished backtesting 2024-01-25 @ 05-03-2024 12:01:26 PM - Execution time (HH:MM:SS.xx) : 00:00:06.50
Finished backtesting 2024-01-29 @ 05-03-2024 12:01:27 PM - Execution time (HH:MM:SS.xx) : 00:00:06.61
Finished backtesting 2024-01-26 @ 05-03-2024 12:01:28 PM - Execution time (HH:MM:SS.xx) : 00:00:06.63
Finished backtesting 2024-01-30 @ 05-03-2024 12:01:28 PM - Execution time (HH:MM:SS.xx) : 00:00:06.57
Finished backtesting 2024-01-31 @ 05-03-2024 12:01:29 PM - Execution time (HH:MM:SS.xx) : 00:00:06.36
Finished backtesting 2024-02-01 @ 05-03-2024 12:01:30 PM - Execution time (HH:MM:SS.xx) : 00:00:06.70
Finished backtesting 2024-02-02 @ 05-03-2024 12:01:30 PM - Execution time (HH:MM:SS.xx) : 00:00:06.57
Finished backtesting 2024-02-05 @ 05-03-2024 12:01:31 PM - Execution time (HH:MM:SS.xx) : 00:00:06.64
Finished backtesting 2024-02-06 @ 05-03-2024 12:01:32 PM - Execution time (HH:MM:SS.xx) : 00:00:06.65
Finished backtesting 2024-02-07 @ 05-03-2024 12:01:33 PM - Execution time (HH:MM:SS.xx) : 00:00:06.53
As you can see, I am occasionally getting errors with certain joblib workers. Here are my observations:

I'm running on 2 different computers, Windows 11 and macOS 14.4.1, both on Python 3 with joblib updated to the latest version.

The error seems to happen MUCH more often on the Mac than on the Windows machine. In fact, it's quite rare on the Windows machine, but on the Mac, if I run the joblib function more than a few hundred times, it's bound to happen at least once or twice. This discrepancy between macOS and Windows makes me think it might be an actual bug in the joblib code on macOS.

I also thought maybe I was running out of RAM, but I have been monitoring my memory usage during execution, and the error happens often enough when there is still RAM to spare, so that's not it.

If I run my script multiple times, it never happens in the same place. Nor does it happen the same number of times; sometimes it doesn't happen at all. So it seems to occur at random.

It's also my understanding that if a worker fails, joblib does NOT automatically retry the work that worker was doing, so a bit of our data goes missing. If someone can confirm that this is correct, I'd appreciate it. That is why I wrote some error handling code into my function: I was looking for ways to make joblib re-run a worker if it fails. My error handling code above is not working, however. If it caught an exception, it should at least print 'oh nooooo', but it doesn't even do that, so the exceptions are not being caught. Since, as far as I understand, joblib has no internal way to automatically re-run failed workers, I need to handle a re-run manually.
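One possible reason the `except` block never fires: a `try`/`except` inside `do_backtest` can only catch Python exceptions raised within that function, while the warning shown above is issued from `process_executor.py` rather than from the function itself. Separately, with joblib's default process-based backend, `errorlist.append(...)` changes the worker's own copy of the list, so the parent script would not see those entries. A minimal sketch of an alternative, in which the worker hands the error back as part of its return value, is shown below. The names `call_and_capture`, `split_outcomes` and `demo_task` are hypothetical and not part of joblib.

```python
import traceback


def call_and_capture(func, item):
    """Run func(item) and return (item, result, error_text).

    error_text is None on success. On failure, result is None and
    error_text holds the formatted traceback as a plain string.
    """
    try:
        return item, func(item), None
    except Exception:
        return item, None, traceback.format_exc()


def split_outcomes(outcomes):
    """Separate successful results from failures, keyed by item."""
    results = {item: res for item, res, err in outcomes if err is None}
    failures = {item: err for item, res, err in outcomes if err is not None}
    return results, failures


# Stand-in for do_backtest, used only for this demonstration.
def demo_task(i):
    if i == 2:
        raise ValueError("simulated failure")
    return i * 10


# Run serially here; the same wrapper could be passed to a parallel runner.
outcomes = [call_and_capture(demo_task, i) for i in range(4)]
results, failures = split_outcomes(outcomes)
```

With joblib, the wrapper could presumably be used as `Parallel(n_jobs=coremax)(delayed(call_and_capture)(do_backtest, i) for i in backtest_start_index_list)`, so that every failure caught inside the function arrives in the parent process together with its index. This only covers exceptions raised inside the function; it would not report a worker process that stops for other reasons.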

The error also says this may be caused by too short a worker timeout. From my research, if you don't specify a worker timeout, isn't the timeout infinite by default in joblib? So by specifying a timeout I would only make it shorter, which wouldn't help... or so I think.

Ultimately, if a worker fails, I just need the data re-run. Any suggestions on how to code it properly so that the data is re-run when a worker fails?
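A manual re-run could be sketched as a loop that keeps resubmitting only the items that still have no result. This is an illustration under stated assumptions, not joblib functionality: `run_with_retries` and `flaky_batch` are hypothetical names, and `run_batch` stands for any callable that takes a list of pending items and returns a dict of the ones that succeeded.

```python
def run_with_retries(run_batch, items, max_rounds=3):
    """Call run_batch on the items still missing until all have results.

    run_batch(pending) must return a dict mapping each item that
    succeeded to its result; items it omits are retried next round.
    Returns (results, items_still_missing_after_max_rounds).
    """
    results = {}
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            break
        results.update(run_batch(pending))
        pending = [item for item in pending if item not in results]
    return results, pending


# Demonstration with a fake batch runner that drops item 3 on its first call.
calls = []


def flaky_batch(pending):
    calls.append(list(pending))
    skip = {3} if len(calls) == 1 else set()
    return {i: i * i for i in pending if i not in skip}


results, still_missing = run_with_retries(flaky_batch, [1, 2, 3, 4])
```

For the script in this thread, `run_batch` might wrap the existing `Parallel(...)` call, pair each returned dataframe with its index using `zip(pending, ...)`, and leave out entries where `do_backtest` returned `None`. Capping the number of rounds avoids looping forever on an index that fails every time.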
#2
Output:
b_index: 265 Finished backtesting 2024-01-24 @ 05-03-2024 01:32:15 PM - Execution time (HH:MM:SS.xx) : 00:00:08.69
b_index: 266 Finished backtesting 2024-01-25 @ 05-03-2024 01:32:16 PM - Execution time (HH:MM:SS.xx) : 00:00:08.86
b_index: 267 Finished backtesting 2024-01-26 @ 05-03-2024 01:32:17 PM - Execution time (HH:MM:SS.xx) : 00:00:08.69
b_index: 268 Finished backtesting 2024-01-29 @ 05-03-2024 01:32:17 PM - Execution time (HH:MM:SS.xx) : 00:00:08.65
b_index: 269 Finished backtesting 2024-01-30 @ 05-03-2024 01:32:18 PM - Execution time (HH:MM:SS.xx) : 00:00:08.47
b_index: 270 Finished backtesting 2024-01-31 @ 05-03-2024 01:32:19 PM - Execution time (HH:MM:SS.xx) : 00:00:08.49
b_index: 271 Finished backtesting 2024-02-01 @ 05-03-2024 01:32:20 PM - Execution time (HH:MM:SS.xx) : 00:00:08.53
b_index: 272 Finished backtesting 2024-02-02 @ 05-03-2024 01:32:21 PM - Execution time (HH:MM:SS.xx) : 00:00:08.81
b_index: 273 Finished backtesting 2024-02-05 @ 05-03-2024 01:32:21 PM - Execution time (HH:MM:SS.xx) : 00:00:08.62
b_index: 274 Finished backtesting 2024-02-06 @ 05-03-2024 01:32:22 PM - Execution time (HH:MM:SS.xx) : 00:00:08.73
b_index: 275 Finished backtesting 2024-02-07 @ 05-03-2024 01:32:23 PM - Execution time (HH:MM:SS.xx) : 00:00:08.41
b_index: 276 Finished backtesting 2024-02-08 @ 05-03-2024 01:32:24 PM - Execution time (HH:MM:SS.xx) : 00:00:08.65
b_index: 277 Finished backtesting 2024-02-09 @ 05-03-2024 01:32:24 PM - Execution time (HH:MM:SS.xx) : 00:00:08.47
b_index: 278 Finished backtesting 2024-02-12 @ 05-03-2024 01:32:25 PM - Execution time (HH:MM:SS.xx) : 00:00:08.31
b_index: 279 Finished backtesting 2024-02-13 @ 05-03-2024 01:32:26 PM - Execution time (HH:MM:SS.xx) : 00:00:08.42
b_index: 280 Finished backtesting 2024-02-14 @ 05-03-2024 01:32:26 PM - Execution time (HH:MM:SS.xx) : 00:00:08.26
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(
b_index: 281 Finished backtesting 2024-02-15 @ 05-03-2024 01:32:27 PM - Execution time (HH:MM:SS.xx) : 00:00:08.55
b_index: 282 Finished backtesting 2024-02-16 @ 05-03-2024 01:32:28 PM - Execution time (HH:MM:SS.xx) : 00:00:08.50
b_index: 283 Finished backtesting 2024-02-20 @ 05-03-2024 01:32:29 PM - Execution time (HH:MM:SS.xx) : 00:00:08.59
b_index: 284 Finished backtesting 2024-02-21 @ 05-03-2024 01:32:30 PM - Execution time (HH:MM:SS.xx) : 00:00:08.61
b_index: 285 Finished backtesting 2024-02-22 @ 05-03-2024 01:32:31 PM - Execution time (HH:MM:SS.xx) : 00:00:08.51
b_index: 286 Finished backtesting 2024-02-23 @ 05-03-2024 01:32:32 PM - Execution time (HH:MM:SS.xx) : 00:00:08.94
b_index: 287 Finished backtesting 2024-02-26 @ 05-03-2024 01:32:33 PM - Execution time (HH:MM:SS.xx) : 00:00:08.99
b_index: 288 Finished backtesting 2024-02-27 @ 05-03-2024 01:32:33 PM - Execution time (HH:MM:SS.xx) : 00:00:08.96
b_index: 289 Finished backtesting 2024-02-28 @ 05-03-2024 01:32:34 PM - Execution time (HH:MM:SS.xx) : 00:00:08.83
b_index: 290 Finished backtesting 2024-02-29 @ 05-03-2024 01:32:35 PM - Execution time (HH:MM:SS.xx) : 00:00:08.77
b_index: 291 Finished backtesting 2024-03-01 @ 05-03-2024 01:32:35 PM - Execution time (HH:MM:SS.xx) : 00:00:08.75
b_index: 292 Finished backtesting 2024-03-04 @ 05-03-2024 01:32:36 PM - Execution time (HH:MM:SS.xx) : 00:00:08.88
b_index: 293 Finished backtesting 2024-03-05 @ 05-03-2024 01:32:38 PM - Execution time (HH:MM:SS.xx) : 00:00:08.89
Above is another run of joblib. This time I added a variable, b_index, to the output. As a reminder, my joblib call runs the function do_backtest(), which returns a dataframe, and b_index is inserted into one of the columns of that dataframe. I thought that if a joblib worker fails, the work is not re-executed and you lose the data, but I guess I was wrong? Every b_index value in the vicinity was executed, and in the resulting dataframe b_index 280 has all the required values. I know that joblib does not always execute the items in order, and my machine has 16 threads, so just to be safe I checked all the b_index values from 260 to 300 in the resulting dataframe, and all the data is there. So even though I got the message, it processed all the data.

But I still don't know what triggers this message, or whether joblib always still processes everything and gives me all the data, or only sometimes.
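Rather than checking a range by eye, the completeness check described above could be automated after every run. The following is a small sketch using only plain Python; `missing_indices` is a hypothetical helper, and the example data simply pretends that index 280 was lost.

```python
def missing_indices(expected, seen):
    """Return the expected indices that never appear in seen, in order."""
    seen_set = set(seen)
    return [i for i in expected if i not in seen_set]


# Example data: indices 260..300 were submitted, and 280 is missing
# from the results (simulated here, not taken from a real run).
expected = list(range(260, 301))
seen = [i for i in expected if i != 280]
missing = missing_indices(expected, seen)
```

In the script from this thread, `expected` would be `backtest_start_index_list`, and `seen` could be taken from the `b_index` column of the concatenated result, for example `pd.concat(backtestdf_list)['b_index'].unique()`, assuming every entry of `backtestdf_list` is a dataframe. An empty list means every submitted index produced data, regardless of whether the warning appeared.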