Variable | Location | Usage |
---|---|---|
BLOCKS_OVERRIDE | .env.gpu.xx | CUDA blocks count |
THREADS_OVERRIDE | .env.gpu.xx | threads per CUDA block |
POINTS_OVERRIDE | .env.gpu.xx | points per thread |
GPU_MEM_CLOCK_xx | .env.watchdog | Memory clock to use |
GPU_POWER_LIMIT_xx | .env.watchdog | Power limit to use |
If the best settings for your card model are not stored in the server database, you may need to find them yourself. You can use the Benchmarker container to run benchmarks and compare the results.
- Let xx be the id of the card to test. First, from Portainer, stop the watchdog container, then the worker associated with card xx. You will then be able to run the benchmark.
- Still from Portainer, start the benchmarker container, then open a shell inside it (Exec console).
1. ./info.py: This script shows GPU information. It lists the possible values for GPU_MEM_CLOCK_xx and GPU_POWER_LIMIT_xx, and shows the commands to run to apply those settings (using nvidia-smi).
2. ./benchmark.py: This script is the benchmark. It scans a range and displays the scan speed. You can set BLOCKS_OVERRIDE, THREADS_OVERRIDE and POINTS_OVERRIDE on the command line and compare the results to find the best combination.
Test with default settings
Test with custom settings
3. My methodology and experience optimizing settings (done on an RTX 3060 Ti).
At first I left the GPU clock and power limit alone and focused on the triplet (blocks/threads/points). Once I had good settings for blocks/threads/points, I was satisfied with the speed: around 900 MH/s at 240 W power consumption.
Then I focused on power consumption and tried lowering the power limit. I found I could get around 640 MH/s by reducing the memory clock from 6801 to 5001 and the power limit from 240 W to 100 W. The card then drew 100 W.
I decided to keep the 5001/100 settings because energy is expensive in France: even though I lose roughly 28% of the speed, I cut the energy cost by roughly 60%. If energy were cheap I might have kept the 6801/240 (factory) settings. You decide.
So basically, most of the speed gain comes from finding the best values for blocks/threads/points; this is the most important step. Memory clock and power limit are an additional tuning step to save money on energy.
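The trade-off between the two operating points described above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded figures quoted in the text (about 900 MH/s at 240 W, about 640 MH/s at 100 W). The function name `compare_operating_points` is made up for this illustration, and energy cost is assumed to be proportional to the power drawn.

```python
def compare_operating_points(speed_a, power_a, speed_b, power_b):
    """Compare a baseline point (a) with a tuned point (b).

    Speeds are in MH/s, power in watts. Energy cost is assumed to be
    proportional to power draw.
    """
    return {
        "speed_loss": 1.0 - speed_b / speed_a,    # fraction of speed given up
        "power_saving": 1.0 - power_b / power_a,  # fraction of power saved
        "efficiency_a": speed_a / power_a,        # MH/s per watt, baseline
        "efficiency_b": speed_b / power_b,        # MH/s per watt, tuned
    }


# Rounded figures from the text: factory 6801/240 vs tuned 5001/100.
result = compare_operating_points(900.0, 240.0, 640.0, 100.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these rounded inputs the speed loss comes out at about 29% and the power saving at about 58%, in line with the approximate figures quoted above. Efficiency goes from 3.75 to 6.40 MH/s per watt.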
Now let's look at each parameter. The scanner used is cuBitcrack, and its readme shows the following:
Good settings vary with your GPU model. On an RTX 3060 Ti, 112/256/2048 seems to be a good triplet for b/t/p.
To find it, I first changed only blocks (b), checking which value gave the best speed, then picked the smallest value that reached that best speed (112 in my case).
I then added threads (t) and found that 256 seems to be a good value (it is also the default).
Finally I added points (p) and looked for the highest value I could use, which was 2048.
I then tried to confirm these settings by setting all three values at once and changing one at a time, to check whether each change was faster or slower than my settings.
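The one-parameter-at-a-time search described above can be sketched as follows. This is only an illustration, not the benchmarker's actual code: `measure` is a hypothetical callback standing in for one benchmark run (it returns a speed in MH/s, or `None` if the run failed), `fake_measure` is a made-up speed model used to make the example runnable, and the 1% tolerance for "matches the best speed" is an assumption.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical callback: runs one benchmark for (blocks, threads, points)
# and returns the speed in MH/s, or None if the run failed.
Measure = Callable[[int, int, int], Optional[float]]


def sweep(measure: Measure, candidates: Sequence[int],
          make_triplet: Callable[[int], Tuple[int, int, int]]) -> Dict[int, float]:
    """Benchmark each candidate value and keep the runs that succeeded."""
    results: Dict[int, float] = {}
    for value in candidates:
        speed = measure(*make_triplet(value))
        if speed is not None:
            results[value] = speed
    if not results:
        raise ValueError("no candidate produced a result")
    return results


def smallest_near_best(results: Dict[int, float], tolerance: float = 0.01) -> int:
    """Smallest value whose speed is within `tolerance` of the best speed."""
    best = max(results.values())
    return min(v for v, s in results.items() if s >= best * (1.0 - tolerance))


def tune_triplet(measure: Measure, start: Tuple[int, int, int],
                 blocks_opts: Sequence[int], threads_opts: Sequence[int],
                 points_opts: Sequence[int]) -> Tuple[int, int, int]:
    _, t0, p0 = start
    # Step 1: vary blocks only, keep the smallest value reaching the best speed.
    b = smallest_near_best(sweep(measure, blocks_opts, lambda v: (v, t0, p0)))
    # Step 2: vary threads with the chosen blocks, keep the fastest.
    t_results = sweep(measure, threads_opts, lambda v: (b, v, p0))
    t = max(t_results, key=t_results.get)
    # Step 3: vary points, keep the highest value that still runs.
    p = max(sweep(measure, points_opts, lambda v: (b, t, v)))
    return b, t, p


def fake_measure(blocks: int, threads: int, points: int) -> Optional[float]:
    """Made-up speed model, for demonstration only."""
    if points > 2048:
        return None  # pretend the run fails above this value
    return min(blocks, 112) * 5.0 + (20.0 if threads == 256 else 0.0) + points / 100.0


best = tune_triplet(fake_measure, (32, 256, 1024),
                    [56, 112, 224], [128, 256, 512], [512, 1024, 2048, 4096])
print(best)
```

The final confirmation pass (changing one value at a time around the chosen triplet) is left out here. It would reuse `sweep` with the other two values held at their tuned settings.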
I suspect that blocks matches the number of processing units on your GPU (so it is hardware-dependent), that threads does not significantly affect performance and should be kept at 256, and that points is tied to the memory available on the GPU (maybe a kind of cache).
For those who want to optimize energy use, I suggest lowering the memory clock to one of the available values, choosing a lower one but not the lowest (on my GPU the choices were 7001, 6801, 5001, 810 and 405; 6801 was the factory setting, so I chose 5001 rather than 810 or 405, which are far too low). Then you can lower the power limit and check which value is best (in my case it was the lowest one, 100 W; I was not able to go lower).
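One way to read "lower but not lowest" is to take the first available memory clock below the factory value. The sketch below encodes that reading. It is an interpretation, not a rule from the tool, and the helper name `next_lower_clock` is made up for this example.

```python
def next_lower_clock(available, factory):
    """Return the highest available memory clock strictly below `factory`."""
    lower = sorted(clock for clock in available if clock < factory)
    if not lower:
        raise ValueError("no memory clock below the factory setting")
    return lower[-1]


# Values quoted in the text for the author's card.
chosen = next_lower_clock([7001, 6801, 5001, 810, 405], factory=6801)
print(chosen)  # 5001
```

The result is only a starting point for GPU_MEM_CLOCK_xx. The benchmark still has to confirm that the speed at that clock is acceptable.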
If you took the time to optimize your card, you can share your results in this thread so I can add them to the database. Once in the database, those values are used whenever you do not define BLOCKS_OVERRIDE / THREADS_OVERRIDE / POINTS_OVERRIDE in your .env.gpu.xx files.
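If you would rather pin the values yourself, the overrides go into the matching .env.gpu.xx file. The sketch below renders the three lines for a given triplet. It assumes the file uses plain `KEY=VALUE` lines, as env files usually do. The helper name `render_overrides` is made up for this example.

```python
def render_overrides(blocks: int, threads: int, points: int) -> str:
    """Render the three override lines, assuming plain KEY=VALUE syntax."""
    lines = [
        f"BLOCKS_OVERRIDE={blocks}",
        f"THREADS_OVERRIDE={threads}",
        f"POINTS_OVERRIDE={points}",
    ]
    return "\n".join(lines) + "\n"


# Triplet the author found for an RTX 3060 Ti.
text = render_overrides(112, 256, 2048)
print(text, end="")
```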