2021-07-23, 14:59 #1
"University student"
May 2021
Beijing, China
127 Posts 
Something productive from "Is it forbidden..." thread
However, not every user here has a Radeon VII.
Many people, like me, run Prime95 on laptops or home computers. For safety and environmental reasons, we cannot run Prime95 24/7. Thus it takes us longer to finish assignments: months for 108M exponents, and 1.5 years for 332M.
Last fiddled with by axn on 2021-07-25 at 13:01
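Here is a rough way to see why 332M takes so much longer than 108M. A PRP test needs about p squarings, and each squaring uses an FFT whose length grows roughly in proportion to p, so total work grows roughly like p^2 log p. The sketch below assumes that scaling (real FFT lengths step up in discrete sizes, so it is only a ballpark). It also converts round-the-clock run time into calendar time for a machine that runs only part of each day. Both function names are hypothetical.

```python
import math


def relative_work(p_small: float, p_large: float) -> float:
    """Rough ratio of PRP work between two exponents.

    Assumption: ~p iterations, each an FFT of length proportional to p
    costing about n*log(n), so total work ~ p^2 * log(p).
    """
    return (p_large / p_small) ** 2 * math.log(p_large) / math.log(p_small)


def calendar_days(days_if_24x7: float, hours_per_day: float) -> float:
    """Calendar days needed when the machine only runs hours_per_day."""
    return days_if_24x7 * 24.0 / hours_per_day


ratio = relative_work(108e6, 332e6)
print(f"332M needs roughly {ratio:.1f}x the work of 108M")
```

A ratio of about 10 is consistent with "months" for a 108M exponent turning into roughly 1.5 years for a 332M one on the same part-time machine.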
2021-07-23, 15:32 #2
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×5×593 Posts 
Quote:
None of what I post should be misconstrued as disparagement of the small-throughput user or their hardware. It's all welcome, as long as it does not interfere with orderly progress, and it all adds up.

2021-07-23, 21:25 #3
"Tucker Kao"
Jan 2020
Head Base M168202123
5·113 Posts 
Buy an AMD Threadripper 5970X and an NVIDIA GeForce RTX 3080 Ti, and 332M exponents should finish within 3 weeks at most.

2021-07-24, 21:19 #4
"David Kirkby"
Jan 2021
Althorne, Essex, UK
111000000_{2} Posts 
I'm no expert on GPUs, but I thought that if you were going to buy a GPU to test exponents, the CPU does not need to be very powerful, since the GPU is doing all the work. Obviously, if one is not constrained by money, heat or power consumption, then buy the best of everything. But if one is trying to build a well-performing system without spending a fortune, buying both a high-end CPU and a high-end GPU would be unnecessary.
Last fiddled with by drkirkby on 2021-07-24 at 21:20
2021-07-24, 22:26 #5
"Tucker Kao"
Jan 2020
Head Base M168202123
5×113 Posts 
Quote:
I'm waiting to hear from another user who has already bought a GeForce RTX 3080 Ti about its heat output, power consumption and GHz-days/day. I use the CPU of my current old machine to run P-1 factoring on all M168,***,*23 exponents with B1 = 1,000,000 and B2 = 40,000,000, which takes around 20 hours each. The GPU of the same old machine trial factors those exponents up to 2^78, and it seems to me that both can run at the same time without significant slowdown. When I get my new PC, which will likely be after Nov 21, 2021 (the Threadripper 5970X release date), I can run 2 PRP tests at the same time, 1 on the CPU and 1 on the GPU.
Last fiddled with by tuckerkao on 2021-07-24 at 23:18
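For readers unfamiliar with the bounds: every prime factor q of M(p) has the form q = 2kp + 1. P-1 stage 1 finds q when every prime power dividing k is at most B1. Stage 2 (the B2 bound) additionally allows k to contain one larger prime up to B2, and that part is omitted here. Below is a minimal, toy-sized stage 1 in Python. The helper names are hypothetical. It is far too slow for real exponents like 168M, which Prime95 handles with FFT arithmetic.

```python
from math import gcd, isqrt
from typing import List


def primes_up_to(n: int) -> List[int]:
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pm1_stage1(p: int, b1: int, base: int = 3) -> int:
    """Toy P-1 stage 1 on M(p) = 2^p - 1; returns gcd(base^E - 1, M(p)).

    E = 2p times, for each prime q <= b1, the largest power of q that is <= b1.
    A result strictly between 1 and M(p) is a nontrivial factor.
    """
    m = (1 << p) - 1
    # Every factor of M(p) is 2kp + 1, so 2p can always go into the exponent.
    x = pow(base, 2 * p, m)
    for q in primes_up_to(b1):
        q_power = q
        while q_power * q <= b1:
            q_power *= q
        x = pow(x, q_power, m)
    return gcd(x - 1, m)


# M(11) = 2047 = 23 * 89. For 23, k = 1; for 89, k = 4 = 2^2, which exceeds B1 = 3.
print(pm1_stage1(11, 3))  # 23
```

Raising B1 enlarges E so more values of k qualify. That is why the 1,000,000 bound costs hours per exponent at full size.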

2021-07-25, 00:28 #6
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5930_{10} Posts 
The RTX 3080 Ti is fast at ~4800 GHD/d in TF, but at ~93. GHD/day for PRP, LL or P-1 it is comparable to a GTX 1080 Ti or RX 5500 XT, or ~30% of an RX 6900 XT or Radeon VII, per https://www.mersenne.ca/cudalucas.php
I have multiple Radeon VIIs on a system served by a Celeron G1840, so yes, it does not take much CPU to keep GPU apps going, except when doing GCDs in P-1. I recommend about as many physical CPU cores as GPUs, plus HT, so the GPUs are unlikely to wait for each other. Also >16 GB of system RAM if doing a lot of GPU P-1 on multiple 16 GB-VRAM GPUs simultaneously.
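The RTX 3080 Ti figures can be turned into two quick ratios. This minimal sketch uses only the numbers quoted in this post, and the function names are hypothetical. A TF-to-PRP ratio of about 50 shows that the 3080 Ti's comparative advantage is TF. Since 93 GHD/d is ~30% of an RX 6900 XT or Radeon VII, those cards would rate roughly 310 GHD/d on that page.

```python
def tf_to_prp_ratio(tf_ghd_per_day: float, prp_ghd_per_day: float) -> float:
    """How many times more GHz-days/day a GPU is credited in TF than in PRP/LL/P-1."""
    return tf_ghd_per_day / prp_ghd_per_day


def implied_peer_rate(own_rate: float, fraction_of_peer: float) -> float:
    """Peer GPU's rate, given that own_rate is fraction_of_peer of it."""
    return own_rate / fraction_of_peer


print(round(tf_to_prp_ratio(4800, 93)))    # 52
print(round(implied_peer_rate(93, 0.30)))  # 310
```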
2021-07-25, 03:29 #7
"Tucker Kao"
Jan 2020
Head Base M168202123
5×113 Posts 
Quote:
How do I know exactly how many days and hours are needed to finish a PRP test of M168779323 on an AMD RX 6900 XT if no one else has run it first? I'm glad Kriesel pointed out the difference between trial factoring and PRP performance on GPUs: the GeForce RTX 3080 Ti does not excel at both. Once I get the new machine, I won't need to ask for anyone's help; I'll just run it myself.
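One practical answer assumes that a PRP test of M(p) takes about p squaring iterations. Start the test, read the ms/iter that Gpuowl or Prime95 reports once the timings settle, and multiply. In this minimal sketch, the ms/iter input must come from your own GPU at the fft length your exponent uses. The function names are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def prp_days(exponent: int, ms_per_iter: float) -> float:
    """Estimated days for a PRP test, assuming ~exponent iterations."""
    return exponent * ms_per_iter / MS_PER_DAY


def format_days(days: float) -> str:
    """Render a day count as whole days plus rounded hours."""
    d, h = divmod(round(days * 24), 24)
    return f"{d} d {h} h"
```

As a cross-check against the Radeon VII figures elsewhere in this thread, `prp_days(105_100_000, 0.821)` gives about 0.9987 days. For M168779323, plug in the ms/iter measured at its fft length.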

2021-07-25, 12:11 #8
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2^{6}·7 Posts 
Quote:
For PRP tests it is not clear that the GPU wins, but for trial factoring CPUs are not good.

2021-07-25, 14:49 #9
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×5×593 Posts 
On the CUDALucas or TF benchmark pages on mersenne.ca, pause your mouse cursor on any blue column heading, or on the downward arrow to the right of GHz-Days/day, for pop-up descriptions.
The mersenne.ca CUDALucas benchmark page is useful, within its limitations, for relative comparisons between GPUs. The ~295 GHD/day values for the Radeon VII are old, from older, less efficient versions of Gpuowl or from CUDALucas, and considerably understate the maximum performance available with recent versions of Gpuowl. Tdulcet reported ~75% longer run times on Colab & NVIDIA Tesla GPUs with CUDALucas than with recent Gpuowl. I've extensively benchmarked a Radeon VII across a wide variety of Gpuowl versions and all fft lengths supported in them, from 3M to 192M, on Windows 10, under specified conditions. The resulting timings in ms/iter can be seen in the last attachment of https://www.mersenneforum.org/showpo...35&postcount=2. Taking the best version's timing for each fft length, they correspond to a performance range of 316. to 486. GHD/day. (It might be possible to find other fft formulations that perform better; I used the first / default for each size. On occasion an alternate may perform better.) Note that these measurements were made with the GPU neither clocked as aggressively as I and others have been able to use reliably on Radeon VIIs with Hynix VRAM, nor operating at full GPU power, nor on the highest-performance OS/driver combination. Benchmarking was done at an 86% power limit for improved power efficiency. Also, ROCm on Linux reportedly provides the highest performance, with Woltman having reported 510 GHD/day with it on, IIRC, 5M fft. Compare that to 447. at reduced power and clock on Windows at 5M. Finally, power consumption may be elevated by the more-aggressive-than-standard GPU fan curve I'm using. Note also that mprime/prime95 and Gpuowl each have some fft lengths for which running the next higher fft can be faster. In benchmarking Gpuowl I've found that the 13-smooth ffts (3.25M, 6.5M, etc.) tend to be slower than the next larger fft (3.5M, 7M, etc.), and so does 15M.
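The observation that the next larger fft can be faster suggests choosing the fft by measured timing rather than by size alone. This minimal sketch assumes you already have your own ms/iter benchmarks keyed by fft length in M, where 1M = 2^20 words. The helper names are hypothetical, and the timings in the example are made-up placeholders, not measurements.

```python
from typing import Dict

WORDS_PER_M = 1 << 20  # fft lengths such as "5.5M" count units of 2^20 words


def has_factor_13(fft_m: float) -> bool:
    """True for fft lengths such as 3.25M or 6.5M whose word count has a factor of 13."""
    return round(fft_m * WORDS_PER_M) % 13 == 0


def pick_fft(timings: Dict[float, float], min_fft_m: float) -> float:
    """Fastest benchmarked fft length (in M) that is at least min_fft_m."""
    usable = {fft: t for fft, t in timings.items() if fft >= min_fft_m}
    if not usable:
        raise ValueError("no benchmarked fft length is large enough")
    return min(usable, key=usable.get)


# Made-up placeholder timings in ms/iter, NOT real benchmark data.
example = {3.25: 0.60, 3.5: 0.55, 4.0: 0.65}
print(pick_fft(example, 3.2), has_factor_13(3.25))  # 3.5 True
```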
At the current wavefront of ~105.1M, the 5.5M fft applies, and Gpuowl v6.11-380 benchmarked at 0.821 ms/iter, which corresponds to 0.9987 day/exponent/GPU, or 419. GHD/day/GPU, again at reduced GPU power, on Windows, with below-maximum reliable VRAM clocking. I computed ~1.53 GHD/d/W for a multi-Radeon-VII system, with power measured at the AC power cord while running prime95 on its CPU. The GPU-only efficiency would be slightly higher. That AC input power accounts for all power used, including the system RAM, which drkirkby omitted from his list; at 384 GiB of ECC RAM on his system, it is probably consuming considerable power. Due to the high cost of a >1 kW output UPS, I am running my GPU rig with inline surge suppression but no UPS. Indicated power per GPU ranges from 190 to 212 W at the 86% setting. Total AC input power divided by the number of GPUs operating was less than the nominal max GPU TDP. I'm currently running these GPUs at 80% for better power efficiency. 419. GHD/day/GPU at ~200 W actual per GPU is ~2.1 GHD/d/W on the GPUs alone, omitting system overhead and conversion losses. One Radeon VII so configured can match the throughput of the dual 26-core 8167M $5000 system under certain conditions, at better power efficiency, and the original cost of the entire open-frame system divided by the number of GPUs was ~$700. That is more power efficient, and much more capital efficient per unit of throughput. It would still be ~4x more cost effective than the 8167M system today if built at current GPU prices.
Last fiddled with by kriesel on 2021-07-25 at 15:45
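The efficiency and cost comparisons above are simple ratios. This minimal sketch reproduces them from the quoted inputs: 419. GHD/day per GPU, ~200 W per GPU, and a ~$700 per-GPU share versus a $5000 system of equal throughput. The function names are hypothetical. The ~2.1 GHD/d/W result covers the GPU alone. The ~1.53 figure was measured at the wall and includes the rest of the system.

```python
def ghd_per_watt(ghd_per_day: float, watts: float) -> float:
    """Throughput per watt, in GHz-days/day per W."""
    return ghd_per_day / watts


def capital_advantage(cost_a: float, cost_b: float) -> float:
    """How many times cheaper option a is than option b at equal throughput."""
    return cost_b / cost_a


print(round(ghd_per_watt(419, 200), 1))        # 2.1 (GPU alone)
print(round(capital_advantage(700, 5000), 1))  # 7.1 (per-GPU share vs dual 8167M)
```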
2021-07-25, 15:51 #10
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·5·593 Posts 

2021-07-25, 16:08 #11
Jun 2003
1010001010010_{2} Posts 
TDP isn't the maximum power consumed by the CPU. A 165 W TDP processor could easily consume 200 W or more running flat out. Not saying that's what your CPUs are doing, but it is possible.
Also, 12 sticks of RAM consume a fair bit of power.
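Because TDP is not a ceiling, a power budget is better built from measured or worst-case component draws. Dividing that total by power-supply efficiency estimates what a meter at the AC cord would show. In this minimal sketch, the 90% PSU efficiency default is an assumption, not a measured value. The per-component wattages are inputs you supply, and the function name is hypothetical.

```python
from typing import Iterable


def wall_power_w(dc_loads_w: Iterable[float], psu_efficiency: float = 0.90) -> float:
    """Estimate AC input power from DC component loads.

    psu_efficiency is an ASSUMED value (0.90 by default); check your PSU's rating.
    """
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return sum(dc_loads_w) / psu_efficiency
```

For example, two CPUs each drawing 200 W rather than their 165 W TDP give `wall_power_w([200, 200])` of about 444 W at the wall, before RAM, drives and fans are added to the list.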