31 May 2015

31st of May

Farm news
The farm had a mixed week, running out of Climate Prediction work and then doing Asteroids and Seti work. This weekend saw the return of the Climate work units, so I ran down the cache on the Intel-GPU machines so they can run them without any contention.

I have even put one of the 6 core/12 thread machines on the job. It's running 6 at a time. These machines haven't run CPDN work before so it will be interesting to see how they go. I am sure they will work, just not sure how long they will take.

BOINC testing
We got 7.6.2 late last week and I have it installed on a couple of machines. I have made a few suggestions about how the new Options windows should appear. I don't expect anything will happen but hopefully they will improve before it becomes a release version.

File Server
The file server is back and running. At the moment we've only hooked up a single hard disk. I have ordered an Intel RAID controller so there is no point running the others until it's installed. We're not too sure if the controller comes with cables or not and it's at least a week away. If there are no cables then they will have to be ordered, which means another week's delay.

The shop put a PWM fan in the top of the case. It continually revs up and then slows down. I've told them they will have to change it to a normal one. It's noisy and very annoying.

Proxy server
I run one of these to try and reduce the downloads. Since switching back to using the old file server I noticed it doesn't seem to be saving much of anything, so I have spent some time this weekend going through the logs and adjusting settings.

The trigger for this was that the CPDN downloads bring down an 80MB file per machine as well as a number of 20MB and 26MB files, all of which get duplicated on the other machines. Windows Update does the same.
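To put a rough number on what a working cache could save, here is a minimal sketch. It assumes every machine fetches the same set of files and that the proxy only has to fetch each file once. The file list and the machine count are made up for illustration; only the 80MB, 20MB and 26MB sizes come from the paragraph above.

```python
def proxy_savings_mb(file_sizes_mb, machines):
    """Return (without_cache, with_cache, saved) in MB.

    Assumes each machine downloads every file once, and that a
    working proxy fetches each file from the internet only once.
    """
    per_machine = sum(file_sizes_mb)
    without_cache = per_machine * machines
    with_cache = per_machine
    return without_cache, with_cache, without_cache - with_cache


# Hypothetical batch: one 80MB file plus two each of the 20MB and 26MB files.
example_files = [80, 20, 20, 26, 26]
# Hypothetical machine count, not taken from the post.
print(proxy_savings_mb(example_files, machines=4))
```

The saving grows with every extra machine, which is why a proxy that isn't caching these files is worth chasing up.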

I am also looking at replacing it with something more energy efficient that also has more CPU power. It will become a dedicated proxy server.

24 May 2015

24th of May

Farm news
Crunching continues with the Intel-GPU part of the cluster finishing off their Climate Prediction models. They ran out of the ANZ ones. I can do some of the short ones but they download large amounts of data and do a couple of 64MB uploads. I did some of them but most seem to be resends of failed work from late last year and they fail anyway.

I had the 6 core/12 thread machines doing Asteroids work but Asteroids went off-line on Saturday so they are now doing Seti work.

I did try some GPUgrid work but they too ran out of short work units. It seems they've refilled the queue so I will get them running again once the 6/12 machines finish off their work. Despite the cooler weather I can't run both at the same time because the room gets too hot.

File server
The shop got the socket 2011-v3 mounting bracket, only to find out the manufacturer uses a non-standard mounting so you have to purchase their cooler. Because they are proper servers they offer coolers for 1U, 2U and 4U rack mounts. I have it in a tower case so size isn't an issue.

Maybe they will get it working next week so I can pick it up on the weekend. I will then need to set it up the way it was before.

I have also been researching RAID controllers. The motherboard has 2 built-in RAID controllers that can do RAID levels 0, 1, 5 and 10. These days RAID 6 (or 6+0) is recommended due to the probability of losing a 2nd drive while rebuilding the array. RAID 6 normally requires a chip to calculate the 2nd set of parity bits. I have narrowed it down to two cards of interest: one is an Adaptec and the other an Intel. Given the delays with the file server I have been holding off ordering one.
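For a feel of the rebuild risk, here is a rough sketch and not a proper reliability model. It assumes the remaining drives fail independently at a constant rate, which probably understates the risk for drives of the same age and batch. The drive count, rebuild time and annual failure rate below are illustrative guesses, not measured figures.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(remaining_drives, rebuild_hours, annual_failure_rate):
    """Chance that at least one remaining drive fails during a rebuild.

    Assumes independent drives with a constant failure rate derived
    from the annual failure rate (a simplification).
    """
    if not 0 <= annual_failure_rate < 1:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    hourly_rate = -math.log(1 - annual_failure_rate) / HOURS_PER_YEAR
    survive_one = math.exp(-hourly_rate * rebuild_hours)
    return 1 - survive_one ** remaining_drives


# Illustrative only: 2 surviving drives, a 24 hour rebuild, 5% annual failure rate.
print(second_failure_probability(2, 24, 0.05))
```

With RAID 5 any second failure in that window loses the array, while RAID 6 can ride through it.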

Future upgrades
One thing will be to update the CPUs in the 6 core/12 thread machines. They currently have i7-5820K CPUs but I am thinking of updating them to the i7-5960X. Another option would be a Xeon as they also fit into a socket 2011-v3.

Another thing on the list is to update the proxy server to something more modern and more energy efficient. It could be an i3 or i5 based machine. At the moment the backup file server (a Pentium 4 @ 1.8GHz) is doing the job but it's slow.

16 May 2015

16th of May

Farm news
The Intel-GPU machines have all been doing Climate Prediction work. Well, they were up until this afternoon when the CPDN database went off-line, so I can't report work or do trickles. I have suspended them and we're concentrating on Asteroids work now.

I noticed a lot of the CPDN work units taking 165+ hours to complete when they normally take 110 hours. I put this down to CPU cache contention from running 8 at a time. I am now running only 4 at a time per machine.
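The trade-off between run time and the number running at once can be checked with some quick arithmetic. This sketch assumes that dropping to 4 at a time brings each model back to the normal 110 hours, which hasn't been confirmed yet, and it treats 165 hours as a typical figure even though some took longer.

```python
HOURS_PER_WEEK = 168


def models_per_week(concurrent, hours_per_model):
    """Models one machine completes per week when run continuously."""
    return concurrent * HOURS_PER_WEEK / hours_per_model


def break_even_hours(concurrent_a, hours_a, concurrent_b):
    """Run time at which setup B matches the throughput of setup A."""
    return hours_a * concurrent_b / concurrent_a


print(models_per_week(8, 165))     # 8 at a time, slowed by contention
print(models_per_week(4, 110))     # 4 at a time, assumed normal run time
print(break_even_hours(4, 110, 8))
```

On these assumed figures, 8 at a time still turns out slightly more models per week (about 8.1 versus about 6.1), but each one takes much longer to come back. The two setups only break even once the contended run time reaches 220 hours.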

I did manage to do some GPUgrid work during the week. I was also testing a fix to BOINC to do with reuse of slot directories. The official fix will be coming in 7.6. At the moment we are testing the user option settings.

The Raspberry Pis still haven't officially switched to Debian Jessie despite it becoming the stable release a week ago. Hopefully they will get there in a couple of weeks.

File server upgrade
The file server is off in the shop for an upgrade. The shop have already encountered their first problem. As a result they now have it until next weekend. The CPU cooler doesn't have the right mounting bracket for an Intel socket 2011-v3 so they will need to get one in. The case was full of dust despite regular cleaning and dust filters. I am taking the opportunity to replace all the case fans. They are still working; it's more preventative maintenance seeing as the server runs 24x7.

I still need to find a suitable RAID controller for the file server that can do RAID 6 or better. The motherboard supports RAID 5. Statistically a second drive is likely to fail around the same time as the first one, and RAID 5 can only handle losing one drive, so they recommend RAID 6 or better these days.

02 May 2015

2nd of May

Farm news
This week had a bit of the usual crunching, some Linux upgrades and a hard disk failure. Oh, and finalising the file server upgrade.

Crunching is continuing for the Climate Prediction ANZ work units which take about 5 days to complete (each). The Intel GPU part of the cluster is running them. I also have been running a few GPUgrid long work units that take about 11 hours on the GTX970 and Asteroids work on the CPU cores.

Debian Jessie was released to the public along with Ubuntu Vivid. I was already running Jessie on the Raspberry Pis but they haven't officially updated to Jessie yet. I tried upgrading the Parallellas to Vivid by going to Utopic (14.10) and then upgrading it to Vivid (15.04) but that failed. I had to reimage the SD card back to 14.04 and then update to Utopic. I suspect the kernel is too old as it has been stuck on 3.12.0 for quite a while.

The hard disk in one of the GPUgrid crunchers failed after it had been running overnight. Fortunately I have a few spares so it's been swapped out with another of the same vintage. I had to reinstall Windows, BOINC and a few other apps. It had a WD Black manufactured in June 2012 and they have a 5 year warranty. The on-line retailer has gone out of business so the only option is to return it to Malaysia. Given the postage cost it's not going to happen. I am surprised that WD don't have an Australian distributor or a collection point.

File server upgrade
The other thing this week had me chasing up the computer shop regarding the file server. I can't get the CPU I was originally after. Intel only shows them being available in trays, which means buying 50 or 100. I settled on the next best CPU, a 6 core/12 thread Xeon with an 83 watt power rating. Memory unfortunately has to be ECC and DDR4 so that is costing a bit.

I have also ordered some 4TB WD Se drives to go into the file server. I can then reduce the number of drives and still have more disk space. They will be in a 3 drive RAID 5 configuration which should give around 8TB of usable space.
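The 8TB figure follows from RAID 5 giving up one drive's worth of space to parity. Here is a small sketch of the usual capacity rules for the RAID levels mentioned in these posts. It assumes equal-sized drives and ignores controller and filesystem overhead. The function name is just for illustration.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Usable space in TB for equal-sized drives at common RAID levels."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == 0:
        return drives * drive_tb          # striping only, no redundancy
    if level == 1:
        return drive_tb                   # every drive holds the same data
    if level == 5:
        return (drives - 1) * drive_tb    # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_tb    # two drives' worth of parity
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drives // 2 * drive_tb         # striped mirrored pairs


print(usable_capacity_tb(5, 3, 4))  # the planned 3 x 4TB RAID 5
print(usable_capacity_tb(6, 4, 4))  # same usable space needs a 4th drive in RAID 6
```

Drive makers count a TB as 10^12 bytes, so the operating system will typically report the 8TB array as a little under 7.3 TiB.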