28 June 2015

28th of June

Farm news
We're back to running ANZ climate models. CPDN put out a few thousand and I managed to get 98 of them. I had to do some "manual intervention" to get BOINC to pick these up, as all the machines had a full load of Asteroids work in flight.

The farm is currently struggling to upload all the result files now that the first batch of 32 CPDN work units has completed. There are lots of big files trying to upload at once. Despite having a theoretical upload speed of 1 Mbit/s (Speedtest says 700 Kbit/s), I can only get around 100 Kbit/s to the CPDN servers. I am managing it by suspending file transfers and only allowing 2 machines at a time to upload, with only a single transfer each. Of course the work units all finish around the same time, which is why I need to manage it.
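For the "single transfer each" part, the BOINC client can enforce a per-machine limit itself instead of relying on suspending transfers by hand. Below is a minimal Python sketch. It assumes the cc_config.xml options `<max_file_xfers>` and `<max_file_xfers_per_project>` from the BOINC client configuration documentation, and the limit of 1 is only an example. It caps transfers on each machine. Staggering which machines upload at the same time still has to be done manually.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers=1, max_xfers_per_project=1):
    """Return cc_config.xml text that limits simultaneous file transfers.

    <max_file_xfers> caps transfers across all projects on this machine;
    <max_file_xfers_per_project> caps them per project. Values are examples.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_xfers_per_project)
    # encoding="unicode" returns a str (no XML declaration) instead of bytes
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

If a cc_config.xml already exists in the BOINC data directory, merge these `<options>` entries into it rather than overwriting the file. Then use Options > Read config files in BOINC Manager, or restart the client.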


File Server
The cable for the RAID controller that is destined for the file server still hasn't arrived, so that task is yet to be finished. At least the shop has the replacement case fan.

The replacement UPS arrived on Monday and was put into service straight away. It's the same model as the previous one, so the monitoring software and the cable to the computer are the same. No changes required there at least.


Linux
I am also looking at a couple of Linux distributions. I tried Debian 8 but got stuck: where's the sudo command gone? You can install sudo from the repo, but then your user isn't in the group that is allowed to use it, so I gave it the flick. I have an old copy of Mint 15 on CD so I gave that a try too, but it looks like its repos are no longer available.

I am trying this on an old Pentium 4 and keep unplugging its hard disks so I can do a clean install and then go back to what it had on it later. I have a box of old hard disk drives, so swapping a PATA one in isn't a problem.

21 June 2015

21st of June

Farm news
Nothing exciting happening at the moment. Just crunching away. I did a burst of Seti work and then switched over to running Asteroids work. There is no sign of new climate models.

While setting up an app_config file to control the Einstein GPU app, I discovered there is an easier way to control how many tasks a project runs at the same time. For Climate Prediction I had given each of their model types a max_concurrent tag, which limits how many of that model can run at the same time; each model type has its own separate app. I found that there is now a project_max_concurrent tag that applies one limit across all of a project's apps. Rather than specifying that each app can only run 4 at a time (which, with several model types in flight, could still add up to more than 4 running), I can put in a single statement telling it to run 4 in total. That way, if I get a mix of models/apps, it will still do what I want.
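Here is a minimal Python sketch that builds such a file. It assumes the app_config.xml layout described in the BOINC client configuration documentation: an `<app_config>` root with optional `<app>` entries (`<name>`, `<max_concurrent>`) and a top-level `<project_max_concurrent>`. Any app names passed in are placeholders, since the real Climate Prediction app names have to come from the project itself. The resulting file goes in that project's folder under the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_limit, app_limits=None):
    """Build app_config.xml text for one BOINC project.

    project_limit -> <project_max_concurrent>: cap on tasks running at once
                     across all of the project's apps combined.
    app_limits    -> optional {app_name: n}, producing per-app <max_concurrent>
                     entries (the older one-limit-per-model approach).
    """
    root = ET.Element("app_config")
    for name, limit in (app_limits or {}).items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One statement: at most 4 tasks in total, whatever mix of model apps arrives.
    print(build_app_config(4))
```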

They say Windows 10 will be a 3 GB download, which is why I would prefer a DVD that I can use on all the machines. I have read some online commentaries saying they don't think it will be stable by the end of July.

I am also looking at the server operating system to bring the file server up to date, now that the hardware has been refreshed. On that note, I am still waiting for the RAID controller cable to arrive, so it's not quite completed yet. It got back-ordered, and the shop thinks it will arrive next week.


Raspberry Pis
Q: What is better than a Raspberry Pi2?
A: Two of them!
I have three Pi2s. Now that they are stable after being re-imaged a couple of weeks ago, I decided to overclock one of them. It lasted a day before it crashed. The default Pi2 overclock of 1000 MHz is too much for them, even with a heatsink on the CPU. I removed the overclock (by editing /boot/config.txt) and it's back to factory defaults again. I might try again with settings lower than 1000 MHz. It could also be the memory speed and/or the overvolt that makes it unstable.
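Removing an overclock by hand means finding the frequency and voltage lines in /boot/config.txt. Below is a minimal Python sketch of that edit. It assumes the overclock was set with the keys arm_freq, core_freq, sdram_freq and over_voltage, which are the usual Pi overclock settings, so check what is actually in your file. It comments those lines out rather than deleting them, so the old values stay visible for a later try at lower settings. The sketch works on text only. Back up the real file before writing the result over it, and note the change only applies after a reboot.

```python
import re

# Assumed overclock keys; adjust to whatever your config.txt actually contains.
OVERCLOCK_KEYS = ("arm_freq", "core_freq", "sdram_freq", "over_voltage")

# Matches "key=..." exactly, so e.g. arm_freq_min or over_voltage_sdram are left alone.
_PATTERN = re.compile(r"^\s*(?:" + "|".join(OVERCLOCK_KEYS) + r")\s*=")


def remove_overclock(config_text):
    """Return config.txt text with the overclock settings commented out."""
    out = []
    for line in config_text.splitlines():
        out.append("#" + line if _PATTERN.match(line) else line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not necessarily what any preset writes.
    sample = "disable_overscan=1\narm_freq=1000\ncore_freq=500\nsdram_freq=500\nover_voltage=2\n"
    print(remove_overclock(sample), end="")
```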

We're still waiting for Raspbian to switch to the Jessie release. The last official Raspbian release, on the 5th of May, was based on Wheezy, while Debian itself now has Jessie as its release version.

15 June 2015

15th of June

Farm news
The UPS on the file server died. It hadn't been holding its charge; you could see it regularly dropping to 94% battery and then quickly recharging. Then it started beeping and complaining "Battery failure". So on Friday I ordered another one of the same model (both are Vanguard VGD1000s). I plugged the old one in over the weekend so I could get the error message, and when I turned it on: bang! and the smell of ozone. Well, that is the end of it. I probably could have replaced the battery before, but not now. Oh well, the new one is in and running; hopefully I will get more than two and a half years out of it.

I am still waiting for the RAID card and cable so I can get the drives in the file server set up. At the moment I am using a single drive.

On the crunching front, the ANZ climate models ran out. I finished off the last one this morning. As the various machines finished theirs off, I switched them over to Asteroids work, so now all the Intel GPU machines are running it. I gave the 6 core/12 thread machines a run at Seti work when they finished their climate models, but they are currently off.


Windows 10
I updated the 6 core/12 thread machines with all the latest Windows updates, so they now have the icon to get Windows 10. I'd rather not download 10 copies of Win10 when it comes out, so I will have to see if I can purchase a DVD. I would also prefer to clean-install them but will wait and see how the release goes. No doubt there will be some teething problems when it gets released at the end of July.

In that vein, I am also trying to keep the graphics drivers up to date. Nvidia comes out with new drivers about once a month, but they are usually just profiles for the latest games. Intel also comes out with new graphics drivers every couple of months; however, theirs usually don't work well for number crunching, so we tend to run older drivers, which might not work with Win10.


Network infrastructure
I am also looking at the network setup with a view to making it a bit more secure and centralised. It will probably entail a new machine to replace the old file server, which is doing other duties at the moment. There will probably also be some software changes and maybe a managed switch to allow network card teaming. Now all I need is my own networking consultant.

06 June 2015

6th of June

A slightly different approach to the blog this week. Rather than write about what the farm is doing, I thought I would show it in the screenshots below. They are taken from BoincTasks and show the various groupings of machines.

This lot are what I call the Intel-GPU machines. They are all i7-3770s with integrated HD4000 graphics. As you can see, we are only using half the available threads. That is because the climate models seem to take around 50% longer if run on all threads.

These are the Nvidia-GPU machines. Actually it's only two of them, the 6 core/12 thread machines. As you can see, they too are running climate models on half the threads. With a bit of fiddling I could also have them running GPU work on their dual GTX750Tis, but given that the climate models come in bursts of work followed by none, I haven't bothered.

This is the Raspberry Pi part of the farm. There are three Pi2s and a B+ in there. As you can see, they are all running Einstein work.

There are also a couple of Parallellas, but I haven't bothered taking a screenshot of them. They are running Einstein work as well.

I got a one-question survey from Adapteva (the people who make the Parallella) asking what we wanted to see on it. My answer was "FFTW running on the Epiphany chip". The reason is that a lot of the projects use FFTW, and if it can run on the Epiphany and do the transforms quicker than the ARM cores, then we can get a performance boost for a number of projects.


Farm news
In other news, the shop got the RAID card that I wanted to put into the file server, but it didn't come with any cables, so they now need to source a cable before I can make use of it. I haven't paid for it yet. If they can't get the cable then it's useless and they can return it to their supplier.

A couple of the Pi2s have been having issues recently. I ended up re-imaging them with the same image that the one reliable Pi2 had. They've been running for the last 3 days, so hopefully that has cured the reliability issues. It could have been the overclock or the frequent kernel updates that made them unreliable.


Windows 10
There is a fair bit of chatter on the message boards about Windows 10, and Microsoft has rolled out a number of updates to Windows 7 and Windows 8 users that add an icon to do the upgrade when it's released on July 29th. Most of the crunching machines are running Win7, so it's a free upgrade for me. The Parallellas and Pis are running Linux. There is a version of Windows 10 available for the Pi2, but I can't see myself switching them to Win10.

I haven't installed the Win10 preview, so I am not too sure how it's going to look. I have seen some video clips of the preview, which show a lot of features that I don't want or need on a number-crunching machine. The preview updates daily and may not look like the final product yet. I will look at putting it into a virtual machine to see what it looks like there before deciding whether to upgrade the Win7 machines.