Paranoid, Init

November 18th, 2014

Having marvelled at the er… unique nature of MikeeUSA’s Systemd Blues: Took our thing (Wooo), a blues homage to the perils of using systemd, I decided that what the world actually needs is something from the metal genre.

So, here’s the lyrics to Paranoid, Init.

Default soon on Debian
This doesn’t help me with my mind
People think I’m insane
Because I am trolling all the time

All day long I fight Red Hat
And uphold UNIX philosophy
Think I’ll lose my mind
If I can’t use sysvinit on jessie

Can you help me
Terrorise pid 1?
Oh yeah!

Tried to show the committee
That things were wrong with this design
They can’t see Poettering’s plan in this
They must be blind

Some sick joke I could just cry
GNOME needs logind API
QR codes gave me a feel
Then binary logs just broke the deal

And so as you hear these words
Telling you now of my state
Can’t log off and enjoy life
I’ve another sock puppet to create

Currently not possible

October 12th, 2014

On Thursday 9th, after weeks of low-level frustration at having to press “close” on every login, I sent a complaint to Barclays asking them to stop asking me on every single login to switch to paperless statements with a dialog box that has only two options:

Switch to paperless statements

This morning they replied:

Please be advised that it is currently not possible for us to remove the switch to paperless statements advert.

So, uh, I suppose if you’re a web developer who thinks that it’s acceptable to ask a question on every login and not supply any means for the user to say, “stop asking me this question”, there is still a job for you in the banking industry. No one there will at any point tell you that this is an awful user experience. They will probably just tell you, “good job”, from their jacuzzi full of cash that they got from charging people £5.80 a month to have a bank account, of which £0.30 is for posting a bank statement.

Meanwhile, on another part of their site, I attempt to tell them to send me letters by email rather than post, but the web site does not allow me to because it thinks I do not have an email address set, even though the same screen shows the email address that has been set there for years.

Go home Barclays, you're drunk

After light mocking on Twitter they asked me to try using a different browser, before completely misunderstanding what I was talking about, at which point I gave up.

Diversity at OggCamp comment

October 10th, 2014

There’s an interesting post about diversity at a tech conference. It is itself a response to a number of tweets by an attendee, so you should read both those things, and probably all of the other comments first.

I’ve now tried twice to add a comment to this article, but each time my comment disappears into the ether. Mark tells me that he is not seeing the comments, i.e. they are not being held for moderation, so I just assume some bit of tech somewhere is failing. Yes, I do get the captcha challenge thing and do complete it successfully. Blog comment systems are awful, aren’t they?

So anyway, here’s the most recent version of the comment I tried to add:

I originally wrote this comment on the evening of the 6th, but the blog appears to have eaten it, and I no longer have a copy of it so I’ll have to try to re-type it from memory. Also since then I note a number of other comments which are highly opposed to what I wrote, so you’ll have to take my word for it that this is a genuine comment and not an attempt to cause strife.

I do not believe that OggCamp specifically has a problem and I agree with much of what Mark has written, particularly that the unconference format is not in fact used to excuse lack of diversity (though it can be, and doubtless will be, by someone). I do believe that OggCamp has tried quite hard to be welcoming to all, and in many ways has succeeded. There seems to be a slightly larger percentage of female attendees at OggCamp compared to other tech conferences I have been to. I feel strongly that there is a larger percentage of female speakers at OggCamp.

I do however believe the widespread observation that tech conferences and tech in general do have a problem with attracting people who aren’t white males. I do believe that any group organising a conference is obligated to try to fix this, which means that the organisers of OggCamp are.

Stating that there is no such problem and that everyone is welcome is not going to fix it. Clearly there is a problem here: there are people reporting that there’s a problem and that they don’t think you’re doing all that you could to be welcoming. There’s a word for telling people who say they’re subject to an unwelcoming environment that they are in fact wrong about how they feel, and I’d really like for this not to go there.

However I do not think that many of the things that Mark has proposed will actually make any difference, as well-intentioned as they are. To help improve matters I think that OggCamp should do some things that Mark (and many others in these comments, apparently) will not like.

I am in favour of positive reinforcement / affirmative action / speaker quotas / whatever you want to refer to it as, as part of a diversity statement. Like, aspirational. To be regarded as a sort of “could do better” if it wasn’t achieved. I believe it has been shown to be effective.

My first suggestion is to have some sort of diversity goal, perhaps one like, “ideally at least one largest-stage slot per day will be taken by a person who is not a white male”. If we assume one largest stage, two slots each on morning and afternoon, that’s four per day so that’s aiming for 25% main stage representation of speakers who aren’t white males. I believe the gender split alone (before we consider race or other marginalised attributes) in the tech industry is something like 80/20 so this doesn’t sound outrageous.

My second suggestion—and I feel this is possibly more important than the first—is to get more diversity in the group of people selecting the invited speakers. I think a bunch of white males (like myself) sitting about pontificating about diversity isn’t very much better than not doing anything at all. Put those decisions into the hands of the demographic we are trying to encourage.

So, I suggest asking zenaynay to speak at the next OggCamp, and I suggest asking zenaynay if they know any other people who aren’t white males who would like to speak at a future OggCamp.

I do not think that merely marketing OggCamp in more places will fix much. People that aren’t white males tend to be put off from speaking at events like OggCamp and the only way to change their minds is to directly contact them. More diverse speakers will lead to more diverse attendees.

In the same vein, there’s the code of conduct issue. We tend to believe that we are all really nice guys doing the best we can; we would never offend or upset anyone, we would never exclude anyone. The thing is, people who aren’t like us have a very different experience of the world. So just saying that we’re not like that isn’t really enough. Codes of conduct for conferences are a good idea for this reason. Many people who are not white males will not attend a conference that doesn’t have one, because they feel like there is no commitment there and they’re not welcome (or in many cases, safe).

Ashe Dryden compiled a useful page of tips for increasing diversity at tech conferences. If there is genuine desire to do this then I think you have to come up with a great counter-argument as to why it isn’t worth trying the things that Ashe Dryden has said have worked for others. Codes of conduct and diversity goals are in there. As is personally inviting speakers.

“We don’t have time to run a full CFP process” seems like one of the stronger counter-arguments to all of this, to which I think there are two answers:

  1. Don’t bother then; nothing changes.
  2. Try to find volunteers to do it for you; something may change.

Shanley Kane wrote a great collection of essays called Your Startup is Broken. Of course this is about startups (and with a US-centric slant, too), not conferences, but it is a great read nonetheless and touches upon all the sorts of issues that are relevant here. I really recommend it. It’s only $10.

Finally, I feel that many of the commenters are being a little too defensive. Try to take it as an indictment of the tech sector, not an indictment of OggCamp, and try to use it as feedback to improve things.

What’s my btrfs doing? And how do I recover from it?

August 8th, 2014

I’ve been experimenting with btrfs on my home file server for a while. Yes, I know it’s not particularly bleeding edge or anything any more, but I’m quite conservative even for just my household’s stuff, as restoring from backup would be quite tedious.

Briefly, the btrfs volume was initially across four 2TB disks in RAID10 for data and metadata. At a later date I also added a 500G disk but had never rebalanced so that had no data on it.

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.03GB path /dev/sdh
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a
$ sudo btrfs filesystem df /srv/tank
Data, RAID10: total=1.08TB, used=1.08TB
System, RAID10: total=64.00MB, used=128.00KB
System: total=4.00MB, used=0.00
Metadata, RAID10: total=2.52GB, used=1.34GB

Yesterday, one of the disks started misbehaving:

Aug  7 12:17:32 specialbrew kernel: [5392685.363089] ata5.00: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.369272] ata5.01: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.375651] ata5.02: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.381796] ata5.03: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.388082] ata5.04: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.394213] ata5.05: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.400213] ata5.15: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.406556] ata5.15: irq_stat 0x00060002, PMP DMA CS errata
Aug  7 12:17:32 specialbrew kernel: [5392685.412787] ata5.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.419286] ata5.00: failed command: WRITE DMA
Aug  7 12:17:32 specialbrew kernel: [5392685.425504] ata5.00: cmd ca/00:08:56:06:a1/00:00:00:00:00/e0 tag 1 dma 4096 out
Aug  7 12:17:32 specialbrew kernel: [5392685.425504]          res 9a/d7:00:00:00:00/00:00:00:10:9a/00 Emask 0x2 (HSM violation)
Aug  7 12:17:32 specialbrew kernel: [5392685.438350] ata5.00: status: { Busy }
Aug  7 12:17:32 specialbrew kernel: [5392685.444592] ata5.00: error: { ICRC UNC IDNF ABRT }
Aug  7 12:17:32 specialbrew kernel: [5392685.451016] ata5.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.457334] ata5.01: failed command: WRITE DMA
Aug  7 12:17:32 specialbrew kernel: [5392685.463784] ata5.01: cmd ca/00:18:de:67:9c/00:00:00:00:00/e0 tag 0 dma 12288 out
Aug  7 12:17:32 specialbrew kernel: [5392685.463784]          res 9a/d7:00:00:00:00/00:00:00:00:9a/00 Emask 0x2 (HSM violation)
.
.
(lots more of that)
.
.
Aug  7 12:17:53 specialbrew kernel: [5392706.325072] btrfs: bdev /dev/sdh errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Aug  7 12:17:53 specialbrew kernel: [5392706.325228] btrfs: bdev /dev/sdh errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Aug  7 12:17:53 specialbrew kernel: [5392706.339976] sd 4:3:0:0: [sdh] Stopping disk
Aug  7 12:17:53 specialbrew kernel: [5392706.346436] sd 4:3:0:0: [sdh] START_STOP FAILED
Aug  7 12:17:53 specialbrew kernel: [5392706.352944] sd 4:3:0:0: [sdh]  
Aug  7 12:17:53 specialbrew kernel: [5392706.356489] end_request: I/O error, dev sdh, sector 0
Aug  7 12:17:53 specialbrew kernel: [5392706.365413] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug  7 12:17:53 specialbrew kernel: [5392706.475838] lost page write due to I/O error on /dev/sdh
Aug  7 12:17:53 specialbrew kernel: [5392706.482266] lost page write due to I/O error on /dev/sdh
Aug  7 12:17:53 specialbrew kernel: [5392706.488496] lost page write due to I/O error on /dev/sdh

After that point, /dev/sdh no longer existed on the system.

Okay, so then I told btrfs to forget about that device:

$ sudo btrfs device delete missing /srv/tank
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
        *** Some devices missing
 
Btrfs v0.20-rc1-358-g194aa4a

Apart from the obvious fact that a device was then missing, things seemed happier at this point. I decided to pull the disk and re-insert it to see if it still gave errors (it’s in a hot-swap chassis). After plugging the disk back in, it popped up as /dev/sdl and rejoined the volume:

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a

…but the disk is still very unhappy:

Aug  7 17:46:46 specialbrew kernel: [5412439.946138] sd 4:3:0:0: [sdl] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Aug  7 17:46:46 specialbrew kernel: [5412439.946142] sd 4:3:0:0: [sdl] 4096-byte physical blocks
Aug  7 17:46:46 specialbrew kernel: [5412439.946247] sd 4:3:0:0: [sdl] Write Protect is off
Aug  7 17:46:46 specialbrew kernel: [5412439.946252] sd 4:3:0:0: [sdl] Mode Sense: 00 3a 00 00
Aug  7 17:46:46 specialbrew kernel: [5412439.946294] sd 4:3:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug  7 17:46:46 specialbrew kernel: [5412439.952286]  sdl: unknown partition table
Aug  7 17:46:46 specialbrew kernel: [5412439.990436] sd 4:3:0:0: [sdl] Attached SCSI disk
Aug  7 17:46:47 specialbrew kernel: [5412440.471412] btrfs: device label tank devid 1 transid 504721 /dev/sdl
Aug  7 17:47:17 specialbrew kernel: [5412470.408079] btrfs: bdev /dev/sdl errs: wr 7464, rd 0, flush 332, corrupt 0, gen 0
Aug  7 17:47:17 specialbrew kernel: [5412470.415931] lost page write due to I/O error on /dev/sdl

Okay. So by then I was prepared to accept that this disk was toast and I just wanted it gone. How to achieve this?

Given that data was still being read off this disk okay (confirmed by dd, iostat), I thought maybe the clever thing to do would be to tell btrfs to delete this disk while it was still part of the volume.

According to the documentation this would rebalance data off of the device to the other devices (still plenty of capacity available for two copies of everything even with one disk missing). That way the period of time where there was a risk of double disk failure leading to data loss would be avoided.

$ sudo btrfs device delete /dev/sdl /srv/tank

*twiddle thumbs*

Nope, still going.

Hmm, what is it doing?

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 556.03GB path /dev/sdi
        devid    4 size 1.82TB used 556.03GB path /dev/sdj
        devid    5 size 465.76GB used 26.00GB path /dev/sdk
        devid    2 size 1.82TB used 556.03GB path /dev/sdg

Seems that it’s written 26GB of data to sdk (previously unused), and a little to some of the others. I’ll guess that it’s using sdk to rebalance onto, and doing so at a rate of about 1GB per minute. So in around 555 minutes (a little over nine hours) this should finish and sdl will be removed, and I can eject the disk and later insert a good one?

Well, it’s now quite a few hours later and sdk is now full, but the btrfs device delete still hasn’t finished, and in fact iostat believes that writes are still taking place to all disks in the volume apart from sdl:

$ sudo iostat -x -d 5 sd{g,i,j,k,l}
Linux 3.13-0.bpo.1-amd64 (specialbrew.localnet)         08/08/14        _x86_64_        (2 CPU)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               6.50     0.89    2.49    1.60    54.30   136.42    93.31     0.43  105.19   73.77  154.12   1.63   0.67
sdk               0.00     0.79    0.00    0.89     0.02    97.93   218.89     0.08   91.43    5.69   91.79   5.70   0.51
sdj               2.26     1.10    0.79    1.38    65.45   136.39   185.57     0.19   86.94   46.38  110.20   5.17   1.12
sdi               8.27     1.34    3.39    1.21    88.11   136.39    97.55     0.60  130.79   46.89  365.87   2.72   1.25
sdl               0.24     0.00    0.01    0.00     1.00     0.00   255.37     0.00    1.40    1.40    0.00   1.08   0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   87.20     0.00  4202.40    96.39     0.64    7.39    0.00    7.39   4.43  38.64
sdk               0.00     0.20    0.00  102.40     0.00  3701.60    72.30     2.40   23.38    0.00   23.38   8.63  88.40
sdj               0.00     0.00    0.00   87.20     0.00  4202.40    96.39     0.98   11.28    0.00   11.28   5.20  45.36
sdi               0.00     0.20    0.00  118.00     0.00  4200.80    71.20     1.21   10.24    0.00   10.24   4.45  52.56
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 555.29GB path /dev/sdi
        devid    4 size 1.82TB used 555.29GB path /dev/sdj
        devid    5 size 465.76GB used 465.76GB path /dev/sdk
        devid    2 size 1.82TB used 555.29GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a

Worse still, btrfs thinks it’s out of space:

$ touch foo
touch: cannot touch `foo': No space left on device
$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
-               7.8T  2.2T  5.2T  30% /srv/tank/backups

So, that’s a bit alarming.

I don’t think that btrfs device delete is ever going to finish. I think what I probably should have done is just forcibly yanked sdl and then done btrfs device delete missing, and put up with the window of possible double disk failure.
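In command terms, that hindsight plan would have been: pull sdl from its hot-swap bay so the kernel drops it, then simply:

$ sudo btrfs device delete missing /srv/tank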

But what’s done is done and now I need to recover from this.

Should I ctrl-c the btrfs device delete? If I do that and the machine is still responsive, should I then yank sdl?

I have one spare disk slot into which I could place the new disk when it arrives, without rebooting or interrupting anything. I assume that will then register as sdm and I could add it to the btrfs volume. Would the rebalancing then start using that and complete, thus allowing me to yank sdl?

Some input from anyone who’s actually been through this would be appreciated!

Update 2014-08-12

It’s all okay again now. Here’s a quick summary for those who just want to know what I did:

  • Asked for some advice from Hugo, who knows a lot more about btrfs than me!
  • Found I could not ctrl-c the device delete and had to reboot.
  • Discovered I could mount the volume with -oro,degraded,recovery, i.e. read-only. It couldn’t be mounted read-write at this stage.
  • Took a complete local backup of the 1.08TiB of data via the read-only mount onto one of the new 3TB disks that had arrived on the Friday.
  • Made a bug report against the Linux kernel for the fact that mount -odegraded,recovery would go into deadlock.
  • Compiled the latest mainline kernel from source using the instructions in the Debian Linux Kernel Handbook. After booting into it mount -odegraded,recovery worked and I had a read-write volume again.
  • Compiled a new btrfs-tools.
  • Inserted one of the new 3TB disks and did a btrfs replace start /dev/sdj /dev/sdl /srv/tank in order to replace the smallest 500GB device (sdj) with the new 3TB device (sdl).
  • Once that was complete, did btrfs filesystem resize 5:max /srv/tank in order to let btrfs know to use the entirety of the device with id 5 (sdl, the new 3TB disk).
  • Did a btrfs balance start -v -dconvert=raid1,soft -mconvert=raid1,soft /srv/tank to convert everything from RAID-10 to RAID-1 so as to be more flexible in future with different-sized devices.
  • Finally btrfs device delete missing /srv/tank to return the volume to non-degraded state.
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 4 FS bytes used 1.09TiB
        devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
        devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
        devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
        devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
 
Btrfs v3.14.2

A more detailed account of the escapade follows, with some chat logs between Hugo and me thrown in to help people’s web searching.

A plan is hatched

<grifferz> according to iostat it's writing quite a lot to all four
           disks, and doing no reading at all
<grifferz> but it is also constantly saying
<grifferz> Aug  8 06:48:28 specialbrew kernel: [5459343.262187] btrfs:
           bdev /dev/sdl errs: wr 122021062, rd 0, flush 74622, corrupt
           0, gen 0
<darkling> OK, reading further, I don't think you'll be able to ^C the
           dev delete.
<darkling> So at this point, it's probably a forcible reboot (as polite
           as you can make it, but...)
<darkling> Take the dead disk out before the OS has a chance to see it.
<grifferz> if I waited and did nothing at all until the new disk
           arrives, if I insert it and add it to the volume do you think
           it will recover?
<darkling> This is then the point at which you run into the other
           problem, which is that you've got a small disk in there with
           4 devices on a RAID-10.
<grifferz> if adding the new disk would allow the dev delete to
           complete, presumably I could then do another dev delete for
           the 500G disk
<darkling> No, dev delete is going to fall over on the corrupt sections
           of the device.
<darkling> I wouldn't recommend using it in this case (unless it's dev
           delete missing)
<grifferz> so you would suggest to reboot, yank sdl, hopefully get up
           and running with a missing device, do dev delete missing,
           insert replacement disk, rebalance?
<darkling> It's kind of a known problem. We probably need a "device
           shoot-in-the-head" for cases where the data can't be
           recovered from a device.
<darkling> Yes.
<darkling> With the small device in the array, it might pay to do the
           dev delete missing *after* adding the new disk.
<grifferz> what problems is the 500G disk going to cause me?
<grifferz> apart from this one that I am having now I suppose :)
<darkling> Well, RAID-10 requires four devices, and will write to all
           four equally.
<darkling> So the array fills up when the smallest device is full.
<darkling> (If you have 4 devices)
<darkling> Have a play with http://carfax.org.uk/btrfs-usage/ to see
           the effects.
<grifferz> is that why it now thinks it is full because I had four 2T
           disks and a 500G one and I tried to delete one of the 2T
           ones?
<darkling> Yes.
<grifferz> ah, it's a shame it couldn't warn me of that, and also a
           shame that if I added a new 2T one (which I can probably do
           today) it won't fix itself
<darkling> I generally recommend using RAID-1 rather than RAID-10 if you
           have unequal-sized disks. It behaves rather better for space
           usage.
<grifferz> I bet I can't convert RAID-10 to RAID-1 can I? :)
<darkling> Of course you can. :)
<darkling> btrfs balance start -dconvert=raid1,soft
           -mconvert=raid1,soft /
<grifferz> oh, that's handy. I saw balance had dconvert and mconvert to
           raid1 but I thought that would only be from no redundancy
<darkling> No, it's free conversion between any RAID level.
<grifferz> nice
<grifferz> well, thanks for that, at least I have some sort of plan now.
           I may be in touch again if reboot ends up with a volume that
           won't mount! :)

Disaster!

In which it doesn’t mount, and then it only mounts read-only.

fuuuuuuuuuuuuuuuuuuuuuu

<grifferz> oh dear, I hit a problem! after boot it won't mount:
<grifferz> # mount /srv/tank
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.358894] BTRFS:
           device label tank devid 5 transid 798058 /dev/sdj
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.372031] BTRFS
           info (device sdj): disk space caching is enabled
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.379825] BTRFS:
           failed to read the system array on sdj
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.403095] BTRFS:
           open_ctree failed
<grifferz> mount: wrong fs type, bad option, bad superblock on
           /dev/sdj,
<grifferz> googling around but it seems like quite a generic message
<darkling> Was sdj the device that failed earlier?
<grifferz> no it was sdl (which used to be sdh)
<darkling> OK.
<grifferz> # btrfs fi sh
<grifferz> Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
<grifferz>  Total devices 5 FS bytes used 1.08TB
<grifferz>  devid    5 size 465.76GB used 465.76GB path /dev/sdj
<grifferz>  devid    3 size 1.82TB used 555.29GB path /dev/sdh
<grifferz>  devid    4 size 1.82TB used 555.29GB path /dev/sdi
<grifferz>  devid    2 size 1.82TB used 555.29GB path /dev/sdg
<grifferz>  *** Some devices missing
<grifferz> (now)
<grifferz> perhaps ask it to do it via one of the other disks as sdj
           is now the small one?
<darkling> Yeah.
<darkling> Just what I was going to suggest. :)
<grifferz> even when specifying another disk it still says "failed to
           read the system array on sdj"
<darkling> But, with that error, it's not looking very happy. :(
<darkling> What kernel was this on?
<grifferz> it was on 3.13-0 from debian wheezy backports but since I
           rebooted it booted into 3.14-0.bpo.2-amd64
<grifferz> I can try going back to 3.13-0
<darkling> 3.14's probably better to stay with.
<darkling> Just checking it wasn't something antique.
<grifferz> I could also plug that failing disk back in and remove sdj.
           it probably still has enough life to be read from
<darkling> Well, first, what does btrfs check say about the FS?
<darkling> Also try each drive, with -s1 or -s2
<grifferz> check running on sdj, hasn't immediately aborted…
<darkling> Ooh, OK, that's good.
# btrfs check /dev/sdj
Aug  8 19:13:15 specialbrew kernel: [  884.840987] BTRFS: device label tank devid 2 transid 798058 /dev/sdg
Aug  8 19:13:15 specialbrew kernel: [  885.058896] BTRFS: device label tank devid 4 transid 798058 /dev/sdi
Aug  8 19:13:15 specialbrew kernel: [  885.091042] BTRFS: device label tank devid 3 transid 798058 /dev/sdh
Aug  8 19:13:15 specialbrew kernel: [  885.097790] BTRFS: device label tank devid 5 transid 798058 /dev/sdj
Aug  8 19:13:15 specialbrew kernel: [  885.129491] BTRFS: device label tank devid 2 transid 798058 /dev/sdg
Aug  8 19:13:15 specialbrew kernel: [  885.137456] BTRFS: device label tank devid 4 transid 798058 /dev/sdi
Aug  8 19:13:15 specialbrew kernel: [  885.145731] BTRFS: device label tank devid 3 transid 798058 /dev/sdh
Aug  8 19:13:16 specialbrew kernel: [  885.151907] BTRFS: device label tank devid 5 transid 798058 /dev/sdj
warning, device 1 is missing
warning, device 1 is missing
warning devid 1 not found already
Checking filesystem on /dev/sdj
UUID: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 49947638987 bytes used err is 0
total csum bytes: 1160389912
total tree bytes: 1439944704
total fs tree bytes: 150958080
total extent tree bytes: 55762944
btree space waste bytes: 69500665
file data blocks allocated: 1570420359168
 referenced 1568123219968
Btrfs v0.20-rc1-358-g194aa4a
<grifferz> it doesn't seem to have complained. shall I give mounting
           another try, or fsck again from another disk?
<darkling> Hmm. Odd that it's complaining about the system array, then.
<darkling> That check you just did is read-only, so it won't have
           changed anything.
<grifferz> doing the fsck with another device gives identical output
<grifferz> and no, I still can't mount it
<darkling> Oooh, hang on.
<darkling> Try with -odegraded
<grifferz> # mount -odegraded /srv/tank
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.388182] BTRFS:
           device label tank devid 5 transid 798058 /dev/sdj
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.628728] BTRFS
           info (device sdj): allowing degraded mounts
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.633978] BTRFS
           info (device sdj): disk space caching is enabled
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.725065] BTRFS:
           bdev (null) errs: wr 122025014, rd 0, flush 293476, corrupt
           0, gen 0
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.730473] BTRFS:
           bdev /dev/sdg errs: wr 3, rd 8, flush 0, corrupt 0, gen 0
<grifferz> prompt not returned yet
<darkling> OK, that's probably good.
<grifferz> bit worrying it says it has an error on another disk though!
<darkling> Those are cumulative over the lifetime of the FS.
<darkling> Wouldn't worry about it too much.
<grifferz> right okay, some of those happened the other day when the
           whole bus was resetting
<grifferz> prompt still not returned :(
<darkling> Bugger...
<grifferz> yeah iostat's not showing any disk activity although the
           rest of the system still works
<darkling> Anything in syslog?
<grifferz> no that was the extent of the syslog messages, except for a
           hung task warning just now but that is for the mount and
           for btrs-transactiblah
<darkling> How about -oro,recovery,degraded?
<darkling> You'll probably have to reboot first, though.
<grifferz> I can't ctrl-c that mount so should I try that in another
           window or reboot and try it?
<grifferz> probably best to reboot I suppose
<grifferz> I suspect the problem's here though:
<grifferz> Aug  8 19:26:33 specialbrew kernel: [ 1682.538282]
           [<ffffffffa02f1610>] ? open_ctree+0x20a0/0x20a0 [btrfs]
<darkling> Yeah, open_ctree is a horrible giant 1000-line function.
<darkling> Almost every mount problem shows up in there, because
           that's where it's used.
<grifferz> hey that appears to have worked!
<darkling> Cool.
<grifferz> but it doesn't say anything useful in the syslog
<grifferz> so I worry that trying it normally will still fail
<darkling> Now unmount and try the same thing without the ro option.
<darkling> Once that works, you'll have to use -odegraded to mount the
           degraded FS until the new disk arrives,
<darkling> or simply balance to RAID-1 immediately, and then balance
           again when you get the new disk.
<grifferz> that mount command hasn't returned :(
<darkling> That's -odegraded,recovery ?
<grifferz> I think I will put the new disk in and take a copy of all my
           data from the read-only mount
<grifferz> and yes that is correct
<darkling> OK, might be worth doing one or both of upgrading to 3.16
           and reporting to bugzilla.kernel.org
<darkling> You could also take a btrfs-image -c9 -t4 of the filesystem
           (while not mounted), just in case someone (josef) wants to
           look at it.

A bug report was duly filed.

A new kernel, and some success.

It’s been a long time since I bothered to compile a kernel. I remember it as being quite tedious. Happily the procedure is now really quite easy. It basically amounted to:

$ wget -qO - https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.16.tar.xz | xzcat | tar xvf -
$ cd linux-3.16
$ cp /boot/config-3.14-0.bpo.2-amd64 .config
$ make oldconfig
(press Return a lot)
$ make deb-pkg
# dpkg -i ../linux-image-3.16.0_3.16.0-1_amd64.deb

That procedure is documented in the Debian Linux Kernel Handbook.

I wasn’t expecting this to make any difference, but it did! When booted into this kernel I was able to do:

# mount -odegraded,recovery /srv/tank
# umount /srv/tank
# mount -odegraded /srv/tank

and end up with a read-write, working volume.

There were no interesting syslog messages.

Thankfully from this point on the volume was fully read-write, so even though a fair bit of work was still needed I could put it back in service and no further reboots were required.

<grifferz> oh, that's interesting. after a short delay, mounting
           -orecovery,degraded on 3.16 does actually work. it appears!
<darkling> \o/
<grifferz> do I need to unmount it and remount it with just -odegraded 
           now?
<darkling> Yes, that should work.
<grifferz> and then I can put the new disk in, add it to the volume,
           rebalance it, remove the small 500G disk, convert to raid-1?
<darkling> A bit faster to use btrfs dev replace to switch out the
           small device for the new one.
<darkling> Then btrfs dev resize n:max /mountpoint for the n that's
           the new device.
<darkling> Then restripe to RAID-1.
<grifferz> right, great, it's mounted with just -odegraded
<grifferz> so: 1) insert new disk, 2) "dev replace" the 500G one for
           this new device?
<darkling> Yes.
<darkling> That will leave the new device with an FS size of 500G, so
           you need to resize it.
<darkling> (Same idea as resizing the partition but not the ext* FS on
           it)
<darkling> The resize should take a few ms. :)
<grifferz> I don't seem to have a "btrfs device replace" command. do I
           need to build a new btrfs-progs?
<darkling> What --version is it?
<darkling> (Probably build a new one, yes)
<grifferz> Btrfs v0.20-rc1-358-g194aa4a
<darkling> Yeah, that's old enough you're mising some features.
<grifferz> ah, it's not "btrfs device replace" it's just "btrfs
           replace …" I've built v3.14.2 now

So that was:

$ sudo btrfs replace start /dev/sdj /dev/sdl /srv/tank

after carefully confirming that /dev/sdj really was the 500G disk and /dev/sdl really was the new 3TB disk I just inserted (several of the device names change throughout this post as disks are ejected and inserted!).
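If you ever need to do the same, checking sizes and serial numbers is a sane way to confirm which device is which, e.g. (substituting whatever device name you suspect):

$ ls -l /dev/disk/by-id/ | grep -w sdl
$ sudo smartctl -i /dev/sdl | egrep -i 'model|serial|capacity'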

<darkling> Oh, OK. Seems like a slightly odd place to put it. :(
<darkling> The userspace tools are a bit of a mess, from a UI point of
           view.
<darkling> I'm currently struggling with several problems with btrfs
           sub list, for example.
<grifferz> heh: $ sudo btrfs replace status /srv/tank
<grifferz> 0.4% done, 0 write errs, 0 uncorr. read errs
<darkling> Look on the bright side: it's way faster than two balances.
<grifferz> won't this still leave me with a volume that it thinks has a
           device missing though?
<darkling> Yes, but if you're going to remove the small device, this is
           still probably the fastest approach.
<grifferz> after it's finished with the replace and I've done the
           resize, will a "device delete" of the small one leave it
           satisfied?
<darkling> Once the replace has finished, the small device should no
           longer be a part of the FS at all.
<grifferz> oh yeah
<grifferz> surely it should be happy at that point then, with 4 equal
           sized devices?
<darkling> You might want to run wipefs or a shitload of /dev/zero
           with dd over it, just to make sure. (Bonus points for doing
           it from orbit. ;) )
<darkling> The replace is a byte-for-byte replacement of the device.
<darkling> So if you were degraded before that, you're degraded after
           it.
<grifferz> right but after the replace and resize then?
<darkling> The resize just tells the FS that there's more space it can
           use -- it's a trivial O(1) operation.
<grifferz> what will I need to do to make it happy that there aren't
           any missing devices then?
<darkling> An ordinary balance. (Or a balance with -dconvert=raid1 if
           you want to go that way)
<grifferz> I do ultimately. In which case do you think there is any
           reason to do the balances separately?
<darkling> No reason at all.
<grifferz> righto :)

The replace finishes:

Started on 11.Aug 20:52:05, finished on 11.Aug 22:29:54, 0 write errs, 0 uncorr. read errs

It turns out wipefs wasn’t necessary; I did it with -n anyway just to see if it would find anything, but it didn’t.

Time to do the balance/convert.

<grifferz> $ sudo btrfs balance start -v -dconvert=raid1,soft
           -mconvert=raid1,soft /srv/tank
<grifferz> Dumping filters: flags 0x7, state 0x0, force is off
<grifferz>   DATA (flags 0x300): converting, target=16, soft is on
<grifferz>   METADATA (flags 0x300): converting, target=16, soft is on
<grifferz>   SYSTEM (flags 0x300): converting, target=16, soft is on
<grifferz> fingers crossed :)
<grifferz> I am a bit concerned that syslog is mentioning sdj which is
           no longer part of the volume (it was the smallest disk)
<grifferz> Aug 11 22:45:23 specialbrew kernel: [10551.595830] BTRFS
           info (device sdj): found 18 extents
<grifferz> for example
<grifferz> and btrfs fi sh confirms that sdj is not there any more
<grifferz> well I think it is just confused because iostat does not
           think it's touching sdj any more
<grifferz> hah, balance/convert complete, but:
<grifferz> $ sudo btrfs fi sh
<grifferz> Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
<grifferz>         Total devices 5 FS bytes used 1.09TiB
<grifferz>         devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
<grifferz>         devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
<grifferz>         devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
<grifferz>         devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
<grifferz>         *** Some devices missing
<grifferz> Btrfs v3.14.2
<grifferz> so now half my data is on sdl, the rest is split between
           three, and it still thinks something is missing!
<darkling> btrfs dev del missing /mountpoint
<grifferz> aha!
<darkling> And the way that the allocator works is to keep the amount
           of free space as even as possible -- that maximises the
           usage of the FS.
<grifferz> that was it :)
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 4 FS bytes used 1.09TiB
        devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
        devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
        devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
        devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
 
Btrfs v3.14.2

Phew!

everything went better than expected

How to work around lack of array support in puppetlabs-firewall?

June 23rd, 2014

After a couple of irritating firewalling oversights I decided to have a go at replacing my hacked-together firewall management scripts with the puppetlabs-firewall module.

It’s going okay, but one thing I’ve found quite irritating is the lack of support for arrays of things such as source IPs or ICMP types.

For example, let’s say I have a sequence of shell commands like this:

#!/bin/bash
 
readonly IPT=/sbin/iptables
 
for icmptype in redirect router-advertisement router-solicitation \
                address-mask-request address-mask-reply; do
    $IPT -A INPUT -p icmp --icmp-type ${icmptype} -j DROP
done

You’d think that with puppetlabs-firewall you could do this:

class bffirewall::prev4 {
    Firewall { require => undef, }
 
    firewall { '00002 Disallow possibly harmful ICMP':
        proto    => 'icmp',
        icmp     => [ 'redirect', 'router-advertisement',
                      'router-solicitation', 'address-mask-request',
                      'address-mask-reply' ],
        action   => 'drop',
        provider => 'iptables',
    }
}

Well, it is correct syntax and it applies without error on the client, but taking a closer look, it hasn’t worked. It’s just applied the first firewall rule from the array, i.e.:

iptables -A INPUT -p icmp --icmp-type redirect -j DROP

There’s already a bug in Puppet’s JIRA about this.

Similarly, what if you need to add a similar rule for each of a set of source hosts? For example:

readonly MONITORS="192.168.0.244 192.168.0.238 192.168.4.71"
readonly CACTI="192.168.0.246"
readonly ENTROPY="192.168.0.215"
 
# Allow access from:
# - monitoring hosts
# - cacti
# - the entropy VIP
for host in ${MONITORS} ${CACTI} ${ENTROPY}; do
    $IPT -A INPUT -p tcp --dport 8888 -s ${host} -j ACCEPT
done

Again, your assumption about what would work…

    firewall { '08888 Allow egd connections':
        proto    => 'tcp',
        dport    => '8888',
        source   => [ '192.168.0.244', '192.168.0.238', '192.168.4.71',
                      '192.168.0.246', '192.168.0.215' ],
        action   => 'accept',
        provider => 'iptables',
    }

…just results in the inclusion of a rule for only the first source host, with the rest being silently discarded.

This one seems to have an existing bug too; though it has a status of closed/fixed it certainly isn’t working in the most recent release. Maybe I need to be using the head of the repository for that one.

So, what to do?

Duplicating the firewall {} blocks is one option that’s always going to work as a last resort.

Puppet’s DSL doesn’t support any kind of iteration as far as I’m aware, though it will in future (no surprise, as iteration and looping are kind of a glaring omission).

Until then, does anyone know any tricks to cut down on the repetition here?
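The best I’ve come up with so far is abusing defined resource types: give a resource an array of titles and Puppet declares one resource per element. A rough sketch (the define name is made up, and you still end up with one iptables rule per source, but at least the rule details are only written out once):

define bffirewall::egd_source {
    firewall { "08888 Allow egd connections from ${title}":
        proto    => 'tcp',
        dport    => '8888',
        source   => $title,
        action   => 'accept',
        provider => 'iptables',
    }
}

bffirewall::egd_source { [ '192.168.0.244', '192.168.0.238', '192.168.4.71',
                           '192.168.0.246', '192.168.0.215' ]: }

The same trick covers the ICMP case with icmp => $title inside the define. It’s still repetition at the iptables level rather than real array support though, so I’d be glad to hear of anything neater.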

The HIPPOBAG experience

May 21st, 2014

Sacks of soil and stones in our front garden
We’ve been doing some work in our front garden recently. Part of that involved me digging up the top 5cm or so of the horrible, stony existing soil.

All in all, around a tonne of stones and soil got bagged up and had been sitting in our front garden for a couple of months. We kept four sacks for use in the back garden, which left 21 sacks and around 800kg to dispose of.

I enquired at our local tip (sorry, “reuse and recycling centre”), Space Waye, who informed me that

“waste from home improvements/renovations is classed as industrial waste and is liable for charging.”

Apparently soil from your own garden is also classed as the result of an improvement and is therefore industrial waste, the disposal of which is charged at £195 per tonne. If I were to dispose of the soil and stones at the council facility I would need to transport it there and pay in the region of £156.

Looking for alternatives, I came across HIPPOBAG. The business model is quite simple:

  1. You pick which size of bag you require.
  2. They post it to you (or you can buy it at a few different DIY stores).
  3. You fill it.
  4. You book a collection.
  5. They come and take it away, which they aim to do within 5 working days.
  6. They recycle over 90% of the waste they collect.

Their “MEGABAG” at 180cm long × 90cm wide × 70cm tall and with a maximum weight of 1.5 tonnes seemed the most appropriate, and cost £94.99 — a saving of £61 over the council’s offering, and no need to transport it anywhere ourselves.

The bag turned up in the post the next day, at which point I discovered a discrepancy in the instructions.

The filling instructions on the documentation attached to the bag stated that it should only be filled two thirds full with soil, and levelled out. Neither the Frequently Asked Questions page nor the How To Use A Hippobag page says anything about this, and all pictures on the site show the bags filled right up, so I was completely unaware of any such restriction.

Now, I “only” had 800kg of soil, but that was some 21 sacks which, when placed in the bag, did fill it past the top level. I don’t see how you could use the maximum capacity of 1.5t and only fill it two thirds. I was really worried that they weren’t going to carry out the collection.

With the awkward shape of the rubble sacks they weren’t packing that well into the bag. There was a lot of wasted space between them. In the interests of packing down the soil more level we decided to split open many of the sacks so the soil and stones would spread out more evenly.

I had some misgivings about this, because if Hippobag decided there was too much to collect and it were all still in sacks, at least I might have had the option of removing some of the sacks rather than entirely wasting my money. On the other hand, it did look like it would pack down a lot further.

What we were left with was a bag about half to two thirds full of soil and stones with three or four more sacks of it plonked in the middle, no higher than the lip of the bag.

On the evening of Monday 12th I booked a collection. I was expecting to be able to choose a preferred day, but it seems the only option is “as soon as possible”, and

we aim to collect your HIPPOBAG at any time within 5 working days of your booking

So, by Monday 19th then?

I wrote in the “comments to the driver” section that I would definitely be in so they should ring the bell (they don’t need you to be at home to do a collection). I wanted to check everything was okay and ask them about the filling instructions.

The afternoon of Monday 19th came and still no collection. I filled in the contact form on their web site to enquire when it might take place.

At some point on Tuesday 20th May I looked out of our front window and the bag had gone. I hadn’t heard them make the collection and they didn’t ring the bell. It must have happened when I was out in the back garden. They shoved a collection note through our door. My comments to the driver were printed on the bottom, so they must have seen them. I still hadn’t received a response to my enquiry at that point. (They did actually reply on the afternoon of Tuesday 20th; I’d just missed that at the time this was written.)

Not a big deal since they did perform the collection without issue and only a day later than expected.

Really I still think that council refuse sites should be more open to taking waste like this at no charge, or a lot cheaper than ~£156, if you can prove it is your own domestic waste.

I understand that the council has a limited budget and everyone in the borough would be paying for services that not everyone uses, but I also think there would be far fewer incidents of fly-tipping, which the council has to clean up at huge expense to the taxpayer.

Compared to having to transport 21 sacks of soil to Space Waye and then pay £156 to have them accept it though, using HIPPOBAG was a lot more convenient and £61 cheaper. It’s a shame about the unclear instructions and slow (so far no) response to enquiries, but we would most likely use them again.

On attempting to become a customer of Metro Bank

April 15th, 2014

On the morning of Saturday 12th April 2014 I visited the Kingston Upon Thames store of Metro Bank in an attempt to open a current account.

The store was open — they are open 7 days a week — but largely empty. There was a single member of staff visible, sat down at a desk with a customer.

I walked up to a deserted front desk and heard footsteps behind me. I turned to be greeted by that same member of staff who had obviously spotted I was looking a bit lost and come to greet me. He apologised that no one had greeted me, introduced himself, asked my name and what he could help me with. After explaining that I wanted to open a current account he said that someone would be with me very soon.

Within a few seconds another member of staff greeted me and asked me to come over to her desk. So far so good.

As she started to take my details I could see she was having problems with her computer. She kept saying it was so slow and muttered various other inaudible curses under her breath. She took my passport and said she was going to scan it, but from what I could see she merely photocopied it. Having no joy with her computer she said that she would fill in paper forms and proceeded to ask me for all of my details, writing them down on the forms. Her writing was probably neater than mine, but this kind of dictation was rather tedious and, to be quite honest, I’d rather have done it myself.

This process took at least half an hour. I was rather disappointed, as all their marketing boasts of quick same-day online setup, getting your bank details and debit card the same day, and so on.

Finally she went back to her computer, and then said, “oh dear, it’s come back saying it needs head office approval, so we won’t be able to open this right now. Would you be available to come back later today?”

“No, I’m busy for the rest of the day. To be honest I was expecting all this to be done online as I’m not really into visiting banks even if they are open 7 days a week…”

“Oh that’s alright, once it’s sorted out we should be able to post all the things to you.”

“Right.”

“This hardly ever happens. I don’t know why it’s happened. Even if I knew I wouldn’t be able to tell you. It’s rare but I have to wait for head office to approve the account.”

As she went off to sort something else out I overheard the conversation between the customer and staff member on the next table. He was telling the customer how his savings account couldn’t be opened today because it needed head office approval and it was very rare that this would happen.

I left feeling I had not achieved very much, but hopeful that it might get sorted out soon. It wasn’t a very encouraging start to my relationship with Metro Bank.

It’s now Tuesday 15th April, three days (two working days) after my application was made, and I haven’t had any further communication from Metro Bank, so I have no idea if my account is ever going to be opened. I don’t really have any motivation to chase them up. If I don’t hear soon then I’ll just go somewhere else.

I suppose in theory a bank branch that is open 7 days a week might be useful for technophobes who don’t use the Internet, but if the bank’s systems don’t work then all you’ve achieved is to have a large high street box full of people employed to tell you that everything is broken. Until 8pm seven days a week.

Update 2014-04-15 15:30: After contact on twitter, the Local Director of the Kingston branch called me to apologise and assure me that he is looking into the matter.

About 15 minutes later he called back to explain, roughly:

The reason the account was not approved on the day is that I’ve only been in my current address for 7 months, so none of the proofs of address would have been accepted. Under normal circumstances it is apparently possible to open an account with just a passport. If not then the head office approval or rejection should happen within 24 hours, but their systems are running a bit slowly. Someone should have called me to let me know this, but this did not happen. Apparently approval did in fact come through today – I am told someone was due to call me today with the news that my account has been opened. I should receive the card and cheque book tomorrow.

I’m glad this was so quickly resolved. I’m looking forward to using my account and hopefully everything will be smoother now.

Yearly (Linux) photo management quandary

January 1st, 2014

Here we are again, another year, another dissatisfying look at what options I have for local photo management.

Here’s what I do now:

  • Photos from our cameras and my phone are imported using F-Spot on my desktop computer in the office, to a directory tree that resides over NFS on a fileserver, where they will be backed up.
  • Tagging etc. happens on the desktop computer.
  • For quick viewing of a few images, if I know the date they were taken on, I can find them in the directory structure because it goes like Photos/2014/01/01/blah.jpg. The NFS mount is available on every computer in the house that can do NFS (e.g. laptops).
  • For more involved viewing that will require searching by tag or other metadata, i.e. that has to be done in F-Spot, I have to do it on the desktop computer in the office, because that is the only place that has the F-Spot database. So I either do it there, or I have to run F-Spot over X11 forwarding on another machine (slow and clunky!).

The question is how to improve that experience?

I can’t run F-Spot on multiple computers because it stores its SQLite database locally and even if the database file were synced between hosts or kept on the fileserver it would still need the exact same version of F-Spot on every machine, which is not feasible — my laptop and desktop already run different releases of Ubuntu and I want to continue being able to do that.

It would be nice to be able to import photos from any machine but I can cope with it having to be done from the desktop alone. What isn’t acceptable is only being able to view them from the desktop as well. And when I say view I mean be able to search by tags and metadata, not just navigate a directory tree.

It sounds like a web application is needed, to enforce the single point of truth for tags and metadata. Are there actually any good ones that you can install yourself though? I’ve used Gallery before and was never really satisfied with ease of use or presentation.

Your-Photos-As-A-Service providers like Flickr and even to some extent Google+ and Facebook have quite nice interfaces, but I worry about spending many hours adding tags and metadata, not bothering to back it all up, and then one day the service shuts down or changes in ways I don’t like.

I’m normally quite good about backing things up but the key to backups is to make them easy and automatic. From what I can see these service providers either don’t provide a backup facility or else it’s quite inconvenient, e.g. click a bunch of times, get a zip file of everything. Ain’t nobody got time for that, as a great philosopher once wrote.

So.. yeah.. What do you do about it?

Removing dead tracks from your Banshee library

October 6th, 2013

I used MusicBrainz Picard to reorganize hundreds of the files in my music collection. For some reason Banshee spotted that new files appeared, but it didn’t remove the old ones from the library. My library had hundreds of entries relating to files that no longer existed on disk.

I would have hoped that Banshee would just skip when it got to a playlist entry that didn’t exist, but unfortunately (as of v2.4.1 in Ubuntu 12.04) it decides to stop playing at that point. I have to switch to it and double click on the next playlist entry, and even then it thinks it is playing the file that it tried to play before. So I have to double click again.

I have filed a bug about this.

In the meantime I wanted to remove all the dead entries from my library. I could delete my library and re-import it all, but I wanted to keep some of the metadata. I knew it was an SQLite database, so I poked around a bit and came up with a small script to clear out the dead rows. It seems to work, so maybe it will be useful for someone else.
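The general idea was something along these lines; this is a sketch rather than the exact snippet, and it assumes the stock database location and the usual file:// URIs in the CoreTracks table (quit Banshee and back the database up first):

#!/bin/sh
# Rough sketch: remove CoreTracks rows whose file:// URI no longer exists on disk.
DB="$HOME/.config/banshee-1/banshee.db"
LIST=$(mktemp)
cp "$DB" "$DB.bak"

sqlite3 "$DB" "SELECT TrackID, Uri FROM CoreTracks WHERE Uri LIKE 'file://%';" > "$LIST"

while IFS='|' read -r id uri; do
    # Turn the URI back into a filesystem path (strip file:// and undo %xx escapes).
    path=$(python -c 'import sys, urllib; print urllib.unquote(sys.argv[1])' "${uri#file://}")
    if [ ! -e "$path" ]; then
        echo "Removing dead track: $path"
        sqlite3 "$DB" "DELETE FROM CoreTracks WHERE TrackID = $id;"
    fi
done < "$LIST"

rm -f "$LIST"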

Wanted: cheap but cheerful small Linux device

September 3rd, 2013

I changed ISP recently for my broadband at home and switched from ADSL2+ to FTTC, so that’s required a new broadband router.

Initially I got things working with the Technicolor TG582N as supplied by the ISP, but it appears quite horrible in most of its functionality. I find most cheap domestic broadband routers are, to be honest. Little plastic blobs with the absolute minimum spec of hardware, configured via web interfaces that can politely be described as clunky, and packing many unwanted features.

With FTTC here in the UK you have a separate NTE box supplied by British Telecom and then you supply (or your ISP supplies) a router that connects to that by Ethernet and talks PPP-over-Ethernet to your ISP. So, anything that can do PPPoE works as the router, no special hardware required. Any Linux box will do.

I had this Soekris net4801 box that I purchased in 2005; I’ve been running it constantly ever since, and it still works fine. It’s a nice little thing: 266MHz fanless CPU, 128MiB RAM, three 10/100 Ethernet ports and CompactFlash for storage. It draws under 10W when idle and not a lot more at full tilt.

Really quite expensive though. After delivery charges, the purchase of a compatible PSU and CF card, and currency conversion, you’re probably talking £200 now, and I seem to recall it was similar back in 2005 too.

I upgraded that from Debian etch to lenny to squeeze to wheezy — which went remarkably without incident by the way, a testament to Debian’s excellent upgrade procedure — and set it to work as the router. Since it’s just a relatively conventional Debian install it’s really easy to configure PPPoE, IPv4, NAT, IPv6, firewalling and anything else.
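For anyone wondering what “easy” means in practice, the PPPoE part is roughly what pppoeconf would set up for you. Here’s a sketch, with a placeholder interface name and ISP username (the password goes in /etc/ppp/chap-secrets):

# /etc/network/interfaces (relevant bits; eth0 is whichever port faces the BT NTE)
auto eth0
iface eth0 inet manual

auto dsl-provider
iface dsl-provider inet ppp
        pre-up /sbin/ip link set eth0 up
        provider dsl-provider

# /etc/ppp/peers/dsl-provider
plugin rp-pppoe.so eth0
user "username@your-isp.example"
noipdefault
usepeerdns
defaultroute
persist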

There are a couple of things I’m not too happy about though.

What if it dies?

If a Soekris lasts you several years then it’s going to be pretty reliable. There are no moving parts, and the most likely faults are the CF card or the power supply. Even so, this one’s been in service for about 8 years and that’s a really good innings. It could go at any time and then what will I replace it with?

Of course I still have the Technicolor and that will work well enough to get connectivity until I put something better in its place again, but what would be that better thing?

Back in 2005 I had a bit more disposable income than I do now and £200 was okay to spend on something I was interested in playing with. I’m done playing with it now though and spending £200 to end up with a Linux box that runs at 266MHz and has 128M RAM is going to hurt. Also the net4801 is end of life so will get harder and harder to purchase new, and any replacement will cost a little more.

Is the Soekris really beefy enough?

Right now I only have 40M down, 10M up FTTC and the Soekris doesn’t appear to be limiting that any more than the Technicolor limited it.

Conceivably though I may one day upgrade it to 80/20 or more and that is starting to push the limits of a 100M Ethernet port, let alone a 266MHz CPU.

As you would expect from a 266MHz CPU with 128M RAM it’s dog slow at doing anything much in user land. This is a pretty minor gripe as the use case here is that of an appliance, like the broadband router it replaced. You shouldn’t really need to touch it much. Something slightly less puny would be a nice bonus though.

Options

HP Microserver

HP have been doing cash back deals on their Microserver range for a few years now. I already have one here at home being a file server and a few other bits and pieces. If they were still doing the cash back then I’d strongly consider buying another one to use for this.

It would draw a fair bit more power than the Soekris does, but they are still quite efficient machines and I would probably find it more things to do since it would be a lot more capable.

Without the cash back though I don’t think it can be justified. Retail price of a Microserver at the moment is around £265+VAT.

Update: It appears the cash back offer has returned, at least for September 2013!

http://www.serversplus.com/servers/tower_servers/hp_tower_servers/704941-421

Some Linksys WRT device with OpenWrt

It’s a contender, but it will leave me with some cheap nasty hardware running a non-standard Linux distribution on an ARM CPU. I’m sure OpenWrt is great but I don’t know it, I’d have to learn it just for this, and it’s not likely to be useful knowledge for anything else.

If possible I want to remain running Debian.

More enterprisey router hardware from Cisco or Mikrotik

This would certainly work; a Cisco off of ebay may be cheap enough, otherwise a new Mikrotik Routerboard would be within budget. Say an RB450G.

The main issue again would be it’s not Linux. That’s not necessarily a bad thing, it’s just that it wouldn’t feel familiar to me. I know how to configure everything in Linux.

Something from Fabiatech

I stumbled across a blog post by Richard Kettlewell entitled Linux In A Small box. In it he considers much the same issue as I have been, and ends up going for a Fabiatech FX5624.

Looks good. £289+VAT though.

omg!! Raspberry Pi everywareeeeeeeeeeeeeeeeeee!!!!!

Yeah, Raspberry Pis are nice pieces of kit for what they are designed for. Which is not passing large amounts of network traffic. They only have one 100M Ethernet, and it’s driven by USB 2.0 so it’s going to suck. It will suck even more when you attach a USB hub and more USB Ethernets.

Something from Jetway

Alex suggested looking at these devices. They look quite fun.

A bare-bones system that on paper should do the job (1.6GHz Intel Cedar Trail CPU, two Realtek gigabit Ethernet ports, one SO-DIMM slot for up to 4GB RAM) seems to be £149+VAT.

There seems to be a good selection of other main boards and daughter boards if that config wasn’t suitable.

Anyone got any personal experience of this hardware?

This Is Not An Exit

I still don’t know what I will do. I might put off the decision until the Soekris releases its magic blue smoke. I would be interested to hear any suggestions that I haven’t thought of.

Here are the requirements:

  • Capable of running a mainstream Linux distribution in a supportable fashion without much hacking around.
  • Has at least two gigabit Ethernet ports.
  • Is beefier than a 266MHz Geode CPU with 128M RAM.
  • Easy to run its storage from an inexpensive yet reasonably reliable medium like CompactFlash or SD/microSD. Write endurance doesn’t really matter. I will mount it read-only if necessary.

Some nice-to-haves:

  • At least one serial port so I can manage it from another computer when its network is down, without having to attach a VGA monitor and keyboard. The Soekris manages this perfectly, because it’s what it’s designed for. It doesn’t even have a VGA port.
  • Total configuration of the BIOS from the serial port, so a VGA monitor and keyboard are never necessary. Again, that’s how Soekris products work.
  • Ethernet chipsets that are actually any good, i.e. not Realtek or Broadcom.
  • Capable of being PXE booted so that I don’t have to put the storage into another machine to write the operating system onto it.