Wheel Meeting Agenda - 2019-04-08
=================================

Present: [MPT], [FVP], [DAS], [NTU], [TPG], [BOB], [MLG]
...then [GOZ], [GEE], [THA]

Apologies: [SJH], [MSH], [MTL], [LE@], [*OX], [333], [AJT], [CFE], [ZAR], [DAA], [TRS], [RME], [CHB]

*Meeting opened 19:18*

## Next meeting

- Tue/Wed/Fri? Weekend? Time of day? Whenisgood?
  - [MLG] Next Thursday! (was kidding)
  - [BOB] Monday night isn't great.
  - 8 weeks since the last meeting.
  - Most things are done over mailing lists.
  - Not June 1st, or 15th.
  - [msh] mon seems difficult
- Next meeting: Saturday 22nd June 2019
  - Also, camp network night, 2 weeks before camp

## New wheel members, additions, nominations

- [MPT] says "We can all read!"
- [THA] Tom Hill Almeida
  - Not here yet. :'-(
- [DAS] Donald Sutherland
  - Welcome to wheel!
- Mentoring: pick a wheel member
  - Read /home/wheel/docs/WelcomeToWheel
  - Add to mailing lists.
  - Poke a wheel member to get started with uccpass, uccroot, figure out how things work...
  - Most important thing: Tell people what you're doing!
  - Core services: make sure you have a backup before you break stuff.
  - [NTU] Git - who is happy with it?
    - [BOB] Steep learning curve
      - gitlab makes it a lot easier
      - still a barrier to quick text editing of a file
    - [NTU] You can get a long way with quickly learned concepts
    - Something called `etckeeper` which lets you easily commit/track changes to stuff in /etc/
      - Some old machines like mooneye probably don't have daily auto-commits, always take the chance to commit manually with a useful comment/message
  - We need mentors - who here has someone they are confident to ask questions?
    - [DAS] points at [FVP] and [MPT]
    - [FVP] mentions [CFE]
    - [THA] has spoken to [AJT], but isn't here
    - [DAS] would be fine just asking stuff in Discord/IRC
      - [FVP] would too
    - [BOB] Make sure you have phone numbers of at least a couple of old wheel members
      - We all want to be jack-of-all-trades, but the reality is that we are all really only good at one thing
      - If UCC is broken, call the person who knows how to fix it
  - [MPT] Do we have startup procedures?
    - [BOB] Generally just turn things on, but you have to check things after
    - [NTU] Home directories on merlo - it was recently rebooted and that didn't work first time
      - Make sure you restart syslogd after mounting homes
    - [BOB] Even just running `mount -a` on everything
    - Also make sure the mailserver is working properly
      - Start postfix, mailman, ...
    - [BOB] Proxmox cluster needs at least 2 nodes to start

## General discussion

- [NTU] Encourage the recording of minutes directly here in `pwd`
  - Being done now!
  - [NTU] Applies to all minutes.
  - Collaborative editing?
    - Through a tmux/screen session
    - Self-hosted codimd, etherpad, etc.
      - [DAS] UniGames uses it: codimd/hackmd
    - [FVP] Google docs
      - [BOB] wants [FVP] to note that [FVP] is the worst
      - [FVP] notes that other clubs use google docs
      - [TPG] That's not a good argument!
      - [DAS] Unigames uses a google-doc-like thing, but it's a markdown editor
    - [BOB] All it takes is one bad committee handover to lose everything
      - [TPG] Bad handovers... not going to say any more.
      - [NTU] We have the best handovers, it's all on the wiki
      - [TPG] That's what oldguard/wheel is for
      - Wheel is an excuse to keep oldguard around...
      - [BOB] Wheel is "a round" [sic], you idiot!
  - [FVP] notes that more than 3 levels of indent is a good indication of getting off-topic
    - Also markdown probably breaks after about that many
    - Tempted to test it now
    - ... off-topic discussion continues...
    - [MPT] "If you need more than 3 levels of indentation, [...]"
    - [MLG] probably said something important, was missed due to off-topic discussions
  - [TPG] Putting minutes on the projector is a good idea, committee should do it
- [NTU] visibly reinduct new (and old?) members with the "Wheel Group Ethical Guidelines"
  - So, we have just had actual wheel members who did not trust that things we had written down that were guidelines, do what they were supposed to do
  - People still believe there's a "shadow committee called wheel" when committee-only has been doing just that
  - We should have a look at wheel guidelines,
    - Questions on different points
    - Questions on what happens when two points are in conflict?
    - LOLCATDOG-esque?
    - [DAS] Interview questions?
      - [TPG] Something like that - there isn't a "right answer"
    - Make sure people understand the norms, ethical guidelines
  - Wheel is a position of trust - there are a lot of things, nobody can know what you did
  - [BOB] wishes to acknowledge [TPG]'s handling of the addition-to-wheel-group-shenanigans
    - [NTU] summarises the series of events
    - tl;dr: someone was added to wheel, was undocumented, nobody was told, was against guidelines/regulations
  - *Wheel group ethical guidelines are displayed on-screen*
  - Privacy:
    - testem01 is still a thing
    - Many testing situations can be done with non-confidential data, looking at configs, logs, etc.
    - Some information is already public: size of /var/mail/user, some files in homedirs (e.g. ~/.forward)
  - Communication / honesty:
    - [TPG] Everyone f*cks up at some point, usually within a few months of being on wheel...
    - [BOB] When you do break something, it's important that you let people know
    - Get help if you don't know how to fix something
  - Education:
    - [BOB] Sharing of knowledge is perhaps the most important part.
    - [TPG] If you make a mistake, learn from it, but also let others learn from it
    - [BOB] In UCC, documenting stuff on the wiki is very important
      - Being the only person who can maintain something is, frankly, horrible.
      - [TPG] Dispense is [functional], it [cannot] break!
      - [BOB] Classic thing that broke and never got fixed is spamfilter
      - [NTU] Something on my mind - temperature monitoring
        - Every summer, we push the limits, something breaks...
    - [BOB] You can only learn if someone has taught.
    - [NTU] We always want more documentation, it's always in need of updating.
- [NTU] Consolidate nested git repositories
  - ...and non-nested ones
  - [NTU] Revision control is great! We've started to use it more...
    - loads of stuff is in git, but there are so many local directories with git repos...
    - [TPG] The firewall /etc/init.d script is in git...
    - [NTU] and /etc is
    - and half a dozen under /services/http
  - [BOB] gitlab is less obfuscated - I'd really like [for UCC] to get rid of git.ucc.asn.au and move it to gitlab.
    - [NTU] or Sourcehut? https://sr.ht/
  - [BOB] To clarify - the problem is that we have too many git repositories everywhere
    - [NTU] They're all fragmented, you can't see the history very easily
      - I want to move to a system where you can have a pull request or patch to a machine's config or to the website
      - You don't need "magic" permissions to a particular directory to do something
    - [BOB] I must admit - I've been setting up a service, and noted down all the times that I needed "wheel permissions" to do something - it's quite a lot.
      - Would be nice to decrease the barriers to that.
  - [NTU] We have switch configs in git, thanks to [rancid](http://cvs.ucc.asn.au/cgi-bin/viewvc.cgi/rancid/ucc/)
    - [BOB] It's also really informal - by email, ...
    - [TPG] Some people don't commit so we don't know who changed things
    - [NTU] Reverting is hard as well
  - [GEE] and [THA] arrive at 20:12
  - [NTU] What we can do: un-nest a bunch of git repos
    - Need to test it, make sure it works
    - [FVP] Submodules are a thing, but I don't know how they work
    - `find /dirname -type d -name ".git" -ls`
  - [BOB] Suppose we had a wiki page "how to make changes to our firewall"
    - Any security issues making that public?
    - [NTU] Some people worry, but it's not such a big secret.
    - [NTU] We're all in shodan.
    - [MPT] As long as we make sure passwords aren't being made public, that's the important thing.
    - [NTU] We have all the passwords collected in uccpass, so we can rotate them (once in a while)
  - [FVP] leaves 20:16
  - ACTION: push murasoi:/etc/.git to gitlab? (especially ucc-fw)
  - ACTION: push mooneye:/etc/.git to gitlab? (especially bind/DNS)
  - [FVP] Minutes and stuff should really be in git "properly" (as well as on the website)
    - [NTU] Annotations are encouraged
  - Call for volunteers to take care of firewall and DNS in git stuff - send to tech@ucc
- [NTU] Samba AD familiarisation/maintenance/updates/config-managed rebuilds
  - [NTU] It's still a bit Flaky (TM)
  - [FVP] installed sssd on a bunch of machines that were previously running winbind
    - It's got a lot of configuration options, but no good default configs
    - Have fun reading through the manpages to try to figure out how to make it work better
  - [NTU] Mooneye is currently using sssd, but is currently broken
    - /etc/passwd has 1300 or so users

          mooneye:~# wc -l /etc/passwd
          1369 /etc/passwd
          mooneye:~# getent passwd|wc -l
          1483
          mooneye:~# grep passwd /etc/nsswitch.conf
          #passwd: files winbind ldap
          passwd: files sss
          motsugo:~# getent passwd | wc -l
          1198

    - If AD was 100% reliable, it would be fine, but something says it probably isn't...
    - Monitoring would be very good.
    - [TPG] Maybe it's doing de-duplication when running `getent`?
    - [NTU] It definitely _has_ broken, it's unreliable when restarting machines, there's only 1 AD DC
      - [FVP] notes that the last attempt to add another DC resulted in the Great AD Breakage of Charity Vigil 2018
    - [NTU] We need to be able to detect when AD gets out of sync
      - `samba-tool drs showrepl`
    - [FVP] Look at v.ucc.asn.au (nested virtualisation is great!)
      - that resulted in https://wiki.ucc.asn.au/NewActiveDirectory
      - [BOB] Was the current AD wiki page updated?
      - [FVP] No, the new AD wiki page needs to be tweaked and backported to the current AD setup...
      - ACTION: [FVP] update the ActiveDirectory wiki page, or delete it.
  - recap on 2019-01-28.txt
  - [FVP] suggestion: get a proper Windows AD setup, at least for testing
    - It's actually used in the industry, something valuable from a "gaining useful experience" perspective
    - [NTU] We should hold onto the samba AD, but we could use it for the Windows machines
    - [FVP] Split-brain AD, even if intentional, is probably a bad idea...
    - [BOB] Can you get a quorum-type arrangement with AD?
      - (something about records and collisions and readonly mode and global catalogs, etc.)
      - ([FVP] thinks something about "FSMO roles" as well)
      - ([MPT] thinks something, and [THA] does too)
      - (many thoughts are floating around the room, it's hard to note down all of them...)
  - [BOB] In terms of what we can actually do: (yes, we have maaxen, it's Windows Server 2008, but probably time to upgrade it?)
    - Drop some money on upgrading it?
    - If someone has dreamspark, get it! Otherwise maybe drop some money on it.
    - Server 2019 Standard Version ~= $972 USD
    - techsoup.org has Windows Server Standard for USD$8 admin fee
      - [GOZ] notes that techsoup also has Google G-suite for free
      - You have to register with techsoup, as a charity, then you can get stuff.
      - ACTION: Sign us up to techsoup, put it in uccpass!
      - *[MPT] sneezes explosively at 20:52:17*
    - Australian "chapter" of techsoup is connectingup.org
      - connectingup has Windows Server Standard for $12 AUD+GST.
    - [GOZ] arrives 20:42
    - We got 50 license keys for Windows desktops, we should also be able to get 50 concurrent sessions on a terminal server license
    - [GOZ] leaves 20:43
    - [TPG] We could potentially upgrade to master Windows server, so we have a backup system in case samba dies
    - [FVP] Another suggestion was a barebones Windows server (maybe for terminal sessions)
    - [NTU] Plenty of spare hardware
      - [BOB] Oodles of spare CPU/RAM
    - [GOZ] arrives 20:45
    - [GOZ] I don't see why it _can't_ be a VM...
    - [NTU] Take a Proxmox VM, migrate it to HyperV, and back!
  - Volunteers needed!
    - ACTION: Schedule event: Day? Evening?
    - [NTU] In a meeting context we can't really do that much
    - [BOB] Anything we say tonight should be taken as permission for someone to _do_ it
- [NTU] netconsole ?
  - Sun Sep 28 10:22:26 WST 2014 motsugo OOM'ed: http://nick.ucc.asn.au/motsugo-oom.20140928-0200.png
    netconsoles would be nice
    ```
    09:44 < msh> evil has netconsole going iirc
    09:44 < msh> pretty simple, just a few cmdline params
    09:44 < Zanchey_> we have it on a couple of things
    09:44 < Zanchey_> getting it to play nice with vlans and bridges seemed hardish last time I tried
    09:45 < msh> can't you just hardcode a MAC?
    09:45 < Zanchey_> I think it's the source interface bit
    09:45 < Zanchey_> I think I got it to work but it took a few goes
    09:47 < Nick> I was wondering if we can reset through the SSH interface, or just the web one?
    09:48 < msh> only web, iirc
    09:48 < msh> though actually, ssh should be possible if you're root
    09:48 < msh> poke at some gpio pins perhaps :)
    09:49 < msh> strace the webserver as you perform a reboot
    09:49 < msh> is the admin user uid=0?
    10:30 < Nick> msh: "is the admin user uid=0?" Yes. strace sounds like an idea!
    Some interesting tools in /usr/local/bin
    ```
  - [BOB] Netconsole is a tool where if everything else has crashed, it exports logs to a machine
    - reliant on MAC addresses rather than IP (ie. it's reliable)
    - murasoi has a server running
    - set up individual machines and test
    - probably remnants of it on a number of machines
    - Good for catching things that can't write to syslog when the machine is crashing
      - [NTU] e.g. in the case of dead local disks
- [NTU] Demo: remote console access to the major servers?
  - See [Apache Guacamole](https://secure.ucc.asn.au/login) for OS-level remote access
    - (maxxen for JAVA apps? TODO: get that working elsewhere)
  - IPMI / out of band management - often requires extra licenses
  - [FVP] One thing that's been on my TODO list for some time is to check that DNS/IP/login/name is consistent for server management interfaces
  - [BOB] notes that the IPMI login details are different from the OS root password, but should be in the same file in uccpass
    - They're really buggy, insecure, exploits galore, don't use them for anything secure.
  - [BOB] Something concerning to note: [MSX] had a BIOS password installed on her server Ashera.
    - Nobody in the room knows
    - [NTU] Nobody who has signed the ethical guidelines should have done it.
    - ACTION: Check with [MSX], get logs from the motherboard, check the webcams, find out who did it.
  - ACTION: Login & test IPMI stuff.
    - Tell people, organise an event
- [NTU] Demo: restore files from offsite?
  - Skipped!
- [FVP] Discuss procedure of wheel 'reapplications'
  - Security, transparency, responsibility, etc
  - Regeneration & public register of SSH keys & fingerprints
  - Keep the wheel group listing up to date, https://wiki.ucc.asn.au/ChangeLog ...
    - [NTU] Disambiguation?
    - Keep the tech/wheel group mailing lists up to date
    - https://wiki.ucc.asn.au/ChangeLog where not recorded elsewhere?
    - Semi-automation of https://wiki.ucc.asn.au/ChangeLog ?
    - Group entries up to date in normal AD/dispense/handover procedures
    - vs move locked accounts to an "old wheel" group after some time? ([JCF] suggested the name "retyred")
  - [NTU] When someone isn't a member of the club, their account is locked
    - portal.ucc.asn.au is great!
    - but now more people can very easily join/rejoin who aren't physically present
  - [FVP] Delegation of trust: I can't possibly know all the members of wheel
    - If I trust one person, can I then trust everyone they trust?
    - UCC has a unique system [for a student club] whereby there can be generational gaps amongst active members.
    - Trust has to work both ways - new members have to trust the people old members trust, and old members have to trust the people the new members trust.
    - It's complicated!
  - [BOB] Show of hands - who would be at all in support of any sort of "wheel reapplication process"
    - roughly half of people paying attention raise hands
  - [GOZ] People should be able to send an email to committee to explain who they are and why they should be trusted
  - [NTU] Things have to work by [rough] consensus
  - [TPG] Because of lack of transparency (ie. things that aren't in public git)
    - Many things you need wheel access to do
    - [NTU] That's a bug to fix.
    - [MPT] Anything that can be other-readable that isn't private, should definitely be.
    - [TPG] Until we fix that, we can't "boot off" all the people who have access / know how it works.
  - [MLG] If you're on committee, but aren't on wheel, you can't kick people off wheel (directly)
  - [GOZ] Problem statement: establishing an "acceptably short" chain of trust between members and administrators
    - Proposed fix: instead of chains of trust amongst wheel members, trust directly from committee
    - If someone isn't maintaining contact to committee then they lose trust.
    - Something above simply establishing a communication requirement.
  - Non-binding show of hands: all who would endorse a system where wheel members are required to remain in contact with committee
    - all who are paying attention raise hands
- [NTU] Password/Key rotations
- [NTU] Possible projects for new wheel/sprocket/winadmin members?
  - Ticketing system to add these sort of things to?
    - OTRS?
    - debbugs?
  - Ensure vendserver cannot be hung by an unresponsive MIFARE reader
    - http://git.ucc.asn.au/?p=uccvend-vendserver.git
  - Reliable server startup after reboots and power outages;
  - Monitoring
    - [AJT] LibreNMS?
    - http://www.linux-ha.org/source-doc/assimilation/html/index.html ?
    - Revive temperature monitoring either ad-hoc ( https://www.ucc.asn.au/stats/ ) or as part of other monitoring system
    - Network
    - https://www.rfc-editor.org/info/rfc1288 on workstations?
    - other?
  - Semi-automated emergency shutdowns e.g. temperature/UPS events;
  - Lower power murasoi backup - sync firewall configs to an Edgerouter? with a 4G backup uplink?
- [msh] mussel's www.ucc functionality should be moved to a new vm. doesn't need to mount homedirs, easy to give all webmasters root access
- [BOB] Access permissions within the sql databases needs a cleanup. Lots of accounts that should be locked, or others with blanket permissions that shouldn't have them.
  - phpmyadmin is out of date, badly needs an update
    - phppgadmin probably is in a similar boat
  - Permissions for users in that system are too broad, needs to be updated with locked accounts
  - tl;dr: account locking is hard
  - ACTION: Find volunteers to update stuff!
- [BOB] VM firewalling is becoming an issue. Such quick churn of machines means that some VM's are open where they shouldn't be because old rules are still in place. Worth trying proxmox firewalling again? Alternatively need to improve our processes for VM deletion.
  - VMs are being set up everywhere, for testing, workshops, etc.
  - Firewall rules being configured, then not deleted
  - Just be aware that this is a possible issue.
  - [NTU] [JCF] [TEC] fail2ban/logging docs?
  - [FVP] Maybe just allow people to do their own firewalling through proxmox: finegrained access control, logged, etc.
    - ACTION: Test, update wiki, tell people, fix the script.
  - [BOB] This is additionally a proxmox permissions issue - when VM's are deleted, a user maintains permissions on that VM number, and then VM numbers are re-used.
  - Be aware! Your VM is being watched! (and not just by [BOB]!)
- [DAA] Molmol NFS performance

  Apologies from me for the evening. I have not made any progress on analysing the performance issues on Molmol. Certainly the local disk appears to be very fast. It would be worth trialling against a local NFS mount as well as benchmarking a raw TCP connection, before trying to get to the bottom of the NFS client. I would be happy to work with someone on this. Molmol urgently needs an OS upgrade regardless.

  - [NTU] Not on my own
  - [MPT] would be interested, wants to find time [to schedule that]
  - [FVP] would be too
  - [THA] bagz'd it
  - [BOB] Need a lot of notice / proper warning
  - Benchmark NFS before and after
- [FVP] More SSDs for ceph, and stuff
  - [BOB] will buy the disks

Meeting closed 21:38.