2011.04.01 Inaugural Meeting

Attending: Josh, Jason, Simon, Andy

Agenda

  1. Matters Arising (<10 mins)
  2. New Hires (5 mins)
  3. Hardware
    • "4 CPUs (up to ~3.3GHz each), 50 - 100GB storage, 8GB memory."
    • "Initially this would share the host machine's network card but we would look to order a separate 1GB card (in the coming days) which would be dedicated to the VM, cost is likely to be £100-200 (charged to SHIP). If you think this isn't going to meet the immediate resource needs then we will need to purchase a new host server."
    • "The VM will reside inside the university firewall but outside of ours, and not on our domain - thus it will have a 134.36.204.* IP and be accessible across the uni network, but not externally. If external access is needed then this would need to be authorised by ICS."
    • "We can provide a Windows 2008 R2 VM or we can host a Linux VM, either way you can have full admin rights. Thus if you have an existing template VM that we can host within HyperV - please make this available to us. If you don't then please indicate which OS is preferred - n.b. we can setup the win2k8 instance very quickly as we have templates."
  4. Access to Datasets (10 mins)
  5. Requirements for initial prototype (20 mins)
    • group-based v. individual based
    • requirements on auditing, who's reviewing how; per row
  6. Requirements for Wellcome visit demo (20 mins)
  7. Any other business (<5 mins)
    • Process & visibility of web resources

Notes

* Matters arising
    * Don't need privileges on plone (yet); have privileges on trac

* New Hires
    * Job ad is out on monster
    * Has it been posted to NoSQL? Not yet
        * http://www.lifesci.dundee.ac.uk/vacancies/2011/03/30/software-developer-2-posts
    * Simon also looking into known developer groups
    * Andy: http://careers.stackoverflow.com/
    * Simon official? As far as we know.

* Hardware
    * VM with full admin rights
    * Hosted by HIC (physical environment - similar to J. Monk's)
    * Can move the VM elsewhere
    * One genetics file is a Gig.
    * NIC to the cluster.
    * Should only be able to see 134.36.204.*
    * Andy: May need to pester Simon to get it configured.
    * Simon: patched & updated debian (with our keys), put up somewhere so Andy can access it
    * leading into datasets...
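
The note above that the VM "should only be able to see 134.36.204.*" can be checked in a few lines. This is a minimal sketch only: it assumes the wildcard means the /24 network 134.36.204.0/24, and the names `ALLOWED_NETWORK` and `is_visible_address` are hypothetical, not part of any agreed configuration.

```python
import ipaddress

# Assumption: "134.36.204.*" in the notes is read as this /24 network.
ALLOWED_NETWORK = ipaddress.ip_network("134.36.204.0/24")


def is_visible_address(address: str) -> bool:
    """Return True if the address falls inside the allowed range."""
    return ipaddress.ip_address(address) in ALLOWED_NETWORK
```

Something like this could back a quick sanity check once the VM is configured; the real restriction would still be enforced at the network level.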

* Datasets
    * Josh: what can't we do with the data on the VM?
    * Andy: "don't download data" / "data should stay on the VM"
        * Can pipe it, but ...
        * SOPs should protect us
        * Not a real issue
        * Perhaps a list of what people can and can't do
        * Simon: shell into box, but don't scp all the data
        * Andy: just behave and there are no problems
        * At the moment, we don't have any control anyway (what we're trying to fix)
    * Jason: everyone sign?
        * Andy: anyone who can access VM needs to sign.
        * Josh: use SSH keys. Simon in charge; only people who Andy has ok'd.

* Prototype
    * Andy: group-based at the moment (within GoDARTs)
        * smallest group can be one person
        * never heard of a project with individual permissions
        * for pilot, group is fine
    * Auditing system
        * Andy: never done this before
        * Just setting it up on the Oracle box
        * Auditing of every query which is run
            * login
            * select of one record, etc.
            * admins are tracked as well
        * Reviewing by default only done by HIC data coordinator (Alison & Andy)
            * Already have external audits by independent company
            * But they may do it through the HIC offices
            * researcher signs SOP, need to make sure data is used appropriately
            * i.e. visualize spot check
            * Level 1
                * human recalling time-series of what researcher did
            * Level 2 could be audited, but outside the scope for the moment
    * Simon: ok to instant message you to clarify points?
        * We spent about 30 minutes talking
        * Andy: just email us!
        * Previous projects had IRC open 24/7
        * Jason: Simon, do you want jabber?
            * No, it's ok.
            * Andy: NHS blocks lots.
            * Can join IRC when pinged.
    * Status
        * Josh:
            * Simon is committing the start of the researcher tools
            * Josh will start working on the auditing bits
        * Andy: got the data from HIC, but they've lost the scripts with linkages / aliases
            * ## aliases may be a problem
            * Having to reverse engineer their data
            * Luckily had a bit of code to do this.
            * Having to go in and fill in XML by hand (tedious)
            * Sample file with 100 rows per file, plus schema file
            * They will be popped in as they're done.
            * PLINK is completely command-line (a bit messy)
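
The auditing requirements noted above (every query recorded, including logins and single-record selects, with admins tracked as well, and a Level 1 review of the time series of what a researcher did) might be sketched as follows. This is an illustrative in-memory model only, not the Oracle auditing being set up; `AuditEvent`, `AuditLog` and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    user: str
    action: str          # e.g. "login" or "select"
    detail: str          # free-text description of what was touched
    is_admin: bool       # admins are tracked like everyone else
    timestamp: datetime


class AuditLog:
    """Append-only list of events, kept in the order they occurred."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, user: str, action: str, detail: str = "",
               is_admin: bool = False) -> AuditEvent:
        event = AuditEvent(user, action, detail, is_admin,
                           datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, user: str) -> List[AuditEvent]:
        """Time-ordered events for one user, for a Level 1 spot check."""
        return [e for e in self._events if e.user == user]
```

A reviewer would call `history("some_user")` to walk through what that person did; persistence and tamper-resistance are deliberately left out of the sketch.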

* Visit demo
    * Josh: any way to pin down what that needs to be
    * Andy: loading in / pulling data out probably won't show up there
    * Josh: could at least mock up an auditing web page
        * "Oh my god, Bob did X..."
    * Jason: thinking...
        * What would Andrew (who has an iPad) think of...
        * walking with visitors around HIC
        * he's able to get a web form, securely login to site Y,
        * and can see 2 other investigators,
        * click on one of them,
        * see series of datasets associated with them,
        * click on dataset,
        * see some listing of types of data (or something like that)
        * graphical representation of some of that data
        * would that make the point?
        * Andy:
            * going into provenance / governance of the data usage
            * nice way of going about it.
            * login, see a couple of users
            * see how much data that is in
            * queries over a couple of days, number of bytes they've moved
            * looking at metadata for web pages
            * pick couple of samples, subsets to graph X people with type 2
            * and these are the queries run against this variable
            * demoing the audit trail is fine
        * Jason
            * the problem is that it's AAA, important in safe haven
            * but we also want to show some science
            * Andy: number of summary statistics for whole database
                * that would be useful
            * Jason: few pages of a presentation like an admin interface
            * Safe haven is shown by administrator (Prof. Morris) monitoring what's done
            * could we also have someone write a query against the database to get some value
        * Andy: for the researchers,
            * running a query to get a subset to do analysis on
            * or pulling genetics data and running against PLINK
        * Simon
            * Core thread is governance and administration
            * but with some demonstration of how a scientist may use it
        * Andy
            * genetics file formats are regularly transformed
            * perhaps that would be an example
            * Jason: the point would be they don't do that anymore
    * Jason: Back to "The Day"
        * 2 users, click on user, see a dataset
        * 2 datasets and a fused dataset (incorporates calculation which occurred)
        * ...
        * once we get the data, then we'll be complete about what we're looking at
        * we have reasonable SNP analysis people (D.M.) for how to process that data
        * then we'll figure out if we can run that analysis and show the result
        * need to know what the analysis and the visualization look like.
        * how do we show that we did it, and that Andrew can review it.
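
For the audit-trail part of the demo, Andy's suggestion above (queries over a couple of days, number of bytes moved per user) could be mocked up with a small aggregation like the one below. It is a sketch under assumptions: the record layout `(user, day, bytes_moved)` and the function name `summarize` are hypothetical, and no real audit data format has been agreed.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (user, day the query ran, bytes returned).
QueryRecord = Tuple[str, date, int]


def summarize(records: Iterable[QueryRecord]) -> Dict[Tuple[str, date], Dict[str, int]]:
    """Count queries and total bytes moved for each (user, day) pair."""
    summary = defaultdict(lambda: {"queries": 0, "bytes": 0})
    for user, day, bytes_moved in records:
        entry = summary[(user, day)]
        entry["queries"] += 1
        entry["bytes"] += bytes_moved
    return dict(summary)
```

The resulting per-user, per-day totals are the kind of figures an admin-style page could list for the visitor walkthrough.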

* AOCB
    * Genetics
        * Jason: didn't understand operation
        * Andy: don't yet either
        * 18K patients (10K cases)
        * 3 chip array analyses (blood tests), 750K SNPs
        * metab. chip did 120K (different locations)
        * looking at overlap from chips for people who are in both, imputing the gap
        * one of the files is binary (reversing it led to increase in size 20M -> 1GB)

Action Items

* Simon: talk to Chris about which NoSQL boards to post job URL to
* Jason: post job URL to the OME list
* Josh: look into careers at stackoverflow for Dundee
* Simon: provide VM to Andy (Fusion)
* Andy: draw up a list of best practices for data on VM
* Simon, Josh, Jason: get signed forms.
* Andy: confirm that public notes are ok
* Next meeting in 2 weeks on 15th.