2011.08.01

Attending: Josh, Simon, Andy, Jason


* Josh: can we figure out what you'd want regardless of OMERO
    * Andy: slippery slope of different desires
    * SqlServer v. OMERO server
    * Andrew started out by wanting to query what's happening with datasets
    * Knock-on implications
    * Team is used to SQL
    * e.g. longitudinal data, up to 5000 columns per person
    * 40-50 queries, aliasing, subsets, aggregates, one join in every query
    * Trying to ignore the data mgmt. and not change their part of work
    * Think carefully about what's possible, based on the scope
    * Technically feasible versus usable
    * They just use SQL to extract data from large tables into subsets
    * Ask them: "Anything in a SQL SELECT statement, temp tables"
    * Don't want to learn new syntax.
    * Managing a 1-year pilot: have to keep things separate.
        * Just go with OMERO model
    * Longer-term: there's a risk if we don't have a SQL interface; even the governance people think it's great.
    * Josh: they can create tables, etc? Definitely.
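
As a rough illustration of the query style described above (aliasing, subsets, aggregates, a temp table, one join per query), here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for SqlServer. The table and column names (`patients`, `measurements`, `hba1c`) and the data are invented for the example and are not taken from the HIC schema.

```python
import sqlite3


def run_example_extract():
    """Build two toy tables and run an analyst-style extract query."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables; the real ones are far wider (thousands of columns).
    cur.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, sex TEXT)")
    cur.execute(
        "CREATE TABLE measurements (patient_id INTEGER, year INTEGER, hba1c REAL)"
    )
    cur.executemany(
        "INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")]
    )
    cur.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [(1, 2009, 7.0), (1, 2010, 8.0), (2, 2010, 6.0), (3, 2008, 9.0)],
    )
    # Subset a large table into a temp table.
    cur.execute(
        "CREATE TEMP TABLE recent AS "
        "SELECT patient_id, hba1c FROM measurements WHERE year >= 2009"
    )
    # One join, table aliases, and an aggregate per group.
    cur.execute(
        "SELECT p.sex AS sex, COUNT(*) AS n, AVG(r.hba1c) AS mean_hba1c "
        "FROM recent AS r JOIN patients AS p ON p.patient_id = r.patient_id "
        "GROUP BY p.sex ORDER BY p.sex"
    )
    rows = cur.fetchall()
    conn.close()
    return rows
```

Anything offered in place of SQL would need to cover at least this much (subset, alias, join, aggregate) without asking the analysts to learn new syntax.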

* Simon: different people
    * Preparing/cleaning in NHS SqlServer (out of scope)
    * HIC data analysts/admins (Chris, Al., Andy, 1 more and 2 trainees) --> Flat files (e.g. OMERO --> OMERO)
    * Project silo: researchers looking at their own data
    * Governance users: guardians at data source or patients, "who's using my data"

* Andy: Researchers don't tend to know SQL
    * andy:

* Governance
    * Terms
        * Disclosure (done on aggregates)
        * Proportionate ...
    * All being made up as we go along
        * e.g. "privacy risk of joining 2 tables"?
        * talking about that in SHIP, but it's not defined.
        * no one's tried to automate it before (That's the ticket!)
    * Andrew seems to be more focusing on the audit trail
        * power-station-like feed
    * Perhaps a layer to manage the various auditing sources.

* Big face-to-face
    * more about us sitting in a room rather than steering committee
    * external auditing about how the datasets are used
    * not currently much done at the project level.

* To decide (Josh)
    * schema of the auditing and example data
        * Andy: but that's also what we need to decide.
        * Simon: how static is that info?
        * Think so: what datasets used by what project / researcher
        * Andy: e.g.
            * Project silo 267
            * know which tables a researcher CAN access, but not which ones they actually have accessed.
    * definition of governance
        * Andy to put up links to best practice documents (ISDs current)
        * as if it's only ever been used on aggregates.
    * wiki page with
        * various auditing information in SqlServer
        * the automated oversight pieces (PM service, etc.)
        * graffle of the audit data flow
        * best practices and SHIP drafts
        * definition of XML exchange format (CHI mappings, security, etc.)
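
To make the "CAN access" versus "has accessed" distinction concrete, here is one possible shape for the auditing schema, sketched with Python's built-in `sqlite3`. It is only a starting point for the schema discussion: the table names (`access_grant`, `access_event`), their columns, and the example rows are all assumptions, not an agreed design.

```python
import sqlite3

# Hypothetical schema: one table for permissions, one for actual use.
SCHEMA = """
CREATE TABLE access_grant (researcher TEXT, project INTEGER, table_name TEXT);
CREATE TABLE access_event (researcher TEXT, project INTEGER, table_name TEXT,
                           rows_read INTEGER, accessed_on TEXT);
"""


def make_example_db():
    """Create an in-memory database with invented example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO access_grant VALUES (?, ?, ?)",
        [("researcher_x", 267, "prescribing"), ("researcher_x", 267, "admissions")],
    )
    conn.execute(
        "INSERT INTO access_event VALUES (?, ?, ?, ?, ?)",
        ("researcher_x", 267, "prescribing", 1200, "2011-07-15"),
    )
    return conn


def granted_but_unused(conn):
    """Return (researcher, table_name) pairs that were granted but never accessed."""
    cur = conn.execute(
        "SELECT g.researcher, g.table_name "
        "FROM access_grant AS g LEFT JOIN access_event AS e "
        "ON e.researcher = g.researcher AND e.project = g.project "
        "AND e.table_name = g.table_name "
        "WHERE e.researcher IS NULL "
        "ORDER BY g.researcher, g.table_name"
    )
    return cur.fetchall()
```

Today only the first table effectively exists; the second is the part that would have to be populated from the SqlServer audit sources.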

* Jason: what is "the audit info"?
    * Andy: simplest level (current) who can or cannot access what
    * Andrew was saying in steering meeting
        * any question along 3 dimensions
            * who or what has used any data from a dataset (rows per past month)
            * what has project 267 used
            * what has the subject level used (patient, give me an audit trail of researcher X)
    * audit info from SqlServer into OMERO?
        * One part, yes.
    * ...josh went through the whole spiel...
    * Jason: why are we trying to preserve what you guys are doing?
        * i.e. imagine where we are 18 months from now.
        * Are the DCs SqlServers or CSV files?
        * Andy: ignoring NHS SqlServer (out-of-scope)
            * have more stable data (static for a month, etc.) on the uni-side
            * cleanest: SqlServer push to OMERO via a long-query
            * benefit is it's a quick solution.
            * SQL because data analysts use complicated queries
            * and there'd be a re-training and implementation problem.
            * perhaps in a year's time. (trying not to break everything at once)
            * Researcher might query 
                * BOXI and Business Objects, which would then be out of scope for OMERO
                * (don't particularly like it though)
                * Josh: also a concern for us since then there's something missing from the end product
        * Jason: again, generic DC is producing data in what format?
            * Andy: flat files --> SqlServer --> flat files
            * XML --> Sql query
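
The three audit dimensions raised in the steering meeting (by dataset, by project, by subject) could be answered from a single log of access events. A minimal sketch in plain Python follows. The record fields, the example events, and project 301 are invented, and the subject-level query is only one reading of that dimension (a patient asking which researchers touched their data).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One row of data read by a researcher (hypothetical record layout)."""
    researcher: str
    project: int
    dataset: str
    subject_id: str
    month: str  # "YYYY-MM"


def rows_per_month(events, dataset):
    """Dimension 1: who used a dataset, as row counts per (researcher, month)."""
    counts = defaultdict(int)
    for e in events:
        if e.dataset == dataset:
            counts[(e.researcher, e.month)] += 1
    return dict(counts)


def datasets_used_by_project(events, project):
    """Dimension 2: which datasets a project has used."""
    return sorted({e.dataset for e in events if e.project == project})


def researchers_for_subject(events, subject_id):
    """Dimension 3: which researchers have read a given subject's data."""
    return sorted({e.researcher for e in events if e.subject_id == subject_id})


# Invented example events, for illustration only.
EXAMPLE = [
    AccessEvent("researcher_x", 267, "prescribing", "p1", "2011-07"),
    AccessEvent("researcher_x", 267, "prescribing", "p2", "2011-07"),
    AccessEvent("researcher_y", 301, "admissions", "p1", "2011-06"),
]
```

For example, `datasets_used_by_project(EXAMPLE, 267)` answers "what has project 267 used", and `researchers_for_subject(EXAMPLE, "p1")` gives the subject-level trail.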