2011.08.01
Attending: Josh, Simon, Andy, Jason
* Josh: can we figure out what you'd want regardless of OMERO?
* Andy: slippery slope of different desires
  * SqlServer v. OMERO server
  * Andrew started out by wanting to query what's happening with datasets
  * Knock-on implications
  * Team is used to SQL
    * e.g. longit. to 5000 columns per person
    * 40-50 queries, aliasing, subsets, aggregates, one join in every query
  * Trying to ignore the data mgmt. and not change their part of work
  * Think carefully about what's possible, based on the scope
  * Technically feasible versus usable
  * They just use SQL to extract data from large tables into subsets
  * Ask them: "Anything in SQL select statement, temp table"
  * Don't want to learn new syntax.
  * Managing 1-year pilot: have to keep things separate.
  * Just go with OMERO model
  * Longer-term: risk if we don't have SQL interface, even governance people think it's great.
* Josh: they can create tables, etc? Definitely.
* Simon: different people
  * Preparing/cleaning in NHS SqlServer (out of scope)
  * HIC data analysts/admins (Chris, Al., Andy, 1 more and 2 trainees) --> Flat files (e.g. OMERO --> OMERO)
  * Project silo: researchers looking at their own data
  * Governance users: guardians at data source or patients, "who's using my data"
* Andy: Researchers don't tend to know SQL
* Andy:
  * Governance
    * Terms
    * Disclosure (done on aggregates)
    * Proportionate ...
  * All being made up as we go along
    * e.g. "privacy risk of joining 2 tables"?
    * talking about that in SHIP, but it's not defined.
    * no one's tried to automate it before (That's the ticket!)
  * Andrew seems to be focusing more on the audit trail
    * powerstation-like feed
    * Perhaps a layer to manage the various auditing sources.
* Big face-to-face
  * more about us sitting in a room rather than steering committee
  * external auditing about how the datasets are used
  * not currently much done at the project level.
* To decide (Josh)
  * schema of the auditing and example data
  * Andy: but that's also what we need to decide.
* Simon: how static is that info?
  * Think so: what datasets used by what project / researcher
* Andy: e.g.
  * Project silo 267
  * know which tables a researcher CAN access, but not that they have.
* definition of governance
  * Andy to put up links to best practice documents (ISDs current)
  * as if it's only ever been used on aggregates.
* wiki page with
  * various auditing information in SqlServer
  * the automated oversight pieces (PM service, etc.)
  * graffle of the audit data flow
  * best practices and SHIP drafts
  * definition of XML exchange format (chi mappings, security, etc.)
* Jason: what is "the audit info"?
  * Andy: simplest level (current): who can or cannot access what
  * Andrew was saying in steering meeting
    * any question along 3 dimensions
      * who or what has used any data from a dataset (rows per past month)
      * what has project 267 used
      * what has the subject level used (patient, give me an audit trail of researcher X)
  * audit info from SqlServer into OMERO?
    * One part, yes.
  * ...Josh went through the whole spiel...
* Jason: why are we trying to preserve what you guys are doing?
  * i.e. imagine where we are 18 months from now.
  * Are the DCs SqlServers or CSV files?
* Andy: ignoring NHS SqlServer (out-of-scope)
  * have more stable data (static for a month, etc.) on the uni-side
  * cleanest: SqlServer push to OMERO via a long-query
    * benefit is it's a quick solution.
  * SQL because data analysts use complicated queries
    * and there'd be a re-training and implementation problem.
    * perhaps in a year's time. (trying not to break everything at once)
  * Researcher might query
    * boxi and business objects, then out-of-scope of OMERO
    * (don't particularly like it though)
* Josh: also a concern for us since then there's something missing from the end product
* Jason: again, generic DC is producing data in what format?
  * Andy: flat files --> SqlServer --> flat files
  * XML --> Sql query
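
Possible starting point for the "schema of the auditing and example data" item: a minimal sketch of the three audit dimensions noted above, assuming a single flat access-log table. Nothing here was agreed in the meeting. The table name, column names, function names, and example rows are all hypothetical, and the third query is only one reading of "subject level" (a per-patient trail of who accessed their data). SQLite via Python is used purely so the sketch runs anywhere, not as a proposed store.

```python
import sqlite3

# Hypothetical schema: one row per data-access event.
# All names are illustrative assumptions, not an agreed design.
SCHEMA = """
CREATE TABLE audit_log (
    researcher  TEXT    NOT NULL,
    project_id  INTEGER NOT NULL,
    dataset     TEXT    NOT NULL,
    subject_id  TEXT    NOT NULL,
    accessed_on TEXT    NOT NULL,  -- ISO date, e.g. 2011-07-05
    row_count   INTEGER NOT NULL
);
"""

# Made-up example rows, for illustration only.
EXAMPLE_ROWS = [
    ("researcher_x", 267, "prescribing", "subj_001", "2011-07-05", 120),
    ("researcher_x", 267, "labs",        "subj_001", "2011-07-12", 30),
    ("researcher_y", 301, "prescribing", "subj_002", "2011-07-20", 75),
    ("researcher_y", 301, "prescribing", "subj_001", "2011-06-02", 10),
]


def build_db():
    """Create an in-memory database holding the example audit rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)", EXAMPLE_ROWS
    )
    return conn


def dataset_usage(conn, dataset, since):
    """Dimension 1: who used a dataset since a date, with total rows read."""
    return conn.execute(
        "SELECT researcher, project_id, SUM(row_count) FROM audit_log "
        "WHERE dataset = ? AND accessed_on >= ? "
        "GROUP BY researcher, project_id ORDER BY researcher",
        (dataset, since),
    ).fetchall()


def project_usage(conn, project_id):
    """Dimension 2: which datasets a project has actually used."""
    rows = conn.execute(
        "SELECT DISTINCT dataset FROM audit_log "
        "WHERE project_id = ? ORDER BY dataset",
        (project_id,),
    ).fetchall()
    return [r[0] for r in rows]


def subject_trail(conn, subject_id):
    """Dimension 3: dated trail of who accessed one subject's data."""
    return conn.execute(
        "SELECT accessed_on, researcher, dataset FROM audit_log "
        "WHERE subject_id = ? ORDER BY accessed_on",
        (subject_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(dataset_usage(conn, "prescribing", "2011-07-01"))
    print(project_usage(conn, 267))
    print(subject_trail(conn, "subj_001"))
```

Note that a log like this records what a researcher HAS accessed, which is the gap Andy described: today only what they CAN access is known.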