2011.05.27
Attending: Andy, Jason, Josh, Simon
Agenda
-
Action points from last time
- CVs to Andy
- Status from Andy on joins, aliasing, specific typing, string lengths
- Josh: description of service (See listing 1)
Notes
HIC: Jason, Simon, Andy, Josh
Admin
- Getting Andy access
-
Jason: emails for interviews, etc?
- Andy is ok with all times (Monday)
- Jason will forward to Josh
Data info
- Andy: haven't had time
FS/service (Josh)
- Architecting a "Haven" as a subclass of Repository
- Andy: shouldn't use the word Haven
-
Other good words:
- Semantic Type
- Agents
-
Andy: could certainly see new data coming in.
- GoDARTs just added aliasing
- Could envision one master copy coming in (with anonymisation)
- and then splitting it for the individual projects
Querying
- Not much done yet.
Mid-July is deadline to show something
- Tyson (?Kison?) should be someone we can work with. (and Colin...)
- SNP imputing against version XYZ
- Tools are command-line based.
- Not HDF5.
-
Scenario:
- Load full CSV
- Researcher goes to the command line
-
Runs a query with joins
-
Biochemistry: give me the patients who have type 2 with BMI in this range
- All columns? Probably
- Select the rows with a BMI over this.
- Longitudinal all rows / subset
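The scenario above (load a CSV, then run a join query from the command line) could be sketched roughly as follows. This is only an illustration using Python's standard `csv` and `sqlite3` modules, not the project's own service; the table names, column names and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
PATIENTS_CSV = """patient_id,diabetes_type
1,2
2,1
3,2
4,2
"""

BIOCHEM_CSV = """patient_id,visit,bmi
1,2010-01-05,31.2
1,2011-01-07,29.8
2,2010-03-11,33.0
3,2010-06-20,24.1
4,2011-02-14,35.5
"""


def load_csv(conn, table, text):
    """Create `table` from CSV text; every value is stored as text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(header)
    marks = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE %s (%s)" % (table, cols))
    conn.executemany("INSERT INTO %s VALUES (%s)" % (table, marks), data)


def type2_in_bmi_range(conn, low, high):
    """Return (patient_id, visit, bmi) for type 2 patients with low <= BMI <= high."""
    query = """
        SELECT p.patient_id, b.visit, CAST(b.bmi AS REAL)
        FROM patients p
        JOIN biochemistry b ON b.patient_id = p.patient_id
        WHERE p.diabetes_type = '2'
          AND CAST(b.bmi AS REAL) BETWEEN ? AND ?
        ORDER BY p.patient_id, b.visit
    """
    return conn.execute(query, (low, high)).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, "patients", PATIENTS_CSV)
load_csv(conn, "biochemistry", BIOCHEM_CSV)
rows = type2_in_bmi_range(conn, 29.0, 36.0)
for row in rows:
    print(row)
```

Because the join returns one row per visit, the "longitudinal all rows / subset" question becomes a choice between keeping every matching visit, as here, or reducing to one row per patient.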
-
PLink computer or whatever
- Andy needs to understand more
- Takes one file of phenotypical data in a particular file format
- Need to sit down.
- See results as txt file
-
Installing / Testing new files
- Andy: One set of data. Load it into DB
- Josh: But we're developing the DB (and SQL ...)
- Jason: how useful is it that there is a file system?
-
Andy: Simon should come to visit to see the work.
- Up to 100 student projects a year
- for GoDARTs it's reasonably stable
- but could understand the "Your data is ready"
Plan
-
For next week, @@having scaffolding of the CLI in place
- then Simon has a visit with Andy to compare the workflows
-
Andy: @@data flow diagram (from Simon)
- Where does the stats package sit?
- Then get auditing working
- Then a website with simple queries
-
No meeting currently planned
- Jason: Next 2 months ain't good
- Andy: keen to get the workflow (been asking the same questions)
- @@Andy will set up a visit, inviting Simon
Listing 1
class TestHavens(lib.ITest):

    def testSimple(self):
        if False:

            ##### Possible service styles ####
            grid = self.client.sf.sharedResources()

            # Pure repo solution. Need to under
            repoMap = grid.repositories()  # Returns "Repository" mimetype
            repoMap = grid.repositoriesOfType("Haven")  # Mimetype
            hic = "..."  # Get the appropriate on somehow
            hic.list("/")
            hic.mkdir("/Patients")
            hic.attachAgent("/Patients", "HavenType", {"schema":original_file_id})
            client.upload("/Patients", "test.csv")

            # Shared resource solution
            grid.deleteHaven("test")
            grid.createHaven("test")
            grid.findHaven("test")
            haven_prx = grid.getRepositoryForHaven(haven_obj)

            # Share solution
            haven_obj = iSharePrx.creatShare("...", haven = True)

            # Object solution
            haven_prx = grid.findHaven("test")
            haven_prx = grid.openHaven(HavenI(1, False))

            # Or is the repo a group, i.e. a shared pot?!?!?!

            # Negative tests
            assertRaises(createHaven, "no.periods")
            assertRaises(createHaven, "no spaces")
            assertRaises(createHaven, "no_punctunation")
            assertRaises(createHaven, "no-punctunation")

            # Result of the above should be a single haven prx
            haven_prx = Haven()

            # Admin
            original_file = client.upload("test.xml")
            haven_prx.addType("test", original_file)
            original_file = client.upload("test.csv")
            haven_prx.addData("test", original_file)

            # Audit settings
            haven_prx.setRecordRowAccess(True)

            # Audit information
            logs = haven_prx.getEventLogs()  # Return own ITime
            for log in logs:
                print log.action,
                print log.entityType,  # "<TYPE>.<columnname>
                print log.entityId,  # row, -1 for no record row access

            # User
            names = haven_prx.getTypes()
            for name in names:
                print name
                cols = haven_prx.getColumns(name)
                for col in cols:
                    print col.name,
                    print col.description,
                    print col.originalName,
                    print col.originalPosition
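To make the naming rules and audit behaviour in Listing 1 concrete, here is a minimal in-memory sketch. It is not the service itself: `MockHaven`, `validate_haven_name` and `readCell` are hypothetical names. The alphanumeric-only rule is an assumption read off the negative tests, and the `-1` entity id follows the "row, -1 for no record row access" comment.

```python
import re
import time
from collections import namedtuple

# Assumption: haven names may contain only ASCII letters and digits, which is
# one reading of the negative tests (no periods, spaces, underscores, hyphens).
_NAME_RE = re.compile(r"[A-Za-z0-9]+\Z")

LogEntry = namedtuple("LogEntry", "time action entityType entityId")


def validate_haven_name(name):
    """Return the name unchanged, or raise ValueError if it is not alphanumeric."""
    if not _NAME_RE.match(name):
        raise ValueError("invalid haven name: %r" % (name,))
    return name


class MockHaven(object):
    """Hypothetical in-memory stand-in for the haven proxy in Listing 1."""

    def __init__(self, name):
        self.name = validate_haven_name(name)
        self._record_row_access = False
        self._logs = []

    def setRecordRowAccess(self, enabled):
        self._record_row_access = bool(enabled)

    def readCell(self, type_name, column, row):
        """Hypothetical accessor: log a read of one cell.

        The row number is only kept when row access recording is on;
        otherwise the entry stores -1.
        """
        entity_id = row if self._record_row_access else -1
        entity_type = "%s.%s" % (type_name, column)  # "<TYPE>.<columnname>"
        self._logs.append(LogEntry(time.time(), "READ", entity_type, entity_id))

    def getEventLogs(self):
        """Return a copy of the audit log entries."""
        return list(self._logs)


haven = MockHaven("test")
haven.readCell("Patients", "bmi", 7)   # row access not recorded yet
haven.setRecordRowAccess(True)
haven.readCell("Patients", "bmi", 7)   # row number is now kept
for log in haven.getEventLogs():
    print(log.action, log.entityType, log.entityId)
```

Keeping the row id optional means the audit log has the same shape whether or not row-level recording is switched on, which may matter when deciding how fine-grained auditing needs to be.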