2014-06-24 Tuesday Team Meeting

Attending:
Dundee: Andrew, Simone, Kenny, Simon, Chris, Emil, Petr, Dominik, Jason, Graeme, June, Blazej, Balaji, Ola, Mark, Jean-Marie, Will, Roger
Dundee non-OME: Joe Ward, Chris Cole, Nick Schurch, ?, ?
Remote: Seb S, Kelli, Andreas, Curtis, Douglas, Josh, Melissa, Ian, Seb B, Yuri, Tom Walsh

Agenda - 2:30pm Start

  1. Simone Leo: Dundee Project (20-25 minutes plus 15 mins questions max) - 14:32 UK

    • Jason: what does “in production” mean? Who’s using it? Simone: 2 large studies on longevity and autoimmune diseases. The data owners are from CNR; they came to CRS4 because of its available resources. The automation system is mostly used by external contractors who turn to CRS4 to perform the analysis.
    • Chris Cole (UoD GJB): If I wanted to implement this, how portable is it to other places? Simone: Code for OMERO.biobank has been released for some time. You need to install OMERO from source after modifying the models, and install a Python package for biobank. (...) An example of what we actually run in production is SEAL, which includes a distributed version of BWA. OMERO/Galaxy integration is a work in progress. Depends on how you want things to work.
    • Nick Schurch (UoD GJB): driving everything via Hadoop/HDFS? Hadoop: yes; HDFS: not directly, since using GPFS (since it’s parallel), but all the computationally complicated tasks are driven via Hadoop. Removing Hadoop should be trivial if Galaxy is the driver. How do you keep track in OMERO of what Galaxy is doing? If there is a new tool in Galaxy, how does OMERO know how to work with that? Simone: which tool, which parameter, etc. are contained in the action objects, much of it stored as JSON. Nick: Galaxy wrapper has to be structured correctly, then? Yes. All the information to drive the tool and store the metadata back in OMERO (not the data) … defined in one registration file (for both OMERO & Galaxy).
    • Integrating Hadoop & OMERO (slidestack 2)
    • Jason: discussed for the summer the feature calculation or light-sheet calculation, based on the size of the data (doing the calculation in Sardinia).
    • Josh: HDFS integration with Bio-Formats would get us the same for OMERO.fs. Data layout is going to be an issue. Chris Allan: several groups are working on binary data in HDFS; don’t want to try to build our own. Simone: cf. Avro.
    • ???: What does the HDFS replication setting do to storage? n× as much (plus boundary wastage).
    • Chris Cole: Use case. N Affymetrix in 7 seconds not fast enough; why? That’s only a part of one table. Full table takes 1 day. If you want to try multiple different configurations, it’s too painful. Nick Schurch: could trade off precision in the probabilities that you’re storing.
    • Jason: how long are we talking about? Don’t want to make predictions
    • Roger: how smart is the split driver for HDFS? There is a more advanced use case in which they are in different directories. Will look into whether or not the driver has to be specified on opening.
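A rough illustration of the replication point raised above ("n× as much"): the sketch below estimates the raw storage and block count for a single file under a given replication factor. The function name, the 10 GiB file, the replication factor of 3 and the 128 MiB block size are all assumptions chosen for illustration, not figures from the meeting, and the "boundary wastage" mentioned in the discussion is not modelled.

```python
import math

def hdfs_footprint(file_bytes, replication=3, block_bytes=128 * 1024**2):
    """Estimate (block_count, replica_count, raw_bytes) for one file.

    Assumed defaults: replication factor 3 and 128 MiB blocks.
    Raw bytes are simply the logical size times the replication factor.
    """
    blocks = math.ceil(file_bytes / block_bytes) if file_bytes else 0
    return blocks, blocks * replication, file_bytes * replication

# Hypothetical 10 GiB file
blocks, replicas, raw = hdfs_footprint(10 * 1024**3)
print(blocks, replicas, raw / 1024**3)  # 80 240 30.0
```

With these assumed settings, a 10 GiB file is split into 80 blocks, stored as 240 block replicas, and occupies about 30 GiB of raw disk.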
  2. Accepting minutes from last meeting

  3. Project Status (15 minutes max)

    1. Mainline (5 minutes)

      • Java 8: Simon - quite nasty, even the investigation. J-M: need to warn the community. Jason: should decide what we’re doing. Chris: as with Java 7, we choose not to fix these at our peril.
      • Jason exits; stage left.
    2. Glencoe Update (5 minutes)

    3. Consortium Update (5 minutes) - tabled (most had to leave early)

  4. AOB (5 mins max - technical discussions should be highlighted to relevant people and rescheduled)

    • ...