
2016-02-09 Tuesday Team Meeting

Dundee: Dominik, Mark C, Kenny, Gus, Simone, Simon, Petr, June, Helen, Balaji, Roger, Will, Ola, Chris A (14:36 UK), Josh B
Remote: Sebastien B, Ian, Kelli, Colin, Harald, Stick, Ilya, David, Melissa, Eleanor, Josh M, Wilma, Chris C, Josh B, Emil (14:33 UK), Rebecca, Jason

Agenda - 2:30pm Start

  1. Accepting minutes from last meeting

    • accepted
  2. Project Timelines (2-3 minutes each)

    1. Spaces - 14:30 UK

      1. Mainline (J-M)
        1. Web download: PR opened
          • Ola: testing tomorrow
        2. Polyline/Polygon (insight): not compatible with model, PR opened
        3. Units support in Web viewer: Will is looking at it.
        4. Ice 3.6: J-M to ask on the ZeroC forum if/when 3.6.2 will be released
        5. Blog post about Windows support. Draft under review
        6. Balaji: in good shape for the 5.1.8 release; e-mail discussion ongoing about documentation changes re. MicroManager
          • Josh: will need to make a decision ASAP
      2. Model (Sebastien)
        • Mark: now made graph operations Folder-aware and opened related design issue
        • Mark: will now be making further changes to Shape properties
        • David: current PR finally green so now moving onto nested Folder tags
        • Sebastien: measurement tool now working with folders, now client work moving onto nested folders, still on schedule for m1 and demo (Feb 26)
      3. Metadata (Josh)
        • currently looking into activating web in devspace
        • feature calculation bits later
        • Eleanor: finished one more dataset, now looking at the next
    2. Other releases/upgrades - 14:38 UK

      1. Figure
        • Josh: might the unit changes require changes to Figure?
          • Will: units are already supported reasonably well; Figure just displays units as saved rather than converting, so probably okay
      2. FLIMfit/OPT (Ian)
        • FLIMfit - "latest" builds working and tested.
        • Localisation - final touches to the updated UI and its presentation.
      3. ImageJ
      4. Learning
      5. Sysadmin
        • Kenny: ironed out some kickstart PXE bugs
        • Jason: thanks for help with stats.
    3. Glencoe Update (Chris) - 14:41 UK

      • getting things back into open source, including fixes to physical pixel sizes
        • will need an announcement of what may have been affected, and how badly (scale bars, etc.), since 5.1.0
        • images with micron pixel sizes should have been fine, but EM tends to be in nm
        • Sebastien: may be able to use configurations for test data to determine which readers may be affected
      • PRs coming re. issues with OMERO reader, CellProfiler, etc. on large data with IO via OMERO
      • finishing up populate metadata issues
  3. AOB (5 mins max - technical discussions should be highlighted to relevant people and rescheduled) - 14:46 UK

    • Ilya and Chris C introduction
      • OME for machine learning, extracting features from biological images
  4. "Distributed Feature Calculation with Pydoop" presentation by Simone - 14:49 UK

    • Ilya: given the performance issues that Bio-Formats readers have with non-streaming random access to image files on a distributed filesystem, could it be better to use OMERO as an image server for the cluster nodes running the map operation (feature extraction)?
      • or perhaps some non-Hadoop distributed computing framework would be better
      • Josh B: Hadoop is designed for text-based streaming; something like Spark may be a better fit
        • Simone: depends in part on available memory per node
    • Chris A: why did performance drop so much further from ideal when splitting by plane instead of by series?
      • Simone: many more map processes so correspondingly more overhead
      • Chris A: but why so much more overhead? For jobs taking many minutes, things like JVM startup should be negligible.
      • Simone: one issue is “stragglers” where some of the jobs take rather longer than others
      • Simone: issue is HDFS making seeks expensive
      • Chris A: is unconvinced that explains the difference, especially given the durations and the trend: IO is a small fraction of this overhead, so look to the framework for the culprit
        • Simone: suspects the issue is more one of using so many cores for so little data
        • Chris A: though the current datasets are mostly z=1, t=1 in size
        • Josh M: some do have plenty of planes though
    • Ilya: with many more cores and further fragmentation of the data, would the overhead be much worse again?
      • Josh: probably yes; guessing that much of the overhead is Bio-Formats setId() when the reader initializes
      • Chris A: is still struck by how quickly large HCS planes can be read in other scenarios, so is doubtful that IO is the problem
      • Ilya: better to gather more performance data points before drawing conclusions, especially with still more cores; concerned about the trend
        • Josh B: expects that more nodes will mean more overhead
        • Simone: also experiments were on a relatively small dataset given how many cores were applied
    • Simon: the problem parallelizes well by planes (Josh M: or tiles), so focus on how to get the planes out
      • Ilya: also matters how large a plane it makes sense to calculate features on
      • Simone: these are 384x384-pixel planes
      • Simon: pull planes from OMERO server?
        • Chris A: too many Bio-Formats instances, would need Java processes on nodes rather than on server
          • can initialize Bio-Formats locally
          • Mark: series is a property of Image
    • Josh M: what infrastructure are we building for testing WND-CHARM across the whole of IDR?
      • Simone: just presenting past Hadoop work; not sure which framework is best, but we should definitely use some framework rather than something manual
      • Chris A: consider a classical grid approach (rather more lightweight), a “poor man’s parallelization framework” with a scheduler and writing out planes as separate files (sketched after the agenda)
        • Ilya: now that the Hadoop-based solution is implemented, it’s worth trying it on a real problem
        • Simon: still some setup to do to achieve that
        • Ilya: all rides on having a good distributed filesystem
    • Chris A: suggests taking what has already been done, but moving to GPFS and local Bio-Formats on the nodes
    • Chris C: WND-CHARM currently reads one pixel plane per TIFF. Would it help for WND-CHARM to integrate more tightly with Bio-Formats for reading the pixel data?
      • Josh M: would be useful not to rely on reading TIFFs; could instead provide numpy arrays (sketched after the agenda)
      • Ilya: would be good if it effectively wraps a pointer to memory accessible by the local process
      • Chris A: existing Avro-based solution should suffice
        • Josh M: except don’t serialize to HDFS
        • Simone: currently going via socket, not file
          • Avro-serialized data is transferred via the Hadoop protocol
        • Chris A: can we omit HDFS from the Java-to-Python code and use the local filesystem? (sketched after the agenda)
        • Simone: absolutely
        • Chris A: so we already have the link from Bio-Formats to WND-CHARM
        • Simone: some caveats re. RGB, etc.
        • Chris A: can use channel splitter, etc. to regularize input of pixel data
        • Chris A: so for GPFS we just need code to get the file path, series, etc. from OMERO? (sketched after the agenda)
        • Simone: yes
        • Chris A: given the size of the problem, it's okay to make filesets or images the unit of work; there are still plenty more of them than available CPUs
          • Josh M: initialization of large filesets (plates) takes many seconds even with memo files, but that may still be negligible against WND-CHARM feature extraction
          • Ilya: for resumability we need finer granularity than the fileset, to know whether features have already been computed
          • Chris A: a distributed processing framework should bring us resumability
          • Josh M: probably not the biggest problem; even the “parallel” command-line utility offers useful logging
          • Chris A: could be useful to have a longer-running Java process per node, with the scheduler tending to hand a given node work from the same fileset
          • Simone: concerned about ending up reimplementing Hadoop
          • Chris A: if working at the fileset level but wanting decent resumability, there is already a fair bit of coding to be done
    • Simon: what tile sizes make sense for WND-CHARM? (sketched after the agenda)
      • Ilya: should compute multiple feature sets per image
        • a low-resolution 512x512 version, as well as 512x512 tiles of the full-size image
      • Chris C: some feature values depend on the size of the pixel plane, so variation in plane size can be an issue
        • 200x200 is analyzed quickly, 512x512 may be on the large side
        • Ilya: either makes sense; the question is more one of whether we do multiple overlapping computations at different sizes
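
Sketch for the Pydoop discussion: a minimal, illustrative per-plane feature-extraction mapper roughly following the Pydoop MapReduce API. This is not the code Simone presented; the JSON record layout and the read_plane()/extract_features() helpers are hypothetical stand-ins for Bio-Formats plane access and WND-CHARM feature extraction.

    import json

    import numpy as np
    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes


    def read_plane(path, series, z, c, t):
        # Hypothetical stand-in for Bio-Formats / OMERO plane access.
        return np.zeros((384, 384), dtype=np.uint16)


    def extract_features(plane):
        # Hypothetical stand-in for a WND-CHARM feature-extraction call.
        return {"mean": float(plane.mean()), "std": float(plane.std())}


    class FeatureMapper(api.Mapper):
        def map(self, context):
            # One input record per plane: path, series, z, c, t (JSON-encoded).
            rec = json.loads(context.value)
            plane = read_plane(rec["path"], rec["series"], rec["z"], rec["c"], rec["t"])
            context.emit(context.value, json.dumps(extract_features(plane)))


    def __main__():
        # Entry point expected by Pydoop's pipes runner (map-only job).
        pipes.run_task(pipes.Factory(FeatureMapper))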
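
Sketch for pulling planes from the OMERO server, or resolving original file paths and the series index so nodes with GPFS access can read the data locally. It assumes OMERO's Python BlitzGateway; host, credentials and the Image ID are placeholders, and the getSeries() accessor is assumed to be available as discussed above.

    import os

    from omero.gateway import BlitzGateway

    conn = BlitzGateway("user", "password", host="omero.example.org", port=4064)
    conn.connect()
    try:
        image = conn.getObject("Image", 12345)              # placeholder Image ID

        # Option 1: pull a single plane (z, c, t) over the OMERO API as a numpy array.
        plane = image.getPrimaryPixels().getPlane(0, 0, 0)

        # Option 2: resolve the fileset's original file paths (for local/GPFS access)
        # plus the Bio-Formats series index of this Image.
        fileset = image.getFileset()
        paths = [os.path.join(f.getPath(), f.getName()) for f in fileset.listFiles()]
        series = image.getSeries()    # assumes Image.series is exposed, as noted above
        print(paths, series, plane.shape)
    finally:
        conn.close()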
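
Sketch for the lighter-weight “poor man’s parallelization” idea with simple resumability: a plain process pool over per-series work units that skips any unit whose feature output already exists. The paths are invented and process_series() is a hypothetical placeholder for local Bio-Formats reading plus WND-CHARM feature extraction.

    import json
    import os
    from multiprocessing import Pool

    RESULTS_DIR = "features"                                # placeholder output directory


    def process_series(work):
        path, series = work
        out = os.path.join(RESULTS_DIR, "%s_s%03d.json" % (os.path.basename(path), series))
        if os.path.exists(out):
            return out                                      # resumability: already computed
        features = {"path": path, "series": series}         # hypothetical feature extraction
        with open(out, "w") as fo:
            json.dump(features, fo)
        return out


    if __name__ == "__main__":
        os.makedirs(RESULTS_DIR, exist_ok=True)
        work_units = [("/gpfs/idr/plateA.ome.tiff", s) for s in range(384)]  # placeholders
        with Pool() as pool:
            for done in pool.imap_unordered(process_series, work_units):
                print("done:", done)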
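
Sketch for handing WND-CHARM numpy arrays rather than one TIFF per plane, with a channel splitter to regularize RGB input. extract_features() again stands in for whatever WND-CHARM entry point would accept an in-memory plane; the real interface may differ.

    import numpy as np


    def split_channels(plane):
        """Yield single-channel 2D planes from a 2D plane or an interleaved RGB plane."""
        if plane.ndim == 2:
            yield plane
        else:
            for c in range(plane.shape[-1]):
                yield plane[..., c]


    def extract_features(plane2d):
        # Hypothetical stand-in for a WND-CHARM call taking a numpy array.
        return {"mean": float(plane2d.mean()), "std": float(plane2d.std())}


    rgb = np.random.randint(0, 256, (384, 384, 3)).astype(np.uint8)  # placeholder RGB plane
    per_channel = [extract_features(ch) for ch in split_channels(rgb)]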
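
Sketch for keeping the existing Avro-based Java-to-Python link while writing to the local filesystem instead of HDFS: the Python side reads Avro records from a local file. It assumes the fastavro package (the minutes do not name a specific Avro library) and an invented record layout with path, series and pixel-bytes fields.

    from fastavro import reader

    # Read plane records that a local Java Bio-Formats process has written to disk.
    with open("/tmp/planes.avro", "rb") as fo:        # placeholder path
        for record in reader(fo):
            # Assumed record layout: path, series, plus raw pixel bytes.
            print(record["path"], record["series"], len(record["pixels"]))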
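
Sketch for the tile-size discussion: cutting a full-size plane into fixed-size (e.g. 512x512) tiles alongside a low-resolution version, so features could be computed at both scales. The plane and the downsampling factor are placeholders.

    import numpy as np


    def tiles(plane, size=512):
        """Yield (y, x, tile) for non-overlapping size x size tiles of a 2D plane."""
        h, w = plane.shape
        for y in range(0, h, size):
            for x in range(0, w, size):
                yield y, x, plane[y:y + size, x:x + size]


    plane = np.zeros((2048, 2048), dtype=np.uint16)    # placeholder full-size plane
    low_res = plane[::4, ::4]                          # crude 512x512 downsample
    tile_list = list(tiles(plane, size=512))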

Done 17:15 UK
