
Optimize RAM Usage for Labeling 3D Data

Launched 59 weeks ago

For this Bounty, we are looking for Python code that scales scikit-image's measure.label function to process a large 3D array. skimage.measure.label is a useful function that links connected clusters of pixels and assigns an integer label to each cluster. An analogy would be finding and labeling all the raisins in a loaf of cinnamon-raisin bread without slicing up the loaf. See the Starter Notebook on the Data Tab for a worked example. However, skimage.measure.label becomes RAM-intensive on large 3D arrays (tens to hundreds of GB), and the function can fail. To solve this, we need a solution that first splits the large 3D array into manageable chunks. The difficult part is that a chunk cannot separate any cluster: in our cinnamon-raisin bread example, imagine slicing the loaf without cutting through a single raisin. Once the chunks are defined, skimage.measure.label can be run on each one, and the chunks can then be reassembled to give the full view of labeled clusters.
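As a refresher, here is a minimal sketch of what measure.label does on a tiny 3D array; the "loaf" and "raisin" names are illustrative only. By default, measure.label uses full connectivity in 3D, so face-, edge-, and corner-adjacent voxels join the same cluster.

```python
import numpy as np
from skimage import measure

# Tiny 3D "loaf": two separate raisins (clusters of True voxels)
loaf = np.zeros((4, 4, 4), dtype=bool)
loaf[0, 0, 0] = loaf[0, 0, 1] = True   # raisin 1: two touching voxels
loaf[3, 3, 3] = True                   # raisin 2: a single isolated voxel

labels = measure.label(loaf)           # one integer label per cluster
print(labels.max())                    # number of clusters found -> 2
```

The challenge in this Bounty is doing exactly this when the loaf is far too large to fit in RAM at once.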

Your solution must:

  • Break the target 3D array into chunks. A chunk must not split a cluster, or there must be a method to reassemble the split cluster under a single label. No chunk may be larger than 2 GB.

  • Run the skimage.measure.label function on each chunk so that each cluster gets a unique label.

  • Reassemble the chunks into the original 3D array shape and order.
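The three steps above can be sketched as follows. This is only a toy illustration, not a submission: the function name label_in_chunks and the target_depth parameter (standing in for the 2 GB limit) are hypothetical, it only cuts along axis 0, it assumes cluster-free cut planes exist, and it uses scipy.ndimage.label (6-connectivity) as a stand-in for skimage.measure.label with connectivity=1. Real data may require an overlap-and-merge strategy instead.

```python
import numpy as np
from scipy import ndimage

def label_in_chunks(volume, target_depth):
    """Sketch of the chunk -> label -> reassemble pipeline.

    Cuts the volume along axis 0 only at planes where no cluster
    crosses (6-connectivity), so no "raisin" is ever sliced. Each
    chunk is labeled independently, and labels are offset so they
    remain globally unique after reassembly.
    """
    fg = volume.astype(bool)
    # crossing[z] is True if foreground in slices z and z+1 touches,
    # i.e. a cluster would be split by cutting between them
    crossing = (fg[:-1] & fg[1:]).any(axis=(1, 2))
    cuts = [0]
    for z in range(1, fg.shape[0]):
        if z - cuts[-1] >= target_depth and not crossing[z - 1]:
            cuts.append(z)
    cuts.append(fg.shape[0])

    out = np.zeros(fg.shape, dtype=np.int64)
    offset = 0
    for a, b in zip(cuts[:-1], cuts[1:]):
        # skimage.measure.label(fg[a:b], connectivity=1) behaves the same
        chunk_labels, n = ndimage.label(fg[a:b])
        chunk_labels[chunk_labels > 0] += offset  # keep labels unique
        out[a:b] = chunk_labels
        offset += n
    return out

# Toy volume: one two-voxel cluster near the top, one voxel near the bottom
vol = np.zeros((6, 3, 3), dtype=bool)
vol[0, 0, 0] = vol[1, 0, 0] = True
vol[4, 2, 2] = True
result = label_in_chunks(vol, target_depth=3)
```

Note the greedy cut search: if no safe plane appears before the size limit, a chunk can exceed target_depth, which is one reason practical solutions fall back to overlapping chunks with label merging.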


Submissions should be a Zip file containing a Python script.

Requirements for the Python script:

  • Can be run on a cloud virtual machine (e.g., AWS).

  • Script comments, docstrings, or markdown describing the steps involved.

  • Clear description of all parameters (username, dates, etc.).

  • Clear description of the setup and how to run the script.

This job will be closed after 5 successful submissions.


Five prizes will be distributed in three categories:

  • $2,000 - Best Performance

  • $1,500 - First to submit working code

  • $500 - Honorable Mentions x 3


You will only be able to submit once per day per job. Every day at 12:00 UTC, a Xeek Judge will review all submissions for completeness. Submissions will be judged on an ml.c5.18xlarge SageMaker instance.

To evaluate submitted code, Judges will check whether the code meets the requirements (chunks are no larger than 2 GB and no clusters are split) and will assess the cleanliness of the code (10% of the score).

At the end of the month, Xeek will reach out to successful participants to pay out prizes across all jobs.


Results

First to Submit: Leo Dinendra (5/26/2023 9:38)

Best Solution: Daniel Cano (05/29/2023 23:00)

Honorable Mention: (05/29/2023 13:38)

Honorable Mention: (06/12/2023 07:43)