ThinkOnward - Accelerating Energy Industry Innovation

Completed

Optimize RAM Usage for Labeling 3D Data

$5,000

Launched 115 weeks ago

For this Bounty, we are looking for Python code that can scale scikit-image's measure.label function (skimage.measure.label) to process a large 3D array. measure.label links connected clusters of pixels and assigns an integer label to each cluster. An analogy would be finding and labeling all the raisins in a loaf of cinnamon-raisin bread without slicing up the loaf. See the Starter Notebook on the Data Tab for a worked example. However, skimage.measure.label becomes RAM intensive on large 3D arrays (tens to hundreds of GB), and the function can fail. To solve this, we need a solution that first splits the large 3D array into manageable chunks. The difficult part is that a chunk cannot separate any cluster. In our cinnamon-raisin bread example, imagine slicing the loaf without cutting through a single raisin. Once the chunks are defined, skimage.measure.label can be run on each one and the chunks can be reassembled to give the full view of labeled clusters.
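To make the labeling step concrete, here is a minimal sketch of what measure.label does on a toy volume. The array contents and the choice of connectivity=1 (voxels joined only through shared faces) are assumptions made for illustration; the Starter Notebook remains the reference example.

```python
import numpy as np
from skimage import measure

# A tiny binary "loaf": 1 marks raisin voxels, 0 marks bread (background).
volume = np.zeros((4, 4, 4), dtype=np.uint8)
volume[0, 0, 0:2] = 1      # first raisin: two voxels sharing a face
volume[2:4, 2:4, 2:4] = 1  # second raisin: a 2x2x2 cube

# connectivity=1 joins voxels that share a face (an assumption for this sketch).
# return_num=True also returns how many clusters were found.
labels, num_clusters = measure.label(volume, connectivity=1, return_num=True)

# `labels` has the same shape as `volume`; background stays 0 and every
# voxel of a given raisin carries the same positive integer.
print(num_clusters, labels.shape)
```

The output array is the same shape as the input, which is why memory use grows quickly on volumes in the tens to hundreds of GB.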

Your solution must:

Break the target 3D array into chunks. A chunk must not split a cluster; if it does, there must be a method to reassemble the cluster under a single label. A chunk must be no larger than 2 GB in size.
Run skimage.measure.label on each chunk so that each cluster gets a unique label.
Reassemble all the chunks into the original 3D array shape and order.
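The three steps above can be sketched as follows. This is one possible approach under stated assumptions, not a reference or winning solution: it takes the "reassemble under a single label" route by cutting the volume into slabs along axis 0, labeling each slab, and merging labels that touch across a cut with a union-find table. The names planes_per_chunk, label_in_slabs, find, and MAX_CHUNK_BYTES are hypothetical, "2 GB" is read as 2 GiB, and clusters are assumed to be face-connected (connectivity=1).

```python
import numpy as np
from skimage import measure

MAX_CHUNK_BYTES = 2 * 1024**3  # assumption: "2 GB" interpreted as 2 GiB


def planes_per_chunk(shape, itemsize, max_bytes=MAX_CHUNK_BYTES):
    """Largest number of axis-0 planes whose slab stays within max_bytes."""
    plane_bytes = int(np.prod(shape[1:])) * itemsize
    if plane_bytes > max_bytes:
        raise ValueError("one plane exceeds the limit; split another axis as well")
    return max(1, max_bytes // plane_bytes)


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def label_in_slabs(volume, planes):
    """Label `volume` slab by slab along axis 0, then merge labels across cuts."""
    depth = volume.shape[0]
    # In a real run `volume` and `out` would be disk-backed (e.g. np.memmap);
    # plain in-memory arrays keep this sketch short.
    out = np.zeros(volume.shape, dtype=np.int64)
    next_label = 0
    for start in range(0, depth, planes):
        stop = min(start + planes, depth)
        local, n = measure.label(volume[start:stop], connectivity=1, return_num=True)
        # Offset labels so they are unique across slabs; background stays 0.
        out[start:stop] = np.where(local > 0, local + next_label, 0)
        next_label += n

    # Merge labels that face each other across every cut plane.
    parent = np.arange(next_label + 1)
    for cut in range(planes, depth, planes):
        above, below = out[cut - 1], out[cut]
        touching = (above > 0) & (below > 0) & (volume[cut - 1] == volume[cut])
        if not touching.any():
            continue
        pairs = np.unique(np.stack([above[touching], below[touching]], axis=1), axis=0)
        for a, b in pairs:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    # Map every provisional label to a compact final label (0 stays background).
    roots = np.array([find(parent, i) for i in range(next_label + 1)])
    _, compact = np.unique(roots, return_inverse=True)
    for start in range(0, depth, planes):
        stop = min(start + planes, depth)
        out[start:stop] = compact[out[start:stop]]
    return out
```

A call might look like label_in_slabs(volume, planes_per_chunk(volume.shape, volume.dtype.itemsize)). Note that the int64 label slab is larger than a uint8 input slab, so a stricter budget would size chunks on the label dtype instead. Only the two planes on either side of each cut need to be compared during merging, which is what keeps peak memory near the size of one slab.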

Submissions

Submissions should be a Zip file containing a Python script.

Requirements for the Python script:

Can be run on a cloud virtual machine (e.g., AWS).
Script comments, docstrings, or markdown describing the steps involved.
Clear description of all parameters (username, dates, etc.).
Clear description of the setup and how to run the script.

This job will be closed after 5 successful submissions.

Prize

Five prizes will be distributed in three categories:

$2,000 - Best Performance
$1,500 - First to submit working code
$500 each - Honorable Mention (x3)

Evaluation

You will only be able to submit once per day per job. Every day at 12:00 UTC, a Xeek Judge will review all submissions for completeness. Submissions will be judged on an ml.c5.18xlarge SageMaker instance.

To evaluate submitted code, Judges will check whether the code meets the requirements (each chunk is at most 2 GB in size and no cluster is split across chunks) and will assess the cleanliness of the code (10% of score).
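The two requirement checks can be approximated locally before submitting. The sketch below is not the Judges' procedure; chunk_within_limit and same_clusters are hypothetical helpers, and "2 GB" is again assumed to mean 2 GiB. The cluster check compares a chunked result against a single-pass labeling, so it is only practical on a test volume small enough to label in one go.

```python
import numpy as np

MAX_CHUNK_BYTES = 2 * 1024**3  # assumption: "2 GB" interpreted as 2 GiB


def chunk_within_limit(chunk, max_bytes=MAX_CHUNK_BYTES):
    """True if the chunk's in-memory size does not exceed the limit."""
    return chunk.nbytes <= max_bytes


def same_clusters(reference, candidate):
    """True if both label arrays group voxels identically, ignoring label values."""
    if reference.shape != candidate.shape:
        return False
    if reference.size == 0:
        return True
    # Background must coincide exactly.
    if not np.array_equal(reference == 0, candidate == 0):
        return False
    # Each reference label must pair with exactly one candidate label and
    # vice versa; a split or merged cluster changes one of these counts.
    pairs = np.unique(
        np.stack([reference.ravel(), candidate.ravel()], axis=1), axis=0
    )
    return len(pairs) == len(np.unique(reference)) == len(np.unique(candidate))
```

Comparing partitions rather than raw label values matters because a chunked solution will usually number clusters in a different order than a single call to measure.label.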

At the end of the month, Xeek will reach out to successful participants for a prize payout over all jobs.

Prize	Time Stamp	Participant
First to Submit	05/26/2023 09:38	Leo Dinendra
Best Solution	05/29/2023 23:00	Daniel Cano
Honorable Mention	05/29/2023 13:38	Moto
Honorable Mention	06/12/2023 07:43	vecxoz