Patch the Planet: Restore Missing Data
Total prize pool: $50,000
This is part 1 of the four-part Encoded Reality Series of challenges, which take different approaches to analyzing and building models for geophysical data. Your goal in the Patch the Planet challenge is to build an algorithm that accurately fills in a missing volume of seismic data, given the context around it (Figure 1). We are looking for a clever machine learning or deep learning model that can fill in these missing volumes.
Figure 1: Three 3D oblique views of the challenge data for Patching the Planet. Left: the entire 3D volume. Middle: the volume with the Target section missing. Right: the Target data.
Challenge Structure
Patching the Planet is more difficult than the other Encoding Reality challenges because a winning solution must fill in missing data in the seismic image in three dimensions. There are two levels of difficulty, depending on whether you use the synthetic or the real seismic dataset we provide (see Data Tab). Note that the Predictive Leaderboard scoring and the final evaluation will focus on the synthetic data; there will also be two honorable mentions for the models in the final evaluation that perform best on the real data.
One way to think about this problem is as predicting the next frame of a movie:
The z-axis is the height of the movie screen.
The x-axis is the width of the movie screen.
The y-axis is the time in the movie.
To jumpstart your thinking on this problem, check out this Keras example that uses a convolutional LSTM (ConvLSTM) to predict the next frame of a video (https://keras.io/examples/vision/conv_lstm/).
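To make the movie analogy concrete, here is a minimal NumPy sketch that reorders a volume so the y-axis indexes frames and each frame is a z-by-x image (height by width), then builds sliding-window (context frames, next frame) pairs of the kind a next-frame model trains on. It assumes the seismic arrays (delivered with shape (300, 300, 1259), per the Data section) are stored in (x, y, z) order, which the challenge text does not state; check the starter notebook and adjust the transpose if the axes differ. Both function names are hypothetical.

```python
import numpy as np


def volume_to_frames(volume: np.ndarray) -> np.ndarray:
    """Reorder an (x, y, z) volume into frames of shape (y, z, x).

    In the movie analogy y is time, so frame j is volume[:, j, :]
    transposed into a (z, x) image: screen height by screen width.
    The (x, y, z) input axis order is an assumption.
    """
    return np.transpose(volume, (1, 2, 0))


def make_next_frame_pairs(frames: np.ndarray, context: int):
    """Build (context frames, next frame) training pairs.

    inputs[i] holds frames[i : i + context] and targets[i] is
    frames[i + context], the frame the model should predict.
    """
    n_frames = frames.shape[0]
    if not 0 < context < n_frames:
        raise ValueError("context must be between 1 and n_frames - 1")
    inputs = np.stack([frames[i:i + context] for i in range(n_frames - context)])
    targets = frames[context:]
    return inputs, targets


# Usage on a tiny stand-in volume of shape (x=4, y=6, z=5).
volume = np.arange(4 * 6 * 5, dtype=float).reshape(4, 6, 5)
frames = volume_to_frames(volume)            # shape (6, 5, 4)
inputs, targets = make_next_frame_pairs(frames, context=3)
print(inputs.shape, targets.shape)           # (3, 3, 5, 4) (3, 5, 4)
```

Keras `ConvLSTM2D` layers with channels-last data expect 5D input of shape (batch, time, rows, cols, channels), so you would typically add a trailing channel axis with `inputs[..., np.newaxis]`. Stacking windows copies data; for full-size volumes, `numpy.lib.stride_tricks.sliding_window_view` can produce read-only views instead.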
Data
Participants will be provided with 500 synthetic seismic datasets and 50 real seismic datasets. Both real and synthetic data will be delivered as NumPy arrays with a shape of (300, 300, 1259). The starter notebook also includes training-data generation code that lets you experiment with missing data volumes of different sizes: you can increase the percentage of each seismic volume that is removed to make the task harder. The default missing section is 25%. Challengers are free to use any combination of the data they choose. The test and holdout data used for scoring will have a missing section equal to 25% of the entire seismic volume. You are encouraged to reuse ideas and code from other challenges in the Encoding Reality Series.
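The starter notebook's generation code is not reproduced here, but the idea can be sketched in NumPy: remove a random contiguous slab spanning `fraction` of one axis (which therefore covers the same fraction of the whole volume), and keep both a boolean mask and the removed target for training. The slab shape, the default axis, and filling the gap with zeros are assumptions for illustration; `mask_slab` is a hypothetical helper, not the notebook's API.

```python
import numpy as np


def mask_slab(volume, fraction=0.25, axis=1, rng=None):
    """Remove a random contiguous slab covering `fraction` of `axis`.

    Returns (masked, mask, target): `masked` is a copy of the volume
    with the slab set to zero (an assumed fill value), `mask` is True
    on the missing voxels, and `target` holds the removed data.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = volume.shape[axis]
    width = int(round(fraction * n))
    if not 0 < width < n:
        raise ValueError("fraction must remove at least one slice and leave context")

    # Random start so the slab lies fully inside the volume.
    start = int(rng.integers(0, n - width + 1))
    index = [slice(None)] * volume.ndim
    index[axis] = slice(start, start + width)
    index = tuple(index)

    mask = np.zeros(volume.shape, dtype=bool)
    mask[index] = True
    masked = volume.copy()
    masked[index] = 0
    target = volume[index].copy()
    return masked, mask, target


# Usage on a small stand-in volume; raise `fraction` for a harder task.
volume = np.ones((8, 12, 10))
masked, mask, target = mask_slab(volume, fraction=0.25, rng=np.random.default_rng(0))
print(mask.mean(), target.shape)  # 0.25 (8, 3, 10)
```

For a (300, 300, 1259) volume with the defaults, `round(0.25 * 300)` gives a 75-slice slab. Returning the mask alongside the masked array lets a model distinguish missing voxels from genuine zero amplitudes.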
Evaluation
To evaluate your solution, each challenger will provide a submission file containing six 2D arrays taken from each 3D seismic volume in the test dataset. Instructions and code for generating the submission file are provided at the bottom of the starter notebook. For this challenge, the leaderboard evaluation will use the scikit-image implementation of the Structural Similarity Index (SSI), a metric that measures the similarity between two images. An SSI of 1 means the images are identical; values near 0 indicate little structural similarity, and the index can also be negative. Please refer to the `scikit-image` docs for more information about the metric, as well as examples of its use. Similarity will be calculated for every prediction. The minimum and maximum SSI values will be dropped, and the mean of the remaining SSI scores will be the final score.
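A minimal sketch of the scoring rule described above, assuming each prediction and target is a 2D NumPy array: compute SSI with `skimage.metrics.structural_similarity`, drop one minimum and one maximum value, and average the rest. The `data_range` argument is an assumption (recent scikit-image versions require it to be set explicitly for floating-point images); the exact settings used for the leaderboard are defined by the challenge's own code, and both function names are hypothetical.

```python
import numpy as np


def trimmed_mean(scores):
    """Drop one minimum and one maximum value, then average the rest."""
    ordered = np.sort(np.asarray(scores, dtype=float))
    if ordered.size < 3:
        raise ValueError("need at least three scores to drop the min and max")
    return float(ordered[1:-1].mean())


def leaderboard_score(predictions, targets, data_range):
    """Trimmed-mean SSI over paired 2D predictions and targets.

    `data_range` (max minus min of the possible values) is an assumed
    setting, not the official scoring configuration.
    """
    # Imported here so trimmed_mean works without scikit-image installed.
    from skimage.metrics import structural_similarity

    scores = [
        structural_similarity(target, prediction, data_range=data_range)
        for prediction, target in zip(predictions, targets)
    ]
    return trimmed_mean(scores)


# Usage: the min (0.0) and max (1.0) are dropped, leaving (0.25 + 0.75) / 2.
print(trimmed_mean([1.0, 0.0, 0.25, 0.75]))  # 0.5
```

Dropping only one minimum and one maximum (rather than a percentage) matches the rule as stated; it limits the effect of a single outlier slice on the final score.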
For the Final Evaluation, the top submissions on the Predictive Leaderboard will be invited to send Onward Challenges their fully reproducible code for review by a panel of judges. The judges will run each submitted algorithm on an AWS SageMaker g5.12xlarge instance, and it must finish within 24 hours. The Structural Similarity Index used for the Predictive Leaderboard will determine 95% of the final score. The remaining 5% will assess the interpretability of the submitted Jupyter Notebook. The interpretability criterion focuses on the degree of documentation (i.e., docstrings and markdown), clearly named variables, and reasonable adherence to standard Python style guidelines.
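The 95/5 weighting can be written out as a one-line formula. This sketch assumes both components are expressed on a 0 to 1 scale, which the challenge text does not specify; `final_score` is a hypothetical name, not part of any official scoring code.

```python
def final_score(ssi: float, interpretability: float) -> float:
    """Combine leaderboard SSI (95%) and interpretability (5%).

    Both inputs are assumed to lie on a 0-1 scale; how the judges
    actually grade interpretability is not specified.
    """
    if not 0.0 <= interpretability <= 1.0:
        raise ValueError("interpretability is assumed to lie in [0, 1]")
    return 0.95 * ssi + 0.05 * interpretability


# Usage: an SSI of 0.80 with a perfect interpretability grade.
print(round(final_score(0.80, 1.0), 4))  # 0.81
```

Under this weighting, the interpretability component can change the final score by at most 0.05, so it can offset an SSI gap of only about 0.05 / 0.95 ≈ 0.053.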
A successful final submission must contain the following:
A Jupyter Notebook with a clearly written pipeline.
A requirements.txt file that gives instructions on how to run training/inference for the submission.
Any supplemental data or code needed to reproduce your results.
The submission must list the libraries used and their versions, as well as the Python version (>=3.6). See the Starter Notebook on the Data Tab for an example.
Timelines and Prizes
The challenge will open on December 13, 2023 and close on March 15, 2024 at 22:00 UTC.
The main prize pool will award prizes for first ($20,000), second ($12,000), and third ($8,000) place in the final evaluation. Participants placing fourth through tenth will each receive $1,000 for a valid submission. There will also be two $1,500 honorable mentions for valid submissions that score highly on the real seismic data samples.