ThinkOnward - Accelerating Energy Industry Innovation

Generative AI

Completed

Two Birds, One Neural Network

$40,000

A chef who has only maple syrup and celery to make a meal. A cab driver who has to cross town during rush hour on a quarter tank of gasoline. A football captain who is down one player as snow starts falling on the pitch. Humans excel at finding different solutions to problems, especially when their options are limited. The excitement around generative AI comes from bringing that kind of solution-finding to algorithms. This challenge continues our journey into the world of generative algorithms, following the “From One to Many” challenge earlier this year.

The goal of this challenge is to develop a generative neural network that maximizes the diversity of generated outputs that meet two complex conditions.

Background

The original concept for this challenge comes from the difficulty of understanding the Earth from a limited number of data sources. McAliley and Li (2021) took an interesting approach to this problem, using generative algorithms to create a diverse set of subsurface models that are all consistent with a given geophysical measurement. Earlier this year, Xeek’s “From One to Many” challenge explored the fundamentals of this approach by optimizing the McAliley and Li (2021) CVAE algorithm for a simplified scenario. This challenge adds complexity with a second condition (equivalent to adding a second geophysical measurement to constrain the subsurface scenarios) and with sinusoidal functions.

Data and Labels

At a high level, imagine that we have two properties: x0 (a descending straight line of 50 points in the range 0-1) and x1 (an ascending straight line of 50 points in the range 0-1). Together they satisfy two functions, y1(x0, x1) and y2(x0, x1), as shown in Figure 1.

Figure 1: Single example of the data for this challenge, two signals (x0, x1) that satisfy two different functions (y1, y2).
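To make the data layout concrete, here is a minimal Python sketch of what one (x0, x1) pair looks like: two 50-point straight lines in the 0-1 range, x0 descending and x1 ascending. The helper names and the random choice of endpoints are illustrative assumptions, and the sketch does not reproduce the challenge's y1, y2 functions, which are not given here.

```python
import random
from typing import List, Tuple

N_POINTS = 50  # each line has 50 points, per the challenge description


def make_line(start: float, end: float, n_points: int = N_POINTS) -> List[float]:
    """Return n_points evenly spaced values running from start to end."""
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def make_toy_pair(rng: random.Random) -> Tuple[List[float], List[float]]:
    """Build a hypothetical (x0, x1) pair: x0 descending, x1 ascending, both within 0-1."""
    # Drawing the high endpoint from [0.5, 1) and the low one from [0, 0.5)
    # guarantees a non-zero slope of the intended sign.
    x0 = make_line(rng.uniform(0.5, 1.0), rng.uniform(0.0, 0.5))  # descending
    x1 = make_line(rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0))  # ascending
    return x0, x1


x0, x1 = make_toy_pair(random.Random(0))
print(len(x0), len(x1), x0[0] > x0[-1], x1[0] < x1[-1])  # 50 50 True True
```

The real datasets store such pairs as tensors together with their y1, y2 conditions; plain lists are used here only to keep the sketch dependency-free.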

For this challenge, we want you to build a neural network that generates highly diverse x0, x1 values while preserving their nature as descending and ascending straight lines that satisfy the specified y1, y2 functions, as shown in Figure 2.

Figure 2: Expected outcomes for a passing algorithm for this challenge.

We provide two datasets on the Data Tab to get you started: train_dataset (100,000 elements) and validation_dataset (3,000 elements). Use train_dataset to build your solution and validation_dataset to evaluate it yourself. Both datasets share the same structure: each element is a tensor holding a pair of 50-point straight lines, x0 (descending) and x1 (ascending), together with a tensor of the conditional values y1, y2 associated with that pair.

Evaluation

The Predictive Leaderboard evaluation uses a separate, smaller dataset named "scoring_dataset," also found on the Data Tab. Your solution must generate 30 samples for each of the 50 elements in scoring_dataset. Combine the generated results and submit them to Xeek for scoring on the Predictive Leaderboard. The Starter Notebook shows an example of how the results are generated.
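The per-element sampling loop can be sketched as follows, assuming a CVAE-style generator in the spirit of McAliley and Li (2021): for each scoring element, draw a fresh latent vector from a standard normal distribution, decode it together with that element's y1, y2 condition, and repeat 30 times. The `decoder` callable, the `latent_dim` value, and the use of plain sequences for conditions are hypothetical placeholders for your own model and tensors; the Starter Notebook remains the reference for the actual format.

```python
import random
from typing import Callable, List, Sequence, Tuple

Line = List[float]
Sample = Tuple[Line, Line]  # one generated (x0, x1) pair
# Hypothetical decoder signature: (latent vector, condition) -> (x0, x1)
Decoder = Callable[[List[float], Sequence[float]], Sample]


def generate_submission(
    decoder: Decoder,
    conditions: Sequence[Sequence[float]],
    n_samples: int = 30,
    latent_dim: int = 8,
    seed: int = 0,
) -> List[Sample]:
    """Generate n_samples (x0, x1) pairs per condition, grouped element by element."""
    rng = random.Random(seed)  # fixed seed so the latent draws are reproducible
    samples: List[Sample] = []
    for condition in conditions:  # one (y1, y2) condition per scoring element
        for _ in range(n_samples):
            # Sample the latent vector from N(0, I); diversity comes from varying z.
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            samples.append(decoder(z, condition))
    return samples


# Usage with a stand-in decoder that ignores its inputs and returns fixed lines.
def dummy_decoder(z: List[float], condition: Sequence[float]) -> Sample:
    return [1.0 - i / 49 for i in range(50)], [i / 49 for i in range(50)]


submission = generate_submission(dummy_decoder, conditions=[[0.0, 0.0]] * 50)
print(len(submission))  # 1500
```

Keeping all 30 samples of one element contiguous makes it easy to regroup them per element when computing diversity.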

The Xeek Predictive Scoring algorithm evaluates each submission against three criteria to generate a score. Your submission must meet the first two criteria to be placed on the Predictive Leaderboard.

Shape Criterion: a submission must have the expected shape of 1,500 elements, i.e., 30 different {x0, x1} samples generated by the neural network for each of the 50 elements in scoring_dataset. Please review the "sample_submission.pt" file for an example. Submissions that pass move on to the next criterion; otherwise they receive an error message.
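A quick local check of the Shape Criterion before uploading might look like the sketch below. It assumes the submission is held as a flat sequence of (x0, x1) pairs, one per generated sample, which may differ from the exact layout of "sample_submission.pt"; the function name and its arguments are hypothetical.

```python
from typing import Sequence


def check_submission_shape(
    samples: Sequence[Sequence[Sequence[float]]],
    n_elements: int = 50,
    n_samples: int = 30,
    n_points: int = 50,
) -> bool:
    """Return True if samples holds n_elements * n_samples (x0, x1) pairs of n_points each."""
    if len(samples) != n_elements * n_samples:  # 50 * 30 = 1,500 for the leaderboard
        return False
    for sample in samples:
        if len(sample) != 2:  # each sample must be exactly one (x0, x1) pair
            return False
        x0, x1 = sample
        if len(x0) != n_points or len(x1) != n_points:
            return False
    return True


line = [0.0] * 50
print(check_submission_shape([(line, line)] * 1500))  # True
print(check_submission_shape([(line, line)] * 1499))  # False
```

With `n_elements=1000` and `n_samples=100`, the same check matches the scale of the Final Evaluation holdout run, 100,000 samples in total.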

Goodness of Fit Criterion: every sample must match the expected nature of x0 and x1:
- Have a straight-line slope of the correct sign (negative for x0, positive for x1).
- Have a goodness of fit (Pearson correlation coefficient) greater than 0.8.
- Produce reconstructed y1, y2 values that match the provided scoring y1, y2 with an RMSE below 0.1.
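The three Goodness of Fit conditions can be checked per sample roughly as follows. This is a sketch under several assumptions: the Pearson coefficient is computed between each line and its point index, and its sign stands in for the slope sign (for a least-squares line fit the two always agree); y1 and y2 are treated as sequences that are compared separately; and `y1_rec`, `y2_rec` are assumed to come from your own implementation of the challenge's forward functions, which are not reproduced here. The official scorer may differ in these details.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def passes_goodness_of_fit(
    x0: Sequence[float], x1: Sequence[float],
    y1_rec: Sequence[float], y2_rec: Sequence[float],
    y1: Sequence[float], y2: Sequence[float],
    r_min: float = 0.8, rmse_max: float = 0.1,
) -> bool:
    """Check slope sign, straightness, and y1/y2 reconstruction for one (x0, x1) sample."""
    index = list(range(len(x0)))
    r0, r1 = pearson(index, x0), pearson(index, x1)
    # x0 must descend (r0 < 0) and x1 ascend (r1 > 0), each with |r| above r_min.
    if not (r0 < -r_min and r1 > r_min):
        return False
    # Reconstructed conditions must match the provided ones within the RMSE threshold.
    return rmse(y1_rec, y1) < rmse_max and rmse(y2_rec, y2) < rmse_max


x0 = [1.0 - i / 49 for i in range(50)]
x1 = [i / 49 for i in range(50)]
print(passes_goodness_of_fit(x0, x1, [0.30], [0.70], [0.32], [0.69]))  # True
print(passes_goodness_of_fit(x1, x0, [0.30], [0.70], [0.32], [0.69]))  # False (slopes swapped)
```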

Submissions that pass move on to the next scoring criterion; otherwise they receive a score of 0 on the Predictive Leaderboard. See the “From One to Many” challenge for examples of this criterion.

Diversity Criterion: the Minimum Diversity of x0 and x1 is computed over each element's generated samples and then averaged across all elements. This is the primary scoring metric for this challenge, and valid scores will be added to the Predictive Leaderboard. A higher score indicates a better solution. A submission that does not pass the Shape or Goodness of Fit criteria will be scored as 0.
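The challenge text does not spell out how Minimum Diversity is computed, so the following is only a hypothetical stand-in that may help with self-evaluation on validation_dataset. For each element, it takes the smallest mean absolute difference between any two of the element's generated x0 lines, does the same for x1, keeps the smaller of the two, and averages that value across elements. A higher value means that even the two most similar samples stay apart. Treat the distance measure and the way x0 and x1 are combined as assumptions, not as the official scorer.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float]]  # one (x0, x1) pair


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pointwise difference between two lines (assumed distance)."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def min_pairwise_distance(lines: Sequence[Sequence[float]]) -> float:
    """Distance between the two most similar lines in a set of at least two."""
    return min(mean_abs_diff(a, b) for a, b in combinations(lines, 2))


def diversity_score(samples_per_element: Sequence[Sequence[Sample]]) -> float:
    """Hypothetical score: per-element minimum diversity, averaged over elements."""
    per_element: List[float] = []
    for samples in samples_per_element:
        x0_lines = [x0 for x0, _ in samples]
        x1_lines = [x1 for _, x1 in samples]
        per_element.append(min(min_pairwise_distance(x0_lines), min_pairwise_distance(x1_lines)))
    return sum(per_element) / len(per_element)


def flat(c: float) -> List[float]:
    return [c] * 50


element_a = [(flat(c), flat(c)) for c in (0.0, 0.2, 0.5)]  # closest pair differs by 0.2
element_b = [(flat(c), flat(c)) for c in (0.0, 0.4)]  # the only pair differs by 0.4
print(round(diversity_score([element_a, element_b]), 6))  # 0.3
```

Using the minimum rather than the mean pairwise distance rewards generators that avoid near-duplicate samples, which is the failure mode a "minimum" diversity measure is meant to expose.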

On the Data Tab, the “sample_submission.pt” file shows the format of a valid submission. Note that you may need to adapt it to the specific characteristics of your neural network.

For the Final Evaluation, the top submissions on the Predictive Leaderboard will be invited to send Xeek their fully reproducible code, including the neural network they developed and the framework built around it. The submission should contain a Jupyter Notebook (Python >3.6) with a clearly written pipeline and any supplementary files needed to reproduce your results. The Xeek Judges will retrain your model and generate 100 samples of x0, x1 for each of the 1,000 elements in a holdout dataset. The outputs will then be scored using the same method as the Live Scoring algorithm. This score will count for 95% of a participant's final score.

In addition, submissions will be assessed for the interpretability of their submitted code. The interpretability criterion will focus on the extent of documentation, including docstrings and markdown, clear variable naming, and adherence to standard Python style guidelines. Interpretability counts for 5% of a participant's final score.

Important: Make sure the training process of your proposed neural network is reproducible, and keep a record of your best seed. Your work will be disqualified if its results cannot be reproduced during the final evaluation stage.
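One common way to pin down randomness is to seed every random number generator in one place at the top of the notebook, as in the sketch below. It assumes a PyTorch-based pipeline (the challenge files use the .pt format) with optional NumPy; the function name `seed_everything` and the default seed are illustrative. Seeding alone may not make every GPU operation deterministic, so it is worth re-running training once to confirm that the results match.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs (the latter two only if installed)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds the default generator on all devices
        torch.cuda.manual_seed_all(seed)  # explicit for every GPU; ignored without CUDA
        torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False  # disable autotuning that can vary between runs
    except ImportError:
        pass


seed_everything(42)
first = random.random()
seed_everything(42)
print(first == random.random())  # True
```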

References

McAliley, W. Anderson, and Yaoguo Li (2021), "Machine learning inversion of geophysical data by a conditional variational autoencoder," SEG Technical Program Expanded Abstracts, 1460-1464.