User Manual
If you want to participate in a benchmark on the EYRA Benchmark Platform, you need to create a model, based on participant data, that can predict outcomes (e.g., class labels or numeric values) given test data. What you submit to the benchmark platform is a Docker container that makes the predictions using your model or algorithm. Benchmark organisers need to provide a Docker container for evaluating the results produced by the participants' models/algorithms. The EYRA tools can be used to generate boilerplate containers that set up as much as possible, so you can focus on implementing the prediction algorithm or the evaluation metrics.
To be able to use the Docker container as a submission or evaluation, you need to publish it on Docker Hub. If you (or your organization) do not yet have a Docker Hub account, you need to sign up for one.
Quickstart
1. Generate a boilerplate container by running:

   eyra-generate [submission|evaluation] <name>

   A directory called <name> is created.
2. Put the input data in the <name>/data/input/ directory.
   - Benchmark participants should download the public test data and put it in this directory. The file should be called data/input/test_data.
   - Benchmark organisers should create a ground truth file, data/input/ground_truth, and an example output file (as if created by a participant's container), stored as data/input/implementation_output.
3. For submissions, implement the prediction code in src/submission.py. For evaluations, implement the evaluation metrics in src/evaluation.py. You can test your code by running python src/submission.py or python src/evaluation.py from the <name> directory. Output is written to <name>/data/output.

   Important
   Please do not change the file paths in the part of the code after if __name__ == "__main__":. Running submissions and evaluations depends on default file names. When running the container locally (see step 7 of this quickstart), these paths are mapped to your local copy of the input and output directories. When running on the benchmark platform, they are mapped to the benchmark's input and output directories.
4. Add any files you need to the <name>/src directory. These can be (Python) code, but also (binary) files containing a model (see the tutorial for an example). For evaluations, you might not need additional files.
5. Add your code's dependencies to the <name>/requirements.txt file.
6. Update the run() method of the Submission or Evaluation class (in src/run_submission.py or src/run_evaluation.py) to call the function or functions from src/submission.py or src/evaluation.py.

   Tip
   The generated code contains a complete, albeit very simple, example of a submission or evaluation. For a more realistic example, have a look at the demo benchmark tutorial.
7. Test your container by running test.sh. This builds the container and runs the prediction or evaluation code on your local copy of the data. Output is written to <name>/data/output.
8. When you are done developing, run the push.sh script to tag your Docker container with a version number and push it to Docker Hub:

   ./push.sh [version]

   If you omit the version number, the Docker image is tagged with latest.
9. Specify the Docker container on the EYRA Benchmark Platform using <docker hub account>/<name>:<version>.
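Before running test.sh, you may want to check that the input files from step 2 are in place. The sketch below is a hypothetical helper: missing_inputs() and REQUIRED_INPUTS are illustrative names and are not part of the EYRA tools. It relies only on the default file names listed above.

```python
from pathlib import Path

# Hypothetical helper, not part of the EYRA tools: the default input file
# names each container type expects in <name>/data/input/.
REQUIRED_INPUTS = {
    "submission": ["test_data"],
    "evaluation": ["ground_truth", "implementation_output"],
}


def missing_inputs(project_dir, container_type):
    """Return the required input files that are absent from data/input/."""
    input_dir = Path(project_dir) / "data" / "input"
    return [
        name
        for name in REQUIRED_INPUTS[container_type]
        if not (input_dir / name).is_file()
    ]
```

For example, if you call missing_inputs(".", "evaluation") from the <name> directory, it returns an empty list when both evaluation input files are present.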
Complete example
After installing the EYRA Tools and creating a Docker Hub account, generate a boilerplate container by running:
eyra-generate [submission|evaluation] <name> [-d <docker hub account>]
For submissions, this will create a directory with the following structure:
<name>
├── .gitignore
├── Dockerfile
├── README.md
├── build.sh
├── data
│ ├── .gitignore
│ ├── input
│ │ ├── .gitignore
│ │ └── test_data
│ └── output_data_appears_here.txt
├── export.sh
├── push.sh
├── requirements.txt
├── src
│ ├── run_submission.py
│ └── submission.py
└── test.sh
For evaluations, the file names are slightly different:
<name>
├── .gitignore
├── Dockerfile
├── README.md
├── build.sh
├── data
│ ├── .gitignore
│ ├── input
│ │ ├── .gitignore
│ │ ├── ground_truth
│ │ └── implementation_output
│ └── output_data_appears_here.txt
├── export.sh
├── push.sh
├── requirements.txt
├── src
│ ├── evaluation.py
│ └── run_evaluation.py
└── test.sh
For more information on what the files and directories are used for, have a look at What are all the files?
Tip
It is good practice to use version control when writing code. Now is a good time to do so. If you are using git, run:
git init
git add -A
git commit -m "Initial commit"
Note
It is also good practice to separate code and data. However, because the boilerplate container contains a bash script (test.sh) for running the code inside the Docker container on your local data, it is convenient to keep the data in the directory generated by eyra-generate (otherwise you would not be able to run test.sh on another computer). To prevent the data from being uploaded to GitHub or another Git repository hosting service, the data directory is ignored by git.
Data preparation
The boilerplate code comes with example data. The data files can be found in the
data/input directory. For submissions, there is a single data file called
test_data and for evaluations there are two files, ground_truth and
implementation_output.
Both submissions and evaluations should produce a single file called output,
which is written to the data directory (i.e., data/output).
Important
As the EYRA Benchmark Platform uses these default file names, do not change them when working on your own submission or evaluation!
For a submission, you can get the data files from the benchmark page on the EYRA Benchmark Platform. For evaluations, it is the responsibility of the benchmark organisers to prepare the data files.
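If you want extra toy data for local experiments, you can write it in the format that the boilerplate reads with np.loadtxt(): one number per line. The sketch below is hypothetical. write_example_inputs() is not part of the EYRA tools, and the values it writes are random integers with parity labels that match the toy algorithm, not real benchmark data. Note that it overwrites the example files in data/input/. For a real submission, test_data should come from the benchmark page.

```python
import random
from pathlib import Path


def write_example_inputs(project_dir, n_samples=10, seed=0):
    """Write toy input files in the one-number-per-line format (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the files are reproducible
    input_dir = Path(project_dir) / "data" / "input"
    input_dir.mkdir(parents=True, exist_ok=True)

    values = [rng.randint(0, 99) for _ in range(n_samples)]
    labels = [v % 2 for v in values]  # 0 for even, 1 for odd

    def write_lines(name, numbers):
        (input_dir / name).write_text("".join(f"{n}\n" for n in numbers))

    # Submission input.
    write_lines("test_data", values)
    # Evaluation inputs: the ground truth and a perfect "participant" output.
    write_lines("ground_truth", labels)
    write_lines("implementation_output", labels)
    return values, labels
```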
Implementation
All files related to your algorithm should be put in the src directory.
Because it is easier to develop and debug code on your own computer than inside
the Docker container, the code is split over two files: submission.py and
run_submission.py for submissions, and evaluation.py and run_evaluation.py for
evaluations. <container_type>.py contains the submission or evaluation code
itself, and run_<container_type>.py contains the code for running it inside the
Docker container.
submission.py looks like:
from pathlib import Path

# Add your imports here; numpy is only used as an example
import numpy as np


def my_submission(test_file, out_file):
    with open(test_file, 'r') as test:
        data = np.loadtxt(test)

    with open(out_file, 'w') as f:
        for sample in data:
            f.write(str(int(sample % 2)))
            f.write('\n')


if __name__ == "__main__":
    # Run the algorithm on your local copy of the data by typing:
    # python src/submission.py

    # These are the default file paths (names) for input and output, so don't
    # change them.
    test_file = str(Path('data')/'input'/'test_data')
    out_file = str(Path('data')/'output')

    my_submission(test_file, out_file)

The code contains a single function, my_submission(), that takes two arguments:
the path to the input file (i.e., data/input/test_data) and the path to the
output file (i.e., data/output). First, the test data is read into a numpy
array. Next, the output file is opened for writing, and we loop over the values
in the test data. The example prediction algorithm is very simple: for every
value in the test data, we write a zero to the output file if the number is even
and a one if the number is odd, one prediction per line.
The code can be run by typing python src/submission.py from the <name>
directory. This executes everything in the if __name__ == "__main__": block:
first, the file paths for the input and output file are created, and then
my_submission() is called.
After running the code, the data directory contains a new file called
output.
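One way to make the example algorithm easier to test is to separate the prediction logic from the file handling. The sketch below is a hypothetical refactoring: predict_parity() and write_predictions() are illustrative names, not part of the boilerplate. It uses plain Python instead of numpy and produces the same one-label-per-line output.

```python
def predict_parity(values):
    """Return 0 for even and 1 for odd values, like the example algorithm."""
    return [int(v % 2) for v in values]


def write_predictions(labels, out_file):
    """Write one label per line, the format the example evaluation reads."""
    with open(out_file, "w") as f:
        for label in labels:
            f.write(f"{label}\n")
```

With this split, my_submission() would only read the numbers, call predict_parity(), and pass the result to write_predictions(). You can then check the prediction logic without creating any files.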
The boilerplate code for evaluations (evaluation.py) is very similar:
import json
from pathlib import Path

# Add your imports here; numpy is only used as an example
import numpy as np


def my_evaluation(submission_file, test_gt_file, out_file):
    with open(submission_file, 'r') as subm:
        submission_labels = np.loadtxt(subm)
    with open(test_gt_file, 'r') as gt:
        gt_labels = np.loadtxt(gt)

    num_correct = 0
    for subm_label, gt_label in zip(submission_labels, gt_labels):
        if subm_label == gt_label:
            num_correct += 1

    output = {'metrics': {'accuracy': float(num_correct)/len(gt_labels)}}

    with open(out_file, 'w') as f:
        json.dump(output, f)


if __name__ == "__main__":
    # Run the algorithm on your local copy of the data by typing:
    # python src/evaluation.py

    # These are the default file paths (names) for input and output, so don't
    # change them.
    submission_file = str(Path('data')/'input'/'implementation_output')
    test_gt_file = str(Path('data')/'input'/'ground_truth')
    out_file = str(Path('data')/'output')

    my_evaluation(submission_file, test_gt_file, out_file)

The main function is called my_evaluation() and requires three arguments:
the submission file (i.e., data/input/implementation_output), the ground
truth file (i.e., data/input/ground_truth), and the output file (i.e.,
data/output). First, both input files are read into numpy arrays. For the
evaluation, we count how often the numbers in both arrays are the same: we set a
counter (num_correct) to zero, loop over the numbers in both arrays
simultaneously, and add one to the counter whenever the two numbers are equal.
Next, we prepare the output. The output should be a JSON file containing a
single object (dictionary) with a metrics key. The value of metrics is an object
(dictionary) that maps the name of each metric to its value. In this case, there
is a single metric called accuracy, calculated by dividing the number of samples
for which the predicted label equals the ground truth label by the total number
of samples. Finally, this data is written to data/output using json.dump().
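You can also write the accuracy computation and the required output structure without any file handling. The sketch below is hypothetical, and accuracy_output() is an illustrative name. It differs from the boilerplate in one respect: it raises an error when the two label lists differ in length. The boilerplate uses zip(), which silently stops at the shorter input, and then divides by the number of ground truth labels, so missing predictions count as incorrect and extra predictions are ignored. Whether to fail loudly or score leniently is a choice for the benchmark organisers.

```python
import json


def accuracy_output(predicted, ground_truth):
    """Build the {'metrics': {...}} object for an accuracy evaluation."""
    if not ground_truth:
        raise ValueError("no ground truth labels")
    if len(predicted) != len(ground_truth):
        raise ValueError(
            f"expected {len(ground_truth)} predictions, got {len(predicted)}"
        )
    # True counts as 1 when summed, so this is the number of matching labels.
    num_correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return {"metrics": {"accuracy": num_correct / len(ground_truth)}}


print(json.dumps(accuracy_output([0, 1, 1, 0], [0, 1, 0, 0])))
# {"metrics": {"accuracy": 0.75}}
```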
When you are done implementing your submission or evaluation code, make sure it
can be run inside the Docker container. To do so, you need to update
run_submission.py or run_evaluation.py.
run_submission.py looks like:
from pathlib import Path

from submission import my_submission


class Submission(object):
    def run(self, test_file, out_file):
        """This is boilerplate. Delete the contents of this method and put your
        own code here. Please do not change the class name (Submission),
        the method name (run), or the arguments.
        """
        my_submission(test_file, out_file)


# Please do not change anything below
if __name__ == "__main__":
    # These are the default file paths (names) for input and output
    test_file = Path('/')/'data'/'input'/'test_data'
    out_file = Path('/')/'data'/'output'

    Submission().run(test_file, out_file)

The my_submission() function is imported from submission.py and called inside
the run() method of the Submission class. For your submission, change the code
to import the function(s) needed to create the output, and call them in the
run() method. If you need to split up your code into multiple Python files, or
if you need additional files for predicting outcomes, put them in the src
directory and load them inside the run() method (see
Demo benchmark: the Iris data for an example).
Warning
If you change the code below the comment # Please do not change anything below, you run the risk that it will not work on the EYRA Benchmark Platform.
Note
This is the code for submissions. The code for evaluations looks slightly
different: the class is called Evaluation (class Evaluation(object):), and
its run() method has three arguments (i.e., the submission file, the
ground truth file, and the output file) instead of two.
Important
When running the Docker container locally, the input and output directories on
your hard drive are mapped by Docker to /data/input/ and /data/, respectively.
When running it on the benchmark platform, these directories are mapped
to the benchmark's input and output directories.
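The __main__ blocks of submission.py and run_submission.py use the same file names. Only the root differs: the <name> directory for a local run, and / inside the container. The hypothetical sketch below expresses that mapping as one function. default_paths() is not part of the boilerplate, and the evaluation paths inside the container are an assumption based on the file names used in evaluation.py.

```python
from pathlib import Path


def default_paths(root, container_type):
    """Return (input paths, output path) below a given root.

    Hypothetical helper: use Path('.') for a local run from the <name>
    directory and Path('/') for the paths inside the Docker container.
    """
    data_dir = Path(root) / "data"
    if container_type == "submission":
        inputs = [data_dir / "input" / "test_data"]
    elif container_type == "evaluation":
        # Assumed to mirror the names used in evaluation.py.
        inputs = [
            data_dir / "input" / "implementation_output",
            data_dir / "input" / "ground_truth",
        ]
    else:
        raise ValueError(f"unknown container type: {container_type!r}")
    return inputs, data_dir / "output"
```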
Output formats
For submissions, the benchmark page should specify the output format. For evaluations, the output should be a JSON file containing:
{
    "metrics": {
        "metric1": <numeric value>,
        ...
    }
}
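To check an evaluation's output against this format before pushing the container, you can use a small checker like the one below. It is a hypothetical sketch that covers only the structure described above. check_evaluation_output() is an illustrative name, and the platform's own validation may differ.

```python
import json
from numbers import Real


def check_evaluation_output(text):
    """Return the metrics dict if text matches {"metrics": {name: number}}.

    Hypothetical checker based only on the format described above; the
    platform's own validation rules may differ.
    """
    data = json.loads(text)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict) or not isinstance(data.get("metrics"), dict):
        raise ValueError('output must be an object with a "metrics" object')
    for name, value in data["metrics"].items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"metric {name!r} is not numeric: {value!r}")
    return data["metrics"]
```

For example, after running python src/evaluation.py, you could pass the contents of data/output to check_evaluation_output().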
Dependencies
All dependencies (like numpy in the example submission) should be listed in
requirements.txt, so they are installed inside the container (see the
RUN python -m pip install -r requirements.txt step in the build output below).
Building the Docker container
To build the Docker container, run build.sh.
$ ./build.sh
Sending build context to Docker daemon 31.74kB
Step 1/7 : FROM python:3.7-slim
---> 783362c5ef81
Step 2/7 : RUN mkdir -p /opt/src /input /output
---> Using cache
---> 5cf2874fe9d2
Step 3/7 : WORKDIR /opt/src
---> Using cache
---> 4876a0b73b86
Step 4/7 : COPY requirements.txt /opt/src/
---> Using cache
---> d397b4c2f203
Step 5/7 : RUN python -m pip install -r requirements.txt
---> Using cache
---> b7815db6c39e
Step 6/7 : ADD src /opt/src/
---> Using cache
---> e6ea2d81fb51
Step 7/7 : ENTRYPOINT "python" "-m" "run_submission"
---> Using cache
---> 013e85d0112d
Successfully built 013e85d0112d
Successfully tagged <name>:latest
If you want to build the Docker container and test your code, run test.sh.
Have a look at the scripts page for an overview of the scripts available for manipulating the Docker container.
Tagging and pushing
If you are done developing, run the push.sh script to tag your Docker
container with a version number and push it to Docker Hub.
./push.sh [version]
If you omit the version number, the Docker image is tagged with latest.
Warning
Please use semantic versioning rather
than the latest tag. If you submit Docker images tagged latest, the
EYRA Benchmark Platform might not use the most recent version of your image.
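The hypothetical sketch below builds an image reference in the <docker hub account>/<name>:<version> form and refuses anything that is not a plain MAJOR.MINOR.PATCH version. image_reference() is an illustrative name. The check is deliberately stricter than full semantic versioning, which also allows pre-release suffixes such as 1.0.0-rc.1.

```python
import re

# Hypothetical check: plain MAJOR.MINOR.PATCH only (stricter than full semver).
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def image_reference(account, name, version):
    """Build '<docker hub account>/<name>:<version>', refusing e.g. 'latest'."""
    if not SEMVER.fullmatch(version):
        raise ValueError(f"use a version such as 1.0.0, not {version!r}")
    return f"{account}/{name}:{version}"


print(image_reference("myaccount", "my-submission", "1.2.0"))
# myaccount/my-submission:1.2.0
```

The resulting string has the form that goes into the form field on the EYRA Benchmark Platform.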
Tip
If you get the following message:
denied: requested access to the resource is denied
unauthorized: authentication required
you need to log in to Docker Hub using docker login.
You can also manually tag and push your image:
docker tag <name/id> <docker hub account>/<name>:<version>
docker push <docker hub account>/<name>:<version>
Submitting
To submit your submission or evaluation container to the EYRA Benchmark Platform, enter
<docker hub account>/<name>:<version> in the designated form field.