
Annotation data

Currently, SceneFun3D provides three categories of annotated data:

  • Functional interactive element annotations
  • Language task descriptions
  • Motion annotations

In the sections below, we describe the provided data for each category. Each annotation is accompanied by a unique identifier of the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

Functional interactive elements

The format of the functional interactive element annotations can be seen below.

  {
    "visit_id": the identifier of the scene,
    "annotations": [
      {
        "annot_id": unique id of the annotation,
        "indices": the mask indices of the original laser scan point cloud ({visit_id}_laser_scan.ply) that comprise the functional interactive element instance,
        "label": affordance label
      },
      ...
    ]
  }

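As a rough illustration of how a record in this format might be consumed, the Python sketch below groups annotations by affordance label. It assumes the file holds a single JSON object with the keys listed above. The sample values (visit id, annotation ids, indices) are made up for the example, and the helper name group_by_label is hypothetical.

```python
import json
from collections import defaultdict

# Made-up sample record following the format above (not real dataset values).
SAMPLE = """
{
  "visit_id": "000000",
  "annotations": [
    {"annot_id": "a1", "indices": [10, 11, 12], "label": "tip_push"},
    {"annot_id": "a2", "indices": [40, 41], "label": "rotate"}
  ]
}
"""


def group_by_label(scene):
    """Map each affordance label to a list of (annot_id, indices) pairs."""
    groups = defaultdict(list)
    for annot in scene["annotations"]:
        groups[annot["label"]].append((annot["annot_id"], annot["indices"]))
    return dict(groups)


scene = json.loads(SAMPLE)
groups = group_by_label(scene)

# Print each label with the number of masked points per annotation.
for label, items in groups.items():
    print(label, [(annot_id, len(indices)) for annot_id, indices in items])
```

The "indices" values are positions in the point array of {visit_id}_laser_scan.ply, so they only make sense together with that scan.
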
Currently, the SceneFun3D dataset contains interactions with the following affordance labels:

Label        Description
rotate       functionalities that are adjusted by a rotary switch knob, e.g., thermostat
key_press    surfaces that consist of keys that can be pressed, e.g., remote control, keyboard
tip_push     functionalities that can be triggered by the tip of the finger, e.g., light switch
hook_pull    surfaces that can be pulled by hooking up fingers, e.g., fridge handle
pinch_pull   surfaces that can be pulled through a pinch movement, e.g., drawer knob
hook_turn    surfaces that can be turned by hooking up fingers, e.g., door handle
foot_push    surfaces that can be pushed by foot, e.g., foot pedal of a trash can
plug_in      surfaces that comprise electrical power sources
unplug       removing a plug from a socket

In addition to these affordance categories, we have annotated functionalities whose geometry, or whose parent object's geometry, is not well captured in the laser scans (e.g., reflective or transparent surfaces) under the label exclude. These cases are excluded during the evaluation process.
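
One way to handle the exclude label when preparing data is sketched below in Python. This is only an illustration under assumptions: the helper name split_for_evaluation and the sample annotations are hypothetical, and the official evaluation code may treat these cases differently.

```python
# Affordance labels listed in the table above.
AFFORDANCE_LABELS = {
    "rotate", "key_press", "tip_push", "hook_pull", "pinch_pull",
    "hook_turn", "foot_push", "plug_in", "unplug",
}


def split_for_evaluation(annotations):
    """Separate annotations labelled 'exclude' from the regular ones."""
    kept, excluded = [], []
    for annot in annotations:
        label = annot["label"]
        if label == "exclude":
            excluded.append(annot)
        elif label in AFFORDANCE_LABELS:
            kept.append(annot)
        else:
            # Fail loudly on labels this sketch does not know about.
            raise ValueError(f"unexpected label: {label!r}")
    return kept, excluded


# Made-up sample annotations (not real dataset values).
annotations = [
    {"annot_id": "a1", "indices": [1, 2], "label": "hook_pull"},
    {"annot_id": "a2", "indices": [7], "label": "exclude"},
    {"annot_id": "a3", "indices": [9, 10], "label": "plug_in"},
]
kept, excluded = split_for_evaluation(annotations)
print(len(kept), "kept,", len(excluded), "excluded")
```
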

Language task descriptions

The format of the natural language task descriptions can be seen below.

  {
    "visit_id": the identifier of the scene,
    "descriptions": [
      {
        "desc_id": unique id of the description,
        "annot_id": [
          list of the associated annotation IDs in the *annotations.json* file
        ],
        "description": language instruction of the task
      },
      ...
    ]
  }

Note that a single language task description may correspond to more than one functional interactive element instance, which is why "annot_id" is a list.
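
Because "annot_id" is a list, a description has to be resolved against the annotations one id at a time. The Python sketch below shows one possible way to do that. The helper name resolve_description and all sample values are hypothetical, and it assumes both records belong to the same scene.

```python
def resolve_description(description, annotations):
    """Return the annotation records referenced by one task description."""
    by_id = {annot["annot_id"]: annot for annot in annotations}
    missing = [i for i in description["annot_id"] if i not in by_id]
    if missing:
        raise KeyError(f"annotation ids not found: {missing}")
    return [by_id[i] for i in description["annot_id"]]


# Made-up sample data (not real dataset values).
annotations = [
    {"annot_id": "a1", "indices": [1, 2], "label": "tip_push"},
    {"annot_id": "a2", "indices": [5], "label": "rotate"},
    {"annot_id": "a3", "indices": [8, 9], "label": "tip_push"},
]
description = {
    "desc_id": "d1",
    "annot_id": ["a1", "a3"],  # one task, two interactive elements
    "description": "hypothetical instruction text",
}

targets = resolve_description(description, annotations)
print([t["annot_id"] for t in targets])
```
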

Motion annotations

The format of the motion annotations can be seen below.

  {
    "visit_id": the identifier of the scene,
    "motions": [
      {
        "motion_id": unique id of the motion,
        "annot_id": the associated annotation id in the *annotations.json* file,
        "motion_type": motion type (rotational or translational),
        "motion_dir": motion direction (three element array),
        "motion_origin_idx": point index of the original laser scan point cloud ({visit_id}_laser_scan.ply) which comprises the motion axis origin,
        "motion_viz_orient": motion visualization orientation (optional)
      },
      ...
    ]
  }
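
To make the roles of "motion_dir" and "motion_origin_idx" concrete, the Python sketch below moves a single point according to a motion annotation. It rests on several assumptions that the description above does not confirm: that "motion_dir" is expressed in the coordinate frame of the laser scan, that the "motion_type" string begins with "rot" or "trans", and that the amount is in radians for rotations and in scan units for translations. The helper names and the tiny point list are hypothetical. Rotation uses the standard Rodrigues formula.

```python
import math


def unit(v):
    """Normalize a 3-vector; the annotation may not store a unit vector."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("motion_dir must be non-zero")
    return [c / n for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def apply_motion(p, motion, points, amount):
    """Move point p along/around the annotated axis by `amount`."""
    origin = points[motion["motion_origin_idx"]]  # axis origin from the scan
    k = unit(motion["motion_dir"])
    kind = motion["motion_type"]  # assumed to start with "rot" or "trans"
    if kind.startswith("trans"):
        return [pc + amount * kc for pc, kc in zip(p, k)]
    if kind.startswith("rot"):
        v = [pc - oc for pc, oc in zip(p, origin)]
        kxv = cross(k, v)
        kdv = sum(kc * vc for kc, vc in zip(k, v))
        c, s = math.cos(amount), math.sin(amount)
        # Rodrigues: v*cos + (k x v)*sin + k*(k.v)*(1 - cos), then shift back.
        return [oc + vc * c + xc * s + kc * kdv * (1 - c)
                for oc, vc, xc, kc in zip(origin, v, kxv, k)]
    raise ValueError(f"unknown motion_type: {kind!r}")


# Made-up stand-in for the laser scan points (not real dataset values).
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
rot = {"motion_type": "rot", "motion_dir": [0, 0, 2], "motion_origin_idx": 0}
trans = {"motion_type": "trans", "motion_dir": [0, 0, 2], "motion_origin_idx": 0}

rotated = apply_motion(points[1], rot, points, math.pi / 2)
shifted = apply_motion(points[1], trans, points, 0.5)
print(rotated, shifted)
```

In practice the same transform would be applied to every point selected by the "indices" of the annotation that "annot_id" refers to.
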