# Data Format

# Data Directory Structure

[OUTPUT_DIR]
├── [START_TIMESTAMP]
│   ├── ide_tracking.xml
│   ├── eye_tracking.xml
│   ├── archives
│   │   ├── [ARCHIVE_TIMESTAMP_1].archive
│   │   ├── [ARCHIVE_TIMESTAMP_2].archive
│   │   ├── ...
│   ├── screen_recording
│   │   ├── clip_1.mp4
│   │   ├── clip_2.mp4
│   │   ├── ...
│   │   ├── frames.csv

Comment:

  • [OUTPUT_DIR] is the output directory specified in the configuration.
  • [START_TIMESTAMP] is the timestamp when the tracking starts.
  • [ARCHIVE_TIMESTAMP] is the timestamp when the archive is triggered.
  • clip_[k].mp4 is the video clip of the screen recording from the (k-1)-th pause (the 0-th pause is the start of tracking) to the k-th pause.
  • frames.csv records the timestamp, frame number, and clip number of each frame in the video clips.

All the timestamps used by CodeGRITS are Unix time in milliseconds, starting from 1970-01-01 00:00:00 UTC.
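Since every timestamp is a Unix time in milliseconds, converting one to a readable UTC date is a one-step operation. The following is a minimal sketch; the helper name `ms_to_utc` is our own and is not part of CodeGRITS.

```python
from datetime import datetime, timezone


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(int(timestamp_ms), 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)


# Example: a timestamp taken from an <archive> element in this document.
print(ms_to_utc(1696203834202).isoformat())
```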

The editor coordinate system (e.g., line, column) of IntelliJ Platform starts from 0.

# IDE Tracking

[OUTPUT_DIR]
├── [START_TIMESTAMP]
│   ├── ide_tracking.xml
│   ├── archives
│   │   ├── [ARCHIVE_TIMESTAMP_1].archive
│   │   ├── [ARCHIVE_TIMESTAMP_2].archive
│   │   ├── ...

  • <ide_tracking>
    • <environment>
    • <archives>
      • <archive>
    • <actions>
      • <action>
    • <typings>
      • <typing>
    • <files>
      • <file>
    • <mouses>
      • <mouse>
    • <carets>
      • <caret>
    • <selections>
      • <selection>
    • <visible_areas>
      • <visible_area>

Element: <ide_tracking>

Sub-element:

  • <environment>
  • <archives>
  • <actions>
  • <typings>
  • <files>
  • <mouses>
  • <carets>
  • <selections>
  • <visible_areas>

Comment:

  • The root element of the ide_tracking.xml file.

# Environment

Element: <environment>

Attribute:

  • ide_name
  • ide_version
  • os_name
  • java_version
  • project_name
  • project_path
  • screen_size
  • scale_x
  • scale_y

Example:

<environment ide_name="IntelliJ IDEA" ide_version="2022.2.5" java_version="17.0.6" os_name="Windows 10"
             project_name="HelloWorld" project_path="C:/Users/Lenovo/IdeaProjects/HelloWorld" scale_x="1.25"
             scale_y="1.25" screen_size="(1536,864)"/>

Comment:

  • scale_x and scale_y are used to calculate the real screen resolution based on the screen_size. In the example above, the real screen resolution is (1536*1.25, 864*1.25) = (1920, 1080).
  • java_version will be replaced by python_version in PyCharm, etc.
  • All path attributes in the data that start with / are relative to project_path; otherwise, they are absolute paths. Sometimes the path is empty, which means the data is not related to any file or the file was not successfully tracked.
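The two conventions above (scaling screen_size and resolving path attributes) can be sketched as follows. The function names are hypothetical helpers for illustration, and the string handling assumes that screen_size always has the "(width,height)" form shown in the example.

```python
import re
from typing import Optional, Tuple


def real_resolution(screen_size: str, scale_x: float, scale_y: float) -> Tuple[int, int]:
    """Scale the logical screen_size, e.g. "(1536,864)", to the real resolution."""
    width, height = (int(v) for v in re.findall(r"\d+", screen_size))
    return round(width * scale_x), round(height * scale_y)


def resolve_path(path: str, project_path: str) -> Optional[str]:
    """Resolve a path attribute following the rule described above."""
    if not path:
        # Empty path: not related to any file, or not successfully tracked.
        return None
    if path.startswith("/"):
        # Paths starting with "/" are relative to project_path.
        return project_path.rstrip("/") + path
    return path  # Anything else is already an absolute path.


print(real_resolution("(1536,864)", 1.25, 1.25))
print(resolve_path("/src/Main.java", "C:/Users/Lenovo/IdeaProjects/HelloWorld"))
```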

# Archives

Element: <archives>

Sub-element: <archive>

Comment:

  • A real-time archive mechanism is implemented to track the state of the code file and console output at any timestamp during the development process. The file archive is triggered under two specific conditions: (1) When a file is opened or closed, or its selection changes; (2) When the content of the code in the main editor changes. The console archive is triggered when the console output changes (e.g., run class).
  • The archived data is stored in the archives directory, with the name [ARCHIVE_TIMESTAMP].archive, where [ARCHIVE_TIMESTAMP] is the timestamp when the archive is triggered. Relevant information is stored in the <archive> element, including the timestamp, the path of the file, and the remark.
  • Thus, to recover the state of a code file at a specific timestamp, find the archive of that file with the largest timestamp that is smaller than the target timestamp.
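The lookup described above can be sketched as follows. This is an illustration under assumptions rather than an official tool: `latest_file_archive` is a hypothetical helper, and it skips entries whose remark contains Fail on the assumption that no usable archive file was written for them.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SAMPLE = """<archives>
  <archive id="fileArchive" path="/src/Main.java" remark="fileOpened" timestamp="1696203834202"/>
  <archive id="fileArchive" path="/src/Main.java" remark="contentChanged" timestamp="1696203839648"/>
  <archive id="consoleArchive" timestamp="1696203842925"/>
</archives>"""


def latest_file_archive(archives_xml: str, path: str, target_ms: int) -> Optional[str]:
    """Return the archive file name holding the state of `path` just before target_ms."""
    best = None
    for archive in ET.fromstring(archives_xml).iter("archive"):
        if archive.get("id") != "fileArchive" or archive.get("path") != path:
            continue
        if "Fail" in archive.get("remark", ""):
            continue  # Assumption: failed archives have no usable content.
        timestamp = int(archive.get("timestamp"))
        # Keep the largest timestamp that is still smaller than the target.
        if timestamp < target_ms and (best is None or timestamp > best):
            best = timestamp
    return None if best is None else f"{best}.archive"


print(latest_file_archive(SAMPLE, "/src/Main.java", 1696203840000))
```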

# Archive

Element: <archive>

Attribute:

  • id
  • timestamp
  • path: only used in fileArchive
  • remark: only used in fileArchive

Example:

<archive id="fileArchive" path="/src/Main.java" remark="fileOpened" timestamp="1696203834202"/>
<archive id="fileArchive" path="/1696203101069/ide_tracking.xml" remark="fileOpened | NotCodeFile | Fail"
         timestamp="1696203834208"/>
<archive id="fileArchive" path="/src/Main.java" remark="contentChanged" timestamp="1696203839648"/>
<archive id="consoleArchive" timestamp="1696203842925"/>

Comment:

  • id could be fileArchive or consoleArchive.
  • remark could be fileOpened, fileClosed, fileSelectionChanged, contentChanged | OldFile, contentChanged | NewFile.
  • If the file is not a code file, i.e., its file extension is not one of ".java", ".cpp", ".c", ".py", ".rb", ".js", or ".md", NotCodeFile | Fail is added to the remark. This prevents archiving large data files.
  • If there are IO errors when archiving the file, IOException | Fail will be added to the remark.

# Actions

Element: <actions>

Sub-element: <action>

Comment:

  • The elements in <actions> record the IDE-specific features, which technically are all objects that implement the AnAction abstract class in IntelliJ IDEA. The range is diverse: basic editing features such as EditorEnter and EditorBackSpace; clipboard features such as EditorPaste and EditorCut; run features such as RunClass, Stop, ToggleLineBreakpoint, and Debug; navigation features such as GotoDeclaration, Find, and ShowIntentionActions; advanced IDE features such as CompareTwoFiles and ReformatCode; and many others that cannot be fully listed here.

# Action

Element: <action>

Attribute:

  • id
  • timestamp
  • path

Example:

<action id="ReformatCode" path="/src/Main.java" timestamp="1696214487353"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214490354"/>
<action id="RunClass" path="/src/Main.java" timestamp="1696214496053"/>
<action id="ToggleLineBreakpoint" path="/src/Main.java" timestamp="1696214500296"/>
<action id="EditorEnter" path="/src/Main.java" timestamp="1696214504846"/>
<action id="EditorBackSpace" path="/src/Main.java" timestamp="1696214505280"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214506877"/>
<action id="GotoDeclaration" path="/src/Main.java" timestamp="1696214513473"/>
<action id="CodeGRITS.StartStopTracking"
        path="C:/Program Files/Java/jdk-16.0.2/lib/src.zip!/java.base/java/io/PrintStream.java"
        timestamp="1696214517658"/>
<action id="EditorCopy" path="/src/Main.java" timestamp="1696216114539"/>
<action id="$Paste" path="/src/Main.java" timestamp="1696216116839"/>
<action id="$Undo" path="/src/Main.java" timestamp="1696216117569"/>
<action id="Debug" path="/src/Main.java" timestamp="1696216129173"/>
<action id="NewClass" path="/src" timestamp="1696217116236"/>
<action id="RenameElement" path="/src/ABC.java" timestamp="1696217122074"/>

Comment:

  • CodeGRITS-related actions are also implemented as AnAction objects, and their id is prefixed with CodeGRITS, such as CodeGRITS.StartStopTracking, CodeGRITS.PauseResumeTracking, etc.
  • The "add label" action is also tracked here, with id as "CodeGRITS.AddLabel.[LABEL_NAME]", where the label name is pre-set in the configuration.
  • Other IntelliJ plugins may also implement their own AnAction objects, which will also be tracked here. One example is copilot.applyInlays from the GitHub Copilot plugin.
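Because every action is identified only by its id, a frequency count is a natural first analysis. The sketch below uses a shortened copy of the example above; the function names are hypothetical, and the CodeGRITS. prefix check relies on the naming convention described in the comments.

```python
from collections import Counter
from typing import List
import xml.etree.ElementTree as ET

SAMPLE = """<actions>
  <action id="ReformatCode" path="/src/Main.java" timestamp="1696214487353"/>
  <action id="SaveAll" path="/src/Main.java" timestamp="1696214490354"/>
  <action id="RunClass" path="/src/Main.java" timestamp="1696214496053"/>
  <action id="SaveAll" path="/src/Main.java" timestamp="1696214506877"/>
  <action id="CodeGRITS.StartStopTracking" path="/src/Main.java" timestamp="1696214517658"/>
</actions>"""


def action_counts(actions_xml: str) -> Counter:
    """Count how often each action id occurs."""
    return Counter(a.get("id") for a in ET.fromstring(actions_xml).iter("action"))


def codegrits_actions(actions_xml: str) -> List[str]:
    """List the ids of actions triggered by CodeGRITS itself, in document order."""
    ids = (a.get("id") for a in ET.fromstring(actions_xml).iter("action"))
    return [action_id for action_id in ids if action_id.startswith("CodeGRITS.")]


print(action_counts(SAMPLE).most_common(3))
print(codegrits_actions(SAMPLE))
```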

# Typings

Element: <typings>

Sub-element: <typing>

Comment:

  • The <typings> element records the typing actions of the user in the code editor. The data includes the character, the timestamp, the path of the file, the line number, and the column number.

# Typing

Element: <typing>

Attribute:

  • character
  • timestamp
  • path
  • line
  • column

Example:

<typing character="S" column="8" line="3" path="/src/Main.java" timestamp="1696216429855"/>
<typing character="y" column="9" line="3" path="/src/Main.java" timestamp="1696216430111"/>
<typing character="s" column="10" line="3" path="/src/Main.java" timestamp="1696216430233"/>
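As a rough illustration of how the typing records can be used, the sketch below concatenates the typed characters of one file in timestamp order. `typed_text` is a hypothetical helper; it ignores deletions, pastes, and caret jumps, so it is not a faithful reconstruction of the file content (the archives serve that purpose).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<typings>
  <typing character="S" column="8" line="3" path="/src/Main.java" timestamp="1696216429855"/>
  <typing character="y" column="9" line="3" path="/src/Main.java" timestamp="1696216430111"/>
  <typing character="s" column="10" line="3" path="/src/Main.java" timestamp="1696216430233"/>
</typings>"""


def typed_text(typings_xml: str, path: str) -> str:
    """Join the characters typed in `path`, ordered by timestamp."""
    events = [t for t in ET.fromstring(typings_xml).iter("typing") if t.get("path") == path]
    events.sort(key=lambda t: int(t.get("timestamp")))
    return "".join(t.get("character") for t in events)


print(typed_text(SAMPLE, "/src/Main.java"))
```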

# Files

Element: <files>

Sub-element: <file>

Comment:

  • The <files> element records the file-related actions, including opening, closing, and selection change. The data includes the timestamp and the path of the file.

# File

Element: <file>

Attribute:

  • id
  • timestamp
  • path: only used in fileOpened/fileClosed
  • old_path: only used in selectionChanged
  • new_path: only used in selectionChanged

Example:

<file id="fileClosed" path="/src/Main.java" timestamp="1696216679318"/>
<file id="selectionChanged" new_path="/src/ABC.java" old_path="/src/Main.java"
      timestamp="1696216679330"/>
<file id="fileOpened" path="/src/ABC.java" timestamp="1696216679338"/>

Comment:

  • id could be fileOpened, fileClosed, or selectionChanged.

# Mouses

Element: <mouses>

Sub-element: <mouse>

Comment:

  • The <mouses> element records the mouse-related actions including pressing, releasing, clicking, moving, and dragging. The data includes the timestamp, the path of the file, the x-coordinate, and the y-coordinate.

# Mouse

Element: <mouse>

Attribute:

  • id
  • timestamp
  • path
  • x
  • y

Example:

<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217839651" x="642" y="120"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217840187" x="642" y="120"/>
<mouse id="mouseClicked" path="/src/DEF.java" timestamp="1696217840188" x="642" y="120"/>
<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217843026" x="642" y="120"/>
<mouse id="mouseDragged" path="/src/DEF.java" timestamp="1696217843026" x="634" y="118"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217843830" x="535" y="117"/>
<mouse id="mouseMoved" path="/src/DEF.java" timestamp="1696217843901" x="536" y="117"/>
<mouse id="mouseMoved" path="/src/DEF.java" timestamp="1696217843908" x="537" y="117"/>

Comment:

  • id could be mousePressed, mouseReleased, mouseClicked, mouseMoved, or mouseDragged.
  • x and y are the coordinates relative to the screen_size in the environment, not the actual screen resolution.
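To compare mouse coordinates with the screen recording or other pixel-based data, they may need to be scaled to the actual resolution. The sketch below assumes the same relation that the environment section gives for screen_size (logical value multiplied by scale_x and scale_y); `mouse_to_pixels` is a hypothetical helper.

```python
from typing import Tuple
import xml.etree.ElementTree as ET


def mouse_to_pixels(mouse: ET.Element, scale_x: float, scale_y: float) -> Tuple[float, float]:
    """Scale the logical x and y of a <mouse> element to physical pixels."""
    # Assumption: physical pixel = logical coordinate * scale factor.
    return int(mouse.get("x")) * scale_x, int(mouse.get("y")) * scale_y


event = ET.fromstring(
    '<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217839651" x="642" y="120"/>'
)
print(mouse_to_pixels(event, 1.25, 1.25))
```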

# Carets

Element: <carets>

Sub-element: <caret>

Comment:

  • Caret is the cursor in the code editor. The <carets> element records the change of the caret position in the code editor. The data includes the timestamp, the path of the file, the line number, and the column number.

# Caret

Element: <caret>

Attribute:

  • id
  • timestamp
  • path
  • line
  • column

Example:

<caret column="18" id="caretPositionChanged" line="0" path="/src/DEF.java" timestamp="1696217839651"/>

Comment:

  • id could only be caretPositionChanged.

# Selections

Element: <selections>

Sub-element: <selection>

Comment:

  • The <selections> element records data when the user selects a piece of code in the code editor. The data includes the timestamp, the path of the file, the start position, the end position, and the selected text.

# Selection

Element: <selection>

Attribute:

  • id
  • timestamp
  • path
  • start_position: line:column
  • end_position: line:column
  • selected_text

Example:

<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java" selected_text="F {" start_position="0:15"
           timestamp="1696219345156"/>
<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java" selected_text="EF {" start_position="0:14"
           timestamp="1696219345169"/>

Comment:

  • id could only be selectionChanged.

# Visible Areas

Element: <visible_areas>

Sub-element: <visible_area>

Comment:

  • The <visible_areas> element records the visible area of the code editor.

# Visible Area

Element: <visible_area>

Attribute:

  • id
  • timestamp
  • path
  • x
  • y
  • width
  • height

Example:

<visible_area height="277" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585893" width="883" x="0"
              y="198"/>
<visible_area height="275" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585921" width="883" x="0"
              y="198"/>
Comment:

  • id could only be visibleAreaChanged.
  • x and y are the coordinates of the top-left corner of the visible area in the code editor, relative to the top-left corner of the code editor including the invisible part (i.e., line 0 and column 0). x, y, width, and height are measured in the units of screen_size in the environment, not the actual screen resolution.
  • A change in x or y is usually caused by scrolling the code editor, so it can be used to track horizontal and vertical scrolling respectively. A change in width or height is usually caused by resizing the code editor, so it can be used to track horizontal and vertical resizing respectively.
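Scrolling and resizing can be derived by differencing consecutive records of the same file. The following sketch shows one possible way to do this; `visible_area_deltas` is a hypothetical helper, and the sample is wrapped in a <visible_areas> root for parsing.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<visible_areas>
  <visible_area height="277" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585893" width="883" x="0" y="198"/>
  <visible_area height="275" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585921" width="883" x="0" y="198"/>
</visible_areas>"""

FIELDS = ("x", "y", "width", "height")


def visible_area_deltas(visible_areas_xml: str):
    """Yield (timestamp, deltas) for each pair of consecutive records of the same file."""
    areas = sorted(
        ET.fromstring(visible_areas_xml).iter("visible_area"),
        key=lambda a: int(a.get("timestamp")),
    )
    for prev, curr in zip(areas, areas[1:]):
        if prev.get("path") != curr.get("path"):
            continue  # A different file became visible; deltas are not meaningful.
        # Non-zero x/y means scrolling; non-zero width/height means resizing.
        deltas = {f: int(curr.get(f)) - int(prev.get(f)) for f in FIELDS}
        yield int(curr.get("timestamp")), deltas


for timestamp, deltas in visible_area_deltas(SAMPLE):
    print(timestamp, deltas)
```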

# Eye Tracking

[OUTPUT_DIR]
├── [START_TIMESTAMP]
│   ├── eye_tracking.xml

  • <eye_tracking>
    • <setting>
    • <gazes>
      • <gaze>
        • <left_eye>
        • <right_eye>
        • <location>
        • <ast_structure>
          • <level>

Element: <eye_tracking>

Sub-element:

  • <setting>
  • <gazes>

Comment:

  • The root element of the eye_tracking.xml file. CodeGRITS supports both Mouse simulation and Tobii Pro eye tracker devices.
  • Since Tobii Pro SDK does not support Java, we use the Python library tobii-research to collect eye tracking data and use Java ProcessBuilder to call the Python script to collect data. The Python interpreter is specified in the configuration.

# Setting

Element: <setting>

Attribute:

  • eye_tracker
  • sampling_rate

Example:

<setting eye_tracker="Tobii Pro Fusion" sampling_rate="30"/>

Comment:

  • eye_tracker could be Mouse for simulation, or a real Tobii Pro eye tracker device name (e.g., Tobii Pro Fusion), which is obtained from eyetracker.model in the tobii-research library.
  • sampling_rate is the sampling rate of the eye tracker in Hz, which is pre-set in the configuration. For a Tobii Pro device, the supported values are those returned by eyetracker.get_all_gaze_output_frequencies() in the tobii-research library.

# Gazes

Element: <gazes>

Sub-element: <gaze>

Comment:

  • Collection of all gaze data.

# Gaze

Element: <gaze>

Sub-element:

  • <left_eye>
  • <right_eye>
  • <location>: only used when the gaze point can be mapped to its location in the code editor
  • <ast_structure>: only used when the gaze point can be mapped to its location in the code editor, and the code file is a Java file.

Attribute:

  • timestamp
  • remark: only used when the gaze point cannot be mapped to its location in the code editor

Example:

<gaze timestamp="1696224370377">
    <left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
              pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>
    <right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
               pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>
    <location column="25" line="2" path="/src/Main.java" x="820" y="150"/>
    <ast_structure token="println" type="IDENTIFIER">
        <level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
        <level end="2:26" start="2:8" tag="PsiReferenceExpression:System.out.println"/>
        <level end="2:42" start="2:8" tag="PsiMethodCallExpression:System.out.println(&quot;Hello world!&quot;)"/>
        <level end="2:43" start="2:8" tag="PsiExpressionStatement"/>
        <level end="3:5" start="1:43" tag="PsiCodeBlock"/>
        <level end="3:5" start="1:4" tag="PsiMethod:main"/>
        <level end="4:1" start="0:0" tag="PsiClass:Main"/>
    </ast_structure>
</gaze>

Comment:

The remark attribute is used in the following 3 cases, in which the gaze point cannot be mapped to its location in the code editor:

  1. The raw gaze point from the eye tracker is invalid (i.e., nan). In this case, the remark is Fail | Invalid Gaze Point.
  2. The code editor is not found. In this case, the remark is Fail | No Editor.
  3. The code editor is found, but the gaze point is outside the code editor. In this case, the remark is Fail | Out of Text Editor.
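When analyzing gaze data, it is often useful to separate successfully mapped gazes from the three failure cases first. The sketch below counts gazes per status. The sample is a simplified, hypothetical excerpt (eye sub-elements omitted, timestamps of the failed gazes invented for illustration), and the "Mapped" and "Unknown" labels are our own.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample: only the parts needed for classification.
SAMPLE = """<gazes>
  <gaze timestamp="1696224370377">
    <location column="25" line="2" path="/src/Main.java" x="820" y="150"/>
  </gaze>
  <gaze remark="Fail | No Editor" timestamp="1696224370410"/>
  <gaze remark="Fail | Out of Text Editor" timestamp="1696224370443"/>
</gazes>"""


def gaze_status(gaze: ET.Element) -> str:
    """Return the failure remark of a gaze, or "Mapped" if it has a <location>."""
    remark = gaze.get("remark")
    if remark:
        return remark
    if gaze.find("location") is not None:
        return "Mapped"
    return "Unknown"  # Neither remark nor location; not expected per the format.


def status_counts(gazes_xml: str) -> Counter:
    """Count gazes per status across the whole <gazes> element."""
    return Counter(gaze_status(g) for g in ET.fromstring(gazes_xml).iter("gaze"))


print(status_counts(SAMPLE))
```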

# Left Eye

Element: <left_eye>

Attribute:

  • gaze_point_x
  • gaze_point_y
  • gaze_validity
  • pupil_diameter
  • pupil_validity

Example:

<left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
          pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>

Comment:

  • gaze_point_x and gaze_point_y are the location on the screen, ranging from 0 to 1, where (0, 0) is the top-left corner of the screen, and (1, 1) is the bottom-right corner of the screen.
  • gaze_validity and pupil_validity are the validity of the gaze point and pupil diameter, which are binary: 0 for invalid, 1 for valid. When using a mouse to simulate the eye tracker, gaze_validity is always 1.0, and pupil_validity is always 0.0.
  • pupil_diameter is the diameter of the pupil in mm. When using a mouse to simulate the eye tracker, pupil_diameter is always 0.
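The normalized gaze coordinates can be turned into a pixel position by averaging the valid eyes and multiplying by the screen resolution. The sketch below assumes that the normalized range covers the real screen resolution, (1920, 1080) in the environment example; `gaze_pixels` is a hypothetical helper, and averaging both eyes is only one common choice.

```python
from typing import Optional, Tuple
import xml.etree.ElementTree as ET

SAMPLE = """<gaze timestamp="1696224370377">
  <left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
            pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>
  <right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
             pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>
</gaze>"""


def gaze_pixels(gaze: ET.Element, resolution: Tuple[int, int]) -> Optional[Tuple[float, float]]:
    """Average the valid eyes of a <gaze> and scale the result to pixels."""
    points = []
    for tag in ("left_eye", "right_eye"):
        eye = gaze.find(tag)
        # Only use eyes whose gaze point is marked valid.
        if eye is not None and float(eye.get("gaze_validity")) == 1.0:
            points.append((float(eye.get("gaze_point_x")), float(eye.get("gaze_point_y"))))
    if not points:
        return None
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    return mean_x * resolution[0], mean_y * resolution[1]


print(gaze_pixels(ET.fromstring(SAMPLE), (1920, 1080)))
```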

# Right Eye

Element: <right_eye>

Attribute:

  • gaze_point_x
  • gaze_point_y
  • gaze_validity
  • pupil_diameter
  • pupil_validity

Example:

<right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
           pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>

Comment:

  • gaze_point_x and gaze_point_y are the location on the screen, ranging from 0 to 1, where (0, 0) is the top-left corner of the screen, and (1, 1) is the bottom-right corner of the screen.
  • gaze_validity and pupil_validity are the validity of the gaze point and pupil diameter, which are binary: 0 for invalid, 1 for valid. When using a mouse to simulate the eye tracker, gaze_validity is always 1.0, and pupil_validity is always 0.0.
  • pupil_diameter is the diameter of the pupil in mm. When using a mouse to simulate the eye tracker, pupil_diameter is always 0.

# Location

Element: <location>

Attribute:

  • path
  • line
  • column
  • x
  • y

Example:

<location column="25" line="2" path="/src/Main.java" x="820" y="150"/>

Comment:

  • x and y are the coordinates of the gaze relative to the top-left corner of the visible code editor, measured in the same units as screen_size in the environment, not the actual screen resolution.
  • line and column are the line number and column number of the gaze point in the code editor, which is calculated by xyToLogicalPosition(@NotNull Point p) method of Editor interface in the IntelliJ Platform SDK.

# AST Structure

Element: <ast_structure>

Sub-element: <level>: only used when the current token is different from the previous token

Attribute:

  • token
  • type
  • remark: only used when the current token is the same as the previous token

Example:

<ast_structure token="println" type="IDENTIFIER">
  <level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
  <level end="2:26" start="2:8" tag="PsiReferenceExpression:System.out.println"/>
  <level end="2:42" start="2:8" tag="PsiMethodCallExpression:System.out.println(&quot;Hello world!&quot;)"/>
  <level end="2:43" start="2:8" tag="PsiExpressionStatement"/>
  <level end="3:5" start="1:43" tag="PsiCodeBlock"/>
  <level end="3:5" start="1:4" tag="PsiMethod:main"/>
  <level end="4:1" start="0:0" tag="PsiClass:Main"/>
</ast_structure>
<ast_structure remark="Same (Last Successful AST)" token="println" type="IDENTIFIER"/>

Comment:

  • The abstract syntax tree (AST) of the code file is recorded in the <ast_structure> element. The AST is calculated by the Program Structure Interface (PSI) of the IntelliJ Platform.
  • token is the text of the leaf node in the AST of the current gaze point, which is calculated by psiElement.getText().
  • type is the type of the leaf node, which is calculated by psiElement.getNode().getElementType().
  • remark is used when the current token is the same as the previous token, which means the gaze point is still in the same leaf node. In this case, the remark is Same (Last Successful AST). We designed this mechanism to prevent eye_tracking.xml from becoming too large.
  • We calculate the parent nodes of the leaf node by psiElement.getParent() until the file level (i.e., PsiFile), and save them in the <level> elements. In the previous example, the leaf node is PsiIdentifier:println, and its parent nodes are PsiReferenceExpression:System.out.println => PsiMethodCallExpression:System.out.println("Hello world!") => PsiExpressionStatement => PsiCodeBlock => PsiMethod:main => PsiClass:Main. The original code text is:
    public class Main {
        public static void main(String[] args) {
            System.out.println("Hello world!");
        }
    }
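Because a repeated token is stored only with the Same (Last Successful AST) remark, a reader has to carry the last full list of levels forward. The sketch below shows one way to do that; `ast_chains` is a hypothetical helper, and the sample wraps the two <ast_structure> elements from the example in minimal <gaze> elements for parsing.

```python
from typing import Iterator, List, Tuple
import xml.etree.ElementTree as ET

SAMPLE = """<gazes>
  <gaze><ast_structure token="println" type="IDENTIFIER">
    <level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
    <level end="2:26" start="2:8" tag="PsiReferenceExpression:System.out.println"/>
    <level end="2:42" start="2:8" tag="PsiMethodCallExpression:System.out.println(&quot;Hello world!&quot;)"/>
    <level end="2:43" start="2:8" tag="PsiExpressionStatement"/>
    <level end="3:5" start="1:43" tag="PsiCodeBlock"/>
    <level end="3:5" start="1:4" tag="PsiMethod:main"/>
    <level end="4:1" start="0:0" tag="PsiClass:Main"/>
  </ast_structure></gaze>
  <gaze><ast_structure remark="Same (Last Successful AST)" token="println" type="IDENTIFIER"/></gaze>
</gazes>"""

Position = Tuple[int, int]
Level = Tuple[str, Position, Position]


def parse_position(text: str) -> Position:
    """Parse a "line:column" string into a (line, column) tuple."""
    line, column = text.split(":")
    return int(line), int(column)


def ast_chains(gazes_xml: str) -> Iterator[Tuple[str, List[Level]]]:
    """Yield (token, levels) per <ast_structure>, reusing levels for "Same" remarks."""
    last_levels: List[Level] = []
    for ast in ET.fromstring(gazes_xml).iter("ast_structure"):
        if ast.get("remark") == "Same (Last Successful AST)":
            levels = last_levels  # Same leaf node as before: reuse the stored chain.
        else:
            levels = [
                (lvl.get("tag"), parse_position(lvl.get("start")), parse_position(lvl.get("end")))
                for lvl in ast.findall("level")
            ]
            last_levels = levels
        yield ast.get("token"), levels


for token, levels in ast_chains(SAMPLE):
    print(token, [tag for tag, _, _ in levels])
```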

# Level

Element: <level>

Attribute:

  • start: line:column
  • end: line:column
  • tag

Example:

<level end="3:5" start="1:4" tag="PsiMethod:main"/>

Comment:

  • start and end are the start and end position of the AST node level in the code file, which is calculated by psiElement.getTextRange().
  • tag is the type of the AST node level, which is calculated by psiElement.toString().

# Screen Recording

[OUTPUT_DIR]
├── [START_TIMESTAMP]
│   ├── screen_recording
│   │   ├── clip_1.mp4
│   │   ├── clip_2.mp4
│   │   ├── ...
│   │   ├── frames.csv

  • clip_[k].mp4
  • frames.csv

# Video Clips

clip_[k].mp4

Comment:

  • The video clip of the screen recording from the (k-1)-th pause (the 0-th pause is the start of tracking) to the k-th pause. We designed this mechanism to keep the video data held in memory from growing too large, especially when tracking is paused for a long time.

# Frames

frames.csv

Column:

  • timestamp
  • frame_number
  • clip_number

Example:

timestamp,frame_number,clip_number
1703661629399,Start,1
1703661630996,1,1
1703661631247,2,1
1703661644518,Pause,1
1703661646446,Resume,2
1703661646824,1,2
1703661647737,Stop,2

Comment:

  • The frame rate is 12 fps.
  • frame_number is the frame number of the frame in its video clip.
  • clip_number is the number of the video clip to which the frame belongs.
  • We also record the timestamp of Start, Pause, Resume, and Stop actions in the frames.csv file, which could be used to separate each stage of the development process.
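To align other data (e.g., a gaze or an action) with the screen recording, one can look up the frame shown at a given timestamp. The sketch below is one possible approach, assuming that the rows of frames.csv are ordered by timestamp; `locate_frame` is a hypothetical helper.

```python
import csv
import io
from typing import Optional, Tuple

SAMPLE = """timestamp,frame_number,clip_number
1703661629399,Start,1
1703661630996,1,1
1703661631247,2,1
1703661644518,Pause,1
1703661646446,Resume,2
1703661646824,1,2
1703661647737,Stop,2
"""


def locate_frame(frames_csv: str, target_ms: int) -> Optional[Tuple[int, int]]:
    """Return (clip_number, frame_number) of the latest frame at or before target_ms."""
    last = None
    for row in csv.DictReader(io.StringIO(frames_csv)):
        if int(row["timestamp"]) > target_ms:
            break  # Rows are assumed to be sorted by timestamp.
        last = row
    # Start, Pause, Resume, and Stop rows mark stages, not frames.
    if last is None or not last["frame_number"].isdigit():
        return None
    return int(last["clip_number"]), int(last["frame_number"])


print(locate_frame(SAMPLE, 1703661631000))
```

The function returns None for timestamps before the first frame, during a pause, or after the recording has stopped.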