#
Data Format
#
Data Directory Structure
[OUTPUT_DIR]
├── [START_TIMESTAMP]
│ ├── ide_tracking.xml
│ ├── eye_tracking.xml
│ ├── archives
│ │ ├── [ARCHIVE_TIMESTAMP_1].archive
│ │ ├── [ARCHIVE_TIMESTAMP_2].archive
│ │ ├── ...
│ ├── screen_recording
│ │ ├── clip_1.mp4
│ │ ├── clip_2.mp4
│ │ ├── ...
│ │ ├── frames.csv
Comment:
- `[OUTPUT_DIR]` is the output directory specified in the configuration.
- `[START_TIMESTAMP]` is the timestamp when the tracking starts.
- `[ARCHIVE_TIMESTAMP]` is the timestamp when the archive is triggered.
- `clip_[k].mp4` is the video clip of the screen recording from the (k-1)-th pause (the 0-th pause is the start) to the k-th pause.
- `frames.csv` records the timestamp and clip number of each frame in the video clips.
All the timestamps used by CodeGRITS are Unix time in milliseconds, starting from 1970-01-01 00:00:00 UTC.
The editor coordinate system (e.g., line, column) of IntelliJ Platform starts from 0.
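As a minimal sketch of these two conventions, the Python snippet below converts a millisecond timestamp to a UTC datetime and shifts a 0-based editor position to the 1-based form that editors usually display. The helper names `to_utc` and `to_one_based` are illustrative and not part of CodeGRITS.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(timestamp_ms):
    """Convert a Unix timestamp in milliseconds (int or str) to an aware UTC datetime."""
    # Integer arithmetic on a timedelta avoids float rounding of the millisecond part.
    return EPOCH + timedelta(milliseconds=int(timestamp_ms))


def to_one_based(line, column):
    """Shift a 0-based (line, column) pair to 1-based numbering."""
    return line + 1, column + 1


if __name__ == "__main__":
    print(to_utc("1696203834202").isoformat())
    print(to_one_based(0, 18))
```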
#
IDE Tracking
[OUTPUT_DIR]
├── [START_TIMESTAMP]
│ ├── ide_tracking.xml
│ ├── archives
│ │ ├── [ARCHIVE_TIMESTAMP_1].archive
│ │ ├── [ARCHIVE_TIMESTAMP_2].archive
│ │ ├── ...
<ide_tracking>
├── <environment>
├── <archives>
│ ├── <archive>
├── <actions>
│ ├── <action>
├── <typings>
│ ├── <typing>
├── <files>
│ ├── <file>
├── <mouses>
│ ├── <mouse>
├── <carets>
│ ├── <caret>
├── <selections>
│ ├── <selection>
├── <visible_areas>
│ ├── <visible_area>
Element: <ide_tracking>
Sub-element: <environment>, <archives>, <actions>, <typings>, <files>, <mouses>, <carets>, <selections>, <visible_areas>
Comment:
- The root element of the `ide_tracking.xml` file.
#
Environment
Element: <environment>
Attribute:
- ide_name
- ide_version
- os_name
- java_version
- project_name
- project_path
- screen_size
- scale_x
- scale_y
Example:
<environment ide_name="IntelliJ IDEA" ide_version="2022.2.5" java_version="17.0.6" os_name="Windows 10"
project_name="HelloWorld" project_path="C:/Users/Lenovo/IdeaProjects/HelloWorld" scale_x="1.25"
scale_y="1.25" screen_size="(1536,864)"/>
Comment:
- `scale_x` and `scale_y` are used to calculate the real screen resolution from `screen_size`. In the example above, the real screen resolution is (1536*1.25, 864*1.25) = (1920, 1080).
- `java_version` is replaced by `python_version` in PyCharm, etc.
- All `path` attributes in the data that start with `/` are relative to `project_path`; otherwise they are absolute paths. Sometimes the path is empty, which means the data is not related to any file or the file was not successfully tracked.
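The two rules above can be sketched in Python as follows. The snippet parses the example <environment> element, computes the real screen resolution, and resolves a `path` attribute against `project_path`. The function names `real_resolution` and `resolve_path` are hypothetical helpers, not part of CodeGRITS.

```python
import xml.etree.ElementTree as ET
from ast import literal_eval

ENV_XML = """<environment ide_name="IntelliJ IDEA" ide_version="2022.2.5" java_version="17.0.6"
    os_name="Windows 10" project_name="HelloWorld"
    project_path="C:/Users/Lenovo/IdeaProjects/HelloWorld" scale_x="1.25"
    scale_y="1.25" screen_size="(1536,864)"/>"""


def real_resolution(env):
    """Scale screen_size by scale_x and scale_y to get the physical resolution."""
    width, height = literal_eval(env.get("screen_size"))  # "(1536,864)" -> (1536, 864)
    return round(width * float(env.get("scale_x"))), round(height * float(env.get("scale_y")))


def resolve_path(env, path):
    """Return an absolute path, or None when the path attribute is empty."""
    if not path:
        return None
    if path.startswith("/"):
        # Paths starting with "/" are relative to project_path.
        return env.get("project_path") + path
    return path


env = ET.fromstring(ENV_XML)
print(real_resolution(env))
print(resolve_path(env, "/src/Main.java"))
```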
#
Archives
Element: <archives>
Sub-element: <archive>
Comment:
A real-time archive mechanism is implemented to track the state of the code file and console output at any timestamp during the development process. The file archive is triggered under two specific conditions: (1) When a file is opened or closed, or its selection changes; (2) When the content of the code in the main editor changes. The console archive is triggered when the console output changes (e.g., run class).
The archived data is stored in the `archives` directory, with the name `[ARCHIVE_TIMESTAMP].archive`, where `[ARCHIVE_TIMESTAMP]` is the timestamp when the archive is triggered. Relevant information is stored in the <archive> element, including the timestamp, the path of the file, and the remark.
Thus, if you want to know the state of the code file at a specific timestamp, you can find the archive file with the largest timestamp that is smaller than the target timestamp.
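The lookup described above can be sketched in Python as follows. The function `latest_file_archive` is a hypothetical helper; it assumes that failed archives (remarks containing `Fail`) should be skipped and that only `fileArchive` records for the requested path are relevant.

```python
import xml.etree.ElementTree as ET

# Sample <archive> records wrapped in their <archives> parent.
ARCHIVES_XML = """<archives>
<archive id="fileArchive" path="/src/Main.java" remark="fileOpened" timestamp="1696203834202"/>
<archive id="fileArchive" path="/1696203101069/ide_tracking.xml"
         remark="fileOpened | NotCodeFile | Fail" timestamp="1696203834208"/>
<archive id="fileArchive" path="/src/Main.java" remark="contentChanged" timestamp="1696203839648"/>
<archive id="consoleArchive" timestamp="1696203842925"/>
</archives>"""


def latest_file_archive(archives, path, target_ms):
    """Return the file name of the newest successful archive of `path`
    whose timestamp is smaller than target_ms, or None if there is none."""
    candidates = [
        int(a.get("timestamp"))
        for a in archives.iter("archive")
        if a.get("id") == "fileArchive"
        and a.get("path") == path
        and "Fail" not in a.get("remark", "")
        and int(a.get("timestamp")) < target_ms
    ]
    return f"{max(candidates)}.archive" if candidates else None


archives = ET.fromstring(ARCHIVES_XML)
print(latest_file_archive(archives, "/src/Main.java", 1696203840000))
```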
#
Archive
Element: <archive>
Attribute:
- id
- timestamp
- path: only used in `fileArchive`
- remark: only used in `fileArchive`
Example:
<archive id="fileArchive" path="/src/Main.java" remark="fileOpened" timestamp="1696203834202"/>
<archive id="fileArchive" path="/1696203101069/ide_tracking.xml" remark="fileOpened | NotCodeFile | Fail"
timestamp="1696203834208"/>
<archive id="fileArchive" path="/src/Main.java" remark="contentChanged" timestamp="1696203839648"/>
<archive id="consoleArchive" timestamp="1696203842925"/>
Comment:
- `id` could be `fileArchive` or `consoleArchive`.
- `remark` could be `fileOpened`, `fileClosed`, `fileSelectionChanged`, `contentChanged | OldFile`, or `contentChanged | NewFile`.
- If the file is not a code file, i.e., the file extension is not one of ".java", ".cpp", ".c", ".py", ".rb", ".js", or ".md", `NotCodeFile | Fail` is added to the remark. This prevents archiving large data files.
- If an IO error occurs when archiving the file, `IOException | Fail` is added to the remark.
#
Actions
Element: <actions>
Sub-element: <action>
Comment:
- The elements in <actions> are all IDE-specific features; technically, they are all objects that implement the `AnAction` abstract class in IntelliJ IDEA. The range is diverse: basic editing features like `EditorEnter` and `EditorBackSpace`, clipboard features like `EditorPaste` and `EditorCut`, run features like `RunClass`, `Stop`, `ToggleLineBreakpoint`, and `Debug`, navigation features like `GotoDeclaration`, `Find`, and `ShowIntentionActions`, advanced IDE features like `CompareTwoFiles` and `ReformatCode`, and many others that cannot be fully listed here.
#
Action
Element: <action>
Attribute:
- id
- timestamp
- path
Example:
<action id="ReformatCode" path="/src/Main.java" timestamp="1696214487353"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214490354"/>
<action id="RunClass" path="/src/Main.java" timestamp="1696214496053"/>
<action id="ToggleLineBreakpoint" path="/src/Main.java" timestamp="1696214500296"/>
<action id="EditorEnter" path="/src/Main.java" timestamp="1696214504846"/>
<action id="EditorBackSpace" path="/src/Main.java" timestamp="1696214505280"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214506877"/>
<action id="GotoDeclaration" path="/src/Main.java" timestamp="1696214513473"/>
<action id="CodeGRITS.StartStopTracking"
path="C:/Program Files/Java/jdk-16.0.2/lib/src.zip!/java.base/java/io/PrintStream.java"
timestamp="1696214517658"/>
<action id="EditorCopy" path="/src/Main.java" timestamp="1696216114539"/>
<action id="$Paste" path="/src/Main.java" timestamp="1696216116839"/>
<action id="$Undo" path="/src/Main.java" timestamp="1696216117569"/>
<action id="Debug" path="/src/Main.java" timestamp="1696216129173"/>
<action id="NewClass" path="/src" timestamp="1696217116236"/>
<action id="RenameElement" path="/src/ABC.java" timestamp="1696217122074"/>
Comment:
- CodeGRITS-related actions are also implemented as `AnAction` objects, and their `id` is prefixed with `CodeGRITS`, such as `CodeGRITS.StartStopTracking`, `CodeGRITS.PauseResumeTracking`, etc.
- The "add label" action is also tracked here, with `id` as `"CodeGRITS.AddLabel.[LABEL_NAME]"`, where the label name is pre-set in the configuration.
- Other IntelliJ plugins may also implement their own `AnAction` objects, which will also be tracked here. For example, `copilot.applyInlays` in the GitHub Copilot plugin.
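A minimal Python sketch of how <action> records might be summarized: it counts actions by `id` and extracts the labels added through the "add label" action. The helper names are illustrative, and the label name `Confused` and its timestamp in the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The first three records follow the example above; the last one uses a
# hypothetical label name and timestamp for illustration only.
ACTIONS_XML = """<actions>
<action id="ReformatCode" path="/src/Main.java" timestamp="1696214487353"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214490354"/>
<action id="SaveAll" path="/src/Main.java" timestamp="1696214506877"/>
<action id="CodeGRITS.AddLabel.Confused" path="/src/Main.java" timestamp="1696214510000"/>
</actions>"""

LABEL_PREFIX = "CodeGRITS.AddLabel."


def count_actions(actions):
    """Count how often each action id occurs."""
    return Counter(a.get("id") for a in actions.iter("action"))


def extract_labels(actions):
    """Return (timestamp, label_name) pairs for every "add label" action."""
    return [
        (int(a.get("timestamp")), a.get("id")[len(LABEL_PREFIX):])
        for a in actions.iter("action")
        if a.get("id", "").startswith(LABEL_PREFIX)
    ]


actions = ET.fromstring(ACTIONS_XML)
print(count_actions(actions).most_common(1))
print(extract_labels(actions))
```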
#
Typings
Element: <typings>
Sub-element: <typing>
Comment:
- The <typings> element records the typing actions of the user in the code editor. The data includes the character, the timestamp, the path of the file, the line number, and the column number.
#
Typing
Element: <typing>
Attribute:
- character
- timestamp
- path
- line
- column
Example:
<typing character="S" column="8" line="3" path="/src/Main.java" timestamp="1696216429855"/>
<typing character="y" column="9" line="3" path="/src/Main.java" timestamp="1696216430111"/>
<typing character="s" column="10" line="3" path="/src/Main.java" timestamp="1696216430233"/>
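As a rough illustration of how <typing> records can be consumed, the Python sketch below rebuilds the typed character sequence for one file and computes the intervals between keystrokes. The helper names `typed_text` and `inter_key_intervals` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TYPINGS_XML = """<typings>
<typing character="S" column="8" line="3" path="/src/Main.java" timestamp="1696216429855"/>
<typing character="y" column="9" line="3" path="/src/Main.java" timestamp="1696216430111"/>
<typing character="s" column="10" line="3" path="/src/Main.java" timestamp="1696216430233"/>
</typings>"""


def typed_text(typings, path):
    """Concatenate the characters typed in `path`, ordered by timestamp."""
    events = sorted(
        (t for t in typings.iter("typing") if t.get("path") == path),
        key=lambda t: int(t.get("timestamp")),
    )
    return "".join(t.get("character") for t in events)


def inter_key_intervals(typings):
    """Return the gaps in milliseconds between consecutive keystrokes."""
    stamps = sorted(int(t.get("timestamp")) for t in typings.iter("typing"))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


typings = ET.fromstring(TYPINGS_XML)
print(typed_text(typings, "/src/Main.java"))
print(inter_key_intervals(typings))
```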
#
Files
Element: <files>
Sub-element: <file>
Comment:
- The <files> element records the file-related actions including opening, closing, and selection change. The data includes the timestamp and the path of the file.
#
File
Element: <file>
Attribute:
- id
- timestamp
- path: only used in `fileOpened`/`fileClosed`
- old_path: only used in `selectionChanged`
- new_path: only used in `selectionChanged`
Example:
<file id="fileClosed" path="/src/Main.java" timestamp="1696216679318"/>
<file id="selectionChanged" new_path="/src/ABC.java" old_path="/src/Main.java"
timestamp="1696216679330"/>
<file id="fileOpened" path="/src/ABC.java" timestamp="1696216679338"/>
Comment:
- `id` could be `fileOpened`, `fileClosed`, or `selectionChanged`.
#
Mouses
Element: <mouses>
Sub-element: <mouse>
Comment:
- The <mouses> element records the mouse-related actions including pressing, releasing, clicking, moving, and dragging. The data includes the timestamp, the path of the file, the x-coordinate, and the y-coordinate.
#
Mouse
Element: <mouse>
Attribute:
- id
- timestamp
- path
- x
- y
Example:
<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217839651" x="642" y="120"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217840187" x="642" y="120"/>
<mouse id="mouseClicked" path="/src/DEF.java" timestamp="1696217840188" x="642" y="120"/>
<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217843026" x="642" y="120"/>
<mouse id="mouseDragged" path="/src/DEF.java" timestamp="1696217843026" x="634" y="118"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217843830" x="535" y="117"/>
<mouse id="mouseMoved" path="/src/DEF.java" timestamp="1696217843901" x="536" y="117"/>
<mouse id="mouseMoved" path="/src/DEF.java" timestamp="1696217843908" x="537" y="117"/>
Comment:
- `id` could be `mousePressed`, `mouseReleased`, `mouseClicked`, `mouseMoved`, or `mouseDragged`.
- `x` and `y` are the coordinates relative to the `screen_size` in the `environment`, not the actual screen resolution.
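The sketch below shows one possible way to use <mouse> records in Python: converting coordinates from `screen_size` units to real pixels with the scale factors from <environment>, and pairing press and release events into drag gestures. Both helpers are hypothetical, and the pairing logic is an assumption about how the events are ordered rather than documented CodeGRITS behavior.

```python
import xml.etree.ElementTree as ET

MOUSES_XML = """<mouses>
<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217839651" x="642" y="120"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217840187" x="642" y="120"/>
<mouse id="mouseClicked" path="/src/DEF.java" timestamp="1696217840188" x="642" y="120"/>
<mouse id="mousePressed" path="/src/DEF.java" timestamp="1696217843026" x="642" y="120"/>
<mouse id="mouseDragged" path="/src/DEF.java" timestamp="1696217843026" x="634" y="118"/>
<mouse id="mouseReleased" path="/src/DEF.java" timestamp="1696217843830" x="535" y="117"/>
</mouses>"""


def to_real_pixels(x, y, scale_x, scale_y):
    """Convert screen_size units to physical pixels."""
    return x * scale_x, y * scale_y


def find_drags(mouses):
    """Pair each press with its release, keeping only gestures that include a drag."""
    drags, press, dragged = [], None, False
    for m in mouses.iter("mouse"):
        kind = m.get("id")
        if kind == "mousePressed":
            press, dragged = m, False
        elif kind == "mouseDragged":
            dragged = press is not None
        elif kind == "mouseReleased":
            if press is not None and dragged:
                drags.append({
                    "start": (int(press.get("x")), int(press.get("y"))),
                    "end": (int(m.get("x")), int(m.get("y"))),
                    "duration_ms": int(m.get("timestamp")) - int(press.get("timestamp")),
                })
            press, dragged = None, False
    return drags


mouses = ET.fromstring(MOUSES_XML)
print(find_drags(mouses))
print(to_real_pixels(642, 120, 1.25, 1.25))
```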
#
Carets
Element: <carets>
Sub-element: <caret>
Comment:
- The caret is the cursor in the code editor. The <carets> element records the change of the caret position in the code editor. The data includes the timestamp, the path of the file, the line number, and the column number.
#
Caret
Element: <caret>
Attribute:
- id
- timestamp
- path
- line
- column
Example:
<caret column="18" id="caretPositionChanged" line="0" path="/src/DEF.java" timestamp="1696217839651"/>
Comment:
- `id` could only be `caretPositionChanged`.
#
Selections
Element: <selections>
Sub-element: <selection>
Comment:
- The <selections> element records data when the user selects a piece of code in the code editor. The data includes the timestamp, the path of the file, the start position, the end position, and the selected text.
#
Selection
Element: <selection>
Attribute:
- id
- timestamp
- path
- start_position: line:column
- end_position: line:column
- selected_text
Example:
<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java" selected_text="F {" start_position="0:15"
timestamp="1696219345156"/>
<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java" selected_text="EF {" start_position="0:14"
timestamp="1696219345169"/>
Comment:
- `id` could only be `selectionChanged`.
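The `line:column` positions can be parsed as in the following Python sketch. For a selection that stays on one line, the column span should match the length of `selected_text`, which holds for the two example records. The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

SELECTIONS_XML = """<selections>
<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java"
           selected_text="F {" start_position="0:15" timestamp="1696219345156"/>
<selection end_position="0:18" id="selectionChanged" path="/src/DEF.java"
           selected_text="EF {" start_position="0:14" timestamp="1696219345169"/>
</selections>"""


def parse_position(text):
    """Split a "line:column" string into a pair of 0-based integers."""
    line, column = text.split(":")
    return int(line), int(column)


def single_line_span(selection):
    """Return the selected column span, or None for a multi-line selection."""
    start_line, start_col = parse_position(selection.get("start_position"))
    end_line, end_col = parse_position(selection.get("end_position"))
    return end_col - start_col if start_line == end_line else None


for sel in ET.fromstring(SELECTIONS_XML).iter("selection"):
    print(single_line_span(sel), repr(sel.get("selected_text")))
```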
#
Visible Areas
Element: <visible_areas>
Sub-element: <visible_area>
Comment:
- The <visible_areas> element records the visible area of the code editor.
#
Visible Area
Element: <visible_area>
Attribute:
- id
- timestamp
- path
- x
- y
- width
- height
Example:
<visible_area height="277" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585893" width="883" x="0"
y="198"/>
<visible_area height="275" id="visibleAreaChanged" path="/src/DEF.java" timestamp="1696219585921" width="883" x="0"
y="198"/>
Comment:
- `id` could only be `visibleAreaChanged`.
- `x` and `y` are the coordinates of the top-left corner of the visible area in the code editor, relative to the top-left corner of the code editor including the invisible part (i.e., line 0 and column 0). `x`, `y`, `width`, and `height` are measured in the units of `screen_size` in the `environment`, not the actual screen resolution.
- A change of `x` or `y` is usually caused by scrolling the code editor, so it can be used to track horizontal and vertical scrolling respectively. A change of `width` or `height` is usually caused by resizing the code editor, so it can be used to track horizontal and vertical resizing respectively.
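The scrolling and resizing rules above can be sketched in Python by comparing two consecutive <visible_area> records. The function `classify_change` and its label strings are hypothetical, and the third record in the sample (with `y="240"`) is invented to illustrate a vertical scroll.

```python
import xml.etree.ElementTree as ET

# The first two records follow the example above; the third is hypothetical.
AREAS_XML = """<visible_areas>
<visible_area height="277" id="visibleAreaChanged" path="/src/DEF.java"
              timestamp="1696219585893" width="883" x="0" y="198"/>
<visible_area height="275" id="visibleAreaChanged" path="/src/DEF.java"
              timestamp="1696219585921" width="883" x="0" y="198"/>
<visible_area height="275" id="visibleAreaChanged" path="/src/DEF.java"
              timestamp="1696219586000" width="883" x="0" y="240"/>
</visible_areas>"""

CHANGE_LABELS = {
    "x": "horizontal_scroll",
    "y": "vertical_scroll",
    "width": "horizontal_resize",
    "height": "vertical_resize",
}


def classify_change(prev, curr):
    """Return the set of change kinds between two consecutive records."""
    return {
        label
        for attr, label in CHANGE_LABELS.items()
        if int(prev.get(attr)) != int(curr.get(attr))
    }


areas = list(ET.fromstring(AREAS_XML).iter("visible_area"))
for prev, curr in zip(areas, areas[1:]):
    print(curr.get("timestamp"), classify_change(prev, curr))
```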
#
Eye Tracking
[OUTPUT_DIR]
├── [START_TIMESTAMP]
│ ├── eye_tracking.xml
<eye_tracking>
├── <setting>
├── <gazes>
│ ├── <gaze>
│ │ ├── <left_eye>
│ │ ├── <right_eye>
│ │ ├── <location>
│ │ ├── <ast_structure>
│ │ │ ├── <level>
Element: <eye_tracking>
Sub-element: <setting>, <gazes>
Comment:
- The root element of the `eye_tracking.xml` file. CodeGRITS supports both mouse simulation and Tobii Pro eye tracker devices.
- Since the Tobii Pro SDK does not support Java, we use the Python library `tobii-research` to collect eye tracking data and use Java `ProcessBuilder` to call the Python script. The Python interpreter is specified in the configuration.
#
Setting
Element: <setting>
Attribute:
- eye_tracker
- sampling_rate
Example:
<setting eye_tracker="Tobii Pro Fusion" sample_frequency="30"/>
Comment:
- `eye_tracker` could be `Mouse` for simulation, or a real Tobii Pro eye tracker device name (e.g., `Tobii Pro Fusion`), which is obtained from `eyetracker.model` in the `tobii-research` library.
- `sampling_rate` is the sampling rate of the eye tracker in Hz, which is pre-set in the configuration. Its valid values are those returned by `eyetracker.get_all_gaze_output_frequencies()` in the `tobii-research` library.
#
Gazes
Element: <gazes>
Sub-element: <gaze>
Comment:
- Collection of all gaze data.
#
Gaze
Element: <gaze>
Sub-element:
- <left_eye>
- <right_eye>
- <location>: only used when the gaze point can be mapped to its location in the code editor
- <ast_structure>: only used when the gaze point can be mapped to its location in the code editor, and the code file is Java
Attribute:
- timestamp
- remark: only used when the gaze point cannot be mapped to location in the code editor
Example:
<gaze timestamp="1696224370377">
<left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>
<right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>
<location column="25" line="2" path="/src/Main.java" x="820" y="150"/>
<ast_structure token="println" type="IDENTIFIER">
<level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
<level end="2:26" start="2:8" tag="PsiReferenceExpression:System.out.println"/>
<level end="2:42" start="2:8" tag="PsiMethodCallExpression:System.out.println(&quot;Hello world!&quot;)"/>
<level end="2:43" start="2:8" tag="PsiExpressionStatement"/>
<level end="3:5" start="1:43" tag="PsiCodeBlock"/>
<level end="3:5" start="1:4" tag="PsiMethod:main"/>
<level end="4:1" start="0:0" tag="PsiClass:Main"/>
</ast_structure>
</gaze>
Comment:
When the gaze point cannot be mapped to its location in the code editor, the `remark` attribute is used. There are 3 cases:
- The raw gaze point from the eye tracker is invalid (i.e., NaN). In this case, the `remark` is `Fail | Invalid Gaze Point`.
- The code editor is not found. In this case, the `remark` is `Fail | No Editor`.
- The code editor is found, but the gaze point is outside the code editor. In this case, the `remark` is `Fail | Out of Text Editor`.
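A minimal Python sketch of how the three failure cases might be tallied when reading <gaze> records. The helper `summarize_gazes` is hypothetical, and the three failing records in the sample (including their timestamps) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Only the first record follows the example above; the failing records are
# hypothetical and omit the eye sub-elements for brevity.
GAZES_XML = """<gazes>
<gaze timestamp="1696224370377">
  <location column="25" line="2" path="/src/Main.java" x="820" y="150"/>
</gaze>
<gaze timestamp="1696224370410" remark="Fail | Invalid Gaze Point"/>
<gaze timestamp="1696224370443" remark="Fail | Out of Text Editor"/>
<gaze timestamp="1696224370476" remark="Fail | Out of Text Editor"/>
</gazes>"""


def summarize_gazes(gazes):
    """Count mapped gazes and each failure remark."""
    outcome = Counter()
    for gaze in gazes.iter("gaze"):
        if gaze.find("location") is not None:
            outcome["Mapped"] += 1
        else:
            outcome[gaze.get("remark", "Unknown")] += 1
    return outcome


summary = summarize_gazes(ET.fromstring(GAZES_XML))
for reason, count in summary.items():
    print(reason, count)
```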
#
Left Eye
Element: <left_eye>
Attribute:
- gaze_point_x
- gaze_point_y
- gaze_validity
- pupil_diameter
- pupil_validity
Example:
<left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>
Comment:
- `gaze_point_x` and `gaze_point_y` are the location on the screen, ranging from 0 to 1, where (0, 0) is the top-left corner of the screen and (1, 1) is the bottom-right corner.
- `gaze_validity` and `pupil_validity` are the validity of the gaze point and pupil diameter, which are binary: 0 for invalid, 1 for valid. When using a mouse to simulate the eye tracker, `gaze_validity` is always 1.0 and `pupil_validity` is always 0.0.
- `pupil_diameter` is the diameter of the pupil in mm. When using a mouse to simulate the eye tracker, `pupil_diameter` is always 0.
#
Right Eye
Element: <right_eye>
Attribute:
- gaze_point_x
- gaze_point_y
- gaze_validity
- pupil_diameter
- pupil_validity
Example:
<right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408" gaze_validity="1.0"
pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>
Comment:
- `gaze_point_x` and `gaze_point_y` are the location on the screen, ranging from 0 to 1, where (0, 0) is the top-left corner of the screen and (1, 1) is the bottom-right corner.
- `gaze_validity` and `pupil_validity` are the validity of the gaze point and pupil diameter, which are binary: 0 for invalid, 1 for valid. When using a mouse to simulate the eye tracker, `gaze_validity` is always 1.0 and `pupil_validity` is always 0.0.
- `pupil_diameter` is the diameter of the pupil in mm. When using a mouse to simulate the eye tracker, `pupil_diameter` is always 0.
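As a sketch of how the normalized eye data could be turned into screen coordinates, the Python function below averages the eyes whose `gaze_validity` is 1 and scales the result to a pixel grid. Averaging the two eyes is an assumption made for this example, not necessarily what CodeGRITS does internally, and `gaze_in_pixels` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GAZE_XML = """<gaze timestamp="1696224370377">
<left_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408"
          gaze_validity="1.0" pupil_diameter="2.4835662841796875" pupil_validity="1.0"/>
<right_eye gaze_point_x="0.5338541666666666" gaze_point_y="0.17407407407407408"
           gaze_validity="1.0" pupil_diameter="2.7188568115234375" pupil_validity="1.0"/>
</gaze>"""


def gaze_in_pixels(gaze, width, height):
    """Average the valid eyes and scale the point to a width x height grid.

    Returns None when neither eye has a valid gaze point.
    """
    points = [
        (float(eye.get("gaze_point_x")), float(eye.get("gaze_point_y")))
        for eye in (gaze.find("left_eye"), gaze.find("right_eye"))
        if eye is not None and float(eye.get("gaze_validity")) == 1.0
    ]
    if not points:
        return None
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    return mean_x * width, mean_y * height


gaze = ET.fromstring(GAZE_XML)
x, y = gaze_in_pixels(gaze, 1920, 1080)  # real resolution from the environment example
print(round(x), round(y))
```

Passing the `screen_size` values instead of the real resolution gives the point in the same units used by the `x` and `y` attributes elsewhere in the data.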
#
Location
Element: <location>
Attribute:
- path
- line
- column
- x
- y
Example:
<location column="25" line="2" path="/src/Main.java" x="820" y="150"/>
Comment:
- `x` and `y` are the coordinates of the gaze relative to the top-left corner of the visible code editor, in the same units as the `screen_size` in `environment`, not the actual screen resolution.
- `line` and `column` are the line number and column number of the gaze point in the code editor, calculated by the `xyToLogicalPosition(@NotNull Point p)` method of the `Editor` interface in the IntelliJ Platform SDK.
#
AST Structure
Element: <ast_structure>
Sub-element: <level>: only used when the current token is different from the previous token
Attribute:
- token
- type
- remark: only used when the current token is same as the previous token
Example:
<ast_structure token="println" type="IDENTIFIER">
<level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
<level end="2:26" start="2:8" tag="PsiReferenceExpression:System.out.println"/>
<level end="2:42" start="2:8" tag="PsiMethodCallExpression:System.out.println(&quot;Hello world!&quot;)"/>
<level end="2:43" start="2:8" tag="PsiExpressionStatement"/>
<level end="3:5" start="1:43" tag="PsiCodeBlock"/>
<level end="3:5" start="1:4" tag="PsiMethod:main"/>
<level end="4:1" start="0:0" tag="PsiClass:Main"/>
</ast_structure>
<ast_structure remark="Same (Last Successful AST)" token="println" type="IDENTIFIER"/>
Comment:
- The abstract syntax tree (AST) of the code file is recorded in the <ast_structure> element. The AST is calculated by the program structure interface (PSI) of the IntelliJ Platform.
- `token` is the text of the leaf node in the AST at the current gaze point, calculated by `psiElement.getText()`.
- `type` is the type of the leaf node, calculated by `psiElement.getNode().getElementType()`.
- `remark` is used when the current token is the same as the previous token, which means the gaze point is still in the same leaf node. In this case, the `remark` is `Same (Last Successful AST)`. We designed this mechanism to prevent `eye_tracking.xml` from becoming too large.
- We calculate the parent nodes of the leaf node by `psiElement.getParent()` until the file level (i.e., `PsiFile`), and save them in the <level> elements. In the previous example, the leaf node is `PsiIdentifier:println`, and its parent nodes are `PsiReferenceExpression:System.out.println` => `PsiMethodCallExpression:System.out.println("Hello world!")` => `PsiExpressionStatement` => `PsiCodeBlock` => `PsiMethod:main` => `PsiClass:Main`. The original code text is:
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
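Because repeated tokens are stored without their <level> children, a reader of the data has to carry the last full AST forward. The Python sketch below shows one way to do that. The function `expand_ast` is hypothetical, the `<sample>` wrapper element exists only to make the snippet parseable, and the level list is shortened from the example above.

```python
import xml.etree.ElementTree as ET

SAME_REMARK = "Same (Last Successful AST)"

# <sample> is a hypothetical wrapper; in eye_tracking.xml each
# <ast_structure> sits inside its own <gaze> element.
AST_XML = """<sample>
<ast_structure token="println" type="IDENTIFIER">
  <level end="2:26" start="2:19" tag="PsiIdentifier:println"/>
  <level end="3:5" start="1:4" tag="PsiMethod:main"/>
  <level end="4:1" start="0:0" tag="PsiClass:Main"/>
</ast_structure>
<ast_structure remark="Same (Last Successful AST)" token="println" type="IDENTIFIER"/>
</sample>"""


def expand_ast(ast_elements):
    """Yield (token, tags) per element, reusing the previous tags
    when the element is marked as the same token."""
    last_tags = []
    for ast in ast_elements:
        if ast.get("remark") == SAME_REMARK:
            tags = last_tags
        else:
            # Levels are stored from the leaf node up to the class level.
            tags = [level.get("tag") for level in ast.iter("level")]
            last_tags = tags
        yield ast.get("token"), tags


root = ET.fromstring(AST_XML)
for token, tags in expand_ast(root.iter("ast_structure")):
    print(token, " => ".join(tags))
```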
#
Level
Element: <level>
Attribute:
- start: line:column
- end: line:column
- tag
Example:
<level end="3:5" start="1:4" tag="PsiMethod:main"/>
Comment:
- `start` and `end` are the start and end positions of the AST node in the code file, calculated by `psiElement.getTextRange()`.
- `tag` is the type of the AST node, calculated by `psiElement.toString()`.
#
Screen Recording
[OUTPUT_DIR]
├── [START_TIMESTAMP]
│ ├── screen_recording
│ │ ├── clip_1.mp4
│ │ ├── clip_2.mp4
│ │ ├── ...
│ │ ├── frames.csv
clip_[k].mp4
frames.csv
#
Video Clips
clip_[k].mp4
Comment:
- The video clip of the screen recording from the (k-1)-th pause (the 0-th pause is the start) to the k-th pause. We designed this mechanism to keep the video held in memory from growing too large, especially when the tracking is paused for a long time.
#
Frames
frames.csv
Column:
- timestamp
- frame_number
- clip_number
Example:
timestamp,frame_number,clip_number
1703661629399,Start,1
1703661630996,1,1
1703661631247,2,1
1703661644518,Pause,1
1703661646446,Resume,2
1703661646824,1,2
1703661647737,Stop,2
Comment:
- The frame rate is 12 fps.
- `frame_number` is the number of the frame within its video clip.
- `clip_number` is the number of the video clip to which the frame belongs.
- We also record the timestamps of the `Start`, `Pause`, `Resume`, and `Stop` actions in the `frames.csv` file, which can be used to separate each stage of the development process.
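The following Python sketch shows one way to use `frames.csv`: finding the video frame that corresponds to a timestamp from the IDE or eye tracking data, and measuring the length of each tracking stage. The helper names are hypothetical, and the playback offset assumes that frame n of a clip sits at (n-1)/12 seconds, which follows from the 12 fps frame rate but is not stated explicitly in the format.

```python
import csv
import io
from bisect import bisect_right

FRAMES_CSV = """timestamp,frame_number,clip_number
1703661629399,Start,1
1703661630996,1,1
1703661631247,2,1
1703661644518,Pause,1
1703661646446,Resume,2
1703661646824,1,2
1703661647737,Stop,2
"""

FPS = 12


def load_rows(text):
    """Parse the CSV text into a list of dicts, in file order."""
    return list(csv.DictReader(io.StringIO(text)))


def locate_frame(rows, target_ms):
    """Return (clip_number, frame_number, offset_seconds) of the last frame
    recorded at or before target_ms, or None if no frame precedes it."""
    frames = [r for r in rows if r["frame_number"].isdigit()]
    stamps = [int(r["timestamp"]) for r in frames]  # already in ascending order
    index = bisect_right(stamps, target_ms) - 1
    if index < 0:
        return None
    frame = int(frames[index]["frame_number"])
    # Assumption: frame n is shown at (n - 1) / FPS seconds into its clip.
    return int(frames[index]["clip_number"]), frame, (frame - 1) / FPS


def stage_durations(rows):
    """Return the length in ms of each Start/Resume to Pause/Stop stage."""
    durations, opened = [], None
    for r in rows:
        marker = r["frame_number"]
        if marker in ("Start", "Resume"):
            opened = int(r["timestamp"])
        elif marker in ("Pause", "Stop") and opened is not None:
            durations.append(int(r["timestamp"]) - opened)
            opened = None
    return durations


rows = load_rows(FRAMES_CSV)
print(locate_frame(rows, 1703661631300))
print(stage_durations(rows))
```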