CodeImprove: Program Adaptation for Deep Code Models

CodeImprove currently employs 15 transformation operators, listed below:

Function and Variable Name Renaming

We first extract all identifiers from a given input by parsing the code and filtering out keywords, macros, and special identifiers (e.g., main, stdio). This step ensures that the remaining identifiers are only variable and function names.
Renaming then involves two steps: identifying the most important identifiers and substituting them.
Our approach uses the logit outputs of the target model for supervision. We introduce a metric called the importance score. To compute it, we retrieve the tokens of both the original code C = [t₀, ···, t_i, ···] and the code with the identifier at the i-th token replaced by [MASK], C_i = [t₀, ···, t_{i-1}, [MASK], t_{i+1}, ···]. The logit outputs of C and C_i are denoted O_y(C) and O_y(C_i), respectively. The importance score is defined as:

I_i = O_y(C) − O_y(C_i)

Once the importance scores are computed, we select the top K identifiers. For each of these K identifiers, we generate candidate replacements using the RoBERTa masked language model and then select the candidate with the highest logit output value to replace the original identifier. Restricting substitution to K identifiers keeps the search space bounded.


                            int i; to int t;
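
The scoring and selection step can be sketched in C as follows. This is a minimal illustration, not the authors' implementation: it assumes the target model's logits have already been computed elsewhere and are passed in as plain arrays, and the function names importance_scores and top_k are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: the importance of identifier i is the drop in the
 * logit when that identifier is masked, I_i = O_y(C) - O_y(C_i).
 * logit_masked[i] is assumed to hold O_y(C_i), computed by the model. */
void importance_scores(double logit_orig, const double *logit_masked,
                       size_t n, double *scores)
{
    for (size_t i = 0; i < n; i++)
        scores[i] = logit_orig - logit_masked[i];
}

/* Write the indices of the k largest scores into top, in descending
 * order of score. Simple O(n*k) selection; assumes k <= n. */
void top_k(const double *scores, size_t n, size_t k, size_t *top)
{
    for (size_t j = 0; j < k; j++) {
        size_t best = n;                 /* n means "none chosen yet" */
        for (size_t i = 0; i < n; i++) {
            int taken = 0;
            for (size_t t = 0; t < j; t++)
                if (top[t] == i)
                    taken = 1;           /* already selected earlier */
            if (taken)
                continue;
            if (best == n || scores[i] > scores[best])
                best = i;
        }
        top[j] = best;
    }
}
```

Only the K selected identifiers are then passed on to candidate generation, which is what keeps the search bounded.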

for-loop to while-loop.

            
            For-loop:
                for (initialization; condition; increment) {
                // Loop body
                // Statements to execute
                }

            While-loop Transformation:
                initialization;
                while (condition) {
                    // Loop body
                    // Statements to execute
                    increment;
                }
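
As a concrete instance of the template above, the following sketch shows a small function before and after the rewrite. The function names sum_for and sum_while are illustrative only and do not come from CodeImprove.

```c
/* Original: sum of 0..n-1 using a for-loop. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Transformed: the initialization is hoisted before the loop and the
 * increment becomes the last statement of the loop body. */
int sum_while(int n)
{
    int total = 0;
    int i = 0;
    while (i < n) {
        total += i;
        i++;
    }
    return total;
}
```

Both functions return the same value for every n. Note that a continue statement inside the original for-loop body would skip the relocated increment in the while form, so loops containing continue need extra care.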

while-loop to for-loop.

               
            While-loop:
                initialization;
                while (condition) {
                    // Loop body
                    // Statements to execute
                    increment;
                }

            For-loop Transformation:
                for(initialization; condition; increment) {
                    // Loop body
                    // Statements to execute
                }

do-loop to while-loop.

When a do-while-loop is transformed into a while-loop, the body of the do-while-loop is executed once before entering the transformed while-loop.

              
               Do-while-loop:
                    do {
                        // Loop body
                        // Statements to execute
                        increment;
                    } while (condition);

                While-loop Transformation:
                    // Loop body (executed once up front)
                    // Statements to execute
                    increment;
                    while (condition) {
                        // Loop body
                        // Statements to execute
                        increment;
                    }
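
A small worked example of this rewrite, given as a sketch with illustrative function names (digits_do_while and digits_while are not part of CodeImprove):

```c
/* Original: count the decimal digits of n (n >= 0).
 * The body always runs at least once, so 0 has one digit. */
int digits_do_while(int n)
{
    int count = 0;
    do {
        count++;
        n /= 10;
    } while (n > 0);
    return count;
}

/* Transformed: one full copy of the body is placed before the
 * while-loop, which then handles the remaining iterations. */
int digits_while(int n)
{
    int count = 0;
    count++;
    n /= 10;
    while (n > 0) {
        count++;
        n /= 10;
    }
    return count;
}
```

The hoisted copy must contain the whole body, including the update of n; otherwise the first iteration of the two versions would differ.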

if else-if to if-else

              
               if else-if :
                    if(condition A){ 
                          //body A
                    } else if (condition B) {
                          //body B
                    } else {
                          //body C
                    }

                if-else transformation:
                    if(condition A){ 
                          //body A
                    } else {
                         if (condition B) {
                            //body B
                         }
                         else {
                            //body C
                          }
                     }

if-else to if else-if

              

                  if-else:
                    if(condition A){ 
                          //body A
                    } else {
                         if (condition B) {
                            //body B
                         }
                         else {
                            //body C
                          }
                     } 

                    if else-if transformation:
                      if(condition A){ 
                          //body A
                        } else if (condition B) {
                          //body B
                        } else {
                          //body C
                        }

switch statement to if else-if

                  switch statement:
                      switch (a) {
                        case A: body A; break;
                        case B: body B; break;
                        default: body C
                      }

                  if else-if transformation:
                      if (a == A) body A
                      else if (a == B) body B
                      else body C
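
The sketch below illustrates the rewrite on a small function. It assumes that every case ends in break or return, so there is no fall-through between cases; a switch that relies on fall-through cannot be rewritten one case per branch. The function names are illustrative only.

```c
/* Original: each case returns, so no case falls through to the next. */
int weight_switch(int kind)
{
    switch (kind) {
    case 1:
        return 10;
    case 2:
        return 20;
    default:
        return 0;
    }
}

/* Transformed: one comparison per case label; default becomes the
 * final else branch. */
int weight_if(int kind)
{
    if (kind == 1)
        return 10;
    else if (kind == 2)
        return 20;
    else
        return 0;
}
```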


              
                    
                  
Relational expression transformation.

Transforms relational expressions, such as a < b to b > a

Modification of unary operations.

Modifies unary operations, such as i++; to i = i + 1;

Modification of incremental operations.

Modifies incremental operators, such as i += 1; to i = i + 1;

Modifying constants.

Modifies constant values in expressions, such as i = 0; to i = 10 - 10;

Modifications to variable definitions.

Modifies the definitions of variables, such as int b = 0; to int b; b = 0;
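
The five expression-level operators above can be seen together in one sketch. The function below and its name are made up for illustration; the second version applies each rewrite once and computes the same result.

```c
/* Original form. */
int count_below_orig(const int *v, int n, int limit)
{
    int hits = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < limit)
            hits += 1;
    }
    return hits;
}

/* Adapted form, with one rewrite per comment. */
int count_below_adapted(const int *v, int n, int limit)
{
    int hits;                           /* definition split from init   */
    hits = 10 - 10;                     /* constant 0 rewritten         */
    for (int i = 0; n > i; i = i + 1) { /* i < n swapped, i++ expanded  */
        if (limit > v[i])               /* v[i] < limit swapped         */
            hits = hits + 1;            /* hits += 1 expanded           */
    }
    return hits;
}
```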
              
              
              
Add junk code.

Adds code that will never be executed.

            Before adding the junk code:
                if (a) {
                    // body A
                }

            After adding the junk code:
                if (a) {
                    // body A
                    if (0) return 0;
                }
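
A compilable instance of this operator, as a sketch with illustrative function names:

```c
/* Original: clamp negative values to zero. */
int clamp_orig(int x)
{
    if (x < 0) {
        x = 0;
    }
    return x;
}

/* With junk code: the condition of the inserted if is the constant 0,
 * so its return statement can never execute. */
int clamp_junk(int x)
{
    if (x < 0) {
        x = 0;
        if (0) return -1;
    }
    return x;
}
```

Because the guard is a constant false, the inserted statement changes the token sequence seen by the model but not the program's behavior.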


                  
              
Change order of statements in a block.

Only statements without any data or control dependency between them are reordered.

            Before reorder:
                a = b+10;
                c = d+10;

            After reorder:
                c = d+10;
                a = b+10;
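
The dependency condition can be illustrated with a sketch in which two statements are independent and a third depends on both. The function names are illustrative only.

```c
/* Original order. */
int area_orig(int b, int d)
{
    int a, c, r;
    a = b + 10;
    c = d + 10;
    r = a * c;   /* reads a and c, so it must stay after both */
    return r;
}

/* Reordered: the two independent assignments are swapped; the
 * dependent statement keeps its position. */
int area_reordered(int b, int d)
{
    int a, c, r;
    c = d + 10;
    a = b + 10;
    r = a * c;
    return r;
}
```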

                      
                  
              
              
              
Deleting statements that print debugging hints and comments.

Only comments and statements that print debugging hints or intermediate results are deleted.

            Before:
                printf("trial");
                //comments;

            After:
                (both lines are removed)

                        Vulnerability Detection                Defect Prediction
Experiment              CodeBERT  RoBERTa  GraphCodeBERT       CodeBERT  RoBERTa  GraphCodeBERT
CLD                     0.850     0.819    0.757               0.889     0.828    0.873
CodeImprove-no-dropout  0.850     0.819    0.757               0.891     0.903    0.898
CodeImprove             0.876     0.825    0.781               0.911     0.924    0.909
