Beyond Verses

My blog that specializes in Space Science and latest news from NASA

Thursday, April 12, 2018

Cheating (plagiarism) detection with MOSS on Linux

Hi all,

This post should help teaching assistants like me who need to run plagiarism detection on the assignments they receive.

We shall use MOSS, thanks to Professor Aiken.

I have written a simple script that prepares the data for MOSS. MOSS reads only one level of directories, so the script moves all source files from nested subfolders up into each student's top-level folder.

You need a Linux system with Perl installed.

1. Register for a new account through the MOSS home page. You will receive an email containing a script with your own user id.

2. Copy the script from your email and paste it to ""

3. Here is my preparation script

#!/bin/bash
# Arguments: $1 = folder with one subfolder per student
#            $2 = file extensions separated by "|", e.g. "c|h"
echo "Starting the script"
echo "[IMP] Is that the correct folder '$1'? (y/n) "
read -r ans
# stop unless the user confirms the folder
if [ "$ans" != "y" ]; then
  exit 1
fi
# move source code to root directories
for student in "$1"/*; do
  if [ -d "$student" ]; then
    echo "Processing student: $student"
    for files in "$student"/*; do
      if [ -d "$files" ]; then
        # count the matching files below this subfolder
        count=$(find "$files"/ -type f -regextype posix-extended -regex ".*\.($2)" | wc -l)
        if [ "$count" -ne 0 ]; then
          find "$files"/ -type f -regextype posix-extended -regex ".*\.($2)" -print0 | xargs -0 mv -t "$student"/
          # remove the subfolder only if the move succeeded
          if [ $? -eq 0 ]; then
            echo "Removing ... $files"
            rm -rf "$files"
          else
            echo "run the script one more time"
          fi
        fi
      fi
    done
    echo ""
  fi
done
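Because the script moves everything into a single folder per student, two files with the same name in different subfolders cannot both survive the move. The sketch below is not part of my script. It is a small check you could run first, assuming GNU find. The function name `duplicate_names` and the demo folder layout are made up for illustration.

```shell
#!/bin/bash
# Sketch: list source file names that appear more than once under one student
# folder. Relies on GNU find (-regextype, -printf), as on most Linux systems.

duplicate_names() {
  # $1 = student folder, $2 = extensions separated by "|", e.g. "c|h"
  find "$1" -type f -regextype posix-extended -regex ".*\.($2)" -printf '%f\n' \
    | sort | uniq -d
}

# Demo on a throwaway folder (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/student1/part1" "$demo/student1/part2"
touch "$demo/student1/part1/util.c" "$demo/student1/part2/util.c" \
      "$demo/student1/part1/main.c"
duplicate_names "$demo/student1" "c|h"   # prints: util.c
rm -rf "$demo"
```

If the check prints anything for a student, it is probably safer to rename or inspect those files by hand before flattening the folder.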

4. Copy the script and paste it to ""

5. Give both scripts execute permissions

    sudo chmod ug+x
    sudo chmod ug+x

6. Run the "" script

    ./ Minesweeper "c|h"

Where "Minesweeper" is the folder that contains a set of folders, one per student.
Where "c|h" lists the file extensions, separated by "|", that you want to move up into each student's folder.
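To see what an extension pattern such as "c|h" will pick up before anything is moved, you can run the same find expression on its own. This is only a sketch. The helper name `list_matches` and the sample folders are assumptions for the demo, and it needs GNU find for `-regextype`.

```shell
#!/bin/bash
# Sketch: preview which files an extension pattern such as "c|h" selects.
# Nothing is moved or deleted; the files are only listed.

list_matches() {
  # $1 = folder to search, $2 = extensions separated by "|"
  find "$1" -type f -regextype posix-extended -regex ".*\.($2)" | sort
}

# Demo on a throwaway folder (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/student1/src/core" "$demo/student2/project"
touch "$demo/student1/src/main.c" "$demo/student1/src/core/board.h" \
      "$demo/student1/src/notes.txt" "$demo/student2/project/game.c"
list_matches "$demo" "c|h"   # lists main.c, board.h and game.c, not notes.txt
rm -rf "$demo"
```

The regular expression has to match the whole path, so only files whose names end in one of the listed extensions are selected.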

7. Run "" script

    perl -l c -m 20 -d Minesweeper/*/*.c Minesweeper/*/*.h

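Where "-l" sets the language of the submissions (c, java, ...).
Where "-m 20" sets the maximum number of submissions a segment of code may appear in before MOSS ignores it. A segment found in more than 20 students' work is probably shared legitimately and is not reported as cheating.
Where "-d" tells MOSS that submissions are grouped by directory, so all files in one student's folder are treated as a single submission. The files to be checked follow it.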
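The globs in the command above only see files that sit directly inside each student's folder, so it is worth checking that every folder actually has some before submitting. Here is a minimal sketch of such a check. The function names `count_top_level` and `report` are my own, and the `.c`/`.h` extensions are hard-coded to match this example.

```shell
#!/bin/bash
# Sketch: report how many .c/.h files sit directly inside each student folder,
# since globs like Minesweeper/*/*.c do not look into deeper subfolders.

count_top_level() {
  # $1 = student folder; counts only files directly inside it
  find "$1" -maxdepth 1 -type f \( -name '*.c' -o -name '*.h' \) | wc -l | tr -d ' '
}

report() {
  # $1 = root folder containing one folder per student
  for student in "$1"/*/; do
    [ -d "$student" ] || continue
    n=$(count_top_level "$student")
    if [ "$n" -eq 0 ]; then
      echo "WARNING: no sources directly in $student"
    else
      echo "$n file(s) in $student"
    fi
  done
}

# Demo on a throwaway folder (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/student1" "$demo/student2/nested"
touch "$demo/student1/main.c" "$demo/student1/board.h" "$demo/student2/nested/game.c"
report "$demo"   # student1 has 2 files, student2 gets a warning
rm -rf "$demo"
```

A warning usually means the preparation script has not been run yet for that student, or that the student used a different file extension.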

8. Wait until it finishes. The terminal will then show a link like this

The above link contains the results of plagiarism detection.

9. You can also save the results for offline use

    wget -r -np

I hope this helps :)

Ahmed Hamdy, M.Sc.
Teaching Assistant at Computer and Systems Engineering Dept.
Faculty of Engineering, Alexandria University, Egypt.