Rank Professors

A CWW (Computing With Words) engine that ranks professors using Perceptual Computing.

Usage

Getting started is super easy.

You can clone the repository:

git clone --recursive https://github.com/mrkatebzadeh/RankProfessors.git

And then run it:

cd RankProfessors
./src/main.py -i intervals.csv -c config.json [-l level]

The standard levels and their applicability are described below (in increasing order of severity):

Level     When it's used
DEBUG     Detailed information, typically of interest only when diagnosing problems.
INFO      Confirmation that things are working as expected.
WARNING   An indication that something unexpected happened, or of a problem in the
          near future (e.g. "There is no interval remaining"). The software is
          still working as expected.
ERROR     Due to a more serious problem, the software has not been able to perform
          some function.
CRITICAL  A serious error, indicating that the program itself may be unable to
          continue running.

The default level is CRITICAL.


Configuration Files

We have four configuration files. Each file is described below.

Interval File

The interval file is a CSV file with at least three rows. The first row contains the words used to create the codebook:

غیرقابل قبول,,مناسب,,ضعیف,,نسبتا متوسط,,قابل قبول,,نامناسب,,فوق العاده خوب

The second row contains the Finglish (romanized) form of each word. These Finglish forms are needed for plotting, since matplotlib doesn't support Unicode Farsi strings:

Fogholade khub,,Na monaseb,, Ghabele Ghabul,,Nesbatan motevaset,,Zaeef,,Monaseb,,Gheire Ghabele Ghabul

The remaining rows contain the collected interval samples for the words. Each row has 2 * n numbers, where n is the number of words. For each word, the first number is the left endpoint and the second number is the right endpoint of one interval for that word. For example, in the following row:

80,90,50,70,60,80,70,80,90,100,50,60,10,20

80 and 90 are the left and right endpoints of an interval for the word فوق العاده خوب (Fogholade khub), and so on.
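
The layout above can be parsed with a few lines of Python. This is a minimal sketch, not the project's actual loader: `parse_intervals` and `SAMPLE` are illustrative names, the toy data uses ASCII words in place of the Farsi ones, and it assumes the double commas in the first two rows simply leave the second column of each word's interval pair empty.

```python
import csv
import io

# Toy stand-in for intervals.csv: two words ("A"/"B"), their Finglish forms,
# and two sample rows of (left, right) interval endpoints per word.
SAMPLE = """\
A,,B
a,,b
80,90,10,20
70,95,5,15
"""

def parse_intervals(f):
    rows = list(csv.reader(f))
    words = rows[0][0::2]       # every other cell holds a word
    finglish = rows[1][0::2]    # same layout for the Finglish row
    intervals = {w: [] for w in words}
    for row in rows[2:]:
        for i, w in enumerate(words):
            left, right = float(row[2 * i]), float(row[2 * i + 1])
            intervals[w].append((left, right))
    return words, finglish, intervals

words, finglish, intervals = parse_intervals(io.StringIO(SAMPLE))
print(intervals["A"])  # [(80.0, 90.0), (70.0, 95.0)]
```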

Config File

The config file is a JSON file containing information about the professors as well as the path to the question config file. The value of professors is an array of professors. Each professor has a name field and a courses key holding an array of courses. Each course has a name, a weight, and a file that points to a CSV file of students' comments about that professor in that course. A config file looks like:

{
  "questionFile" : "questionConfig.json",
  "professors" : [
    {
      "name" : "p1",
      "courses" : [
        {
          "name" : "course1" ,
          "weight" : 1,
          "file" : "course1.csv"
        },
        {
          "name" : "course2" ,
          "weight" : 2,
          "file" : "course2.csv"
        },
        {
          "name" : "course3" ,
          "weight" : 2,
          "file" : "course3.csv"
        }
      ]
    },
    {
      "name" : "p2",
      "courses" : [
        {
          "name" : "course4" ,
          "weight" : 2,
          "file" : "course4.csv"
        },
        {
          "name" : "course5" ,
          "weight" : 3,
          "file" : "course5.csv"
        }
      ]
    }
  ]
}
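
Loading and walking this structure is plain JSON handling. The sketch below uses Python's standard library only; the keys (`questionFile`, `professors`, `courses`, `name`, `weight`, `file`) follow the example above, while the embedded config text is a trimmed-down illustration.

```python
import json

# Trimmed-down config mirroring the structure shown above.
config_text = """{
  "questionFile": "questionConfig.json",
  "professors": [
    {"name": "p1", "courses": [
      {"name": "course1", "weight": 1, "file": "course1.csv"},
      {"name": "course3", "weight": 2, "file": "course3.csv"}
    ]}
  ]
}"""

config = json.loads(config_text)
for prof in config["professors"]:
    # Course weights are later used when aggregating course FOUs per professor.
    total_weight = sum(c["weight"] for c in prof["courses"])
    print(prof["name"], total_weight)  # p1 3
```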

Question File

The question file is a JSON file with a questions key. The value of questions is an array of questions, each with title and weight keys. The question file looks like:

{
  "questions" : [
    {
      "title": "Q1",
      "weight" : 1
    },
    {
      "title": "Q2",
      "weight" : 3
    },
    {
      "title": "Q3",
      "weight" : 1
    },
    {
      "title": "Q4",
      "weight" : 3
    },
    {
      "title": "Q5",
      "weight" : 1
    },
    {
      "title": "Q6",
      "weight" : 2
    }
  ]
}

Course File

The course files are CSV files located in the courses directory. Each row of a course file is one student's comment about that course. The first column holds the weight of the student's comment, and the remaining columns hold the answers to the questions. The number of answers must equal the number of questions in the question file. Each comment looks like:

1,بسیار خوب,بسیار خوب,مطلوب,مطلوب,مطلوب,معمولی
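
A comment row splits into the weight and the answer words. This sketch (with the hypothetical helper `parse_comment`, and ASCII placeholders standing in for the Farsi answers above) also checks the answer count against the question file, as required:

```python
def parse_comment(row, num_questions):
    """Split one course-file row into (weight, answer words)."""
    cells = row.split(",")
    weight, answers = float(cells[0]), cells[1:]
    if len(answers) != num_questions:
        raise ValueError("answer count must match the question file")
    return weight, answers

weight, answers = parse_comment("1,good,good,ok,ok,ok,average", 6)
print(weight, answers[0])  # 1.0 good
```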

How this works

This application uses the CWW library, which provides encoder, engine, and decoder modules for computing with perceptions.

Data Collection

For a set of specific questions, a group of 120 M.Sc. and Ph.D. students was asked to answer them. Based on frequency, 32 words were selected as the final words. After this collection, for each word, a group of 42 individuals was asked the following question:
On a scale of 0–100, what are the endpoints of an interval that you associate with the word W?
Then 42 data intervals [a(i), b(i)] were collected from these students, where i = 1, ..., n (n = 42) indexes the i-th individual. The words used in our application are:

1. عالی
2. معمولی
3. مطلوب
4. بسیار خوب
5. واقعا عالی
6. بد نیست
7. نسبتا بد
8. بسیار عالی
9. فوق العاده ضعیف
10. به درد نخور
11. نامطلوب
12. غالبا بد
13. بسیار ضعیف
14. تقریبا خوب
15. نسبتا قابل قبول
16. فوق العاده بد
17. متوسط
18. نسبتا خوب
19. بد
20. غالبا خوب
21. بسیار بد
22. افتضاح
23. در حد عالی
24. خوب
25. در حد معمول
26. غیرقابل قبول
27. مناسب
28. ضعیف
29. نسبتا متوسط
30. قابل قبول
31. نامناسب
32. فوق العاده خوب

Encoding

For this phase, we have implemented the Yang algorithm.

Data preprocessing

After interval collection, the intervals for each word are preprocessed in four steps, similar to the EIA preprocessing phase: Bad Data Processing, Outlier Processing, Tolerance Limit Processing, and Reasonable Interval Processing. Using the surviving intervals, each word is then encoded into an FOU (footprint of uncertainty).
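
The first two of those steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the quartile computation is approximate, and the tolerance-limit and reasonable-interval steps are omitted.

```python
def bad_data(intervals):
    """Step 1 (Bad Data Processing): keep intervals inside [0, 100] with a < b."""
    return [(a, b) for a, b in intervals if 0 <= a < b <= 100]

def outlier(intervals):
    """Step 2 (Outlier Processing): box-and-whisker test on each endpoint."""
    def limits(xs):
        xs = sorted(xs)
        q1 = xs[len(xs) // 4]          # rough first quartile
        q3 = xs[(3 * len(xs)) // 4]    # rough third quartile
        iqr = q3 - q1
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_a, hi_a = limits([a for a, _ in intervals])
    lo_b, hi_b = limits([b for _, b in intervals])
    return [(a, b) for a, b in intervals
            if lo_a <= a <= hi_a and lo_b <= b <= hi_b]

print(bad_data([(10, 20), (30, 20), (-5, 50), (0, 101)]))  # [(10, 20)]
```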

Modeling

We used the Cloud model in the CWW module for encoding words. This model uses fuzzy statistics to compute the MF based on the m surviving data intervals of the given word. To approximate the fitted Gaussian MF of a word with a Cloud model, the fitted coefficients and the root mean squared error are mapped onto the Cloud model parameters: Ex = mean, En = standard deviation, He = RMSE.
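
To make the (Ex, En, He) mapping concrete, here is a sketch of the standard normal Cloud model's forward generator: each "cloud drop" draws a perturbed entropy En' ~ N(En, He²) and a sample x ~ N(Ex, En), then evaluates the Gaussian membership exp(-(x - Ex)² / (2·En'²)). The function name and the seeded RNG are illustrative choices, not part of the CWW library.

```python
import math
import random

def cloud_drops(ex, en, he, n, seed=0):
    """Generate n cloud drops (x, membership) for Cloud parameters (ex, en, he)."""
    rng = random.Random(seed)  # seeded for reproducibility of this sketch
    drops = []
    for _ in range(n):
        enp = rng.gauss(en, he)              # En' ~ N(En, He^2)
        x = rng.gauss(ex, en)                # x ~ N(Ex, En^2)
        mu = math.exp(-(x - ex) ** 2 / (2 * enp ** 2))
        drops.append((x, mu))
    return drops

# e.g. using the fitted values for عالی: Ex = 93.49, En = 4.9, He = 0.12
drops = cloud_drops(93.49, 4.9, 0.12, n=100)
```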

The following table gives the number of intervals eliminated in each preprocessing step (BDP = Bad Data Processing, OP = Outlier Processing, TLP = Tolerance Limit Processing, RIP = Reasonable Interval Processing) and the final fitted coefficients: mean, standard deviation (Std), and root mean squared error (RMSE):

Word BDP OP TLP RIP Mean Std RMSE
عالی 0 11 0 0 93.49 4.9 0.12
معمولی 1 7 2 12 51.22 5.67 0.12
مطلوب 0 2 3 29 64.76 4 0.08
بسیار خوب 0 1 3 24 84.27 3.19 0.1
واقعا عالی 2 9 3 12 97.53 1.46 0.08
بد نیست 0 5 4 0 40.29 14.73 0.14
نسبتا بد 1 1 1 35 24.95 2.89 0.14
بسیار عالی 1 12 0 4 95.99 2.79 0.09
فوق العاده ضعیف 0 4 2 15 5.27 2.81 0.09
به درد نخور 0 5 3 30 4.28 2.06 0.07
نامطلوب 0 4 5 30 21.45 1.46 0.1
غالبا بد 0 9 5 17 15.32 2.79 0.09
بسیار ضعیف 0 6 4 10 5.6 2.81 0.1
تقریبا خوب 0 4 1 17 61.49 4.94 0.08
نسبتا قابل قبول 0 10 2 22 52.92 3.08 0.07
فوق العاده بد 0 10 2 3 3.98 2.77 0.09
متوسط 0 4 4 20 49.59 5.18 0.09
نسبتا خوب 0 15 1 13 63.59 3.65 0.08
بد 0 16 2 13 23.28 3.73 0.08
غالبا خوب 0 4 3 27 67.45 3.82 0.07
بسیار بد 0 4 2 29 10.57 2.7 0.07
افتضاح 1 6 10 9 2.34 1.42 0.07
در حد عالی 0 17 2 13 84.95 2.89 0.14
خوب 0 1 4 25 74.59 3.26 0.12
در حد معمول 0 10 4 9 53.36 3.66 0.08
غیرقابل قبول 0 2 3 31 14.95 2.89 0.14
مناسب 0 4 5 26 57.45 4.79 0.09
ضعیف 0 1 3 30 24.95 2.89 0.14
نسبتا متوسط 0 9 4 12 45.78 4.59 0.06
قابل قبول 0 4 4 22 58.91 5.26 0.1
نامناسب 0 1 8 23 23.72 3.2 0.07
فوق العاده خوب 0 3 1 18 88.52 4.76 0.08

Engine

After the codebook is created, each student's evaluation is imported. Note that all aggregations use the Linguistic Weighted Average (LWA) method. Each student answers each question with a word; for simplicity, we assume every answer word exists in our codebook. A student's comment becomes an FOU by aggregating his/her answers. Finally, each professor is evaluated from his/her courses' FOUs.
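
The actual LWA produces an FOU-valued weighted average of FOUs (typically via Karnik–Mendel-style iterations). Purely to illustrate the weighting structure with the question weights from the example above, here is a crisp analogue over hypothetical word centroids; it is not the engine's computation.

```python
def weighted_average(values, weights):
    """Crisp analogue of the LWA: weight each answer's centroid by its question weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical centroids for one student's six answers, with the question
# weights [1, 3, 1, 3, 1, 2] from the example question file above.
centroids = [90, 85, 70, 70, 70, 50]
weights = [1, 3, 1, 3, 1, 2]
score = weighted_average(centroids, weights)  # 795 / 11, i.e. about 72.27
```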

Decoding

Now each professor has an FOU. The final CWW step is mapping FOUs back to the most similar words. The Jaccard similarity measure is used to compare FOUs against the codebook words.
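
For two FOUs sampled at the same x-grid, the Jaccard similarity is the sum of pointwise minima of the upper and lower MFs divided by the sum of pointwise maxima. A sketch, assuming both FOUs are already discretized onto a common grid:

```python
def jaccard(umf_a, lmf_a, umf_b, lmf_b):
    """Jaccard similarity of two FOUs given their sampled upper/lower MFs."""
    num = (sum(min(a, b) for a, b in zip(umf_a, umf_b))
           + sum(min(a, b) for a, b in zip(lmf_a, lmf_b)))
    den = (sum(max(a, b) for a, b in zip(umf_a, umf_b))
           + sum(max(a, b) for a, b in zip(lmf_a, lmf_b)))
    return num / den

# An FOU compared with itself has similarity 1.0.
sim = jaccard([0.2, 0.8, 0.2], [0.1, 0.5, 0.1],
              [0.2, 0.8, 0.2], [0.1, 0.5, 0.1])
print(sim)  # 1.0
```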

After decoding the FOUs, we rank the professors by sorting their FOUs, using the centroid-based FOU ranking method.
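
Centroid-based ranking reduces each professor's FOU to its centroid interval [cl, cr] and sorts by the interval's midpoint, highest first. A sketch with hypothetical names and centroid values:

```python
def rank_by_centroid(professors):
    """Sort professors by the midpoint of their FOU centroid interval, descending."""
    return sorted(professors, key=lambda p: (p["cl"] + p["cr"]) / 2, reverse=True)

ranked = rank_by_centroid([
    {"name": "p1", "cl": 60, "cr": 70},
    {"name": "p2", "cl": 80, "cr": 90},
])
print([p["name"] for p in ranked])  # ['p2', 'p1']
```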