From 082cb23b5d4608737252464d35b72302a4dbdd53 Mon Sep 17 00:00:00 2001
From: Karl Stroetmann
Date: Sat, 28 Sep 2024 22:27:03 +0200
Subject: [PATCH] renaming

---
 Python/Chapter-03/03-Exam-Evaluation.ipynb | 567 +++++++++++++++++++++
 1 file changed, 567 insertions(+)
 create mode 100644 Python/Chapter-03/03-Exam-Evaluation.ipynb

diff --git a/Python/Chapter-03/03-Exam-Evaluation.ipynb b/Python/Chapter-03/03-Exam-Evaluation.ipynb
new file mode 100644
index 0000000..67805f7
--- /dev/null
+++ b/Python/Chapter-03/03-Exam-Evaluation.ipynb
@@ -0,0 +1,567 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Evaluating an Exam Using Ply\n",
+    "\n",
+    "This notebook shows how we can use the package [`ply`](https://ply.readthedocs.io/en/latest/ply.html)\n",
+    "to implement a scanner. Our goal is to implement a program that can be used to evaluate the results of an exam. Assume the results of an exam are stored in the string `data` that is defined below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = '''Class: Algorithms and Complexity\n",
+    "          Group: TINF22AI1\n",
+    "          MaxPoints = 60\n",
+    "          \n",
+    "          Exercise:      1.  2.  3.  4.  5.  6.\n",
+    "          Jim Smith:      9  12  10   6   6   0\n",
+    "          John Slow:      4   4   2   0   -   -\n",
+    "          Susi Sorglos:   9  12  12   9   9   6\n",
+    "          1609922:        7   4  12   5   5   3\n",
+    "       '''"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "These data show that there has been an exam on the subject *Algorithms and Complexity*\n",
+    "in the group TINF22AI1. Furthermore, the equation\n",
+    "```\n",
+    "    MaxPoints = 60\n",
+    "```\n",
+    "shows that 60 points would have been necessary to achieve the best mark.\n",
+    "\n",
+    "There have been 6 different exercises in this exam and, in this small example, only four students took part, namely *Jim Smith*, *John Slow*, *Susi Sorglos*, and one student who is only represented by their matriculation number. Each of the rows describing the results of the students begins with the name (or matriculation number) of the student, followed by the number of points that they have achieved in the different exercises. Our goal is to write a program that is able to compute the marks for all students."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Imports\n",
+    "\n",
+    "We will use the package [ply](https://ply.readthedocs.io/en/latest/ply.html).\n",
+    "In particular, we will use the scanner generator that is provided by the module `ply.lex`.\n",
+    "Furthermore, we will use regular expressions to extract a number from a string.\n",
+    "Therefore, we also have to import the module `re`."
+   ]
+  },
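+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a small illustration (this cell is not needed for the evaluation itself), the following cell shows how `re.findall` can extract the number from a string such as `MaxPoints = 60`. The same pattern is used later in the token definition `t_MAXDEF`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re\n",
+    "\n",
+    "# re.findall returns the list of all matches of the pattern; here there is exactly one match.\n",
+    "int(re.findall(r'[1-9][0-9]*', 'MaxPoints = 60')[0])"
+   ]
+  },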
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ply.lex as lex\n", + "import re" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Auxiliary Functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function `mark(max_points, points)` takes two arguments:\n", + "- `points` is the number of points achieved by the student whose mark is to be computed.\n", + "- `max_points` is the number of points that need to be achieved in order to get the best mark of $1.0$.\n", + " \n", + "It is assumed that the relation between the mark of an exam and the number of points achieved in this exam is mostly linear and that a student who has achieved $50\\%$ of `max_points` points will get the mark $4.0$, while a student who has achieved $100\\%$ of `max_points` points will get the mark $1.0$. Therefore, the formula to calcuclate the grade is as follows:\n", + "$$ \\textrm{grade} = 7 - 6 \\cdot \\frac{\\texttt{points}}{\\texttt{max\\_points}} $$\n", + "However, the worst mark is $5.0$. Therefore, if the mark would fall below that line, the `min` function assures that it is less or equal than $5.0$. Furthermore, the resulting number is rounded to one digit." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def mark(max_points, points):\n", + " grade = 7 - 6 * points / max_points\n", + " return round(min(5.0, grade), 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lets test this function by plotting it. To do this we have to install `matplotlib`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install matplotlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max_points = 60\n", + "points = [points for points in range(max_points+1)]\n", + "grades = [mark(max_points, points) for points in range(max_points+1)]\n", + "\n", + "plt.figure(figsize=(9, 6))\n", + "plt.plot(points, grades, marker='o', linestyle='-')\n", + "plt.title('Grade as a Function of Points (Max Points = 60)')\n", + "plt.xlabel('Points')\n", + "plt.ylabel('Grade')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Token Declarations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by declaring the list of tokens. Note that the variable `tokens` is a keyword of `ply` to define the names of the token classes. In this case, we have declared six different tokens.\n", + "The definitions of these tokens are given later.\n", + "- `HEADER` will match the first two lines of the string `data` as well as the fifth line that begins with \n", + " the string `Exercise:`. \n", + "- `MAXDEF` matches the line containing the definition of `MaxPoints`.\n", + "- `NAME` matches the name of a student.\n", + "- `MATRICULATION` matches the matriculation number of a student. \n", + " This number is supposed to have exactly 7 digits.\n", + "- `NUMBER` matches a natural number.\n", + " This is used for the points.\n", + "- `IGNORE` is a token that will match an empty line. 
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's test this function by plotting it. To do this, we have to install `matplotlib`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install matplotlib"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "max_points = 60\n",
+    "points = [points for points in range(max_points+1)]\n",
+    "grades = [mark(max_points, points) for points in range(max_points+1)]\n",
+    "\n",
+    "plt.figure(figsize=(9, 6))\n",
+    "plt.plot(points, grades, marker='o', linestyle='-')\n",
+    "plt.title('Grade as a Function of Points (Max Points = 60)')\n",
+    "plt.xlabel('Points')\n",
+    "plt.ylabel('Grade')\n",
+    "plt.grid(True)\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Token Declarations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We begin by declaring the list of tokens. Note that the variable name `tokens` is prescribed by `ply`: this variable has to be set to the list of the names of the token classes. In this case, we have declared seven different tokens.\n",
+    "The definitions of these tokens are given later.\n",
+    "- `HEADER` will match the first two lines of the string `data` as well as the fifth line that begins with\n",
+    "  the string `Exercise:`.\n",
+    "- `MAXDEF` matches the line containing the definition of `MaxPoints`.\n",
+    "- `NAME` matches the name of a student.\n",
+    "- `MATRICULATION` matches the matriculation number of a student.\n",
+    "  This number is supposed to have exactly 7 digits.\n",
+    "- `NUMBER` matches a natural number.\n",
+    "  This is used for the points.\n",
+    "- `IGNORE` is a token that will match an empty line.\n",
+    "  For example, the fourth line in `data` is empty.\n",
+    "- `LINEBREAK` is a token that will match the newline character `\\n` at the end of a line."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tokens = [ 'HEADER',        # r'[A-Za-z]+:.*\\n'\n",
+    "           'MAXDEF',        # r'MaxPoints\\s*=\\s*[1-9][0-9]*'\n",
+    "           'NAME',          # r'[A-Za-z -]+:'\n",
+    "           'MATRICULATION', # r'[0-9]{7}:'\n",
+    "           'NUMBER',        # r'0|[1-9][0-9]*'\n",
+    "           'IGNORE',        # r'^[ \\t]*\\n'\n",
+    "           'LINEBREAK'      # r'\\n'\n",
+    "         ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Token Definitions\n",
+    "\n",
+    "Next, we need to provide the definition of the tokens. One way to define tokens is via Python functions.\n",
+    "In this notebook we are only going to use these functional token definitions.\n",
+    "The documentation string of these functions is a raw string that contains the regular expression defining the semantics of the token. The regular expression can be followed by code that is needed to further process the token. The name of the function defining a token has to have the form `t_`**name**, where **name** is the name of the token as declared in the list `tokens`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The `HEADER` Token\n",
+    "\n",
+    "The token `HEADER` matches any string that is made up of upper and lower case letters followed by a colon. This colon may be followed by arbitrary characters.\n",
+    "The token extends to the end of the line and includes the terminating newline.\n",
+    "\n",
+    "When the function `t_HEADER` is called, it is provided with a token `t`. This is an object that has four\n",
+    "attributes:\n",
+    "- `t.lexer` is an object of class `Lexer` that contains the scanner that was used to extract the token `t`.\n",
+    "  We are free to attach additional attributes to this `Lexer` object.\n",
+    "- `t.type` is a string containing the type of the token. For tokens processed in the function\n",
+    "  `t_HEADER` this type is always the string `HEADER`.\n",
+    "- `t.value` is the actual string matched by the token.\n",
+    "- `t.lexpos` is the position of the token in the input string that is scanned.\n",
+    "\n",
+    "Furthermore, the lexer object has one important attribute:\n",
+    "- `t.lexer.lineno` is the line number. However, it is our responsibility to update this variable\n",
+    "  by incrementing `t.lexer.lineno` every time we read a newline.\n",
+    "\n",
+    "In the case of the token `HEADER` we need to increment the attribute `t.lexer.lineno`, as the regular expression contains a newline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_HEADER(t):\n",
+    "    r'[A-Za-z]+:.*\\n'\n",
+    "    t.lexer.lineno += 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `MAXDEF`\n",
+    "\n",
+    "The token `MAXDEF` matches a substring of the form `MaxPoints = 60`. Note that the regular expression defining the semantics of this token uses the expression `\\\\s*` to match the white space before and after the character `=`. We cannot just write a blank here because `ply.lex` uses verbose regular expressions that may contain whitespace for formatting. Hence a blank character \"` `\" inside a regular expression is silently discarded.\n",
+    "\n",
+    "After defining the regular expression, the function `t_MAXDEF` has some action code that is used to extract the maximum number of points from the token value and store this number in the variable `t.lexer.max_points`.\n",
+    "`t.value` is the string that is matched by the regular expression.\n",
+    "We extract the maximum number of points using conventional Python regular expressions. Furthermore, we initialize the student name,\n",
+    "which is stored in `t.lexer.name`, to the empty string."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_MAXDEF(t):\n",
+    "    r'MaxPoints\\s*=\\s*[1-9][0-9]*'\n",
+    "    t.lexer.max_points = int(re.findall(r'[1-9][0-9]*', t.value)[0])\n",
+    "    t.lexer.name = ''"
+   ]
+  },
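+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following cell is a small, optional illustration of the remark about verbose regular expressions: when a pattern is compiled with `re.VERBOSE`, as `ply.lex` does for its regular expressions, unescaped blanks inside the pattern are discarded. Hence a pattern that contains literal blanks around the `=` would not match the input `MaxPoints = 60`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "verbose_pattern = re.compile(r'MaxPoints = 60', re.VERBOSE)\n",
+    "print(verbose_pattern.match('MaxPoints = 60'))  # None: the blanks in the pattern are ignored\n",
+    "print(verbose_pattern.match('MaxPoints=60'))    # matches, the pattern is effectively 'MaxPoints=60'"
+   ]
+  },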
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `NAME`\n",
+    "\n",
+    "The token `NAME` matches the *name* of a student, which is followed by a colon character `:`. In general, a student name can be any sequence of letters that contains optional hyphens and blanks. Note that it is not necessary to use `\\\\s` inside a character range, as we can use a blank character instead.\n",
+    "Furthermore, note that the hyphen `-` is the last character in the square brackets, so it cannot be mistaken for the hyphen of a range.\n",
+    "\n",
+    "Note that every name has to contain at least one space so that the definitions of `NAME` and `HEADER` do not overlap.\n",
+    "\n",
+    "The action code has to reset the variable `sum_points`, which is stored in `lexer.sum_points`, to `0`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_NAME(t):\n",
+    "    r'[A-Za-z -]+:'\n",
+    "    t.lexer.name       = t.value[:-1]  # cut off and discard the colon\n",
+    "    t.lexer.sum_points = 0             # start counting"
+   ]
+  },
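+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following cell is a small, optional check of the remark above: the `HEADER` pattern matches a line like `Class: Algorithms and Complexity`, but it does not match a line that starts with a name containing a blank, whereas the `NAME` pattern does match such a line."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(re.match(r'[A-Za-z]+:.*\\n', 'Class: Algorithms and Complexity\\n'))\n",
+    "print(re.match(r'[A-Za-z]+:.*\\n', 'Jim Smith: 9 12 10 6 6 0\\n'))  # None: there is a blank before the colon\n",
+    "print(re.match(r'[A-Za-z -]+:',   'Jim Smith: 9 12 10 6 6 0\\n'))"
+   ]
+  },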
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `MATRICULATION`\n",
+    "\n",
+    "The token `MATRICULATION` matches a string consisting of seven digits. These digits are followed by a colon.\n",
+    "Again, we have to reset the variable `sum_points`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_MATRICULATION(t):\n",
+    "    r'[0-9]{7}:'\n",
+    "    t.lexer.name       = t.value[:-1]  # cut off the colon\n",
+    "    t.lexer.sum_points = 0             # start counting"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `NUMBER`\n",
+    "\n",
+    "The token `NUMBER` matches a natural number. We have to convert the value, which is initially a *string* of digits, into an integer. Furthermore, this value is then added to the number of points the current student has achieved in previous exercises."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_NUMBER(t):\n",
+    "    r'0|[1-9][0-9]*'\n",
+    "    t.lexer.sum_points += int(t.value)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `IGNORE`\n",
+    "\n",
+    "The token `IGNORE` matches a line that contains only whitespace. In order to keep track of line numbers, we have to increment `lexer.lineno`. However, we do not return a token at the end of the function. Hence, if the input contains an empty line, this line is silently discarded."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_IGNORE(t):\n",
+    "    r'^[ \\t]*\\n'\n",
+    "    t.lexer.lineno += 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The Token `LINEBREAK`\n",
+    "\n",
+    "The token `LINEBREAK` matches a single newline character `\\n`. If a student name is\n",
+    "currently defined, then we output the result for this student. Note that we set `lexer.name` back to the empty string once we have processed the student.\n",
+    "This allows for empty lines between different students."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_LINEBREAK(t):\n",
+    "    r'\\n'\n",
+    "    t.lexer.lineno += 1\n",
+    "    if t.lexer.name != '':\n",
+    "        n = t.lexer.name\n",
+    "        m = t.lexer.max_points\n",
+    "        p = t.lexer.sum_points\n",
+    "        print(f'{n} has {p} points and achieved the mark {mark(m, p)}.')\n",
+    "        t.lexer.name = ''"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Ignoring Characters\n",
+    "\n",
+    "The string `t_ignore` specifies those characters that should be ignored. Note that this string is **not** interpreted as a regular expression. It is just a string of *single characters*. These characters are allowed to occur as part of other tokens, but when they occur on their own and would otherwise generate a scanning error, they are silently discarded instead of triggering an error.\n",
+    "\n",
+    "In this example we ignore hyphens `-`, blanks ` `, and tab characters. Hyphens occur when a student has not attempted a given exercise."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "t_ignore = '- \\t'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Error Handling\n",
+    "\n",
+    "The function `t_error` is called when the beginning of the remaining input cannot be matched by any of the regular expressions defined for the tokens above. In our implementation, we print the first character that could not be matched, discard this character, and continue."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def t_error(t):\n",
+    "    print(f\"Illegal character '{t.value[0]}' at line {t.lexer.lineno}.\")\n",
+    "    t.lexer.skip(1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tricking Ply\n",
+    "\n",
+    "The line below is necessary to trick `ply.lex` into assuming this program is part of an ordinary Python file instead of being a *Jupyter notebook*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "__file__ = 'main'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Generating the Scanner and Running It\n",
+    "\n",
+    "The next line generates the scanner."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lexer = lex.lex()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we feed an input string into the generated scanner.\n",
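+    "\n",
+    "Since none of the token functions above returns the token `t`, the scanner does not hand any tokens back to its caller: all of the work happens as a side effect of the action code, and iterating over the lexer merely drives the scanning process. For reference, the canonical `ply` idiom for retrieving tokens from a lexer (taken from the `ply` documentation) is sketched below; it is not needed in this notebook:\n",
+    "```python\n",
+    "lexer.input(data)\n",
+    "while True:\n",
+    "    tok = lexer.token()    # returns None once the input is exhausted\n",
+    "    if not tok:\n",
+    "        break\n",
+    "    print(tok.type, tok.value, tok.lineno, tok.lexpos)\n",
+    "```"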
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lexer.input(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to scan the data that we provided in the last line, the function `scan` iterates\n", + "over all tokens generated by our scanner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def scan(lexer):\n", + " for t in lexer:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can run the scanner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "scan(lexer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.6" + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}