{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting started with Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# Run this cell to view a YouTube video related to this topic\n", "from IPython.display import YouTubeVideo\n", "YouTubeVideo('65u7GK9c78o', height=350, width=600)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This brief introduction will give you an idea of Python syntax. \n", "\n", "You will learn about key concepts such as variables, what they are, and how they are created and updated. \n", "\n", "You will also learn about various types of objects defined in Python and how the type of an object determines its behaviour." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables\n", "\n", "Understanding the concept of a variable is crucial when getting started with Python and other programming languages.\n", "\n", "To put it simply, variables are names that refer to objects defined in the program. If an object does not have a name, it cannot be referred to elsewhere in the program.\n", "\n", "In Python, variables are assigned on the fly using a single equal sign `=`. \n", "\n", "The name of the variable is positioned to the left of the equal sign, while the object that the variable refers to is placed on the right-hand side.\n", "\n", "Let's create a variable named `var` that refers to a _string_ object and call this object by its name.\n", "\n", "Note that string objects are always surrounded by single or double quotation marks!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = \"This is a variable.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell simply calls the variable, returning the object that the variable refers to." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the notion of a *variable* suggests, the value of a variable can be changed or updated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = \"Yes, the variable name stays the same but the contents change.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you need a placeholder for some object, you can also assign the value `None` to a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are largely free to choose variable names, so make sure that the names you choose are informative. \n", "\n", "Variable names are case sensitive, which means that `var` and `Var` are interpreted as different variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling the variable `Var` raises a `NameError`, because a variable with this name has not been defined." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apart from a few basic rules (a name may only contain letters, digits and underscores, and it cannot begin with a digit), naming variables is only limited by the keywords that are reserved for Python's syntax. \n", "\n", "Running the following cell prints out these keywords." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import keyword\n", "\n", "keywords = keyword.kwlist\n", "\n", "print(keywords)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Printing out a list of keywords introduces several important aspects of Python: the `import` statement can be used to load additional modules and make their functionality available in Python. \n", "\n", "We will frequently use the `import` statement to import various external libraries and/or their parts for natural language processing and other tasks.\n", "\n", "In this case, the _module_ `keyword` has an _attribute_ called `kwlist`, which contains a _list_ of keywords. We assign this list to the variable `keywords` and print out its contents using the `print()` _function_." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "### Quick exercise\n", "\n", "Choose a name for a variable and assign a string object that contains some text to the variable. Remember the quotation marks around string objects!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "### Enter your code below this line and run the cell (press Shift and Enter at the same time)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objects\n", "\n", "A list is just one _type_ of object defined in Python. More specifically, a list is one kind of _data structure_ in Python.\n", "\n", "We can use the `type()` _function_ to check the type of an object. To get the type of an object assigned to some variable, place the name of the variable within the parentheses of `type()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(keywords)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember our variable `var`? Let's check its type as well." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(var)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `type()` function is very helpful when hunting for errors in code.\n", "\n", "Knowing the type of a Python object is useful, because it determines what can be done with the object. \n", "\n", "For instance, square brackets that follow the variable name can be used to access _items_ contained in a _list_. \n", "\n", "Note that Python lists are zero-indexed, which means that counting starts from zero, not one." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "keywords[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returns the fourth item in the `keywords` list. \n", "\n", "Can we do the same with the variable `var`?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will not work, since we set `var` to `None`, which is the only object of a special type called _NoneType_.\n", "\n", "Python raises a `TypeError`, because unlike a _list_ object, a _NoneType_ object does not contain any other objects, so there are no items to access.\n", "\n", "Let's return to the list of Python keywords under the variable `keywords` and check the type of the fourth _item_ in the _list_." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(keywords[3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, a _list_ can contain other types of objects.\n", "\n", "Both strings and lists are common types when working with textual data.\n", "\n", "Let's define a toy example consisting of a string with some HTML (Hypertext Markup Language, the language used for creating webpages) tags." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"<p>This is an example string with some HTML tags thrown in.</p>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python provides various methods for manipulating strings such as the one stored under the variable `text`. \n", "\n", "The `split()` method, for instance, splits a _string_ into a _list_ of strings.\n", "\n", "The `sep` argument defines the character (or sequence of characters) that is used as the boundary for a split. \n", "\n", "If no separator is given, the string is split at any _whitespace_, such as spaces, tabs and line breaks.\n", "\n", "Let's use the `split()` method to split the string under `text` at each space character." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tokens = text.split(sep=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assign the result to the variable `tokens`. Calling the variable returns a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tokens" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can just as easily define some other separator, such as the less-than symbol (`<`) marking the beginning of an HTML tag." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text.split('<')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the character that we defined as the boundary does not appear in any of the strings in the resulting list: `split()` removes the separator. The original string under `text`, however, remains unchanged.\n", "\n", "Note that we do not necessarily have to give arguments such as `sep` explicitly: an object of the correct type (the string `'<'`) at the correct position (as the first *argument*) is enough." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we would like to remove the HTML tags from our example string?\n", "\n", "Let's go back to our original string stored under the variable `text`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python strings also have a `replace()` method, which replaces specific characters or sequences of characters in a string.\n", "\n", "Let's begin by replacing the initial tag `<p>` in `text` by providing `'<p>'` as input to its `replace` method.\n", "\n", "Note that the tag `<p>` is in quotation marks, as the `replace` method requires the input to be a string.\n", "\n", "The `replace` method takes two inputs: the string to be replaced (`<p>`) and the replacement (`''`). By providing an empty string as the second argument, we essentially remove all matches from the string.\n", "\n", "Strings cannot be modified in place, so `replace` returns a new string. To keep the result, we assign it back to the variable `text`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = text.replace('<p>', '')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Success! The first tag `<p>` is no longer present in the string. The closing tag, however, remains in place." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "### Quick exercise\n", "\n", "What about the remaining tag? Replace the `</p>` tag in `text` with an empty string." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "### Enter your code below this line and run the cell (press Shift and Enter at the same time)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the `replace` method allowed us to easily replace parts of a string, it is not the most effective way to do so. What if the data contains dozens of HTML tags or other kinds of markup? For this reason, we will explore more efficient ways of manipulating text data in Part II.\n", "\n", "This introduction should have given you a first taste of Python and its syntax. We will continue to learn more Python while working with actual examples." ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" }, "nbsphinx": { "allow_errors": true } }, "nbformat": 4, "nbformat_minor": 2 }