{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Managing textual data using *pandas*\n",
    "\n",
    "This section shows how to prepare and manage textual data for analysis using *[pandas](http://pandas.pydata.org/)*, a Python library for working with tabular data.\n",
    "\n",
    "After reading this section, you should know:\n",
    "\n",
    "- how to import data into a *pandas* DataFrame\n",
    "- how to explore data stored in a *pandas* DataFrame\n",
    "- how to append data to a *pandas* DataFrame\n",
    "- how to save the data in a *pandas* DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing data to *pandas*\n",
    "\n",
    "Let's start by importing the *pandas* library.\n",
    "\n",
    "Note that we can control the name of the imported module using the `as` keyword in the `import` statement. *pandas* is commonly abbreviated as `pd`.\n",
    "\n",
    "This allows us to use the variable `pd` to refer to the *pandas* library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing data from a single file\n",
    "\n",
    "You must often load and prepare the data yourself, either from a single file or from multiple files.\n",
    "\n",
    "Typical formats for distributing corpora include CSV (comma-separated values), JSON (JavaScript Object Notation) and simple plain text files.\n",
    "\n",
    "*pandas* provides plenty of functions for [reading data in various formats](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html). You can even try importing Excel sheets!\n",
    "\n",
    "The following example shows how to load a corpus from a CSV file for processing in Python using the [SFU Opinion and Comments Corpus (SOCC)](https://github.com/sfu-discourse-lab/SOCC) (Kolhatkar et al. [2020](https://doi.org/10.1007/s41701-019-00065-w)).\n",
    "\n",
    "Let's load a part of the SFU Opinion and Comments Corpus, which contains the opinion articles from [The Globe and Mail](https://www.theglobeandmail.com/), a Canadian newspaper.\n",
    "\n",
    "We can use the `read_csv()` function from *pandas* to read files with comma-separated values, such as the SOCC corpus.\n",
    "\n",
    "The `read_csv()` function takes a string object as input, which defines a path to the input file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the CSV file and assign the output to the variable 'socc'\n",
    "socc = pd.read_csv('data/socc_gnm_articles.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*pandas* does all the heavy lifting and returns the contents of the CSV file in a *pandas* [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), which is a data structure native to *pandas*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Examine the type of the object stored under the variable 'socc'\n",
    "type(socc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use the `head()` method of a DataFrame to check out the first five rows in the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>26842506</td>\n",
       "      <td>The Tories deserve another mandate - Stephen H...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2015-10-16 EDT</td>\n",
       "      <td>2187.0</td>\n",
       "      <td>1378.0</td>\n",
       "      <td>&lt;p&gt;All elections are choices among imperfect a...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>26055892</td>\n",
       "      <td>Harper hysteria a sign of closed liberal minds</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/harper-...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2015-08-24 EDT</td>\n",
       "      <td>1103.0</td>\n",
       "      <td>455.0</td>\n",
       "      <td>&lt;p&gt;If even a fraction of the darkness that his...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>19047636</td>\n",
       "      <td>The Globe's editorial board endorses Tim Hudak...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2014-06-06 EDT</td>\n",
       "      <td>905.0</td>\n",
       "      <td>432.0</td>\n",
       "      <td>&lt;p&gt;Over four days, The Globe editorial board l...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   article_id                                              title  \\\n",
       "0    26842506  The Tories deserve another mandate - Stephen H...   \n",
       "1    26055892     Harper hysteria a sign of closed liberal minds   \n",
       "2     6929035  Too many first nations people live in a dream ...   \n",
       "3    19047636  The Globe's editorial board endorses Tim Hudak...   \n",
       "4    11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "\n",
       "                                         article_url            author  \\\n",
       "0  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "1  http://www.theglobeandmail.com/opinion/harper-...  Konrad Yakabuski   \n",
       "2  http://www.theglobeandmail.com/opinion/too-man...   Jeffrey Simpson   \n",
       "3  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "4  http://www.theglobeandmail.com/news/world/disg...    Campbell Clark   \n",
       "\n",
       "   published_date  ncomments  ntop_level_comments  \\\n",
       "0  2015-10-16 EDT     2187.0               1378.0   \n",
       "1  2015-08-24 EDT     1103.0                455.0   \n",
       "2  2013-01-05 EST     1164.0                433.0   \n",
       "3  2014-06-06 EDT      905.0                432.0   \n",
       "4  2013-05-02 EDT     1129.0                411.0   \n",
       "\n",
       "                                        article_text  \n",
       "0  <p>All elections are choices among imperfect a...  \n",
       "1  <p>If even a fraction of the darkness that his...  \n",
       "2  <p>Large elements of aboriginal Canada live in...  \n",
       "3  <p>Over four days, The Globe editorial board l...  \n",
       "4  <p>Growing discontent among Arab nations over ...  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the DataFrame has a tabular form.\n",
    "\n",
    "The DataFrame contains several columns such as **article_id**, **title** and **article_text**, accompanied by an index for each row (**0, 1, 2, 3, 4**).\n",
    "\n",
    "The `.at[]` accessor can be used to inspect a single item in the DataFrame.\n",
    "\n",
    "Let's examine the value in the column **title** at index 123."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"How Toronto got a 'world-class,' gold-plated, half-billion-dollar empty train\""
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.at[123, 'title']"
   ]
  },
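  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how `.at[]` combines a row label with a column name, it can help to try it on a tiny table. The following is a minimal sketch: the DataFrame `toy` and its contents are made up for illustration and are not part of the SOCC corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# A toy DataFrame invented for illustration (not part of SOCC)\n",
    "toy = pd.DataFrame({'title': ['First article', 'Second article'],\n",
    "                    'ncomments': [10, 25]})\n",
    "\n",
    "# Retrieve a single value: row label 1, column 'title'\n",
    "toy.at[1, 'title']"
   ]
  },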
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "### Quick exercise\n",
    "\n",
    "Let's go back to the SOCC corpus stored under the variable `socc`.\n",
    "\n",
    "Who is the author (`author`) of the article at index 256?\n",
    "\n",
    "How many top-level comments (`ntop_level_comments`) did the article at index 1000 receive?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "### Enter your code below this line and run the cell (press Shift and Enter at the same time)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing data from multiple files\n",
    "\n",
    "Another common scenario is that you have multiple files with text data, which you want to load into *pandas*.\n",
    "\n",
    "Let's first collect the files that we want to load."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PosixPath('data/WP_1990-08-10-25A.txt'),\n",
       " PosixPath('data/NYT_1991-01-16-A15.txt'),\n",
       " PosixPath('data/WP_1991-01-17-A1B.txt')]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Import the Path class from the pathlib library\n",
    "from pathlib import Path\n",
    "\n",
    "# Create a Path object that points to the directory with data\n",
    "corpus_dir = Path('data')\n",
    "\n",
    "# Get all .txt files in the corpus directory\n",
    "corpus_files = list(corpus_dir.glob('*.txt'))\n",
    "\n",
    "# Check the corpus files\n",
    "corpus_files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To accommodate our data, let's create an empty *pandas* DataFrame and specify its *shape* in advance, that is, the number of rows (`index`) and the names of the columns (`columns`).\n",
    "\n",
    "We can determine the number of rows needed using Python's `range()` function. This function generates a sequence of numbers that fall within a certain range, which we can use for the index of the DataFrame.\n",
    "\n",
    "In this case, we define a `range()` between `0` and the number of text files in the directory, which are stored under the variable `corpus_files`. We retrieve their number using the `len()` function, which returns the length of Python objects, if applicable.\n",
    "\n",
    "For the columns of the DataFrame, we simply create columns for filenames and their textual content by providing a list of strings to the `columns` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>filename</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  filename text\n",
       "0      NaN  NaN\n",
       "1      NaN  NaN\n",
       "2      NaN  NaN"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a DataFrame and assign the result to the variable 'df'\n",
    "df = pd.DataFrame(index=range(0, len(corpus_files)), columns=['filename', 'text'])\n",
    "\n",
    "# Call the variable to inspect the output\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have an empty DataFrame with a row for each file in the corpus, we can loop over the file paths under `corpus_files` and add their contents to the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loop over the corpus files and count each loop using enumerate()\n",
    "for i, f in enumerate(corpus_files):\n",
    "    \n",
    "    # Open the file for reading; the 'with' block closes it afterwards\n",
    "    with open(f, encoding=\"utf-8\") as c_file:\n",
    "        \n",
    "        # Read the file contents\n",
    "        text = c_file.read()\n",
    "    \n",
    "    # Get the filename from the Path object\n",
    "    filename = f.name\n",
    "    \n",
    "    # Assign the text from the file to index 'i' at column 'text'\n",
    "    # using the .at accessor – note that this modifies the DataFrame\n",
    "    # \"in place\" – you don't need to assign the result into a variable\n",
    "    df.at[i, 'text'] = text\n",
    "    \n",
    "    # We then do the same to the filename\n",
    "    df.at[i, 'filename'] = filename"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check the result by calling the variable `df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>filename</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>WP_1990-08-10-25A.txt</td>\n",
       "      <td>﻿*We Don’t Stand for Bullies': Diverse Voices ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NYT_1991-01-16-A15.txt</td>\n",
       "      <td>﻿U.S. TAKING STEPS TO CURB TERRORISM: F.B.I. I...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>WP_1991-01-17-A1B.txt</td>\n",
       "      <td>﻿U.S., Allies Launch Massive Air War Against T...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 filename                                               text\n",
       "0   WP_1990-08-10-25A.txt  ﻿*We Don’t Stand for Bullies': Diverse Voices ...\n",
       "1  NYT_1991-01-16-A15.txt  ﻿U.S. TAKING STEPS TO CURB TERRORISM: F.B.I. I...\n",
       "2   WP_1991-01-17-A1B.txt  ﻿U.S., Allies Launch Massive Air War Against T..."
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
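  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside, a DataFrame can also be built in a single step from a list of dictionaries, without creating an empty DataFrame first. The following is a minimal sketch of that alternative, not the approach used above: the filenames and texts in `records` are invented for illustration, and in practice they would come from reading the corpus files. The populated result has the same two columns as `df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical filenames and texts, invented for illustration\n",
    "records = [{'filename': 'example_1.txt', 'text': 'First example text.'},\n",
    "           {'filename': 'example_2.txt', 'text': 'Second example text.'}]\n",
    "\n",
    "# Each dictionary becomes a row; the keys become the column names\n",
    "df_alt = pd.DataFrame(records)\n",
    "\n",
    "# Call the variable to inspect the output\n",
    "df_alt"
   ]
  },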
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the DataFrame has been populated with filenames and text.\n",
    "\n",
    "Now that we know how to load data into DataFrames, we can turn towards accessing and manipulating the data that they store."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examining DataFrames\n",
    "\n",
    "*pandas* DataFrames can hold a lot of information, which is often organised into columns.\n",
    "\n",
    "The columns present in a DataFrame are accessible through the attribute `.columns`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['article_id', 'title', 'article_url', 'author', 'published_date',\n",
       "       'ncomments', 'ntop_level_comments', 'article_text'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Retrieve the columns and their names\n",
    "socc.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*pandas* provides various methods for examining the contents of entire columns, which can be accessed just like the keys and values of a Python dictionary.\n",
    "\n",
    "The brackets `[]` can be used to access entire columns by placing the column name within the brackets as a string.\n",
    "\n",
    "Let's retrieve the contents of the column `author`, which contains author information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0         GLOBE EDITORIAL\n",
       "1        Konrad Yakabuski\n",
       "2         Jeffrey Simpson\n",
       "3         GLOBE EDITORIAL\n",
       "4          Campbell Clark\n",
       "               ...       \n",
       "10334     GLOBE EDITORIAL\n",
       "10335     GLOBE EDITORIAL\n",
       "10336     GLOBE EDITORIAL\n",
       "10337      Adam Radwanski\n",
       "10338     GLOBE EDITORIAL\n",
       "Name: author, Length: 10339, dtype: object"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc['author']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the column `author` contains 10339 objects, as indicated by the *Length* and *dtype* properties. The numbers on the left-hand side give the index, that is, the row numbers.\n",
    "\n",
    "The columns of a *pandas* DataFrame consist of another object type, namely *pandas* Series. You can think of the DataFrame as an entire table, whose columns consist of Series.\n",
    "\n",
    "We can verify this by examining their type using Python's `type()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(pandas.core.frame.DataFrame, pandas.core.series.Series)"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(socc), type(socc['author'])"
   ]
  },
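  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The relationship between the two types can also be sketched on a small example. The DataFrame `small` below is invented for illustration: selecting one column with a string returns a Series, whereas selecting with a list of column names returns a DataFrame, even if the list contains a single name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# A small DataFrame invented for illustration\n",
    "small = pd.DataFrame({'author': ['A', 'B'], 'ncomments': [10, 25]})\n",
    "\n",
    "# A string selects a single column as a Series;\n",
    "# a list of column names selects a DataFrame\n",
    "type(small['author']), type(small[['author']])"
   ]
  },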
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When printing out the contents of a DataFrame or Series, *pandas* omits everything between the first and last five rows by default. This is convenient when working with thousands of rows.\n",
    "\n",
    "This also applies to the output of methods such as `value_counts()`, which counts how many times each unique value occurs in a Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GLOBE EDITORIAL                            2712\n",
       "Jeffrey Simpson                             649\n",
       "Margaret Wente                              547\n",
       "Konrad Yakabuski                            404\n",
       "Gary Mason                                  365\n",
       "                                           ... \n",
       "STEVEN HOFFMAN                                1\n",
       "RON DEIBERT                                   1\n",
       "Whitney Lackenbauer and Adam Lajeunesse       1\n",
       "Rolando Ochoa                                 1\n",
       "Kyle Kirkup and Brenda Cossman                1\n",
       "Name: author, Length: 1896, dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc['author'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not surprisingly, the editorial team at *The Globe and Mail* is responsible for most of the editorials!\n",
    "\n",
    "Let's take another look at the data and visualise the result by calling the `.plot()` method for the author information column. This method calls an external library named *matplotlib*, which can be used to produce all kinds of plots and visualisations.\n",
    "\n",
    "More specifically, we instruct the `plot()` method to draw a bar chart by providing the string `bar` to the `kind` argument.\n",
    "\n",
    "We also use the brackets `[:10]` to limit the output to the ten most prolific authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAFICAYAAAC8zi5PAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAxUUlEQVR4nO3dd5xsVZnu8d9DEkURlCMqQYLoFQOIBxPMmEbFCGICRTGMoIIi17mKEy6YxjRmMYCAoChi5KgIAqMioMgBETgglzMCAiKgIqA4kp77x1pFVzfd5xw4XWvX1H6+n09/uvau6l5vh3pr1wrvkm0iIqIfVuk6gIiIaCdJPyKiR5L0IyJ6JEk/IqJHkvQjInokST8iokdW6zqAZVlvvfW8ySabdB1GRMT/KGeeeebvbS+Y7b6xTvqbbLIJixcv7jqMiIj/USRdOtd96d6JiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB4Z68VZK2KT/b630t/jkvc/Zx4iiYgYf7nSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHllu0pe0kaQfSjpf0hJJ+9TzB0i6QtLZ9ePZQ1/zDklLJV0o6ZlD53eo55ZK2m80P1JERMxlRaps3gK81fZZku4FnCnphHrfR23/x/CDJW0J7AI8HHggcKKkh9S7DwSeDlwOnCFpke3z5+MHiYiI5Vtu0rd9JXBlvX2DpAuADZbxJTsCR9n+G3CxpKXAY+t9S23/GkDSUfWxSfoREY3cqT59SZsAjwZOr6f2lnSOpEMlrVvPbQBcNvRll9dzc52f2cYekhZLWnzNNdfcmfAiImI5VjjpS7on8A3gLbavBz4DbA5sTXkn8OH5CMj2QbYX2l64YMGC+fiWERFRrdDOWZJWpyT8I21/E8D2VUP3Hwx8tx5eAWw09OUb1nMs43xERDSwIrN3BBwCXGD7I0PnHzD0sBcA59Xbi4BdJN1N0qbAFsDPgTOALSRtKmkNymDvovn5MSIiYkWsyJX+dsArgHMlnV3P/TOwq6StAQOXAHsC2F4i6WjKAO0twF62bwWQtDdwPLAqcKjtJfP2k0RExHKtyOydUwDNctexy/ia9wLvneX8scv6uoiIGK2syI2I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiR5SZ9SRtJ+qGk8yUtkbRPPX8fSSdIuqh+Xreel6RPSFoq6RxJ2wx9r93r4y+StPvofqyIiJjNilzp3wK81faWwOOBvSRtCewHnGR7C+CkegzwLGCL+rEH8BkoLxLA/sDjgMcC+w9eKCIioo3lJn3bV9o+q96+AbgA2ADYETi8PuxwYKd6e0fgCBc/A9aR9ADgmcAJtv9o+1rgBGCH+fxhIiJi2e5Un76kTYBHA6cD69u+st71O2D9ensD4LKhL7u8npvr/Mw29pC0WNLia6655s6EFxERy7HCSV/SPYFvAG+xff3wfbYNeD4Csn2Q7YW2Fy5YsGA+vmVERFQrlPQlrU5J+Efa/mY9fVXttqF+vrqevwLYaOjLN6zn5jofERGNrMjsHQGHABfY/sjQXYuAwQyc3YFjhs6/ss7ieTxwXe0GOh54hqR16wDuM+q5iIhoZLUVeMx2wCuAcyWdXc/9M/B+4GhJrwUuBV5S7zsWeDawFLgReDWA7T9KejdwRn3cu2z/cT5+iIiIWDHLTfq2TwE0x91Pm+XxBvaa43sdChx6ZwKMiIj5kxW5ERE9kqQf
EdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPbLcpC/pUElXSzpv6NwBkq6QdHb9ePbQfe+QtFTShZKeOXR+h3puqaT95v9HiYiI5VmRK/0vADvMcv6jtreuH8cCSNoS2AV4eP2aT0taVdKqwIHAs4AtgV3rYyMioqHVlvcA2ydL2mQFv9+OwFG2/wZcLGkp8Nh631LbvwaQdFR97Pl3PuSIiLirVqZPf29J59Tun3XruQ2Ay4Yec3k9N9f5iIho6K4m/c8AmwNbA1cCH56vgCTtIWmxpMXXXHPNfH3biIjgLiZ921fZvtX2bcDBTHXhXAFsNPTQDeu5uc7P9r0Psr3Q9sIFCxbclfAiImIOdynpS3rA0OELgMHMnkXALpLuJmlTYAvg58AZwBaSNpW0BmWwd9FdDzsiIu6K5Q7kSvoK8GRgPUmXA/sDT5a0NWDgEmBPANtLJB1NGaC9BdjL9q31++wNHA+sChxqe8l8/zAREbFsKzJ7Z9dZTh+yjMe/F3jvLOePBY69U9FFRMS8yorciIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JEk/YiIHllu0pd0qKSrJZ03dO4+kk6QdFH9vG49L0mfkLRU0jmSthn6mt3r4y+StPtofpyIiFiWFbnS/wKww4xz+wEn2d4COKkeAzwL2KJ+7AF8BsqLBLA/8DjgscD+gxeKiIhoZ7lJ3/bJwB9nnN4ROLzePhzYaej8ES5+Bqwj6QHAM4ETbP/R9rXACdzxhSQiIkbsrvbpr2/7ynr7d8D69fYGwGVDj7u8npvrfERENLTSA7m2DXgeYgFA0h6SFktafM0118zXt42ICO560r+qdttQP19dz18BbDT0uA3rubnO34Htg2wvtL1wwYIFdzG8iIiYzV1N+ouAwQyc3YFjhs6/ss7ieTxwXe0GOh54hqR16wDuM+q5iIhoaLXlPUDSV4AnA+tJupwyC+f9wNGSXgtcCrykPvxY4NnAUuBG4NUAtv8o6d3AGfVx77I9c3A4IiJGbLlJ3/auc9z1tFkea2CvOb7PocChdyq6iIiYV1mRGxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPZKkHxHRI0n6ERE9kqQfEdEjq3UdwCTYZL/vrfT3uOT9z5mHSCIili1X+hERPZKkHxHRI0n6ERE9kqQfEdEjSfoRET2SpB8R0SNJ+hERPbJS8/QlXQLcANwK3GJ7oaT7AF8FNgEuAV5i+1pJAj4OPBu4EXiV7bNWpv2YLusFImJ55uNK/ym2t7a9sB7vB5xkewvgpHoM8Cxgi/qxB/CZeWg7IiLuhFF07+wIHF5vHw7sNHT+CBc/A9aR9IARtB8REXNY2aRv4AeSzpS0Rz23vu0r6+3fAevX2xsAlw197eX1XERENLKytXe2t32FpPsBJ0j61fCdti3Jd+Yb1hePPQA23njjlQwvIiKGrdSVvu0r6uergW8BjwWuGnTb1M9X14dfAWw09OUb1nMzv+dBthfaXrhgwYKVCS8iIma4y0lf0lqS7jW4DTwDOA9YBOxeH7Y7cEy9vQh4pYrHA9cNdQNFREQD
K9O9sz7wrTITk9WAL9s+TtIZwNGSXgtcCrykPv5YynTNpZQpm69eibYjIuIuuMtJ3/avga1mOf8H4GmznDew111tLyIiVl5W5EZE9EiSfkREj2S7xJhXKQURMd5ypR8R0SNJ+hERPZLunZhIK9vNlC6mmFS50o+I6JEk/YiIHknSj4jokST9iIgeSdKPiOiRJP2IiB5J0o+I6JHM048YkZSkiHGUpB8x4cZhoVpeAMdHunciInokST8iokeS9CMieiR9+hHRG+MwvtG1XOlHRPRIrvQjIhrqeiZTrvQjInokST8iokeS9CMieiRJPyKiR5L0IyJ6JEk/IqJHkvQjInokST8iokeS9CMieqR50pe0g6QLJS2VtF/r9iMi+qxp0pe0KnAg8CxgS2BXSVu2jCEios9aX+k/Flhq+9e2bwKOAnZsHENERG/JdrvGpBcBO9j+x3r8CuBxtvceeswewB718KHAhSvZ7HrA71fye8yHcYhjHGKA8YhjHGKA8YhjHGKA8YhjHGKAlY/jQbYXzHbH2FXZtH0QcNB8fT9Ji20vnK/v9z85jnGIYVziGIcYxiWOcYhhXOIYhxhGHUfr7p0rgI2Gjjes5yIiooHWSf8MYAtJm0paA9gFWNQ4hoiI3mravWP7Fkl7A8cDqwKH2l4y4mbnratoJY1DHOMQA4xHHOMQA4xHHOMQA4xHHOMQA4wwjqYDuRER0a2syI2I6JEk/YiIHulF0pf0H13HEDGOJK0iae2u44h2etGnL+k3tjfuOo4uSNoe2ML2YZIWAPe0fXGjtv+X7V9J2ma2+22f1SKOGsvdgBcCmzA0gcH2uxq1f7Ttl0g6Fxh+0qmE4Ue1iKPG8mXg9cCtlBl1awMft/2hVjGME0kPojxHTpR0d2A12zc0avt/L+t+2x+Z7zbHbnHWiKhpY9LOwAeA+9W2B0/spldUkvYHFlJWNh8GrA58CdiuUQj/m7K6+sOz3GfgqY3iADgGuA44E/hbw3YH9qmfn9tB2zNtaft6SS8Hvg/sR/m9NE364/A8kfQ6yv/ofYDNKWuHPgs8rVEI96qfHwpsy9QU9ucBPx9Ji7Yn4oPyR5vt477A5Y1jWQo8bAx+J2dTnki/GDp3TgdxrDLLuTUbx3Be13+PGscHVuTciGNYQrkA+BrwpHrulx38Ljp/ntTnyBozniPndhDHycC9ho7vBZw8irYmqU//TGBx/Tz8sRi4uXEsV9m+oHGbs7nJ5T/IAJLW6iiOzw8f1Di+1ziG0yQ9snGbs3n6LOee1TiGzwKXAGsBJ9fujesbxwDj8Tz5m0vxRwAkrcb07rdW1gduGjq+qZ6bdxPTvWN7065jGLJY0leBbzPUlWD7m43jOFrS54B16tvY1zAjATdyhaRP236jpHUpCf/gxjFsD7xK0sWUv0nTvnRJbwDeCGwu6Zyhu+4FnNYihhrHKpRku8HQud8AT2kVw5BxeJ78WNI/A3eX9HTK3+g7DdsfOAL4uaRv1eOdgC+MoqGJHsiVtDnwMmAX2w9v2O5hs5y27de0imEolqcDz6AkueNtn9A6hhrHBykDho8B3m/7G43bf9Bs521f2qj9ewPrAu+j9KEP3GD7jy1iGIplXIqKdf48qS+Cr2XoOQJ83g0ToyRRxhIWAH9XT59s+xcjaW/Skr6kBwIvpST7R1KeZN+0fW6ngXVA0gdsv31550bY/s7Dh8C/UQanjoP273wkbcXUk+ontn/Zsv0awxdtv2J550Ycw/spZXu/CvxlcL71i8+4qDN2Nra9smXcVyaGc2036X6cmKRf6/DvCmwAHF0/jumi20fShsAnmZol8xNgH9uXN47jLNvbzDh3TsMujdmu5AZaX9HtA7wOGLzQvAA4yPYnW8VQ45j2N6l9yOfYbraDXO3imsm2N2vU/ttsf1DSJ5ml/9z2m1vEUWN5PmXW0hq2N5W0NfAu289vFUON43DgU7bPGHVbE9OnD3wK+CnwMtuLASR19Yp2GPBl
4MX1eLd6brZBvHk31H+82Sz9x6e2iAHA9qtbtbUCXkvZsOcvUN7xUP5fmiR9Se8ABn3H1zM1jfgmGhf5GoPxr8Hg7eJOoyj2p+zo9yMA22dL6uL38zjg5ZIupbz7GtmY0yQl/QdQkuyHJd2fcqW/ekexLLA9fJX7BUlvadj+lynzrzvvP4bb+/PfA/yV0rXzKGBf219qGQZlMdLArTRcv2H7fcD7JL3P9jtatTsbSfegrKHY2PYekrYAHmr7uy3atz0YKL3R9tdmxPbiWb5klG62fV3pVr9dFxeLz2zV0MRM2bT9B9uftf0kysKKPwFXSbpA0r83DucPknaTtGr92A34Q6vGbV9n+xLbuwKXU6asGrinpC5WJj/D9vWUhUmXAA8G/k/jGA4DTpd0gKR3Aj8DDmkcA8C/1P+NfwOQtJGkxzaO4TDKO4wn1uMrKC/Krc324tf6BXGJpJcBq0raonY5NZtNNWD70jqp4K+U5+rtU63n28T06c+lXsXs6kbL7WubD6J0GzyhnjoVeLPt37SKocaxN3AAcBVwWz3dbJriUBzn2X6EpM8DX7d9nKRf2t6qcRzbUKZuQhnIHcnsiOXE8BnK3+Kpth9Wp7D+wPa2DWNYbHuhpF/YfnQ91+zvIelZwLOBl1AGkwfWpqwWbvYiWN/1/AvTZ++82/Z/t4qhxvF8ysr1BwJXAw8CLhjFrMOJ6d6ZMVNkpvOaBcLt0wCbDgTN4S2Ut+3N3mXM4buSfkW5inlDrQHU+km1ObDE9lmSngL8naSLbf+pZRyUcYVtJP0CwPa1KrvItXRTnbEyWLS3OW1LU/yW0p//fMoCyoEbgH0bxoHtGylJ/19atjuLdwOPB060/ej6P7rbKBqamKRPqVUxFzM1a2PkxqQPG+AySr2ZTtner/5OrrN9q6S/ADs2DuMbwEJJD6asSF1EGft4duM4bpa0KlMJdwFT78Ja2Z/yf7mRpCMps8xe1apx27+UdB7wTNuHt2p3mEohws1sH1GPv04p2wLwHtv/2Tikm23/QaXq6Sq2fyjpY6NoaGKS/pjNFHmG7bdJegGlD3tnSm2N1kn/18CPJH2P6Sse571y37JIeuXQ7eG7jmgYxm0u23XuTJka98nB1XZjnwC+BdxP0nuBFwH/2jIA2ydIOotyZSnKdOLfN47h1jqescZwGYSG3gm8aej4oZQXvrUos6xaJ/0/SbonJU8cKelqhtZQzKeJSfoA9Qpq3cE/cH3b/CrKVfbDGoYy+L0+B/jaLLMDWvlN/VijfnRluL96TcpA+1m0Tfo3S9oVeCVT7wqbz+6yfaSkMym/AwE7tao/ozuWuL6yft5Y0sZuWOq6uhg4VdIipi8Sa3FRsrbt84eOL7J9JoCk9zVof6YdKT0D+wIvB+4NjGQccmKSvqRdgM8Bf5F0EfBe4FBKvfCXNw6n8z5sANvvhDJYVfsuO2F7+IoKSesARzUO49WUGvLvtX1xnYv9xcYxIOkQ4JO2Dxw6d4DtAxo0PyhxvSal5PYvKS88j6L0sT9hjq8blf+qH6swVWK4lXWGD2wPjwmOpNDZctwPuLIOIB9ex1zWZxSz/txhWdP5/KAM1j643t6G0p3xvA7juQ+war19D+D+HcTwBOB84Df1eCvg02Pwt1oduLDrODr62S+nJNtXDp07q3EM3wQeOXT8CMqsqs5/Pw1/B98BnjPL+ecC3+sgnsWUVcGD4zWAM0bR1sRc6VPKCC+FsiOTpIs8tQikKUlrUrqVtq+rgk8BPtNBKB+jLPpYBLcPoP196yAkfYepOcerAg+jLJ5rGcMWlMVqW1KudAFwo9IDQ66mVLT8kqTHUTZXad3391AP1aKyfZ6klt2fwO2D2G8DHs70v0mLzXX2Bb4n6UWUrkYoxQCfSDcb3azmobEN2zeNalbXJCX9+2n61mPrDB+77eDlEZTpZ4Ml/i+jdCW0Xm2I7ctmjCfcOtdjR2h4j+JbgEvduA4RZUHS/sBHKUn31XSz
OFG2rwOeJ+kAyvL/ezeO4Zy6ZmIwseDlwDnLePyoHEmZp/9cStfb7sA1LRq2vVTSoyg/+2Au/MnA6914jn51jaTn214EIGlHSlG8eTcxi7NUtgack2v/dqNYzveMAlqznWsQx9eBj1DqEg2uKhfa3qVlHDWW+1NqnJjytvV3jds/0/ZjNFTNcHCucRzvtL3/0PHzKBMNmm0dWd+JvgEYvOs7GfhM62Q39De5vQigpDPccKHauKhrJY6kLM4SZbr1Kwe9F/Pa1qQk/XEi6UuUaYE/q8ePA/ay/cplf+W8x7Ee8HHgHyj/SD+gTM9rulhL0j8C/5cyDU7AkyiVDA9tGMNplNW4X69xXEGp6//QVjHEdJJ+Zvvxko6nTGX9LWVsYfOOQ+tMnbaJ7T+PrI1JSvp1efc7KP22UPYC/YDtYxvHcQFl3u+g7MLGwIWUrg17xGUQJB1DKf1wKuWquot50MPxXAg8cfBiI+m+wGktE66kbSnVHdehrH68N/DBwQtzwzjGYTPw7SjlOR7EUBdv6/ENSc+llB3fiNIVujbwzkEXRx9I2s32l2Z0Td9uFN3SE9Onr7Id4J6UgaFBydaFwPslbWi7ZfnaHRq2NZuDKQNS7wUeVaePnkZ5ETjN9lWN4/kDZYxj4AYaFqAD8FSd8j9T+vO78kHKrLIu94Y9hDKQeSbdjPEA4KmqntfRzXaN09Q6SBvZbjm+Mdi3utmU1Ym50pd0PrC9Z5QOrleVp7jt4qzb/4GYfiXVevHLYMHao4EnUwbLNrW9aqO2B1cvW1N2MTuG0qe/I2XjkFc1iGGZV41uv1nGqba3W/4jRxrD6bYf12H7n1jW/W67icqPKDWAVqO8CF4NnGp71ivvEcWwNaXy7JIWFwMTc6VPeQG7Q614l3oWbQOR3k2ZsvlfTE1VNNBysG49ytX+EynL7dcETqRsHNLK4OplsAhn4JiGMTyBMij2FeB02k+PnGkcNgP/oaQPUebrD8fQ6qLk9ZR1NUdT+vG7/Jvc2/b1ddzpCNv7a/rGQyOlUmL7FZQXnA+q7Ldw8CjbnKSkf72krTxj31OVfVFvmONrRuUlwOZd9aXXFcnXUYqMHU8pIDWygaG5tJwxtQz3p+xYtitl6uz3gK/YXtJRPGsDN1JK+Q40LQhImckFpftzOIZWFyWDDY9eShnn+iplAPdPjdoftpqkB1Ces11U2twF2Nr2jbVX4jhK9+zITFLSfyuwSGVf1kG51oWUub8jKVG6DOdRBgyvbtzuwKGUq/sXUrpVHiHpp8AvbDfvw+1yEU79eY8DjpN0N0ry/1GdOvmpUbc/SzydFwa03Wn/eR3Q/yzwWZX9pHcBzpf0dtutS2O8i3JhdIrtMyRtBlzUsP2/uZZIqb0SI187MjF9+gCS1gf2YmqxxfnAgR3MCV9I6cI4j+lvn5vX2Jf0EEoXzxMoUxZ/77K7WMsYfkC5mvsnhhbh2H57o/bvRil+tyuwCWWF8qG2r2jR/oxYHkJZnb2+y8YyjwKeb7vZzlWS/u9s591wo6EaxzaUv8nTKRdqH/b0ImgTT9KfKOskoHRz/d3Q8UhyxsQkfUlfaDEwuCIkLaEUfzuXoVrptn/cOI7NKAl/u/r5gcDptpsuM+9yEY6kIyi1ZY4FjrLddEOdWeL5MWWryM95ateq82w/omEMbx06XJOyIvYC269p1P67KC/CF1AK7x1n+5YWbc8Sy2HMsi1hw9/FMi/ARpEzJinpn2V7ZunYTnS9qlDStyj9ttdTpmqeRpmR0Mk0wS4X4Ui6jamyvcP/7M3nx9d4zrC9raZvVXi27a1bxjEjprsBx9t+cqP2bqOUVR5Ufh38XQZ/k2bbeUp64dDhmsALgN+2nEHU2iT16d9D0qOZYyZA4+mSP1Gpyb2IbmZHHAa8zo03xliG90i6N2XcZbAI5y0tGrbdRX2dZfl9XXI/2DnrRUzVte/KPYANG7a3acO2lsn2N4aPJX2FUiBxYk3Slf4NlNr5
syV9N65t8sOuYxgnkja1ffGMc9sOLZjqjdrldhClu+1ayhXvy132VW4Vw7lMr3q6gFIWo/nA9riR9FBKaeUHdx3LqExS0r/97XKMF5Wdop4/GDhVKe98oGvhsz6StBalyueNwC62j2zY9oOGDm8BruqqT71r9WJxOAn+DnjHzHcADeMZ+YZHk9S907ku6mj8D/F64Nu1ouQ2lLr2rTck75SktSkzyzagzOw6sR6/lVLWuFnSH7yrkHQ/Sj/2AyVh+zfL/srJY7v1jl23k7S67Zvr7ScCnwfuSdm+citgT9tvnO92x62/c2W8resAmF5HY7aPpiSdtCLnRq1247yZUuXzAOAfbF/WMgZJb6qlMbryRUoRvnOB1wE/pCxQeoHtHVsGIun5dQHfxcCPgUuA77eMYSiWu9culU5IekEdbxocryNpp0bN7yFp+3r7o5QNj/4AZcMjpkpfz6tJ6t4Z7qecdheNZwR0TaVe+j0oieXJTI1zrE2ZHve/GsUxvGMWlOqnV1L6spuuW5D0HsoioLMoi9eOd8N/fk2v478q5fewsTvYsEPSLymrb0+0/WhJTwF2s/3axnE8j7LBzhq2N601aN7V+P/iDjOnWnUV1/+Dj9p+86Ae0oxZXb+0vdV8tztJ3TtdbHE2jUqlzx/Zvkil4M8hlFWxlwK72/5Fo1D2pMyOeSBTW8FBmcLZcrDuP5b/kDZs/2utc/IMSpXNT0k6GjjE9n8t+6vnxc1Dsdwq6fIuEv4glsHqT0mr2P6hpI91EMcBlI11fgRg+2yVDetbmq23o0lerKvFB1NDL6tdPJa0OmXDo5FMsZ6YpD/UT7kOsEU9/f9ctqZrZR/gC/X2rpSNyDejVLn8BGW13cjZ/jjwcUlvsv3J5X7B6OJouhhteWxb0u8og3W3AOsCX5d0gu1Rdw9uJen6elvA3etxF+sF/qSyWcfJwJGSrmZqLUNLN9u+TtMLIrbuelgs6SPAgfV4L6bKuLT0esqGRxtQNvj5QY1l3k1S987dKKtgd6L0VYqyScS3KPtejrz42fBbRUlfpqx+/Xg9br54rM4Q2ZfSjbCHyubgD/VUHfNWcTyeMj//YcAalGmCf2mZ6CTtA7ySsu/o54Fv275ZpdbJRS0Wio2L+n/xV8pV7sspG8oc6fY7qh0CnATsR3lH/GZgdduvbxjDWsC/UXaXM3AC8F7bzV4EazfPEbZf3qK9ibnSB/4VWJ2yCcINAJLuRXkF/7f6MWq3qVTsuxZ4GmUTk4G7N2h/pkMpVy1PrMdXAF8DmiZ9SpfSLrXthZTk+5DGMawL7DxzPrzt21R2cOqNQUKrA9vXA+e1TvjVmyiVLf8GfJlaEbZV4zXZftfdF6C7VdKDJK3R5OJ0gq70zwMeO3OOa30b+7MWtU1q8vgc5Ur2O7ZfV88/CXib7eeMOoYZ8Sy2vbDF4NAKxjFce6fZuor65F7SagB7XEn6LrCf7fPqxclZlF3mNgMOtv2xLuPrQp3NtnPjbuDZ4jiC8k54EUNdbaOY5j1JV/q3zbaowfafJTV5ZbP93brw5V62rx26azGldnhrN0m6O1NL/jdnqCzEqElatQ5W3ShpDeBsSR+kzFxpNl24XkldKGnjPs5FH7KppwrOvRo4wfYr6zviU4GPtQxG0gnAi13r6Nd3HkfZfmbDMP4MnFtjGU62rWvvDDYaWoURT++epKTv+k8zWxmG22Y5N5ogysrGa2ec62KQDGB/Si35jSQdSam2+aqG7Z8p6Q2UnYFWAfamjDFsROnDbWldYImknzP9yd283HWHbh66/TTqZh22b1Apgtbaeh7aOMX2tXXBWEvfpO0GNrNyww2HJinp35vSfz1r7Z3GsXSuDlCuC+xM2VBFwD5uW4RtT8oA7i8p3VvXAl3tptViTGe5JO0MfAC4H+Vv0nL2zmWS3gRcTlkZfVyN6e6U8bDWbht+91XfJTd9rto+vGV7M82ylmWaUVyUTEyfftzRoC+94xhEmY72
T5RVn8P7C0xs+dq5SFoKPM8dlLmuV9HvomxXeKDtH9TzTwEeY7vpugpJO1CKz/2YqQ1E9rB9fMMYtqCUBdmS6bu6bdao/UE9/Z0pW3t+qR7vSqmJtO+8tzkpSV+17k29vZ3tU4fu29sNKwhK+iZlYdb3bXfxtnkQx/spUxS/yvQujTtsID/CGO4LfIjypPoc05N+s6uscZg2WuM41fZ2LdscZ5LWo7wThTLhomk5cEmnULpBPwo8jzLWsYrtWXcXG2Ecd7hAG9VF2yQl/dvnwc+cE996jrykf6D88zyeMk3xMNsXtmp/KI6LZznthlcxr6fsEvUhyk5Rnf2zSVrMLNNGbb+jUfs715tPolzRfZvpey103q/cBUkbUNbT3N7VbPvkub9i3tsf7Oo2XCbjTNuPaRVDbfMC4Dm2f12PNwWOtf2w+W5rkvr0Ncft2Y5HyvaJwIkqhZx2rbcvowycfcm1sl6DOLrerGJ74Am2u9ogfhrbS4dmFB0m6RdAk6RPuYocuJFSDuL20BiDwcTWJH2AMqttCVPvAM3QHrEN/G2wQE/S3pS1LPds2P7AvsCPJP2aqYWle46ioUlK+p7j9mzHI1e7NXajzFz5BaV07vaUTcGf3DCOR3DH/sojWrRte7cW7aygrqeNvhru2PU4ONcqjjGzE2WFeLNpxLPYh1Kc8M3AuymF6HZvHYTt4+r4wmAtya9G9XuZpO6dG4GllFfJzett6vFmttea62tHEMu3KGV0vwh8wfaVQ/c1G1yVtD/lBWZLysbgzwJOsf2iFu2Pkzoz5CpKf/6+lNlen7a9dJlfOP9x3KGrsYPux0/Mcvo6YLHtYxrG8X3KPP0/t2pznKkUXNuE6V1d836BNklX+vPe97USPmF7ti0TaTyb5kWUom+/sP1qSeszNTugV4bKL/w3HUwblfQESjmMBZq+yc7alEHlltakXFF+rR6/kFKvaitJT7H9lkZx3Eh553US08c3Rj6rS9KiZd3fev2GpC9SLlbPBm4dhAEk6c9lZk2Vjp0u6V/puNAZ8NdaW+YWlZ2brqYsjGpC0n2WdX+LWUSSdgQ2tH1gPT6dsicslLUDXx91DNUalL7i1Zi+4vJ6yotzS48CtqtjG0j6DPATSvfjuQ3jWFQ/uvAE4DLgK8DpNB73m8VCYMsWkx0mJumPmcMYj0Jni1VKTR9c4/kz8NOG7Z9JuVoRsDFlpbKAdYDfAC0Gmt9GmbUzcDdgW8ouZ4cBTZK+S5npH0v6gu1L1WAv1GVYl/ICNKg3sxZwn1quoln/uu3D68KwjTuY3XZ/4OmUiRYvA74HfMX2ksZxDJxXY7pyeQ9cWZO0XeI42dz2B6nL3uuTu/mVhO032v6T7c9S/sF3HwwoNmp/0zo99ETKgqT1bN+XsuHNDxqFsYanb814iu0/1FWgzcZ5hjxQ0vnArwAkbSXp041j+CClW+UwSV+gTDT4kEqZ4RNbBaGyc9bZTK0M3np53S7zxfatto+zvTtlavVSyuyZvVu0P4v1gPMlHS9p0eBjFA1N0kDu2ravn+O+poW2JJ1GqW1yqu1taqGzr9h+bKsYahyzDQ5eB1xaawS1iuP2OdDLOjeitpfafvAc9/2XG9fRr91LLwIWeary6XluUAV2RhwPoOxaBXCG7d+2bL/GcCZltsyPuvhdqOzB8RzK1f4mlK6mQ21f0aL9GbE8abbzHsFGRJPUvfMjSj0RJJ1k+2lD9317cF8jXRc6G/g05ec+h/JO4xGUOdH3lvSGwTL8Bn5bxzgGg8gvB1olmdMlvc72wcMnJe0J/LxRDNPYvkzTd4u6da7HjoKkwYbbg8KAD5b04JaLoqrZds5qsoJdpZTxIyiz2t7pqeqjnZiZ3FU2TN+VUqJiXk1S0h/+z5k5gNisa0XjUehs4LfAawf9lJK2pNReeRtlMVCrpL8r5YXwW/X45HquhX2Bb0t6GVP7BT+G0re/U6MYhjXbC3UZ/s/Q7TUpV/yDq+6WltS/
y6p1ssObgdMatb0bpTTJPsCbh154uti+sjQsPZoyvvBiymyqb4yknQnq3hmnMgydFzqrcdzhrfLgnIa2duwDSU8FHl4Pl9j+z47iWI+yF+o/UMbUjqdcFHSxc9Ugpo2Aj9luWu5a0j0oO2cNVicfD7zH3W0Y35ykh1AugHZlqk7WP9l+0MjanKCkfznwEcor9b71NvX4LbZbTlXsvNBZjeNo4A/AUfXUSykDRq+gDGhu2yiOBZR3Fw9n+srg1leWMQuVy9wltrds2OaqwInueKvCrqnsY/ATyjvypfXcrz3C+liT1L1zMFPzn4dvQ9kIu6XBLlnDu9mbsi1dS7sDbwTeUo9PpZQ4vhlo+WQ7kvIC+FxKmeXdgWsatj82JG1GudJ/POV/4qfAvoNCW41i+CRTpUlWAbZmquuriTo99DZJ93bHWxV2bGfKlOIfSjqOcoE20u7oibnSHweSXmz7a5I2a/kkniOWsbmS0lQlw+E9cs9o9U5jnEj6GXAgZVEQlCf8m2w/rmEMw7VlbgEumVkPqFEcxwCPBrreqrBzdbrsjpRunqdSVuJ+axSTLSYm6Utak3KFfS3wHcpg1d9T9p18d4uB1MHYQesxhGXEMy6bPv/M9uMlHQ98gjLA/PXW0yXHwfAL39C5LjarXwN4SD280I0qv86IYdbCZu54N6uuqWz7+mLgpTNmIc7P95+gpH80pdtiLcrsmfMoyX97YGvbz20Qw4mUKWePZZbysB3U8xiLKylJz6X0W25E2chkbco0ua6W4Dc3VJLi7ZQLk6MoXSwvBdZ1o7r+NZYnA4cDl1C6EjaiLNxrOmVT0tOA02z/tWW7fTdJSX8wK2U14HLb9x+6r8mVVL162oZSXfMfZ94/ioUWy4mn8yup2s30ZtsfbdXmOFLZ0GZQkmImj3LgbpZYzgReNih9UGeQfMXtNw45nFID54+Ui4KTKRMMrl3mF8ZKmaSB3JsAbN8iaebCn1aLXw6x/QpJB7dO8LMZh7fJdcBuV8p2dL3l7je0Gbb6cK0b2/+vrhloqpZAQNIDKauUDwQeyGTlpbEzSb/cDVXqhGvoNvV4g0YxPKb+A79c0sHMuKrrYMpmp5s+DzlV0qe44xTWpjNGxoU63NimWizp80xfIb24YfsASNqNshn6IylTnD9FueKPEZqk7p1l7nbT4qpX0puBN1CmZl7B9KTf9C18jWdcNn2ebW8B93GevsZgY5tac2YvyngXlET7aTfewUrS7ykTLT4L/ND2JS3b76uJSfrDJN0TwB3tyCPpM7bf0EXbM+IYi02fY4qkc5na2GYr1Y1tbD+9cRwLAGx3ul5C0sMps+y2B7agzCR6RZcxTbpJ6t5B0hsoG12vVY//DHzAdtPStbbfUAsmbWH7sLr0/l62L24ZB2Oy6XO9snwhd9wK7l2tYxkDnW1sU1fe7g/sTS2rLulW4JNd/C3qz78xZRPwTShbWDYpuNZnE1NPv1ZxfB7wZNv3danb/hTgWfW+lrHsT5maN5iGtwbdbFM4vOnzYyjlF5pv+gwcQ1l4cgulT3/w0UczN7Y5i3Yb2+xLqfi6re372L4P8DhgO0n7Noph2CmU5+w5lDnpDx0M7sboTEz3jqQLga1mFmtS2Znnl7YfMvtXjiSWsynz48/yVJ3wOyzK6YvZCr8FSNoEWNv2OY3a+wXw9JkLFWtXzw8G/6sx2Sape8ezVeez/dda1Kilm2xbkuH2JdbNaMw2fQZOk/RI2y33Xx1LdZX0h20fOxi4lHSQ7T0aNL/6bCvTbV/TxZTNFOLrxiQl/SskPc32ScMna0ndke87OcPRkj4HrCPpdcBraVv0bdw2fd4eeFVdoPQ3pmqW9/Gdz6bA2yVta/ud9VyrMtw33cX7RiWF+DowSd07D6f0HZ9C6SuF8mTaDtjRjTc8lvR0huqE22659+iqTG36/Cg63vRZ0qy1wW1f2jqWrkk6i1Km4xOUAdzdKNMVR16rqQ7azjaWImBN202v
9lOIrxsTc6Vve0ld9PIypjbLOBnYkzJFbuQk3cBUydrhq+vXS/pvypzkf5n5bmS+2b6Vsl3jcXXmzK6UTZ/faftTo2x7jngulbQVZSEOwE9s/7J1HGNCLvsTv1HSqygXKeu2aNj2qi3auRMGRd6ulPQcSiG+mbvexTybmCv9ZZH0G9sbdxzDqpQ9OY9sMaip8dr0eR/gdZQtGgFeABxk+5OtY+mapD1tf27o+DHAXrZf02FYnZijEN8Btr/TaWATri9J/zI33DlrWWY+6UfUxvCmz0e5402fJZ0DPMH2X+rxWsBP+9SnL2lt29cPVducpnWJjnEl6S22P9Z1HJOsL0m/8yv9lupspUHf7fAfuJNNn+sq1G0Hs6tU9j44Y7BKuA8kfdf2c+eottm8RMe46ttztQsT06cv6TtMT3C33wXct3E4nbI9bovuDgNOl/SterwTcEh34bTnup/DmFXbHEddzzSbeBNzpS/pScu6fxxKHfeNpI1sX1Zvb8P0Al8b2P5uZ8E1Vn/+OfW14uhMudIfvYlJ+jF+JP0K2GFm9URJr6HMYurNdolzVBod6FXF0Rmz3KbdBdzd9sT0QIyjiUn6knYENrR9YD0+HVhQ736b7a93FlxPSXo28DHgObYvquf2o9Rvf5btyzsML6KXxq3vd2W8jTItceBuwLaU2uWdlznuI9vHUn7335f0CEkfA54P/H3fEr6ktw3dfvGM+/69fUTRV5OU9NcY9B9Xp9j+g+3fUEstR3t1IdqrgR9RNpd5qvu5B+ouQ7dnboK+Q8tAot8mqe9s2qpG23sPHS4gmhvquxXlndfTgKtrXffmU0c7pjluz3YcMTKTlPRPl/Q62wcPn5S0J/DzjmLqNdv36jqGMeI5bs92HDEykzSQez/g25QqjoPpb4+hXGHuZPuqjkKLGC52JuDuwI2Du+ig2Fn018Qk/YFaSnlQcG2J7f/sMp6IiHEycUk/IiLmNkmzdyIiYjmS9CMieiRJPyKiR5L0IyJ6JEk/IqJH/j9C4AyPg3aiNwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This is some Jupyter magic that allows us to render matplotlib plots in the notebooks!\n",
    "# You only need to enter this command once.\n",
    "%matplotlib inline\n",
    "\n",
    "# Count the values in the column 'author' and clip the result to top-10 before plotting.\n",
    "socc['author'].value_counts()[:10].plot(kind='bar')"
   ]
  },
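  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The counting step behind the plot can also be tried out on its own. The following is a minimal sketch that uses a made-up Series named `authors` (a hypothetical stand-in for the column `author`) to show that `.value_counts()` sorts the counts from the most to the least frequent value, which is why taking the first rows of the result gives the top entries.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data standing in for the column 'author'.\n",
    "authors = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'], name='author')\n",
    "\n",
    "# Count how many times each value occurs; the result is sorted in descending order.\n",
    "counts = authors.value_counts()\n",
    "\n",
    "# Keep the two most frequent values; .head(2) selects the same rows as the slice [:2].\n",
    "top_two = counts.head(2)\n",
    "print(top_two)\n",
    "```\n",
    "\n",
    "The result is itself a Series, so it can be passed on to `.plot()` in the same way as above."
   ]
  },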
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For columns with numerical values, we can also use the `.describe()` method to get basic descriptive statistics on the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    10339.000000\n",
       "mean        26.384273\n",
       "std         39.786923\n",
       "min          0.000000\n",
       "25%          1.000000\n",
       "50%         14.000000\n",
       "75%         35.000000\n",
       "max       1378.000000\n",
       "Name: ntop_level_comments, dtype: float64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc['ntop_level_comments'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, the column `ntop_level_comments` contains 10339 values: `count` reports the number of non-missing values in the column.\n",
    "\n",
    "The average number of top-level comments received by an editorial is approximately 26, but the values vary considerably, as the standard deviation is nearly 40.\n",
    "\n",
    " - Some editorials do not have any comments at all, as indicated by the minimum value of 0. \n",
    " - The first quartile (25%) shows that a quarter of the data has one comment or fewer.\n",
    " - The second quartile (50%), which is also known as the median, indicates that *half* of the data has 14 comments or fewer and *half* has 14 comments or more.\n",
    " - The third quartile (75%) shows that three quarters of the data has 35 comments or fewer.\n",
    " - The most commented editorial has 1378 comments."
   ]
  },
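  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The statistics discussed above can also be retrieved one at a time. The following is a minimal sketch on a made-up Series named `comments` (a hypothetical stand-in for the column `ntop_level_comments`, with invented values), which shows that the output of `.describe()` can be indexed by label and that the quartiles can be computed directly using the `.quantile()` method.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; the values are invented for illustration.\n",
    "comments = pd.Series([0, 0, 1, 14, 20, 35, 120], name='ntop_level_comments')\n",
    "\n",
    "# .describe() returns a Series, so each statistic can be looked up by its label.\n",
    "stats = comments.describe()\n",
    "print(stats['50%'])\n",
    "\n",
    "# The same quartiles can be requested directly as fractions between 0 and 1.\n",
    "quartiles = comments.quantile([0.25, 0.5, 0.75])\n",
    "print(quartiles)\n",
    "```\n",
    "\n",
    "Looking up single values in this way is useful when a statistic is needed later in the analysis, for example as a threshold for selecting rows."
   ]
  },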
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if we would like to find the articles with zero comments?\n",
    "\n",
    "We can use the DataFrame accessor `.loc` to access columns or rows based on their labels, or based on a condition that is evaluated as true or false for each row.\n",
    "\n",
    "The number of comments is stored in the column `ntop_level_comments`, but we also need to specify that the DataFrame stored under the variable `socc` contains the column that we wish to examine. \n",
    "\n",
    "This causes the somewhat repetitive reference to the `socc` DataFrame, which is nevertheless necessary for being explicit.\n",
    "\n",
    "We also need to provide an operator for \"is equal to\". Since the single equal sign `=` is reserved for assigning variables in Python, two equal signs `==` are used for comparison.\n",
    "\n",
    "Finally, we place the value we want to evaluate against on the right-hand side of the double equal sign `==`, that is, zero for no comments."
   ]
  },
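  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before applying this to the corpus, it may help to see what the comparison produces on its own. The following is a minimal sketch on a made-up DataFrame named `toy` (the column names mirror those in the corpus, but the values are invented): the comparison returns a Series of `True` and `False` values, one for each row, and `.loc` keeps the rows marked as `True`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; only the column names are borrowed from the corpus.\n",
    "toy = pd.DataFrame({\n",
    "    'author': ['A', 'B', 'C', 'A'],\n",
    "    'ntop_level_comments': [0.0, 12.0, 0.0, 3.0],\n",
    "})\n",
    "\n",
    "# The comparison is evaluated for every row and returns True or False for each.\n",
    "mask = toy['ntop_level_comments'] == 0\n",
    "print(mask)\n",
    "\n",
    "# .loc keeps only the rows for which the mask is True.\n",
    "no_comments = toy.loc[mask]\n",
    "print(len(no_comments))\n",
    "```\n",
    "\n",
    "Storing the condition under a variable such as `mask` is optional. The condition can also be written directly inside the brackets of `.loc`."
   ]
  },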
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7797</th>\n",
       "      <td>33441604</td>\n",
       "      <td>Joseph Boyden, where are you from?</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/joseph-...</td>\n",
       "      <td>Hayden King</td>\n",
       "      <td>2016-12-28 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Hayden King teaches in the School of Public...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7798</th>\n",
       "      <td>33316285</td>\n",
       "      <td>Globe editorial: Rejoice! Congress just gave t...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-12-13 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;The United States may have just elected a p...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7799</th>\n",
       "      <td>33009790</td>\n",
       "      <td>Police and La Presse: Warrants not warranted</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-11-23 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;The discovery that the Montreal Police obta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7800</th>\n",
       "      <td>32970624</td>\n",
       "      <td>The Galloway affair: Salem comes to UBC</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/the-gal...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2016-11-22 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;I have a question about the Steven Galloway...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7801</th>\n",
       "      <td>32927142</td>\n",
       "      <td>Justice delayed: the law of unintended consequ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/unreaso...</td>\n",
       "      <td>BENJAMIN PERRIN</td>\n",
       "      <td>2016-11-19 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Benjamin Perrin is a law professor at the U...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10334</th>\n",
       "      <td>533784</td>\n",
       "      <td>WTO action on China's rare-earth quotas makes ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2012-03-14 EDT</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;The confusingly named substances known as '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10335</th>\n",
       "      <td>533594</td>\n",
       "      <td>A customer-friendly Finance Department</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2012-03-13 EDT</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Of the many things that frustrate the retai...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10336</th>\n",
       "      <td>533508</td>\n",
       "      <td>Video raises questions about Nik Zoricic's 'fr...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2012-03-12 EDT</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Officials and fans are mourning the death o...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10337</th>\n",
       "      <td>533504</td>\n",
       "      <td>McGuinty can't afford misgivings about gaming</td>\n",
       "      <td>http://www.theglobeandmail.com/news/politics/m...</td>\n",
       "      <td>Adam Radwanski</td>\n",
       "      <td>2012-03-12 EDT</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Unlike so many of the other measures that m...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10338</th>\n",
       "      <td>533471</td>\n",
       "      <td>In Russia, Canada should look for investment, ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2012-03-12 EDT</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;As if in swift response to Prime Minister V...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2542 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       article_id                                              title  \\\n",
       "7797     33441604                 Joseph Boyden, where are you from?   \n",
       "7798     33316285  Globe editorial: Rejoice! Congress just gave t...   \n",
       "7799     33009790       Police and La Presse: Warrants not warranted   \n",
       "7800     32970624            The Galloway affair: Salem comes to UBC   \n",
       "7801     32927142  Justice delayed: the law of unintended consequ...   \n",
       "...           ...                                                ...   \n",
       "10334      533784  WTO action on China's rare-earth quotas makes ...   \n",
       "10335      533594             A customer-friendly Finance Department   \n",
       "10336      533508  Video raises questions about Nik Zoricic's 'fr...   \n",
       "10337      533504      McGuinty can't afford misgivings about gaming   \n",
       "10338      533471  In Russia, Canada should look for investment, ...   \n",
       "\n",
       "                                             article_url           author  \\\n",
       "7797   http://www.theglobeandmail.com/opinion/joseph-...      Hayden King   \n",
       "7798   http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "7799   http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "7800   http://www.theglobeandmail.com/opinion/the-gal...   Margaret Wente   \n",
       "7801   http://www.theglobeandmail.com/opinion/unreaso...  BENJAMIN PERRIN   \n",
       "...                                                  ...              ...   \n",
       "10334  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "10335  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "10336  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "10337  http://www.theglobeandmail.com/news/politics/m...   Adam Radwanski   \n",
       "10338  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "\n",
       "       published_date  ncomments  ntop_level_comments  \\\n",
       "7797   2016-12-28 EST        0.0                  0.0   \n",
       "7798   2016-12-13 EST        0.0                  0.0   \n",
       "7799   2016-11-23 EST        0.0                  0.0   \n",
       "7800   2016-11-22 EST        0.0                  0.0   \n",
       "7801   2016-11-19 EST        0.0                  0.0   \n",
       "...               ...        ...                  ...   \n",
       "10334  2012-03-14 EDT        0.0                  0.0   \n",
       "10335  2012-03-13 EDT        0.0                  0.0   \n",
       "10336  2012-03-12 EDT        0.0                  0.0   \n",
       "10337  2012-03-12 EDT        0.0                  0.0   \n",
       "10338  2012-03-12 EDT        0.0                  0.0   \n",
       "\n",
       "                                            article_text  \n",
       "7797   <p>Hayden King teaches in the School of Public...  \n",
       "7798   <p>The United States may have just elected a p...  \n",
       "7799   <p>The discovery that the Montreal Police obta...  \n",
       "7800   <p>I have a question about the Steven Galloway...  \n",
       "7801   <p>Benjamin Perrin is a law professor at the U...  \n",
       "...                                                  ...  \n",
       "10334  <p>The confusingly named substances known as '...  \n",
       "10335  <p>Of the many things that frustrate the retai...  \n",
       "10336  <p>Officials and fans are mourning the death o...  \n",
       "10337  <p>Unlike so many of the other measures that m...  \n",
       "10338  <p>As if in swift response to Prime Minister V...  \n",
       "\n",
       "[2542 rows x 8 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.loc[socc['ntop_level_comments'] == 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This returns a total of 2542 rows where the value of the column `ntop_level_comments` is zero.\n",
    "\n",
    "For more complex views of the data, we can also combine multiple criteria using the `&` symbol, which *pandas* uses for a row-by-row \"AND\". Python's own `and` keyword does not work on entire columns.\n",
    "\n",
    "Note that each criterion must be placed in parentheses `()`, because Python would otherwise evaluate `&` before the comparison `==`.\n",
    "\n",
    "Let's check if the first author in our result, Hayden King, wrote any other articles with zero comments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7797</th>\n",
       "      <td>33441604</td>\n",
       "      <td>Joseph Boyden, where are you from?</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/joseph-...</td>\n",
       "      <td>Hayden King</td>\n",
       "      <td>2016-12-28 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Hayden King teaches in the School of Public...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      article_id                               title  \\\n",
       "7797    33441604  Joseph Boyden, where are you from?   \n",
       "\n",
       "                                            article_url       author  \\\n",
       "7797  http://www.theglobeandmail.com/opinion/joseph-...  Hayden King   \n",
       "\n",
       "      published_date  ncomments  ntop_level_comments  \\\n",
       "7797  2016-12-28 EST        0.0                  0.0   \n",
       "\n",
       "                                           article_text  \n",
       "7797  <p>Hayden King teaches in the School of Public...  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.loc[(socc['ntop_level_comments'] == 0) & (socc['author'] == 'Hayden King')]"
   ]
  },
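  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Criteria can be combined in other ways as well. The following is a minimal sketch on a made-up DataFrame named `toy` (invented values, column names borrowed from the corpus), which shows the symbols `&` for \"AND\", `|` for \"OR\" and `~` for \"NOT\". Storing each criterion under its own variable removes the need for parentheses around the comparisons.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; the values are invented for illustration.\n",
    "toy = pd.DataFrame({\n",
    "    'author': ['A', 'B', 'A', 'C'],\n",
    "    'ntop_level_comments': [0.0, 0.0, 5.0, 7.0],\n",
    "})\n",
    "\n",
    "# Each criterion is a Series of True/False values, one for each row.\n",
    "no_comments = toy['ntop_level_comments'] == 0\n",
    "by_a = toy['author'] == 'A'\n",
    "\n",
    "# AND: rows that meet both criteria.\n",
    "both = toy.loc[no_comments & by_a]\n",
    "\n",
    "# OR: rows that meet at least one of the criteria.\n",
    "either = toy.loc[no_comments | by_a]\n",
    "\n",
    "# NOT: rows that do not meet the criterion.\n",
    "commented = toy.loc[~no_comments]\n",
    "\n",
    "print(len(both), len(either), len(commented))\n",
    "```"
   ]
  },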
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "### Quick in-class exercise\n",
    "\n",
    "How many articles with zero top-level comments were authored by the editorial team (`GLOBE EDITORIAL`)?\n",
    "\n",
    "Write out the whole command yourself instead of copy-pasting to get an idea of the syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "### Enter your code below this line and run the cell (press Shift and Enter at the same time)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extending DataFrames\n",
    "\n",
    "You can easily add information to *pandas* DataFrames.\n",
    "\n",
    "One common scenario could involve loading some data from an external file (such as a CSV or JSON file), performing some analyses and storing the results to the same DataFrame.\n",
    "\n",
    "We can add an empty column to the DataFrame using the column accessor `[]` and the Python value `None`, which stands for \"no value\".\n",
    "\n",
    "Let's add a new column named `comments_ratio` to the DataFrame `socc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "socc['comments_ratio'] = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>26842506</td>\n",
       "      <td>The Tories deserve another mandate - Stephen H...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2015-10-16 EDT</td>\n",
       "      <td>2187.0</td>\n",
       "      <td>1378.0</td>\n",
       "      <td>&lt;p&gt;All elections are choices among imperfect a...</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>26055892</td>\n",
       "      <td>Harper hysteria a sign of closed liberal minds</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/harper-...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2015-08-24 EDT</td>\n",
       "      <td>1103.0</td>\n",
       "      <td>455.0</td>\n",
       "      <td>&lt;p&gt;If even a fraction of the darkness that his...</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>19047636</td>\n",
       "      <td>The Globe's editorial board endorses Tim Hudak...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2014-06-06 EDT</td>\n",
       "      <td>905.0</td>\n",
       "      <td>432.0</td>\n",
       "      <td>&lt;p&gt;Over four days, The Globe editorial board l...</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   article_id                                              title  \\\n",
       "0    26842506  The Tories deserve another mandate - Stephen H...   \n",
       "1    26055892     Harper hysteria a sign of closed liberal minds   \n",
       "2     6929035  Too many first nations people live in a dream ...   \n",
       "3    19047636  The Globe's editorial board endorses Tim Hudak...   \n",
       "4    11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "\n",
       "                                         article_url            author  \\\n",
       "0  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "1  http://www.theglobeandmail.com/opinion/harper-...  Konrad Yakabuski   \n",
       "2  http://www.theglobeandmail.com/opinion/too-man...   Jeffrey Simpson   \n",
       "3  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "4  http://www.theglobeandmail.com/news/world/disg...    Campbell Clark   \n",
       "\n",
       "   published_date  ncomments  ntop_level_comments  \\\n",
       "0  2015-10-16 EDT     2187.0               1378.0   \n",
       "1  2015-08-24 EDT     1103.0                455.0   \n",
       "2  2013-01-05 EST     1164.0                433.0   \n",
       "3  2014-06-06 EDT      905.0                432.0   \n",
       "4  2013-05-02 EDT     1129.0                411.0   \n",
       "\n",
       "                                        article_text comments_ratio  \n",
       "0  <p>All elections are choices among imperfect a...           None  \n",
       "1  <p>If even a fraction of the darkness that his...           None  \n",
       "2  <p>Large elements of aboriginal Canada live in...           None  \n",
       "3  <p>Over four days, The Globe editorial board l...           None  \n",
       "4  <p>Growing discontent among Arab nations over ...           None  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's populate the column with some data by calculating what proportion of the comments are top-level comments. The assumption is that a high proportion of top-level comments indicates comments about the article, whereas a lower proportion indicates more discussion about the comments posted.\n",
    "\n",
    "To get the proportion of top-level comments out of all comments, we must divide the number of top-level comments by the number of all comments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "socc['comments_ratio'] = socc['ntop_level_comments'] / socc['ncomments']"
   ]
  },
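  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The division is carried out row by row. The following is a minimal sketch on a made-up DataFrame named `toy` (invented values, column names borrowed from the corpus), which also shows what happens for articles without any comments: dividing zero by zero does not raise an error, but results in the missing value `NaN`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; the last row stands for an article without comments.\n",
    "toy = pd.DataFrame({\n",
    "    'ncomments': [10.0, 4.0, 0.0],\n",
    "    'ntop_level_comments': [5.0, 4.0, 0.0],\n",
    "})\n",
    "\n",
    "# Each value in one column is divided by the value on the same row in the other column.\n",
    "toy['comments_ratio'] = toy['ntop_level_comments'] / toy['ncomments']\n",
    "print(toy['comments_ratio'])\n",
    "\n",
    "# Count the rows for which the ratio could not be calculated.\n",
    "n_missing = toy['comments_ratio'].isna().sum()\n",
    "print(n_missing)\n",
    "```\n",
    "\n",
    "In the corpus, the same applies to the articles with zero comments, which should be kept in mind when summarising the new column."
   ]
  },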
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Column accessors can be used very flexibly to access and manipulate data stored in the DataFrame, as exemplified by the division."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>26842506</td>\n",
       "      <td>The Tories deserve another mandate - Stephen H...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2015-10-16 EDT</td>\n",
       "      <td>2187.0</td>\n",
       "      <td>1378.0</td>\n",
       "      <td>&lt;p&gt;All elections are choices among imperfect a...</td>\n",
       "      <td>0.630087</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>26055892</td>\n",
       "      <td>Harper hysteria a sign of closed liberal minds</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/harper-...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2015-08-24 EDT</td>\n",
       "      <td>1103.0</td>\n",
       "      <td>455.0</td>\n",
       "      <td>&lt;p&gt;If even a fraction of the darkness that his...</td>\n",
       "      <td>0.412511</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>0.371993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>19047636</td>\n",
       "      <td>The Globe's editorial board endorses Tim Hudak...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2014-06-06 EDT</td>\n",
       "      <td>905.0</td>\n",
       "      <td>432.0</td>\n",
       "      <td>&lt;p&gt;Over four days, The Globe editorial board l...</td>\n",
       "      <td>0.477348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>0.364039</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   article_id                                              title  \\\n",
       "0    26842506  The Tories deserve another mandate - Stephen H...   \n",
       "1    26055892     Harper hysteria a sign of closed liberal minds   \n",
       "2     6929035  Too many first nations people live in a dream ...   \n",
       "3    19047636  The Globe's editorial board endorses Tim Hudak...   \n",
       "4    11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "\n",
       "                                         article_url            author  \\\n",
       "0  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "1  http://www.theglobeandmail.com/opinion/harper-...  Konrad Yakabuski   \n",
       "2  http://www.theglobeandmail.com/opinion/too-man...   Jeffrey Simpson   \n",
       "3  http://www.theglobeandmail.com/opinion/editori...   GLOBE EDITORIAL   \n",
       "4  http://www.theglobeandmail.com/news/world/disg...    Campbell Clark   \n",
       "\n",
       "   published_date  ncomments  ntop_level_comments  \\\n",
       "0  2015-10-16 EDT     2187.0               1378.0   \n",
       "1  2015-08-24 EDT     1103.0                455.0   \n",
       "2  2013-01-05 EST     1164.0                433.0   \n",
       "3  2014-06-06 EDT      905.0                432.0   \n",
       "4  2013-05-02 EDT     1129.0                411.0   \n",
       "\n",
       "                                        article_text  comments_ratio  \n",
       "0  <p>All elections are choices among imperfect a...        0.630087  \n",
       "1  <p>If even a fraction of the darkness that his...        0.412511  \n",
       "2  <p>Large elements of aboriginal Canada live in...        0.371993  \n",
       "3  <p>Over four days, The Globe editorial board l...        0.477348  \n",
       "4  <p>Growing discontent among Arab nations over ...        0.364039  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the column `comments_ratio` now stores the result of our calculation!\n",
    "\n",
    "However, we should also keep in mind that some articles did not receive any comments at all: thus we would have divided zero by zero.\n",
    "\n",
    "Let's examine these cases again by retrieving articles without comments, and use the `.head()` method to limit the output to the first five rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7797</th>\n",
       "      <td>33441604</td>\n",
       "      <td>Joseph Boyden, where are you from?</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/joseph-...</td>\n",
       "      <td>Hayden King</td>\n",
       "      <td>2016-12-28 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Hayden King teaches in the School of Public...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7798</th>\n",
       "      <td>33316285</td>\n",
       "      <td>Globe editorial: Rejoice! Congress just gave t...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-12-13 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;The United States may have just elected a p...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7799</th>\n",
       "      <td>33009790</td>\n",
       "      <td>Police and La Presse: Warrants not warranted</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-11-23 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;The discovery that the Montreal Police obta...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7800</th>\n",
       "      <td>32970624</td>\n",
       "      <td>The Galloway affair: Salem comes to UBC</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/the-gal...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2016-11-22 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;I have a question about the Steven Galloway...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7801</th>\n",
       "      <td>32927142</td>\n",
       "      <td>Justice delayed: the law of unintended consequ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/unreaso...</td>\n",
       "      <td>BENJAMIN PERRIN</td>\n",
       "      <td>2016-11-19 EST</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>&lt;p&gt;Benjamin Perrin is a law professor at the U...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      article_id                                              title  \\\n",
       "7797    33441604                 Joseph Boyden, where are you from?   \n",
       "7798    33316285  Globe editorial: Rejoice! Congress just gave t...   \n",
       "7799    33009790       Police and La Presse: Warrants not warranted   \n",
       "7800    32970624            The Galloway affair: Salem comes to UBC   \n",
       "7801    32927142  Justice delayed: the law of unintended consequ...   \n",
       "\n",
       "                                            article_url           author  \\\n",
       "7797  http://www.theglobeandmail.com/opinion/joseph-...      Hayden King   \n",
       "7798  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "7799  http://www.theglobeandmail.com/opinion/editori...  GLOBE EDITORIAL   \n",
       "7800  http://www.theglobeandmail.com/opinion/the-gal...   Margaret Wente   \n",
       "7801  http://www.theglobeandmail.com/opinion/unreaso...  BENJAMIN PERRIN   \n",
       "\n",
       "      published_date  ncomments  ntop_level_comments  \\\n",
       "7797  2016-12-28 EST        0.0                  0.0   \n",
       "7798  2016-12-13 EST        0.0                  0.0   \n",
       "7799  2016-11-23 EST        0.0                  0.0   \n",
       "7800  2016-11-22 EST        0.0                  0.0   \n",
       "7801  2016-11-19 EST        0.0                  0.0   \n",
       "\n",
       "                                           article_text  comments_ratio  \n",
       "7797  <p>Hayden King teaches in the School of Public...             NaN  \n",
       "7798  <p>The United States may have just elected a p...             NaN  \n",
       "7799  <p>The discovery that the Montreal Police obta...             NaN  \n",
       "7800  <p>I have a question about the Steven Galloway...             NaN  \n",
       "7801  <p>Benjamin Perrin is a law professor at the U...             NaN  "
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc.loc[socc['ntop_level_comments'] == 0].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For these rows, the `comments_ratio` column contains values marked as `NaN` or \"not a number\".\n",
    "\n",
    "This indicates that the division was performed on these cells as well, but the result was not a number.\n",
    "\n",
    "*pandas* automatically ignores `NaN` values when performing calculations, as show by the `.describe()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    7797.000000\n",
       "mean        0.537057\n",
       "std         0.205398\n",
       "min         0.083333\n",
       "25%         0.384615\n",
       "50%         0.485714\n",
       "75%         0.647059\n",
       "max         1.000000\n",
       "Name: comments_ratio, dtype: float64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "socc['comments_ratio'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the difference in the result for the count. Only 7797 items out of 10399 were included in the calculation."
   ]
  },
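  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The behaviour described above can be reproduced on a small scale. The following is a minimal sketch that uses a made-up DataFrame named `toy`, which is not part of the corpus, to show that dividing zero by zero yields `NaN` and that `.count()` and `.mean()` skip such values by default.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A made-up DataFrame: the last row mimics an article without comments\n",
    "toy = pd.DataFrame({'ncomments': [10.0, 4.0, 0.0],\n",
    "                    'ntop_level_comments': [5.0, 1.0, 0.0]})\n",
    "\n",
    "# Element-wise division: 0.0 / 0.0 yields NaN instead of raising an error\n",
    "toy['comments_ratio'] = toy['ntop_level_comments'] / toy['ncomments']\n",
    "\n",
    "# Count the missing values in the new column\n",
    "n_missing = toy['comments_ratio'].isna().sum()\n",
    "\n",
    "# By default, .count() and .mean() leave out NaN values\n",
    "n_valid = toy['comments_ratio'].count()\n",
    "mean_ratio = toy['comments_ratio'].mean()\n",
    "\n",
    "print(n_missing, n_valid, mean_ratio)\n",
    "```\n",
    "\n",
    "Counting missing values with `.isna().sum()` is a quick way to check how many rows a calculation will leave out."
   ]
  },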
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if we would like to do some natural language processing and store the results in the DataFrame?\n",
    "\n",
    "Let's select articles that fall within the first quartile in terms of the ratio of original comments to all comments made (`comments_ratio`) and have received more than 200 comments (`ncomments`). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>0.371993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>0.364039</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>26691065</td>\n",
       "      <td>Fifty years in Canada, and now I feel like a s...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/fifty-y...</td>\n",
       "      <td>SHEEMA KHAN</td>\n",
       "      <td>2015-10-07 EDT</td>\n",
       "      <td>1142.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>&lt;p&gt;'Too broken to write,' I told my editor, af...</td>\n",
       "      <td>0.329247</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>25731634</td>\n",
       "      <td>I'm Canadian - and I should have a right to vote</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/im-cana...</td>\n",
       "      <td>Donald Sutherland</td>\n",
       "      <td>2015-07-28 EDT</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>348.0</td>\n",
       "      <td>&lt;p&gt;My name is Donald Sutherland. My wife's nam...</td>\n",
       "      <td>0.340842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>13647608</td>\n",
       "      <td>A nation of $100,000 firefighters</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/a-natio...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2013-08-08 EDT</td>\n",
       "      <td>1102.0</td>\n",
       "      <td>338.0</td>\n",
       "      <td>&lt;p&gt;Everyone loves firefighters. They save live...</td>\n",
       "      <td>0.306715</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   article_id                                              title  \\\n",
       "2     6929035  Too many first nations people live in a dream ...   \n",
       "4    11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "5    26691065  Fifty years in Canada, and now I feel like a s...   \n",
       "6    25731634   I'm Canadian - and I should have a right to vote   \n",
       "8    13647608                  A nation of $100,000 firefighters   \n",
       "\n",
       "                                         article_url             author  \\\n",
       "2  http://www.theglobeandmail.com/opinion/too-man...    Jeffrey Simpson   \n",
       "4  http://www.theglobeandmail.com/news/world/disg...     Campbell Clark   \n",
       "5  http://www.theglobeandmail.com/opinion/fifty-y...        SHEEMA KHAN   \n",
       "6  http://www.theglobeandmail.com/opinion/im-cana...  Donald Sutherland   \n",
       "8  http://www.theglobeandmail.com/opinion/a-natio...     Margaret Wente   \n",
       "\n",
       "   published_date  ncomments  ntop_level_comments  \\\n",
       "2  2013-01-05 EST     1164.0                433.0   \n",
       "4  2013-05-02 EDT     1129.0                411.0   \n",
       "5  2015-10-07 EDT     1142.0                376.0   \n",
       "6  2015-07-28 EDT     1021.0                348.0   \n",
       "8  2013-08-08 EDT     1102.0                338.0   \n",
       "\n",
       "                                        article_text  comments_ratio  \n",
       "2  <p>Large elements of aboriginal Canada live in...        0.371993  \n",
       "4  <p>Growing discontent among Arab nations over ...        0.364039  \n",
       "5  <p>'Too broken to write,' I told my editor, af...        0.329247  \n",
       "6  <p>My name is Donald Sutherland. My wife's nam...        0.340842  \n",
       "8  <p>Everyone loves firefighters. They save live...        0.306715  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filter the DataFrame for highly commented articles and assign the result to the variable 'talk'\n",
    "talk = socc.loc[(socc['comments_ratio'] <= 0.384) & (socc['ncomments'] >= 200)]\n",
    "\n",
    "# Call the variable to examine the output\n",
    "talk.head(5)"
   ]
  },
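  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The threshold used above was typed in by hand, based on the output of `.describe()`. As an alternative, the first quartile could be computed with the `.quantile()` method. The following is only a sketch using a made-up DataFrame named `toy`. The values and the variable names `q1` and `subset` are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A made-up DataFrame with the two columns needed for filtering\n",
    "toy = pd.DataFrame({'ncomments': [500.0, 150.0, 900.0, 300.0, 250.0, 700.0],\n",
    "                    'comments_ratio': [0.2, 0.1, 0.6, 0.4, 0.8, 0.9]})\n",
    "\n",
    "# Compute the first quartile, that is, the 25th percentile of the column\n",
    "q1 = toy['comments_ratio'].quantile(0.25)\n",
    "\n",
    "# Combine two conditions using &: each condition needs its own parentheses\n",
    "subset = toy.loc[(toy['comments_ratio'] <= q1) & (toy['ncomments'] >= 200)]\n",
    "\n",
    "print(q1)\n",
    "print(subset)\n",
    "```\n",
    "\n",
    "Computing the threshold from the data keeps the filter valid if the contents of the DataFrame change."
   ]
  },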
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's import spaCy, load a medium-sized language model for English and assign this model to the variable `nlp`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the spaCy library\n",
    "import spacy\n",
    "\n",
    "# Note that we now load a medium-sized language model!\n",
    "nlp = spacy.load('en_core_web_md')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's limit processing to article titles and create a placeholder column to the DataFrame named `processed_title`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "talk['processed_title'] = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*pandas* warns about performing this command, because `talk` is only a slice or a _view_ into the DataFrame. \n",
    "\n",
    "Assigning a new column to **only a part of the DataFrame** would cause problems by breaking the tabular structure.\n",
    "\n",
    "We can fix the situation by creating a _deep copy_ of the slice using Python's `.copy()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "talk = talk.copy()"
   ]
  },
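  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate what the copy achieves, here is a minimal sketch with a made-up DataFrame. The names `original` and `subset` are assumptions made for this example and do not refer to the corpus.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A made-up DataFrame standing in for the full corpus\n",
    "original = pd.DataFrame({'title': ['First', 'Second', 'Third'],\n",
    "                         'ncomments': [250, 50, 400]})\n",
    "\n",
    "# Filter the rows and make an independent copy of the result right away\n",
    "subset = original.loc[original['ncomments'] >= 200].copy()\n",
    "\n",
    "# Add a new column to the copy\n",
    "subset['processed_title'] = None\n",
    "\n",
    "# The new column exists in the copy only: the original is left untouched\n",
    "print(list(subset.columns))\n",
    "print(list(original.columns))\n",
    "```\n",
    "\n",
    "Calling `.copy()` directly after filtering makes it explicit that later changes are meant for the subset alone."
   ]
  },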
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try creating an empty column again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "talk['processed_title'] = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>0.371993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>0.364039</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>26691065</td>\n",
       "      <td>Fifty years in Canada, and now I feel like a s...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/fifty-y...</td>\n",
       "      <td>SHEEMA KHAN</td>\n",
       "      <td>2015-10-07 EDT</td>\n",
       "      <td>1142.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>&lt;p&gt;'Too broken to write,' I told my editor, af...</td>\n",
       "      <td>0.329247</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>25731634</td>\n",
       "      <td>I'm Canadian - and I should have a right to vote</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/im-cana...</td>\n",
       "      <td>Donald Sutherland</td>\n",
       "      <td>2015-07-28 EDT</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>348.0</td>\n",
       "      <td>&lt;p&gt;My name is Donald Sutherland. My wife's nam...</td>\n",
       "      <td>0.340842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>13647608</td>\n",
       "      <td>A nation of $100,000 firefighters</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/a-natio...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2013-08-08 EDT</td>\n",
       "      <td>1102.0</td>\n",
       "      <td>338.0</td>\n",
       "      <td>&lt;p&gt;Everyone loves firefighters. They save live...</td>\n",
       "      <td>0.306715</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   article_id                                              title  \\\n",
       "2     6929035  Too many first nations people live in a dream ...   \n",
       "4    11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "5    26691065  Fifty years in Canada, and now I feel like a s...   \n",
       "6    25731634   I'm Canadian - and I should have a right to vote   \n",
       "8    13647608                  A nation of $100,000 firefighters   \n",
       "\n",
       "                                         article_url             author  \\\n",
       "2  http://www.theglobeandmail.com/opinion/too-man...    Jeffrey Simpson   \n",
       "4  http://www.theglobeandmail.com/news/world/disg...     Campbell Clark   \n",
       "5  http://www.theglobeandmail.com/opinion/fifty-y...        SHEEMA KHAN   \n",
       "6  http://www.theglobeandmail.com/opinion/im-cana...  Donald Sutherland   \n",
       "8  http://www.theglobeandmail.com/opinion/a-natio...     Margaret Wente   \n",
       "\n",
       "   published_date  ncomments  ntop_level_comments  \\\n",
       "2  2013-01-05 EST     1164.0                433.0   \n",
       "4  2013-05-02 EDT     1129.0                411.0   \n",
       "5  2015-10-07 EDT     1142.0                376.0   \n",
       "6  2015-07-28 EDT     1021.0                348.0   \n",
       "8  2013-08-08 EDT     1102.0                338.0   \n",
       "\n",
       "                                        article_text  comments_ratio  \n",
       "2  <p>Large elements of aboriginal Canada live in...        0.371993  \n",
       "4  <p>Growing discontent among Arab nations over ...        0.364039  \n",
       "5  <p>'Too broken to write,' I told my editor, af...        0.329247  \n",
       "6  <p>My name is Donald Sutherland. My wife's nam...        0.340842  \n",
       "8  <p>Everyone loves firefighters. They save live...        0.306715  "
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "talk.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To retrieve the title for each article from the column `title`, feed it to the language model under `nlp` for processing and store the output into the column `processed_title`, we need to use the `.apply()` method of a DataFrame.\n",
    "\n",
    "As the name suggests, the `.apply()` method applies whatever is provided as input to the method to each row in the column.\n",
    "\n",
    "In this case, we pass the language model `nlp` to the `.apply()` method, essentially retrieving the titles stored as string objects in the column `title` and \"applying\" the language model `nlp` to them.\n",
    "\n",
    "We assign the output to the DataFrame column named `processed_title`."
   ]
  },
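  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before turning to the language model, the following minimal sketch shows how `.apply()` works with an ordinary Python function. The DataFrame `toy` and the function `count_words` are made up for this illustration. Any callable that accepts a single value can take the place of `count_words`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A made-up DataFrame with two short titles\n",
    "toy = pd.DataFrame({'title': ['A nation of firefighters',\n",
    "                              'Warrants not warranted']})\n",
    "\n",
    "\n",
    "def count_words(text):\n",
    "    # Split the string on whitespace and count the resulting items\n",
    "    return len(text.split())\n",
    "\n",
    "\n",
    "# .apply() calls the function once for each value in the column 'title'\n",
    "toy['n_words'] = toy['title'].apply(count_words)\n",
    "\n",
    "print(toy)\n",
    "```\n",
    "\n",
    "Note that the function is passed to `.apply()` by name, without parentheses, so that *pandas* can call it for each value."
   ]
  },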
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "      <th>processed_title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>0.371993</td>\n",
       "      <td>(Too, many, first, nations, people, live, in, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>0.364039</td>\n",
       "      <td>(Disgruntled, Arab, states, look, to, strip, C...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>26691065</td>\n",
       "      <td>Fifty years in Canada, and now I feel like a s...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/fifty-y...</td>\n",
       "      <td>SHEEMA KHAN</td>\n",
       "      <td>2015-10-07 EDT</td>\n",
       "      <td>1142.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>&lt;p&gt;'Too broken to write,' I told my editor, af...</td>\n",
       "      <td>0.329247</td>\n",
       "      <td>(Fifty, years, in, Canada, ,, and, now, I, fee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>25731634</td>\n",
       "      <td>I'm Canadian - and I should have a right to vote</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/im-cana...</td>\n",
       "      <td>Donald Sutherland</td>\n",
       "      <td>2015-07-28 EDT</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>348.0</td>\n",
       "      <td>&lt;p&gt;My name is Donald Sutherland. My wife's nam...</td>\n",
       "      <td>0.340842</td>\n",
       "      <td>(I, 'm, Canadian, -, and, I, should, have, a, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>13647608</td>\n",
       "      <td>A nation of $100,000 firefighters</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/a-natio...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2013-08-08 EDT</td>\n",
       "      <td>1102.0</td>\n",
       "      <td>338.0</td>\n",
       "      <td>&lt;p&gt;Everyone loves firefighters. They save live...</td>\n",
       "      <td>0.306715</td>\n",
       "      <td>(A, nation, of, $, 100,000, firefighters)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1694</th>\n",
       "      <td>30474884</td>\n",
       "      <td>A dangerous moment in history: Can the politic...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/can-the...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2016-06-16 EDT</td>\n",
       "      <td>239.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>&lt;p&gt;As anyone trying to maintain perspective wh...</td>\n",
       "      <td>0.209205</td>\n",
       "      <td>(A, dangerous, moment, in, history, :, Can, th...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1735</th>\n",
       "      <td>32088785</td>\n",
       "      <td>Clinton shines in first debate, and not just i...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-09-27 EDT</td>\n",
       "      <td>232.0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>&lt;p&gt;For those who wondered whether Hillary Clin...</td>\n",
       "      <td>0.211207</td>\n",
       "      <td>(Clinton, shines, in, first, debate, ,, and, n...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2213</th>\n",
       "      <td>30508530</td>\n",
       "      <td>U.S. gun control: Don't look for logic after O...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/us-gun-...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2016-06-20 EDT</td>\n",
       "      <td>243.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>&lt;p&gt;The script is by now tediously formulaic. A...</td>\n",
       "      <td>0.164609</td>\n",
       "      <td>(U.S., gun, control, :, Do, n't, look, for, lo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2301</th>\n",
       "      <td>31605288</td>\n",
       "      <td>Let's make sure Ontario's sex-ed curriculum is...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/lets-ma...</td>\n",
       "      <td>DEBRA SOH</td>\n",
       "      <td>2016-08-30 EDT</td>\n",
       "      <td>239.0</td>\n",
       "      <td>39.0</td>\n",
       "      <td>&lt;p&gt;Debra W. Soh is a sex writer and sexual neu...</td>\n",
       "      <td>0.163180</td>\n",
       "      <td>(Let, 's, make, sure, Ontario, 's, sex, -, ed,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2302</th>\n",
       "      <td>24363093</td>\n",
       "      <td>Dad rules when sex ed collides with religion</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/dad-rul...</td>\n",
       "      <td>MICHAEL ADAMS</td>\n",
       "      <td>2015-05-11 EDT</td>\n",
       "      <td>222.0</td>\n",
       "      <td>39.0</td>\n",
       "      <td>&lt;p&gt;Michael Adams is founder and president of t...</td>\n",
       "      <td>0.175676</td>\n",
       "      <td>(Dad, rules, when, sex, ed, collides, with, re...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>519 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      article_id                                              title  \\\n",
       "2        6929035  Too many first nations people live in a dream ...   \n",
       "4       11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "5       26691065  Fifty years in Canada, and now I feel like a s...   \n",
       "6       25731634   I'm Canadian - and I should have a right to vote   \n",
       "8       13647608                  A nation of $100,000 firefighters   \n",
       "...          ...                                                ...   \n",
       "1694    30474884  A dangerous moment in history: Can the politic...   \n",
       "1735    32088785  Clinton shines in first debate, and not just i...   \n",
       "2213    30508530  U.S. gun control: Don't look for logic after O...   \n",
       "2301    31605288  Let's make sure Ontario's sex-ed curriculum is...   \n",
       "2302    24363093       Dad rules when sex ed collides with religion   \n",
       "\n",
       "                                            article_url             author  \\\n",
       "2     http://www.theglobeandmail.com/opinion/too-man...    Jeffrey Simpson   \n",
       "4     http://www.theglobeandmail.com/news/world/disg...     Campbell Clark   \n",
       "5     http://www.theglobeandmail.com/opinion/fifty-y...        SHEEMA KHAN   \n",
       "6     http://www.theglobeandmail.com/opinion/im-cana...  Donald Sutherland   \n",
       "8     http://www.theglobeandmail.com/opinion/a-natio...     Margaret Wente   \n",
       "...                                                 ...                ...   \n",
       "1694  http://www.theglobeandmail.com/opinion/can-the...   Konrad Yakabuski   \n",
       "1735  http://www.theglobeandmail.com/opinion/editori...    GLOBE EDITORIAL   \n",
       "2213  http://www.theglobeandmail.com/opinion/us-gun-...   Konrad Yakabuski   \n",
       "2301  http://www.theglobeandmail.com/opinion/lets-ma...          DEBRA SOH   \n",
       "2302  http://www.theglobeandmail.com/opinion/dad-rul...      MICHAEL ADAMS   \n",
       "\n",
       "      published_date  ncomments  ntop_level_comments  \\\n",
       "2     2013-01-05 EST     1164.0                433.0   \n",
       "4     2013-05-02 EDT     1129.0                411.0   \n",
       "5     2015-10-07 EDT     1142.0                376.0   \n",
       "6     2015-07-28 EDT     1021.0                348.0   \n",
       "8     2013-08-08 EDT     1102.0                338.0   \n",
       "...              ...        ...                  ...   \n",
       "1694  2016-06-16 EDT      239.0                 50.0   \n",
       "1735  2016-09-27 EDT      232.0                 49.0   \n",
       "2213  2016-06-20 EDT      243.0                 40.0   \n",
       "2301  2016-08-30 EDT      239.0                 39.0   \n",
       "2302  2015-05-11 EDT      222.0                 39.0   \n",
       "\n",
       "                                           article_text  comments_ratio  \\\n",
       "2     <p>Large elements of aboriginal Canada live in...        0.371993   \n",
       "4     <p>Growing discontent among Arab nations over ...        0.364039   \n",
       "5     <p>'Too broken to write,' I told my editor, af...        0.329247   \n",
       "6     <p>My name is Donald Sutherland. My wife's nam...        0.340842   \n",
       "8     <p>Everyone loves firefighters. They save live...        0.306715   \n",
       "...                                                 ...             ...   \n",
       "1694  <p>As anyone trying to maintain perspective wh...        0.209205   \n",
       "1735  <p>For those who wondered whether Hillary Clin...        0.211207   \n",
       "2213  <p>The script is by now tediously formulaic. A...        0.164609   \n",
       "2301  <p>Debra W. Soh is a sex writer and sexual neu...        0.163180   \n",
       "2302  <p>Michael Adams is founder and president of t...        0.175676   \n",
       "\n",
       "                                        processed_title  \n",
       "2     (Too, many, first, nations, people, live, in, ...  \n",
       "4     (Disgruntled, Arab, states, look, to, strip, C...  \n",
       "5     (Fifty, years, in, Canada, ,, and, now, I, fee...  \n",
       "6     (I, 'm, Canadian, -, and, I, should, have, a, ...  \n",
       "8             (A, nation, of, $, 100,000, firefighters)  \n",
       "...                                                 ...  \n",
       "1694  (A, dangerous, moment, in, history, :, Can, th...  \n",
       "1735  (Clinton, shines, in, first, debate, ,, and, n...  \n",
       "2213  (U.S., gun, control, :, Do, n't, look, for, lo...  \n",
       "2301  (Let, 's, make, sure, Ontario, 's, sex, -, ed,...  \n",
       "2302  (Dad, rules, when, sex, ed, collides, with, re...  \n",
       "\n",
       "[519 rows x 10 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Apply the language model under 'nlp' to the contents of the DataFrame column 'title'\n",
    "talk['processed_title'] = talk['title'].apply(nlp)\n",
    "\n",
    "# Call the variable to check the output\n",
    "talk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have the processed titles in a separate column named `processed_title`!\n",
    "\n",
    "Let's examine the first row in the DataFrame `talk`, whose index label is 2, using the `.at` accessor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Too many first nations people live in a dream palace"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "talk.at[2, 'processed_title']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "spacy.tokens.doc.Doc"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(talk.at[2, 'processed_title'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the cell contains a spaCy _Doc_ object.\n",
    "\n",
    "Let's now define our own Python **function** to fetch the lemma of each noun in the title.\n",
    "\n",
    "Python functions are _defined_ using the keyword `def`, which is followed by the name of the function, in this case `get_nouns`.\n",
    "\n",
    "The input to the function is given in the parentheses that follow the name of the function.\n",
    "\n",
    "In this case, we call the input variable `nlp_text`. The name is arbitrary: it is only needed for referring, inside the function, to whatever object is eventually provided as input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a function named 'get_nouns' that takes a single object as input.\n",
    "# We refer to this input using the variable name 'nlp_text'.\n",
    "def get_nouns(nlp_text):\n",
    "    \n",
    "    # First we make sure that the input is of the correct type\n",
    "    # by using the assert statement to check the input type\n",
    "    assert isinstance(nlp_text, spacy.tokens.doc.Doc)\n",
    "    \n",
    "    # Let's set up a placeholder list for our lemmas\n",
    "    lemmas = []\n",
    "    \n",
    "    # We then begin looping over the Doc object\n",
    "    for token in nlp_text:\n",
    "        \n",
    "        # If the fine-grained POS tag marks a singular or mass noun (NN)\n",
    "        if token.tag_ == 'NN':\n",
    "            \n",
    "            # Append the token lemma to the list of lemmas\n",
    "            lemmas.append(token.lemma_)\n",
    "            \n",
    "    # When the loop is complete, return the list of lemmas\n",
    "    return lemmas"
   ]
  },
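  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same pattern of `def`, an input variable and `return` applies to any function. The following sketch is a minimal illustration that does not depend on spaCy: the function name `get_long_words` and the parameter `min_length` are hypothetical and chosen only for this example.\n",
    "\n",
    "```python\n",
    "# Define a hypothetical function that takes a list of strings as input\n",
    "def get_long_words(words, min_length=5):\n",
    "\n",
    "    # Make sure that the input is a list\n",
    "    assert isinstance(words, list)\n",
    "\n",
    "    # Set up a placeholder list for the results\n",
    "    long_words = []\n",
    "\n",
    "    # Loop over the words in the input list\n",
    "    for word in words:\n",
    "\n",
    "        # Keep the word if it has at least 'min_length' characters\n",
    "        if len(word) >= min_length:\n",
    "            long_words.append(word)\n",
    "\n",
    "    # Return the list of long words\n",
    "    return long_words\n",
    "\n",
    "# Call the function with a short list of words\n",
    "result = get_long_words(['dream', 'palace', 'pm', 'right'])\n",
    "```\n",
    "\n",
    "Just like `get_nouns`, this function checks its input, collects matching items into a list and returns the list."
   ]
  },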
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have defined the function `get_nouns`, we can use it with the `.apply()` method to collect the noun lemmas into a new column named `nouns`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>article_id</th>\n",
       "      <th>title</th>\n",
       "      <th>article_url</th>\n",
       "      <th>author</th>\n",
       "      <th>published_date</th>\n",
       "      <th>ncomments</th>\n",
       "      <th>ntop_level_comments</th>\n",
       "      <th>article_text</th>\n",
       "      <th>comments_ratio</th>\n",
       "      <th>processed_title</th>\n",
       "      <th>nouns</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6929035</td>\n",
       "      <td>Too many first nations people live in a dream ...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/too-man...</td>\n",
       "      <td>Jeffrey Simpson</td>\n",
       "      <td>2013-01-05 EST</td>\n",
       "      <td>1164.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>&lt;p&gt;Large elements of aboriginal Canada live in...</td>\n",
       "      <td>0.371993</td>\n",
       "      <td>(Too, many, first, nations, people, live, in, ...</td>\n",
       "      <td>[dream, palace]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11672346</td>\n",
       "      <td>Disgruntled Arab states look to strip Canada o...</td>\n",
       "      <td>http://www.theglobeandmail.com/news/world/disg...</td>\n",
       "      <td>Campbell Clark</td>\n",
       "      <td>2013-05-02 EDT</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>411.0</td>\n",
       "      <td>&lt;p&gt;Growing discontent among Arab nations over ...</td>\n",
       "      <td>0.364039</td>\n",
       "      <td>(Disgruntled, Arab, states, look, to, strip, C...</td>\n",
       "      <td>[agency]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>26691065</td>\n",
       "      <td>Fifty years in Canada, and now I feel like a s...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/fifty-y...</td>\n",
       "      <td>SHEEMA KHAN</td>\n",
       "      <td>2015-10-07 EDT</td>\n",
       "      <td>1142.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>&lt;p&gt;'Too broken to write,' I told my editor, af...</td>\n",
       "      <td>0.329247</td>\n",
       "      <td>(Fifty, years, in, Canada, ,, and, now, I, fee...</td>\n",
       "      <td>[class, citizen]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>25731634</td>\n",
       "      <td>I'm Canadian - and I should have a right to vote</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/im-cana...</td>\n",
       "      <td>Donald Sutherland</td>\n",
       "      <td>2015-07-28 EDT</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>348.0</td>\n",
       "      <td>&lt;p&gt;My name is Donald Sutherland. My wife's nam...</td>\n",
       "      <td>0.340842</td>\n",
       "      <td>(I, 'm, Canadian, -, and, I, should, have, a, ...</td>\n",
       "      <td>[right]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>13647608</td>\n",
       "      <td>A nation of $100,000 firefighters</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/a-natio...</td>\n",
       "      <td>Margaret Wente</td>\n",
       "      <td>2013-08-08 EDT</td>\n",
       "      <td>1102.0</td>\n",
       "      <td>338.0</td>\n",
       "      <td>&lt;p&gt;Everyone loves firefighters. They save live...</td>\n",
       "      <td>0.306715</td>\n",
       "      <td>(A, nation, of, $, 100,000, firefighters)</td>\n",
       "      <td>[nation]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1694</th>\n",
       "      <td>30474884</td>\n",
       "      <td>A dangerous moment in history: Can the politic...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/can-the...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2016-06-16 EDT</td>\n",
       "      <td>239.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>&lt;p&gt;As anyone trying to maintain perspective wh...</td>\n",
       "      <td>0.209205</td>\n",
       "      <td>(A, dangerous, moment, in, history, :, Can, th...</td>\n",
       "      <td>[moment, history, centre, hold]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1735</th>\n",
       "      <td>32088785</td>\n",
       "      <td>Clinton shines in first debate, and not just i...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/editori...</td>\n",
       "      <td>GLOBE EDITORIAL</td>\n",
       "      <td>2016-09-27 EDT</td>\n",
       "      <td>232.0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>&lt;p&gt;For those who wondered whether Hillary Clin...</td>\n",
       "      <td>0.211207</td>\n",
       "      <td>(Clinton, shines, in, first, debate, ,, and, n...</td>\n",
       "      <td>[debate, comparison]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2213</th>\n",
       "      <td>30508530</td>\n",
       "      <td>U.S. gun control: Don't look for logic after O...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/us-gun-...</td>\n",
       "      <td>Konrad Yakabuski</td>\n",
       "      <td>2016-06-20 EDT</td>\n",
       "      <td>243.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>&lt;p&gt;The script is by now tediously formulaic. A...</td>\n",
       "      <td>0.164609</td>\n",
       "      <td>(U.S., gun, control, :, Do, n't, look, for, lo...</td>\n",
       "      <td>[gun, control, logic]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2301</th>\n",
       "      <td>31605288</td>\n",
       "      <td>Let's make sure Ontario's sex-ed curriculum is...</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/lets-ma...</td>\n",
       "      <td>DEBRA SOH</td>\n",
       "      <td>2016-08-30 EDT</td>\n",
       "      <td>239.0</td>\n",
       "      <td>39.0</td>\n",
       "      <td>&lt;p&gt;Debra W. Soh is a sex writer and sexual neu...</td>\n",
       "      <td>0.163180</td>\n",
       "      <td>(Let, 's, make, sure, Ontario, 's, sex, -, ed,...</td>\n",
       "      <td>[sex, ed, curriculum]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2302</th>\n",
       "      <td>24363093</td>\n",
       "      <td>Dad rules when sex ed collides with religion</td>\n",
       "      <td>http://www.theglobeandmail.com/opinion/dad-rul...</td>\n",
       "      <td>MICHAEL ADAMS</td>\n",
       "      <td>2015-05-11 EDT</td>\n",
       "      <td>222.0</td>\n",
       "      <td>39.0</td>\n",
       "      <td>&lt;p&gt;Michael Adams is founder and president of t...</td>\n",
       "      <td>0.175676</td>\n",
       "      <td>(Dad, rules, when, sex, ed, collides, with, re...</td>\n",
       "      <td>[sex, ed, religion]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>519 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      article_id                                              title  \\\n",
       "2        6929035  Too many first nations people live in a dream ...   \n",
       "4       11672346  Disgruntled Arab states look to strip Canada o...   \n",
       "5       26691065  Fifty years in Canada, and now I feel like a s...   \n",
       "6       25731634   I'm Canadian - and I should have a right to vote   \n",
       "8       13647608                  A nation of $100,000 firefighters   \n",
       "...          ...                                                ...   \n",
       "1694    30474884  A dangerous moment in history: Can the politic...   \n",
       "1735    32088785  Clinton shines in first debate, and not just i...   \n",
       "2213    30508530  U.S. gun control: Don't look for logic after O...   \n",
       "2301    31605288  Let's make sure Ontario's sex-ed curriculum is...   \n",
       "2302    24363093       Dad rules when sex ed collides with religion   \n",
       "\n",
       "                                            article_url             author  \\\n",
       "2     http://www.theglobeandmail.com/opinion/too-man...    Jeffrey Simpson   \n",
       "4     http://www.theglobeandmail.com/news/world/disg...     Campbell Clark   \n",
       "5     http://www.theglobeandmail.com/opinion/fifty-y...        SHEEMA KHAN   \n",
       "6     http://www.theglobeandmail.com/opinion/im-cana...  Donald Sutherland   \n",
       "8     http://www.theglobeandmail.com/opinion/a-natio...     Margaret Wente   \n",
       "...                                                 ...                ...   \n",
       "1694  http://www.theglobeandmail.com/opinion/can-the...   Konrad Yakabuski   \n",
       "1735  http://www.theglobeandmail.com/opinion/editori...    GLOBE EDITORIAL   \n",
       "2213  http://www.theglobeandmail.com/opinion/us-gun-...   Konrad Yakabuski   \n",
       "2301  http://www.theglobeandmail.com/opinion/lets-ma...          DEBRA SOH   \n",
       "2302  http://www.theglobeandmail.com/opinion/dad-rul...      MICHAEL ADAMS   \n",
       "\n",
       "      published_date  ncomments  ntop_level_comments  \\\n",
       "2     2013-01-05 EST     1164.0                433.0   \n",
       "4     2013-05-02 EDT     1129.0                411.0   \n",
       "5     2015-10-07 EDT     1142.0                376.0   \n",
       "6     2015-07-28 EDT     1021.0                348.0   \n",
       "8     2013-08-08 EDT     1102.0                338.0   \n",
       "...              ...        ...                  ...   \n",
       "1694  2016-06-16 EDT      239.0                 50.0   \n",
       "1735  2016-09-27 EDT      232.0                 49.0   \n",
       "2213  2016-06-20 EDT      243.0                 40.0   \n",
       "2301  2016-08-30 EDT      239.0                 39.0   \n",
       "2302  2015-05-11 EDT      222.0                 39.0   \n",
       "\n",
       "                                           article_text  comments_ratio  \\\n",
       "2     <p>Large elements of aboriginal Canada live in...        0.371993   \n",
       "4     <p>Growing discontent among Arab nations over ...        0.364039   \n",
       "5     <p>'Too broken to write,' I told my editor, af...        0.329247   \n",
       "6     <p>My name is Donald Sutherland. My wife's nam...        0.340842   \n",
       "8     <p>Everyone loves firefighters. They save live...        0.306715   \n",
       "...                                                 ...             ...   \n",
       "1694  <p>As anyone trying to maintain perspective wh...        0.209205   \n",
       "1735  <p>For those who wondered whether Hillary Clin...        0.211207   \n",
       "2213  <p>The script is by now tediously formulaic. A...        0.164609   \n",
       "2301  <p>Debra W. Soh is a sex writer and sexual neu...        0.163180   \n",
       "2302  <p>Michael Adams is founder and president of t...        0.175676   \n",
       "\n",
       "                                        processed_title  \\\n",
       "2     (Too, many, first, nations, people, live, in, ...   \n",
       "4     (Disgruntled, Arab, states, look, to, strip, C...   \n",
       "5     (Fifty, years, in, Canada, ,, and, now, I, fee...   \n",
       "6     (I, 'm, Canadian, -, and, I, should, have, a, ...   \n",
       "8             (A, nation, of, $, 100,000, firefighters)   \n",
       "...                                                 ...   \n",
       "1694  (A, dangerous, moment, in, history, :, Can, th...   \n",
       "1735  (Clinton, shines, in, first, debate, ,, and, n...   \n",
       "2213  (U.S., gun, control, :, Do, n't, look, for, lo...   \n",
       "2301  (Let, 's, make, sure, Ontario, 's, sex, -, ed,...   \n",
       "2302  (Dad, rules, when, sex, ed, collides, with, re...   \n",
       "\n",
       "                                nouns  \n",
       "2                     [dream, palace]  \n",
       "4                            [agency]  \n",
       "5                    [class, citizen]  \n",
       "6                             [right]  \n",
       "8                            [nation]  \n",
       "...                               ...  \n",
       "1694  [moment, history, centre, hold]  \n",
       "1735             [debate, comparison]  \n",
       "2213            [gun, control, logic]  \n",
       "2301            [sex, ed, curriculum]  \n",
       "2302              [sex, ed, religion]  \n",
       "\n",
       "[519 rows x 11 columns]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Apply the 'get_nouns' function to the column 'processed_title'\n",
    "talk['nouns'] = talk['processed_title'].apply(get_nouns)\n",
    "\n",
    "# Call the variable to examine the output\n",
    "talk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, an empty DataFrame column is actually not required for adding new data, because *pandas* creates a new column automatically through assignment, as exemplified by `talk['nouns']`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also easily extract information from DataFrames into Python's native data structures. \n",
    "\n",
    "The `tolist()` method, for instance, can be used to extract the contents of a *pandas* Series into a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['dream', 'palace'],\n",
       " ['agency'],\n",
       " ['class', 'citizen'],\n",
       " ['right'],\n",
       " ['nation'],\n",
       " [],\n",
       " ['reform'],\n",
       " ['leader', 'parade'],\n",
       " ['pm'],\n",
       " ['government', 'monopoly']]"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Cast pandas Series to a list\n",
    "noun_list = talk['nouns'].tolist()\n",
    "\n",
    "# Call the variable to check the output\n",
    "noun_list[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What we have now under `noun_list` is a list of lists, because each row in the `nouns` column contains a list.  \n",
    "\n",
    "Let's loop over the list and collect the items into a single list named `final_list` using the `extend()` method of a Python list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up the placeholder list\n",
    "final_list = []\n",
    "\n",
    "# Loop over each list in the list of lists\n",
    "for nlist in noun_list:\n",
    "    \n",
    "    # Extend the final list with the current list\n",
    "    final_list.extend(nlist)"
   ]
  },
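  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Flattening a list of lists is common enough that Python's standard library offers a shortcut. The sketch below assumes a small, made-up list named `example_lists` in place of `noun_list` and uses `chain.from_iterable()` from the `itertools` module, which should give the same result as the loop above.\n",
    "\n",
    "```python\n",
    "from itertools import chain\n",
    "\n",
    "# A small, made-up list of lists that stands in for 'noun_list'\n",
    "example_lists = [['dream', 'palace'], ['agency'], [], ['class', 'citizen']]\n",
    "\n",
    "# Chain the inner lists together and cast the result into a list\n",
    "flat_list = list(chain.from_iterable(example_lists))\n",
    "```\n",
    "\n",
    "Empty inner lists contribute nothing to the result, so they need no special handling."
   ]
  },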
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's briefly examine the first ten items in `final_list` and then count the number of items in the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['dream',\n",
       " 'palace',\n",
       " 'agency',\n",
       " 'class',\n",
       " 'citizen',\n",
       " 'right',\n",
       " 'nation',\n",
       " 'reform',\n",
       " 'leader',\n",
       " 'parade']"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_list[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "887"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(final_list)"
   ]
  },
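  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a side note, the frequency of each item in a list can also be counted without *pandas*. The sketch below uses `Counter` from Python's `collections` module on a short, made-up list named `example_nouns`; the values are invented for illustration and are not drawn from the corpus.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# A short, made-up list of lemmas that stands in for 'final_list'\n",
    "example_nouns = ['right', 'nation', 'right', 'reform', 'right', 'nation']\n",
    "\n",
    "# Count how many times each lemma occurs in the list\n",
    "counts = Counter(example_nouns)\n",
    "\n",
    "# Retrieve the two most frequent lemmas and their counts\n",
    "top_two = counts.most_common(2)\n",
    "```\n",
    "\n",
    "Here `top_two` is a list of tuples, each of which pairs a lemma with its count."
   ]
  },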
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To plot the 10 most frequent nouns, we can cast the `final_list` into a *pandas* Series, count the occurrences of each lemma using `value_counts()` and plot the result using the `plot()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEXCAYAAAC06B/dAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYS0lEQVR4nO3dfZRlVXnn8e/TtETDO+kKwURpYYmmo4LQIApmfB8Mo2KCKCGKiQbH4AR8SUSdNRglhphgdOGE0CraKL7AKCOGOEFeBEFEuoHwIsOAvGRJkG5EgaAJAs/8sc/tul1UdTVd9+57dvr7WatX1T33Vu0H7qnfPWefvfeJzESS1J5Fky5AkrRpDHBJapQBLkmNMsAlqVEGuCQ1anHNxpYsWZJLly6t2aQkNW/16tV3Z+bUzO1VA3zp0qWsWrWqZpOS1LyIuH227XahSFKjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSo6rOxJzP0mPPWfDvuO2Eg0ZQiST1n0fgktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalR8wZ4RDwpIi6MiO9FxPURcXS3fceI+EZE3NR93WH85UqSBjbmCPwh4J2ZuQzYDzgqIpYBxwLnZ+ZTgfO7x5KkSuYN8My8MzOv7L6/H7gB+FXgVcDK7mUrgYPHVKMkaRaPqQ88IpYCzwYuB3bKzDu7p34I7DTa0iRJG7LRAR4RWwNfBo7JzPuGn8vMBHKOnzsyIlZFxKq1a9cuqFhJ0rSNCvCIeBwlvE/PzK90m++KiJ2753cG1sz2s5m5IjOXZ+byqampUdQsSWLjRqEE8Cnghsz8yNBTZwNHdN8fAXx19OVJkuayMXel3x94PXBtRFzdbXsvcAJwRkS8CbgdOHQsFUqSZjVvgGfmJUDM8fSLR1uOJGljORNTkhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDVq3gCPiFMjYk1EXDe07f0RcUdEXN39+63xlilJmmljjsA/Axw4y/a/ycw9u3//MNqyJEnzmTfAM/Ni4J4KtUiSHoOF9IG/LSKu6bpYdhhZRZKkjbKpAX4ysBuwJ3AncOJcL4yIIyNiVUSsWrt27SY2J0maaZMCPDPvysyHM/MR4BPAvht47YrMXJ6Zy6empja1TknSDJsU4BGx89DDVwPXzfVaSdJ4LJ7vBRHxBeAFwJKI+AFwHPCCiNgTSOA24C3jK1GSNJt5AzwzD5tl86fGUIsk6TFwJqYkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjVo86QL6aOmx5yz4d9x2wkEjqESS5uYRuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY2aN8Aj4tSIWBMR1w1t2zEivhERN3VfdxhvmZKkmTbmCPwzwIEzth0LnJ+ZTwXO7x5LkiqaN8Az82LgnhmbXwWs7L5fCRw82rIkSfPZ1D7wnTLzzu77HwI7zfXCiDgyIlZFxKq1a9duYnOSpJkWfBEzMxPIDTy/IjOXZ+byqamphTYnSepsaoDfFRE7A3Rf14yuJEnSxtjUAD8bOKL7/gjgq6Mp
R5K0sTZmGOEXgMuAp0XEDyLiTcAJwEsj4ibgJd1jSVJFi+d7QWYeNsdTLx5xLZKkx8CZmJLUKANckhplgEtSo+btA9dkLD32nAX/jttOOGgElUjqK4/AJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY1yIo82yAlFUn95BC5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY1yJqZ6ry+zQRdaRx9q6EsdfahhVHVMkkfgktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEY5kUeSFmCSk5o8ApekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNWtAwwoi4DbgfeBh4KDOXj6IoSdL8RjEO/IWZefcIfo8k6TGwC0WSGrXQAE/g3IhYHRFHzvaCiDgyIlZFxKq1a9cusDlJ0sBCA/yAzNwLeDlwVET85swXZOaKzFyemcunpqYW2JwkaWBBAZ6Zd3Rf1wBnAfuOoihJ0vw2OcAjYquI2GbwPfAy4LpRFSZJ2rCFjELZCTgrIga/5/OZ+X9GUpUkaV6bHOCZeQuwxwhrkSQ9Bg4jlKRGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNWlCAR8SBEXFjRNwcEceOqihJ0vw2OcAjYgvgfwIvB5YBh0XEslEVJknasIUcge8L3JyZt2Tmg8AXgVeNpixJ0nwiMzftByMOAQ7MzDd3j18PPCcz3zbjdUcCR3YPnwbcuOnlArAEuHuBv2Oh+lAD9KOOPtQA/aijDzVAP+roQw3QjzpGUcMumTk1c+PiBf7SeWXmCmDFqH5fRKzKzOWj+n2t1tCXOvpQQ1/q6EMNfamjDzX0pY5x1rCQLpQ7gCcNPf61bpskqYKFBPgVwFMj4ikRsSXwOuDs0ZQlSZrPJnehZOZDEfE24B+BLYBTM/P6kVU2t5F1xyxAH2qAftTRhxqgH3X0oQboRx19qAH6UcfYatjki5iSpMlyJqYkNcoAl6RGGeCS1CgDXJIaNfaJPKMQEQcAT83MT0fEFLB1Zt5aqe2TgDmv9GbmH9eoo6tlf+D9wC6U9y5KCblrrRqGatmF8p6cFxFPABZn5v0V298K+FlmPhIRuwNPB76emT+vVcNQLTtQ5kSs+3vKzCsrtv8LwO8AS2fU8IFaNQzVMun9YnfgZGCnzHxGRDwLeGVmHl+xhvMz88XzbRuF3gd4RBwHLKdMw/808Djgc8D+lUpY1X3dn7Jo15e6x68BvlephoFPAW8HVgMPV257nYj4Q8ryCDsCu1Emcf0dMPIddAMuBp7fhee5lHkJrwUOr1gDEfFB4I3A95n+oE/gRRXL+CpwL2W/+PeK7a6nJ/vFJ4A/AU4ByMxrIuLzwNgDPCIeD/wisKTbL6N7alvgV8fRZu8DHHg18GzgSoDM/JeI2KZW45m5EiAi3gockJkPdY//DvhWrTo692bm1yu3OZujKIuZXQ6QmTdFxC9XriEy86cR8SbgbzPzwxFxdeUaAA4FdusWdJuUX8vMAyfY/kAf9otfzMzvRsTwtocqtf0W4BjgiZQP00ER9wEfH0eDLQT4g5mZEZGw7tR5EnagfJLe0z3euttW04UR8VfAVxg60qp5ut7598x8cPBHEhGL2UA305hERDyXcsT9pm7bFpVrALgO2B5YM4G2B74dEc/MzGsnWAP0Y7+4OyJ2G7TbLbp3Z42GM/NjwMci4r9l5kk12mwhwM+IiFOA7btTtD8APjmBOk4AroqICymfrL9J6Y+u6Tnd1+GFcWqfrgNc
FBHvBZ4QES8F/gj4WuUajgHeA5yVmddHxK7AhZVrAPgLyn5xHet/qL6yYg0HAG+MiFu7GgbXRp5VsQbox35xFGXm49Mj4g7gVuD3ahaQmSdFxPN49DWJ00bdVhMzMbud4WWUHfMfM/MbE6rjV5gO0csz84eTqGPSImIR5ah33XsCfDJb2JlGLCKup/S3Xgs8MtiemRdVrGGX2bZn5u21aujqeNR+kZmfqFnDUC1bAYtqXkAdavuzlGsAVzN9rSrHMeCh9wEeEX+Zme+eb9sY299rQ89XHm2wHXAc5egf4CLgA5l5b60aJi0iPpqZx0TE15jl9LzykS8RcUVm7lOzzTnq2AN4fvfwW5n5TxOo4eiuG2GD28Zcw/bAG3j00W/N0WI3AMtqHNC0EOBXZuZeM7ZdU+v0sOsymUtmZrXui4j4MqXPdWW36fXAHpn527Vq6Oq4lkeH572UETvHZ+aPxtj23pm5OiL+02zP1zzy7er5CKXb4mwmdF0iIo4G/pBybQTKhf8Vtfphh+qY7W/1qsx8dsUavg18h0efEa2c84dGX8OZwB9n5tj73nsb4N2ojz8CdqUM0RrYBrg0M6v1a3Wnhs/NzEtrtTlHHVdn5p7zbatQx4cpp4af7za9jjJ86oeUkTqvqFzPDsCTMvOamu12bc/2AV/7g/0ayv75QPd4K+Cyigc5hwG/S+mLHx6ZtQ3wyDjGP2+glkd9iFRse3BWuA2wJ/BdxnxdpM8XMT8PfJ1ykWj4jvf3Z+Y9s//IeHSTRT5OGc44ST+LiAMy8xJYN7HnZxOo4yUz/kiuHfzhRESVD9aI+CbwSso+vBpYExGXZuY7arQ/kJkvrNneHIL15wU8zPQQthq+TRnpsQQ4cWj7/UDtD9XPdoMd/p71w7NGZvx1hTbW09sA7/p17wUOA+jGkz4e2Doits7Mf65c0vkR8TvAVyZ4se6twMquLzwoQxrfOIE6toiIfTPzuwARsQ/TQ/hqjbndLjPvi4g3A6dl5nHdkWhVEbET8CHgiZn58ohYRjka/lTFMj4NXB4RZ1H2i1dRJn1V0V0svR14bq02N+BB4K+A97H+xKqxz1au3X0HPe5CGYiIVwAfoQyOX0OZRn5DZv5G5TruB7aiHN38jOmhWtvWrKOrZVtK4/fVbrtrfx/gVMpY+KBMVHgzcD1wUGaeUaGGaymjHVYC78vMK2peGxmq4+uUAH1fZu7RjX2+KjOfWbmOvShdGAlckplX1Wy/q2E/4CTg14EtKR/qD9T8G4mIW4B9M3NiNzLusmKua0TvzMxbRtVWb4/AhxwP7Aecl5nPjogXUnlcJ0BmVpv9OVNE/F5mfi4i3jFjOwCZ+ZGa9WTmFcAzuzOBwdnSwNjDu/MByvDFS7rw3hW4qVLbw5Zk5hkR8R5Yd6eqSS1zEJTgqNl9MuzjlOshZ1LmKrwB2L1yDTcDP63c5kwfBX5A6QYOyv+T3SizyU8FXjCqhloI8J9n5o8iYlFELMrMCyPio5MoJCJeyfQQvm9m5t9Xanow+3S2D5Hqp1AxY/GkoQ+SaosnZeaZlKAYPL6lq6m2ByLil5ie+bcf5Wirmoj4H5S1eb5MCYxPR8SZWXEBp4HMvDkitsjMh7s6rqJMuKrlAeDq7uLycB94tWGElMWz9hh6vKIbbPDubqLTyLQQ4D+JiK0pixedHhFrKG9SVRFxArAPcHq36eiI2D8zx75zZuYp3bfnzRwJ013IrG1iiydFxJ9mWfdk1lUiK/+hAryDMoRwt4i4FJgCDqlcw+GU4aT/Buv21aupsIDTDD+NcoPzq7uRSndSf8nq/939m6SfRsShwP/qHh8C/Fv3/UgPuFroA9+K8h8flB11O+D0cY41nqOOa4A9M/OR7vEWlL7Oan2uc4yzrT5sKiKuy8xn1GxzqO1XZObXIuKI2Z6vOd53qKbFlNUyA7gxKy9p2x1tvjozf9I93p5ysb3qEgvdjNC7KP3fb6f8rf5tZt5cs45J67rz
Pka5qJuUcelvB+4A9h6MIhuF3h+BD41t3Zb66yrMtD3Ti1ltV6vRKIs2PQ+YmtEPvi2TWcBpYosndeG9BfDMzHxX7fYHImKuyVO7RwSZ+ZU5nh9lDYOzkHuB6yPiG93jl1LGIFeVmbd3R+BLKZOKbsxKqzRGxBmZeegck8xyRpfGWHXdeXPNhRhZeEMDAR4RbwH+jHIU/gjTF2pq38TgQ8CV3fjjwWJWx27wJ0ZnS8qIj8Ws3w9+H/VP12HCiydl5sMT6joatqHJSsn0rMhxGqxVvxo4a2j7Nyu0/SgRcRBl/e/vU/aJp0TEW7LOEshHd19voKwHvq4s4MMV2p9I914LXSg3UcbVTmxYUFfH54D/B/wYuA24IisvZhURu2TlBYrmqmO27TVri4iTKYvkn8nQNZEaR7590p2NnJaZVW9kMUct/xf4L4MukyjLup6TmU+vWMPElt6IiB9l5i9FxDGUnFjPOLr3en8ETvk0n/SwICgTI55Pmf23G2UJ0Yuz4kI9wCcj4jVDfZ07AF/MzP9csYZ1QT00uWoSHg/8iPWX0q115LtONwLlOIbGYFMWGKtyjaY7G9klIras1V2xAffP6O++hTIbc+xiaOmNGRO6tgFqLYFxV0Q8Efh9ylDBsQ/nbCHA30Ppc72cyQ0Lohu+eDFlJMoLgf8K/AblYkUtSwbh3dX046h/x5PBcMoTmTG5ivL/o5ZFwNEzPsxO3OBPjMcXKSOkBkMYD6fcdu8lFWu4Bbg0Is5m/bORKvMDhq4HrIqIf6DMBUjK0MYratRAP5beOBk4n9K9u3po+9i6fVsI8FOAC5ixulhtEXE+ZTz2ZZQFe/bJzNp3YXkkIp48WEag68qYRB/YB5n85KpnzfJhNom1anbOzA8OPT4+Il5buYbvd/8WMftcgXEbvh5wFzBYKXItlc7QZi69MQlZVn88KSJOzsy31mizhQB/XFZeoGgO1wB7A8+g7Cg/iYjLMrPmYlLvAy6JiIson+rPp9xEtrY+TK5aFBE7ZOaPASJiRyazP58bEa9jegbqIZQZotVk5p/VbG+W9n9/ku33Ta3whjYuYn6IctHwa9RfXWy2erahLCD1LuBXMvMXKre/hHL0C/CdSVzcjYjzgIMpp6tLKN0o+2Tm8yrW8AbgvUzPxnwN8OeZ+dlK7Q/WuwjKmdng7HAR8K+V1/+YAv6U0oW17oh3AuPAd6d0I+yUmc+IiGdRZiVWnxG6uWghwG+dZXNmZtVhhBHxNsoR796UD5RvUe58ckHFGgaTmXbNzA9ExJMpHyJVx/z2aHLVMqYvYl6Qmd+r2X5fRMS5lH73d1GuzRwBrM1Kd60aquMiyhC+U7K7icMkJ31tDnof4H0REe+ihPbqzKy1ZOrMGk6mHOm9KDN/vbtwd2724JZem7MJrpEzaH91Zu49PFwuJnCrt0GbMXQXnpjADUc2J73tA4+IF2XmBXPNeKs93jczqy/WPovnZLlpwlWw7sLdlrWL6N6TvwR+mXIUPrGldSdtkmvkDBlM3b+zm0zzL8COFdsfuLsb+z1Y2OsQynooGpPeBjjlSvYFzD7jrfp43574eTdxY/AHMsVkRuZ8GHhFZt4wgbb75rdYf42clUDtFfiOj7K07zsp63FvCxxTsf2Bo4AVwNMj4g7gVkoXm8bELpSGRMThwGuBvSg3MjgE+O9ZllatWcelmTnpqey90E0aecHgono3GuablRc5W8n6Y+J3BP46M/+gUvszR4k9gXIx9wGov1795qS3R+Cz7BTr2Rx3isw8PSJWAy+mdFscXPMoeMaEjS9Rlu0cHhm0OZ4VfYgyK/dC6q+RMzBzTPw9lcfED8aeP43SnfRVyv+L1zOBRbU2J70NcDY8IWGzOm3ojqgG1gBfGH6u4pDKQXdWUpY3eNnQc5tdt1ZELKJ0Ye1HCS6Ad9deI4cJj4kfjEPvZirvlZn3d4/fD5xTq47NUW8DfGinmHl6OKkp05O0mukxx8MfXlVXZhxM
2PA9KTLzkW4FujMoN3WYlBOByyJivTHxE6hjJ8pNhQce7LZpTHob4EP6MmV6YjLzKbDuiO9w4ClD48B3nkBJm/17MuS8bojpl1h/HZJqE80y87SIWMX0mPjfntCY+NOA70bEYGnbg4HPTKCOzUbvL2JGxD9RLhINnx5elJXv+t0HfRkH7nsyrZtoNtvaz7XXq++FiNiLMuEN4OLMvGqS9fxH18IReF9OD/ugF+PA8T0ZtoyyjOlgOdlvUW5qsFnKzCspd19XBb0/AgenTA90S+o+j3Izib26ceDnDma9Va7F94RyKy/KnZEGE3l+F9guMw+dXFXaXDQR4Cr6Mg5c0yLie5m5bL5t0ji00IWizqTHgWtWV0bEfpn5HYCIeA7T96qUxsojcGkBIuIGygSWf+42PRm4EXiIijd61ubJAJcWIOa4wfNA9uAm1PqPywCXpEYtmnQBkqRNY4BLUqMMcElqlAEuSY36/xXCepbbTIgCAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "pd.Series(final_list).value_counts()[:10].plot(kind='bar')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saving DataFrames\n",
    "\n",
    "*pandas* DataFrames can be easily saved as *pickled* objects using the `to_pickle()` method.\n",
    "\n",
    "The `to_pickle()` method takes a string as input, which defines a path to the file in which the DataFrame should be written.\n",
    "\n",
    "Let's pickle the DataFrame with the three articles stored under `df` into a file named `pickled_df.pkl` into the directory `data`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the DataFrame to disk using pickle\n",
    "df.to_pickle('data/pickled_df.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can easily check if the data has been saved successfully by reading the file contents using the `read_pickle()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>filename</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>WP_1990-08-10-25A.txt</td>\n",
       "      <td>﻿*We Don’t Stand for Bullies': Diverse Voices ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NYT_1991-01-16-A15.txt</td>\n",
       "      <td>﻿U.S. TAKING STEPS TO CURB TERRORISM: F.B.I. I...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>WP_1991-01-17-A1B.txt</td>\n",
       "      <td>﻿U.S., Allies Launch Massive Air War Against T...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 filename                                               text\n",
       "0   WP_1990-08-10-25A.txt  ﻿*We Don’t Stand for Bullies': Diverse Voices ...\n",
       "1  NYT_1991-01-16-A15.txt  ﻿U.S. TAKING STEPS TO CURB TERRORISM: F.B.I. I...\n",
       "2   WP_1991-01-17-A1B.txt  ﻿U.S., Allies Launch Massive Air War Against T..."
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the pickled DataFrame and assign the result to 'df_2'\n",
    "df_2 = pd.read_pickle('data/pickled_df.pkl')\n",
    "\n",
    "# Call the variable to examine the output\n",
    "df_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compare the DataFrames, which returns a Boolean value (True/False) for each cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>filename</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   filename  text\n",
       "0      True  True\n",
       "1      True  True\n",
       "2      True  True"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df == df_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section should have given you a basic idea of the *pandas* library and how DataFrames can be used to store and manipulate textual data."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Edit Metadata",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}