Working with CSV files in Python

Coding with Python

🕑 This lesson will take about 20 minutes

In this lesson, you will learn how to open and read spreadheets (as .csv files) in Python. CSV stands for Comma Separated Values. A CSV file contains data where each cell (or field) in a column is separated by a comma and each record is on a new line. To work with CSV files, we need to make sure we import the csv module first.

In this lesson, we will use a CSV file containing monthly mean maximum temperatures in Sydney, Australia from 1859 to 2017 (from the Bereau of Meteorology) as an example data set. If you want to work with the example in this lesson, you can download the BOM temperature data for Sydney here. Download the CSV file and then save it in the same folder as your Python code.

Make sure if using Visual Studio Code, that you have opened the folder (containing both the Python code and the CSV file) before attempting to run the Python code. To do this, clicke File > Open Folder and then select the folder containing your code and CSV file.

The following example code will open the CSV file (in the same directory as the Python file) and print the entire contents on the screen. In this example, a CSV file called temps.csv is being opened and read from, and then a for loop is calling the print() function to display each row of the spreadsheet.

Skipping a header row

If your CSV file has a head row, you can skip it using the next function, for example:

skip(csv_reader)

We can use indexing to reference specific cells (where a column and row intersect) in the spreadsheet. Indexing begins at 0. Keep in mind that if you skip the header row, the second row (the one after the header row) will be given an index of 0.

Referencing specific rows and cells

We can also reference a specific cell (where a column and row intersect) in a CSV file. In the example code below, a target row and column are specified. A for loop is used to iterate over each row in the file until the target row is found. Then, the target column from that row is referenced. This achieved by referencing the index of the row and the index of the column. In this example, the header row has been skipped using the next() function, so the second row in the file has an index of 0.

Example: processing a CSV file

In this example, we have a CSV file (called temps.csv) that contains the mean maximum temperatures for each month of each year between 1859 and 2017 as well as the annual mean maximum temperature for each year. Let’s say we want to find out the average maximum temperature over that entire timeframe. The last column of the spreadsheet (which has an index of 15) contains the annual mean maximum temperature for each year (in each row of the spreadsheet, excluding the header row). The code below reads the CSV file and then uses a for loop to iterate over each row in the spreadsheet (after skipping the header row) and add the annual mean maximum temperature value for each year to a total, whilst keeping track of the number of rows containing data in the file. After the for loop ends, the average maximum temperature for all years between 1859 and 2017 is calculated by dividing the total by the row count, rounded to 2 decimal places and then displayed on the screen using the print() function.

Writing data to a CSV file

Let’s say we have some data we wish to write some data to a CSV file. In this example, we have a two-dimensional (2D) list containing annual mean maximum temperatures between 2010 and 2017. The first “row” of the list contains the header values “Year” and “Annual mean”, and the following rows have the year as the first element and the temperature value as the second element. The data will be written to a CSV file called “output.csv” using the writerows() function, so that each list (or “row”) within the 2D list will be written as a separate row in the CSV file.

In this last example, we have a simple one-dimension list that contains some names. The code below shows how to write each element from the list to a CSV file where each element will be in a new row in one column. Note that in the example code above, we used the writerows() function to write each “row” of a two-dimensional list to rows in a CSV file, whereas in this example, we use a for loop to iterate over each element in a one-dimensional list and then call the writerow() function to write each value to a new row in the CSV file.

Later on in this course, we will look at how we can use the numpy and matplotlib libraries to analyse and plot data from datasets such as CSV files.


Next lesson:
Logging