CPSC150A
Scientific Computing

Activity 5

Project Spreadsheet

Data Selection

Now that you have a dataset and a question, it’s time to start writing code to analyze the dataset. Many datasets contain far more data than you will need to answer your question. Getting rid of the extra data will make the dataset easier to work with, and it will help your programs run faster. Begin by looking at the columns of your dataset and picking the ones that you think might be helpful. Most datasets also have a codebook that details the data collection process and describes the columns. Use the codebook to help you find relevant columns. Then use a spreadsheet program like Microsoft Excel, Apple Numbers, or OpenOffice Calc to remove the unwanted columns. Don’t save over the original file, because you will probably want to go back later and add or remove columns. Instead, export the trimmed dataset as a CSV file.
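
If you would rather trim columns in code than in a spreadsheet program, the sketch below shows one possible way to do it with Python's standard csv module. It is an optional alternative, not a required step of this activity. The function name select_columns, the column names, and the sample rows are all made up for illustration; substitute the columns you picked from your own codebook.

```python
import csv
import io


def select_columns(source, destination, wanted):
    """Copy only the columns named in `wanted` from source to destination."""
    reader = csv.DictReader(source)
    # extrasaction="ignore" drops every column that is not listed in `wanted`.
    writer = csv.DictWriter(
        destination,
        fieldnames=wanted,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Hypothetical sample data standing in for a real dataset file.
raw = io.StringIO("id,age,height_cm,notes\n1,34,170,ok\n2,29,182,late\n")
trimmed = io.StringIO()

select_columns(raw, trimmed, ["age", "height_cm"])
print(trimmed.getvalue())
```

To use this with real files, pass file objects opened with open(name, newline="") in place of the io.StringIO objects, and write to a new file so that the original dataset stays untouched.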


Spreadsheet

Upload the CSV file you created to HumblePython and write a program that uses the spreadsheet module to print values from the spreadsheet. Verify that the upload contains all of the data you exported by printing the column headers and the width and height of the spreadsheet, then comparing them against the file you exported.
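
The spreadsheet module's functions are not listed in this handout, so the sketch below uses Python's standard csv module instead to illustrate the same check: print the column headers, the width (number of columns), and the height (number of rows). The function name describe_csv and the sample data are hypothetical, and the sketch assumes that height counts data rows only, not the header row. The spreadsheet module may count differently.

```python
import csv
import io


def describe_csv(source):
    """Return (headers, width, height) for CSV text read from `source`."""
    rows = list(csv.reader(source))
    if not rows:
        # An empty file has no headers and no data.
        return [], 0, 0
    headers = rows[0]
    width = len(headers)
    # Assumption: height counts data rows only, so the header row is excluded.
    height = len(rows) - 1
    return headers, width, height


# Hypothetical sample data standing in for the exported CSV file.
sample = io.StringIO("age,height_cm\n34,170\n29,182\n41,165\n")

headers, width, height = describe_csv(sample)
print("Headers:", headers)
print("Width:", width)
print("Height:", height)
```

If the numbers printed for your own file do not match what the spreadsheet program showed when you exported it, the export or the upload probably dropped some data.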