DATA170
Exploring Data

Activity 10

Pandas

  1. Write a Python program that uses numpy and pandas to normalize each column of a dataset to the range 0 to 1 (min-max normalization).

    For example:

    import numpy as np
    import pandas as pd
    
    df = pd.read_csv("/content/sample_data/california_housing_test.csv")
    norm = # put your code here
    print(norm.describe()) # min and max for each column should be 0 and 1
  2. Write a Python program that uses numpy and pandas to find the index of the row in a dataset that is the nearest neighbor of (closest to) a given row.

    For example:

    import numpy as np
    import pandas as pd
    
    df = pd.read_csv("/content/sample_data/california_housing_test.csv")
    norm = # normalize code here
    norm = norm.to_numpy() # convert to numpy
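    index = 1 # row whose nearest neighbor we want to find
    row = norm[index, :] # the query row
    test = np.delete(norm, index, axis=0) # all other rows
    dist = # put your code here: distance from row to each row of test
    nearest = np.argmin(dist) # position of the closest row within test
    if nearest >= index: # rows after index shifted up by one when it was deleted
        nearest += 1
    print(nearest) # index of the nearest row in the original dataset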
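One possible approach to exercise 1 is sketched below. It is a minimal illustration, not the only accepted answer: the helper name `min_max_normalize` is hypothetical, and a small made-up DataFrame stands in for the California housing CSV so the snippet runs anywhere. Each column is rescaled as (x - min) / (max - min); constant columns, where max equals min, are assumed to map to 0 rather than produce a division by zero.

```python
import numpy as np
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rescale each column so its min is 0 and its max is 1."""
    values = df.to_numpy(dtype=float)
    col_min = values.min(axis=0)          # per-column minimum
    col_max = values.max(axis=0)          # per-column maximum
    span = col_max - col_min
    # Assumption: a constant column (span == 0) is mapped to all zeros
    # instead of dividing by zero.
    span = np.where(span == 0, 1.0, span)
    scaled = (values - col_min) / span    # broadcasting applies per column
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)

# Made-up stand-in data; in Colab you would instead use
# df = pd.read_csv("/content/sample_data/california_housing_test.csv")
df = pd.DataFrame({
    "median_income": [2.5, 8.0, 4.1, 1.2],
    "housing_median_age": [15.0, 52.0, 30.0, 41.0],
})
norm = min_max_normalize(df)
print(norm.describe())  # min row should be 0 and max row 1 for every column
```

Working on the numpy array and then rebuilding the DataFrame keeps the column names and row index intact. The pure pandas expression `(df - df.min()) / (df.max() - df.min())` gives the same result for non-constant columns.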
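A sketch of one way to complete exercise 2 follows. The exercise does not name a distance measure, so Euclidean distance is an assumption here. The helper name `nearest_neighbor_index` and the tiny toy dataset, used in place of the housing CSV, are also hypothetical. Because `np.delete` removes the query row, `np.argmin` returns a position within the reduced array; any position at or after the query index must be shifted by one to recover the row's index in the original data.

```python
import numpy as np
import pandas as pd

def nearest_neighbor_index(data: np.ndarray, index: int) -> int:
    """Hypothetical helper: original index of the row closest (Euclidean) to data[index]."""
    row = data[index, :]                        # the query row
    test = np.delete(data, index, axis=0)       # every other row
    # Broadcasting subtracts row from each row of test; norm along axis 1
    # gives one Euclidean distance per remaining row.
    dist = np.linalg.norm(test - row, axis=1)
    nearest = int(np.argmin(dist))              # position within test
    # Rows after index moved up by one when it was deleted, so undo that shift.
    return nearest + 1 if nearest >= index else nearest

# Made-up toy data standing in for the housing CSV.
df = pd.DataFrame({
    "x": [0.0, 1.0, 5.0, 1.1],
    "y": [0.0, 1.0, 5.0, 0.9],
})
# Min-max normalize first so no single column dominates the distance.
norm = ((df - df.min()) / (df.max() - df.min())).to_numpy()
print(nearest_neighbor_index(norm, 1))  # prints 3
```

Here the nearest row to row 1 is row 3, which sits at position 2 of the reduced array, which is why the shift back by one matters.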