CS 2120: Class #9

Array – first steps

  • The list was our first data structure.

  • Now we’re going to meet a similar, but slightly different, one: the array

  • Let’s get started:

    >>> import numpy
    >>> a=numpy.array([5,4,2])
    >>> print a
    [5 4 2]
    
  • Looks a lot like a list, doesn’t it?

  • Can we manipulate it like a list?

    >>> print a[0]
    5
    >>> print a[1]
    4
    
  • We can definitely index it, the same as a list.

  • I wonder if arrays are mutable?

    >>> a[1]=7
    >>> print a
    [5 7 2]
    
  • Yes, arrays are mutable.

  • With lists, I could mix types in a single list. Like this:

    >>> l = [5,4,3]
    >>> l[2] = 'walrus'
    >>> print l
    [5, 4, 'walrus']
    
  • Can I do that with arrays?

    >>> a=numpy.array([5,4,2])
    >>> a[2]='walrus'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for long() with base 10: 'walrus'
    
  • Ah ha! We found a way in which arrays are different.

  • Lists are just collections of stuff. Any old stuff. Each element can be of a different type.

  • In an array, every element must have the same type!
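
  • A minimal sketch of what "same type" means in practice. The variable names (mixed, ints) are made up for illustration. It shows two things: mixed numeric input is converted to one common type when the array is created, and a value assigned later is converted to the type the array already has.

```python
import numpy

# Mixed ints and floats: NumPy picks one common type for the whole array.
mixed = numpy.array([5, 4, 2.5])
print(mixed.dtype)   # float64

# An integer array keeps its type; an assigned float is truncated to fit.
ints = numpy.array([5, 4, 2])
ints[0] = 3.9
print(ints[0])       # 3
```

  • Note that the 3.9 was silently cut down to 3. The array did not change its type to make room for the new value.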

Activity

Create two arrays of integers, each having the same number of elements.

What mathematical operations (+, -, *, /) can you do on the arrays?

What happens if you try to perform the operations on arrays of different sizes?

How does + work differently on arrays than lists?

NumPy object attributes

  • We can ask NumPy what type the items in an array have like this:

    >>> a.dtype
    dtype('int64')
    
  • This is a new notation for us. We’re used to passing something to a function, which will tell us the type. Like this:

    >>> type(something)
    
  • Here, we instead asked NumPy to tell us about an attribute of our array (in this case the attribute dtype, which stands for “data type”):

    >>> a.dtype
    
  • Objects in NumPy have many attributes. These will mostly get set automatically for you, but we’ll need to set a few of them manually later.

  • If you want to see all the attributes your array has you can either type a. and then press the [Tab] key (if you’re using ipython) or you can do:

    >>> dir(a)
    
  • That’s a lot of attributes!

  • Some of those attributes are things like dtype that store information about the state of the object.

  • Some are special functions that can only be applied to that object. For example, every NumPy array comes with its own view function. Check it out:

    >>> a.view()
    array([5, 4, 2])
    
  • When a function appears after a dot (.), that function is automatically applied to the object appearing before the dot.
    • These special functions built into objects can also take parameters.
  • For example, we can change the types of the elements of our array:

    >>> b=a.astype(numpy.float32)
    >>> b.view()
    array([ 5.,  4.,  2.], dtype=float32)
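
  • To tie the last few points together, here is a small sketch contrasting the type() function, the dtype attribute, and the astype method. The names container_type and element_type are made up for illustration, and the exact integer dtype depends on your system.

```python
import numpy

a = numpy.array([5, 4, 2])

# type() describes the container; the dtype attribute describes the elements.
container_type = type(a)        # numpy.ndarray
element_type = a.dtype          # an integer dtype (int64 on many systems)

# astype is a method: it takes a parameter and returns a NEW array.
b = a.astype(numpy.float32)
print(b.dtype)                  # float32
print(a.dtype == element_type)  # True: a itself was not changed
```

  • Attributes like dtype are read without parentheses. Methods like astype are called with parentheses, even when they take no parameters.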
    

Activity

Create an array a = numpy.array([1,2,6,7,5,4,3]). Figure out how to find/do the following things with attributes of a:

  1. “view” a
  2. sort a
  3. find the maximum value in a
  4. find the minimum value in a
  5. add up (sum) all the values in a
  6. find the average (mean) of the values in a

Making arrays bigger

  • With lists, we could always append items to make them bigger (+)

    >>> [1,2,3] + [4]
    [1, 2, 3, 4]
    
  • Arrays are meant to have fixed size.

  • Why do you think this is?

  • If you really, really, want to make an array bigger... you can’t.

  • You can, however, make a new array that is bigger using numpy.append():

    >>> a = numpy.array([1,2,3,4])
    >>> a.view()
    array([1, 2, 3, 4])
    >>> b = numpy.append(a,5)
    >>> a.view()
    array([1, 2, 3, 4])
    >>> b.view()
    array([1, 2, 3, 4, 5])
    
  • Note carefully that numpy.append() did not change a. It created a new array, b.
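
  • Because numpy.append() builds a whole new array every time, growing an array one item at a time means copying it over and over. The sketch below compares that with a common alternative (a suggestion, not something these notes require): grow an ordinary list, then convert it to an array once at the end. The names grown, items and converted are made up for illustration.

```python
import numpy

# Option 1: repeated numpy.append. Each call returns a brand-new array.
grown = numpy.array([1, 2, 3, 4])
for value in [5, 6, 7]:
    grown = numpy.append(grown, value)   # rebind the name to the new array

# Option 2: collect in a list (cheap to grow), convert to an array once.
items = [1, 2, 3, 4]
for value in [5, 6, 7]:
    items.append(value)
converted = numpy.array(items)

print(grown)       # [1 2 3 4 5 6 7]
print(converted)   # [1 2 3 4 5 6 7]
```

  • Both give the same result. The second simply avoids making a fresh copy of the array on every pass through the loop.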

Activity

Create an array of 4 integers.

Create a new, bigger, array by appending the integer 7 on to your array.

Create another new array by appending the string 'walrus'.

Did that last one work? What happened?

Flexibility vs Power

  • Arrays are less flexible than lists:

    • We can’t change their size
    • They can only store data of a single type
  • But... it is this very lack of flexibility that lets us do all sorts of cool stuff like have a .sum() attribute.

Activity

How would you implement .sum() for a list?

Higher dimensions

  • NumPy arrays generalize to higher dimensions.

  • Let’s create a 2D array:

    >>> a=numpy.array([[1,2,3],[4,5,6],[7,8,9]])
    >>> a.view()
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
    
  • Note the format in our call to numpy.array. A list of lists.

  • Each row of the array gets its own list.

  • As long as two 2D arrays have the same shape, you can do arithmetic on them, just like 1D arrays.

  • How do we check the shape of an array?

    >>> a.shape
    (3, 3)
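
  • A short sketch of the "same shape" arithmetic mentioned above. The second array b and its values are made up for illustration.

```python
import numpy

a = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = numpy.array([[10, 10, 10], [20, 20, 20], [30, 30, 30]])

print(a.shape == b.shape)   # True: both are (3, 3)

difference = b - a          # subtracts matching entries
product = a * b             # multiplies matching entries
print(difference)
# [[ 9  8  7]
#  [16 15 14]
#  [23 22 21]]
```

  • Each entry of the result comes from the two entries sitting in the same position of a and b.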
    

Activity

Create a 4x4 array. Verify that it has shape (4,4).

You’ve changed your mind. The array should actually be 2x8. Use reshape to turn your 4x4 array into a 2x8 array without recreating it from scratch.

Verify that the reshaped array is (2,8).

Finally, flatten your 2D array into a 1D array.

Starting points

  • Sometimes you want an array of shape (n,m) that contains all zeros:

    >>> a=numpy.zeros([n,m])
    
  • Guess what numpy.ones() does?

  • How about numpy.eye()?
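
  • The numpy.zeros() line above needs actual values for n and m before it will run. A minimal sketch with example sizes (2 and 3 are arbitrary, and the name counts is made up for illustration):

```python
import numpy

n, m = 2, 3                      # example sizes; any positive integers work
a = numpy.zeros([n, m])

print(a.shape)                   # (2, 3)
print(a.dtype)                   # float64: zeros are floats unless told otherwise

# The optional dtype parameter asks for a different element type.
counts = numpy.zeros([n, m], dtype=int)
print(counts[0, 0])              # 0
```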

Slicing

  • We’ve already seen that you can index arrays like lists (and strings)
  • Likewise, you can use Python’s powerful slicing on arrays.

Activity

Create an array arr = numpy.array([0,1,2,3,4,5,6,7]). Using a single command for each:

  1. Print the first 3 elements
  2. Print the last 3 elements
  3. Print the even elements of arr

  • Slicing works for higher dimensional arrays, too. For example:

    >>> a=numpy.arange(25).reshape(5,5)
    >>> a.view()
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24]])
    >>> print a[0:2,1:4]
    [[1 2 3]
     [6 7 8]]
    
  • Note the use of numpy.arange, which works like range but returns an array.

  • If you want a whole column/row/etc, you can use a plain : as the index. For example, if I wanted to pull out every row of the first two columns:

    >>> print a[:,0:2]
    [[ 0  1]
     [ 5  6]
     [10 11]
     [15 16]
     [20 21]]
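
  • A few more 2D slices on the same 5x5 array, as a sketch. The names row, col and block are made up for illustration. A single integer index picks out one row or column, while a range keeps that dimension.

```python
import numpy

a = numpy.arange(25).reshape(5, 5)

row = a[1, :]          # second row, every column
col = a[:, 2]          # every row, third column
block = a[3:, 3:]      # bottom-right 2x2 corner

print(row)             # [5 6 7 8 9]
print(col)             # [ 2  7 12 17 22]
print(block.shape)     # (2, 2)
```

  • Notice that row and col come back as 1D arrays, while block is still 2D.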
    

Activity

Modify the previous command to print all of the columns of the first two rows.

For loops

  • If for loops work for lists, do you think they’ll work for arrays?

Activity

Write a function printeach(arr) that uses a for loop to print each element of an array that is passed in as a parameter.

Test it on a 1D array.

Now try a 2D array.

If you’re feeling bold, how about a 3D array?

NumPy Matrices

  • NumPy makes a distinction between a 2D array and a matrix.

  • Every matrix is definitely a 2D array, but not every 2D array is a matrix.

    >>> a = numpy.eye(4)
    >>> b=numpy.array([1,2,3,4])
    >>> b*a
    array([[ 1.,  0.,  0.,  0.],
           [ 0.,  2.,  0.,  0.],
           [ 0.,  0.,  3.,  0.],
           [ 0.,  0.,  0.,  4.]])
    
  • That... wasn’t what we expected. It’s a perfectly reasonable interpretation of our request, but here we really wanted matrix-vector multiplication.

  • If we want Python to treat the NumPy arrays as matrices and vectors, we have to explicitly say so. Like this:

    >>> numpy.dot(a,b)
    array([ 1.,  2.,  3.,  4.])
    
  • Another option is to convert a to the matrix type:

    >>> a = numpy.matrix(a)
    >>> b * a
    matrix([[ 1.,  2.,  3.,  4.]])
    
  • The preferred option is the first one. Keep everything as an array and use numpy.dot when you want matrix/vector multiplication.

  • Moral of the story: if you want your 2D array to behave like a matrix when you do things like multiplication... you need to tell Python that it’s a matrix.
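
  • The identity matrix can hide the difference between the two kinds of multiplication, so here is a sketch with a small non-identity example. The arrays m and v and their values are made up for illustration.

```python
import numpy

m = numpy.array([[1, 2],
                 [3, 4]])
v = numpy.array([10, 20])

elementwise = m * v          # v is multiplied into each row, entry by entry
matvec = numpy.dot(m, v)     # true matrix-vector product

print(elementwise)
# [[10 40]
#  [30 80]]
print(matvec)                # [ 50 110]
```

  • The * version gives back a 2x2 array, while numpy.dot gives the length-2 vector you would compute by hand: 1*10 + 2*20 = 50 and 3*10 + 4*20 = 110.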

Activity

Write a function chain(matrix,n) that will take an input matrix, and return the result of multiplying it by itself n times. Test it out on a square matrix with every entry between 0 and 1 and the entries in each row summing exactly to 1. Like this:

    >>> a=numpy.array([[0.2,0.8,0.0],[0.1,0.3,0.6],[0.0,0.8,0.2]])

This is an example of something called a Right Stochastic Matrix.

Note

That last activity took you most of the way to implementing a Markov chain. If your interests run towards state-based probabilistic simulation... you’ll want to follow up on this.

NumPy Linear Algebra

  • As you might expect, NumPy already has a whoooole bunch of linear algebra routines built right in:

    >>> import matplotlib.pyplot as plt
    >>> a=numpy.random.rand(5,5) # Create a random matrix
    >>> plt.matshow(a)           # Visualize it (then plt.show() if no window appears)
    >>> numpy.linalg.eigvals(a)  # Find its eigenvalues
    >>> numpy.linalg.det(a)      # Find its determinant
    >>> numpy.linalg.inv(a)      # Find its inverse
    
    >>> b=numpy.random.rand(5)   # Create a random vector
    >>> numpy.linalg.solve(a,b)  # Solve the system of linear equations ax = b
    
  • And much, much, more.

  • Bottom line: If there’s linear algebra you need to do, NumPy/SciPy almost certainly already have built-in routines to do it. Google and the NumPy docs are your friends!
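
  • It is worth checking what these routines give you. A minimal sketch using a small fixed system instead of random numbers, so that the result is reproducible. The values in a and b are made up for illustration.

```python
import numpy

# A small fixed system: 2x + y = 3 and x + 3y = 5.
a = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([3.0, 5.0])

x = numpy.linalg.solve(a, b)                 # solve a x = b
print(numpy.allclose(numpy.dot(a, x), b))    # True

# Multiplying a by its inverse should give the identity matrix.
a_inv = numpy.linalg.inv(a)
print(numpy.allclose(numpy.dot(a, a_inv), numpy.eye(2)))   # True
```

  • numpy.allclose is used instead of == because floating point results are usually only approximately equal to the exact answer.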

Going further with NumPy

  • We’ve only just scratched the surface of what NumPy can do.
  • If you foresee yourself using NumPy frequently, read the assigned reading carefully.
  • Even that is just the bare-bones basics.
  • You’ll eventually want to dig into the NumPy docs. Don’t try to read those “cover-to-cover” though. The best way to use the docs is to look up stuff that you actually need and just read about that. Eventually you’ll get a feel for the capabilities of the package.

For next class