**NumPy, which stands for Numerical Python, is an open-source library that lets users store large amounts of data using less memory than Python lists and easily perform extensive operations (mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, etc.) on homogeneous one-dimensional and multidimensional arrays**.

The basic data structure of NumPy is the ndarray (N-dimensional array), which is similar to a Python list.

An array in NumPy is a grid of values, all of the same data type, that can be indexed and manipulated efficiently. A 2-D array, for example, is organized into rows and columns.

## Difference between NumPy and Python standard List

The three most important differences between NumPy arrays and standard Python sequences are:

Property | NumPy Array | Python Sequences (list, tuple, range) |
---|---|---|
Creation size | Fixed size | A Python list can grow dynamically |
Datatype | Elements are of the same datatype | Elements can be of multiple datatypes |
Speed | Fast, as it is partially written in C | Slower compared to NumPy |
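The first two rows of the table can be seen directly in a few lines. This is a small illustrative sketch: the values are arbitrary, and `np.append` is used only to show that growing an array produces a new array rather than resizing the old one.

```python
import numpy as np

# A Python list can mix types and grow in place with append().
py_list = [1, "two", 3.0]
py_list.append(4)
print(py_list)  # [1, 'two', 3.0, 4]

# np.array() converts mixed numeric input to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# An ndarray's size is fixed: np.append() returns a new, larger array
# and leaves the original untouched.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)
print(arr.size, bigger.size)  # 3 4
```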

## Why use NumPy: Computation time

A Python list can perform all the operations that NumPy arrays perform; however, NumPy arrays are faster and more convenient for large, complex computations.

Let's add two arrays of roughly 9 million elements each (`range(1, 9000000)` yields 8,999,999 values) and compare the computation time.

```
import time
import numpy as np

# Python standard lists
list_A = [i for i in range(1, 9000000)]
list_B = [j**2 for j in range(1, 9000000)]
t0 = time.perf_counter()
sum_list = list(map(lambda x, y: x + y, list_A, list_B))
t1 = time.perf_counter()
list_time = t1 - t0
print("Time taken by Python standard list is", list_time)

# NumPy arrays holding the same values (int64 avoids overflow when squaring)
array_A = np.arange(1, 9000000, dtype=np.int64)
array_B = np.arange(1, 9000000, dtype=np.int64) ** 2
t0 = time.perf_counter()
sum_numpy = array_A + array_B
t1 = time.perf_counter()
numpy_time = t1 - t0
print("Time taken by NumPy array is", numpy_time)
# Floor division (//), so the printed ratio is rounded down
print("The ratio of time taken is {}".format(list_time // numpy_time))
```

```
Time taken by Python standard list is 0.6801159381866455
Time taken by NumPy array is 0.04106783866882324
The ratio of time taken is 16.0
```

On this run, NumPy was about 16 times faster than the list; exact timings depend on your hardware and NumPy version. The table below compares the Python standard list and NumPy on different arithmetic operations.

Size of each array | Type of operation | Time taken by list | Time taken by NumPy | Ratio (list time / NumPy time) |
---|---|---|---|---|
9 million | Addition (+) | 0.56 s | 0.017 s | 32.0 |
9 million | Subtraction (-) | 0.61 s | 0.016 s | 36.0 |
9 million | Multiplication (*) | 0.69 s | 0.016 s | 42.0 |
9 million | Division (/) | 0.51 s | 0.022 s | 23.0 |

From the table above, NumPy was roughly 23 to 42 times faster than the Python standard list for these element-wise operations. With billions of data points and more complex operations, the time saved becomes even more significant.
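A single `time.time()` measurement can be noisy. A minimal sketch using the standard-library `timeit` module, which repeats each operation and keeps the best run, usually gives steadier numbers. The array size of 1 million and the repeat counts are arbitrary choices to keep the run short, and your timings will differ.

```python
import timeit
import numpy as np

N = 1_000_000  # smaller than 9 million to keep the benchmark quick
list_a = list(range(N))
list_b = list(range(N))
arr_a = np.arange(N)
arr_b = np.arange(N)

def add_lists():
    # Element-wise addition with a Python-level loop
    return [x + y for x, y in zip(list_a, list_b)]

def add_arrays():
    # Vectorized addition executed in NumPy's compiled code
    return arr_a + arr_b

# Best of 5 repeats, each running the operation 3 times, averaged per call
list_best = min(timeit.repeat(add_lists, number=3, repeat=5)) / 3
numpy_best = min(timeit.repeat(add_arrays, number=3, repeat=5)) / 3
print(f"list: {list_best:.4f}s  numpy: {numpy_best:.4f}s  "
      f"ratio: {list_best / numpy_best:.1f}x")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs are usually caused by other processes rather than by the code being measured.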

## Installing NumPy

To start working with NumPy, you need to install it. The instructions on the official NumPy website cover all common setups; with pip, it is a single command: `pip install numpy`.

[Optional]: Follow this guide to install Python if you don't already have it. Although not required, it's best to install Python packages inside a virtual environment to avoid version conflicts later.
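Once NumPy is installed, a quick way to confirm that Python can find it is to import it and print its version:

```python
import numpy as np

# If this runs without an ImportError, NumPy is installed in the active environment.
print(np.__version__)
```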

## Basics of NumPy

As a prerequisite, you will need beginner-level Python. See this Python tutorial to refresh your concepts.

An array created with `np.array()` is an object of the NumPy library's `ndarray` class.

Whenever you work with a dataset, the first step is to get an idea of its structure. Four important attributes of a NumPy array that describe the data are:

- .ndim: returns the number (int) of dimensions (axes) of the array.
- .shape: returns a tuple with the size of each dimension, e.g. (n, m) for an array with **n** rows and **m** columns.
- .size: returns the total number (int) of elements in the array.
- .dtype: returns a **numpy.dtype** object that describes the type of the elements in the array.

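The snippet below prints these four attributes, along with .itemsize (the size of each element in bytes) and .data (a memoryview of the buffer holding the elements).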

```
array = np.array([[1,2,3],[4,5,6]]) # Creating NumPy array from list
print("Dimension: ",array.ndim, type(array.ndim))
print("Shape: ",array.shape, type(array.shape))
print("Size: ",array.size, type(array.size))
print("Datatype: ",array.dtype, type(array.dtype))
print("Itemsize: ",array.itemsize, type(array.itemsize))
print("Data: ",array.data, type(array.data))
```

```
Dimension: 2 <class 'int'>
Shape: (2, 3) <class 'tuple'>
Size: 6 <class 'int'>
Datatype: int64 <class 'numpy.dtype[int64]'>
Itemsize: 8 <class 'int'>
Data: <memory at 0x7f2d807312b0> <class 'memoryview'>
```
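These attributes are related to each other. The sketch below checks two of those relationships and shows that reshaping (here with `reshape(-1)`, which flattens the array) changes `shape` and `ndim` while `size` stays the same; `.nbytes` is an additional attribute not covered above.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

# size is the product of the lengths in shape: 2 * 3 = 6
print(array.size == np.prod(array.shape))  # True

# nbytes = number of elements * bytes per element (itemsize)
print(array.nbytes == array.size * array.itemsize)  # True

# Flattening changes shape and ndim, but not size
flat = array.reshape(-1)
print(flat.shape, flat.ndim, flat.size)  # (6,) 1 6
```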

### Array Creation

A NumPy array is created by passing an array-like data structure, such as a Python list or tuple, to `np.array()`.

Let's create a **0-D**, **1-D**, **2-D**, and a **3-D** array from a list.

- 0-D array:
`np.array(11)`

- 1-D array:
`np.array([1, 2, 3, 4, 5])`

- 2-D array:
`np.array([[1, 2, 3], [4, 5, 6]])`

- 3-D array:
`np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])`

```
array_0D = np.array(11)
array_1D = np.array([1, 2, 3, 4, 5])
array_2D = np.array([[1, 2, 3], [4, 5, 6]])
array_3D = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(array_0D)
print(array_1D)
print(array_2D)
print(array_3D)
```

```
11
[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
```
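The number of nested brackets maps directly to `ndim`, and the length at each nesting level to `shape`. Printing both for the four arrays above makes this visible; the dictionary is just a convenient container for the loop.

```python
import numpy as np

arrays = {
    "0-D": np.array(11),
    "1-D": np.array([1, 2, 3, 4, 5]),
    "2-D": np.array([[1, 2, 3], [4, 5, 6]]),
    "3-D": np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]]),
}

for name, arr in arrays.items():
    # ndim counts the levels of nesting; shape gives the length along each axis
    print(name, arr.ndim, arr.shape)
# 0-D 0 ()
# 1-D 1 (5,)
# 2-D 2 (2, 3)
# 3-D 3 (2, 2, 3)
```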

Here are 7 common ways to create a NumPy array:

- .array([1,2,3]): Returns an array from a list.
- .array((1.1,2.2,3.3)): Returns an array from a tuple.
- .zeros((2,3)): Returns an array filled with zeros (2 rows, 3 columns).
- .ones((2,3)): Returns an array filled with ones (2 rows, 3 columns).
- .empty((2,4)): Returns an uninitialized array of the given shape (2 rows, 4 columns); its values are arbitrary.
- .arange(2,10,2): Returns evenly spaced values within a given range (here 2, 4, 6, 8). Similar to Python's range().
- .linspace(2,4,9): Returns 9 evenly spaced numbers from 2 to 4, inclusive.

```
array_list = np.array([1,2,3], dtype=int) # From List
array_tuple = np.array((1.1,2.2,3.3)) # From Tuple
array_zeroes = np.zeros((2,3)) # Array of zeroes: 2 rows and 3 columns
array_ones = np.ones((2,3)) # Array of ones: 2 rows and 3 columns
array_empty = np.empty((2,4)) # Uninitialized array (arbitrary values): 2 rows and 4 columns
array_arange = np.arange(2,10,2) # Similar to python range()
array_linspace = np.linspace(2,4,9) # Array of 9 numbers between 2 and 4
```

Besides **dtype=int**, `np.array()` accepts other parameters such as **copy**, **order**, **subok**, **ndmin**, and **like**. You can explore them in the `np.array` documentation.
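A short sketch of three of these parameters: `ndmin` pads the shape with leading 1s, `dtype` forces the element type, and `copy` (which defaults to `True`) makes the new array independent of the input. The variable names are illustrative.

```python
import numpy as np

# ndmin: pad the shape with leading 1s until it has at least this many dimensions
row = np.array([1, 2, 3], ndmin=2)
print(row.shape)  # (1, 3)

# dtype: force the element type (here ints become floats)
floats = np.array([1, 2, 3], dtype=float)
print(floats.dtype)  # float64

# copy defaults to True, so modifying the new array leaves the original alone
original = np.array([1, 2, 3])
copied = np.array(original)
copied[0] = 99
print(original)  # [1 2 3]
```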

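Let's practice some methods to create arrays.

Tip: Use `help()` to see a function's syntax when required. For example, `help(np.zeros)` prints the full docstring; the output below shows the end of its Examples section.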

```
help(np.zeros)
```

```
>>> np.zeros((2, 1))
array([[ 0.],
       [ 0.]])
>>> s = (2,2)
>>> np.zeros(s)
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]) # custom dtype
array([(0, 0), (0, 0)],
      dtype=[('x', '<i4'), ('y', '<i4')])
```

Create a **1D** array of ones.

```
arr = np.ones(9)
print(arr)
print(arr.dtype)
```

```
[1. 1. 1. 1. 1. 1. 1. 1. 1.]
float64
```

Notice that, by default, `np.ones()` creates elements of data type **float64**. Let's provide dtype explicitly.

```
arr = np.ones(9, dtype=int)
print(arr)
print(arr.dtype)
```

```
[1 1 1 1 1 1 1 1 1]
int64
```

Create a **4x3** array of **zeroes**.

```
arr = np.zeros((4,3), dtype=int)
print(arr)
```

```
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]
```

Create an array of **integers strictly between 3 and 7** (np.arange() excludes its stop value).

```
arr = np.arange(4,7)
print(arr)
```

```
[4 5 6]
```

Create an array of integers from **5 to 20 with a step of 2**

```
arr = np.arange(5,21,2)
print(arr)
```

```
[ 5  7  9 11 13 15 17 19]
```

Create an array of **10 random integers from 0 to 4** (np.random.randint(5) draws from 0 up to, but not including, 5).

```
arr = np.random.randint(5,size=10)
print(arr)
```

```
[3 2 2 0 4 0 1 3 2 0]
```

Create an array of **10 random integers strictly between 6 and 9**, that is, 7 or 8 (np.random.randint(7, 9) excludes its upper bound).

```
arr = np.random.randint(7,9,size=10)
print(arr)
```

```
[8 8 7 7 8 8 8 7 7 7]
```
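The NumPy documentation recommends the newer `Generator` API, created with `np.random.default_rng()`, for new code. A minimal sketch of the same kind of draw follows; the seed value 42 is an arbitrary choice that only makes the output reproducible. Note that `integers()` excludes the upper bound by default, like `randint()`, but can include it with `endpoint=True`.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducible output

# integers() excludes `high` by default, like np.random.randint()
a = rng.integers(7, 9, size=10)  # values are 7 or 8

# endpoint=True makes the upper bound inclusive
b = rng.integers(6, 9, size=10, endpoint=True)  # values from 6 to 9

print(a)
print(b)
```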

Create a **2x3** 2D array of random numbers.

```
arr = np.random.random([2,3])
print(arr)
```

```
[[0.9664729  0.33623868 0.52633769]
 [0.80454667 0.68146984 0.08063325]]
```

Create an array of **10 evenly spaced numbers from 1.5 to 2**.

```
arr = np.linspace(1.5,2,10)
print(arr)
```

```
[1.5        1.55555556 1.61111111 1.66666667 1.72222222 1.77777778
 1.83333333 1.88888889 1.94444444 2.        ]
```
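Because `linspace` includes both endpoints, 10 points split the interval into 9 gaps, so the spacing is (2 − 1.5) / 9 ≈ 0.0556. The `retstep=True` option returns that spacing alongside the array. For contrast, `arange` takes a step size instead of a count and leaves out the stop value.

```python
import numpy as np

# linspace(start, stop, num) includes both endpoints:
# step = (stop - start) / (num - 1) = (2 - 1.5) / 9
arr, step = np.linspace(1.5, 2, 10, retstep=True)
print(step)  # about 0.0556

# arange(start, stop, step) excludes the stop value: here 1.5 and 1.75
print(np.arange(1.5, 2, 0.25))
```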

That's all for the basic ways of creating arrays. You can also explore these 4 other ways to create arrays:

- .full(): Create an array of a given shape filled with a constant value "n"
- .tile(): Create a new array by repeating an existing array for a particular number of times
- .eye(): Create an identity matrix of any dimension
- .random.randint(): Create a random array of integers within a particular range
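A quick sketch of the four functions listed above; the shapes and values are arbitrary examples.

```python
import numpy as np

sevens = np.full((2, 3), 7)             # 2x3 array where every element is 7
tiled = np.tile(np.array([1, 2]), 3)    # repeat [1, 2] three times
identity = np.eye(3)                    # 3x3 identity matrix (float64)
dice = np.random.randint(1, 7, size=5)  # five integers from 1 to 6

print(sevens)
print(tiled)  # [1 2 1 2 1 2]
print(identity)
print(dice)
```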

### Basic Operations

NumPy can perform a variety of operations; the most basic include addition, subtraction, and multiplication. Below are a few basic operations that NumPy performs element-wise, without writing explicit loops.

**Create** a NumPy array to store the marks of 5 students.

```
marks = [1, 2, 3, 4, 5]
marks_np = np.array(marks)
print(marks_np)
```

```
[1 2 3 4 5]
```

**Add** marks of 5 subjects of two different students.

```
marks_A = [10,20,10,20,14]
marks_B = [23,12,43,12,43]
marks_np_A = np.array(marks_A)
marks_np_B = np.array(marks_B)
total = marks_np_A + marks_np_B # Add using + operator
print(total)
```

```
[33 32 53 32 57]
```

**Convert** the weights of 5 students from kilograms to grams.

```
weight = [45, 55, 53, 63, 60] # In KG
weight_np = np.array(weight)
weight_in_gram = weight_np * 1000 # 1kg = 1000gm
print(weight_in_gram)
```

```
[45000 55000 53000 63000 60000]
```
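Multiplying an array by the single number 1000 works because of NumPy's broadcasting: the scalar is applied to every element. The same rule lets arrays of compatible shapes combine, for example a (2, 3) array plus a (3,) array, where the smaller array is added to each row. The `grid` and `offsets` arrays below are illustrative values.

```python
import numpy as np

weight_np = np.array([45, 55, 53, 63, 60])
# The scalar 1000 is broadcast to every element; no loop is needed
print(weight_np * 1000)  # [45000 55000 53000 63000 60000]

# A (3,) array is broadcast across each row of a (2, 3) array
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
print(grid + offsets)
# [[11 22 33]
#  [14 25 36]]
```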

**Calculate** the BMI of 5 students. To calculate BMI, we need to:

- Create two arrays, one for height and one for weight
- Apply the formula `weight_in_kg / (height_in_m ** 2)`

```
heights_in_inch = [71,72,73,74,75]
weights_in_lbs = [195, 180, 250, 230, 200]
```

First, let's convert height from inches to meters and weight from pounds to kilograms.

```
height_in_m = np.array(heights_in_inch) * 0.0254
weight_in_kg = np.array(weights_in_lbs) * 0.453592
```

Now that the arrays are in the right units, let's calculate BMI.

```
BMI = weight_in_kg / (height_in_m ** 2)
print("BMI",BMI)
```

```
BMI [27.19667848 24.41211827 32.98315848 29.52992539 24.99800911]
```
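To see what the vectorized version saves, here is the same BMI calculation written once with arrays and once with an explicit Python loop. The sketch recomputes everything from the input lists so it runs on its own, and uses `np.allclose` to confirm that both approaches agree up to floating-point tolerance.

```python
import numpy as np

heights_in_inch = [71, 72, 73, 74, 75]
weights_in_lbs = [195, 180, 250, 230, 200]

# Vectorized version: one expression over whole arrays
height_in_m = np.array(heights_in_inch) * 0.0254
weight_in_kg = np.array(weights_in_lbs) * 0.453592
BMI = weight_in_kg / (height_in_m ** 2)

# Equivalent calculation with an explicit Python loop
bmi_loop = []
for h, w in zip(heights_in_inch, weights_in_lbs):
    bmi_loop.append((w * 0.453592) / (h * 0.0254) ** 2)

print(np.allclose(BMI, bmi_loop))  # True
```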

Here is a list of 5 common basic methods of a NumPy ndarray:

- .sum(): returns the sum of elements over a given axis.
- .min(): returns the minimum value along a given axis.
- .max(): returns the maximum value along a given axis.
- .cumsum(): returns the cumulative sum of elements along a given axis.
- .mean(): returns the average of elements along a given axis.

NumPy also provides universal functions (**ufuncs**) such as **sin**, **cos**, and **exp**, which operate element-wise on arrays.
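The `axis` argument decides which direction is collapsed: `axis=0` works down the columns, `axis=1` across the rows, and no axis reduces the whole array. The `marks` array below is hypothetical data (rows as students, columns as subjects) chosen to make the results easy to check.

```python
import numpy as np

# Hypothetical marks: each row is a student, each column a subject
marks = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(marks.sum())         # 21 (all elements)
print(marks.sum(axis=0))   # [5 7 9] (column totals, one per subject)
print(marks.min(axis=1))   # [1 4] (lowest mark per student)
print(marks.max())         # 6
print(marks.cumsum())      # [ 1  3  6 10 15 21] (running total of the flattened array)
print(marks.mean(axis=1))  # [2. 5.] (average per student)

# ufuncs apply element-wise and return a new array
angles = np.array([0.0, np.pi / 2])
print(np.sin(angles))                # sine of each angle
print(np.exp(np.array([0.0, 1.0])))  # e**0 and e**1
```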

### Indexing, Slicing, and Iterating

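Indexing and slicing work much like Python lists; negative indices count from the end of the array.

```
bmi_first_element = BMI[0]  # first element
bmi_second_element = BMI[1]  # second element
bmi_last_element = BMI[-1]  # last element
bmi_first_five_elements = BMI[0:5]  # first five elements (indices 0-4)
bmi_last_five_elements = BMI[-5:]  # last five elements
```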
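Iterating follows the same idea: a `for` loop over a 1-D array yields its elements, while a loop over a 2-D array yields its rows. For a single element of a 2-D array, use `[row, column]`, and `np.ndenumerate` walks every element along with its full index. The `grid` array is an illustrative example; the BMI values are copied from the output above.

```python
import numpy as np

BMI = np.array([27.19667848, 24.41211827, 32.98315848, 29.52992539, 24.99800911])

# Iterating a 1-D array yields its elements one by one
for i, value in enumerate(BMI):
    print(f"student {i}: {value:.1f}")

# Iterating a 2-D array yields its rows
grid = np.array([[1, 2, 3], [4, 5, 6]])
for row in grid:
    print(row)

# Index a single element of a 2-D array with [row, column]
print(grid[1, 2])  # 6

# np.ndenumerate yields (index_tuple, value) for every element
for index, value in np.ndenumerate(grid):
    print(index, value)
```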

Use a condition to filter the BMI array, keeping only the values where BMI > 23 (here, all five values qualify):

```
# Conditional Filter
BMI_filtered = BMI[BMI > 23]
print(BMI_filtered)
```

```
[27.19667848 24.41211827 32.98315848 29.52992539 24.99800911]
```
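Under the hood, `BMI > 23` produces a boolean array (a mask), and indexing with it keeps the positions marked `True`. The sketch below uses a threshold of 25 so the mask actually excludes some values, combines two conditions with `&` (each wrapped in parentheses), and uses `np.where` to label each value. The thresholds and labels are arbitrary illustrations, not medical categories.

```python
import numpy as np

BMI = np.array([27.19667848, 24.41211827, 32.98315848, 29.52992539, 24.99800911])

mask = BMI > 25   # element-wise comparison -> boolean array
print(mask)       # [ True False  True  True False]
print(BMI[mask])  # only the values where mask is True

# Combine conditions with & (and) or | (or); parentheses are required
in_range = BMI[(BMI >= 24) & (BMI < 30)]
print(in_range)

# np.where picks from two options element-wise
labels = np.where(BMI >= 25, "high", "normal")
print(labels)
```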

Now you know the basics to work with a NumPy array and you should be able to create arrays and perform operations on them.

You can also check out these tutorials on: