Numpy

In an analysis you usually end up with data points that need to be processed.
Numpy is a Python library that makes working with data points enormously easier.

Basics

import numpy as np

Numpy's fundamental data type: the array
It can be thought of as a more efficient list.
The idea behind Numpy: an array can be used much like a single number. Operations are then applied to every element.
This is best understood through a few examples:

# convert list to array
x = np.array([1, 2, 3, 4, 5])

2 * x

array([ 2,  4,  6,  8, 10])

x**2

array([ 1,  4,  9, 16, 25])

x**x

array([   1,    4,   27,  256, 3125])

np.cos(x)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219])
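The same element-by-element rule applies when two arrays of the same shape are combined: the operation pairs up the elements at the same position. A minimal sketch; the second array `w` is made up for this illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
w = np.array([10, 20, 30, 40, 50])  # hypothetical second array with the same shape

s = x + w   # element-wise sum: [11, 22, 33, 44, 55]
p = x * w   # element-wise product (not a dot product): [10, 40, 90, 160, 250]
print(s)
print(p)
```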

Relevant for large datasets: run time!

%%timeit
xs = [42] * 100000
xs2 = [x**2 for x in xs]

15.2 ms ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit 
x = np.full(100000, 42)
x2 = x**2

224 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
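`%%timeit` is an IPython/Jupyter magic and does not exist in a plain Python script. There, the standard-library module `timeit` can be used instead. A minimal sketch of the same comparison; the function names are made up for this example, and the measured times depend on the machine:

```python
import timeit
import numpy as np

def with_list():
    xs = [42] * 100000
    return [v**2 for v in xs]

def with_numpy():
    x = np.full(100000, 42)
    return x**2

# Both versions compute the same values ...
assert with_list() == with_numpy().tolist()

# ... but their run times differ (average over 20 runs each)
t_list = timeit.timeit(with_list, number=20) / 20
t_numpy = timeit.timeit(with_numpy, number=20) / 20
print(f"list:  {t_list * 1e3:.2f} ms per run")
print(f"numpy: {t_numpy * 1e3:.2f} ms per run")
```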

Caution: you need numpy's cos here!

import math
math.cos(x)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-87cdf88e21ce> in <module>
      1 import math
----> 2 math.cos(x)

TypeError: only size-1 arrays can be converted to Python scalars
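`math.cos` only accepts a single number. To use it anyway, you would have to apply it to each element yourself; that gives the same values as `np.cos`, but without the speed advantage:

```python
import math
import numpy as np

x = np.array([1, 2, 3, 4, 5])

print(math.cos(x[0]))                      # works: x[0] is a single number
loop = np.array([math.cos(v) for v in x])  # element by element, slow for large arrays
vec = np.cos(x)                            # numpy applies cos to all elements at once

print(np.allclose(loop, vec))              # True
```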

Functions you write yourself for a single number often work with arrays without any changes!

def poly(y):
    return y + 2 * y**2 - y**3

poly(x)

array([  2,   2,  -6, -28, -70])

poly(np.pi)

-8.125475224531307

# this also works:
def poly(x):
    return x + 2 * x**2 - x**3

poly(x)

array([  2,   2,  -6, -28, -70])

Among other things, this makes it very easy to apply physical formulas to your data points.
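For example, the distance fallen in free fall from rest, s = g t²/2, can be evaluated for a whole series of times at once. A minimal sketch; the times and the value of g are assumptions chosen for illustration:

```python
import numpy as np

g = 9.81                                  # gravitational acceleration in m/s^2 (assumed value)
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # hypothetical measurement times in s

def fall_distance(t):
    """Distance fallen after time t, starting from rest."""
    return 0.5 * g * t**2

s = fall_distance(t)   # works for a whole array ...
print(s)
print(fall_distance(1.0))  # ... and for a single number
```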

Arrays can have any number of dimensions:

# two-dimensional array
y = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

y + y

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

This makes it possible, for example, to store an entire table as a single array.

Useful attributes

a = np.array([1.5, 3.0, 4.2])
b = np.array([[1, 2], [3, 4]])

print(a.ndim, a.shape, a.size, a.dtype)
print(b.ndim, b.shape, b.size, b.dtype)

1 (3,) 3 float64
2 (2, 2) 4 int64
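`ndim` is the number of axes, `shape` holds the length of each axis, and `size` is the total number of elements, i.e. the product of the entries of `shape`. A quick check of these relationships, including a three-dimensional array:

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])
c = np.zeros((2, 3, 4))   # three-dimensional array

for arr in (b, c):
    # ndim equals the number of entries in shape,
    # size equals the product of the entries in shape
    print(arr.ndim == len(arr.shape), arr.size == np.prod(arr.shape))
```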

Creating arrays

There are many useful functions that help with creating arrays:

np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.ones((5, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

np.linspace(0, 1, 11)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

# like range() for arrays:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.logspace(-4, 5, 10)

array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03,
       1.e+04, 1.e+05])
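Note the difference at the end point: like `range()`, `np.arange` excludes the stop value, whereas `np.linspace` includes it by default and takes the number of points instead of a step size. With a non-integer step, `np.linspace` is usually the safer choice, because with `np.arange` floating-point rounding can affect the number of elements:

```python
import numpy as np

a = np.arange(0, 1, 0.1)    # step 0.1, stop value 1 is NOT included
l = np.linspace(0, 1, 11)   # 11 points, end point 1 IS included

print(len(a), a[-1])        # 10 elements, the last one is approximately 0.9
print(len(l), l[-1])        # 11 elements, the last one is exactly 1.0
```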

Numpy indexing

Numpy makes it very convenient to select specific elements from an array:

x = np.arange(0, 10)

# like lists:
x[4]

4

# all elements with indices ≥1 and <4:
x[1:4]

array([1, 2, 3])

# negative indices count from the end
x[-1], x[-2]

(9, 8)

# combination:
x[3:-2]

array([3, 4, 5, 6, 7])

# step size
x[::2]

array([0, 2, 4, 6, 8])

# trick for reversal: negative step
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

[Figure: indexing a one-dimensional array]

y = np.array([x, x + 10, x + 20, x + 30])
y

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

# comma between indices
y[3, 2:-1]

array([32, 33, 34, 35, 36, 37, 38])

# only one index ⇒ one-dimensional array
y[2]

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

# other axis (a bare : selects the entire axis)
y[:, 3]

array([ 3, 13, 23, 33])

# inspecting the number of elements per axis:
y.shape

(4, 10)

[Figure: indexing a two-dimensional array]

Selected elements can also be assigned a value directly:

y

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

y[:, 3] = 0
y

array([[ 0,  1,  2,  0,  4,  5,  6,  7,  8,  9],
       [10, 11, 12,  0, 14, 15, 16, 17, 18, 19],
       [20, 21, 22,  0, 24, 25, 26, 27, 28, 29],
       [30, 31, 32,  0, 34, 35, 36, 37, 38, 39]])

Indexing can even be used on the left-hand and the right-hand side at the same time:

y[:, 0] = x[3:7]
y

array([[ 3,  1,  2,  0,  4,  5,  6,  7,  8,  9],
       [ 4, 11, 12,  0, 14, 15, 16, 17, 18, 19],
       [ 5, 21, 22,  0, 24, 25, 26, 27, 28, 29],
       [ 6, 31, 32,  0, 34, 35, 36, 37, 38, 39]])
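A pitfall when assigning to selected elements: basic slicing (with `:` or `start:stop:step`) does not copy the data but returns a view of the same memory. Changing the view therefore also changes the original array. If an independent copy is needed, use `.copy()`:

```python
import numpy as np

x = np.arange(0, 10)
view = x[2:5]          # slice: a view onto the same data as x
view[0] = 100          # this also changes x[2]
print(x[2])            # 100

x = np.arange(0, 10)
copy = x[2:5].copy()   # independent copy of the data
copy[0] = 100          # x stays unchanged
print(x[2])            # 2
```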

Transposing an array reverses the order of its indices:

y

array([[ 3,  1,  2,  0,  4,  5,  6,  7,  8,  9],
       [ 4, 11, 12,  0, 14, 15, 16, 17, 18, 19],
       [ 5, 21, 22,  0, 24, 25, 26, 27, 28, 29],
       [ 6, 31, 32,  0, 34, 35, 36, 37, 38, 39]])

y.shape

(4, 10)

y.T

array([[ 3,  4,  5,  6],
       [ 1, 11, 21, 31],
       [ 2, 12, 22, 32],
       [ 0,  0,  0,  0],
       [ 4, 14, 24, 34],
       [ 5, 15, 25, 35],
       [ 6, 16, 26, 36],
       [ 7, 17, 27, 37],
       [ 8, 18, 28, 38],
       [ 9, 19, 29, 39]])

y.T.shape

(10, 4)
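Element by element, transposing means that `y.T[i, j]` is the same element as `y[j, i]`. For arrays with more than two dimensions, `.T` reverses the entire shape. A short check, using `reshape` to build a 4×10 array with different values than the one above:

```python
import numpy as np

y = np.arange(40).reshape(4, 10)   # 4x10 array with the values 0..39

print(y.T[7, 2] == y[2, 7])        # True: the indices are swapped
print(np.array_equal(y.T.T, y))    # True: transposing twice gives the original

z = np.zeros((2, 3, 5))
print(z.T.shape)                   # (5, 3, 2)
```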

Masks

Often you want to select the elements that satisfy a certain condition.

To do this, you first create a mask (an array of True/False values).

The mask can then be passed inside square brackets.

a = np.linspace(0, 2, 11)
b = a**2

print(a >= 1)

print(a[a >= 1])

[False False False False False  True  True  True  True  True  True]
[1.  1.2 1.4 1.6 1.8 2. ]
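A mask computed from one array can also select from another array of the same shape, for example the values of `b = a**2` where `a >= 1`. Several conditions are combined element-wise with `&` (and) and `|` (or); each condition needs its own parentheses, because `&` and `|` bind more strongly than comparison operators. Summing a mask counts its True entries:

```python
import numpy as np

a = np.linspace(0, 2, 11)
b = a**2

print(b[a >= 1])                 # squares of those a values that are >= 1

mask = (a >= 0.5) & (a <= 1.5)   # the parentheses are required
print(a[mask])                   # values between 0.5 and 1.5
print(np.sum(mask))              # number of selected elements: 5
```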

Reducing arrays

Many operations reduce an array to a single value:

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.sum(x)

45

np.prod(x)

0

np.mean(x)

4.5

Standard deviation:

np.std(x)

2.8722813232690143

Standard error of the mean (this can also be done more simply):

np.std(x, ddof=1) / np.sqrt(len(x))

0.9574271077563381

Sample estimate of the standard deviation (ddof=1):

np.std(x, ddof=1)

3.0276503540974917
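Written out, `np.std(x)` divides the sum of squared deviations from the mean by N, while `ddof=1` ("delta degrees of freedom") divides by N - 1. The standard error of the mean is this sample estimate divided by √N. A check by hand:

```python
import numpy as np

x = np.arange(0, 10)
N = len(x)
squared_dev = (x - np.mean(x))**2

std_pop = np.sqrt(np.sum(squared_dev) / N)           # same as np.std(x)
std_sample = np.sqrt(np.sum(squared_dev) / (N - 1))  # same as np.std(x, ddof=1)
sem = std_sample / np.sqrt(N)                        # standard error of the mean

print(np.isclose(std_pop, np.std(x)))                # True
print(np.isclose(std_sample, np.std(x, ddof=1)))     # True
```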

Differences between neighboring elements:

z = x**2
np.diff(z)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17])
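`np.diff(z)` computes `z[1:] - z[:-1]`, so the result has one element fewer than the input. The same result can be obtained with the slicing from the indexing section:

```python
import numpy as np

x = np.arange(0, 10)
z = x**2

d = np.diff(z)
print(np.array_equal(d, z[1:] - z[:-1]))   # True
print(len(z), len(d))                      # 10 9
```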

Input / Output

Reading from text files: genfromtxt

It returns the contents of a text file as an array.

Its counterpart for writing is savetxt.

n = np.arange(11)
x = np.linspace(0, 1, 11)

np.savetxt('test.txt', [n, x])

# see exercise 1-python/6-readwrite

with open('test.txt', 'r') as f:
    print(f.read())

0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00 1.000000000000000000e+01
0.000000000000000000e+00 1.000000000000000056e-01 2.000000000000000111e-01 3.000000000000000444e-01 4.000000000000000222e-01 5.000000000000000000e-01 6.000000000000000888e-01 7.000000000000000666e-01 8.000000000000000444e-01 9.000000000000000222e-01 1.000000000000000000e+00

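data = np.array([n, x])  # n and x as rows, as in the file above

Usually, however, each quantity should be stored in its own column, which is what np.column_stack does:

np.savetxt('test.txt', np.column_stack([n, x]))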

with open('test.txt', 'r') as f:
    print(f.read())

0.000000000000000000e+00 0.000000000000000000e+00
1.000000000000000000e+00 1.000000000000000056e-01
2.000000000000000000e+00 2.000000000000000111e-01
3.000000000000000000e+00 3.000000000000000444e-01
4.000000000000000000e+00 4.000000000000000222e-01
5.000000000000000000e+00 5.000000000000000000e-01
6.000000000000000000e+00 6.000000000000000888e-01
7.000000000000000000e+00 7.000000000000000666e-01
8.000000000000000000e+00 8.000000000000000444e-01
9.000000000000000000e+00 9.000000000000000222e-01
1.000000000000000000e+01 1.000000000000000000e+00
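`np.column_stack` places the 1D arrays side by side as columns, so the result has shape (11, 2). For two 1D arrays of equal length this is the same as transposing `np.array([n, x])`, which has shape (2, 11):

```python
import numpy as np

n = np.arange(11)
x = np.linspace(0, 1, 11)

rows = np.array([n, x])         # shape (2, 11): one row per quantity
cols = np.column_stack([n, x])  # shape (11, 2): one column per quantity

print(rows.shape, cols.shape)
print(np.array_equal(cols, rows.T))   # True
```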

However, you should always document what you are saving:

n = np.arange(11)
x = np.linspace(0, 1, 11)

# header writes a comment line (prefixed with '# ') at the top of the file
np.savetxt('test.txt', np.column_stack([n, x]), header="n x")
with open('test.txt', 'r') as f:
    print(f.read())

# n x
0.000000000000000000e+00 0.000000000000000000e+00
1.000000000000000000e+00 1.000000000000000056e-01
2.000000000000000000e+00 2.000000000000000111e-01
3.000000000000000000e+00 3.000000000000000444e-01
4.000000000000000000e+00 4.000000000000000222e-01
5.000000000000000000e+00 5.000000000000000000e-01
6.000000000000000000e+00 6.000000000000000888e-01
7.000000000000000000e+00 7.000000000000000666e-01
8.000000000000000000e+00 8.000000000000000444e-01
9.000000000000000000e+00 9.000000000000000222e-01
1.000000000000000000e+01 1.000000000000000000e+00

Reading the values back with genfromtxt:

a, b = np.genfromtxt('test.txt', unpack=True)
a, b

(array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))
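A self-contained sketch of the whole round trip, writing to a temporary directory instead of the working directory (the file name `data.txt` is arbitrary). `genfromtxt` treats lines starting with `#` as comments, so the header is skipped automatically, and `unpack=True` returns the columns as separate arrays:

```python
import os
import tempfile
import numpy as np

n = np.arange(11)
x = np.linspace(0, 1, 11)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')     # arbitrary file name
    np.savetxt(path, np.column_stack([n, x]), header='n x')
    a, b = np.genfromtxt(path, unpack=True)  # the header line is skipped as a comment

print(np.allclose(a, n), np.allclose(b, x))  # True True
print(a.dtype)                               # float64: the integers come back as floats
```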

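To preserve the data types, specify the format of each column with fmt when writing; when reading, genfromtxt with dtype=None then infers the type of each column: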

np.savetxt(
    'test.txt',
    np.column_stack([n, x]),
    fmt=['%d', '%.4f'],       # first column as integer, second as float with 4 decimal places
    delimiter=',',
    header='n,x',
)

data = np.genfromtxt(
    'test.txt',
    dtype=None,    # infer the data type of each column
    delimiter=',',
    names=True,    # read the column names from the header line
)

data is a structured array: a special array that behaves similarly to a dict, with the columns accessible by name:

data

array([( 0, 0. ), ( 1, 0.1), ( 2, 0.2), ( 3, 0.3), ( 4, 0.4), ( 5, 0.5),
       ( 6, 0.6), ( 7, 0.7), ( 8, 0.8), ( 9, 0.9), (10, 1. )],
      dtype=[('n', '<i8'), ('x', '<f8')])

data['n'], data.shape, data.dtype

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
 (11,),
 dtype([('n', '<i8'), ('x', '<f8')]))
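The field names read from the header are available via `data.dtype.names`, and each column behaves like an ordinary array. A self-contained sketch of the typed round trip, again using a temporary file (the file name `data.csv` is arbitrary):

```python
import os
import tempfile
import numpy as np

n = np.arange(11)
x = np.linspace(0, 1, 11)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')  # arbitrary file name
    np.savetxt(path, np.column_stack([n, x]), fmt=['%d', '%.4f'],
               delimiter=',', header='n,x')
    data = np.genfromtxt(path, dtype=None, delimiter=',', names=True)

print(data.dtype.names)       # ('n', 'x')
print(data['n'].dtype.kind)   # 'i': column n was read as integers
print(np.mean(data['x']))     # columns can be used like ordinary arrays
```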