ivanidris.net

Portfolio analysis with Pandas for the win

I saw a PyCon presentation about Pandas. Pandas is a Python data analysis library that works with time series data and handles missing data automatically. It is based on NumPy and should work well together with, for instance, scikits.statsmodels.

Pandas correlation

A Pandas DataFrame is a matrix-like and dictionary-like data structure. In fact, it is the central data structure in Pandas, and you can apply all kinds of time series operations to it. It is quite common to have a look at the correlation matrix of a portfolio. So I did that for a number of CSV files containing end-of-day price data, although one can argue that it is a bit pointless. First, I created a DataFrame with Pandas for each symbol's daily returns. Then I joined these on the date. At the end, the correlation matrix was printed and the plot shown.

...
for symbol in symbols:
    dates,close = loadtxt(fileDir + '/' + symbol + '.csv', delimiter=',', usecols=(1,6), ...)
    last = len( close ) - 1

    data = { symbol : diff( close ) / close[ : last ] }
    newdates = dates[ : last ]
    dates = Index([datetime.fromordinal(int(d)) for d in newdates])
    df = DataFrame(data, index=dates)
    if len( all ) == 0:
        all = df
    else:
        all = all.join( df )
print all.corr()
all.plot()
legend(symbols)
show()
...
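The same steps (daily returns per symbol, join on the date, correlation matrix) can be sketched in a self-contained form. This is only an illustration under assumptions: the symbols AAA and BBB, their prices, and the start date are hypothetical stand-ins for the CSV files, the helper name daily_returns is mine, and the plotting calls are left out.

```python
import numpy as np
import pandas as pd


def daily_returns(close):
    """Simple returns (p[t+1] - p[t]) / p[t], i.e. diff(close) / close[:-1]."""
    close = np.asarray(close, dtype=float)
    return np.diff(close) / close[:-1]


# Hypothetical closing prices standing in for the per-symbol CSV files.
prices = {
    "AAA": [100.0, 102.0, 101.0, 105.0],
    "BBB": [50.0, 49.0, 50.0, 49.5],
}
dates = pd.bdate_range("2012-01-02", periods=4)  # assumed business days

frames = []
for symbol, close in prices.items():
    # As in the listing above, the last date is dropped so that the
    # index has the same length as the returns array.
    frames.append(pd.DataFrame({symbol: daily_returns(close)}, index=dates[:-1]))

# Join the per-symbol frames on the date index, one column per symbol.
returns = frames[0]
for df in frames[1:]:
    returns = returns.join(df, how="outer")

corr = returns.corr()
print(corr)
```

An outer join keeps dates that are present for only one symbol; corr() then uses pairwise complete observations, so a missing day for one symbol does not discard that day for the others.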

Check out NumPy Beginner's Guide for more information about NumPy.

