Loading and Processing Big Matrices in Python/NumPy for NNMF

I was loading a fairly big dataset (3446×14807 floats) into a Python/NumPy matrix to perform a Nonnegative Matrix Factorization (NNMF). First I tried generating the Python source code from another process and embedding the whole matrix literally in it, something like:

w = matrix([
[0.0072992700729927,0.0072992700729927,0.0291970802919708,0.0145985401459854,
0.0072992700729927,0.0072992700729927,0.0072992700729927,0.0072992700729927, ... ,])

But the process crashed with a cryptic MemoryError while loading the matrix. Then I tried NumPy's loadtxt() method, but the same thing happened.

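The loadtxt() attempt was essentially a one-liner of this form (the file name here is just illustrative, not the actual one):

from numpy import loadtxt

# Read the whole comma-separated matrix in a single call;
# this is the kind of call that also died with a MemoryError.
JW = loadtxt('jobs_JW.csv', delimiter=',')
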
I found this thread. I'm using both an XP laptop and a Win2003 Server with Python 2.5.1 and 2.5.2 respectively, both 32-bit. That didn't seem to be the problem, since a matrix of that size could be constructed with ones or zeros and enough memory was available (the server has 8 GB of RAM). It rather looked like the problem was in how the matrix was being built, presumably in the intermediate Python structures created before the array itself. So I found that building the array first and then filling in the values did the trick:


from sys import argv
from numpy import zeros

# jobs and words are defined earlier; pre-allocate the full
# matrix and then fill it row by row from the CSV file.
JW = zeros((len(jobs), len(words)), float)
f = open(argv[1] + '_JW.csv')
line = 0
for l in f:
    vals = l.split(',')
    for i in range(0, len(vals)):
        JW[line, i] = float(vals[i])
    line += 1
f.close()
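
A slightly more compact variant of the same pre-allocate-then-fill idea (just a sketch, relying only on standard NumPy row-slice assignment) fills each row in one go:

JW = zeros((len(jobs), len(words)), float)
f = open(argv[1] + '_JW.csv')
for line, l in enumerate(f):
    # assign the whole row at once instead of looping over columns
    JW[line, :] = [float(v) for v in l.split(',')]
f.close()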