Ruby: Error with Gems 1.2.0 + Win32_Process + $KCODE + DRb

This one is really odd. I found the problem after moving some scripts that worked fine on one server to another server where RubyGems had recently been updated. The script normally spawns several processes that act as DRb servers, but on the new machine it failed silently. After some testing I found that, with everything else being equal (Ruby 1.8.6 patchlevel 111 and win32-process 0.5.3 on both machines), the script crashed with RubyGems 1.2.0 but not with 0.9.4.

The script loops, calling Process.fork with a code block that creates a new DRb server. The class served over DRb is loaded via require at the beginning of the script. Here is a reduced example for clarity:

puts "iniciando"
require 'rubygems'
require 'drb'
puts 'loading pingpong'
require 'pingpong'
puts 'loaded pingpong'
puts 'loading process '
require 'win32/process'
puts 'loaded process'

def start_server(port)
  uri="druby://0.0.0.0:#{port}"
  trap("INT"){puts("Interrupted"); DRb.thread.exit}
  DRb.start_service(uri,PingPong.new)
  puts("Listening #{uri}")
  DRb.thread.join
end 

puts "here #{ARGV.inspect}"
2.times{|port| puts("Sending #{port}");Process.fork{start_server(5850+port)}}
puts "out"

The PingPong class is defined in the following script. Note the $KCODE assignment, since it turned out to trigger the problem:

$KCODE = 'UTF-8'

class PingPong
  def ping
    "pong"
  end
end

In the current implementation of win32-process, Process.fork works by creating a new Windows process that runs Ruby on the same script, passing an argument that indicates which child is being created. This matters, as the documentation points out, because one would otherwise expect each child to start executing at the code block passed to fork. Instead, each child runs the whole script again from the top, including every require.

With RubyGems 1.2.0 the script failed like this (it worked perfectly with 0.9.4):

C:\>test_fork_simple.rb
iniciando
loading pingpong
loaded pingpong
loading process
loaded process
here []
Sending 0
Sending 1
out

After some hours of debugging I found that the problem arose when the $KCODE assignment was executed in the child processes. If the assignment is deleted, the processes are created correctly:

C:\>test_fork_simple.rb
iniciando
loading pingpong
loaded pingpong
loading process
loaded process
here []
Sending 0
Sending 1
out

iniciando
iniciando
loading pingpong
loading pingpong
loaded pingpong
loading process
loaded pingpong
loading process
loaded process
here ["child#0"]
Sending 0
loaded process
here ["child#1"]
Sending 0
Sending 1
Listening druby://0.0.0.0:5850
Listening druby://0.0.0.0:5851

In the real programs, however, that line can't be left out, since it is critical. Oddly enough, the workaround consisted of swapping the require statements for the served class and win32/process:

#Crashes silently:
require 'pingpong'
require 'win32/process'

#Works!
require 'win32/process'
require 'pingpong'

It's also very strange that, when the script fails, the child processes never seem to start at all (as you can see in the first output, the children never print the "iniciando" message). I'll post an update if I find out what's happening here.

Loading and Processing Big Matrices on Python/numpy for NNMF

I was loading a fairly big dataset (3446×14807 floats) into a Python/NumPy matrix to perform a Nonnegative Matrix Factorization (NNMF). First I tried generating the Python source code from another process, with the whole matrix inserted as a literal, something like:

w = matrix([
[0.0072992700729927,0.0072992700729927,0.0291970802919708,0.0145985401459854,
0.0072992700729927,0.0072992700729927,0.0072992700729927,0.0072992700729927, ... ,])

But the process crashed with a cryptic MemoryError while loading the matrix. Then I tried NumPy's loadtxt() function, but the same thing happened. I found this thread. I'm using both an XP laptop and a Win2003 server, with Python 2.5.1 and 2.5.2 respectively, both 32-bit. Available memory didn't seem to be the problem, since a matrix of the same size could be constructed with ones or zeros and there was plenty of RAM (the server has 8 GB). Rather, the failure seemed to happen while the matrix was being built from the parsed values. Building the array first and then filling in the values did the trick:


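from sys import argv
from numpy import zeros

# jobs and words are the row and column labels, defined elsewhere in the script.
# Allocate the full matrix up front, then fill it in cell by cell.
JW = zeros((len(jobs), len(words)), float)
f = open(argv[1] + '_JW.csv')
line = 0
for l in f:
    vals = l.split(',')
    for i in range(0, len(vals)):
        JW[line, i] = float(vals[i])
    line += 1
f.close()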
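
For a sense of scale, here is a small sketch in modern Python and NumPy (not the Python 2.5 setup used above). It first works out how much memory the dense matrix itself needs, assuming 8-byte float64 values, which comes to roughly 389 MiB. It then shows a compact variant of the same preallocate-then-fill idea that assigns one whole row at a time. The helper name load_csv_preallocated and the tiny demo file are made up for illustration. My guess, which I haven't verified, is that the crash comes from the temporary Python objects created while parsing rather than from the final array, so keeping only one parsed row alive at a time should behave like the loop above.

```python
import os
import tempfile

import numpy as np

# Size of the dense matrix from the post, assuming 8-byte float64 values.
ROWS, COLS = 3446, 14807
full_bytes = ROWS * COLS * np.dtype(np.float64).itemsize
print("dense float64 matrix: %.1f MiB" % (full_bytes / 2.0**20))


def load_csv_preallocated(path, n_rows, n_cols):
    """Fill a preallocated array one CSV row at a time (hypothetical helper)."""
    out = np.zeros((n_rows, n_cols), dtype=np.float64)
    with open(path) as f:
        for r, text in enumerate(f):
            # Only one row of parsed Python floats is alive at any moment.
            out[r, :] = [float(v) for v in text.split(",")]
    return out


# Tiny made-up file so the sketch runs on its own.
fd, demo_path = tempfile.mkstemp(suffix="_JW.csv")
with os.fdopen(fd, "w") as f:
    f.write("0.5,0.25,0.0\n1.0,2.0,3.0\n")

demo = load_csv_preallocated(demo_path, 2, 3)
os.remove(demo_path)
print(demo.shape)
```

The shape has to be known before reading, just as len(jobs) and len(words) are in the original loop. A row with the wrong number of values raises a ValueError instead of silently misaligning the data.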