When parsing a CASTEP formatted grid with python, you may be tempted to use numpy.loadtxt – in particular, as it allows for fast access to the numerical data afterwards. However, you need to know the number of grid points in each dimension in order to work out the proper arguments for numpy.reshape. I my office, ridiculous ideas how to get this information from the comments of the file header emerged quite quickly. Here’s the associated timing along with the disappointing result that the simplest solution actually is the fastest. (Tests on Late-2013 Mac with a 2.4 GHz i5 and SSD)
import numpy as np import subprocess as sp import timeit inPath = '2x2x2_surf_0_b-velec.dat' def method_grep(): nLines = sp.check_output(['grep', '-A1', '-m', '2', 'grid size', inPath]).split('\n')[1].split() return nLines[1], nLines[3], nLines[5] def method_unique(): return tuple(np.unique(data[:, _]).shape[0] for _ in range(3)) def method_find(): Nz = data.shape[0]/len(np.where(data[:, 0] == data[0, 0])[0]) Ny = len(np.where(data[::Nz, 1]==data[0, 1])[0]) Nx = data.shape[0]/Ny/Nz return Nx, Ny, Nz def method_file(): nextLine = False with open(inPath) as fh: while nextLine is False: line = next(fh) if 'grid size' in line: nextLine = True line = next(fh) return map(int, line.split()[1::2]) data = np.loadtxt(inPath) print 'grep ', timeit.timeit(";method_grep()";, setup=";from __main__ import method_grep";, number=10)/10 print 'unique ', timeit.timeit(";method_unique()";, setup=";from __main__ import method_unique,data";, number=10)/10 print 'where ', timeit.timeit(";method_find()";, setup=";from __main__ import method_find,data";, number=10)/10 print 'file ', timeit.timeit(";method_file()";, setup=";from __main__ import method_file";, number=10)/10
Results (time per run in seconds):
grep 0.308823895454 unique 0.0853920936584 where 0.00581040382385 file 3.71932983398e-05