Reading and writing HDF5 files with Python h5py (repost)


Original article link; you should really read the original. (Saved here for personal reference only; please don't flame it if you dislike it.)
 
1. Creating HDF5 files
      We first load the  numpy  and  h5py  modules
      
import numpy as np
import h5py

Now mock up some simple dummy data to save to our file.
d1 = np.random.random(size = (1000,20))
d2 = np.random.random(size = (1000,200))
print(d1.shape, d2.shape)

Output: (1000, 20) (1000, 200)

The first step to creating a HDF5 file is to initialise it. It uses a very similar syntax to initialising a typical text file in numpy. The first argument provides the filename and location, the second the mode. We’re writing the file, so we provide a w for write access.
hf = h5py.File('data.h5', 'w')

This creates a file object,  hf , which has a bunch of associated methods. One is  create_dataset , which does what it says on the tin. Just provide a name for the dataset, and the numpy array.
hf.create_dataset('dataset_1', data=d1)
hf.create_dataset('dataset_2', data=d2)

All we need to do now is close the file, which will write all of our work to disk.
hf.close()
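As an aside (not in the original post), a common alternative is to open the file in a with block, which guarantees the file is closed and flushed to disk even if an error occurs partway through:

```python
import numpy as np
import h5py

d1 = np.random.random(size=(1000, 20))

# The with-block closes the file automatically when it exits,
# so there is no need to call hf.close() explicitly.
with h5py.File('data.h5', 'w') as hf:
    hf.create_dataset('dataset_1', data=d1)
```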

2.  Reading HDF5 files
     To open and read data we use the same  File  method in read mode, r.
      
hf = h5py.File('data.h5', 'r')

   To see what data is in this file, we can call the  keys()  method on the file object.
list(hf.keys())
['dataset_1', 'dataset_2']

 We can then grab each dataset we created above using the  get  method, specifying the name.
n1 = hf.get('dataset_1')
n1
<HDF5 dataset "dataset_1": shape (1000, 20), type "<f8">

This returns an HDF5 dataset object. To convert this to an array, just call numpy's array method.
n1 = np.array(n1)
n1.shape
(1000, 20)
hf.close()
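A dataset object also supports numpy-style slicing, so you can read just part of a large dataset without pulling the whole thing into memory. A small sketch (it recreates data.h5 first so it runs on its own):

```python
import numpy as np
import h5py

# Recreate the example file so this snippet is self-contained
with h5py.File('data.h5', 'w') as hf:
    hf.create_dataset('dataset_1', data=np.random.random(size=(1000, 20)))

with h5py.File('data.h5', 'r') as hf:
    ds = hf['dataset_1']      # dataset object; nothing is read yet
    first_rows = ds[:10]      # slicing reads only these rows from disk
    print(first_rows.shape)   # (10, 20)
```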

3. Groups
Groups are the basic container mechanism in an HDF5 file, allowing hierarchical organisation of the data. Groups are created similarly to datasets, and datasets are then added using the group object.
d1 = np.random.random(size = (100,33))
d2 = np.random.random(size = (100,333))
d3 = np.random.random(size = (100,3333))
hf = h5py.File('data.h5', 'w')
g1 = hf.create_group('group1')
g1.create_dataset('data1', data=d1)
g1.create_dataset('data2', data=d2)

We can also create subfolders. Just specify the group name as a directory-style path.
g2 = hf.create_group('group2/subfolder')
g2.create_dataset('data3', data=d3)

As before, to read data in directories and subdirectories, use the  get  method with the full subdirectory path.
group2 = hf.get('group2/subfolder')
list(group2.items())
[('data3', <HDF5 dataset "data3": shape (100, 3333), type "<f8">)]
group1 = hf.get('group1')
list(group1.items())
[('data1', <HDF5 dataset "data1": shape (100, 33), type "<f8">),
 ('data2', <HDF5 dataset "data2": shape (100, 333), type "<f8">)]
n1 = group1.get('data1')
np.array(n1).shape
(100, 33)
hf.close()
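If you don't know the layout of a file in advance, h5py's visititems method walks the whole hierarchy and calls a function for every group and dataset. A sketch (it rebuilds the grouped file from above first so it runs on its own):

```python
import numpy as np
import h5py

# Rebuild the grouped file from above so this snippet is self-contained
with h5py.File('data.h5', 'w') as hf:
    hf.create_group('group1').create_dataset(
        'data1', data=np.random.random(size=(100, 33)))
    hf.create_group('group2/subfolder').create_dataset(
        'data3', data=np.random.random(size=(100, 3333)))

def show(name, obj):
    # visititems passes the full path and the group/dataset object
    kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
    print(name, '->', kind)

with h5py.File('data.h5', 'r') as hf:
    hf.visititems(show)
```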

4.  Compression
To save on disk space, while sacrificing read speed, you can compress the data. Just add the  compression  argument, which can be either  gzip ,  lzf , or  szip .  gzip  is the most portable, as it's available with every HDF5 install;  lzf  is the fastest but doesn't compress as effectively as  gzip ; and  szip  is a NASA format that is patented up; if you don't know about it, chances are your organisation doesn't have the patent, so avoid it.
For  gzip  you can also specify the additional  compression_opts  argument, which sets the compression level. The default is 4, but it can be an integer between 0 and 9.
hf = h5py.File('data.h5', 'w')

hf.create_dataset('dataset_1', data=d1, compression="gzip", compression_opts=9)
hf.create_dataset('dataset_2', data=d2, compression="gzip", compression_opts=9)

hf.close()
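One way to confirm the settings took effect (a sketch; the variable names here are mine, not the original author's) is to read them back off the dataset object. Keep in mind that random floats barely compress, so the space savings on this dummy data will be modest; real data usually does better.

```python
import os
import numpy as np
import h5py

d2 = np.random.random(size=(1000, 200))

with h5py.File('data.h5', 'w') as hf:
    ds = hf.create_dataset('dataset_2', data=d2,
                           compression='gzip', compression_opts=9)
    # The compression settings are stored on the dataset itself
    print(ds.compression, ds.compression_opts)   # gzip 9

print(os.path.getsize('data.h5'), 'bytes on disk')
```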