 »   WP 1.3 Data archivisation system

Leader »
Jerzy M. Zaczek, PhD
ACK CYFRONET AGH, Kraków
Co-executors » TASK, WCNS
Start date » 1 Dec, 2002
Ending date » 31 Oct, 2004
Short task description

Storing, securing and providing access to the continuously growing volume of data generated by a distributed user community on nationwide computational resources are among the most important issues in the utilization of the SGI cluster. For the national cluster, a natural solution is to create a hierarchical data management meta-system (MHSM, Meta Hierarchical Storage Management) built on top of the local HSM systems that currently exist in Gdańsk, Kraków, Poznań and Wrocław and are planned for Łódź and Warszawa (IMGW). Work on the MHSM system will concentrate on the following issues:

  • Functional development of existing local HSM systems
  • Meta database (MDB) for locating distributed data and a common API for local HSM systems
  • Integration, testing and improvements

Functional development of the local HSM systems consists of creating two subsystems: one for fast access to very large files on tape, and one for estimating file access time.

The aim of the first sub-task is to design and implement a subsystem that provides fast access to very large files stored on magnetic tape by means of a file division strategy. To realize this strategy, a subsystem residing between the client application and the HSM system must be designed. While files are being written to tape, this subsystem will split them into pieces, transparently to the user. Information about file fragmentation will be stored in an index database. To keep the data transmission rate constant and shorten the transmission itself, prefetching will also be applied: the next subfile is fetched into the system cache while the previous one is being transmitted from the cache to the client application.

The aim of the second sub-task is to design and implement a subsystem that can report the access time to a file in a given HSM. The answer depends on many factors, such as HSM load, queue length, the number and throughput of the drives, and file size.
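The file division strategy and prefetching described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `split_file` and `read_with_prefetch`, the fixed chunk size, and the use of in-memory dictionaries in place of the tape store and the index database are all assumptions made for illustration only.

```python
import io
import queue
import threading
from typing import BinaryIO, Dict, Iterator, List


def split_file(name: str, source: BinaryIO, chunk_size: int,
               store: Dict[str, bytes], index: Dict[str, List[str]]) -> None:
    """Write `source` as fixed-size subfiles and record their order in `index`."""
    parts: List[str] = []
    while True:
        data = source.read(chunk_size)
        if not data:
            break
        key = f"{name}.part{len(parts):05d}"
        store[key] = data      # hypothetical stand-in for writing one subfile to tape
        parts.append(key)
    index[name] = parts        # hypothetical stand-in for the index database


def read_with_prefetch(name: str, store: Dict[str, bytes],
                       index: Dict[str, List[str]],
                       depth: int = 1) -> Iterator[bytes]:
    """Yield subfiles in order while a background thread stages the next ones."""
    cache: queue.Queue = queue.Queue(maxsize=depth)  # bounded "system cache"

    def stage() -> None:
        for key in index[name]:
            cache.put(store[key])  # blocks while the cache is full
        cache.put(None)            # end-of-file marker

    threading.Thread(target=stage, daemon=True).start()
    while True:
        chunk = cache.get()
        if chunk is None:
            return
        yield chunk                # "transmission" to the client application


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    index: Dict[str, List[str]] = {}
    payload = bytes(range(256)) * 10
    split_file("big.dat", io.BytesIO(payload), 1000, store, index)
    restored = b"".join(read_with_prefetch("big.dat", store, index))
    print(len(index["big.dat"]), restored == payload)
```

The bounded queue plays the role of the system cache: while the client consumes one subfile, the staging thread is already fetching the next, so the two steps overlap. The client sees a single stream of data and never deals with the subfiles directly.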