# Algorithms for Data Migration

This thesis (Algorithms for Data Migration) is concerned with the issue associated with data storage and management. A large storage server includes several 100s of disks. To balance the burden throughout hard disks, the machine computes data layouts which are usually altered in accordance with the amount of work. As workloads change with time, the system recomputes the data layout, and rearranges the data objects based on the latest layout. We determine the challenge of computing an effective data migration strategy that changes an initial structure to a target structure. We define the data migration difficulty as follows: for every item, there are a pair of hard disks which have the item (sources) and a group of hard drives that are looking to get the item (destinations). We strive to migrate the data items from the sources to destinations. The important restriction is that every hard drive can engage in just one transfer at any given time. The most typical goal is to reduce the makespan, which is the time when we complete all the migrations. The main problem is NP-hard, and…

Contents

1 Introduction
1.1 Data Migration Problem
1.2 Models and De nitions
1.3 Contributions
1.4 Outline of Thesis
2 Related Work
2.1 Previous Work in Data Migration
2.3 Minimizing Total Completion Time
2.4 Heterogenous Networks
2.5 Other Related Work

3 Data Migration with Cloning
3.1 Motivation
3.2 9.5-approximation algorithm
3.2.2 9.5-approximation Algorithm Variants
3.3 Heuristics
3.3.1 Edge Coloring Based Heuristic
3.3.2 Matching Based Heuristic
3.4 Bounded Bandwidth Model
4 The Correspondence Problem
4.1 Motivation
4.2 Problem De nition
4.3 Algorithms
4.3.1 Simple min max matching
4.3.2 Simple min sum matching
4.3.3 Complex min sum matching
5 Experimental Study
5.1 ExperimentalFramework
5.1.1 User Request Distributions
5.1.2 Parameters and Layout Creation
5.2 Results
5.2.1 Different correspondence methods
5.2.2 Different data migration algorithms
Minimizing Sum of Completion Times
6.1 Problem De nition
6.2 Algorithms for Total Vertex Completion Time
6.2.1 NP-hardness
6.2.2 Integer programming formulation
6.2.3 When edges have unit lengths
6.2.4 When edges have arbitrary integer lengths
6.3 Algorithm for Total Edge Completion Time