
SDN-Aware Data Transfer for Scientific Applications

 

In the age of data-driven science, researchers face a significant hurdle: accessing and transferring geographically distributed data across multiple networks and domains. Data is often scattered across numerous sources, requiring:

  • Efficient data location and backend selection.

  • Reliable transfers from storage systems to consumers.

Traditional methods struggle with performance, reliability, and transparency, especially for large-scale scientific applications.

 
Our Vision

 

By combining Software-Defined Networking (SDN) with advanced data management services, we aim to transform how researchers access and move data. Our approach integrates scattered data and loosely coupled infrastructures, delivering seamless and optimized data experiences.

 
The Solution: SDN-Aware Data Access Service

 

Our SDN-Aware Data Access Service introduces a paradigm shift in data transfer technology:

  • Dynamic Data Source Selection: Automatically identify and switch to the best-performing source during transfers.

  • Optimized Network Pathfinding: Continuously select the best network paths without interrupting active transfers.

  • Protocol Independence: Enable flexible transfers across various protocols for maximum compatibility.

 
How It Works

 

We model the research infrastructure as a programmable network with:

  • A consumer requesting data from multiple sources.

  • Sources and switches, represented in a dynamic, weighted graph.

The goal? Optimize Quality of Service (QoS) by minimizing transfer time and additional delays caused by network load. Our optimization strategy considers:

  • Bandwidth and latency for determining the fastest paths.

  • Real-time traffic conditions to ensure adaptive and reliable transfers.
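One way to fold bandwidth, latency and current load into a single edge weight is to estimate how long a link needs to carry a given payload. The sketch below is illustrative only: `LinkState`, `link_weight` and the formula itself are assumptions made for this example, not the weight function used by the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    """Hypothetical snapshot of one link, as an SDN controller might report it."""
    bandwidth_mbps: float  # nominal capacity in megabits per second
    latency_ms: float      # one-way delay in milliseconds
    load: float            # fraction of capacity already in use, 0.0 <= load < 1.0


def link_weight(link: LinkState, payload_mb: float) -> float:
    """Estimated seconds to move payload_mb megabytes over this link."""
    if not 0.0 <= link.load < 1.0:
        raise ValueError("load must be in [0, 1)")
    available_mbps = link.bandwidth_mbps * (1.0 - link.load)
    transmit_s = payload_mb * 8.0 / available_mbps  # megabytes -> megabits
    return link.latency_ms / 1000.0 + transmit_s


idle = LinkState(bandwidth_mbps=1000.0, latency_ms=10.0, load=0.0)
busy = LinkState(bandwidth_mbps=1000.0, latency_ms=10.0, load=0.5)
print(link_weight(idle, 125.0), link_weight(busy, 125.0))
```

Summing such per-link estimates along a path is only an approximation, because end-to-end throughput is limited by the slowest link. An additive weight is still convenient, since it lets standard shortest-path algorithms rank candidate paths as traffic conditions change.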

 
Key Benefits

 

  • Performance Boost: Shorter transfer times and reduced delays with optimized routing.

  • Reliability: Resilient transfers that adapt to changing network conditions.

  • Ease of Use: Transparent integration of data sources for researchers.

 
Empowering Data-Driven Discoveries

 

Our SDN-driven approach unlocks the potential of high-speed research networks, enabling scientists to focus on breakthroughs rather than battling data transfer limitations. By combining cutting-edge technologies and intelligent data management, we’re paving the way for a new era of collaborative research.

Motivation

  • Improve performance and reliability of data transfers with programmable networks

  • Take advantage of the high-speed research networks

  • Offer automated tools capable of effectively moving data

  • By combining data management services and SDN, scattered data and loosely coupled infrastructure can be easily and transparently integrated to provide scientists with the data they need

 

Problem statement 

  • Most scientific applications in Research Infrastructures (RI) require access to geographically distributed data

  • Data is often scattered over a number of sources across a network that spans several sites and/or domains

  • Access to data is often provided by a data access service, which:

    1. Locates data sources

    2. Selects suitable backend

    3. Transfers data from storage backend to consumer

 

 

The main research problem we are addressing is how to improve existing data access services using SDN, and how to optimize the QoS of large data transfers between a consumer and a set of sources streaming data from a backend.

 
Infrastructure Model
 
  • We model the RI with a single consumer requesting data which can be transferred from multiple sources

  • We assume the RI uses SDN solutions, so the state of the network is available and controllable

  • The QoS optimization problem is represented as the Multiple Source Shortest Path (MSSP) problem

  • We want to discover an optimal path from a set of data sources to a destination

  • The infrastructure is modeled as a bidirectional weighted graph G(V, E)

    • V is the set of all vertices in the network

    • E is the set of all edges (links)

    • A single vertex c ∈ V represents the consumer

    • D = {d1, d2, ..., dn} ⊂ V is the set of data sources

    • S = {s1, s2, ..., sm} ⊂ V is the set of switches

    • S and D are disjoint sets



  • The weight function is optimized primarily for performance

  • It needs to include measures of bandwidth, latency and load
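The model above can be illustrated with a small multi-source shortest-path search. The sketch below uses Dijkstra's algorithm seeded with every data source at distance zero, which is one standard way to solve this kind of problem; it is not necessarily the algorithm used in this work. The function names, node labels and edge weights are all hypothetical, and weights are assumed to be non-negative.

```python
import heapq


def undirected(edges):
    """Build an adjacency map from (u, v, weight) triples for a bidirectional graph."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def best_source_path(graph, sources, consumer):
    """Return (cost, path) for the cheapest path from any source to the consumer.

    The path starts at the chosen source and ends at the consumer.
    Returns (inf, []) if no source can reach the consumer.
    """
    dist = {s: 0.0 for s in sources}   # every source starts at distance zero
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                   # stale heap entry
        if u == consumer:
            break                      # consumer settled, its distance is final
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if consumer not in dist:
        return float("inf"), []
    path = [consumer]
    while path[-1] in prev:            # walk back until we reach a source
        path.append(prev[path[-1]])
    path.reverse()
    return dist[consumer], path


# Hypothetical topology: two sources (d1, d2), two switches (s1, s2), one consumer (c).
g = undirected([("d1", "s1", 4), ("d2", "s2", 1), ("s2", "s1", 2),
                ("s1", "c", 1), ("s2", "c", 5)])
cost, path = best_source_path(g, ["d1", "d2"], "c")
print(cost, path)  # 4.0 ['d2', 's2', 's1', 'c']
```

Because all sources are seeded together, a single search selects both the source and the path. Re-running it whenever the controller reports new edge weights is one simple way to keep the choice current.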

 

 

SDN-Aware Data Access Service

The SDN-Aware Data Access Service aims to enable:

  • Flexible data transfer, independent of any specific protocol

  • Use of the best data source, with autonomous switching to another source if the current one is experiencing heavy load

  • Identification and selection of the best network path during a transfer, without requiring the transfer to restart
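Switching sources without restarting can be pictured as a chunked transfer that keeps its byte offset and re-evaluates the candidate sources before each chunk. The following is a minimal sketch under that assumption. The `transfer` function, the `cost` and `fetch` callables, and the simulated load values are hypothetical stand-ins, not the service's actual interface.

```python
def transfer(total_bytes, chunk_bytes, sources, cost, fetch):
    """Pull total_bytes in chunks, picking the cheapest source before each chunk.

    cost(source) -> float                    lower is better (e.g. current load)
    fetch(source, offset, length) -> bytes   protocol-specific read
    Returns (data, log), where log records which source served each offset.
    """
    received = bytearray()
    log = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        source = min(sources, key=cost)      # re-evaluated on every chunk
        chunk = fetch(source, offset, length)
        if not chunk:
            raise IOError(f"{source} returned no data at offset {offset}")
        received += chunk
        log.append((offset, source))
        offset += len(chunk)                 # progress is kept across switches
    return bytes(received), log


# Simulated replicas: both hold the same payload; serving a chunk raises a source's load.
payload = bytes(range(10))
load = {"d1": 0.1, "d2": 0.5}


def fake_fetch(source, offset, length):
    load[source] += 0.3
    return payload[offset:offset + length]


data, log = transfer(len(payload), 4, ["d1", "d2"], load.get, fake_fetch)
print(log)  # [(0, 'd1'), (4, 'd1'), (8, 'd2')]
```

In this run the third chunk comes from `d2` because `d1` has become the more heavily loaded source, and the received data is still complete. Keeping protocol details behind `fetch` is what would let the same loop work across different transfer protocols.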

 

More details about this work can be found in:


[1] S. Koulouzis, D. Vasyunin, R.S. Cushing, A.S.Z. Belloum, Cloud Data Storage Federation for Scientific Applications, In Proceedings of the Euro-Par 2013: Parallel Processing Workshops, Lecture Notes in Computer Science, Aachen, Germany, Aug 2013.

 

[2] S. Koulouzis, R. Cushing, D. Vasunin, A.S.Z. Belloum, M.T. Bubak, Cloud Federation for Sharing Scientific Data, 8th IEEE International Conference on eScience (eScience 2012), Chicago, Illinois, 8-12 October 2012. [poste]




