Special Report: Network Provisioning

To place current network limitations in perspective, consider a 4-terabyte data transfer from North Carolina State University to ORNL, a typical daily output of a high-end supercomputer. Initial measurements showed a bandwidth of 10 Mbps between the end nodes, which would require about 40 days for the transfer, and even that assumes TCP is optimized and runs continuously with very low losses. The infrastructure has since been upgraded: the end hosts are equipped with Gigabit Ethernet (GigE) cards and connected through Internet2 and ESnet over Gbps links. In reality, today's TCP implementations deliver only about 400 Mbps to the user over wide-area Gbps connections, even under good conditions. This scheme therefore has a best-case transfer time of roughly a day, assuming suitable uncongested bandwidth exists. If the end nodes were upgraded to 10 GigE cards and the links to OC192, the best possible transfer time would be about one hour, but only if the entire bandwidth were dedicated to this transfer and the hosts were equipped with the required transport and middleware modules. Hence, to support such transfers we require (a) an infrastructure that provides dedicated channels of 10 Gbps or higher, and (b) network technologies that can deliver the link bandwidth at the application level.
In general, the underlying infrastructure must be upgraded to meet these requirements. Simply upgrading the network links is not sufficient, however, since current off-the-shelf transport protocols cannot achieve throughputs that match link rates. To highlight the issues, consider the older example in Figure 3, which shows the TCP throughput of a large data transfer over an OC12 link between ORNL and NERSC. The link rate is about 620 Mbps, yet TCP achieved only 20 Mbps after 50 seconds: initial losses prematurely terminated slow-start, and TCP subsequently spent most of its time recovering under its Additive Increase Multiplicative Decrease (AIMD) scheme.
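The transfer times quoted above follow from simple arithmetic. The sketch below reproduces them under our own assumptions: 1 TB is taken as 10^12 bytes, each rate is treated as sustained goodput, and protocol overhead is ignored. The helper names are hypothetical.

```python
# Back-of-the-envelope transfer times for the 4 TB example.
# Assumptions (ours, not the report's): 1 TB = 10**12 bytes, rates are
# sustained goodput in bits per second, and protocol overhead is ignored.

DATA_BYTES = 4 * 10**12  # 4 terabytes (decimal units)

RATES_BPS = {
    "10 Mbps (initial measurement)": 10e6,
    "400 Mbps (TCP over a Gbps path)": 400e6,
    "10 Gbps (10 GigE / OC192, dedicated)": 10e9,
}


def transfer_seconds(data_bytes: float, rate_bps: float) -> float:
    """Time to move data_bytes at a constant rate_bps, ignoring overhead."""
    return data_bytes * 8 / rate_bps


def format_duration(seconds: float) -> str:
    """Express a duration in days, hours, or minutes, whichever reads best."""
    if seconds >= 86400:
        return f"{seconds / 86400:.1f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 60:.1f} minutes"


if __name__ == "__main__":
    for label, rate in RATES_BPS.items():
        print(f"{label:40s} {format_duration(transfer_seconds(DATA_BYTES, rate))}")
```

Running it prints 37.0 days, 22.2 hours, and 53.3 minutes, consistent with the "about 40 days", "roughly a day", and "about one hour" figures in the text.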
While this limitation can be rectified fairly easily by employing parallel TCP streams or adapting the TCP dynamics, it is archetypal of the issues that require close attention when the infrastructure is upgraded. In particular, transport methods may not scale with link rates, and it is very important to optimize them for the specific infrastructure.
Figure 3. Premature termination of TCP slow-start severely limits the achievable throughput.
Collaborative visualization of dynamic objects does not need extraordinary amounts of bandwidth (30-50 Mbps is often adequate), but it does impose a different type of dynamic constraint on throughput. An interactive visualization stream cannot wait through a normal TCP slow-start bandwidth ramp-up; it should be able to start and stop in response to interactive requests within tens or at most hundreds of milliseconds, irrespective of congestion levels. In contrast, as Figure 3 demonstrates, TCP can require tens or hundreds of seconds to reach full speed. An interactive visualization that responded to fast-forward, rewind, jog, and play requests with this sort of latency would be unacceptable.
As another example, consider the remote control of an instrument, such as an electron microscope or a neutron goniometer. Although it may seem obvious, remote instrument control requires a stable control loop, which in turn requires tight control of packet-arrival-time jitter, something TCP is famously unable to provide. To ensure smooth control of end devices, computations, or experiments, it is important to send control messages quickly and without jitter. Jitter introduces high-frequency components into the control signals that destabilize the control loops; as a result, the controlled objects (including devices, instruments, visualizations, and computations) may be damaged or driven into undesired regions.
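One widely used way to quantify the packet-arrival-time jitter discussed above is the interarrival-jitter estimator defined for RTP in RFC 3550 (section 6.4.1). It is swapped in here purely for illustration and is not necessarily how the measurements in this report were made. For consecutive messages, D is the change in one-way transit time, and the running estimate is updated as J += (|D| - J)/16. The function name and the synthetic timestamps are hypothetical; note that a constant clock offset between sender and receiver cancels out of D, but clock drift does not.

```python
from typing import Sequence


def interarrival_jitter(send_times: Sequence[float], recv_times: Sequence[float]) -> float:
    """Smoothed interarrival jitter in the style of RFC 3550, section 6.4.1.

    send_times[i] and recv_times[i] are the send and arrival times (seconds)
    of message i. Hypothetical helper for illustration only.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have the same length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D(i-1, i): change in one-way transit time between consecutive messages.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    # Synthetic data: one control message every 100 ms.
    send = [0.1 * i for i in range(20)]
    # Constant 50 ms delay: what a control loop would like to see.
    steady = [s + 0.050 for s in send]
    # Delay alternating between 50 ms and 250 ms (a hypothetical erratic path).
    erratic = [s + (0.050 if i % 2 == 0 else 0.250) for i, s in enumerate(send)]
    print(f"steady:  {interarrival_jitter(send, steady) * 1000:.3f} ms")
    print(f"erratic: {interarrival_jitter(send, erratic) * 1000:.3f} ms")
```

The steady stream yields essentially zero jitter, while the erratic one converges toward the 200 ms swing in transit time (roughly 140 ms after these 19 updates). It is exactly this kind of high-frequency variation that a control loop would have to absorb.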
TCP is unsuited to supporting remote control loops because of its highly nonlinear and abrupt dynamics in the presence of even small amounts of loss. Figure 4 shows delay measurements for fixed-size control messages sent at regular intervals from ORNL to the University of Oklahoma. Indeed, TCP can be shown analytically to exhibit chaotic dynamics, which makes it very difficult to deploy for supporting control loops. A further complication arises over the Internet, since TCP's chaotic dynamics are often mixed in with its response to the inherent randomness of the traffic.
Figure 4. Delay measurements in seconds for fixed-size messages (10K) sent at regular intervals over the Internet.
The main problems with TCP dynamics stem from its congestion control component, and they can be circumvented if congestion is avoided altogether by using dedicated bandwidth pipes. Even then, TCP exhibits utilization problems because of the bandwidth left unused within the "teeth" of the sawtooth profile of its congestion window; in such cases it is more efficient to use a different class of protocols that incorporate certain TCP properties. Finally, even plain bandwidth requirements can be difficult to meet when they are so large that it is not cost-effective to purchase the bandwidth and keep it available 24 hours a day, seven days a week, 365 days a year. Dynamic provisioning is an effective way of addressing this issue, but such a capability requires new ways of configuring the networks, arbitrating bandwidth requests, setting up and tearing down dedicated channels, and matching the transport and middleware to the provisioned channels.
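Both the sawtooth inefficiency on dedicated pipes and the slow linear recovery after a premature slow-start exit (Figure 3) can be illustrated with a toy, round-based model of AIMD congestion avoidance. The model and the function name are our own simplifications: a single flow, no cross traffic, no slow-start or timeouts, and exactly one window halving whenever the window exceeds the pipe capacity plus the bottleneck buffer. It is a sketch, not the model behind the report's measurements.

```python
def aimd_utilization(capacity: int, buffer: int, rtts: int, w0: float = 1.0) -> float:
    """Fraction of a dedicated pipe used by an idealized AIMD sender.

    capacity: packets the pipe can deliver per round trip (its
              bandwidth-delay product in packets).
    buffer:   extra packets the bottleneck can queue before dropping.
    rtts:     number of round trips to simulate.
    w0:       starting congestion window in packets (e.g. where slow-start ended).
    Hypothetical helper for illustration only.
    """
    w = w0
    delivered = 0.0
    for _ in range(rtts):
        # The pipe carries at most `capacity` packets per RTT; any excess queues.
        delivered += min(w, capacity)
        if w > capacity + buffer:
            w = max(w / 2.0, 1.0)  # loss detected: multiplicative decrease
        else:
            w += 1.0               # no loss: additive increase of one packet per RTT
    return delivered / (capacity * rtts)


if __name__ == "__main__":
    # No buffering: the window saws between roughly C/2 and C.
    print(f"no buffer:            {aimd_utilization(1000, 0, 100_000):.3f}")
    # A buffer of one bandwidth-delay product keeps the pipe full after the first halving.
    print(f"BDP-sized buffer:     {aimd_utilization(1000, 1000, 100_000):.3f}")
    # Slow-start ended at 10 packets: linear growth needs ~1000 RTTs to fill the pipe.
    print(f"early exit, 500 RTTs: {aimd_utilization(1000, 0, 500, w0=10):.3f}")
```

With these illustrative numbers the unbuffered case settles near 75% utilization, the classic average of a sawtooth between W/2 and W, while a bandwidth-delay-product buffer keeps the pipe nearly full. The early-exit case reaches only about a quarter of capacity in 500 round trips, because additive increase adds just one packet per RTT; on a long, fast path that linear climb is what stretches the ramp-up to tens or hundreds of seconds.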

Photuris.com - Optical Data Networking