Special Report: Network Provisioning
To place current network limitations in perspective, consider a 4-terabyte
data transfer from North Carolina State University to ORNL, a typical daily
output from a high-end supercomputer. Initial measurements show a bandwidth
of 10 Mbps between the end nodes, which would require about 40 days to
transfer the data even if TCP could be optimized and executed continuously
with very low losses. The infrastructure has been upgraded to equip the end
hosts with Gigabit Ethernet (GigE) cards and to connect them through Internet2
and ESnet over Gbps connections. In reality, today's TCP implementations
deliver only about 400 Mbps to the user under good conditions over wide-area
Gbps connections.
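
The transfer times quoted in this discussion (about 40 days at 10 Mbps,
roughly a day at 400 Mbps, about an hour at 10 Gbps) follow from dividing the
data volume by the sustained rate. The sketch below reproduces that arithmetic.
It assumes decimal units (1 TB = 10**12 bytes) and a perfectly sustained rate
with no protocol overhead; the function name and the labels are illustrative
and do not come from the report.

```python
def transfer_time_seconds(data_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: payload bits divided by the sustained rate in bits/s."""
    return data_bytes * 8 / rate_bps


# Assumption: 4 TB interpreted in decimal units (1 TB = 10**12 bytes).
DATA_BYTES = 4e12

# Sustained rates taken from the text; the labels are illustrative.
RATES_BPS = {
    "10 Mbps measured path": 10e6,
    "400 Mbps TCP over GigE": 400e6,
    "10 Gbps dedicated channel": 10e9,
}

for label, rate in RATES_BPS.items():
    seconds = transfer_time_seconds(DATA_BYTES, rate)
    print(f"{label}: {seconds / 3600:.1f} hours ({seconds / 86400:.2f} days)")
```

Under these assumptions the three cases come to about 37 days, about 22 hours,
and about 53 minutes, consistent with the rounded figures in the text.
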
Thus, this scheme has a best-case transfer time of roughly a day, assuming
suitable uncongested bandwidth exists. If the end nodes were upgraded to
10 GigE cards and the links to OC192, the best possible transfer time would
be about one hour, but only if the entire bandwidth is provided for this
transfer and the hosts are equipped with the required transport and
middleware modules. Hence, to support such transfers we require: (a) an
infrastructure that provides dedicated channels of 10 Gbps or higher, and
(b) network technologies that can deliver the link bandwidth at the
application level.

In general, the underlying infrastructure must be upgraded to meet these
requirements. However, simply upgrading the network links is not sufficient,
since current off-the-shelf transport protocols cannot achieve throughput
that matches the link rates. To highlight the issues, consider an old
example in Figure 3 that shows TCP throughput of a large data transfer over
an OC12 link between ORNL and NERSC. The link rate is about 620 Mbps but
TCP achieved only 20 Mbps after 50 seconds: initial losses prematurely
terminated slow-start, and TCP subsequently spent most of its time recovering
under the Additive Increase Multiplicative Decrease (AIMD) scheme. While
this limitation can be easily rectified by employing parallel TCP streams or
adapting the TCP dynamics, it is archetypal of the issues that deserve close
attention when the infrastructure is upgraded. In particular, the transport
methods may not scale with the link rates, and it is very important to
optimize them for the specific infrastructure.

Figure 3. Premature termination of TCP slow-start severely limits the
achievable throughput.

Collaborative
visualization of dynamic objects does not need extraordinary amounts of
bandwidth (30-50 Mbps is often adequate), but it does impose a different
type of dynamic constraint on the throughput. That is, an interactive
visualization stream cannot wait through a normal "TCP slow-start" bandwidth
ramp-up; it should be capable of starting and stopping in response to
interactive requests in tens or at most hundreds of milliseconds,
irrespective of the congestion levels. In contrast, and as Figure 3
demonstrates, TCP can require tens or hundreds of seconds to achieve full
speed. An interactive visualization that responded to requests for
fast-forward, rewind, jog, and play with this sort of latency would be
unacceptable.

As another example, consider the remote control of an instrument (for
example, an electron microscope or a neutron goniometer). Although it may
seem obvious, remote instrument control requires a stable control loop. This
in turn requires tight control of the packet-arrival-time jitter, something
TCP is famously unable to provide. To ensure smooth control
of end devices, computations, or experiments, it is important to send
control messages quickly and without jitter. Jitter introduces high-frequency
components in the control signals that destabilize the control loops, and
as a result, controlled objects (including devices, instruments,
visualizations, and computations) may be damaged or driven into undesired
regions. TCP is unsuited to support remote control loops because of its
highly non-linear and abrupt dynamics in the presence of even a small number
of losses. Figure 4 shows delay measurements of fixed-size control messages
sent at regular intervals from ORNL to the University of Oklahoma. Indeed,
TCP can be analytically shown to contain chaotic dynamics, which makes it
very difficult to deploy for supporting control loops. Another complication
arises over the Internet, since the chaotic TCP dynamics are often mixed in
with its response to the inherent randomness of traffic dynamics.

Figure 4. Delay measurements in seconds for fixed-size messages (10K) sent
at regular intervals over the Internet.

The main problems with TCP dynamics are due to its congestion control
part, which can be circumvented if the congestion is avoided altogether
by using dedicated bandwidth pipes. Even so, TCP underutilizes such a channel
because bandwidth goes unused within the "teeth" of the sawtooth profile of
its congestion window; in such a case it is more efficient to use a different
class of protocols that incorporate certain TCP properties.
Finally, even the bandwidth requirements themselves can be difficult to meet
when they are so large that it is not cost-effective to purchase the
bandwidth and keep it available 24 hours a day, 365 days a year. Dynamic
provisioning is an effective way of addressing this issue, but such a
capability requires new ways of configuring the networks, arbitrating the
bandwidth requests, setting up and tearing down the dedicated channels, and
matching the transport and middleware with the provisioned channels.
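
The utilization loss within the "teeth" of the congestion-window sawtooth,
noted earlier for dedicated channels, can be illustrated with a toy model. The
sketch below is not the measurement setup or any protocol from the report. It
assumes a single idealized AIMD flow with no queueing, a window that grows by
one packet per round trip, and one loss each time the window reaches the
channel capacity. The function name and parameters are hypothetical.

```python
def aimd_utilization(capacity_pkts: int, rtts: int = 10_000) -> float:
    """Average utilization of a dedicated channel by an idealized AIMD sender.

    Assumptions (illustrative only): one flow, no buffering, additive
    increase of one packet per round trip, and a window halving each time
    the window reaches the channel capacity.
    """
    cwnd = capacity_pkts / 2.0  # start at the bottom of a sawtooth "tooth"
    sent = 0.0
    for _ in range(rtts):
        # The channel cannot carry more than its capacity in one round trip.
        sent += min(cwnd, capacity_pkts)
        if cwnd >= capacity_pkts:
            cwnd /= 2.0   # multiplicative decrease after a loss
        else:
            cwnd += 1.0   # additive increase
    return sent / (capacity_pkts * rtts)


if __name__ == "__main__":
    print(f"utilization: {aimd_utilization(1000):.3f}")
```

In this idealized model the window oscillates between half and full capacity,
so the average utilization comes out close to three quarters of the channel.
Real TCP behavior depends on buffering, round-trip time, and loss patterns
that this sketch ignores.
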