Tuesday, January 26, 2010

When is a graph not a graph...

I continue to be surprised at the number of times a group of people can look at the same graph and come to not only different, but directly opposing, conclusions.



Our department was recently contacted because one office was experiencing "slower than usual" transfer speeds. Since my client depends heavily on transferring files between offices and between clients, a report of slow transfer speeds gets shoved to the top of the stack.



I asked what speeds they usually experience, and what they are experiencing now. The response consisted of simply this graph:




image001.png



So... do you see the problem? Me neither. There's not even a label on the Y axis to say what this graph represents. It's certainly not megabits/sec, or even megabytes/sec. Possibly bytes/sec. So I called back to ask and found out it's the number of files transferred per hour.



Now that we have that settled, it actually appears that we have recently transferred more files per hour than in the recent past. So another call is placed to clarify the perceived slowness. "It takes longer." "You're transferring twice as many files." "It shouldn't take this long." "How long should it take?" "It should be faster."
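To make the mismatch concrete: a files-per-hour count only becomes a speed once you know how big the files are. Here is a rough sketch of that conversion. The file counts and the average file size are made-up numbers for illustration, not values read off the graph, and the function name is my own.

```python
def average_mbps(files_per_hour: float, avg_file_megabytes: float) -> float:
    """Convert a files-per-hour count into average megabits per second.

    Assumes every file is roughly avg_file_megabytes in size (decimal MB).
    """
    megabits_per_hour = files_per_hour * avg_file_megabytes * 8  # bytes -> bits
    return megabits_per_hour / 3600  # seconds in an hour


# Hypothetical numbers: twice the files at the same average size.
before = average_mbps(files_per_hour=100, avg_file_megabytes=5)
after = average_mbps(files_per_hour=200, avg_file_megabytes=5)
print(f"{before:.2f} Mbps -> {after:.2f} Mbps")
```

With the file size held constant, doubling the file count doubles the average throughput, which is the opposite of "slower". Without the file sizes, the graph alone can't say either way.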



At this point I needed to prove (at least to myself) that there was definitively no network issue or, if there was, to uncover it.



At first glance, a chart of bandwidth utilization didn't reveal anything telling. There were no errors and no buffer allocation problems, and latency was well within tolerance. However, one noticeable artifact was that the bandwidth seemed to be stair-stepped. Many lines peaked at 0.5 Mbps, with several more around 1.5 Mbps, a few lines at 3.5 Mbps, and none more than 4 Mbps. Since this was a dedicated T3 circuit, I would have expected more random spikes.



201001-OR-Bandwidth.PNG



Since access to the far end was limited, we could only test in one direction. It was now time to dig into the application and how, exactly, files were being transferred. We found out (1) files are being transferred via FTP, (2) a script kicks off every other hour to send all files in a given directory, (3) another script is kicked off on demand.



This was now beginning to make sense. With limited visibility on the far end, we decided to push a 1 gig file (hope they have space). Bandwidth rose to 2 Mbps and plateaued. While that was running, a second transfer was started. Bandwidth rose to 4 Mbps. We continued adding streams until 10 concurrent 1 gig files were being pushed. The circuit climbed to an almost predictable rate of 20 Mbps.

201001-OR-Bandwidth-multistream.PNG



With this evidence we were able to contact the server owner on the far end. Indeed, his server was limiting the per-session throughput to 2 Mbps.
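The arithmetic behind that test is simple enough to sketch. This is a back-of-the-envelope model, not a measurement tool: it assumes each session is pinned at the 2 Mbps cap, uses the nominal T3 line rate of 44.736 Mbps as the ceiling, treats "1 gig" as 10^9 bytes, and ignores protocol overhead. The function names are my own.

```python
T3_MBPS = 44.736            # nominal T3 (DS3) line rate
PER_SESSION_CAP_MBPS = 2.0  # per-session limit on the far-end server


def aggregate_mbps(sessions: int,
                   per_session_cap: float = PER_SESSION_CAP_MBPS,
                   link_mbps: float = T3_MBPS) -> float:
    """Expected total throughput when every session hits the same cap."""
    return min(sessions * per_session_cap, link_mbps)


def transfer_hours(size_gigabytes: float, rate_mbps: float) -> float:
    """Hours needed to move size_gigabytes (decimal GB) at rate_mbps."""
    megabits = size_gigabytes * 1000 * 8
    return megabits / rate_mbps / 3600


for n in (1, 2, 5, 10):
    print(f"{n:2d} session(s): {aggregate_mbps(n):.1f} Mbps")

print(f"1 gig over one session: {transfer_hours(1, PER_SESSION_CAP_MBPS):.2f} hours")
```

Under those assumptions, one session tops out at 2 Mbps, ten sessions reach 20 Mbps, and a single 1 gig file needs a bit over an hour on its own. That matches the staircase we saw: the total is a multiple of the per-session cap, never a random spike toward the circuit's capacity.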



But this really wasn't a lesson in technical troubleshooting. This was a lesson in investigative work. An unlabeled graph seemingly contradicted the end-user reports. Armed with limited capabilities, we were able to diagnose the problem and, better yet, propose a new solution.



On to the next...
