Lynn's Industrial Protocols over IP

Tuesday, October 13, 2009

Cellular DNS Lookup Cost

I have been working with a customer using two cellular devices with a DDNS provider, so the 'Client/Master' device used a DNS name to connect to the remote 'Server/Slave' device.

Cost #1 - added delay
The first cost they were seeing was added time lag to open a connection.
  • To open a TCP socket between two idle (but connected) cellular devices requires each device to be "unparked" by their cell tower. So the client device would need to request a resource and be assigned it by the local cell tower - this can require 2 to 5 seconds.
  • Next, a DNS query would be sent and a response would be required. Some cellular carrier DNS servers don't seem very peppy, so this could add a few more seconds.
  • Then the client device would receive notice from the cell-tower control channel to request a resource and be "unparked"
  • Finally, after "X" seconds of delay and lag the client device would receive confirmation that the socket is open (or it might have given up to fast already & aborted).
So using a DNS name instead of an IP Address might add a few seconds delay - which might be just enough to cause problems. The customer I am working with has a client tool with a hard-coded MAX timeout of 25 seconds, and it seems to often hit this during original socket open. So they have no trouble communicating once the TCP socket is open, but it can be difficult to get the first data packets moving.

Cost #2 - added traffic
Of course the DNS query also adds raw traffic. DNS uses UDP, and a query might be 44 bytes and a response 60 bytes. So 104 bytes extra to open a TCP socket (which can be hundreds of bytes by itself).

Is this traffic important? Perhaps. Perhaps Not. Most cell plans today round up traffic hourly, so an extra 104 bytes for DNS and 150 bytes for a Modbus poll/response is still "1K of traffic".

For TCP/IP, the DNS lookup would only occur for a socket open. So regardless of the DNS name Time-To-Live, you might only see this added 104 bytes once per day or month - or anytime you open/reopen a socket.

However for UDP/IP you MIGHT see this DNS lookup for every packet. For example, a DDNS record from dyndns.org might expire in 60 seconds, so sending one 100 byte UDP packet every 5 minutes, could become 204 bytes per 5 minutes, so 2448 bytes per hour rounded up to 3K per hour or 72K per day and NOT the 49K expected.

So while UDP/IP can save a lot of traffic cost over TCP/IP, if short-lived DNS names are required then some of UDP's savings will be lost.

One solution is a custom application to do indirect DNS lookup. In other words, to directly lookup the DNS name within the application, then to ignore the DNS "Time-to-Live" and use only the cached IP address UNTIL COMMUNICATIONS FAILS - then do a new lookup. This is not how normal operating system DNS caches work - they literally honor the DNS server's time to live and would cause a new look up based on the 60 second time-to-live. Yet this is general safe for cellular end-devices since normally the dynamic IP only changes when the cell link is lost, so the remote IP is unlikely to change more than a few times per day.

Labels: , , ,

Wednesday, June 10, 2009

Does Cellular 3G help your Rockwell PLC Access?

In short - No.

Cellular data systems are continuously being sub-optimized for the unidirectional fire-hose paradigm of web page viewing, media streaming, image viewing, email download and so on. In other words, you only see the 3G "broadband" performance when you have tens of thousands of bytes to shove in one direction.

Actual field tests of RSLogix 5000 downloading to ControlLogix processors show that using 3G is only a few percent faster than standard technology - plus still probably slower than analog dialup. The problem is caused by RSLogix and Ethernet/IP moving thousands of small transactions in a half-duplex manner. So RSLogix downloads the PLC object by object, confirming success after each object is written. With cellular latency, each of these transactions can take up to a second to complete, so multiple 1 second by 3000 to 8000 transactions and you have your PLC download time.

3G apologists will claim "Oh, but latency in 3G is much lower than older, coarser technologies."

That may be, but when the rubber meets the road (aka RSLogix5000 downloads), 3G doesn't perform any better. 3G might have other value, but for most people moving data "faster" just means your data usage and the final bill racks up faster as well.

Labels: ,

Saturday, June 06, 2009

Cellular to Redundant Rockwell ControlLogix

Can RSLinx access a redundant pair via cellular?

Background
A Pair of Rockwell Automation ControlLogix racks with SRM Module and dual ENBT's will share a pair of IP addresses. One IP address is "the primary", and the other IP is "the backup" (if/when it is online). When the ENBTs switch role, they will issue the requisite Gratuitous ARPs to cause other local Ethernet devices (like a Digi cellular gateway) to update their ARP cache, thus comprehending that the IP-to-MAC address mapping has changed. Thus a NAT/Router forwarding to the primary IP should handle the fail-over with only modest bumps.

Config Details
A user desiring RSLinx (and RSLogix or RSView etc) to access a remote Rockwell ControlLogix (any RA/AB PLC) will be doing what the industry calls "Mobile-Terminated" access. The user needs to arrange a cell plan which offers either a fixed IP address to target, or at least a Dynamic DNS name to target (like tk101.iatips.com or panel2.digi.com). This is NOT what you obtain with an iPhone or personal air-card data plan. Those will have private IPs which only permit outgoing connections - called "Mobile-Originated". That was a buzzword lesson - expect to be asked about those two terms when you ask about cellular data plans!

So once you can arrange your targetable IP or DNS name, you need a cellular router such as one of the Digi Connect WAN family products. My favorite model today is the ConnectPort X4, but that has large memory for Python programming, wireless mesh and other goodies you won't need to link up your Rockwell PLC (Hey, I said it was MY favorite - doesn't mean it has to be yours!)

Note that contrary to folklore or urban legend, all cellular devices need certification to work on a system - even GSM devices. Many small suppliers get around this by including fine-print that say the device buyer is responsible to arrange such legalities, and since you (the buyer) don't read such fine print the salesperson will just say "Heck, it's GSM - so it is allowed everywhere world wide!" Deal with this issue as you see fit, but Digi has more formal certs in more countries worldwide than any of the other industrial players.

But back on subject, when the gateway comes up, it is assigned your known IP or DNS name. This is exactly how your home or business DSL/T-line line works. Yet when RSLinx tries to talk to the gateway on Ethernet/IP's well-known TCP port of 44818, the gateway will reject the connection as a weird attempt at hacking. You need to instruct the gateway:
  1. to not reject the Ethernet/IP traffic on TCP port 44818
  1. to instead forward it to a local IP on the Ethernet - which would be the IP of your ControlLogix ENBT (or the primary IP of the redundant pair)
The details of how this man-in-the-middle fake-out works is fascinating (to me), but quite a pile of text. If you are interested, this older blog entry goes through the NAT/Router details blow by blow. But bottom-line, your ENBT receives the RSLinx packet and needs to have its own Gateway IP set to the Digi gateway's local Ethernet IP address. Free hint: 9 out of 10 guys who call saying "Why can't RSLinx see my PLC through your Digi gateway?" have failed to set the correct Gateway IP in the PLC/ENBT.

So assuming your gateway and PLC are setup correctly, then targeting the RSLinx "Ethernet Devices" driver (with timeouts slowed down to 30 seconds) will cause your PLC's little icon to show up. With RSLinx running, you will create up to 200MB of billable cell traffic per month doing absolutely nothing - so don't leave it active. Note that the "Ethernet/IP Driver" won't work as it requires UDP broadcast, which can't be routed over the Internet.

At this point you'll say "Cool, now can I see my backup ControlLogix or a second PLC?", and the simple answer is "No." One of the realities of RSLinx and AB PLC is that the Ethernet/IP protocol is hard fixed to only the TCP/IP port 44818, and the NAT/Router can only forward TCP port 44818 to a single local PLC. The easy fix would be for Rockwell to change RSLinx to enable adding both an IP/DNS name and TCP port number - then the NAT/Router could forward TCP port 44818 as 44818 to the primary ENBT, TCP 44819 fixed-up to 44818 to the secondary ENBT, TCP 17256 (a random number :-]) fixed up to 44818 to an RSView panel and so on. Because the NAT/Router can restore all traffic to 44818 on the local Ethernet, RSLinx is the only tool needing to change.

Will Rockwell ever do this? It would take a programmer half a day to do - then a few weeks to test - then a few months to document and forestall support headaches. So who knows. They might. They might not.

But bottomline is a simple cellular NAT/Router can be used to talk to a pair of ControlLogix running in a redundant configuration - you will just be limited to seeing only the primary ENBT and the primary IP address.

Labels: , , ,

Tuesday, April 07, 2009

Cellular Antenna and your PLC

One of my customers is having fun with one of their own customers. My customer uses Digi gateways running a Python application to collect hourly tank levels, which are fetched by cellular once per day. The tanks hold 250-gallons of a chemical additive which is injected at a variable rate into crude oil pipelines. Small battery-powered ultrasonic level sensors (www.massa.com) push the last 8 hourly samples every few hours using wireless IEEE 802.15.4 (aka Zigbee). The end customer's goal is to forecast when the tanks need to be refilled ... and I don't mean just receive an alarm when it is running low, they literally want to plan truck routes in advance.

Fun stuff.

The problem is that one of the pilot sites has bad cell coverage, meaning many days the central system cannot upload the log. Of course the data eventually it all uploads since it is held for over a month. Now, never mind that this system sits down in a dry wash (a small valley), the end-user says "Hah, that's because carrier A*** stinks; we all love carrier S****." My customer does not want to mix carriers - especially since negotiated pricing between the two in question is so different.

So remember that "coverage" for fixed RTU must be viewed differently from "coverage" for mobile uses.

For a mobile device (or user) "coverage" is defined by the probability that a valid carrier connection will be available as the device moves around. Thus the number of towers (etc) is important, and if there is no signal in one spot, then hopefully will be one a mile or two away.

Unfortunately, our little RTU bolted to a power pole in that gully won't be moving a mile or two ever. So in the end, "coverage" for a fixed device is defined only by the tower(s) which can be seen. So the carrier with the best cell-phone coverage might not be the best carrier to support a particular fixed RTU.

The first step will be to move the RTU up out of the gully, which is easy since the data signals are all wireless and power can be tapped from any of the privately owned power poles. Probably carrier A*** (stinky or not) will work fine once the RTU moves. As plan B and make the end-customer feel listened to, a S****-based RTU will be placed on the same pole as the relocated A*** one.

Could they run out a long external antenna? Sure, they could. But why bother? Put the RTU where it has a nice signal. Even in 2 or 3 store buildings we suggest customers put the cellular router up in the roof-area and run the Ethernet UTP it's 100 meters instead of trying to deal with the signal loss in long antenna cables. We even have customers using 900Mhz Ethernet bridges to link a cellular router placed where it must be placed back to the Ethernet devices which want access.

Worst case, a directional yagi antenna could be used, however you need to understand that cell towers are routinely turned off without warning. The carriers are truly geared towards mobile users who expect bad signals in some places. Towers (or the active elements) also move. Most of the cell towers you see along the highway work like strip-malls; a company owning the tower and supplying power leases tower space to a mix of carriers. This allows everyone to be flexible.

So prematurely locking down a cellular device to a single tower with a directional antenna can cause future problems since it will not see other weaker towers should the targeted tower be turned off or even moved.

Labels: , , ,

Friday, August 01, 2008

Tunneling Serial Data over Cellular

Tunneling Serial Data over Cellular

Question: you have a Windows-based application which expects to talk over normal serial ports - how can you connect to a remote serial device over a cellular-IP link?

Products: works with Digi Connect WAN, WANIA or WANVPN, Digi ConnectPort VPN, X4 or X8

Answer: Our Digi RealPort driver for Windows 2000/XP/Vista dated Nov 2008 now supports a nice low-overhead "UDP: Serial Data Only" mode. This will cost a fraction of what normal TCP-mode Realport or competitors TCP redirectors will cost.

Here is a web page explaining how to install RealPort for UDP mode (DDNS names can be used!)
http://digitips.wikispaces.com/Digi+Realport+with+UDP

Here is a web page explaining how to set the WANIA to UDP Sockets.
http://digitips.wikispaces.com/Digi+UDP+Sockets+WAN

I set this up yesterday and have been polling a Modbus/ASCII serial slave on my CPX4 using RealPort and UDP mode. Responses take from 2 to 3 seconds to return … in part because I am polling so slowly that the modem PARKS in between every poll and I wait 30 seconds for a response timeout. I see perhaps 0.25% error / no response timeout but then my SIM doesn’t have the best signal where my product sits.

However, I still believe people need to be realistic – even in the situations where the slave response times out I’d suggest NOT retrying immediately since the retries have a higher than normal probability of failing as well. Missing a few polls a day is better than needlessly paying more for data plans.

Labels: , ,

Wednesday, July 23, 2008

Evolution of Data Plan Billing

Summary: the big three have moved away from unlimited data, towards limited data.

It is interesting - I once (as in last year) had a talk with a potential partner who'd been at some European conference and was convinced the world was on the verge of low-cost (sub-$20/month) unlimited cellular data plans. We were discussing the creation of report-by-exception tools to reduce SCADA costs, and this partner's strong faith in this belief caused them to eventually bail out of the talks, saying "In a year or two, no SCADA company will care about how much cellular data they use."

Yet as of the summer of 2008 the world of cellular data is moving in the opposite direction. Last year the big three (AT&T/Sprint/Verizon) offered "Unlimited Data" for personal users with the Service Terms listing a VERY narrow list of permitted activities - mainly email and web browsing, with many common things like file download/upload, media-streaming prohibited. So when ever one of the big three would cut off a user for moving too much data on an "unlimited plan", the service provider would fall back on the "You are doing prohibitted things, thus impacting our network, thus take your business elsewhere". What a way to cause bad feelings, eh? Note that this change is CONSUMER plans - machine-to-machine have always been limited, priced by the MB/month without rollover, plus with charges for data overages.

Now all three have dropped the price from the $80/month range down to $60/month range ... but added a hard limit of 5GB per month. Isn't free & vigorous market competition wonderful?

Sounds reasonable - 5,000 megabytes of data is a lot, yet this doesn't mean 5GB of data transfer. It means 5GB of metered activity, with many activities I've studied including up to 95% overhead. Thus someone only moving 20-30MB of real data in small packets per month might hit pretty close to their 5GB limit! My experience with normal wide-area-network traffic hints that a real PC user doing simple email and web-browsing once a day would probably move 1-2GB of data before hitting the 5GB total activity limit.

To paraphrase the wireless data service terms for all three:
  • Data transport is always measured in full kilobytes
  • Actual transport is always rounded up to next full-kilobyte at "end of session"
  • Network overhead and resend requests caused by network errors can increase measured kilobytes.
  • 2 of 3 mention always rounding up to nearest kilobyte every hour period.
  • All warn that you will NOT receive an itemized detail of how your charges are calculated; you will NOT see which services were used or during which time periods the charges were inccurred under.
So if I send a single 50 byte UDP/IP packet, is that a full session and billed as 1024 bytes? Could be under this language since UDP is 'sessionless'.

Hmm, the term session is pretty ambiguous. Perhaps it means per "time you enable your PC-based cellular data card." That seems likely - plus if you left your device on twenty-four hours a day then the once per hour round-up would catch you.

I'm afraid I haven't offered any new answer here, other than to suggest you understand that low-cost unlimited data plans ARE NOT just around the corner ... at best we left them behind last year and I don't foresee them ever returning. I suppose all three now understand that huge new profits are to be made with these 5GB limits, which will cause many "super-salesman" using their cellular data plan daily to spend an extra $50 to $500 in monthly overage charges.

Labels: , ,

Thursday, May 22, 2008

Cellular to Wireless Zigbee

An interesting new market we are moving into is cellular access to wireless mesh (Zigbee as example). In some sense it's a supply-chain dream come true. Imagine you supply a product to customers and EVERYONE (except the customer's IT department) want you to be able to see what the stock level is and auto-schedule deliveries.

The customer benefits because they can treat the product as a 'utility' - turn the tap and there it is.

You benefit because you can minimize emergency truck-rolls - no more angry, panicked customers demanding you send a truck over two-thirds empty because someone forgot to schedule a special delivery because the customer needed to use 60% more product for two days. Of course as a supplier the cost of the truck, fuel, and driver are critical parts of your margin/profit. You desire to only send out full trucks which return gracefully empty!

So we are now working with several of the largest chemical suppliers in the world to enable:
  1. drop in a powered cellular unit at the customer site
  2. drop in powered or battery tank sensors
  3. log levels hourly, for the supplier to upload daily (reduces cellular data charges) The supplier uses this as their 'secret-sauce', their own proprietary value-add to predict when trucks need to roll to maximize efficiency
  4. enable alarm call-out if the levels hit unexpected low-low levels
How this works varies by suppliers. The one I'm working with is using Modbus/TCP to pull up the logs daily. Some other suppliers are having SMTP clients push emails back to the supplier with XML formatted reports. The next supplier I might work will wants the binary logs to be compressed (ZIP'd) and then pushed upstream once a day by FTP, where their accounting system will convert from binary to XML to import and issue bills on product usage per MINUTE.

Of course key to all of this is the wireless drop-in-network concept. The supplier doesn't want to invest thousands of dollars pulling wires through SOMEONE ELSE'S PLANT - especially when the supplier's contract might end in a few months.

Wireless sensors aren't new; cellular data access isn't new; supply-chain systems which auto-detect product levels aren't new. What is new here is the merger of many technologies which reduce infra-structure costs, and thus increase ROI.

Labels: ,

Friday, March 07, 2008

Optimizing Modbus for Cellular

Goal: lower your data costs

How nice it would be if you could take your Ethernet applications and just move them to cellular (or satellite). Well, of course you can ... but you'll pay through the nose for this.

At the moment I'm in the process of creating intelligent cellular gateways accessible by Modbus/UDP (aka Modbus/TCP form in UDP) which support data logging, report-by-exception and other cost-saving goodies.

A few Facts:
  1. You are charged for all IP, TCP, and UDP overhead - the cellular system moves your TCP/IP packet as raw payload encapsulated in mobile-IP or other transports. So to them the 40-52 bytes TCP/IP header as NOT DISTINGUISHED from your data. ( My Blog entry on this )
  2. Thus Modbus/UDP (aka Modbus/TCP in UDP/IP) will save you from 60 to 90% of your data costs. It is a single one-shot request followed by a single one-shot response. In contrast TCP/IP might require up to 400 bytes of socket open & close overhead, plus TCP acknowledge packets.

So if you are concerned about cost, you should first make sure your DATA POLLING can be done with Modbus/UDP - not TCP. No sweat if you need to use Modbus/TCP to reprogram or monitor your RTU or PLC short-term, just make sure your 24/7 repetitive data polling in via UDP. ( Is UDP reliable enough? )


So I've defined a few extensions (read as 'heresy' to many). Full Details are Here at iatips.com:
  1. I allow use of the full Modbus/TCP header - so one can read 500 registers in a single request. This greatly reduces charges for header overhead
  2. I allow returns LESS than data than requested when the context is appropriate. This saves having to pay for data padding, plus not having to poll a status register to see how much logged data is waiting
  3. I allow use of data compression like ZLIB. Bottomline, 'ZIP' compression of small data sucks, but since I can return 500 registers (1K bytes) ZLIB starts showing value.
  4. I allow packing multiple Modbus-ADU below a single header, which (unlike pipelining) signals the gateway to return multiple responses in a single packet.
  5. I am looking at support for simple AES encryption, not as true 'security', but as a good-enough means to support Modbus without making development difficult.
If you are interested in the details, I have more discussion on my wiki site.

Labels: ,

Lost My SIM

Well, I've been quiet for awhile - lost my "free development SIM" as part of Cingular's reorg into AT&T. Thus my collection of PLC with free demo access are off-line. Is interesting to support cellular without a SIM ... not that my boss is ignoring this, but there are 'plans in the works' which involve several companies ... plans which just keep moving out.

However, I am working on Cellular gateways with intelligence now. Goal is to allow a Modbus client/master to come in once per day and upload time-stamped logs; plus if certain events occur use Modbus to call for help.

Labels:

Friday, October 05, 2007

How Exposed in Public Cellular IP

One of the concerns about having a PUBLIC IP address for cellular is your exposure to public hackers probing public IPs for services. In theory, you pay for all of these attempts since the mobile IP system encapsulates and transports them all on your behalf.

I'm happy to report that after logging such attempts for many months my cellular devices receive less than 4 probes in any one day and likely under 20 total per month. So many days pass with no probes at all and it appears to stay under 2-3K per month. This compares to perhaps 200 probes per day on my DSL router/firewall.

I am not sure why the difference, although I'd guess it has to do with the high initial latency in contacting a cellular IP. So any "script-kiddie" tool scanning IP address ranges probably is not willing to wait up to 5 seconds for cellular devices on busy towers which need "unparking" to respond.

What kinds of probes are they? Mainly those looking for MS-SQL servers, with a rare access to FTP and the remainder of accesses aimed to seemingly random, unnamed ports - likely associated with trojans or zombie networks.

Labels: ,

Wednesday, August 22, 2007

The Truth about Cellular IP

I have been very busy analyzing real-world telemetry traffic over cellular IP, and unfortunately I am now 100% convinced that you cannot effectively use most (any?) off-the-shelf "Ethernet" software tool to talk to remote Ethernet devices over cellular IP. Bottom-line is that - unless your host app is custom written to be data cost and time delay sensitive - your data costs will be bloated due to the nature of the tool. Even something as "obvious" as adding compression doesn't solve the problem because telemetry packets tend to be too small for effective compression. For example: an 8-byte Modbus/RTU request becomes 12-bytes after ZIP-style compression. Plus this doesn't reduce the 104-bytes of TCP overhead nor 28-bytes of UDP overhead. None of the cellular providers allow use of RFC-class TCP header compression, since it requires all of the infra-structure to maintain
copies of headers etc.

So I have been working on "reduction" solutions - how to obtain the effect of moving "X" IP packets but only moving "X-minus-a-bunch" of actual IP packets.

Tunneling TCP thru UDP
The most promising and generic form of reduction is to tunnel TCP/IP via UDP/IP over cellular. So the host application talks TCP/IP to a local proxy, which acts as the TCP end-point. All of the TCP SYN, ACK and Keepalive traffic is limited to the local Ethernet. The local proxy then initiates a UDP "session" with a remote proxy over cellular & we instantly see a 60-90% reduction in data costs. The remote proxy initiates a TCP/IP connection to the remote Ethernet device, which again isolated the extra TCP overhead to the remote Ethernet.

The reaction of non-IA network engineers to this idea is predictable and a bit humorous after a while. They immediately say "You cannot do that!!! UDP/IP is unreliable!!! You'll break something!!! You are committing a mortal Sin!!!" But in reality none of the IA protocols leverage the reliability of TCP anyway. For example, Rockwell RSLogix doesn't send a program block to a ControlLogix and blindly assume it was successful after the TCP Acknowledge from the peer is processed. Instead, RSLogix sits (blocks literally) and waits for a successful CIP response on a single CIP Connection. So if the local proxy returns a TCP-ACK to the RSLogix host and the CIP request is lost within the UDP/IP tunnel ... eventually RSLogix times out the CIP connection and the application (and/or user) will restart.

Fortunately, cellular is very reliable - all of my tests sending 10,000 UDP packets rarely even lost 1 packet and I'm not sure if such a rare loss is due to cellular or just my test script hiccupping & dropping a packet. Plus cellular tends to have only very bursty error problems. In other words, you won't lose 1 packet per 10,000; instead you'll lose all packets for 5 minutes or just 5 random packets out of a group of 10 sent. This shotgun-damage tends to confuse TCP/IP state machines to the point that they abort the connection anyway. In truth, in all of my Wireshark/Ethereal trace reviewing I have never seen a single situation where a TCP retry did anything but add data cost; every TCP retransmission just results in a "Duplicate ACK" showing up a few packets below in the trace & a doubling of the cost of that block of data.

So overall, anyone planning to use cellular should first investigate if they can use UDP/IP instead of TCP/IP.

TCP Problem #1 - added cost for pointless ACK
As mentioned above, real-world analysis of telemetry use of TCP shows the TCP ACK isn't useful; but worse, Embedded TCP devices tend to sub-optimize the ACK timing to "speed up" data transmission and recovery. Almost universally moving an IA protocol via TCP/IP results in 4 TCP packets instead of the idealized 3.

  • Your app sends a TCP request (request data size + 40-52 bytes of overhead)
  • 800-1100 msec later your app receives a TCP ACK without data (another 40-52 bytes of overhead)
  • 10-40 msec later your app receives the protocol response (response data size + 40-52 bytes of overhead)
  • Within a few msec, your app sends the TCP ACK without data.

So what would have been a 2-packet transaction with only 56 bytes of overhead under UDP/IP, or what should have been a 3-packet transaction with 120-156 bytes of overhead under ideal TCP/IP usually becomes a 4-packet transaction with 160-208 bytes of overhead. Yes, there exists a TCP socket option and the concept of "Delayed ACK Timer" to prevent the first empty TCP ack from being returned over cellular, but few embedded products use this since it adds code complexity, and it slows down overall data communications. At least in the IA world it seems everyone wants their Ethernet Product costing 2-4 times more than their serial product to appear lightning fast. So they ignore the TCP community's decades of hard-earned experience and "hack" their TCP stack to sub-optimize fast local Ethernet performance.

So this is where the instant 60-90% data cost savings of using UDP over TCP comes from. UDP has smaller headers and results in fewer packets being sent. Since the cellular IP system is "encapsulating" your TCP/IP packets in a manner similar to PPP, the entire IP header, TCP or UDP header, and your data is all considered billable payload.

There is also a myth propagated to this day that the TCP ack causes retry to occur more rapidly out in the wide-area-network infrastructure. The rhetoric goes, "If the 3rd and 4th router link is congested and the TCP data packet is lost, then the 3rd router will retransmit ... which is faster ..." Perhaps this was true back in the 1980's, but today the 3rd and 4th router (and all of the other 20 to 30 routers in a cellular end-to-end path) are just tossing IP packets upstream with no awareness of the packet functions. In reality, it is only the TCP state machines within your host machine and within your remote device that have any ability to retransmit anything.

TCP Problem #2 - added cost for premature retry
The TCP RFC includes many dynamic timers that automatically adjust themselves based on real-world performance. This is actually pretty neat. It means if the TCP ACK and response times tend to be longer than normal, then the TCP state-machine slowly increases the delay before retransmission. But I've seen 3 problems with this.

  1. The most effective way to leverage auto-adjust is to include the 12-byte TCP header options that time-stamps all packets. Linux system add this by default and installing one of many PLC engineering tools on your Windows computers causes Windows to also start always using this. The setting generally is global - you either have 40-byte TCP headers or 52-byte TCP headers forever. So for small telemetry packets, this adds a disproportionately large increase in data costs.
  2. Many embedded devices (PLC, RTU and I/O devices) have "hacked" the TCP ACK sub-system to force connection failure to be faster than the standard 3-4 minutes. For example, I worked with one large PLC company which expected TCP sockets failure in less than 1 second, so they forced TCP retransmission in hundreds of msec and without any normal exponential backoff between retries. This is totally unusable over cellular; you will end up with 30% to 90% of your data traffic being premature retries and responses to premature retries. I have literally seen Wireshark/Ethereal traces which are mainly black lines with red text - which is the default color used to show TCP "problems" such as lost-frag, retrans, dup-ack, etc.
  3. The latency in cellular is abnormal by an order of magnitude. Even browsing the internet or doing a telemetry polling test over DSL/cable broadband averages latencies in the 100-150 msec range. This is what a Windows or Linux defines as "slow/bad" - not the 800 to 3500msec of cellular. So even watching a Windows or Linux TCP state machine auto-adjust the retransmission delay over time, you will not see it achieve a 100% effective setting which eliminates wasted TCP retransmissions. The delay seems to top out
    at about 1.5 to 1.8 seconds, which is just too close to the actual "normal" latency range. So again, use of UDP/IP frees the use user from data costs associated with TCP legacy assumptions - both the main-stream MIS/IT market variety of assumptions and the misapplied IA vendors "speed-ups".

TCP Problem #3 - uncontrollable SYN/Socket Opens

Given the way all cellular systems "park" inactive cellular data devices, it is exceedingly rare to ever see a host app open a new TCP socket without prematurely retrying/retransmitting the SYN packet. This is because one is virtually guaranteed that it will take about 2.5 seconds for the data device to be given active airwave resources and return the SYN+ACK response. This has NOTHING to do with the "always connected" feature Digi and others claim. The data device (even when parked) is fully connected by IP and fully authenticated by the system - it is "always connected". However, the local cell tower only has finite airwave resources, so any device (cell phone or data device) which is idle from 3 to 45 seconds is "parked" without having any preallocated airwave resources. Literally when the TCP "SYN" shows up, the cell tower has to use the control channel to inform the data device to request airwave resources, and after these are requested and allocated the data device can receive and response to the TCP socket open request.

But that's not the real problem related to TCP Socket Opens ... the real problem is yet another case of IA vendors sub-optimizing TCP behavior for fast local Ethernet performance. For example, I once had a customer who normally paid about $40 per month receive a $2000 bill one month. It turns out they had powered down the remote site for 3+ days and the off-the-shelf 3rd-party host application they used would try to reopen the TCP socket every 5 seconds!!! So Windows would send the initial TCP SYN to start the open, since the remote was off-line Windows would retransmit this TCP SYN a few seconds later. After a total of 5 seconds, the application would ABORT this TCP socket attempt and start a new one. So this host app was pushing 24 billable TCP packets per minute out to a remote site that was powered down. This was nothing the host app vendor documented, nor was it anything a user could configure or over-ride. The user could configure the host app to ONLY poll once per 5 minutes; but the user had no control over this run-away TCP SYN/Open behavior.

Tunneling TCP through UDP effectively decouples the TCP SYN/Open from cellular data charges. The first TCP Syn/Open request to the local proxy would succeed even if the remote IP site is offline. No retries would be required. Even if the host app attempts to retry the data poll every 5 seconds, this is something the UDP proxy can be configured to "resist". If the user truly wants data packets to only move every few minutes, that is something the UDP proxy can easily enforce.

TCP Problem #4 - sub-optimized TCP keepalive

The final problem I'll discuss (but not by any means the "last" problem with TCP) is that many embedded IA devices have relatively fast TCP Keepalives hard-coded to speed up lost-socket detection. While this is an admirable goal, a Rockwell PLC sending out a TCP Keepalive at a fixed 45 second interval can create up to 6MB of monthly traffic by doing this. Siemens S7 PLC seem to issue TCP keepalive every 60 seconds - a bit better, but not by much. Maybe such a heart-beat is useful to know the remote is accessible, but given the reliability of cell phones (when the last time you had a dropped call or no signal ...) you'll obtain a lot of false-alarms if you treat every missed packets as something requiring maintenance's attention.

Again, tunneling TCP through UDP effectively eliminates the automatic, possibly uncontrollable use of TCP Keepalive. If your process can handle you talking to it once an hour, then the cost of TCP socket open and close, as well as any TCP Keepalive is all wasted investment.

Not only this, but the cellular providers do NOT want users who send a simple, rather empty packet every 30 to 60 seconds - this is literally the worst kind of customer, as this forces the cell tower to "waste" one of its very limited airwave resources with almost no income returned to the carrier. From what I hear, carriers either want customers who talk constantly and pay huge monthly fees (say $90 to $350/month); or they want customers who rare talk and pay a small fee (even just $5/mo) but cost the carrier virtually no direct expenses.

Putting this is "restaurant terms":

  • A cellular data device that talks constantly but pays for a large plan is like a restaurant patron who sits at a table, constantly ordering more food and paying a larger bill.
  • A cellular data device that rarely talks is like the restaurant patron who comes in once a month, sits at a table, orders a meal, pays and then vacates the table.
  • A cellular data device that keeps an idle channel open full time but rarely talks is like the restuarant patron who sits at a table in the resturant, reading the paper but rarely ordering food or paying a bill.

In fact, in private chats with carrier account people, I have heard several times that they have been directly to prefer either customers who talk constantly on large plans or those who talk at most once an hour (better once a day) on small plans. Customers planning to talk every few minutes have been defined as bad investments. It may be fair to say that after years of building up the data-plan customer base, the cellular carriers have come to understand that the REAL cost of data plans is not the bulk data bytes moved; it is instead the percentage of time the device consumes (or squats on) 1-of-N scare airwave resources in proportion to the monthly fee they pay.

Labels: , , ,

Tuesday, June 12, 2007

Real World Cellular - ControlLogix PLC

Summary: Before I listed some real world numbers for Modbus polling. This time I walk through some of the costs and issues of using ODVA Ethernet/IP to talk to a Rockwell ControlLogix PLC.

The Convoluted Path of Wide-Area-Networks:
In general the magic of IP hides reality from us all. We tend to think "now I am browsing Google.com or iatips.com", but we don't really understand how COMPLEX and MIRACULOUS this really is. Your computer is NOT connected to either of these web servers; instead your computer uses the services of a dozen or more other computers/routers to get from "here" to "there". Every single data byte must be forwarded hop-by-hop through all of these cooperative peers.

As example, here is a Trace Route (tracert) of access from a computer within my test lab to a ControlLogix PLC sitting six (6) feet away. I am using public Internet access via a cellular Digi Connect WAN to the Ethernet (ENB) of the ControlLogix. Some of the public IP have "X" entered replacing the digits; you don't need to really know the exact IP value.

My computer has private IP = 10.9.92.1
01 01 ms 10.9.1.1 (Digi's private Intranet)
02 01 ms 10.10.11.10 (Digi's private Intranet)
03 01 ms 10.254.254.2 (Digi's private Intranet)
04 16 ms 66.77.x.x (Digi Co-Host/Internet Link)
05 04 ms 69.8.x.x (Digi Co-Host/Internet Link)
06 64 ms 66.77.x.x (Digi Co-Host/Internet Link)
07 09 ms min-core-02.inet.qwest.net [205.171.128.110]
08 11 ms cer-core-02.inet.qwest.net [67.14.8.18]
09 12 ms cer-brdr-01.inet.qwest.net [205.171.139.62]
10 39 ms qwest-gw.cgcil.ip.att.net [192.205.32.97]
11 35 ms tbr2.cgcil.ip.att.net [12.123.4.254]
12 35 ms tbr2.sl9mo.ip.att.net [12.122.10.46]
13 75 ms tbr2.attga.ip.att.net [12.122.10.137]
14 31 ms 12.122.85.157
15 34 ms 12.86.140.146
16 * Request timed out. (Part of Cellular Infra-Structure)
17 * Request timed out.
18 * Request timed out.
19 * Request timed out.
20 1276 ms mobile-166-XXX-XXX-XXX.mycingular.net [166.XXX.XXX.XXX]
Digi Connect WAN has private local IP = 192.168.196.80 (is 'gateway')
ControlLogix PLC has private local IP = 192.168.196.21

These traces always amaze me - how something so seemingly trivial takes so much effort to really function. Notice how my lab PC has to route through 6 devices to even get out of Digi's company network, then through Qwest (our ISP), through AT&T (my cellular SIM provider), through some unnamed hops of the cell system, and finally be port forwarded to the ControlLogix PLC. The packets may be passing through Minneapolis, Chicago, Detroit, Atlanta, and then finally returning to the PLC sitting right beside me.

Effect of NAT (Network Address Translation)
Now lets look at what happens when RSLinx on my PC opens an ODVA Ethernet/IP socket to the ControlLogix PLC. Every TCP/IP packet requires 4 unique values which define a connection:
  1. Destination IP (target device)
  2. Destination Port (target application within device)
  3. Source IP (return address to originator)
  4. Source Port (likely random port, originator is waiting for responses here)

So we start out with the 4-tuple DST=166.x.x.x : 44818 and SRC=10.9.92.1 : 22256. The 166.x.x.x IP is assigned by my cellular carrier. Port 44818 is ODVA's "well-known" port for Ethernet/IP. 10.9.92.1 is an internal Digi selected private IP. TCP port 22256 is the ephemeral (or random) port selected by RSLinx to listen for responses.

The first NAT effect is the Digi corporate firewall changes the request to be DST=166.x.x.x : 44818 and SRC=66.77.x.x : 22256. My private IP of 10.9.92.1 is meaningless out in the Qwest or AT&T's networks, so something needs to swap this for a "real" world-unique IP leased by Digi. Our corporate NAT interface creates a record (with a lifetime of 5 minutes) that allows any responses to be correctly restored to 10.9.92.1

The second NAT effect is when the Digi Connect WAN forwards to the ControlLogix with another private IP. So the 4-tuple now becomes DST=192.168.196.21 : 44818 and SRC=66.77.x.x : 22256. The ControlLogix thinks IP host 66.77.something is connected to it - not the real host IP of 10.9.92.1. Plus the ControlLogix has NOT CLUE that the RSLinx thinks the ControlLogix as IP of 166.something.

Now, to send a response the ControlLogix issues a TCP/IP packet with the flipped 4-tuple of DST=66.77.x.x : 22256 and SRC=192.168.196.21 : 44818. The Digi Connect WAN restores (undoes) the NAT and changes this to DST=66.77.x.x : 22256 and SRC=166.x.x.x : 44818. After passing back through AT&T and Qwest, Digi's corporate NAT interface restores its own NAT and changes it back to DST=10.9.92.1 : 22256 and SRC=166.x.x.x : 44818.

This understanding of NAT and IP is useful for understanding the capability and limitations of cellular access to certain devices with certain protocols. A future entry will cover setting up RSLinx Classic and using RSLogix 5000 to download over cellular to a L5555 processor.

Labels: ,

Thursday, May 31, 2007

Unlimited Cellular Data - Revisted

As I go around dealing with large utility customers and hear feedback from Digi salespeople who survive large accounts, I hear some interesting but disturbing trends rising.

First, there are the people (usually who are new to cellular) who claim any day now the age of cheap, unlimited cellular data plans means all my hard work to understand or offer reduced traffic are wasted effort. I especially hear this coming from European partners.

Then there are the other people ... people I know work with very large, very powerful end users who fail to get the cellular plans they desire. Things I hear:
  • I have heard of customers who pilot projects and hear promises of unlimited traffic, but when the time comes to sign the contract for 3000+ sites, the carrier decides that they cannot offer unlimited traffic ... period. Hmm, I guess this is the difference showing between the carrier's commissioned sales staff and the business managers who need to keep the cellular system profitable.
  • I have worked with large customers who do manage to get "unlimited deals" for a modest sized system - say 50 or 100 sites, but the carrier insists on adding the 2 clauses 1) the carrier has the right to artificially slow down the data communications (without detailing what this means) and 2) the carrier has the right to just stop all the customer's data traffic temporarily if the cell system gets busy (again, not details of this defined). Hmm, I have to kind of wonder what kind of control engineer agrees to such "clauses"? You'd think trying to get one's data under control to avoid the need for unlimited data is a wiser design.
  • I have heard that overall the cellular carriers are starting to rethink the value of machine-to-machine data plans. Unlike DSL or cable, this is NOT an issue of bandwidth; it is an issue of the % of the time the device "hogs" 1 of N slots or resources on the tower forever. Imagine having 10 or 20 such devices squatting there, locking up that tower resource. So it is not even so much an issue of talking once per few minutes verse flat out unlimited traffic. In either case 1 of N finite tower resources is used, so long-term the only "good" data plan may be for a data system using the cellular resource every few hours or a few times a day.

So overall it looks like my efforts to understand and reduce traffic is useful.

Labels:

Wednesday, April 25, 2007

Rockwell and Modbus Data Traffic

Summary: I compare the data costs for four common off-the-shelf PLC protocols to a remote cellular site. As a rule of thumb, even if you have an Ethernet-based PLC your lowest data costs for SCADA-style periodic polling are obtained using serial protocols via the PLC's serial port. Their modern "Ethernet protocols" are very wasteful of cellular bandwidth.

PLC Protocol Example:
A simple but realistic SCADA scenario is to poll every 15 minutes and read 10 words of data and write 2 words of data. This commonly requires 1 Read command and 1 Write command (I'll ignore the rarely supported Modbus command that reads & writes within a single command.)

While there exists special SCADA protocols and special products that optimize remote traffic, I am not looking at those protocols at the moment. Instead, since cellular and satellite access to remote IP and Ethernet products has enabled people to use off-the-shelf PLC technology, I am looking at the more traditional PLC protocols. These are things which affect users when they apply an Ethernet design to an IP-based wide-area-network system.

I compare these 4 PLC protocols:
  • AB/DF1 Radio Modem (RM) encapsulated in UDP/IP. DF1 RM is basically DF1 Full-Duplex with no ACK/NAK and is supported by the SLC5 and MicroLogix line.
  • Modbus/RTU encapsulated in UDP/IP. Modbus/TCP within UDP/IP is roughly the same size.
  • AB/CSPv4 in TCP/IP as supported by SLC5/05 and PLC5E MSG blocks.
  • AB/Ethernet/IP as moved by ControlLogix Explicit MSG blocks to PCCC-based remote PLC. Note that Ethernet/IP "I/O Messaging" does NOT work through NAT'd wide-area-networks since the protocol embeds IP information within the data packets and is thus is "broken" by NAT.
The bytes per 15 minutes includes the IP headers, any PLC protocol overhead, and the actual data moved for each update (always 20 + 4 bytes). The MB per month is just defined by 30 days of such polling, or 2880 updates at once per 15 minutes. The 2 serial protocol have been encapsulated into UDP/IP and I assume the remote IP end-point can handle connecting this to the PLC's serial port.

ProtocolTransportPer 15 MinMB per monthRelative Cost
Ethernet/IPTCP/IP1202 bytes3.46 MB100%
AB/CSPv4TCP/IP960 bytes2.76 MB80%
Modbus/RTUUDP/IP166 bytes0.48 MB14%
DF1 Radio ModemUDP/IP194 bytes0.56 MB16%

The two Rockwell "Ethernet" protocols cost a lot more to use in part because they force use of TCP/IP, and therefore suffer the repeated cost of TCP socket opening and closing, plus extra TCP acknowledgment overhead. They also suffer because they both involve connection registration and service functions that needlessly repeat every time the connection is reestablished. While the actual data packets of these protocols are roughly twice the size of the serial encapsulated protocols, the real burden they suffer is all the extra TCP/IP packets exchanged that do NOT directly involve field data update.

Both the serial Modbus/RTU and DF1 Radio Modem benefit that they move no IP packets that don't relate to the field data update - no TCP/IP open or close or acknowledgement; no protocol "service function" overhead. Each moves just 1 read request and 1 read response, plus 1 write request and 1 write response.

Discussion of Other PLC Protocols:

Most other PLC Ethernet protocols will either approach the costs of the AB/CSPv4 - or they won't work at all due to use of direct "Ether-Types" and lack of IP compatibility. Most serial protocols with roughly either match the 2 show here or be twice the cost if protocol ACKs are used by the protocol.

Modbus/ASCII will almost double the cost of Modbus/RTU since each data packet is roughly twice the size. But this wouldn't increase the IP overhead any.

Using DF1 Full-Duplex instead of Radio Modem would effectively double the cost over DF1 RM since DF1 Full-Duplex moves the protocol ACK/NAK, which doubles the IP header overhead also. Using DF1 Half-Duplex would triple or even quadruple the costs since HD not only moves protocol DF1 ACK/NAKs, but the ENQ/EOT polling overhead.

Most other protocols I am aware of - such as Omron Hostlink, GE-Fanuc SNPX, and Siemens PPI - would cost roughly 2 to 3 times more than Modbus/RTU or DF1 RM since they include protocol ACK, while a few even encode many parts of the message as ASCII or BCD form instead of as binary.

Labels: , , ,

Thursday, April 19, 2007

Rockwell PLC5E and SLC5 over Cellular

Summary: Customers ask me "How much will cellular cost" - this blog post walks through some examples of SCADA-style periodic polling of AB/PLC5E or SLC5/05 using the CSPv4 protocol (aka AB/Ethernet to 3rd parties) over TCP port 2222.

Real-World Numbers
For a simple SCADA-style example assume we need to read 10 words of data (20 bytes) and write 2 words (4 bytes) every time period. Obviously there would be simple optimizations to this, such as only writing data which changes or using PLC MSG blocks to push data from PLC to SCADA only when something changes. However my goal in this blog post isn't to "tweak" a solution to minimize cost, but to examine the protocol impact of using Rockwell CSPv4 over IP.

The table below shows the megabyte per month when polling once per second, per 5 seconds, per 1 minute, per 5 minutes, per 15 minutes, and per 1 hour. There are lots of variables considered ... and many more ignored. The traffic ranges from worst-case of 1005.0 MB for TCP/IP with larger header options polled once per second to best case of 0.2 MB for UDP/IP polled once per hour. This assumes the use of the CSPv4 submode 7, with local LSAP addressing and ignores that Rockwell PLC5E and SLC5/05 don't support CSPv4 within UDP/IP. Raw efficiency at moving the data bytes ranges for about 10% for UDP/IP to barely 1% for TCP/IP; which means most of what you are paying for is not related to actual, meaningful field data.

( Click this image to see a larger version )
CSPv4 Poll Record
(is at http://iatips.com/blogimage/rockwell_cspv4_traffic.png)

Notes on the Table

Since this example reads and writes small amounts of data, it assumes a SLC5-style Protected Typed Read with 3-Address Fields and the corresponding SLC5 write.

The smaller 40 byte TCP/IP header has no options attached; the larger 52-byte TCP/IP header includes the RFC 1323 Timestamp and Window Scale TCP options. These appear to be the normal default for Linux and easily becomes enabled under Windows since all applications share a single setting in the Registry.

The two time columns "15 min (Alive)" and "1 hr (Alive) assume a roughly 4 min 45 sec TCP keepalive to prevent the socket from closing. This reduces the traffic by the extra open/close overhead in exchange for billable TCP Keepalive packets. Keep in mind this ALSO requires the PLC to be properly configured to NOT close the idle sockets. By default, my SLC5/05 seems to close the idle connections in a few minutes.

The two time columns "15 min (Cls)" and "1 hr (Cls) assume the socket is closed after the a data polls, and the TCP socket and CSPv4 session must be reopened for teh next poll.

Discussion
Of course the standard costs of using TCP/IP verse UDP/IP apply:
  • TCP/IP uses larger headers, ranging from 40 to 52 bytes per packet as compared to UDP/IP's smaller 28 byte of header.
  • TCP/IP involves the TCP Acknowledgments, which may result in separate, billable 40 to 52 byte packets moving frequently without any meaningful field data.
  • TCP/IP may require reopening a socket, costing 120 to 250 bytes per open, plus closing costing from 160 to 400 bytes. Exact sizes are hard to predict since both opening and closing of sockets tend to be "pushed" and result in excess retransmissions and retries when high network latency is true.
  • TCP/IP over unknown 3rd party wide-area-network infrastructure requires at least 1 TCP packet to move every 4 minutes 45 seconds to maintain health. This means either a data packet or a TCP Keepalive with data.
It should be clear to see why using UDP/IP over cellular (which is very reliable) is much cheaper than using TCP/IP. There are no socket open, close, acknowledgment, or keepalive costs. Plus field experience has shown that rapid IP retries rarely succeed. For example, a customer polling with UDP every 5 minutes with 3 explicit retries if no answer will likely have the original poll succeed or that poll and all 3 retries fail. Again, cellular is very reliable in that packets almost always make it through unless there is a network or congestion issues and then only time (a few minutes) solves the problem. So such customers have abandoned retries and just ignore 1 failed 5 minute poll and "retry" in 5 minutes.

CSPv4 issues include:
  • Rockwell PLC and software tools do NOT support use of UDP/IP - my tests with UDP/IP have to be conducted with the Digi One IAP which happily bridges CSPv4 between TCP and UDP (as well as to or from Ethernet/IP and DF1).
  • CSPv4 requires the exchange of a pair of 28-byte negotiation TCP packets when a new TCP socket is opened to inform the client (master) of a server (slave) assigned session handle. This nearly doubles the overhead of an open-poll-close socket paradigm.
  • The 28 byte CSPv4 header really contains little useful information; such excess bytes cost nothing tangible under Ethernet but cost cash in the form of requiring larger cell plans over cellar.
In conclusion, CSPv4 will be a rather poor choice for raw, periodic polling of remote AB PLC. ODVA Ethernet/IP (as implemented by all vendors) is even worse.

Your only effective solution at present is to carefully craft a set of MSG blocks to push data from the field in a report-by-exception paradigm. Of course you also must include safe guards within your PLC to prevent rapid, repeated MSG block triggers during system failure that could cost you thousands of dollar ($$$) in a few days.

Labels: ,

Monday, April 09, 2007

Cellular to Allen-Bradley SLC5/05 on TCP 2222

The old CSPv4 protocol.

The Rockwell/AB SLC5/05 and PLC5E natively speak an older "unpublished" protocol named CSPv4, although most third party vendors call it either AB/Ethernet or AB/TCP. It moves only on TCP port 2222 - ODVA Ethernet/IP I/O Messaging is only on UDP port 2222, so they don't conflict. The protocol consists (normally) of a 3-part packet:
  • 28-byte header
  • 4 or 15-byte LSAP or end-point addressing packet
  • PCCC message which is basically what DF1 documents as an Application Packet
In general, the packets are fairly sparse and compressible (if you have the tools to do this). The characteristics of CSPv4 which impact cellular (and wide-area-network) support:
  • Rockwell tools and PLC only support use of TCP/IP and port 2222; this greatly limits use of CSPv4 in NAT'd networks since the remote NAT router can only forward TCP port 2222 to a single remote PLC.
  • CSPv4 includes a single TCP packet exchange to "register a session" or connect. If you are polling faster than the PLC will hang-up on you, then this is not important. However, if you poll slow enough that a new TCP/IP socket must be opened for each poll, then even ignoring the TCP socket open/close overhead this nearly doubles your traffic costs.
  • In tests, a SLC5/05 seems effective at including the TCP ACK response to the host within the CSPv4 data response packet, so you only have to pay for one empty TCP ACK, which is the host's acknowledgment to the PLC for the response.
  • TCP Keepalive could be an issue, since most hosts fail to issue it and the SLC5/05 I've tested against either doesn't issue TCP keepalives
    or does it very frequently.

Labels: ,

Thursday, April 05, 2007

Interested in Cellular? Setup a DMZ Lab

Summary: Last post I suggested people interested in cellular data start by learning how to use their home cable router. In this post I suggest the next step, of how to make your life easy at work once you're ready to start testing real cellular or satellite access by IP.

Your Second Step should be to set up a simple, isolated low-speed broadband link at work ... create your own DMZ lab.

Sigh - I waste so much time listening to customers complain about how difficult it is to get the IT department to give them custom firewall permissions. Since modern "Security" wisdom is to block everything until proven safe, I waste more time asking customers complaining that their Modbus/TCP or Rockwell access not working to first talk to their IT group to make sure they aren't blocking unknown binary traffic by default. I waste yet more time when customers struggle for days and finally have to formally get someone in IT to help study the corporate firewall logs to see if any traffic is getting through or not. An interesting epiphany occurs when I suggest they just look into paying roughly $50 per month for a private connection for this. It is surprisingly cheap to do this and makes a lot of people's jobs 200% easier.

So far the feedback from customers has been quite positive, with IT departments over-joyed at the idea (slight exaggeration :-] lessor-of-two-evils may be a better term). This really makes sense; IT is charged with keeping the corporate system working and secure, so when you ask for yet another odd, unknown firewall hole to be opened, you ask them to risk their jobs. Plus trying to keep custom firewall settings updated for 50 different projects is an ongoing headache and ongoing risk for mistakes. I know that Digi's IT group is very satisfied with their policy of not offering custom firewall rules on the corporate LAN but instead helping teams set up such private connections in a safe, isolated manner.

Simple DMZ Lab Design
The simplest lab design is little more than a copy of what you have at home: a computer or two, an 8 or 24-port Ethernet switch, and a simple NAT router to "share" the internet connection with a dozen devices. This allows you to freely set up a few OPC servers and Master PLC to test access to remote cellular and other wide-area-network based systems.
  • Locate an empty office or lab room for your new network. Perhaps your IT people should pull out or disable the corporate Ethernet in this room. You are going to create a small "DMZ"; a small isolated network that has limited security consequences if you goof up and let a hacker inside. You'll want good security tool installed on your Windows and Linux computer used in here.
  • Arrange for a low-speed business broadband link with one fixed IP address. 256Kbps is more than enough for general PLC/SCADA testing and should cost in the range of $35 to $50 per month. Yes, just $35-50 per month! I had one customer forced to pay his IT department $100 per month to open one TCP/IP hole in the corporate firewall!! Gee, he could install 2 DSL links for that. Now, be patent when you talk to your carrier, as they are geared to sell the expensive primary access lines used for all corporate traffic including servers. Keep stressing that you want a low-speed secondary line for use with some network testing and eventually you'll locate the low-cost plans you want.
  • Set up a DNS name for your DMZ lab. Online dynamic DNS providers support user-selected DNS names for static IP addresses. I use dyndns.org for both my dynamic and static IP but there are many out there. You won't need any form of DDNS update client since your IP never changes and you must enter the name manually anyway.
  • Unless you plan to implement large VPN systems, just buy a nice consumer-grade DSL/Cable Router ... the same kind you use at home is fine. If you plan to set up a serious VPN infra-structure, then you'll just need to bite-the-bullet (& suffer the learning curve) of buying a commercial IT-grade router with VPN server capability built in. Be warned that while many consumer-grade routers mention "VPN Support", they are in fact sub-optimized and documented only for home-office users who connect into a corporate Windows or Cisco VPN server. Normal human beings will find them nearly impossible to set up for anything else!
  • Do you want more than one public IP address? You need to pay a monthly surcharge which varies greatly per carrier, but could be in the range of $25 per month for 8 IP addresses instead of just 1. Plus you will need a larger IT-grade router since the consumer-grade routers won't support more than 1 public IP address. Most users won't need more than 1 IP address. However, having more than one IP address is helpful if more than 1 team shares the lab; this prevents them from trying to setup conflicting router configurations. Also, a few extra IP are helpful if you want to place a PLC "online" for your customers or sales force to access during customer-site demonstrations in the field.
  • If you need to access your corporate network from your DMZ lab, then you need to arrange some rules with your IT people. Perhaps the rule is using a notebook computer with 802.11 wireless to the corporate network is Ok as long as the notebook is NEVER connected to the lab's Ethernet. Remember, since your lab has it's own public IP address you can even use FTP or a VPN client to connect "legally" out your corporate network and back into your lab via the Internet.
Fancier DMZ Lab Design
Since Digi is basically a "communication company", the lab I get to use is much fancier. It has 32 public IP addresses and even limited secure access from the corporate LAN. Of course I have to share this lab with other teams, so I'm not owner of 32 IP. As an example, here is how my lab is setup:
  • Digi's IT group maintains a Cisco PIX router (a mid-range $800 model) that manages the 32 public IP addresses. Actually, this is NOT a complication since this router does not by default firewall any traffic; it merely distributes raw traffic based on public IP to one of many to internal IP addresses in an organized manner. I was lucky enough to get in the lab early and to be assigned 2 of the 32 public IP addresses.
  • My first IP address receives 100% raw internet traffic at an internal static IP I selected; so an external IP such as 70.x.x.140 forwards to my internal IP of 192.168.20.159. Since the goal of the lab is to avoid burdening IT with TCP/UDP port forwarding chores, one could place a consumer-grade DSL/Cable router at this IP. Placing 2 routes in series is NOT a problem - do a net-trace of how you access www.google.com and you'll see a dozen or more routers in series. Instead of a pure hardware box I have a Ubuntu Linux machine running firewall and router tools at this IP. This is where I forward Modbus/TCP to one PLC and Rockwell Ethernet/IP to another PLC. I prefer the Linux box to the $39 hardware box because it gives me a richer view of traffic in and out, plus I can run an Ethernet sniffer such as WireShark to see a complete trace of the 2-way conversation taking place.
  • For my second IP address I had Digi IT setup the PIX router to just forward a simple, safe list of Modbus, Rockwell, Digi, and other industrial protocol TCP and UDP ports. I normally have a Digi One IAP (an industrial-protocol aware Ethernet-to-serial device) at this IP address, but I can safely swap in a Windows machine when a test requires use of Windows tools.
Just as your home router does, the main Digi IT-supplied PIX router does out-going NAT to the internet and internal DHCP address assignment. So our DMZ lab has its own internal subnet of 192.168.20.x addresses. Any device in the DMZ lab can access the Internet - with or without going through my Linux firewall/router. Of course the other teams in the lab don't go through my router, but direct to the PIX router.

Since i study cellular usage, I have have a Digi Connect WAN providing an Ethernet-based cellular router in the DMZ lab. So I really have 3 potential routers to use - while it takes a bit of IP experience to not get confused, this allows me to have devices configured to selectively treat any 1 of the 3 routers as "the default gateway/router".
  • For example, I can have a Master/Client device connect out to a cellular-based Slave/Server using a route such as Master => out PIX+DSL => in Cellular => Slave.
  • For example, I can have a Master/Client device connect via cellular to a DSL-based Slave/Server using a route such as Master => out Cellular => in DSL+PIX => in Linux-Box => Slave.
  • In both of this situations I can see BOTH ends of the conversation, which is a huge help in testing, timing, or troubleshooting new applications.
The bonus for having Digi's IT team involved is the PIX router also allows controlled, secure access into the DMZ lab from our corporate LAN. Technically, the PIX runs a set of rules similar to those used by the main corporate firewall out to the Internet and it just treats the 192.168.20.x subnet as a miniature internet. So I can sit at my desk and safely check on equipment running in the DMZ lab. Of course this is limited to equipment treating the PIX as the default router - if you understand IP routing you'll understand why, but at least it lets me log into my Ubuntu Linux router from desk.

Labels: , ,

Tuesday, March 27, 2007

Interested in Cellular? Do some homework

Summary: Unless you are a pro at IP routing, you'll save money and time by learning the basics of IP routing over your home Cable/DSL connection instead of a demo cellular account. Bottom-line ... if you cannot succeed at using your office computer to poll a PLC placed at home via your Cable/DSL connection, then you will NOT succed trying the same trick over satellite or cellular connections.

Homework - Work at Home
When engineers first launch into a cellular data pilot it can be a bit like Christmas with the excitement of new toys, future trends and being "on top of it". However, I encourage anyone interested in using cellular or satellite-based IP systems to do some home work first ... literally "work at home". You'll save lots of cash and avoid many headaches by learning the basics at home first.

Most of you have cable or DSL router/modem at home, so start there. Take a PLC or controller home. If it has an Ethernet port you are all set; however if your device is RS-232 based, then beg, steal, borrow, or purchase a simple Device Server such as the Digi One IAP (fancier, Modbus and Rockwell protocol aware) or the Digi One SP (much cheaper but just a raw Ethernet-to-serial converter). Your goal is to connect from your office computer over the Internet to this device at home ... if you cannot succeed at this, then you won't succeed at cellular access either! But unlike with cellular, all of your trial-and-error over your Cable/DSL Route won't be costing you by the byte.

Just remember that your "Home Cable/DSL Terms and Service Agreement" likely forbids running "servers" so don't go and try to setup an e-commerce shop once you see how easy it is to access your home from the Internet.

Get to Know Your Cable/DSL Router Box
Hopefully you all have an external commercial router box that you either got from your ISP or bought at any big-box store for $39 to $59. If your computer connects directly into your modem or you were fooled into using Microsoft's "Internet Sharing" tool on one computer, save your sanity and go buy a cheap router box! For your $39-59 you get a 4-port switch, a professional stateful-firewall and NAT (more about that later), a wireless access point, and it all consumes maybe 8-10 watts of power so costs you a few $ a year to run. If for no other reason, you just don't want the mindless broadcasts and hacker probes taking a percentage of your home computer's bandwidth. For my VPN testing I have some Linux boxes up exposed like this and they see up to 50 broadcasts per second and a few dozen probes for open Windows and Unix services per hour. There is NO REASON to expose your home PC to this rubbish - use an external router box ... period.

Step 1: Learn how to log onto your Cable/DSL Router.
  • Under Windows 2000 or newer, open a command window and type the command "ipconfig". You should be shown your computer's current IP Address and the Default Gateway, which is another name for your Cable/DSL router. Most likely the router has an IP such as 192.168.0.1 or 192.168.1.1.
  • Confirm you can ping your Cable/DSL router with this IP
  • Open your web browser and browse to the address - as example type the URL "http://192.168.1.1". You should be asked for a user name and password.
  • Check with your router documentation or go on line to the vendor and read the user guide. For example, at home I have an ActionTec router/wireless access point supplied by Qwest, and when it first came it has no user name and a password of "admin". This is actually not so insecure since by default you can ONLY access this web page from inside your firewall/router. But common sense says changing this name/password is wise.
  • There is no way I can explain how all Cable/DSL routers work, but once you can log in you should be able to find a status web page which gives your currently assigned external IP address and 2 DNS addresses. This is how the world sees your home system - write this info down. For example, my home Cable/DSL router (as of today) has the temporary (dynamic) IP of 63.228.51.x.
Step 2: Nail Down a fixed DNS name for your Cable/DSL Router.
So at this point, you know how to access your raw "face" exposed on the Internet. Now we want to give ourself a nice, memory-friendly DNS name to represent that face.
  • As mentioned above, my Qwest IP is dynamic and liable to change at any time. So while I could go to the office and try to point my OPC server or PLC software at 63.228.51.x, I can never be sure how long this will work. In reality it only changes every few months or if I power-cycle my router, but the solution to this problem is very easy so we should solve instead of work-around it.
  • Sign up with one of the many free online Dynamic DNS providers - I use dyndns.org. The Digi Connect WAN (cellular router) family directly supports this, as do many LinkSys and DLink-class home products. In a nutshell, they allow you to create a domain name such as sillyjoe.gotdns.org or sammy345.dyndns.org and then a client tool on your home system automatically updates this DNS name every time your ISP changes your dynamic IP address.
  • While the above service is free, you may want to pay the $10 or so per year for a minimum account. This makes the service more tolerant of errors on your part - for example many services automatically delete your free account if it is untouched for 45 days and so on.
  • If you have a Windows computer, the easiest DDNS update client is just to download the Windows tool recommended by your Dynamic DNS provider. This client automatically monitors the Cable/DSL router's IP as it accesses the internet. If your IP has changed, it correctly updates the DDNS (dynamic DNS servers). I stress the word "correctly" since many external Cable/DSL Router boxes which support DynDns and such services come with bugs which cause your free service to be deleted within hours of setup. So if you chose to use your Cable/DSL Router to maintain your DDNS name, make sure you have the latest firmware upgrade on it!
  • Within an hour of setup, anyone in the world should be able to ping your new DDNS name and get a response.
Step 3: Learn how to Port-Forward within your Cable/DSL Router.
We are almost ready to try access - but if you point your OPC server at your DDNS name ... nothing will happen since your Cable/DSL Router does NOT understand Modbus or other industrial protocols. Remember, the IP your DDNS name represents is the IP address of your Cable/DSL Router and NOT the IP of your home computer nor is it the IP address of your PLC/controller device.
  • Log back into your Cable/DSL Router and locate the setup for port forwarding. Some routers call it setup for applications and games. We need to configure the router to FORWARD specific TCP and UDP ports to Ethernet-based devices you have at home. If you don't know what that means, you are in for a tough time using wide-area-network technologies - I suggest you go to any bookstore and buy a book on basic networking that covers what TCP, UDP, IP and NAT are. This is really key to success in this area. You don't need to be an expert, but you do need to understand the basics!
  • In summary, if you think of the IP address as being synonymous with the main phone number in a building (aka - how to telephone the building), then the TCP and UDP port numbers are synonymous with phone extensions within that building (aka - how to reach a certain department or service). So for example, a Modbus/TCP OPC server will connect to your router using the IP (main phone number) attached to your DDNS name, and then request a connect to TCP port 502 (the service). We need to configure the router to accept and forward the Modbus/TCP traffic to your Modbus PLC. So take the example of a Modbus/TCP device on my local network with the IP 192.168.1.105. We need the router (at say IP 63.228.51.x) to accept any incoming connection on TCP port 502 and forward the packets to the local Ethernet device at IP 192.168.1.105 TCP port 502. Since the PLC has a web server and most ISP block access to home web servers, we'll tell the router to forward TCP port 8080 to local port 80. Depending on the brand of router you have the configuration can get fancier than that - but basically we'll end up with a line in the table looking something like:

Incoming portServiceLocal IPLocal Port
502TCP192.168.1.105502
8080TCP192.168.1.10580

Step 4: Get to Work Learning
That's it - at this point you should be able to use a Modbus/TCP OPC server to poll your PLC indirectly by polling your DDNS name on the standard TCP port 502. Pointing your browser to http://your DDNS name:8080 will pull up your PLC's web pages (the ":8080" tells the web server to use TCP port 8080 instead of default 80).

Of course there is no security offered here - anyone in the world can access your PLC so this is just for educational purposes. Here are some common port numbers to use:
  • Modbus/TCP uses TCP port 502
  • Digi Ethernet-to-Serial products use ports like 2101, 2102, etc to access a serial device by raw TCP or UDP sockets.
  • Digi RealPort uses TCP port 771 (TCP 1027 for SSL/TLS secure connection).
  • Rockwell AB PLC5E and SLC5/05 use TCP 2222 for the older legacy CSPv4. This is often called AB/Ethernet or AB/TCP by 3rd party vendors
  • Rockwell ControlLogix and ODVA Ethernet/IP uses TCP port 44818, UDP port 44818, and UDP port 2222. But be warned Rockwell tools are very poorly designed for wide-area network use.
  • Siemens S7 protocol uses TCP port 102
  • GE SRTP uses TCP ports 18245 and 18246
  • GE QuickPanels use TCP port 57176 for configuration
Ok, so now you should be able to take any of your Ethernet or serial device and test to see if you can access remotely. You'll likely need to slow down your tool - increase the response timeouts from a few seconds to 10 seconds for direct Cable/DSL access and 30 to 60 seconds for cellular.

Labels: , ,

Wednesday, March 14, 2007

Rockwell PLC and TCP Headers

I have started running some tests of standard Rockwell protocols querying off-the-shelf Allen-Bradley PLC, with the goal to create a series of "estimators" for traffic. A user would enter the data to poll and the tool will estimates the data byte load contributed by this poll pattern.

The Mystery 17% Cost Increase:
Last night I ran a test polling ten words once a minute from an Allen-Bradley SLC5/05C's N7 file over GSM. This is nothing exotic - I ran similar tests a few months ago and had preconceived ideas of what to expect ... beep ... wrong! In between Then and Now, some unknown application changed my Windows XP system registry, enabling the "RFC 1323 Timestamp and Window Scale TCP options". The end result was an unexpected 16.51% increase in data byte traffic with no perceived value.

I have no clue which tool did this; and unfortunately Windows (at least 2K and XP) use a single global setting for the entire TCP stack. I could change it back ... but would that break this other mystery application? Will this other mystery application just change it back? Will I launch a mini cold-war race as this mystery application tries to keep RFC 1323 enabled and my test tools try to keep it disabled?

The Byte Counts with and without RFC1323:
Here is an exact accounting of the change in byte counts - remember, cellular is basically a mobile-IP tunnel which moves TCP/IP or UDP/IP as pure data payload. So you pay for both the IP and TCP headers, plus any data-less TCP Acknowledge or Keepalive packets.

I'll ignore the opening and closing of the socket, plus TCP Keepalive since I'm polling fairly steady-state once per minute. The PLC includes the TCP ACK in the response, so at least we avoid 1-of-2 data-less TCP Acknowledgments.


no RFC1323with RFC1323
Request: IP header2020
Request: TCP header2032
Request: CSPv4 Packet4242
Response: IP header2020
Response: TCP header2032
Response: CSPv4 Packet5656
Client ACK: IP header2020
Client ACK: TCP header2032
Client ACK: (no data)00


no RFC1323with RFC1323
Total Bytes per Poll218254
Total Bytes per Hour13,08015,240
Total Bytes per Day313,920365,760
Total Bytes per Month9,417,60010,972,800

So this means a user doing 1 read of 10 words per minute would magically see a 16.51 % increase in data traffic ... just because they (or the IT department or even Microsoft Windows Update) changes a hidden registry setting. This is yet another example of both how hard it is to keep tight control on your cellular data costs; plus adds to my belief that using off-the-shelf host applications over cost sensitive IP networks is a losing battle. At some point you'll need a tool or device which is 100% "under-control" when it come to packet creation.

Windows Registry Details:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts

Tcp1323Opts
Key: Tcpip\Parameters
Value Type: REG_DWORD—number (flags)
Valid Range: 0, 1, 2, 3
  • 0 (disable RFC 1323 options)
  • 1 (window scaling enabled only)
  • 2 (timestamps enabled only)
  • 3 (both options enabled)
Default: No value. The default behavior is as follows: do not use the Timestamp and Window Scale options when initiating TCP connections but use them if the TCP peer that is initiating communication includes them in the SYN segment.

Description: This parameter controls the use of RFC 1323 Timestamp and Window Scale TCP options. Explicit settings for timestamps and window scaling are manipulated with flag bits. Bit 0 controls window scaling, and bit 1 controls timestamps.

Labels: , ,

Friday, February 23, 2007

Modbus Report-By-Exception over Cellular IP

Summary: While traditionally serial Modbus has been considered unable to use Report-by-Exception, when combined with IP networks Modbus Report-by-Exception becomes very natural and effective.

Modbus/TCP is inherently peer-to-peer
People using Modbus/TCP over Ethernet or IP are familiar with its ability to function as peer-to-peer. Most PLC with Ethernet ports can function concurrently as a Modbus/TCP slave and Modbus/TCP master. So 2 PLC can very easily connect - with 2 separate Master-Slave TCP connections - and share information. One TCP connection is a Master/Slave connection with the first PLC as Master and second PLC as Slave. The other TCP connection is a Master/Slave connection with the first PLC as Slave and second PLC as Master.

Technically, this is not Report-By-Exception in the true sense of a protocol. However, since the PLC-as-Master communication events can be triggered by field inputs, it has the same effect as writing information only upon exception or when change is relevant.

Modbus/RTU as peer-to-peer
Serial Modbus/RTU is a bit harder to use this peer-to-peer trick with. A device with 2 serial ports can of course have 1 port configured as Master to issue remote reads and writes, while the 2nd port is configured as Slave to answer requests. When connected to a 2 serial port Modbus IP to Serial Bridge (such as the Digi One IAP), the 2-port serial RTU becomes a full Modbus/TCP peer, capable of operating fully peer-to-peer with other PLC and SCADA/OPC applications.

However, vendor's aren't blind to the marketing aspect of "more hardware". While adding a 2nd serial port to a one-port RTU may only cost a few dollars, most likely the 2-port RTU is a much more powerful unit, so the actual end-user cost may go up hundreds of dollars. The same is true of Ethernet; while adding an Ethernet port may only cost a few dollars, user's expectations of Web Pages and fancy functions means the Ethernet-enhanced device price may be $500 or more above that of a simple 1 serial port RTU

Fortunately, the Digi One IAP (as well as PortServer family) allow Modbus/RTU slaves to use Report-By-Exception on the serial port. As long as only one serial slave is on each port, the Digi uses configured knowledge to "split" the single serial port conversation into two traditional Modbus/TCP connections. So traditional remote Modbus/TCP masters can query the serial slave RTU, completely unaware that on occasion the serial slave RTU wakes up a acts as a Master to write data during Exceptions. Somewhere, a traditional remote Modbus/TCP slave will receive Modbus/TCP messages from the serial RTU slave, completely unaware that when not busy reporting exceptions, the remote "Master" is really a passive Modbus/RTU slave.

This feature is ideal in wide-area-network situations were bandwidth is limited or data traffic is billed on volume of bytes moved. For example, many SCADA systems only need to check on remote status every few hours ... for example lift pumps in a storm sewer system do absolutely nothing interesting for weeks or even months in the absence of rain. Even during a normal rain, checking on them every few hours is likely enough ... that is *IF* the remote life pumps can send Report-By-Exception messages during system problems.

For example, we have one customer piloting use of Modbus Report-By-Exception over cellular data network. Their eventual target is to poll the remote sites once per day. They use a simple, single-port Modbus/RTU slave which combines I/O with an LCD and push buttons to make a simple, self contained "RTU" or Remote-Terminal-Unit in the truest sense of the word. Use of Modbus in UDP/IP and Report-By-Exception allows this customer to plan for $12 per month per site bills. If forced to poll continuously with Modbus over TCP/IP, they would need to pay $50 or more per site per month. With hundreds or thousands of sites, that is a huge cost savings and opportunity for better ROI (return on investment).

More Information
Here is a general discussion of how to design Modbus/RTU serial slaves and masters to gracefully handle Report-By-Exception:
Here is a more focused discussion of the Digi One IAP and PortServer TS1, TS2, TS4, TS8, and TS16 handle serial Modbus/RTU Report-By-Exception

Labels: ,

Friday, February 09, 2007

Cellular IP-Friendly Apps - Response Delays

Back to my series of entries on creating graceful IP apps

Many newly written Ethernet-enabled applications incorrectly equate "Ethernet = Fast". They overlook that Ethernet is often just a path into other slower IP-based networks. Worse, some well meaning programmers set the response default to 250 milliseconds and limit the user configuration to a maximum of 5 seconds - I'd say so far about 20% of the applications I've had to help customers will limit Ethernet timeouts to 5 seconds or less.

But cellular networks have a high end-to-end latency - especially if the line has been idle for a few minutes. Normal slave response times will be near 2 seconds with round trip delays up between 10 to 12 seconds common each day (see my entry on real world Modbus numbers). Interestingly enough, every cellular "expert" I talk to keeps correcting me that cellular latencies are in the 50 to 100msec range and getting better every new "gen". Well, I guess my Saturn ION can do 400 miles-per-hour also ... if you drop it out of an airplane! Well, regardless of what these "experts" are smoking, my simple tests show otherwise where it really counts ... in actual real world tests run over the Internet to cellular-based IP devices.

Recommendation: IP applications should default to a 3 second response timeout. Applications must allow users to configure this timeout to be lower (perhaps to 250msec) and also higher to at least 60 seconds.

Impact: On Ethernet this should have no direct consequences since the timeout only has affect if the remote is no longer available - in which case the remote is going 'offline' anyway. The minority of users who really want a 250 millisecond timeout can set it manually, while cellular users who want a more reasonable timeout of 15 seconds can also set it also.

For cellular networks, the real problem with premature timeout is the customer has already paid for the request and very likely will also pay for the response - even if the response comes after the application gave up on the response and did a request retry. Assuming the user is polling the remote at a moderate pace to control costs, there is no harm is waiting longer for the response to maximize the value of the traffic paid for.

Another simple example is an application that sends a request, then timeouts twice and retries twice. How will the application react when it receives three responses at the same time? Remember, the first two requests probably were not lost; they still likely reached the remote device and created responses. Their responses may have been just delayed longer than expected. Since serial Modbus doesn't include enough information in a response to match it up to a request, this can cause serious misoperation of the system. Protocols including a sequence number should handle this more gracefully, but it will still be a waste of money.

We have also seen protocols which treat unexpected responses as a reason to abort and reset the communication channel, which further adds to cost. For example, we had one super headache with a big-name seller of "energy curtailment" systems. The end user insisted a 5 second timeout was the maximum they could tolerate (ie: wishful thinking - set a 5 second timeout regardless of reality). So lets just see what happens when we hit one of the rare but expected latencies over 10 seconds.
  1. SCADA software sends out request sequence 74
  2. 5 seconds later, SCADA times out 74 and sends out 75
  3. 5 seconds later, SCADA times out 75 and sends out 76
  4. 1 second later - since TCP/IP is reliable - all three responses return
  5. SCADA is expecting response 76, but sees 74 ... Oh, big problem ... need to reset comm subsystem
  6. SCADA sends reset to remote RTU, expects response 1 but ... da da ... sees response 75 since they never flushed the old info and TCP/IP is reliable.
  7. SCADA sends a 2nd reset to remote RTU, expects response 1 but sees response 76 since they never flushed the old info and TCP/IP is reliable
  8. At this point, I hope you see that there are still 2 responses to the comms reset in the receive queue!

Anyway, whenever this reset "temper-tantrum" occurred it would take 10 to 15 minutes to get the connection back up. Of course one problem was the stupid customer unwilling to set the correct timeout, but the SCADA software was defective since it wasn't smart enough to just discard old responses with timed out sequence numbers. In the above example, life would have been fine and dandy had the SCADA system just discarded responses 74 and 75 since it expected 76.

Labels:

Wednesday, February 07, 2007

Cellular and DNP3

I was just at the Distributech show earlier this week - the show for power utilities. Lots of interest in cellular access. I know both OSI and Itron have successfully tested their software against our cellular product.

I have a DNP3 RTU up on my public cellular device, but need to confirm details of how the public can access it. I also had a discussion with the primary provider of DNP3 source code in the world and we will be looking at putting a DNP3 slave simulator up via cellular. It would be really userful if this could expose some of the statistical & diagnostic info managed by the simulator. This would help software vendors fine tune their software to handle the variable latency of cellular.

Labels: ,

Thursday, February 01, 2007

Do Users Really Want Industrial Ethernet?

(For those impatent to read this to the end - I'm not saying don't use Ethernet ... I am just saying be careful you understand what your customers expect and what functionality they will assume you include *for free* when you add Ethernet)

My last post created some interesting feedback. But I want to emphasize a topic from that post more fully. For the last 15 years I've been involved in the "multi-vendor interface" business - linking multiple vendors' equipment by data comms. First I worked in RS-232 and 485, then fiber optics, then Ethernet, and now by virtually every technology that moves TCP/IP.

From time to time I am contacted by some pretty desperate customers - for example I had one customer who had piloted some Ethernet-based temperature sensors. Things worked fine in the lab with their lab computer, so they bought 50 ... only to find out they couldn't use them. It seems these sensors really were "just Ethernet" - they talked by Ethernet broadcast and direct MAC-layer packets. They didn't support TCP/IP and therefore could NOT be routed by any standard network infrastructure. The user could not talk to any of the sensors they had intended to install in panels around the plant because the "Computer Room" wasn't on the same physical Ethernet segment as the "floor". There was no way to broadcast or unicast MAC-level between the systems. This customer hoped I knew of some magic box to act as gateway between TCP/IP nodes and pure Ethernet nodes; I didn't.

So this brings me back to the concept of the true cost to implement "Ethernet". Customers who ask for Ethernet are not really asking for Ethernet hardware or an Ethernet media bus. They have the expectation that they can interface your "Ethernet Devices" with the wide variety of other equipment they have - including WiFi, routed Ethernet, fiber optics, wide-area networks, and so on. They also expect (at least in a future firmware rev) web pages for configuration, SNMP for remote management, strong encryption, and so on.

So the term "Ethernet" has taken on a life of its own - remember when 802.11 was called "Wireless Ethernet". Well, there is absolutely NOTHING Ethernet about 802.11, yet it was a useful PR move to link the two. No doubt it helped spread the acceptance of WiFi as we now call it. Interestingly enough, the current PCI verse PCI-Express adapters you buy for a PC are using the same PR trick - linking a new, unknown technology to an old established technology that merely accomplish the same function by very different means. Maybe Sony should have called Beta-Max VHS-Max instead ... but then I'm showing my age by even knowing that a consumer-oriented video standard other than VHS even existed.

But back to Ethernet. If you are a small device maker and have yet to start making Ethernet-based products, just be aware that customers who ask for "Ethernet products" aren't really asking for ... err, products with Ethernet. They are asking for products which integrate into (at a minimum) the wide family of TCP/IP based technologies out there. I am not even talking about should you support Modbus/TCP or ODVA Ethernet/IP or ProfiNet yet. I am just saying customers will expect your "Ethernet products" to be able to hold a raw TCP/IP or UDP/IP conversation with all of the other equipment they are investing in daily.

So the cost to add an Ethernet port is just a small part of your cost to "add Ethernet". That is why companies like Digi can sell Ethernet-to-Serial converters or sell "async Ethernet driver chips" like the Digi Connect ME which links to your CPU's serial UART. These devices of course cost more than $9.95 or even the cost of a few new hardware chips, but that higher cost is paying for TCP/IP, web servers, SNMP servers, strong AES encryption and all of the other things your customers expect when the buy "Ethernet products".

So to digress a bit, I suffer this "Oh, don't worry ... it's Ethernet" on a daily basis. So far I have to say at least 95% of the off-the-shelf software applications I test supporting TCP/IP don't work well with technologies other than direct Ethernet. This includes problems not only when extremely different media like satellite or cellular, but even when WiFi is used. So that is part of my mission in this blog - what you want is NOT to Ethernet-enable your products. Instead you need to "IP-enable" your products by way of an Ethernet interface.

Labels: , ,

Friday, January 12, 2007

Cellular Costs - bytes you pay for each month

Sadly, we are about to enter one of the dark-arts of cellular usage ... what are you actually billed for. Given the 50 page voice cell phone bill my family gets each month, one would NOT think the cell phone companies lacked the ability to explain - let alone document - what they charge data users for! It is not that one cannot get a verbal answer from cellular providers' engineers; one can get too many different answers.

However, there are some facts we can know.

An example: Modbus/RTU via TCP/IP, one poll per 10 minutes
Let us build up an example. Start with a customer named Joe who plans to poll 10 words of data every 10 minutes, or 4320 polls per month. Under Modbus/RTU this would be 8 bytes in the request and 25 bytes in the response. So Joe starts with a the wonderful view that he'll only be moving 143K per month and maybe one of those $3.95/month plans for half-a-meg will fit nicely.

Sorry to throw some cold water on Joe's euphoria, but Joe still must pay for the TCP and IP header overhead. After all, the cell data network is in effect "tunneling" his TCP/IP and Modbus/RTU and so treats even the TCP and IP headers a billable payload. So Joe needs to consider that 4320 round-trip polls per month results in 8640 TCP/IP data packets and potentially another 8640 TCP acknowledge packets. Perhaps half of these TCP acknowledge packets will be merged with the TCP/IP data packet returning the Modbus/RTU response ... but then again maybe they won't. So to keep it simple and budget safe, Joe should assume worst case and that all 8640 TCP acknowledgements travel alone. Assuming each IP header is 20 bytes and each TCP header is another 20 bytes (they may be 28 is you use Linux), this amounts to another 17,280 times 40 bytes or 691K bytes (0.7MB) JUST for the theoretical TCP/IP overhead. Joe is up to 834K per month now - clearly a 1MB/mo or larger plan is required.

Ok, wait a second ... now why did I say "theoretical TCP/IP overhead"? Because in reality Joe will end up moving more TCP/IP traffic than the 4320 polls strictly require. The first extra overhead will be from premature TCP retransmissions. The high variable latency of cellular means Joe will see from 2% to 10% retransmissions, and since cellular is very reliable, each transmission will result in duplicate TCP acknowledgements as well. Sticking to worst case, budget-safe assumptions Joe should budget about 10% or 100K per month for premature TCP retransmissions. So now Joe is up to 934K per month.

However, there is yet another overhead Joe should budget for - TCP Keepalive probes to detect lose or death of the TCP socket. Without this, one end of the connection could go away and the other end would never know and never recover the socket resource. Since wide-area-networking is involved, Joe also needs to assume at least one intermediate device will abort and discard the TCP context if idle more than 5 minutes. Given Joe polls every 10 minutes, he'll need at least one TCP keepalive exchange between each poll. Each TCP keepalive exchange consists of another 40 plus 40 bytes, so we are talking 4320 x 80 bytes or another 346K of billable traffic. This puts Joe up to 1.28MB of billable traffic to move 143K of Modbus traffic.

Now, why not close and reopen the socket? Yes, that is an option but each TCP close and reopen generates about 320 bytes - not including TCP retransmissions. So Joe can either pay for 346K worth of TCP Keepalive or 1.38M of TCP socket thrashing; which would be 1.28MB and 2.32MB per month respectively.

So Joe is up near 1.5MB per month just to move his 10 registers of data once per 10 minutes, and this doesn't include any time he checks the web interface of his cellular device for status (say another 200-500K per access), nor does it include any on-demand HMI data access screens which trigger other Modbus/RTU polls. These could easily create many MB of traffic per month and requires carefully, mindful behavior by Joe and his colleges to control costs. One careless person can easily drive the cellular bill up by hundreds of dollars in a month!

Summary:
  • Raw Modbus/RTU data = 140K per month
  • Basic TCP/IP headers to move and acknowledge data = 691K per month
  • Estimated 10% premature retransmission = 100K per month
  • One TCP Keepalive exchange between 10 minute polls = 346K per month
  • Overall, Joe should expect at least 1.5MB per month and I'd suggest he budget for 3MB or even 5MB. This puts him up into the $20/month cell plan range.

Labels: , ,

Saturday, January 06, 2007

Real Numbers (Part 2) - Modbus/RTU over cellular

My previous post covered Modbus/RTU polls once per 30 seconds. Here is a second set of results for 24 hours with polling once every 60 seconds.
  • Of 1440 polls, only 3 failed responses with a 30 sec timeout.
  • Fastest poll was 451 msec round-trip (1/2 sec)
  • Slowest poll was 10,407 msec round-trip (10.5 sec)
  • Statistical average of successful polls was 1748 msec
  • In chart below, white dots are round-trip times and black line is the moving average over a 15-minute period.
  • Notice a few interesting trends; such as the very fast response around 11am and again around 3am the next morning.
24 hours chart
(Click the image to see a large version)
(Click here to download ZIP file with Excel and OpenOffice spreadsheet of times)

Labels: ,

Friday, January 05, 2007

Real Numbers - Modbus/RTU over cellular

I'm working on a fuller set of numbers, but here is a real-world example of Modbus/RTU poll-response times over cellular.

I am running a Modbus/TCP poller (ModScan32), which polls a Digi PortServer TS4H by Ethernet, and the TS4H in turn is using Modbus/RTU-in-TCP/IP via the Digi corporate backbone to poll a Digi Connect WAN on Cingular GSM with a Twido PLC on the serial port.

Why use the TS4H here? Why not go directly from a Windows computer to cellular? Well, three reasons:
1) The Digi TCP/IP stack is much more graceful over cellular than Windows' TCP/IP stack - the Windows stack retries too hard and wastes bandwidth that some patience would save. With Windows you'll commonly see 2 to 5 percent retransmissions which - given how cellular is VERY reliable - ends up doing nothing but create duplicated TCP acknowledgments you must pay for. This is actually a weakness in the design of TCP/IP; which proponents claim is SOLVED by TCP/IP as-is. TCP allows hosts to auto-adjust timing behavior to match real-world performance. Unfortunately, the standard TCP algorithms keep timing too close to the "average" behavior which wastes CASH over cellular links with high and variable latency.
2) I am testing a slave timeout algorithm in the Modbus Bridge code for the Digi One IAP and TSx family related to "stale" responses arriving after the slave timeout. This is a common weakness in Modbus/RTU hosts which assume either a timely response or NO response.
3) The Digi Modbus Bridge keeps nice timing statistics such as min/max/average round-trip delay and most Windows tools do NOT.

So for example, my Master polls once per 30 seconds. This means the GSM modem in the Digi Connect WAN maintains a constant data slot allocation with the cell tower. After 370 polls, 352 have had a round-trip of 2500 msec or less and 18 polls have had a round-trip above 2500 msec (ie: I have seen 18 timed out requests with a "stale response" arriving AFTER the timeout period - behavior I am investigating).

The Digi PortServer TS4H telnet trace includes this info:
01:38:15 IA INFO: mbrtu:s02 complete rsp min:467 avg:1565 max:9142 msec

This means the fastest poll took only 467 msec, the slowest took 9142 msec (nearly 10 seconds!) and the moving average round-trip time is about 1.5 seconds. So the minimum round-trip was only one-third of the average, while the maximum round-trip was nearly six-times the average. Since every poll is exactly the same, one would NEVER see such variance on a direct RS-232 or RS-485 serial line. I'd be surprised if a PLC would have even a +/- 10% variance in response times. This is one of the problems with using off-the-shelf tools with cellular - the vendors have just NOT designed the tools for such variation in response performance.

As a side note, the polls being less than 40-50 seconds is of interest here because the cell tower (with 2.5G GSM/CDMA) will take away the data slots from the modem if it has been idle too long - the time varies but is often in the 40-50 seconds. When this happens, the modem is still connected to the tower but requires some control traffic to be reassigned bandwidth to move data. So using a poll rate slower than this idle time would shift the average round-trip time up. Once cellular system make the move to 3G this "idle period" will drop to a few seconds only, meaning telemetry systems may perform much WORSE in the new faster networks.

Labels: ,

Thursday, December 28, 2006

Cellular IP-Friendly Apps - Never Block on TCP Sockets

Traditionally programmers have assumed that a TCP packet will either make it to the remote peer error-free or the TCP socket will be detected as failed. However, this has proven a disastrous assumption in the world of cellular networks.

Cellular networks seem to suffer a kind of burst-error mode where whole groups of TCP packets get lost or delayed, while another group makes it through. This seems to confuse the TCP state machines within OS which are optimized for the more rare, single-packet loss of Ethernet. We have Ethereal traces where one can see the application send a TCP packet, the OS retries once, a collect of old stale TCP acknowledgements return from the remote - then nothing. Eleven hours later there have been no more TCP retries, no TCP keepalive, no response from the remote, and no TCP stack error from the OS to abort the application block. The host application is still hung, blocking waiting for either a response or socket failure which never come.

So is this a bug in the OS? Does it matter? It is your application and "our" customer who pays the price. For example, Digi had to go through our RealPort driver and literally add an OS timer to abort every TCP socket call if it did not return in 60 seconds. Yes, this sounds like a royal pain but it was the only way to avoid this failure every few weeks when running across cellular IP.

Recommendation: applications must NEVER block on a socket waiting for a response or socket failure. Applications must always use an OS or external timer to abort socket functions that take longer than 1 minute. Sadly, running in non-blocking mode is NOT enough since at times it will be the API call which fails to return regardless of the block/non-block setting. So even using API calls with explicit timeouts is not safe.

Labels:

Friday, December 15, 2006

Rockwell AB PLC via Cellular

So far we have succeeded in getting several Rockwell/Allen-Bradley PLC up on Cellular with the Digi Connect WAN, which is a cellular router for GSM or CDMA with local Ethernet and serial port.

In Summary:
  • Serial DF1: You can access serial MicroLogix PLC such the MicroLogix 1200 on the remote Digi Connect WAN's serial port. You either need to have an OPC server which can directly encapsulate DF1 protocols into TCP/IP or to use Digi RealPort to create redirected COM ports for RSLinx. Ideally, using the newer DF1 Radio Modem protocol can cut your data costs in half, but DF1 Full-Duplex or Half-Duplex can also be used. DH485 won't work via cellular due to the high latency. You must slow the PLC (ACK) timeout setting down to 30 seconds, so you cannot use a MicroLogix 1000 since it doesn't allow this parameter to be adjusted. DF1 Radio Modem has no DF1 (ACK) or (NAK), which is why it costs less to use.
  • CSPv4 or AB/Ethernet: You can access legacy PLC such as SLC5/05 and PLC5E by enabling TCP port forwarding of port 2222 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Ethernet Driver", then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. For a bit of fun, open this link in your browser and you will access the web pages of my SLC5/05 through Cingular/GSM cellular - http://digiwan.gotdns.org:8080/. But please don't leave this page open since you'll impact other people trying to look at my cellular PLC.
  • Ethernet/IP: You can access ControlLogix and other newer PLC supporting Ethernet/IP by enabling TCP port forwarding of port 44818 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Remote Devices via Linx Gateway" Driver, then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. You cannot use the RSLinx Ethernet/IP driver since it relies on UDP broadcast which cannot move across wide-area-networks.


If you want more detailed instructions, I have an application note online here:

90000772_A_Cell_AB.pdf

Labels: , ,