Lynn's Industrial Protocols over IP

Wednesday, August 22, 2007

The Truth about Cellular IP

I have been very busy analyzing real-world telemetry traffic over cellular IP, and unfortunately I am now 100% convinced that you cannot effectively use most (any?) off-the-shelf "Ethernet" software tools to talk to remote Ethernet devices over cellular IP. The bottom line is that - unless your host app is custom written to be sensitive to data cost and time delay - your data costs will be bloated due to the nature of the tool. Even something as "obvious" as adding compression doesn't solve the problem because telemetry packets tend to be too small for effective compression. For example: an 8-byte Modbus/RTU request becomes 12 bytes after ZIP-style compression. Plus this doesn't reduce the 104 bytes of TCP overhead nor the 28 bytes of UDP overhead. None of the cellular providers allow use of RFC-class TCP header compression, since it requires all of the infrastructure to maintain copies of headers etc.
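
The compression point is easy to check. The sketch below builds an 8-byte Modbus/RTU read request and runs it through Python's zlib. The slave address, the register range, and the use of zlib as a stand-in for "ZIP-style" compression are assumptions made for illustration; the exact compressed size depends on the compressor and its framing, so the script only shows that the result is larger than the input.

```python
import struct
import zlib

def modbus_crc16(frame: bytes) -> int:
    """Standard Modbus/RTU CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF)."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def read_holding_request(slave: int, start: int, count: int) -> bytes:
    """Build a Modbus/RTU 'read holding registers' (function 3) request."""
    body = struct.pack(">BBHH", slave, 3, start, count)
    return body + struct.pack("<H", modbus_crc16(body))  # CRC is sent low byte first

# Assumed example values: slave 1, registers 0..9
request = read_holding_request(slave=1, start=0, count=10)
compressed = zlib.compress(request, 9)

print(len(request), "bytes raw ->", len(compressed), "bytes compressed")
```

There is simply not enough repetition in 8 bytes for a general-purpose compressor to exploit, and its own framing bytes outweigh any gain.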

So I have been working on "reduction" solutions - how to obtain the effect of moving "X" IP packets but only moving "X-minus-a-bunch" of actual IP packets.

Tunneling TCP thru UDP
The most promising and generic form of reduction is to tunnel TCP/IP via UDP/IP over cellular. So the host application talks TCP/IP to a local proxy, which acts as the TCP end-point. All of the TCP SYN, ACK and Keepalive traffic is limited to the local Ethernet. The local proxy then initiates a UDP "session" with a remote proxy over cellular, and we instantly see a 60-90% reduction in data costs. The remote proxy initiates a TCP/IP connection to the remote Ethernet device, which again isolates the extra TCP overhead to the remote Ethernet.
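
To make the idea concrete, here is a minimal sketch of the local half of such a tunnel in Python. It is an illustration rather than a production proxy: the function name `serve_one_client`, the 2048-byte buffer, the 5-second reply timeout, and above all the assumption that each `recv()` returns exactly one complete request are choices made for this example.

```python
import socket
import threading

def serve_one_client(listener, remote_addr, reply_timeout=5.0):
    """Accept one TCP client and relay each request as a single UDP datagram."""
    conn, _ = listener.accept()            # the TCP handshake stays on the local LAN
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.settimeout(reply_timeout)
    with conn, udp:
        while True:
            request = conn.recv(2048)      # assumption: one recv() == one request
            if not request:
                break                      # host app closed its TCP socket
            udp.sendto(request, remote_addr)   # the only packet sent toward cellular
            try:
                reply, _ = udp.recvfrom(2048)
            except socket.timeout:
                continue                   # lost in the tunnel: the host app times out
            conn.sendall(reply)            # hand the response back over local TCP
```

A caller would create a listening TCP socket, then run `serve_one_client(listener, (remote_ip, remote_port))`, typically in a thread via `threading.Thread`. The remote proxy would be the mirror image: receive a datagram, forward it over a local TCP socket to the device, and return the response as one datagram. A real proxy would also need to match replies to requests so a late reply is not handed to the wrong poll.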

The reaction of non-IA network engineers to this idea is predictable and a bit humorous after a while. They immediately say "You cannot do that!!! UDP/IP is unreliable!!! You'll break something!!! You are committing a mortal Sin!!!" But in reality none of the IA protocols leverage the reliability of TCP anyway. For example, Rockwell RSLogix doesn't send a program block to a ControlLogix and blindly assume it was successful after the TCP Acknowledge from the peer is processed. Instead, RSLogix sits (blocks literally) and waits for a successful CIP response on a single CIP Connection. So if the local proxy returns a TCP-ACK to the RSLogix host and the CIP request is lost within the UDP/IP tunnel ... eventually RSLogix times out the CIP connection and the application (and/or user) will restart.

Fortunately, cellular is very reliable - all of my tests sending 10,000 UDP packets rarely even lost 1 packet and I'm not sure if such a rare loss is due to cellular or just my test script hiccupping & dropping a packet. Plus cellular tends to have only very bursty error problems. In other words, you won't lose 1 packet per 10,000; instead you'll lose all packets for 5 minutes or just 5 random packets out of a group of 10 sent. This shotgun-damage tends to confuse TCP/IP state machines to the point that they abort the connection anyway. In truth, in all of my Wireshark/Ethereal trace reviewing I have never seen a single situation where a TCP retry did anything but add data cost; every TCP retransmission just results in a "Duplicate ACK" showing up a few packets below in the trace & a doubling of the cost of that block of data.

So overall, anyone planning to use cellular should first investigate if they can use UDP/IP instead of TCP/IP.

TCP Problem #1 - added cost for pointless ACK
As mentioned above, real-world analysis of telemetry use of TCP shows the TCP ACK isn't useful; but worse, Embedded TCP devices tend to sub-optimize the ACK timing to "speed up" data transmission and recovery. Almost universally moving an IA protocol via TCP/IP results in 4 TCP packets instead of the idealized 3.

  • Your app sends a TCP request (request data size + 40-52 bytes of overhead)
  • 800-1100 msec later your app receives a TCP ACK without data (another 40-52 bytes of overhead)
  • 10-40 msec later your app receives the protocol response (response data size + 40-52 bytes of overhead)
  • Within a few msec, your app sends the TCP ACK without data.

So what would have been a 2-packet transaction with only 56 bytes of overhead under UDP/IP, or what should have been a 3-packet transaction with 120-156 bytes of overhead under ideal TCP/IP usually becomes a 4-packet transaction with 160-208 bytes of overhead. Yes, there exists a TCP socket option and the concept of "Delayed ACK Timer" to prevent the first empty TCP ack from being returned over cellular, but few embedded products use this since it adds code complexity, and it slows down overall data communications. At least in the IA world it seems everyone wants their Ethernet Product costing 2-4 times more than their serial product to appear lightning fast. So they ignore the TCP community's decades of hard-earned experience and "hack" their TCP stack to sub-optimize fast local Ethernet performance.
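
The overhead figures above are plain multiplication, which the short sketch below reproduces. The constant names are invented for this example; the header sizes are the standard 20-byte IPv4 header, 8-byte UDP header, and 20-byte TCP header, with the larger TCP case assuming the 12-byte timestamp option.

```python
IP_HEADER = 20              # IPv4 header without options
UDP_HEADER = 8
TCP_HEADER_PLAIN = 20       # TCP header without options
TCP_HEADER_TIMESTAMP = 32   # TCP header plus the 12-byte timestamp option

def header_overhead(packets: int, transport_header: int) -> int:
    """Total billable header bytes for a transaction made of `packets` packets."""
    return packets * (IP_HEADER + transport_header)

tcp_sizes = (TCP_HEADER_PLAIN, TCP_HEADER_TIMESTAMP)
udp_poll = header_overhead(2, UDP_HEADER)
tcp_ideal = [header_overhead(3, h) for h in tcp_sizes]
tcp_typical = [header_overhead(4, h) for h in tcp_sizes]

print("UDP request/response:  ", udp_poll)      # 56
print("TCP, idealized 3 pkts: ", tcp_ideal)     # [120, 156]
print("TCP, typical 4 pkts:   ", tcp_typical)   # [160, 208]

for tcp in tcp_typical:
    saved = 100 * (tcp - udp_poll) / tcp
    print(f"header bytes saved by UDP: {saved:.0f}%")
```

This counts header bytes only; the protocol payload is the same either way and is left out.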

So this is where the instant 60-90% data cost savings of using UDP over TCP comes from. UDP has smaller headers and results in fewer packets being sent. Since the cellular IP system is "encapsulating" your TCP/IP packets in a manner similar to PPP, the entire IP header, TCP or UDP header, and your data is all considered billable payload.

There is also a myth propagated to this day that the TCP ack causes retry to occur more rapidly out in the wide-area-network infrastructure. The rhetoric goes, "If the 3rd and 4th router link is congested and the TCP data packet is lost, then the 3rd router will retransmit ... which is faster ..." Perhaps this was true back in the 1980's, but today the 3rd and 4th router (and all of the other 20 to 30 routers in a cellular end-to-end path) are just tossing IP packets upstream with no awareness of the packet functions. In reality, it is only the TCP state machines within your host machine and within your remote device that have any ability to retransmit anything.

TCP Problem #2 - added cost for premature retry
The TCP RFC includes many dynamic timers that automatically adjust themselves based on real-world performance. This is actually pretty neat. It means if the TCP ACK and response times tend to be longer than normal, then the TCP state-machine slowly increases the delay before retransmission. But I've seen 3 problems with this.

  1. The most effective way to leverage auto-adjust is to include the 12-byte TCP header option that timestamps all packets. Linux systems add this by default, and installing one of many PLC engineering tools on your Windows computers causes Windows to also start always using this. The setting generally is global - you either have 40-byte TCP headers or 52-byte TCP headers forever. So for small telemetry packets, this adds a disproportionately large increase in data costs.
  2. Many embedded devices (PLC, RTU and I/O devices) have "hacked" the TCP ACK sub-system to force connection failure to be faster than the standard 3-4 minutes. For example, I worked with one large PLC company which expected TCP socket failure in less than 1 second, so they forced TCP retransmission in hundreds of msec and without any normal exponential backoff between retries. This is totally unusable over cellular; you will end up with 30% to 90% of your data traffic being premature retries and responses to premature retries. I have literally seen Wireshark/Ethereal traces which are mainly black lines with red text - which is the default color used to show TCP "problems" such as lost-frag, retrans, dup-ack, etc.
  3. The latency in cellular is abnormal by an order of magnitude. Even browsing the internet or doing a telemetry polling test over DSL/cable broadband averages latencies in the 100-150 msec range. This is what Windows or Linux defines as "slow/bad" - not the 800 to 3500 msec of cellular. So even watching a Windows or Linux TCP state machine auto-adjust the retransmission delay over time, you will not see it achieve a 100% effective setting which eliminates wasted TCP retransmissions. The delay seems to top out at about 1.5 to 1.8 seconds, which is just too close to the actual "normal" latency range.

So again, use of UDP/IP frees the user from data costs associated with TCP legacy assumptions - both the main-stream MIS/IT market variety of assumptions and the IA vendors' misapplied "speed-ups".

TCP Problem #3 - uncontrollable SYN/Socket Opens

Given the way all cellular systems "park" inactive cellular data devices, it is exceedingly rare to ever see a host app open a new TCP socket without prematurely retrying/retransmitting the SYN packet. This is because one is virtually guaranteed that it will take about 2.5 seconds for the data device to be given active airwave resources and return the SYN+ACK response. This has NOTHING to do with the "always connected" feature Digi and others claim. The data device (even when parked) is fully connected by IP and fully authenticated by the system - it is "always connected". However, the local cell tower only has finite airwave resources, so any device (cell phone or data device) which is idle from 3 to 45 seconds is "parked" without having any preallocated airwave resources. Literally when the TCP "SYN" shows up, the cell tower has to use the control channel to inform the data device to request airwave resources, and after these are requested and allocated the data device can receive and respond to the TCP socket open request.

But that's not the real problem related to TCP Socket Opens ... the real problem is yet another case of IA vendors sub-optimizing TCP behavior for fast local Ethernet performance. For example, I once had a customer who normally paid about $40 per month receive a $2000 bill one month. It turns out they had powered down the remote site for 3+ days and the off-the-shelf 3rd-party host application they used would try to reopen the TCP socket every 5 seconds!!! So Windows would send the initial TCP SYN to start the open; since the remote was off-line, Windows would retransmit this TCP SYN a few seconds later. After a total of 5 seconds, the application would ABORT this TCP socket attempt and start a new one. So this host app was pushing 24 billable TCP packets per minute out to a remote site that was powered down. This was nothing the host app vendor documented, nor was it anything a user could configure or override. The user could configure the host app to ONLY poll once per 5 minutes, but the user had no control over this runaway TCP SYN/Open behavior.
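
The 24-packets-per-minute figure, and how it adds up over an outage, can be worked through as below. This is only a sketch: it assumes exactly two SYN packets per 5-second attempt, exactly 3 days of outage, and the 40 to 52 byte header range used elsewhere in this post (real SYN packets often carry a few extra option bytes). It does not try to reconstruct the bill itself, since that depends on plan details not given here.

```python
SYNS_PER_ATTEMPT = 2       # initial SYN plus one retransmission by Windows
ATTEMPT_PERIOD_S = 5       # host app abandons the open and starts over
OUTAGE_DAYS = 3            # assumed: the story says "3+ days"

packets_per_minute = SYNS_PER_ATTEMPT * 60 // ATTEMPT_PERIOD_S
total_packets = packets_per_minute * 60 * 24 * OUTAGE_DAYS

print("SYN packets per minute:", packets_per_minute)   # 24
print("SYN packets over the outage:", total_packets)   # 103680

for header_bytes in (40, 52):
    wasted = total_packets * header_bytes
    print(f"{header_bytes}-byte packets: {wasted:,} bytes sent to a powered-down site")
```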

Tunneling TCP through UDP effectively decouples the TCP SYN/Open from cellular data charges. The first TCP Syn/Open request to the local proxy would succeed even if the remote IP site is offline. No retries would be required. Even if the host app attempts to retry the data poll every 5 seconds, this is something the UDP proxy can be configured to "resist". If the user truly wants data packets to only move every few minutes, that is something the UDP proxy can easily enforce.
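
One way a proxy could "resist" an over-eager host app is a simple minimum-interval gate in front of the cellular link. The class below is a hypothetical sketch, not taken from any product; the name `PollGate` and the injectable `clock` parameter are choices made for this example.

```python
import time

class PollGate:
    """Let at most one request per `min_interval` seconds reach the cellular link."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the gate can be tested without waiting
        self.last_sent = None

    def allow(self):
        """Return True if a request may be forwarded now, False if it must be held back."""
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.min_interval:
            self.last_sent = now
            return True
        return False

# A host app retrying every 5 seconds against a 5-minute gate
gate = PollGate(min_interval=300)
print(gate.allow())   # first poll goes out
print(gate.allow())   # an immediate retry is held back locally
```

What the proxy does with a held-back request is a separate design choice: it could answer from the last good response, or simply drop the request and let the host app time out locally at no data cost.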

TCP Problem #4 - sub-optimized TCP keepalive

The final problem I'll discuss (but not by any means the "last" problem with TCP) is that many embedded IA devices have relatively fast TCP Keepalives hard-coded to speed up lost-socket detection. While this is an admirable goal, a Rockwell PLC sending out a TCP Keepalive at a fixed 45 second interval can create up to 6 MB of monthly traffic by doing this. Siemens S7 PLCs seem to issue a TCP keepalive every 60 seconds - a bit better, but not by much. Maybe such a heart-beat is useful to know the remote is accessible, but given the reliability of cell phones (when was the last time you had a dropped call or no signal ...) you'll obtain a lot of false alarms if you treat every missed packet as something requiring maintenance's attention.
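
The "up to 6 MB" figure can be reproduced with a few lines of arithmetic. The sketch assumes a 30-day month, that every keepalive probe is answered by an empty ACK, that both packets are billable, and that each is 40 to 52 bytes of pure header; the function name is invented for this example.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

def keepalive_bytes_per_month(interval_s: int, header_bytes: int) -> int:
    """Bytes per month for keepalive probes plus their ACKs, headers only."""
    probes = SECONDS_PER_MONTH // interval_s
    return probes * 2 * header_bytes      # one probe and one ACK per interval

for interval in (45, 60):
    low = keepalive_bytes_per_month(interval, 40)
    high = keepalive_bytes_per_month(interval, 52)
    print(f"every {interval} s: {low / 1e6:.1f} to {high / 1e6:.1f} MB per month")
```

With the larger 52-byte headers, the 45-second case comes to just under 6 MB per month without a single byte of field data moving.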

Again, tunneling TCP through UDP effectively eliminates the automatic, possibly uncontrollable use of TCP Keepalive. If your process can handle you talking to it once an hour, then the cost of TCP socket open and close, as well as any TCP Keepalive is all wasted investment.

Not only this, but the cellular providers do NOT want users who send a simple, rather empty packet every 30 to 60 seconds - this is literally the worst kind of customer, as this forces the cell tower to "waste" one of its very limited airwave resources with almost no income returned to the carrier. From what I hear, carriers either want customers who talk constantly and pay huge monthly fees (say $90 to $350/month); or they want customers who rarely talk and pay a small fee (even just $5/mo) but cost the carrier virtually no direct expenses.

Putting this in "restaurant terms":

  • A cellular data device that talks constantly but pays for a large plan is like a restaurant patron who sits at a table, constantly ordering more food and paying a larger bill.
  • A cellular data device that rarely talks is like the restaurant patron who comes in once a month, sits at a table, orders a meal, pays and then vacates the table.
  • A cellular data device that keeps an idle channel open full time but rarely talks is like the restaurant patron who sits at a table in the restaurant, reading the paper but rarely ordering food or paying a bill.

In fact, in private chats with carrier account people, I have heard several times that they have been directed to prefer either customers who talk constantly on large plans or those who talk at most once an hour (better once a day) on small plans. Customers planning to talk every few minutes have been defined as bad investments. It may be fair to say that after years of building up the data-plan customer base, the cellular carriers have come to understand that the REAL cost of data plans is not the bulk data bytes moved; it is instead the percentage of time the device consumes (or squats on) 1-of-N scarce airwave resources in proportion to the monthly fee they pay.


Tuesday, June 12, 2007

Real World Cellular - ControlLogix PLC

Summary: Previously I listed some real world numbers for Modbus polling. This time I walk through some of the costs and issues of using ODVA Ethernet/IP to talk to a Rockwell ControlLogix PLC.

The Convoluted Path of Wide-Area-Networks:
In general the magic of IP hides reality from us all. We tend to think "now I am browsing Google.com or iatips.com", but we don't really understand how COMPLEX and MIRACULOUS this really is. Your computer is NOT connected to either of these web servers; instead your computer uses the services of a dozen or more other computers/routers to get from "here" to "there". Every single data byte must be forwarded hop-by-hop through all of these cooperative peers.

As an example, here is a Trace Route (tracert) of access from a computer within my test lab to a ControlLogix PLC sitting six (6) feet away. I am using public Internet access via a cellular Digi Connect WAN to the Ethernet (ENB) of the ControlLogix. Some of the public IPs have "X" entered replacing the digits; you don't really need to know the exact IP value.

My computer has private IP = 10.9.92.1
01 01 ms 10.9.1.1 (Digi's private Intranet)
02 01 ms 10.10.11.10 (Digi's private Intranet)
03 01 ms 10.254.254.2 (Digi's private Intranet)
04 16 ms 66.77.x.x (Digi Co-Host/Internet Link)
05 04 ms 69.8.x.x (Digi Co-Host/Internet Link)
06 64 ms 66.77.x.x (Digi Co-Host/Internet Link)
07 09 ms min-core-02.inet.qwest.net [205.171.128.110]
08 11 ms cer-core-02.inet.qwest.net [67.14.8.18]
09 12 ms cer-brdr-01.inet.qwest.net [205.171.139.62]
10 39 ms qwest-gw.cgcil.ip.att.net [192.205.32.97]
11 35 ms tbr2.cgcil.ip.att.net [12.123.4.254]
12 35 ms tbr2.sl9mo.ip.att.net [12.122.10.46]
13 75 ms tbr2.attga.ip.att.net [12.122.10.137]
14 31 ms 12.122.85.157
15 34 ms 12.86.140.146
16 * Request timed out. (Part of Cellular Infra-Structure)
17 * Request timed out.
18 * Request timed out.
19 * Request timed out.
20 1276 ms mobile-166-XXX-XXX-XXX.mycingular.net [166.XXX.XXX.XXX]
Digi Connect WAN has private local IP = 192.168.196.80 (is 'gateway')
ControlLogix PLC has private local IP = 192.168.196.21

These traces always amaze me - how something so seemingly trivial takes so much effort to really function. Notice how my lab PC has to route through 6 devices to even get out of Digi's company network, then through Qwest (our ISP), through AT&T (my cellular SIM provider), through some unnamed hops of the cell system, and finally be port forwarded to the ControlLogix PLC. The packets may be passing through Minneapolis, Chicago, Detroit, Atlanta, and then finally returning to the PLC sitting right beside me.

Effect of NAT (Network Address Translation)
Now let's look at what happens when RSLinx on my PC opens an ODVA Ethernet/IP socket to the ControlLogix PLC. Every TCP/IP packet requires 4 unique values which define a connection:
  1. Destination IP (target device)
  2. Destination Port (target application within device)
  3. Source IP (return address to originator)
  4. Source Port (likely random port, originator is waiting for responses here)

So we start out with the 4-tuple DST=166.x.x.x : 44818 and SRC=10.9.92.1 : 22256. The 166.x.x.x IP is assigned by my cellular carrier. Port 44818 is ODVA's "well-known" port for Ethernet/IP. 10.9.92.1 is an internal Digi selected private IP. TCP port 22256 is the ephemeral (or random) port selected by RSLinx to listen for responses.

The first NAT effect is that the Digi corporate firewall changes the request to be DST=166.x.x.x : 44818 and SRC=66.77.x.x : 22256. My private IP of 10.9.92.1 is meaningless out in Qwest's or AT&T's networks, so something needs to swap this for a "real" world-unique IP leased by Digi. Our corporate NAT interface creates a record (with a lifetime of 5 minutes) that allows any responses to be correctly restored to 10.9.92.1.

The second NAT effect is when the Digi Connect WAN forwards to the ControlLogix with another private IP. So the 4-tuple now becomes DST=192.168.196.21 : 44818 and SRC=66.77.x.x : 22256. The ControlLogix thinks IP host 66.77.something is connected to it - not the real host IP of 10.9.92.1. Plus the ControlLogix has NO CLUE that RSLinx thinks the ControlLogix has an IP of 166.something.

Now, to send a response the ControlLogix issues a TCP/IP packet with the flipped 4-tuple of DST=66.77.x.x : 22256 and SRC=192.168.196.21 : 44818. The Digi Connect WAN restores (undoes) the NAT and changes this to DST=66.77.x.x : 22256 and SRC=166.x.x.x : 44818. After passing back through AT&T and Qwest, Digi's corporate NAT interface restores its own NAT and changes it back to DST=10.9.92.1 : 22256 and SRC=166.x.x.x : 44818.
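
The whole round trip can be traced with a toy model of the 4-tuple. This is a sketch only: the `Flow` type and the helper functions are invented for illustration, the masked addresses are kept as the literal strings "66.77.x.x" and "166.x.x.x", and a real NAT device looks these mappings up in a state table rather than being told them.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def source_nat(pkt: Flow, public_ip: str) -> Flow:
    """Outbound corporate NAT: hide the private source IP."""
    return pkt._replace(src_ip=public_ip)

def port_forward(pkt: Flow, private_ip: str) -> Flow:
    """Inbound NAT at the cellular router: redirect to the private device."""
    return pkt._replace(dst_ip=private_ip)

def reply_to(pkt: Flow) -> Flow:
    """A response simply flips source and destination."""
    return Flow(pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)

request = Flow("10.9.92.1", 22256, "166.x.x.x", 44818)     # as sent by RSLinx
on_internet = source_nat(request, "66.77.x.x")             # after the Digi firewall
at_plc = port_forward(on_internet, "192.168.196.21")       # after the Connect WAN

response = reply_to(at_plc)                                # as sent by the PLC
leaving_wan = response._replace(src_ip="166.x.x.x")        # Connect WAN undoes its NAT
at_pc = leaving_wan._replace(dst_ip="10.9.92.1")           # firewall undoes its NAT

for step in (request, on_internet, at_plc, response, leaving_wan, at_pc):
    print(step)
```

Note that the ports never change in this example; only the IP addresses are rewritten, and the packet that arrives back at the PC is the exact mirror of the one RSLinx sent.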

This understanding of NAT and IP is useful for understanding the capability and limitations of cellular access to certain devices with certain protocols. A future entry will cover setting up RSLinx Classic and using RSLogix 5000 to download over cellular to a L5555 processor.


Friday, May 04, 2007

ODVA, Rockwell, and Modbus Get Smart

Summary: Next week an amazing joining of technologies will take place at ODVA in Ann Arbor MI as ODVA, Rockwell, and Schneider-Electric discuss how to integrate Modbus servers (slaves) into CIP Producer/Consumer systems.

One of the fun things about being involved in "multi-vendor" solutions is when you recognize moments of amazing sanity as they occur. One such moment of amazing sanity is occurring next week when ODVA (aka Rockwell / Allen-Bradley) and Modbus supporters (aka Schneider-Electric / Modicon / SquareD / Telemecanique) sit down to discuss how to integrate Modbus devices into the ODVA Ethernet/IP and CIP network systems. Of course there must be some interesting hidden politics behind this move - and I somewhat light-heartedly believe that perhaps French Schneider-Electric sees joining with the Americans (Rockwell/ODVA) as the lesser of two evils when compared to joining with the Germans (Siemens/PNO).

Check out: ODVA Call For Members: Modbus Integration JSIG. The kick-off meeting for the Modbus JSIG runs from Thursday, May 10, 11:00 AM to Friday, May 11, 04:00 PM.

Side-stepping the marketing fluff and platitudes of a brighter future such meetings evoke, small third-party suppliers and the folks on the plant floor can expect the following benefits. Regardless of the directly stated goals of ODVA, Rockwell, or Schneider-Electric, small vendors will implement solutions that include these abilities:
  • Vendors making Ethernet Modbus/TCP products will have a simpler "first step" to adding full ODVA/CIP support without the somewhat overwhelming task of 100% conversion of a word-array device model into hundreds (or thousands) of CIP objects.
  • ControlLogix PLC will be able to connect through Ethernet-to-Serial devices to multi-drops of Modbus/RTU slaves. For example a user with a dozen small Modbus/RTU PID loop controllers will be able to add an Ethernet-to-Serial device to read via Modbus and cyclically produce a small block of word data from each loop controller over Ethernet.
  • HART, Bluetooth, ZigBee and other new technologies which offer Modbus interfaces will find an instant place as sensors and I/O within CIP and Rockwell systems.
  • Since Siemens, GE-Fanuc, Omron, Mitsubishi and most major PLC brands offer some method to act as Modbus slaves, users with any of these PLC will be able to integrate them within the CIP Producer/Consumer system.
Looking forward a year or two, I also can foresee some other secondary benefits evolving from this JSIG's work:
  • I started working with ODVA Ethernet/IP almost 8 years ago and still as-of today the legacy PLC5E, SLC5, and serial MicroLogix (the old PCCC-based PLC) don't have effective inclusion within CIP Producer/Consumer systems. Since the device model of PCCC PLC shares much in common with Modbus PLC, it is a very small enhancement to add a similar support for AB PLC - perhaps AB will actually extend this to future firmware updates to Ethernet-based PLC. In fact, since my Digi One IAP code already allows Modbus masters to query DF1 and CSPv4 slaves as-if Modbus slaves, as soon as Digi adds Ethernet-to-Modbus support per this JSIG's output users of older AB PLC will gain access to CIP Producer/Consumer systems indirectly as honorary Modbus slaves.
  • Today legacy Modbus and Modbus/TCP systems lack any simple form of multicast producer/consumer exchange. While the IDA protocol offers this, IDA is so many orders of magnitude more complex (and resource hungry) than simple Modbus as to become really an "unrelated" protocol. Any specification that defines a "server interface" naturally implies a corresponding "client interface". So although this ODVA JSIG is not planning to define how Modbus "peers" could use multicast to exchange cyclic data, the end result will be a fairly natural and multi-vendor method to do this. So while I doubt many pure Modbus/TCP products would implement ODVA protocols just to gain this multicast exchange, any products which add the CIP support anyway will naturally add the last few bits of code required to enable pure Modbus-to-Modbus multicast via the ODVA mechanism.
  • Taking the above point to its natural conclusion means Modbus/TCP masters which implement the Modbus JSIG's "server" function will also gain a mechanism to access CIP Producer/Consumer systems. Even if the ODVA JSIG doesn't cover how to do this, natural methods will be inferred, produced, and copied by vendors to make this a fairly common new product feature.
And lastly, it will be interesting to see what response PNO considers to likewise include Modbus slaves more formally into PROFINET. Personally, I have always felt that the simpler PROFINET IO protocol would fit very naturally with multi-drops of Modbus slaves acting like plug-in I/O modules ... each slave would be a module within the IO device.


Wednesday, April 25, 2007

Rockwell and Modbus Data Traffic

Summary: I compare the data costs for four common off-the-shelf PLC protocols to a remote cellular site. As a rule of thumb, even if you have an Ethernet-based PLC your lowest data costs for SCADA-style periodic polling are obtained using serial protocols via the PLC's serial port. Their modern "Ethernet protocols" are very wasteful of cellular bandwidth.

PLC Protocol Example:
A simple but realistic SCADA scenario is to poll every 15 minutes and read 10 words of data and write 2 words of data. This commonly requires 1 Read command and 1 Write command (I'll ignore the rarely supported Modbus command that reads & writes within a single command.)

While there exists special SCADA protocols and special products that optimize remote traffic, I am not looking at those protocols at the moment. Instead, since cellular and satellite access to remote IP and Ethernet products has enabled people to use off-the-shelf PLC technology, I am looking at the more traditional PLC protocols. These are things which affect users when they apply an Ethernet design to an IP-based wide-area-network system.

I compare these 4 PLC protocols:
  • AB/DF1 Radio Modem (RM) encapsulated in UDP/IP. DF1 RM is basically DF1 Full-Duplex with no ACK/NAK and is supported by the SLC5 and MicroLogix line.
  • Modbus/RTU encapsulated in UDP/IP. Modbus/TCP within UDP/IP is roughly the same size.
  • AB/CSPv4 in TCP/IP as supported by SLC5/05 and PLC5E MSG blocks.
  • AB/Ethernet/IP as moved by ControlLogix Explicit MSG blocks to PCCC-based remote PLC. Note that Ethernet/IP "I/O Messaging" does NOT work through NAT'd wide-area-networks since the protocol embeds IP information within the data packets and is thus "broken" by NAT.
The bytes per 15 minutes includes the IP headers, any PLC protocol overhead, and the actual data moved for each update (always 20 + 4 bytes). The MB per month is just defined by 30 days of such polling, or 2880 updates at once per 15 minutes. The 2 serial protocols have been encapsulated into UDP/IP and I assume the remote IP end-point can handle connecting this to the PLC's serial port.

Protocol          Transport   Per 15 Min    MB per month   Relative Cost
Ethernet/IP       TCP/IP      1202 bytes    3.46 MB        100%
AB/CSPv4          TCP/IP       960 bytes    2.76 MB         80%
Modbus/RTU        UDP/IP       166 bytes    0.48 MB         14%
DF1 Radio Modem   UDP/IP       194 bytes    0.56 MB         16%
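
The monthly and relative columns follow directly from the per-poll byte counts. The sketch below recomputes them, assuming a 30-day month and 1 MB = 1,000,000 bytes (which is what matches the table); the names are invented for this example.

```python
POLLS_PER_MONTH = 30 * 24 * 4      # one poll every 15 minutes for 30 days = 2880

BYTES_PER_POLL = {
    "Ethernet/IP": 1202,
    "AB/CSPv4": 960,
    "Modbus/RTU": 166,
    "DF1 Radio Modem": 194,
}

def monthly_mb(bytes_per_poll: int) -> float:
    """Monthly traffic in MB (10**6 bytes) for one poll every 15 minutes."""
    return bytes_per_poll * POLLS_PER_MONTH / 1_000_000

baseline = BYTES_PER_POLL["Ethernet/IP"]
for name, size in BYTES_PER_POLL.items():
    relative = 100 * size / baseline
    print(f"{name:16s} {size:5d} bytes  {monthly_mb(size):5.2f} MB  {relative:4.0f}%")
```

Swapping in a different poll rate only changes `POLLS_PER_MONTH`; the relative costs between protocols stay the same as long as the per-poll byte counts do.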

The two Rockwell "Ethernet" protocols cost a lot more to use in part because they force use of TCP/IP, and therefore suffer the repeated cost of TCP socket opening and closing, plus extra TCP acknowledgment overhead. They also suffer because they both involve connection registration and service functions that needlessly repeat every time the connection is reestablished. While the actual data packets of these protocols are roughly twice the size of the serial encapsulated protocols, the real burden they suffer is all the extra TCP/IP packets exchanged that do NOT directly involve field data update.

Both the serial Modbus/RTU and DF1 Radio Modem benefit in that they move no IP packets that don't relate to the field data update - no TCP/IP open or close or acknowledgement; no protocol "service function" overhead. Each moves just 1 read request and 1 read response, plus 1 write request and 1 write response.
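
For Modbus/RTU the 166-byte figure can be rebuilt frame by frame. The breakdown below is a sketch that assumes function 3 (read holding registers) for the read and function 16 (write multiple registers) for the write, each frame carried in its own UDP/IP packet with 28 bytes of header; those function codes are not stated above, but they do reproduce the number in the table.

```python
UDP_IP_HEADER = 28                  # 20-byte IP header + 8-byte UDP header

def rtu_frame(pdu_bytes: int) -> int:
    """Modbus/RTU frame size: 1 address byte + PDU + 2 CRC bytes."""
    return 1 + pdu_bytes + 2

frames = {
    # function code, start address, register count
    "read request": rtu_frame(1 + 2 + 2),
    # function code, byte count, 10 words of data
    "read response": rtu_frame(1 + 1 + 10 * 2),
    # function code, start address, register count, byte count, 2 words of data
    "write request": rtu_frame(1 + 2 + 2 + 1 + 2 * 2),
    # function code, start address, register count
    "write response": rtu_frame(1 + 2 + 2),
}

total = 0
for name, size in frames.items():
    total += size + UDP_IP_HEADER
    print(f"{name:15s} {size:2d} bytes + {UDP_IP_HEADER} header")

field_data = 10 * 2 + 2 * 2         # the 20 + 4 bytes that actually matter
print("total per poll:", total, "bytes")
print(f"field data share: {100 * field_data / total:.0f}%")
```

Even in this best case, four packets of 28-byte headers account for 112 of the 166 bytes.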

Discussion of Other PLC Protocols:

Most other PLC Ethernet protocols will either approach the costs of the AB/CSPv4 - or they won't work at all due to use of direct "Ether-Types" and lack of IP compatibility. Most serial protocols will either roughly match the 2 shown here or be twice the cost if protocol ACKs are used by the protocol.

Modbus/ASCII will almost double the cost of Modbus/RTU since each data packet is roughly twice the size. But this wouldn't increase the IP overhead any.

Using DF1 Full-Duplex instead of Radio Modem would effectively double the cost over DF1 RM since DF1 Full-Duplex moves the protocol ACK/NAK, which doubles the IP header overhead also. Using DF1 Half-Duplex would triple or even quadruple the costs since HD not only moves protocol DF1 ACK/NAKs, but the ENQ/EOT polling overhead.

Most other protocols I am aware of - such as Omron Hostlink, GE-Fanuc SNPX, and Siemens PPI - would cost roughly 2 to 3 times more than Modbus/RTU or DF1 RM since they include protocol ACK, while a few even encode many parts of the message as ASCII or BCD form instead of as binary.


Thursday, April 19, 2007

Rockwell PLC5E and SLC5 over Cellular

Summary: Customers ask me "How much will cellular cost" - this blog post walks through some examples of SCADA-style periodic polling of AB/PLC5E or SLC5/05 using the CSPv4 protocol (aka AB/Ethernet to 3rd parties) over TCP port 2222.

Real-World Numbers
For a simple SCADA-style example assume we need to read 10 words of data (20 bytes) and write 2 words (4 bytes) every time period. Obviously there would be simple optimizations to this, such as only writing data which changes or using PLC MSG blocks to push data from PLC to SCADA only when something changes. However my goal in this blog post isn't to "tweak" a solution to minimize cost, but to examine the protocol impact of using Rockwell CSPv4 over IP.

The table below shows the megabytes per month when polling once per second, per 5 seconds, per 1 minute, per 5 minutes, per 15 minutes, and per 1 hour. There are lots of variables considered ... and many more ignored. The traffic ranges from worst-case of 1005.0 MB for TCP/IP with larger header options polled once per second to best case of 0.2 MB for UDP/IP polled once per hour. This assumes the use of the CSPv4 submode 7, with local LSAP addressing and ignores that Rockwell PLC5E and SLC5/05 don't support CSPv4 within UDP/IP. Raw efficiency at moving the data bytes ranges from about 10% for UDP/IP to barely 1% for TCP/IP; which means most of what you are paying for is not related to actual, meaningful field data.

[Image: CSPv4 Poll Record, at http://iatips.com/blogimage/rockwell_cspv4_traffic.png]

Notes on the Table

Since this example reads and writes small amounts of data, it assumes a SLC5-style Protected Typed Read with 3-Address Fields and the corresponding SLC5 write.

The smaller 40-byte TCP/IP header has no options attached; the larger 52-byte TCP/IP header includes the RFC 1323 Timestamp and Window Scale TCP options. These appear to be the normal default for Linux and easily become enabled under Windows since all applications share a single setting in the Registry.

The two time columns "15 min (Alive)" and "1 hr (Alive)" assume a roughly 4 min 45 sec TCP keepalive to prevent the socket from closing. This reduces the traffic by the extra open/close overhead in exchange for billable TCP Keepalive packets. Keep in mind this ALSO requires the PLC to be properly configured to NOT close the idle sockets. By default, my SLC5/05 seems to close the idle connections in a few minutes.

The two time columns "15 min (Cls)" and "1 hr (Cls)" assume the socket is closed after the data poll, and the TCP socket and CSPv4 session must be reopened for the next poll.

Discussion
Of course the standard costs of using TCP/IP versus UDP/IP apply:
  • TCP/IP uses larger headers, ranging from 40 to 52 bytes per packet as compared to UDP/IP's smaller 28 bytes of header.
  • TCP/IP involves the TCP Acknowledgments, which may result in separate, billable 40 to 52 byte packets moving frequently without any meaningful field data.
  • TCP/IP may require reopening a socket, costing 120 to 250 bytes per open, plus closing costing from 160 to 400 bytes. Exact sizes are hard to predict since both opening and closing of sockets tend to be "pushed" and result in excess retransmissions and retries when network latency is high.
  • TCP/IP over unknown 3rd party wide-area-network infrastructure requires at least 1 TCP packet to move every 4 minutes 45 seconds to maintain health. This means either a data packet or a TCP Keepalive with data.
It should be clear why using UDP/IP over cellular (which is very reliable) is much cheaper than using TCP/IP. There are no socket open, close, acknowledgment, or keepalive costs. Plus field experience has shown that rapid IP retries rarely succeed. For example, a customer polling with UDP every 5 minutes with 3 explicit retries if no answer will likely have the original poll succeed or that poll and all 3 retries fail. Again, cellular is very reliable in that packets almost always make it through unless there is a network or congestion issue, and then only time (a few minutes) solves the problem. So such customers have abandoned retries and just ignore 1 failed 5 minute poll and "retry" in 5 minutes.

CSPv4 issues include:
  • Rockwell PLC and software tools do NOT support use of UDP/IP - my tests with UDP/IP have to be conducted with the Digi One IAP which happily bridges CSPv4 between TCP and UDP (as well as to or from Ethernet/IP and DF1).
  • CSPv4 requires the exchange of a pair of 28-byte negotiation TCP packets when a new TCP socket is opened to inform the client (master) of a server (slave) assigned session handle. This nearly doubles the overhead of an open-poll-close socket paradigm.
  • The 28-byte CSPv4 header really contains little useful information; such excess bytes cost nothing tangible under Ethernet but cost cash in the form of requiring larger cell plans over cellular.
In conclusion, CSPv4 will be a rather poor choice for raw, periodic polling of remote AB PLC. ODVA Ethernet/IP (as implemented by all vendors) is even worse.

Your only effective solution at present is to carefully craft a set of MSG blocks to push data from the field in a report-by-exception paradigm. Of course you also must include safeguards within your PLC to prevent rapid, repeated MSG block triggers during system failure that could cost you thousands of dollars ($$$) in a few days.
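The kind of safeguard meant here can be sketched as follows. In a real system this logic lives in PLC ladder around the MSG block; the Python below is only a model of the decision, and the class name ReportGate and its three parameters (deadband, minimum interval, daily cap) are assumptions chosen for illustration.

```python
class ReportGate:
    """Decide whether a changed value may be reported (hypothetical helper)."""

    def __init__(self, deadband, min_interval_s, max_per_day):
        self.deadband = deadband              # smallest change worth reporting
        self.min_interval_s = min_interval_s  # minimum seconds between reports
        self.max_per_day = max_per_day        # hard cap on reports per day
        self.last_value = None
        self.last_sent_at = None
        self.day = None
        self.sent_today = 0

    def should_send(self, value, now_s):
        day = int(now_s // 86400)
        if day != self.day:
            # A new day has started: reset the daily budget.
            self.day = day
            self.sent_today = 0
        if self.last_value is not None and abs(value - self.last_value) < self.deadband:
            return False  # change too small to report
        if self.last_sent_at is not None and now_s - self.last_sent_at < self.min_interval_s:
            return False  # too soon after the previous report
        if self.sent_today >= self.max_per_day:
            return False  # daily budget exhausted, e.g. a chattering input
        self.last_value = value
        self.last_sent_at = now_s
        self.sent_today += 1
        return True
```

The daily cap is the part that protects the bill: even if a failed sensor toggles continuously, the number of billable reports per day stays bounded.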


Monday, April 09, 2007

Cellular to Allen-Bradley SLC5/05 on TCP 2222

The old CSPv4 protocol.

The Rockwell/AB SLC5/05 and PLC5E natively speak an older "unpublished" protocol named CSPv4, although most third party vendors call it either AB/Ethernet or AB/TCP. It moves only on TCP port 2222 - ODVA Ethernet/IP I/O Messaging is only on UDP port 2222, so they don't conflict. The protocol consists (normally) of a 3-part packet:
  • 28-byte header
  • 4 or 15-byte LSAP or end-point addressing packet
  • PCCC message which is basically what DF1 documents as an Application Packet
In general, the packets are fairly sparse and compressible (if you have the tools to do this). The characteristics of CSPv4 which impact cellular (and wide-area-network) support:
  • Rockwell tools and PLC only support use of TCP/IP and port 2222; this greatly limits use of CSPv4 in NAT'd networks since the remote NAT router can only forward TCP port 2222 to a single remote PLC.
  • CSPv4 includes a single TCP packet exchange to "register a session" or connect. If you are polling faster than the PLC will hang-up on you, then this is not important. However, if you poll slow enough that a new TCP/IP socket must be opened for each poll, then even ignoring the TCP socket open/close overhead this nearly doubles your traffic costs.
  • In tests, a SLC5/05 seems effective at including the TCP ACK response to the host within the CSPv4 data response packet, so you only have to pay for one empty TCP ACK, which is the host's acknowledgment to the PLC for the response.
  • TCP Keepalive could be an issue, since most hosts fail to issue it and the SLC5/05 I've tested against either doesn't issue TCP keepalives or does it very frequently.


Wednesday, March 14, 2007

Rockwell PLC and TCP Headers

I have started running some tests of standard Rockwell protocols querying off-the-shelf Allen-Bradley PLC, with the goal of creating a series of "estimators" for traffic. A user would enter the data to poll and the tool would estimate the data byte load contributed by this poll pattern.

The Mystery 17% Cost Increase:
Last night I ran a test polling ten words once a minute from an Allen-Bradley SLC5/05C's N7 file over GSM. This is nothing exotic - I ran similar tests a few months ago and had preconceived ideas of what to expect ... beep ... wrong! In between Then and Now, some unknown application changed my Windows XP system registry, enabling the "RFC 1323 Timestamp and Window Scale TCP options". The end result was an unexpected 16.51% increase in data byte traffic with no perceived value.

I have no clue which tool did this; and unfortunately Windows (at least 2K and XP) uses a single global setting for the entire TCP stack. I could change it back ... but would that break this other mystery application? Will this other mystery application just change it back? Will I launch a mini cold-war race as this mystery application tries to keep RFC 1323 enabled and my test tools try to keep it disabled?

The Byte Counts with and without RFC1323:
Here is an exact accounting of the change in byte counts - remember, cellular is basically a mobile-IP tunnel which moves TCP/IP or UDP/IP as pure data payload. So you pay for both the IP and TCP headers, plus any data-less TCP Acknowledge or Keepalive packets.

I'll ignore the opening and closing of the socket, plus TCP Keepalive since I'm polling fairly steady-state once per minute. The PLC includes the TCP ACK in the response, so at least we avoid 1-of-2 data-less TCP Acknowledgments.


Bytes per packet              no RFC1323    with RFC1323
Request: IP header                    20              20
Request: TCP header                   20              32
Request: CSPv4 Packet                 42              42
Response: IP header                   20              20
Response: TCP header                  20              32
Response: CSPv4 Packet                56              56
Client ACK: IP header                 20              20
Client ACK: TCP header                20              32
Client ACK: (no data)                  0               0


Totals                        no RFC1323    with RFC1323
Total Bytes per Poll                 218             254
Total Bytes per Hour              13,080          15,240
Total Bytes per Day              313,920         365,760
Total Bytes per Month          9,417,600      10,972,800

So this means a user doing 1 read of 10 words per minute would magically see a 16.51% increase in data traffic ... just because they (or the IT department or even Microsoft Windows Update) changed a hidden registry setting. This is yet another example of how hard it is to keep tight control of your cellular data costs, and it adds to my belief that using off-the-shelf host applications over cost-sensitive IP networks is a losing battle. At some point you'll need a tool or device which is 100% "under-control" when it comes to packet creation.
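The accounting above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the tables: three packets per poll (request, response carrying the PLC's ACK, and the client's empty ACK), a 30-day month, and a 12-byte increase per TCP header when timestamps are on (the 10-byte timestamp option padded to 12, matching the 20 to 32 byte change in the table). The function and constant names are mine.

```python
IP_HEADER = 20
TCP_HEADER = 20
TIMESTAMP_OPTION = 12  # 10-byte RFC 1323 timestamp option padded to 12 bytes


def bytes_per_poll(request_payload, response_payload, timestamps):
    """Billable bytes for one steady-state poll (no open/close, no keepalive)."""
    tcp = TCP_HEADER + (TIMESTAMP_OPTION if timestamps else 0)
    per_packet_headers = IP_HEADER + tcp
    # Three packets move: request, response (with ACK), and client ACK.
    return 3 * per_packet_headers + request_payload + response_payload


if __name__ == "__main__":
    plain = bytes_per_poll(42, 56, timestamps=False)
    stamped = bytes_per_poll(42, 56, timestamps=True)
    increase = 100.0 * (stamped - plain) / plain
    for label, per_poll in (("no RFC1323", plain), ("with RFC1323", stamped)):
        per_month = per_poll * 60 * 24 * 30  # one poll per minute
        print(f"{label}: {per_poll} bytes/poll, {per_month:,} bytes/month")
    print(f"increase: {increase:.2f}%")
```

Changing the payload sizes shows why small telemetry packets suffer most: the 36 extra bytes per poll are fixed, so the smaller the data, the larger the percentage penalty.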

Windows Registry Details:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts

Tcp1323Opts
Key: Tcpip\Parameters
Value Type: REG_DWORD—number (flags)
Valid Range: 0, 1, 2, 3
  • 0 (disable RFC 1323 options)
  • 1 (window scaling enabled only)
  • 2 (timestamps enabled only)
  • 3 (both options enabled)
Default: No value. The default behavior is as follows: do not use the Timestamp and Window Scale options when initiating TCP connections but use them if the TCP peer that is initiating communication includes them in the SYN segment.

Description: This parameter controls the use of RFC 1323 Timestamp and Window Scale TCP options. Explicit settings for timestamps and window scaling are manipulated with flag bits. Bit 0 controls window scaling, and bit 1 controls timestamps.
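To check which way a given machine is set, something like the following Python sketch could be used. The decode function simply applies the bit definitions above; the registry read uses the standard winreg module, which exists only on Windows, so it is imported inside the function. Both function names are my own.

```python
def decode_tcp1323opts(value):
    """Return (window_scaling, timestamps) for a Tcp1323Opts DWORD."""
    if value not in (0, 1, 2, 3):
        raise ValueError("Tcp1323Opts must be 0, 1, 2 or 3")
    # Bit 0 controls window scaling, bit 1 controls timestamps.
    return bool(value & 0x1), bool(value & 0x2)


def read_tcp1323opts():
    """Read Tcp1323Opts from the registry; None means the value is not set."""
    import winreg  # Windows-only standard library module

    path = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "Tcp1323Opts")
        except FileNotFoundError:
            return None
    return value
```

A result of None corresponds to the "No value" default described above; a result of 2 or 3 means timestamps are on and every TCP header grows by 12 bytes.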


Friday, January 12, 2007

Cellular Costs - bytes you pay for each month

Sadly, we are about to enter one of the dark-arts of cellular usage ... what are you actually billed for. Given the 50 page voice cell phone bill my family gets each month, one would NOT think the cell phone companies lacked the ability to explain - let alone document - what they charge data users for! It is not that one cannot get a verbal answer from cellular providers' engineers; one can get too many different answers.

However, there are some facts we can know.

An example: Modbus/RTU via TCP/IP, one poll per 10 minutes
Let us build up an example. Start with a customer named Joe who plans to poll 10 words of data every 10 minutes, or 4320 polls per month. Under Modbus/RTU this would be 8 bytes in the request and 25 bytes in the response. So Joe starts with the wonderful view that he'll only be moving 143K per month and maybe one of those $3.95/month plans for half-a-meg will fit nicely.

Sorry to throw some cold water on Joe's euphoria, but Joe still must pay for the TCP and IP header overhead. After all, the cell data network is in effect "tunneling" his TCP/IP and Modbus/RTU and so treats even the TCP and IP headers as billable payload. So Joe needs to consider that 4320 round-trip polls per month results in 8640 TCP/IP data packets and potentially another 8640 TCP acknowledge packets. Perhaps half of these TCP acknowledge packets will be merged with the TCP/IP data packet returning the Modbus/RTU response ... but then again maybe they won't. So to keep it simple and budget safe, Joe should assume the worst case, in which all 8640 TCP acknowledgements travel alone. Assuming each IP header is 20 bytes and each TCP header is another 20 bytes (they may be 28 if you use Linux), this amounts to another 17,280 times 40 bytes or 691K bytes (0.7MB) JUST for the theoretical TCP/IP overhead. Joe is up to 834K per month now - clearly a 1MB/mo or larger plan is required.

Ok, wait a second ... now why did I say "theoretical TCP/IP overhead"? Because in reality Joe will end up moving more TCP/IP traffic than the 4320 polls strictly require. The first extra overhead will be from premature TCP retransmissions. The high, variable latency of cellular means Joe will see from 2% to 10% retransmissions, and since cellular is very reliable, each retransmission will result in duplicate TCP acknowledgements as well. Sticking to worst case, budget-safe assumptions, Joe should budget about 10% or 100K per month for premature TCP retransmissions. So now Joe is up to 934K per month.

However, there is yet another overhead Joe should budget for - TCP Keepalive probes to detect loss or death of the TCP socket. Without this, one end of the connection could go away and the other end would never know and never recover the socket resource. Since wide-area-networking is involved, Joe also needs to assume at least one intermediate device will abort and discard the TCP context if idle more than 5 minutes. Given Joe polls every 10 minutes, he'll need at least one TCP keepalive exchange between each poll. Each TCP keepalive exchange consists of another 40 plus 40 bytes, so we are talking 4320 x 80 bytes or another 346K of billable traffic. This puts Joe up to 1.28MB of billable traffic to move 143K of Modbus traffic.

Now, why not close and reopen the socket? Yes, that is an option but each TCP close and reopen generates about 320 bytes - not including TCP retransmissions. So Joe can either pay for 346K worth of TCP Keepalive or 1.38MB of TCP socket thrashing, for monthly totals of 1.28MB and 2.32MB respectively.

So Joe is up near 1.5MB per month just to move his 10 registers of data once per 10 minutes, and this doesn't include any time he checks the web interface of his cellular device for status (say another 200-500K per access), nor does it include any on-demand HMI data access screens which trigger other Modbus/RTU polls. These could easily create many MB of traffic per month and require careful, mindful behavior by Joe and his colleagues to control costs. One careless person can easily drive the cellular bill up by hundreds of dollars in a month!

Summary:
  • Raw Modbus/RTU data = 143K per month
  • Basic TCP/IP headers to move and acknowledge data = 691K per month
  • Estimated 10% premature retransmission = 100K per month
  • One TCP Keepalive exchange between 10 minute polls = 346K per month
  • Overall, Joe should expect at least 1.5MB per month and I'd suggest he budget for 3MB or even 5MB. This puts him up into the $20/month cell plan range.
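Joe's budget can be recomputed for other poll rates with a short Python sketch. It follows the worst-case assumptions used above: every ACK travels alone, 40 bytes of TCP/IP header per packet, a flat 100K retransmission budget (the rounded 10% figure), and either one 80-byte keepalive exchange or one 320-byte close/reopen per poll. All names are mine.

```python
POLLS_PER_MONTH = 6 * 24 * 30      # one poll every 10 minutes, 30-day month
REQUEST_BYTES, RESPONSE_BYTES = 8, 25
HEADER_BYTES = 20 + 20             # IP header plus TCP header
RETRANSMIT_BUDGET = 100_000        # the rounded "about 10%" figure
KEEPALIVE_EXCHANGE = 2 * HEADER_BYTES
REOPEN_BYTES = 320                 # approximate close plus reopen


def monthly_estimate(reopen_each_poll=False):
    """Return Joe's worst-case monthly byte counts as a dict."""
    raw = POLLS_PER_MONTH * (REQUEST_BYTES + RESPONSE_BYTES)
    # Two data packets and two stand-alone ACKs per poll.
    headers = POLLS_PER_MONTH * 4 * HEADER_BYTES
    per_poll_idle = REOPEN_BYTES if reopen_each_poll else KEEPALIVE_EXCHANGE
    idle = POLLS_PER_MONTH * per_poll_idle
    return {
        "raw": raw,
        "headers": headers,
        "retransmit": RETRANSMIT_BUDGET,
        "idle": idle,
        "total": raw + headers + RETRANSMIT_BUDGET + idle,
    }


if __name__ == "__main__":
    for reopen in (False, True):
        label = "close/reopen" if reopen else "keepalive"
        print(label, monthly_estimate(reopen))
```

Note that in either case the raw Modbus data is barely a tenth of the bill; the rest is protocol overhead.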


Wednesday, January 03, 2007

Rockwell Bridging - Ethernet to DF1

Question: We have a MicroLogix 1500 with only 1 serial port. What Digi product can we use to enable Ethernet access from RSLinx or an HMI display?

The Digi One IAP allows bridging AB protocols:
- Ethernet/IP Master (such as ControlLogix) can query DF1 PLC
- CSPv4 Master (such as RSLinx, PLC5E or SLC5/05) can query DF1 PLC
- DF1 encapsulated in TCP/IP (such as OPC server) can query serial DF1 PLC
- Modbus Master (TCP, RTU, or ASCII) can query DF1 PLC as-if a Modbus slave.

All of these can function concurrently, as the serial port is moving pure DF1.

The Digi One IAP cannot support DH485 because (like ProfiBus) the token rotation is too fast to be encapsulated over Ethernet successfully.

The general Rockwell Bridging is discussed in this PDF file:
http://ftp1.digi.com/support/documentation/90000636_a_doiap_ra_bridge.pdf

Quick comparison of Digi One IAP to the 1761-NET-ENI:
Product                                       Digi One IAP             1761-NET-ENI
Ethernet/IP to DF1 Full-Duplex                YES                      YES
CSPv4 (PLC5E protocol) to DF1 Full-Duplex     YES                      no
Modbus to DF1 Full-Duplex                     YES                      no
DF1 encapsulated in TCP/IP or UDP/IP          YES                      no
DF1 by port redirection                       YES                      YES
Supports DF1 Radio Modem                      YES (next release "H")   no
Maximum active Masters/Peers                  64                       4
Configuration by Web                          YES                      no
Configuration by Telnet or SSH                YES                      no


Basically, the Digi One IAP does everything the 1761-NET-ENI does (plus much more) except:
  • The Digi One IAP does NOT handle CIP encapsulated within DF1, which is required only for RSLinx to CompactLogix RS-232 port
  • The Digi One IAP does NOT have emails triggered by PLC MSG blocks
  • RSLinx won't talk Ethernet/IP through the Digi One IAP - it will talk "Ethernet Driver" fine. This is *NOT* related to the existence of an EDS file. RSLinx talks via the 1761-NET-ENI because it is hard-coded to treat the ENI specially. There is no EDS file information which RSLinx examines to function with the ENI.


Friday, December 15, 2006

Mixing Modbus and Rockwell on Ethernet

Both Modbus and PCCC-based protocols like DF1 or CSPv4 (AB/Ethernet) have been around for years. Yet if one looks at the similarities between the two, one quickly sees that the act of reading 10 words from an N7 data file is exactly the same as reading 10 words from Modbus 4x00001. The Digi One IAP leveraged this to become the world's first off-the-shelf transparent protocol bridge. It freely accepts Modbus or Rockwell requests and bridges them to the appropriate form for the slave to understand.

Here is an example system:
 Example of AB and Modbus talking on Ethernet

  • The ControlLogix can poll the Modbus/TCP and DF1 PLC
  • The Modbus/TCP PLC can poll the ControlLogix and DF1 PLC
  • The DF1 PLC can poll the ControlLogix and Modbus/TCP PLC.

So how does this work? Take a look at the messages to read the first 48 bits of bit memory:
  • Modbus/TCP is 001E00000006010100000030
  • Modbus/RTU is 0101000000303C1E
  • Modbus/ASCII is :010100000030CE(CR)(NL)
  • DF1 Full-Duplex is 100201000F000019A206038500001003DE06
  • CSPv4 is 0107000E00 … 010500000F000019A2060385000
  • PCCC-Ethernet/IP is 6F002800 … 0000010000000F000019A2060385000
Notice the repeated patterns? Every Modbus form carries 010100000030, and every PCCC form carries a command starting 0F000019A2060385. This is the heart of how a normal Modbus Bridge or 1761-NET-ENI functions. Modbus/TCP, Modbus/RTU, and Modbus/ASCII may have different bytes, but they all move the exact same Modbus command; a Modbus bridge doesn't need to understand the Modbus command, just be able to unpack and repack each form. Similarly DF1, CSPv4, and PCCC-in-Ethernet/IP have different bytes, but they all move the same PCCC command; a PCCC bridge doesn't need to understand the PCCC command, just be able to unpack and repack each form.
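The "unpack and repack" idea for the three Modbus forms can be shown in a short Python sketch. It wraps the same six bytes (unit id 01 plus the read-coils command) in each framing using the standard Modbus CRC-16 and LRC checks. The function names are mine, and this is not how the Digi One IAP is implemented internally; it only illustrates that the command bytes never change.

```python
import struct


def crc16_modbus(data):
    """Standard Modbus CRC-16 (initial value 0xFFFF, reflected poly 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def lrc(data):
    """Modbus/ASCII LRC: two's complement of the byte sum."""
    return (-sum(data)) & 0xFF


def to_tcp(unit_and_command, transaction_id):
    # MBAP header: transaction id, protocol id 0, length of what follows.
    header = struct.pack(">HHH", transaction_id, 0, len(unit_and_command))
    return header + unit_and_command


def to_rtu(unit_and_command):
    crc = crc16_modbus(unit_and_command)
    return unit_and_command + bytes([crc & 0xFF, crc >> 8])  # low byte first


def to_ascii(unit_and_command):
    body = unit_and_command + bytes([lrc(unit_and_command)])
    return ":" + body.hex().upper() + "\r\n"


if __name__ == "__main__":
    command = bytes.fromhex("010100000030")  # unit 1, read 48 coils from 0
    print(to_tcp(command, 0x001E).hex().upper())
    print(to_rtu(command).hex().upper())
    print(repr(to_ascii(command)))
```

Run against the command 010100000030 with transaction id 001E, the three functions produce the same Modbus/TCP, Modbus/RTU, and Modbus/ASCII messages listed above.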

The Digi One IAP takes this one step further - since each of these embedded commands is accomplishing the same thing - namely reading the first 48 bits of bit memory - the Digi One IAP can take either command and mechanically create the other. So given the Modbus command 010100000030, it can create the PCCC command 0F000019A20603850000. Given the core PCCC command 0F000019A20603850000, it can create the Modbus command 010100000030. So this is how a Modbus/TCP master can query a ControlLogix with PCCC enabled. The Modbus/TCP master thinks it is polling another Modbus device. The ControlLogix thinks it is being polled by another Ethernet/IP device.

Here are links to other related information:

Digi One IAP product page
Application Note for Modbus master polling Rockwell devices.
Excel spreadsheet for Modbus master polling Rockwell devices.
Application Note for Rockwell master polling Modbus devices.
Excel spreadsheet for Rockwell master polling Modbus devices.
PDF presentation of various ways to mix Modbus and Rockwell devices


Rockwell AB PLC via Cellular

So far we have succeeded in getting several Rockwell/Allen-Bradley PLC up on Cellular with the Digi Connect WAN, which is a cellular router for GSM or CDMA with local Ethernet and serial port.

In Summary:
  • Serial DF1: You can access serial MicroLogix PLC such as the MicroLogix 1200 on the remote Digi Connect WAN's serial port. You either need to have an OPC server which can directly encapsulate DF1 protocols into TCP/IP or to use Digi RealPort to create redirected COM ports for RSLinx. Ideally, using the newer DF1 Radio Modem protocol can cut your data costs in half, but DF1 Full-Duplex or Half-Duplex can also be used. DH485 won't work via cellular due to the high latency. You must slow the PLC (ACK) timeout setting down to 30 seconds, so you cannot use a MicroLogix 1000 since it doesn't allow this parameter to be adjusted. DF1 Radio Modem has no DF1 (ACK) or (NAK), which is why it costs less to use.
  • CSPv4 or AB/Ethernet: You can access legacy PLC such as SLC5/05 and PLC5E by enabling TCP port forwarding of port 2222 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Ethernet Driver", then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. For a bit of fun, open this link in your browser and you will access the web pages of my SLC5/05 through Cingular/GSM cellular - http://digiwan.gotdns.org:8080/. But please don't leave this page open since you'll impact other people trying to look at my cellular PLC.
  • Ethernet/IP: You can access ControlLogix and other newer PLC supporting Ethernet/IP by enabling TCP port forwarding of port 44818 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Remote Devices via Linx Gateway" Driver, then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. You cannot use the RSLinx Ethernet/IP driver since it relies on UDP broadcast which cannot move across wide-area-networks.


If you want more detailed instructions, I have an application note online here:

90000772_A_Cell_AB.pdf
