At this point, most of us probably end up taking the technology behind our VoIP and UC services for granted. This is completely normal — in fact, many of us go through our days utilizing new technologies without understanding how exactly they work. Not all of us have the time to sit down and dissect what seems to be incredibly complicated tech. Normally, even as a consumer or business owner shopping for a new solution, you’d be completely fine just knowing the basics.

However, if you truly want to ensure the absolute best solution, service, and system for your business, then understanding what you’re getting into can go a very long way. VoIP itself isn’t necessarily too complicated, although it’s easy to get bogged down by overwhelming technical data that reads as jargon to the average person. At the end of the day, though, the most important things to understand are the protocols and standards that make VoIP communication possible. Thankfully, the general ideas themselves aren’t too complicated, and we’re here to help.

Armed with the knowledge of how VoIP transfers data, and the differences in standards and protocols used to accomplish that, your business can adopt the right solution for its needs. So let’s dig in, shall we?

The Very Basics: What Even Are Protocols?

VoIP stands for Voice over Internet Protocol, and the word “protocol” is integral to how the entire system works. Essentially, VoIP is a method of transferring audio and even video information across, well, the Internet. However, sending data over the internet isn’t as simple as attaching a file to your email or sharing a Dropbox link. In fact, even those everyday actions are only possible because of protocols.

So, what is a protocol? Well, very simply put, a protocol is a set of rules that computers use to govern and explain how they communicate with each other.

Many of you may remember that god-awful dial-up tone that would play when you tried to connect to the internet using something like AOL. The series of beeps, squeaks and buzzes sounded to many like a robot conversation — and that’s exactly what it was — your computer attempting to “talk” to the internet through a series of checks.

The Transmission Control Protocol/Internet Protocol

Throughout your time on the internet, you may have also come across the nomenclature of TCP/IP. While not the main focus of our discussion, it is worth mentioning. Just about every single computer and device that connects to the internet utilizes and supports TCP/IP. TCP/IP isn’t a singular networking protocol, but rather a suite of protocols that is named after the two most important ones.

For a communication to occur, computers need both a message to send and a method by which to consistently and reliably send and deliver that message. TCP is what deals with the message itself, breaking the content down into smaller sections called packets. This helps explain why packet loss is so detrimental to your call quality. Meanwhile, the IP layer of the suite deals specifically with sending and delivering the packets. This is where your IP address comes from, which is essentially like your house address — a mostly static location or label for your network.
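
To make the idea of packets a little more concrete, here is a minimal sketch in Python. It is a toy illustration only, not real TCP (which also adds acknowledgements, retransmission, and much more); the function names and the 8-byte packet size are assumptions made up for this example. It shows a message being split into numbered packets, put back together after arriving out of order, and what the receiver ends up with when one packet never arrives.

```python
import random

def split_into_packets(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Break a message into (sequence_number, chunk) pairs."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Hello, this is a VoIP call!"
packets = split_into_packets(message, size=8)  # hypothetical packet size

# Packets can arrive out of order; the sequence numbers let the receiver fix that.
shuffled = packets[:]
random.shuffle(shuffled)
assert reassemble(shuffled) == message

# If a packet is lost, that piece of the message is simply missing.
lossy = [p for p in packets if p[0] != 1]
print(reassemble(lossy))
```

On a live call there is rarely time to wait for a missing piece to be sent again, which is one way to picture why lost packets show up as gaps or choppy audio.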

While the TCP/IP protocol suite is the basis for all communications, VoIP and UC rely on communication and signaling protocols to establish a connection between two devices, and allow the transfer of audio or video data beyond the standard suite.

So Then What Protocols Does VoIP Rely On?

Now that we understand what a protocol is and how important protocols are to the entire process, we can take a closer look at the specific ones VoIP utilizes. There are quite a few out there, and many have come and gone over time. However, for the vast majority of users, there are only two main protocols to focus heavily on.

The most popular protocols currently utilized for UC are SIP (Session Initiation Protocol) and H.323. If you’ve taken a look at some of our provider head-to-head comparisons, you may have noticed that some providers will specifically offer SIP Trunking capabilities. Beyond this, VoIP can utilize other protocols like MGCP and SCCP, but we will go more in depth with those further down.

Intelligent Endpoint Protocols

The name Intelligent Endpoint Protocols is used to describe SIP and H.323 because all of the “intelligence” necessary to locate the receiving device and establish the data transfer between your device (the local host) and whomever you are calling (the remote device) is baked right into the protocol.

SIP and H.323 are the two most popular protocols you’ll come across, having originated in 1995 and 1996 respectively. It’s fairly safe to say, though, that H.323 has grown to be more popular than SIP in recent years. However, this isn’t to say that H.323 is objectively better. In fact, it’s hard to come to that conclusion: both protocols do the job, and both do the job well. At the end of the day, like most things, it will come down to what your business specifically requires.

The Current Standards: SIP vs H.323

This topic has pretty much been beaten to death all around the internet. As we previously stated, there isn’t necessarily one protocol that is better than the other. However, it is still important to understand how each protocol operates, how they differ, and when it makes sense to use one over the other.

The Basic Definitions and Concepts

Session Initiation Protocol:

The Session Initiation Protocol has become the Internet Engineering Task Force (IETF) standard for multimedia sessions. The IETF is really just a large, open international community comprised of almost anyone involved in networking, including designers, operators, vendors, and researchers focused on the evolution of the Internet. The IETF has determined SIP to be a standard for audio, video, and even instant messaging or team messaging solutions.

It is interesting to note that SIP is modular, meaning its pieces can be arranged to suit the job. Depending on the type of data you wish to transmit, your SIP deployment will need to be set up for that specifically. Therefore, VoIP and IM communications will rely on different parts of the protocol, and this is a major strength of SIP. It’s less of a one-size-fits-all approach because it can be tailored to a specific purpose.


H.323:

On the other hand, the H.323 protocol has become the international standard for multimedia communication over “packet-switched networks.” This can include Local Area Networks (LANs), Wide Area Networks (WANs), and even the general internet we all connect to. Essentially, H.323 can be seen as an “umbrella” that includes multiple standards, such as H.225.0, H.245 and H.460. H.323 is an older standard, and a very large part of it was based on the ISDN standards.

Don’t worry too much about what that means: ISDN was simply a set of standards for legacy and traditional phones that communicate over the PSTN. H.323 focuses on covering real-time voice, video, and data communication, and was designed specifically to operate over IP networks. Although not widely used, H.323 was also designed with voice and video conferencing capabilities. At this point, H.323 has become the world market leader for voice and video over IP communications (that means your VoIP system) and is even utilized in enterprise video conferencing solutions.

How Do They Differ?

First off, H.323 encodes its messages in a compact binary format, the 1’s and 0’s that only machines can comfortably read. SIP, on the other hand, has a simpler text-based format, much like the HTTP that powers pretty much all websites. In fact, a lot of the technology used to support HTTP was utilized when developing SIP. The differences between the two are bigger than that, though.
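
To see what “text-based” means in practice, here is a rough sketch in Python that builds and then reads back a simplified SIP INVITE request. Treat it as an illustration under assumptions rather than a working SIP stack: the addresses, the Call-ID, and the function names are hypothetical, and a real INVITE carries additional mandatory headers (such as Via and Max-Forwards) that are left out here for brevity.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Assemble a simplified, human-readable SIP INVITE request."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",   # request line: method, target, version
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # Like HTTP, lines end with CRLF and a blank line closes the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_request(raw: str) -> tuple[str, str, dict[str, str]]:
    """Split a request into its method, target URI, and header fields."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, uri, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, headers

# Hypothetical addresses and Call-ID, used only for illustration.
raw = build_invite("alice@example.com", "bob@example.com", "abc123")
method, uri, headers = parse_request(raw)
print(raw)
```

Because the message is plain text, it can be printed, read, and debugged by eye, which is a large part of SIP’s appeal compared with a binary format.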


  • H.323 was developed by the International Telecommunication Union (ITU), the organization responsible for many of the standards behind the Public Switched Telephone Network we use for landlines and legacy phones. Developed with video conferencing in mind, it naturally lends itself to voice as well.
  • SIP was developed, and is currently controlled, by the IETF as we mentioned above. This organization is responsible specifically for the protocols and overall functionality of the internet. SIP was designed to add a new flexible and modular layer to the internet.

Phones and Flexibility

  • H.323 deployments are mostly tied to vendor-specific implementations at this point, which explains why providers will require users to purchase their specific phones to ensure all functions and features work.
  • SIP, on the other hand, is much more flexible and generally any SIP phone will operate fully on almost any SIP network. Providers try to prevent this, but it’s mostly possible.

Use Cases

  • H.323 works very well for voice and video communications. Based on an original PSTN protocol, it makes sense for users to expect the same level of reliability and standard calling features. However, it has not expanded much beyond video or voice.
  • SIP, being modular, is what some refer to as “media agnostic.” SIP does not necessarily require a specific type of data to be transferred; therefore, it can be utilized for instant messaging, presence indicators (who is online and who isn’t), and even file transfers along with, of course, video and voice.
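
One way to picture SIP’s “media agnostic” side is that different kinds of communication use different SIP request methods on top of the same basic message format. The lookup table below is a small sketch of that idea in Python. The method names (INVITE, ACK, BYE, MESSAGE, SUBSCRIBE, NOTIFY) are real SIP methods, but the table itself, its keys, and the helper function are assumptions made up for illustration, not part of any library.

```python
# Hypothetical mapping from a use case to the SIP methods typically involved.
SIP_METHODS = {
    "call": ["INVITE", "ACK", "BYE"],      # set up, confirm, and end a session
    "instant_message": ["MESSAGE"],        # carry a short text message
    "presence": ["SUBSCRIBE", "NOTIFY"],   # ask for, then receive, status updates
}

def methods_for(use_case: str) -> list[str]:
    """Return the SIP methods this sketch associates with a use case."""
    try:
        return SIP_METHODS[use_case]
    except KeyError:
        raise ValueError(f"no mapping in this sketch for {use_case!r}") from None

for use_case in SIP_METHODS:
    print(use_case, "->", ", ".join(methods_for(use_case)))
```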

Ultimately, the end user will never really notice a difference between the two: both will make and receive calls, and function exactly as they should. However, with their origins being different and each focused on different tasks originally, it’s easy to understand where drawbacks could appear for each.

  • H.323 works very well for VoIP and video conferencing, and is utilized by most providers for these needs; however, it has not been updated much in the last 10 years and doesn’t power the ever-popular team messaging.
  • SIP has more flexibility and, therefore, more use cases with even more features and functions available. Overall, it has a wider range of functions, and while less specifically focused on voice calls, it’s still completely capable.

What Other Protocols, Standards, and Definitions Should I Know?

While SIP and H.323 are probably the most common and popular protocols, there are other options that exist. Beyond this, there are a number of standards and terms thrown around that can become confusing quickly.

Telephony Gateway: These gateways are the network elements that simply convert audio signals carried on the PSTN to data packets transferred over the internet, or your LAN.
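
As a rough sketch of what “converting audio to data packets” involves, the Python below chops a stream of audio samples into fixed-length frames and labels each one. The numbers are assumptions chosen for the example: an 8,000-samples-per-second rate with one byte per sample (as in classic telephone-quality audio) and 20 milliseconds of audio per packet. The function and field names are hypothetical, and a real gateway does far more (encoding, signaling, jitter handling).

```python
SAMPLE_RATE_HZ = 8000   # assumed telephone-quality sampling rate
FRAME_MS = 20           # assumed amount of audio carried per packet
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples

def packetize(samples: bytes) -> list[dict]:
    """Cut one-byte-per-sample audio into numbered, timestamped packets."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_FRAME)):
        packets.append({
            "seq": seq,                 # order of the packet in the stream
            "timestamp": start,         # offset of its first sample
            "payload": samples[start:start + SAMPLES_PER_FRAME],
        })
    return packets

one_second_of_silence = bytes(SAMPLE_RATE_HZ)  # 8000 zero-valued samples
audio_packets = packetize(one_second_of_silence)
print(len(audio_packets))  # 50 packets for one second of audio
```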

MGCP: The Media Gateway Control Protocol is simply a call control protocol, also known as a signaling protocol, utilized in VoIP systems. This protocol mirrors the structure of the standard PSTN.

Call Agent: Simply put, a “call agent” element is required in VoIP to deliver specific services to users and control the signaling communications between phones. Call agents instruct phones to provide dial tone and provide the heavy lifting with functions like phone number switching logic, call control, and endpoint registration.
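
The registration and switching duties described above can be sketched in a few lines of Python. This is a deliberately tiny model, not how any particular vendor’s call agent works: the class, its methods, the extensions, and the addresses are all hypothetical names invented for the example.

```python
class CallAgent:
    """Toy call agent: tracks registered phones and decides where calls go."""

    def __init__(self) -> None:
        # Maps an extension (e.g. "101") to the network address of its phone.
        self.registry: dict[str, str] = {}

    def register(self, extension: str, address: str) -> None:
        """Endpoint registration: a phone announces where it can be reached."""
        self.registry[extension] = address

    def route(self, dialed: str) -> str:
        """Switching logic: look up which phone should ring for a dialed number."""
        if dialed not in self.registry:
            raise LookupError(f"extension {dialed} is not registered")
        return self.registry[dialed]

agent = CallAgent()
agent.register("101", "10.0.0.11")   # hypothetical extension and address
agent.register("102", "10.0.0.12")
print(agent.route("102"))
```

In this model the phones know nothing about each other; every call depends on the agent being reachable, which is the trade-off that comes up again with SCCP below.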

H.248, or MEGACO: Developed jointly by the IETF and the ITU, H.248 is a media gateway control protocol that provides telecommunication functions and services across both modern packet networks (like the Internet or your LAN) and the PSTN.

SCCP: The Skinny Client Control Protocol, also known as SKINNY. The name fits: something “skinny” has cut out the fat, so it has fewer features and functions but the same core elements. When it comes to VoIP, though, SKINNY is a proprietary and Cisco-specific protocol. SCCP was developed for IP telephony specifically, but has integrated video. SCCP employs a “central call agent,” which allows for very advanced and complex call features. SCCP requires that the call agent always remain available to provide call features, which makes SKINNY a less than suitable option for implementations that require endpoints to function independently from a call agent.

What Does The Future Hold?

Of course, this doesn’t tell the whole story, and a number of alternative protocols and standards do exist. In modern times, SIP and H.323 are the most widely adopted and utilized standards worth focusing on; however, that will change very soon.

We previously discussed WebRTC at length, and it’s worth mentioning again. WebRTC can be considered a modern catalyst of VoIP, moving the technology beyond the limitations of SIP and H.323, even with all their flexibility. WebRTC, which stands for Web Real-Time Communication, represents the newest collection of protocols and APIs that enable real-time communications directly in our browsers and phone or computer apps. On top of that, new 5G connections will boost speeds and help expand both VoIP and WebRTC.

WebRTC Will Only Expand VoIP

WebRTC also utilizes peer-to-peer connections, allowing users to establish the most direct connection possible with each other. And, as we all know, simplicity means greater adoption. Therefore, in simple terms, WebRTC will allow us to carry out VoIP calls and video conferences even more directly in our web browsers or phone apps without ever having to download and install a plug-in, launcher, or independent application. Even contact centers will benefit immensely from adopting WebRTC.

This new protocol provides the ability to send voice and video over an IP network, albeit in a less restrictive way. WebRTC will not and cannot outright replace VoIP. You can have VoIP without WebRTC, as we have for many years, but you cannot have WebRTC without VoIP because it is VoIP — or, rather, an evolution of VoIP that allows it to live directly and comfortably inside your web browser. This is the right step for VoIP — pushing it into new boundaries and use cases that will help keep the method around for a long time, and even grow into something completely new.