Compared to traditional landlines, a business VoIP phone system provides benefits like custom routing, HD voice, analytics, and other advanced features. VoIP technology relies on several communication protocols to successfully transmit audio data over the Internet. VoIP protocols provide the functionality and framework for every step of the VoIP process.

This article will cover all you need to know about VoIP protocols, including the most important ones, how they work, and why they’re needed.


What is a VoIP Protocol?

A VoIP protocol is a set of rules and specifications that dictate how users’ software systems set up, maintain, and end VoIP calls. For users to connect on a VoIP call over the internet, their software systems must first coordinate basic information: each party’s identification, the codecs involved, channels used, media types transmitted, and more. Then, each party must actively send and receive multimedia data for the duration of the call session.

Voice over IP (VoIP) protocols outline rules and directions that guide the software through this process, making VoIP calls possible.


What do VoIP Protocols Do?

VoIP protocols specify the order, steps, and rules about how software systems should format and send data to establish and stream a VoIP connection. To facilitate internet-based calling, VoIP communication requires several functions: establishing a real-time session, registering involved users, determining which types of media will be sent, actually transmitting and receiving the media, and more. VoIP protocols tell software systems which steps to take, and which messages to send, to facilitate this process.

A single VoIP phone call involves several protocols, some of which work simultaneously and others in a particular order.

Protocols handle the following VoIP functions:

  • Transport: Transport protocols move data between endpoints so that each party can send and receive packets during the call. Reliable transports such as TCP also confirm data receipt, while real-time media usually travels over UDP, which trades delivery guarantees for lower latency.
  • Connection management: Connection management protocols establish a call or session between user endpoints.
  • Signaling: Signaling protocols identify each endpoint's location and IP address, dial the involved parties, negotiate the codecs that the call will use, and handle call controls like mute and transfer.
  • Media description: Media description protocols coordinate which types of media will be sent during the session, such as audio, video, text, and other types of data.
  • Media: Media protocols handle the actual real-time data transmission during a VoIP session, including audio and video.
  • Security: Security protocols verify user identities, uphold access controls, and encrypt session data.

There are dozens of VoIP protocols available to software developers, including both proprietary and open-source protocols. Many protocols offer the same functionality and can replace each other (as SIP has largely replaced H.323 over the last decade), while other protocols rely on each other to work.


Most Common VoIP Protocols

These are the most common VoIP protocols. Most of the protocols listed below work together and are commonly implemented in today’s VoIP platforms.


Session Initiation Protocol (SIP)

The most common VoIP protocol, SIP, is a signaling protocol that establishes, maintains, and terminates a connection between all parties on a VoIP call. SIP identifies call participants and then defines the format and order of messages between them, including invites and ringing notifications. Once SIP establishes the call between endpoints, another protocol like RTP takes over the active media stream.
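
As a rough illustration of the text-based message format SIP defines, the sketch below assembles a minimal INVITE request and parses a response status line such as "180 Ringing". The helper names build_invite and parse_status_line, the addresses, and the identifier values are hypothetical and chosen only for this example; a real SIP stack adds more headers, retransmission timers, and authentication.

```python
def build_invite(caller, callee, call_id, branch, tag, sdp_body=""):
    """Assemble a minimal SIP INVITE request as text (illustrative only)."""
    # Assumption: the caller's domain doubles as the host named in Via.
    host = caller.split("@", 1)[1]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
    ]
    if sdp_body:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(sdp_body.encode('utf-8'))}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + sdp_body


def parse_status_line(line):
    """Split a response line such as 'SIP/2.0 180 Ringing' into (code, reason)."""
    version, code, reason = line.strip().split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version}")
    return int(code), reason


invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.org",
    call_id="a84b4c76e66710",
    branch="z9hG4bK776asdhds",  # branch values start with the cookie z9hG4bK
    tag="1928301774",
)
print(invite)
print(parse_status_line("SIP/2.0 180 Ringing"))
```

In a typical call, the caller sends INVITE, receives provisional responses such as 100 Trying and 180 Ringing, then a 200 OK, and confirms with ACK. Either party ends the call with BYE.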


Real-Time Transport Protocol (RTP)

RTP is a transport protocol that delivers audio and video media in real time between endpoints during a VoIP call. Once SIP establishes the VoIP call, RTP takes over to stream audio data during the active call. Nearly all VoIP and video-conferencing platforms use RTP for live media communications, including browser-based frameworks like WebRTC.
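
To make RTP's role more concrete, the following sketch packs and unpacks the 12-byte fixed RTP header (version, marker, payload type, sequence number, timestamp, and SSRC) with Python's struct module. The function names and sample values are assumptions for illustration, and the sketch ignores padding and header extensions.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prefix a media payload with a minimal RTP header."""
    byte0 = 2 << 6  # version 2, no padding, no extension, zero CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(
        byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


def unpack_rtp(packet):
    """Read the fixed header fields and return them with the payload."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    csrc_count = byte0 & 0x0F
    header_len = RTP_HEADER.size + 4 * csrc_count  # skip any CSRC entries
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


# 20 ms of 8 kHz G.711 audio is 160 one-byte samples (payload type 0 = PCMU).
packet = pack_rtp(payload_type=0, seq=1, timestamp=160, ssrc=0x1234,
                  payload=b"\x00" * 160)
print(len(packet), unpack_rtp(packet)["sequence"])
```

The sequence number lets the receiver detect lost or reordered packets, while the timestamp tells it when to play each chunk of audio.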


RTP Control Protocol (RTCP)

Working alongside RTP, RTCP provides quality of service (QoS) and packet-delivery statistics for multimedia data. RTCP tracks and relays information to call parties about packet counts, packet loss, and round-trip delay time. This information helps phone system software identify data transmission bottlenecks and poor connectivity when troubleshooting.
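
The statistics RTCP reports can be sketched with two small calculations: a running interarrival jitter estimate, smoothed with the 1/16 gain used in the RTP specification, and a simple loss fraction. The function names and sample timestamps below are hypothetical, and real RTCP reports encode these values in fixed-point fields rather than floats.

```python
def interarrival_jitter(send_ts, recv_ts):
    """Running jitter estimate, in the same units as the timestamps."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # Difference in transit time between consecutive packets.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        # Move the estimate 1/16 of the way toward the new deviation.
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_fraction(expected, received):
    """Share of expected packets that never arrived (0.0 to 1.0)."""
    if expected <= 0:
        return 0.0
    return max(0, expected - received) / expected


# Packets sent every 160 timestamp units; the third one arrives 16 units late.
sent = [0, 160, 320]
arrived = [1000, 1160, 1336]
print(interarrival_jitter(sent, arrived))
print(loss_fraction(expected=100, received=95))
```

Because the estimate moves only a sixteenth of the way toward each new deviation, a single late packet nudges the jitter value instead of causing it to spike.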


Secure Real-Time Transport Protocol (SRTP)

SRTP is a security protocol that works alongside RTP to encrypt data, authenticate messages, verify their integrity, and provide replay attack protection. Though it partners with RTP, SRTP is optional. Users can enable and disable each of SRTP’s features separately.
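
Replay protection is the easiest of these features to sketch without a cryptography library. The hypothetical ReplayWindow class below tracks recently seen packet indexes in a sliding window and rejects duplicates and packets that are too old. It is a simplified illustration only: it treats the packet index as a plain integer, ignores sequence-number rollover, and performs no encryption or authentication, all of which a real SRTP implementation handles.

```python
class ReplayWindow:
    """Sliding-window replay check (simplified, illustrative only)."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1  # highest packet index accepted so far
        self.bitmap = 0    # bit k set means index (highest - k) was seen

    def check_and_update(self, index):
        """Return True if the packet is new, False if replayed or too old."""
        if index > self.highest:
            # Slide the window forward and mark the new highest index.
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # older than the window can track
        if self.bitmap & (1 << offset):
            return False  # already seen: treat as a replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
for idx in (0, 2, 1, 1, 100, 2):
    print(idx, window.check_and_update(idx))
```

In real SRTP, the receiver records a packet as seen only after its authentication tag has been verified, so that forged packets cannot poison the window.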


Session Description Protocol (SDP)

Working alongside SIP, SDP is a media description protocol that exchanges basic information between users about the call. SDP does not carry any media itself; it is usually sent inside SIP messages. It conveys information like the session’s name and purpose, start and end times, the types of media included in the session, endpoint port numbers, codecs used, and more.
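
An SDP description is a short list of "key=value" lines, which the sketch below parses into a dictionary. The parse_sdp helper and the sample session are assumptions made for illustration; the parser handles only the session fields, media ("m=") lines, and codec mappings ("a=rtpmap") shown here, not the full SDP grammar.

```python
EXAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 198.51.100.1
s=VoIP call
c=IN IP4 198.51.100.1
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(text):
    """Parse session-level fields and media sections from an SDP body."""
    session = {"media": []}
    current = session  # lines belong to the session until the first "m="
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            media_type, port, proto, *formats = value.split()
            current = {
                "type": media_type,
                "port": int(port.split("/")[0]),  # drop an optional port count
                "proto": proto,
                "formats": formats,
                "rtpmap": {},
            }
            session["media"].append(current)
        elif key == "a" and value.startswith("rtpmap:") and current is not session:
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            current["rtpmap"][payload_type] = encoding
        elif current is session:
            session[key] = value
    return session


parsed = parse_sdp(EXAMPLE_SDP)
audio = parsed["media"][0]
print(parsed["s"], audio["port"], audio["rtpmap"])
```

Here the caller offers one audio stream on port 49170 and lists two codecs, PCMU and PCMA, from which the other party can choose.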


Media Gateway Control Protocol (MGCP)

MGCP is a signaling and call-control protocol that controls the media gateways between the internet and the public switched telephone network (PSTN). Some VoIP calls use both the internet and the circuit-switched PSTN, requiring media gateways that convert packet-based audio data into a circuit-switched audio signal that’s compatible with the PSTN. MGCP lets a central call agent control these gateways for VoIP calls and endpoints that involve PSTN phone lines.



H.323

The predecessor to SIP, H.323 is a system specification consisting of multiple protocols that establish sessions for data-packet transmission over an IP network. H.323 features protocols that handle registration, call signaling, and channel opening for a VoIP call to occur. While some VoIP and video communications software still uses H.323, most VoIP platforms have switched to SIP, which provides the same functionality with a simpler setup.


Supplementary or Less Popular Protocols

The protocols listed below are less common than those above. These protocols may have specialized use cases that limit their popularity, or they may play a more supplementary role in VoIP calling. Some of the protocols below are outdated and have been gradually replaced by the ones above.


XMPP and Jingle

XMPP (Extensible Messaging and Presence Protocol) is an application-layer protocol originally designed to transmit instant messages, presence data, and contact list information over the internet. This functionality has been integrated into many other types of applications, such as VoIP, video conferencing, and file transfer.

Jingle is an XMPP extension and signaling protocol that adds peer-to-peer session control to XMPP, so that clients can set up VoIP calls, video calls, file transfers, and other media sessions. Jingle negotiates and manages these sessions, but relies on RTP to stream the media itself.


Inter-Asterisk Exchange (IAX)

IAX is a communications protocol and SIP alternative that provides VoIP telephony on the Asterisk private branch exchange (PBX) software. While most VoIP implementations use separate protocols, such as SIP, MGCP, and RTP, for signaling and media, IAX uses one data stream and port number for both session signaling and media transmission. This design simplifies some aspects of cloud-based telephony, such as passing calls through firewalls and NAT. While Asterisk has grown in popularity and is now used for roughly 16% of VoIP systems, IAX itself remains less widely used than SIP and the protocols described above.


H.248 (Megaco)

An alternative to MGCP developed jointly by the IETF and ITU-T, H.248 enables media gateway controllers to communicate with media gateways, so the gateways appropriately convert audio between the circuit-switched PSTN and packet-based IP networks. However, H.248 does not facilitate communication between different media gateway controllers. It therefore depends on signaling protocols such as SIP or H.323 to make a complete system that connects endpoints on a VoIP call.



H.320

The H.320 recommendation contains multiple protocols that enable narrow-band visual telephone systems, particularly videoconferencing systems and video phones, to transmit audio and video media over the PSTN. The recommendation describes and defines communication modes, terminal types, and call control arrangements that allow videoconferencing to work over landlines. The protocols within H.320 apply only to media over ISDN-based networks, which carry video over digital phone lines, and they specify only narrow-band channels, with bit rates from 64 to 1920 kbit/s (1 to 30 channels of 64 kbit/s each). Due to the limited popularity of ISDN and the fact that modern-day data transmission methods are much faster, H.320 is hardly used at all today.



H.324

H.324 is a recommendation that provides the standard for low-bitrate multimedia communication over traditional analog phone lines. Similar to H.320, H.324 applies to voice, video, and data transmitted over landlines. It outlines the technical requirements for low-bitrate endpoints and terminals to engage in multimedia communications over the PSTN.


Skinny Client Control Protocol (SCCP)

SCCP is a proprietary signaling and call-control protocol, owned by Cisco, that connects Cisco hardware endpoints, such as Cisco VoIP phones, to a central Cisco call-control server. Like MGCP, it keeps most call-handling logic on that server rather than on the endpoint, and like SIP, it registers and connects endpoints. The audio itself still travels over RTP. Since SCCP is proprietary and SIP is open, SIP has become much more widely adopted among VoIP services.


VoIP Protocols are an Essential Part of Cloud-Based Telephony

Many different protocols, such as SIP, RTP, and MGCP, work together to enable cloud VoIP telephony. Over the last few decades, more efficient protocols and software have gradually phased out some of the older ones, such as H.323 and H.324.

If you’re using a VoIP system or looking to build one, your platform most likely uses the protocols listed in the Most Common VoIP Protocols section of this article.