The effectiveness of a company’s business processes depends on employee engagement and workplace communication, especially given the new trend of remote work. For some niches, communication with customers is key to business success. At some point, businesses realize they need an effective tool for facilitating communication among employees.
A common solution is developing real-time chat functionality to let users get the information they need in seconds. You can embed chat functionality into existing applications (like an m-commerce app) or create a standalone app for workplace communication (like Slack or Healthfully).
Existing messaging applications are equipped with lots of useful features and extensions but can cost a pretty penny. For instance, Slack charges $8 a month per active user. Also, some industries (like finance) require strong privacy controls, so companies need to store all data on their own servers. This creates the need to build a custom messenger or integrate chat functionality into existing apps.
Having developed standalone messaging applications and numerous projects with built-in chat functionality, we decided to reveal the real magic behind a messaging application. This extensive guide covers three key subjects:
Must-have functionality: Discover the main feature set for any messaging app.
What’s under the hood? Learn about the technical aspects of messaging app development.
Final touches: Consider features to make your app stand out.
Here are the main features that will turn your application into a silver bullet.
Though the content of messages may vary, all text messaging applications deliver small amounts of information from one user to another. This is the core feature of any messenger app, so you should think through every detail of it:
Secret chats. Implement chats that disappear after a certain period of time to provide an extra level of privacy.
Delivery status. It shows whether the chat participant has received and seen a message.
Unsend message. Most modern messengers let users unsend messages within a certain time frame after the message is sent.
Group chats. Implement group chat functionality, especially when building messengers for communities or teams. Group chat was a core feature of Healthfully, a medical platform we worked on that improves communication in hospitals. We created a UI/UX design and were responsible for web development.
Moderation. Adding group chat functionality also requires adding moderation functionality. Enable admins to ban or remove users, delete messages, and make other users admins. Telegram also has a feature that allows admins to limit the number of messages from one user during a certain period of time.
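The per-user message limit described under moderation can be sketched as a sliding-window limiter. This is a minimal illustration, not how Telegram implements the feature; the class name `SlowModeLimiter` and its parameters are hypothetical, and the state lives in memory for a single chat.

```python
import time
from collections import defaultdict, deque


class SlowModeLimiter:
    """Allow at most `max_messages` per user within a sliding `window_seconds`."""

    def __init__(self, max_messages, window_seconds, clock=time.monotonic):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be exercised without sleeping
        self._sent_at = defaultdict(deque)  # user_id -> timestamps of accepted messages

    def allow(self, user_id):
        """Return True and record the message if the user is under the limit."""
        now = self.clock()
        timestamps = self._sent_at[user_id]
        # Forget messages that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_messages:
            return False
        timestamps.append(now)
        return True
```

An API node would call `allow(user_id)` before accepting a message in a moderated group and reject the message (or tell the client how long to wait) when it returns `False`.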
When there are lots of members in a group, it becomes hard to coordinate activities and stay in touch. Thanks to polls, making decisions and coordinating in groups is faster and easier.
Voice and video messages
Sometimes users don’t have time to type long messages. For added convenience, allow users to send video and audio messages. Voice messaging, which WeChat introduced in 2011, quickly captured the hearts and minds of Chinese users and started spreading to other chatting software.
Voice and video calls
The more channels of communication you provide, the more convenient your app will be for users. Let users communicate via free voice and video calls. Group call functionality takes more effort to implement but makes it possible to run video meetings.
Users want to instantly share photos, GIFs, videos, documents, and other content. Implement file exchange functionality that allows users to send files in various formats. You can integrate with Google Drive and Dropbox using their APIs to let users import files directly from these services.
Photos are the most popular type of file to exchange. It’s a good idea to let users highlight elements in photos or simply decorate them. Photo editing functionality can vary greatly, from cropping to adding Snapchat-like filters.
Our developers have created a useful feature called ForceBlur that blurs received images and allows users to make them visible using 3D Touch. This is a perfect solution for situations when users want to send sensitive images and want to make sure that only their intended recipient will see them. ForceBlur is an open-source solution; you can find the code on our GitHub page.
We’ve also developed an open-source cropping library for Android called uCrop. This easy-to-integrate tool is used by popular apps like Discord, Mi Fit, Mi Drop, and ShareChat. The source code is also available on the Yalantis GitHub page.
Public channels make messaging applications a powerful marketing tool and a medium that companies, bloggers, and professional communities can use to make announcements. Typically, only admins can create posts in public channels; subscribers are unable to create new posts but can like and comment on existing posts. We developed public channels called Communities in Healthfully. In these channels, doctors can post useful information for patients who, in turn, can like or comment on those posts.
Another question is how users discover public channels. Telegram has no official way to browse channels. Usually, users share links to channels they’re already subscribed to. WeChat and Viber enable users to discover channels by QR code.
Rakuten has implemented a channel discovery feature in Viber with Tinder-like swipes. The Discover section in Viber offers a range of channels categorized by topic. Users can pick a topic and use the swipe feature to browse. This adds a level of gamification to the application.
Integration with social networks
There are two main reasons to integrate with social networks:
Quick registration. During registration, a user can choose to link their messenger profile with a social network profile. This saves time on typing information.
File sharing. Users can share files via social media accounts. In turn, some messengers provide custom buttons on the websites that allow users to share posts from the websites directly in the messenger.
What’s under the hood
Now let’s dive deeper into the technical part of messaging app development.
Let’s start from the basics: you’ll need to choose the communication protocol for your app. A communication protocol is a set of rules that allow two or more devices to exchange data in real time. Your choice of communication protocol is especially important for a messaging app since it should provide a seamless and low-latency connection between all chat participants.
Here are some of the communication protocols you might consider:
Extensible Messaging and Presence Protocol (XMPP)
This is an old and sophisticated XML protocol that was born in the times when ICQ was popular. XMPP can easily be extended or changed depending on your project’s needs. Designed for real-time messaging, XMPP has a very efficient push mechanism. It’s secure, since it uses SASL; also, the development team is working on end-to-end encryption to make the protocol even more secure. Thanks to the decentralized nature of this protocol, anyone can run their own XMPP server.
But XMPP has disadvantages as well. Developers complain about the verbosity and complexity of the protocol, and long session handshakes make connection setup slow. Also, it doesn’t provide message delivery confirmation by default, so developers need to set this up manually. In our view, XMPP is heavyweight and outdated.
Message Queuing Telemetry Transport (MQTT)
You can consider the MQTT protocol for exchanging data between clients and servers. This publish–subscribe protocol was developed for gathering telemetry data. Nowadays, MQTT usually connects different IoT devices. It works over TCP/IP and supports secure connections over TLS. But one thing that could complicate your backend development is the message broker that MQTT requires.
By default, MQTT message brokers have a simple list of features: authentication, authorization, and publishing and subscribing to topics. To implement features like group chats and message statuses, you need to develop plugins or custom services that receive messages, process them, and only then publish them to another topic for delivery to the user. This looks a bit complicated and isn’t optimal from the architectural and performance perspectives.
WebSockets
This modern and secure protocol allows for continuous bidirectional data exchange between clients and servers. Once a connection is established, the server and client can exchange data without further requests from the client side: data is sent to the client right after it arrives at the server. Because all of a client’s messages travel over a single long-lived connection with minimal framing overhead, WebSockets consume less traffic and deliver messages faster than the other protocols mentioned above. If you want to learn more about WebSockets, we recommend reading our article on using WebSockets in Go.
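To make the connection step concrete: a WebSocket session starts as an ordinary HTTP request that the server upgrades, and the server confirms the upgrade by returning a `Sec-WebSocket-Accept` header derived from the client’s `Sec-WebSocket-Key`. The sketch below shows only that derivation as defined in RFC 6455; the function name is ours, and a real server would rely on a WebSocket library rather than hand-rolling the handshake.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 handshake example.
print(websocket_accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this single exchange, either side can send frames at any time over the same TCP connection, which is what makes server-initiated delivery possible.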
Your communication protocol is only one consideration. You also need to choose a programming language for server-side implementation. Your back end must be reliable and scalable and must support millions of connections per node, so concurrency from the start is crucial. Golang and Erlang have proved to be reliable solutions for messaging app development, so we’ll compare these two options.
Erlang. It’s no secret that the back end for WhatsApp, the most popular messenger in the world, is written using Erlang. Why Erlang? Because it was created at Ericsson in the 1980s for telecommunications equipment and supports concurrency by design. It’s functional, dynamically typed, and performs garbage collection during runtime. You can choose Erlang as the language for the back end of your messaging app, but there’s one problem: Erlang isn’t so popular. Finding experienced Erlang developers is pretty hard, and they charge high hourly rates.
Golang (Go). What about using Go for the back end? It’s a simple, statically typed, and lightning-fast programming language that compiles into native code. Introduced by Google in 2009, Go also supports concurrency by design and has a garbage collector. Go’s concurrency model was inspired by CSP (communicating sequential processes), while Erlang follows the actor model; both favor lightweight processes that communicate by passing messages. Go has a huge community and a variety of open-source projects. Finding developers who are experienced in Go is not as hard as finding Erlang developers.
And what about client-side programming languages? According to current trends in the mobile development world, Swift and Kotlin have become the top solutions for iOS and Android development respectively, dethroning the traditional Objective-C and Java.
Storing chat history
There are two approaches to storing chat history: in the cloud (like Telegram and Slack do) and on users’ devices (like Viber does). Storing data in the cloud means that messages will be available when a user switches to another device. But it requires additional expenses on your side. Storing everything on users’ devices is less convenient for users but doesn’t require investments in cloud storage. Viber allows users to back up their chat history in Google Drive. Users can enable auto backup, but this feature stores only text messages.
1. Overall app architecture
What parts do we need to build a messaging app? Take a look at the component diagram above and you'll see the following:
An API is the core of a real-time messaging app. It’s responsible for establishing a connection between a client and a server for authentication, event processing, message routing, and other purposes.
Media service. Any messenger needs to store, process, and return images, videos, and other data. That’s what a media service does.
A queue is responsible for routing messages between API nodes. A queue can be any broker with a fan-out messaging pattern, clustering support, and high throughput.
File storage can be a classic object storage with support for private buckets or a CDN for fast content delivery.
A database cluster is responsible for storing users’ profiles, settings, chats, events, etc. It must scale vertically and horizontally and must support replication and backups.
A key–value cache cluster is needed for storing temporary user data, counters, tokens, and so on. It must be scalable, so for this part of the architecture we suggest using Redis Cluster. It supports various data types, sharding, and clustering.
A time-series database is needed for saving statistical data like user registrations/logins or message sending events. InfluxDB is our choice for this. You could use the full Influx stack for gathering, storing, and visualizing data.
A notification service is responsible for sending push notifications to mobile devices and sending SMS codes for two-factor authentication.
An SMS service is an external service that sends SMS messages to users’ phone numbers. The main criterion is an SLA for SMS delivery. An SMS service must deliver messages to users as fast as possible and operate in the countries where you’ll launch your messaging app. Good options are AWS SNS and Twilio.
A push notifications service is an external service that delivers push notifications to the mobile platforms you plan to support. We prefer using Google Firebase for Android and iOS apps, as it’s free and fast.
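As one illustration of how two of the components above might cooperate, the sketch below issues a short-lived SMS code for two-factor authentication. It rests on assumptions: the in-memory dictionary stands in for the key–value cache (in Redis you would set an expiry on the key instead), the class `SmsCodeStore` and its TTL are hypothetical, and handing the code to the SMS provider is left out.

```python
import hashlib
import hmac
import secrets
import time


class SmsCodeStore:
    """In-memory stand-in for a key-value cache with per-key expiry (hypothetical)."""

    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._codes = {}  # phone -> (SHA-256 digest of the code, expiry time)

    def issue(self, phone):
        """Create a six-digit code; the caller passes it on to the SMS service."""
        code = f"{secrets.randbelow(10**6):06d}"
        digest = hashlib.sha256(code.encode()).digest()
        self._codes[phone] = (digest, self.clock() + self.ttl_seconds)
        return code

    def verify(self, phone, code):
        """Check a submitted code; each issued code can be tried only once."""
        entry = self._codes.pop(phone, None)
        if entry is None:
            return False
        digest, expires_at = entry
        if self.clock() >= expires_at:
            return False
        return hmac.compare_digest(digest, hashlib.sha256(code.encode()).digest())
```

Storing only a digest and removing the entry on the first attempt are design choices made for this sketch; a production service would also limit how often codes can be requested per phone number.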
2. Message flow
Let’s see how messages are distributed in a group with three participants (see the schema below):
A message comes from a sender to API Node 1 via the Load Balancer (a device that distributes traffic across a number of servers).
API Node 1 processes the message, obtains the list of chat participants from the database, creates messages for each participant, and transmits them to the Queue.
The Queue delivers the message to each API Node under the Load Balancer except Node 1.
API Node 2 skips the message for the Web Client but delivers it to the iOS client.
API Node 3 skips the message for the iOS client but delivers it to the Web Client.
It looks like we have some overhead, since every API node receives the messages for each chat participant and then delivers them only to its own connected clients. But with this solution, we don’t need to save client connection states globally or maintain a global key–value map of connected clients and nodes. This makes the solution suitable for mobile clients with unstable connections and frequent reconnection attempts.
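The message flow above can be simulated in a few lines. This is a simplified sketch under stated assumptions: `ApiNode`, `FanOutQueue`, and `send_group_message` are hypothetical names, plain lists stand in for client sockets, and the queue is an in-process loop rather than a real broker.

```python
from dataclasses import dataclass, field


@dataclass
class ApiNode:
    name: str
    connected: dict = field(default_factory=dict)  # user_id -> inbox (stands in for a socket)

    def connect(self, user_id):
        self.connected[user_id] = []
        return self.connected[user_id]

    def on_queue_message(self, envelope):
        # Deliver only if the recipient is connected to this node; otherwise skip.
        inbox = self.connected.get(envelope["to"])
        if inbox is not None:
            inbox.append(envelope["text"])


class FanOutQueue:
    """Broadcasts every envelope to all subscribed nodes except the publisher."""

    def __init__(self):
        self.nodes = []

    def subscribe(self, node):
        self.nodes.append(node)

    def publish(self, envelope, origin):
        for node in self.nodes:
            if node is not origin:
                node.on_queue_message(envelope)


def send_group_message(origin, queue, participants, sender, text):
    # The origin node creates one envelope per recipient and hands it to the queue.
    # Recipients connected to the origin node itself are not handled in this sketch;
    # a real node would deliver to them locally.
    for user_id in participants:
        if user_id != sender:
            queue.publish({"to": user_id, "text": text}, origin)
```

No node needs to know where a recipient is connected: each one checks only its own `connected` map, which is the trade-off the paragraph above describes.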
3. Architectural patterns
If you want your app to successfully compete with market leaders, it needs to deliver messages as fast as possible in unstable network environments and support message synchronization between multiple devices. This dictates strict requirements for the server-side software architecture. You need to choose the right architectural pattern to avoid rewriting services in the future.
The Event Sourcing architecture pattern ideally suits a messaging app. In contrast with other patterns, instead of storing the current state of data in a domain, the Event Sourcing pattern records a full series of actions (events) taken on that data.
Event Sourcing persists the state of an entity, i.e. a chat, as a sequence of events. An app persists events in an event log, which is a database of events, and this log has an API for adding and retrieving an entity’s events.
Some users can have a large number of events. In order to optimize app loading, an API can periodically save a snapshot of an entity’s state. To reconstruct the current state, the application finds the most recent snapshot as well as events that have occurred since that snapshot. As a result, there are fewer events to replay.
When a user installs a mobile application from scratch on a fresh device and logs in to an existing account, they receive the latest snapshot and the events that occurred since the date of that snapshot. After this initial synchronization, the mobile application reconstructs the current state by applying events to the snapshot one by one in chronological order.
When a user launches a mobile application after some idle time, they receive events since the last online timestamp till the present, and the app reconstructs the state in the same way by applying events one by one chronologically.
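The snapshot-and-replay logic described above can be sketched as follows. This is an illustrative model, not a production event store: the event types, the `EventLog` class, and `sync_client` are hypothetical, and events are ordered by a sequence number instead of a timestamp to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ChatState:
    messages: dict = field(default_factory=dict)  # message_id -> text
    last_seq: int = 0  # sequence number of the last applied event


def apply_event(state, event):
    """Apply a single event to the state in place."""
    if event["type"] in ("message_sent", "message_edited"):
        state.messages[event["id"]] = event["text"]
    elif event["type"] == "message_deleted":
        state.messages.pop(event["id"], None)
    state.last_seq = event["seq"]


class EventLog:
    def __init__(self, snapshot_every=3):
        self.events = []
        self.snapshot = ChatState()
        self.snapshot_every = snapshot_every

    def append(self, event_type, **payload):
        event = {"seq": len(self.events) + 1, "type": event_type, **payload}
        self.events.append(event)
        if event["seq"] % self.snapshot_every == 0:
            self.snapshot = self.rebuild()  # periodic snapshot keeps replays short
        return event

    def events_since(self, seq):
        return [e for e in self.events if e["seq"] > seq]

    def rebuild(self):
        # Start from a copy of the latest snapshot and replay only newer events.
        state = ChatState(dict(self.snapshot.messages), self.snapshot.last_seq)
        for event in self.events_since(state.last_seq):
            apply_event(state, event)
        return state


def sync_client(log, state=None):
    """Fresh install: start from the snapshot. Returning client: reuse local state."""
    if state is None:
        state = ChatState(dict(log.snapshot.messages), log.snapshot.last_seq)
    for event in log.events_since(state.last_seq):
        apply_event(state, event)
    return state
```

Calling `sync_client(log)` models a fresh install, while `sync_client(log, local_state)` models a client that comes back online and only needs the events it missed.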
4. Audio and video calls
WebRTC (Web Real-Time Communication) is a common solution for audio and video calls. In plain English, WebRTC is a collection of APIs and protocols that enable real-time peer-to-peer connections for streaming audio and video.
WebRTC is completely free and open source, so it can easily be used by anyone. And thanks to the openness of the project, it has a vibrant ecosystem around it. Besides that, WebRTC is available in most modern browsers, it’s portable, and it can be used for mobile apps as well.
There are three main APIs for WebRTC that, when working together, provide smooth transmission of high-quality audio and video data:
Media Streams is responsible for granting access to a device’s camera, screen, and microphone.
PeerConnection is probably the most important part of WebRTC and is the hardest to implement. It’s responsible for establishing connections between users. PeerConnection generates a network for exchanging messages and processes incoming requests, implements ICE to connect media channels, encodes and decodes audio and video data, sends and receives media over the network, and handles all local and network issues.
DataChannel is used to send arbitrary information between devices. You can configure data channels to be reliable or unreliable as well as ordered or unordered in the way they deliver messages.
The process of discovery and negotiation of WebRTC peers is called signaling. Using a signaling server, two devices can discover each other and exchange negotiation messages. WebRTC does not specify a signaling transport, so you need to provide one yourself; WebSockets are a common choice.
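At its core, a signaling server only relays negotiation messages between peers. The sketch below models that relay in memory; `SignalingRoom` is a hypothetical name, lists stand in for the peers’ WebSocket connections, and the message payloads are placeholders rather than real SDP or ICE data.

```python
class SignalingRoom:
    """Relays opaque negotiation messages between the peers of one call."""

    def __init__(self):
        self.peers = {}  # peer_id -> list acting as that peer's connection

    def join(self, peer_id):
        self.peers[peer_id] = []
        return self.peers[peer_id]

    def relay(self, sender_id, target_id, message):
        # The server never inspects the payload; it only forwards it.
        if target_id not in self.peers:
            raise KeyError(f"unknown peer: {target_id}")
        self.peers[target_id].append({"from": sender_id, **message})


room = SignalingRoom()
alice_inbox = room.join("alice")
bob_inbox = room.join("bob")
room.relay("alice", "bob", {"type": "offer", "sdp": "<placeholder>"})
room.relay("bob", "alice", {"type": "answer", "sdp": "<placeholder>"})
```

Once the offer, the answer, and the ICE candidates have been exchanged this way, media flows between the peers directly and the signaling server is no longer on the data path.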
To make embedding and integrating audio and video calls with WebRTC easy as pie, you can use the following SDKs:
- MirrorFly is an easy-to-use WebRTC-powered tool that helps developers integrate live broadcasting, video conferencing, and video calls. It boasts low latency, advanced H.264 data encryption, and the ability to stream HD videos.
- Sinch is a lightweight cross-platform SDK for video calls that has extensive and understandable documentation and allows developers to control media streams.
- The WebRTC framework from PubNub supports a wide range of WebRTC features and is considered a highly reliable solution that encompasses enterprise-grade security.
- OpenTok is a WebRTC platform that simplifies the integration of voice and video calls, screen sharing, and video streaming into mobile and web apps. We’ve used this tool for creating a messaging application and can vouch for its reliability, simplicity to integrate, and ease of use.
5. Media service image, audio, and video processing
All photos, audio, and video messages should be processed somehow on the back end of your app. You need to generate image previews, compress audio and video files, etc. Part of this work can be done on the client side, but sometimes you need to offload it to the server side.
If you’re developing a cloud-agnostic solution, you don’t have any alternatives except to write your own scalable media service with high throughput and optimize it during development.
Follow these security best practices and use these technologies during development and deployment:
Use TLS (the successor to the now-deprecated SSL) for any type of client–server connection.
Connect a user’s profile to a unique phone number and implement two-factor authentication using SMS.
Pin certificates for mobile devices.
Use end-to-end encryption for message content (text, audio, video, and calls) using asymmetric cryptography algorithms. Generate key pairs for each chat on users’ devices, and never store private keys in the database.
Use Private File Object storage for each chat with authentication and authorization so only active chat participants can download media files from the chat bucket.
Use a private network for infrastructure deployment.
Back up everything.
Use rate limiting and throttling.
Comply with GDPR requirements.
Do not save sensitive user data in the database. If you must save it, encrypt it or, when you only need to verify it later, hash it. Every value must be hashed with its own unique salt to defeat lookup and rainbow table attacks.
Use a hashing algorithm at least as strong as SHA-512.
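The salted hashing recommended in the list above can be sketched with Python’s standard library. Note the swap: instead of a single SHA-512 pass, this example uses PBKDF2 with HMAC-SHA-512, which applies the same hash many times to slow down guessing. The iteration count and the function names are assumptions for illustration, not recommendations from this guide.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your hardware


def hash_secret(secret, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA512 with a per-record random salt."""
    if salt is None:
        salt = os.urandom(16)  # unique salt for every stored value
    digest = hashlib.pbkdf2_hmac("sha512", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_secret(secret, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_secret(secret, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Store the salt next to the digest; because every record has its own salt, identical secrets produce different digests and precomputed tables are of no use.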
In our extensive guide to data security mechanisms, you can find more detailed information on this topic.
To compete in the market, you should constantly improve your app’s functionality. We’ve gathered a list of advanced features that will make your app a robust competitor to giants like WhatsApp and WeChat.
Chatbots are small applications that work inside a messaging app. Bots can greatly extend the functionality of your app and make it an all-in-one solution, allowing users to do everything from searching for YouTube videos to transferring money to others. Bots are developed by third-party developers and provide integration with their services.
You should thoroughly curate what third-party chatbot developers create. Most major messaging app developers write guidelines for creating chatbots and have a team of developers that verify chatbots created by third-party vendors.
Money in messaging apps is usually transferred via chatbots. But WeChat, which is known as the Chinese solution for everything, goes further and has implemented in-app payments in WeChat Pay.
There are several ways to pay via the WeChat app:
Generate a QR code. This feature works in two ways. With the first method, a user generates a QR code and shows it to the merchant to pay. With the second method, the merchant creates different QR codes for different goods and users scan these QR codes to pay.
In-app payments. For this, merchants should integrate the WeChat Pay SDK. When users pay in apps other than WeChat, they should authorize WeChat to process the payment. Once the transaction is done, the page will redirect to the initial app.
Web-based payments. Merchants can send product messages to their followers via Official Account. With WeChat Pay enabled, followers can quickly purchase products on a merchant’s shopping page.
WeChat Pay is an e-wallet. To add the same payment functionality to your messaging app, you should integrate a payment gateway. To learn more about this, read our comprehensive article on integrating mobile app payment gateways.
Having spread from Snapchat and Instagram, Stories and streams have steadily conquered messaging applications. Streaming functionality allows users to share data in real time with several users simultaneously. WhatsApp Status, for instance, allows users to post videos, GIFs, and photos that disappear after 24 hours.
Natural language processing
Natural language processing allows applications to analyze language, transform it, or generate human-like spoken language. NLP comes in different forms, from recognizing text in photos to generating human-like dialogue.
Messaging applications usually have speech-to-text and response generation functionality. Luckily, to integrate these features in your app, you don’t need a team of data scientists. A lot of tech giants share APIs that allow you to equip your application with AI-powered features. Let’s take a look at the most popular solutions:
a) Speech-to-Text API from Google Cloud. Released in 2018, Google’s API offers an accurate speech-to-text feature. Accuracy is achieved by using multiple machine learning models for various use cases.
The API recognizes 120 languages and dialects and can recognize languages automatically. It also boasts enhanced punctuation options for clearer transcripts.
But this tool has limitations in terms of the custom vocabulary builder and cannot work offline. Also, be ready to pay. There’s a complicated formula for calculating the cost.
b) Cognitive Services from Microsoft Azure. Microsoft’s solution offers a range of AI-powered features including content moderation, text translation, computer vision, and face recognition.
In terms of speech-to-text functionality, it boasts real-time text processing and translation, speaker recognition, and a more customizable vocabulary compared to Google’s API. Cognitive Services also requires an internet connection and has fixed prices per 1,000 transactions.
c) Speech framework from Apple. This tool is used for enabling iOS applications to capture a user’s voice and send it to Apple’s servers for processing. So it’s clear that it won’t work without an internet connection.
The framework is free, but if you use it, you must ask the user’s permission to process their data on Apple’s servers. To remain available to all applications, Apple limits the number of recognitions that can be performed by individual devices per day, and each app may be throttled globally based on the number of requests it makes per day.
d) Watson Speech to Text from IBM. This solution allows for processing large quantities of data. You can easily integrate Watson Speech to Text into your app using mobile SDKs for Android and iOS or with a REST API.
Watson operates in a limited number of regions and has three plans. The price mainly depends on the number of times the API is used each month and what features you need.
As you can see, the popularity of messaging applications is reasonably high. With chatbots and a wide range of other features, these apps can become a real magic bullet for users. But a wide range of features means complex projects.
After working on numerous projects with chat functionality, we decided to create our own out-of-the-box solution that greatly facilitates messaging app development. If you want to know more about it, drop us a line. We can also consult with you in case you have any questions about creating a messenger.