Kollective hosts thousands of live streaming events each year for its customers and most of them go smoothly. But when something goes wrong and the clock is ticking, fast and accurate troubleshooting is critical.
Our support team gets called in to assist in these cases, so over time I’ve had the opportunity to identify and establish best practices for troubleshooting technical issues in general and live video events specifically. Below I share five strategies to consider before heading into your next big live event.
- Always allow time to test beforehand at the event location
Live events are high-pressure, time-sensitive activities with multiple moving parts. Waiting until the last minute to test and resolve issues is a recipe for disaster.
Beforehand at the event location, be sure to test:
- Network connection
- Power and cabling
- Devices – cameras, lights, microphones
- Presentation files
- Injected videos
Take note of any new, untrained or offsite personnel that might need to be present during the event, including interpreters needed for multi-language videos. Also, note any unusual equipment or connection types such as satellite connections and 4G wireless connections.
Keep a checklist of this information nearby so you can refer to it during troubleshooting if necessary. I’m a big believer in checklists. You never know what you’ll remember or forget in the middle of a high-pressure situation.
- Redundancy and redundancy
Even though you’ve tested beforehand, something can still go wrong during an event. Redundancy can help guard against problems that arise in the moment.
A dying encoder is not necessarily fatal if you have a backup encoder in place. Likewise, a network problem won’t be fatal if you have a backup connection established, and a lapel mic disaster will not be a big deal if you have a handheld mic nearby.
- Isolate components that are working as expected from those that aren’t
If you run into problems in the moment, call for help. (Always call rather than email: phones ring immediately, while precious minutes might be lost before a new email is seen.) The first thing we do at Kollective Support is try to pull up the event ourselves. Events stream from local sources through an Internet gateway across multiple Internet service providers to the datacenter, where the stream is processed and sent back out to each viewer.
If we can see the event, we know it is coming from the cameras and mics through the encoder across the network to our servers and out to us. We can then quickly separate what’s working from what’s not, either through SD ECDN delivery or outside of it. We can also run test live events on our platform to verify that the platform itself is working properly, and confirm that our other customers’ events are working as expected.
This illustrates a basic principle in troubleshooting a broken process: isolating components that are working as expected from those that aren’t. Identifying the symptoms of a problem and tracing its origin are required before the problem can be solved. If you make changes based on the symptoms alone, you might inadvertently worsen the problem or introduce a new one.
- Partner with others to collect and analyze the data
If we can’t see your stream, but we can see ours, we’ll want to determine whether your encoder can reach us. One way to do this quickly and easily is for you to shut off your encoder and have us push to your event from our encoder. If ours works and yours doesn’t, the cause could still be any of several things, such as network issues or encoder issues, but at this point we’re eliminating potential sources of problems.
Obviously, this can’t always be done painlessly during an event. Shutting off your encoder may interrupt a local recording, which is not desirable. When a stream isn’t reaching us, we’ve seen cases where:
- Network cables have been yanked out from encoders
- Network devices have broken
- Competing traffic has overloaded the network and the congestion has blocked the stream
- Encoders have reached a bad state and needed to be restarted
If we can see your stream, but your end-users can’t, there is usually something wrong on the delivery end between the datacenter and the end-users. There are multiple possible causes, ranging from the Internet gateway, to network congestion at specific sites, to problems with end-user machines.
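The isolation steps described above can be sketched as a small decision function. This is only an illustration of the reasoning, not a Kollective tool: the function name `triage`, its parameters and the returned labels are all made up for this example, and the final "inconclusive" branch covers a case the steps above don't address.

```python
from typing import Optional


def triage(support_sees_stream: bool,
           viewers_see_stream: bool,
           platform_test_event_ok: bool,
           support_encoder_push_ok: Optional[bool] = None) -> str:
    """Suggest where to look next (hypothetical helper, illustrative labels)."""
    if support_sees_stream:
        # Cameras, mics, encoder, network and servers are all passing the stream.
        if viewers_see_stream:
            return "no fault isolated"
        # Support sees it, viewers don't: look between the datacenter and viewers.
        return "delivery side: gateway, site congestion or end-user machines"

    if not platform_test_event_ok:
        # Support's own test events fail too, so the platform itself is suspect.
        return "platform"

    if support_encoder_push_ok is None:
        # The encoder swap has not been tried yet.
        return "next step: support pushes to the event from its own encoder"

    if support_encoder_push_ok:
        # Support's encoder reaches the event, the customer's does not.
        return "source side: customer encoder or customer network"

    # Assumption: not covered in the text; treated here as needing escalation.
    return "inconclusive: escalate"


print(triage(support_sees_stream=True, viewers_see_stream=False,
             platform_test_event_ok=True))
```

Each answer narrows the search rather than naming a root cause, which is why the data from the production and network teams still matters.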
We don’t have visibility into our customers’ networks, so we often recommend getting network team representatives involved both before and during the event. Partnering to collect and analyze the data is key. No single person or organization is likely to have all the information needed to identify and resolve a complex problem.
That’s why the Kollective Support approach is to work as part of a team, alongside the production team working at the event, the customer’s network team and any third parties involved in the event. Finger-pointing slows everything down. Working together to collect data and resolve a problem is much more effective.
- Develop a culture of continuous improvement
Even with on-site testing, redundancy and teamwork, unforeseen issues or inefficiencies may still arise during the production. For this reason, building a culture of post-event analysis and retrospectives allows teams to continually improve not only the next event’s performance but overall event management processes.
What’s important is to identify what wasn’t ideal and improve on those things for next time. Again, make sure to document your process improvements and the steps you plan to follow to achieve them. Repeatable processes greatly increase your chances of success.
The second or third time around, you may want to:
- Adjust individual settings or upgrade software for users who had problems getting a live stream
- Upgrade circuits for locations that experienced some congestion
- Reduce the bitrate of the live stream to improve deliverability within your network
- Upgrade the encoder or increase the bitrate if you’re not happy with the picture quality
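To get a rough feel for the trade-off between bitrate and circuit capacity, a back-of-the-envelope estimate can help. The sketch below assumes the worst case, where every viewer at a site pulls a separate copy of the stream across that site’s circuit. Shared or cached delivery is meant to reduce this, so treat the result as an upper bound. The function names, the 50% headroom figure and the sample numbers are all illustrative assumptions.

```python
def worst_case_load_kbps(viewers: int, bitrate_kbps: int) -> int:
    """Upper bound on site traffic if every viewer pulls a separate copy."""
    return viewers * bitrate_kbps


def max_bitrate_kbps(viewers: int, circuit_kbps: int, headroom: float = 0.5) -> int:
    """Highest bitrate that keeps worst-case load within a share of the circuit.

    headroom is the fraction of the circuit the stream may use (assumed value).
    """
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    return int(circuit_kbps * headroom / viewers)


# Illustrative numbers only: 200 viewers behind a 100 Mbps circuit.
load = worst_case_load_kbps(viewers=200, bitrate_kbps=1500)
limit = max_bitrate_kbps(viewers=200, circuit_kbps=100_000)
print(f"worst-case load: {load} kbps, bitrate limit: {limit} kbps")
```

If the worst-case load is far above what the circuit can carry, that site is a candidate for a circuit upgrade, a lower bitrate, or both.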
Kollective’s Network Readiness Test (NRT) can detect delivery issues within your network, such as protocols that are being blocked, sites with congestion or capacity issues, and parts of the world with latency issues. An NRT can be scheduled before the event, and is also worth considering after events with complex delivery issues. Establishing a baseline and re-running the NRT after changes have been made can confirm that the changes corrected the observed problems.
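Comparing a baseline run with a re-run can be as simple as diffing the issues found at each site. The sketch below does not use the NRT’s actual output format, which isn’t described here. It assumes a hypothetical structure in which each run is a mapping from site name to a set of issue labels, and the site names and labels are invented.

```python
from typing import Dict, Set

Run = Dict[str, Set[str]]  # hypothetical shape: site name -> issue labels


def compare_runs(baseline: Run, rerun: Run) -> Dict[str, Dict[str, Set[str]]]:
    """Report which issues were resolved, remain, or are new at each site."""
    report = {}
    for site in sorted(set(baseline) | set(rerun)):
        before = baseline.get(site, set())
        after = rerun.get(site, set())
        report[site] = {
            "resolved": before - after,   # seen in baseline, gone after changes
            "remaining": before & after,  # still present after changes
            "new": after - before,        # introduced since the baseline
        }
    return report


# Invented sample data for illustration.
baseline = {"Site A": {"congestion", "blocked protocol"}, "Site B": {"latency"}}
rerun = {"Site A": {"congestion"}, "Site B": set()}

for site, result in compare_runs(baseline, rerun).items():
    print(site, result)
```

Tracking the "new" column matters as much as the "resolved" one, since a change that fixes one site can introduce a problem at another.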
At Kollective, we’re committed to providing world-class support services for our customers. What are your best practices for troubleshooting live video events? Please share your insights and comments with us.
By Scott Schroeder, Support Director at Kollective