Online and Over-the-Top (OTT) video viewership is increasing at a tremendous rate. At the same time, tolerance for a poor quality of experience is shrinking to nearly zero. This combination of factors is going to drive significant optimization in adaptive streaming technology in the coming months and years. Over the last several years, we've seen traditional UDP-style streaming technologies fade away in favor of HTTP-based adaptive streaming formats like MPEG-DASH, HLS, HDS, and Smooth Streaming. This shift provides a number of benefits. By using short but aligned video segments at varying quality levels, adaptive video is able to leverage the strengths Content Delivery Networks (CDNs) already have in their caching and edge infrastructure to deliver smooth video over the Internet.

Let's touch briefly on how this works. In adaptive streaming, performance is regulated through heuristics built into the video player. The player reads a stream manifest, which tells it a bit about the audio and video, the available quality levels, and where to find the content. In fact, the video in an adaptive stream is technically not really a stream at all! It's actually composed of small chunks of downloadable video segments (2 to 10 seconds in length in most cases) available in several different bitrates. The player starts by downloading and playing back the lowest-quality segments, and then analyzes how well it is keeping up. If it's doing great, it starts requesting higher quality levels, taking a moment at each step to assess performance. This is why the videos you watch so often look fuzzy for the first few seconds: that fuzziness is your video player starting small until it is satisfied with how well it is performing. It continues this way until it reaches the highest-quality stream it can handle. If the network encounters congestion, it starts stepping back down the quality ladder.
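The ramp-up and ramp-down behavior described above can be sketched in a few lines of Python. This is a minimal illustration of the general heuristic, not any particular player's algorithm; the bitrate ladder and the 25% headroom factor are assumptions for the example.

```python
# A minimal sketch of an adaptive bitrate heuristic: start low, step up
# while downloads keep pace with playback, and step down under congestion.
# The ladder and headroom factor below are hypothetical.

BITRATE_LADDER = [400, 800, 1500, 3000, 6000]  # available renditions in kbps

def next_quality(current_index: int, segment_kbits: float, download_seconds: float) -> int:
    """Pick the rendition index for the next segment from measured throughput."""
    throughput_kbps = segment_kbits / download_seconds
    # Step up only if the next rung could be fetched with ~25% headroom.
    if (current_index + 1 < len(BITRATE_LADDER)
            and throughput_kbps > BITRATE_LADDER[current_index + 1] * 1.25):
        return current_index + 1
    # Step down if the current rung can no longer be sustained.
    if throughput_kbps < BITRATE_LADDER[current_index]:
        return max(0, current_index - 1)
    return current_index

# A 4-second segment at 1500 kbps (6000 kbits) fetched in 1.5 s implies
# ~4000 kbps of throughput, enough to step up from index 2 to index 3.
```

Real players refine this with buffer occupancy, smoothing of throughput samples, and more, but the core loop of measure, decide, and request is the same.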
Basically, the player is analyzing its own performance throughout the viewing process, and making decisions on which quality level to download based on that analysis. This process of negotiating quality based on playback heuristics generally does a good job of managing performance, but there are limits to this method's effectiveness. What happens when the viewer's bandwidth, or their system's ability to play back the content, is not the source of the problem?

To explore this topic, it might be best to look at scenarios that can negatively affect performance. One that I'm very familiar with from my work in the enterprise is related to network capacity. Many enterprises have closed internal networks architected with just enough headroom to support day-to-day business activities like email, document sharing, conference calls, and so on. Video, on the other hand, is a bandwidth hog that quickly drives a network to congestion (an IT executive I once worked with referred to video as the cholesterol of the network). When networks get congested, playback quality suffers for all.

To solve this problem, a handful of companies have come to the table with Software-Defined Networking (SDN) solutions based on peer-to-peer delivery. Hive Technologies, Streamroot, and Kollective are a few of the companies leading with these types of approaches. In the typical scenario of limited network capacity, performance problems are produced by network bottlenecks: areas of the network where there's just too much data trying to traverse at the same time. These bottlenecks are usually the result of too many simultaneous viewers pulling streams from the server or CDN, and often occur during live webcasts where large audiences are all viewing at the same time. Even though all viewers are watching the same program, they must all go back to the source to obtain their own stream segments.
By contrast, in peer-to-peer scenarios, intelligence built into the delivery solution seamlessly sends viewers to other nearby viewers to obtain streams. This limits the number of viewers going all the way back to the source for streams, thereby eliminating the bottlenecks in the network.

But what if the problem isn't the network, but rather an issue with a particular CDN or CDN edge server? This is one of the challenges a company called DLVR is aiming to solve with a trademarked approach they call "Responsive Manifests". By doing real-time analysis on many different variables (device, network, location, video characteristics, CDN performance, etc.), the platform creates a unique manifest for each viewer in real time, optimized to provide each individual with the best performance possible. Remember, in traditional adaptive streaming, the player obtains the manifest (the blueprint for the stream) and uses it to determine how to play back the content. In this scenario, the manifest is being continually rewritten in response to delivery performance. Is a CDN edge server having a moment of trouble? No problem, just rewrite the manifest to send the player to a different edge server (or a different CDN altogether) until the problem resolves.

Optimization isn't only focused on quality of experience. There is plenty of attention centered on efficiency and cost of delivery as well. For example, Telestream recently announced adaptive bitrate optimization capabilities in their Vantage media processing product. It aims to determine where quality improvements actually fail to be perceptible, and then simply writes those video segments out of the manifest.
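To make the responsive-manifest idea concrete, here is a toy Python sketch (not DLVR's actual implementation) that rewrites the segment URLs in an HLS-style media playlist to steer a player toward whichever CDN is currently performing best. The CDN hostnames and latency figures are hypothetical.

```python
# Toy illustration of per-viewer manifest rewriting: segment URIs are
# repointed at the currently best-performing CDN. Hostnames and latency
# measurements below are made up for the example.

CDN_LATENCY_MS = {"cdn-a.example.com": 180, "cdn-b.example.com": 45}

def rewrite_manifest(manifest: str) -> str:
    best_cdn = min(CDN_LATENCY_MS, key=CDN_LATENCY_MS.get)
    lines = []
    for line in manifest.splitlines():
        if line.startswith("http"):            # segment URI lines; tags start with '#'
            path = line.split("/", 3)[3]       # strip scheme and host, keep the path
            line = f"https://{best_cdn}/{path}"
        lines.append(line)
    return "\n".join(lines)

playlist = """#EXTM3U
#EXTINF:4.0,
https://cdn-a.example.com/live/seg1001.ts
#EXTINF:4.0,
https://cdn-a.example.com/live/seg1002.ts"""

print(rewrite_manifest(playlist))   # every segment now points at cdn-b
```

A real system would do this per viewer and per request, folding in device, location, and live CDN telemetry rather than a static latency table.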
For example, if a video is encoded at a top bandwidth of 8 megabits per second, but a particular scene is very simple and the 8-megabit chunks offer no perceivable benefit over the 1.5-megabit chunks, it will simply rewrite the manifest to make the 1.5-megabit chunk the highest available quality for as long as that is the case. This means audiences won't be downloading high-bitrate chunks when lower-bitrate media can do the same job, thereby greatly reducing CDN data costs (click here to see a demo).

Netflix, being one of the largest providers of OTT video, also continually pushes the boundaries in optimizing adaptive video delivery. From encoding optimization, to quality analysis, to using predictive analytics to detect problems, Netflix is amazingly transparent in their efforts, and their tech blog should be on every streaming enthusiast's reading list.

In this post, I've aimed to provide just a few examples of solutions already in the market focused on adaptive streaming optimization. Clearly, for every solution that has made it to market, there are countless more working their way through labs. This is not surprising because, as amazing as the technology is, there is still plenty of room for improvement. Search for "Adaptive Streaming Optimization" and you'll find no shortage of research papers exploring everything from improving adaptive streaming performance over wireless networks to improving the efficiency of player heuristics. It all points to the fact that the foundation of adaptive streaming is solid and here to stay, but the era of adaptive streaming optimization has only just begun.
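To close with a concrete sketch, the per-scene ladder pruning described above might look like the following. This is a simplified illustration, not Telestream's actual algorithm; the perceptual scores (on a VMAF-like 0 to 100 scale) and the just-noticeable-difference threshold are assumptions for the example.

```python
# Simplified per-scene ladder pruning: drop any rendition whose quality
# gain over the previous (cheaper) rung falls below a perceptibility
# threshold. Scores and threshold are hypothetical.

JND = 2.0  # minimum quality gain treated as noticeable (assumed threshold)

def prune_ladder(renditions):
    """renditions: list of (bitrate_kbps, quality_score), ascending bitrate."""
    kept = [renditions[0]]
    for bitrate, score in renditions[1:]:
        if score - kept[-1][1] >= JND:   # keep only perceptible upgrades
            kept.append((bitrate, score))
    return kept

# A simple scene: the 8000 kbps chunks look no better than the 1500 kbps
# chunks, so the top rung is written out of the manifest for this scene.
simple_scene = [(800, 88.0), (1500, 95.0), (8000, 95.5)]
print(prune_ladder(simple_scene))
```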
In late April, Microsoft Production Studios leveraged Azure Media Services to webcast the Microsoft Build keynotes. The following week, we used the same configuration for the delivery of the Microsoft Ignite keynotes. Combined, the events served live, HD-quality streams to many hundreds of thousands of unique viewers. The webcast performed very well and was accessible on a broad range of platforms, devices, and browsers. Recently, I shared a brief peek into how the team at Microsoft Production Studios delivered the events and some insights gained along the way. Click here to read the article on the Production Studios Blog.
Browsing through the television or mobile device sections of a big box store today might give you the impression that 4k has landed. While it has in some respects, there are many more pieces of the 4k puzzle that are years away from being assembled into place.

First of all, what is 4k? Simply put, 4k, formally known as Ultra High Definition (UHD), is a screen resolution composed of roughly 4,000 horizontal pixels (4096x2160 for digital cinema and 3840x2160 for television). It's essentially four times the resolution of 1920x1080 HD, which represents the high end of HD resolutions. Oddly enough, all of the formats that came before it were known by their vertical resolutions (480, 720, 1080), while the UHD generation is known by its horizontal resolution, which can make things more challenging to keep straight. In addition to the frame size, there are also improvements in color depth, gamut, and other areas that improve the viewing experience.

In an age where more is better, 4k seems like it should explode onto the scene. However, there are quite a few reasons we won't see it take hold overnight. First of all, it's big, really big. It's essentially four times the size of the content broadcasters are making and delivering today. This means that most of the existing infrastructure in broadcast facilities will have to be upgraded or replaced, and infrastructure that can be retained will have its capacity reduced by a factor of four. Given that many broadcasters feel they just completed the transition from Standard Definition to High Definition, most will be unable to jump quickly into UHD. Some will adopt it in stages, starting with cameras and edit systems, because there are many advantages to beginning to build a 4k asset library, but that content will be scaled down to HD resolutions for broadcast. So traditional broadcasters may not jump into the fray right away, but what about Over-the-Top (OTT) providers like Netflix?
After all, Netflix is already delivering their hit original series, House of Cards, in 4k, right? While this is true, it's likely that the number of people capable of viewing the series in 4k numbers only in the hundreds as of this writing. There are a few reasons for this.

First of all, as previously mentioned, the 4k footprint is big. Using the standard compression formats in use today (typically h.264 for video), a 4k video stream could require upwards of 25 Mbps or more, which can be two to three times the bitrate delivered at the high end today. This would choke most Internet Service Providers (ISPs) and ensure that very few viewers could tune into that resolution. This doesn't even take into account the delivery costs inherent in pushing all that data around. For this reason, Netflix has wisely chosen to encode and deliver the content using h.265, a more efficient compression spec that can decrease the required bandwidth by 30 to 50% compared to h.264. Therefore, to watch House of Cards in 4k, you need a television with 4k resolution capable of decoding h.265 and running the Netflix application natively. The problem is, there are few televisions on the market that are able to do this.

I should note, there are others trying to hack at this issue from different angles. For example, Beamr is a technology that claims to filter media in a way that allows 4k content to be encoded in h.264 at bandwidths equivalent to content encoded in h.265, while maintaining the same perceived level of quality. They claim to do this by filtering out information that cannot be perceived by human vision during the encoding process. It has promise, but this type of approach is still on the fringe, and it remains to be seen whether it will be implemented broadly or whether the industry will skip stopgap measures such as this in favor of pushing forward with h.265. If solutions of this nature get adopted in the near term, they may help speed up 4k's arrival.
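The bandwidth claim above is easy to sanity-check: at the commonly cited 30 to 50% savings, a 25 Mbps h.264 stream would come down to roughly 12 to 18 Mbps in h.265, still several times the bitrate of a typical HD stream today. A quick back-of-the-envelope calculation:

```python
# Rough arithmetic for the figures above; the 25 Mbps h.264 baseline and
# the 30-50% h.265 savings range are the estimates quoted in the text.

H264_4K_MBPS = 25.0

def h265_estimate(h264_mbps: float, savings: float) -> float:
    """Estimated h.265 bitrate at the same quality, given fractional savings."""
    return h264_mbps * (1 - savings)

for savings in (0.30, 0.50):
    print(f"{savings:.0%} savings -> ~{h265_estimate(H264_4K_MBPS, savings):.1f} Mbps")
# 30% savings -> ~17.5 Mbps
# 50% savings -> ~12.5 Mbps
```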
One might also ask: what about my Xbox, PS4, or Roku device, can't I view 4k on these? Again, we run into an obstacle. Currently, most shipping consoles and set-top boxes have HDMI 1.4 ports, which are technically capable of delivering 4k resolutions but don't have the bandwidth to support high frame rates. To achieve the true promise of 4k, the console and the connected television and/or receiver will all need to support HDMI 2.0. To my knowledge, it has not yet been announced whether the existing game consoles and set-top boxes will be firmware-upgradable to HDMI 2.0. On the other hand, televisions are starting to ship with HDMI 2.0 today. There is a great CNET article by Geoffrey Morrison that captures a snapshot of where the major manufacturers sit with HDMI 2.0 support. Regardless, Netflix is limiting their 4k content to UHD televisions with the built-in Netflix app for now.

So let's assume we start adopting UHD televisions, and the consoles and set-top boxes get upgraded to support HDMI 2.0 and h.265 decoding, or the industry chooses to embrace a technology like Beamr's to make encoding more efficient… now can we watch our 4k OTT content? We'll be much closer, but the fact remains that even at the bandwidths of more advanced h.265 encoding, 4k is a big data hog. When adoption starts to reach a tipping point, Content Delivery Networks (CDNs) and ISPs will feel the congestion on their networks. CDNs are already exploring ways to be more efficient in delivering content to homes. According to Tom Leighton, CEO of Akamai (one of the world's largest CDNs), speaking on CNET's For the Record podcast, the company has been exploring many techniques for tackling this problem, including broader use of multicast (where applicable), peer-to-peer and client-assisted delivery, and more. In fact, the company already has a client product called NetSession that aims to create download efficiencies.
Netflix has sought to improve the situation on their own behalf as well. Today, Netflix offers ISPs a cache server product called "Open Connect" that essentially allows ISPs to put a Netflix cache server inside their own network infrastructure. This means end users don't have to fetch media content all the way from Netflix; instead, they can get it from within the walls of the ISP, where bandwidth is less constrained. Netflix even rates how well different ISPs do at delivering media, likely in an effort to publicly shame ISPs into incorporating these cache servers to improve overall performance and lower the cost of Netflix content delivery.

While this approach is good for Netflix and for Netflix customers, it's really not scalable to build unique caching methodologies for every content provider. I believe CDNs, like Akamai, will start to build similar solutions into ISPs and potentially even incorporate technology into televisions, devices, and set-top boxes to aid in end-to-end delivery. By going this route, most or all content providers will be served, rather than just a few top players like Netflix. Just as we see Dolby, DTS, and other monikers on our electronics today, perhaps we will someday see references to delivery brands that give consumers confidence they will spend less time in buffering states. Is it time for Akamai Inside?

In addition to all of the above-mentioned obstacles that will hamper rapid 4k adoption, you also can't overlook the industry's temptation to take shortcuts in the interim. One such shortcut will come in the form of Dolby Vision. With Dolby Vision, Dolby intends to enhance the richness of video in our media experiences the same way it did with audio. Dolby Vision increases brightness levels by 40 times over conventional television, expands the color depth, and enhances contrast in ways that create a dramatic perceptual difference in image quality.
They are strong advocates of the notion that better pixels beat more pixels. In the short term, content creators and broadcasters may choose to make their content richer by leveraging Dolby Vision rather than trying to jump straight into higher resolutions. Research in human visual acuity suggests that this is a smart strategy, because improved color volume and dynamic range are shown to have a higher impact on how we perceive image quality than resolution improvements do. In fact, depending on the size of the display and the distance at which the viewer is sitting, the argument could be made that a jump in pixel count may make no perceptible difference for many viewers (click here for a great resource on resolution, display size, and viewing distance).

All of this suggests that we are a few years away from 4k media consumption being the norm for a significant portion of media audiences, but it will eventually emerge. Ironically, one of the areas where we may see 4k first in the near term will be the user-generated space. Tablets and phones are quickly beginning to support 4k photography and video recording. That type of content is often created and viewed locally, bypassing many of the issues presented above. When it's not, it's usually short in form and relevant only to small audiences, making it a fairly light load on network resources. This means many people may find themselves viewing their home videos and YouTube channels in 4k while they wait for the broadcasters and delivery systems to catch up.

How long do you think it will take for mainstream adoption of 4k media consumption? Leave a comment and let me know your thoughts.
Technology makes almost anything we do subject to reinvention, and a company called Spritz is demonstrating this by reinventing something we've done the same way for centuries: how we read.

In short, Spritz posits that our eye movement dramatically slows down the speed at which we read. Spritz's research suggests that only about 20% of our reading time is spent processing the content, while the other 80% is spent moving our eyes from word to word. Their technology removes the hefty tax incurred by eye movement by streaming words to the screen in a way that makes typical eye movement unnecessary. Examples on their website had me reading 500 words per minute very comfortably.

More and more, our reading is done on screens, not on paper. Spritz's approach is reading designed from the ground up for the age we live in, and the Spritz technology reminds us that there is still ample opportunity to improve efficiency in many areas we take for granted. Moreover, this approach helps overcome new problems that are just now emerging, like how to present information in an age of wearable technologies, where screen real estate is extremely limited. Streaming words to your phone, watch, or eyewear may prove to be a very efficient way to present information in mobile scenarios. It may even be a more comfortable way to deliver captions for online video, since it can feel awkward reading long strings or multiple rows of text within video experiences today.

Finally, what impact might this have on traditional advertising and marketing efforts? Mobile environments are already an awful place for delivering typical web display banners, and display ads are dependent on the meandering eye of the end user. They are designed to distract and to exploit the viewer's drifting attention and wandering eyes, making them dependent on the very behavior that Spritz intends to eliminate.
On the other hand, it creates yet another opportunity to reinvent how we accomplish our marketing goals from the ground up. I can't help but think this brings us one step closer to Blipverts, first imagined in the eighties television series Max Headroom, in which 30-second television commercials are compressed into 3-second streams of sounds and images. In the series, Blipverts had the side effect of causing some viewers to explode… but I have a much more positive outlook for Spritz's implementation.
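For the curious, the mechanics of word streaming (known generically as rapid serial visual presentation, or RSVP) are simple enough to sketch in a few lines of Python. This toy version just paces words at a fixed rate; Spritz's actual system also aligns each word around an optimal recognition point and varies timing by word length, which is omitted here.

```python
import time

def rsvp_schedule(text: str, wpm: int = 500):
    """Return (word, seconds_on_screen) pairs for one-word-at-a-time display."""
    seconds_per_word = 60.0 / wpm          # 0.12 s per word at 500 wpm
    return [(word, seconds_per_word) for word in text.split()]

def spritz_print(text: str, wpm: int = 500):
    """Flash each word at a fixed focal point at the requested reading speed."""
    for word, pause in rsvp_schedule(text, wpm):
        print(f"\r{word:^20}", end="", flush=True)   # overwrite in place, centered
        time.sleep(pause)
    print()

# spritz_print("Reading without eye movement can feel surprisingly fast")
```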
The equation above is intended to show some of the key factors leading to the explosive growth of augmented reality scenarios. Augmented reality refers to supplementing the real world with contextual, computer-generated information. While the concept of augmented reality is not new, the ability to "take it with you" is just becoming economical. The devices we carry every day (iPhones and other smartphones) will get smarter and more powerful, the bandwidth connecting us to the net will continue to increase, and the data being crunched for us in the cloud will continue to become more relevant.
Already, the GPS information in iPhones is changing the way people interact with the world around them. Turn-by-turn directions combine your location with map data to get you where you wish to go. Smart applications provide details about real estate by using GPS to understand which property sits in front of you. Social networks can be updated with the click of a button to let your friends know where you are. Even so, in a few years we will shrug when we think about how primitive the technology was at this stage.
As I write this, applications are being authored that combine the camera, orientation, and GPS information from the phone with data in the cloud to provide rich information about everyday things. Just by pointing the camera on your phone at a sculpture in the park, you will get detailed information delivered to your screen about the art and the artist. Pointing at a restaurant might reveal the day's dinner and happy hour specials and the wait time for a table, and then provide you a way to place your order as you make your approach. New in town and not sure where to get a bite? Point your device down the street, and GPS combined with image recognition will present you with rich data about the establishments that lie ahead of you. To make sense of the enormous amount of data, the information will be easily filterable (Food, Italian, Romantic, for example). Many different devices will provide this functionality, and it will become clear that the hardware at your hip is much more than a phone.
Eventually, companion devices such as smart glasses (retinal imaging displays) will enter the marketplace. These glasses will connect to your device via technologies like Bluetooth to allow for more convenient access to your information. Think of it as the way the Terminator viewed the world, only without all the red tint and orders to kill people. Information at your fingertips will become information at your pupils. Voice recognition will allow you to easily navigate through the data being drawn onto the lenses in front of your eyes. Likewise, content will be very interactive, allowing you to receive and send information to the people and places that stand before you.
This brings me to the “people” part of this future. The dawn of augmented reality is neatly aligned with the maturing of social media. The data crunching cloud is not only taking in everything it can about where you are and what you’re looking at, it’s correlating it with everything it knows about you through your social networks, online profiles, search histories, and so on.
The device you carry with you will become your extended sense of sight and sound (and maybe more), and the data-rich, processor-heavy cloud will become the distributed machine that organizes information, processes it, and serves it up. You may think this sounds overwhelming and perhaps even annoying. In fact, you're probably right. But rest assured… though it may be too much information for you to stomach, your kids are going to love it.
Want to learn more? Check out the Layar platform for one approach or Tonchidot for another. Note: This post was originally published for the CrapMonkey Podcast in 2009.
Every movie in the '80s talked about plastics being the wave of the future. As I sit here in my office surrounded by plastic gizmos and gadgets, typing away on my plastic keyboard, and listening to music crank out of my plastic speakers, there is no doubt that those predictions have come true… almost. In truth, we've only seen the first phase of the plastic boom: the top-down, manufactured approach where companies make and consumers buy. Well, strap on your seatbelts folks, because Plasticiety is rapidly approaching, and the wave it's riding is called MakerBot.

In the not-so-distant future, we will all have MakerBots on our desktops or in our garages. They will do to manufacturing what desktop publishing did to printing: put it in the hands of the commoner. MakerBots are open-source 3D printers. Similar to how a hot glue gun works, plastic tubing goes in one side, gets heated into a liquid stream, and is deposited onto a surface (layer by layer) to form a 3D object. Today the objects are limited in size to 4″x4″x6″, but that will increase in the wake of Plasticiety.

Products will don stickers that define what percentage of their parts are MakerBot-printable. Consumer purchasing decisions will be partially based on this voluntary rating, because it will mean the products can be easily repaired. Did the knob break off of your stove? No problem, just go to GE.com, download the 3D knob object that matches your part number, and print out a new one. Tired of losing or breaking the battery cover for your remote controls? Put down the duct tape and have MakerBot print a new one. Uh oh, the kids lost the toothpaste cap again? Surf over to Crest.com, download the object, and print a few spares. But that describes only the tip of the iceberg! Sharing and community will do to manufacturing what they have done to music… irrevocably flip it on its head.
Just visit some of the many object-sharing communities that this revolution will foster, download the object models that intrigue you, and then print them into existence. Objects will be simple at first: hooks and hangers, Jello molds, cookie cutters, spatulas, spoons, measuring cups, salt and pepper shakers, coasters, bottle caps, etc., but they will become more complex as the phenomenon takes off. In addition to the plastic tubing (aka print cartridges), our local hardware and office supply stores will sell bundles of simple parts like springs, hinges, and simple motors. With easily downloadable instructions, you will quickly be able to assemble your homemade parts into more complex creations. Likewise, when you find or invent items that are useful, it will be easy to post them to the community and share them with your friends.

As noted above, today's MakerBots only make items smaller than 4″ by 4″ by 6″, and the objects they create have limitations, but that will change as time goes on. The bots you have in your home will become capable of creating items of larger sizes and at higher quality. Likewise, 3D object printing shops will begin to show up around town (the Kinkos of Plasticiety) and enable much larger or specialized projects to be completed with the same relative ease and low cost.

You are here: at the crossroads of hyper-manufacturing and consumer empowerment… where digital turns back into physical and where ideas become tangible objects. Welcome to Plasticiety. For a glimpse of this future today, check out MakerBot's website and the open source object marketplace at Thingiverse.com.