Smart Speaker Research Report
In November 2014, Amazon quietly released the Echo smart speaker, which went on general sale in 2015 after about half a year of internal testing. Sales reached 2.5 million units that year and 5.2 million units in 2016, overtaking the traditional speaker leader Sonos and making Amazon the dominant player in connected speakers; at one point it held 99% of the smart speaker segment. After the Echo's strong market reception, Google released Google Home in May 2016, Apple unveiled its HomePod smart speaker at WWDC in June 2017, and the Chinese retailer JD.com partnered with iFlytek to launch the DingDong speaker. International internet and hardware giants thus joined the battle for the voice-interaction entry point, setting off a wave of real-world AI deployments.

Smart speakers have become the fastest-growing consumer hardware category in the world. Since Apple announced the HomePod in June 2017, one or two technology companies have released a new smart speaker, or a second- or third-generation product, every month. So far in China, the technology giants BAT (Baidu, Alibaba, Tencent) and Xiaomi, the established electronics and retail companies Lenovo and Suning, the voice technology companies iFlytek and AISpeech, and the hardware startups Mobvoi and Rokid have all entered the market through in-house development or partnerships.

The boom in smart speakers is inseparable from voice technology, and two kinds of companies have mastered it. One is the internet giants such as Amazon, Google, BAT, Apple and Microsoft; the other is specialist voice-interaction vendors such as iFlytek and AISpeech. By developing their own software and hardware products or by licensing their technology, these vendors supply voice capabilities to traditional speaker manufacturers and to content and internet service providers, acquire users and data, and build platform ecosystems.

1. Amazon Alexa

Basic information: Amazon Alexa is Amazon's intelligent virtual assistant and open platform. Development began in 2010, and it was released together with the Echo in November 2014. Alexa provides the voice technology and delivers individual functions by running independent programs called "skills" (similar to running apps on a mobile phone's operating system), supporting music playback, voice shopping, smart home control, communication and other functions. Thanks to its first-mover advantage and the large number of products that ship with it, Alexa is far ahead of other technology vendors in the number of devices carrying it and in its level of intelligence (according to CNET's statistics from CES 2017).

Openness: In June 2015, Amazon opened Alexa to third-party developers and released two development toolkits, the Alexa Skills Kit (ASK) and the Alexa Voice Service (AVS), to make it easier for developers to build Alexa skills. Amazon also set up the Alexa Fund, a venture capital fund that supports start-ups in voice interaction, and the Alexa Prize, a development competition for university students. From just over 29 skills at launch in 2014 to nearly 40,000 today, Alexa has far more skills than any other technology vendor, thanks to its active open policy and continuously improved development tools.

Scope of application: Amazon Alexa has so far launched in 38 countries (not yet in China) and covers six languages (English, German, French, Italian, Spanish and Japanese). Besides Amazon's own Echo speaker family, Alexa powers speakers from Sonos, Lenovo and Harman Kardon; smart TVs such as Amazon Fire TV; tablets and smartphones such as Amazon Fire, the Huawei Mate 9 and HTC devices; notebooks and PCs from ASUS, Hewlett-Packard and Lenovo; smart home products such as smart refrigerators, lights and switches; wearables such as smart headphones and smart watches; and cars from Ford and BMW.
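The "skills" model described above for Alexa, where each function is an independent program that the assistant routes requests to, can be illustrated with a minimal sketch. Everything here is hypothetical: the `Assistant` class, the `register` and `handle` methods and the two sample skills are invented for illustration and are not part of the Alexa Skills Kit or any real SDK.

```python
from typing import Callable, Dict

# A skill handler takes the user's utterance and returns a spoken reply.
# This type and all names below are hypothetical, not a real vendor API.
SkillHandler = Callable[[str], str]


class Assistant:
    """Toy voice assistant that routes requests to registered skills."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillHandler] = {}

    def register(self, name: str, handler: SkillHandler) -> None:
        # Installing a skill is just adding a handler under a name,
        # much like installing an app on a phone.
        self._skills[name] = handler

    def handle(self, skill: str, utterance: str) -> str:
        handler = self._skills.get(skill)
        if handler is None:
            return f"Sorry, the skill '{skill}' is not installed."
        return handler(utterance)


def play_music(utterance: str) -> str:
    return f"Playing: {utterance}"


def smart_home(utterance: str) -> str:
    return f"Smart home command sent: {utterance}"


assistant = Assistant()
assistant.register("music", play_music)
assistant.register("smart_home", smart_home)

print(assistant.handle("music", "jazz"))
print(assistant.handle("shopping", "milk"))
```

The point of the design is that the assistant itself stays small: new capabilities arrive as additional handlers rather than as changes to the core, which is what allows third parties to contribute skills.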

2. Google Assistant

Basic information: Google Assistant is Google's virtual assistant, officially released at Google's developer conference in May 2016. It supports voice interaction and is built into Google's smartphones and smart speakers.

Openness: In December 2016, Google launched the developer platform Actions on Google, and in April 2017 it released an SDK (software development kit) that lets third-party developers build Google Assistant into their own products, further extending support to smart cars and other smart home devices. Google Assistant supports voice input with visual responses, and can identify objects and collect visual information through the device's camera.

Scope of application: Google Assistant currently supports eight languages, including English, Japanese, French, German and Spanish. It is expected to support more than 30 languages by the end of 2018, covering 95% of Android phones (Chinese is not supported for the time being). Besides Google's own Google Home speakers and Pixel smartphones, Google Assistant powers smartphones from Sony and Nokia; smart TVs, set-top boxes and smart speakers from Panasonic, LG and Sonos; computers from Lenovo and Aviva; and smart car products from Volvo.

3. Microsoft Cortana

Basic information: Cortana is Microsoft's intelligent virtual assistant. It was officially released in June 2015 and gradually rolled out to devices running Windows as well as Android and iOS. Cortana supports voice interaction and answers questions using information from the Bing search engine. It can launch applications, check the weather, recommend restaurants and attractions, and control smart home devices.

Openness: At the Build developer conference in 2017, Microsoft launched the Cortana skills development platform, allowing third-party developers to build skills for Cortana.

Scope of application: Cortana currently supports nearly 10 languages, including Chinese (Simplified and Traditional), English, German, French and Japanese. It has been integrated into many Microsoft products, such as the Edge browser, Windows 10, in-vehicle systems and Skype (Microsoft's instant messaging service), and it powers the Invoke smart speaker that Microsoft built with Harman Kardon.

4. Apple Siri

Basic information: Siri (Speech Interpretation and Recognition Interface) is Apple's virtual assistant. Siri was founded in 2007 and was at first just an application on the iOS platform. After Apple acquired Siri in April 2010 and redeveloped it, Siri became built-in software on Apple devices; it was re-released in 2011 and runs only on iOS and macOS. Siri supports voice interaction and can handle information search, weather queries, alarm setting and many other tasks.

Openness: Apple opened the Siri interface at its developer conference in June 2016, adding SiriKit to the iOS development platform so that developers can have Siri invoke and display content from their apps. Siri currently has no independent skill development platform.

Scope of application: Siri currently supports more than 20 languages, including Chinese (Simplified and Traditional), English, French, German and Italian, and runs across the full range of Apple products, such as the iPhone, iPad, iPod, Apple Watch and Mac.

1. iFlytek

Basic information: iFlytek, founded in 1999, is the largest intelligent voice technology company in China. It has a long record of research in intelligent speech technology and internationally leading results in Chinese speech synthesis, recognition and evaluation. It has close ties with the Chinese government and has been called "the national team of China's speech industry". iFlytek holds more than 70% of the Chinese voice technology market, and its share of the speech synthesis product market is also above 70%.

Openness: The iFlytek Open Platform is the world's first open platform to provide intelligent voice interaction capabilities for the mobile internet. Products built on it are widely used across AI applications, including the iFlytek input method, the Lingxi voice assistant, AI plus education, AI customer service, AI healthcare (voice-based electronic medical records, medical-image-assisted diagnosis, intelligent assistants and so on), translation devices, the Feiyu (Flying Fish) intelligent in-vehicle system, and the iFlytek Morph microphone system for home use.

Scope of application: iFlytek supports 34 languages, including dialects from across China. It currently powers smart TVs from major Chinese brands such as Changhong, Hisense and Konka; wearables such as GlassX and ZWatch; smart cars from Chinese and foreign makers such as Audi, BMW, Mercedes-Benz, General Motors, Ford, SAIC, Changan, Geely, Great Wall and Chery; smart speakers (JD.com's DingDong speaker); chat robots (Xiaoyu Zaijia, "Little Fish at Home"); and smart home products such as curtains and air conditioners. It also provides intelligent voice interaction services to more than 60,000 apps, such as Didi, Gaode Maps and QQ Reading, covering chat, tools, video, news, navigation and other aspects of daily life.

2. Baidu DuerOS (Xiaodu)

Basic information: DuerOS is Baidu's conversational artificial intelligence system, officially released at the Baidu AI Developers Conference in July 2017. DuerOS offers more than 200 capabilities in 10 categories, such as audio-visual entertainment, information lookup, lifestyle services and travel and traffic conditions. Users can issue commands, look up information, query knowledge, navigate to addresses, chat, set reminders and use a range of O2O lifestyle services in different scenarios. It also allows third-party developers to plug in their own capabilities.

Openness: The DuerOS open platform comprises an open platform for smart devices and an open platform for skills, aimed at hardware manufacturers and developers respectively. To make it easier to get started, Baidu released DuerOS kits for individuals, product manufacturers and special manufacturers, integrated third-party solutions from Yinzhi Technology, Xiansheng Internet, Intel and Rockchip, and launched the skill store app "Xiaodujia".

Scope of application: DuerOS supports Mandarin, English, Cantonese, Sichuan dialect and other languages. It has been built into smart speakers, TVs, refrigerators and other household appliances and smart home products; smartphones, watches and other portable devices; and in-car units, smart rearview mirrors and other smart car products, with cumulative activations of 50 million units, daily active devices exceeding 65.438+million, and DuerOS 65.438+06 million.

3. Xiao Ai Open Platform (Xiao Ai)

Basic information: The Xiao Ai Open Platform (formerly the Shuidi Platform) opened its voice capabilities and SDK to outside developers in May 2017. Building on Xiaomi's hardware ecosystem and large volumes of data, it offers speech recognition, NLP and other artificial intelligence technologies, and provides one-stop AI services for developers.

Scope of application: The Xiao Ai open platform's capabilities have been integrated into Xiaomi's software and hardware products, such as Xiaomi TV, the Xiaomi AI speaker and Xiaomi Jinfu's "Mi Xiaobei", connecting 8,500 Internet of Everything devices across Xiaomi's ecosystem chain, and the virtual assistant Xiao Ai has reached 100,000 daily active users.

4. AliGenie Voice Developer Platform (Tmall Genie)

Basic information: The AliGenie open platform was released at the Yunqi Conference on October 12, 2017. Initiated by the Alibaba Artificial Intelligence Lab, it shares Alibaba's accumulated AI technology with enterprises, institutions, entrepreneurs and developers in the form of APIs and SDKs. It currently offers online services covering audio-visual entertainment, news and information, shopping and food delivery.

Scope of application: The AliGenie developer platform has three main parts, the Genie skill market, the hardware open platform and industry solutions, and serves smart home, manufacturing, retail, hotel, aviation and other service scenarios.

5. Tencent Cloud Xiaowei

Basic information: Tencent Cloud Xiaowei is Tencent Cloud's intelligent service system and open platform, which helps smart hardware manufacturers add voice-based human-computer interaction and audio and video services. Since 2012, the WeChat AI team has brought voice input, speech recognition and semantic analysis to WeChat. Tencent Cloud Xiaowei takes its name "Xiaowei" from the WeChat voice technology it is built on. It was officially released at Tencent's "Cloud+Future" Summit in June 2017.

Scope of application: Tencent Cloud Xiaowei comprises a hardware open platform, a skill open platform and a service robot (intelligent customer service) platform. Combined with Tencent's social graph, it covers home, in-car, sports, hotel, children's companionship and education, and many other scenarios.

6. AISpeech DUI Open Platform

Basic information: AISpeech was founded in Cambridge, England, in 2007 by founders who all came from Cambridge. In 2008 the company moved back to China and settled in Suzhou. It is one of the few companies in China with human-machine dialogue technology, and one of the few in the world with its own intellectual property covering integrated Chinese and English voice technology. In September 2017, AISpeech officially released the DUI (Dialogue User Interface) open platform, which is built around task-oriented dialogue, also supports chat and question answering, and aims for natural, human-like interaction. As a full-pipeline open platform for intelligent dialogue, DUI exposes dialogue capabilities based on AISpeech's speech and language technology and provides development services such as GUI customization, version management and private cloud deployment.

DUI has four major systems: Qingnang (service and R&D support), Tianji (big data), Wei Zi (third-party resources) and Linglong (terminal solutions and environment). The platform has access to a wealth of third-party content and has a built-in store for speech and language skills, with in-depth data visualization, personalized customization and low-barrier operation. Developers can customize the whole pipeline through DUI, and almost every module can be tailored.

Scope of application: The platform covers automotive, home, robot, story machine, mobile assistant and other scenarios. It provides solutions for smart cars, smart homes and intelligent robots, and powers products such as the Tmall Genie X1, the Xiaomi AI speaker Xiao Ai, the Lenovo smart speaker and the Xiaomi-ecosystem 70mai smart rearview mirror.

1. Mobvoi

Mobvoi is a Chinese artificial intelligence company backed by Google. It was founded by Li Zhifei, a Chinese scientist who returned from Silicon Valley in 2012. It has developed its own core technologies in speech recognition, semantic analysis, vertical search, vision-based ADAS and robot SLAM. Its software and hardware products include the TicWatch smart watch, the TicMirror in-car smart rearview mirror, the TicHome smart speaker, the Mobvoi voice assistant app and the TicEye (Magic Eye) advanced driver assistance system.

2. Orion Star

Orion Star has a complete far-field voice technology stack. Its self-developed full-pipeline far-field voice interaction system, Orion Voice OS, powers Ximalaya's Xiaoya speaker and smart home products from Midea, Haier, Bolian, Haier Youjia and Ouruibo. The Xiaomi AI speaker and Xiaomi TV also use Orion Star's TTS (speech synthesis) and ASR (speech recognition) technology. Orion Star has its own speaker as well, the Xiaobao AI speaker, which connects to WeChat Pay and UnionPay payment and integrates blockchain technology.

In 2017, Orion Star also won first place in the restricted track (using only data provided by the contest) of Microsoft's million-celebrity recognition challenge, which is regarded as the World Cup of face recognition. On March 21, 2018, Orion Star officially released its robot product matrix, covering reception, sales, child companionship and many other scenarios. At the same time it released Orion OS, its robot platform, which integrates a self-developed multi-chip system, camera plus vision algorithms, a microphone array, Orion TTS, an indoor navigation platform and a seven-axis robotic arm to form a complete robotics technology chain. Orion OS has established strategic partnerships with Microsoft, Sogou, Qualcomm, NVIDIA and Smartisan.

3. Rokid

Rokid was founded in July 2014 and is operated by Hangzhou Lingban Technology Co., Ltd. It is headquartered in Hangzhou, China, with R&D centers in Beijing and San Francisco. The company is dedicated to robotics research and focuses on core technologies such as far-field directional sound pickup, speech and semantic recognition, face and gesture recognition, and audio and projection systems. Rokid's current products include the Pebble smart speaker, the Rokid Glass AR glasses and the Alien intelligent robot. The Rokid smart home robot won a CES Innovation Award two years running, in 2016 and 2017.

4. Jushang Intelligence (DeepBrain)

DeepBrain was founded in Shanghai in 2012 and is committed to researching and developing artificial intelligence products. Its core team consists of technical researchers from well-known universities in China and abroad. It provides in-depth human-machine dialogue capabilities to more than 100 manufacturers and has close partnerships with Samsung, Huawei, Lenovo and ZTE. In 2014, DeepBrain released the first smart speaker in China, the Xiaozhi Super Speaker, half a year before the launch of the Echo. Thousands of developers have joined its semantic skill platform and have built more than 1,000 smart home semantic skills.

5. Sogou Voice

Sogou was founded by Sohu on August 3, 2004, under the domain Sogou.com, to strengthen Sohu's search capability. In September 2013, Tencent invested in Sogou and merged its own search service and input method business into it. In November 2017, Sogou was listed on the NYSE; Tencent currently holds 45.37% of Sogou's shares and Sohu holds 39.21%. Sogou began researching voice technology in 2012 and officially launched the Sogou Voice Cloud open platform in June 2013. The platform connects all of Sogou's products, including its input method and maps, and introduced the Sogou voice assistant. As with Siri, the interactive experience of the Sogou voice assistant on mobile phones has not been compelling enough for users to rely on it, and usage is low. In August 2016, Sogou released the voice interaction engine Zhiyin. In February 2017, it worked with NavInfo and Gefei to launch the G8 II hardware and software solution for Gefei's intelligent connected-car system. It also provided ASR speech recognition for the Xiaomi TV 4A released in March 2018, and supplied the technology to the conference tablet maker Vision.

Sales volume and market share matter a great deal for smart speaker products. Intelligent voice technology has only recently reached the market and depends heavily on user data: the more a product is used, the smarter it becomes.

According to the author's estimates, global smart speaker market shares as of Q1 2018 were as follows. Amazon holds 71% of the market thanks to its first-mover advantage and broad product range, while Google has taken 12% with a complete low-, mid- and high-end product matrix and its user base. Tmall and Xiaomi hold 6% and 4% respectively, relying on their e-commerce systems, smart home ecosystems and low-price hit-product strategies. JD.com entered the smart speaker market relatively early. Apple's HomePod, officially released on February 9 this year at a high price, accounts for 1%, and all other brands together account for 3%.
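As a quick arithmetic check on the shares quoted above, the following sketch sums the percentages that the text states explicitly. The dictionary keys are just labels chosen here; the text gives no figure for JD.com, so none is entered, and no attribution of the remainder is assumed.

```python
# Market shares (percent) exactly as stated in the text for Q1 2018.
# JD.com is mentioned without a figure, so it is deliberately left out.
stated_shares = {
    "Amazon": 71,
    "Google": 12,
    "Tmall": 6,
    "Xiaomi": 4,
    "Apple": 1,
    "Other brands": 3,
}

listed_total = sum(stated_shares.values())
unattributed = 100 - listed_total

print(f"Listed shares sum to {listed_total}%")
print(f"Share not attributed in the text: {unattributed}%")
```

The listed figures add up to 97%, so 3 percentage points are not attributed to any brand in the text; whether that remainder reflects JD.com's share or rounding is not stated.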

As the pioneer of the smart speaker category, Amazon keeps adding capabilities to its speakers and continues to innovate. Working from usage scenarios and form factors, it has launched the small, low-priced Echo Dot, the Echo Show with a screen and the alarm-clock-style Echo Spot one after another. It pairs promotional best-sellers with high-end models that defend the premium segment, giving it a complete high-, mid- and low-end product matrix, with cumulative sales across all models exceeding 30 million units. It is currently the only smart speaker maker with sales above 10 million units and leads the global smart speaker market.

Linglong Technology, co-founded by JD.com and iFlytek, launched the DingDong speaker. As one of the earlier smart speaker makers in China, JD.com has been selling a series of new products since May 2015. Its overall product line resembles Amazon's: it keeps exploring new form factors and scenarios, keeps adding customized functions, and has entered the early-education market with educational speakers for children. With the aggressive entry of Xiaomi, Alibaba and Baidu, JD.com also launched the high-end DingDong PLAY with a screen and the low-priced DingDong mini2 to compete for the domestic market.

In May 2016, when Amazon had a near monopoly on the smart speaker market, Google entered with Google Home, which at one point held more than 20% of the market thanks to its elegant design, question answering backed by the Google search engine, and a lower price. In October 2017, Google launched the low-priced Google Home Mini and the high-priced Google Home Max, and it has kept adding skills and scenarios: connecting to more smart home devices, supporting 5 million recipes to win a place in the kitchen, and supporting voice shopping.

As a leading builder of the smart home ecosystem in China, Xiaomi offers products ranging from headphones, power banks, wristbands, sockets, blood pressure monitors, air purifiers, water purifiers, action cameras, self-balancing scooters and batteries to bedside lamps and rice cookers. Xiaomi began developing the virtual assistant Xiao Ai at the end of 2016 and officially released the Xiao Ai smart speaker in September 2017. Besides the strong Mi Home ecosystem behind it, Xiao Ai has attracted a great deal of attention for its witty persona. In 2018, Xiaomi launched the cute Xiao Ai mini speaker to join the domestic low-price volume war.

Alibaba attaches great importance to researching and developing new technologies. In July 2017, the Tmall Genie X1 was released, and the Alibaba Artificial Intelligence Lab, which is responsible for developing Alibaba's consumer AI products, was unveiled at the same time. In October 2017, the Yunqi Conference officially announced the founding of the DAMO Academy, which recruits experts in key technical fields to conduct research in basic science, AI chips and disruptive technologies. In March 2018, the Tmall Genie M1 Cookie and the Fire Eye stand went on sale, followed in June by the Tmall Genie Sugar Cube. Alibaba continues to explore new speaker form factors and to showcase AI technologies such as image recognition, face recognition, object detection and emotional feedback, while improving cost-performance and holding its ground in the low-price volume war. Backed by Alibaba's powerful e-commerce network, Tmall Genie has become the best-selling smart speaker brand in China.

In February 2017, Baidu fully acquired Raven Technology, and in November it released the Raven smart speaker, with a novel appearance, rich colors and a detachable dot-matrix touchpad. Priced at 1,699 yuan, it targets high-end speakers from Sonos, Bose and Harman Kardon. At the beginning of 2018, Baidu launched DOSS smart speakers together with DOSS, a veteran audio manufacturer. In March and June, Baidu launched low-priced models, including China's first smart video speaker, at 599 yuan and 89 yuan, sold on JD.com and Tmall. With high cost-performance, it still holds a place in the domestic speaker price war.

Beyond the mainstream smart speakers with visible sales, there are many other smart speaker products in China and abroad. Examples include Invoke, jointly launched by Microsoft and Harman Kardon and equipped with Microsoft Cortana; Clova, jointly launched by Line and Qualcomm; and a range of smart speakers from small and medium-sized Chinese startup teams, such as Mobvoi's TicHome and Ximalaya's Xiaoya speaker. As the smart speaker market matures, these speakers will either find their own niche or fade away.

Besides delivering core content, smart speakers also dig deep into home scenarios, open up their platforms, attract third-party developers and offer a growing number of skills. Amazon Echo, the industry leader, has more than 30,000 skills. The growth curve shows skill counts taking off in 2016 and continuing to rise as Echo sales increase. With more than 30,000 skills, Amazon looks more like a voice operating system than its followers do.

As with apps on smartphone operating systems, only a few of the huge number of voice skills really get attention; most become zombie skills that nobody uses. Other smart speaker brands therefore need not fear Amazon's daunting skill count, as long as they cover the high-frequency core functions and offer more content resources, home control and creative features.

Smart speaker products offer similar functions, which fall into three main groups: content skills, tool skills and interactive entertainment. A set of core functions aimed at core scenarios and core user groups is gradually emerging.

The functions users care about most are music, film and video, life assistants, smart home, and games and entertainment, followed by educational content, humor, news and finance.

Tool skills receive low user ratings but are irreplaceable. Interactive entertainment skills are highly replaceable, and the ones with good word of mouth and a good experience are more popular. Ratings for content skills are evenly spread, and what matters most is whether high-quality resources are available.

Source: user review data from e-commerce platforms such as eBay, Walmart, JD.com and Tmall.

User experience summary:

1) Smart speaker products are widely adopted and well accepted by users: 70-80% of ratings are 5-star.

2) Users perceive little difference between products in basic voice performance such as wake-up, recognition and parsing (probably because most individuals own no more than one speaker brand, so they have little basis for comparison). They are sensitive to sound quality, the richness of content resources, and how smart and fun the speaker is.

3) How much elderly family members and children like a speaker is an important factor in purchase decisions.

4) Overseas users have higher requirements for sound quality and distinguish usage scenarios more clearly. They tend to place several speakers in different rooms and care little about whether a speaker has a built-in battery. Domestic users have lower requirements for sound quality and recognition accuracy, and expect a built-in battery so the speaker is easy to move around.

5) Users want customizable wake words, richer content resources, and interoperability across resources and content.

1) Speaker sales: In 2017, global smart speaker shipments reached 32 million units, with Amazon and Google splitting the market roughly 9:1. Amazon's speaker sales have exceeded 20 million units, and its installed base and number of active devices are far ahead worldwide. As of the end of April 2018, Chinese manufacturers were led by Tmall and Xiaomi, with sales on the order of 2 million units, while Baidu, Tencent and the startups were all below 100,000. The large manufacturers with closed-loop ecosystems and technology platforms tended to push volume with high cost-performance products, capturing users quickly at low prices. According to a conservative forecast by the analyst firm Canalys, global speaker sales will grow to 56.3 million units in 2018, with the United States, the main battleground, reaching 38.4 million units and China, the second-largest market, reaching 4.4 million units.

2) Functional coverage: Smart speaker functions are clearly homogeneous and fall into three main groups: content skills, tool skills and interactive entertainment. Large manufacturers with closed-loop ecosystems and technology platforms are turning their voice-centered AI platforms into Android- or iOS-style operating systems, attracting more smart hardware makers and independent developers to join. Since there is no obvious gap in technical level and skill developers can move between platforms, technology and skill counts will not be the decisive factors.

3) User feedback: Market acceptance of smart speaker products is high overall, with 70-80% of ratings at 5 stars. Users perceive little difference in basic voice interaction performance, such as the success rate of wake-up, recognition and parsing, but they are sensitive to sound quality, the richness of content resources, and how smart and fun the speaker is. At the same time, user expectations keep rising, and users are making more and more personalized demands of speaker products.
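The 2018 sales forecast in item 1) can be turned into regional shares with a short calculation. This is only a sketch: it reads the global and United States figures as 56.3 million and 38.4 million units, an interpretation that assumes those source figures are expressed in units of ten thousand, alongside the stated 4.4 million for China. The dictionary keys and the `share_of_global` helper are names chosen here for illustration.

```python
# Forecast 2018 smart speaker sales, in units.
# Assumption: the global and US figures are read as 56.3 million and
# 38.4 million (source figures taken to be in units of ten thousand).
forecast_2018 = {
    "global": 56_300_000,
    "united_states": 38_400_000,
    "china": 4_400_000,
}


def share_of_global(region: str) -> float:
    """Return a region's forecast sales as a percentage of the global total."""
    return round(forecast_2018[region] / forecast_2018["global"] * 100, 1)


us_share = share_of_global("united_states")
china_share = share_of_global("china")

print(f"United States: {us_share}% of forecast global sales")
print(f"China: {china_share}% of forecast global sales")
```

Under that reading, the United States would account for about 68% of forecast global sales and China for about 8%, which matches the text's description of the United States as the main battleground and China as a distant second.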