[API Analysis] Steps for calling the Microsoft Edge browser's Read Aloud feature

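1. Sources

GitHub: MsEdgeTTS, edge-TTS-record
52pojie (我爱破解): the post titled "微软语音助手免费版，支持多种功能，全网启动" (a free Microsoft voice assistant tool that supports multiple features)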

2. Preparation

Feature source: the Edge browser
Packet capture tool: Fiddler
Request simulation: Postman

3. Main analysis steps

Step 1: Work out how Edge's Read Aloud feature can be called from JavaScript. Fiddler does not capture any traffic for this call.

const voices = speechSynthesis.getVoices()

function speakbyvoice(text, voice) {
  var utter = new SpeechSynthesisUtterance(text)
  for (let v of voices) {
    if (v.name.includes(voice)) {
      utter.voice = v
      break
    }
  }
  speechSynthesis.speak(utter)
  return utter
}

speakbyvoice("hello world", "Xiaoxiao")
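
One caveat with the snippet above: in Chromium-based browsers, speechSynthesis.getVoices() can return an empty list until the voiceschanged event has fired, in which case the loop matches nothing and the default voice is used. A minimal sketch of a workaround follows. The helper names findVoice and speakWhenReady are hypothetical and not part of the original snippet.

```javascript
// Hypothetical helper: first voice whose name contains `namePart`, or null.
function findVoice(voices, namePart) {
  return voices.find((v) => v.name.includes(namePart)) || null;
}

// Hypothetical helper: speak once the voice list is populated (browser only).
function speakWhenReady(text, namePart) {
  const speakNow = () => {
    const utter = new SpeechSynthesisUtterance(text);
    const voice = findVoice(speechSynthesis.getVoices(), namePart);
    if (voice) utter.voice = voice; // otherwise the default voice is used
    speechSynthesis.speak(utter);
  };
  if (speechSynthesis.getVoices().length > 0) {
    speakNow();
  } else {
    // The list is filled asynchronously; wait for the browser to announce it.
    speechSynthesis.addEventListener("voiceschanged", speakNow, { once: true });
  }
}

// Only runs where the Web Speech API exists; skipped elsewhere.
if (typeof speechSynthesis !== "undefined") {
  speakWhenReady("hello world", "Xiaoxiao");
}
```

Keeping the lookup in its own function makes it easy to check against a plain array of objects without a browser.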

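Step 2: Capture the traffic of edge-TTS-record instead. This yields one HTTP request and one WebSocket connection. Comparing them with the MsEdgeTTS code shows the following: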

/*
 * Simulated successfully in Postman
 * Fetches the available voice options; equivalent to speechSynthesis.getVoices()
 * http url: https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/voices/list?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4
 * method: GET
 */
{
	uri: "https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/voices/list",
	query: {
		trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4"
	},
	method: "GET"
}

/*
 * Simulated successfully in Postman
 * Opens a wss connection, sends the text and receives the audio data; equivalent to speechSynthesis.speak(utter)
 * wss url: wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=
 * send: two messages. The first is the desired audio format; the second is the SSML-marked text (it needs a randomly generated requestid: take a GUID and remove the "-" separators)
 * receive: each binary message carries a header containing the requestid, followed by the webm audio bytes; use Path:audio\r\n to locate where the audio starts
 * Known problems:
 *   1. The audio format message only works with webm-24khz-16bit-mono-opus; with any other format the connection is dropped immediately.
 *   2. The SSML message does not support the mstts namespace. This is a cut-down version of the Azure speech service, so markup such as xmlns:mstts="****", <mstts:express-as/>, <p/> and <s/> must not appear.
 */
{
	uri: "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1",
	query: {
		trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4"
	},
	sendmessage: {
		audioformat: `
X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (中国标准时间)
Content-Type:application/json; charset=utf-8
Path:speech.config

{"context":{"synthesis":{"audio":{"metadataoptions":{"sentenceBoundaryEnabled":"false","wordBoundaryEnabled":"true"},"outputFormat":"webm-24khz-16bit-mono-opus"}}}}`,
		ssml: `
X-RequestId:7e956ecf481439a86eb1beec26b4db5a
Content-Type:application/ssml+xml
X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (中国标准时间)Z
Path:ssml

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)'><prosody pitch='+0Hz' rate ='+0%' volume='+0%'> hello world</prosody></voice></speak>`
	}
}
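
To make the two captured requests concrete, here is a small Node.js sketch that builds the voice-list URL and the two WebSocket text messages described above. The function names are hypothetical, and the XML escaping of the text is an addition that the capture does not show. The header layout (CRLF-separated headers, a blank line, then the body) follows the C# strings in section 4, not any official documentation.

```javascript
const { randomUUID } = require("crypto");

const TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4"; // value from the capture above
const BASE = "speech.platform.bing.com/consumer/speech/synthesize/readaloud";

// URL for the GET request that lists the available voices.
function buildVoicesUrl() {
  const url = new URL(`https://${BASE}/voices/list`);
  url.searchParams.set("trustedclienttoken", TOKEN);
  return url.toString();
}

// Request id: a GUID with the "-" separators removed.
function makeRequestId() {
  return randomUUID().replace(/-/g, "");
}

// First message: selects the audio output format.
function buildSpeechConfigMessage(outputFormat) {
  const body = { context: { synthesis: { audio: {
    metadataoptions: { sentenceBoundaryEnabled: "false", wordBoundaryEnabled: "false" },
    outputFormat,
  } } } };
  return "Content-Type:application/json; charset=utf-8\r\n" +
         "Path:speech.config\r\n\r\n" + JSON.stringify(body);
}

// Assumption: escape characters that would otherwise break the SSML document.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Second message: the SSML text, tagged with the request id.
function buildSsmlMessage(requestId, lang, voice, text) {
  const ssml =
    `<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='${lang}'>` +
    `<voice name='${voice}'>${escapeXml(text)}</voice></speak>`;
  return `X-RequestId:${requestId}\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n${ssml}`;
}

const id = makeRequestId();
console.log(buildVoicesUrl());
console.log(buildSpeechConfigMessage("webm-24khz-16bit-mono-opus"));
console.log(buildSsmlMessage(id, "en-US",
  "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)", "hello world"));
```

The printed strings can be pasted into Postman to repeat the experiment by hand: the first into a GET request, the other two as consecutive WebSocket messages.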

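4. Writing the code

WebSocket library: WebSocketSharp. If the latest version fails to install, install an earlier one. At the time of writing, the latest prerelease is 1.0.3-rc11.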

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using WebSocketSharp; // NuGet package: WebSocketSharp (author: sta; version installed for this post: 1.0.3-rc10)

namespace ConsoleTest
{
    internal class Program
    {
        // Builds the first message: the speech.config frame that selects the audio output format.
        static string ConvertToAudioFormatWebSocketString(string outputformat)
        {
            return "Content-Type:application/json; charset=utf-8\r\nPath:speech.config\r\n\r\n{\"context\":{\"synthesis\":{\"audio\":{\"metadataoptions\":{\"sentenceBoundaryEnabled\":\"false\",\"wordBoundaryEnabled\":\"false\"},\"outputFormat\":\"" + outputformat + "\"}}}}";
        }

        // Wraps the text in a minimal SSML document for the given voice.
        // The text is XML-escaped so that characters such as & and < do not break the document.
        static string ConvertToSsmlText(string lang, string voice, string text)
        {
            return $"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='{lang}'><voice name='{voice}'>{System.Security.SecurityElement.Escape(text)}</voice></speak>";
        }

        // Builds the second message: the ssml frame, tagged with the request id.
        static string ConvertToSsmlWebSocketString(string requestId, string lang, string voice, string msg)
        {
            return $"X-RequestId:{requestId}\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n{ConvertToSsmlText(lang, voice, msg)}";
        }
        
        static void Main(string[] args)
        {
            var url = "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4";
            var Language = "en-US";
            var Voice = "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)";
            var audioOutputFormat = "webm-24khz-16bit-mono-opus";
            var binary_delim = "Path:audio\r\n";

            var msg = "Hello world";
            var sendRequestId = Guid.NewGuid().ToString().Replace("-", "");
            var dataBuffers = new Dictionary<string, List<byte>>();

            var webSocket = new WebSocket(url);
            // Accepts any server certificate; acceptable for a local experiment only.
            webSocket.SslConfiguration.ServerCertificateValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true;
            webSocket.OnOpen += (sender, e) => Console.WriteLine("[Log] WebSocket Open");
            webSocket.OnClose += (sender, e) => Console.WriteLine("[Log] WebSocket Close");
            webSocket.OnError += (sender, e) => Console.WriteLine("[Error] error message: " + e.Message);
            webSocket.OnMessage += (sender, e) =>
            {
                if (e.IsText)
                {
                    var data = e.Data;
                    var requestId = Regex.Match(data, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (data.Contains("Path:turn.start"))
                    {
                        // Start of turn; nothing to do.
                    }
                    else if (data.Contains("Path:turn.end"))
                    {
                        // End of turn; the socket can be closed from our side.
                        // dataBuffers[requestId] = null;
                        // Do not copy the line above from MsEdgeTTS: once the audio has been sent,
                        // one last text message signalling the end of the audio still arrives.
                        webSocket.Close();
                    }
                    else if (data.Contains("Path:response"))
                    {
                        // Context response; nothing to do.
                    }
                    else
                    {
                        Console.WriteLine("unknown message: " + data); // Unknown message; normally does not happen.
                    }
                }
                else if (e.IsBinary)
                {
                    var data = e.RawData;
                    // Decode the raw bytes as ASCII so that character offsets equal byte offsets.
                    var header = System.Text.Encoding.ASCII.GetString(data);
                    var requestId = Regex.Match(header, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (!dataBuffers.ContainsKey(requestId))
                        dataBuffers[requestId] = new List<byte>();
                    if (data.Length >= 3 && data[0] == 0x00 && data[1] == 0x67 && data[2] == 0x58)
                    {
                        // Last (empty) audio fragment; it marks the end of the audio.
                    }
                    else
                    {
                        // Everything after "Path:audio\r\n" is audio; skip frames without the delimiter.
                        var index = header.IndexOf(binary_delim, StringComparison.Ordinal);
                        if (index >= 0)
                            dataBuffers[requestId].AddRange(data.Skip(index + binary_delim.Length));
                    }
                }
            };


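            webSocket.Connect();
            var audioconfig = ConvertToAudioFormatWebSocketString(audioOutputFormat);
            webSocket.Send(audioconfig);
            webSocket.Send(ConvertToSsmlWebSocketString(sendRequestId, Language, Voice, msg));

            // Wait until the handler closes the socket; sleep between checks instead of spinning.
            while (webSocket.IsAlive)
                System.Threading.Thread.Sleep(100);

            // The buffer is missing if no audio arrived, so look it up defensively.
            List<byte> audio;
            var length = dataBuffers.TryGetValue(sendRequestId, out audio) ? audio.Count : 0;
            Console.WriteLine("Received audio byte length: " + length);
            Console.ReadKey(true);
        }
    }
}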
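
The binary branch of the handler depends on one detail of the wire format: each binary message is a text header ending in Path:audio\r\n, followed by the raw audio bytes. As a sketch of that split in isolation, the Node.js function below works on bytes only, so the audio itself is never decoded as text. The name splitAudioFrame is hypothetical, and the sample frame is hand-made for illustration, not captured data.

```javascript
const DELIM = Buffer.from("Path:audio\r\n", "latin1");

// Splits one binary WebSocket message into its request id and audio payload.
// Returns null when the delimiter is missing.
function splitAudioFrame(frame) {
  const at = frame.indexOf(DELIM);
  if (at < 0) return null;
  const header = frame.subarray(0, at).toString("latin1");
  const match = /X-RequestId:(.*?)\r\n/.exec(header);
  return {
    requestId: match ? match[1] : "",
    audio: frame.subarray(at + DELIM.length),
  };
}

// Hand-made frame: header text followed by four placeholder audio bytes.
const sample = Buffer.concat([
  Buffer.from("X-RequestId:7e956ecf481439a86eb1beec26b4db5a\r\nPath:audio\r\n", "latin1"),
  Buffer.from([0x1a, 0x45, 0xdf, 0xa3]),
]);
const parts = splitAudioFrame(sample);
console.log(parts.requestId, parts.audio.length);
```

Searching the byte buffer directly avoids any mismatch between character positions and byte positions, which is the main thing that can go wrong when the header is located through a decoded string.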

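5. Conclusion

The WebSocket request was simulated successfully. The drawback is that, according to the Postman tests, the audio outputFormat parameter can only be webm-24khz-16bit-mono-opus, which means the audio still has to be converted to another format with something like ffmpeg. I have not found a library I like for that yet, so I am recording the findings here for now.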
