Cognitive Services – Speech to Text


You can use the Cognitive Services Speech SDK to add speech-to-text conversion to your application, using audio captured from a microphone or read from an audio file. In this tutorial, we will see how to convert speech from an audio file to text.


To use the Speech SDK, you need a Cognitive Services API account with access to the Speech APIs. If you don't have an Azure subscription, you can create a free trial account. You will need either the access key provided when you activate your free trial or a paid subscription key from your Azure dashboard.
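Rather than pasting the key into source code, you can keep it outside the project. The sketch below assumes you store the key and region in environment variables named SPEECH_KEY and SPEECH_REGION. These names, and the SpeechSettings class itself, are hypothetical choices for this example; the SDK does not read them on its own.

```csharp
using System;

// Hypothetical helper: loads the Speech subscription key and region from
// environment variables so they never have to be committed with the code.
// SPEECH_KEY and SPEECH_REGION are assumed names, not SDK conventions.
public sealed class SpeechSettings
{
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    public string SubscriptionKey { get; }
    public string Region { get; }

    private SpeechSettings(string subscriptionKey, string region)
    {
        SubscriptionKey = subscriptionKey;
        Region = region;
    }

    // Reads both variables and fails fast with a clear message if either is missing.
    public static SpeechSettings FromEnvironment()
    {
        string key = Environment.GetEnvironmentVariable(KeyVariable);
        string region = Environment.GetEnvironmentVariable(RegionVariable);
        if (string.IsNullOrWhiteSpace(key) || string.IsNullOrWhiteSpace(region))
        {
            throw new InvalidOperationException(
                $"Set the {KeyVariable} and {RegionVariable} environment variables before running the sample.");
        }
        return new SpeechSettings(key, region);
    }
}
```

You could then pass the values to the factory call used in this tutorial, for example SpeechFactory.FromSubscription(settings.SubscriptionKey, settings.Region), instead of the "YourSubscriptionKey" and "YourServiceRegion" placeholders.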

How to convert Speech to Text

  1. Create a C# console application project in Visual Studio 2017.
  2. Install the Microsoft Cognitive Services Speech SDK. You can add it from NuGet by running the following command in the Package Manager Console:
    > Install-Package Microsoft.CognitiveServices.Speech -Version 0.6.0
  3. Add the following using directives at the top of your code file:
    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
  4. Add the following methods to the Program class of your console application:
    static void Main(string[] args)
    {
        // Blocks until recognition finishes (async Main requires C# 7.1 or later).
        RecognizeFromFileAsync().Wait();
    }
    static async Task RecognizeFromFileAsync()
    {
        // Creates an instance of a speech factory with specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        var stopRecognition = new TaskCompletionSource<int>();
        // Creates a speech recognizer using file as audio input.
        // Replace with your own audio file name.
        using (var recognizer = factory.CreateSpeechRecognizerWithFileInput(@"demosampletotext.wav"))
        {
            // Subscribes to events.
            recognizer.IntermediateResultReceived += (s, e) => {
                Console.WriteLine($"\n    Partial result: {e.Result.Text}.");
            };
            recognizer.FinalResultReceived += (s, e) => {
                var result = e.Result;
                Console.WriteLine($"Recognition status: {result.RecognitionStatus.ToString()}");
                switch (result.RecognitionStatus)
                {
                    case RecognitionStatus.Recognized:
                        Console.WriteLine($"\n    Final result: Text: {result.Text}, Offset: {result.OffsetInTicks}, Duration: {result.Duration}.");
                        break;
                    case RecognitionStatus.InitialSilenceTimeout:
                        Console.WriteLine("The start of the audio stream contains only silence, and the service timed out waiting for speech.\n");
                        break;
                    case RecognitionStatus.InitialBabbleTimeout:
                        Console.WriteLine("The start of the audio stream contains only noise, and the service timed out waiting for speech.\n");
                        break;
                    case RecognitionStatus.NoMatch:
                        Console.WriteLine("The speech was detected in the audio stream, but no words from the target language were matched. Possible reasons could be wrong setting of the target language or wrong format of audio stream.\n");
                        break;
                    case RecognitionStatus.Canceled:
                        Console.WriteLine($"There was an error, reason: {result.RecognitionFailureReason}");
                        break;
                }
            };
            recognizer.RecognitionErrorRaised += (s, e) => {
                Console.WriteLine($"\n    An error occurred. Status: {e.Status.ToString()}, FailureReason: {e.FailureReason}");
                // Unblocks the wait below so the program does not hang after an error.
                stopRecognition.TrySetResult(0);
            };
            recognizer.OnSessionEvent += (s, e) => {
                Console.WriteLine($"\n    Session event. Event: {e.EventType.ToString()}.");
                // Stops recognition when session stop is detected.
                if (e.EventType == SessionEventType.SessionStoppedEvent)
                {
                    Console.WriteLine($"\nStop recognition.");
                    stopRecognition.TrySetResult(0);
                }
            };
            // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop it.
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
            // Waits for completion.
            // Use Task.WaitAny to keep the task rooted.
            Task.WaitAny(new[] { stopRecognition.Task });
            // Stops recognition.
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
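  5. In the code above, replace YourSubscriptionKey with the subscription key from your Azure account and YourServiceRegion with your service region (for example, westus). The list of supported service regions is available in the Speech service documentation.
  6. Choose the audio file you want to convert to text and make sure its name matches the one passed to CreateSpeechRecognizerWithFileInput (demosampletotext.wav in the sample). Because the path is relative, place the file in the program's working directory, which in Visual Studio is usually the build output folder. The file must be single-channel (mono) WAV/PCM audio with a sampling rate of 16 kHz.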
  7. Run the code and check out the results!
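Since step 6 requires a specific audio format, it can help to check the file before running recognition. The following is a minimal sketch, assuming a standard RIFF/WAVE file; WavFormatChecker is a hypothetical helper, not part of the Speech SDK. It walks the RIFF chunks, reads the "fmt " chunk, and accepts only PCM (format tag 1), one channel, and 16,000 samples per second. It also requires 16-bit samples, which is an assumption beyond what this tutorial states.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Speech SDK): checks that a WAV file is
// mono PCM at 16 kHz. Requiring 16-bit samples is an extra assumption.
public static class WavFormatChecker
{
    private const ushort PcmFormatTag = 1;

    public static bool IsSupported(string path)
    {
        using (var file = File.OpenRead(path))
        {
            return IsSupported(file);
        }
    }

    // Expects a seekable stream positioned at the start of the RIFF header.
    public static bool IsSupported(Stream stream)
    {
        if (stream.Length < 12) return false;
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            if (ReadTag(reader) != "RIFF") return false;
            reader.ReadUInt32();                       // overall RIFF size, not needed here
            if (ReadTag(reader) != "WAVE") return false;

            // Walk the chunks (id + little-endian size) until "fmt " is found.
            while (stream.Position + 8 <= stream.Length)
            {
                string id = ReadTag(reader);
                uint size = reader.ReadUInt32();
                if (id == "fmt ")
                {
                    if (size < 16) return false;
                    ushort formatTag = reader.ReadUInt16();
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    reader.ReadUInt32();               // byte rate
                    reader.ReadUInt16();               // block align
                    ushort bitsPerSample = reader.ReadUInt16();
                    return formatTag == PcmFormatTag && channels == 1
                        && sampleRate == 16000 && bitsPerSample == 16;
                }
                // Skip any other chunk; RIFF chunks are padded to an even length.
                stream.Seek(size + (size % 2), SeekOrigin.Current);
            }
        }
        return false;
    }

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

For example, you could call WavFormatChecker.IsSupported(@"demosampletotext.wav") before creating the recognizer and stop early with a message if it returns false.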
