Optical Character Recognition (OCR) gives a computer the ability to read text that appears in an image, letting applications make sense of signs, articles, flyers, pages of text, menus, or any other place that text appears as part of an image. The Mobile Vision Text API gives Android developers a powerful and reliable OCR capability that works with most Android devices and won't increase the size of your app.

In this codelab, you will build an app that shows a live camera preview and speaks any text it sees there. Along the way, you'll learn how to use the Mobile Vision API to delight and empower your users.

What you'll learn

What you'll need


You can either download all the sample code to your computer...

Download Zip

...or clone the GitHub repository from the command line.

$ git clone https://github.com/googlesamples/android-vision.git

You may need to update your installed version of Google Repository in order to use the Mobile Vision Text API.

Open Android Studio, and then open the SDK Manager:

There, make sure your version of Google Repository is up to date. It should be at least version 26.

Now you're ready to open the starter project.

  1. Select the ocr-reader-start directory from your sample code download (File > Import Project... > ocr-reader-start).
  2. Add the Google Play Services dependency to the app. Without this dependency, the Text API won't be available and you won't be able to build.

Open the build.gradle file in the app module and change the dependencies block to include the play-services-vision dependency. When you're done, it should look like this:

dependencies {
   compile fileTree(dir: 'libs', include: ['*.jar'])
   compile 'com.android.support:support-v4:23.0.1'
   compile 'com.android.support:design:23.0.1'
   compile 'com.google.android.gms:play-services-vision:9.0.0+'
}
  3. Enable USB debugging on your Android device.
  4. Click the Gradle sync button.
  5. Click the Run button.

After a few seconds, you should see the Read Text screen come up, but it's just a black screen.

Right now, it doesn't do anything, and the CameraSource isn't set up. But we're going to change that next.


To start things out, we're going to create our TextRecognizer. This detector object processes images and determines what text appears within them. Once it's initialized, a TextRecognizer can be used to detect text in all types of images. Find the createCameraSource method and build a TextRecognizer.

OcrCaptureActivity.java

private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // TODO: Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.

    // TODO: Check if the TextRecognizer is operational.

    // TODO: Create the mCameraSource using the TextRecognizer.
}

Just like that, the TextRecognizer is built. However, it might not work yet. If the device does not have enough storage, or Google Play Services can't download the OCR dependencies, the TextRecognizer object may not be operational. Before we start using it to recognize text, we should check that it's ready. We'll add this check to createCameraSource after initializing the TextRecognizer.

OcrCaptureActivity.java

// TODO: Check if the TextRecognizer is operational.
if (!textRecognizer.isOperational()) {
    Log.w(TAG, "Detector dependencies are not yet available.");

    // Check for low storage.  If there is low storage, the native library will not be
    // downloaded, so detection will not become operational.
    IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
    boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

    if (hasLowStorage) {
        Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
        Log.w(TAG, getString(R.string.low_storage_error));
    }
}

Now that we've checked that the TextRecognizer is operational, we could use it to detect individual frames. But we want to do something a little more interesting: read text live in the camera view. In order to do that, we'll create a CameraSource, which is a camera manager pre-configured for Vision processing. We're going to set the resolution high and turn autofocus on, because that's a good match for recognizing small text. If you knew your users would be looking at large blocks of text, like signage, you might use a lower resolution, which would be able to process frames more quickly.

OcrCaptureActivity.java

// TODO: Create the mCameraSource using the TextRecognizer.
mCameraSource =
        new CameraSource.Builder(getApplicationContext(), textRecognizer)
        .setFacing(CameraSource.CAMERA_FACING_BACK)
        .setRequestedPreviewSize(1280, 1024)
        .setRequestedFps(15.0f)
        .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
        .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE : null)
        .build();

Here's what the complete createCameraSource should look like when you're done:

OcrCaptureActivity.java

private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.

    // Check if the TextRecognizer is operational.
    if (!textRecognizer.isOperational()) {
        Log.w(TAG, "Detector dependencies are not yet available.");

        // Check for low storage.  If there is low storage, the native library will not be
        // downloaded, so detection will not become operational.
        IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
        boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

        if (hasLowStorage) {
            Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
            Log.w(TAG, getString(R.string.low_storage_error));
        }
    }

    // Create the mCameraSource using the TextRecognizer.
    mCameraSource =
            new CameraSource.Builder(getApplicationContext(), textRecognizer)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(1280, 1024)
            .setRequestedFps(15.0f)
            .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
            .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE : null)
            .build();
}

If you build the app now, you should see a live camera view! But in order to process images from the camera, we need to handle that last TODO in createCameraSource: create a Processor to handle the text detections as they come in. We'll do that in the next step.

By now, your app could detect text on individual frames using the detect method on the TextRecognizer. That's what you would do if you wanted to find text in a photograph or other image file. But in order to read text straight from the camera, it's useful to implement a Processor, which will handle detections as often as they become available.
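As a point of reference, one-shot detection on a still image (the approach mentioned above, outside this codelab's live-preview flow) might look like the following sketch. It assumes you already have a Bitmap named `bitmap` and a built `textRecognizer`:

```java
// Sketch: detect text in a single still image rather than a live stream.
// Assumes `bitmap` holds the image and `textRecognizer` is already built.
Frame frame = new Frame.Builder().setBitmap(bitmap).build();
SparseArray<TextBlock> blocks = textRecognizer.detect(frame);
for (int i = 0; i < blocks.size(); i++) {
    TextBlock block = blocks.valueAt(i);
    Log.d(TAG, "Found text: " + block.getValue());
}
```

For the live camera case, the Processor approach below does the equivalent work on every frame, so you never call detect yourself.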

Go to the OcrDetectorProcessor class and have it implement Detector.Processor<TextBlock>.

OcrDetectorProcessor.java

public class OcrDetectorProcessor implements Detector.Processor<TextBlock> {
    private GraphicOverlay<OcrGraphic> mGraphicOverlay;

    OcrDetectorProcessor(GraphicOverlay<OcrGraphic> ocrGraphicOverlay) {
        mGraphicOverlay = ocrGraphicOverlay;
    }
}

That interface requires two methods be implemented. The first, receiveDetections, will receive TextBlocks from the TextRecognizer as they become available. The second, release, can be used to cleanly get rid of resources when the TextRecognizer is disposed of. In this case, we only have to clear the graphic overlay, which cleans up all the OcrGraphic objects.

We'll get the TextBlocks from the detection and create OcrGraphic objects for each text block that the processor detects. For now, they won't render; we'll implement their draw behavior in the next step.

OcrDetectorProcessor.java

@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    mGraphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        if (item != null && item.getValue() != null) {
            Log.d("Processor", "Text detected! " + item.getValue());
            OcrGraphic graphic = new OcrGraphic(mGraphicOverlay, item);
            mGraphicOverlay.add(graphic);
        }
    }
}

@Override
public void release() {
    mGraphicOverlay.clear();
}

Now that the processor is ready, we have to set the textRecognizer to use it. Head back to the last remaining TODO in the createCameraSource method in OcrCaptureActivity:

OcrCaptureActivity.java

// Create the TextRecognizer
TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
// TODO: Set the TextRecognizer's Processor.
textRecognizer.setProcessor(new OcrDetectorProcessor(mGraphicOverlay));

Now, build and run the app. At this point, you should be able to point the camera at text and see the "Text detected!" debug messages appear in Android Monitor Logcat a few times a second! But that's not a very intuitive way to visualize what the TextRecognizer is seeing.

In the next step, we'll put that text on screen.

The debug message tells us that text is being recognized. We'd like the user to see that, too, by drawing the text on top of the camera preview.

Let's implement the OcrGraphic draw method. We want to see if the graphic has text, translate its bounding box to the appropriate coordinates for the canvas, and then draw the box and text.

OcrGraphic.java

@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (mText == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(mText.getBoundingBox());
    rect.left = translateX(rect.left);
    rect.top = translateY(rect.top);
    rect.right = translateX(rect.right);
    rect.bottom = translateY(rect.bottom);
    canvas.drawRect(rect, sRectPaint);

    // Render the text at the bottom of the box.
    canvas.drawText(mText.getValue(), rect.left, rect.bottom, sTextPaint);
}

Build and try it out on our sample text:

You should see the box appear on screen with the text in it! Feel free to play with the color values using TEXT_COLOR.

But how about this one?

The bounding box looks right, but the text is all at the bottom of the box.

That's because the engine puts all the text it recognizes in a TextBlock into one complete sentence, even if it sees the sentence broken over multiple lines. If you want the complete sentence, that's very useful. But what if you want to know where each individual line of text actually is?

You can get the Lines from a TextBlock by calling getComponents, and then you can iterate over each line to get the location and values of the text within it. This lets you put the text in the place it actually appears.

OcrGraphic.java

@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (mText == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(mText.getBoundingBox());
    rect.left = translateX(rect.left);
    rect.top = translateY(rect.top);
    rect.right = translateX(rect.right);
    rect.bottom = translateY(rect.bottom);
    canvas.drawRect(rect, sRectPaint);

    // Break the text into multiple lines and draw each one according to its own bounding box.
    List<? extends Text> textComponents = mText.getComponents();
    for(Text currentText : textComponents) {
        float left = translateX(currentText.getBoundingBox().left);
        float bottom = translateY(currentText.getBoundingBox().bottom);
        canvas.drawText(currentText.getValue(), left, bottom, sTextPaint);
    }
}

Try that again with the text:

Better! You can choose how granular you want to go based on your application's needs. If you like, you can call getComponents on each Line and get the positions of the actual Elements (words, in Latin languages). You can also customize the textSize to fill as much space as the actual text does on screen.
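If you want word-level granularity, the same getComponents pattern applies one level down. Here's a hedged sketch of what the drawing loop could look like at the Element level (it assumes the same translateX, translateY, and sTextPaint members used in draw above):

```java
// Sketch: drilling from Lines down to Elements (words) for finer placement.
// Assumes the translateX/translateY helpers and sTextPaint from OcrGraphic.
List<? extends Text> lines = mText.getComponents();
for (Text line : lines) {
    for (Text element : line.getComponents()) {
        float left = translateX(element.getBoundingBox().left);
        float bottom = translateY(element.getBoundingBox().bottom);
        canvas.drawText(element.getValue(), left, bottom, sTextPaint);
    }
}
```

Line-level drawing is usually the sweet spot for readability; word-level placement matters most when text is skewed or unevenly spaced.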

Now that we can see where the text is and what it says, let's do something with it.

Now, text from the camera has been converted to useful, structured Strings, and these Strings are being displayed on screen. Let's do something else with them.

By using the TextToSpeech API built into Android, and implementing the contains method in OcrGraphic, we can make the app speak the text out loud when the TextBlock graphic is touched.

First, let's implement the contains method in OcrGraphic. The getGraphicAtLocation method in GraphicOverlay is already moving the x and y values into the View's coordinates, so what's left for us to do is check whether those coordinates fall within the bounds of this graphic's displayed bounding box.

OcrGraphic.java

public boolean contains(float x, float y) {
    // TODO: Check if this graphic's text contains this point.
    if (mText == null) {
        return false;
    }
    RectF rect = new RectF(mText.getBoundingBox());
    rect.left = translateX(rect.left);
    rect.top = translateY(rect.top);
    rect.right = translateX(rect.right);
    rect.bottom = translateY(rect.bottom);
    return (rect.left < x && rect.right > x && rect.top < y && rect.bottom > y);
}

Now that we can check for whether text is contained at a location, let's go to the onTap method of OcrCaptureActivity and handle the tap action by checking for a graphic.

OcrCaptureActivity.java

private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = mGraphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // TODO: Speak the string.
        }
        else {
            Log.d(TAG, "text data is null");
        }
    }
    else {
        Log.d(TAG,"no text detected");
    }
    return text != null;
}

With that in place, you should be able to build and check via Android Monitor (adb logcat) that text is being tapped when you expect it to.

How about speaking the text out loud? Go to the top of the Activity and find the onCreate method. When we start the app, we should initialize the TextToSpeech engine so it's ready when we need it. There's already a TextToSpeech class variable, so we just need to initialize it with the context and a generic OnInitListener.

OcrCaptureActivity.java

@Override
public void onCreate(Bundle bundle) {

    // (Portions of this method omitted)

    // TODO: Set up the Text To Speech engine.
    TextToSpeech.OnInitListener listener =
            new TextToSpeech.OnInitListener() {
                @Override
                public void onInit(final int status) {
                    if (status == TextToSpeech.SUCCESS) {
                        Log.d("TTS", "Text to speech engine started successfully.");
                        tts.setLanguage(Locale.US);
                    } else {
                        Log.d("TTS", "Error starting the text to speech engine.");
                    }
                }
            };
    tts = new TextToSpeech(this.getApplicationContext(), listener);
}

Okay, all that's left is to add the code to speak the string out loud in the onTap method.

OcrCaptureActivity.java

private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = mGraphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // Speak the string.
            tts.speak(text.getValue(), TextToSpeech.QUEUE_ADD, null, "DEFAULT");
        }
        else {
            Log.d(TAG, "text data is null");
        }
    }
    else {
        Log.d(TAG,"no text detected");
    }
    return text != null;
}

Now, when you build the app and tap on detected text, it should be spoken aloud. Try it out!

You've got an app that can read text straight from the camera and speak it out loud!

From here, you're well set up to explore lots of other possible uses for Text Detection in your own apps. Read addresses and phone numbers from business cards, make images of documents useful or searchable, and assist with translation or accessibility. Apply OCR wherever you want to understand the text contained in an image.

What we've covered

Next Steps

Learn More