Translate a dataframe column

Hey!

I want to translate a dataframe column into English. Each row of this column contains a list of words. I tried the following:

from googletrans import Translator

translator = Translator() 
df.col=df.col.apply(lambda x: translator.translate(x, dest='en'))

But this step seems to run forever, and when it finally stops I get a "read timed out" error. What am I doing wrong?

Thank you in advance!

Hello Marianthi,

To translate a dataframe column containing lists of words in Python, you can use the Google Cloud Translation API. Here is an example:

import html

import pandas as pd
from google.cloud import translate_v2

# create a translation client (this needs Google Cloud credentials to be configured)
translate_client = translate_v2.Client()

# read in your dataframe (path is the location of your CSV file)
df = pd.read_csv(path)

# define the target language for translation (in this case, English)
target_language = 'en'

# define the column containing the lists of words you want to translate
column_to_translate = 'your_column_name'

# define a function to apply to each cell of the column, translating the list of words
def translate_list(words):

    # join the list of words into a single string
    words_string = ' '.join(words)

    # translate the string to the target language
    translated_text = translate_client.translate(words_string, target_language=target_language)

    # the API returns HTML-escaped text by default, so unescape it,
    # then split the translated text back into a list of words
    translated_words = html.unescape(translated_text['translatedText']).split()

    return translated_words

# apply the translation function to the column
df['translated_column'] = df[column_to_translate].apply(translate_list)
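
One thing to keep in mind: every call to the API is a network request, and joining the words into one string means the translated list may not have the same length as the original. As a rough sketch of an alternative, you could translate each distinct word only once and reuse the result. The names `translate_word_lists`, `translate_fn` and `fake_translate` below are made up for illustration, and `fake_translate` is only a stand-in so the example runs without any API; translating isolated words also loses context, so treat this as one option rather than the recommended way.

```python
from typing import Callable, Dict, Iterable, List


def translate_word_lists(
    rows: Iterable[List[str]],
    translate_fn: Callable[[str], str],
) -> List[List[str]]:
    """Translate every word of every row, calling translate_fn once per distinct word."""
    rows = [list(row) for row in rows]
    cache: Dict[str, str] = {}
    for row in rows:
        for word in row:
            if word not in cache:
                # only the first occurrence of a word triggers a translation call
                cache[word] = translate_fn(word)
    # rebuild the rows from the cache, keeping the original order and lengths
    return [[cache[word] for word in row] for row in rows]


# Stand-in translator for demonstration only; replace it with a real API call.
def fake_translate(word: str) -> str:
    return {"chat": "cat", "chien": "dog"}.get(word, word)


translated = translate_word_lists([["chat", "chien"], ["chat"]], fake_translate)
```

With a real translator in place of `fake_translate`, the input would be `df[column_to_translate]` and the result could be stored back with something like `pd.Series(translated, index=df.index)`.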

Yohan

I installed google-cloud with:
pip install google-cloud

but I still get the following error:

What does this mean? How can I fix this error?

Thank you in advance!

To use the Google Cloud Translation API to translate text in Python, you will need to follow these general steps:

  1. Set up a Google Cloud account and create a new project.
  2. Enable the Cloud Translation API and generate API credentials (a service account key) for your project.
  3. Install the google-cloud-translate Python package using pip: pip install google-cloud-translate
  4. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of your API credentials file.
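
To see whether step 4 actually took effect before calling the API, a small check along these lines can help. This is only a sketch using the standard library; the function name `credentials_problem` is hypothetical, and it only verifies that the variable is set and points to an existing file, not that the key itself is valid.

```python
import os
from pathlib import Path


def credentials_problem(env=os.environ):
    """Describe what is wrong with the credentials setup, or return None if it looks fine."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(path).is_file():
        return f"no file found at {path}"
    return None


problem = credentials_problem()
if problem:
    print("Credentials not ready:", problem)
```

The `env` parameter is there only so the check can be tried with a plain dictionary instead of the real environment.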

Thus, you must click on the following link and follow the steps above.

Is this the only way to translate some elements of the dataframe? Is it a paid service?
Does the googletrans library not work?

Could you explain in more detail how step 4 works? Should I add a new .env file with

GOOGLE_APPLICATION_CREDENTIALS = '...'

and then write the following in the main Python file?

import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

or what am I supposed to do?

Certainly, I can explain step 4 in more detail.

  1. First, you need to create a service account in your Google Cloud Console. To do this, navigate to the IAM & Admin page and click on « Service accounts ». Then, click on « Create Service Account » and follow the prompts to create a new service account with the appropriate permissions to access the Cloud Translation API.

  2. After you have created the service account, you need to create a key for the account. Click on the service account you just created, then click on « Keys » and « Add Key ». Choose « JSON » as the key type and click « Create ».

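  3. This will download a JSON file containing your service account key.

  4. In your .env file, set the variable to the path of that JSON file (the path to the file, not the contents of the key):

GOOGLE_APPLICATION_CREDENTIALS="your_path_json_key"

  5. Then load the .env file at the top of your main Python file, before creating the translation client: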

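import os
from dotenv import load_dotenv

# Load environment variables from the .env file into os.environ
load_dotenv()

# The Google client library reads GOOGLE_APPLICATION_CREDENTIALS from the environment by itself,
# so there is nothing to assign; just check that the variable was actually loaded
if not os.getenv('GOOGLE_APPLICATION_CREDENTIALS'):
    raise RuntimeError('GOOGLE_APPLICATION_CREDENTIALS is not set; check your .env file')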

With these steps completed, your Python code should now be able to authenticate with the Cloud Translation API.
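
If authentication still fails, it may be worth checking that the file you point to really is a service account key and not some other JSON file. Here is a minimal sketch of such a check, assuming the usual fields found in a downloaded service account key (`type`, `project_id`, `private_key`, `client_email`); the function name `looks_like_service_account_key` is made up for this example.

```python
import json


def looks_like_service_account_key(text: str) -> bool:
    """Return True if text is a JSON object carrying the fields of a service account key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    required = ("project_id", "private_key", "client_email")
    # the key must declare itself as a service account and have every required field filled in
    return data.get("type") == "service_account" and all(data.get(k) for k in required)
```

You would pass it the contents of the key file, for example read with `Path(os.environ['GOOGLE_APPLICATION_CREDENTIALS']).read_text()`. It does not contact Google, so a key that passes can still be revoked or lack permissions.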

Yohan