UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

Mar 26 • 4 min read

When working with strings, you may have encountered the error "UnicodeDecodeError: 'utf-8' codec can't decode byte X in position X: invalid continuation byte". It occurs when we specify an incorrect encoding while decoding bytes data. To fix the issue, we have to specify the correct encoding.

UnicodeDecodeError: 'utf-8' codec can't decode byte X in position X: invalid continuation byte

Managing the encoding of character strings can sometimes cause problems in Python. For example, the error "UnicodeDecodeError: 'utf-8' codec can't decode byte xxx in position x: invalid continuation byte" occurs when a Python script tries to decode bytes as UTF-8 although they were not encoded that way. If you are handling a file and you don't know its encoding, there are ways to work around this error.

To understand what the error means, we have to analyze the error message. A Python string is a sequence of Unicode characters, and an encoding defines how those characters are mapped to bytes. When we encode a string in UTF-8, the resulting bytes follow the UTF-8 format, in which every multi-byte character starts with a lead byte followed by one or more continuation bytes. If we encode the string with latin-1, for instance, and then try to decode the bytes as UTF-8, the decoder finds a byte that does not fit this pattern, and that causes the error "UnicodeDecodeError: 'utf-8' codec can't decode byte in position: invalid continuation byte".

Encoding is the process of turning a sequence of characters (letters, numbers, punctuation and all other symbols) into bytes so that they can be transmitted and stored. Decoding is the opposite of encoding: it is the process of turning bytes back into a sequence of characters.
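As a small illustration of the two directions, the sketch below encodes the same character with two encodings and decodes it again. The variable names are only for this example.

```python
text = 'é'

# Encoding: str -> bytes. The same character maps to different bytes
# depending on the encoding that is used.
utf8_bytes = text.encode('utf-8')      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode('latin-1')  # one byte: 0xE9

# Decoding: bytes -> str. The round trip works when the encodings match.
assert utf8_bytes.decode('utf-8') == text
assert latin1_bytes.decode('latin-1') == text

print(utf8_bytes, latin1_bytes)
```

Because the two byte sequences differ, bytes produced by one encoding cannot in general be decoded with the other.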

Here's a demonstration of the problem:

str_bytes = 'ééééééé'.encode('latin-1')

# ⛔️ UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
my_str = str_bytes.decode('utf-8')

The code runs into an error because we try to decode the bytes with an encoding that wasn't used to encode the string. To fix this, we have to decode the bytes using the same encoding we used to encode them. Here's the fix for the previous error:

str_bytes = 'ééééééé'.encode('latin-1')
my_str = str_bytes.decode('latin-1')

Now the program runs correctly and no errors occur.
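To see where the "invalid continuation byte" wording comes from, we can catch the exception from the failing example and look at the bytes involved. This is only a sketch for exploration; the names error, lead, following and is_continuation are made up for this example.

```python
data = 'ééééééé'.encode('latin-1')

try:
    data.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep a reference, exc is cleared after the except block

# The exception carries the details that appear in the message.
print(error.encoding, error.start, error.end, error.reason)

lead = data[error.start]           # the byte the decoder complained about
following = data[error.start + 1]  # the byte right after it

# 0xE9 is 11101001 in binary: the 1110 prefix announces a three-byte
# UTF-8 sequence, so the next bytes must be continuation bytes.
print(format(lead, '08b'))

# A continuation byte always has the form 10xxxxxx. The next byte is
# 0xE9 again, which does not match, hence the error.
is_continuation = (following & 0b11000000) == 0b10000000
print(is_continuation)
```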

The Solution

In general, you ought to decode your strings with the same encoding you used to encode them, but at times this issue is hard to avoid because it can be difficult to keep track of many encodings. That's why I present two solutions. The first one is to decode your strings correctly, and the second one is to read the data from a file in a way that avoids the decoding error.

Solution one

When decoding bytes with the decode() method, you have to make sure that the encoding is the one that was used to encode the string into bytes. In this example, I have encoded first_string in UTF-8, an encoding that can represent every Unicode character, and then decoded it with UTF-8 again.

first_string = 'ééééééé'.encode('utf-8')
decoded_str = first_string.decode('utf-8')
print(decoded_str)
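When the original encoding is not known for certain, one possible approach is to try a short list of likely candidates in order. The helper below is a hypothetical sketch, not a standard library function, and the candidate list is an assumption you would adapt to your data.

```python
def decode_with_fallback(data, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError('none of the candidate encodings could decode the data')


text, used = decode_with_fallback('ééééééé'.encode('latin-1'))
print(used)
```

Keep in mind that a decode without errors does not prove the guess is right: latin-1 accepts every possible byte, so it never fails and only makes sense as the last candidate.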

Solution two

The open() function has a binary mode, which returns the content of the file as bytes instead of a string. With this mode, no decoding is performed and the bytes are thus preserved, whatever their encoding. To open a file in binary mode, you must specify the "rb" mode.

with open(filePath, 'rb') as file:
    content = file.read()

To write to a file in binary mode, you must use the "wb" or "ab" modes.
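Here is a minimal, self-contained sketch of both modes. The temporary file and the name sample.txt are made up for the example, and we assume we happen to know that the bytes are latin-1.

```python
import os
import tempfile

# Hypothetical sample file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# 'wb' writes raw bytes: these are latin-1 bytes, not valid UTF-8.
with open(path, 'wb') as file:
    file.write('ééééééé'.encode('latin-1'))

# 'rb' reads raw bytes back: no decoding happens, so no error is raised.
with open(path, 'rb') as file:
    raw = file.read()

print(type(raw), raw)

# Decoding is now a separate, explicit step with an encoding we choose.
text = raw.decode('latin-1')
```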

If, despite the encoding problem, you want to open the file and read the content as UTF-8, you can pass an additional parameter to the open() function telling it to ignore errors. Bytes that cannot be decoded are silently dropped, so part of the content may be lost.

with open(filePath, encoding="utf8", errors='ignore') as file:
    content = file.read()
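The errors parameter accepts the same values as the errors argument of bytes.decode(), so the effect can be compared on a small bytes object. The sample word is only an illustration.

```python
data = 'café'.encode('latin-1')  # the last byte, 0xE9, is not valid UTF-8 here

# 'ignore' silently drops the bytes that cannot be decoded.
ignored = data.decode('utf-8', errors='ignore')

# 'replace' puts U+FFFD (the replacement character) in their place,
# which at least leaves a visible trace that something was lost.
replaced = data.decode('utf-8', errors='replace')

print(len(ignored), len(replaced))
```

With 'ignore' the result is one character shorter than the original word, while 'replace' keeps the length and marks the damaged position.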

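The "byte 0xff in position 0" error that appears when you try to decode a file as UTF-8 may simply indicate that the file is encoded in UTF-16: such files often start with a byte order mark, whose first byte is 0xff in the little-endian variant. You can try changing the encoding used to open the file.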

with open(filePath, encoding='utf-16') as file:
    content = file.read()
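Instead of guessing, we can look at the first bytes of the data for a byte order mark. The sniff_bom helper below is a hypothetical sketch built on the BOM constants from the codecs module; it ignores UTF-32 and returns None for data without a BOM.

```python
import codecs


def sniff_bom(raw):
    """Return an encoding name suggested by a leading BOM, or None."""
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'  # UTF-8 with a BOM that should be skipped
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'     # the codec reads the BOM to pick the byte order
    return None             # no BOM found: the encoding is still unknown


# Encoding with 'utf-16' writes a BOM in front of the text.
sample = 'héllo'.encode('utf-16')
encoding = sniff_bom(sample)
decoded = sample.decode(encoding)
print(encoding)
```

For a file, you could read the first few bytes in binary mode, pass them to a helper like this one, and then reopen the file with the suggested encoding.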

This solution only works with Python 3, where the built-in open() function accepts an encoding parameter. If you are using Python 2, you will have to decode the content yourself after opening the file in binary mode.

with open(filePath, 'rb') as file:
    content = file.read()
# Decode first, then strip: stripping the raw bytes could cut a UTF-16 character in half
content = content.decode("utf-16").rstrip(u"\n")

The Conclusion

Thank you for sticking with this tutorial all the way to the end. When you decode bytes with the decode() method and pass the wrong encoding as a parameter, Python raises the error "UnicodeDecodeError: 'utf-8' codec can't decode byte X in position X: invalid continuation byte". To fix this issue, you have to find out which encoding was used to encode the string in the encode() method and use the same one in the decode() method. If you are reading from a file and not from a string, you can either open the file in binary mode with open(), or pass the correct encoding to its encoding parameter.

If you found this article helpful, please comment and share. If you find any issues, let me know and I'll get back to you as soon as possible. Take care!

Travis Gockel
