Byte Swapping in NumPy

Overview

All data in a computer, whether text, images, or video, is stored as a stream of binary digits, or bits: ones and zeros. The order in which the bytes of a multibyte value are stored is called byte ordering, and it matters whenever you work with raw memory. Python usually shields you from these low-level details with high-level abstractions, but the ndarray, NumPy's n-dimensional array, can view data directly in memory. As a result, you may encounter data whose byte order differs from what your machine expects, and its quirks can surprise you if you don't know how to handle them.

Introduction

Before we begin, we'll assume you have Python installed on your computer and are familiar with NumPy arrays.

Let's take a step back and look at the binary system, which you need to understand byte ordering. If you're already familiar with it, skip ahead to the Byte Ordering and Ndarrays section.

Numbers can be represented in infinitely many ways. Throughout history, people have used many notations, such as Roman numerals. Most modern civilizations use positional notation because it is compact, flexible, and well suited to arithmetic.

Computers, on the other hand, represent data as numbers in the base-two numeral system, also known as the binary system. Binary numbers contain only two digits, zero and one.

For example, the binary number 11011000001 equals 1729 in base ten.
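A quick way to check this conversion is with Python's built-in int() and bin() functions. The minimal sketch below also spells out the positional sum, so you can see where 1729 comes from; the variable names are only illustrative.

```python
# Convert between binary and decimal with Python built-ins.
value = int("11011000001", 2)   # parse a base-2 string
print(value)                    # 1729
print(bin(1729))                # 0b11011000001

# Positional notation: each digit is weighted by a power of two,
# starting with 2**0 for the rightmost digit.
digits = "11011000001"
total = sum(int(d) * 2**i for i, d in enumerate(reversed(digits)))
print(total)                    # 1729
```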

With binary numbers covered, it is time to examine how a computer stores multibyte data in memory.

Byte Ordering and Ndarrays

There is no disagreement about the order of bits within a single byte. Regardless of how they are physically laid out in memory, the least-significant bit is conventionally at index zero and the most-significant bit at index seven.
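In code, this convention shows up in how individual bits are extracted by shifting. The sketch below uses an arbitrary example byte (the value 5) and lists its bits from index 0 to index 7, first with plain Python and then with NumPy's np.unpackbits, whose bitorder="little" option puts the least-significant bit first.

```python
import numpy as np

byte = 0b00000101  # the value 5 stored in one byte (example value)

# Bit i is extracted by shifting right i places and masking with 1,
# so index 0 is the least-significant bit and index 7 the most-significant.
bits = [(byte >> i) & 1 for i in range(8)]
print(bits)  # [1, 0, 1, 0, 0, 0, 0, 0]

# np.unpackbits lists the same bits; bitorder="little" puts the LSB first.
np_bits = np.unpackbits(np.array([byte], dtype=np.uint8), bitorder="little")
print(np_bits.tolist())  # [1, 0, 1, 0, 0, 0, 0, 0]
```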

However, there is no agreement on byte order for data that spans multiple bytes. A multibyte value can be read from left to right, as English text is, or from right to left, as Arabic text is. A computer reads the bytes in a binary stream much as a person reads the words in a sentence: in some agreed-upon direction.

It makes no difference which direction computers read data from, as long as they follow the same rules everywhere. In practice, however, the memory you want to inspect with an array often does not have the same byte ordering as the system on which Python is running.

For illustration, suppose I am working on a machine with a little-endian CPU but have loaded data from a file written on a big-endian machine. Assume I have read 4 bytes from that file, and I know they represent two 16-bit integers. On a big-endian machine, a two-byte integer is stored with the Most Significant Byte (MSB) first, followed by the Least Significant Byte (LSB). Thus, in memory order, the first byte is the MSB of the first integer, the second byte is the LSB of the first integer, the third byte is the MSB of the second integer, and the fourth byte is the LSB of the second integer.

Consider a concrete example: suppose the two integers are in fact 7 and 776. Because 776 = 256 * 3 + 8, its MSB is 3 and its LSB is 8, while 7 has an MSB of 0 and an LSB of 7. The four bytes in memory would therefore be 0, 7, 3, 8.
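You can confirm this byte layout with Python's standard struct module. In the format string ">HH", > requests big-endian order and each H is an unsigned 16-bit integer; unsigned is fine here because both values are small and positive. This is a minimal sketch for checking the arithmetic, not part of the NumPy workflow.

```python
import struct

# 776 split into its high (most significant) and low (least significant) bytes.
high, low = divmod(776, 256)
print(high, low)  # 3 8

# ">HH" packs two unsigned 16-bit integers in big-endian order.
packed = struct.pack(">HH", 7, 776)
print(list(packed))  # [0, 7, 3, 8]
```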

You're probably wondering what we mean by big-endian and little-endian machines. Don't worry; I've got your back. The next section briefly describes the concept of endianness.

Big-endian and Little-endian System

If you think of memory as a one-dimensional tape of bytes, storing a multibyte value means splitting it into individual bytes and arranging them in a contiguous block. Some people like to begin from the left end, since that is how they read, while others prefer to begin at the right end. If the most-significant byte is stored first, at the lowest memory address, the order is called big-endian. If the least-significant byte is stored first, at the lowest memory address, the order is called little-endian.
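Python's built-in int.to_bytes() and int.from_bytes() make the two orders easy to compare, and sys.byteorder reports the native order of the machine you are running on. The sketch below reuses 776 from the earlier example; it is only an illustration of the two layouts.

```python
import sys

value = 776  # 0x0308

big = value.to_bytes(2, byteorder="big")        # MSB at the lowest address
little = value.to_bytes(2, byteorder="little")  # LSB at the lowest address
print(list(big))     # [3, 8]
print(list(little))  # [8, 3]

# The native byte order of the machine running Python: "little" or "big".
print(sys.byteorder)

# Reading big-endian bytes as if they were little-endian gives a different number.
print(int.from_bytes(big, byteorder="little"))  # 2051
```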

Circling back to the previous example, let's have a look at some practical implementations.
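Here is a minimal sketch of that memory, assuming the four bytes have already been read from the big-endian file. A bytearray is used because it is a plain, writable buffer; the name big_end_buffer is only illustrative.

```python
# The four bytes from the example, in memory order, as they arrive from
# the big-endian file: 7 -> (0, 7) and 776 -> (3, 8).
big_end_buffer = bytearray([0, 7, 3, 8])
print(big_end_buffer)        # bytearray(b'\x00\x07\x03\x08')
print(list(big_end_buffer))  # [0, 7, 3, 8]
```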

To access these integers, we can use an ndarray. We wrap this memory in an array and tell NumPy that it contains two 16-bit, big-endian integers.
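One way to do this is to pass the buffer to the np.ndarray constructor, which views the existing memory instead of copying it. The sketch below recreates the four-byte buffer so it runs on its own; the variable names are illustrative.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])

# Wrap the existing memory (no copy) as two big-endian signed 16-bit integers.
big_end_arr = np.ndarray(shape=(2,), dtype=">i2", buffer=big_end_buffer)
print(big_end_arr.tolist())  # [7, 776]
print(big_end_arr.dtype)     # >i2
```

Because the array shares memory with the buffer, this works the same whether your CPU is little-endian or big-endian: the dtype, not the machine, decides how the bytes are read.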

Note the dtype string '>i2' used for such an array: the > denotes big-endian (< denotes little-endian), and i2 denotes a signed 2-byte integer.

Byte Swapping

As you may expect from the preceding section, there are two ways to change the relationship between an array's byte ordering and the underlying memory it is viewing:

  1. Change the byte-ordering metadata in the array's dtype so that the underlying data is interpreted in a different order. This is what arr.newbyteorder() does in NumPy 1.x (NumPy 2.0 removed this method; the equivalent is arr.view(arr.dtype.newbyteorder())). Let's take a closer look at arr.newbyteorder().

ndarray.newbyteorder()

Syntax :

ndarray.newbyteorder(new_order='S', /)

The new_order argument sets the byte order of the returned array's dtype. It accepts the following values:

  • 'S' - change dtype from current to opposite endian 
  • '<', 'L' - little endian 
  • '>', 'B' - big endian  
  • '=','N' - native order 
  • '|','I' - ignore (no change to byte order)
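NumPy dtypes have a matching numpy.dtype.newbyteorder() method that accepts the same codes, and it is still available in NumPy 2.x. The minimal sketch below applies a few of the symbol forms to an example '<i2' dtype and prints the resulting type string.

```python
import sys
import numpy as np

dt = np.dtype("<i2")  # little-endian signed 16-bit integer (example dtype)

print(dt.newbyteorder("S").str)  # >i2  (swapped to the opposite order)
print(dt.newbyteorder(">").str)  # >i2  (forced big-endian)
print(dt.newbyteorder("<").str)  # <i2  (forced little-endian)

# "=" means native order: <i2 on a little-endian machine, >i2 on a big-endian one.
native = dt.newbyteorder("=").str
print(native, sys.byteorder)
```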

Let's proceed with an example. Assume I have a NumPy array with a single uint32 entry.
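A minimal sketch of such an array follows. As an assumption made for reproducibility, the entry is stored explicitly as big-endian (">u4") so that the byte layout printed by tobytes() is the same on every machine.

```python
import numpy as np

# A single unsigned 32-bit entry, stored explicitly as big-endian (">u4")
# so the byte layout below does not depend on the machine.
arr = np.array([1], dtype=">u4")
print(arr.tolist())         # [1]
print(list(arr.tobytes()))  # [0, 0, 0, 1]
```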

I can then choose to interpret the same data as little-endian:
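The sketch below reinterprets a big-endian uint32 array as little-endian. It uses arr.view(arr.dtype.newbyteorder("<")) rather than arr.newbyteorder("<"), because the array method was removed in NumPy 2.0 while the view form works in both 1.x and 2.x.

```python
import numpy as np

arr = np.array([1], dtype=">u4")

# Same memory, new interpretation: view the bytes with a little-endian dtype.
# (arr.newbyteorder("<") did this in NumPy 1.x; view() also works in 2.x.)
little_view = arr.view(arr.dtype.newbyteorder("<"))
print(little_view.tolist())                # [16777216], i.e. 0x01000000
print(list(little_view.tobytes()))         # [0, 0, 0, 1]: memory is untouched
print(np.shares_memory(arr, little_view))  # True
```

The bytes 0, 0, 0, 1 read least-significant-first give 1 * 2**24 = 16777216, which is why the value changes even though no byte moved.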

Note :

newbyteorder only changes the array's dtype: a.newbyteorder("<") returns a view of a with a little-endian dtype. It does not change the array's shape or strides, and it does not modify the contents of memory.

  2. Change the byte order of the underlying data while leaving the dtype interpretation alone. This is what arr.byteswap() does.

ndarray.byteswap()

Syntax :

ndarray.byteswap(inplace=False)

It switches between little-endian and big-endian data representations by returning a byte-swapped array; pass inplace=True to swap the bytes of the original array instead. Before proceeding, let us look at an illustration.
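A minimal sketch with an example int16 array: each 2-byte element has its bytes reversed, so 0x2233 (8755) becomes 0x3322 (13090). The dtype stays the same, which is why the printed values change.

```python
import numpy as np

A = np.array([1, 256, 8755], dtype=np.int16)
print([hex(int(x)) for x in A])  # ['0x1', '0x100', '0x2233']

# Default (inplace=False): return a new array with swapped bytes; A is unchanged.
swapped = A.byteswap()
print(swapped.tolist())  # [256, 1, 13090]
print(A.tolist())        # [1, 256, 8755]

# inplace=True swaps the bytes of A itself.
A.byteswap(inplace=True)
print(A.tolist())        # [256, 1, 13090]
```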

Note :

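ndarray.byteswap() leaves arrays of byte strings unchanged, because each character is a single byte and has no byte order to swap. For complex numbers, the real and imaginary parts are swapped individually.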
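The sketch below illustrates both cases with made-up arrays. For the complex case, if the whole 16-byte element were reversed, viewing the result as big-endian would give 2+1j; because each 8-byte part is swapped on its own, it gives back 1+2j.

```python
import numpy as np

# Byte strings: every character is one byte, so nothing is swapped.
S = np.array([b"ceg", b"fac"])
print(S.byteswap().tolist())  # [b'ceg', b'fac']

# Complex numbers: the real and imaginary parts are swapped separately.
z = np.array([1 + 2j], dtype="<c16")
z_swapped = z.byteswap()
print(z_swapped.view(">c16").tolist())  # [(1+2j)]
```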

Changing Byte Ordering

Now consider some of the most typical scenarios in which byte order must be changed:

1. The Endianness of Your Data and dtype Do Not Match, and You Wish to Update the dtype to Match the Data.

Suppose we created an array whose dtype does not match the byte order of its data:
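A minimal sketch of such a mismatch, reusing the big-endian bytes for 7 and 776 but wrongly declaring them little-endian; the variable names are illustrative.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])  # big-endian data: 7 and 776

# Mistake: the dtype claims little-endian ("<i2"), but the data is big-endian.
wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype="<i2", buffer=big_end_buffer)
print(wrong_end_dtype_arr.tolist())  # [1792, 2051] instead of [7, 776]
```

Read least-significant-byte first, the pair (0, 7) becomes 7 * 256 = 1792 and (3, 8) becomes 8 * 256 + 3 = 2051.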

The obvious solution is to update the dtype to give the right endianness:
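A sketch of this fix, assuming the mismatched array from the scenario above. Calling dtype.newbyteorder() with no argument swaps "<i2" to ">i2", and view() applies the new dtype to the same memory.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])
wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype="<i2", buffer=big_end_buffer)

# Flip only the dtype's byte-order flag; the bytes themselves are untouched.
fixed_dtype = wrong_end_dtype_arr.dtype.newbyteorder()  # "<i2" -> ">i2"
fixed_end_dtype_arr = wrong_end_dtype_arr.view(fixed_dtype)
print(fixed_end_dtype_arr.tolist())                     # [7, 776]
print(fixed_end_dtype_arr.tobytes() == big_end_buffer)  # True
```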

Note that the bytes in memory have not changed; only the dtype's interpretation of them has.

2. The Endianness of Your Data and dtype Do Not Match, and You Want to Swap the Data to Match the dtype.

You should do this if you require the data in memory to be in a specific order. For example, you may be writing memory to a file that requires specific byte ordering.
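A sketch of this approach on the same mismatched array: byteswap() keeps the "<i2" dtype and rewrites the bytes so that they really are little-endian. The names are illustrative.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])
wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype="<i2", buffer=big_end_buffer)

# Keep the "<i2" dtype and swap the bytes so they match it.
fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
print(fixed_end_mem_arr.tolist())                     # [7, 776]
print(list(fixed_end_mem_arr.tobytes()))              # [7, 0, 8, 3]
print(fixed_end_mem_arr.tobytes() == big_end_buffer)  # False
```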

This time, the bytes in memory have changed.

3. The Endianness of Your Data and dtype Match, but You Want the Data Switched and the dtype to Reflect this.

You may have an array with a correctly specified dtype but need its data in memory to have the opposite byte order, with the dtype updated to match so the values still make sense. In this situation, combine the previous two operations: swap the bytes, then flip the dtype's byte order.
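A minimal sketch of the two steps applied to a correctly declared big-endian array. The values are unchanged, but both the bytes and the dtype now use little-endian order.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])
big_end_arr = np.ndarray(shape=(2,), dtype=">i2", buffer=big_end_buffer)

# Step 1: swap the bytes in memory. Step 2: flip the dtype to match.
swapped_end_arr = big_end_arr.byteswap()
swapped_end_arr = swapped_end_arr.view(swapped_end_arr.dtype.newbyteorder())
print(swapped_end_arr.tolist())         # [7, 776]
print(swapped_end_arr.dtype)            # <i2
print(list(swapped_end_arr.tobytes()))  # [7, 0, 8, 3]
```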

The ndarray astype method offers a shortcut: converting the array to a dtype with the desired byte order reorders the bytes in a single step.
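A sketch of the same result using astype(). It returns a new array (a copy by default) whose values equal the original ones, stored in little-endian order.

```python
import numpy as np

big_end_buffer = bytearray([0, 7, 3, 8])
big_end_arr = np.ndarray(shape=(2,), dtype=">i2", buffer=big_end_buffer)

# astype converts the values to the target dtype, reordering bytes as needed.
swapped_end_arr = big_end_arr.astype("<i2")
print(swapped_end_arr.tolist())                     # [7, 776]
print(list(swapped_end_arr.tobytes()))              # [7, 0, 8, 3]
print(swapped_end_arr.tobytes() == big_end_buffer)  # False
```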

That concludes this blog. Kudos! You now have a solid understanding of byte swapping in NumPy.

Conclusion

This blog covered:

  • The fundamentals of binary numbers.
  • The core concepts behind byte ordering.
  • Big-endian and little-endian order.
  • Two ways to change the relationship between an array's byte ordering and the underlying memory: newbyteorder() and byteswap().
  • Common situations in which byte ordering must be adjusted.