Structured DataType Using NumPy

Overview

A structured array in NumPy is comparable to a struct in C. It groups data of different types and sizes into named containers called fields. Each field can hold its own data type and size, and a field is accessed by indexing the array with the field name (dot notation is available only on the np.recarray subclass).

Introduction to Structured Data Type in NumPy

An ordinary NumPy array contains only elements of the same type. A structured array is a format for holding several values of different types together in each element.

The structure in a NumPy structured array is similar to a struct in C: a collection of variables of various sizes and data types, grouped under one name.

The data containers in a structured array are known as fields. Each named field can hold data of its own type and size, and a field is accessed by indexing the array with its name, for example arr['mileage']. Dot notation such as arr.mileage works only on the np.recarray subclass.

A structured datatype can be thought of as a sequence of bytes of a certain length (the structure's itemsize) that is interpreted as a collection of fields. Each field in the structure has a name, a data type, and a byte offset within the structure.
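The name, data type, and byte offset of each field can be inspected directly on the dtype object. The sketch below assumes a hypothetical two-field record (`id` and `value`, names chosen only for illustration) and prints what NumPy stores for each field.

```python
import numpy as np

# Hypothetical record for illustration: a 4-byte integer followed by an 8-byte float.
point = np.dtype([("id", "i4"), ("value", "f8")])

print(point.names)     # field names, in declaration order
print(point.itemsize)  # total number of bytes per record (fields are packed by default)

# dtype.fields maps each name to a tuple that starts with (dtype, byte offset).
for name in point.names:
    field_dtype, offset = point.fields[name][:2]
    print(name, field_dtype, offset)
```

With the default packed layout, `value` starts at byte 4, immediately after the 4-byte `id`.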

Properties of Structured Array

Let's look at some of the properties of structured arrays in NumPy.

  • Every struct in the array has the same number of fields.
  • All structs in the array use the same field names.

Consider, for instance, a structured array of vehicles with various fields, such as company, model, kind of fuel consumption, etc.

Each record in the vehicles array is a struct with the same set of fields. A new field cannot be added to a single record: the fields are part of the array's dtype, so every record always has all of them.

Let's take a look at an example.

The example is a structured NumPy array with four fields named company, model, fuel_mode, and mileage. The dtype of each field is given below.

Field        dtype
company      str_ (up to 20 characters)
model        str_ (up to 20 characters)
fuel_mode    str_ (up to 20 characters)
mileage      int32

The array holds two structs, one for the company Tesla and one for BMW, and both have the same number of fields, namely four, which satisfies the first property. Inspecting the dtype attribute of each record (rather than the built-in type() function, which only reports numpy.void) shows that both structs have the same dtype and therefore the same field names, which satisfies the second property.
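The vehicle array described above can be sketched as follows. The field names and dtypes come from the table; the model names, fuel modes, and mileage figures are placeholder values assumed for illustration, not data from the original example.

```python
import numpy as np

# Four fields, matching the table: three strings of up to 20 characters and one int32.
vehicle_dtype = np.dtype([
    ("company", "U20"),
    ("model", "U20"),
    ("fuel_mode", "U20"),
    ("mileage", "i4"),
])

# Placeholder records (the values are assumptions made for this sketch).
vehicles = np.array(
    [("Tesla", "Model 3", "Electric", 0), ("BMW", "X5", "Petrol", 0)],
    dtype=vehicle_dtype,
)

print(vehicles.dtype.names)                     # same field names for every record
print(vehicles["company"])                      # access one field by name
print(vehicles[0].dtype == vehicles[1].dtype)   # both records share one dtype
```

Each record is supplied as a tuple, and indexing with a field name returns that field for all records at once.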

Need for Structured Array

In many programming languages, a user-defined data type called a structure lets us group data of different kinds. A structure makes it possible to build a composite data type that is more meaningful than its parts. It resembles an array in some ways, but an array stores only data of a single type, whereas a structure can store fields of different types, which is often more useful in practice. Structured data types are designed to mimic structs in the C programming language and share a similar memory layout. They are meant for interfacing with C code and for low-level manipulation of structured buffers, such as decoding binary blobs. For these uses they give control over the structure's memory layout and support specialized features like subarrays, nested datatypes, and unions.
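As a minimal sketch of decoding a binary blob, the snippet below assumes a hypothetical record layout (a little-endian 16-bit sensor id followed by a 32-bit float reading). The blob is built with the standard-library `struct` module only to have something to decode; in practice it might come from a file or a C program.

```python
import struct

import numpy as np

# Hypothetical layout: little-endian uint16 id, then little-endian float32 reading.
record = np.dtype([("sensor_id", "<u2"), ("reading", "<f4")])

# Two packed 6-byte records ("<" means little-endian with no padding).
blob = struct.pack("<Hf", 1, 0.5) + struct.pack("<Hf", 2, 1.5)

# Interpret the raw bytes as an array of records without copying them.
decoded = np.frombuffer(blob, dtype=record)

print(decoded["sensor_id"])
print(decoded["reading"])
```

The dtype must describe the byte layout exactly, including byte order and padding, for the fields to be read back correctly.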

Users who want to manipulate tabular data, such as the contents of CSV files, may find other pydata projects, like xarray, pandas, or DataArray, more appropriate. These offer a high-level interface and are better suited to tabular data analysis. For instance, the C-struct-like memory layout of structured arrays in NumPy can lead to poor cache behavior in comparison.

Create a NumPy Structured Array

From Tuple

A structured dtype can be specified as a list of tuples, one tuple per field. Each tuple has the form (field name, data type, shape), where the shape is optional. The field name is a string (or a tuple when a field title is also given), the data type is any object that can be converted to a dtype, and the shape is a tuple of integers giving the subarray shape.

Let's take a look.

In such a specification, f4 stands for the float32 dtype, and a shape of (2, 2) declares a 2×2 subarray. Each field is passed as a tuple, so the structured dtype is created from a list of tuples.
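A minimal sketch of the list-of-tuples form is shown below. The field names `x`, `y`, and `z` are assumptions chosen for illustration; the third field uses the optional shape element to declare a (2, 2) subarray of float32 values.

```python
import numpy as np

# One tuple per field: (name, dtype) or (name, dtype, shape).
dt = np.dtype([("x", "f4"), ("y", np.float32), ("z", "f4", (2, 2))])
print(dt)

# Every record of the array carries its own 2x2 block in field "z".
arr = np.zeros(3, dtype=dt)
print(arr["z"].shape)  # the subarray shape is appended to the array shape
```

Both the string `'f4'` and the type object `np.float32` are accepted, since either can be converted to a dtype.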

From Comma Separated dtype

In this shorthand notation, any of the string dtype specifications may be used in a single string, separated by commas. The fields are given the default names f0, f1, and so on, and the itemsize and byte offsets of the fields are determined automatically.

In a specification such as 'i8, f4, S3', i8 stands for the int64 dtype, f4 for the float32 dtype, and S3 for the NumPy bytes_ dtype of length 3. Three fields are created in the given order with the default names f0, f1, and f2. The field types are supplied as one comma-separated string, so the structured dtype is created from a comma-separated specification.
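The comma-separated form can be sketched as follows, using the three type codes discussed above. The record values in the array are placeholders assumed for illustration.

```python
import numpy as np

# Three fields with automatically assigned names f0, f1, f2.
dt = np.dtype("i8, f4, S3")
print(dt.names)
print(dt.itemsize)  # 8 + 4 + 3 bytes, packed without padding

# A placeholder record: int64, float32, and a 3-byte string.
arr = np.array([(1, 2.5, b"abc")], dtype=dt)
print(arr["f2"])
```

Because the names are generated automatically, this form is convenient for quick experiments but less readable than explicit field names.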

From Dictionary

This is the most flexible form of specification, since it gives control over the byte offsets of the fields and the itemsize of the structure.

The dictionary has two required keys, names and formats, and four optional keys: offsets, itemsize, aligned, and titles. The values for names and formats should be a list of field names and a list of dtype specifications, respectively, both of the same length. The optional offsets value should be a list of integer byte offsets, one for each field in the structure. If offsets is not given, the offsets are determined automatically. The optional itemsize value should be an integer giving the total size of the dtype in bytes, which must be large enough to contain all of the fields.

In a specification with the formats i4 and i8, i4 stands for a 4-byte integer (int32) and i8 for an 8-byte integer (int64). The fields are described by a dictionary, so the structured dtype is created from a dictionary.
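The dictionary form can be sketched in two variants: one that lets NumPy choose the offsets, and one that sets offsets and itemsize explicitly. The field names `col1` and `col2` and the chosen offsets are assumptions made for illustration.

```python
import numpy as np

# Required keys only: offsets and itemsize are determined automatically.
dt_auto = np.dtype({"names": ["col1", "col2"], "formats": ["i4", "i8"]})
print(dt_auto.itemsize, dt_auto.fields["col2"][1])

# Optional keys: place col2 at byte 8 and pad each record to 16 bytes.
dt_manual = np.dtype({
    "names": ["col1", "col2"],
    "formats": ["i4", "i8"],
    "offsets": [0, 8],
    "itemsize": 16,
})
print(dt_manual.itemsize, dt_manual.fields["col2"][1])
```

The explicit variant leaves four unused bytes between the two fields, which is the kind of layout control needed to match an existing C struct.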

Conclusion

Let's see what we have learned here.

  • A structured array in NumPy groups data of different types and sizes into records.
  • The named data containers in a structured array are known as fields.
  • Every struct in a structured array has the same number of fields and the same field names.
  • A structured dtype can be created from a list of tuples, a comma-separated string, or a dictionary.