Validation of Uploaded Content in Django

Overview

In this article, we'll look at how Django handles uploaded files and how to validate file types by checking the file extension. The example at the end validates uploads on the server side. Client-side validation is a useful addition for a positive user experience: because the file is checked in the browser before it is sent, users get feedback sooner. It can be bypassed, however, so it complements server-side validation rather than replacing it.

Uploaded Files and Upload Handlers

Django Upload Validator is a straightforward third-party package (not part of Django itself) that uses the Python-Magic library to validate file types and extensions.

Uploaded files

class UploadedFile

During file uploads, the actual file data is stored in request.FILES. Each entry in this dictionary is an UploadedFile object (or a subclass), a simple wrapper around an uploaded file. You'll usually use one of the following methods to access the uploaded content:

UploadedFile.read()

Reads the entire uploaded data from the file. Be careful with this method: if the uploaded file is large, reading it into memory all at once can overwhelm your system. You'll probably want to use chunks() instead; see below.

UploadedFile.multiple_chunks(chunk_size=None)

Returns True if the uploaded file is big enough to require reading in multiple chunks. By default, this is any file larger than 2.5 megabytes, but that's configurable.

UploadedFile.chunks(chunk_size=None)

A generator that yields chunks of the file. If multiple_chunks() returns True, you should use this method in a loop instead of read().

In practice, it's often easiest to use chunks() all the time. Looping over chunks() instead of calling read() ensures that large files don't overwhelm your system's memory.

UploadedFile also has the following useful attributes.

UploadedFile.name

The name of the uploaded file (for example, my_file.txt).

UploadedFile.size

The size of the uploaded file, in bytes.

UploadedFile.content_type

The content-type header uploaded with the file (such as text/plain or application/pdf). Like any other user-supplied data, you shouldn't trust that the uploaded file is actually of this type. You still need to check that the file contains the content the content-type header claims: "trust but verify."

UploadedFile.content_type_extra

A dictionary containing extra parameters passed along with the content-type header. These are typically provided by services, such as Google App Engine, that intercept and handle file uploads on your behalf. As a result, your handler may receive a URL or other pointer to the file instead of the uploaded file's actual content.

UploadedFile.charset

For text/* content types, the character set (for example, utf8) supplied by the browser. Again, the best policy is to "trust but verify."
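One way to "verify" rather than trust content_type is to look at the file's first bytes. The sketch below swaps in a deliberately simple hand-written signature check instead of python-magic: it assumes PDFs start with the bytes "%PDF-", and the function name has_pdf_signature is hypothetical. It accepts any file-like object with seek() and read(), which includes Django's UploadedFile.

```python
import io

# PDF files begin with this signature, followed by a version number.
PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(uploaded_file):
    """Hypothetical check: return True if the first bytes match PDF_SIGNATURE."""
    uploaded_file.seek(0)
    header = uploaded_file.read(len(PDF_SIGNATURE))
    uploaded_file.seek(0)  # rewind so later code can read the whole file
    return header == PDF_SIGNATURE


# A client could send content_type="application/pdf" with either of these.
genuine = io.BytesIO(b"%PDF-1.7\n%sample body")
disguised = io.BytesIO(b"MZ\x90\x00 an executable renamed to .pdf")
```

A signature check like this is stricter than some PDF readers, which tolerate a few junk bytes before the header, so treat it as a starting point.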

Note: Like regular Python files, an uploaded file can be read line by line by iterating over it.

Lines are split using universal newlines: the Unix ending \n, the Windows ending \r\n, and the old Mac OS ending \r are all recognized as the end of a line.

UploadedFile has the following subclasses:

class TemporaryUploadedFile

A file uploaded to a temporary location (i.e. stream-to-disk). This class is used by the TemporaryFileUploadHandler. In addition to the methods from UploadedFile, it has one additional method:

TemporaryUploadedFile.temporary_file_path()

Returns the full path to the temporary uploaded file.

class InMemoryUploadedFile

A file uploaded into memory (i.e. stream-to-memory). This class is used by the MemoryFileUploadHandler.

Built-in upload handlers

Together, MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django's default file upload behaviour: small files are read into memory and large ones are streamed to disk. Both are located in django.core.files.uploadhandler.

class MemoryFileUploadHandler

File upload handler that streams uploads into memory (used for small files).

class TemporaryFileUploadHandler

Upload handler that streams data into a temporary file using TemporaryUploadedFile.
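For reference, this is how the default pair is wired up in a project's settings. The values below are Django's defaults for FILE_UPLOAD_HANDLERS and FILE_UPLOAD_MAX_MEMORY_SIZE, so a project only needs to set them when it wants different behaviour, for example to put a custom handler at the front of the list.

```python
# settings.py: these values match Django's defaults.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]

# Uploads larger than this many bytes (2.5 MB) are streamed to a temporary
# file instead of being held in memory.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2_621_440
```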

Writing Custom Upload Handlers

class FileUploadHandler

All file upload handlers should be subclasses of django.core.files.uploadhandler.FileUploadHandler. You can define upload handlers wherever you wish.

Required Methods

Custom file upload handlers must define the following methods:

FileUploadHandler.receive_data_chunk(raw_data, start)

Receives a "chunk" of data from the file upload.

raw_data is a bytestring containing the uploaded data.

start indicates where in the file this raw_data chunk starts.

The data you return is fed into the receive_data_chunk methods of subsequent upload handlers. In this way, one handler can act as a "filter" for other handlers.

Return None from receive_data_chunk to prevent the remaining upload handlers from receiving this chunk. This is useful if you are storing the uploaded data yourself and don't want subsequent handlers to store a copy of it.

If you raise a StopUpload exception, the upload is aborted; if you raise a SkipFile exception, the file is skipped entirely.

FileUploadHandler.file_complete(file_size)

Called when a file has finished uploading.

The handler should return an UploadedFile object, which will be stored in request.FILES. Handlers may also return None to indicate that the UploadedFile object should come from subsequent upload handlers.

Optional Methods

Custom upload handlers may also define any of the following optional methods or attributes:

FileUploadHandler.chunk_size

Size, in bytes, of the "chunks" that Django should store in memory and feed into the handler. In other words, this attribute controls the size of the chunks passed to FileUploadHandler.receive_data_chunk.

For maximum performance, chunk sizes should be divisible by 4 and should not exceed 2 GB (2^31 bytes). When multiple handlers specify different chunk sizes, Django uses the smallest chunk size defined by any handler.

The default is 64 KB (64 * 2^10 = 65,536 bytes).

FileUploadHandler.new_file(field_name, file_name, content_type, content_length, charset, content_type_extra)

Callback signaling that a new file upload is starting. This is called before any data has been fed to any upload handlers.

field_name is the string name of the file's <input> field.

file_name is the file name provided by the browser.

content_type is the MIME type provided by the browser, such as "image/jpeg".

content_length is the length of the file as given by the browser. Sometimes this won't be provided, in which case it is None.

charset is the character set (for example, utf8) given by the browser. Like content_length, it is sometimes not provided.

content_type_extra is extra information about the file from the content-type header. See UploadedFile.content_type_extra.

To stop subsequent handlers from handling this file, this method may raise a StopFutureHandlers exception.

FileUploadHandler.upload_complete()

Callback signaling that the entire upload (all files) has completed.

FileUploadHandler.upload_interrupted()

Callback signaling that the upload was interrupted, for example when the user closed their browser while a file was being uploaded.

FileUploadHandler.handle_raw_input(input_data, META, content_length, boundary, encoding)

Allows the parsing of the raw HTTP input to be entirely overridden by the handler.

input_data is a file-like object that supports read().

META is the same object as request.META.

content_length is the length of the data in input_data. Don't read more than content_length bytes from input_data.

boundary is the MIME boundary for this request.

encoding is the encoding of the request.

Return None if you want upload handling to continue, or a tuple of (POST, FILES) if you want to return the new data structures for the request directly.

Conclusion

This article has described how Django handles file uploads, for example in a class-based view, and how to validate the uploaded content. Following the documentation, a server-side extension check works like this:

  • To show how to use FileExtensionValidator, we'll build a file uploader application that accepts only PDF files and checks them on the server side. First create a new project, and then create a new app inside it.
  • Then add the app's name to the INSTALLED_APPS list in settings.py.
  • Attach FileExtensionValidator to the file field. It raises a ValidationError with the code "invalid_extension" if the extension of value.name (where value is a File) is not in the allowed_extensions list. The extension is compared with allowed_extensions case-insensitively.
  • With allowed_extensions set to PDF only, users can upload only PDF files through this field; any other extension raises a validation error. It's also advisable to add client-side checks for requirements like this, but this article shows how uploads can be verified on the server.