Forensic Analysis of SQLite Databases: Free Lists, Write Ahead Log, Unallocated Space and Carving
A newer version of this article
Please find an updated version of this article here.
SQLite is a widely popular database format that is used extensively pretty much everywhere. Both iOS and Android employ SQLite as a storage format of choice, with built-in and third-party applications relying on SQLite to keep their data. A wide range of desktop and mobile Web browsers (Chrome, Firefox) and instant messaging applications use SQLite, which includes newer versions of Skype (the older versions don’t work anyway without a forced upgrade), WhatsApp, iMessages, and many other messengers.
Forensic analysis of SQLite databases is often concluded by simply opening a database file in one or another database viewer. One common drawback of using a free or commercially available database viewer for examining SQLite databases is the inherent inability of such viewers to access and display recently deleted (erased) as well as recently added (but not yet committed) records. In this article, we’ll examine the forensic implications of three features of the SQLite database engine: Free Lists, Write Ahead Log and Unallocated Space.
SQLite Recovery and Analysis Tool
For the purpose of this article, we’ll be using Belkasoft Evidence Center to illustrate the low-level approach to handle SQLite databases. Belkasoft Evidence Center is an all-in-one digital forensic tool to help investigators reliably carve the disk or disk image for SQLite databases, extract and analyze information from all available sources including freelists, rollback journals and write ahead logs. The tool can discover and display data stored in unallocated areas via built-in SQLite Viewer or HexViewer.
Belkasoft Evidence Center is perfectly equipped to handle existing, emptied, deleted or corrupted SQLite databases
The powerful carving algorithms can identify and extract known types of data (such as browser history or mobile app data among many others) from existing, deleted and unreferenced database records. You may request a free evaluation copy at https://belkasoft.com/trial.
Freelists: Access to Deleted Entries and Cleared Databases
Before we begin, let’s have a brief look at how SQLite stores and manages data. Regular records are stored in a file called the “Database Image”. The database image is broken up into “pages” of a fixed size. The size of each page is specified in the file header. Each page may have one of the many roles assigned by SQLite. One of these roles is storing the actual data. Each page has a number assigned by the SQLite engine. Page numbering starts with 1.
An SQLite database may have one or more unused pages. These pages may be created and reserved for future use to improve performance, in which case they contain no meaningful data. Unused pages in the database file are stored on what is called a “freelist”. Here’s how SQLite.org explains the meaning of freelists:
“A database file might contain one or more pages that are not in active use. Unused pages can come about, for example, when information is deleted from the database. Unused pages are stored on the freelist and are reused when additional pages are required.”
As we see, keeping newly created pages is not the only use for freelists. Pages that are deleted from a SQLite database with no “Auto Vacuum” option active are placed on a freelist until wiped or claimed by the database engine to store other data. Until then, the pages are just sitting in the freelist, and are not accessible through the database engine via regular means. These unused pages may contain forensically important information such as deleted SMS messages, cleared chat logs and emptied browsing histories. As a result, accessing information stored in the freelist is extremely important.
Standard database viewers and SQLite components do not provide access to information stored in the freelist. As a result, we had to develop forensically-oriented code in order to reliably detect, extract and access deleted information stored in the freelist.
SQLite Database Browser shows zero records in Messages table of Skype history file
By creating native SQLite analysis, we were able to add quite a bit of power to SQLite investigations. Low-level SQLite support allows analyzing deleted SQLite pages – such as cleaned up mobile applications data, chats that were deleted by a suspect, cleared Web histories and other types of data.
Using Belkasoft Evidence Center helped restore tens deleted Skype chats out of seemingly empty Skype database
The ability to recover deleted records from cleared SQLite databases becomes essential for any investigation involving the analysis of suspects’ online communications, so using a forensic tool with freelist support in your lab is an absolute must.
Write Ahead Logs: Access to Non-Committed Data
While the freelist offers access to deleted SQLite pages, the write ahead log (or “WAL”) can become a doorway to accessing records not yet committed into main database. But let’s first look at what the WAL is.
Historically, SQLite used journaling to guard against potential write errors. This mechanism was called “Rollback Journals”. Every time the SQLite engine was about to write data to a page, the original content of that page was backed up into a separate journal file. If the write operation was concluded successfully, the engine would remove the journal file. If, however, there was a problem writing data, or the write operation was interrupted for whatever reason, the journal file would remain on the disk. As a result, the next time the database is started, the engine would discover those journal files and roll the entire database back to its previous state by merging data from rollback journals into the main database.
That was long ago. From version 3.7.0, SQLite employs a completely different journaling mechanism. Called “Write Ahead Log”, it’s basically an upside down version of Rollback Journals. With write ahead logs, the database does not touch the main database when a new or altered page is written. Instead, for the time being new or altered data is stored in a separate file called Write Ahead Log. Interestingly, the new data may remain in the WAL file for a very long time until committed by the “Checkpoint” event. Until then, the database will read new data from the WAL instead of using the main database. The Checkpoint event occurs automatically after the WAL size reaches a certain size (by default, this is 1000 pages). Changes can be committed via the database API or manually with the following SQL command: PRAGMA wal_checkpoint;
So what does it have to do with digital forensics? One thousand records is an awful lot when we talk about chatting or browsing. The entire chat session may never trigger a single checkpoint. As a result, information in the write ahead log will remain uncommitted. General database tools as well as SQLite drivers used in many forensic products will attempt to commit write ahead logs before accessing data. However, this is not the best course of action from the forensic point of view.
What we do when accessing Rollback Journals and Write Ahead Logs is parsing the database one page after another in low level. This native processing has an immediate advantage of displaying both the old (historic) copy of a page retrieved from the main database file as well as the new (uncommitted) copy of the same page stored in the write ahead log (or vice versa if an older version of SQLite with rollback journals is used). Uncommitted pages are then highlighted with a different color:
As a side note, we were recently involved in a case where the suspect’s main database was empty as a result of a recent cleanup. However, the write-ahead log contained a lot of forensically important information.
SQLite Carving: Not As Easy
While all is fine and saucy when you have access to the actual files storing SQLite databases, things aren’t as rosy when it comes to recovering SQLite databases that were not just emptied but physically deleted from the disk. If this is the case, and if the file system record pointing to a database file is no longer available, examiners have to refer to file carving in order to recover the database file. However, carving the SQLite format is problematic at best.
Before we continue talking about why carving SQLite databases is so difficult, let us have a look at how file carving works. As we know, the time the operating system deletes a file is not necessarily the time it claims its space back. The fact that the content of a deleted file may stay on the disk for a long time allows examiners use special techniques to discover the file and extract its content by using content-aware signature search. (The above does not apply to SSD drives or MMC memory used in many mobile devices, particularly Apple’s hardware running iOS 4 and newer; see our article on SSD forensics at https://belkasoft.com/ssd-2014).
Normally, when carving for a file, one looks for a certain sequence of bytes representing the signature that characterizes a certain file format. After encountering such a signature, and after performing one or more secondary checks to validate that the sequence represents the beginning of a given file, the carving tool must calculate the length of the file. Normally, the file’s length is calculated by analyzing its header for file length information. If the header contains no information about the length of a file, the carving tool may scan subsequent clusters in an attempt to locate its footer.
Now back to SQLite. Carving SQLite databases is problematic because although the SQLite database file does have a header that can be detected with signature search, the header itself contains no information about the length of a file. There is no footer either, so determining the exact length of a SQLite database is extremely difficult considering that it may encapsulate data in pretty much any format including binary data. What *is* available though is the size of an individual page. This information is contained in the database header.
The resulting solution involves content-aware analysis of subsequent data blocks read off the disk. The carving algorithm uses information about page size extracted from the database header as well as the value of the first byte of each subsequent page to determine of the data in a particular page is valid in the context of SQLite database format.
So this is how it works. First, we analyze the database header to identify the first page and to discover page size. By knowing the page size, we read that many bytes of subsequent data and save them as the first page. Then we read the next page off the disk and validate its first byte to determine if the page belongs to a SQLite database. We do this until encountering a page that does not pass validation, which we consider to be the end of a SQLite database file. Sounds easy? In fact, this is a lengthy procedure that does not cope well with disk fragmentation, effectively recovering only the first fragment of a large SQLite file. However, this is still much better than nothing. This is precisely the way Belkasoft Evidence Center handles SQLite databases in its low-level parsing engine.
Unallocated Space in SQLite Databases
When talking about deleted records, one simply must mention another type of artifacts routinely encountered when analyzing SQLite databases. As we already figured, a SQLite database is broken into pages. Some of these pages are "leaf table b-trees" which contain the data. In turn, these leaf table b-trees contain cells. Normally, SQLite will attempt to place new cell towards the end of the b-tree page. The newly added cells will reside towards the end of the page. Until allocated for a new cell, the space before the first cell starts is, well, unallocated. Unallocated space can be empty, or it may contain deleted data or remnants of previously used pages.
To put it simply, unallocated space in SQLite databases are page fragments containing random chunks of data. This is similar to unallocated space on a regular hard drive. These unallocated fragments do not contain valid data or pointers. In a word, these page fragments represent areas of available space that are ready to accept new data.
Unlike freelists, unallocated space is not formally referenced from anywhere in the SQLite database. As a result, examining unallocated space is very difficult and time-consuming. Unallocated fragments normally contain random chunks and fragments of data. It is impossible to determine which page used to contain these fragments. It is also impossible to recover the broken relations.
At the same time, analyzing unallocated space is essential for forensic investigations as these page fragments may contain data deleted by the user a long time ago.
Belkasoft Evidence Center can display all areas of unallocated space in SQLite database in Hex Viewer. The product can also carve all supported artifacts from these areas.
A number of Skype chats recovered from unallocated SQLite space. Naturally, most of records are corrupted, but this is better than nothing
While SQLite forensics may seem easy from the first glance, just opening a database in free SQLite Database Browser won’t cut it by a huge margin. A tool supporting both freelists and WAL analysis, having the ability to carve deleted SQLite databases and interpret data in SQLite unallocated space is a must for every digital forensic investigator.
Try it yourself
Request a free trial version of Belkasoft Evidence Center at https://belkasoft.com/trial to test all the powerful SQLite analysis capabilities provided by the product.