Scoring Systems for Metrics - Device Fingerprinting and Identification

6.3 Device Fingerprinting and Identification

6.3.3 Scoring Systems for Metrics

Scoring is the way each metric contributes to the final rank of a given match.

We propose three different scoring systems and briefly present them below.

Metrics majority voting score. In this scoring system, each metric of each fingerprint match is ranked in decreasing order. The fingerprint match that ranks highest on most of its metrics is considered to be the most accurate match to the unknown sample.

3http://ssdeep.sourceforge.net/

Large Scale Security Analysis of Embedded Devices’ Firmware

Uniform and non-uniform weighting scores. In these scoring system, each metric value of a fingerprint is assigned a weight. Then, for each metric of each fingerprint all the weighted values are summed into a total value of the finger-print. Finally, all the total values are ranked in decreasing order. The fingerprint match whose total value ranks highest is considered to be the most accurate match to the unknown sample.

For our evaluation, we used the uniform weights of 16.6% for each of the six metrics. We also used the empirically found weights for each of the six metrics:

4% forsitemap, 4% forFSM, 1% forfuzzy hash header, 1% forfuzzy hash content, 10% for crypto hash header, 80% for crypto hash content.

Score fusion. In our evaluation, we used thescore fusiontechnique to improve the accuracy of identification. The score fusion technique is widely and actively used in various research fields, such as biometrics [161] and sensors data [143].

It is used to increase the confidence in the results and to counter the effect of imprecisely approximated data (e.g., fingerprints in biometrics) and unstable data readings (e.g., sensors data).

We take as input the decreasingly ordered rankings from each of the scoring systems described above. Then, we apply majority voting to each ranking from these three scoring systems. This allows our system to decide which match is the most accurate based on its scores computed using the three different scoring systems presented above.

6.3.4 Experimental Setup

We start by emulating the 27 firmware images and connecting up the 4 physi-cal devices. At this step we apply the firmware emulation technique previously introduced in Chapter5. Then we create one fingerprint for the embedded web interface of each of these 31 devices. Subsequently we create a list of IP ad-dresses based on the IP address of each of the 31 running devices. We make sure that the list of IP addresses is randomized every time it is created. We feed sequentially each IP address to the identification module which acts like an oracle and has to “guess” to what fingerprint to assign the web-interface at this particular IP address. For this, the identification module loads the previously created fingerprinting database, computes the metrics for each URL and accu-mulates them, runs the scoring systems and finally outputs the most accurate fingerprint match by applying the score fusion method. The list of these steps constitutes an experimental run.

We execute the above steps for 100 experimental runs at various points in time, under varying network conditions and varying IP address assignments. We also vary the number of threads used for web interface crawling and the speed at

6.3. DEVICE FINGERPRINTING AND IDENTIFICATION 95

which they crawl. Finally, we compute the average of successful and erroneous identification rates based on results from each experimental run.

6.3.5 Evaluation

Summarized, our tests on average resulted in 89.4% accuracy in device iden-tification. The tests were run using a database containing 31 fingerprints of embedded web interfaces. The fingerprinted web interfaces contained around a third of similar or consecutive firmware versions (e.g., Brickcom).

Our evaluations show that the cryptographic hash of the content is the most sta-ble and accurate feature. On average, it provided an accuracy of more than 85%.

On the other end, the fuzzy hash of headers and content were the most unstable.

One reasons for this is that fuzzy hashing does not perform well with short data (e.g., HTTP headers). Another reason, as discussed also in Section6.2.4, could be the fact that fuzzy hash is not an accurate data match and can introduce noise rather than useful similarity information. These empirical observations lead us to choose the non-uniform scoring weights as presented in6.3.3. Finally, the most accurate scoring system in our tests was the majority voting, followed by the non-uniform. As expected, the uniform weights scoring system performed the worst with more than 50% of classification errors. This can be explained by the high weights assigned to non-accurate and noisy fuzzy hash metrics.

6.3.6 Discussion

Embedded web interface’s URLs that majorly affect the device functionality, e.g., reboot.php, FW-upgrade.cgi, represent a particular challenge. Access-ing such URLs durAccess-ing fAccess-ingerprintAccess-ing or matchAccess-ing may cause the device to go offline or malfunction right in the middle of the identification process. As a re-sult the fingerprint or match will be partial and may produce wrong rere-sults.

Undoubtedly, automatically detecting such URLs and excluding them from the identification process may help prevent such behavior. One way to achieve this detection is by doing manual annotations on a case-by-case basis, but this obvi-ously does not scale. Other ways to achieve this is by using static and dynamic analysis. Alternatively, it is possible to perform a first sequential run on each URL of an embedded web interface and check the “liveness” of the embedded web interface after each URL is accessed. Even though this method is not 100%

accurate since the device can go offline or malfunction for other reasons as well, it may help removing the perturbing URLs and establish a “safe” list of URLs to be used for device identification. Finally, though removing few URLs from the identification process might slightly reduce the exploratory surface of the device’s web interface, the number of such URLs in practice is extremely low compared to all other resources in the embedded web interface (e.g., images, JavaScript,

Large Scale Security Analysis of Embedded Devices’ Firmware

styles, static help files). Thus the impact of their removal on the the scoring is expected to be minimal.

Additionally, we designed our fingerprinting tools in a robust manner so that they support HTTP authentication and HTTP cookies. This functionality is required since many embedded devices user either HTTP basic authentication or HTML forms authentication which is stored in session cookies. Having this functionality enables us to create more granular fingerprints and, depending on the embedded web interface, identify if a given access to a connected device is unauthenticated, unauthenticated as admin, ... , unauthenticated as userN.

Finally, our embedded web interface identification can be further improved by combining our fingerprint data with results from complementary fingerprint lev-els, such as device [144], CPU and clock [144], OS [190], network stack (e.g., NMAP) [114], combined (e.g., Nessus⁴).

Dans le document Large scale security analysis of embedded devices' firmware (Page 115-118)