MedBN: Robust Test-Time Adaptation against Malicious Test Samples

CVPR 2024

Hyejin Park*, Jeongyeon Hwang*, Sunung Mun, Sangdon Park, Jungseul Ok
Pohang University of Science and Technology (POSTECH)
* Equal Contribution

A potential security flaw in the test-time adaptation process [1]: the vulnerability of the mean in batch normalization layers

While Test-Time Adaptation (TTA) methods excel in adapting to test data variations, such adaptability exposes a model to vulnerability against malicious examples, particularly when a small proportion of the test batch is maliciously manipulated.

The batch normalizing transformation (Batch Normalization vs. MedBN (Ours))

✅ In response to the threat of malicious samples, we introduce a simple and effective robust batch normalization method, Median Batch Normalization (MedBN), which uses the median instead of the mean to estimate the batch statistics. Our method is algorithm-agnostic, thus allowing seamless integration with existing TTA frameworks.
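The core idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the paper's key ingredient is replacing the batch mean with the per-channel median, and the median-of-squared-deviations spread estimate used below is our own assumption for a robust analogue of the batch variance.

```python
import numpy as np

def medbn_forward(x, gamma, beta, eps=1e-5):
    """Median Batch Normalization sketch.

    x: (N, C) batch of features; gamma, beta: (C,) affine parameters.
    The batch mean is replaced by the per-channel median, which a small
    fraction of malicious samples cannot arbitrarily shift. The spread
    estimate here (median of squared deviations) is an assumption made
    for this sketch, not necessarily the paper's exact estimator.
    """
    med = np.median(x, axis=0)               # robust location estimate
    var = np.median((x - med) ** 2, axis=0)  # robust spread (assumption)
    x_hat = (x - med) / np.sqrt(var + eps)   # normalize per channel
    return gamma * x_hat + beta              # affine transform as in BN
```

Because the median has a breakdown point of 50%, corrupting a small proportion of the batch leaves these statistics essentially unchanged, unlike the mean, which a single extreme sample can shift arbitrarily.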

[1] Wu, Tong, et al. "Uncovering adversarial risks of test-time adaptation." International Conference on Machine Learning. 2023.

Abstract

Test-time adaptation (TTA) has emerged as a promising solution to address performance decay due to unforeseen distribution shifts between training and test data. While recent TTA methods excel in adapting to test data variations, such adaptability exposes a model to vulnerability against malicious examples. Indeed, previous studies have uncovered security vulnerabilities within TTA even when a small proportion of the test batch is maliciously manipulated. In response to the emerging threat, we propose median batch normalization (MedBN), leveraging the robustness of the median for statistics estimation within the batch normalization layer during test-time inference. Our method is algorithm-agnostic, thus allowing seamless integration with existing TTA frameworks. Our experimental results on benchmark datasets, including CIFAR10-C, CIFAR100-C, and ImageNet-C, consistently demonstrate that MedBN outperforms existing approaches in maintaining robust performance across different attack scenarios, encompassing both instant and cumulative attacks. Through extensive experiments, we show that our approach sustains the performance even in the absence of attacks, achieving a practical balance between robustness and performance.

Main Results

We consider seven TTA methods which update batch statistics or the affine parameters of batch normalization layers and evaluate our approach using three major benchmarks for TTA: CIFAR10-C, CIFAR100-C, and ImageNet-C. For further details, please refer to our paper.

🏹 Targeted Attack Scenarios (Table 1 in paper)

• Scenario description: The objective of a targeted attack is to manipulate the model into predicting a targeted label on a targeted sample, using malicious samples in a batch.
• Evaluation metric: Attack Success Rate (ASR; %). ⬇️ is better.
• Results: With MedBN, the ASRs of all TTA methods are consistently below 20% on CIFAR10-C, 10% on CIFAR100-C, and 1% on ImageNet-C.

📉 Indiscriminate Attack Scenarios (Table 2 in paper)

• Scenario description: The objective of an indiscriminate attack is to degrade the performance of benign samples.
• Evaluation metric: Error Rate (ER; %) on benign samples. ⬇️ is better.
• Results: With MedBN, all TTA methods reduce error rates by up to approximately 9% on CIFAR10-C, 11% on CIFAR100-C, and 12% on ImageNet-C.

BibTeX

@article{park2024medbn,
  title={MedBN: Robust Test-Time Adaptation against Malicious Test Samples},
  author={Park, Hyejin and Hwang, Jeongyeon and Mun, Sunung and Park, Sangdon and Ok, Jungseul},
  journal={arXiv preprint arXiv:2403.19326},
  year={2024}
}