Intrusion Detection Based on Multi-Task Deep Learning for Internet of Things Networks

Dong Huiyao

Candidate of Sciences dissertation
2025

VAK RF specialty: 00.00.00

Number of pages: 436

Dong Huiyao. Intrusion Detection Based on Multi-Task Deep Learning for Internet of Things Networks: Candidate of Sciences dissertation: 00.00.00 - Other specialties. Federal State Autonomous Educational Institution of Higher Education "ITMO National Research University". 2025. 436 pp.

Table of contents of the dissertation

Abstract

Synopsis

Introduction

CHAPTER 1. Systematic Analysis of IoT Intrusion Detection Research

1.1 Research Background and Relevance: IoT Security

1.2 Definition and Classification of Intrusion Detection Systems

1.3 Systematic Review of ML-based Intrusion Detection System

1.3.1 Deep learning-based intrusion detection techniques

1.3.2 Anomaly detection in IoT environments

1.3.3 Current research status of multi-task learning-based method

1.3.4 Data preprocessing techniques for IoT security data

1.3.5 Time-series analysis and sequential attack detection

1.4 Research Gap and Requirements

1.5 Research Model Formulation

1.5.1 Design of DL-IDS: A decentralized approach

1.5.2 Problem Formulation of the Security Model

1.5.3 Evaluation Metrics for DL-based IDS

1.5.4 Optimization Criteria

CHAPTER 2. Intrusion Detection Model Based on Integration of Multi-task Learning and Autoencoder-based Anomaly Detection

2.1 Security Landscape: Contextual Foundations

2.2 Proposed Architecture: Multi-Task Learning with Anomaly Detection

2.2.1 Data preprocessing

2.2.2 A Hybrid Resampling approach

2.2.3 Anomaly Detection with Autoencoder

2.2.4 Multi-task Learning and Weighted Loss Optimization

2.3 Empirical Validation and Discussions

2.3.1 Implementation and Experiment Setup

2.3.2 Benchmark Dataset Preparation

2.3.3 Comparative Performance Evaluation

2.4 Discussion on Implementation and Application

2.5 Chapter Synthesis

CHAPTER 3. Intrusion Detection Model Based on Soft Parameter Sharing Multitask Learning and Data Resampling

3.1 Security Landscape: Contextual Foundations

3.2 Proposed Architecture: Hybrid Workflow with Soft Parameter Sharing

3.2.1 Adaptive Undersampling Method for Classification Enhancement

3.2.2 Soft Parameter Sharing-based MTL Framework

3.2.3 Weighted Categorical Cross-entropy Loss

3.3 Empirical Validation and Discussions

3.3.1 Implementation and Experiment Setup

3.3.2 Benchmark Dataset Preparation

3.3.3 Comparative Performance Evaluation

3.4 Practical Deployment Analysis: Usability Considerations

3.5 Chapter Synthesis

CHAPTER 4. Intrusion Detection Model Based on Attention and LSTM for Time Series Analysis

4.1 Security Landscape: Contextual Foundations

4.2 Proposed Architecture: Attention-Driven Sequential Traffic Analysis

4.2.1 Time-series Data: Processing and Trend Analysis

4.2.2 Knowledge-Aware Attention Network

4.2.3 Recurrent Networks and MTL Architecture

4.2.4 Distributed and Continuous Training

4.3 Empirical Validation and Discussions

4.3.1 Implementation and Experiment Setup

4.3.2 Benchmark Dataset Preparation

4.3.3 Comparative Performance Evaluation

4.4 Model Usability: Efficiency, Deployment, and Maintenance

4.5 Chapter Synthesis

Conclusion

List of abbreviations

Bibliography

Appendix A: Registered Program and Acts of Implementation

Appendix B: Publications

Abstract

General description of the dissertation

Introduction to the dissertation (part of the abstract) on the topic "Intrusion Detection Based on Multi-Task Deep Learning for Internet of Things Networks"

Relevance of the chosen topic.

The Internet of Things (IoT) is developing rapidly and has become an integral part of many sectors, including healthcare, the automotive industry, industrial supply chains and manufacturing, and smart cities. The technology provides seamless connectivity between devices and simplifies data collection and exchange, which improves operational efficiency and creates new opportunities through automation, remote control, and real-time analytics. The ubiquity of IoT devices has turned them into critical infrastructure components with a significant impact across industries.

Despite the benefits of IoT systems, their growing integration into essential services raises concerns about security threats and vulnerabilities. IoT devices often lack robust security protocols, which limits their ability to withstand threats while they store confidential information and perform critical functions. This makes them prime targets for attackers. For example, in October 2016 the Mirai botnet took over numerous IoT devices by using default passwords to gain access and install itself, which led to one of the largest distributed denial-of-service (DDoS) attacks in history. The attack affected major media platforms, including Twitter, Reddit, CNN, and Netflix. Such breaches can lead to unauthorized access to confidential data, manipulation of device operation, and loss of network integrity. The potential consequences of these vulnerabilities underline the critical importance of robust security measures in IoT systems.

Addressing the security threats facing IoT networks requires both human-oriented measures, such as regulatory compliance and strict password policies, and technical protection mechanisms. These mechanisms include firewalls, intrusion detection systems (IDS), and intrusion prevention systems (IPS). Firewalls regulate network traffic according to established security rules, but they provide only limited protection against sophisticated attacks that exploit IoT-specific vulnerabilities.

IDS and IPS are designed to detect and prevent unauthorized intrusions by monitoring network traffic for suspicious activity. An IDS raises an alert when it detects malicious behavior, whereas an IPS can actively block or mitigate the threat. In terms of the balance between security and usability, an IDS often handles the trade-off better, because it avoids mistakenly blocking legitimate users or normal activity.

Traditional static IDS rely primarily on historical profiles of attackers and legitimate users to detect anomalies and intrusions. However, the rapid evolution of modern cyberattacks poses serious problems for these systems. Because attackers constantly adapt and refine their methods, static models often fail against zero-day threats and sophisticated adaptive threats that can easily bypass predefined detection rules. These evolving attacks exploit system vulnerabilities and behavioral patterns that traditional IDS cannot recognize, which increases the risk of security breaches in sensitive IoT network environments.

Recent advances in machine learning (ML) have shown promise for improving detection capabilities in IoT networks. ML approaches automatically learn and identify patterns of both normal and anomalous behavior by analyzing large volumes of data, and so detect new threats that static systems may miss. Despite these advances, the success of machine learning algorithms depends on several critical assumptions that are often hard to satisfy in practice. For example, supervised classification models assume that a large volume of labeled network traffic data is available. In practice, manually labeling large datasets is both impractical and very time-consuming, which is a substantial obstacle to effective model training. In addition, these models often assume a balanced distribution of network traffic data across classes, which is rare in real conditions. The dynamic nature of network traffic can cause large fluctuations in the volume of normal and malicious activity over time, and benign traffic often greatly outnumbers attack data. Such imbalance can substantially reduce detection accuracy, especially in scenarios with rare or new attack vectors.

Moreover, in real network environments, deploying complex deep learning (DL) models on resource-constrained IoT devices raises a number of problems. Many IoT devices have limited computing power and memory, which makes it difficult to deploy complex models that need substantial computational resources to work effectively. In addition, such deep learning systems must perform fast, real-time analysis to detect and respond to threats in time, which is necessary for the security and reliability of IoT infrastructure. The limitations of single-task security models thus become evident: they often fail to account for the multifaceted and evolving nature of threats in modern cyberspace, which leaves gaps in detection capability that can jeopardize overall system security.

The characteristics of IoT environments call for security models that are both highly resilient to emerging threats and compatible with the operational constraints of the system and its in-network devices. Achieving this dual goal requires a balance between detection effectiveness and system compatibility. In terms of performance, the primary metric for evaluating an intrusion detection model in the IoT context is the attack detection rate. This implies near-perfect identification and classification of known attack patterns as well as reliable detection of unknown or zero-day attacks. A low false positive rate is equally important, because excessive false alarms can disrupt the normal operation of an IoT system, causing unnecessary interruptions in the transmission of legitimate data and reducing overall operational efficiency. Security models must therefore be finely tuned to identify threats reliably while guaranteeing uninterrupted operation of normal functions and critical data flows. In terms of compatibility, IoT networks consist mainly of distributed sensor nodes with limited computing power and memory. Consequently, the parameter sets of security models, including trained weights, must be small enough to fit into this limited memory. In addition, inference and update processes must run at high speed and with low latency to ensure a rapid response to security events. From a machine learning perspective, these performance and compatibility requirements translate into several concrete technical characteristics of candidate models: minimal model size, computational efficiency, and optimized forward and backward propagation for fast detection and response.
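The two performance metrics and the memory constraint described above can be made concrete with a short calculation. The following is a minimal sketch, not the dissertation's evaluation code: the function names, the toy labels, and the assumption of 4-byte (float32) weights are illustrative only.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Labels are assumed to be 1 for attack traffic and 0 for benign traffic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


def parameter_memory_bytes(num_parameters, bytes_per_parameter=4):
    """Memory needed to store the weights alone (float32 assumed by default)."""
    return num_parameters * bytes_per_parameter


# Toy example: 3 attack flows and 5 benign flows (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
detection_rate, false_positive_rate = detection_metrics(y_true, y_pred)
weights_size = parameter_memory_bytes(50_000)  # hypothetical 50k-parameter model
```

On this toy data the detector catches two of three attacks and raises one false alarm on five benign flows. A 50,000-parameter model would need 200,000 bytes for float32 weights alone, before activations and runtime overhead are counted.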

The goal of the dissertation is to address significant and evolving problems in the security of IoT networks. These problems include the persistent shortage of high-quality labeled data for training security models, the difficulty of reliably detecting new types of attacks and zero-day attacks, and the frequent misclassification of network data caused by imbalanced data distributions, which is typical of IoT networks and systems. In addition, the dynamic, time-varying nature of IoT network traffic creates substantial difficulties and makes it necessary to develop adaptive methods that can respond to changes in real time. The methods proposed in this work are intended to support high-performance, adaptive intrusion detection systems (IDS) that operate effectively under the typical constraints of resource-limited and distributed IoT infrastructures.

To achieve this goal, the following tasks were set and solved in the dissertation:

Task 1. Highlight the growing importance of securing IoT systems, especially as these networks expand in scale and complexity and become an increasingly integral part of critical infrastructure. Establish the need for advanced, intelligent, and adaptive intrusion detection systems that can respond to rapidly changing threats. Conduct a comprehensive and systematic review of current machine learning-based approaches to intrusion detection, with an emphasis on deep learning, and clearly identify the key research gaps and limitations with respect to the requirements of IoT systems. Critically analyze the technical requirements and considerations of IoT deployments in order to develop and formalize a sound research framework designed for the multifaceted problems of IoT security.

Task 2. Create a rigorous analytical workflow for efficient and systematic collection, preprocessing, and management of IoT network traffic data. This includes developing methodological processes for aggregating, cleaning, and transforming raw network data so that they are suitable for machine learning-based modeling and decision-making. These methods are essential for supporting real-time intrusion detection, because they make it possible to build reliable data pipelines that cope with the scale and complexity of IoT environments.

Task 3. Develop a new approach that combines multi-task learning (MTL) with autoencoder-based methods for anomaly detection in IoT systems. The approach uses hard parameter sharing in MTL architectures to optimize resource usage, which is essential given the computational constraints typical of IoT systems. In addition, propose and implement anomaly detection strategies designed to identify and handle the distinctive characteristics and threat patterns present in IoT network traffic.

Task 4. Advance adaptive intrusion detection by using MTL frameworks with soft parameter sharing, which provide a higher degree of flexibility and modularity in model training. Incorporating weighted loss functions and advanced adaptive learning optimizations makes it easier to tune the model to a heterogeneous network environment, given the diversity and variability of distributed IoT systems. This task aims to increase the robustness and adaptability of IDS models so that they remain effective across different system topologies and varying operational requirements.

Task 5. Present and evaluate new machine learning-based undersampling methods designed to counter the excessive redundancy of benign traffic samples that is common in IoT datasets. By selecting the most informative and representative samples, these methods are intended to mitigate the harmful effect of class imbalance and improve classifier performance, especially in scenarios where malicious activity is rare compared with normal traffic patterns.

Task 6. Integrate knowledge-aware attention networks with long short-term memory networks (KAN+LSTM, Knowledge-aware Attention Networks + Long Short-Term Memory) to analyze and interpret sequential activity in IoT networks in real time. This line of research addresses the detection of time-correlated and sequential attacks, as well as better traffic management and mitigation of attacks that exploit time-dependent patterns or behavior in IoT networks.

Task 7. Develop a scalable distributed intrusion detection system optimized for deployment in real IoT environments. This task includes developing continuous learning protocols, periodically updating policies and models, and creating efficient operational mechanisms that match the constraints and requirements of distributed, resource-limited architectures. The focus is on maintaining high detection effectiveness while minimizing overhead and enabling the system to keep evolving.

Task 8. Implement the proposed software designs and methodologies for the improved MTL-based security models. Carry out a detailed experimental evaluation of these models, with particular emphasis on their performance in different IoT network scenarios. The evaluation criteria include resource efficiency, detection accuracy, and scalability in dynamic, distributed, and resource-constrained environments, which ensures the practical viability and robustness of the resulting solutions under real operating conditions.
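Task 3 relies on hard parameter sharing to save resources. A rough way to see the saving is to count the weights of one shared trunk with several task heads and compare them with fully separate networks. The sketch below assumes plain fully connected layers and made-up layer sizes (40 input features, a binary head, and a 10-class head); the dissertation's actual architectures are not given in this text.

```python
def dense_params(sizes):
    """Weights plus biases of a fully connected stack with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def hard_sharing_params(trunk_sizes, head_sizes_per_task):
    """One shared trunk, plus one small head per task attached to its output."""
    trunk = dense_params(trunk_sizes)
    heads = sum(
        dense_params([trunk_sizes[-1]] + list(head)) for head in head_sizes_per_task
    )
    return trunk + heads


def separate_models_params(trunk_sizes, head_sizes_per_task):
    """Each task gets its own full copy of the trunk and its own head."""
    return sum(
        dense_params(list(trunk_sizes) + list(head)) for head in head_sizes_per_task
    )


# Hypothetical sizes: 40 input features, two hidden layers, two task heads.
trunk_sizes = [40, 64, 32]
head_sizes_per_task = [[2], [10]]  # binary detection head, 10-class attack head

shared_total = hard_sharing_params(trunk_sizes, head_sizes_per_task)
separate_total = separate_models_params(trunk_sizes, head_sizes_per_task)
```

With these sizes the shared design needs 5,100 parameters against 9,804 for two separate networks, because the trunk is stored and evaluated once.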

Research methods.

Systematic literature review, system analysis and design, theoretical development and implementation of data processing methods and models, simulation, experimental evaluation, and empirical validation.

Main provisions submitted for defense:

1. An intrusion detection model based on the integration of multi-task learning and autoencoder-based anomaly detection: an integrative intrusion detection model distinguished by its combination of representation learning, anomaly detection, and supervised attack classification within multi-task deep learning, as well as by the data preprocessing methods it uses for Internet of Things network traffic and the hybrid resampling methods it uses to improve detection effectiveness.

2. An intrusion detection model based on multi-task learning with soft parameter sharing and data resampling: a flexible intrusion detection model distinguished by its use of multi-task learning with soft parameter sharing for attack classification and of data resampling methods designed to increase the reliability and adaptability of the model in Internet of Things networks.

3. An intrusion detection model based on attention mechanisms and long short-term memory (LSTM) for time series analysis: a hybrid time series-based intrusion detection model distinguished by its integration of attention mechanisms with LSTM-based multi-task learning and by its use of time series analysis, which improves the modeling of sequential dependencies and leads to more accurate anomaly detection in Internet of Things networks.
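The second provision rests on soft parameter sharing, where each task keeps its own parameters but is encouraged or allowed to borrow from the others. Two common ingredients are sketched below under stated assumptions: an L2 penalty that pulls two tasks' weights together, and a softmax gate that mixes per-task features. The function names, the penalty form, and the gating form are illustrative guesses at the general idea, not the dissertation's formulation.

```python
import math


def l2_sharing_penalty(weights_a, weights_b, strength):
    """Penalty = strength * squared L2 distance between two tasks' weights."""
    return strength * sum((a - b) ** 2 for a, b in zip(weights_a, weights_b))


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mix(task_features, gate_logits):
    """Convex combination of per-task feature vectors using softmax gate weights."""
    gates = softmax(gate_logits)
    dim = len(task_features[0])
    return [
        sum(g * features[i] for g, features in zip(gates, task_features))
        for i in range(dim)
    ]


# Hypothetical values: two tasks with two weights each.
penalty = l2_sharing_penalty([1.0, 2.0], [0.5, 2.5], strength=0.1)

# Equal gate logits give each task's features the same influence.
mixed = gated_mix([[1.0, 0.0], [0.0, 1.0]], gate_logits=[0.0, 0.0])
```

A larger `strength` pushes the tasks toward identical weights (closer to hard sharing), while learned gate logits let each task decide how much of the other task's features to use.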

The scientific novelty of the dissertation lies in a set of innovative and practically significant models, algorithms, and data processing strategies based on multi-task learning (MTL). These developments go beyond the isolated single-task approaches common in most current research and combine anomaly detection, traffic classification, and dynamic traffic load analysis within a synchronized time frame. The proposed methods include data preprocessing pipelines designed around the specific properties and problems of IoT network traffic, removing bottlenecks such as severe class imbalance, limited labeling, and fluctuating traffic rates. Together, these approaches contribute to both the theoretical understanding and the practical capabilities of intrusion detection systems in IoT environments.

• Unlike traditional methods that isolate each security task, this dissertation presents a synergistic and robust model architecture that combines autoencoder-based anomaly detection with a hard parameter sharing MTL framework designed for comprehensive network traffic classification. The integrative model uses the representational capacity of autoencoders to detect deviations from normal traffic, while the concurrent classification tasks benefit from shared parameters, which improves training efficiency and generalization. A distinctive feature is the inclusion of hybrid resampling strategies that address the common problem of class imbalance by judiciously augmenting minority classes without causing overfitting. This prevents poor detection of rare attack classes. In addition, the architecture incorporates an uncertainty-aware loss optimization mechanism that calibrates model updates based on confidence measures, which considerably improves the ability to distinguish benign from malicious traffic. This leads to a marked improvement in attack detection rates and in the reliability of model predictions under real IoT conditions.

• Addressing a well-documented gap in the field, this research systematically studies and evaluates parameter sharing mechanisms in security-oriented MTL systems, with an emphasis on the additional flexibility offered by soft parameter sharing. In particular, a regularizer-based technique makes it possible to adjust penalties dynamically, and a new Softmax Gate approach provides adaptive control over the degree of information exchange between tasks, which allows each task to reach optimal performance without sacrificing the capacity for joint learning. With this finer-grained parameter sharing, the proposed mechanisms increase the adaptability and robustness of intrusion detection models, especially in diverse and evolving network contexts. The work also provides a comprehensive empirical analysis of model accuracy and of the effective detection of infrequent and rare attacks, and offers recommendations for future research and practical deployment in environments with changing attack profiles.

• Traditional random undersampling methods often reduce the majority classes indiscriminately and can inadvertently discard informative samples. Given the shortcomings identified in common undersampling methods, this dissertation applies, for the first time, an adaptive undersampling approach that bases selection on the classification difficulty of each sample. The method selectively prunes majority-class instances that are easy to classify and keeps those that are harder and therefore more informative for training the classifier. This yields a more balanced and representative data distribution and significantly improves the discriminative ability of the underlying classifier. The adaptive design directly addresses the class imbalance typical of real IoT traffic, improving both minority-class recognition and overall classification stability.

• This dissertation breaks new ground by combining recent advances in deep learning, in particular the self-attention mechanism, with recurrent neural network approaches adapted to time series analysis. By embedding a knowledge-aware attention network (KAN) module into a long short-term memory (LSTM) network, the system captures and uses the temporal dependencies and dynamic patterns inherent in IoT network traffic. This combination improves the detection of time-coordinated and complex intrusion attempts and extends functionality to real-time traffic load monitoring and predictive analytics. The result is a multifaceted network security model that captures fine temporal detail and supports both proactive threat response and improved resource management in IoT environments.
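The last item combines attention with recurrent hidden states. As a minimal sketch of the general idea only, the code below applies plain scaled dot-product attention to a short sequence of hidden-state vectors such as an LSTM might emit. It is not the knowledge-aware attention network (KAN) itself, whose details are not given in this text; the hidden states and the query vector are hand-written stand-ins.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of numbers."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Scaled dot-product attention over a sequence of hidden-state vectors.

    Returns the attention weight of each time step and the weighted
    average of the hidden states (the context vector).
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * h for q, h in zip(query, state)) / scale for state in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Three time steps of 2-dimensional hidden states (hypothetical values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
```

The third time step aligns best with the query, so it receives the largest weight; the context vector is what a downstream detection or load-forecasting head would consume.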

Overall, the main novelty of this dissertation lies in the seamless integration of advanced multi-task learning strategies, adaptive and context-dependent data processing, and modern deep learning architectures. Together, these innovations rethink the design, adaptability, and operational efficiency of intrusion detection systems tailored to the specific loads and requirements of modern IoT networks.
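The uncertainty-aware loss optimization named in the first novelty item is not spelled out in this text. One widely used way to weight task losses by uncertainty gives each task a learnable log-variance s_i and minimizes the sum of exp(-s_i) * L_i + s_i. The sketch below implements that simplified form as an assumption; it may differ from the weighting actually used in the dissertation, and the loss values are made up.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine task losses as sum(exp(-s_i) * L_i + s_i), with s_i = log(sigma_i^2).

    A large s_i (high uncertainty) shrinks that task's contribution, while the
    additive s_i term keeps the model from inflating every uncertainty.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is required per task loss")
    return sum(
        math.exp(-s) * loss + s for loss, s in zip(task_losses, log_variances)
    )


# Hypothetical losses for two tasks, e.g. reconstruction and classification.
task_losses = [0.8, 0.2]

# Zero log-variances reduce to an unweighted sum of the task losses.
equal_total = uncertainty_weighted_loss(task_losses, [0.0, 0.0])

# Doubling the second task's variance halves its weight and adds log(2).
reweighted_total = uncertainty_weighted_loss(task_losses, [0.0, math.log(2.0)])
```

In training, the log-variances would be optimized together with the network weights, so the balance between tasks is learned rather than hand-tuned.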

The scientific and technical problem of the dissertation is to conceptualize, develop, and implement advanced multi-task security models based on deep learning and adapted to the specific requirements of intrusion detection in Internet of Things (IoT) environments. Unlike traditional security systems, which often focus on a single detection task, this research applies the principles of multi-task learning (MTL) in deep neural network architectures to solve several interrelated security tasks simultaneously in complex and heterogeneous IoT networks.

From a scientific point of view, by combining anomaly detection, traffic classification, and temporal pattern recognition within a single multi-task learning (MTL) system, the author seeks comprehensive and adaptive threat detection that makes maximum use of representations shared across related tasks. This improves the efficiency and effectiveness of security mechanisms and also provides better generalization and resilience to the evolving and new attacks that typically arise in resource-constrained IoT deployments. The technical tasks of the research include developing data preprocessing pipelines suited to noisy, imbalanced, and high-rate IoT network data, introducing innovative parameter sharing methods for resource-efficient learning, and systematically evaluating model performance in realistic distributed IoT scenarios. Through careful experiments and validation, the work seeks to establish new benchmarks in multi-task deep learning for security, supporting both theoretical development and practical application in IoT intrusion detection.
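For the imbalanced data mentioned above, the text describes adaptive undersampling that drops easily classified majority-class samples and keeps the hard ones. A minimal sketch of that selection rule follows. It assumes that a preliminary classifier has already produced, for each benign sample, the probability it assigns to the correct class; the function name, the sample identifiers, and the probabilities are hypothetical.

```python
def adaptive_undersample(majority_samples, true_class_probs, keep):
    """Keep the `keep` majority-class samples that are hardest to classify.

    Difficulty is taken as 1 - p(true class): a sample the reference
    classifier is already confident about is treated as easy and dropped first.
    """
    if len(majority_samples) != len(true_class_probs):
        raise ValueError("one probability is required per sample")
    # Indices ordered from hardest (lowest confidence) to easiest.
    ranked = sorted(range(len(majority_samples)), key=lambda i: true_class_probs[i])
    # Restore the original ordering among the retained samples.
    kept = sorted(ranked[:keep])
    return [majority_samples[i] for i in kept]


# Five benign flows with the confidence of a preliminary classifier.
benign_flows = ["flow_a", "flow_b", "flow_c", "flow_d", "flow_e"]
confidence = [0.99, 0.55, 0.97, 0.60, 0.90]

retained = adaptive_undersample(benign_flows, confidence, keep=2)
```

Here `flow_b` and `flow_d` are retained because the classifier is least sure about them, whereas random undersampling could just as easily have thrown them away.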

The object of research is IoT networks (systems), anomalies and intrusions (attacks) in IoT networks, and mechanisms (systems) for detecting intrusions and anomalies.

The subject of research is multi-task learning-based models and methods for intrusion detection, including autoencoder-based anomaly detection, soft parameter sharing with data resampling to improve classification, and KAN and LSTM networks for real-time time series analysis of IoT traffic.
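Autoencoder-based anomaly detection usually comes down to one decision rule: a sample is flagged when the autoencoder reconstructs it poorly. The sketch below shows only that rule, under stated assumptions. The reconstructions are hand-written stand-ins for the output of a trained autoencoder, and the percentile-based threshold is one common choice rather than the dissertation's documented method.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between a feature vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(benign_errors, percent=95):
    """Pick a cut-off from benign reconstruction errors at the given percentile."""
    ordered = sorted(benign_errors)
    index = min(len(ordered) - 1, (percent * len(ordered)) // 100)
    return ordered[index]


def is_anomalous(error, threshold):
    """Flag a sample whose reconstruction error exceeds the benign cut-off."""
    return error > threshold


# Errors measured on benign training traffic (hypothetical values).
benign_errors = [0.02, 0.01, 0.03, 0.05, 0.04, 0.02, 0.03, 0.01, 0.02, 0.10]
threshold = fit_threshold(benign_errors, percent=80)

# A well-reconstructed flow and a poorly reconstructed one.
normal_error = reconstruction_error([1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 1.0, 0.0])
attack_error = reconstruction_error([1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 0.2, 0.0])
```

Because the threshold is fitted on benign traffic only, the rule needs no attack labels, which is what lets this kind of detector react to attack types it has never seen.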

The theoretical significance of the research lies in the innovative application of advanced artificial intelligence methods aimed at improving the security of Internet of Things (IoT) networks. First, the work presents a novel data preprocessing methodology developed for the complex task of network traffic analysis. This approach takes into account the multifaceted nature of IoT data, which are often characterized by large volumes, high velocity, and varying degrees of reliability and relevance. By preprocessing network traffic data with methods that account for the specifics of the IoT environment, the research ensures that subsequent analysis rests on a solid foundation of clean, representative, and informative data. Second, the research develops improved multi-task security models that can perform a wide range of functions beyond simple anomaly and attack detection. These models are designed for various analytical tasks related to network security and performance, which improves the accuracy of security assessment. By combining the strengths of multi-task learning and sophisticated AI algorithms, mainly deep neural networks, the research offers a holistic and adaptable solution to the complex and evolving problems posed by new network intrusions in IoT. Integrating these methodologies enriches the theoretical landscape of IoT security research and creates new paradigms for understanding and eliminating network vulnerabilities with AI-based approaches.

Список литературы диссертационного исследования кандидат наук Дун Хуэйяо, 2025 год

Литература

1. Altan G. SecureDeepNet-IoT: A deep learning application for invasion detection in industrial internet of things sensing systems. Transactions on Emerging Telecommunications Technologies. 2021. vol. 32. no. 4. DOI: 10.1002/ett.4228.

2. Tien C.-W., Chen S.-W., Ban T., Kuo S.-Y. Machine learning framework to analyze iot malware using elf and opcode features. Digital Threats: Research and Practice. 2020. vol. 1. no. 1. pp. 1-19. DOI: 10.1145/3378448.

3. Rizvi S., Aslam W., Shahzad M., Saleem S., Fraz M. Proud-mal: static analysis-based progressive framework for deep unsupervised malware classification of windows portable executable. Complex & Intelligent Systems. 2022. pp. 1-13.

4. Jung B., Kim T., Im E. Malware classification using byte sequence information. Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems. 2018. pp. 143-148.

5. Alrawi O., Lever C., Valakuzhy K., Snow K., Monrose F., Antonakakis M., et al. The circle of life: A large-scale study of the IoT malware lifecycle. 30th USENIX Security Symposium (USENIX Security 21). 2021. pp. 3505-3522.

6. Smmarwar S., Gupta G., Kumar S. Android malware detection and identification frameworks by leveraging the machine and deep learning techniques: A comprehensive review. Telematics and Informatics Reports. 2024. vol. 14. DOI: 10.1016/j.teler.2024.100130.

7. Branitskiy A., Kotenko I. Network attack detection based on combination of neural, immune and neuro-fuzzy classifiers. IEEE 18th International Conference on Computational Science and Engineering. 2015. pp. 152-159.

8. Desnitsky V., Kotenko I., Nogin S. Detection of anomalies in data for monitoring of security components in the internet of things. XVIII International Conference on Soft Computing and Measurements. 2015. pp. 189-192.

9. Wang C., Zhao Z., Wang F., Li Q. A novel malware detection and family classification scheme for IoT based on deam and densenet. Security and Communication Networks. 2021. vol. 2021. no. 1. pp. 1-16. DOI: 10.1155/2021/6658842.

10. Yousefi-Azar M., Varadharajan V., Hamey L., Tupakula U. Autoencoder-based feature learning for cyber security applications. In 2017 International Joint Conference on Neural Networks (IJCNN). 2017. pp. 3854-3861.

11. Bakir H., Bakir R. Droidencoder: Malware detection using auto-encoder based feature extractor and machine learning algorithms. Computers and Electrical Engineering. 2023. vol. 110. DOI: 10.1016/j.compeleceng.2023.108804.

12. Venkatraman S., Alazab M., Vinayakumar R. A hybrid deep learning image-based analysis for effective malware detection. Journal of Information Security and Applications. 2019. vol. 47. pp. 377-389.

13. Yakura H., Shinozaki S., Nishimura R., Oyama Y., Sakuma J. Malware analysis of imaged binary samples by convolutional neural network with attention mechanism. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017. pp. 55-56.

14. Li X., Wang L., Xin Y., Yang Y., Chen Y. Automated vulnerability detection in source code using minimum intermediate representation learning. Applied Sciences. 2020. vol. 10. no. 5. DOI: 10.3390/app10051692.

15. Wu P., Guo H., Buckland R. A transfer learning approach for network intrusion detection. IEEE 4th International Conference on Big Data Analytics. 2019. pp. 281-285.

16. Qiang Q., Cheng M., Zhou Y., Ding Y., Qi Z. Malup: A malware classification framework using convolutional neural network with deep unsupervised pre-training. In 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). 2021. pp. 627-634.

17. Hu J., Liu C., Cui Y. An improved cnn approach for network intrusion detection system. International Journal of Network Security. 2021. vol. 23. no. 4. pp. 569-575.

18. Xu Z., Fang X., Yang G. Malbert: A novel pre-training method for malware detection. Computers & Security. 2021. vol. 111(2). DOI: 10.1016/j.cose.2021.102458.

19. Habibi O., Chemmakha M., Lazaar M. Performance evaluation of cnn and pre-trained models for malware classification. Arabian Journal for Science and Engineering. 2023. vol. 48. no. 8. pp. 10355-10369.

20. Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014. vol. 27. pp. 2672-2680.

21. Dong H., Kotenko I. Hybrid multi-task deep learning for improved iot network intrusion detection: Exploring different cnn structures. 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS). 2024. pp. 7-12.

22. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A., Kaiser L., Polosukhin I. Attention is all you need. Advances in neural information processing systems. 2017. pp. 5998-6008.

23. Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., Uszkoreit J., Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. 2020. arXiv preprint arXiv:2010.11929.

24. Huh M., Agrawal P., Efros A. What makes imagenet good for transfer learning? 2016. arXiv preprint arXiv:1608.08614.

25. Kingma D., Ba J. Adam: A method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980.

26. Loshchilov I., Hutter F. Decoupled weight decay regularization. 2017. arXiv preprint arXiv:1711.05101.

27. Ronen R., Radu M., Feuerstein C., Yom-Tov E., Ahmadi M. Microsoft malware classification challenge. 2018. arXiv preprint arXiv:1802.10135.

28. Nataraj L., Karthikeyan S., Jacob G., Manjunath B. Malware images: visualization and automatic classification. Proceedings of the 8th International Symposium on Visualization for Cyber Security. 2011. pp. 1-17.

29. Szegedy C., Ioffe S., Vanhoucke V., Alemi A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence. 2017. vol. 31. no. 1. DOI: 10.1609/aaai.v31i1.11231.

30. Howard A., Sandler M., Chen B., Wang W., Chen L.-C., Tan M., Chu G., Vasudevan V., Zhu Y., Pang R., Adam H., Le Q. Searching for mobilenetv3. IEEE/CVF International Conference on Computer Vision (ICCV). 2019. pp. 1314-1324.

31. Chollet F. Xception: Deep learning with depthwise separable convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. pp. 1800-1807. DOI: 10.1109/CVPR.2017.195.

32. Liu Z., Lin Y., Cao Y., Hu H., Wei Y., Zhang Z., Lin S., Guo B. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021. pp. 10012-10022.

33. Rao Y., Zhao W., Liu B., Lu J., Zhou J., Hsieh C.-J. Dynamicvit: Efficient vision transformers with dynamic token sparsification. Advances in Neural Information Processing Systems. 2021. vol. 34. pp. 13937-13949.

34. Kim D., Majlesi-Kupaei A., Roy J., Anand K., ElWazeer K., Buettner D., Barua R. DynODet: Detecting Dynamic Obfuscation in Malware. Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA): 14th International Conference, DIMVA. 2017. pp. 97-118.

35. Liao M., Lu Y., Li X., Di S., Liang W., Chang V. An unsupervised image dehazing method using patch-line and fuzzy clustering-line priors. IEEE Transactions on Fuzzy Systems. 2024. vol. 4. pp. 1-15. DOI: 10.1109/TFUZZ.2024.3371944.

36. Wang L., Fayolle P., Belyaev A. Reverse image filtering with clean and noisy filters. Signal, Image and Video Processing. 2023. vol. 17. no. 2. pp. 333-341.

Dong Huiyao: PhD student, Faculty of Information Security, ITMO University; programmer, Laboratory of Computer Security Problems, St. Petersburg Federal Research Center of the Russian Academy of Sciences. Research interests: applied data science (in particular, multi-task deep learning, autoencoding, image analysis, deep reinforcement learning), network security, IoT. Number of scientific publications: 8. hydong@itmo.ru; 49 A Kronverksky Prospekt, 197101, Saint Petersburg, Russia; office phone: +7(812)508-3311.

Hybrid Multi-Task Deep Learning for Improved IoT Network Intrusion Detection: Exploring Different CNN Structures

Huiyao Dong Faculty of Information Security ITMO University Saint Petersburg, Russia hydong@itmo.ru

Igor Kotenko Laboratory of Computer Security Problems St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS) Saint Petersburg, Russia ivkote@comsec.spb.ru

Abstract-The rapid expansion of the Internet of Things (IoT) has led to the need for robust security mechanisms to protect IoT networks and devices against various attacks. In this paper, we propose a novel hybrid intrusion detection solution that harnesses the power of multi-task learning (MTL) to enhance intrusion detection performance. We introduce an MTL structure-based model with an optimal loss function and task-specific weight optimization, which effectively detects multi-class intrusion threats. Moreover, to address the challenge of imbalanced data in network traffic, we employ a generative adversarial network (GAN)-based oversampling technique for data pre-processing, generating synthetic samples for minority classes. Additionally, we conduct a comprehensive study of different convolutional neural network (CNN)-based deep learning architectures to identify the optimal shared layers for our MTL model, further enhancing the effectiveness of the intrusion detection system. Experimental results on the CICIDS2017 dataset demonstrate that, although many rare attacks lack a sufficient number of samples, the MTL-based methodology can deliver superior classification performance.

Keywords—intrusion detection, deep learning, multi-task learning, convolutional neural networks, IoT network

I. Introduction

The Internet of Things (IoT) has experienced significant growth in recent years, enabling the interconnection of physical devices and networks and providing interoperable information and communication services. However, the expanding IoT networks and devices also pose security challenges, as the integration of network and physical devices increases their susceptibility to threats and attacks. Smart devices, responsible for storing confidential information and performing critical functions, have limited processing and storage capacity, making them vulnerable to hacking or compromise. Additionally, sectors like smart cities and healthcare, relying on IoT-based automation, must consider the potential vulnerability of sensitive data and supervisory controls, as they can be attractive targets for attackers [1].

Among all established mechanisms for network intrusion detection systems (IDSs), artificial intelligence (AI) has demonstrated its ability to provide robust and reliable security solutions for both physical devices and network communication through deep learning (DL) techniques. Multilayer neural network-based anomaly detection can yield optimal performance on large-scale data [2], and various DL models can outperform traditional models [3]. Even though DL-based security applications are increasing, their performance needs improvement. Therefore, reducing false alarms [4, 5], detecting zero-day attacks [6, 7], and deploying decentralized intrusion detection systems [8, 9] have become popular research directions. Meanwhile, it is essential to acknowledge the limitations of single-task models when handling high-complexity data inputs, since they lose effectiveness as data dimensionality increases. Multi-task learning (MTL) allows the model to share informative parameters among features, thereby enhancing the multi-classification performance. Additionally, it mitigates the risk of overfitting and can balance and enhance the overall performance across all tasks by utilizing task-specific loss weight optimization [10]. Hence, transitioning to MTL could result in significant breakthroughs for IDS.

This paper proposes a novel hybrid intrusion detection solution harnessing the capabilities of MTL to achieve balanced and enhanced intrusion detection performance. The primary objective is to develop a lightweight and efficient model that delivers optimal detection rates while maintaining an acceptable false alarm rate, even for rare attacks. The proposed methodology makes the following key contributions:

• We introduce an MTL model with hard parameter sharing, utilizing the optimal loss and task-specific weight optimization. This model serves as an effective multi-class intrusion detection algorithm, improving the overall detection, especially on rare attacks.

• To address the challenge of imbalanced data in network traffic, we implement a generative adversarial network (GAN)-based oversampling technique in the data preprocessing stage. This technique generates synthetic samples for minority classes.

• To determine the most suitable architecture for the shared layers of the MTL model, we conduct a comprehensive study of one-dimensional variants of prominent DL structures, namely Very Deep Convolutional Networks (VGG) [11], Xception [12], and ResNet [13]. We evaluate their performance based on their detection ability and training time efficiency. This investigation allows us to identify the optimal architecture for the shared layers.

The reported study was partially funded by the budget project of FFZF-2022-0007.

The remainder of this paper is organized as follows. Section II provides a summary of related work in the IDSs and MTL-based IDS research. In Section III, we elaborate on the proposed framework, describing the different components and their interactions. Section IV introduces the experimental datasets, presents the experimental results, and evaluates the performance of the proposed methodology. Finally, in Section V, we conclude the paper and discuss potential future research.

II. Related Work

Researchers continuously develop innovative solutions to address the technical challenges of IDSs. One key challenge is balancing a high detection rate against an acceptable false alarm rate. Approaches such as a hybrid of GAN and LSTM [4] and a hybrid framework with autoencoders (AE) and LSTM [5] have been used for this purpose. To address the challenge of zero-day attacks, which target unknown vulnerabilities, hybrid IDS models utilizing ensembles of DL models have gained popularity [6]. For example, an ensemble learning model combining Gated Recurrent Units (GRU), CNN, and Random Forest enhances autonomy and prediction ability for zero-day attacks [6]. Additionally, a combination of neural networks, immune systems, neuro-fuzzy classifiers, and support vector machines has been considered [14]. Another ensemble model based on AE focuses on minimizing false alarms in zero-day attack detection [7]. Furthermore, the distributed nature of IoT networks and their numerous devices has led to distributed IDS schemes, for example, regularized sparse Deep Belief Networks for distributed Industrial IoT IDS [8].

Recently, more attention has been paid to MTL-based IDSs because of benefits such as efficient resource utilization, improved overall detection performance, and better generalizability of the model. The effectiveness of hybrid model structures utilizing convolutional layers was demonstrated in our prior works [15, 16], in which both a deep CNN and a convolutional AE performed feature extraction for traffic classification effectively. In another complicated yet effective hybrid model, MEMBER [17], five main components were implemented: a data preprocessing layer, an AE for data projection into a latent feature space, a multi-scale CNN-based module for multi-scale spatial feature learning, and a distance-based prototype network for data combination and traffic classification. In another network data-oriented study [18], a soft parameter sharing MTL framework was proposed, in which separate models, including an AE-based contrastive learning model, a supervised clustering model, and a Multilayer Perceptron (MLP)-based classifier, were implemented. Because MTL can build one major model that achieves multiple tasks on the same dataset, it has been adapted to dynamic environments such as in-vehicle networks [19] and distributed systems with massive data [20].

Since MTL has clearly advanced intrusion detection research, our method combines the strengths of MTL with GAN-based oversampling and performance optimisation, with the aim of further improving IDS performance on imbalanced traffic data.

III. Methodology

Fig. 1 shows the overall workflow of the proposed method. To reduce the dimensionality of the high-dimensional network data, the principal component analysis (PCA) algorithm is used to select the necessary features. Subsequently, random undersampling and GAN-based oversampling are implemented to address the issue of imbalanced data. Lastly, an MTL model with optimized weight functions, encompassing two key components, is applied. The shared layers perform feature extraction and learning for all tasks using a CNN, while the task-specific subnets function as dense classifiers, performing binary classification for different types of network traffic.

Fig. 1 The workflow of the proposed framework: feature selection (PCA), resampling (undersampling of benign traffic, oversampling of attacks), CNN-based feature learning, and dense classifiers.

A. Feature Selection with PCA

PCA is a feature selection technique that extracts the most informative components from high-dimensional data by transforming the original features into low-dimensional principal components, each of which represents a combination of the original features. A critical aspect of PCA is the explained variance ratio, which evaluates the proportion of the overall variance explained by each principal component. Assuming there are k components and the variance of each component is represented by the eigenvalues $\lambda_1, \ldots, \lambda_k$, the explained variance ratio can be calculated as (1):

$$\text{variancevector} = \left[ \frac{\lambda_1}{\lambda_1 + \lambda_2 + \cdots + \lambda_k}, \; \ldots, \; \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_k} \right] \tag{1}$$

In our experiment, a threshold of 0.999 is set for the cumulative sum of the variance components. This allows for the elimination of features with low contributions while maximizing the explanation ability of retained features, resulting in a substantial reduction in input data dimensionality.
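The selection rule in (1) and the 0.999 threshold can be illustrated with a minimal sketch. It assumes the eigenvalues are already available; the function names and the example eigenvalues are hypothetical and are not taken from the experiments.

```python
from itertools import accumulate


def explained_variance_ratio(eigenvalues):
    """Equation (1): each eigenvalue divided by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def components_to_keep(eigenvalues, threshold=0.999):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    ratios = explained_variance_ratio(sorted(eigenvalues, reverse=True))
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return count
    return len(ratios)  # fallback if rounding keeps the sum just below the threshold


# Hypothetical eigenvalues of a covariance matrix (illustrative values only).
example_eigenvalues = [5.0, 3.0, 1.5, 0.4995, 0.0005]
kept = components_to_keep(example_eigenvalues)
print(kept)
```

In this toy case the last component contributes too little variance to be needed, so it is the only one dropped.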

B. GAN Oversampling

Table I illustrates the model structure of the proposed GAN model. The generator is constructed as a fully connected AE. Assuming the input features X have N dimensions, the encoding-decoding process follows the data dimension changing pattern of [N,int(N/2),int(N/4),int(N/2),N] . The generator generates X' with the same dimensionality as the original data, allowing it to learn the distribution of features and generate artificial data samples based on the given input and corresponding target labels. The discriminator consists of multiple stacked fully connected layers, activation layers, and dropout layers, followed by an output layer for binary classification using sigmoid activation.

TABLE I. GAN MODEL STRUCTURE

| Model | Structure |
|---|---|
| Generator | Encoder = [Dense(N/2), BatchNormalization(), ReLU(), Dense(N/4), BatchNormalization(), ReLU()]; Decoder = [Dense(N/2), BatchNormalization(), ReLU(), Dense(N), BatchNormalization(), ReLU()] |
| Discriminator | [Dense(128), ReLU(), Dropout(0.1), Dense(128), ReLU(), Dropout(0.1), Dense(1, activation='sigmoid')] |

Fig. 2 depicts how the GAN oversampling algorithm generates new data. The GAN is trained to learn the feature distribution of the input and generate samples belonging to specific classes using class labels. The generator aims to minimize the reconstruction loss by comparing the generated X' with the original X. The discriminator functions as a multi-label or binary classifier, depending on the classes. During batch training, the GAN generates synthetic data that accurately approximates the distribution of individual variables. The generated and original data are then combined and shuffled.

Algorithm 1 GAN Oversampling
define Encoder E
define Decoder D
define Generator G = D(E(input))
define Discriminator D
for iteration = 1, 2, ..., k do
    Train G on the original data: G.fit(X, y)
    Generate a batch of samples: newdata = G.predict(emptyvector)
    Evaluate the discriminator D on the generated samples newdata
    Store generated data into the new set: X'.append(newdata)
end for
Loss optimization
Combine original and new set: X_final = X' + X

Fig. 2 Pseudocode of the GAN Oversampling.
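The bookkeeping around the generator in Algorithm 1 (generate samples per minority class, append them to the original data, shuffle) can be sketched as follows. This is only an illustration: `jitter_sampler` is a hypothetical stand-in that marks where a trained generator would be called and is not a GAN, and `oversample_minority` and `target_count` are assumed names.

```python
import random


def jitter_sampler(class_rows, rng):
    """Stand-in for a trained generator: copies a real row and adds small noise.

    This is NOT a GAN; it only marks where the generator would be queried.
    """
    row = rng.choice(class_rows)
    return [value + rng.gauss(0.0, 0.01) for value in row]


def oversample_minority(features, labels, target_count, sampler=jitter_sampler, seed=0):
    """Add synthetic rows until every class has at least target_count rows."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(features, labels):
        rows_by_class.setdefault(label, []).append(row)

    combined = list(zip(features, labels))
    for label, class_rows in rows_by_class.items():
        missing = max(0, target_count - len(class_rows))
        for _ in range(missing):
            combined.append((sampler(class_rows, rng), label))

    rng.shuffle(combined)  # original and generated rows are mixed
    new_features = [row for row, _ in combined]
    new_labels = [label for _, label in combined]
    return new_features, new_labels


X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.15, 0.25]]
y = ["benign", "benign", "bot", "benign"]
X_final, y_final = oversample_minority(X, y, target_count=3)
print(y_final.count("bot"), len(X_final))
```

Replacing the `sampler` argument with a call into a trained generator would leave the rest of the routine unchanged.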

C. Multi-task Learning

MTL is an inductive technique initially introduced to enhance the generalization performance across various tasks. It accomplishes this by enabling the model's core to generalize across tasks using the same set of parameters, making use of domain-specific information during the training process. In our proposed model, the shared layers of the MTL framework are responsible for feature extraction and learning, utilizing convolutional layers. While well-known CNN-based structures (such as ResNet, Xception, and VGG) have been successful in image classification tasks, they have not been extensively applied to one-dimensional (1-D) sequential data. To investigate and identify the most effective approach, we have constructed 1-D versions of these three popular convolutional models and employed them as the common branch of our MTL model.

Xception is an advanced version of the separable convolution in deep learning models. It comprises two separate branches that perform convolutional functions with different settings [12]. The depthwise convolution branch resembles a typical deep convolutional network, consisting of multiple convolutional layers, normalization layers, and max pooling layers. The pointwise convolution branch, on the other hand, utilizes convolutional layers with a filter size of 1. This branch reduces the dimensionality of the input data and introduces non-linearity to the convolutional operation by applying learned weights. The outputs of the two branches are later merged. Each branch independently learns distinct and dynamic patterns and complex relationships. The Inception module simplifies and improves this process by examining cross-channel and spatial correlations in separate operations [12].

We configured the filter size as 3 and the pooling size as 2, while the output units of each block are automatically calculated. Initially, the first Xception block applies a depthwise convolution with 32 filters of size 3 and stride 2, followed by batch normalization and ReLU activation. It is then followed by a pointwise convolution with the same filters and batch normalization. As additional Xception blocks are added, the filter volume of each block is doubled. For example, in a deep model with 4 Xception blocks, the output units of the convolutional layers in each block are 32, 64, 128, and 256. After the Xception blocks, a global average pooling is performed on the output, followed by flattening. Two fully connected layers with 128 units and ReLU activation are then added. Finally, the shared module concludes, and task-specific subnets are constructed based on the specified tasks.
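The depthwise step followed by the pointwise step described above can be sketched in plain Python. This is a simplified illustration that assumes no padding and omits batch normalization and ReLU; the function names and kernel values are hypothetical.

```python
def depthwise_conv1d(signal, kernels, stride=1):
    """Apply one kernel per input channel, without mixing channels.

    signal:  list of channels, each a list of floats (channels x length)
    kernels: one kernel (list of floats) per channel
    """
    outputs = []
    for channel, kernel in zip(signal, kernels):
        width = len(kernel)
        positions = range(0, len(channel) - width + 1, stride)
        outputs.append([
            sum(channel[start + k] * kernel[k] for k in range(width))
            for start in positions
        ])
    return outputs


def pointwise_conv1d(signal, weights):
    """Kernel-size-1 convolution: each output channel is a weighted sum of input channels.

    weights: one row per output channel, one column per input channel
    """
    length = len(signal[0])
    return [
        [sum(w * signal[c][i] for c, w in enumerate(row)) for i in range(length)]
        for row in weights
    ]


def separable_conv1d(signal, depthwise_kernels, pointwise_weights, stride=1):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv1d(depthwise_conv1d(signal, depthwise_kernels, stride), pointwise_weights)


# Two input channels of length 7, kernel size 3, stride 2, three output channels.
x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]
dw = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]
pw = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = separable_conv1d(x, dw, pw, stride=2)
print(out)
```

The depthwise step only looks along the sequence within each channel, and the pointwise step only mixes channels at each position, which is why the pair needs fewer weights than a full convolution with the same output width.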

ResNet. Residual blocks are a key component in mitigating the vanishing gradient problem and enhancing the effectiveness of training. These blocks consist of two branches: one branch performs feature extraction and processing, while the other serves as a skip connection that preserves the original input data of the block. This design allows for learning the residual mapping, which represents the difference between the input and output of a block [13]. Residual blocks are particularly useful in constructing deep convolutional networks. In this paper, a lightweight residual block is proposed, comprising a sequence of two 64-unit convolutional layers, normalization, and ReLU activation. To ensure the successful merging of the processed output and block input, the padding parameter on the convolutional layer is set to "same", ensuring that the output has the same size as the input. By combining the processed features with the original input, the resulting tensor is returned as the output of the residual block. This architecture facilitates the direct flow of gradients through the skip connection, enabling efficient learning of the residual mapping.
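A single-channel sketch of such a block is given below. It assumes odd kernel sizes, zero padding for the "same" setting, and the ordering convolution, ReLU, convolution, addition, ReLU; normalization is omitted, and the function names are hypothetical.

```python
def conv1d_same(values, kernel):
    """1-D convolution with zero padding so the output length equals the input length."""
    pad = len(kernel) // 2  # assumes an odd kernel size
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(padded[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(len(values))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(values, kernel_a, kernel_b):
    """Two 'same'-padded convolutions on one branch, identity on the other, then add."""
    processed = conv1d_same(relu(conv1d_same(values, kernel_a)), kernel_b)
    return relu([p + v for p, v in zip(processed, values)])


x = [1.0, 2.0, 3.0, 4.0]
identity_kernel = [0.0, 1.0, 0.0]
zero_kernel = [0.0, 0.0, 0.0]

# If the processing branch outputs zeros, the block passes its input through unchanged.
passthrough = residual_block(x, identity_kernel, zero_kernel)
# If the processing branch reproduces the input, the block returns input + input.
doubled = residual_block(x, identity_kernel, identity_kernel)
print(passthrough, doubled)
```

The pass-through case shows why the processing branch only has to learn the residual: a branch that contributes nothing still leaves the input intact.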

The VGG model is a deep convolutional neural network architecture in which the number of parameters increases significantly in each block. It was originally designed for large-scale image classification tasks [11], but its ability to process high-dimensional data also makes it suitable for network traffic data analysis. The implemented 1-D VGG consists of four convolutional layers followed by one max pooling layer. The number of filters in the convolutional layers is doubled as the network goes deeper. After several deep convolutional blocks are implemented (taking VGG19 in [11] for example, it contains four blocks), fully connected layers are implemented for the classification task.

D. Loss and Weights Initialization and Optimization

The MTL loss is computed by summing the task-specific losses multiplied by their corresponding loss weights. Assuming there are T tasks in the MTL model, with loss weights denoted as $W = \{w_1, \ldots, w_T\}$, the total loss is $L_{total} = \sum_{t \in T} w_t \cdot L_t$, in which $L_t$ represents the individual loss for each task. In the context of network traffic analysis, identifying benign behaviour and well-known attacks such as denial of service (DoS) is relatively straightforward, while rare attacks may lack sufficient samples and prove challenging to identify.

To enhance the performance of identifying rare attacks, we propose a difficulty-aware loss weight initialization approach. In our previous work [16], we devised a function to calculate the initial task weights based on the distribution of different traffic types. In this paper, we introduce a slight modification to the function by replacing the computed distribution with its natural logarithm. Given the whole data matrix D and the sub-matrix $D_t$ of each task t, the task weights are represented as (2):

$$W = [w_1, \ldots, w_T], \quad t \in T, \quad w_t = \log\left( \frac{\text{countsample}(D)}{\text{countsample}(D_t)} \right) \tag{2}$$

For imbalanced classification problems, utilising a weighted loss function is particularly essential for building an effective and practical model. In this paper, we implement a dynamically weighted binary cross-entropy (BCE) loss function, which incorporates class-specific weights to address the disparity between positive and negative instances. The objective of this custom loss function is to improve the model's performance by fine-tuning its ability to classify minority class instances, which are typically more challenging to predict.

In the loss function, class frequencies are first calculated using the ground truth labels y. In a binary classification, the positive cases are marked as 1 while negative ones are 0. Hence, the ratios used for adjusting negative cases, $ratio_{neg}$, and positive cases, $ratio_{pos}$, are calculated as (3) and (4):

$$ratio_{neg} = \frac{\sum(y)}{len(y)} \tag{3}$$

$$ratio_{pos} = \frac{len(y) - \sum(y)}{len(y)} \tag{4}$$

As shown in (5) and (6), the class weights $w_{neg}$ and $w_{pos}$ are calculated by normalizing the negative and positive class ratios and scaling them based on the adjuster H, which provides more control over the scaling of weights for each class:

$$w_{neg} = \frac{ratio_{neg}}{\max(ratio_{neg}, ratio_{pos})} \tag{5}$$

$$w_{pos} = \frac{ratio_{pos}}{\max(ratio_{neg}, ratio_{pos})} \tag{6}$$
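The task-weight initialization in (2) and the class-weight normalization in (3) to (6) can be sketched as follows. The sample counts are hypothetical, the function names are assumptions, and the adjuster H is left out because its exact form is not given in the equations.

```python
import math


def task_weights(sample_counts):
    """Equation (2): w_t = ln(total samples / samples of task t)."""
    total = sum(sample_counts.values())
    return {task: math.log(total / count) for task, count in sample_counts.items()}


def class_weights(labels):
    """Equations (3)-(6) for binary labels (1 = positive, 0 = negative)."""
    ratio_neg = sum(labels) / len(labels)                    # (3)
    ratio_pos = (len(labels) - sum(labels)) / len(labels)    # (4)
    largest = max(ratio_neg, ratio_pos)
    return ratio_neg / largest, ratio_pos / largest          # (5), (6)


counts = {"benign": 1000, "dos": 100, "bot": 10}  # hypothetical counts
weights = task_weights(counts)
w_neg, w_pos = class_weights([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(weights, w_neg, w_pos)
```

With these definitions, a task with fewer samples receives a larger initial weight, and within a task the minority positive class keeps a weight of 1 while the majority class is scaled down.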

The binary cross-entropy (BCE) between the ground truth labels y and the predicted probabilities y' is then computed:

$$BCE(y, y') = -\left[ y \cdot \log(y') + (1 - y) \cdot \log(1 - y') \right] \tag{7}$$

A weight vector is formulated by applying class-specific weights to the ground truth labels:

$$W_i = y_i \cdot w_{pos} + (1 - y_i) \cdot w_{neg}, \quad W_i \in W \tag{8}$$

Finally, the dynamically weighted binary cross-entropy is calculated by performing element-wise multiplication of the weight vector W and the original BCE values. The mean of the resulting product is returned as the custom weighted loss (9):

$$weightedBCE(y, y') = mean\left( W \odot BCE(y, y') \right) \tag{9}$$
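Equations (7) to (9) can be combined into one short function. This is a sketch, not the training code: the clipping constant `eps` is an added assumption to keep the logarithm finite, and the class weights are passed in as already computed values.

```python
import math


def weighted_bce(y_true, y_pred, w_neg, w_pos, eps=1e-7):
    """Mean of per-sample BCE scaled by W_i = y_i * w_pos + (1 - y_i) * w_neg."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clipping is an added assumption
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))  # (7)
        weight = y * w_pos + (1 - y) * w_neg                     # (8)
        total += weight * bce
    return total / len(y_true)                                   # (9)


# One positive among four samples gives w_neg = 1/3 and w_pos = 1 under (3)-(6).
y_true = [1, 0, 0, 0]
y_pred = [0.5, 0.5, 0.5, 0.5]
loss = weighted_bce(y_true, y_pred, w_neg=1 / 3, w_pos=1.0)
unweighted = weighted_bce(y_true, y_pred, w_neg=1.0, w_pos=1.0)
print(loss, unweighted)
```

In this example the three negative samples together contribute as much to the loss as the single positive sample, which is the balancing effect the weighting is meant to provide.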

In the experiment, we implement four functional blocks for all three models. For Xception, the convolutional units of the blocks are (32, 64, 128, 256); for ResNet, the convolutional units of all blocks are 64; for VGG, the convolutional units are (64, 128, 256, 512). To keep the models lightweight and easy to compute, we use convolutional layers with a small number of units, except for VGG, which was designed to widen as the model deepens.

IV. Experiments

A. Datasets

To assess the detection effectiveness of our algorithms, we use the widely used CICIDS2017 dataset, which was created recently and may accurately represent contemporary network traffic and attacks. Comprising 80 features [21], it consists of both regular traffic and seven types of attacks. As shown in Table II, while benign behaviour and a few major attacks constitute the bulk of the dataset, certain crucial attacks in CICIDS2017 account for only tiny percentages. Rare attacks such as web attack, Heartbleed, and bot can cause severe damage, yet each makes up less than 0.1% of the network traffic, which makes them extremely challenging for an IDS to identify.

TABLE II. Network Traffic Shares of CICIDS2017

| Attacks | Volumes | Share |
|---|---|---|
| BENIGN | 2145166 | 84.358% |
| Bot | 1966 | 0.077% |
| Brute Force | 13835 | 0.544% |
| DDoS | 128027 | 5.035% |
| DoS | 251712 | 9.898% |
| Heartbleed | 11 | 0.0004% |
| Infiltration | 36 | 0.001% |
| Web Attack | 2180 | 0.086% |


B. Binary Classification

Firstly, we tested three models in their single task learning form for binary classification, namely labelling samples as either benign or malicious. All the experiments were run on a T4 GPU, with 50 training epochs and a batch size of 1,024.

Table III presents a comparison of the performance of three different classification models across several evaluation metrics, including loss, accuracy, precision, recall, F-1 score, and the false alarm rate. This comparison is made for both the training and validation sets. All models perform well on the training set, achieving over 99% accuracy and a false alarm rate lower than 1%. Among the models, ResNet exhibits the best recall and F-1 score, indicating it has the highest detection ability. Based on these metrics, ResNet shows the best overall performance on the training set, closely followed by Xception. However, on the validation set, both Xception and VGG maintain a low loss, while ResNet has a significantly higher loss of 0.320. The accuracy of ResNet on the validation set is relatively lower at 94.960%, and the recall drops dramatically to 67.761%. Despite the decrease in detection ability, ResNet has the lowest false alarm rate (0.002%), followed by Xception (0.103%) and VGG (0.892%). On the validation set, Xception and VGG perform closer to each other, with Xception showing better precision and false alarm rate, while VGG has higher accuracy, recall, and F-1 score.

TABLE III. EVALUATION METRICS OF BINARY CLASSIFICATION

| Set | Model | Loss | Accuracy | Precision | Recall | F-1 | False alarm |
|---|---|---|---|---|---|---|---|
| Train | Xception | 0.213 | 99.310% | 97.656% | 97.927% | 97.791% | 0.436% |
| Train | ResNet | 0.210 | 99.350% | 97.645% | 98.182% | 97.913% | 0.439% |
| Train | VGG | 0.211 | 99.240% | 97.300% | 97.868% | 97.583% | 0.504% |
| Validation | Xception | 0.217 | 98.890% | 99.411% | 93.445% | 96.336% | 0.103% |
| Validation | ResNet | 0.320 | 94.960% | 99.985% | 67.761% | 80.778% | 0.002% |
| Validation | VGG | 0.212 | 99.090% | 95.365% | 99.008% | 97.153% | 0.892% |
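The metrics reported in Table III can be related to confusion-matrix counts with a short sketch. The definitions below are the standard ones and are assumed here, with the false alarm rate taken as FP / (FP + TN); the confusion counts are hypothetical. As a consistency check, the F-1 formula is also applied to two precision and recall pairs from the validation rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F-1 and false alarm rate from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "false_alarm": fp / (fp + tn),  # assumed definition: FP / (FP + TN)
    }


# Hypothetical confusion counts for one binary task.
metrics = detection_metrics(tp=90, fp=10, tn=890, fn=10)

# Precision and recall (in percent) from two validation rows of Table III.
f1_xception = f1_score(99.411, 93.445)
f1_resnet = f1_score(99.985, 67.761)
print(metrics, f1_xception, f1_resnet)
```

Both recomputed values agree with the reported F-1 scores up to rounding, which illustrates how ResNet's low validation recall pulls its F-1 down despite near-perfect precision.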

Based on the performance on the validation set, the Xception model could be considered the best overall choice due to its trade-off between the F-1 score and false alarm rate, which reflects a good balance between precision and recall. The ROC curves of the three models in Fig. 3 support this conclusion. ResNet displays signs of overfitting on the training set and was not able to deliver equally good performance on the validation set, indicating it is not the best choice for CICIDS2017. Furthermore, it is important to consider the computational time: the average training time for Xception is 36s, while VGG takes significantly longer due to its considerable model depth and width. Considering both the performance metrics and the training cost, the Xception model appears to be the most suitable choice for the binary classification task.

C. Multi-task Setting Classification

In MTL, each type of network traffic is considered an individual task. Every subnet comprises two fully connected layers devoted to feature extraction and a single output layer for binary classification. The experiments were run on a T4 GPU, consisting of 100 training epochs and a batch size of 1,024.
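Treating each traffic type as its own binary task amounts to a one-vs-rest encoding of the labels, which can be sketched as follows. The function name and the short label list are hypothetical; the class names follow Table II.

```python
def to_task_labels(labels, tasks):
    """Turn one multi-class label list into one binary label list per task."""
    return {task: [1 if label == task else 0 for label in labels] for task in tasks}


traffic = ["BENIGN", "DoS", "BENIGN", "Bot", "DDoS", "BENIGN"]
tasks = ["BENIGN", "Bot", "DDoS", "DoS"]
task_labels = to_task_labels(traffic, tasks)
print(task_labels)
```

Each task-specific subnet would then be trained against its own binary column, while the shared layers see every sample.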

Table IV provides a comprehensive evaluation and comparison of the binary MTL models across the attack types. For benign behaviour and common attacks, all models demonstrate promising performance. In particular, the VGG model performs outstandingly, with an accuracy of 99.50% and an F-1 score of 99.71%. The remaining two models, Xception and ResNet, also exhibit impressive accuracy, nearing 99.85%. When it comes to detecting Denial of Service (DoS) attacks, all three models achieve accuracy above 99%, with false alarm rates below 0.5%. Notably, the ResNet model surpasses Xception and VGG in terms of recall, indicating its ability to correctly identify positive instances of DoS attacks.

Fig. 3 AUC curve comparison (ROC curves, binary classification; x-axis: false positive rate).

In contrast, all models face challenges in recall and precision for rarer forms of attacks. For instance, in detecting botnet attacks, ResNet exhibits the highest recall, but the associated false alarm rate is relatively high, indicating room for improving its detection capabilities. For 'Infiltration' traffic, all models achieve near-perfect accuracy because the classes are highly imbalanced. However, there are significant variations in precision and recall. Among them, the ResNet model stands out by achieving the highest recall and F-1 score. For the Heartbleed attack, for which only four positive cases exist, the Xception and ResNet models show better detection capacity, with a 75% recall rate.

In conclusion, all three models exhibit substantial detection ability across various types of network traffic, albeit with minor fluctuations. The ResNet model achieves the highest recall, particularly for rarer attacks, indicating its strong detection potential. In contrast, the Xception model shows consistent performance with relatively lower false alarm rates and high precision. Given its computational efficiency, Xception could also be a practical option for IoT systems that require prompt detection and minimal false alarms.

V. Conclusion

In this paper, we propose an MTL-based intrusion detection framework that incorporates a GAN-based oversampling technique. We carried out a comprehensive comparison and analysis of different types of convolutional neural networks for the shared layers of the MTL model with an optimal loss function and task-specific weight optimization. The experimental results underscore the efficacy and potential of multi-task learning models in detecting and categorizing various attack types accurately. Each model demonstrates its unique strengths in certain attack categories. However, while their accuracy rates and detection capacity for major attacks are commendable, there remains room to improve the detection of rarer attacks.

TABLE IV. EVALUATION METRICS OF MTL MODELS

Normal