Course: Large Language Models & Agents

M.Sc. Data Science, Bar-Ilan University

2025 Fall

Lecturer: Dr. Alexander (Sasha) Apartsin

HoS Course Series Home: Here

Student Projects

MedFollow Extract

Clarity from complexity in patient care instructions.

Michal Laufer

GitHub

An AI system that analyzes medical summaries and automatically extracts follow-up instructions, including next steps, monitoring guidelines, lifestyle recommendations, and scheduled evaluations. MedFollow Extract transforms unstructured clinical text into clear, structured action items to support continuity of care.
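To make "structured action items" concrete, here is a minimal sketch of what a structured output record might look like. The `FollowUpPlan` class, its field names, and the `parse_plan` helper are hypothetical illustrations built from the four categories named above; they are not taken from the project's code.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class FollowUpPlan:
    # Hypothetical schema: one list per category named in the description.
    next_steps: List[str] = field(default_factory=list)
    monitoring: List[str] = field(default_factory=list)
    lifestyle: List[str] = field(default_factory=list)
    scheduled_evaluations: List[str] = field(default_factory=list)


def parse_plan(raw_json: str) -> FollowUpPlan:
    """Build a FollowUpPlan from a JSON object, ignoring unknown keys.

    Categories missing from the JSON default to empty lists.
    """
    data = json.loads(raw_json)
    known = {f.name for f in fields(FollowUpPlan)}
    return FollowUpPlan(**{k: list(v) for k, v in data.items() if k in known})


# Illustrative input only; not real patient data or real model output.
example = (
    '{"next_steps": ["Complete the prescribed antibiotic course"], '
    '"scheduled_evaluations": ["Cardiology follow-up in 2 weeks"]}'
)
plan = parse_plan(example)
```

Defaulting absent categories to empty lists keeps downstream code free of missing-key checks when a summary contains, for example, no lifestyle recommendations.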

PolicyWeaver

Deriving actionable LLM policies from real traces.

Ofek Ophir, Gilad Zusman

GitHub

PolicyWeaver automatically infers, updates, and validates behavioral policies for LLMs by analyzing small sets of desired and undesired model traces. The system generalizes from examples to extract consistent rules, identify risky patterns, and generate improved policy specifications that guide safer, more predictable model behavior. It streamlines policy creation by transforming raw interaction traces into explicit constraints that can be iteratively refined as new cases emerge.

ActionSense

Reliable action items from unreliable meeting audio transcripts.

Yael Reina, Meir Weinberg

GitHub

ActionSense extracts clear, structured action items from unstructured and noisy meeting data, focusing on transcripts degraded by ASR errors, overlapping speech, filler words, and fragmented utterances. The system identifies commitments, deadlines, follow-ups, and responsibilities even when the input text is inconsistent or partially corrupted. It provides robust extraction pipelines tailored to real-world meeting environments, where imperfect audio and transcription noise cause standard NLP methods to fail.

Adversarial Agreement Benchmark (AAB)

Measuring how fast confidence collapses under adversarial persuasion.

Gil Shapira

GitHub

This project introduces a controlled benchmark for measuring the fragility of LLM agreement under adversarial interaction. The benchmark quantifies how many turns it takes, and under which strategies, for an adversarial LLM to persuade a target LLM to abandon a correct answer and converge on an incorrect or hallucinated one.
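One way to picture such a measurement is a loop that counts persuasion rounds until the target's answer stops matching the correct one. The sketch below is an assumption about how this could be set up, not the benchmark's actual protocol: the `Target` and `Adversary` callables, `rounds_until_flip`, and the toy models are all hypothetical stand-ins for real LLM calls.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces: each model is a plain function over the dialogue so far.
Target = Callable[[str, List[str]], str]          # (question, pushes) -> answer
Adversary = Callable[[str, str, List[str]], str]  # (question, answer, pushes) -> push


def rounds_until_flip(
    question: str,
    correct: str,
    target: Target,
    adversary: Adversary,
    max_rounds: int = 5,
) -> Optional[int]:
    """Return the round in which the target abandons the correct answer.

    Returns 0 if the target is wrong before any persuasion, and None if it
    still holds the correct answer after max_rounds.
    """
    pushes: List[str] = []
    answer = target(question, pushes)
    if answer != correct:
        return 0
    for round_no in range(1, max_rounds + 1):
        pushes.append(adversary(question, answer, pushes))
        answer = target(question, pushes)
        if answer != correct:
            return round_no
    return None


# Toy stand-ins: the target gives in once it has seen three pushes.
def toy_target(question: str, pushes: List[str]) -> str:
    return "4" if len(pushes) < 3 else "5"


def toy_adversary(question: str, answer: str, pushes: List[str]) -> str:
    return "Are you sure? Several sources say the answer is 5."


flip_round = rounds_until_flip("What is 2 + 2?", "4", toy_target, toy_adversary)
```

Comparing this count across different adversary strategies would give one simple notion of how quickly confidence collapses; exact string matching is used here only to keep the sketch short.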

CodeReFresh

Keeping your model’s knowledge as current as your code.

Aviad Oster, Netanel Daniel

GitHub

A knowledge-editing framework that updates code-generation models with the latest changes in programming libraries such as PyTorch. CodeReFresh modifies only the relevant internal knowledge segments without retraining from scratch, ensuring that generated code reflects new APIs, deprecated functions, and evolving best practices.

ContextDiarist

Dialogue boundaries inferred through global context.

Yoav Ellinson

GitHub

ContextDiarist performs dialogue diarization using an LLM that can interpret long-range relationships, global context, and conversational dynamics. Instead of relying solely on acoustic cues, the system segments and attributes turns by understanding speaker intent, topical flow, semantic continuity, and cross-utterance dependencies. This enables accurate diarization even in cases with overlapping themes, indirect references, or sparse speaker markers, leveraging the LLM’s holistic view of the entire conversation.

Browse all upcoming Hands-On AI Science course offerings and past student projects.