Corporate Chat eDiscovery

Processing & Reviewing Chat Data: Handling Volume, Threads & Metadata

Chat data processing is fundamentally different from email. Learn how to handle massive message volumes, thread complexity, metadata challenges, and cost-effective review strategies.

By Legal Tech Titan Editorial

• August 12, 2024 • 19 min read


Once corporate chat data is extracted, the real challenges begin. Processing and reviewing chat data requires different approaches, tools, and expertise than traditional email-based eDiscovery. The volume, format, and metadata characteristics of chat create processing challenges that can significantly affect project cost and timeline.

Volume is the first challenge. A single Slack workspace with 500 users and 3 years of history might contain 50+ million messages, while the same organization might have 1-2 million emails. Chat messages are also much shorter (an average of 20-50 words vs. 300+ words for email), so a single message often means little without the messages around it. You can't simply apply email-based workflows to chat.

Processing chat data begins with proper load file creation. Unlike email, which has established container formats (PST, MSG), chat export formats vary widely. Slack exports JSON. Teams exports vary with the export tool and can arrive as CSV or other structured formats. Google Chat exports depend on the export route (Google Vault, for example, produces PST or mbox files alongside XML metadata). A good processing platform (Relativity, eDiscovery Plus, or similar) can normalize these formats into a consistent structure suitable for review.

Critical metadata considerations for chat include: message timestamps (often stored in UTC and easily misread when time zones are converted inconsistently), threading relationships (messages in a thread should be grouped together), user metadata at message time (who the sender was and what their role was), edits and reactions (which provide context but aren't always exported), file attachments and links (which may be stored separately), and formatting (italics, bold, and code blocks lose meaning in a plain text export).

Threading works differently in chat than in email. Email replies usually quote the earlier messages, so each message carries some of its own context; chat replies do not. A single Slack thread might have 50-100 messages discussing a critical topic. If those messages are flattened into individual documents, you lose the context and coherence of the conversation.
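As a rough illustration of thread-aware processing, the sketch below normalizes timestamps to UTC and groups messages under their thread root. It assumes Slack-style records in which `ts` is an epoch-seconds string and replies carry a `thread_ts` equal to the root message's `ts`; the function names are hypothetical, and the field names should be checked against the actual export before relying on them.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def parse_ts(ts: str) -> datetime:
    """Turn an epoch-seconds string into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)


def group_threads(messages: list[dict]) -> dict[str, list[dict]]:
    """Group messages by thread root, each group sorted oldest first.

    Assumption: a reply carries `thread_ts` pointing at its root message.
    A message without `thread_ts` is treated as its own one-message thread.
    """
    threads = defaultdict(list)
    for msg in messages:
        root = msg.get("thread_ts") or msg["ts"]
        threads[root].append(msg)
    for group in threads.values():
        # Decimal keeps the full sub-second precision of the string value.
        group.sort(key=lambda m: Decimal(m["ts"]))
    return dict(threads)


if __name__ == "__main__":
    sample = [
        {"ts": "1700000100.000000", "thread_ts": "1700000000.000000", "text": "reply"},
        {"ts": "1700000000.000000", "thread_ts": "1700000000.000000", "text": "root"},
        {"ts": "1700000050.000000", "text": "standalone"},
    ]
    for root, group in group_threads(sample).items():
        print(parse_ts(root).isoformat(), [m["text"] for m in group])
```

Each group can then be loaded as one review document or as linked parent and child records. Storing UTC and converting to a documented review time zone only at display time avoids the timezone confusion noted above.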
Proper chat processing keeps threaded messages together, either grouped as single documents or linked with clear parent-child relationships. The best practice is to export threads as single documents or to keep clear threading indicators in the review platform.

Deduplication is another complex issue. Chat messages are rarely duplicated in the way emails are, but the same message can appear in multiple channels (cross-posted, or mentioned in different workspaces). You need deduplication logic that preserves necessary copies while removing true duplicates.

Volume mitigation strategies are essential for managing chat eDiscovery costs. Technology-Assisted Review (TAR) is well suited to chat data, whose high volume and repetitive content give the models plenty of consistent training examples. Using CAL (Continuous Active Learning) on chat data can reduce the volume requiring review by 60-80% compared to traditional manual review. Filtering strategies are also important: eliminate administrative messages, bot notifications, and automated alerts that clutter the data without adding relevance. Early case assessment tools help identify the most relevant channels and custodians before full extraction.

Chat review interfaces must differ from email review. Traditional document review platforms designed for email often struggle with chat because they don't handle threading, formatting, or conversation context well. Specialized chat review interfaces (offered by newer platforms like Logikcull or Everlaw) make threading explicit, preserve formatting, and display conversation context.
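The filtering and deduplication ideas above can be sketched in a few lines. This is a minimal example under stated assumptions, not a defensible culling protocol: it assumes Slack-style `subtype` and `bot_id` fields, assumes the loader has already added a `channel` field to each record, and uses a hypothetical `cross_post_of` field to link copies. Which subtypes count as noise is a case-by-case decision.

```python
import hashlib

# Assumed noise list; adjust per matter and document the choice.
NOISE_SUBTYPES = {"bot_message", "channel_join", "channel_leave"}


def is_noise(msg: dict) -> bool:
    """Flag system events and bot posts (Slack-style fields assumed)."""
    return msg.get("subtype") in NOISE_SUBTYPES or "bot_id" in msg


def content_key(msg: dict) -> str:
    """Hash the sender plus whitespace- and case-normalized text."""
    text = " ".join(msg.get("text", "").split()).lower()
    return hashlib.sha256(f"{msg.get('user')}|{text}".encode("utf-8")).hexdigest()


def dedupe(messages: list[dict]) -> list[dict]:
    """Drop noise and exact repeats; keep cross-posts but link them.

    A true duplicate has the same channel, timestamp, and content, as can
    happen with overlapping exports. A cross-post has the same content in
    a different place; it is kept and tagged with `cross_post_of`.
    """
    seen_exact = set()
    first_seen = {}
    kept = []
    for msg in messages:
        if is_noise(msg):
            continue
        key = content_key(msg)
        exact = (msg.get("channel"), msg["ts"], key)
        if exact in seen_exact:
            continue
        seen_exact.add(exact)
        out = dict(msg)  # copy so the input records stay untouched
        if key in first_seen:
            out["cross_post_of"] = first_seen[key]
        else:
            first_seen[key] = (msg.get("channel"), msg["ts"])
        kept.append(out)
    return kept
```

Linking cross-posts rather than deleting them keeps the record of where a message was seen, which can matter when the audience of a channel is itself relevant.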
For large chat eDiscovery projects, cost-effective approaches include:
(1) Use TAR/CAL to identify relevant messages, targeting a 10-15% sample of chat messages for review.
(2) Filter out low-value communications (notifications, status updates, links without context).
(3) Group threaded conversations to preserve context.
(4) Use horizontal and vertical slicing: review all messages from certain custodians or from certain timeframes.
(5) Use subject matter expert review for complex domains (e.g., technical team chat about engineering decisions).

Privilege handling in chat is particularly tricky. Email often carries an explicit marker in the signature or footer ("Privileged & Confidential"), but chat typically lacks formal indicators. You must identify privileged chat through context: legal team channels, attorney-client discussions, work product discussions. This requires human review and careful privilege log entries.

The economics of chat eDiscovery can be favorable compared to traditional email. Despite larger volumes, TAR effectiveness means faster review and lower per-message costs. A typical chat eDiscovery project might cost $50,000-150,000 depending on volume and complexity, compared to $200,000-500,000+ for equivalent email volumes using traditional review. This cost advantage makes chat eDiscovery manageable even for smaller firms handling larger cases.

Organizations managing chat eDiscovery should budget for specialized tools, experienced technical coordinators, and TAR-capable review platforms. The investment pays off in faster, more cost-effective discovery.
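The custodian and timeframe slicing mentioned among the cost-control approaches can be sketched as a simple filter. The example assumes Slack-style records with a `user` ID and an epoch-seconds `ts` string; the function name and parameters are hypothetical.

```python
from datetime import datetime, timezone


def slice_messages(messages, custodians=None, start=None, end=None):
    """Keep messages from the given custodians within [start, end).

    Any argument left as None is not applied. `start` and `end` are
    expected to be timezone-aware datetimes.
    """
    selected = []
    for msg in messages:
        if custodians is not None and msg.get("user") not in custodians:
            continue
        sent = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        if start is not None and sent < start:
            continue
        if end is not None and sent >= end:
            continue
        selected.append(msg)
    return selected


if __name__ == "__main__":
    sample = [
        {"user": "U1", "ts": "1700000000.000000", "text": "November message"},
        {"user": "U2", "ts": "1700000000.000000", "text": "other custodian"},
        {"user": "U1", "ts": "1690000000.000000", "text": "July message"},
    ]
    window_start = datetime(2023, 11, 1, tzinfo=timezone.utc)
    window_end = datetime(2023, 12, 1, tzinfo=timezone.utc)
    for m in slice_messages(sample, {"U1"}, window_start, window_end):
        print(m["text"])
```

Slicing before thread grouping can split a thread across the boundary, so it is usually safer to group threads first and keep any thread that contains at least one message inside the slice.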
