Risk Factor Disclosure Dataset v1.0
📦 Risk Factor Disclosure Dataset v1.0 – 1,869 Enriched Risk Disclosures from SEC Filings
A premium, AI-ready dataset of 1,788 clean, structured Item 1A (Risk Factors) sections
extracted from the 10-K filings of 267 publicly traded U.S. companies, covering 2010–2024.
Ideal for:
✅ LLM training and retrieval for risk, compliance, and regulatory intelligence
✅ Trend analysis across industries and macroeconomic events
✅ Risk classification and forward-looking disclosure modeling
✅ Grounding GenAI agents with real corporate risk language
✅ ESG, litigation, and geopolitical risk research
🧠 What’s Included
- item1a_enriched.csv – Full dataset with risk category tags, forward score, summaries, and metadata
- item1a_enriched.jsonl – JSONL version for use in AI pipelines
- sample_100_item1a.csv – 100-record preview sample
- README.md – Field descriptions, use cases, and schema
- LICENSE.txt – Tiered license for individual and enterprise use
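As a minimal sketch of how the JSONL file might be read in a pipeline, the snippet below streams one record per line using only the Python standard library. The helper names `iter_records` and `load_records` are illustrative and do not ship with the dataset; the actual field names are documented in README.md and are not assumed here.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def load_records(path: str) -> list[dict]:
    """Read the whole file into memory; fine for a few thousand records."""
    return list(iter_records(path))
```

Calling `load_records("item1a_enriched.jsonl")` should return a list of dictionaries, one per disclosure, assuming the file sits in the working directory.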
📊 Dataset Stats
| Metric | Value |
| --- | --- |
| Total Records | 1,788 enriched disclosures |
| Unique Tickers | 267 |
| Coverage Years | 2010 – 2024 |
| Avg. Risk Section Length | ~520 words |
| Forward Score Range | 0.00 – 0.03 |
| Forward Score Mean ± StdDev | 0.0157 ± 0.0066 |
| Number “Too Short” (<100 words) | 3 (filtered out) |
| Risk Categories (multi-label) | 7 (see below) |
🔖 Risk Categories Covered
Each record is tagged with one or more of the following themes:
- Regulatory
- Litigation
- Geopolitical
- Technology Disruption
- Cybersecurity
- Climate
- Supply Chain
Also included:
- Forward-looking language score based on modal verbs and speculative phrasing
- Model-generated summary (DistilBART)
- SEC metadata (ticker, CIK, year, filing date)
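To illustrate what a forward-looking language score "based on modal verbs and speculative phrasing" could look like, here is a rough sketch that computes the share of tokens matching a small word list. Both the lexicon `FORWARD_TERMS` and the ratio formula are assumptions made for illustration; the dataset's actual scoring method and word list are not specified here, so this sketch should not be expected to reproduce the published scores.

```python
import re

# Hypothetical lexicon of modal verbs and speculative terms (an assumption,
# not the word list used to build the dataset).
FORWARD_TERMS = {
    "may", "might", "could", "would", "should", "will",
    "expect", "expects", "anticipate", "anticipates",
    "believe", "believes", "intend", "intends",
    "potential", "potentially", "uncertain",
}


def forward_score(text: str) -> float:
    """Return the fraction of word tokens that appear in FORWARD_TERMS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # empty or non-alphabetic input
    hits = sum(1 for token in tokens if token in FORWARD_TERMS)
    return hits / len(tokens)
```

For example, `forward_score("We may face risks")` counts one match among four tokens and returns 0.25. Real risk sections are long and dominated by ordinary vocabulary, so a ratio of this kind would be much smaller on full disclosures.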
🧠 Use Cases
- Fine-tune LLMs for legal/financial risk reasoning
- Train classifiers to detect emerging or underreported risks
- Build GenAI copilots for compliance, ESG, or investor relations
- Retrieve top risk disclosures by tag, score, or industry
- Visualize disclosure evolution across 15 years
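The retrieval and trend use cases above can be sketched in a few lines of plain Python. The field names `risk_categories`, `forward_score`, `year`, and `ticker` are assumed for illustration, as are the sample records; check README.md for the real schema. In the CSV version the tags may be stored as a delimited string and would need splitting first.

```python
from collections import Counter


def top_by_tag(records: list[dict], tag: str, k: int = 5) -> list[dict]:
    """Return the k records carrying `tag`, highest forward score first."""
    tagged = [r for r in records if tag in r.get("risk_categories", [])]
    tagged.sort(key=lambda r: r.get("forward_score", 0.0), reverse=True)
    return tagged[:k]


def tag_counts_by_year(records: list[dict], tag: str) -> Counter:
    """Count how many records carry `tag` in each filing year."""
    return Counter(
        r["year"] for r in records if tag in r.get("risk_categories", [])
    )


# Made-up records, used only to show the expected shape of the input.
sample = [
    {"ticker": "AAA", "year": 2020,
     "risk_categories": ["Cybersecurity", "Regulatory"], "forward_score": 0.021},
    {"ticker": "BBB", "year": 2021,
     "risk_categories": ["Climate"], "forward_score": 0.012},
    {"ticker": "CCC", "year": 2021,
     "risk_categories": ["Cybersecurity"], "forward_score": 0.027},
]

best = top_by_tag(sample, "Cybersecurity", k=1)
per_year = tag_counts_by_year(sample, "Cybersecurity")
```

The per-year counts can be passed to any plotting library to chart how often a theme appears over the coverage period.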
💼 License Tiers
| Tier | Scope | Price |
| --- | --- | --- |
| Early Bird | Single-user, no updates | $299 |
| Individual | Single-user + lifetime updates | $499 |
| Enterprise | Org-wide use + internal product rights | $999 |
Refer to LICENSE.txt for details.
🎁 Sample file: Risk Factor Disclosure Sample
Link: https://drive.google.com/file/d/1rCG_lgHy9LxyhsATElTB1DfkEVf7DfQ2/view?usp=sharing
📬 Questions or enterprise licensing? Contact: Asapuaiworks@gmail.com
A high-quality, AI-ready dataset of 1,869 structured risk factor disclosures (Item 1A) extracted from 10-K filings of 267 U.S. public companies across 15 years (2010–2024).