Methodology
Data collection pipeline, value-alignment coding schema, and corpus quality notes for the LinkedIn study.
| Stage | Count |
|---|---|
| Raw comments collected | 4.3K |
| After cleaning & de-noising | 4.2K |
| Relevant to AI value alignment | 607 |
| LLM coded | 607 |
LinkedIn provides no public API for posts or comments, and automated scraping violates the platform's Terms of Service. The corpus is therefore assembled from compliant sources only: manual capture of public posts, participant-donated exports, and (where granted) LinkedIn research-programme access. Author names are retained solely for deduplication and are excluded from all published analyses; comments are analysed at the aggregate level.
Each comment is coded on four value-alignment dimensions using a zero-temperature Llama-3.3-70B call with a 4-step chain-of-thought (VA-4S). One comment per call, with the parent post supplied as context. Values are validated against the controlled vocabulary below; the emotion vocabulary is identical to the folk corpus to allow cross-study comparison.
| Dimension | Valid Values |
|---|---|
| value_primary + value_secondary |
safety · transparency · fairness · privacy · accountability · honesty · human_autonomy · beneficence · dignity · sustainability · economic_equity · none · unclear |
| target | individual_users · workers · organisations · society · humanity · vulnerable_groups · none · unclear |
| stance | demanding · optimistic · skeptical · critical · mixed · unclear |
| emotion | approval · fear · outrage · indifference · resignation · mixed · unclear |
Comments pass pipeline.py filter if they contain at least one term
from each of the AI and value/alignment keyword sets.
| Rule | Description |
|---|---|
| AI keywords | ai, artificial intelligence, machine learning, llm, genai, chatgpt, claude, gemini, copilot, automation, agentic… |
| Value/alignment keywords | align, value, ethic, trust, responsib-, transparen-, fair, bias, privacy, safe, honest, autonom-, dignity, sustainab-, job, equit-, guardrail… |
| Min length | ≥ 20 characters after strip |
| Excluded | Congratulation/engagement-bait comments ("Great post!", "CFBR"), URLs-only, emoji-only bodies |