Specific: Clearly define the exact goal or objective without ambiguity
Measurable: Establish concrete criteria to track progress and quantify outcomes
Achievable: Set realistic and attainable targets within existing resources and constraints
Relevant: Ensure the goal aligns with broader organizational or personal objectives
Time-bound: Set a precise deadline or timeframe for completing the goal
Data Ethics
Consent: Obtaining explicit permission from individuals before collecting or using their personal data
Transaction transparency: Clearly communicating how data is collected, used, and shared
Openness: Providing clear and accessible information about data practices and processes
Privacy: Protecting individual data from unauthorized access or misuse
Ownership: Recognizing and respecting individuals' rights to their personal data
Currency: Ensuring data remains up-to-date, accurate, and relevant
Data Credibility
Reliability: Consistency of measurements across different collection instances
Originality: Direct sourcing from primary information sources without intermediary alterations
Comprehensiveness: Complete inclusion of all essential data elements required for accurate analysis
Currency: Timeliness and relevance of data to the current research or operational context
Citation: Formal acknowledgment and linkage to the original data production source
Data Solutions Questionnaire
Practical: Is it easy to implement?
Unintended Consequences: What side effects could this cause?
Logical: Does this make sense logically?
Precedent-Backed: What happened when we tried this before?
Ethical: Is this the right thing to do?
Difference with Alternative: Is it better than other ideas?
Metadata Types
Structural: Describes the internal organization and relationships within a data set
Administrative: Provides technical information about data management, creation, and preservation
Descriptive: Identifies and explains the content, context, and characteristics of data
Data Collection Best Practices
Systematic Data Management
Primary Recording: Capture raw data on paper as initial documentation
Digital Migration: Systematically transfer data to electronic format
Worksheet Optimization: Consolidate data in a single, structured worksheet
Structural Integrity
Implement a unique identifier (ID) column for precise record tracking
Allocate one column per distinct variable
Reserve first row for variable/column names
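The three structural rules above can be sketched with a tiny pandas DataFrame (the column names and values here are invented for illustration):

```python
import pandas as pd

# A tidy worksheet: a unique ID column for record tracking,
# one column per distinct variable, and the first row (the
# DataFrame header) reserved for variable names.
records = pd.DataFrame({
    "id":       [1, 2, 3],        # unique identifier per record
    "species":  ["oak", "pine", "oak"],
    "height_m": [12.4, 9.8, 15.1],
})

assert records["id"].is_unique  # IDs must never repeat
print(records)
```

Keeping one variable per column (rather than, say, packing species and height into one cell) is what makes later filtering, grouping, and joining straightforward.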
Quality Control
Completeness: Ensure every cell contains meaningful information
Meticulous Documentation: Maintain comprehensive and clear research notes
Standardization: Maintain consistent data entry protocols
Analytical Rigor
Precision: Avoid data speculation or unsubstantiated entries
Numerical Representation: Recognize zero (0) as a valid numerical value
Data Cleaning Protocol
Preservation
Create an unaltered backup of the original dataset
Perform cleaning in a separate working table
Error Management
Systematic Error Tracking: Document and report all data anomalies
Utilize database functions for efficient and reliable data cleaning
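A minimal sketch of this protocol in pandas (the dataset and the specific anomalies are hypothetical):

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "id":    [1, 2, 2, 3],
    "score": [88.0, None, None, 910.0],  # 910 looks like an entry error
})

# Preservation: keep an unaltered copy; clean in a separate working table.
backup = raw.copy()
work = raw.copy()

# Error management: document every anomaly before correcting it.
anomalies = []
n_dupes = int(work.duplicated("id").sum())
if n_dupes:
    anomalies.append(f"{n_dupes} duplicate id(s) removed")
work = work.drop_duplicates("id")

out_of_range = work["score"] > 100
anomalies.append(f"{int(out_of_range.sum())} out-of-range score(s) flagged")
work.loc[out_of_range, "score"] = np.nan  # flag, don't speculate a value

print(anomalies)
```

Built-in functions like `duplicated`, `drop_duplicates`, and boolean masking are the "database functions" that make cleaning reliable and repeatable, while the anomaly log preserves an audit trail.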
Exploratory Data Analysis (EDA): Non-Sequential Iterative Approach
Discovery Phase
Rapidly scan dataset to understand fundamental characteristics and potential insights
Structural Assessment
Map data architecture, identifying variable types and potential inter-feature relationships
Validation Processes
Rigorously verify data integrity, statistical assumptions, and distribution consistency
Cleaning Techniques
Strategically handle missing values, outliers, and standardize data formats
Joining and Integration
Seamlessly merge datasets, ensuring referential integrity and comprehensive data unification
Presentation Preparation
Transform complex data into compelling visualizations and actionable insights
Iterative Refinement
Dynamically cycle through analysis stages, continuously challenging and improving initial assumptions
Key Principles
Maintain analytical flexibility, prioritizing deep data understanding over rigid methodological constraints
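The discovery, structural-assessment, and validation phases above often start with a handful of one-line checks; here is a minimal sketch on an invented DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 31, None, 47],
    "city":   ["Oslo", "Lima", "Oslo", None],
    "income": [400, 520, 610, 480],
})

# Discovery: rapid scan of shape and sample rows.
print(df.shape)
print(df.head())

# Structural assessment: variable types.
print(df.dtypes)

# Validation: missing values and distribution summaries.
print(df.isna().sum())
print(df.describe())
```

Because EDA is iterative, these calls are typically re-run after every cleaning or joining step rather than once at the start.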
Normalization vs Standardization
Normalization scales data to a fixed range (typically 0-1) using min-max scaling, useful for algorithms that require bounded input features.
Standardization transforms data to have zero mean and unit variance (z-scores); it does not bound values to a fixed range, is less distorted by outliers than min-max scaling, and works well with approximately normally distributed data.
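Both transforms are one-liners in NumPy; a minimal sketch (the sample data is invented):

```python
import numpy as np

def min_max_normalize(x):
    """Normalization: scale values linearly into the 0-1 range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: shift to zero mean and scale to unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(min_max_normalize(data))  # values now span exactly 0 to 1
print(standardize(data))        # values now have mean 0, std 1
```

Note that a single extreme value compresses everything else under min-max scaling, which is one reason standardization is often preferred when outliers are present.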
Imputing vs Weight of Evidence
Imputation replaces missing data with estimated values based on other available information to preserve data completeness and reduce bias.
Weight of Evidence (WoE): Converts quantities into categories and assigns missing data its own category, so no rows are discarded and the information carried by missingness itself is preserved rather than introducing bias.
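A minimal sketch of both approaches against a binary target, using the common WoE formula ln(share of non-events / share of events) per bin; the dataset, bin edges, and column names are invented:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income":  [30, 45, np.nan, 60, np.nan, 80, 55, 40],
    "default": [1,  0,  1,      0,  0,      0,  1,  1],  # binary target
})

# Imputation: replace missing income with the column mean.
df["income_imputed"] = df["income"].fillna(df["income"].mean())

# WoE: bin income into categories, keeping "missing" as its own
# category so the missing rows still contribute information.
df["income_bin"] = pd.cut(df["income"], bins=[0, 45, 100],
                          labels=["low", "high"])
df["income_bin"] = df["income_bin"].cat.add_categories("missing").fillna("missing")

grouped = df.groupby("income_bin", observed=False)["default"].agg(["sum", "count"])
events = grouped["sum"]
non_events = grouped["count"] - grouped["sum"]
woe = np.log((non_events / non_events.sum()) / (events / events.sum()))
print(woe)
```

A positive WoE means a bin has proportionally fewer events (here, defaults) than average, a negative WoE more; the "missing" bin gets its own score instead of an imputed guess.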
Long Table vs Wide Table
A long table stores each data point as a separate row, with an identifier column to distinguish different series, allowing more flexible data representation.
A wide table condenses data so that each line represents an entire series, with multiple columns representing different data points, which can improve query performance for certain aggregations.
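pandas converts between the two shapes with `melt` and `pivot`; a minimal sketch (the cities and temperatures are invented):

```python
import pandas as pd

# Wide table: one row per series (city), one column per data point.
wide = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "jan":  [-4.0, 22.0],
    "feb":  [-3.0, 23.0],
})

# Long table: one row per observation, with identifier columns.
long = wide.melt(id_vars="city", var_name="month", value_name="temp_c")
print(long)

# Pivot back to the wide form.
wide_again = long.pivot(index="city", columns="month",
                        values="temp_c").reset_index()
print(wide_again)
```

The long form makes it trivial to add a new month without changing the schema, while the wide form keeps each city's whole series on one row.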