XBRLFiles
Explore XBRL with R
Install / Use
/learn @bergant/XBRLFilesREADME
Exploring XBRL files with R
Darko Bergant
Saturday, January 17, 2015
#What is XBRL? Extensible Business Reporting Language (XBRL) is the open international standard for digital business reporting, managed by a global not for profit consortium, XBRL International.
#XBRL Parser for R Parsing XBRL is not something you could do with your eyes closed. Fortunately the XBRL package by Roberto Bertolusso and Marek Kimel takes all the pain away.
To parse complete XBRL, use xbrlDoAll function.
It extracts xbrl instance and related schema files to a list of data frames.
library(XBRL)
inst <- "http://edgar.sec.gov/Archives/edgar/data/21344/000002134414000008/ko-20131231.xml"
options(stringsAsFactors = FALSE)
xbrl.vars <- xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = NULL)
str(xbrl.vars, max.level = 1)
## List of 10
## $ element :'data.frame': 21453 obs. of 8 variables:
## $ role :'data.frame': 96 obs. of 5 variables:
## $ calculation :'data.frame': 196 obs. of 11 variables:
## $ context :'data.frame': 740 obs. of 13 variables:
## $ unit :'data.frame': 4 obs. of 4 variables:
## $ fact :'data.frame': 2745 obs. of 9 variables:
## $ footnote :'data.frame': 6 obs. of 5 variables:
## $ definition :'data.frame': 1398 obs. of 11 variables:
## $ label :'data.frame': 2798 obs. of 5 variables:
## $ presentation:'data.frame': 1582 obs. of 11 variables:
XBRL Data Frames Structure
The data structure of the data frames is shown in the image below

All values are kept in the fact table (in the fact field, precisely).
The element table defines what are these values (the XBRL concepts, e.g. “assets”, “liabilities”, “net income” etc.).
The context table defines the periods and other dimensions for which the values are reported.
With dplyr's join and filter it is quite easy to explore the data in interrelated tables.
For example, to extract revenue from the sale of goods we have to join the
facts (the numbers) with the context (periods, dimensions):
library(dplyr)
xbrl.vars$fact %>%
filter(elementId == "us-gaap_SalesRevenueGoodsNet") %>%
left_join(xbrl.vars$context, by = "contextId") %>%
filter(is.na(dimension1)) %>%
select(startDate, endDate, fact, unitId, elementId) %>%
(knitr::kable)(format = "markdown")
|startDate |endDate |fact |unitId |elementId | |:----------|:----------|:-----------|:------|:----------------------------| |2011-01-01 |2011-12-31 |46542000000 |usd |us-gaap_SalesRevenueGoodsNet | |2012-01-01 |2012-12-31 |48017000000 |usd |us-gaap_SalesRevenueGoodsNet | |2013-01-01 |2013-12-31 |46854000000 |usd |us-gaap_SalesRevenueGoodsNet |
Balance Sheet Example
Select Statement
XBRL encapsulates several reports of different types:
table(xbrl.vars$role$type)
##
## Disclosure Document Statement
## 86 1 9
To find all statements, filter roles by type:
htmlTable::htmlTable(data.frame(Statements=
with(
xbrl.vars$role[xbrl.vars$role$type=="Statement", ],
paste(roleId, "\n<br/>", definition, "\n<p/>")
)),
align = "l",
rnames = FALSE
)
<table class='gmisc_table' style='border-collapse: collapse;' >
<thead>
<tr>
<th style='border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Statements</th>
</tr>
</thead>
<tbody>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedBalanceSheetParentheticals
<br/> 1003500 - Statement - CONSOLIDATED BALANCE SHEET (Parentheticals)
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedBalanceSheets
<br/> 1003000 - Statement - CONSOLIDATED BALANCE SHEETS
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfCashFlows
<br/> 1004000 - Statement - CONSOLIDATED STATEMENTS OF CASH FLOWS
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfComprehensiveIncome
<br/> 1002000 - Statement - CONSOLIDATED STATEMENTS OF COMPREHENSIVE INCOME
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfComprehensiveIncomeCalc2
<br/> 1002000 - Statement - CONSOLIDATED STATEMENTS OF COMPREHENSIVE INCOME
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfIncome
<br/> 1001000 - Statement - CONSOLIDATED STATEMENTS OF INCOME
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfIncomeCalc2
<br/> 1001000 - Statement - CONSOLIDATED STATEMENTS OF INCOME
<p/></td>
</tr>
<tr>
<td style='text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfShareownersEquity
<br/> 1005000 - Statement - CONSOLIDATED STATEMENTS OF SHAREOWNERS' EQUITY
<p/></td>
</tr>
<tr>
<td style='border-bottom: 2px solid grey; text-align: left;'>http://www.thecocacolacompany.com/role/ConsolidatedStatementsOfShareownersEquityParentheticals
<br/> 1005500 - Statement - CONSOLIDATED STATEMENTS OF SHAREOWNERS' EQUITY (Parentheticals)
<p/></td>
</tr>
</tbody>
</table>
Presentation hierarchy
To find out which concepts are reported on specific financial statement component, we have to search the presentation tree from the top element.
library(tidyr)
library(dplyr)
# let's get the balace sheet
role_id <- "http://www.thecocacolacompany.com/role/ConsolidatedBalanceSheets"
# prepare presentation linkbase :
# filter by role_id an convert order to numeric
pres <-
xbrl.vars$presentation %>%
filter(roleId %in% role_id) %>%
mutate(order = as.numeric(order))
# start with top element of the presentation tree
pres_df <-
pres %>%
anti_join(pres, by = c("fromElementId" = "toElementId")) %>%
select(elementId = fromElementId)
# breadth-first search
while({
df1 <- pres_df %>%
na.omit() %>%
left_join( pres, by = c("elementId" = "fromElementId")) %>%
arrange(elementId, order) %>%
select(elementId, child = toElementId);
nrow(df1) > 0
})
{
# add each new level to data frame
pres_df <- pres_df %>% left_join(df1, by = "elementId")
names(pres_df) <- c(sprintf("level%d", 1:(ncol(pres_df)-1)), "elementId")
}
# add last level as special column (the hierarchy may not be uniformly deep)
pres_df["elementId"] <-
apply( t(pres_df), 2, function(x){tail( x[!is.na(x)], 1)})
pres_df["elOrder"] <- 1:nrow(pres_df)
# the final data frame structure is
str(pres_df, vec.len = 1 )
## 'data.frame': 37 obs. of 8 variables:
## $ level1 : chr "us-gaap_StatementOfFinancialPositionAbstract" ...
## $ level2 : chr "us-gaap_StatementTable" ...
## $ level3 : chr "us-gaap_StatementScenarioAxis" ...
## $ level4 : chr "us-gaap_ScenarioUnspecifiedDomain" ...
## $ level5 : chr NA ...
## $ level6 : chr NA ...
## $ elementId: chr "us-gaap_ScenarioUnspecifiedDomain" ...
## $ elOrder : int 1 2 ...
Amounts and Contexts
Elements (or concepts in XBRL terminology) of the balance sheet are now gathered in data frame with presentation hierarchy levels. To see the numbers we have to join the elements with numbers from fact table and periods from context table:
# join concepts with context, facts
pres_df_num <-
pres_df %>%
left_join(xbrl.vars$fact, by = "elementId") %>%
left_join(xbrl.vars$context, by = "contextId") %>%
filter(is.na(dimension1)) %>%
filter(!is.na(endDate)) %>%
select(elOrder, contains("level"), elementId, fact, decimals, endDate) %>%
mutate( fact = as.numeric(fact) * 10^as.numeric(decimals)) %>%
spread(endDate, fact ) %>%
arrange(elOrder)
library(pander)
pres_df_num %>%
select(elementId, contains("2013"), contains("2012")) %>%
pandoc.table(
style = "rmarkdown",
split.table = 200,
justify = c("left", "right", "right")
)
| elementId | 2013-12-31 | 2012-12-31 | |:-------------------------------------------------------------------------------|-------------:|-------------:| | us-gaap_CashAndCashEquivalentsAtCarryingValue | 10414 | 8442 | | us-gaap_OtherShortTermInvestments | 6707 | 5017 | | us-gaap_CashCashEquivalentsAndShortTermInvestments | 17121 | 13459 | | us-gaap_MarketableSecuritiesCurrent | 3147 | 3092 | | us-gaap_AccountsReceivableNetCurrent | 4873 | 4759 | | us-gaap_InventoryNet | 3277 | 3264 | | us-gaap_PrepaidExpenseAndOtherAssetsCurrent | 2886 | 2781 | | us-gaap_AssetsHeldForSaleCurrent | 0 | 2973 | | us-gaap_AssetsCurrent | 31304 | 30328 | | us-gaap_EquityMethodInvestments | 10393 | 9216 | | ko_AvailableForSaleSecuritiesAndCostMethodInvestments | 1119 | 1232 | | us-gaap_OtherAssetsNoncurrent | 4661 | 3585 | | us-gaap_PropertyPlantAndEquipmentNet |
Related Skills
node-connect
337.7kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.3kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
337.7kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.3kCommit, push, and open a PR
Security Score
Audited on Jan 9, 2026
