<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Change unknown values to NA and vice versa</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>
<table width="100%" summary="page for unknownToNA {gdata}"><tr><td>unknownToNA {gdata}</td><td align="right">R Documentation</td></tr></table>
<h2>Change unknown values to NA and vice versa</h2>
<h3>Description</h3>
<p>
Unknown or missing values (<code>NA</code> in <font face="Courier New,Courier" color="#666666"><b>R</b></font>) can be represented in
various ways (as 0, 999, etc.) in different programs. <code>isUnknown</code>,
<code>unknownToNA</code>, and <code>NAToUnknown</code> can help to change unknown
values to <code>NA</code> and vice versa.
</p>
<h3>Usage</h3>
<pre>
isUnknown(x, unknown=NA, ...)
unknownToNA(x, unknown, warning=FALSE, ...)
NAToUnknown(x, unknown, force=FALSE, call.=FALSE, ...)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
generic, object with <code>NA</code></td></tr>
<tr valign="top"><td><code>unknown</code></td>
<td>
generic, value used instead of <code>NA</code></td></tr>
<tr valign="top"><td><code>warning</code></td>
<td>
logical, issue warning if <code>x</code> already has <code>NA</code></td></tr>
<tr valign="top"><td><code>force</code></td>
<td>
logical, if <code>TRUE</code>, use <code>unknown</code> even when that value is already present in <code>x</code></td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
arguments passed to other methods (to <code>as.character</code> for “POSIXlt”
in the case of <code>isUnknown</code>)</td></tr>
<tr valign="top"><td><code>call.</code></td>
<td>
logical, see the <code>call.</code> argument of <code><a href="../../base/html/warning.html">warning</a></code></td></tr>
</table>
<h3>Details</h3>
<p>
These functions were written to handle the different
“other <code>NA</code>”-like representations that are commonly used in
various external data sources. <code>unknownToNA</code> changes
unknown values to <code>NA</code> for work in <font face="Courier New,Courier" color="#666666"><b>R</b></font>, while <code>NAToUnknown</code>
does the opposite and would usually be used prior to exporting data
from <font face="Courier New,Courier" color="#666666"><b>R</b></font>. <code>isUnknown</code> is a utility function for testing for unknown
values.
</p>
<p>
All functions are generic. The following classes have been tested with
the latest version: “integer”, “numeric”,
“character”, “factor”, “Date”, “POSIXct”,
“POSIXlt”, “list”, “data.frame” and
“matrix”. For other classes the default method might work just fine.
</p>
<p>
<code>unknownToNA</code> and <code>isUnknown</code> can cope with multiple values in
<code>unknown</code>, but those should be given as a “vector”. If not,
they are coerced to a vector. In the “list” and “data.frame” methods,
argument <code>unknown</code> can also be given as a “list”.
</p>
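<p>
A minimal sketch of the multiple-value case, assuming <code>gdata</code> is
installed. The vector <code>age</code> and the codes 0 and 999 are hypothetical
and only illustrate passing several unknown values as one vector.
</p>
<pre>
library(gdata)

## Hypothetical data: both 0 and 999 stand for "not recorded"
age <- c(23, 0, 41, 999, 35)

## Flag every element that matches any of the unknown codes
ageIsUnknown <- isUnknown(x=age, unknown=c(0, 999))

## Replace all flagged elements with NA in one call
ageNA <- unknownToNA(x=age, unknown=c(0, 999))
</pre>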
<p>
If a named “list” or “vector” is passed to argument
<code>unknown</code> and <code>x</code> is also named, names are matched.
</p>
<p>
Recycling occurs in all “list” and “data.frame” methods
when the <code>unknown</code> argument is not of the same length as <code>x</code> and
<code>unknown</code> is not named.
</p>
<p>
Argument <code>unknown</code> in <code>NAToUnknown</code> should hold a value that is
not already present in <code>x</code>. If it is present, an error is produced. This
can be bypassed with <code>force=TRUE</code>, but be warned that there is no
way to distinguish the values after this action. Use at your own risk!
In any case, a warning is issued about the new value in <code>x</code>. Additionally,
take care when using <code>NAToUnknown</code> on factors, as an
additional level (the value of <code>unknown</code>) is introduced. Then, as
expected, <code>unknownToNA</code> removes the level defined in <code>unknown</code>. If
<code>unknown="NA"</code>, then <code>"NA"</code> is removed from the factor levels in
<code>unknownToNA</code> for consistency with conversions back and forth.
</p>
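<p>
The clash described above can be sketched as follows, assuming <code>gdata</code>
is installed. The vector <code>x</code> is hypothetical, and the error is
replaced by a placeholder string rather than its actual message.
</p>
<pre>
library(gdata)

## Hypothetical data in which 999 is a genuine measurement
x <- c(1, 999, NA)

## Without force, the clash with the existing 999 is expected to stop
## with an error; it is caught here and replaced by a placeholder
res <- tryCatch(NAToUnknown(x=x, unknown=999),
                error=function(e) "error")

## With force=TRUE the NA becomes 999 and can no longer be told apart
## from the original 999; any warning is suppressed here for brevity
forced <- suppressWarnings(NAToUnknown(x=x, unknown=999, force=TRUE))
</pre>
<p>
Choosing an <code>unknown</code> value outside the observed range of
<code>x</code> avoids the need for <code>force=TRUE</code> altogether.
</p>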
<p>
The unknown representation in <code>unknown</code> should have the same class as
<code>x</code> in <code>NAToUnknown</code>, except for factors, where the <code>unknown</code>
value is coerced to character anyway. Silent coercion is also applied
between “integer” and “numeric”. Otherwise a
warning is issued and coercion is attempted. If that fails, <font face="Courier New,Courier" color="#666666"><b>R</b></font> introduces
<code>NA</code> and the goal of <code>NAToUnknown</code> is not reached.
</p>
<p>
<code>NAToUnknown</code> accepts only a single value in <code>unknown</code> if
<code>x</code> is atomic, while the “list” and “data.frame” methods
also accept a “vector” or “list”.
</p>
<p>
The “list” and “data.frame” methods can work on many components/columns. To
reduce the number of specifications needed in the <code>unknown</code> argument,
a default unknown value can be specified with the component ".default". This
matches the component/column ".default" as well as all other undefined
components/columns! See the examples.
</p>
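<p>
A short sketch of the ".default" component applied to a data frame, assuming
<code>gdata</code> is installed. The data frame <code>df</code>, its column names
and the unknown codes are hypothetical.
</p>
<pre>
library(gdata)

## Hypothetical data: 0 marks unknown in the numeric columns,
## "none" marks unknown in the character column
df <- data.frame(a=c(1, 0, 3), b=c(0, 5, 6), id=c("x", "none", "z"),
                 stringsAsFactors=FALSE)

## ".default" covers columns a and b; id gets its own unknown value
dfNA <- unknownToNA(x=df, unknown=list(.default=0, id="none"))
</pre>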
<h3>Value</h3>
<p>
<code>unknownToNA</code> and <code>NAToUnknown</code> return modified
<code>x</code>. <code>isUnknown</code> returns logical values for object <code>x</code>.</p>
<h3>Author(s)</h3>
<p>
Gregor Gorjanc
</p>
<h3>See Also</h3>
<p>
<code><a href="../../base/html/NA.html">is.na</a></code>
</p>
<h3>Examples</h3>
<pre>
## Numeric vector with 0 used as the unknown value
xInt <- c(0, 1, 0, 5, 6, 7, 8, 9, NA)
isUnknown(x=xInt, unknown=0)
isUnknown(x=xInt, unknown=c(0, NA))
(xInt <- unknownToNA(x=xInt, unknown=0))
(xInt <- NAToUnknown(x=xInt, unknown=0))

## Factor holding both a real NA and the level "NA"
xFac <- factor(c("0", 1, 2, 3, NA, "NA"))
isUnknown(x=xFac, unknown=0)
isUnknown(x=xFac, unknown=c(0, NA))
isUnknown(x=xFac, unknown=c(0, "NA"))
isUnknown(x=xFac, unknown=c(0, "NA", NA))
(xFac <- unknownToNA(x=xFac, unknown="NA"))
(xFac <- NAToUnknown(x=xFac, unknown="NA"))

## List: unknown given as an unnamed vector or list
xList <- list(xFac=xFac, xInt=xInt)
isUnknown(xList, unknown=c("NA", 0))
isUnknown(xList, unknown=list("NA", 0))

## List: unknown given as a named vector or list with ".default"
tmp <- c(0, "NA")
names(tmp) <- c(".default", "xFac")
isUnknown(xList, unknown=tmp)
tmp <- list(.default=0, xFac="NA")
isUnknown(xList, unknown=tmp)
(xList <- unknownToNA(xList, unknown=tmp))
(xList <- NAToUnknown(xList, unknown=999))
</pre>
<hr><div align="center">[Package <em>gdata</em> version 2.3.1 <a href="00Index.html">Index]</a></div>
</body></html>