ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I have created a database with the following code:
Code:
CREATE TABLE IF NOT EXISTS public.data_table
(
id integer NOT NULL DEFAULT nextval('data_table_id_seq'::regclass),
reference_type character varying(40) NOT NULL,
reference_value character varying(100) NOT NULL,
CONSTRAINT data_table_pkey PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS public.crossref
(
source_ref integer NOT NULL,
target_ref integer NOT NULL,
CONSTRAINT crossref_pkey PRIMARY KEY (source_ref, target_ref),
CONSTRAINT crossref_source_ref_fkey FOREIGN KEY (source_ref)
REFERENCES public.data_table (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT crossref_target_ref_fkey FOREIGN KEY (target_ref)
REFERENCES public.data_table (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
);
CREATE INDEX IF NOT EXISTS fki_crossref_target_ref_fkey
ON public.crossref USING btree (target_ref);
-- this function will be called by the view
-- and assures the reference / value pair exists and will return the id.
CREATE OR REPLACE FUNCTION get_refid(_type varchar, _value varchar)
RETURNS integer AS
$BODY$DECLARE
ref_id INTEGER;
BEGIN
IF _type IS NULL OR _value IS NULL
THEN
RAISE EXCEPTION 'None of the arguments to get_refid(type,value) may be NULL';
ELSE
INSERT INTO data_table(reference_type,reference_value) VALUES (_type,_value) ON CONFLICT DO NOTHING;
SELECT INTO ref_id id FROM data_table WHERE reference_type=_type AND reference_value=_value;
END IF;
RETURN ref_id;
EXCEPTION WHEN unique_violation THEN
SELECT INTO ref_id id FROM data_table WHERE reference_type=_type AND reference_value=_value;
RETURN ref_id;
END;$BODY$
LANGUAGE plpgsql VOLATILE;
-- view (supports select + insert; update will come in due time)
CREATE OR REPLACE VIEW public.crossview AS
SELECT xr1.reference_type AS source_reference_type,
xr1.reference_value AS source_reference_value,
xr2.reference_type AS target_reference_type,
xr2.reference_value AS target_reference_value,
crossref.valid_from,
crossref.valid_to
FROM crossref
JOIN data_table xr1 ON crossref.source_ref = xr1.id
JOIN data_table xr2 ON crossref.target_ref = xr2.id;
CREATE RULE crossview_insert AS
ON INSERT TO crossview
DO INSTEAD
INSERT INTO crossref(source_ref, target_ref) VALUES (get_refid(NEW.source_reference_type, NEW.source_reference_value), get_refid(NEW.target_reference_type,NEW.target_reference_value));
This does get the job done. I noticed double entries on the data table (added a unique index on reference_type + reference_value, as is actually logical.
Now, single simple inserts on the view work as intended; However, something is not entirely as I would like it:
If I do, for example:
Code:
INSERT INTO crossview(source_reference_type,source_reference_value,target_reference_type,target_reference_value)
VALUES
('name','John','street', 'livestreet'),
('name','John','city','atown'),
('name','John','hobby','atoms');
Without unique constraint, "name" + "John" will appear multiple times in data_table; however, with the unique constraint only once (of course); however: the id skips up.. meaning nextval is called several times by the stored procedure... What is the correct way to go around and solve this? (yes, id is only 'a number'; yet: it does not feel clean... but can't really put my hands on it.
Agree. In quite a few systems I've worked with over the years, the practice was to generate primary-keys at random, ostensibly to keep the index-trees in better balance but maybe just to emphasize that primary-key values must "mean nothing at all." You do not and must not care that the numbers are, or aren't, consecutive ... only that they are unique.
In fact there was a small improvement to be made (actually, my fear was/is more than I run out of integers, rather than the gaps themselves; but then again, maybe I shouldn't have used the serial type for that.
Anyway the improvement is rather than insert and wait for the exception, is first lookup and act on "IF NOT FOUND" and then INSERT. Due to concurrent invocations of this procedure, I should still handle an exception and try to do the select again. So far this seems to work fine:
Code:
REATE OR REPLACE FUNCTION public.get_refid(
_type character varying,
_value character varying)
RETURNS integer AS
$BODY$DECLARE
ref_id INTEGER;
_tid INTEGER;
BEGIN
IF _value IS NULL
THEN
RAISE EXCEPTION 'get_refid() value may not be null.';
ELSE
_tid := get_typeid(_type);
SELECT INTO ref_id id FROM ref_data WHERE type_id=_tid AND reference_value=_value;
IF ref_id IS NULL
THEN
INSERT INTO ref_data(type_id,reference_value,create_ts) VALUES (_tid,_value,now()) ON CONFLICT DO NOTHING;
SELECT INTO ref_id id FROM ref_data WHERE type_id=_tid AND reference_Value=_value;
END IF;
END IF;
RETURN ref_id;
EXCEPTION WHEN unique_violation THEN
SELECT INTO ref_id id FROM ref_data WHERE reference_type=_tid AND reference_value=_value;
RETURN ref_id;
END;$BODY$
LANGUAGE plpgsql;
I suggest that your first approach was much more efficient and that you ought not be concerned at all about "gaps in the sequence."
Your "if not found" strategy has a race condition, unless you use a transaction with a very-restrictive isolation level.
Your original "insert multiple rows" strategy, while it still must be performed "in a transaction," could tolerate a much more liberal level.
Noone, apart from the database itself, should even know(!) what a "primary key" value is! If your application requires "an externally available unique identifier," it should generate such a value ... or, values ... by some means, and store them separately. Any external application that needs to refer to a particular row must use this "guaranteed unique, but otherwise nonsensical," identifier. ("UUID" a.k.a. "GUID" fields are very-commonly used for this purpose.)
Last edited by sundialsvcs; 05-31-2016 at 01:20 PM.
Well, it was not the gaps in themselves that I felt could be an issue; rather: since the serial type is limited I feared the end of this sequence would be reached in a relatively short term. For some reason (can't really find the exact why right now) the get_ functions would be called so often, that I noticed huge gaps (of an order of 100k) within a few hours. That got me wondering if the sequence would likely recycle and if that would break the database.
Should've thought about using a UUID; I can easily create one by adding the extension pgcrpyto and calling gen_random_uuid()
The only thing is that I'll have to alter the schema a bit to use these uuids rather than the sequence and integers. But the gain is obvious. Thanks for pointing that out.
Edit: fixed most now, I think:
Altered the tables
Code:
DROP VIEW xreference_view; -- because the view relies on the tables that get recreated.
DROP TABLE ref_types;
CREATE TABLE ref_types
(
id uuid NOT NULL DEFAULT gen_random_uuid(),
reference_type character varying(40) NOT NULL,
CONSTRAINT ref_types_pkey PRIMARY KEY (id)
);
DROP TABLE ref_data;
CREATE TABLE ref_data
(
id uuid NOT NULL DEFAULT gen_random_uuid(),
type_id uuid NOT NULL,
reference_value character varying(100) NOT NULL,
create_ts timestamp without time zone,
CONSTRAINT ref_data_pkey PRIMARY KEY (id),
CONSTRAINT type_id_fkey FOREIGN KEY (type_id) REFERENCES ref_types (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE xreference
(
source_ref uuid NOT NULL,
target_ref uuid NOT NULL,
valid_from date,
valid_to date,
CONSTRAINT xreference_pkey PRIMARY KEY (source_ref, target_ref),
CONSTRAINT xreference_source_ref_fkey FOREIGN KEY (source_ref) REFERENCES ref_data (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT xreference_target_ref_fkey FOREIGN KEY (target_ref) REFERENCES ref_data (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
);
CREATE OR REPLACE VIEW xreference_view AS
SELECT
xr1.reference_type AS source_reference_type,
xr1.reference_value AS source_reference_value,
xr2.reference_type AS target_reference_type,
xr2.reference_value AS target_reference_value,
xreference.valid_from,
xreference.valid_to
FROM xreference
JOIN (
SELECT
ref_data.id,
xt1.reference_type,
ref_data.reference_value
FROM ref_data
JOIN ref_types xt1 ON ref_data.type_id = xt1.id
) xr1 ON xreference.source_ref = xr1.id
JOIN (
SELECT
ref_data.id,
xt1.reference_type,
ref_data.reference_value
FROM ref_data
JOIN ref_types xt1 ON ref_data.type_id = xt1.id
) xr2 ON xreference.target_ref = xr2.id;
Instead of a set of rules, instead added a trigger:
Code:
CREATE OR REPLACE FUNCTION update_xreference_view()
RETURNS trigger AS
$BODY$
BEGIN
--
-- Perform the required operation on xreference_view.
--
IF (TG_OP = 'DELETE') THEN
DELETE FROM xreference
WHERE
xreference.source_ref = get_refid(old.source_reference_type, old.source_reference_value)
AND xreference.target_ref = get_refid(old.target_reference_type, old.target_reference_value);
IF NOT FOUND
THEN
RETURN NULL;
END IF;
RETURN OLD;
ELSIF (TG_OP = 'UPDATE') THEN
UPDATE xreference
SET
source_ref = get_refid(new.source_reference_type, new.source_reference_value),
target_ref = get_refid(new.target_reference_type, new.target_reference_value),
valid_from = NEW.valid_from,
valid_to = NEW.valid_to
WHERE
xreference.source_ref = get_refid(old.source_reference_type, old.source_reference_value)
AND xreference.target_ref = get_refid(old.target_reference_type, old.target_reference_value);
RETURN NEW;
ELSIF (TG_OP = 'INSERT') THEN
INSERT INTO xreference(source_ref, target_ref, valid_from, valid_to)
VALUES (
get_refid(NEW.source_reference_type, NEW.source_reference_value),
get_refid(NEW.target_reference_type,NEW.target_reference_value),
NEW.valid_from, NEW.valid_to)
ON CONFLICT ON CONSTRAINT xreference_pkey DO
UPDATE SET
valid_from=EXCLUDED.valid_from,
valid_to=EXCLUDED.valid_to;
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER xreference_update
INSTEAD OF INSERT OR UPDATE OR DELETE
ON xreference_view
FOR EACH ROW EXECUTE PROCEDURE update_xreference_view();
and the procedures to create/find the right identifiers were modified to use UUID rather than integer and safe in regard to the race condition:
Code:
DROP FUNCTION get_refid(varchar, varchar);
DROP FUNCTION get_typeid(varchar);
CREATE OR REPLACE FUNCTION get_typeid(_type character varying)
RETURNS uuid AS
$BODY$
DECLARE
_tid uuid;
BEGIN
IF _type IS NULL
THEN
RAISE EXCEPTION 'Argument to get_typeid may not be null.';
ELSE
INSERT INTO ref_types(reference_type) VALUES (_type) ON CONFLICT DO NOTHING;
SELECT INTO _tid id FROM ref_types WHERE reference_type=_type;
END IF;
RETURN _tid;
EXCEPTION WHEN unique_violation THEN
SELECT INTO _tid id FROM ref_types WHERE reference_type=_type;
RETURN _tid;
END;
$BODY$
LANGUAGE plpgsql;
Last edited by Ramurd; 06-01-2016 at 11:03 AM.
Reason: Added fixes
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.